Snowplow 0.8.3 released with unstructured events
This release introduces custom unstructured event tracking with trackUnstructEvent(). The Clojure Collector is also bumped to 0.5.0 to include some important bug fixes.
In the rest of this post, then, we will cover:
- What are unstructured events?
- When to use unstructured events?
- Roadmap for unstructured events
- Getting help
Custom unstructured events are user events which do not fit one of the existing Snowplow event types (page views, ecommerce transactions etc), and do not fit easily into our existing custom structured event format. A custom unstructured event consists of two elements:
- A name, e.g. “Game saved” or “returned-order”
- A set of name: value properties (also known as a hash, associative array or dictionary)
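As a minimal sketch, these two elements can be modelled as a plain JavaScript value: a string name plus an object of properties. The helper makeUnstructEvent and the sample values below are hypothetical, used only for illustration; they are not part of Snowplow.

```javascript
// Hypothetical helper (not part of Snowplow): bundles the two elements
// of a custom unstructured event and rejects obviously malformed input.
function makeUnstructEvent(name, properties) {
  if (typeof name !== 'string' || name.length === 0) {
    throw new TypeError('name must be a non-empty string');
  }
  if (properties === null || typeof properties !== 'object' || Array.isArray(properties)) {
    throw new TypeError('properties must be an object of name: value pairs');
  }
  return { name: name, properties: properties };
}

// Illustrative values only.
var gameSaved = makeUnstructEvent('Game saved', { level: 12, autosave: true });
```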
You might recognise what we call custom unstructured events from other analytics tools including MixPanel, KISSmetrics and Keen.io, where they are the primary trackable event type.
Custom unstructured events are great for a couple of use cases:
- Where you want to track event types which are proprietary/specific to your business (i.e. not already part of Snowplow)
- Where you want to track events which have unpredictable or frequently changing properties
Note: because unstructured events are not currently processed by the ETL and enrichment step, or added to storage, we recommend using custom structured events for custom event types, assuming that you can fit your events into our custom structured event schema.
To track a custom unstructured event, use the trackUnstructEvent(name, properties) function.
Here is an example taken from our codebase:
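As a minimal illustrative sketch (not the original snippet), assuming the tracker is driven through the asynchronous _snaq command queue, a call could look like the following. The event name and every property here are hypothetical.

```javascript
// Assumption: the tracker reads commands from a global _snaq array.
var _snaq = _snaq || [];

// Queue one unstructured event: command name, event name, then properties.
_snaq.push(['trackUnstructEvent', 'returned-order', {
  order_id: 'ORD-0001',   // hypothetical property
  reason: 'wrong size',   // hypothetical property
  refund_amount: 49.95    // hypothetical property
}]);
```

The first array element names the tracker function, and the remaining elements map onto its name and properties arguments.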
There are two components to upgrade in this release:
- The Clojure Collector, to version 0.5.0
This release bumps the Clojure Collector to version 0.5.0. To upgrade to this release:
- Download the new WAR file by right-clicking on this link and selecting “Save As…”
- Log in to your Amazon Elastic Beanstalk console
- Browse to your Collector’s application
- Click “Upload New Version” and upload your WAR file
We are well aware that this release is only the start of adding custom unstructured events to Snowplow.
It makes sense to work next on extracting unstructured events in our Enrichment process; unfortunately this is not trivial, because our Enrichment process currently only outputs to Redshift, and Redshift has no support for JSON objects or maps of properties, which we would need to store the unstructured event properties.
Therefore we are exploring two different strands:
- Storing Snowplow events in Avro. Avro is a rich data serialization system that will allow us to store the unstructured event properties within the event object. Initially, you would be able to query these Avro-serialized events using a range of tools on Hadoop including Pig, Hive, Scalding and Cascalog. It should also be relatively straightforward to load these events into NoSQL databases such as MongoDB. We would then work on mapping the Avro events into Redshift
- Storing Snowplow events in PostgreSQL. Postgres has a JSON datatype, although the querying capabilities on that JSON datatype are so far very primitive. Nonetheless, it should be possible to at least store the unstructured event properties in an appropriate JSON field in Postgres
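To illustrate why a JSON-capable field helps, here is a minimal sketch of serializing a property set to a single JSON string, which is the form that could be written to such a field, and reading it back. The property names are hypothetical and no database is involved.

```javascript
// Hypothetical property set, including a nested array.
var properties = { product_id: 'ABC123', sizes: ['s', 'm', 'l'], in_stock: true };

// One string value holds the whole property set, whatever keys it has.
var serialized = JSON.stringify(properties);

// Parsing restores the original structure.
var restored = JSON.parse(serialized);
```

Because the keys live inside the serialized value, events with different property sets would not each need their own set of columns.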
If you have a preference for one of the two options above, or a suggested third approach, then get in touch and let us know as soon as possible, as we are thinking through these alternatives now.
Please keep an eye on our Roadmap wiki page to see how Snowplow’s support for unstructured events evolves.
And if you want to find out more about the syntax for trackUnstructEvent, do check out our Snowplow Unstructured Events Guide, which was also published today.