Snowplow 0.6.5 released, with improved event tracking
We’re excited to announce our next Snowplow release - version 0.6.5, a Boxing Day release for Snowplow!
This release introduces a new event field which specifies what type of event it is. This should be really helpful for a couple of things:
- It should make querying Snowplow events much easier
- It should make Snowplow event data a better fit for JSON-oriented datastores such as MongoDB and Riak
As well as event types, in this release we are also introducing event IDs. With this, the ETL phase adds an event_id UUID (universally unique ID) to each event row, which should help with subsequent querying.
Here is a taster of how Snowplow event data looks with the new event types and event IDs:
These are not the only improvements in this version - here are the rest:
- We have cleaned up the code for on-page activity tracking (“page pings”)
- We have fixed a bug that affected ad impression tracking - thanks Alan Z!
- The ETL no longer dies if a raw event has a corrupted querystring (e.g. the &refr= parameter was not escaped)
- snowpak.sh now has a combine-only option (no minification), which is helpful for testing purposes
- We have added a new function, attachUserId(boolean), which can be used to stop the tracker sending a uid parameter
- We have added the ability to override the IP address by passing in an &ip= parameter on the querystring
Below, we first explain how to upgrade, before taking a brief tour through these updates:
Upgrading is a two-step process:
If you are using EmrEtlRunner, you need to update your configuration file, config.yml, to use the latest versions of the Hive serde and HiveQL scripts:

    :snowplow:
      :serde_version: 0.5.3
      :hive_hiveql_version: 0.5.3
      :non_hive_hiveql_version: 0.0.4
That’s it! You don’t need to make any changes to your Infobright setup, assuming you are up-to-date with previous releases.
1. Event types
Snowplow events now carry an event field which specifies what type of event it is. Currently we have six different types of events, which are set out in the table below:
|Page view|| || |
|Page ping||None (automatic)*|| |
|Custom event|| || |
|Ad impression|| || |
|Transaction|| || |
|Transaction item|| || |
* for more information on on-page activity tracking, please see the relevant section later in this blog post.
This new event field should make it much easier to query Snowplow data by the type of event, for example to retrieve the number of e-commerce transactions per day.
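As a rough sketch of such a query, the Python snippet below builds a tiny in-memory SQLite table and counts transactions per day. The table name events, the dt date column and the 'transaction' value for the event field are assumptions made for illustration, so adjust them to match your own event store.

```python
import sqlite3

# In-memory stand-in for an event store; table and column names are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (dt TEXT, event TEXT)")
conn.executemany(
    "INSERT INTO events (dt, event) VALUES (?, ?)",
    [
        ("2012-12-25", "page_view"),
        ("2012-12-25", "transaction"),
        ("2012-12-25", "transaction"),
        ("2012-12-26", "transaction"),
        ("2012-12-26", "page_ping"),
    ],
)

# Filter on the event field, then group by day.
TRANSACTIONS_PER_DAY = """
    SELECT dt, COUNT(*) AS transactions
    FROM events
    WHERE event = 'transaction'
    GROUP BY dt
    ORDER BY dt
"""

transactions_per_day = conn.execute(TRANSACTIONS_PER_DAY).fetchall()
for day, count in transactions_per_day:
    print(day, count)
```

The same SELECT statement should carry over to other SQL-style stores with little change, since it only relies on filtering by the event field and grouping by date.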
We will be updating our Analytics Cookbook to use the event field to simplify queries where possible.
2. Event IDs
As stated above, the Snowplow ETL now attaches a unique ID to each event - specifically a type 4 UUID. This new event_id is far more reliably unique than the existing txn_id. txn_id is currently unused, but we may eventually use it to check for duplicate events (that is, rows sharing a txn_id that were introduced prior to the ETL; see issue 24 for more details).
You can use the new event_id field to uniquely identify individual events in your event store, and of course to count distinct events, for example to count the number of page views by day.
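To illustrate counting distinct events, here is a minimal Python sketch using an in-memory SQLite table. It generates type 4 UUIDs with the standard uuid module to stand in for the IDs the ETL attaches; the table name events, the dt column and the 'page_view' value are assumptions for illustration only.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (dt TEXT, event TEXT, event_id TEXT)")


def new_event(dt, event):
    """Build an illustrative event row with a random type 4 UUID."""
    return (dt, event, str(uuid.uuid4()))


rows = [
    new_event("2012-12-25", "page_view"),
    new_event("2012-12-25", "page_view"),
    new_event("2012-12-26", "page_view"),
    new_event("2012-12-26", "page_ping"),
]
# Simulate the same row being loaded twice; DISTINCT should ignore the copy.
rows.append(rows[0])
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

PAGE_VIEWS_PER_DAY = """
    SELECT dt, COUNT(DISTINCT event_id) AS page_views
    FROM events
    WHERE event = 'page_view'
    GROUP BY dt
    ORDER BY dt
"""

page_views_per_day = conn.execute(PAGE_VIEWS_PER_DAY).fetchall()
for day, count in page_views_per_day:
    print(day, count)
```

Counting DISTINCT event_id rather than rows means a row that is accidentally loaded twice is only counted once.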
We will be updating our Analytics Cookbook to use event_id in any examples which currently (erroneously) rely on another identifier.
3. On-page activity tracking
setHeartBeatTimer() inherited from
piwik.js, and introduced a new function,
With activity tracking enabled, “page pings” are sent to Snowplow every heartBeatDelay seconds, as long as the visitor remains active (moving the mouse, clicking etc) on the page. Page pings are not sent until minimumVisitLength seconds have elapsed.
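To make the interplay of the two settings concrete, here is a simplified Python model of the timing described above. It is a sketch under stated assumptions, not the tracker's actual logic: it assumes the first check happens once minimumVisitLength seconds have elapsed, that later checks follow every heartBeatDelay seconds, and that a ping fires only if the visitor was active since the previous check. The function and argument names are hypothetical.

```python
def page_ping_times(minimum_visit_length, heart_beat_delay,
                    activity_times, visit_length):
    """Return the seconds since page load at which pings would fire.

    Simplified model: check at minimum_visit_length, then every
    heart_beat_delay seconds; ping only if there was activity since
    the previous check.
    """
    if heart_beat_delay <= 0:
        raise ValueError("heart_beat_delay must be positive")

    pings = []
    last_check = 0
    check_at = minimum_visit_length
    while check_at <= visit_length:
        was_active = any(last_check < t <= check_at for t in activity_times)
        if was_active:
            pings.append(check_at)
        last_check = check_at
        check_at += heart_beat_delay
    return pings


# Visitor is active at 2s, 12s, 14s and 31s and leaves after 35s.
example_pings = page_ping_times(10, 5, [2, 12, 14, 31], 35)
print(example_pings)
```

In this model no ping fires at 20, 25 or 30 seconds, because the visitor was idle in each of those intervals.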
Here is an example configuration:
This is still an experimental feature - but it should provide some interesting data to start to explore page residency, true bounce rates and so on.
Please note that enabling activity tracking can significantly increase the number of Snowplow events generated, especially with a short interval between page pings.
4. And the rest
The rest of the changes in this release are much smaller, being either bug fixes or small preparatory features for future releases:
Ad impression tracking bug fix
ETL resilient against corrupted querystrings
We have updated the ETL process so that events with corrupted querystrings can be processed without error: these rows are stored as Snowplow events, but of course with most of the standard fields empty.
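The general idea can be sketched in Python as follows. This is not the ETL's own code, only an illustration of the behaviour described: a querystring that fails strict parsing still yields a row, with the fields left empty. The field subset (e, page, refr) and the function name parse_event are chosen for illustration.

```python
from urllib.parse import parse_qs

# Illustrative subset of fields; the real event row has many more.
STANDARD_FIELDS = ("e", "page", "refr")


def parse_event(querystring):
    """Parse a raw querystring into a row, tolerating corruption."""
    row = dict.fromkeys(STANDARD_FIELDS)  # every field starts empty (None)
    try:
        params = parse_qs(querystring, strict_parsing=True)
    except ValueError:
        # Corrupted querystring: keep the row, but leave the fields empty.
        return row
    for field in STANDARD_FIELDS:
        if field in params:
            row[field] = params[field][0]
    return row


good = parse_event("e=pv&page=Home&refr=http%3A%2F%2Fexample.com%2F%3Fq%3Da%26b")
# The unescaped "&b" inside the referrer breaks strict parsing.
bad = parse_event("e=pv&page=Home&refr=http://example.com/?q=a&b")
print(good)
print(bad)
```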
snowpak.sh combine-only option
Here are the updated usage options for snowpak.sh:

    Usage: ./snowpak.sh [options]

    Specific options:
      -y PATH    path to YUICompressor 2.4.2 *
      -c         combine only (no minification or removing debug)

    * or set env variable YUI_COMPRESSOR_PATH instead

    Common options:
      -h         Show this message
We have added a new function, attachUserId(boolean), which can be used to stop the tracker from sending a uid parameter; you can now disable this by calling attachUserId().
This function is not of immediate use - but it will be an important part of the setup for using the new Clojure Collector, which we are currently working on.
IP address override
Finally, we have added the ability to override the IP address by passing in an ip= parameter on the querystring.
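As a small sketch of the precedence this implies, the Python function below prefers an ip value from the querystring and otherwise falls back to the address seen by the collector. Where Snowplow actually applies the override is not covered here; the function name effective_ip and the sample addresses are hypothetical.

```python
from urllib.parse import parse_qs


def effective_ip(querystring, collector_ip):
    """Return the ip= override if present, else the collector-seen address."""
    params = parse_qs(querystring)
    override = params.get("ip")
    return override[0] if override else collector_ip


print(effective_ip("e=pv&ip=203.0.113.7", "198.51.100.1"))
print(effective_ip("e=pv", "198.51.100.1"))
```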