These are not the only improvements in this version - here are the rest:
&refr=parameter was not escaped)
snowpak.sh, now has a combine-only option (no minification), which is helpful for testing purposes
attachUserId(boolean), which can be used to stop the tracker sending a
&ip=parameter on the querystring
Below, we first explain how to upgrade, before taking a brief tour through these updates:
Upgrading is a two-step process:
If you are using EmrEtlRunner, you need to update your configuration file,
config.yml, to use the latest versions of the Hive serde and HiveQL scripts:
:snowplow:
  :serde_version: 0.5.3
  :hive_hiveql_version: 0.5.3
  :non_hive_hiveql_version: 0.0.4
That’s it! You don’t need to make any changes to your Infobright setup, assuming you are up-to-date with previous releases.
event field which specifies what type of event it is. Currently we have six different types of events, which are set out in the table below:
|Page view|| || |
|Page ping||None (automatic)*|| |
|Custom event|| || |
|Ad impression|| || |
|Transaction|| || |
|Transaction item|| || |
* for more information on on-page activity tracking, please see the relevant section later in this blog post.
This new event field should make it much easier to query Snowplow data by the type of event. For example, to retrieve the number of e-commerce transactions per day:
We will be updating our Analytics Cookbook to use the
event field to simplify queries where possible.
As stated above, the Snowplow ETL now attaches a unique ID to each event - specifically a type 4 UUID. This new
event_id is far more reliably unique than the existing
txn_id is currently unused, but we may eventually use it to check for duplicate events
txn_id introduced prior to the ETL, see issue 24 for more details).
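For readers unfamiliar with the format, the sketch below shows what a type 4 (random) UUID looks like and how to recognise one. It uses Node.js's built-in `crypto.randomUUID()` purely for illustration; the helper names `newEventId` and `isType4Uuid` are hypothetical, and this is not the code the ETL runs.

```javascript
const { randomUUID } = require('crypto');

// Canonical textual form of a version 4 (random) UUID:
// the third group starts with 4, the fourth with 8, 9, a or b.
const UUID_V4 =
  /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/;

// Hypothetical helper: generate an ID in the same format as event_id.
function newEventId() {
  return randomUUID();
}

// Hypothetical helper: check whether a string is a well-formed type 4 UUID.
function isType4Uuid(value) {
  return UUID_V4.test(value);
}
```

Because each ID is drawn from 122 random bits, collisions between events are vanishingly unlikely in practice.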
You can use the new
event_id field to uniquely identify individual events in your event store, and of course to count distinct events. For example, to count the number of page views by day, we simply execute the following query:
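The counting logic can also be sketched in JavaScript over in-memory rows. This is only an illustration of the idea, not a query for your event store: the `dt` date field and the `'page_view'` event code are assumptions, and only the `event` and `event_id` field names come from this post.

```javascript
// Count distinct event_ids per day for one event type.
// `rows` is an array of plain objects; `dt` is an assumed date field.
function countDistinctEventsByDay(rows, eventType) {
  const idsByDay = new Map();
  for (const row of rows) {
    if (row.event !== eventType) continue; // filter on the event field
    if (!idsByDay.has(row.dt)) idsByDay.set(row.dt, new Set());
    idsByDay.get(row.dt).add(row.event_id); // a Set ignores duplicate IDs
  }
  const counts = {};
  for (const [day, ids] of idsByDay) counts[day] = ids.size;
  return counts;
}
```

Passing a different event type to the same function gives, for example, the number of transactions per day.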
We will be updating our Analytics Cookbook to use
event_id in any examples which currently (erroneously) use
setHeartBeatTimer() inherited from
piwik.js, and introduced a new function,
With activity tracking enabled, “page pings” are sent to Snowplow every
heartBeatDelay seconds, as long as the visitor remains active (moving the mouse, clicking, etc.) on the page. No page pings are sent until
minimumVisitLength seconds have elapsed.
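To make the timing rule concrete, here is a small model of it. It is one interpretation of the description above, not the tracker's actual code: it assumes the first check happens at `minimumVisitLength` seconds, that checks then repeat every `heartBeatDelay` seconds, and that a ping is sent only if the visitor was active since the previous check. The function name and the `visitLength` parameter are hypothetical.

```javascript
// Return the times (in seconds since page load) at which page pings
// would be sent, given the times at which the visitor was active.
function pagePingTimes(activityTimes, minimumVisitLength, heartBeatDelay, visitLength) {
  if (heartBeatDelay <= 0) {
    throw new RangeError('heartBeatDelay must be positive');
  }
  const pings = [];
  let windowStart = -Infinity; // activity before the first check counts
  for (let tick = minimumVisitLength; tick <= visitLength; tick += heartBeatDelay) {
    const active = activityTimes.some((t) => t > windowStart && t <= tick);
    if (active) pings.push(tick); // idle windows produce no ping
    windowStart = tick;
  }
  return pings;
}

// Activity at 2s, 12s, 14s and 40s during a 50-second visit:
pagePingTimes([2, 12, 14, 40], 10, 10, 50); // → [10, 20, 40]
```

Under this model, the check at 30 seconds sends nothing because the visitor was idle between 20 and 30 seconds.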
Here is an example configuration:
This is still an experimental feature - but it should provide some interesting data to start to explore page residency, true bounce rates and so on.
Please note that enabling activity tracking can significantly increase the number of Snowplow events generated, especially with a short
The rest of the changes in this release are much smaller, being either bug fixes or small preparatory features for future releases:
Many thanks to Alan Z @ VeryCD for spotting a bug in the
trackImpression() method, which was stopping ad impressions from being logged. This is now fixed.
We have updated the ETL process so that events with corrupted querystrings can be processed without error: these rows are stored as Snowplow events, but of course with most of the standard fields empty.
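The general idea of tolerating a corrupted querystring can be sketched as follows. This is an illustration in JavaScript rather than the ETL's own code: the function name is hypothetical, and the rule of dropping only the undecodable fields is an assumption about how such rows might be handled.

```javascript
// Parse a querystring into fields, skipping any field whose
// percent-encoding is malformed instead of failing the whole row.
function parseQuerystringLeniently(querystring) {
  const fields = {};
  for (const pair of querystring.split('&')) {
    if (pair === '') continue;
    const idx = pair.indexOf('=');
    const rawKey = idx === -1 ? pair : pair.slice(0, idx);
    const rawValue = idx === -1 ? '' : pair.slice(idx + 1);
    try {
      fields[decodeURIComponent(rawKey)] = decodeURIComponent(rawValue);
    } catch (e) {
      // decodeURIComponent throws URIError on malformed input:
      // leave this field empty but keep processing the row.
    }
  }
  return fields;
}
```

A row with a truncated escape sequence in one field still yields an event, with that field left empty.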
Here are the updated usage options for the
Usage: ./snowpak.sh [options]
Specific options:
  -y PATH   path to YUICompressor 2.4.2 *
  -c        combine only (no minification or removing debug)
  * or set env variable YUI_COMPRESSOR_PATH instead
Common options:
  -h        Show this message
attachUserId(boolean), which can be used to stop the tracker from sending a
uid parameter; you can now disable this by calling
attachUserId() like so:
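How you invoke the function depends on how the tracker is loaded on your page. As a minimal sketch of the effect, the stand-in tracker below is entirely hypothetical (including `makeTracker` and `buildQuerystring`); only `attachUserId(boolean)` and the `uid` parameter come from this release.

```javascript
// Hypothetical stand-in for the tracker, to show what attachUserId changes.
function makeTracker(userId) {
  let attach = true; // assumed default: uid is sent unless disabled
  return {
    attachUserId(enabled) {
      attach = Boolean(enabled);
    },
    buildQuerystring(params) {
      const all = attach ? { ...params, uid: userId } : { ...params };
      return Object.entries(all)
        .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
        .join('&');
    },
  };
}

const tracker = makeTracker('abc123');
tracker.buildQuerystring({ e: 'pv' }); // → "e=pv&uid=abc123"
tracker.attachUserId(false);
tracker.buildQuerystring({ e: 'pv' }); // → "e=pv"
```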
This function is not of immediate use - but it will be an important part of the setup for using the new Clojure Collector, which we are currently working on.
Finally, we have added the ability to override the IP address by passing in an
ip= parameter on the querystring.
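The override rule amounts to "prefer the querystring value when one is present". A minimal sketch, assuming a hypothetical `resolveIp` helper and a `connectionIp` argument standing for the address observed on the request:

```javascript
// Use the ip= value from the querystring if present and non-empty,
// otherwise fall back to the address observed on the connection.
function resolveIp(querystring, connectionIp) {
  const override = new URLSearchParams(querystring).get('ip');
  return override ? override : connectionIp;
}

resolveIp('e=pv&ip=203.0.113.7', '10.0.0.1'); // → "203.0.113.7"
resolveIp('e=pv', '10.0.0.1');                // → "10.0.0.1"
```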
That’s it! If you have any problems with Snowplow version 0.6.5, please raise an issue or get in touch with us via the usual channels.