We have just released version 0.4.10 of Snowplow - people using 0.4.8 can jump straight to this version. This version updates:
- snowplow.js to version 0.7.0
- the Hive deserializer to version 0.4.9
The main changes are as follows:
- The Hive-based ETL process now extracts the ecommerce tracking fields and the site ID field and adds them into your processed events table
- We fixed a bug in the Hive deserializer where a partially-processed row was returned even if a fatal error was found in the row (now, a null row is returned instead)
The rest of the changes were all enhancements to the Hive deserializer’s Specs2 test suite - these improvements should help to accelerate work on the deserializer (we have lots of cool new stuff we want to add to the deserializer!).
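To make the deserializer fix described above concrete, here is a minimal Python sketch of the "null row on fatal error" behaviour. It is not the deserializer's actual code: the field names in `FIELDS`, the `FatalRowError` exception and the `deserialize` function are all hypothetical, chosen only to show why a half-filled row should be discarded rather than returned.

```python
from typing import Dict, Optional

# Hypothetical field names, for illustration only.
FIELDS = ("dt", "tm", "user_ipaddress")


class FatalRowError(Exception):
    """Raised when a raw log line cannot be turned into a valid event."""


def deserialize(line: str) -> Optional[Dict[str, str]]:
    """Return a fully populated row, or None (a null row) on a fatal error."""
    row: Dict[str, str] = {}
    values = line.rstrip("\n").split("\t")
    try:
        # The row is filled in field by field, so an error part-way
        # through leaves it only partially populated.
        for index, name in enumerate(FIELDS):
            if index >= len(values) or values[index] == "":
                raise FatalRowError("missing field: " + name)
            row[name] = values[index]
    except FatalRowError:
        # Discard the partially populated row instead of returning it.
        return None
    return row
```

With this shape, a caller only ever sees a complete row or `None`, so a downstream query never has to guess whether a row was cut short by an error.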
New event table fields
- The setSiteId() functionality is now extracted to the app_id field (short for application ID)
- The ecommerce tracking functionality is now extracted to a set of
For details on the new fields, please review our latest Hive events table definition - there is now a column indicating in which version a given field was added.
How to get the new version
As usual, the new version of the Hive deserializer is available from the GitHub repository’s Downloads section as snowplow-log-deserializers-0.4.9.jar.
The updated snowplow.js is available in our GitHub repository for you to minify and upload, or alternatively you can use the one on our CDN:
If you have any problems with either of these components, please raise an issue!
A note on backwards compatibility for the events table
We will continue to add extra fields to the Snowplow events table as we add extra capabilities to the ETL process - for example, we are working on functionality to extract geo-location information from IP addresses via MaxMind.
Starting with our new app_id field, we will be adding all such new fields to the end of our Hive events table definition. This will mean that you will not have to re-run the ETL process across all your historic raw logs, provided you do not need the data found in the new fields. This is because a Hive query across both the old event table format and the new table format works as long as you don’t explicitly query a new field.
In other words, Hive is futureproofed against new fields being added to the end of your underlying data files, and we’ll take advantage of this to improve backwards compatibility for our events table!
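The reasoning behind appending fields can be sketched in a few lines of Python. This is a toy model of positional column reading, not Hive itself: the column names other than app_id, the sample rows and the `select` helper are all assumptions made for illustration. Because new fields only ever go at the end, the positions of the existing columns never change, so a query over old columns reads old and new rows alike.

```python
from typing import List, Optional, Sequence

# Hypothetical column lists: the new definition appends app_id at the end.
OLD_COLUMNS = ["dt", "tm", "page_url"]
NEW_COLUMNS = OLD_COLUMNS + ["app_id"]


def select(
    rows: Sequence[str],
    table_columns: Sequence[str],
    wanted: Sequence[str],
) -> List[List[Optional[str]]]:
    """Project the wanted columns out of tab-separated rows by position."""
    positions = [table_columns.index(name) for name in wanted]
    result = []
    for row in rows:
        fields = row.split("\t")
        # Assumption of this sketch: a field missing from an older,
        # shorter row is represented as None.
        result.append(
            [fields[p] if p < len(fields) else None for p in positions]
        )
    return result


rows = [
    "2012-10-01\t12:00:00\t/home",          # written in the old format
    "2012-10-02\t09:30:00\t/about\tshop",   # written in the new format
]
old_fields_only = select(rows, NEW_COLUMNS, ["dt", "page_url"])
```

Here `old_fields_only` contains complete values for both rows even though the first row predates app_id. Representing the missing trailing field as `None` is a modelling choice in this sketch rather than a statement about how Hive handles it.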