Snowplow 0.9.7 released with important bug fixes
We are pleased to announce the immediate availability of Snowplow version 0.9.7. This is a “tidy-up” release which fixes some important bugs, in particular:
- A bug, present from 0.9.5 onwards, which prevented events containing multiple JSONs from being shredded successfully (#939)
- Our Hive table definition falling behind Snowplow 0.9.6’s enriched event format updates (#965)
- A bug in EmrEtlRunner causing issues running Snowplow inside some VPC environments (#956)
As well as these important fixes, 0.9.7 comes with a set of smaller bug fixes plus two new features:
- The ability to perform shredding without prior enrichment (i.e. shred an existing folder of enriched events)
- The ability to load Redshift from an S3 bucket in a region different to Redshift’s own region
Below the fold we will cover:
- Shredding bug fix and new shredding functionality
- Other bug fixes
- Other new functionality
We discovered a serious bug (#939) in the new shredding functionality released in Snowplow 0.9.5. Because of this bug, for any enriched event containing more than one JSON, none of those JSONs would be shredded successfully. Examples of enriched events containing more than one JSON include:
- An unstructured event with a single custom context attached
- A link click with a single custom context attached
- A page view with two custom contexts attached
Events containing zero or one JSONs were not affected by this bug.
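To make the scope of #939 concrete, here is a minimal Python sketch that counts the JSONs carried by an event and flags whether it would have been affected. It assumes a simplified, hypothetical representation of an event (a dict with `unstruct_event` and `contexts` keys) rather than the real enriched event TSV row, and the schema URIs are placeholders. Snowplow sends a link click as an unstructured event, so a link click with one custom context already carries two JSONs.

```python
from typing import Any, Dict, List, Optional


def count_jsons(event: Dict[str, Any]) -> int:
    """Count the JSONs the shredder must split out of one event.

    `event` is a hypothetical, simplified dict (not the real enriched TSV row):
      "unstruct_event": a self-describing JSON dict, or None
      "contexts":       a list of custom context JSON dicts (may be empty)
    """
    unstruct: Optional[Dict[str, Any]] = event.get("unstruct_event")
    contexts: List[Dict[str, Any]] = event.get("contexts") or []
    return (1 if unstruct is not None else 0) + len(contexts)


def affected_by_939(event: Dict[str, Any]) -> bool:
    """True if the event carries more than one JSON, i.e. was hit by bug #939."""
    return count_jsons(event) > 1


# Placeholder JSONs; these schema URIs are made up for illustration.
ctx = {"schema": "iglu:com.acme/my_context/jsonschema/1-0-0", "data": {}}
ue = {"schema": "iglu:com.acme/my_event/jsonschema/1-0-0", "data": {}}

examples = {
    "page view, no contexts": {"unstruct_event": None, "contexts": []},
    "page view, one context": {"unstruct_event": None, "contexts": [ctx]},
    "unstructured event, one context": {"unstruct_event": ue, "contexts": [ctx]},
    "page view, two contexts": {"unstruct_event": None, "contexts": [ctx, ctx]},
}

for name, ev in examples.items():
    print(f"{name}: {count_jsons(ev)} JSON(s), affected={affected_by_939(ev)}")
```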
This release fixes this bug and also introduces some new functionality to make it easier to re-shred existing enriched events. You can now run the Shredding process on Elastic MapReduce without the prior Enrichment process, by using the command-line option:
Additionally, we have updated EmrEtlRunner’s --skip functionality, adding an explicit --skip enrich option which can be used to shred without enriching.
For more information on these new options, see Using EmrEtlRunner on the Snowplow wiki.
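A shred-only run could then be launched roughly as in the minimal Python sketch below, which assembles the command and only executes it when asked. The `bundle exec bin/snowplow-emr-etl-runner` invocation, the `--config` flag and the `config/config.yml` path are assumptions about a typical install, and `shred_only_command` is a hypothetical helper; only `--skip enrich` comes from this release. Any further options your run needs, such as the one pointing EmrEtlRunner at the enriched events to shred, are passed through `extra_args`.

```python
import shlex
import subprocess
from typing import List, Optional, Sequence

# ASSUMPTION: how EmrEtlRunner is invoked on your machine; adjust as needed.
RUNNER = ["bundle", "exec", "bin/snowplow-emr-etl-runner"]


def shred_only_command(config_path: str, extra_args: Sequence[str] = ()) -> List[str]:
    """Build an EmrEtlRunner command line that skips enrichment, so only shredding runs."""
    return [*RUNNER, "--config", config_path, "--skip", "enrich", *extra_args]


def run(argv: List[str], dry_run: bool = True) -> Optional[int]:
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(argv))
    if dry_run:
        return None
    return subprocess.run(argv, check=False).returncode


if __name__ == "__main__":
    run(shred_only_command("config/config.yml"))
```

Keeping `dry_run=True` as the default means the sketch prints the command for review before anything touches Elastic MapReduce.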
Many thanks to Elasticity author Rob Slifka for his help in tracking down a tricky bug in EmrEtlRunner’s VPC-related code (#956). If you have been having issues running Snowplow’s Elastic MapReduce job inside an Amazon VPC, this release should help.
We also fixed some other smaller issues in EmrEtlRunner:
- We fixed some bugs that had crept into the behavior of
- We renamed the
--process-enrich to prevent confusion with
- We changed the
-x prevent clash with
We have made some small but important fixes around the loading of JSONs into Redshift:
- We removed the EMPTYASNULL option on our COPY command for loading JSONs (#942). Converting empty strings into nulls was breaking records which had already passed required-field validation in JSON Schema.
- We added the missing targetUrl field to our ad_impression JSON Path file, thanks to Gireesh Sreepathi for spotting this (#951)
- We made the
config.yml optional, for users who are not using their own JSON Path files (#958)
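The EMPTYASNULL problem (#942) can be illustrated with a small Python sketch. It assumes a simplified "required" check (key present, which an empty string satisfies, as in JSON Schema) and assumes that required fields map to NOT NULL columns in Redshift; the record, field names and helper functions are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def passes_required(record: Dict[str, Any], required: Iterable[str]) -> bool:
    """Simplified JSON Schema "required" check: each listed key must be present.

    As in JSON Schema, a key whose value is the empty string "" counts as present.
    """
    return all(key in record for key in required)


def empty_as_null(record: Dict[str, Any]) -> Dict[str, Any]:
    """Mimic Redshift's EMPTYASNULL on load: empty strings become NULL (None)."""
    return {k: (None if v == "" else v) for k, v in record.items()}


def not_null_violations(row: Dict[str, Any], not_null_columns: Iterable[str]) -> List[str]:
    """Return the NOT NULL columns that would receive NULL."""
    return [col for col in not_null_columns if row.get(col) is None]


# Hypothetical record: "label" is required by the schema and holds "".
record = {"category": "video", "label": ""}
required = ["category", "label"]  # ASSUMPTION: also NOT NULL columns in Redshift

print(passes_required(record, required))                    # True: schema-valid
print(not_null_violations(empty_as_null(record), required))  # ['label']: load breaks
print(not_null_violations(record, required))                 # []: fine without EMPTYASNULL
```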
As mentioned above, it is now possible to load events and JSONs into Redshift from an S3 bucket in a different region from Redshift’s own. This is done by setting the REGION option on the COPY commands to the :s3:region: parameter found in
Separately, we have updated and added new git submodules in the 1-trackers sub-folder of the repository, and improved the associated documentation; many thanks to community member Ozzie Gooen for his contribution here!
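Cross-region loading comes down to where Redshift's REGION option sits in the COPY statement. The minimal Python sketch below builds a JSON COPY with that option. It is not StorageLoader's code: `build_json_copy`, the table name, the bucket paths, the credentials placeholder and the region are all hypothetical. EMPTYASNULL is left out, in line with #942.

```python
from typing import Optional


def build_json_copy(
    table: str,
    s3_path: str,
    jsonpaths_path: str,
    credentials: str,
    region: Optional[str] = None,
) -> str:
    """Build a Redshift COPY statement for loading shredded JSONs.

    REGION is only needed when the S3 bucket lives in a different AWS region
    from the Redshift cluster. EMPTYASNULL is deliberately absent (see #942).
    Values are interpolated as-is, so only pass trusted configuration values.
    """
    parts = [
        f"COPY {table}",
        f"FROM '{s3_path}'",
        f"CREDENTIALS '{credentials}'",
        f"JSON AS '{jsonpaths_path}'",
    ]
    if region:
        parts.append(f"REGION AS '{region}'")
    return "\n".join(parts) + ";"


# Hypothetical values for illustration only.
args = dict(
    table="atomic.com_acme_my_context_1",
    s3_path="s3://example-shredded-bucket/good/",
    jsonpaths_path="s3://example-jsonpaths-bucket/jsonpaths/my_context_1.json",
    credentials="aws_access_key_id=<key>;aws_secret_access_key=<secret>",
)
sql_cross_region = build_json_copy(**args, region="us-east-1")
sql_same_region = build_json_copy(**args)
print(sql_cross_region)
```

Leaving `region` unset reproduces a same-region load, so the option is added only where it is needed.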
You need to update EmrEtlRunner and StorageLoader to the latest code (0.9.7 release) on GitHub:
In your EmrEtlRunner’s config.yml file, update your Hadoop shred job’s version to 0.2.1, like so:
For a complete example, see our sample
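After editing config.yml, a quick check can confirm that the shred job version was actually bumped. The minimal Python sketch below works on an already-parsed config. The nested key path `(":etl", ":versions", ":hadoop_shred")` is an assumption about where the version lives, so check it against your own file. The leading colons reflect how a standard YAML parser reads Ruby-symbol-style keys such as `:etl:`. The sample configs are hypothetical.

```python
from typing import Any, Mapping, Sequence, Tuple

# Shred job version required by this release (from the upgrade note above).
REQUIRED_SHRED_VERSION: Tuple[int, ...] = (0, 2, 1)

# ASSUMPTION: where the shred job version sits in the parsed config.yml.
SHRED_VERSION_PATH: Sequence[str] = (":etl", ":versions", ":hadoop_shred")


def parse_version(version: Any) -> Tuple[int, ...]:
    """Turn a dotted version such as "0.2.1" into a comparable tuple (0, 2, 1)."""
    return tuple(int(part) for part in str(version).split("."))


def lookup(config: Mapping[str, Any], path: Sequence[str]) -> Any:
    """Walk nested mappings along `path`; raises KeyError if a key is missing."""
    node: Any = config
    for key in path:
        node = node[key]
    return node


def shred_version_ok(
    config: Mapping[str, Any],
    path: Sequence[str] = SHRED_VERSION_PATH,
    required: Tuple[int, ...] = REQUIRED_SHRED_VERSION,
) -> bool:
    """True if the configured shred job version is at least `required`."""
    return parse_version(lookup(config, path)) >= required


# Hypothetical parsed configs; the stale version number is made up.
updated_config = {":etl": {":versions": {":hadoop_shred": "0.2.1"}}}
stale_config = {":etl": {":versions": {":hadoop_shred": "0.1.0"}}}

print(shred_version_ok(updated_config))  # True
print(shred_version_ok(stale_config))    # False
```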
You can find the updated Hive file in our repository as 4-storage/hive-storage/hiveql/table-def.q.
Note that enriched events generated by pre-0.9.6 Snowplow are not compatible with this updated Hive definition, and will need to be re-generated.
Robustness and stability are hugely important to the Snowplow team, and so we are always ready to balance featureful new versions of Snowplow with bug fixing releases such as 0.9.7. For more details on this release, please check out the 0.9.7 Release Notes on GitHub.