Snowplow 68 Turquoise Jay released

23 July 2015  •  Fred Blundun

We are happy to announce the release of Snowplow 68, Turquoise Jay. This is a small release which adapts the EmrEtlRunner to use the new Elastic MapReduce API.

Table of contents:

  1. Updates to the Elastic MapReduce API
  2. Multiple "in" buckets
  3. Backwards compatibility with old Hadoop Enrich versions
  4. Upgrading
  5. Getting help

1. Updates to the Elastic MapReduce API

The Snowplow EmrEtlRunner uses Rob Slifka’s Elasticity Ruby library to interact with the Elastic MapReduce API. AWS recently altered this API for new AWS users so that it is now based on clusters rather than job flows, breaking the API calls used by Elasticity to check the status of an EMR job.

Rob moved very fast to put out a new Elasticity release (version 6.0.2) that uses the new cluster-based EMR API. Thanks a lot, Rob!

For more information about Elasticity, check out Rob’s guest post from back in 2013.

2. Multiple "in" buckets

The EmrEtlRunner is no longer limited to a single "in" bucket. You can now specify an array of "in" buckets in the configuration YAML, and raw event files from all of them will be moved to the processing bucket. This is helpful when upgrading your collector version: you can process events from your old and new collectors in tandem until all event traffic has moved to the new collector.

See the repository for an example configuration file.
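
As a rough illustration of the behaviour described above, the following Python sketch plans the staging step for several "in" buckets. It is not EmrEtlRunner's own code (which is written in Ruby); the function name `plan_staging_moves`, the bucket URIs, and the file names are all hypothetical.

```python
from typing import Dict, List, Tuple


def plan_staging_moves(
    in_buckets: Dict[str, List[str]], processing_bucket: str
) -> List[Tuple[str, str]]:
    """Return (source, destination) pairs moving every raw file to processing.

    in_buckets maps each "in" bucket URI to the raw event file names it holds.
    All names here are hypothetical; this is not the EmrEtlRunner implementation.
    """
    processing = processing_bucket.rstrip("/")
    moves = []
    for bucket, files in in_buckets.items():
        for name in files:
            source = f"{bucket.rstrip('/')}/{name}"
            moves.append((source, f"{processing}/{name}"))
    return moves


# Hypothetical buckets: one for the old collector, one for the new.
moves = plan_staging_moves(
    {
        "s3://old-collector-logs/": ["2015-07-22.gz"],
        "s3://new-collector-logs/": ["2015-07-23.gz"],
    },
    "s3://etl-processing",
)
for source, destination in moves:
    print(f"{source} -> {destination}")
```

The point of the sketch is only that every "in" bucket feeds the same processing bucket, so events from both collectors end up in a single run.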

3. Backwards compatibility with old Hadoop Enrich versions

More recent versions of Scala Hadoop Enrich (1.0.0 and later) are stored in a different S3 bucket from previous versions. Unfortunately, our previous EmrEtlRunner release (0.15.0 in Release 66 Oriental Skylark) always looked in the new location, no matter which version of Hadoop Enrich was specified.

The new version of EmrEtlRunner decides where to look for the jar based on the jar’s version; this means that you can use the latest EmrEtlRunner version with earlier versions of Hadoop Enrich.
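
The version check can be sketched roughly as follows, in Python for illustration rather than EmrEtlRunner's own Ruby. The two locations are placeholders, since this post does not give the real bucket paths, and the helper names `parse_version` and `jar_location` are hypothetical; only the 1.0.0 threshold comes from the text above.

```python
from typing import Tuple

# Hypothetical locations: the real bucket paths are not given in this post.
OLD_JAR_LOCATION = "s3://example-assets/old-location"
NEW_JAR_LOCATION = "s3://example-assets/new-location"

# Per the post, versions 1.0.0 and later live in the new location.
FIRST_VERSION_IN_NEW_LOCATION = (1, 0, 0)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '0.14.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def jar_location(hadoop_enrich_version: str) -> str:
    """Pick the location to search for the jar, based on its version."""
    if parse_version(hadoop_enrich_version) >= FIRST_VERSION_IN_NEW_LOCATION:
        return NEW_JAR_LOCATION
    return OLD_JAR_LOCATION


# Example version strings, chosen only to exercise both branches.
for version in ("0.9.0", "0.14.1", "1.0.0"):
    print(version, "->", jar_location(version))
```

Comparing tuples of integers rather than raw strings matters here: as strings, "0.14.1" sorts before "0.9.0", which would misorder versions once a component reaches two digits.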

4. Upgrading

You need to update EmrEtlRunner to the latest version (0.16.0) on GitHub:

$ git clone git://github.com/snowplow/snowplow.git
$ cd snowplow
$ git checkout r68-turquoise-jay
$ cd 3-enrich/emr-etl-runner
$ bundle install --deployment
$ cd ../../4-storage/storage-loader
$ bundle install --deployment

5. Getting help

For more details on this release, please check out the r68 Turquoise Jay release on GitHub.

If you have any questions or run into any problems, please raise an issue or get in touch with us through the usual channels.