Snowplow 62 Tropical Parula released

17 March 2015  •  Alex Dean

We are pleased to announce the immediate availability of Snowplow 62, Tropical Parula. This release is designed to fix an incompatibility issue between r61’s EmrEtlRunner and some older Elastic Beanstalk configurations. It also includes some other EmrEtlRunner improvements.


Many thanks to Snowplow community member Dani Solà from Simply Business for his contribution to this release!

  1. Fix to support legacy Beanstalk access logs
  2. Custom bootstrap actions
  3. Other improvements to EmrEtlRunner
  4. Upgrading
  5. Getting help

1. Fix to support legacy Beanstalk access logs

After the release of r61 Pygmy Parrot, we became aware that the updated file handling code for access logs generated by the Clojure Collector did not work with certain legacy Elastic Beanstalk environments. The behavior depends on the middle portion of the access log filename:

  • Middle portion of access log filename is tomcat7_rotated - works fine
  • Middle portion of access log filename is tomcat8_rotated - works fine
  • Middle portion of access log filename is tomcat7 - EmrEtlRunner does not move logs to Staging bucket

This is an easy issue to diagnose: if it affects you, then following an upgrade to r61 Pygmy Parrot, your Snowplow pipeline will copy no Clojure Collector access logs to Staging, and thus generate no enriched events.

This issue (#1480) is resolved in this release: EmrEtlRunner now supports all Clojure Collector access log filename formats again.
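If you want to check which filename format your own collector produces, the three cases above can be sketched as a small classifier. This is only an illustration: the function names are hypothetical, and the regular expression assumes that the middle portion sits directly before `_localhost_access_log`, which you should confirm against the filenames in your own bucket.

```python
import re

# Assumption: the "middle portion" is a tomcatN token, optionally followed by
# "_rotated", sitting directly before "_localhost_access_log".
MIDDLE_PORTION = re.compile(r"(tomcat\d+(?:_rotated)?)_localhost_access_log")

# The only format this post reports as broken under r61.
R61_UNSUPPORTED = {"tomcat7"}


def middle_portion(filename):
    """Return the middle portion of an access log filename, or None."""
    match = MIDDLE_PORTION.search(filename)
    return match.group(1) if match else None


def affected_by_r61(filename):
    """True if the filename uses the format that r61 did not move to Staging."""
    return middle_portion(filename) in R61_UNSUPPORTED
```

Running `affected_by_r61` over a listing of your In bucket gives a quick indication of whether the r61 problem applied to your environment.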

2. Custom bootstrap actions

The EmrEtlRunner now has support for adding one or more of your own custom bootstrap actions (#1405). This is particularly useful if you are running your own Hadoop job steps as part of your scheduled jobflow on EMR. Many thanks to Dani Solà for contributing this feature.

You simply set your custom bootstrap actions in your EmrEtlRunner’s config.yml as an array:

:emr:
  ...
  :ec2_key_name: ADD HERE
  :bootstrap:
    - s3://mybucket1/filename1
    - s3://mybucket2/filename2
  :software:
    ...
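Before scheduling a run, it can be worth sanity-checking the bootstrap list. The sketch below is a hypothetical pre-flight check, not part of EmrEtlRunner: it assumes, based on the example above, that every entry should be a string of the form `s3://bucket/key`, and it takes the already-parsed list as input.

```python
from urllib.parse import urlparse


def invalid_bootstrap_actions(actions):
    """Return the entries that do not look like s3://bucket/key strings."""
    bad = []
    for action in actions:
        if not isinstance(action, str):
            bad.append(action)
            continue
        parsed = urlparse(action)
        # Require the s3 scheme, a bucket name and a non-empty object key.
        if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
            bad.append(action)
    return bad
```

An empty return value means every entry passed the check; an empty input list (no custom bootstrap actions) also passes.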

3. Other improvements to EmrEtlRunner

We have made a variety of improvements “under the hood” to EmrEtlRunner:

  • EmrEtlRunner now tolerates more exception types in EmrJob’s wait_for (#358). This should reduce the incidence of monitoring failures during EMR runs
  • We have bumped the version of Contracts to 0.7 (#1498), and moved include Contracts into classes and modules following best practice (#1438)
  • The missing :archive: property has been added into the BucketHash (#1475)
  • We have removed time_diff as a dependency because it was no longer used (#1352)
  • The failing test in the EmrEtlRunner’s test suite is now fixed (#1287), so the test suite passes again
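To illustrate the first point, here is a minimal sketch of a polling loop that tolerates a configurable set of exception types. It is not the EmrEtlRunner code (which is written in Ruby); the function name, parameters and defaults are all hypothetical and only show the general idea of retrying on transient monitoring errors rather than aborting.

```python
import time


def poll_until_done(check_done, tolerated, poll_seconds=0.0, max_polls=100):
    """Poll check_done() until it returns True or max_polls is reached.

    Exceptions whose types are listed in `tolerated` count as a failed poll
    and are retried; any other exception propagates to the caller.
    """
    for _ in range(max_polls):
        try:
            if check_done():
                return True
        except tolerated:
            pass  # transient monitoring error: try again on the next poll
        time.sleep(poll_seconds)
    return False
```

Widening the `tolerated` tuple is the analogue of what this release does: more kinds of transient error are absorbed by the loop instead of ending the monitoring early.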

4. Upgrading

You need to update EmrEtlRunner to the latest code (0.13.0) on GitHub:

$ git clone git://github.com/snowplow/snowplow.git
$ cd snowplow
$ git checkout r62-tropical-parula
$ cd 3-enrich/emr-etl-runner
$ bundle install --deployment
$ cd ../../4-storage/storage-loader
$ bundle install --deployment

You must also update your EmrEtlRunner’s configuration file, or else you will get a Contract failure on start, as described below.

Whether or not you use the new bootstrap option, you must add a :bootstrap: property to the :emr: section of your EmrEtlRunner’s config.yml file, like so:

:emr:
  ...
  :ec2_key_name: ADD HERE
  :bootstrap: []          # No custom bootstrap actions
  :software:
    ...

For a complete example, see our sample config.yml template.
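If you maintain several configuration files, the edit can be scripted. The following is a rough sketch under stated assumptions rather than an official migration tool: the function name is hypothetical, it works on the raw text of the file, and it assumes your config.yml contains an `:ec2_key_name:` line inside `:emr:` as in the snippet above.

```python
import re


def ensure_bootstrap(config_text):
    """Return config_text with an empty :bootstrap: entry added if missing."""
    lines = config_text.splitlines()
    if any(re.match(r"\s*:bootstrap:", line) for line in lines):
        return config_text  # already upgraded, leave untouched
    out = []
    inserted = False
    for line in lines:
        out.append(line)
        match = re.match(r"(\s*):ec2_key_name:", line)
        if match and not inserted:
            # Reuse the indentation of :ec2_key_name: so that the new key
            # sits at the same level inside :emr:.
            out.append(match.group(1) + ":bootstrap: []")
            inserted = True
    if not inserted:
        raise ValueError("no :ec2_key_name: line found; edit the file by hand")
    return "\n".join(out) + "\n"
```

The function leaves files that already have a `:bootstrap:` line unchanged, so it is safe to run more than once. Review the result before committing it, since a text-level edit cannot account for every possible layout.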

5. Getting help

For more details on this release, please check out the r62 Tropical Parula Release Notes on GitHub.

If you have any questions or run into any problems, please raise an issue or get in touch with us through the usual channels.