Snowplow 0.9.14 released with additional webhooks

31 December 2014  •  Alex Dean

We are pleased to announce the release of Snowplow 0.9.14, our 17th and final release of Snowplow for 2014! This release contains a variety of important bug fixes, plus support for three new event streams which can be loaded into your Snowplow event warehouse and unified log:

  • Mandrill - for tracking email and email-related events delivered by Mandrill
  • PagerDuty - for tracking incidents generated by PagerDuty
  • Pingdom - for tracking site outages detected by Pingdom

Read on for more information:

  1. Mandrill webhook support
  2. PagerDuty webhook support
  3. Pingdom webhook support
  4. Vagrant support
  5. EmrEtlRunner improvements
  6. Making Hadoop Enrich and Hadoop Shred more tolerant
  7. Clojure Collector bug fixes
  8. CloudFront Collector bug fix
  9. Upgrading
  10. Documentation and help

1. Mandrill webhook support

The Mandrill webhook adapter lets you track email and email-related events delivered by Mandrill. Using this functionality, you can warehouse all email-related events alongside your existing Snowplow events.

For help setting up Mandrill support, see the Mandrill webhook setup wiki page.

For technical details on this adapter, see the Mandrill webhook adapter wiki page.

2. PagerDuty webhook support

The PagerDuty webhook adapter lets you track incidents reported by PagerDuty. Using this functionality, you can warehouse all incident-related events from PagerDuty alongside your existing Snowplow events.

For help setting up PagerDuty support, see the PagerDuty webhook setup wiki page.

For technical details on this adapter, see the PagerDuty webhook adapter wiki page.

3. Pingdom webhook support

The Pingdom webhook adapter lets you track all site monitoring events delivered by Pingdom.

For help setting up Pingdom support, see the Pingdom webhook setup wiki page.

For technical details on this adapter, see the Pingdom webhook adapter wiki page.
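All three adapters work the same way: you point the third-party service's webhook at your Snowplow collector on a vendor-and-version path. As a sketch (the collector hostname below is a placeholder, and you should confirm the exact paths on each setup wiki page):

```shell
# Placeholder collector host -- substitute your own.
COLLECTOR="collector.example.com"

# Each webhook sends its events to the collector on a vendor/version path:
MANDRILL_URL="http://${COLLECTOR}/com.mandrill/v1"
PAGERDUTY_URL="http://${COLLECTOR}/com.pagerduty/v1"
PINGDOM_URL="http://${COLLECTOR}/com.pingdom/v1"

echo "${MANDRILL_URL}"   # http://collector.example.com/com.mandrill/v1
```

You then paste the relevant URL into the webhook configuration screen of Mandrill, PagerDuty or Pingdom respectively.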

4. Vagrant support

A big focus for 2015 is going to be on making it easier to contribute to the various Snowplow projects. As part of this, we will be implementing a standard quickstart process for hacking on any given Snowplow repository using Vagrant.

The main Snowplow repository is among the first Snowplow projects to get the Vagrant quickstart treatment. A simple vagrant up && vagrant ssh will install all of the required development and build tools automatically for you. This is how you would get started if you wanted to hack on Snowplow’s Scala Common Enrich component:

 host$ git clone https://github.com/snowplow/snowplow.git
 host$ cd snowplow
 host$ vagrant up && vagrant ssh
guest$ cd /vagrant/3-enrich/scala-common-enrich
guest$ sbt test

We hope you find this useful; expect to see other Snowplow projects (including our trackers) getting the Vagrant treatment over the coming weeks.

5. EmrEtlRunner improvements

We have made an important bug fix to EmrEtlRunner for Clojure Collector users: Amazon has slightly changed the event log filenames generated by the Clojure Collector in some Elastic Beanstalk environments; we have updated EmrEtlRunner so that it can pick up both the old and new filenames (#1194).

To make debugging failed runs easier, we have updated EmrEtlRunner so that, on a job failure in EMR, the status of the overall jobflow and of each individual step is logged to stdout (#1153).
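For reference, a typical EmrEtlRunner invocation, from which you would now see this status logging on a failed run, looks something like the following (hedged: check the EmrEtlRunner wiki page for the exact flags your version supports, and note that this launches a real EMR job):

```shell
# Run from the EmrEtlRunner directory of your Snowplow checkout.
cd snowplow/3-enrich/emr-etl-runner
bundle exec bin/snowplow-emr-etl-runner --config config/config.yml
```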

6. Making Hadoop Enrich and Hadoop Shred more tolerant

We have upgraded the Scala Hadoop Enrich and Scala Hadoop Shred jobs to make them more tolerant of the Snowplow trackers sending newer self-describing JSON versions. If you are interested in finding out more, please check out tickets #1220 and #1231.

We have also relaxed the Hadoop Enrich job to accept Snowplow events POSTed with an application/json content-type not including a charset=utf-8 parameter (#1257).
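To illustrate the relaxed check (this is only a sketch in shell, not the actual Scala Common Enrich code), a payload's Content-Type is now accepted with or without the charset parameter:

```shell
# Illustrative sketch only -- the real check lives in Scala Common Enrich.
# Prints "yes" if the given Content-Type header value would be accepted.
accepts_json() {
  case "$1" in
    "application/json") echo "yes" ;;     # newly accepted: no charset (#1257)
    "application/json;"*) echo "yes" ;;   # accepted as before, with charset
    *) echo "no" ;;
  esac
}

accepts_json "application/json"                  # -> yes
accepts_json "application/json; charset=utf-8"   # -> yes
accepts_json "text/plain"                        # -> no
```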

The improvements to Scala Hadoop Enrich were implemented in the underlying Scala Common Enrich library, and will be applied to our Kinesis event flow in the next Kinesis release.

7. Clojure Collector bug fixes

We are making available a patch release of the Clojure Collector, version 0.9.1. This fixes two important bugs:

  1. We increased Tomcat’s HTTP header tolerance to 64kB to handle much larger GET requests, such as those containing multiple large contexts (#1249). Many thanks to Anton Kirillov for identifying this issue.
  2. We changed the 1x1 pixel response to use a “stable” GIF image (#1258), which is needed to successfully track email opens in Gmail. Many thanks to Jaime Irurzun for identifying this issue.

Please note that this is the last Clojure Collector release that will support Tomcat 7; by default new Elastic Beanstalk applications expect Tomcat 8, and we have a version of the Clojure Collector in the works to support this.

8. CloudFront Collector bug fix

The CloudFront Collector was affected by the same problem as the Clojure Collector: the 1x1 GIF image was not “stable”, leading to Gmail’s image-prefetcher not fetching it, and thus email opens in Gmail not being tracked in Snowplow.

We have now replaced the CloudFront pixel with a “stable” GIF (#1259).

9. Upgrading

You need to update EmrEtlRunner to the latest code (0.10.0) on GitHub:

$ git clone git://github.com/snowplow/snowplow.git
$ git checkout 0.9.14
$ cd snowplow/3-enrich/emr-etl-runner
$ bundle install --deployment
$ cd ../../4-storage/storage-loader
$ bundle install --deployment

This release bumps the Hadoop Enrichment process to version 0.11.0 and the Hadoop Shredding process to version 0.3.0.

In your EmrEtlRunner’s config.yml file, update your hadoop_enrich and hadoop_shred jobs’ versions like so:

  :versions:
    :hadoop_enrich: 0.11.0 # WAS 0.10.1
    :hadoop_shred: 0.3.0 # WAS 0.2.1

For a complete example, see our sample config.yml template.

This release bumps the Clojure Collector to version 0.9.1.

To upgrade to this release:

  1. Download the new warfile by right-clicking on this link and selecting “Save As…”
  2. Log in to your Amazon Elastic Beanstalk console
  3. Browse to your Clojure Collector’s application
  4. Click “Upload New Version” and upload your warfile

You can find the new pixel in our GitHub repository as 2-collectors/cloudfront-collector/static/i. Upload this to S3, overwriting your existing pixel.

Remember to invalidate the pixel in your CloudFront distribution.
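If you manage the pixel with the AWS CLI, the upload and invalidation might look like this (the bucket name and distribution ID below are placeholders; these commands require your own AWS credentials):

```shell
# Placeholders: substitute your own bucket and CloudFront distribution ID.
aws s3 cp 2-collectors/cloudfront-collector/static/i s3://your-pixel-bucket/i \
  --content-type image/gif

# Invalidate the cached pixel so CloudFront serves the new "stable" GIF.
aws cloudfront create-invalidation \
  --distribution-id YOUR_DISTRIBUTION_ID \
  --paths "/i"
```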

Make sure to deploy Redshift tables for any of the new webhooks that you plan on ingesting into Snowplow. You can find the Redshift table deployment instructions on the corresponding webhook setup wiki pages.
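As a rough sketch (the host, user, database and SQL file path below are all placeholders; the real table-definition paths are listed on each webhook's setup wiki page), deploying a table to Redshift with psql looks like:

```shell
# All values below are placeholders -- substitute your own.
psql -h your-cluster.abc123.us-east-1.redshift.amazonaws.com \
     -p 5439 -U your_user -d snowplow \
     -f path/to/webhook-table-def.sql
```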

10. Documentation and help

Documentation relating to the new webhook support is available on the wiki.

As always, if you do run into any issues or don’t understand any of the new features, please raise an issue or get in touch with us via the usual channels.

For more details on this release, please check out the 0.9.14 Release Notes on GitHub.