The Mailgun webhook adapter lets you track email and email-related events delivered by Mailgun. Using this functionality, you can warehouse all email-related events alongside your existing Snowplow events.
For help setting up the Mailgun webhook, check out the Mailgun webhook setup page.
All the currently documented Mailgun events are supported by this release: bounce, deliver, drop, spam, unsubscribe, click, and open.
For technical details, see the Mailgun webhook documentation page.
The Olark webhook adapter lets you receive transcripts of chats on your website via Olark, including messages received while no support representative was online. Using this functionality, you can track and analyze chat activity alongside your other Snowplow data.
For help setting up the Olark webhook, see the Olark webhook setup page.
StatusGator lets you track the availability of hundreds of SaaS and other cloud services that you may be relying on. Using the webhook integration with StatusGator, you can collect availability events and use them to find correlations with other activity in your Snowplow data (e.g. elevated error rates on your website).
You could also use this webhook to provide alerts to your operations team, writing an AWS Lambda function or similar to emit alerts if specific cloud services experience outages.
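As a sketch of the alerting idea above, here is a minimal AWS Lambda handler that turns a StatusGator-style availability event into an alert message. The field names (`service_name`, `status`) and the status values are illustrative assumptions, not the actual StatusGator webhook schema; a real deployment would forward the message to Slack, PagerDuty, or similar rather than just printing it.

```python
# Hypothetical alerting Lambda for StatusGator availability events.
# Field names and status values are assumptions for illustration.
ALERT_STATUSES = {"down", "warn"}

def build_alert(event):
    """Return an alert message for outage statuses, or None otherwise."""
    service = event.get("service_name", "unknown service")
    status = event.get("status", "")
    if status in ALERT_STATUSES:
        return f"ALERT: {service} is reporting status '{status}'"
    return None

def handler(event, context):
    # In production, replace print with a call to your alerting channel.
    message = build_alert(event)
    if message:
        print(message)
    return {"alerted": message is not None}
```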
For help setting up the StatusGator webhook, refer to the StatusGator webhook guide.
Using the Unbounce service you can experiment with different landing pages and variants thereof; Unbounce is a popular tool for lead generation and conversion rate optimization (CRO). Using the Unbounce webhook you can now integrate your lead generation data with the rest of your Snowplow data.
For help setting up the Unbounce webhook, refer to the Unbounce webhook guide.
We have modified the S3DistCp EMR step that copies the raw gzipped log files produced by the Clojure Collector from S3 to HDFS: this step now decompresses the files in transit. Because gzipped files are not splittable, each file was previously processed in its entirety on a single core; decompressing up front lets Spark Enrich parallelize the work, which represents a significant speedup when working with large gzipped files. Note that this optimization is only enabled for the specific pairing of Spark Enrich (not Hadoop Enrich) and the Clojure Collector (not our other collectors).
By default, RDB Loader performs S3-level consistency checks, repeatedly listing the files for atomic events and for each shredded type until the listings stabilize, to ensure that Amazon S3’s infamous eventual consistency does not confound the load.
The problem is that the time spent on these checks grows linearly with the number of shredded types; as a result, pipelines with a wide array of shredded types are disproportionately affected by this check.
To reduce friction for such pipelines, it is now possible to skip the S3 consistency checks performed by RDB Loader, using a new EmrEtlRunner option.
Be aware that this option requires RDB Loader version 0.13.0 or later.
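For illustration, skipping a pipeline step is expressed through EmrEtlRunner’s `--skip` flag; the step name below is an assumption, so check the EmrEtlRunner documentation for the exact name of the new skippable step:

```
./snowplow-emr-etl-runner run \
  --config config.yml \
  --resolver resolver.json \
  --skip consistency_check   # step name assumed; see the EmrEtlRunner docs
```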
In addition to the above we have made the following changes:
The latest version of EmrEtlRunner is available from our Bintray.
To benefit from the new webhook integrations, you’ll need to bump the Spark Enrich version used in the EmrEtlRunner configuration file:
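A minimal sketch of the relevant `config.yml` fragment follows; the version number shown is illustrative, so use the version stated in the release notes:

```yaml
enrich:
  versions:
    spark_enrich: 1.10.0 # illustrative; bump to the version from the release notes
```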
For a complete example, see our sample configuration file.
Upcoming Snowplow releases will include:
For more details on this release, as always do check out the release notes on GitHub.
If you have any questions or run into any problems, please visit our Discourse forum.