Snowplow was the first event data pipeline to let you discover and investigate your invalid events - now we are the first pipeline to let you actively fix those bad events!
While this is a powerful tool, using it can be quite involved. To go along with this release, we have published a tutorial on Discourse, "Using Hadoop Event Recovery to recover events with a missing schema". This tutorial walks you through one common use case for event recovery: where some of your events failed validation because you forgot to upload a particular schema.
You can also check out the wiki documentation for Hadoop Event Recovery.
Our Scala Common Enrich library uses the Base64 class from Apache Commons Codec. Version 0.5 of this library wasn’t thread-safe. This didn’t matter when running the batch pipeline, since each worker node only uses one thread to process events. But in Stream Enrich it caused a race condition: multiple threads could access the same Base64 object simultaneously, sometimes resulting in erroneous Base64 decoding.
This issue was particularly affecting high-volume users running Stream Enrich on servers with 4+ vCPUs.
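To make the failure mode concrete, here is a minimal Python sketch of the general pattern. It is not the Commons code and not what Stream Enrich does internally: `BufferedDecoder` is a hypothetical stand-in for any decoder that keeps mutable scratch state, and `thread_local_decode` shows one common way to avoid sharing such an object, by giving each thread its own instance.

```python
import base64
import threading
from concurrent.futures import ThreadPoolExecutor


class BufferedDecoder:
    """Hypothetical decoder with mutable state, standing in for a non-thread-safe codec."""

    def __init__(self):
        # Scratch space reused across calls: unsafe if two threads share one instance
        self._buffer = bytearray()

    def decode(self, encoded: str) -> bytes:
        self._buffer.clear()
        for start in range(0, len(encoded), 4):
            # Decode one 4-character Base64 group at a time into the scratch buffer
            self._buffer.extend(base64.b64decode(encoded[start:start + 4]))
        return bytes(self._buffer)


_local = threading.local()


def thread_local_decode(encoded: str) -> bytes:
    """Decode with a decoder instance that is private to the calling thread."""
    decoder = getattr(_local, "decoder", None)
    if decoder is None:
        decoder = _local.decoder = BufferedDecoder()
    return decoder.decode(encoded)


def decode_all(payloads, workers=4):
    """Decode many padded Base64 payloads concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(thread_local_decode, payloads))
```

If all worker threads instead called `decode` on a single shared `BufferedDecoder`, one thread could clear or extend the buffer while another was still filling it, which is the kind of interleaving that produces corrupt output.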
If this issue is affecting you, you’ll see bad rows, potentially many of them, where the error message reports corrupt-looking JSON, even though the JSON turns out to be valid when you Base64-decode the bad row’s original line yourself.
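One way to run this check by hand is sketched below in Python. It assumes you have already pulled the relevant Base64-encoded string out of the bad row; how you extract it depends on your collector format and is not shown. The function name `decodes_to_valid_json` is our own, not part of any Snowplow tool.

```python
import base64
import binascii
import json


def decodes_to_valid_json(encoded: str) -> bool:
    """Return True if `encoded` is Base64 (standard or URL-safe) wrapping valid JSON."""
    # Restore any trailing padding that was stripped from the encoded string
    padded = encoded + "=" * (-len(encoded) % 4)
    try:
        raw = base64.urlsafe_b64decode(padded)
        json.loads(raw.decode("utf-8"))
        return True
    except (binascii.Error, UnicodeDecodeError, ValueError):
        # Not decodable as Base64, not UTF-8, or not parseable as JSON
        return False
```

If this returns `True` for a string whose bad row complains about corrupt JSON, the payload itself was fine and the corruption most likely happened during decoding inside the pipeline.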
In this release we have therefore upgraded our Stream Enrich component to use version 1.10 of the affected library, which makes the class thread-safe. The fix is not critical for the batch pipeline, but it will come to the Hadoop pipeline in a future release.
We have added JSON Paths files and Redshift DDLs for the following schemas:
The Kinesis apps for R81 Kangaroo Island Emu are all available in a single zip file here:
Only the Stream Enrich app has actually changed. The change is not breaking, so you don’t have to make any changes to your configuration file. To upgrade Stream Enrich:
For more details on this release, please check out the release notes on GitHub.
The wiki has full information on how to use Hadoop Event Recovery.
If you have any questions or run into any problems, please raise an issue or get in touch with us through the usual channels.