Snowplow 80 Southern Cassowary is now available! This is a real-time pipeline release which improves stability and brings the real-time pipeline up-to-date with our Hadoop pipeline.
- The latest Common Enrich
- Exiting on error
- Configurable maxRecords
- Changes to logging
- Continuous integration and deployment
- Other improvements
- Getting help
The latest Common Enrich
This version of Stream Enrich uses the latest version of Scala Common Enrich, the library containing Snowplow’s core enrichment logic. Among other things, this means that you can now use R79 Black Swan’s API Request Enrichment and the HTTP Header Extractor Enrichment in your real-time pipeline.
Exiting on error
There are certain error conditions under which our Kinesis apps would previously hang rather than crash outright:
- The Scala Stream Collector could hang when unable to bind to the supplied port
- All of the apps could hang if they were unable to find the Kinesis stream to write to on startup. This could be caused by a race condition when starting the app and creating the stream at the same time
In this release, we have modified the apps so that they exit with status code 1 whenever these errors are encountered, rather than hanging. This means that you can run the apps with a background script which restarts them whenever they die. This prevents transient error conditions from requiring human intervention.
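Such a restart wrapper can be as simple as a shell loop. A minimal sketch, assuming placeholder jar and config names (substitute your own paths):

```shell
# Minimal restart-wrapper sketch: run a command and restart it whenever
# it exits non-zero. RETRY_DELAY defaults to 5 seconds between attempts.
supervise() {
  until "$@"; do
    status=$?
    echo "command exited with status $status; restarting..." >&2
    sleep "${RETRY_DELAY:-5}"
  done
}

# Example invocation (hypothetical artifact and config names):
# supervise java -jar snowplow-stream-enrich-0.6.0.jar --config enrich.conf
```

In production you would more likely hand this job to an init system or a process supervisor, but the loop above captures the idea: a status-1 exit is now a signal that a restart may fix the problem.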
Configurable maxRecords
You can now configure the number of records that the Kinesis Client Library should retrieve with each call to GetRecords. The default is 10,000, which is also the maximum. If you frequently see "Unable to execute HTTP request: Connection reset" in your error logs, try reducing maxRecords to make each request smaller and more likely to succeed.
You can set maxRecords to any positive integer up to 10,000 in the configuration files for Stream Enrich (by setting enrich.streams.in.raw.maxRecords = n) and the Kinesis Elasticsearch Sink (by setting sink.kinesis.in.maxRecords = n).
Changes to logging
Stream Enrich’s logging for failed records is now less verbose: instead of logging an error message for every failed record in a batch, it buckets the failed requests based on their error code, then prints the size of each bucket together with a representative error message for each bucket.
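The bucketing behaviour can be sketched with a small shell helper. The one-failure-per-line log format below ("<error_code> <message...>") is illustrative, not Stream Enrich's actual output:

```shell
# Sketch of bucketing failed records by error code: count each bucket,
# keep one representative message per code, and print a single summary
# line per bucket instead of one line per failed record.
bucket_errors() {
  awk '{
    code = $1; count[code]++
    if (!(code in first)) {          # first message seen for this code
      msg = $2
      for (i = 3; i <= NF; i++) msg = msg " " $i
      first[code] = msg
    }
  }
  END {
    for (code in count)
      printf "%d record(s) failed with code %s, e.g. \"%s\"\n",
             count[code], code, first[code]
  }' "$@"
}
```

Run as `bucket_errors failures.log`; each distinct error code yields one summary line, however many records failed with it.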
Additionally, both the Scala Stream Collector and Stream Enrich now log a missing stream - which is a showstopping issue - at the error level rather than the warning level.
Continuous integration and deployment
The Kinesis apps are now continuously integrated and deployed using Travis CI. This speeds up our development cycle, making it easier to automatically publish new versions.
Other improvements
Other improvements across the Kinesis applications include:
- For each app, the example configuration files have been moved out of the src directory into a new example directory. This prevents them from being needlessly added to the jar file
- We have upgraded the Scala Stream Collector to use Akka 2.3.9, Spray 1.3.3, and Scala 2.10.5
The Kinesis apps for R80 Southern Cassowary are all available in a single zip file here:
There are no breaking changes in this release - you can upgrade the individual Kinesis apps without worrying about having to update the configuration files or indeed the Kinesis streams.
However, if you want to configure how many records Stream Enrich should read from Kinesis at a time, update its configuration file to add a maxRecords property like so:
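A minimal illustration of the relevant fragment, using the setting path given above (the value shown is an example; the rest of the configuration is unchanged):

```
# Any positive integer up to 10,000
enrich.streams.in.raw.maxRecords = 5000
```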
If you want to configure how many records the Kinesis Elasticsearch Sink should read from Kinesis at a time, again update its configuration file to add a maxRecords property in the same way.
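The corresponding fragment for the Kinesis Elasticsearch Sink, again with an illustrative value:

```
# Any positive integer up to 10,000
sink.kinesis.in.maxRecords = 5000
```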
Getting help
For more details on this release, please check out the release notes on GitHub.