This version of Stream Enrich uses the latest version of Scala Common Enrich, the library containing Snowplow’s core enrichment logic. Among other things, this means that you can now use R79 Black Swan’s API Request Enrichment and the HTTP Header Extractor Enrichment in your real-time pipeline.
There are certain error conditions under which our Kinesis apps would previously hang rather than crash outright:
In this release, we have modified the apps so that they exit with status code 1 whenever these errors are encountered, rather than hanging. This means that you can run the apps with a background script which restarts them whenever they die. This prevents transient error conditions from requiring human intervention.
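As a rough illustration of such a background script, here is a minimal Python sketch of a restart loop. The function name `run_forever`, the restart delay, and the optional restart limit are assumptions made for this example and are not part of the release. It treats any non-zero exit status (including the status code 1 described above) as a crash worth restarting, and stops on a clean exit.

```python
import subprocess
import time


def run_forever(command, restart_delay=5.0, max_restarts=None):
    """Run `command`, restarting it each time it exits with a non-zero status.

    Returns the number of restarts performed once the command exits cleanly
    (status 0) or the optional `max_restarts` limit is reached.
    """
    restarts = 0
    while True:
        # Blocks until the child process exits, then yields its status code.
        status = subprocess.call(command)
        if status == 0:
            # Clean shutdown: assume the operator stopped the app on purpose.
            return restarts
        if max_restarts is not None and restarts >= max_restarts:
            # Give up so a persistent failure does not loop forever.
            return restarts
        restarts += 1
        print(f"process exited with status {status}; "
              f"restart #{restarts} in {restart_delay}s")
        time.sleep(restart_delay)


# Hypothetical usage; the launch command below is a placeholder:
# run_forever(["./my-kinesis-app", "--config", "my-app.conf"])
```

A short delay between restarts keeps a persistently failing app from spinning in a tight loop. In production, a process supervisor provided by your operating system would typically fill the same role.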
You can now configure the number of records that the Kinesis Client Library should retrieve with each call to GetRecords. The default is 10,000, which is also the maximum. If you frequently see "Unable to execute HTTP request: Connection reset" in your error logs, then you should try reducing maxRecords to make each request smaller and more likely to succeed.
You can set maxRecords to any positive integer up to 10,000 in the configuration file for Stream Enrich (by setting enrich.streams.in.raw.maxRecords = n) and Kinesis Elasticsearch Sink (by setting sink.kinesis.in.maxRecords = n).
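If you generate your configuration files from a script, it can be worth checking the value before it is written out. The following Python sketch validates a candidate maxRecords setting against the range stated above. The helper `validate_max_records` and the constant name are hypothetical, invented for this example. They are not part of Stream Enrich or Kinesis Elasticsearch Sink.

```python
KINESIS_MAX_RECORDS = 10_000  # default and upper limit stated above


def validate_max_records(value):
    """Return `value` if it is a usable maxRecords setting, else raise ValueError."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"maxRecords must be an integer, got {value!r}")
    if not 1 <= value <= KINESIS_MAX_RECORDS:
        raise ValueError(
            f"maxRecords must be between 1 and {KINESIS_MAX_RECORDS}, got {value}"
        )
    return value
```

For example, `validate_max_records(5000)` returns the value unchanged, while `validate_max_records(20000)` raises a `ValueError`.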
Stream Enrich’s logging for failed records is now less verbose: instead of logging an error message for every failed record in a batch, it buckets the failed requests based on their error code, then prints the size of each bucket together with a representative error message for each bucket.
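To make the bucketing idea concrete, here is a small Python sketch of the technique. It is not Stream Enrich's actual code, which is written in Scala. The function names, the choice of the first message seen as the representative one, and the log line format are all assumptions made for illustration.

```python
from collections import OrderedDict


def summarize_failures(failures):
    """Group (error_code, error_message) pairs by error code.

    Returns a list of (error_code, count, representative_message) tuples,
    where the representative message is the first one seen for that code.
    """
    buckets = OrderedDict()
    for code, message in failures:
        if code in buckets:
            count, representative = buckets[code]
            buckets[code] = (count + 1, representative)
        else:
            buckets[code] = (1, message)
    return [(code, count, msg) for code, (count, msg) in buckets.items()]


def log_failures(failures, log=print):
    """Emit one log line per error code instead of one per failed record."""
    for code, count, msg in summarize_failures(failures):
        log(f"{count} record(s) failed with error code [{code}]; "
            f"example message: {msg}")


# Illustrative input: three failed records, two distinct error codes.
example_failures = [
    ("ProvisionedThroughputExceededException", "Rate exceeded for shard A"),
    ("InternalFailure", "Internal service failure."),
    ("ProvisionedThroughputExceededException", "Rate exceeded for shard B"),
]
```

Calling `log_failures(example_failures)` emits two lines rather than three, so the log volume grows with the number of distinct error codes rather than with the batch size.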
Additionally, both the Scala Stream Collector and Stream Enrich now log a missing stream - which is a showstopping issue - at the error level rather than at a lower-severity level.
The Kinesis apps are now continuously integrated and deployed using Travis CI. This speeds up our development cycle, making it easier to automatically publish new versions.
Other improvements across the Kinesis applications include:
src directory into a new example directory. This prevents them from being needlessly added to the jarfile
The Kinesis apps for R80 Southern Cassowary are all available in a single zip file here:
There are no breaking changes in this release - you can upgrade the individual Kinesis apps without worrying about having to update the configuration files or indeed the Kinesis streams.
However, if you want to configure how many records Stream Enrich should read from Kinesis at a time, update its configuration file to add a maxRecords property, that is, set enrich.streams.in.raw.maxRecords to the desired value.
If you want to configure how many records Kinesis Elasticsearch Sink should read from Kinesis at a time, again update its configuration file to add a maxRecords property, that is, set sink.kinesis.in.maxRecords to the desired value.
For more details on this release, please check out the release notes on GitHub.
If you have any questions or run into any problems, please raise an issue or get in touch with us through the usual channels.