This release brings full support for Elasticsearch 2.x for both the HTTP and Transport clients. This lets you use the AWS Elasticsearch Service running ES 2.3, or indeed upgrade your self-hosted Elasticsearch cluster to version 2.x.
We delivered this by dynamically building a Kinesis Elasticsearch Sink binary for each Elasticsearch version; choose the binary whose name ends in _1x or _2x as appropriate.
One important thing to flag is that Elasticsearch 2.x no longer allows field names to contain periods (.). While we never used periods within Snowplow or Iglu Central property names, your team may have created some, like so:
Previously, this would have loaded into Elasticsearch like so:
From this release on, we are automatically converting the field name’s periods to underscores, whether you are loading Elasticsearch 1.x or 2.x:
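As a rough illustration of the renaming rule described above, here is a minimal Python sketch. It is not the sink's actual code: the function name and the sample field names are hypothetical, and applying the rule recursively to nested objects and arrays is an assumption made for the example.

```python
from typing import Any


def sanitize_field_names(value: Any) -> Any:
    """Return a copy of a JSON-like value with periods in keys replaced by underscores."""
    if isinstance(value, dict):
        return {
            key.replace(".", "_"): sanitize_field_names(child)
            for key, child in value.items()
        }
    if isinstance(value, list):
        return [sanitize_field_names(item) for item in value]
    # Scalars are returned unchanged, so periods inside *values* are preserved.
    return value


# Hypothetical event with periods in custom property names.
event = {"page.title": "Home", "tags": [{"user.id": "abc"}], "url": "example.com"}
print(sanitize_field_names(event))
```

Only keys are rewritten; a value such as example.com keeps its period.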
For more information please see issue #2894.
Community member Stephane Maarek flagged to us that our Kinesis Elasticsearch Sink’s buffer settings did not seem to be working correctly.
We investigated and found an underlying issue in the Kinesis Connectors library, where every record in a GetRecords call is added to the buffer for sinking, without checking between additions whether or not the buffer should be flushed.
If your Elasticsearch Sink has to catch up with a very large number of records, and your maxRecords setting is set to 10,000, this can leave the sink struggling to emit to Elasticsearch, because the buffer will be too large to send reliably.
To work around this issue, we updated our Elasticsearch Emitter code to also respect the buffer settings. The new approach works as follows:
It is important that you tune your record and byte limits to match the cluster you are pushing events to. If the limits are set too high you might not be able to emit successfully very often; if your limits are too low then your event sinking to Elasticsearch will be inefficient.
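To make the effect of the record and byte limits concrete, here is a minimal Python sketch of splitting one large buffer into slices that respect both limits. It is an illustration under assumptions, not the emitter's actual Scala code: the function and parameter names are hypothetical, and emitting an oversized single record on its own is a design choice made for the example.

```python
from typing import Iterable, Iterator, List


def split_buffer(
    records: Iterable[bytes], record_limit: int, byte_limit: int
) -> Iterator[List[bytes]]:
    """Yield slices holding at most record_limit records and, where possible, at most byte_limit bytes."""
    if record_limit < 1 or byte_limit < 1:
        raise ValueError("limits must be positive")
    batch: List[bytes] = []
    batch_bytes = 0
    for record in records:
        # Flush before adding a record that would push the batch over either limit.
        if batch and (len(batch) >= record_limit or batch_bytes + len(record) > byte_limit):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(record)
        batch_bytes += len(record)
    if batch:
        # Emit whatever is left once the input is exhausted.
        yield batch


# Ten 40-byte records: the byte limit of 100 caps each slice at two records.
slices = list(split_buffer([b"x" * 40] * 10, record_limit=4, byte_limit=100))
print([len(s) for s in slices])  # prints [2, 2, 2, 2, 2]
```

With limits that are too high, each slice approaches the size of the original buffer and the sending problem returns; with limits that are too low, the same records are spread over many small requests.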
For more information on this issue and the corresponding commit please see issue #2895.
This release adds the ability to update the value of your Scala Stream Collector’s cookie using the network_userid parameter. If a nuid value is present in the querystring of your request, it will be used as the cookie’s new value.
This feature is only available through a querystring parameter lookup, so it currently only works for GET requests.
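The lookup order can be sketched in Python as follows. This is a hedged illustration rather than the collector's Scala implementation: the function name, the example URL and the fallback of minting a fresh UUID when no ID is available are all assumptions; only the nuid parameter name comes from the release itself.

```python
import uuid
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def resolve_network_userid(request_url: str, cookie_value: Optional[str]) -> str:
    """Pick the network user ID to write back into the collector cookie."""
    query = parse_qs(urlsplit(request_url).query)
    nuid_values = query.get("nuid", [])
    if nuid_values:
        # A nuid in the querystring overrides whatever the cookie currently holds.
        return nuid_values[0]
    if cookie_value:
        # No override supplied: keep the existing cookie value.
        return cookie_value
    # Assumed fallback: mint a fresh ID when neither source provides one.
    return str(uuid.uuid4())


print(resolve_network_userid("http://collector.example.com/i?e=pv&nuid=abc-123", "old-id"))
# prints abc-123
```

Because the value is read from the querystring, a POST request that carries its data only in the request body would not trigger the override.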
Many thanks to Christoph Buente from Snowplow user LiveIntent for this contribution!
For more information and the reasoning behind this update please see issue #2512.
To ensure that the cookie path is always valid we have updated the Scala Stream Collector to statically set the cookie path to “/”. This is to avoid situations where a path resource such as “/r/tp2” results in the cookie path ending up at “/r”. Endpoints such as “/i” do not suffer from this issue.
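The behaviour described above matches the default-path rule that RFC 6265 (section 5.1.4) defines for cookies set without an explicit Path attribute. Treating that rule as the cause is an assumption on our part; the Python sketch below, with a hypothetical function name, simply shows why “/r/tp2” and “/i” end up with different paths under it.

```python
def default_cookie_path(request_path: str) -> str:
    """Compute the default cookie path for a request path, per RFC 6265 section 5.1.4."""
    # An empty path, or one that does not start with "/", defaults to "/".
    if not request_path.startswith("/"):
        return "/"
    # A path with a single "/" (such as "/i") also defaults to "/".
    if request_path.count("/") == 1:
        return "/"
    # Otherwise keep everything up to, but not including, the right-most "/".
    return request_path[: request_path.rfind("/")]


print(default_cookie_path("/r/tp2"))  # prints /r
print(default_cookie_path("/i"))      # prints /
```

Setting the path to “/” explicitly sidesteps this derivation, so the cookie is sent for every endpoint on the collector's domain.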
Thanks again to Christoph Buente for this contribution!
For more information on this please see issue #2524.
An Iglu schema registry typically consists of schemas, Redshift table definitions and JSON Paths files - see our Iglu example schema registry for an example.
When we originally started building out Iglu Central, we diverged from this approach and stored the Redshift table definitions and JSON Paths files for Iglu Central schemas in the main Snowplow repository, snowplow/snowplow. You’ll see from the Snowplow CHANGELOG that many Snowplow releases have included Redshift and JSON Paths files, to complement specific schemas in new Iglu Central releases.
In hindsight, splitting the Iglu Central resources across two separate code repositories was a mistake:
Therefore, in this release we have moved all Redshift table definitions and JSON Paths files from the main Snowplow repository into Iglu Central, specifically in these paths:
Our hosting of these files for the correct operation of Snowplow is unchanged, and the Snowplow repository continues to hold current and previous definitions of the atomic.events table, plus corresponding migration scripts.
This release also contains further changes, notably:
Config.resolver(). Many thanks to community member Shin for contributing this
The Kinesis apps for R84 Steller’s Sea Eagle are available in the following zipfiles:
http://dl.bintray.com/snowplow/snowplow-generic/snowplow_scala_stream_collector_0.8.0.zip
http://dl.bintray.com/snowplow/snowplow-generic/snowplow_stream_enrich_0.9.0.zip
http://dl.bintray.com/snowplow/snowplow-generic/snowplow_kinesis_elasticsearch_sink_0.8.0_1x.zip
http://dl.bintray.com/snowplow/snowplow-generic/snowplow_kinesis_elasticsearch_sink_0.8.0_2x.zip
Or you can download all of the apps together in this zipfile:
Only the Elasticsearch Sink app config has changed, and the change is not breaking. To upgrade the Elasticsearch Sink:
NOTE: These timeouts are optional and default to 300000 if they are not present in your config.
Here is the updated config file template:
We have renamed the upcoming milestones for Snowplow to be more flexible around the ultimate sequencing of releases. Upcoming Snowplow releases, in no particular order, include:
event_fingerprints (synthetic duplicates) in Hadoop Shred
Note that these releases are always subject to change between now and the actual release date.
For more details on this release, please check out the release notes on GitHub.
If you have any questions or run into any problems, please raise an issue or get in touch with us through the usual channels.