Following on from Christoph Buente’s RFC, the Scala Stream Collector now provides a mechanism to test if third-party cookies are blocked and reacts appropriately. Huge thanks to Christoph and the team at LiveIntent for contributing this sophisticated new feature.
Simply put, the new “cookie bounce” mechanism:
cookie.name configuration is present
To enable this feature, change the
cookieBounce.enabled configuration setting to true.
Be careful though: the redirects mentioned above can significantly increase the number of requests that your collectors have to handle.
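As a sketch, the relevant collector settings look something like the following (the field names follow the collector's sample HOCON, but the values shown are illustrative and should be checked against the sample configuration):

```hocon
cookieBounce {
  # Perform the third-party-cookie "bounce" check
  enabled = true
  # Name used to mark the bounce redirect (illustrative value)
  name = "n3pc"
  # Placeholder network user ID set when third-party cookies are blocked
  fallbackNetworkUserId = "00000000-0000-4000-A000-000000000000"
}
```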
The Scala Stream Collector was previously too restrictive when it came to parsing elements of an HTTP request, and would reject certain events despite their intrinsic correctness, most notably due to:
Those shortcomings have been fixed in the new version of the collector as part of our ongoing focus on removing any possible data loss scenarios across the pipeline.
Additionally, the enrich stream processing application won’t reject events whose
page_url contains more than one # character (#2893).
If you are using Kinesis with Stream Enrich, you previously had two choices when it came to enriching your raw event stream:
With R93, you can now consume your raw event stream from an arbitrary point in time by setting the
streams.kinesis.initialPosition configuration setting to
AT_TIMESTAMP. Additionally, you’ll need to specify the actual timestamp to start from in the accompanying configuration setting.
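As a sketch, assuming the timestamp is supplied through an initialTimestamp field next to initialPosition (the field name and the ISO 8601 format are taken from sample configurations and should be treated as assumptions):

```hocon
streams {
  kinesis {
    # Start consuming the raw stream from a specific point in time
    initialPosition = "AT_TIMESTAMP"
    # Timestamp to start consuming from (illustrative value)
    initialTimestamp = "2017-05-17T10:00:00Z"
  }
}
```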
Before R93, when you launched Stream Enrich with the IP lookups enrichment enabled, the MaxMind IP lookups database was downloaded locally, and any subsequent launch would reuse this local copy of the database.
R93 introduces a command-line argument,
--force-ip-lookups-download, which downloads a fresh version of the IP lookups database every time Stream Enrich is launched.
There are plans to introduce a time-to-live for this database and re-download it while Stream Enrich is running; this is tracked in issue #3407.
Before R93, it was only possible to use
user_ipaddress as the partition key for the enriched event stream emitted by Stream Enrich. This release extends the realm of possibilities by introducing the
streams.out.partitionKey configuration setting, which lets you specify which event property to use to partition the output stream of Stream Enrich.
The available properties have been selected based on their fitness as a partition key (i.e. good distribution and usefulness):
If none of these is used, a random UUID will be generated for each event as its partition key.
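For example, to partition the enriched stream by domain_userid (a minimal HOCON sketch, assuming domain_userid is among the supported properties; surrounding settings are omitted):

```hocon
streams {
  out {
    # Event property used to partition the enriched output stream;
    # if unset or invalid, a random UUID is generated per event
    partitionKey = domain_userid
  }
}
```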
As a reminder, in Kinesis and Kafka, two events having the same partition key are guaranteed to end up in the same shard or partition respectively.
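To illustrate that guarantee, here is a simplified Python sketch: Kafka’s default partitioner actually uses murmur2 and Kinesis uses MD5, so CRC32 below is only a stand-in for a stable hash. The property shown is the same in all cases: equal keys always map to the same partition.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a partition key to a partition using a stable hash.

    CRC32 keeps this sketch dependency-free; real systems use other
    hash functions, but the determinism property is identical.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Two events sharing a partition key (e.g. the same domain_userid)
# always land in the same partition:
p1 = partition_for("9fa8e7c2-user", 8)
p2 = partition_for("9fa8e7c2-user", 8)
assert p1 == p2
```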
Improvements have also been made regarding how both the Scala Stream Collector and Stream Enrich interact with Kafka. In particular:
streams.kafka.retries configuration setting for both the Scala Stream Collector and Stream Enrich, allowing the Kafka producer to resend any record that failed to be sent, up to the specified number of times
streams.buffer.byteLimit setting was previously used as the size of the batch being sent to Kafka, which didn’t make a lot of sense. It now corresponds to the amount of memory the Kafka producer can use to buffer records before sending them
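Put together, a minimal sketch of the relevant settings (the values shown are illustrative, not recommendations):

```hocon
streams {
  buffer {
    # Memory, in bytes, the Kafka producer may use to buffer records
    # before sending them; no longer interpreted as a batch size
    byteLimit = 4500000
  }
  kafka {
    # Number of times the producer resends a record that failed to send
    retries = 3
  }
}
```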
This release also includes a big set of other updates which are part of the modernization effort around the realtime pipeline, most notably:
The latest version of the Scala Stream Collector is available from our Bintray here.
For a complete example, see our sample config.hocon template.
The Scala Stream Collector is no longer an executable JAR file. As a result, it has to be launched as:
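A minimal sketch of the launch command (the artifact name is illustrative; substitute the JAR and version you downloaded):

```bash
java -jar snowplow-stream-collector-<version>.jar --config config.hocon
```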
The latest version of Stream Enrich is available from our Bintray here.
For a complete example, see our sample config.hocon template.
Stream Enrich is no longer an executable JAR file. As a result, it will have to be launched as:
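A minimal sketch of the launch command (the artifact name and the flags other than --force-ip-lookups-download follow the usual Stream Enrich invocation and should be checked against the sample documentation):

```bash
java -jar snowplow-stream-enrich-<version>.jar \
  --config config.hocon \
  --resolver file:resolver.json \
  --force-ip-lookups-download
```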
Additionally, a new
--force-ip-lookups-download flag has been introduced as mentioned above.
Upcoming Snowplow releases will include:
For more details on this release, please check out the release notes on GitHub.
If you have any questions or run into any problem, please visit our Discourse forum.