As part of our refactor to support GCP, Snowplow R101 Neapolis accidentally introduced the sharing of a single Kinesis sink across multiple Amazon Kinesis Client Library RecordProcessors.
As a result, the same Kinesis sink was flushed as many times as there were
RecordProcessors, leading to duplicated events whenever more than one
RecordProcessor was running on the same Stream Enrich instance.
This behavior has been corrected in this release by re-implementing one Kinesis sink per RecordProcessor.
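
To make the failure mode concrete, here is a minimal toy model in Python. It is not the Stream Enrich code: `ToySink`, `flush_all` and `run` are hypothetical names, and the sketch assumes a worst-case interleaving in which every processor reads its sink's buffer before any of them clears it. Under that assumption, a sink shared by N processors emits each event N times, while one sink per processor emits each event once.

```python
class ToySink:
    """Hypothetical stand-in for a Kinesis sink (not the Stream Enrich class)."""

    def __init__(self):
        self.buffer = []  # events waiting to be flushed
        self.sent = []    # events that would have reached the stream

    def store(self, event):
        self.buffer.append(event)


def flush_all(sinks):
    """Flush once per processor, under an ASSUMED worst-case interleaving:
    every processor reads its sink's buffer before any of them clears it."""
    snapshots = [list(sink.buffer) for sink in sinks]
    for sink, snapshot in zip(sinks, snapshots):
        sink.sent.extend(snapshot)
        sink.buffer.clear()


def run(sinks, events):
    """sinks[i] is the sink used by processor i; events are spread round-robin."""
    for i, event in enumerate(events):
        sinks[i % len(sinks)].store(event)
    flush_all(sinks)
    # Collect the output of each distinct sink exactly once.
    distinct = {id(s): s for s in sinks}.values()
    return [e for s in distinct for e in s.sent]


events = ["e1", "e2", "e3", "e4"]

shared = ToySink()
shared_output = run([shared, shared], events)           # two processors, one sink
separate_output = run([ToySink(), ToySink()], events)   # one sink per processor

print(len(shared_output), len(separate_output))
```

With two processors and four events, the shared sink writes eight events and the per-processor sinks write four. The real race in a running application would be timing-dependent; the fixed interleaving here is only there to keep the illustration deterministic.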
There is a comprehensive guide to this issue on Discourse, detailing who can be affected and the steps to mitigate it. The same thread is the place to go if you would like to discuss the issue further.
The event duplication issue introduced in R101 was a major bug, and does not reflect the code quality and operational standards that we aim for at Snowplow.
As our team grows and we strive for an ever-faster release cadence across our major projects, it is crucial that our software quality actually improves: we cannot achieve flow and deliver high throughput without high-grade quality-supporting processes.
On our side, we are prioritising two areas of improvement:
Another idea we are starting to consider is less frequent “LTS” (Long-Term Support) releases of Snowplow, similar for example to the Ubuntu release process.
Above all we want the community’s ideas on how we can improve software quality at Snowplow. Do please share your thoughts in our Discourse forum.
The latest version of Stream Enrich is available from our Bintray here.
If you are currently on R101, please note that you will need to follow the R103 Stream Enrich upgrade steps, relating to the IP Lookups Enrichment. Check out the R103 Upgrading guide.
Upcoming Snowplow releases are unchanged:
For more details on this release, please check out the release notes on GitHub.
If you have any questions or run into any problems, please visit our Discourse forum.