We have changed the way we’re releasing
The snowplow/snowplow repo is no longer where you’ll find the code for key pipeline components. The repo README and this article link to their new locations. We will no longer publish “R” releases; you can find the most recently recommended bundles on our new release matrix. If you’re only consuming the published assets, nothing has changed for you.
As we announced with the Failed Events production release, that release was the last in a long line of umbrella releases: 119, in fact! We have long used umbrella releases to indicate that the assets generated by the snowplow/snowplow repo were compatible with each other and that we recommended upgrading to them. This assurance is important, of course, but the way we were putting it together has been slowing us down. To bring improvements to you faster, we have made some changes.
As a reminder, the snowplow/snowplow monorepo has recently been home to several key components: EMR ETL Runner, Scala Common Enrich, Beam Enrich, Stream Enrich and Scala Stream Collector. It also used to contain the recently deprecated batch pipeline components. Many pipeline components outside the monorepo, such as the loaders and recovery tooling, also need compatibility assurance, so our umbrella releases have only ever covered part of the estate. We want to give you a clearer and broader level of assurance.
To date, every change to snowplow/snowplow has been wrapped up in an umbrella release. Our umbrella release process has a lot of steps. Because each release carries that overhead, we try to get as much value out of each one as possible. You can probably guess where this ends! The temptation is to put more into each release, which makes the overhead larger, which tempts everyone to add even more. The snowball gains pace. We want to ship improvements faster, so we need a leaner way to push changes.
Another key problem is that patching has had no natural home. We test our builds thoroughly, and we will continue to do so! However, there are so many different ways to collect data that we can’t guarantee our tests cover every use case, so we sometimes need to patch. If every patch needs an umbrella release, our fanfare moments and our tiny oh-shoot patches get the same treatment.
Moving out of the monorepo
We should note that the issues above were not the fault of the monorepo as such; each was solvable within that structure. However, when we sat down a while ago to think about what our ideal design would look like, it didn’t include a monorepo, so we’re moving to disband it.
Each of the components listed above now lives in its own repo and will be iterated on there.
We’re also taking the opportunity to drop “Scala” from the repo names, as it is an implementation detail. It’s important to note, however, that all published assets will keep their existing names. If you consume Snowplow only through the published assets, nothing has changed. We will, of course, maintain their version numbers and continue to increment them. To help communicate changes, we’ll be even stricter in applying semantic versioning to signal changes in interfaces.
We have made small but significant changes to our release strategy to go with this. When we have a new version of a component, we will publish it straight away. All releases (including patches) will be announced on Discourse and Twitter. Releases that introduce functionality and/or configuration changes (minor versions and above) will also have an accompanying blog post.
Each release will be fully tested to make sure it fulfils core use cases and is compatible with the rest of the stack, but it may not yet be battle-hardened enough for us to recommend it. Once it is, the version will appear on the recommended releases matrix, signalling to the community that we recommend upgrading to it. We will announce updates to the matrix on all channels too. In the meantime, the new version will be available for you to battle-test with us, and we’d love your feedback.
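The announcement rule described above hinges on telling a patch apart from a minor or major release. As a rough illustration only, here is a minimal Python sketch of that classification, assuming plain MAJOR.MINOR.PATCH version strings without pre-release or build tags. The function names and example version numbers are hypothetical and are not part of any Snowplow tooling.

```python
from typing import List, Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a plain MAJOR.MINOR.PATCH string (assumes no pre-release or build tags)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump_kind(old: str, new: str) -> str:
    """Classify the change between two versions as 'major', 'minor' or 'patch'."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v <= old_v:
        raise ValueError(f"{new} is not newer than {old}")
    if new_v[0] != old_v[0]:
        return "major"
    if new_v[1] != old_v[1]:
        return "minor"
    return "patch"


def announcement_channels(old: str, new: str) -> List[str]:
    """Hypothetical helper: every release goes to Discourse and Twitter;
    minor and major releases also get a blog post."""
    channels = ["Discourse", "Twitter"]
    if bump_kind(old, new) in ("major", "minor"):
        channels.append("blog post")
    return channels


# Illustrative version numbers only.
print(announcement_channels("1.3.5", "1.3.6"))  # ['Discourse', 'Twitter']
print(announcement_channels("1.3.5", "1.4.0"))  # ['Discourse', 'Twitter', 'blog post']
```

Because the rule keys off the version number alone, it only works if version bumps are applied strictly, which is why stricter semantic versioning matters here.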
We have kept snowplow/snowplow as a signpost to the new repos. It will be a jumping-off point for any new community member learning about Snowplow for the first time. We have moved issues related to individual components to their respective new repos. Any issues left behind can be joined by new ones about the overarching Snowplow strategy.
Beyond recommended releases
We’re confident this will help us get changes to our stack into the hands of our community faster, but it’s only the first step. Snowplow users may already be familiar with our tracker protocol. It defines what a tracker must send to the Collector for a call to be processed correctly, and it is immensely useful for iterating on the trackers with confidence.
We’d like to extend our use of protocols to the interfaces between other components too. Protocols are not easy to write, so this will take some time, but we believe that strict semantic versioning, combined with extensive adoption of protocols and automated tests to enforce them, should let us recommend our releases much more quickly, if not automatically.
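To make the idea of automatic recommendation more concrete, here is a hypothetical Python sketch of how declared protocol versions could be checked mechanically across a candidate set of components. Everything in it is an assumption for illustration: the `Component` structure, the protocol names `collector-payload` and `enriched-event`, and the caret-style rule (same major version, and at least the required version) are not Snowplow’s actual protocols or tooling.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Parse a plain MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(p) for p in version.split("."))
    return major, minor, patch


def satisfies(provided: str, required: str) -> bool:
    """Caret-style rule: same major version and provided >= required.
    Assumes major >= 1, since 0.x versions carry no compatibility promise under semver."""
    p, r = parse(provided), parse(required)
    return p[0] == r[0] and p >= r


@dataclass
class Component:
    name: str
    # protocol name -> version this component implements (hypothetical metadata)
    provides: Dict[str, str] = field(default_factory=dict)
    # protocol name -> minimum version it needs from upstream (hypothetical metadata)
    requires: Dict[str, str] = field(default_factory=dict)


def incompatibilities(components: List[Component]) -> List[str]:
    """List every unmet protocol requirement in a candidate set of component versions.
    If two components provide the same protocol, the later one in the list wins."""
    provided = {proto: (c.name, v) for c in components for proto, v in c.provides.items()}
    problems = []
    for c in components:
        for proto, needed in c.requires.items():
            if proto not in provided:
                problems.append(f"{c.name}: nobody provides {proto}")
                continue
            source, have = provided[proto]
            if not satisfies(have, needed):
                problems.append(f"{c.name}: needs {proto} ^{needed}, {source} provides {have}")
    return problems


collector = Component("collector", provides={"collector-payload": "1.2.0"})
enrich = Component("enrich", provides={"enriched-event": "1.0.0"},
                   requires={"collector-payload": "1.1.0"})
loader = Component("loader", requires={"enriched-event": "1.1.0"})

print(incompatibilities([collector, enrich]))          # []
print(incompatibilities([collector, enrich, loader]))  # ['loader: needs enriched-event ^1.1.0, enrich provides 1.0.0']
```

A check like this can only flag declared incompatibilities; automated protocol tests would still be needed to confirm that each component actually honours the versions it claims.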
What do you need to do?
If you consume Snowplow via the published assets, nothing. If you’re a keen contributor wondering where all the code has gone, see the new repos. If you have strong opinions on where this goes next, please reach out on Discourse.