Introducing Snowplow 21.04 Pennine Alps

Today we bring together the latest updates to the Snowplow platform and reveal our new way of announcing Snowplow releases. Our new release announcements will clarify our recommended component versions and discuss our latest features.

In April 2020 we made our last umbrella release, R119 Tycho Magnetic Anomaly Two. Following this release, we moved the Snowplow components into separate repositories and explained in a blog post why we were changing the way we release. This has given us far greater agility and the flexibility to update each component separately, but we’ve come to realise that it’s hard to keep up with all the updates across the Snowplow platform and understand how they fit together.

With this in mind, over the last couple of months we’ve been thinking hard about how we can continue to help you understand the latest features available in your Snowplow pipeline and which versions of the Snowplow components you need to be running to use them.

  1. Snowplow Releases
  2. Snowplow 21.04 Pennine Alps
  3. Recommended Component Versions
  4. Public Roadmap
  5. Snowplow Insights

Snowplow Releases

This brings us to today, when we reveal our new way of announcing Snowplow platform releases. We will be moving to periodic releases, each named by the year and month of the release along with a memorable, mountainous name 🏔. We wanted to land on something memorable, aspirational and ‘Snowy’ enough to tie these releases into the wider Snowplow ecosystem.

We will continue to publish new versions of our components within their associated repositories, but these platform releases will provide clarity on the current recommended component versions that are fully compatible with each other, battle tested and ready for production. 

There is no precise cadence for these releases; we will define a release when we feel there is a notable set of new features available for your Snowplow pipeline. We aim to make at least two releases per year, although you may well see more, as we will assess each quarter whether we’re ready to announce a platform release.

If you’d like to find the very latest updates and features, you can look at the latest commits to snowplow/snowplow where we now push component updates, check our product roadmap, or browse the releases and product features sections of the Snowplow Analytics blog.

Snowplow 21.04 Pennine Alps

So what’s new in 21.04 Pennine Alps? This release had a focus on a number of reliability and general hardening improvements to the platform, alongside exciting updates such as Surge Protection for AWS, Anonymous Tracking capabilities, brand new Data Models for the web, a major release of the JavaScript Trackers and automated updates to our GitHub home, `snowplow/snowplow`.

Automated snowplow/snowplow updates

Our Open Source homepage has been updated with new graphics, an architectural overview and automatic updates when any component of the Snowplow platform receives an update. This means it’s never been easier to spot the latest releases across the entire Snowplow platform. 

Head to https://github.com/snowplow/snowplow to check out the latest updates and whilst you’re there, watch and star the repository to keep up to date with all the latest releases going forward.

Surge Protection for AWS

We have released a new feature on AWS so our customers can have confidence that their pipeline will successfully scale to handle even the most extreme traffic spikes. 

We have achieved this by adding Amazon Simple Queue Service (SQS) as a buffer mechanism, acting as a pressure valve between the collector and Kinesis and preventing messages from having to wait in the collector’s memory while Kinesis is scaling. Because Kinesis is slow to scale, the scaling algorithm would otherwise need to be overly sensitive and over-provision capacity, which is costly (unlike on GCP, where there is no need to wait for Pub/Sub to scale). Instead, messages are written to SQS, where they are queued whilst Kinesis is resizing, and the sqs2kinesis application is then responsible for reading the messages and writing them to Kinesis once it is ready. With Surge Protection, customers now have even greater assurance that their pipeline will scale faster to handle even the most extreme data surges, without having to pre-provision capacity.
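The buffering idea can be illustrated with a small, self-contained simulation. This is only a sketch of the pattern, not the collector or sqs2kinesis code: `ThrottledSink`, `collect` and `drain` are hypothetical names, the in-memory `deque` stands in for SQS, and the assumption that events spill to the buffer only when the stream throttles is ours.

```python
from collections import deque


class ThrottledSink:
    """Hypothetical stand-in for a stream that accepts a limited number of records per tick."""

    def __init__(self, capacity_per_tick):
        self.capacity_per_tick = capacity_per_tick
        self.records = []
        self._used = 0

    def start_tick(self):
        # A new time window begins, so the write budget resets.
        self._used = 0

    def try_put(self, record):
        if self._used >= self.capacity_per_tick:
            return False  # throttled: the stream has not scaled up yet
        self._used += 1
        self.records.append(record)
        return True


def collect(event, sink, buffer):
    """Collector side: write to the stream, or spill to the buffer when throttled."""
    if not sink.try_put(event):
        buffer.append(event)


def drain(sink, buffer):
    """Forwarder side (the role the post assigns to sqs2kinesis): move buffered events on."""
    moved = 0
    while buffer and sink.try_put(buffer[0]):
        buffer.popleft()
        moved += 1
    return moved
```

In this model a spike of five events against a stream that accepts two per tick leaves three in the buffer; once the stream's capacity has been raised, a later call to `drain` forwards them in their original order, so nothing has to sit in the collector's memory in the meantime.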

Check out the announcement blog post and the docs for Open Source users to set this up. This was rolled out automatically to Snowplow Insights customers.

Anonymous Tracking

Introduced with the Snowplow JavaScript Tracker 2.17.0 and the Snowplow Stream Collector 2.1.0, it is now possible to toggle both the client-side and server-side cookies as you wish. This mode allows completely cookieless data collection, with the option to switch on cookies and other browser storage when the user consents.
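As a rough model of the two toggles, the sketch below derives a tracking mode from a consent flag. It is not the tracker's API: `TrackingMode`, `mode_for` and `request_headers` are hypothetical names, and the use of an `SP-Anonymous` request header to ask the collector not to set its cookie is an assumption that should be checked against the collector documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackingMode:
    client_cookies: bool  # tracker may write first-party cookies and browser storage
    server_cookies: bool  # collector may set its own cookie on the response


ANONYMOUS = TrackingMode(client_cookies=False, server_cookies=False)
IDENTIFIED = TrackingMode(client_cookies=True, server_cookies=True)


def mode_for(consented):
    """Start cookieless and switch cookies on only once the user has consented."""
    return IDENTIFIED if consented else ANONYMOUS


def request_headers(mode):
    """Build the extra headers a tracker would send with each event request."""
    headers = {}
    if not mode.server_cookies:
        # Assumption: this header signals an anonymous request to the collector.
        headers["SP-Anonymous"] = "*"
    return headers
```

Keeping the client-side and server-side flags separate mirrors the text above: either can be toggled on its own, even though this sketch flips both together on consent.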

For more information on how to leverage cookieless and anonymous tracking with Snowplow, read our detailed blog post.

JavaScript Trackers Version 3

Our most popular tracker, the Snowplow JavaScript Tracker, has received its biggest update since it launched. This tracker is now ready for modern web applications and offers a number of benefits through its new plugin architecture.

Now available on NPM as well as a traditional tag-based solution, the JavaScript tracker is ready for all types of web applications and deployment options. The updates also cover our Node.js tracker, bringing all of our JavaScript trackers into a single codebase.

Read more about the JavaScript Tracker updates in the release blog post, and find out how you can create your own plugins with our new plugin templates.

Data Models for Web

We have introduced the next generation of our data models, starting with our web models. These models address a number of challenges by moving to a new modular approach to data modelling. This allows us to separate out the ‘heavy lifting’ of an incremental Snowplow model by extracting the incremental logic into its own ‘base’ module. The base module produces a table which contains only the events relevant to this run of the incremental logic: both new events and those events that require recomputing (for example because they are part of an ongoing session).
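The selection performed by the base module can be sketched in a few lines. The real models are not written in Python, so this is only an illustration of the logic; the `Event` fields and the `events_this_run` function are hypothetical, and we assume "requires recomputing" means "shares a session with a newly arrived event".

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    event_id: str
    session_id: str
    loaded_at: datetime


def events_this_run(events, last_run_at):
    """Return new events plus earlier events from the sessions those new events extend."""
    new_events = [e for e in events if e.loaded_at > last_run_at]
    touched_sessions = {e.session_id for e in new_events}
    # Every event in a touched session is returned, so the session can be recomputed whole.
    return [e for e in events if e.session_id in touched_sessions]
```

Sessions with no new events are left out entirely, which is what keeps each incremental run small.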

To find out more about our new data models, you can read our introduction to them as well as our follow up for BigQuery and Snowflake.

Reliability and General Hardening

On top of the above features, since the big R119 Failed Events update, we’ve been hard at work ensuring the core pipeline components (collector, enrich and loaders) are as reliable as possible. There have been a number of updates since R119 that we recommend adopting, offering a range of improvements and fixes to ensure your pipeline is performing optimally. Whilst some components may not have received new features since R119, many have seen updates as we continued to test and roll them out across new and existing pipelines.

Postgres Loader

Following much demand from the OSS community, we released an initial version of a Postgres Loader (v0.1.0), providing an alternative to Redshift or Snowflake when looking to try out Snowplow open source at lower volumes or for QA purposes.  In fact, the Postgres Loader is being used in our recently launched Try Snowplow experience.  

You can find further information on the Postgres Loader here, as well as documentation on how to set this up as an open source user.

Observability Updates

As part of our drive to improve the observability of each of the pipeline components, we introduced a ‘gauge’ metric to our BigQuery Loader which samples, once every second, the latency of the data from the collector to the point of loading into BigQuery, giving you greater visibility of the health of your pipeline. We have published a blog post on this topic on the Snowplow blog.
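A gauge of this kind holds only the most recent reading. The sketch below shows one way such a sampled latency gauge could behave; it is not the BigQuery Loader's implementation, and the `LatencyGauge` class, its `offer` method and the timestamp parameter names are all hypothetical.

```python
from datetime import datetime, timezone


class LatencyGauge:
    """Keeps the latest collector-to-load latency, sampled at most once per interval."""

    def __init__(self, interval_seconds=1.0):
        self.interval_seconds = interval_seconds
        self.value_seconds = None
        self._last_sampled_at = None

    def offer(self, collector_tstamp, loaded_at):
        """Record the latency of a loaded event if the sampling interval has elapsed."""
        if self._last_sampled_at is not None:
            elapsed = (loaded_at - self._last_sampled_at).total_seconds()
            if elapsed < self.interval_seconds:
                return False  # too soon since the last sample; keep the old reading
        self.value_seconds = (loaded_at - collector_tstamp).total_seconds()
        self._last_sampled_at = loaded_at
        return True
```

Because the gauge overwrites its value rather than accumulating, a monitoring system reading it sees how far behind the collector the loader currently is, not a historical average.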

In addition, we introduced basic observability to Snowplow Mini (v0.12.0): the logs of each of its internal services, as well as runtime metrics, are now exported to CloudWatch on AWS and Cloud Logging on GCP. Read the documentation for this feature to find out more.

Recommended Component Versions

Our Version Compatibility Matrix is now broken down into specific platform releases and the latest recommended components. We’ve listed the major features above, but many components have also seen smaller but significant updates. Running the components listed in the Snowplow 21.04 Pennine Alps Version Compatibility Matrix ensures you will be able to use all the features listed above and have the confidence they are battle tested and ready for production. Components which have been updated since the last release are highlighted in purple.

We’ve published the Snowplow 21.04 Pennine Alps Version Compatibility Matrix on our documentation site.

Public Roadmap

If you’re eager to play with the very latest Snowplow technology, you should head over to our Public Roadmap, which highlights the latest updates we’ve released and what will be coming soon. We’d also love to hear from you, so please add an emoji or a comment to the features you’re excited about or that you’d like to know more about. You can read all about the roadmap itself in our Public Roadmap blog post.

Snowplow Insights

For Snowplow Insights customers reading this, the majority of pipelines are already running 21.04 Pennine Alps components so you should be good to go ahead and explore the features above. If you’d like to find out exactly which versions you are running currently, please contact Snowplow Support.

As an Insights customer, you will also have benefitted from a number of features that make managing and configuring your pipeline easier, including the ability to manage domain and cookie configuration as well as your data models from the Snowplow Insights console, an improved UI and a supporting API for managing your Data Structures, real-time data quality alerting, and a redesign of many areas to improve the overall experience. Additionally, the Surge Protection on AWS mentioned above was automatically rolled out to all Insights pipelines.
