Snowplow Mini 0.3.0 released

30 August 2017  •  Enes Aldemir

We are pleased to announce the 0.3.0 release of Snowplow Mini, our accessible “Snowplow in a box” distribution.

Snowplow Mini is the complete Snowplow real-time pipeline running on a single instance, available for easy deployment as a pre-built AMI. Use it to:

  1. Set up an inexpensive and easily discardable Snowplow stack for testing your tracker and schema changes
  2. Learn about Snowplow without having to set up a horizontally-scalable, highly-available production-grade pipeline

This release focuses on making Snowplow Mini much more ergonomic, with the newly bundled Control Plane, and much more secure, with built-in SSL support, courtesy of Caddy, plus HTTP basic authentication.

Read on for:

  1. Introducing the Control Plane
  2. Built-in SSL via Caddy
  3. HTTP basic authentication
  4. A simpler local setup via Vagrant
  5. Basic enrichments as standard
  6. Other updates
  7. Roadmap
  8. Documentation and getting help

1. Introducing the Control Plane

In our last Snowplow Mini release, recovering from internal issues with Snowplow Mini, or picking up schema registry updates, required “bouncing” the EC2 instance, or SSHing in and restarting all the applications; not an easy process.

To make Snowplow Mini much easier to control remotely, this release introduces a new Control Plane for Snowplow Mini (see issue #56).

The Control Plane’s first feature lets you restart all of the Snowplow Mini’s internal services with a single command. This command (also added into the Snowplow Mini’s UI as a button) makes it much easier to clear the internal schema registry’s schema cache, among other uses.

We have lots of features planned for the Control Plane in future releases - see the Roadmap section below for details.

2. Built-in SSL via Caddy

Snowplow Mini users expect to be able to communicate with the instance over HTTPS for security. We have typically recommended putting an Amazon Elastic Load Balancer in front of the Snowplow Mini to achieve this, but this over-complicates our “single box” vision for Snowplow Mini.

In this release, we bundle Caddy, the HTTPS-first webserver, and use this to provide out-of-the-box TLS support for Snowplow Mini (see issue #48).

To use this new functionality, you will need to provide a domain name for the Snowplow Mini instance - Caddy will handle the rest. The Quickstart Guide has more details on how to submit a domain name to Snowplow Mini.

3. HTTP basic authentication

Previous releases of Snowplow Mini had no authentication of any sort, requiring you to resort to IP whitelisting of various ports to securely lock down the box.

Version 0.3.0 solves this problem - every service within the Snowplow Mini now has its own unique URL path, and you can lock down access to these services with HTTP basic authentication. Choose your own username and password at the start of the Snowplow Mini setup.
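
As a rough illustration of what a client has to send once basic authentication is switched on, the sketch below builds the standard `Authorization` header in Python. The hostname, the service path and the credentials are hypothetical placeholders, not values taken from Snowplow Mini; the request is only prepared, never sent.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header (RFC 7617)."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def build_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) a request that carries Basic credentials."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


# Hypothetical values: substitute your own domain, service path and the
# username and password you chose during the Snowplow Mini setup.
request = build_request("https://mini.example.com/some-service/", "admin", "s3cret")
print(request.get_header("Authorization"))
```

Basic authentication only base64-encodes the credentials, it does not encrypt them, so it is best combined with the HTTPS support described in the previous section.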

4. A simpler local setup via Vagrant

With this release, local setup of a Snowplow Mini for local development and testing is much more straightforward.

Simply git clone the Snowplow Mini repository and call vagrant up in the main folder of the repository - this will bring up a full development environment, with all services running.

5. Basic enrichments as standard

Before this release, Snowplow Mini ran without any of the Snowplow configurable enrichments - which is rarely how Snowplow is run in production.

With this release, six of the most popular enrichments are enabled by default on Snowplow Mini. These enrichments are:

  • IP lookups enrichment
  • Campaign attribution enrichment
  • referer-parser enrichment
  • ua-parser enrichment
  • user-agent-utils enrichment
  • Event fingerprint enrichment

For now, these enrichments have been configured with sensible defaults. In a future release, we plan on making the enrichments fully user-configurable via the new Control Plane - watch this space!

6. Other updates

Version 0.3.0 also includes some internal changes and minor enhancements under the hood, including:

  • Upgrading the various constituent Snowplow micro-services to Snowplow R85 Metamorphosis (#81)
  • Authenticating Iglu schema registry access from Stream Enrich (#92)
  • Converting Snowplow Mini’s shell scripts to Ansible playbooks for easier provisioning (#52)

7. Roadmap

We have plenty planned for Snowplow Mini, and hope to increase the pace of development on this critical Snowplow project over the coming months.

7.1 Robustness

Our first priority is robustness. Currently, under the hood, Snowplow Mini uses Unix named pipes to communicate between the various bundled micro-services. These pipes are relatively fragile - and so we are embarking on a project to add NSQ to all of the relevant micro-services. NSQ will provide a much more robust queueing system for Snowplow Mini.
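
For readers unfamiliar with named pipes, here is a minimal Python sketch of two parties exchanging lines over one. It is not how Snowplow Mini wires its services together; the function names, the file name and the event strings are made up for illustration, and the code assumes a POSIX system where `os.mkfifo` is available.

```python
import os
import tempfile
import threading
from typing import List


def produce(path: str, events: List[str]) -> None:
    # Opening a FIFO for writing blocks until a reader has opened it too.
    with open(path, "w") as pipe:
        for event in events:
            pipe.write(event + "\n")


def consume(path: str) -> List[str]:
    # Iteration ends once every writer has closed its end of the pipe.
    with open(path) as pipe:
        return [line.rstrip("\n") for line in pipe]


def run_pipeline(events: List[str]) -> List[str]:
    with tempfile.TemporaryDirectory() as workdir:
        fifo_path = os.path.join(workdir, "events.fifo")
        os.mkfifo(fifo_path)  # POSIX only
        writer = threading.Thread(target=produce, args=(fifo_path, events))
        writer.start()
        received = consume(fifo_path)
        writer.join()
        return received


print(run_pipeline(["page_view", "link_click"]))
```

The sketch hints at why this approach is fragile: both ends must be running at the same time, the data in flight lives only in a kernel buffer, and a writer whose reader has gone away fails with a broken pipe error. A dedicated queueing system is designed to avoid exactly these coupling problems.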

7.2 Extending the Control Plane

We are also excited about extending Snowplow Mini’s new Control Plane. Through the Control Plane we can let non-technical users modify and tweak every aspect of their running pipeline. We are also considering whether Snowplow Mini’s Control Plane could be the blueprint for a more generalized control plane for the wider Snowplow ecosystem - watch this space!

7.3 Stateless by default

A final important philosophical change involves changing Snowplow Mini from inherently stateful to stateless by default. Currently, the Iglu schema registry and Elasticsearch instance live inside the Snowplow Mini; over time we want Snowplow Mini to default to having no such state inside it - instead you would use the Control Plane to connect Snowplow Mini to your external schema registries and storage targets. This should make Snowplow Mini more flexible and more robust. See the Stateless Snowplow Mini milestone for further details.

If you have other changes and suggestions for the roadmap, please let us know in our forums.

8. Documentation and getting help

To learn more about getting started with Snowplow Mini, check out the Quickstart Guide.

If you run into any problems, please raise a bug or get in touch with us through the usual channels.