Snowplow .NET Analytics SDK 0.1.0 released

15 June 2017  •  Devesh Shetty
Following in the footsteps of the Snowplow Scala Analytics SDK and Snowplow Python Analytics SDK, we are happy to announce the release of the Snowplow .NET Analytics SDK. This SDK makes your Snowplow enriched events easier to work with from Azure Data Lake Analytics, Azure Functions, AWS Lambda, Microsoft Orleans and other .NET-compatible data processing frameworks. This SDK has been developed as a first step towards our RFC, Porting Snowplow to Microsoft Azure. Over time,...

Snowplow 89 Plain of Jars released, porting Snowplow to Spark

12 June 2017  •  Ben Fradet
We are tremendously excited to announce the release of Snowplow 89 Plain of Jars. This release centers around the port of our batch pipeline from Twitter Scalding to Apache Spark, a direct implementation of our most popular RFC, Migrating the Snowplow batch jobs from Scalding to Spark. Read on for more information on R89 Plain of Jars, named after an archeological site in Laos: Thanks Why Spark? Spark Enrich and RDB Shredder Under the hood...

Dataflow Runner 0.3.0 released

30 May 2017  •  Ben Fradet
We are pleased to announce version 0.3.0 of Dataflow Runner, our cloud-agnostic tool to create clusters and run jobflows. This release is centered around new features and usability improvements. In this post, we will cover: Preventing overlapping job runs through locks Tagging playbooks New template functions Other updates Roadmap Contributing 1. Preventing overlapping job runs through locks This release introduces a mechanism to prevent two jobs from running at the same time. This is great...

Snowplow Scala Analytics SDK 0.2.0 released

24 May 2017  •  Anton Parkhomenko
We are pleased to announce the 0.2.0 release of the Snowplow Scala Analytics SDK, a library providing tools to process and analyze Snowplow enriched events in Scala-compatible data processing frameworks such as Apache Spark, AWS Lambda, Apache Flink and Scalding, as well as other JVM-compatible data processing frameworks. This release adds run manifest functionality, removes the Scalaz dependency and adds SDK artifacts to Maven Central, along with many other internal changes. In the rest of this...

Snowplow JavaScript Tracker 2.8.0 released

18 May 2017  •  Ben Fradet
We are pleased to announce a new release of the Snowplow JavaScript Tracker. Version 2.8.0 gives you much more flexibility and control in the area of in-browser user privacy, as well as adding new integrations for Parrable and OptimizelyX. Read on below the fold for: State storage strategy Opt-out cookie Better form tracking for passwords New OptimizelyX and Parrable contexts Extracting valuable metadata from the tracker Improved page activity handling Upgrading Documentation and help 1....

Snowplow Meetup Amsterdam #3 was all about personalisation across the customer journey

08 May 2017  •  Idan Ben-Yaacov
We were delighted to be running our third Snowplow Meetup in Amsterdam on April 5th and lucky to have speakers from de Bijenkorf and Greenhouse Group alongside our co-founder Alex Dean. Such a compelling ensemble of speakers resulted in a great turnout and lots of interesting questions from the audience. It was great to connect with the Amsterdam community of analytics practitioners, digital agencies and data scientists. It’s always exciting to connect with our community...

Insights from the first Snowplow meetup in Brazil

03 May 2017  •  Bernardo Srulzon
This is a guest blog post by Bernardo Srulzon, Business Intelligence lead at GetNinjas and a Snowplow enthusiast since 2015. In this post, Bernardo shares his insights from our first Snowplow meetup in São Paulo, which took place on April 19th. Many thanks to Bernardo for sharing his thoughts with this post and to GetNinjas for hosting our meetup! If you have a story to share, feel free to get in touch. It was a...

Introducing Factotum Server

28 April 2017  •  Nicholas Ung
We are pleased to announce the release of Factotum Server, a new open-source system for scheduling and executing Factotum jobs. In previous posts, we have talked about how our pipeline orchestration journey started with cron and make, before moving on to release Factotum. Initially, the only way to interact with Factotum has been through the CLI, but now we have Factotum Server. Where Factotum fills the gap of our previous make-based solution, Factotum Server replaces...

Snowplow 88 Angkor Wat released

27 April 2017  •  Anton Parkhomenko
We are pleased to announce the release of Snowplow 88 Angkor Wat. This release introduces event de-duplication across different pipeline runs, powered by DynamoDB, along with an important refactoring of the batch pipeline configuration. Read on for more information on R88 Angkor Wat, named after the largest religious monument in the world: New storage targets configuration Cross-batch natural deduplication Upgrading Roadmap Getting help 1. New storage targets configuration Historically storage targets for the Snowplow batch...
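The idea behind cross-batch natural deduplication is to remember which events have already been seen across pipeline runs and drop any repeats. The real implementation keeps an (event ID, event fingerprint) manifest in DynamoDB; the sketch below is a simplified in-memory illustration of the same check, not the pipeline's actual code.

```python
# Illustrative sketch of cross-batch natural deduplication: the batch
# pipeline persists (event_id, event_fingerprint) pairs in DynamoDB;
# here an in-memory set stands in for that manifest.
seen = set()

def is_duplicate(event_id, fingerprint):
    """Return True if this exact event was already processed in any batch."""
    key = (event_id, fingerprint)
    if key in seen:
        return True
    seen.add(key)
    return False

assert is_duplicate("e1", "abc") is False  # first sighting: keep the event
assert is_duplicate("e1", "abc") is True   # same event in a later batch: drop it
```

Keying on both the event ID and a fingerprint of the event body distinguishes true duplicates from distinct events that happen to share an ID.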

Snowplow at GDC: why gaming companies don’t need to build their own event data pipeline

18 April 2017  •  Yali Sassoon
We at Snowplow were very excited to be invited by AWS to this year’s Games Developer Conference (GDC) in San Francisco. We both presented at the AWS Developer Day and demoed Snowplow at the AWS stand. Snowplow presentation at GDC Alex Dean, my cofounder, and I were delighted to speak at the AWS Developer Day. You can view our presentation, “Open Source Game Analytics Powered by AWS”, below. And the slides by themselves: Snowplow: open...

How JustWatch uses Snowplow data to build a differentiated service for advertising movies and drive spectacular growth

13 April 2017  •  Giuseppe Gaviani
This blog post is about how JustWatch has been using Snowplow to build a highly effective and differentiated advertising technology business and drive spectacular business growth. You can download this story in pdf here. “Snowplow provides rich, granular data that enabled us to build a sophisticated audience intelligence and double the efficiency of trailer advertising campaigns for our clients compared to the industry average” Dominik Raute, Co-Founder & CTO, JustWatch JustWatch: a data-driven company JustWatch...

How to develop better games with level analytics

12 April 2017  •  Colm O Griobhtha
Summary Product managers and game designers generally aim to design game levels in such a way that they challenge gamers enough to make completing a level satisfying, but not so challenging that they drop out and stop playing the game. This blog post shows an example of how product managers and game designers can use a well designed dashboard to better understand user behaviour across a game’s levels, design highly playable game levels, A/B test...

Snowplow Python Analytics SDK 0.2.0 released

11 April 2017  •  Anton Parkhomenko
We are pleased to announce the 0.2.0 release of the Snowplow Python Analytics SDK, a library providing tools to process and analyze Snowplow enriched events in Python-compatible data processing frameworks such as Apache Spark and AWS Lambda. This release adds new run manifest functionality, along with many internal changes. In the rest of this post we will cover: Run manifests Using the run manifest Documentation Other changes Upgrading Getting help 1. Run manifests This...
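The core job of an Analytics SDK is turning Snowplow's wide tab-separated enriched-event format into JSON that frameworks like Spark can work with. A minimal illustration of that idea, with invented field names and only three columns (real enriched events have over a hundred; this is not the SDK's actual implementation):

```python
import json

# Simplified stand-in for the enriched-event TSV layout: real events
# have ~130 tab-separated fields, we illustrate with three.
FIELDS = ["app_id", "event_id", "geo_country"]

def transform(tsv_line):
    """Map a tab-separated enriched event onto named JSON keys,
    dropping empty fields (illustrative only)."""
    values = tsv_line.split("\t")
    return {k: v for k, v in zip(FIELDS, values) if v != ""}

event = transform("web\tf81d4fae-7dec\tAU")
print(json.dumps(event))
```

The empty-field filter mirrors a common choice in event transformation: unset TSV columns become absent JSON keys rather than empty strings.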

Snowplow Analytics gets nod at MeasureCamp London

03 April 2017  •  Dilyan Damyanov
It was a busy Saturday in Pimlico as hundreds descended on the area for the 10th edition of MeasureCamp London on 25 March. My colleague Diogo and I were there representing Snowplow’s Analytics team. The dozens of sessions that attendees delivered as part of the event were heavily dominated by topics around Google Analytics and its suite of accompanying tools and services. But open-source platforms such as Snowplow got their fair share of shout outs....

Dataflow Runner 0.2.0 released

31 March 2017  •  Ben Fradet
Building on the initial release of Dataflow Runner last month, we are proud to announce version 0.2.0, aiming to bring Dataflow Runner up to feature parity with our long-standing EmrEtlRunner application. As a quick reminder, Dataflow Runner is a cloud-agnostic tool to create clusters and run jobflows which, for the moment, only supports AWS EMR. If you need a refresher on the rationale behind Dataflow Runner, feel free to check out the RFC on the subject....

Google Cloud Dataflow example project released

30 March 2017  •  Guilherme Grijó Pires
We are pleased to announce the release of our new Google Cloud Dataflow Example Project! This is a simple time series analysis stream processing job written in Scala for the Google Cloud Dataflow unified data processing platform, processing JSON events from Google Cloud Pub/Sub and writing aggregates to Google Cloud Bigtable. The Snowplow GCP Dataflow Streaming Example Project can help you jumpstart your own real-time event processing pipeline on Google Cloud Platform (GCP). In this...

How Peak uses Snowplow to drive product development and neuroscience

03 March 2017  •  Giuseppe Gaviani
This blog post explains how Peak has been using Snowplow since July 2015 to drive its business through product development and neuroscience. You can download this story in pdf here. “Snowplow is really powerful when you start to hit that growth curve and going upwards: when you see the signs of accelerating growth and you need to start collecting as much event data as possible”, Thomas in’t Veld, Lead Data Scientist, Peak About Peak Peak...

Sigfig and Weebly talk at second Snowplow Meetup San Francisco

24 February 2017  •  Yali Sassoon
Last night we were delighted to host our second Snowplow Meetup San Francisco, at the lovely Looker offices. The event kicked off with a talk from Sigfig’s Benny Wijatno and Jenna Lemonias. Benny and Jenna gave an overview of Sigfig, before exploring how they use Snowplow to answer a wide variety of questions related to customer acquisition. Snowplow at Sigfig Weebly’s Audrey Carstensen and Bo Han followed up with an overview of how Snowplow is...

Snowplow 87 Chichen Itza released

21 February 2017  •  Alex Dean
We are pleased to announce the immediate availability of Snowplow 87 Chichen Itza. This release contains a wide array of new features, stability enhancements and performance improvements for EmrEtlRunner and StorageLoader. As of this release EmrEtlRunner lets you specify EBS volumes for your Hadoop worker nodes; meanwhile StorageLoader now writes to a dedicated manifest table to record each load. Continuing with this release series named for archaeological sites, Release 87 is Chichen Itza, the ancient...

Snowplow away week in Berlin

20 February 2017  •  Giuseppe Gaviani
Some of the Snowplow team works remotely, so last November the team went on an away week in Berlin to rekindle the team spirit, on the occasion of our third Snowplow Meetup in Berlin. Team members travelled from far and wide, from four countries - Russia, Canada, France and the United Kingdom - to convene in Berlin. Here are some of the things the team did on their away week… It started with a session about...

Snowplow Meetup London Number 4: a roundup

15 February 2017  •  Giuseppe Gaviani
Our fourth Snowplow London Meetup took place on February the 8th at CodeNode. It was a fun and informative event with around 60 people attending, great talks and lots of interesting questions from the audience. We have filmed the talks, which you can watch in the links below, along with the presentation slides. How Gousto is moving to the real-time pipeline to enable just-in-time personalization Why Snowplow is at the heart of Busuu’s data and...

Snowplow .NET Tracker 1.0.0 supporting mobile devices through Xamarin released

15 February 2017  •  Ed Lewis
We’re pleased to announce the 1.0.0 release of Snowplow’s .NET Tracker. This is a major reboot of the existing .NET Tracker, converting it into a .NET Standard project; this conversion brings with it support for the tracker on mobile devices through Xamarin, plus all platforms that support .NET Core (Windows, Linux and macOS). Here is our mobile demonstration app for the tracker running on Xamarin: Read on for more: A brief history of .NET Standard...

Introducing Dataflow Runner

10 February 2017  •  Joshua Beemster
We are pleased to announce the release of Dataflow Runner, a new open-source system for the creation and running of AWS EMR jobflow clusters and steps. Big thanks to Snowplow intern Manoj Rajandrakumar for all of his hard work on this project! This release signals the first step in our journey to deconstruct EmrEtlRunner into two separate applications, a Dataflow Runner and snowplowctl, per our RFC on Discourse. In the rest of this post we...

Iglu Ruby Client 0.1.0 released

08 February 2017  •  Anton Parkhomenko
We are pleased to announce the initial release of the Iglu Ruby Client, our third library in the family of Iglu clients. In the rest of this post we will cover: Introducing Iglu Ruby Client Use cases Setup guide Usage Roadmap and upcoming features Getting help 1. Introducing Iglu Ruby Client Iglu clients are simple SDKs which let users fetch schemas for self-describing data and validate that data against its schema. As part of broadening...
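Every Iglu client starts from the same building block: a schema key of the form `iglu:vendor/name/format/version` embedded in self-describing data, which the client resolves to the matching JSON Schema. A toy sketch of parsing such a key (in Python for illustration; the Ruby client's own code will differ):

```python
def parse_schema_key(uri):
    """Split an Iglu schema URI such as
    'iglu:com.acme/click/jsonschema/1-0-2' into its four parts.
    Illustrative sketch only, not the Iglu Ruby Client's API."""
    prefix = "iglu:"
    if not uri.startswith(prefix):
        raise ValueError("not an Iglu schema URI: %s" % uri)
    vendor, name, fmt, version = uri[len(prefix):].split("/")
    return {"vendor": vendor, "name": name,
            "format": fmt, "version": version}

key = parse_schema_key("iglu:com.acme/click/jsonschema/1-0-2")
```

A real client would then look the schema up in one or more registries and validate the data payload against it.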

A look ahead at where the Snowplow team will be and upcoming events

01 February 2017  •  Giuseppe Gaviani
If you're wondering where the Snowplow team will be over the next few months, here is a list of upcoming events that we are excited to announce. Snowplow Meetup London number 4 Our London Snowplow Meetup #4 will take place at 6.30 pm on February the 8th, at CodeNode. In addition to a talk from one of the Snowplow team, we have two fantastic speakers lined up: Dejan Petelin, senior data scientist at Gousto, will...

Roundup of Snowplow Meetup Berlin Number 3

31 January 2017  •  Giuseppe Gaviani
The third Snowplow Meetup Berlin took place on November the 16th at Betahaus. The turnout was great with about 100 people attending. We have filmed the talks, which you can watch in the links below, along with the presentation slides. Below is a list and a description of the talks. Why JustWatch adopted Snowplow and what they learned along the way How Incuda builds user journey models with Snowplow Turning insights into action with Sauna...

How a clear data taxonomy drives insight and action

27 January 2017  •  João Correia
This is a guest blog post by João Correia, Senior Analytics Strategist at YouCaring and an experienced analytics professional who helps organizations embed analytics for growth and innovation. In this post, João explains how to define an analytics strategy with Snowplow Analytics that considers your business context and drives insights and action. Many thanks to João for sharing his views on this topic! If you have a story to share, feel free to get in touch. Add...

How Simply Business is using real-time data to better engage and serve its customers with Snowplow

24 January 2017  •  Giuseppe Gaviani
“The Snowplow dataset has become part of our core strategic offering”, Stewart Duncan, director of Data Science, Simply Business Simply Business are using the Snowplow platform to collect and join up key business data at a very granular, event level to better understand the customer journey and use that insight to better serve customers at different points in their journey. Here is a summary of their story. You can read the full story here. About...

Data collection: the essential, but unloved, foundation of the data value chain

16 January 2017  •  Yali Sassoon
It is so obvious that no one bothers saying it: data collection is an essential part of any data strategy. After all: without data collection, there is no data. Without data there is no data value chain. No reporting, no analysis, no data science, no data-driven decision making. It is not just that people in data don’t remark on the importance of data collection. They do not talk about data collection at all. To take just...

Looking back at 2016

12 January 2017  •  Diogo Pacheco
With the start of 2017, we have decided to look back at the 2016 blog and community Discourse posts that generated the most engagement with our users. More than ten thousand users spent a total of 548 hours reading our blog posts, whilst on Discourse (which we only launched this year) 8,700 unique users spent 424 hours reading and participating in the Snowplow community. Let’s take a closer look at: Top 10 blog posts published...

Snowplow Javascript Tracker 2.7.0 released

09 January 2017  •  Yali Sassoon
We are delighted to kick off 2017 with a new release of our Javascript Tracker. Version 2.7.0 includes a number of new and improved features including: Improved tracking for single-page webapps Content Security Policy compliance Automatic and manual error tracking New configuration options for first party cookies More elegant Optimizely integration New trackSelfDescribingEvent method 1. Improved tracking for single-page webapps The webPage context is invaluable when you analyse or model web data, and want to...

Factotum 0.4.0 released with support for constraints

22 December 2016  •  Joshua Beemster
We’re pleased to announce the 0.4.0 release of Snowplow’s DAG running tool Factotum! This release centers around making DAGs safer to run on distributed clusters by constraining the run to a specific host. In the rest of this post we will cover: Constraining job runs Downloading and running Factotum Roadmap Contributing 1. Constraining job runs This release adds the ability to constrain your DAG’s execution to a single host. This allows for job distribution to...

Snowplow 86 Petra released

20 December 2016  •  Anton Parkhomenko
We are pleased to announce the release of Snowplow 86 Petra. This release introduces additional event de-duplication functionality for our Redshift load process, plus a brand new data model that makes it easier to get started with web data. This release also adds support for AWS’s newest regions: Ohio, Montreal and London. Having exhausted the bird population, we needed a new set of names for our Snowplow releases. We have decided to name this release...

SQL Runner 0.5.0 released

12 December 2016  •  Joshua Beemster
We are pleased to announce version 0.5.0 of SQL Runner. This release adds some powerful new features, including local and Consul-based remote locking to ensure that SQL Runner runs your playbooks as singletons. Locking your run Checking and deleting locks Running a single query Other changes Upgrading Getting help 1. Locking your run This release adds the ability to lock your run - this ensures that you cannot accidentally start another job whilst one is...
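The local-locking idea described above boils down to atomically creating a lock marker before a run starts and refusing to start if it already exists. A minimal sketch of that pattern using an exclusive file create (illustrative only; SQL Runner's own locking also supports Consul-based remote locks):

```python
import os

def acquire_lock(path):
    """Atomically create a lock file; return False if another
    run already holds the lock. Illustrative sketch only."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True
    except FileExistsError:
        return False

def release_lock(path):
    """Delete the lock file so the next run can proceed."""
    os.remove(path)
```

The `O_CREAT | O_EXCL` combination is what makes the check-and-create a single atomic step, so two concurrent runs cannot both win the lock.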

Snowplow 85 Metamorphosis released with beta Apache Kafka support

15 November 2016  •  Alex Dean
We are pleased to announce the release of Snowplow 85 Metamorphosis. This release brings initial beta support for using Apache Kafka with the Snowplow real-time pipeline, as an alternative to Amazon Kinesis. Metamorphosis is one of Franz Kafka’s most famous books, and an apt codename for this release, as our first step towards an implementation of the full Snowplow platform that can be run off the Amazon cloud, on-premise. (We’ll come up with a new...

Factotum 0.3.0 released with webhooks

07 November 2016  •  Ed Lewis
We’re pleased to announce the 0.3.0 release of Snowplow’s DAG running tool Factotum! This release centers around making DAGs easier to create, monitor and reason about, including adding outbound webhooks to Factotum. In the rest of this post we will cover: Improving the workflow when creating DAGs Improving job monitoring using webhooks Behaviors on task failure Extras Downloading and running Factotum Roadmap Contributing 1. Improving the workflow when creating DAGs We’ve decided that to separate...

3rd Snowplow Meetup Berlin in less than two weeks!

03 November 2016  •  Idan Ben-Yaacov
On the 16th of November at 19:00 we are having another exciting Berlin meetup @ Betahaus. You’ll get a chance to hear all about Sauna, our new open-source product, and listen to what our clients are building in the audience segmentation space with Snowplow data. Now for the cherry on top: the whole Snowplow team will be there. We have lined up some fantastic speakers: Dominik Raute and Christoph Hoyer from JustWatch will talk about how the...

Asynchronous micro-services and Crunch Budapest 2016

30 October 2016  •  Alex Dean
At Snowplow we have been firm supporters of the Hungarian data and BI scene for several years, and so it was great to be invited to speak at the Crunch conference in Budapest earlier this month. I gave a talk at Crunch on asynchronous micro-services and the unified log - a new twist on a theme that I have been developing in my book Unified Log Processing. This blog post will briefly cover: Asynchronous micro-services...

The Snowplow Meetup New York Number 2 - a recap

27 October 2016  •  Yali Sassoon
On September 22nd, the second Snowplow Meetup New York took place at the fabulous Canary office in Manhattan. I wasn’t able to make the event but I’m very lucky that Alex, Idan and Christophe made sure the talks were filmed. You can view them below: Introducing Sauna: our new decisioning and response platform Snowplow at Canary: why and how Snowplow is used at Canary Using Snowplow to enable product analytics at Animoto Event data modeling Huge...

Schema registries and Strata + Hadoop World NYC 2016

23 October 2016  •  Alex Dean
In late September the Snowplow team attended Strata + Hadoop World in New York City. It was a great opportunity to check in on the US data science and engineering scenes, and I was pleased to also have the opportunity to give a talk on schema registries. In this blog post we will briefly cover: What Crimean War gunboats teach us about the need for schema registries Alex’s session picks Christophe’s session picks Some closing...

How Viewbix uses Snowplow to enable their customers to make data-driven decisions

17 October 2016  •  Yali Sassoon
This is a guest post by Dani Waxman, Product Manager at Viewbix and long time Snowplow user. In this post, Dani describes the journey that the Viewbix team went through in order to enable their users to make data-driven decisions, how they came to use Snowplow and the role that Snowplow plays today at Viewbix. At Viewbix there are two things we are passionate about, our coffee and using analytics to help us and our...

Snowplow Python Tracker 0.8.0 released

12 October 2016  •  Yali Sassoon
We are delighted to release version 0.8.0 of the Snowplow Python Tracker, for tracking events from your Python apps, services and games. This release adds Python 3.4-5 support, 10 new event types and much richer timestamp support. Read on for: Python 3.4 and 3.5 support First class support for 10 new event types Support for true timestamps and device sent timestamps Updated API for sending self-describing events Other changes Huge thanks to Snowplow user Adam...

Snowplow 84 Steller's Sea Eagle released with Elasticsearch 2.x support

08 October 2016  •  Joshua Beemster
We are pleased to announce the release of Snowplow 84 Steller’s Sea Eagle. This release brings support for Elasticsearch 2.x to the Kinesis Elasticsearch Sink for both Transport and HTTP clients. Elasticsearch 2.x support Elasticsearch Sink buffer Override the network_id cookie with nuid param Hardcoded cookie path Migrating Redshift assets to Iglu Central Other changes Upgrading Roadmap Getting help 1. Elasticsearch 2.x support This release brings full support for Elasticsearch 2.x for both the HTTP...

Iglu 6 Ceres released with significant updates to Igluctl

07 October 2016  •  Yali Sassoon
We are pleased to announce a new Iglu release with some significant updates to Igluctl - our Iglu command-line tool. Read on for more information on Release 6 Ceres, named after the first postage stamp release in France: New option to lint schemas to a higher standard Publish schemas and jsonpath files to S3 Other updates 1. New option to lint schemas to a higher standard Snowplow users will define JSON Schemas for event and...

Kinesis Tee 0.1.0 released for Kinesis stream filtering and transformation

03 October 2016  •  Ed Lewis
We are pleased to announce the release of version 0.1.0 of Kinesis Tee. Kinesis Tee is like Unix tee, but for Kinesis streams. You can use it to: Write a Kinesis stream to another Kinesis stream (in the same region, or in a different AWS account or region) Transform the format of a Kinesis stream Filter records from a Kinesis stream based on JavaScript rules In the rest of this post we will cover: Introducing Kinesis Tee Example:...
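Conceptually, each record flowing through a tee passes a filter predicate and then an optional transformation before being written to the target stream. A toy sketch of that pipeline shape (in Python for illustration; Kinesis Tee's actual rules are written in JavaScript and run against real Kinesis streams):

```python
import json

def tee(records, keep, transform):
    """Drop records failing `keep`, reshape the rest with `transform`.
    Illustrative sketch of the filter-then-transform idea only."""
    return [transform(r) for r in records if keep(r)]

# Example: drop heartbeat pings and parse the surviving JSON records.
records = ['{"type":"click"}', '{"type":"ping"}']
out = tee(records,
          keep=lambda r: json.loads(r)["type"] != "ping",
          transform=json.loads)
```

Separating the predicate from the transformation keeps each rule small and independently testable, which is the appeal of the tee model.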

The third Snowplow Meetup London was all about Real-Time!

23 September 2016  •  Idan Ben-Yaacov
The third Snowplow Meetup London took place last Wednesday evening. The event was focused on real-time event data processing. It’s been more than two and a half years since we started working on the Snowplow real-time pipeline and it is great that in the last few months usage of that technology has really started to skyrocket! Simply Business’s Dani Sola kicked the event off with a look at how Simply Business use their real-time...

Introducing Sauna, a decisioning and response platform

22 September 2016  •  Alex Dean
It’s not every day that we get to announce an all-new category of software product here at Snowplow: we are hugely excited to be releasing version 0.1.0 of Sauna, our new open-source decisioning and response platform. Our Snowplow platform is about enabling you, as a business, to track and capture events across all your different channels, in granular detail, in a data warehouse, so you can build intelligence on that data. The data that flows...

Snowplow at Measurecamp London September 2016 - a recap

15 September 2016  •  Yali Sassoon
Last Saturday, Christophe, Alex and I were at Measurecamp London. It has been great watching Measurecamp grow from humble roots, with the first London event four years ago, to something that really is a global phenomenon. In that time a real digital analytics community has formed and spread: there are events all over the world. It’s been great seeing the level of discussion and analytic sophistication rise event-on-event. We built Snowplow because we’re passionate...

Second Snowplow Meetup NYC scheduled for September

07 September 2016  •  Idan Ben-Yaacov
On the 26th September at 18:00 we’ll have another NYC meetup. The kind folks at Canary have invited us to hold the meetup at their offices. Not only that, but they’re putting on some food and drinks. In this meetup you’ll have a chance to hear what some of our mobile-only clients are doing with their data. For that we have four incredible speakers: Lincoln Ritter and Stevie Clifton from Animoto will have a...

Snowplow 83 Bald Eagle released with SQL Query Enrichment

06 September 2016  •  Anton Parkhomenko
We are pleased to announce the release of Snowplow 83 Bald Eagle. This release introduces our powerful new SQL Query Enrichment, long-awaited support for the EU Frankfurt AWS region, plus POST support for our Iglu webhook adapter. SQL Query Enrichment Support for eu-central-1 (Frankfurt) POST support for the Iglu webhook adapter Other improvements Upgrading Roadmap Getting help 1. SQL Query Enrichment The SQL Query Enrichment lets us perform dimension widening on an incoming Snowplow event...

Third Snowplow Meetup London scheduled for September

02 September 2016  •  Idan Ben-Yaacov
On the 21st September we’ll be hosting our third London meetup, taking place at Skills Matter’s CodeNode (Shift Room) from 18:30-21:45. In this meetup you’ll have a chance to hear from real people doing real stuff in real time. We’ll also talk about our latest developments and our vision in the real-time space. To do that, we have three great speakers: Dani Sola from Simply Business will talk about the near real time...

Snowplow Android Tracker 0.6.0 released with automatic crash tracking

29 August 2016  •  Joshua Beemster
We are pleased to announce the release of the Snowplow Android Tracker version 0.6.0. This is our first mobile tracker release featuring automated event tracking, in the form of uncaught exceptions and lifecycle events. The Tracker has also undergone a great deal of refactoring to simplify its codebase and make it easier to use. This release post will cover the following topics: Uncaught exception tracking Lifecycle event tracking Removing RxJava Singleton setup Client session updates...

Snowplow Ruby Tracker 0.6.0 released

17 August 2016  •  Ed Lewis
We are pleased to announce the release of version 0.6.0 of the Snowplow Ruby Tracker. This release introduces true timestamp support, and marks the end of our support for Ruby 1.9.3. Read on for more detail on: True timestamp support Device-sent timestamp support Self describing events Upgrading Getting help 1. True timestamp support True timestamps in Snowplow are a way to indicate that you really trust the time given as accurate; this is particularly useful...

Snowplow 82 Tawny Eagle released with Kinesis Elasticsearch Service support

08 August 2016  •  Joshua Beemster
We are happy to announce the release of Snowplow 82 Tawny Eagle! This release updates the Kinesis Elasticsearch Sink with support for sending events via HTTP, allowing us to now support Amazon Elasticsearch Service. Kinesis Elasticsearch Sink Distribution changes Upgrading Getting help 1. Kinesis Elasticsearch Sink This release adds support to the Kinesis pipeline for loading of an Elasticsearch cluster over HTTP. This allows Snowplow to now load Amazon Elasticsearch Service, which only supports interaction...

A roundup of recent Snowplow Meetups in Amsterdam, Berlin, London and Tel-Aviv

05 August 2016  •  Yali Sassoon
It has been a busy summer at Snowplow. We’ve done a number of meetups in some of our favorite cities around the world but failed (until now) to write them up on the website, so apologies - it is very important for us to share talks, slides and insights with the broader Snowplow community. Let us rectify that now! Amsterdam Meetup: May 2016 We had a great crowd turn out for the event at the...

Snowplow Tracking CLI 0.1.0 released

04 August 2016  •  Ronny Yabar
We are pleased to announce the first release of the Snowplow Tracking CLI! This is a command-line application (written in Golang) to make it fast and easy to send an event to Snowplow directly from the command line. You can use the app to embed Snowplow tracking directly into your shell scripts. In the rest of this post we will cover: How to install the app How to use the app Examples Under the hood...

Iglu Schema Registry 5 Scinde Dawk released

31 July 2016  •  Anton Parkhomenko
We are pleased to announce the fifth release of the Iglu Schema Registry System, with an initial release of igluctl - an Iglu command-line tool and Schema DDL as part of Iglu project. Read on for more information on Release 5 Scinde Dawk, named after the first postage stamp in Asia: igluctl Schema DDL Migration guide Iglu roadmap Getting help 1. igluctl The main feature of this release is our new igluctl command-line application, which...

How we're reinventing digital analytics at Snowplow: presentation to the DA Hub Europe

25 July 2016  •  Yali Sassoon
Last month I was very fortunate to speak at the DA Hub Europe, as part of their Emerging Technology Showcase. For those who haven’t been lucky enough to attend this event, it is a brilliant opportunity to discuss, debate with and learn from some of the smartest people in Digital Analytics in Europe. I used the talk as an opportunity to share some of our broader...

Snowplow C++ Tracker 0.1.0 released

23 June 2016  •  Ed Lewis
We are pleased to announce the release of the Snowplow C++ Tracker. The Tracker is designed to work asynchronously and dependency-free within your C++ code to provide great performance in your applications, games and servers, even under heavy load, while also storing all of your events persistently, allowing recovery from temporary network outages. In the rest of this post we will cover: How to install the tracker How to use the tracker Core features Roadmap...

Snowplow 81 Kangaroo Island Emu released

16 June 2016  •  Fred Blundun
We are happy to announce the release of Snowplow 81 Kangaroo Island Emu! At the heart of this release is the Hadoop Event Recovery project, which allows you to fix up Snowplow bad rows and make them ready for reprocessing. Hadoop Event Recovery Stream Enrich race condition New schemas Upgrading Getting help 1. Hadoop Event Recovery In April 2014 we released Scala Hadoop Bad Rows as part of Snowplow 0.9.2. This was a simple project...

Factotum 0.2.0 released

13 June 2016  •  Ed Lewis
We are pleased to announce release 0.2.0 of Snowplow’s DAG running tool, Factotum. This release introduces variables for jobs and the ability to start jobs from a given task. In the rest of this post we will cover: Job configuration variables Starting a job from a given task Output improvements Downloading and running Factotum Roadmap Contributing 1. Job configuration variables Jobs often contain per-run information such as a target hostname or IP address. In Factotum...

Snowplow 80 Southern Cassowary released

30 May 2016  •  Fred Blundun
Snowplow 80 Southern Cassowary is now available! This is a real-time pipeline release which improves stability and brings the real-time pipeline up-to-date with our Hadoop pipeline. The latest Common Enrich Exiting on error Configurable maxRecords Changes to logging Continuous deployment Other improvements Upgrading Getting help The latest Common Enrich This version of Stream Enrich uses the latest version of Scala Common Enrich, the library containing Snowplow’s core enrichment logic. Among other things, this means that...

Iglu Schema Registry 4 Epaulettes released

22 May 2016  •  Anton Parkhomenko
We are pleased to announce the fourth release of the Iglu Schema Registry System, with an initial release of the Iglu Core library, implemented in Scala. Read on for more information on Release 4 Epaulettes, named after the famous Belgian postage stamps: Scala Iglu Core Registry Syncer updates Iglu roadmap Getting help 1. Scala Iglu Core Why we created Iglu Core Our initial development of Iglu two years ago was a somewhat piecemeal process. The...

Introducing Avalanche for load-testing Snowplow

20 May 2016  •  Joshua Beemster
We are pleased to announce the very first release of Avalanche, the Snowplow load-testing project. As the Snowplow platform matures and is adopted more and more widely, understanding how Snowplow performs under various event scales and distributions becomes increasingly important. Our new open-source Avalanche project is our attempt to create a standardized framework for testing Snowplow batch and real-time pipelines under various loads. It will hopefully also expand our own and the community’s knowledge on what...

Snowplow Python Analytics SDK 0.1.0 released

17 May 2016  •  Fred Blundun
Following in the footsteps of the Snowplow Scala Analytics SDK, we are happy to announce the release of the Snowplow Python Analytics SDK! This library makes your Snowplow enriched events easier to work with in Python-compatible data processing frameworks such as Apache Spark and AWS Lambda. Some good use cases for the SDK include: Performing event data modeling in PySpark as part of our Hadoop batch pipeline Developing machine learning models on your event data using...

Snowplow Scala Tracker 0.3.0 released

14 May 2016  •  Anton Parkhomenko
We are pleased to release version 0.3.0 of the Snowplow Scala Tracker. This release introduces a user-settable “true timestamp”, as well as several bug fixes. In the rest of this post we will cover: True timestamp Availability on JCenter and Maven Central Minor updates and bug fixes Upgrading Roadmap Getting help 1. True timestamp Last year we published the blog post Improving Snowplow’s understanding of time, which introduced a new tracker parameter, true_tstamp. This parameter...

Snowplow 79 Black Swan with API Request Enrichment released

12 May 2016  •  Anton Parkhomenko
We are pleased to announce the release of Snowplow 79 Black Swan. This appropriately-named release introduces our powerful new API Request Enrichment, plus a new HTTP Header Extractor Enrichment and several other improvements on the enrichments side. API Request Enrichment HTTP Header Extractor Enrichment Iglu client update Other improvements Upgrading Roadmap Getting help 1. API Request Enrichment The API Request Enrichment lets us perform dimension widening on an incoming Snowplow event using any internal or...

The first Snowplow Meetup Tel-Aviv scheduled for July

11 May 2016  •  Idan Ben-Yaacov
It gives us great pleasure to announce our first Snowplow meetup in Tel-Aviv. The event will take place on the 11th July in central Tel-Aviv (Location to be finalised) and will kick off at 6:00pm. The talks will start at 6:30pm. We plan on having three talks, 20 minutes each. We look forward to talks from: Yali Sassoon, cofounder at Snowplow, will give an overview of Snowplow, before discussing some of the analytics trends we...

The second Snowplow Meetup London scheduled for June

06 May 2016  •  Idan Ben-Yaacov
We are extremely excited to announce our second London meetup taking place at Skills Matter \ Code Node (Backspace Room) on the 15th June from 18:30-21:45. We have three incredible speakers, doing great things with their Snowplow data: Thomas in’t Veld - Lead data scientist @ Peak Jorge Bastida - Head of Development @ Streetlife Andrew Shakespeare - Business Intelligence Manager @ Finery London More details to follow, sign up today! And yes, we’ll have...

Snowplow Golang Tracker 0.1.0 released

24 April 2016  •  Joshua Beemster
We are pleased to announce the release of the Snowplow Golang Tracker. The Tracker is designed to work asynchronously within your Golang code to provide great performance in your applications and servers, even under heavy load, while also storing all of your events persistently in the event of network failure. It will also be used as a building block for a number of projects, including a new daemon to support robust asynchronous sending for the...

The inaugural Snowplow Meetup Boston is a wrap!

21 April 2016  •  Yali Sassoon
Two weeks ago, hot on the heels of the Snowplow Meetup New York, Snowplow users in Boston convened at the Carbonite offices for the first Snowplow Meetup Boston. We had two early Snowplow adopters talk about what they were doing with Snowplow. First, Rob Johnson discussed how Carbonite has become progressively more sophisticated in their use of data analytics, and the role Snowplow has played in supporting that evolution. Analytics at Carbonite: presentation to Snowplow...

The inaugural Snowplow Meetup New York is a wrap!

20 April 2016  •  Yali Sassoon
Three weeks ago Snowplow users based in New York convened at the TripAdvisor offices for the first Snowplow Meetup New York. We had four speakers from Snowplow, Oyster.com and Viewbix. I kicked off the event with a brief overview of the history of Snowplow and a look forward to the key areas of development for the Snowplow platform going forwards. Snowplow: where we came from and where we are going - March 2016 Ben Hoyt...

The second Snowplow Meetup Berlin scheduled for May

14 April 2016  •  Idan Ben-Yaacov
It gives us great pleasure to announce another Snowplow meetup in Berlin co-organised with LeROI set for the 24th May. This is our second visit to Berlin following a successful meetup we had back in August. The event will take place at Betahaus (in the cafe) and will kick off at 6:30pm. The talks will start at 7:00pm. We plan on giving an uber quick intro to Snowplow and LeROI and then give the stage...

Introducing Factotum data pipeline runner

09 April 2016  •  Ed Lewis
We are pleased to announce the release of Factotum, a new open-source system for the execution of data pipeline jobs. Pipeline orchestration is a common problem faced by data teams, and one which Snowplow has discussed in the past. As part of the Snowplow Managed Service we operate numerous data pipelines for customers, with many pipelines including customer-specific event data modeling. As we started to outgrow our existing Make-based solution, we reviewed many job...

Introducing Snowplow Mini

08 April 2016  •  Joshua Beemster
We’ve built Snowplow for robustness, scalability and flexibility. We have not built Snowplow for ease of use or ease of setup. Nor has the Snowplow Batch Pipeline been built for speed: you might have to wait several hours from sending an event before you can view and analyze that event data in Redshift. There are occasions when you might want to work with Snowplow in an easier, faster way. Two common examples are: New users...

Schema Guru 0.6.0 released with SQL migrations support

07 April 2016  •  Anton Parkhomenko
We are pleased to announce the release of Schema Guru 0.6.0, with long-awaited initial support for database migrations in SQL. This release is an important step in allowing Iglu users to easily and safely upgrade Redshift table definitions as they evolve their underlying JSON Schemas. This release post will cover the following topics: Introducing migrations Redshift migrations in Schema Guru New --force flag Minor CLI changes Upgrading Getting help Plans for future releases 1. Introducing...

Snowplow Scala Analytics SDK 0.1.0 released

23 March 2016  •  Alex Dean
We are pleased to announce the release of our first analytics SDK for Snowplow, created for data engineers and data scientists working with Snowplow in Scala. The Snowplow Analytics SDK for Scala lets you work with Snowplow enriched events in your Scala event processing, data modeling and machine-learning jobs. You can use this SDK with Apache Spark, AWS Lambda, Apache Flink, Scalding, Apache Samza and other Scala-compatible data processing frameworks. Some good use cases for...

The second Snowplow Meetup Amsterdam scheduled for May

21 March 2016  •  Yali Sassoon
Following on from the first Snowplow Meetup Amsterdam last year, we are organising a second for Thursday May 19th. The event will be held at the Impact Hub from 6pm. We’ll have pizza, beers, and two to three speakers. I’ll give one talk, giving: An overview of Snowplow Examples of some of the interesting things our users are doing with their data A peek at some of the features we’re planning on rolling out...

Google Accelerated Mobile Pages adds support for Snowplow

19 March 2016  •  Alex Dean
We are pleased to announce that Google’s Accelerated Mobile Pages Project (AMPP or AMP) now supports Snowplow. AMP is an open source initiative led by Google to improve the mobile web experience by optimizing web pages for mobile devices. As of this week, Snowplow is natively integrated in the project, so pages optimized with AMP HTML can be tracked in Snowplow by adding the appropriate amp-analytics tag to your pages. Read on after the fold...

2015-2016 winternship wrap-up

17 March 2016  •  Alex Dean
Snowplow’s Data Engineering winternships wrapped up last week - many thanks to Ed and Oleks for their fantastic contributions to Snowplow over the winter period! In this blog post we’ll introduce both winterns to the Snowplow community, as well as giving a little more background on the projects they worked on. This is the fifth instalment of our internship program for open source hackers - you can read more about our previous winter and summer...

An introduction to event data modeling

16 March 2016  •  Yali Sassoon
Data modeling is an essential step in the Snowplow data pipeline. We find that those companies that are most successful at using Snowplow data are those that actively develop their event data models: progressively pushing more and more Snowplow data throughout their organizations so that marketers, product managers, merchandising and editorial teams can use the data to inform and drive decision making. ‘Event data modeling’ is a very new discipline and as a result, there’s...

Snowplow 78 Great Hornbill released

15 March 2016  •  Fred Blundun
We are pleased to announce the immediate availability of Snowplow 78 Great Hornbill! This release brings our Kinesis pipeline functionally up-to-date with our Hadoop pipeline, and makes various further improvements to the Kinesis pipeline. Access to the latest Common Enrich version Click redirect mode Configurable cookie name Randomized partition keys Kinesis Elasticsearch Sink: increased flexibility New format for bad rows Kinesis Client Library upgrade Renaming Scala Kinesis Enrich to Stream Enrich Other improvements Upgrading Getting...

Ad impression and click tracking with Snowplow

07 March 2016  •  Yali Sassoon
It is possible to track both ad impression events and ad click events into Snowplow. That means if you’re a Snowplow user buying display ads to drive traffic to your website or app, you can track not only what users do once they click through onto your site or app, but what ads they have been exposed to and whether or not they clicked any of them. This is particularly useful for companies building attribution models,...

Iglu JSON Schema Registry 3 Penny Black released

04 March 2016  •  Fred Blundun
We are excited to announce the immediate availability of a new version of Iglu, incorporating a release of the Swagger-powered Scala Repo Server. Iglu has existed as a project at Snowplow for over two years now: after a period of relative quiet, we have an ambitious release schedule for Iglu planned for 2016, starting with this release. To reflect the growing importance of Iglu, and the number of moving parts within the platform, we will...

Snowplow JavaScript Tracker 2.6.0 released with Optimizely and Augur integration

03 March 2016  •  Joshua Beemster
We are excited to announce the release of version 2.6.0 of the Snowplow JavaScript Tracker! This release brings turnkey Optimizely and Augur.io integration, so you can automatically grab A/B testing data (from Optimizely) and device and user recognition data (from Augur) with the events you track with the JavaScript Tracker. In addition, we have rolled out support for Enhanced Ecommerce tracking, improved domain management and better handling of time! Read on to find out more…...

Debugging bad data in Elasticsearch and Kibana - a guide

03 March 2016  •  Yali Sassoon
One of the features that makes Snowplow unique is that we actually report bad data: any data that hits the Snowplow pipeline and fails to be processed successfully. This is incredibly valuable, because it means you can: Spot data tracking issues that emerge, quickly, and address them at source Have a corresponding high degree of confidence that trends in the data reflect trends in the business and not data issues Recently we extended Snowplow so...

Snowplow 77 Great Auk released with EMR 4.x series support

28 February 2016  •  Fred Blundun
Snowplow release 77 Great Auk is now available! This release focuses on the command-line applications used to orchestrate Snowplow, bringing Snowplow up-to-date with the new 4.x series of Elastic MapReduce releases. Elastic MapReduce AMI 4.x series compatibility Moving towards running Storage Loader on Hadoop Retrying the job in the face of bootstrap failures Monitoring improvements Removal of snowplow-emr-etl-runner.sh and snowplow-storage-loader.sh Bug fixes and other improvements Upgrading Roadmap Getting help 1. Elastic MapReduce AMI 4.x series...

Building first and last touch attribution models in Redshift SQL

22 February 2016  •  Yali Sassoon
In order to calculate the return on marketing spend on individual campaigns, digital marketers need to connect revenue events, downstream in a user journey, with marketing touch events, upstream in a user journey. This connection is necessary so that the cost of the marketing campaigns that drove those marketing touches can be connected to the profit associated with the conversion events later on. Different attribution models involve applying different logic to connecting those...
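The core idea - credit a conversion to either the earliest or the latest marketing touch that preceded it - can be sketched outside of SQL too. Below is a minimal, hypothetical Python illustration (the event tuples and channel names are invented for the example, not Snowplow's actual enriched event format):

```python
from collections import defaultdict

def attribute(events, model="last"):
    """Credit each converting user's conversion to their first or last
    marketing touch. `events` are hypothetical (user_id, timestamp,
    event_type, channel) tuples, not real Snowplow enriched events."""
    touches = defaultdict(list)
    conversions = set()
    # Walk events in time order, collecting each user's touches.
    for user, ts, etype, channel in sorted(events, key=lambda e: e[1]):
        if etype == "marketing_touch":
            touches[user].append(channel)
        elif etype == "conversion":
            conversions.add(user)
    credit = defaultdict(int)
    for user in conversions:
        if touches[user]:
            chan = touches[user][0] if model == "first" else touches[user][-1]
            credit[chan] += 1
    return dict(credit)

events = [
    ("u1", 1, "marketing_touch", "search"),
    ("u1", 2, "marketing_touch", "email"),
    ("u1", 3, "conversion", None),
    ("u2", 1, "marketing_touch", "social"),
    ("u2", 2, "conversion", None),
]
```

Swapping `model` between `"first"` and `"last"` is exactly the "different logic" the post refers to; in Redshift you would express the same choice with window functions over the events table.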

The inaugural Snowplow Budapest Meetup is a wrap!

19 February 2016  •  Yali Sassoon
Two and a half weeks ago the Snowplow team was out in the beautiful city of Budapest for the inaugural Snowplow Meetup Budapest. The event was kicked off with an awesome presentation from Gabor Ratky, CTO at Secret Sauce Partners, one of our earliest adopters. Gabor discussed the journey that the Secret Sauce team had been on, and how they came to Snowplow after trying a number of alternatives including Mixpanel, Kissmetrics and building...

How RJMetrics measures content engagement with Snowplow: a case study

16 February 2016  •  Yali Sassoon
This is a guest post written by Drew Banin from RJMetrics, on how the RJMetrics team uses Snowplow internally to measure and optimize their content marketing. Big thanks to Drew for sharing this with us and the wider Snowplow community! If you have a story to share, get in touch. One of the major headaches of content marketing is the shortcomings of traditional success measurement. While many marketers quietly obsess over traffic and social shares,...

Schema Guru 0.5.0 released

11 February 2016  •  Anton Parkhomenko
We are pleased to announce the releases of Schema Guru 0.5.0 and Schema DDL 0.3.0, with JSON Schema and Redshift DDL processing enhancements and several bug fixes. This release post will cover the following topics: More git-friendly DDL files Added Java interoperability Fixed DDL file version bug Improvements in Schema-to-DDL transformation Upgrading Getting help Plans for future releases 1. More git-friendly DDL files Usually Schema Guru users store their DDL files along with their JSON...

Building robust data pipelines that cope with AWS outages and other major catastrophes

10 February 2016  •  Yali Sassoon
At Snowplow, we pride ourselves on building robust data pipelines. Recently that robustness has been severely tested, by two different outages in the AWS us-east-1 region (one S3 outage, and one DynamoDB outage that caused issues with very many other AWS APIs including EC2), and by an SSL certificate issue with one of our client’s collectors that meant that for five consecutive days no events were successfully recorded from their most important platform: iOS. In...

Snowplow Meetups set for New York and Boston this spring!

27 January 2016  •  Yali Sassoon
Hot on the heels of last month’s Snowplow Meetup Sydney and next month’s Snowplow Meetup Budapest, we are very excited to announce Snowplow Meetups in March in New York and Boston. The New York Meetup will take place on March 30th at the TripAdvisor offices. We’ve got Animoto’s Criz Posadas talking about Snowplow at Animoto, and Ben Hoyt of Oyster.com will talk about how they came to Snowplow, how they set it up (configuration...

Snowplow 76 Changeable Hawk-Eagle released

26 January 2016  •  Alex Dean
We are pleased to announce the release of Snowplow 76 Changeable Hawk-Eagle. This release introduces an event de-duplication process which runs on Hadoop, and also includes an important bug fix for our recent SendGrid webhook support (#2328). Here are the sections after the fold: Event de-duplication in Hadoop Shred SendGrid webhook bug fix Upgrading Roadmap and contributing Getting help 1. Event de-duplication in Hadoop Shred 1.1 Event duplicates 101 Duplicate events are an unfortunate fact...
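The essence of de-duplication on an event fingerprint (introduced in R71) is to keep only the first occurrence of each identical event. A minimal Python sketch of that idea - the dict-based event shape here is invented for illustration, not the actual Hadoop Shred implementation:

```python
def deduplicate(events):
    """Drop natural duplicates: keep only the first event seen for each
    (event_id, fingerprint) pair. `events` is a list of hypothetical
    dicts with 'event_id' and 'fingerprint' keys."""
    seen = set()
    out = []
    for ev in events:
        key = (ev["event_id"], ev["fingerprint"])
        if key not in seen:      # first time we see this exact event
            seen.add(key)
            out.append(ev)
    return out
```

In the real pipeline this grouping happens at scale across the whole batch on Hadoop, but the keep-first-per-fingerprint logic is the same.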

Snowplow Objective-C Tracker 0.6.0 released

18 January 2016  •  Joshua Beemster
We are pleased to release version 0.6.0 of the Snowplow Objective-C Tracker. This release refactors the event tracking API, introduces tvOS support and fixes an important bug with client sessionization (#257). Many thanks to community member Jason for his contributions to this release! In the rest of this post we will cover: Event batching Event creation API updates Geolocation context iOS 9.0 and XCode 7 changes tvOS support Demonstration app Other changes Upgrading Getting help...

Web and mobile data only gets you to first base when building a single customer view

17 January 2016  •  Yali Sassoon
One of the main reasons that companies adopt Snowplow is to build a single customer view. For many of our users, Snowplow lets them for the first time join behavioral data gathered from their website and mobile apps with other customer data sets (e.g. CRM). This simple step drives an enormous amount of value. However, this is just the beginning. Most companies engage with users on a very large number of channels - not just...

Bauer and Digdeep presentations from the second Snowplow Analytics Sydney meetup

12 January 2016  •  Yali Sassoon
Last month Josh returned to Sydney, where he organised the second Snowplow Analytics Meetup Sydney event. Simon Rumble kicked off the event with a detailed presentation (and demo): Snowplow drives everything we do. You can view Simon’s presentation below, or the original Google Doc here. Simon described the Snowplow journey that Bauer Australia had gone on, from using it as an experimental tool to using it at the heart of their dashboarding, analytics, trending and...

We need to talk about bad data

07 January 2016  •  Yali Sassoon
Architecting data pipelines for data quality. No one in digital analytics talks about bad data. A lot about working with data is sexy, but managing bad data, i.e. working to improve data quality, is not. Not only is talking about bad data not sexy, it is really awkward, because it forces us to confront a hard truth: that our data is not perfect, and therefore the insight that we build on that data might not...

Snowplow 75 Long-Legged Buzzard released with support for Urban Airship Connect and SendGrid

02 January 2016  •  Ed Lewis
We are pleased to announce the immediate availability of Snowplow 75 Long-Legged Buzzard. This release lets you warehouse the event streams generated by Urban Airship and SendGrid, and also updates our web-recalculate data model. The new webhook integrations are as follows: Urban Airship - for tracking mobile app-related events from Urban Airship using the new Urban Airship Connect product SendGrid - for tracking email-related events delivered by SendGrid via SendGrid webhooks Here are the sections...

Anton Parkhomenko is a Snowplower!

25 December 2015  •  Alex Dean
Astute readers of this blog have probably noticed a regular new author - we are hugely excited to introduce Anton Parkhomenko to the Snowplow team! Anton joined us as a Data Engineering intern this summer to launch our new Schema Guru project. Anton was already an experienced software engineer; for him the Snowplow internship was about getting his first professional experience in Scala and Functional Programming, plus gaining exposure to Big Data technologies and open...

Looking back on 2015: Most read blogposts

24 December 2015  •  Christophe Bogaert
2015 is drawing to a close, so we decided to crunch our own numbers in Redshift and share which blogposts were read the most. The Snowplow team published 82 new posts in 2015 and more than 2953 hours were spent reading content on our blog (a metric which we calculated using page pings). Apache Spark and AWS Lambda were the topics that resonated most with our readers. We will continue to write about both topics,...

Snowplow 74 European Honey Buzzard with Weather Enrichment released

22 December 2015  •  Anton Parkhomenko
We are pleased to announce the release of Snowplow release 74 European Honey Buzzard. This release adds a Weather Enrichment to the Hadoop pipeline - making Snowplow the first event analytics platform with built-in weather analytics! The rest of this post will cover the following topics: Introducing the weather enrichment Configuring the weather enrichment Upgrading Getting help Upcoming releases 1. Introducing the weather enrichment Snowplow has a steadily growing collection of configurable event enrichments -...

Scala Weather 0.1.0 released

13 December 2015  •  Anton Parkhomenko
We are pleased to announce the release of Scala Weather version 0.1.0. Scala Weather is a high-performance Scala library for fetching historical, forecast and current weather data from the OpenWeatherMap API. We are pleased to be working with OpenWeatherMap.org, Snowplow’s third external data provider after MaxMind and Open Exchange Rates. This release post will cover the following topics: Why we wrote this library Usage The cache client Getting help Plans for next release 1. Why...

Snowplow 73 Cuban Macaw released

04 December 2015  •  Fred Blundun
Snowplow release 73 Cuban Macaw is now generally available! This release adds the ability to automatically load bad rows from the Snowplow Elastic MapReduce jobflow into Elasticsearch for analysis, and formally separates the Snowplow enriched event format from the TSV format used to load Redshift. The rest of this post will cover the following topics: Loading bad rows into Elasticsearch Changes to the event format loaded into Redshift and Postgres Improved Hadoop job performance Better...

SQL Runner 0.4.0 released

03 December 2015  •  Joshua Beemster
We are pleased to announce version 0.4.0 of SQL Runner. SQL Runner is an open source app, written in Go, that makes it easy to execute SQL statements programmatically as part of a Snowplow data pipeline. This release adds some powerful new features to SQL Runner - many thanks to community member Alessandro Andrioni for his huge contributions towards yet another release! Consul support Dry run mode Environment variables template function File loading order Upgrading...

Data modeling in Spark (Part 1): Running SQL queries on DataFrames in Spark SQL

02 December 2015  •  Christophe Bogaert
An updated version of this blogpost was posted to Discourse. We have been thinking about Apache Spark for some time now at Snowplow. This blogpost is the first in a series that will explore data modeling in Spark using Snowplow data. It’s similar to Justine’s write-up and covers the basics: loading events into a Spark DataFrame on a local machine and running simple SQL queries against the data. Data modeling is a critical step in...

Inaugural Snowplow Meetup Budapest set for February 2nd 2016

23 November 2015  •  Yali Sassoon
We are enormously excited to announce that the first Snowplow Meetup Budapest has been scheduled for February 2nd, 2016. You can sign up here. The event is being hosted by the awesome team at Secret Sauce Partners. The Secret Sauce team have been long time Snowplow users: as well as hosting the event they will be speaking about their Snowplow story and plans going forwards, including diving into how they use Snowplow for A/B testing...

Schema Guru 0.4.0 with Apache Spark support released

17 November 2015  •  Anton Parkhomenko
We are pleased to announce the release of Schema Guru version 0.4.0 with Apache Spark support, new features in both schema and ddl subcommands, bug fixes and other enhancements. In support of this, we have also released version 0.2.0 of the schema-ddl library, with Scala 2.11 support, Amazon Redshift COMMENT ON and a more precise schema-to-DDL transformation algorithm. This release post will cover the following topics: Apache Spark support Predefined enumerations Comments on Redshift table...

Unified Log London 4 on Apache ZooKeeper and analytics on write with AWS Lambda

13 November 2015  •  Alex Dean
Last week we held the fourth Unified Log London meetup here in London. As always, huge thanks to Simone Basso and the Just Eat team for hosting us in their offices and keeping us all fed with pizza and beer! More on the event after the jump: There were two talks at the meetup: Flavio Junqueira from Confluent and an Apache ZooKeeper PMC member and contributor, gave an excellent talk introducing ZooKeeper I gave a...

The second Snowplow meetup in Sydney set for December 15

10 November 2015  •  Yali Sassoon
We are enormously excited to announce that the second Snowplow Analytics meetup Sydney will take place on December 15th. Sign up today! Final details are still being confirmed. However, we can already announce two awesome speakers: Simon Rumble, Head of Data, Analytics and CRM at Bauer Media Australia, and Narbeh Yousefian, cofounder at Digdeep digital, will both be giving talks. Both are incredibly knowledgeable about Snowplow and the digital analytics space in general. The event...

The Crunch Practical Big Data Conference Budapest was awesome - thank you!

09 November 2015  •  Yali Sassoon
A couple of weeks ago I was very lucky to attend, and speak at Crunch Conference, a practical big data conference in Budapest, organised by the folks at Ustream and Prezi, and headlined by some of the titans of the data industry, including Doug Cutting, the creator of Hadoop (not to mention Lucene and Nutch) and Martin Kleppmann, the creator of Samza. Emerging best practices in event data pipelines Being invited to speak gave me...

SQL Runner 0.3.0 released

05 November 2015  •  Joshua Beemster
We are pleased to announce version 0.3.0 of SQL Runner. SQL Runner is an open source app, written in Go, that makes it easy to execute SQL statements programmatically as part of a Snowplow data pipeline. This release adds some powerful new features to SQL Runner - many thanks to community member Alessandro Andrioni for his huge contributions towards this release! For the first time, we are also publishing SQL Runner binaries for Windows and...

The inaugural Snowplow meetup in San Francisco is a wrap!

22 October 2015  •  Yali Sassoon
Just over two weeks ago I met with 50 Snowplow users and prospective users at Tilt’s gorgeous offices in San Francisco for the first Snowplow meetup to take place in San Francisco. There we were very privileged to hear three fantastic talks: from Pete O’Leary at Chefsfeed, Jackson Wang from Tilt and Nora Paymer from StumbleUpon. Chefsfeed’s Pete O’Leary opened the event with a detailed look at how the Chefsfeed team use Snowplow to understand...

Iglu Objective-C Client 0.1.0 released

19 October 2015  •  Joshua Beemster
We are pleased to announce the release of version 0.1.0 of the Iglu Objective-C Client. This is the second Iglu client to be released (following the Iglu Scala Client) and will allow you to test and validate all of your Snowplow self-describing JSONs directly in your OS X and iOS applications. The rest of this post will cover the following topics: How to install the client How to use the client Why you should use...

Snowplow 72 Great Spotted Kiwi released

15 October 2015  •  Alex Dean
We are pleased to announce the release of Snowplow version 72 Great Spotted Kiwi. This release adds the ability to track clicks through the Snowplow Clojure Collector, adds a cookie extractor enrichment and introduces new deduplication queries leveraging R71’s event fingerprint. The rest of this post will cover the following topics: Click tracking New cookie extractor enrichment New deduplication queries Upgrading Getting help Upcoming releases 1. Click tracking Although the Snowplow JavaScript Tracker offers link...

Snowplow Scala Tracker 0.2.0 released

14 October 2015  •  Anton Parkhomenko
We are pleased to release version 0.2.0 of the Snowplow Scala Tracker. This release introduces a new custom context with EC2 instance metadata, a batch-based emitter, new tracking methods and one breaking API change. In the rest of this post we will cover: EC2 custom context Batch emitter New track methods Device sent timestamp Other updates Bug fixes Upgrading Getting help 1. EC2 custom context On any AWS EC2 instance, you can access basic information...

Orchestrating batch processing pipelines with cron and make

13 October 2015  •  Alex Dean
At Snowplow we are often asked how best to orchestrate multi-stage ETL pipelines, where these pipelines typically include Snowplow and our SQL Runner, sometimes Huskimo and often third-party apps and scripts. There is a wide array of tools available for this kind of orchestration, including AWS Data Pipeline, Luigi, Chronos, Jenkins and Airflow. These tools tend to have the following two capabilities: A job-scheduler, which determines when each batch processing job will run A DAG-runner,...
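The DAG-runner half of that pattern - run each task only once all of its prerequisites have completed - is what `make` provides for file targets. A minimal, hypothetical sketch of the same idea in Python (task and dependency names are invented; this is Kahn's topological-sort algorithm, not any of the tools named above):

```python
from collections import deque

def run_dag(tasks, deps):
    """Run tasks in dependency order using Kahn's algorithm.
    tasks: {name: zero-argument callable}
    deps:  {name: [names that must complete first]}"""
    indegree = {t: len(deps.get(t, [])) for t in tasks}
    dependents = {t: [] for t in tasks}
    for task, prereqs in deps.items():
        for p in prereqs:
            dependents[p].append(task)
    # Tasks with no outstanding prerequisites are ready to run.
    ready = deque(t for t, d in indegree.items() if d == 0)
    order = []
    while ready:
        t = ready.popleft()
        tasks[t]()               # run the task itself
        order.append(t)
        for d in dependents[t]:  # unblock anything waiting on it
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)
    if len(order) != len(tasks):
        raise ValueError("cycle detected in DAG")
    return order
```

A cron entry then supplies the job-scheduler half, invoking the runner on a fixed schedule.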

Snowplow Node.js Tracker 0.2.0 released

09 October 2015  •  Fred Blundun
Version 0.2.0 of the Snowplow Node.js Tracker is now available! This release changes the Tracker’s architecture and adds the ability to send Snowplow events via either GET or POST. Read on for more information… Emitters Vagrant quickstart Getting help 1. Emitters This release brings the Node.js Tracker’s API closer to those of other trackers with the addition of Emitters, objects which control how and when the events created by the Tracker are sent to the...

Snowplow Unity Tracker 0.1.0 released

08 October 2015  •  Joshua Beemster
We are pleased to announce the release of our much-requested Snowplow Unity Tracker. This Tracker rounds out our support for popular mobile environments, and is an important part of our analytics offering for videogame companies. The Tracker is designed to work completely asynchronously within your Unity code to provide great performance in your games, even under heavy load. In the rest of this post we will cover: How to install the tracker How to use...

Snowplow 71 Stork-Billed Kingfisher released

02 October 2015  •  Fred Blundun
We are pleased to announce the release of Snowplow version 71 Stork-Billed Kingfisher. This release significantly overhauls Snowplow’s handling of time and introduces event fingerprinting to support deduplication efforts. It also brings our validation of unstructured events and custom context JSONs “upstream” from our Hadoop Shred process into our Hadoop Enrich process. The rest of this post will cover the following topics: Better handling of event time JSON validation in Scala Common Enrich New unstructured...

Samza Scala example project released

30 September 2015  •  Alex Dean
We are pleased to announce the release of our new Samza Scala Example Project! This is a simple stream processing job written in Scala for the Apache Samza framework, processing JSON events from an Apache Kafka topic and regularly emitting aggregates to a second Kafka topic: This project was built by the Data Engineering team at Snowplow Analytics as a proof-of-concept for porting the Snowplow Enrichment process (which is written in Scala) to Samza. Read...

Improving Snowplow's understanding of time

15 September 2015  •  Alex Dean
As we evolve the Snowplow platform, one area we keep coming back to is our understanding and handling of time. The time at which an event took place is a crucial fact for every event - but it’s surprisingly challenging to determine accurately. Our approach to date has been to capture as many clues as to the “true timestamp” of an event as we can, and record these faithfully for further analysis. The steady expansion...

Snowplow Java Tracker 0.8.0 released

14 September 2015  •  Joshua Beemster
We are pleased to release version 0.8.0 of the Snowplow Java Tracker. This release introduces several performance upgrades and a complete rework of the API. Many thanks to David Stendardi from Viadeo for his contributions! In the rest of this post we will cover: API updates Emitter changes Performance Changing the Subject Other improvements Upgrading Documentation Getting help 1. API updates This release introduces a host of API changes to make the Tracker more modular...

SQL Runner 0.2.0 released

13 September 2015  •  Alex Dean
We are pleased to announce version 0.2.0 of SQL Runner. SQL Runner is an open source app, written in Go, that makes it easy to execute SQL statements programmatically as part of the Snowplow data pipeline. To use SQL Runner, you assemble a playbook, i.e. a YAML file that lists the different .sql files to be run and the database they are to be run against. It is possible to specify which sequence the files...

The first Snowplow meetup in San Francisco announced!

04 September 2015  •  Yali Sassoon
We are super excited to announce the first Snowplow meetup in San Francisco, this October. Exact details, including date and talk topics, are still to be finalized. I can reveal that: It will be hosted by our friends at Tilt.com It will take place one evening in the week beginning October 5th We’ll have talks from people at Tilt and Chefsfeed To keep up to date with the details as they’re finalized, please sign up...

Snowplow Objective-C Tracker 0.5.0 released

03 September 2015  •  Joshua Beemster
We are pleased to release version 0.5.0 of the Snowplow Objective-C Tracker. This release introduces client sessionization, several performance upgrades and some breaking API changes. In the rest of this post we will cover: Client sessionization Tracker performance Event decoration API changes Demonstration app Other changes Upgrading Getting help 1. Client sessionization This release lets you add a new client_session context to each of your Snowplow events, allowing you to easily group events from a...

Huskimo 0.3.0 released: warehouse your Twilio telephony data in Redshift

30 August 2015  •  Alex Dean
We are pleased to announce the release of Huskimo 0.3.0, for companies who use Twilio and would like to analyze their telephony data in Amazon Redshift, alongside their Snowplow event data. For readers who missed our Huskimo introductory post: Huskimo is a new open-source project which connects to third-party SaaS platforms (Singular and now Twilio), exports their data via API, and then uploads that data into your Redshift instance. Huskimo is a complement to Snowplow’s...

Kinesis S3 0.4.0 released with gzip support

26 August 2015  •  Joshua Beemster
We are pleased to announce the release of Kinesis S3 version 0.4.0. Many thanks to Kacper Bielecki from Avari for his contribution to this release! Table of contents: gzip support Infinite loops Safer record batching Bug fixes Upgrading Getting help 1. gzip support Kinesis S3 now supports gzip as a second storage/compression option for the files it writes out to S3. Using this format, each record is treated as a byte array containing a UTF-8...
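The gzip option described above treats each record as a UTF-8 byte array. As a rough illustration of that idea (a Python sketch under assumed record contents and newline framing, not the sink's actual implementation):

```python
import gzip

# Hypothetical enriched-event records: each one a UTF-8 string.
# Joining with newlines before compression is an assumed framing.
records = ["event-1\tpage_view", "event-2\tlink_click"]
payload = "\n".join(records).encode("utf-8")

# Compress for storage, then round-trip to recover the records.
compressed = gzip.compress(payload)
restored = gzip.decompress(compressed).decode("utf-8").split("\n")

assert restored == records
```

Writing gzipped files this way trades a little CPU at write time for substantially smaller S3 objects, which downstream batch jobs can read natively.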

AWS Lambda Scala example project released

20 August 2015  •  Vincent Ohprecio
We are pleased to announce the release of our new AWS Lambda Scala Example Project! This is a simple time series analysis stream processing job written in Scala for AWS Lambda, processing JSON events from Amazon Kinesis and writing aggregates to Amazon DynamoDB. AWS Lambda can help you jumpstart your own real-time event processing pipeline, without having to set up and manage clusters of server infrastructure. We will take you through the steps to get this...

Snowplow 70 Bornean Green Magpie released

19 August 2015  •  Fred Blundun
We are happy to announce the release of Snowplow version 70 Bornean Green Magpie. This release focuses on improving our StorageLoader and EmrEtlRunner components and is the first step towards combining the two into a single CLI application. The rest of this post will cover the following topics: Combined configuration Move to JRuby Improved retry logic App monitoring with Snowplow Compression support Loading Postgres via stdin Multiple in buckets New safety checks Other changes Upgrading...

The inaugural Snowplow Meetup Berlin is a wrap!

19 August 2015  •  Yali Sassoon
On Tuesday evening last week, the first Snowplow Meetup Berlin event took place at Betahaus. Over 60 people turned up to listen to three talks: Sixtine Vervial at Goeuro walked through a methodology for measuring the ROI of TV campaigns using event-level data Christian Schäfer at Sparwelt discussed how to model Snowplow data in Redshift, including a SQL deep dive Christian Lubash at LeROI gave an overview of the role Snowplow can play in broader...

Dealing with duplicate event IDs

19 August 2015  •  Christophe Bogaert
The Snowplow pipeline outputs a data stream in which each line represents a single event. Each event comes with an identifier, the event ID, which was generated by the tracker and is—or rather should be—unique. However, after having used Snowplow for a while, users often notice that some events share an ID. Events are sometimes duplicated within the Snowplow pipeline itself, but it’s often the client-side environment that causes events to be sent in with...
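The R71 event fingerprint makes it possible to tell exact duplicates apart from ID collisions. A minimal Python sketch of that distinction (field names and values here are hypothetical, not Snowplow's actual schema):

```python
from collections import defaultdict

# Hypothetical events as (event_id, event_fingerprint) pairs. Same ID and
# same fingerprint means a true duplicate; same ID with a different
# fingerprint means two distinct events collided on one ID.
events = [
    ("e1", "fp-a"),  # original
    ("e1", "fp-a"),  # exact duplicate - safe to drop
    ("e2", "fp-b"),
    ("e2", "fp-c"),  # same ID, different payload - keep both, flag the ID
]

seen = set()
deduplicated = []
fingerprints_by_id = defaultdict(set)
for event_id, fingerprint in events:
    fingerprints_by_id[event_id].add(fingerprint)
    if (event_id, fingerprint) not in seen:
        seen.add((event_id, fingerprint))
        deduplicated.append((event_id, fingerprint))

# IDs shared by events with differing fingerprints are collisions.
colliding_ids = {eid for eid, fps in fingerprints_by_id.items() if len(fps) > 1}
```

The same logic is typically expressed in SQL over the enriched events table, grouping on the ID and fingerprint columns.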

Snowplow Objective-C Tracker 0.4.0 released

16 August 2015  •  Joshua Beemster
We are pleased to release version 0.4.0 of the Snowplow Objective-C Tracker. Many thanks to Alex Denisov from Blacklane, James Duncan Davidson from Wunderlist, Agarwal Swapnil and Hao Lian for their huge contributions to this release! In the rest of this post we will cover: Tracker performance Emitter callback Static library Demonstration app Other changes Upgrading Getting help 1. Tracker performance This release brings a complete rework of how the tracker sends events to address...

Snowplow Ruby Tracker 0.5.0 released

11 August 2015  •  Fred Blundun
We are happy to announce the release of version 0.5.0 of the Snowplow Ruby Tracker. As well as making the Tracker more robust, this release introduces several breaking API changes. Read on for more detail on: Improved concurrency More robust error handling The SelfDescribingJson class New setFingerprint method Upgrading Getting help 1. Improved concurrency The Ruby Tracker’s AsyncEmitter class now uses the Queue class to implement the producer-consumer pattern, where a fixed pool of threads...

Snowplow Python Tracker 0.7.0 released

07 August 2015  •  Fred Blundun
We are pleased to announce the release of version 0.7.0 of the Snowplow Python Tracker. This release is focused on making the Tracker more robust. The rest of this post will cover: Better concurrency Better error handling The SelfDescribingJson class Unicode support Upgrading Getting help 1. Better concurrency The Python Tracker’s AsyncEmitter now uses the Queue class to implement the producer-consumer pattern where a fixed pool of threads work on sending events. Reusing threads this...
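The producer-consumer pattern mentioned above can be sketched with the standard library alone; this is an illustrative stand-in for an async emitter, not the tracker's actual code (the event shape and pool size are assumptions):

```python
import queue
import threading

# A fixed pool of worker threads drains a shared queue of events.
event_queue = queue.Queue()
sent = []
sent_lock = threading.Lock()

def worker():
    while True:
        event = event_queue.get()
        if event is None:          # sentinel: shut this worker down
            event_queue.task_done()
            break
        with sent_lock:
            sent.append(event)     # stand-in for an HTTP send
        event_queue.task_done()

pool = [threading.Thread(target=worker) for _ in range(4)]
for t in pool:
    t.start()

for i in range(100):               # producer side: track events
    event_queue.put({"eid": i})
for _ in pool:                     # one sentinel per worker
    event_queue.put(None)

event_queue.join()
for t in pool:
    t.join()
```

Reusing a fixed pool avoids the cost of spawning a new thread per event, which is the robustness gain the release notes describe.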

Issue with Elastic Beanstalk Tomcat container for Clojure Collector users - diagnosis and resolution

31 July 2015  •  Yali Sassoon
A few weeks ago one of our users reported that they were consistently missing data between 1am and 2am UTC. We investigated the issue and found that their Clojure Collector was not successfully logging data in that hour. Working with engineers at AWS we identified the cause of the issue. At some stage (we cannot confirm exactly when) Amazon released a new Elastic Beanstalk Tomcat container version which had a bug related to the anacron...

Schema Guru 0.3.0 released for generating Redshift tables from JSON Schemas

29 July 2015  •  Anton Parkhomenko
We are pleased to announce the release of Schema Guru 0.3.0 and Schema DDL 0.1.0, our tools to work with JSON Schemas. This release post will cover the following new topics: Meet the Schema DDL library Commands and CLI changes Overview of the ddl command ddl command for Snowplow users Advanced options for ddl command Upgrading Getting help Plans for next release 1. Meet the Schema DDL library Schema DDL is a new Scala library...

Snowplow Android Tracker 0.5.0 released

28 July 2015  •  Joshua Beemster
We are pleased to announce the release of the Snowplow Android Tracker version 0.5.0. The Tracker has undergone a series of performance improvements, plus the addition of client-side sessionization. This release post will cover the following topics: Client-side sessionization Tracker performance Event building Other changes Demo app Documentation Getting help 1. Client-side sessionization This release lets you add a new client_session context to each of your Snowplow events, allowing you to easily group events from...

Snowplow 69 Blue-Bellied Roller released with new and updated SQL data models

24 July 2015  •  Christophe Bogaert
We are pleased to announce the release of Snowplow 69, Blue-Bellied Roller, which contains new and updated SQL data models. The blue-bellied roller is a beautiful African bird that breeds in a narrow belt from Senegal to the northeast of the Congo. It has a dark green back, a white head, neck and breast, and a blue belly and tail. This post covers: Updated data model: incremental New data model: mobile New data model: deduplicate...

Snowplow 68 Turquoise Jay released

23 July 2015  •  Fred Blundun
We are happy to announce the release of Snowplow 68, Turquoise Jay. This is a small release which adapts the EmrEtlRunner to use the new Elastic MapReduce API. Table of contents: Updates to the Elastic MapReduce API Multiple “in” buckets Backwards compatibility with old Hadoop Enrich versions Upgrading Getting help 1. Updates to the Elastic MapReduce API The Snowplow EmrEtlRunner uses Rob Slifka’s Elasticity Ruby library to interact with the Elastic MapReduce API. AWS recently...

Snowplow JavaScript Tracker 2.5.0 released

22 July 2015  •  Fred Blundun
We are excited to announce the release of version 2.5.0 of the Snowplow JavaScript Tracker! Among other things, this release adds new IDs for sessions and pageviews, making rich in-page and in-session analytics easier. Read on for more information: The session ID The page view ID Context-generating functions New Grunt task Breaking change to trackPageView Breaking change to session cookie timeouts Upgrading Documentation and help 1. The session ID In April, Snowplow Release 63 Red-Cheeked...

Snowplow 67 Bohemian Waxwing released

13 July 2015  •  Joshua Beemster
We are pleased to announce the release of Snowplow 67, Bohemian Waxwing. This release brings a host of upgrades to our real-time Amazon Kinesis pipeline as well as the embedding of Snowplow tracking into this pipeline. Table of contents: Embedded Snowplow tracking Handling outsized event payloads More informative bad rows Improved Vagrant VM New Kinesis S3 repository Other changes Upgrading Getting help 1. Embedded Snowplow tracking Both Scala Kinesis Enrich and Kinesis Elasticsearch Sink now...

AWS Lambda Node.js example project released

11 July 2015  •  Vincent Ohprecio
We are pleased to announce the release of our new AWS Lambda Node.js Example Project! This is a simple time series analysis stream processing job written in Node.js for AWS Lambda, processing JSON events from Amazon Kinesis and writing aggregates to Amazon DynamoDB. AWS Lambda can help you jumpstart your own real-time event processing pipeline, without having to set up and manage clusters of server infrastructure. We will take you through the steps to get...

Introducing our 2015 Snowplow summer interns

10 July 2015  •  Alex Dean
You have probably seen some new names and faces around the Snowplow blog and GitHub repos recently - we are hugely excited to extend a warm (if somewhat belated) welcome to our three Snowplow summer interns! In this blog post we’ll introduce all three interns to the Snowplow community, as well as giving a little more background on the projects they are working on. This is the fourth instalment of our internship program for open source...

The inaugural Snowplow meetup in Berlin event to take place on August 11

10 July 2015  •  Yali Sassoon
I am really delighted to announce the first Snowplow meetup in Berlin will be taking place on August 11th. We’ve now had Snowplow meetups in London, Sydney and Amsterdam. These events have been great because Snowplow users tend to be very sophisticated data consumers - so the meetups provide a good opportunity to share ideas and approaches to answering questions with event-level data, as well as a good forum to debate different analytic and technical...

Kinesis S3 0.3.0 released

07 July 2015  •  Joshua Beemster
We are pleased to announce the release of Kinesis S3 version 0.3.0. This release greatly improves the speed, efficiency, and reliability of Snowplow’s real-time S3 sink for Kinesis streams. Table of contents: Embedded Snowplow tracking Optimization and efficiency More informative bad rows Improved Vagrant VM Other changes Upgrading Getting help 1. Embedded Snowplow tracking This release brings with it the ability to record Snowplow events from within the sink application itself. These events include a...

Schema Guru 0.2.0 released with brand-new web UI and support for self-describing JSON Schema

05 July 2015  •  Anton Parkhomenko
Almost a month has passed since the first release of Schema Guru, our tool for deriving JSON Schemas from multiple JSON instances. That release was something of a proof-of-concept - in this 0.2.0 release we are adding much richer functionality, plus deeper integration with the Snowplow platform. This release post will cover the following new features: Web UI Newline-delimited JSON Duplicated keys warning Base64 pattern Enums Schema segmentation Self-describing schemas Upgrading Getting help Plans for...

Analyzing marketing attribution data with a D3.js visualization

02 July 2015  •  Justine Courty
Marketing attribution, as in understanding what impact different marketing channels have in driving conversion, is a very complex problem: We have no way of directly measuring the impact of an individual channel on a user’s propensity to convert It is not uncommon for users to interact with many channels prior to converting It is likely that different channels impact each other’s effectiveness Because of this difficulty, there is not yet a consensus in digital analytics...

Snowplow Android Tracker 0.4.0 released

22 June 2015  •  Joshua Beemster
We are pleased to announce the release of the fourth version of the Snowplow Android Tracker. The Tracker has undergone a series of changes in light of the issues around the Android dex limit, resulting in the library being split in two, allowing users to either use an RxJava-based version of the tracker, or a “classic” version using a standard Java threadpool. Big thanks to Duncan at Wunderlist for his work on splitting apart the...

Huskimo 0.2.0 released: warehouse your Singular marketing spend data in Redshift

21 June 2015  •  Alex Dean
We are pleased to announce Huskimo, an all-new open-source product from the Snowplow team. This initial release of Huskimo is for companies who use Singular to manage their mobile marketing campaigns, and would like to analyze their Singular marketing spend data in Amazon Redshift, alongside their Snowplow event data. Although this is version 0.2.0 of Huskimo, this is the first publicized release, and so we will take some time in this blog post to explain...

Snowplow 66 Oriental Skylark released

16 June 2015  •  Alex Dean
We are pleased to announce the release of Snowplow 66, Oriental Skylark. This release upgrades our Hadoop Enrichment process to run on Hadoop 2.4, re-enables our Kinesis-Hadoop lambda architecture and also introduces a new scriptable enrichment powered by JavaScript - our most powerful enrichment yet! Table of contents: Our enrichment process on Hadoop 2.4 Re-enabled Kinesis-Hadoop lambda architecture JavaScript scripting enrichment Other changes Upgrading Getting help 1. Our enrichment process on Hadoop 2.4 Since the...

Apache Spark Streaming example project released

10 June 2015  •  Vincent Ohprecio
We are pleased to announce the release of our new Apache Spark Streaming Example Project! This is a simple time series analysis stream processing job written in Scala for the Spark Streaming cluster computing platform, processing JSON events from Amazon Kinesis and writing aggregates to Amazon DynamoDB. The Snowplow Apache Spark Streaming Example Project can help you jumpstart your own real-time event processing pipeline. We will take you through the steps to get this simple...

Schema Guru 0.1.0 released for deriving JSON Schemas from JSONs

03 June 2015  •  Anton Parkhomenko
We’re pleased to announce the first release of Schema Guru, a tool for automatically deriving JSON Schemas from a collection of JSON instances. This release is part of a new R&D focus at Snowplow Analytics in improving the tooling available around JSON Schema, a technology used widely in our own Snowplow and Iglu projects. Read on after the fold for: Why Schema Guru? Current features Design principles A fuller example Getting help Roadmap 1. Why...

Snowplow Scala Tracker 0.1.0 released

29 May 2015  •  Fred Blundun
We are pleased to announce the release of the new Snowplow Scala Tracker! This initial release allows you to build and send unstructured events and custom contexts using the json4s library. We plan to move Snowplow towards being “self-hosting” by sending Snowplow events from within our own apps for monitoring purposes; the idea is that you should be able to monitor the health of one deployment of Snowplow by using a second instance. We will...

Unified Log London 3 with Apache Kafka and Samza at State

28 May 2015  •  Alex Dean
Last week we held the third Unified Log London meetup here in London. Huge thanks to Just Eat for hosting us in their offices and keeping us all fed with pizza and beer! More on the event after the jump: There were two talks at the meetup: I gave a recap on the Unified Log “manifesto” for new ULPers, with my regular presentation on “Why your company needs a Unified Log” Mischa Tuffield, CTO at...

First experiments with Apache Spark at Snowplow

21 May 2015  •  Justine Courty
As we talked about in our May post on the Spark Example Project release, at Snowplow we are very interested in Apache Spark for three things: Data modeling i.e. applying business rules to aggregate up event-level data into a format suitable for ingesting into a business intelligence / reporting / OLAP tool Real-time aggregation of data for real-time dashboards Running machine-learning algorithms on event-level data We’re just at the beginning of our journey getting familiar...

The inaugural Snowplow meetup in Amsterdam is a wrap!

19 May 2015  •  Yali Sassoon
Last week Christophe and I headed over to Amsterdam for the first Snowplow meetup in Amsterdam. About 50 data scientists, engineers and analysts joined us at the beautiful Travelbird offices to share ideas and approaches to driving value from Snowplow data. We were very lucky to have three excellent speakers. Niels Reijmer and Andrei Scorus led with a talk about how de Bijenkorf use Snowplow to collect event-level data to generate more detailed customer-level reporting,...

Spark Example Project 0.3.0 released for getting started with Apache Spark on EMR

10 May 2015  •  Alex Dean
We are pleased to announce the release of our Spark Example Project 0.3.0, building on the original release of the project last year. This release is part of a renewed focus on the Apache Spark stack at Snowplow. In particular, we are exploring Spark’s applicability to two Snowplow-specific problem domains: Using Spark and Spark Streaming to implement r64 Palila-style data modeling outside of Redshift SQL Using Spark Streaming to deliver “analytics-on-write” realtime dashboards as part...

Snowplow 65 Scarlet Rosefinch released

08 May 2015  •  Fred Blundun
We are pleased to announce the release of Snowplow 65, Scarlet Rosefinch. This release greatly improves the speed, efficiency, and reliability of Snowplow’s real-time Kinesis pipeline. Table of contents: Enhanced performance CORS support Increased reliability Loading configuration from DynamoDB Randomized partition keys for bad streams Removal of automatic stream creation Improved Elasticsearch index initialization Other changes Upgrading Getting help 1. Enhanced performance Kinesis’ new PutRecords API enabled the biggest performance improvement: rather than sending events...

Christophe Bogaert is a Snowplower!

20 April 2015  •  Yali Sassoon
Snowplow clients who have been working with us on analytics projects, and anyone who’s been keeping up with our releases, will have noticed a new face on the Snowplow team. It is with great pleasure that we introduce Christophe Bogaert to the Snowplow community. Christophe joined us as our first Data Scientist in February. He designed, tested and delivered the data models that are at the heart of last week’s Snowplow v.64 Palila release -...

Snowplow 64 Palila released with support for data models

16 April 2015  •  Christophe Bogaert
We are excited to announce the immediate availability of Snowplow 64, Palila. This is a major release which adds a new data modeling stage to the Snowplow pipeline, as well as fixing a small number of important bugs across the rest of Snowplow. In this post, we will cover: Why model your Snowplow data? Understanding how the data modeling takes place The basic Snowplow data model Implementing the SQL Runner data model Implementing the Looker...

Announcing our summer open source internship program

09 April 2015  •  Alex Dean
Snowplow Analytics is looking for 1-2 open source software interns this Summer (May through August), for a 6-8 week paid internship, building on our previous successful internships in winter 2013/14, summer 2014 and winter 2014/15. Our interns will work directly on and contribute to projects within the Snowplow open source stack. We have lots of ideas for cool projects to do around Snowplow - and if you have any suggestions of your own, we would...

Snowplow meetup set for Amsterdam, May 13th

07 April 2015  •  Yali Sassoon
Hot on the heels of the Snowplow meetups in London and Sydney earlier this year, we are delighted to announce the first Snowplow meetup in the beautiful city of Amsterdam. One of my favorite things about working at Snowplow is that we have some very data-sophisticated users around the world, and I learn an enormous amount from them every day. The meetup groups are a great place for users to come, share what they’re...

Snowplow at the Data Insights meetup in Cambridge

05 April 2015  •  Yali Sassoon
I was very fortunate to be invited to speak at the Data Insights meetup in Cambridge last week, where I gave a talk describing our evolving thinking about event data at Snowplow. The slides I presented are below. The Data Insights meetup is a fantastic group with members working with data in a diverse set of industries and roles. It made for some very interesting questions and discussions afterwards. Big thanks to Sobia for organising the event...

Snowplow 63 Red-Cheeked Cordon-Bleu released

02 April 2015  •  Alex Dean
We are pleased to announce the immediate availability of Snowplow 63, Red-Cheeked Cordon-Bleu. This is a major release which adds two new enrichments, upgrades existing enrichments and significantly extends and improves our Canonical Event Model for loading into Redshift, Elasticsearch and Postgres. The new and upgraded enrichments are as follows: New enrichment: parsing useragent strings using the ua_parser library New enrichment: converting the money amounts in e-commerce transactions into a base currency using Open Exchange...

Snowplow ActionScript 3 Tracker 0.1.0 released

23 March 2015  •  Alex Dean
We are pleased to announce the release of our new Snowplow ActionScript 3 Tracker, contributed by Snowplow customer Viewbix. This is Snowplow’s first customer-contributed tracker - an exciting milestone for us! Huge thanks to Dani, Ephraim, Mark and Nati and the rest of the team at Viewbix for making this tracker a reality. The Snowplow ActionScript 3.0 (AS3) Tracker supports ActionScript 3.0, and lets you add analytics to your Flash Player 9+, Flash Lite 4...

Snowplow 62 Tropical Parula released

17 March 2015  •  Alex Dean
We are pleased to announce the immediate availability of Snowplow 62, Tropical Parula. This release is designed to fix an incompatibility issue between r61’s EmrEtlRunner and some older Elastic Beanstalk configurations. It also includes some other EmrEtlRunner improvements. Many thanks to Snowplow community member Dani Solà from Simply Business for his contribution to this release! Fix to support legacy Beanstalk access logs Custom bootstrap actions Other improvements to EmrEtlRunner Upgrading Getting help 1. Fix to...

Snowplow JavaScript Tracker 2.4.0 released

15 March 2015  •  Fred Blundun
We are pleased to announce the release of version 2.4.0 of the Snowplow JavaScript Tracker! This release adds support for cross-domain tracking and a new method to track timing events. Read on for more information: Tracking users cross-domain Tracking timings Dynamic handling of single-page apps Improved PerformanceTiming context Other improvements Upgrading Documentation and help 1. Tracking users cross-domain Version 2.4.0 of the JavaScript Tracker adds support for tracking users cross-domain. When a user clicks on...

Snowplow JavaScript Tracker 2.3.0 released

03 March 2015  •  Fred Blundun
We are pleased to announce the release of version 2.3.0 of the Snowplow JavaScript Tracker! This release adds a number of new features including the ability to send events by POST rather than GET, some new contexts, and improved automatic form tracking. This blog post will cover the changes in detail. POST support Customizable form tracking Automatic contexts Development quickstart Other improvements Upgrading Documentation and getting help 1. POST support Until now, the JavaScript Tracker...

Snowplow 61 Pygmy Parrot released

02 March 2015  •  Alex Dean
We are pleased to announce the immediate availability of Snowplow 61, Pygmy Parrot. This release has a variety of new features, operational enhancements and bug fixes. The major additions are: You can now parse Amazon CloudFront access logs using Snowplow The latest Clojure Collector version supports Tomcat 8 and CORS, ready for cross-domain POST from JavaScript and ActionScript EmrEtlRunner’s failure handling and Clojure Collector log handling have been improved The rest of this post will...

Joshua Beemster is a Snowplower!

19 February 2015  •  Alex Dean
You have probably started seeing a new name behind software releases and blog posts recently: we are hugely excited to belatedly introduce Joshua Beemster to the Snowplow team! Josh joined us as a Data Engineer last fall. He is our first remote hire - he is currently based in Dijon, France. Josh hails from Australia and is currently taking his Bachelor of Computer Science at Charles Sturt University, Sydney, by distance learning. Since starting at Snowplow,...

Snowplow Android Tracker 0.3.0 released

18 February 2015  •  Joshua Beemster
We are pleased to announce the release of the third version of the Snowplow Android Tracker. The Tracker has undergone a series of changes including removing the dependency on the Java Core Library and a move towards using RxJava as a way of implementing asynchronous background tasks. Big thanks to Hamid at Trello for his suggestions and guidance in using Rx to track events on Android. Please note that version 0.3.0 of the Android Tracker...

Snowplow Objective-C Tracker 0.3.0 released

15 February 2015  •  Alex Dean
We are pleased to release version 0.3.0 of the Snowplow Objective-C Tracker. Many thanks to James Duncan Davidson and atdrendel from 6Wunderkinder, and former Snowplow intern Jonathan Almeida for their huge contributions to this release! In the rest of this post we will cover: Mac OS X support New trackTimingWithCategory event Removed AFNetworking dependency Other API changes Upgrading Getting help 1. Mac OS X support The team at 6Wunderkinder have added Mac OS X support to the...

Snowplow Python Tracker 0.6.0 released

14 February 2015  •  Fred Blundun
We are pleased to announce the release of version 0.6.0.post1 of the Snowplow Python Tracker. This version adds several methods to help identify users by adding client-side data to events. This makes the Tracker more powerful when used in conjunction with a web framework such as Django or Flask. The rest of this post will cover: set_ip_address set_useragent_user_id set_domain_user_id set_network_user_id Improved logging Upgrading and compatibility Other changes Getting help 1. set_ip_address The ip_address field in...

JSON schemas for Redshift datatypes

12 February 2015  •  Fred Blundun
This blog contains JSON schemas for all the data types supported by Amazon Redshift. We supply two schemas for each numeric type, since you may want to send in numeric types as JSON strings rather than JSON numbers. SMALLINT INTEGER BIGINT DECIMAL REAL DOUBLE PRECISION BOOLEAN CHAR VARCHAR DATE TIMESTAMP SMALLINT The schema for passing the value in as a number: { "type": "integer" } And the schema for passing the value in as...
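As a rough Python sketch of the number-vs-string pairing for SMALLINT: the range bounds reflect Redshift's SMALLINT limits (-32768 to 32767), but the string pattern and the hand-rolled checker are illustrative assumptions, not the post's exact schemas or a real JSON Schema validator.

```python
import re

# Variant 1: the value arrives as a JSON number.
smallint_as_number = {
    "type": "integer",
    "minimum": -32768,
    "maximum": 32767,
}

# Variant 2: the value arrives as a JSON string (pattern is an assumption).
smallint_as_string = {
    "type": "string",
    "pattern": "^-?[0-9]+$",
}

def matches(schema, value):
    """Minimal hand-rolled check covering just these two schema shapes."""
    if schema["type"] == "integer":
        return (isinstance(value, int) and not isinstance(value, bool)
                and schema["minimum"] <= value <= schema["maximum"])
    return isinstance(value, str) and re.match(schema["pattern"], value) is not None

assert matches(smallint_as_number, 123)
assert not matches(smallint_as_number, 40000)  # outside SMALLINT range
assert matches(smallint_as_string, "-42")
```

In practice you would validate with a real JSON Schema library and pick whichever variant matches how your trackers serialize numbers.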

Inaugural Snowplow meetup London - a recap

11 February 2015  •  Yali Sassoon
This time last week we held the inaugural Snowplow London meetup. Roughly 50 Snowplow users turned up to listen to two fantastic presentations from Simply Business and Metail on the role Snowplow plays in their data architecture and how they use their Snowplow data. The talks were incredibly interesting, so I’m keen to share them with the wider Snowplow community. I’m also very eager to get feedback so that we can build on this start...

Uploading Snowplow events to Google BigQuery

08 February 2015  •  Andrew Curtis
As part of my winternship here at Snowplow Analytics in London, I’ve been experimenting with using Scala to upload Snowplow’s enriched events to Google’s BigQuery database. The ultimate goal is to add BigQuery support to both Snowplow pipelines, including being able to stream data in near-realtime from an Amazon Kinesis stream to BigQuery. This blog post will cover: Getting started with BigQuery Downloading some enriched events Installing BigQuery Loader CLI Analyzing the event stream in...

Snowplow 60 Bee Hummingbird released

03 February 2015  •  Fred Blundun
We are happy to announce the release of Snowplow 60! Our sixtieth release focuses on the Snowplow Kinesis flow, and includes: A new Kinesis “sink app” that reads the Scala Stream Collector’s Kinesis stream of raw events and stores these raw events in Amazon S3 in an optimized format An updated version of our Hadoop Enrichment process that supports as an input format the events stored in S3 by the new Kinesis sink app Together,...

Introducing our 2014-2015 Snowplow winterns

25 January 2015  •  Alex Dean
We are pleased to announce our two new Data Engineering winterns for the 2014/2015 winter period, Andrew and Aalekh. In this blog post we’ll introduce both interns to the Snowplow community, as well as giving a little more background on the projects they are working on. This is the third instalment of our internship program for open source hackers - you can read more about our previous winter and summer internship programs at those links....

Snowplow Java Tracker 0.7.0 released

24 January 2015  •  Alex Dean
We are pleased to release version 0.7.0 of the Snowplow Java Tracker. Many thanks to David Stendardi from Viadeo, former Snowplow intern Jonathan Almeida and Hamid from Trello for their contributions to this release! In the rest of this post we will cover: Architectural updates API updates Testing updates Upgrading the Java Tracker Documentation Getting help 1. Architectural updates Some Snowplow Java and Android Tracker users have reported serious performance issues running these trackers respectively...

Modeling events through entity snapshotting

18 January 2015  •  Alex Dean
At Snowplow we spend a lot of time thinking about how to model events. As businesses re-orient themselves around event streams under the Unified Log model, it becomes ever more important to properly model those event streams. After all: “garbage in” means “garbage out”: deriving business value from events is hugely dependent on modeling those events correctly in the first place. Our focus at Snowplow has been on defining a semantic model for events: one...

Snowplow Ruby Tracker 0.4.1 released

06 January 2015  •  Fred Blundun
We are happy to announce the release of version 0.4.1 of the Snowplow Ruby Tracker. This is a bugfix release which resolves compatibility issues between the Ruby Tracker and the rest of the Snowplow data pipeline. Please note that this version of the Ruby Tracker is dependent upon Snowplow 0.9.14 for POST support; for more information please refer to the technical documentation. Read on for more detail on: POST request format fix Compatibility Getting help...

Snowplow PHP Tracker 0.2.0 released

05 January 2015  •  Joshua Beemster
We are pleased to announce the release of the second version of the Snowplow PHP Tracker. The tracker now supports a variety of synchronous, asynchronous and out-of-band event emitters for GET and POST requests. Please note that version 0.2.0 of the PHP Tracker is dependent upon Snowplow 0.9.14; for more information please refer to the technical documentation. This release post will cover the following topics: New emitters explained New client passthrough functions Debug mode added...

Snowplow 0.9.14 released with additional webhooks

31 December 2014  •  Alex Dean
We are pleased to announce the release of Snowplow 0.9.14, our 17th and final release of Snowplow for 2014! This release contains a variety of important bug fixes, plus support for three new event streams which can be loaded into your Snowplow event warehouse and unified log: Mandrill - for tracking email and email-related events delivered by Mandrill PagerDuty - for tracking incidents generated by PagerDuty Pingdom - for tracking site outages detected by Pingdom...

New Java and Android Tracker versions released

27 December 2014  •  Alex Dean
We are pleased to release new versions of the Snowplow Android Tracker (0.2.0) and the Snowplow Java Tracker (0.6.0), as well as the Java Tracker Core (0.2.0) that underpins both trackers. Many thanks to XiaoyiLI from Viadeo, Hamid from Trello and former Snowplow intern Jonathan Almeida for their contributions to these releases! In the rest of this post we will cover: Vagrant support Updates to Java Tracker Core Updates to the Java Tracker Updates to...

Building robust data pipelines in Scala - Session at Scala eXchange, December 2014

17 December 2014  •  Alex Dean
It was great to have the opportunity to speak at Scala eXchange last week in London on the topic of “Building robust data pipelines in Scala: the Snowplow experience”. It was my first time speaking at a conference dedicated to Scala - and it was fantastic to see such widespread adoption of Scala in the UK and Europe. It was also great meeting up with Snowplow users and contributors face-to-face for the first time! Many...

Introducing self-describing Thrift

16 December 2014  •  Fred Blundun
At Snowplow we have been thinking about how to version Thrift schemas. This was prompted by the realization that we need to update the SnowplowRawEvent schema, which we use to serialize the Snowplow events received by the Scala Stream Collector. We want to update this in a way that supports further schema evolution in the future. The rest of this post will discuss our proposed solution to this problem: The problem The un-versioned approach Adding...

Snowplow JavaScript Tracker 2.2.0 released

15 December 2014  •  Fred Blundun
We are happy to announce the release of version 2.2.0 of the Snowplow JavaScript Tracker. This release improves the Tracker’s callback support, making it possible to access previously internal variables such as the tracker-generated user fingerprint and user ID. It also adds the option to disable the Tracker’s use of localStorage and first-party cookies. The rest of this blog post will cover the following topics: More powerful callbacks Disabling localStorage and cookies Non-integer offsets...

Snowplow meetups announced in London and Sydney

08 December 2014  •  Yali Sassoon
One of the best things about working at Snowplow is that it gives us the opportunity to work with some of the smartest companies in data. Snowplow attracts users who by definition are pushing event data beyond the limits imposed by traditional web and mobile analytics tools. Having and working with such a smart userbase means we get to learn from them every day. It’s no surprise then that it was our users, rather than...

Modeling event data in Looker - the Snowplow presentation from Look&Tell London, November 2014

07 December 2014  •  Yali Sassoon
Last month the Looker team flew into London for their inaugural Look&Tell London. I was very lucky to be given the opportunity to speak at the event. In my presentation I walked through, at a high-level, how Snowplow works with Looker, and the critical data modelling steps that occur in LookML, in particular. My slides are below: I look forward to the next Looker event in London! Update: video now available The folks at Looker...

Snowplow 0.9.13 released with important bug fixes

01 December 2014  •  Fred Blundun
We are happy to announce the release of Snowplow 0.9.13 fixing two bugs found in last week’s release. Read on for more information. Safer URI parsing Fixed dependency conflict Upgrading Help 1. Safer URI parsing Version 0.9.12 used the Net-a-Porter URI library to fix up non-compliant URIs which initially failed validation. This made the enrichment process more forgiving of bad URIs. It also introduced a bug: exceptions thrown by the new step were not caught....

Snowplow 0.9.12 released with real-time loading of data into Elasticsearch beta

26 November 2014  •  Fred Blundun
Back in February, we introduced initial support for real-time event analytics using Amazon Kinesis. We are excited to announce the release of Snowplow 0.9.12 which significantly improves and extends our Kinesis support. The major new feature is our all new Kinesis Elasticsearch Sink, which streams event data from Kinesis into Elasticsearch in real-time. The data is then available to power real-time dashboards and analysis (e.g. using Kibana). In addition to enabling real-time loading of data...

London NoSQL talk on Snowplow

19 November 2014  •  Alex Dean
It was great to have the opportunity to talk at London NoSQL earlier this week on Snowplow’s journey from NoSQL to SQL, and then back to a hybrid model supporting multiple storage targets. Many thanks to Couchbase developer evangelist Matthew Revell for inviting me! My talk took us through Snowplow’s journey from using NoSQL (via Amazon S3 and Hive), to columnar storage (via Amazon Redshift and PostgreSQL), and most recently to a mixed model of...

Snowplow 0.9.11 released with support for webhooks

10 November 2014  •  Alex Dean
We are pleased to announce the immediate availability of Snowplow 0.9.11. For the first time, you can now use Snowplow to collect, store and analyze event streams generated by supported third-party software. Many Software-as-a-Service vendors publish their own internal event streams for customers to consume - these event stream APIs are often referred to as “webhooks”, sometimes as “streaming APIs”, “postbacks” or “HTTP response APIs”. Snowplow 0.9.11 adds first-class support for an initial set of...

Snowplow iOS Tracker 0.2.0 released

08 November 2014  •  Alex Dean
We are pleased to announce the release of version 0.2.0 of the Snowplow iOS Tracker. This is an important update which changes the Tracker’s approach to recording Apple’s Identifier For Advertisers (IFA). Apps that do not display advertisements are not allowed to access the IFA on an iOS device, and Apple will reject apps that attempt to do this. Unfortunately, the Snowplow iOS Tracker v0.1.x was configured to always record the IFA as part of...

Snowplow Ruby Tracker 0.4.0 released

07 November 2014  •  Fred Blundun
We are pleased to announce the release of version 0.4.0 of the Snowplow Ruby Tracker. This release adds several methods to help identify users using client-side data, making the Ruby Tracker much more powerful when used from a Ruby web or e-commerce framework such as Rails, Sinatra or Spree. The rest of this post will cover: set_ip_address set_useragent_user_id set_domain_user_id set_network_user_id Other changes Getting help 1. set_ip_address The ip_address field in the Snowplow event model is...

Snowplow JavaScript Tracker 2.1.1 released with new events

06 November 2014  •  Fred Blundun
We are delighted to announce the release of version 2.1.1 of the Snowplow JavaScript Tracker! This release contains a number of new features, most prominently several new unstructured events and a context for recording the browser’s PerformanceTiming. This blog post will cover the following topics: New events Page performance context Link content Tracker core integration Custom callbacks forceSecureTracker Outbound queue New example page Other improvements Upgrading Getting help 1. New events 1.1 Automatic form tracking...

Snowplow 0.9.10 released with support for new JavaScript Tracker v2.1.0 events

06 November 2014  •  Alex Dean
We are pleased to announce the release of Snowplow 0.9.10. This is a minimalistic release designed to support the new events and context of the Snowplow JavaScript Tracker v2.1.1, also released today. This release is primarily targeted at Snowplow users of Amazon Redshift who are upgrading to the latest Snowplow JavaScript Tracker (v2.1.0+). Here are the sections after the fold: New Redshift tables New JSON Path files A note on link_clicks Upgrading Documentation and help...

Span Conference and Why your company needs a Unified Log

02 November 2014  •  Alex Dean
It was great to have the opportunity to speak at Span Conference this week in London on the topic of “Why your company needs a Unified Log”. Span is a single-track developer conference about scaling, organized by Couchbase developer evangelist Matthew Revell; Tuesday’s was the inaugural Span and it was great to be a part of it. Below the fold I will (briefly) cover: Why your company needs a Unified Log My highlights from Span...

Snowplow 0.9.9 released with campaign attribution enrichment

27 October 2014  •  Fred Blundun
We are pleased to announce the release of Snowplow 0.9.9. This is primarily a comprehensive bug fix release, although it also adds the new campaign_attribution enrichment to our enrichment registry. Here are the sections after the fold: The campaign_attribution enrichment Clojure Collector fixes StorageLoader fixes EmrEtlRunner fixes and enhancements Hadoop Enrich fixes and enhancements Upgrading Documentation and help 1. The campaign_attribution enrichment Snowplow has five fields relating to campaign attribution: mkt_medium, mkt_source, mkt_term, mkt_content, and...

Snowplow PHP Tracker 0.1.0 released

30 September 2014  •  Joshua Beemster
We are pleased to announce the release of the first version of the Snowplow PHP Tracker. The tracker supports synchronous GET and POST requests. This introductory post will cover the following topics: Installation How to use the tracker Getting help 1. Installation The Snowplow PHP Tracker is published to Packagist, the central repository for Composer PHP packages. To add it to your project, add it as a requirement in your composer.json file: {...
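As an illustration of the Composer requirement described above, a minimal composer.json might look like the following (the package coordinates and version constraint are assumptions; check Packagist for the current ones):

```json
{
    "require": {
        "snowplow/snowplow-tracker": "0.1.0"
    }
}
```

Running `composer install` then pulls the tracker and its dependencies into your project's vendor directory.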

Snowplow .NET Tracker 0.1.0 released

29 September 2014  •  Fred Blundun
We are pleased to announce the release of the first version of the Snowplow .NET Tracker. The tracker supports synchronous and asynchronous GET and POST requests and has an offline mode which stores unsent events using Message Queueing. This introductory post will cover the following topics: Installation How to use the tracker Features Logging Getting help 1. Installation The Snowplow .NET Tracker is published to NuGet, the .NET package manager. To add it to your...

Berlin trip round-up

29 September 2014  •  Alex Dean
Yali and I are back from the Snowplow team’s trip to Berlin - it was a great visit, seeing plenty of new and old faces alike. Below the fold I will (briefly) cover: Wednesday: startups roundtable and DAALA Thursday: co.up and Big Data Beers Thoughts on the Berlin ecosystem and our next visit 1. Wednesday: startups roundtable and DAALA We started Wednesday with a Snowplow technology roundtable with some of Berlin’s large consumer startups, huge...

The Snowplow team will be in the Bay Area and Seattle in October - get in touch if you'd like to meet

25 September 2014  •  Alex Dean
I (Alex) will be in the Bay Area and Seattle for two weeks starting from Monday 6th October, visiting Snowplow customers, users and partners. If you’re interested in meeting up to discuss Snowplow, event analytics or unified log processing more generally, I’d love to arrange a meeting! I will be based in San Francisco from Monday October 6th, then flying up to Seattle early on the 15th, staying there for the rest of the week....

Snowplow 0.9.8 released for mobile analytics

18 September 2014  •  Alex Dean
We are hugely excited to announce the release of the long-awaited Snowplow version 0.9.8, adding event analytics support for iOS and Android applications. Mobile event analytics has been the most requested feature from the Snowplow community for some time, with many users keen to feed their Snowplow data pipeline with events from mobile apps, alongside their existing websites and server software. Mobile event analytics is a major step in Snowplow’s journey from a web analytics...

Snowplow iOS Tracker 0.1.1 released

17 September 2014  •  Jonathan Almeida
We’re extremely excited to announce our initial release of the Snowplow iOS Tracker. Mobile trackers have been one of the Snowplow community’s most highly requested features, and we are very pleased to finally have this ready for release. The Snowplow iOS Tracker will allow you to track Snowplow events from your iOS applications and games. This release comes with many features you may already be familiar with in other Snowplow Trackers, along with a few...

Snowplow Android Tracker 0.1.1 released

17 September 2014  •  Jonathan Almeida
We are proud to release the Snowplow Android Tracker, one of the most requested Trackers so far. This is a major milestone for us, leveraging Snowplow 0.9.8 for mobile analytics support. The Android Tracker has evolved in tandem with the Java Tracker. We have based the Android Tracker on the same Java Tracker Core that powers the Java Tracker, along with a few additions, such as tracking geographical location, and sending mobile-specific context data. So...

Come and meet the Snowplow team this September in Berlin

03 September 2014  •  Yali Sassoon
Both Alex and I will be in Berlin for two events later this September. I’ll be giving a talk at September’s DAALA Berlin on September 24th, a monthly digital analytics event organised by Matthias Bettag. I will give an overview of the Snowplow platform, both from an analytical and technical point of view, followed by a talk from Christian Lubasch, from LeROI Marketing, who will cover how he worked with the team at GoEuro...

Snowplow 0.9.7 released with important bug fixes

02 September 2014  •  Alex Dean
We are pleased to announce the immediate availability of Snowplow version 0.9.7. 0.9.7 is a “tidy-up” release which fixes some important bugs, particularly: A bug in 0.9.5 onwards which was preventing events containing multiple JSONs from being shredded successfully (#939) Our Hive table definition falling behind Snowplow 0.9.6’s enriched event format updates (#965) A bug in EmrEtlRunner causing issues running Snowplow inside some VPC environments (#956) As well as these important fixes, 0.9.7 comes with...

Snowplow Ruby Tracker 0.3.0 released

29 August 2014  •  Fred Blundun
We are happy to announce the release of the Snowplow Ruby Tracker version 0.3.0. This version adds support for asynchronous requests and POST requests, and introduces the new Subject and Emitter classes. The rest of this post will cover: The Subject class The Emitter class Chainable methods Logging Contracts Other changes Upgrading Getting help 1. The Subject class An instance of the Subject class represents a user who is performing an event in the Subject-Verb-Direct...

Iglu release 2 with a new RESTful schema server

28 August 2014  •  Ben Fradet
We are pleased to announce the second release of Iglu, our machine-readable schema repository system for JSON Schema. If you are not familiar with what Iglu is, please read the blog post for the initial release of Iglu. Iglu release 2 introduces a new Scala-based repository server, allowing users to publish, test and serve schemas via an easy-to-use RESTful interface. This is a huge step forward compared to our current approach, which involves uploading schemas...

Introducing our Snowplow summer interns

21 August 2014  •  Alex Dean
Following on from our highly successful winter internship program, we were keen to expand the program by recruiting open source hackers for more extended internships over summer 2014. We also wanted to expand the scope of our internship program, by including a data science intern alongside our traditional data engineering internships. And as always, our internships were open to remote applicants as well as candidates in London. If you have been following the Snowplow blog...

Snowplow Java Tracker 0.5.0 released

18 August 2014  •  Jonathan Almeida
We’re excited to announce another release of the Snowplow Java Tracker, version 0.5.0. This release comes with a few changes to the Tracker method signatures to support our upcoming Snowplow 0.9.7 release with POST support, bug fixes, and more. Notably, we’ve added a new class for supporting your context data. I’ll be covering everything mentioned above in more detail: Project structure changes Collector endpoint changes for POST requests The SchemaPayload Class Emitter callback Configuring the...

Snowplow Python Tracker 0.5.0 released

13 August 2014  •  Fred Blundun
We are happy to announce the release of version 0.5.0 of the Snowplow Python Tracker! This release is focused mainly on synchronizing the Python Tracker’s support for POST requests with the rest of Snowplow, but also makes its API more consistent. In this post we will cover: POST requests New feature: multiple emitters More consistent API for callbacks More consistent API for tracker methods UUIDs Bug fix: flushing an empty buffer Upgrading Support 1. Updated...

Snowplow Node.js Tracker 0.1.0 released

08 August 2014  •  Fred Blundun
We are delighted to announce the release of the first version of the Snowplow Node.js Tracker. This is an npm module designed to send Snowplow events to a Snowplow collector from a Node.js environment. This post will cover installing and setting up the Node.js Tracker and introduce its main features. Background How to install the tracker How to use the tracker Features Getting help 1. Background The Snowplow Node.js Tracker is our first release making...

Using graph databases to perform pathing analysis - initial experiments with Neo4J

31 July 2014  •  Nick Dingwall
In the first post in this series, we raised the possibility that graph databases might allow us to analyze event data in new ways, especially where we were interested in understanding the sequences that events occurred in. In the second post, we walked through loading Snowplow page view event data into Neo4J in a graph designed to enable pathing analytics. In this post, we’re going to see whether the hypothesis we raised in the first...

Unified Log Processing is now available from Manning Early Access

31 July 2014  •  Alex Dean
I’m pleased to announce that the first three chapters of my new book are now available as part of the Manning Publications’ Early Access Program (MEAP)! Better still, I can share a 50% off code for the book - the code is mldean and it expires on Monday 4th August. The book is called Unified Log Processing - it’s a distillation (and evolution) of my experiences working with event streams over the last two and...

Snowplow Ruby Tracker 0.2.0 released

31 July 2014  •  Fred Blundun
We are pleased to announce the release of the Snowplow Ruby Tracker version 0.2.0. This release brings the Ruby Tracker up to date with the other Snowplow trackers, particularly around support of self-describing custom contexts and unstructured events. Huge thanks go to Elijah Tabb, a.k.a. ebear, for contributing the updated track_unstruct_event and track_screen_view tracker API methods among other features! Read on for more information… New tracker initialization method Updated format for unstructured events Updated format...

Loading Snowplow event-level data into Neo4J

30 July 2014  •  Nick Dingwall
In the last post, we discussed how particular types of analysis, particularly path analysis, are not well-supported in traditional SQL databases, and raised the possibility that graph databases like Neo4J might be good platforms for doing this sort of analysis. We went on to design a graph to represent event data, and page view data specifically, which captures the sequence of events. In this post, we’re going to walk through the process of taking Snowplow...

Can graph databases enable whole new classes of event analytics?

28 July 2014  •  Nick Dingwall
With Snowplow, we want to empower our users to get the most out of their data. Where your data lives has big implications for the types of query, and therefore analyses, you can run on it. Most of the time, we’re analysing data with SQL, and specifically, in Amazon Redshift. This is great for a whole class of OLAP-style analytics - it enables us to slice and dice different combinations of dimensions and metrics, for...

Snowplow 0.9.6 released with configurable enrichments

26 July 2014  •  Fred Blundun
We are pleased to announce the release of Snowplow 0.9.6. This release does four things: It fixes some important bugs discovered in Snowplow 0.9.5, related to our new shredding functionality It introduces new JSON-based configurations for Snowplow’s existing enrichments It extends our geo-IP lookup enrichment to support all five of MaxMind’s commercial databases It extends our referer-parsing enrichment to support a user-configurable list of internal domains We are really excited about our new JSON-configurable enrichments....

Snowplow Java Tracker 0.4.0 released

23 July 2014  •  Jonathan Almeida
We’re excited to announce another release of the Snowplow Java Tracker, version 0.4.0. This release makes some significant updates to the Java Tracker. The main objective for this release was to bring the Tracker much closer in functional terms to the Python Tracker. In doing so, we’ve added new Emitter, TrackerPayload and Subject classes along with various changes to the existing Tracker class. One of the other notable features in this release is support...

Snowplow Java Tracker 0.3.0 released

13 July 2014  •  Jonathan Almeida
Today we are introducing the release of the Snowplow Java Tracker version 0.3.0. Similar to the previous 0.2.0 release, this too is a mixture of minor & stability fixes. We’ve made only a few minor interface changes, so it shouldn’t affect current users of the Java Tracker too much. You can find more on the new additions further down in this post: Strings replaced with Maps for Context Timestamp for Trackers Logging with SLF4J Dependency...

How configurable data models and schemas make digital analytics better

11 July 2014  •  Yali Sassoon
Digital analysts don’t typically spend a lot of time thinking about data models and schemas. How data is modelled and schema’d, both at data collection time, and at analysis time, makes an enormous difference to how easily insight and value can be derived from that data. In this post, I will explain why data models and schemas matter, and why being able to define your own event data model in Snowplow is a much better...

Snowplow 0.9.5 released with JSON validation and shredding

09 July 2014  •  Alex Dean
We are hugely excited to announce the release of Snowplow 0.9.5: the first event analytics system to validate incoming event and context JSONs (using JSON Schema), and then automatically shred those JSONs into dedicated tables in Amazon Redshift. Here are some sample rows from this website, showing schema.org’s WebPage schema being loaded into Redshift as a dedicated table. (Click to zoom into the image.): With the release of Snowplow 0.9.1 back in April, we were...

Snowplow JavaScript Tracker 2.0.0 released

03 July 2014  •  Fred Blundun
We are happy to announce the release of the Snowplow JavaScript Tracker version 2.0.0. This release makes some significant changes to the public API as well as introducing a number of new features, including tracker namespacing and new link click tracking and ad tracking capabilities. This blog post will cover the following changes: Changes to the Snowplow API New feature: tracker namespacing New feature: link click tracking New feature: ad tracking New feature: offline tracking...

Snowplow Java Tracker 0.2.0 released

02 July 2014  •  Jonathan Almeida
We are pleased to announce the release of the Snowplow Java Tracker version 0.2.0. This release comes shortly after we introduced the community-contributed event tracker a little more than a week ago. In that previous post, we also mentioned our roadmap for the Java Tracker to include Android support as well as numerous other features. This release doesn’t directly act on that roadmap, but is largely a refactoring for future releases of the tracker with...

Fred Blundun is a Snowplower!

02 July 2014  •  Alex Dean
You have probably seen a new name behind blog posts, new software releases and email threads recently: we are hugely excited to introduce (somewhat belatedly!) Fred Blundun to the team! Fred joined us as a Data Engineer this spring. Fred is a Mathematics graduate from Cambridge University; data engineering at Snowplow is his first full-time role in software. Fred hit the ground running at Snowplow with some great new tracker releases, including: The Snowplow Python Tracker...

Iglu schema repository 0.1.0 released

01 July 2014  •  Alex Dean
We are hugely excited to announce the release of Iglu, our first new product since launching our Snowplow prototype two and a half years ago. Iglu is a machine-readable schema repository initially supporting JSON Schemas. It is a key building block of the next Snowplow release, 0.9.5, which will validate incoming unstructured events and custom contexts using JSON Schema. As far as we know, Iglu is the first machine-readable schema repository for JSON Schema, and...

Snowplow Java Tracker 0.1.0 by Kevin Gleason released

20 June 2014  •  Alex Dean
We are proud to announce the release of our new Snowplow Java Tracker, developed by Snowplow community member Kevin Gleason. This is our first community-contributed event tracker - a real milestone for us at Snowplow and it’s all thanks to Kevin’s fantastic work! The Snowplow Java Tracker is a simple client library for Snowplow, designed to send raw Snowplow events to a Snowplow collector. Use this tracker to add analytics to your Java-based desktop and...

Budapest Data round-up

13 June 2014  •  Alex Dean
So the Budapest Data event (aka Budapest DW Forum) is over for another year - a huge thanks to Bence Arató and the whole team for organizing another excellent conference! In this blog post I want to share my two talks and my “Zero to Hadoop” workshop with the wider Snowplow community. Continuous data processing with Kinesis at Snowplow My first talk was on the Wednesday afternoon, where I spoke about our process of porting...

Snowplow Python Tracker 0.4.0 released

10 June 2014  •  Fred Blundun
We are happy to announce the release of the Snowplow Python Tracker version 0.4.0. This version introduces the Subject class, which lets you keep track of multiple users at once, and several Emitter classes, which let you send events asynchronously, pass them to a Celery worker, or even send them to a Redis database. We have added support for sending batches of events in POST requests, although the Snowplow collectors do not yet support POST...

Making Snowplow schemas flexible - our technical approach

06 June 2014  •  Yali Sassoon
In the last couple of months we’ve been doing an enormous amount of work to make the core Snowplow schema flexible. This is an essential step to making Snowplow an event analytics platform that can be used to store event data from: Any kind of application. The event dictionary, and therefore schema, for a massive multiplayer online game, will look totally different to a newspaper site, which will look different to a banking application Any...

Snowplow 0.9.4 released with improved Looker models

30 May 2014  •  Yali Sassoon
We are very pleased to release Snowplow 0.9.4, which includes a new base LookML data model and dashboard to get Snowplow users started with Looker. The new base model has some significant improvements over the old one: Querying the data is much faster. When new Snowplow event data is loaded into Redshift, Looker automatically detects it and generates the relevant session-level and visitor-level derived tables, so that they are ready to be queried directly. We’ve...

Snowplow 0.9.3 released with Clojure Collector fixes

21 May 2014  •  Alex Dean
We are pleased to announce the release of Snowplow 0.9.3, with a whole host of incremental improvements to EmrEtlRunner, plus two important bug fixes for Clojure Collector users. The first Clojure Collector issue was a problem in the file move functionality in EmrEtlRunner, which was preventing Clojure Collector users from scaling beyond a single instance without data loss. Many thanks to community members Derk Busser and Ryan Doherty for identifying the issue and working with...

The Snowplow team will be in Berlin and Budapest in June - get in touch if you'd like to meet

20 May 2014  •  Yali Sassoon
The Snowplow team will be visiting both Berlin and Budapest this June. If you’d like to meet with us, then get in touch! Budapest Datawarehousing Forum Alex will be giving a talk at the Budapest DW Forum on Thursday 5th June on our experiences building real time data processing infrastructure on Amazon Kinesis. His talk will cover: “Hero” use cases for event streaming Building a lambda architecture with Kinesis and EMR Moving from a batch...

Introducing self-describing JSONs

15 May 2014  •  Alex Dean
Initial self-describing JSON draft. Date: 14 May 2014. Draft authors: Alexander Dean, Frederick Blundun. Updated 10 June 2014. Changed iglu:// references to iglu: as these resource identifiers do not point to specific hosts. At Snowplow we have been thinking a lot about how to add schemas to our data models, in place of the implicit data models and wiki-based tracker protocols that we have today. Crucially, whatever we come up with must also work for...
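For readers unfamiliar with the idea, a self-describing JSON wraps the data payload in an envelope that points at its schema via an iglu: resource identifier. The sketch below uses a hypothetical com.acme schema path purely for illustration:

```python
# A self-describing JSON: the "schema" field carries an iglu: URI
# identifying the schema, and "data" carries the payload itself.
# The com.acme/ad_click schema path is a made-up example.
import re

event = {
    "schema": "iglu:com.acme/ad_click/jsonschema/1-0-0",
    "data": {"bannerId": "4732ce23d345"}
}

# The iglu: URI encodes vendor, name, format and version:
IGLU_URI = re.compile(r"^iglu:([^/]+)/([^/]+)/([^/]+)/(\d+-\d+-\d+)$")
vendor, name, fmt, version = IGLU_URI.match(event["schema"]).groups()
```

Because the schema reference travels with the data, any downstream consumer can look up the exact schema a given event claims to conform to.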

Introducing SchemaVer for semantic versioning of schemas

13 May 2014  •  Alex Dean
Initial SchemaVer draft. Date: 13 March 2014. Draft authors: Alexander Dean, Frederick Blundun. As we start to re-structure Snowplow away from implicit data models and wiki-based tracker protocols towards formal schemas (initially Thrift and JSON Schema, later Apache Avro), we have started to think about schema versioning. "There are only two types of developer: the developer who versions his code, and developer_new_newer_newest_v2" Proper versioning of software is taken for granted these days - there are...
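SchemaVer expresses a schema version as MODEL-REVISION-ADDITION (e.g. 1-0-2): bump MODEL for a breaking change, REVISION for a change that may break some consumers, and ADDITION for a fully backwards-compatible change. A minimal helper, for illustration only (the function names are ours, not part of any Snowplow library):

```python
# SchemaVer: MODEL-REVISION-ADDITION, e.g. "1-0-2".
# MODEL    - breaking change
# REVISION - change that may break some consumers
# ADDITION - fully backwards-compatible change

def parse_schemaver(v):
    """Split a SchemaVer string into an integer triple."""
    model, revision, addition = (int(p) for p in v.split("-"))
    return (model, revision, addition)

def is_fully_compatible(old, new):
    """True if only the ADDITION component changed between versions."""
    return parse_schemaver(old)[:2] == parse_schemaver(new)[:2]
```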

Snowplow 0.9.2 released to support new CloudFront log format

30 April 2014  •  Alex Dean
We have now released Snowplow 0.9.2, adding Snowplow support for the updated CloudFront access log file format introduced by Amazon on the morning of 29th April. This release was a highly collaborative effort with the Snowplow community (see this email thread for background). If you currently use the Snowplow CloudFront-based event collector, we recommend upgrading to this release as soon as possible. As well as support for the new log file format, this...

Snowplow Python Tracker 0.3.0 released

25 April 2014  •  Fred Blundun
We are pleased to announce the release of the Snowplow Python Tracker version 0.3.0. In this version we have added support for Snowplow custom contexts for all events. We have also updated the API for tracker initialization and ecommerce transaction tracking, added the option to turn off Pycontracts to improve performance, and added an event vendor parameter for custom unstructured events. In the rest of the post we will cover: Tracker initialization Disabling contracts Ecommerce...

Snowplow Ruby Tracker 0.1.0 released

23 April 2014  •  Fred Blundun
We are happy to announce the release of the new Snowplow Ruby Tracker. This is a Ruby gem designed to send Snowplow events to a Snowplow collector from a Ruby or Rails environment. This post will cover installing and setting up the Tracker, and provide some basic information about its features: How to install the tracker How to use the tracker Features Getting help 1. How to install the tracker The Snowplow Ruby Tracker is...

Spark Example Project released for running Spark jobs on EMR

17 April 2014  •  Alex Dean
On Saturday I attended Hack the Tower, the monthly collaborative hackday for the London Java and Scala user groups hosted at the Salesforce offices in Liverpool Street. It’s an opportunity to catch up with others in the Scala community, and to work collaboratively on non-core projects which may have longer-term value for us here at Snowplow. It also means I can code against the backdrop of some of the best views in London (see below)!...

Understanding Snowplow's unique approach to identity stitching, including comparisons with Universal Analytics, Kissmetrics and Mixpanel

16 April 2014  •  Yali Sassoon
This post was inspired by two excellent, recently published posts on identity stitching: Yehoshua Coren’s post Universal Analytics is Out of Beta - Time to Switch? and Shay Sharon’s post on the intlock blog, The Full Customer Journey - Managing User Identities with Google Universal, Mixpanel and KISSmetrics. In both posts, the authors explain in great detail the limitations that traditional analytics solutions have when dealing with identity stitching. In this post, I hope to...

Snowplow Python Tracker 0.2.0 released

15 April 2014  •  Fred Blundun
We are happy to announce the release of the Snowplow Python Tracker version 0.2.0. This release adds support for Python 2.7, makes some improvements to the Tracker API, and expands the test suite. This post will cover: Changes to the API Python 2.7 Integration tests Other improvements Upgrading Support 1. Changes to the API The call to import the tracker module has not changed: from snowplow_tracker.tracker import Tracker. Tracker initialization has been simplified...


Snowplow 0.9.1 released with initial JSON support

11 April 2014  •  Alex Dean
We are hugely excited to announce the immediate availability of Snowplow 0.9.1. This release introduces initial support for JSON-based custom unstructured events and custom contexts in the Snowplow Enrichment and Storage processes; this is the most-requested feature from our community and a key building block for mobile and app event tracking in Snowplow. Snowplow’s event trackers have supported custom unstructured events and custom contexts for some time, but prior to 0.9.1 there had been no...

Snowplow Python Tracker 0.1.0 by wintern Anuj More released

28 March 2014  •  Alex Dean
We are proud to announce the release of our new Snowplow Python Tracker, developed by Snowplow wintern Anuj More. Anuj was one of our two remote interns this winter, joining the Snowplow team from his base in Mumbai to work on making it easy to send events to Snowplow from Python environments. The Snowplow Python Tracker is a simple PyPI-hosted client library for Snowplow, designed to send raw Snowplow events to a Snowplow collector. Use...

Snowplow JavaScript Tracker 1.0.0 released

27 March 2014  •  Fred Blundun
We are pleased to announce the release of the Snowplow JavaScript Tracker version 1.0.0. This release adds new options for user fingerprinting and makes some minor changes to the Tracker API. In addition, we have moved to a module-based project structure and added automated testing. This post will cover the following topics: New feature: user fingerprint options Changes to the Snowplow API Move to modules Automated testing Removed deprecated functionality Other structural improvements Upgrading Getting...

Introduction to Snowplow - Talk at Big Data and Data Science Israel

24 March 2014  •  Alex Dean
I was delighted to speak at the Big Data & Data Science Israel in Herzliya Pituach last night, giving an introduction to Snowplow as an open source web and event analytics platform. Huge thanks to Asaf Birenzvieg of Zaponet: Data Science Solutions for organizing the meetup, and to Microsoft ILDC for hosting. Asaf and Zaponet are doing a great job building the data meetup community in the Tel Aviv area - I would strongly encourage...

Snowplow and Looker announce formal partnership - the most powerful, flexible, web analytics solution in the world

19 March 2014  •  Yali Sassoon
Over the last few months we’ve been using Looker more and more, as we’ve come to appreciate quite how powerfully Looker complements our own event analytics platform. In that time, we’ve got to know the team at Looker and are in the process of working with them and some of our clients to implement the combined Looker / Snowplow stack. The whole team is very excited about the results so far and are watching eagerly

The Snowplow team will be in Israel and Cyprus in March - get in touch if you'd like to meet

18 March 2014  •  Alex Dean
I (Alex) will be heading to Tel Aviv next week and then heading on to Nicosia. If you’re interested in meeting up to discuss Snowplow, event analytics or big data processing more generally, I’d love to arrange a meeting! I will be in Tel Aviv all day Sunday March 23rd and Monday March 24th, including speaking at Big Data & Data Science Israel in Herzliya on the Sunday. I’ll then be in Cyprus from March...

Building an event grammar - understanding context

11 March 2014  •  Alex Dean
Here at Snowplow we recently added a new feature called “custom contexts” to our JavaScript Tracker (although not yet into our Enrichment process or Storage targets). To accompany the feature release we published a User Guide for Custom Contexts - a practical, hands-on guide to populating custom contexts from JavaScript. We want to now follow this up with a post on the underlying theory of event context: what it is, how it is generated and...

LSUG talk - Building data processing apps in Scala, the Snowplow experience

05 March 2014  •  Alex Dean
I was delighted to speak at the London Scala Users’ Group (LSUG) last night about our experiences at Snowplow building data processing pipelines in Scala. It was a great turnout, and there were plenty of excellent questions afterwards showing that a large slice of the users’ group have had overlapping data processing experiences. Many thanks to Andy Hicks of the London Scala Users’ Group for organizing, and to Skills Matter for hosting! More on the...

Why and how to use big data tools to process web analytics data? Joint Qubole and Snowplow webinar

19 February 2014  •  Yali Sassoon
Last night, I presented at a webinar organized by our friends at Qubole on using big data tools to analyze web analytics data. You can view the slides I presented below: On the webinar, I talked through the limitations associated with using traditional web analytics tools like Google Analytics and Adobe SiteCatalyst to analyze web analytics data, and how using big data technologies, and Snowplow and Qubole in particular, addressed those limitations. My talk was...

Snowplow JavaScript Tracker 0.14.0 released with new features

12 February 2014  •  Fred Blundun
Alex writes: this is the first blog post - and code release - by Snowplow “springtern” Fred Blundun. Stay tuned for another blog post soon introducing Fred! We are pleased to announce the release of the Snowplow JavaScript Tracker version 0.14.0. In this release we have introduced some new tracking options and compressed our tracker for better load times. We have also updated our build process to use Grunt. This blog post will cover the...

Snowplow 0.9.0 released with beta Amazon Kinesis support

04 February 2014  •  Alex Dean
We are hugely excited to announce the release of Snowplow 0.9.0. This release introduces our initial beta support for Amazon Kinesis in the Snowplow Collector and Enrichment components, and was developed in close collaboration with Snowplow wintern Brandon Amos. At Snowplow we are hugely excited about Kinesis’s potential, not just to enable near-real-time event analytics, but more fundamentally to serve as a business’s unified log, aka its “digital nervous system”. This is a concept we...

Inaugural meetup of the Amazon Kinesis - London User Group

30 January 2014  •  Alex Dean
Yesterday evening saw the inaugural meetup of the London User Group for Amazon Kinesis. At Snowplow we have been working with Kinesis since its first announcement, and we were keen to organize a Kinesis-centric meetup for the tech community here in London and the South-East. And it looks like our excitement about Kinesis is widely shared - there were almost 40 “Kinetics” attending the first meetup. Huge thanks to Just Eat for hosting all of...

Snowplow JavaScript Tracker 0.13.0 released with custom contexts

27 January 2014  •  Alex Dean
We’re pleased to announce the immediate availability of the Snowplow JavaScript Tracker version 0.13.0. This is the first new release of the Snowplow JavaScript Tracker since separating it from the main Snowplow repository last year. The primary objective of this release was to introduce some key new tracking capabilities, in preparation for adding these to our Enrichment process. Secondarily, we also wanted to perform some outstanding housekeeping and tidy-up of the newly-independent repository. In the...

A guide to custom contexts in Snowplow JavaScript Tracker 0.13.0

27 January 2014  •  Alex Dean
WARNING: This blog post contains outdated information. To review the current approach, please refer to our wiki post Custom contexts. — Earlier today we announced the release of Snowplow JavaScript Tracker 0.13.0, which updated all of our track...() methods to support a new argument for setting custom JSON contexts. In our earlier blog post we introduced the idea of custom contexts only very briefly. In this blog post, we will take a detailed look at...

The Snowplow team will be in New York in February - get in touch if you'd like to meet

21 January 2014  •  Alex Dean
Both Yali and myself will be visiting New York this February, and I will be heading on to Boston too. If you’re interested in meeting up to discuss Snowplow, event analytics or big data processing more generally, we’d love to arrange a meeting. Yali will be in NYC between February 9th and 12th, including attending the Looker user group NYC meetup on February 11th. Alex will be visiting later on in the month: New York...

The three eras of business data processing

20 January 2014  •  Alex Dean
Every so often, a work emerges that captures and disseminates the bleeding edge so effectively as to define a new norm. For those of us working in eventstream analytics, that moment came late in 2013 with the publication of Jay Kreps’ monograph The Log: What every software engineer should know about real-time data’s unifying abstraction. Anyone involved in the operation or analysis of a digital business ought to read Jay’s piece in its entirety. His...

Scala Forex library by wintern Jiawen Zhou released

17 January 2014  •  Alex Dean
We are proud to announce the release of our new Scala Forex library, developed by Snowplow wintern Jiawen Zhou. Jiawen joined us in the Snowplow offices in London this winter and was tasked with taking Scala Forex from a README file to an enterprise-strength Scala library for foreign exchange operations. One month later and we are hugely excited to be sharing her work with the community! Scala Forex is a high-performance Scala library for performing...

Amazon Kinesis tutorial - a getting started guide

15 January 2014  •  Yali Sassoon
Of all the developments on the Snowplow roadmap, the one that we are most excited about is porting the Snowplow data pipeline to Amazon Kinesis to deliver real-time data processing. We will publish a separate post outlining why we are so excited about this. (Hint: it’s about a lot more than simply real-time analytics on Snowplow data.) This blog post is intended to provide a starting point for developers who are interested in diving into...

Snowplow 0.8.13 released with Looker support

08 January 2014  •  Yali Sassoon
We are very pleased to announce the release of Snowplow 0.8.13. This release makes it easy for Snowplow users to get started analyzing their Snowplow data with Looker, by providing an initial Snowplow data model for Looker so that a whole host of standard dimensions, metrics, entities and events are recognized in the Looker query interface. In this post we will cover: What’s so special about analyzing Snowplow data with Looker? What does the Looker...

Five things that make analyzing Snowplow data in Looker an absolute pleasure

08 January 2014  •  Yali Sassoon
Towards the end of 2013 we published our first blog post on Looker where we explored at a technical level why Looker is so well suited to analyzing Snowplow data. Today we released Snowplow 0.8.13, the Looker release. This includes a metadata model to make it easy for Snowplow users to get up and running with Looker on top of Snowplow very quickly. In this post, we get a bit less theoretical, and highlight five...

Snowplow 0.8.12 released with a variety of improvements to the Scalding Enrichment process

07 January 2014  •  Alex Dean
We are very pleased to announce the immediate availability of Snowplow 0.8.12. We have quite a packed schedule of releases planned over the next few weeks - and we are kicking off with 0.8.12, which consists of various small improvements to our Scalding-based Enrichment process, plus some architectural re-work to prepare for the coming releases (in particular, Amazon Kinesis support). Background on this release Scalding Enrichment improvements Re-architecting our Enrichment process Installing this release 1....

Introducing our Snowplow winterns

20 December 2013  •  Alex Dean
Just over two months ago we announced our winter internship program for open source hackers, here on this blog. We had no idea what kind of response we would receive - it was our first attempt at designing an internship program, and we had never heard of a startup (even an open source company like ours) recruiting remote interns. As it turned out, we were delighted by the response we received, and we decided to...

Introducing Looker - a fresh approach to Business Intelligence that works beautifully with Snowplow

10 December 2013  •  Yali Sassoon
In the last few weeks, we have been experimenting with using Looker as a front-end to analyse Snowplow data. We’ve really liked what we’ve seen: Looker works beautifully with Snowplow. Over the next few weeks, we’ll share example analyses and visualizations of Snowplow data in Looker, and dive into Looker in more detail. In this post, we’ll take a step back and walk through some context to explain why we are so excited about Looker....

The first Graduate Data Science Initiative event in London

04 December 2013  •  Yali Sassoon
Last night, I was very privileged to speak at the first meeting of the Graduate Data Science Initiative meetup, alongside Martin Goodson from Qubit and Eddie Bells from Lyst. The event was for graduates interested in careers in Data Science. It’s a great initiative and we were very happy to support it. I gave the first talk on how data scientists and big data technologies, including Snowplow, are fundamentally changing the way that people...

Loading JSON data into Redshift - the challenges of querying JSON data, and how Snowplow can be used to meet those challenges

20 November 2013  •  Yali Sassoon
Very many of our Professional Services projects involve forking the Snowplow codebase so that specific clients can use it to load their event data, stored as JSONs, into Amazon Redshift, so that they can use BI tools to create dashboards and mine that data. We’ve been surprised by quite how many companies have gone down the road of using JSONs to store their event data. In this blog post, we look at: Why logging event data...

Quick start guide to learning SQL to query Snowplow data published

19 November 2013  •  Yali Sassoon
Whilst it is possible to use different BI tools to query Snowplow data with limited or no knowledge of SQL, to really get the full power of Snowplow you need to know some SQL. To help Snowplow users who are not familiar with SQL, or those who could do with refreshing their knowledge, we’ve put together a quick start guide on the Analytics Cookbook. The purpose of the guide is to get the reader...

A round up of our trip to the Budapest BI Conference last week, and a thank you to the many people who made the trip so worthwhile

11 November 2013  •  Yali Sassoon
Last week, Alex and I had the pleasure to attend the Budapest BI Forum. I learnt a great deal from the different people I got to meet, and got a chance to give a talk on what Snowplow is, where we’re at today and how we plan to develop it going forwards. To summarize a few of the things we learnt: 1. The Python toolset for data analytics is developing incredibly rapidly We were fortunate...

Our video introduction of Snowplow to code_n

28 October 2013  •  Yali Sassoon
We were very flattered to be invited by the team at code_n to enter their competition to identify “outstanding young companies and promote their groundbreaking business models”. This year’s competition is focused on data, and has the motto Driving the Data Revolution. As part of our application process, we put together a short video introducing Snowplow. You can watch the video below. We look forward to finding out if our application has been successful!

Call for data! Help us develop experimental analyses. Have us help you answer your toughest business questions.

28 October 2013  •  Yali Sassoon
This winter we are recruiting interns to join the Snowplow team to work on discrete projects. A number of the candidates we have interviewed have expressed an interest in working with us to develop new analytics approaches on Snowplow data. In particular, we’ve had a lot of interest in piloting machine learning approaches to: Segmenting audience by behaviour Leveraging libraries for content / product recommendation (e.g. PredictionIO, Mahout, Weka) Developing and testing new approaches to...

Join the Snowplow team in Budapest the first week of November

23 October 2013  •  Yali Sassoon
We are thrilled to be going to Budapest this November, where I’ve kindly been invited to speak at the Budapest BI Forum on Snowplow, on a day dedicated to Open Analytics. Budapest is home to a thriving community of tech-savvy companies. We are very keen to meet as many of them as possible whilst we’re in Budapest - so if you’re based in Budapest (or visiting for the conference), and would like to sit down...

Snowplow 0.8.11 released - supports all CloudFront log file formats and a host of small improvements for power users

22 October 2013  •  Alex Dean
We’re very pleased to announce the release of Snowplow 0.8.11. This release includes two different sets of updates: Critical update: support for Amazon’s new CloudFront log file format (rolled out by Amazon during 21st October 2013) Nice-to-have additions - the most significant of which is IP anonymization We’ll discuss the updates one at a time, before covering how to upgrade to the latest version. Critical upgrade: support for Amazon’s new CloudFront log file format IP...

Using the new SQL views to perform cohort analysis with ChartIO

22 October 2013  •  Yali Sassoon
We wanted to follow-up our recent launch of Snowplow 0.8.10, with inbuilt SQL recipes and cubes, with a few posts demonstrating how you can use those views to quickly perform analytics on your Snowplow data. This is the first of those posts. In this post, we’ll cover how to perform a cohort analysis using ChartIO and Snowplow. Recap: what is Cohort Analysis We have described cohort analysis at length in the Analyst Cookbook. To sum...

Scripting Hadoop, Part One - Adventures with Scala, Rhino and JavaScript

21 October 2013  •  Alex Dean
As we have got to know the Snowplow community better, it has become clear that many members have very specific event processing requirements including: Custom trackers and collector logging formats Custom event models Custom business logic that impacts on the way their event data is processed To date, we have relied on three main techniques to help Snowplow users meet these requirements: Adding additional configuration options into the core Enrichment process (e.g. IP address anonymization,...

Snowplow 0.8.10 released with analytics cubes and recipes 'baked in'

18 October 2013  •  Yali Sassoon
We are pleased to announce the release of Snowplow 0.8.10. In this release, we have taken many of the SQL recipes we have covered in the Analysts Cookbook and ‘baked them’ into Snowplow by providing them as views that can be added directly to your Snowplow data in Amazon Redshift or PostgreSQL. Background on this release Reorganizing the Snowplow database Seeing a recipe in action: charting the number of uniques over time Seeing a cube...

Announcing our winter open source internship program

07 October 2013  •  Alex Dean
Applications for the new Snowplow Analytics open source internship program are now open! At Snowplow we are passionate about enterprise-strength open-source technology, and we are hugely excited to be offering paid internships for open source hackers this winter. Snowplow Analytics is looking for one or two open source interns this winter (December through February), for 3-6 week paid internships. Our “winterns” will work directly on and contribute to data engineering projects within the Snowplow open...

Snowplow passes 500 stars on GitHub

01 October 2013  •  Alex Dean
As of yesterday, the Snowplow repository on GitHub now has over 500 stars! We’re hugely excited to reach this milestone, having picked up 300 new watchers since our last milestone in January. Many thanks to everyone in the Snowplow community and on GitHub for their continuing support and interest! Here’s a quick round-up of the most popular open source analytics projects on GitHub: Hummingbird (real-time web analytics) - 2,299 stars Piwik (web analytics) - 1,290...

Book review - Apache Hive Essentials How-to

30 September 2013  •  Yali Sassoon
Although it is no longer part of the core Snowplow stack, Apache Hive is the gateway drug that got us started on Hadoop. As some of our recent blog posts testify, Hive is still very much a part of our big data toolkit, and this will continue as we use it to roll out new features. (E.g. for analyzing custom unstructured events.) I suspect that many Hadoopers started out with Hive, before experimenting with the...

How much does Snowplow cost to run?

27 September 2013  •  Yali Sassoon
We are very pleased to announce the release of the Snowplow Total Cost of Ownership Model. This is a model we started developing back in July, to enable: Snowplow users and prospective users to better forecast their Snowplow costs on Amazon Web Services going forwards The Snowplow Development Team to monitor how the cost of running Snowplow evolves as we build out the platform Modelling the costs associated with running Snowplow has not been straightforward:...

Reprocessing bad rows of Snowplow data using Hive, the JSON Serde and Qubole

11 September 2013  •  Yali Sassoon
This post is outdated. For more documentation on debugging and recovering bad rows, please visit: Debugging bad rows in Elasticsearch and Kibana Debugging bad rows in Elasticsearch using curl (without Kibana) Snowplow 81 release post (for recovering bad rows) Hadoop Event Recovery One of the distinguishing features of the Snowplow data pipeline is the handling of “bad” data. Every row of incoming, raw data is validated. When a row fails validation, it is logged in...

Snowplow 0.8.9 released to handle CloudFront log file format change

05 September 2013  •  Alex Dean
We are pleased to announce the immediate availability of Snowplow 0.8.9. This release was necessitated by an unannounced change Amazon made to the CloudFront access log file format on 17th August, discussed in this AWS Forum thread and this snowplow-user email thread. Essentially, Amazon switched from URL-encoding all “%” signs found in the cs-uri-query field, to only URL-encoding them if they were not already escaped, i.e. were not followed by “25” (“%25”). This unannounced change...
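The before-and-after encoding behaviour can be illustrated with two toy re-implementations. These are sketches of the rule described above, not Amazon's actual code:

```python
# Old CloudFront behaviour: URL-encode every "%" in cs-uri-query.
# New behaviour: leave a "%" alone when it already starts an escape
# sequence, i.e. when it is followed by "25".
import re

def old_encode(q):
    return q.replace("%", "%25")

def new_encode(q):
    # encode "%" only when it is NOT already followed by "25"
    return re.sub(r"%(?!25)", "%25", q)

already_escaped = "a=100%25"   # "%" was already URL-encoded upstream
```

Under the old rule, already_escaped would be double-encoded to "a=100%2525"; under the new rule it passes through unchanged, which is exactly the discrepancy Snowplow 0.8.9 handles.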

Using Qubole to crunch your Snowplow web data using Apache Hive

03 September 2013  •  Yali Sassoon
We’ve just published a getting-started guide to using Qubole, a managed Big Data service, to query your Snowplow data. You can read the guide here. Snowplow delivers event data to users in a number of different places: Amazon Redshift or PostgreSQL, so you can analyze the data using traditional analytics and BI tools Amazon S3, so you can analyze that data using Hadoop-backed, big data tools e.g. Mahout, Hive and Pig, on EMR Since we...

Towards universal event analytics - building an event grammar

12 August 2013  •  Alex Dean
As we outgrow our “fat table” structure for Snowplow events in Redshift, we have been spending more time thinking about how we can model digital events in Snowplow in the most universal, flexible and future-proof way possible. When we blogged about building out the Snowplow event model earlier this year, a comment left on that post by Loic Dias Da Silva made us realize that we were missing an even more fundamental point: defining a...

Snowplow 0.8.8 released with Postgres and Hive support

05 August 2013  •  Alex Dean
We are pleased to announce the immediate release of Snowplow 0.8.8. This is a big release for us: it adds the ability to store your Snowplow events in the popular PostgreSQL open-source database. This has been the most requested Snowplow feature all summer, so we are delighted to finally release it. And if you are already happily using Snowplow with Redshift, there are two other new features to check out: We have added support for...

Snowplow presentation at the Hadoop User Group London AWS event

19 July 2013  •  Yali Sassoon
Yesterday at the Hadoop User Group, I was very fortunate to get the opportunity to speak about Snowplow at the event focused specifically on Amazon Web Services, and Redshift in particular. I hope the talk was interesting to the participants who attended. I described how we use Cloudfront and Elastic Beanstalk to get event data into AWS for processing by EMR, and how we push the output of our EMR-based enrichment process into Redshift for...

Help us build out the Snowplow Total Cost of Ownership Model

10 July 2013  •  Yali Sassoon
In a previous blog post, we described how we were in the process of building a Total Cost of Ownership model for Snowplow: something that would enable a Snowplow user, or prospective user, to accurately forecast their AWS bill going forwards based on their traffic levels. To build that model, though, we need your help. In order to ensure that our model is accurate and robust, we need to make sure that the relationships we...

Unpicking the Snowplow data pipeline and how it drives AWS costs

09 July 2013  •  Yali Sassoon
Back in March, Robert Kingston suggested that we develop a Total Cost of Ownership model for Snowplow: something that would enable a user or prospective user to accurately estimate their Amazon Web Services monthly charges going forwards, and see how those costs vary with different traffic levels. We thought this was an excellent idea. Since Rob’s suggestion, we’ve made a number of important changes to the Snowplow platform that have changed the way Snowplow costs...

.NET (C#) support added to referer-parser

09 July 2013  •  Alex Dean
We are pleased to announce the addition of .NET support (C#) to our standalone referer-parser library. Many thanks to Sepp Wijnands at iPerform Software for contributing this latest port! To recap: referer-parser is a simple library for extracting search marketing attribution data from referer (sic) URLs. You supply referer-parser with a referer URL; it then tells you the medium, source and term (in the case of a search) for this referrer. The Scala implementation of...

Snowplow 0.8.7 released with JavaScript Tracker improvements

07 July 2013  •  Alex Dean
After a brief summer intermission, we are pleased to announce the release of Snowplow 0.8.7. This is a small release, primarily consisting of bug fixes for the JavaScript Tracker, which is bumped to version 0.12.0. As well as some tweaks and improvements, this release fixes bugs which only occurred on older versions of Internet Explorer, and fixes a bug which prevented the setCustomUrl() method from working properly. Many thanks to community member mfu0 and Snowplow...

Snowplow Tracker for Lua event analytics released

03 July 2013  •  Alex Dean
We are very pleased to announce the release of our Snowplow Tracker for Lua event analytics. This is our fourth tracker to be released, following on from our JavaScript, Pixel and Arduino Trackers. As a lightweight, easily-embeddable scripting language, Lua is available in a huge number of different computing environments and platforms, from World of Warcraft through OpenResty to Adobe Lightroom. And now, the Snowplow Lua Tracker lets you collect event data from these Lua-based applications,...

Reduce your Cloudfront costs with cache control

02 July 2013  •  Yali Sassoon
One of the reasons Snowplow is very popular with very large publishers and online advertising networks is that the cost of using Snowplow to track user behavior across your website or network is significantly lower than with our commercial competitors, and that difference becomes more pronounced as the number of users and events you track per day increases. We’ve been very focused on reducing the cost of running Snowplow further. Most of our efforts have...

Is web analytics easy or hard? Distinguishing different types of complexity, and approaches for dealing with them

28 June 2013  •  Yali Sassoon
This post is a response to an excellent, but old, blog post by Tim Wilson called Web Analytics Platforms are Fundamentally Broken, authored back in August 2011. Tim made the case (that is still true today) that web analytics is hard, and part of that hardness is because web analytics platforms are fundamentally broken. After Tim published his post, a very interesting conversation ensued on Google+. Reading through it, I was struck by how many...

Getting started using R for data analysis

26 June 2013  •  Yali Sassoon
R is one of the most popular data analytics tools out there, with a rich and vibrant community of users and contributors. In spite of its popularity in general (and particularly amongst academics and statisticians), R is not a common tool to find in the business or web analyst's arsenal, where Excel and Google Analytics tend to reign supreme. That is a real shame. R is a fantastic tool for exploring data, reworking it, visualizing...

Tracking Olark chat events with Snowplow

05 June 2013  •  Yali Sassoon
As some of you will have noticed, we recently installed Olark on the Snowplow website. Olark powers the chat box you see on the bottom right of the screen: if you click on it, and if one of the Snowplow team is at their desks, you can talk directly to us. Setting up Olark was an incredibly easy process - the Olark team provides a very straightforward quick start guide. We tested Olark for a...

Snowplow 0.8.6 released with performance improvements

03 June 2013  •  Alex Dean
We are very pleased to announce the release of Snowplow 0.8.6, with two significant performance-related improvements to the Hadoop ETL. These improvements are: The Hadoop ETL process is now much faster at processing raw Snowplow log files generated by the CloudFront Collector, because we have tackled the Hadoop “small files problem” You can now configure your ETL process on Elastic MapReduce to use Task instances alongside your Master and Core instances; optionally these task instances...

Dealing with Hadoop's small files problem

30 May 2013  •  Alex Dean
Hadoop has a serious Small File Problem. It’s widely known that Hadoop struggles to run MapReduce jobs that involve thousands of small files: Hadoop much prefers to crunch through tens or hundreds of files sized at or around the magic 128 megabytes. The technical reasons for this are well explained in this Cloudera blog post - what is less well understood is how badly small files can slow down your Hadoop job, and what to...
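The intuition behind the usual mitigation — consolidate many small inputs into batches near the preferred ~128 MB size before the MapReduce job runs — can be sketched as follows. This is an illustrative sketch of the batching idea, not code from the Snowplow pipeline:

```python
# Group small input files into batches of roughly TARGET_BYTES, so each
# mapper processes one consolidated batch instead of one tiny file.
TARGET_BYTES = 128 * 1024 * 1024  # the "magic" ~128 MB Hadoop prefers

def batch_files(files):
    """files: list of (name, size_in_bytes) pairs. Returns lists of names."""
    batches, current, current_size = [], [], 0
    for name, size in files:
        # Start a new batch once adding this file would exceed the target.
        if current and current_size + size > TARGET_BYTES:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

In practice the same effect is achieved inside Hadoop itself (for example via a combining input format), but the grouping logic is essentially this.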

Snowplow 0.8.5 released with ETL bug fixes

24 May 2013  •  Alex Dean
We are pleased to announce the immediate availability of Snowplow 0.8.5. This is a bug fixing release, following on from our launch last week of Snowplow 0.8.4 with geo-IP lookups. This release fixes one showstopper issue with Snowplow 0.8.4, and also includes a set of smaller enhancements to help the Scalding ETL better handle “bad quality” event data from webpages. We recommend everybody on the Snowplow 0.8.x series upgrade to this version. Many thanks to...

Measuring how much traffic individual items in your catalog drive to your website

22 May 2013  •  Yali Sassoon
We have just added a new recipe to the catalog analytics section of the Analytics Cookbook. This recipe describes: How to measure how effectively different items in your catalog drive visits to your website. How to use the data to unpick how each item drives that traffic. In digital marketing, we can distinguish classic “outbound marketing”, where we push visitors to our website using paid ad campaigns, for example, from “inbound marketing”, where we pull...

Performing market basket analysis on web analytics data with R

20 May 2013  •  Yali Sassoon
We have just added a new recipe to the Analytics Cookbook: one that walks through the process of performing a market basket analysis, to identify associations between products and/or content items based on user purchase / viewing behavior. The recipe covers performing the analysis on Snowplow data using R and the arules package in particular. Although the example walked through uses Snowplow data, the same approach can be used with other data sets: I’d be...

Snowplow 0.8.4 released with MaxMind geo-IP lookups

16 May 2013  •  Alex Dean
We are pleased to announce the immediate availability of Snowplow 0.8.4. This is a big release, which adds geo-IP lookups to the Snowplow Enrichment stage, using the excellent GeoLite City database from MaxMind, Inc. This has been one of the most requested features from the Snowplow community, so we are delighted to launch it. Now you can determine the location of your website visitors directly from the Snowplow events table, and plot that data on...

A guide to unstructured events in Snowplow 0.8.3

14 May 2013  •  Alex Dean
Earlier today we announced the release of Snowplow 0.8.3, which updated our JavaScript Tracker to add the ability to send custom unstructured events to a Snowplow collector with trackUnstructEvent(). In our earlier blog post we briefly introduced the capabilities of trackUnstructEvent with some example code. In this blog post, we will take a detailed look at Snowplow’s custom unstructured events functionality, so you can understand how best to send unstructured events to Snowplow. Understanding the...

Snowplow 0.8.3 released with unstructured events

14 May 2013  •  Alex Dean
We’re pleased to announce the release of Snowplow 0.8.3. This release updates our JavaScript Tracker to version 0.11.2, adding the ability to send custom unstructured events to a Snowplow collector with trackUnstructEvent(). The Clojure Collector is also bumped to 0.5.0, to include some important bug fixes. Please note that this release only adds unstructured events to the JavaScript Tracker - adding unstructured events to our Enrichment process and storage targets is on the roadmap -...

Where does your traffic really come from?

10 May 2013  •  Yali Sassoon
Web analysts spend a lot of time exploring where visitors to their websites come from: Which sites and marketing campaigns are driving visitors to your website? How valuable are those visitors? What should you be doing to drive up the number of high quality users? (In terms of spending more on marketing, engaging with other websites / blogs / social networks etc.) Unfortunately, identifying where your visitors come from is not as straightforward as it often...

Snowplow 0.8.2 released with Clojure Collector enhancements

08 May 2013  •  Alex Dean
We’re pleased to announce the immediate availability of Snowplow 0.8.2. This release updates the Clojure Collector only; if you are using the CloudFront Collector, then no upgrade to 0.8.2 is necessary. Many thanks to community member Mark H. Butler for his major contributions to this release - much appreciated Mark! This release bumps the Clojure Collector to version 0.4.0. There are three main changes to the Collector: Building the Collector’s warfile is now much simpler,...

Funnel analysis with Snowplow (Platform analytics part 1)

23 April 2013  •  Yali Sassoon
Eleven days ago, we started building out the Catalog Analytics section of the Analytics Cookbook, with a set of recipes covering how to measure the performance of content pages and product pages. Today we’ve published the first set of recipes in the new platform analytics section of the Cookbook. By ‘platform analytics’, we mean analytics performed to answer questions about how your platform (or ‘website’, ‘application’ or ‘product’) performs. This is one of the most...

Measuring content page performance with Snowplow (Catalog Analytics part 2)

18 April 2013  •  Yali Sassoon
This is the second part in our blog post series on Catalog Analytics. The first part was published last week. Last week, we started building out the Catalog Analytics section of the Analytics Cookbook, with a section documenting how to measure the effectiveness of your product pages. Those recipes were geared specifically towards retailers. This week, we’ve added an extra section to the cookbook, covering how to measure engagement levels with content pages. The recipes...

Snowplow 0.8.1 released with referer URL parsing

12 April 2013  •  Alex Dean
Just nine days after our Snowplow 0.8.0 release, we are pleased to have our next release ready: Snowplow 0.8.1. With the last release we promised that the new Scalding-based ETL/enrichment process would lay a strong technical foundation for our roadmap - and hopefully this release bears that out! Until this release, Snowplow has provided users with the raw referer URL, from which analysts can deduce who the referer was. In this release, Snowplow processes that referer...

Measuring product page performance with Snowplow (Catalog Analytics part 1)

12 April 2013  •  Yali Sassoon
We built Snowplow to enable businesses to execute the widest range of analytics on their web event data. One area of analysis we are particularly excited about is catalog analytics for retailers. Today, we’ve published the first recipes in the catalog analytics section of the Snowplow Analytics Cookbook. These cover how to measure and compare the performance of different product pages on an ecommerce site, using plots like the one below: In this blog post,...

Towards high-fidelity web analytics - introducing Snowplow's innovative new event validation capabilities

10 April 2013  •  Alex Dean
A key goal of the Snowplow project is enabling high-fidelity analytics for businesses running Snowplow. What do we mean by high-fidelity analytics? Simply put, high-fidelity analytics means Snowplow faithfully recording all customer events in a rich, granular, non-lossy and unopinionated way. This data is incredibly valuable: it enables companies to better understand their customers and develop and tailor products and services to them. Ensuring that the data is high fidelity is essential to ensuring that...

Snowplow 0.8.0 released with all-new Scalding-based data enrichment

03 April 2013  •  Alex Dean
A new month, a new release! We’re excited to announce the immediate availability of Snowplow version 0.8.0. This has been our most complex release to date: we have done a full rewrite of our ETL (aka enrichment) process, adding a few nice data quality enhancements along the way. This release has been heavily informed by our January blog post, The Snowplow development roadmap for the ETL step - from ETL to enrichment. In technical terms, we...

Snowplow Arduino Tracker released - sensor and event analytics for the internet of things

25 March 2013  •  Alex Dean
Today we are releasing our first non-Web tracker for Snowplow - an event tracker for the Arduino open-source electronics prototyping platform. The Snowplow Arduino Tracker lets you track sensor and event-stream information from one or more IP-connected Arduino boards. We chose this as our first non-Web tracker because we’re hugely excited about the potential of sophisticated analytics for the Internet of Things, following in the footsteps of great projects like Cosm and Exosite. And of...

Inside the Plow - Rob Slifka's Elasticity

20 March 2013  •  Alex Dean
The Snowplow platform is built standing on the shoulders of a whole host of different open source frameworks, libraries and tools. Without the amazing ongoing work by these individuals, companies and not-for-profits, the Snowplow project literally could not exist. As part of our “Inside the Plow” series, we will also be showcasing some of these core components of the Snowplow stack, and talking to their creators. To kick us off, we are delighted to have...

Snowplow 0.7.6 released with Redshift data warehouse support

03 March 2013  •  Alex Dean
We’re excited to announce the immediate release of Snowplow version 0.7.6 with support for storing your Snowplow events in Amazon Redshift. We were very excited when Amazon announced Redshift back in late 2012, and we have been working to integrate Snowplow data since Redshift became generally available two weeks ago. Our tests with Redshift since launch have not disappointed - and we can’t wait to see what the Snowplow community do with the new platform!...

Snowplow 0.7.5 released with important JavaScript fix

25 February 2013  •  Alex Dean
We are releasing Snowplow version 0.7.5 - which upgrades the JavaScript tracker to version 0.11.1. This is a small but important release - because we are fixing an issue introduced in a Snowplow release a month ago: if you are on versions 0.9.1 to 0.11.0 of the JavaScript tracker, please upgrade! Essentially, version 0.9.1 of the JavaScript tracker (released in Snowplow 0.7.2) fixed an old bug which we inherited from the Piwik JavaScript tracker when we...

Snowplow 0.7.4 released for better eventstream analytics

22 February 2013  •  Alex Dean
Another week, another release! We’re excited to announce Snowplow version 0.7.4. The primary purpose of this release is to clean up and rationalise our event data model, in particular around user IDs and event timestamps. This release should lay the foundations for more sophisticated eventstream analytics (such as funnel analysis), by: Enabling companies to assign custom user IDs (e.g. when a customer logs on) Distinguish between IDs set at a domain level (via first-party cookies)...

Bulk loading data from Amazon S3 into Redshift at the command line

20 February 2013  •  Yali Sassoon
On Friday Amazon launched Redshift, a fully managed, petabyte-scale data warehouse service. We’ve been busy since building out Snowplow support for Redshift, so that Snowplow users can use Redshift to store their granular, customer-level and event-level data for OLAP analysis. In the course of building out Snowplow support for Redshift, we need to bulk load data stored in S3 into Redshift, programmatically. Unfortunately, the Redshift Java SDK is very slow at inserts, so not suitable...
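The bulk-load approach boils down to issuing a single Redshift COPY statement that points at the files in S3, rather than inserting rows one at a time. A minimal sketch of building such a statement — table, bucket and credential values here are illustrative placeholders, not Snowplow's actual configuration:

```python
# Build a Redshift COPY statement that bulk loads tab-delimited files
# from an S3 path. Redshift fetches the files directly from S3, which is
# orders of magnitude faster than issuing individual INSERTs.
def build_copy_statement(table, s3_path, access_key, secret_key, delimiter="\t"):
    return (
        f"COPY {table} FROM '{s3_path}' "
        f"CREDENTIALS 'aws_access_key_id={access_key};"
        f"aws_secret_access_key={secret_key}' "
        f"DELIMITER '{delimiter}';"
    )

sql = build_copy_statement(
    "events", "s3://my-bucket/snowplow/events/", "AKIA...", "SECRET..."
)
# Execute `sql` against Redshift with any PostgreSQL-compatible driver
# (e.g. psycopg2), since Redshift speaks the PostgreSQL wire protocol.
```

Beware that interpolating credentials into SQL strings is only acceptable in a trusted, programmatic loading context like this one.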

Reflections on Saturday's Measurecamp

18 February 2013  •  Yali Sassoon
On Saturday both Alex and I were lucky enough to attend London’s second Measurecamp, an unconference dedicated to digital analytics. The venue was packed with smart people sharing some really interesting ideas - we can’t do justice to all those ideas here, so I’ve just outlined my favourite two from the day: Using keywords to segment audience by product and interest match, courtesy of Carmen Mardiros Transferring commercially sensitive data into your web analytics platform...

Snowplow 0.7.3 released, tracking additional data

15 February 2013  •  Alex Dean
We’re excited to announce the release of Snowplow version 0.7.3. This release adds a set of 16 all-new fields to our event model: A new Event Vendor field The Page URL split out into its component parts (scheme, host, port, path, querystring, fragment/anchor) The web page’s character set The web page’s width and height The browser’s viewport (i.e. visible width and height) For page pings, we are now tracking the user’s scrolling during the last...
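The URL decomposition described above maps naturally onto a standard URL parser. A rough sketch in Python (not the Snowplow enrichment code itself, which runs on the JVM):

```python
from urllib.parse import urlparse

# Split a page URL into the component fields listed above: scheme, host,
# port, path, querystring and fragment/anchor.
def split_page_url(url):
    p = urlparse(url)
    return {
        "scheme": p.scheme,
        "host": p.hostname,
        # Fall back to the scheme's default port when none is explicit.
        "port": p.port or (443 if p.scheme == "https" else 80),
        "path": p.path,
        "querystring": p.query,
        "fragment": p.fragment,
    }

print(split_page_url("https://snowplowanalytics.com/blog/index.html?page=2#latest"))
# → {'scheme': 'https', 'host': 'snowplowanalytics.com', 'port': 443,
#    'path': '/blog/index.html', 'querystring': 'page=2', 'fragment': 'latest'}
```

Storing the parts as separate fields lets analysts filter and group on, say, path or querystring without re-parsing URLs at query time.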

Writing Hive UDFs - a tutorial

08 February 2013  •  Alex Dean
Snowplow’s own Alexander Dean was recently asked to write an article for the Software Developer’s Journal edition on Hadoop. The kind folks at the Software Developer’s Journal have allowed us to reprint his article in full below. Alex started writing Hive UDFs as part of the process to write the Snowplow log deserializer - the custom SerDe used to parse Snowplow logs generated by the Cloudfront and Clojure collectors so they can be processed in...

Help us build out the Snowplow Event Model

04 February 2013  •  Yali Sassoon
At its beating heart, Snowplow is a platform for capturing, storing and analysing event-data, with a real focus on web event data. Working out how best to structure the Snowplow event data is key to making Snowplow a success. One of the things that has surprised us, since we started working on Snowplow, is the extent to which our view of the best way to structure that data has changed over time. In this blog...

Snowplow 0.7.2 released, with the new Pixel tracker

29 January 2013  •  Alex Dean
We’re excited to announce the release of Snowplow version 0.7.2. As well as a couple of bug fixes, this release includes our second Snowplow tracker - the Pixel Tracker, to be used in web environments where a JavaScript-based tracker is not an option. One of the bug fixes is particularly important: we are recommending that all users of the Clojure-based Collector upgrade to the new version (0.2.0) due to a serious bug in the way...

Introducing the Pixel tracker

29 January 2013  •  Yali Sassoon
The Pixel tracker enables companies running Snowplow to track users in environments that do not support Javascript. In this blog post we will cover: The purpose of the Pixel tracker How it works Considerations when using the Pixel tracker with the Clojure collector in particular Next steps on the Snowplow tracker roadmap What is the purpose of the Pixel tracker? Our aim with Snowplow has been to enable companies to track user events across all...

Snowplow 0.7.1 released, with easier-to-run Ruby apps

22 January 2013  •  Alex Dean
We’re happy to announce the release of Snowplow version 0.7.1. This release is designed to make it much easier to install and run the two Snowplow Ruby applications: EmrEtlRunner - which runs the Snowplow ETL job StorageLoader - which loads Snowplow events into Infobright From the feedback we received, setting up and running these two Ruby apps was the most challenging (and error-prone) part of the Snowplow experience. Many thanks to all of those in...

What data should you be passing into your tag manager?

21 January 2013  •  Yali Sassoon
Since the launch of Google Tag Manager, a plethora of blog posts have been written on the value of tag management solutions. What has been left out of the discussion is practical advice on how to setup your tag management solution (be it GTM or OpenTag or one of the paid solutions), and, crucially, what data you should be passing into your tag manager. In this post, we will outline a methodology for identifying all...

Snowplow reaches 202 stars on GitHub

20 January 2013  •  Alex Dean
As of this weekend, the Snowplow repository on GitHub now has over 200 stars! We’re hugely excited to reach this milestone - this makes us: The 3rd most-watched analytics project on GitHub, after Hummingbird (real-time analytics) and Countly (mobile analytics) The 58th most-watched Scala project on GitHub Many thanks to everyone in the Snowplow community and on GitHub for their support and interest! We thought it might be interesting to share the Red Dwarf heatmap...

Implementing Snowplow with QuBit's OpenTag

18 January 2013  •  Yali Sassoon
This is a short blog post to highlight a new section on the Snowplow setup guide covering how to integrate Snowplow with QuBit’s OpenTag tag management system. In November last year, we started playing with tag management systems: testing Snowplow with Google Tag Manager, and documented how to setup Snowplow with GTM on the Snowplow setup guide. We were impressed on a number of fronts, but thought that much more thought needed to be...

Scala MaxMind GeoIP library released

16 January 2013  •  Alex Dean
A short blog post this, to announce the release of Scala MaxMind GeoIP, our Scala wrapper for the MaxMind Java Geo-IP library. We have extracted Scala MaxMind GeoIP from our current (ongoing) work porting our ETL process from Apache Hive to Scalding. We extracted this as a separate library for two main reasons: Being good open-source citizens - as with our referer-parser library, we believe this library will be useful to the wider community of...

The Snowplow development roadmap for the ETL step - from ETL to enrichment

09 January 2013
In this blog post, we outline our plans to develop the ETL (“extract, transform and load”) part of the Snowplow stack. Although in many respects the least sexy element of the stack, it is critical to Snowplow, and we intend to re-architect the ETL step in quite significant ways. In this post, we discuss our plans and the rationale behind them, in the hope of getting: Feedback from the community on them Ideas for alternative...

Using ChartIO to visualise and interrogate Snowplow data

08 January 2013
In the last couple of weeks, we have been experimenting with ChartIO - a hosted BI tool for visualising data and creating dashboards. So far, we are very impressed - ChartIO is an excellent analytics tool to use to interrogate and visualise Snowplow data. Given the number of requests we get from Snowplow users to recommend tools to assist with analytics on Snowplow data, we thought it well worth sharing why ChartIO is so good,...

Understanding the thinking behind the Clojure Collector, and mapping out its development going forwards

07 January 2013
Last week we released Snowplow 0.7.0: which included a new Clojure Collector, with some significant new functionality for content networks and ad networks in particular. In this post we explain a lot of the thinking behind the Clojure Collector architecture, before taking a look ahead at the short and long-term development roadmap for the collector. This is the first in a series of posts in which we describe in some detail the thinking behind the...

Snowplow 0.7.0 released, with new Clojure-based collector

03 January 2013  •  Alex Dean
Today we are hugely excited to announce the release of Snowplow version 0.7.0, which includes an experimental new Clojure-based collector designed to run on Amazon Elastic Beanstalk. This release allows you to use Snowplow to uniquely identify and track users across multiple domains - even across a whole content or advertising network. Many thanks to community member Simon Rumble for developing many of the ideas underpinning the new collector in SnowCannon, his node.js-based collector for...

referer-parser now with Java, Scala and Python support

02 January 2013  •  Alex Dean
Happy New Year all! It’s been three months since we introduced our Attlib project, now renamed to referer-parser, and we are pleased to announce that referer-parser is now available in three additional languages: Java, Scala and Python. To recap: referer-parser is a simple library for extracting search marketing attribution data from referer (sic) URLs. You supply referer-parser with a referer URL; it then tells you whether the URL is from a search engine - and...

Snowplow 0.6.5 released, with improved event tracking

26 December 2012  •  Alex Dean
We’re excited to announce our next Snowplow release - version 0.6.5, a Boxing Day release for Snowplow! This is a big release for us, as it introduces the idea of event types - every event sent by the JavaScript tracker to the collector now has an event field which specifies what type of event it is. This should be really helpful for a couple of things: It should make querying Snowplow events much easier It...

Snowplow 0.6.4 released, with Infobright improvements

20 December 2012  •  Alex Dean
We’re happy to announce our next Snowplow release - version 0.6.4. This release includes updates: An upgraded Infobright table definition which scales to millions of pageviews easily Clarified Hive table definitions Before we start - a big thanks to the community members who helped out on this release: Gilles Moncaubeig @ OverBlog worked closely with us on the updated Infobright table definition Mike Moulton @ meltmedia for flagging the missing Hive table definition We’ll take...

Snowplow 0.6.3 released, with JavaScript and HiveQL bug fixes

18 December 2012  •  Alex Dean
Today we are releasing Snowplow version 0.6.3 - another clean-up release following on from the 0.6.2 release. This release bumps the JavaScript Tracker to version 0.8.2, and the Hive-data-format HiveQL file to version 0.5.2. Many thanks to the community members who contributed bug fixes to this release: Mike Moulton @ meltmedia, Simon Andersson @ Qwaya and Michael Tibben @ 99designs. We’ll take a look at both fixes below: JavaScript tracker fixes This release fixes the...

Transforming Snowplow data so that it can be interrogated in BI / OLAP tools like Tableau, Qlikview and Pentaho

17 December 2012  •  Yali Sassoon
Because Snowplow does not ship with any sort of user interface, we get many enquiries from current and prospective users who would like to interrogate Snowplow data with popular BI tools like Tableau or Qlikview. Unfortunately, it is not possible to run a tool like Tableau directly on top of the Snowplow events table. That is because these tools require the data to be in a particular format: one in which each line of data...

Snowplow 0.6.2 released, with JavaScript tracker bug fixes

28 November 2012  •  Alex Dean
Today we are releasing Snowplow version 0.6.2 - a clean-up release after yesterday’s 0.6.1 release. This release bumps the JavaScript Tracker to version 0.8.1; the updated minified tracker is available as always here: http(s)://d1fc8wv8zag5ca.cloudfront.net/0.8.1/sp.js This release fixes two bugs: Issue #101 - we had left in a console.log() in the production version, which should only have been printed in debug mode. Harmless but worth taking out. Many thanks to Michael Tibben @ 99designs for spotting...

Snowplow 0.6.1 released, with lots of small improvements

27 November 2012  •  Alex Dean
We’re happy to announce our next Snowplow release - version 0.6.1. This release includes updates: Additional data collection. The Javascript tracker has been updated to capture additional data points, including a user fingerprint (which can be used as a user_id for companies tracking users across domains), the tracker version, browser timezone and color depth Javascript tracker updates. A number of updates have been made to make the Javascript tracker more robust Updates to the ETL...

Integrating Snowplow with Google Tag Manager

16 November 2012  •  Yali Sassoon
A month and a half ago, Google launched Google Tag Manager (GTM), a free tag management solution. That was a defining moment in tag management history as it will no doubt bring tag management, until now the preserve of big enterprises, into the mainstream. We have spent some time testing how to get Snowplow tags working well with Google Tag Manager, and have documented our recommended approach to setting up Snowplow with GTM on the...

Snowplow 0.6.0 released, with the new StorageLoader

12 November 2012  •  Alex Dean
We’re very pleased to start the week by releasing a new version of Snowplow - version 0.6.0. This is a big release for us - as it includes the first version of our all-new StorageLoader. The release also includes a small set of tweaks and bug fixes across the existing Snowplow components, but let’s start by introducing StorageLoader: Introducing StorageLoader Up until now, Snowplow has stored all its data in S3, where it can be...

Snowplow 0.5.2 released, and introducing the Sluice Ruby gem

06 November 2012  •  Alex Dean
Another week, another release: Snowplow 0.5.2! This is a small release, consisting just of a small set of bug fixes and improvements to EmrEtlRunner - although we’ll also use this post to introduce our new Ruby gem, called Sluice. Many thanks to community member Tom Erik Stower for his testing of EmrEtlRunner over the weekend, which helped us to identify and fix these bugs: Bugs fixed Issue 71: the template config.yml (in the GitHub repo...

Snowplow 0.5.1 released, with lots of small improvements

01 November 2012  •  Alex Dean
We have just released Snowplow 0.5.1! Rather than one large new feature, version 0.5.1 is an incremental release which contains lots of small fixes and improvements to the ETL and storage sub-systems. The two big themes of these updates are: Improving the robustness of the ETL process Laying the foundations for loading Snowplow events into Infobright Community Edition (ICE) To take each of these themes in turn: 1. A more robust ETL process The Hive...

Snowplow in a Universal Analytics world - what the new version of Google Analytics means for companies adopting Snowplow

31 October 2012  •  Yali Sassoon
Earlier this week, Google announced a series of significant advances in Google Analytics at the GA Summit, that are collectively referred to as Universal Analytics. In this post, we look at: The actual features Google has announced How those advances change the case for companies considering adopting Snowplow 1. What changes has Google announced? The most significant change Google has announced is the new Measurement Protocol, which enables businesses using GA to capture much more...

Snowplow 0.5.0 released, now with a Ruby gem to run Snowplow's ETL process on Amazon EMR

25 October 2012  •  Alex Dean
We have just released Snowplow 0.5.0, with an all-new component, the Snowplow EmrEtlRunner. EmrEtlRunner is a Ruby application to run Snowplow’s Hive-based ETL (extract, transform, load) process on Amazon Elastic MapReduce with minimum fuss. We are hugely grateful to community member Michael Tibben from 99designs for his contributions to EmrEtlRunner: thanks to Michael, EmrEtlRunner is more efficient, more flexible and more robust than it otherwise would have been - and ready sooner. Many thanks Michael!...

Performing web analytics on Snowplow data using Tableau - a video demo

24 October 2012  •  Yali Sassoon
People who see Snowplow for the first time often ask us to "show Snowplow in action". It is one thing to tell someone that having access to their customer- and event-level data will open up whole new analysis possibilities, but it is another thing to demonstrate those possibilities. Demonstrating Snowplow is tricky because currently, Snowplow only gives you access to data: we have no snazzy front-end UI to show off. The good news is that...

Infobright Ruby Loader Released

21 October 2012  •  Alex Dean
We’re pleased to start the week with the release of a new Ruby gem, our Infobright Ruby Loader (IRL). At Snowplow we’re committed to supporting multiple different storage and analytics options for Snowplow events, alongside our current Hive-based approach. One of the alternative data stores we are working with is Infobright, a columnar database which is available in open source and commercial versions. For all but the largest Snowplow users, columnar databases such as Infobright...

How we use Hive at Snowplow, and how the role of Hive is changing. (Slides from our presentation to Hive London.)

12 October 2012  •  Yali Sassoon
Last night I gave a presentation to the clever folks at Hive London covering three things: How big data technologies like Apache Hive are transforming web analytics How we’ve used Hive in Snowplow development How the role of Hive has changed at Snowplow over time, including a comparison of Hive against other technologies. The slides from the presentation are below. As always, any questions / comments, please post them below.

Snowplow 0.4.10 released

11 October 2012  •  Alex Dean
We have just released version 0.4.10 of Snowplow - people using 0.4.8 can jump straight to this version. This version updates: snowplow.js to version 0.7.0 the Hive deserializer to version 0.4.9 Big thanks to community members Michael Tibben from 99designs and Simon Andersson from Qwaya for their most-helpful contributions to this release! Main changes The main changes are as follows: The querystring parameter for site ID which the JavaScript tracker sends to your collector is...

Attlib - an open source library for extracting search marketing attribution data from referrer URLs

11 October 2012  •  Yali Sassoon
Update 17-Dec-12: We have renamed Attlib to referer-parser, to make it clearer what Attlib does: parse referer URLs. The repository has been updated accordingly. Some of the example code below is out-of-date now: we recommend checking out the repository for more information. Last night we published Attlib, an open source Ruby library for extracting search marketing attribution data from referer (sic) URLs. In this post we talk through: What Attlib does, and how to use...

Why set your data free?

24 September 2012  •  Yali Sassoon
At Saturday’s Measure Camp, I had the chance to introduce Snowplow to a large number of incredibly thoughtful and insightful people in the web analytics industry. With each person, I started by explaining that Snowplow gave them direct access to their customer-level and event-level data. The response I got in nearly all cases was: what does having direct access to my web analytics data enable me to do, that I can’t do with Google...

Snowplow 0.4.8 released

14 September 2012  •  Alex Dean
We have just released Snowplow version 0.4.8, with a set of enhancements to the existing Hive deserializer: The Hive deserializer now supports Amazon’s new CloudFront log file format (launched 12 September 2012) as well as the older format The Hive deserializer now supports a tracking pixel called simply i (saving some characters versus ice.png) (issue #35) The Hive deserializer now works if the CloudFront distribution has Forward Query String = yes (issue #39) The Hive...

Snowplow 0.4.7 released with additional JavaScript tracking options

06 September 2012  •  Alex Dean
We have just released Snowplow version 0.4.7. This release bumps the Snowplow JavaScript tracker to version 0.6, with two significant new features: The ability to set a site ID for your tracking - useful for multi-site publishers The ability to log ecommerce transactions - useful for merchants wanting to track orders A huge thanks to community member Simon Andersson from Qwaya for contributing the ecommerce tracking functionality - thank you Simon! We’ll take a look...
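The two new tracker features can be sketched with the async snippet style the tracker uses — a minimal sketch, assuming `setSiteId`, `addTrans` and `trackTrans` method names and a GA-style argument order for transactions; these are illustrative guesses based on this announcement, not a verified API:

```javascript
// The tracker consumes commands pushed onto the _snaq queue; before sp.js
// loads, _snaq is just a plain array, so this runs anywhere.
var _snaq = _snaq || [];

// Tag all events from this page with a site ID (useful for multi-site publishers).
// Method name 'setSiteId' is an assumption.
_snaq.push(['setSiteId', 'blog']);

// Log an ecommerce transaction: order ID, affiliation, total, tax, shipping,
// city, state, country. The GA-style argument order here is an assumption.
_snaq.push(['addTrans', 'order-123', '', '29.99', '4.99', '5.00',
            'London', '', 'United Kingdom']);
_snaq.push(['trackTrans']); // send the queued transaction
```

In the browser, the asynchronously loaded tracker script drains the `_snaq` queue and replays each command, so pushes made before the script arrives are not lost.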

Amazon announces Glacier - lowers the cost of running Snowplow

21 August 2012  •  Alex Dean
Today Amazon announced the launch of Amazon Glacier, which is a low-cost data archiving service designed for rarely accessed data. As Werner Vogels described it in his blog post this morning: Amazon Glacier provides the same high durability guarantee as Amazon S3 but relaxes the access times to a few hours. This is the right service for customers who have archival data that requires highly reliable storage but for which immediate access is not needed....

Snowplow 0.4.6 released

20 August 2012  •  Alex Dean
Over the weekend we released Snowplow version 0.4.6. This was a minor release that added a new capability into the Snowplow JavaScript tracker. Specifically, with the JavaScript you can now specify your own collector URL, rather than simply pass in an account ID which resolves to a CloudFront bucket. You can use this feature in your JavaScript invocation code like so: <!-- Snowplow starts plowing --> <script type="text/javascript"> var _snaq = _snaq...
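The truncated snippet above can be fleshed out as follows — a sketch assuming a `setCollectorUrl` command for the feature this release describes; the method name and the collector hostname are placeholders, not confirmed from the release itself:

```javascript
// Point the JavaScript tracker at your own collector endpoint instead of
// resolving an account ID to a CloudFront bucket.
// 'setCollectorUrl' and 'collector.example.com' are assumptions for illustration.
var _snaq = _snaq || [];
_snaq.push(['setCollectorUrl', 'collector.example.com']); // your own endpoint
_snaq.push(['trackPageView']);                            // queue a page view
```

As with the rest of the async snippet, these pushes are buffered in a plain array until the tracker script loads and replays them.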

Updated Hive SerDe released

14 August 2012  •  Alex Dean
One of the key elements in the Snowplow technology stack is the Hive SerDe. This is what makes it possible for Elastic MapReduce to read the CloudFront log files generated by the Snowplow JavaScript tracking tags, extract the relevant fields and make these available in Hive as a nice, clean query table. (The structure of the Hive table is documented here). A number of improvements have been made in the new versions. However, the most...

SnowCannon - a node.js collector for Snowplow

13 August 2012  •  Alex Dean
We are hugely excited to introduce SnowCannon, a Node.js collector for Snowplow, authored by [@shermozle](http://twitter.com/shermozle). SnowCannon is an alternative collector to the default CloudFront collector included with Snowplow. It offers a number of significant advantages over the CloudFront collector: It allows the use of 3rd party cookies. In particular, this makes it possible to track usage across multiple domains It enables real-time analytics. (This is not possible with the CloudFront-based collector, where there’s a...

The setup guide has been overhauled

02 August 2012  •  Yali Sassoon
Following a lot of invaluable feedback from users setting up Snowplow for the first time, we’ve updated the Snowplow setup documentation. The documentation can be found here. Any further feedback would be much appreciated - we want to make it as painless as possible for Snowplow newbies to get up and running…