Loading and analyzing Snowplow event data in Neo4j

17 July 2017  •  Dilyan Damyanov
Back in 2014 we published a series of blog posts on using Snowplow event data in the graph database Neo4j. Three years on, they’re still among our most popular blog posts. (See below for links to the original posts.) A lot has changed since then. Neo4j has strengthened its position as a leading graph database solution. Its query language, Cypher, has grown with the platform. It has changed to the point where some of the...

How to develop better games with level analytics

12 April 2017  •  Colm O Griobhtha
Summary: Product managers and game designers generally aim to design game levels so that they challenge gamers enough to make completing a level satisfying, but not so challenging that they drop out and stop playing the game. This blog post shows an example of how product managers and game designers can use a well-designed dashboard to better understand user behaviour across a game’s levels, design highly playable game levels, A/B test...

How a clear data taxonomy drives insight and action

27 January 2017  •  João Correia
This is a guest blog post by João Correia, Senior Analytics Strategist at YouCaring and an experienced analytics professional helping organizations embed analytics for growth and innovation. In this post, João explains how to define an analytics strategy with Snowplow Analytics that considers your business context and drives insights and action. Many thanks to João for sharing his views on this topic! If you have a story to share, feel free to get in touch. Add...

Debugging bad data in Elasticsearch and Kibana - a guide

03 March 2016  •  Yali Sassoon
One of the features that makes Snowplow unique is that we actually report bad data: any data that hits the Snowplow pipeline and fails to be processed successfully. This is incredibly valuable, because it means you can:

- Spot data tracking issues quickly as they emerge, and address them at source
- Have a correspondingly high degree of confidence that trends in the data reflect trends in the business and not data issues

Recently we extended Snowplow so...

Building first and last touch attribution models in Redshift SQL

22 February 2016  •  Yali Sassoon
In order to calculate the return on marketing spend on individual campaigns, digital marketers need to connect revenue events, downstream in a user journey, with marketing touch events, upstream in a user journey. This connection is necessary so that the cost of the marketing campaigns that drove those marketing touches can be connected to the profit associated with the conversion events later on. Different attribution models involve applying different logic to connecting those...
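To give a flavour of the first-touch variant, here is a minimal sketch in Redshift SQL, assuming a simplified events table with hypothetical user_id, event_type, mkt_channel, revenue and event_tstamp columns (not the post’s actual model):

```sql
-- First-touch attribution sketch: credit each user's conversion revenue
-- to the earliest marketing touch in their journey.
WITH first_touch AS (
    SELECT
        user_id,
        mkt_channel,
        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_tstamp) AS touch_rank
    FROM events
    WHERE event_type = 'marketing_touch'
)
SELECT
    ft.mkt_channel,
    SUM(c.revenue) AS attributed_revenue
FROM events AS c
JOIN first_touch AS ft
    ON ft.user_id = c.user_id
   AND ft.touch_rank = 1
WHERE c.event_type = 'conversion'
GROUP BY ft.mkt_channel;
```

A last-touch model simply ranks touches in descending time order (or restricts to touches preceding each conversion); the join logic otherwise stays the same.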

Dealing with duplicate event IDs

19 August 2015  •  Christophe Bogaert
The Snowplow pipeline outputs a data stream in which each line represents a single event. Each event comes with an identifier, the event ID, which was generated by the tracker and is—or rather should be—unique. However, after having used Snowplow for a while, users often notice that some events share an ID. Events are sometimes duplicated within the Snowplow pipeline itself, but it’s often the client-side environment that causes events to be sent in with...
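As a quick illustration (a sketch against the standard atomic.events table, not taken from the post itself), duplicated IDs can be surfaced with a simple aggregation:

```sql
-- Find event IDs that occur more than once in the events table.
SELECT
    event_id,
    COUNT(*) AS occurrences
FROM atomic.events
GROUP BY event_id
HAVING COUNT(*) > 1
ORDER BY occurrences DESC
LIMIT 100;
```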

Analyzing marketing attribution data with a D3.js visualization

02 July 2015  •  Justine Courty
Marketing attribution, as in understanding what impact different marketing channels have in driving conversion, is a very complex problem:

- We have no way of directly measuring the impact of an individual channel on a user’s propensity to convert
- It is not uncommon for users to interact with many channels prior to converting
- It is likely that different channels impact each other’s effectiveness

Because of this difficulty, there is not yet a consensus in digital analytics...

JSON schemas for Redshift datatypes

12 February 2015  •  Fred Blundun
This blog contains JSON schemas for all the data types supported by Amazon Redshift. We supply two schemas for each numeric type, since you may want to send in numeric types as JSON strings rather than JSON numbers. The types covered are: SMALLINT, INTEGER, BIGINT, DECIMAL, REAL, DOUBLE PRECISION, BOOLEAN, CHAR, VARCHAR, DATE and TIMESTAMP. For SMALLINT, the schema for passing the value in as a number is { "type": "integer" }, and the schema for passing the value in as...

Using graph databases to perform pathing analysis - initial experiments with Neo4J

31 July 2014  •  Nick Dingwall
In the first post in this series, we raised the possibility that graph databases might allow us to analyze event data in new ways, especially where we were interested in understanding the sequences that events occurred in. In the second post, we walked through loading Snowplow page view event data into Neo4J in a graph designed to enable pathing analytics. In this post, we’re going to see whether the hypothesis we raised in the first...

Loading Snowplow event-level data into Neo4J

30 July 2014  •  Nick Dingwall
In the last post, we discussed how particular types of analysis, particularly path analysis, are not well-supported in traditional SQL databases, and raised the possibility that graph databases like Neo4J might be good platforms for doing this sort of analysis. We went on to design a graph to represent event data, and page view data specifically, which captures the sequence of events. In this post, we’re going to walk through the process of taking Snowplow...

Can graph databases enable whole new classes of event analytics?

28 July 2014  •  Nick Dingwall
With Snowplow, we want to empower our users to get the most out of their data. Where your data lives has big implications for the types of query, and therefore analyses, you can run on it. Most of the time, we’re analysing data with SQL, and specifically, in Amazon Redshift. This is great for a whole class of OLAP-style analytics - it enables us to slice and dice different combinations of dimensions and metrics, for...

How configurable data models and schemas make digital analytics better

11 July 2014  •  Yali Sassoon
Digital analysts don’t typically spend a lot of time thinking about data models and schemas. How data is modelled and schema’d, both at data collection time, and at analysis time, makes an enormous difference to how easily insight and value can be derived from that data. In this post, I will explain why data models and schemas matter, and why being able to define your own event data model in Snowplow is a much better...

Understanding Snowplow's unique approach to identity stitching, including comparisons with Universal Analytics, Kissmetrics and Mixpanel

16 April 2014  •  Yali Sassoon
This post was inspired by two excellent, recently published posts on identity stitching: Yehoshua Coren’s post Universal Analytics is Out of Beta - Time to Switch? and Shay Sharon’s post on the intlock blog, The Full Customer Journey - Managing User Identities with Google Universal, Mixpanel and KISSmetrics. In both posts, the authors explain in great detail the limitations that traditional analytics solutions have when dealing with identity stitching. In this post, I hope to...

Why and how to use big data tools to process web analytics data? Joint Qubole and Snowplow webinar

19 February 2014  •  Yali Sassoon
Last night, I presented at a webinar organized by our friends at Qubole on using big data tools to analyze web analytics data. You can view the slides I presented below. On the webinar, I talked through the limitations associated with using traditional web analytics tools like Google Analytics and Adobe SiteCatalyst to analyze web analytics data, and how using big data technologies, and Snowplow and Qubole in particular, addressed those limitations. My talk was...

Introducing Looker - a fresh approach to Business Intelligence that works beautifully with Snowplow

10 December 2013  •  Yali Sassoon
In the last few weeks, we have been experimenting with using Looker as a front-end to analyse Snowplow data. We’ve really liked what we’ve seen: Looker works beautifully with Snowplow. Over the next few weeks, we’ll share example analyses and visualizations of Snowplow data in Looker, and dive into Looker in more detail. In this post, we’ll take a step back and walk through some context to explain why we are so excited about Looker....

Quick start guide to learning SQL to query Snowplow data published

19 November 2013  •  Yali Sassoon
Whilst it is possible to use different BI tools to query Snowplow data with limited or no knowledge of SQL, to really get the full power of Snowplow you need to know some SQL. To help Snowplow users who are not familiar with SQL, or those who could do with refreshing their knowledge, we’ve put together a quick start guide on the Analytics Cookbook. The purpose of the guide is to get the reader...
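To give a flavour of the starting point (a hypothetical beginner query against the standard atomic.events table, not one taken from the guide):

```sql
-- Count page views per day over the last week.
SELECT
    DATE_TRUNC('day', collector_tstamp) AS day,
    COUNT(*) AS page_views
FROM atomic.events
WHERE event = 'page_view'
  AND collector_tstamp > CURRENT_DATE - 7
GROUP BY 1
ORDER BY 1;
```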

Call for data! Help us develop experimental analyses. Have us help you answer your toughest business questions.

28 October 2013  •  Yali Sassoon
This winter we are recruiting interns to join the Snowplow team to work on discrete projects. A number of the candidates we have interviewed have expressed an interest in working with us to develop new analytics approaches on Snowplow data. In particular, we’ve had a lot of interest in piloting machine learning approaches to:

- Segmenting audience by behaviour
- Leveraging libraries for content / product recommendation (e.g. PredictionIO, Mahout, Weka)
- Developing and testing new approaches to...

Using the new SQL views to perform cohort analysis with ChartIO

22 October 2013  •  Yali Sassoon
We wanted to follow up our recent launch of Snowplow 0.8.10, with inbuilt SQL recipes and cubes, with a few posts demonstrating how you can use those views to quickly perform analytics on your Snowplow data. This is the first of those posts. In this post, we’ll cover how to perform a cohort analysis using ChartIO and Snowplow. Recap: what is cohort analysis? We have described cohort analysis at length in the Analyst Cookbook. To sum...
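As a rough sketch of what sits behind such a view (illustrative SQL against the standard atomic.events table, not the actual shipped view):

```sql
-- Cohort retention sketch: group users by the month they were first seen,
-- then count how many are active in each subsequent month.
WITH first_seen AS (
    SELECT
        domain_userid,
        DATE_TRUNC('month', MIN(collector_tstamp)) AS cohort_month
    FROM atomic.events
    GROUP BY domain_userid
)
SELECT
    f.cohort_month,
    DATE_TRUNC('month', e.collector_tstamp) AS activity_month,
    COUNT(DISTINCT e.domain_userid) AS active_users
FROM atomic.events AS e
JOIN first_seen AS f
    ON f.domain_userid = e.domain_userid
GROUP BY 1, 2
ORDER BY 1, 2;
```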

Book review - Apache Hive Essentials How-to

30 September 2013  •  Yali Sassoon
Although it is no longer part of the core Snowplow stack, Apache Hive is the gateway drug that got us started on Hadoop. As some of our recent blog posts testify, Hive is still very much a part of our big data toolkit, and this will continue as we use it to roll out new features. (E.g. for analyzing custom unstructured events.) I suspect that many Hadoopers started out with Hive, before experimenting with the...

Reprocessing bad rows of Snowplow data using Hive, the JSON Serde and Qubole

11 September 2013  •  Yali Sassoon
This post is outdated. For more documentation on debugging and recovering bad rows, please visit:

- Debugging bad rows in Elasticsearch and Kibana
- Debugging bad rows in Elasticsearch using curl (without Kibana)
- Snowplow 81 release post (for recovering bad rows)
- Hadoop Event Recovery

One of the distinguishing features of the Snowplow data pipeline is the handling of “bad” data. Every row of incoming, raw data is validated. When a row fails validation, it is logged in...

Using Qubole to crunch your Snowplow web data using Apache Hive

03 September 2013  •  Yali Sassoon
We’ve just published a getting-started guide to using Qubole, a managed Big Data service, to query your Snowplow data. You can read the guide here. Snowplow delivers event data to users in a number of different places:

- Amazon Redshift or PostgreSQL, so you can analyze the data using traditional analytics and BI tools
- Amazon S3, so you can analyze the data using Hadoop-backed big data tools (e.g. Mahout, Hive and Pig) on EMR

Since we...

Is web analytics easy or hard? Distinguishing different types of complexity, and approaches for dealing with them

28 June 2013  •  Yali Sassoon
This post is a response to an excellent, but old, blog post by Tim Wilson called Web Analytics Platforms are Fundamentally Broken, authored back in August 2011. Tim made the case (which still holds today) that web analytics is hard, and that part of what makes it hard is that web analytics platforms are fundamentally broken. After Tim published his post, a very interesting conversation ensued on Google+. Reading through it, I was struck by how many...

Getting started using R for data analysis

26 June 2013  •  Yali Sassoon
R is one of the most popular data analytics tools out there, with a rich and vibrant community of users and contributors. In spite of its popularity in general (and particularly amongst academics and statisticians), R is not a common tool to find in the business or web analyst’s arsenal, where Excel and Google Analytics tend to reign supreme. That is a real shame. R is a fantastic tool for exploring data, reworking it, visualizing...

Measuring how much traffic individual items in your catalog drive to your website

22 May 2013  •  Yali Sassoon
We have just added a new recipe to the catalog analytics section of the Analytics Cookbook. This recipe describes:

- How to measure how effectively different items in your catalog drive visits to your website
- How to use the data to unpick how each item drives that traffic

In digital marketing, we can distinguish classic “outbound marketing”, where we push visitors to our website using paid ad campaigns, for example, from “inbound marketing”, where we pull...
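For a flavour of the underlying query (an illustrative sketch assuming a hypothetical /products/ URL structure, not the recipe’s own SQL):

```sql
-- Visits from external referrers that land on each catalog item page.
SELECT
    page_urlpath,
    COUNT(DISTINCT domain_userid || '-' || CAST(domain_sessionidx AS VARCHAR)) AS visits
FROM atomic.events
WHERE event = 'page_view'
  AND refr_medium <> 'internal'
  AND page_urlpath LIKE '/products/%'  -- hypothetical catalog URL pattern
GROUP BY page_urlpath
ORDER BY visits DESC;
```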

Performing market basket analysis on web analytics data with R

20 May 2013  •  Yali Sassoon
We have just added a new recipe to the Analytics Cookbook: one that walks through the process of performing a market basket analysis, to identify associations between products and/or content items based on user purchase / viewing behavior. The recipe covers performing the analysis on Snowplow data using R and the arules package in particular. Although the example walked through uses Snowplow data, the same approach can be used with other data sets: I’d be...

Where does your traffic really come from?

10 May 2013  •  Yali Sassoon
Web analysts spend a lot of time exploring where visitors to their websites come from:

- Which sites and marketing campaigns are driving visitors to your website?
- How valuable are those visitors?
- What should you be doing to drive up the number of high quality users? (In terms of spending more on marketing, engaging with other websites / blogs / social networks etc.)

Unfortunately, identifying where your visitors come from is not as straightforward as it often...

Funnel analysis with Snowplow (Platform analytics part 1)

23 April 2013  •  Yali Sassoon
Eleven days ago, we started building out the Catalog Analytics section of the Analytics Cookbook, with a set of recipes covering how to measure the performance of content pages and product pages. Today we’ve published the first set of recipes in the new platform analytics section of the Cookbook. By ‘platform analytics’, we mean analytics performed to answer questions about how your platform (or ‘website’, ‘application’ or ‘product’) performs. This is one of the most...

Measuring content page performance with Snowplow (Catalog Analytics part 2)

18 April 2013  •  Yali Sassoon
This is the second part in our blog post series on Catalog Analytics. The first part was published last week. Last week, we started building out the Catalog Analytics section of the Analytics Cookbook, with a section documenting how to measure the effectiveness of your product pages. Those recipes were geared specifically towards retailers. This week, we’ve added an extra section to the cookbook, covering how to measure engagement levels with content pages. The recipes...

Measuring product page performance with Snowplow (Catalog Analytics part 1)

12 April 2013  •  Yali Sassoon
We built Snowplow to enable businesses to execute the widest range of analytics on their web event data. One area of analysis we are particularly excited about is catalog analytics for retailers. Today, we’ve published the first recipes in the catalog analytics section of the Snowplow Analytics Cookbook. These cover how to measure and compare the performance of different product pages on an ecommerce site, using plots like the one below. In this blog post,...

Reflections on Saturday's Measurecamp

18 February 2013  •  Yali Sassoon
On Saturday both Alex and I were lucky enough to attend London’s second Measurecamp, an unconference dedicated to digital analytics. The venue was packed with smart people sharing some really interesting ideas - we can’t do justice to all those ideas here, so I’ve just outlined my favourite two from the day:

- Using keywords to segment audience by product and interest match, courtesy of Carmen Mardiros
- Transferring commercially sensitive data into your web analytics platform...

Using ChartIO to visualise and interrogate Snowplow data

08 January 2013
In the last couple of weeks, we have been experimenting with ChartIO - a hosted BI tool for visualising data and creating dashboards. So far, we are very impressed - ChartIO is an excellent analytics tool to use to interrogate and visualise Snowplow data. Given the number of requests we get from Snowplow users to recommend tools to assist with analytics on Snowplow data, we thought it well worth sharing why ChartIO is so good,...

Transforming Snowplow data so that it can be interrogated in BI / OLAP tools like Tableau, Qlikview and Pentaho

17 December 2012  •  Yali Sassoon
Because Snowplow does not ship with any sort of user interface, we get many enquiries from current and prospective users who would like to interrogate Snowplow data with popular BI tools like Tableau or Qlikview. Unfortunately, it is not possible to run a tool like Tableau directly on top of the Snowplow events table. That is because these tools require the data to be in a particular format: one in which each line of data...
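As an illustration of the required shape (a hedged sketch using today’s canonical atomic.events columns for concreteness, not the post’s actual transformation):

```sql
-- Roll event-level rows up to one row per session: the flat,
-- one-line-per-unit-of-analysis shape that BI/OLAP tools expect.
SELECT
    domain_userid,
    domain_sessionidx,
    MIN(collector_tstamp) AS session_start,
    MAX(collector_tstamp) AS session_end,
    COUNT(*)              AS events_in_session,
    SUM(CASE WHEN event = 'page_view' THEN 1 ELSE 0 END) AS page_views
FROM atomic.events
GROUP BY domain_userid, domain_sessionidx;
```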

Snowplow in a Universal Analytics world - what the new version of Google Analytics means for companies adopting Snowplow

31 October 2012  •  Yali Sassoon
Earlier this week, Google announced a series of significant advances in Google Analytics at the GA Summit, which are collectively referred to as Universal Analytics. In this post, we look at:

- The actual features Google has announced
- How those advances change the case for companies considering adopting Snowplow

1. What changes has Google announced?

The most significant change Google has announced is the new Measurement Protocol, which enables businesses using GA to capture much more...

Performing web analytics on Snowplow data using Tableau - a video demo

24 October 2012  •  Yali Sassoon
People who see Snowplow for the first time often ask us to "show Snowplow in action". It is one thing to tell someone that having access to their customer- and event-level data will open up whole new analysis possibilities, but it is another thing to demonstrate those possibilities. Demonstrating Snowplow is tricky because currently, Snowplow only gives you access to data: we have no snazzy front-end UI to show off. The good news is that...

Why set your data free?

24 September 2012  •  Yali Sassoon
At Saturday’s Measure Camp, I had the chance to introduce Snowplow to a large number of incredibly thoughtful and insightful people in the web analytics industry. With each person, I started by explaining that Snowplow gave them direct access to their customer-level and event-level data. The response I got in nearly all cases was: what does having direct access to my web analytics data enable me to do that I can’t do with Google...