20 May 2013
We have just added a new recipe to the Analytics Cookbook: one that walks through the process of performing a market basket analysis, to identify associations between products and/or content items based on user purchase / viewing behaviour. The recipe covers performing the analysis on Snowplow data using R and the arules package in particular. Although the example walked through uses Snowplow data, the same approach can be used with other data sets: I’d be interested in finding out if members of the #measure community can describe how to do the comparable analysis using data from Google Analytics.

Market basket analysis is the mining of transaction data to identify associations between different items. It is typically performed by retailers, who use it to identify products that a customer is likely to buy, given the products that they have already bought (or added to basket). Most famously, it is the approach behind Amazon’s “users who bought this product also bought these items…”
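To make the idea concrete, here is a minimal sketch in plain Python of the measures that association-rule mining is built on: support, confidence and lift for pairs of items. It is not the recipe's R / arules code; the function name `pair_rules`, the `min_support` threshold and the sample baskets are all made up for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.4):
    """Return (lhs, rhs, support, confidence, lift) for item pairs.

    Illustrative helper, not part of any library. Only rules with a
    single item on each side are considered.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore repeated items within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # confidence: of the baskets containing lhs, the share that also contain rhs
            confidence = count / item_counts[lhs]
            # lift: confidence relative to how common rhs is overall
            lift = confidence / (item_counts[rhs] / n)
            rules.append((lhs, rhs, support, confidence, lift))
    return sorted(rules, key=lambda r: r[4], reverse=True)


# Made-up transactions, purely for illustration
baskets = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["butter", "jam"],
    ["milk"],
]

for lhs, rhs, support, confidence, lift in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f} "
          f"confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests the two items appear together more often than their individual popularity would predict. Real data sets usually need rules with more than one item on the left-hand side, which is what dedicated packages such as arules handle.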
16 May 2013
We are pleased to announce the immediate availability of Snowplow 0.8.4. This is a big release, which adds geo-IP lookups to the Snowplow Enrichment stage, using the excellent GeoLite City database from MaxMind, Inc. This has been one of the most requested features from the Snowplow community, so we are delighted to launch it. Now you can determine the location of your website visitors directly from the Snowplow events table, and plot that data on a wide range of mapping tools including Tableau or Vincent.
[Image: example geo-IP data]
As well as geo-IP enrichment, there are a number of other code improvements to the Hadoop ETL, plus some minor improvements to EmrEtlRunner and some corresponding updates to the Redshift table. In this post we will cover:
- The new geo-IP capabilities
- Other changes
- Upgrading
- Getting help
14 May 2013
Earlier today we announced the release of Snowplow 0.8.3, which updated our JavaScript Tracker to add the ability to send custom unstructured events to a Snowplow collector with trackUnstructEvent().
In our earlier blog post we briefly introduced the capabilities of trackUnstructEvent with some example code. In this blog post, we will take a detailed look at Snowplow’s custom unstructured events functionality, so you can understand how best to send unstructured events to Snowplow.
Understanding the unstructured event format is important because our Enrichment process does not yet extract unstructured events, so you will not get any feedback yet from the ETL as to whether you are tracking them correctly. (Nor do we have validation for unstructured event properties in our JavaScript Tracker yet.)
In the rest of this post, then, we will cover:
- Basic usage
- The properties JavaScript object
- Supported datatypes
- Getting help
14 May 2013
We’re pleased to announce the release of Snowplow 0.8.3. This release updates our JavaScript Tracker to version 0.11.2, adding the ability to send custom unstructured events to a Snowplow collector with trackUnstructEvent(). The Clojure Collector is also bumped to 0.5.0, to include some important bug fixes.
Please note that this release only adds unstructured events to the JavaScript Tracker. Adding unstructured events to our Enrichment process and storage targets is still on the roadmap, but rest assured we are working on it!
Many thanks to community members Gabor Ratky, Andras Tarsoly and Laszlo Bacsi, all from Secret Sauce Partners, for contributing this great feature. Gabor and his team took JavaScript unstructured events from an item on our roadmap to a code-complete feature. Big thanks, guys! (And if you are interested in seeing how the design and implementation of this powerful feature evolved, do have a read of the original GitHub pull request.)
In the rest of this post, then, we will cover:
- What are unstructured events?
- When to use unstructured events?
- Usage
- Upgrading
- Roadmap for unstructured events
- Getting help
10 May 2013
Web analysts spend a lot of time exploring where visitors to their websites come from:
- Which sites and marketing campaigns are driving visitors to your website?
- How valuable are those visitors?
- What should you be doing to drive up the number of high quality users? (In terms of spending more on marketing, engaging with other websites / blogs / social networks etc.)
Unfortunately, identifying where your visitors come from is not as straightforward as it often seems. In this post, we will cover:
- How, technically, can we determine where visitors have come from?
- Potential sources of errors
- Problems with relying on the Google Analytics approach, and why the Snowplow approach is superior
- Surprises when examining visitors acquired from AdWords search campaigns: most visitors clicked on an ad that was not shown on a Google domain
- Pulling all the findings together: the value of high-fidelity data in determining where your visitors come from
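As a rough illustration of the first question above, here is a sketch in Python of one common way to attribute a visit: look for campaign parameters on the landing page URL, and fall back to the referrer URL. This is not necessarily the logic used by Snowplow or Google Analytics. The function name `classify_visit` and the example URLs are hypothetical; `utm_source` and `utm_medium` are the widely used campaign-tagging parameters.

```python
from urllib.parse import parse_qs, urlparse


def classify_visit(page_url, referrer_url, site_host):
    """Guess where a visit came from (illustrative helper, not a library API)."""
    query = parse_qs(urlparse(page_url).query)

    # 1. Campaign parameters on the landing page take priority
    if "utm_source" in query:
        return {
            "channel": "campaign",
            "source": query["utm_source"][0],
            "medium": query.get("utm_medium", [""])[0],
        }

    # 2. Otherwise fall back to the referrer, if the browser sent one
    ref_host = urlparse(referrer_url).hostname if referrer_url else None
    if not ref_host:
        return {"channel": "direct", "source": "", "medium": ""}
    if ref_host == site_host:
        # Navigation within our own site is not a new traffic source
        return {"channel": "internal", "source": ref_host, "medium": ""}
    return {"channel": "referral", "source": ref_host, "medium": "referral"}


# Hypothetical URLs, for illustration only
print(classify_visit(
    "http://www.example.com/?utm_source=newsletter&utm_medium=email",
    "",
    "www.example.com",
))
print(classify_visit(
    "http://www.example.com/pricing",
    "http://blog.example.org/review",
    "www.example.com",
))
```

Both signals can be missing or misleading (a referrer may be stripped, a campaign tag may be copied into an unrelated link), which is one reason source attribution is less straightforward than it first appears.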