How to get to a multi-touch attribution model that works for you

15 November 2019  •  Erika Wolfe
True multi-touch attribution begins with collecting and centralizing your multi-source data, making sure you have the volume, quality and completeness of data you need and then connecting the dots to build your attribution story. Marketing departments invest heavily in marketing strategy, tactics and campaigns. To justify marketing spend and fully understand the results of the marketing mix you’ve deployed, you undoubtedly ask yourself how you can account for or attribute the results you get. And...

How real-time data enables personalization and engagement

27 September 2019  •  Erika Wolfe
It would not be an overstatement to claim that real-time insight has changed the media industry. The move from a traditional print environment, in which media companies had static subscriber data and did not face cutthroat advertising competition, to a completely digital, 24/7 information cycle driven by the onslaught of free content and granular targeting in advertising represents an entirely new paradigm. While this digital shift posed many challenges in terms of profitability and business models,...

Piecing together the complex customer journey for a complete single customer view

20 September 2019  •  Erika Wolfe
In many pieces on travel, we’ve touched on the concept of hyper-relevance - that is, delivering the right product to the right customer at the right time on the right platform or in the right channel. But how do you actually get to hyper-relevance? In this era of data-informed decision-making, your data can give you a complete, end-to-end view of your customer journey and the touchpoints all along the way. And, perhaps most importantly, this...

How Animoto uses event tracking data to understand and optimize the user journey

02 September 2019  •  Alex Beskin & Jason Bellinger
“How are people using my site?” “What is a typical customer’s journey?” “How many times does a user visit before purchasing? And how many pages do they visit? Which one had the biggest impact on their decision?” These are a few questions that data analysts hear all the time from marketing, product and finance teams. The concepts of a “customer journey”, “clickstream analysis”, and “multi-touch attribution” have been around for a long time and are...

Mapping the customer journey with complete-picture data to reach the single customer view

29 August 2019  •  Erika Wolfe
With data unification - the ability to join multiple data sources and get a full-picture view - travel companies are harnessing the power of understanding their customer journey at a more granular level, finally tapping into insight untethered from previously siloed sources. This creates conditions for “hyper-relevance” as a differentiator in a crowded travel market dominated by big, household names. Data joined together from across the customer journey - whether from web and mobile, from...

Time spent is the most important metric for media - here's how to get it right

07 August 2019  •  Simon Rumble
The actual product media companies sell is the engagement and attention of their audience, yet the way it's commonly measured is completely broken. Let's look at the problem of measuring attention, ways to solve it and some examples of media companies doing it well. Media companies trade in the attention of audiences. Audiences visit them—and hopefully pay them—to be informed and entertained. Advertisers pay them to get the attention of those audiences. If there's one...

Avoid turning your data lakes into data swamps: Focus on data quality, not capture

10 July 2019  •  Erika Wolfe
Putting data to work and gathering meaningful insights from it is growing increasingly complex, in large part because there is a near-unfathomable amount of data flowing in from all kinds of sources. This will come as no surprise to anyone, regardless of the industry in which they work, because data has come to be seen as the “holy grail” of business development and improvement as well as the defining factor in many changing and new...

Snowplow for media part 5: what can we do with the data when we're well established?

29 May 2019  •  Archit Goyal
We recommend you read the previous posts on this topic before diving into this article to ensure you have all the context you need: the main post, "What do I track?", "What can we do with the data when we're getting started?" and "What can we do with the data when we're growing?". Do read the post that answers the question: What do I track? What can we do with the data now that we're a well established data team? Set...

Snowplow for media part 4: what can we do with the data when we're growing?

29 May 2019  •  Archit Goyal
We recommend you read the previous posts on this topic before diving into this article to ensure you have all the context you need: the main post, "What do I track?" and "What can we do with the data when we're getting started?". Bear in mind there is one more post in this series you can read after this one: "What can we do with the data when we're well established?". What can we do with the data as...

Snowplow for media part 3: what can we do with the data when we're getting started?

29 May 2019  •  Archit Goyal
We recommend you read the main post on this topic before diving into this article to ensure you have all the context you need! Bear in mind there are two more posts in this series you can read after this one: "What can we do with the data when we're growing?" and "What can we do with the data when we're well established?". Do also read the post that answers the question: What do I track? What can...

Snowplow for media part 2: what do I track?

29 May 2019  •  Archit Goyal
We recommend you read the main post on this topic before diving into this article to ensure you have all the context you need! Bear in mind there are three more posts in this series that you can read after this one: "What can we do with the data when we're getting started?", "What can we do with the data when we're growing?" and "What can we do with the data when we're well established?". What do I track?...

Snowplow for media part 1: how can I use Snowplow?

29 May 2019  •  Archit Goyal
“We don’t know where to focus our content creation efforts; which content leads to retention and subscription? Which content categories, authors and themes drive high CPMs? We have several disparate brands, and users have multiple products on the web and in our app, so we don’t have a single customer view. This means we can’t effectively group our users based on engagement or monitor how they are retained. We don’t have a good understanding of how...

Guest post: 3 reasons why your company should own its data

04 April 2019  •  Jacob Thomas
This is a guest post by Jacob Thomas, Lead Data Engineer at CarGurus. You can find the original article and read more from Jacob on Bostata. When it comes to your company’s software and infrastructure, it often makes sense to buy vs. build. However, it will benefit you in the long term to thoroughly understand and own your data management and collection. Here’s why. When it comes to software and related infrastructure, businesses get caught in...

How many of your visitors block your Snowplow tracking?

18 March 2019  •  Mike Nemirovsky
tl;dr As a company that focuses on helping businesses collect data in order to better serve their customers, we inevitably get asked what happens when those customers don’t want to be tracked. With the use of ad blockers, and in particular privacy filters, on the rise, some of our customers are seeing the effect on their data. This effect is at times perceived as a problem or a threat to the quality of data collection....

Snowplow at Superweek: machine learning and actioning data

06 March 2019  •  Archit Goyal
For anyone unfamiliar with Superweek, it’s 5 days of analytics talks and is quite a lot like going to summer camp as a child, with: less sun and grass, more fog and snow; fewer hyperactive children, more hyperactive data analysts; less ice cream, more fried Hungarian potato pancakes with sour cream and chives; fewer sporting activities, more GTM activities; and fewer small bonfires and more big bonfires. I will aim to cover what I took...

Snowplow for retail part 1: how can I use Snowplow?

06 March 2019  •  Archit Goyal
“We have several disparate brands, and users have multiple touch points (web, app, over the phone and in store) before purchasing, and we don’t have a single customer view. This means we can’t effectively group our users. We don’t have a good understanding of how our marketing spend across multiple channels affects revenue. We just released an app and people are downloading it but not buying on it; it feels like lost revenue. We’re spending...

Snowplow for retail part 2: what data do I track?

06 March 2019  •  Archit Goyal
We recommend you have read the first post in this series before diving into this one to ensure you have all the context you need! There are also three more posts in this series that you can read next: What can we do with data when we’re getting started? What can we do with data when we’re growing? What can we do with the data when we’re well established? What do I track? With...

Snowplow for retail part 3: what can we do with data when we're getting started?

06 March 2019  •  Archit Goyal
We recommend you have read the first post in this series before diving into this one to ensure you have all the context you need! What can we do with the data when we’re getting the data team started? Five examples of what you can do with a treasure trove of Snowplow data and one analyst are as follows: look at user engagement on the website; look at user engagement on mobile apps; understand offline...

Snowplow for retail part 4: what can we do with data when we're growing?

06 March 2019  •  Archit Goyal
We recommend that you have read the first post in this series before diving into this one to ensure you have all the context you need! Now we’re looking at a data team that is growing and has several analysts and maybe some spare engineering resource as the company is starting to see real value in the analytics you have served to date. We’re working under the assumption that you’ve already taken all the steps...

Snowplow for retail part 5: what can we do with data when we're well established?

06 March 2019  •  Archit Goyal
We recommend you have read the first post in this series before diving into this one to ensure you have all the context you need! Senior management love the work of the data team so far: you’re tracking site and mobile app engagement with a host of custom events; you’re tracking a host of offline conversions and can stitch these to the behavior on the site or app; and you have brought down marketing spend tremendously...

Guest post: After looking at the data of 80 tech companies - what have I learned? Part I

13 February 2019  •  Segah A. Mir
This is a guest post by Segah A. Mir, Partner and Consultant at Seattle-based Caura & Co. The past five years have given me a tremendous opportunity to see firsthand the data of over 80 VC-backed tech companies. That is close to 100 teams and 300 individuals. Naturally, I’ve gotten to see a lot of data — very detailed information on every transaction, activity, click, and interaction. What would be expected of me now is to go...

How server-side tracking fills holes in your data and improves your analytics

05 February 2019  •  Rebecca Lane
Client-side tracking: a brief history lesson. At Snowplow Analytics, we fundamentally believe that getting data collection right is one of the most important steps for deriving value from data. This is often an iterative process, and the data you collect and how you collect it should evolve over time as your use cases and your analytics setup evolve and mature. While collecting data client-side is universal across our customer base, we want to...

How data ownership makes you a more effective data scientist

05 February 2019  •  Anthony Mandelli
Data scientists report spending 80% of their time cleaning and collecting data, leaving only the remaining 20% for actual analysis. As a data scientist, you spend time finding ways to query across multiple data sets, formatting data to work with different analytics tools, and applying any number of modifications to take data you’ve collected and turn it into data you can use. Contrast this with companies who own their data infrastructure end-to-end with solutions like...

A misconception about how retail personalization drives sales

23 January 2019  •  Anthony Mandelli
When retailers are looking for a way to drive sales, personalization can look like a quick win: buy a recommendation engine and put more products a customer is likely to buy in front of them, then watch the sales come in. Unfortunately, it’s not quite that easy. When you implement a personalization strategy, you need to do it in the context of the overall customer experience you’re trying to create. You need to be clear...

Monitoring Bad Rows on GCP Using BigQuery and Data Studio

23 January 2019  •  Colm O Griobhtha
One of the key features of the Snowplow pipeline is that it’s architected to ensure data quality up front - rather than spending a lot of time cleaning and making sense of the data before using it, schemas are defined up front and used to validate all data types as they come through the pipeline. Another key feature of Snowplow is that it’s highly loss-averse - when data fails validation, those events are preserved as bad...

Debugging bad data in GCP with BigQuery

19 December 2018  •  Colm O Griobhtha
One of the key features of the Snowplow pipeline is that it’s architected to ensure data quality up front - rather than spending a lot of time cleaning and making sense of the data before using it, schemas are defined up front and used to validate data as it comes through the pipeline. Another key feature is that it’s highly loss-averse: when data fails validation, those events are preserved as bad rows. Read more about...
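As a rough illustration of the kind of query this enables (a hypothetical sketch, not taken from the post itself), assume the bad rows have been loaded into a BigQuery table, here called my_project.snowplow.bad_rows, with a repeated errors record holding the validation messages; counting the most common messages is a quick way to spot a new tracking issue:

-- Hypothetical sketch: the table name and the errors/message fields are
-- assumptions, not the schema used in the post.
SELECT
  err.message AS error_message,
  COUNT(*)    AS occurrences
FROM `my_project.snowplow.bad_rows`,
  UNNEST(errors) AS err
GROUP BY error_message
ORDER BY occurrences DESC
LIMIT 20;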

Snowplow for Google Cloud Platform is here

03 December 2018  •  Anthony Mandelli
Since the early days of Snowplow Analytics, we’ve been committed to giving our users very granular, highly structured data because we believe that’s what you need to be truly data-driven. Doing awesome things with this data, though, has historically been challenging because of how detailed it is. Thanks to Google, we have a solution to that problem. Google Cloud Platform (GCP) has grown, over the last ten years, to become one of the largest,...

Long sales cycles don't have to be trouble

14 November 2018  •  Anthony Mandelli
Retailers know that understanding the way customers behave during the sales cycle is the key to optimizing this process so it’s enjoyable, rewarding, and painless for the customer and efficient for the retailer. Marketers want to connect activities like advertising campaigns to downstream activities like making a purchase. While this process might be straightforward for many companies, it can be quite convoluted for retailers with longer sales cycles, such as those selling high-value goods like cars,...

Building reliable, scalable customer acquisition for marketplaces

31 October 2018  •  Anthony Mandelli
Two-sided marketplaces are environments where you primarily have two distinct user types: buyers and sellers. There are many different marketplaces out there: GetNinjas in Brazil, Fiverr in the United States, and OneFlare in Australia are all marketplaces where people can find professional service providers like plumbers, photographers, personal trainers, and more; job boards, like Indeed or ZipRecruiter, are marketplaces where job seekers can connect with prospective employers or recruiters; 99Designs and DesignCrowd are marketplaces connecting...

The right data infrastructure to support successful squads

01 June 2018  •  Anthony Mandelli
Part eight of our series on product analytics. Read: Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7 Squads are self-contained units, popularized by companies like Spotify, containing developers, engineers, analysts, data scientists, and individuals from other disciplines that allow the squad to operate independently. Squad-based organizations have demonstrated the effectiveness of their style of product development: a key strength of the squad model is that individual teams can...

Improving A/B testing with event data modeling

25 May 2018  •  Anthony Mandelli
Part seven of our series on product analytics. Read: Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 8 Conducting an A/B test is significantly more complicated than just randomly assigning users into two groups. Running a truly meaningful experiment, as we’ve pointed out, requires meticulous planning around what experiment is run, what the expected impact of the experiment will be, and what metrics will best capture that impact. Effective...

Getting the most out of product analytics with intelligent questions

27 April 2018  •  Anthony Mandelli
Part six of our series on product analytics. Read: Part 1, Part 2, Part 3, Part 4, Part 5, Part 7, Part 8 In the beginning, when you’re starting out with analytics-driven product development, the amount of data your analytics platform gives you access to can be overwhelming. With all of the preconfigured charts and dashboards, it can be easy to fall into a trap of passively consuming the data in front of you, assuming...

Using Snowplow to solve business problems at our latest London Meetup

13 April 2018  •  Anthony Mandelli
On March 29th, we were excited to host members of the digital analytics community at Runway East as the Snowplow Meetup returned to London for the fifth time. We had three presentations, including two from organizations that have found success using the Snowplow pipeline to power their digital analytics. Check out the videos below and join the official Snowplow Meetup group to find an event near you. Headstart Co-founder and CTO Jeremy Hindle...

Real-time data processing with Google Analytics using Snowplow

23 March 2018  •  Anthony Mandelli
It’s been almost a full eight weeks since we released support for ingesting your complete, hit-level Google Analytics data into Snowplow with R99 Carnac. Shortly thereafter, we made that support available in our real-time stack. This means that if you’re a Google Analytics user, you can now use Snowplow to process your complete event stream in real time. Currently, this is not even possible for Google Analytics 360 users. In this post, we’re going to...

Analyzing behavioral data with Indicative and Snowplow

22 March 2018  •  Anthony Mandelli
Digital platforms have changed the way companies engage with their users. Users can communicate directly with companies through email or social media, find support through chat interfaces, and learn about new products all through digital channels. As we lead increasingly digital lives, the range of products offered has grown exponentially. Gone are the days of simple document browsing and online shopping. Digital and web based products help us manage our schedules, monitor our health, communicate...

Using Snowplow for marketing data analytics

16 March 2018  •  Anthony Mandelli
The first step towards solving a problem is admitting one exists. And I have a lot of problems. As marketing professionals, we all do: which projects are highest priority, how many leads did we generate this month, did we publish a blog post yet this week, and so on. The fundamental goal of marketing is to support organizational growth. Whether you’re a retailer looking to boost sales, a B2B SaaS provider trying to increase lead...

Creative experiments and A/B tests produce the best results

23 February 2018  •  Anthony Mandelli
Part five of our series on product analytics. Read: Part 1, Part 2, Part 3, Part 4, Part 6, Part 7, Part 8 As our last post on product analytics demonstrated, there’s no shortage of tools and platforms for product analysts to help them more effectively do their job. With so many solutions available, it’s no surprise that many product teams invest significant amounts of time, effort, and money into analytics. What is surprising, though,...

The product analyst toolkit

09 February 2018  •  Anthony Mandelli
Part four of our series on product analytics. Read: Part 1, Part 2, Part 3, Part 5, Part 6, Part 7, Part 8 In our previous post on product analytics, Yali discussed the product development process and how data plays a crucial role at each step. While Yali’s post was focused on the processes and people necessary for building a highly effective product team, here we want to explore some of the tools that those...

Warehousing Google Analytics data: API vs hit-level data

08 February 2018  •  Anthony Mandelli
Recently, we very excitedly announced that Google Analytics users could use Snowplow to load their data into their own data warehouses in Redshift and Snowflake DB, a major milestone in making Snowplow available to as many data professionals as possible regardless of their infrastructure. We were humbled by how excited many of you were about this integration, though among the positive sentiment there was a recurring question: why? As we discussed in the release post,...

Data-driven product development is more about process, culture, and people than technology

02 February 2018  •  Yali Sassoon
Part three of our series on product analytics. Read: Part 1, Part 2, Part 4, Part 5, Part 6, Part 7, Part 8 What does successful use of data to drive the product development process look like? It’s much more about process and culture than it is about technology. There’s no getting around it - product analytics matters. As we’ve already explored, intelligent use of data as an integral part of the product development process...

Intelligent use of data in product development differentiates successful companies

26 January 2018  •  Anthony Mandelli
Part two of our series on product analytics. Read: Part 1, Part 3, Part 4, Part 5, Part 6, Part 7, Part 8 “Every employee should be empowered to make data informed decisions,” wrote Jeff Feng, PM Lead for Data at Airbnb, in a post on the roomshare service’s Engineering & Data Science blog. Calling it one of Airbnb’s fundamental beliefs, Feng identifies this desire for empowerment as the driving force behind the Data Science...

Product analytics part one, data and digital products

19 January 2018  •  Anthony Mandelli
Part one of our series on product analytics. Read: Part 2, Part 3, Part 4, Part 5, Part 6, Part 7, Part 8 Not long ago, the best way to witness firsthand how the general population interacted with your product was to assemble a random sample of potential users and stick them in a room while you observed from behind a two-way mirror. That, or conduct an in-home study, asking users to keep a journal...

Loading and analyzing Snowplow event data in Neo4j

17 July 2017  •  Dilyan Damyanov
Back in 2014 we published a series of blog posts on using Snowplow event data in the graph database Neo4j. Three years on, they’re still among our most popular blog posts. (See below for links to the original posts.) A lot has changed since then. Neo4j has strengthened its position as a leading graph database solution. Its query language, Cypher, has grown with the platform. It has changed to the point where some of the...

How to develop better games with level analytics

12 April 2017  •  Colm O Griobhtha
Summary: Product managers and game designers generally aim to design game levels in such a way that they challenge gamers enough to make completing a level satisfying, but not so challenging that they drop out and stop playing the game. This blog post shows an example of how product managers and game designers can use a well designed dashboard to better understand user behaviour across a game’s levels, design highly playable game levels, A/B test...

How a clear data taxonomy drives insight and action

27 January 2017  •  João Correia
This is a guest blog post by João Correia, Solutions Director at Igloo Analytics and an experienced analytics professional, helping organizations embed analytics for growth and innovation. In this post, João explains how to define an analytics strategy with Snowplow Analytics that considers your business context and drives insights and action. Many thanks to João for sharing his views on this topic! If you have a story to share, feel free to get in touch. Add...

Debugging bad data in Elasticsearch and Kibana - a guide

03 March 2016  •  Yali Sassoon
One of the features that makes Snowplow unique is that we actually report bad data: any data that hits the Snowplow pipeline and fails to be processed successfully. This is incredibly valuable, because it means you can: spot data tracking issues that emerge, quickly, and address them at source; and have a correspondingly high degree of confidence that trends in the data reflect trends in the business and not data issues. Recently we extended Snowplow so...

Building first and last touch attribution models in Redshift SQL

22 February 2016  •  Yali Sassoon
In order to calculate the return on marketing spend on individual campaigns, digital marketers need to connect revenue events, downstream in a user journey, with marketing touch events, upstream in a user journey. This connection is necessary so that the costs associated with the marketing campaign that drove those marketing touches can be connected to the profit associated with the conversion events later on. Different attribution models involve applying different logic to connecting those...
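To give a flavour of the approach (a minimal sketch, not the full models built in the post), a first-touch variant can be written in Redshift SQL against the standard Snowplow atomic.events table, assuming the usual columns such as domain_userid, collector_tstamp, mkt_source, mkt_campaign, event and tr_total:

-- Minimal first-touch sketch: attribute each user's transaction revenue
-- to the source/campaign of their earliest recorded marketing touch.
WITH first_touch AS (
  SELECT domain_userid, mkt_source, mkt_campaign
  FROM (
    SELECT
      domain_userid,
      mkt_source,
      mkt_campaign,
      ROW_NUMBER() OVER (PARTITION BY domain_userid ORDER BY collector_tstamp) AS touch_rank
    FROM atomic.events
    WHERE mkt_source IS NOT NULL
  ) ranked
  WHERE touch_rank = 1
),
conversions AS (
  SELECT domain_userid, SUM(tr_total) AS revenue
  FROM atomic.events
  WHERE event = 'transaction'
  GROUP BY 1
)
SELECT
  f.mkt_source,
  f.mkt_campaign,
  SUM(c.revenue) AS attributed_revenue
FROM first_touch f
JOIN conversions c ON c.domain_userid = f.domain_userid
GROUP BY 1, 2
ORDER BY 3 DESC;

A last-touch model is the same sketch with the window ordering reversed (ORDER BY collector_tstamp DESC).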

Dealing with duplicate event IDs

19 August 2015  •  Christophe Bogaert
The Snowplow pipeline outputs a data stream in which each line represents a single event. Each event comes with an identifier, the event ID, which was generated by the tracker and is—or rather should be—unique. However, after having used Snowplow for a while, users often notice that some events share an ID. Events are sometimes duplicated within the Snowplow pipeline itself, but it’s often the client-side environment that causes events to be sent in with...
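Before choosing a de-duplication strategy, it helps to gauge the scale of the problem. A minimal sketch in Redshift SQL (assuming the standard atomic.events table, and not taken from the post itself):

-- How many event IDs appear more than once, and how bad are the worst offenders?
SELECT
  event_id,
  COUNT(*) AS rows_with_this_id
FROM atomic.events
GROUP BY event_id
HAVING COUNT(*) > 1
ORDER BY rows_with_this_id DESC
LIMIT 100;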

Analyzing marketing attribution data with a D3.js visualization

02 July 2015  •  Justine Courty
Marketing attribution, as in understanding what impact different marketing channels have in driving conversion, is a very complex problem: we have no way of directly measuring the impact of an individual channel on a user’s propensity to convert; it is not uncommon for users to interact with many channels prior to converting; and it is likely that different channels impact each other’s effectiveness. Because of this difficulty, there is not yet a consensus in digital analytics...

JSON schemas for Redshift datatypes

12 February 2015  •  Fred Blundun
This blog contains JSON schemas for all the data types supported by Amazon Redshift. We supply two schemas for each numeric type, since you may want to send in numeric types as JSON strings rather than JSON numbers. The types covered are SMALLINT, INTEGER, BIGINT, DECIMAL, REAL, DOUBLE PRECISION, BOOLEAN, CHAR, VARCHAR, DATE and TIMESTAMP. SMALLINT: the schema for passing the value in as a number is { "type": "integer" }, and the schema for passing the value in as...

Using graph databases to perform pathing analysis - initial experiments with Neo4J

31 July 2014  •  Nick Dingwall
In the first post in this series, we raised the possibility that graph databases might allow us to analyze event data in new ways, especially where we were interested in understanding the sequences that events occurred in. In the second post, we walked through loading Snowplow page view event data into Neo4J in a graph designed to enable pathing analytics. In this post, we’re going to see whether the hypothesis we raised in the first...

Loading Snowplow event-level data into Neo4J

30 July 2014  •  Nick Dingwall
In the last post, we discussed how particular types of analysis, particularly path analysis, are not well-supported in traditional SQL databases, and raised the possibility that graph databases like Neo4J might be good platforms for doing this sort of analysis. We went on to design a graph to represent event data, and page view data specifically, which captures the sequence of events. In this post, we’re going to walk through the process of taking Snowplow...

Can graph databases enable whole new classes of event analytics?

28 July 2014  •  Nick Dingwall
With Snowplow, we want to empower our users to get the most out of their data. Where your data lives has big implications for the types of queries, and therefore analyses, you can run on it. Most of the time, we’re analysing data with SQL, and specifically, in Amazon Redshift. This is great for a whole class of OLAP-style analytics - it enables us to slice and dice different combinations of dimensions and metrics, for...

How configurable data models and schemas make digital analytics better

11 July 2014  •  Yali Sassoon
Digital analysts don’t typically spend a lot of time thinking about data models and schemas. Yet how data is modelled and schema’d, both at data collection time and at analysis time, makes an enormous difference to how easily insight and value can be derived from that data. In this post, I will explain why data models and schemas matter, and why being able to define your own event data model in Snowplow is a much better...

Understanding Snowplow's unique approach to identity stitching, including comparisons with Universal Analytics, Kissmetrics and Mixpanel

16 April 2014  •  Yali Sassoon
This post was inspired by two excellent, recently published posts on identity stitching: Yehoshua Coren’s post Universal Analytics is Out of Beta - Time to Switch? and Shay Sharon’s post on the intlock blog, The Full Customer Journey - Managing User Identities with Google Universal, Mixpanel and KISSmetrics. In both posts, the authors explain in great detail the limitations that traditional analytics solutions have when dealing with identity stitching. In this post, I hope to...

Why and how to use big data tools to process web analytics data? Joint Qubole and Snowplow webinar

19 February 2014  •  Yali Sassoon
Last night, I presented at a webinar organized by our friends at Qubole on using big data tools to analyze web analytics data. You can view the slides I presented below: On the webinar, I talked through the limitations associated with using traditional web analytics tools like Google Analytics and Adobe SiteCatalyst to do web analytics, and how using big data technologies, and Snowplow and Qubole in particular, addressed those limitations. My talk was...

Introducing Looker - a fresh approach to Business Intelligence that works beautifully with Snowplow

10 December 2013  •  Yali Sassoon
In the last few weeks, we have been experimenting with using Looker as a front-end to analyse Snowplow data. We’ve really liked what we’ve seen: Looker works beautifully with Snowplow. Over the next few weeks, we’ll share example analyses and visualizations of Snowplow data in Looker, and dive into Looker in more detail. In this post, we’ll take a step back and walk through some context to explain why we are so excited about Looker....

Quick start guide to learning SQL to query Snowplow data published

19 November 2013  •  Yali Sassoon
Whilst it is possible to use different BI tools to query Snowplow data with limited or no knowledge of SQL, to really get the full power of Snowplow you need to know some SQL. To help Snowplow users who are not familiar with SQL, or those who could do with refreshing their knowledge, we’ve put together a quick start guide on the Analytics Cookbook. The purpose of the guide is to get the reader...
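For a flavour of the sort of query you can run once you know a little SQL (a trivial sketch against the standard atomic.events table, not an excerpt from the guide itself):

-- Daily page views and unique visitors, straight from the event-level table.
SELECT
  DATE_TRUNC('day', collector_tstamp) AS day,
  COUNT(*)                            AS page_views,
  COUNT(DISTINCT domain_userid)       AS unique_visitors
FROM atomic.events
WHERE event = 'page_view'
GROUP BY 1
ORDER BY 1;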

Call for data! Help us develop experimental analyses. Have us help you answer your toughest business questions.

28 October 2013  •  Yali Sassoon
This winter we are recruiting interns to join the Snowplow team to work on discrete projects. A number of the candidates we have interviewed have expressed an interest in working with us to develop new analytics approaches on Snowplow data. In particular, we’ve had a lot of interest in piloting machine learning approaches to: segmenting audience by behaviour; leveraging libraries for content / product recommendation (e.g. PredictionIO, Mahout, Weka); developing and testing new approaches to...

Using the new SQL views to perform cohort analysis with ChartIO

22 October 2013  •  Yali Sassoon
We wanted to follow up our recent launch of Snowplow 0.8.10, with inbuilt SQL recipes and cubes, with a few posts demonstrating how you can use those views to quickly perform analytics on your Snowplow data. This is the first of those posts. In this post, we’ll cover how to perform a cohort analysis using ChartIO and Snowplow. Recap: what is cohort analysis? We have described cohort analysis at length in the Analyst Cookbook. To sum...

Book review - Apache Hive Essentials How-to

30 September 2013  •  Yali Sassoon
Although it is no longer part of the core Snowplow stack, Apache Hive is the gateway drug that got us started on Hadoop. As some of our recent blog posts testify, Hive is still very much a part of our big data toolkit, and this will continue as we use it to roll out new features. (E.g. for analyzing custom unstructured events.) I suspect that many Hadoopers started out with Hive, before experimenting with the...

Reprocessing bad rows of Snowplow data using Hive, the JSON Serde and Qubole

11 September 2013  •  Yali Sassoon
This post is outdated. For more documentation on debugging and recovering bad rows, please visit: Debugging bad rows in Elasticsearch and Kibana; Debugging bad rows in Elasticsearch using curl (without Kibana); Snowplow 81 release post (for recovering bad rows); and Hadoop Event Recovery. One of the distinguishing features of the Snowplow data pipeline is the handling of “bad” data. Every row of incoming, raw data is validated. When a row fails validation, it is logged in...

Using Qubole to crunch your Snowplow web data using Apache Hive

03 September 2013  •  Yali Sassoon
We’ve just published a getting-started guide to using Qubole, a managed Big Data service, to query your Snowplow data. You can read the guide here. Snowplow delivers event data to users in a number of different places: Amazon Redshift or PostgreSQL, so you can analyze the data using traditional analytics and BI tools; and Amazon S3, so you can analyze that data using Hadoop-backed big data tools (e.g. Mahout, Hive and Pig) on EMR. Since we...

Is web analytics easy or hard? Distinguishing different types of complexity, and approaches for dealing with them

28 June 2013  •  Yali Sassoon
This post is a response to an excellent, but old, blog post by Tim Wilson called Web Analytics Platforms are Fundamentally Broken, authored back in August 2011. Tim made the case (which is still true today) that web analytics is hard, and that part of the difficulty is that web analytics platforms are fundamentally broken. After Tim published his post, a very interesting conversation ensued on Google+. Reading through it, I was struck by how many...

Getting started using R for data analysis

26 June 2013  •  Yali Sassoon
R is one of the most popular data analytics tools out there, with a rich and vibrant community of users and contributors. In spite of its popularity in general (and particularly amongst academics and statisticians), R is not a common tool to find in the business or web analyst’s arsenal, where Excel and Google Analytics tend to reign supreme. That is a real shame. R is a fantastic tool for exploring data, reworking it, visualizing...

Measuring how much traffic individual items in your catalog drive to your website

22 May 2013  •  Yali Sassoon
We have just added a new recipe to the catalog analytics section of the Analytics Cookbook. This recipe describes: How to measure how effectively different items in your catalog drive visits to your website. How to use the data to unpick how each item drives that traffic. In digital marketing, we can contrast classic “outbound marketing”, where we push visitors to our website using paid ad campaigns, for example, with “inbound marketing”, where we pull...

Performing market basket analysis on web analytics data with R

20 May 2013  •  Yali Sassoon
We have just added a new recipe to the Analytics Cookbook: one that walks through the process of performing a market basket analysis, to identify associations between products and/or content items based on user purchase / viewing behavior. The recipe covers performing the analysis on Snowplow data using R and the arules package in particular. Although the example walked through uses Snowplow data, the same approach can be used with other data sets: I’d be...

Where does your traffic really come from?

10 May 2013  •  Yali Sassoon
Web analysts spend a lot of time exploring where visitors to their websites come from: Which sites and marketing campaigns are driving visitors to your website? How valuable are those visitors? What should you be doing to drive up the number of high quality users? (In terms of spending more on marketing, engaging with other websites / blogs / social networks etc.) Unfortunately, identifying where your visitors come from is not as straightforward as it often...

Funnel analysis with Snowplow (Platform analytics part 1)

23 April 2013  •  Yali Sassoon
Eleven days ago, we started building out the Catalog Analytics section of the Analytics Cookbook, with a set of recipes covering how to measure the performance of content pages and product pages. Today we’ve published the first set of recipes in the new platform analytics section of the Cookbook. By ‘platform analytics’, we mean analytics performed to answer questions about how your platform (or ‘website’, ‘application’ or ‘product’) performs. This is one of the most...

Measuring content page performance with Snowplow (Catalog Analytics part 2)

18 April 2013  •  Yali Sassoon
This is the second part in our blog post series on Catalog Analytics. The first part was published last week. Last week, we started building out the Catalog Analytics section of the Analytics Cookbook, with a section documenting how to measure the effectiveness of your product pages. Those recipes were geared specifically towards retailers. This week, we’ve added an extra section to the cookbook, covering how to measure engagement levels with content pages. The recipes...

Measuring product page performance with Snowplow (Catalog Analytics part 1)

12 April 2013  •  Yali Sassoon
We built Snowplow to enable businesses to execute the widest range of analytics on their web event data. One area of analysis we are particularly excited about is catalog analytics for retailers. Today, we’ve published the first recipes in the catalog analytics section of the Snowplow Analytics Cookbook. These cover how to measure and compare the performance of different product pages on an ecommerce site, using plots like the one below: In this blog post,...

Reflections on Saturday's Measurecamp

18 February 2013  •  Yali Sassoon
On Saturday both Alex and I were lucky enough to attend London’s second Measurecamp, an unconference dedicated to digital analytics. The venue was packed with smart people sharing some really interesting ideas - we can’t do justice to all those ideas here, so I’ve just outlined my favourite two from the day: using keywords to segment audience by product and interest match, courtesy of Carmen Mardiros, and transferring commercially sensitive data into your web analytics platform...

Using ChartIO to visualise and interrogate Snowplow data

08 January 2013
In the last couple of weeks, we have been experimenting with ChartIO - a hosted BI tool for visualising data and creating dashboards. So far, we are very impressed - ChartIO is an excellent analytics tool to use to interrogate and visualise Snowplow data. Given the number of requests we get from Snowplow users to recommend tools to assist with analytics on Snowplow data, we thought it well worth sharing why ChartIO is so good,...

Transforming Snowplow data so that it can be interrogated in BI / OLAP tools like Tableau, Qlikview and Pentaho

17 December 2012  •  Yali Sassoon
Because Snowplow does not ship with any sort of user interface, we get many enquiries from current and prospective users who would like to interrogate Snowplow data with popular BI tools like Tableau or Qlikview. Unfortunately, it is not possible to run a tool like Tableau directly on top of the Snowplow events table. That is because these tools require the data to be in a particular format: one in which each line of data...

Snowplow in a Universal Analytics world - what the new version of Google Analytics means for companies adopting Snowplow

31 October 2012  •  Yali Sassoon
Earlier this week, Google announced a series of significant advances in Google Analytics at the GA Summit, which are collectively referred to as Universal Analytics. In this post, we look at: the actual features Google has announced, and how those advances change the case for companies considering adopting Snowplow. 1. What changes has Google announced? The most significant change Google has announced is the new Measurement Protocol, which enables businesses using GA to capture much more...

Performing web analytics on Snowplow data using Tableau - a video demo

24 October 2012  •  Yali Sassoon
People who see Snowplow for the first time often ask us to "show Snowplow in action". It is one thing to tell someone that having access to their customer- and event-level data will open up whole new analysis possibilities, but it is another thing to demonstrate those possibilities. Demonstrating Snowplow is tricky because currently, Snowplow only gives you access to data: we have no snazzy front-end UI to show off. The good news is that...

Why set your data free?

24 September 2012  •  Yali Sassoon
At Saturday’s Measure Camp, I had the chance to introduce Snowplow to a large number of incredibly thoughtful and insightful people in the web analytics industry. With each person, I started by explaining that Snowplow gave them direct access to their customer-level and event-level data. The response I got in nearly all cases was: what does having direct access to my web analytics data enable me to do that I can’t do with Google...