Looking back at 2016
With the start of 2017, we have decided to look back at the 2016 blog and community Discourse posts that generated the most engagement with our users.
More than ten thousand users spent a total of 548 hours reading our blog posts, whilst on Discourse (which we only launched this year) 8,700 unique users spent 424 hours reading and participating in the Snowplow community.
Let’s take a closer look at each in turn.
1. Top 10 blog posts published in 2016
Let’s start by looking into our top 10 blog posts by number of unique users.
| Rank | Blog post | Unique users | Time in min |
|------|-----------|--------------|-------------|
| 1 | An introduction to event data modeling | 1504 | 5220 |
| 2 | Introducing Snowplow Mini | 1156 | 3141 |
| 3 | Introducing Factotum data pipeline runner | 1072 | 1729 |
| 4 | We need to talk about bad data | 891 | 2330 |
| 5 | Ad impression and click tracking with Snowplow | 791 | 2013 |
| 6 | Introducing Sauna, a decisioning and response platform | 761 | 1930 |
| 8 | Building first and last touch attribution models in Redshift SQL | 511 | 1686 |
| 9 | Debugging bad data in Elasticsearch and Kibana - a guide | 460 | 776 |
| 10 | Web and mobile data only gets you to first base when building a single customer view | 341 | 921 |
While this ranking already gives us some insight into what type of content drove the most engagement, let’s plot the number of uniques against the average engagement time per unique for each post, to compare posts not only by how many people each attracted but by how long each of those people spent reading the content (on average, at least).
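As a minimal sketch of the metric behind that plot: the average engagement time per unique is just the total engaged minutes for a post divided by its unique readers. The figures below are taken from the table above; the helper function is purely illustrative, not our production analysis.

```python
# Average engaged minutes per unique reader, from the per-post totals above.
# (title: (unique users, total minutes)) -- a subset of the table for brevity.
posts = {
    "An introduction to event data modeling": (1504, 5220),
    "Introducing Snowplow Mini": (1156, 3141),
    "Building first and last touch attribution models in Redshift SQL": (511, 1686),
}

def avg_minutes_per_unique(uniques: int, total_minutes: int) -> float:
    """Average engaged minutes per unique user for a post."""
    return total_minutes / uniques

for title, (uniques, minutes) in posts.items():
    print(f"{title}: {avg_minutes_per_unique(uniques, minutes):.2f} min/unique")
```

Note how this reshuffles the ranking: a post with fewer readers can still come out ahead on minutes per reader.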
Number of unique users per average time spent
The blog post An introduction to event data modeling stands out as the post that not only attracted the largest number of readers but also kept them reading longer than any other post in the top 10. Event data modeling is a hot topic: one we’ve done a lot of thinking about at Snowplow over the last 18 months. This was the first post where we started to sketch out an overall approach and highlight some of the key challenges in event data modeling, and it’s great to see that the community at large engaged with us. We’ve certainly had a lot of interesting conversations off the back of that blog post, and of the presentations and other posts and threads on this topic.
It’s therefore also great to see that the second post by average time engaged per user was another event data modeling post - this time on building first and last touch attribution models in Redshift SQL.
Snowplow Mini was a surprise hit for us in 2016. The initial version was prototyped at a company hackathon back in February. By the time we published Introducing Snowplow Mini we had already piloted its use with a number of our users and found that it was invaluable to them as they developed new event and entity (context) schemas: enabling them to test those instrumentation updates prior to rolling them out.
Introducing Factotum data pipeline runner was the third most popular blog post by number of users. This is very exciting: Factotum is something we developed at Snowplow to make the job of reliably instrumenting and running a huge number of data pipelines, each defined by a DAG, efficient and robust across hundreds of our users. The interest in Factotum shows that other people and companies are also interested in better managing the ongoing running of complicated, multi-step data pipelines.
Drilling into the source of traffic of the top 10 blog posts
To better understand the channels that drove users to our most read posts, we can split traffic by refr_medium. We have plotted the blog posts per referrer to understand the distribution of traffic between the posts.
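A hypothetical sketch of the kind of query behind this breakdown, using an in-memory SQLite table in place of Snowplow's atomic events table in Redshift. The column names (page_urlpath, refr_medium, domain_userid) follow Snowplow's canonical event model; the rows here are made up purely for illustration.

```python
import sqlite3

# Toy stand-in for the Snowplow events table
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        page_urlpath  TEXT,
        refr_medium   TEXT,
        domain_userid TEXT
    )
""")
rows = [
    ("/blog/event-data-modeling", "search", "u1"),
    ("/blog/event-data-modeling", "search", "u2"),
    ("/blog/event-data-modeling", "social", "u3"),
    ("/blog/factotum",            "direct", "u1"),
    ("/blog/factotum",            "direct", "u4"),
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

# Unique users per post per referrer medium, mirroring the plot above
query = """
    SELECT page_urlpath,
           refr_medium,
           COUNT(DISTINCT domain_userid) AS uniques
    FROM events
    GROUP BY page_urlpath, refr_medium
    ORDER BY uniques DESC
"""
for path, medium, uniques in conn.execute(query):
    print(path, medium, uniques)
```

The same GROUP BY shape runs unchanged against Redshift; only the connection and table name differ.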
Distribution of unique users per different sources of traffic ranked by total unique users:
Search was a significant driver for many posts. On further investigation we discovered that, for example, the top post An introduction to event data modeling was ranking as the first result on Google for the search “event data modelling”. Direct traffic drove significantly more traffic to Introducing Factotum data pipeline runner and Introducing Sauna, a decisioning and response platform, while Social had a significant impact on the top 6 posts.
Let’s now look at our top Discourse posts.
2. Top 10 Discourse threads published in 2016
| Rank | Discourse thread | Unique users | Time in min |
|------|------------------|--------------|-------------|
| 1 | Visualise Snowplow data using Airbnb Caravel & Redshift [tutorial] | 530 | 1121 |
| 2 | Identifying users (identity stitching) | 429 | 979 |
| 3 | Should I use views in Redshift? | 362 | 376 |
| 5 | How to setup a Lambda architecture for Snowplow | 251 | 806 |
| 6 | Debugging a Serializable isolation violation in Redshift (ERROR: 1023) [tutorial] | 208 | 496 |
| 7 | Debugging bad rows in Spark and Zeppelin [tutorial] | 201 | 296 |
| 8 | Comparing Snowplow with Google Analytics 360 BigQuery integration (WIP) | 184 | 480 |
| 9 | Basic SQL recipes for web data | 183 | 726 |
| 10 | Loading Snowplow events into Apache Spark and Zeppelin on EMR [tutorial] | 181 | 289 |
Now let’s plot the same visualisation as before:
Number of unique users per average time spent:
The Discourse tutorial Visualise Snowplow data using Airbnb Caravel & Redshift was the thread that attracted the largest number of users: people are certainly interested in open source tools for visualizing data! It’s not a surprise, therefore, that the Wagon alternative post also featured in the top 10.
Our Basic SQL recipes for web data ranked first on engaged time: perhaps not surprising, as readers will likely have walked through the different example queries whilst testing them on their own Snowplow data.
Event data modeling also features in the top 10 with the thread Identifying users (identity stitching).
It’s also great to see the active interest in Spark by the Snowplow community - two of the top 10 posts are about analyzing Snowplow data with Spark.
What should we be writing about in 2017?
And sign up to our mailing list for a monthly digest of new content from the Snowplow Team and the broader Snowplow Community.