Do read the post that answers the question: What do I track?
What can we do with the data now that we’re a well-established data team?
Senior management love the work of the data team so far:
Lower marketing spend, higher ad revenue and an uptick in lifetime value have left management pretty happy, so they have decided to triple the size of the data team, adding data engineers and data scientists.
It’s good you have Snowplow at this point in your growth because this is when the tool is most effective.
With the luxury of having many analysts, we have seen our customers embed them in the teams that are the end consumers of the data, such as editorial teams. This makes Snowplow data highly effective, as these embedded analysts are empowered by Snowplow’s rich event-level data.
Here’s a small selection of the things you can begin to do now:
The focus in this section is on unlocking the power of real-time Snowplow data using your new engineering resource, and on building on previous analytics with ML-powered insights using your data science resource.
There are a few things that make Snowplow data great for feeding ML algorithms:
A reminder that this post is entirely non-technical. It is intended to give you an understanding of what is possible with the tool and is by no means a playbook. Where relevant there will be links to more technical deep dives; otherwise, please do get in touch for more information!
So far we haven’t really leveraged the real-time features Snowplow offers. All the data being loaded into your data warehouse is first written to a real-time stream. In a few short steps, you can read off this stream to act on the data using an AWS Lambda function or GCP cloud function.
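As a rough sketch only: the handler below shows the shape of an AWS Lambda reading Snowplow events from a Kinesis stream. The JSON payload and the `handle_event` logic are assumptions for illustration; Snowplow’s enriched stream actually carries TSV records, which its Analytics SDKs decode for you.

```python
import base64
import json

def handle_event(event_data):
    # Placeholder for downstream logic (scoring, alerting, personalisation).
    return event_data.get("event_name")

def handler(event, context):
    """AWS Lambda entry point: each Kinesis record carries one
    base64-encoded event (shown here as JSON for simplicity)."""
    results = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
        results.append(handle_event(json.loads(payload)))
    return results
```

In a real deployment the Lambda would be wired to the enriched stream as an event source, and `handle_event` would hold your business logic.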
Example: automated native ads
The data analyst defines the rules: users are scored depending on their past and current usage, for example viewing 4 articles and a video in 10 minutes increases this score greatly. These rules can be created by exploring Snowplow’s rich event level data.
The data engineer writes a function that surfaces a native ad with relevant content once a user reaches a certain score. Snowplow has SDKs that make this process quite straightforward; Snowplow co-founder Yali Sassoon shows how to write a real-time app using Snowplow data in this tutorial.
The data scientist could maintain a lookup table of user groups and the content that is most effective for each. This way, the content surfaced to each user is the material most likely to make them convert.
All of this has the potential to vastly improve conversion rates and further optimise marketing spend. Why spend on marketing to users who will never convert, or to those who will definitely convert anyway? Focus on those who might convert. The data scientists can group users into these three categories with the rich user profiles you can build from Snowplow data.
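The analyst-defined scoring rules above could be sketched as a simple function. The event names, weights and threshold here are hypothetical placeholders for rules an analyst would derive from the actual event-level data:

```python
def engagement_score(events, window_minutes=10):
    """Score recent activity; `events` is a list of
    (event_type, minutes_ago) tuples. Weights are illustrative."""
    recent = [e for e, age in events if age <= window_minutes]
    score = recent.count("article_view") * 1
    score += recent.count("video_view") * 3
    # e.g. four articles plus a video within the window boosts the score greatly
    if recent.count("article_view") >= 4 and "video_view" in recent:
        score += 5
    return score

def should_surface_native_ad(events, threshold=10):
    # The engineer's function would call this before surfacing an ad.
    return engagement_score(events) >= threshold
```

A production version would read these features from the real-time stream rather than an in-memory list, but the decision logic is the same shape.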
The point of Snowplow in this setup is to deliver the best possible inputs for a fully customised solution; it is, however, up to your data team to build that solution.
Using the approach taken in the section above on “Automated native ads”, you can also personalise the actual product features.
Using historic data, you can gauge what a user would need to see to make them most likely to convert. Snowplow has delivered data that is the best possible input for your model, and it’s up to you to take it from here. You can then use the real-time data feed to personalise the experience users have on the site or in the app, for example by serving a paywall or suggesting content.
A/B testing with Snowplow is made easier by our Optimizely integration, and you can A/B test with any other service using our custom tooling. See a blog post on our approach here.
By reading from Snowplow’s real-time stream, you can use time series forecasting to predict what the data should look like. If the observed data deviates by more than a set percentage, this can trigger a notification via Slack or email to the relevant team. For example, the newsroom can be sent a Slack message if an article is underperforming, so they can update the title to improve performance.
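A minimal sketch of the deviation check, assuming a simple historical-mean baseline (a production system would use a proper time-series forecast accounting for trend and seasonality) and a hypothetical `notify` callback:

```python
from statistics import mean

def is_anomalous(history, current, threshold_pct=30):
    """Flag `current` if it deviates from the historical mean
    by more than threshold_pct percent."""
    baseline = mean(history)
    if baseline == 0:
        return False
    deviation_pct = abs(current - baseline) / baseline * 100
    return deviation_pct > threshold_pct

def maybe_alert(article_id, history, current, notify):
    # `notify` is whatever messenger you wire up (Slack webhook, email, ...).
    if is_anomalous(history, current):
        notify(f"Article {article_id} is deviating from its baseline")
```

The threshold and baseline window are tuning choices for your data team, not fixed values.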
We have seen this used to alert relevant teams to abnormally low usage of a certain feature, prompting them to investigate and find a bug. They were then able to fix the bug before it caused a significant loss in revenue.
If there is a bug within a step in the subscription process that goes undetected for too long, this could have serious consequences. The potential return on investment of a good data feed and robust data strategy is immeasurable in cases like this.
Fraud detection combines the ideas of previous sections and is something Snowplow is very well suited to for a few reasons:
Using the wealth of historic data as input to a machine learning model, you can assess which behaviours are most predictive of ad fraud. Then, as a ‘user’ arrives on your site, you can compare their usage in real time against the key indicators of fraud generated by the model, and decide whether to block site usage to prevent ad fraud. You can use this to flag users based on TTI, geography, IP and hyper-engagement, to name a few.
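To make the real-time comparison concrete, here is an illustrative check against model-derived indicators. The feature names and thresholds are assumptions, standing in for indicators an ML model would learn offline from historic Snowplow data:

```python
# Hypothetical indicators, in reality derived from a trained model.
FRAUD_INDICATORS = {
    "max_events_per_minute": 120,   # hyper-engagement
    "min_time_to_interact_ms": 50,  # suspiciously low TTI
}

def looks_fraudulent(session):
    """`session` is a dict of real-time features for one visitor."""
    if session.get("events_per_minute", 0) > FRAUD_INDICATORS["max_events_per_minute"]:
        return True
    if session.get("time_to_interact_ms", float("inf")) < FRAUD_INDICATORS["min_time_to_interact_ms"]:
        return True
    # Geography/IP checks would consult lists the model or team maintains.
    if session.get("ip_flagged") or session.get("geo_flagged"):
        return True
    return False
```

Flagged sessions could then be blocked or excluded from ad reporting before they cost you revenue.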
Again the return on investment here is potentially large, depending on the levels of fraud currently being faced.
Create content that people are asking for but that doesn’t currently exist. Gauge user sentiment from their actions and searches to inform what kind of content your consumers want. Machine learning can be used to predict which content users will engage with most, in ways that manual analysis cannot. As mentioned previously, Snowplow’s rich, structured data is perfectly suited to this application.