Snowplow at Superweek: machine learning and actioning data

06 March 2019  •  Archit Goyal

For anyone unfamiliar with Superweek, it’s five days of analytics talks, and it feels a lot like going to summer camp as a child, but with:

  • Less sun and grass, more fog and snow
  • Fewer hyperactive children, more hyperactive data analysts
  • Less ice cream, more fried Hungarian potato pancakes with sour cream and chives
  • Fewer sporting activities, more GTM activities
  • Fewer small bonfires, more big bonfires

In this post I cover what I took away as the main themes of the week, seen through the lens of a Snowplow user.

Although there was a series of talks about privacy and ethics, touching on the enforceability of GDPR in the courts and the fine line between influencing and manipulating users (my favorite talk of the week was by Stephane Hamel on this topic), the most interesting themes of the week to me were machine learning and data strategy.

Machine Learning

There was a lot of debate about where we are in the hype cycle for machine learning (ML), the value of ML, and the rise of automated ML (autoML).

While the talks themselves were very engaging, it was the volume of ML- and autoML-related talks that interested me. If industry experts are speaking about it at length, it means data teams’ appetite for ML-driven insights, whether generated by data scientists or autoML-powered analysts, is still growing.

A major problem for companies building out ML capabilities is that data scientists are an expensive resource and the majority of their time is spent cleaning the data they are given. This problem applies when using autoML too. Snowplow tackles the root problem by focusing on getting data collection right:

  1. Snowplow data is validated against a set of rules defined by you; only events that pass validation are loaded to the data warehouse in a highly structured schema
  2. The data is rich, with over 130 out-of-the-box properties captured with each event
  3. Data from different sources (email, web, iOS, Android, servers) is all in the same format and loaded to the same table
  4. You have full ownership of event-level data, so there is no need to work with aggregated data
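To make the first point concrete, here is a minimal sketch of validating events against user-defined rules before loading. It is not Snowplow's actual pipeline code: the rule format, the field names, and the helpers `validation_errors` and `split_events` are all hypothetical, chosen only to illustrate the idea that events failing validation never reach the warehouse table.

```python
from typing import Any, Dict, List, Tuple

Event = Dict[str, Any]

# Hypothetical rules for a "checkout" event: field name -> required Python type.
CHECKOUT_RULES: Dict[str, type] = {
    "order_id": str,
    "total": float,
    "currency": str,
}


def validation_errors(event: Event, rules: Dict[str, type]) -> List[str]:
    """Return a list of rule violations; an empty list means the event is valid."""
    errors = []
    for field, expected in rules.items():
        if field not in event:
            errors.append(f"missing field: {field}")
        elif not isinstance(event[field], expected):
            errors.append(f"wrong type for {field}")
    # Reject fields the rules do not know about, so the table stays structured.
    for field in event:
        if field not in rules:
            errors.append(f"unexpected field: {field}")
    return errors


def split_events(
    events: List[Event], rules: Dict[str, type]
) -> Tuple[List[Event], List[Tuple[Event, List[str]]]]:
    """Separate events that pass validation from those that fail it."""
    good, bad = [], []
    for event in events:
        errors = validation_errors(event, rules)
        if errors:
            bad.append((event, errors))  # kept aside with the reasons for failure
        else:
            good.append(event)  # only these would be loaded to the warehouse
    return good, bad


# Illustrative input: the second event has a string total and no currency.
events = [
    {"order_id": "A1", "total": 19.99, "currency": "EUR"},
    {"order_id": "A2", "total": "19.99"},
]
good, bad = split_events(events, CHECKOUT_RULES)
```

Keeping the failed events alongside their error messages, rather than dropping them, means the data team can see why an event was rejected and fix the tracking at the source.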

Whether or not autoML drives significant change in the industry, fixing the core problem of poor data collection will make it easier to drive insights from data using ML.

Actioning Data

It will be interesting to see what proportion of companies that use ML will take actions off the back of the insights they gain from it. There was a major theme at Superweek on the general lack of action taken from data.

From a range of talks, there seemed to be a few key reasons why data wasn’t being actioned within organizations:

  • The data team doesn’t know how to use the tool effectively because the implementation was done by a third party and knowledge transfer was limited
  • Even if the analysts and scientists trust the data, senior management, who would act on the insights, do not
  • There may not be sufficient motive to action the data
  • The communication structures in the organization are broken so the right data will never be collected or sent to the right person

Simo Ahava covered that last point in his great talk. He spoke about how poor organizational structure and communication between teams can be a blocker in getting data actioned and how the best solution is to work with the team to fix communication channels and bring all the stakeholders to the table where possible.

While this is the best solution as it tackles the core issue, it may not always be doable. Since Snowplow offers a great deal of flexibility in designing a tracking setup with custom events and entities (each with their own custom properties), we have seen our users mimic their company’s communication structures in their tracking design quite successfully. If two internal product teams don’t communicate often, they can set up their own custom events and version them independently while maintaining a set of common entities (such as a user entity) which wouldn’t require as much evolution. This works well as an interim solution, but as mentioned, nothing can beat fixing the communication structures themselves.
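As a rough illustration of that setup, the sketch below shows two teams sending their own custom events at independent versions while attaching the same shared user entity. The `iglu:vendor/name/format/version` schema reference is the format Snowplow uses for self-describing JSON, but the vendors, event names, properties, and the helpers `self_describing` and `build_payload` are hypothetical, and the payload shape is simplified compared with what a real tracker sends.

```python
from typing import Any, Dict, List

Json = Dict[str, Any]


def self_describing(vendor: str, name: str, version: str, data: Json) -> Json:
    """Wrap data with a reference to the schema (and version) that describes it."""
    return {"schema": f"iglu:{vendor}/{name}/jsonschema/{version}", "data": data}


def build_payload(event: Json, entities: List[Json]) -> Json:
    """Simplified payload: one custom event plus the entities attached to it."""
    return {"event": event, "entities": entities}


# Shared entity, owned jointly and changed rarely (hypothetical schema).
user = self_describing("com.example", "user", "1-0-0", {"user_id": "u-42", "plan": "pro"})

# The checkout team has already made a breaking change to its event (2-0-0).
checkout_payload = build_payload(
    self_describing("com.example.checkout", "order_placed", "2-0-0", {"order_id": "A1"}),
    [user],
)

# The search team versions its event on its own schedule (1-0-3).
search_payload = build_payload(
    self_describing("com.example.search", "search_performed", "1-0-3", {"query": "boots"}),
    [user],
)
```

Because each event carries its own schema reference, one team can release a new version without coordinating with the other, while the shared user entity still lets analysts join the two teams' data.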

Final thoughts

It was great to hear Snowplow being mentioned in talks by Simo Ahava, Charles Farina, Kristoffer Ewald and Damion Brown. It would be interesting to hear how Snowplow users find the tool with regard to the three key themes raised at Superweek: privacy, machine learning, and the actionability of data. For the right organization it can certainly bring some benefits in all three areas, so if you want to learn more, please don’t hesitate to contact us!