Data Science Festival: What makes an effective data team?

24 April 2019  •  Alex Dean
It was great to speak at the Data Science Festival in London on April 13. DSF is an annual, week-long celebration of data science that culminates in a one-day main event. It’s a place for current and future data scientists to meet, discuss challenges and opportunities, and network with fellow data enthusiasts. The original topic of my talk was “Why high quality data is crucial for your machine learning models,” but...

Data Science Festival: Machine learning in real-time: the next frontier

24 April 2019  •  Alex Dean
In addition to my main talk at Data Science Festival on What makes an effective data team, I was lucky enough to give one of the 10-minute “Lightning talks” in the SHIFT room before lunch. This post briefly recaps my lightning talk on machine learning in real-time, before sharing my conference highlights and some closing thoughts. For my Lightning talk I discussed “Machine learning in real-time:...

Snowplow Spotlight: Benjamin Benoist

11 April 2019  •  Miriam de Medwe
Benjamin Benoist - Data Engineer based in Berlin What do you do at Snowplow? I joined Snowplow 6 months ago as a Data Engineer. During the first few months, I worked mainly with a customer, building a real-time application on top of Snowplow data. I’m now getting more involved in development around the Snowplow pipeline: I’m currently adding my first enrichment, and I’m also integrating contributions from the open source community. Why did...

Using AWS Glue and AWS Athena with Snowplow data

04 April 2019  •  Konstantinos Servis
This is a guide to interacting with Snowplow enriched events in Amazon S3 using AWS Glue. The objective is to open up new possibilities for working with Snowplow event data via AWS Glue, and to show how the resulting schemas can be used in AWS Athena and/or AWS Redshift Spectrum. This guide consists of the following sections: Why analyze Snowplow enriched events in S3? AWS Glue prerequisites; Creating the source table in the Glue Data Catalog; Optionally format shift to Parquet...

Guest post: 3 reasons why your company should own its data

04 April 2019  •  Jacob Thomas
This is a guest post by Jacob Thomas, Lead Data Engineer at CarGurus. You can find the original article and read more from Jacob on Bostata. When it comes to your company’s software and infrastructure, it often makes sense to buy rather than build. However, it will benefit you in the long term to thoroughly understand and own your data management and collection. Here’s why. When it comes to software and related infrastructure, businesses get caught in...

How many of your visitors block your Snowplow tracking?

18 March 2019  •  Mike Nemirovsky
tl;dr As a company that focuses on helping businesses collect data in order to better serve their customers, we inevitably get asked what happens when those customers don’t want to be tracked. With the use of ad blockers, and in particular privacy filters, on the rise, some of our customers are seeing the effect on their data. This effect is at times perceived as a problem, or as a threat to the quality of data collection....

Snowplow R113 Filitosa real-time pipeline improvements

06 March 2019  •  Ben Fradet
Snowplow R113 Filitosa, named after the megalithic site in Southern Corsica, is a release focusing on improvements to the Scala Stream Collector, as well as new features for Scala Common Enrich, the library powering all the different enrichment platforms. This release is almost entirely made up of community contributions; a shout-out to all the contributors: LiveIntent, for adding Prometheus support to the Scala Stream Collector and for making it possible to use POST requests in the API request enrichment...

Snowplow at Superweek: machine learning and actioning data

06 March 2019  •  Archit Goyal
For anyone unfamiliar with Superweek, it’s 5 days of analytics talks, and it’s quite a lot like going to summer camp as a child, with: less sun and grass, more fog and snow; fewer hyperactive children, more hyperactive data analysts; less ice cream, more fried Hungarian potato pancakes with sour cream and chives; fewer sporting activities, more GTM activities; and fewer small bonfires, more big bonfires. I will aim to cover what I took...

Snowplow for retail part 1: how can I use Snowplow?

06 March 2019  •  Archit Goyal
“We have several disparate brands, and users have multiple touch points (web, app, over the phone and in store) before purchasing, yet we don’t have a single customer view. This means we can’t effectively group our users. We don’t have a good understanding of how our marketing spend across multiple channels affects revenue. We just released an app and people are downloading it but not buying on it; it feels like lost revenue. We’re spending...

Snowplow for retail part 2: what data do I track?

06 March 2019  •  Archit Goyal
We recommend you read the first post in this series before diving into this one, to ensure you have all the context you need! There are also three more posts in this series that you can read next: What can we do with data when we’re getting started? What can we do with data when we’re growing? What can we do with data when we’re well established? What do I track? With...