Using AWS Glue and AWS Athena with Snowplow data

30 July 2018  •  Konstantinos Servis
This is a guide to interacting with Snowplow enriched events in Amazon S3 using AWS Glue. The objective is to open up new possibilities for using Snowplow event data via AWS Glue, and to show how to use the schemas it creates in AWS Athena and/or AWS Redshift Spectrum. This guide consists of the following sections: Why analyze Snowplow enriched events in S3?; AWS Glue prerequisites; Creating the source table in Glue Data Catalog; Optionally format shift to Parquet...

Don't be a hammer

01 December 2017  •  Anthony Mandelli
I had a professor back in college who started class with an exercise that forever altered my way of thinking. The class was on design thinking and how we could apply it to creative problem solving (and vice versa). The first day we met, the students trickled into the classroom and selected seats at round tables, each with a pile of blank name tags and a rainbow of colored markers. It was all very standard:...

Possession is 9/10 of the Law

30 October 2017  •  Anthony Mandelli
We’re at a point now where data is a sexy word. Big Data, data science, data analytics: the list of emerging data-focused fields, tools, and products continues to grow. This growth is largely thanks to developing collection technology; as collection tools improve, we find ourselves handling vastly improved data and actively seeking out ways to use it. However, when it comes to utilizing data, most organizations are relatively unsophisticated in their methods. The truth is...

Ad impression and click tracking with Snowplow

07 March 2016  •  Yali Sassoon
It is possible to track both ad impression events and ad click events into Snowplow. That means if you’re a Snowplow user buying display ads to drive traffic to your website or app, you can track not only what users do once they click through onto your site or app, but also which ads they have been exposed to and whether or not they clicked any of them. This is particularly useful for companies building attribution models,...

Issue with Elastic Beanstalk Tomcat container for Clojure Collector users - diagnosis and resolution

31 July 2015  •  Yali Sassoon
A few weeks ago, one of our users reported that they were consistently missing data between 1am and 2am UTC. We investigated the issue and found that their Clojure Collector was not successfully logging data in that hour. Working with engineers at AWS, we identified the cause of the issue. At some stage (we cannot confirm exactly when) Amazon released a new Elastic Beanstalk Tomcat container version which had a bug related to the anacron...

Unified Log Processing is now available from Manning Early Access

31 July 2014  •  Alex Dean
I’m pleased to announce that the first three chapters of my new book are now available as part of the Manning Publications’ Early Access Program (MEAP)! Better still, I can share a 50% off code for the book - the code is mldean and it expires on Monday 4th August. The book is called Unified Log Processing - it’s a distillation (and evolution) of my experiences working with event streams over the last two and...

The Snowplow team will be in Israel and Cyprus in March - get in touch if you'd like to meet

18 March 2014  •  Alex Dean
I (Alex) will be heading to Tel Aviv next week and then heading on to Nicosia. If you’re interested in meeting up to discuss Snowplow, event analytics or big data processing more generally, I’d love to arrange a meeting! I will be in Tel Aviv all day Sunday March 23rd and Monday March 24th, including speaking at Big Data & Data Science Israel in Herzeliyya on the Sunday. I’ll then be in Cyprus from March...

Our video introduction of Snowplow to code_n

28 October 2013  •  Yali Sassoon
We were very flattered to be invited by the team at code_n to enter their competition to identify “outstanding young companies and promote their groundbreaking business models”. This year’s competition is focused on data, and has the motto Driving the Data Revolution. As part of our application process, we put together a short video introducing Snowplow. You can watch the video below. We look forward to finding out if our application has been successful!

Reduce your Cloudfront costs with cache control

02 July 2013  •  Yali Sassoon
One of the reasons Snowplow is popular with very large publishers and online advertising networks is that the cost of using Snowplow to track user behavior across your website or network is significantly lower than with our commercial competitors, and that difference becomes more pronounced as the number of users and events you track per day increases. We’ve been very focused on reducing the cost of running Snowplow further. Most of our efforts have...

Amazon announces Glacier - lowers the cost of running Snowplow

21 August 2012  •  Alex Dean
Today Amazon announced the launch of Amazon Glacier, which is a low-cost data archiving service designed for rarely accessed data. As Werner Vogels described it in his blog post this morning: Amazon Glacier provides the same high durability guarantee as Amazon S3 but relaxes the access times to a few hours. This is the right service for customers who have archival data that requires highly reliable storage but for which immediate access is not needed....