Why and how to seize the multi-cloud data analytics opportunity?

31 October 2019  •  Erika Wolfe
Most businesses across industries now operate in a world where cloud infrastructure is the norm. The cloud is no longer an esoteric, buzzword-laden concept to be evaluated. Instead, cloud computing accounts for at least part of most organizations’ architectures, and complete or hybrid cloud strategies have become dominant. We’ve moved beyond the question “should we move to the cloud?” to “how can we leverage multiple cloud services to gain competitive...

What is data modeling and why do I need it?

14 October 2019  •  Erika Wolfe
In its most basic form, data modeling is a way of giving structure to raw, event-level data. This structure is essentially your business logic applied to the data you bring into your data warehouse, making it easier to query and use for your specific use cases. (At least that is the Snowplow approach to data modeling.) The reason is fairly clear: data modeling adds meaning to what is probably a great volume of raw data...
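To make this concrete, here is a minimal sketch of “business logic applied to raw event-level data”. The field names (`session_id`, `timestamp`) and the session-summary shape are hypothetical illustrations, not Snowplow’s actual atomic event schema:

```python
from collections import defaultdict

def model_sessions(raw_events):
    """Roll raw event-level rows up into one summary row per session
    (hypothetical field names, toy business logic)."""
    sessions = defaultdict(lambda: {"page_views": 0, "first_ts": None, "last_ts": None})
    for e in sorted(raw_events, key=lambda e: e["timestamp"]):
        s = sessions[e["session_id"]]
        s["page_views"] += 1
        if s["first_ts"] is None:
            s["first_ts"] = e["timestamp"]
        s["last_ts"] = e["timestamp"]
    # Derive a per-session duration from the first and last event timestamps
    return {
        sid: {**s, "duration_s": s["last_ts"] - s["first_ts"]}
        for sid, s in sessions.items()
    }

raw = [
    {"session_id": "a", "timestamp": 100},
    {"session_id": "a", "timestamp": 160},
    {"session_id": "b", "timestamp": 200},
]
modeled = model_sessions(raw)
```

The modeled table is far easier to query than the raw events: one row per session, with the metrics your use case needs already computed.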

The Snowplow Data Maturity Model

03 October 2019  •  Lyuba Golovina
As a business grows and expands its data strategy, it passes through different stages of data capability, or maturity, on its way toward becoming fully data informed. Data maturity can serve as a useful tool for businesses to measure where they are along the data journey and to identify next steps, potential challenges and roadblocks. Companies including Dell and Periscope have already developed insightful definitions of data maturity. At Snowplow, thanks to our work with a...

How to optimize your pipeline for data quality

09 September 2019  •  Lyuba Golovina
Data-informed businesses rely on customer data. It powers their products, provides valuable insights and drives new ideas. However, as a business expands its data collection, it also becomes more vulnerable to data quality issues. Poor quality data, such as inaccurate, missing or inconsistent data, provides a weak foundation for decision making and cannot be relied on to support arguments or ideas. And once trust in the data is lost,...

Why ITP 2.1 affects your web analytics and what to do about it

17 June 2019  •  Lyuba Golovina
In the first part of this article series, we introduced Intelligent Tracking Prevention (ITP) and other browser-based privacy updates that have already begun to affect businesses relying on third-party data collection platforms to manage their web analytics activity. This second article addresses in detail what you can do to “ITP proof” your web data collection strategy. As we’ve described, ITP presents a fundamental problem for third-party web analytics vendors and the companies that rely on...

How ITP 2.1 works and what it means for your web analytics

17 June 2019  •  Lyuba Golovina
Big changes are coming to the way businesses collect web data. Browser manufacturers, led by Safari, continue to introduce privacy updates to prevent third parties from tracking users across websites. Although these measures target advertising companies that track users across different websites, they also impact businesses using web analytics to optimize their websites and provide visitors with the best possible experience: especially businesses relying on third-party web analytics tools, including Google Analytics. In this article,...

Interview with the author: Alex Dean on writing Event Streams in Action

27 May 2019  •  Lyuba Golovina
Snowplow’s Co-Founder and CEO Alex Dean recently completed his book Event Streams in Action. The book takes an in-depth look at techniques for aggregating, storing, and processing event streams using the unified log processing pattern. In this interview, Alex shares his insights on how businesses can benefit from ‘event centric’ data processing, his motivation behind writing the book, and more. You can get your copy of Event Streams in Action and 40% off any Manning...

Using AWS Glue and AWS Athena with Snowplow data

04 April 2019  •  Konstantinos Servis
This is a guide to interacting with Snowplow enriched events in Amazon S3 with AWS Glue. The objective is to open new possibilities in using Snowplow event data via AWS Glue, and how to use the schemas created in AWS Athena and/or AWS Redshift Spectrum. This guide consists of the following sections: Why analyze Snowplow enriched events in S3? AWS Glue prerequisites Creating the source table in Glue Data Catalog Optionally format shift to Parquet...
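To give a flavour of the “creating the source table” step, here is a minimal sketch that generates Athena-compatible DDL for Snowplow enriched events in S3. The column list is a heavily truncated illustrative subset (the real enriched-event table has over a hundred fields), and the bucket path is a placeholder:

```python
def enriched_events_ddl(database, table, s3_location):
    """Build CREATE EXTERNAL TABLE DDL for Snowplow enriched events
    stored as tab-separated values in S3 (illustrative column subset)."""
    columns = [
        ("app_id", "string"),
        ("collector_tstamp", "timestamp"),
        ("event", "string"),
        ("domain_userid", "string"),
    ]
    cols = ",\n  ".join(f"{name} {typ}" for name, typ in columns)
    return (
        f"CREATE EXTERNAL TABLE {database}.{table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'\n"
        f"LOCATION '{s3_location}'"
    )

ddl = enriched_events_ddl("snowplow", "enriched_events", "s3://my-bucket/enriched/")
```

Once a table like this exists in the Glue Data Catalog, Athena and Redshift Spectrum can both query the events in place; converting to Parquet (as the guide covers) then cuts scan costs further.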

Don't be a hammer

01 December 2017  •  Anthony Mandelli
I had a professor back in college who started class with an exercise that forever altered my way of thinking. The class was on design thinking and how we could apply it to creative problem solving (and vice versa). The first day we met, the students trickled into the classroom and selected seats at round tables, each with a pile of blank name tags and a rainbow of colored markers. It was all very standard:...

Possession is 9/10 of the Law

30 October 2017  •  Anthony Mandelli
We’re at a point now where data is a sexy word. Big Data, data science, data analytics: the list of emerging data-focused fields, tools, and products continues to grow. This growth is largely thanks to developing collection technology; as collection tools improve, we find ourselves handling vastly improved data and actively seeking out ways to use it. However, when it comes to utilizing data, most organizations are relatively unsophisticated in their methods. The truth is...

Ad impression and click tracking with Snowplow

07 March 2016  •  Yali Sassoon
It is possible to track both ad impression events and ad click events into Snowplow. That means if you’re a Snowplow user buying display ads to drive traffic to your website or app, you can track not only what users do once they click through onto your site or app, but also which ads they have been exposed to and whether or not they clicked any of them. This is particularly useful for companies building attribution models,...
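As a toy example of the attribution modeling this enables, a simple last-click model over a user’s tracked ad events might look like the sketch below. The event shapes (`type`, `campaign` keys) are made up for illustration and are not the actual Snowplow event format:

```python
def last_click_attribution(events):
    """Given one user's ad events in time order, attribute the conversion
    to the campaign of the most recent click before it (toy last-click model)."""
    last_clicked = None
    for e in events:
        if e["type"] == "ad_click":
            last_clicked = e["campaign"]
        elif e["type"] == "conversion":
            return last_clicked
    return None  # no conversion in this journey

journey = [
    {"type": "ad_impression", "campaign": "spring_sale"},
    {"type": "ad_click", "campaign": "spring_sale"},
    {"type": "ad_click", "campaign": "retargeting"},
    {"type": "conversion"},
]
winner = last_click_attribution(journey)
```

Because Snowplow gives you the raw impression and click events rather than a pre-aggregated report, you can swap this model for first-click, linear, or any custom attribution logic without re-collecting data.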

Issue with Elastic Beanstalk Tomcat container for Clojure Collector users - diagnosis and resolution

31 July 2015  •  Yali Sassoon
A few weeks ago one of our users reported that they were consistently missing data between 1am and 2am UTC. We investigated the issue and found that their Clojure Collector was not successfully logging data in that hour. Working with engineers at AWS we identified the cause of the issue. At some stage (we cannot confirm exactly when) Amazon released a new Elastic Beanstalk Tomcat container version which had a bug related to the anacron...

Unified Log Processing is now available from Manning Early Access

31 July 2014  •  Alex Dean
I’m pleased to announce that the first three chapters of my new book are now available as part of the Manning Publications’ Early Access Program (MEAP)! Better still, I can share a 50% off code for the book - the code is mldean and it expires on Monday 4th August. The book is called Unified Log Processing - it’s a distillation (and evolution) of my experiences working with event streams over the last two and...

The Snowplow team will be in Israel and Cyprus in March - get in touch if you'd like to meet

18 March 2014  •  Alex Dean
I (Alex) will be heading to Tel Aviv next week and then heading on to Nicosia. If you’re interested in meeting up to discuss Snowplow, event analytics or big data processing more generally, I’d love to arrange a meeting! I will be in Tel Aviv all day Sunday March 23rd and Monday March 24th, including speaking at Big Data & Data Science Israel in Herzeliyya on the Sunday. I’ll then be in Cyprus from March...

Our video introduction of Snowplow to code_n

28 October 2013  •  Yali Sassoon
We were very flattered to be invited by the team at code_n to enter their competition to identify “outstanding young companies and promote their groundbreaking business models”. This year’s competition is focused on data, and has the motto Driving the Data Revolution. As part of our application process, we put together a short video introducing Snowplow. You can watch the video below. We look forward to finding out if our application has been successful!

Reduce your Cloudfront costs with cache control

02 July 2013  •  Yali Sassoon
One of the reasons Snowplow is so popular with large publishers and online advertising networks is that the cost of using Snowplow to track user behavior across your website or network is significantly lower than with our commercial competitors, and that difference becomes more pronounced as the number of users and events you track per day increases. We’ve been very focused on reducing the cost of running Snowplow further. Most of our efforts have...

Amazon announces Glacier - lowers the cost of running Snowplow

21 August 2012  •  Alex Dean
Today Amazon announced the launch of Amazon Glacier, which is a low-cost data archiving service designed for rarely accessed data. As Werner Vogels described it in his blog post this morning: Amazon Glacier provides the same high durability guarantee as Amazon S3 but relaxes the access times to a few hours. This is the right service for customers who have archival data that requires highly reliable storage but for which immediate access is not needed....