An introduction to event data modeling

16 March 2016  •  Yali Sassoon
Data modeling is an essential step in the Snowplow data pipeline. We find that the companies most successful at using Snowplow data are those that actively develop their event data models, progressively pushing more and more Snowplow data throughout their organizations so that marketers, product managers, merchandising and editorial teams can use the data to inform and drive decision making. ‘Event data modeling’ is a very new discipline and, as a result, there’s...

Data modeling in Spark (Part 1): Running SQL queries on DataFrames in Spark SQL

02 December 2015  •  Christophe Bogaert
An updated version of this blog post was posted to Discourse. We have been thinking about Apache Spark for some time now at Snowplow. This blog post is the first in a series that will explore data modeling in Spark using Snowplow data. It’s similar to Justine’s write-up and covers the basics: loading events into a Spark DataFrame on a local machine and running simple SQL queries against the data. Data modeling is a critical step in...