The event data pipeline.

Two pipelines

Batch pipeline

Your event data in your own data warehouse

  • Run any query on your event data
  • Join web, mobile and marketing data sets with other data sources
  • View all customer interactions across all channels in a single place
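As a rough illustration of what joining event data with another data source can look like, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for a data warehouse. The table names (events, crm_orders), the column names and the sample rows are all hypothetical and are not the actual Snowplow schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two small, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE events (user_id TEXT, channel TEXT, event_name TEXT);
        CREATE TABLE crm_orders (user_id TEXT, order_value REAL);
    """)
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
        ("u1", "web", "purchase"),
        ("u2", "mobile", "purchase"),
        ("u3", "web", "page_view"),
    ])
    conn.executemany("INSERT INTO crm_orders VALUES (?, ?)", [
        ("u1", 30.0),
        ("u2", 12.5),
        ("u3", 99.0),
    ])
    return conn


def revenue_by_channel(conn):
    """Join purchase events with CRM orders and total order value per channel."""
    query = """
        SELECT e.channel, SUM(o.order_value) AS revenue
        FROM events AS e
        JOIN crm_orders AS o ON o.user_id = e.user_id
        WHERE e.event_name = 'purchase'
        GROUP BY e.channel
        ORDER BY e.channel
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for channel, revenue in revenue_by_channel(build_demo_db()):
        print(channel, revenue)
```

The same query shape would apply in a real warehouse; only the connection and the table definitions would differ.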

Real-time pipeline

Your event data available to process in real-time, across any application.

  • Monitor the performance of your business in real-time (real-time dashboards)
  • Respond to your users in real-time (e.g. personalization, marketing automation)

Both pipelines have the same high-level architecture:

Track all events across all channels.

Track events from your website, mobile apps, server-side applications, desktop applications, call center, warehouse, delivery network, set-top box, smart car, smart home and more, using our extensive library of trackers:

  • JavaScript
  • Android
  • Apple
  • Scala
  • PHP
  • Java
  • .NET
  • Clojure
  • ActionScript
  • Python
  • Ruby
  • Node.js
  • Unity
  • Lua
  • Arduino

Track events from your third-party SaaS providers using our third-party integrations.

Enrich your data.

Enrich your event data with the data that you need to answer your most important questions, including:

  • GeoIP and business IP data
  • Weather data
  • Device, operating system and browser data (inferred from user-agent strings)
  • Campaign, marketing and attribution data
  • Currency conversion rates
  • Third-party data available from external APIs
  • First-party data available in your own SQL databases and your own APIs
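To make the idea of an enrichment concrete, here is a minimal sketch of a campaign enrichment that reads utm_* query parameters from a page URL and attaches them to the event. The event shape, the page_url key and the mkt_* output field names are assumptions made for this example; it is not the Snowplow enrichment implementation.

```python
from urllib.parse import parse_qs, urlparse


def enrich_campaign(event):
    """Return a copy of the event with campaign fields taken from utm_* params.

    The input event is a plain dict with an optional "page_url" key
    (a hypothetical shape chosen for this sketch).
    """
    params = parse_qs(urlparse(event.get("page_url", "")).query)
    enriched = dict(event)  # leave the raw event untouched
    for field in ("source", "medium", "campaign"):
        values = params.get(f"utm_{field}")
        enriched[f"mkt_{field}"] = values[0] if values else None
    return enriched


if __name__ == "__main__":
    raw = {
        "event_name": "page_view",
        "page_url": "https://example.com/?utm_source=newsletter&utm_medium=email",
    }
    print(enrich_campaign(raw))
```

Returning a copy rather than mutating the input keeps the raw event available for inspection alongside the enriched one.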

Your data where you want it.

Where your data lives has a big impact on what you can do with it. Snowplow supports loading your data directly into the following data stores out-of-the-box:

  • PostgreSQL
  • Amazon Redshift
  • Amazon Kinesis
  • Apache Spark
  • Elasticsearch

Trust your data.

The Snowplow pipeline is transparent and auditable. View the data that is input into and output from each stage in the pipeline. No data is dropped as part of the processing: data that fails validation is kept (with an error log) so you can proactively identify data quality issues and fix them fast.
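The "no data is dropped" idea can be sketched as a validation step that routes every event to one of two outputs: events that pass, and events that fail together with the reasons they failed. The required field names and the record layout below are hypothetical and only illustrate the pattern.

```python
# Hypothetical required fields, chosen for illustration only.
REQUIRED_FIELDS = ("event_id", "event_name", "timestamp")


def validate(events):
    """Split events into (good, bad) without discarding anything.

    Each bad record keeps the original event plus a list of error messages,
    so failures can be inspected and fixed later.
    """
    good, bad = [], []
    for event in events:
        missing = [name for name in REQUIRED_FIELDS if name not in event]
        if missing:
            bad.append({
                "event": event,
                "errors": [f"missing field: {name}" for name in missing],
            })
        else:
            good.append(event)
    return good, bad


if __name__ == "__main__":
    events = [
        {"event_id": "1", "event_name": "page_view", "timestamp": "2016-01-01T00:00:00Z"},
        {"event_id": "2", "event_name": "purchase"},
    ]
    good, bad = validate(events)
    print(len(good), "valid;", len(bad), "failed:", bad[0]["errors"])
```

Every input event ends up in exactly one of the two lists, which is what makes the step auditable.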

Act on your data. Now.

Our real-time pipeline delivers your event data to you in seconds.

Own your data.

It’s your event data, not ours. Your data is collected, stored and processed in your own AWS account. Your data never lives on our servers.

Designed to scale.

Snowplow has been designed from the ground up to be linearly scalable. Our largest users process billions of events every day, without breaking a sweat.

Stay relevant.

The Snowplow data pipeline is built to evolve with your business. Change the events that you track, the way that you describe those events using data, and the way that you model and query that data as your business evolves.

100% open source.

Snowplow has evolved according to the needs and experience of the thousands of users that trust our technology to collect, process and deliver their event data. All of our code is available on GitHub under the Apache 2.0 license.

Interested? Let’s get started.