Under the hood
Snowplow consists of five loosely-coupled subsystems.

1. Trackers
- Snowplow data is generated by trackers and passed to a collector.
- Currently we have a JavaScript tracker for tracking user interactions on websites and web apps, and a No-JS tracker for tracking user behaviour in web environments that do not support JavaScript, e.g. emails.
- iOS and Android trackers are on the product roadmap.
2. Collectors
- Collectors receive Snowplow event data from trackers and log it to S3.
- Currently we have a CloudFront collector for tracking user activity across a single domain, and a Clojure collector for tracking activity across multiple domains.
3. ETL and enrich
- Once raw data has been logged to S3, an ETL step processes that data, cleaning it (e.g. extracting data from querystrings) and enriching it (e.g. inferring user locations from IP addresses).
- Our ETL step currently uses Apache Hive on EMR to process the raw logs via a custom SerDe. We are part way through developing a more robust subsystem using Scalding / Cascading.
- The ETL step finishes by loading the data into one or more data storage options.
4. Storage
- Snowplow can be set up to load your event-level and customer-level data into one or more data stores, to enable analytics.
- Currently we support loading Snowplow data into S3 (for processing by Hive / Pig / Hadoop / Mahout on EMR), and into Redshift and Infobright Community Edition for more traditional analysis (e.g. using BI tools like ChartIO or sophisticated analytics tools like R).
5. Analytics
- Once your Snowplow data is available in storage, you can plug it into a range of analytics tools to mine that data.
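
To make the "ETL and enrich" step above more concrete, here is a minimal Python sketch of the two operations it describes: cleaning (extracting data from a querystring) and enriching (inferring a location from an IP address). This is an illustration only, not Snowplow's actual Hive-based ETL. The request URI, the parameter names (`e`, `page`, `uid`), the helper names `clean` and `enrich`, and the `GEO_BY_IP` lookup table are all assumptions made for this example.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical lookup table standing in for a real IP-geolocation database.
GEO_BY_IP = {"203.0.113.7": "GB"}


def clean(request_uri: str) -> dict:
    """Extract querystring parameters from a raw request URI into a flat dict."""
    query = urlsplit(request_uri).query
    # parse_qs returns a list of values per key; keep the first value only.
    return {key: values[0] for key, values in parse_qs(query).items()}


def enrich(event: dict, ip_address: str) -> dict:
    """Return a copy of the event with a country field (None if the IP is unknown)."""
    return {**event, "country": GEO_BY_IP.get(ip_address)}


# Illustrative raw request as a collector might log it (parameter names assumed).
raw_uri = "/i?e=pv&page=Home%20page&uid=abc123"
event = enrich(clean(raw_uri), "203.0.113.7")
print(event)
```

In a real pipeline the enriched events would then be written to one or more of the storage targets listed under "Storage", rather than printed.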
Learn more