data collection platform.
The Snowplow pipeline is architected from the ground up for data quality, so you don’t have to worry about bad or missing data.
The Snowplow Pipeline
Our technology is ideal for data teams who want to manage the collection and warehousing of data across all their platforms and channels, in real time.
Trackers & webhooks
Track events from anywhere

Event collectors
The pipeline gateway
Event collectors receive Snowplow events from our trackers and webhooks, reliably storing your events in Amazon S3, Amazon Kinesis or Google Cloud Pub/Sub. Collectors are the gateway into your Snowplow pipeline, and our autoscaling technology ensures that traffic spikes never cause you any issues.
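To make the gateway concrete, here is a minimal sketch of how a tracker addresses a collector. The hostname is a placeholder for your own deployment, and the parameter names (e for event type, p for platform, tv for tracker version) follow the general shape of Snowplow's tracker protocol; treat this as an illustration, not a complete tracker.

```python
from urllib.parse import urlencode

# Placeholder - substitute your own collector deployment's hostname.
COLLECTOR_HOST = "https://collector.example.com"

def build_pixel_request(event_params: dict) -> str:
    """Build the GET URL a tracker would send to a collector pixel endpoint."""
    return f"{COLLECTOR_HOST}/i?{urlencode(event_params)}"

# "pv" is a page-view event on the "web" platform.
url = build_pixel_request({"e": "pv", "p": "web", "tv": "py-0.1.0"})
```

A real tracker would batch events and POST them, but the key idea is the same: every event, from any platform, arrives at the collector as a structured set of parameters.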
Data validation
Events are validated upfront
Garbage in doesn't have to mean garbage out: Snowplow prevents bad data from breaking your downstream systems by validating your events upfront. You define the structure of incoming events, and validation ensures that only well-structured, usable data comes through the pipeline. Data that fails validation isn't blocked or thrown away - we preserve it in a bad data store for recovery and reprocessing.
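The validate-and-preserve flow can be sketched in a few lines. This is a hand-rolled type check for illustration only - the real pipeline validates events against JSON Schemas from your schema registry - but the routing is the point: failed events land in a bad-rows store with the reason attached, rather than being dropped.

```python
# Invented example schema: field name -> expected Python type.
REQUIRED_FIELDS = {"event_name": str, "user_id": str, "timestamp": int}

def validate(event: dict):
    """Return None if the event is valid, else a human-readable reason."""
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in event:
            return f"missing field: {field}"
        if not isinstance(event[field], expected_type):
            return f"bad type for {field}"
    return None

good_rows, bad_rows = [], []
for event in [
    {"event_name": "page_view", "user_id": "u1", "timestamp": 1700000000},
    {"event_name": "click", "user_id": "u2"},  # missing timestamp
]:
    error = validate(event)
    if error is None:
        good_rows.append(event)
    else:
        # Failed events are preserved alongside the reason, for reprocessing.
        bad_rows.append({"event": event, "error": error})
```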
Schema registry
Store and evolve your data structures
Snowplow event validation is built on top of our schema registry, a world-leading technology for storing and evolving data structures, or schemas. Your schema registry contains the full history of all your event data structures, ready for fast retrieval by the Snowplow pipeline to validate your events in real time.
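A toy in-memory registry shows the lookup the pipeline performs. The URI layout (vendor/name/format/version) follows Snowplow's Iglu convention for self-describing schemas; the vendor, schema name and registry contents here are invented for illustration. Because every version stays in the registry, an event always resolves to the exact schema revision it declares.

```python
# Invented registry contents; keys follow the Iglu URI convention.
REGISTRY = {
    "iglu:com.acme/button_click/jsonschema/1-0-0": {"required": ["button_id"]},
    "iglu:com.acme/button_click/jsonschema/1-0-1": {"required": ["button_id", "page"]},
}

def resolve(schema_uri: str) -> dict:
    """Fetch the schema for the exact version an event declares."""
    schema = REGISTRY.get(schema_uri)
    if schema is None:
        raise KeyError(f"schema not found: {schema_uri}")
    return schema

# An event declaring version 1-0-1 is validated against 1-0-1, even after
# newer revisions are published.
schema = resolve("iglu:com.acme/button_click/jsonschema/1-0-1")
```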
Data enrichment
The richest event data
Snowplow provides the richest event data on the market - we deliver on this promise by widening Snowplow events with additional information. Snowplow supports a range of configurable enrichments, such as marketing campaign attribution, GeoIP lookup, and Spiders and Bots, as well as fully custom enrichments. A complete overview of our enrichments can be found in our documentation.
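"Widening" is the useful mental model: the enriched event keeps every original field and gains new ones derived from lookups. The sketch below mimics a GeoIP-style enrichment with a tiny in-memory table of IP prefixes (from the documentation ranges), standing in for a real GeoIP database.

```python
# Stand-in for a real GeoIP database: IP prefix -> country code.
GEO_TABLE = {"203.0.113.": "AU", "198.51.100.": "US"}

def enrich(event: dict) -> dict:
    """Return a widened copy of the event; never mutate the raw event."""
    enriched = dict(event)
    prefix = event["ip"].rsplit(".", 1)[0] + "."
    enriched["geo_country"] = GEO_TABLE.get(prefix, "unknown")
    return enriched

enriched = enrich({"event_name": "page_view", "ip": "203.0.113.9"})
```

A custom enrichment follows the same pattern: a pure function from event to widened event, so enrichments can be chained.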
Real-time applications
A real-time event stream
Snowplow provides a real-time stream of fully enriched event data with a consistent, predictable structure - plus a set of Analytics SDKs that make it easy for your engineers to consume that event data. This lets you build real-time applications such as content recommendations, fraud detection or real-time dashboards on top of your Snowplow data.
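The consistent, predictable structure is what makes consumers simple: every enriched event arrives with the same column order. This sketch parses a simplified three-column row (the real enriched event has far more columns, and the Analytics SDKs do this transformation for you) to show the shape of a stream consumer.

```python
# Simplified subset of the enriched event's fixed column order.
FIELDS = ["event_name", "user_id", "geo_country"]

def parse_enriched_row(row: str) -> dict:
    """Turn one tab-separated enriched row into a dictionary."""
    return dict(zip(FIELDS, row.split("\t")))

event = parse_enriched_row("page_view\tu42\tAU")
if event["event_name"] == "page_view":
    pass  # e.g. increment a real-time dashboard counter here
```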
Warehousing
Your data in your cloud
Your event data is stored in your preferred cloud data warehouse, in a format ready for your data team to work with. On AWS, choose from Redshift or Snowflake - or work directly with the data in Amazon S3. On GCP, we load your event data into BigQuery in near real time; the pipeline also backs up your data to Cloud Storage.
Analytics & intelligence
Join, model and act on your data
Snowplow delivers a data platform that meaningfully reflects the nature of your business operations, not a vendor's view of your industry. Snowplow data can easily be joined with other data sets, aggregated and modeled; you can plug the data straight into your favourite business intelligence tools. Our Implementation Engineers can help empower your analysts or data team to derive maximum insight from your Snowplow data.
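The join-then-aggregate step looks like this in miniature. In practice you would do it in SQL or a BI tool against your warehouse, but the shape is the same: event data keyed by user, joined to one of your own data sets, then aggregated. All data below is invented for illustration.

```python
from collections import Counter

# Snowplow-style event rows (invented).
events = [
    {"user_id": "u1", "event_name": "purchase"},
    {"user_id": "u2", "event_name": "purchase"},
    {"user_id": "u1", "event_name": "page_view"},
]

# One of your own data sets: user -> pricing plan (invented).
customers = {"u1": "enterprise", "u2": "self-serve"}

# Join each purchase event to its customer's plan, then count per plan.
purchases_by_plan = Counter(
    customers[e["user_id"]] for e in events if e["event_name"] == "purchase"
)
```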
Running natively inside your cloud account
Snowplow was designed to run natively on your public cloud - first AWS, and now GCP. We don't reinvent the wheel: Snowplow builds on top of the clouds' own best-in-class data services, like Kinesis and BigQuery.
Your security officer can sleep easy: your business-critical event data is never routed through Snowplow's or any other third party's servers.
Amazon Web Services
Snowplow runs natively on AWS, with a technical architecture that is linearly scalable. We build on top of the AWS services that data and systems engineers know and love - from Kinesis to S3 to Redshift.
We're fully integrated into your AWS account, so you can work directly with your event data using your preferred tools.
Google Cloud Platform
Snowplow runs natively on GCP, with a technical architecture that is linearly scalable. Snowplow builds on top of GCP's best-in-class data services - from Cloud Dataflow to BigQuery to Cloud Pub/Sub.
We're fully deployed into your Google Cloud account, so there's nothing standing between you and your event streams.