Data quality

Trust your data.

Snowplow provides you with extensive tooling to support accurate data collection and to manage and monitor data quality.

Formal validation

All collected data is validated against its associated schemas. You can configure schemas to be as strict as you like, making it easy to proactively identify data quality issues, such as a value being stored in the wrong field.
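To make the idea concrete, here is a minimal sketch of strict schema validation in Python. It is an illustration only, not Snowplow's implementation: the `SCHEMA` definition, its field names, and the `validate` helper are all hypothetical, and the checker handles only a small subset of JSON Schema (`required`, `type`, and `additionalProperties`).

```python
from __future__ import annotations

from typing import Any

# Hypothetical schema for illustration; not a real Snowplow schema.
SCHEMA = {
    "type": "object",
    "properties": {
        "product_id": {"type": "string"},
        "quantity": {"type": "integer"},
    },
    "required": ["product_id", "quantity"],
    "additionalProperties": False,
}

# Maps the schema type names used above to Python types.
TYPES = {"string": str, "integer": int}


def validate(event: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    properties = schema.get("properties", {})

    # Every required field must be present.
    for name in schema.get("required", []):
        if name not in event:
            errors.append(f"missing required field: {name}")

    for name, value in event.items():
        if name not in properties:
            # A strict schema rejects fields it does not declare.
            if schema.get("additionalProperties") is False:
                errors.append(f"unexpected field: {name}")
            continue
        expected = properties[name]["type"]
        # bool is a subclass of int in Python, so exclude it for integers.
        is_bool_as_int = expected == "integer" and isinstance(value, bool)
        if not isinstance(value, TYPES[expected]) or is_bool_as_int:
            errors.append(
                f"{name}: expected {expected}, got {type(value).__name__}"
            )
    return errors
```

With this sketch, an event whose values were swapped between fields, such as `{"product_id": 42, "quantity": "sku-1"}`, yields one error per field instead of being stored silently in the wrong place.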

Data pipeline

No events sent to the Snowplow pipeline are silently “dropped”. If there is an issue processing the data, we surface it with the associated error messages, so you can easily monitor and proactively identify data quality issues as they emerge, rather than after the fact.

Fully auditable

You have direct access to the data at every stage in the Snowplow data pipeline, enabling you to audit the data quality at each stage and validate that no data has been lost or incorrectly transformed.

Recover and reprocess bad data

You can recover and reprocess bad data, so tracking issues do not have to result in gaps in your data collection.

Automated testing for your tracking

Extend your test suites to include your Snowplow tracking, so you can release new versions of your website, apps or server-side applications knowing the changes won’t break your tracking setup.

From the Snowplow blog

We need to talk about bad data

No one in digital analytics talks about bad data. A lot about working with data is sexy, but managing bad data, i.e. working to improve data quality, is not. Not only is talking about bad data not sexy, it is really awkward, because it forces us to confront a hard truth.

Debugging bad data in GCP with BigQuery

One of the key features of the Snowplow pipeline is that it’s architected to ensure data quality up front - rather than spending a lot of time cleaning and making sense of the data before using it, schemas are defined up front and used to validate data as it comes through the pipeline.

Introducing Snowplow Micro

With Snowplow Micro you can validate your tracker setup as part of your automated test suite. This unique feature lets you simulate particular situations, to ensure the data you receive has the format and values you expect. This way, you can release new versions of your website, apps or server-side applications knowing you won’t break your tracking.
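As a rough sketch of what such a test could look like, the Python below reads the event counts from a locally running Snowplow Micro instance and turns them into a list of problems. It assumes Micro is reachable at `http://localhost:9090` and exposes a `/micro/all` summary endpoint returning `total`, `good` and `bad` counts; check both against the documentation for your Micro version. The helper names `fetch_counts` and `tracking_problems` are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_counts(base_url: str = "http://localhost:9090") -> dict:
    """Fetch the event summary from Micro (assumed /micro/all endpoint)."""
    with urlopen(f"{base_url}/micro/all") as response:
        return json.load(response)


def tracking_problems(counts: dict, expected_good: int) -> list:
    """Compare Micro's counts with what the test scenario should have sent."""
    problems = []
    bad = counts.get("bad", 0)
    good = counts.get("good", 0)
    if bad > 0:
        problems.append(f"{bad} event(s) failed validation")
    if good != expected_good:
        problems.append(f"expected {expected_good} good event(s), got {good}")
    return problems
```

In a test suite, you would drive the site or app through a scenario, then assert that `tracking_problems(fetch_counts(), expected_good=...)` returns an empty list.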