Monitor and manage data quality in real time

Many customers and community members come to Snowplow specifically because they want accurate and complete data collection.

One of our core features is being a lossless data pipeline: all data can be accounted for, even if it doesn’t make it all the way through to a data storage environment. More specifically, events coming into a Snowplow pipeline are validated so that only those matching the predefined expectations make it into the data warehouse. Any event that hits a processing issue on its way through the pipeline is likewise set aside, so that it cannot corrupt the data set in the downstream storage target.
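To make the idea concrete, here is a minimal sketch of validate-and-separate routing. It is an illustration only, not Snowplow's implementation: the schema format, the `FailedEvent` record and the `validate` and `route` functions are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical, simplified schema: field name -> (expected type, required?).
# Real schemas are richer; this is only enough to show the routing idea.
PAGE_VIEW_SCHEMA = {
    "page_url": (str, True),
    "user_id": (str, True),
    "duration_ms": (int, False),
}


@dataclass
class FailedEvent:
    """A rejected event, kept in full together with the reasons it failed."""
    payload: dict[str, Any]
    errors: list[str]


def validate(event: dict[str, Any], schema: dict) -> list[str]:
    """Return a list of validation errors (empty if the event is valid)."""
    errors = []
    for name, (expected_type, required) in schema.items():
        if name not in event:
            if required:
                errors.append(f"missing required field: {name}")
        elif not isinstance(event[name], expected_type):
            errors.append(
                f"wrong type for {name}: expected {expected_type.__name__}"
            )
    for name in event:
        if name not in schema:
            errors.append(f"unexpected field: {name}")
    return errors


def route(events: list[dict[str, Any]], schema: dict):
    """Split events into (good, failed). No event is ever dropped."""
    good, failed = [], []
    for event in events:
        errors = validate(event, schema)
        if errors:
            failed.append(FailedEvent(event, errors))
        else:
            good.append(event)
    return good, failed
```

The property that matters is that `len(good) + len(failed)` always equals the number of incoming events: invalid events stay out of the warehouse-bound stream, but their original payload is kept alongside the reasons for rejection, so nothing is silently lost.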

Yet even with these features, ensuring complete and accurate web and mobile data is hard. Data quality issues often emerge when:

For many data teams, the resulting data quality issues are only identified once a graph or chart computed on the data shows something unusual. The source of the issue then takes time and effort to diagnose. When it is finally identified, the fix usually applies only to newly incoming data, so the business is left with inaccurate or incomplete data for the entire period the issue went undetected.

Diagnosing and fixing these issues is nonetheless critical. As businesses use web and mobile data to do more, such as powering real-time applications, improving the user experience and informing critical product decisions, undetected data quality issues can lead to bad decisions, and unresolved ones can erode confidence in the data. Once a business loses trust in a data set, it is very hard to win it back.

Data Quality UI/API and notifications

At Snowplow, we have launched an improved toolset to make it easier for users to proactively monitor data quality and surface errors as soon as they happen. We actively validate every event processed by Snowplow against the associated schema definitions for an event or associated entities, and customers can: 

Benefits

This new functionality enables Snowplow Insights customers to:

As a result, users can expect an increase in the overall quality of their data, allowing data teams to build higher levels of assurance in data accuracy and completeness, and enabling the broader business to use web and mobile data across more applications with greater confidence.

Under the hood

The new UI is powered by a complete refactoring of our core pipeline technology. As a result, any data processing issue now produces a highly structured error, which lets us easily distinguish failures of “real data” from noise, such as requests generated by bots and spiders on the web or other requests hitting the Snowplow collector that do not represent real data.
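As a rough sketch of why structured errors help, the example below tallies failure records by type and separates likely noise from failures of real data. Everything in it is an assumption made for illustration: the `failure_type` and `user_agent` fields, the `unrecognised_request` type and the bot markers are hypothetical and do not describe Snowplow's actual error format.

```python
from collections import Counter

# Hypothetical substrings that suggest automated traffic.
BOT_MARKERS = ("bot", "spider", "crawler")


def is_noise(failure: dict) -> bool:
    """Heuristic: treat unrecognised requests and bot traffic as noise."""
    user_agent = failure.get("user_agent", "").lower()
    return (
        failure["failure_type"] == "unrecognised_request"
        or any(marker in user_agent for marker in BOT_MARKERS)
    )


def summarise(failures: list[dict]) -> Counter:
    """Count failures per (bucket, failure_type) pair."""
    counts = Counter()
    for failure in failures:
        bucket = "noise" if is_noise(failure) else "real_data"
        counts[(bucket, failure["failure_type"])] += 1
    return counts
```

Because each failure carries machine-readable fields rather than a free-text message, a summary like this can be computed without parsing logs, and the "real_data" bucket is the one a data team would want to be alerted on.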

Get started

Not a Snowplow Insights customer yet? Get in touch with us here to learn more.

How does data quality impact your product and organization? We want to hear your story and feature it in our next blog post! Reach out to lyuba@snowplowanalytics.com if you’d like to share your experience.
