Why data collection is key to your data strategy – Part two


How to do data collection right

The second in a two-part blog series on data collection, and how, if done properly, it forms the foundation of your data strategy.

Doing data collection right puts data front and center, as it should be, given its strategic importance and high value. The way you secure data, share it, protect its value and, fundamentally, collect it demonstrates your respect for the trust users have shown in sharing their data in the first place. At a time when consumer trust in organizations is eroding, suspicion about data collection is growing, and regulations are beginning to restrict what data analytics vendors can do with data, preserving trust and handling data respectfully have never been more important.

We know that data collection is not easy, and doing it right is even more challenging. Heads of data and their data teams frequently tell us that they struggle with the governance of data collection. How should they govern all the activity around it? How can they ensure consistency and better standardization? How should they define the nitty-gritty of how they collect data? Who is responsible for, or should own, each aspect of data collection? These are tough questions, and the shape of this process is evolving to meet changing needs. Part of the data collection equation is cultural: it depends on who works with data, how they work with it, and what they want to accomplish. But without two key things, the core that is data collection and the confidence you have in that data, the journey to an effective data strategy is an exercise in futility.

Confidence in data begins with how you collect it. We discussed the strategic business imperatives of data collection in the first blog post in this series. In this second post, we will dig into the hands-on work of collecting data properly.

Best practices in data collection: the Snowplow way

Right out of the gate, let’s state that we’re broadly defining best practices in data collection as the Snowplow approach to data collection. There are other ways to collect data, but we believe that ownership and control of your data infrastructure are cornerstones of successful, long-term data strategies and effective data governance. Organizations with long-term interests in advanced data analytics use cases require not just the sophistication of a future-proof and extensible data stack, but also the ability to make their own decisions about data management.

Some proprietary analytics solutions are little more than black boxes that offer little or no transparency over how your data is handled. That matters whether you are deciding how and where to store your data, trying to avoid vendor lock-in, or determining how to structure incoming data and your own data tracking attributes and policies. To have complete control, you need your own infrastructure and data pipeline, and the ability to collect data your own way, so that you can perform the kinds of analysis you want to do.

How to collect reliable, understandable and easy-to-use data

In our previous post, we highlighted data ownership, data quality and reliability, data understandability, and data ease of use as the pillars of proper data collection. In this post we will go into greater depth about a few of these points as they pertain to techniques and processes for collecting high-quality data: reliability, understandability and ease of use.

Collect for data reliability

No data set will ever be 100 percent reliable. Still, confidence in the reliability of data comes in part from the quality of the underlying event data. To get as close as possible to having complete and accurate data, your data collection can be set up to ensure:

Collect for data understandability

At its most basic, data reflects what has happened in the world. We collect qualitative and quantitative data to understand why and to what degree things are the way they are. Making sense of data therefore relies on its lining up with human logic and with our mental representations of events. Easy-to-understand data:

Collect for data ease-of-use

Is your data ready to go, plug-and-play, and analytics-ready? Does your data help your data team avoid time-consuming, onerous data preparation and cleaning work? In a nutshell, is your data easy to use? Our take on what makes data easy to use is that it is:

March to the beat of your own data: Build precision data collection into your data strategy

What we’ve outlined here is a great starting point, but getting data collection right is an ongoing process, both technically and culturally. As business objectives change, so too do the demands made on your data. Will your data collection approach and tools evolve with and scale for these new demands, questions and use cases? Will they enable greater precision in data collection to match up with more targeted analytics goals? We believe that by embracing data collection as a strategic activity, organizations stand to forge ahead with meeting the challenges posed by new data sources, types of data, and custom data modeling needs while driving greater value from the data they collect.

Not every company needs this level of control and detail right now, but many do, especially as enterprise data use cases become more sophisticated. Many organizations run into durability problems: the solution that meets their needs now may have limited applicability or extensibility later, either because they outgrow their previous data requirements and lack the flexibility to expand, or because new regulatory restrictions make it much more difficult to derive reliable insights using third-party tools. Taking ownership of and directing the collection of your own data helps ensure that you don’t end up at a data dead-end.

Snowplow is designed to help companies achieve all of the above on their own terms. Snowplow’s flexibility lets you march to the beat of your own data, drumming precision into your current data strategy while keeping the pace for inevitable shifts, pivots and growth in your business.

