Making the case for the centralized data platform
Building trust in data comes down to making the same data available to everyone within an organization to create a single source of truth. If everybody isn’t working from the same blueprint, how can you build trust in your data? Trust in data is an oft-cited problem at the enterprise level; a 2018 KPMG study found that only 35% of executives have a high level of trust in their organization’s data, and 25% actively distrust their data. The rapidly changing face of technology and a plethora of competing, and sometimes confusing, solutions add to this distrust.
Getting to a place of trust is cultural, organizational and technical. It requires breaking down silos and elevating the importance of data. It also requires deciding how data will be managed, leading you to ask a series of questions about governance and tools. Some of these questions include: how will you approach data collection and storage? What practices will help govern data quality, accessibility and usability? What kind of data team structure will you put in place; who will be responsible for what? How will you socialize data, making sure that different teams can access, analyze and visualize the data they need?
In this post, we will discuss why and how building a centralized data platform answers many of these questions and helps you:
- Understand what centralized data infrastructure is
- Centralize your thinking: consider how you collect and where you put your data, and understand why this matters
- Break down silos, and understand why doing so matters
- Develop a data-oriented culture
- Build trust in your data
- Get closer to achieving your full data potential
What do we mean by centralized data platform?
Before we get started, what do we mean by “centralized data platform”? A centralized data platform is more a concept than a product. It’s not to be confused, for example, with something like a customer data platform (CDP), which you may also use but which is not what this article is about. When we discuss centralizing data, we refer to a centralized data infrastructure for collecting, processing and storing data. Putting together the right data stack to unify multiple sources of data and establish a shared methodology is a common challenge for many of our customers as they work to make data accessible to and usable by everyone across their company. Whether you are evaluating tools and deciding how to pull this data together from a technical standpoint, or trying to break down siloed organizational structures, centralizing your data is the first step toward creating the sought-after single source of truth.
Centralizing the data not only ensures cross-functional access but also elevates the data to the level of value it actually represents. If data is truly an enterprise’s most valuable asset, its collection, storage and accessibility should be treated as key considerations.
Centralized platform thinking
Centralizing data aids in developing a cohesive data strategy: you know where all the data is, you have clear oversight of compliance and privacy, and you gain insight into whether and how the data is being used. You can also see how you might make the data consumption experience better and more accessible. More broadly, you have a clear picture of how, from where and why that data has been collected. You facilitate this through the mechanics of collecting your data and deciding where it will live. A centralized data platform simplifies the path to a single source of truth, and fundamentally gives you a way to empower people to put the power of data to use.
Usually your centralized data infrastructure will be a combination of solutions that work seamlessly together to collect, process and store your data. Your approach to this will depend on your needs, but it’s fair to say that the terrain is complex enough that simplifying the data ecosystem should be a priority.
“Centralized platform thinking” takes into account a number of factors that highlight the importance of owning your data, including:
- the exponentially expanding number of data sources and channels from which you want to collect data, such as web, mobile/devices, on and offline, phone, CRM, ad data, etc.
- breaking down silos to move from each team having its own tool for data collection and visualization to a centralized single source of truth that enables joining these many data sources together
- the requirement from regulatory, compliance and privacy perspectives to have clear justifications for and oversight into the reasons for and methods of data collection and storage
- the evolving analytics landscape and more advanced, cross-functional use cases, which create the need to democratize access to data; this requires flexibility, scalability and the freedom to pick and choose the technologies and tools that fit your data needs at any given time
With a centralized data proposition, you control exactly what is ingested, what is stored, how and where, and retain the flexibility and freedom to make changes to the solutions that make up your platform when and as needed.
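As a rough illustration of what this can mean mechanically, the sketch below maps events from two hypothetical sources (a web tracker and a CRM export) onto one shared shape and loads them into a single store, with an in-memory SQLite database standing in for a data warehouse. The source names, field names and table layout are all assumptions made for this example, not a description of any particular product or pipeline.

```python
import sqlite3

# Hypothetical raw records from two separate sources; the field names
# are assumptions chosen only to show that sources rarely agree.
WEB_EVENTS = [
    {"uid": "u1", "ts": "2020-01-01T10:00:00Z", "action": "page_view"},
    {"uid": "u2", "ts": "2020-01-01T10:05:00Z", "action": "add_to_cart"},
]
CRM_EVENTS = [
    {"customer_id": "u1", "occurred_at": "2020-01-02T09:00:00Z", "type": "support_call"},
]


def normalize_web(record):
    """Map a web-tracker record onto the shared event shape."""
    return ("web", record["uid"], record["ts"], record["action"])


def normalize_crm(record):
    """Map a CRM record onto the shared event shape."""
    return ("crm", record["customer_id"], record["occurred_at"], record["type"])


def load_central_store(conn, web_events, crm_events):
    """Write every source into one table so it can be queried together."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "source TEXT, user_id TEXT, occurred_at TEXT, event_name TEXT)"
    )
    rows = [normalize_web(r) for r in web_events]
    rows += [normalize_crm(r) for r in crm_events]
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def user_journey(conn, user_id):
    """Return one user's events across all sources, in time order."""
    cursor = conn.execute(
        "SELECT source, event_name FROM events "
        "WHERE user_id = ? ORDER BY occurred_at",
        (user_id,),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
load_central_store(conn, WEB_EVENTS, CRM_EVENTS)
print(user_journey(conn, "u1"))
```

Because both sources land in the same table with the same columns, a question such as "what did this user do, and in what order?" becomes a single query rather than a reconciliation exercise across tools.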
Real-world centralized data platform
Some businesses, such as many two-sided marketplaces, have many moving parts and different data systems running in isolated pockets, which makes it difficult to derive a clear, full picture of what happened, when and to whom.
One example we’ve seen is a two-sided marketplace running a consumer website and web apps for retailers. Tracking across consumer and retailer tools was inconsistent and siloed, employing different analytics solutions based on what different teams thought was best for their use cases.
When this customer decided to evolve from their homegrown data pipeline to a centralized data platform, they needed to examine and make decisions about their data infrastructure. They were driven by a fundamental desire to centralize in order to scale and democratize data, build efficiencies and furnish the company with a “system of record” rather than fragmented, inconsistent data. With that in mind, they identified key must-haves:
- Flexibility to grow and evolve, e.g. customizable tracking and collecting all event-level data across all data sources
- Scalability of technology to serve growing business needs, use cases and changing demands, e.g. moving from on-premise to cloud infrastructure
- Freedom, transparency and lack of lock-in to choose and integrate selected pipeline, warehousing, and other solutions. For example, choosing to own and run the data pipeline and warehousing on one cloud platform (e.g., GCP and BigQuery), while using another cloud provider for other purposes
- Data and infrastructure ownership
Once they successfully developed their centralized data platform, they were able to achieve overarching data goals, such as:
- Creating a single source of truth/system of record for data
- Making data available as a company-wide, self-serve asset through BI tools, such as Looker
- Empowering internal data consumers to make data-informed decisions
- Building a path to greater data literacy across the company
Why break down silos?
Simplifying and centralizing data is one big step toward breaking down silos that make data inaccessible and solving the “data all over the place” problem. Frequently cited challenges associated with silos are both technical and organizational. They include difficulties managing data and ensuring data consistency, difficulties gaining access to or socializing data, and difficulties getting past organizational structures that don’t encourage cooperation. Once you can unpartition data by harmonizing its collection and processing, and then distribute and democratize that newly unified data, you can start to realize the unlimited possibilities of your centralized approach.
Unlimited possibilities of centralized data
The move to centralized data isn’t only about avoiding inefficient, opaque data silos and compliance and privacy problems. First and foremost, it helps accomplish what most companies expect of their data: to tell a compelling story, to justify a business decision or strategy, to attribute marketing spend, or to understand the full user/customer journey from a single source of truth. Centralized data means that the whole organization works from the same blueprint, avoiding discrepancies that easily arise from disparate data and different tools. For example, when marketing and product teams report on the same metrics, they do so with the same data.
But it’s also about the future, and being able to harness the right data for more advanced use cases. When data is siloed, limited, or untrustworthy, it’s impossible to expand the scope of applications that can be developed. Possibly more importantly, when the data infrastructure itself inhibits innovation and growth, there is no scaling beyond current use cases. As most businesses are moving in this data-driven direction, where data - sometimes real-time data - will inform development, a centralized data flow will unchain vast possibilities, paving the way forward.
“In every Tourlane team there is a need to understand the customer journey. To get there, we need to understand the whole data flow and have it centralized. We needed a high-quality, reliable single source of truth that every team at Tourlane could come to to get and build what they need.” - Kevin James Parks, Data Engineer, Tourlane
Most organizations citing data as key to their growth agree that centralizing data collection creates the path to a single source of truth that cross-organizational teams can rely on to enable cross-functional, diverse use cases and understand their user journeys more thoroughly.
Centralizing data to unify your culture
Many data challenges are related to organizational, possibly historical, and/or cultural issues. They are often human problems. Organizational and cultural change is the linchpin for breaking down many human-oriented barriers to data use. Removing them lets you:
- enable cross-organizational access to consistent, single source of truth data
- empower a self-serve data culture
- ensure data trust, quality and value while relieving burdens to data teams
- develop a data-oriented culture while improving user experience
A siloed approach to data, its use and consistency, and the tools teams use to work with data ties everyone’s hands. The fundamental action of centralizing data helps to socialize and democratize data, expanding a company-wide data mindset and building trust in data across the company and its culture. This ultimately enables a self-serve culture that gives teams across an organization access to data from the same origin, and helps them use it by making different tools available, such as Looker for the marketing team and Indicative for the product team.
“With Snowplow we are focused on extracting and centralizing data from everywhere, ensuring data quality to be able to stitch everything we need together to get a complete picture. That has required developing a tracking and data mindset in the company from scratch.” - Kevin James Parks, Data Engineer, Tourlane
Trust: Data quality is the beating heart of centralizing data
At the heart of the whole data endeavor is trust. It doesn’t really matter if you centralize your data infrastructure and democratize access to data company-wide if you don’t have confidence and trust in the data feeding your data strategies and data-powered activities. Happily, you can also adopt a data quality focus as a part of your move to a centralized data game plan.
We talk a lot about data quality and how to achieve it. Why? Time and again we see that the quality of the data you collect and put into action influences what comes out. Incomplete, inaccurate and sloppy data not only makes the work of data analysts more difficult but also undermines analyses, leads to misleading, incorrect and possibly expensive conclusions and decisions, and erodes value and, ultimately, trust in data teams and the data itself. How you collect your data, and how much priority you give this function, is at the heart of achieving data quality; centralizing that consistent, high-quality data and making it transparent and accessible across departments helps foster the trust required.
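One common way to act on this at the collection stage is to check each incoming event against an agreed definition and set aside anything that does not conform, rather than letting it flow silently into analyses. The sketch below is a minimal illustration under stated assumptions: the schema, field names and helper functions are hypothetical and written for this example, not taken from any specific tool.

```python
# Hypothetical schema: required field name -> expected Python type.
PAGE_VIEW_SCHEMA = {"user_id": str, "page_url": str, "duration_ms": int}


def validate(event, schema):
    """Return a list of problems; an empty list means the event is valid."""
    problems = []
    for field, expected_type in schema.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        elif not isinstance(event[field], expected_type):
            problems.append(f"wrong type for field: {field}")
    return problems


def split_by_quality(events, schema):
    """Separate valid events from invalid ones, keeping the reasons."""
    good, bad = [], []
    for event in events:
        problems = validate(event, schema)
        if problems:
            # Keep the failed event and why it failed, so it can be
            # inspected and fixed instead of being silently dropped.
            bad.append({"event": event, "problems": problems})
        else:
            good.append(event)
    return good, bad


incoming = [
    {"user_id": "u1", "page_url": "/home", "duration_ms": 1200},
    {"user_id": "u2", "page_url": "/pricing"},
    {"user_id": "u3", "page_url": "/", "duration_ms": "fast"},
]
good, bad = split_by_quality(incoming, PAGE_VIEW_SCHEMA)
print(len(good), "valid;", len(bad), "set aside")
for item in bad:
    print(item["problems"])
```

Keeping the rejected events together with the reasons they failed gives data teams something concrete to investigate, which is part of what makes the resulting data easier to trust.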
Analyst firm Gartner has also claimed that poor data quality leads companies to direct, significant financial loss: an annual average of up to 15 million USD. Less calculable losses, such as eroded trust in data teams, in the work they perform and in the tools the company has invested in, deeply affect the overall level of trust in data. These losses also coincide with many businesses putting resources and focus into advanced, data-intensive use cases, which is exactly the wrong time for trust in data to plummet.
“The Snowplow initiative at PEBMED gave us the greatest value in that we could finally track all of our products and events in one place, consistently, and centralize this in a Snowplow-enabled single source of truth, saving effort on the data collection front and ensuring quality on the analytics end.” - Pedro Gemal Lanzieri, CTO, PEBMED
Achieve your full data potential
We believe that a centralized data platform is fundamental to accessing a single source of truth that will allow you to better understand your users’ journeys and behaviors, to tap into more advanced and compelling analytics use cases and to secure your continued competitive advantage.
According to the Snowplow philosophy on data collection, taking a modern and future-proof approach to centralized data should:
- ease the collection and unification/centralization of your data
- allow you to own and control your data, which will form the foundation of a flexible data layer that delivers consistent, clean data across systems, tools and teams
- enable flexibility and extensibility so your data stack can evolve with your data and business strategies
- give you more robust control over data governance and compliance issues, in terms of understanding how and why your data is collected, and how it is handled and stored
Taking advantage of the benefits of centralized data, you can achieve the full potential of your data, gain efficiency from streamlined, consistent data, and move squarely into experimental and future-oriented applications, like AI and machine learning.