The modern data stack in 2021


The last few years have seen an explosion in the number of data tools an organization can use to drive better decision making, most of them built around data stored and queried in cloud data warehouses. The cloud data warehouse has given organizations the ability to store and query vast datasets quickly and cost-effectively. For the first time, organizations of all sizes can build a single, high-value data asset and use it to drive value across their business.

A scalable framework

Driven by demand from organizations trying to get as much value as possible out of the data in their warehouse, numerous data tools have emerged. Each tool has become highly specialized in its portion of the data lifecycle, and most can be swapped for one of many alternatives. For example, there are dozens of BI tools to visualize the data in the warehouse, each excellent at democratizing data within the organization.

These highly specialized tools come together to form the modern data stack, a scalable, low-barrier-to-entry group of technologies that startups and enterprises alike can adopt to drive immense value from their data. The stack is made up of a few key categories:

Given the number of categories and tools in this ecosystem, the data landscape has become extremely exciting but also increasingly complicated and confusing to map, making it hard for organizations to build and, more importantly, evolve their data platforms. To help organizations with this, we have put together our version of the modern data stack, which you can find below.

How did we get here: The rise of the cloud data warehouse

In September 2020, Snowflake had the biggest software IPO of all time. Yet at the start of the millennium, the idea that every company could have a single source of truth, with every customer interaction and company record accessible by the entire business, would have seemed far-fetched. Three developments paved the way for the cloud data warehouse to become the de facto source of truth for organisations:

  1. Decades ago, only large companies could analyse large datasets, because doing so meant scaling compute resources vertically, which required a lot of upfront expense. Hadoop kick-started the big data revolution in 2006 by making it easy for organisations to remove hard processing limits and scale compute horizontally rather than vertically.
  2. Most organizations were still limited by the need to invest upfront in compute resources. AWS ushered in the public cloud era, removing the need for companies to build and maintain capital-intensive data centres. AWS, GCP and Azure made it possible for organisations of all sizes to pay for as much storage and compute as they needed on a metered basis.
  3. The modern cloud data warehouse revolution began with the launch and widespread adoption of Redshift in 2012. 

With Redshift, it suddenly became possible to cost-effectively store huge relational datasets and run parallelised queries in SQL, all without owning any of the computers needed to do this. Data teams could write SQL models, and analysts could plug in their favourite BI tools, like Tableau, to build their dashboards faster and with a richer dataset. Gone were the days of hard processing limits, simple queries that took half a day, and large capex just to ask questions of your own rich datasets.

This meant that, for the first time, data collection and visualisation could be decoupled. Because storage was so cost-effective and scalable, organisations could store the data first without worrying too much about the exact structure it was in, then transform it for use by the business. They could therefore create a source of truth upstream of all the business systems that need data, like Tableau.

Today, most organisations build their data platforms following this ELT (extract, load, transform) approach: they combine all their source data into a centralised data asset in their cloud data warehouse or lake. That asset acts as the source of truth for all their business systems and appreciates in value over time. Using the specialised tooling in the modern data stack is the key to building, managing and evolving these data platforms.
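
To make the "store first, transform later" idea concrete, here is a minimal sketch in Python. It is an illustration only: it uses an in-memory SQLite database as a stand-in for a cloud data warehouse, and the table names (raw_events, revenue_by_user), the columns and the sample events are all hypothetical. Raw data is landed as untyped text, and the modelling happens afterwards in SQL, inside the database.

```python
import sqlite3

# Hypothetical raw events, as they might arrive from a source system.
# Everything is kept as text: no modelling decisions are made at load time.
RAW_EVENTS = [
    ("u1", "page_view", "2021-03-01", "0"),
    ("u1", "purchase", "2021-03-01", "1999"),
    ("u2", "page_view", "2021-03-02", "0"),
    ("u2", "purchase", "2021-03-02", "501"),
    ("u2", "purchase", "2021-03-03", "1000"),
]


def load_raw(conn, events):
    """Extract + Load: land the data as-is in a loosely typed raw table."""
    conn.execute(
        "CREATE TABLE raw_events "
        "(user_id TEXT, event TEXT, day TEXT, amount_cents TEXT)"
    )
    conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?, ?)", events)


def transform(conn):
    """Transform: build a modelled table from the raw one, using SQL."""
    conn.execute(
        """
        CREATE TABLE revenue_by_user AS
        SELECT user_id,
               COUNT(*) AS purchases,
               SUM(CAST(amount_cents AS INTEGER)) AS revenue_cents
        FROM raw_events
        WHERE event = 'purchase'
        GROUP BY user_id
        """
    )


def revenue_by_user(conn):
    """Read the modelled table, as a BI tool or business system might."""
    return conn.execute(
        "SELECT user_id, purchases, revenue_cents "
        "FROM revenue_by_user ORDER BY user_id"
    ).fetchall()


# SQLite stands in for the warehouse here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
load_raw(conn, RAW_EVENTS)
transform(conn)
rows = revenue_by_user(conn)
```

Because the raw table is kept, the transformation can be rewritten and rerun later without collecting the data again, which is what allows collection and consumption to evolve independently.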


