Snowplow Postgres Loader 0.1.0 released

For several years we have been getting requests from the OSS community to add PostgreSQL support to Snowplow. Today we’re thrilled to announce the initial release of Snowplow Postgres Loader.

In this post:

  1. Why PostgreSQL
  2. Snowplow Postgres Loader
  3. Next steps
  4. Setup
  5. Getting help

1. Why PostgreSQL

PostgreSQL is one of the top 5 database engines as of mid-2020. It is almost the de facto standard among OLTP relational databases, with an enormous number of extensions and tools that can turn it into anything from a time-series database to a monstrous multi-node cluster for analytical workloads. Vanilla PostgreSQL, though, was never meant to be used for OLAP, and it can still be hard to tune the performance of analytical queries on multi-GB scans. However, not everyone has datasets with hundreds of gigabytes, nor is everyone interested in analytical queries.

In fact, most of the requests we’ve seen came from users who would like to try out Snowplow and don’t want to pay for Redshift or Snowflake or set them up. Snowplow is easy to run on-premise, is supported by almost every known cloud provider, and has an incredible ecosystem and community.

These are good reasons to extend the pool of supported databases with such a great candidate!

2. Snowplow Postgres Loader

RDB Loader (and its predecessor, StorageLoader) has had PostgreSQL support since inception, but this support was extremely limited. Several major examples of this limited functionality:

We decided that PostgreSQL has inherently different requirements from those of RDB Loader, which is currently (watch this space) oriented towards batch loading, and therefore deserves a dedicated Snowplow Postgres Loader repo.

We’re planning to remove PostgreSQL support from RDB Loader in one of the next releases, while Postgres Loader will leverage all the benefits of its only storage target.

With this initial release, Postgres Loader is already capable of:

3. Next steps

Postgres Loader was born as a hackathon project, in response to a very frequent OSS community request. Despite being built on solid foundations such as FS2 and KCL, it is not meant to be used in pipelines with scalability requirements, and we have never tested it in real-world scenarios. However, we do seek your feedback! We believe Postgres Loader has great potential in many areas: demos, QA, low-volume pipelines and eventually mid-to-high-volume pipelines.

Currently, Postgres Loader is missing the following features, which we would like to implement in upcoming releases:

If you spot other opportunities, please do share your feedback on Discourse and GitHub! It will help us to prioritize the work.

4. Setup

Setup is described on our docs website.

5. Getting help

If you have any questions or run into any problems, please visit our Discourse forum. If you have spotted a bug or have a feature request, please file an issue on GitHub.
