Data Engineer

Remote or London, UK or Berlin, Germany

A data engineer at Snowplow Analytics Ltd works across our product and infrastructure engineering efforts.

Product engineering

Over the past four years Snowplow has grown into the industry-leading open-source event data pipeline (main repository), consisting of a dizzying array of user-facing products, SDKs and software libraries.

All of these 30+ projects are products in some sense, but Snowplow is not a packaged SaaS product – instead, our various user constituencies (data analysts, developers, devops) interact with the platform via SQL, software SDKs and public APIs; being open source, the Snowplow codebase is itself an important user-facing aspect of the product.

Current and planned projects in product engineering include:

  • Migrating the Snowplow batch pipeline from Hadoop to Apache Spark (see the RFC)
  • New SaaS integrations for Sauna, our decisioning and response platform
  • Adding new event sources to Snowplow, including SaaS webhooks and database change-data-capture
  • Porting Snowplow to new platforms such as Apache Kafka and Google Cloud Platform
  • Adding schema inference support to Snowplow and Iglu, our schema registry system
  • Building tooling and user interfaces for event data modeling in SQL and Apache Spark

Infrastructure engineering

Infrastructure engineering is focused on helping Snowplow Analytics Ltd scale to managing 100, then 1,000, then 10,000 AWS accounts as part of the Snowplow Managed Service.

To deliver the Snowplow Managed Service we have built a proprietary deployment, orchestration and monitoring stack, using pragmatic technologies including Ansible, CloudFormation, bash, Golang, cron and PagerDuty. We are also developing open source infrastructure tooling, such as DAG runners, Hadoop jobflow runners and similar.

We are constantly iterating on and evolving our infrastructure stack. Current and planned projects include:

  • Porting our real-time pipeline orchestration engine to Kubernetes, then open sourcing it
  • Replacing our in-house secrets manager with HashiCorp Vault
  • Adding a UI to Factotum, our open source DAG runner
  • Implementing Mesos and evaluating options for running scheduled job DAGs on Mesos (replacing our in-house distributed cron system)
  • Building a framework for automatic upgrades of customers’ Snowplow pipelines
  • Evaluating Nix as a replacement for much of our Ansible automation

Responsibilities

Responsibilities include:

  • Working closely with the Snowplow co-founders, gaining deep familiarity with our 30+ open source projects, and making contributions back to those projects to make them easier to operate at scale
  • Designing and developing our in-house Managed Service stack, using pragmatic technologies including Ansible, CloudFormation, bash, Golang, cron, PagerDuty, Scala, Java, Mesos, Akka and Kubernetes
  • Designing and developing our open source infrastructure tooling, such as DAG runners, Hadoop jobflow runners and similar
  • Working closely with Support Engineering, including spending time regularly on the support rotation, to understand their requirements and build tooling to automate their ongoing work
  • Originating and specifying all-new open source projects on both the product and infrastructure engineering sides
  • Following best practices in customer/user support, product documentation, testing and QA, and software delivery techniques

What we’re looking for

We’d love to get to know you if:

  • You have strong technical skills. This role would be a great fit for a software engineer who loves infrastructure automation, or who wants more exposure to data engineering and functional programming
  • You communicate with clarity and precision. It’s super-important that our data engineers do not become bottlenecks across Snowplow’s processes and systems. Communicating your work and being responsive to feedback are as important as your technical ability
  • You have a mature attitude to InfoSec, documentation and process. Managed Service customers trust us with their event pipelines and AWS accounts. This is a huge responsibility and informs everything we do

Interested? Send your CV to recruitment@snowplowanalytics.com.

We do not welcome calls from recruitment consultants.
