Snowplow Scala Analytics SDK 0.4.0 released

13 February 2019  •  Rostyslav Zatserkovnyi
We are excited to announce the 0.4.0 release of the Snowplow Scala Analytics SDK, a library that provides tools to process and analyze Snowplow enriched events in Apache Spark, AWS Lambda, Apache Flink, Scalding, and other JVM-compatible data processing frameworks. This release reworks the JSON Event Transformer to use a new type-safe API, and introduces several other internal changes. Read on below the fold for: Event API; Using the type-safe API; Additional changes; Upgrading; Getting...
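To give a flavour of the change, here is a minimal sketch of what parsing an enriched event with the new type-safe API might look like. The exact signatures and result types are covered in the full post, so treat the details below (the shape of Event.parse and snake_case fields mirroring the atomic columns) as assumptions for illustration.

```scala
import com.snowplowanalytics.snowplow.analytics.scalasdk.Event

// Minimal sketch, assuming the 0.4.0 API exposes Event.parse, which takes one
// tab-separated enriched event line and returns a validated result instead of
// the loosely typed JSON the old transformer produced.
def handleEnrichedLine(line: String): Unit =
  Event.parse(line).toEither match {
    case Right(event) =>
      // field names are assumed here to mirror the canonical atomic columns
      println(s"Parsed event ${event.event_id} from app ${event.app_id.getOrElse("unknown")}")
    case Left(errors) =>
      println(s"Not a valid enriched event: $errors")
  }
```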

Guest post: After looking at the data of 80 tech companies - what have I learned? Part I

13 February 2019  •  Segah A Mir
This is a guest post by Segah A. Mir, Partner and Consultant at Seattle-based Caura & Co. The past five years have given me a tremendous opportunity to see firsthand the data of over 80 VC-backed tech companies. That is close to 100 teams and 300 individuals. Naturally, I've gotten to see a lot of data: very detailed information on every transaction, activity, click, and interaction. What would be expected of me now is to go...

Resolving entities with graph databases using Neo4j

13 February 2019  •  Dilyan Damyanov
In the previous post in this series, we looked at how we can model the canonical Snowplow page_view events as a graph. We identified the various entities that make up the event and assigned each dimension of the event as a property on one of those entity nodes. We then used composable schemas to piece together a JSON schema for the event, composed of the individual schemas for each node and relationship. In the meantime,...
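By way of illustration only (this is not code from the post), a sketch of loading one such event into Neo4j via the official Java driver from Scala might look like the following; the bolt URL, credentials, node labels, relationship types, and property names are all assumptions made for the example.

```scala
// Illustrative sketch: writing a single page_view event into Neo4j as a small
// graph of a user node, an event node and a page node. Labels, relationship
// types, property names and connection details are assumptions, not the exact
// model from the post.
import org.neo4j.driver.v1.{AuthTokens, GraphDatabase, Values}

val driver = GraphDatabase.driver("bolt://localhost:7687", AuthTokens.basic("neo4j", "password"))
val session = driver.session()

try {
  session.run(
    """MERGE (u:User {domain_userid: $userId})
      |MERGE (p:Page {page_urlpath: $urlPath})
      |CREATE (e:PageView {event_id: $eventId, derived_tstamp: $tstamp})
      |CREATE (u)-[:PERFORMED]->(e)-[:VIEWED]->(p)""".stripMargin,
    Values.parameters(
      "userId", "user-123",
      "urlPath", "/blog/",
      "eventId", "evt-456",
      "tstamp", "2019-02-13T10:00:00Z"
    )
  )
} finally {
  session.close()
  driver.close()
}
```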

How server-side tracking fills holes in your data and improves your analytics

05 February 2019  •  Rebecca Lane
Client-side tracking: a brief history lesson. At Snowplow Analytics, we fundamentally believe that getting data collection right is one of the most important steps for deriving value from data. This is often an iterative process: the data you collect, and how you collect it, should evolve over time as your use cases and your analytics setup evolve and mature. While collecting data client-side is universal across our customer base, we want to...

How data ownership makes you a more effective data scientist

05 February 2019  •  Anthony Mandelli
Data scientists report spending 80% of their time cleaning and collecting data, leaving only 20% for actual analysis. As a data scientist, you spend time finding ways to query across multiple data sets, formatting data to work with different analytics tools, and applying any number of modifications to take data you've collected and turn it into data you can use. Contrast this with companies that own their data infrastructure end-to-end with solutions like...

Snowplow JavaScript Tracker 2.10.0 released with global contexts

23 January 2019  •  Mike Hadam
We are pleased to announce a new release of the Snowplow JavaScript Tracker. Version 2.10.0 introduces global contexts, a set of powerful tools for working with contexts. Contexts are one of the most important features in Snowplow: they enable companies running Snowplow to track rich, highly structured data that is easy to work with. In this post we'll quickly review what contexts are, before explaining why the newly released global contexts functionality is so powerful for...

A misconception about how retail personalization drives sales

23 January 2019  •  Anthony Mandelli
When retailers are looking for a way to drive sales, personalization can look like a quick win: buy a recommendation engine and put more products a customer is likely to buy in front of them, then watch the sales come in. Unfortunately, it’s not quite that easy. When you implement a personalization strategy, you need to do it in the context of the overall customer experience you’re trying to create. You need to be clear...

Monitoring Bad Rows on GCP Using BigQuery and Data Studio

23 January 2019  •  Colm O Griobhtha
One of the key features of the Snowplow pipeline is that it's architected to ensure data quality up front - rather than spending a lot of time cleaning and making sense of the data before using it, schemas are defined up front and used to validate all data types as they come through the pipeline. Another key feature of Snowplow is that it's highly loss-averse - when data fails validation, those events are preserved as bad...

Iglu R11 Capul de bour released

23 January 2019  •  Anton Parkhomenko
We are excited to announce the release of Iglu R11 Capul de bour, with more helpful linter messages and improved functionality in both Iglu Server and the core libraries. Covered below: Improved linter messages; Improvements to Iglu Server; Improvements to the core libraries; Upgrading; Roadmap; Getting help. Read on for more information about Release 11 Capul de bour, named after the first series of Romanian postage stamps. 1. Improved linter messages: Since its inception, it has...

Snowplow Snowflake Loader 0.4.0 released

17 January 2019  •  Rostyslav Zatserkovnyi
We are pleased to announce version 0.4.0 of the Snowplow Snowflake Loader! This release introduces optional event deduplication, brings significant performance improvements to the Snowflake Transformer, and includes several other updates and bug fixes. Read on below the fold for: Deduplication; S3 optimizations; New configuration options; Other changes; Upgrading; Getting help. 1. Deduplication: It's possible for two or more Snowplow events to have the same event ID, for example because a duplicate has been introduced...
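The loader's actual deduplication logic (including the cross-batch case) is described in the post itself; purely to illustrate the general idea, the sketch below shows the simplest in-batch variant, collapsing events that share both an event ID and an event fingerprint down to a single row. The case class and field names are invented for the example.

```scala
// Illustrative only: naive in-batch deduplication, keeping one event per
// (event_id, event_fingerprint) pair. The Snowflake Loader's real logic,
// including cross-batch handling, is described in the release post.
final case class EnrichedEvent(eventId: String, eventFingerprint: String, payload: String)

def deduplicate(batch: List[EnrichedEvent]): List[EnrichedEvent] =
  batch
    .groupBy(e => (e.eventId, e.eventFingerprint))
    .values
    .map(_.head) // keep one representative of each duplicate group
    .toList
```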