Iglu Scala Client 0.6.0 released

We’re tremendously excited to announce the 0.6.0 release of the Iglu Scala Client, the library in charge of schema resolution and data validation in all Snowplow components, including enrichment jobs and loaders. This release brings an enormous number of API changes, which we made to facilitate the implementation of Snowplow Platform Improvement Proposals such as the new bad rows format, Amazon Redshift automigrations and the deprecation of the batch pipeline.

In the rest of this post we will cover:

  1. API Changes
  2. Semantic Changes
  3. New Validator
  4. Acknowledgements
  5. Roadmap and Upcoming Features
  6. Getting Help

1. API Changes

Iglu Scala Client 0.6.0 exposes a new class called Client, which consists of two independent entities: a Resolver and a Validator. The Resolver is responsible for schema resolution, caching and error handling. The Validator receives a resolved schema and the datum the user wants to validate, and returns a validation report. These entities can be used separately or even redefined by the user, but we recommend using the Client class as an abstraction for the most common use case: validation of self-describing entities.

The Client class defines only one function:

def check[F[_]: RegistryLookup: Clock: Monad, A](instance: SelfDescribingData[A]): EitherT[F, ClientError, Unit]

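Here F is the effect type the check runs in, which must provide RegistryLookup, Clock and Monad instances; A is the type of the datum wrapped in SelfDescribingData; and the result is an EitherT that either fails with a ClientError or succeeds with Unit when the instance is valid.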

From this very short excerpt an astute Scala developer might notice that we replaced several libraries with their modern counterparts:  

You can find more usage examples on the dedicated wiki page.
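
To make the division of labour concrete, here is a deliberately simplified, stdlib-only model of the Resolver/Validator composition. It is not the library's API: the real check is polymorphic in F[_] and returns an EitherT, while this sketch uses a plain Either. Every name in it (SchemaKey, Schema, SketchClient, InMemoryResolver, RequiredFieldsValidator and the error cases) is hypothetical, and the "schema" is reduced to a set of required fields.

```scala
// Hypothetical, stdlib-only model of the Client = Resolver + Validator split.
// None of these names come from iglu-scala-client.

final case class SchemaKey(vendor: String, name: String, version: String)
final case class Schema(requiredFields: Set[String])
final case class SelfDescribing(schema: SchemaKey, data: Map[String, String])

sealed trait SketchError
final case class MissingSchema(key: SchemaKey) extends SketchError
final case class InvalidData(missingFields: Set[String]) extends SketchError

// Resolver: turns a schema key into a schema (or an error)
trait Resolver {
  def lookup(key: SchemaKey): Either[SketchError, Schema]
}

// Validator: checks a datum against an already resolved schema
trait Validator {
  def validate(schema: Schema, data: Map[String, String]): Either[SketchError, Unit]
}

// Stands in for a registry: schemas live in an in-memory map
final class InMemoryResolver(schemas: Map[SchemaKey, Schema]) extends Resolver {
  def lookup(key: SchemaKey): Either[SketchError, Schema] =
    schemas.get(key).toRight(MissingSchema(key))
}

// Toy validator that only checks that required fields are present
object RequiredFieldsValidator extends Validator {
  def validate(schema: Schema, data: Map[String, String]): Either[SketchError, Unit] = {
    val missing = schema.requiredFields -- data.keySet
    if (missing.isEmpty) Right(()) else Left(InvalidData(missing))
  }
}

// The client only wires the two together: resolve first, then validate
final class SketchClient(resolver: Resolver, validator: Validator) {
  def check(instance: SelfDescribing): Either[SketchError, Unit] =
    resolver.lookup(instance.schema).flatMap(validator.validate(_, instance.data))
}
```

Because SketchClient depends only on the two interfaces, a different validator (for example one backed by a real JSON Schema library) can be passed in without touching the resolver, which mirrors the pluggability described for the real Client.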

2. Semantic Changes

In the batch ETL world, we tried to reduce the load on Iglu Registries with a very simple retry-and-cache algorithm: the resolver made a configurable number of attempts before deciding that a schema was missing or invalid, and then cached that failure. The only thing that could reset the cached value was the cacheTtl property, which forced the resolver to refetch the schema regardless of whether the cached value was a success (in case somebody had mutated the schema) or a failure (in case the registry had a long outage).

This approach no longer works in a real-time-first world. There is no meaningful number of attempts the resolver can make before considering a schema missing or invalid. A streaming application can keep running for many weeks without a restart; if during that time a registry goes down for a couple of minutes while the resolver is trying to resolve a schema, all data relying on that schema will be considered invalid until the next TTL eviction. Retries won’t help here either, because they all happen within a short period of time.
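
The failure mode is easier to see in code. Below is a hypothetical TTL cache in the spirit of the old behavior (not the actual pre-0.6.0 implementation): lookup results, failures included, are stored with a timestamp and served until ttlMillis has elapsed. The current time is passed in explicitly to keep the example deterministic.

```scala
import scala.collection.mutable

// Hypothetical TTL cache: successes and failures are cached alike
final class TtlCache[K, V](ttlMillis: Long, fetch: K => Either[String, V]) {
  private val entries = mutable.Map.empty[K, (Long, Either[String, V])]

  // Serve the cached result while it is younger than the TTL, otherwise refetch
  def get(key: K, nowMillis: Long): Either[String, V] =
    entries.get(key) match {
      case Some((storedAt, result)) if nowMillis - storedAt < ttlMillis => result
      case _ =>
        val result = fetch(key)
        entries.update(key, (nowMillis, result))
        result
    }
}
```

With a 10-minute TTL, a lookup that fails at t = 0 keeps returning that failure at t = 1 minute even if the registry has already recovered; only a lookup at t = 10 minutes or later reaches the registry again.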

However, we still need some retry behavior, because registries can always go offline. In a streaming world, the best practice for retries is a backoff period between attempts. In 0.6.0 the Iglu resolver attempts to refetch failed schemas with a steadily growing period of time between attempts, from sub-second delays up to approximately 20 minutes. Importantly, these re-attempts are made only for unsuccessful responses.
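
A minimal sketch of such a schedule, assuming plain exponential backoff that starts at 500 ms, doubles after every failed attempt and is capped at 20 minutes. The post does not spell out the resolver's actual formula, so the Backoff object and its base, factor and cap are illustrative assumptions.

```scala
import scala.concurrent.duration._

object Backoff {
  // Illustrative parameters, not the values used by Iglu Scala Client
  val base: FiniteDuration = 500.millis
  val cap: FiniteDuration = 20.minutes

  // Delay to wait after the given number of failed attempts (1-based)
  def delayAfter(failedAttempts: Int): FiniteDuration = {
    require(failedAttempts >= 1, "at least one attempt must have failed")
    // Clamp the exponent so the multiplication cannot overflow a Long
    val exponent = math.min(failedAttempts - 1, 30)
    val millis = base.toMillis * (1L << exponent)
    if (millis >= cap.toMillis) cap else millis.millis
  }
}
```

With these parameters the delay reaches one second after the second failure, about 17 minutes after the twelfth, and stays at the 20-minute cap from the thirteenth onwards. A brief outage is therefore retried quickly, while a long one produces only a trickle of requests to the registry.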

The ResolutionError data type (a subtype of ClientError) has two properties reflecting the history of attempts: lastAttempt, the timestamp of the last attempt, and attempts, the number of attempts made so far.
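
Those two properties are enough for a caller to decide whether another lookup is due. The sketch below models them with a hypothetical FailedLookup case class (not the library's ResolutionError) and a hypothetical RetryPolicy object; the backoff schedule is taken as a plain function, so any policy can be plugged in.

```scala
import java.time.{Duration, Instant}

// Hypothetical stand-in for the attempt history carried by ResolutionError
final case class FailedLookup(lastAttempt: Instant, attempts: Int)

object RetryPolicy {
  // A retry is due once the delay for the current attempt count has elapsed
  def retryDue(failure: FailedLookup, now: Instant, delay: Int => Duration): Boolean =
    !now.isBefore(failure.lastAttempt.plus(delay(failure.attempts)))

  // Record another failed attempt made at `now`
  def recordFailure(failure: FailedLookup, now: Instant): FailedLookup =
    failure.copy(lastAttempt = now, attempts = failure.attempts + 1)
}
```

Keeping the history in the error value itself, rather than in hidden resolver state, lets downstream code (for example a bad row) report how long a schema has been failing.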

3. New Validator

Iglu Scala Client 0.6.0 uses a new JSON Schema validator under the hood (though it can be replaced with any custom one). Even though this validator also targets JSON Schema spec v4, it can still behave differently from our previous JSON Schema validator in some cases. As a result, some instances that Iglu Scala Client considered valid before 0.6.0 can now be silently invalidated.

Here’s a short list of the most widely used Snowplow components we’re planning to release with Iglu Client 0.6.0:

Please monitor the bad rows produced by the components above after upgrading.

4. Acknowledgements

This is a huge release that overhauls a core part of Snowplow, and we have been developing and testing it since Fall 2018. During this time, we received an enormous number of contributions from outside the core Snowplow Engineering team. Huge thanks to our Summer 2018 intern Andrzej Sołtysik, Hacktoberfest 2018 participant Sajith Appukuttan and Saeed Zareian of our partner The Globe and Mail Inc.

5. Roadmap and Upcoming Features

This release is planned to be the last one in the 0.x series. The next release will likely include a relatively small number of user-facing improvements and will be versioned 1.0.0, marking the stability of the API. From 1.0.0 onwards we plan to introduce MiMa compatibility checks to our libraries in order to make the update process more reliable.

6. Getting Help

If you have any questions or run into any problems, please raise an issue or get in touch with us through the usual channels.