Iglu Schema Registry 4 Epaulettes released

22 May 2016  •  Anton Parkhomenko

We are pleased to announce the fourth release of the Iglu Schema Registry System, with an initial release of the Iglu Core library, implemented in Scala.

Read on for more information on Release 4 Epaulettes, named after the famous Belgian postage stamps:

  1. Scala Iglu Core
  2. Registry Syncer updates
  3. Iglu roadmap
  4. Getting help


1. Scala Iglu Core

Why we created Iglu Core

Our initial development of Iglu two years ago was a somewhat piecemeal process. The design centred on a few core ideas, such as self-describing schemas and SchemaVer, and on several associated applications and libraries, including Schema Guru, the Iglu Scala Client and of course the Snowplow platform itself.

Working on these applications, we found ourselves implementing the same Iglu-related data structures and functions multiple times. To remove this duplication, we decided to extract the common functionality into a single library: Iglu Core.

The goal of Iglu Core is to provide a reference implementation of the Iglu concepts, which can then be re-implemented in other languages. This matters because Iglu is designed to be platform- and language-independent: it should be as usable from Arduino, C++ or JavaScript as it is from Scala.

Core concepts

The key elements introduced in our Iglu Core library are:

  • SchemaKey, which contains information about the schema for a self-describing entity. A self-describing entity can be JSON data, a JSON Schema or any other rich schema or type system that can be made self-describing
  • SchemaVer, part of the SchemaKey holding semantic information about the schema’s version. This is a triplet of MODEL, REVISION and ADDITION
  • SchemaCriterion, the default way to filter self-describing entities. It holds a SchemaKey in which some or all of the version components (MODEL, REVISION, ADDITION) can be left unspecified
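To make the relationship between these three concepts more concrete, here is a minimal Scala sketch. The case classes are simplified stand-ins written for illustration, not the actual Scala Iglu Core API: the field names, the `toUri` method and the `matches` logic are assumptions, with `None` standing for an unspecified version component in a SchemaCriterion.

```scala
// Simplified, hypothetical stand-ins for the Iglu Core concepts;
// names and methods are illustrative, not the real library API.

// SchemaVer: the MODEL-REVISION-ADDITION triplet
final case class SchemaVer(model: Int, revision: Int, addition: Int) {
  def asString: String = s"$model-$revision-$addition"
}

// SchemaKey: identifies the schema describing a self-describing entity
final case class SchemaKey(vendor: String, name: String, format: String, version: SchemaVer) {
  // Renders the key as an Iglu URI, e.g. iglu:com.acme/event/jsonschema/1-0-0
  def toUri: String = s"iglu:$vendor/$name/$format/${version.asString}"
}

// SchemaCriterion: vendor, name and format are fixed; any version
// component left as None matches every value of that component
final case class SchemaCriterion(
    vendor: String,
    name: String,
    format: String,
    model: Option[Int] = None,
    revision: Option[Int] = None,
    addition: Option[Int] = None
) {
  def matches(key: SchemaKey): Boolean = {
    def ok(expected: Option[Int], actual: Int): Boolean = expected.forall(_ == actual)
    vendor == key.vendor && name == key.name && format == key.format &&
    ok(model, key.version.model) &&
    ok(revision, key.version.revision) &&
    ok(addition, key.version.addition)
  }
}

val key = SchemaKey("com.acme", "event", "jsonschema", SchemaVer(1, 0, 2))
val uri = key.toUri // "iglu:com.acme/event/jsonschema/1-0-2"
val anyModel1 = SchemaCriterion("com.acme", "event", "jsonschema", model = Some(1)).matches(key) // true
val anyModel2 = SchemaCriterion("com.acme", "event", "jsonschema", model = Some(2)).matches(key) // false
```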

Scala-specific features

Alongside the key elements set out above, the Scala implementation of Iglu Core has some neat Scala-specific features.

Scala Iglu Core contains type classes for injecting and extracting the SchemaKey for various data types, such as the JSON representations used by the Json4s and Circe libraries.
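The type class approach lets the same generic code read or attach a SchemaKey regardless of which JSON library produced the value. The sketch below shows the idea with a toy JSON type. The trait names `ExtractSchemaKey` and `AttachSchemaKey`, the `Json` ADT and the helper functions are assumptions made for illustration; they are not the real Scala Iglu Core definitions, which target Json4s and Circe types. The `{"schema": ..., "data": ...}` envelope follows the usual self-describing JSON layout.

```scala
// Toy JSON AST standing in for a real library's type (e.g. Json4s JValue)
sealed trait Json
final case class JStr(value: String) extends Json
final case class JObj(fields: Map[String, Json]) extends Json

// Simplified, hypothetical key: version kept as an "M-R-A" string
final case class SchemaKey(vendor: String, name: String, format: String, version: String) {
  def toUri: String = s"iglu:$vendor/$name/$format/$version"
}

// Hypothetical type classes: read a SchemaKey from an A, or attach one to an A
trait ExtractSchemaKey[A] { def extract(entity: A): Option[SchemaKey] }
trait AttachSchemaKey[A] { def attach(key: SchemaKey, entity: A): A }

object JsonInstances {
  private val IgluUri = "iglu:([^/]+)/([^/]+)/([^/]+)/([0-9]+-[0-9]+-[0-9]+)".r

  // Reads the "schema" field of a self-describing JSON envelope
  implicit val extractFromJson: ExtractSchemaKey[Json] = new ExtractSchemaKey[Json] {
    def extract(entity: Json): Option[SchemaKey] = entity match {
      case JObj(fields) =>
        fields.get("schema").collect { case JStr(IgluUri(v, n, f, ver)) => SchemaKey(v, n, f, ver) }
      case _ => None
    }
  }

  // Wraps data in a {"schema": ..., "data": ...} envelope
  implicit val attachToJson: AttachSchemaKey[Json] = new AttachSchemaKey[Json] {
    def attach(key: SchemaKey, entity: Json): Json =
      JObj(Map("schema" -> JStr(key.toUri), "data" -> entity))
  }
}

// Generic helpers: work for any A with an instance in implicit scope
def schemaKeyOf[A](entity: A)(implicit ev: ExtractSchemaKey[A]): Option[SchemaKey] = ev.extract(entity)
def withSchemaKey[A](key: SchemaKey, entity: A)(implicit ev: AttachSchemaKey[A]): A = ev.attach(key, entity)

import JsonInstances._

val key = SchemaKey("com.acme", "event", "jsonschema", "1-0-0")
val payload: Json = JObj(Map("userId" -> JStr("alice")))
val envelope: Json = withSchemaKey(key, payload)
val roundTrip = schemaKeyOf(envelope) // Some(key)
```

Supporting another JSON library in this design means writing two more instances; `schemaKeyOf` and `withSchemaKey` stay unchanged.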

The library also offers container classes called SelfDescribingSchema and SelfDescribingData, which pair a SchemaKey with the schema or data that the key describes.

Use these containers to store, serialize and exchange data inside your Scala code in a more type-safe and concise way.
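A container like SelfDescribingData lets a function state in its signature that it expects data together with its schema. The sketch below uses a simplified, hypothetical version of the container. The generic case class, the `route` function and the `com.acme` schemas are assumptions for illustration, not the library's actual definitions.

```scala
// Simplified, hypothetical stand-ins for the Iglu Core types
final case class SchemaVer(model: Int, revision: Int, addition: Int)
final case class SchemaKey(vendor: String, name: String, format: String, version: SchemaVer)

// Pairs a SchemaKey with the data it describes; D is the data representation
final case class SelfDescribingData[D](schema: SchemaKey, data: D)

// Because schema and data travel together, one pattern match on the key
// can dispatch on vendor, name and MODEL
def route[D](event: SelfDescribingData[D]): String = event.schema match {
  case SchemaKey("com.acme", "link_click", "jsonschema", SchemaVer(1, _, _)) => "clicks-v1"
  case SchemaKey("com.acme", "link_click", "jsonschema", SchemaVer(2, _, _)) => "clicks-v2"
  case _                                                                      => "unknown"
}

val click = SelfDescribingData(
  SchemaKey("com.acme", "link_click", "jsonschema", SchemaVer(1, 0, 1)),
  Map("targetUrl" -> "https://example.com")
)
val destination = route(click) // "clicks-v1": matches any 1-x-x version of link_click
```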

Using Iglu Core

Iglu Core has been designed around Snowplow and Iglu’s own requirements, but we expect the library will be useful to external implementers as well.

Typically you won’t have to learn the details of Scala Iglu Core’s type classes, since we also provide complete implementations for popular Scala JSON libraries, starting with iglu-core-json4s and iglu-core-circe.

Just include the appropriate implementation as a dependency in your project (the artifacts are available in Maven Central):

val igluJson4s = "com.snowplowanalytics" %% "iglu-core-json4s" % "0.1.0"

// Or:

val igluCirce = "com.snowplowanalytics" %% "iglu-core-circe" % "0.1.0"

Here is an example using iglu-core-json4s:

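import org.json4s._                               // JValue, JObject, JString
import com.snowplowanalytics.iglu.core._          // SchemaKey, SchemaVer, SelfDescribingData
import com.snowplowanalytics.iglu.core.json4s._   // json4s type class instances

// Bring the json4s stringification instance into implicit scope
implicit val stringifyData = StringifyData

// The version is a SchemaVer (MODEL-REVISION-ADDITION), not a nested SchemaKey
val schemaKey = SchemaKey("com.acme", "event", "jsonschema", SchemaVer(1, 0, 0))
val data: JValue = JObject(List("userId" -> JString("alice"))) // example payload

// Serialize the self-describing JSON (schema + data) to a string
SelfDescribingData(schemaKey, data).asString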

More detailed information can be found on the wiki pages dedicated to Iglu Core and Scala Iglu Core.

2. Registry Syncer updates

Until recently, a static Iglu registry was the default way to host schemas; that is now changing as the Scala-based RESTful registry server starts to mature.

To help our users work with the registry server, Iglu includes a tool called Registry Syncer, a simple Bash script allowing you to populate a registry server over HTTP in a few commands.

This release introduces the following minor improvements to Registry Syncer:

  • We changed the name from Repo Syncer (as we are now referring to “schema registries” not “schema repositories”)
  • The synchronization process now stops on the first failure
  • Registry Syncer now uses PUT instead of POST, so existing schemas are automatically overwritten
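Both behavioural changes can be sketched in Scala, the language Registry Syncer is due to be ported to. This is not the actual sync.bash logic: the `syncAll` function, the `Upload` type and the injected `put` function (a stand-in for an HTTP PUT that returns a status code) are hypothetical. Because PUT is idempotent, re-running a sync after fixing a failing schema simply overwrites the schemas that were already uploaded.

```scala
// Hypothetical record of one upload attempt
final case class Upload(path: String, status: Int)

// Uploads schemas in order using the supplied `put` function and stops at
// the first non-2xx response instead of carrying on with the rest.
def syncAll(
    schemas: List[(String, String)],
    put: (String, String) => Int
): Either[Upload, List[Upload]] = {
  @annotation.tailrec
  def loop(rest: List[(String, String)], done: List[Upload]): Either[Upload, List[Upload]] =
    rest match {
      case Nil => Right(done.reverse)
      case (path, body) :: tail =>
        val status = put(path, body)
        if (status >= 200 && status < 300) loop(tail, Upload(path, status) :: done)
        else Left(Upload(path, status)) // first failure: stop here
    }
  loop(schemas, Nil)
}

// Fake PUT that records every call and fails for one path
val attempted = scala.collection.mutable.ListBuffer.empty[String]
val fakePut: (String, String) => Int = (path, _) => {
  attempted += path
  if (path.contains("broken")) 500 else 201
}

val result = syncAll(List("a/1-0-0" -> "{}", "broken/1-0-0" -> "{}", "c/1-0-0" -> "{}"), fakePut)
// result is Left(Upload("broken/1-0-0", 500)); "c/1-0-0" is never attempted
```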

In order to bootstrap your RESTful registry server with schemas you will need to:

  1. Set up the registry server
  2. Create a super API key
  3. Run the Registry Syncer like so:
${iglu_dir}/0-common/registry-syncer/sync.bash http://iglu.acme.com:8080 ${super_api_key} ${schemas_dir}

where ${iglu_dir} is a checked-out copy of the Iglu repository, ${super_api_key} is the super API key you created in step 2 and ${schemas_dir} is the directory containing your schemas.
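For each schema file, the sync has to turn the file's location under ${schemas_dir} into a registry URL. The sketch below illustrates that mapping under two assumptions you should check against your own registry server: the static-registry layout vendor/name/format/version, and an /api/schemas/ endpoint prefix. The `registryUrl` function is hypothetical.

```scala
// Hypothetical: map a schema file path, relative to ${schemas_dir}, to the
// registry URL it would be PUT to. Assumes the vendor/name/format/version
// layout and an /api/schemas/ prefix on the server.
def registryUrl(host: String, relativePath: String): Option[String] =
  relativePath.split('/').toList match {
    case List(vendor, name, format, version) if version.matches("""\d+-\d+-\d+""") =>
      Some(s"${host.stripSuffix("/")}/api/schemas/$vendor/$name/$format/$version")
    case _ => None // not a schema file (e.g. a README), so skip it
  }

val url = registryUrl("http://iglu.acme.com:8080", "com.acme/event/jsonschema/1-0-0")
// Some("http://iglu.acme.com:8080/api/schemas/com.acme/event/jsonschema/1-0-0")
```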

3. Iglu roadmap

We have a lot planned for Iglu, both in terms of new functionality and ongoing clean-up and consolidation of our existing Iglu technology.

The next release will introduce an Iglu command-line tool, “Iglu CLI”, to help users with various Iglu-related tasks. To start with, we will port over to Iglu CLI:

  • Schema Guru’s current schema-guru ddl command, which will evolve into a static registry generator command in Iglu CLI
  • Our Registry Syncer, which will be ported from Bash into Scala and added as an Iglu CLI sub-command

Beyond Iglu CLI we have plenty more planned for Iglu, including adding first-class support within Iglu for database table definitions (such as Redshift), mappings between different data formats (e.g. JSON Schema to Redshift), and schema migrations. Stay tuned!

4. Getting help

If you have any questions or run into any problems, please raise an issue or get in touch with us through the usual channels.