Iglu R8 Basel Dove released

07 February 2018  •  Oguzhan Unlu

We are excited to announce a new Iglu release, introducing a good number of improvements focused on our igluctl CLI tool.

  1. Switch from severity levels to granular linting
  2. Dealing with missing schema versions
  3. ZSTD encoding support for Redshift
  4. New linters
  5. Setting ownership of Redshift tables
  6. Other updates
  7. Getting help

Read on for more information about Release 8 Basel Dove, named after a Swiss postage stamp - the first tricolor stamp in the world.

1. Switch from severity levels to granular linting

In igluctl 0.2.0 we introduced the concept of severity levels for our schema linting, to help schemas meet higher standards during the authoring process. However, time has shown that different use cases imply different ideas of “higher standards”, and the lint levels approach lacks the flexibility required to cover all these use cases.

As of this release, igluctl 0.4.0 always defaults to our previous strictest level (known before as severityLevel 3), but you can then explicitly switch off certain bundles of checks or linters.

To reduce the linting strictness, pass the --skip-checks option a comma-separated list of predefined linter names, for example:

$ igluctl lint --skip-checks description,optionalNull $SCHEMAS_PATH

With this command, the linter will not report fields that lack a description property, and it will ignore fields that are only implicitly optional (missing the null type that makes them explicitly optional).
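
To make the two checks named above concrete, here is a minimal Python sketch of what they look for in a flat object schema. It is an illustration only, not igluctl's actual implementation: the lint function, the sample schema and its field names are all hypothetical, and only the check names description and optionalNull come from the command above.

```python
def lint(schema, skip_checks=()):
    """Return (check_name, field_name) pairs for a flat object schema.

    Hypothetical re-implementation of two checks, for illustration only.
    """
    required = set(schema.get("required", []))
    issues = []
    for name, prop in schema.get("properties", {}).items():
        # description: every field should carry a human-readable description
        if "description" not in skip_checks and "description" not in prop:
            issues.append(("description", name))
        # "type" may be a single string or a list of strings
        types = prop.get("type", [])
        if isinstance(types, str):
            types = [types]
        # optionalNull: a field that is not required should also allow null
        if ("optionalNull" not in skip_checks
                and name not in required
                and "null" not in types):
            issues.append(("optionalNull", name))
    return issues


# Hypothetical schema: "nickname" is optional only implicitly and has no description
schema = {
    "type": "object",
    "properties": {
        "id": {"type": "string", "description": "Unique identifier"},
        "nickname": {"type": "string"},
    },
    "required": ["id"],
}

skip = "description,optionalNull".split(",")
print(lint(schema))        # both checks report the "nickname" field
print(lint(schema, skip))  # nothing is reported once both checks are skipped
```

Changing the type of nickname to ["string", "null"] would make the field explicitly optional and satisfy the second check without skipping it.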

For the full list of available checks, their descriptions and their use cases, please see the igluctl wiki page.

2. Dealing with missing schema versions

Imagine a folder containing versions 1-0-0 and 1-0-2 of a schema, but missing the 1-0-1 version.

Prior to this release, igluctl’s static generate command would happily run against this folder, generating the Redshift DDL for both schema versions. The problem? Without seeing version 1-0-1, igluctl cannot know which properties were introduced in 1-0-1, and which in 1-0-2. This means igluctl will simply make a best guess at the correct column order for the version 1-0-2 table; this guess is likely to be wrong.

As of this release, igluctl adds some safeguards to check for missing schemas and avert the problem above:

  • If the user specifies a folder as input and there is a missing schema version, igluctl will refuse to do anything (unless the --force option is supplied)
  • If the user specifies the full path to a schema file and this file is not a 1-0-0, igluctl will print a warning
  • In all other cases, igluctl will proceed as usual

These new heuristics make it almost impossible to generate corrupted DDL by mistake. This update can be considered as an important step towards consistent schema registries, and thus proper DDL migrations.
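
The gap check described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not igluctl's code: the missing_versions function is hypothetical, versions are assumed to be MODEL-REVISION-ADDITION strings, and only gaps in the ADDITION component within one MODEL-REVISION pair are detected.

```python
from collections import defaultdict


def missing_versions(versions):
    """List the schema versions absent from an otherwise contiguous sequence.

    Hypothetical helper: only checks ADDITION gaps per (MODEL, REVISION) group.
    """
    groups = defaultdict(set)
    for version in versions:
        model, revision, addition = (int(part) for part in version.split("-"))
        groups[(model, revision)].add(addition)

    missing = []
    for (model, revision), additions in sorted(groups.items()):
        # Every addition from 0 up to the highest one seen should be present
        for addition in range(max(additions)):
            if addition not in additions:
                missing.append(f"{model}-{revision}-{addition}")
    return missing


print(missing_versions(["1-0-0", "1-0-2"]))  # ['1-0-1']
print(missing_versions(["1-0-0", "1-0-1"]))  # []
```

A tool following the first rule in the list would refuse to generate DDL whenever this list is non-empty, unless the user explicitly overrides the check.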

3. ZSTD encoding support for Redshift

In Snowplow R95 Ellora we migrated our atomic.events table to the much-anticipated ZSTD encoding, which had a large positive impact on the storage space required in Redshift.

As of Basel Dove, ZSTD is the default encoding in DDL files generated by igluctl - many thanks to Mike Robbins of Snowflake Analytics for propelling this support forwards.

This isn’t a breaking change in any way - all existing tables with LZO encoding will work as before. If you want to migrate existing tables to ZSTD, you will have to write and execute a migration script; an example atomic.events migration can be found in the snowplow/snowplow repository.

Updating the tables in Iglu Central to use ZSTD is something we are also considering - see issue #720 in that repo for details.

4. New linters

Some more work has been done in igluctl 0.4.0 to improve the linting capabilities, including the addition of two new linters.

4.1 Linting missing schema versions

In addition to the already mentioned warnings in static generate, the lint command now also checks for missing schemas by version, for example if you have a 1-0-1 schema without the initial 1-0-0.

This new linting feature is considered essential and cannot be excluded through --skip-checks.

4.2 Linting description

Mike Robbins of Snowflake Analytics came up with a proposal and PR for ensuring that fields have a human-readable description property, with a view to improving the maintainability of schemas.

To skip this check, you can pass description to --skip-checks.

5. Setting ownership of Redshift tables

A common problem we hear is that Iglu users forget to set the owner of their Redshift tables after generating and applying their DDL scripts.

With this release, we are introducing a --set-owner parameter to igluctl’s static generate command. It expects the new table owner as its argument, and causes igluctl to append an ALTER TABLE statement at the end of the DDL; when applied, this will ensure that the Redshift table has the correct owner.
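
As a rough illustration of the effect, the Python sketch below appends an ownership statement to a DDL string using Redshift's ALTER TABLE ... OWNER TO syntax. The with_owner function, the table name and the owner name are hypothetical, and the exact text igluctl emits may differ.

```python
def with_owner(ddl, table, owner):
    """Append a Redshift ownership statement to the end of a DDL script.

    Hypothetical helper for illustration; igluctl's real output may differ.
    """
    return f"{ddl.rstrip()}\n\nALTER TABLE {table} OWNER TO {owner};\n"


# Hypothetical table and owner names
ddl = "CREATE TABLE IF NOT EXISTS atomic.com_acme_example_1 (\n    ...\n);\n"
print(with_owner(ddl, "atomic.com_acme_example_1", "storageloader"))
```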

6. Other updates

R8 Basel Dove brings a few small adjustments too.

The header part of igluctl-generated Redshift DDLs contains an auto-generated comment section with project-related information. As of this release, it will contain the DDL generation time in UTC, instead of local time.
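
The difference between the two timestamps can be shown with a short Python sketch. The generated_at function and the timestamp format are assumptions made for illustration; they are not taken from igluctl.

```python
from datetime import datetime, timedelta, timezone


def generated_at(now=None):
    """Format a generation time in UTC, whatever the local time zone is.

    Hypothetical helper; the header format igluctl uses may differ.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")


# A local time one hour ahead of UTC is reported as the equivalent UTC time
local = datetime(2018, 2, 7, 12, 0, tzinfo=timezone(timedelta(hours=1)))
print(generated_at(local))  # 2018-02-07 11:00 UTC
```

Using UTC means that DDL files generated by team members in different time zones carry comparable timestamps.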

Basel Dove also fixes an important bug in igluctl, which could, confusingly, generate incorrect failure messages even though the static push command had in fact executed successfully.

7. Getting help

For more details on this release, as always do check out the release notes on GitHub.

If you have any questions or run into any problems, please raise a question in our Discourse forum.