Snowplow is designed to make it easy for you to change your tracking design in a safe and backwards-compatible way as your organisational data needs evolve.
Data structures describe the structure your data should be delivered in. Each structure is defined by a JSON schema. Each schema carries a version number made up of three numbers separated by hyphens, for example 1-0-0. As your schema evolves, all previous versions of that schema remain available to ensure backwards compatibility.
Why is versioning important?
As well as being good practice, versioning tells Snowplow Loaders how to handle changes when loading into your data warehouse(s).
For example, certain changes require creating new columns, updating columns or even creating whole new tables. For this reason, it’s important to understand when a change is breaking and to version it correctly.
How do I version?
Breaking and non-breaking changes
In the Data Structures UI, when you publish a schema you’ll be asked to select which version you’d like to create. There are two options:
- Non-breaking – a non-breaking change is backwards compatible with historical data and increments the patch number, e.g. 1-0-0 -> 1-0-1.
- Breaking – a breaking change is not backwards compatible with historical data and increments the model number, e.g. 1-0-0 -> 2-0-0.
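The two options can be sketched in a few lines of Python. This is only an illustration of the numbering rule, not part of any Snowplow tooling: the function name `bump_version` is hypothetical, and resetting the middle and patch numbers to 0 on a breaking change is an assumption drawn from the 1-0-0 -> 2-0-0 example.

```python
def bump_version(version: str, breaking: bool) -> str:
    """Return the next version string for a breaking or non-breaking change."""
    # A version is three hyphen-separated numbers: model-middle-patch.
    model, middle, patch = (int(part) for part in version.split("-"))
    if breaking:
        # Breaking: increment the model number.
        # Assumption: the other two numbers reset to 0.
        return f"{model + 1}-0-0"
    # Non-breaking: increment the patch number only.
    return f"{model}-{middle}-{patch + 1}"


print(bump_version("1-0-0", breaking=False))  # 1-0-1
print(bump_version("1-0-0", breaking=True))   # 2-0-0
```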
Should I choose breaking or non-breaking?
Different data warehouses handle schema evolution slightly differently. Use the table below as a guide to how to handle versioning in Data Structures for your warehouse.
Change | Redshift | Snowflake | BigQuery |
Add / remove / rename an optional field | Non-breaking | Non-breaking | Non-breaking |
Add / remove / rename a required field | Breaking | Breaking | Breaking |
Change a field from optional to required | Breaking | Breaking | Breaking |
Change a field from required to optional | Breaking | Non-breaking | Non-breaking |
Change the type of an existing field | Breaking | Breaking | Breaking |
Change the size of an existing field | Non-breaking | Non-breaking | Non-breaking |
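If you want to encode this guidance in your own tooling, a lookup table is one way to do it. The sketch below copies the rows of the table into a dictionary; the key names, `GUIDE`, `change_kind` and `change_kind_for_all` are all hypothetical. Treating a change as breaking when any one of your warehouses treats it as breaking is a cautious assumption, not a documented Snowplow rule.

```python
WAREHOUSES = ("redshift", "snowflake", "bigquery")

# Rows copied from the table; each tuple is (Redshift, Snowflake, BigQuery).
# The keys are hypothetical short names for the rows.
GUIDE = {
    "optional_field_added_removed_renamed": ("Non-breaking", "Non-breaking", "Non-breaking"),
    "required_field_added_removed_renamed": ("Breaking", "Breaking", "Breaking"),
    "optional_to_required": ("Breaking", "Breaking", "Breaking"),
    "required_to_optional": ("Breaking", "Non-breaking", "Non-breaking"),
    "field_type_changed": ("Breaking", "Breaking", "Breaking"),
    "field_size_changed": ("Non-breaking", "Non-breaking", "Non-breaking"),
}


def change_kind(change: str, warehouse: str) -> str:
    """Look up the guidance for one change and one warehouse."""
    column = WAREHOUSES.index(warehouse.lower())
    return GUIDE[change][column]


def change_kind_for_all(change: str, warehouses: list[str]) -> str:
    """Assumption: if any warehouse treats the change as breaking, version it as breaking."""
    kinds = {change_kind(change, warehouse) for warehouse in warehouses}
    return "Breaking" if "Breaking" in kinds else "Non-breaking"


print(change_kind("required_to_optional", "Snowflake"))                        # Non-breaking
print(change_kind_for_all("required_to_optional", ["Snowflake", "Redshift"]))  # Breaking
```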
Overwriting schemas
Wherever possible, we advise versioning the schema whenever you make a change. However, where this isn’t possible, Snowplow does allow you to overwrite a schema in your development environment, that is, to make a change while keeping the version the same.
Overwriting in your Production environment is forbidden because of the technology that automatically adjusts your tables. When you promote an overwritten version to the Production environment, you are therefore required to increase the version as either Breaking or Non-Breaking.
Incrementing the middle digit
For particular workflows you may want to make use of the middle digit as part of your versioning strategy. For simplicity, the UI allows only breaking or non-breaking changes.
Should you wish to use the middle versioning digit, this is possible via the Data Structures API.
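As a rough illustration of what a middle-digit increment looks like, the sketch below works on the version string only and makes no API call. The function name `bump_middle` is hypothetical, and resetting the patch number to 0 is an assumption based on the usual SchemaVer convention rather than anything stated on this page.

```python
def bump_middle(version: str) -> str:
    """Increment the middle number of a model-middle-patch version string."""
    model, middle, _patch = (int(part) for part in version.split("-"))
    # Assumption: the patch number resets to 0 when the middle number increases.
    return f"{model}-{middle + 1}-0"


print(bump_middle("1-0-1"))  # 1-1-0
```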