Pipeline Components and Applications

  1. Home
  2. Docs
  3. Pipeline Components and Applications
  4. Iglu
  5. Iglu Common Architecture
  6. Self describing JSONs

Self describing JSONs

This page is adapted from the Snowplow Analytics blog post, Introducing self-describing JSONs.

In this section, we will be describing the approach we chose to link together a JSON with its JSON Schema in order to make it self-describing. Instead of embedding the JSON Schema directly into the JSON itself which would be very wasteful in terms of space, we chose only to store a reference to its JSON Schema.

For example, let’s say we have a JSON representing a click on an ad like so:

{ "bannerId": "4acd518feb82" }
Code language: JSON / JSON with Comments (json)

which is supposed to conform to this self-describing JSON Schema:

{ "$schema": "http://json-schema.org/schema#", "self": { "vendor": "com.snowplowanalytics", "name": "ad_click", "format": "jsonschema", "version": "1-0-0" }, "type": "object", "properties": { "bannerId": { "type": "string" } }, "required": ["bannerId"], "additionalProperties": false }
Code language: JSON / JSON with Comments (json)

Our self-describing JSON will look like this:

{ "schema": "iglu:com.snowplowanalytics/ad_click/jsonschema/1-0-0", "data": { "bannerId": "4acd518feb82" } }
Code language: JSON / JSON with Comments (json)

Notice the two main differences compared to our original JSON:

  1. There is a new schema field located at the root of the JSON which contains (in a space-efficient format) all the information required to uniquely identify the associated JSON Schema. The schema’s URI follows the following pattern:
  1. The data contained in the original JSON has been encapsulated in a data field to prevent any accidental collisions should the JSON already have a schema field

This way, our JSON becomes de facto self-describing, embedding a link to its JSON Schema.

Back to Common architecture.