Verify schema dependencies with the Data Structures CI tool

Data Structures CI is a command-line tool that integrates the Data Structures API into your CI/CD pipelines. It currently has one task, which verifies that all schema dependencies for a project are already deployed into a specified environment (e.g. “DEV”, “PROD”).

It is available as a GitHub Action and as a universal install for other deployment pipelines, e.g. Travis CI, CircleCI, GitLab, Azure Pipelines, Jenkins.


To perform tasks with the tool, you need to supply both your Organization ID and an API key.

The Organization ID is a UUID that appears in the URL immediately after the .com when you visit the console.
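
As a convenience, the UUID can be pulled out of a copied console URL on the command line. This is a minimal sketch: the function name `org_id_from_url` and the example URL are hypothetical, and the function simply prints the first UUID-shaped substring it finds.

```shell
# Hypothetical helper: print the first UUID-shaped substring of a URL.
org_id_from_url() {
  printf '%s\n' "$1" |
    grep -oiE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' |
    head -n 1
}

# Example with a made-up console URL (not a real address or ID).
org_id_from_url "https://console.example.com/123e4567-e89b-12d3-a456-426614174000/data-structures"
```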

An API Key can be created here.

Create your manifest file

The check command verifies that all schema dependencies for a project (declared in a “manifest” file) are already deployed into an environment (e.g. “DEV”, “PROD”).

In your application project, create a JSON manifest file that stores references to the schemas your project depends on. During a CI build, Data Structures CI parses and validates this file, then uses it to check that each schema is deployed to the appropriate environment before the application code is deployed. This guards against ‘Schema not found’ failed events.

Here is an example manifest file where our application has dependencies on three schemas:

  • checkout_process version 1-0-7
  • user version 1-0-1
  • product version 2-0-0

    {
      "schema": "iglu:com.snowplowanalytics.insights/data_structures_dependencies/jsonschema/1-0-0",
      "data": {
        "schemas": [
          {
            "vendor": "com.acme.marketing",
            "name": "checkout_process",
            "format": "jsonschema",
            "version": "1-0-7"
          },
          {
            "vendor": "com.acme",
            "name": "user",
            "format": "jsonschema",
            "version": "1-0-1"
          },
          {
            "vendor": "com.acme",
            "name": "product",
            "format": "jsonschema",
            "version": "2-0-0"
          }
        ]
      }
    }

The manifest must adhere to this self-describing JSON Schema.
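
Each manifest entry maps onto an Iglu URI of the form iglu:vendor/name/format/version, the same shape as the manifest's own "schema" field. The sketch below builds that URI from the four fields and does a loose check that a version looks like three dash-separated numbers. Both function names are hypothetical, and the regular expression is a simplification, not the validation that Data Structures CI itself performs.

```shell
# Hypothetical helper: assemble an Iglu URI from the four manifest fields.
iglu_uri() {
  printf 'iglu:%s/%s/%s/%s\n' "$1" "$2" "$3" "$4"
}

# Hypothetical helper: loose check that a version is three dash-separated numbers.
looks_like_schemaver() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+-[0-9]+-[0-9]+$'
}

iglu_uri com.acme.marketing checkout_process jsonschema 1-0-7
looks_like_schemaver 1-0-7 && echo "1-0-7 looks valid"
looks_like_schemaver 1.0.7 || echo "1.0.7 does not look valid"
```

Running such a check locally can catch a mistyped version (for example 1.0.7 instead of 1-0-7) before the manifest reaches the CI build.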

Setting up as a GitHub Action

To use the GitHub Action, add this snippet as a step in your existing GitHub Actions pipeline, replacing the relevant variables:

    name: Example workflow using Snowplow's Data Structures CI
    on: push
    jobs:
      data-structures-check:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@master
          - name: Run Snowplow's Data Structures CI
            uses: snowplow-product/msc-schema-ci-action/check@v1
            with:
              organization-id: ${{ secrets.SNOWPLOW_ORG_ID }}
              api-key: ${{ secrets.SNOWPLOW_API_KEY }}
              manifest-path: 'snowplow-schemas.json'
              environment: ${{ env.ENVIRONMENT }}

View the GitHub Action repository.
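
The snippet reads env.ENVIRONMENT, so that variable has to be defined earlier in the workflow. One way to do that is a shell step that derives it from the Git ref and appends it to the file named by GITHUB_ENV, which GitHub Actions uses to pass environment variables to later steps. The branch policy below (main goes to PROD, everything else to DEV) and the function name are assumptions for illustration only.

```shell
# Hypothetical policy: the main branch targets PROD, every other ref targets DEV.
environment_for_ref() {
  case "$1" in
    refs/heads/main) echo "PROD" ;;
    *)               echo "DEV" ;;
  esac
}

# Inside a workflow step, GITHUB_REF and GITHUB_ENV are provided by GitHub Actions.
# Outside of one, this falls back to a sample ref and only prints the result.
target="$(environment_for_ref "${GITHUB_REF:-refs/heads/feature-x}")"
if [ -n "${GITHUB_ENV:-}" ]; then
  echo "ENVIRONMENT=$target" >> "$GITHUB_ENV"
else
  echo "ENVIRONMENT=$target"
fi
```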

Setting up for other deployment pipelines


Prerequisites:

  • JRE 8 or above

Download the CI tool

You can download Data Structures CI from its GitHub releases, using the following command:

$ curl -L https://github.com/snowplow-product/msc-schema-ci-tool/releases/download/1.0.0/data_structures_ci_1.0.0.zip | jar xv && chmod +x ./data-structures-ci

Run the task

You can run the task using the following syntax:

    $ export ORGANIZATION_ID=<organization-id>
    $ export API_KEY=<api-key>
    $ ./data-structures-ci check \
        --manifestPath /path/to/snowplow-schemas.json \
        --environment DEV
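
In a pipeline it can help to fail early with a clear message when a credential is missing. The wrapper below is a sketch under stated assumptions: `require_var` and `run_check` are hypothetical names, the binary is assumed to sit in the current directory as in the download step, and the tool is assumed to signal failure through a non-zero exit code.

```shell
# Hypothetical helper: succeed only if the named variable is set and non-empty.
require_var() {
  eval "value=\${$1:-}"   # indirect lookup that also works in plain POSIX sh
  if [ -z "$value" ]; then
    echo "Missing required environment variable: $1" >&2
    return 1
  fi
}

# Hypothetical wrapper: check credentials, then run the documented check task.
# Usage: run_check /path/to/snowplow-schemas.json DEV
run_check() {
  require_var ORGANIZATION_ID || return 1
  require_var API_KEY || return 1
  ./data-structures-ci check --manifestPath "$1" --environment "$2"
}
```

Calling `run_check` as the last command of a build step lets its exit status become the status of that step.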

View the repository for integration examples.
