Using the Data Structures CI tool

Data Structures CI is a command-line tool that integrates the Data Structures API into your CI/CD pipelines. It currently provides one task, which verifies that all schema dependencies for a project are already deployed into a specified environment (e.g. “DEV”, “PROD”).

It is available as a GitHub Action and as a universal install for other deployment pipelines, e.g. Travis CI, CircleCI, GitLab, Azure Pipelines or Jenkins.

Authorization

To perform tasks with the tool, you need to supply the credentials of a user that you will use for CI purposes.

These credentials come in the form of a username and a password, which you can obtain by creating an admin user for your organization in the Snowplow Insights Console.

Create your manifest file

The check task verifies that all schema dependencies for a project (declared in a “manifest” file) are already deployed into an environment (e.g. “DEV”, “PROD”).

In your application project, create a JSON manifest file that stores references to the schemas your project depends on. During a CI build, Data Structures CI parses and validates this file, then uses it to check that each schema is correctly deployed to the appropriate environment before the application code is deployed. This guards against failed events of the ‘Schema not found’ type.

Here is an example manifest file where our application has dependencies on three schemas:

  • checkout_process version 1-0-7
  • user version 1-0-1
  • product version 2-0-0

{
  "schema": "iglu:com.snowplowanalytics.insights/data_structures_dependencies/jsonschema/1-0-0",
  "data": {
    "schemas": [
      {
        "vendor": "com.acme.marketing",
        "name": "checkout_process",
        "format": "jsonschema",
        "version": "1-0-7"
      },
      {
        "vendor": "com.acme",
        "name": "user",
        "format": "jsonschema",
        "version": "1-0-1"
      },
      {
        "vendor": "com.acme",
        "name": "product",
        "format": "jsonschema",
        "version": "2-0-0"
      }
    ]
  }
}
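
If your project already tracks its schemas as Iglu URIs, the manifest could be generated rather than written by hand. The following is a minimal Python sketch, not part of Data Structures CI: the helper names `entry_from_iglu_uri` and `build_manifest` are hypothetical, and the URI layout (`iglu:vendor/name/format/version`) is taken from the `schema` field of the example manifest.

```python
import json

# Schema URI copied from the example manifest.
MANIFEST_SCHEMA = (
    "iglu:com.snowplowanalytics.insights/"
    "data_structures_dependencies/jsonschema/1-0-0"
)


def entry_from_iglu_uri(uri):
    """Split 'iglu:vendor/name/format/version' into a manifest entry."""
    prefix = "iglu:"
    if not uri.startswith(prefix):
        raise ValueError(f"not an Iglu URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected vendor/name/format/version in {uri!r}")
    vendor, name, fmt, version = parts
    return {"vendor": vendor, "name": name, "format": fmt, "version": version}


def build_manifest(uris):
    """Wrap the parsed entries in the self-describing envelope."""
    return {
        "schema": MANIFEST_SCHEMA,
        "data": {"schemas": [entry_from_iglu_uri(u) for u in uris]},
    }


def manifest_json(uris):
    """Serialize the manifest so it can be written to snowplow-schemas.json."""
    return json.dumps(build_manifest(uris), indent=2)
```

Calling `manifest_json` with the three URIs for `checkout_process`, `user` and `product` should produce a document with the same structure as the example manifest.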

The manifest must adhere to this self-describing JSON Schema.
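
A quick local check can catch obvious manifest mistakes before the CI build runs. The sketch below is only a lightweight structural check written in Python. It is not the official JSON Schema: the required keys and the version shape (three dash-separated integers) are assumptions inferred from the example manifest, and the function names are hypothetical.

```python
import json
import re

# Keys seen on every entry of the example manifest (assumed to be required).
REQUIRED_KEYS = ("vendor", "name", "format", "version")
# Assumed version shape, e.g. "1-0-7", inferred from the example.
VERSION_RE = re.compile(r"\d+-\d+-\d+")


def manifest_problems(manifest):
    """Return a list of human-readable problems; an empty list means none found."""
    if not isinstance(manifest, dict):
        return ["manifest is not a JSON object"]
    problems = []
    if not str(manifest.get("schema", "")).startswith("iglu:"):
        problems.append("top-level 'schema' is missing or not an iglu: URI")
    data = manifest.get("data")
    schemas = data.get("schemas") if isinstance(data, dict) else None
    if not isinstance(schemas, list):
        problems.append("'data.schemas' is missing or not a list")
        return problems
    for i, entry in enumerate(schemas):
        if not isinstance(entry, dict):
            problems.append(f"schemas[{i}] is not an object")
            continue
        for key in REQUIRED_KEYS:
            if not isinstance(entry.get(key), str) or not entry[key]:
                problems.append(f"schemas[{i}] has no '{key}' string")
        version = entry.get("version")
        if isinstance(version, str) and version and not VERSION_RE.fullmatch(version):
            problems.append(f"schemas[{i}] version {version!r} is not N-N-N")
    return problems


def check_manifest_file(path):
    """Load a manifest file and run the structural check on it."""
    with open(path, encoding="utf-8") as handle:
        return manifest_problems(json.load(handle))
```

An empty result only means the file has the expected shape. Whether the schemas are actually deployed is still decided by Data Structures CI.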

Setting up as a GitHub Action

To use the GitHub Action, add this snippet as a step in your existing GitHub Actions pipeline, replacing the relevant variables:

name: Example workflow using Snowplow's Data Structures CI
on: push
jobs:
  data-structures-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@master
      - name: Run Snowplow's Data Structures CI
        uses: snowplow-product/msc-schema-ci-action/check@v0.3.0
        with:
          manifest-path: 'snowplow-schemas.json'
          username: ${{ secrets.AUTH_USER }}
          password: ${{ secrets.AUTH_PASSWORD }}
          environment: ${{ env.ENVIRONMENT }}

View the GitHub Action repository.

Setting up for other deployment pipelines

Prerequisites

  • JRE 8 or above

Download the CI tool

You can download Data Structures CI from our Bintray repository, using the following command:

curl -L https://dl.bintray.com/snowplow/snowplow-generic/data_structures_ci_0.3.0.zip | jar xv && chmod +x ./data-structures-ci

Run the task

You can run the task using the following syntax:

$ ./data-structures-ci check \
    --manifestPath /path/to/snowplow-schemas.json \
    --username $USERNAME \
    --password $PASSWORD \
    --environment DEV
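
In pipelines that already drive their build steps from a script, the same invocation could be wrapped so that credentials come from the environment instead of being typed on the command line. This is a hedged Python sketch: the flags are the ones shown in the command above, while the variable names `DS_CI_USERNAME`, `DS_CI_PASSWORD` and `ENVIRONMENT`, the default manifest path, and the assumption that the tool reports failure through a non-zero exit code are illustrative and not confirmed by this page.

```python
import os
import subprocess
import sys


def build_check_command(manifest_path, username, password, environment,
                        executable="./data-structures-ci"):
    """Assemble the argument list for the `check` task."""
    return [
        executable, "check",
        "--manifestPath", manifest_path,
        "--username", username,
        "--password", password,
        "--environment", environment,
    ]


def main():
    # DS_CI_USERNAME / DS_CI_PASSWORD are hypothetical variable names;
    # use whatever secret names your CI system provides.
    try:
        username = os.environ["DS_CI_USERNAME"]
        password = os.environ["DS_CI_PASSWORD"]
    except KeyError as missing:
        print(f"missing environment variable {missing}", file=sys.stderr)
        return 2
    command = build_check_command(
        "snowplow-schemas.json", username, password,
        os.environ.get("ENVIRONMENT", "DEV"),
    )
    # Passing a list (no shell) keeps special characters in the password intact.
    # Assumption: the tool signals failure with a non-zero exit code.
    return subprocess.run(command).returncode
```

A build step would then call `sys.exit(main())` so that the pipeline stops when the check does not pass.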

View the repository for integration examples.