
Set up and run SQL Runner

SQL Runner enables you to execute SQL scripts against the Snowplow data in your data warehouse. Specifically, it allows you to organize your SQL scripts in templatable playbooks, and execute them in series or in parallel on Snowflake, Amazon Redshift, GCP BigQuery and PostgreSQL.

To set up SQL Runner, Snowplow open source users can start from this user guide.

For Snowplow Insights customers, SQL Runner is already set up as part of your pipeline, so you can get started with configuring and deploying your SQL data models immediately (LINK).

A SQL Runner data model consists of:

  • SQL files (containing one or more SQL statements)
  • Playbooks (YAML files organizing the SQL into steps)

Playbooks

A playbook consists of one or more steps, each of which consists of one or more queries. Steps are run in series; within a step, queries are run in parallel.

Each query contains the path to a query file.

All steps are applied against all targets. All targets are processed in parallel.
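For instance, a playbook that lists two targets (sketched below with placeholder names and values) runs every step against both databases, with the two targets processed concurrently:

:targets:
  - :name: "Postgres production" # placeholder name
    :type: postgres
    :host: prod.example.com # illustrative value
    ...
  - :name: "Postgres staging" # placeholder name
    :type: postgres
    :host: staging.example.com # illustrative value
    ...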

In the following example, a.sql, b.sql and c.sql are run in parallel.

:steps: - :name: "Run a,b and c in parallel" :queries: - :name: a :file: a.sql - :name: b :file: b.sql - :name: c :file: c.sql

By contrast, in the example below, the three SQL files are executed in sequence.

:steps: - :name: "Run a..." :queries: - :name: a :file: a.sql - :name: "...then run b..." :queries: - :name: b :file: b.sql - :name: "...then run c..." :queries: - :name: c :file: c.sql

Playbooks can be templated, and the corresponding variables can be passed in with the -var flag like this:

sql-runner -var host=value,username=value2,password=value3
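For instance, assuming the playbook below is saved as playbook.yml, a full invocation could look like this (the file name and values are illustrative):

sql-runner -playbook playbook.yml -var host=localhost,username=admin,password=secret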

Here is the corresponding playbook template:

:targets:
  - :name: "My Postgres database 1"
    :type: postgres
    :host: {{.host}}
    :database: sql_runner_tests_1
    :port: 5432
    :username: {{.username}}
    :password: {{.password}}
    :ssl: false # SSL disabled by default
:variables:
  :test_schema: sql_runner_tests
  :timeFormat: "2006_01_02"
:steps:
  - :name: Create schema and table
    :queries:
      - :name: Create schema and table
        :file: postgres-sql/good/1.sql
        :template: true

SQL files

A query file contains one or more SQL statements. These are executed “raw” (i.e. not in a transaction) in series by SQL Runner. If the query file is flagged as a template in the playbook, then the file is pre-processed as a template before being executed.
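For example, a single query file might hold two statements that SQL Runner executes one after the other (the schema and table names below are illustrative):

-- Both statements run in series, each outside of a transaction
CREATE TABLE IF NOT EXISTS derived.sessions_tmp (session_id VARCHAR(128));
DELETE FROM derived.sessions_tmp;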

Note: If your query is a template that requires pre-processing, you must add template: true to the query definition in the playbook YAML file, for example:

:name: "Run a.." :queries: - :name: a :file: a.sql :template: true
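To illustrate, a.sql could then reference a playbook variable such as the :test_schema: defined in the playbook above (a sketch; the statement itself is hypothetical):

-- a.sql: {{.test_schema}} is replaced with the value from the playbook's :variables:
CREATE SCHEMA IF NOT EXISTS {{.test_schema}};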

Templates

Templates are run through Golang’s text template processor. The template processor can access all variables defined in the playbook.

The following custom functions are also supported:

  • nowWithFormat [timeFormat]: where timeFormat is a valid Golang time format
  • systemEnv "ENV_VAR": where ENV_VAR is a key for a valid environment variable
  • awsEnvCredentials: supports passing credentials through environment variables, such as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
  • awsProfileCredentials: supports getting credentials from a credentials file, also used by boto/awscli
  • awsEC2RoleCredentials: supports getting role-based credentials, i.e. getting the automatically generated credentials in EC2 instances
  • awsChainCredentials: tries to get credentials from each of the three methods above in order, using the first one returned
  • randomInt: returns a random integer

Note: All AWS functions output strings in the Redshift credentials format (CREDENTIALS 'aws_access_key_id=%s;aws_secret_access_key=%s').
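As an illustration, a templated Redshift load could combine these functions like so (the table, bucket, and environment variable below are hypothetical):

-- Hypothetical load: systemEnv fills in the bucket name, awsEnvCredentials emits the CREDENTIALS clause
COPY atomic.events FROM 's3://{{systemEnv "S3_BUCKET"}}/enriched/'
{{awsEnvCredentials}}
DELIMITER '\t' GZIP;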

For an example query file using templating, see: integration/resources/postgres-sql/good/3.sql

Failure modes

If a statement fails in a query file, the query will terminate and report failure.

If a query fails, its sibling queries will continue running, but no further steps will run.

Failures in one target do not affect other targets in any way.

Return codes

  • 0 for no errors
  • 5 for target initialization errors
  • 6 for query errors
  • 7 for both types of error
  • 8 for no queries run
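These codes make it straightforward to branch in a wrapper script, for example (a sketch; the playbook path is illustrative):

sql-runner -playbook playbook.yml
case $? in
  0) echo "All queries ran successfully" ;;
  5) echo "Target initialization failed" ;;
  6) echo "One or more queries failed" ;;
  7) echo "Both target and query errors occurred" ;;
  8) echo "No queries were run" ;;
esac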

Target configuration

Redshift

If your storage target is Amazon Redshift, then the target configuration in the playbook is:

targets: - name: "My Redshift database" type: redshift host: ADD HERE # The endpoint as shown in the Redshift console database: ADD HERE # Name of database port: 5439 # Default Redshift port username: ADD HERE password: ADD HERE ssl: false # SSL disabled by default variables: ...

BigQuery

To access a BigQuery project, sql-runner will need some Google credentials. These can be set up by creating a new service account in the GCP console, then providing its private key to the application via a GOOGLE_APPLICATION_CREDENTIALS environment variable – a detailed walkthrough of this process is available on the GCP documentation website.
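In practice this usually means exporting the variable before invoking sql-runner, along these lines (the key file path is illustrative):

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
sql-runner -playbook playbook.yml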

After the credentials are set up, simply create a playbook with the following BigQuery-specific target configuration:

targets: - name: "My BigQuery database" type: bigquery project: ADD HERE # Project ID as shown in the GCP console's front page variables: ...

Snowflake

If your data warehouse is Snowflake, then your SQL Runner playbooks will have a target configuration like this:

targets: - name: "My Snowflake database" type: snowflake account: ADD HERE # Your Snowflake account name region: ADD HERE # Leave blank for default us-east-1 database: ADD HERE # Name of database warehouse: ADD HERE # Name of warehouse to run the queries username: ADD HERE password: ADD HERE host: # Leave blank port: # Leave blank ssl: true # Snowflake connection is always secured by TLS variables: ...

PostgreSQL

Finally, if your storage target is PostgreSQL, then it can be configured as:

targets: - name: "My Postgres database" type: postgres host: ADD HERE database: ADD HERE # Name of database port: 5432 # Default Postgres port username: ADD HERE password: ADD HERE ssl: false # SSL disabled by default variables:

That’s it – you’re now ready to start running SQL against your data warehouse!