1. Home
  2. Docs
  3. Modeling your data
  4. The Snowplow web data model
  5. dbt: Web data model

dbt: Web data model

The snowplow-web dbt package provides a means to run the standard web model via dbt. It processes events incrementally and is designed in a modular manner, allowing you to easily integrate your own custom SQL into the incremental framework provided by the package.

The package can be found in the snowplow/dbt-snowplow-web repo, with the full doc site hosted here.

Quickstart

Requirements

Prerequisites

Supported Warehouses

  • Redshift only.
  • BigQuery and Snowflake support to be added soon.

Installing the package

Add the snowplow-web package to your packages.yml file. For more information refer to dbt’s package hub.

Essential Configuration

1 – Check source data

This package will by default assume your Snowplow events data is contained in the atomic schema of your target.database. In order to change this, please add the following to your dbt_project.yml file:

# dbt_project.yml ... vars: snowplow_web: snowplow__atomic_schema: schema_with_snowplow_events snowplow__database: database_with_snowplow_events
Code language: YAML (yaml)
2 – Enabled desired contexts

The web package has the option to join in data from the following 3 Snowplow enrichments:

By default these are all disabled in the web package. Assuming you have the enrichments turned on in your Snowplow pipeline, to enable the contexts within the package please add the following to your dbt_project.yml file:

# dbt_project.yml ... vars: snowplow_web: snowplow__enable_iab: true snowplow__enable_ua: true snowplow__enable_yauaa: true
Code language: YAML (yaml)
3 – Filter your data set

You can specify both start_date at which to start processing events and the app_id‘s to filter for. By default the start_date is set to 2020-01-01 and all app_id‘s are selected. To change this please add the following to your dbt_project.yml file:

# dbt_project.yml ... vars: snowplow_web: snowplow__start_date: 'yyyy-mm-dd' snowplow__app_id: ['my_app_1','my_app_2']
Code language: YAML (yaml)
4 – Verify page ping variables

The web package processes page ping events to calculate web page engagement times. If your tracker configuration for min_visit_length (default 5) and heartbeat (default 10) differs from the defaults provided in this package, you can override by adding to your dbt_project.yml:

# dbt_project.yml ... vars: snowplow_web: snowplow__min_visit_length: 5 # Default value snowplow__heartbeat: 10 # Default value
Code language: YAML (yaml)

Further configuration

There are many additional configurations you can make to the model such as changing destination schemas, disabling modules and removing bot traffic.

For more details please refer to the snowplow-web package documentation.

Operation

The Snowplow web model is designed to be run as a whole, which ensures all incremental tables are kept in sync. As such, we suggest running the model using:

dbt run --models snowplow_web tag:snowplow_web_incremental
Code language: Bash (bash)

We strongly advise reading the operation section on the full doc site for more information on operations such as full-refreshes of the web model and backfilling data, since operations such as full refreshes deviate slightly from the native dbt implementation.

Testing

A full suite of tests are included within the package to ensure data quality. For more information on these and suggested implementations when running in orchestration please refer to the doc site.

Customizing the web model

The package is designed in a modular manner, allowing you to easily add in your own custom SQL as well incorporate data from events outside of page views and page pings.

For more information on customizing the model please refer to the full doc site. A dummy example of dbt project with customisations applied can also be found in the package repo.

If you’d like to learn more about Snowplow Insights you can book a demo with our team, or if you’d prefer, you can try Snowplow technology for yourself quickly and easily.