Run Beam Enrich

Beam Enrich is packaged as a Docker image. When run, the Docker container creates the Dataflow job that enriches events.

For instance, the container can be run from Kubernetes Engine or from a Compute Engine instance. It can also be run from elsewhere, as long as it can communicate with Dataflow and has sufficient permissions to create a Dataflow job.

The Docker container can be run with the following command:

# GOOGLE_APPLICATION_CREDENTIALS is only needed when running outside GCP.
# The --pii option is optional.
docker run \
  -v "$PWD/config:/snowplow/config" \
  -e GOOGLE_APPLICATION_CREDENTIALS=/snowplow/config/credentials.json \
  snowplow/beam-enrich:latest \
  --runner=DataFlowRunner \
  --project=project-id \
  --streaming=true \
  --zone=europe-west2-a \
  --gcpTempLocation=gs://location/ \
  --job-name=beam-enrich \
  --raw=projects/project/subscriptions/raw-topic-subscription \
  --enriched=projects/project/topics/enriched-topic \
  --bad=projects/project/topics/bad-topic \
  --pii=projects/project/topics/pii-topic \
  --resolver=/snowplow/config/iglu_resolver.json \
  --enrichments=/snowplow/config/enrichments/

This assumes that you have a config folder containing your resolver and your enrichments (as well as your GCP credentials if you’re running Beam Enrich outside of GCP) in the current directory.
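Since the Docker command mounts the config folder as a whole, it can help to check its layout before launching the container. The following is a minimal sketch, not part of Beam Enrich: the function name check_beam_enrich_config is hypothetical, and the file names are taken from the paths used in the command above.

```shell
# Hypothetical helper (not part of Beam Enrich): checks that a config folder
# contains the files referenced by the docker command above.
check_beam_enrich_config() {
  dir="${1:-$PWD/config}"
  missing=0

  # The resolver file and the enrichments folder are passed via
  # --resolver and --enrichments.
  if [ ! -f "$dir/iglu_resolver.json" ]; then
    echo "missing: $dir/iglu_resolver.json" >&2
    missing=1
  fi
  if [ ! -d "$dir/enrichments" ]; then
    echo "missing: $dir/enrichments/" >&2
    missing=1
  fi

  # Credentials are only needed when running outside GCP, so their absence
  # is reported but not treated as an error.
  if [ ! -f "$dir/credentials.json" ]; then
    echo "note: no $dir/credentials.json (only needed outside GCP)" >&2
  fi

  return "$missing"
}
```

Calling check_beam_enrich_config with no argument checks the config folder in the current directory; a non-zero exit status means the resolver or the enrichments folder is missing.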

Alternatively, if you compiled it from source, Beam Enrich can be run directly:

# The --pii option is optional.
./bin/snowplow-beam-enrich \
  --runner=DataFlowRunner \
  --project=project-id \
  --streaming=true \
  --zone=europe-west2-a \
  --gcpTempLocation=gs://location/ \
  --job-name=beam-enrich \
  --raw=projects/project/subscriptions/raw-topic-subscription \
  --enriched=projects/project/topics/enriched-topic \
  --bad=projects/project/topics/bad-topic \
  --pii=projects/project/topics/pii-topic \
  --resolver=iglu_resolver.json \
  --enrichments=enrichments/

You can also display a help message that describes all Beam Enrich-specific options:

./bin/snowplow-beam-enrich --runner=DataFlowRunner --help
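The --raw, --enriched, --bad and --pii options in the commands above take fully qualified Pub/Sub resource names: projects/<project>/subscriptions/<subscription> for the input, and projects/<project>/topics/<topic> for the outputs. As a rough sketch, these names can be assembled from a project ID so that the project is only written once. The helper functions and variable names below are hypothetical, and the resource IDs are the placeholders used above.

```shell
# Hypothetical value; replace with your own GCP project ID.
PROJECT_ID="project-id"

# Hypothetical helpers that build fully qualified Pub/Sub resource names.
pubsub_subscription() { printf 'projects/%s/subscriptions/%s' "$1" "$2"; }
pubsub_topic() { printf 'projects/%s/topics/%s' "$1" "$2"; }

# Placeholder resource IDs, matching the example commands above.
RAW="$(pubsub_subscription "$PROJECT_ID" raw-topic-subscription)"
ENRICHED="$(pubsub_topic "$PROJECT_ID" enriched-topic)"
BAD="$(pubsub_topic "$PROJECT_ID" bad-topic)"

# Print the resulting options so they can be reviewed before use.
echo "--project=$PROJECT_ID --raw=$RAW --enriched=$ENRICHED --bad=$BAD"
```

The variables can then be passed to either command, for example as --raw="$RAW".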

Tests and debugging

Testing

The tests for this codebase can be run with sbt "project beam" test.

Debugging

You can run the job locally and experiment with its different parts using the
Scio REPL by running sbt repl/run.