Beam Enrich is packaged as a Docker image. When run, the Docker container creates the Dataflow job that enriches events.
For instance, the container can be run from Kubernetes Engine or from a Compute Engine instance. It can also be run from elsewhere, as long as it can communicate with Dataflow and has sufficient permissions to create a Dataflow job.
The Docker container can be run with the following command:
# The GOOGLE_APPLICATION_CREDENTIALS line is only needed when running outside GCP.
# The --pii flag is optional.
docker run \
  -v "$PWD/config":/snowplow/config \
  -e GOOGLE_APPLICATION_CREDENTIALS=/snowplow/config/credentials.json \
  snowplow/beam-enrich:latest \
  --runner=DataFlowRunner \
  --project=project-id \
  --streaming=true \
  --zone=europe-west2-a \
  --gcpTempLocation=gs://location/ \
  --job-name=beam-enrich \
  --raw=projects/project/subscriptions/raw-topic-subscription \
  --enriched=projects/project/topics/enriched-topic \
  --bad=projects/project/topics/bad-topic \
  --pii=projects/project/topics/pii-topic \
  --resolver=/snowplow/config/iglu_resolver.json \
  --enrichments=/snowplow/config/enrichments/
This assumes that you have a config folder containing your resolver and your enrichments (as well as your GCP credentials if you’re running Beam Enrich outside of GCP) in the current directory.
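The expected layout can be checked before launching the container. The following is a minimal sketch, not part of Beam Enrich: the function name check_config is hypothetical, and the file names are taken from the paths used in the command above (iglu_resolver.json, enrichments/, credentials.json).

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: confirm the config folder has the layout
# that the docker run command mounts at /snowplow/config.
check_config() {
  local dir="${1:-$PWD/config}"
  local status=0

  # Passed to the job as --resolver=/snowplow/config/iglu_resolver.json
  if [ ! -f "$dir/iglu_resolver.json" ]; then
    echo "missing: $dir/iglu_resolver.json" >&2
    status=1
  fi

  # Passed to the job as --enrichments=/snowplow/config/enrichments/
  if [ ! -d "$dir/enrichments" ]; then
    echo "missing: $dir/enrichments/" >&2
    status=1
  fi

  # Credentials are only needed outside GCP, so their absence is a note, not an error
  if [ ! -f "$dir/credentials.json" ]; then
    echo "note: $dir/credentials.json not found (fine when running inside GCP)" >&2
  fi

  return "$status"
}
```

After sourcing the function, running check_config before docker run reports any missing resolver file or enrichments directory and returns a non-zero status.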
Alternatively, if you compiled it from source, Beam Enrich can be run directly:
# The --pii flag is optional.
./bin/snowplow-beam-enrich \
  --runner=DataFlowRunner \
  --project=project-id \
  --streaming=true \
  --zone=europe-west2-a \
  --gcpTempLocation=gs://location/ \
  --job-name=beam-enrich \
  --raw=projects/project/subscriptions/raw-topic-subscription \
  --enriched=projects/project/topics/enriched-topic \
  --bad=projects/project/topics/bad-topic \
  --pii=projects/project/topics/pii-topic \
  --resolver=iglu_resolver.json \
  --enrichments=enrichments/
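Because the PII topic is optional, a wrapper script may want to add that flag only when a topic is configured. The sketch below is one possible way to do this in Bash. The function name beam_enrich_args and the PII_TOPIC environment variable are illustrative assumptions and are not read by Beam Enrich itself. The remaining values are the placeholders from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Beam Enrich arguments, one per line.
# PII_TOPIC is an illustrative variable name chosen for this sketch.
beam_enrich_args() {
  local args=(
    --runner=DataFlowRunner
    --project=project-id
    --streaming=true
    --zone=europe-west2-a
    --gcpTempLocation=gs://location/
    --job-name=beam-enrich
    --raw=projects/project/subscriptions/raw-topic-subscription
    --enriched=projects/project/topics/enriched-topic
    --bad=projects/project/topics/bad-topic
    --resolver=iglu_resolver.json
    --enrichments=enrichments/
  )

  # --pii is optional: append it only when a topic was supplied
  if [ -n "${PII_TOPIC:-}" ]; then
    args+=("--pii=$PII_TOPIC")
  fi

  printf '%s\n' "${args[@]}"
}
```

With Bash 4 or later, the output can be read into an array with mapfile -t args < <(beam_enrich_args) and then passed on as ./bin/snowplow-beam-enrich "${args[@]}".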
You can also display a help message that describes every Beam Enrich-specific option:
./bin/snowplow-beam-enrich --runner=DataFlowRunner --help
Tests and debugging
Testing
The tests for this codebase can be run with sbt "project beam" test.
Debugging
You can run the job locally and experiment with its different parts using the Scio REPL by running sbt repl/run.