
GCP – Beam

The Beam job is a dockerized, purpose-built Dataflow job that reads bad rows from a GCS location specified by a pattern, publishes recovered payloads to a Pub/Sub topic, and writes unrecovered and unrecoverable events to separate GCS buckets.


To run the job, a service account with appropriate credentials must be set up in the project.

To avoid disrupting any service accounts required for normal pipeline operations, it is recommended to create a dedicated service account for recovery.

To create a new service account, navigate to IAM & Admin > Service Accounts in your GCP console. Create a new account and grant it full project owner permissions.

Generate a new key for the service account and save it to your local machine.
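The same setup can also be scripted with the gcloud CLI. A sketch, assuming an authenticated gcloud session and hypothetical project and account names (substitute your own):

```shell
# Hypothetical names — substitute your own project and service account.
PROJECT_ID="my-pipeline-project"
SA_NAME="event-recovery"
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

# Create the account, grant it project owner, and save a key locally.
gcloud iam service-accounts create "$SA_NAME" --project="$PROJECT_ID"
gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member="serviceAccount:${SA_EMAIL}" \
  --role="roles/owner"
gcloud iam service-accounts keys create credentials.json \
  --iam-account="$SA_EMAIL"
```

The resulting credentials.json is the key file you will mount into the Docker container below.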


To build the docker image, run:

sbt beam/docker:publishLocal


To run the job on Apache Beam in GCP Dataflow, deploy the Docker image with the options shown below.

docker run \
  -v {{path-to-local-key-file}}.json:/snowplow/config/credentials.json \
  -e GOOGLE_APPLICATION_CREDENTIALS=/snowplow/config/credentials.json \
  snowplow-docker-registry.bintray.io/snowplow/snowplow-event-recovery-beam:0.3.0-rc14 \
  --runner=DataFlowRunner \
  --project={{your-gcp-project-name}} \
  --name=event-recovery-rt-pipeline \
  --zone={{your-project-zone}} \
  --gcpTempLocation=gs://sp-storage-loader-tmp-{{pipeline_tag_e.g._prod1}}-{{pipeline_name}}/temp \
  --inputDirectory=gs://sp-storage-loader-bad-{{pipeline_tag_e.g._prod1}}-{{project_name}}/partitioned/** \
  --outputTopic=projects/{{project_name}}/topics/{{recovery-topic}} \
  --failedOutput=gs://{{create_or_choose_a_failed_output_folder}}/failed/ \
  --unrecoverableOutput=gs://{{create_or_choose_an_unrecoverable_folder}}/unrecoverable/ \
  --config={{base64-encoded-string of your recovery configuration}} \
  --resolver={{base64-encoded-string of a schema resolver file (Iglu Central ok for default)}}

As this is a self-contained Docker image, the first flag mounts your service account key file into the container so the job can use it for authentication.

Note that for the zone, GCP may require you to use an -a or -b zone (e.g. europe-west2-b).

Also note that no quotation marks (") are needed around any of the string values.
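The --config and --resolver flags expect base64-encoded strings rather than file paths. A minimal sketch of preparing the resolver string (the file name is arbitrary, and the resolver below points only at Iglu Central, which is fine as a default):

```shell
# A minimal Iglu resolver configuration pointing at Iglu Central.
cat > resolver.json <<'EOF'
{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": ["com.snowplowanalytics"],
        "connection": { "http": { "uri": "http://iglucentral.com" } }
      }
    ]
  }
}
EOF

# Base64-encode without line wrapping (GNU coreutils; on macOS,
# plain `base64 -i resolver.json` already emits a single line).
RESOLVER_B64=$(base64 -w0 resolver.json)
echo "$RESOLVER_B64"
```

Encode your recovery configuration file the same way, then pass the resulting strings to --resolver and --config respectively.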

If you’d like to learn more about Snowplow Insights you can book a demo with our team, or if you’d prefer, you can try Snowplow technology for yourself quickly and easily.