
Beam

The Beam job reads data from a GCS location specified through a file pattern, writes the recovered payloads to a PubSub topic, and stores unrecovered and unrecoverable payloads in separate GCS buckets.

Building

To build the Docker image, run:

sbt beam/docker:publishLocal
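
Assuming the build succeeds, sbt publishes the image to your local Docker daemon under the name used in the run command below; you can confirm it is available with:

docker images snowplow-event-recovery-beam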

Running

To run the job on GCP Dataflow using the Apache Beam runner, launch it through a Docker deployment like so:

docker run \
  snowplow-event-recovery-beam:0.2.0 \
  --runner=DataflowRunner \
  --job-name=${JOB_NAME} \
  --project=${PROJECT_ID} \
  --zone=${ZONE} \
  --gcpTempLocation=gs://${TEMP_BUCKET_PATH} \
  --inputDirectory=gs://${SOURCE_BUCKET_PATH}/** \
  --outputTopic=${OUTPUT_PUBSUB} \
  --failedOutput=gs://${UNRECOVERED_BUCKET_PATH} \
  --unrecoverableOutput=gs://${UNRECOVERABLE_BUCKET_PATH} \
  --config=${JOB_CONFIG} \
  --resolver=${RESOLVER_CONFIG}
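
The placeholder variables must be set in the invoking shell before launching. Below is a minimal sketch with hypothetical project, zone, bucket, and topic names; treating ${JOB_CONFIG} and ${RESOLVER_CONFIG} as base64-encoded file contents is an assumption here, so check the configuration documentation for the exact format your version expects:

# Hypothetical values for illustration only; substitute your own.
export JOB_NAME=event-recovery-1
export PROJECT_ID=my-gcp-project
export ZONE=europe-west2-a
export TEMP_BUCKET_PATH=my-temp-bucket/tmp
export SOURCE_BUCKET_PATH=my-bad-rows-bucket
export OUTPUT_PUBSUB=projects/my-gcp-project/topics/recovered   # hypothetical topic
export UNRECOVERED_BUCKET_PATH=my-unrecovered-bucket
export UNRECOVERABLE_BUCKET_PATH=my-unrecoverable-bucket
# Assumption: the job takes base64-encoded config and resolver strings.
# -w0 (GNU coreutils) disables line wrapping; on macOS, plain base64 suffices.
export JOB_CONFIG=$(base64 -w0 config.json)
export RESOLVER_CONFIG=$(base64 -w0 resolver.json)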