Beam Enrich

Overview

Beam Enrich is the latest real-time enrichment platform developed at Snowplow Analytics. It takes as input a stream of raw events collected by the Scala Stream Collector, enriches them using scala-common-enrich, and outputs two streams: one of successfully enriched events and one of events that failed enrichment.

Technical details

Beam Enrich is built on top of Apache Beam and its Scala wrapper, Scio.

It reads the raw data that the Scala Stream Collector writes to a GCP PubSub topic, enriches it using scala-common-enrich, and writes the successfully enriched events and the events that failed enrichment to their respective PubSub topics.

It runs on GCP’s Dataflow.
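
The read, enrich, and split flow described in this section can be illustrated with a minimal, plain-Python sketch. It is not Beam Enrich's actual code and does not use Beam, Scio, or PubSub: the `enrich` function is a hypothetical stand-in for scala-common-enrich, the `key=value&key=value` payload format is an assumption made for the example, and the two returned lists stand in for the enriched and failed output topics.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def enrich(raw: str) -> Dict[str, str]:
    """Hypothetical stand-in for enrichment: parse 'key=value&key=value' payloads."""
    if not raw:
        raise ValueError("empty payload")
    pairs = [part.split("=", 1) for part in raw.split("&")]
    if any(len(pair) != 2 for pair in pairs):
        raise ValueError(f"malformed payload: {raw!r}")
    return {key: value for key, value in pairs}


def split_stream(
    raw_events: Iterable[str],
    enrich_fn: Callable[[str], Dict[str, str]],
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Route each raw event to the 'good' or 'bad' output depending on enrichment."""
    good: List[Dict[str, str]] = []
    bad: List[Dict[str, str]] = []
    for raw in raw_events:
        try:
            good.append(enrich_fn(raw))
        except ValueError as err:
            # Failed events keep the original payload alongside the failure reason.
            bad.append({"payload": raw, "error": str(err)})
    return good, bad


good, bad = split_stream(["e=pv&aid=shop", "broken", ""], enrich)
```

In this sketch a single bad payload never stops the run; it is diverted to the second output, mirroring the separate topic for events that failed enrichment.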

Beam Enrich output

Beam Enrich turns the raw events into TSV enriched events that follow our Canonical event model. These events are ready to be fed into the BigQuery Loader, which takes care of loading the data into BigQuery.
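
As a rough illustration of what consuming a TSV enriched event looks like, the sketch below splits one tab-separated line into named fields. The `FIELDS` list is a short, hypothetical subset chosen for the example; the real Canonical event model defines many more columns, and their exact names and order should be taken from the model's documentation rather than from this snippet.

```python
from typing import Dict, List, Optional

# Hypothetical, shortened column list used only for this example.
FIELDS: List[str] = ["app_id", "platform", "event", "event_id"]


def parse_enriched_line(
    line: str, fields: List[str] = FIELDS
) -> Dict[str, Optional[str]]:
    """Map one tab-separated enriched event line onto the given field names."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(fields):
        raise ValueError(
            f"expected {len(fields)} columns, got {len(values)}"
        )
    # Empty columns are represented as None rather than empty strings.
    return {name: (value or None) for name, value in zip(fields, values)}


event = parse_enriched_line("shop\tweb\tpage_view\tabc-123\n")
```

Checking the column count up front is a deliberate choice here: a line with missing or extra tabs would otherwise silently shift values into the wrong fields.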
