Getting started on Snowplow Open Source

Set up the Snowplow collector

The Snowplow collector is a version of the Snowplow Scala Stream Collector that should, in production, be set up as an autoscaling application behind a load balancer. It receives data sent from different sources over HTTP(S) and writes it to a raw Pub/Sub topic. From there, the data is picked up and processed by the Snowplow validation and enrichment job.
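To make the "data sent over HTTP(S)" part concrete, here is a minimal sketch of how a client might build a single page-view request for the collector. It assumes the collector exposes the pixel endpoint at `/i` and accepts Snowplow tracker protocol query parameters (`e`, `p`, `aid`, `url`). The function name `build_pixel_url`, the address `http://localhost:8080` and the app id `test-app` are illustrative placeholders, not values taken from this guide.

```python
from urllib.parse import urlencode


def build_pixel_url(collector_base: str, app_id: str, page_url: str) -> str:
    """Build a GET URL carrying one page-view event.

    Assumption (not from this guide): the collector serves the pixel
    endpoint at /i and reads tracker protocol query parameters.
    """
    params = {
        "e": "pv",        # event type: page view
        "p": "web",       # platform
        "aid": app_id,    # application id (placeholder supplied by the caller)
        "url": page_url,  # URL of the page being reported
    }
    # urlencode percent-encodes the values so the page URL is safe in a query string.
    return f"{collector_base.rstrip('/')}/i?{urlencode(params)}"


if __name__ == "__main__":
    # localhost:8080 is an assumed address for a collector running locally.
    print(build_pixel_url("http://localhost:8080", "test-app", "http://example.com/"))
```

Requesting the resulting URL against a running collector should, under these assumptions, result in one event being written to the raw Pub/Sub topic.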

To set up the collector, we are going to:

  1. Set up the required Pub/Sub topics: a good topic for data that the collector processes successfully, and a bad topic for any data that it does not.
  2. Set up, configure and run the collector application
    1. Locally (for testing)
    2. As a single instance VM (e.g. for a development environment)
    3. As an autoscaling group of instances behind a load balancer (recommended for production)
  3. Configure the collector
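As a rough illustration of step 1, the sketch below builds the `gcloud pubsub topics create` commands for the two topics. The topic names `good` and `bad`, the project id `my-project` and the helper names are assumptions made for this example. Running the commands requires an installed and authenticated `gcloud` CLI, so the example only prints them by default.

```python
import subprocess


def topic_create_commands(project_id: str, topics=("good", "bad")) -> list[list[str]]:
    """Return one `gcloud pubsub topics create` command per topic.

    The default topic names are assumed for illustration only.
    """
    return [
        ["gcloud", "pubsub", "topics", "create", name, "--project", project_id]
        for name in topics
    ]


def create_topics(project_id: str) -> None:
    """Run the commands; needs an installed, authenticated gcloud CLI."""
    for cmd in topic_create_commands(project_id):
        # check=True raises CalledProcessError if gcloud exits with an error.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "my-project" is a placeholder project id; print the commands instead of running them.
    for cmd in topic_create_commands("my-project"):
        print(" ".join(cmd))
```

Keeping command construction separate from execution makes it easy to review what would be run before calling `create_topics`.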