Set up the Snowplow collector

The Snowplow collector is a version of the Snowplow Scala Stream Collector that should, in production, be set up as an autoscaling application behind a load balancer. It receives data sent from different sources over HTTP(S) and writes it to a raw Pub/Sub topic. From there, the data is picked up and processed by the Snowplow validation and enrichment job.

To set up the collector, we are going to:

  1. Set up the required Pub/Sub topics: a good topic for data that the collector processes successfully, and a bad topic for any data that it does not.
  2. Set up, configure and run the collector application
    1. Locally (for testing)
    2. As a single-instance VM (e.g. for a development environment)
    3. As an autoscaling group of instances behind a load balancer (recommended for production)
  3. Configure the collector
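
As a rough illustration of step 1, the sketch below builds the `gcloud pubsub topics create` commands for the two topics. It only constructs and prints the commands; it does not run them. The topic names `good` and `bad` follow the description above, while the project ID `my-snowplow-project` and the helper names `topic_path` and `create_topic_commands` are hypothetical and chosen for this example only.

```python
import shlex


def topic_path(project_id: str, topic: str) -> str:
    """Return the fully qualified Pub/Sub topic name."""
    return f"projects/{project_id}/topics/{topic}"


def create_topic_commands(project_id: str, topics=("good", "bad")):
    """Build one `gcloud pubsub topics create` command per topic.

    Each command is returned as an argument list so that it can be
    passed to subprocess.run() without shell quoting issues.
    """
    return [
        ["gcloud", "pubsub", "topics", "create", topic, "--project", project_id]
        for topic in topics
    ]


if __name__ == "__main__":
    # "my-snowplow-project" is a placeholder (assumption); use your own project ID.
    project = "my-snowplow-project"
    for command in create_topic_commands(project):
        print(shlex.join(command))
    for name in ("good", "bad"):
        print(topic_path(project, name))
```

To actually create the topics, each argument list could be passed to `subprocess.run(command, check=True)` on a machine where the `gcloud` CLI is installed and authenticated. The fully qualified names from `topic_path` are the form Pub/Sub itself uses to identify a topic.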