The Snowplow collector is a version of the Snowplow Scala Stream Collector that, in production, should be set up as an autoscaling application behind a load balancer. It receives data sent from different sources over HTTP(S) and writes it to a raw Pub/Sub topic. From there, the data is picked up and processed by the Snowplow validation and enrichment job.
To set up the collector, we are going to:
- Set up the required Pub/Sub topics: a good topic for data that the collector processes successfully, and a bad topic for any data that it does not.
- Set up, configure, and run the collector application:
  - Locally (for testing)
  - As a single-instance VM (e.g. for a development environment)
  - As an autoscaling group of instances behind a load balancer (recommended for production)
- Configuring the collector
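
As a rough illustration of the first step, the sketch below builds the `gcloud pubsub topics create` commands for the two topics, along with the full resource paths that Pub/Sub uses for them. The topic IDs `good` and `bad` follow the description above; the project ID `my-gcp-project` and the helper names are hypothetical placeholders. The script only prints the commands and does not run them.

```python
import shlex


def topic_path(project_id: str, topic_id: str) -> str:
    """Return the fully qualified Pub/Sub resource path for a topic."""
    return f"projects/{project_id}/topics/{topic_id}"


def create_topic_commands(project_id: str, topic_ids=("good", "bad")) -> list:
    """Build one `gcloud pubsub topics create` command per topic ID.

    The default topic IDs are assumptions taken from the text above;
    substitute whatever names your collector configuration will use.
    """
    return [
        shlex.join(
            ["gcloud", "pubsub", "topics", "create", topic_id, "--project", project_id]
        )
        for topic_id in topic_ids
    ]


if __name__ == "__main__":
    project = "my-gcp-project"  # hypothetical project ID, replace with your own
    for command in create_topic_commands(project):
        print(command)
    # Resource paths for the same topics, as they appear in the Pub/Sub API.
    for topic in ("good", "bad"):
        print(topic_path(project, topic))
```

Running the printed commands in a shell where the Google Cloud SDK is authenticated should create both topics. Whichever names you choose must match the topic names you later give the collector in its configuration.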