Getting started on Snowplow Open Source

Run the collector locally

It is useful to run the collector locally for testing.

In order to run the collector, you will need:

  • the Scala Stream Collector, downloaded from Bintray
  • a configuration file, filled in as detailed in the preceding section

To run the collector locally, you’ll also need to authenticate the machine where the collector will run:

$ gcloud auth login
$ gcloud auth application-default login
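Before starting the collector, it can help to confirm that application-default credentials are actually in place. The sketch below is a hypothetical helper (the name `adc_available` is ours, not part of gcloud or Snowplow). It assumes the default credential location that `gcloud auth application-default login` writes on Linux and macOS, and it also accepts an explicit key file set through `GOOGLE_APPLICATION_CREDENTIALS`.

```shell
#!/bin/sh
# Hypothetical helper: succeed if Google application-default credentials
# (ADC) appear to be available to the collector on this machine.
adc_available() {
  # An explicit key file, if set, takes precedence over the gcloud default.
  if [ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    [ -f "$GOOGLE_APPLICATION_CREDENTIALS" ]
    return
  fi
  # Assumed default location written by `gcloud auth application-default login`
  # on Linux and macOS.
  [ -f "${HOME}/.config/gcloud/application_default_credentials.json" ]
}
```

For example, `adc_available || gcloud auth application-default login` only prompts for a login when no credentials are found.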

Then, running it is just a matter of executing the following command:

$ java -jar snowplow-stream-collector-google-pubsub-<VERSION>.jar --config config.hocon
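If you restart the collector often while testing, a small wrapper around this command can catch the most common mistakes (wrong version in the jar name, missing config file, no Java installed) before the JVM starts. This is a sketch: the function name `run_collector` is hypothetical, and it assumes the jar sits in the current directory under the file name used above.

```shell
#!/bin/sh
# Hypothetical wrapper around the command above: checks that the jar, the
# config file and java are all present before starting the collector.
run_collector() {
  if [ $# -lt 1 ]; then
    echo "usage: run_collector VERSION [CONFIG]" >&2
    return 2
  fi
  version="$1"                   # collector release you downloaded
  config="${2:-config.hocon}"    # defaults to the file name used above
  jar="snowplow-stream-collector-google-pubsub-${version}.jar"

  if [ ! -f "$jar" ]; then
    echo "collector jar not found: $jar" >&2
    return 1
  fi
  if [ ! -f "$config" ]; then
    echo "config file not found: $config" >&2
    return 1
  fi
  if ! command -v java >/dev/null 2>&1; then
    echo "java is not on PATH" >&2
    return 1
  fi

  # Runs in the foreground; stop it with Ctrl-C.
  java -jar "$jar" --config "$config"
}
```

Usage: `run_collector <VERSION> config.hocon`.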

Running the collector on a GCP instance

To run the collector on a single GCP instance, you’ll first need to spin one up:

  • Go to the GCP dashboard and, once again, make sure your project is selected.
  • Click the hamburger menu in the top-left corner and select Compute Engine (under Compute).
  • Enable billing if you haven’t already (if billing is not enabled, the only option you’ll see at this point is a button to do so).
  • Click “Create instance” and pick the appropriate settings for your case, making sure of at least the following:
    • Under Access scopes, select “Set access for each API” and enable “Cloud PubSub”
    • Under Firewall, select “Allow HTTP traffic”
    • Optional: click “Management, disk, networking, SSH keys” and, under Networking, add a tag such as “collector”. (This is needed for the tagged firewall rule explained below.)
  • Click the hamburger menu in the top-left corner and click “VPC Network” (under Networking).
  • On the sidebar, click on “Firewall rules”
  • Click “Create Firewall Rule”
  • Name your rule
  • Under Source filter pick “Allow from any source”
  • Under Protocols and ports add “tcp:8080”
    • Note that 8080 is the port assigned to the collector in the configuration file. If you choose another port here, change the port in the config file to match.
  • Under Target tags, add the tag you gave your instance (here, “collector”)
  • Click “Create”
  • Then, in your Cloud Storage bucket, click “Upload Files” and upload your configuration file (the instance copies it from this bucket in the next step)
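The instance and firewall steps above can also be done from the command line with `gcloud`. The sketch below is one possible mapping, not an official recipe: the instance name, zone and rule name are hypothetical placeholders, `http-server` is the network tag the console applies for “Allow HTTP traffic”, and the `storage-ro` scope is an assumption so that the instance can later read the config file from your bucket. By default it only prints the commands; set `DRY_RUN=0` to run them.

```shell
#!/bin/sh
# Sketch: command-line equivalent of the console steps above.
# All defaults below are hypothetical placeholders; override them as needed.
INSTANCE="${INSTANCE:-collector-1}"   # hypothetical instance name
ZONE="${ZONE:-europe-west1-b}"        # hypothetical zone
TAG="${TAG:-collector}"               # network tag targeted by the firewall rule
RULE="${RULE:-allow-collector-8080}"  # hypothetical firewall rule name
PORT="${PORT:-8080}"                  # must match the port in the config file

# Print each command; only execute it when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

create_collector_instance() {
  # pubsub: the "Cloud PubSub" access scope; storage-ro: read the config bucket.
  # http-server: the tag added by "Allow HTTP traffic".
  run gcloud compute instances create "$INSTANCE" \
    --zone="$ZONE" \
    --scopes=pubsub,storage-ro \
    --tags=http-server,"$TAG"
}

create_collector_firewall_rule() {
  # "Allow from any source" + "tcp:8080" + the instance's target tag.
  run gcloud compute firewall-rules create "$RULE" \
    --allow="tcp:$PORT" \
    --source-ranges=0.0.0.0/0 \
    --target-tags="$TAG"
}
```

Call `create_collector_instance` and `create_collector_firewall_rule` once to review the printed commands, then again with `DRY_RUN=0` to apply them.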

Once you have your config file in place, ssh into your instance:

$ gcloud compute ssh your-instance-name --zone your-instance-zone

And then run:

$ sudo apt-get update
$ sudo apt-get -y install default-jre
$ sudo apt-get -y install unzip
$ wget https://dl.bintray.com/snowplow/snowplow-generic/snowplow_scala_stream_collector_google_pubsub_<VERSION>.zip
$ gsutil cp gs://<YOUR-BUCKET-NAME>/<YOUR-CONFIG-FILE-NAME> .
$ unzip snowplow_scala_stream_collector_google_pubsub_<VERSION>.zip
$ java -jar snowplow-stream-collector-google-pubsub-<VERSION>.jar --config <YOUR-CONFIG-FILE-NAME>
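Once the collector is running, you can check from your own machine that it is reachable through the firewall rule. This sketch assumes the collector exposes a `/health` endpoint that returns a 2xx status when it is up (check the documentation for your collector version) and that port 8080 matches your configuration. The function name `wait_for_collector` is hypothetical, and `<EXTERNAL_IP>` stands for the instance’s external IP shown in the Compute Engine console.

```shell
#!/bin/sh
# Hypothetical helper: poll the collector's health endpoint until it answers
# with a 2xx status, or give up after a number of attempts.
wait_for_collector() {
  url="$1"              # e.g. http://<EXTERNAL_IP>:8080/health
  attempts="${2:-10}"   # how many times to try before giving up
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -s: quiet; -f: fail on HTTP errors; --max-time: don't hang on a closed port
    if curl -sf --max-time 5 -o /dev/null "$url"; then
      echo "collector is up at $url"
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "collector did not become healthy at $url" >&2
  return 1
}
```

The same helper works for a local run, e.g. `wait_for_collector http://localhost:8080/health`.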