
Quick Start Installation Guide on GCP

This guide takes you through spinning up an open source pipeline using the Snowplow Terraform modules. Learn more about Infrastructure as Code with Terraform here.

Before you begin

Sign up on Discourse! If you run into any problems or have any questions, we are here to help.

If you are interested in receiving the latest updates from Product & Engineering, such as critical bug fixes, security updates, new features and the rest, then join our mailing list.

You can find more details on the infrastructure and applications that will be deployed in your cloud here.

Prerequisites

Select which example you want to use

The Quickstart Examples repository contains two different deployment strategies:

  • default
  • secure (Recommended for production use cases)

The main difference is the VPC that the components are deployed into. With default, you deploy everything into a public subnet. This is the easiest route if you want to try out Snowplow, as you can use your default network (auto mode VPC). However, to increase the security of your components, it is recommended best practice to deploy them into private subnets, which ensures they are not publicly accessible. To use the secure configuration you will need your own custom VPC network with public and private subnets. You can follow this guide for steps on how to create networks and subnetworks on GCP.

Setting up your Iglu Server

The first step is to set up your Iglu Server stack. This lets you create and evolve your own custom events & entities. Iglu stores the schemas for your events & entities and serves them to your pipeline as your events are processed.

We will go into more detail on why this is valuable and how to create your custom events & entities later. For now, you need to set this up first so that your pipeline (specifically the Enrich application and your Postgres loader) can communicate with Iglu.

Step 1: Update your input variables

Once you have cloned the quickstart-examples repository, you will need to navigate to the /gcp/iglu_server directory to update the input variables in terraform.tfvars.

git clone https://github.com/snowplow/quickstart-examples.git
cd quickstart-examples/terraform/gcp/iglu_server/default #or secure
nano terraform.tfvars #or other text editor of your choosing
Code language: Bash (bash)

To update your input variables, you’ll need to know a couple of things:

  • Your IP Address. Help.
  • A UUID for your Iglu Server's API key. Help.
  • If you have opted for secure, the network and subnetworks you will deploy your Iglu Server into.
    • If you are deploying to your default network then set network = default and leave subnetworks empty.
  • How to generate an SSH key.
    • On most systems you can generate an SSH key with: ssh-keygen -t rsa -b 4096
    • This will output where your public key is stored, for example: ~/.ssh/id_rsa.pub
    • You can get the value with cat ~/.ssh/id_rsa.pub
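
The UUID for the API key can also be produced from a shell. The following is a minimal sketch rather than part of the quickstart: the function names generate_api_key and is_uuid are made up for illustration, and it assumes either that uuidgen is installed or that you are on Linux, where the kernel exposes a random UUID at /proc/sys/kernel/random/uuid.

```bash
#!/usr/bin/env bash
# Sketch: generate and sanity-check a UUID to use as the Iglu Server API key.

# Print a lowercase UUID; prefers uuidgen and falls back to the Linux kernel source.
generate_api_key() {
  if command -v uuidgen >/dev/null 2>&1; then
    uuidgen | tr '[:upper:]' '[:lower:]'
  else
    cat /proc/sys/kernel/random/uuid
  fi
}

# Return 0 if the argument looks like a canonical lowercase UUID, non-zero otherwise.
is_uuid() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}

iglu_api_key="$(generate_api_key)"
if is_uuid "$iglu_api_key"; then
  echo "Iglu API key: $iglu_api_key"
else
  echo "could not generate a valid UUID" >&2
fi
```

Paste the printed value into terraform.tfvars as your Iglu Server API key, and keep it somewhere safe, since the same value is needed again when you seed the server and set up the pipeline.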

Step 2 (optional): Update telemetry settings

We want to make this experience as easy & as valuable as possible for open source users new to Snowplow, and so we have added (optional) telemetry. You can find further details on what we track here, along with our telemetry principles.

  • If you wish to subscribe to our mailing list for updates to these modules or security advisories, please set the user_provided_id variable to a valid email address where we can reach you.
    • Providing a consistent user_provided_id across your modules allows us to tie events together across applications so we can get a better understanding of unique users, and the topology of open source pipelines. This helps us to know how we can improve the experience going forward, so we really appreciate it being set!
  • To disable telemetry simply set variable telemetry_enabled = false.
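
If you would rather not edit terraform.tfvars for this, Terraform also accepts variable values on the command line through -var, and those take precedence over values set in terraform.tfvars. A sketch, assuming the telemetry_enabled variable described above; the without_telemetry wrapper is a hypothetical helper, not something the modules provide:

```bash
# Sketch: run a Terraform subcommand (for example plan or apply) with telemetry off.
# Usage: without_telemetry plan
without_telemetry() {
  local subcommand="$1"
  shift
  # -var overrides any telemetry_enabled value found in terraform.tfvars.
  terraform "$subcommand" -var 'telemetry_enabled=false' "$@"
}
```

The flag only affects the run it is passed to, so it has to be supplied on every plan and apply. Setting the value in terraform.tfvars is the more durable option.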

Step 3: Run the terraform script to set up your Iglu stack

You can now use terraform to create your Iglu Server stack.

terraform init
terraform plan
terraform apply
Code language: Bash (bash)

This will output your iglu_server_dns_name. Make a note of this, as you’ll need it when setting up your pipeline. If you have attached a custom SSL certificate and set up your own DNS records then you don’t need this value.
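
Terraform can print a single output value again later with terraform output -raw, which is handy for scripting. A minimal sketch, assuming it is run from the directory where you applied the Iglu Server stack and that iglu_server_dns_name is a bare host name without a scheme; the iglu_url helper is hypothetical:

```bash
# Sketch: build the Iglu Server base URL from the Terraform output.
# Assumes the output is a bare host name and the server is reached over plain HTTP.
iglu_url() {
  local dns_name
  # Fail early if Terraform cannot read the output (wrong directory, no state).
  dns_name="$(terraform output -raw iglu_server_dns_name)" || return 1
  printf 'http://%s\n' "$dns_name"
}
```

For example, IGLU_URL="$(iglu_url)" stores the value for use in later commands.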

Step 4: Seed your Iglu Server from Iglu Central

For your pipeline to work, you’ll need to seed your Iglu Server with the standard Snowplow schemas that are hosted in Iglu Central. To do this you will need igluctl, your Iglu Server's DNS name and the Iglu API key that you created for your terraform.tfvars. Update the igluctl command below with the correct values for your Iglu Server.

git clone https://github.com/snowplow/iglu-central
cd iglu-central
igluctl static push --public schemas/ http://CHANGE-TO-MY-IGLU-URL 00000000-0000-0000-0000-000000000000
Code language: Bash (bash)
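
Once the push completes, one way to spot-check it is to request a schema back from the server. The sketch below assumes that the Iglu Server serves schemas under /api/schemas/<vendor>/<name>/jsonschema/<version> and that the page_view schema was among those pushed. The helper names are hypothetical and the host is a placeholder.

```bash
# Sketch: build the lookup URL for one schema on an Iglu Server.
# Arguments: base URL, vendor, schema name, version (for example 1-0-0).
schema_url() {
  local base="${1%/}"   # drop a single trailing slash, if present
  printf '%s/api/schemas/%s/%s/jsonschema/%s\n' "$base" "$2" "$3" "$4"
}

# Fetch the schema; curl -f turns an HTTP error status into a non-zero exit code.
check_schema() {
  curl -fsS "$(schema_url "$@")"
}

# Example (replace the host with your own Iglu Server):
# check_schema http://CHANGE-TO-MY-IGLU-URL com.snowplowanalytics.snowplow page_view 1-0-0
```

If the request fails, check the Iglu Server address and re-run the igluctl push before moving on to the pipeline.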

Setting up your pipeline

In this section you will update the input variables for the terraform module, and then run the terraform script to set up your pipeline.  At the end you will have a working Snowplow pipeline.

Step 1: Update your input variables

Once you have cloned the quickstart-examples repository, you will need to navigate to the pipeline directory to update the input variables in terraform.tfvars.

git clone https://github.com/snowplow/quickstart-examples.git
cd quickstart-examples/terraform/gcp/pipeline/default #or secure
nano terraform.tfvars #or other text editor of your choosing
Code language: Bash (bash)

To update your input variables, you’ll need to know a couple of things:

  • Your IP Address. Help.
  • Your Iglu Server's DNS name from Setting up your Iglu Server.
  • The UUID for your Iglu Server's API key. Help.
  • If you have opted for secure, the network and subnetworks you will deploy your pipeline into.
    • If you are deploying to your default network then set network = default and leave subnetworks empty.
  • How to generate an SSH key.
    • On most systems you can generate an SSH key with: ssh-keygen -t rsa -b 4096
    • This will output where your public key is stored, for example: ~/.ssh/id_rsa.pub
    • You can get the value with cat ~/.ssh/id_rsa.pub

Step 2 (optional): Update telemetry settings

We want to make this experience as easy & as valuable as possible for open source users new to Snowplow, and so we have added (optional) telemetry. You can find further details on what we track here, along with our telemetry principles.

  • If you wish to subscribe to our mailing list for updates to these modules or security advisories, please set the user_provided_id variable to a valid email address where we can reach you.
    • Providing a consistent user_provided_id across your modules allows us to tie events together across applications so we can get a better understanding of unique users, and the topology of open source pipelines. This helps us to know where to invest our efforts going forward.
  • To disable telemetry simply set variable telemetry_enabled = false.

Step 3: Run the terraform script to set up your Pipeline stack

You can now use terraform to create your Pipeline stack.

terraform init
terraform plan
terraform apply
Code language: Bash (bash)

This will output your collector_dns_name, db_address, db_port and db_id. Make a note of these, as you’ll need them when sending events and connecting to your database. If you have attached a custom SSL certificate and set up your own DNS records then you don’t need your collector_dns_name, as you will use your own DNS record to send events from the Snowplow trackers.
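
Before sending real events, a quick request to the collector confirms that it is reachable. A minimal sketch, assuming the collector answers on a /health path over plain HTTP and that you run it from the directory where you applied the pipeline stack; the function names are hypothetical:

```bash
# Sketch: build the collector health-check URL from a host name.
collector_health_url() {
  printf 'http://%s/health\n' "$1"
}

# Read the collector host from the Terraform outputs and request the health path.
check_collector() {
  local host
  host="$(terraform output -raw collector_dns_name)" || return 1
  # curl -f makes an HTTP error status produce a non-zero exit code.
  curl -fsS "$(collector_health_url "$host")"
}
```

If you use your own DNS record and SSL certificate, call curl against that address over HTTPS instead of relying on collector_dns_name.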

Now let’s send some events to your pipeline!


Do you have any feedback for us?

We are really interested in understanding how you are finding the Quick Start and what we can do to better support you in getting started with our open source. If you have a moment, let us know via this short survey.
