Configuration options

A template configuration file is available on GitHub: config.hocon.sample.

Stream configuration

You will need to update the config file with your good and bad Pub/Sub topic names:

  • collector.streams.good: The name of the good input stream of the tool you choose as a sink. This is the stream where successfully collected events are stored.
  • collector.streams.bad: The name of the bad input stream of the tool you choose as a sink. This is the stream where events that are too big are stored.
  • collector.streams.useIpAddressAsPartitionKey: Whether to use the incoming event’s IP address as the partition key for the good stream/topic.
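
As a minimal sketch, the three stream settings above could be written as follows. The topic names snowplow-good and snowplow-bad are hypothetical placeholders, and the value chosen for useIpAddressAsPartitionKey is only an illustration:

```hocon
collector {
  streams {
    # Hypothetical topic names; replace with the Pub/Sub topics you created
    good = "snowplow-good"
    bad = "snowplow-bad"

    # Illustrative value: do not use the client IP as the partition key
    useIpAddressAsPartitionKey = false
  }
}
```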

HTTP settings

Also verify the settings of the HTTP service:

  • collector.interface
  • collector.port
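
For illustration, the two HTTP settings sit directly under the collector block. The values below are assumptions (bind to all interfaces, listen on port 8080), not recommendations; pick whatever suits your deployment:

```hocon
collector {
  # Assumed example values
  interface = "0.0.0.0"   # address the HTTP service binds to
  port = 8080             # port the HTTP service listens on
}
```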

Buffer settings

You will also need to set appropriate limits for:

  • collector.streams.buffer.byteLimit
  • collector.streams.buffer.recordLimit
  • collector.streams.buffer.timeLimit
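
A sketch of the buffer block is shown below. The numbers are arbitrary examples, and the units in the comments (bytes, record count, milliseconds) reflect our reading of the template file, so confirm them against config.hocon.sample for your version:

```hocon
collector {
  streams {
    buffer {
      # Arbitrary example limits; tune them for your traffic
      byteLimit = 1000000   # assumed unit: bytes
      recordLimit = 500     # assumed unit: number of records
      timeLimit = 5000      # assumed unit: milliseconds
    }
  }
}
```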

Sinks

The collector.streams.sink.enabled setting determines which of the supported sinks to write raw events to:

  • "kinesis" for writing Thrift-serialized records and error rows to a Kinesis stream
  • "googlepubsub" for writing Thrift-serialized records and error rows to a Google Cloud Pub/Sub topic
  • "stdout" for writing Base64-encoded Thrift-serialized records and error rows to stdout and stderr respectively
  • "kafka" for writing Thrift-serialized records and error rows to a Kafka topic
  • "nsq" for writing Thrift-serialized records and error rows to an NSQ topic

If you switch to "stdout", we recommend setting ‘akka.loglevel = OFF’ and ‘akka.loggers = []’ to prevent Akka debug information from polluting your event stream on stdout.

Fill in the rest of the collector.streams.sink section according to the sink you selected.

To use stdout as a sink, comment out everything in the collector.streams.sink section except collector.streams.sink.enabled, which should be set to stdout.
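
For a Pub/Sub sink, the section might look like the sketch below. The key googleProjectId is an assumption based on our reading of the template, and my-gcp-project is a hypothetical project ID; check config.hocon.sample for the exact keys your version expects:

```hocon
collector {
  streams {
    sink {
      enabled = "googlepubsub"

      # Assumed key name and hypothetical project ID
      googleProjectId = "my-gcp-project"
    }
  }
}
```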
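
Put together, a stdout setup could be sketched like this. The placement of the akka block at the top level of the file, outside collector, is an assumption based on the template layout:

```hocon
collector {
  streams {
    sink {
      # Only this key stays active; all other sink-specific keys are commented out
      enabled = "stdout"
    }
  }
}

# Silence Akka logging so that only events reach stdout
akka {
  loglevel = "OFF"
  loggers = []
}
```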

Setting the domain name

Set the cookie name using the collector.cookie.name setting.

Setting the domain name in collector.cookie.domain can be useful if you want to make the cookie accessible to other applications on your domain. For example, suppose the collector is set up on collector.snplow.com. If we do not set a domain name, the cookie will default to this domain. However, if we set it to .snplow.com, the cookie will be accessible to other applications running on *.snplow.com.

Please refer to RFC 6265 for the domain matching rules.
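
Using the snplow.com example, the cookie name and domain could be set as in this sketch. The cookie name sp is an illustrative assumption:

```hocon
collector {
  cookie {
    # Illustrative cookie name
    name = "sp"

    # Leading dot makes the cookie available to *.snplow.com
    domain = ".snplow.com"
  }
}
```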

Setting the cookie duration

The cookie expiration duration is set in collector.cookie.expiration. If no value is provided, cookies expire after one year (i.e. 365 days) by default. If you don’t want to set a third-party cookie at all, you can disable it by setting collector.cookie.enabled to false. Alternatively, from version 0.4.0 onwards, you can achieve the same result by setting collector.cookie.expiration to 0.
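
A short sketch of these settings follows. The "365 days" duration string is an assumed format that mirrors the default described above; verify the accepted format against the template:

```hocon
collector {
  cookie {
    # Set to false to stop the collector from setting a cookie at all
    enabled = true

    # Assumed duration format; 0 also disables the cookie (from version 0.4.0)
    expiration = "365 days"
  }
}
```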

Additional configuration options

There are a number of other options which should be set at your discretion based on your needs:

  • collector.p3p allows you to configure a P3P header
  • collector.crossDomain allows you to configure a cross domain policy
  • collector.doNotTrackCookie lets you specify a first-party cookie that the Scala Stream Collector checks for. If the cookie is present in an incoming request, the collector ignores that request.
  • collector.redirectMacro allows you to template your redirect URL
  • collector.rootResponse allows you to customize what the collector sends back when the / route is reached
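
As a rough sketch, three of these options might be configured as shown below. The sub-keys (enabled, name, value, placeholder, statusCode, body) are assumptions based on our reading of the template, and all values are hypothetical; confirm both against config.hocon.sample before use:

```hocon
collector {
  doNotTrackCookie {
    enabled = true
    # Hypothetical cookie name and value to match against
    name = "do-not-track"
    value = "true"
  }

  redirectMacro {
    enabled = true
    # Hypothetical placeholder token to substitute in the redirect URL
    placeholder = "[TOKEN]"
  }

  rootResponse {
    enabled = true
    # Hypothetical response returned when the / route is reached
    statusCode = 200
    body = "ok"
  }
}
```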

Custom & default redirect paths

Changelog

v2.0.0 – Default redirect path disabled by default (enableDefaultRedirect = false)

v0.17.0 (R117) – Allowed users to disable the default redirect endpoint /r/tp2 (enableDefaultRedirect)

v0.16.0 (R116) – Added the option to map the default redirect path /r/tp2 to custom values, e.g. randomstring1/randomstring2

v0.6.0 (R78) – Added click redirect mode

From v2.0.0 onwards, the default redirect path is disabled by default (enableDefaultRedirect = false). Prior to this release it was set to true by default.

If you wish to enable the default redirect path /r/tp2, set enableDefaultRedirect = true in your config.hocon file. Alternatively, you can leave the default endpoint disabled and instead set up a custom, user-defined URL for redirects. For example, the following configuration only allows redirects through the custom-defined /com.acme/redirect-me endpoint, while the default /r/tp2 is not available.

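# Keep the default /r/tp2 redirect endpoint disabled
enableDefaultRedirect = false
# Map the custom path onto the redirect endpoint
paths {
  "/com.acme/redirect-me" = "/r/tp2"
}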