Postgres Loader Configuration Reference

This is a complete list of the options that can be configured in the Postgres Loader’s HOCON config file. The example configs on GitHub show how to prepare such a file.

input.type: Required. Can be “Kinesis”, “PubSub” or “Local”. Configures where input events are read from.
input.streamName: Required when input.type is Kinesis. Name of the Kinesis stream to read from.
input.region: Required when input.type is Kinesis. AWS region in which the Kinesis stream resides.
input.initialPosition: Optional. Used when input.type is Kinesis. Use “TRIM_HORIZON” (the default) to start streaming at the last untrimmed record in the shard, which is the oldest data record in the shard. Use “LATEST” to start streaming just after the most recent record in the shard.
input.retrievalMode.type: Optional. Used when input.type is Kinesis. Sets the mode for retrieving records. Can be “FanOut” (the default) or “Polling”.
input.retrievalMode.maxRecords: Optional. Used when input.retrievalMode.type is “Polling”. Configures how many records are fetched in each poll of the Kinesis stream. Default 10000.
input.projectId: Required when input.type is PubSub. The name of your GCP project.
input.subscriptionId: Required when input.type is PubSub. ID of the PubSub subscription to read events from.
input.path: Required when input.type is Local. Path of the event source. It can be a directory or a file. If it is a directory, all files under it are read recursively. The path can be absolute or relative to the executable.
output.good.host: Required. Hostname of the Postgres database.
output.good.port: Optional. Port number of the Postgres database. Default 5432.
output.good.database: Required. Name of the Postgres database.
output.good.username: Required. Postgres role name to use when connecting to the database.
output.good.password: Required. Password for the Postgres user.
output.good.schema: Required. The Postgres schema in which to create tables and write events.
output.good.sslMode: Optional. Configures how the client and server agree on SSL protection. Default “REQUIRE”.
output.bad.type: Optional. Can be “Kinesis”, “PubSub”, “Local” or “Noop”. Configures where bad rows are sent. Default is “Noop”, which means bad rows are discarded.
output.bad.streamName: Required when output.bad.type is Kinesis. Name of the Kinesis stream to write to.
output.bad.region: Required when output.bad.type is Kinesis. AWS region in which the Kinesis stream resides.
output.bad.projectId: Required when output.bad.type is PubSub. The name of your GCP project.
output.bad.topicId: Required when output.bad.type is PubSub. ID of the PubSub topic to write bad rows to.
output.bad.path: Required when output.bad.type is Local. Path of the file to write bad rows to.
purpose: Optional. Set this to “ENRICHED_EVENTS” (the default) when reading the stream of enriched events in TSV format. Set this to “JSON” when reading a stream of self-describing JSON, e.g. Snowplow bad rows.
monitoring.metrics.cloudWatch: Optional boolean, with default true. For Kinesis input, set this to false to disable sending metrics to CloudWatch.
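
As a rough illustration of how the dotted option names above map onto nested HOCON objects, a config for a Kinesis input might look like the sketch below. All values (stream names, region, host, database, credentials) are hypothetical placeholders, and the `${POSTGRES_PASSWORD}` substitution assumes the loader resolves environment variables in the usual HOCON way. The example configs on GitHub remain the authoritative reference and may contain fields not listed on this page.

```hocon
# Illustrative sketch only: every value is a placeholder, not a recommendation.
{
  "input": {
    "type": "Kinesis"
    "streamName": "enriched-events"     # hypothetical stream name
    "region": "eu-central-1"            # hypothetical AWS region
    "initialPosition": "TRIM_HORIZON"   # documented default, shown for clarity
    "retrievalMode": {
      "type": "Polling"
      "maxRecords": 10000               # only used when type is "Polling"
    }
  }

  "output": {
    "good": {
      "host": "localhost"               # hypothetical host
      "port": 5432
      "database": "snowplow"            # hypothetical database name
      "username": "postgres"            # hypothetical role
      "password": ${POSTGRES_PASSWORD}  # assumption: read from the environment
      "schema": "atomic"                # hypothetical schema name
      "sslMode": "REQUIRE"
    }
    "bad": {
      "type": "Kinesis"
      "streamName": "bad-rows"          # hypothetical stream name
      "region": "eu-central-1"
    }
  }

  "purpose": "ENRICHED_EVENTS"

  "monitoring": {
    "metrics": {
      "cloudWatch": true
    }
  }
}
```

For a PubSub or Local input, the `streamName` and `region` fields would be swapped for `projectId` and `subscriptionId`, or for `path`, as described in the list above.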

Advanced options

These advanced options are set to what we believe are sensible defaults, so you should rarely need to change them.

backoffPolicy.minBackoff: If the producer (PubSub or Kinesis) fails to send an item, it retries. This field configures the backoff time before the first retry. Each subsequent retry doubles the backoff time of the previous one.
backoffPolicy.maxBackoff: Maximum backoff time between retries. Once this value is reached, the backoff time stops increasing.
input.checkpointSettings.maxBatchSize: Used when input.type is Kinesis. Maximum number of records to aggregate before checkpointing. Default 1000.
input.checkpointSettings.maxBatchWait: Used when input.type is Kinesis. Maximum amount of time to wait before checkpointing. Default 10 seconds.
input.checkpointSettings.maxConcurrent: Used when input.type is PubSub. Maximum number of concurrent evaluations for the checkpointer.
output.good.maxConnections: Maximum number of connections the database pool is allowed to reach. Default 10.
output.good.threadPoolSize: Size of the thread pool for blocking database operations. Defaults to the value of output.good.maxConnections.
output.bad.delayThreshold: Delay threshold to use for batching. After this amount of time has elapsed (counting from the first element added), the elements are wrapped up in a batch and sent. Default 200 milliseconds.
output.bad.maxBatchSize: A batch of messages is emitted when the number of events in the batch reaches this size. Default 500.
output.bad.maxBatchBytes: A batch of messages is emitted when the size of the batch reaches this many bytes. Default 5 MB.
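
Should any of these need overriding, the fragment below sketches how they could be written, again nesting each dotted name as a HOCON object. The backoff values are purely illustrative (this page does not state their defaults), and writing durations as `100 milliseconds` or `10 seconds` assumes the loader accepts standard HOCON duration syntax. `output.bad.maxBatchBytes` is left out because the exact literal format it expects is not stated here.

```hocon
# Illustrative fragment, meant to be merged into a full config file.
{
  "backoffPolicy": {
    # Hypothetical values. With doubling on every retry, the waits would be
    # 100 ms, 200 ms, 400 ms, 800 ms, 1600 ms, 3200 ms, 6400 ms,
    # and then stay at the 10 second maximum.
    "minBackoff": 100 milliseconds
    "maxBackoff": 10 seconds
  }

  "input": {
    "checkpointSettings": {
      # Kinesis input only; these are the documented defaults.
      "maxBatchSize": 1000
      "maxBatchWait": 10 seconds
    }
  }

  "output": {
    "good": {
      "maxConnections": 10          # documented default
    }
    "bad": {
      "delayThreshold": 200 milliseconds   # documented default
      "maxBatchSize": 500                  # documented default
    }
  }
}
```

Leaving `output.good.threadPoolSize` unset keeps it equal to `maxConnections`, as described above.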