RDB loader configuration reference

Starting from version 2.0.0, the shredder and the loader use separate configurations. An example config for the loader can be found here.

This is a complete list of the options that can be configured:

region: Optional if it can be resolved with the AWS region provider chain. AWS region of the S3 bucket.
messageQueue: Required. Name of the SQS queue used by the shredder and loader to communicate.
jsonpaths: Optional. An S3 URI that holds JSONPath files.
storage.host: Required. Host name of the Redshift cluster.
storage.port: Required. Port of the Redshift cluster.
storage.database: Required. Name of the database.
storage.roleArn: Required. AWS role ARN allowing Redshift to load data from S3.
storage.schema: Required. Redshift schema name, e.g. “atomic”.
storage.username: Required. DB user with permission to load data.
storage.password: Required. Password of the DB user.
storage.jdbc.blockingRows: Optional. Refer to the Redshift JDBC driver reference.
storage.jdbc.disableIsValidQuery: Optional. Refer to the Redshift JDBC driver reference.
storage.jdbc.dsiLogLevel: Optional. Refer to the Redshift JDBC driver reference.
storage.jdbc.filterLevel: Optional. Refer to the Redshift JDBC driver reference.
storage.jdbc.loginTimeout: Optional. Refer to the Redshift JDBC driver reference.
storage.jdbc.logLevel: Optional. Refer to the Redshift JDBC driver reference.
storage.jdbc.socketTimeout: Optional. Refer to the Redshift JDBC driver reference.
storage.jdbc.ssl: Optional. Refer to the Redshift JDBC driver reference.
storage.jdbc.sslMode: Optional. Refer to the Redshift JDBC driver reference.
storage.jdbc.sslRootCert: Optional. Refer to the Redshift JDBC driver reference.
storage.jdbc.tcpKeepAlive: Optional. Refer to the Redshift JDBC driver reference.
storage.jdbc.tcpKeepAliveMinutes: Optional. Refer to the Redshift JDBC driver reference.
storage.maxError: Optional. Configures the Redshift MAXERROR load option. Default value: 10.
monitoring.webhook.endpoint: Optional. An HTTP endpoint where monitoring alerts should be sent.
monitoring.webhook.tags: Optional. Custom key-value pairs which can be added to the monitoring webhooks. E.g. {"tag1": "label1"}
monitoring.snowplow.appId: Optional. When using Snowplow tracking, set this appId in the event.
monitoring.snowplow.collector: Optional. Set to a collector URL to turn on Snowplow tracking.
monitoring.sentry.dsn: Optional. For tracking runtime exceptions.
monitoring.statsd.hostname: Optional. Host name of the StatsD server to which loading metrics (latency and event counts) are sent.
monitoring.statsd.port: Optional. Port of the StatsD server.
monitoring.statsd.tags: E.g. { "key1": "value1", "key2": "value2" }. Tags are used to annotate the StatsD metric with any contextual information.
monitoring.statsd.prefix: Optional. Configures the prefix of StatsD metric names. Default value: “snowplow.rdbloader”.
monitoring.folders.staging: Required if the folder monitoring section is included in the config. Path where the loader can store auxiliary logs for the periodic checks for unloaded or corrupted folders. The loader must be able to write here, and Redshift must be able to load from here.
monitoring.folders.period: Required if the folder monitoring section is included in the config. How often to check for unloaded or corrupted folders.
monitoring.folders.since: Required if the folder monitoring section is included in the config. Specifies from when folder monitoring will start to monitor.
monitoring.folders.until: Required if the folder monitoring section is included in the config. Specifies until when folder monitoring will monitor.
monitoring.folders.shredderOutput: Required if the folder monitoring section is included in the config. Path to the shredded archive.
monitoring.healthCheck.frequency (added in 2.1.0): Optional. How often to run a periodic DB health check, which raises a warning if the DB does not respond to a SELECT 1.
monitoring.healthCheck.timeout (added in 2.1.0): Optional. How long to wait for a health check response.
retryQueue.period (added in 2.1.0): Optional. Configures a backlog of recently failed folders that can be retried automatically. The period is how often a batch of failed folders is pulled into the discovery queue.
retryQueue.size (added in 2.1.0): Required if the retryQueue section is included. How many failures should be kept in memory. After the limit is reached, new failures are dropped.
retryQueue.maxAttempts (added in 2.1.0): Required if the retryQueue section is included. How many attempts to make for each folder. After the limit is reached, new failures are dropped.
retryQueue.interval (added in 2.1.0): Required if the retryQueue section is included. Artificial pause after each failed folder before it is added to the retry queue.
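
As a rough illustration of how the options above might fit together, here is a minimal HOCON sketch. It assumes that each dotted option name maps to a nested object (for example, storage.host becomes host inside a storage block) and that time-based options take duration strings such as "30 minutes". Every value shown (bucket names, host names, credentials, durations, sizes) is a made-up placeholder, not a recommended setting. The storage.jdbc.* options are left out because their value types are defined by the Redshift JDBC driver reference. A real config may need keys that this list does not show, so treat the example config linked above as the authoritative layout.

```hocon
{
  # Top-level options (all values are placeholders)
  region = "eu-central-1"
  messageQueue = "example-loader-queue"
  jsonpaths = "s3://example-bucket/jsonpaths/"

  storage {
    host = "redshift.example.com"
    port = 5439
    database = "snowplow"
    roleArn = "arn:aws:iam::123456789012:role/ExampleRedshiftLoadRole"
    schema = "atomic"
    username = "example_loader"
    password = "example-password"
    maxError = 10                      # documented default
  }

  monitoring {
    webhook {
      endpoint = "https://webhook.example.com/alerts"
      tags { tag1 = "label1" }
    }

    statsd {
      hostname = "localhost"
      port = 8125
      tags { key1 = "value1" }
      prefix = "snowplow.rdbloader"
    }

    # If this section is present, all five keys are required
    folders {
      staging = "s3://example-bucket/loader/logs/"
      period = "1 hour"                # assumed duration format
      since = "14 days"                # assumed duration format
      until = "7 days"                 # assumed duration format
      shredderOutput = "s3://example-bucket/shredded/"
    }

    # Added in 2.1.0
    healthCheck {
      frequency = "20 minutes"         # assumed duration format
      timeout = "15 seconds"           # assumed duration format
    }
  }

  # Added in 2.1.0; if this section is present, size, maxAttempts and interval are required
  retryQueue {
    period = "30 minutes"              # assumed duration format
    size = 64
    maxAttempts = 3
    interval = "5 seconds"             # assumed duration format
  }
}
```

Optional sections such as monitoring.folders and retryQueue can be left out entirely. Their "required if included" keys only apply once the section itself is present.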