Run stream enrich

Stream enrich can be run with different message queues:

  • kinesis
  • kafka
  • nsq
  • stdin

1. Run

1.1. Docker image (recommended)

With the configuration files in the ${path_to_config_dir} directory:

docker run \
  -d \
  --name stream-enrich \
  --restart always \
  --log-driver awslogs \
  --log-opt awslogs-group=${log_group_name} \
  --log-opt awslogs-stream=`ec2metadata --instance-id` \
  --network host \
  -v ${path_to_config_dir}:/snowplow/config \
  -e "JAVA_OPTS=-Xms${heap_size} -Xmx${heap_size} -Dorg.slf4j.simpleLogger.defaultLogLevel=${log_level}" \
  snowplow/stream-enrich-${message_queue}:${version} \
  --config /snowplow/config/config.hocon \
  --resolver file:/snowplow/config/iglu_resolver.json \
  --enrichments file:/snowplow/config/enrichments/ \
  --force-cached-files-download
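The image name in the command above is built from the message queue and the version. As a minimal sketch, the hypothetical helper below (image_name is not part of Stream Enrich) composes that name and rejects any queue that is not in the list at the top of this page. The version is whatever release you deploy; none is assumed here.

```shell
# Hypothetical helper, not part of Stream Enrich: builds the Docker image name
# following the snowplow/stream-enrich-${message_queue}:${version} pattern.
image_name() {
  queue=$1
  version=$2
  case "$queue" in
    kinesis|kafka|nsq|stdin)
      printf 'snowplow/stream-enrich-%s:%s\n' "$queue" "$version"
      ;;
    *)
      # Anything outside the documented list is treated as an error.
      echo "unsupported message queue: $queue" >&2
      return 1
      ;;
  esac
}

# Usage (the version comes from your own environment):
# image_name kinesis "${version}"
```

Failing early on an unknown queue name avoids a less obvious "image not found" error from Docker later on.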

1.2. Fat jar

$ java -Dorg.slf4j.simpleLogger.defaultLogLevel=${log_level} \
    -jar snowplow-stream-enrich-${message_queue}-${version}.jar \
    --config config.hocon \
    --resolver file:iglu_resolver.json \
    --enrichments file:path/to/enrichments

2. Config in DynamoDB / Datastore

2.1. DynamoDB

When running with Kinesis, the resolver and/or enrichment configurations can be stored in DynamoDB. In that case, use the dynamodb: prefix in place of the file: prefix:

--resolver dynamodb:eu-west-1/configuration_table/resolver \
--enrichments dynamodb:eu-west-1/configuration_table/enrichment_

This assumes that the resolver and the enrichments are stored in a table named configuration_table in the eu-west-1 region, and that the key of that table is id. The resolver JSON is stored in the item whose key is resolver, and the enrichments are stored in items whose keys begin with enrichment_.

In the example above, configuration_table is a table with two columns: id and json.

There must be one item with resolver as its id and the resolver configuration in its json column.

enrichment_ is the prefix used in the id column to identify an enrichment; the enrichment's configuration goes in the json column. Here is the list of all the enrichments (with the enrichment_ prefix) as they appear in the id column:

  • enrichment_api_request_enrichment_config
  • enrichment_http_header_extractor_config
  • enrichment_iab_spiders_and_robots_enrichment
  • enrichment_pii_enrichment_config
  • enrichment_sql_query_enrichment_config
  • enrichment_weather_enrichment_config
  • enrichment_yauaa_enrichment_config
  • enrichment_anon_ip
  • enrichment_campaign_attribution
  • enrichment_cookie_extractor_config
  • enrichment_currency_conversion_config
  • enrichment_event_fingerprint_config
  • enrichment_ip_lookups
  • enrichment_javascript_script_config
  • enrichment_referer_parser
  • enrichment_ua_parser_config
  • enrichment_user_agent_utils_config
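To make the structure of the dynamodb: argument concrete, here is a minimal sketch that splits it into the region, table name, and key (or key prefix) described above. The function parse_dynamodb_uri is hypothetical and only illustrates the layout of the argument; it is not how Stream Enrich itself parses it.

```shell
# Hypothetical helper: splits dynamodb:<region>/<table>/<key-or-prefix>
# into its three parts using POSIX parameter expansion.
parse_dynamodb_uri() {
  uri=$1
  case "$uri" in
    dynamodb:*/*/*) ;;
    *)
      echo "not a dynamodb: location: $uri" >&2
      return 1
      ;;
  esac
  rest=${uri#dynamodb:}   # strip the prefix
  region=${rest%%/*}      # text before the first slash
  rest=${rest#*/}         # drop the region and its slash
  table=${rest%%/*}       # text before the next slash
  key=${rest#*/}          # remainder: item key or key prefix
  printf 'region=%s table=%s key=%s\n' "$region" "$table" "$key"
}

# Usage:
# parse_dynamodb_uri dynamodb:eu-west-1/configuration_table/enrichment_
```

For the enrichments argument the last part is a prefix, so every item whose id starts with it (for example enrichment_anon_ip) is treated as an enrichment configuration.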

2.2. Datastore

When running with Google Pub/Sub, the resolver and/or enrichment configurations can be stored in Datastore. In that case, use the datastore: prefix in place of the file: prefix:

--resolver datastore:resolver/iglu \
--enrichments datastore:enrichment/enrich-

This assumes that the resolver has kind resolver and key iglu, with its value stored in the column "json". It also assumes that the enrichments have kind enrichment and names starting with enrich-, with their values stored in the column "json".
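Because the file:, dynamodb:, and datastore: prefixes all go in the same two flags, the choice of backend can be isolated in one place. The sketch below assumes a hypothetical helper, config_flags, that applies the same prefix to both flags; since the resolver and the enrichments may also be stored separately, treat this as one possible convenience rather than a required layout.

```shell
# Hypothetical helper: prints the --resolver and --enrichments flags
# for a given prefix (file, dynamodb, or datastore) and two locations.
config_flags() {
  scheme=$1
  resolver=$2
  enrichments=$3
  case "$scheme" in
    file|dynamodb|datastore) ;;
    *)
      echo "unknown prefix: $scheme" >&2
      return 1
      ;;
  esac
  printf '%s %s:%s %s %s:%s\n' \
    '--resolver' "$scheme" "$resolver" \
    '--enrichments' "$scheme" "$enrichments"
}

# Usage:
# config_flags datastore resolver/iglu enrichment/enrich-
```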