Configuring Elasticsearch

Getting started

First, install and set up Elasticsearch version 5.x or 2.x. For more information, see the Elasticsearch installation guide.

Raising the file limit

Elasticsearch keeps many files open simultaneously, so you need to raise the maximum number of files that its user can have open. To do this, edit the limits file:

$ sudo vim /etc/security/limits.conf

Append the following lines to the file:

{{USERNAME}} soft nofile 32000
{{USERNAME}} hard nofile 32000

Here {{USERNAME}} is the name of the user running Elasticsearch. You will need to log out, log back in, and restart Elasticsearch before the new file limit takes effect.
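As an alternative to editing the file by hand, the same two lines can be appended from a script. The following is a minimal sketch, not an official Snowplow or Elasticsearch tool: the helper name append_nofile_limits and the user name elasticsearch are assumptions made for illustration, and the demonstration writes to a temporary file rather than to /etc/security/limits.conf.

```shell
#!/bin/sh
# Hypothetical helper: append soft and hard nofile limits for one user.
# It does nothing if the file already has a nofile entry for that user,
# so running it twice does not duplicate the lines.
append_nofile_limits() {
  user="$1"
  limit="$2"
  file="$3"
  if grep -Eq "^${user}[[:space:]]+(soft|hard)[[:space:]]+nofile" "$file" 2>/dev/null; then
    return 0
  fi
  printf '%s soft nofile %s\n%s hard nofile %s\n' \
    "$user" "$limit" "$user" "$limit" >> "$file"
}

# Demonstration against a scratch file ("elasticsearch" is an assumed user name).
demo_file="$(mktemp)"
append_nofile_limits elasticsearch 32000 "$demo_file"
append_nofile_limits elasticsearch 32000 "$demo_file"   # second call is a no-op
cat "$demo_file"
```

To apply this to the real file, the function would have to run as root with /etc/security/limits.conf as the third argument. Trying it on a scratch copy first is the safer choice.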

To check that this new limit has taken effect you can run the following command from the terminal:

$ curl 'localhost:9200/_nodes/process?pretty'

If max_file_descriptors equals 32000 in the response, Elasticsearch is running with the new limit.
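The check can also be scripted so that it fails loudly when the limit is wrong. This is a rough sketch under stated assumptions: the helper names extract_max_fds and check_file_limit are hypothetical, the sample response is a hand-trimmed illustration rather than real server output, and the parsing relies on grep instead of a proper JSON parser.

```shell
#!/bin/sh
# Pull every max_file_descriptors value out of a JSON response on stdin.
extract_max_fds() {
  grep -Eo '"max_file_descriptors"[[:space:]]*:[[:space:]]*[0-9]+' \
    | grep -Eo '[0-9]+$'
}

# Succeed only if at least one value arrives on stdin and all values
# equal the expected limit given as the first argument.
check_file_limit() {
  expected="$1"
  seen=0
  while read -r fds; do
    seen=1
    [ "$fds" -eq "$expected" ] || return 1
  done
  [ "$seen" -eq 1 ]
}

# Hand-written sample shaped like the response described above (an assumption).
sample='{ "nodes": { "node-1": { "process": { "max_file_descriptors" : 32000 } } } }'
printf '%s\n' "$sample" | extract_max_fds | check_file_limit 32000 && echo "limit applied"
```

Against a live node, the sample would be replaced by the output of curl -s 'localhost:9200/_nodes/process?pretty' piped into the same two functions.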

Defining the mapping

Use the following request to create the snowplow index with a mapping for the enriched event type:

$ curl -XPUT 'http://localhost:9200/snowplow' -d '{
    "settings": {
        "analysis": {
            "analyzer": {
                "default": {
                    "type": "keyword"
                }
            }
        }
    },
    "mappings": {
        "enriched": {
            "_timestamp": {
                "enabled": true,
                "path": "collector_tstamp"
            },
            "_ttl": {
                "enabled": true,
                "default": "604800000"
            },
            "properties": {
                "geo_location": {
                    "type": "geo_point"
                }
            }
        }
    }
}'

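Elasticsearch will then treat the collector_tstamp field as the timestamp and the geo_location field as a “geo_point”. Documents will be automatically deleted one week (604800000 milliseconds) after their collector_tstamp. Note that the _timestamp and _ttl meta-fields were deprecated in Elasticsearch 2.0 and removed in 5.0, so these two settings apply to 2.x clusters only.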
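The _ttl default is expressed in milliseconds, which is easy to get wrong when choosing a different retention period. The sketch below shows the arithmetic behind the one-week value. The function name ttl_ms is an assumption made for illustration.

```shell
#!/bin/sh
# Convert a retention period in days to the millisecond value used by "_ttl".
ttl_ms() {
  days="$1"
  echo $(( days * 24 * 60 * 60 * 1000 ))
}

ttl_ms 7    # prints 604800000, the one-week default used above

# Print a "_ttl" fragment for a two-week retention period instead.
printf '"_ttl": { "enabled": true, "default": "%s" }\n' "$(ttl_ms 14)"
```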

This initialization sets the default analyzer to “keyword”. This means that string fields will not be split into separate tokens for the purposes of searching. This saves space and ensures that URL fields are handled correctly.

If you want to tokenize specific string fields, create the index with an extended “properties” section instead, where field_to_tokenize stands for the name of the field to tokenize:

$ curl -XPUT 'http://localhost:9200/snowplow' -d '{
    "settings": {
        "analysis": {
            "analyzer": {
                "default": {
                    "type": "keyword"
                }
            }
        }
    },
    "mappings": {
        "enriched": {
            "_timestamp": {
                "enabled": true,
                "path": "collector_tstamp"
            },
            "_ttl": {
                "enabled": true,
                "default": "604800000"
            },
            "properties": {
                "geo_location": {
                    "type": "geo_point"
                },
                "field_to_tokenize": {
                    "type": "string",
                    "analyzer": "english"
                }
            }
        }
    }
}'
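Both requests on this page create the same snowplow index, and Elasticsearch rejects an index creation request if the index already exists. A script can check for the index first with a HEAD request. The following is a minimal sketch: the helper name index_action is hypothetical, and the curl call is left as a comment so that the block runs without a live cluster.

```shell
#!/bin/sh
# Map the HTTP status of a HEAD request on the index to a decision:
# 404 means the index is missing and can be created, 200 means it exists.
index_action() {
  case "$1" in
    404) echo create ;;
    200) echo exists ;;
    *)   echo unknown ;;
  esac
}

# Against a live node, the status would come from a request such as:
#   status="$(curl -s -o /dev/null -w '%{http_code}' -I 'http://localhost:9200/snowplow')"
status=404   # assumed value for demonstration
index_action "$status"   # prints create
```

If the result is exists and the mapping has to change, the existing index would need to be deleted or a different index name used before sending the PUT request again.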