Getting started
First, install and set up Elasticsearch version 5.x or 2.x.x. For more information, check out the installation guide.
Raising the file limit
Elasticsearch keeps a lot of files open simultaneously, so you will need to increase the maximum number of files a user can have open. To do this:
$ sudo vim /etc/security/limits.conf
Append the following lines to the file:
{{USERNAME}} soft nofile 32000
{{USERNAME}} hard nofile 32000
Where {{USERNAME}} is the name of the user running Elasticsearch. You will need to log out and restart Elasticsearch before the new file limit takes effect.
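If you would rather not edit the file interactively, the two lines can also be appended from a script. The following is a minimal sketch: add_nofile_limits is a hypothetical helper, and "elasticsearch" is only an assumed user name. It writes to a scratch file so you can inspect the result first.

```shell
# Append soft and hard nofile limits for one user to a limits file.
# add_nofile_limits is a hypothetical helper: $1 = user name, $2 = target file.
add_nofile_limits() {
  printf '%s soft nofile 32000\n%s hard nofile 32000\n' "$1" "$1" >> "$2"
}

# Try it against a scratch file first; "elasticsearch" is an assumed user name.
scratch=$(mktemp)
add_nofile_limits elasticsearch "$scratch"
cat "$scratch"
```

Writing to /etc/security/limits.conf itself requires root privileges, so the real run would need to happen in a root shell.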
To check that this new limit has taken effect you can run the following command from the terminal:
$ curl localhost:9200/_nodes/process?pretty
If max_file_descriptors equals 32000, Elasticsearch is running with the new limit.
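The response can be long, so it may help to filter out just the value you care about. This is a rough sketch using grep. max_fds is a hypothetical helper name, and the sample string is a canned stand-in for a real response, not actual Elasticsearch output.

```shell
# Print each max_file_descriptors value found in a _nodes/process response on stdin.
# max_fds is a hypothetical helper name, not part of Elasticsearch.
max_fds() {
  grep -o '"max_file_descriptors" *: *[0-9][0-9]*' | grep -o '[0-9][0-9]*$'
}

# Canned fragment standing in for a real response, so the filter can be tried offline.
sample='{"process": {"max_file_descriptors" : 32000}}'
printf '%s\n' "$sample" | max_fds

# Against a live node (assumes Elasticsearch listens on localhost:9200):
#   curl -s 'localhost:9200/_nodes/process?pretty' | max_fds
```

On a cluster with several nodes, this prints one value per node.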
Defining the mapping
Use the following request to create the mapping for the enriched event type:
$ curl -XPUT 'http://localhost:9200/snowplow' -d '{
    "settings": {
        "analysis": {
            "analyzer": {
                "default": {
                    "type": "keyword"
                }
            }
        }
    },
    "mappings": {
        "enriched": {
            "_timestamp" : {
                "enabled" : "yes",
                "path" : "collector_tstamp"
            },
            "_ttl": {
                "enabled": true,
                "default": "604800000"
            },
            "properties": {
                "geo_location": {
                    "type": "geo_point"
                }
            }
        }
    }
}'
Elasticsearch will then treat the collector_tstamp field as the timestamp and the geo_location field as a “geo_point”. Documents will be automatically deleted one week (604800000 milliseconds) after their collector_tstamp.
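The TTL default is expressed in milliseconds. If you want a different retention period, the value can be worked out with shell arithmetic. This is a small sketch: days_to_ms is a hypothetical helper, and the 30-day figure is only an example. For 7 days it yields the 604800000 used in the mapping.

```shell
# Convert whole days to milliseconds: days * 24 h * 60 min * 60 s * 1000 ms.
# days_to_ms is a hypothetical helper name.
days_to_ms() {
  echo $(( $1 * 24 * 60 * 60 * 1000 ))
}

days_to_ms 7    # the one-week default used in the mapping
days_to_ms 30   # a hypothetical 30-day retention
```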
This initialization sets the default analyzer to “keyword”. This means that string fields will not be split into separate tokens for the purposes of searching. This saves space and ensures that URL fields are handled correctly.
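Because each string is indexed as a single token, a lookup on such a field has to supply the whole value. A term query is one way to do that. The sketch below only builds the request body. term_query is a hypothetical helper, and page_url and the example URL are assumptions chosen for illustration.

```shell
# Build a term query body that matches a field's whole, untokenized value.
# term_query is a hypothetical helper: $1 = field name, $2 = exact value.
# Values containing double quotes or backslashes would need JSON escaping first.
term_query() {
  printf '{"query":{"term":{"%s":"%s"}}}\n' "$1" "$2"
}

# page_url and the URL below are assumed example values.
term_query page_url 'http://example.com/pricing?ref=home'

# Against a live node (assumes localhost:9200 and the snowplow index):
#   curl -XGET 'http://localhost:9200/snowplow/enriched/_search?pretty' \
#     -d "$(term_query page_url 'http://example.com/pricing?ref=home')"
```

Searching for only part of the URL, such as "pricing", would not match under the keyword analyzer, since the field was never split into separate tokens.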
If you want to tokenize specific string fields, you can change the “properties” field in the mapping like this:
$ curl -XPUT 'http://localhost:9200/snowplow' -d '{
    "settings": {
        "analysis": {
            "analyzer": {
                "default": {
                    "type": "keyword"
                }
            }
        }
    },
    "mappings": {
        "enriched": {
            "_timestamp" : {
                "enabled" : "yes",
                "path" : "collector_tstamp"
            },
            "_ttl": {
                "enabled": true,
                "default": "604800000"
            },
            "properties": {
                "geo_location": {
                    "type": "geo_point"
                },
                "field_to_tokenize": {
                    "type": "string",
                    "analyzer": "english"
                }
            }
        }
    }
}'