IP Lookup Enrichment

Summary

This enrichment uses MaxMind databases to look up useful data based on the IP address collected by your Snowplow tracker(s).

Overview

When a user browses your site or app, their IP address is collected. MaxMind maintains databases of additional information publicly associated with a given IP address, such as geographic location, second-level domain name (e.g. acme.com), Internet Service Provider (ISP), organization name and several other data points.

The IP lookup enrichment looks up the collected IP address in the MaxMind databases and adds the resulting data points to every event generated from that IP address.

Some of the MaxMind databases require a commercial subscription with MaxMind.

Setting up this Enrichment

1. Decide which databases you’d like to use and download them

MaxMind offers five databases that can be used with Snowplow. One is free:

  • GeoLite2 City, which contains geographic information about the IP address

The other four are paid databases:

  • GeoIP2 City, which also contains geographic information, but with more precision and coverage than the free GeoLite2 City database
  • GeoIP2 ISP, which contains information about the ISP serving that IP address
  • GeoIP2 Domain, which contains information about the domain at that IP address
  • GeoIP2 Connection Type, which contains information about the connection type at that IP address

Decide which of the MaxMind databases listed above you want to enrich your data with, download the .mmdb files, and then set up the enrichment configuration accordingly.

2. Upload the databases to a location on your cloud

Once downloaded, take the .mmdb file(s) and upload them to a location on your cloud:

  • Amazon S3 (if running Snowplow on AWS) e.g. s3://my-private-bucket/third-party/maxmind
  • Google Cloud Storage (if running Snowplow on GCP) e.g. gs://my-private-bucket/third-party/maxmind

When the databases need updating, download the latest version and overwrite the existing file in your storage.

MaxMind also offers a method to download and update their databases programmatically.

3. Configure the enrichment for your pipeline

Enable the IP Lookup enrichment for your pipeline, either through the Enrichments UI or by enabling it in your GitHub account.

There are four possible fields you can add to the “parameters” section of the enrichment configuration JSON: “geo”, “isp”, “domain” and “connectionType”. Each of them is an object with two subfields:

  • The database field contains the name of the MaxMind database file.
  • The uri field contains the URI of the bucket in which the database file is found. Its scheme can be http:, s3: or gs:, and it must not end with a trailing slash.
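The two subfields can be checked before you commit a configuration. The Python sketch below applies the rules stated above (scheme http, s3 or gs, no trailing slash). It assumes, for illustration only, that the file is located by joining uri and database with a single slash, which is why a trailing slash would cause trouble; database_location is a hypothetical helper, not part of Snowplow.

```python
from urllib.parse import urlparse

# Schemes listed in this documentation for the "uri" subfield.
ALLOWED_SCHEMES = {"http", "s3", "gs"}


def database_location(uri: str, database: str) -> str:
    """Return the full location of a database file (hypothetical helper).

    Assumes the file lives at "<uri>/<database>"; a trailing slash on the
    URI would therefore produce a doubled separator.
    """
    scheme = urlparse(uri).scheme
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported scheme {scheme!r} in {uri!r}")
    if uri.endswith("/"):
        raise ValueError(f"uri must not end with a trailing slash: {uri!r}")
    return f"{uri}/{database}"


print(database_location("s3://my-private-bucket/third-party/maxmind", "GeoLite2-City.mmdb"))
# s3://my-private-bucket/third-party/maxmind/GeoLite2-City.mmdb
```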

Only the filenames listed in the table below are accepted in the database subfield. If you provide any other file name, the enrichment JSON will fail validation.

Enrichment parameter   Valid database names
geo                    "GeoLite2-City.mmdb"
                       "GeoIP2-City.mmdb"
isp                    "GeoIP2-ISP.mmdb"
domain                 "GeoIP2-Domain.mmdb"
connectionType         "GeoIP2-Connection-Type.mmdb"
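To catch a wrong filename before deploying, you could check the “parameters” object against the table above. This is a minimal Python sketch that mirrors the table only; check_parameters is a hypothetical helper and does not reproduce the schema validation Snowplow itself performs.

```python
# Valid "database" filenames for each enrichment parameter, as listed in the table above.
VALID_DATABASES = {
    "geo": {"GeoLite2-City.mmdb", "GeoIP2-City.mmdb"},
    "isp": {"GeoIP2-ISP.mmdb"},
    "domain": {"GeoIP2-Domain.mmdb"},
    "connectionType": {"GeoIP2-Connection-Type.mmdb"},
}


def check_parameters(parameters: dict) -> list:
    """Return a list of problems found in an ip_lookups "parameters" object.

    An empty list means every entry uses a known parameter name, has a
    "uri" subfield and uses an accepted database filename.
    """
    problems = []
    for name, entry in parameters.items():
        allowed = VALID_DATABASES.get(name)
        if allowed is None:
            problems.append(f"unknown parameter {name!r}")
            continue
        if "uri" not in entry:
            problems.append(f"{name}: missing 'uri'")
        database = entry.get("database")
        if database not in allowed:
            problems.append(f"{name}: {database!r} is not one of {sorted(allowed)}")
    return problems


# A typo in the ISP filename is reported; the geo entry passes.
print(check_parameters({
    "geo": {"database": "GeoLite2-City.mmdb", "uri": "s3://my-private-bucket/third-party/maxmind"},
    "isp": {"database": "GeoLite2-ISP.mmdb", "uri": "s3://my-private-bucket/third-party/maxmind"},
}))
```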

Example configurations

Example minimal configuration

On AWS
{
    "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/2-0-0",
    "data": {
        "name": "ip_lookups",
        "vendor": "com.snowplowanalytics.snowplow",
        "enabled": true,
        "parameters": {
            "geo": {
                "database": "GeoLite2-City.mmdb",
                "uri": "s3://my-private-bucket/third-party/maxmind"
            }
        }
    }
}
On GCP
{
    "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/2-0-0",
    "data": {
        "name": "ip_lookups",
        "vendor": "com.snowplowanalytics.snowplow",
        "enabled": true,
        "parameters": {
            "geo": {
                "database": "GeoLite2-City.mmdb",
                "uri": "gs://my-private-bucket/third-party/maxmind"
            }
        }
    }
}

In the configurations above, we enable this enrichment to look up the IP address of each event in the GeoLite2-City.mmdb database.

The parameters start with the type of MaxMind database we are accessing (in this case the “geo” type), followed by the name of the database file and the URI where it is available.

When configuring the enrichment, replace my-private-bucket/third-party/maxmind with the path to your hosted database.
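If you manage configurations in code, a small helper can produce the JSON above for either cloud. This is a Python sketch: build_ip_lookups_config is a hypothetical name, and the schema version 2-0-0 is taken from the examples on this page. With the arguments shown, it prints the same JSON as the GCS example.

```python
import json

SCHEMA = "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/2-0-0"


def build_ip_lookups_config(uri: str, databases: dict) -> dict:
    """Build an ip_lookups enrichment configuration (hypothetical helper).

    `databases` maps an enrichment parameter ("geo", "isp", "domain",
    "connectionType") to the database filename hosted at `uri`.
    """
    return {
        "schema": SCHEMA,
        "data": {
            "name": "ip_lookups",
            "vendor": "com.snowplowanalytics.snowplow",
            "enabled": True,
            "parameters": {
                param: {"database": filename, "uri": uri}
                for param, filename in databases.items()
            },
        },
    }


# Minimal geo-only configuration; replace the bucket path with your own.
config = build_ip_lookups_config(
    "gs://my-private-bucket/third-party/maxmind",
    {"geo": "GeoLite2-City.mmdb"},
)
print(json.dumps(config, indent=4))
```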

If we were to enable this enrichment as shown, we would see the following columns in our data warehouse populated for a user with the IP address 37.157.33.178:

Column name        Sample data      Purpose
geo_country        GB               Country of IP origin
geo_region         ENG              Region code of IP origin
geo_city           London           City of IP origin
geo_zipcode        EC2A             Zip (postal) code of IP origin
geo_latitude       51.5237          Approximate latitude (coordinates)
geo_longitude      -0.089           Approximate longitude (coordinates)
geo_region_name    England          Region name of IP origin
geo_timezone       Europe/London    Time zone of IP origin
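To show where these values come from, the Python sketch below maps a record shaped like a City database lookup result (the nested structure MaxMind's maxminddb reader returns for GeoLite2/GeoIP2 City) onto the columns above. The record is hand-built from the sample row, and the mapping, in particular taking the region from the first subdivision, is an assumption for illustration rather than Snowplow's enrichment code; geo_columns is hypothetical.

```python
def geo_columns(record: dict) -> dict:
    """Map a City-database record to Snowplow geo_* column values.

    Missing fields map to None, since not every IP address resolves to a
    city or postal code.
    """
    country = record.get("country", {})
    # Assumption: the region comes from the first (largest) subdivision.
    region = (record.get("subdivisions") or [{}])[0]
    location = record.get("location", {})
    return {
        "geo_country": country.get("iso_code"),
        "geo_region": region.get("iso_code"),
        "geo_city": record.get("city", {}).get("names", {}).get("en"),
        "geo_zipcode": record.get("postal", {}).get("code"),
        "geo_latitude": location.get("latitude"),
        "geo_longitude": location.get("longitude"),
        "geo_region_name": region.get("names", {}).get("en"),
        "geo_timezone": location.get("time_zone"),
    }


# Hand-built record using the sample values from the table above.
sample = {
    "country": {"iso_code": "GB"},
    "subdivisions": [{"iso_code": "ENG", "names": {"en": "England"}}],
    "city": {"names": {"en": "London"}},
    "postal": {"code": "EC2A"},
    "location": {"latitude": 51.5237, "longitude": -0.089, "time_zone": "Europe/London"},
}
print(geo_columns(sample)["geo_city"])  # London
```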

Example full configuration

To extend this enrichment to the additional databases offered by MaxMind, we add an entry for each database to the “parameters” section:

On AWS
{
    "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/2-0-0",
    "data": {
        "name": "ip_lookups",
        "vendor": "com.snowplowanalytics.snowplow",
        "enabled": true,
        "parameters": {
            "geo": {
                "database": "GeoIP2-City.mmdb",
                "uri": "s3://my-private-bucket/third-party/maxmind"
            },
            "isp": {
                "database": "GeoIP2-ISP.mmdb",
                "uri": "s3://my-private-bucket/third-party/maxmind"
            },
            "domain": {
                "database": "GeoIP2-Domain.mmdb",
                "uri": "s3://my-private-bucket/third-party/maxmind"
            },
            "connectionType": {
                "database": "GeoIP2-Connection-Type.mmdb",
                "uri": "s3://my-private-bucket/third-party/maxmind"
            }
        }
    }
}
On GCP
{
    "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/2-0-0",
    "data": {
        "name": "ip_lookups",
        "vendor": "com.snowplowanalytics.snowplow",
        "enabled": true,
        "parameters": {
            "geo": {
                "database": "GeoIP2-City.mmdb",
                "uri": "gs://my-private-bucket/third-party/maxmind"
            },
            "isp": {
                "database": "GeoIP2-ISP.mmdb",
                "uri": "gs://my-private-bucket/third-party/maxmind"
            },
            "domain": {
                "database": "GeoIP2-Domain.mmdb",
                "uri": "gs://my-private-bucket/third-party/maxmind"
            },
            "connectionType": {
                "database": "GeoIP2-Connection-Type.mmdb",
                "uri": "gs://my-private-bucket/third-party/maxmind"
            }
        }
    }
}
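Rather than writing each entry by hand, the full configuration can be derived from the list of files you uploaded. This Python sketch assumes all files sit under one bucket path; config_from_files is hypothetical, and its filename-to-parameter mapping is simply the table of valid database names above, inverted. With the arguments shown, it prints the same JSON as the AWS example.

```python
import json

# Inverse of the valid-database table: filename -> enrichment parameter.
PARAMETER_FOR_FILE = {
    "GeoLite2-City.mmdb": "geo",
    "GeoIP2-City.mmdb": "geo",
    "GeoIP2-ISP.mmdb": "isp",
    "GeoIP2-Domain.mmdb": "domain",
    "GeoIP2-Connection-Type.mmdb": "connectionType",
}


def config_from_files(uri: str, filenames: list) -> dict:
    """Build a full ip_lookups configuration from uploaded filenames."""
    parameters = {}
    for filename in filenames:
        param = PARAMETER_FOR_FILE.get(filename)
        if param is None:
            raise ValueError(f"{filename!r} is not an accepted database name")
        if param in parameters:
            # e.g. both GeoLite2-City.mmdb and GeoIP2-City.mmdb were supplied
            raise ValueError(f"more than one database given for {param!r}")
        parameters[param] = {"database": filename, "uri": uri}
    return {
        "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/2-0-0",
        "data": {
            "name": "ip_lookups",
            "vendor": "com.snowplowanalytics.snowplow",
            "enabled": True,
            "parameters": parameters,
        },
    }


files = ["GeoIP2-City.mmdb", "GeoIP2-ISP.mmdb", "GeoIP2-Domain.mmdb", "GeoIP2-Connection-Type.mmdb"]
print(json.dumps(config_from_files("s3://my-private-bucket/third-party/maxmind", files), indent=4))
```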

The data from these databases would then be loaded into the following columns:

Column name       Purpose
ip_isp            ISP name
ip_organization   Organization name for larger networks
ip_domain         Second-level domain name
ip_netspeed       Indication of connection type (dial-up, cellular, cable/DSL)
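As with the geo columns, here is a Python sketch of the mapping. The record keys (isp, organization, domain, connection_type) follow MaxMind's ISP, Domain and Connection Type databases, but which key feeds which column is an assumption based on the table above, ip_columns is a hypothetical helper, and the sample values are invented placeholders.

```python
def ip_columns(isp_record=None, domain_record=None, connection_record=None) -> dict:
    """Map ISP, Domain and Connection Type records to Snowplow ip_* columns.

    Each argument is the lookup result from one database, or None when that
    database is not configured (the matching columns are then left empty).
    """
    isp_record = isp_record or {}
    domain_record = domain_record or {}
    connection_record = connection_record or {}
    return {
        "ip_isp": isp_record.get("isp"),
        "ip_organization": isp_record.get("organization"),
        "ip_domain": domain_record.get("domain"),
        "ip_netspeed": connection_record.get("connection_type"),
    }


# Placeholder records; real values come from the .mmdb lookups.
print(ip_columns(
    {"isp": "Example ISP", "organization": "Example Org"},
    {"domain": "example.com"},
    {"connection_type": "Cable/DSL"},
))
```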

For help on this or any other enrichment please contact support@snowplowanalytics.com.