PII Pseudonymization Enrichment

Summary

The PII Enrichment helps Snowplow users protect the privacy rights of data subjects, which in turn supports compliance with data protection regulations.

Overview

As more regulation is introduced worldwide to protect individuals with regard to the behavioural and personal data collected, stored and processed about them, Snowplow wants to give our users more control over how that data is handled.

This enrichment builds on the ability to pseudonymize selected fields collected by Snowplow trackers. Its configuration specifies which fields to hash, along with settings for the hashing itself (the hash function and the salt).
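At its core, pseudonymization replaces a clear-text value with a salted hash of that value. The Python sketch below illustrates the idea for a single value. It is not the enrichment's code: the pseudonymize function and the HASH_FUNCTIONS mapping from configuration names to hashlib names are hypothetical, and appending the salt to the value before hashing, with a hex-encoded digest, is an assumption.

```python
import hashlib

# Hypothetical mapping from configuration-style names to hashlib algorithm names.
HASH_FUNCTIONS = {
    "MD5": "md5",
    "SHA-1": "sha1",
    "SHA-256": "sha256",
    "SHA-384": "sha384",
    "SHA-512": "sha512",
}


def pseudonymize(value: str, salt: str, hash_function: str = "SHA-256") -> str:
    """Replace a clear-text value with a salted, hex-encoded hash.

    Assumption: the salt is appended to the value before hashing; the
    enrichment's exact concatenation and encoding may differ.
    """
    digest = hashlib.new(HASH_FUNCTIONS[hash_function])
    digest.update((value + salt).encode("utf-8"))
    return digest.hexdigest()


# The same value and salt always produce the same pseudonym.
print(pseudonymize("user-42", "pepper123"))
print(pseudonymize("user-42", "pepper123", "SHA-1"))
```

Because hashing is deterministic, events from the same user can still be joined on the pseudonym, while the salt makes it harder to reverse the hash with tables of precomputed unsalted hashes.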


For help setting up this enrichment for your pipeline, please contact us at support@snowplowanalytics.com.

Example

{
  "schema": "iglu:com.snowplowanalytics.snowplow.enrichments\/pii_enrichment_config\/jsonschema\/2-0-0",
  "data": {
    "vendor": "com.snowplowanalytics.snowplow.enrichments",
    "name": "pii_enrichment_config",
    "emitEvent": true,
    "enabled": true,
    "parameters": {
      "pii": [
        {
          "pojo": {
            "field": "user_id"
          }
        },
        {
          "pojo": {
            "field": "user_fingerprint"
          }
        },
        {
          "json": {
            "field": "unstruct_event",
            "schemaCriterion": "iglu:com.mailchimp\/subscribe\/jsonschema\/1-*-*",
            "jsonPath": "$.data.['email', 'ip_opt']"
          }
        }
      ],
      "strategy": {
        "pseudonymize": {
          "hashFunction": "SHA-256",
          "salt": "pepper123"
        }
      }
    }
  }
}
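The pii array mixes two kinds of entries: pojo entries name a field of the canonical event directly, while json entries point into a self-describing JSON field and add a schema criterion and a JSONPath. The illustrative Python sketch below loads a copy of the pii array from the example (with the escaped slashes written plainly) and lists what each entry targets; the describe_targets helper is hypothetical and not part of Snowplow.

```python
import json

# Copy of the "pii" array from the example configuration.
PII_ENTRIES = json.loads("""
[
  {"pojo": {"field": "user_id"}},
  {"pojo": {"field": "user_fingerprint"}},
  {"json": {"field": "unstruct_event",
            "schemaCriterion": "iglu:com.mailchimp/subscribe/jsonschema/1-*-*",
            "jsonPath": "$.data.['email', 'ip_opt']"}}
]
""")


def describe_targets(entries: list) -> list:
    """Return one human-readable line per entry in the pii array."""
    lines = []
    for entry in entries:
        if "pojo" in entry:
            # Canonical event field, hashed as a whole.
            lines.append(f"canonical field {entry['pojo']['field']}")
        elif "json" in entry:
            # Self-describing JSON field, filtered by schema and JSONPath.
            j = entry["json"]
            lines.append(f"{j['field']} matching {j['schemaCriterion']} at {j['jsonPath']}")
    return lines


for line in describe_targets(PII_ENTRIES):
    print(line)
```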

The configuration above is for a Snowplow pipeline that is receiving events from the Snowplow JavaScript Tracker, plus a Mailchimp webhook integration:

  • The Snowplow JavaScript Tracker has been configured to emit events that include the user_id and user_fingerprint fields
  • The Mailchimp webhook (available since release 0.9.11) is emitting subscribe events (among other events, ignored for the purpose of this example)

With the above PII Enrichment configuration, then, you are specifying that:

  • You wish for the user_id and user_fingerprint fields of the Snowplow canonical event model to be hashed (the full list of fields supported for pseudonymization is given in the enrichment configuration schema)
  • You wish for the data.email and data.ip_opt fields from the Mailchimp subscribe event to be hashed, but only if the schema version begins with 1-
  • You wish to use the SHA-256 hash function for the pseudonymization
  • You wish for re-identification events to be emitted to the pii stream, because emitEvent is set to true (see the Stream Enrich configuration for configuring that stream)
  • You wish for the salt value pepper123 to be used in hashing all the values
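Taken together, these settings can be illustrated with the Python sketch below. It is a simplified model, not the enrichment's implementation: the flat event dictionary, the hand translation of "$.data.['email', 'ip_opt']" into a parent key plus leaf keys, the value-plus-salt hashing scheme and the shape of the re-identification records are all assumptions, and enrich and schema_matches are hypothetical helpers.

```python
import hashlib

SALT = "pepper123"
POJO_FIELDS = ["user_id", "user_fingerprint"]
CRITERION = "iglu:com.mailchimp/subscribe/jsonschema/1-*-*"
# Hand translation (assumption) of "$.data.['email', 'ip_opt']".
JSON_PARENT, JSON_KEYS = "data", ["email", "ip_opt"]


def pseudonymize(value, salt=SALT):
    # Assumption: SHA-256 over value + salt, hex-encoded.
    return hashlib.sha256((str(value) + salt).encode("utf-8")).hexdigest()


def schema_matches(criterion, schema):
    """Match an Iglu URI against a criterion such as .../jsonschema/1-*-*."""
    crit_prefix, crit_version = criterion.rsplit("/", 1)
    prefix, version = schema.rsplit("/", 1)
    crit_parts, parts = crit_version.split("-"), version.split("-")
    if crit_prefix != prefix or len(crit_parts) != len(parts):
        return False
    # "*" matches any value in that position of MODEL-REVISION-ADDITION.
    return all(c == "*" or c == p for c, p in zip(crit_parts, parts))


def enrich(event):
    """Return a pseudonymized copy of the event and simplified re-identification records."""
    out = dict(event)  # shallow copy; nested dicts are copied before changes
    reid = []
    for field in POJO_FIELDS:
        if out.get(field) is not None:
            hashed = pseudonymize(out[field])
            reid.append({"field": field, "original": out[field], "hashed": hashed})
            out[field] = hashed
    ue = out.get("unstruct_event")
    if ue and schema_matches(CRITERION, ue["schema"]):
        payload = dict(ue["data"])
        parent = dict(payload.get(JSON_PARENT, {}))
        for key in JSON_KEYS:
            if key in parent:
                hashed = pseudonymize(parent[key])
                reid.append({"field": f"unstruct_event.{JSON_PARENT}.{key}",
                             "original": parent[key], "hashed": hashed})
                parent[key] = hashed
        payload[JSON_PARENT] = parent
        out["unstruct_event"] = {"schema": ue["schema"], "data": payload}
    return out, reid


event = {
    "user_id": "jane",
    "user_fingerprint": "123456",
    "unstruct_event": {
        "schema": "iglu:com.mailchimp/subscribe/jsonschema/1-0-0",
        "data": {"type": "subscribe",
                 "data": {"email": "jane@example.com", "ip_opt": "198.51.100.7"}},
    },
}
enriched, reid = enrich(event)
for record in reid:
    print(record["field"], record["hashed"][:12])
```

Because the criterion is 1-*-*, a subscribe event carrying version 1-0-3 would be hashed, while one carrying 2-0-0 would pass through unchanged until the configuration is updated.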