Encrypted data ingestion

Last update: 2023-11-08
  • Topics: Sources
  • Created for: Developer, User, Admin, Leader

Adobe Experience Platform allows you to ingest encrypted files through cloud storage batch sources. With encrypted data ingestion, you can leverage asymmetric encryption mechanisms to securely transfer batch data into Experience Platform. Currently, the supported asymmetric encryption mechanisms are PGP and GPG.

The encrypted data ingestion process is as follows:

  1. Create an encryption key pair using Experience Platform APIs. The encryption key pair consists of a private key and a public key. Once created, you can copy or download the public key, alongside its corresponding public key ID and expiry time. During this process, the private key will be stored by Experience Platform in a secure vault. NOTE: The public key in the response is Base64-encoded and must be decoded prior to using.
  2. Use the public key to encrypt the data file that you want to ingest.
  3. Place your encrypted file in your cloud storage.
  4. Once the encrypted file is ready, create a source connection and a dataflow for your cloud storage source. During the flow creation step, you must provide an encryption parameter and include your public key ID.
  5. Experience Platform retrieves the private key from the secure vault to decrypt the data at the time of ingestion.

IMPORTANT

The maximum size of a single encrypted file is 1 GB. For example, you can ingest 2 GB of data in a single dataflow run, but no individual file in that run can exceed 1 GB.
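
Because the limit applies per file, it can help to check file sizes before uploading to cloud storage. The following is a minimal sketch and not part of any Platform tooling: the function name check_encrypted_file_size is hypothetical, and it assumes that 1 GB means 1,073,741,824 bytes, since this document does not state whether the limit is binary or decimal.

```shell
# Hypothetical pre-upload check; not an Adobe tool.
# Assumption: "1 GB" is 1073741824 bytes (2^30). Pass another limit as $2 if needed.
MAX_ENCRYPTED_FILE_BYTES=1073741824

check_encrypted_file_size() {
  size_check_file=$1
  size_check_limit=${2:-$MAX_ENCRYPTED_FILE_BYTES}
  if [ ! -f "$size_check_file" ]; then
    echo "not a file: $size_check_file" >&2
    return 2
  fi
  # wc -c prints the byte count; tr strips the padding some platforms add.
  size_check_bytes=$(wc -c < "$size_check_file" | tr -d '[:space:]')
  if [ "$size_check_bytes" -gt "$size_check_limit" ]; then
    echo "TOO LARGE: $size_check_file ($size_check_bytes bytes, limit $size_check_limit)"
    return 1
  fi
  echo "OK: $size_check_file ($size_check_bytes bytes)"
}
```

For example, check_encrypted_file_size customers.csv.gpg returns a non-zero status when the file exceeds the assumed limit, so it can gate an upload script.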

This document provides steps on how to generate an encryption key pair to encrypt your data, and ingest that encrypted data into Experience Platform using cloud storage sources.

Getting started

This tutorial requires you to have a working understanding of the following components of Adobe Experience Platform:

  • Sources: Experience Platform allows data to be ingested from various sources while providing you with the ability to structure, label, and enhance incoming data using Platform services.
    • Cloud storage sources: Create a dataflow to bring batch data from your cloud storage source to Experience Platform.
  • Sandboxes: Experience Platform provides virtual sandboxes which partition a single Platform instance into separate virtual environments to help develop and evolve digital experience applications.

Using Platform APIs

For information on how to successfully make calls to Platform APIs, see the guide on getting started with Platform APIs.

Supported file extensions for encrypted files

The supported file extensions for encrypted files are as follows:

  • .csv
  • .tsv
  • .json
  • .parquet
  • .csv.gpg
  • .tsv.gpg
  • .json.gpg
  • .parquet.gpg
  • .csv.pgp
  • .tsv.pgp
  • .json.pgp
  • .parquet.pgp
  • .gpg
  • .pgp

NOTE

Encrypted file ingestion in Adobe Experience Platform Sources supports OpenPGP, not any specific proprietary version of PGP.
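
The extension list above can be checked locally before files are placed in cloud storage. This is a hedged sketch: the function name is hypothetical, and the check is case-sensitive because this document does not say how Platform treats upper-case extensions. Since every compound extension in the list ends in .gpg or .pgp, the check reduces to six suffixes.

```shell
# Hypothetical helper: returns 0 when the file name ends in a listed extension.
# Compound forms such as .csv.gpg are covered by the *.gpg and *.pgp arms.
is_supported_encrypted_extension() {
  case "$1" in
    *.csv|*.tsv|*.json|*.parquet|*.gpg|*.pgp) return 0 ;;
    *) return 1 ;;
  esac
}
```

Note that a suffix check like this only inspects the file name; it does not confirm that the file contents are valid OpenPGP data.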

Create encryption key pair

The first step in ingesting encrypted data to Experience Platform is to create your encryption key pair by making a POST request to the /encryption/keys endpoint of the Connectors API.

API format

POST /data/foundation/connectors/encryption/keys

Request

The following request generates an encryption key pair using the PGP encryption algorithm.

curl -X POST \
  'https://platform.adobe.io/data/foundation/connectors/encryption/keys' \
  -H 'Authorization: Bearer {{ACCESS_TOKEN}}' \
  -H 'x-api-key: {{API_KEY}}' \
  -H 'x-gw-ims-org-id: {{ORG_ID}}' \
  -H 'x-sandbox-name: {{SANDBOX_NAME}}' \
  -H 'Content-Type: application/json' \
  -d '{
      "encryptionAlgorithm": "PGP",
      "params": {
          "passPhrase": "{{PASSPHRASE}}"
      }
  }'

Parameter | Description
encryptionAlgorithm | The type of encryption algorithm that you are using. The supported encryption types are PGP and GPG.
params.passPhrase | The passphrase provides an additional layer of protection for your encryption keys. Upon creation, Experience Platform stores the passphrase in a different secure vault from the public key. You must provide a non-empty string as a passphrase.
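
The two constraints above (a supported algorithm and a non-empty passphrase) can be enforced before the request is sent. The following sketch builds the request body in shell; build_key_request_body is a hypothetical helper, and its escaping only handles backslashes and double quotes, so passphrases containing control characters would need a real JSON encoder.

```shell
# Hypothetical helper: validates the inputs and prints the JSON request body.
build_key_request_body() {
  key_req_algorithm=$1
  key_req_passphrase=$2
  case "$key_req_algorithm" in
    PGP|GPG) ;;
    *) echo "unsupported encryptionAlgorithm: $key_req_algorithm" >&2; return 1 ;;
  esac
  if [ -z "$key_req_passphrase" ]; then
    echo "passPhrase must be a non-empty string" >&2
    return 1
  fi
  # Escape backslashes and double quotes so the value stays valid inside a JSON string.
  key_req_escaped=$(printf '%s' "$key_req_passphrase" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"encryptionAlgorithm":"%s","params":{"passPhrase":"%s"}}\n' \
    "$key_req_algorithm" "$key_req_escaped"
}
```

The output can be passed to the curl call shown earlier, for example with -d "$(build_key_request_body PGP "$PASSPHRASE")".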

Response

A successful response returns your Base64-encoded public key, public key ID, and the expiry time of your keys. The expiry time is automatically set to 180 days after the date of key generation and is currently not configurable.

{
    "publicKey": "{PUBLIC_KEY}",
    "publicKeyId": "{PUBLIC_KEY_ID}",
    "expiryTime": "1684843168"
}

Property | Description
publicKey | The public key is used to encrypt the data in your cloud storage. This key corresponds with the private key that was also created during this step. The private key is not returned; it is stored by Experience Platform in a secure vault.
publicKeyId | The public key ID is used to create a dataflow and ingest your encrypted cloud storage data to Experience Platform.
expiryTime | The expiry time defines the expiration date of your encryption key pair. This date is automatically set to 180 days after the date of key generation and is displayed in Unix timestamp format.
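
Before the public key can be used for encryption, the publicKey value must be Base64-decoded. The sketch below shows one way to do that and then encrypt a file with GnuPG. The function names are hypothetical, and two assumptions apply: that the decoded value is an OpenPGP public key that gpg can read from a file, and that GnuPG 2.1.14 or later is installed (needed for --recipient-file).

```shell
# Hypothetical helper: decode the Base64 publicKey value into a key file.
decode_public_key() {
  encoded_key=$1
  key_output_file=$2
  printf '%s' "$encoded_key" | base64 --decode > "$key_output_file"
}

# Hypothetical helper: encrypt a data file to that key, writing <input>.gpg.
# --recipient-file reads the key directly from the file, so no keyring import is needed.
encrypt_for_ingestion() {
  platform_key_file=$1
  data_file=$2
  gpg --batch --yes --recipient-file "$platform_key_file" \
      --output "$data_file.gpg" --encrypt "$data_file"
}
```

As a usage example, decode_public_key "$PUBLIC_KEY" platform_public_key.asc followed by encrypt_for_ingestion platform_public_key.asc customers.csv would write customers.csv.gpg, which uses one of the supported extensions.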

(Optional) Create sign verification key pair for signed data

Create customer managed key pair

You can optionally create a sign verification key pair to sign and ingest your encrypted data.

During this stage, you must generate your own private key and public key combination and then use your private key to sign your encrypted data. Next, you must encode your public key in Base64 and then share it with Experience Platform so that Platform can verify your signature.
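
The Base64-encoding step can be sketched as follows. The helper names are hypothetical, and the sketch assumes that your signing key already exists in your local GnuPG keyring and that Platform accepts the Base64 encoding of the ASCII-armored export; this document does not specify the exact key format.

```shell
# Hypothetical helper: Base64-encode stdin onto a single line.
# base64 may wrap long output, so newlines are stripped to keep the value JSON-safe.
base64_one_line() {
  base64 | tr -d '\n'
}

# Hypothetical helper: export the public half of a key from the local keyring
# (identified by key ID, fingerprint, or email) and encode it for the request body.
export_public_key_base64() {
  gpg --armor --export "$1" | base64_one_line
}
```

The resulting single-line string is the kind of value the publicKey parameter in the next request expects.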

Share your public key with Experience Platform

To share your public key, make a POST request to the /customer-keys endpoint while providing your encryption algorithm and your Base64-encoded public key.

API format

POST /data/foundation/connectors/encryption/customer-keys

Request

curl -X POST \
  'https://platform.adobe.io/data/foundation/connectors/encryption/customer-keys' \
  -H 'Authorization: Bearer {{ACCESS_TOKEN}}' \
  -H 'x-api-key: {{API_KEY}}' \
  -H 'x-gw-ims-org-id: {{ORG_ID}}' \
  -H 'x-sandbox-name: {{SANDBOX_NAME}}' \
  -H 'Content-Type: application/json' \
  -d '{
      "encryptionAlgorithm": "{{ENCRYPTION_ALGORITHM}}",
      "publicKey": "{{BASE_64_ENCODED_PUBLIC_KEY}}"
    }'

Parameter | Description
encryptionAlgorithm | The type of encryption algorithm that you are using. The supported encryption types are PGP and GPG.
publicKey | The public key that corresponds to the customer managed keys used for signing your encrypted data. This key must be Base64-encoded.

Response

{
  "publicKeyId": "e31ae895-7896-469a-8e06-eb9207ddf1c2"
}

Property | Description
publicKeyId | This public key ID is returned in response to sharing your customer managed key with Experience Platform. You can provide this public key ID as the sign verification key ID when creating a dataflow for signed and encrypted data.

Connect your cloud storage source to Experience Platform using the Flow Service API

After creating your encryption key pair, you can create a source connection for your cloud storage source and bring your encrypted data into Platform.

First, you must create a base connection to authenticate your source against Platform. To do so, follow the base connection tutorial for the cloud storage source that you want to use.

After creating a base connection, you must then follow the steps outlined in the tutorial for creating a source connection for a cloud storage source in order to create a source connection, a target connection, and a mapping.

Create a dataflow for encrypted data

NOTE

You must have the following values in order to create a dataflow for encrypted data ingestion: the public key ID, the source connection ID, the target connection ID, and the mapping ID.

To create a dataflow, make a POST request to the /flows endpoint of the Flow Service API. To ingest encrypted data, you must add an encryption section to the transformations property and include the publicKeyId that was created in an earlier step.

API format

POST /flows

Request

The following request creates a dataflow to ingest encrypted data for a cloud storage source.

curl -X POST \
  'https://platform.adobe.io/data/foundation/flowservice/flows' \
  -H 'Authorization: Bearer {{ACCESS_TOKEN}}' \
  -H 'x-api-key: {{API_KEY}}' \
  -H 'x-gw-ims-org-id: {{ORG_ID}}' \
  -H 'x-sandbox-name: {{SANDBOX_NAME}}' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "ACME Customer Data",
    "description": "ACME Customer Data (Encrypted)",
    "flowSpec": {
        "id": "9753525b-82c7-4dce-8a9b-5ccfce2b9876",
        "version": "1.0"
    },
    "sourceConnectionIds": [
        "655f7c1b-1977-49b3-a429-51379ecf0e15"
    ],
    "targetConnectionIds": [
        "de688225-d619-481c-ae3b-40c250fd7c79"
    ],
    "transformations": [
        {
            "name": "Mapping",
            "params": {
                "mappingId": "6b6e24213dbe4f57bd8207d21034ff03",
                "mappingVersion":"0"
            }
        },
        {
            "name": "Encryption",
            "params": {
                "publicKeyId":"311ef6f8-9bcd-48cf-a9e9-d12c45fb7a17"
            }
        }
    ],
    "scheduleParams": {
        "startTime": "1675793392",
        "frequency": "once"
    }
}'

Property | Description
flowSpec.id | The flow spec ID that corresponds with cloud storage sources.
sourceConnectionIds | The source connection ID. This ID represents the transfer of data from source to Platform.
targetConnectionIds | The target connection ID. This ID represents where the data lands once it is brought over to Platform.
transformations[x].params.mappingId | The mapping ID.
transformations[x].name | When ingesting encrypted files, you must provide Encryption as an additional transformations parameter for your dataflow.
transformations[x].params.publicKeyId | The public key ID that you created. This ID is one half of the encryption key pair used to encrypt your cloud storage data.
scheduleParams.startTime | The start time for the dataflow in epoch time.
scheduleParams.frequency | The frequency at which the dataflow will collect data. Acceptable values include: once, minute, hour, day, or week.
scheduleParams.interval | The interval designates the period between two consecutive flow runs. The interval's value should be a non-zero integer. Interval is not required when frequency is set to once, and should be greater than or equal to 15 for other frequency values.
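
The scheduling rules above can be checked before a flow is submitted. This is a minimal sketch that follows the rules exactly as worded in this document (interval optional for once, otherwise an integer of at least 15); validate_schedule is a hypothetical helper, not part of the Flow Service API.

```shell
# Hypothetical helper: returns 0 when frequency and interval satisfy the documented rules.
validate_schedule() {
  sched_frequency=$1
  sched_interval=${2:-}
  case "$sched_frequency" in
    once) return 0 ;;                # interval is not required for one-time flows
    minute|hour|day|week) ;;
    *) echo "unsupported frequency: $sched_frequency" >&2; return 1 ;;
  esac
  # Reject empty values and anything containing a non-digit character.
  case "$sched_interval" in
    ''|*[!0-9]*) echo "interval must be a non-zero integer" >&2; return 1 ;;
  esac
  if [ "$sched_interval" -lt 15 ]; then
    echo "interval must be at least 15 when frequency is $sched_frequency" >&2
    return 1
  fi
}
```

For instance, validate_schedule hour 15 succeeds, while validate_schedule minute 10 fails with a message on standard error.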
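
The following request creates a dataflow to ingest encrypted and signed data for a cloud storage source.

curl -X POST \
  'https://platform.adobe.io/data/foundation/flowservice/flows' \
  -H 'Authorization: Bearer {{ACCESS_TOKEN}}' \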
  -H 'x-api-key: {{API_KEY}}' \
  -H 'x-gw-ims-org-id: {{ORG_ID}}' \
  -H 'x-sandbox-name: {{SANDBOX_NAME}}' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "ACME Customer Data (with Sign Verification)",
    "description": "ACME Customer Data (with Sign Verification)",
    "flowSpec": {
        "id": "9753525b-82c7-4dce-8a9b-5ccfce2b9876",
        "version": "1.0"
    },
    "sourceConnectionIds": [
        "655f7c1b-1977-49b3-a429-51379ecf0e15"
    ],
    "targetConnectionIds": [
        "de688225-d619-481c-ae3b-40c250fd7c79"
    ],
    "transformations": [
        {
            "name": "Mapping",
            "params": {
                "mappingId": "6b6e24213dbe4f57bd8207d21034ff03",
                "mappingVersion":"0"
            }
        },
        {
            "name": "Encryption",
            "params": {
                "publicKeyId":"311ef6f8-9bcd-48cf-a9e9-d12c45fb7a17",
                "signVerificationKeyId":"e31ae895-7896-469a-8e06-eb9207ddf1c2"
            }
        }
    ],
    "scheduleParams": {
        "startTime": "1675793392",
        "frequency": "once"
    }
}'

Property | Description
params.signVerificationKeyId | The sign verification key ID is the same as the public key ID that was retrieved after sharing your Base64-encoded public key with Experience Platform.

Response

A successful response returns the ID (id) of the newly created dataflow for your encrypted data.

{
    "id": "dbc5c132-bc2a-4625-85c1-32bc2a262558",
    "etag": "\"8e000533-0000-0200-0000-5f3c40fd0000\""
}

Restrictions on recurring ingestion

Encrypted data ingestion does not support ingestion of recurring or multi-level folders in sources. All encrypted files must be contained in a single folder. Wildcards with multiple folders in a single source path are also not supported.

The following is an example of a supported folder structure, where the source path is /ACME-customers/*.csv.gpg.

In this scenario, only the files that match the source path pattern (File1.csv.gpg, File3.csv.gpg, and File5.csv.gpg) are ingested into Experience Platform.

  • ACME-customers
    • File1.csv.gpg
    • File2.json.gpg
    • File3.csv.gpg
    • File4.json
    • File5.csv.gpg
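
To see which files a wildcard selects in the supported structure above, the pattern can be tried locally. The sketch below uses shell glob matching as a stand-in; whether Platform's wildcard semantics match shell globbing in every detail is an assumption, and matches_source_pattern is a hypothetical helper.

```shell
# Hypothetical helper: returns 0 when a file name matches a glob pattern.
matches_source_pattern() {
  glob_pattern=$1
  candidate_name=$2
  # An unquoted variable in a case arm is treated as a glob, not a literal string.
  case "$candidate_name" in
    $glob_pattern) return 0 ;;
    *) return 1 ;;
  esac
}

# Apply the example pattern to the files in the ACME-customers folder.
for f in File1.csv.gpg File2.json.gpg File3.csv.gpg File4.json File5.csv.gpg; do
  if matches_source_pattern '*.csv.gpg' "$f"; then
    echo "ingested: $f"
  else
    echo "skipped:  $f"
  fi
done
```

Under this assumption, File1.csv.gpg, File3.csv.gpg, and File5.csv.gpg are reported as ingested, and File2.json.gpg and File4.json are skipped.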

The following is an example of an unsupported folder structure where the source path is /ACME-customers/*.

In this scenario, the flow run will fail and return an error message indicating that data cannot be copied from the source.

  • ACME-customers
    • File1.csv.gpg
    • File2.json.gpg
    • Subfolder1
      • File3.csv.gpg
      • File4.json.gpg
      • File5.csv.gpg
  • ACME-loyalty
    • File6.csv.gpg

Next steps

By following this tutorial, you have created an encryption key pair for your cloud storage data, and a dataflow to ingest your encrypted data using the Flow Service API. For status updates on your dataflow's completeness, errors, and metrics, read the guide on monitoring your dataflow using the Flow Service API.
