[Beta]{class="badge informative"} [Ultimate]{class="badge positive"}
Snowflake streaming source
- The Snowflake streaming source is in beta. Please read the Sources overview for more information on using beta-labeled sources.
- The Snowflake streaming source is available in the API to users who have purchased Real-Time Customer Data Platform Ultimate.
Adobe Experience Platform allows data to be ingested from external sources while providing you with the ability to structure, label, and enhance incoming data using Platform services. You can ingest data from a variety of sources such as Adobe applications, cloud-based storage, databases, and many others.
Experience Platform provides support for streaming data from a Snowflake database.
Understanding the Snowflake streaming source
The Snowflake streaming source works by having data loaded by periodically executing an SQL query and creating an output record for each row in the resulting set.
By using Kafka Connect, the Snowflake streaming source tracks the latest record that it receives from each table, so that it can start in the correct location for the next iteration. The source uses this functionality to filter data and only get the updated rows from a table on each iteration.
Prerequisites
The following section outlines prerequisite steps to complete before you can stream data from your Snowflake database to Experience Platform:
Update your IP address allow list
A list of IP addresses must be added to an allow list prior to working with source connectors. Failing to add your region-specific IP addresses to your allow list may lead to errors or non-performance when using sources. See the IP address allow list page for more information.
The documentation below provides information on how to connect Amazon Redshift to Platform using APIs or the user interface:
Gather required credentials
In order for Flow Service to connect with Snowflake, you must provide the following connection properties:
account
The full account identifier (account name or account locator) of your Snowflake account appended with the suffix snowflakecomputing.com
. The account identifier can be of different formats:
- {ORG_NAME}-{ACCOUNT_NAME}.snowflakecomputing.com (e.g.
acme-abc12345.snowflakecomputing.com
) - {ACCOUNT_LOCATOR}.{CLOUD_REGION_ID}.snowflakecomputing.com (e.g.
acme12345.ap-southeast-1.snowflakecomputing.com
) - {ACCOUNT_LOCATOR}.{CLOUD_REGION_ID}.{CLOUD}.snowflakecomputing.com (e.g.
acme12345.east-us-2.azure.snowflakecomputing.com
)
For more information, read the Snowflake document on account identifiers.
warehouse
database
username
password
role
public
.connectionSpec.id
51ae16c2-bdad-42fd-9fce-8d5dfddaf140
.Configure role settings configure-role-settings
You must configure privileges to a role, even if the default public role is assigned, to allow your source connection to access the relevant Snowflake database, schema, and table. The various privileges for different Snowflake entities is as follows:
For more information on role and privilege management, refer to the Snowflake API reference.
Limitations and frequently asked questions limitations-and-frequently-asked-questions
-
The data throughput for the Snowflake source is 2000 records per second.
-
Pricing can vary depending on the amount of time that a warehouse is active and the size of the warehouse. For the Snowflake source integration, the smallest size, x-small warehouse is sufficient. It is suggested to enable auto-suspend so that the warehouse can suspend on its own when not in use.
-
The Snowflake source polls the database for new data every 10 seconds.
-
Configuration options:
-
You can enable a
backfill
boolean flag for your Snowflake source when creating a source connection.- If backfill is set to true, then the value for timestamp.initial is set to 0. This means that data with a timestamp column greater than 0 epoch time are fetched.
- If backfill is set to false, then the value for timestamp.initial is set to -1. This means that data with a timestamp column greater than the current time (the time in which the source begins ingesting) are fetched.
-
The timestamp column should be formatted as type:
TIMESTAMP_LTZ
orTIMESTAMP_NTZ
. If the timestamp column is set toTIMESTAMP_NTZ
, then the corresponding timezone in which the values are stored should be passed via thetimezoneValue
parameter. If unprovided, the value will default to UTC.TIMESTAMP_TZ
cannot be used a timestamp column or in a mapping.
-
Next steps
The following tutorial provides steps on how to connect your Snowflake streaming source to Experience Platform using the API: