A dataflow is a scheduled task that retrieves data from a source and ingests it into a Platform dataset. This tutorial provides steps to configure a new dataflow using your cloud storage base connector.
This tutorial requires a working understanding of the following components of Adobe Experience Platform:
Additionally, this tutorial requires that you have already created a cloud storage connector. A list of tutorials for creating different cloud storage connectors in the UI can be found in the source connectors overview.
After creating your cloud storage connector, the Select data step appears, providing an interface for you to select the stream you want to ingest data from.
The Mapping step appears, providing an interactive interface to map the source data to a Platform dataset.
Choose a dataset for inbound data to be ingested into. You can either use an existing dataset or create a new one.
Use an existing dataset
To ingest data into an existing dataset, select Use existing dataset, then click the dataset icon.
The Select dataset dialog appears. Find the dataset you wish to use, select it, then click Continue.
Use a new dataset
To ingest data into a new dataset, select Create new dataset and enter a name and description for the dataset in the fields provided. Then, select the schema you want to use from the dropdown.
The Dataflow detail step appears, allowing you to name your new dataflow and give it a brief description.
Provide values for the dataflow and click Next.
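The choices made in the Mapping and Dataflow detail steps can be summarized as a small data model. The following is a minimal sketch only, not a Platform API: the class names `DatasetChoice` and `DataflowDraft`, their fields, the sample values, and the rule that only the dataflow name is mandatory are all hypothetical and introduced purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetChoice:
    """Target dataset picked in the Mapping step (hypothetical model)."""
    existing_dataset_name: Optional[str] = None  # "Use existing dataset"
    new_dataset_name: Optional[str] = None       # "Create new dataset"
    new_dataset_description: str = ""
    schema_name: Optional[str] = None            # only needed for a new dataset

    def validate(self) -> None:
        use_existing = self.existing_dataset_name is not None
        create_new = self.new_dataset_name is not None
        # Exactly one of the two options must be chosen.
        if use_existing == create_new:
            raise ValueError("choose exactly one: an existing dataset or a new one")
        if create_new and not self.schema_name:
            raise ValueError("a new dataset needs a schema")


@dataclass
class DataflowDraft:
    """Values entered in the Dataflow detail step (hypothetical model)."""
    name: str
    description: str
    dataset: DatasetChoice

    def validate(self) -> None:
        # Assumption: only the name is treated as mandatory here.
        if not self.name.strip():
            raise ValueError("the dataflow needs a name")
        self.dataset.validate()


# Usage: a draft that creates a new dataset (all values are made up).
draft = DataflowDraft(
    name="Example dataflow",
    description="Illustrative draft only",
    dataset=DatasetChoice(
        new_dataset_name="Example dataset",
        schema_name="Example schema",
    ),
)
draft.validate()  # raises ValueError if the draft is inconsistent
```

Modeling the dataset choice as its own object keeps the "existing or new, never both" rule in one place, which mirrors the two mutually exclusive options in the UI.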
The Review step appears, allowing you to review your new dataflow before it is created. Details are grouped within the following categories:
Once you have reviewed your dataflow, click Finish and allow some time for the dataflow to be created.
Once your cloud storage dataflow has been created, you can monitor the data that is being ingested through it. For more information on monitoring and deleting dataflows, see the tutorial on monitoring dataflows.
By following this tutorial, you have successfully created a dataflow to bring in data from an external cloud storage source, and gained insight into monitoring dataflows. Incoming data can now be used by downstream Platform services such as Real-time Customer Profile and Data Science Workspace. See the following documents for more details:
The following sections provide additional information for working with source connectors.
When a dataflow is created, it immediately becomes active and ingests data according to the schedule it was given. You can disable an active dataflow at any time by following the instructions below.
Within the Sources workspace, click the Browse tab. Next, click the name of the connection that’s associated with the active dataflow you wish to disable.
The Source activity page appears. Select the active dataflow from the list to open its Properties column on the right-hand side of the screen, which contains an Enabled toggle button. Click the toggle to disable the dataflow. The same toggle can be used to re-enable a dataflow after it has been disabled.
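The enable and disable behavior described above amounts to a two-state toggle. As a minimal sketch, assuming a hypothetical `Dataflow` class that is not part of any Platform SDK, the lifecycle looks like this:

```python
from dataclasses import dataclass


@dataclass
class Dataflow:
    """Hypothetical stand-in for a dataflow shown in the Sources workspace."""
    name: str
    enabled: bool = True  # a dataflow is active as soon as it is created

    def toggle(self) -> bool:
        """Flip the Enabled state, like clicking the toggle, and return it."""
        self.enabled = not self.enabled
        return self.enabled


flow = Dataflow(name="Example dataflow")  # made-up name
flow.toggle()  # disables the dataflow
flow.toggle()  # the same toggle re-enables it
```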
Inbound data from your source connector can be used to enrich and populate your Real-time Customer Profile data. For more information on populating your Real-time Customer Profile data, see the tutorial on Profile population.