Filter row-level data for a source using the Flow Service API
Read this guide for steps on how to filter row-level data for a source using the Flow Service API.
Get started
This tutorial requires you to have a working understanding of the following components of Adobe Experience Platform:
- Sources: Experience Platform allows data to be ingested from various sources while providing you with the ability to structure, label, and enhance incoming data using Platform services.
- Sandboxes: Experience Platform provides virtual sandboxes which partition a single Platform instance into separate virtual environments to help develop and evolve digital experience applications.
Using Platform APIs
For information on how to successfully make calls to Platform APIs, see the guide on getting started with Platform APIs.
Filter source data
The following outlines steps to take in order to filter row-level data for your source.
Retrieve your connection specs
The first step in filtering row-level data for your source is to retrieve your source’s connection specs and determine the operators and language that your source supports.
To retrieve a given source’s connection spec, make a GET request to the /connectionSpecs endpoint of the Flow Service API and provide the property name of your source as part of your query parameters.
API format
GET /connectionSpecs/{QUERY_PARAMS}
| Parameter | Description |
| --- | --- |
| {QUERY_PARAMS} | Optional query parameters. You can retrieve the connection spec for your source by filtering against the name property and specifying "google-big-query" in your search. |

The following request retrieves the connection specs for Google BigQuery.
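A representative sketch of this request, assuming the standard Platform API gateway host and authentication headers ({ACCESS_TOKEN}, {API_KEY}, {ORG_ID}, and {SANDBOX_NAME} are placeholders for your own credentials):

```shell
# Sketch: retrieve connection specs filtered by the source's name property.
curl -X GET \
  'https://platform.adobe.io/data/foundation/flowservice/connectionSpecs?property=name=="google-big-query"' \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {ORG_ID}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}'
```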
A successful response returns the status code 200 and the connection specs for Google BigQuery, including information on its supported query language and logical operators.
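The abridged sketch below illustrates what the filterAtSource portion of the response might look like. The operator lists, escape characters, and {CONNECTION_SPEC_ID} are illustrative placeholders rather than guaranteed output:

```json
{
  "items": [
    {
      "id": "{CONNECTION_SPEC_ID}",
      "name": "google-big-query",
      "attributes": {
        "filterAtSource": {
          "enabled": true,
          "queryLanguage": "PQL",
          "logicalOperators": ["and", "or", "not"],
          "comparisonOperators": ["==", "!=", "<", ">", "<=", ">=", "like", "in"],
          "columnNameEscapeChar": "`",
          "valueEscapeChar": "'"
        }
      }
    }
  ]
}
```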
| Property | Description |
| --- | --- |
| attributes.filterAtSource.enabled | Determines whether the queried source supports filtering for row-level data. |
| attributes.filterAtSource.queryLanguage | Determines the query language that the queried source supports. |
| attributes.filterAtSource.logicalOperators | Determines the logical operators that you can use to filter row-level data for your source. |
| attributes.filterAtSource.comparisonOperators | Determines the comparison operators that you can use to filter row-level data for your source. See the table below for more information on comparison operators. |
| attributes.filterAtSource.columnNameEscapeChar | Determines the character to use to escape columns. |
| attributes.filterAtSource.valueEscapeChar | Determines how values will be surrounded when writing an SQL query. |
Comparison operators

The following table outlines the comparison operators that you can use when filtering row-level data.

| Operator | Description |
| --- | --- |
| == | Checks whether two values are equal. |
| != | Checks whether two values are not equal. |
| < | Checks whether one value is less than another. |
| > | Checks whether one value is greater than another. |
| <= | Checks whether one value is less than or equal to another. |
| >= | Checks whether one value is greater than or equal to another. |
| like | Used in a WHERE clause to search for a specified pattern. |
| in | Used to specify multiple values in a WHERE clause. |
Specify filtering conditions for ingestion
Once you have identified the logical operators and query language that your source supports, you can use Profile Query Language (PQL) to specify the filtering conditions you want to apply to your source data.
In the example below, a condition is applied to select only the data where the looked-up field (city) equals the provided literal value (DDN).
```json
{
  "type": "PQL",
  "format": "pql/json",
  "value": {
    "nodeType": "fnApply",
    "fnName": "=",
    "params": [
      {
        "nodeType": "fieldLookup",
        "fieldName": "city"
      },
      {
        "nodeType": "literal",
        "value": "DDN"
      }
    ]
  }
}
```
Preview your data
You can preview your data by making a GET request to the /explore
endpoint of the Flow Service API while providing filters
as part of your query parameters and specifying your PQL input conditions in Base64.
API format
GET /connections/{BASE_CONNECTION_ID}/explore?objectType=table&object={TABLE_PATH}&preview=true&filters={FILTERS}
| Parameter | Description |
| --- | --- |
| {BASE_CONNECTION_ID} | The ID of the base connection for your source. |
| {TABLE_PATH} | The path of the table that you want to preview. |
| {FILTERS} | Your PQL filtering conditions encoded in Base64. |
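As a sketch, the preview request below assumes the standard Platform API headers, with {FILTERS} standing for the Base64 encoding of the PQL payload shown above (for example, produced by piping the JSON through base64):

```shell
# Sketch: preview filtered table data.
# {FILTERS} is the Base64-encoded PQL payload; other {} values are placeholders.
curl -X GET \
  'https://platform.adobe.io/data/foundation/flowservice/connections/{BASE_CONNECTION_ID}/explore?objectType=table&object={TABLE_PATH}&preview=true&filters={FILTERS}' \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {ORG_ID}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}'
```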
A successful response returns the contents and structure of your data.
Create a source connection for filtered data
To create a source connection and ingest filtered data, make a POST request to the /sourceConnections
endpoint and provide your filtering conditions in the request body parameters.
API format
POST /sourceConnections
The following request creates a source connection to ingest data from test1.fasTestTable where city = DDN.
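A representative sketch of this request. The {BASE_CONNECTION_ID} and {CONNECTION_SPEC_ID} values are placeholders, and the tableName parameter name is an assumption that depends on your source; the filters object reuses the PQL payload from earlier:

```shell
# Sketch: create a source connection with row-level filtering.
# {BASE_CONNECTION_ID} and {CONNECTION_SPEC_ID} are placeholders;
# "tableName" is an assumed, source-dependent parameter name.
curl -X POST \
  'https://platform.adobe.io/data/foundation/flowservice/sourceConnections' \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {ORG_ID}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "BigQuery source connection with row-level filtering",
    "baseConnectionId": "{BASE_CONNECTION_ID}",
    "connectionSpec": {
      "id": "{CONNECTION_SPEC_ID}",
      "version": "1.0"
    },
    "data": {
      "format": "tabular"
    },
    "params": {
      "tableName": "test1.fasTestTable",
      "filters": {
        "type": "PQL",
        "format": "pql/json",
        "value": {
          "nodeType": "fnApply",
          "fnName": "=",
          "params": [
            {
              "nodeType": "fieldLookup",
              "fieldName": "city"
            },
            {
              "nodeType": "literal",
              "value": "DDN"
            }
          ]
        }
      }
    }
  }'
```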
A successful response returns the unique identifier (id) of the newly created source connection.
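The response resembles the following sketch, with placeholder values:

```json
{
  "id": "{SOURCE_CONNECTION_ID}",
  "etag": "{ETAG}"
}
```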
Filter activity entities for Marketo Engage
You can use row-level filtering to filter for activity entities when using the Marketo Engage source connector. Currently, you can only filter for activity entities and standard activity types. Custom activities remain governed under Marketo field mappings.
Marketo standard activity types
The following table outlines the standard activity types for Marketo. Use this table as reference for your filtering criteria.
Follow the steps below to filter your standard activity entities when using the Marketo source connector.
Create a draft dataflow
First, create a Marketo dataflow and save it as a draft. Refer to the documentation on creating a draft dataflow for detailed steps.
Retrieve your dataflow ID
Once you have a drafted dataflow, you must then retrieve its corresponding ID.
In the UI, navigate to the sources catalog and then select Dataflows from the top header. Use the status column to identify all dataflows that were saved in draft mode, and then select your dataflow’s name. Next, use the Properties panel on the right to locate your dataflow ID.
Retrieve your dataflow details
Next, you must retrieve your dataflow details, particularly the source connection ID associated with your dataflow. To retrieve your dataflow details, make a GET request to the /flows
endpoint and provide your dataflow ID as a path parameter.
API format
GET /flows/{FLOW_ID}
| Parameter | Description |
| --- | --- |
| {FLOW_ID} | The ID of the dataflow that you want to retrieve. |

The following request retrieves information on dataflow ID a7e88a01-40f9-4ebf-80b2-0fc838ff82ef.
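A sketch of the request, assuming the standard Platform API headers:

```shell
# Sketch: retrieve the details of a specific dataflow by its ID.
curl -X GET \
  'https://platform.adobe.io/data/foundation/flowservice/flows/a7e88a01-40f9-4ebf-80b2-0fc838ff82ef' \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {ORG_ID}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}'
```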
A successful response returns your dataflow details, including information on its corresponding source and target connections. Take note of your source and target connection IDs, as these values are required later to publish your dataflow.
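The full response is extensive; the abridged sketch below highlights only the fields to note. The field names follow the Flow Service flows schema, but treat the shape as illustrative, and the connection IDs as placeholders:

```json
{
  "items": [
    {
      "id": "a7e88a01-40f9-4ebf-80b2-0fc838ff82ef",
      "state": "draft",
      "sourceConnectionIds": [
        "{SOURCE_CONNECTION_ID}"
      ],
      "targetConnectionIds": [
        "{TARGET_CONNECTION_ID}"
      ]
    }
  ]
}
```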
Retrieve your source connection details
Next, use your source connection ID and make a GET request to the /sourceConnections
endpoint to retrieve your source connection details.
API format
GET /sourceConnections/{SOURCE_CONNECTION_ID}
| Parameter | Description |
| --- | --- |
| {SOURCE_CONNECTION_ID} | The ID of the source connection that you want to retrieve. |
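A sketch of the request, again assuming the standard Platform API headers:

```shell
# Sketch: retrieve a source connection by its ID.
curl -X GET \
  'https://platform.adobe.io/data/foundation/flowservice/sourceConnections/{SOURCE_CONNECTION_ID}' \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {ORG_ID}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}'
```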
A successful response returns the details of your source connection. Take note of the version, as you will need this value in the next step to update your source connection.
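An abridged, illustrative response shape; the etag is the version value to note:

```json
{
  "items": [
    {
      "id": "{SOURCE_CONNECTION_ID}",
      "etag": "{ETAG}",
      "params": {}
    }
  ]
}
```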
Update your source connection with filtering conditions
Now that you have your source connection ID and its corresponding version, you can make a PATCH request with the filter conditions that specify your standard activity types.
To update your source connection, make a PATCH request to the /sourceConnections
endpoint and provide your source connection ID as a path parameter. Additionally, you must provide an If-Match header parameter with the corresponding version of your source connection.

Note: The If-Match header is required when making a PATCH request. The value for this header is the unique version/etag of the dataflow you want to update. The version/etag value updates with every successful update of a dataflow.

API format
PATCH /sourceConnections/{SOURCE_CONNECTION_ID}
| Parameter | Description |
| --- | --- |
| {SOURCE_CONNECTION_ID} | The ID of the source connection that you want to update. |
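A sketch of the update call. The JSON Patch body below adds a filters object that selects standard activity types by ID; the /params/filters path, the activityTypeId field name, and the example IDs are illustrative assumptions rather than confirmed values:

```shell
# Sketch: add row-level filter conditions to a draft Marketo source connection.
# The patch path, field name, and activity type IDs below are illustrative assumptions.
curl -X PATCH \
  'https://platform.adobe.io/data/foundation/flowservice/sourceConnections/{SOURCE_CONNECTION_ID}' \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {ORG_ID}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}' \
  -H 'Content-Type: application/json' \
  -H 'If-Match: {ETAG}' \
  -d '[
    {
      "op": "add",
      "path": "/params/filters",
      "value": {
        "type": "PQL",
        "format": "pql/json",
        "value": {
          "nodeType": "fnApply",
          "fnName": "in",
          "params": [
            {
              "nodeType": "fieldLookup",
              "fieldName": "activityTypeId"
            },
            {
              "nodeType": "literal",
              "value": [1, 12]
            }
          ]
        }
      }
    }
  ]'
```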
A successful response returns your source connection ID and etag (version).
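An illustrative response shape, with placeholder values:

```json
{
  "id": "{SOURCE_CONNECTION_ID}",
  "etag": "{ETAG}"
}
```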
Publish your source connection
With your source connection updated with your filtering conditions, you can now move on from the draft state and publish your source connection. To do so, make a POST request to the /sourceConnections
endpoint and provide the ID of your draft source connection as well as an action operation for publishing.
API format
POST /sourceConnections/{SOURCE_CONNECTION_ID}/action?op=publish
| Parameter | Description |
| --- | --- |
| {SOURCE_CONNECTION_ID} | The ID of the source connection that you want to publish. |
| op | The action operation to execute. Set op to publish. |

The following request publishes a drafted source connection.
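A sketch of the publish call, assuming the standard Platform API headers:

```shell
# Sketch: publish a drafted source connection.
curl -X POST \
  'https://platform.adobe.io/data/foundation/flowservice/sourceConnections/{SOURCE_CONNECTION_ID}/action?op=publish' \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {ORG_ID}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}'
```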
A successful response returns your source connection ID and etag (version).
Publish your target connection
Similar to the previous step, you must also publish your target connection in order to proceed with publishing your draft dataflow. Make a POST request to the /targetConnections
endpoint and provide the ID of the draft target connection that you want to publish, as well as an action operation for publishing.
API format
POST /targetConnections/{TARGET_CONNECTION_ID}/action?op=publish
| Parameter | Description |
| --- | --- |
| {TARGET_CONNECTION_ID} | The ID of the target connection that you want to publish. |
| op | The action operation to execute. Set op to publish. |

The following request publishes the target connection with ID 7e53e6e8-b432-4134-bb29-21fc6e8532e5.
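A sketch of the call, following the same pattern as the previous step:

```shell
# Sketch: publish the drafted target connection.
curl -X POST \
  'https://platform.adobe.io/data/foundation/flowservice/targetConnections/7e53e6e8-b432-4134-bb29-21fc6e8532e5/action?op=publish' \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {ORG_ID}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}'
```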
A successful response returns the ID and corresponding etag for your published target connection.
Publish your dataflow
With your source and target connections both published, you can now proceed to the final step and publish your dataflow. To publish your dataflow, make a POST request to the /flows
endpoint and provide your dataflow ID and an action operation for publishing.
API format
POST /flows/{FLOW_ID}/action?op=publish
| Parameter | Description |
| --- | --- |
| {FLOW_ID} | The ID of the dataflow that you want to publish. |
| op | The action operation to execute. Set op to publish. |

The following request publishes your draft dataflow.
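A sketch of the final publish call, using the dataflow ID retrieved earlier:

```shell
# Sketch: publish the drafted dataflow.
curl -X POST \
  'https://platform.adobe.io/data/foundation/flowservice/flows/a7e88a01-40f9-4ebf-80b2-0fc838ff82ef/action?op=publish' \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {ORG_ID}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}'
```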
A successful response returns the ID and corresponding etag of your dataflow.
You can use the Experience Platform UI to verify that your draft dataflow has been published. Navigate to the dataflows page in the sources catalog and reference the Status of your dataflow. If successful, the status should now be set to Enabled.
- A dataflow with filtering enabled will only be backfilled once. Any changes that you make to the filtering criteria (be it an addition or a removal) can only take effect for incremental data.
- If you need to ingest historical data for any new activity types, it is recommended that you create a new dataflow and define the filtering criteria with the appropriate activity types in the filter condition.
- You cannot filter custom activity types.
- You cannot preview filtered data.
Appendix
This section provides further examples of different payloads for filtering.
Singular conditions
You can omit the initial fnApply for scenarios that only require one condition.
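A sketch of a single-condition payload, following the structure shown earlier (the field name and value are illustrative):

```json
{
  "type": "PQL",
  "format": "pql/json",
  "value": {
    "nodeType": "fnApply",
    "fnName": "=",
    "params": [
      {
        "nodeType": "fieldLookup",
        "fieldName": "city"
      },
      {
        "nodeType": "literal",
        "value": "DDN"
      }
    ]
  }
}
```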
Using the in operator

See the sample payload below for an example of the in operator.
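An illustrative sketch, matching the PQL/JSON structure used throughout (field name and values are placeholders):

```json
{
  "type": "PQL",
  "format": "pql/json",
  "value": {
    "nodeType": "fnApply",
    "fnName": "in",
    "params": [
      {
        "nodeType": "fieldLookup",
        "fieldName": "city"
      },
      {
        "nodeType": "literal",
        "value": ["DDN", "Pune"]
      }
    ]
  }
}
```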
Using the isNull operator

See the sample payload below for an example of the isNull operator.
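An illustrative sketch; isNull takes a single field lookup (the field name is a placeholder):

```json
{
  "type": "PQL",
  "format": "pql/json",
  "value": {
    "nodeType": "fnApply",
    "fnName": "isNull",
    "params": [
      {
        "nodeType": "fieldLookup",
        "fieldName": "city"
      }
    ]
  }
}
```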
Using the NOT operator

See the sample payload below for an example of the NOT operator.
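An illustrative sketch, negating a nested comparison (field name and value are placeholders):

```json
{
  "type": "PQL",
  "format": "pql/json",
  "value": {
    "nodeType": "fnApply",
    "fnName": "NOT",
    "params": [
      {
        "nodeType": "fnApply",
        "fnName": "=",
        "params": [
          {
            "nodeType": "fieldLookup",
            "fieldName": "city"
          },
          {
            "nodeType": "literal",
            "value": "DDN"
          }
        ]
      }
    ]
  }
}
```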
Example with nested conditions
See the sample payload below for an example of complex nested conditions.
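An illustrative sketch of nested conditions: an outer and whose parameters are an or over two equality checks and a numeric comparison (all field names and values are placeholders):

```json
{
  "type": "PQL",
  "format": "pql/json",
  "value": {
    "nodeType": "fnApply",
    "fnName": "and",
    "params": [
      {
        "nodeType": "fnApply",
        "fnName": "or",
        "params": [
          {
            "nodeType": "fnApply",
            "fnName": "=",
            "params": [
              { "nodeType": "fieldLookup", "fieldName": "city" },
              { "nodeType": "literal", "value": "DDN" }
            ]
          },
          {
            "nodeType": "fnApply",
            "fnName": "=",
            "params": [
              { "nodeType": "fieldLookup", "fieldName": "city" },
              { "nodeType": "literal", "value": "Pune" }
            ]
          }
        ]
      },
      {
        "nodeType": "fnApply",
        "fnName": ">",
        "params": [
          { "nodeType": "fieldLookup", "fieldName": "age" },
          { "nodeType": "literal", "value": 21 }
        ]
      }
    ]
  }
}
```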