This document provides guidance on best practices for organizing data assets including, datasets, views, and temporary tables for use with Adobe Experience Platform Query Service. It covers how to structure your data as well as information on how to access, update, and delete this information.
It is important to logically organize your data assets within the Platform Data Lake as they grow. Query Service extends SQL constructs that enable you to logically group data assets within a sandbox. This method of organization allows for the sharing of data assets between schemas without the need to move them physically.
Before continuing with this document, you should have a good understanding of Query Service features and have read the user interface guide.
The following examples demonstrate the constructs available to you through Adobe Experience Platform Query Service to logically organize your data using standard SQL syntax. You should start by creating a database to act as a container for your data points. A database can contain one or more schemas, and each schema can then have one or more references to a data asset (datasets, views, temporary tables, etc). These references include any relationships or associations between the datasets.
See the Query Editor user guide for detailed guidance on how to use the Query Service UI to create SQL queries.
The following SQL constructs to logically organize datasets in a sandbox are supported.
CREATE DATABASE databaseA; CREATE SCHEMA databaseA.schema1; CREATE table t1 ...; CREATE view v1 ...; ALTER TABLE t1 ADD PRIMARY KEY (c1) NOT ENFORCED; ALTER TABLE t2 ADD FOREIGN KEY (c1) REFERENCES t1(c1) NOT ENFORCED;
The example (slightly truncated for brevity) demonstrates this methodology where
databaseA contains schema
Once a schema has been created to act as a container for the data assets, each dataset can be associated with one or more schemas in the database by using standard SQL ALTER TABLE syntax.
The following example adds
v1 to the
databaseA.schema1 container created in the previous example.
ALTER TABLE dataset1 SET SCHEMA databaseA.schema1; ALTER TABLE dataset2 SET SCHEMA databaseA.schema1; ALTER TABLE dataset3 SET SCHEMA databaseA.schema1; ALTER VIEW v1 SET SCHEMA databaseA.schema1;
By appropriately qualifying the database name, any PostgreSQL client can connect to any of the data structures you have created using the SHOW keyword. For more information on the SHOW keyword please see the SHOW section within the SQL syntax documentation.
“all” is the default database name that contains every database and schema container in a sandbox. When you make a PostgreSQL connection using
dbname="all", you can access any database and schema that you have created to logically organize your data.
Listing all databases under
dbname="all" displays three available databases.
SHOW DATABASES; name --------- databaseA databaseB databaseC
Listing all schema under
dbname="all" displays the three schemas related to every database in the sandbox.
SHOW SCHEMAS; database | schema ---------------------- databaseA | schema1 databaseA | schema2 databaseB | schema3
When you make a PostgreSQL connection using
dbname="databaseA", you can access any schema associated with that specific database, as shown in the example below.
SHOW DATABASES; name --------- databaseA SHOW SCHEMAS; database | schema ---------------------- databaseA | schema1 databaseA | schema2
Dot notation allows you to access every table associated with a specific schema connected to your chosen database. By connecting to
DBNAME = databaseA.schema1;, all tables associated with that specific schema (
schema1) are shown. This provides information on which dataset contains which table.
SHOW DATABASES; name --------- databaseA SHOW SCHEMAS; database | schema ---------------------- databaseA | schema1 SHOW tables; name | type ---------------------- dataset1| table dataset2| table dataset3| table
As the amount of data assets in your IMS Organization (or sandbox) grows, it becomes necessary to update or remove data assets from a data container. Individual assets can be removed from the organization container by referencing the appropriate database and schema name using dot notation. The table and view (
v1 respectively) added to
databaseA.schema1 in the first example, are removed using the syntax in the following example.
ALTER TABLE databaseA.schema2.t1 REMOVE SCHEMA databaseA.schema2; ALTER VIEW databaseA.schema2.v1 REMOVE SCHEMA databaseA.schema2;
The DROP TABLE function only physically removes a data asset from the Data Lake when a single reference to the table exists across all databases in your IMS Organization.
DROP TABLE databaseA.schema2.t1;
Both the database and schema can also be removed using standard SQL functions.
If there are other references to data assets associated with the database, the function will throw an error when attempting to remove the database.
DROP DATABASE databaseA;
There are three important considerations to note when removing a schema:
DROP SCHEMA databaseA.schema2;
By reading this document, you now have a better understanding of the best practices regarding the organization and structure of your data assets for use with Adobe Experience Platform Query Service. It is recommended to continue learning about Query Service best practices by reading about data deduplication documentation.