查询服务中的SQL语法
您可以在Adobe Experience Platform查询服务中为SELECT语句和其他有限命令使用标准ANSI SQL。 本文档介绍Query Service支持的SQL语法。
选择查询 select-queries
以下语法定义了Query Service支持的SELECT查询:
[ WITH with_query [, ...] ]
SELECT [ ALL | DISTINCT [( expression [, ...] ) ] ]
[ * | expression [ [ AS ] output_name ] [, ...] ]
[ FROM from_item [, ...] ]
[ SNAPSHOT { SINCE start_snapshot_id | AS OF end_snapshot_id | BETWEEN start_snapshot_id AND end_snapshot_id } ]
[ WHERE condition ]
[ GROUP BY grouping_element [, ...] ]
[ HAVING condition [, ...] ]
[ WINDOW window_name AS ( window_definition ) [, ...] ]
[ { UNION | INTERSECT | EXCEPT | MINUS } [ ALL | DISTINCT ] select ]
[ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
[ LIMIT { count | ALL } ]
[ OFFSET start ]
下面的选项卡部分提供了FROM、GROUP和WITH关键字的可用选项。
| code language-sql |
|---|
|
| code language-sql |
|---|
|
| code language-sql |
|---|
|
| code language-sql |
|---|
|
| code language-sql |
|---|
|
| code language-sql |
|---|
|
| code language-sql |
|---|
|
| code language-sql |
|---|
|
| code language-sql |
|---|
|
| code language-sql |
|---|
|
| code language-sql |
|---|
|
以下小节提供了可在查询中使用的附加子句的详细信息,前提是它们遵循上述格式。
SNAPSHOT子句
此子句可用于基于快照ID增量读取表中的数据。 快照ID是由Long-type数字表示的检查点标记,每次将数据写入到快照ID时,该数字都会应用于数据湖表。 SNAPSHOT子句将其自身附加到它旁边使用的表关系上。
[ SNAPSHOT { SINCE start_snapshot_id | AS OF end_snapshot_id | BETWEEN start_snapshot_id AND end_snapshot_id } ]
示例
SELECT * FROM table_to_be_queried SNAPSHOT SINCE start_snapshot_id;
SELECT * FROM table_to_be_queried SNAPSHOT AS OF end_snapshot_id;
SELECT * FROM table_to_be_queried SNAPSHOT BETWEEN start_snapshot_id AND end_snapshot_id;
SELECT * FROM table_to_be_queried SNAPSHOT BETWEEN 'HEAD' AND start_snapshot_id;
SELECT * FROM table_to_be_queried SNAPSHOT BETWEEN end_snapshot_id AND 'TAIL';
SELECT * FROM (SELECT id FROM table_to_be_queried SNAPSHOT BETWEEN start_snapshot_id AND end_snapshot_id) C;
(SELECT * FROM table_to_be_queried SNAPSHOT SINCE start_snapshot_id) a
INNER JOIN
(SELECT * from table_to_be_joined SNAPSHOT AS OF your_chosen_snapshot_id) b
ON a.id = b.id;
SNAPSHOT子句中使用HEAD或TAIL时,必须用单引号将它们括起来(例如,“HEAD”、“TAIL”)。 使用不带引号的这些变量会导致语法错误。下表说明了SNAPSHOT子句中每个语法选项的含义。
SINCE start_snapshot_idAS OF end_snapshot_idBETWEEN start_snapshot_id AND end_snapshot_idstart_snapshot_id且包括end_snapshot_id。BETWEEN HEAD AND start_snapshot_idstart_snapshot_id中的行。BETWEEN end_snapshot_id AND TAILend_snapshot_id之后将数据读取到数据集结尾(不包括快照ID)。 This means that if end_snapshot_id is the last snapshot in the dataset, the query will return zero rows because there are no snapshots beyond that last snapshot.SINCE start_snapshot_id INNER JOIN table_to_be_joined AS OF your_chosen_snapshot_id ON table_to_be_queried.id = table_to_be_joined.idtable_to_be_queried and joins it with the data from table_to_be_joined as it was at your_chosen_snapshot_id. The join is based on matching IDs from the ID columns of the two tables being joined.A SNAPSHOT clause works with a table or table alias but not on top of a subquery or view. A SNAPSHOT clause works anywhere a SELECT query on a table can be applied.
Also, you can use HEAD and TAIL as special offset values for snapshot clauses. Using HEAD refers to an offset before the first snapshot, while TAIL refers to an offset after the last snapshot.
resolve_fallback_snapshot_on_failure) is set:- If the optional fallback behavior flag is set, Query Service chooses the earliest available snapshot, sets it as the start snapshot, and returns the data between the earliest available snapshot and the specified end snapshot. This data is inclusive of the earliest available snapshot.
WHERE clause
By default, matches produced by a WHERE clause on a SELECT query are case-sensitive. If you want matches to be case-insensitive, you can use the keyword ILIKE instead of LIKE.
[ WHERE condition { LIKE | ILIKE | NOT LIKE | NOT ILIKE } pattern ]
The logic of the LIKE and ILIKE clauses are explained in the following table:
WHERE condition LIKE pattern~~WHERE condition NOT LIKE pattern!~~WHERE condition ILIKE pattern~~*WHERE condition NOT ILIKE pattern!~~*示例
SELECT * FROM Customers
WHERE CustomerName ILIKE 'a%';
This query returns customers with names beginning in "A" or "a".
JOIN
A SELECT query that uses joins has the following syntax:
SELECT statement
FROM statement
[JOIN | INNER JOIN | LEFT JOIN | LEFT OUTER JOIN | RIGHT JOIN | RIGHT OUTER JOIN | FULL JOIN | FULL OUTER JOIN]
ON join condition
UNION, INTERSECT, and EXCEPT
The UNION, INTERSECT, and EXCEPT clauses are used to combine or exclude like rows from two or more tables:
SELECT statement 1
[UNION | UNION ALL | UNION DISTINCT | INTERSECT | EXCEPT | MINUS]
SELECT statement 2
CREATE TABLE AS SELECT create-table-as-select
Use the CREATE TABLE AS SELECT (CTAS) command to materialize the results of a SELECT query into a new table. This is useful for creating transformed datasets, performing aggregations, or previewing feature-engineered data before using it in a model.
If you're ready to train a model using transformed features, see the Models documentation for guidance on using CREATE MODEL with the TRANSFORM clause.
You can optionally include a TRANSFORM clause to apply one or more feature engineering functions directly within the CTAS statement. 使用TRANSFORM在模型训练之前检查转换逻辑的结果。
此语法适用于永久表和临时表。
CREATE TABLE table_name
[WITH (schema='target_schema_title', rowvalidation='false', label='PROFILE')]
[TRANSFORM (transformFunctionExpression1, transformFunctionExpression2, ...)]
AS (select_query)
CREATE TEMP TABLE table_name
[WITH (schema='target_schema_title', rowvalidation='false', label='PROFILE')]
[TRANSFORM (transformFunctionExpression1, transformFunctionExpression2, ...)]
AS (select_query)
schemarowvalidationlabelPROFILE将数据集标记为已启用配置文件摄取。transformTRANSFORM子句文档。select_querySELECT语句。 有关详细信息,请参阅SELECT查询部分。SELECT语句必须包含聚合函数(如COUNT、SUM或MIN)的别名。 您可以提供SELECT查询,无论是否带有圆括号。 无论是否使用TRANSFORM子句,均适用。示例
使用TRANSFORM子句预览一些工程功能的基本示例:
CREATE TABLE ctas_transform_table_vp14
TRANSFORM(
String_Indexer(additional_comments) si_add_comments,
one_hot_encoder(si_add_comments) as ohe_add_comments,
tokenizer(comments) as token_comments
)
AS SELECT * FROM movie_review_e2e_DND;
更高级的示例,其中包含多个转换步骤:
CREATE TABLE ctas_transform_table
TRANSFORM(
String_Indexer(additional_comments) si_add_comments,
one_hot_encoder(si_add_comments) as ohe_add_comments,
tokenizer(comments) as token_comments,
stop_words_remover(token_comments, array('and','very','much')) stp_token,
ngram(stp_token, 3) ngram_token,
tf_idf(ngram_token, 20) ngram_idf,
count_vectorizer(stp_token, 13) cnt_vec_comments,
tf_idf(token_comments, 10, 1) as cmts_idf
)
AS SELECT * FROM movie_review;
临时表示例:
CREATE TEMP TABLE ctas_transform_table
TRANSFORM(
String_Indexer(additional_comments) si_add_comments,
one_hot_encoder(si_add_comments) as ohe_add_comments,
tokenizer(comments) as token_comments,
stop_words_remover(token_comments, array('and','very','much')) stp_token,
ngram(stp_token, 3) ngram_token,
tf_idf(ngram_token, 20) ngram_idf,
count_vectorizer(stp_token, 13) cnt_vec_comments,
tf_idf(token_comments, 10, 1) as cmts_idf
)
AS SELECT * FROM movie_review;
限制和行为 limitations-and-behavior
在将TRANSFORM子句与CREATE TABLE或CREATE TEMP TABLE一起使用时,请记住以下限制:
- 如果有任何变换函数生成矢量输出,则它会自动转换为数组。
- 因此,使用
TRANSFORM创建的表不能直接在CREATE MODEL语句中使用。 必须在模型创建期间重新定义转换逻辑,以生成相应的特征向量。 - 转换仅在表创建期间应用。 插入到具有
INSERT INTO的表中的新数据是未自动转换。 To apply transformations to new data, you must recreate the table usingCREATE TABLE AS SELECTwith theTRANSFORMclause. - This method is intended for previewing and validating transformations at a point in time, not for building reusable transformation pipelines.
TRANSFORM clause transform
Use the TRANSFORM clause to apply one or more feature engineering functions to a dataset before model training or table creation. This clause lets you preview, validate, or define the exact shape of your input features.
The TRANSFORM clause can be used in the following statements:
CREATE MODELCREATE TABLECREATE TEMP TABLE
See the Models documentation for detailed instructions on using CREATE MODEL, including how to define transformations, set model options, and configure training data.
For usage with CREATE TABLE, see the CREATE TABLE AS SELECT section.
CREATE MODEL example
CREATE MODEL review_model
TRANSFORM(
String_Indexer(additional_comments) si_add_comments,
one_hot_encoder(si_add_comments) AS ohe_add_comments,
tokenizer(comments) AS token_comments,
stop_words_remover(token_comments, array('and','very','much')) AS stp_token,
ngram(stp_token, 3) AS ngram_token,
tf_idf(ngram_token, 20) AS ngram_idf,
count_vectorizer(stp_token, 13) AS cnt_vec_comments,
tf_idf(token_comments, 10, 1) AS cmts_idf,
vector_assembler(array(cmts_idf, viewsgot, ohe_add_comments, ngram_idf, cnt_vec_comments)) AS features
)
OPTIONS(MODEL_TYPE='logistic_reg', LABEL='reviews')
AS SELECT * FROM movie_review_e2e_DND;
限制 limitations
The following limitations apply when using TRANSFORM with CREATE TABLE. See CREATE TABLE AS SELECT limitations and behavior section for a detailed explanation of how transformed data is stored, how vector outputs are handled, and why the results cannot be reused directly in model training workflows.
- Vector outputs are automatically converted to arrays, which cannot be used directly in
CREATE MODEL. - Transformation logic is not persisted as metadata and cannot be reused across batches.
INSERT INTO
The INSERT INTO command is defined as follows:
INSERT INTO is the only supported data manipulation command, update and delete operations are not available. To reflect changes in your data, insert new records that represent the desired state.INSERT INTO table_name select_query
table_nameselect_query示例
INSERT INTO Customers SELECT SupplierName, City, Country FROM OnlineCustomers;
INSERT INTO Customers AS (SELECT * from OnlineCustomers SNAPSHOT AS OF 345)
SELECT statement in parentheses (). Also, the schema of the result of the SELECT statement must conform to that of the table defined in the INSERT INTO statement. You can provide a SNAPSHOT clause to read incremental deltas into the target table.Most fields in a real XDM schema are not found at the root level and SQL does not permit the use of dot notation. To achieve a realistic result using nested fields, you must map each field in your INSERT INTO path.
To INSERT INTO nested paths, use the following syntax:
INSERT INTO [dataset]
SELECT struct([source field1] as [target field in schema],
[source field2] as [target field in schema],
[source field3] as [target field in schema]) [tenant name]
FROM [dataset]
示例
INSERT INTO Customers SELECT struct(SupplierName as Supplier, City as SupplierCity, Country as SupplierCountry) _Adobe FROM OnlineCustomers;
DROP TABLE
The DROP TABLE command drops an existing table and deletes the directory associated with the table from the file system if it is not an external table. If the table does not exist, an exception occurs.
DROP TABLE [IF EXISTS] [db_name.]table_name
IF EXISTSCREATE DATABASE
The CREATE DATABASE command creates an Azure Data Lake Storage (ADLS) database.
CREATE DATABASE [IF NOT EXISTS] db_name
DROP DATABASE
The DROP DATABASE command deletes the database from an instance.
DROP DATABASE [IF EXISTS] db_name
IF EXISTSDROP SCHEMA
The DROP SCHEMA command drops an existing schema.
DROP SCHEMA [IF EXISTS] db_name.schema_name [ RESTRICT | CASCADE]
IF EXISTSRESTRICTCASCADECREATE VIEW create-view
An SQL view is a virtual table based on the result-set of an SQL statement. Create a view with the CREATE VIEW statement and give it a name. You can then use that name to refer back to the results of the query. This makes it easier to reuse complex queries.
The following syntax defines a CREATE VIEW query for a dataset. 此数据集可以是ADLS或加速存储数据集。
CREATE VIEW view_name AS select_query
示例
CREATE VIEW V1 AS SELECT color, type FROM Inventory
CREATE OR REPLACE VIEW V1 AS SELECT model, version FROM Inventory
以下语法定义了在数据库和架构的上下文中创建视图的CREATE VIEW查询。
示例
CREATE VIEW db_name.schema_name.view_name AS select_query
CREATE OR REPLACE VIEW db_name.schema_name.view_name AS select_query
db_nameschema_nameview_nameselect_query示例
CREATE VIEW <dbV1 AS SELECT color, type FROM Inventory;
CREATE OR REPLACE VIEW V1 AS SELECT model, version FROM Inventory;
显示视图
以下查询显示视图列表。
SHOW VIEWS;
Db Name | Schema Name | Name | Id | Dataset Dependencies | Views Dependencies | TYPE
|----------------------------------------------------------------------------------------------
qsaccel | profile_agg | view1 | view_id1 | dwh_dataset1 | | DWH
| | view2 | view_id2 | adls_dataset | adls_views | ADLS
(2 rows)
放置视图
以下语法定义了DROP VIEW查询:
DROP VIEW [IF EXISTS] view_name
IF EXISTSview_name示例
DROP VIEW v1
DROP VIEW IF EXISTS v1
匿名块 anonymous-block
匿名块由两部分组成:可执行部分和异常处理部分。 在匿名块中,可执行部分是必需的。 但是,“例外处理”部分是可选的。
以下示例说明如何创建包含一个或多个要一起执行的语句的块:
$$BEGIN
statementList
[EXCEPTION exceptionHandler]
$$END
exceptionHandler:
WHEN OTHERS
THEN statementList
statementList:
: (statement (';')) +
以下是使用匿名块的示例。
$$BEGIN
SET @v_snapshot_from = select parent_id from (select history_meta('email_tracking_experience_event_dataset') ) tab where is_current;
SET @v_snapshot_to = select snapshot_id from (select history_meta('email_tracking_experience_event_dataset') ) tab where is_current;
SET @v_log_id = select now();
CREATE TABLE tracking_email_id_incrementally
AS SELECT _id AS id FROM email_tracking_experience_event_dataset SNAPSHOT BETWEEN @v_snapshot_from AND @v_snapshot_to;
EXCEPTION
WHEN OTHERS THEN
DROP TABLE IF EXISTS tracking_email_id_incrementally;
SELECT 'ERROR';
$$END;
匿名块中的条件语句 conditional-anonymous-block-statements
当条件被评估为TRUE时,IF-THEN-ELSE控制结构允许有条件地执行语句列表。 此控制结构仅适用于匿名块。 如果此结构用作独立命令,则会导致语法错误(“匿名块外部的命令无效”)。
下面的代码段演示了匿名块中IF-THEN-ELSE条件语句的正确格式。
IF booleanExpression THEN
List of statements;
ELSEIF booleanExpression THEN
List of statements;
ELSEIF booleanExpression THEN
List of statements;
ELSE
List of statements;
END IF
示例
以下示例执行SELECT 200;。
$$BEGIN
SET @V = SELECT 2;
SELECT @V;
IF @V = 1 THEN
SELECT 100;
ELSEIF @V = 2 THEN
SELECT 200;
ELSEIF @V = 3 THEN
SELECT 300;
ELSE
SELECT 'DEFAULT';
END IF;
END$$;
此结构可与raise_error();一起使用以返回自定义错误消息。 下面所示的代码块使用“自定义错误消息”终止匿名块。
示例
$$BEGIN
SET @V = SELECT 5;
SELECT @V;
IF @V = 1 THEN
SELECT 100;
ELSEIF @V = 2 THEN
SELECT 200;
ELSEIF @V = 3 THEN
SELECT 300;
ELSE
SELECT raise_error('custom error message');
END IF;
END$$;
嵌套IF语句
匿名块中支持嵌套的IF语句。
示例
$$BEGIN
SET @V = SELECT 1;
IF @V = 1 THEN
SELECT 100;
IF @V > 0 THEN
SELECT 1000;
END IF;
END IF;
END$$;
异常块
匿名块中支持异常块。
示例
$$BEGIN
SET @V = SELECT 2;
IF @V = 1 THEN
SELECT 100;
ELSEIF @V = 2 THEN
SELECT raise_error(concat('custom-error for v= ', '@V' ));
ELSEIF @V = 3 THEN
SELECT 300;
ELSE
SELECT 'DEFAULT';
END IF;
EXCEPTION WHEN OTHERS THEN
SELECT 'THERE WAS AN ERROR';
END$$;
自动转换为JSON auto-to-json
查询服务支持可选会话级别设置,以便从JSON字符串形式的交互式SELECT查询返回顶级复杂字段。 auto_to_json设置允许将复杂字段中的数据作为JSON返回,然后使用标准库解析为JSON对象。
在执行包含复杂字段的SELECT查询之前,将功能标志auto_to_json设置为true。
set auto_to_json=true;
在设置auto_to_json标记之前
下表提供了应用auto_to_json设置之前的示例查询结果。 这两种方案都使用同一个SELECT查询(如下所示),该查询用于定向具有复杂字段的表。
SELECT * FROM TABLE_WITH_COMPLEX_FIELDS LIMIT 2;
结果如下:
_id | _experience | application | commerce | dataSource | device | endUserIDs | environment | identityMap | placeContext | receivedTimestamp | timestamp | userActivityRegion | web | _adcstageforpqs
|-----------------------------------+----------------------------------------------------------------------------+--------------+----------------------------------------------+------------+--------------------------------------------------------------------+--------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------+-------------------------------------------------------------------------+-----------------------+-----------------------+--------------------+--------------------------------------------------------------------------------------+-----------------
31892EE15DE00000-401D52664FF48A52 | ("("("(1,1)","(1,1)")","(-209479095,4085488201,-2105158467,2189808829)")") | (background) | (NULL,"(USD,NULL)",NULL,NULL,NULL,NULL,NULL) | (475341) | (32,768,1024,205202,https://ns.adobe.com/xdm/external/deviceatlas) | ("("(31892EE080007B35-E6CE00000000000,"(AAID)",t)")") | ("(en-US,f,f,t,1.6,"Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; ja-jp) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 Safari/6531.22.7",490,1125)",xo.net,64.3.235.13) | [AAID -> "{(31892EE080007B35-E6CE00000000000,t)}"] | ("("(34.01,-84.0)",lawrenceville,US,524,30043,ga)",600) | 2022-09-02 19:47:14.0 | 2022-09-02 19:47:14.0 | (UT1) | ("(f,Search Results,"(1.0)")","(http://www.google.com/search?ie=UTF-8&q=,internal)") |
31892EE15DE00000-401B92664FF48AE8 | ("("("(1,1)","(1,1)")","(-209479095,4085488201,-2105158467,2189808829)")") | (background) | (NULL,"(USD,NULL)",NULL,NULL,NULL,NULL,NULL) | (475341) | (32,768,1024,205202,https://ns.adobe.com/xdm/external/deviceatlas) | ("("(31892EE100007BF3-215FE00000000001,"(AAID)",t)")") | ("(en-US,f,f,t,1.5,"Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; ja-jp) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 Safari/6531.22.7",768,556)",ntt.net,219.165.108.145) | [AAID -> "{(31892EE100007BF3-215FE00000000001,t)}"] | ("("(34.989999999999995,138.42)",shizuoka,JP,392005,420-0812,22)",-240) | 2022-09-02 19:47:14.0 | 2022-09-02 19:47:14.0 | (UT1) | ("(f,Home - JJEsquire,"(1.0)")","(NULL,typed_bookmarked)") |
(2 rows)
设置auto_to_json标志后
下表显示了auto_to_json设置与结果数据集在结果中的差异。 这两种方案都使用了相同的SELECT查询。
_id | receivedTimestamp | timestamp | _experience | application | commerce | dataSource | device | endUserIDs | environment | identityMap | placeContext | userActivityRegion | web | _adcstageforpqs
|-----------------------------------+-----------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+----------------------------------+------------------+-------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------
31892EE15DE00000-401D52664FF48A52 | 2022-09-02 19:47:14.0 | 2022-09-02 19:47:14.0 | {"analytics":{"customDimensions":{"eVars":{"eVar1":"1","eVar2":"1"},"props":{"prop1":"1","prop2":"1"}},"environment":{"browserID":-209479095,"browserIDStr":"4085488201","operatingSystemID":-2105158467,"operatingSystemIDStr":"2189808829"}}} | {"userPerspective":"background"} | {"order":{"currencyCode":"USD"}} | {"_id":"475341"} | {"colorDepth":32,"screenHeight":768,"screenWidth":1024,"typeID":"205202","typeIDService":"https://ns.adobe.com/xdm/external/deviceatlas"} | {"_experience":{"aaid":{"id":"31892EE080007B35-E6CE00000000000","namespace":{"code":"AAID"},"primary":true}}} | {"browserDetails":{"acceptLanguage":"en-US","cookiesEnabled":false,"javaEnabled":false,"javaScriptEnabled":true,"javaScriptVersion":"1.6","userAgent":"Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; ja-jp) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 Safari/6531.22.7","viewportHeight":490,"viewportWidth":1125},"domain":"xo.net","ipV4":"64.3.235.13"} | {"AAID":[{"id":"31892EE080007B35-E6CE00000000000","primary":true}]} | {"geo":{"_schema":{"latitude":34.01,"longitude":-84.0},"city":"lawrenceville","countryCode":"US","dmaID":524,"postalCode":"30043","stateProvince":"ga"},"localTimezoneOffset":600} | {"dataCenterLocation":"UT1"} | {"webPageDetails":{"isHomePage":false,"name":"Search Results","pageViews":{"value":1.0}},"webReferrer":{"URL":"http://www.google.com/search?ie=UTF-8&q=","type":"internal"}} |
31892EE15DE00000-401B92664FF48AE8 | 2022-09-02 19:47:14.0 | 2022-09-02 19:47:14.0 | {"analytics":{"customDimensions":{"eVars":{"eVar1":"1","eVar2":"1"},"props":{"prop1":"1","prop2":"1"}},"environment":{"browserID":-209479095,"browserIDStr":"4085488201","operatingSystemID":-2105158467,"operatingSystemIDStr":"2189808829"}}} | {"userPerspective":"background"} | {"order":{"currencyCode":"USD"}} | {"_id":"475341"} | {"colorDepth":32,"screenHeight":768,"screenWidth":1024,"typeID":"205202","typeIDService":"https://ns.adobe.com/xdm/external/deviceatlas"} | {"_experience":{"aaid":{"id":"31892EE100007BF3-215FE00000000001","namespace":{"code":"AAID"},"primary":true}}} | {"browserDetails":{"acceptLanguage":"en-US","cookiesEnabled":false,"javaEnabled":false,"javaScriptEnabled":true,"javaScriptVersion":"1.5","userAgent":"Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; ja-jp) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 Safari/6531.22.7","viewportHeight":768,"viewportWidth":556},"domain":"ntt.net","ipV4":"219.165.108.145"} | {"AAID":[{"id":"31892EE100007BF3-215FE00000000001","primary":true}]} | {"geo":{"_schema":{"latitude":34.989999999999995,"longitude":138.42},"city":"shizuoka","countryCode":"JP","dmaID":392005,"postalCode":"420-0812","stateProvince":"22"},"localTimezoneOffset":-240} | {"dataCenterLocation":"UT1"} | {"webPageDetails":{"isHomePage":false,"name":"Home - JJEsquire","pageViews":{"value":1.0}},"webReferrer":{"type":"typed_bookmarked"}} |
(2 rows)
解决故障时的回退快照 resolve-fallback-snapshot-on-failure
resolve_fallback_snapshot_on_failure选项用于解决快照ID过期的问题。
将resolve_fallback_snapshot_on_failure选项设置为true以使用以前的快照ID覆盖快照。
SET resolve_fallback_snapshot_on_failure=true;
以下代码行使用元数据中最早可用的snapshot_id覆盖@from_snapshot_id。
$$ BEGIN
SET resolve_fallback_snapshot_on_failure=true;
SET @from_snapshot_id = SELECT coalesce(last_snapshot_id, 'HEAD') FROM checkpoint_log a JOIN
(SELECT MAX(process_timestamp)process_timestamp FROM checkpoint_log
WHERE process_name = 'DIM_TABLE_ABC' AND process_status = 'SUCCESSFUL' )b
on a.process_timestamp=b.process_timestamp;
SET @to_snapshot_id = SELECT snapshot_id FROM (SELECT history_meta('DIM_TABLE_ABC')) WHERE is_current = true;
SET @last_updated_timestamp= SELECT CURRENT_TIMESTAMP;
INSERT INTO DIM_TABLE_ABC_Incremental
SELECT * FROM DIM_TABLE_ABC SNAPSHOT BETWEEN @from_snapshot_id AND @to_snapshot_id WHERE NOT EXISTS (SELECT _id FROM DIM_TABLE_ABC_Incremental a WHERE _id=a._id);
Insert Into
checkpoint_log
SELECT
'DIM_TABLE_ABC' process_name,
'SUCCESSFUL' process_status,
cast( @to_snapshot_id AS string) last_snapshot_id,
cast( @last_updated_timestamp AS TIMESTAMP) process_timestamp;
EXCEPTION
WHEN OTHERS THEN
SELECT 'ERROR';
END
$$;
数据资产组织
随着数据资产的增长,在Adobe Experience Platform数据湖中对它们进行逻辑组织非常重要。 查询服务扩展了SQL构造,使您能够按逻辑对沙盒中的数据资产进行分组。 这种组织方法允许在架构之间共享数据资产,而无需在物理上移动它们。
您可以使用标准SQL语法支持以下SQL结构,以便从逻辑上组织数据。
CREATE DATABASE dg1;
CREATE SCHEMA dg1.schema1;
CREATE table t1 ...;
CREATE view v1 ...;
ALTER TABLE t1 ADD PRIMARY KEY (c1) NOT ENFORCED;
ALTER TABLE t2 ADD FOREIGN KEY (c1) REFERENCES t1(c1) NOT ENFORCED;
有关查询服务最佳实践的更详细说明,请参阅数据资产的逻辑组织指南。
表存在
table_exists SQL命令用于确认系统中当前是否存在表。 如果表确实存在,则命令返回布尔值: true;如果表确实存在确实不存在,则返回false。
通过在运行语句之前验证表是否存在,table_exists功能简化了编写匿名块以涵盖CREATE和INSERT INTO用例的过程。
以下语法定义table_exists命令:
$$
BEGIN
#Set mytableexist to true if the table already exists.
SET @mytableexist = SELECT table_exists('target_table_name');
#Create the table if it does not already exist (this is a one time operation).
CREATE TABLE IF NOT EXISTS target_table_name AS
SELECT *
FROM profile_dim_date limit 10;
#Insert data only if the table already exists. Check if @mytableexist = 'true'
INSERT INTO target_table_name (
select *
from profile_dim_date
WHERE @mytableexist = 'true' limit 20
) ;
EXCEPTION
WHEN OTHERS THEN SELECT 'ERROR';
END $$;
内嵌 inline
inline函数将结构数组的元素分隔并生成表中的值。 它只能放在SELECT列表或LATERAL VIEW中。
inline函数不能放在存在其他生成器函数的选择列表中。
默认情况下,生成的列将命名为“col1”、“col2”等。 如果表达式为NULL,则不会生成任何行。
RENAME命令重命名列名。示例
> SELECT inline(array(struct(1, 'a'), struct(2, 'b'))), 'Spark SQL';
此示例返回以下内容:
1 a Spark SQL
2 b Spark SQL
第二个示例进一步演示了inline函数的概念和应用。 下图说明了示例的数据模型。
示例
select inline(productListItems) from source_dataset limit 10;
从source_dataset中获取的值用于填充目标表。
设置
SET命令设置属性,并返回现有属性的值或列出所有现有属性。 如果为现有的属性键提供了值,则会覆盖旧值。
SET property_key = property_value
property_keyproperty_value要返回任何设置的值,请使用不带property_value的SET [property key]。
PostgreSQL命令
以下子部分介绍了查询服务支持的PostgreSQL命令。
分析表 analyze-table
ANALYZE TABLE命令对命名表执行分布分析和统计计算。 根据数据集是存储在加速存储还是数据湖中,ANALYZE TABLE的使用会有所不同。 有关其使用的更多信息,请参阅各自的部分。
加速存储的计算统计信息 compute-statistics-accelerated-store
ANALYZE TABLE命令计算加速存储上表的统计信息。 统计信息是在加速存储上给定表的已执行CTAS或ITAS查询中计算的。
示例
ANALYZE TABLE <original_table_name>
以下是使用ANALYZE TABLE命令:-后可用的统计计算列表
fielddata-typecountdistinct-countmissingmaxminmeanstdevCOMPUTE STATISTICS on the data lake compute-statistics-data-lake
You can now calculate column-level statistics on Azure Data Lake Storage (ADLS) datasets with the COMPUTE STATISTICS SQL command. Compute column statistics on either the entire dataset, a subset of a dataset, all columns, or a subset of columns.
COMPUTE STATISTICS extends the ANALYZE TABLE command. However, the COMPUTE STATISTICS, FILTERCONTEXT, and FOR COLUMNS commands are not supported on accelerated store tables. These extensions for the ANALYZE TABLE command are currently only supported for ADLS tables.
示例
ANALYZE TABLE tableName FILTERCONTEXT (timestamp >= to_timestamp('2023-04-01 00:00:00') and timestamp <= to_timestamp('2023-04-05 00:00:00')) COMPUTE STATISTICS FOR COLUMNS (commerce, id, timestamp);
The FILTER CONTEXT command calculates statistics on a subset of the dataset based on the filter condition provided. The FOR COLUMNS command targets specific columns for analysis.
Statistics ID and the statistics generated are only valid for each session and cannot be accessed across different PSQL sessions.Limitations:
- Statistics generation is not supported for array or map data types
- Computed statistics are not persisted across sessions.
skip_stats_for_complex_datatypes
SET skip_stats_for_complex_datatypes = falseThe console output appears as seen below.
| Statistics ID |
| ---------------------- |
| adc_geometric_stats_1 |
(1 row)
然后,可以通过引用Statistics ID直接查询计算统计信息。 使用Statistics ID或下面示例语句中所示的别名来完整查看输出。 要了解有关此功能的更多信息,请参阅别名文档。
-- This statement gets the statistics generated for `alias adc_geometric_stats_1`.
SELECT * FROM adc_geometric_stats_1;
使用SHOW STATISTICS命令显示会话中生成的所有临时统计数据的元数据。 此命令可帮助您优化统计分析的范围。
SHOW STATISTICS;
下面显示了SHOW STATISTICS的输出示例。
statsId | tableName | columnSet | filterContext | timestamp
|----------------------+---------------+-----------+-----------------------------+--------------------
adc_geometric_stats_1 | adc_geometric | (age) | | 25/06/2023 09:22:26
demo_table_stats_1 | demo_table | (*) | ((age > 25)) | 25/06/2023 12:50:26
age_stats | castedtitanic | (age) | ((age > 25) AND (age < 40)) | 25/06/2023 09:22:26
有关详细信息,请参阅数据集统计信息文档。
表示例 tablesample
Adobe Experience Platform查询服务提供了示例数据集,作为其近似查询处理功能的一部分。
当不需要对数据集进行聚合操作的确切答案时,最好使用数据集示例。 若要通过发出近似查询以返回近似答案来对大型数据集执行更有效的探索性查询,请使用TABLESAMPLE功能。
使用来自现有Azure Data Lake Storage (ADLS)数据集的统一随机样本创建样本数据集,仅使用来自原始数据集的记录百分比。 数据集示例功能使用TABLESAMPLE和SAMPLERATE SQL命令扩展ANALYZE TABLE命令。
在下面的示例中,第一行演示如何计算表格的5%样本。 第二行演示如何从表中数据的过滤视图中计算5%的样本。
示例
ANALYZE TABLE tableName TABLESAMPLE SAMPLERATE 5;
ANALYZE TABLE tableName FILTERCONTEXT (timestamp >= to_timestamp('2023-01-01')) TABLESAMPLE SAMPLERATE 5:
有关详细信息,请参阅数据集示例文档。
开始
BEGIN命令,或者BEGIN WORK或BEGIN TRANSACTION命令启动事务块。 在给出显式的COMMIT或ROLLBACK命令之前,在开始命令之后输入的任何语句都将在单个事务中执行。 此命令与START TRANSACTION相同。
BEGIN
BEGIN WORK
BEGIN TRANSACTION
关闭
CLOSE命令释放与打开游标关联的资源。 游标关闭后,不允许对其执行任何后续操作。 当不再需要游标时,应将其关闭。
CLOSE name
CLOSE ALL
如果使用CLOSE name,则name表示必须关闭的打开游标的名称。 如果使用CLOSE ALL,则将关闭所有打开的游标。
取消分配
要取消分配以前准备的SQL语句,请使用DEALLOCATE命令。 如果未显式取消分配预准备语句,则该语句将在会话结束时取消分配。 有关预准备语句的详细信息,请参阅PREPARE命令部分。
DEALLOCATE name
DEALLOCATE ALL
如果使用DEALLOCATE name,则name表示必须取消分配的准备语句的名称。 如果使用DEALLOCATE ALL,则会取消分配所有准备语句。
声明
DECLARE命令允许用户创建游标,该游标可用于从较大的查询中检索少量行。 创建游标后,将使用FETCH从中获取行。
DECLARE name CURSOR FOR query
namequerySELECT或VALUES命令。执行
EXECUTE命令用于执行以前准备的语句。 由于预准备语句仅存在于会话期间,因此预准备语句必须由在当前会话中较早执行的PREPARE语句创建。 有关使用预准备语句的详细信息,请参阅PREPARE命令部分。
如果创建该语句的PREPARE语句指定了某些参数,则必须将一组兼容的参数传递到EXECUTE语句。 如果未传入这些参数,则会引发错误。
EXECUTE name [ ( parameter ) ]
nameparameter说明
EXPLAIN命令显示所提供语句的执行计划。 执行计划显示了如何扫描语句引用的表。 如果引用了多个表,它会显示使用哪些联接算法将每个输入表中的所需行组合在一起。
EXPLAIN statement
要定义响应的格式,请使用FORMAT关键字和EXPLAIN命令。
EXPLAIN FORMAT { TEXT | JSON } statement
FORMATFORMAT命令指定输出格式。 可用选项为TEXT或JSON。 非文本输出包含的信息与文本输出格式相同,但程序更容易解析。 此参数默认为TEXT。statementSELECT、INSERT、UPDATE、DELETE、VALUES、EXECUTE、DECLARE、CREATE TABLE AS或CREATE MATERIALIZED VIEW AS语句。EXPLAIN关键字运行时,SELECT语句可能返回的任何输出都将被丢弃。 这一声明的其他副作用照常发生。示例
以下示例显示了对具有单个integer列和10000行的表进行简单查询的计划:
EXPLAIN SELECT * FROM foo;
QUERY PLAN
|---------------------------------------------------------
Seq Scan on foo (dataSetId = "6307eb92f90c501e072f8457", dataSetName = "foo") [0,1000000242,6973776840203d3d,6e616c58206c6153,6c6c6f430a3d4d20,74696d674c746365]
(1 row)
FETCH
FETCH命令使用以前创建的游标检索行。
FETCH num_of_rows [ IN | FROM ] cursor_name
num_of_rowscursor_name准备 prepare
使用PREPARE命令可以创建预准备语句。 预准备语句是服务器端对象,可用于对类似的SQL语句进行模板化。
预准备语句可以接受参数,这些参数是在执行语句时替换到该语句中的值。 在使用预准备语句时,参数由位置引用,使用$1、$2等。
或者,您可以指定参数数据类型的列表。 如果未列出参数的数据类型,则可以从上下文推断该类型。
PREPARE name [ ( data_type [, ...] ) ] AS SELECT
namedata_type回滚
ROLLBACK命令撤消当前事务并放弃该事务进行的所有更新。
ROLLBACK
ROLLBACK WORK
选择范围
SELECT INTO命令创建一个新表,并用查询计算的数据填充该表。 数据不会返回到客户端,因为它使用的是普通SELECT命令。 新表的列具有与SELECT命令的输出列关联的名称和数据类型。
[ WITH [ RECURSIVE ] with_query [, ...] ]
SELECT [ ALL | DISTINCT [ ON ( expression [, ...] ) ] ]
* | expression [ [ AS ] output_name ] [, ...]
INTO [ TEMPORARY | TEMP | UNLOGGED ] [ TABLE ] new_table
[ FROM from_item [, ...] ]
[ WHERE condition ]
[ GROUP BY expression [, ...] ]
[ HAVING condition [, ...] ]
[ WINDOW window_name AS ( window_definition ) [, ...] ]
[ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ]
[ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
[ LIMIT { count | ALL } ]
[ OFFSET start [ ROW | ROWS ] ]
[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
[ FOR { UPDATE | SHARE } [ OF table_name [, ...] ] [ NOWAIT ] [...] ]
有关标准SELECT查询参数的详细信息,请参阅SELECT查询节。 此部分仅列出SELECT INTO命令独有的参数。
TEMPORARY 或 TEMPUNLOGGEDnew_table示例
以下查询创建新的表films_recent,其中仅包含表films中的最近条目:
SELECT * INTO films_recent FROM films WHERE date_prod >= '2002-01-01';
显示
SHOW命令显示运行时参数的当前设置。 可以使用SET语句、通过编辑postgresql.conf配置文件来设置这些变量、通过PGOPTIONS环境变量(在使用libpq或基于libpq的应用程序时)或通过命令行标志(在启动Postgres服务器时)来设置这些变量。
SHOW name
SHOW ALL
nameSERVER_VERSION:此参数显示服务器的版本号。SERVER_ENCODING:此参数显示服务器端字符集编码。LC_COLLATE:此参数显示数据库的归类(文本排序)区域设置。LC_CTYPE:此参数显示数据库的字符分类区域设置。IS_SUPERUSER:此参数显示当前角色是否具有超级用户权限。ALL示例
以下查询显示参数DateStyle的当前设置。
SHOW DateStyle;
DateStyle
|-----------
ISO, MDY
(1 row)
复制
COPY命令将任何SELECT查询的输出复制到指定位置。 用户必须具有此位置的访问权限,此命令才能成功。
COPY query
TO '%scratch_space%/folder_location'
[ WITH FORMAT 'format_name']
queryformat_nameformat_name可以是parquet、csv或json之一。 默认情况下,值为parquet。adl://<ADLS_URI>/users/<USER_ID>/acp_foundation_queryService/folder_location/<QUERY_ID>更改表 alter-table
使用ALTER TABLE命令可以添加或删除主键或外键约束,并向表中添加列。
添加或删除约束
以下SQL查询显示了向表添加或删除约束的示例。 可使用逗号分隔值将主键和外键约束添加到多个列中。 您可以通过传递两个或多个列名称值来创建组合键,如下面的示例所示。
定义主键或复合键
ALTER TABLE table_name ADD CONSTRAINT PRIMARY KEY ( column_name ) NAMESPACE namespace
ALTER TABLE table_name ADD CONSTRAINT PRIMARY KEY ( column_name1, column_name2 ) NAMESPACE namespace
根据一个或多个键定义表之间的关系
ALTER TABLE table_name ADD CONSTRAINT FOREIGN KEY ( column_name ) REFERENCES referenced_table_name ( primary_column_name )
ALTER TABLE table_name ADD CONSTRAINT FOREIGN KEY ( column_name1, column_name2 ) REFERENCES referenced_table_name ( primary_column_name1, primary_column_name2 )
定义标识列
ALTER TABLE table_name ADD CONSTRAINT PRIMARY IDENTITY ( column_name ) NAMESPACE namespace
ALTER TABLE table_name ADD CONSTRAINT IDENTITY ( column_name ) NAMESPACE namespace
删除约束/关系/标识
ALTER TABLE table_name DROP CONSTRAINT PRIMARY KEY ( column_name )
ALTER TABLE table_name DROP CONSTRAINT PRIMARY KEY ( column_name1, column_name2 )
ALTER TABLE table_name DROP CONSTRAINT FOREIGN KEY ( column_name )
ALTER TABLE table_name DROP CONSTRAINT FOREIGN KEY ( column_name1, column_name2 )
ALTER TABLE table_name DROP CONSTRAINT PRIMARY IDENTITY ( column_name )
ALTER TABLE table_name DROP CONSTRAINT IDENTITY ( column_name )
table_namecolumn_namereferenced_table_nameprimary_column_name添加或删除主标识和辅助标识
要添加或删除主标识表和辅助标识表列的约束,请使用ALTER TABLE命令。
以下示例通过添加约束来添加主标识和辅助标识。
ALTER TABLE t1 ADD CONSTRAINT PRIMARY IDENTITY (id) NAMESPACE 'IDFA';
ALTER TABLE t1 ADD CONSTRAINT IDENTITY(id) NAMESPACE 'IDFA';
也可以通过删除约束来删除身份,如下面的示例所示。
ALTER TABLE t1 DROP CONSTRAINT PRIMARY IDENTITY (c1) ;
ALTER TABLE t1 DROP CONSTRAINT IDENTITY (c1) ;
有关更多详细信息,请参阅有关在临时数据集中设置标识的文档。
添加列
以下SQL查询显示了向表添加列的示例。
ALTER TABLE table_name ADD COLUMN column_name data_type
ALTER TABLE table_name ADD COLUMN column_name_1 data_type1, column_name_2 data_type2
支持的数据类型
下表列出了向Azure SQL中具有Postgres SQL、XDM和Accelerated Database Recovery (ADR)的表添加列时接受的数据类型。
bigintint8bigintintegerint4integersmallintint2smallinttinyintint1tinyintvarchar(len)stringvarchar(len)varchar。doublefloat8double precisionFLOAT8和FLOAT是DOUBLE PRECISION的有效同义词。 double precision是浮点数据类型。 浮点值以8字节为单位存储。double precisionfloat8double precisionFLOAT8是double precision.double precision的有效同义词 是浮点数据类型。 浮点值以8字节为单位存储。datedatedatedate数据类型是4字节存储的日历日期值,没有任何时间戳信息。 有效日期的范围为01-01-0001到12-31-9999。datetimedatetimedatetimedatetime包含限定符:年、月、日、小时、秒和分数。 datetime声明可以包括在该序列中连接这些时间单位的任何子集,或者甚至只包括单个时间单位。char(len)stringchar(len)char(len)关键字用于指示该项为固定长度字符。ADD SCHEMA
The following SQL query shows an example of adding a table to a database / schema.
ALTER TABLE table_name ADD SCHEMA database_name.schema_name
REMOVE SCHEMA
The following SQL query shows an example of removing a table from a database / schema.
ALTER TABLE table_name REMOVE SCHEMA database_name.schema_name
参数
table_namecolumn_namedata_typeSHOW PRIMARY KEYS
The SHOW PRIMARY KEYS command lists all the primary key constraints for the given database.
SHOW PRIMARY KEYS
tableName | columnName | datatype | namespace
|------------------+----------------------+----------+-----------
table_name_1 | column_name1 | text | "ECID"
table_name_2 | column_name2 | text | "AAID"
SHOW FOREIGN KEYS
The SHOW FOREIGN KEYS command lists all the foreign key constraints for the given database.
SHOW FOREIGN KEYS
tableName | columnName | datatype | referencedTableName | referencedColumnName | namespace
|------------------+---------------------+----------+---------------------+----------------------+-----------
table_name_1 | column_name1 | text | table_name_3 | column_name3 | "ECID"
table_name_2 | column_name2 | text | table_name_4 | column_name4 | "AAID"
SHOW DATAGROUPS
The SHOW DATAGROUPS command returns a table of all associated databases. For each database, the table includes schema, group type, child type, child name, and child ID.
SHOW DATAGROUPS
Database | Schema | GroupType | ChildType | ChildName | ChildId
-------------+-------------------+-----------+----------------------+----------------------------------------------------+--------------------------------------
adls_db | adls_scheema | ADLS | Data Lake Table | adls_table1 | 6149ff6e45cfa318a76ba6d3
adls_db | adls_scheema | ADLS | Accelerated Store | _table_demo1 | 22df56cf-0790-4034-bd54-d26d55ca6b21
adls_db | adls_scheema | ADLS | View | adls_view1 | c2e7ddac-d41c-40c5-a7dd-acd41c80c5e9
adls_db | adls_scheema | ADLS | View | adls_view4 | b280c564-df7e-405f-80c5-64df7ea05fc3
SHOW DATAGROUPS FOR table
The SHOW DATAGROUPS FOR 'table_name' command returns a table of all associated databases that contain the parameter as its child. For each database, the table includes schema, group type, child type, child name, and child ID.
SHOW DATAGROUPS FOR 'table_name'
参数
table_name: The name of the table that you want to find associated databases for.
Database | Schema | GroupType | ChildType | ChildName | ChildId
-------------+-------------------+-----------+----------------------+----------------------------------------------------+--------------------------------------
dwh_db_demo | schema2 | QSACCEL | Accelerated Store | _table_demo2 | d270f704-0a65-4f0f-b3e6-cb535eb0c8ce
dwh_db_demo | schema1 | QSACCEL | Accelerated Store | _table_demo2 | d270f704-0a65-4f0f-b3e6-cb535eb0c8ce
qsaccel | profile_aggs | QSACCEL | Accelerated Store | _table_demo2 | d270f704-0a65-4f0f-b3e6-cb535eb0c8ce