Use exists to discover whether a product with a specific SKU exists
exists(array<T>, function<T, boolean>): boolean
In the snippet above, the exists
function is applied to each element of the array and returns a boolean value. The boolean indicates if there are one or more elements in the array that satisfy a specified condition. In this case, it confirms whether a product with a specific SKU exists.
Example
In the SQL example below, the query fetches productListItems
from the geometrixxx_999_xdm_pqs_1batch_10k_rows
table and evaluates whether an element with an SKU equal to 123679
in the productListItems
array exists. It then filters the results based on a specific range of timestamps and limits the final results to ten rows.
SELECT productListItems
FROM geometrixxx_999_xdm_pqs_1batch_10k_rows
WHERE EXISTS( productListItems, value -> value.sku == 123679)
AND timestamp > to_timestamp('2017-11-01 00:00:00')
AND timestamp < to_timestamp('2017-11-02 00:00:00')limit 10;
Result
Results for this SQL would appear similar to those seen below.
productListItems
-----------------
{(123679, NULL,NULL)}
{(123679, NULL, NULL)}
{(123679, NULL, NULL), (150196, NULL, NULL)}
{(123679, NULL, NULL), (150196, NULL, NULL)}
{(123679, NULL, NULL), (150196, NULL, NULL)}
{(123679, NULL, NULL)}
{(123679, NULL, NULL)}
{(123679, NULL, NULL)}
{(123679, NULL,NULL)}
{(123679,NULL, NULL)}
(10 rows)
Use filter to find products where the SKU > 100000
filter(array<T>, function<T, boolean>): array<T>
This function filters an array of elements based on a given condition that evaluates each element as a boolean value. It then returns a new array that only includes elements where the condition returned a true value.
Example
The query below selects the productListItems
column, applies a filter to include only elements with SKU greater than 100000, and restricts the result set to rows within a specific timestamp range. The filtered array is then aliased as _filter
in the output.
SELECT productListItems,
Filter(productListItems, value -> value.sku > 100000) AS _filter
FROM geometrixxx_999_xdm_pqs_1batch_10k_rows
WHERE timestamp > To_timestamp('2017-11-01 00:00:00')
AND timestamp < To_timestamp('2017-11-02 00:00:00')
LIMIT 10;
Result
Results for this SQL would appear similar to those seen below.
productListItems | _filter
-----------------+---------
(123679, NULL, NULL) (123679, NULL, NULL)
(1346, NULL, NULL) |
(98347, NULL, NULL) |
(176015, NULL, NULL) | (176015, NULL, NULL)
(10 rows)
Use aggregate to sum the SKUs of all product list items associated with a specific ID and double the resulting total
aggregate(array<T>, A, function<A, T, A>[, function<A, R>]): R
This aggregate operation applies a binary operator to an initial state and all elements in the array. It also reduces multiple values to a single state. After this reduction, the final state is then converted into the ultimate result by using a finish function. The finish function takes the last state obtained after applying the binary operator to all array elements and does something with it to produce the final result.
Example
This query example calculates the maximum SKU value from the productListItems
array within the given timestamp range and doubles the result. The output includes the original productListItems
array and the computed max_value
.
SELECT productListItems,
aggregate(productListItems, 0, (acc, value) ->
case
WHEN (
value.sku > acc) THEN cast(value.sku AS int)
ELSE cast(acc AS int)
END, acc -> acc * 2) AS max_value
FROM geometrixxx_999_xdm_pqs_1batch_10k_rows
WHERE timestamp > to_timestamp('2017-11-01 00:00:00')
AND timestamp < to_timestamp('2017-11-02 00:00:00')
LIMIT 50;
Result
Results for this SQL would appear similar to those seen below.
productListItems | max_value
-----------------+---------
(123679, NULL, NULL) | 247358
(1346,NULL, NULL) | 2692
(98347, NULL, NULL) | 196694
(176015, NULL, NULL) | 352030
(10 rows)
Use zip_with to assign a sequence number to all the items in the product list
zip_with(array<T>, array<U>, function<T, U, R>): array<R>
This snippet combines the elements of two arrays into a single new array. The operation is performed independently on each element of the array and generates pairs of values. If one array is shorter, null values are added to match the length of the longer array. This happens before the function is applied.
Example
The following query uses the zip_with
function to create pairs of values from two arrays. It does this by adding the SKU values from the productListItems
array to an integer sequence, which was generated using the Sequence
function. The result is selected alongside the original productListItems
column and is limited based on a timestamp range.
SELECT productListItems,
zip_with(Sequence(1,5), Transform(productListItems, p -> p.sku), (x,y) -> struct(x, y)) AS zip_with
FROM geometrixxx_999_xdm_pqs_1batch_10k_rows
WHERE timestamp > to_timestamp('2017-11-01 00:00:00')
AND timestamp < to_timestamp('2017-11-02 00:00:00')
limit 10;
Result
Results for this SQL would appear similar to those seen below.
productListItems | zip_with
---------------------+---------
| {(1,NULL), (2,NULL), (3,NULL),(4,NULL), (5,NULL)}
(123679, NULL, NULL) | {(1,123679), (2,NULL), (3,NULL), (4,NULL), (5,NULL)}
| {(1,NULL), (2,NULL),(3,NULL),(4,NULL), (5,NULL)}
| {(1,NULL), (2,NULL), (3, NULL),(4,NULL), (5,NULL)}
(1346,NULL, NULL) | {(1,1346), (2,NULL),(3,NULL),(4,NULL), (5,NULL)}
| {(1,NULL), (2,NULL), (3,NULL),(4,NULL), (5,NULL)}
(98347, NULL, NULL) | {(1,98347), (2,NULL), (3,NULL), (4,NULL), (5,NULL)}
| {(1,NULL), (2,NULL), (3,NULL), (4,NULL), (5,NULL)}
(176015, NULL, NULL) | {(1,176015),(2,NULL), (3,NULL), (4,NULL), (5,NULL)}
| {(1,NULL), (2,NULL), (3,NULL), (4,NULL), (5,NULL)}
(10 rows)
Use map_from_entries to assign a sequence number to each item in the product list and obtain the final result as a map
map_from_entries(array<struct<K, V>>): map<K, V>
This snippet converts an array of key-value pairs into a map. It is useful when dealing with key-value pair data that could benefit from a more organized and efficient structure.
Example
The following query creates pairs of values from a sequence and the productListItems array, converts these pairs into a map using map_from_entries, and then selects the original productListItems column along with the newly created map_from_entries column. The result is filtered and limited based on the specified timestamp range.
SELECT productListItems, map_from_entries(zip_with(Sequence(1,Size(productListItems)), productListItems, (x,y) -> struct(x, y))) AS map_from_entries
FROM geometrixxx_999_xdm_pqs_1batch_10k_rows
WHERE timestamp > to_timestamp('2017-11-01 00:00:00')
AND timestamp < to_timestamp('2017-11-02 00:00:00')
LIMIT 10;
Result
Results for this SQL would appear similar to those seen below.
productListItems | map_from_entries
---------------------+------------------
(123679, NULL, NULL) | [1 -> "(123679,NULL,NULL)"]
(1346, NULL, NULL) | [1 -> "(1346, NULL, NULL)"]
(98347, NULL, NULL) | [1 -> "(98347, NULL, NULL)"]
(176015, NULL, NULL) | [1 -> "(176015, NULL, NULL)"]
(92763, NULL, NULL) | [1 -> "(92763, NULL, NULL)"]
(48576, NULL, NULL) | [1 -> "(48576, NULL, NULL)"]
(135778, NULL, NULL) | [1 -> "(135778, NULL, NULL)"]
(123679, NULL, NULL) | [1 -> "(123679, NULL, NULL)"]
(98347, NULL, NULL) | [1 -> "(98347, NULL, NULL)"]
(167753, NULL, NULL) | [1 -> "(167753, NULL, NULL)"]
(10 rows)