Spark SQL functions

Adobe Experience Platform Query Service provides several built-in Spark SQL functions to extend SQL functionality. This document lists the Spark SQL functions that are supported by Query Service.

For more detailed information about the functions, including their syntax, usage, and examples, please read the Spark SQL function documentation.

NOTE

Not all functions in the external documentation are supported.

Categories

Math and statistical operators and functions

Operator/Function Description
% Returns the remainder of the two numbers
* Multiplies the two numbers
+ Adds the two numbers
- Subtracts the two numbers
/ Divides the two numbers
abs Returns the absolute value of the input
acos Returns the inverse cosine value
approx_count_distinct Returns the estimated cardinality by HyperLogLog++
approx_percentile Returns the approximate percentile value at the given percentage
asin Returns the inverse sine value
atan Returns the inverse tangent value
atan2 Returns the angle between the positive x-axis plane and the points given by the coordinates
avg Returns the average value
cbrt Returns the cube root
ceil or ceiling Returns the smallest integer not larger than the inputted value
conv Convert from one base to another
corr Returns the Pearson coefficient between the numbers
cos Returns the cosine value
cosh Returns the hyperbolic cosine value
cot Returns the cotangent value
dense_rank Returns the rank of a value in a group of values
e Returns Euler’s number
exp Returns e to the power of the value
expm1 Returns e to the power of the value minus 1
factorial Returns the factorial of the value
floor Returns the largest integer not smaller than the value
greatest Returns the largest value of all the parameters
hypot Returns the hypotenuse of the two values given
kurtosis Returns the kurtosis value from the group
least Returns the smallest value of all the parameters
ln Returns the natural logarithm of the value
log Returns the logarithm of the value
log10 Returns the logarithm, in base 10, of the value
log1p Returns the logarithm of the value plus 1
log2 Returns the logarithm, in base 2, of the value
max Returns the maximum value of the expression
mean Returns the mean calculated from the values
min Returns the minimum value of the expression
monotonically_increasing_id Returns monotonically increasing IDs
negative Returns the negated value
percent_rank Returns the percentage ranking of a value
percentile Returns the exact percentile at a given percentage
percentile_approx Returns the approximate percentile at a given percentage
pi Returns pi
pmod Returns the positive modulo between two values
positive Returns the positive balue
pow, power Returns the first value to the power of the second value
radians Converts the value to radians
rand Returns a random number between 0 and 1
randn Returns a random value
rint Returns the closest double value
round Returns the closest rounded value
sign, signum Returns the number’s sign
sin Returns sine of the value
sinh Returns hyperbolic sine of the value
sqrt Returns the square root of the value
stddev Returns the standard deviation of the value
sttdev_pop Returns the population standard deviation of the value
stddev_samp Returns the sample standard deviation of the value
sum Returns the sum of the values
tan Returns tangent of the value
tanh Returns hyperbolic tangent of the value
var_pop Returns the calculated population variance
var_samp, variance Returns the calculated sample variance

Logical operators and functions

Operator/Function Description
! or not Logical not
< Less than
<= Less than or equal to
= Equal to
> Greater than
>= Greater than or equal to
^ Bitwise exclusive or
>= Greater than or equal to
[` `](https://spark.apache.org/docs/latest/api/sql/index.html#_15)
~ Bitwise not
arrays_overlap Returns the common elements
assert_true Asserts if the expression is true
if If the expression evaluates to true, return the second expression. Otherwise, return the third expression.
ifnull If the expression is null, it returns the second expression. Otherwise, it returns the first expression.
in Returns true if the first expression is in any of the subsequent expressions.
isnan Returns true if the value is not a number
isnotnull Returns true if the value is not null
isnull Returns true if the value is null
nanvl Returns the first expression if not a number, returns the second expression otherwise
or Logical or
when When can be used to create branch conditions for comparison
xpath_boolean Returns true if the XPath expression evaluates to true or if a matching node is found

Date/time functions

Function Description
add_months Add months to date
date_add Add days to date
date_format Modify date format
date_sub Subtract days from date
date_trunc Returns the date truncated to the specified unit
datediff Returns the difference between dates in days
day, dayofmonth Returns the day of the month
dayofweek Returns the day of week (1-7)
dayofyear Returns the day of year
from_unixtime Returns date in Unix time
from_utc_timestamp Returns date in UTC time
hour Returns the hour of the input
last_day Returns the last day of the month the date belongs to
minute Returns the minute of the input
month Returns the month of the input
months_between Number of months between
next_day Returns the first day later than the input
quarter Returns the quarter of the input
second Returns the second of the string
to_date Converts the string to a date. Note: The string must be in the format yyyy-mm-ddTHH24:MM:SS.
to_timestamp Converts the string to a timestamp. Note: The string must be in the format yyyy-mm-ddTHH24:MM:SS.
to_unix_timestamp Converts the string to a Unix timestamp
to_utc_timestamp Converts the string to a UTC timestamp
trunc Truncates the date
unix_timestamp Returns the Unix timestamp
weekday Day of the week (0-6)
weekofyear Returns the week of the year for a given date
year Returns the year of the string

Arrays

Function Description
array Creates an array with the given elements
array_contains Checks if the array contains the value
array_distinct Removes duplicate values from the array
array_except Returns an array of the elements in the first array, but not the second
array_intersect Returns the intersection of the two arrays
array_join Joins two arrays together
array_max Returns the maximum value of the array
array_min Returns the minimum value of the array
array_position Returns the 1-based position of the element
array_remove Removes all elements that are equal to the element
array_repeat Creates an array containing the value counted times
array_sort Sorts the array
array_union Joins the array together, without any duplicates
array_zip Zip
cardinality Return the size of the array
element_at Return the element at position
explode Separate elements of array into multiple rows, excluding null
explode_outer Separate elements of array into multiple rows, including null
find_in_set Returns the 1 based position of array
flatten Flattens an array of arrays
inline Separate array of structs into a table, excluding null
inline_outer Separate array of structs into a table, including null
posexplod Separate elements of array into multiple rows with positions, excluding null
posexplod Separate elements of array into multiple rows with positions, including null
reverse Reverse elements of the array
shuffle Return a random permutation of the array
slice Subsets an array
sort_array Sort an array, given an order
zip_with Merges the two arrays into a single array, before applying a function

Datatype casting functions

Function Description
bigint Change the data type to bigint
binary Change the data type to binary
boolean Change the data type to boolean
type Change the data type to the specified type
date Change the data type to date
decimal Change the data type to decimal
double Change the data type to double
float Change the data type to float
int Change the data type to int
smallint Change the data type to smallint
str_to_map Create a map from a string
string Change the data type to string
struct Create a struct
tinyint Change the data type to tinyint

Conversion and formatting functions

Function Description
ascii Return the numeric (ASCII) value
base64 Change the argument to a base64 string
bin Change the argument to a binary value
bit_length Return the bit length
char, chr Return the ASCII character
char_length, character_length Return the string length
crc32 Returns the cyclic redundancy check value
degrees Convert radians to degrees
format_number Change the number’s format
from_json, get_json_object Get data from JSON
hash Return the hash value
hex Convert the argument to a hexadecimal value
initcap Changes the string to be title cased
lcase, lower Changes the string to be all lowercase
lpad Pads the left side of a string
map Create a map
map_from_arrays Create a map from an array
map_from_entries Create a map from an array of structs
md5 Return the md5 value
rpad Pads the right side of a string
rtrim Removes trailing spaces
sha, sha1 Return the SHA1 value
sha2 Return the SHA2 value
soundex Return the soundex code
stack Separate values into rows
substr, substring Return the substring
to_json Returns a JSON string
translate Replace values within string
trim Remove leading and trailing characters
ucase, upper Change the string to be all uppercase
unbase64 Convert the base64 string to binary
unhex Convert the hexadecimal to binary
uuid Return a UUID

Data evaluation

Function Description
coalesce Return the first non-null argument
collect_list Return a list of non-unique elements
collect_set Return a set of unique elements
concat Concatenation
concat_ws Concatenation with separator
count Returns the total count for rows
decode Decode using a character set
elt Return the nth input
encode Encode using a character set
first, first_value Returns the first value
grouping Indicates if a column is grouped
grouping_id Returns the level of grouping
instr Returns a 1-based index of character occurrence
json_tuple Returns a tuple from a JSON input
lag, lead Returns the value before the offset
last, last_value Returns the last value
left Returns the first n characters
length Returns the length of the string
levenshtein Returns the Levenshtein distance between strings
locate, position Returns the position of the first occurrence of a substring
map_concat Concatenate a map
map_keys Return a map’s keys
map_values Return a map’s values
ntile Divide rows into partitions
nullif Returns null if true
nvl Returns value if null
nvl2 Returns value if not null
parse_url Extracts part of a URL
rank Computes rank of a value
regexp_extract Extracts something that matches the regex
regex_replace Replaces something that matches the regex
repeat Returns a string that repeats
replace Replace all instances of a string
rollup Create a multi-dimensional rollup
row_number Assigns a unique row number
schema_of_json Returns the schema of the JSON
sentences Splits string into an array of words
sequence Generates an array of elements
shiftleft Signed bitwise shift left
shiftright Signed bitwise shift right
shiftrightunsigned Unsigned bitwise shift right
size Return the size of the array
space Return a string with n spaces
split Split string
substring_index Return index of substring
window Window
xpath Parse XML nodes
xpath_double, xpath_number Parse XML nodes for double
xpath_float Parse XML nodes for float
xpath_int Parse XML nodes for integer
xpath_long Parse XML nodes for long
xpath_short Parse XML nodes for short integer
xpath_string Parse XML nodes for string

Current information

Function Description
current_database Returns current database
current_date Returns current date
current_timestamp, now Returns current timestamp

Higher order functions

Function Description
transform Transform elements in an array
exists Check if element exists
filter Filter the input array
aggregate Apply a binary operator to all elements

On this page