PQL Theory

Data Model Design

Overview

This section covers basic data-related features of data models.

Catalog Tables - OCPM

OCPM catalog tables allow accessing metadata about an object-centric process mining perspective (formerly: data model), including registered tables: "OCPM":"Object_Type", "OCPM":"Event_Type", "OCPM":"Event_Object_Type_Relations".


Standard Aggregation

Overview

Standard aggregation functions group input rows together and calculate a single value for each group.

AVG

This aggregate function calculates the average of a set of values. AVG ( table.column )
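As a minimal sketch, assuming a hypothetical "ORDERS" table with VENDOR and NET_VALUE columns, AVG can be combined with a dimension column so that one average is calculated per group:

```
-- Average net value per vendor (hypothetical "ORDERS" table)
"ORDERS"."VENDOR"
AVG ( "ORDERS"."NET_VALUE" )
```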

COUNT_TABLE

This aggregate function calculates the number of rows in a specified table. COUNT_TABLE ( table )

COUNT

This function counts the rows of a column per group. It can be applied on columns of any data type. COUNT ( table.column )

COUNT DISTINCT

This function calculates the number of distinct elements per group. COUNT DISTINCT can be applied on any data type. COUNT ( DISTINCT table.column )

FIRST

Returns the first element of the specified source column for each element in a group. FIRST ( table.input_column [, ORDER BY table.column [ASC|DESC] ] )

GLOBAL

In general, only one aggregation is calculated for a query. With a global aggregation it is possible to calculate an additional aggregation function over all the values of a column. GLOBAL ( aggregation )
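A common use of GLOBAL is computing each group's share of an overall total. The sketch below assumes the same hypothetical "ORDERS" table; the inner SUM is grouped by vendor, while the GLOBAL aggregation runs over all rows:

```
-- Each vendor's share of the overall total (hypothetical columns)
"ORDERS"."VENDOR"
SUM ( "ORDERS"."NET_VALUE" ) / GLOBAL ( SUM ( "ORDERS"."NET_VALUE" ) )
```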

LAST

Returns the last element of the specified source column for each element in a group. LAST ( table.input_column [, ORDER BY table.column [ASC|DESC] ] )

MAX

This function calculates the maximum value per group. MAX can be applied on any data type. MAX ( table.column )

MEDIAN

This function calculates the median per group. Median can be applied to INT, FLOAT or DATE columns. MEDIAN ( table.column )

MIN

This function calculates the minimum value per group. MIN can be applied on any data type. MIN ( table.column )

MODE

This function finds the most frequent elements per group. Mode can be applied on any data type. For multi-modal input, where there are several result candidates, the element with the smallest value is chosen. For elements of type STRING, the smallest value is determined by its lexicographical order. MODE ( table.column )

PRODUCT

PRODUCT calculates the product over an INT or FLOAT column. PRODUCT ( table.column )

QUANTILE

This function calculates the given quantile value per group. QUANTILE can be applied to INT, FLOAT or DATE columns. QUANTILE ( table.column, quantile )

STDEV

This function calculates the standard deviation of a set of samples. The standard deviation uses the "n-1" method. Standard deviation can be applied to INT or FLOAT data types. STDEV ( table.column )

STRING_AGG

The STRING_AGG operator returns the concatenation of strings from the specified source column for each element in a group. The delimiter will always be inserted between the concatenation of the strings. Multiple order by expressions can be used in order to determine the order of the concatenation. STRING_AGG ( source_table.column, delimiter [, ORDER BY source_table.column [ASC|DESC] ] )
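As a sketch, assuming hypothetical "CASES" and "ACTIVITIES" tables, STRING_AGG can build one ordered, comma-separated activity sequence per case:

```
-- Concatenate activity names per case in chronological order (hypothetical tables)
"CASES"."CASE_ID"
STRING_AGG ( "ACTIVITIES"."ACTIVITY", ', ', ORDER BY "ACTIVITIES"."EVENTTIME" ASC )
```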

SUM

SUM calculates the sum over an INT or FLOAT column. SUM ( table.column )

TRIMMED_MEAN

This function calculates the trimmed mean over an INT or FLOAT column. TRIMMED_MEAN ( table.column [, lower_cutoff [, upper_cutoff ] ] )

VAR

VAR calculates the variance per group. The variance is the square of the standard deviation. Variance can be applied to INT or FLOAT columns. VAR ( table.column )


Window Aggregation

Overview

Window functions calculate an aggregate value based on a range of neighboring rows.

RUNNING_TOTAL

RUNNING_TOTAL sums up all entries of a given column and returns all intermediate sums. It can be applied to INT or FLOAT columns. RUNNING_TOTAL ( table.column )
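For illustration, assuming a hypothetical "ORDERS" table, RUNNING_TOTAL returns one row per input row, each holding the cumulative sum up to that row:

```
-- Cumulative net value over the rows of the table (hypothetical column)
RUNNING_TOTAL ( "ORDERS"."NET_VALUE" )
```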


Moving Aggregation

Overview

Moving operators calculate a value across a range of neighboring rows. The range of neighboring rows used for the calculation of a moving operator result is called a window.

MOVING_AVG

This function calculates the average for each window. It can be applied to INT or FLOAT columns. MOVING_AVG ( table.column, start, end )
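As a sketch, assuming that the start and end parameters denote row offsets relative to the current row, a trailing three-row average over a hypothetical "ORDERS" table could look like:

```
-- Average over the current row and the two preceding rows (hypothetical column)
MOVING_AVG ( "ORDERS"."NET_VALUE", -2, 0 )
```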

MOVING_COUNT_DISTINCT

This function counts the distinct values for each window. It can be applied to any data type. MOVING_COUNT_DISTINCT ( table.column, start, end )

MOVING_COUNT

This function counts the values for each window. It can be applied to any data type. MOVING_COUNT ( table.column, start, end )

MOVING_MAX

Calculates the maximum for each window. It can be applied to any data type. MOVING_MAX ( table.column, start, end )

MOVING_MEDIAN

This function calculates the median for each window. It can be applied to any data type. MOVING_MEDIAN ( table.column, start, end )

MOVING_MIN

Calculates the minimum for each window. It can be applied to any data type. MOVING_MIN ( table.column, start, end )

MOVING_STDEV

This function calculates the standard deviation for each window. It can be applied to INT or FLOAT columns. MOVING_STDEV ( table.column, start, end )

MOVING_SUM

Calculates the sum for each window. It can be applied to INT or FLOAT columns. MOVING_SUM ( table.column, start, end )

MOVING_TRIMMED_MEAN

Calculates the trimmed mean with a cut off of 10% for each window. It can be applied to INT or FLOAT columns. MOVING_TRIMMED_MEAN ( table.column, start, end )

MOVING_VAR

Calculates the variance for each window. It can be applied to INT or FLOAT columns. MOVING_VAR ( table.column, start, end )


Pull Up Aggregation

Overview

Pull-up functions aggregate a column from one table onto another table. You define the target table into which the entries of the source column are pulled, and you can explicitly define the basis on which calculations are executed.

PU_AVG

Calculates the average of the specified source column for each element in the given target table. PU_AVG ( target_table, source_table.column [, filter_expression] )
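The sketch below assumes hypothetical "CASES" and "ACTIVITIES" tables and a hypothetical VALUE column; the optional filter expression restricts which source rows contribute to each target-table element:

```
-- Per case: average event value, restricted to one activity (hypothetical tables)
PU_AVG ( "CASES", "ACTIVITIES"."VALUE", "ACTIVITIES"."ACTIVITY" = 'Create Invoice' )
```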

PU_COUNT_DISTINCT

Calculates the number of distinct elements in the specified source column for each element in the given target table. PU_COUNT_DISTINCT ( target_table, source_table.column [, filter_expression] )

PU_COUNT

Calculates the number of elements in the specified source column for each element in the given target table. PU_COUNT ( target_table, source_table.column [, filter_expression] )

PU_FIRST

Returns the first element of the specified source column for each element in the given target table. An order by expression can be set to define the order that should be used to determine the first element. PU_FIRST ( target_table, source_table.column [, filter_expression] [, ORDER BY source_table.column [ASC|DESC] ] )

PU_LAST

Returns the last element of the specified source column for each element of the given target table. An order by expression can be set to define the order that should be used to determine the last element. PU_LAST ( target_table, source_table.column [, filter_expression] [, ORDER BY source_table.column [ASC|DESC] ] )

PU_MAX

Calculates the maximum of the specified source column for each element in the given target table. PU_MAX ( target_table, source_table.column [, filter_expression] )

PU_MEDIAN

Calculates the median of the specified source column for each element in the given target table. PU_MEDIAN ( target_table, source_table.column [, filter_expression] )

PU_MODE

Calculates the mode of the specified source column for each element in the given target table. For multi-modal input, where there are several result candidates, the element with the smallest value is chosen. For elements of type STRING, the smallest value is determined by its lexicographical order. PU_MODE ( target_table, source_table.column [, filter_expression] )

PU_MIN

Calculates the minimum of the specified source column for each element in the given target table. PU_MIN ( target_table, source_table.column [, filter_expression] )

PU_PRODUCT

Calculates the product of the specified source column for each element in the given target table. PU_PRODUCT ( target_table, source_table.column [, filter_expression] )

PU_QUANTILE

Calculates the quantile of the specified source column for each element in the given target table. PU_QUANTILE ( target_table, source_table.column, quantile [, filter_expression] )

PU_TRIMMED_MEAN

Calculates the trimmed mean of the specified source column for each element in the given target table. PU_TRIMMED_MEAN ( target_table, source_table.column [, lower_cutoff [, upper_cutoff ] ] [, filter_expression ] )

PU_STRING_AGG

The PU_STRING_AGG operator returns the concatenation of strings from the specified source column for each element in the given target table. The delimiter is always inserted between the concatenated strings. Multiple ORDER BY expressions can be used to determine the order of the concatenation. PU_STRING_AGG ( target_table, source_table.column, delimiter [, filter_expression ] [, ORDER BY source_table.column [ASC|DESC] ] )

PU_SUM

Calculates the sum of the specified source column for each element in the given target table. PU_SUM ( target_table, source_table.column [, filter_expression] )

PU_STDEV

Calculates the standard deviation of the specified source column for each element in the given target table. The standard deviation uses the "n-1" method. Standard deviation can be applied to INT or FLOAT columns. PU_STDEV ( target_table, source_table.column [, filter_expression] )

Pull Up Aggregation - Table Options

Besides using a table from the data model as a target table inside a PU-function, generated tables can be passed.

CONSTANT

To aggregate all values from a source column to a single value, the CONSTANT function can be used as a target table inside a PU-function. The resulting value is a constant. CONSTANT()

DOMAIN_TABLE

The DOMAIN_TABLE function can be used to create a temporary table from various column(s), which can be used as a target table inside all PU-functions. DOMAIN_TABLE ( table.column, ... )
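Both generated target tables can be sketched with a hypothetical "ORDERS" table: CONSTANT() collapses everything to one value, while DOMAIN_TABLE produces one row per distinct value combination of its input columns:

```
-- Grand total over all rows via the CONSTANT() target table (hypothetical column)
PU_SUM ( CONSTANT(), "ORDERS"."NET_VALUE" )

-- One sum per distinct vendor via a DOMAIN_TABLE target (hypothetical columns)
PU_SUM ( DOMAIN_TABLE ( "ORDERS"."VENDOR" ), "ORDERS"."NET_VALUE" )
```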


Currency

Overview

This section includes functions related to currency manipulation.

CURRENCY_CONVERT_SAP

This operator converts the document currency of SAP tables using the internal tables TCURR, TCURF and TCURX to perform the conversion. CURRENCY_CONVERT_SAP ( to_currency, exchange_rate_type, mandt_column, document_currency_column, document_date_column, amount_column [, source_system_main_table_column ])

CURRENCY_CONVERT

This operator converts currencies based on a provided CURRENCY_CONVERSION_RATES table. CURRENCY_CONVERT ( amount, FROM ( from_currency ), TO ( to_currency ), date [, currency_conversion_rates [, exchange_rate_type ] ] )

CURRENCY_SAP

This operator converts the currency value as stored by SAP to the actual value based on the internal TCURX table. CURRENCY_SAP ( amount_column, document_currency_type_column [, source_system_column ] )

QUANTITY_CONVERT

This operator converts quantity units based on a QUANTITY_CONVERSION_RATES table, which is optionally provided by the user. If not provided by the user, a default QUANTITY_CONVERSION_RATES table will be used. QUANTITY_CONVERT ( amount, FROM(from_unit), TO(to_unit) [, identifier, [ quantity_conversion_rates ] ] )


Custom

Overview

This section includes Celonis Analysis specific functions.

COMMON_TABLE

COMMON_TABLE allows referring to the common table of multiple expressions in the data model without using its exact table name. This operator can be helpful when the common table of multiple expressions is unknown. The given input expressions must be columns. COMMON_TABLE ( expression [, expression, ...] )

KPI

Saved formulas are reusable PQL statements which can be defined in the frontend. With KPI, a saved formula can be called. The result is the same as if the saved formula had been placed directly where KPI is called. KPI ( formula_name (, argument )* )

USER_NAME

Returns the Celonis username of the current user. USER_NAME()


Data Flow

Overview

This section contains data flow operators.

BIND

BIND pulls a column or constant to a specified table. In case of a column, this requires that the owner table of the column and the specified target table have a direct or indirect 1:N relationship. BIND( target_table, value)

BIND_FILTERS

BIND_FILTERS pulls a filter to a specified table. Multiple filters on a table are merged together by a logical AND. BIND_FILTERS( target_table , condition [, condition ] *)

CASE WHEN

CASE WHEN evaluates a list of conditions and returns result expressions based on these conditions. CASE WHEN condition THEN result_expression [ WHEN condition THEN result_expression ]* [ ELSE result_expression ] END
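A short sketch, assuming a hypothetical "ORDERS" table with a NET_VALUE column; conditions are evaluated top to bottom and the first matching branch determines the result:

```
-- Classify orders by value (hypothetical column)
CASE
  WHEN "ORDERS"."NET_VALUE" >= 10000 THEN 'large'
  WHEN "ORDERS"."NET_VALUE" >= 1000  THEN 'medium'
  ELSE 'small'
END
```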

COALESCE

The COALESCE function returns the first element that is not NULL of a set of values. COALESCE (table.column1, ..., table.columnN )

FILTER

Filters can be defined as Analysis filters, Sheet filters or Component filters. If a query is sent to Celonis, all active filters are propagated to the requested table(s). Multiple filters on a table are merged together by a logical AND. FILTER [FORCED] condition;

FILTER_TO_NULL

FILTER_TO_NULL immediately propagates the FILTER to the specified column by setting all violating rows to NULL. FILTER_TO_NULL ( table.column )

GREATEST

The GREATEST function returns the greatest element that is not NULL from a set of values. GREATEST ( table.column1 , ... , table.columnN )

LEAST

The LEAST function returns the least element that is not NULL from a set of values. LEAST ( table.column1 , ... , table.columnN )

LOOKUP

The LOOKUP operator allows you to look up matching rows of a source column in a target table where the target and source tables cannot have a pre-defined join relationship in the Data Model. As no join between the tables exists, LOOKUP implicitly disregards filters from the source side. LOOKUP ( target_table, source_column, (join_condition) [, (join_condition), ... ] ), where each join_condition has the form (target_table.column, source_table.column).

MULTI CASE WHEN

The MULTI CASE WHEN operator evaluates a list of conditions and returns a new column containing result expressions based on these conditions. In contrast to the CASE WHEN operator, the result expressions for all conditions that evaluate to true are returned. Furthermore, the MULTI CASE WHEN operator creates a new table that is joined back to the common table of all inputs. MULTI CASE WHEN condition THEN result_expression [ WHEN condition THEN result_expression ]* [ ELSE result_expression ] END

REMAP_INTS

REMAP_INTS allows you to map values of a column of type INT. REMAP_INTS ( table.column, [ old_value1, new_value1 ] , ... [, others_value ] )

REMAP_VALUES

REMAP_VALUES allows you to map values of a column of type STRING. REMAP_VALUES ( table.column, [ old_value1, new_value1 ] , ... [, others_value ] )
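Following the signature above, each bracketed pair maps an old value to a new one, and the optional trailing value catches everything unmatched. The table and column names here are hypothetical:

```
-- Expand country codes, mapping everything else to 'OTHER' (hypothetical column)
REMAP_VALUES ( "ORDERS"."COUNTRY",
               [ 'DE', 'Germany' ],
               [ 'AT', 'Austria' ],
               'OTHER' )
```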

UNIQUE_ID

The UNIQUE_ID function returns a unique non-negative INT for each unique tuple in the combination of the input columns. UNIQUE_ID ( table.column1 , ... , table.columnN )


Data Generation

Overview

This section includes data generation functions.

GENERATE_RANGE

GENERATE_RANGE creates values inside a range based on a given step size. It can be used to generate values for dropdown components. GENERATE_RANGE ( step_size , range_start , range_end )

RANGE_APPEND

RANGE_APPEND creates values inside a range based on a given step size and appends them to the given column. RANGE_APPEND ( table.column, step_size [, range_start [, range_end ] ] )

RANGE

RANGE creates a column that is not related to any table in the Data Model, filled with a range of INT or DATE values. RANGE ( start_value, step_size, step_count )


Data Type Conversion

Overview

Data type conversion can be explicit by using custom functions, or implicit depending on context.

TO_DATE

TO_DATE converts STRING input to DATE output. You must define the FORMAT. If the input does not match the FORMAT, the function returns NULL. TO_DATE ( table.column, FORMAT ( format ) )
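As a sketch, assuming a hypothetical DATE_STRING column holding values such as '2023-01-31', and assuming strftime-style placeholders in the format pattern (the exact placeholder syntax should be checked against the format reference):

```
-- Parse a 'YYYY-MM-DD' string column into a DATE (hypothetical column and format pattern)
TO_DATE ( "ORDERS"."DATE_STRING", FORMAT ( '%Y-%m-%d' ) )
```

Rows that do not match the pattern become NULL rather than raising an error.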

TO_FLOAT

TO_FLOAT converts STRING input to FLOAT output. If the input does not represent a float number (or the value of the representation is outside of the supported FLOAT range), the function returns NULL. TO_FLOAT ( table.column )

TO_INT

TO_INT converts STRING input to INT output. If the input does not represent an integer number (or the value of the representation is outside of the supported INT range), the function returns NULL. TO_INT ( table.column )

TO_STRING

TO_STRING converts INT or DATE input to STRING output. For DATE input, you must specify the output format. TO_STRING ( table.column [, FORMAT ( format ) ] )

TO_NULLABLE_INT

TO_NULLABLE_INT evaluates a boolean condition input and produces an INT output. The function returns 1 if the condition evaluates to TRUE, 0 if the condition evaluates to FALSE, and NULL if the condition evaluates to NULL. TO_NULLABLE_INT ( condition )


DateTime

Overview

This section includes a variety of datetime functions.

TIMELINE_COLUMN - TIMELINE_TABLE

The TIMELINE_COLUMN and TIMELINE_TABLE functions create a continuous timeline for a given time unit (e.g., days). They can be used to aggregate values from one or multiple tables onto one common time axis. This makes it easy to show the trend of a KPI over a period of time. TIMELINE_COLUMN ( timeunit, time_columns [, partition ] [, time_range] ) TIMELINE_TABLE ( timeunit, time_columns [, partition ] [, time_range] )

DateTime Calendars

DateTime Calendars can be used to define the calendar which is used for DATE calculations.

FACTORY_CALENDAR

FACTORY_CALENDAR produces a new factory calendar with the configuration as passed to it. FACTORY_CALENDAR ( calendar_table.start_date_column, calendar_table.end_date_column [, filter_condition ] ) FACTORY_CALENDAR ( calendar_table [, filter_condition ] )

INTERSECT

A call to INTERSECT takes two arbitrary calendars as input and produces a new intersected calendar (its type depends on the input calendars). INTERSECT ( lhs_calendar, rhs_calendar )

WEEKDAY_CALENDAR

WEEKDAY_CALENDAR produces a new weekday calendar using the given configuration. WEEKDAY_CALENDAR ( day [ day_time ] ... ) WEEKDAY_CALENDAR ( calendar_table )

WORKDAY_CALENDAR

WORKDAY_CALENDAR produces a new workday calendar (i.e., SAP TFACS-style calendar or sometimes called SAP Factory Calendar) using the TFACS table passed to it. WORKDAY_CALENDAR ( TFACS_table [, TFACS_entry_identifier ] )

DateTime Constant

DateTime Constant Functions allow to use the current date or time in a query.

HOUR_NOW

The HOUR_NOW function returns the hour of the current time of the app-server in the specified time zone. HOUR_NOW ( [time_zone_id] )

MINUTE_NOW

The MINUTE_NOW function returns the minute of the current time of the app-server in the specified time zone. MINUTE_NOW ( [time_zone_id] )

TODAY

The TODAY function returns the current date in the specified timezone. TODAY( [time_zone_id] )

DateTime Difference

DateTime Difference Functions can be used to calculate the time difference between two DATEs.

DATE_BETWEEN

DATE_BETWEEN determines whether the first date is between the second date and the third date. DATE_BETWEEN ( table.column1, table.column2, table.column3 )

DATEDIFF

DATEDIFF computes the difference between two dates in the specified time unit. DATEDIFF ( time_unit, table.date_column1, table.date_column2 )
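For illustration, assuming hypothetical CREATED and CLOSED timestamp columns and assuming the unit token dd denotes days:

```
-- Throughput time in days between two timestamps (hypothetical columns)
DATEDIFF ( dd, "ORDERS"."CREATED", "ORDERS"."CLOSED" )
```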

DAYS_BETWEEN

DAYS_BETWEEN computes the difference between two dates in days. DAYS_BETWEEN ( table.column1, table.column2 )

HOURS_BETWEEN

HOURS_BETWEEN computes the difference between two timestamps in hours. If a calendar is specified, only the passed time in the calendar is considered. HOURS_BETWEEN ( start_column, end_column [, calendar_specification [, calendar_id_column]] )

MILLIS_BETWEEN

MILLIS_BETWEEN computes the difference between two timestamps in milliseconds. If a calendar is specified, only the passed time in the calendar is considered. MILLIS_BETWEEN ( start_column, end_column [, calendar_specification [, calendar_id_column]] )

MINUTES_BETWEEN

MINUTES_BETWEEN computes the difference between two timestamps in minutes. If a calendar is specified, only the passed time in the calendar is considered. MINUTES_BETWEEN ( start_column, end_column [, calendar_specification [, calendar_id_column]] )

MONTHS_BETWEEN

MONTHS_BETWEEN computes the difference between two dates in months. MONTHS_BETWEEN ( table.column1, table.column2 )

SECONDS_BETWEEN

SECONDS_BETWEEN computes the difference between two timestamps in seconds. If a calendar is specified, only the passed time in the calendar is considered. SECONDS_BETWEEN ( start_column, end_column [, calendar_specification [, calendar_id_column]] )

WORKDAYS_BETWEEN

The WORKDAYS_BETWEEN function determines the number of workdays between two given dates. When the start date is before the end date, the end date is not counted. When the start date is after the end date, the start date is not counted and a negative value is returned. WORKDAYS_BETWEEN ( start_date, end_date, calendar_specification [, calendar_id_column]) WORKDAYS_BETWEEN ( calendar_id, start_date, end_date [, INVALID_TO_NULL] )

YEARS_BETWEEN

YEARS_BETWEEN computes the difference between two dates in years. YEARS_BETWEEN ( table.column1, table.column2 )

DateTime Modification

DateTime Modification Functions can be used to modify DATEs.

ADD_DAYS

ADD_DAYS adds a given number of days to a given date. ADD_DAYS ( table.base_column, table.days_column )

ADD_HOURS

ADD_HOURS adds a given number of hours to a given timestamp. If a calendar is specified, only the passed time in the calendar is considered. ADD_HOURS ( start_column, hours_column [, calendar_specification [, calendar_id_column]])

ADD_MILLIS

ADD_MILLIS adds a given number of milliseconds to a given timestamp. If a calendar is specified, only the passed time in the calendar is considered. ADD_MILLIS ( start_column, milliseconds_column [, calendar_specification [, calendar_id_column]])

ADD_MINUTES

ADD_MINUTES adds a given number of minutes to a given timestamp. If a calendar is specified, only the passed time in the calendar is considered. ADD_MINUTES ( start_column, minutes_column [, calendar_specification [, calendar_id_column]])

ADD_MONTHS

ADD_MONTHS adds a given number of months to a given date. ADD_MONTHS ( table.base_column, table.months_column )

ADD_SECONDS

ADD_SECONDS adds a given number of seconds to a given timestamp. If a calendar is specified, only the passed time in the calendar is considered. ADD_SECONDS ( start_column, seconds_column [, calendar_specification [, calendar_id_column]])

ADD_WORKDAYS

The ADD_WORKDAYS function adds a given number of workdays to a given date. If the start date is not a workday, ADD_WORKDAYS returns the same result as if the start date was the next workday instead (even when 0 days are added). ADD_WORKDAYS ( date, number_of_days, calendar_specification [, calendar_id_column]) ADD_WORKDAYS ( workday_calendar_config, date, number_of_days )

ADD_YEARS

ADD_YEARS adds a given number of years to a given date. ADD_YEARS ( table.base_column, table.years_column )

CONVERT_TIMEZONE

CONVERT_TIMEZONE converts a given date from one specified timezone to another. The function takes into account the Daylight Saving Time. CONVERT_TIMEZONE ( table.date_column [, from_timezone_id], to_timezone_id )

DateTime Projection

DateTime Projection Functions contain functions to extract the different parts of a DATE and to map DATEs to other data types and vice versa.

CALENDAR_WEEK

CALENDAR_WEEK returns the calendar week of a given date. Celonis follows the standard ISO 8601 for numbering the calendar weeks. CALENDAR_WEEK ( table.column )

DATE_MATCH

DATE_MATCH(column, [YEARS], [QUARTERS], [MONTHS], [WEEKS], [DAYS])

DAY_OF_WEEK

Returns the day of the week as a number: 0 for Sunday, 1 for Monday, 2 for Tuesday, and so on. DAY_OF_WEEK ( table.column )

DAYS_IN_MONTH

Returns the number of days in the given month of the given timestamp. DAYS_IN_MONTH( table.column )

DAY

Returns the day of a given date. DAY( table.column )

HOURS

Returns the hour of a given date. HOURS ( table.column )

IN_CALENDAR

IN_CALENDAR checks whether a given date is within a calendar. IN_CALENDAR ( timestamp_column, calendar_specification [, calendar_id_column])

MILLIS

Returns the milliseconds of a given date. MILLIS ( table.column )

MINUTES

Returns the minutes of a given date. MINUTES ( table.column )

MONTH

Returns the month of a given date. MONTH ( table.column )

QUARTER

Returns the quarter of a given date. QUARTER ( table.column )

REMAP_TIMESTAMPS

The REMAP_TIMESTAMPS function counts the number of passed time units for given dates since the epoch year (1970-01-01 00:00:00.000). The timestamps for which to calculate the passed time and also the time unit to use, are given as a parameter to the function call. Additionally, the user can specify a CALENDAR configuration which allows to restrict the dates considered in the calculations. For example, using the WEEKDAY_CALENDAR allows to only consider certain valid weekdays in the calculations. REMAP_TIMESTAMPS ( table.column, time_unit [, calendar_specification [, calendar_id_column ] ] )

SECONDS

Returns the seconds of a given date. SECONDS ( table.column)

TO_TIMESTAMP

Deprecated: Please use TO_DATE instead. TO_TIMESTAMP converts a string into a DATE. The format of the input string can be defined. TO_TIMESTAMP ( table.column, format )

YEAR

Returns the year of a given date. YEAR ( table.column )

DateTime Rounding

DateTime rounding functions round a DATE down to the chosen time unit by setting all subordinate time units to their ground state. They can be applied to DATE columns and always return a DATE column.

ROUND_YEAR

This function rounds a DATE input down to the year by setting its MONTH, DAY, HOUR, MINUTE, SECOND and MILLISECOND values to their ground state. ROUND_YEAR ( table.date_column )

ROUND_MONTH

This function rounds a DATE input down to the month by setting its DAY, HOUR, MINUTE, SECOND and MILLISECOND values to their ground state. ROUND_MONTH ( table.date_column )
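A common use of the rounding functions is building a monthly trend: rounding the timestamp to the month yields a grouping dimension for an aggregation. The table and column names below are hypothetical:

```
-- Monthly trend: group timestamps by month, then aggregate (hypothetical columns)
ROUND_MONTH ( "ORDERS"."CREATED" )
SUM ( "ORDERS"."NET_VALUE" )
```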

ROUND_DAY

This function rounds a DATE input down to the day by setting its HOUR, MINUTE, SECOND and MILLISECOND values to their ground state. ROUND_DAY ( table.date_column )

ROUND_HOUR

This function rounds a DATE input down to the hour by setting its MINUTE, SECOND and MILLISECOND values to their ground state. ROUND_HOUR ( table.date_column )

ROUND_MINUTE

This function rounds a DATE input down to the minute by setting its SECOND and MILLISECOND values to their ground state. ROUND_MINUTE ( table.date_column )

ROUND_SECOND

This function rounds a DATE input down to the second by setting its MILLISECOND value to its ground state. ROUND_SECOND ( table.date_column )

ROUND_QUARTER

This function rounds a DATE input down to the quarter by setting its MONTH value to the first month of the quarter and its DAY, HOUR, MINUTE, SECOND and MILLISECOND values to their ground state. ROUND_QUARTER ( table.date_column )

ROUND_WEEK

This function rounds a DATE input down to the week by setting its HOUR, MINUTE, SECOND and MILLISECOND values to their ground state. The DAY value is set to the Monday of the week of that date. ROUND_WEEK ( table.date_column )


Logical

Overview

Logical operators are used to combine boolean expressions into more complex boolean expressions.

AND

AND evaluates to true when both the left-hand and the right-hand side logical expressions are true, and to false otherwise. logical expression AND logical expression

NOT

NOT negates a logical expression. NOT (logical expression)

OR

OR evaluates to true when at least one of the left-hand or right-hand side logical expressions is true, and to false otherwise. logical expression OR logical expression


Machine Learning Functions

Overview

This section comprises standard machine learning functions.

KMEANS

KMEANS finds clusters based on the k-means++ algorithm and assigns a number to every cluster. KMEANS ( k ) KMEANS ( TRAIN_[FILTERED_]KM ( [ EXCLUDED ( table.exclude_column, ... ), ] INPUT ( table.input_column, ... ), k ), CLUSTER ( table.cluster_column, ... ) )

LINEAR_REGRESSION

The Linear Regression function models the relationship between input variables by fitting a linear equation to the input data. LINEAR_REGRESSION ( TRAIN_[FILTERED_]LM ( INPUT ( table.input_column, ... ), OUTPUT ( table.output_column ) ), PREDICT ( table.predict_column, ... ) )


Math

Overview

This section comprises standard mathematical functions including basic arithmetic operations.

ABC

The ABC function performs an ABC Analysis. Such an analysis divides entries in a column into three groups (A, B, C). Group A represents the values with the highest share of the sum of all values. Group C represents the values with the lowest share. The size of each group is defined with thresholds. [FILTERED_]ABC ( table.column [,A [, B ] ] )

ABS

ABS is a mathematical function that returns the absolute (positive) value of a specified numeric expression. ABS ( table.column )

ADD

The addition operator computes the sum of two values. ADD ( table.column1 , table.column2 ) table.column1 + table.column2

CEIL

The Ceiling function returns the smallest integer greater than, or equal to, the specified numeric expression. CEIL ( table.column )

DIV

The division operator computes the quotient of two values. DIV ( table.column1 , table.column2 ) table.column1 / table.column2

FLOOR

The FLOOR function returns the greatest integer smaller than, or equal to, the specified numeric expression. FLOOR ( table.column )

INVERSE

INVERSE negates the input number. INVERSE ( table.column )

LOG

The LOG function returns the logarithm of the specified float expression. LOG ( table.column [, base] )

MODULO

The modulo operator returns the remainder of a division of both arguments. MODULO ( dividend, divisor ) dividend % divisor

MULT

The multiplication operator computes the product of two values. MULT ( table.column1 , table.column2 ) table.column1 * table.column2

POWER

The POWER function returns the value of the specified expression to the specified power. POWER ( table.column, exponent )

ROUND

The ROUND function rounds a numeric value to the closest number. ROUND ( column [, precision ] )

SQRT

The SQRT function calculates the positive square root of the specified value. SQRT ( table.column )

SQUARE

The SQUARE function calculates the square for each input value. SQUARE ( table.column )

SUB

The subtraction operator computes the difference of two values. SUB ( table.column1 , table.column2 ) table.column1 - table.column2


Object Link

Overview

Object Link enables the analysis of graph data. An example for such data is a bill of materials.

LINK_FILTER

LINK_FILTER enables object-based recursive filtering on the Object Link graph. LINK_FILTER ( filter_expression, {ANCESTORS|DESCENDANTS} [, number_of_hops] ) LINK_FILTER ( BIND_FILTERS ( object_link_table, filter_expression ), {ANCESTORS|DESCENDANTS} [, number_of_hops] ) LINK_FILTER_ORDERED ( filter_expression, {ANCESTORS|DESCENDANTS} [, number_of_hops] ) LINK_FILTER_ORDERED ( BIND_FILTERS ( activity_table, filter_expression ), {ANCESTORS|DESCENDANTS} [, number_of_hops] )

LINK_PATH_SOURCE - LINK_PATH_TARGET

The LINK_PATH_SOURCE - LINK_PATH_TARGET operators return tables containing the links, respectively joined to source or target object, that were used during LINK_PATH traversal. LINK_PATH_SOURCE ( link_path_table , mapping_table1.column [, mapping_table2.column ] ... ) LINK_PATH_SOURCE ( link_path_table.column , mapping_table1.column [, mapping_table2.column ] ... ) LINK_PATH_TARGET ( link_path_table , mapping_table1.column [, mapping_table2.column ] ... ) LINK_PATH_TARGET ( link_path_table.column , mapping_table1.column [, mapping_table2.column ] ... )

LINK_PATH

The LINK_PATH operator creates an internal activity and case table to represent individual paths calculated by traversing the Object Link graph. The resulting activity table is joined towards the input table and to the internal case table in an N:1 fashion. Additional tables containing link attributes are generated and can be accessed via the LINK_PATH_SOURCE/TARGET operators. LINK_PATH ( input_table.column [, direction ] [, CONSTRAINED BY ( [ START ( start_objects_expression ) ] [, END ( end_objects_expression ) ] [, LENGTH ( comparison ) ] [, ALL ( all_objects_expression ) ] [, WITH/WITHOUT CYCLES] ) ] )

LINK_OBJECTS

The LINK_OBJECTS operator creates a new table containing all rows of the input table that are part of the Object Link graph. LINK_OBJECTS ( input_table.column )[.COLUMN]

LINK_SOURCE - LINK_TARGET - LINK_ATTRIBUTES

LINK_SOURCE and LINK_TARGET output the source or target objects of links that are generated according to the Object Link configuration of the data model. LINK_ATTRIBUTES returns link values. LINK_SOURCE ( object_link_table1.column [, object_link_table2.column] ... ) LINK_TARGET ( object_link_table1.column [, object_link_table2.column] ... ) LINK_ATTRIBUTES ( mapping_table1.column [, mapping_table2.column] ... ) LINK_SOURCE( BIND ( object_link_table1, table1.column ) [, BIND ( object_link_table2, table2.column ) ] ...) LINK_TARGET( BIND ( object_link_table1, table1.column ) [, BIND ( object_link_table2, table2.column ) ] ...)


Predicate

Overview

Predicate functions and operators check if the input argument satisfies a condition.

BETWEEN

BETWEEN ... AND ... returns true for all values of a column that lie within the given closed interval, and false otherwise. NOT BETWEEN ... AND ... returns the opposite. table.column BETWEEN inclusive start of interval AND inclusive end of interval table.column NOT BETWEEN inclusive start of interval AND inclusive end of interval
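As a minimal sketch, assuming a hypothetical "Invoices" table with a numeric "Amount" column, a filter keeping only amounts in the closed interval [100, 500] could look like:

```sql
FILTER "Invoices"."Amount" BETWEEN 100 AND 500;
```

Both bounds are included, so rows with an amount of exactly 100 or 500 pass the filter.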

IN_LIKE

The IN_LIKE predicate determines whether any of the string patterns on the right matches the expression string on the left. The left expression can be either a column or a constant string. The pattern list must contain at least one entry, and all entries must be STRING values. The output can be negated by using NOT IN_LIKE. value/column IN_LIKE ( value1/column1, ... ) value/column NOT IN_LIKE ( value1/column1, ... )

IN

IN returns true for all cases of a column or a value that match any of the values of the match list. An item in the match list can also be another column. Returns false otherwise. The reverse is true for NOT IN. value/column IN ( value1/column1, ... ) value/column NOT IN ( value1/column1, ... )

ISNULL

The ISNULL function returns an INT which indicates whether the input value is NULL. If the input value is NULL, the function returns 1; otherwise it returns 0. ISNULL ( table.column )

LIKE

The LIKE predicate determines whether the right expression string matches the left expression string. The output can be negated by using NOT LIKE. table.column LIKE pattern table.column NOT LIKE pattern
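For illustration, assuming a hypothetical "Orders" table and SQL-style wildcards ('%' for any character sequence, '_' for a single character), a pattern match on material numbers could be sketched as:

```sql
FILTER "Orders"."Material" LIKE 'MAT-%';
```

This keeps all rows whose material string starts with 'MAT-'.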

MULTI_IN

MULTI_IN returns true for all cases where one or multiple columns match any tuple of the match list. Returns false otherwise. The reverse is true for NOT MULTI_IN. MULTI_IN ( (table.column, ...), (value1_1, ...), (value2_1, ...), ...) NOT MULTI_IN ( (table.column, ...), (value1_1, ...), (value2_1, ...), ...)


Process

Overview

Process functions perform calculations based on the Case-Id and the Activity column of the event log.

ACTIVATION_COUNT

Deprecated: Please use INDEX_ACTIVITY_TYPE instead. The ACTIVATION_COUNT operator returns, for every activity in every case, how many times this activity has already occurred up to that point in the process. ACTIVATION_COUNT ( activity_table.column )

ACTIVITY_LAG

ACTIVITY_LAG returns the row of the activity table that precedes the current row by offset number of rows within a case. The resulting column is a temporary column of the activity table. ACTIVITY_LAG ( activity_table.column [, offset ] )

ACTIVITY_LEAD

ACTIVITY_LEAD returns the row of the activity table that follows the current row by offset number of rows within a case. The resulting column is a temporary column of the activity table. ACTIVITY_LEAD ( activity_table.column [, offset ] )

BPMN_CONFORMS

BPMN_CONFORMS provides a way for simple binary yes/no conformance checking on a BPMN model to distinguish fitting traces from non-fitting traces. BPMN_CONFORMS ( flattened_events_table.column , bpmn_model, [ ALLOW( filter_or_shorthand [, filter_or_shorthand ... ] ) ] )

CALC_CROP

This operator crops cases to a range of activities. All values outside this range are mapped to null. Values within the range are flagged with 1, in contrast to CALC_CROP_TO_NULL which returns the input values. The result is an INT column that is activity-based and returns 1 for the activities inside the specified range and null otherwise. CALC_CROP ( begin_range_specifier TO end_range_specifier, activity_table.string_column )

CALC_CROP_TO_NULL

This operator crops cases to a range of activities. All values outside this range are mapped to null. Values within the range are kept, in contrast to CALC_CROP which maps them to 1. The result is a temporary STRING column of the activity table. It contains activities of the specified range and null otherwise. CALC_CROP_TO_NULL ( begin_range_specifier TO end_range_specifier, activity_table.string_column )

CLUSTER_VARIANTS

The CLUSTER_VARIANTS operator groups similar process variants (or traces) into clusters. Variants which cannot be assigned to a cluster are marked as noise. CLUSTER_VARIANTS ( variant_column, MIN_PTS, EPSILON )

CONFORMANCE

CONFORMANCE flags whether activities are conformant with the given model. READABLE returns violation descriptions in English. To use the model from the conformance checker sheet in Analysis, the CONFORMANCE query can be written to a variable. CONFORMANCE ( activity_table.string_column, [ places ], [ transitions ], [ edges ], [ mapping ], [ start_places ], [ end_places ] ) READABLE ( conformance_query )

CREATE_EVENTLOG

CREATE_EVENTLOG returns an activity table that is based on a given lead object and a set of included event types. CREATE_EVENTLOG ( object_table_name [ FILTER lead_object_filter ] , INCLUDE [included_event_1 [ VIA ( relationship_1 ) ] [ FILTER event_filter_1 ], included_event_2 [ VIA ( relationship_2 ) ] [ FILTER event_filter_2 ], ...]).column_name

DEFAULT ACTIVITY_COLUMN

The DEFAULT ACTIVITY_COLUMN statement can be used to temporarily overwrite the default eventlog for the execution of a single query. The specified column is used whenever an operator would use the activity column of the default eventlog. DEFAULT ACTIVITY_COLUMN activity_table.column;

SOURCE

SOURCE and TARGET functions provide a way to combine values from two different rows of the activity table into the same row, e.g. for calculating the throughput time between consecutive events inside a case. SOURCE ( activity_table.column [, activity_table.filter_column ] [, edge_configuration [ WITH START( [ start_value ] ) ] ] )

TARGET

SOURCE and TARGET functions provide a way to combine values from two different rows of the activity table into the same row, e.g. for calculating the throughput time between consecutive events inside a case. TARGET ( activity_table.column [, activity_table.filter_column ] , WITH START( [ start_value ] ) )
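A common use of this pair is computing the time between consecutive events of a case. As a sketch with hypothetical table and column names, and assuming a DAYS_BETWEEN date function is available:

```sql
AVG (
  DAYS_BETWEEN (
    SOURCE ( "Activities"."Timestamp" ) ,
    TARGET ( "Activities"."Timestamp" )
  )
)
```

SOURCE provides the timestamp of the earlier event of each edge and TARGET the timestamp of the later one, so the difference is the duration of each transition.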

ESTIMATE_CLUSTER_PARAMS

The ESTIMATE_CLUSTER_PARAMS operator estimates appropriate values for the MIN_PTS parameter of the CLUSTER_VARIANTS operator. ESTIMATE_CLUSTER_PARAMS ( table.variant_column, epsilon, number_of_values, recursion_depth )

EVENTLOG_SOURCE_TABLE

For each row of a dynamically created eventlog, EVENTLOG_SOURCE_TABLE will return the activity table name from which the row originates. EVENTLOG_SOURCE_TABLE ( dynamically-created-eventlog.column )

MATCH_ACTIVITIES

MATCH_ACTIVITIES flags cases with certain activities without taking the exact order of activities into account. If the order is of importance, use MATCH_PROCESS_REGEX or MATCH_PROCESS. MATCH_ACTIVITIES([ activity_table.string_column, ] [STARTING activity_list ] [, ] [NODE activity_list ] [, ] [NODE_ANY activity_list ] [, ] [ENDING activity_list ] [, ] [EXCLUDING activity_list ] [, ] [EXCLUDING_ALL activity_list ])

MERGE_EVENTLOG

MERGE_EVENTLOG merges two columns of the same type from two activity tables into one common column. The sorting of entries in the resulting column is based on timestamps. Depending on the join scenario, the result of MERGE_EVENTLOG might contain some duplicated activities. If this is unwanted, then MERGE_EVENTLOG_DISTINCT can be used. Please note that MERGE_EVENTLOG and MERGE_EVENTLOG_DISTINCT only differ regarding duplicated activities and otherwise behave the same. MERGE_EVENTLOG ( target_table.column [, FILTER target_table_filter_expression ] , source_table.column [, FILTER source_table_filter_expression ] )

MERGE_EVENTLOG_DISTINCT

MERGE_EVENTLOG_DISTINCT merges two columns of the same type from two activity tables into one common column. The sorting of entries in the resulting column is based on timestamps. Depending on the join scenario, the result of MERGE_EVENTLOG might contain some duplicated activities; MERGE_EVENTLOG_DISTINCT removes these duplicates. Please note that MERGE_EVENTLOG and MERGE_EVENTLOG_DISTINCT only differ regarding duplicated activities and otherwise behave the same. MERGE_EVENTLOG_DISTINCT ( target_table.column [, FILTER target_table_filter_expression ] , source_table.column [, FILTER source_table_filter_expression ] )

PROCESS EQUALS

PROCESS EQUALS matches the variants of a process based on simple expressions. PROCESS EQUALS is less powerful than MATCH_PROCESS_REGEX and MATCH_PROCESS, but it is also simpler to use. PROCESS [ ON activity_table.string_column ] [ NOT ] EQUALS [ START ] activity ( TO activity )* [ END ]

MATCH_PROCESS_REGEX

Filters the variants based on a regular expression defined over the activities. MATCH_PROCESS_REGEX ( activity_table.string_column, regular_expression )

MATCH_PROCESS

MATCH_PROCESS matches the variants of a process against a given pattern. MATCH_PROCESS ( [ activity_table.string_column ,] node (, node)* CONNECTED BY edge (, edge)* )

PROCESS_ORDER

Deprecated: Please use INDEX_ACTIVITY_ORDER instead. PROCESS_ORDER returns the position of each activity within a case. Only non-null activities are counted. PROCESS_ORDER ( activity_table.column )

PROJECT_ON_OBJECT

Deprecated: Deprecated alias for CREATE_EVENTLOG, please use CREATE_EVENTLOG instead. PROJECT_ON_OBJECT returns an activity table that is based on a given lead object and a set of included event types. PROJECT_ON_OBJECT ( object_table_name [ FILTER lead_object_filter ] , INCLUDE [included_event_1 [ VIA ( relationship_1 ) ] [ FILTER event_filter_1 ], included_event_2 [ VIA ( relationship_2 ) ] [ FILTER event_filter_2 ], ...]).column_name

CALC_REWORK

CALC_REWORK counts the number of activities per case. A filter expression can be specified to restrict the activities that should be taken into account. The result is an INT column in which the value of each row gives the number of activities for the respective case ID. The resulting column is temporarily added to the case table. CALC_REWORK () CALC_REWORK ( filter_condition ) CALC_REWORK ( activity_table.column ) CALC_REWORK ( filter_condition , activity_table.column )

CALC_THROUGHPUT

CALC_THROUGHPUT calculates, for each case, the time between two activities. The activities at which the calculated throughput time starts and ends can be configured through range specifiers. The resulting column is temporarily added to the case table and contains the throughput time between the two specified activities per case. CALC_THROUGHPUT ( begin_range_specifier TO end_range_specifier, timestamps [, activity_table.string_column ] ) CALC_THROUGHPUT ( object_table_name , begin_event_specifier TO end_event_specifier , time_unit )
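As an illustrative sketch with hypothetical names, the throughput time of each case from its first to its last event, expressed in days, could be queried as:

```sql
CALC_THROUGHPUT ( CASE_START TO CASE_END ,
                  REMAP_TIMESTAMPS ( "Activities"."Timestamp" , DAYS ) )
```

Here CASE_START and CASE_END act as range specifiers, and REMAP_TIMESTAMPS is assumed to convert the timestamp column into the requested time unit.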

TRANSIT_COLUMN

TRANSIT_COLUMN computes transition edges between related cases from two different processes. TRANSIT_COLUMN(TIMESTAMP_INTERLEAVED_MINER(activity_table_a.activity_column, activity_table_b.activity_column), activity_column ) TRANSIT_COLUMN(TIMESTAMP_NONINTERLEAVED_MINER(activity_table_a.activity_column, activity_table_b.activity_column), activity_column ) TRANSIT_COLUMN(MATCH_MINER(activity_table_a.activity_column, activity_table_b.activity_column, activity_table_a.out_msg, activity_table_b.in_msg), activity_column ) TRANSIT_COLUMN(MANUAL_MINER(activity_table_a.activity_column, activity_table_b.activity_column, [manual_value_1, manual_value_2], ...), activity_column )

VARIANT

VARIANT aggregates all activities of a case into a string, which represents a variant of the process. SHORTENED can be used to shorten self-loops to a maximum number of occurrences. The resulting column is case-based and contains one value for each case that gives the (shortened) string of activities for this case. VARIANT ( activity_table.string_column ) SHORTENED ( VARIANT ( activity_table.column ) [, max_cycle_length ] )
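As a sketch with hypothetical names, the following collapses self-loops so that at most two consecutive repetitions of an activity remain in each variant string:

```sql
SHORTENED ( VARIANT ( "Activities"."Activity" ) , 2 )
```

Shortening self-loops this way groups cases that differ only in how often an activity repeats into the same variant.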

Process Index

Index functions create integer indices based on a column.

INDEX_ACTIVITY_LOOP_REVERSE

The INDEX_ACTIVITY_LOOP_REVERSE operator returns how many times every activity has occurred in direct succession in every case, counting in reverse order. INDEX_ACTIVITY_LOOP_REVERSE ( activity_table.column )

INDEX_ACTIVITY_LOOP

The INDEX_ACTIVITY_LOOP operator returns how many times every activity has occurred in direct succession in every case. INDEX_ACTIVITY_LOOP ( activity_table.column )

INDEX_ACTIVITY_ORDER_REVERSE

INDEX_ACTIVITY_ORDER_REVERSE returns the reverse position of each activity within a case. Only non-null activities are counted. INDEX_ACTIVITY_ORDER_REVERSE ( activity_table.column )

INDEX_ACTIVITY_ORDER

INDEX_ACTIVITY_ORDER returns the position of each activity within a case. Only non-null activities are counted. INDEX_ACTIVITY_ORDER ( activity_table.column )
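For instance, assuming a hypothetical "Activities" eventlog, restricting a query to the first event of every case could be sketched as:

```sql
FILTER INDEX_ACTIVITY_ORDER ( "Activities"."Activity" ) = 1;
```

Since positions are counted per case, this keeps exactly one event per case.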

INDEX_ACTIVITY_TYPE_REVERSE

The INDEX_ACTIVITY_TYPE_REVERSE operator returns, for every activity in every case, how many times this activity has already occurred at a given point in the process, counting in reversed order. INDEX_ACTIVITY_TYPE_REVERSE ( activity_table.column )

INDEX_ACTIVITY_TYPE

The INDEX_ACTIVITY_TYPE operator returns, for every activity in every case, how many times this activity has already occurred up to that point in the process. INDEX_ACTIVITY_TYPE ( activity_table.column )

Process Reference

Process reference functions can be used to reference the case table or configured activity columns based on an expression.

ACTIVITY_COLUMN

ACTIVITY_COLUMN allows referring to the activity column of an activity table in the data model without using the exact column name. ACTIVITY_COLUMN ( [ expression ] )

ACTIVITY_TABLE

ACTIVITY_TABLE allows referring to the activity table in the data model without using its exact table name. This operator is helpful in data models with multiple activity/case tables. ACTIVITY_TABLE ( [ expression ] )
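For illustration, a filter on the default eventlog's activity column that does not hard-code the table or column name (the activity name 'Create Order' is hypothetical):

```sql
FILTER ACTIVITY_COLUMN ( ) = 'Create Order';
```

Because no table name appears in the query, it keeps working if the analysis is pointed at a data model with differently named eventlog tables.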

CASE_ID_COLUMN

CASE_ID_COLUMN allows referring to the case ID column of an activity table in the data model without using the exact column name. CASE_ID_COLUMN ( [ expression ] )

CASE_TABLE

CASE_TABLE allows referring to the case table in the data model without using its exact table name. This operator is helpful in data models with multiple activity/case tables. CASE_TABLE ( [ expression ] )

END_TIMESTAMP_COLUMN

END_TIMESTAMP_COLUMN allows referring to the end timestamp column of an activity table in the data model without using the exact column name. If no end timestamp column is defined for the target activity table, the timestamp column is referenced instead. END_TIMESTAMP_COLUMN ( [ expression ] )

SORTING_COLUMN

SORTING_COLUMN allows referring to the sorting column of an activity table in the data model without using the exact column name. SORTING_COLUMN ( [ expression ] )

TIMESTAMP_COLUMN

TIMESTAMP_COLUMN allows referring to the timestamp column of an activity table in the data model without using the exact column name. TIMESTAMP_COLUMN ( [ expression ] )


Set Operators

Overview

Set operators operate on sets of column values and table rows that may also contain multiple identical elements.

UNION_ALL_PULLBACK

The UNION_ALL_PULLBACK function returns the column corresponding to the index requested from the UNION_ALL table. UNION_ALL_PULLBACK ( expression, index )

UNION_ALL_TABLE

The UNION_ALL_TABLE function returns the table formed from the vertical concatenation of the columns of the input tables. UNION_ALL_TABLE ( table1 , ... , tableN )

UNION_ALL

The UNION_ALL function returns the vertical concatenation of columns. UNION_ALL ( table.column1 , ... , table.columnN )
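As a minimal sketch with hypothetical tables, stacking the name columns of two tables into a single column:

```sql
UNION_ALL ( "Customers"."Name" , "Suppliers"."Name" )
```

The input columns should share a data type, since the result is a single vertically concatenated column containing every row of both inputs, duplicates included.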


Static PQL Functions

Overview

Static PQL functions provide a lightweight way to check general properties of a data model or expression before actually executing a PQL query. For example, you can check the data type of a data model column and based on the type modify the actual PQL query to be executed.

ARGUMENT_COUNT

ARGUMENT_COUNT counts the number of arguments (i.e., constants or expressions) passed to it. ARGUMENT_COUNT ( [ Expression [ , ... ] ] )

COLUMN_TYPE

COLUMN_TYPE returns the type of a data model column. COLUMN_TYPE ( "TABLE"."COLUMN" )

IF

IF evaluates a list of static conditions and replaces itself with the query for the first static condition that is true. IF static_condition THEN calculation [ ELSE IF condition THEN calculation ]* [ ELSE calculation ] END
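For example, a static IF can select an expression based on a column's type before the query is executed. In this sketch, the "Orders" table is hypothetical, and COLUMN_TYPE is assumed to return the type name as a STRING such as 'STRING':

```sql
IF COLUMN_TYPE ( "Orders"."Value" ) = 'STRING'
THEN "Orders"."Value"
ELSE TO_STRING ( "Orders"."Value" )
END
```

The branch is resolved statically, so only the matching expression is part of the query that actually runs.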

STATIC CASE WHEN

STATIC CASE WHEN evaluates a list of static conditions and replaces itself with the expression for the first static condition that is true. STATIC CASE WHEN static_condition THEN expression [ WHEN static_condition THEN expression ]* ELSE expression END


Statistics

Overview

The three statistical functions currently available, QNORM, ZSCORE, and BUCKET_LOWER_BOUND/BUCKET_UPPER_BOUND, are used to map or standardize a column under the assumption of statistical properties of the data, or, in the case of BUCKET_LOWER_BOUND/BUCKET_UPPER_BOUND, to sort the column values into suitable histogram buckets.

BUCKET_UPPER_BOUND - BUCKET_LOWER_BOUND

BUCKET_UPPER_BOUND and BUCKET_LOWER_BOUND calculate histogram buckets and identify the buckets in which the input values reside. BUCKET_UPPER_BOUND ( table.column [, SUGGESTED_WIDTH(suggested_width)] [, SUGGESTED_TICKS(suggested_ticks) ] [, MAX_COUNT(max_count) ] ) BUCKET_UPPER_BOUND ( table.column [, SUGGESTED_COUNT(suggested_count)] [, SUGGESTED_TICKS(suggested_ticks) ] [, MAX_COUNT(max_count) ] ) BUCKET_LOWER_BOUND ( table.column [, SUGGESTED_WIDTH(suggested_width)] [, SUGGESTED_TICKS(suggested_ticks) ] [, MAX_COUNT(max_count) ] ) BUCKET_LOWER_BOUND ( table.column [, SUGGESTED_COUNT(suggested_count)] [, SUGGESTED_TICKS(suggested_ticks) ] [, MAX_COUNT(max_count) ] ) BUCKET_UPPER_BOUND ( ) BUCKET_LOWER_BOUND ( )

QNORM

Evaluates the quantile function of values in a FLOAT column assuming the normal distribution N(0,1). QNORM ( table.column )

ZSCORE

This function calculates the z-score over an INT or a FLOAT. The output type is always FLOAT. ZSCORE ( table.column [, PARTITION BY ( partition_column, ... )] )
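As a sketch with hypothetical names, standardizing invoice amounts separately within each region:

```sql
ZSCORE ( "Invoices"."Amount" , PARTITION BY ( "Invoices"."Region" ) )
```

Each value is mapped to its distance from the partition mean in units of the partition's standard deviation, so the results are comparable across regions.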


String Modification

Overview

Celonis provides multiple string functions that allow modifying STRING data of columns and constants.

CLUSTER_STRINGS

CLUSTER_STRINGS clusters strings in a column based on their similarity. CLUSTER_STRINGS ( table.column , EDIT_THRESHOLD ( edit_distance ) [, WEIGHT ( weighted_tokens , weight ) ] [, PARTITION BY ( partition_column, ...)])

CONCAT

CONCAT and || return the concatenation of two or more strings. CONCAT ( table.column1, ..., table.columnN ) table.column1 || table.column2 [ || table.columnN ]*

DELETE_CHARACTERS

DELETE_CHARACTERS returns a string with the characters defined in the match string removed. DELETE_CHARACTERS ( table.column, match_string )

LEFT

LEFT returns the first n characters of a string. If the string is smaller than n characters, the whole string is returned. LEFT ( table.column, n )

LEN

LEN returns the size of the string, i.e., the number of characters. LEN ( table.column )

LOWER

LOWER returns the lower-cased version of a string. LOWER ( table.column )

LTRIM

LTRIM returns the string with leading whitespace removed. Optionally pass a string argument containing Unicode characters to be trimmed from the string. LTRIM ( table.column [, characters ])

MAP_CHARACTERS

MAP_CHARACTERS returns a string with the characters defined in the match string replaced by the corresponding character. MAP_CHARACTERS ( table.column, match_string, replace_string )

MATCH_STRINGS

MATCH_STRINGS finds the top-k best matching strings from another column for each string in the input column. MATCH_STRINGS (table1.input_column, table2.match_column [, TOP_K ( top_k )] [, SEPARATOR ( separator ) ])

PATINDEX

PATINDEX searches for the pattern-substring and returns its index (starting from 1) within the input if it was found (zero otherwise). PATINDEX ( table.column, pattern [, occurrence ] )

REPLACE

REPLACE returns the strings with pattern-substrings replaced by the string replace. REPLACE ( table.column, pattern, replace )

REVERSE

REVERSE returns the reversed string. REVERSE ( table.column )

RIGHT

RIGHT returns the last n characters of a string. If the string is smaller than n characters, the whole string is returned. RIGHT ( table.column, n )

RTRIM

RTRIM returns the string with trailing whitespace removed. Optionally pass a string argument containing Unicode characters to be trimmed from the string. RTRIM ( table.column [, characters ] )

STRINGHASH

STRINGHASH computes a cryptographic hash of a string, encoded using base64 encoding. The returned hash may change in the future. STRINGHASH ( table.column )

STRING_SPLIT

STRING_SPLIT splits the string into sub-strings based on pattern and returns the split-index-th split. STRING_SPLIT ( table.column, pattern, split-index )
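For illustration, assuming a hypothetical "Files"."Path" column with '/'-separated values, one path segment can be extracted like this (verify in your environment whether the split index starts at 0 or 1):

```sql
STRING_SPLIT ( "Files"."Path" , '/' , 1 )
```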

STR_TO_INT

Deprecated: Please use TO_INT or TO_FLOAT instead. STR_TO_INT converts the input strings to integers. If the input is not a number (or outside of the supported range), null is returned. A decimal point is allowed in input, but decimal places are ignored/truncated. STR_TO_INT ( table.column )

SUBSTRING

SUBSTRING returns the substring of size size starting at character with zero-based index start. SUBSTRING ( table.column, start, size )

UPPER

UPPER returns the upper-cased version of a string. UPPER ( table.column )


Window

Overview

Window functions and operators perform calculations in a partitioned table.

INDEX_ORDER

INDEX_ORDER returns a column with integer indices, starting from 1. The indices indicate the order of the rows. INDEX_ORDER ( column [, ORDER BY ( sort_column [sorting], ... )] [, PARTITION BY ( partition_column, ... )] )

INTERPOLATE

INTERPOLATE can be used to interpolate missing values (i.e. NULL values). It is possible to specify a column based ordering and partitioning. The output type is INT if the input column is INT and the interpolation type is CONSTANT; FLOAT otherwise. INTERPOLATE ( column, interpolation_type [, ORDER BY ( sort_column [sorting], ... )] [, PARTITION BY ( partition_column, ... )] )

LAG

LAG returns the row that precedes the current row by offset number of rows. It is possible to specify a column based ordering and partitioning. Null values are skipped. LAG ( column [, ORDER BY ( sort_column [sorting], ... )] [, PARTITION BY ( partition_column, ... )] [, offset ] )
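As a sketch with hypothetical names, the previous amount within each case, ordered by timestamp:

```sql
LAG ( "Invoices"."Amount" ,
      ORDER BY ( "Invoices"."Timestamp" ) ,
      PARTITION BY ( "Invoices"."CaseId" ) )
```

Because null values are skipped, each row receives the most recent non-null preceding amount rather than strictly the value one row back.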

LEAD

LEAD returns the row that follows the current row by offset number of rows. It is possible to specify a column based ordering and partitioning. Null values are skipped. LEAD ( table.column [, ORDER BY ( order_column [sorting], ... )] [, PARTITION BY ( partition_column, ... )] [, offset ] )

RUNNING_SUM

RUNNING_SUM returns the sum of the previous rows. It is possible to specify a column based ordering and partitioning. Null values are skipped. RUNNING_SUM ( column [, ORDER BY ( sort_column [sorting], ... )] [, PARTITION BY ( partition_column, ... )] )

WINDOW_AVG

WINDOW_AVG can be used to calculate the average over a user-defined window. It is possible to specify a column based ordering and partitioning. WINDOW_AVG ( column, start, end [, ORDER BY ( sort_column [sorting], ... )] [, PARTITION BY ( partition_column, ... )] )
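As an illustrative sketch (hypothetical names; start and end are assumed to be row offsets relative to the current row), a trailing three-row moving average could look like:

```sql
WINDOW_AVG ( "Sales"."Amount" , -2 , 0 ,
             ORDER BY ( "Sales"."Date" ) )
```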