MaxCompute SQL provides other functions commonly used in the development process. You can use appropriate functions based on your business requirements. This topic describes the command syntax, parameters, and examples of the functions, such as CAST, DECODE, LEAST, ARRAY, SPLIT, and MAP.

Function Description
CAST Converts the result of an expression to a specific data type.
COALESCE Returns the first non-NULL value in the list.
DECODE Implements the IF-THEN-ELSE conditional branching feature.
GET_IDCARD_AGE Returns an age based on the ID card number.
GET_IDCARD_BIRTHDAY Returns the date of birth based on the ID card number.
GET_IDCARD_SEX Returns the gender based on the ID card number.
GREATEST Returns the maximum value of the input parameters.
ORDINAL Sorts the values of the input variables in ascending order and returns the value that is ranked at a specific position.
PARTITION_EXISTS Checks whether a specific partition exists in a table.
LEAST Returns the minimum value of the input parameters.
MAX_PT Returns the name of the largest level-1 partition in a partitioned table.
UUID Returns a random ID.
SAMPLE Samples all column values that are read and filters out the rows that do not meet sampling conditions.
CASE WHEN expression Returns different values based on the computing result of an expression.
IF expression Checks whether a specified condition evaluates to true.
SPLIT Splits a string with a specified delimiter and returns an array.
STR_TO_MAP Splits a string with a specified delimiter and returns a key-value pair.
EXPLODE Transposes one row of data into multiple rows. This function is a user-defined table function (UDTF).
STACK Splits a specified parameter group into a specified number of rows.
MAP Defines a MAP parameter based on a specified key-value pair.
MAP_KEYS Returns all keys in the MAP parameter as an array.
MAP_VALUES Returns all values in the MAP parameter as an array.
ARRAY Creates an array by using given input values.
SIZE Returns the number of key-value pairs in a specified MAP parameter or the number of elements in an array.
ARRAY_CONTAINS Checks whether a specified array contains a specified value.
SORT_ARRAY Sorts given arrays.
POSEXPLODE Converts an array into a table that has two columns. The first column lists subscripts of each value in the array, starting from 0. The second column lists array elements.
STRUCT Creates a struct based on a given value list.
NAMED_STRUCT Creates a struct based on given name-value pairs.
INLINE Expands a given struct array. Each array element corresponds to a row and each struct element corresponds to a column in each row.
TABLE_EXISTS Checks whether a specified table exists.
TRANS_ARRAY Transposes one row of data into multiple rows. This function is a UDTF that transposes an array separated by fixed delimiters in a column into multiple rows.
TRANS_COLS Transposes one row of data into multiple rows. This function is a UDTF that transposes columns into rows.
#section_c2j_hbb_wdb Returns the specified default_value if value is NULL and returns the actual value in other cases.
GET_USER_ID Obtains the ID of the current account.

CAST

  • Syntax
    cast(<expr> as <type>)
  • Description
    Converts the result of expr to a specific data type. Usage:
    • cast(double as bigint): converts a value of the DOUBLE type to the BIGINT type.
    • cast(string as bigint): converts a value of the STRING type to the BIGINT type. If the string consists of numerals expressed in the INTEGER form, it is converted to the BIGINT type. If the string consists of numerals expressed in the FLOAT or EXPONENTIAL form, it is converted to the DOUBLE type and then to the BIGINT type.
    • The default date format, yyyy-mm-dd hh:mi:ss, is used for cast(string as datetime) and cast(datetime as string).
  • Parameters
    • expr: required. The expression whose computing result you want to convert.
    • type: required. The data type to which you want to convert the data.
  • Return value

    The return value is of the specified data type.

  • Examples
    • Example 1: common usage. Sample command:
      -- 1 is returned.
      select cast('1' as bigint);
    • Example 2: incorrect usage. If the conversion fails or an unsupported type conversion occurs, NULL is returned. Sample command:
      -- NULL is returned.
      select cast('abc' as bigint);

COALESCE

  • Syntax
    coalesce(<expr1>, <expr2>, ...)
  • Description

    Returns the first non-NULL value in <expr1>, <expr2>, .... If all values in the list are NULL, NULL is returned.

  • Parameter

    expr: required. A value that you want to check. Except the NULL values, all other values must be of the same data type. Otherwise, an error is returned. If all values are NULL, an error is returned.

  • Return value

    The data type of the return value is the same as the data type of the parameter.

  • Examples
    • Example 1: common usage. Sample command:
      -- 1 is returned.
      select coalesce(null,null,1,null,3,5,7);
    • Example 2: incorrect usage. The data types of parameter values are not the same, and an error is returned. Sample command:
      -- An error is returned, and the value abc cannot be identified.
      select coalesce(null,null,1,null,abc,5,7);
    • Example 3: incorrect usage. Non-NULL values do not exist and an error is returned. Sample command:
      -- An error is returned because non-NULL values do not exist.
      select coalesce(null,null,null,null);

DECODE

  • Syntax
    decode(<expression>, <search>, <result>[, <search>, <result>]...[, <default>])
  • Description

    Implements the IF-THEN-ELSE conditional branching feature.

  • Parameters
    • expression: required. The expression that you want to compare.
    • search: required. The search item that you use to compare with the expression.
    • result: required. The value returned when the value of search matches the value of expression.
    • default: optional. If no search items match the expression, the value of default is returned. If this parameter is not specified, NULL is returned.
    Note
    • You must specify at least three parameters.
    • Except the NULL values, all other values of the result parameter must be of the same data type. Otherwise, an error is returned.
    • The values of search and expression must be of the same data type. Otherwise, an error is returned.
  • Return value
    • If a search item matches the expression, result is returned.
    • If no search item matches the expression, default is returned.
    • If default is not specified, NULL is returned.
    • If duplicate search items match the expression, the first value is returned.
    • In most cases, NULL is returned when MaxCompute SQL calculates NULL=NULL. However, the DECODE function considers that the two NULL values are the same.
  • Example
    -- If the value of customer_id is 1, Taobao is returned. If the value is 2, Alipay is returned. If the value is 3, Aliyun is returned. If the value is NULL, N/A is returned. In other cases, Others is returned.
    select
    decode(customer_id,
    '1', 'Taobao',
    '2', 'Alipay',
    '3', 'Aliyun',
    Null, 'N/A',
    'Others') as result
    from sale_detail;
    -- The preceding statement is equivalent to the following statement:
    if customer_id = 1 then
    result := 'Taobao';
    elsif customer_id = 2 then
    result := 'Alipay';
    elsif customer_id = 3 then
    result := 'Aliyun';
    ...
    else
    result := 'Others';
    end if;

GET_IDCARD_AGE

  • Syntax
    get_idcard_age(<idcardno>)
  • Description

    Returns the current age based on the ID card number. The current age is the current year minus the birth year in the ID card number.

  • Parameter

    idcardno: required. A 15-digit or 18-digit ID card number of the STRING type. During calculation, the validity of the ID card is checked based on the province code and the last digit of the ID card number. NULL is returned if the check fails.

  • Return value

    A value of the BIGINT type is returned. If the input value is NULL, NULL is returned.

GET_IDCARD_BIRTHDAY

  • Syntax
    get_idcard_birthday(<idcardno>)
  • Description

    Returns the date of birth based on the ID card number.

  • Parameter

    idcardno: required. A 15-digit or 18-digit ID card number of the STRING type. During calculation, the validity of the ID card is checked based on the province code and the last digit of the ID card number. NULL is returned if the check fails.

  • Return value

    A value of the DATETIME type is returned. If the input value is NULL, NULL is returned.

GET_IDCARD_SEX

  • Syntax
    get_idcard_sex(<idcardno>)
  • Description

    Returns the gender based on the ID card number. Valid values: M and F. M indicates male and F indicates female.

  • Parameter

    idcardno: required. The 15-digit or 18-digit ID card number of the STRING type. During calculation, the validity of the ID card is checked based on the province code and the last digit of the ID card number. NULL is returned if the check fails.

  • Return value

    A value of the STRING type is returned. If the input value is NULL, NULL is returned.

GREATEST

  • Syntax
    greatest(<var1>, <var2>, …)
  • Description

    Returns the maximum value of the input parameters.

  • Parameter

    var: required. The value can be of the BIGINT, DOUBLE, DECIMAL, DATETIME, or STRING type. If the values of all input parameters are NULL, NULL is returned.

  • Return value
    • The maximum value of the input parameters is returned. If no implicit conversion is required, the return value is of the same data type as the input parameter.
    • NULL is interpreted as the minimum value.
    • If the input parameters are of different data types, those of the DOUBLE, BIGINT, DECIMAL, and STRING types need to be converted to the DOUBLE type for comparison, and those of the STRING and DATETIME types need to be converted to the DATETIME type for comparison. Implicit conversions of other data types are not allowed.
    • If odps.sql.hive.compatible is set to true and the value of any input parameter is NULL, NULL is returned.

ORDINAL

  • Syntax
    ordinal(bigint <nth>, <var1>, <var2>, …)
  • Description

    Sorts the values of the input variables in ascending order and returns the value that is ranked nth.

  • Parameters
    • nth: required. The position of the value that you want to return. The value is of the BIGINT type. If it is set to NULL, NULL is returned.
    • var: required. The value is of the BIGINT, DOUBLE, DATETIME, or STRING type.
  • Return value
    • The value that is ranked nth is returned. If no implicit conversion is required, the return value is of the same data type as the input parameter.
    • If a data type conversion is performed among the DOUBLE, BIGINT, and STRING types, a value of the DOUBLE type is returned. If a data type conversion is performed between the STRING and DATETIME types, a value of the DATETIME type is returned. Implicit conversions of other data types are not allowed.
    • NULL is interpreted as the minimum value.
  • Example
    -- 2 is returned.
    select ordinal(3, 1, 3, 2, 5, 2, 4, 6); 

PARTITION_EXISTS

  • Syntax
    boolean partition_exists(string <table_name>, string... <partitions>)
  • Description

    Checks whether a specified partition exists in a table.

  • Parameters
    • table_name: required. The table name, which is of the STRING type. You can specify a project name in the table name, for example, my_proj.my_table. If you do not specify a project name, the current project name is used.
    • partitions: required. The partition name, which is of the STRING type. In this parameter, you must specify the values of partition key columns in a table based on the sequence of these columns. The number of values must be the same as the number of partition key columns.
  • Return value

    A value of the BOOLEAN type is returned. If the specified partition exists, True is returned. Otherwise, False is returned.

  • Example
    -- Create the foo partitioned table.
    create table foo (id bigint) partitioned by (ds string, hr string);
    -- Add partitions to the foo table.
    alter table foo add partition (ds='20190101', hr='1');
    -- Check whether partitions 20190101 and 1 exist. True is returned.
    select partition_exists('foo', '20190101', '1');

LEAST

  • Syntax
    least(<var1>, <var2>, …)
  • Description

    Returns the minimum value of the input parameters.

  • Parameter

    var: required. The values of the input parameters, which are of the BIGINT, DOUBLE, DECIMAL, DATETIME, or STRING type. If the values of all input parameters are NULL, NULL is returned.

  • Return value
    • The minimum value of input parameters is returned. If no implicit conversion is required, the return value is of the same data type as the input parameter.
    • If a data type conversion is performed among the DOUBLE, BIGINT, and STRING types, a value of the DOUBLE type is returned. If a data type conversion is performed between the STRING and DATETIME types, a value of the DATETIME type is returned. If a data type conversion is performed among the DECIMAL, DOUBLE, BIGINT, and STRING types, a value of the DECIMAL type is returned. Implicit conversions of other data types are not allowed.
    • NULL is interpreted as the minimum value.

MAX_PT

  • Syntax
    max_pt(<table_full_name>)
  • Description

    Returns the name of the largest level-1 partition in a partitioned table and reads the data files of this partition. This function determines the largest partition by sorting partitions in alphabetical order.

  • Parameter

    table_full_name: required. The table name, which must be specified with the project name, for example, prj.src. You must have read permissions on the table.

  • Return value

    The name of the largest partition is returned.

    Note

    If a partition is added by using the ALTER TABLE statement and the partition contains no data file, the name of this partition is not returned.

  • Example
    For example, tbl is a partitioned table, and the partitions in the table are 20120901 and 20120902, both of which contain data files. If you execute the following statement, the return value of max_pt is '20120902', and the MaxCompute SQL statement reads data from the 20120902 partition.
    select * from tbl where pt=max_pt('myproject.tbl');

UUID

  • Syntax
    string uuid()
  • Description

    Returns a random ID, which is in the format of 29347a88-1e57-41ae-bb68-a9edbdd9****.

    Note The return value is a random global ID that has a low probability of duplication.

SAMPLE

  • Syntax
    boolean sample(<x>, <y>, [<column_name1>, <column_name2>,...])
  • Description

    Samples all values read from column_name based on x and y, and filters out the rows that do not meet sampling conditions.

  • Parameters
    • x and y: x is required. x and y are integer constants that are greater than 0. Their values are of the BIGINT type. They indicate that the values fall into x portions based on the hash function and the yth portion is used.

      If y is not specified, you do not need to specify column_name. In this case, the first portion is used.

      An error is returned if x or y is of another data type, the value of x or y less than or equal to 0, or y is greater than x. If the value of x or y is NULL, NULL is returned.

    • column_name: optional. The name of the destination column on which sampling is performed. This parameter is optional. If it is not specified, random sampling is performed based on the values of x and y. It can be of any data type, and the column value can be NULL. No implicit conversion is performed. If column_name is constant NULL, an error is returned.
      Note To avoid data skew due to the NULL value, uniform hashing is performed on the NULL values in column_name in x portions. If column_name is not specified and the amount of data is small, the output is not necessarily uniform. In this case, we recommend that you specify column_name to obtain a uniform output.
  • Return value

    A value of the BOOLEAN type is returned.

  • Example

    Assume that the tbla table exists and contains the cola column.

    -- The values in the cola column fall into four portions based on the hash function, and the first portion is used. True is returned.
    select * from tbla where sample (4, 1 , cola);
    -- The values in each row are randomly hashed to four portions, and the second portion is used. True is returned.
    select * from tbla where sample (4, 2);

CASE WHEN expression

  • Syntax
    MaxCompute provides the following CASE WHEN syntax:
    • case <value>
      when <value1> then <result1>
      when <value2> then <result2>
      ...
      else <resultn>
      end
    • case
      when (<_condition1>) then <result1>
      when (<_condition2>) then <result2>
      when (<_condition3>) then <result3>
      ...
      else <resultn>
      end
  • Description

    Returns the value of result based on the calculation result of value or _condition.

  • Parameters
    • value: required. The value used for comparison.
    • _condition: required. The specified condition.
    • result: required. The return value.
    Note
    • If the data types of all result values are only BIGINT and DOUBLE, the data types are converted to the DOUBLE type and then values are returned.
    • If result values of the STRING type exist, the data types are converted to the STRING type and then values are returned. If an unsupported conversion, such as a conversion from the BOOLEAN type to the STRING type, is involved, an error is returned.
    • Conversions between other data types are not allowed.
  • Example
    select 
    case  
    when shop_name is null then 'default_region'
    when shop_name like 'hang%' then 'zj_region'
    end as region 
    from sale_detail;

IF expression

  • Syntax
    if(<testCondition>, <valueTrue>, <valueFalseOrNull>)
  • Description

    Checks whether testCondition evalutes to true. If testCondition evalutes to true, the value of valueTrue is returned. Otherwise, the value of valueFalseOrNull is returned.

  • Parameters
    • testCondition: required. The expression that you want to evaluate. The value is of the BOOLEAN type.
    • valueTrue: required. The value returned when testCondition evaluates to true
    • valueFalseOrNull: the value returned when testCondition evaluates to false. You can set this parameter to NULL.
  • Return value

    The data type of the return value is the same as that of valueTrue or valueFalseOrNull.

  • Example
    -- 200 is returned.
    select if(1=2,100,200); 

SPLIT

  • Syntax
    split(<str>, <pat>)
  • Description

    Returns an array after str is split with pat.

  • Parameters
    • str: required. The string that you want to split, which is of the STRING type.
    • pat: required. The STRING-type delimiter, which can be a regular expression.
  • Return value

    array <string> is returned.

  • Example
    -- ["a","b","c"] is returned.
    select split("a,b,c",",");

STR_TO_MAP

  • Syntax
    str_to_map(<text> [, <delimiter1> [, <delimiter2>]])
  • Description

    Splits text into key-value pairs by using delimiter1 and then separates keys from values in the key-value pairs by using delimiter2.

  • Parameters
    • text: required. The string that you want to split, which is of the STRING type.
    • delimiter1: required. The delimiter of the STRING type. If this parameter is not specified, commas (,) are used.
    • delimiter2: required. The delimiter of the STRING type. If this parameter is not specified, equal signs (=) are used.
      Note If the delimiter is a regular expression or special character, you must add two backslashes (\\) before the delimiter for escaping. The following special characters can be used as a delimiter: . ? + * :
  • Return value

    A value of the map<string, string> type is returned. The return value indicates that the string specified by text is split by using delimiter1 and delimiter2.

  • Example
    -- {"test2":"2","test1":"1"} is returned.
    select str_to_map('test1&1-test2&2','-','&');
    -- {"test2":"2","test1":"1"} is returned.
    select str_to_map("test1.1,test2.2", ",", "\\.") ;

EXPLODE

  • Limits
    • A SELECT statement can contain only one EXPLODE function, and no other columns of a table are allowed.
    • This function cannot be used with the GROUP BY, CLUSTER BY, DISTRIBUTE BY, or SORT BY clause.
  • Syntax
    explode (<var>)
  • Description
    Transposes one row of data into multiple rows. This function is a UDTF.
    • If the parameter value is of the array<T> type, the array stored in the column is transposed into multiple rows.
    • If the parameter value is of the map<K, V> type, each key-value pair in the MAP parameter stored in the column is transposed into one row with two columns. One column is used to store keys, and the other column is used to store values.
  • Parameter

    var: the data type, which can be array<T> or map<K, V>.

  • Return value

    Rows after transposition are returned.

  • Example
    select explode(array(null, 'a', 'b', 'c')) col;
    -- Return result:
    +------------+
    | col        |
    +------------+
    | NULL       |
    | a          |
    | b          |
    | c          |
    +------------+

STACK

  • Syntax
    stack(n, expr1, ..., exprk) 
  • Description

    Splits expr1, ..., exprk into n rows. Unless otherwise specified, the output result uses the default column names col0, col1....

  • Parameters
    • n: required. The number of rows obtained after splitting.
    • expr: required. The parameter that you want to split expr1,... exprk must be of the INTEGER type, and the number of parameters must be an integer multiple of n. The parameter must be able to be split into n complete rows. Otherwise, an error is returned.
  • Return value

    n rows with a specific number of columns are returned. The number of columns is equal to the number of parameters divided by n.

  • Example
    -- Split the parameter group of 1, 2, 3, 4, 5, 6 into three rows.
    select stack(3, 1, 2, 3, 4, 5, 6);
    -- Return result:
    +------+------+
    | col0 | col1 |
    +------+------+
    | 1    | 2    |
    | 3    | 4    |
    | 5    | 6    |
    +------+------+
    
    -- Split 'A',10,date '2015-01-01','B',20,date '2016-01-01' into two rows.
    select stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') as (col0,col1,col2);
    -- Return result:
    +------+------+------+
    | col0 | col1 | col2 |
    +------+------+------+
    | A    | 10   | 2015-01-01 |
    | B    | 20   | 2016-01-01 |
    +------+------+------+
    
    -- Split the parameter group of a, b, c, and d into two rows. If the source table contains multiple rows, this function is called for each row.
    select stack(2,a,b,c,d) as (col,value)
    from values 
        (1,1,2,3,4),
        (2,5,6,7,8),
        (3,9,10,11,12),
        (4,13,14,15,null)
    as t(key,a,b,c,d);
    -- Return result:
    +------+-------+
    | col  | value |
    +------+-------+
    | 1    | 2     |
    | 3    | 4     |
    | 5    | 6     |
    | 7    | 8     |
    | 9    | 10    |
    | 11   | 12    |
    | 13   | 14    |
    | 15   | NULL  |
    +------+-------+
    
    -- Use this function with the LATERAL VIEW clause.
    select tf.* from (select 0) t lateral view stack(2,'A',10,date '2015-01-01','B',20, date '2016-01-01') tf as col0,col1,col2;
    -- Return result:
    +------+------+------+
    | col0 | col1 | col2 |
    +------+------+------+
    | A    | 10   | 2015-01-01 |
    | B    | 20   | 2016-01-01 |
    +------+------+------+

MAP

  • Syntax
    MAP map(K <key1>, V <value1>, K <key2>, V <value2>, ...)
  • Description

    Defines a MAP parameter based on a specific key-value pair.

  • Parameters
    • key: required. All keys must be of the same data types, including the types after an implicit conversion. The data types must be basic types.
    • value: required. All values must be of the same data type, including the data types after an implicit conversion. Random data types can be used.
    MaxCompute V2.0 provides additional date functions. If the functions you are using involve new data types, such as TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, or BINARY, run the following SET statement to enable these data types:
    • Session level: To use a new data type, you must insert SET odps.sql.type.system.odps2=true; before the SQL statement, and commit and execute them together.
    • Project level: The project owner can set the project as needed. It takes 10 to 15 minutes for the settings to take effect. Run the following command:
      setproject odps.sql.type.system.odps2=true;
      For more information about setproject, see Project operations. For the precautions you must take when you enable data types at the project level, see Date types.
  • Return value

    The MAP parameter is returned.

  • Example
    Column information of the t_table table is c1 BIGINT, c2 STRING, c3 STRING, c4 BIGINT, c5 BIGINT. The table contains the following data:
    +------------+----+----+------------+------------+
    | c1         | c2 | c3 | c4         | c5         |
    +------------+----+----+------------+------------+
    | 1000       | k11 | k21 | 86         | 15         |
    | 1001       | k12 | k22 | 97         | 2          |
    | 1002       | k13 | k23 | 99         | 1          |
    +------------+----+----+------------+------------+
    -- Execute the following SQL statement:
    select  map(c2,c4,c3,c5) from t_table;
    -- Return result:
    +------+
    | _c0  |
    +------+
    | {k11:86, k21:15} |
    | {k12:97, k22:2} |
    | {k13:99, k23:1} |
    +------+

MAP_KEYS

  • Syntax
    ARRAY map_keys(map<K, V>)
  • Description

    Returns all keys in the MAP parameter as an array.

  • Parameter

    The parameter is of the MAP type.

  • Return value

    A value of the ARRAY type is returned. If the input value is NULL, NULL is returned.

  • Example
    Column information of the t_table_map table is c1 BIGINT,t_map MAP<STRING,BIGINT>. The table contains the following data:
    +------------+-------+
    | c1         | t_map |
    +------------+-------+
    | 1000       | {k11:86, k21:15} |
    | 1001       | {k12:97, k22:2} |
    | 1002       | {k13:99, k23:1} |
    +------------+-------+
    -- Execute the following SQL statement:
    select  c1,map_keys(t_map) from t_table_map;
    -- Return result:
    +------------+------+
    | c1         | _c1  |
    +------------+------+
    | 1000       | [k11, k21] |
    | 1001       | [k12, k22] |
    | 1002       | [k13, k23] |
    +------------+------+

MAP_VALUES

  • Syntax
    ARRAY map_values(map<K, V>)
  • Description

    Returns all values in the MAP parameter as an array.

  • Parameter

    The parameter is of the MAP type.

  • Return value

    A value of the ARRAY type is returned. If the input value is NULL, NULL is returned.

  • Example
    select map_values(map('a',123,'b',456));
    -- Return result:
    [123, 456]

ARRAY

  • Syntax
    ARRAY array(value1,value2, ...)
  • Description

    Creates an array by using given values.

  • Parameter

    The parameters can be of any type, but the types of all parameters must be the same.

    MaxCompute V2.0 provides additional date functions. If the functions you are using involve new data types, such as TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, or BINARY, run the following SET statement to enable these data types:
    • Session level: To use a new data type, you must insert SET odps.sql.type.system.odps2=true; before the SQL statement, and commit and execute them together.
    • Project level: The project owner can set the project as needed. It takes 10 to 15 minutes for the settings to take effect. Run the following command:
      setproject odps.sql.type.system.odps2=true;
      For more information about setproject, see Project operations. For the precautions you must take when you enable data types at the project level, see Date types.
  • Return value

    A value of the ARRAY type is returned.

  • Example
    Column information of the t_table table is c1 BIGINT, c2 STRING, c3 STRING, c4 BIGINT, c5 BIGINT. The table contains the following data:
    +------------+----+----+------------+------------+
    | c1         | c2 | c3 | c4         | c5         |
    +------------+----+----+------------+------------+
    | 1000       | k11 | k21 | 86         | 15         |
    | 1001       | k12 | k22 | 97         | 2          |
    | 1002       | k13 | k23 | 99         | 1          |
    +------------+----+----+------------+------------+
    
    -- Execute the following SQL statement:
    select array(c2,c4,c3,c5) from t_table;
    -- Return result:
    +------+
    | _c0  |
    +------+
    | [k11, 86, k21, 15] |
    | [k12, 97, k22, 2] |
    | [k13, 99, k23, 1] |
    +------+

SIZE

  • Syntax
    INT size(map)
    INT size(array)
  • Description

    Uses size(map) to return the number of key-value pairs in a specified MAP parameter. Uses size(array) to return the number of elements in a specified array.

  • Parameters
    • map: the data of the MAP type.
    • array: the data of the ARRAY type.
  • Example
    -- 2 is returned.
    select size(map('a',123,'b',456)); 
    -- 3 is returned.
    select size(map('a',123,'b',456,'c',789)); 
    -- 2 is returned.
    select size(array('a','b'));
    -- 3 is returned. 
    select size(array(123,456,789)); 

ARRAY_CONTAINS

  • Syntax
    boolean array_contains(ARRAY<T> a,value v)
  • Description

    Checks whether array a contains v.

  • Parameters
    • a: the data of the ARRAY type.
    • v: The value of v must be of the same data type as the data in the array.
  • Example
    Column information of the t_table_array table is c1 BIGINT, t_array ARRAY<STRING>. The table contains the following data:
    +------------+---------+
    | c1         | t_array |
    +------------+---------+
    | 1000       | [k11, 86, k21, 15] |
    | 1001       | [k12, 97, k22, 2] |
    | 1002       | [k13, 99, k23, 1] |
    +------------+---------+
    -- Execute the following SQL statement:
    select c1, array_contains(t_array,'1') from t_table_array;
    -- Return result:
    +------------+------+
    | c1         | _c1  |
    +------------+------+
    | 1000       | false |
    | 1001       | false |
    | 1002       | true |
    +------------+------+

SORT_ARRAY

  • Syntax
    ARRAY sort_array(ARRAY<T>)
  • Description

    Sorts given arrays.

  • Parameter

    ARRAY<T>: the data of the ARRAY type. Data in the array can be of any data type.

    MaxCompute V2.0 provides additional date functions. If the functions you are using involve new data types, such as TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, or BINARY, run the following SET statement to enable these data types:
    • Session level: To use a new data type, you must insert SET odps.sql.type.system.odps2=true; before the SQL statement, and commit and execute them together.
    • Project level: The project owner can set the project as needed. It takes 10 to 15 minutes for the settings to take effect. Run the following command:
      setproject odps.sql.type.system.odps2=true;
      For more information about setproject, see Project operations. For the precautions you must take when you enable data types at the project level, see Date types.
  • Return value

    A value of the ARRAY type is returned.

  • Example
    Column information of the t_array table is c1 ARRAY<STRING>,c2 ARRAY<INT> ,c3 ARRAY<STRING>. The table contains the following data:
    +------------+---------+--------------+
    | c1         | c2      | c3           |
    +------------+---------+--------------+
    | [a, c, f, b]  | [4, 5, 7, 2, 5, 8]  |  [You, Me, Him] |
    +------------+---------+--------------+
    -- Execute the following SQL statement:
    select sort_array(c1),sort_array(c2),sort_array(c3) from t_array;
    -- Return result:
    [a, b, c, f] [2, 4, 5, 5, 7, 8] [Him, You, Me]

POSEXPLODE

  • Syntax
    posexplode(ARRAY<T>)
  • Description

    Converts a given array into a table that has two columns. The first column lists subscripts of each value in the array, starting from 0. The second column lists array elements.

  • Parameter

    ARRAY<T>: the data of the ARRAY type. Data in the array can be of any data type.

    MaxCompute V2.0 provides additional date functions. If the functions you are using involve new data types, such as TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, or BINARY, run the following SET statement to enable these data types:
    • Session level: To use a new data type, you must insert SET odps.sql.type.system.odps2=true; before the SQL statement, and commit and execute them together.
    • Project level: The project owner can set the project as needed. It takes 10 to 15 minutes for the settings to take effect. Run the following command:
      setproject odps.sql.type.system.odps2=true;
      For more information about setproject, see Project operations. For the precautions you must take when you enable data types at the project level, see Date types.
  • Return value

    The generated table is returned.

  • Example
    select posexplode(array('a','c','f','b'));
    -- Return result:
    +------------+------------+
    | pos        | val        |
    +------------+------------+
    | 0          | a          |
    | 1          | c          |
    | 2          | f          |
    | 3          | b          |
    +------------+------------+

STRUCT

  • Syntax
    STRUCT struct(value1,value2, ...)
  • Description

    Creates a struct based on a given value list.

  • Parameter

    value: It can be of any data type.

    MaxCompute V2.0 provides additional date functions. If the functions you are using involve new data types, such as TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, or BINARY, run the following SET statement to enable these data types:
    • Session level: To use a new data type, you must insert SET odps.sql.type.system.odps2=true; before the SQL statement, and commit and execute them together.
    • Project level: The project owner can set the project as needed. It takes 10 to 15 minutes for the settings to take effect. Run the following command:
      setproject odps.sql.type.system.odps2=true;
      For more information about setproject, see Project operations. For the precautions you must take when you enable data types at the project level, see Date types.
  • Return value

    A value of the STRUCT type is returned. Column names are sequentially named as col1, col2, ....

  • Example
    select struct('a',123,'ture',56.90);
    -- Return result:
    +------------+
    | _c0        |
    +------------+
    | {"col1":"a","col2":123,"col3":"ture","col4":56.9} |
    +------------+
    1 records (at most 10000 supported) fetched by instance tunnel.

NAMED_STRUCT

  • Syntax
    STRUCT named_struct(string name1, T1 value1, string name2, T2 value2, ...)
  • Description

    Creates a struct based on given name-value pairs.

  • Parameters
    • value: It can be of any data type.
    • name: the column name of the STRING type. This parameter is a constant.
    MaxCompute V2.0 provides additional date functions. If the functions you are using involve new data types, such as TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, or BINARY, run the following SET statement to enable these data types:
    • Session level: To use a new data type, you must insert SET odps.sql.type.system.odps2=true; before the SQL statement, and commit and execute them together.
    • Project level: The project owner can set the project as needed. It takes 10 to 15 minutes for the settings to take effect. Run the following command:
      setproject odps.sql.type.system.odps2=true;
      For more information about setproject, see Project operations. For the precautions you must take when you enable data types at the project level, see Date types.
  • Return value

    A value of the STRUCT type is returned. Column names are sequentially named as name1, name2, ....

  • Example
    select named_struct('user_id',10001,'user_name','LiLei','married','F','weight',63.50);
    -- Return result:
    +------------+
    | _c0        |
    +------------+
    | {"user_id":10001,"user_name":"LiLei","married":"F","weight":63.5} |
    +------------+
    1 records (at most 10000 supported) fetched by instance tunnel.

INLINE

  • Syntax
    inline(array<struct<f1:T1, f2:T2, ... >>)
  • Description

    Expands a given struct array. Each array element corresponds to a row and each struct element corresponds to a column in each row.

  • Parameter

    STRUCT: The values in the array can be of any data type.

    MaxCompute V2.0 provides additional date functions. If the functions you are using involve new data types, such as TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, or BINARY, run the following SET statement to enable these data types:
    • Session level: To use a new data type, you must insert SET odps.sql.type.system.odps2=true; before the SQL statement, and commit and execute them together.
    • Project level: The project owner can set the project as needed. It takes 10 to 15 minutes for the settings to take effect. Run the following command:
      setproject odps.sql.type.system.odps2=true;
      For more information about setproject, see Project operations. For the precautions you must take when you enable data types at the project level, see Date types.
  • Return value

    The function generated by the table is returned.

  • Example
    Column information of the t_table table is t_struct struct<user_id:bigint,user_name:string,married:string,weight:double>. The table contains the following data:
    +----------+
    | t_struct |
    +----------+
    | {user_id:10001, user_name:LiLei, married:N, weight:63.5} |
    | {user_id:10002, user_name:HanMeiMei, married:Y, weight:43.5} |
    +----------+
    
    -- Execute the following SQL statement:
    select inline(array(t_struct)) from t_table;
    -- Return result:
    +------------+-----------+---------+------------+
    | user_id    | user_name | married | weight     |
    +------------+-----------+---------+------------+
    | 10001      | LiLei     | N       | 63.5       |
    | 10002      | HanMeiMei | Y       | 43.5       |
    +------------+-----------+---------+------------+

TABLE_EXISTS

  • Syntax
    boolean table_exists(string table_name)
  • Description

    Checks whether a specified table exists.

  • Parameter

    table_name: the name of the table that you want to query. It is of the STRING type. You can specify a project name in the table name, for example, my_proj.my_table. If you do not specify a project name, the current project name is used.

  • Return value

    A value of the BOOLEAN type is returned. If the specified table exists, True is returned. Otherwise, False is returned.

  • Example
    -- Used for the list in the SELECT statement.
    select from(table_exists('abd'), col1, col2) FROM src;

TRANS_ARRAY

  • Limits
    • All columns that are used as key must be placed before the columns are to be transposed.
    • Only one UDTF is allowed in a SELECT statement.
    • This function cannot be used with the GROUP BY, CLUSTER BY, DISTRIBUTE BY, or SORT BY clause.
  • Syntax
    trans_array (num_keys, separator, key1,key2,…,col1, col2,col3) as (key1,key2,…,col1, col2)
  • Description

    Transposes one row of data into multiple rows. This function is a UDTF that transposes an array separated by fixed delimiters in a column into multiple rows.

  • Parameters
    • num_keys: the number of columns that can be used as key when you transpose one row into multiple rows. The value is a constant of the BIGINT type, which must be greater than or equal to 0.
    • key: duplicate columns in multiple rows after transposition.
    • separator: the delimiter used to split a string into multiple elements. It is a constant of the STRING type. If it is left blank, an error is returned.
    • keys: the columns that are used as key during the transposition. The number of keys is specified by num_keys. If num_keys equals the total number of all columns (which means that all columns are used as key), only one row is returned.
    • cols: the array that you want to be converted to rows. All columns that follow keys are considered arrays to be transposed. The parameter value must be of the STRING type to store arrays in the STRING format, such as Hangzhou;Beijing;shanghai. The values in this array are separated by semicolons (;).
  • Return value

    Transposed rows are returned. The new column name is specified by as. The data types of columns that are used as key remain unchanged. All other columns are of the STRING type. The number of separated rows is based on the array with the maximum number of elements. If the number of rows is insufficient, NULL is added.

  • Example
    The t_table table contains the following data:
    +----------+----------+------------+
    | login_id | login_ip | login_time |
    +----------+----------+------------+
    | wangwangA | 192.168.0.1,192.168.0.2 | 20120101010000,20120102010000 |
    | wangwangB | 192.168.45.10,192.168.67.22,192,168.6.3 | 20120111010000,20120112010000,20120223080000 |
    +----------+----------+------------+
    -- Execute the following SQL statement:
    select trans_array(1, ",", login_id, login_ip, login_time) as (login_id,login_ip,login_time) from t_table;
    Return result:
    +----------+----------+------------+
    | login_id | login_ip | login_time |
    +----------+----------+------------+
    | wangwangB | 192.168.45.10 | 20120111010000 |
    | wangwangB | 192.168.67.22 | 20120112010000 |
    | wangwangB | 192.168.6.3 | 20120223080000 |
    | wangwangA | 192.168.0.1 | 20120101010000 |
    | wangwangA | 192.168.0.2 | 20120102010000 |
    +----------+----------+------------+
    
    -- The table contains the following data:
    Login_id LOGIN_IP LOGIN_TIME 
    wangwangA 192.168.0.1,192.168.0.2 20120101010000
    -- NULL is added to supplement the array which has insufficient data. 
    Login_id Login_ip Login_time 
    wangwangA 192.168.0.1 20120101010000
    wangwangA 192.168.0.2 NULL

TRANS_COLS

  • Limits
    • All columns that are used as key must be placed before the columns are to be transposed.
    • Only one UDTF is allowed in a SELECT statement.
  • Syntax
    trans_cols (num_keys, key1,key2,…,col1, col2,col3) as (idx, key1,key2,…,col1, col2)
  • Description

    Transposes one row of data into multiple rows. This function is a UDTF that transposes columns into rows.

  • Parameters
    • num_keys: The value is a constant of the BIGINT type, which must be greater than or equal to 0. This parameter specifies the number of columns that can be used as key when you transpose one row into multiple rows.
    • key: duplicate columns in multiple rows when you transpose one row to multiple rows.
    • keys: the columns that are used as key during the transposition. The number of keys is specified by num_keys. If num_keys equals the total number of all columns (which means that all columns are used as key), only one row is returned.
    • col: the column you want to convert into a row.
  • Return value

    Transposed rows are returned. The new column name is specified by as. The first output column is the subscripts of the transposition, which start with 1. The data types of the columns that are used as key remain unchanged, whereas the data types of other columns remain unchanged.

  • Example
    The t_table table contains the following data:
    +----------+----------+------------+
    | Login_id | Login_ip1 | Login_ip2 |
    +----------+----------+------------+
    | wangwangA | 192.168.0.1 | 192.168.0.2 |
    +----------+----------+------------+
    -- Execute the following SQL statement:
    select trans_cols(1, login_id, login_ip1, login_ip2) as (idx, login_id, login_ip) from t_table;
    -- Return result:
    idx    login_id    login_ip
    1    wangwangA    192.168.0.1
    2    wangwangA    192.168.0.2

GET_USER_ID

  • Syntax
    get_user_id()
  • Description

    Obtains the ID of the current account, which is the user ID (UID).

  • Parameter

    No input parameters are required, and an error is returned.

  • Return value

    The ID of the current account is returned.

  • Example
    select get_user_id();
    -- Return result:
    +------------+
    | _c0        |
    +------------+
    | 1117xxxxxxxx8519 |
    +------------+