This topic describes the updates to MaxCompute in reverse chronological order.

December 14, 2020 (UTC+08:00): The check on input parameters of some date functions in MaxCompute is changed

MaxCompute strictly checks the format of input parameters of some date functions to avoid inaccurate results or other problems caused by invalid input. The change applies to all regions on the international site from December 14, 2020 to January 30, 2021. You must evaluate the current usage of the related functions in your project and prepare for the change in advance; a pre-screening sketch follows the function list below. The change involves the following functions:
  • CAST(expr AS DATE)
    • Before the change: The CAST(expr AS DATE) function does not strictly check input strings, so strings that do not represent valid DATE values may be accepted as input parameters. For example, execute the following query statement:
      select cast(a as date) from values ('2020-1x-19') t(a);
      The following result is returned:
      +------------+
      | a          |
      +------------+
      | 2020-01-19 |
      +------------+
    • After the change: The CAST(expr AS DATE) function strictly checks input strings. Only the input strings whose year, month, and day fields are all represented by digits can pass the check. If one of the three fields in an input string contains letters or spaces, the CAST(expr AS DATE) function considers the input string invalid and returns NULL. For example, execute the following query statement:
      select cast(a as date) from values ('2020-1x-19') t(a);
      The following result is returned:
      +------------+
      | a          |
      +------------+
      | NULL       |
      +------------+
  • TO_DATE
    • Before the change: The TO_DATE function does not strictly check input strings, so strings that do not represent valid DATE values may be accepted as input parameters. For example, execute the following query statement:
      select to_date(a) from values ('2020-1x-19') t(a);
      The following result is returned:
      +------------+
      | _c0        |
      +------------+
      | 2020-01-19 |
      +------------+
    • After the change: The TO_DATE function strictly checks input strings. Only the input strings whose year, month, and day fields are all represented by digits can pass the check. If one of the three fields in an input string contains letters or spaces, the TO_DATE function considers the input string invalid and returns NULL. For example, execute the following query statement:
      select to_date(a) from values ('2020-1x-19') t(a);
      The following result is returned:
      +------------+
      | _c0        |
      +------------+
      | NULL       |
      +------------+
  • CAST(expr AS TIMESTAMP)
    • Before the change: The CAST(expr AS TIMESTAMP) function does not strictly check input strings, so strings that do not represent valid TIMESTAMP values may be accepted as input parameters. For example, execute the following query statement:
      select cast(a as timestamp) from values ('2020-11-19 16:3x:00.001') t(a);
      The following result is returned:
      +-----------------------+
      | a                     |
      +-----------------------+
      | 2020-11-19 16:03:00.1 |
      +-----------------------+
    • After the change: The CAST(expr AS TIMESTAMP) function strictly checks input strings. Only the input strings whose date and time fields are all represented by digits can pass the check. If any field in an input string contains letters or spaces, the CAST(expr AS TIMESTAMP) function considers the input string invalid and returns NULL. For example, execute the following query statement:
      select cast(a as timestamp) from values ('2020-11-19 16:3x:00.001') t(a);
      The following result is returned:
      +------------+
      | a          |
      +------------+
      | NULL       |
      +------------+
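
Before the change takes effect, you can estimate how many existing values would start returning NULL. The following query is a minimal sketch, not an official migration tool; the table my_table and column dt_str are hypothetical placeholders for your own data:
  select count(*) as would_become_null
  from my_table                                        -- hypothetical table
  where dt_str is not null
    and dt_str not rlike '^\\d{4}-\\d{1,2}-\\d{1,2}$'; -- year, month, and day must be all digits
Rows counted here contain letters, spaces, or other non-digit characters in a date field and will return NULL after the change.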

October 13, 2020 (UTC+08:00): The SQL engine is updated for better compatibility

The following table describes the schedule for updating the SQL engine. If a change occurs, the new schedule prevails.
Sequence | Region | Date
1 | India (Mumbai), Indonesia (Jakarta), and UK (London) | October 13, 2020
2 | US (Virginia), UAE (Dubai), China North 2 Ali Gov, and China East 2 Finance | October 15, 2020
3 | Japan (Tokyo), Australia (Sydney), US (Silicon Valley), and Malaysia (Kuala Lumpur) | October 20, 2020
4 | Singapore (Singapore), China (Hong Kong), and Germany (Frankfurt) | October 22, 2020
The URL_DECODE and CONV functions in MaxCompute SQL are updated. This section describes the update details.
  • URL_DECODE
    • Before the update: If two parameters are specified for the URL_DECODE function, the function ignores the second parameter and decodes the value of the first parameter in UTF-8. For example, if you specify URL_DECODE(url, "gbk") in code, URL_DECODE(url) is executed.
    • After the update: If two parameters are specified for the URL_DECODE function, the function first performs decoding based on the percent sign (%). Then, the function performs decoding based on the encoding format of the second parameter. The following examples show how the URL_DECODE function works in DataWorks DataStudio before and after the function is updated:
      SELECT URL_DECODE("%CD%F5", "gbk");
      -- Before the update, the function returns garbled characters. The function ignores gbk and decodes %CD%F5 in UTF-8.
      -- After the update, the function returns the Chinese character 王. URL decoding of %CD%F5 yields \xCD\xF5, which is the GBK-encoded string of 王.
      
      SELECT URL_DECODE("%E7%8E%8B", "gbk");
      -- Before the update, the function returns the Chinese character 王. %E7%8E%8B is the UTF-8-encoded string of 王, and the function ignores gbk and decodes %E7%8E%8B in UTF-8.
      -- After the update, the function returns NULL. URL decoding of %E7%8E%8B yields \xE7\x8E\x8B, which is not a valid GBK-encoded string.
      
      SELECT URL_DECODE("%E7%8E%8B");
      -- The function returns the Chinese character 王 both before and after the update. %E7%8E%8B is the UTF-8-encoded string of 王. No decoding format is specified, so the function decodes %E7%8E%8B in UTF-8 by default.
      Note: The command-line interface (CLI) of the Windows operating system has a known issue: if you use the odpscmd client of MaxCompute to run commands on Windows, the GBK decoding result of the URL_DECODE function may be displayed in another encoding format and appear garbled.
  • CONV
    • In a project that uses the Hive-compatible data type edition, the CONV function returns 0 both before and after the update if the input parameters are invalid.
    • In a project that uses the MaxCompute V1.0 or MaxCompute V2.0 data type edition:
      • Before the update: The CONV function returns garbled characters if the input parameters are invalid.
      • After the update: The CONV function returns NULL if the input parameters are invalid.

        For example, if you specify CONV("00e04c9d034a", 2, 10) in code, NULL is returned.
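        If downstream logic cannot handle NULL, a minimal defensive sketch (the table my_table and column hex_str are hypothetical placeholders):
          select coalesce(conv(hex_str, 16, 10), '0') as dec_str -- substitute '0' when CONV returns NULL
          from my_table;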

July 24, 2020 (UTC+08:00): MaxCompute provides more built-in aggregate functions

The following built-in aggregate functions are added to MaxCompute:
  • APPROX_DISTINCT(value): returns the approximate number of distinct input values.
  • ANY_VALUE(value): returns an arbitrary, non-deterministic value from the input values.
  • ARG_MAX(valueToMaximize, valueToReturn): finds the row where the maximum value of valueToMaximize resides and then returns the value of valueToReturn in the row.
  • ARG_MIN(valueToMinimize, valueToReturn): finds the row where the minimum value of valueToMinimize resides and then returns the value of valueToReturn in the row.
For more information about the built-in aggregate functions, see Aggregate functions.
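
The following query is a minimal usage sketch of the four functions, assuming a hypothetical orders table with user_id, region, order_id, and amount columns:
  select approx_distinct(user_id)  as approx_users,     -- approximate count of distinct users
         any_value(region)         as some_region,      -- an arbitrary value from the region column
         arg_max(amount, order_id) as max_amount_order, -- order_id of the row with the largest amount
         arg_min(amount, order_id) as min_amount_order  -- order_id of the row with the smallest amount
  from orders;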

The default data type edition for new projects is changed from MaxCompute V1.0 to MaxCompute V2.0

If you create a MaxCompute project in the DataWorks console, the default data type edition of the project is changed from the MaxCompute V1.0 data type edition to the MaxCompute V2.0 data type edition. This change applies to all regions on the international site from July 29 to August 6, 2020. If you have existing projects within your Alibaba Cloud account, make sure that the data type edition you select for a new project is the same as that of the existing projects. Incompatibility issues may occur if projects of different data type editions interact with each other.

MaxCompute provides the following data type editions: MaxCompute V1.0 data type edition, MaxCompute V2.0 data type edition, and Hive-compatible data type edition. These editions differ in definitions and usage. MaxCompute provides three attributes to configure data type editions. You can configure these attributes as required to enable an edition. For more information, see Data types.
Note This feature has no impact on the data type editions of existing MaxCompute projects. You can change the data type editions of existing MaxCompute projects. For more information, see Data types.

June 29, 2020 (UTC+08:00): Users can choose data type editions for new projects

This feature applies to all regions on the international site from June 29 to July 15, 2020. After the feature is available, you must select an initial data type edition for each new MaxCompute project.

MaxCompute provides the following data type editions: MaxCompute V1.0 data type edition, MaxCompute V2.0 data type edition, and Hive-compatible data type edition. These editions differ in definitions and usage. MaxCompute provides three attributes to configure data type editions. You can configure these attributes as required to enable an edition. For more information, see Data types.
Note This feature has no impact on the data type editions of existing MaxCompute projects. You can change the data type editions of existing MaxCompute projects. For more information, see Data types.

March 15, 2020 (UTC+08:00): The storage price of MaxCompute is reduced

As of March 15, 2020, the storage price of MaxCompute is reduced. For more information, see Storage pricing (pay-as-you-go). The price is reduced based on the following rules:
  • The number of pricing tiers is reduced from five to three to simplify storage billing.
  • The unit price in each new tier is reduced to lower the storage price.
The following table describes the tiered pricing method that was used before March 15, 2020.
Volume of stored data | Tiered unit price | Fixed price
0 < Data volume ≤ 1 GB | N/A | USD 0.00 per day
1 GB < Data volume ≤ 100 GB | USD 0.0028 per GB per day | N/A
100 GB < Data volume ≤ 1 TB | USD 0.0014 per GB per day | N/A
1 TB < Data volume ≤ 10 TB | USD 0.0013 per GB per day | N/A
10 TB < Data volume ≤ 100 TB | USD 0.0011 per GB per day | N/A
Data volume > 100 TB | USD 0.0009 per GB per day | N/A
The following table describes the tiered pricing method that is used since March 15, 2020.
Volume of stored data | Tiered unit price | Fixed price
0 < Data volume ≤ 1 GB | N/A | USD 0.00 per day
1 GB < Data volume ≤ 10 TB | USD 0.0011 per GB per day | N/A
10 TB < Data volume ≤ 100 TB | USD 0.0009 per GB per day | N/A
Data volume > 100 TB | USD 0.0006 per GB per day | N/A
The storage billing method remains unchanged. For more information, see Storage pricing (pay-as-you-go).
  • You are charged for daily stored data, including tables and resources, in MaxCompute by tier based on the data volume.
  • MaxCompute records the volume of data that is stored in each project on an hourly basis and then calculates the average volume of stored data for each day. The daily storage fee equals this daily average multiplied by the tiered unit prices. Because the tiers apply per project, a larger volume of data stored in a single project results in a lower average unit price.
Assume that the daily average data volume of a project is 1 PB. The following formula is used to calculate the daily fee based on the original tiered pricing method:
(100 - 1) GB × USD 0.0028 per GB per day 
+ (1,024 - 100) GB × USD 0.0014 per GB per day 
+ (10,240 - 1,024) GB × USD 0.0013 per GB per day 
+ (102,400 - 10,240) GB × USD 0.0011 per GB per day 
+ (10,240 × 10,240 - 102,400) GB × USD 0.0009 per GB per day
= USD 966.486 per day
The following formula is used to calculate the daily fee based on the new tiered pricing method:
(10,240 - 1) GB × USD 0.0011 per GB per day
+ (102,400 - 10,240) GB × USD 0.0009 per GB per day 
+ (10,240 × 10,240 - 102,400) GB × USD 0.0006 per GB per day
= USD 661.9125 per day
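
To verify a bill against the new tiers, the daily fee can be computed directly in SQL. The following is a minimal sketch under the new pricing; the inline value 1048576 is 1 PB expressed in GB:
  select case
           when gb <= 1 then 0                                      -- first 1 GB is free
           else (least(gb, 10240) - 1) * 0.0011                     -- 1 GB to 10 TB tier
                + greatest(least(gb, 102400) - 10240, 0) * 0.0009   -- 10 TB to 100 TB tier
                + greatest(gb - 102400, 0) * 0.0006                 -- above 100 TB tier
         end as daily_fee_usd
  from (select 1048576 as gb) t;
  -- Returns 661.9125, matching the worked example above.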

February 24, 2020 (UTC+08:00): The SQL engine is updated for better compatibility

The following table describes the schedule for updating the SQL engine. If a change occurs, the new schedule prevails.
Sequence | Region | Date
1 | Indonesia (Jakarta), UK (London), and India (Mumbai) | February 24, 2020
2 | UAE (Dubai), US (Virginia), China North 2 Ali Gov, and China (Hong Kong) | February 26, 2020
3 | Malaysia (Kuala Lumpur), Japan (Tokyo), and Germany (Frankfurt) | March 2, 2020
4 | US (Silicon Valley), Singapore (Singapore), and Australia (Sydney) | March 4, 2020
  • Changes to the execution rule of the GET_IDCARD_AGE function
    • Based on the original execution rule of the GET_IDCARD_AGE function, if the difference between the current year and the year of birth is greater than 100, NULL is returned. After the new rule is applied, the difference between the current year and the year of birth is returned even if the difference exceeds 100. For example, the execution result of get_idcard_age('110101190001011009') is NULL before the change and 120 after the change.
    • If you want to retain the original behavior after the change, you must locate the affected query statements and modify them. For example, you can add an IF function or a CASE WHEN expression that processes the return value of the GET_IDCARD_AGE function, as shown in the following table:
      Original query statement | Modified query statement
      GET_IDCARD_AGE(idcardno) | if(GET_IDCARD_AGE(idcardno) > 100, NULL, GET_IDCARD_AGE(idcardno))
      GET_IDCARD_AGE(idcardno) | CASE WHEN GET_IDCARD_AGE(idcardno) > 100 THEN NULL ELSE GET_IDCARD_AGE(idcardno) END
  • Changes to the execution rule of the CONCAT_WS function
    • Before the change, if Hive compatibility is not enabled and the CONCAT_WS function has three or more parameters, at least one of which is of the ARRAY type, the array items do not appear in the final result. For example, the expected result of concat_ws(',', array('a'), array('b', 'c')) is "a,b,c", but the actual result is ",,,".
    • After the change, parameters of the STRING and ARRAY types can coexist in the CONCAT_WS function even when Hive compatibility is not enabled. For example, concat_ws(',', array('a'), array('b', 'c')) returns "a,b,c".
  • Changes to the behavior of the LIKE '%%' pattern when the input value is an empty string
    Before the change, if the input value for the LIKE character matching function is an empty string and the pattern is '%%', the function returns false. After the change, it returns true.
    -- Create a table and insert an empty string into the table.
    create table if not exists table_test (a string) lifecycle 3;
    insert into table table_test values ('');
    
    select a like '%%' from table_test;
    
    -- The following result is returned before the change.
    +-------+
    | _c0   |
    +-------+
    | false |
    +-------+
    
    -- The following result is returned after the change.
    +-------+
    | _c0   |
    +-------+
    | true  |
    +-------+

December 25, 2019 (UTC+08:00): MaxCompute is compatible with open source geospatial UDFs

MaxCompute is compatible with open source geospatial UDFs, which are implemented by Esri for Apache Hive. You can register open source geospatial UDFs with MaxCompute so that the functions can be called like open source Hive UDFs. For more information, see Open source geospatial UDFs.

October 11, 2019 (UTC+08:00): New features of MaxCompute SQL are introduced

  • Support for specifying the priorities of JOIN operations or set operations (UNION, INTERSECT, and EXCEPT) by using parentheses ()
    SELECT * FROM src JOIN (src2 JOIN src3 on xxx) ON yyy; 
    SELECT * FROM src UNION ALL (SELECT * FROM src2 UNION ALL SELECT * FROM src3);

    For more information, see JOIN and UNION, INTERSECT, and EXCEPT.

  • Support for the hive.orderby.position.alias and hive.groupby.position.alias flags
    If the two flags are enabled, integer constants in the ORDER BY and GROUP BY clauses of SELECT statements are processed as column positions. For example, assume that the src table contains the key and value columns:
    SELECT * FROM src ORDER BY 1;
    -- Equivalent to
    SELECT * FROM src ORDER BY key;
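
    -- A minimal sketch of enabling the flags at the session level (assumes session-level set commands):
    set hive.orderby.position.alias=true;
    set hive.groupby.position.alias=true;
    SELECT key, COUNT(*) FROM src GROUP BY 1;  -- 1 is treated as the first selected column, key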

    For more information, see SELECT syntax.

  • Support for new built-in functions
    • STRING JSON_TUPLE(STRING json, STRING key1, STRING key2, …): converts a JSON string to a tuple based on a group of keys. The JSON_TUPLE() function supports multi-level nesting and can parse JSON data that contains Chinese characters or nested arrays. With GET_JSON_OBJECT(), you must call the function once per key to extract multiple values from the same JSON string, whereas JSON_TUPLE() accepts multiple keys at a time and parses the string only once. JSON_TUPLE() is therefore more efficient. For more information, see String functions. Usage sketches for the new functions follow this list.
    • INT EXTRACT(datepart from timestamp): extracts a part specified by the datepart parameter from a date value. The value of datepart can be YEAR, MONTH, or DAY. The value of timestamp is a date value of the TIMESTAMP type. For more information, see Date functions.
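    Minimal usage sketches for the two new functions; the JSON literal and the timestamp value are illustrative only:
      SELECT JSON_TUPLE('{"a":"1","b":"2","c":"3"}', 'a', 'b', 'c');
      -- Parses the string once and returns 1, 2, and 3 for the three keys.
      SELECT EXTRACT(YEAR FROM CAST('2019-10-11 00:00:00' AS TIMESTAMP));
      -- Returns 2019.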
  • MaxCompute allows you to specify default values for columns in a table.
    The DEFAULT VALUE clause allows you to specify a default value for a column when you create a table. If you do not specify a value for the column in an INSERT statement, this default value is inserted. Example:
    CREATE TABLE t (a bigint default 10, b bigint);
    INSERT INTO TABLE t(b) SELECT 100; 
    -- Equivalent to
    INSERT INTO TABLE t(a, b) SELECT 10, 100;
  • Support for a NATURAL JOIN operation
    A NATURAL JOIN operation joins two tables based on their common columns, that is, columns that have the same name in both tables. MaxCompute supports OUTER NATURAL JOIN. You can use the USING clause so that the JOIN operation returns the common columns only once; a sketch of the USING clause follows this example. Example:
    -- To join the src table that contains the key1, key2, a1, and a2 columns and the src2 table that contains the key1, key2, b1, and b2 columns, you can run the following statement:
    SELECT * FROM src NATURAL JOIN src2;
    -- Both the src and src2 tables contain the key1 and key2 fields. In this case, the preceding statement is equivalent to the following statement:
    SELECT src.key1 as key1, src.key2 as key2, src.a1, src.a2, src2.b1, src2.b2 FROM src INNER JOIN src2 ON src.key1 = src2.key1 AND src.key2 = src2.key2;
    For more information, see JOIN.
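    A minimal sketch of the USING clause, which returns the common columns only once:
    SELECT * FROM src JOIN src2 USING (key1, key2);
    -- key1 and key2 each appear once in the result, followed by the remaining columns of both tables.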
  • Support for use of the OFFSET and ORDER BY LIMIT clauses
    The OFFSET clause can be used together with the ORDER BY LIMIT clause to skip the rows specified by OFFSET. For example, execute the following statement to sort the rows of the src table in ascending order by key, and return the 11th to 20th rows. OFFSET 10 indicates that the first 10 rows are skipped. LIMIT 10 indicates that a maximum of 10 rows can be returned.
    SELECT * FROM src ORDER BY key LIMIT 10 OFFSET 10;

    For more information, see SELECT syntax.

  • Support for built-in operators
    • The IS DISTINCT FROM operator is supported. a is distinct from b is equivalent to not(a <=> b), and a is not distinct from b is equivalent to a <=> b.
    • The || operator is supported to concatenate strings. For example, a || b || c is equivalent to concat(a, b, c).

    For more information, see Operators.
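    A minimal sketch of both operators, assuming a table t with STRING columns a, b, and c:
    SELECT a IS DISTINCT FROM b,  -- NULL-safe comparison: equivalent to not(a <=> b)
           a || b || c            -- equivalent to concat(a, b, c)
    FROM t;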

  • Support for partition merging

    In MaxCompute, a maximum of 60,000 partitions can be created in a table. If excessive partitions exist, you can merge partitions to archive data in a data warehouse. This reduces the number of partitions. If you trigger partition merging, MaxCompute merges multiple partitions in the same table into one partition, migrates their data to the merged partition, and then drops the previous partitions. The following syntax is used for merging partitions. For more information, see Partition and column operations.

    ALTER TABLE <tableName> MERGE [IF EXISTS] PARTITION(<predicate>) [, PARTITION(<predicate2>) ...] OVERWRITE PARTITION(<fullPartitionSpec>) ;
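    For example, a sketch that merges two daily partitions into one archive partition; the table and partition names are hypothetical:
    ALTER TABLE sale_detail MERGE IF EXISTS
      PARTITION (ds = '20191001'), PARTITION (ds = '20191002')
      OVERWRITE PARTITION (ds = '201910_archive');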
  • Add/Drop Partitions
    MaxCompute allows you to add or drop multiple partitions at a time by using statements. Syntax for the statements:
    ALTER TABLE t ADD [IF NOT EXISTS] PARTITION (p = '1') PARTITION (p = '2');
    ALTER TABLE t DROP [IF EXISTS]  PARTITION (p = '1'), PARTITION (p = '2');
    -- Note that no commas (,) exist between partitions in the ADD clause, whereas commas (,) are used to separate partitions in the DROP clause.

The Outputer interface for custom storage handlers of external tables is updated in some regions

MaxCompute is upgraded on August 29, 2019. After the upgrade, jobs may fail if an external table uses a custom storage handler whose Outputer implementation obtains column data by column name instead of by array subscript.

Upgrade time: 14:00 to 23:00 on August 29, 2019 (UTC+08:00)

Regions: US (Silicon Valley) and Singapore (Singapore)

The Outputer interface for custom storage handlers of external tables is updated in some regions

MaxCompute is upgraded on August 21, 2019. After the upgrade, jobs may fail if an external table uses a custom storage handler whose Outputer implementation obtains column data by column name instead of by array subscript.

Upgrade time: 14:00 to 23:00 on August 21, 2019 (UTC+08:00)

Regions: Japan (Tokyo), Germany (Frankfurt), China (Hong Kong), and Australia (Sydney)

Impact: In Outputer.output(Record record), the input record is generated by the last operator in the execution plan. Therefore, column names are not fixed.

For example, the column name generated by the expression some_function(column_a) is a temporary column name. We recommend that you use record.get(index) instead of record.get(<column name>) to obtain the content of a column. To obtain the column names of the table in Outputer, call DataAttributes.getFullTableColumns().

If you have questions, submit a ticket.

March 26, 2019 (UTC+08:00): MaxCompute SQL is updated

  • The GROUPING SETS clause can be used with the CUBE and ROLLUP subclauses to aggregate and analyze data across multiple dimensions. For example, you can aggregate by column a, by column b, or by both columns a and b in a single statement, as shown in the sketch after this list. For more information, see GROUPING SETS.
  • The INTERSECT, MINUS, and EXCEPT clauses are supported. For more information, see UNION, INTERSECT, and EXCEPT.
  • If MaxCompute uses external tables to read ORC- or Parquet-formatted files in OSS, it prunes the columns that are not needed to reduce I/O, save resources, and lower overall computing costs.
  • The Java UDX framework is enhanced to support writable parameters. For more information, see Java UDF.
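    A minimal GROUPING SETS sketch, assuming a table t with columns a and b:
    SELECT a, b, COUNT(*) AS cnt
    FROM t
    GROUP BY GROUPING SETS ((a), (b), (a, b));
    -- One pass produces aggregates for a alone, b alone, and the (a, b) pair.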
Optimized SQL performance
  • DynamicDAG: a mechanism for dynamic optimization. DynamicDAG defers decisions such as resource allocation and algorithm selection to runtime, which improves optimization accuracy and reduces the possibility of generating low-performance execution plans.
  • ShuffleRemove optimization: an optimization that removes unnecessary shuffles. During the execution of INNER JOIN operations, MaxCompute supports ShuffleRemove for right tables that contain duplicate keys.

The underlying structure of MaxCompute in the China (Hong Kong) region is optimized

The underlying metadatabase of MaxCompute in the China (Hong Kong) region is optimized from 16:00 to 20:00 on January 15, 2019 to improve the performance and stability of MaxCompute. During the optimization, users in the China (Hong Kong) region may encounter job submission delays or job failures for about one minute. In the worst case, applications may be unavailable for up to 30 minutes. We recommend that you do not submit jobs during the optimization. Users in other regions are not affected. If you have questions, contact the MaxCompute team by using DingTalk or submit a ticket.