This topic provides reading recommendations based on your role.

MaxCompute beginners

If you are a beginner in MaxCompute, we recommend that you first familiarize yourself with the modules described in the following table.
Module | Description
Product Introduction | Provides an overview of MaxCompute and describes its features. This topic helps you gain a general understanding of MaxCompute.
Quick Start | Describes how to apply for an account, install the client, create a table, grant permissions, and import and export data. It also describes how to run SQL jobs, user-defined functions (UDFs), and MapReduce jobs.
Terms | Introduces the basic terms of MaxCompute.
Commonly used commands | Describes the commonly used commands in MaxCompute. This topic helps you familiarize yourself with operations on MaxCompute.
Tools and Downloads | Describes how to download, configure, and use the commonly used MaxCompute tools. Read this topic before you analyze data.
Client | Describes the client that you can use to perform operations on MaxCompute.
Configure endpoints | Describes the regions in which MaxCompute is available, the MaxCompute connection methods, and issues that arise when you use MaxCompute with other Alibaba Cloud services, such as Elastic Compute Service (ECS), Tablestore, and Object Storage Service (OSS). These issues include network connectivity and data download charges.

Data analysts

If you are a data analyst, we recommend that you familiarize yourself with the SQL topic. You can query and analyze large volumes of data stored in MaxCompute. MaxCompute SQL provides the following features:
  • Supports data definition language (DDL) statements.
  • Uses CREATE, DROP, and ALTER statements to manage both tables and partitions.
  • Uses the SELECT statement to retrieve data records from a table and the WHERE clause to filter the records that meet specific conditions.
  • Joins two tables by using equi-joins.
  • Uses the GROUP BY clause to group rows by column for aggregation.
  • Uses the INSERT OVERWRITE or INSERT INTO statement to insert data records into another table.
  • Uses built-in functions and UDFs to complete a variety of computations.
  • Collects table statistics and configures table lifecycles.
  • Supports regular expressions.
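
As a rough illustration of several features in the preceding list (CREATE, SELECT with WHERE, an equi-join, GROUP BY, and INSERT INTO with a query), the following Python sketch runs generic SQL against the standard-library sqlite3 module as a local stand-in. It is not MaxCompute SQL and does not connect to MaxCompute. The table names (customers, orders, region_totals) and the data are hypothetical, and MaxCompute-specific syntax such as INSERT OVERWRITE and partitions is not covered.

```python
import sqlite3


def run_demo():
    """Run a few generic SQL statements on an in-memory SQLite database.

    SQLite is only a local stand-in here; all tables and rows are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # DDL: create the source tables and the target table.
    cur.execute("CREATE TABLE customers (id INTEGER, region TEXT)")
    cur.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    cur.execute("CREATE TABLE region_totals (region TEXT, total REAL)")

    cur.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "east"), (2, "west"), (3, "east")],
    )
    cur.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 10.0), (1, 5.0), (2, 7.5), (3, 2.5)],
    )

    # SELECT with WHERE: keep only the orders above a threshold.
    large_orders = cur.execute(
        "SELECT customer_id, amount FROM orders WHERE amount > 6 ORDER BY amount"
    ).fetchall()

    # Equi-join + GROUP BY, with the result inserted into another table.
    cur.execute(
        """
        INSERT INTO region_totals
        SELECT c.region, SUM(o.amount)
        FROM orders o
        JOIN customers c ON o.customer_id = c.id
        GROUP BY c.region
        """
    )
    totals = cur.execute(
        "SELECT region, total FROM region_totals ORDER BY region"
    ).fetchall()

    conn.close()
    return large_orders, totals


if __name__ == "__main__":
    large_orders, totals = run_demo()
    print(large_orders)
    print(totals)
```

In MaxCompute, the same pattern would typically write the aggregated result with INSERT OVERWRITE or INSERT INTO, as described in the SQL topic.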

Users with development experience

If you have development experience, understand the distributed architecture, and want to obtain data analytics capabilities that SQL cannot deliver, we recommend that you read about the advanced modules of MaxCompute described in the following table.
Module | Description
MapReduce | MaxCompute provides a MapReduce programming model for Java. You can use the MapReduce Java API to write MapReduce programs that process MaxCompute data.
Graph | Graph is a processing framework for iterative graph computing. A graph consists of vertices and edges, both of which contain values. MaxCompute Graph iteratively edits and evolves graphs to obtain analysis results.
Tunnel | MaxCompute Tunnel enables you to upload large amounts of data to, or download them from, MaxCompute at a time.
Java SDK | Provides a Java API for developers.
Python SDK | Provides a Python API for developers.
Note: MapReduce and Graph are in public preview. To use these features, submit a ticket. In the ticket, you must specify the name of your project. The system processes the ticket within seven business days.
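
For readers who are new to the MapReduce programming model, the following sketch shows the map, shuffle, and reduce phases as a word count in plain Python. It only illustrates the model on an in-memory list. It does not use the MaxCompute MapReduce Java API, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["a b a", "B c"]))
```

In a distributed framework, the mappers and reducers run in parallel on different parts of the data, and the framework performs the shuffle. This sketch runs everything in one process for clarity.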

Project owners or administrators

If you are a project owner or administrator, we recommend that you familiarize yourself with the modules described in the following table. A project owner can create and use projects while a project administrator can manage projects, security operations, and costs.
Module | Feature | Description
Project management | Prepare for project creation | A project is the basic organizational unit of MaxCompute. Similar to a database or schema in a traditional database system, a project is used to isolate users and control access requests. A user can have permissions on multiple projects. After the user is granted the required permissions, the user can access objects across projects, such as tables, resources, functions, and instances. MaxCompute manages the various objects in projects. Make the following preparations before you create a project:
  • Budget for resources
    You are charged for storage resources, computing resources, and resources for Internet-based data downloads.
    • Storage resources: You are charged for these resources based on the pay-as-you-go billing method and tiered unit prices. You can estimate their costs based on the volume of data stored. Because the volume of data stored in MaxCompute changes continually, the costs change as well.
    • Computing resources: You are charged for these resources based on either the pay-as-you-go or the subscription billing method. These resources are consumed to execute computing jobs, such as SQL statements, MapReduce jobs, Spark jobs, and Lightning jobs. It is difficult to estimate the amount of computing resources that you need at the beginning of your project. We recommend that you start with the pay-as-you-go billing method and then decide whether to switch to the subscription billing method based on your actual resource usage.
    • Resources for Internet-based data downloads: You are charged for these resources based on the pay-as-you-go billing method.

    For more information, see Storage pricing (pay-as-you-go), Computing pricing, and Download pricing (Pay-As-You-Go).

  • Prepare an account and activate the service

    Before you create a MaxCompute project, you must create an Alibaba Cloud account and then activate MaxCompute. Bills are issued to the Alibaba Cloud account. After the account is created, you must choose the pay-as-you-go or subscription billing method based on your budget for the resources you require.

Project management | Create a project | For more information, see Create a project.
Project management | Manage project members | Members are managed from the perspectives of responsibilities and data security. If you use MaxCompute in the DataWorks console, you must understand the relationship between the permissions of the two services.
Project management | Manage RAM users | You can manage MaxCompute projects by using your Alibaba Cloud account or a RAM user. You can add RAM users under your Alibaba Cloud account to a MaxCompute project. However, MaxCompute does not authenticate these RAM users based on the permissions that are granted to the RAM users in Resource Access Management (RAM). For more information about RAM users, see Prepare a RAM user.

If you manage MaxCompute projects and DataWorks workspaces in the DataWorks console, you can add only RAM users under your Alibaba Cloud account as members. Therefore, you must use your Alibaba Cloud account to create RAM users and manage these RAM users.

Note
  • We recommend that you use one RAM user as one project member. Do not allow multiple members to share the same RAM user.
  • You must promptly delete the RAM users that correspond to members who are transferred to new positions or resign. If a RAM user is added as a project member in the DataWorks console, delete the project member in DataWorks and then delete the RAM user in RAM.
Project management | Manage scheduling resources | DataWorks scheduling resources are used to execute or distribute the tasks that the scheduling system delivers. They are categorized into the following types. For more information, see View a resource group list.
  • Default scheduling resources. Default scheduling resources are the public resource pool of DataWorks. If the concurrency of DataWorks nodes is high and scheduling resources are insufficient, the nodes wait for resources. After resources are allocated to the nodes, the nodes execute the delivered tasks.
  • Custom scheduling resources. You can configure your own ECS instances as scheduling servers to execute or distribute delivered tasks. You can use your Alibaba Cloud account to create such resources. Custom scheduling resources consist of several physical machines or ECS instances that are used to execute tasks, such as data synchronization. To create a scheduling resource group, submit a ticket. Existing custom scheduling resource groups are not affected. Custom scheduling resources eliminate the limits of the default scheduling resource group.
Project management | Configure projects | Only the owner of a project has the permissions to configure the project. For example, the project owner can specify whether to enable full table scan and whether to use the MaxCompute V2.0 data type edition for a project by default. For more information, see Project operations.
Cost management | None | Budgets for resources help you estimate costs before you use the resources. However, it is difficult to estimate precise costs because of the billing methods of MaxCompute, so you must manage costs throughout the entire business development process.
  • For more information about pricing, see Billing.
  • You can switch between the pay-as-you-go and subscription billing methods. For more information, see Switch billing methods.
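
The budgeting step for storage resources (pay-as-you-go with tiered unit prices, estimated from the volume of data stored) can be sketched as a small calculation. The following Python example is a minimal sketch under loud assumptions: the tier boundaries and unit prices are hypothetical and are not real MaxCompute prices, and it assumes that each tier's price applies only to the volume that falls within that tier. See Billing for the actual pricing rules.

```python
# Hypothetical tiers: (upper bound in GB, price per GB per day).
# These numbers are placeholders for illustration, NOT real MaxCompute prices.
HYPOTHETICAL_TIERS = [
    (100.0, 0.004),
    (1000.0, 0.003),
    (float("inf"), 0.002),
]


def estimate_daily_storage_cost(volume_gb, tiers=HYPOTHETICAL_TIERS):
    """Estimate a daily storage cost from a stored volume and tiered unit prices.

    Assumes marginal tiers: each price applies only to the part of the
    volume that falls between the previous bound and the tier's upper bound.
    """
    if volume_gb < 0:
        raise ValueError("volume_gb must be non-negative")

    cost = 0.0
    lower = 0.0
    for upper, price in tiers:
        if volume_gb <= lower:
            break  # no volume left for this tier or any later tier
        billable = min(volume_gb, upper) - lower
        cost += billable * price
        lower = upper
    return cost


if __name__ == "__main__":
    for volume in (50, 150, 2000):
        print(volume, "GB ->", round(estimate_daily_storage_cost(volume), 4))
```

Because the stored volume changes over time, an estimate like this is only a snapshot. Rerunning it with updated volumes is one simple way to track how the budget drifts.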