This topic describes how to create a workflow, create nodes in the workflow, and configure node dependencies. After you create a workflow, you can use the DataStudio service to compute and analyze data in the workspace.
Create a workflow
- Log on to the DataWorks console.
- In the left-side navigation pane, click Workspaces.
- Select the region in which the workspace that you want to manage resides, find the workspace, and click Data Analytics in the Actions column.
- On the DataStudio page, move the pointer over the icon and select Workflow.
- In the Create Workflow dialog box, set the Workflow Name and Description parameters.
  Notice: The workflow name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
- Click Create.
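The naming rule above can be checked before you type a name into the dialog box. The following is a minimal sketch, not part of DataWorks: the helper is_valid_name is hypothetical, and it assumes that "letters" means ASCII letters only.

```python
import re

# Rule as stated in this topic: 1 to 128 characters; letters, digits,
# underscores (_), and periods (.).
# Assumption: "letters" is read as ASCII letters only.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_.]{1,128}")


def is_valid_name(name: str) -> bool:
    """Return True if the whole name satisfies the documented naming rule."""
    return NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["insert_data", "my workflow", ""]:
        print(repr(candidate), is_valid_name(candidate))
```

The same rule is stated for node names, so the helper can be reused for names such as start and insert_data.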
Create nodes and configure node dependencies
- A zero load node is a control node that is used to maintain and control its descendant nodes in a workflow. A zero load node does not generate data.
- If other nodes depend on a zero load node and the zero load node is set to Failed by O&M personnel, the pending descendant nodes cannot run. During the O&M process, a zero load node can be disabled to prevent incorrect data of ancestor nodes from being obtained by their descendant nodes.
- In most cases, the root node of the workspace is used as the ancestor node of a zero load node in a workflow. The root node of a workspace is named in the
- DataWorks automatically creates an output name for each node in the Workspace name.Node name format. If a workspace contains two nodes with the same name, they receive the same output name, so rename one of the two nodes.
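The output name format described above can be illustrated with a short sketch. It is not DataWorks code: the workspace name demo_ws and both helper functions are hypothetical, and the sketch only shows how two nodes with the same name end up with the same output name.

```python
from collections import Counter


def output_name(workspace: str, node: str) -> str:
    """Build an output name in the 'Workspace name.Node name' format."""
    return f"{workspace}.{node}"


def duplicate_output_names(workspace: str, node_names: list[str]) -> list[str]:
    """Return the output names that more than one node would produce."""
    counts = Counter(output_name(workspace, name) for name in node_names)
    return sorted(name for name, count in counts.items() if count > 1)


if __name__ == "__main__":
    # "demo_ws" is a made-up workspace name used only for illustration.
    print(output_name("demo_ws", "insert_data"))
    print(duplicate_output_names("demo_ws", ["start", "insert_data", "start"]))
```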
When you design a workflow, we recommend that you create a zero load node as the root node of the workflow to control the entire workflow. To design a workflow, perform the following steps:
- Double-click the name of the workflow to go to the configuration tab. Click Zero-Load Node and drag it to the canvas on the right.
- In the Create Node dialog box, set the Node Name parameter to start and click Commit.
  Notice: The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
- Use the same method to create an ODPS SQL node named insert_data.
- Drag a line from the start node to the insert_data node to configure the start node as the ancestor node of the insert_data node.
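The dependency drawn in the last step can be pictured as a small directed graph in which each node lists its ancestors. The sketch below is an illustration under that assumption, not a description of how DataWorks schedules nodes. It uses the Python standard library (graphlib, Python 3.9 or later) to derive an order in which every node comes after its ancestors.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Map each node to the set of its ancestor (upstream) nodes.
ancestors = {
    "start": set(),
    "insert_data": {"start"},
}


def run_order(graph: dict[str, set[str]]) -> list[str]:
    """Return an order in which every node comes after its ancestors."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(run_order(ancestors))  # ['start', 'insert_data']
```

If the lines on the canvas formed a loop, static_order would raise graphlib.CycleError, which matches the intuition that a node cannot be its own ancestor.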
Configure the ancestor node of the zero load node
In a workflow, a zero load node is used to control the entire workflow and serves as the ancestor node of all nodes in the workflow.
In most cases, a zero load node depends on the root node of the workspace.
- Double-click the name of the zero load node to go to the node configuration tab.
- Click Properties in the right-side navigation pane.
- In the Dependencies section, click Use Root Node to configure the root node of the workspace as the ancestor node of the zero load node.
- Click the icon in the toolbar.
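Once the zero load node depends on the workspace root node, the chain from the root node down to the ODPS SQL node can be modeled as a graph. The sketch below is an illustration only, not DataWorks code: ROOT is a hypothetical placeholder for the workspace root node, and all_descendants is a made-up helper. It lists every node downstream of a given node, which are the nodes that are held back if that node is set to Failed.

```python
from collections import deque

# "ROOT" is a hypothetical placeholder for the workspace root node.
# Each key maps to its direct descendant (downstream) nodes.
descendants = {
    "ROOT": ["start"],
    "start": ["insert_data"],
    "insert_data": [],
}


def all_descendants(graph: dict[str, list[str]], node: str) -> set[str]:
    """Collect every node reachable downstream of `node` (breadth-first)."""
    seen: set[str] = set()
    queue = deque(graph.get(node, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(graph.get(current, []))
    return seen


if __name__ == "__main__":
    # Nodes that cannot run while the zero load node "start" is failed.
    print(all_descendants(descendants, "start"))
```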
Edit and run the ODPS SQL node
This section describes how to write SQL code in the ODPS SQL node insert_data that counts the single people who have mortgage loans, grouped by education level, and saves the query result. Descendant nodes can then use the query result for further analysis or presentation.
- Go to the configuration tab of the ODPS SQL node and enter the following code. For more information about the syntax, see MaxCompute SQL overview.
  -- Insert data into the result_table table.
  INSERT OVERWRITE TABLE result_table
  SELECT education,
         COUNT(marital) AS num
  FROM bank_data
  WHERE housing = 'yes'
    AND marital = 'single'
  GROUP BY education;
- Right-click bank_data in the code and select Delete input.
  The bank_data table is not generated by an auto-triggered node. For more information about how to generate a table and import data into the table, see Create tables and import data. If a node uses a SELECT statement to query data of the bank_data table, you need to use @exclude_input=bank_data to manually delete the node dependency that is automatically generated by the SELECT statement. This ensures that the ODPS SQL node periodically updates data and that the descendant node obtains correct data from the ODPS SQL node based on the node dependency.
  Note: Node dependencies ensure that a node can successfully obtain the table data generated by its ancestor node that is scheduled to run. However, if the ancestor node is not scheduled to run, the system cannot detect whether the ancestor node has generated the latest table data. For this reason, if a node uses a SELECT statement to query a table that is not generated by an auto-triggered node, you need to manually delete the dependency that the SELECT statement generates automatically.
- Click the save icon in the top navigation bar to save the code. This prevents code loss.
- Click the run icon. After the node is run, you can view the operational log and result in the lower part of the tab.
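The aggregation in the query above can be tried locally before you run the node. The following sketch uses SQLite from the Python standard library instead of MaxCompute, so it covers only the SELECT part (INSERT OVERWRITE is MaxCompute syntax and is not used here). The sample rows are made up for illustration, the table is reduced to the three columns that the query reads, and an ORDER BY clause is added only to make the printed result deterministic.

```python
import sqlite3

# Made-up sample rows: (education, marital, housing).
SAMPLE_ROWS = [
    ("university.degree", "single", "yes"),
    ("university.degree", "single", "yes"),
    ("high.school", "single", "yes"),
    ("high.school", "married", "yes"),
    ("basic.9y", "single", "no"),
]


def singles_with_mortgage(rows):
    """Count single people who have a housing loan, grouped by education."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE bank_data (education TEXT, marital TEXT, housing TEXT)"
        )
        conn.executemany("INSERT INTO bank_data VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            """
            SELECT education, COUNT(marital) AS num
            FROM bank_data
            WHERE housing = 'yes' AND marital = 'single'
            GROUP BY education
            ORDER BY education
            """
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(singles_with_mortgage(SAMPLE_ROWS))
    # [('high.school', 1), ('university.degree', 2)]
```

Rows that fail either filter condition (the married row and the row without a housing loan) do not contribute to any count.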
Commit a workflow
- After you run and debug the ODPS SQL node insert_data, return to the configuration tab of the workflow.
- Click the icon.
- In the Commit dialog box, select the node that you want to commit, enter your comments in the Change description field, and then select Ignore I/O Inconsistency Alerts.
- Click Commit. After the workflow is committed, you can view the node status from the node list in the workflow. If the icon is displayed on the left of the node name, the node is committed. If the icon is not displayed, the node is not committed.
What to do next
Create a sync node to export data to different types of data stores. For more information, see Create a sync node.