After you enable monitoring for data transformation tasks, Log Service sends alert notifications when exceptions occur during data transformation. This helps you handle exceptions in a timely manner. This topic describes how to enable monitoring for data transformation tasks.

Prerequisites

A data transformation task is created. For more information, see Create a data transformation task.

Background information

  • After you create a data transformation rule, Log Service automatically creates a dashboard named Data Transformation Troubleshooting for the data transformation task. We recommend that you take note of the following metrics on the Data Transformation Troubleshooting dashboard:
    • System metrics: the data consumption delay and relevant exceptions.
    • Application metrics: the number of received log entries and number of delivered log entries.
    For more information, see Data transformation dashboard.
  • Log Service provides built-in monitoring rules, a built-in action policy, and built-in alert templates for data transformation. You can use these built-in resources as follows:
    • Alert Center provides built-in monitoring rules for data transformation. To configure alerts, enable the alert instance of a monitoring rule. You do not need to write SQL statements. Built-in rules are available for delays, exceptions, traffic changes, and failures that occur during data transformation. For more information, see Monitoring rules for data transformation.
    • You can specify notification methods and alert templates in the built-in action policy for data transformation.
    • You can specify the content of alert notifications in a built-in alert template for data transformation.

Configuration process

You can use built-in resources or custom resources to configure alerts based on the following process:
  • Use built-in resources.
    To configure alerts in an efficient manner, perform the following operations:
    1. Create a DingTalk chatbot.

      Configure a DingTalk chatbot to receive alert notifications.

    2. Configure an action policy.

      Specify the webhook URL of the preceding DingTalk chatbot for the built-in action policy for data transformation. Log Service sends alert notifications by using the webhook URL.

    3. Enable alert instances.
  • Use custom resources.
    To create custom resources and use them to configure recipients, alert templates, and notification methods based on your business requirements, perform the following operations:
    1. Create users.

      Configure users or user groups to receive alert notifications. For more information, see Create users and user groups.

    2. Create an alert template.

      Configure the content of alert notifications. For more information, see Create an alert template.

    3. Create an action policy.

      Configure notification methods, such as Voice Call, SMS Message, and Email. For more information, see Create an action policy.

    4. Enable alert instances.

The built-in resources that are provided by Log Service can be applied to most alerting scenarios. You can use built-in resources or custom resources based on your business requirements. In this example, built-in resources are used to configure alerts.

Step 1: Create a DingTalk chatbot

By default, the built-in action policy for data transformation uses DingTalk-Custom as the notification method to send alert notifications. Before you enable monitoring, you must create a DingTalk chatbot. After an alert is triggered, Log Service sends an alert notification to the specified DingTalk group by using the webhook URL of the DingTalk chatbot.

To create a DingTalk chatbot, perform the following steps:
  1. Open DingTalk and go to a DingTalk group.
  2. In the upper-right corner of the chat window, click the Group Settings icon and choose Group Assistant > Add Robot.
  3. In the ChatBot dialog box, click the + icon in the Add Robot section.
  4. In the Robot details dialog box, select Custom (Custom message services via Webhook) and click Add.
  5. In the Add Robot dialog box, enter a chatbot name in the Chatbot name field and select security options in the Security Settings section based on your business requirements. Then, select the I have read and accepted DingTalk Custom Robot Service Terms of Service check box and click Finished.
    Note: We recommend that you set the Security Settings parameter to Custom Keywords. You can set up to 10 keywords. The chatbot sends only messages that contain at least one of the specified keywords. We recommend that you specify Alert as a keyword.
  6. Click Copy to copy the webhook URL.
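Before you paste the webhook URL into the action policy, you may want to confirm that the chatbot accepts messages. The following Python sketch assumes the text-message payload format of DingTalk custom robots and the Alert keyword recommended in the preceding note. The function names are illustrative, and the local keyword check is only a simplified stand-in for the check that DingTalk performs.

```python
import json
import urllib.request

# Assumption: "Alert" is the custom keyword configured in Security Settings.
KEYWORDS = ("Alert",)


def build_text_message(content: str) -> dict:
    """Build the JSON body of a DingTalk custom-robot text message."""
    return {"msgtype": "text", "text": {"content": content}}


def contains_keyword(content: str, keywords=KEYWORDS) -> bool:
    """Return True if the content contains at least one configured keyword."""
    return any(keyword in content for keyword in keywords)


def send_test_message(webhook_url: str, content: str) -> str:
    """POST a test message to the webhook URL copied in step 6."""
    if not contains_keyword(content):
        raise ValueError("The message must contain a configured keyword.")
    body = json.dumps(build_text_message(content)).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")
```

To try the sketch, call send_test_message with the copied webhook URL and a message such as "Alert: test message", then check whether the message appears in the DingTalk group.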

Step 2: Configure an action policy

Modify the request URL of the DingTalk-Custom notification method for the built-in action policy for data transformation. This way, Log Service sends alert notifications by using the specified webhook URL of the DingTalk chatbot.

  1. Log on to the Log Service console.
  2. Go to the Action Policy tab.
    1. In the Projects section, click the project in which you created the data transformation task.
    2. In the left-side navigation pane, click Alerts.
    3. Click Open Alert Center and choose Alert Management > Action Policy.
  3. On the Action Policy tab, click Modify in the Actions column of the built-in action policy whose ID is sls.app.etl.builtin.
  4. In the Edit Action Policy dialog box, click the Primary Action Policy tab and set the Request URL parameter under DingTalk-Custom to the webhook URL that is obtained in Step 1: Create a DingTalk chatbot. Then, click OK.

Step 3: Enable alert instances

  1. Log on to the Log Service console.
  2. In the Projects section, click the project in which you created the data transformation task.
  3. In the left-side navigation pane, click Alerts.
  4. Click Open Alert Center.
  5. On the Alert Rules/Incidents tab, select SLS Data Transformation in the Type section.
  6. In the monitoring rule list, find the monitoring rule whose alert instances you want to enable and click Enable.
    After you enable an alert instance, Log Service monitors all data transformation tasks in real time by default.
    • To enable multiple alert instances, click Add.
    • If you need to monitor only specific data transformation tasks, click Settings and specify the IDs of the data transformation tasks that you want to monitor.

    For more information, see Related operations.

    For information about the parameters of monitoring rules, see Monitoring rules for data transformation.

Related operations

The following operations are available for monitoring rules and alert instances:
  • Configure whitelists: You can configure whitelists for specific monitoring rules. This way, alerts are not triggered by the specified data transformation tasks.
  • Add alert instances: You can add an alert instance to a monitoring rule and configure its settings to monitor specific data transformation tasks.
  • Disable alert instances: If you disable an alert instance, the value in the Status column of the alert instance changes to Not Enabled, and no more alerts are triggered based on the alert instance.
    The configurations of the alert instance are not deleted. If you want to re-enable the alert instance to monitor data transformation tasks, you do not need to reconfigure the parameters of the alert instance.
  • Pause alert instances: If you pause an alert instance, no alerts are triggered based on the alert instance within a specified period of time.
  • Resume alert instances: You can resume paused alert instances.
  • Delete alert instances: If you delete an alert instance, the value in the Status column of the alert instance changes to Not Created.
    The configurations of the alert instance, such as the settings of data transformation tasks, are deleted. If you want to re-enable the alert instance to monitor data transformation tasks, you must set the parameters of the alert instance again.
  • Modify alert instances: You can modify the parameters of an alert instance, such as the alert name, the IDs of the data transformation tasks that you want to monitor, the monitoring threshold, the action policy, and the severity.
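The difference between disabling, pausing, and deleting an alert instance can be illustrated with a small model. The following Python sketch is a toy model and is not part of any Log Service SDK. The AlertInstance class, the Enabled status label, and the Boolean paused flag are assumptions made for illustration. In the console, a pause applies to a specified period of time.

```python
from dataclasses import dataclass, field


@dataclass
class AlertInstance:
    """Toy model of the alert instance operations described above."""

    status: str = "Not Created"
    paused: bool = False
    settings: dict = field(default_factory=dict)

    def enable(self, **settings) -> None:
        # Existing settings are reused; new ones are merged in.
        self.settings.update(settings)
        self.status = "Enabled"  # assumed label, not stated in this topic

    def disable(self) -> None:
        # Settings are kept, so re-enabling needs no reconfiguration.
        self.status = "Not Enabled"
        self.paused = False

    def pause(self) -> None:
        self.paused = True

    def resume(self) -> None:
        self.paused = False

    def delete(self) -> None:
        # Settings are removed, so they must be set again later.
        self.status = "Not Created"
        self.paused = False
        self.settings.clear()

    def can_trigger(self) -> bool:
        return self.status == "Enabled" and not self.paused
```

In this model, calling disable() and then enable() leaves the settings intact, whereas calling delete() empties them.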

Monitoring rules for data transformation

The following tables describe the functionalities, parameters, and associated dashboard metrics of the built-in monitoring rules that Log Service provides for data transformation. The tables also provide the handling methods that you can use to clear alerts.

  • Delay monitoring rule during data transformation
    Item Description
    Rule name Delay Monitoring during Data Transformation
    Functionality This rule monitors the delay that may occur when data is consumed from shards in data transformation tasks. If the delay during data transformation exceeds the value of the Monitoring Threshold parameter, an alert is triggered.
    Parameters
    • Data Transformation ID: the ID of the data transformation task that you want to monitor, for example, dd2de8e7e23f3e42ffbb32fe05710372.

      Default value: .*. This value indicates that all data transformation tasks are monitored. Separate multiple task IDs with vertical bars (|).

    • Monitoring Threshold: If the delay during data consumption exceeds this value, an alert is triggered. Default value: 300. Unit: seconds.
    • Action Policy: the action policy that is used to send alert notifications. The action policy contains notification methods and alert templates. The default value is the built-in action policy whose ID is sls.app.etl.builtin. This value indicates that alert notifications are sent by using the webhook URL of a DingTalk chatbot.
    • Severity: the severity of the alert. Valid values: Critical, High, Medium, Low, and Report. Default value: High.
    Associated dashboard Data Transformation Troubleshooting > shard consumption delay (seconds)
    Handling method You can clear triggered alerts based on the following rules:
    1. If the amount of data in the source Logstore significantly increases, perform the following operations as needed:
      • If the value of the Transform speed (lines/s) metric increases at the same time and the value of the shard consumption delay (seconds) metric decreases, this indicates that the data transformation task is automatically scaling up resources due to the increasing data volume in the source Logstore. In this case, wait for 5 minutes and then check whether the delay is lower than the specified threshold. If not, proceed to the next step.
      • If the value of the Transform speed (lines/s) metric does not increase or the value of the shard consumption delay (seconds) metric continues to increase, this indicates that the number of shards in the source Logstore may be insufficient and the expansion of resources for data transformation is limited. In this case, you must split the shards in the source Logstore. For more information, see Split a shard. After you split the shards, wait for 5 minutes and then check whether the delay is lower than the specified threshold. If not, proceed to the next step.
    2. If alerts are triggered based on the Exception Monitoring during Data Transformation rule, you must clear the alerts first. After you clear the alerts, wait for 5 minutes and then check whether the delay is lower than the specified threshold. If not, proceed to the next step.
    3. If you cannot clear the alerts, collect the names of the related project and Logstore and the ID of the data transformation task, and then submit a ticket to contact Alibaba Cloud technical support.
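The handling method for delay alerts can be read as a short decision procedure. The following Python sketch is one possible encoding of that procedure. The function name and the Boolean inputs are illustrative assumptions that you would derive by reading the Data Transformation Troubleshooting dashboard. Any case that does not match the scaling-up pattern is treated as the shard-limited case, which is a simplification.

```python
def delay_alert_steps(
    volume_spiked: bool,
    speed_increasing: bool,
    delay_decreasing: bool,
    exception_alert_active: bool,
) -> list:
    """Return the suggested actions, in order, for a delay alert."""
    steps = []
    if volume_spiked:
        if speed_increasing and delay_decreasing:
            # The task is scaling up to absorb the extra data.
            steps.append("Wait 5 minutes and recheck the delay.")
        else:
            # Shards in the source Logstore may limit the scale-up.
            steps.append("Split shards, wait 5 minutes, and recheck the delay.")
    if exception_alert_active:
        steps.append("Clear exception alerts, wait 5 minutes, and recheck the delay.")
    # Last resort when the delay stays above the threshold.
    steps.append("Submit a ticket if the delay is still above the threshold.")
    return steps
```

For example, a data spike with a flat transform speed and no exception alerts yields the shard-splitting step followed by the ticket step.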
  • Exception monitoring rule during data transformation
    Item Description
    Rule name Exception Monitoring during Data Transformation
    Functionality This rule monitors exceptions in data transformation tasks. If an exception occurs during data transformation, an alert is triggered.
    Parameters
    • Data Transformation ID: the ID of the data transformation task that you want to monitor, for example, dd2de8e7e23f3e42ffbb32fe05710372.

      Default value: .*. This value indicates that all data transformation tasks are monitored. Separate multiple task IDs with vertical bars (|).

    • Action Policy: the action policy that is used to send alert notifications. The action policy contains notification methods and alert templates. The default value is the built-in action policy whose ID is sls.app.etl.builtin. This value indicates that alert notifications are sent by using the webhook URL of a DingTalk chatbot.
    • Severity: the severity of the alert. Valid values: Critical, High, Medium, Low, and Report. Default value: High.
    Associated dashboard Data Transformation Troubleshooting > Exception detail
    Handling method Fix exceptions based on the related error messages.
    • If the error message contains Unauthorized, InvalidAccessKeyId, or SignatureNotMatch, the data transformation task does not have the required permissions to read data from the source Logstore or write data to the destination Logstore. For more information, see Authorization overview.
    • If the error message contains ProjectNotExist or LogStoreNotExist, the related project or Logstore of the data transformation task does not exist. In this case, log on to the Log Service console to identify and fix the error.
    • If the error message contains SettingError, the configuration of the data transformation rule is invalid. For example, this error occurs if the parameters that are specified in a function are invalid or if the configuration of an external resource such as Object Storage Service (OSS) or ApsaraDB RDS for MySQL is invalid. For more information, see Function overview.
    • If the error message contains TransformError, the raw data in the source Logstore does not meet the logic of the current data transformation rule. This error may occur when new types of data are imported to the source Logstore. In this case, identify the raw data from the error message, update the data transformation task, and then try again. For more information, see Manage a data transformation task.
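The mapping from error keywords to likely causes can be kept in a small lookup table, for example in a script that triages alert notifications. The following Python sketch assumes that the error message is available as a plain string. The diagnose function and the hint wording are illustrative and only paraphrase the preceding list.

```python
# Each entry pairs the keywords from the list above with a short hint.
ERROR_HINTS = [
    (
        ("Unauthorized", "InvalidAccessKeyId", "SignatureNotMatch"),
        "Check the permissions on the source and destination Logstores.",
    ),
    (
        ("ProjectNotExist", "LogStoreNotExist"),
        "Check that the related project and Logstore exist.",
    ),
    (
        ("SettingError",),
        "Check the function parameters and external resource settings.",
    ),
    (
        ("TransformError",),
        "Check the raw data against the data transformation rule.",
    ),
]


def diagnose(error_message: str):
    """Return the hint for the first matching keyword, or None."""
    for keywords, hint in ERROR_HINTS:
        if any(keyword in error_message for keyword in keywords):
            return hint
    return None
```

A message that matches none of the keywords returns None, in which case the full error message on the Exception detail chart is the best starting point.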
  • Traffic monitoring rule during data transformation (absolute value)
    Item Description
    Rule name Traffic Monitoring during Data Transformation (Absolute Value)
    Functionality This rule monitors the average number of log entries that are transformed by data transformation tasks within 5 minutes. If the average number of log entries that are transformed is lower than the value of the Monitoring Threshold parameter, an alert is triggered.
    Parameters
    • Data Transformation ID: the ID of the data transformation task that you want to monitor, for example, dd2de8e7e23f3e42ffbb32fe05710372.

      Default value: .*. This value indicates that all data transformation tasks are monitored. Separate multiple task IDs with vertical bars (|).

    • Monitoring Threshold: If the average number of transformed log entries is lower than this value, an alert is triggered. Default value: 40000. Unit: lines/s.
    • Action Policy: the action policy that is used to send alert notifications. The action policy contains notification methods and alert templates. The default value is the built-in action policy whose ID is sls.app.etl.builtin. This value indicates that alert notifications are sent by using the webhook URL of a DingTalk chatbot.
    • Severity: the severity of the alert. Valid values: Critical, High, Medium, Low, and Report. Default value: High.
    Associated dashboard Data Transformation Troubleshooting > Transform speed (lines/s)
    Handling method You can clear triggered alerts based on the following rules:
    1. If the trend of the Transform speed (lines/s) metric is consistent with the increase or decrease in the data volume of the source Logstore, this indicates that the number of transformed log entries is limited by the data volume in the source Logstore. If not, proceed to the next step.
    2. If alerts are triggered based on the Delay Monitoring during Data Transformation rule, you must clear the alerts first. After you clear the alerts, wait for 15 minutes. If the delay is less than 1 minute but the amount of transformed data is still inconsistent with the increase or decrease in the data volume of the source Logstore, proceed to the next step.
    3. If you cannot clear the alerts, collect the names of the related project and Logstore and the ID of the data transformation task, and then submit a ticket to contact Alibaba Cloud technical support.
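The check that the absolute-value traffic rule describes can be approximated as follows. This Python sketch assumes that the rule averages Transform speed (lines/s) samples collected over a 5-minute window and compares the average with the Monitoring Threshold parameter. The function name and the sampling scheme are illustrative and do not describe the internal implementation of the rule.

```python
from statistics import fmean

DEFAULT_THRESHOLD = 40000  # lines/s, the default Monitoring Threshold


def traffic_too_low(speed_samples, threshold=DEFAULT_THRESHOLD) -> bool:
    """Return True if the average transform speed is below the threshold.

    speed_samples: Transform speed (lines/s) readings from one window.
    """
    if not speed_samples:
        raise ValueError("At least one sample is required.")
    return fmean(speed_samples) < threshold
```

For example, the samples 50000, 42000, 38000, 30000, and 20000 average 36,000 lines/s, which is below the default threshold of 40,000 lines/s.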
  • Traffic monitoring rule during data transformation (compared with previous day)
    Item Description
    Rule name Traffic Monitoring during Data Transformation (Compared with Previous Day)
    Functionality This rule monitors the increase rate and decrease rate of the transformed data in data transformation tasks compared with the same period of the previous day. If the increase rate is greater than the value of the Daily Increase Threshold parameter or the decrease rate is greater than the value of the Daily Decrease Threshold parameter, an alert is triggered.
    Parameters
    • Data Transformation ID: the ID of the data transformation task that you want to monitor, for example, dd2de8e7e23f3e42ffbb32fe05710372.

      Default value: .*. This value indicates that all data transformation tasks are monitored. Separate multiple task IDs with vertical bars (|).

    • Daily Increase Threshold: If the daily increase rate of transformed data is greater than this value, an alert is triggered. Default value: 40%.
    • Daily Decrease Threshold: If the daily decrease rate of transformed data is greater than this value, an alert is triggered. Default value: 20%.
    • Action Policy: the action policy that is used to send alert notifications. The action policy contains notification methods and alert templates. The default value is the built-in action policy whose ID is sls.app.etl.builtin. This value indicates that alert notifications are sent by using the webhook URL of a DingTalk chatbot.
    • Severity: the severity of the alert. Valid values: Critical, High, Medium, Low, and Report. Default value: High.
    Associated dashboard Data Transformation Troubleshooting > Transform speed (lines/s)
    Handling method You can clear triggered alerts based on the following rules:
    1. If the trend of the Transform speed (lines/s) metric is consistent with the increase or decrease in the data volume of the source Logstore, this indicates that the number of transformed log entries is limited by the data volume in the source Logstore. If not, proceed to the next step.
    2. If alerts are triggered based on the Delay Monitoring during Data Transformation rule, you must clear the alerts first. After you clear the alerts, wait for 15 minutes. If the delay is less than 1 minute but the amount of transformed data is still inconsistent with the increase or decrease in the data volume of the source Logstore, proceed to the next step.
    3. If you cannot clear the alerts, collect the names of the related project and Logstore and the ID of the data transformation task, and then submit a ticket to contact Alibaba Cloud technical support.
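The day-over-day comparison can be sketched as a relative-change calculation. The following Python example assumes that the increase rate and decrease rate are computed relative to the amount of data transformed in the same period of the previous day, and it uses the default thresholds of 40% and 20%. The function names are illustrative.

```python
def daily_change(today: float, previous_day: float) -> float:
    """Relative change compared with the same period of the previous day."""
    if previous_day <= 0:
        raise ValueError("The previous-day value must be positive.")
    return (today - previous_day) / previous_day


def traffic_changed_too_much(
    today: float,
    previous_day: float,
    increase_threshold: float = 0.40,  # default Daily Increase Threshold
    decrease_threshold: float = 0.20,  # default Daily Decrease Threshold
) -> bool:
    """Return True if either threshold is exceeded."""
    change = daily_change(today, previous_day)
    return change > increase_threshold or -change > decrease_threshold
```

With the defaults, a rise from 100 to 150 log entries (50%) or a drop from 100 to 75 (25%) would trigger an alert, whereas a rise to 130 (30%) or a drop to 85 (15%) would not.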
  • Monitoring rule for the number of log entries that fail to be transformed during data transformation
    Item Description
    Rule name Failure Monitoring during Data Transformation
    Functionality This rule monitors the failures of data transformation tasks within 15 minutes. If the number of log entries that fail to be transformed during data transformation exceeds the value of the Monitoring Threshold parameter, an alert is triggered.
    Parameters
    • Data Transformation ID: the ID of the data transformation task that you want to monitor, for example, dd2de8e7e23f3e42ffbb32fe05710372.

      Default value: .*. This value indicates that all data transformation tasks are monitored. Separate multiple task IDs with vertical bars (|).

    • Monitoring Threshold: If the number of log entries that fail to be transformed exceeds this value, an alert is triggered. Default value: 10.
    • Action Policy: the action policy that is used to send alert notifications. The action policy contains notification methods and alert templates. The default value is the built-in action policy whose ID is sls.app.etl.builtin. This value indicates that alert notifications are sent by using the webhook URL of a DingTalk chatbot.
    • Severity: the severity of the alert. Valid values: Critical, High, Medium, Low, and Report. Default value: High.
    Associated dashboard Data Transformation Troubleshooting > Total logs failed
    Handling method You can clear triggered alerts based on the following rules:
    1. Clear the alerts by using the handling method that is described for the Exception Monitoring during Data Transformation rule. If no error message is reported, proceed to the next step.
    2. If you cannot clear the alerts, collect the names of the related project and Logstore and the ID of the data transformation task, and then submit a ticket to contact Alibaba Cloud technical support.
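Each of the preceding rules accepts a Data Transformation ID parameter whose default value is .* and whose task IDs are separated with vertical bars (|). The following Python sketch shows how such a value selects tasks if it is treated as a regular expression that must match the whole task ID. Full matching is an assumption, and the is_monitored function and the second task ID are hypothetical.

```python
import re


def is_monitored(task_id: str, pattern: str = ".*") -> bool:
    """Return True if the task ID is selected by the parameter value."""
    # Assumption: the whole task ID must match the pattern.
    return re.fullmatch(pattern, task_id) is not None


# The first ID is the example used in this topic; the second is made up.
two_tasks = "dd2de8e7e23f3e42ffbb32fe05710372|0123456789abcdef0123456789abcdef"
```

With the default value, every task ID is selected. With two_tasks, only the two listed IDs are selected.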