Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Automatic multi-line aggregation part in Advanced Log Collection Configurations #27361

Merged
merged 8 commits into from
Feb 11, 2025
197 changes: 161 additions & 36 deletions content/en/agent/logs/advanced_log_collection.md
Original file line number Diff line number Diff line change
Expand Up @@ -462,8 +462,12 @@
| 2020-10-27 05:10:49.657 | `\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\.\d{3}` |
| {"date": "2018-01-02" | `\{"date": "\d{4}-\d{2}-\d{2}` |

### Automatic multi-line aggregation
With Agent 7.37+, `auto_multi_line_detection` can be enabled, which allows the Agent to detect [common multi-line patterns][3] automatically.
### Global automatic multi-line aggregation
With Agent 7.37+, `auto_multi_line_detection` can be enabled, which allows the Agent to detect [common multi-line patterns][3] automatically for **all** log integrations it sets up.


{{< tabs >}}
{{% tab "Configuration file" %}}

Enable `auto_multi_line_detection` globally in the `datadog.yaml` file:

Expand All @@ -472,48 +476,66 @@
auto_multi_line_detection: true
```

For containerized deployments, you can enable `auto_multi_line_detection` with the `DD_LOGS_CONFIG_AUTO_MULTI_LINE_DETECTION=true` environment variable.
{{% /tab %}}
{{% tab "Docker" %}}

It can also be enabled or disabled (overriding the global config) per log configuration:
Use the environment variable `DD_LOGS_CONFIG_AUTO_MULTI_LINE_DETECTION` in the Datadog Agent container to configure a global automatic multi-line aggregation rule. For example:

{{< tabs >}}
{{% tab "Configuration file" %}}
```shell
DD_LOGS_CONFIG_AUTO_MULTI_LINE_DETECTION=true
```

{{% /tab %}}
{{% tab "Kubernetes" %}}

#### Operator
Use the `spec.override.nodeAgent.env` parameter in your Datadog Operator manifest to set the `DD_LOGS_CONFIG_AUTO_MULTI_LINE_DETECTION` environment variable to configure a global automatic multi-line aggregation rule. For example:

```yaml
logs:
- type: file
path: /my/test/file.log
service: testApp
source: java
auto_multi_line_detection: true
spec:
override:
nodeAgent:
env:
- name: DD_LOGS_CONFIG_AUTO_MULTI_LINE_DETECTION
value: "true"
```

Automatic multi-line detection uses a list of common regular expressions to attempt to match logs. If the built-in list is not sufficient, you can also add custom patterns in the `datadog.yaml` file:
#### Helm
Use the `datadog.logs.autoMultiLineDetection` option in the Helm chart to configure a global automatic multi-line aggregation rule. For example:

```yaml
logs_config:
auto_multi_line_detection: true
auto_multi_line_extra_patterns:
- \d{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01])
- '[A-Za-z_]+ \d+, \d+ \d+:\d+:\d+ (AM|PM)'
datadog:
logs:
enabled: true
autoMultiLineDetection: true
```

If no pattern meets the line match threshold, add the `auto_multi_line_default_match_threshold` parameter with a lower value. This configures a threshold value that determines how frequently logs have to match in order for the auto multi-line aggregation to work. To find the current threshold value run the [agent `status` command][1].

{{% /tab %}}
{{< /tabs >}}

### Enable multi-line aggregation per integration
Alternatively, you can enable or disable multi-line aggregation for an individual integration's log collection. Changing the multi-line aggregation for an integration overrides the global configuration.

{{< tabs >}}
{{% tab "Configuration file" %}}

In a host environment, enable `auto_multi_line_detection` with the [Custom log collection][1] method. For example:

[1]: https://docs.datadoghq.com/agent/logs/?tab=tailfiles#custom-log-collection

```yaml
logs_config:
auto_multi_line_detection: true
auto_multi_line_extra_patterns:
- \d{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01])
- '[A-Za-z_]+ \d+, \d+ \d+:\d+:\d+ (AM|PM)'
auto_multi_line_default_match_threshold: 0.1
logs:
- type: file
path: /my/test/file.log
service: testApp
source: java
auto_multi_line_detection: true
```

[1]: https://docs.datadoghq.com/agent/configuration/agent-commands/#agent-information
{{% /tab %}}
{{% tab "Docker" %}}

In a Docker environment, use the label `com.datadoghq.ad.logs` on your container to specify the `log_processing_rules`. For example:
In a Docker environment, use the label `com.datadoghq.ad.logs` on your container to specify the log configuration. For example:

```yaml
labels:
Expand All @@ -524,14 +546,10 @@
"auto_multi_line_detection": true
}]
```
Automatic multi-line detection uses a list of common regular expressions to attempt to match logs. If the built-in list is not sufficient, you can also add custom patterns in the `datadog.yaml` file with the `DD_LOGS_CONFIG_AUTO_MULTI_LINE_EXTRA_PATTERNS` environment variable.

If no pattern meets the line match threshold, add the `DD_LOGS_CONFIG_AUTO_MULTI_LINE_DEFAULT_MATCH_THRESHOLD` environment variable with a lower value. This configures a threshold value that determines how frequently logs have to match in order for the auto multi-line aggregation to work. To find the current threshold value run the [agent `status` command][1].

[1]: https://docs.datadoghq.com/agent/configuration/agent-commands/#agent-information

{{% /tab %}}
{{% tab "Kubernetes" %}}
In a Kubernetes environment, use the annotation `ad.datadoghq.com/<CONTAINER_NAME>.logs` on your pod to specify the log configuration. For example:

```yaml
apiVersion: apps/v1
Expand All @@ -558,17 +576,124 @@
- name: '<CONTAINER_NAME>'
image: testApp:latest
```
{{% /tab %}}
{{< /tabs >}}

### Customize multi-line aggregation configuration
Automatic multi-line detection uses a list of [common regular expressions][1] to attempt to match logs. If the built-in list is not sufficient, you can also add custom patterns and thresholds for detection.

[1]:https://github.com/DataDog/datadog-agent/blob/a27c16c05da0cf7b09d5a5075ca568fdae1b4ee0/pkg/logs/internal/decoder/auto_multiline_handler.go#L187
{{< tabs >}}
{{% tab "Configuration file" %}}
In a configuration file, add the `auto_multi_line_extra_patterns` to your `datadog.yaml`:
```yaml
logs_config:
auto_multi_line_detection: true
auto_multi_line_extra_patterns:
- \d{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01])
- '[A-Za-z_]+ \d+, \d+ \d+:\d+:\d+ (AM|PM)'
```

The `auto_multi_line_default_match_threshold` parameter determines how closely logs have to match the patterns in order for the auto multi-line aggregation to work.

If your multi-line logs aren't getting aggregated as you like, you can change the sensitivity of the matching by setting the `auto_multi_line_default_match_threshold` parameter. Add the `auto_multi_line_default_match_threshold` parameter to your configuration file with a value lower (to increase matches) or higher (to decrease matches) than the current threshold value. To find the current threshold value, run the [Agent `status` command][1].

```yaml
logs_config:
auto_multi_line_detection: true
auto_multi_line_extra_patterns:
- \d{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01])
- '[A-Za-z_]+ \d+, \d+ \d+:\d+:\d+ (AM|PM)'
auto_multi_line_default_match_threshold: 0.1
```

[1]: https://docs.datadoghq.com/agent/configuration/agent-commands/#agent-information
{{% /tab %}}
{{% tab "Docker" %}}
In a containerized Agent, add the environment variable `DD_LOGS_CONFIG_AUTO_MULTI_LINE_EXTRA_PATTERNS`:

```yaml
environment:
- DD_LOGS_CONFIG_AUTO_MULTI_LINE_DETECTION=true
- DD_LOGS_CONFIG_AUTO_MULTI_LINE_EXTRA_PATTERNS=\d{4}-(0?[1-9]|1[012])-(0?[1-9]|[12][0-9]|3[01]) [A-Za-z_]+\s\d+,\s\d+\s\d+:\d+:\d+\s(AM|PM)
```
**Note**: The Datadog Agent interprets spaces in the `DD_LOGS_CONFIG_AUTO_MULTI_LINE_EXTRA_PATTERNS` environment variable as separators between multiple patterns. In the following example, the two regex patterns are divided by a space, and `\s` in the second regex pattern matches spaces.

The `auto_multi_line_default_match_threshold` parameter determines how closely logs have to match the patterns in order for the auto multi-line aggregation to work.

If your multi-line logs aren't getting aggregated as you like, you can change the sensitivity of the matching by setting the `auto_multi_line_default_match_threshold` parameter. Add the `auto_multi_line_default_match_threshold` parameter to your configuration file with a value lower (to increase matches) or higher (to decrease matches) than the current threshold value. To find the current threshold value, run the [Agent `status` command][1].

```yaml
environment:
- DD_LOGS_CONFIG_AUTO_MULTI_LINE_DETECTION=true
- DD_LOGS_CONFIG_AUTO_MULTI_LINE_EXTRA_PATTERNS=\d{4}-(0?[1-9]|1[012])-(0?[1-9]|[12][0-9]|3[01]) [A-Za-z_]+\s\d+,\s\d+\s\d+:\d+:\d+\s(AM|PM)
- DD_LOGS_CONFIG_AUTO_MULTI_LINE_DEFAULT_MATCH_THRESHOLD=0.1
```

[1]: https://docs.datadoghq.com/agent/configuration/agent-commands/#agent-information
{{% /tab %}}
{{% tab "Kubernetes" %}}
In Kubernetes, add the environment variable `DD_LOGS_CONFIG_AUTO_MULTI_LINE_EXTRA_PATTERNS`:

#### Operator

```yaml
spec:
override:
nodeAgent:
env:
- name: DD_LOGS_CONFIG_AUTO_MULTI_LINE_EXTRA_PATTERNS
value: \d{4}-(0?[1-9]|1[012])-(0?[1-9]|[12][0-9]|3[01]) [A-Za-z_]+\s\d+,\s\d+\s\d+:\d+:\d+\s(AM|PM)
```

#### Helm

```yaml
datadog:
env:
- name: DD_LOGS_CONFIG_AUTO_MULTI_LINE_EXTRA_PATTERNS
value: \d{4}-(0?[1-9]|1[012])-(0?[1-9]|[12][0-9]|3[01]) [A-Za-z_]+\s\d+,\s\d+\s\d+:\d+:\d+\s(AM|PM)
```
**Note**: The Datadog Agent interprets spaces in the `DD_LOGS_CONFIG_AUTO_MULTI_LINE_EXTRA_PATTERNS` environment variable as separators between multiple patterns. In the following example, the two regex patterns are divided by a space, and `\s` in the second regex pattern matches spaces.

Automatic multi-line detection uses a list of common regular expressions to attempt to match logs. If the built-in list is not sufficient, you can also add custom patterns in the `datadog.yaml` file with the `DD_LOGS_CONFIG_AUTO_MULTI_LINE_EXTRA_PATTERNS` environment variable.

If no pattern meets the line match threshold, add the `DD_LOGS_CONFIG_AUTO_MULTI_LINE_DEFAULT_MATCH_THRESHOLD` environment variable with a lower value. This configures a threshold value that determines how frequently logs have to match in order for the auto multi-line aggregation to work. To find the current threshold value run the [agent `status` command][1].
The `auto_multi_line_default_match_threshold` parameter determines how closely logs have to match the patterns in order for the auto multi-line aggregation to work.

If your multi-line logs aren't getting aggregated as you like, you can change the sensitivity of the matching by setting the `auto_multi_line_default_match_threshold` parameter. Add the `auto_multi_line_default_match_threshold` parameter to your configuration file with a value lower (to increase matches) or higher (to decrease matches) than the current threshold value. To find the current threshold value, run the [Agent `status` command][1].

[1]: https://docs.datadoghq.com/agent/configuration/agent-commands/#agent-information

#### Operator

```yaml
spec:
override:
nodeAgent:
env:
- name: DD_LOGS_CONFIG_AUTO_MULTI_LINE_EXTRA_PATTERNS
value: \d{4}-(0?[1-9]|1[012])-(0?[1-9]|[12][0-9]|3[01]) [A-Za-z_]+\s\d+,\s\d+\s\d+:\d+:\d+\s(AM|PM)
- name: DD_LOGS_CONFIG_AUTO_MULTI_LINE_DEFAULT_MATCH_THRESHOLD
value: "0.1"
```

#### Helm

```yaml
datadog:
env:
- name: DD_LOGS_CONFIG_AUTO_MULTI_LINE_EXTRA_PATTERNS
value: \d{4}-(0?[1-9]|1[012])-(0?[1-9]|[12][0-9]|3[01]) [A-Za-z_]+\s\d+,\s\d+\s\d+:\d+:\d+\s(AM|PM)
- name: DD_LOGS_CONFIG_AUTO_MULTI_LINE_DEFAULT_MATCH_THRESHOLD
value: "0.1"
```


{{% /tab %}}
{{< /tabs >}}

With this feature enabled, when a new log file is opened the Agent tries to detect a pattern. During this process the logs are sent as single lines. After the detection threshold is met, all future logs for that source are aggregated with the detected pattern, or as single lines if no pattern is found. Detection takes at most 30 seconds or the first 500 logs (whichever comes first).
With multi-line aggregation enabled, the Agent first tries to detect a pattern in each new log file. This detection process takes at most 30 seconds or the first 500 logs, whichever comes first. During the initial detection process, the logs are sent as single lines.

After the detection threshold is met, all future logs for that source are aggregated with the best matching pattern, or as single lines if no pattern is found.

Check notice on line 696 in content/en/agent/logs/advanced_log_collection.md

View workflow job for this annotation

GitHub Actions / vale

Datadog.sentencelength

Suggestion: Try to keep your sentence length to 25 words or fewer.

**Note**: If you can control the naming pattern of the rotated log, ensure that the rotated file replaces the previously active file with the same name. The Agent reuses a previously detected pattern on the newly rotated file to avoid re-running detection.

Expand Down
Loading