
Multiline filter issue: the timestamp and metadata of some logs are missing #10576

@mirko-lazarevic

Bug Report

Describe the bug
When using a multiline filter, the timestamp and metadata of some logs are systematically missing.
We are logging two different types of messages at an interval of 10 seconds, like this:

Thu Jul 10 13:47:36 UTC 2025 Likely to fail
Thu Jul 10 13:47:36 UTC 2025 Likely to fail
2025-07-10T13:47:36+00:00 should be ok
Thu Jul 10 13:47:46 UTC 2025 Likely to fail
Thu Jul 10 13:47:56 UTC 2025 Likely to fail
Thu Jul 10 13:48:06 UTC 2025 Likely to fail
2025-07-10T13:48:16+00:00 should be ok
Thu Jul 10 13:48:26 UTC 2025 Likely to fail
Thu Jul 10 13:48:36 UTC 2025 Likely to fail
Thu Jul 10 13:48:46 UTC 2025 Likely to fail

After logging the "Likely to fail" message twice, the block above is then logged repeatedly in a loop (1x "should be ok", followed by 3x "Likely to fail").

The corresponding log lines, copied from the Kubernetes container log file:

2025-07-10T08:47:36.927938223-05:00 stdout F Thu Jul 10 13:47:36 UTC 2025 Likely to fail
2025-07-10T08:47:36.929896848-05:00 stdout F Thu Jul 10 13:47:36 UTC 2025 Likely to fail
2025-07-10T08:47:36.931218787-05:00 stdout F 2025-07-10T13:47:36+00:00 should be ok
2025-07-10T08:47:46.936722867-05:00 stdout F Thu Jul 10 13:47:46 UTC 2025 Likely to fail
2025-07-10T08:47:56.942020106-05:00 stdout F Thu Jul 10 13:47:56 UTC 2025 Likely to fail
2025-07-10T08:48:06.946530058-05:00 stdout F Thu Jul 10 13:48:06 UTC 2025 Likely to fail
2025-07-10T08:48:16.951177698-05:00 stdout F 2025-07-10T13:48:16+00:00 should be ok
2025-07-10T08:48:26.955666785-05:00 stdout F Thu Jul 10 13:48:26 UTC 2025 Likely to fail
2025-07-10T08:48:36.960419370-05:00 stdout F Thu Jul 10 13:48:36 UTC 2025 Likely to fail
2025-07-10T08:48:46.965574420-05:00 stdout F Thu Jul 10 13:48:46 UTC 2025 Likely to fail
2025-07-10T08:48:56.970499744-05:00 stdout F 2025-07-10T13:48:56+00:00 should be ok

We have a multiline filter (alias "ml-detect" in the configuration below) which matches the first time format ("2025-07-08T15:25:07+00:00", message "should be ok") in the start state and the second time format ("Tue Jul 8 15:25:17 UTC 2025", message "Likely to fail") in the next state.
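
For illustration only, a minimal sketch of the two states described above could look like the following (these are not our production regexes, which are in parsers.yaml further down): the ISO timestamp opens a new record in the start state, and any line not starting with an ISO timestamp is treated as a continuation.

    multiline_parsers:
      - name: simple_detector
        type: regex
        flush_timeout: 1000
        rules:
          - state: start_state
            regex: '/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2}\s/'
            next_state: cont
          - state: cont
            regex: '/^(?!\d{4}-\d{2}-\d{2}T).*/'
            next_state: cont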

After the first occurrence of the "should be ok" message, all logs with the message "Likely to fail" carry the identical timestamp (e.g. [1752051270.328705182, {}]) of that previous "should be ok" message, even though they are logged 10 seconds afterwards.
In addition, the logs with the message "Likely to fail" no longer have any metadata fields like "time", "stream", or "file".

The logs below show example output, including the wrong timestamps and the missing metadata. The first two "Likely to fail" records still have a correct timestamp and metadata. As soon as the first "should be ok" message is logged, the bug appears (our multiline parser matches the start state) and the following "Likely to fail" records carry the wrong timestamp and no metadata.

[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155256.927938223, {}], {"time"=>"2025-07-10T08:47:36.927938223-05:00", "stream"=>"stdout", "_p"=>"F", "log"=>"Thu Jul 10 13:47:36 UTC 2025 Likely to fail", "file"=>"/var/log/containers/testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log"}]
[1] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155256.929896848, {}], {"time"=>"2025-07-10T08:47:36.929896848-05:00", "stream"=>"stdout", "_p"=>"F", "log"=>"Thu Jul 10 13:47:36 UTC 2025 Likely to fail", "file"=>"/var/log/containers/testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155256.931218787, {}], {"time"=>"2025-07-10T08:47:36.931218787-05:00", "stream"=>"stdout", "_p"=>"F", "log"=>"2025-07-10T13:47:36+00:00 should be ok", "file"=>"/var/log/containers/testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155256.931218787, {}], {"log"=>"Thu Jul 10 13:47:46 UTC 2025 Likely to fail"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155256.931218787, {}], {"log"=>"Thu Jul 10 13:47:56 UTC 2025 Likely to fail"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155256.931218787, {}], {"log"=>"Thu Jul 10 13:48:06 UTC 2025 Likely to fail"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155296.951177698, {}], {"time"=>"2025-07-10T08:48:16.951177698-05:00", "stream"=>"stdout", "_p"=>"F", "log"=>"2025-07-10T13:48:16+00:00 should be ok", "file"=>"/var/log/containers/testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155296.951177698, {}], {"log"=>"Thu Jul 10 13:48:26 UTC 2025 Likely to fail"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155296.951177698, {}], {"log"=>"Thu Jul 10 13:48:36 UTC 2025 Likely to fail"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155296.951177698, {}], {"log"=>"Thu Jul 10 13:48:46 UTC 2025 Likely to fail"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155336.970499744, {}], {"time"=>"2025-07-10T08:48:56.970499744-05:00", "stream"=>"stdout", "_p"=>"F", "log"=>"2025-07-10T13:48:56+00:00 should be ok", "file"=>"/var/log/containers/testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log"}]

We are using the built-in multiline "cri" parser in the tail input to strip the prefix (timestamp, stream, partial flag) that the container runtime adds to every line. This way our own time formats remain available to the multiline filter.
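
For clarity, this is what that built-in cri parser does to a single raw line from the container log shown above (the field names match the stdout records shown earlier):

    raw line:
      2025-07-10T08:47:36.927938223-05:00 stdout F Thu Jul 10 13:47:36 UTC 2025 Likely to fail

    record after the cri parser in the tail input:
      {"time"=>"2025-07-10T08:47:36.927938223-05:00", "stream"=>"stdout", "_p"=>"F", "log"=>"Thu Jul 10 13:47:36 UTC 2025 Likely to fail"}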

To Reproduce

  • Example log message if applicable: see above
  • Steps to reproduce the problem:
    1. Deploy fluent-bit using the configuration below and make sure it has write access to "/var/log/fluent-bit/" so the filesystem storage can be set up
    2. Create a testlog pod using the following command (an untested kubectl equivalent is sketched after this list):
      oc run testlog --image <any-linux-image> --command -- /bin/bash -c 'echo $(date) Likely to fail; echo $(date) Likely to fail; while true; do echo $(date -Iseconds) should be ok;sleep 10;echo $(date) Likely to fail;sleep 10;echo $(date) Likely to fail;sleep 10; echo $(date) Likely to fail;sleep 10;done'
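
If oc is not available, an untested equivalent using plain kubectl should be:

      kubectl run testlog --image <any-linux-image> --command -- /bin/bash -c 'echo $(date) Likely to fail; echo $(date) Likely to fail; while true; do echo $(date -Iseconds) should be ok;sleep 10;echo $(date) Likely to fail;sleep 10;echo $(date) Likely to fail;sleep 10; echo $(date) Likely to fail;sleep 10;done'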

It is important that the messages are not collected all at once but arrive with a delay (10 seconds in our test). When read_from_head is enabled in the tail input, the bug does not appear; the tail input has to wait for each new log line instead of collecting everything at once.
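
For reference, the single change that hides the bug in our tests is this one option in the tail input of the configuration below (everything else unchanged):

    pipeline:
      inputs:
        - name: tail
          # ... all other options as in "Your Environment" below ...
          read_from_head: true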

Expected behavior

We expect every log record to keep its own correct timestamp and metadata.

Your Environment

  • Version used: fluent-bit v3.2.10
  • Configuration:
  fluent-bit.yaml: |
    service:
      flush: 2
      log_level: info
      http_server: on
      http_listen: 0.0.0.0
      http_port: 8081
      health_check: on
      hc_errors_count: 1
      hc_retry_failure_count: 1
      hc_period: 30
      storage.path: /var/log/fluent-bit/fluent-bit-cache2
      storage.max_chunks_up: 512
      storage.metrics: on
      storage.sync: full
      storage.backlog.mem_limit: 20M
      parsers_file: parsers.yaml
    pipeline:
      inputs:
        - name: tail
          tag: kube.*
          path: /var/log/containers/*testlog*.log
          path_key: file
          db: /var/log/fluent-bit/fluent-bit2.DB
          buffer_chunk_size: 512kb
          buffer_max_size: 2mb
          mem_buf_limit: 512mb
          multiline.parser: cri
          skip_long_lines: on
          refresh_interval: 20
          skip_empty_lines: on
          threaded: true
          read_from_head: false
          rotate_wait: 15
          storage.type: filesystem
          storage.pause_on_chunks_overlimit: off
      filters:
        - name: multiline
          alias: ml-detect
          match: 'kube.var.log.containers.*'
          multiline.parser: multiline_detector
          multiline.key_content: log
          flush_ms: 1000
          emitter_mem_buf_limit: 50M
      outputs:
        - name: stdout
          match: '*'
  parsers.yaml: |
    multiline_parsers:
      - name: multiline_detector
        type: regex
        flush_timeout: 1000
        rules:
          - state: start_state
            regex: '/^(?<log>(((\[?\d+(\-|\/)\d+(\-|\/)\d+.?\d+\:\d+\:\d+(?:[.,]\d{1,6})?(Z|\+\d+)?)\]?).*))/'
            next_state: notime_bib_cont
          - state: notime_bib_cont
            regex: '/^(?<log>(?!\[?\d+(\-|\/)\d+(\-|\/)\d+)(?!BIP\d{3,4}[EI]|\s*Caused\sby\s\[2\]|\{\s*\").*)/'
            next_state: notime_bib_cont
          - state: notime_bib_cont
            regex: '/^\s*(?!.+)'
            next_state: notime_bib_cont
  • Environment name and version:
    • OpenShift Version: 4.17.14
    • Kubernetes Version: v1.27.16+03a907c
  • Operating System and version: RHEL 8
  • Filters and plugins: included in configuration above

Additional context

We are trying to merge multiline logs into one log line using a multiline parser.
In this particular example the purpose of our multiline parser is not visible, but it shows a bug we detected.
