Description
Bug Report
Describe the bug
When using a multiline filter, the timestamp and metadata of some logs are systematically missing.
We are logging two different types of messages at an interval of 10 seconds, like this:
Thu Jul 10 13:47:36 UTC 2025 Likely to fail
Thu Jul 10 13:47:36 UTC 2025 Likely to fail
2025-07-10T13:47:36+00:00 should be ok
Thu Jul 10 13:47:46 UTC 2025 Likely to fail
Thu Jul 10 13:47:56 UTC 2025 Likely to fail
Thu Jul 10 13:48:06 UTC 2025 Likely to fail
2025-07-10T13:48:16+00:00 should be ok
Thu Jul 10 13:48:26 UTC 2025 Likely to fail
Thu Jul 10 13:48:36 UTC 2025 Likely to fail
Thu Jul 10 13:48:46 UTC 2025 Likely to fail
After logging the "Likely to fail" message twice, the block above is logged repeatedly in a loop (1x "should be ok", followed by 3x "Likely to fail").
The corresponding log lines copied from the Kubernetes container log file:
2025-07-10T08:47:36.927938223-05:00 stdout F Thu Jul 10 13:47:36 UTC 2025 Likely to fail
2025-07-10T08:47:36.929896848-05:00 stdout F Thu Jul 10 13:47:36 UTC 2025 Likely to fail
2025-07-10T08:47:36.931218787-05:00 stdout F 2025-07-10T13:47:36+00:00 should be ok
2025-07-10T08:47:46.936722867-05:00 stdout F Thu Jul 10 13:47:46 UTC 2025 Likely to fail
2025-07-10T08:47:56.942020106-05:00 stdout F Thu Jul 10 13:47:56 UTC 2025 Likely to fail
2025-07-10T08:48:06.946530058-05:00 stdout F Thu Jul 10 13:48:06 UTC 2025 Likely to fail
2025-07-10T08:48:16.951177698-05:00 stdout F 2025-07-10T13:48:16+00:00 should be ok
2025-07-10T08:48:26.955666785-05:00 stdout F Thu Jul 10 13:48:26 UTC 2025 Likely to fail
2025-07-10T08:48:36.960419370-05:00 stdout F Thu Jul 10 13:48:36 UTC 2025 Likely to fail
2025-07-10T08:48:46.965574420-05:00 stdout F Thu Jul 10 13:48:46 UTC 2025 Likely to fail
2025-07-10T08:48:56.970499744-05:00 stdout F 2025-07-10T13:48:56+00:00 should be ok
We have a multiline filter (alias "ml-detect" in the configuration below) which matches the first time format ("2025-07-08T15:25:07+00:00", message: "should be ok") in the start state and the second time format ("Tue Jul 8 15:25:17 UTC 2025", message: "Likely to fail") in the next state.
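To make the intended matching explicit, here is a rough mapping (for illustration only, not part of the configuration) of the two message formats to the states of the multiline_detector parser shown further below:
    # start_state matches lines that begin with an ISO-8601 timestamp:
    #   "2025-07-10T13:47:36+00:00 should be ok"        -> start_state
    # notime_bib_cont matches lines that do not begin with such a timestamp:
    #   "Thu Jul 10 13:47:36 UTC 2025 Likely to fail"   -> notime_bib_cont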
After the first occurrence of the "should be ok" message, all logs with the message "Likely to fail" carry the identical timestamp (e.g. [1752051270.328705182, {}]) of that previous "should be ok" message, even though they are logged 10 seconds afterwards.
In addition, the logs with the message "Likely to fail" do not have any metadata fields such as "time", "stream", or "file".
The logs below show an example output, including the wrong timestamps and the missing metadata. The first two "Likely to fail" records still have the correct timestamp and metadata. As soon as the first "should be ok" message is logged (our multiline parser matches the start state), the bug appears and the following "Likely to fail" records carry the wrong timestamp and no metadata.
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155256.927938223, {}], {"time"=>"2025-07-10T08:47:36.927938223-05:00", "stream"=>"stdout", "_p"=>"F", "log"=>"Thu Jul 10 13:47:36 UTC 2025 Likely to fail", "file"=>"/var/log/containers/testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log"}]
[1] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155256.929896848, {}], {"time"=>"2025-07-10T08:47:36.929896848-05:00", "stream"=>"stdout", "_p"=>"F", "log"=>"Thu Jul 10 13:47:36 UTC 2025 Likely to fail", "file"=>"/var/log/containers/testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155256.931218787, {}], {"time"=>"2025-07-10T08:47:36.931218787-05:00", "stream"=>"stdout", "_p"=>"F", "log"=>"2025-07-10T13:47:36+00:00 should be ok", "file"=>"/var/log/containers/testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155256.931218787, {}], {"log"=>"Thu Jul 10 13:47:46 UTC 2025 Likely to fail"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155256.931218787, {}], {"log"=>"Thu Jul 10 13:47:56 UTC 2025 Likely to fail"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155256.931218787, {}], {"log"=>"Thu Jul 10 13:48:06 UTC 2025 Likely to fail"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155296.951177698, {}], {"time"=>"2025-07-10T08:48:16.951177698-05:00", "stream"=>"stdout", "_p"=>"F", "log"=>"2025-07-10T13:48:16+00:00 should be ok", "file"=>"/var/log/containers/testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155296.951177698, {}], {"log"=>"Thu Jul 10 13:48:26 UTC 2025 Likely to fail"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155296.951177698, {}], {"log"=>"Thu Jul 10 13:48:36 UTC 2025 Likely to fail"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155296.951177698, {}], {"log"=>"Thu Jul 10 13:48:46 UTC 2025 Likely to fail"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155336.970499744, {}], {"time"=>"2025-07-10T08:48:56.970499744-05:00", "stream"=>"stdout", "_p"=>"F", "log"=>"2025-07-10T13:48:56+00:00 should be ok", "file"=>"/var/log/containers/testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log"}]
We are using the built-in "cri" multiline parser in the tail input to strip the CRI prefix added by Kubernetes; this way our own time formats can be matched by the custom multiline parser.
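For reference, this is roughly how the "cri" parser decomposes one of the raw lines shown above (field names as they appear in our stdout output; illustration only):
    # raw line in /var/log/containers/...:
    #   2025-07-10T08:47:36.927938223-05:00 stdout F Thu Jul 10 13:47:36 UTC 2025 Likely to fail
    # record after the "cri" multiline parser in the tail input:
    #   time:   "2025-07-10T08:47:36.927938223-05:00"
    #   stream: "stdout"
    #   _p:     "F"
    #   log:    "Thu Jul 10 13:47:36 UTC 2025 Likely to fail"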
To Reproduce
- Example log message if applicable: see above
- Steps to reproduce the problem:
- Deploy fluent-bit using the configuration below and make sure it has write access to "/var/log/fluent-bit/" for the filesystem storage
- Create a testlog pod using the command:
oc run testlog --image <any-linux-image> --command -- /bin/bash -c 'echo $(date) Likely to fail; echo $(date) Likely to fail; while true; do echo $(date -Iseconds) should be ok;sleep 10;echo $(date) Likely to fail;sleep 10;echo $(date) Likely to fail;sleep 10; echo $(date) Likely to fail;sleep 10;done'
It is important that the messages are not collected all at once but are logged with a delay (10 seconds in our test). When read_from_head is enabled in the tail input, the bug does not appear; the tail input has to wait for the next log line to arrive instead of collecting everything in one pass.
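For comparison, a minimal sketch of the only change that makes the bug disappear for us, assuming the tail input from the configuration below is otherwise unchanged:
    # in the tail input of the configuration below:
    read_from_head: true    # existing lines are read in one pass; the bug does not reproduce
    # read_from_head: false (our setting) waits for each new line and triggers the bug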
Expected behavior
We expect every log record to carry its own, current timestamp and its metadata fields ("time", "stream", "_p", "file").
Your Environment
- Version used: fluent-bit v3.2.10
- Configuration:
fluent-bit.yaml: |
  service:
    flush: 2
    log_level: info
    http_server: on
    http_listen: 0.0.0.0
    http_port: 8081
    health_check: on
    hc_errors_count: 1
    hc_retry_failure_count: 1
    hc_period: 30
    storage.path: /var/log/fluent-bit/fluent-bit-cache2
    storage.max_chunks_up: 512
    storage.metrics: on
    storage.sync: full
    storage.backlog.mem_limit: 20M
    parsers_file: parsers.yaml
  pipeline:
    inputs:
      - name: tail
        tag: kube.*
        path: /var/log/containers/*testlog*.log
        path_key: file
        db: /var/log/fluent-bit/fluent-bit2.DB
        buffer_chunk_size: 512kb
        buffer_max_size: 2mb
        mem_buf_limit: 512mb
        multiline.parser: cri
        skip_long_lines: on
        refresh_interval: 20
        skip_empty_lines: on
        threaded: true
        read_from_head: false
        rotate_wait: 15
        storage.type: filesystem
        storage.pause_on_chunks_overlimit: off
    filters:
      - name: multiline
        alias: ml-detect
        match: 'kube.var.log.containers.*'
        multiline.parser: multiline_detector
        multiline.key_content: log
        flush_ms: 1000
        emitter_mem_buf_limit: 50M
    outputs:
      - name: stdout
        match: '*'
parsers.yaml: |
  multiline_parsers:
    - name: multiline_detector
      type: regex
      flush_timeout: 1000
      rules:
        - state: start_state
          regex: '/^(?<log>(((\[?\d+(\-|\/)\d+(\-|\/)\d+.?\d+\:\d+\:\d+(?:[.,]\d{1,6})?(Z|\+\d+)?)\]?).*))/'
          next_state: notime_bib_cont
        - state: notime_bib_cont
          regex: '/^(?<log>(?!\[?\d+(\-|\/)\d+(\-|\/)\d+)(?!BIP\d{3,4}[EI]|\s*Caused\sby\s\[2\]|\{\s*\").*)/'
          next_state: notime_bib_cont
        - state: notime_bib_cont
          regex: '/^\s*(?!.+)/'
          next_state: notime_bib_cont
- Environment name and version:
- OpenShift Version: 4.17.14
- Kubernetes Version: v1.27.16+03a907c
- Operating System and version: RHEL 8
- Filters and plugins: included in configuration above
Additional context
We are trying to merge multiline logs into a single record using a multiline parser.
In this particular example the merging itself is not visible, but it exposes the bug we describe above.