in_tail: fix data loss with buffered data on shutdown and gzip files #11269
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.
Walkthrough
Adds gzip-aware resume state and DB-backed bookkeeping: new per-file fields (`anchor_offset`, `skip_bytes`, `exclude_bytes`, `skipping_mode`), DB schema migration and SQL updates for …
Changes
Sequence Diagram(s)
sequenceDiagram
participant Disk as Disk File
participant Reader as in_tail Reader
participant Decompress as Gzip Decompressor
participant Buffer as In-memory Buffer
participant DB as SQLite DB
Disk->>Reader: read compressed/raw bytes (advance raw offset)
Reader->>Buffer: append raw bytes
alt decompression_context (gzip)
Buffer->>Decompress: feed compressed bytes
Decompress-->>Buffer: decompressed data (may span members)
Buffer->>Buffer: apply exclude_bytes / skipping_mode (drop initial decompressed bytes)
alt gzip member boundary reached
Decompress->>Reader: notify member end
Reader->>Reader: set anchor_offset (member start), reset skip_bytes
end
Reader->>DB: persist raw offset, skip, anchor
else no decompression
Buffer->>DB: persist raw offset
end
alt Shutdown with buffered unprocessed data and no decompression
Reader->>DB: rewind offset (offset -= buf_len) and persist
end
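The `skipping_mode` / `exclude_bytes` step in the diagram can be illustrated with a minimal sketch. It assumes, as the diagram suggests, that on resume the reader seeks the raw file back to `anchor_offset` (the start of the current gzip member) and then drops the first `skip_bytes` decompressed bytes before emitting anything. The struct and function names are hypothetical, not taken from the PR.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical resume state: how many decompressed bytes must still be
 * discarded because they were already delivered before the restart.
 */
struct resume_state {
    size_t exclude_bytes;
};

/*
 * Copies the part of `in` that survives the skip into `out` and returns
 * the number of bytes copied. `out` must hold at least `in_len` bytes.
 * The counter is decremented across calls, so the bytes to drop may
 * span several decompressed chunks.
 */
static size_t apply_exclude(struct resume_state *st,
                            const char *in, size_t in_len, char *out)
{
    size_t drop = st->exclude_bytes < in_len ? st->exclude_bytes : in_len;

    st->exclude_bytes -= drop;
    memcpy(out, in + drop, in_len - drop);
    return in_len - drop;
}
```

For example, with `exclude_bytes = 5`, a first chunk `"abc"` is dropped entirely and a second chunk `"defgh"` yields only `"fgh"`; later chunks pass through untouched.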
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
plugins/in_tail/tail_file.c (1 hunk)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
Applied to files:
plugins/in_tail/tail_file.c
🧬 Code graph analysis (1)
plugins/in_tail/tail_file.c (1)
plugins/in_tail/tail_db.c (1)
flb_tail_db_file_offset(290-321)
🪛 Cppcheck (2.18.0)
plugins/in_tail/tail_file.c
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[information] Too many #ifdef configurations - cppcheck only checks 12 configurations. Use --force to check all configurations. For more details, use --enable=information.
(toomanyconfigs)
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
71208f6 to f63f1f4
f63f1f4 to 76f3c05
Actionable comments posted: 0
♻️ Duplicate comments (1)
plugins/in_tail/tail_file.c (1)
1471-1471: Remove trailing whitespace. There are trailing spaces after `#endif` on this line. Apply this diff:
-#endif 
+#endif
🧹 Nitpick comments (1)
plugins/in_tail/tail_file.c (1)
1451-1472: Consider adding a warning for compressed files with buffered data. While the fix correctly excludes compressed files (where rewinding isn't possible due to gzip decompression state requirements), users monitoring `.gz` files may be unaware that buffered data can still be lost on shutdown. Consider adding a debug or warning message similar to:

    if (file->buf_len > 0) {
        if (file->decompression_context != NULL) {
            flb_plg_debug(ctx->ins,
                          "inode=%"PRIu64" compressed file %s has %lu bytes buffered; "
                          "rewind not supported for compressed files",
                          file->inode, file->name, (unsigned long) file->buf_len);
        }
        else {
            /* existing rewind logic */
            ...
        }
    }

This would help users understand the limitation without affecting correctness.
🔇 Additional comments (1)
plugins/in_tail/tail_file.c (1)
1451-1472: LGTM! Rewind logic correctly prevents data loss for regular files. The implementation properly addresses the data loss issue when Fluent Bit shuts down with unprocessed buffered data:
- The condition `file->buf_len > 0 && file->decompression_context == NULL` correctly identifies regular files with unprocessed data while excluding compressed files (where `offset` and `buf_len` have incompatible units).
- The rewind arithmetic `offset -= buf_len` (clamped to 0) correctly repositions the file offset to the start of unprocessed data.
- The DB persistence call ensures the rewound offset survives restarts.
This addresses the core issue described in #11265 where partial lines and buffered data were lost on restart.
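A minimal sketch of the clamped rewind arithmetic described above, using a hypothetical `rewind_offset()` helper and `int64_t` in place of `off_t` for portability. It only makes sense for uncompressed files, where `offset` and `buf_len` are both raw byte counts.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the PR's code: moves the read offset back by
 * the number of buffered-but-unprocessed bytes so they are re-read on
 * restart. Clamps at 0 instead of going negative.
 */
static int64_t rewind_offset(int64_t offset, size_t buf_len)
{
    if (offset > 0 && (uint64_t) offset > buf_len) {
        /* buf_len < offset here, so the cast cannot overflow */
        return offset - (int64_t) buf_len;
    }
    return 0;
}
```

With `offset = 1000` and `buf_len = 200`, the persisted offset becomes 800; if the buffer is at least as large as the offset, the result is clamped to 0.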
Actionable comments posted: 0
🧹 Nitpick comments (2)
plugins/in_tail/tail_file.c (2)
1451-1472: Core rewind logic looks correct; consider adding a warning for compressed files. The check for `file->decompression_context == NULL` properly prevents the compressed-file arithmetic bug flagged in earlier reviews. For regular files, the offset and buffer length are in the same units, so the subtraction and DB update are valid. However, when compressed files have buffered decompressed data at shutdown, that data will be silently lost (since rewinding is skipped). Consider logging a warning in that case so users are aware:

    if (file->buf_len > 0 && file->decompression_context == NULL) {
        /*
         * If there is data in the buffer, it means it was not processed.
         * We must rewind the offset to ensure this data is re-read on restart.
         */
        off_t old_offset = file->offset;
        if (file->offset > file->buf_len) {
            file->offset -= file->buf_len;
        }
        else {
            file->offset = 0;
        }
        flb_plg_debug(ctx->ins,
                      "inode=%"PRIu64" rewind offset for %s: old=%"PRId64" new=%"PRId64" (buf_len=%lu)",
                      file->inode, file->name, old_offset, file->offset,
                      (unsigned long) file->buf_len);
    #ifdef FLB_HAVE_SQLDB
        if (ctx->db) {
            flb_tail_db_file_offset(file, ctx);
        }
    #endif
    +}
    +else if (file->buf_len > 0 && file->decompression_context != NULL) {
    +    flb_plg_warn(ctx->ins, "inode=%"PRIu64" cannot rewind compressed file %s; "
    +                 "%lu decompressed bytes in buffer may be lost on restart",
    +                 file->inode, file->name, (unsigned long) file->buf_len);
     }
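The branch structure proposed above can be exercised in isolation with the sketch below. `struct tail_file_sketch`, `on_shutdown()` and the action enum are hypothetical simplifications of the plugin's state; `fprintf` stands in for `flb_plg_warn`, and the DB write is left as a comment.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for the plugin's per-file state. */
struct tail_file_sketch {
    const char *name;
    long long   offset;                /* raw bytes consumed from disk */
    size_t      buf_len;               /* bytes buffered, not yet emitted */
    void       *decompression_context; /* non-NULL for gzip input */
};

enum shutdown_action { SHUTDOWN_NOTHING, SHUTDOWN_REWIND, SHUTDOWN_WARN_LOST };

/*
 * Rewind plain files; only warn for compressed ones, because there
 * buf_len counts decompressed bytes while offset counts compressed bytes.
 */
static enum shutdown_action on_shutdown(struct tail_file_sketch *f)
{
    if (f->buf_len == 0) {
        return SHUTDOWN_NOTHING;
    }
    if (f->decompression_context == NULL) {
        f->offset = (f->offset > (long long) f->buf_len)
                    ? f->offset - (long long) f->buf_len : 0;
        /* the real code would persist f->offset to the DB here */
        return SHUTDOWN_REWIND;
    }
    fprintf(stderr, "cannot rewind compressed file %s; "
            "%zu decompressed bytes in buffer may be lost on restart\n",
            f->name, f->buf_len);
    return SHUTDOWN_WARN_LOST;
}
```

Keeping the compressed case as a separate `else if` (rather than folding it into the rewind branch) makes it explicit that the offset is left untouched for gzip input.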
1471-1471: Remove trailing whitespace. Minor formatting issue: extra spaces after `#endif`.
-#endif 
+#endif
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
plugins/in_tail/tail_db.c (1)
258-263: Verify `SQL_INSERT_FILE` parameter order and address the `skip_bytes` type mismatch. The bindings at lines 258-263 correctly match `SQL_INSERT_FILE`'s parameter order (name, offset, inode, created, skip, anchor). However, `skip_bytes` is defined as `uint64_t` but bound via `sqlite3_bind_int64`, which expects a signed `sqlite3_int64`. While values exceeding `INT64_MAX` are unlikely for a byte offset, consider either casting to `int64_t` explicitly or using `sqlite3_bind_int64` with a comment documenting the assumption that `skip_bytes` remains within the signed int64 range.
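If an explicit cast is preferred, a range check can make the `INT64_MAX` assumption enforceable rather than merely documented. A minimal sketch with a hypothetical helper name; `sqlite3_int64` is SQLite's signed 64-bit integer type, so the checked `int64_t` result could then be passed to `sqlite3_bind_int64()`.

```c
#include <stdint.h>

/*
 * Hypothetical guard, not from the PR: stores `v` in *out and returns 0
 * when it fits in a signed 64-bit integer; returns -1 otherwise so the
 * caller can log and skip the bind instead of silently wrapping.
 */
static int u64_to_i64_checked(uint64_t v, int64_t *out)
{
    if (v > (uint64_t) INT64_MAX) {
        return -1;
    }
    *out = (int64_t) v;
    return 0;
}
```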
🧹 Nitpick comments (1)
plugins/in_tail/tail_file.c (1)
1027-1141: Document the multi-member gzip `stream_offset` limitation more prominently. The comment at lines 1090-1095 acknowledges that `stream_offset = skip_bytes` is only correct for single-member gzip files. For multi-member gzip, `skip_bytes` resets at member boundaries, so `stream_offset` won't reflect the total decompressed bytes from all prior members. This could affect `offset_key` accuracy when users concatenate multiple gzip streams. Consider either:
- Adding a more prominent warning in the code or documentation that `offset_key` for multi-member gzip reflects the position within the current member, not the total decompressed bytes across all members.
- Tracking total decompressed bytes separately if multi-member gzip is a common use case.
Otherwise, the seek logic correctly handles DB-backed resume for both gzip and plain files, with proper initialization of `stream_offset`, `exclude_bytes`, and `skipping_mode`.
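The second option could look roughly like this: keep the per-member counter that resume needs next to a running total that is not reset at member boundaries. This is a hypothetical sketch, not the PR's implementation; the hook names are assumptions, and the running total would itself need to be persisted for `offset_key` to stay correct across restarts.

```c
#include <stdint.h>

/* Hypothetical position bookkeeping for a multi-member gzip file. */
struct gz_position {
    uint64_t skip_bytes;     /* decompressed bytes within current member */
    uint64_t prior_members;  /* decompressed bytes from finished members */
};

/* Called after each chunk of decompressed output. */
static void gz_on_output(struct gz_position *p, uint64_t n)
{
    p->skip_bytes += n;
}

/* Called when the decompressor reports the end of a gzip member. */
static void gz_on_member_end(struct gz_position *p)
{
    p->prior_members += p->skip_bytes;
    p->skip_bytes = 0;   /* per-member counter resets, as in the PR */
}

/* Position across all members, e.g. for offset_key. */
static uint64_t gz_total_offset(const struct gz_position *p)
{
    return p->prior_members + p->skip_bytes;
}
```

After a 100-byte first member and 40 bytes of the second, `skip_bytes` is 40 (what resume needs) while `gz_total_offset()` reports 140.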
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- plugins/in_tail/tail_db.c (11 hunks)
- plugins/in_tail/tail_file.c (9 hunks)
- plugins/in_tail/tail_file_internal.h (1 hunk)
- plugins/in_tail/tail_fs_inotify.c (1 hunk)
- plugins/in_tail/tail_fs_stat.c (1 hunk)
- plugins/in_tail/tail_sql.h (1 hunk)
🚧 Files skipped from review as they are similar to previous changes (2)
- plugins/in_tail/tail_file_internal.h
- plugins/in_tail/tail_fs_stat.c
🧰 Additional context used
🧠 Learnings (12)
📓 Common learnings
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
Applied to files:
plugins/in_tail/tail_db.c
plugins/in_tail/tail_file.c
plugins/in_tail/tail_fs_inotify.c
📚 Learning: 2025-09-22T15:59:55.794Z
Learnt from: nicknezis
Repo: fluent/fluent-bit PR: 10882
File: plugins/out_http/http.c:112-116
Timestamp: 2025-09-22T15:59:55.794Z
Learning: When users consider bug fixes out of scope for their focused PRs, it's appropriate to create separate GitHub issues to track those concerns rather than expanding the current PR scope.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.
Applied to files:
plugins/in_tail/tail_file.c
🧬 Code graph analysis (2)
plugins/in_tail/tail_db.c (2)
src/flb_sqldb.c (2)
flb_sqldb_query(151-166)
flb_sqldb_close(129-149)
plugins/in_tail/tail_file.h (1)
flb_tail_target_file_name_cmp(64-116)
plugins/in_tail/tail_file.c (1)
plugins/in_tail/tail_db.c (1)
flb_tail_db_file_offset(366-399)
🔇 Additional comments (16)
plugins/in_tail/tail_db.c (6)
28-34: LGTM! The column-existence callback is correctly implemented. It properly sets the flag when a row is returned from the pragma query, enabling reliable column detection for schema migration.
61-108: LGTM! The migration logic correctly uses pragma_table_info to detect column existence, distinguishing real errors from missing columns. The pattern is consistent for both the skip and anchor columns, with proper error handling and logging.
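A minimal sketch of the detect-then-migrate pattern described above, assuming the check is a "SELECT 1 FROM pragma_table_info(...)" query whose row callback sets a flag. The helper names are hypothetical, the exact SQL text used by the PR is not shown here, and the table name in the usage example is an assumption. Identifiers are formatted straight into the SQL, so this only suits trusted constants, never user input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* sqlite3_exec()-style row callback (hypothetical name): any returned
 * row means pragma_table_info found the column. */
static int column_exists_cb(void *data, int argc, char **argv, char **cols)
{
    (void) argc;
    (void) argv;
    (void) cols;
    *(int *) data = 1;
    return 0;
}

/* Build the "does this column exist?" query.
 * Returns 0 on success, -1 if the output buffer is too small.
 * table/column must be trusted constants: they are not escaped. */
static int build_column_check_sql(char *out, size_t size,
                                  const char *table, const char *column)
{
    int n = snprintf(out, size,
                     "SELECT 1 FROM pragma_table_info('%s') WHERE name='%s';",
                     table, column);
    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}

/* Build the migration statement; DEFAULT 0 keeps rows written by
 * older versions valid after the column is added. */
static int build_add_column_sql(char *out, size_t size,
                                const char *table, const char *column)
{
    int n = snprintf(out, size,
                     "ALTER TABLE %s ADD COLUMN %s INTEGER DEFAULT 0;",
                     table, column);
    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

In use, one would run the check query through sqlite3_exec() with column_exists_cb and a zeroed int as the callback argument, and only execute the ALTER TABLE statement if the flag is still 0. Separating "query failed" (non-OK return code) from "no row returned" (flag stays 0) is what lets the migration tell real errors apart from a missing column.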
318-363: LGTM! The function correctly restores skip_bytes and anchor_offset from the database and initializes the runtime-only exclude_bytes and skipping_mode fields based on the persisted skip state. The logic properly handles both cases (skip > 0 and skip == 0).
424-446: LGTM! Resetting db_id to FLB_TAIL_DB_ID_NONE after deletion prevents accidental resurrection of deleted DB entries when flb_tail_db_file_offset is called later. This is an important safeguard for data integrity.
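A minimal sketch of the delete-then-guard pattern, assuming FLB_TAIL_DB_ID_NONE is 0 (the PR description says db_id is reset to 0). DB_ID_NONE, the struct, and both functions are hypothetical stand-ins; the real DELETE and UPDATE statements are replaced by a write counter so the guard can be observed.

```c
#include <assert.h>
#include <stdint.h>

#define DB_ID_NONE 0  /* hypothetical stand-in for FLB_TAIL_DB_ID_NONE */

struct tail_file_sketch {
    int64_t db_id;   /* row id in the DB, or DB_ID_NONE */
    int64_t offset;
};

/* Counts offset writes that would reach the DB (stand-in for SQL). */
static int db_writes = 0;

static void db_delete(struct tail_file_sketch *file)
{
    /* A real implementation would run the DELETE statement here. */
    file->db_id = DB_ID_NONE;  /* forget the row so it cannot come back */
}

/* Returns 0 if the offset was written, -1 if there is no live row. */
static int db_update_offset(struct tail_file_sketch *file)
{
    if (file->db_id <= DB_ID_NONE) {
        return -1;  /* skip the write for deleted/unknown entries */
    }
    db_writes++;
    return 0;
}
```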
365-398: The parameter binding order in flb_tail_db_file_offset correctly matches the SQL_UPDATE_OFFSET statement: offset (parameter 1), skip (parameter 2), anchor (parameter 3), and id (parameter 4).
182-221: Column indices for skip and anchor are correct. The code correctly reads skip from column index 6 and anchor from column index 7, matching the table schema defined in SQL_CREATE_FILES. When SELECT * is executed, SQLite returns columns in the order they appear in the table definition: id (0), name (1), offset (2), inode (3), created (4), rotated (5), skip (6), and anchor (7).
plugins/in_tail/tail_fs_inotify.c (1)
259-281: LGTM! The truncation handler correctly resets all gzip resume state fields (anchor_offset, skip_bytes, exclude_bytes, skipping_mode) alongside the file offset and buffer, ensuring a clean state after truncation. This initialization is consistent with similar handling in tail_fs_stat.c and adjust_counters in tail_file.c.
plugins/in_tail/tail_sql.h (3)
30-40: LGTM! The table schema correctly adds skip and anchor columns with INTEGER DEFAULT 0, ensuring backward compatibility with existing databases. Column indices (skip=6, anchor=7) align with the reads in db_file_exists.
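The column positions and bind order the review checks by hand can be pinned down at compile time. This is a sketch with hypothetical enum names: the positions come from the review (SELECT * returns id, name, offset, inode, created, rotated, skip, anchor) and the bind order from the earlier comment on flb_tail_db_file_offset. It relies on sqlite3_column_*() indices being 0-based and sqlite3_bind_*() indices being 1-based, and uses C11 static_assert.

```c
#include <assert.h>

/* Hypothetical names for the column positions returned by SELECT *
 * (table-definition order; sqlite3_column_*() indices are 0-based). */
enum tail_files_column {
    COL_ID = 0,
    COL_NAME,
    COL_OFFSET,
    COL_INODE,
    COL_CREATED,
    COL_ROTATED,
    COL_SKIP,     /* added by this PR */
    COL_ANCHOR    /* added by this PR */
};

/* SQL_UPDATE_OFFSET placeholders (sqlite3_bind_*() indices are 1-based). */
enum update_offset_param {
    BIND_OFFSET = 1,
    BIND_SKIP,
    BIND_ANCHOR,
    BIND_ID
};

/* Fail the build if either ordering drifts from the schema. */
static_assert(COL_SKIP == 6, "skip must be column 6");
static_assert(COL_ANCHOR == 7, "anchor must be column 7");
static_assert(BIND_ID == 4, "id must be the last UPDATE parameter");
```

Appending new columns at the end of the table, as this PR does, is what keeps the existing indices 0-5 stable for older reads.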
45-47: LGTM! The SQL_INSERT_FILE statement correctly includes skip and anchor in both the column list and VALUES clause. The parameter order matches the binding sequence in db_file_insert (tail_db.c lines 258-263).
52-53: LGTM! The SQL_UPDATE_OFFSET statement correctly updates all three position-tracking fields (offset, skip, anchor) atomically. The parameter order matches the binding sequence in flb_tail_db_file_offset (tail_db.c lines 372-375).
plugins/in_tail/tail_file.c (6)
47-47: LGTM! Including flb_compression.h is appropriate for the gzip decompression functionality added in this PR.
1328-1338: LGTM! All gzip resume fields (anchor_offset, skip_bytes, exclude_bytes, skipping_mode) are correctly initialized to zero/false when a new file is appended, ensuring clean initial state.
1509-1548: LGTM! The offset rewind logic correctly handles buffered data on shutdown:
- For non-compressed files: rewinds the offset by buf_len (with bounds checking) so unprocessed data is re-read on restart.
- For compressed files: logs a warning explaining that accurate rewinding is infeasible with streaming decompression.
- DB updates are properly guarded by db_id > FLB_TAIL_DB_ID_NONE to prevent resurrecting deleted entries.
This addresses the data loss issue described in the PR objectives.
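The shutdown rewind can be sketched as a pure function that computes the offset to persist. The struct, function, and warning text are hypothetical, and clamping at 0 is this sketch's assumption for the "bounds checking" mentioned above; the DB write itself (guarded by db_id) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct tail_file_sketch {
    int64_t offset;      /* bytes read from disk so far */
    size_t  buf_len;     /* bytes read but not yet turned into records */
    int     compressed;  /* non-zero when a decompression context is active */
};

/* Returns the offset to persist on shutdown. */
static int64_t offset_to_persist(const struct tail_file_sketch *file)
{
    if (file->compressed) {
        /* With streaming decompression the buffered bytes do not map back
         * to a raw file position, so the offset is left unchanged. */
        fprintf(stderr, "warning: cannot rewind compressed file offset\n");
        return file->offset;
    }
    if ((int64_t) file->buf_len > file->offset) {
        return 0;  /* bounds check: never rewind before the file start */
    }
    /* Re-read the unprocessed tail of the buffer on restart */
    return file->offset - (int64_t) file->buf_len;
}
```

Rewinding trades possible duplication for no loss: any bytes that were partially processed before shutdown are read again, which matches the at-least-once behavior discussed later in the thread.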
1657-1679: LGTM! The truncation handler in adjust_counters correctly resets all gzip resume state fields (anchor_offset, skip_bytes, exclude_bytes, skipping_mode) when a file is truncated, consistent with the inotify and stat-based truncation handlers.
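Because three handlers (inotify, stat, and adjust_counters) must perform the same reset, a single shared helper is one way to keep them consistent. This is a hypothetical sketch, not the PR's code: the struct and function names are invented, and resetting the offset to 0 assumes the reader restarts from the beginning of the truncated file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct tail_file_sketch {
    int64_t  offset;
    size_t   buf_len;
    int64_t  anchor_offset;
    uint64_t skip_bytes;
    uint64_t exclude_bytes;
    int      skipping_mode;
};

/* Reset read position, buffer and gzip resume state after truncation. */
static void reset_on_truncation(struct tail_file_sketch *file)
{
    assert(file != NULL);
    file->offset = 0;         /* assumed: restart from the file start */
    file->buf_len = 0;        /* buffered bytes belong to old content */
    file->anchor_offset = 0;  /* gzip resume state no longer applies */
    file->skip_bytes = 0;
    file->exclude_bytes = 0;
    file->skipping_mode = 0;
}
```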
1871-1891: LGTM! The skip logic during decompression correctly handles resuming from a mid-stream position:
- When exclude_bytes >= decompressed_data_length, all newly decompressed data is skipped and exclude_bytes is decremented.
- When exclude_bytes < decompressed_data_length, the remaining bytes are shifted to the buffer start using memmove, and skipping_mode is cleared.
This enables accurate gzip resume from the persisted skip_bytes position.
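The two cases above can be sketched as one function over a freshly decompressed chunk. The struct and function names are hypothetical; the branching follows the review's description, and memmove (not memcpy) is required because source and destination overlap within the same buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct skip_state {
    uint64_t exclude_bytes;  /* decompressed bytes still to drop */
    int      skipping_mode;  /* non-zero while dropping */
};

/* Drop already-delivered bytes from buf[0..len).
 * Returns how many bytes remain at the start of buf. */
static size_t apply_exclude(struct skip_state *st, char *buf, size_t len)
{
    size_t keep;

    if (!st->skipping_mode) {
        return len;
    }
    if (st->exclude_bytes >= len) {
        /* The whole chunk was delivered before: discard it */
        st->exclude_bytes -= len;
        return 0;
    }
    /* Part of the chunk is new: shift it to the buffer start */
    keep = len - (size_t) st->exclude_bytes;
    memmove(buf, buf + st->exclude_bytes, keep);
    st->exclude_bytes = 0;
    st->skipping_mode = 0;
    return keep;
}
```

For example, with exclude_bytes = 10 and two 8-byte chunks "abcdefgh" and "ijklmnop", the first chunk is dropped entirely (2 bytes left to exclude) and the second is reduced to "klmnop".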
1932-1953: LGTM! The gzip member boundary handling correctly tracks position using the anchor/skip pattern:
- skip_bytes is incremented by processed_bytes to track position within the current member.
- When a member completes (decompressor transitions to EXPECTING_HEADER state and all buffers are empty), anchor_offset advances to the current raw file position and skip_bytes resets to 0.
This enables resume at member boundaries for multi-member gzip files, addressing the data loss issue for compressed inputs mentioned in the PR objectives.
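A minimal sketch of the anchor/skip bookkeeping described above. The struct and function are hypothetical, and member_done is a simplified stand-in for "decompressor back in EXPECTING_HEADER with all buffers empty"; the real state check is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct gzip_resume {
    int64_t  anchor_offset;  /* raw offset where the current member starts */
    uint64_t skip_bytes;     /* decompressed bytes consumed within that member */
};

/* Update the resume point after processed_bytes of output are consumed.
 * member_done: hypothetical flag for "member finished, nothing buffered".
 * raw_offset:  current position in the compressed file. */
static void track_progress(struct gzip_resume *st, size_t processed_bytes,
                           int member_done, int64_t raw_offset)
{
    st->skip_bytes += processed_bytes;
    if (member_done) {
        st->anchor_offset = raw_offset;  /* next member starts here */
        st->skip_bytes = 0;              /* nothing consumed in it yet */
    }
}
```

Under this model, the persisted pair (anchor_offset, skip_bytes) describes the resume point as "start decompressing at this raw offset and drop this many output bytes", which is only well defined because a new gzip member can be decoded independently from its own header.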
Thanks for the detailed analysis. I fully agree with your opinion. Although there is a risk of duplication in case of abrupt shutdown, it is technically difficult to solve completely, and duplication is definitely better than data loss. Also, regarding your question about the logs, I have changed the log level of the database migration messages to debug. This ensures they are not too noisy during normal operation while still being available for troubleshooting if needed. Finally, should I add a note about the limitation (potential duplication on crash) to the documentation? I think adding a warning/note to the 'Database file' section would be helpful for users. Let me know what you think!
cosmo0920
left a comment
I found a small nitpick: your PR is not following our coding style, so we need to follow the style for defining variables.
I think we need a Note annotation in the official documentation describing the possibility of database corruption, which should be handled in a corresponding documentation PR. This could be a corner case, but it's technically hard to solve cleanly.
Previously, when tailing gzip files, there was no mechanism to persistently
store the uncompressed position ('skip_bytes'). This meant that upon restart,
the plugin could not correctly locate the reading position, identifying it as
a rotation or new file case, potentially leading to data loss.
To fix this, 'skip_bytes' is now stored in the database to persist the
uncompressed offset. Additionally, 'exclude_bytes' is introduced to track
runtime skipping without interfering with the persistent value.
The SQLite schema has been updated to include 'skip' and 'anchor' columns
(persisting 'skip_bytes' and 'anchor_offset') to support these features.
Signed-off-by: jinyong.choi <[email protected]>
8640940 to fdd8174
Got it! I'll create a separate PR for the documentation.
This commit adds a 'Data Reliability and Recovery' hint to the Tail input plugin documentation. It clarifies the behavior of the database offset mechanism during unexpected shutdowns (e.g., system crash, power loss). Specifically, it explains that while Fluent Bit guarantees at-least-once delivery, there is a possibility of slight offset lag and minimal data duplication upon recovery. This ensures users understand that no data is lost even in these scenarios. Related to: fluent/fluent-bit#11269 Signed-off-by: jinyong.choi <[email protected]>
This commit adds a 'Data Reliability and Recovery' hint to the Tail input plugin documentation. It clarifies the behavior of the database offset mechanism during unexpected shutdowns (e.g., system crash, power loss). Specifically, it explains that while Fluent Bit guarantees at-least-once delivery, there is a possibility of slight offset lag and minimal data duplication upon recovery. This ensures users understand that no data is lost even in these scenarios. refs: fluent/fluent-bit#11269 Signed-off-by: jinyong.choi <[email protected]>
Unprocessed data in the internal buffer is discarded when Fluent Bit stops, causing data loss because the DB offset is already advanced.
This patch fixes the issue by rewinding the file offset by the remaining buffer length on exit, ensuring data is re-read on restart.
For compressed gzip files, a separate issue caused data duplication after restart because skip_bytes was incorrectly decremented during runtime. A new field 'exclude_bytes' is introduced as a runtime-only counter, preserving skip_bytes for correct DB persistence.
Additionally, this patch prevents resurrecting deleted file entries in the DB by resetting db_id to 0 upon deletion and checking it before updating the offset.
The SQLite schema is updated to include 'skip' and 'anchor' columns (persisting 'skip_bytes' and 'anchor_offset'). On upgrade from older versions, these columns are automatically added via ALTER TABLE if they do not exist.
Closes #11265
Enter [N/A] in the box, if an item is not applicable to your change.
Testing
Before we can approve your change, please submit the following in a comment:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-test label to test for all targets (requires maintainer to do).
Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit
New Features
Bug Fixes
Tests