
@jinyongchoi jinyongchoi commented Dec 9, 2025

When Fluent Bit stops, unprocessed data in the internal buffer is discarded. Because the DB offset has already been advanced past that data, it is never re-read, which causes data loss.

This patch fixes the issue by rewinding the file offset by the remaining buffer length on exit, ensuring data is re-read on restart.

For compressed gzip files, a separate issue caused data duplication after restart because skip_bytes was incorrectly decremented during runtime. A new field 'exclude_bytes' is introduced as a runtime-only counter, preserving skip_bytes for correct DB persistence.

Additionally, this patch prevents resurrecting deleted file entries in the DB by resetting db_id to 0 upon deletion and checking it before updating the offset.

The SQLite schema is updated to include 'anchor_offset' and 'skip_bytes' columns. On upgrade from older versions, these columns are automatically added via ALTER TABLE if they do not exist.

Closes #11265


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change; please submit the following in a comment:

  • Example configuration file for the change
normal file
[SERVICE]
    flush 2
    grace 60
    log_level debug
    log_file /tmp/testing/logs/testing.log
    parsers_file /tmp/testing/parsers.conf
    plugins_file /tmp/testing/plugins.conf
    http_server on
    http_listen 0.0.0.0
    http_port 22002

    storage.path /tmp/testing/storage
    storage.metrics on
    storage.max_chunks_up 512
    storage.sync full
    storage.checksum off
    storage.backlog.mem_limit 100M

[INPUT]
    Name tail
    Path /tmp/testing.input
    Tag testing
    Key message
    Offset_Key   log_offset

    Read_from_Head true
    Refresh_Interval 3
    Rotate_Wait 31557600

    Buffer_Chunk_Size 1MB
    Buffer_Max_Size 16MB
    Inotify_Watcher false

    storage.type filesystem
    storage.pause_on_chunks_overlimit true

    DB /tmp/testing/storage/testing.db
    DB.sync normal
    DB.locking false

    Alias input_log

[OUTPUT]
    Name file
    Match *
    File /tmp/testing.out

compressed file
[SERVICE]
    flush 2
    grace 60
    log_level debug
    log_file /tmp/testing/logs/testing.log
    parsers_file /tmp/testing/parsers.conf
    plugins_file /tmp/testing/plugins.conf
    http_server on
    http_listen 0.0.0.0
    http_port 22002

    storage.path /tmp/testing/storage
    storage.metrics on
    storage.max_chunks_up 512
    storage.sync full
    storage.checksum off
    storage.backlog.mem_limit 100M

[INPUT]
    Name tail
    Path /tmp/testing.input.gz
    Tag testing
    Key message
    Offset_Key   log_offset

    Read_from_Head true
    Refresh_Interval 3
    Rotate_Wait 31557600

    Buffer_Chunk_Size 1MB
    Buffer_Max_Size 16MB
    Inotify_Watcher false

    storage.type filesystem
    storage.pause_on_chunks_overlimit true

    DB /tmp/testing/storage/testing.db
    DB.sync normal
    DB.locking false

    Alias input_log

[OUTPUT]
    Name file
    Match *
    File /tmp/testing.out
  • Debug log output from testing the change
normal file
[2025/12/15 20:40:56.47094045] [debug] [input:tail:input_log] inode=50643270 rewind offset for /tmp/testing.input: old=185883589 new=185883490 (buf_len=99)

compressed file
[2025/12/15 20:45:12.615579997] [debug] [input:tail:input_log] Skipping: anchor=0 offset=0 exclude=1119419529 decompressed=999999
[2025/12/15 20:45:12.617241577] [debug] [input:tail:input_log] Skipping: anchor=0 offset=999999 exclude=1118419530 decompressed=15809
...
[2025/12/15 20:45:15.408197399] [debug] [input:tail:input_log] Skipping: anchor=0 offset=10923918 exclude=1014921 decompressed=999999
[2025/12/15 20:45:15.408206153] [debug] [input:tail:input_log] Skipping: anchor=0 offset=10923918 exclude=14922 decompressed=15809
[2025/12/15 20:45:20.13095551] [debug] [input:tail:input_log] Gzip member completed: updating anchor from 0 to 10923918, resetting skip from 2147483784 to 0
  • Attached Valgrind output that shows no leaks or memory corruption was found
valgrind --leak-check=full ./bin/fluent-bit -v -c ./fluentbit.conf
...
==546544== 
==546544== HEAP SUMMARY:
==546544==     in use at exit: 0 bytes in 0 blocks
==546544==   total heap usage: 1,973,893 allocs, 1,973,893 frees, 2,123,730,947 bytes allocated
==546544== 
==546544== All heap blocks were freed -- no leaks are possible
==546544== 
==546544== For lists of detected and suppressed errors, rerun with: -s
==546544== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • [N/A] Documentation required for this feature

Backporting

  • [N/A] Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • New Features

    • Persistent resume for gzip-compressed logs, database schema migration to store resume state, and a new sentinel for "no DB id".
  • Bug Fixes

    • Improved seek/offset handling for compressed and plain files; correct behavior across truncation, rotation, removal and shutdown; reliable skipping of already-processed decompressed bytes.
  • Tests

    • Added DB + gzip tests covering resume loss, append/inotify append, rotation, and multi-resume.


coderabbitai bot commented Dec 9, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds gzip-aware resume state and DB-backed bookkeeping: new per-file fields (anchor_offset, skip_bytes, exclude_bytes, skipping_mode), DB schema migration and SQL updates for skip/anchor, seek/processing changes to respect gzip decompression and member boundaries, rewind DB offsets when buffered data remains, and new gzip/DB tests.

Changes

  • Core file & decompression logic (plugins/in_tail/tail_file.c): Add gzip-aware resume logic and includes; choose seek position from DB/anchor/read_from_head; initialize anchor/exclude/skipping after seek; apply skip/exclude logic during chunk processing; increment/reset skip_bytes across gzip members; rewind DB offset on removal when buffered data exists (unless decompression is active).
  • DB layer & SQL (plugins/in_tail/tail_db.c, plugins/in_tail/tail_sql.h): Add pragma-based migration to add skip/anchor columns; replace in-file query_status with cb_column_exists; extend db_file_exists to return skip and anchor; bind/read skip/anchor in insert/update/offset/rotate/delete paths; reset db_id to FLB_TAIL_DB_ID_NONE after delete.
  • File struct & internal headers (plugins/in_tail/tail_file_internal.h, plugins/in_tail/tail.h): Add per-file fields anchor_offset (int64_t), skip_bytes (uint64_t), exclude_bytes (uint64_t), skipping_mode (int); add macro FLB_TAIL_DB_ID_NONE.
  • FS event / stat handling (plugins/in_tail/tail_fs_inotify.c, plugins/in_tail/tail_fs_stat.c): On truncation (size_delta < 0) initialize anchor_offset, skip_bytes, exclude_bytes, and skipping_mode alongside offset and buf_len.
  • File removal & counters (plugins/in_tail/tail_file.c, remove/adjust_counters): On removal, if buffered data exists and no decompression is active, rewind file->offset by buf_len (floor 0) and persist it to the DB; on truncation reset anchor/skip/exclude/skipping and update the DB when enabled.
  • SQL definitions (plugins/in_tail/tail_sql.h): Add skip and anchor columns (DEFAULT 0) to in_tail_files; update SQL_INSERT_FILE and SQL_UPDATE_OFFSET to include/handle skip and anchor.
  • Tests & helpers (tests/runtime/in_tail.c): Add raw write helper, gzip create/append utilities, gzip-resume inspection and wait utility; add DB/gzip resume, append and rotation tests and register them in TEST_LIST.

Sequence Diagram(s)

sequenceDiagram
    participant Disk as Disk File
    participant Reader as in_tail Reader
    participant Decompress as Gzip Decompressor
    participant Buffer as In-memory Buffer
    participant DB as SQLite DB

    Disk->>Reader: read compressed/raw bytes (advance raw offset)
    Reader->>Buffer: append raw bytes
    alt decompression_context (gzip)
        Buffer->>Decompress: feed compressed bytes
        Decompress-->>Buffer: decompressed data (may span members)
        Buffer->>Buffer: apply exclude_bytes / skipping_mode (drop initial decompressed bytes)
        alt gzip member boundary reached
            Decompress->>Reader: notify member end
            Reader->>Reader: set anchor_offset (member start), reset skip_bytes
        end
        Reader->>DB: persist raw offset, skip, anchor
    else no decompression
        Buffer->>DB: persist raw offset
    end
    alt Shutdown with buffered unprocessed data and no decompression
        Reader->>DB: rewind offset (offset -= buf_len) and persist
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Pay attention to:
    • DB migration and SQL binding/column ordering (plugins/in_tail/tail_db.c, plugins/in_tail/tail_sql.h).
    • Seek selection and post-seek initialization for compressed streams (plugins/in_tail/tail_file.c).
    • Lifecycle and transitions of skip_bytes / exclude_bytes / skipping_mode across reads and gzip-member boundaries.
    • Correctness of offset rewind logic in flb_tail_file_remove and DB synchronization on shutdown.
    • New test helpers and gzip test correctness in tests/runtime/in_tail.c.

Suggested labels

backport to v4.0.x, backport to v4.1.x

Suggested reviewers

  • edsiper
  • cosmo0920
  • leonardo-albertovich
  • koleini
  • fujimotos

Poem

🐇
I nibble compressed bytes at dawn,
I mark anchors where members yawn.
When partial lines hide from sight,
I hop back offsets through the night.
Hops, skips, and crumbs — I make logs right.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 41.38%, which is insufficient; the required threshold is 80.00%. Resolution: you can run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (4 passed)

  • Description Check (✅ Passed): Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title check (✅ Passed): The PR title accurately summarizes the main objectives: fixing data loss with buffered data on shutdown and adding gzip file handling.
  • Linked Issues check (✅ Passed): The PR comprehensively addresses all coding requirements from issue #11265: implements offset rewinding for uncompressed files, handles gzip with schema changes and resume logic, resets db_id to prevent resurrection, and adds DB migration for new columns.
  • Out of Scope Changes check (✅ Passed): All changes are scoped to addressing the data loss issue: buffer handling, gzip resume logic, DB schema migration, and test coverage. No unrelated modifications were detected.
📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8640940 and fdd8174.

📒 Files selected for processing (6)
  • plugins/in_tail/tail_db.c (11 hunks)
  • plugins/in_tail/tail_file.c (11 hunks)
  • plugins/in_tail/tail_file_internal.h (1 hunks)
  • plugins/in_tail/tail_fs_inotify.c (1 hunks)
  • plugins/in_tail/tail_fs_stat.c (1 hunks)
  • plugins/in_tail/tail_sql.h (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • plugins/in_tail/tail_fs_inotify.c
  • plugins/in_tail/tail_file_internal.h
  • plugins/in_tail/tail_sql.h
🧰 Additional context used
🧠 Learnings (12)
📓 Common learnings
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.

Applied to files:

  • plugins/in_tail/tail_fs_stat.c
  • plugins/in_tail/tail_file.c
  • plugins/in_tail/tail_db.c
📚 Learning: 2025-09-22T15:59:55.794Z
Learnt from: nicknezis
Repo: fluent/fluent-bit PR: 10882
File: plugins/out_http/http.c:112-116
Timestamp: 2025-09-22T15:59:55.794Z
Learning: When users consider bug fixes out of scope for their focused PRs, it's appropriate to create separate GitHub issues to track those concerns rather than expanding the current PR scope.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.

Applied to files:

  • plugins/in_tail/tail_file.c
🧬 Code graph analysis (2)
plugins/in_tail/tail_file.c (1)
plugins/in_tail/tail_db.c (1)
  • flb_tail_db_file_offset (366-399)
plugins/in_tail/tail_db.c (2)
src/flb_sqldb.c (2)
  • flb_sqldb_query (151-166)
  • flb_sqldb_close (129-149)
plugins/in_tail/tail_file.h (1)
  • flb_tail_target_file_name_cmp (64-116)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-without-cxx (3.31.6)
  • GitHub Check: PR - fuzzing test
  • GitHub Check: pr-compile-centos-7
🔇 Additional comments (16)
plugins/in_tail/tail_fs_stat.c (1)

139-142: LGTM! Consistent gzip state reset on truncation.

The initialization of anchor_offset, skip_bytes, exclude_bytes, and skipping_mode on file truncation is correct and consistent with the corresponding logic in tail_fs_inotify.c and adjust_counters() in tail_file.c.

plugins/in_tail/tail_file.c (8)

47-47: LGTM!

Required include for the decompression state constants used in the gzip member boundary detection logic.


1031-1096: Well-structured gzip-aware DB position handling.

The logic correctly handles three scenarios:

  1. Gzip files with persisted anchor/skip: seeks to anchor_offset and enables skipping mode
  2. Normal files or legacy gzip entries: uses the stored offset
  3. No DB position with read_from_head=off: seeks to EOF

The comment at lines 1089-1094 appropriately documents the multi-member gzip limitation where skip_bytes only tracks bytes within the current member. This is a known limitation acknowledged in the PR discussion.


1130-1138: LGTM!

Correctly initializes gzip resume fields for compressed files without DB persistence. Setting stream_offset = 0 is intentional as there's no prior decompressed byte count to track.


1329-1338: LGTM!

Good initialization of the gzip resume fields. Using FLB_TAIL_DB_ID_NONE instead of a magic 0 improves code clarity and consistency with the deletion path.


1509-1548: Correct offset rewind implementation for uncompressed files.

The logic properly:

  1. Rewinds offset by buf_len for uncompressed files to prevent data loss on restart
  2. Logs a debug message documenting the acknowledged gzip limitation
  3. Uses db_id > FLB_TAIL_DB_ID_NONE check to avoid resurrecting deleted DB entries (addressing the past review concern)

The gzip limitation (lines 1537-1541) is intentional and documented in the PR objectives—mapping decompressed buffer positions back to compressed offsets is infeasible with streaming decompression.


1668-1671: LGTM!

Correctly resets all gzip resume state fields on file truncation, consistent with the truncation handling in tail_fs_stat.c and tail_fs_inotify.c.


1872-1892: Correct implementation of decompressed data skipping.

The skip logic properly handles:

  1. Full skip: when exclude_bytes >= decompressed_data_length, decrement exclude_bytes by decompressed_data_length and discard all data
  2. Partial skip: calculate remaining bytes, use memmove for the overlapping buffer shift, clear skipping_mode

Using memmove is correct here since source and destination overlap.
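The two cases can be sketched as a standalone helper. `apply_exclude()` is a hypothetical name. Clearing `skipping_mode` when a full skip lands exactly on zero is an assumption of this sketch; the review does not state it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Drop the first *exclude_bytes of a freshly decompressed chunk.
 * Returns how many bytes remain at the front of buf to be processed.
 */
static size_t apply_exclude(char *buf, size_t len,
                            uint64_t *exclude_bytes, int *skipping_mode)
{
    size_t keep;

    assert(buf != NULL || len == 0);

    if (!*skipping_mode || len == 0) {
        return len;
    }

    if (*exclude_bytes >= len) {
        /* Full skip: the whole chunk was processed before the restart. */
        *exclude_bytes -= len;
        if (*exclude_bytes == 0) {
            *skipping_mode = 0;
        }
        return 0;
    }

    /* Partial skip: shift the unseen tail to the front. Source and
     * destination overlap, so memmove (not memcpy) is required. */
    keep = len - (size_t) *exclude_bytes;
    memmove(buf, buf + (size_t) *exclude_bytes, keep);
    *exclude_bytes = 0;
    *skipping_mode = 0;
    return keep;
}
```

Take the numbers from the PR's debug log: exclude=1014921, followed by chunks of 999999 and 15809 bytes. The first call discards the whole chunk and leaves 14922 bytes to exclude. The second call keeps the final 887 bytes.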


1933-1954: Solid gzip member boundary detection and anchor update.

The logic correctly:

  1. Tracks decompressed bytes within the current member via skip_bytes
  2. Detects member completion when the decompressor expects a new header and all buffers are empty
  3. Updates anchor_offset to the current raw file position for safe resume

As noted in the PR discussion (by cosmo0920), there's a known corner case with multi-member gzip + multiline where a shutdown between the in-memory skip_bytes increment and the DB persist can cause small duplication on restart. This is technically difficult to eliminate and is appropriately documented as a limitation rather than a bug.

plugins/in_tail/tail_db.c (7)

28-34: LGTM!

Clean callback implementation for detecting query results. This addresses the previous review feedback about using pragma_table_info for reliable column existence detection.


61-107: Robust schema migration using pragma_table_info.

This addresses the previous review feedback:

  • Uses pragma_table_info to reliably detect column existence
  • Properly distinguishes between query failures (returns NULL with error log) and missing columns (triggers migration)
  • Using flb_plg_debug for migration messages is appropriate per the PR discussion
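The probe-then-ALTER pattern can be sketched without a database handle. `build_migration_sql()` is hypothetical. The callback body is a guess at what `cb_column_exists` does, based on the review's description. The `INTEGER DEFAULT 0` column type is also an assumption; the summary only states DEFAULT 0. In real code, the probe would be run with `sqlite3_exec()`, or presumably Fluent Bit's `flb_sqldb_query()` wrapper, passing the callback and a pointer to a zeroed flag. The ALTER would run only if the flag is still 0 afterwards. Querying `pragma_table_info()` as a table-valued function requires SQLite 3.16.0 or later.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * sqlite3_exec()-style row callback. It is invoked once per result row,
 * so being called at all means the probed column exists.
 */
static int cb_column_exists(void *data, int argc, char **argv, char **cols)
{
    int *found = data;

    (void) argc;
    (void) argv;
    (void) cols;
    *found = 1;
    return 0; /* keep iterating */
}

/* Build the existence probe and the migration statement for one column.
 * The column name is pasted into SQL, so it must be a trusted constant. */
static int build_migration_sql(const char *column,
                               char *probe, size_t probe_size,
                               char *alter, size_t alter_size)
{
    int n;

    if (column == NULL || strlen(column) == 0) {
        return -1;
    }
    n = snprintf(probe, probe_size,
                 "SELECT 1 FROM pragma_table_info('in_tail_files') "
                 "WHERE name = '%s';", column);
    if (n < 0 || (size_t) n >= probe_size) {
        return -1;
    }
    n = snprintf(alter, alter_size,
                 "ALTER TABLE in_tail_files ADD COLUMN %s INTEGER DEFAULT 0;",
                 column);
    if (n < 0 || (size_t) n >= alter_size) {
        return -1;
    }
    return 0;
}
```

Because an existing column produces a row and a missing one produces none, a query failure can be told apart from "column missing" by the return code of the exec call rather than the flag.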

182-232: LGTM! Proper type handling for skip/anchor columns.

The extended signature correctly uses:

  • int64_t for offset and anchor (matching sqlite3_column_int64 return type)
  • uint64_t for skip (matching file->skip_bytes type)

This addresses the previous review concern about potential truncation on platforms where off_t is 32-bit. The added cleanup at lines 204-205 before returning on error is also good practice.


262-263: LGTM!

Correctly binds skip_bytes and anchor_offset to the INSERT statement parameters.


323-359: LGTM! Correct restoration of gzip resume state from DB.

The logic properly:

  1. Retrieves skip_bytes and anchor_offset from the database
  2. Initializes skipping_mode and exclude_bytes when skip_bytes > 0, enabling the skip logic in flb_tail_file_chunk()
  3. Uses correct types matching the db_file_exists signature

373-375: LGTM!

Binding order correctly matches the SQL_UPDATE_OFFSET statement: offset, skip, and anchor for the SET clause, then db_id for the WHERE clause.


444-444: Critical fix: Reset db_id to prevent DB entry resurrection.

Setting db_id = FLB_TAIL_DB_ID_NONE after deletion works in conjunction with the db_id > FLB_TAIL_DB_ID_NONE check in flb_tail_file_remove() to prevent the bug where a deleted file's DB entry could be recreated if offset rewinding occurs after deletion.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7ded9ae and 71208f6.

📒 Files selected for processing (1)
  • plugins/in_tail/tail_file.c (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.

Applied to files:

  • plugins/in_tail/tail_file.c
🧬 Code graph analysis (1)
plugins/in_tail/tail_file.c (1)
plugins/in_tail/tail_db.c (1)
  • flb_tail_db_file_offset (290-321)
🪛 Cppcheck (2.18.0)
plugins/in_tail/tail_file.c

[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.

(normalCheckLevelMaxBranches)


[information] Too many #ifdef configurations - cppcheck only checks 12 configurations. Use --force to check all configurations. For more details, use --enable=information.

(toomanyconfigs)

Copilot AI left a comment

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
plugins/in_tail/tail_file.c (1)

1471-1471: Remove trailing whitespace.

There are trailing spaces after #endif on this line.

Apply this diff:

-#endif        
+#endif
🧹 Nitpick comments (1)
plugins/in_tail/tail_file.c (1)

1451-1472: Consider adding a warning for compressed files with buffered data.

While the fix correctly excludes compressed files (where rewinding isn't possible due to gzip decompression state requirements), users monitoring .gz files may be unaware that buffered data can still be lost on shutdown. Consider adding a debug or warning message similar to:

if (file->buf_len > 0) {
    if (file->decompression_context != NULL) {
        flb_plg_debug(ctx->ins, 
                      "inode=%"PRIu64" compressed file %s has %lu bytes buffered; "
                      "rewind not supported for compressed files",
                      file->inode, file->name, (unsigned long)file->buf_len);
    }
    else {
        /* existing rewind logic */
        ...
    }
}

This would help users understand the limitation without affecting correctness.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 71208f6 and f63f1f4.

📒 Files selected for processing (1)
  • plugins/in_tail/tail_file.c (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.

Applied to files:

  • plugins/in_tail/tail_file.c
🧬 Code graph analysis (1)
plugins/in_tail/tail_file.c (1)
plugins/in_tail/tail_db.c (1)
  • flb_tail_db_file_offset (290-321)
🪛 Cppcheck (2.18.0)
plugins/in_tail/tail_file.c

[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.

(normalCheckLevelMaxBranches)


[information] Too many #ifdef configurations - cppcheck only checks 12 configurations. Use --force to check all configurations. For more details, use --enable=information.

(toomanyconfigs)

🔇 Additional comments (1)
plugins/in_tail/tail_file.c (1)

1451-1472: LGTM! Rewind logic correctly prevents data loss for regular files.

The implementation properly addresses the data loss issue when Fluent Bit shuts down with unprocessed buffered data:

  • The condition file->buf_len > 0 && file->decompression_context == NULL correctly identifies regular files with unprocessed data while excluding compressed files (where offset and buf_len have incompatible units).
  • The rewind arithmetic offset -= buf_len (clamped to 0) correctly repositions the file offset to the start of unprocessed data.
  • The DB persistence call ensures the rewound offset survives restarts.

This addresses the core issue described in #11265 where partial lines and buffered data were lost on restart.
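The rewind rule described above can be sketched as a small standalone function. This is a minimal sketch, not the PR's code: struct tail_file_sketch and rewind_on_exit are hypothetical names, int64_t stands in for off_t, and is_compressed stands in for the decompression_context != NULL check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of a tailed file. The real
 * struct flb_tail_file has many more fields and uses off_t for offset. */
struct tail_file_sketch {
    int64_t offset;        /* raw bytes already read from the file */
    size_t  buf_len;       /* bytes sitting in the buffer, not yet processed */
    int     is_compressed; /* stands in for decompression_context != NULL */
};

/* Move the offset back over unprocessed buffered bytes so they are
 * re-read after a restart. Returns 1 when a rewind was applied (the caller
 * should persist the new offset), 0 when it was skipped. */
static int rewind_on_exit(struct tail_file_sketch *file)
{
    assert(file != NULL);

    /* Nothing buffered, or compressed input: for gzip, buf_len counts
     * decompressed bytes, so it cannot be subtracted from a raw offset. */
    if (file->buf_len == 0 || file->is_compressed) {
        return 0;
    }

    /* Clamp at 0 rather than letting the offset go negative. */
    if (file->offset > 0 && (uint64_t) file->offset > (uint64_t) file->buf_len) {
        file->offset -= (int64_t) file->buf_len;
    }
    else {
        file->offset = 0;
    }
    return 1;
}
```

In the PR, the equivalent step is followed by flb_tail_db_file_offset() under FLB_HAVE_SQLDB, so the rewound offset is what gets persisted.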

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
plugins/in_tail/tail_file.c (2)

1451-1472: Core rewind logic looks correct; consider adding warning for compressed files.

The check for file->decompression_context == NULL properly prevents the compressed-file arithmetic bug flagged in earlier reviews. For regular files, the offset and buffer length are in the same units, so the subtraction and DB update are valid.

However, when compressed files have buffered decompressed data at shutdown, that data will be silently lost (since rewinding is skipped). Consider logging a warning in that case so users are aware:

 if (file->buf_len > 0 && file->decompression_context == NULL) {
     /*
      * If there is data in the buffer, it means it was not processed.
      * We must rewind the offset to ensure this data is re-read on restart.
      */
     off_t old_offset = file->offset;

     if (file->offset > file->buf_len) {
         file->offset -= file->buf_len;
     } else {
         file->offset = 0;
     }

     flb_plg_debug(ctx->ins, "inode=%"PRIu64" rewind offset for %s: old=%"PRId64" new=%"PRId64" (buf_len=%lu)",
                   file->inode, file->name, old_offset, file->offset, (unsigned long)file->buf_len);

 #ifdef FLB_HAVE_SQLDB
     if (ctx->db) {
         flb_tail_db_file_offset(file, ctx);
     }
 #endif
+}
+else if (file->buf_len > 0 && file->decompression_context != NULL) {
+    flb_plg_warn(ctx->ins, "inode=%"PRIu64" cannot rewind compressed file %s; "
+                 "%lu decompressed bytes in buffer may be lost on restart",
+                 file->inode, file->name, (unsigned long)file->buf_len);
 }

1471-1471: Remove trailing whitespace.

Minor formatting issue: extra spaces after #endif.

-#endif        
+#endif
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f63f1f4 and 76f3c05.

📒 Files selected for processing (1)
  • plugins/in_tail/tail_file.c (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.

Applied to files:

  • plugins/in_tail/tail_file.c
🧬 Code graph analysis (1)
plugins/in_tail/tail_file.c (1)
plugins/in_tail/tail_db.c (1)
  • flb_tail_db_file_offset (290-321)
🪛 Cppcheck (2.18.0)
plugins/in_tail/tail_file.c

[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.

(normalCheckLevelMaxBranches)


[information] Too many #ifdef configurations - cppcheck only checks 12 configurations. Use --force to check all configurations. For more details, use --enable=information.

(toomanyconfigs)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugins/in_tail/tail_db.c (1)

258-263: Verify SQL_INSERT_FILE parameter order and address skip_bytes type mismatch.

The bindings at lines 258-263 correctly match SQL_INSERT_FILE's parameter order (name, offset, inode, created, skip, anchor). However, skip_bytes is defined as uint64_t but bound via sqlite3_bind_int64, which expects a signed sqlite3_int64. Values above INT64_MAX are unlikely for a byte offset, but consider making the conversion explicit: either cast to int64_t, or add a comment at the sqlite3_bind_int64 call documenting the assumption that skip_bytes stays within the signed 64-bit range.

🧹 Nitpick comments (1)
plugins/in_tail/tail_file.c (1)

1027-1141: Document multi-member gzip stream_offset limitation more prominently.

The comment at lines 1090-1095 acknowledges that stream_offset = skip_bytes is only correct for single-member gzip files. For multi-member gzip, skip_bytes resets at member boundaries, so stream_offset won't reflect total decompressed bytes from all prior members. This could affect offset_key accuracy when users concatenate multiple gzip streams.

Consider either:

  1. Adding a more prominent warning in the code or documentation that offset_key for multi-member gzip reflects position within the current member, not total decompressed bytes across all members.
  2. Tracking total decompressed bytes separately if multi-member gzip is a common use case.

Otherwise, the seek logic correctly handles DB-backed resume for both gzip and plain files, with proper initialization of stream_offset, exclude_bytes, and skipping_mode.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1263901 and 8640940.

📒 Files selected for processing (6)
  • plugins/in_tail/tail_db.c (11 hunks)
  • plugins/in_tail/tail_file.c (9 hunks)
  • plugins/in_tail/tail_file_internal.h (1 hunks)
  • plugins/in_tail/tail_fs_inotify.c (1 hunks)
  • plugins/in_tail/tail_fs_stat.c (1 hunks)
  • plugins/in_tail/tail_sql.h (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • plugins/in_tail/tail_file_internal.h
  • plugins/in_tail/tail_fs_stat.c
🧰 Additional context used
🧠 Learnings (12)
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.

Applied to files:

  • plugins/in_tail/tail_db.c
  • plugins/in_tail/tail_file.c
  • plugins/in_tail/tail_fs_inotify.c
📚 Learning: 2025-09-22T15:59:55.794Z
Learnt from: nicknezis
Repo: fluent/fluent-bit PR: 10882
File: plugins/out_http/http.c:112-116
Timestamp: 2025-09-22T15:59:55.794Z
Learning: When users consider bug fixes out of scope for their focused PRs, it's appropriate to create separate GitHub issues to track those concerns rather than expanding the current PR scope.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.

Applied to files:

  • plugins/in_tail/tail_file.c
🧬 Code graph analysis (2)
plugins/in_tail/tail_db.c (2)
src/flb_sqldb.c (2)
  • flb_sqldb_query (151-166)
  • flb_sqldb_close (129-149)
plugins/in_tail/tail_file.h (1)
  • flb_tail_target_file_name_cmp (64-116)
plugins/in_tail/tail_file.c (1)
plugins/in_tail/tail_db.c (1)
  • flb_tail_db_file_offset (366-399)
🔇 Additional comments (16)
plugins/in_tail/tail_db.c (6)

28-34: LGTM!

The column-existence callback is correctly implemented. It properly sets the flag when a row is returned from the pragma query, enabling reliable column detection for schema migration.


61-108: LGTM!

The migration logic correctly uses pragma_table_info to detect column existence, distinguishing real errors from missing columns. The pattern is consistent for both skip and anchor columns, with proper error handling and logging.
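The detection pattern can be sketched with the callback signature that SQLite's sqlite3_exec() expects. This is a sketch under assumptions: column_exists_cb and build_probe are hypothetical names, and the table name in_tail_files is assumed rather than taken from this diff. Because SQLite calls the callback once per result row, being called at all means the column exists.

```c
#include <assert.h>
#include <stdio.h>

/* Callback with the signature sqlite3_exec() expects. It only runs when
 * the probe query returns a row, i.e. when the column exists. */
static int column_exists_cb(void *data, int argc, char **argv, char **cols)
{
    int *found = data;

    (void) argc;
    (void) argv;
    (void) cols;
    *found = 1;
    return 0;   /* a non-zero return would abort sqlite3_exec() */
}

/* Build the existence probe. Only for trusted, constant identifiers.
 * Returns 0 on success, -1 if the buffer is too small. */
static int build_probe(char *buf, size_t size,
                       const char *table, const char *column)
{
    int n = snprintf(buf, size,
                     "SELECT 1 FROM pragma_table_info('%s') WHERE name = '%s';",
                     table, column);

    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

With a real handle, sqlite3_exec(db, query, column_exists_cb, &found, NULL) leaves found at 0 when the column is missing; only then would the migration run ALTER TABLE ... ADD COLUMN skip INTEGER DEFAULT 0. A query error shows up as a non-OK return code rather than as found == 0, which keeps real errors distinct from missing columns.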


318-363: LGTM!

The function correctly restores skip_bytes and anchor_offset from the database and initializes the runtime-only exclude_bytes and skipping_mode fields based on the persisted skip state. The logic properly handles both cases (skip > 0 and skip == 0).


424-446: LGTM!

Resetting db_id to FLB_TAIL_DB_ID_NONE after deletion prevents accidental resurrection of deleted DB entries when flb_tail_db_file_offset is called later. This is an important safeguard for data integrity.


365-398: The parameter binding order in flb_tail_db_file_offset correctly matches the SQL_UPDATE_OFFSET statement: offset (parameter 1), skip (parameter 2), anchor (parameter 3), and id (parameter 4).


182-221: Column indices for skip and anchor are correct.

The code correctly reads skip from column index 6 and anchor from column index 7, matching the table schema defined in SQL_CREATE_FILES. When SELECT * is executed, SQLite returns columns in the order they appear in the table definition: id (0), name (1), offset (2), inode (3), created (4), rotated (5), skip (6), and anchor (7).

plugins/in_tail/tail_fs_inotify.c (1)

259-281: LGTM!

The truncation handler correctly resets all gzip resume state fields (anchor_offset, skip_bytes, exclude_bytes, skipping_mode) alongside the file offset and buffer, ensuring a clean state after truncation. This initialization is consistent with similar handling in tail_fs_stat.c and adjust_counters in tail_file.c.
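The reset amounts to zeroing the raw position and every gzip resume field together. A minimal sketch with hypothetical names (struct trunc_state, reset_on_truncate), assuming reading restarts at the beginning of the truncated file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the per-file state touched on truncation. */
struct trunc_state {
    int64_t  offset;
    size_t   buf_len;
    uint64_t anchor_offset;
    uint64_t skip_bytes;
    uint64_t exclude_bytes;
    int      skipping_mode;
};

/* After truncation nothing derived from the old contents is valid, so the
 * raw position and all gzip resume fields are cleared together. */
static void reset_on_truncate(struct trunc_state *s)
{
    assert(s != NULL);
    s->offset        = 0;   /* assumed: reading restarts at the beginning */
    s->buf_len       = 0;   /* buffered bytes belonged to the old contents */
    s->anchor_offset = 0;
    s->skip_bytes    = 0;
    s->exclude_bytes = 0;
    s->skipping_mode = 0;
}
```

Resetting only offset and buf_len would leave a stale skip_bytes that could later be persisted against the new contents, which is why all handlers reset the same set of fields.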

plugins/in_tail/tail_sql.h (3)

30-40: LGTM!

The table schema correctly adds skip and anchor columns with INTEGER DEFAULT 0, ensuring backward compatibility with existing databases. Column indices (skip=6, anchor=7) align with the reads in db_file_exists.


45-47: LGTM!

The SQL_INSERT_FILE statement correctly includes skip and anchor in both the column list and VALUES clause. The parameter order matches the binding sequence in db_file_insert (tail_db.c lines 258-263).


52-53: LGTM!

The SQL_UPDATE_OFFSET statement correctly updates all three position-tracking fields (offset, skip, anchor) atomically. The parameter order matches the binding sequence in flb_tail_db_file_offset (tail_db.c lines 372-375).

plugins/in_tail/tail_file.c (6)

47-47: LGTM!

Including flb_compression.h is appropriate for the gzip decompression functionality added in this PR.


1328-1338: LGTM!

All gzip resume fields (anchor_offset, skip_bytes, exclude_bytes, skipping_mode) are correctly initialized to zero/false when a new file is appended, ensuring clean initial state.


1509-1548: LGTM!

The offset rewind logic correctly handles buffered data on shutdown:

  • For non-compressed files: rewinds the offset by buf_len (with bounds checking) so unprocessed data is re-read on restart.
  • For compressed files: logs a warning explaining that accurate rewinding is infeasible with streaming decompression.
  • DB updates are properly guarded by db_id > FLB_TAIL_DB_ID_NONE to prevent resurrecting deleted entries.

This addresses the data loss issue described in the PR objectives.


1657-1679: LGTM!

The truncation handler in adjust_counters correctly resets all gzip resume state fields (anchor_offset, skip_bytes, exclude_bytes, skipping_mode) when a file is truncated, consistent with the inotify and stat-based truncation handlers.


1871-1891: LGTM!

The skip logic during decompression correctly handles resuming from a mid-stream position:

  • When exclude_bytes >= decompressed_data_length, all newly decompressed data is skipped and exclude_bytes is decremented.
  • When exclude_bytes < decompressed_data_length, the remaining bytes are shifted to the buffer start using memmove, and skipping_mode is cleared.

This enables accurate gzip resume from the persisted skip_bytes position.
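The two branches above can be sketched as one function operating on a freshly decompressed chunk. This is a simplified sketch: skip_decompressed is a hypothetical name, and it assumes the new data starts at buf, whereas the plugin appends decompressed output to an existing buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Drop the first *exclude_bytes bytes of freshly decompressed data.
 * buf holds len new bytes; returns how many bytes remain at buf. */
static size_t skip_decompressed(char *buf, size_t len,
                                uint64_t *exclude_bytes, int *skipping_mode)
{
    assert(buf != NULL && exclude_bytes != NULL && skipping_mode != NULL);

    if (!*skipping_mode) {
        return len;
    }

    if (*exclude_bytes >= len) {
        /* The whole chunk is still before the resume point. */
        *exclude_bytes -= len;
        return 0;
    }

    /* Resume point falls inside this chunk: keep only the tail. */
    size_t skip = (size_t) *exclude_bytes;
    size_t keep = len - skip;

    memmove(buf, buf + skip, keep);   /* regions may overlap */
    *exclude_bytes = 0;
    *skipping_mode = 0;
    return keep;
}
```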


1932-1953: LGTM!

The gzip member boundary handling correctly tracks position using the anchor/skip pattern:

  • skip_bytes is incremented by processed_bytes to track position within the current member.
  • When a member completes (decompressor transitions to EXPECTING_HEADER state and all buffers are empty), anchor_offset advances to the current raw file position and skip_bytes resets to 0.

This enables resume at member boundaries for multi-member gzip files, addressing the data loss issue for compressed inputs mentioned in the PR objectives.
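The boundary rule can be sketched as follows. All names here are hypothetical: enum gz_state stands in for the decompressor's own states, and pending_input and buf_len stand in for "all buffers are empty". The anchor only moves when the decoder is waiting for a new header and nothing from the finished member is still buffered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder states; the PR's decompressor has its own names. */
enum gz_state { GZ_EXPECTING_HEADER, GZ_IN_MEMBER };

struct member_pos {
    uint64_t anchor_offset; /* raw offset where the current member begins */
    uint64_t skip_bytes;    /* decompressed bytes consumed inside it */
};

static void track_position(struct member_pos *p, uint64_t processed_bytes,
                           enum gz_state state, size_t pending_input,
                           size_t buf_len, uint64_t raw_offset)
{
    assert(p != NULL);
    p->skip_bytes += processed_bytes;

    /* A member is complete only when the decoder expects a new header and
     * neither compressed nor decompressed buffers hold leftover bytes. */
    if (state == GZ_EXPECTING_HEADER && pending_input == 0 && buf_len == 0) {
        p->anchor_offset = raw_offset;
        p->skip_bytes = 0;
    }
}
```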

@jinyongchoi
Contributor Author

Thanks for the detailed analysis. I fully agree with your opinion. Although there is a risk of duplication in case of abrupt shutdown, it is technically difficult to solve completely, and duplication is definitely better than data loss.

Also, regarding your question about the logs, I have changed the log level of the database migration messages to debug. This ensures they are not too noisy during normal operation while still being available for troubleshooting if needed.

Finally, should I add a note about the limitation (potential duplication on crash) to the documentation? I think adding a warning/note to the 'Database file' section would be helpful for users.

Let me know what you think!
Thanks!

Contributor

@cosmo0920 cosmo0920 left a comment

I found a small nitpick: your PR does not follow our coding style.
Please follow our style for defining variables.

@cosmo0920
Contributor

cosmo0920 commented Dec 19, 2025

I suppose we need a Note annotation in the official documentation, added in a corresponding documentation PR, that describes the possibility of database corruption. This may be a corner case, but it is technically hard to solve cleanly.

Previously, when tailing gzip files, there was no mechanism to persistently
store the uncompressed position ('skip_bytes'). This meant that upon restart,
the plugin could not correctly locate the reading position, identifying it as
a rotation or new file case, potentially leading to data loss.

To fix this, 'skip_bytes' is now stored in the database to persist the
uncompressed offset. Additionally, 'exclude_bytes' is introduced to track
runtime skipping without interfering with the persistent value.

The SQLite schema has been updated to include 'anchor_offset' and 'skip_bytes'
columns to support these features.

Signed-off-by: jinyong.choi <[email protected]>
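A short C sketch can illustrate the split between the persisted `skip_bytes` and the runtime-only `exclude_bytes`. The following are all assumptions for illustration: the structs, `gz_restore`, `gz_snapshot`, and the choice to restart raw reads at `anchor_offset`. The column names `anchor_offset` and `skip_bytes` come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical row as stored in the tail DB (column names from the commit). */
struct db_row_sketch {
    int64_t anchor_offset;
    int64_t skip_bytes;
};

/* Hypothetical in-memory resume state for one gzip file. */
struct gz_runtime_sketch {
    int64_t raw_offset;    /* where the next compressed read starts */
    int64_t anchor_offset; /* persisted: start of the current gzip member */
    int64_t skip_bytes;    /* persisted: decompressed bytes past the anchor */
    int64_t exclude_bytes; /* runtime-only: bytes still to discard */
    int     skipping_mode;
};

/* On restart, reopen at the member start and plan to re-skip skip_bytes. */
static void gz_restore(struct gz_runtime_sketch *rt, const struct db_row_sketch *row)
{
    assert(row->skip_bytes >= 0);
    rt->raw_offset    = row->anchor_offset;
    rt->anchor_offset = row->anchor_offset;
    rt->skip_bytes    = row->skip_bytes;  /* left intact for the next DB write */
    rt->exclude_bytes = row->skip_bytes;  /* this copy is consumed while skipping */
    rt->skipping_mode = row->skip_bytes > 0;
}

/* What would be written back to the DB at any point in time. */
static struct db_row_sketch gz_snapshot(const struct gz_runtime_sketch *rt)
{
    struct db_row_sketch row = { rt->anchor_offset, rt->skip_bytes };
    return row;
}
```

For example, restoring a row with `anchor_offset = 4096` and `skip_bytes = 700` reopens at raw offset 4096 and discards 700 decompressed bytes. A snapshot taken during or after that skip still reports `skip_bytes = 700` rather than a decremented value.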
@jinyongchoi
Contributor Author

Got it! I'll create a separate PR for the documentation.
Thanks!

jinyongchoi pushed a commit to jinyongchoi/fluent-bit-docs that referenced this pull request Dec 19, 2025
This commit adds a 'Data Reliability and Recovery' hint to the Tail input plugin documentation.

It clarifies the behavior of the database offset mechanism during unexpected shutdowns (e.g., system crash, power loss). Specifically, it explains that while Fluent Bit guarantees at-least-once delivery, there is a possibility of slight offset lag and minimal data duplication upon recovery. This ensures users understand that no data is lost even in these scenarios.

Related to: fluent/fluent-bit#11269

Signed-off-by: jinyong.choi <[email protected]>

Development

Successfully merging this pull request may close these issues.

in_tail: Data loss on exit/restart due to unhandled buffer (partial lines)

2 participants