fix: always wait_for_io to prevent crash when rows are scheduled but not loaded in decoder #5341
Recently, my collaborator was testing the Lance file format and encountered a crash with the following error message:
After some investigation, I confirmed that this is a bug.
The test used both `LanceFileVersion` `V2_0` and `V2_1`. The issue occurred with `V2_0`. If data is written in the `V2_0` format, there is a chance this bug will be triggered, even when reading with the current latest Lance reader. Considering that `V2_1` only became stable after v0.38.0, this bug is worth fixing.

The cause of this bug is in the `next_batch_task` function in `rust/lance-encoding/src/decoder.rs`, where `rows_scheduled` only represents the scheduled length, not the length of completed I/O. When a piece of data has only been scheduled but has not completed I/O, the above error occurs.

The specific flow of how the bug occurs is as follows:

1. `BatchDecodeIterator` calls `next_batch_task` for the first time, at which point `rows_scheduled` is initially zero, entering the `scheduled_need > 0` branch.
2. It calls `wait_for_io`: `rows_scheduled` gets updated, and the call synchronously waits only for the data needed (`to_take`). Note the key difference here: `rows_scheduled` is the scheduled length, but the call only waits for `to_take` rows of data. `rows_scheduled` may far exceed `to_take`, and the data beyond `to_take` may not have completed I/O yet.
3. On a later call to `next_batch_task`, `rows_scheduled` may already be a large value. If `to_take` is small, the call misses the `scheduled_need > 0` branch, completely skipping `wait_for_io`.
4. When `drain_batch` runs on those not-yet-loaded rows, it crashes.

The fix for this bug is also in the
`next_batch_task` function. The logic is to also perform `wait_for_io` on the `else` branch. If the data is actually ready, `wait_for_io` should not trigger new I/O or a context switch, so it has almost no negative impact.

I have added a new UT, `test_blocking_take_with_many_rows`, in `rust/lance-file/src/reader.rs`. Without this fix, the bug can be reproduced when the version is `V2_0`.
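To illustrate the race and the fix, here is a minimal, self-contained sketch (not the actual decoder code; the struct, fields, and the `always_wait` flag are hypothetical simplifications) that models `rows_scheduled` racing ahead of completed I/O, and shows how skipping `wait_for_io` on the `else` branch hands unloaded rows to `drain_batch`:

```rust
// A toy model of the decoder state: scheduling is instant, but I/O only
// "completes" inside wait_for_io. Not the real Lance implementation.
struct DecoderModel {
    rows_scheduled: u64, // rows handed to the scheduler
    rows_loaded: u64,    // rows whose I/O has actually completed
    always_wait: bool,   // true = behavior with this fix applied
}

impl DecoderModel {
    // Blocks until at least `rows` rows (capped at what was scheduled)
    // have completed I/O.
    fn wait_for_io(&mut self, rows: u64) {
        self.rows_loaded = self.rows_loaded.max(rows.min(self.rows_scheduled));
    }

    // Models next_batch_task: returns true iff the rows handed to
    // drain_batch (rows taken_so_far..taken_so_far+to_take) are loaded.
    fn next_batch_task(&mut self, to_take: u64, taken_so_far: u64) -> bool {
        let need = taken_so_far + to_take;
        if need > self.rows_scheduled {
            // scheduled_need > 0 branch: the scheduler over-schedules
            // (here by an arbitrary 100 rows) but we only wait for `need`.
            self.rows_scheduled = need + 100;
            self.wait_for_io(need);
        } else if self.always_wait {
            // The fix: wait even when nothing new needs scheduling.
            self.wait_for_io(need);
        }
        need <= self.rows_loaded
    }
}

fn main() {
    // Buggy variant: the second call sees rows 10..20 already scheduled,
    // skips wait_for_io, and would drain rows that never finished I/O.
    let mut buggy = DecoderModel { rows_scheduled: 0, rows_loaded: 0, always_wait: false };
    assert!(buggy.next_batch_task(10, 0));   // schedules 110, waits for 10: ok
    assert!(!buggy.next_batch_task(10, 10)); // skips wait_for_io: crash path

    // Fixed variant: the else branch also waits, so rows are always ready.
    let mut fixed = DecoderModel { rows_scheduled: 0, rows_loaded: 0, always_wait: true };
    assert!(fixed.next_batch_task(10, 0));
    assert!(fixed.next_batch_task(10, 10));
    println!("ok");
}
```

In the fixed variant, the second `wait_for_io` call is a no-op whenever the rows are already loaded, which matches the claim above that the fix adds essentially no overhead in the common case.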