Option to retry writes if on-disk checksums don't match by alchark · Pull Request #58 · yoctoproject/bmaptool

alchark · 2026-03-06T09:26:44Z

As highlighted in #56, sometimes successfully writing an image with bmaptool to a storage device doesn't mean all data safely reached the device, which may happen for a variety of reasons, such as:

The storage device has transient hardware faults resulting in silent data corruption. bmaptool will verify that the data it sends to the block device matches the source image and respective checksums, but that might still change by the time those bits hit the storage medium
The storage device gets removed immediately after the bmaptool operation finishes, but some data might still be in OS buffers from the time between when bmaptool calls its sync() and when its own output buffer is fully flushed and given to the OS kernel

Possibly others not mentioned here, too.

This pull request introduces an explicit flush of the Python output buffer before asking the OS kernel to sync data to disk to ensure no data is still "in flight" at the moment the sync is executed, and then adds an additional option to the bmaptool copy command to force explicit checking of data on disk against the checksums recorded in the bmap file, retrying writes of every block region that didn't match. This helps ensure the integrity of data on disk after the bmaptool copy operation.

If some data is still in the Python output buffer when the sync is called, it will not be taken into consideration by the OS sync and might get lost if the device is removed before all buffer contents reach the disk. Add an explicit flush before the sync to avoid it. Signed-off-by: Alexey Charkov <alchark@flipper.net>

Add a new `--checksum-retry` option to `bmaptool copy` that will retry writing each block (possibly multiple times) until the checksum of data on disk matches the expected one from the bmap file. This is useful for devices that have unreliable writes, such as some worn down USB card readers. Signed-off-by: Alexey Charkov <alchark@flipper.net>

Copilot

Pull request overview

This PR adds an opt-in “read-back verify and retry” capability to bmaptool copy, intended to catch cases where data written to a block device later reads back with a checksum mismatch. It also ensures the Python file buffer is flushed before issuing fsync().

Changes:

Add --checksum-retry[=N] CLI option and plumb the retry count into the copy implementation.
Extend BmapCopy/BmapBdevCopy to (optionally) read back written ranges, compare against bmap range checksums, and retry writes on mismatch.
Flush the Python destination buffer before fsync() to reduce “in flight” user-space buffered data during sync.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File	Description
`src/bmaptool/CLI.py`	Adds `--checksum-retry` argument, validates it, and opens block devices read/write when needed.
`src/bmaptool/BmapCopy.py`	Implements post-write readback verification + retry loop, threads range metadata through the batching pipeline, and adds an explicit `flush()` before `fsync()`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-06T09:33:28Z

@@ -740,6 +895,11 @@ def copy(self, sync=True, verify=True):
            blocks_written += end - start + 1
            bytes_written += len(buf)

+            # Track buffers for this range (for retry if needed)
+            if self._checksum_retry:
+                range_buffers[(start, end)] = buf
+                current_range_blocks.add((start, end))


When --checksum-retry is enabled, range_buffers stores every batch buffer for the current bmap range so it can be rewritten on mismatch. A single bmap range can be very large, so this can grow to (near) the full image size and cause excessive memory usage/OOM. To make retries safe, consider spooling the range data to a temporary file (or another bounded-storage approach) and replaying from there on mismatch, rather than holding all buffers in RAM.

Spooling the range to disk would dramatically increase complexity, so I believe that would be an overkill, and can introduce issues of its own (e.g. in case of insufficient storage space, but plentiful RAM).

Ensure the docstring wording matches the logic of the function, i.e. the checksums are validated per bmap range (and not per block) Signed-off-by: Alexey Charkov <alchark@flipper.net> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

…vided When the bmap file doesn't include range checksums, `--checksum-retry` becomes a no-op, so warn the user about it (at most once per file). Signed-off-by: Alexey Charkov <alchark@flipper.net>

This was a leftover from an earlier work in progress implementation, no longer needed in the current version, so remove it to avoid confusion. Signed-off-by: Alexey Charkov <alchark@flipper.net>

Add a new tests file runnable via `python3 -m unittest tests.test_checksum_retry` to verify the checksum-retry functionality. This includes tests for aligned and unaligned image sizes, correct handling of retries, and edge cases like single-byte image. Signed-off-by: Alexey Charkov <alchark@flipper.net>

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-06T13:14:07Z

+            # Check if we've moved to a new range - if so, verify the previous one
+            new_range = (range_first, range_last, range_chksum)
+            if self._checksum_retry and current_range and new_range != current_range:
+                # Verify the completed range before moving to the next one
+                prev_first, prev_last, prev_chksum = current_range
+                if prev_chksum and self._cs_type:
+                    self._verify_range_with_retry(
+                        prev_first, prev_last, prev_chksum, range_buffers
+                    )


The warning path for --checksum-retry when the bmap has no per-range checksums never triggers because callers only invoke _verify_range_with_retry() when range_chksum is truthy. As a result, --checksum-retry becomes a silent no-op for checksum-less bmaps (despite the CLI log message). Call _verify_range_with_retry() unconditionally when finishing a range (let it decide whether to warn/skip), or move the missing-checksum detection outside the if prev_chksum ... guard.

Copilot · 2026-03-06T13:14:07Z

+    def test_checksum_retry_default_and_explicit_values(self):
+        """Test that default retry=1 and explicit retry=1 behave the same."""
+        image_size = 8192
+        image_path, bmap_path = _create_test_image_with_bmap(image_size)
+
+        try:
+            # Test with default retry (should be 1 or string "1")
+            f_image = TransRead.TransRead(image_path)
+            f_dest = tempfile.NamedTemporaryFile("w+b", delete=False)
+            dest_path = f_dest.name
+            f_bmap = open(bmap_path, "r")
+
+            # checksum_retry=None means user didn't specify it, should not retry
+            writer = BmapCopy.BmapCopy(
+                f_image, f_dest, f_bmap, image_size, checksum_retry=None
+            )
+            writer.copy(sync=True, verify=True)


test_checksum_retry_default_and_explicit_values is mislabeled: it compares checksum_retry=None (disabled) vs checksum_retry=1 (enabled), not “default retry=1” vs “explicit retry=1”. Either update the test name/docstring to reflect the actual behavior being tested, or add a case that exercises the CLI-style default (flag present with implicit value 1).

Copilot · 2026-03-06T13:14:07Z

+import sys
+import tempfile
+import hashlib
+from bmaptool import BmapCreate, BmapCopy, TransRead, BmapHelpers


This test module imports sys, hashlib, and BmapHelpers but does not use them. If pylint/linters run in CI, this will raise unused-import warnings/errors; please remove the unused imports or use them.

Suggested change

import sys

import tempfile

import hashlib

from bmaptool import BmapCreate, BmapCopy, TransRead, BmapHelpers

import tempfile

from bmaptool import BmapCreate, BmapCopy, TransRead

…-retry If the checksum-retry feature is enabled, the destination may return a full last block when reading back even if the source image size is not a multiple of the block size, resulting in a checksum mismatch. Adjust the readback logic to only access as much data as the source image size to make checksums comparable, even if a full block is written/read back. Signed-off-by: Alexey Charkov <alchark@flipper.net>

…enabled This helper already checks if each range has a checksum, so call it unconditionally when checksum-retry is enabled. This simplifies the code and ensures that the user gets a warning if checksums are missing. Signed-off-by: Alexey Charkov <alchark@flipper.net>

These were left over from the file used as a reference when writing the new tests and are unused, so remove them to clean up the code. Signed-off-by: Alexey Charkov <alchark@flipper.net>

The docstring and function name of test_checksum_retry_default_and_explicit_values() were confusing, as the default value for checksum retry is disabled. Update the test and its description to explicitly compare the images produced with checksum retry vs. without checksum retry to make sure they are the same. Signed-off-by: Alexey Charkov <alchark@flipper.net>

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-06T14:23:13Z

+                "checksum-retry requested but bmap file does not contain checksums; "
+                "skipping verification for blocks %d-%d and beyond"


The warning in _verify_range_with_retry() says the bmap file lacks checksums, but this path can also happen when no bmap file is provided at all (e.g., API use or --nobmap). Consider adjusting the message/logic to distinguish “no bmap” from “bmap without checksums” to avoid misleading users.

Suggested change

"checksum-retry requested but bmap file does not contain checksums; "

"skipping verification for blocks %d-%d and beyond"

"checksum-retry requested but no checksum information is available "

"(no bmap file or bmap without checksums); skipping verification "

"for blocks %d-%d and beyond"

Copilot · 2026-03-06T14:23:13Z

+    def test_checksum_retry_disabled_vs_enabled(self):
+        """Test that checksum_retry=None (disabled) vs checksum_retry=1 (enabled) produce identical results."""
+        image_size = 8192


The docstring/comment for this test says "checksum_retry=None (disabled - no verification performed)", but the test still calls copy(..., verify=True) (which performs in-copy checksum verification when the bmap provides checksums). Consider rewording to clarify this only disables post-write readback verification/retry.

Copilot · 2026-03-06T14:23:14Z

+        try:
+            # Test with checksum-retry enabled
+            f_image = TransRead.TransRead(image_path)
+            f_dest = tempfile.NamedTemporaryFile("w+b", delete=False)
+            dest_path = f_dest.name
+            f_bmap = open(bmap_path, "r")
+
+            writer = BmapCopy.BmapCopy(
+                f_image, f_dest, f_bmap, image_size, checksum_retry=1
+            )
+            # copy() verifies each bmap range's checksum during operation
+            writer.copy(sync=True, verify=True)
+
+            # Verify destination has correct size
+            dest_size = os.path.getsize(dest_path)
+            self.assertEqual(dest_size, image_size)
+
+            f_image.close()
+            f_dest.close()
+            f_bmap.close()
+            os.unlink(dest_path)
+        finally:


Several tests (e.g. this one) delete dest_path only on the success path. If an assertion or writer.copy() fails, the temporary destination file can be left behind. Consider moving destination cleanup into the finally block (or using self.addCleanup) to ensure cleanup on failures too.

Copilot · 2026-03-06T14:23:14Z

+    # Validate checksum_retry argument
+    checksum_retry = None
+    if args.checksum_retry is not None:
+        try:
+            checksum_retry = int(args.checksum_retry)
+            if checksum_retry < 1:
+                error_out("--checksum-retry argument must be a positive integer")
+        except ValueError:
+            error_out("--checksum-retry argument must be a valid integer")
+


--checksum-retry relies on bmap range checksums; when --nobmap is used the option becomes a no-op (and block devices may still be opened read-write). Consider rejecting --checksum-retry together with --nobmap (and/or when no bmap file is found) with a clear error message.

alchark added 2 commits March 6, 2026 12:00

Copilot AI review requested due to automatic review settings March 6, 2026 09:26

Copilot started reviewing on behalf of alchark March 6, 2026 09:27 View session

Copilot AI reviewed Mar 6, 2026

View reviewed changes

alchark and others added 4 commits March 6, 2026 13:57

Add a warning when checksum-retry is requested but no checksum is pro…

6dd5108

…vided When the bmap file doesn't include range checksums, `--checksum-retry` becomes a no-op, so warn the user about it (at most once per file). Signed-off-by: Alexey Charkov <alchark@flipper.net>

Remove unused current_range_blocks

996b433

This was a leftover from an earlier work in progress implementation, no longer needed in the current version, so remove it to avoid confusion. Signed-off-by: Alexey Charkov <alchark@flipper.net>

alchark requested a review from Copilot March 6, 2026 13:08

Copilot started reviewing on behalf of alchark March 6, 2026 13:09 View session

Copilot AI reviewed Mar 6, 2026

View reviewed changes

alchark added 4 commits March 6, 2026 17:24

Remove unused imports from test_checksum_retry.py

70c9fc5

These were left over from the file used as a reference when writing the new tests and are unused, so remove them to clean up the code. Signed-off-by: Alexey Charkov <alchark@flipper.net>

alchark requested a review from Copilot March 6, 2026 14:16

Copilot started reviewing on behalf of alchark March 6, 2026 14:17 View session

Copilot AI reviewed Mar 6, 2026

View reviewed changes

zhovner mentioned this pull request Mar 7, 2026

Homebrew package for macOS #57

Open

alchark mentioned this pull request Apr 8, 2026

Successful copy resulting in corrupted data #54

Open

		"checksum-retry requested but bmap file does not contain checksums; "
		"skipping verification for blocks %d-%d and beyond"

-                "checksum-retry requested but bmap file does not contain checksums; "
-                "skipping verification for blocks %d-%d and beyond"
+                "checksum-retry requested but no checksum information is available "
+                "(no bmap file or bmap without checksums); skipping verification "
+                "for blocks %d-%d and beyond"

Conversation

alchark commented Mar 6, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

alchark Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants