DAOS-15847 object: restore iov_len for fetch on dup-only (no-merge) SGLs #18413

Merged
daltonbohning merged 3 commits into master from lxz/obj_dup_sgls_free_fix
Jun 17, 2026
@liuxuezhao

Contributor

In obj_dup_sgls_free(), when ctx->alloc_bitmaps[i] == NULL, the function skips the entire SGL with continue:

    if (!ctx->alloc_bitmaps || !ctx->alloc_bitmaps[i])
        continue;

This case arises when an SGL was duplicated only to strip iov_buf_len == 0 entries (skip_sgl_iov returned true) but no IOVs were merged into allocated buffers. During construction (second pass of obj_sgls_dup), non-merged IOVs are copied as:

    *iov_dup = *iov;

So iov_dup->iov_buf == iov->iov_buf (same pointer). The lower layer writes fetch data into iov_dup->iov_buf and updates iov_dup->iov_len to reflect the actual bytes read.

Because obj_dup_sgls_free skips such SGLs, iov->iov_len in the original SGL is never updated. After api_args->sgls is restored to orr_usgls, the caller sees stale pre-fetch iov_len values even though the data is in the correct buffers.
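
The write-back described above can be sketched as follows. This is only an illustration under stated assumptions, not the actual patch: struct iov and struct sgl are simplified stand-ins for d_iov_t and d_sg_list_t, the helper names sgl_dup_strip and sgl_restore_iov_len are hypothetical, and the sketch assumes the duplicate keeps the non-empty IOVs in their original order.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for d_iov_t / d_sg_list_t (assumption, not DAOS code). */
struct iov {
	void   *iov_buf;      /* data buffer */
	size_t  iov_buf_len;  /* capacity of iov_buf */
	size_t  iov_len;      /* bytes actually used; set by the fetch path */
};

struct sgl {
	unsigned int  sg_nr;    /* number of entries in sg_iovs */
	struct iov   *sg_iovs;
};

/*
 * Hypothetical helper: build a dup-only copy of orig that drops entries with
 * iov_buf_len == 0. Kept entries are shallow copies, so dup and orig share the
 * same iov_buf pointers. dup->sg_iovs must have room for orig->sg_nr entries.
 * Returns the number of entries kept.
 */
static unsigned int
sgl_dup_strip(const struct sgl *orig, struct sgl *dup)
{
	unsigned int i, j = 0;

	assert(orig != NULL && dup != NULL);
	for (i = 0; i < orig->sg_nr; i++) {
		if (orig->sg_iovs[i].iov_buf_len == 0)
			continue;
		dup->sg_iovs[j++] = orig->sg_iovs[i];
	}
	dup->sg_nr = j;
	return j;
}

/*
 * Hypothetical helper: after a fetch has filled the duplicate, copy each
 * iov_len back to the matching non-empty entry of the original SGL. The data
 * itself needs no copy because the buffers are shared.
 */
static void
sgl_restore_iov_len(struct sgl *orig, const struct sgl *dup)
{
	unsigned int i, j = 0;

	assert(orig != NULL && dup != NULL);
	for (i = 0; i < orig->sg_nr && j < dup->sg_nr; i++) {
		if (orig->sg_iovs[i].iov_buf_len == 0)
			continue; /* stripped entry: nothing to restore */
		orig->sg_iovs[i].iov_len = dup->sg_iovs[j++].iov_len;
	}
}
```

In this sketch a caller would run sgl_dup_strip before the fetch, let the lower layer update iov_len on the duplicate, and then call sgl_restore_iov_len before handing the original SGL back. Skipping that last step is the stale iov_len situation described in this PR.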

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

Signed-off-by: Xuezhao Liu <xuezhao.liu@hpe.com>
@liuxuezhao liuxuezhao requested review from a team as code owners June 3, 2026 08:42
@github-actions

github-actions Bot commented Jun 3, 2026


Ticket title is 'refine rebuild's error handling'
Status is 'In Progress'
https://daosio.atlassian.net/browse/DAOS-15847

@liuxuezhao liuxuezhao requested a review from wangshilong June 3, 2026 08:43

@wangshilong wangshilong left a comment

Contributor

Nice finding.

@liuxuezhao liuxuezhao requested a review from Nasf-Fan June 3, 2026 13:04
@daosbuild3

Collaborator

Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18413/2/testReport/

@liuxuezhao

Contributor Author

Only one known failure, in dfuse/daos_build.py, and it is not related to this PR.

@liuxuezhao liuxuezhao requested a review from a team June 16, 2026 15:37
@liuxuezhao liuxuezhao added the forced-landing label ("The PR has known failures or has intentionally reduced testing, but should still be landed.") Jun 16, 2026
@daltonbohning

Contributor

@liuxuezhao Will you please merge latest master since CI should be clean now? I will watch this PR and merge once it makes it through CI cleanly

@liuxuezhao

Contributor Author

> @liuxuezhao Will you please merge latest master since CI should be clean now? I will watch this PR and merge once it makes it through CI cleanly

sure, merged

@liuxuezhao liuxuezhao removed the request for review from a team June 16, 2026 15:52
@daltonbohning daltonbohning requested a review from a team June 17, 2026 14:32
@daltonbohning daltonbohning removed the forced-landing label ("The PR has known failures or has intentionally reduced testing, but should still be landed.") Jun 17, 2026
@daltonbohning daltonbohning merged commit 4f91965 into master Jun 17, 2026
42 checks passed
@daltonbohning daltonbohning deleted the lxz/obj_dup_sgls_free_fix branch June 17, 2026 14:32