Skip to content

[FFI] Make StructuralEqual functor compare tensor content#646

Open
pisarev wants to merge 2 commits into
apache:mainfrom
pisarev:pisarev-patch-1
Open

[FFI] Make StructuralEqual functor compare tensor content#646
pisarev wants to merge 2 commits into
apache:mainfrom
pisarev:pisarev-patch-1

Conversation

@pisarev

@pisarev pisarev commented Jun 26, 2026

Copy link
Copy Markdown

Fixes #645.

The StructuralEqual functor calls Equal(lhs, rhs, false, /*skip_tensor_content=*/true), while the StructuralHash functor hashes content (skip_tensor_content = false). The two are used together as the hash and key-equal of the constant de-duplication map in Relax VM codegen (const_dedup_map_ in apache/tvm src/relax/backend/vm/exec_builder.cc). When hash and equal disagree, the map invariant equal(a, b) => hash(a) == hash(b) does not hold, so two distinct constants of equal shape and dtype get merged on a bucket collision and a later op reads the wrong constant.

Which pair collides depends on the STL bucket count, so the same model produces wrong output under MSVC and correct output under libstdc++. The defect is latent on every platform.

This change makes the functor compare content, so it agrees with the hash. Constants that are genuinely equal are still de-duplicated.

Verified on a Windows build (MSVC 19.44, LLVM 18.1.8): after the change, YOLO11n det, YOLO11s cls, PP-OCRv5 det and PP-OCRv5 rec all match onnxruntime to within floating point; before, det and the PP-OCR models were off by 6 to 61 percent relative error.

Full analysis in #645.

The functor used skip_tensor_content=true while the StructuralHash functor hashes content. As the hash and key-equal of the constant de-duplication map they must agree, otherwise two distinct constants of equal shape and dtype can be merged on a bucket collision (platform dependent). Compare content so equal and hash stay consistent.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the StructuralEqual::operator() in include/tvm/ffi/extra/structural_equal.h to pass false instead of true as the fourth argument to the Equal function. The reviewer suggests simplifying this call to Equal(lhs, rhs) to leverage default arguments, which improves code readability and avoids the "Boolean Trap" of passing multiple consecutive boolean literals.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

*/
TVM_FFI_INLINE bool operator()(const Any& lhs, const Any& rhs) const {
return Equal(lhs, rhs, false, true);
return Equal(lhs, rhs, false, false);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Passing multiple consecutive boolean literals (often referred to as the "Boolean Trap") reduces code readability and increases the risk of parameter mismatch errors. Since both map_free_vars and skip_tensor_content default to false in the signature of Equal, we can simplify this call to Equal(lhs, rhs) to make the code cleaner and more maintainable.

    return Equal(lhs, rhs);

@tqchen

tqchen commented Jun 27, 2026

Copy link
Copy Markdown
Member

pls fix ci issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug][ffi] StructuralEqual functor skips tensor content while StructuralHash includes it, causing wrong constant de-duplication

2 participants