
[AIMIGRAPHX-885] Add hip graph to MIGraphX EP#234

Draft
TedThemistokleous wants to merge 15 commits into rocm7.2_internal_testing from add_hip_graph

@TedThemistokleous
Collaborator

Description

Motivation and Context

TedThemistokleous force-pushed the update_sync_stream branch 3 times, most recently from 10eddc6 to 7eefe53, April 20, 2026 19:30
Base automatically changed from update_sync_stream to rocm7.2_internal_testing April 20, 2026 20:03
…head by statically allocating buffers for model IO/params on session start
Plumbs a new boolean option through the full provider options pipeline:
- Provider option key: migraphx_hip_graph_enable
- Environment variable: ORT_MIGRAPHX_HIP_GRAPH_ENABLE
- MIGraphXExecutionProviderInfo struct field
- EP constructor initialization and env override
- GetProviderOptions / ToProviderOptions round-trip
- Hash function for provider info

No behavioral changes; flag is wired but not yet consumed.
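The option resolution described in this commit can be sketched as follows. This is a minimal host-only illustration, not the EP's code: the key `migraphx_hip_graph_enable` and the variable `ORT_MIGRAPHX_HIP_GRAPH_ENABLE` come from the list above, while `HipGraphInfo`, `ResolveHipGraphInfo`, `ToProviderOptionsSketch`, `HashInfo`, and the accepted boolean spellings are hypothetical. It assumes, as the list suggests, that the environment variable overrides the provider option.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <unordered_map>

// Key and environment-variable names are taken from the commit message.
constexpr const char* kHipGraphEnableKey = "migraphx_hip_graph_enable";
constexpr const char* kHipGraphEnableEnv = "ORT_MIGRAPHX_HIP_GRAPH_ENABLE";

// Hypothetical stand-in for the new MIGraphXExecutionProviderInfo field.
struct HipGraphInfo {
  bool hip_graph_enable = false;  // off unless explicitly requested
};

// Assumed spelling: "1" or "true" enables; anything else disables.
bool ParseBool(const std::string& value) { return value == "1" || value == "true"; }

// Provider option first, then the environment variable overrides it.
HipGraphInfo ResolveHipGraphInfo(const std::unordered_map<std::string, std::string>& options,
                                 const char* env_value) {
  HipGraphInfo info;
  auto it = options.find(kHipGraphEnableKey);
  if (it != options.end()) info.hip_graph_enable = ParseBool(it->second);
  if (env_value != nullptr) info.hip_graph_enable = ParseBool(env_value);
  return info;
}

// Convenience wrapper that reads the real process environment.
HipGraphInfo ResolveFromProcess(const std::unordered_map<std::string, std::string>& options) {
  return ResolveHipGraphInfo(options, std::getenv(kHipGraphEnableEnv));
}

// Inverse direction, so the options survive a GetProviderOptions round-trip.
std::unordered_map<std::string, std::string> ToProviderOptionsSketch(const HipGraphInfo& info) {
  return {{kHipGraphEnableKey, info.hip_graph_enable ? "1" : "0"}};
}

// Fold the flag into the provider-info hash (boost-style hash_combine).
std::size_t HashInfo(const HipGraphInfo& info, std::size_t seed = 0) {
  const std::size_t value = info.hip_graph_enable ? 1 : 0;
  return seed ^ (value + 0x9e3779b9 + (seed << 6) + (seed >> 2));
}
```

Passing the environment value in as a parameter keeps the precedence logic testable; `ResolveFromProcess` is the only function that touches `std::getenv`.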
Add checks that disable graph capture when MIGraphX environment variables are set at session creation.
Ensure we have the primitives needed to perform hipGraph capture/replay.
Track execution path taken (ultra-fast/fast/standard), hipGraph warmup/
capture/replay phases, pinned I/O buffer allocation sizes, dynamic batch
compilation progress, and program cache hit/miss across all code paths.

Made-with: Cursor
When hipGraph is enabled, the MIGraphXAllocator now caches freed device
pointers by size and returns them on subsequent same-size allocations.
This ensures ORT tensor buffers get stable GPU addresses across inference
calls, which is a prerequisite for capturing hipGraph directly on ORT
buffers (eliminating the intermediary pinned-copy overhead).

Pool mode is gated behind hip_graph_enable -- zero behavioral change
when hipGraph is disabled.
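A minimal sketch of size-keyed pointer caching, assuming an exact-size match policy and a LIFO free list per size. `SizeKeyedPool` is hypothetical, and `std::malloc`/`std::free` stand in for `hipMalloc`/`hipFree` so the sketch runs without a GPU.

```cpp
#include <cstddef>
#include <cstdlib>
#include <unordered_map>
#include <vector>

// Hypothetical size-keyed pool illustrating the caching described above.
// std::malloc/std::free stand in for hipMalloc/hipFree.
class SizeKeyedPool {
 public:
  explicit SizeKeyedPool(bool hip_graph_enable) : pooling_(hip_graph_enable) {}
  SizeKeyedPool(const SizeKeyedPool&) = delete;
  SizeKeyedPool& operator=(const SizeKeyedPool&) = delete;
  ~SizeKeyedPool() {
    for (auto& entry : live_) std::free(entry.first);
    for (auto& bucket : free_lists_)
      for (void* p : bucket.second) std::free(p);
  }

  void* Alloc(std::size_t size) {
    void* p = nullptr;
    auto it = free_lists_.find(size);
    if (pooling_ && it != free_lists_.end() && !it->second.empty()) {
      p = it->second.back();  // reuse a cached same-size pointer: stable address
      it->second.pop_back();
    } else {
      p = std::malloc(size);
      if (p == nullptr) return nullptr;
    }
    live_[p] = size;
    return p;
  }

  void Free(void* p) {
    auto it = live_.find(p);
    if (it == live_.end()) return;  // unknown pointer: ignored in this sketch
    const std::size_t size = it->second;
    live_.erase(it);
    if (pooling_) {
      free_lists_[size].push_back(p);  // keep for the next same-size Alloc
    } else {
      std::free(p);  // pooling off: release immediately, as before
    }
  }

 private:
  bool pooling_;
  std::unordered_map<void*, std::size_t> live_;                     // outstanding blocks
  std::unordered_map<std::size_t, std::vector<void*>> free_lists_;  // cached by size
};
```

Because each bucket is LIFO, two same-size tensors freed and re-requested in a different order can swap addresses; that is one way pointer drift can still occur even with pooling on.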

Made-with: Cursor
Add warmup_and_capture_hip_graph_direct() and run_program_or_hip_graph_direct()
which capture and replay hipGraphs using ORT's tensor pointers directly
instead of intermediary pinned buffers. Captured addresses are stored in
CapturedHipGraph so pointer drift can be detected and trigger re-capture.

Also adds use_direct_hip_graph flag to MIGraphXFuncState, set alongside
hip_graph_enabled during node state creation.
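The drift check can be sketched as a comparison between the addresses recorded at capture time and the ORT tensor pointers bound on the current call. `CapturedGraphSketch`, `GraphAction`, and `DecideGraphAction` are hypothetical simplifications of `CapturedHipGraph`; the HIP calls a real implementation would make (`hipStreamBeginCapture`, `hipStreamEndCapture`, `hipGraphInstantiate`, `hipGraphLaunch`) appear only in comments.

```cpp
#include <vector>

// Hypothetical stand-in for CapturedHipGraph: it keeps only the device
// addresses bound when the graph was captured. A real version would also own
// the hipGraphExec_t produced by hipGraphInstantiate.
struct CapturedGraphSketch {
  bool captured = false;
  std::vector<const void*> captured_addresses;  // inputs then outputs, in bind order
};

enum class GraphAction { kCapture, kReplay, kRecapture };

// Decide what to do on this call, given the ORT tensor pointers bound now.
GraphAction DecideGraphAction(CapturedGraphSketch& graph,
                              const std::vector<const void*>& current_addresses) {
  if (!graph.captured) {
    // First call: warm up, then capture (hipStreamBeginCapture ... hipStreamEndCapture).
    graph.captured = true;
    graph.captured_addresses = current_addresses;
    return GraphAction::kCapture;
  }
  if (graph.captured_addresses == current_addresses) {
    // Same addresses as at capture time: replay is safe (hipGraphLaunch).
    return GraphAction::kReplay;
  }
  // Pointer drift: captured kernels reference stale addresses, so re-capture.
  graph.captured_addresses = current_addresses;
  return GraphAction::kRecapture;
}
```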

Made-with: Cursor
When use_direct_hip_graph is true and no batch padding is needed, all
three execution paths (ultra-fast, fast, standard) now bind ORT tensor
pointers directly into MIGraphX program_parameters and dispatch through
run_program_or_hip_graph_direct(). This eliminates the copy_inputs_to_pinned
and copy_pinned_outputs_to_ort memcpy rounds entirely.

The pinned-copy path is preserved as fallback for padding cases and when
use_direct_hip_graph is false.
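To illustrate the copies that direct binding removes, here is a hedged host-side sketch. `TensorView`, `ParamMap` (a stand-in for MIGraphX `program_parameters`), and `BindInputs` are hypothetical, and `std::memcpy` stands in for the device copy performed by `copy_inputs_to_pinned`. The function returns the number of copies so the two paths can be compared.

```cpp
#include <cstddef>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Hypothetical view of an ORT input tensor.
struct TensorView {
  void* data;
  std::size_t bytes;
};

// Hypothetical stand-in for migraphx::program_parameters: name -> buffer.
using ParamMap = std::map<std::string, void*>;

// Binds inputs either directly (zero copies) or through staging buffers.
// Returns the number of copies performed.
std::size_t BindInputs(bool use_direct_hip_graph, bool needs_padding,
                       const std::map<std::string, TensorView>& ort_inputs,
                       std::map<std::string, std::vector<unsigned char>>& pinned,
                       ParamMap& params) {
  const bool direct = use_direct_hip_graph && !needs_padding;
  std::size_t copies = 0;
  for (const auto& kv : ort_inputs) {
    if (direct) {
      params[kv.first] = kv.second.data;  // ORT pointer goes straight to the program
      continue;
    }
    // Fallback: stage through a persistent buffer, then bind the buffer.
    auto& buf = pinned[kv.first];
    if (buf.size() < kv.second.bytes) buf.resize(kv.second.bytes);
    std::memcpy(buf.data(), kv.second.data, kv.second.bytes);
    params[kv.first] = buf.data();
    ++copies;
  }
  return copies;
}
```

Output binding would mirror this, with the staged path adding the reverse copy that `copy_pinned_outputs_to_ort` performs.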

Made-with: Cursor
Track direct-bind re-capture count in MIGraphXFuncState. If pointer
drift triggers more than kMaxDirectRecaptures (3) graph re-captures,
permanently disable use_direct_hip_graph for that node and fall back
to eager run_migraphx_program execution. This prevents infinite
re-capture loops if the pool allocator cannot maintain pointer stability.
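The re-capture limit amounts to a per-node counter. A minimal sketch, assuming the counter increments on every drift-triggered re-capture; `DirectGraphState` and `OnPointerDrift` are hypothetical, and only `kMaxDirectRecaptures = 3` comes from the commit message.

```cpp
// Limit taken from the commit message.
constexpr int kMaxDirectRecaptures = 3;

// Hypothetical slice of MIGraphXFuncState relevant to the fallback.
struct DirectGraphState {
  bool use_direct_hip_graph = true;
  int direct_recapture_count = 0;
};

// Call when pointer drift is detected. Returns true if the caller may
// re-capture; returns false once the node has fallen back to eager execution.
bool OnPointerDrift(DirectGraphState& state) {
  if (!state.use_direct_hip_graph) return false;  // already eager
  ++state.direct_recapture_count;
  if (state.direct_recapture_count > kMaxDirectRecaptures) {
    state.use_direct_hip_graph = false;  // permanent for this node
    return false;
  }
  return true;
}
```

Under this reading of "more than 3", the first three drifts re-capture and the fourth switches the node to eager execution for the rest of the session.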

Made-with: Cursor
When batch padding is needed (actual_batch < compiled_batch), all
execution paths continue to use the existing pinned-copy hipGraph
path since the padded buffer sizes differ from ORT's tensor sizes.

Also adds direct-bind path to the dynamic-batch compilation code
for exact-match (no-padding) cases, and ensures use_direct_hip_graph
is disabled alongside hip_graph_enabled in all compatibility-check
failure paths.
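The padding decision and the paired flag reset can be sketched as below. The sketch assumes batch is the outermost dimension, so a tensor's byte size scales linearly with batch; `GraphFlags`, `PaddingPlan`, `PlanPadding`, `CanDirectBind`, and `DisableHipGraph` are hypothetical names.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical slice of MIGraphXFuncState holding the two flags named above.
struct GraphFlags {
  bool hip_graph_enabled = true;
  bool use_direct_hip_graph = true;
};

// Compatibility-check failure: clear both flags together so direct binding
// never outlives the hipGraph path it depends on.
void DisableHipGraph(GraphFlags& flags) {
  flags.hip_graph_enabled = false;
  flags.use_direct_hip_graph = false;
}

struct PaddingPlan {
  bool needs_padding;        // actual_batch < compiled_batch
  std::size_t padded_bytes;  // size the compiled program expects for this input
};

// Assumes batch is the outermost dimension, so bytes scale linearly with batch.
PaddingPlan PlanPadding(std::size_t actual_batch, std::size_t compiled_batch,
                        std::size_t tensor_bytes) {
  if (actual_batch == 0 || actual_batch > compiled_batch ||
      tensor_bytes % actual_batch != 0) {
    throw std::invalid_argument("batch does not fit the compiled program");
  }
  const std::size_t per_sample = tensor_bytes / actual_batch;
  return PaddingPlan{actual_batch < compiled_batch, per_sample * compiled_batch};
}

// Direct binding only when the ORT buffer already matches the compiled size.
bool CanDirectBind(const GraphFlags& flags, const PaddingPlan& plan) {
  return flags.hip_graph_enabled && flags.use_direct_hip_graph && !plan.needs_padding;
}
```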

Made-with: Cursor
