@vedithal-amd
Contributor

Motivation

Allow the rocflop sample workload to be built on MI200 (gfx90a).

Technical Details

Add preprocessor guards and runtime checks to rocflop.cpp for gfx90a.

JIRA ID

Test Plan

Test Result

Ensure rocflop.cpp compiles on MI 200

Submission Checklist

Copilot AI review requested due to automatic review settings January 8, 2026 16:36
@vedithal-amd vedithal-amd requested a review from a team as a code owner January 8, 2026 16:36
Contributor

Copilot AI left a comment

Pull request overview

This PR adds preprocessor guards and runtime checks that exclude SMFMAC (Sparse MFMA) instructions on the gfx90a architecture (MI200), which does not support them. The changes restrict SMFMAC availability to gfx940 (MI300) and later architectures.

Key changes:

  • Added `!defined(__gfx90a__)` to all SMFMAC-related preprocessor guards
  • Updated the runtime architecture check from `arch.minor > 0x4` to `arch.minor >= 0x4` to correctly enable SMFMAC on gfx940 while excluding gfx90a
  • Updated all related comments to accurately reflect that SMFMAC is only available on gfx940 and later, not on gfx906, gfx908, or gfx90a
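
The key changes above can be illustrated with a minimal sketch. It is not the code from rocflop.cpp, which is not shown in this conversation. The macro `SKETCH_HAS_SMFMAC`, the struct `GfxArch`, and both helper functions are hypothetical names. The sketch also assumes that the architecture is split into major/minor/stepping fields (gfx90a as 9/0/0xa, gfx940 as 9/4/0) and that the check applies to major version 9 only.

```cpp
#include <cstdint>

// Compile-time guard (hypothetical macro name). When compiling device code
// for gfx906, gfx908 or gfx90a, the compiler defines the matching __gfxNNN__
// macro, so SMFMAC code behind this guard is compiled out for those targets.
#if !defined(__gfx906__) && !defined(__gfx908__) && !defined(__gfx90a__)
#define SKETCH_HAS_SMFMAC 1
#else
#define SKETCH_HAS_SMFMAC 0
#endif

// Hypothetical decomposition of a gfx target name, e.g. gfx90a -> {9, 0, 0xa}.
struct GfxArch {
    std::uint32_t major;
    std::uint32_t minor;
    std::uint32_t stepping;
};

// Runtime check as described before the change: minor must exceed 0x4,
// which leaves out gfx940 (minor == 4).
constexpr bool smfmac_check_before(GfxArch arch) {
    return arch.major == 9 && arch.minor > 0x4;
}

// Runtime check as described after the change: minor 0x4 and above pass,
// so gfx940 is enabled while gfx906, gfx908 and gfx90a (minor == 0) are not.
constexpr bool smfmac_check_after(GfxArch arch) {
    return arch.major == 9 && arch.minor >= 0x4;
}
```

Under these assumptions, `smfmac_check_after(GfxArch{9, 4, 0})` accepts gfx940 and `smfmac_check_after(GfxArch{9, 0, 0xa})` rejects gfx90a. The preprocessor guard keeps the unsupported instructions out of the gfx90a build, and the runtime check keeps the SMFMAC path from being selected on devices that lack it.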

@vedithal-amd vedithal-amd merged commit ebe22b5 into develop Jan 9, 2026
34 checks passed
@vedithal-amd vedithal-amd deleted the users/vedithal/rocprofiler-compute-fix-rocflop branch January 9, 2026 14:06