@huydhn
Contributor

@huydhn huydhn commented Jan 5, 2026

This should fix the issue where XPU results were mistaken for CPU results because the device name wasn't set correctly.

Testing

pytorch/pytorch#171731

cc @chuanqi129

@huydhn huydhn requested a review from yangw-dev January 5, 2026 19:38
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 5, 2026

huydhn added a commit to huydhn/pytorch that referenced this pull request Jan 5, 2026
elif [[ "${DEVICE_NAME}" == "cpu" ]]; then
DEVICE_TYPE="$(lscpu | grep "Model name" | sed -E 's/.*Model name:[[:space:]]*//; s/Intel\(R\)//g; s/\(R\)//g; s/\(TM\)//g; s/CPU//g; s/Processor//g; s/[[:space:]]+/ /g; s/^ //; s/ $//; s/ /_/g')_$(awk -F: '/Core\(s\) per socket/ {c=$2} /Socket\(s\)/ {s=$2} END {gsub(/ /,"",c); gsub(/ /,"",s); printf "%sc", c*s}' < <(lscpu))"
elif [[ "${DEVICE_NAME}" == "xpu" ]]; then
DEVICE_TYPE=$(xpu-smi discovery -d 0 -j | jq -r '.[0].device_name')
Contributor

Do we expect this to be different? What DEVICE_TYPE do we expect here? In the FE we assume there is an arch name; if this has some value, we might need to update the FE query logic for the compiler.

Source:
mapping device and arch
https://github.com/pytorch/test-infra/blob/main/torchci/components/benchmark/compilers/common.tsx#L82

used in compiler query converter before query data
https://github.com/pytorch/test-infra/blob/main/torchci/components/benchmark_v3/configs/teams/compilers/config.ts#L158

Contributor Author

The convention in the upload script is to set DEVICE_NAME to a generic name like cuda, rocm, or xpu, while DEVICE_TYPE holds the specific device name. xpu-smi discovery returns something like this (from https://github.com/pytorch/pytorch/actions/runs/20727627657/job/59509816791#step:3:42):

+-----------+--------------------------------------------------------------------------------------+
| Device ID | Device Information                                                                   |
+-----------+--------------------------------------------------------------------------------------+
| 0         | Device Name: Intel(R) Data Center GPU Max 1100                                       |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-001a-0000-002f0bda8086                                       |
|           | PCI BDF Address: 0000:1a:00.0                                                        |
|           | DRM Device: /dev/dri/card3                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+

So, the DEVICE_TYPE here will be Intel(R) Data Center GPU Max 1100
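
As a rough illustration of how that value can be pulled out of the table above, here is a minimal sketch. The `parse_device_name` and `sample_output` helpers are hypothetical names introduced for this example and are not part of the upload script; the sample is an abbreviated copy of the table quoted in this comment.

```shell
#!/usr/bin/env bash
# Sketch only: extract the "Device Name" field from the table-style output
# of `xpu-smi discovery`. Both helpers below are hypothetical.

# Reads the table on stdin and prints the first "Device Name" value with
# the table borders and trailing padding removed.
parse_device_name() {
  sed -n -E 's/^.*Device Name:[[:space:]]*(.*[^[:space:]|])[[:space:]]*\|[[:space:]]*$/\1/p' | head -n 1
}

# Abbreviated copy of the table quoted above (column padding shortened).
sample_output() {
  cat <<'EOF'
+-----------+------------------------------------------------+
| Device ID | Device Information                             |
+-----------+------------------------------------------------+
| 0         | Device Name: Intel(R) Data Center GPU Max 1100 |
|           | Vendor Name: Intel(R) Corporation              |
+-----------+------------------------------------------------+
EOF
}

DEVICE_TYPE="$(sample_output | parse_device_name)"
echo "${DEVICE_TYPE}"   # prints: Intel(R) Data Center GPU Max 1100
```

On a real runner, `sample_output` would be replaced by the actual `xpu-smi discovery` call. Parsing the JSON output is likely more robust than parsing the table, provided the JSON shape is known.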

Contributor Author

From what I see, we can keep arch empty for XPU devices for now: https://github.com/pytorch/test-infra/blob/main/torchci/components/benchmark/compilers/common.tsx#L82

@yangw-dev
Contributor

The test failed in pytorch/pytorch#171731:

2026-01-05T20:40:50.1272103Z ++ xpu-smi discovery -d 0 -j
2026-01-05T20:40:50.1272770Z ++ jq -r '.[0].device_name'
2026-01-05T20:40:50.9619376Z jq: error (at <stdin>:52): Cannot index object with number

@huydhn
Contributor Author

huydhn commented Jan 5, 2026

test failed pytorch/pytorch#171731

2026-01-05T20:40:50.1272103Z ++ xpu-smi discovery -d 0 -j
2026-01-05T20:40:50.1272770Z ++ jq -r '.[0].device_name'
2026-01-05T20:40:50.9619376Z jq: error (at <stdin>:52): Cannot index object with number

Darn, Claude let me down. I don't have an XPU device to test this locally, hence the need to test it on CI.
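
For context, "Cannot index object with number" is what jq reports when a numeric index such as `.[0]` is applied to a JSON object rather than an array. That suggests `xpu-smi discovery -d 0 -j` printed a single object. Below is a minimal sketch of a filter that tolerates either shape. The `extract_device_name` helper and both JSON samples are assumptions made up for illustration; they are not captured xpu-smi output, and this is not necessarily the fix that was merged.

```shell
#!/usr/bin/env bash
# Sketch only: read device_name from JSON that may be either a single
# object or an array of objects. Requires jq on PATH.

# Hypothetical helper: if the input is an array, take its first element;
# otherwise use the input as-is. Prints nothing if device_name is missing.
extract_device_name() {
  jq -r 'if type == "array" then .[0] else . end | .device_name // empty'
}

# Assumed shapes for illustration only (not real xpu-smi output).
object_sample='{"device_id": 0, "device_name": "Intel(R) Data Center GPU Max 1100"}'
array_sample='[{"device_id": 0, "device_name": "Intel(R) Data Center GPU Max 1100"}]'

# Applying '.[0].device_name' to object_sample reproduces the failure in
# the log: "Cannot index object with number".
printf '%s' "${object_sample}" | extract_device_name
printf '%s' "${array_sample}" | extract_device_name
```

If the real output nests the device under another key, the filter would need that path instead. Checking the actual JSON on a CI runner is the only reliable way to confirm the shape.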

huydhn added 3 commits January 5, 2026 13:16
Signed-off-by: Huy Do <[email protected]>
Signed-off-by: Huy Do <[email protected]>
Signed-off-by: Huy Do <[email protected]>
@huydhn huydhn merged commit 63c0344 into main Jan 6, 2026
8 checks passed
@huydhn huydhn deleted the detech-xpu-device branch January 6, 2026 00:39
Contributor

@yangw-dev yangw-dev left a comment

nice
