[Benchmark] Detect XPU runner #7629
Signed-off-by: Huy Do <[email protected]>
elif [[ "${DEVICE_NAME}" == "cpu" ]]; then
  DEVICE_TYPE="$(lscpu | grep "Model name" | sed -E 's/.*Model name:[[:space:]]*//; s/Intel\(R\)//g; s/\(R\)//g; s/\(TM\)//g; s/CPU//g; s/Processor//g; s/[[:space:]]+/ /g; s/^ //; s/ $//; s/ /_/g')_$(awk -F: '/Core\(s\) per socket/ {c=$2} /Socket\(s\)/ {s=$2} END {gsub(/ /,"",c); gsub(/ /,"",s); printf "%sc", c*s}' < <(lscpu))"
elif [[ "${DEVICE_NAME}" == "xpu" ]]; then
  DEVICE_TYPE=$(xpu-smi discovery -d 0 -j | jq -r '.[0].device_name')
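For readers unfamiliar with the CPU branch, here is a minimal sketch of what that one-liner computes, run against canned lscpu-style text instead of a live machine. The function name cpu_device_type and the sample values are made up for illustration and are not part of the upload script; the head -n 1 is an extra guard added in this sketch in case more than one line matches "Model name".

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the PR): reads lscpu-style text on stdin and
# prints "<normalized_model>_<total_cores>c", mirroring the cpu branch above.
cpu_device_type() {
  local info model cores
  info="$(cat)"
  # Strip the label, vendor marks, and filler words, then join words with "_".
  model="$(printf '%s\n' "$info" | grep "Model name" | head -n 1 \
    | sed -E 's/.*Model name:[[:space:]]*//; s/Intel\(R\)//g; s/\(R\)//g; s/\(TM\)//g; s/CPU//g; s/Processor//g; s/[[:space:]]+/ /g; s/^ //; s/ $//; s/ /_/g')"
  # Total cores = cores per socket * number of sockets.
  cores="$(printf '%s\n' "$info" | awk -F: '/Core\(s\) per socket/ {c=$2} /Socket\(s\)/ {s=$2} END {gsub(/ /,"",c); gsub(/ /,"",s); printf "%sc", c*s}')"
  printf '%s_%s\n' "$model" "$cores"
}

# Illustrative input only; these values are assumptions, not taken from a real runner.
cpu_device_type <<'EOF'
Model name:                          Intel(R) Xeon(R) Platinum 8488C
Core(s) per socket:                  48
Socket(s):                           2
EOF
# prints: Xeon_Platinum_8488C_96c
```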
Do we expect this to be different? What DEVICE_TYPE do we expect here? In the FE we assume there is an arch name; if this has some value, we might need to update the FE query logic for compilers.
Source:
mapping device and arch
https://github.com/pytorch/test-infra/blob/main/torchci/components/benchmark/compilers/common.tsx#L82
used in compiler query converter before query data
https://github.com/pytorch/test-infra/blob/main/torchci/components/benchmark_v3/configs/teams/compilers/config.ts#L158
The convention in the upload script is to set DEVICE_NAME to a generic name like cuda, rocm, or xpu, while DEVICE_TYPE holds the specific device name. xpu-smi discovery returns something like this (from https://github.com/pytorch/pytorch/actions/runs/20727627657/job/59509816791#step:3:42):
+-----------+--------------------------------------------------------------------------------------+
| Device ID | Device Information |
+-----------+--------------------------------------------------------------------------------------+
| 0 | Device Name: Intel(R) Data Center GPU Max 1100 |
| | Vendor Name: Intel(R) Corporation |
| | SOC UUID: 00000000-0000-001a-0000-002f0bda8086 |
| | PCI BDF Address: 0000:1a:00.0 |
| | DRM Device: /dev/dri/card3 |
| | Function Type: physical |
+-----------+--------------------------------------------------------------------------------------+
So, the DEVICE_TYPE here will be Intel(R) Data Center GPU Max 1100
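As a rough illustration of where that string comes from, the sketch below pulls the Device Name field out of the plain-text table shown above. The function name parse_xpu_device_name is hypothetical and not part of the upload script, which reads the JSON output (-j) instead; the JSON shape is not shown in this thread, so this sketch only handles the text table.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the PR): reads the text table printed by
# `xpu-smi discovery` on stdin and prints the first "Device Name:" value.
parse_xpu_device_name() {
  awk -F'Device Name:' 'NF > 1 {
    v = $2
    sub(/[[:space:]]*\|[[:space:]]*$/, "", v)  # drop the trailing table border
    sub(/^[[:space:]]+/, "", v)                # drop leading padding
    print v
    exit                                       # only the first device
  }'
}

# Sample rows copied from the CI log quoted in this thread.
parse_xpu_device_name <<'EOF'
| 0 | Device Name: Intel(R) Data Center GPU Max 1100 |
| | Vendor Name: Intel(R) Corporation |
EOF
# prints: Intel(R) Data Center GPU Max 1100
```

Unlike the CPU branch, this keeps the vendor marks and spaces as-is, which matches the DEVICE_TYPE value described in the comment above.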
From what I see, we can keep arch empty for XPU devices for now: https://github.com/pytorch/test-infra/blob/main/torchci/components/benchmark/compilers/common.tsx#L82
Test failed: pytorch/pytorch#171731
Darn, Claude let me down. I don't have an XPU device to test this locally, hence the need to test it on CI.
yangw-dev left a comment
nice
This should fix the issue where XPU results were mistaken for CPU results because the device name wasn't set correctly.
Testing
pytorch/pytorch#171731
cc @chuanqi129