Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use hip-python API instead of rocm_agent_enumerator #1762

Open
wants to merge 10 commits into
base: develop
Choose a base branch
from

Conversation

dorde-antic
Copy link
Contributor

@dorde-antic dorde-antic commented Mar 4, 2025

Handle agent detection correctly by using hip-python API instead of rocm_agent_enumerator.
Add requirements.txt for external python libraries.

Resolves #1760

@dorde-antic dorde-antic marked this pull request as ready for review March 4, 2025 13:58
@dorde-antic dorde-antic requested a review from causten as a code owner March 4, 2025 13:58
agents = set(x.decode("utf-8") for x in q.stdout.split())
return set(a for a in agents if a != "gfx000")
agents = set()
_, device_count = hip.hipGetDeviceCount()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider doing some basic error checking for the HIP calls.

@dorde-antic dorde-antic marked this pull request as draft March 4, 2025 14:45
@dorde-antic dorde-antic marked this pull request as ready for review March 4, 2025 14:45
requirements.txt Outdated
PyYAML>=5.4.0, <=6.0.1
ml_dtypes>=0.1.0, <=0.5.0 # provides several NumPy dtype extensions, including the bf16
hip-python
scipy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new line at the end

agents = set(x.decode("utf-8") for x in q.stdout.split() if x != b"gfx000")
agents = set()
_, device_count = hip.hipGetDeviceCount()
for device in range(device_count):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to call the function in common.py? same for other files

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, in commit Use hip-python API fixes, as we talked we have duplicating where files are not in the same directory so that we don't use relative paths until we decide to handle it other way, and I've fixed where it could be done asap.

Maybe, for the future, we should consider having a package for a group of functions we need in different child-directories, so that we can use absolute import anywhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand we can't import mlir/test/common_utils/common.py in perfRunner.py because it's in utils/performance. But why can't we import common.py here? it's also in mlir/test

Copy link
Contributor Author

@dorde-antic dorde-antic Mar 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can import it, but in that case we would need to have a package, since those files are not all in same directory (others are in mlir/test/e2e and mlir/test/fusion/e2e). I thought that maybe using package wouldn't be appropriate solution for a file with 2 methods in it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's fine then

props = hip.hipDeviceProp_t()
hip.hipGetDeviceProperties(props,device)
agent = props.gcnArchName.decode('utf-8')
agents.add(agent) if agent != "gfx000" else None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this still happen with the API? do we get gfx000?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It didn't notice that this happens when using API, but I have kept it since I saw that it was happening in the previous version of function. Should I remove this check if agent != "gfx000" else None ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it's not needed, let's remove it.

agent = props.gcnArchName.decode('utf-8')

if agent in xdlop_supported_gpus:
return True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also add this function to wherever we put getArchs()

@@ -1,5 +1,7 @@
import os
import subprocess
from hip import hip
from hip import hiprtc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't find hiprtc being used. Is it required to be imported ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it when following the hip-python usage instructions, but seems that it is not required for our case. I will remove it and check locally if methods where we use it work without hiprtc.

requirements.txt Outdated
Comment on lines 1 to 6
tomli==2.0.1
nanobind>=2.4, <3.0
numpy>=1.19.5, <=2.1.2
pybind11>=2.10.0, <=2.13.6
PyYAML>=5.4.0, <=6.0.1
ml_dtypes>=0.1.0, <=0.5.0 # provides several NumPy dtype extensions, including the bf16
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need all these packages for this PR ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggested to add requirements.txt in this PR to have a list of python dependencies. I think he listed all python libraries we use? @dorde-antic are these used by used or did you add all dependencies from external/llvm-project?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I listed packages from both with versions same as in external/llvm-project. Should I remove those that are only in external, but keep others from the project?
Or I should just keep hip-python which is required for this PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can keep all libraries that are actually used for rocmlir (not what external uses). But keep the versions consistent with external.


def is_xdlops_present() -> bool:
"""This function checks whether a GPU with xdlops support is present"""
xdlop_supported_gpus = ['gfx908', 'gfx90a', 'gfx942', 'gfx950']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change to .startsWith("gfx9")

"""This function checks whether a GPU with xdlops support is present"""
xdlop_supported_gpus = ['gfx908', 'gfx90a', 'gfx942', 'gfx950']
err_code, device_count = hip.hipGetDeviceCount()
if err_code != hip.hipSuccess:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we reuse get_agents() ? then iterate over the results

@@ -18,6 +18,8 @@
from typing import Optional, Dict, Tuple
import numpy as np
import pandas as pd
from hip import hip
from hip import hiprtc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to import hiprtc

props = hip.hipDeviceProp_t()
hip.hipGetDeviceProperties(props,device)
agent = props.gcnArchName.decode('utf-8')
agents.add(agent) if agent != "gfx000" else None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gfx000 check not needed

agents = set(x.decode("utf-8") for x in q.stdout.split() if x != b"gfx000")
agents = set()
_, device_count = hip.hipGetDeviceCount()
for device in range(device_count):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand we can't import mlir/test/common_utils/common.py in perfRunner.py because it's in utils/performance. But why can't we import common.py here? it's also in mlir/test

@dorde-antic dorde-antic requested a review from dhernandez0 March 5, 2025 14:19
&& pip3 install -r requirements.txt
&& pip3 install -r llvm-requirements.txt \
&& pip3 install -r ../../../requirements.txt \
&& pip3 install -i https://test.pypi.org/simple hip-python
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might work if you add --extra-index-url https://pypi.org/simple to the first line of requirements.txt ?

@@ -90,7 +90,9 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
&& apt upgrade -y \
&& python3 -m pip install --upgrade pip \
&& pip3 install pandas numpy scipy jinja2 tomli \
&& pip3 install -r requirements.txt
&& pip3 install -r llvm-requirements.txt \
&& pip3 install -r ../../../requirements.txt \
Copy link
Contributor

@dhernandez0 dhernandez0 Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's use COPY instead

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants