-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use hip-python API instead of rocm_agent_enumerator #1762
base: develop
Are you sure you want to change the base?
Conversation
mlir/test/common_utils/common.py
Outdated
agents = set(x.decode("utf-8") for x in q.stdout.split()) | ||
return set(a for a in agents if a != "gfx000") | ||
agents = set() | ||
_, device_count = hip.hipGetDeviceCount() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider doing some basic error checking for the HIP calls.
requirements.txt
Outdated
PyYAML>=5.4.0, <=6.0.1 | ||
ml_dtypes>=0.1.0, <=0.5.0 # provides several NumPy dtype extensions, including the bf16 | ||
hip-python | ||
scipy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new line at the end
agents = set(x.decode("utf-8") for x in q.stdout.split() if x != b"gfx000") | ||
agents = set() | ||
_, device_count = hip.hipGetDeviceCount() | ||
for device in range(device_count): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to call the function in common.py? same for other files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, in commit Use hip-python API fixes, as we talked we have duplicating where files are not in the same directory so that we don't use relative paths until we decide to handle it other way, and I've fixed where it could be done asap.
Maybe, for the future, we should consider having a package for a group of functions we need in different child-directories, so that we can use absolute import anywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand we can't import mlir/test/common_utils/common.py in perfRunner.py because it's in utils/performance. But why can't we import common.py here? it's also in mlir/test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can import it, but in that case we would need to have a package, since those files are not all in same directory (others are in mlir/test/e2e and mlir/test/fusion/e2e). I thought that maybe using package wouldn't be appropriate solution for a file with 2 methods in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's fine then
mlir/test/common_utils/common.py
Outdated
props = hip.hipDeviceProp_t() | ||
hip.hipGetDeviceProperties(props,device) | ||
agent = props.gcnArchName.decode('utf-8') | ||
agents.add(agent) if agent != "gfx000" else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this still happen with the API? do we get gfx000?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It didn't notice that this happens when using API, but I have kept it since I saw that it was happening in the previous version of function. Should I remove this check if agent != "gfx000" else None
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it's not needed, let's remove it.
mlir/utils/performance/perfRunner.py
Outdated
agent = props.gcnArchName.decode('utf-8') | ||
|
||
if agent in xdlop_supported_gpus: | ||
return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also add this function to wherever we put getArchs()
mlir/test/common_utils/common.py
Outdated
@@ -1,5 +1,7 @@ | |||
import os | |||
import subprocess | |||
from hip import hip | |||
from hip import hiprtc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't find hiprtc being used. Is it required to be imported ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added it when following the hip-python usage instructions, but seems that it is not required for our case. I will remove it and check locally if methods where we use it work without hiprtc.
requirements.txt
Outdated
tomli==2.0.1 | ||
nanobind>=2.4, <3.0 | ||
numpy>=1.19.5, <=2.1.2 | ||
pybind11>=2.10.0, <=2.13.6 | ||
PyYAML>=5.4.0, <=6.0.1 | ||
ml_dtypes>=0.1.0, <=0.5.0 # provides several NumPy dtype extensions, including the bf16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need all these packages for this PR ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggested to add requirements.txt in this PR to have a list of python dependencies. I think he listed all python libraries we use? @dorde-antic are these used by used or did you add all dependencies from external/llvm-project?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I listed packages from both with versions same as in external/llvm-project. Should I remove those that are only in external, but keep others from the project?
Or I should just keep hip-python which is required for this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can keep all libraries that are actually used for rocmlir (not what external uses). But keep the versions consistent with external.
mlir/test/common_utils/common.py
Outdated
|
||
def is_xdlops_present() -> bool: | ||
"""This function checks whether a GPU with xdlops support is present""" | ||
xdlop_supported_gpus = ['gfx908', 'gfx90a', 'gfx942', 'gfx950'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change to .startsWith("gfx9")
mlir/test/common_utils/common.py
Outdated
"""This function checks whether a GPU with xdlops support is present""" | ||
xdlop_supported_gpus = ['gfx908', 'gfx90a', 'gfx942', 'gfx950'] | ||
err_code, device_count = hip.hipGetDeviceCount() | ||
if err_code != hip.hipSuccess: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we reuse get_agents() ? then iterate over the results
mlir/utils/performance/perfRunner.py
Outdated
@@ -18,6 +18,8 @@ | |||
from typing import Optional, Dict, Tuple | |||
import numpy as np | |||
import pandas as pd | |||
from hip import hip | |||
from hip import hiprtc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to import hiprtc
mlir/utils/performance/perfRunner.py
Outdated
props = hip.hipDeviceProp_t() | ||
hip.hipGetDeviceProperties(props,device) | ||
agent = props.gcnArchName.decode('utf-8') | ||
agents.add(agent) if agent != "gfx000" else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gfx000 check not needed
agents = set(x.decode("utf-8") for x in q.stdout.split() if x != b"gfx000") | ||
agents = set() | ||
_, device_count = hip.hipGetDeviceCount() | ||
for device in range(device_count): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand we can't import mlir/test/common_utils/common.py in perfRunner.py because it's in utils/performance. But why can't we import common.py here? it's also in mlir/test
mlir/utils/jenkins/Dockerfile
Outdated
&& pip3 install -r requirements.txt | ||
&& pip3 install -r llvm-requirements.txt \ | ||
&& pip3 install -r ../../../requirements.txt \ | ||
&& pip3 install -i https://test.pypi.org/simple hip-python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might work if you add --extra-index-url https://pypi.org/simple to the first line of requirements.txt ?
@@ -90,7 +90,9 @@ RUN apt-get update && apt-get install -y --no-install-recommends \ | |||
&& apt upgrade -y \ | |||
&& python3 -m pip install --upgrade pip \ | |||
&& pip3 install pandas numpy scipy jinja2 tomli \ | |||
&& pip3 install -r requirements.txt | |||
&& pip3 install -r llvm-requirements.txt \ | |||
&& pip3 install -r ../../../requirements.txt \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's use COPY instead
Handle agent detection correctly by using hip-python API instead of rocm_agent_enumerator.
Add requirements.txt for external python libraries.
Resolves #1760