-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python script for testing metrics and plotting correlations #1769
base: develop
Are you sure you want to change the base?
Conversation
…between metrics and parameters
Function |
numEUPerCU = 4 # may be changed in newer architectures | ||
numCUs = 304 # temporary hardcoded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are hardcoding these values then it is better if you check get_gfx_arch
and assert that it is Mi300
.
Else i recommend using import hip
and fetching these values from hipDeviceProperities dynamically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, let's check it's gfx942. This should be fixed with https://github.com/ROCm/rocMLIR-internal/issues/1745 when we get python AmdArchDb functionality.
gemm_keys = ['TransA', 'TransB', 'G', 'M', 'K', 'N'] | ||
perfConfig_params = ['MPerBlock', 'NPerBlock', 'KPerBlock', 'MPerWave', 'NPerWave', 'kPack', 'splitKFactor', 'forceUnroll', 'ThreadCopyMore'] | ||
|
||
df[perfConfig_params] = df["PerfConfig"].str.replace("v2:", "").str.split(",", expand=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am adding perfConfig v3: here :
#1767
can you make sure .tsv
files are generated with v2:
and assert for that or put comment somewhere ?
Else you can check which perfConfig version it is first and based that set the columns differently for the perfConfig_params
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the files I was working with, all the perfConfigs were starting with v2: so that's why I used v2 (didn't know that there are also different cases).
It can be changed so that the script can recognize perfConfigs that doesn't start with v2. Should it be changed for this issue now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can put an assert somewhere that it is v2:
MTiles = (int(M) + int(MPerBlock) - 1) / int(MPerBlock) | ||
NTiles = (int(N) + int(NPerBlock) - 1) / int(NPerBlock) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can use math.ceil for this one and similar places elsewhere
numEUPerCU = 4 # may be changed in newer architectures | ||
numCUs = 304 # temporary hardcoded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, let's check it's gfx942. This should be fixed with https://github.com/ROCm/rocMLIR-internal/issues/1745 when we get python AmdArchDb functionality.
if __name__ == "__main__": | ||
parser = argparse.ArgumentParser(description="Analyze .tsv.debug file") | ||
parser.add_argument("files", nargs="+") | ||
parser.add_argument("--n", type=float, default=5) # percent of configs close to winning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default=0.05
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that user would put for example 5% so that n=5
which is later transformed to 0.05 using this (1 - n / 100)
.
Anyway, it can be changed so that user passes 0.xx format for percentages as argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, whatever you prefer then
|
||
top_list = [] | ||
|
||
for (key, group) in df.groupby(gemm_keys): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure what these lines do. I'm not familiar with groupby. Could you add a comment here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Groupby is used so that we can group the rows of the csv file in which we have the same gemm_keys
(but different perfConfigs and TFlops) Gemm_keys param is previously defined.
Then for each subgroup generated by groupby, we calculate threshold using prefered function (max, best + n%, quantile...).
minNumWaves = numCUs * numEUPerCU | ||
|
||
|
||
def analyze_gemm_file(file, n): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is "n" used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's used when filtering top N configs by
- group[ group['TFlops'] >= (group['TFlops'].max() * (1 - n / 100))]
- group['TFlops'].quantile(1 - n / 100.0)
This version of script was last version we used when we wanted to check plots considering only the best configs
We can also add all three options in code and then select by argument which kind of filtering we want.
MTiles = (int(M) + int(MPerBlock) - 1) / int(MPerBlock) | ||
NTiles = (int(N) + int(NPerBlock) - 1) / int(NPerBlock) | ||
|
||
WorkGroups = G * MTiles * NTiles |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should take into account splitKFactor here
NTiles = (int(N) + int(NPerBlock) - 1) / int(NPerBlock) | ||
|
||
WorkGroups = G * MTiles * NTiles | ||
WavesPerBlock = int(MPerBlock) * int(NPerBlock) / int(MNPerWave) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use "//" for integer division: WavesPerBlock = (MPerBlock* NPerBlock) // MNPerWave
return (M*N*K)/(M*N + M*K + N*K) # opPerByte/bytesLoaded | ||
|
||
|
||
def calculate_occupancy(M, N, G, MPerBlock, NPerBlock, MNPerWave, minNumWaves): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
random idea: we could check if occupancy in terms of blocks has better correlation. I mean, it could be that having one or two waves per CU is enough? No need to explore this in this PR
|
||
|
||
def analyze_conv_file(file, n): | ||
# implementation goes here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print an error message here
maxWavesPerCU = math.ceil(Waves / minNumWaves) | ||
|
||
return (maxWavesPerCU * minNumWaves) / Waves |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand how this work imbalance is computed.
For example
let's consider workload with splitKFactor =1 only for ease.
In case A, it fills up 50% of the GPU. So 4 * (304/2) = 152 * 4 = 608 waves
In case B it fills up GPU 150% so it has 4 * (304) + 4 * 152 = 608 + 1216 = 1824 Waves.
To me it seems both workloads are equally imbalanced.
But based on following calculation, it will calculate
Work imbalance for case A as 2
Work imbalance for case B as 1.33
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's use (Waves % minNumWaves)/minNumWaves
Implement python script for testing metrics and plotting correlation between metrics and parameters
Resolves #1719