
Python script for testing metrics and plotting correlations #1769

Open: wants to merge 1 commit into base branch develop
Conversation

dorde-antic
Contributor

Implement a Python script for testing metrics and plotting the correlation between metrics and parameters.

Resolves #1719

@dorde-antic
Contributor Author

dorde-antic commented Mar 9, 2025

The function analyze_conv_file(file, n) is left unimplemented but not deleted, in case we decide to also calculate the metrics for convs (converting them to gemms and then calculating the metrics).

Comment on lines +17 to +18
numEUPerCU = 4 # may be changed in newer architectures
numCUs = 304 # temporary hardcoded
Member

If you are hardcoding these values, then it is better to check get_gfx_arch and assert that it is MI300.
Otherwise, I recommend using import hip and fetching these values dynamically from hipDeviceProperties.

Contributor

Yes, let's check that it's gfx942. This should be fixed with https://github.com/ROCm/rocMLIR-internal/issues/1745 once we get the Python AmdArchDb functionality.
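A minimal sketch of such a guard, assuming a helper like get_gfx_arch exists and returns a string such as "gfx942" (the hardcoded string below is only a stand-in for that call):

```python
# Hypothetical guard: the real script would call get_gfx_arch() here;
# a placeholder string stands in for it since that helper is assumed.
arch = "gfx942"  # stand-in for get_gfx_arch()

assert arch.startswith("gfx942"), (
    f"hardcoded numCUs/numEUPerCU assume gfx942 (MI300), got {arch}"
)

numEUPerCU = 4   # valid for gfx942 only
numCUs = 304     # valid for gfx942 only
```

Until the AmdArchDb functionality lands, this at least fails loudly on other architectures instead of silently producing wrong metrics.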

gemm_keys = ['TransA', 'TransB', 'G', 'M', 'K', 'N']
perfConfig_params = ['MPerBlock', 'NPerBlock', 'KPerBlock', 'MPerWave', 'NPerWave', 'kPack', 'splitKFactor', 'forceUnroll', 'ThreadCopyMore']

df[perfConfig_params] = df["PerfConfig"].str.replace("v2:", "").str.split(",", expand=True)
Member

I am adding perfConfig v3: here: #1767

Can you make sure the .tsv files are generated with v2: and assert for that, or put a comment somewhere?

Alternatively, you can first check which perfConfig version it is and, based on that, set the columns for perfConfig_params differently.

Contributor Author

In the files I was working with, all the perfConfigs started with v2:, which is why I used v2 (I didn't know that there are also other cases).
It can be changed so that the script recognizes perfConfigs that don't start with v2:. Should that be changed for this issue now?

Member

You can put an assert somewhere that it is v2:
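A hedged sketch of such an assert placed just before the split (the perfConfig string below is a made-up example, not a value from the PR's data):

```python
# Hypothetical perfConfig value; real ones come from the .tsv file.
perf_config = "v2:64,64,8,32,32,4,1,1,1"

assert perf_config.startswith("v2:"), (
    f"only v2: perfConfigs are supported, got: {perf_config}"
)

# nine fields: MPerBlock, NPerBlock, KPerBlock, MPerWave, NPerWave,
# kPack, splitKFactor, forceUnroll, ThreadCopyMore
params = perf_config.removeprefix("v2:").split(",")
assert len(params) == 9
```

Failing here keeps a v3: file from being silently parsed into the wrong columns.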

Comment on lines +134 to +135
MTiles = (int(M) + int(MPerBlock) - 1) / int(MPerBlock)
NTiles = (int(N) + int(NPerBlock) - 1) / int(NPerBlock)
Member

I think you can use math.ceil for this one and similar places elsewhere
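The suggestion amounts to replacing the manual rounding idiom with math.ceil, which also avoids the float result of plain `/` (example values below are made up):

```python
import math

M, MPerBlock = 1000, 64  # example values, not from the PR

# manual ceiling division, but with // so the result stays an int
manual = (M + MPerBlock - 1) // MPerBlock
# equivalent, clearer form
MTiles = math.ceil(M / MPerBlock)

assert MTiles == manual == 16
```

For very large integers, `-(M // -MPerBlock)` computes the same ceiling without any float rounding at all.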

if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Analyze .tsv.debug file")
parser.add_argument("files", nargs="+")
parser.add_argument("--n", type=float, default=5) # percent of configs close to winning
Contributor

default=0.05

Contributor Author

I thought the user would pass, for example, 5% so that n=5, which is later transformed to 0.05 using (1 - n / 100).
Anyway, it can be changed so that the user passes percentages as 0.xx arguments.
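Either convention works as long as the conversion stays consistent; a tiny sketch of the current one:

```python
n = 5.0                 # user-facing: percent distance from the winner
frac = 1 - n / 100      # internal threshold factor, here 0.95
assert abs(frac - 0.95) < 1e-9
```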

Contributor

I see, whatever you prefer then


top_list = []

for (key, group) in df.groupby(gemm_keys):
Contributor

not sure what these lines do. I'm not familiar with groupby. Could you add a comment here?

Contributor Author

groupby is used so that we can group the rows of the .tsv file that share the same gemm_keys (but have different perfConfigs and TFlops); the gemm_keys param is defined previously.
Then, for each subgroup generated by groupby, we calculate a threshold using the preferred function (max, best + n%, quantile, ...).
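A self-contained illustration of that grouping idea, using a toy frame rather than the real .tsv data:

```python
import pandas as pd

# Toy frame: two gemm problems, each with several perfConfig results.
# Only two key columns are used here; the script groups on all gemm_keys.
df = pd.DataFrame({
    "G": [1, 1, 1, 1],
    "M": [64, 64, 128, 128],
    "TFlops": [10.0, 9.0, 20.0, 12.0],
})

n = 5  # keep configs within 5% of the best
top = []
for key, group in df.groupby(["G", "M"]):
    # threshold relative to the best config of this gemm problem
    thresh = group["TFlops"].max() * (1 - n / 100)
    top.append(group[group["TFlops"] >= thresh])

top_df = pd.concat(top)  # one near-winning set per gemm problem
```

Each iteration sees only the rows for one gemm problem, so the threshold is computed per problem rather than globally.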

minNumWaves = numCUs * numEUPerCU


def analyze_gemm_file(file, n):
Contributor

where is "n" used?

Contributor Author

It's used when filtering the top N configs by

  • group[group['TFlops'] >= (group['TFlops'].max() * (1 - n / 100))]
  • group['TFlops'].quantile(1 - n / 100.0)

This version of the script was the last one we used when we wanted to check plots considering only the best configs.

We can also add all three options in the code and then select the kind of filtering by argument.

MTiles = (int(M) + int(MPerBlock) - 1) / int(MPerBlock)
NTiles = (int(N) + int(NPerBlock) - 1) / int(NPerBlock)

WorkGroups = G * MTiles * NTiles
Contributor

we should take into account splitKFactor here
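If I read the suggestion right, each K-split launches its own set of tiles, so the workgroup count would be scaled by splitKFactor. A sketch with made-up numbers:

```python
import math

# made-up example values, not from the PR's data
G, M, N = 1, 1000, 512
MPerBlock, NPerBlock, splitKFactor = 64, 64, 4

MTiles = math.ceil(M / MPerBlock)   # 16
NTiles = math.ceil(N / NPerBlock)   # 8

# each split-K slice contributes another full grid of tiles
WorkGroups = G * MTiles * NTiles * splitKFactor

assert WorkGroups == 16 * 8 * 4
```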

NTiles = (int(N) + int(NPerBlock) - 1) / int(NPerBlock)

WorkGroups = G * MTiles * NTiles
WavesPerBlock = int(MPerBlock) * int(NPerBlock) / int(MNPerWave)
Contributor

you can use "//" for integer division: WavesPerBlock = (MPerBlock * NPerBlock) // MNPerWave

return (M*N*K)/(M*N + M*K + N*K) # opPerByte/bytesLoaded


def calculate_occupancy(M, N, G, MPerBlock, NPerBlock, MNPerWave, minNumWaves):
Contributor

@dhernandez0 dhernandez0 Mar 10, 2025

Random idea: we could check whether occupancy in terms of blocks has a better correlation. I mean, it could be that having one or two waves per CU is enough? No need to explore this in this PR.



def analyze_conv_file(file, n):
# implementation goes here
Contributor

@dhernandez0 dhernandez0 Mar 10, 2025

print an error message here
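A minimal sketch of that stub with an error message (the message wording is my own, not from the PR):

```python
import sys

def analyze_conv_file(file, n):
    # Conv metrics are not implemented yet; see the discussion about
    # converting convs to gemms first. Fail loudly rather than silently.
    print(f"error: conv analysis is not implemented yet ({file})",
          file=sys.stderr)
    return None
```

Depending on how the script is driven, raising NotImplementedError instead of printing may be preferable, so a batch run can't mistake a skipped file for an analyzed one.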

Comment on lines +140 to +142
maxWavesPerCU = math.ceil(Waves / minNumWaves)

return (maxWavesPerCU * minNumWaves) / Waves
Member

I don't fully understand how this work imbalance is computed.

For example, let's consider a workload with splitKFactor = 1 only, for ease.

In case A, it fills up 50% of the GPU, so 4 * (304/2) = 4 * 152 = 608 waves.

In case B, it fills up the GPU 150%, so it has 4 * 304 + 4 * 152 = 1216 + 608 = 1824 waves.

To me it seems both workloads are equally imbalanced.

But based on this calculation, it will compute the work imbalance for case A as 2 and for case B as 1.33.

Contributor

let's use (Waves % minNumWaves)/minNumWaves
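With the example numbers from this thread, the suggested formula does rate both cases equally (quick check; 1216 = 304 CUs * 4 EUs per CU):

```python
minNumWaves = 304 * 4  # numCUs * numEUPerCU on gfx942

def imbalance(waves):
    # leftover waves in the last "full GPU" round, as a fraction
    return (waves % minNumWaves) / minNumWaves

# case A: 50% fill -> 608 waves; case B: 150% fill -> 1824 waves
assert imbalance(608) == imbalance(1824) == 0.5
```

Unlike the ceil-based version, a workload that exactly fills a whole number of GPU rounds gets an imbalance of 0.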
