
Python script for testing metrics and plotting correlations #1769

Open: wants to merge 1 commit into base branch develop
Conversation

dorde-antic
Contributor

Implement a Python script for testing metrics and plotting the correlation between metrics and parameters.

Resolves #1719

@dorde-antic
Contributor Author

dorde-antic commented Mar 9, 2025

The function analyze_conv_file(file, n) is left unimplemented but not deleted, in case we decide to also calculate the metrics for convs (converting them to gemms and then calculating the metrics).

Comment on lines +17 to +18
numEUPerCU = 4 # may be changed in newer architectures
numCUs = 304 # temporary hardcoded
Member

If you are hardcoding these values, then it is better to check get_gfx_arch and assert that it is MI300.
Otherwise, I recommend using import hip and fetching these values dynamically from hipDeviceProperties.

Contributor

Yes, let's check that it's gfx942. This should be fixed with https://github.com/ROCm/rocMLIR-internal/issues/1745 once we get the Python AmdArchDb functionality.
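A minimal sketch of such a guard, assuming a helper like get_gfx_arch exists and returns a string such as "gfx942" (the hardcoded string below is only a stand-in for that call):

```python
# Hypothetical guard: the real script would call get_gfx_arch() here;
# a placeholder string stands in for it since that helper is assumed.
arch = "gfx942"  # stand-in for get_gfx_arch()

assert arch.startswith("gfx942"), (
    f"hardcoded numCUs/numEUPerCU assume gfx942 (MI300), got {arch}"
)

numEUPerCU = 4   # valid for gfx942 only
numCUs = 304     # valid for gfx942 only
```

Until the AmdArchDb functionality lands, this at least fails loudly on other architectures instead of silently producing wrong metrics.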

gemm_keys = ['TransA', 'TransB', 'G', 'M', 'K', 'N']
perfConfig_params = ['MPerBlock', 'NPerBlock', 'KPerBlock', 'MPerWave', 'NPerWave', 'kPack', 'splitKFactor', 'forceUnroll', 'ThreadCopyMore']

df[perfConfig_params] = df["PerfConfig"].str.replace("v2:", "").str.split(",", expand=True)
Member

I am adding perfConfig v3: here: #1767

Can you make sure the .tsv files are generated with v2: and assert for that, or put a comment somewhere?

Alternatively, you can first check which perfConfig version it is and, based on that, set the columns for perfConfig_params differently.

Contributor Author

In the files I was working with, all the perfConfigs started with v2:, which is why I used v2 (I didn't know that there are also other cases).
It can be changed so that the script recognizes perfConfigs that don't start with v2:. Should that be changed for this issue now?

Member

You can put an assert somewhere that it is v2:
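A hedged sketch of such an assert placed just before the split (the perfConfig string below is a made-up example, not a value from the PR's data):

```python
# Hypothetical perfConfig value; real ones come from the .tsv file.
perf_config = "v2:64,64,8,32,32,4,1,1,1"

assert perf_config.startswith("v2:"), (
    f"only v2: perfConfigs are supported, got: {perf_config}"
)

# nine fields: MPerBlock, NPerBlock, KPerBlock, MPerWave, NPerWave,
# kPack, splitKFactor, forceUnroll, ThreadCopyMore
params = perf_config.removeprefix("v2:").split(",")
assert len(params) == 9
```

Failing here keeps a v3: file from being silently parsed into the wrong columns.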

Comment on lines +134 to +135
MTiles = (int(M) + int(MPerBlock) - 1) / int(MPerBlock)
NTiles = (int(N) + int(NPerBlock) - 1) / int(NPerBlock)
Member

I think you can use math.ceil for this one and similar places elsewhere
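The suggestion amounts to replacing the manual rounding idiom with math.ceil, which also avoids the float result of plain `/` (example values below are made up):

```python
import math

M, MPerBlock = 1000, 64  # example values, not from the PR

# manual ceiling division, but with // so the result stays an int
manual = (M + MPerBlock - 1) // MPerBlock
# equivalent, clearer form
MTiles = math.ceil(M / MPerBlock)

assert MTiles == manual == 16
```

For very large integers, `-(M // -MPerBlock)` computes the same ceiling without any float rounding at all.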

if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Analyze .tsv.debug file")
parser.add_argument("files", nargs="+")
parser.add_argument("--n", type=float, default=5) # percent of configs close to winning
Contributor

default=0.05

Contributor Author

I thought the user would pass, for example, 5% so that n=5, which is later transformed to 0.05 using (1 - n / 100).
Anyway, it can be changed so that the user passes percentages as 0.xx arguments.
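Either convention works as long as the conversion stays consistent; a tiny sketch of the current one:

```python
n = 5.0                 # user-facing: percent distance from the winner
frac = 1 - n / 100      # internal threshold factor, here 0.95
assert abs(frac - 0.95) < 1e-9
```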

Contributor

I see, whatever you prefer then


top_list = []

for (key, group) in df.groupby(gemm_keys):
Contributor

not sure what these lines do. I'm not familiar with groupby. Could you add a comment here?

Contributor Author

groupby is used so that we can group the rows of the .tsv file that share the same gemm_keys (but have different perfConfigs and TFlops); the gemm_keys param is defined previously.
Then, for each subgroup generated by groupby, we calculate a threshold using the preferred function (max, best + n%, quantile, ...).
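A self-contained illustration of that grouping idea, using a toy frame rather than the real .tsv data:

```python
import pandas as pd

# Toy frame: two gemm problems, each with several perfConfig results.
# Only two key columns are used here; the script groups on all gemm_keys.
df = pd.DataFrame({
    "G": [1, 1, 1, 1],
    "M": [64, 64, 128, 128],
    "TFlops": [10.0, 9.0, 20.0, 12.0],
})

n = 5  # keep configs within 5% of the best
top = []
for key, group in df.groupby(["G", "M"]):
    # threshold relative to the best config of this gemm problem
    thresh = group["TFlops"].max() * (1 - n / 100)
    top.append(group[group["TFlops"] >= thresh])

top_df = pd.concat(top)  # one near-winning set per gemm problem
```

Each iteration sees only the rows for one gemm problem, so the threshold is computed per problem rather than globally.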

minNumWaves = numCUs * numEUPerCU


def analyze_gemm_file(file, n):
Contributor

where is "n" used?

Contributor Author

It's used when filtering the top N configs by

  • group[group['TFlops'] >= (group['TFlops'].max() * (1 - n / 100))]
  • group['TFlops'].quantile(1 - n / 100.0)

This version of the script was the last one we used when we wanted to check plots considering only the best configs.

We can also add all three options in the code and then select the kind of filtering by argument.

MTiles = (int(M) + int(MPerBlock) - 1) / int(MPerBlock)
NTiles = (int(N) + int(NPerBlock) - 1) / int(NPerBlock)

WorkGroups = G * MTiles * NTiles
Contributor

we should take into account splitKFactor here
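If I read the suggestion right, each K-split launches its own set of tiles, so the workgroup count would be scaled by splitKFactor. A sketch with made-up numbers:

```python
import math

# made-up example values, not from the PR's data
G, M, N = 1, 1000, 512
MPerBlock, NPerBlock, splitKFactor = 64, 64, 4

MTiles = math.ceil(M / MPerBlock)   # 16
NTiles = math.ceil(N / NPerBlock)   # 8

# each split-K slice contributes another full grid of tiles
WorkGroups = G * MTiles * NTiles * splitKFactor

assert WorkGroups == 16 * 8 * 4
```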

NTiles = (int(N) + int(NPerBlock) - 1) / int(NPerBlock)

WorkGroups = G * MTiles * NTiles
WavesPerBlock = int(MPerBlock) * int(NPerBlock) / int(MNPerWave)
Contributor

you can use "//" for integer division: WavesPerBlock = (MPerBlock * NPerBlock) // MNPerWave

return (M*N*K)/(M*N + M*K + N*K) # opPerByte/bytesLoaded


def calculate_occupancy(M, N, G, MPerBlock, NPerBlock, MNPerWave, minNumWaves):
Contributor

@dhernandez0 dhernandez0 Mar 10, 2025

Random idea: we could check whether occupancy in terms of blocks has a better correlation. I mean, it could be that having one or two waves per CU is enough? No need to explore this in this PR.



def analyze_conv_file(file, n):
# implementation goes here
Contributor

@dhernandez0 dhernandez0 Mar 10, 2025

print an error message here
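A minimal sketch of that stub with an error message (the message wording is my own, not from the PR):

```python
import sys

def analyze_conv_file(file, n):
    # Conv metrics are not implemented yet; see the discussion about
    # converting convs to gemms first. Fail loudly rather than silently.
    print(f"error: conv analysis is not implemented yet ({file})",
          file=sys.stderr)
    return None
```

Depending on how the script is driven, raising NotImplementedError instead of printing may be preferable, so a batch run can't mistake a skipped file for an analyzed one.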

Comment on lines +140 to +142
maxWavesPerCU = math.ceil(Waves / minNumWaves)

return (maxWavesPerCU * minNumWaves) / Waves
Member

I don't fully understand how this work imbalance is computed.

For example, let's consider a workload with splitKFactor = 1 only, for ease.

In case A, it fills up 50% of the GPU, so 4 * (304/2) = 4 * 152 = 608 waves.

In case B, it fills up the GPU 150%, so it has 4 * 304 + 4 * 152 = 1216 + 608 = 1824 waves.

To me it seems both workloads are equally imbalanced.

But based on this calculation, it will compute the work imbalance for case A as 2 and for case B as 1.33.

Contributor

let's use (Waves % minNumWaves)/minNumWaves
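With the example numbers from this thread, the suggested formula does rate both cases equally (quick check; 1216 = 304 CUs * 4 EUs per CU):

```python
minNumWaves = 304 * 4  # numCUs * numEUPerCU on gfx942

def imbalance(waves):
    # leftover waves in the last "full GPU" round, as a fraction
    return (waves % minNumWaves) / minNumWaves

# case A: 50% fill -> 608 waves; case B: 150% fill -> 1824 waves
assert imbalance(608) == imbalance(1824) == 0.5
```

Unlike the ceil-based version, a workload that exactly fills a whole number of GPU rounds gets an imbalance of 0.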
