Compile pass for hipBLASLt #3594

ahsan-ca · 2024-11-06T04:47:54Z

This pass helps to allocate actual workspace size needed for hipblaslt algos.
It does so by querying for the workspace needed for the particular algo, and allocates memory accordingly.

codecov · 2024-11-06T06:19:18Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.18%. Comparing base (b39a938) to head (250258e).

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #3594   +/-   ##
========================================
  Coverage    92.18%   92.18%           
========================================
  Files          513      513           
  Lines        21576    21576           
========================================
  Hits         19889    19889           
  Misses        1687     1687

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

ahsan-ca · 2024-11-07T03:24:53Z

Windows build failure should be fixed by: 4ff36ae:

C:\home\jenkins\agent\workspace\UIF2_AMDMIGraphX_PR-3594\AMDMIGraphX\src\targets\gpu\compile_hipblaslt.cpp:78:26: error: use of undeclared identifier 'hipblaslt_workspace_size'
   78 |         std::size_t ws = hipblaslt_workspace_size;
      |                          ^
1 error generated.

…s in addition to throwing an exception

…owing an exception

…failure

…ce size when algo is not supported.

bpickrel · 2024-11-11T17:45:02Z

This needs a test. There are many compiler pass tests in directory test/ that make a run_pass() call and could be used for examples. It's not clear to me whether it's relevant to add a test case to test/gpu/gemm_tune.cpp.

CharlieL7 · 2024-11-11T22:19:31Z

Current code looks good. As Brian mentions would like to see some tests. The main reason for this compiler pass is to do output fusions with hipblaslt, right?

pfultz2 · 2024-11-11T22:26:10Z

The main reason for this compiler pass is to do output fusions with hipblaslt, right?

The main reason is to be able to insert accurate workspace allocation after tuning. We cant insert workspace allocation instructions during lowering because we dont know the layouts until after eliminate_contiguous and we also need to run tuning as well.

src/targets/gpu/compile_hipblaslt.cpp

…olutions is supported for the algo 2. Use default workspace size when returned workspace size is 0, temporary workaround for memory issues with 0 workspace size.

ahsan-ca · 2024-11-12T18:58:41Z

Chris reached out to Bryant and discussed the issue where none of the algos returned by getAllAlgos is supported/valid. Bryant's suggestion was to use hipblasLtMatmulAlgoGetHeuristic() with requestedAlgoCount=1, and then query the index for the algo returned by getIndexFromAlgo(). I have made the changes for this in tune() via commit: 2ae35e0.

src/targets/gpu/target.cpp

src/targets/gpu/hip_gemm_impl.cpp

src/targets/gpu/compile_hipblaslt.cpp

TedThemistokleous

Few comments.

TedThemistokleous

Fix CI but others LGTM

migraphx-bot · 2024-11-15T02:17:58Z

Test	Batch	Rate new 250258	Rate old b39a93	Diff	Compare
torchvision-resnet50	64	3,260.12	3,259.98	0.00%	✅
torchvision-resnet50_fp16	64	6,988.96	6,991.63	-0.04%	✅
torchvision-densenet121	32	2,434.72	2,436.76	-0.08%	✅
torchvision-densenet121_fp16	32	4,101.04	4,076.33	0.61%	✅
torchvision-inceptionv3	32	1,629.01	1,630.75	-0.11%	✅
torchvision-inceptionv3_fp16	32	2,746.52	2,750.13	-0.13%	✅
cadene-inceptionv4	16	765.70	766.23	-0.07%	✅
cadene-resnext64x4	16	811.34	811.64	-0.04%	✅
slim-mobilenet	64	7,466.21	7,471.32	-0.07%	✅
slim-nasnetalarge	64	208.42	208.48	-0.03%	✅
slim-resnet50v2	64	3,440.06	3,441.77	-0.05%	✅
bert-mrpc-onnx	8	1,151.41	1,151.97	-0.05%	✅
bert-mrpc-tf	1	460.06	492.75	-6.63%	🔴
pytorch-examples-wlang-gru	1	414.47	417.46	-0.71%	✅
pytorch-examples-wlang-lstm	1	408.33	406.83	0.37%	✅
torchvision-resnet50_1	1	781.34	775.10	0.80%	✅
cadene-dpn92_1	1	397.20	421.44	-5.75%	🔴
cadene-resnext101_1	1	383.11	383.26	-0.04%	✅
onnx-taau-downsample	1	345.67	346.67	-0.29%	✅
dlrm-criteoterabyte	1	33.33	33.33	0.01%	✅
dlrm-criteoterabyte_fp16	1	52.50	52.76	-0.50%	✅
agentmodel	1	8,172.67	9,419.30	-13.23%	🔴
unet_fp16	2	58.91	58.87	0.06%	✅
resnet50v1_fp16	1	942.95	941.94	0.11%	✅
resnet50v1_int8	1	1,018.89	985.94	3.34%	🔆
bert_base_cased_fp16	64	1,169.84	1,171.42	-0.13%	✅
bert_large_uncased_fp16	32	363.40	363.44	-0.01%	✅
bert_large_fp16	1	198.69	200.59	-0.95%	✅
distilgpt2_fp16	16	2,199.29	2,203.62	-0.20%	✅
yolov5s	1	531.68	526.10	1.06%	✅
tinyllama	1	43.41	43.40	0.02%	✅
vicuna-fastchat	1	173.53	172.63	0.52%	✅
whisper-tiny-encoder	1	418.66	418.90	-0.06%	✅
whisper-tiny-decoder	1	435.15	428.52	1.55%	✅

This build is not recommended to merge 🔴

migraphx-bot · 2024-11-15T02:18:00Z

✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance

✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance

✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance

✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance

✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

✅ agentmodel: PASSED: MIGraphX meets tolerance

✅ unet: PASSED: MIGraphX meets tolerance

✅ resnet50v1: PASSED: MIGraphX meets tolerance

✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

🔴bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

✅ bert_large: PASSED: MIGraphX meets tolerance

✅ yolov5s: PASSED: MIGraphX meets tolerance

✅ tinyllama: PASSED: MIGraphX meets tolerance

✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance

✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance

ahsan-ca requested a review from pfultz2 November 6, 2024 04:47

ahsan-ca self-assigned this Nov 6, 2024

ahsan-ca requested a review from causten as a code owner November 6, 2024 04:47

ahsan-ca force-pushed the hipblaslt-compile branch from 4169e71 to 120705e Compare November 6, 2024 15:36

ahsan-ca mentioned this pull request Nov 6, 2024

Add compile pass for hipblaslt #3455

Open

ahsan-ca added the enhancement New feature or request label Nov 6, 2024

ahsan-ca added 8 commits November 7, 2024 17:53

Update finalize function to run tune only when sol idx is 0.

f499117

Use solution_idx variable for hip_gemm_finalize

dd1270f

Modify hipblaslt_invoke to also have the ability to return with statu…

b600d2e

…s in addition to throwing an exception

Add comment to hipblaslt invoke function for parameter to control thr…

26a8d3d

…owing an exception

Compile pass for hipBLASLt

5513f7e

Fix failure when workspace size is '0'.

e046e87

Guard hipblaslt.cpp with MIGRAPHX_USE_HIPBLASLT to fix windows build …

561ef3e

…failure

Run workspace size function for solution index 0. Set default workspa…

a7196e4

…ce size when algo is not supported.

ahsan-ca force-pushed the hipblaslt-compile branch from 4ff36ae to a7196e4 Compare November 8, 2024 00:08

ahsan-ca changed the base branch from develop to hipblaslt-update-finalize November 8, 2024 00:09

Run compile_hipblaslt pass after fuse ops pass

afb6ba5

causten requested review from CharlieL7 and bpickrel November 8, 2024 14:45

pfultz2 reviewed Nov 11, 2024

View reviewed changes

src/targets/gpu/compile_hipblaslt.cpp Outdated Show resolved Hide resolved

1. Use algoGetHeuristic function to get algo/solution index when no s…

2ae35e0

…olutions is supported for the algo 2. Use default workspace size when returned workspace size is 0, temporary workaround for memory issues with 0 workspace size.

ahsan-ca added 2 commits November 12, 2024 14:26

Address tidy error

a16a443

Make compile a static free function in compile_hipblaslt.cpp

3e4c192

ahsan-ca requested a review from pfultz2 November 12, 2024 21:43

ahsan-ca added 2 commits November 13, 2024 12:18

Move hipblaslt_op from compile_hipblaslt.cpp to compile_hipblaslt.hpp

8cd6b7e

Add test case to check output for compile_hipblaslt pass after lowering

75ee1bc

ahsan-ca linked an issue Nov 13, 2024 that may be closed by this pull request

Add compile pass for hipblaslt #3455

Open

ahsan-ca requested review from TedThemistokleous, shivadbhavsar and kahmed10 and removed request for bpickrel November 13, 2024 20:27

TedThemistokleous added roadmap Tasks to finish for a release high priority A PR with high priority for review and merging. labels Nov 13, 2024

TedThemistokleous reviewed Nov 13, 2024

View reviewed changes

src/targets/gpu/target.cpp Show resolved Hide resolved

Base automatically changed from hipblaslt-update-finalize to develop November 13, 2024 21:10

ahsan-ca requested a review from TedThemistokleous November 13, 2024 21:14

TedThemistokleous reviewed Nov 13, 2024

View reviewed changes

src/targets/gpu/hip_gemm_impl.cpp Show resolved Hide resolved

TedThemistokleous reviewed Nov 13, 2024

View reviewed changes

src/targets/gpu/hip_gemm_impl.cpp Show resolved Hide resolved

TedThemistokleous reviewed Nov 14, 2024

View reviewed changes

src/targets/gpu/compile_hipblaslt.cpp Show resolved Hide resolved

TedThemistokleous reviewed Nov 14, 2024

View reviewed changes

CharlieL7 approved these changes Nov 14, 2024

View reviewed changes

ahsan-ca requested a review from TedThemistokleous November 14, 2024 19:57

TedThemistokleous approved these changes Nov 14, 2024

View reviewed changes

Merge branch 'develop' into hipblaslt-compile

250258e

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Compile pass for hipBLASLt #3594

Compile pass for hipBLASLt #3594

ahsan-ca commented Nov 6, 2024 •

edited

Loading

codecov bot commented Nov 6, 2024 •

edited

Loading

ahsan-ca commented Nov 7, 2024

bpickrel commented Nov 11, 2024

CharlieL7 commented Nov 11, 2024

pfultz2 commented Nov 11, 2024

ahsan-ca commented Nov 12, 2024

TedThemistokleous left a comment

TedThemistokleous left a comment

migraphx-bot commented Nov 15, 2024

migraphx-bot commented Nov 15, 2024

Compile pass for hipBLASLt #3594

Are you sure you want to change the base?

Compile pass for hipBLASLt #3594

Conversation

ahsan-ca commented Nov 6, 2024 • edited Loading

codecov bot commented Nov 6, 2024 • edited Loading

Codecov Report

ahsan-ca commented Nov 7, 2024

bpickrel commented Nov 11, 2024

CharlieL7 commented Nov 11, 2024

pfultz2 commented Nov 11, 2024

ahsan-ca commented Nov 12, 2024

TedThemistokleous left a comment

Choose a reason for hiding this comment

TedThemistokleous left a comment

Choose a reason for hiding this comment

migraphx-bot commented Nov 15, 2024

migraphx-bot commented Nov 15, 2024

ahsan-ca commented Nov 6, 2024 •

edited

Loading

codecov bot commented Nov 6, 2024 •

edited

Loading