Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compile pass for hipBLASLt #3594

Open
wants to merge 15 commits into
base: develop
Choose a base branch
from
Open

Compile pass for hipBLASLt #3594

wants to merge 15 commits into from

Conversation

ahsan-ca
Copy link
Contributor

@ahsan-ca ahsan-ca commented Nov 6, 2024

This pass helps to allocate actual workspace size needed for hipblaslt algos.
It does so by querying for the workspace needed for the particular algo, and allocates memory accordingly.

@ahsan-ca ahsan-ca requested a review from pfultz2 November 6, 2024 04:47
@ahsan-ca ahsan-ca self-assigned this Nov 6, 2024
Copy link

codecov bot commented Nov 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.18%. Comparing base (b39a938) to head (250258e).

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #3594   +/-   ##
========================================
  Coverage    92.18%   92.18%           
========================================
  Files          513      513           
  Lines        21576    21576           
========================================
  Hits         19889    19889           
  Misses        1687     1687           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ahsan-ca ahsan-ca added the enhancement New feature or request label Nov 6, 2024
@ahsan-ca
Copy link
Contributor Author

ahsan-ca commented Nov 7, 2024

Windows build failure should be fixed by: 4ff36ae:

C:\home\jenkins\agent\workspace\UIF2_AMDMIGraphX_PR-3594\AMDMIGraphX\src\targets\gpu\compile_hipblaslt.cpp:78:26: error: use of undeclared identifier 'hipblaslt_workspace_size'
   78 |         std::size_t ws = hipblaslt_workspace_size;
      |                          ^
1 error generated.

@ahsan-ca ahsan-ca changed the base branch from develop to hipblaslt-update-finalize November 8, 2024 00:09
@bpickrel
Copy link
Contributor

This needs a test. There are many compiler pass tests in directory test/ that make a run_pass() call and could be used for examples. It's not clear to me whether it's relevant to add a test case to test/gpu/gemm_tune.cpp.

@CharlieL7
Copy link
Collaborator

Current code looks good. As Brian mentions would like to see some tests. The main reason for this compiler pass is to do output fusions with hipblaslt, right?

@pfultz2
Copy link
Collaborator

pfultz2 commented Nov 11, 2024

The main reason for this compiler pass is to do output fusions with hipblaslt, right?

The main reason is to be able to insert accurate workspace allocation after tuning. We cant insert workspace allocation instructions during lowering because we dont know the layouts until after eliminate_contiguous and we also need to run tuning as well.

…olutions is supported for the algo

2. Use default workspace size when returned workspace size is 0, temporary workaround for memory issues with 0 workspace size.
@ahsan-ca
Copy link
Contributor Author

Chris reached out to Bryant and discussed the issue where none of the algos returned by getAllAlgos is supported/valid. Bryant's suggestion was to use hipblasLtMatmulAlgoGetHeuristic() with requestedAlgoCount=1, and then query the index for the algo returned by getIndexFromAlgo(). I have made the changes for this in tune() via commit: 2ae35e0.

@ahsan-ca ahsan-ca linked an issue Nov 13, 2024 that may be closed by this pull request
@ahsan-ca ahsan-ca requested review from TedThemistokleous, shivadbhavsar and kahmed10 and removed request for bpickrel November 13, 2024 20:27
@TedThemistokleous TedThemistokleous added roadmap Tasks to finish for a release high priority A PR with high priority for review and merging. labels Nov 13, 2024
Base automatically changed from hipblaslt-update-finalize to develop November 13, 2024 21:10
Copy link
Collaborator

@TedThemistokleous TedThemistokleous left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few comments.

Copy link
Collaborator

@TedThemistokleous TedThemistokleous left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix CI but others LGTM

@migraphx-bot
Copy link
Collaborator

Test Batch Rate new
250258
Rate old
b39a93
Diff Compare
torchvision-resnet50 64 3,260.12 3,259.98 0.00%
torchvision-resnet50_fp16 64 6,988.96 6,991.63 -0.04%
torchvision-densenet121 32 2,434.72 2,436.76 -0.08%
torchvision-densenet121_fp16 32 4,101.04 4,076.33 0.61%
torchvision-inceptionv3 32 1,629.01 1,630.75 -0.11%
torchvision-inceptionv3_fp16 32 2,746.52 2,750.13 -0.13%
cadene-inceptionv4 16 765.70 766.23 -0.07%
cadene-resnext64x4 16 811.34 811.64 -0.04%
slim-mobilenet 64 7,466.21 7,471.32 -0.07%
slim-nasnetalarge 64 208.42 208.48 -0.03%
slim-resnet50v2 64 3,440.06 3,441.77 -0.05%
bert-mrpc-onnx 8 1,151.41 1,151.97 -0.05%
bert-mrpc-tf 1 460.06 492.75 -6.63% 🔴
pytorch-examples-wlang-gru 1 414.47 417.46 -0.71%
pytorch-examples-wlang-lstm 1 408.33 406.83 0.37%
torchvision-resnet50_1 1 781.34 775.10 0.80%
cadene-dpn92_1 1 397.20 421.44 -5.75% 🔴
cadene-resnext101_1 1 383.11 383.26 -0.04%
onnx-taau-downsample 1 345.67 346.67 -0.29%
dlrm-criteoterabyte 1 33.33 33.33 0.01%
dlrm-criteoterabyte_fp16 1 52.50 52.76 -0.50%
agentmodel 1 8,172.67 9,419.30 -13.23% 🔴
unet_fp16 2 58.91 58.87 0.06%
resnet50v1_fp16 1 942.95 941.94 0.11%
resnet50v1_int8 1 1,018.89 985.94 3.34% 🔆
bert_base_cased_fp16 64 1,169.84 1,171.42 -0.13%
bert_large_uncased_fp16 32 363.40 363.44 -0.01%
bert_large_fp16 1 198.69 200.59 -0.95%
distilgpt2_fp16 16 2,199.29 2,203.62 -0.20%
yolov5s 1 531.68 526.10 1.06%
tinyllama 1 43.41 43.40 0.02%
vicuna-fastchat 1 173.53 172.63 0.52%
whisper-tiny-encoder 1 418.66 418.90 -0.06%
whisper-tiny-decoder 1 435.15 428.52 1.55%

This build is not recommended to merge 🔴

@migraphx-bot
Copy link
Collaborator


     ✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

     ✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance

     ✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

     ✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

     ✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance

     ✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance

     ✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance

     ✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

     ✅ agentmodel: PASSED: MIGraphX meets tolerance

     ✅ unet: PASSED: MIGraphX meets tolerance

     ✅ resnet50v1: PASSED: MIGraphX meets tolerance

     ✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

🔴bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output


     ✅ bert_large: PASSED: MIGraphX meets tolerance

     ✅ yolov5s: PASSED: MIGraphX meets tolerance

     ✅ tinyllama: PASSED: MIGraphX meets tolerance

     ✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance

     ✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

     ✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

     ✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request high priority A PR with high priority for review and merging. roadmap Tasks to finish for a release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add compile pass for hipblaslt
6 participants