-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLOUDSC low-level GPU (transpilation) via Loki (CUF/CUDA) #328
Conversation
Documentation for this branch can be viewed at https://sites.ecmwf.int/docs/loki/328/index.html |
6cbc2b3
to
a3b8146
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #328 +/- ##
==========================================
+ Coverage 95.41% 95.44% +0.02%
==========================================
Files 178 181 +3
Lines 37312 38163 +851
==========================================
+ Hits 35600 36423 +823
- Misses 1712 1740 +28
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
1afb822
to
0665ded
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, first of all this is a fantastic piece of work and achievement! Well done!
The complexity of the implementation is immense and I've tried to work my way through it to the best of my abilities but I definitely can't claim I fully understood it. Particularly the scc_cuf.py implementation I only skimmed and trust that it seems to be doing what it is supposed to do.
While working my way through the diff I've left comments as I went along, which have piled up to a quite substantial number now. Some of them are small things, others maybe require a little more work. So, it will also be quite confusing tracking the changes that you make as a consequence. To make the integration a little more digestable, I would suggest we try to split the PR into smaller pieces and bring them in piecewise.
I would probably start with the changes in block_index_transformations.py and the associated changes. These appear to me sufficiently standalone and independently tested so that I think it should be possible to merge them first.
Maybe we can subsequently take in the C-flavour backend changes, including some of the F2C transpilation changes if required?
Hopefully, this will reduce the size of this PR than to only the actual SCC transformation?
Open to suggestions here, of course.
Finally, many thanks and great work again!
Thank you for the review @reuterbal! I'll tackle your comments and see whether I can split up the PR into multiple smaller PRs. |
fcfc900
to
02a6f4e
Compare
61866e0
to
7694278
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks for working through the comments and making requested changes. This looks almost ready to me.
I have marked comments as resolved that appear to have been fixed. There's only a handful left for which it would be good if you could have another look @MichaelSt98, and also a few where it would be good for @mlange05 to maybe provide some feedback.
The branch has a conflict in one file now, which needs to be resolved.
6844b29
to
12bfcc3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much, particularly for sorting out the OpenMP directive removal!
All good from my side. Any additions @mlange05 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely fantastic! No notes - just brilliant! Many, many thanks, GTG for me!
…'-option in general
… check for intent, ...
…' utilities within 'array_indexing'
…opTransformation'
…tation details, ...
…y check for intent being None
12bfcc3
to
cccfe50
Compare
As discussed offline: I have removed the commit that points to the custom CLOUDSC branch. After merging this PR, we can create a new Loki release. As soon as CI tests are green I will merge. Thanks again! |
Still work in progress ...
Corresponding CLOUDSC branch
nams-cloudsc-low-level-GPU
Build via
cloudsc-bundle build --clean --arch=arch/ecmwf/hpc2020/nvhpc/22.11 --with-loki --with-gpu --with-cuda
New modes/binaries:
./bin/dwarf-cloudsc-loki-idem-lower
CPU variant with lowered block index./bin/dwarf-cloudsc-loki-idem-lower-loop
CPU variant with lowered block index and loop./bin/dwarf-cloudsc-loki-scc-parametrise-cuda
SCC-PARAMETRISE CUDA C./bin/dwarf-cloudsc-loki-scc-hoist-cuda
SCC-HOIST CUDA CFurthermore, usage of pipelines: