Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLOUDSC low-level GPU (transpilation) via Loki (CUF/CUDA) #328

Merged
merged 36 commits into from
Sep 6, 2024

Conversation

MichaelSt98
Copy link
Collaborator

Still work in progress ...

Corresponding CLOUDSC branch nams-cloudsc-low-level-GPU

Build via cloudsc-bundle build --clean --arch=arch/ecmwf/hpc2020/nvhpc/22.11 --with-loki --with-gpu --with-cuda

New modes/binaries:

  • ./bin/dwarf-cloudsc-loki-idem-lower CPU variant with lowered block index
  • ./bin/dwarf-cloudsc-loki-idem-lower-loop CPU variant with lowered block index and loop
  • ./bin/dwarf-cloudsc-loki-scc-parametrise-cuda SCC-PARAMETRISE CUDA C
  • ./bin/dwarf-cloudsc-loki-scc-hoist-cuda SCC-HOIST CUDA C

Furthermore, usage of pipelines:

SCCLowLevelCufParametrise = partial(
    Pipeline, classes=(
        SCCBaseTransformation,
        SCCDevectorTransformation,
        SCCDemoteTransformation,
        SCCRevectorTransformation,
        LowerBlockIndexTransformation,
        InjectBlockIndexTransformation,
        LowerBlockLoopTransformation,
        SccLowLevelLaunchConfiguration,
        SccLowLevelDataOffload,
        ParametriseTransformation
    )
)

SCCLowLevelCufHoist = partial(
    Pipeline, classes=(
        SCCBaseTransformation,
        SCCDevectorTransformation,
        SCCDemoteTransformation,
        SCCRevectorTransformation,
        LowerBlockIndexTransformation,
        InjectBlockIndexTransformation,
        LowerBlockLoopTransformation,
        SccLowLevelLaunchConfiguration,
        SccLowLevelDataOffload,
        HoistTemporaryArraysAnalysis,
        HoistTemporaryArraysDeviceAllocatableTransformation
    )
)

SCCLowLevelParametrise = partial(
    Pipeline, classes=(
        InlineTransformation,
        GlobalVariableAnalysis,
        GlobalVarHoistTransformation,
        DerivedTypeArgumentsTransformation,
        SCCBaseTransformation,
        SCCDevectorTransformation,
        SCCDemoteTransformation,
        SCCRevectorTransformation,
        LowerBlockIndexTransformation,
        InjectBlockIndexTransformation,
        LowerBlockLoopTransformation,
        SccLowLevelLaunchConfiguration,
        SccLowLevelDataOffload,
        ParametriseTransformation
    )
)

SCCLowLevelHoist = partial(
    Pipeline, classes=(
        InlineTransformation,
        GlobalVariableAnalysis,
        GlobalVarHoistTransformation,
        DerivedTypeArgumentsTransformation,
        SCCBaseTransformation,
        SCCDevectorTransformation,
        SCCDemoteTransformation,
        SCCRevectorTransformation,
        LowerBlockIndexTransformation,
        InjectBlockIndexTransformation,
        LowerBlockLoopTransformation,
        SccLowLevelLaunchConfiguration,
        SccLowLevelDataOffload,
        HoistTemporaryArraysAnalysis,
        HoistTemporaryArraysPragmaOffloadTransformation
    )
)

Copy link

Documentation for this branch can be viewed at https://sites.ecmwf.int/docs/loki/328/index.html

@MichaelSt98 MichaelSt98 marked this pull request as ready for review June 17, 2024 09:57
Copy link

codecov bot commented Jun 18, 2024

Codecov Report

Attention: Patch coverage is 95.19380% with 62 lines in your changes missing coverage. Please review.

Project coverage is 95.44%. Comparing base (fce5733) to head (cccfe50).
Report is 37 commits behind head on main.

Files with missing lines Patch % Lines
loki/transformations/single_column/scc_cuf.py 89.13% 39 Missing ⚠️
...oki/transformations/block_index_transformations.py 94.77% 7 Missing ⚠️
loki/backend/cgen.py 94.87% 4 Missing ⚠️
...oki/transformations/single_column/scc_low_level.py 87.87% 4 Missing ⚠️
loki/transformations/transpile/fortran_c.py 96.58% 4 Missing ⚠️
loki/tools/util.py 80.00% 2 Missing ⚠️
loki/backend/cudagen.py 98.59% 1 Missing ⚠️
loki/transformations/array_indexing.py 96.42% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #328      +/-   ##
==========================================
+ Coverage   95.41%   95.44%   +0.02%     
==========================================
  Files         178      181       +3     
  Lines       37312    38163     +851     
==========================================
+ Hits        35600    36423     +823     
- Misses       1712     1740      +28     
Flag Coverage Δ
lint_rules 96.39% <ø> (ø)
loki 95.42% <95.19%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@reuterbal reuterbal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, first of all this is a fantastic piece of work and achievement! Well done!

The complexity of the implementation is immense and I've tried to work my way through it to the best of my abilities but I definitely can't claim I fully understood it. Particularly the scc_cuf.py implementation I only skimmed and trust that it seems to be doing what it is supposed to do.

While working my way through the diff I've left comments as I went along, which have piled up to a quite substantial number now. Some of them are small things, others maybe require a little more work. So, it will also be quite confusing tracking the changes that you make as a consequence. To make the integration a little more digestable, I would suggest we try to split the PR into smaller pieces and bring them in piecewise.

I would probably start with the changes in block_index_transformations.py and the associated changes. These appear to me sufficiently standalone and independently tested so that I think it should be possible to merge them first.

Maybe we can subsequently take in the C-flavour backend changes, including some of the F2C transpilation changes if required?

Hopefully, this will reduce the size of this PR than to only the actual SCC transformation?

Open to suggestions here, of course.

Finally, many thanks and great work again!

loki/backend/cgen.py Outdated Show resolved Hide resolved
loki/backend/cgen.py Outdated Show resolved Hide resolved
loki/backend/cgen.py Outdated Show resolved Hide resolved
loki/backend/cgen.py Outdated Show resolved Hide resolved
loki/backend/cgen.py Outdated Show resolved Hide resolved
loki/transformations/transpile/fortran_c.py Outdated Show resolved Hide resolved
loki/transformations/transpile/fortran_c.py Outdated Show resolved Hide resolved
loki/backend/cgen.py Show resolved Hide resolved
loki/transformations/single_column/base.py Outdated Show resolved Hide resolved
loki/transformations/single_column/base.py Outdated Show resolved Hide resolved
@MichaelSt98
Copy link
Collaborator Author

Thank you for the review @reuterbal!
I have no objections to your comments and want to apologise for the large PR...
However, it seems that you have no general objections regarding the implementation/splitting of transformation/etc.

I'll tackle your comments and see whether I can split up the PR into multiple smaller PRs.

@MichaelSt98 MichaelSt98 force-pushed the nams-cloudsc-low-level-GPU branch 2 times, most recently from 61866e0 to 7694278 Compare August 5, 2024 13:56
Copy link
Collaborator

@reuterbal reuterbal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks for working through the comments and making requested changes. This looks almost ready to me.

I have marked comments as resolved that appear to have been fixed. There's only a handful left for which it would be good if you could have another look @MichaelSt98, and also a few where it would be good for @mlange05 to maybe provide some feedback.

The branch has a conflict in one file now, which needs to be resolved.

loki/transformations/block_index_transformations.py Outdated Show resolved Hide resolved
loki/transformations/single_column/scc_cuf.py Outdated Show resolved Hide resolved
loki/transformations/single_column/scc_cuf.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@reuterbal reuterbal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much, particularly for sorting out the OpenMP directive removal!

All good from my side. Any additions @mlange05 ?

@reuterbal reuterbal added the ready for merge This PR has been approved and is ready to be merged label Sep 5, 2024
Copy link
Collaborator

@mlange05 mlange05 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely fantastic! No notes - just brilliant! Many, many thanks, GTG for me!

@reuterbal
Copy link
Collaborator

As discussed offline: I have removed the commit that points to the custom CLOUDSC branch. After merging this PR, we can create a new Loki release.
Subsequently, please file a corresponding CLOUDSC PR against develop that adds the transpilation targets in the CMake there and points to the new Loki version.
Once that's in CLOUDSC develop, we can also enable (at least the idem variants) in the test_cloudsc.py of the regression tests.

As soon as CI tests are green I will merge. Thanks again!

@reuterbal reuterbal merged commit 93196fa into main Sep 6, 2024
13 checks passed
@reuterbal reuterbal deleted the nams-cloudsc-low-level-GPU branch September 6, 2024 12:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready for merge This PR has been approved and is ready to be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants