Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SCC: Loop over blocks not annotated correctly when nested within another loop in driver #246

Open
skarppinen opened this issue Mar 12, 2024 · 1 comment

Comments

@skarppinen
Copy link
Contributor

skarppinen commented Mar 12, 2024

Hello, filing an issue by request of @reuterbal.

Here is a reproducer of the problem:

module mod
    implicit none
    contains
    subroutine wrapper()
        integer, parameter :: ngpblk = 2
        integer, parameter :: klon = 10
        integer, parameter :: klev = 5 
        integer, parameter :: niter = 10
        integer :: jblk, jiter
        integer :: arr(klon, klev, ngpblk)

        ! Begin relevant part.
        DO jiter=1,niter
            DO jblk=1,ngpblk
                CALL kernel(klon, klev, 1, 1, arr(:, :, jblk))
            END DO
        END DO
        ! End relevant part.

    end subroutine wrapper

    subroutine kernel(klon, klev, jlev_lower, jlon_lower, arr)
        integer, intent(in) :: klon
        integer, intent(in) :: klev
        integer, intent(in) :: jlev_lower, jlon_lower
        integer, intent(inout) :: arr(klon, klev)
        integer :: jlon, jlev

        DO jlev=jlev_lower,klev
            DO jlon=jlon_lower,klon
                arr(jlon, jlev) = jlon + jlev
            END DO
        END DO

    end subroutine kernel

end module mod

After transforming with SCC, the relevant part becomes:

!$acc parallel loop gang vector_length( KLON )
    DO jiter=1,niter
      DO jblk=1,ngpblk
        CALL kernel_SCC(klon, klev, 1, 1, arr(:, :, jblk))
      END DO
    END DO
!$acc end parallel loop

Although I believe this should be

    DO jiter=1,niter
!$acc parallel loop gang vector_length( KLON )
      DO jblk=1,ngpblk
        CALL kernel_SCC(klon, klev, 1, 1, arr(:, :, jblk))
      END DO
!$acc end parallel loop
    END DO

instead.

I invoke the SCC transform with: loki-transform.py convert --mode scc --config config.cfg --path src --out-path scc,
where the contents of "config.cfg" are:

[default]
role = 'kernel'
expand = true  # Automatically expand subroutine calls
strict = true # Throw exceptions during discovery
enable_imports = true

[routines.wrapper]
role = 'driver'

[dimensions.horizontal]
size = 'KLON'
index = 'JLON'
bounds = ['jlon_lower', 'klon']

[dimensions.vertical]
size = 'KLEV'
index = 'JLEV'

[dimensions.block_dim]
size = 'NGPBLK'
index = 'JBLK'

I believe the issue is coming from SCCAnnotate.process_driver. I did some digging and I found out that the method calls SCCBaseTransformation.find_driver_loops, which returns the outer loop in the above example.
Should this function return the inner loop instead? (as the loops get passed to annotate_driver)?

@reuterbal
Copy link
Collaborator

I can confirm the behaviour and you are on the right path w.r.t. where this comes from: currently, all loops that contain a call to a target routine are considered driver loops. The block_dim is not used for this. Consequently, the jiter loop is picked.

I'm not sure if we want/should prescribe some kind of rule about the nesting of driver loops.

In any case, here is a test based on your reproduces, which should help when investigating this in the future:

@pytest.mark.parametrize('frontend', available_frontends())
def test_scc_annotate_driver_loop(frontend):
    """
    Test for issue #246
    """
    fcode = """
module mod
    implicit none
    integer, parameter :: ngpblk = 2
    integer, parameter :: klon = 10
    integer, parameter :: klev = 5
    contains
    subroutine wrapper(arr)
        integer, parameter :: niter = 10
        integer, intent(in) :: arr(klon, klev, ngpblk)
        integer :: jblk, jiter

        ! Begin relevant part.
        DO jiter=1,niter
            DO jblk=1,ngpblk
                CALL kernel(klon, klev, 1, 1, arr(:, :, jblk))
            END DO
        END DO
        ! End relevant part.

    end subroutine wrapper

    subroutine kernel(klon, klev, jlev_lower, jlon_lower, arr)
        integer, intent(in) :: klon
        integer, intent(in) :: klev
        integer, intent(in) :: jlev_lower, jlon_lower
        integer, intent(inout) :: arr(klon, klev)
        integer :: jlon, jlev

        DO jlev=jlev_lower,klev
            DO jlon=jlon_lower,klon
                arr(jlon, jlev) = jlon + jlev
            END DO
        END DO

    end subroutine kernel

end module mod
    """.strip()

    workdir = gettempdir() / 'test_scc_annotate_driver_loop'
    workdir.mkdir(exist_ok=True)

    source = Sourcefile.from_source(fcode, frontend=frontend, xmods=(workdir,))

    horizontal = Dimension('horizontal', index='JLON', bounds=['jlon_lower', 'klon'], size='KLON')
    vertical = Dimension('vertical', index='JLEV', bounds=['jlev_lower', 'klev'], size='KLEV')
    block_dim = Dimension('block_dim', index='JBLK', size='NGPBLK')

    # Test OpenACC annotations on non-hoisted version
    scc_transform = (
        SCCDevectorTransformation(horizontal=horizontal),
        SCCDemoteTransformation(horizontal=horizontal),
        SCCRevectorTransformation(horizontal=horizontal),
        SCCAnnotateTransformation(
            horizontal=horizontal, vertical=vertical, directive='openacc', block_dim=block_dim
        ),
    )
    for transform in scc_transform:
        transform.apply(source['wrapper'], role='driver', targets=['kernel'])
        transform.apply(source['kernel'], role='kernel')

    # Ensure routine is annotated at vector level
    pragmas = FindNodes(Pragma).visit(source['kernel'].ir)
    assert len(pragmas) == 5
    assert pragmas[0].keyword == 'acc'
    assert pragmas[0].content == 'routine vector'
    assert pragmas[1].keyword == 'acc'
    assert pragmas[1].content == 'data present(arr)'
    assert pragmas[-1].keyword == 'acc'
    assert pragmas[-1].content == 'end data'

    # Ensure vector and seq loops are annotated, including privatized variable `b`
    with pragmas_attached(source['kernel'], Loop):
        kernel_loops = FindNodes(Loop).visit(source['kernel'].ir)
        assert len(kernel_loops) == 2
        assert kernel_loops[0].pragma[0].keyword == 'acc'
        assert kernel_loops[0].pragma[0].content == 'loop vector'
        assert kernel_loops[1].pragma[0].keyword == 'acc'
        assert kernel_loops[1].pragma[0].content == 'loop seq'

    # Ensure a single outer parallel loop in driver
    with pragmas_attached(source['wrapper'], Loop):
        driver_loops = FindNodes(Loop).visit(source['wrapper'].body)
        assert len(driver_loops) == 2
        assert not driver_loops[0].pragma
        assert driver_loops[1].pragma[0].keyword == 'acc'
        assert driver_loops[1].pragma[0].content == 'parallel loop gang vector_length(KLON)'

    rmtree(workdir)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants