New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

New transformation 'LowerConstantArrayIndices' to allow to … #348

Merged

reuterbal merged 3 commits into main from nams-cloudsc-lower-constant-array-indices

Aug 23, 2024

Collaborator

MichaelSt98 commented Jul 22, 2024

to allow to pass/lower constant array indices to kernel(s) calls

E.g.,

subroutine driver(...)
  real, intent(inout) :: var(nlon,nlev,5,nb)
  do ibl=1,10
    call kernel(var(:, :, 1, ibl), var(:, :, 2:5, ibl))
  end do
end subroutine driver
     
subroutine kernel(var1, var2)
  real, intent(inout) :: var1(nlon, nlev)
  real, intent(inout) :: var2(nlon, nlev, 4)
  var1(:, :) = ...
  do jk=1,nlev
    do jl=1,nlon
      var1(jl, jk) = ...
      do jt=1,4
        var2(jl, jk, jt) = ...
      enddo
    enddo
  enddo
end subroutine kernel

is transformed to:

subroutine driver(...)
  real, intent(inout) :: var(nlon,nlev,5,nb)
  do ibl=1,10
    call kernel(var(:, :, :, ibl), var(:, :, :, ibl))
  end do
end subroutine driver

subroutine kernel(var1, var2)
  real, intent(inout) :: var1(nlon, nlev, 5)
  real, intent(inout) :: var2(nlon, nlev, 5)
  var1(:, :, 1) = ...
  do jk=1,nlev
    do jl=1,nlon
      var1(jl, jk, 1) = ...
      do jt=1,4
        var2(jl, jk, jt + 2 + -1) = ...
      enddo
    enddo
  enddo
end subroutine kernel

This allows to use the "normal" driver and kernel for CLOUDSC for e.g., the CUF transformation, see CLOUDSC branch: nams-lower-constant-array-indices.

github-actions bot commented Jul 22, 2024

Documentation for this branch can be viewed at https://sites.ecmwf.int/docs/loki/348/index.html

MichaelSt98 requested review from reuterbal and mlange05

July 22, 2024 09:51

MichaelSt98 force-pushed the nams-cloudsc-lower-constant-array-indices branch 2 times, most recently from 3751695 to 10d932f Compare

July 22, 2024 12:40

codecov bot commented Jul 22, 2024 •

edited

Loading

Codecov Report

Attention: Patch coverage is 99.57983% with 1 line in your changes missing coverage. Please review.

Project coverage is 95.41%. Comparing base (538a744) to head (80a1a81).
Report is 4 commits behind head on main.

Files	Patch %	Lines
loki/transformations/array_indexing.py	98.98%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #348      +/-   ##
==========================================
+ Coverage   95.39%   95.41%   +0.02%     
==========================================
  Files         177      177              
  Lines       37094    37332     +238     
==========================================
+ Hits        35385    35622     +237     
- Misses       1709     1710       +1

Flag	Coverage Δ
lint_rules	`96.39% <ø> (ø)`
loki	`95.40% <99.57%> (+0.02%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

reuterbal requested changes

View reviewed changes

Collaborator

reuterbal left a comment

Thanks, this looks quite promising and useful.

I think a current limitation is that this allows replacing only one index and could be easily extended to an arbitrary number - I've left a suggestion how this could be tackled, hope it's clear.

Otherwise the transformation code itself is a bit Spaghetti-like and could benefit from breaking down into a few logical steps to improve clarity and readability.

loki/transformations/array_indexing.py

+                    subroutine driver(...)
+                      real, intent(inout) :: var(nlon,nlev,5,nb)
+                      do ibl=1,10
+                        call kernel(var(:, :, :, ibl), var(:, :, :, ibl))

Collaborator

reuterbal Jul 24, 2024

Do we have potential issues with aliasing here, or rather compilers not being able to figure out that there isn't any? I'm wondering wether we could avoid passing the same array multiple times and instead rename the argument then on kernel side?

Collaborator Author

MichaelSt98 Jul 26, 2024

Sure, that is possible. However, this makes the transformation more invasive and the question arises whether we want to do that within this transformation or introduce a separate utility/transformation to do that. What do you think?

Collaborator

reuterbal Jul 26, 2024

Ah, good point, that could indeed be a separate transformation pass! In that case, disregard this here.

loki/transformations/array_indexing.py Show resolved Hide resolved

loki/transformations/array_indexing.py

+                          inline_constant_parameters(routine, external_only=self.inline_external_only)
+                          self.process(routine, targets)
+                  def process(self, routine, targets):

Collaborator

reuterbal Jul 24, 2024

This is pretty lengthy code and hard to digest. Can we break this down a bit, e.g., like this?

def process_caller():
    dispatched_routines = set()
    for call in ...:
        for routine_arg, call_arg in ...:
            # Create argument mappings
            self.update_caller_arg(updated_call_args, ...)
            if call.routine.name not in dispatched_routines:
                self.update_callee_arg(routine_vmap, ...)

        # Apply mapping to callee
        if call.routine.name not in dispatched_routines:
            dispatched_routines.add(call.routine.name)
            self.process_callee(call.routine, routine_vmap)

        # Update the call
        self.update_call(...)

loki/transformations/array_indexing.py Outdated

Comment on lines 740 to 745

+                              insert_dims = tuple((i, dim) for i, dim in enumerate(call_arg.dimensions) if self.is_constant_dim(dim))
+                              if insert_dims:
+                                  for insert_dim in insert_dims:
+                                      new_dims = list(call_arg.dimensions)
+                                      new_dims[insert_dim[0]] = sym.RangeIndex((None, None))
+                                      updated_call_args[call_arg.name] = call_arg.clone(dimensions=as_tuple(new_dims))

Collaborator

reuterbal Jul 24, 2024

I think this would be a bit shorter and more readable:

Suggested change

      
                            insert_dims = tuple((i, dim) for i, dim in enumerate(call_arg.dimensions) if self.is_constant_dim(dim))
          
                            if insert_dims:
          
                                for insert_dim in insert_dims:
          
                                    new_dims = list(call_arg.dimensions)
          
                                    new_dims[insert_dim[0]] = sym.RangeIndex((None, None))
          
                                    updated_call_args[call_arg.name] = call_arg.clone(dimensions=as_tuple(new_dims))
          
                            if any(self.is_constant_dim(d) for d in call_arg.dimensions):
          
                                new_dims = [sym.RangeIndex((None, None)) if self.is_constant_dim(d) else d for d in call_arg.dimensions]
          
                                updated_call_args[call_arg.name] = call_arg.clone(dimensions=as_tuple(new_dims))

and then re-write the bit for the routine argument similarly.

loki/transformations/array_indexing.py Outdated

+                                          new_dims = list(routine_arg.dimensions)
+                                          # dimension is a constant RangeIndex, e.g., '1:3'
+                                          if isinstance(insert_dim[1], sym.RangeIndex):
+                                              introduce_offset[routine_arg.name] = (insert_dim[0], insert_dim[1].children[0])

Collaborator

reuterbal Jul 24, 2024

This means the transformation only works if there is a single index to be replaced, right? Couldn't we just use a generic map with offsets, i.e., if we have something like

call routine(..., var(:, 1, :, 2:3, ibl), ...)
...
subroutine routine(..., arg, ...)
  ... :: arg(:, :, :)

that would be transformed to

call routine(..., var(:, :, :, :, ibl), ...)
...
subroutine routine(..., arg, ...)
  ... :: arg(:, :, :, :)

then have the corresponding offset_map as something like:

offset_map[arg] = {
    0: (0, None), # same index as before, no offset
    1: (None, 1), # New index, offset 1
    2: (1, None), # Used to be position 1, no offset
    3: (2, 2), # Used to be position 2, Offset 2
}

I hope it is clear what I mean here

MichaelSt98 force-pushed the nams-cloudsc-lower-constant-array-indices branch from 0d83abd to 6e779ee Compare

August 7, 2024 11:42

Collaborator Author

MichaelSt98 commented Aug 7, 2024 •

edited

Loading

Refactored and improved.

With this we can transform something funny like:

subroutine driver(nlon,nlev,nb,var)
  use kernel_mod, only: kernel
  implicit none
  integer, parameter :: param_1 = 1
  integer, parameter :: param_2 = 2
  integer, parameter :: param_3 = 5
  integer, intent(in) :: nlon,nlev,nb
  real, intent(inout) :: var(nlon,4,3,nlev,param_3,nb)
  integer :: ibl, j
  integer :: offset
  integer :: some_val
  integer :: loop_start, loop_end
  loop_start = 2
  loop_end = nb
  some_val = 0
  offset = 1
  !$omp test
  do ibl=loop_start, loop_end
    do j=1,4
      call kernel(nlon,nlev,var(:,j,1,:,param_1,ibl), var(:,j,2:3,:,param_2:param_3,ibl), offset, loop_start, loop_end)
      call kernel(nlon,nlev,var(:,j,1,:,param_1,ibl), var(:,j,2:3,:,param_2:param_3,ibl), offset, loop_start, loop_end)
    end do
  enddo
end subroutine driver

subroutine kernel(nlon,nlev,var,another_var,icend,lstart,lend)
  use compute_mod, only: compute
  implicit none
  integer, intent(in) :: nlon,nlev,icend,lstart,lend
  real, intent(inout) :: var(nlon,nlev)
  real, intent(inout) :: another_var(nlon,2,nlev,4)
  integer :: jk, jl, jt
  var(:,:) = 0.
  do jk = 1,nlev
    do jl = 1, nlon
      var(jl, jk) = 0.
      do jt= 1,4
        another_var(jl, 1, jk, jt) = 0.0
      end do
    end do
  end do
  call compute(nlon,nlev,var)
  call compute(nlon,nlev,var)
end subroutine kernel

subroutine compute(nlon,nlev,var)
  implicit none
  integer, intent(in) :: nlon,nlev
  real, intent(inout) :: var(nlon,nlev)
  var(:,:) = 0.
end subroutine compute

to

SUBROUTINE driver (nlon, nlev, nb, var)
  USE kernel_mod, ONLY: kernel
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: nlon, nlev, nb
  REAL, INTENT(INOUT) :: var(nlon, 4, 3, nlev, 5, nb)
  INTEGER :: ibl, j
  INTEGER :: offset
  INTEGER :: some_val
  INTEGER :: loop_start, loop_end
  loop_start = 2
  loop_end = nb
  some_val = 0
  offset = 1
!$omp test
  DO ibl=loop_start,loop_end
    DO j=1,4
      CALL kernel(nlon, nlev, var(:, j, :, :, :, ibl), var(:, j, :, :, :, ibl), offset, loop_start, loop_end)
      CALL kernel(nlon, nlev, var(:, j, :, :, :, ibl), var(:, j, :, :, :, ibl), offset, loop_start, loop_end)
    END DO
  END DO
END SUBROUTINE driver

SUBROUTINE kernel (nlon, nlev, var, another_var, icend, lstart, lend)
  USE compute_mod, ONLY: compute
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: nlon, nlev, icend, lstart, lend
  REAL, INTENT(INOUT) :: var(nlon, 3, nlev, 5)
  REAL, INTENT(INOUT) :: another_var(nlon, 3, nlev, 5)
  INTEGER :: jk, jl, jt
  var(:, 1, :, 1) = 0.
  DO jk=1,nlev
    DO jl=1,nlon
      var(jl, 1, jk, 1) = 0.
      DO jt=1,4
        another_var(jl, 1 + 2 + -1, jk, jt + 2 + -1) = 0.0
      END DO
    END DO
  END DO
  CALL compute(nlon, nlev, var(:, :, :, :))
  CALL compute(nlon, nlev, var(:, :, :, :))
END SUBROUTINE kernel

SUBROUTINE compute (nlon, nlev, var)
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: nlon, nlev
  REAL, INTENT(INOUT) :: var(nlon, 3, nlev, 5)
  var(:, 1, :, 1) = 0.
END SUBROUTINE compute

reuterbal approved these changes

View reviewed changes

Collaborator

reuterbal left a comment

Many!! thanks for these changes, I think that improved the capabilities of the transform and the structure is also much clearer to me now. Good to go from me!

MichaelSt98 added 3 commits

August 23, 2024 16:23


          Introduce new transformation 'LowerConstantArrayIndices' to allow to …

b44810f

…pass/lower constant array indices to kernel(s) calls


          use xmods=[tmp_path] for test

84b8f5b


          Refactor and improve 'LowerConstantArrayIndices' (which also allows n…

80a1a81

…ow multiple constant array indices for one array argument)

reuterbal force-pushed the nams-cloudsc-lower-constant-array-indices branch from 6e779ee to 80a1a81 Compare

August 23, 2024 14:25

reuterbal added the ready for merge label

reuterbal merged commit 9a76c21 into main

13 checks passed

reuterbal deleted the nams-cloudsc-lower-constant-array-indices branch

August 23, 2024 18:15

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready for merge