Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New transformation 'LowerConstantArrayIndices' to allow to … #348

Merged
merged 3 commits into from
Aug 23, 2024

Conversation

MichaelSt98
Copy link
Collaborator

to allow to pass/lower constant array indices to kernel(s) calls

E.g.,

subroutine driver(...)
  real, intent(inout) :: var(nlon,nlev,5,nb)
  do ibl=1,10
    call kernel(var(:, :, 1, ibl), var(:, :, 2:5, ibl))
  end do
end subroutine driver
     
subroutine kernel(var1, var2)
  real, intent(inout) :: var1(nlon, nlev)
  real, intent(inout) :: var2(nlon, nlev, 4)
  var1(:, :) = ...
  do jk=1,nlev
    do jl=1,nlon
      var1(jl, jk) = ...
      do jt=1,4
        var2(jl, jk, jt) = ...
      enddo
    enddo
  enddo
end subroutine kernel

is transformed to:

subroutine driver(...)
  real, intent(inout) :: var(nlon,nlev,5,nb)
  do ibl=1,10
    call kernel(var(:, :, :, ibl), var(:, :, :, ibl))
  end do
end subroutine driver

subroutine kernel(var1, var2)
  real, intent(inout) :: var1(nlon, nlev, 5)
  real, intent(inout) :: var2(nlon, nlev, 5)
  var1(:, :, 1) = ...
  do jk=1,nlev
    do jl=1,nlon
      var1(jl, jk, 1) = ...
      do jt=1,4
        var2(jl, jk, jt + 2 + -1) = ...
      enddo
    enddo
  enddo
end subroutine kernel

This allows to use the "normal" driver and kernel for CLOUDSC for e.g., the CUF transformation, see CLOUDSC branch: nams-lower-constant-array-indices.

Copy link

Documentation for this branch can be viewed at https://sites.ecmwf.int/docs/loki/348/index.html

@MichaelSt98 MichaelSt98 force-pushed the nams-cloudsc-lower-constant-array-indices branch 2 times, most recently from 3751695 to 10d932f Compare July 22, 2024 12:40
Copy link

codecov bot commented Jul 22, 2024

Codecov Report

Attention: Patch coverage is 99.57983% with 1 line in your changes missing coverage. Please review.

Project coverage is 95.41%. Comparing base (538a744) to head (80a1a81).
Report is 4 commits behind head on main.

Files Patch % Lines
loki/transformations/array_indexing.py 98.98% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #348      +/-   ##
==========================================
+ Coverage   95.39%   95.41%   +0.02%     
==========================================
  Files         177      177              
  Lines       37094    37332     +238     
==========================================
+ Hits        35385    35622     +237     
- Misses       1709     1710       +1     
Flag Coverage Δ
lint_rules 96.39% <ø> (ø)
loki 95.40% <99.57%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@reuterbal reuterbal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks quite promising and useful.

I think a current limitation is that this allows replacing only one index and could be easily extended to an arbitrary number - I've left a suggestion how this could be tackled, hope it's clear.

Otherwise the transformation code itself is a bit Spaghetti-like and could benefit from breaking down into a few logical steps to improve clarity and readability.

subroutine driver(...)
real, intent(inout) :: var(nlon,nlev,5,nb)
do ibl=1,10
call kernel(var(:, :, :, ibl), var(:, :, :, ibl))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have potential issues with aliasing here, or rather compilers not being able to figure out that there isn't any? I'm wondering wether we could avoid passing the same array multiple times and instead rename the argument then on kernel side?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that is possible. However, this makes the transformation more invasive and the question arises whether we want to do that within this transformation or introduce a separate utility/transformation to do that. What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point, that could indeed be a separate transformation pass! In that case, disregard this here.

loki/transformations/array_indexing.py Show resolved Hide resolved
inline_constant_parameters(routine, external_only=self.inline_external_only)
self.process(routine, targets)

def process(self, routine, targets):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty lengthy code and hard to digest. Can we break this down a bit, e.g., like this?

def process_caller():
    dispatched_routines = set()
    for call in ...:
        for routine_arg, call_arg in ...:
            # Create argument mappings
            self.update_caller_arg(updated_call_args, ...)
            if call.routine.name not in dispatched_routines:
                self.update_callee_arg(routine_vmap, ...)

        # Apply mapping to callee
        if call.routine.name not in dispatched_routines:
            dispatched_routines.add(call.routine.name)
            self.process_callee(call.routine, routine_vmap)

        # Update the call
        self.update_call(...)

Comment on lines 740 to 745
insert_dims = tuple((i, dim) for i, dim in enumerate(call_arg.dimensions) if self.is_constant_dim(dim))
if insert_dims:
for insert_dim in insert_dims:
new_dims = list(call_arg.dimensions)
new_dims[insert_dim[0]] = sym.RangeIndex((None, None))
updated_call_args[call_arg.name] = call_arg.clone(dimensions=as_tuple(new_dims))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be a bit shorter and more readable:

Suggested change
insert_dims = tuple((i, dim) for i, dim in enumerate(call_arg.dimensions) if self.is_constant_dim(dim))
if insert_dims:
for insert_dim in insert_dims:
new_dims = list(call_arg.dimensions)
new_dims[insert_dim[0]] = sym.RangeIndex((None, None))
updated_call_args[call_arg.name] = call_arg.clone(dimensions=as_tuple(new_dims))
if any(self.is_constant_dim(d) for d in call_arg.dimensions):
new_dims = [sym.RangeIndex((None, None)) if self.is_constant_dim(d) else d for d in call_arg.dimensions]
updated_call_args[call_arg.name] = call_arg.clone(dimensions=as_tuple(new_dims))

and then re-write the bit for the routine argument similarly.

new_dims = list(routine_arg.dimensions)
# dimension is a constant RangeIndex, e.g., '1:3'
if isinstance(insert_dim[1], sym.RangeIndex):
introduce_offset[routine_arg.name] = (insert_dim[0], insert_dim[1].children[0])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means the transformation only works if there is a single index to be replaced, right? Couldn't we just use a generic map with offsets, i.e., if we have something like

call routine(..., var(:, 1, :, 2:3, ibl), ...)
...
subroutine routine(..., arg, ...)
  ... :: arg(:, :, :)

that would be transformed to

call routine(..., var(:, :, :, :, ibl), ...)
...
subroutine routine(..., arg, ...)
  ... :: arg(:, :, :, :)

then have the corresponding offset_map as something like:

offset_map[arg] = {
    0: (0, None), # same index as before, no offset
    1: (None, 1), # New index, offset 1
    2: (1, None), # Used to be position 1, no offset
    3: (2, 2), # Used to be position 2, Offset 2
}

I hope it is clear what I mean here

@MichaelSt98 MichaelSt98 force-pushed the nams-cloudsc-lower-constant-array-indices branch from 0d83abd to 6e779ee Compare August 7, 2024 11:42
@MichaelSt98
Copy link
Collaborator Author

MichaelSt98 commented Aug 7, 2024

Refactored and improved.

With this we can transform something funny like:

subroutine driver(nlon,nlev,nb,var)
  use kernel_mod, only: kernel
  implicit none
  integer, parameter :: param_1 = 1
  integer, parameter :: param_2 = 2
  integer, parameter :: param_3 = 5
  integer, intent(in) :: nlon,nlev,nb
  real, intent(inout) :: var(nlon,4,3,nlev,param_3,nb)
  integer :: ibl, j
  integer :: offset
  integer :: some_val
  integer :: loop_start, loop_end
  loop_start = 2
  loop_end = nb
  some_val = 0
  offset = 1
  !$omp test
  do ibl=loop_start, loop_end
    do j=1,4
      call kernel(nlon,nlev,var(:,j,1,:,param_1,ibl), var(:,j,2:3,:,param_2:param_3,ibl), offset, loop_start, loop_end)
      call kernel(nlon,nlev,var(:,j,1,:,param_1,ibl), var(:,j,2:3,:,param_2:param_3,ibl), offset, loop_start, loop_end)
    end do
  enddo
end subroutine driver

subroutine kernel(nlon,nlev,var,another_var,icend,lstart,lend)
  use compute_mod, only: compute
  implicit none
  integer, intent(in) :: nlon,nlev,icend,lstart,lend
  real, intent(inout) :: var(nlon,nlev)
  real, intent(inout) :: another_var(nlon,2,nlev,4)
  integer :: jk, jl, jt
  var(:,:) = 0.
  do jk = 1,nlev
    do jl = 1, nlon
      var(jl, jk) = 0.
      do jt= 1,4
        another_var(jl, 1, jk, jt) = 0.0
      end do
    end do
  end do
  call compute(nlon,nlev,var)
  call compute(nlon,nlev,var)
end subroutine kernel

subroutine compute(nlon,nlev,var)
  implicit none
  integer, intent(in) :: nlon,nlev
  real, intent(inout) :: var(nlon,nlev)
  var(:,:) = 0.
end subroutine compute

to

SUBROUTINE driver (nlon, nlev, nb, var)
  USE kernel_mod, ONLY: kernel
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: nlon, nlev, nb
  REAL, INTENT(INOUT) :: var(nlon, 4, 3, nlev, 5, nb)
  INTEGER :: ibl, j
  INTEGER :: offset
  INTEGER :: some_val
  INTEGER :: loop_start, loop_end
  loop_start = 2
  loop_end = nb
  some_val = 0
  offset = 1
!$omp test
  DO ibl=loop_start,loop_end
    DO j=1,4
      CALL kernel(nlon, nlev, var(:, j, :, :, :, ibl), var(:, j, :, :, :, ibl), offset, loop_start, loop_end)
      CALL kernel(nlon, nlev, var(:, j, :, :, :, ibl), var(:, j, :, :, :, ibl), offset, loop_start, loop_end)
    END DO
  END DO
END SUBROUTINE driver

SUBROUTINE kernel (nlon, nlev, var, another_var, icend, lstart, lend)
  USE compute_mod, ONLY: compute
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: nlon, nlev, icend, lstart, lend
  REAL, INTENT(INOUT) :: var(nlon, 3, nlev, 5)
  REAL, INTENT(INOUT) :: another_var(nlon, 3, nlev, 5)
  INTEGER :: jk, jl, jt
  var(:, 1, :, 1) = 0.
  DO jk=1,nlev
    DO jl=1,nlon
      var(jl, 1, jk, 1) = 0.
      DO jt=1,4
        another_var(jl, 1 + 2 + -1, jk, jt + 2 + -1) = 0.0
      END DO
    END DO
  END DO
  CALL compute(nlon, nlev, var(:, :, :, :))
  CALL compute(nlon, nlev, var(:, :, :, :))
END SUBROUTINE kernel

SUBROUTINE compute (nlon, nlev, var)
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: nlon, nlev
  REAL, INTENT(INOUT) :: var(nlon, 3, nlev, 5)
  var(:, 1, :, 1) = 0.
END SUBROUTINE compute

Copy link
Collaborator

@reuterbal reuterbal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many!! thanks for these changes, I think that improved the capabilities of the transform and the structure is also much clearer to me now. Good to go from me!

@reuterbal reuterbal force-pushed the nams-cloudsc-lower-constant-array-indices branch from 6e779ee to 80a1a81 Compare August 23, 2024 14:25
@reuterbal reuterbal added the ready for merge This PR has been approved and is ready to be merged label Aug 23, 2024
@reuterbal reuterbal merged commit 9a76c21 into main Aug 23, 2024
13 checks passed
@reuterbal reuterbal deleted the nams-cloudsc-lower-constant-array-indices branch August 23, 2024 18:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready for merge This PR has been approved and is ready to be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants