(Closes #2845 #1792) fix for inline symbol bug #2848

Open
wants to merge 77 commits into
base: master
from

Conversation

arporter
Member

No description provided.

@arporter arporter added in progress NEMO Issue relates to the NEMO domain NG-ARCH Issues relevant to the GPU parallelisation of LFRic and other models expected to be used in NG-ARCH labels Jan 14, 2025

codecov bot commented Jan 14, 2025

Codecov Report

Attention: Patch coverage is 98.57143% with 6 lines in your changes missing coverage. Please review.

Project coverage is 99.88%. Comparing base (e5fe4ea) to head (90334cd).

Files with missing lines Patch % Lines
src/psyclone/psyir/transformations/inline_trans.py 94.36% 4 Missing ⚠️
...psyclone/psyir/symbols/generic_interface_symbol.py 93.75% 1 Missing ⚠️
src/psyclone/psyir/symbols/symbol_table.py 99.31% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2848      +/-   ##
==========================================
- Coverage   99.89%   99.88%   -0.02%     
==========================================
  Files         359      359              
  Lines       51102    51329     +227     
==========================================
+ Hits        51050    51270     +220     
- Misses         52       59       +7     

@arporter
Member Author

A small change to the inlining transformation so that symbols are added to the table of the Routine containing the call site rather than to the table of the local scope. This allows us to spot problems in validate() rather than crashing at the end of the apply() method. The integration tests will need to be run but I don't want to do that during the day while Glados is busy.

LonelyCat124
LonelyCat124 previously approved these changes Jan 15, 2025
Collaborator

@LonelyCat124 LonelyCat124 left a comment

Need to check if the integration tests are ok, but the code changes all look fine, and tests/coverage/etc. are all fine.

Edit: Someone started the integration tests about 30 minutes ago - @arporter is this ok or do we need to stop them and try them later?

@arporter
Member Author

It's OK, it was me and I see NEMO v.4 failed :-(

@arporter
Member Author

Failure was:

  File "/home/gh_runner/actions-runner/_work/PSyclone-mirror/PSyclone-mirror/.runner_venv/lib/python3.13/site-packages/psyclone/psyir/symbols/symbol_table.py", line 1793, in rename_symbol
    raise ValueError(
        f"The symbol argument of rename_symbol() must belong to this "
        f"symbol_table instance, but '{symbol}' does not.")
ValueError: The symbol argument of rename_symbol() must belong to this symbol_table instance, but 'psyclone_cmp_int: DataSymbol<Scalar<BOOLEAN, UNDEFINED>, Automatic>' does not.
make: *** [Makefile:70: psycloned-openacc_kernels/dynldf.f90] Error 1

and must be because we are adding symbols to tables in nested scopes rather than the parent Routine scope.
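To make the failure mode concrete, here is a minimal sketch of the ownership check using toy classes. `Symbol`, `SymbolTable`, `routine_table()` and the symbol name `inlined_tmp` are hypothetical stand-ins written for this comment, not PSyclone's real classes or API; only the idea (a table refuses to rename a symbol it does not own) is taken from the traceback above.

```python
class Symbol:
    def __init__(self, name):
        self.name = name


class SymbolTable:
    """Toy stand-in for a scoped symbol table (not PSyclone's class)."""

    def __init__(self, parent=None):
        self.parent = parent
        self._symbols = {}

    def add(self, symbol):
        if symbol.name in self._symbols:
            raise KeyError(f"'{symbol.name}' already declared in this scope")
        self._symbols[symbol.name] = symbol

    def rename_symbol(self, symbol, new_name):
        # Only the table that owns the symbol may rename it.
        if self._symbols.get(symbol.name) is not symbol:
            raise ValueError(
                f"'{symbol.name}' does not belong to this symbol table")
        del self._symbols[symbol.name]
        symbol.name = new_name
        self._symbols[new_name] = symbol

    def routine_table(self):
        # Walk up to the outermost (routine-level) table.
        table = self
        while table.parent is not None:
            table = table.parent
        return table


routine = SymbolTable()
loop_body = SymbolTable(parent=routine)   # a nested scope inside the routine

# Symbol added to the nested scope: the routine-level table cannot rename it.
tmp = Symbol("psyclone_cmp_int")
loop_body.add(tmp)
try:
    routine.rename_symbol(tmp, "psyclone_cmp_int_1")
    nested_rename_failed = False
except ValueError:
    nested_rename_failed = True

# Symbol added via the routine-level table: the rename succeeds.
tmp2 = Symbol("inlined_tmp")
loop_body.routine_table().add(tmp2)
routine.rename_symbol(tmp2, "inlined_tmp_1")
```

In this toy model, always adding inlined symbols through `routine_table()` keeps ownership in one place, so a later rename at routine level cannot hit a symbol that lives in a nested scope.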

@LonelyCat124
Collaborator

OK - I'll send it back to you to resolve then.

@LonelyCat124 LonelyCat124 self-requested a review January 15, 2025 12:26
@LonelyCat124 LonelyCat124 dismissed their stale review January 15, 2025 12:30

Integration tests failed

@arporter
Member Author

That was easier than I expected. Will wait until tonight to trigger integration tests again.

@arporter
Member Author

NEMO4 OpenACC kernels integration test failed again :-( Will investigate.

@arporter
Member Author

Tests and coverage should all be lovely now. However, running PSyclone over NEMOv4 still reveals problems for bdydyn3d.f90.

@arporter
Member Author

It turned out that SymbolTable.resolve_imports() was not respecting whether or not the import was a wildcard. I've fixed this now.

@arporter
Member Author

arporter commented Feb 21, 2025

NEMOv4 integration test failed at compilation time:

NVFORTRAN-S-0038-Symbol, kdim, has not been explicitly declared (NEMO/cfgs/SPITZ12_openacc_kernels/WORK/obs_inter_h2d.f90)
  0 inform,   0 warnings,   1 severes, 0 fatal for obs_int_h2d_pol
NVFORTRAN-S-1254-grt_cir_dis is use associated with obs_utils and cannot be redeclared. (NEMO/cfgs/SPITZ12_openacc_kernels/WORK/obs_inter_h2d.f90: 1291)
  0 inform,   0 warnings,   1 severes, 0 fatal for grt_cir_dis
NVFORTRAN-S-1254-grt_cir_dis_saa is use associated with obs_utils and cannot be redeclared. (NEMO/cfgs/SPITZ12_openacc_kernels/WORK/obs_inter_h2d.f90: 1304)
  0 inform,   0 warnings,   1 severes, 0 fatal for grt_cir_dis_saa

obs_inter_h2d has a wildcard import from obs_utils and this is being left unchanged by the module-inline transformation, which results in a name clash with the routines that have been inlined into the module.

A wildcard USE statement can have a rename-list associated with it, so probably the easiest solution is to rename the version of the routine being imported from the module so that it doesn't clash with the one that has been inlined into the current module scope, e.g. USE obs_utils, psy_renamed => grt_cir_dis_saa.
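As a rough illustration of the statement that would need to be emitted, here is a small sketch in Python. `use_statement` is a hypothetical helper written for this comment (it is not the PSyclone Fortran backend); it only shows that a rename-list can follow the module name directly when there is no ONLY clause.

```python
def use_statement(module, renames=None, only=None):
    """Format a Fortran USE statement (illustrative helper only).

    renames maps local_name -> name_in_module; only lists names imported
    without renaming. With no `only` list the import stays a wildcard.
    """
    renames = renames or {}
    rename_items = [f"{local} => {orig}" for local, orig in renames.items()]
    if only is None:
        # Wildcard import: the rename-list follows the module name directly.
        return ", ".join([f"USE {module}"] + rename_items)
    items = list(only) + rename_items
    return f"USE {module}, ONLY: " + ", ".join(items)


wildcard = use_statement("obs_utils",
                         renames={"psy_renamed": "grt_cir_dis_saa"})
plain = use_statement("obs_utils")
explicit = use_statement("dom_oce", only=["glamt"])
```

Here `wildcard` is the statement quoted above: everything in obs_utils is still imported, but the module's grt_cir_dis_saa is only visible locally as psy_renamed, leaving the original name free for the inlined copy.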

@arporter
Member Author

I've extended KernelModuleInlineTrans so that it now does the renaming for a wildcard import. This revealed a bug in both the Fortran backend (we threw away any symbol renaming if it was a wildcard import) and in SymbolTable.resolve_imports.

I've also replaced the ad-hoc dependency analysis in KernelModuleInlineTrans by using VariablesAccessInfo. However, that was a quick hack and needs tidying and testing.

@arporter
Member Author

arporter commented Feb 24, 2025

Missing declaration of kdim is due to inlining of lu_invmat() into obs_int_h2d_pol():

   SUBROUTINE lu_invmat( pmatin, kdim, pmatou )
      INTEGER, INTENT(IN) :: &
         & kdim             ! Array dimension
      REAL(KIND=wp), DIMENSION(kdim,kdim), INTENT(IN) :: &
         & pmatin 
      REAL(KIND=wp), DIMENSION(kdim,kdim), INTENT(OUT) :: &
         & pmatou 

We must have failed to substitute the actual argument in place of kdim in the symbol definitions. As it happens, in this particular case kdim is a Literal. However, if the actual argument was written between the declarations and the call to lu_invmat() then we would get this wrong.
EDIT: the problem here is actually that we appear to be adding declarations for these variables at the call site, even though they are dummy arguments (and thus replaced by actual arguments). However, if they were local, automatic arrays then my original diagnosis still stands and we have a bug.

@arporter
Member Author

The output I get when running PSyclone is:

Transforming obs_int_h2d_pol with acc kernels
Inlined routine 'lu_invmat'
Transforming subroutine: bil_wgt
Transforming bil_wgt with acc kernels
Transforming subroutine: lu_invmat
Transforming lu_invmat with acc kernels
Inlined routine 'lu_decomp'

and lu_invmat calls lu_decomp so it seems likely that the recursive inlining is to blame here. Highly likely to be because of nested scopes.

@arporter
Member Author

The problem is in the inlined declaration of a local, automatic array - its properties are not updated. If we inline the call to sub in the following:

       subroutine main()
          real, dimension(10, 10) :: var = 0.0
          call sub(var, 10)
       end subroutine main
       subroutine sub(x, ilen)
          integer, intent(in) :: ilen
          real, dimension(ilen, ilen), intent(inout) :: x
          real, dimension(ilen, ilen) :: work
          work = 2.0
          x(:,:) = x(:,:) + work(:,:)
       end subroutine sub

then we get:

  subroutine main()
    real, dimension(10,10), save :: var = 0.0
    real, dimension(ilen,ilen) :: work

    work = 2.0
    var(:,:) = var(:,:) + work(:,:)

  end subroutine main
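
The step that is missing in the output above (the shape of work still refers to ilen) can be sketched as a substitution over the dimension expressions. This is a string-based toy written for this comment, not how PSyIR represents array shapes; `substitute_formals` and its arguments are hypothetical names.

```python
import re


def substitute_formals(shape, formal_to_actual):
    """Replace dummy-argument names in each dimension expression.

    shape: list of dimension expressions as strings, e.g. ["ilen", "ilen"].
    formal_to_actual: maps dummy-argument name -> actual-argument text.
    Fortran is case-insensitive, so names are matched ignoring case.
    """
    lookup = {name.lower(): actual
              for name, actual in formal_to_actual.items()}

    def swap(match):
        word = match.group(0)
        return lookup.get(word.lower(), word)

    # Match whole identifiers only so that 'ilen' inside 'ilen2' is untouched.
    return [re.sub(r"\b[A-Za-z_]\w*", swap, dim) for dim in shape]


work_shape = substitute_formals(["ilen", "ilen"], {"ilen": "10"})
declaration = f"real, dimension({','.join(work_shape)}) :: work"
mixed = substitute_formals(["2*ILEN+1", "ilen2"], {"ilen": "n"})
```

With the literal actual argument from the example, `declaration` becomes a declaration of work with shape (10,10). As noted earlier in this thread, substituting a non-literal actual argument into a declaration is only safe if that argument is not written between the declarations and the call.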

@arporter
Member Author

Testing for real with NEMO main has revealed more problems, especially around module-inlining a routine that is called from different routines in the same container. I thought I'd already handled this but can't see any trace of that. Since module-inlining essentially modifies the parent Container, possibly this transformation should handle all calls to a given Routine at the same time. This would solve my current problem where I need to work out whether a Routine that exists inside a Container is the same one that is being called from some other location (via an import).

@arporter
Member Author

Am hitting a weird problem with some weird Fortran. PSyclone generates:

module oce_sed
  use dom_oce, only : adatrj=>adatrj, e3t_1d=>e3t_1d, gdepw_1d=>gdepw_1d, glamt=>glamt, gphit=>gphit, mbkt=>mbkt, ndastp=>ndastp, &
&nyear=>nyear, rn_dt=>rn_dt, tmask=>tmask, wp=>glamt

Note that wp is being renamed to glamt. This is obtained by processing:

MODULE oce_sed
   USE par_sed
   USE par_trc , ONLY : rtrn  => rtrn
   USE par_pisces
   USE timing

   USE dom_oce , ONLY :   glamt     =>   glamt          !: longitude of t-point (degre)
   USE dom_oce , ONLY :   gphit     =>   gphit          !: latitude  of t-point (degre)
   USE dom_oce , ONLY :   e3t_1d    =>   e3t_1d         !: reference depth of t-points (m)
   USE dom_oce , ONLY :   gdepw_1d  =>   gdepw_1d       !: reference depth of t-points (m)
   USE dom_oce , ONLY :   mbkt      =>   mbkt           !: vertical index of the bottom last T- ocea

where the original code doesn't reference wp at all. (I don't understand why it's doing all that 'renaming' without changing the name but that's beside the point.)

@arporter
Member Author

arporter commented Feb 28, 2025

This bug is not present on master and only occurs if I apply acc_kernels_trans.py.

@arporter
Member Author

There were two issues. First, in resolve_imports I was automatically adding any symbols that an imported symbol depended on, without respecting whether there was a wildcard import. Second, in doing that, I was copying the ImportInterface of the original symbol (and thus any renaming), whereas I really wanted a new ImportInterface pointing to the same container as the original symbol.
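
The second issue can be reproduced with a toy model. The `ImportInterface` and `Symbol` dataclasses and `format_only_item` below are simplified stand-ins written for this comment, not PSyclone's classes, and the sketch assumes that wp was pulled in because glamt depends on it (e.g. as its kind parameter).

```python
import copy
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImportInterface:
    container: str                   # module the symbol is imported from
    orig_name: Optional[str] = None  # name in the module, if renamed locally


@dataclass
class Symbol:
    name: str
    interface: ImportInterface


def format_only_item(symbol):
    """Render one entry of an ONLY list, including any rename."""
    if symbol.interface.orig_name is not None:
        return f"{symbol.name}=>{symbol.interface.orig_name}"
    return symbol.name


# 'USE dom_oce, ONLY: glamt => glamt' gives glamt a (redundant) rename.
glamt = Symbol("glamt", ImportInterface("dom_oce", orig_name="glamt"))

# Buggy: the dependency gets a copy of glamt's interface, rename included.
buggy_wp = Symbol("wp", copy.copy(glamt.interface))

# Fixed: a fresh interface that shares only the container.
fixed_wp = Symbol("wp", ImportInterface(glamt.interface.container))

buggy_item = format_only_item(buggy_wp)
fixed_item = format_only_item(fixed_wp)
```

In this model the copied interface yields the `wp=>glamt` entry seen in the generated oce_sed module, while the fresh interface yields a plain `wp` import from the same container.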

@arporter
Member Author

in function `nemogcm_nemo_dealloc_':
Projects/NEMO/NEMO/tests/BENCH_ACC_KERNELS_NVHPC/BLD_SCT_PSYCLONE/obj/nemogcm.f90:491:(.text+0x28b8): undefined reference to `trc_dealloc_'

where nemo_dealloc was originally:

      USE dom_oce   , ONLY : dom_oce_dealloc
      USE trc_oce   , ONLY : trc_oce_dealloc
      USE bdy_oce   , ONLY : bdy_oce_dealloc
      USE sbc_ice   , ONLY : sbc_ice_dealloc
      !!----------------------------------------------------------------------
      CALL     oce_dealloc()    ! ocean
      CALL dom_oce_dealloc()    ! ocean domain
      CALL zdf_oce_dealloc()    ! ocean vertical physics
      CALL trc_oce_dealloc()    ! shared TRC / TRA arrays
      CALL bdy_oce_dealloc()    ! bdy masks (incl. initialization)
      CALL sbc_oce_dealloc()
      CALL sbc_ice_dealloc()
      CALL top_dealloc()

and after processing is:

    use bdy_oce, only : bdy_oce_dealloc
    use dom_oce, only : dom_oce_dealloc
    use sbc_ice, only : sbc_ice_dealloc
    use trc_oce, only : trc_oce_dealloc
    use trdtrc_oce, only : trd_trc_oce_dealloc

    call oce_dealloc()
    call dom_oce_dealloc()
    call zdf_oce_dealloc()
    call trc_oce_dealloc()
    call bdy_oce_dealloc()
    call sbc_oce_dealloc()
    call sbc_ice_dealloc()
    call trc_dealloc()
    call trd_trc_oce_dealloc()

top_dealloc() was originally (in trcini.f90):

   SUBROUTINE top_dealloc()
      USE trdtrc_oce    , ONLY:   trd_trc_oce_dealloc
      CALL trc_dealloc()  
      CALL trd_trc_oce_dealloc()
   END SUBROUTINE top_dealloc

and trd_trc_oce_dealloc is an empty subroutine.

3 participants