Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

POC - linter: missing kind specifier for real literals and possible fix via git patch #334

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

MichaelSt98
Copy link
Collaborator

@MichaelSt98 MichaelSt98 commented Jun 17, 2024

What it does/How it works:

  • read the original Fortran file
  • find all real literals (including its nodes with_ir_node=True)
  • if literal.kind is None: fix kind and save in dict
  • apply dict via SubstituteExpressions to the relevant node (only!)
  • substitute in the Fortran file representation/string the newly generated node with the old node (using node.source.string)
  • generate diff/patch using difflib and write to file
  • (git apply this file/patch)

Demonstration with CLOUDSC:

With this config file

basedir: .
include:
  - src/cloudsc_fortran/*.F90

rules:
  - MissingKindSpecifierRealLiterals

junitxml_file: linter.xml
violations_file: test_new_violations.yml

max_workers: 4

applied to the CLOUDSC dwarf via e.g., loki-lint.py check -c lint_config.yml --fix --worker 4
generates a file "loki_lint_CLOUDSC.patch" which looks like

--- a/src/cloudsc_fortran/cloudsc.F90
+++ b/src/cloudsc_fortran/cloudsc.F90
@@ -2048,11 +2048,11 @@
         ! otherwise faster to freeze (snow or ice pellets)
         ZQPRETOT(JL) = MAX(ZQX(JL,JK,NCLDQS)+ZQX(JL,JK,NCLDQR),ZEPSEC)
         PRAINFRAC_TOPRFZ(JL) = ZQX(JL,JK,NCLDQR)/ZQPRETOT(JL)
-        IF (PRAINFRAC_TOPRFZ(JL) > 0.8) THEN 
-          LLRAINLIQ(JL) = .True.
+        IF (PRAINFRAC_TOPRFZ(JL) > 0.8_JPRB) THEN
+          LLRAINLIQ(JL) = .true.
         ELSE
-          LLRAINLIQ(JL) = .False.
-        ENDIF
+          LLRAINLIQ(JL) = .false.
+        END IF
       ENDIF

       ! If temperature less than zero
@@ -2382,15 +2382,15 @@

       ZAPLUSB   = RCL_APB1*ZVPICE-RCL_APB2*ZVPICE*ZTP1(JL,JK)+ &
      &             PAP(JL,JK)*RCL_APB3*ZTP1(JL,JK)**3
-      ZCORRFAC  = (1.0/ZRHO(JL))**0.5
-      ZCORRFAC2 = ((ZTP1(JL,JK)/273.0)**1.5)*(393.0/(ZTP1(JL,JK)+120.0))
+      ZCORRFAC = (1.0_JPRB / ZRHO(JL))**0.5_JPRB
+      ZCORRFAC2 = ((ZTP1(JL, JK) / 273.0_JPRB)**1.5_JPRB)*(393.0_JPRB / (ZTP1(JL, JK) + 120.0_JPRB))

       ZPR02 = ZRHO(JL)*ZPRECLR*RCL_CONST1S/(ZTCG*ZFACX1S)

       ZTERM1 = (ZQSICE(JL,JK)-ZQE)*ZTP1(JL,JK)**2*ZVPICE*ZCORRFAC2*ZTCG* &
      &          RCL_CONST2S*ZFACX1S/(ZRHO(JL)*ZAPLUSB*ZQSICE(JL,JK))
-      ZTERM2 = 0.65*RCL_CONST6S*ZPR02**RCL_CONST4S+RCL_CONST3S*ZCORRFAC**0.5 &
-     &          *ZRHO(JL)**0.5*ZPR02**RCL_CONST5S/ZCORRFAC2**0.5
+      ZTERM2 = 0.65_JPRB*RCL_CONST6S*ZPR02**RCL_CONST4S + RCL_CONST3S*ZCORRFAC**0.5_JPRB*ZRHO(JL)**0.5_JPRB*ZPR02**RCL_CONST5S /  &
+      & ZCORRFAC2**0.5_JPRB

       ZDPEVAP = MAX(ZCOVPCLR(JL)*ZTERM1*ZTERM2*PTSPHY,0.0_JPRB)

and can be applied via git apply --whitespace=fix loki_lint_*.patch

resulting in git diff

diff --git a/src/cloudsc_fortran/cloudsc.F90 b/src/cloudsc_fortran/cloudsc.F90
index 17f7144..035bf07 100644
--- a/src/cloudsc_fortran/cloudsc.F90
+++ b/src/cloudsc_fortran/cloudsc.F90
@@ -2048,11 +2048,11 @@ DO JK=NCLDTOP,KLEV
         ! otherwise faster to freeze (snow or ice pellets)
         ZQPRETOT(JL) = MAX(ZQX(JL,JK,NCLDQS)+ZQX(JL,JK,NCLDQR),ZEPSEC)
         PRAINFRAC_TOPRFZ(JL) = ZQX(JL,JK,NCLDQR)/ZQPRETOT(JL)
-        IF (PRAINFRAC_TOPRFZ(JL) > 0.8) THEN 
-          LLRAINLIQ(JL) = .True.
+        IF (PRAINFRAC_TOPRFZ(JL) > 0.8_JPRB) THEN
+          LLRAINLIQ(JL) = .true.
         ELSE
-          LLRAINLIQ(JL) = .False.
-        ENDIF
+          LLRAINLIQ(JL) = .false.
+        END IF
       ENDIF
     
       ! If temperature less than zero
@@ -2382,15 +2382,15 @@ ENDIF ! on IEVAPRAIN
 
       ZAPLUSB   = RCL_APB1*ZVPICE-RCL_APB2*ZVPICE*ZTP1(JL,JK)+ &
      &             PAP(JL,JK)*RCL_APB3*ZTP1(JL,JK)**3
-      ZCORRFAC  = (1.0/ZRHO(JL))**0.5
-      ZCORRFAC2 = ((ZTP1(JL,JK)/273.0)**1.5)*(393.0/(ZTP1(JL,JK)+120.0))
+      ZCORRFAC = (1.0_JPRB / ZRHO(JL))**0.5_JPRB
+      ZCORRFAC2 = ((ZTP1(JL, JK) / 273.0_JPRB)**1.5_JPRB)*(393.0_JPRB / (ZTP1(JL, JK) + 120.0_JPRB))
 
       ZPR02 = ZRHO(JL)*ZPRECLR*RCL_CONST1S/(ZTCG*ZFACX1S)
 
       ZTERM1 = (ZQSICE(JL,JK)-ZQE)*ZTP1(JL,JK)**2*ZVPICE*ZCORRFAC2*ZTCG* &
      &          RCL_CONST2S*ZFACX1S/(ZRHO(JL)*ZAPLUSB*ZQSICE(JL,JK))
-      ZTERM2 = 0.65*RCL_CONST6S*ZPR02**RCL_CONST4S+RCL_CONST3S*ZCORRFAC**0.5 &
-     &          *ZRHO(JL)**0.5*ZPR02**RCL_CONST5S/ZCORRFAC2**0.5
+      ZTERM2 = 0.65_JPRB*RCL_CONST6S*ZPR02**RCL_CONST4S + RCL_CONST3S*ZCORRFAC**0.5_JPRB*ZRHO(JL)**0.5_JPRB*ZPR02**RCL_CONST5S /  &
+      & ZCORRFAC2**0.5_JPRB
 
       ZDPEVAP = MAX(ZCOVPCLR(JL)*ZTERM1*ZTERM2*PTSPHY,0.0_JPRB)
  

Also tested for "ifs-source" with "include: arpifs/phys_ec/**/*.F90".

Some notes:

  • we could also directly substitute within the Fortran file and don't use the git patch approach
  • just a POC to show how we could fix things and minimise the diff (as we don't use fgen() for all of the Fortran file/subroutine)
    • diff is still relatively large for the transformed/fixed nodes (white space, indentation, new lines, ...)

Current limitations:

  • problematic for multiple Rules/Fixes that apply to the same node (not tested but this won't work)
    • we would need to have a common dict/map per node for all the rules before applying SubstituteExpressions

Copy link

Documentation for this branch can be viewed at https://sites.ecmwf.int/docs/loki/334/index.html

Copy link
Collaborator

@mlange05 mlange05 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, so this looks great and I really like the overall concept. As discussed offline, I think there are a few "next steps" to generalise this and make this a bit more usable in practice:

  • A generalisation of the diff-regex-per-node mechanism to generate patches should be provided, likely as part of the loki.lint sub-package
  • The false-positive generated by the .true. vs. .TRUE. in the adjoining lines shows that we need to make the fgen backend a bit more configurable (eg. via something like a pre-defined "style" dict; with this we could have a dedicated IFS-style that keeps the noise in the diffs low.
  • The per-node expression-substitution limits this to only local changes like the kind-identifier, but in general this should be fine for now. Further extension can come later (if needed).

As a proof-of-concept this looks great though -very cool!

Copy link
Collaborator

@reuterbal reuterbal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline: This is fantastic and I'm positively surprised how relatively straightforward the implementation appears to be. The testing on our Fortran codes also makes this look very promising.

In terms of the implementation:

  • I think we might want to think a little more about the association of IR containers with the corresponding Sourcefile object. We should try to somehow provide a mechanism of providing this upward link, to make the additional argument that you had to introduce unnecessary. Moreover, the source object of the Sourcefile should also have the full string already, which would make it redundant to read the file again.
  • The mechanics of creating a diff from replacing individual source lines should likely become a set of independent utilities. Maybe we need to add a second use case to trial what the interface of such utilities should look like in a more generic sense?
  • We should think of how we are going to embed this in the larger linter control flow. Right now you've hijacked the fixer method, which was originally intended for editing the IR and later dumping the IR to file. We could make the "generate a diff instead"-mode a generic variant of that, or create a secondary fixer workflow side-by-side, or switch the entire fixer workflow to the diff methodology. I don't have a strong opinion on this (yet?) but probably worth a discussion, also with @mlange05 .

@MichaelSt98
Copy link
Collaborator Author

As discussed offline: This is fantastic and I'm positively surprised how relatively straightforward the implementation appears to be. The testing on our Fortran codes also makes this look very promising.

In terms of the implementation:

  • I think we might want to think a little more about the association of IR containers with the corresponding Sourcefile object. We should try to somehow provide a mechanism of providing this upward link, to make the additional argument that you had to introduce unnecessary. Moreover, the source object of the Sourcefile should also have the full string already, which would make it redundant to read the file again.
  • The mechanics of creating a diff from replacing individual source lines should likely become a set of independent utilities. Maybe we need to add a second use case to trial what the interface of such utilities should look like in a more generic sense?
  • We should think of how we are going to embed this in the larger linter control flow. Right now you've hijacked the fixer method, which was originally intended for editing the IR and later dumping the IR to file. We could make the "generate a diff instead"-mode a generic variant of that, or create a secondary fixer workflow side-by-side, or switch the entire fixer workflow to the diff methodology. I don't have a strong opinion on this (yet?) but probably worth a discussion, also with @mlange05 .

Thanks for having a look and I completely agree. The "hijacking fixer method"-approach was appropriate to test the idea/approach, but was never intended to be a proper solution.

I also don't have a strong opinion on whether we should introduce a "generate a diff instead"-mode or whether we should create a secondary fixed workflow ... or rather I can think of advantages and disadvantages for both.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants