Sanitise: New transformation sub-package and some refactoring #433

Merged
mlange05 merged 9 commits into main from naml-sanitise-refactoring on Nov 12, 2024

Collaborator

@mlange05 mlange05 commented Nov 8, 2024

This PR creates a new transformation sub-package for sanitisation and then refactors and improves several of the existing utilities for easier inclusion in external pipelines.

In particular, it separates associate handling from sequence-association resolution and adds a new utility for user-driven expression substitution. Each of these is now available as a standalone Transformation object, and all of them have been combined into a SanitisePipeline; each can now be used via external config constructors. The existing SanitiseTransformation is retained for backward compatibility.
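
As a rough illustration of what string-based expression substitution means, here is a minimal, self-contained sketch. It works on plain text with a regular expression and is not the Loki implementation; the function name `substitute_expressions` and the `expression_map` argument are hypothetical and chosen only for this example.

```python
import re


def substitute_expressions(source, expression_map):
    """Replace whole-identifier occurrences of each key in `source`.

    Hypothetical helper for illustration only: it rewrites raw text,
    whereas a real source-to-source tool would rewrite parsed expressions.
    """
    for old, new in expression_map.items():
        # Lookarounds stop `n` from matching inside names such as `nlev`.
        pattern = re.compile(
            r'(?<![A-Za-z0-9_])' + re.escape(old) + r'(?![A-Za-z0-9_])',
            re.IGNORECASE,  # Fortran identifiers are case-insensitive
        )
        # A function replacement avoids backslash escapes in `new` being interpreted.
        source = pattern.sub(lambda _match, new=new: new, source)
    return source


line = 'a(i) = n + nlev'
result = substitute_expressions(line, {'n': 'klon'})
```

Note that the replacements are applied one after another, so the output of one mapping entry can be matched by a later entry; a production utility would need to decide whether that chaining is wanted.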

In some more detail:

  • Created a new sub-package loki.transformations.sanitise and moved the corresponding tests. In this sub-package, the existing utilities for associates and sequence association have been separated.
  • Refactored the sequence association resolver as an in-place Transformer to avoid awkward map handling. This requires some sub-classing for one specific further use case in transformations.inline.
  • Provide AssociateTransformation to offer full or partial associate resolution together with the new associate-merging capabilities.
  • Provide sequence association resolution as a standalone Transformation, which entails some method renaming
  • Provide new SubstituteExpressionTransformation to expose string-based expression replacement
  • Combine all of the above into a SanitisePipeline, which is intended as the long-term replacement for SanitiseTransformation - the latter, however, is retained for external backward compatibility
  • Small tweaks to the tests (skip OMNI instead of xfail; it's faster and less messy in the logs)
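
The idea of having each step usable on its own and also combined into one configurable pipeline can be sketched as follows. This is a toy model under stated assumptions, not Loki code: `ToyPipeline`, `ReplaceStep` and `UppercaseStep` are hypothetical names, and the option-routing logic is just one plausible way to build several steps from a single set of config options.

```python
import inspect
from dataclasses import dataclass


@dataclass
class ReplaceStep:
    """Hypothetical step: plain string replacement."""
    old: str = ''
    new: str = ''

    def apply(self, source):
        return source.replace(self.old, self.new) if self.old else source


@dataclass
class UppercaseStep:
    """Hypothetical step: upper-case the source text."""
    enabled: bool = True

    def apply(self, source):
        return source.upper() if self.enabled else source


class ToyPipeline:
    """Build each step from one shared option dict and run the steps in order."""

    def __init__(self, classes, **options):
        self.steps = []
        for cls in classes:
            # Hand each class only the options its constructor declares.
            params = inspect.signature(cls).parameters
            accepted = {k: v for k, v in options.items() if k in params}
            self.steps.append(cls(**accepted))

    def apply(self, source):
        for step in self.steps:
            source = step.apply(source)
        return source


# Each step works standalone ...
standalone = ReplaceStep(old='foo', new='bar').apply('foo = 1')
# ... or as part of a pipeline configured from a flat set of options.
pipeline = ToyPipeline((ReplaceStep, UppercaseStep), old='foo', new='bar')
combined = pipeline.apply('foo = 1')
```

Routing options by constructor signature keeps the external configuration flat: a user supplies one set of keys and does not need to know which step consumes which option.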

github-actions bot commented Nov 8, 2024

Documentation for this branch can be viewed at https://sites.ecmwf.int/docs/loki/433/index.html

@mlange05 mlange05 force-pushed the naml-sanitise-refactoring branch 4 times, most recently from 039cc1d to 2a5283d Compare November 8, 2024 16:03
This now separates associates and sequence association utilities and
defines the `SanitiseTransformation` in the sub-package root.

codecov bot commented Nov 8, 2024

Codecov Report

Attention: Patch coverage is 99.56897% with 1 line in your changes missing coverage. Please review.

Project coverage is 93.16%. Comparing base (b12c834) to head (2bdbe34).
Report is 12 commits behind head on main.

Files with missing lines | Patch % | Lines
.../transformations/sanitise/sequence_associations.py | 97.72% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #433      +/-   ##
==========================================
+ Coverage   93.13%   93.16%   +0.02%     
==========================================
  Files         200      205       +5     
  Lines       39702    39832     +130     
==========================================
+ Hits        36978    37109     +131     
+ Misses       2724     2723       -1     
Flag | Coverage Δ
lint_rules | 96.39% <100.00%> (ø)
loki | 93.11% <99.56%> (+0.02%) ⬆️

@mlange05 mlange05 force-pushed the naml-sanitise-refactoring branch 2 times, most recently from be64ce2 to 4fb1bc6 Compare November 9, 2024 05:26
@mlange05 mlange05 marked this pull request as ready for review November 9, 2024 07:59
Contributor

@awnawab awnawab left a comment

Many thanks for this cleanup! Having access to some of these utilities directly via scheduler configs will be very helpful indeed👌 I just have two small comments on extending the test coverage of the SequenceAssociationTransformer, one of which is optional but I feel the other is an important use case to cover.

else:
    for s, lower in zip(arg.shape[:n_dims], arg.dimensions[:n_dims]):
        if isinstance(s, RangeIndex):
            new_dims += [RangeIndex((lower, s.stop))]
Contributor

This is probably quite a common use case, so could the tests please be extended to cover it?

Collaborator Author

Yes, it is, and it was not covered in Santeri's original testing. I've slightly refactored his test and added specific cases for this. Please check if this now covers it.
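
For readers following this thread, the case under discussion (an array declared with explicit bounds, such as `a(0:n)`) can be modelled with a small self-contained sketch. It uses plain tuples in place of `RangeIndex`, and the function name `widen_leading_dims` is hypothetical. Only the range branch appears in the excerpt above; the handling of plain extents and of trailing indices is an assumption made for this illustration.

```python
def widen_leading_dims(shape, indices, n_dims):
    """Turn the first `n_dims` scalar indices into (lower, upper) ranges.

    `shape` holds declared extents: either an upper bound such as 'n'
    or an explicit (lower, upper) tuple such as ('0', 'n').
    `indices` holds the scalar indices used at the call site.
    """
    new_dims = []
    for extent, lower in zip(shape[:n_dims], indices[:n_dims]):
        if isinstance(extent, tuple):
            # Declared with explicit bounds: reuse the declared upper bound.
            new_dims.append((lower, extent[1]))
        else:
            # Assumption: a plain extent is itself the upper bound.
            new_dims.append((lower, extent))
    # Assumption: indices beyond the dummy's rank pass through unchanged.
    return new_dims + list(indices[n_dims:])


# a declared as a(0:n, m), passed as a(1, j) to a rank-1 dummy argument
explicit_bounds = widen_leading_dims([('0', 'n'), 'm'], ['1', 'j'], 1)
```

Here the first index becomes the range from `1` to the declared upper bound `n`, while the trailing index `j` is left as it is.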


if not arg.shape:
    # Hack: If we don't have a shape, short-circuit here
    new_dims = tuple(RangeIndex((None, None)) for _ in dummy.shape)
Contributor

[no action] this line is also not covered by the testing, but I suspect we would only arrive here from incomplete enrichment of imported symbols, so probably no pressing need to extend the tests to cover this.

Collaborator Author

Apologies, this one snuck in by accident (I needed it in one of the dev branches). I've now also added a test for this, so please see if this is ok.
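
The short-circuit in question can be pictured with a toy model: when nothing is known about the argument's shape, every dimension of the dummy argument is given an open range. The sketch below uses `(None, None)` tuples in place of `RangeIndex((None, None))`; the names `fallback_dims` and `render` are hypothetical and exist only for this example.

```python
def fallback_dims(arg_shape, dummy_shape):
    """Return one open range per dummy dimension if the argument has no shape."""
    if not arg_shape:
        return tuple((None, None) for _ in dummy_shape)
    # With a known shape, the regular bounds logic would apply instead.
    return None


def render(dims):
    """Format (lower, upper) pairs as Fortran-style subscripts."""
    return ', '.join(
        f"{'' if lower is None else lower}:{'' if upper is None else upper}"
        for lower, upper in dims
    )


# Argument without shape information, dummy declared with two dimensions
open_ranges = render(fallback_dims((), ('n', 'm')))
```

In this model an argument without shape information, passed to a rank-2 dummy, ends up with the subscript `:, :`.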

Contributor

@awnawab awnawab left a comment

Thanks a lot for covering the missing lines, this is now good to go! 🙏

@mlange05 mlange05 added the "ready for merge" label (This PR has been approved and is ready to be merged) Nov 12, 2024
@mlange05 mlange05 merged commit 358fb60 into main Nov 12, 2024
13 checks passed
@mlange05 mlange05 deleted the naml-sanitise-refactoring branch November 12, 2024 10:31