-
Notifications
You must be signed in to change notification settings - Fork 61
Labels
Description
Motivation
Our technical debt is starting to grow, and it is starting to cause some problems downstream. We should tackle these concerns rather quickly, to hopefully make future development easier down the line.
The DEBT
Build system
- We should transition to pyproject.toml rather soon. Transition to pyproject.toml, Ruff, and mypy #1832
- We should remove unnecessary dependencies. Transition to pyproject.toml, Ruff, and mypy #1832
- The only real dependencies are torch and mpi4py, the rest are only development dependencies
- Should simplify testing
- Should simplify downstream build systems like conda, spack and easybuild
- Transition to https://docs.astral.sh/uv/? I don’t see at as necessary, but might be helpful.
CI
We have far too many actions and pipeline steps that I’m not sure what they do anymore. The current state is the following.
- Github
- CI
- CIBASE:
- https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/CIBase.yml ****
- On push to main and stable
- Triggers pipeline on ci pipeline on codebase
- CISupport
- https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/CISupport.yml
- On push to support/** branches (When is this used?)
- Not hardened
- Triggers codebase ci
- CommentPR
- https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/CommentPR.yml
- Triggers after ReceivePR
- Annoying!!! Way to many thank you for the PR messages. Clutters the conversation.
- Uses some artifacts to comment on the right PR.
- Triggers the codebase ci. Again!
- https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/ReceivePR.yml
- Triggers after opened, sync, reopened, ready for review (Far too many triggers)
- Runs test on Github, (non-matrix tests), only for python 3.10, and openmpi
- Creates artifact with PR number (for later use in Comment PR)
- https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/ci.yaml
- Those are the matrix tests
- Trigger on an approved review
- CIBASE:
- Security
- https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/codeql.yml
- Github CodeQL check. Looks fine.
- https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/dependency-review.yml
- Github dependency review
- https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/codeql.yml
- Code/Repo Quality
- https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/scorecard.yml
- Checks the quality of the repository according to the Open Source Security Foundation
- Gives us a badge (Scorecard, similar to fair-software and best practices badge)
- https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/scorecard.yml
- Benchmarks
- https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/bench_trigger.yml
- Triggers benchmark on codebase
- Triggers on push request to main, or on changes to a PR
- https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/bench_trigger.yml
- QL and Workflows
- https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/backport.yml
- Creates PR’s targeting stable branch based on PR tags.
- Triggers after pull request is closed and merged.
- https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/create-branch-on-assignment.yml
- Create branch on assignment (Duh!)
- I guess it’s fine, I don’t particularly like it
- https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/inactivity.yml
- Marks issues and PR’s as stale.
- Runs regularly
- https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/latest-pytorch-support.yml
- Triggers manually
- Checks if there is a new pytorch version, and creates a PR with update dependencies if there is.
- https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/pytorch-latest-release.yml
- Triggered manually
- Check the latest pytorch version, creates a new branch
- It could be merged with latest-pytorch-support.yml
- It could be triggered with a cronjob
- https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/markdown-links-check.yml
- Not really aware that this is necessary.
- Check should be more visible.
- Alternatively, maybe remove
- https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/release-drafter.yml
- Drafts changelogs and labels PR’s automatically
- Triggers on pull requests changes
- https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/release-prep.yml
- Creates Pull requests based on the targeted release
- Triggered manually
- https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/backport.yml
- Deploy/Release/Build
- https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/docker.yml
- Builds docker containers
- Outdated, should be deleted or updated
- Triggers manually
- https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/docker.yml
- CI
- Codebase
We should aim to have fewer actions, more clearly Cx steps, and better, more complete tests
3 levels of tests:
- Quick
- After every push on a PR
- CPU only
- Few mpi configurations
- Full
- After review and merges to main
- Matrix tests
- Python versions
- Torch versions
- MPI implementations
- On HPC hardware
- Both CPU and GPU
- Multiple mpi configurations
- Benchmarks
- After every push to main
- When requested on a PR
- GPU, fixed mpi configuration
- Ideally, HPC
CI Events by triggers
| Manually | Cron | Issue Assignment | PR Open | PR Push | Review (Approved) | PR Closed | Push Main |
|---|---|---|---|---|---|---|---|
| Docker | CodeQL | Create branch | Thanks for the PR! | CI Quick | CI Full | Backport | CI Full |
| Pytorch latests release | Dependecy Review | Release drafter | Benchmark (if labeled) | Benchmarks | |||
| Release prep | Inactivity | ||||||
| Scorecard | |||||||
| Pytorch latest release | |||||||
| Markdown link check |
Organize repo into actions and workflows, with actions being reusable on multiple workflows.
Code Quality and testing
- Introduce static typing analysis (Mypy)
- Ruff
- Testing improvements
- Pytest as default
- Parametrized tests
- Introduce hypothesis
- There are some nice utility functions within the BaseTestClass, we should pay more attention to it.
- Testing on HPC
- Testing multiple MPI
Code
- Array API
- Clearer communication layer
- Heat Communicator vs MPI Communicator
- Unclear distinction within the code.
- Heat Communicator works with DNDArray, meant to be used by the users.
- MPI Communicator gets torch Tensors, meant for the heat developers.
- Maybe we only really need one, not sure how much the users are using the communicators. MPI Communicator can become just a more generic communicator for torch Tensors that is used within the rest of the heat library, and the mpi4py communicator is ONLY used inside the MPICommunicator class ideally.
- More hardware/software stack agnostic implementation
- Already in the works in More generic check for CUDA-aware MPI #1793
- Some work started by the MPS PR Support Apple MPS acceleration #1129
- Advanced indexing
Docs
Not even sure, probably so outdated that is painful.
Ideas:
- Notebooks as documentation
- Slides and presentations
- Papers
Reactions are currently unavailable