[RFC] Technical Debt TODO

## Motivation

Our technical debt is starting to grow, and it is starting to cause some problems downstream. We should tackle these concerns rather quickly, to hopefully make future development easier down the line.

## The DEBT

### Build system

- We should transition to pyproject.toml rather soon. https://github.com/helmholtz-analytics/heat/pull/1832
- We should remove unnecessary dependencies. https://github.com/helmholtz-analytics/heat/pull/1832
    - The only real dependencies are **torch** and **mpi4py,** the rest are only development dependencies
    - Should simplify testing
    - Should simplify downstream build systems like conda, spack and easybuild
        - https://github.com/spack/spack/pull/47710
        - https://github.com/conda-forge/heat-feedstock/pull/52
- Transition to https://docs.astral.sh/uv/? I don’t see at as necessary, but might be helpful.

### CI

We have far too many actions and pipeline steps that I’m not sure what they do anymore. The current state is the following.

- Github
    - **CI**
        - **CIBASE:**
            - https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/CIBase.yml ****
            - On push to main and stable
            - Triggers pipeline on ci pipeline on codebase
        - **CISupport**
            - https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/CISupport.yml
            - On push to support/** branches (When is this used?)
            - Not hardened
            - Triggers codebase ci
        - **CommentPR**
            - https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/CommentPR.yml
            - Triggers after **ReceivePR**
            - Annoying!!! Way to many thank you for the PR messages. Clutters the conversation.
            - Uses some artifacts to comment on the right PR.
            - Triggers the codebase ci. Again!
        - https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/ReceivePR.yml
            - Triggers after opened, sync, reopened, ready for review (Far too many triggers)
            - Runs test on Github, (non-matrix tests), only for python 3.10, and openmpi
            - Creates artifact with PR number (for later use in Comment PR)
        - https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/ci.yaml
            - Those are the matrix tests
            - Trigger on an approved review
    - **Security**
        - https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/codeql.yml
            - Github CodeQL check. Looks fine.
        - https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/dependency-review.yml
            - Github dependency review
    - **Code/Repo Quality**
        - https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/scorecard.yml
            - Checks the quality of the repository according to the **Open Source Security Foundation**
            - Gives us a badge (Scorecard, similar to fair-software and best practices badge)
    - **Benchmarks**
        - https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/bench_trigger.yml
            - Triggers benchmark on codebase
            - Triggers on push request to main, or on changes to a PR
    - **QL and Workflows**
        - https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/backport.yml
            - Creates PR’s targeting stable branch based on PR tags.
            - Triggers after pull  request is closed and merged.
        - https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/create-branch-on-assignment.yml
            - Create branch on assignment (Duh!)
            - I guess it’s fine, I don’t particularly like it
        - https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/inactivity.yml
            - Marks issues and PR’s as stale.
            - Runs regularly
        - https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/latest-pytorch-support.yml
            - Triggers manually
            - Checks if there is a new pytorch version, and creates a PR with update dependencies if there is.
        - https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/pytorch-latest-release.yml
            - Triggered manually
            - Check the latest pytorch version, creates a new branch
            - It could be merged with **latest-pytorch-support.yml**
            - It could be triggered with a cronjob
        - https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/markdown-links-check.yml
            - Not really  aware that this is necessary.
            - Check should be more visible.
            - Alternatively, maybe remove
        - https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/release-drafter.yml
            - Drafts changelogs and labels PR’s automatically
            - Triggers on pull requests changes
        - https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/release-prep.yml
            - Creates Pull requests based on the targeted release
            - Triggered manually
    - **Deploy/Release/Build**
        - https://github.com/helmholtz-analytics/heat/blob/main/.github/workflows/docker.yml
            - Builds docker containers
            - Outdated, should be deleted or updated
            - Triggers manually
- Codebase
    - https://codebase.helmholtz.cloud/helmholtz-analytics/cb **CB**
    - [https://codebase.helmholtz.cloud/helmholtz-analytics/ci](https://codebase.helmholtz.cloud/helmholtz-analytics/ci) **CI**

We should aim to have fewer actions, more clearly Cx steps, and better, more complete tests

**3 levels of tests:**

- Quick
    - After every push on a PR
    - CPU only
    - Few mpi configurations
- Full
    - After review and merges to main
    - Matrix tests
        - Python versions
        - Torch versions
        - MPI implementations
    - On HPC hardware
    - Both CPU and GPU
    - Multiple mpi configurations
- Benchmarks
    - After every push to main
    - When requested on a PR
    - GPU, fixed mpi configuration
    - Ideally, HPC

**CI Events by triggers**

| **Manually** | **Cron** | **Issue Assignment** | **PR Open** | **PR Push** | **Review (Approved)** | **PR Closed** | **Push Main** |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Docker | CodeQL | Create branch | Thanks for the PR! | CI Quick | CI Full | Backport | CI Full |
| Pytorch latests release | Dependecy Review |  | Release drafter | Benchmark (if labeled) |  |  | Benchmarks |
| Release prep | Inactivity |  |  |  |  |  |  |
|  | Scorecard |  |  |  |  |  |  |
|  | Pytorch latest release |  |  |  |  |  |  |
|  | Markdown link check |  |  |  |  |  |  |

Organize repo into actions and workflows, with actions being reusable on multiple workflows.

## Code Quality and testing

- Introduce static typing analysis (Mypy)
- Ruff
- Testing improvements
    - Pytest as default
    - Parametrized tests
    - Introduce hypothesis
        - There are some nice utility functions within the BaseTestClass, we should pay more attention to it.
    - Testing on HPC
    - Testing multiple MPI

## Code

- Array API
- Clearer communication layer
    - Heat Communicator vs MPI Communicator
    - Unclear distinction within the code.
        - Heat Communicator works with DNDArray, meant to be used by the users.
        - MPI Communicator gets torch Tensors, meant for the heat developers.
        - Maybe we only really need one, not sure how much the users are using the communicators. MPI Communicator can become just a more generic communicator for torch Tensors that is used within the rest of the heat library, and the mpi4py communicator is **ONLY** used inside the MPICommunicator class ideally.
- More hardware/software stack agnostic implementation
    - Already in the works in https://github.com/helmholtz-analytics/heat/pull/1793
    - Some work started by the MPS PR https://github.com/helmholtz-analytics/heat/pull/1129
- Advanced indexing

## Docs

Not even sure, probably so outdated that is painful.

Ideas:

- Notebooks as documentation
- Slides and presentations
- Papers

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RFC] Technical Debt TODO #1874

Motivation

The DEBT

Build system

CI

Code Quality and testing

Code

Docs

Sub-issues

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Manually	Cron	Issue Assignment	PR Open	PR Push	Review (Approved)	PR Closed	Push Main
Docker	CodeQL	Create branch	Thanks for the PR!	CI Quick	CI Full	Backport	CI Full
Pytorch latests release	Dependecy Review		Release drafter	Benchmark (if labeled)			Benchmarks
Release prep	Inactivity
	Scorecard
	Pytorch latest release
	Markdown link check

[RFC] Technical Debt TODO #1874

Description

Motivation

The DEBT

Build system

CI

Code Quality and testing

Code

Docs

Sub-issues

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions