
[RFC] Technical Debt TODO #1874

@JuanPedroGHM


Motivation

Our technical debt is growing, and it is starting to cause problems downstream. We should tackle these concerns soon to make future development easier.

The DEBT

Build system

CI

We have far too many actions and pipeline steps, and I am no longer sure what all of them do. The current state is the following.

We should aim for fewer actions, more clearly defined CI/CD steps, and better, more complete tests.

3 levels of tests:

  • Quick
    • After every push on a PR
    • CPU only
    • Few MPI configurations
  • Full
    • After review and merges to main
    • Matrix tests
      • Python versions
      • Torch versions
      • MPI implementations
    • On HPC hardware
    • Both CPU and GPU
    • Multiple MPI configurations
  • Benchmarks
    • After every push to main
    • When requested on a PR
    • GPU, fixed MPI configuration
    • Ideally, HPC
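As a rough illustration, the three levels above could be written down as data, so that each CI event maps to the levels it should run. This is only a sketch: `Event`, `Level`, `LEVELS`, and `levels_for` are hypothetical names, and treating "merges to main" as the same event as a push to main is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Event(Enum):
    """CI events relevant to the three test levels (hypothetical names)."""
    PR_PUSH = "pr_push"
    REVIEW_APPROVED = "review_approved"
    PUSH_MAIN = "push_main"


# Events that belong to a pull request, where benchmarks can be requested.
PR_EVENTS = frozenset({Event.PR_PUSH, Event.REVIEW_APPROVED})


@dataclass(frozen=True)
class Level:
    name: str
    devices: tuple[str, ...]
    triggers: frozenset[Event]
    on_request: bool = False  # may also run when requested on a PR


LEVELS = (
    # Quick: after every push on a PR, CPU only.
    Level("quick", ("cpu",), frozenset({Event.PR_PUSH})),
    # Full: after review and merges to main, both CPU and GPU.
    Level("full", ("cpu", "gpu"), frozenset({Event.REVIEW_APPROVED, Event.PUSH_MAIN})),
    # Benchmarks: after every push to main, or when requested on a PR; GPU.
    Level("benchmarks", ("gpu",), frozenset({Event.PUSH_MAIN}), on_request=True),
)


def levels_for(event: Event, benchmark_requested: bool = False) -> list[str]:
    """Return the names of the test levels that should run for an event."""
    selected = []
    for level in LEVELS:
        requested = level.on_request and benchmark_requested and event in PR_EVENTS
        if event in level.triggers or requested:
            selected.append(level.name)
    return selected
```

For example, `levels_for(Event.PR_PUSH)` selects only the quick level, while `levels_for(Event.PUSH_MAIN)` selects the full level and the benchmarks.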

CI Events by triggers

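| Manually | Cron | Issue Assignment | PR Open | PR Push | Review (Approved) | PR Closed | Push Main |
|---|---|---|---|---|---|---|---|
| Docker | CodeQL | Create branch | Thanks for the PR! | CI Quick | CI Full | Backport | CI Full |

Further workflows from the same table, whose trigger columns are not recoverable here: Pytorch latest release, Dependency Review, Release drafter, Benchmark (if labeled), Benchmarks, Release prep, Inactivity, Scorecard, Pytorch latest release, Markdown link check.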

Organize the repo into actions and workflows, with actions being reusable across multiple workflows.

Code Quality and testing

  • Introduce static type analysis (mypy)
  • Ruff
  • Testing improvements
    • Pytest as default
    • Parametrized tests
    • Introduce hypothesis
      • There are some nice utility functions within the BaseTestClass; we should pay more attention to them.
    • Testing on HPC
    • Testing multiple MPI
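A minimal sketch of what a parametrized pytest test with type hints might look like. `chunk_sizes` is a hypothetical stand-in helper written for this example, not a function from the Heat code base; the type annotations are there so that a checker such as mypy has something to verify.

```python
import pytest


def chunk_sizes(n: int, nprocs: int) -> list[int]:
    """Split n items across nprocs ranks as evenly as possible (hypothetical helper)."""
    base, extra = divmod(n, nprocs)
    # The first `extra` ranks each take one additional item.
    return [base + (1 if rank < extra else 0) for rank in range(nprocs)]


# Stacked parametrize decorators run the test for every (n, nprocs) combination.
@pytest.mark.parametrize("nprocs", [1, 2, 3, 4])
@pytest.mark.parametrize("n", [0, 1, 7, 16])
def test_chunk_sizes_cover_all_items(n: int, nprocs: int) -> None:
    sizes = chunk_sizes(n, nprocs)
    assert len(sizes) == nprocs           # one chunk per rank
    assert sum(sizes) == n                # no item lost or duplicated
    assert max(sizes) - min(sizes) <= 1   # balanced split
```

The assertions check properties of the result rather than fixed values, which is also the style that property-based testing with hypothesis would build on.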

Code

  • Array API
  • Clearer communication layer
    • Heat Communicator vs. MPI Communicator
    • The distinction between the two is unclear within the code.
      • The Heat Communicator works with DNDArrays and is meant to be used by users.
      • The MPI Communicator takes torch Tensors and is meant for Heat developers.
      • Maybe we only really need one; it is not clear how much users actually use the communicators. The MPI Communicator could become a more generic communicator for torch Tensors that is used throughout the rest of the Heat library, with the mpi4py communicator ideally used ONLY inside the MPICommunicator class.
  • More hardware/software stack agnostic implementation
  • Advanced indexing

Docs

Not even sure; they are probably so outdated that it is painful.

Ideas:

  • Notebooks as documentation
  • Slides and presentations
  • Papers
