From 83727cb27bf07ee5af204a86bfdf085442c2ad22 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com>
Date: Wed, 3 Sep 2025 13:12:48 +0200
Subject: [PATCH 01/15] Heat 1.6.0 - Release (#1957)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Update heat/core/io.py
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
* added test for dndarray.info
* added tests for two uncovered exception lines
* one additional line from DMD covered
* one more line in DMD covered
* debugging
* build(deps): bump actions/setup-python from 5.4.0 to 5.5.0
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5.4.0 to 5.5.0.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](https://github.com/actions/setup-python/compare/42375524e23c412d93fb67b49958b491fce71c38...8d9ed9ac5c53483de85588cdf95a591a75ab9f55)
---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot]
* [pre-commit.ci] pre-commit autoupdate
updates:
- [github.com/PyCQA/flake8: 7.1.2 → 7.2.0](https://github.com/PyCQA/flake8/compare/7.1.2...7.2.0)
* build(deps): bump github/codeql-action from 3.28.12 to 3.28.13
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.28.12 to 3.28.13.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/5f8171a638ada777af81d42b55959a643bb29017...1b549b9259bda1cb5ddde3b41741a82a2d15a841)
---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
* further work on eigh
* eigh completed for split = 0
* flake8
* tests for eigh, now split=none,0,1
* build(deps): bump step-security/harden-runner from 2.11.0 to 2.11.1 (#1851)
Bumps [step-security/harden-runner](https://github.com/step-security/harden-runner) from 2.11.0 to 2.11.1.
- [Release notes](https://github.com/step-security/harden-runner/releases)
- [Commits](https://github.com/step-security/harden-runner/compare/4d991eb9b905ef189e4c376166672c3f2f230481...c6295a65d1254861815972266d5933fd6e532bdf)
---
updated-dependencies:
- dependency-name: step-security/harden-runner
  dependency-version: 2.11.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump actions/dependency-review-action from 4.5.0 to 4.6.0 (#1850)
Bumps [actions/dependency-review-action](https://github.com/actions/dependency-review-action) from 4.5.0 to 4.6.0.
- [Release notes](https://github.com/actions/dependency-review-action/releases)
- [Commits](https://github.com/actions/dependency-review-action/compare/3b139cfc5fae8b618d3eae3675e383bb1769c019...ce3cf9537a52e8119d91fd484ab5b8a807627bf8)
---
updated-dependencies:
- dependency-name: actions/dependency-review-action
  dependency-version: 4.6.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
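A minimal usage sketch for the eigh work noted in the bullets above (hedged: the call signature is assumed to mirror `numpy.linalg.eigh`, returning eigenvalues and eigenvectors of a symmetric matrix; split=None, 0, and 1 are the configurations the tests exercise):

```python
import heat as ht

# a symmetric test matrix, distributed along split=0 (split=None and
# split=1 are the other configurations covered by the tests above)
x = ht.random.randn(200, 200, split=0)
a = x @ x.T  # symmetric by construction

# assumed interface, analogous to numpy.linalg.eigh
eigenvalues, eigenvectors = ht.linalg.eigh(a)
```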
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* minor modifications due to Copilot's review
* added SVD for general case
* reformatting
* tests for SVD
* tests for SVD completed
* added module _config in core which is intended to handle MPI, CUDA, and ROCm versioning
* added variable GPU_AWARE_MPI
* added MPICH
* changed method info into __repr__
* moved __repr__ to printing module
* removed dead code
* restructuring of tests
* added further test to DMDc
* small typo corrected
* adapted tolerances for last test; errors grow w.r.t. timesteps (this is in the nature of DMD), so the largest number of procs determines the tolerance
* Update test_dmd.py
lower tolerances for the AMD runner
* dummy commit since something was wrong with pre-commit
* corrected tests
* debugging of tests
* Remove unnecessary contiguous calls (#1831)
* removed contiguous calls from manipulations.py
* removed the contiguous calls from linalg/qr.py
* removed unnecessary contiguous call in factories.py
* removed some more unnecessary contiguous calls
* reinstate contiguous() calls if needed
* removed the contiguous calls from linalg/qr.py
* reinstate setting Q_buf to Q_curr
---------
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
* build(deps): bump github/codeql-action from 3.28.13 to 3.28.15
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.28.13 to 3.28.15.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/1b549b9259bda1cb5ddde3b41741a82a2d15a841...45775bd8235c68ba998cffa5171334d58593da47)
---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 3.28.15
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
* removed debugging prints that were forgotten before merging into main
* subTest'ified the zarr tests, added some strange exception handling that is likely necessary to accommodate zarr versions compatible with Python 3.10
* small bug fix
* bugfix in eigh
* removed unnecessary numpy import
* changed representation string according to review
* debugging of memory consumption in Polar
* bug fixes for devices in polar and eigh
* bugfixes for certain device configurations
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update README.md
* Support latest PyTorch release
* Make unit tests compatible with NumPy 2.x (#1826)
* changed row_stack to vstack for numpy >= 2.0.0
* Changed the numpy version check to the numpy-suggested method
* Changed numpy version requirement
* fixed DeprecationWarning from missing axes for np.fft.fftn
* Fixed one __array_wrap__ DeprecationWarning
* Stopped testing cross of vector axes with 2 elements for numpy >= 2.0
* changed requirements to avoid errors with numpy >= 2
* Using Python 3.10 for ReceivePR
* changed python-version to '3.10' because 3.10 was interpreted as 3.1
* changed ci.yaml to exclude python 3.9
* Fixed two DeprecationWarnings of np.cross() by adding a third dimension with zeros
* Fixed the last np.cross() warning by performing the operation manually
* changed dtype of np.cross() to float32
---------
Co-authored-by: Fabian Hoppe <112093564+mrfh92@users.noreply.github.com>
Co-authored-by: Juan Pedro Gutiérrez Hermosillo Muriedas
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
* build(deps): bump actions/setup-python from 5.5.0 to 5.6.0 (#1863)
* build(deps): bump actions/setup-python from 5.5.0 to 5.6.0
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5.5.0 to 5.6.0.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](https://github.com/actions/setup-python/compare/8d9ed9ac5c53483de85588cdf95a591a75ab9f55...a26af69be951a213d495a4c3e4e4022e16d87065)
---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-version: 5.6.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot]
* Debugging
---------
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
* build(deps): bump docker/build-push-action from 6.15.0 to 6.16.0 (#1860)
Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 6.15.0 to 6.16.0.
- [Release notes](https://github.com/docker/build-push-action/releases)
- [Commits](https://github.com/docker/build-push-action/compare/471d1dc4e07e5cdedd4c2171150001c434f0b7a4...14487ce63c7a62a4a324b0bfb37086795e31c6c1)
---
updated-dependencies:
- dependency-name: docker/build-push-action
  dependency-version: 6.16.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump step-security/harden-runner from 2.11.1 to 2.12.0 (#1861)
Bumps [step-security/harden-runner](https://github.com/step-security/harden-runner) from 2.11.1 to 2.12.0.
- [Release notes](https://github.com/step-security/harden-runner/releases)
- [Commits](https://github.com/step-security/harden-runner/compare/c6295a65d1254861815972266d5933fd6e532bdf...0634a2670c59f64b4a01f0f96f84700a4088b9f0)
---
updated-dependencies:
- dependency-name: step-security/harden-runner
  dependency-version: 2.12.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump github/codeql-action from 3.28.15 to 3.28.17 (#1866)
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.28.15 to 3.28.17.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/45775bd8235c68ba998cffa5171334d58593da47...60168efe1c415ce0f5521ea06d5c2062adbeed1b)
---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 3.28.17
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump actions/dependency-review-action from 4.6.0 to 4.7.0
Bumps [actions/dependency-review-action](https://github.com/actions/dependency-review-action) from 4.6.0 to 4.7.0.
- [Release notes](https://github.com/actions/dependency-review-action/releases)
- [Commits](https://github.com/actions/dependency-review-action/compare/ce3cf9537a52e8119d91fd484ab5b8a807627bf8...38ecb5b593bf0eb19e335c03f97670f792489a8b)
---
updated-dependencies:
- dependency-name: actions/dependency-review-action
  dependency-version: 4.7.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot]
* added benchmarks for eigh, svd, and rsvd
* dummy commit to trigger benchmark runs
* Support latest PyTorch release
* changed torchvision version to <0.22.1
* retrigger checks
* build(deps): bump github/codeql-action from 3.28.17 to 3.28.18
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.28.17 to 3.28.18.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/60168efe1c415ce0f5521ea06d5c2062adbeed1b...ff0a06e83cb2de871e5a09832bc6a81e7276941f)
---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 3.28.18
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
* build(deps): bump docker/build-push-action from 6.16.0 to 6.17.0
Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 6.16.0 to 6.17.0.
- [Release notes](https://github.com/docker/build-push-action/releases)
- [Commits](https://github.com/docker/build-push-action/compare/14487ce63c7a62a4a324b0bfb37086795e31c6c1...1dc73863535b631f98b2378be8619f83b136f4a0)
---
updated-dependencies:
- dependency-name: docker/build-push-action
  dependency-version: 6.17.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot]
* build(deps): bump actions/dependency-review-action from 4.7.0 to 4.7.1
Bumps [actions/dependency-review-action](https://github.com/actions/dependency-review-action) from 4.7.0 to 4.7.1.
- [Release notes](https://github.com/actions/dependency-review-action/releases)
- [Commits](https://github.com/actions/dependency-review-action/compare/38ecb5b593bf0eb19e335c03f97670f792489a8b...da24556b548a50705dd671f47852072ea4c105d9)
---
updated-dependencies:
- dependency-name: actions/dependency-review-action
  dependency-version: 4.7.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
* RTD Notebook gallery and profiling notebook with perun. (#1867)
* docs: notebook gallery in rtd
* docs: missing makefiles
* docs: reverted changes to gitignore
* haicore notebook setup
* ompi in readthedocs build
* correct apt package for mpi
* docs: replaced small bodies dataset with digits from sklearn (boring, but easier to access in the long term)
* perun notebook
* wrong cell type
* added pytorch 2.7 to ci workflow
* docs: post practice run fixes
* notebook thumbnails, formatting and corrections for tutorial
* forgot to uncomment autoapi
* build(deps): bump github/codeql-action from 3.28.18 to 3.28.19 (#1881)
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.28.18 to 3.28.19.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/ff0a06e83cb2de871e5a09832bc6a81e7276941f...fca7ace96b7d713c7035871441bd52efbe39e27e)
---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 3.28.19
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump docker/build-push-action from 6.17.0 to 6.18.0 (#1877)
Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 6.17.0 to 6.18.0.
- [Release notes](https://github.com/docker/build-push-action/releases)
- [Commits](https://github.com/docker/build-push-action/compare/1dc73863535b631f98b2378be8619f83b136f4a0...263435318d21b8e681c14492fe198d362a7d2c83)
---
updated-dependencies:
- dependency-name: docker/build-push-action
  dependency-version: 6.18.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump ossf/scorecard-action from 2.4.1 to 2.4.2 (#1878)
Bumps [ossf/scorecard-action](https://github.com/ossf/scorecard-action) from 2.4.1 to 2.4.2.
- [Release notes](https://github.com/ossf/scorecard-action/releases)
- [Changelog](https://github.com/ossf/scorecard-action/blob/main/RELEASE.md)
- [Commits](https://github.com/ossf/scorecard-action/compare/f49aabe0b5af0936a0987cfb85d86b75731b0186...05b42c624433fc40578a4040d5cf5e36ddca8cde)
---
updated-dependencies:
- dependency-name: ossf/scorecard-action
  dependency-version: 2.4.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump step-security/harden-runner from 2.12.0 to 2.12.1
Bumps [step-security/harden-runner](https://github.com/step-security/harden-runner) from 2.12.0 to 2.12.1.
- [Release notes](https://github.com/step-security/harden-runner/releases)
- [Commits](https://github.com/step-security/harden-runner/compare/0634a2670c59f64b4a01f0f96f84700a4088b9f0...002fdce3c6a235733a90a27c80493a3241e56863)
---
updated-dependencies:
- dependency-name: step-security/harden-runner
  dependency-version: 2.12.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
* build(deps): bump github/codeql-action from 3.28.19 to 3.29.0
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.28.19 to 3.29.0.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/fca7ace96b7d713c7035871441bd52efbe39e27e...ce28f5bb42b7a9f2c824e633a3f6ee835bab6858)
---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 3.29.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot]
* Add special methods for operations in NumPy
* add tests for NumPy related array methods
* fix variable name
* Exit installation if conda environment cannot be activated (#1880)
* exit 0_setup_conda.sh if environment cannot be activated
otherwise the script might install into the base environment
* exit 0_setup_pip.sh if environment cannot be activated
---------
Co-authored-by: Juan Pedro Gutiérrez Hermosillo Muriedas
* fix item access
* add contiguous call again
* same as before
* [pre-commit.ci] pre-commit autoupdate (#1894)
updates:
- [github.com/PyCQA/flake8: 7.2.0 → 7.3.0](https://github.com/PyCQA/flake8/compare/7.2.0...7.3.0)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* bugfix in rSVD for the case where the rank is smaller than the number of processes
* build(deps): bump github/codeql-action from 3.29.0 to 3.29.1
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.29.0 to 3.29.1.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/ce28f5bb42b7a9f2c824e633a3f6ee835bab6858...39edc492dbe16b1465b0cafca41432d857bdb31a)
---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 3.29.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
* resolved bug in rSVD, actually in one-process QR
* Support PyTorch 2.7.1 (#1883)
* Support latest PyTorch release
* Update ci.yaml
pytorch: add v2.2, drop v1.11
* debugging test_lasso
* Support latest PyTorch release
* Update bug_report.yml
Add latest versions for options
* update torchvision
* do not test latest torch in matrix
* pin zarr version
* remove dead code
* pin zarr to 3.0.8
* add back latest pytorch to matrix
* edit PR body
* Update ci.yaml
---------
Co-authored-by: ClaudiaComito <39374113+ClaudiaComito@users.noreply.github.com>
Co-authored-by: Michael Tarnawa
Co-authored-by: Michael Tarnawa <18899420+mtar@users.noreply.github.com>
* build(deps): bump docker/setup-buildx-action from 3.10.0 to 3.11.1
Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 3.10.0 to 3.11.1.
- [Release notes](https://github.com/docker/setup-buildx-action/releases)
- [Commits](https://github.com/docker/setup-buildx-action/compare/b5ca514318bd6ebac0fb2aedd5d36ec1b5c232a2...e468171a9de216ec08956ac3ada2f0791b6bd435)
---
updated-dependencies:
- dependency-name: docker/setup-buildx-action
  dependency-version: 3.11.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot]
* Update polar.py
* Revert "build(deps): bump docker/setup-buildx-action from 3.10.0 to 3.11.1" (#1908)
* support torch_function (#1895)
Co-authored-by: Fabian Hoppe <112093564+mrfh92@users.noreply.github.com>
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
* Features/1845 Update citations (#1846)
* Update README.md: added citation possibilities
* Update README.md: link to ZENODO
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update README.md
* Update README.md
* Update README.md
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Juan Pedro Gutiérrez Hermosillo Muriedas
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
* Transition to pyproject.toml, Ruff, and mypy (#1832)
* Support latest PyTorch release
* Support latest PyTorch release
* retrigger checks
* wip: toml, ruff, mypy and cli
* ci: better mypy error filter (down to 3041)
* ci: mypy in list of dev deps
* ci: skipping mypy for now
* dependency specification
* missing argument description for permute method
* fixed dynamic version in pyproject.toml
* properly skipping mypy in the pre-commit.ci
* fixed dependencies, removed references to python 3.9, added python 3.13
* doc -> docs
* fixed tests
* cli tests
* Update .github/ISSUE_TEMPLATE/bug_report.yml
Co-authored-by: Michael Tarnawa <18899420+mtar@users.noreply.github.com>
* removed problematic pytorch version?!
* consistency is important
* did not work
* vulnerability removal, other changes to toml
* It's not working :(
* fixing tests
* Update .pre-commit-config.yaml
* fix: cli does not change the default device
* fix: support for older pytorch version on the cli, limit zarr package version
* Update pytorch exclude with py-3.13
---------
Co-authored-by: ClaudiaComito <39374113+ClaudiaComito@users.noreply.github.com>
Co-authored-by: Marc-Jindra
Co-authored-by: Michael Tarnawa <18899420+mtar@users.noreply.github.com>
* build(deps): bump docker/setup-buildx-action from 3.10.0 to 3.11.1 (#1911)
Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 3.10.0 to 3.11.1.
- [Release notes](https://github.com/docker/setup-buildx-action/releases)
- [Commits](https://github.com/docker/setup-buildx-action/compare/b5ca514318bd6ebac0fb2aedd5d36ec1b5c232a2...e468171a9de216ec08956ac3ada2f0791b6bd435)
---
updated-dependencies:
- dependency-name: docker/setup-buildx-action
  dependency-version: 3.11.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump github/codeql-action from 3.29.1 to 3.29.2 (#1910)
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.29.1 to 3.29.2.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/39edc492dbe16b1465b0cafca41432d857bdb31a...181d5eefc20863364f96762470ba6f862bdef56b)
---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 3.29.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump step-security/harden-runner from 2.12.1 to 2.12.2 (#1909)
Bumps [step-security/harden-runner](https://github.com/step-security/harden-runner) from 2.12.1 to 2.12.2.
- [Release notes](https://github.com/step-security/harden-runner/releases)
- [Commits](https://github.com/step-security/harden-runner/compare/002fdce3c6a235733a90a27c80493a3241e56863...6c439dc8bdf85cadbbce9ed30d1c7b959517bc49)
---
updated-dependencies:
- dependency-name: step-security/harden-runner
  dependency-version: 2.12.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
* took review into account
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* pre-commit stuff
* [pre-commit.ci] pre-commit autoupdate (#1912)
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.11.13 → v0.12.3](https://github.com/astral-sh/ruff-pre-commit/compare/v0.11.13...v0.12.3)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Updated release_prep.yml to incorporate up-to-date Dockerfile Pytorch versions (#1903)
* Create update_docker.yml
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update release-prep.yml
Added the update functionality of the Dockerfile.source and Dockerfile.release Pytorch Image versions.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Removed update_docker.yml since the functionality was moved to release prep
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
* [StepSecurity] Apply security best practices (#1891)
* [StepSecurity] Apply security best practices
Signed-off-by: StepSecurity Bot
* Test with shellcheck-py
* Update .pre-commit-config.yaml
* shellcheck updates
* Update build_and_push.sh
* Update increment_version.sh
* Update increment_version.sh
* Update build_and_push.sh
* Update test_nvidia_image_haicore_enroot.sh
* Update test_nvidia_image_haicore_enroot.sh
* Update build_and_push.sh
* Update 0_setup_pip.sh
---------
Signed-off-by: StepSecurity Bot
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
* build(deps): bump korthout/backport-action from 3.2.0 to 3.2.1
Bumps [korthout/backport-action](https://github.com/korthout/backport-action) from 3.2.0 to 3.2.1.
- [Release notes](https://github.com/korthout/backport-action/releases)
- [Commits](https://github.com/korthout/backport-action/compare/436145e922f9561fc5ea157ff406f21af2d6b363...0193454f0c5947491d348f33a275c119f30eb736)
---
updated-dependencies:
- dependency-name: korthout/backport-action
  dependency-version: 3.2.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
* build(deps): bump step-security/harden-runner from 2.12.1 to 2.13.0
Bumps [step-security/harden-runner](https://github.com/step-security/harden-runner) from 2.12.1 to 2.13.0.
- [Release notes](https://github.com/step-security/harden-runner/releases)
- [Commits](https://github.com/step-security/harden-runner/compare/v2.12.1...ec9f2d5744a09debf3a187a3f4f675c53b671911)
---
updated-dependencies:
- dependency-name: step-security/harden-runner
  dependency-version: 2.13.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot]
* [pre-commit.ci] pre-commit autoupdate
updates:
- [github.com/pre-commit/mirrors-mypy: v1.16.1 → v1.17.0](https://github.com/pre-commit/mirrors-mypy/compare/v1.16.1...v1.17.0)
- [github.com/astral-sh/ruff-pre-commit: v0.12.3 → v0.12.4](https://github.com/astral-sh/ruff-pre-commit/compare/v0.12.3...v0.12.4)
- [github.com/gitleaks/gitleaks: v8.16.3 → v8.28.0](https://github.com/gitleaks/gitleaks/compare/v8.16.3...v8.28.0)
* Fix ValueError in save_zarr by conditional handling of chunks argument
Fixes error when calling zarr.create by only passing chunks as a list if not None.
* Unfix zarr version in pyproject.toml to test CI job `test_amd`
* Add note in docstring of save_zarr()
* build(deps): bump tj-actions/branch-names from 8.2.1 to 9.0.1
Bumps [tj-actions/branch-names](https://github.com/tj-actions/branch-names) from 8.2.1 to 9.0.1.
- [Release notes](https://github.com/tj-actions/branch-names/releases)
- [Changelog](https://github.com/tj-actions/branch-names/blob/main/HISTORY.md)
- [Commits](https://github.com/tj-actions/branch-names/compare/dde14ac574a8b9b1cedc59a1cf312788af43d8d8...386e117ea34339627a40843704a60a3bc9359234)
---
updated-dependencies:
- dependency-name: tj-actions/branch-names
  dependency-version: 9.0.1
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot]
* build(deps): bump github/codeql-action from 3.29.2 to 3.29.4
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.29.2 to 3.29.4.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/181d5eefc20863364f96762470ba6f862bdef56b...4e828ff8d448a8a6e532957b1811f387a63867e8)
---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 3.29.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
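A hedged sketch of the conditional chunks handling described in the save_zarr fix above (function and variable names are illustrative, not Heat's actual heat/core/io.py code):

```python
import zarr

def create_zarr_array(shape, dtype, store, chunks=None):
    # only forward chunks when the caller actually provided them; passing
    # chunks=None straight through to zarr.create raised the ValueError
    # described in the commit message above
    kwargs = {}
    if chunks is not None:
        kwargs["chunks"] = list(chunks)
    return zarr.create(shape=shape, dtype=dtype, store=store, **kwargs)
```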
Signed-off-by: dependabot[bot]
* Setuptools build fix on pyproject.toml (#1919)
* [pre-commit.ci] pre-commit autoupdate
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.12.4 → v0.12.5](https://github.com/astral-sh/ruff-pre-commit/compare/v0.12.4...v0.12.5)
* Stride argument for convolution (#1865)
* Add: enable stride as input option for convolution
Belongs to Issue 1755
Changes include
- stride as optional input
- stride in docstring (except for output dim computations and as example)
- raise ValueError for incorrect option (stride < 1) and combinations (stride > 1 and mode = 'same')
* Add: Failing stride tests in test_signal.py
- Added correct results for odd and even kernel
- Added calls for convolution with stride=2 except with exception for mode `same`
- Covers is_mps true and false
- Stopped at tests for large distributed signals
* Add: Pass stride tests for standard cases
Belongs to Issue 1755
- edge cases and batch-processing not yet implemented or tested
- Fix test error
- Implement stride to pass tests
- Write convolution_stride.py script for debugging purposes, will be removed later
* Add: test for large signals with stride
Belongs to Issue 1755
- passes
* Add: Test cases until batch
Belongs to issue 1755
- Passes tests
* Add: Failing tests for batch convolution with stride
Belongs to Issue 1755
* Add: batch processing with stride
Belongs to Issue 1755
- Added stride to conv1D call in batch_convolution
- Passes tests
* Remove: Remove script to test convolution with stride
* Update: Docstring of convolution
Belongs to Issue 1755
- Correct stride information
- Add examples
* Fail: Tests fail for mpirun -n3 ...
- Issue with halo size
- Problem marked in code
- Not solved
* Update: Split stride tests
- Different configurations in different test functions
- If process number == 1: all pass
- If multiple processes < 3 (because only tested then), 5 fail
- This needs to be fixed, likely fails due to wrong halo in presence of stride
* Add: Enable stride on distributed arrays but not kernels
Adjust the signal on each rank such that it starts with the next kernel position according to stride
- Added the computation of starting values for each rank
- Avoid duplicates for even and odd kernels
Halo size computation is independent of stride
Added a script for debugging, will be removed
Ideas: generalize it for stride 1 (should work out of the box)
To do: Adapt for distributed kernels
* Add: Distributed kernels and optimized start index computation
Optimized start index computation:
- Remove global index array
- Use lshape map and a simple modulo operation only
Distributed kernels
- Any stride > 1 is a subset of the solution for stride 1
- Not the most efficient but at least functional
* Fix: Improve test coverage
Still missing
- Cuda code bits
- else statement beginning line 229
* Delete: conv_test.py
Test script no longer needed
* Add: Add benchmarks for signal.py
* Add: script to test benchmark
- empty so far
* Fix: Benchmarks
- Fix run_signal_benchmarks
- Add run_signal_benchmarks to main.py
Remove: print statements from convolution in signal.py
* Fix: benchmarks/cb/signal.py
- add () to monitor decorator of perun
* Fix: Rename signal.py to avoid clash with python3.12 signal
- change signal.py to heat_signal.py
* Fix: Adjust array numbers for benchmarking
* Remove: benchmark script from scripts/
* Fix: Benchmarking signal.py and testing signal.py
- Benchmark improved
- removed import pytest from test_signal.py
* Fix: test_signal.py batch convolution with stride
- Stride was not passed as a single value but different values for different ranks
- Solution: Do not randomly create stride values but use fixed values
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix: test_convolution-stride_large_signal_and_kernel_modes
- Remove torch arrays
- Instead work with np.convolve similar to test without stride
* Fix: Add device for empty torch tensors
- Stride test fails for large arrays
- Error message says that the device does not match
- Due to a large stride, a potentially empty tensor may be created -> add device to that tensor
* Fix: torch_device instead of device
- when the empty tensor is created, use torch_device
* Fix: Missed .device in previous commit
* Fix: torch device not accessible
- instead use str(ht_array.device)
- Use signal.device to get correct rank
---------
Co-authored-by: Fabian Hoppe <112093564+mrfh92@users.noreply.github.com>
Co-authored-by: Juan Pedro Gutiérrez Hermosillo Muriedas
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* build(deps): bump tj-actions/branch-names from 9.0.1 to 9.0.2 (#1930)
Bumps [tj-actions/branch-names](https://github.com/tj-actions/branch-names) from 9.0.1 to 9.0.2.
- [Release notes](https://github.com/tj-actions/branch-names/releases)
- [Changelog](https://github.com/tj-actions/branch-names/blob/main/HISTORY.md)
- [Commits](https://github.com/tj-actions/branch-names/compare/386e117ea34339627a40843704a60a3bc9359234...5250492686b253f06fa55861556d1027b067aeb5)
---
updated-dependencies:
- dependency-name: tj-actions/branch-names
  dependency-version: 9.0.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump github/codeql-action from 3.29.4 to 3.29.8 (#1933)
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.29.4 to 3.29.8.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/4e828ff8d448a8a6e532957b1811f387a63867e8...76621b61decf072c1cee8dd1ce2d2a82d33c17ed)
---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 3.29.8
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump docker/login-action from 3.4.0 to 3.5.0 (#1934)
Bumps [docker/login-action](https://github.com/docker/login-action) from 3.4.0 to 3.5.0.
- [Release notes](https://github.com/docker/login-action/releases)
- [Commits](https://github.com/docker/login-action/compare/74a5d142397b4f367a81961eba4e8cd7edddf772...184bdaa0721073962dff0199f1fb9940f07167d1)
---
updated-dependencies:
- dependency-name: docker/login-action
  dependency-version: 3.5.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
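The distributed-kernel item above notes that "any stride > 1 is a subset of the solution for stride 1". A minimal single-process sketch of that idea in NumPy (not Heat's actual implementation, which additionally handles halos and per-rank start indices):

```python
import numpy as np

def strided_convolve(signal, kernel, stride=1):
    # compute the stride-1 'valid' convolution first ...
    full = np.convolve(signal, kernel, mode="valid")
    # ... then the strided result is simply every stride-th entry of it
    return full[::stride]

signal = np.arange(10.0)
kernel = np.array([1.0, 0.5, 0.25])
out = strided_convolve(signal, kernel, stride=2)
assert np.array_equal(out, np.convolve(signal, kernel, mode="valid")[::2])
```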
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [pre-commit.ci] pre-commit autoupdate (#1931)
updates:
- [github.com/pre-commit/mirrors-mypy: v1.17.0 → v1.17.1](https://github.com/pre-commit/mirrors-mypy/compare/v1.17.0...v1.17.1)
- [github.com/astral-sh/ruff-pre-commit: v0.12.5 → v0.12.7](https://github.com/astral-sh/ruff-pre-commit/compare/v0.12.5...v0.12.7)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Juan Pedro Gutiérrez Hermosillo Muriedas
* Update CODE_OF_CONDUCT.md
New email for reporting
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* build(deps): bump github/codeql-action from 3.29.8 to 3.29.9 (#1943)
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.29.8 to 3.29.9.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/76621b61decf072c1cee8dd1ce2d2a82d33c17ed...df559355d593797519d70b90fc8edd5db049e7a2)
---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 3.29.9
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump actions/checkout from 4.2.2 to 5.0.0 (#1942)
Bumps [actions/checkout](https://github.com/actions/checkout) from 4.2.2 to 5.0.0.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/11bd71901bbe5b1630ceea73d27597364c9af683...08c6903cd8c0fde910a37f88322edcfb5dd907a8)
---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: 5.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump korthout/backport-action from 3.2.1 to 3.3.0 (#1944)
Bumps [korthout/backport-action](https://github.com/korthout/backport-action) from 3.2.1 to 3.3.0.
- [Release notes](https://github.com/korthout/backport-action/releases)
- [Commits](https://github.com/korthout/backport-action/compare/0193454f0c5947491d348f33a275c119f30eb736...ca4972adce8039ff995e618f5fc02d1b7961f27a)
---
updated-dependencies:
- dependency-name: korthout/backport-action
  dependency-version: 3.3.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
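Two of the NumPy 2.x test-compatibility bullets earlier in this message concern np.cross() on 2-element vectors, which NumPy 2.0 deprecates. A hedged sketch of both workarounds described there (array values are illustrative, not taken from the actual tests):

```python
import numpy as np

a = np.array([1.0, 2.0], dtype=np.float32)
b = np.array([3.0, 4.0], dtype=np.float32)

# workaround 1: pad a third dimension with zeros, so np.cross operates on
# 3D vectors and emits no DeprecationWarning; the z-component is the
# 2D cross product
a3 = np.append(a, 0.0)
b3 = np.append(b, 0.0)
z = np.cross(a3, b3)[2]

# workaround 2: perform the 2D cross product manually
z_manual = a[0] * b[1] - a[1] * b[0]

assert np.isclose(z, z_manual)
```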
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [pre-commit.ci] pre-commit autoupdate (#1936)
updates:
- [github.com/pre-commit/pre-commit-hooks: v5.0.0 → v6.0.0](https://github.com/pre-commit/pre-commit-hooks/compare/v5.0.0...v6.0.0)
- [github.com/astral-sh/ruff-pre-commit: v0.12.7 → v0.12.8](https://github.com/astral-sh/ruff-pre-commit/compare/v0.12.7...v0.12.8)
- [github.com/shellcheck-py/shellcheck-py: v0.10.0.1 → v0.11.0.1](https://github.com/shellcheck-py/shellcheck-py/compare/v0.10.0.1...v0.11.0.1)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
* Update latest-pytorch-support.yml (#1950)
move to pyproject.toml
* [pre-commit.ci] pre-commit autoupdate (#1949)
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.12.8 → v0.12.9](https://github.com/astral-sh/ruff-pre-commit/compare/v0.12.8...v0.12.9)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
* Bump version to 1.6.0
* Update pytorch image in Dockerfile.release and Dockerfile.source to version
* update coverage, add link to issue search
* add FFT, masked arrays
* Update authors and contributors in CITATION.cff
* fix: typo in release drafter
* Updated Changelog
* changelog highlights
* Update micro version to 0
* Update CHANGELOG.md
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: dependabot[bot]
Signed-off-by: StepSecurity Bot
Co-authored-by: Fabian Hoppe <112093564+mrfh92@users.noreply.github.com>
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
Co-authored-by: jolemse <35252911+jolemse@users.noreply.github.com>
Co-authored-by: Hoppe
Co-authored-by: Juan Pedro Gutiérrez Hermosillo Muriedas
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Marc-Jindra
Co-authored-by: Hakdag97 <72792786+Hakdag97@users.noreply.github.com>
Co-authored-by: Michael Tarnawa <18899420+mtar@users.noreply.github.com>
Co-authored-by: Till Korten
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michael Tarnawa
Co-authored-by: StepSecurity Bot
Co-authored-by: Berkant <51971304+Berkant03@users.noreply.github.com>
Co-authored-by: Lukas Scheib
Co-authored-by: Lukas Scheib <146953413+LScheib@users.noreply.github.com>
Co-authored-by: lolacaro
Co-authored-by: Björn Hagemeier
Co-authored-by: Heat Release Bot <>
---
 .github/ISSUE_TEMPLATE/bug_report.yml | 7 +-
 .github/PULL_REQUEST_TEMPLATE.md | 1 +
 .github/rd-release-config.yml | 84 +-
 .github/workflows/CIBase.yml | 6 +-
 .github/workflows/CISupport.yml | 7 +-
 .github/workflows/CommentPR.yml | 6 +-
 .github/workflows/ReceivePR.yml | 10 +-
 .github/workflows/backport.yml | 9 +-
 .github/workflows/bench_trigger.yml | 7 +-
 .github/workflows/changelog-updater.yml | 55 -
 .github/workflows/ci.yaml | 36 +-
 .github/workflows/codeql.yml | 10 +-
 .github/workflows/create-branch-on-assignment.yml | 4 +-
 .github/workflows/dependency-review.yml | 6 +-
 .github/workflows/docker.yml | 14 +-
 .github/workflows/inactivity.yml | 4 +-
 .github/workflows/increment_version.sh | 4 +-
 .github/workflows/latest-pytorch-support.yml | 16 +-
 .github/workflows/markdown-links-check.yml | 4 +-
 .github/workflows/pytorch-latest-release.yml | 4 +-
 .github/workflows/release-drafter.yml | 13 +-
 .github/workflows/release-prep.yml | 90 +-
 .github/workflows/scorecard.yml | 10 +-
 .gitignore | 1 +
 .perun.ini | 23 +-
 .pre-commit-config.yaml | 47 +-
 .readthedocs.yaml | 8 +-
 CHANGELOG.md | 85 +
 CITATION.cff | 42 +-
 CODE_OF_CONDUCT.md | 2 +-
 README.md | 73 +-
 benchmarks/cb/decomposition.py | 17 +
 benchmarks/cb/heat_signal.py | 85 +
 benchmarks/cb/linalg.py | 98 +
 benchmarks/cb/main.py | 8 +
 coverage_tables.md | 763 ++--
 doc/Makefile | 20 +
 doc/make.bat | 268 +-
 doc/requirements.txt | 5 -
 .../_static}/images/GSoC-Horizontal.svg | 0
 doc/{ => source/_static}/images/bsp.svg | 0
 .../_static}/images/clustering.png | Bin
 .../_static}/images/clustering_kmeans.png | Bin
 doc/{ => source/_static}/images/data.png | Bin
 doc/{ => source/_static}/images/dlr_logo.svg | 0
 doc/{ => source/_static}/images/fzj_logo.svg | 0
 .../_static}/images/hSVD_bench_rank5.png | Bin
 .../_static}/images/hSVD_bench_rank50.png | Bin
 .../_static}/images/hSVD_bench_rank500.png | Bin
 .../_static}/images/heat_split_array.png | Bin
 .../_static}/images/heat_split_array.svg | 0
 .../heatvsdask_strong_smalldata_without.png | Bin
 .../heatvsdask_weak_smalldata_without.png | Bin
 .../_static}/images/helmholtz_logo.svg | 0
 doc/source/_static/images/jsc_logo.png | Bin 0 -> 16766 bytes
 doc/source/_static/images/jupyter.png | Bin 0 -> 22885 bytes
 doc/{ => source/_static}/images/kit_logo.svg | 0
 doc/source/_static/images/local_laptop.png | Bin 0 -> 24793 bytes
 doc/{ => source/_static}/images/logo.png | Bin
 .../_static}/images/logo_emblem.png | Bin
 .../_static}/images/logo_emblem.svg | 0
 .../_static}/images/logo_white.png | Bin
 .../_static}/images/logo_white.svg | 0
 doc/source/_static/images/nhr_verein_logo.jpg | Bin 0 -> 8167 bytes
 doc/source/_static/images/perun_logo.svg | 112 +
 .../_static}/images/split_array.png | Bin
 .../_static}/images/split_array.svg | 0
 .../_static}/images/tutorial_clustering.svg | 0
 .../_static}/images/tutorial_dpnn.svg | 0
 .../_static}/images/tutorial_logo.svg | 0
 .../images/tutorial_split_dndarray.svg | 0
 .../images/weak_scaling_gpu_terrabyte.png | Bin
 doc/source/case_studies.rst | 6 +-
 doc/source/conf.py | 21 +-
 doc/source/index.rst | 2 +-
 doc/source/tutorial_dpnn.rst | 4 -
 .../notebooks/0_setup/0_setup_conda.sh | 15 +
 .../notebooks/0_setup/0_setup_haicore.ipynb | 620 ++++
 .../notebooks/0_setup/0_setup_jsc.ipynb | 80 +-
 .../notebooks/0_setup/0_setup_local.ipynb | 48 +-
 .../notebooks/0_setup/0_setup_pip.sh | 25 +
 doc/source/tutorials/notebooks/1_basics.ipynb | 3165 +++++++++++++++++
 .../tutorials/notebooks/2_internals.ipynb | 1417 ++++++++
 .../notebooks/3_loading_preprocessing.ipynb | 488 +++
 .../notebooks/4_matrix_factorizations.ipynb | 258 +-
 .../tutorials/notebooks/5_clustering.ipynb | 776 ++++
 .../tutorials/notebooks/6_profiling.ipynb | 609 ++++
 .../{ => tutorials}/tutorial_30_minutes.rst | 0
 .../{ => tutorials}/tutorial_clustering.rst | 8 +-
 .../tutorials/tutorial_notebook_gallery.rst | 25 +
 .../tutorial_parallel_computation.rst | 4 +-
 doc/source/{ => tutorials}/tutorials.rst | 27 +-
 docker/Dockerfile.release | 2 +-
 docker/Dockerfile.source | 2 +-
 docker/scripts/build_and_push.sh | 32 +-
 docker/scripts/install_print_test.sh | 4 +-
 .../test_nvidia_image_haicore_enroot.sh | 2 +-
 heat/classification/kneighborsclassifier.py | 3 +-
 heat/cli.py | 54 +
 heat/cluster/batchparallelclustering.py | 7 +-
 heat/cluster/kmedians.py | 2 +-
 heat/cluster/kmedoids.py | 6 +-
 .../tests/test_batchparallelclustering.py | 14 +-
 heat/cluster/tests/test_kmeans.py | 9 +-
 heat/cluster/tests/test_kmedians.py | 7 +-
 heat/cluster/tests/test_kmedoids.py | 7 +-
 heat/cluster/tests/test_spectral.py | 85 +-
 heat/core/_operations.py | 17 +-
 heat/core/arithmetics.py | 145 +-
 heat/core/base.py | 8 +-
 heat/core/communication.py | 263 +-
 heat/core/complex_math.py | 12 +-
 heat/core/constants.py | 2 +-
 heat/core/devices.py | 30 +-
 heat/core/dndarray.py | 126 +-
 heat/core/exponential.py | 6 +-
 heat/core/factories.py | 101 +-
 heat/core/indexing.py | 4 +-
 heat/core/io.py | 415 ++-
 heat/core/linalg/__init__.py | 2 +
 heat/core/linalg/basics.py | 178 +-
 heat/core/linalg/eigh.py | 309 ++
 heat/core/linalg/polar.py | 370 ++
 heat/core/linalg/qr.py | 372 +-
 heat/core/linalg/solver.py | 13 +-
 heat/core/linalg/svd.py | 215 +-
 heat/core/linalg/svdtools.py | 525 ++-
 heat/core/linalg/tests/test_basics.py | 321 +-
 heat/core/linalg/tests/test_eigh.py | 55 +
 heat/core/linalg/tests/test_polar.py | 117 +
 heat/core/linalg/tests/test_qr.py | 67 +-
 heat/core/linalg/tests/test_solver.py | 116 +-
 heat/core/linalg/tests/test_svd.py | 72 +-
 heat/core/linalg/tests/test_svdtools.py | 420 ++-
 heat/core/logical.py | 34 +-
 heat/core/manipulations.py | 251 +-
 heat/core/memory.py | 6 +-
 heat/core/printing.py | 8 +
 heat/core/random.py | 52 +-
 heat/core/relational.py | 46 +-
 heat/core/rounding.py | 20 +-
 heat/core/sanitation.py | 24 +-
 heat/core/signal.py | 111 +-
 heat/core/statistics.py | 96 +-
 heat/core/stride_tricks.py | 44 +-
 heat/core/tests/test_arithmetics.py | 168 +-
 heat/core/tests/test_communication.py | 50 +-
 heat/core/tests/test_complex_math.py | 401 ++-
 heat/core/tests/test_dndarray.py | 545 +--
 heat/core/tests/test_exponential.py | 283 +-
 heat/core/tests/test_factories.py | 174 +-
 heat/core/tests/test_io.py | 322 +-
 heat/core/tests/test_logical.py | 22 +-
 heat/core/tests/test_manipulations.py | 316 +-
 heat/core/tests/test_printing.py | 15 +-
 heat/core/tests/test_random.py | 125 +-
 heat/core/tests/test_rounding.py | 264 +-
 heat/core/tests/test_sanitation.py | 11 +
 heat/core/tests/test_signal.py | 364 +-
 heat/core/tests/test_statistics.py | 171 +-
 heat/core/tests/test_suites/basic_test.py | 132 +-
 heat/core/tests/test_tiling.py | 8 +
 heat/core/tests/test_trigonometrics.py | 392 +-
 heat/core/tests/test_types.py | 61 +-
 heat/core/tests/test_vmap.py | 79 +-
 heat/core/tiling.py | 21 +-
 heat/core/trigonometrics.py | 22 +-
 heat/core/types.py | 41 +-
 heat/core/version.py | 8 +-
 heat/core/vmap.py | 3 +-
 heat/decomposition/__init__.py | 1 +
 heat/decomposition/dmd.py | 715 ++++
 heat/decomposition/pca.py | 210 +-
 heat/decomposition/tests/test_dmd.py | 589 +++
 heat/decomposition/tests/test_pca.py | 150 +-
 heat/fft/tests/test_fft.py | 135 +-
 heat/naive_bayes/gaussianNB.py | 14 +-
 heat/naive_bayes/tests/test_gaussiannb.py | 12 +-
 heat/nn/__init__.py | 2 +-
 heat/nn/data_parallel.py | 4 +-
 heat/optim/__init__.py | 2 +-
 heat/optim/dp_optimizer.py | 4 +-
 heat/preprocessing/preprocessing.py | 2 +-
 heat/py.typed | 0
 heat/regression/lasso.py | 2 +-
 heat/sparse/factories.py | 2 +-
 heat/sparse/manipulations.py | 6 +-
 heat/sparse/tests/test_arithmetics_csr.py | 9 +-
 heat/sparse/tests/test_dcscmatrix.py | 9 +-
 heat/sparse/tests/test_dcsrmatrix.py | 10 +-
 heat/sparse/tests/test_factories.py | 9 +-
 heat/sparse/tests/test_manipulations.py | 9 +-
 heat/spatial/distance.py | 2 +-
 heat/spatial/tests/test_distances.py | 18 +-
 heat/tests/test_cli.py | 56 +
 heat/utils/data/_utils.py | 1 +
 heat/utils/data/datatools.py | 2 +-
 heat/utils/data/partial_dataset.py | 1 +
 heat/utils/data/spherical.py | 4 +-
 heat/utils/data/tests/test_matrixgallery.py | 15 +-
 pyproject.toml | 175 +-
 scripts/numpy_coverage_tables.py | 86 +-
 setup.cfg | 14 -
 setup.py | 52 -
 tutorials/hpc/2_basics.ipynb | 1 -
 tutorials/hpc/3_internals.ipynb | 1 -
 tutorials/hpc/4_loading_preprocessing.ipynb | 1 -
 tutorials/hpc/5_matrix_factorizations.ipynb | 1 -
 tutorials/hpc/6_clustering.ipynb | 1 -
 tutorials/local/2_basics.ipynb | 780 ----
 tutorials/local/3_internals.ipynb | 301 --
 tutorials/local/4_loading_preprocessing.ipynb | 209 --
 tutorials/local/6_clustering.ipynb | 787 ----
 .../hpc/01_basics/01_basics_dndarrays.py | 25 +
 .../hpc/01_basics/02_basics_datatypes.py | 22 +
 .../hpc/01_basics/03_basics_operations.py | 30 +
 .../hpc/01_basics/04_basics_indexing.py | 13 +
 .../hpc/01_basics/05_basics_broadcast.py | 14 +
 .../scripts/hpc/01_basics/06_basics_gpu.py | 39 +
 .../hpc/01_basics/07_basics_distributed.py | 70 +
 .../08_basics_distributed_operations.py | 24 +
 .../01_basics/09_basics_distributed_matmul.py | 55 +
 .../hpc/01_basics/10_interoperability.py | 26 +
 .../scripts/hpc/01_basics/11_internals_1.py | 44 +
 .../scripts/hpc/01_basics/12_internals_2.py | 71 +
 .../hpc/02_loading_preprocessing/01_IO.py | 40 +
 .../02_preprocessing.py | 69 +
 .../hpc/02_loading_preprocessing/iris.csv | 150 +
 .../matrix_factorizations.py | 99 +
 .../scripts/hpc/04_clustering/clustering.py | 68 +
 .../hpc/05_your_turn/now_its_your_turn.py | 44 +
 tutorials/scripts/hpc/README.md | 17 +
 tutorials/scripts/hpc/slurm_script_cpu.sh | 12 +
 tutorials/scripts/hpc/slurm_script_gpu.sh | 13 +
 234 files changed, 18199 insertions(+), 6019 deletions(-)
 delete mode 100644 .github/workflows/changelog-updater.yml
 create mode 100644 benchmarks/cb/decomposition.py
 create mode 100644 benchmarks/cb/heat_signal.py
 create mode 100644 doc/Makefile
 delete mode 100644 doc/requirements.txt
 rename doc/{ => source/_static}/images/GSoC-Horizontal.svg (100%)
 rename doc/{ => source/_static}/images/bsp.svg (100%)
 rename doc/{ => source/_static}/images/clustering.png (100%)
 rename doc/{ => source/_static}/images/clustering_kmeans.png (100%)
 rename doc/{ => source/_static}/images/data.png (100%)
 rename doc/{ => source/_static}/images/dlr_logo.svg (100%)
 rename doc/{ => source/_static}/images/fzj_logo.svg (100%)
 rename doc/{ => source/_static}/images/hSVD_bench_rank5.png (100%)
 rename doc/{ => source/_static}/images/hSVD_bench_rank50.png (100%)
 rename doc/{ => source/_static}/images/hSVD_bench_rank500.png (100%)
 rename doc/{ => source/_static}/images/heat_split_array.png (100%)
 rename doc/{ => source/_static}/images/heat_split_array.svg (100%)
 rename doc/{ => source/_static}/images/heatvsdask_strong_smalldata_without.png (100%)
 rename doc/{ => source/_static}/images/heatvsdask_weak_smalldata_without.png (100%)
 rename doc/{ => source/_static}/images/helmholtz_logo.svg (100%)
 create mode 100644 doc/source/_static/images/jsc_logo.png
 create mode 100644 doc/source/_static/images/jupyter.png
 rename doc/{ => source/_static}/images/kit_logo.svg (100%)
 create mode 100644 doc/source/_static/images/local_laptop.png
 rename doc/{ => source/_static}/images/logo.png (100%)
 rename doc/{ => source/_static}/images/logo_emblem.png (100%)
 rename doc/{ => source/_static}/images/logo_emblem.svg (100%)
 rename doc/{ => source/_static}/images/logo_white.png (100%)
 rename doc/{ => source/_static}/images/logo_white.svg (100%)
 create mode 100644 doc/source/_static/images/nhr_verein_logo.jpg
 create mode 100644 doc/source/_static/images/perun_logo.svg
 rename doc/{ => source/_static}/images/split_array.png (100%)
 rename doc/{ => source/_static}/images/split_array.svg (100%)
 rename doc/{ => source/_static}/images/tutorial_clustering.svg (100%)
 rename doc/{ => source/_static}/images/tutorial_dpnn.svg (100%)
 rename doc/{ => source/_static}/images/tutorial_logo.svg (100%)
 rename doc/{ => source/_static}/images/tutorial_split_dndarray.svg (100%)
 rename doc/{ => source/_static}/images/weak_scaling_gpu_terrabyte.png (100%)
 delete mode 100644 doc/source/tutorial_dpnn.rst
 create mode 100755 doc/source/tutorials/notebooks/0_setup/0_setup_conda.sh
 create mode 100644 doc/source/tutorials/notebooks/0_setup/0_setup_haicore.ipynb
 rename tutorials/hpc/1_intro.ipynb => doc/source/tutorials/notebooks/0_setup/0_setup_jsc.ipynb (78%)
 rename tutorials/local/1_intro.ipynb => doc/source/tutorials/notebooks/0_setup/0_setup_local.ipynb (79%)
 create mode 100755 doc/source/tutorials/notebooks/0_setup/0_setup_pip.sh
 create mode 100644 doc/source/tutorials/notebooks/1_basics.ipynb
 create mode 100644 doc/source/tutorials/notebooks/2_internals.ipynb
 create mode 100644 doc/source/tutorials/notebooks/3_loading_preprocessing.ipynb
 rename tutorials/local/5_matrix_factorizations.ipynb => doc/source/tutorials/notebooks/4_matrix_factorizations.ipynb (56%)
 create mode 100644 doc/source/tutorials/notebooks/5_clustering.ipynb
 create mode 100644 doc/source/tutorials/notebooks/6_profiling.ipynb
 rename doc/source/{ => tutorials}/tutorial_30_minutes.rst (100%)
 rename doc/source/{ => tutorials}/tutorial_clustering.rst (97%)
 create mode 100644 doc/source/tutorials/tutorial_notebook_gallery.rst
 rename doc/source/{ => tutorials}/tutorial_parallel_computation.rst (99%)
 rename doc/source/{ => tutorials}/tutorials.rst (68%)
 create mode 100644 heat/cli.py
 create mode 100644 heat/core/linalg/eigh.py
 create mode 100644 heat/core/linalg/polar.py
 create mode 100644 heat/core/linalg/tests/test_eigh.py
 create mode 100644 heat/core/linalg/tests/test_polar.py
 create mode 100644 heat/decomposition/dmd.py
 create mode 100644 heat/decomposition/tests/test_dmd.py
 create mode 100644 heat/py.typed
 create mode 100644 heat/tests/test_cli.py
 delete mode 100644 setup.cfg
 delete mode 100644 setup.py
 delete mode 120000 tutorials/hpc/2_basics.ipynb
 delete mode 120000 tutorials/hpc/3_internals.ipynb
 delete mode 120000 tutorials/hpc/4_loading_preprocessing.ipynb
 delete mode 120000 tutorials/hpc/5_matrix_factorizations.ipynb
 delete mode 120000 tutorials/hpc/6_clustering.ipynb
 delete mode 100644 tutorials/local/2_basics.ipynb
 delete mode 100644 tutorials/local/3_internals.ipynb
 delete mode 100644 tutorials/local/4_loading_preprocessing.ipynb
 delete mode 100644 tutorials/local/6_clustering.ipynb
 create mode 100644 tutorials/scripts/hpc/01_basics/01_basics_dndarrays.py
 create mode 100644 tutorials/scripts/hpc/01_basics/02_basics_datatypes.py
 create mode 100644 tutorials/scripts/hpc/01_basics/03_basics_operations.py
 create mode 100644 tutorials/scripts/hpc/01_basics/04_basics_indexing.py
 create mode 100644 tutorials/scripts/hpc/01_basics/05_basics_broadcast.py
 create mode 100644 tutorials/scripts/hpc/01_basics/06_basics_gpu.py
 create mode 100644 tutorials/scripts/hpc/01_basics/07_basics_distributed.py
 create mode 100644 tutorials/scripts/hpc/01_basics/08_basics_distributed_operations.py
 create mode 100644 tutorials/scripts/hpc/01_basics/09_basics_distributed_matmul.py
 create mode 100644 tutorials/scripts/hpc/01_basics/10_interoperability.py
 create mode 100644 tutorials/scripts/hpc/01_basics/11_internals_1.py
 create mode 100644 tutorials/scripts/hpc/01_basics/12_internals_2.py
 create mode 100644 tutorials/scripts/hpc/02_loading_preprocessing/01_IO.py
 create mode 100644 tutorials/scripts/hpc/02_loading_preprocessing/02_preprocessing.py
 create mode 100644 tutorials/scripts/hpc/02_loading_preprocessing/iris.csv
 create mode 100644 tutorials/scripts/hpc/03_matrix_factorizations/matrix_factorizations.py
 create mode 100644 tutorials/scripts/hpc/04_clustering/clustering.py
 create mode 100644 tutorials/scripts/hpc/05_your_turn/now_its_your_turn.py
 create mode 100644 tutorials/scripts/hpc/README.md
 create mode 100644 tutorials/scripts/hpc/slurm_script_cpu.sh
 create mode 100644 tutorials/scripts/hpc/slurm_script_gpu.sh

diff --git a/.github/ISSUE_TEMPLATE/bug_report.yml b/.github/ISSUE_TEMPLATE/bug_report.yml
index 5ef72f6bd1..b70c72b57e 100644
--- a/.github/ISSUE_TEMPLATE/bug_report.yml
+++ b/.github/ISSUE_TEMPLATE/bug_report.yml
@@ -35,6 +35,7 @@ body:
       options:
         - main (development branch)
         - 1.5.x
+        - other
     validations:
       required: true
   - type: dropdown
@@ -43,16 +44,18 @@ body:
       label: Python version
       description: What Python version?
       options:
+        - 3.13
         - 3.12
         - 3.11
-        - "3.10"
-        - 3.9
+        - '3.10'
   - type: dropdown
     id: pytorch-version
     attributes:
       label: PyTorch version
      description: What PyTorch version?
       options:
+        - 2.7
+        - 2.6
         - 2.5
         - 2.4
         - 2.3
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
index b7ac0c46da..9fec41ba22 100644
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -6,6 +6,7 @@
 - Implementation:
   - [ ] unit tests: all split configurations tested
   - [ ] unit tests: multiple dtypes tested
+  - [ ] **NEW** unit tests: MPS tested (1 MPI process, 1 GPU)
   - [ ] benchmarks: created for new functionality
   - [ ] benchmarks: performance improved or maintained
   - [ ] documentation updated where needed
diff --git a/.github/rd-release-config.yml b/.github/rd-release-config.yml
index 6f1d103d27..a45fa74a14 100644
--- a/.github/rd-release-config.yml
+++ b/.github/rd-release-config.yml
@@ -94,11 +94,12 @@ autolabeler:
   - label: 'docker'
     files:
       - 'docker/**/*'
-  - label: 'backport release'
+  - label: 'backport stable'
     title:
       - '/bug/i'
       - '/resolve/i'
       - '/fix/i'
+      - '/\[pre\-commit\.ci\]/i'
     branch:
       - '/bug/i'
       - '/fix/i'
@@ -113,9 +114,6 @@ autolabeler:
   - label: 'interoperability'
     title:
       - '/Support.+/'
-  - label: 'testing'
-    files:
-      - '**/tests/**/*'
   - label: 'classification'
     files:
       - 'heat/classification/**/*'
@@ -164,6 +162,84 @@ autolabeler:
   - label: 'linalg'
     files:
       - 'heat/core/linalg/**/*'
+  - label: 'arithmetics'
+    files:
+      - 'heat/core/arithmetics.py'
+  - label: 'base'
+    files:
+      - 'heat/core/base.py'
+  - label: 'communication'
+    files:
+      - 'heat/core/communication.py'
+  - label: 'complex_math'
+    files:
+      - 'heat/core/complex_math.py'
+  - label: 'constants'
+    files:
+      - 'heat/core/constants.py'
+  - label: 'devices'
+    files:
+      - 'heat/core/devices.py'
+  - label: 'dndarray'
+    files:
+      - 'heat/core/dndarray.py'
+  - label: 'exponential'
+    files:
+      - 'heat/core/exponential.py'
+  - label: 'indexing'
+    files:
+      - 'heat/core/indexing.py'
+  - label: 'io'
+    files:
+      - 'heat/core/io.py'
+  - label: 'logical'
+    files:
+      - 'heat/core/logical.py'
+  - label: 'manipulations'
+    files:
+      - 'heat/core/manipulations.py'
+  - label: 'memory'
+    files:
+      - 'heat/core/memory.py'
+  - label: 'printing'
+    files:
+      - 'heat/core/printing.py'
+  - label: 'random'
+ files: + - 'heat/core/random.py' + - label: 'relational' + files: + - 'heat/core/relational.py' + - label: 'rounding' + files: + - 'heat/core/rounding.py' + - label: 'sanitation' + files: + - 'heat/core/sanitation.py' + - label: 'signal' + files: + - 'heat/core/signal.py' + - label: 'statistics' + files: + - 'heat/core/statistics.py' + - label: 'stride_tricks' + files: + - 'heat/core/stride_tricks.py' + - label: 'tiling' + files: + - 'heat/core/tiling.py' + - label: 'trigonometrics' + files: + - 'heat/core/trigonometrics.py' + - label: 'types' + files: + - 'heat/core/types.py' + - label: 'version' + files: + - 'heat/core/version.py' + - label: 'vmap' + files: + - 'heat/core/vmap.py' change-template: '- #$NUMBER $TITLE (by @$AUTHOR)' category-template: '### $TITLE' diff --git a/.github/workflows/CIBase.yml b/.github/workflows/CIBase.yml index 7f97485236..1454e239e5 100644 --- a/.github/workflows/CIBase.yml +++ b/.github/workflows/CIBase.yml @@ -4,7 +4,7 @@ on: push: branches: - 'main' - - 'release/**' + - 'stable' permissions: contents: read @@ -14,13 +14,13 @@ jobs: runs-on: ubuntu-latest steps: - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit - name: Get branch names id: branch-names - uses: tj-actions/branch-names@v8 + uses: tj-actions/branch-names@5250492686b253f06fa55861556d1027b067aeb5 # v9.0.2 - name: 'start test' run: | curl -s -X POST \ diff --git a/.github/workflows/CISupport.yml b/.github/workflows/CISupport.yml index 7f06369842..4f65e0186f 100644 --- a/.github/workflows/CISupport.yml +++ b/.github/workflows/CISupport.yml @@ -9,9 +9,14 @@ jobs: starter: runs-on: ubuntu-latest steps: + - name: Harden the runner (Audit all outbound calls) + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 + with: + egress-policy: audit + - name: Get branch names id: branch-names - uses: tj-actions/branch-names@v8 + uses: tj-actions/branch-names@5250492686b253f06fa55861556d1027b067aeb5 # v9.0.2 - name: 'start test' run: | curl -s -X POST \ diff --git a/.github/workflows/CommentPR.yml b/.github/workflows/CommentPR.yml index b371663ca4..c156bc28ee 100644 --- a/.github/workflows/CommentPR.yml +++ b/.github/workflows/CommentPR.yml @@ -16,7 +16,7 @@ jobs: PR_NR: ${{ steps.step1.outputs.test }} steps: - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit @@ -65,11 +65,11 @@ jobs: runs-on: ubuntu-latest steps: - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit - - uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7 + - uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0 - name: 'Trigger Workflow' run: | diff --git a/.github/workflows/ReceivePR.yml b/.github/workflows/ReceivePR.yml index 6c26e28266..82b27c8989 100644 --- a/.github/workflows/ReceivePR.yml +++ b/.github/workflows/ReceivePR.yml @@ -13,16 +13,16 @@ jobs: steps: - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: 
egress-policy: audit - - uses: actions/checkout@v4.1.7 + - uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0 - name: Use Python - uses: actions/setup-python@f677139bbe7f9c59b41e40162b753c062f5d49a3 # v5.2.0 + uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0 with: - python-version: 3.9 + python-version: '3.10' architecture: x64 - name: Setup MPI @@ -42,7 +42,7 @@ jobs: run: | mkdir -p ./pr echo $PR_NUMBER > ./pr/pr_number - - uses: actions/upload-artifact@50769540e7f4bd5e21e526ee35c689e35e0d6874 # v4.4.0 + - uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2 with: name: pr_number path: pr/ diff --git a/.github/workflows/backport.yml b/.github/workflows/backport.yml index 253eb39ee7..ab09a22d03 100644 --- a/.github/workflows/backport.yml +++ b/.github/workflows/backport.yml @@ -12,8 +12,13 @@ jobs: # Don't run on closed unmerged pull requests if: github.event.pull_request.merged steps: - - uses: actions/checkout@v4.1.7 + - name: Harden the runner (Audit all outbound calls) + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 + with: + egress-policy: audit + + - uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0 - name: Create backport pull requests - uses: korthout/backport-action@v3 + uses: korthout/backport-action@ca4972adce8039ff995e618f5fc02d1b7961f27a # v3.3.0 with: merge_commits: 'skip' diff --git a/.github/workflows/bench_trigger.yml b/.github/workflows/bench_trigger.yml index 53fa95477f..cdee9d3af8 100644 --- a/.github/workflows/bench_trigger.yml +++ b/.github/workflows/bench_trigger.yml @@ -6,18 +6,21 @@ on: pull_request: types: [synchronize] +permissions: + contents: read + jobs: trigger-benchmark: name: Trigger Benchmarks runs-on: ubuntu-latest steps: - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit - name: Checkout - uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7 + uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0 - name: Trigger benchmarks (PR) id: setup_pr if: contains(github.event.pull_request.labels.*.name, 'benchmark PR') diff --git a/.github/workflows/changelog-updater.yml b/.github/workflows/changelog-updater.yml deleted file mode 100644 index 1739b9a876..0000000000 --- a/.github/workflows/changelog-updater.yml +++ /dev/null @@ -1,55 +0,0 @@ -name: 'Update Changelog' - -on: - release: - types: [released] - -permissions: - contents: read - -jobs: - update-changelog: - permissions: - contents: write # for stefanzweifel/git-auto-commit-action to push code in repo - runs-on: ubuntu-latest - steps: - - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 - with: - egress-policy: audit - - - name: Checkout code - uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7 - with: - repository: helmholtz-analytics/heat - ref: ${{ github.event.release.target_commitish }} - - name: Update Changelog - run: | - echo $RELEASE_TITLE > cl_title.md - echo "$RELEASE_BODY" > cl_new_body.md - echo "" > newline.txt - cat cl_title.md newline.txt cl_new_body.md newline.txt CHANGELOG.md > tmp - mv tmp CHANGELOG.md - rm cl_title.md - rm cl_new_body.md - rm newline.txt - cat CHANGELOG.md - env: - RELEASE_TITLE: ${{ format('# {0} - {1}', github.event.release.tag_name, 
github.event.release.name) }} - RELEASE_BODY: ${{ github.event.release.body }} - - name: Create PR - uses: peter-evans/create-pull-request@c5a7806660adbe173f04e3e038b0ccdcd758773c # v6.1.0 - with: - base: main - branch: post-release-changelog-update - delete-branch: true - token: ${{ secrets.GITHUB_TOKEN }} - commit-message: Update Changelog post release - title: Update Changelog post release - body: | - This PR updates the changelog post release. - - Changed files should include an updated CHANGELOG.md containing the release notes from the latest release. - - reviewers: ClaudiaComito, mtar, JuanPedroGHM - labels: chore, github_actions diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml index fe6ceb8d25..84cb3d7d80 100644 --- a/.github/workflows/ci.yaml +++ b/.github/workflows/ci.yaml @@ -12,44 +12,54 @@ jobs: fail-fast: false matrix: py-version: - - 3.9 - '3.10' - 3.11 - 3.12 + - 3.13 mpi: [ 'openmpi' ] - install-options: [ '.', '.[hdf5,netcdf,pandas]' ] + install-options: [ '.', '.[hdf5,netcdf,pandas,zarr]' ] pytorch-version: - - 'torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2' - - 'torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2' - - 'torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2' + - 'numpy==1.26 torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2' + - 'numpy==1.26 torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2' - 'torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1' - 'torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1' - 'torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1' - 'torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0' + - 'torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1' exclude: + - py-version: '3.13' + pytorch-version: 'numpy==1.26 torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2' + - py-version: '3.13' + pytorch-version: 'numpy==1.26 torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2' + - py-version: '3.13' + pytorch-version: 'torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1' + - py-version: '3.13' + pytorch-version: 'torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1' + - py-version: '3.13' + pytorch-version: 'torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1' - py-version: '3.12' - pytorch-version: 'torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2' + pytorch-version: 'numpy==1.26 torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2' - py-version: '3.12' - pytorch-version: 'torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2' - - py-version: '3.12' - pytorch-version: 'torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2' + pytorch-version: 'numpy==1.26 torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2' + - py-version: '3.10' + install-options: '.[hdf5,netcdf,pandas,zarr]' name: Python ${{ matrix.py-version }} with ${{ matrix.pytorch-version }}; options ${{ matrix.install-options }} steps: - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit - name: Checkout - uses: actions/checkout@v4.1.7 + uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0 - name: Setup MPI - uses: mpi4py/setup-mpi@v1.2.0 + uses: mpi4py/setup-mpi@3969f247e8fceef153418744f9d9ee6fdaeda29f # v1.2.0 with: mpi: ${{ matrix.mpi }} - name: Use Python ${{ matrix.py-version }} - uses: actions/setup-python@f677139bbe7f9c59b41e40162b753c062f5d49a3 # v5.2.0 + uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0 with: python-version: ${{ matrix.py-version }} architecture: 
x64 diff --git a/.github/workflows/codeql.yml b/.github/workflows/codeql.yml index 28bd3fbb72..9ad611e838 100644 --- a/.github/workflows/codeql.yml +++ b/.github/workflows/codeql.yml @@ -41,16 +41,16 @@ jobs: steps: - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit - name: Checkout repository - uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7 + uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0 # Initializes the CodeQL tools for scanning. - name: Initialize CodeQL - uses: github/codeql-action/init@4dd16135b69a43b6c8efb853346f8437d92d3c93 # v3.26.6 + uses: github/codeql-action/init@df559355d593797519d70b90fc8edd5db049e7a2 # v3.29.9 with: languages: ${{ matrix.language }} # If you wish to specify custom queries, you can do so here or in a config file. @@ -60,7 +60,7 @@ jobs: # Autobuild attempts to build any compiled languages (C/C++, C#, or Java). # If this step fails, then you should remove it and run the build manually (see below) - name: Autobuild - uses: github/codeql-action/autobuild@4dd16135b69a43b6c8efb853346f8437d92d3c93 # v3.26.6 + uses: github/codeql-action/autobuild@df559355d593797519d70b90fc8edd5db049e7a2 # v3.29.9 # ℹ️ Command-line programs to run using the OS shell. # 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun @@ -73,6 +73,6 @@ jobs: # ./location_of_script_within_repo/buildscript.sh - name: Perform CodeQL Analysis - uses: github/codeql-action/analyze@4dd16135b69a43b6c8efb853346f8437d92d3c93 # v3.26.6 + uses: github/codeql-action/analyze@df559355d593797519d70b90fc8edd5db049e7a2 # v3.29.9 with: category: "/language:${{matrix.language}}" diff --git a/.github/workflows/create-branch-on-assignment.yml b/.github/workflows/create-branch-on-assignment.yml index 3e5abc9add..87ee737027 100644 --- a/.github/workflows/create-branch-on-assignment.yml +++ b/.github/workflows/create-branch-on-assignment.yml @@ -11,11 +11,11 @@ jobs: runs-on: ubuntu-latest steps: - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit - name: Create Issue Branch - uses: robvanderleek/create-issue-branch@6bb28dd55d6790ee022ca0de60deca378e628ab3 # main + uses: robvanderleek/create-issue-branch@dfe19372d9a9198999c0fd8a81f0dbe00951afd9 # main env: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} diff --git a/.github/workflows/dependency-review.yml b/.github/workflows/dependency-review.yml index bf2dcfbae9..3eb2f162ad 100644 --- a/.github/workflows/dependency-review.yml +++ b/.github/workflows/dependency-review.yml @@ -17,11 +17,11 @@ jobs: runs-on: ubuntu-latest steps: - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit - name: 'Checkout Repository' - uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7 + uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0 - name: 'Dependency Review' - uses: actions/dependency-review-action@5a2ce3f5b92ee19cbb1541a4984c76d921601d7c # v4.3.4 + uses: actions/dependency-review-action@da24556b548a50705dd671f47852072ea4c105d9 # v4.7.1 diff 
--git a/.github/workflows/docker.yml b/.github/workflows/docker.yml index 8327f935d0..49922bf604 100644 --- a/.github/workflows/docker.yml +++ b/.github/workflows/docker.yml @@ -25,31 +25,31 @@ jobs: runs-on: ubuntu-latest steps: - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit - name: Checkout - uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7 + uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0 - name: Set up QEMU - uses: docker/setup-qemu-action@49b3bc8e6bdd4a60e6116a5414239cba5943d3cf # v3.2.0 + uses: docker/setup-qemu-action@29109295f81e9208d7d86ff1c6c12d2833863392 # v3.6.0 - name: Set up Docker Buildx - uses: docker/setup-buildx-action@988b5a0280414f521da01fcc63a27aeeb4b104db # v3.6.1 + uses: docker/setup-buildx-action@e468171a9de216ec08956ac3ada2f0791b6bd435 # v3.11.1 with: driver: docker - name: Login to GitHub Container Registry - uses: docker/login-action@9780b0c442fbb1117ed29e0efdff1e18412f7567 # v3.3.0 + uses: docker/login-action@184bdaa0721073962dff0199f1fb9940f07167d1 # v3.5.0 with: registry: ghcr.io username: ${{ github.repository_owner }} password: ${{ secrets.GITHUB_TOKEN }} - name: Build - uses: docker/build-push-action@5cd11c3a4ced054e52742c5fd54dca954e0edd85 # v6.7.0 + uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 # v6.18.0 with: file: docker/Dockerfile.release build-args: | @@ -65,7 +65,7 @@ jobs: docker run -v `pwd`:`pwd` -w `pwd` --rm test_${{ inputs.name }} pytest - name: Build and push - uses: docker/build-push-action@5cd11c3a4ced054e52742c5fd54dca954e0edd85 # v6.7.0 + uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 # v6.18.0 with: file: docker/Dockerfile.release build-args: | diff --git a/.github/workflows/inactivity.yml b/.github/workflows/inactivity.yml index ef96defa50..636bc7de18 100644 --- a/.github/workflows/inactivity.yml +++ b/.github/workflows/inactivity.yml @@ -14,11 +14,11 @@ jobs: pull-requests: write steps: - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit - - uses: actions/stale@28ca1036281a5e5922ead5184a1bbf96e5fc984e # v9.0.0 + - uses: actions/stale@5bef64f19d7facfb25b37b414482c7164d639639 # v9.1.0 with: days-before-issue-stale: 60 days-before-issue-close: 60 diff --git a/.github/workflows/increment_version.sh b/.github/workflows/increment_version.sh index cfc6481460..8c7b417170 100755 --- a/.github/workflows/increment_version.sh +++ b/.github/workflows/increment_version.sh @@ -2,6 +2,6 @@ version=$1 # assume version is passed as an argument IFS='.' 
read -r -a parts <<< "$version" # split by dots last_index=$(( ${#parts[@]} - 1 )) # get last index -parts[$last_index]=$(( ${parts[$last_index]} + 1 )) # increment last part +parts[last_index]=$(( parts[last_index] + 1 )) # increment last part new_version=$(IFS=.; echo "${parts[*]}") # join by dots -echo $new_version # print new version +echo "$new_version" # print new version diff --git a/.github/workflows/latest-pytorch-support.yml b/.github/workflows/latest-pytorch-support.yml index 0b2060d302..916ca2cf9c 100644 --- a/.github/workflows/latest-pytorch-support.yml +++ b/.github/workflows/latest-pytorch-support.yml @@ -18,11 +18,11 @@ jobs: runs-on: ubuntu-latest steps: - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit - - uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7 + - uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0 - uses: JasonEtco/create-an-issue@1b14a70e4d8dc185e5cc76d3bec9eab20257b2c5 # v2.9.2 env: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} @@ -34,23 +34,23 @@ jobs: update_existing: true search_existing: open - name: Check out new branch - uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7 + uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0 with: token: ${{ secrets.GITHUB_TOKEN }} ref: '${{ inputs.working_branch }}' - name: Set env variables run: | - echo "previous_pytorch=$(grep 'torch>=' setup.py | awk -F '<' '{print $2}' | tr -d '",')" >> $GITHUB_ENV + echo "previous_pytorch=$(grep 'torch~=' pyproject.toml | awk -F '<' '{print $2}' | tr -d '",')" >> $GITHUB_ENV echo "new_pytorch=$(<.github/pytorch-release-versions/pytorch-latest.txt)" >> $GITHUB_ENV - name: Increment version run: | chmod +x .github/workflows/increment_version.sh echo "setup_pytorch=$(.github/workflows/increment_version.sh ${{ env.new_pytorch }})" >> $GITHUB_ENV - - name: Update setup.py + - name: Update pyproject.toml run: | - sed -i '/torch>=/ s/'"${{ env.previous_pytorch }}"'/'"${{ env.setup_pytorch }}"'/g' setup.py + sed -i '/torch~=/ s/'"${{ env.previous_pytorch }}"'/'"${{ env.setup_pytorch }}"'/g' pyproject.toml - name: Create PR from branch - uses: peter-evans/create-pull-request@c5a7806660adbe173f04e3e038b0ccdcd758773c # v6.1.0 + uses: peter-evans/create-pull-request@271a8d0340265f705b14b6d32b9829c1cb33d45e # v7.0.8 with: base: ${{ inputs.base_branch }} branch: ${{ inputs.working_branch }} @@ -62,7 +62,7 @@ jobs: Run tests on latest PyTorch release Issue/s resolved: #${{ steps.create-issue.outputs.number }} TODO: - - [ ] update `.github/workflows/ci.yaml` to include `n-1` Pytorch version + - [ ] update `.github/workflows/ci.yaml` to include latest Pytorch version - [ ] update Nvidia and AMD Docker images on gitlab CI Auto-generated by [create-pull-request][1] [1]: https://github.com/peter-evans/create-pull-request diff --git a/.github/workflows/markdown-links-check.yml b/.github/workflows/markdown-links-check.yml index c09084ba65..db4266ff13 100644 --- a/.github/workflows/markdown-links-check.yml +++ b/.github/workflows/markdown-links-check.yml @@ -12,11 +12,11 @@ jobs: runs-on: ubuntu-latest steps: - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit - - uses: 
actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # master + - uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # master - uses: gaurav-nelson/github-action-markdown-link-check@5c5dfc0ac2e225883c0e5f03a85311ec2830d368 # v1 # checks all markdown files from root but ignores subfolders # By Removing the max-depth variable we can modify it -> to check all the .md files in the entire repo. diff --git a/.github/workflows/pytorch-latest-release.yml b/.github/workflows/pytorch-latest-release.yml index f39adc6183..2bd71c9d9c 100644 --- a/.github/workflows/pytorch-latest-release.yml +++ b/.github/workflows/pytorch-latest-release.yml @@ -14,11 +14,11 @@ jobs: if: ${{ github.repository }} == 'hemlholtz-analytics/heat' steps: - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit - - uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7 + - uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0 with: ref: '${{ env.base_branch }}' - name: Fetch PyTorch release version diff --git a/.github/workflows/release-drafter.yml b/.github/workflows/release-drafter.yml index f599bb28ea..0cd25717a3 100644 --- a/.github/workflows/release-drafter.yml +++ b/.github/workflows/release-drafter.yml @@ -1,8 +1,11 @@ name: Release Drafter on: - pull_request_target: - types: [opened, reopened, synchronize] + pull_request: + types: [opened, reopened] +permissions: + contents: read + jobs: update_release_draft: permissions: @@ -11,14 +14,14 @@ jobs: runs-on: ubuntu-latest steps: - name: Harden Runner - uses: step-security/harden-runner@0080882f6c36860b6ba35c610c98ce87d4e2f26f # v2.10.2 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit - - uses: release-drafter/release-drafter@v6 # v6.0.0 + - uses: release-drafter/release-drafter@b1476f6e6eb133afa41ed8589daba6dc69b4d3f5 # v6.1.0 env: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} with: - commitish: 'release' + commitish: 'main' name: ${{ github.ref_name }} - Draft release config-name: rd-release-config.yml diff --git a/.github/workflows/release-prep.yml b/.github/workflows/release-prep.yml index 1e28bae9e0..b1f50d83ae 100644 --- a/.github/workflows/release-prep.yml +++ b/.github/workflows/release-prep.yml @@ -11,6 +11,10 @@ on: description: "The base branch to create the release branch from" required: true default: "main" + title: + description: "Release title" + required: False + default: "Heat" permissions: contents: write @@ -22,13 +26,21 @@ jobs: runs-on: ubuntu-latest steps: - name: Harden Runner - uses: step-security/harden-runner@91182cccc01eb5e619899d80e4e971d6181294a7 # v2.10.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit - - uses: actions/checkout@v4 + - uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0 with: ref: ${{ github.event.inputs.base_branch }} + - uses: release-drafter/release-drafter@b1476f6e6eb133afa41ed8589daba6dc69b4d3f5 # v6.1.0 + id: release_drafter + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + with: + commitish: 'stable' + name: ${{ github.event.inputs.title }} + config-name: rd-release-config.yml - name: Bump version.py and create PR env: PR_BRANCH: pre-release/${{ inputs.release_version }} @@ -43,43 +55,97 @@ jobs: MINOR=$(echo $VERSION | cut -d. 
-f2) MICRO=$(echo $VERSION | cut -d. -f3) + ## ----- START Update Dockerfiles ------- + # Extract the current version from the Dockerfile + FILE_VERSION=$(grep -oP 'ARG PYTORCH_IMG=\K\d{2}\.\d{2}' docker/Dockerfile.release) + FILE_VERSION_SOURCE=$(grep -oP 'ARG PYTORCH_IMG=\K\d{2}\.\d{2}' docker/Dockerfile.source) + + # Construct the date for the new version + DATE=$(date '+%y.%m') + + # Separate year and month + YEAR=$(echo $DATE | cut -d'.' -f1) + MONTH=$(echo $DATE | cut -d'.' -f2) + + ## --- Handling of special cases --- + # Move to the previous year + if [ "$MONTH" == "01" ]; then + PREV_MONTH="12" + YEAR=$(($YEAR - 1)) + # 09 and 08 will be interpreted in Octal, so they have to be handled differently + elif [ "$MONTH" == "09" ]; then + PREV_MONTH="08" + elif [ "$MONTH" == "08" ]; then + PREV_MONTH="07" + else + PREV_MONTH=$(($MONTH - 1)) + # Ensure the previous month is 2 digits + PREV_MONTH=$(printf "%02d" $PREV_MONTH) + fi + + # Construct the new version + NEW_VERSION="${YEAR}.${PREV_MONTH}" + + sed -i "s/$FILE_VERSION/$NEW_VERSION/g" docker/Dockerfile.release + sed -i "s/$FILE_VERSION_SOURCE/$NEW_VERSION/g" docker/Dockerfile.source + + ## ----- END Workflow to update Dockerfile Images ------- + # Write onto the version.py file sed -i "s/major: int = \([0-9]\+\)/major: int = $MAJOR/g" heat/core/version.py sed -i "s/minor: int = \([0-9]\+\)/minor: int = $MINOR/g" heat/core/version.py sed -i "s/micro: int = \([0-9]\+\)/micro: int = $MICRO/g" heat/core/version.py sed -i "s/extension: str = .*/extension: str = None/g" heat/core/version.py + { echo -e "# v${MAJOR}.${MINOR}.${MICRO} - ${{github.event.inputs.title}}\n${{ steps.release_drafter.outputs.body}}\n"; cat CHANGELOG.md; } > tmp.md + mv tmp.md CHANGELOG.md + # Git configuration with anonymous user and email git config --global user.email "" git config --global user.name "Heat Release Bot" # Commit the changes - git add heat/core/version.py + git add heat/core/version.py CHANGELOG.md git commit -m "Bump version to $VERSION" + # Commit Dockerfile changes + git add docker/Dockerfile.release + git add docker/Dockerfile.source + git commit -m "Update pytorch image in Dockerfile.release and Dockerfile.source to version $NEW_VERSION" + # Push the changes git push --set-upstream origin pre-release/${VERSION} # Create PR for release gh pr create \ - --base release \ + --base stable \ --head ${{ env.PR_BRANCH }} \ --title "Heat ${{ env.VERSION }} - Release" \ --body "Pre-release branch for Heat ${{ env.VERSION }}. - Any release work should be done on this branch, and then merged into the release branch and main, following git-flow. + Any release work should be done on this branch, and then merged into \`stable\` and \`main\`, following git-flow guidelines. TODO: - [x] Update version.py - - [ ] update the Requirements section on README.md if needed - - [ ] Update CITATION.cff if needed - - [ ] Ensure the Changelog is up to date + - [ ] Ensure Citation file \`CITATION.cff\` is up to date. + - [ ] Ensure the Changelog is up to date. - [1]: https://github.com/peter-evans/create-pull-request" \ --label invalid + DO NOT DELETE BRANCH AFTER MERGING!" \ + --label "pre-release" # Create PR for main gh pr create --base main \ --head ${{ env.PR_BRANCH }} \ --title "Heat ${{ env.VERSION }} - Main" \ - --body "Copy of latest pre-release PR targeting release." \ - --label invalid + --draft \ + --body "Copy of latest pre-release PR targeting release. + DO NOT CHANGE ANYTHING UNTIL \`Heat ${{ env.VERSION }} - Release\` HAS BEEN MERGED.
+ + TODO: + - [ ] Make sure version.py is updated to reflect the dev version. + - [ ] Ensure Citation file is up to date. + - [ ] Ensure the Changelog is up to date. + - [ ] Test and merge conda-forge build (PR is usually created within a few hours of PyPI release) + - [ ] Update docker image and related documentation (see #1716) + - [ ] Update spack recipe + - [ ] Update easybuild recipe" \ + --label "post-release" diff --git a/.github/workflows/scorecard.yml b/.github/workflows/scorecard.yml index f24e301680..243858735a 100644 --- a/.github/workflows/scorecard.yml +++ b/.github/workflows/scorecard.yml @@ -32,17 +32,17 @@ jobs: steps: - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit - name: "Checkout code" - uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7 + uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0 with: persist-credentials: false - name: "Run analysis" - uses: ossf/scorecard-action@62b2cac7ed8198b15735ed49ab1e5cf35480ba46 # v2.4.0 + uses: ossf/scorecard-action@05b42c624433fc40578a4040d5cf5e36ddca8cde # v2.4.2 with: results_file: results.sarif results_format: sarif @@ -64,7 +64,7 @@ jobs: # Upload the results as artifacts (optional). Commenting out will disable uploads of run results in SARIF # format to the repository Actions tab. - name: "Upload artifact" - uses: actions/upload-artifact@50769540e7f4bd5e21e526ee35c689e35e0d6874 # v4.4.0 + uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2 with: name: SARIF file path: results.sarif @@ -72,6 +72,6 @@ jobs: # Upload the results to GitHub's code scanning dashboard.
- name: "Upload to code-scanning" - uses: github/codeql-action/upload-sarif@4dd16135b69a43b6c8efb853346f8437d92d3c93 # v3.26.6 + uses: github/codeql-action/upload-sarif@df559355d593797519d70b90fc8edd5db049e7a2 # v3.29.9 with: sarif_file: results.sarif diff --git a/.gitignore b/.gitignore index 6cf3342594..f59b5da737 100644 --- a/.gitignore +++ b/.gitignore @@ -308,3 +308,4 @@ heat/datasets/MNISTDataset perun_results/ bench_data/ my_dev_stuff/ +docs/source/autoapi diff --git a/.perun.ini b/.perun.ini index 0919670d6e..b594eac4df 100644 --- a/.perun.ini +++ b/.perun.ini @@ -1,12 +1,28 @@ +[post-processing] +power_overhead = 100 +pue = 1.05 +emissions_factor = 417.8 +price_factor = 0.3251 +price_unit = € + +[monitor] +sampling_period = 0.1 +include_backends = +include_sensors = +exclude_backends = +exclude_sensors = CPU_FREQ_\d + [output] +app_name +run_id format = bench data_out = ./bench_data [benchmarking] rounds = 10 warmup_rounds = 1 -metrics=runtime -region_metrics=runtime +metrics = runtime,energy +region_metrics = runtime,power [benchmarking.units] joule = k @@ -14,3 +30,6 @@ second = percent = watt = byte = G + +[debug] +log_lvl = WARNING diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index b45d931840..4b2b239cc2 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -1,30 +1,53 @@ # See https://pre-commit.com for more information # See https://pre-commit.com/hooks.html for more hooks +ci: + skip: + - "mypy" # Skip mypy in CI, as it is run manually repos: - repo: https://github.com/pre-commit/pre-commit-hooks - rev: v5.0.0 + rev: v6.0.0 hooks: - id: trailing-whitespace - id: end-of-file-fixer - id: check-yaml - id: check-added-large-files - id: check-toml - - repo: https://github.com/psf/black-pre-commit-mirror - rev: 25.1.0 + - repo: https://github.com/pre-commit/mirrors-mypy + rev: v1.17.1 # Use the sha / tag you want to point at hooks: - - id: black - - repo: https://github.com/PyCQA/flake8 - rev: 7.2.0 + - id: mypy + args: [--config-file, pyproject.toml, --ignore-missing-imports] + additional_dependencies: + - torch + - h5py + - zarr + pass_filenames: false + stages: [manual] + + - repo: https://github.com/astral-sh/ruff-pre-commit + # Ruff version. + rev: v0.12.9 hooks: - - id: flake8 - - repo: https://github.com/pycqa/pydocstyle - rev: 6.3.0 # pick a git hash / tag to point to - hooks: - - id: pydocstyle - exclude: "tutorials|tests|benchmarks|examples|scripts|setup.py" #|heat/utils/data/mnist.py|heat/utils/data/_utils.py ? + # Run the linter. + - id: ruff + args: [ --fix ] + # Run the formatter. + - id: ruff-format - repo: "https://github.com/citation-file-format/cffconvert" rev: "054bda51dbe278b3e86f27c890e3f3ac877d616c" hooks: - id: "validate-cff" args: - "--verbose" + - repo: https://github.com/gitleaks/gitleaks + rev: v8.28.0 + hooks: + - id: gitleaks + - repo: https://github.com/shellcheck-py/shellcheck-py + rev: v0.11.0.1 + hooks: + - id: shellcheck + #- repo: https://github.com/jumanjihouse/pre-commit-hooks + # rev: 3.0.0 + # hooks: + # - id: shellcheck diff --git a/.readthedocs.yaml b/.readthedocs.yaml index 49dafa0fa7..e09599c26e 100644 --- a/.readthedocs.yaml +++ b/.readthedocs.yaml @@ -10,6 +10,9 @@ build: os: ubuntu-22.04 tools: python: "3.11" + apt_packages: + - pandoc + - libopenmpi-dev # Build documentation in the docs/ directory with Sphinx sphinx: @@ -19,4 +22,7 @@ sphinx: # https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html python: install: - - requirements: doc/requirements.txt + - method: pip + path: . 
+ extra_requirements: + - docs diff --git a/CHANGELOG.md b/CHANGELOG.md index 4187c196b3..818a7dde09 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,3 +1,88 @@ +# v1.6.0 +## Highlights + +1) Linear algebra: Singular Value Decomposition, Symmetric Eigenvalue Decomposition and Polar Decomposition via the "Zolotarev approach" +2) MPI Layer: Support for communicating buffers larger than 2^31-1 +3) Dynamic Mode Decomposition (with and without control) +4) IO: Zarr format support +5) Signal Processing: Strided 1D convolution +6) Expanded QR decomposition +7) Apple MPS Support +8) Tutorial overhaul + +*SVD, PCA, and DMD have been implemented within the project ESAPCA funded by the European Space Agency (ESA). This support is gratefully acknowledged.* + +## Changes + +### Features +* Decomposition module and PCA interface by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1538 +* Distributed randomized SVD by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1561 +* Incremental SVD/PCA by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1629 +* Dynamic Mode Decomposition (DMD) by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1639 +* `heat.eq`, `heat.ne` now allow non-array operands by @Marc-Jindra in https://github.com/helmholtz-analytics/heat/pull/1773 +* Large data counts support for MPI Communication by @JuanPedroGHM in https://github.com/helmholtz-analytics/heat/pull/1765 +* Added `slice` argument for `load_hdf5` by @JuanPedroGHM in https://github.com/helmholtz-analytics/heat/pull/1753 +* Support Apple MPS acceleration by @ClaudiaComito in https://github.com/helmholtz-analytics/heat/pull/1129 +* QR decomposition for non tall-skinny matrices and `split=0` by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1744 +* Support for the `zarr` data format by @Berkant03 in https://github.com/helmholtz-analytics/heat/pull/1766 +* Polar decomposition by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1697 +* Dynamic Mode Decomposition with Control (DMDc) by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1794 +* Expand `np.funcs` to heat by @mtar in https://github.com/helmholtz-analytics/heat/pull/1888 +* Extends torch functions to DNDarrays by @mtar in https://github.com/helmholtz-analytics/heat/pull/1895 +* Symmetric Eigenvalue Decomposition (eigh) and full SVD (svd) based on Zolotarev Polar Decomposition by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1824 +* Stride argument for convolution by @lolacaro in https://github.com/helmholtz-analytics/heat/pull/1865 + +### Interoperability +* Support PyTorch 2.4.1 by @github-actions[bot] in https://github.com/helmholtz-analytics/heat/pull/1655 +* Support PyTorch 2.5.1 by @github-actions[bot] in https://github.com/helmholtz-analytics/heat/pull/1701 +* Support PyTorch 2.6.0 / Add zarr as optional dependency by @github-actions[bot] in https://github.com/helmholtz-analytics/heat/pull/1775 +* Make unit tests compatible with NumPy 2.x by @Marc-Jindra in https://github.com/helmholtz-analytics/heat/pull/1826 +* Support PyTorch 2.7.0 by @github-actions[bot] in https://github.com/helmholtz-analytics/heat/pull/1869 +* Support PyTorch 2.7.1 by @github-actions[bot] in https://github.com/helmholtz-analytics/heat/pull/1883 +* More generic check for CUDA-aware MPI by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1793 + + +### Fixes +* Raise Error for batched vector inputs on matmul by @FOsterfeld in https://github.com/helmholtz-analytics/heat/pull/1646 +*
Refactor `test_random` to minimize collective calls by @ClaudiaComito in https://github.com/helmholtz-analytics/heat/pull/1677 +* Printing non-distributed data by @ClaudiaComito in https://github.com/helmholtz-analytics/heat/pull/1756 +* Fixed precision loss in several functions when dtype is float64 by @neosunhan in https://github.com/helmholtz-analytics/heat/pull/993 +* Remove unnecessary contiguous calls by @Marc-Jindra in https://github.com/helmholtz-analytics/heat/pull/1831 +* Zarr tests fail on main by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1859 +* Decrease accuracy on `ht.vmap` tests on multi-node GPU runs by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1738 +* Bug-fixes during ESAPCA benchmarking by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1893 +* Exit installation if conda environment cannot be activated by @thawn in https://github.com/helmholtz-analytics/heat/pull/1880 +* Resolve bug in rSVD / wrong citation in polar.py by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1905 +* Fix IO test failures with Zarr v3.0.9 in save_zarr() by @LScheib in https://github.com/helmholtz-analytics/heat/pull/1921 + +### Build system +* Modernise setup.py configuration by @mtar in https://github.com/helmholtz-analytics/heat/pull/1731 +* Transition to pyproject.toml, Ruff, and mypy by @JuanPedroGHM in https://github.com/helmholtz-analytics/heat/pull/1832 +* Move to pyproject.toml in release action by @mtar in https://github.com/helmholtz-analytics/heat/pull/1950 +* Setuptools configuration in pyproject.toml by @JuanPedroGHM in https://github.com/helmholtz-analytics/heat/pull/1919 + +### Docs and Cx +* Documentation updates after new release by @ClaudiaComito in https://github.com/helmholtz-analytics/heat/pull/1704 +* Release drafter action handles multi-branch releases by @JuanPedroGHM in https://github.com/helmholtz-analytics/heat/pull/1660 +* Release drafter update and autolabeler by @JuanPedroGHM in https://github.com/helmholtz-analytics/heat/pull/1681 +* Update tutorials instructions for `ipcluster` initialization by @Marc-Jindra in https://github.com/helmholtz-analytics/heat/pull/1679 +* Added Dalcin et al 2018 reference to `manipulations._axis2axisResplit` by @ClaudiaComito in https://github.com/helmholtz-analytics/heat/pull/1695 +* Make it easier to get to GitHub from the docs by @joernhees in https://github.com/helmholtz-analytics/heat/pull/1741 +* Linters will no longer format tutorials by @ClaudiaComito in https://github.com/helmholtz-analytics/heat/pull/1748 +* Features/HPC-tutorial via python script by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1527 +* Add marker for providing type annotation by @mtar in https://github.com/helmholtz-analytics/heat/pull/1733 +* Expanded post-release checklist by @JuanPedroGHM in https://github.com/helmholtz-analytics/heat/pull/1821 +* Skip large-count communication tests on AMD runner by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1834 +* Update `test_dmd.py` by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1852 +* RTD Notebook gallery and profiling notebook with perun. 
by @JuanPedroGHM in https://github.com/helmholtz-analytics/heat/pull/1867 +* Features/1845 Update citations by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1846 +* Updated release_prep.yml to incorporate up-to-date Dockerfile Pytorch versions by @jolemse in https://github.com/helmholtz-analytics/heat/pull/1903 +* Update CODE_OF_CONDUCT.md by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1939 + + +#### Acknowledgement and Disclaimer +*This work is partially carried out under a [programme](https://activities.esa.int/index.php/4000144045) of, and funded by, the European Space Agency. Any view expressed in this repository or related publications can in no way be taken to reflect the official opinion of the European Space Agency.* + # v1.5.1 ## Changes diff --git a/CITATION.cff b/CITATION.cff index 78d184e4a1..b09e7f80a5 100644 --- a/CITATION.cff +++ b/CITATION.cff @@ -9,46 +9,44 @@ authors: # release highlights - family-names: Hoppe given-names: Fabian - - family-names: Osterfeld - given-names: Fynn - family-names: Gutiérrez Hermosillo Muriedas given-names: Juan Pedro - - family-names: Vaithinathan Aravindan - given-names: Ashwath + - family-names: Palazoglu + given-names: Berkant + - family-names: Fischer + given-names: Carola + - family-names: Akdag + given-names: Hakan - family-names: Comito given-names: Claudia +# active contributors in alphabetic order + - family-names: Hees + given-names: Jörn + - family-names: Jindra + given-names: Marc + - family-names: Korten + given-names: Till - family-names: Krajsek given-names: Kai - - family-names: Nguyen Xuan - given-names: Tu + - family-names: Lemmen + given-names: Jonas + - family-names: Scheib + given-names: Lukas - family-names: Tarnawa given-names: Michael - - family-names: Hees - given-names: Jörn -# core team - # - family-names: Comito - # given-names: Claudia +# historic core team - family-names: Coquelin given-names: Daniel - family-names: Debus given-names: Charlotte - - family-names: Götz - given-names: Markus -# - family-names: Gutiérrez Hermosillo Muriedas -# given-names: Juan Pedro - family-names: Hagemeier given-names: Björn - # - family-names: Hoppe - # given-names: Fabian - family-names: Knechtges given-names: Philipp - # - family-names: Krajsek - # given-names: Kai - family-names: Rüttgers given-names: Alexander - # - family-names: Tarnawa - # given-names: Michael -# release contributors - add as needed below + - family-names: Götz + given-names: Markus repository-code: 'https://github.com/helmholtz-analytics/heat' url: 'https://helmholtz-analytics.github.io/heat/' repository: 'https://heat.readthedocs.io/en/stable/' diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md index 8a4d8d0db2..1b50eebf8c 100644 --- a/CODE_OF_CONDUCT.md +++ b/CODE_OF_CONDUCT.md @@ -55,7 +55,7 @@ further defined and clarified by project maintainers. ## Enforcement Instances of abusive, harassing, or otherwise unacceptable behavior may be -reported by contacting the project team at . All +reported by contacting the project team at . All complaints will be reviewed and investigated and will result in a response that is deemed necessary and appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. diff --git a/README.md b/README.md index 4944df3ed5..0f6ca711d1 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,5 @@
- +
--- @@ -110,7 +110,7 @@ computational and memory needs of your laptop and desktop. ## Requirements ### Basics -- python >= 3.9 +- python >= 3.10 - MPI (OpenMPI, MPICH, Intel MPI, etc.) - mpi4py >= 3.0.0 - pytorch >= 2.0.0 @@ -184,35 +184,66 @@ Heat is distributed under the MIT license, see our -Please do mention Heat in your publications if it helped your research. You can cite: +If Heat contributed to a publication, please cite our main paper. -* Götz, M., Debus, C., Coquelin, D., Krajsek, K., Comito, C., Knechtges, P., Hagemeier, B., Tarnawa, M., Hanselmann, S., Siggel, S., Basermann, A. & Streit, A. (2020). HeAT - a Distributed and GPU-accelerated Tensor Framework for Data Analytics. In 2020 IEEE International Conference on Big Data (Big Data) (pp. 276-287). IEEE, DOI: 10.1109/BigData50022.2020.9378050. +**Preferred Citation:** -``` +Götz, M., Debus, C., Coquelin, D., et al. (2020). HeAT - a Distributed and GPU-accelerated Tensor Framework for Data Analytics. In *2020 IEEE International Conference on Big Data (Big Data)* (pp. 276-287). IEEE. DOI: 10.1109/BigData50022.2020.9378050. + +```bibtex @inproceedings{heat2020, title={{HeAT -- a Distributed and GPU-accelerated Tensor Framework for Data Analytics}}, - author={ - Markus Götz and - Charlotte Debus and - Daniel Coquelin and - Kai Krajsek and - Claudia Comito and - Philipp Knechtges and - Björn Hagemeier and - Michael Tarnawa and - Simon Hanselmann and - Martin Siggel and - Achim Basermann and - Achim Streit - }, + author={Markus Götz and Charlotte Debus and Daniel Coquelin and Kai Krajsek and Claudia Comito and Philipp Knechtges and Björn Hagemeier and Michael Tarnawa and Simon Hanselmann and Martin Siggel and Achim Basermann and Achim Streit}, booktitle={2020 IEEE International Conference on Big Data (Big Data)}, year={2020}, pages={276-287}, - month={December}, publisher={IEEE}, doi={10.1109/BigData50022.2020.9378050} } ``` + +### Other Relevant Publications + +**For the RSE perspective and latest benchmarks:** + +Hoppe, F., et al. (2025). *Engineering a large-scale data analytics and array computing library for research: Heat*. Electronic Communications of the EASST, 83. + +```bibtex +@article{heat2025rse, + title={Engineering a large-scale data analytics and array computing library for research: Heat}, + volume={83}, + url={https://eceasst.org/index.php/eceasst/article/view/2626}, + DOI={10.14279/eceasst.v83.2626}, + journal={Electronic Communications of the EASST}, + author={Hoppe, Fabian and Gutiérrez Hermosillo Muriedas, Juan Pedro and Tarnawa, Michael and Knechtges, Philipp and Hagemeier, Björn and Krajsek, Kai and Rüttgers, Alexander and Götz, Markus and Comito, Claudia}, + year={2025} +} +``` + +**For the neural networks module (DASO):** + +Coquelin, D., et al. (2022). *Accelerating neural network training with distributed asynchronous and selective optimization (DASO)*. J Big Data 9, 14. + +```bibtex +@Article{DASO2022, + author={Coquelin, Daniel and Debus, Charlotte and G{\"o}tz, Markus and von der Lehr, Fabrice and Kahn, James and Siggel, Martin and Streit, Achim}, + title={Accelerating neural network training with distributed asynchronous and selective optimization (DASO)}, + journal={Journal of Big Data}, + year={2022}, + volume={9}, + number={1}, + pages={14}, + doi={10.1186/s40537-021-00556-1} +} +``` + + +**For specific software versions:** +Please use the [Zenodo DOI](https://doi.org/10.5281/zenodo.2531472)
provided with each release. + + + + # FAQ Work in progress... @@ -235,4 +266,4 @@ Any view expressed in this repository or related publications can in no way be t ---
- + diff --git a/benchmarks/cb/decomposition.py b/benchmarks/cb/decomposition.py new file mode 100644 index 0000000000..44d9cf1c4a --- /dev/null +++ b/benchmarks/cb/decomposition.py @@ -0,0 +1,17 @@ +# flake8: noqa +import heat as ht +from mpi4py import MPI +from perun import monitor +from heat.decomposition import IncrementalPCA + + +@monitor() +def incremental_pca_split0(list_of_X, n_components): + ipca = IncrementalPCA(n_components=n_components) + for X in list_of_X: + ipca.partial_fit(X) + + +def run_decomposition_benchmarks(): + list_of_X = [ht.random.rand(50000, 500, split=0) for _ in range(10)] + incremental_pca_split0(list_of_X, 50) diff --git a/benchmarks/cb/heat_signal.py b/benchmarks/cb/heat_signal.py new file mode 100644 index 0000000000..9ecf26a443 --- /dev/null +++ b/benchmarks/cb/heat_signal.py @@ -0,0 +1,85 @@ +import heat as ht +from perun import monitor + + +@monitor() +def convolution_array_distributed(signal, kernel): + ht.convolve(signal, kernel, mode="full") + + +@monitor() +def convolution_kernel_distributed(signal, kernel): + ht.convolve(signal, kernel, mode="full") + + +@monitor() +def convolution_distributed(signal, kernel): + ht.convolve(signal, kernel, mode="full") + + +@monitor() +def convolution_batch_processing(signal, kernel): + ht.convolve(signal, kernel, mode="full") + + +@monitor() +def convolution_array_distributed_stride(signal, kernel, stride): + ht.convolve(signal, kernel, mode="full", stride=stride) + + +@monitor() +def convolution_kernel_distributed_stride(signal, kernel, stride): + ht.convolve(signal, kernel, mode="full", stride=stride) + + +@monitor() +def convolution_distributed_stride(signal, kernel, stride): + ht.convolve(signal, kernel, mode="full", stride=stride) + + +@monitor() +def convolution_batch_processing_stride(signal, kernel, stride): + ht.convolve(signal, kernel, mode="full", stride=stride) + + +def run_signal_benchmarks(): + n_s = 1000000000 + n_k = 10003 + stride = 3 + + # signal distributed + signal = ht.random.random((n_s,), split=0) + kernel = ht.random.random_integer(0, 1, (n_k,), split=None) + + convolution_array_distributed(signal, kernel) + convolution_array_distributed_stride(signal, kernel, stride) + + del signal, kernel + + # kernel distributed + signal = ht.random.random((n_s,), split=None) + kernel = ht.random.random_integer(0, 1, (n_k,), split=0) + + convolution_kernel_distributed(signal, kernel) + convolution_kernel_distributed_stride(signal, kernel, stride) + + del signal, kernel + + # signal and kernel distributed + signal = ht.random.random((n_s,), split=0) + kernel = ht.random.random_integer(0, 1, (n_k,), split=0) + + convolution_distributed(signal, kernel) + convolution_distributed_stride(signal, kernel, stride) + + del signal, kernel + + # batch processing + n_s = 90000 + n_b = 90000 + n_k = 503 + signal = ht.random.random((n_b, n_s), split=0) + kernel = ht.random.random_integer(0, 1, (n_b, n_k), split=0) + + convolution_batch_processing(signal, kernel) + convolution_batch_processing_stride(signal, kernel, stride) diff --git a/benchmarks/cb/linalg.py b/benchmarks/cb/linalg.py index 3596d4916f..a6526f6c7e 100644 --- a/benchmarks/cb/linalg.py +++ b/benchmarks/cb/linalg.py @@ -19,6 +19,11 @@ def qr_split_0(a): qr = ht.linalg.qr(a) +@monitor() +def qr_split_0_square(a): + qr = ht.linalg.qr(a) + + @monitor() def qr_split_1(a): qr = ht.linalg.qr(a) @@ -39,6 +44,51 @@ def lanczos(B): V, T = ht.lanczos(B, m=B.shape[0]) +@monitor() +def zolopd_split0(A): + U, H = ht.linalg.polar(A) + + +@monitor() +def zolopd_split1(A): 
+ U, H = ht.linalg.polar(A) + + +@monitor() +def eigh_split0(A): + H, Lambda = ht.linalg.eigh(A) + + +@monitor() +def eigh_split1(A): + H, Lambda = ht.linalg.eigh(A) + + +@monitor() +def svd_ts(a): + svd = ht.linalg.svd(a) + + +@monitor() +def svd_zolo_split0(a): + svd = ht.linalg.svd(a) + + +@monitor() +def svd_zolo_split1(a): + svd = ht.linalg.svd(a) + + +@monitor() +def randomized_svd_split0(a, r): + svd = ht.linalg.rsvd(a, r) + + +@monitor() +def randomized_svd_split1(a, r): + svd = ht.linalg.rsvd(a, r) + + def run_linalg_benchmarks(): n = 3000 a = ht.random.random((n, n), split=0) @@ -57,6 +107,11 @@ def run_linalg_benchmarks(): qr_split_0(a_0) del a_0 + n = 2000 + a_0 = ht.random.random((n, n), split=0) + qr_split_0_square(a_0) + del a_0 + n = 2000 a_1 = ht.random.random((n, n), split=1) qr_split_1(a_1) @@ -74,3 +129,46 @@ def run_linalg_benchmarks(): hierachical_svd_rank(data, 10) hierachical_svd_tol(data, 1e-2) del data + + n = 1000 + A = ht.random.random((n, n), split=0) + zolopd_split0(A) + del A + + A = ht.random.random((n, n), split=1) + zolopd_split1(A) + del A + + n = 1000 + A = ht.random.random((n, n), split=0) + A += A.T.resplit_(0) + eigh_split0(A) + del A + + A = ht.random.random((n, n), split=1) + A += A.T.resplit_(1) + eigh_split1(A) + del A + + n = int((4000000 // MPI.COMM_WORLD.size) ** 0.5) + m = MPI.COMM_WORLD.size * n + a_0 = ht.random.random((m, n), split=0) + svd_ts(a_0) + del a_0 + + n = 1000 + A = ht.random.random((n, n), split=0) + svd_zolo_split0(A) + del A + + A = ht.random.random((n, n), split=1) + svd_zolo_split1(A) + del A + + A = ht.random.random((500 * MPI.COMM_WORLD.Get_size(), 1000), split=0) + randomized_svd_split0(A, 10) + del A + + A = ht.random.random((1000, 500 * MPI.COMM_WORLD.Get_size()), split=1) + randomized_svd_split1(A, 10) + del A diff --git a/benchmarks/cb/main.py b/benchmarks/cb/main.py index 52cd18d76f..2dd4680ae0 100644 --- a/benchmarks/cb/main.py +++ b/benchmarks/cb/main.py @@ -6,12 +6,20 @@ ht.use_device(os.environ["HEAT_DEVICE"] if os.environ["HEAT_DEVICE"] else "cpu") ht.random.seed(12345) +world_size = ht.MPI_WORLD.size +rank = ht.MPI_WORLD.rank +print(f"{rank}/{world_size}: Working on {ht.get_device()}") + from linalg import run_linalg_benchmarks from cluster import run_cluster_benchmarks from manipulations import run_manipulation_benchmarks from preprocessing import run_preprocessing_benchmarks +from decomposition import run_decomposition_benchmarks +from heat_signal import run_signal_benchmarks run_linalg_benchmarks() run_cluster_benchmarks() run_manipulation_benchmarks() run_preprocessing_benchmarks() +run_decomposition_benchmarks() +run_signal_benchmarks() diff --git a/coverage_tables.md b/coverage_tables.md index f90dadfba4..c9286a8834 100644 --- a/coverage_tables.md +++ b/coverage_tables.md @@ -13,395 +13,436 @@ The following tables show the NumPy functions supported by Heat. 8. [NumPy Sorting Operations](#numpy-sorting-operations) 9. [NumPy Statistical Operations](#numpy-statistical-operations) 10. [NumPy Random Operations](#numpy-random-operations) +11. [NumPy FFT Operations](#numpy-fft-operations) +12. 
[NumPy Masked Array Operations](#numpy-masked-array-operations) ## NumPy Mathematical Functions [Back to Table of Contents](#table-of-contents) -| NumPy Mathematical Functions | Heat | -|---|---| -| sin | ✅ | -| cos | ✅ | -| tan | ✅ | -| arcsin | ✅ | -| arccos | ✅ | -| arctan | ✅ | -| hypot | ✅ | -| arctan2 | ✅ | -| degrees | ✅ | -| radians | ✅ | -| unwrap | ❌ | -| deg2rad | ✅ | -| rad2deg | ✅ | -| sinh | ✅ | -| cosh | ✅ | -| tanh | ✅ | -| arcsinh | ✅ | -| arccosh | ✅ | -| arctanh | ✅ | -| round | ✅ | -| around | ❌ | -| rint | ❌ | -| fix | ❌ | -| floor | ✅ | -| ceil | ✅ | -| trunc | ✅ | -| prod | ✅ | -| sum | ✅ | -| nanprod | ✅ | -| nansum | ✅ | -| cumprod | ✅ | -| cumsum | ✅ | -| nancumprod | ❌ | -| nancumsum | ❌ | -| diff | ✅ | -| ediff1d | ❌ | -| gradient | ❌ | -| cross | ✅ | -| trapz | ❌ | -| exp | ✅ | -| expm1 | ✅ | -| exp2 | ✅ | -| log | ✅ | -| log10 | ✅ | -| log2 | ✅ | -| log1p | ✅ | -| logaddexp | ✅ | -| logaddexp2 | ✅ | -| i0 | ❌ | -| sinc | ❌ | -| signbit | ✅ | -| copysign | ✅ | -| frexp | ❌ | -| ldexp | ❌ | -| nextafter | ❌ | -| spacing | ❌ | -| lcm | ✅ | -| gcd | ✅ | -| add | ✅ | -| reciprocal | ❌ | -| positive | ✅ | -| negative | ✅ | -| multiply | ✅ | -| divide | ✅ | -| power | ✅ | -| subtract | ✅ | -| true_divide | ❌ | -| floor_divide | ✅ | -| float_power | ❌ | -| fmod | ✅ | -| mod | ✅ | -| modf | ✅ | -| remainder | ✅ | -| divmod | ❌ | -| angle | ✅ | -| real | ✅ | -| imag | ✅ | -| conj | ✅ | -| conjugate | ✅ | -| maximum | ✅ | -| max | ✅ | -| amax | ❌ | -| fmax | ❌ | -| nanmax | ❌ | -| minimum | ✅ | -| min | ✅ | -| amin | ❌ | -| fmin | ❌ | -| nanmin | ❌ | -| convolve | ✅ | -| clip | ✅ | -| sqrt | ✅ | -| cbrt | ❌ | -| square | ✅ | -| absolute | ✅ | -| fabs | ✅ | -| sign | ✅ | -| heaviside | ❌ | -| nan_to_num | ✅ | -| real_if_close | ❌ | -| interp | ❌ | +| NumPy Mathematical Functions | Heat | Issues | +|---|---|---| +| sin | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+sin) | +| cos | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+cos) | +| tan | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+tan) | +| arcsin | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+arcsin) | +| arccos | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+arccos) | +| arctan | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+arctan) | +| hypot | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+hypot) | +| arctan2 | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+arctan2) | +| degrees | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+degrees) | +| radians | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+radians) | +| unwrap | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+unwrap) | +| deg2rad | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+deg2rad) | +| rad2deg | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+rad2deg) | +| sinh | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+sinh) | +| cosh | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+cosh) | +| tanh | ✅ | 
[Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+tanh) | +| arcsinh | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+arcsinh) | +| arccosh | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+arccosh) | +| arctanh | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+arctanh) | +| round | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+round) | +| around | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+around) | +| rint | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+rint) | +| fix | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fix) | +| floor | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+floor) | +| ceil | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ceil) | +| trunc | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+trunc) | +| prod | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+prod) | +| sum | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+sum) | +| nanprod | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nanprod) | +| nansum | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nansum) | +| cumprod | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+cumprod) | +| cumsum | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+cumsum) | +| nancumprod | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nancumprod) | +| nancumsum | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nancumsum) | +| diff | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+diff) | +| ediff1d | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ediff1d) | +| gradient | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+gradient) | +| cross | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+cross) | +| trapz | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+trapz) | +| exp | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+exp) | +| expm1 | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+expm1) | +| exp2 | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+exp2) | +| log | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+log) | +| log10 | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+log10) | +| log2 | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+log2) | +| log1p | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+log1p) | +| logaddexp | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+logaddexp) | +| logaddexp2 | ✅ | 
[Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+logaddexp2) | +| i0 | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+i0) | +| sinc | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+sinc) | +| signbit | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+signbit) | +| copysign | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+copysign) | +| frexp | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+frexp) | +| ldexp | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ldexp) | +| nextafter | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nextafter) | +| spacing | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+spacing) | +| lcm | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+lcm) | +| gcd | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+gcd) | +| add | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+add) | +| reciprocal | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+reciprocal) | +| positive | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+positive) | +| negative | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+negative) | +| multiply | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+multiply) | +| divide | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+divide) | +| power | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+power) | +| subtract | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+subtract) | +| true_divide | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+true_divide) | +| floor_divide | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+floor_divide) | +| float_power | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+float_power) | +| fmod | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fmod) | +| mod | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+mod) | +| modf | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+modf) | +| remainder | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+remainder) | +| divmod | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+divmod) | +| angle | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+angle) | +| real | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+real) | +| imag | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+imag) | +| conj | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+conj) | +| conjugate | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+conjugate) | +| maximum | ✅ | 
[Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+maximum) | +| max | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+max) | +| amax | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+amax) | +| fmax | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fmax) | +| nanmax | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nanmax) | +| minimum | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+minimum) | +| min | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+min) | +| amin | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+amin) | +| fmin | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fmin) | +| nanmin | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nanmin) | +| convolve | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+convolve) | +| clip | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+clip) | +| sqrt | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+sqrt) | +| cbrt | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+cbrt) | +| square | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+square) | +| absolute | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+absolute) | +| fabs | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fabs) | +| sign | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+sign) | +| heaviside | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+heaviside) | +| nan_to_num | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nan_to_num) | +| real_if_close | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+real_if_close) | +| interp | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+interp) | ## NumPy Array Creation [Back to Table of Contents](#table-of-contents) -| NumPy Array Creation | Heat | -|---|---| -| empty | ✅ | -| empty_like | ✅ | -| eye | ✅ | -| identity | ❌ | -| ones | ✅ | -| ones_like | ✅ | -| zeros | ✅ | -| zeros_like | ✅ | -| full | ✅ | -| full_like | ✅ | -| array | ✅ | -| asarray | ✅ | -| asanyarray | ❌ | -| ascontiguousarray | ❌ | -| asmatrix | ❌ | -| copy | ✅ | -| frombuffer | ❌ | -| from_dlpack | ❌ | -| fromfile | ❌ | -| fromfunction | ❌ | -| fromiter | ❌ | -| fromstring | ❌ | -| loadtxt | ❌ | -| arange | ✅ | -| linspace | ✅ | -| logspace | ✅ | -| geomspace | ❌ | -| meshgrid | ✅ | -| mgrid | ❌ | -| ogrid | ❌ | -| diag | ✅ | -| diagflat | ❌ | -| tri | ❌ | -| tril | ✅ | -| triu | ✅ | -| vander | ❌ | -| mat | ❌ | -| bmat | ❌ | +| NumPy Array Creation | Heat | Issues | +|---|---|---| +| empty | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+empty) | +| empty_like | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+empty_like) | +| eye | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+eye) | +| identity | ❌ | 
[Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+identity) | +| ones | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ones) | +| ones_like | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ones_like) | +| zeros | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+zeros) | +| zeros_like | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+zeros_like) | +| full | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+full) | +| full_like | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+full_like) | +| array | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+array) | +| asarray | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+asarray) | +| asanyarray | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+asanyarray) | +| ascontiguousarray | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ascontiguousarray) | +| asmatrix | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+asmatrix) | +| copy | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+copy) | +| frombuffer | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+frombuffer) | +| from_dlpack | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+from_dlpack) | +| fromfile | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fromfile) | +| fromfunction | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fromfunction) | +| fromiter | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fromiter) | +| fromstring | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fromstring) | +| loadtxt | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+loadtxt) | +| arange | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+arange) | +| linspace | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linspace) | +| logspace | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+logspace) | +| geomspace | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+geomspace) | +| meshgrid | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+meshgrid) | +| mgrid | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+mgrid) | +| ogrid | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ogrid) | +| diag | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+diag) | +| diagflat | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+diagflat) | +| tri | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+tri) | +| tril | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+tril) | +| triu | ✅ | 
[Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+triu) | +| vander | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+vander) | +| mat | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+mat) | +| bmat | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+bmat) | ## NumPy Array Manipulation [Back to Table of Contents](#table-of-contents) -| NumPy Array Manipulation | Heat | -|---|---| -| copyto | ❌ | -| shape | ✅ | -| reshape | ✅ | -| ravel | ✅ | -| flat | ❌ | -| flatten | ✅ | -| moveaxis | ✅ | -| rollaxis | ❌ | -| swapaxes | ✅ | -| T | ❌ | -| transpose | ✅ | -| atleast_1d | ❌ | -| atleast_2d | ❌ | -| atleast_3d | ❌ | -| broadcast | ❌ | -| broadcast_to | ✅ | -| broadcast_arrays | ✅ | -| expand_dims | ✅ | -| squeeze | ✅ | -| asarray | ✅ | -| asanyarray | ❌ | -| asmatrix | ❌ | -| asfarray | ❌ | -| asfortranarray | ❌ | -| ascontiguousarray | ❌ | -| asarray_chkfinite | ❌ | -| require | ❌ | -| concatenate | ✅ | -| stack | ✅ | -| block | ❌ | -| vstack | ✅ | -| hstack | ✅ | -| dstack | ❌ | -| column_stack | ✅ | -| row_stack | ✅ | -| split | ✅ | -| array_split | ❌ | -| dsplit | ✅ | -| hsplit | ✅ | -| vsplit | ✅ | -| tile | ✅ | -| repeat | ✅ | -| delete | ❌ | -| insert | ❌ | -| append | ❌ | -| resize | ❌ | -| trim_zeros | ❌ | -| unique | ✅ | -| flip | ✅ | -| fliplr | ✅ | -| flipud | ✅ | -| reshape | ✅ | -| roll | ✅ | -| rot90 | ✅ | +| NumPy Array Manipulation | Heat | Issues | +|---|---|---| +| copyto | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+copyto) | +| shape | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+shape) | +| reshape | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+reshape) | +| ravel | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ravel) | +| flat | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+flat) | +| flatten | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+flatten) | +| moveaxis | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+moveaxis) | +| rollaxis | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+rollaxis) | +| swapaxes | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+swapaxes) | +| T | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+T) | +| transpose | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+transpose) | +| atleast_1d | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+atleast_1d) | +| atleast_2d | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+atleast_2d) | +| atleast_3d | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+atleast_3d) | +| broadcast | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+broadcast) | +| broadcast_to | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+broadcast_to) | +| broadcast_arrays | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+broadcast_arrays) | +| expand_dims | ✅ | 
[Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+expand_dims) | +| squeeze | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+squeeze) | +| asarray | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+asarray) | +| asanyarray | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+asanyarray) | +| asmatrix | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+asmatrix) | +| asfarray | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+asfarray) | +| asfortranarray | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+asfortranarray) | +| ascontiguousarray | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ascontiguousarray) | +| asarray_chkfinite | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+asarray_chkfinite) | +| require | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+require) | +| concatenate | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+concatenate) | +| stack | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+stack) | +| block | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+block) | +| vstack | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+vstack) | +| hstack | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+hstack) | +| dstack | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+dstack) | +| column_stack | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+column_stack) | +| row_stack | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+row_stack) | +| split | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+split) | +| array_split | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+array_split) | +| dsplit | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+dsplit) | +| hsplit | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+hsplit) | +| vsplit | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+vsplit) | +| tile | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+tile) | +| repeat | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+repeat) | +| delete | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+delete) | +| insert | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+insert) | +| append | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+append) | +| resize | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+resize) | +| trim_zeros | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+trim_zeros) | +| unique | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+unique) | +| flip | ✅ | 
[Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+flip) | +| fliplr | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fliplr) | +| flipud | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+flipud) | +| reshape | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+reshape) | +| roll | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+roll) | +| rot90 | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+rot90) | ## NumPy Binary Operations [Back to Table of Contents](#table-of-contents) -| NumPy Binary Operations | Heat | -|---|---| -| bitwise_and | ✅ | -| bitwise_or | ✅ | -| bitwise_xor | ✅ | -| invert | ✅ | -| left_shift | ✅ | -| right_shift | ✅ | -| packbits | ❌ | -| unpackbits | ❌ | -| binary_repr | ❌ | +| NumPy Binary Operations | Heat | Issues | +|---|---|---| +| bitwise_and | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+bitwise_and) | +| bitwise_or | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+bitwise_or) | +| bitwise_xor | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+bitwise_xor) | +| invert | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+invert) | +| left_shift | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+left_shift) | +| right_shift | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+right_shift) | +| packbits | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+packbits) | +| unpackbits | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+unpackbits) | +| binary_repr | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+binary_repr) | ## NumPy IO Operations [Back to Table of Contents](#table-of-contents) -| NumPy IO Operations | Heat | -|---|---| -| load | ✅ | -| save | ✅ | -| savez | ❌ | -| savez_compressed | ❌ | -| loadtxt | ❌ | -| savetxt | ❌ | -| genfromtxt | ❌ | -| fromregex | ❌ | -| fromstring | ❌ | -| tofile | ❌ | -| tolist | ❌ | -| array2string | ❌ | -| array_repr | ❌ | -| array_str | ❌ | -| format_float_positional | ❌ | -| format_float_scientific | ❌ | -| memmap | ❌ | -| open_memmap | ❌ | -| set_printoptions | ✅ | -| get_printoptions | ✅ | -| set_string_function | ❌ | -| printoptions | ❌ | -| binary_repr | ❌ | -| base_repr | ❌ | -| DataSource | ❌ | -| format | ❌ | +| NumPy IO Operations | Heat | Issues | +|---|---|---| +| load | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+load) | +| save | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+save) | +| savez | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+savez) | +| savez_compressed | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+savez_compressed) | +| loadtxt | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+loadtxt) | +| savetxt | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+savetxt) | +| genfromtxt | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+genfromtxt) | +| fromregex | ❌ | 
[Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fromregex) | +| fromstring | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fromstring) | +| tofile | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+tofile) | +| tolist | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+tolist) | +| array2string | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+array2string) | +| array_repr | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+array_repr) | +| array_str | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+array_str) | +| format_float_positional | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+format_float_positional) | +| format_float_scientific | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+format_float_scientific) | +| memmap | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+memmap) | +| open_memmap | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+open_memmap) | +| set_printoptions | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+set_printoptions) | +| get_printoptions | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+get_printoptions) | +| set_string_function | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+set_string_function) | +| printoptions | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+printoptions) | +| binary_repr | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+binary_repr) | +| base_repr | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+base_repr) | +| DataSource | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+DataSource) | +| format | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+format) | ## NumPy LinAlg Operations [Back to Table of Contents](#table-of-contents) -| NumPy LinAlg Operations | Heat | -|---|---| -| dot | ✅ | -| linalg.multi_dot | ❌ | -| vdot | ✅ | -| inner | ❌ | -| outer | ✅ | -| matmul | ✅ | -| tensordot | ❌ | -| einsum | ❌ | -| einsum_path | ❌ | -| linalg.matrix_power | ❌ | -| kron | ❌ | -| linalg.cholesky | ❌ | -| linalg.qr | ✅ | -| linalg.svd | ❌ | -| linalg.eig | ❌ | -| linalg.eigh | ❌ | -| linalg.eigvals | ❌ | -| linalg.eigvalsh | ❌ | -| linalg.norm | ✅ | -| linalg.cond | ❌ | -| linalg.det | ✅ | -| linalg.matrix_rank | ❌ | -| linalg.slogdet | ❌ | -| trace | ✅ | -| linalg.solve | ❌ | -| linalg.tensorsolve | ❌ | -| linalg.lstsq | ❌ | -| linalg.inv | ✅ | -| linalg.pinv | ❌ | -| linalg.tensorinv | ❌ | +| NumPy LinAlg Operations | Heat | Issues | +|---|---|---| +| dot | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+dot) | +| linalg.multi_dot | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.multi_dot) | +| vdot | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+vdot) | +| inner | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+inner) | +| outer | ✅ | 
[Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+outer) | +| matmul | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+matmul) | +| tensordot | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+tensordot) | +| einsum | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+einsum) | +| einsum_path | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+einsum_path) | +| linalg.matrix_power | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.matrix_power) | +| kron | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+kron) | +| linalg.cholesky | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.cholesky) | +| linalg.qr | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.qr) | +| linalg.svd | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.svd) | +| linalg.eig | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.eig) | +| linalg.eigh | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.eigh) | +| linalg.eigvals | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.eigvals) | +| linalg.eigvalsh | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.eigvalsh) | +| linalg.norm | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.norm) | +| linalg.cond | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.cond) | +| linalg.det | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.det) | +| linalg.matrix_rank | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.matrix_rank) | +| linalg.slogdet | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.slogdet) | +| trace | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+trace) | +| linalg.solve | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.solve) | +| linalg.tensorsolve | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.tensorsolve) | +| linalg.lstsq | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.lstsq) | +| linalg.inv | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.inv) | +| linalg.pinv | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.pinv) | +| linalg.tensorinv | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.tensorinv) | ## NumPy Logic Functions [Back to Table of Contents](#table-of-contents) -| NumPy Logic Functions | Heat | -|---|---| -| all | ✅ | -| any | ✅ | -| isfinite | ✅ | -| isinf | ✅ | -| isnan | ✅ | -| isnat | ❌ | -| isneginf | ✅ | -| isposinf | ✅ | -| iscomplex | ✅ | -| iscomplexobj | ❌ | -| isfortran | ❌ | -| isreal | ✅ | -| isrealobj | ❌ | -| isscalar | ❌ | -| logical_and | ✅ | -| logical_or | ✅ | -| logical_not | ✅ | -| logical_xor | ✅ | -| allclose | ✅ | -| 
isclose | ✅ | -| array_equal | ❌ | -| array_equiv | ❌ | -| greater | ✅ | -| greater_equal | ✅ | -| less | ✅ | -| less_equal | ✅ | -| equal | ✅ | -| not_equal | ✅ | +| NumPy Logic Functions | Heat | Issues | +|---|---|---| +| all | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+all) | +| any | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+any) | +| isfinite | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+isfinite) | +| isinf | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+isinf) | +| isnan | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+isnan) | +| isnat | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+isnat) | +| isneginf | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+isneginf) | +| isposinf | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+isposinf) | +| iscomplex | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+iscomplex) | +| iscomplexobj | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+iscomplexobj) | +| isfortran | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+isfortran) | +| isreal | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+isreal) | +| isrealobj | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+isrealobj) | +| isscalar | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+isscalar) | +| logical_and | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+logical_and) | +| logical_or | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+logical_or) | +| logical_not | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+logical_not) | +| logical_xor | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+logical_xor) | +| allclose | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+allclose) | +| isclose | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+isclose) | +| array_equal | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+array_equal) | +| array_equiv | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+array_equiv) | +| greater | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+greater) | +| greater_equal | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+greater_equal) | +| less | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+less) | +| less_equal | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+less_equal) | +| equal | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+equal) | +| not_equal | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+not_equal) | ## NumPy Sorting Operations [Back to Table of Contents](#table-of-contents) -| NumPy Sorting Operations | Heat | -|---|---| -| sort | ✅ | -| lexsort | ❌ | -| 
argsort | ❌ | -| sort | ✅ | -| sort_complex | ❌ | -| partition | ❌ | -| argpartition | ❌ | -| argmax | ✅ | -| nanargmax | ❌ | -| argmin | ✅ | -| nanargmin | ❌ | -| argwhere | ❌ | -| nonzero | ✅ | -| flatnonzero | ❌ | -| where | ✅ | -| searchsorted | ❌ | -| extract | ❌ | -| count_nonzero | ❌ | +| NumPy Sorting Operations | Heat | Issues | +|---|---|---| +| sort | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+sort) | +| lexsort | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+lexsort) | +| argsort | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+argsort) | +| sort | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+sort) | +| sort_complex | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+sort_complex) | +| partition | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+partition) | +| argpartition | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+argpartition) | +| argmax | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+argmax) | +| nanargmax | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nanargmax) | +| argmin | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+argmin) | +| nanargmin | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nanargmin) | +| argwhere | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+argwhere) | +| nonzero | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nonzero) | +| flatnonzero | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+flatnonzero) | +| where | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+where) | +| searchsorted | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+searchsorted) | +| extract | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+extract) | +| count_nonzero | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+count_nonzero) | ## NumPy Statistical Operations [Back to Table of Contents](#table-of-contents) -| NumPy Statistical Operations | Heat | -|---|---| -| ptp | ❌ | -| percentile | ✅ | -| nanpercentile | ❌ | -| quantile | ❌ | -| nanquantile | ❌ | -| median | ✅ | -| average | ✅ | -| mean | ✅ | -| std | ✅ | -| var | ✅ | -| nanmedian | ❌ | -| nanmean | ❌ | -| nanstd | ❌ | -| nanvar | ❌ | -| corrcoef | ❌ | -| correlate | ❌ | -| cov | ✅ | -| histogram | ✅ | -| histogram2d | ❌ | -| histogramdd | ❌ | -| bincount | ✅ | -| histogram_bin_edges | ❌ | -| digitize | ✅ | +| NumPy Statistical Operations | Heat | Issues | +|---|---|---| +| ptp | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ptp) | +| percentile | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+percentile) | +| nanpercentile | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nanpercentile) | +| quantile | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+quantile) | +| nanquantile | ❌ | 
[Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nanquantile) | +| median | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+median) | +| average | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+average) | +| mean | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+mean) | +| std | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+std) | +| var | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+var) | +| nanmedian | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nanmedian) | +| nanmean | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nanmean) | +| nanstd | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nanstd) | +| nanvar | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nanvar) | +| corrcoef | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+corrcoef) | +| correlate | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+correlate) | +| cov | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+cov) | +| histogram | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+histogram) | +| histogram2d | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+histogram2d) | +| histogramdd | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+histogramdd) | +| bincount | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+bincount) | +| histogram_bin_edges | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+histogram_bin_edges) | +| digitize | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+digitize) | ## NumPy Random Operations [Back to Table of Contents](#table-of-contents) -| NumPy Random Operations | Heat | -|---|---| -| random.rand | ✅ | -| random.randn | ✅ | -| random.randint | ✅ | -| random.random_integers | ❌ | -| random.random_sample | ✅ | -| random.ranf | ✅ | -| random.sample | ✅ | -| random.choice | ❌ | -| random.bytes | ❌ | -| random.shuffle | ❌ | -| random.permutation | ✅ | -| random.seed | ✅ | -| random.get_state | ✅ | -| random.set_state | ✅ | +| NumPy Random Operations | Heat | Issues | +|---|---|---| +| random.rand | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+random.rand) | +| random.randn | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+random.randn) | +| random.randint | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+random.randint) | +| random.random_integers | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+random.random_integers) | +| random.random_sample | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+random.random_sample) | +| random.ranf | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+random.ranf) | +| random.sample | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+random.sample) | +| random.choice | ❌ | 
[Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+random.choice) | +| random.bytes | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+random.bytes) | +| random.shuffle | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+random.shuffle) | +| random.permutation | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+random.permutation) | +| random.seed | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+random.seed) | +| random.get_state | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+random.get_state) | +| random.set_state | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+random.set_state) | +## NumPy FFT Operations +[Back to Table of Contents](#table-of-contents) + +| NumPy FFT Operations | Heat | Issues | +|---|---|---| +| fft.fft | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fft.fft) | +| fft.ifft | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fft.ifft) | +| fft.fft2 | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fft.fft2) | +| fft.ifft2 | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fft.ifft2) | +| fft.fftn | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fft.fftn) | +| fft.ifftn | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fft.ifftn) | +| fft.rfft | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fft.rfft) | +| fft.irfft | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fft.irfft) | +| fft.fftshift | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fft.fftshift) | +| fft.ifftshift | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fft.ifftshift) | +## NumPy Masked Array Operations +[Back to Table of Contents](#table-of-contents) + +| NumPy Masked Array Operations | Heat | Issues | +|---|---|---| +| ma.masked_array | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.masked_array) | +| ma.masked_where | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.masked_where) | +| ma.fix_invalid | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.fix_invalid) | +| ma.is_masked | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.is_masked) | +| ma.mean | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.mean) | +| ma.median | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.median) | +| ma.std | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.std) | +| ma.var | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.var) | +| ma.sum | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.sum) | +| ma.min | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.min) | +| ma.max | ❌ | 
[Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.max) | +| ma.ptp | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.ptp) | +| ma.count | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.count) | +| ma.any | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.any) | +| ma.all | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.all) | +| ma.masked_equal | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.masked_equal) | +| ma.masked_greater | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.masked_greater) | +| ma.masked_less | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.masked_less) | +| ma.notmasked_contiguous | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.notmasked_contiguous) | diff --git a/doc/Makefile b/doc/Makefile new file mode 100644 index 0000000000..d0c3cbf102 --- /dev/null +++ b/doc/Makefile @@ -0,0 +1,20 @@ +# Minimal makefile for Sphinx documentation +# + +# You can set these variables from the command line, and also +# from the environment for the first two. +SPHINXOPTS ?= +SPHINXBUILD ?= sphinx-build +SOURCEDIR = source +BUILDDIR = build + +# Put it first so that "make" without argument is like "make help". +help: + @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) + +.PHONY: help Makefile + +# Catch-all target: route all unknown targets to Sphinx using the new +# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). +%: Makefile + @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) diff --git a/doc/make.bat b/doc/make.bat index 02b8e03b50..747ffb7b30 100644 --- a/doc/make.bat +++ b/doc/make.bat @@ -1,64 +1,16 @@ @ECHO OFF +pushd %~dp0 + REM Command file for Sphinx documentation if "%SPHINXBUILD%" == "" ( set SPHINXBUILD=sphinx-build ) +set SOURCEDIR=source set BUILDDIR=build -set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% source -set I18NSPHINXOPTS=%SPHINXOPTS% source -if NOT "%PAPER%" == "" ( - set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS% - set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS% -) - -if "%1" == "" goto help - -if "%1" == "help" ( - :help - echo.Please use `make ^` where ^ is one of - echo. html to make standalone HTML files - echo. dirhtml to make HTML files named index.html in directories - echo. singlehtml to make a single large HTML file - echo. pickle to make pickle files - echo. json to make JSON files - echo. htmlhelp to make HTML files and a HTML help project - echo. qthelp to make HTML files and a qthelp project - echo. devhelp to make HTML files and a Devhelp project - echo. epub to make an epub - echo. epub3 to make an epub3 - echo. latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter - echo. text to make text files - echo. man to make manual pages - echo. texinfo to make Texinfo files - echo. gettext to make PO message catalogs - echo. changes to make an overview over all changed/added/deprecated items - echo. xml to make Docutils-native XML files - echo. pseudoxml to make pseudoxml-XML files for display purposes - echo. linkcheck to check all external links for integrity - echo. doctest to run all doctests embedded in the documentation if enabled - echo. 
coverage to run coverage check of the documentation if enabled - echo. dummy to check syntax errors of document sources - goto end -) - -if "%1" == "clean" ( - for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i - del /q /s %BUILDDIR%\* - goto end -) - - -REM Check if sphinx-build is available and fallback to Python version if any -%SPHINXBUILD% 1>NUL 2>NUL -if errorlevel 9009 goto sphinx_python -goto sphinx_ok - -:sphinx_python -set SPHINXBUILD=python -m sphinx.__init__ -%SPHINXBUILD% 2> nul +%SPHINXBUILD% >NUL 2>NUL if errorlevel 9009 ( echo. echo.The 'sphinx-build' command was not found. Make sure you have Sphinx @@ -67,215 +19,17 @@ if errorlevel 9009 ( echo.may add the Sphinx directory to PATH. echo. echo.If you don't have Sphinx installed, grab it from - echo.http://sphinx-doc.org/ + echo.https://www.sphinx-doc.org/ exit /b 1 ) -:sphinx_ok - - -if "%1" == "html" ( - %SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The HTML pages are in %BUILDDIR%/html. - goto end -) - -if "%1" == "dirhtml" ( - %SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml. - goto end -) - -if "%1" == "singlehtml" ( - %SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml. - goto end -) - -if "%1" == "pickle" ( - %SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle - if errorlevel 1 exit /b 1 - echo. - echo.Build finished; now you can process the pickle files. - goto end -) - -if "%1" == "json" ( - %SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json - if errorlevel 1 exit /b 1 - echo. - echo.Build finished; now you can process the JSON files. - goto end -) - -if "%1" == "htmlhelp" ( - %SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp - if errorlevel 1 exit /b 1 - echo. - echo.Build finished; now you can run HTML Help Workshop with the ^ -.hhp project file in %BUILDDIR%/htmlhelp. - goto end -) - -if "%1" == "qthelp" ( - %SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp - if errorlevel 1 exit /b 1 - echo. - echo.Build finished; now you can run "qcollectiongenerator" with the ^ -.qhcp project file in %BUILDDIR%/qthelp, like this: - echo.^> qcollectiongenerator %BUILDDIR%\qthelp\HeAT.qhcp - echo.To view the help file: - echo.^> assistant -collectionFile %BUILDDIR%\qthelp\HeAT.ghc - goto end -) - -if "%1" == "devhelp" ( - %SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. - goto end -) - -if "%1" == "epub" ( - %SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The epub file is in %BUILDDIR%/epub. - goto end -) - -if "%1" == "epub3" ( - %SPHINXBUILD% -b epub3 %ALLSPHINXOPTS% %BUILDDIR%/epub3 - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The epub3 file is in %BUILDDIR%/epub3. - goto end -) - -if "%1" == "latex" ( - %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex - if errorlevel 1 exit /b 1 - echo. - echo.Build finished; the LaTeX files are in %BUILDDIR%/latex. - goto end -) - -if "%1" == "latexpdf" ( - %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex - cd %BUILDDIR%/latex - make all-pdf - cd %~dp0 - echo. - echo.Build finished; the PDF files are in %BUILDDIR%/latex. 
- goto end -) - -if "%1" == "latexpdfja" ( - %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex - cd %BUILDDIR%/latex - make all-pdf-ja - cd %~dp0 - echo. - echo.Build finished; the PDF files are in %BUILDDIR%/latex. - goto end -) - -if "%1" == "text" ( - %SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The text files are in %BUILDDIR%/text. - goto end -) - -if "%1" == "man" ( - %SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The manual pages are in %BUILDDIR%/man. - goto end -) - -if "%1" == "texinfo" ( - %SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo. - goto end -) - -if "%1" == "gettext" ( - %SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The message catalogs are in %BUILDDIR%/locale. - goto end -) - -if "%1" == "changes" ( - %SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes - if errorlevel 1 exit /b 1 - echo. - echo.The overview file is in %BUILDDIR%/changes. - goto end -) - -if "%1" == "linkcheck" ( - %SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck - if errorlevel 1 exit /b 1 - echo. - echo.Link check complete; look for any errors in the above output ^ -or in %BUILDDIR%/linkcheck/output.txt. - goto end -) - -if "%1" == "doctest" ( - %SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest - if errorlevel 1 exit /b 1 - echo. - echo.Testing of doctests in the sources finished, look at the ^ -results in %BUILDDIR%/doctest/output.txt. - goto end -) - -if "%1" == "coverage" ( - %SPHINXBUILD% -b coverage %ALLSPHINXOPTS% %BUILDDIR%/coverage - if errorlevel 1 exit /b 1 - echo. - echo.Testing of coverage in the sources finished, look at the ^ -results in %BUILDDIR%/coverage/python.txt. - goto end -) - -if "%1" == "xml" ( - %SPHINXBUILD% -b xml %ALLSPHINXOPTS% %BUILDDIR%/xml - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The XML files are in %BUILDDIR%/xml. - goto end -) +if "%1" == "" goto help -if "%1" == "pseudoxml" ( - %SPHINXBUILD% -b pseudoxml %ALLSPHINXOPTS% %BUILDDIR%/pseudoxml - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The pseudo-XML files are in %BUILDDIR%/pseudoxml. - goto end -) +%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% +goto end -if "%1" == "dummy" ( - %SPHINXBUILD% -b dummy %ALLSPHINXOPTS% %BUILDDIR%/dummy - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. Dummy builder generates no files. 
- goto end -) +:help +%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% :end +popd diff --git a/doc/requirements.txt b/doc/requirements.txt deleted file mode 100644 index d65f26ec21..0000000000 --- a/doc/requirements.txt +++ /dev/null @@ -1,5 +0,0 @@ -Sphinx==7.4.7 -sphinx-autoapi===3.3.0 -sphinx_rtd_theme==2.0.0 -sphinxcontrib-napoleon==0.7 -sphinx-copybutton==0.5.2 diff --git a/doc/images/GSoC-Horizontal.svg b/doc/source/_static/images/GSoC-Horizontal.svg similarity index 100% rename from doc/images/GSoC-Horizontal.svg rename to doc/source/_static/images/GSoC-Horizontal.svg diff --git a/doc/images/bsp.svg b/doc/source/_static/images/bsp.svg similarity index 100% rename from doc/images/bsp.svg rename to doc/source/_static/images/bsp.svg diff --git a/doc/images/clustering.png b/doc/source/_static/images/clustering.png similarity index 100% rename from doc/images/clustering.png rename to doc/source/_static/images/clustering.png diff --git a/doc/images/clustering_kmeans.png b/doc/source/_static/images/clustering_kmeans.png similarity index 100% rename from doc/images/clustering_kmeans.png rename to doc/source/_static/images/clustering_kmeans.png diff --git a/doc/images/data.png b/doc/source/_static/images/data.png similarity index 100% rename from doc/images/data.png rename to doc/source/_static/images/data.png diff --git a/doc/images/dlr_logo.svg b/doc/source/_static/images/dlr_logo.svg similarity index 100% rename from doc/images/dlr_logo.svg rename to doc/source/_static/images/dlr_logo.svg diff --git a/doc/images/fzj_logo.svg b/doc/source/_static/images/fzj_logo.svg similarity index 100% rename from doc/images/fzj_logo.svg rename to doc/source/_static/images/fzj_logo.svg diff --git a/doc/images/hSVD_bench_rank5.png b/doc/source/_static/images/hSVD_bench_rank5.png similarity index 100% rename from doc/images/hSVD_bench_rank5.png rename to doc/source/_static/images/hSVD_bench_rank5.png diff --git a/doc/images/hSVD_bench_rank50.png b/doc/source/_static/images/hSVD_bench_rank50.png similarity index 100% rename from doc/images/hSVD_bench_rank50.png rename to doc/source/_static/images/hSVD_bench_rank50.png diff --git a/doc/images/hSVD_bench_rank500.png b/doc/source/_static/images/hSVD_bench_rank500.png similarity index 100% rename from doc/images/hSVD_bench_rank500.png rename to doc/source/_static/images/hSVD_bench_rank500.png diff --git a/doc/images/heat_split_array.png b/doc/source/_static/images/heat_split_array.png similarity index 100% rename from doc/images/heat_split_array.png rename to doc/source/_static/images/heat_split_array.png diff --git a/doc/images/heat_split_array.svg b/doc/source/_static/images/heat_split_array.svg similarity index 100% rename from doc/images/heat_split_array.svg rename to doc/source/_static/images/heat_split_array.svg diff --git a/doc/images/heatvsdask_strong_smalldata_without.png b/doc/source/_static/images/heatvsdask_strong_smalldata_without.png similarity index 100% rename from doc/images/heatvsdask_strong_smalldata_without.png rename to doc/source/_static/images/heatvsdask_strong_smalldata_without.png diff --git a/doc/images/heatvsdask_weak_smalldata_without.png b/doc/source/_static/images/heatvsdask_weak_smalldata_without.png similarity index 100% rename from doc/images/heatvsdask_weak_smalldata_without.png rename to doc/source/_static/images/heatvsdask_weak_smalldata_without.png diff --git a/doc/images/helmholtz_logo.svg b/doc/source/_static/images/helmholtz_logo.svg similarity index 100% rename from doc/images/helmholtz_logo.svg rename to 
doc/source/_static/images/helmholtz_logo.svg diff --git a/doc/source/_static/images/jsc_logo.png b/doc/source/_static/images/jsc_logo.png new file mode 100644 index 0000000000000000000000000000000000000000..d7f23da6ec57004903a67ae80ad51ee441937a4a GIT binary patch literal 16766 zcmV)qK$^daP)4e@w0yZ5}OiVGx4HvAoQ{H>?-+gaamcir;Tast-P6n}7guFX*=JtDz;1-YKG1!Mt zyadIgh^pdIL{;%9qN;clk0PoD^B^uE#j8+6)t(>7B@X>c08%b?T}%jJ|FCbk1ob;H z{RRf`F$lts@NX24BC2*iI7%pjuxn#tW7ncD5HS(45J4Eeztiau$A%xU9QXnH!Y%@+ zh^k%UiW(=pPsWMPUGY-9Kb zIS$1Q1&zZ`3}2D$WHF&XW)e4xonC~qMF8zXsG`W&RhGvbp36gV_=iHO+jTcv_WJIg z54Lr#{&34%>o&gj{<>G*Ui0Ex@4WcdyZn5q@9V$cTlww>tJZG#_lC_MZ0%gXrIXvb zB8&A&XaNv!o1)^35A)6T<2W$@kl-UeY+g$h64VxZ%>CNBY_0ZXW zxG{6#RZZtyI`za0D&jx+?&1>SFF)}&*N<6nyf9{(aP%Uns;vafPfSo_N+>M^<4Rpx zqCnB&)B?$&q^L$kEk^%He}KLa*iVYgR>3#HR>cMO!-AHir&71nR}(q1{=`?;ZL<0t za1m7niz;6GW+^>|5przD>2;tK`CfO=#WycIcIl7DE<8>gH&ZCDmx6KG=#Y)2qMDY~ z1T{he1q#3-E`;J@ZBz_Jq}mAmDjH!?kL1kK4#$`1Z>x6vzbgH;>5*C_a?Yj9Q1!!6 z-VGNIucNB(>F1crhvcqFa5o{;R4iY!etPD|Us4zBueHk6O-djv2O?51DyT_OPm94M zhZI;Qtq&g&L1PgTJ@u%tv$OjTr^IZY5{!O%QtWR}J;zE}sEDcpLsehWhoIbO(Flqr zb`EhnF?na}_P;#-{E_q0!ndkN1Y2m7t%N${P=}(Y6zZh3gsju>DB`0&js{H3bJ0`O zQ=*;$2S-g8oC*F-uuK~L8Uv8!%u@1`s1ntfPm7fV+rBZmr5B8)>3$4VMGxprR1xOv zFzb)d0Nkc%!pOmvneBo|+;ty)?X8n8zIN=wj4_52znagZYU3mkN~&+b+1G zhZ<4a#Ka`UVHoeDiZ|(?af>VmS=2tg{K3XEuKt^U!Yr|Bp`x>7Eff`vs6Z1pBL+Et zZO-L!yET-kO86*z$4-6v-L(KR547^#~xdb zZNo&0d(ZQ)f9II+h+uQ67Ll}QX(%B>{!d{7lJWCjquU{n69Bjm7>N< zsurJjuYios9b?}RXl zxIES;8Fp0hSc*E_m{=DV_2|ez^F@ET11eNbuL}dn{R^PQV=tqM+bV}fy9nDha=I~D z_OBNY4%L^|v?$cqXw)!JB&7u{!@efL85yx*hZOHLCn>DuP;BIsxc~6UAHifh?`0RO zpPoY%Oi9r=H|x6VP3vcWzS}o`ZfP*AuoAh##+YbGE^0}d^s!($DukkY166U>&&eqC z4IDKqeZlYOTelq3g>4)N3Ug}_#bcOJMf$qJ{iZi*s7s%C`mC>5vwp#8KNk*}RuXJ0 zQNtxhf`-Nt@7C4EsMBF*L@=V_-a!@ED$z*$=-U`OW#zgJ-I(SUFk*`M2nDR)6j3$2 zsN&UAgW|esa+5_wT)X;J!VN{7eVpOOCALh4Vw$pC44YxgFoW*(my?Z?mj>o_}N1 z#AEz{hLTW5s0>4iT=&?i#YLDlj)27)H~0F-SM$2HPZhuhtM#X3~;@6Xs9xDl@1Lp;kk)vw3AA=!5Reo`bxcQ-jEVjD{+Jtc87r#{|&fafwMy}2X!46SR zNISW?#4ZP$h8Ily&BSCqrqssdi3{3(a?wZd{A1VKD>OXDgY6=OvIKV{2a&x~_#f&V z5V4pRcOtyebr+rfa|)@FI_T7j+7dzQ5OP_b2Mra}U?nYbP+i+~_djMbE(VW+Y3I|y zWIHA=ty$Z}X89rP_&mi`usqJpFLivYx0~}$J6E1CTMo7hYFyNz2_wQl3+%Ue5K;of z=`p_%J9u*QhmiOIp9k?~T(x6p7V)e7$ZP-pO1bv<>#KU93E<`f_5wx~=~qu62sv5e zzWL#{$(b{J<@0>d*Gs~tlpdFie7j9ccn42n9#zzW`a|iGV0-$)--BDhS%=*qPdYx} zi9p!Fu4TiTh+7vw(Rb9OZ%$cs?}}IHpXps-(-k=1+8~95Dn_otnN@7~tikxF3W_1h{VU!~lhYDi%n&^y9%jAk<22qp{Q}=ZvbHuLN58u7T)H zRDHr;QlsUoX{*hgZGtPeA(QoLtj-p=sqeFG2UP z6d)zyz{1Wp)5-kuDzUmzG@=m2s!_#Y>$NKN=8Z=zNYRwg0svY=y|_3?sB+mB19oiN zqz}SlZ>J~mXj=Qq`VS{GoFJ6XD+#3(BQ;zz=6jARaD0+MAM1>)XGRC|O&@35=k9|4_6GI89I$Wkj%f`=I{{9vdxZ`XX$EDs7Ryqbb z6%qEtP~{;--=>48-+qstt+57Q^G;G4!BX4b_otlJ_S8uDo|8liuzNg#!iQn0<$$cS2UL?~8v#FP&l z*M#1)O(S;xxyy!GJD2W!NX!=Ak}#2^|yTGGfMR^V~R=&Mp`uDg;GgpYxPN*Rrrjk zaOmVG|Gfsv366sh@+_os&o+j{d%f)(6kMPLqp)pe?6QFYn?f{aD$|o?fws|s#iw3z z!&c&Mw=6eTYe3$xRXI%#(&eB{uKUfY4N9=ZuSa1}I2eJlXDC@1&6ynwrBPq7?eGOl zJ2_W2;~mYfQoEi-6}2(Hy6KMn^=2A!fI%`~EQ}BH?2IbN4P`ar*P?Q;{YzTY{jaRr z&i0%z%5qBZy^1Q3Pe2x2aGHI9p=v+pl94sdxsRd63JXm6GF4pGlfu{;SKRsFM-K8T z8*z@_o6&l+AqoQI-cy#I>DSv8HAcgaT{c_ATr?8pjdKXLs_Mr#9>2kMdK`K)9X8^Z zr()jY$4J=S4@WZx;GCvG-#sf{{gS%aAB+?pnEF4;T2c;1zE;=z%6lJl+O~I5m0vB% zZRoo0v#VBrV^&x)!jc+O)TnHH+R|*FucvWjvV{KFp;J$~+d)>=b1$x zRGZJf=H?!5t_I0O3%le0T-U~EJ8?tNlSXK5qLCB~`m7glTqWsof1qXI4}Z=IKv{Zy zJxz87nW=3es0A!M?L6wZ%e0gas3^j~>7@;zI!L2+nwc zitGbK6&tl+PYL9G2*$&%yZ4dN6?03WdK+fZZsDLxP4X=a;BeIZ)y}@X`vhY@iwR?o zx&F!J5Uet>BjVi&1*lgc)}W+!VeI^6e&NV8(8mXfrLWMrgZ zWWQkZb@x93W9`_<*N>c`v{eKg1ISq;oBplWJp7bcK8HnxOMwNCZH|kDQ&x~UaV>pJ 
zA-j$V<@1lfOs5 zGuHN)PF@{&w^wln7FWT%eQv*x{#E5o8K2rrZw$paeK#yxj|hbmSLIQap%Il&i|!X_ zyyNNTAxgJ;SQ-T7AmZAdL>1v$_NzBLNQZ}D4GzeaV}p2EO6t;OmGc*! zc#g}LCcuuMAwv~CAfzJquXynbT73y{Ru^)AhCUSrHByWxTpM53k61PB5MFX8K*F7jWr_=~TBUy_NZjuDMscN*WrlIAm zi@;pjJwt~oHl63<&9+mQIa{c1qqjsXWK>D8$&{9Yk%Q}+f#k{a5fukLf0I|=_1Hm^ zB627JPF!yvOwMrVU(w$gmx68Io*iDbb(;f_8nUs(xjD! z>avj$CN`dP!|l|eSZ23ZmuT=#BW1}RiJKDCSRr-&cAi%AYxMFZ$^!K-uHE1eJJ<5^ zq={ma^PGzqHvV3G`~Aaar$!qQSak>G-{=lW-`$W@$qNVP&xE3d6^gA;#$VIYa`q+L zEC={-u+275If2Q5io?P{tS>=8s+IZG?}Z8TWi_b)6-TU4J)i;dU2>ph+VSUp(A~pA zM$@yQl*?-TZ1M?pJLnJhJmx=ot}oC^gDNQ} zs_g^wfJ|LcI7EewGc5zfDlL-Vs7V%Fu}Z3I`}&NgSJ$nFhXwIrTy4xQo$4X}vFq&FosCK<^=MOBi?SJSw;L#$}7O`X+Evl&>AhN3=jnG8&3Y@k)p z_q%Oz%p|`~A+=NpCZOa~h^W$F?!_0ZpT6`A$U+=w(0F&4*yTK!wvDqit5|aRb;9^X zu!9wfigmDQ4V(kJpN#JURd9WRuqQ0*9ZDcB95!{64J;otq=Ha^IGF5twBGMGJp9rd zzN2SK(1TpU=y;-qiK?Vbzm7Zh&gHMLl{(ud1bcQbnN2$pnk=E-tm{fgO;teWCrup% zm@8luiix$U!L(J>s|MQ?YOd6@tRiS!>1o4UPGF>3#k6Eq9gF)C+;|r)^>l z=C$Ip0$N+Pe}dI7@2vWYSD(Af0)zby4$qtBL=Cr}Q>H}FI^YANg{@0{d zrKUw1=If@056~K}yZYDo>^i|FkRW{>H^G~BtfE9y8 ziQWt~sRU!=n@@gy(`HsLMyv|~S{#I}zIhQ<$g~N5_~ljpi3@0|PxCTqIG*POCyimK zqM2ZN%>3hCcxR2-k3@9PDhm(2^r|p!YU!j7IRLJNN!AubXcbdx`?_3L zJYtFhmpa0C#D`7IfTPM!Bk6IouKN39JuYMGBm<)48`vZatA2?`*2!wS97w^;LZ}e5 zLpZ9SQk9YeZQnVzjc{>#&(7%OSuxqP0wnUkx$S<~dI-fC|6+95q=bRy3an#H*5YDi zed?Eg*b3`a&Hx9|2G+5GB{{dE^L$Xv zNsDS~m_=VXy9o{06)pDPliD_AO~!9z58fn<13QOLta@KKY!Yk>X-Q}>7s4>HOi`sK zgo=e7mtETh6`-t*EpIwN1Qky<#Y7y7p=#l2=lH7Xl~6{c2_Wyc4rAAq$5m2ohp)Cn z`0nKQI=iz7vHb`#IJSx~4w7}I{oj+DWGy0r7^zonmBvN0DDZ?tbxg zFjK@D$Rzs!Q3b)C*dh zZL8_5ON8-@OKOv1Fe0;Uv%P{UKEW(jG{k@UJ61_E2eNlMKu#Fy^c`yZR&Mz4uzAT6 zJuG=uL4;p-QP04srTuDj%>3jVo41)Db3}o~X*oO>uoFJ_$%F&Ck@fbr&Jp8h`h!t1 z5cL_UT;Qz5hY?jdlP>|^T}-ZSQQJ?2;T((x&jYgO8?y!J=-C(TA8M6%o?C|ojyHNZ z?7Al2L&mJ<14JojQ*!5f~6e^4aG0GiG_ z)u=*m?qLR?Y7pnv5#LH9*g}~4)wc7l@K-i5&TbaaalIPvpi0k3db(7P`Hr4(%hS&R zs*ue#Mf&QD#CxbBCcHyn=pC>iIp4Ra0gZJn7?Aq)2r9yrA7y578EEPbL8<> zw#zctc4z1O@chwwi(DNSf*qokQlQol9p1n^za_#&vqC_+3Z#s|$o^`>nO9zK<@FAT z=NVsE_6Ld|=UsD)QZaADl#ZMS*t@8rk1MrR)DMQ@(!^QIo>@LGSh-*W+O4IwkoG16=RE+aVZckt5`TGdJ1uQwc7SRix3;o zqcN-VPIs@K{;?cr6hm<#v{yP4A463Vrcc?pguiOut^Zn%bMAUjJm8K97qiJp8zIo- zxcTva$zvyvGGa#95qN$31Vna>SNgK1Kg>8fpDFk9Fw!eb2 z1V6Ou-H}EMkiD>`)80oFoa}8FQMqc~)ek;482f@!Cd$qg$4qXTvqFHrBrH41=%Cfh^YT-Fny z4>QO#aJ*l^fD57lG~JDe!MLDiWMc{a8%y-oZyX!G@1@teawT9OQ}O833oI2lY}?&6 zLA?w{3AZBrmD&cypdKd*EX$fFY!#c`OG-w9dcfZ;yWbr|Tea(IuuB6h({k|!;+}fd zjU%h)mj)w>nv(S;qMDMxVzuYQdIrO2vDQHpJU#nulFm7C(t@5T4Rr|P=6`Q??6zlL zpb>hHgLs#Tc(66IBk{`#&^u62fXe61Pd_glHVIG#!^G^;4S!qgleUU|Q_+*9l?$$1 zcAx7GC`=W0|8Zm5;xuya!S)u6uDJd0zOmCws>A-!aiY<|CSqH2veJWqD#khtb=Nc? 
z2_PGTuSOCm`Hz~bwVnE})$dYBu^Au)W|Vz;%cEp3F4qa?FAytOkDHqqp*AT7A5m`; zybxFE>^iG}s*;KG|8VyMLv5@08l8hl7J_#dvf@Wye`nPAX`_sAN!=3a2gRE9&&=7y z6~=a9{9yr02vXyep&0emW@J6?3nfNPNsU$;XB~I$n(f`-@c5)6Fqq`UXuOFk=vP3n ziX010*L`B;s{O0x`PGONN-$Lf&!8$8kw8Hyp-h~6&x)7v5ZkK00S-0=Kp-nL>D77d zjf%#MIAP96BdJu!cCE;>TkwKz4E(n#DHti6l=5q>zM2L9J0sfiqw`(Hu|(+Y0oFXO z@yhcHn-JtlRKaEALNT@l8dBW_wmcK-X z9##Fz7BqOrre!+7bk+OAD{t7pW`VD!S>`(TJ2;6F|4{;pZIT!?k4^A>H1Xq35Gqn5 zYU5)00%gLC*(aa-*z2q5d+4EmuVXom$rP@09D{7m!XhN=&ADM?TGCeC~O&9?`*JShy~rGj=Ff=T}Eh&=uFx+zP~5hu(l zGib6FgLP*Z>1R<9y;wDEAj=M>KEf*UT(yfU2SF!2&tLr}b?)t9;W*?|GkJh83 z>y|2kgseld%EuhU{4*h|J5CBCIhgiQUmlDrG|aDVrC=)6W25xQ7uBY(h34P!-0Kuf zoesYLKd%8MSv3C@763z#!0fYH3+Z zf#j>&E>tcO#?AbK(G*Ob+3>>)Zhhik^lveN(9AZoJua6TD-0lg+^Dn2&$5W~)|TyG zsjXMEu;5AAPFCX2!2IB$2CIYY6;#31GU@dq*aD7H)-!P!6xQ@uH$VRL^^ZS$(q%V% zdup@rT}?P-l5oT7`r=#DwidvzEoCF+Zu$bh<>-nCVPQF7DQ2k5g!wb56b@+PX}AgJhy)h!i` z$GJIPkzqy^%m^Ts%;#=BP^I!Y8Vrk*h=!*GS&nVP261i)i0N78a6;H%xs7tPZl;-q z_#ZKOf?N%X7okY*j|F0zK9)IAP&rUqbANyLLqhp<|Kz9`4100&8-xO)pzKvy-|&-* z8ReHVXmjhIrF^x+mTa!$e&s>zNMu^t(p;LPc1GRH@9y=@+( z?*dfb6?W%cKmPvc!)3V^jS3E*6)(}7l-e-FRUVWQR^tkkITAwo;LTKiR=Sr1LQYa3b(4OR3rKkeXhf>0Kro7XWH9__g6;b6G|F;8D zki(jY+_U19FB#4LP`=;bNihVN#!H^3K*NH_rti;br<~T zmr`YeVq|hd(jHVt2lkFAsBi__E82g!*|IRn@}Zp~s=Sc_vMB|Z(Ytpc^2~edgt60n zFl3S^Q1Bqb0PF6Hp_owJ+I;>M+o{{eJ%|iQbfk!?VX7+-Y$BBH!MN9T%NM8pfo7OG z1QwwP3?v$!i>UIziVNk7e!c9GUP6PaUMPtaQRM+hv7#N66?zfsp|o8ilC zekb}EM?A#V)Nh1=`sfkW4fi~~!eTp1cyH9JipMUCc$hI%WRYvH+WgUxiN#ncy`s>4xfo8#T?KpMJmTdrtM~Fkbd3`%jAg*BsI2k(w7jt$=syHOO0rt z?c@#JunW(6AjNu>M-o7krRR3rd2YkDuTPJa=n>gSeF9aUsHoKRn;ft0N62 zi+}OhBM%6&$)Luz;EYSfiHm%p1c=fU1Ov1dJwR`~{BQTVO!lSU*UWv`!oSGif zU7{~d;;uvPsENmptff9BCDO!7jd&79DG}ZtqklG%l9BYOZHLTgaAB1SXg6`FBz$_x{OlUFtOTTw$YOFY#OkuGEv{ReI-{g&k<7Op8RdkS zqrLR3vPMkO)4pI_K5EVtx8K*p8G+bNEONO@b+NWSjAqKsa|Y3S;IQTVAN=MzseEQB z)5;QQm;!4>(P3sJ{EVpLEUz6tHGYuRxN`j_n~TF^%=pBaB`%_B$o(BA`Kf0#BlM#MrWc0LojioVP#PUfyuLzq(0MqrSSNq(ut{RY|cn;nO4EnUQ+_z4h3&c8^k^h^pQ7 zee6J(6R%<0hN*6cxZ99>OzJdY{5)T6Mlv!2#8Qz0(Wp*HYT93$@mJN)KK>kwbuf#z zYA+lpx3O(I$m+#zr|T|0<-9Ly4L&0()}}-lAdX6Eq1dW~KXjZ_+3p)V{qZ;61}ZRY z6;ZVpEEdj}60!(yMELCMZY!%;;8WXZ*elfmUw3YtvOqkb988R;4p%JbpdTyiWPy30 zh^nElo9Fq#8HXQ2h*2e5SqB!8&-~-fB@^Z_qqu&9Sf7)+|4Ya>N6aek3;6S_hARiGHnL2Q!lx4M8%vDYP)CkICrJ{ z%6ZRny05a4_NmcOsh~&j+j}d#zxjH^=WYs69~qc zqE2B%>rx-nxY1Fj$K-L-|MJ8$HYBzV`>4#$YJk^7;oKJ9K7D9KT?{NlQwM4$orf|2e^BvwD@D@#~3$^v37k zUXw*Mr-ftBuU-*V{~<6(1oyhA$HBcl7;nVnqT3%GIdRq~Jt76d{!q%7Q}OFxN{ALl z%MVnB(t_T>==m2MfAJr;SfKhmSnT3ORP8F9;&BvqvgSt)>NQ>R()tavPWicT_zb@q zk>L?fF*=ucfwYZxW_sc3uhjRcsghvpn3>Vn)@?MAO~$=S&ogabkqTAjV;&l)AK7)iTP6j4D5No9kL&qixJKV$mmTb`ezrJwQ30aCI)i ziSQicI0!k|>0;9d_dN4_?A*(hBc_!FTV$23UxL_9v>;uc7GtPND8ZDhrWGUQuS@yW zw*9IXT=(eHTR^4|8c^8X!Aw;|)$Sb7kcUZg5W(SsT?8Nnh4Z(!cD9^z@t3ExNtKIA z15FCs4QE?&F{O}ECFu02NYhi&lVWxAmqP6gr(d|q0qG4`-Jmff!D#UK$oC;s^~rJ} z2ulh7nl-bQWp!F0DEh0L?)qZotkRk$g*7xpJuIjZfpH?p6kb}i5FwSL_k`8fl%ysl zHCz^K_};XZwY`qXAVpxu4Do=V7+38LKrAg0T%;Hw{QJN>uU9^G)9Fm9H=CQ=F|9O~>qW<%84MI$Qej9^$xNNP&dQiWPX zpr;n`sgW|ZO&NdemG?gaohY*xN(nGV#U-?JnnQ(wDxPc+E?V<36Z_c=E`O4M37_x$ z=f8sqSBW9GFR>$BKM4+a5JB?`K`#QP(O6uy*0vjdc){1FHIEF`OMzxK7EE!I$wQ}| zow58`U5=Nsu!Mo#kE_An?)vz&iflDX}Q6`8MsHe+~;~6o*;A{?zfYhKtdg_4MSY=~oy-6b}V$v|3;Gw(kLyIa5in)Dx z>LAqFa(Z2a&lBVny%?{OLtKg+$96%55NL7w@l1TMy>r<=m;d6Hd%rU)AsjZjOmC9| z?LH$836aquhL+@k_4znsk`ZI(3I<&SkR*=zsvEvHE3$ISwl0iqWDk%o!CR=>5z26e zZi4#^%NEjbE;r_vFeqw^)wJL}j^1W1*so zbJS2{)oGeM6#S$LAQ^Df&f6@xNjk3S?YdpA`_4z5FRop`;@$V}e&y}x1;72qqz3u$ zNxt!O{WZ;Hp|}tTi?HY&gMJHrVZ>#|d;*uA+AWp$F%shE2~uFIlGGSWp@bA@I50T( 
z&ksG>3s#Ecw?^?$OE3;Cs;G(Ta?lNT|NW=GzwU~g?zr)RC;tBIi>o$n>)owiNMZE$ zN4*cf^!j!8J$l8h_x<#`znysL4YjE=zdp6Cq;jq>Zl+MNSPr%-YP(DUq@@&NsiZ9t z)PzV;1+!2TJv5NwOK{xlC0XQ=77E=1vJGjz*2n5qCL>e|0{O#C~u(}&DUmNz6PrB0i9!ddf9I5YB-U$&q5 z^W=FKpK#IT$Nl2c%+G$E`o-npvoDUFdrAH27tA^SjJo(q6B;|dw;=tkIq8EYwH;U+ z*60S`V{$M_ z-Jf4;D;YQa)_=T|1*I?S4(~CfaTV$8+(I4scE{f0x(5g6enD-bASh8IrFz5{Xe_C2 zD5+Xl7HlZd87URIl^9r9^WzZR1M z5hWNap`Z!SfGIyj5Q=?{Qn`Vj^l@S%E^Pcu3`X{=jqhJQ@0tgn+U~l&woRkjVHQ3g zT3baetz(-MSe>@DHtSS29Y<4KKfMq&5ucuxpk^JB)u^l|eOgiu#F!{~Le^8PBAjHF z9Ilk2#}r^}j!-+Kgi->;Nogq}g}C*Fk0v!K*s6$DVjzY1JZsu-zZZNX8N<(5WdU2&XTY zd!bMpPO+uN6#!XGq3EG7f^iN`GoQ@ANc3l15k0)$GF01F6iL)dJ#0foNhw%8Bh_xpEFePN|j@ytbO6QGe&tVt!VD;cOG|LghPBJ9_e-R<0UKL&XbJ^y-bQOO}++8$Um} z;{EkovZje`1amGlFGBWiZ?=09RXL$1MBr1w`XY8r$EKmShJf77$6N zzRU-S8&`~lbVlOzH!v&!C+wTE?1rI=6`z2_65#lGsx}RXZzH#S3=x%Fo038any-{r z)>kZE^48|9)Z1mT1qzY$1gQ`Zph2IbXuV%8DsO@jLNtD~T=$y$9{-BIXoT8I!6XKg zexpMSvic;n8g@l_sO8U3Wj-VG~k zP?hi7(5%F9h-13gblo$qy8f$^o8_wJvQVmo8u{urvCpsZNL-8V^QhvK8wmyGRFkE8 z_zS^BQ{$&>G8x$|;42mraAsS}g_rj}_Gd~s%n>nI>i$zYRcgkXO7NP6WQW|l7zs`uAylU(A9$vF?9Fn7y zq}q+H$bU$qSv-gH+lsH-{HWy5!j z@^9xE1hQ|UYL{Noq@e0HdwN`Vi|d~LyFYzvYKyO8L0Ks7(_?-u;#Z@J$}e}6SFqS^ zh~&!{SiA0&%dZu_U0qVskN)B}>p|q%m zC1dA2QI8d3MU+`w2%Dr3cyPJ4L#|B=Rjs~Y`vFEoILx@};U_n_ZZ}43kAbNVay+q3 zD@=;0`t))`4qY(Vj=0FIOMN0iy$0aE@@~u~KET4+2{XuOu5`i1*vwD-S0AQhAODWVLXb^gX}L{HSZO!-*fAf~Vl!AnH|ERyub3+e#F&a{6hKTN9t5D%yj9kdFsm3$ z$$_LCh>kR(zQCeUHFLi|vu)Y(SGT*M1($_=HwO`q=69bfRJruQg-EN%vO8^JXPGVg z_m<8JuDwN#{pi5Dh;Y;bDG2L+xw!_Il&azV>XHh@Q!uR5CRpnu$^0V(SUZYF{klVYTF<2LHQ0VwFj`id9%-cf zG=7@c;+wc=zw(*MpZ(_XRcq+`+>V?qrfD4leh~;99m1+n^mDP69!*tn(E|=g^?`!O zW~=Xr2U?6Yh}$swteaNWr2m3FX87S%+j}ng+x>OXQ@%Z|`Rmid`>73b)nc)-S*~uE z*(#>kuM`QJ7lRSS!(b})u+@!nbz@o8;;+@V9XcyMes1aqSN!FHS612Vjah{BT3H)P zz9u{%%qX!~+~FCW;|?bAYY|odj|a@BhPgrnFfZmZMDQ9m5WjG4A+Tg9&zrN!v>tis zwO`%5EOXHx=bU!o(G4e;R?HU;n=TwRS14aBRX2;(Ei~RzjJU5hA@ff5u3L)v{E>@` zKwTpOfm&6wSk)vb zgiEiPaoqXeogNkruN4li7RF2yj+iMNF;|$dNSN3tR5S^d&AG3YO+rPpFn*D6)O_LS z1;XLeg+nF@-w%D+sMiu_gfIBb**D&P>$9)_bM?C~t=sfoSNDewm`oeMfSN7t*-{1Z4U!Pol-wSUne{bVj6B}7cEWWOp)NS6H-@H=_5J>=WxMN|zd zp2$uiS#Gea0T}`2{?7*?o@Oy81Od%wRV`zRrid9L47nS4VD6NOn=63tyT`u3Hk*_9 z;~`AmS8~wp;JzX*XLZW8jsP;QV-p9lNoWUNrE!ye#x_pWtZ~5Q_ALC&*tca@5 zIj~4S61IMMdmpg+2V^@KDBZjv%IEuTzyH)F{j5=6O(2Iovacd)A3{~}D59!(6j4~Iw&SxhuyGzbKODK96b3W31I{riK84F1Khg|8n1(PojC zdadsEZGQnpD{<-(<#}KGDBFTAEWJkYA*|7aMBJOGv!(il=RH z&2V%}Qk6iZ3OeujC=7w3QtbsHAyNg!z5eA#8HB@yN`MoD14aG|AAkexrqq(afua&2 zqN9T60zbgLMx`>rVW9*sV+8!)UJB}Rfq?PUEi4!{spH+$(7Yg@Um*%4!6QKU;j(|5 z50Q72m6c)43EI5)djY1I%&x8;Z?3L(nonldwmDpAAY@5axP$m%Lj)|{ba)Hm_I4Nr zjt&ma($mxT4)|jmX5j+@K!yZbXoC&eOtVhr0XgH|&a!v;qRxmGQL;CjWM){cUtqwF{Gk&(`dv z*_ZH!AA7SkeGO-u19N7pKGx5HY2u{i<>e1N9`Eb)Cnm0-{S)RtHCSs0GDYw7`T1*w z94{=;|65OII9|^q*t!v9wD-PgED8yoAm+2H>rYa|a>26l17D{7$NBmB%+=xhmVTF*B2!a&u?CL|ODmY8 zme9Y?H>p?WP_Q@gbBga*#u||x%zB1c^Zonata##OUo2=CFC$~|5xj&W%uxiL%B271 zj4-4kqb=bhuuuh*y2n;mLxV5+fHcj-%xtv=2lL;CwWtUpX&`@II_jqOH+fF1)!HqM zgdC85LHlR>{e0EpwHhl;nwh~IgwWnz1*`5Jw14ZUVrF1?i6}8!ks(RwdN}W%S5gvn z@RH!!e=pfCu{|>q4kQpj(ft<8sJEi5d=Vcc5O|BYicRo3)= z08dp%XENge;|nSnAPkdi8Y{c#(i-KL)W2hFmYkF{uh%EENl=uV3vajKjr`x6)a~^V zp|b<&{0B3GAO2^G2%KCK;lO5{T3ajU)B(7Eiz#VrZ2SS|_fa@h3^8C3`+q;$J2sZ6 zs_+KvtLQll;eQrGfWtWV>|C;xrQH4BnM%!&_%rlTI8bc1+Tsjlj)(>K-!Yo0u^78` zg>i5DF<=p=>fyFNxY* zkl%C|1K%}MOHjr)mA3JhAjuy7CZeNIgOr5Nr?$xW2~J`kvfM+c`S6$x5rNOnhoHe{ z!$V#{-l?>X&mcv)Lm*d-e-Pla9~*9e222_Ei^6=>-)tQnVf*`wYADbqh+3PE1gYu_ z=gEHHniFJsk&{%J0Y|k~r!TSV_)P z?n}vrjN1Mfg{RQb$$vaP(J*`El81PWn2iS!sLFU3Xo*I-#v#>tAXxtra_W8<0G)lT 
z^HK`FXSxdJK$?Rauo%kx?OIt`$yb??2RCuH)pGGIG_;Eq7O2IVE`z4gmPvEFaK-nmbVnc3=3jn$OLJBS_RbAzsSkIh`6V(D#-A}ZtVRP9j* zR)(-d0xrZ z+AgGF$Fk*Hsx4L+cU_DXDe4|=ZE4ztgeit_Ip@_H-4SxXhu=zp{7E?w(SSabiM|m* zd05B6&-^fD95s<;B(I=APzZii8EX|$!RUk{N1K}=cnp>!`DopBj$5;Q;9xbcs(GPL!tz4tQB#Efm*bbAKH7i{z+%bQ?|EbXekdt|QV4yUu6k{^K2R|+l z1shA$ZF?9O^2sb{U;!&#fRo4i{+flxx^I)QJadb2@X-DJD|s(;NRSrC+pvIu=W^9Y z9M)BmLyYtvwpCzwJEj9^+cU05iw|Ei7I#?1vXMFQ()KbD8< zHCMZZ`kKG$b6Ff8TYuuUwO*!L-c8Cd3LcgTkX*rbcWVdM0lvp<7-iXbWk&yTGLsm-{j(75XllJvkmD3IAYHDNk zw69)yBS6}stL1}#THO_}H=U2_C@U+^-9vcr(pf2=463(5VCdc1+${%KZuQ0IeB^F( zH0X@m+WQ!q#WKOZyPz=&GZu(eiH^WGsg|^6%fjpr@fR zUf$@_{GzU@8OyP5_ELeW#b$2uWUVuL4j%{S^AwMKIzc)MB?=blv8dDa=NQ4O2*y8w zuObOu&(F@PdMvV16QB+CfmzLe^@LkUAws`0P*UH1SvU^!6?P*10{{3$(0pp*r{+Oq zL_}wPe*RksB@}Zl)Dp5?$EM$?I(BxO&!VgCuh;)mfu6(OPC#}d}{?-(`H}W&$KqO6$cE%-% zt1LEFVO%)c8KN?<;c#@T`w6%D1)rt_7dN+P^J3+O5Q(?2fPmM{g{C2^NuHgtnrTsW z{E?S=YOfM6HiUF$Fi^`ebguNtA@rB&oIu$kfvY8YMc*bt5)%`XkTw!OFeXoDMx$RTqD-zQ?^Ge{d$fKWg zk2C#M2k)@bie$%kS6)(LQse7g{6is3v|hl3UA)anurc)w){iFP`|0$R>`vx{M#i3vwH=Y`H`J-D68(z#b? zaa}S?0MtyZ&+n@1MUP|{lAc%Geuz3?K@~NXIGCJL-6(QJtRh1siUn2#6EHP3HHvY% zw%|*=hV$WYZBgbZK#I8%JHL6&C?EB=MlepAdRX7bT2W_UlIX~fCG>!&8x?%`?@f=h zR(HSk4&5CuXA0l{JfaT?Uf?p%QXVg|CRV@O*&ZVe`D*ik7ldKb1)TR}1^di5-aO+^ zo(Z#gB2vCuk&@H=bOm$&1O?J}Y;0_PU0>y&cMfm!sZGzcXGhD#NgilSP(8{`azzd^ z^lm!&qicF8jr;%LZ(owUq%eHbJ%J|8bLA|PWh@HrbQtI5YWmuQP$%9`#> zL~rgBQqfq_wRoQ%w_U7ha80}_8Km^bFm~MYzhB7jN%XZCD;$tgvuQEzilHdit1v1k zhCQ+Ll8pq^-yhJ!W&aeslkQeW^~W#r_2wT;m@<=nGO3cKRAvc}NK@0+cJH2^D}bEF zkO{FE{{HZHt8)=XS7U$FtdY{9AJBm?^IYMh)WGCa)!xuP+DBpZ(To8Kf02swwOIaz zdS~WQx%51xm>6z8yT!&Xxw)|jM#kkyyR^pF9o;nj6ZBA)?^{kgPa$WG3n$hnuV;Bv z;Z?P@<(0C;L{I+w4eWbHC5o}2l+I^^g&y1*OQ)Ph%W2yhQmlZkGi8(26uHQZ$*q}% zS`j{1cSU@c@^nTuwGavxVX2kKNU1#E=$?4hhArOT%g&&Md|OK9AZFbfg!J*_MZE$l>t* zW|0)Dhy>!Ka&(X5XPI4{FWD92o+0F1KMYQ?nt_3VCdjpoC!bi{9r|(HE%&h?+h&74 zxRv<%1$RV6aBGE*GAv&zGmswC5?Lt;4m{|J6_=V+lQXWV=HB5Lkp){CEN`QB?WDU@ zks&>>vrrPq*uSc9w3+v85DXF?)?rTv$deRF;y(_=kdyh~ASCDR}u@MfLaX%B_7QfU$cF zV^A#q+6P|#th(eg+MFOxrA&U)RD#))81aBhOd+Vg-hi1)cU}5t29sVix4RIKP@>uJwj#288|wps@Zx93K*j@aBPB$UzSB#nL)42O zFCa*0K@3kV=$2R4>>`j>IE4cS92EF*I7p_KU~4qVzF9;s#qRk`#od1xy@S8+`r1pOBWgEVHdBJ8RjMDc5Zds(f?~h}??o%WrZfE%8 zwBqrVxp2yaN-}|cQi8D&#Pu~+_0?12*5P}MGK1?@ZZ10E8UK$- zVqnd#g(bg))pT&OnV3CdAlH1!Q>2j6N)mELhnGaD-7X1*5cIIraXV5zL;Jf;GnnJo z*J3Go3hRx>i!Pz@xZ2jCuOUCGc?Wjw7BUA-C@8QXl5p9gbE$8mDkyMqt;MNNS>|3_ zlM&!>e^-tn%9D#;K!>mb%c|p>49_0GD-YjM%MR~}6ul3PY$VTdorwu1miejhn`!5j zUBSMC-FK#)pWRCPvJB@HHB)p2)x#gV(`gR6*cu022y{b`+S_*%jc=J%jfZB38G|B8 zhkQ<4^idv6_?ezBeSfzNjWB+|8V>(V6N~~ zJG@8gHhcMtgZ!}Pv@!DYgUn`;q69l|&xQIdHhf~HjNcN?<;3l9OuQm3yiTZPy6eV~ zmbb@Y9l)jv(mL?q$xpuH?9yIRTCI2Qn;A5U{N?u*RZVC3$7Y|sCge}x0Sj{X3UZXT z9R=-d-+|CE)!av%NCyWAA7)kwDtWKOFE0i?dE!RvRYK%H;krv8dNuY9O@lEYi5dHk z$Dv_{r7PWm@JaT14H>I9LA3UWO-i#h&wfW-jpT)JysveSI(P65raeJ=ud(`xl4H3X z_nIK4+ChAmXK#)pCcQS|fCYw#9v06DLMHECmp%lZps7KUQFMoc%YlrYk|%yG#*WG4 zrKv`4`f2S#Rg!e7qo0H(?CewP%5;}dd4g1BM%$MUaGaHCLm{?ihfA45E-}8jS3P9z zOm?GJ_XwOekWTo0}9|6id9TS*_YIf8dR1-(DP;VT)0uwdQby9k58`e52W}8CqoDrtD(f#;4Er6`hM$ z8OvvC-q&GY(yOwRgm z5AVO|x{vYRwT}Wdyp;?;sui0omD@Yvsi3f`X>4E5Zx7wBF~A(NAK^FyZr8u!NL%jP zLq2svub3dW>H5bc4dK~Cv6rGUYuSt@73veOf*6~}b>o{IBNRMLiQ%@}Rg`Ig{T=DU5Fd4kZ%AwPaXdop4+ z0{XK-VRiu~5abxOx98|Sz*$6nBXx88{3zy8zHZ8WsWWJ`#G8rnwQ_;VWb+00Wn`TE zcLnqVe8S66&K+TMPJ9`NUq;>SCJgKP&(1@Z7{8`33>jUHX52dCgxK-m-qma;3W`~z z`xG7HZ5Y0Jj`@@8O%`fuD)mJq`9;rC3BvWyxUb~F#9hnU3+`(vas#7T(SPBMC_9Sn z;P>!at_Q~ZVpP#I-9@3bGblsz9T(gBmc7k5mH4OJ!YB+wM_FC-j9wiW+(=sjL$QPo zh~7UG=VJ%J2-#^J3wT6q+_ORY>W-kGUE#3T#+VvFrw@=+OyFXUH>9{4B*gFTqa*ve 
zy<1a%)0#(-W_`zph8_lI*@B^;&C~>2q$Md*>EFYxpI~#_tksKMbPGj=?$SJR3*Rwu zT}gA8N^>iY!WCfe%g8h?bF5~d)6FzTx5DSro9<3MI`@#!?mh|QS&NOi|BWXUuw4{D zV1o{+G;C!|sVCr7&6~(EC)>$w; z@`n?N?;n-M_`HuqgzC9dmhKQB9GAR*l}@w7b*$HIW_d*5Mo;aw!>?+UcHc{X|Bk$bsgRbE8C_2l;}ePQe!Yjk z?u*zmm76z${kq*-U6cb-l#|fw@P-1@LK@=c3rRg z3a&T{IjS~R(N4OL)Ba8QVSch_u7NV`aLr~)@)M^v2XiP=K3qh<_u4amI1dr&jvcOT z=dK%ZpLYARGVYH&L2+Ev8Jnj-oG(N8+_PP@ zjN84$P>`bAgYG+*Uqcm)BEo<4DyGjTJ*WeLYOjs#QmMJ>Q()ZMZu0SzuuhfP#ZQ9; z_~hTTLu0@E?xR=F%Pb&4WwJ!9%N8jrt+{q|s; znEYa~bsVvJm32|aW_k9@Po$JKi-ER_SIJO9r!LE$^u{3jA*?%b6FmfntH7+>PTQ29 z+DHU4@w>Wo^&b1IKc)9{g*UEf+%EH`66)e&C>e`~*7c2m-2K^6CA6E5QPrqnqNqI| z!Os&xahKHF%z1S{CtwPCjJp2BDV;j#pxrqeLcZm4Zn$xMM6*`mMrqQpFC&F7$S$y} ztW!^6n;lL=^EuOaQ;Pr{z1z6I?Q{jPyXvr0<#PkZ*VXgEAqso_CVL(xC|}Ilrt_4= z+H2o=eGnC9ZO#;VcfpyTk6?+$5K4-yUJ#>L;P|7d;w>z68Oxw{L}i;6DX?N0o(wKjh}8Tzjw8s{!g z`@Cy^f1mT&vuEt83SBr*p=dJU%LzcX9~ujjRI-72E#bwU?$2uTRV`Z5`Ry)|{r=D& zc|%d>M&e$y!3U_|eZJhEDW4wnv&J?+_1tnY(8Mt3j@v-{JY3(!1;!6?gZJB8BU>VdFu?i2 z$ICe-9z=yi#Ks<-bW!+54fgi#VnW~*iqWa^>6n--zpF4b4!MZgpccUH9=~yXgi=GC zhR4l%5?YUHcf>FKFKcX#)-x*CbH z@(?EELjpWJ#0xU4^@pM0GI~dXg+Q4u=&j|py}yLYq6(b6va;P(2ccm}VbUPa^2KC* zZtkjdg;vRe7>_DNEM>rp0QWJf4OV(saThlhu9}geRuh0mt8c|fgBR0Bk}zC^TxH+)k$1v=ibe@#_j{p ze{*0!YJ;FFt>3?YZ|dy73Ig5B{Gy^3NmPsD3l@AdaW4%x?G~ShFhk#m_ZdG+87oss zz&Wn}owsqK;rk@x?p~+n5Xrk?X<)|hFg}d7 z{mZj6?rZ$~SCLUsUd&=*VhxRrBKIfG9L8u0;CFM}hD{UQsLz{$-d(I#v^=3{`0$!8 zkre_#!87@YNr00-alSM5lSg!L!$DX4rs1}m1}qUFrH{1%>bZNnMxcvB%z6 zQS2?M)Lv+%77G?PLv{=}FP)#PxSe;$_X_*}gsiWdq=bcqQ2n3GwjG zBHz-bmvxz|h(WwwkduoEi-_FYprC|CLq2OavFG$99?jQPeLh+3*dtE4_Z{CG{WOv* zZ3nz69`~+>fzKP4jA%eo5nEM3_q~QjD}9#NX~kQ*F*n(1<14u31QiCB?4&PvP}07> zzPYWU@&PSB}nwRCkAb_xI;e1w8c z?ls-jRn$E(anTYRE)^@5I5swhL_|dN(Wt^L29}OWF5>p<^WEivrrA*DLldqKekqa^ z3xR7{Ny+^mRBvq-7D_q7*%%2OdOEt8M1_KqjktkMCQIXZr+1XrdeAj}Agp`bq7$BAU_EJI829@etP*9LGpt&=O zhhQ~5Jw5My|44nZnriWVGIW|*Okz}n64!^sWn>Ht_4L%`?%XUcE~0}4dO}m0mq(5& zHKt(7cZUr}&Eq+447#?KBc`=)E zP^rv2W5^nWKtt?$%rk!IIX=Gn3w-=)c@_eSsMy%2`;*Xy$;h8vsn(ZUS-w8o4_8Z5 zd$2h%@7w$D#eO1#;;;DmT`RJ(vPNusm_S4M#?32`P9aCdO(Nw$F+-^4Z)832ch(7{ zH+U2jStQi)a#{A29sn!SSrU5+EiD0Jo$g(qteGpP@f6Et*`OoBPnc(U!%oMP#YKI= zDHIKgGxzxnPJ28EL3yCVm2f`r#2;0Nf=w!b14LW~SD)?eXQqJ12Il1C>;Ox4%q|js zK`ZyC)c@pFg#>Ml~e5S236&%r5+S`A?kh!=9l|WY<^_^rOACb%`7$&bEhWuq!~O z>qtWOUFXR4FkBj}z0b1IYdMmk2IJ-pv%tMd35OU9WXC=!-&uhavwRN%T=T5Uo7J6j9$fIuf_( ziXkg6)2c9vy>+%N?LYup8Zry7DxPIqH@9o%zAvm}ZaQz$0b&oVfM1fTexdWH)VR1~ znAjX4D&hsa|A$hTOI=-^Ow-rQ`1qWIOMRy2;Tl9$%6zY0y&8b{!C{IXuC6AKl8^{1 zP~wd9!pcFk%B7MF=^q=;_Xc3&M$<6jc9My68cRl@u#*aAdi_(QG{VYci2wl|bu@Sp zi57qYX1rW=ONFPVmly?z`3Bh7u?6+jLqkKp%)-LD35*z@KDAkkm+C%asR0_?Q3>$auWG2Iff-0 zOFpro^y>)?74^Ht9IO;(BW$zOtY4m&9+F>B&{S}6Yb+t*r$jhZ`WN7%86MP&#O>=d zG&D3qv|#?Ig`-^QNa}If)*+8594-M-(WbvQ3r^WXp43O6H}Va9qouGzP>JB=qkAv1 zU~81>@Xk)d&DK~I>D4<~{KgvlAt58viPF*0;n4(>jV8XxgYkg~smqY8K*Fn|!<76y z^(~*FZUdpA*U47rxiKTqTuFjA2#M43S+pzLzYCKnV-RuWLn(21&YwQj0y$V|3ks%F zMlj5_H;IuQ8NdloE_-C~KND=hukP*LnS$-e)5W1-V3eF)TttuLSZwvBuwze}&w1aT z%X^53{-(1{VbdSwJ66sEIEzUg^ywQ_nySZSS6A2en$mP!Qd0h6>ZgU_H9Wev(-yst zo>zw%)5FFh%V=r$rxm&ofTlsL521;&c5BIVIgQBfg+B=P;ro@|c?ITTT;41{=u@kvRok6C*r zA?T=F1QP(>qdBF6GyuZRISHT~_}-kFHqq^zou4D-sQgw|BZ10Qfrg<)y>#_!5L$Ss z^79J$>SIJ|k#BBp@_0*YY4Fi_$;e)+1)_8BXrW>}CnvWZ&0~H83`x|NHCCit&L=BN z9lwoLaIVZckoI)5mF2N?xw^G=l?Pi*N=!U+F)+v+JaZPsrJyLc*?pcNo2eW4aCc?Z zkV-xE_8svzs3<$T@`qL*_SDaF1aL)MuVq%Yx3xYDJ|e=u{_>X2b94Ku=sS9t0bw5f z+pou4mj`po@P37AAIN;39G00T0R(BUW1wzh$XV;1E{WRaI3*56+1= zhh&lxrl^X6f$fiv%WuK4kS_|UK-7|mdv|^?r@5A^*4eprhHk=@3Nsthv(rfjq0#ht 
zxYQbN5;$_f;ZoEY=yE_4lSu5&Mi7}^jJ?fM(=#~u$ReD6kv=f{^;1JF4GoQAO;kz(B1<~8|vPxZW$%wA=& zdt^n7zYCnww+`pM55T<*1+vui^RTYAHp#h!TC!MjQ6c#h8~FlNlw$QC_Q5i`#O_$J zN?mmo8LtVGRMl8HtwtOASO8CMlB}nv=cJ=Lt^&}N_^2~Lw#eekb|q_tmPk0xYHo3K zX_Sa>0FKcQHAxw1)C&HQ!crsA*v?k?fRg0VBocriNRVtBn|S}OSc)k(Dnq(Ni>ABY zFEW|p_FIEwY8k{_e2rN1sK9JBmE6uKV3NL?ns_-{Og3V~SS3SeD&U&#njoZ9CdY3n zuWxLm?vBU3d#9jSPh+(A>z4Yb)7D@{CmU*hvK(#iTRFK8JO`gZhRXpNcDhoxV_ne0 zLkspO&HeN$&B`NA^#>@vCWZ3+hw;@7mfcIDd%`-2wVvl-59vlP|`^w7aw!8Tb}Xe7Ya> zAhkMH8hNRLwjENG^NNN+F*qtTaL2#og*P#5jD&9OWR(^mMJbhS4`Egx@y!^ znD+V$3cB3djssTB-<=X0xxC4q;gq*@dxfQ#RQV$`{(*skHH*;`0D_5jc6Oe=o-wks z$EvHVM~xnjTLCn~<>$)_9(5u4sV^fFVxxa;5%9~8=Xrs83Y=4+9`6#hl4P$*B{Ka0 zH`2ckom5r-%XN2Le(i&VK5B65UJ#CrF^`+?g=fc+@!3%eZ|Zr-K*!}qLB0P(Gme# z+VxD%+jK6|UM?FOHpAxoSej_y%`L?%jL85Gcz28xtXI!&bhgfKcZ#xPzS3iIyn!&~r&sZEZb3M21TlxjfVXvzXUG+(Hjwb!HWrlX1JwXm)XWh`aH{pu zE#kj^efhXf0X_ozCreTQR)djSI_1jm-@hliQ2!noiEjE*X0BMg0XTP)b-z!%`$sqn zT4)$s;;r+KA3sj{T$X?bWXYpegsEvAZ_Dik?pltqJxfSPc#aI2wOx6Sgo08;RJCN# znQ1$I)}&fI{*HxUkL$rz`oo70!JK!tKm>2Cqmu^ncZ_ye z{^Nf$JFB9g46^TS)oJHRP%F;*`T1R>%>osDM8|5(Zr(sdLlhGY^rSL9@kRh z$W|E|Hg-#Xn+MB7^W&|R;#5Q&QXBlJTmHuaQ3?Ofb;{&(qY;m%-RpXWbmYEVeiq$J1Q-oD&wx-&L2Gt;ceIpyQmz1dk%`&%`z&Qh**#V!@MX)zXM2)iC8$>ZFK+YNpbG_Jm& z1SRZGpjf0prP85S!9Yik;H&hcs$HHpv*gC6OM8<@`QFxz^^qikNsg=EpiNZh+!v*f#0J2{hd{-d< zUT~2->W%=1QV&#+&mi01_Z%4ttnXtErb>0Ms-&nb&V#j!)o0nbxK6#Kr8_9e(hOgM zuk_e?S=@6rB%Xx`Nki^&9^r^z?m1cOB6)HVS^MO@*-w<%uUG>r-1oB@Zb7or)}H~} z@hn_wbeVXwyAoI=;nyz?8|LW;>0-bqb>oT&&AY01qEXz%KZjR0Ko3y7D~_R-*KN2DR|#SRq^KAG-K zmWp$)*Fkes_RfzBp^O7#lXSwucY%gJH_~h4XCcSu=RrM*^z87EWEd<6rv=n`kGN!H z51Hyryt|C00EB91o4xPcjBTrmt1jS;U^l5A7rQl@DGe3mc!%?KXZN7A`p2|c9bYb9 zU%UOCvuk;f23srzOzK7IoXQKX13{}WsU*>+f2vY3dg%rztaIwkL{1q*&9(|DB z0muqXK`oLBNK06fDmyd`hP`mBU1i2P4`kR5m?*@XOq4UR;`>h{6jZbYYNyGpYs|V{&T0YKv_+ElB!!8J zfuRoS=KF71S>n1=TNvab0)y4XJn1|Zql>_iWOj$gK{?EC1C+m80RaJn;5Gs(6&mET z#*jnn)B0$M5V%a2z5UpYqrO_K-t^QNolyCL2Q&|z75d3%q6Qp44N z0pCD{B3mpoaPaX2moHRx_8ZP96F>h{6F5_b8XA+V;^N|QIM2Vc$tl|;(#Ok%AR{8a zv?&ZI>@P$|El4sSck5GdUf@9yrpc8Aqp0p68QlC~5u z5I!K)i>`?(>3n6yiyJQG+uq)eF>x{|t#%BEs6suXoSt7Z@Cotpf5-TY_l#w-5I)Vp zY~De0I25`%3eVqh_}pLHgVKHWqV~v22>u;K_)g{h&v$NaZo@#?|Jc{ju|cn9-2#JM z)-tLUM(I?Vgfm7~E5Rcm)Ufewtgbq!(mp40)=BQXJ|9z_I=%5S{-$I#e#;KRE*Bl0 zS(plgSBk3TmtvPp7m(T}D%FjtPg}!%d?{1Z`XR!KHddLiB!kN|;#Q?;`sEzB0>;U# zQL-&&KKRXiK{2nwPenoaB~4)2xcF1c)5B2;0s=x2$>&;CNhV-J8W3W1SH_Fg5^uZ` z`x|poK%hpmsVxLa4$;Py9m_5aYsD@20a!pK? 
zZwGi}`im?f@{8otn|lm@iDL+46x3P2mWGCu*F@Ae0h?KIqhtFb#naRCh{KpR1F(aS z^i)(UU)7m1(gA-Mu1cKWHrltlB>e?+VYGF1txCucF5>6gz()B;$HwviKJq}rl)*&i zP&si3$N3lCg4?DskQnL#%Cx?#yhshCG-4ZDTT|OK}bl%%ku?U3+RiKs4$3_WvcsT=vcke(<5~PWZEOh*$p7WUsQLcQ0vNl;zL72v+|P) z{}FZbda2dnv9Z1Y7+Ir4ItWyQI9_Up@**DAv| zxcK+>2bbm21H|HPf>l<^ceuaa&3P8TPOd~vw zw`v9uRTtwq+AtLclMOxX1)+qc3YW5-4?J32T>KicoKuWhzynH4i?j}coTmhZ0@fHZVHa?R&i*fr zo!yZQU^X^Wg^{Vg%;>>^s*4_^h&o3;Cue8ECYe&*v!Hms18nBgv0DX$q5|ls7>~Zd z#jT{}b0x7;RUPF3(Lh1B;Tay?@(y!;vO1G9Xk!S-pb{09+FTtjnBU#ptc;*oa#{cI zlPegV1cde-R99GG2iMp4sc%z0xa7T(5>|SZo(oGFZcN53M?C&QF1)co{#x8xx3O@%CNs`;doWKx*vITeDo zH(c@e@5|sy3+5AOcwRvyJZ%mh?(bnBwze6W$x44IC@H~ERa47SR#JNW*0zE%y24lk z?z|;}TQ#;{smP%O9hT^F_V%1T-Q6$K0Lw}NL7hCYxT06ioF9&G{~n$wN2>%l_yx+u zp3U7`Qhy1YCWnn)Izd)UsD_#Z7P$yRe_!98GULJV-BHWk_~hgzeUQ_rG2X6d$FJ9c z$lCq1!C~IJG}Gcw_f6!uqlujvBCN2VD|mpyuQ1$ z^EeyeZ@~jJwpmPQi8=s`ji9)Pq)~p7PpU+q&gBGMG6Bc3&d@mCUup2GD=V+zmhmB= z84mj3wUSV%2_%jTAKz)Ft81(J=TGK2PMhahL+toy!RWj-Cl?nrv+)mVg>P@f0PG$i z<;M(dQN)CWxdGN1J5~tC136tEkBrJYC^)z{QVK<3M-(z$1%i7GH0PLjt`4Wt1Q;nG zjuD__6JQD6E&S9huL7(k_Vi$29Pb~R|88Sfq}l%yC$PK$B83a>a4c(c3Dg;6y1NgF z_aYXRNQi9oF&g=uPNu@h(C&I5rHN8oXqwA7AhrHfi4-)(e6H&4V2$i}O1XF!iNghM z+qnD^d3E_l_-L8g+Wey=7}P}+YD(H0X637!lFh{WY0!l{D9*X~P4r1Jz1^%0p@7E= zr(@d$R~p_K_}osClR-2c{affk%9;{a0n*_As5suAhOqUnPcQk!5yZ3}A59OoeWBBr z*q47?3F*XpYn7LI>m70Yjs|#4CSH zj5o;Ct+Kfl0GEh@ShSR|#>*qMB@v}ATwrUfaPj@}Fq%u-oayqUm6(mvm0?E%H;D$y9RVE zfbMvHY$sSn1^BD~lzgXP~Wf&G&^h=4YARKh~0OuYxS9WG` zfz>5WLA7|nm_F3Q-z?}Kt%pi==6QWU@-8B28WFyI<0w|}7=BH#RA~7}k(R_G1^sac z+UF~snc=omUHY}X$xs>2iQ0pHQcNlny0-NkcleX!ZdBtwu?E}nh zcnz8QLXp<(+*C1;TdRCI~HtVBN+%<;Ek3o#A?KK;?m0UnD-{N6xr79{ z)JbC|^~CZF+}fNl*=q72Xhc3`nettmUpn=`g&TIp=$76({(ADwAkQ{kfC$=+nj}y1 zus7PQfao_^Urk@O;A06cg!C0Eo9P4PnzjfE=jB4E-S8TbHwB$k#9ksFQYyy_==b=q zKc3&siXKT#7INgsy&;B-}13JhW(9dC9H2 zdItv@@_p`jmCIB<{!(pRIY$B;b}izYyQ8Yj{!i=JHP8@3$l(F2|FV5c3ct)Sq%?0> z<+IQKV+q@p#|9b{SB9_M^+3*cE|O~pBob&{#UwOu@%wgHaKi|ADXz)Q8Mb|w_zUZtA9xiq7V}#2xiMe=4?U)fNZR%z(SqgrRdm zv>SG)P!D>Q@>GeBPpF7+MWTEw&{E~Uy}7LmMOhO1-k5gNMAd-+>D{OEB{mND#D@?o z;q>{)JL^b5g1AaXj+OoMIYyU7x(}BTOOOh}E?<;(HcQ>Dbe>B>1EyE;O-u9B%=f?6 z7(np)y%HN%EZq2rU&^*K5;nj=>i%HAm~ZOoW;2_)j2N`QYa|tvCeo4IK6oxODR+~4 z;0X6Y#TX_H{BoLQe~)qb$ga_k57d4!yNO09jH5<_#KyY#%gy?+BjrGYdLGFM3sq6m zto_(7Vj9k}d8AfO5~3j(tFzZbM4(|4Weo!~5xM1LtbSY43mGB{M~u*69)socp@Sly z1j~FI>T#!}4GI0t6^@NmG71Qeu_AN+FHhV4m!VU9%buzzS`z2>4_xlY7x>R`ICkw> z;cg?x&e@mN5u;eSH(hWm-}jcR893V?9l%3!IBVbNHFz&suT)G$*C~P<5>m>id0EjZ zSvQN@2uO^7^bPK8g~>MeufLA1*#^A~*Qr@78^&3(@GoO{l>uKRw!-p`n*!Uape z-6hZ266Nv92gRGThHf{w-(;jicMV+!J~=prL4b5!tD!A9n;cMZx};m@ZtWYsJ zB*(`?!s;)Dh4dQ{nvG52v_=IS68fpvo_^a`#D6~QsPZt0V^f{ya|CVf119nePyX5TNB<9wiX!ZyspRHZ{pU&z=}X z-iP^d_@1dEBllY$XfEs!t8wVD^Xl^uTV3M~ zQ^5HXieF0GS*8AW5121VKDyEyG-43kc%lRAls#8iJ+9y%!D2Uxr9)| z%PWbN?30bdg1}hzmg&D+F>R|?DZ6X)Zf(+k(VbXWQpm&`(mtyIHE_Z@c>ss({+v0| zkO?Wfv35iiwhU5_^?B5F@t^cvO5bMCIf8O!)xS6{uWWzY{C`X9mq`Kx^GdV?e)fKeV)@e+Y3LW7u2k5h_Xv0|LFA} zPm|bXDs#7xgEHV1B0u)7CmcKmRJ660xX8v>yNH|aFA0h<-tp=L^7BpUrpRxgO#RcZUF7)Y93Wnu75S%r^W^7L>Zy%=$jkNbp#m1P{wRc)7%bF*W`Od!~Yiq41?BtGf-+Nwu4BH0;^Gbi>FExzGwH$f09wz6TY6ZQ9- zeeJLqPpOK^kQdjacVM4q#&|kC?|iq`jS)ot|| z7n^(~$xY$In#Se}M$LAIcN*IKBnf7w23p}~wje`z34j46+dTOe&2OT5nZ2MNmtaAP zg4GQsSZ4&-yXE~%h&R)}w)1A*|;LW&%PB;)Kh^UCw`j#WP>@{MI9y<&QNIaj76 z3s*hv-TPwf+vdY5rfvvm_(*qsI z^~>lOxQim+Ymi~>Exsd5jy!FIvzE_au{@X$C$v>*8~bO`fJeh$S5Z#L0tC%T?#0E- z0XsTZh7`&MNrv{k6*9VO{Q3nUIjJ5s+@(_evx}@p+;RIyl2jHdj}om9s<{VqklN|F ztk*dQ{P!N#=9N?|)p8bAEbAUEWcsd8ouodE(*?}|?4``x2JCihZd?yHrQ+WzJn4N6 z6YG|DWf)Hww+4S5@?v?d%3faEFowJ@LZP$&_kFGfkcEsAW~|X0NJM$MSztcn%|W9g 
zmQp3Y(gGg7LrFlUE{~j>#_-$eo&txmgUYXKnCISXj6vNl-F0g-EF`gii@P@%6kc@R zvt%3ei>fbT(j#xD`p5U915@GyribqHC8KO7K|4;<^bu#&Ai4Zzb}#JbY?zVJLBYVz zZzo4K1|-G!qDHg?RfFJ1+FLGW1AX(0L*9+n7VaD7O)6FH%1LD?!m`Wk-Q*G%_bsjF z`pm|vBF+aF9+*!sf4zYBPDXR_=nj_;mlumFlh&V#$_*Sl_J~T{Seu(#zH!tm@tZnq z__L7jc8E*rcHo9O&s`%w8o2XLbUCHfU)PbDl0Q*PT+boY&`!?n(npFd-paj=JcSIF zX`yVpGprL*#lAJM9^tQg!xEz)2k)nHq7^z(*PKq#L>4sHUd8?fyIuZ#PVx@hsTaj% z{mzplOAV8-A@@WbL>X`)l3%PC_9W;N1)g(Enk_}Nm-Auc)2R<y4P!0x5_D z1CxypZ&$PAt~1V(El}1RC;beY5V35$956w5pDo49scx(>a$8~3welE^QH#Sth`g^* zeqR-~#yu<5V$kQJT;lk@f5Y^5Gv4oUi}lp2*@!OR3OPQ+yzdFa4CYS=%TZ8JKzRhp zLhY;b&KqJ0*9utj$bQMU$cYtu4eq8TBVyn-HgJ1&AJgSFX_3%=C)eOqHaezR=-34r>!?&MF)qjWIGzn@JAx}&+jPc| zZiVjA%^94CGpfasApIK{Hu=qw!3{NbL0PD$q<9qM0WjZr&K6YlY-9={X$HoL)Gj!i z{~pnt|Mm4M3f!NCZZs@vrB*#i=b_?ihyCX)+*ceoPa=^&g;jqAS(bxx5ifgIonPzR zY@QV*&6?edh9LuvqQyudo9FS2iLA3^yM;aho)avgv%Vj_wftzJ=qV6IH_N2P+kIDi z9}OIKR6dr}_q!A&z>?J>{kW%hyIeqICSHPm6`VNN-gV5+)4FwL%INPXg86nEWe-%~ zNiv6&clP1YXaL3o;w{SMl~=PrKcOIiQrzr7eu%R%enWbG*an~@gCtSt8w37wi{Ud`YJc`haof^60W@5C{_91*n&0i zlG`nwkEIKrUc{jiHzss=%2+$gTA$)<@nWJaHt>Z@{vnO0N}gNQ{!#bc1QYvYx7c?3 z+NLCz9jD@x!n>)VWX+ECDJMn;k#pySwmv6UVH<=PHl52E=c=AChaGyKFf~8=C`wwYKm7qsck#Y9 z6Zhtm`I2`qm=nZrwDo&o5$^!R{2^xgpH^crR8{s7<3=MC;a>!%yp~ zP&KLR9|-8K9yD#rtVgnkyL;Z(xjCg-$NE)?kLo8!24&H7?5G8t{PwAgkkF}CeThzZcpr==Eru{vOx9Cj#B z!#&$+A+jMUXf>6m-I^9z;#ZH>*4FqE>I#)aGJQi~pi(2T+vbmY%@pMxl3SpWBQY(v zDK>w(#PGd;`wja9{>!QQekN=#h8r3;9&2S&^1O50GAYa1d@1rtK<;7&IdohDl=Ael zv$I=+>T6SNM03oaUnhF}69SI)K{}XZc5wrZ$nQ_z3uj@J;fMz$MRpROx`bK)9sMbf z9V3BvH#3h9`#HBgb|UxJ+R zW4Y^eG?DaRU{Fb6B9n(R#>#t)g-$&Rm{oQef6b5QzSkg~da{Dc=PvkWT(@X-1s|1P z6ALr&%ehg_fM-0IeiJhpZ!NEzaASO}!xjlW&P!_K_XP(X@D`$Xle!Y!Q<6qBbUUIuow>c_v^>WYK z&j@<~Ed6*{^OMUHql|c2bP~usDx<;^_W>%!-#%z=lXsF zBVM=~;=wWH?%tj&qB-jMLz>q6#{3LBMmWK2v9?6a3wa1EG64({Ris-L6B1~EktUP-4B56v?JOXW9nBX&ovdq^kBg0wMac7y3 z2s-F4yGWLfv6ajhC`@l#i{ts z+qW}rjpp7Z(m<4@%zLe9U4MUnbqfnZJ68TUQY_O9jFtF#wFd*hHErR+{wS2Xxo~D4 zLh_!UPMB6&M%c<=YMEL9fPQBK4{mI39_7Q*;yKUcK#d$beT-^;^AR~(g&I2krn56N zpBNo(lThLTDsLN!^Z=5cmWokIS~?je!;u_EVuRt19>@2K>-$jxFepV7=9ZAqJk5@p zV>h;G+AUs913-(zLg6_%IpSCCj1{FTSX%nN}9?kcaeAg0) zv;Y#Ae(!z!rlP{7i5Ly&{P_7ZmrsriShC^TcvCP|Tyr$RK z*?SKwg8`~bahxe65{VECGxyV|b(YLY#j*qNIc9k-#8d%Nz{xhiP!piKqkTQ)0v@7r zSJw{!24AI7B1og77zR!N*ezXwq@Mh)X8HG>|F<+=*+ zSkRyA2pru96$NPoGtB|8AKA&w2?2J?{y?%7f_BTW>|}!6+_oR^&VCp$#F61CI)P1W z3C3s~N21CATinSNoO_*oMyE) z-RZ#{R(=-&7%jz$G>8d3`S^_m&l&RNMPA-t#It7#`F|k*cMRkx5fmrofToJ`2n~(% zWMZa;vvjuZ#IU0Tsf`A>CyI0MK2q(H<)GgJO{=sq&e6oo`9-m8UBeN_y-QeC0N(0F z;lm`ChkJxX{zAIB{iMWNc=WN=gWcNueSrYlSq8chIX*sK^6-tnqMfjT*H97Iy8QqG zF3NXWpa1|JHv#*7!&Q})<~oRUa*nfi-8O*)x4Sj?4L_ey+B-lLBY1IKVP$0n=lC6~ z$rQJ&m1ZOc+N2Jor>Acf$y^8E*0%7|v6-1%`#MvrTEbR*ER2sT=9jvL#`f^=@Tm;w z*{8tIuYR#d4CFg6K*fQ(uM!&@8x^$?+29E;!DjcHZeeu>VPq?w-MO9{mY<&PD}t^7 zt%+kG)P>+x^#IlWS0(GS)K`yYrxwfV{CXxu#(sv>OiVEDX7%ONNM8@BYl*iJe&XER z-D$bGmQ_=mcjC^j1Hkb`C>a2RcKdSyzyd2R6#)Dp|L+JIbUd~e{r?WH-iH9d`MDE; VLnDRRDjfhnT59@;>U%a3{|83V1Ev4~ literal 0 HcmV?d00001 diff --git a/doc/images/kit_logo.svg b/doc/source/_static/images/kit_logo.svg similarity index 100% rename from doc/images/kit_logo.svg rename to doc/source/_static/images/kit_logo.svg diff --git a/doc/source/_static/images/local_laptop.png b/doc/source/_static/images/local_laptop.png new file mode 100644 index 0000000000000000000000000000000000000000..b247b98614f80d03f8d02427e83a58d930eb080f GIT binary patch literal 24793 zcmeFZc|4Tu`!_yPwAe~zr&6huu_QZnEACRx`(#8nR>?k&rSs*fWeY}tNah%ulfAn>^cM9(W zfk50B|2}U70&M|)+ydIp0sOPz(Y_1<-AcW9{@hiM-sxn{_`??lnBLI0wl=n7Hjd2QPM4(%LIv2@dTfXkkZUV9 
zY?gT+^7BV9jhr zZ_h}!EUAg&<$o@kNd_&g&AhU~E+e5bQ^-#q$Gi4Q4e-E(Mib}&V|A)2;L=d3oxdkg z>$#P4wM%W<4xZI_1}^ch&5xHT)xEYt>-|+|)7Do-&yZ$}WJs>HusrC|+R4mT+VNn| z`s$2Vqmt@D_wYLke@~|!Cy`8!wgNI1 z=9QS$HNMZ0dhz!o!yPdxx*VzfcA=?nh5R@wo~_wkxWIjnJD~4xbeVTY`3|xFvpxmP z4`Ny-8CbQW&*2w~1A@<3o?Y*2Xt>}s`#+P4zYLT>9qlDse=koq5$M36-Ieq@>WdB^ zF>vh_ab1@b0}T{FcdI1UcHGPIYln9N+u8NSfp~oRZm{0{?b+@BSz6Im)7iMR>kZY{ zy9)?h-_Y6Ew53c-y6$>cf6bZB_X+ONS?3f=RaFrOgAIH%db zb#)BDLvF1Zhqg51np>}9d=Uxd<$7g+u2hi!bjP}#nU^My~~E~n3L+ZPJ6%au2$L0 zh79(jQc`CM`lQ1{gK34d5(?(@ww0ANo$m}W(xu^5u8J>g$1)n*>|I0MZ z=gQ0JwEpb4$jC^w3%7%fIxpQvW1GltkFS_4SixYi1HzB+UW8E}nvW8rpIm>Z;*-F3 zOp-B=qz1XLhwWg!;aRl6vr&=QQ)Y{N27$YOe!#SGfpA)6)gv?Eh?*U&Go+0!@wI>N ze!_M8e$E>^;O7G753KIf&gv2i8kw5r$OGBNrEm5U`!k*eS4gK)$aDF}OOlL`GR$Ar z{w)0ZKG%GnwwqaPLt{Rz*~;qm=_uXsr7$jHDG?diy87H~Dd#~(5Q^fZusXDzE#6`) z4I1uUW=+Y&15+IYm&jqN?`Xa*%+ETk`%=g%h-ygRvKHy+xhuT>Z5eVb$g&*r%uo4k z7oL1cGV47il5Wt_=V4LYeqOE?1N6dRanZ>07VDiUO=d5dblgW-;X0WUO^R-0Y{=Itad16_P20!!b&-q%=OYgAV?Wfs) z%nt3%NapZcZE|7sd3pI{ELptmtp~ z9EEdx7@cF3ZGNmiaO~33IUyt^(f z{ak{zy`wdgTNnAa+dh}TVdL1wBh4S;RuHI|sQQ>b@z@j>xRiMYbUZ5O3XNrXE1f}9 zmGx+ROyE6tT?#gIL*;1PqZO5t$u=%o+*`Hc99)Lo(TzsJ5e-Gv*H~4}61N&R%toZH%r zMHG_u$xaq8#Tg5MrqzdHIV*nI2pT_VTb;E8MxXtb+dn_vqc{13YYT(5Z6Jlv;|2GV zSQa10>(~ssf!L$`U+vP~<*5}Vfy?^RLt&|v)vAnk1<8xbUw!2~?K!J>CYC zaKRu~maFjYhPS7fZ@YDz0S#tZY6!&Qm**W&eOb-+2MKKVmR^{var<>xXPgP+XW;{| zM)p;ZMM4^q;I$X^ccmmd-%*Wauyi1wkl$BuouV!1J^E$uX?>Q_G=z1hrxVbhtz!di5v}{4{jcl_AVH=#kGS13{^bnLbax>q?M(p>f zx9lRCRF;})Jcs;Yzi~F8dn>D}>qxK=|R(iUYyeM+ifsVwO z8$+2_rtOEmA2kt-Mb^o_ZGY?horT+)>MESyjA+m#j&Up6gLjQ0ewO-HmlZ%wX;LPF z&-&yyoD759Ex;_)s+%w!Db>i_1$BAQsHOE*#U{{cpTBdHIlfJNTpt7g7iVx_Dzvo= zsu2u%GZe|TJzAU|7o#1-b#@?bBtZex`JY2nGuZ$d=Py-c|7T(84_E#zaU{wWWRHG1 za^G*=;^TTj@i`?gSvGkJ;H_R)mfN_`Dw1T*ZM!H_yz`s;<@+`^UZ*x%B|-tD27odj z3c!4RlDTizn-9B1iL%AZXF!H7A50bU!&jG%1#(Z9a;z{PQmT+K4|otZpn70CejwR? zMC&PlRtF%Wj7{`G>aRyuSsZ>tcX0)n#yPGEztOf0mZ>_A&goNyV@Q>B%T^ z`mOJfYAg#~2ZL_G{oC3=Cu4!L2ijnh1`h$7MESQh0o!yGwgO8cte0j3M1cTt`udm- z?Do=%+jS@n&sZX#1_hxI6~8zZfkoYT7VZK4z@u|E8eYl=> z8B1tkkVLf>X+q1uf8&vu8@h7E?F{gw49B<1r3+@vOw8e|U+0{@3f26EqCh1Lje=Gq znt&k4;5Q9d->$E;(Kg+MK{=BBwi^4HWSBVSW|uJEx_@!9ik9YT=(@8JU6a@WbXC)5Uyy|(6aL3 z#xY8numQ73#@P@awBSrkyZ}Dnf|T=5umK5c9_|8M1nI`QQvu*XgcwIJ8J{c>ul^&7 zn$dG<5&}*BKAbMYq`RPn9TQq2qtS<9E1J3PPor;t7>@U z5-_F_BY-lka8SxD%Sa2^j`{d@?}S(QN7sX@j>IMWhh?y^vjOCEK2NLS*L~1_J>cy5 zZLmpzB?O@$t1fsWd|qXl9(a>~S6Xoj!T>(E{2QMQShkI->-ZC&;fCyfOVniZO++C@ zo%@Fo&6?XU0*CM-A*41P3Cir-x2hs1tKkCdgI2V)pl0EK^1w!ao|ux-d+#`77nffq z4x)b-^m+b?*`Hj4<*j$mjEu@zHA(0bPnibHB;X? zA6j5b+#D@QMA>Ds<$1kdM|S`o!j>8s77HYc2;3i0D)~q;;3-JI9Y-;7Sst{!)e)e< z8nD(jYAxt$bQS7N|G5hy;Mt?aBch-_xg(y!2iB1_;|yqQLgp^ZCIN6Pu#9Fr0)ntI zy?@R{=>eWX%sAoID@(nNR*C2O7On!)tOCn}Nar<7Bu?D7+p4R6Ik%kq>`QaI&c^rY zt3Ju0RR7jN#)EihjudfCUHFuVlUT3=EsXqJ2U(657Ex#!_S>}=^(cpYnq3y>>;jBw z6BT9|X&-2Bxu*S7)UAq(cR{!)y(YHQHP^R-op(XXhq@7i3#UOLM}V_;b|YsH0c;Yv z>ZpJ7THvK8Sf(eCD>czjaKQ(aSJU9lE4XUi0>rDkD6dP4qVDrw6p1K{ZEv|2Ulz? 
z>4;t`FlrhCG>#r)BJZ*y$jjyaI>-j#S!@>X{B2t>WJS#pBq|pJV^S!7x!Fh`f_6eA zzI=z|4Ze{FIoD;3fqtxoH7x7|0zlE-QwC06!hY4Pvv*)4XJFZ+$*QB?NN;0UrngJ~ zU-UMGh0hBA#%F$3KaQJd?+Sm0w6plO)Fz%BC}qG77~l=|!E~0Cgul4yNNAR3XE$@` z5wX=E4?)zuNoITERz;0Jc*5Jv%riHGnOoQgv;VgN-4p8nqk!%?lZVyQzm5LGQ(-*P}{+zDY7Z!3BJMyV92~{Gmcz%mqA!#_z-FP+HFAr&(d90J;zXSy<4N z2X!1Kf{#Q0X|35+dQk?T>+Y8?rf~edj;z_-e#Kvb_(=lS^^@!DhpsnT1E8zMggsYZ zj6~(Ixbq*eKOa+tJo7rQtfS5Lwy~Ifu25A(ZUS|Ro^DLq;uROS=dR6xg^++-Y#`Ma(ezTFD0X;?X znhk+(I2j4~6^$6Vt_6j0yew`DoZIm2i&DxO+~01q&K}=}kgJ_#lhUh>dLzB9XN73n zHpN;jdJ8vT;j_@c@%cHcAEiyScZENLAF%lLW}aLboym4H82+AZc13gkZ<0hn#uDK+ zNImp_r%9ZuhCgPTy-I7y=JIn-accN8>;t^D>UZBxYhpw*_2L=OM3WmJft~Pj0^*&G zz^ViU{Z;TlR!=z-pRuyelW@{l!dI%}1KV8!XNte)`CQqAt||b!qJ~eOj|_An4Cn!X9^6%w0SQbIZU-vK=D}OJ)H(J*4fz3OFNT{v$Pf z+zszRg#b#LA?*-1pwY%JjHBL*{sl17^21$#rx5?GwkTObIL;ylKM47yp}}5W5<-3t zqL$)bs%709-iA3G2her3w}%bb->f5RJI6QkfebFdCO0d#C;oN;JO#j=6%^X+)-JJzVVCak2`+J?}s|4Eu1^ zbqP@27q8L+VBCHw<_T*ciy8D}5d}|Cr^mQB2U3a+NJm{i7Z}rJfZnW~g;gUJ@%!Et z_+eJyq;fXbS#y_cm9?O-A&^C=%eg?hTdDl3ni_ESg4QK9z$PECY;xafK)*56y9<~e zb5pDZ=s$o{$YkL&*T3=kJgc9?O|*A~KMN=VJQ=%*C(jK5%Fn54txnL=F-Eot({s@dKTzJWYWFh;L(oK-U@8vP^eD+Ii?*pTMr z$0c1n{jdm#H1-uR%zYZlbEm&;q^*expzAnPC99`@B?WdRngfFG(M{+o3ZN_2SvWP2 zn3FB<2=Y>SDhDVHAt~lof({KExQsdw^c&F#dzHLRy!WFvi@*nqSpg^dw_aB>*&T$M z!d`%2NylK3&vZEsT=rUkT<19`4|0P<*#XiYAlv-5GcbFq!E7hj+16nJjWWRA6))2C z>cy^pCT?+z(gWzam~m7l+iz9%0sF1o86ytBCjZcf53?kyKDmrnuYMJ5v`Tn0Xeyp1 z#cjy=I8{}SO!R@y*A>__HL9+Mi!31y`OLG>YY%EIrQm&naSp|C)SJzA4rMloY@zw6UlG$G`D8jn&VUO|*AK-w0vxWWpw% zOqm6cFcv3a2lCm=Y-c4RHXVsTCB=>&AoMR01n~|FJftJMqEp}gDf})hwCm^*A-6#4 z`CL|%fml2FBF^m%T%0otS^-Y&2R6S9FKLoFy_q6IPy3`dn(tOFFgt))7|fB5rjYPw`EM z7(AAL{2~YrK+gtZnNkP1>h#t8@JG)T&9w@(DQ^%g!-&fNJ+CZvR=!{l^N*YtFAnv)$^Ve0 zeYynbqjQRD>v6_F$S;<2xpQ+k1R};UWBIk$GTZhkvLMLp$k>?;T2$E{cs2Fd`mEKk z{J#HDpTH1g+Iprt(>u+v3HF70%JU>+&Q1OI1HIb5dRuSo0Mt@RCS768#A3;khB|z3 zPUIF2JY?6q4!ulVa)&qF^(5ZL&^^_?px{x z%}K8|<<=kwq=|+CzonO64>|nx$*L7;H&FAANw+FhVS|-aRGjS#Zu*)DK7;JvvA8SJ z9A#v{xqM>I6*rGMvx}9wtZ=O1jJdRQa2*d!K3vD(>tz4hj2UcI@Q!yHivkztplc12O&Gx9 zdV8fhZpe#{16nIQ4<7P7WFlcundZ~<`IL~%G(p8pnS{a!2#WuUn|TUI`t&fDQcZ)T1|=RT$-00FZZiXWU)cY`?zZy z;PyT}L@8%>$=Ue=_=3nOk585f9FC?}!TW(n7N1kXbj{}Os95fG)q<+T(t&Z%FZ>cz zHzklijnui^X;dV{J(ilxxVxnqKQnZTr-Eh-uuNWCiX|rwPC2AC?0W9YjuU-cn96`D ze#`KVzR+V`cSkJl&}!gZpA2eSmBzB3yBOlx9H80{@kfWg+SWn~L#yzaK0f>sY-Wsdw@u9;JbrLab4Me-~^iE(xNWQ>q!*M+yH zb2mS9YKS&2!lQbyN{LOnYBw~T8s1%#CeL4F72rw7D|jK;iOlbZwC^lGV%~+cU}Gbp;TXuZoQ^nEzDmhM7yug3 zm3cneIv6~O@wxRLEg(Q`GH{ZfFT0PK97#^sx4l+9Mc6ir8rnAb z2fFtNriPi~uNeho=EXFIrIdb&%h};oVqUn7 zV6GVM5N<{|SP#E3cZb4jct)^sxEkO5Wu$Hr_EaTO&u7LHH!tyJFga&paA=o}#`W*V zD?E_E!6lEAPW4^O@9{|;OtPzs?xH&`TU+&5UI>JJJwh<=7!!Xzzu52UC2{Z>@bcuG z<>w?X<|q4}Pn^3jDzWOG14xyT7Xg=c8}xq!BT8mwx9q}%Yib8!(n$bejsD68@+wNIkFYIFhyEs>0lUf# zwdw7;P(`3`6WEkk-V(&bT!b=x`;TihR$B41{GQoFsnoWfIQNFkx8D0@VUiOsvB5T= zJrSI1>E&5jsV=Hm5VJ7AA}(j!gTDsbI)-PBtMU3d9S>T>MMS*yua2NbvFYThm(jOV zK(!f81|LT*TwvsGYginy?(*`Tx)@uUrLN8GeKITJx?Kd7&OoR) z@T5G1x4jSTj22TkNVt{%nb)%{OlLT9?zlgO8FMKe{id!kl+>qW@TFv*^if`gUfqjV z{0km9HrWgxl$*YyX-LfuP_ZPONcDWTL4gT0rwNo%39g|FJ3`c#p&tti%x&Hh{=T+v zv@3kE9+o1Gx|A`esq5_s$>~Bbc}!w*7w7hi)wI?`Hfg!Y))%LU+v_URn1^ctV4PXdlxo1~23HQJhGFUWL(EDwA*)OKc zNMxYF`QlRd=>$c8XkbX$%+pAZv{t5t_*Ol4FoW49b)Cc2JDPTInBZ-68BUAf$yFj5 z`yJf9xkT{TH|q?t+TRQbbpP|ZF1)DXQP+p`=i$Qm>9qIZI4L#B3QBT%4MVt0^G%WJ zggOj4Fi^YWhWFQr(5ivji#p}zxU78IdqX(cxto#@y$wE=r2w~$VynGpLoyk7-K}o@ z`W1JJ^I=O!2Q_ZA0ZsUIiEG=W<+E31ofkBBNiwa{ncfq7_A$pbS`_DSVM3nR2+u=@ z5owt5K`-nzK4Qe?RF}Y6T35{g-Rv;4&AMAx&Hge}X~w>MB+D6gGXj#|;}M!uBhy6^ 
z)^jc&O`GRSUX}`7_6^Nx{+mW9*AZjoRE5j37R@BYJZck)E2eotRr}OPH0!$$Bwh zpvP+j>85~HK&r%;v=iZUGFJJBhf?vqsg4Ju_UZXP&vKfr0d`s}RX8H;|pQDDrHG>3aB#MVZ#Ug4h9UjEAByNlJv?;2A#C z&io9c-}LP-DCkZ`UMMfE$g0Gjv?9ih8rKj8aD!xxV{hTGjDSgcNkWju`iWW&7e zI533y>p(HgR$K2QRx^fy)KlO$2Jzlh`fuF_}itbh?#ckWs8R4o8 zTD7E~I<~(iK8~qzF2yraIH>vIs#vUWzlNy$7~!;dn^xOvWl_5+P+?PRbS7k+{7&;2t7M-WJP6pET`sra5D-J1N96Ij*0#*sbA zKggm3$1RW%tu2jH#f3}Mhv7A+k+KuRnmlR|wn9Uil~%To&Lc}6HTd-8UVg#&T4!iW zxfb9ZRM|`u+tq~cb~S&lU8cFXw~QW~q=Jp<4k`fzObXdomqK=x?0gt_duexoufMjl zZ+HcavS&Zsl(F}o6mN5KK`Qd{r)6J+9LbthE_jRHR6!~){E;Zrvy-`JKhv#s0A6!P zLt%zocSZR@!Gus+>CM;&1zyhv>!h}Ur_)RS&OIcHGh>AD0!g zbxcn`Rdv8WGY%f+NdrS%ynEwr%r=Ue&W_&@e{yJI5Ft8(c#l-{V$>;4e1e_w^kS6V zLil!0XX6p~pmuFzfB)s1bWjOzm`@_n)J`7PU9Lq1R<{rDunE605<)BW^coLJs9;2e zQpmVl4lO(Aif>cg?>1@mO-^4Mq3n5$!G9*Bl(QnINPM zh*j;bsC1RHRpDbM#^iBNZslXS{`rJLwD@#@xjJR94i3e~94d+zgED!fB)> zMd`-u<;AjvWsR};68+$kBKTf?Cb!Tq`9Xo$xKFt~#%IBWSb`9(uKJ=`<>*cSbhknL z_6UN8mCCp))s7ab$VG$n7LB92k57`Zw^MXNv!tYL@6@6We<^KKm(*s|9g4f`U@F_} zTzl?_v+h!3C`I5S;WZiT%1I2>=sQQ%Fm~`2Yl9>w6ln3Oc#M6z+eAe#z%y0GA(ldd z)bOmX?C|BB(hifE;xZBn6VxE%9_qvPDk?G?q9hTha@3@Xdo!O^)L z<{0(1a~ug9&eAG6((ShGCn0@KSaEK&m}hdhNvBj|G7r|G*bRlogS+A%ez&rPB~#wizELV1&9)ZEf5xu}<`Hokqw^9t?XYPvVy>q?}LT`Oj;%4C#EjOk<2a4)5focg6@^*^>kLlZSs!TO9?|gv-a_SAJ0K(Ig zkB-&*cnafSuUk>2-BwgZ#JIgnw}iQ4-l$6()Ac17y0u0h8jd*mmhLNn9<{Gos{XJ{ zc@wbwH)-!J!78Bm6lE1pKR)xL#qH($z<88Fo>qxSH4|O&o&9veT##X4m@Dd>#;>f7 z2guXnC7J^Rg1(-Kk>DKBF0_keQ~5+5j9GCD)o4|2ej7tko@#WI>qBM*B8uw;T>na5 zqLx^U>SY*F{35};kK(~va)COy^Qzi(ar;@Yb37P)rn{w^o62Et5?yE!Oo9TZey4h; zp>L^UElGYYkjyyAkiLFd{KScAcg{MmVhXheZ6-A}c10n@k;^p+U^Fmo6sC-fv2#@E z>*yfqUp|SoZzo*f8Mf8bfd^GC7s(y%VBqp#O+5>afDz)s|D^pKZ)Kh1vDv4!|Np z?xW4b+5>%d=8^e+U&A>y>W26}fFK+)y?*iMoG~s!biC|;x%2pYO=WeziZ%1N6wt%b0BRIsOtp+`3DS56=_o%8J(l+ zkR8$=s^hbGx_Iac5HEHdRlF}K#pxP2_>wdk!#R6em=t}=V~pUPCVEB=SqKo}_cLEV znG^2Mz4;cvtC{z(-7!n}QRk9|;Y;@0KW|Hru%9{uXuR0@?RCZ78E=!z@ z7vbZ_kC}ICeXTTLYhyY%*hIKS;f6qC>TZ+6AVg^^JDW7@P-JRRFxV7C~$>>o<6cz%f48Bfoz zeMQac;I_>oE}=%9=I8UXTG}iFLnc*wSX z=vxj1|Mw`u_gh`_90_Tqn(>Kc2$Gl9vHH*oEJQ=cb+=ZrD>f(v45$wu{pSriNyOch zqSi)Rw+F+zV(kRo5&dW#>WgSCW~CvcZuF6KXm^8(5MH&`_LT@Y?EWXI$!RHVdi{@@~eOCQ-rP#2gs8L3rU9DpM3r^Ab(31<@1G-aOmJo&pkMXbx_ z)C4){R@Xl1hT&)IqF}L7>UWhLVel9H{HG?hYlbejLUVugA^XcsX*H9gjm||6N9#3s zEYoljn+q8Hk4#>DI{)=s_11kuTM_%`sg}YcT0>xN!Uvsu;Z8KFxZ|vWt54Y&hcI3_ z6pL*1*-<$c(f{zlUdy+5EbEF;97&p2Ual9-E{exG>r6q0l@U#H)j6VHB9O1>k8kx) z<&~8g6Cl7H%W&r&*foiCGS~pxkBaYmf|$-jE#GY{qEXmqm@;jX+Jb&))goi9zII7K zthQ)J%1Yr+wo0jOj><~T(M;aQzf4T*Pbrvta@JhH^+jNk-~#>#0B2JV19qq6i!#cY zA+Koe1(lQZa8CiLdL)1i!(QdJ`O{GGqtRuWRmuHk`}N~R%Mu%wV=}?QM9;KgngaBU z&xc{3GrY0IwN})nrrfdSIzflioc9tjC{f>qNPZkasrVh_pr(ZUGO8e)8iyD8h(ZlB z>V3Aj64=Oa=x5*tsch9UT)q=M?eF>KwXTavHAMLy%7wF+r1uXPs0m!Z%v&aioK` z;vy8@p7qH2jGOr>Cz#=u>gj;?OuIK?RE z8f5yD=1BMp9ZgidGhO_C6_X21@-C{-Y6JcUc)&#SPUO-Qqb56w;BJbVnE3~ZEmX_J zmW3^$`mE@e(#tdNe*=c@?#c*X>X+v1XVS0mGTODf54I(JMco{oP>bu*=qh%(oZERY~!hBR(h|!0P4`4=>nLKh!NN5s}^<4gNJqPL1 zqnXfDQ@OCbZ<@SKW)f;|Keaa zG@tYJk<)RpoXA3H*&QzW<4sR4i8p46T%$;e?3f7`!R^BaMsj1>xla6 z9|ZoBZ~LS*Gp&m-K0Y2GR#>PcLR9(TOi!_%MGr53;v!k?G^- zh)InR&g^pZDJP_Aii_rd8fxL`HEYQP%7~M~pv`_s89o-p=c^(5%?gFGW#5Se1qohM zFOigz)`c|h9F~Ux(uXf_w30~x>598OH%jK29~~29*oxt?Z;9=sA^)!DsQRYJ9s( ziV?$1I)MUE_$;9PX0Yi2vlakWK7MwyPe2!wD_gV(mPjxSYo@#c;^;Dn(S(>T4;&6HRH3Ziw@aasB!!DbtB)nhrXh(TQ2^z zbcEiiVeBG88}xk1*z`FEp4&Wuf@5+btT;OW|3pdB;G)~?VH?JApltkE+=Cjoo%ltb zC<#_RcCeZ_f?kXu9g zFlka^>B$}O{Qh;9X|j@hyW`cv!|8I`{Af;Bk&&W%5`7EK-C3F=U*5?-3=HR+F)ertLXRn+s!ahFTdtQ0iU ztXpVWxuth>YeG@o9V}PBzLR?OprgAeO6%49o+`ZNbde^@H3K<1UKfqm#?yMK_<1$< 
zLjE@xYyE-k{RX!N?)jiY#0n$yniyyEV3RKXS@&EcWS8TjfkvH-Ozauk-6-SRm+40F zsYz0Vz{d&2$cLQtyPPP(uv}&%ti77Fvzel|a8Y;d0(Qwo8hO%f_`M`i9!nYIRC7q`vmX1Ww8ELD`MMHF%1iT&qKe-0dR|IRf#a|dE`7x@ zOu^_D3DlZEW!wf^umq1@3SScyx33^3U)m(Z`pmvwd40S+UcWGuI1??UuB=s z%hjV%mJ(FkVJ{_MjGrWL-ZdXi6jf?=xGajoE5lma=I^qU+~Qo`>80j-ZK(n(GyUa4 z{Fz|2nUj#avnIR;5WEU0RY_w-zUH=zoZmq{%^84tmq!+&hp6M0d4NHd8VX3M?$@SV zZB&mW9BQe8z-Af%q_pneHt)^j(5xwut{Ycehnl4wIlAc@FF2x!!?gg8@p*>F(v)^9hEg~RgTdq;9Tg-XaoK`XsZ!ypeWSAq-tcI zJ6x?nK#=CfUW+f8m^SIZf*AYa`_eJIr-G;*UOo0M{%aBDqet1XY#GrGw7;B_iqH6~ zCBjl^Kjb!FWma$wxkXpyE2b~N!oYdJ0tfl?(weft>O!hN^L0q6g8=^C)|?P}|E%cg zw}B#E9$2>1`U;T#lBQs=%KV^2spn@W72b_}UQA=PI38;PD24Zz_?w1pCs?|~>j z?=diuu;t^qeP~u~#=~~>q*zFkfpsr4CcJvmS`O=+KD_KvfI;XRAT0~$i|n+jAOnVw z`y+bsH9ttpzE(FF1VN4Er1VMp;GTFV$c94qv6Boc*q!VcX^s}0Vs~@rsbdyukdtl# z<9!t_(&iXoq4Y($%Xyfle_7+9n<&a)M)jyUqj*?sv{`=6fI(;qpB&C86K_$=ijWe< zI*)+nJVPucfYjOfu4(Jh`o-5ezUt_Y-L+`?SbcnLOY^7}hrOGVP~F_tf&$8ya=|A%flY@E-Do+JYa-)J zEgP;`xIQdAs*Jx9E`qlUCA`oRbcBkc1STn;yxj1hE|OtRM~YoLT9bXeU;oVQg8QK4 z3`!N=ql==hs_Lax2C$Z~bc-(CoPw#X>6+qNe#=E6yK^$RQ#BXQAi`C0Bh*HCl4~Nn zkfh5r$&}(Wy_p1x8gSfmhBd~rwWHGkyKo1Qh)dh}oq*)z*W3tJZ2OBOr-}Jm5{Lscsn`x< zMLOw{E~=#w&8hYkQ(&{+EkM2#xDk`E&C-z>bkv6TPMIyZcLv0o>Lwv~c3sMSS z&b##Gdq;e9$FnqRt|;v9K+B=~-HgP$qY5-Tb1m{S@0>D{NS{!;^A6aF-t&C$1A4bb zJ_bRI)Xc|HHTQqsozLupGsBE!mgCrYx*gsAg5PYpDUnM>vCU8HaJ$(uOWi*yDUPq5 z>K7NJwsLtymu0iPOt$=N*Tm8nA`RtP!^^R6qG?1@Rv2a{T*RX`5&@kJ*ObO1CnIlY>ilvxiO2019 z=t$t>Iu=#<@`Ae!HuRh5xKnl*lj&+Ee19n_0d?giHY_~cLGJcp+miY#HVpfUkP^${ z8AfCL*B9R9%wGS(8}L~${1 zBo|?z=k%x4Z*(bn<~~XHDK8ta{PbIisCdb z4u@nI6(4vD*R%qWM4)5!ME2Ua5bJWh1A8q>?~$+Ccq(1Z#SE&XCVXAyJPnGWV~4Wl zQtUKs#J^?3AkT|25j?4K8jV41&)6Q7V?(kd@;QIJLcGB@uxkIS_$)av`C&<765j|^ z?aO^01E5@TF;H1z(z(6!ArNPK7rfIZAQ~owEjp3r6JOPjI%qvg0?+*#$GjDPo*0Z& zZbqpc1!@n^q$JUD(tR+RER~zUGGFKRcvxmlEglZZ*aPcyvpp=je6fZ!Berlu*CCRo z#8HyMAWZ;68k>R5Igb#`p<~}qPiNpmf`*Lwv^|78oQJ=`MZXae<;sWU*o!a`Xwm1i zeO*MeFYhUS@eE1;dluH%dSYyWZKNUbibj<0Jg8Z`==(Uq>|$}JEv=bsLF0Ximr;9g z8QccPvc%5Z=yS@Ali_MJ@jHgSuY;{3Y6T}a(iO)gB@BHBe)02$+_@~Ws+UPE~wcf*D_Wt zBj_4Dqu{qhKYY|iQBfP~MxPOww=m&407X~|Z783V%`9gX-f;d9qFbTpXL99p$)} z$ZpNq4Qz%dD*V1AQ_~Miw#sMnEA5B4`tmYbQ)Lf_J|i zGn|>vpbeigI|J^b#_6Pge2*yR_BAI&)*LCTHq>;4RLOFivx?)CYu`FI+pEAT-!n(jm|Egz@$N>Az}Jj`feipTWg?N-1nd$a{?#kYt&s-nm618? z(lqUWeJZhGN-aMomy}z!!Y%dy;=9;cn&tjBF)at++y+cY;Hd(|;g&NGajFnNvlza; zFb;4*lyqp(m2JHgaiL)-ta({v?Usxl5fOg__IWR-Zr!ZDhH^vfDeoX^bCyrYSJw^* zFJLf>ncup-A*CsDe$C`AL-w$!oa;H|7<$t4*%CR*oC9#4*;7<5FaBVomG{MgB_!IY zae%+JgV)lV#!PK1kdXW8osmprCZ7BotI@if!cTr ztd!%&$$dv??&+-_e_bvw{cO^I5(i~hcl3}C9EAx=hq?dUkD!Jw(YV85*p_O1c^;u$ z1jw;{@*=(S1wIB~6zyVRLAdt#voU(RcYSpo{5 z`yb*5;GH{chE%2=u71bSou448YoNbheVgEme^*#Hi%um&nXe?+A37GAh@Shz`mm39 z=|Gj*w;`j&T^t@|8@`G1Hsn@F3DX)s!$_X0B|p8Ea~*RbR%7v=#NndJcI*7|U|JH* zr^=br5MZEa18p3X4(mC6hJ z3xjoHHD~fpKd2+}Orf{FXqk;wb}Xs?b#gkrr^Ow0qzi{Er|pSivwZAzvTrZJ?+xdArC++F$e%s7zk%$C?qlLMXvN-O_jJ$ZRf z2>4k-=sJ-wIfJ<4r*ya!eV|h>mVAEY4@TO&!e@wa$OvQ4*`2j&d><`8l}gsydTbl9 zIBk!E3&wCF{nU$T;PXIVNGnY@U|#$Dl$fho)d8(KkYlX}_QLx(b%H9;ADgEDcgT{2 z&S-`7F+zF}DGL9;YCF@grmi%OgP=GNmViM(NLY#tP$$^(2nHG!k+qL3;!?m+u_M$$ z159adB(j!8F;EuESYk4O78onosSG^B1&Yu?q7odLVmlaF3xlO0Ned`Q=Dm=crTsed zG0!>YJY8RJSBe%d-Mbm< zRzo|(RMICtOXU?uLzV|wgru%J?1ly>mnspI=scB?F?r6U9|v;A>>G+`_C>aSZScF5){tPWbHMON-lmPRaT4%9y0_?j!T1!#R+nwQhp%ypjz#ekh-tTP z{CW}`I>W2kXvgZlq_=EQkfww{Z#*PPrAyK3t>IkeUxp`Q$!NZup5I|&Kqd4=LZ1vD zM7-MJ-&U?A9u0luO>KZ?@RkkxGrf`e>{&T&zJyfre0cFpeX0vMl-GX$uuj7TK)I@P&!NI4WItq|Lp=yRr5z@A5`*bCeN7Qc zL<&XuS^viaYhExqobe-WS4xSabyFt->3#BC7r_?WzlW}-EPeTu*=#CJOk0kGaCoB? 
zp7fUKb~Ldy*oj0aq7m`S9oa8vmBNBqqgDpRhlnLkK{xq}(_wkFa$m<4LpM`C)i-)6 zB%!#BwNqxz_v@>kAh9nhDu5tcjCV{c1MIIk}l?W;r z_Pte`N!pli^uRMG0ahxNsE%Pcb`)Ln%YU~|erY+NKeK~fpG|dyTtc6_!*Z2w^c_BB z%q)>}$s-_XjuCapPi^7BV2sjgSTS;9c9-Ykzv+D~W* zG>bEXZCxrOWjFZ1|Kq*3UUz=#Pw1+^LIDI+c}2TPy4`tkHh9pFo%bFmhCVo|lI-6C zA9lyuw7pDnGiB%^Mn`q*VW6v6J~`JDbPGAesGV8?o(59xf{Z_q1Cx~7b@(d$LB zz4^u>7Q?FOSx|z}z^ciD9vL0)3*%0UpfwUS+!-}b-(wj$B1ul6z(nqO=X365b6aFl z$bu^{F7ba$C5I%G_SRm#YI3kk><=ty>J$FR(QFKn`|sYW5Z%OYnMA?r2UbObTd^Nn zVjrxL&SIDOpK4+35g#EpY}>dYE_WhTvfi)M^}dQTY73{|ayb3-v$dOULgRoRnggL5 z8}g=IF=Eld=9GBFz)irqSRR1!K{39MMkCW_7U&A-RsV{X=4}3sqnP}N6d}C z%82fEk0i#W=%_HrK`68MWJkjG>?8c8o7TBsmm4s;Ih-yt1x4biqpH+Vy!>_Y@{Ug# z*c0=0w{@ZGUR*bEmc&|jFZ^Uw9eE9F{!@#6rq(+X*A1ltMKF#J zc7E6yR{glK@mRzjW|O%leYNf4maI#lKJ#;Uf}I(FK&*1|RcQ6G(!a)?ZG0N*tjRsG ztabKw6`qgY*H4m8d9}d$VUC=2$FvkvAgv||!rV43kkKcOZAjo_+1%?~jH`CBYMr2W zyW{YjbaLE$<7C2s*)G?(H>(L)j$tVp-lrc2E{R!=GH(a_VJzrO)g)`*_z4{D%-D)Xf`Yx}U8gakA{iAlwq$d4URfNXSS}&Wa zDCNj6&=YtAf!6*DSc=LPouMU3G{bxFiH#;rjfh7s!dAZcgOz-6v~G5TxT@2?c?Q=j zFHCk{!_B*IE7us>yv^0WnW8YUML-cBwlFo9p@B`xVLZ znar~}6Ga6J`8cV4zwG5J8FArq?%sv8JmThb0tM)z<7vM{#=nNPH~oI>cUGHEx`OsB z+WRTV~d*%LKKvYdEUi{w>bylFAH?S4ju|>34AyIzbiLVCIA2c literal 0 HcmV?d00001 diff --git a/doc/images/logo.png b/doc/source/_static/images/logo.png similarity index 100% rename from doc/images/logo.png rename to doc/source/_static/images/logo.png diff --git a/doc/images/logo_emblem.png b/doc/source/_static/images/logo_emblem.png similarity index 100% rename from doc/images/logo_emblem.png rename to doc/source/_static/images/logo_emblem.png diff --git a/doc/images/logo_emblem.svg b/doc/source/_static/images/logo_emblem.svg similarity index 100% rename from doc/images/logo_emblem.svg rename to doc/source/_static/images/logo_emblem.svg diff --git a/doc/images/logo_white.png b/doc/source/_static/images/logo_white.png similarity index 100% rename from doc/images/logo_white.png rename to doc/source/_static/images/logo_white.png diff --git a/doc/images/logo_white.svg b/doc/source/_static/images/logo_white.svg similarity index 100% rename from doc/images/logo_white.svg rename to doc/source/_static/images/logo_white.svg diff --git a/doc/source/_static/images/nhr_verein_logo.jpg b/doc/source/_static/images/nhr_verein_logo.jpg new file mode 100644 index 0000000000000000000000000000000000000000..d1156ac64e5804beb8dc081eecc52acc453b4d8d GIT binary patch literal 8167 zcmb7oWl&sA)Aq8kID|!ly9f8+5Zonra0nJ8xNC5CcXxLS9^3*1776aI!N1M@KKJug zz5m`mQ*}F&C^r)#e1d0l*62Vlrb%18oWU;qFZ=mU6N28aP*fxy25Dsa#NLIi=} z;6TU-2=Iug$f&3&$S5di=$II2=wNgd6pXhRU@UAL930d)xOi`|@i4J*u>U%NfrZ+@ zfsjBTBy2PkH0=Mgy!HXWh=5_hFf0rO00@SG1;e}!0EnO3^r7Zx}cb5FP>c6#_to{s&^fVnF{L#{R4EBrF2EcbJ^R(GL7D8EU+vs66*< zue$1C@afU2tE_zA=(0z>O296meGQl7!wRYO^Y;u9!yCP-#~PNpV{8BpgN+TR>uPk{ zSoG8)U-=BLrOs-63q)|s_7qLHl~YI+FU$Q0am+}E;+4vBqRl~E%jFQ{hL|=*y;A+R zqXm)(%%^&d*)+X6kENZ*EI+H#eT%BI7AO7nEu>2QiiK*Kus;W$90`V2X~Y4~UA)NV z*_tkMH`|j1W3K?gPHU{76|nNxmBh`_TayTx201+1V}`%){KdGwPksjBgoU z0isvC6F=RHXN*5@SrKI{=w>vXEoKO7*~wR(BHLn#ZEHMAmM^l8f&d`f5xn#>$|3!l zK7F(D<7fM&u5MrBmqvX#HlIShiza4;BWCBNyKuAXSz_{}uDUN-Y54he#o+F25OrsvTnSq1>Wus{S?pySIC zK3~(%8F3%`gFSb*U|e=*5AalSOk}Lu#f7S6`HSy>OTVEnp;rV3iAPtd^vM@;msP5p zU&ka@KcC;hq8TQzxQrA5+Mzb^?Z%ryGs|0f=lJF=Hw)AtDnWYmALMk=@;|hrEM0Qyi8j!{O*5bp?AaQ{{)^a}(7fXBcluvar?-R{4mbs%!_>i4>JKagF#gAY57g=)sYuFk4w zE&MT)UkrUb2%o#G4e%?1DA8e3YF9PAqd04ZeBMTS-KH;+W(U6l>^|WRS?#3m5ukeS zK+P@e2ZlG>lp!lHEcckR#Z+^ z^-B8kd}E)Xo?*K;TP>)9B?N!O$kk0PCAH#`6l~;-2(S1?70}njC4d6LCfWdOy54_v 
zzgLL+xEVyiHE!QDS&&;99EBb|Z1Uy&>2#R7v39azL>(x&Gmf!DI}eUEB-R3!nM6tv zyZ*K7cY$4UugbfxbWJ0D`xn2xu(uL&iud9X6g}|^VFF2u8SWrc-aXh9^WA_BF8n_+ z4lYrTVQEd8Bog%-{|ytW7&D3OOXZU2h;LPSkEQ!hp74ysw?4Zw>d|>s`&8i6k6uQb zq|Z}1p08mKqSQs3r`t43d~uACpQooF4Bjnfa_qyNRA|#qFP{*~CdA$#mx0?J@z|zQ z()v+Cs-!2X9IZ3F!FZ8q&xmG(AmE5Nrk|Z<+k&luQyPIUhY{r*dK_@nnW$X9cm3EZ z@~&ekv$>0JFM zk+8Dk^?{fAUYO=h%GI@pFr*>(g4-rsJ~WkX{}nKo)%AVik_k3VPpm$ISc!hJiZ66@ zVxV@nIGXApCQCH@|IRsW{6A65fwu5@5vUUSdosYxun%T1ag zyxDNNWCYF(`sVv}%tTC$j+ChWHM31O*zjZ~?YdiVZEC-b)O!cA9ZLJTUU4wYpl)Ii z(oQg3?c8Uvv8|Y8<%GP|LL+lY%{m;q^T}&oR#c#|O2#b1s zgEKviwD#euYIA#omFj$W5F1Y1wjf9*da_8SusXGf7gJgrm^SZ5Dz4L@uaIw*V7B~H z1Kmp4zyKI1LnDBY{-rt?AS@gJ41&kNB4=guufyczvhT$vV;5CXH8gU_`)5CavZgT1 z@89C69XR&!EvIUEzVC8y7YnN*)*!5{^ywQX=Q!aTQA2jRle@UiU+#$+SluEr+tKj# zYB#eHZjQ7A-rFLnznD0N2ty(vN`&Z1;f9mr$u04@EVhZApoKq`8i>99iS-Shkh+wC z;lV)rqQX3hcPrxU+SW0-2TGeW1Yh$8wXRC)wB#VTF9ZRHG)eGiCm)wte^$B%37xj} zb@ki0m~iIZMkhtw&0iiA&_CHZ+*FQi?2X%gz55i0R-a&Rq*Xt?{R_oxOH#uf(R*ml zrPO*yJ8heWq%^DW^h4;ncb{x?PVaYUim1T=APn>b1KnW%BhVB9;J_G|tXLePi0vT@CH~5``gpGZVD;k1{}nzQ=2wLs%xj6|2ao8AIMqa`!ny z{>T{_EBm4gP+!yKE&*Ry`g`^VOn;}>Wd1y~XP5H6*Xe#1zL|f?624g&_gZ&5dg=#1 zJISPt2sRlp_^x_JnufP@>*J&jk5L>ucV`*xm((x`=uloDzjHn3R1XTRGzjmY+qYF~ zK#%VIEt+Cqm4O?fEV06Cxj91`YOz4JS97M7@iaeMX5u9pzEC&j@p$pF>iy>xVB7s9 z_0s+dU=?0|1)T7Gu}cbFHLz@HPnV(nJi0Inf|DFA2r8e|o4z@(VD4SILHg^^a)|wO zssL+8`_l>T-L}HtM5MWfamO(6rSSWZmMXmseG*e-CC#qts}oWqb`3}GHj!WBILbE| ze_>#M&hUXSa}zB8B)tCzo#l|*;e0qk!-tx_NA$E#^IziM)1Fi$S1bzLNxG2xr;$S`-4yaC#%u;S}ea5BrD`1E)*8NRuuC8^=Nlh{Qy*rHxS&>96P3!A% zZd{owf55EdE5K@Hcg8N#mMbavW* zkn?EC={LS}0+UGUhn4iozc*P@pk&c+gHH*vwlT|vq)`(&pv~LE#Z^_~5B2N(BpfdA zQN*Sip85T4p-_ryc9~6UXhUcCtb2g_+)AE0RXX5qLxw#7r~QzKA2wZ406vU8fSp&& zgC@54t<9$=m$$_(-0HYAiOeLQB_UKs4Jj-sRYvE(PZV!Y89XvCDCw@sHiH@@O*BY8 zp((8kB?X-$54yc$L98_2p*9iT<}zj@r6CtRWw81-wNN*%Kl>wAlD~Pjb*1kn@)Zz! 
zy*eNK$Wm}}G|^ocnqmG)zjjeTBr0hx(CK)mk-C0&OJ$2I5Qmu1uGDrx>$hP0A)^Sd zEU>xx!D%&Bry!7M2~;CVP9I8=6m%v>I8sHqNilyNYL6BYTdh&swV!LeojE=+u)v8c zyeZA!&P4>%u(PE)@I$+bQ9JagbYmiitl7%(En6J%RpKyRcoZ(d5Y-=cWW%b&xY`?4 zB_ZlM^pvr$0HQY2sQT6|EjZ2HVMU`~^>vY{1;D-?9CgtR<;g1_?x^mDPYmLPcF03m}-P^)2J`+pUfp++I-l=z>i3Nq* zh|uY!qNK+~q-9}ePN*o+uoaEB&>Pw6zUVg{Zy}qGE8J$6-zGW^Q@AA7q7CTO76!1l z*B#>)V~o~o*^FArODcIgPPW}wEX7d5i71SmmSy8&GOIFo%?BBpJm?!_*c`%{!!(sH z9CzWyC2kBu6pz$vX;-Y;?iB{l^e3Nvl8)~)cfxdcZL5-E(kZi>u0kEa2KdM4cYIPa_{mrS(TAPFq<=j!RND3(|gJUV39&>i6))Cm^ zLZ8RJi%TwbU0NS4D|m4jom|1RNHrt{4Vf)^Rv5OpM1IbT|B~;WgWP@Q_R`K~*Jr|M zK4xR&%jC^TXSz3uBRh#8Y;t| zw&tY{z>?pBCA%X9rav$A*(Kc(2urI;eeba=z~XI|!4bJEAWELd$}qQ_VW(H2AHe@; z2dg}^N+p6G6rzqoW9(>tGQ%ljJ@iUIATVz#Gk@KU~tk`{U1)XX<2xaIg3V zBVV@SJ`7|dm?(uD&_xkWmHTE8AU971XrPVWm?@I1wIo4Im>mQE{GLooi<*)OTX*67 zG^gg#df2K{^DVcg9AoS}!SQf{% z7xdAzc>_@qs@oi$V#_@$$hc0z{T*tPE$aA8vOwOW?kZ7lKj)WWcRo0cHJ{#X7TvB_ZEYKaH0RN(aUMA&Q}0ZOoK^B6lxJZrqdYy^TX5M5S-8XG4$+a zHes=Ie#>3N#|(9ACg%wM5q&%A1bINid}-tr^o;VkvxpLH9Y7{n>G&esBI@RG>!%$F zsvIfql2Yn$p#&1+?bd@1=M^g%X`Kh0A3ro@)P-f?mi&}Oj{IK!Uh6u7k>Ek-cTZ5w zL@XNMJ(oyfL$I*oeu!Lun;~6++OAFC@DhrXlGMQAItWBW(oR90v4f)K32%2o48al1 zrK<^c`MxqfXhl=P%AwIH7+RB0*Pm=`*Vv%jNBI24K6MyQ9mPJR!!NS-{o@!Ci12() zb+v9pVjt@ald6LoCF3knsJO=6!FnA-Tn^T}X77c5fQ8ihsXp?{S&^8{Ls;D0gA(nS zgjeC!iW1itw^F~|%kx9do6E5rWOJUDQ|tFi3+~M3BMKE48Qwhp`dCd4BLWIrbKZvM zmj0eR2)X@D0S4oyyzg3~2RdE>A;2-Ng5MEjo&K#AseN4p@1sJn(TR~MV^u=2tl4G^ zJZzTI-}3^)_bgBWMWdZ7ZDa3nkv>=0YvNegerish+{eN2w;R`+KVvELOR+T)kG0V| z_n4j65oDJQQJb%yZV5qFw)yTB18<5Y8Cz9^B4w3k;A~|ewMjeIueaz22?;KeZPM(p zakx;e{i-jcYf#kTY)mb+Om?Y;<7$UyXp9q>tH+mrdSN=^9$jW2*LiY=V|Y}=r5$iH z9PpgmDO;1a{Ql-oR{R9h==ZAB71&XNY8Sub!FwKTgRk{P9TCUHzf>eq+mX1q=Rfy0 z)CwlsJF<{6TP85ybM59CWh}1}9tjDQ9;-nXb@99wKZkwgUYj9pi(`UM@idSLj!&Gc zu8|EsuE(jeJ@{T3V>4Z~^HPiiZ~g5fN5;1|_{`-4FW(I;$Rno%iq@klA_9ZCKd^p9 z49$a>8&ye3ZWrV9mc{q9=sU!R68G|p)rT6r30OMem-jbi8Sb=V+89vpvg*vF2*9fI zME*%_G5z$=V{{`Pb~rqOONJdn%0(*z-`Bt0V00?4}mL4M@ZBL_x;n=l-1S6 zbs5v+)f8_2vHxPlS{Ef{*c1q%d0`QaaWfOb7#067~58LKF?->K`J zIT!Jd`~Ph##uoQnIUu%|LgZr2PIh7RZ zVion-iA-+06mqDAb^jrd*FUXRu(a>s>O!oXh+oNo6Fd;Cs}n*ef8u0mCTyufX)LW` zG%WGn;F(u#9V0U>b=;}A2yN2#vPpdVO;dCG%F9lC7Q?C2FFJ`*6XW3qlMkEy5UE>2ozlJcq0N#AfdJcv2WQH7 zCu#RxqOIzSo4-gYY)@ zQT3xE=TMl*v?_~9*w7X_!tvQtD(}zvL{gR2CA=D|AYbZ$!`gjh3sd#^ag%Ob7TsNi zy5)Q$Wfpi%jc0`a5c)rS4)bq6{dc`Yp^Q#u01hO|*8A61CNNsDe~pIi&p`KzkHJ*&5j;)PrvyR`0*;%;>tolNlaHwGBx z5L3`f;g7-@pYhsyK~@{R<`GOxw7FDcEBj z_H7177VpWEy_{k-zrt>*DLFIu_>z;X`U>8wjbAoh$9MkthDX9NwRe$zk*j=6dmwQ=o>m{?y-e>lH4>o*>JMEc6R0zuJ;3`*xX# zA*DIxql0|iLGe!Jwq`$Vx>(V6AVT?QZI|hj*jDDi56s73E57M(43l0U(ql~IJNdp1 zXR^oD<=Pr;ei|;tSPUtKtDo=+!uH)!OuyMO8^^sXU)3YpmSGHvA$)Xxy4xw4{aVU; z`sOKfc0rqHj;_z0PUh1h=>%9LS9~&aJ=SGRwQsF0FJa9={)@Lf@L z1OsuKry9({=@G-~5i}a5!i~b1(%6eb;|#y?{86*|=vTm66oClkSLvuKgH47Piqb6R z00McGD5J3e*Atn~*uw_R*M6nCJ(>zFn+5LQ4|mTzBBX(ghkq%by4D zt=`S~n?9jON%x^D=>&3h7YjYWXJ{}gP)ue*-M2StLX ziMJf9B2%%m%x*-K z2U3Mcj1k|GO|)xdmHlq%;+7pU_GIIFa4iKxPr}glw!^JO5&MJ^YZWQpo5Zd;y9KEj ze~hhE+;+|NgR4HqAyHi-56w12sN4^wPMTt#;4KTxD`Ddg6?Nk^n5Ge}#hn0AE|PJX znUNzAtjLnw!3X=Dr>yz;!5R+bN9V%vN7)VI7?984n?dkAXV~8M!YmkmM%2x-RJ6n( z5z!s~-rD%ff+m*Q@@RTp3x3QLjUPUr40(dA<9G(6lvRTqh1oJ9XxMl-`A53=84NlD zc8Et{I?%c&9Ijv5)^bs#HZjN=a}>nE$`%}5p9hGOcYQ~ROVnC8QA9S)crh#&L=K6J zFGDa+wZUAES2(APR4x~725MwBOAbh%@eDTnvY~cg>Z6Nkdi%NX6lKEz+i8LJt;oQ2 z+bxmiF9*%li=QlOm#WbX5redr$#IIPr?T&-1NPfLk}wBP!#(YXEPefDnh_bokGYl)>`)k$h>_)kVP&#(5gjDP!b;G@ zyi(RmlIXB>q2$~MV-JML*jq7Gg09{lxR4fAs*Ef@z|FDj zHHR==B;e12TM+-`D(e4bZAK=!C14XTqqmDs*{CmQrN;q`(D-FqhwXP--JI 
z87cch1v?IdW>%peQ)9c$ak{9?5m64wMPMc!1Gz2CZ+w1r%8k~&kEd+Btf!Thdl0dC zLry%5uBwRfU7Z0Fd1T{NGN_q!BWgI=Mta{rao6fG=Bw#5ks%xWC2kEB$8g;2&Q7pbIGd9 z2gC;&CekvmS*@|s(DKVbI3~kkD?^iOw@c_FbFsE0)-+^*_cT`*(hWqDTAt}_E%c!| zBz~k!(j`Vn%s34UCSqhK$QiV;DQ9oQ_=*rd1JpNql`D0hH0xV!4sy%;`n9C_r$Y^! zW$eKOLaHX9KwXR40Y2jorTXTQ%uF(=VB;P7NL(^;+z?4ECT&rPB<)XK!Q+VigXM6x wVWMiaygR|7ds6$j9%QQ*vWaAotrE9fZqs@>%R^yYj$KN}X9uDLzOPIF57k*5AOHXW literal 0 HcmV?d00001 diff --git a/doc/source/_static/images/perun_logo.svg b/doc/source/_static/images/perun_logo.svg new file mode 100644 index 0000000000..ff794fba41 --- /dev/null +++ b/doc/source/_static/images/perun_logo.svg @@ -0,0 +1,112 @@ + + + + diff --git a/doc/images/split_array.png b/doc/source/_static/images/split_array.png similarity index 100% rename from doc/images/split_array.png rename to doc/source/_static/images/split_array.png diff --git a/doc/images/split_array.svg b/doc/source/_static/images/split_array.svg similarity index 100% rename from doc/images/split_array.svg rename to doc/source/_static/images/split_array.svg diff --git a/doc/images/tutorial_clustering.svg b/doc/source/_static/images/tutorial_clustering.svg similarity index 100% rename from doc/images/tutorial_clustering.svg rename to doc/source/_static/images/tutorial_clustering.svg diff --git a/doc/images/tutorial_dpnn.svg b/doc/source/_static/images/tutorial_dpnn.svg similarity index 100% rename from doc/images/tutorial_dpnn.svg rename to doc/source/_static/images/tutorial_dpnn.svg diff --git a/doc/images/tutorial_logo.svg b/doc/source/_static/images/tutorial_logo.svg similarity index 100% rename from doc/images/tutorial_logo.svg rename to doc/source/_static/images/tutorial_logo.svg diff --git a/doc/images/tutorial_split_dndarray.svg b/doc/source/_static/images/tutorial_split_dndarray.svg similarity index 100% rename from doc/images/tutorial_split_dndarray.svg rename to doc/source/_static/images/tutorial_split_dndarray.svg diff --git a/doc/images/weak_scaling_gpu_terrabyte.png b/doc/source/_static/images/weak_scaling_gpu_terrabyte.png similarity index 100% rename from doc/images/weak_scaling_gpu_terrabyte.png rename to doc/source/_static/images/weak_scaling_gpu_terrabyte.png diff --git a/doc/source/case_studies.rst b/doc/source/case_studies.rst index 61ec3a1983..184e11571f 100644 --- a/doc/source/case_studies.rst +++ b/doc/source/case_studies.rst @@ -5,7 +5,7 @@ Case Studies .. container:: case-image - .. image:: ../images/fzj_logo.svg + .. image:: _static/images/fzj_logo.svg .. container:: case-text @@ -17,7 +17,7 @@ Case Studies .. container:: case-image - .. image:: ../images/dlr_logo.svg + .. image:: _static/images/dlr_logo.svg .. container:: case-text @@ -29,7 +29,7 @@ Case Studies .. container:: case-image - .. image:: ../images/kit_logo.svg + .. image:: _static/images/kit_logo.svg .. container:: case-text diff --git a/doc/source/conf.py b/doc/source/conf.py index 2cadd075ba..c2da12b04f 100644 --- a/doc/source/conf.py +++ b/doc/source/conf.py @@ -21,8 +21,6 @@ import os import sys -import sphinx_rtd_theme -from sphinx.ext.napoleon.docstring import NumpyDocstring, GoogleDocstring # sys.path.insert(0, os.path.abspath('.')) sys.path.insert(0, os.path.abspath("../../heat")) @@ -46,6 +44,7 @@ "sphinx.ext.napoleon", "sphinx.ext.mathjax", "sphinx_copybutton", + "nbsphinx", ] # Document Python Code @@ -133,7 +132,7 @@ def setup(sphinx): # # This is also used if you do content translation via gettext catalogs. # Usually you set "language" from the command line for these cases. 
-language = None +language = "en" # There are two options for replacing |today|: either, you set today to some # non-false value, then it is used: @@ -209,7 +208,7 @@ def setup(sphinx): # The name of an image file (relative to this directory) to place at the top # of the sidebar. # -html_logo = "../images/logo_emblem.png" +html_logo = "_static/images/logo_emblem.png" # The name of an image file (relative to this directory) to use as a favicon of # the docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 @@ -409,3 +408,17 @@ def setup(sphinx): # If true, do not generate a @detailmenu in the "Top" node's menu. # # texinfo_no_detailmenu = False + +# NBSPHINX +nbsphinx_execute = "never" +nbsphinx_thumbnails = { + "tutorials/notebooks/0_setup/0_setup_jsc": "_static/images/jsc_logo.png", + "tutorials/notebooks/0_setup/0_setup_local": "_static/images/local_laptop.png", + "tutorials/notebooks/0_setup/0_setup_haicore": "_static/images/nhr_verein_logo.jpg", + "tutorials/notebooks/1_basics": "_static/images/logo_emblem.png", + "tutorials/notebooks/2_internals": "_static/images/tutorial_split_dndarray.svg", + "tutorials/notebooks/3_loading_preprocessing": "_static/images/jupyter.png", + "tutorials/notebooks/4_matrix_factorizations": "_static/images/hSVD_bench_rank5.png", + # "tutorials/notebooks/5_clustering": "_static/images/tutorial_split_dndarray.svg", + "tutorials/notebooks/6_profiling": "_static/images/perun_logo.svg", +} diff --git a/doc/source/index.rst b/doc/source/index.rst index a176a45340..08b96be0f3 100644 --- a/doc/source/index.rst +++ b/doc/source/index.rst @@ -16,7 +16,7 @@ Heat is a distributed tensor framework for high performance data analytics. introduction getting_started - tutorials + tutorials/tutorials case_studies documentation_howto diff --git a/doc/source/tutorial_dpnn.rst b/doc/source/tutorial_dpnn.rst deleted file mode 100644 index 7cffac3b59..0000000000 --- a/doc/source/tutorial_dpnn.rst +++ /dev/null @@ -1,4 +0,0 @@ -Data-parallel Neural Networks -============================= - -1 diff --git a/doc/source/tutorials/notebooks/0_setup/0_setup_conda.sh b/doc/source/tutorials/notebooks/0_setup/0_setup_conda.sh new file mode 100755 index 0000000000..231c62c24c --- /dev/null +++ b/doc/source/tutorials/notebooks/0_setup/0_setup_conda.sh @@ -0,0 +1,15 @@ +#!/bin/sh + +## 1. If necessary, install conda: https://www.anaconda.com/docs/getting-started/miniconda/install + + +## 2. Setup conda environment +conda create --name heat-env python=3.11 +conda activate heat-env || exit 1 +conda install -c conda-forge heat xarray jupyter scikit-learn ipyparallel + +## 3. Setup kernel +python -m ipykernel install --user --name=heat-env + +## 3. 
+jupyter notebook
diff --git a/doc/source/tutorials/notebooks/0_setup/0_setup_haicore.ipynb b/doc/source/tutorials/notebooks/0_setup/0_setup_haicore.ipynb
new file mode 100644
index 0000000000..6e4662a701
--- /dev/null
+++ b/doc/source/tutorials/notebooks/0_setup/0_setup_haicore.ipynb
@@ -0,0 +1,620 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "editable": true,
+ "slideshow": {
+ "slide_type": ""
+ },
+ "tags": []
+ },
+ "source": [
+ "# Setting up a parallel notebook with Heat, SLURM, and ipyparallel on HAICORE/Horeka"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "editable": true,
+ "slideshow": {
+ "slide_type": ""
+ },
+ "tags": []
+ },
+ "source": [
+ "The original version of this tutorial was inspired by the [CS228 tutorial](https://github.com/kuleshov/cs228-material/blob/master/tutorials/python/cs228-python-tutorial.ipynb) by Volodymyr Kuleshov and Isaac Caswell."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "editable": true,
+ "slideshow": {
+ "slide_type": ""
+ },
+ "tags": []
+ },
+ "source": [
+ "\n",
+ "\n",
+ "\n",
+ "## Introduction\n",
+ "---\n",
+ "
\n", + "Note:\n", + "This notebook expects that you will be working on the JupyterLab hosted in HAICORE, at the Karlsruhe Institute of Technology.\n", + "\n", + "If you want to run the tutorial on your local machine, or on another systems, please refer to the local setup notebook in this repository for reference, or to our notebook gallery for more examples.\n", + "
\n", + "\n", + "
\n", + " \n", + "
\n", + "\n", + "\n", + "## Setting up the environment\n", + "\n", + "The rest of this tutorial assumes you have started a JupyterLab at [Jupyter for HAICORE](https://haicore-jupyter.scc.kit.edu/) with the following parameters:\n", + "\n", + "| **Resources** | |\n", + "| --- | --- |\n", + "| Nodes | 1 |\n", + "| GPUs | 4 |\n", + "| Runtime (hours) | 4 |\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "### Resources\n", + "\n", + "We will be running the tutorial on the GPU partition of the [HAICORE](https://www.nhr.kit.edu/userdocs/haicore/hardware/) cluster, with the following hardware:\n", + "\n", + "- 2× Intel Xeon Platinum 8368, 2 × 38 cores\n", + "- 4x NVIDIA A100-40\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "### Setup environment" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The first step is to load (and unload) the right modules on HAICORE+Jupyter. \n", + "\n", + "On the left bar on Jupyter Lab, open the modules tab, and make to unload any ```jupyter``` modules, and the load ```mpi/openmpi/4.1``` and ```devel/cuda/12.4```.\n", + "\n", + "Afterwards, run the cell below." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\n", + "Currently Loaded Modules:\n", + "i/4.1dot 3) numlib/mkl/2022.0.2 5) mpi/openmp\n", + " 2) compiler/intel/2023.1.0 4) devel/cuda/12.4 (E)\n", + "\n", + " Where:\n", + " E: Experimental\n", + "\n", + " \n", + "\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: heat in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (1.5.1)\n", + "Requirement already satisfied: mpi4py>=3.0.0 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from heat) (4.0.3)\n", + "Requirement already satisfied: numpy<2,>=1.22.0 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from heat) (1.26.4)\n", + "Requirement already satisfied: torch<2.6.1,>=2.0.0 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from heat) (2.6.0)\n", + "Requirement already satisfied: scipy>=1.10.0 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from heat) (1.15.3)\n", + "Requirement already satisfied: pillow>=6.0.0 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from heat) (11.2.1)\n", + "Requirement already satisfied: torchvision<0.21.1,>=0.15.2 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from heat) (0.21.0)\n", + "Requirement already satisfied: filelock in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (3.18.0)\n", + "Requirement already satisfied: typing-extensions>=4.10.0 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (4.13.2)\n", + "Requirement already satisfied: networkx in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (3.4.2)\n", + "Requirement already satisfied: jinja2 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (3.1.6)\n", + "Requirement already satisfied: fsspec in 
/home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (2025.3.2)\n", + "Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.4.127 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (12.4.127)\n", + "Requirement already satisfied: nvidia-cuda-runtime-cu12==12.4.127 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (12.4.127)\n", + "Requirement already satisfied: nvidia-cuda-cupti-cu12==12.4.127 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (12.4.127)\n", + "Requirement already satisfied: nvidia-cudnn-cu12==9.1.0.70 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (9.1.0.70)\n", + "Requirement already satisfied: nvidia-cublas-cu12==12.4.5.8 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (12.4.5.8)\n", + "Requirement already satisfied: nvidia-cufft-cu12==11.2.1.3 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (11.2.1.3)\n", + "Requirement already satisfied: nvidia-curand-cu12==10.3.5.147 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (10.3.5.147)\n", + "Requirement already satisfied: nvidia-cusolver-cu12==11.6.1.9 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (11.6.1.9)\n", + "Requirement already satisfied: nvidia-cusparse-cu12==12.3.1.170 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (12.3.1.170)\n", + "Requirement already satisfied: nvidia-cusparselt-cu12==0.6.2 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (0.6.2)\n", + "Requirement already satisfied: nvidia-nccl-cu12==2.21.5 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (2.21.5)\n", + "Requirement already satisfied: nvidia-nvtx-cu12==12.4.127 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (12.4.127)\n", + "Requirement already satisfied: nvidia-nvjitlink-cu12==12.4.127 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (12.4.127)\n", + "Requirement already satisfied: triton==3.2.0 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (3.2.0)\n", + "Requirement already satisfied: sympy==1.13.1 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (1.13.1)\n", + "Requirement already satisfied: mpmath<1.4,>=1.1.0 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from sympy==1.13.1->torch<2.6.1,>=2.0.0->heat) (1.3.0)\n", + "Requirement already satisfied: MarkupSafe>=2.0 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from jinja2->torch<2.6.1,>=2.0.0->heat) (3.0.2)\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "WARNING: There was an error checking the latest version of pip.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: ipyparallel in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (9.0.1)\n", + "Requirement already satisfied: decorator in 
/home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipyparallel) (5.2.1)\n", + "Requirement already satisfied: ipykernel>=6.9.1 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipyparallel) (6.29.5)\n", + "Requirement already satisfied: ipython>=5 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipyparallel) (9.2.0)\n", + "Requirement already satisfied: jupyter-client>=7 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipyparallel) (8.6.3)\n", + "Requirement already satisfied: psutil in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipyparallel) (7.0.0)\n", + "Requirement already satisfied: python-dateutil>=2.1 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipyparallel) (2.9.0.post0)\n", + "Requirement already satisfied: pyzmq>=25 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipyparallel) (26.4.0)\n", + "Requirement already satisfied: tornado>=6.1 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipyparallel) (6.4.2)\n", + "Requirement already satisfied: tqdm in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipyparallel) (4.67.1)\n", + "Requirement already satisfied: traitlets>=5 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipyparallel) (5.14.3)\n", + "Requirement already satisfied: comm>=0.1.1 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipykernel>=6.9.1->ipyparallel) (0.2.2)\n", + "Requirement already satisfied: debugpy>=1.6.5 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipykernel>=6.9.1->ipyparallel) (1.8.14)\n", + "Requirement already satisfied: jupyter-core!=5.0.*,>=4.12 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipykernel>=6.9.1->ipyparallel) (5.7.2)\n", + "Requirement already satisfied: matplotlib-inline>=0.1 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipykernel>=6.9.1->ipyparallel) (0.1.7)\n", + "Requirement already satisfied: nest-asyncio in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipykernel>=6.9.1->ipyparallel) (1.6.0)\n", + "Requirement already satisfied: packaging in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipykernel>=6.9.1->ipyparallel) (25.0)\n", + "Requirement already satisfied: ipython-pygments-lexers in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipython>=5->ipyparallel) (1.1.1)\n", + "Requirement already satisfied: jedi>=0.16 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipython>=5->ipyparallel) (0.19.2)\n", + "Requirement already satisfied: pexpect>4.3 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipython>=5->ipyparallel) (4.9.0)\n", + "Requirement already satisfied: prompt_toolkit<3.1.0,>=3.0.41 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipython>=5->ipyparallel) (3.0.51)\n", + "Requirement already satisfied: pygments>=2.4.0 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipython>=5->ipyparallel) (2.19.1)\n", + "Requirement already satisfied: stack_data in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipython>=5->ipyparallel) (0.6.3)\n", + "Requirement already satisfied: typing_extensions>=4.6 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from 
ipython>=5->ipyparallel) (4.13.2)\n", + "Requirement already satisfied: six>=1.5 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from python-dateutil>=2.1->ipyparallel) (1.17.0)\n", + "Requirement already satisfied: parso<0.9.0,>=0.8.4 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from jedi>=0.16->ipython>=5->ipyparallel) (0.8.4)\n", + "Requirement already satisfied: platformdirs>=2.5 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from jupyter-core!=5.0.*,>=4.12->ipykernel>=6.9.1->ipyparallel) (4.3.8)\n", + "Requirement already satisfied: ptyprocess>=0.5 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from pexpect>4.3->ipython>=5->ipyparallel) (0.7.0)\n", + "Requirement already satisfied: wcwidth in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from prompt_toolkit<3.1.0,>=3.0.41->ipython>=5->ipyparallel) (0.2.13)\n", + "Requirement already satisfied: executing>=1.2.0 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from stack_data->ipython>=5->ipyparallel) (2.2.0)\n", + "Requirement already satisfied: asttokens>=2.1.0 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from stack_data->ipython>=5->ipyparallel) (3.0.0)\n", + "Requirement already satisfied: pure-eval in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from stack_data->ipython>=5->ipyparallel) (0.2.3)\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "WARNING: There was an error checking the latest version of pip.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Installed kernelspec myEnv in /hkfs/home/haicore/scc/io3047/.local/share/jupyter/kernels/myenv\n" + ] + } + ], + "source": [ + "%%bash\n", + "# Report modules\n", + "ml list\n", + "\n", + "# Create a virtual environment\n", + "python3.11 -m venv heat-env\n", + "source heat-env/bin/activate\n", + "pip install heat[hdf5] ipyparallel xarray matplotlib scikit-learn perun[nvidia]\n", + "\n", + "python -m ipykernel install \\\n", + " --user \\\n", + " --name heat-env \\\n", + " --display-name \"heat-env\"\n", + "deactivate" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "To be able to run this tutorial interactively for parallel computing, we need to start an [IPython cluster](https://ipyparallel.readthedocs.io/en/latest/tutorial/process.html).\n", + "\n", + "\n", + "In the terminal, type:\n", + "\n", + "```bash\n", + "ipcluster start -n 4 --engines=MPI --MPILauncher.mpi_args=\"--oversubscribe\"\n", + "```\n", + "On your terminal, you should see something like this:\n", + "\n", + "```bash\n", + "2024-03-04 16:30:24.740 [IPController] Registering 4 new hearts\n", + "2024-03-04 16:30:24.740 [IPController] registration::finished registering engine 0:63ac2343-f1deab70b14c0e14ca4c1630 in 5672ms\n", + "2024-03-04 16:30:24.740 [IPController] engine::Engine Connected: 0\n", + "2024-03-04 16:30:24.744 [IPController] registration::finished registering engine 3:673ce83c-eb7ccae6c69c52382c8349c1 in 5397ms\n", + "2024-03-04 16:30:24.744 [IPController] engine::Engine Connected: 3\n", + "2024-03-04 16:30:24.745 [IPController] registration::finished registering engine 1:d7936040-5ab6c117b845850a3103b2e8 in 5627ms\n", + "2024-03-04 16:30:24.745 [IPController] engine::Engine Connected: 1\n", + "2024-03-04 16:30:24.745 [IPController] registration::finished registering 
engine 2:ca57a419-2f2c89914a6c17865103c3e7 in 5508ms\n", + "2024-03-04 16:30:24.745 [IPController] engine::Engine Connected: 2\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "Note:\n", + "You must now reload the kernel to be able to access the IPython cluster.\n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To be able to start working with Heat on an HPC cluster, we first need to check the health of the available processes. We will use `ipyparallel` for this. For a great intro on `ipyparallel` usage on our supercomputers, check out Jan Meinke's tutorial [\"Interactive Parallel Computing with IPython Parallel\"](https://gitlab.jsc.fz-juelich.de/sdlbio-courses/hpc-python/-/blob/master/06_LocalParallel.ipynb) or the [ipyparallel docs](https://ipyparallel.readthedocs.io/en/latest/)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from ipyparallel import Client\n", + "rc = Client(profile=\"default\")\n", + "rc.wait_for_engines(4)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Earlier, we have started an IPython cluster with 4 processes. We can now check if the processes are available." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[0, 1, 2, 3]" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "rc.ids" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `px` magic command allows you to execute Python commands or a Jupyter cell on the ipyparallel engines interactively ([%%px documentation](https://ipyparallel.readthedocs.io/en/latest/tutorial/magics.html))." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can now finally import `heat` on our 4-process cluster." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "%px: 100%|██████████| 4/4 [00:01<00:00, 2.77tasks/s]\n" + ] + } + ], + "source": [ + "%%px\n", + "import heat as ht" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%px\n", + "ht.use_device(\"gpu\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:3]: \u001b[0m\n", + "tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n", + " [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]], device='cuda:1')" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 3, + "engine_uuid": "f2f42775-a79bbfdf74b1451745b1b33b", + "error": null, + "execute_input": "x = ht.ones((10,10), split=0)\nx.larray\n", + "execute_result": { + "data": { + "text/plain": "tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]], device='cuda:1')" + }, + "execution_count": 3, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-13T13:57:17.136864Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:3]: \u001b[0m\n", + "tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n", + " [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n", + " [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]], device='cuda:0')" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "41c89ea2-836a289f0df22369ee3a4a41", + "error": null, + "execute_input": "x = ht.ones((10,10), 
split=0)\nx.larray\n", + "execute_result": { + "data": { + "text/plain": "tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]], device='cuda:0')" + }, + "execution_count": 3, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-13T13:57:17.136703Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[2:3]: \u001b[0m\n", + "tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n", + " [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]], device='cuda:0')" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 2, + "engine_uuid": "9a961c9d-e3973d86ed7923c48e730123", + "error": null, + "execute_input": "x = ht.ones((10,10), split=0)\nx.larray\n", + "execute_result": { + "data": { + "text/plain": "tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]], device='cuda:0')" + }, + "execution_count": 3, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-13T13:57:17.136816Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:3]: \u001b[0m\n", + "tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n", + " [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n", + " [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]], device='cuda:1')" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 1, + "engine_uuid": "db282267-fc4496217b3a865d3c3b5ae8", + "error": null, + "execute_input": "x = ht.ones((10,10), split=0)\nx.larray\n", + "execute_result": { + "data": { + "text/plain": "tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]], device='cuda:1')" + }, + "execution_count": 3, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-13T13:57:17.136769Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "x = ht.ones((10,10), split=0)\n", + "x.larray\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:0] True\n", + "2\n", + "0,1\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:2] True\n", + "2\n", + "0,1\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:3] True\n", + "2\n", + "0,1\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:1] True\n", + "2\n", + "0,1\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "import torch\n", + "import os\n", + "\n", + "print(torch.cuda.is_available())\n", + "print(torch.cuda.device_count())\n", + "print(os.environ[\"CUDA_VISIBLE_DEVICES\"])" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python (heat_nb_env)", + "language": "python", + "name": "myenv" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": 
"text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.2" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/tutorials/hpc/1_intro.ipynb b/doc/source/tutorials/notebooks/0_setup/0_setup_jsc.ipynb similarity index 78% rename from tutorials/hpc/1_intro.ipynb rename to doc/source/tutorials/notebooks/0_setup/0_setup_jsc.ipynb index 34ad179a7c..ee00ae6115 100644 --- a/tutorials/hpc/1_intro.ipynb +++ b/doc/source/tutorials/notebooks/0_setup/0_setup_jsc.ipynb @@ -4,8 +4,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Heat Tutorial\n", - "---" + "# Setting up a parallel notebook with SLURM, ipyparallel at JSC" ] }, { @@ -23,7 +22,7 @@ "source": [ "\n", "
\n", - " \n", + " \n", "
\n", "\n", "## Introduction\n", @@ -80,13 +79,6 @@ "Navigate to `$HOME`, then `tutorials/hpc`. Open `1_intro.ipynb`." ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, { "cell_type": "markdown", "metadata": {}, @@ -197,9 +189,9 @@ "metadata": {}, "source": [ "
\n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", "
" ] }, @@ -231,7 +223,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "metadata": { "tags": [] }, @@ -242,7 +234,7 @@ "[0, 1, 2, 3]" ] }, - "execution_count": 3, + "execution_count": null, "metadata": {}, "output_type": "execute_result" } @@ -267,59 +259,11 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "metadata": { "tags": [] }, - "outputs": [ - { - "data": { - "text/plain": [ - "[stderr:3] /p/scratch/training2404/jupyter/kernels/heat1.3.1/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", - " from .autonotebook import tqdm as notebook_tqdm\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "[stderr:2] /p/scratch/training2404/jupyter/kernels/heat1.3.1/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", - " from .autonotebook import tqdm as notebook_tqdm\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "[stderr:0] /p/scratch/training2404/jupyter/kernels/heat1.3.1/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", - " from .autonotebook import tqdm as notebook_tqdm\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "[stderr:1] /p/scratch/training2404/jupyter/kernels/heat1.3.1/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", - " from .autonotebook import tqdm as notebook_tqdm\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "%px: 100%|██████████| 4/4 [00:07<00:00, 1.96s/tasks]\n" - ] - } - ], + "outputs": [], "source": [ "%px import heat as ht" ] @@ -327,9 +271,9 @@ ], "metadata": { "kernelspec": { - "display_name": "heat1.3.1", + "display_name": "heat-dev-311", "language": "python", - "name": "heat1.3.1" + "name": "python3" }, "language_info": { "codemirror_mode": { @@ -341,7 +285,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.4" + "version": "3.11.8" } }, "nbformat": 4, diff --git a/tutorials/local/1_intro.ipynb b/doc/source/tutorials/notebooks/0_setup/0_setup_local.ipynb similarity index 79% rename from tutorials/local/1_intro.ipynb rename to doc/source/tutorials/notebooks/0_setup/0_setup_local.ipynb index 16da4e7563..8656c09896 100644 --- a/tutorials/local/1_intro.ipynb +++ b/doc/source/tutorials/notebooks/0_setup/0_setup_local.ipynb @@ -4,8 +4,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Heat Tutorial\n", - "---" + "# Running Heat in parallel on a Jupyter Notebook" ] }, { @@ -21,20 +20,12 @@ "source": [ "\n", "
\n", - " \n", + " \n", "
\n", "\n", - "## Introduction\n", - "---\n", + "## Installation\n", "\n", - "\n", - "This tutorial is designed to run on your local machine. Generally, to run Heat you need an [MPI](https://hpc-tutorials.llnl.gov/mpi/) installation, and a Python environment with PyTorch and mpi4py. The easiest way to set up such an environment is to install `heat` via conda: \n", - "\n", - "```shell\n", - "conda create --name heat-env python=3.11\n", - "conda activate heat-env\n", - "conda install -c conda-forge heat\n", - "```\n" + "Run the scripts to install our notebook setup, either with ```pip``` or with anaconda. They can be found ```doc/source/tutorials/notebooks/0_setup```." ] }, { @@ -43,9 +34,15 @@ "source": [ "## Setting up the IPyParallel environment\n", "\n", - "In this tutorial, we want to demonstrate how Heat distributes arrays and operations across multiple MPI processes. We can do this interactively in a Jupyter Notebook using IPyParallel. In your virtual environment, install the following packages:\n", + "In this tutorial, we want to demonstrate how Heat distributes arrays and operations across multiple MPI processes. We can do this interactively in a Jupyter Notebook using IPyParallel.\n", + "\n", + "Now you can select the `heat-env` kernel when creating a new notebook.\n", + "\n", + "Finally, you need to start an IPyParallel cluster to access multiple MPI processes. You can do this by running the following command in a terminal inside the jupyter:\n", "\n", "```bash\n", +<<<<<<< HEAD:doc/source/tutorials/notebooks/0_setup/0_setup_local.ipynb +======= "conda install ipyparallel jupyter\n", "```" ] @@ -76,6 +73,7 @@ "Finally, you need to start an IPyParallel cluster to access multiple MPI processes. You can do this by running the following command in a terminal:\n", "\n", "```bash\n", +>>>>>>> stable:tutorials/local/1_intro.ipynb "ipcluster start -n 4 --engines=mpi\n", "```\n", "\n", @@ -93,7 +91,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "metadata": { "tags": [] }, @@ -112,7 +110,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "metadata": { "tags": [] }, @@ -123,7 +121,7 @@ "[0, 1, 2, 3]" ] }, - "execution_count": 3, + "execution_count": null, "metadata": {}, "output_type": "execute_result" } @@ -148,7 +146,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "metadata": { "tags": [] }, @@ -156,7 +154,7 @@ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": "01f69af457ad4f9e818c5571fa4d17b3", + "model_id": "d51a2e5dfcad4264a317401c208b0f6d", "version_major": 2, "version_minor": 0 }, @@ -173,18 +171,18 @@ ] }, { - "cell_type": "code", - "execution_count": null, + "cell_type": "markdown", "metadata": {}, - "outputs": [], - "source": [] + "source": [ + "The server can be ```ipcluster``` server can be stopped by stopping the command with CTRL+C." + ] } ], "metadata": { "kernelspec": { - "display_name": "heat_env", + "display_name": "heat-dev-311", "language": "python", - "name": "heat_env" + "name": "python3" }, "language_info": { "codemirror_mode": { diff --git a/doc/source/tutorials/notebooks/0_setup/0_setup_pip.sh b/doc/source/tutorials/notebooks/0_setup/0_setup_pip.sh new file mode 100755 index 0000000000..f2e21c518a --- /dev/null +++ b/doc/source/tutorials/notebooks/0_setup/0_setup_pip.sh @@ -0,0 +1,25 @@ +#!/bin/sh + +# 1. If necessary, install openmpi +# Heat can also be installed with pip, but ```openmpi``` has to be available on the system. 
+# To install ```openmpi``` on Linux/macOS:
+
+# Ubuntu
+# sudo apt install openmpi-bin libopenmpi-dev
+
+# Arch
+# sudo pacman -S openmpi
+
+# MacOS
+# brew install openmpi
+
+# 2. Create environment and install dependencies
+python -m venv heat-env
+# shellcheck disable=SC1091
+. heat-env/bin/activate || exit 1
+pip install heat xarray jupyter scikit-learn ipyparallel
+
+# 3. Setup jupyter kernel
+python -m ipykernel install --user --name=heat-env
+
+# 4. Start jupyter
+jupyter notebook
diff --git a/doc/source/tutorials/notebooks/1_basics.ipynb b/doc/source/tutorials/notebooks/1_basics.ipynb
new file mode 100644
index 0000000000..73d3c48b84
--- /dev/null
+++ b/doc/source/tutorials/notebooks/1_basics.ipynb
@@ -0,0 +1,3165 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Heat Basics"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "## What is Heat for?\n",
+ "\n",
+ "\n",
+ "\n",
+ "Straight from our [GitHub repository](https://github.com/helmholtz-analytics/heat):\n",
+ "\n",
+ "Heat builds on [PyTorch](https://pytorch.org/) and [mpi4py](https://mpi4py.readthedocs.io) to provide high-performance computing infrastructure for memory-intensive applications within the NumPy/SciPy ecosystem.\n",
+ "\n",
+ "\n",
+ "With Heat you can:\n",
+ "- port existing NumPy/SciPy code from single-CPU to multi-node clusters with minimal coding effort;\n",
+ "- exploit the entire, cumulative RAM of your many nodes for memory-intensive operations and algorithms;\n",
+ "- run your NumPy/SciPy code on GPUs (CUDA, ROCm, limited support for Apple MPS).\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Why?\n",
+ "\n",
+ "- significant **scalability** with respect to task-parallel frameworks;\n",
+ "- analysis of massive datasets without breaking them up into artificially independent chunks;\n",
+ "- ease of use: script and test on your laptop, port straight to an HPC cluster; \n",
+ "- PyTorch-based: GPU support beyond the CUDA ecosystem."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "
\n", + " \n", + " \n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Connecting to ipyparallel cluster\n", + "\n", + "We have started an `ipcluster` with 4 engines at the end of the [Setup notebook](0_setup/0_setup_local.ipynb).\n", + "\n", + "Let's start the interactive session with a look into the `heat` data object. But first, we need to import the `ipyparallel` client." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "4 engines found\n" + ] + } + ], + "source": [ + "from ipyparallel import Client\n", + "rc = Client(profile=\"default\")\n", + "rc.ids\n", + "\n", + "if len(rc.ids) == 0:\n", + " print(\"No engines found\")\n", + "else:\n", + " print(f\"{len(rc.ids)} engines found\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We will always start `heat` cells with the `%%px` magic command to execute the cell on all engines. However, the first section of this tutorial doesn't deal with distributed arrays. In these cases, we will use the `%%px --target 0` magic command to execute the cell only on the first engine.\n", + "\n", + "---" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## DNDarrays\n", + "\n", + "\n", + "Similar to a NumPy `ndarray`, a Heat `dndarray` (we'll get to the `d` later) is a grid of values of a single (one particular) type. The number of dimensions is the number of axes of the array, while the shape of an array is a tuple of integers giving the number of elements of the array along each dimension. \n", + "\n", + "Heat emulates NumPy's API as closely as possible, allowing for the use of well-known **array creation functions**." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "a9e49353486a4ec5b84c5718033ed9d2", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "%px: 0%| | 0/4 [00:00 'DNDarray'\n", + " Sum of array elements over a given axis. An array with the same shape as ``self.__array`` except\n", + " for the specified axis which becomes one, e.g.\n", + " ``a.shape=(1, 2, 3)`` => ``ht.ones((1, 2, 3)).sum(axis=1).shape=(1, 1, 3)``\n", + " \n", + " Parameters\n", + " ----------\n", + " a : DNDarray\n", + " Input array.\n", + " axis : None or int or Tuple[int,...], optional\n", + " Axis along which a sum is performed. The default, ``axis=None``, will sum all of the\n", + " elements of the input array. If ``axis`` is negative it counts from the last to the first\n", + " axis. If ``axis`` is a tuple of ints, a sum is performed on all of the axes specified in the\n", + " tuple instead of a single axis or all the axes as before.\n", + " out : DNDarray, optional\n", + " Alternative output array in which to place the result. It must have the same shape as the\n", + " expected output, but the datatype of the output values will be cast if necessary.\n", + " keepdims : bool, optional\n", + " If this is set to ``True``, the axes which are reduced are left in the result as dimensions\n", + " with size one. 
With this option, the result will broadcast correctly against the input\n",
+ " array.\n",
+ " \n",
+ " Examples\n",
+ " --------\n",
+ " >>> ht.sum(ht.ones(2))\n",
+ " DNDarray(2., dtype=ht.float32, device=cpu:0, split=None)\n",
+ " >>> ht.sum(ht.ones((3,3)))\n",
+ " DNDarray(9., dtype=ht.float32, device=cpu:0, split=None)\n",
+ " >>> ht.sum(ht.ones((3,3)).astype(ht.int))\n",
+ " DNDarray(9, dtype=ht.int64, device=cpu:0, split=None)\n",
+ " >>> ht.sum(ht.ones((3,2,1)), axis=-3)\n",
+ " DNDarray([[3.],\n",
+ " [3.]], dtype=ht.float32, device=cpu:0, split=None)\n",
+ "\n"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "%%px --target 0\n",
+ "help(ht.sum)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Parallel Processing\n",
+ "\n",
+ "Heat's actual power lies in its ability to exploit the processing performance of modern accelerator hardware (GPUs) as well as distributed (high-performance) cluster systems. All operations executed on CPUs are, to a large extent, vectorized (AVX) and thread-parallelized (OpenMP). Heat builds on PyTorch, so it supports GPU acceleration on Nvidia and AMD GPUs. \n",
+ "\n",
+ "For distributed computations, your system or laptop needs to have Message Passing Interface (MPI) installed. For GPU computations, your system needs to have one or more suitable GPUs and an (MPI-aware) CUDA/ROCm ecosystem.\n",
+ "\n",
+ "**NOTE:** The GPU examples below will only properly execute on a computer with a GPU. Make sure to either start the notebook on an appropriate machine or copy and paste the examples into a script and execute it on a suitable device."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### GPUs\n",
+ "\n",
+ "Heat's array creation functions all support an additional parameter that places the data on a specific device. By default, the CPU is selected, but it is also possible to directly allocate the data on a GPU."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "
\n", + "The following cells will only work if you have a GPU available.\n", + "\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "[0:execute]\n", + "\u001b[31m---------------------------------------------------------------------------\u001b[39m\n", + "\u001b[31mKeyError\u001b[39m Traceback (most recent call last)\n", + "\u001b[36mFile \u001b[39m\u001b[32m~/code/heat/heat/core/devices.py:190\u001b[39m, in \u001b[36msanitize_device\u001b[39m\u001b[34m(device)\u001b[39m\n", + "\u001b[32m 189\u001b[39m \u001b[38;5;28;01mtry\u001b[39;00m:\n", + "\u001b[32m--> \u001b[39m\u001b[32m190\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43m__device_mapping\u001b[49m\u001b[43m[\u001b[49m\u001b[43mdevice\u001b[49m\u001b[43m.\u001b[49m\u001b[43mstrip\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m.\u001b[49m\u001b[43mlower\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m]\u001b[49m\n", + "\u001b[32m 191\u001b[39m \u001b[38;5;28;01mexcept\u001b[39;00m (\u001b[38;5;167;01mAttributeError\u001b[39;00m, \u001b[38;5;167;01mKeyError\u001b[39;00m, \u001b[38;5;167;01mTypeError\u001b[39;00m):\n", + "\n", + "\u001b[31mKeyError\u001b[39m: 'gpu'\n", + "\n", + "During handling of the above exception, another exception occurred:\n", + "\n", + "\u001b[31mValueError\u001b[39m Traceback (most recent call last)\n", + "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[20]\u001b[39m\u001b[32m, line 1\u001b[39m\n", + "\u001b[32m----> \u001b[39m\u001b[32m1\u001b[39m \u001b[43mht\u001b[49m\u001b[43m.\u001b[49m\u001b[43mzeros\u001b[49m\u001b[43m(\u001b[49m\u001b[43m(\u001b[49m\u001b[32;43m3\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[32;43m4\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdevice\u001b[49m\u001b[43m=\u001b[49m\u001b[33;43m'\u001b[39;49m\u001b[33;43mgpu\u001b[39;49m\u001b[33;43m'\u001b[39;49m\u001b[43m)\u001b[49m\n", + "\n", + "\u001b[36mFile \u001b[39m\u001b[32m~/code/heat/heat/core/factories.py:1489\u001b[39m, in \u001b[36mzeros\u001b[39m\u001b[34m(shape, dtype, split, device, comm, order)\u001b[39m\n", + "\u001b[32m 1451\u001b[39m \u001b[38;5;250m\u001b[39m\u001b[33;03m\"\"\"\u001b[39;00m\n", + "\u001b[32m 1452\u001b[39m \u001b[33;03mReturns a new :class:`~heat.core.dndarray.DNDarray` of given shape and data type filled with zero values.\u001b[39;00m\n", + "\u001b[32m 1453\u001b[39m \u001b[33;03mMay be allocated split up across multiple nodes along the specified axis.\u001b[39;00m\n", + "\u001b[32m (...)\u001b[39m\u001b[32m 1486\u001b[39m \u001b[33;03m [0., 0., 0.]], dtype=ht.float32, device=cpu:0, split=None)\u001b[39;00m\n", + "\u001b[32m 1487\u001b[39m \u001b[33;03m\"\"\"\u001b[39;00m\n", + "\u001b[32m 1488\u001b[39m \u001b[38;5;66;03m# TODO: implement 'K' option when torch.clone() fix to preserve memory layout is released.\u001b[39;00m\n", + "\u001b[32m-> \u001b[39m\u001b[32m1489\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43m__factory\u001b[49m\u001b[43m(\u001b[49m\u001b[43mshape\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdtype\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43msplit\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mtorch\u001b[49m\u001b[43m.\u001b[49m\u001b[43mzeros\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdevice\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mcomm\u001b[49m\u001b[43m,\u001b[49m\u001b[43m 
\u001b[49m\u001b[43morder\u001b[49m\u001b[43m=\u001b[49m\u001b[43morder\u001b[49m\u001b[43m)\u001b[49m\n", + "\n", + "\u001b[36mFile \u001b[39m\u001b[32m~/code/heat/heat/core/factories.py:763\u001b[39m, in \u001b[36m__factory\u001b[39m\u001b[34m(shape, dtype, split, local_factory, device, comm, order)\u001b[39m\n", + "\u001b[32m 761\u001b[39m dtype = types.canonical_heat_type(dtype)\n", + "\u001b[32m 762\u001b[39m split = sanitize_axis(shape, split)\n", + "\u001b[32m--> \u001b[39m\u001b[32m763\u001b[39m device = \u001b[43mdevices\u001b[49m\u001b[43m.\u001b[49m\u001b[43msanitize_device\u001b[49m\u001b[43m(\u001b[49m\u001b[43mdevice\u001b[49m\u001b[43m)\u001b[49m\n", + "\u001b[32m 764\u001b[39m comm = sanitize_comm(comm)\n", + "\u001b[32m 766\u001b[39m \u001b[38;5;66;03m# chunk the shape if necessary\u001b[39;00m\n", + "\n", + "\u001b[36mFile \u001b[39m\u001b[32m~/code/heat/heat/core/devices.py:192\u001b[39m, in \u001b[36msanitize_device\u001b[39m\u001b[34m(device)\u001b[39m\n", + "\u001b[32m 190\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m __device_mapping[device.strip().lower()]\n", + "\u001b[32m 191\u001b[39m \u001b[38;5;28;01mexcept\u001b[39;00m (\u001b[38;5;167;01mAttributeError\u001b[39;00m, \u001b[38;5;167;01mKeyError\u001b[39;00m, \u001b[38;5;167;01mTypeError\u001b[39;00m):\n", + "\u001b[32m--> \u001b[39m\u001b[32m192\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\u001b[33mf\u001b[39m\u001b[33m'\u001b[39m\u001b[33mUnknown device, must be one of \u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[33m\"\u001b[39m\u001b[33m, \u001b[39m\u001b[33m\"\u001b[39m.join(__device_mapping.keys())\u001b[38;5;132;01m}\u001b[39;00m\u001b[33m'\u001b[39m)\n", + "\n", + "\u001b[31mValueError\u001b[39m: Unknown device, must be one of cpu\n" + ] + }, + { + "ename": "RemoteError", + "evalue": "[0:execute] ValueError: Unknown device, must be one of cpu", + "output_type": "error", + "traceback": [ + "[0:execute]", + "\u001b[31m---------------------------------------------------------------------------\u001b[39m", + "\u001b[31mKeyError\u001b[39m Traceback (most recent call last)", + "\u001b[36mFile \u001b[39m\u001b[32m~/code/heat/heat/core/devices.py:190\u001b[39m, in \u001b[36msanitize_device\u001b[39m\u001b[34m(device)\u001b[39m", + "\u001b[32m 189\u001b[39m \u001b[38;5;28;01mtry\u001b[39;00m:", + "\u001b[32m--> \u001b[39m\u001b[32m190\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43m__device_mapping\u001b[49m\u001b[43m[\u001b[49m\u001b[43mdevice\u001b[49m\u001b[43m.\u001b[49m\u001b[43mstrip\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m.\u001b[49m\u001b[43mlower\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m]\u001b[49m", + "\u001b[32m 191\u001b[39m \u001b[38;5;28;01mexcept\u001b[39;00m (\u001b[38;5;167;01mAttributeError\u001b[39;00m, \u001b[38;5;167;01mKeyError\u001b[39;00m, \u001b[38;5;167;01mTypeError\u001b[39;00m):", + "", + "\u001b[31mKeyError\u001b[39m: 'gpu'", + "", + "During handling of the above exception, another exception occurred:", + "", + "\u001b[31mValueError\u001b[39m Traceback (most recent call last)", + "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[20]\u001b[39m\u001b[32m, line 1\u001b[39m", + "\u001b[32m----> \u001b[39m\u001b[32m1\u001b[39m \u001b[43mht\u001b[49m\u001b[43m.\u001b[49m\u001b[43mzeros\u001b[49m\u001b[43m(\u001b[49m\u001b[43m(\u001b[49m\u001b[32;43m3\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m 
\u001b[49m\u001b[32;43m4\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdevice\u001b[49m\u001b[43m=\u001b[49m\u001b[33;43m'\u001b[39;49m\u001b[33;43mgpu\u001b[39;49m\u001b[33;43m'\u001b[39;49m\u001b[43m)\u001b[49m", + "", + "\u001b[36mFile \u001b[39m\u001b[32m~/code/heat/heat/core/factories.py:1489\u001b[39m, in \u001b[36mzeros\u001b[39m\u001b[34m(shape, dtype, split, device, comm, order)\u001b[39m", + "\u001b[32m 1451\u001b[39m \u001b[38;5;250m\u001b[39m\u001b[33;03m\"\"\"\u001b[39;00m", + "\u001b[32m 1452\u001b[39m \u001b[33;03mReturns a new :class:`~heat.core.dndarray.DNDarray` of given shape and data type filled with zero values.\u001b[39;00m", + "\u001b[32m 1453\u001b[39m \u001b[33;03mMay be allocated split up across multiple nodes along the specified axis.\u001b[39;00m", + "\u001b[32m (...)\u001b[39m\u001b[32m 1486\u001b[39m \u001b[33;03m [0., 0., 0.]], dtype=ht.float32, device=cpu:0, split=None)\u001b[39;00m", + "\u001b[32m 1487\u001b[39m \u001b[33;03m\"\"\"\u001b[39;00m", + "\u001b[32m 1488\u001b[39m \u001b[38;5;66;03m# TODO: implement 'K' option when torch.clone() fix to preserve memory layout is released.\u001b[39;00m", + "\u001b[32m-> \u001b[39m\u001b[32m1489\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43m__factory\u001b[49m\u001b[43m(\u001b[49m\u001b[43mshape\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdtype\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43msplit\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mtorch\u001b[49m\u001b[43m.\u001b[49m\u001b[43mzeros\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdevice\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mcomm\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43morder\u001b[49m\u001b[43m=\u001b[49m\u001b[43morder\u001b[49m\u001b[43m)\u001b[49m", + "", + "\u001b[36mFile \u001b[39m\u001b[32m~/code/heat/heat/core/factories.py:763\u001b[39m, in \u001b[36m__factory\u001b[39m\u001b[34m(shape, dtype, split, local_factory, device, comm, order)\u001b[39m", + "\u001b[32m 761\u001b[39m dtype = types.canonical_heat_type(dtype)", + "\u001b[32m 762\u001b[39m split = sanitize_axis(shape, split)", + "\u001b[32m--> \u001b[39m\u001b[32m763\u001b[39m device = \u001b[43mdevices\u001b[49m\u001b[43m.\u001b[49m\u001b[43msanitize_device\u001b[49m\u001b[43m(\u001b[49m\u001b[43mdevice\u001b[49m\u001b[43m)\u001b[49m", + "\u001b[32m 764\u001b[39m comm = sanitize_comm(comm)", + "\u001b[32m 766\u001b[39m \u001b[38;5;66;03m# chunk the shape if necessary\u001b[39;00m", + "", + "\u001b[36mFile \u001b[39m\u001b[32m~/code/heat/heat/core/devices.py:192\u001b[39m, in \u001b[36msanitize_device\u001b[39m\u001b[34m(device)\u001b[39m", + "\u001b[32m 190\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m __device_mapping[device.strip().lower()]", + "\u001b[32m 191\u001b[39m \u001b[38;5;28;01mexcept\u001b[39;00m (\u001b[38;5;167;01mAttributeError\u001b[39;00m, \u001b[38;5;167;01mKeyError\u001b[39;00m, \u001b[38;5;167;01mTypeError\u001b[39;00m):", + "\u001b[32m--> \u001b[39m\u001b[32m192\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\u001b[33mf\u001b[39m\u001b[33m'\u001b[39m\u001b[33mUnknown device, must be one of \u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[33m\"\u001b[39m\u001b[33m, \u001b[39m\u001b[33m\"\u001b[39m.join(__device_mapping.keys())\u001b[38;5;132;01m}\u001b[39;00m\u001b[33m'\u001b[39m)", + "", + "\u001b[31mValueError\u001b[39m: Unknown device, must be one of 
cpu" + ] + } + ], + "source": [ + "%%px --target 0\n", + "ht.zeros((3, 4,), device='gpu')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Arrays on the same device can be seamlessly used in any Heat operation." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:21]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "a = ht.zeros((3, 4,), device='gpu')\nb = ht.ones((3, 4,), device='gpu')\na + b\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 21, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:40.413421Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px --target 0\n", + "a = ht.zeros((3, 4,), device='gpu')\n", + "b = ht.ones((3, 4,), device='gpu')\n", + "a + b" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "However, performing operations on arrays with mismatching devices will purposefully result in an error (due to potentially large copy overhead)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "[0:execute]\n", + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n", + "\u001b[0;31mRuntimeError\u001b[0m Traceback (most recent call last)\n", + "Cell \u001b[0;32mIn[22], line 3\u001b[0m\n", + "\u001b[1;32m 1\u001b[0m a \u001b[38;5;241m=\u001b[39m ht\u001b[38;5;241m.\u001b[39mfull((\u001b[38;5;241m3\u001b[39m, \u001b[38;5;241m4\u001b[39m,), \u001b[38;5;241m4\u001b[39m, device\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mcpu\u001b[39m\u001b[38;5;124m'\u001b[39m)\n", + "\u001b[1;32m 2\u001b[0m b \u001b[38;5;241m=\u001b[39m ht\u001b[38;5;241m.\u001b[39mones((\u001b[38;5;241m3\u001b[39m, \u001b[38;5;241m4\u001b[39m,), device\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mgpu\u001b[39m\u001b[38;5;124m'\u001b[39m)\n", + "\u001b[0;32m----> 3\u001b[0m \u001b[43ma\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m+\u001b[39;49m\u001b[43m \u001b[49m\u001b[43mb\u001b[49m\n", + "\n", + "File \u001b[0;32m~/code/heat/heat/core/arithmetics.py:124\u001b[0m, in \u001b[0;36m_add\u001b[0;34m(self, other)\u001b[0m\n", + "\u001b[1;32m 122\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21m_add\u001b[39m(\u001b[38;5;28mself\u001b[39m, other):\n", + "\u001b[1;32m 123\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n", + "\u001b[0;32m--> 124\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43madd\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mother\u001b[49m\u001b[43m)\u001b[49m\n", + "\u001b[1;32m 125\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mTypeError\u001b[39;00m:\n", + "\u001b[1;32m 126\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mNotImplemented\u001b[39m\n", + "\n", + "File \u001b[0;32m~/code/heat/heat/core/arithmetics.py:119\u001b[0m, in \u001b[0;36madd\u001b[0;34m(t1, t2, out, where)\u001b[0m\n", + "\u001b[1;32m 74\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21madd\u001b[39m(\n", + 
"\u001b[1;32m 75\u001b[0m t1: Union[DNDarray, \u001b[38;5;28mfloat\u001b[39m],\n", + "\u001b[1;32m 76\u001b[0m t2: Union[DNDarray, \u001b[38;5;28mfloat\u001b[39m],\n", + "\u001b[0;32m (...)\u001b[0m\n", + "\u001b[1;32m 80\u001b[0m where: Union[\u001b[38;5;28mbool\u001b[39m, DNDarray] \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mTrue\u001b[39;00m,\n", + "\u001b[1;32m 81\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m DNDarray:\n", + "\u001b[1;32m 82\u001b[0m \u001b[38;5;250m \u001b[39m\u001b[38;5;124;03m\"\"\"\u001b[39;00m\n", + "\u001b[1;32m 83\u001b[0m \u001b[38;5;124;03m Element-wise addition of values from two operands, commutative.\u001b[39;00m\n", + "\u001b[1;32m 84\u001b[0m \u001b[38;5;124;03m Takes the first and second operand (scalar or :class:`~heat.core.dndarray.DNDarray`) whose\u001b[39;00m\n", + "\u001b[0;32m (...)\u001b[0m\n", + "\u001b[1;32m 117\u001b[0m \u001b[38;5;124;03m [5., 6.]], dtype=ht.float32, device=cpu:0, split=None)\u001b[39;00m\n", + "\u001b[1;32m 118\u001b[0m \u001b[38;5;124;03m \"\"\"\u001b[39;00m\n", + "\u001b[0;32m--> 119\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43m_operations\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m__binary_op\u001b[49m\u001b[43m(\u001b[49m\u001b[43mtorch\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43madd\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mt1\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mt2\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mout\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mwhere\u001b[49m\u001b[43m)\u001b[49m\n", + "\n", + "File \u001b[0;32m~/code/heat/heat/core/_operations.py:204\u001b[0m, in \u001b[0;36m__binary_op\u001b[0;34m(operation, t1, t2, out, where, fn_kwargs)\u001b[0m\n", + "\u001b[1;32m 201\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m t1\u001b[38;5;241m.\u001b[39mlarray\u001b[38;5;241m.\u001b[39mis_mps \u001b[38;5;129;01mand\u001b[39;00m promoted_type \u001b[38;5;241m==\u001b[39m torch\u001b[38;5;241m.\u001b[39mfloat64:\n", + "\u001b[1;32m 202\u001b[0m promoted_type \u001b[38;5;241m=\u001b[39m torch\u001b[38;5;241m.\u001b[39mfloat32\n", + "\u001b[0;32m--> 204\u001b[0m result \u001b[38;5;241m=\u001b[39m \u001b[43moperation\u001b[49m\u001b[43m(\u001b[49m\u001b[43mt1\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mlarray\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mto\u001b[49m\u001b[43m(\u001b[49m\u001b[43mpromoted_type\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mt2\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mlarray\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mto\u001b[49m\u001b[43m(\u001b[49m\u001b[43mpromoted_type\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mfn_kwargs\u001b[49m\u001b[43m)\u001b[49m\n", + "\u001b[1;32m 206\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m out \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;129;01mand\u001b[39;00m where \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mTrue\u001b[39;00m:\n", + "\u001b[1;32m 207\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m DNDarray(\n", + "\u001b[1;32m 208\u001b[0m result,\n", + "\u001b[1;32m 209\u001b[0m output_shape,\n", + "\u001b[0;32m (...)\u001b[0m\n", + "\u001b[1;32m 214\u001b[0m balanced\u001b[38;5;241m=\u001b[39moutput_balanced,\n", + "\u001b[1;32m 215\u001b[0m )\n", + "\n", + "\u001b[0;31mRuntimeError\u001b[0m: Expected all tensors 
to be on the same device, but found at least two devices, cuda:0 and cpu!\n" + ] + }, + { + "ename": "RemoteError", + "evalue": "[0:execute] RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!", + "output_type": "error", + "traceback": [ + "[0:execute]", + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mRuntimeError\u001b[0m Traceback (most recent call last)", + "Cell \u001b[0;32mIn[22], line 3\u001b[0m", + "\u001b[1;32m 1\u001b[0m a \u001b[38;5;241m=\u001b[39m ht\u001b[38;5;241m.\u001b[39mfull((\u001b[38;5;241m3\u001b[39m, \u001b[38;5;241m4\u001b[39m,), \u001b[38;5;241m4\u001b[39m, device\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mcpu\u001b[39m\u001b[38;5;124m'\u001b[39m)", + "\u001b[1;32m 2\u001b[0m b \u001b[38;5;241m=\u001b[39m ht\u001b[38;5;241m.\u001b[39mones((\u001b[38;5;241m3\u001b[39m, \u001b[38;5;241m4\u001b[39m,), device\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mgpu\u001b[39m\u001b[38;5;124m'\u001b[39m)", + "\u001b[0;32m----> 3\u001b[0m \u001b[43ma\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m+\u001b[39;49m\u001b[43m \u001b[49m\u001b[43mb\u001b[49m", + "", + "File \u001b[0;32m~/code/heat/heat/core/arithmetics.py:124\u001b[0m, in \u001b[0;36m_add\u001b[0;34m(self, other)\u001b[0m", + "\u001b[1;32m 122\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21m_add\u001b[39m(\u001b[38;5;28mself\u001b[39m, other):", + "\u001b[1;32m 123\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:", + "\u001b[0;32m--> 124\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43madd\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mother\u001b[49m\u001b[43m)\u001b[49m", + "\u001b[1;32m 125\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mTypeError\u001b[39;00m:", + "\u001b[1;32m 126\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mNotImplemented\u001b[39m", + "", + "File \u001b[0;32m~/code/heat/heat/core/arithmetics.py:119\u001b[0m, in \u001b[0;36madd\u001b[0;34m(t1, t2, out, where)\u001b[0m", + "\u001b[1;32m 74\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21madd\u001b[39m(", + "\u001b[1;32m 75\u001b[0m t1: Union[DNDarray, \u001b[38;5;28mfloat\u001b[39m],", + "\u001b[1;32m 76\u001b[0m t2: Union[DNDarray, \u001b[38;5;28mfloat\u001b[39m],", + "\u001b[0;32m (...)\u001b[0m", + "\u001b[1;32m 80\u001b[0m where: Union[\u001b[38;5;28mbool\u001b[39m, DNDarray] \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mTrue\u001b[39;00m,", + "\u001b[1;32m 81\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m DNDarray:", + "\u001b[1;32m 82\u001b[0m \u001b[38;5;250m \u001b[39m\u001b[38;5;124;03m\"\"\"\u001b[39;00m", + "\u001b[1;32m 83\u001b[0m \u001b[38;5;124;03m Element-wise addition of values from two operands, commutative.\u001b[39;00m", + "\u001b[1;32m 84\u001b[0m \u001b[38;5;124;03m Takes the first and second operand (scalar or :class:`~heat.core.dndarray.DNDarray`) whose\u001b[39;00m", + "\u001b[0;32m (...)\u001b[0m", + "\u001b[1;32m 117\u001b[0m \u001b[38;5;124;03m [5., 6.]], dtype=ht.float32, device=cpu:0, split=None)\u001b[39;00m", + "\u001b[1;32m 118\u001b[0m \u001b[38;5;124;03m \"\"\"\u001b[39;00m", + "\u001b[0;32m--> 119\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m 
\u001b[43m_operations\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m__binary_op\u001b[49m\u001b[43m(\u001b[49m\u001b[43mtorch\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43madd\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mt1\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mt2\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mout\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mwhere\u001b[49m\u001b[43m)\u001b[49m", + "", + "File \u001b[0;32m~/code/heat/heat/core/_operations.py:204\u001b[0m, in \u001b[0;36m__binary_op\u001b[0;34m(operation, t1, t2, out, where, fn_kwargs)\u001b[0m", + "\u001b[1;32m 201\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m t1\u001b[38;5;241m.\u001b[39mlarray\u001b[38;5;241m.\u001b[39mis_mps \u001b[38;5;129;01mand\u001b[39;00m promoted_type \u001b[38;5;241m==\u001b[39m torch\u001b[38;5;241m.\u001b[39mfloat64:", + "\u001b[1;32m 202\u001b[0m promoted_type \u001b[38;5;241m=\u001b[39m torch\u001b[38;5;241m.\u001b[39mfloat32", + "\u001b[0;32m--> 204\u001b[0m result \u001b[38;5;241m=\u001b[39m \u001b[43moperation\u001b[49m\u001b[43m(\u001b[49m\u001b[43mt1\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mlarray\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mto\u001b[49m\u001b[43m(\u001b[49m\u001b[43mpromoted_type\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mt2\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mlarray\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mto\u001b[49m\u001b[43m(\u001b[49m\u001b[43mpromoted_type\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mfn_kwargs\u001b[49m\u001b[43m)\u001b[49m", + "\u001b[1;32m 206\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m out \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;129;01mand\u001b[39;00m where \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mTrue\u001b[39;00m:", + "\u001b[1;32m 207\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m DNDarray(", + "\u001b[1;32m 208\u001b[0m result,", + "\u001b[1;32m 209\u001b[0m output_shape,", + "\u001b[0;32m (...)\u001b[0m", + "\u001b[1;32m 214\u001b[0m balanced\u001b[38;5;241m=\u001b[39moutput_balanced,", + "\u001b[1;32m 215\u001b[0m )", + "", + "\u001b[0;31mRuntimeError\u001b[0m: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!" + ] + } + ], + "source": [ + "%%px --target 0\n", + "a = ht.full((3, 4,), 4, device='cpu')\n", + "b = ht.ones((3, 4,), device='gpu')\n", + "a + b" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It is possible to explicitly move an array from one device to the other and back to avoid this error." 
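As a minimal sketch of such a round trip (assuming a CUDA-capable GPU is visible to the process; `cpu()` is demonstrated in the next cell, and the complementary `gpu()` method is assumed here to behave symmetrically):

```python
import heat as ht

a = ht.ones((3, 4), device='cpu')      # created in CPU memory
a = a.gpu()                            # copy the underlying torch tensor to the GPU
a = a.cpu()                            # ... and back to the CPU
a + ht.full((3, 4), 4, device='cpu')   # both operands now live on the same device
```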
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:23]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "a = ht.full((3, 4,), 4, device='gpu')\na.cpu()\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 23, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.011333Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px --target 0\n", + "a = ht.full((3, 4,), 4, device='gpu')\n", + "a.cpu()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We'll put our multi-GPU setup to the test in the next section." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Distributed Computing\n", + "\n", + "Heat is also able to make use of distributed processing capabilities such as those in high-performance cluster systems. For this, Heat exploits the fact that the operations performed on a multi-dimensional array are usually identical for all data items. Hence, a data-parallel processing strategy can be chosen, where the total number of data items is equally divided among all processing nodes. An operation is then performed individually on the local data chunks, and partial results are communicated behind the scenes where necessary. A Heat array assumes the role of a virtual overlay of the local chunks and realizes and coordinates the computations - see the figure below for a visual representation of this concept.\n", + "\n", + "\n", + "\n", + "The chunks are always split along a single dimension (i.e. 1-D domain decomposition) of the array. You can specify this in Heat by using the `split` parameter (a short sketch follows below). This parameter is present in all relevant functions, such as array creation (`zeros()`, `ones()`, ...) or I/O (`load()`) functions. \n", + "\n", + "\n", + "\n", + "\n", + "Examples are provided below. The result of an operation on a Heat tensor will in most cases preserve the split of the respective operands. However, in some cases the split axis might change. For example, a transpose of a Heat array will equally transpose the split axis. Furthermore, a reduction operation, e.g. `sum()`, that is performed across the split axis might remove data partitions entirely. The respective function behaviors can be found in Heat's documentation.\n", + "\n", + "You may also modify the data partitioning of a Heat array by using the `resplit()` function. This allows you to repartition the data as you choose. Please note that this should be used sparingly and for small amounts of data only, as it entails significant data copying across the network. Finally, a Heat array without any split, i.e. `split=None` (the default), will result in redundant copies of the data on each computation node.\n", + "\n", + "On a technical level, Heat follows the so-called [Bulk Synchronous Parallel (BSP)](https://en.wikipedia.org/wiki/Bulk_synchronous_parallel) processing model. For the network communication, Heat utilizes the [Message Passing Interface (MPI)](https://computing.llnl.gov/tutorials/mpi/), a *de facto* standard on modern high-performance computing systems. It is also possible to use MPI on your laptop or desktop computer.
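As referenced above, here is a minimal sketch of the `split` parameter at array creation (it assumes Heat is installed and that the script is launched with several MPI processes):

```python
import heat as ht

# replicated: every process holds a full copy (split=None is the default)
replicated = ht.zeros((4, 5))

# distributed: axis 0 is chunked as evenly as possible across the processes
distributed = ht.zeros((4, 5), split=0)
```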
Respective software packages are available for all major operating systems. In order to run a Heat script, you need to start it slightly differently than you are probably used to. This\n", + "\n", + "```bash\n", + "python ./my_script.py\n", + "```\n", + "\n", + "becomes this instead:\n", + "\n", + "```bash\n", + "mpirun -n <number of processes> python ./my_script.py\n", + "```\n", + "On an HPC cluster you'll of course submit the job via `sbatch` or a similar scheduler command.\n", + "\n", + "\n", + "Let's see some examples of working with distributed Heat:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the following examples, we'll recreate the array shown in the figure, a 3-dimensional DNDarray of integers ranging from 0 to 59 (5 matrices of size (4,3)). " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:6]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 1, + "engine_uuid": "4a6ffcbf-4b7c9961beb0aa49f4f299a5", + "error": null, + "execute_input": "import heat as ht\ndndarray = ht.arange(60).reshape(5,4,3)\ndndarray\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 6, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.052126Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[2:6]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 2, + "engine_uuid": "e3e9e719-1b11a826b66969f71d179e21", + "error": null, + "execute_input": "import heat as ht\ndndarray = ht.arange(60).reshape(5,4,3)\ndndarray\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 6, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.052193Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:24]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "import heat as ht\ndndarray = ht.arange(60).reshape(5,4,3)\ndndarray\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 24, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.052033Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:6]: \u001b[0m" + ] + }, + "metadata": { + "after": [], + "completed": "2025-05-19T19:17:51.061012Z", + "data": {}, + "engine_id": 3, + "engine_uuid": "b9f6f6e8-01c224a4024814eaffce2266", + "error": null, + "execute_input": "import heat as ht\ndndarray = ht.arange(60).reshape(5,4,3)\ndndarray\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 6, + "metadata": {} + }, + "follow": [], + "is_broadcast": false, + "is_coalescing": false, + "msg_id": "0799d273-c85f091368add7dbf88c8344_231252_42", + "outputs": [], + "received": "2025-05-19T19:17:51.067009Z", + "started": "2025-05-19T19:17:51.055404Z", + "status": "ok", + "stderr": "", +
"stdout": "", + "submitted": "2025-05-19T19:17:51.052218Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "import heat as ht\n", + "dndarray = ht.arange(60).reshape(5,4,3)\n", + "dndarray" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Notice the additional metadata printed with the DNDarray. With respect to a numpy ndarray, the DNDarray has additional information on the device (in this case, the CPU) and the `split` axis. In the example above, the split axis is `None`, meaning that the DNDarray is not distributed and each MPI process has a full copy of the data.\n", + "\n", + "Let's experiment with a distributed DNDarray: we'll split the same DNDarray as above, but distributed along the major axis." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:7]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 1, + "engine_uuid": "4a6ffcbf-4b7c9961beb0aa49f4f299a5", + "error": null, + "execute_input": "dndarray = ht.arange(60, split=0).reshape(5,4,3)\ndndarray\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 7, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.106705Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:25]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "dndarray = ht.arange(60, split=0).reshape(5,4,3)\ndndarray\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 25, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.106454Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:7]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 3, + "engine_uuid": "b9f6f6e8-01c224a4024814eaffce2266", + "error": null, + "execute_input": "dndarray = ht.arange(60, split=0).reshape(5,4,3)\ndndarray\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 7, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.106872Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[2:7]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 2, + "engine_uuid": "e3e9e719-1b11a826b66969f71d179e21", + "error": null, + "execute_input": "dndarray = ht.arange(60, split=0).reshape(5,4,3)\ndndarray\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 7, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.106799Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "dndarray = ht.arange(60, 
split=0).reshape(5,4,3)\n", + "dndarray" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `split` axis is now 0, meaning that the DNDarray is distributed along the first axis. Each MPI process has a slice of the data along the first axis. In order to see the data on each process, we can print the \"local array\" via the `larray` attribute." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:8]: \u001b[0m\n", + "tensor([[[24, 25, 26],\n", + " [27, 28, 29],\n", + " [30, 31, 32],\n", + " [33, 34, 35]]], dtype=torch.int32)" + ] + }, + "metadata": { + "after": [], + "completed": "2025-05-19T19:17:51.194662Z", + "data": {}, + "engine_id": 1, + "engine_uuid": "4a6ffcbf-4b7c9961beb0aa49f4f299a5", + "error": null, + "execute_input": "dndarray.larray\n", + "execute_result": { + "data": { + "text/plain": "tensor([[[24, 25, 26],\n [27, 28, 29],\n [30, 31, 32],\n [33, 34, 35]]], dtype=torch.int32)" + }, + "execution_count": 8, + "metadata": {} + }, + "follow": [], + "is_broadcast": false, + "is_coalescing": false, + "msg_id": "0799d273-c85f091368add7dbf88c8344_231252_48", + "outputs": [], + "received": "2025-05-19T19:17:51.198154Z", + "started": "2025-05-19T19:17:51.190508Z", + "status": "ok", + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.178849Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:8]: \u001b[0m\n", + "tensor([[[48, 49, 50],\n", + " [51, 52, 53],\n", + " [54, 55, 56],\n", + " [57, 58, 59]]], dtype=torch.int32)" + ] + }, + "metadata": { + "after": [], + "completed": "2025-05-19T19:17:51.194657Z", + "data": {}, + "engine_id": 3, + "engine_uuid": "b9f6f6e8-01c224a4024814eaffce2266", + "error": null, + "execute_input": "dndarray.larray\n", + "execute_result": { + "data": { + "text/plain": "tensor([[[48, 49, 50],\n [51, 52, 53],\n [54, 55, 56],\n [57, 58, 59]]], dtype=torch.int32)" + }, + "execution_count": 8, + "metadata": {} + }, + "follow": [], + "is_broadcast": false, + "is_coalescing": false, + "msg_id": "0799d273-c85f091368add7dbf88c8344_231252_50", + "outputs": [], + "received": "2025-05-19T19:17:51.197555Z", + "started": "2025-05-19T19:17:51.191263Z", + "status": "ok", + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.180344Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:26]: \u001b[0m\n", + "tensor([[[ 0, 1, 2],\n", + " [ 3, 4, 5],\n", + " [ 6, 7, 8],\n", + " [ 9, 10, 11]],\n", + "\n", + " [[12, 13, 14],\n", + " [15, 16, 17],\n", + " [18, 19, 20],\n", + " [21, 22, 23]]], dtype=torch.int32)" + ] + }, + "metadata": { + "after": [], + "completed": "2025-05-19T19:17:51.195580Z", + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "dndarray.larray\n", + "execute_result": { + "data": { + "text/plain": "tensor([[[ 0, 1, 2],\n [ 3, 4, 5],\n [ 6, 7, 8],\n [ 9, 10, 11]],\n\n [[12, 13, 14],\n [15, 16, 17],\n [18, 19, 20],\n [21, 22, 23]]], dtype=torch.int32)" + }, + "execution_count": 26, + "metadata": {} + }, + "follow": [], + "is_broadcast": false, + "is_coalescing": false, + "msg_id": "0799d273-c85f091368add7dbf88c8344_231252_47", + "outputs": [], + "received": "2025-05-19T19:17:51.198806Z", + "started": "2025-05-19T19:17:51.190353Z", + "status": "ok", + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.178691Z" + }, + "output_type": 
"display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[2:8]: \u001b[0m\n", + "tensor([[[36, 37, 38],\n", + " [39, 40, 41],\n", + " [42, 43, 44],\n", + " [45, 46, 47]]], dtype=torch.int32)" + ] + }, + "metadata": { + "after": [], + "completed": "2025-05-19T19:17:51.196655Z", + "data": {}, + "engine_id": 2, + "engine_uuid": "e3e9e719-1b11a826b66969f71d179e21", + "error": null, + "execute_input": "dndarray.larray\n", + "execute_result": { + "data": { + "text/plain": "tensor([[[36, 37, 38],\n [39, 40, 41],\n [42, 43, 44],\n [45, 46, 47]]], dtype=torch.int32)" + }, + "execution_count": 8, + "metadata": {} + }, + "follow": [], + "is_broadcast": false, + "is_coalescing": false, + "msg_id": "0799d273-c85f091368add7dbf88c8344_231252_49", + "outputs": [], + "received": "2025-05-19T19:17:51.204676Z", + "started": "2025-05-19T19:17:51.191467Z", + "status": "ok", + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.179122Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "dndarray.larray" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note that the `larray` is a `torch.Tensor` object. This is the underlying tensor that holds the data. The `dndarray` object is an MPI-aware wrapper around these process-local tensors, providing memory-distributed functionality and information." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The DNDarray can be distributed along any axis. Modify the `split` attribute when creating the DNDarray in the cell above, to distribute it along a different axis, and see how the `larray`s change. You'll notice that the distributed arrays are always load-balanced, meaning that the data are distributed as evenly as possible across the MPI processes." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `DNDarray` object has a number of methods and attributes that are useful for distributed computing. In particular, it keeps track of its global and local (on a given process) shape through distributed operations and array manipulations. The DNDarray is also associated to a `comm` object, the MPI communicator.\n", + "\n", + "(In MPI, the *communicator* is a group of processes that can communicate with each other. The `comm` object is a `MPI.COMM_WORLD` communicator, which is the default communicator that includes all the processes. The `comm` object is used to perform collective operations, such as reductions, scatter, gather, and broadcast. 
The `comm` object is also used to perform point-to-point communication between processes.)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:0] Global shape of the dndarray: (5, 4, 3)\n", + "On rank 0/4, local shape of the dndarray: (2, 4, 3)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:1] Global shape of the dndarray: (5, 4, 3)\n", + "On rank 1/4, local shape of the dndarray: (1, 4, 3)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:2] Global shape of the dndarray: (5, 4, 3)\n", + "On rank 2/4, local shape of the dndarray: (1, 4, 3)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:3] Global shape of the dndarray: (5, 4, 3)\n", + "On rank 3/4, local shape of the dndarray: (1, 4, 3)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "print(f\"Global shape of the dndarray: {dndarray.shape}\")\n", + "print(f\"On rank {dndarray.comm.rank}/{dndarray.comm.size}, local shape of the dndarray: {dndarray.lshape}\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can perform a vast number of operations on DNDarrays distributed over multi-node and/or multi-GPU resources. Check out our [Numpy coverage tables](https://github.com/helmholtz-analytics/heat/blob/main/coverage_tables.md) to see what operations are already supported. \n", + "\n", + "The result of an operation on DNDarrays will in most cases preserve the `split` or distribution axis of the respective operands. However, in some cases the split axis might change. For example, a transpose of a Heat array will equally transpose the split axis. Furthermore, a reduction operation, e.g. `sum()`, that is performed across the split axis might remove data partitions entirely. The respective function behaviors can be found in Heat's documentation."
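A minimal sketch (for a `%%px` cell) that makes these rules visible by inspecting the `split` attribute before and after such operations:

```python
%%px
x = ht.arange(60, split=0).reshape(5, 4, 3)
print(x.split)              # 0: distributed along the first axis
print(x.T.split)            # the transpose carries the split axis along with the axes
print(x.sum(axis=0).split)  # reducing across the split axis removes the partitioning
```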
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:28]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "# transpose \ndndarray.T\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 28, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.287542Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[2:10]: \u001b[0m" + ] + }, + "metadata": { + "after": [], + "completed": "2025-05-19T19:17:51.294221Z", + "data": {}, + "engine_id": 2, + "engine_uuid": "e3e9e719-1b11a826b66969f71d179e21", + "error": null, + "execute_input": "# transpose \ndndarray.T\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 10, + "metadata": {} + }, + "follow": [], + "is_broadcast": false, + "is_coalescing": false, + "msg_id": "0799d273-c85f091368add7dbf88c8344_231252_57", + "outputs": [], + "received": "2025-05-19T19:17:51.297046Z", + "started": "2025-05-19T19:17:51.290699Z", + "status": "ok", + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.288331Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:10]: \u001b[0m" + ] + }, + "metadata": { + "after": [], + "completed": "2025-05-19T19:17:51.295026Z", + "data": {}, + "engine_id": 1, + "engine_uuid": "4a6ffcbf-4b7c9961beb0aa49f4f299a5", + "error": null, + "execute_input": "# transpose \ndndarray.T\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 10, + "metadata": {} + }, + "follow": [], + "is_broadcast": false, + "is_coalescing": false, + "msg_id": "0799d273-c85f091368add7dbf88c8344_231252_56", + "outputs": [], + "received": "2025-05-19T19:17:51.297591Z", + "started": "2025-05-19T19:17:51.290440Z", + "status": "ok", + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.288210Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:10]: \u001b[0m" + ] + }, + "metadata": { + "after": [], + "completed": "2025-05-19T19:17:51.296667Z", + "data": {}, + "engine_id": 3, + "engine_uuid": "b9f6f6e8-01c224a4024814eaffce2266", + "error": null, + "execute_input": "# transpose \ndndarray.T\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 10, + "metadata": {} + }, + "follow": [], + "is_broadcast": false, + "is_coalescing": false, + "msg_id": "0799d273-c85f091368add7dbf88c8344_231252_58", + "outputs": [], + "received": "2025-05-19T19:17:51.300499Z", + "started": "2025-05-19T19:17:51.293633Z", + "status": "ok", + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.288398Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px \n", + "# transpose \n", + "dndarray.T\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:1] The slowest run took 31.60 times longer than the fastest. This could mean that an intermediate result is being cached.\n", + "504 µs ± 876 µs per loop (mean ± std. dev. 
of 7 runs, 1 loop each)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:2] The slowest run took 28.84 times longer than the fastest. This could mean that an intermediate result is being cached.\n", + "501 µs ± 864 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:0] The slowest run took 29.75 times longer than the fastest. This could mean that an intermediate result is being cached.\n", + "503 µs ± 880 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:3] The slowest run took 8.36 times longer than the fastest. This could mean that an intermediate result is being cached.\n", + "237 µs ± 216 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "# reduction operation along the distribution axis\n", + "%timeit -n 1 dndarray.sum(axis=0)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:0] The slowest run took 13.43 times longer than the fastest. This could mean that an intermediate result is being cached.\n", + "114 µs ± 141 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:2] 72.7 µs ± 32.2 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:1] 71.7 µs ± 35.8 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:3] The slowest run took 15.67 times longer than the fastest. This could mean that an intermediate result is being cached.\n", + "183 µs ± 291 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%px \n", + "# reduction operation along non-distribution axis: no communication required\n", + "%timeit -n 1 dndarray.sum(axis=1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Operations between tensors with equal split or no split are fully parallelizable and therefore very fast." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:13]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 1, + "engine_uuid": "4a6ffcbf-4b7c9961beb0aa49f4f299a5", + "error": null, + "execute_input": "other_dndarray = ht.arange(60,120, split=0).reshape(5,4,3) # distributed reshape\n\n# element-wise multiplication\ndndarray * other_dndarray\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 13, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.515462Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:31]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "other_dndarray = ht.arange(60,120, split=0).reshape(5,4,3) # distributed reshape\n\n# element-wise multiplication\ndndarray * other_dndarray\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 31, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.514668Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:13]: \u001b[0m" + ] + }, + "metadata": { + "after": [], + "completed": "2025-05-19T19:17:51.529643Z", + "data": {}, + "engine_id": 3, + "engine_uuid": "b9f6f6e8-01c224a4024814eaffce2266", + "error": null, + "execute_input": "other_dndarray = ht.arange(60,120, split=0).reshape(5,4,3) # distributed reshape\n\n# element-wise multiplication\ndndarray * other_dndarray\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 13, + "metadata": {} + }, + "follow": [], + "is_broadcast": false, + "is_coalescing": false, + "msg_id": "0799d273-c85f091368add7dbf88c8344_231252_70", + "outputs": [], + "received": "2025-05-19T19:17:51.532912Z", + "started": "2025-05-19T19:17:51.518984Z", + "status": "ok", + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.516415Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[2:13]: \u001b[0m" + ] + }, + "metadata": { + "after": [], + "completed": "2025-05-19T19:17:51.529626Z", + "data": {}, + "engine_id": 2, + "engine_uuid": "e3e9e719-1b11a826b66969f71d179e21", + "error": null, + "execute_input": "other_dndarray = ht.arange(60,120, split=0).reshape(5,4,3) # distributed reshape\n\n# element-wise multiplication\ndndarray * other_dndarray\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 13, + "metadata": {} + }, + "follow": [], + "is_broadcast": false, + "is_coalescing": false, + "msg_id": "0799d273-c85f091368add7dbf88c8344_231252_69", + "outputs": [], + "received": "2025-05-19T19:17:51.534904Z", + "started": "2025-05-19T19:17:51.522307Z", + "status": "ok", + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.516241Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "other_dndarray = ht.arange(60,120, split=0).reshape(5,4,3) # distributed reshape\n", + "\n", + "# element-wise multiplication\n", + "dndarray * 
other_dndarray\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As we saw earlier, because the underlying data objects are PyTorch tensors, we can easily create DNDarrays on GPUs or move DNDarrays to GPUs. This allows us to perform distributed array operations on multi-GPU systems.\n", + "\n", + "So far we have demostrated small, easy-to-parallelize arithmetical operations. Let's move to linear algebra. Heat's `linalg` module supports a wide range of linear algebra operations, including matrix multiplication. Matrix multiplication is a very common operation data analysis, it is computationally intensive, and not trivial to parallelize. \n", + "\n", + "With Heat, you can perform matrix multiplication on distributed DNDarrays, and the operation will be parallelized across the MPI processes. Here on 4 GPUs:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%px\n", + "# free up memory if necessary\n", + "try:\n", + " del x, y, z\n", + "except NameError:\n", + " pass\n", + "\n", + "n, m = 4000, 4000\n", + "x = ht.random.randn(n, m, split=0, device=\"gpu\") # distributed RNG\n", + "y = ht.random.randn(m, n, split=None, device=\"gpu\")\n", + "z = x @ y\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`ht.linalg.matmul` or `@` breaks down the matrix multiplication into a series of smaller `torch` matrix multiplications, which are then distributed across the MPI processes. This operation can be very communication-intensive on huge matrices that both require distribution, and users should choose the `split` axis carefully to minimize communication overhead." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can experiment with sizes and the `split` parameter (distribution axis) for both matrices and time the result. Note that:\n", + "- If you set **`split=None` for both matrices**, each process (in this case, each GPU) will attempt to multiply the entire matrices. Depending on the matrix sizes, the GPU memory might be insufficient. (And if you can multiply the matrices on a single GPU, it's much more efficient to stick to PyTorch's `torch.linalg.matmul` function.)\n", + "- If **`split` is not None for both matrices**, each process will only hold a slice of the data, and will need to communicate data with other processes in order to perform the multiplication. This **introduces huge communication overhead**, but allows you to perform the multiplication on larger matrices than would fit in the memory of a single GPU.\n", + "- If **`split` is None for one matrix and not None for the other**, the multiplication does not require communication, and the result will be distributed. If your data size allows it, you should always favor this option.\n", + "\n", + "Time the multiplication for different split parameters and see how the performance changes.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:1] The slowest run took 15.33 times longer than the fastest. This could mean that an intermediate result is being cached.\n", + "2.78 ms ± 2.76 ms per loop (mean ± std. dev. of 5 runs, 1 loop each)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:2] The slowest run took 14.90 times longer than the fastest. 
This could mean that an intermediate result is being cached.\n", + "2.69 ms ± 2.65 ms per loop (mean ± std. dev. of 5 runs, 1 loop each)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:3] The slowest run took 14.88 times longer than the fastest. This could mean that an intermediate result is being cached.\n", + "2.22 ms ± 2.24 ms per loop (mean ± std. dev. of 5 runs, 1 loop each)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:0] The slowest run took 14.81 times longer than the fastest. This could mean that an intermediate result is being cached.\n", + "2.7 ms ± 2.66 ms per loop (mean ± std. dev. of 5 runs, 1 loop each)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "z = %timeit -n 1 -r 5 x @ y " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Heat supports many linear algebra operations:\n", + "```bash\n", + ">>> ht.linalg.\n", + "ht.linalg.basics         ht.linalg.hsvd_rtol(     ht.linalg.projection(    ht.linalg.triu(\n", + "ht.linalg.cg(            ht.linalg.inv(           ht.linalg.qr(            ht.linalg.vdot(\n", + "ht.linalg.cross(         ht.linalg.lanczos(       ht.linalg.solver         ht.linalg.vecdot(\n", + "ht.linalg.det(           ht.linalg.matmul(        ht.linalg.svdtools       ht.linalg.vector_norm(\n", + "ht.linalg.dot(           ht.linalg.matrix_norm(   ht.linalg.trace(         \n", + "ht.linalg.hsvd(          ht.linalg.norm(          ht.linalg.transpose(     \n", + "ht.linalg.hsvd_rank(     ht.linalg.outer(         ht.linalg.tril(          \n", + "```\n", + "\n", + "and a lot more is in the works, including distributed eigendecompositions and SVD. If the operation you need is not yet supported, leave us a note [here](https://github.com/helmholtz-analytics/heat/issues) and we'll get back to you." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can of course perform all operations on CPUs as well: simply leave out the `device` attribute entirely." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Interoperability\n", + "\n", + "We can easily create DNDarrays from PyTorch tensors and numpy ndarrays, and convert DNDarrays back to PyTorch tensors and numpy ndarrays. This makes it easy to integrate Heat into existing PyTorch and numpy workflows, as the short sketch below shows. 
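For instance, a minimal sketch of round-tripping between numpy, PyTorch, and Heat (variable names are illustrative):

```python
%%px
import numpy as np

np_in = np.arange(12.0).reshape(3, 4)
x = ht.array(np_in, split=0)   # DNDarray from a numpy ndarray, distributed along axis 0

back_to_np = x.numpy()         # gathers the distributed data into a local numpy ndarray
local_torch = x.larray         # the process-local torch.Tensor behind the DNDarray
```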
Here a basic example with xarrays:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "[0:execute]\n", + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n", + "\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)\n", + "Cell \u001b[0;32mIn[34], line 1\u001b[0m\n", + "\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mxarray\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m \u001b[38;5;21;01mxr\u001b[39;00m\n", + "\u001b[1;32m 3\u001b[0m local_xr \u001b[38;5;241m=\u001b[39m xr\u001b[38;5;241m.\u001b[39mDataArray(dndarray\u001b[38;5;241m.\u001b[39mlarray, dims\u001b[38;5;241m=\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mz\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124my\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mx\u001b[39m\u001b[38;5;124m\"\u001b[39m))\n", + "\u001b[1;32m 4\u001b[0m \u001b[38;5;66;03m# proceed with local xarray operations\u001b[39;00m\n", + "\n", + "\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'xarray'\n", + "[2:execute]\n", + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n", + "\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)\n", + "Cell \u001b[0;32mIn[16], line 1\u001b[0m\n", + "\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mxarray\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m \u001b[38;5;21;01mxr\u001b[39;00m\n", + "\u001b[1;32m 3\u001b[0m local_xr \u001b[38;5;241m=\u001b[39m xr\u001b[38;5;241m.\u001b[39mDataArray(dndarray\u001b[38;5;241m.\u001b[39mlarray, dims\u001b[38;5;241m=\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mz\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124my\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mx\u001b[39m\u001b[38;5;124m\"\u001b[39m))\n", + "\u001b[1;32m 4\u001b[0m \u001b[38;5;66;03m# proceed with local xarray operations\u001b[39;00m\n", + "\n", + "\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'xarray'\n", + "[1:execute]\n", + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n", + "\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)\n", + "Cell \u001b[0;32mIn[16], line 1\u001b[0m\n", + "\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mxarray\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m \u001b[38;5;21;01mxr\u001b[39;00m\n", + "\u001b[1;32m 3\u001b[0m local_xr \u001b[38;5;241m=\u001b[39m xr\u001b[38;5;241m.\u001b[39mDataArray(dndarray\u001b[38;5;241m.\u001b[39mlarray, dims\u001b[38;5;241m=\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mz\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124my\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mx\u001b[39m\u001b[38;5;124m\"\u001b[39m))\n", + "\u001b[1;32m 4\u001b[0m \u001b[38;5;66;03m# proceed with local xarray operations\u001b[39;00m\n", + "\n", + "\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'xarray'\n", + "[3:execute]\n", + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n", + "\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call 
last)\n", + "Cell \u001b[0;32mIn[16], line 1\u001b[0m\n", + "\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mxarray\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m \u001b[38;5;21;01mxr\u001b[39;00m\n", + "\u001b[1;32m 3\u001b[0m local_xr \u001b[38;5;241m=\u001b[39m xr\u001b[38;5;241m.\u001b[39mDataArray(dndarray\u001b[38;5;241m.\u001b[39mlarray, dims\u001b[38;5;241m=\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mz\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124my\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mx\u001b[39m\u001b[38;5;124m\"\u001b[39m))\n", + "\u001b[1;32m 4\u001b[0m \u001b[38;5;66;03m# proceed with local xarray operations\u001b[39;00m\n", + "\n", + "\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'xarray'\n" + ] + }, + { + "ename": "AlreadyDisplayedError", + "evalue": "4 errors", + "output_type": "error", + "traceback": [ + "4 errors" + ] + } + ], + "source": [ + "%%px\n", + "import xarray as xr\n", + "\n", + "local_xr = xr.DataArray(dndarray.larray, dims=(\"z\", \"y\", \"x\"))\n", + "# proceed with local xarray operations\n", + "local_xr\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**NOTE:** this is not a distributed `xarray`, but local xarray objects on each rank.\n", + "Work on [expanding xarray support](https://github.com/helmholtz-analytics/heat/pull/1183) is ongoing.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Heat will try to reuse the memory of the original array as much as possible. If you would prefer a copy with different memory, the ```copy``` keyword argument can be used when creating a DNDArray from other libraries." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:0] tensor([-1, 1, 2, 3, 4])\n", + "tensor([0, 1, 2, 3, 4])\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:1] tensor([-1, 1, 2, 3, 4])\n", + "tensor([0, 1, 2, 3, 4])\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:2] tensor([-1, 1, 2, 3, 4])\n", + "tensor([0, 1, 2, 3, 4])\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:3] tensor([-1, 1, 2, 3, 4])\n", + "tensor([0, 1, 2, 3, 4])\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "import torch\n", + "torch_array = torch.arange(5)\n", + "heat_array = ht.array(torch_array, copy=False)\n", + "heat_array[0] = -1\n", + "print(torch_array)\n", + "\n", + "torch_array = torch.arange(5)\n", + "heat_array = ht.array(torch_array, copy=True)\n", + "heat_array[0] = -1\n", + "print(torch_array)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Interoperability is a key feature of Heat, and we are constantly working to increase Heat's compliance to the [Python array API standard](https://data-apis.org/array-api/latest/). As usual, please [let us know](https://github.com/helmholtz-analytics/heat/issues) if you encounter any issues or have any feature requests." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the [next notebook](2_internals.ipynb), let's have a look at Heat's most important internal functions." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "heat-dev311", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.2" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/doc/source/tutorials/notebooks/2_internals.ipynb b/doc/source/tutorials/notebooks/2_internals.ipynb new file mode 100644 index 0000000000..27f823ba78 --- /dev/null +++ b/doc/source/tutorials/notebooks/2_internals.ipynb @@ -0,0 +1,1417 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Heat as infrastructure for MPI applications\n", + "\n", + "In this section, we'll go through some Heat-specific functionalities that simplify the implementation of a data-parallel application in Python. We'll demonstrate them on small arrays and 4 processes on a single cluster node, but the functionalities are indeed meant for a multi-node setup with huge arrays that cannot be processed on a single node." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Your IPython cluster should still be running. Let's check it out." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "4 engines found\n" + ] + } + ], + "source": [ + "from ipyparallel import Client\n", + "rc = Client(profile=\"default\")\n", + "\n", + "if len(rc.ids) == 0:\n", + "    print(\"No engines found\")\n", + "else:\n", + "    print(f\"{len(rc.ids)} engines found\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If no engines are found, go back to the [Intro](0_setup/0_setup_local.ipynb) for instructions." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We already mentioned that the DNDarray object is \"MPI-aware\". Each DNDarray is associated with an MPI communicator; it is aware of the number of processes in the communicator, and it knows the rank of the process that owns it. \n", + "\n", + "We will use the %%px magic in every cell that executes MPI code."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:22]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 1, + "engine_uuid": "4a6ffcbf-4b7c9961beb0aa49f4f299a5", + "error": null, + "execute_input": "import torch\nimport heat as ht\n\na = ht.random.randn(7,4,3, split=0)\na.comm\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 22, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:41.928917Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[2:22]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 2, + "engine_uuid": "e3e9e719-1b11a826b66969f71d179e21", + "error": null, + "execute_input": "import torch\nimport heat as ht\n\na = ht.random.randn(7,4,3, split=0)\na.comm\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 22, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:41.928982Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:22]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 3, + "engine_uuid": "b9f6f6e8-01c224a4024814eaffce2266", + "error": null, + "execute_input": "import torch\nimport heat as ht\n\na = ht.random.randn(7,4,3, split=0)\na.comm\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 22, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:41.929045Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:40]: \u001b[0m" + ] + }, + "metadata": { + "after": [], + "completed": "2025-05-19T19:20:41.934024Z", + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "import torch\nimport heat as ht\n\na = ht.random.randn(7,4,3, split=0)\na.comm\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 40, + "metadata": {} + }, + "follow": [], + "is_broadcast": false, + "is_coalescing": false, + "msg_id": "cf6f5092-7287c6a9544d4c34e4c3830f_231404_21", + "outputs": [], + "received": "2025-05-19T19:20:41.941201Z", + "started": "2025-05-19T19:20:41.930851Z", + "status": "ok", + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:41.928786Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "import torch\n", + "import heat as ht\n", + "\n", + "a = ht.random.randn(7,4,3, split=0)\n", + "a.comm" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:3] a is distributed over 4 processes\n", + "a is a distributed 3-dimensional array with global shape (7, 4, 3)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:1] a is distributed over 4 processes\n", + "a is a distributed 3-dimensional array 
with global shape (7, 4, 3)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:0] a is distributed over 4 processes\n", + "a is a distributed 3-dimensional array with global shape (7, 4, 3)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:2] a is distributed over 4 processes\n", + "a is a distributed 3-dimensional array with global shape (7, 4, 3)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "# MPI size = total number of processes \n", + "size = a.comm.size\n", + "\n", + "print(f\"a is distributed over {size} processes\")\n", + "print(f\"a is a distributed {a.ndim}-dimensional array with global shape {a.shape}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:1] Rank 1 holds a slice of a with local shape (2, 4, 3)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:0] Rank 0 holds a slice of a with local shape (2, 4, 3)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:2] Rank 2 holds a slice of a with local shape (2, 4, 3)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:3] Rank 3 holds a slice of a with local shape (1, 4, 3)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "# MPI rank = rank of each process\n", + "rank = a.comm.rank\n", + "# Local shape = shape of the data on each process\n", + "local_shape = a.lshape\n", + "print(f\"Rank {rank} holds a slice of a with local shape {local_shape}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Distribution map\n", + "\n", + "On many occasions, when building a memory-distributed pipeline, it will be convenient for each rank to know which rank holds which slice of the distributed array. \n", + "\n", + "The `lshape_map` attribute of a DNDarray gathers (or, if possible, calculates) this info from all processes and stores it as metadata of the DNDarray. Because it is meant for internal use, it is stored in a torch tensor, not a DNDarray. \n", + "\n", + "The `lshape_map` tensor is a 2D tensor, where the first dimension is the number of processes and the second dimension is the number of dimensions of the array. Each row of the tensor contains the local shape of the array on a process. 
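For example (a sketch, assuming the split DNDarray `a` from the cells above), the global start offset of each rank's chunk along the split axis can be derived from `lshape_map` with a cumulative sum:

```python
%%px
counts = a.lshape_map[:, a.split]           # chunk sizes along the split axis
offsets = torch.cumsum(counts, 0) - counts  # offsets[r]: global start index of rank r's chunk
print(offsets)
```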
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:25]: \u001b[0m\n", + "tensor([[2, 4, 3],\n", + " [2, 4, 3],\n", + " [2, 4, 3],\n", + " [1, 4, 3]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 3, + "engine_uuid": "b9f6f6e8-01c224a4024814eaffce2266", + "error": null, + "execute_input": "lshape_map = a.lshape_map\nlshape_map\n", + "execute_result": { + "data": { + "text/plain": "tensor([[2, 4, 3],\n [2, 4, 3],\n [2, 4, 3],\n [1, 4, 3]])" + }, + "execution_count": 25, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:45.543757Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:25]: \u001b[0m\n", + "tensor([[2, 4, 3],\n", + " [2, 4, 3],\n", + " [2, 4, 3],\n", + " [1, 4, 3]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 1, + "engine_uuid": "4a6ffcbf-4b7c9961beb0aa49f4f299a5", + "error": null, + "execute_input": "lshape_map = a.lshape_map\nlshape_map\n", + "execute_result": { + "data": { + "text/plain": "tensor([[2, 4, 3],\n [2, 4, 3],\n [2, 4, 3],\n [1, 4, 3]])" + }, + "execution_count": 25, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:45.543320Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[2:25]: \u001b[0m\n", + "tensor([[2, 4, 3],\n", + " [2, 4, 3],\n", + " [2, 4, 3],\n", + " [1, 4, 3]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 2, + "engine_uuid": "e3e9e719-1b11a826b66969f71d179e21", + "error": null, + "execute_input": "lshape_map = a.lshape_map\nlshape_map\n", + "execute_result": { + "data": { + "text/plain": "tensor([[2, 4, 3],\n [2, 4, 3],\n [2, 4, 3],\n [1, 4, 3]])" + }, + "execution_count": 25, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:45.543554Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:43]: \u001b[0m\n", + "tensor([[2, 4, 3],\n", + " [2, 4, 3],\n", + " [2, 4, 3],\n", + " [1, 4, 3]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "lshape_map = a.lshape_map\nlshape_map\n", + "execute_result": { + "data": { + "text/plain": "tensor([[2, 4, 3],\n [2, 4, 3],\n [2, 4, 3],\n [1, 4, 3]])" + }, + "execution_count": 43, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:45.543032Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "lshape_map = a.lshape_map\n", + "lshape_map" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Go back to where we created the DNDarray and and create `a` with a different split axis. See how the `lshape_map` changes." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Modifying the DNDarray distribution\n", + "\n", + "In a distributed pipeline, it is sometimes necessary to change the distribution of a DNDarray, when the array is not distributed in the most convenient way for the next operation / algorithm.\n", + "\n", + "Depending on your needs, you can choose between:\n", + "- `DNDarray.redistribute_()`: This method keeps the original split axis, but redistributes the data of the DNDarray according to a \"target map\".\n", + "- `DNDarray.resplit_()`: This method changes the split axis of the DNDarray. This is a more expensive operation, and should be used only when absolutely necessary. Depending on your needs and available resources, in some cases it might be wiser to keep a copy of the DNDarray with a different split axis.\n", + "\n", + "Let's see some examples." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:44]: \u001b[0m\n", + "tensor([[1, 4, 3],\n", + " [2, 4, 3],\n", + " [2, 4, 3],\n", + " [2, 4, 3]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "#redistribute\ntarget_map = a.lshape_map\ntarget_map[:, a.split] = torch.tensor([1, 2, 2, 2])\n# in-place redistribution (see ht.redistribute for out-of-place)\na.redistribute_(target_map=target_map)\n\n# new lshape map after redistribution\na.lshape_map\n", + "execute_result": { + "data": { + "text/plain": "tensor([[1, 4, 3],\n [2, 4, 3],\n [2, 4, 3],\n [2, 4, 3]])" + }, + "execution_count": 44, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:47.671295Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[2:26]: \u001b[0m\n", + "tensor([[1, 4, 3],\n", + " [2, 4, 3],\n", + " [2, 4, 3],\n", + " [2, 4, 3]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 2, + "engine_uuid": "e3e9e719-1b11a826b66969f71d179e21", + "error": null, + "execute_input": "#redistribute\ntarget_map = a.lshape_map\ntarget_map[:, a.split] = torch.tensor([1, 2, 2, 2])\n# in-place redistribution (see ht.redistribute for out-of-place)\na.redistribute_(target_map=target_map)\n\n# new lshape map after redistribution\na.lshape_map\n", + "execute_result": { + "data": { + "text/plain": "tensor([[1, 4, 3],\n [2, 4, 3],\n [2, 4, 3],\n [2, 4, 3]])" + }, + "execution_count": 26, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:47.671730Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:26]: \u001b[0m\n", + "tensor([[1, 4, 3],\n", + " [2, 4, 3],\n", + " [2, 4, 3],\n", + " [2, 4, 3]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 3, + "engine_uuid": "b9f6f6e8-01c224a4024814eaffce2266", + "error": null, + "execute_input": "#redistribute\ntarget_map = a.lshape_map\ntarget_map[:, a.split] = torch.tensor([1, 2, 2, 2])\n# in-place redistribution (see ht.redistribute for out-of-place)\na.redistribute_(target_map=target_map)\n\n# new lshape map after 
redistribution\na.lshape_map\n", + "execute_result": { + "data": { + "text/plain": "tensor([[1, 4, 3],\n [2, 4, 3],\n [2, 4, 3],\n [2, 4, 3]])" + }, + "execution_count": 26, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:47.671921Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:26]: \u001b[0m\n", + "tensor([[1, 4, 3],\n", + " [2, 4, 3],\n", + " [2, 4, 3],\n", + " [2, 4, 3]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 1, + "engine_uuid": "4a6ffcbf-4b7c9961beb0aa49f4f299a5", + "error": null, + "execute_input": "#redistribute\ntarget_map = a.lshape_map\ntarget_map[:, a.split] = torch.tensor([1, 2, 2, 2])\n# in-place redistribution (see ht.redistribute for out-of-place)\na.redistribute_(target_map=target_map)\n\n# new lshape map after redistribution\na.lshape_map\n", + "execute_result": { + "data": { + "text/plain": "tensor([[1, 4, 3],\n [2, 4, 3],\n [2, 4, 3],\n [2, 4, 3]])" + }, + "execution_count": 26, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:47.671506Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "#redistribute\n", + "target_map = a.lshape_map\n", + "target_map[:, a.split] = torch.tensor([1, 2, 2, 2])\n", + "# in-place redistribution (see ht.redistribute for out-of-place)\n", + "a.redistribute_(target_map=target_map)\n", + "\n", + "# new lshape map after redistribution\n", + "a.lshape_map" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:27]: \u001b[0m\n", + "tensor([[[ 0.5730, -1.0918, -0.8577],\n", + " [ 0.6610, -0.4874, 0.9850],\n", + " [ 1.0930, -0.8518, -0.7061],\n", + " [-0.7625, 0.6767, 0.1940]],\n", + "\n", + " [[-1.1230, 0.2482, 0.7127],\n", + " [-0.3202, -0.3510, -1.2052],\n", + " [-1.0595, -0.5830, 0.4192],\n", + " [ 0.5600, -1.2777, -0.1323]]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 3, + "engine_uuid": "b9f6f6e8-01c224a4024814eaffce2266", + "error": null, + "execute_input": "# local arrays after redistribution\na.larray\n", + "execute_result": { + "data": { + "text/plain": "tensor([[[ 0.5730, -1.0918, -0.8577],\n [ 0.6610, -0.4874, 0.9850],\n [ 1.0930, -0.8518, -0.7061],\n [-0.7625, 0.6767, 0.1940]],\n\n [[-1.1230, 0.2482, 0.7127],\n [-0.3202, -0.3510, -1.2052],\n [-1.0595, -0.5830, 0.4192],\n [ 0.5600, -1.2777, -0.1323]]])" + }, + "execution_count": 27, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:48.893023Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[2:27]: \u001b[0m\n", + "tensor([[[ 1.6286, 0.4707, -0.5730],\n", + " [ 0.3841, -0.4789, -0.8033],\n", + " [ 0.1299, -0.6602, -2.0182],\n", + " [ 0.5541, -0.1653, -0.4314]],\n", + "\n", + " [[ 1.1544, -0.8126, -0.7634],\n", + " [-0.0817, -1.5430, -0.6341],\n", + " [ 0.0291, 0.9677, 0.1294],\n", + " [-0.3747, -1.4987, -0.1063]]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 2, + "engine_uuid": 
"e3e9e719-1b11a826b66969f71d179e21", + "error": null, + "execute_input": "# local arrays after redistribution\na.larray\n", + "execute_result": { + "data": { + "text/plain": "tensor([[[ 1.6286, 0.4707, -0.5730],\n [ 0.3841, -0.4789, -0.8033],\n [ 0.1299, -0.6602, -2.0182],\n [ 0.5541, -0.1653, -0.4314]],\n\n [[ 1.1544, -0.8126, -0.7634],\n [-0.0817, -1.5430, -0.6341],\n [ 0.0291, 0.9677, 0.1294],\n [-0.3747, -1.4987, -0.1063]]])" + }, + "execution_count": 27, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:48.892765Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:27]: \u001b[0m\n", + "tensor([[[-0.0919, -0.7646, 0.1660],\n", + " [-0.9814, 0.9445, -1.8339],\n", + " [-1.0218, 0.8454, -0.6050],\n", + " [-0.4161, -0.0764, 0.4383]],\n", + "\n", + " [[ 0.3151, -2.1761, 0.9970],\n", + " [ 0.9423, 0.7667, 0.6834],\n", + " [ 1.9586, -0.0994, 0.0186],\n", + " [-0.0961, -0.3901, 1.2133]]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 1, + "engine_uuid": "4a6ffcbf-4b7c9961beb0aa49f4f299a5", + "error": null, + "execute_input": "# local arrays after redistribution\na.larray\n", + "execute_result": { + "data": { + "text/plain": "tensor([[[-0.0919, -0.7646, 0.1660],\n [-0.9814, 0.9445, -1.8339],\n [-1.0218, 0.8454, -0.6050],\n [-0.4161, -0.0764, 0.4383]],\n\n [[ 0.3151, -2.1761, 0.9970],\n [ 0.9423, 0.7667, 0.6834],\n [ 1.9586, -0.0994, 0.0186],\n [-0.0961, -0.3901, 1.2133]]])" + }, + "execution_count": 27, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:48.892426Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:45]: \u001b[0m\n", + "tensor([[[-0.1776, -0.8116, -0.6636],\n", + " [ 0.3238, 2.4110, 0.4005],\n", + " [-0.7808, -2.0984, 1.7691],\n", + " [ 0.9370, 0.0141, 0.6934]]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "# local arrays after redistribution\na.larray\n", + "execute_result": { + "data": { + "text/plain": "tensor([[[-0.1776, -0.8116, -0.6636],\n [ 0.3238, 2.4110, 0.4005],\n [-0.7808, -2.0984, 1.7691],\n [ 0.9370, 0.0141, 0.6934]]])" + }, + "execution_count": 45, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:48.891730Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "# local arrays after redistribution\n", + "a.larray" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:28]: \u001b[0m\n", + "tensor([[7, 1, 3],\n", + " [7, 1, 3],\n", + " [7, 1, 3],\n", + " [7, 1, 3]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 1, + "engine_uuid": "4a6ffcbf-4b7c9961beb0aa49f4f299a5", + "error": null, + "execute_input": "# resplit\na.resplit_(axis=1)\n\na.lshape_map\n", + "execute_result": { + "data": { + "text/plain": "tensor([[7, 1, 3],\n [7, 1, 3],\n [7, 1, 3],\n [7, 1, 3]])" + }, + "execution_count": 28, + 
"metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:49.681796Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[2:28]: \u001b[0m\n", + "tensor([[7, 1, 3],\n", + " [7, 1, 3],\n", + " [7, 1, 3],\n", + " [7, 1, 3]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 2, + "engine_uuid": "e3e9e719-1b11a826b66969f71d179e21", + "error": null, + "execute_input": "# resplit\na.resplit_(axis=1)\n\na.lshape_map\n", + "execute_result": { + "data": { + "text/plain": "tensor([[7, 1, 3],\n [7, 1, 3],\n [7, 1, 3],\n [7, 1, 3]])" + }, + "execution_count": 28, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:49.682052Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:28]: \u001b[0m\n", + "tensor([[7, 1, 3],\n", + " [7, 1, 3],\n", + " [7, 1, 3],\n", + " [7, 1, 3]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 3, + "engine_uuid": "b9f6f6e8-01c224a4024814eaffce2266", + "error": null, + "execute_input": "# resplit\na.resplit_(axis=1)\n\na.lshape_map\n", + "execute_result": { + "data": { + "text/plain": "tensor([[7, 1, 3],\n [7, 1, 3],\n [7, 1, 3],\n [7, 1, 3]])" + }, + "execution_count": 28, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:49.682295Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:46]: \u001b[0m\n", + "tensor([[7, 1, 3],\n", + " [7, 1, 3],\n", + " [7, 1, 3],\n", + " [7, 1, 3]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "# resplit\na.resplit_(axis=1)\n\na.lshape_map\n", + "execute_result": { + "data": { + "text/plain": "tensor([[7, 1, 3],\n [7, 1, 3],\n [7, 1, 3],\n [7, 1, 3]])" + }, + "execution_count": 46, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:49.681493Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "# resplit\n", + "a.resplit_(axis=1)\n", + "\n", + "a.lshape_map" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can use the `resplit_` method (in-place), or `ht.resplit` (out-of-place) to change the distribution axis, but also to set the distribution axis to None. The latter corresponds to an MPI.Allgather operation that gathers the entire array on each process. This is useful when you've achieved a small enough data size that can be processed on a single device, and you want to avoid communication overhead." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:47]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "# \"un-split\" distributed array\na.resplit_(axis=None)\n# each process now holds a copy of the entire array\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 47, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:53.077278Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:29]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 1, + "engine_uuid": "4a6ffcbf-4b7c9961beb0aa49f4f299a5", + "error": null, + "execute_input": "# \"un-split\" distributed array\na.resplit_(axis=None)\n# each process now holds a copy of the entire array\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 29, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:53.077581Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[2:29]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 2, + "engine_uuid": "e3e9e719-1b11a826b66969f71d179e21", + "error": null, + "execute_input": "# \"un-split\" distributed array\na.resplit_(axis=None)\n# each process now holds a copy of the entire array\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 29, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:53.077833Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:29]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 3, + "engine_uuid": "b9f6f6e8-01c224a4024814eaffce2266", + "error": null, + "execute_input": "# \"un-split\" distributed array\na.resplit_(axis=None)\n# each process now holds a copy of the entire array\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 29, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:53.078115Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "# \"un-split\" distributed array\n", + "a.resplit_(axis=None)\n", + "# each process now holds a copy of the entire array" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The opposite is not true, i.e. you cannot use `resplit_` to distribute an array with split=None. 
In that case, you must use the `ht.array()` factory function:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%px\n", + "# make `a` split again\n", + "a = ht.array(a, split=0)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Making disjoint data into a global DNDarray\n", + "\n", + "Another common occurrence in a data-parallel pipeline: you have addressed the embarrassingly-parallel part of your algorithm with any array framework, each process working independently from the others. You now want to perform a non-embarrassingly-parallel operation on the entire dataset, with Heat as a backend.\n", + "\n", + "You can use the `ht.array` factory function with the `is_split` argument to create a DNDarray from a disjoint (on each MPI process) set of arrays. The `is_split` argument indicates the axis along which the disjoint data is to be \"joined\" into a global, distributed DNDarray." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:49]: \u001b[0m(12, 4)" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "# create some random local arrays on each process\nimport numpy as np\nlocal_array = np.random.rand(3, 4)\n\n# join them into a distributed array\na_0 = ht.array(local_array, is_split=0)\na_0.shape\n", + "execute_result": { + "data": { + "text/plain": "(12, 4)" + }, + "execution_count": 49, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:21:07.545019Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:31]: \u001b[0m(12, 4)" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 1, + "engine_uuid": "4a6ffcbf-4b7c9961beb0aa49f4f299a5", + "error": null, + "execute_input": "# create some random local arrays on each process\nimport numpy as np\nlocal_array = np.random.rand(3, 4)\n\n# join them into a distributed array\na_0 = ht.array(local_array, is_split=0)\na_0.shape\n", + "execute_result": { + "data": { + "text/plain": "(12, 4)" + }, + "execution_count": 31, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:21:07.545093Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:31]: \u001b[0m(12, 4)" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 3, + "engine_uuid": "b9f6f6e8-01c224a4024814eaffce2266", + "error": null, + "execute_input": "# create some random local arrays on each process\nimport numpy as np\nlocal_array = np.random.rand(3, 4)\n\n# join them into a distributed array\na_0 = ht.array(local_array, is_split=0)\na_0.shape\n", + "execute_result": { + "data": { + "text/plain": "(12, 4)" + }, + "execution_count": 31, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:21:07.545314Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ +
"\u001b[0;31mOut[2:31]: \u001b[0m(12, 4)" + ] + }, + "metadata": { + "after": [], + "completed": "2025-05-19T19:21:07.555075Z", + "data": {}, + "engine_id": 2, + "engine_uuid": "e3e9e719-1b11a826b66969f71d179e21", + "error": null, + "execute_input": "# create some random local arrays on each process\nimport numpy as np\nlocal_array = np.random.rand(3, 4)\n\n# join them into a distributed array\na_0 = ht.array(local_array, is_split=0)\na_0.shape\n", + "execute_result": { + "data": { + "text/plain": "(12, 4)" + }, + "execution_count": 31, + "metadata": {} + }, + "follow": [], + "is_broadcast": false, + "is_coalescing": false, + "msg_id": "cf6f5092-7287c6a9544d4c34e4c3830f_231404_59", + "outputs": [], + "received": "2025-05-19T19:21:07.560600Z", + "started": "2025-05-19T19:21:07.547257Z", + "status": "ok", + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:21:07.545116Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "# create some random local arrays on each process\n", + "import numpy as np\n", + "local_array = np.random.rand(3, 4)\n", + "\n", + "# join them into a distributed array\n", + "a_0 = ht.array(local_array, is_split=0)\n", + "a_0.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Change the cell above and join the arrays along a different axis. Note that the shapes of the local arrays must be consistent along the non-split axes. They can differ along the split axis." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `ht.array` function takes any data object as an input that can be converted to a torch tensor. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Once you've made your disjoint data into a DNDarray, you can apply any Heat operation or algorithm to it and exploit the cumulative RAM of all the processes in the communicator. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can access the MPI communication functionalities of the DNDarray through the `comm` attribute, i.e.:\n", + "\n", + "```python\n", + "# these are just examples, this cell won't do anything\n", + "a.comm.Allreduce(a, b, op=MPI.SUM)\n", + "\n", + "a.comm.Allgather(a, b)\n", + "a.comm.Isend(a, dest=1, tag=0)\n", + "```\n", + "\n", + "etc." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the next notebooks, we'll show you how we use Heat's distributed-array infrastructure to scale complex data analysis workflows to large datasets and high-performance computing resources.\n", + "\n", + "- [Data loading and preprocessing](3_loading_preprocessing.ipynb)\n", + "- [Matrix factorization algorithms](4_matrix_factorizations.ipynb)\n", + "- [Clustering algorithms](5_clustering.ipynb)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "heat-dev-311", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.8" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/doc/source/tutorials/notebooks/3_loading_preprocessing.ipynb b/doc/source/tutorials/notebooks/3_loading_preprocessing.ipynb new file mode 100644 index 0000000000..9db5a38216 --- /dev/null +++ b/doc/source/tutorials/notebooks/3_loading_preprocessing.ipynb @@ -0,0 +1,488 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Loading and Preprocessing\n", + "\n", + "### Refresher\n", + "\n", + "Using PyTorch as compute engine and mpi4py for communication, Heat implements a number of array operations and algorithms that are optimized for memory-distributed data volumes. This allows you to tackle datasets that are too large for single-node (or worse, single-GPU) processing. \n", + "\n", + "As opposed to task-parallel frameworks, Heat takes a data-parallel approach, meaning that each \"worker\" or MPI process performs the same tasks on different slices of the data. Many operations and algorithms are not embarrassingly parallel, and involve data exchange between processes. Heat operations and algorithms are designed to minimize this communication overhead, and to make it transparent to the user.\n", + "\n", + "In other words: \n", + "- you don't have to worry about optimizing data chunk sizes; \n", + "- you don't have to make sure your research problem is embarrassingly parallel, or artificially make your dataset smaller so your RAM is sufficient; \n", + "- you do have to make sure that you have sufficient **overall** RAM to run your global task (e.g. number of nodes / GPUs)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following shows some I/O and preprocessing examples. We'll use small datasets here, as each of us has access to only one node." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### I/O\n", + "\n", + "Let's start with loading a data set. Heat supports reading and writing from/into shared memory for a number of formats, including HDF5, NetCDF, and because we love scientists, csv. Check out the `ht.load` and `ht.save` functions for more details. Here we will load data in [HDF5 format](https://en.wikipedia.org/wiki/Hierarchical_Data_Format).\n", + "\n", + "This particular example data set (generated from all Asteroids from the [JPL Small Body Database](https://ssd.jpl.nasa.gov/sb/)) is really small, but it allows us to demonstrate the basic functionality of Heat. (In this interactive version of the notebook, we create the DNDarray from a small `scikit-learn` example dataset instead; see the cells below.)\n", + " " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Your ipcluster should still be running (see the [Intro](0_setup/0_setup_local.ipynb)). 
Let's test it:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[0, 1, 2, 3]" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from ipyparallel import Client\n", + "rc = Client(profile=\"default\")\n", + "rc.ids" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The above cell should return [0, 1, 2, 3].\n", + "\n", + "Now let's import `heat` and load the data set." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:54]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "import heat as ht\nimport sklearn\nimport sklearn.datasets\n\nX,_ = sklearn.datasets.load_digits(return_X_y=True)\nX = ht.array(X, split=0)\nX\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 54, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:24:32.711141Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[2:36]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 2, + "engine_uuid": "e3e9e719-1b11a826b66969f71d179e21", + "error": null, + "execute_input": "import heat as ht\nimport sklearn\nimport sklearn.datasets\n\nX,_ = sklearn.datasets.load_digits(return_X_y=True)\nX = ht.array(X, split=0)\nX\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 36, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:24:32.711423Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:36]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 3, + "engine_uuid": "b9f6f6e8-01c224a4024814eaffce2266", + "error": null, + "execute_input": "import heat as ht\nimport sklearn\nimport sklearn.datasets\n\nX,_ = sklearn.datasets.load_digits(return_X_y=True)\nX = ht.array(X, split=0)\nX\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 36, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:24:32.711532Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:36]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 1, + "engine_uuid": "4a6ffcbf-4b7c9961beb0aa49f4f299a5", + "error": null, + "execute_input": "import heat as ht\nimport sklearn\nimport sklearn.datasets\n\nX,_ = sklearn.datasets.load_digits(return_X_y=True)\nX = ht.array(X, split=0)\nX\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 36, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:24:32.711290Z" + }, 
+ "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "import heat as ht\n", + "import sklearn\n", + "import sklearn.datasets\n", + "\n", + "X,_ = sklearn.datasets.load_digits(return_X_y=True)\n", + "X = ht.array(X, split=0)\n", + "X\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We have loaded the entire data onto 4 MPI processes. We have created `X` with `split=0`, so each process stores evenly-sized slices of the data along dimension 0." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Data exploration\n", + "\n", + "Let's get an idea of the size of the data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:0] X is a 2-dimensional array with shape(1797, 64)\n", + "X takes up 0.920064 MB of memory.\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%px \n", + "# print global metadata once only\n", + "if X.comm.rank == 0:\n", + " print(f\"X is a {X.ndim}-dimensional array with shape{X.shape}\")\n", + " print(f\"X takes up {X.nbytes/1e6} MB of memory.\")\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "X is a matrix of shape *(datapoints, features)*. \n", + "\n", + "To get a first overview, we can print the data and determine its feature-wise mean, variance, min, max etc. These are reduction operations along the datapoints dimension, which is also the `split` dimension. You don't have to implement [`MPI.Allreduce`](https://mpitutorial.com/tutorials/mpi-reduce-and-allreduce/) operations yourself, communication is handled by Heat operations." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:0] Mean: DNDarray([0.0000e+00, 3.0384e-01, 5.2048e+00, 1.1836e+01, 1.1848e+01, 5.7819e+00, 1.3623e+00, 1.2966e-01, 5.5648e-03,\n", + " 1.9939e+00, 1.0382e+01, 1.1979e+01, 1.0279e+01, 8.1758e+00, 1.8464e+00, 1.0796e-01, 2.7824e-03, 2.6016e+00,\n", + " 9.9032e+00, 6.9928e+00, 7.0979e+00, 7.8063e+00, 1.7885e+00, 5.0083e-02, 1.1130e-03, 2.4697e+00, 9.0913e+00,\n", + " 8.8214e+00, 9.9271e+00, 7.5515e+00, 2.3178e+00, 2.2259e-03, 0.0000e+00, 2.3395e+00, 7.6672e+00, 9.0718e+00,\n", + " 1.0302e+01, 8.7440e+00, 2.9093e+00, 0.0000e+00, 8.9037e-03, 1.5838e+00, 6.8815e+00, 7.2282e+00, 7.6722e+00,\n", + " 8.2365e+00, 3.4563e+00, 2.7268e-02, 7.2343e-03, 7.0451e-01, 7.5070e+00, 9.5392e+00, 9.4162e+00, 8.7585e+00,\n", + " 3.7251e+00, 2.0646e-01, 5.5648e-04, 2.7935e-01, 5.5576e+00, 1.2089e+01, 1.1809e+01, 6.7641e+00, 2.0679e+00,\n", + " 3.6450e-01], dtype=ht.float32, device=cpu:0, split=None)\n", + "Var: DNDarray([0.0000e+00, 8.2254e-01, 2.2596e+01, 1.8043e+01, 1.8371e+01, 3.2090e+01, 1.1055e+01, 1.0756e+00, 8.8728e-03,\n", + " 1.0210e+01, 2.9376e+01, 1.5812e+01, 2.2861e+01, 3.6618e+01, 1.2855e+01, 6.8506e-01, 3.8876e-03, 1.2783e+01,\n", + " 3.2367e+01, 3.3652e+01, 3.8118e+01, 3.8385e+01, 1.0621e+01, 1.9226e-01, 1.1117e-03, 9.8952e+00, 3.8320e+01,\n", + " 3.4590e+01, 3.7827e+01, 3.4468e+01, 1.3582e+01, 2.2210e-03, 0.0000e+00, 1.2106e+01, 3.9979e+01, 3.9271e+01,\n", + " 3.5187e+01, 3.4445e+01, 1.2505e+01, 0.0000e+00, 2.1067e-02, 8.8863e+00, 4.2721e+01, 4.1468e+01, 3.9160e+01,\n", + " 3.2421e+01, 1.8747e+01, 9.4415e-02, 4.1684e-02, 3.0474e+00, 3.1843e+01, 2.7306e+01, 2.8096e+01, 3.6355e+01,\n", + " 2.4187e+01, 9.6851e-01, 5.5617e-04, 8.7243e-01, 2.6026e+01, 1.9127e+01, 2.4330e+01, 3.4798e+01, 
1.6723e+01,\n", + "          3.4581e+00], dtype=ht.float64, device=cpu:0, split=None)\n", + "Max: DNDarray([ 0., 8., 16., 16., 16., 16., 16., 15., 2., 16., 16., 16., 16., 16., 16., 12., 2., 16., 16., 16., 16., 16.,\n", + "          16., 8., 1., 15., 16., 16., 16., 16., 15., 1., 0., 14., 16., 16., 16., 16., 14., 0., 4., 16., 16., 16.,\n", + "          16., 16., 16., 6., 8., 16., 16., 16., 16., 16., 16., 13., 1., 9., 16., 16., 16., 16., 16., 16.], dtype=ht.float64, device=cpu:0, split=None)\n", + "Min: DNDarray([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n", + "          0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=ht.float64, device=cpu:0, split=None)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "features_mean = ht.mean(X,axis=0)\n", + "features_var = ht.var(X,axis=0)\n", + "features_max = ht.max(X,axis=0)\n", + "features_min = ht.min(X,axis=0)\n", + "\n", + "if ht.MPI_WORLD.rank == 0:\n", + "    print(f\"Mean: {features_mean}\")\n", + "    print(f\"Var: {features_var}\")\n", + "    print(f\"Max: {features_max}\")\n", + "    print(f\"Min: {features_min}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note that the `features_...` DNDarrays are no longer distributed, i.e. a copy of these results exists on each process, as the split dimension of the input data has been lost in the reduction operations. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Preprocessing/scaling\n", + "\n", + "Next, we can preprocess the data, e.g., by standardizing and/or normalizing. Heat offers several preprocessing routines for doing so; the API is similar to [`sklearn.preprocessing`](https://scikit-learn.org/stable/modules/preprocessing.html), so adapting existing code shouldn't be too complicated.\n", + "\n", + "Again, please let us know if you're missing any features." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:1] At least one of the features is almost constant (w.r.t. machine precision) and will not be scaled for this reason.\n", + "Standard Scaler Mean: \n", + "Standard Scaler Var: \n", + "At least one of the features is almost constant (w.r.t. machine precision) and will not be scaled for this reason.\n", + "Robust Scaler Mean: \n", + "Robust Scaler Var: \n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:2] At least one of the features is almost constant (w.r.t. machine precision) and will not be scaled for this reason.\n", + "Standard Scaler Mean: \n", + "Standard Scaler Var: \n", + "At least one of the features is almost constant (w.r.t. machine precision) and will not be scaled for this reason.\n", + "Robust Scaler Mean: \n", + "Robust Scaler Var: \n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:3] At least one of the features is almost constant (w.r.t.
machine precision) and will not be scaled for this reason.\n", + "Standard Scaler Mean: \n", + "Standard Scaler Var: \n", + "At least one of the features is almost constant (w.r.t. machine precision) and will not be scaled for this reason.\n", + "Robust Scaler Mean: \n", + "Robust Scaler Var: \n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:0] At least one of the features is almost constant (w.r.t. machine precision) and will not be scaled for this reason.\n", + "Standard Scaler Mean: DNDarray([ 0.0000e+00, -1.0710e-08, -1.1292e-08, 3.9116e-08, 1.0431e-07, -4.6566e-08, -7.4506e-09, -1.8626e-09,\n", + "          0.0000e+00, -1.0710e-08, -6.3796e-08, -1.1176e-08, -1.1502e-07, -5.9605e-08, -2.2352e-08, 1.8626e-09,\n", + "          -2.7940e-09, -1.6764e-08, -9.5344e-08, 5.5879e-08, 1.3970e-08, 5.5181e-08, -2.9802e-08, -7.4506e-09,\n", + "          9.3132e-10, -1.6764e-08, 6.4261e-08, -3.9116e-08, -6.7055e-08, -6.2399e-08, -2.1420e-08, -1.8626e-09,\n", + "          0.0000e+00, 0.0000e+00, 2.9802e-08, -9.0338e-08, -1.3970e-09, 3.5390e-08, 2.6077e-08, 0.0000e+00,\n", + "          -1.8626e-09, 3.1199e-08, -2.3749e-08, -6.7055e-08, -2.8871e-08, -4.0978e-08, -3.0384e-08, -5.3551e-09,\n", + "          -2.7940e-09, -7.4506e-09, -1.0245e-08, -3.7253e-08, -3.7253e-09, -6.3330e-08, -1.8626e-09, 3.7253e-09,\n", + "          -2.7940e-09, -3.7253e-09, -4.0513e-08, 7.6252e-08, 8.9407e-08, -4.0978e-08, 7.4506e-09, 0.0000e+00], dtype=ht.float32, device=cpu:0, split=None)\n", + "Standard Scaler Var: DNDarray([0.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000,\n", + "          1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000,\n", + "          1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 0.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000,\n", + "          0.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000,\n", + "          1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000], dtype=ht.float64, device=cpu:0, split=None)\n", + "At least one of the features is almost constant (w.r.t.
machine precision) and will not be scaled for this reason.\n", + "Robust Scaler Mean: DNDarray([ 0.0000e+00, 3.0384e-01, 1.5060e-01, -2.3283e-01, -2.3038e-01, 1.6199e-01, 1.3623e+00, 1.2966e-01,\n", + "          5.5648e-03, 6.6463e-01, -1.7974e-01, -1.4580e-01, -9.0081e-02, -6.8679e-02, 9.2321e-01, 1.0796e-01,\n", + "          2.7824e-03, 4.0039e-01, -2.0968e-01, 9.0251e-02, 9.1495e-02, -1.3833e-02, 5.9618e-01, 5.0083e-02,\n", + "          1.1130e-03, 3.6742e-01, -1.5906e-01, -9.8219e-02, -1.7274e-01, 4.5956e-02, 5.7944e-01, 2.2259e-03,\n", + "          0.0000e+00, 5.8486e-01, -2.3770e-02, -7.1401e-02, -2.6984e-01, -1.1418e-01, 3.1822e-01, 0.0000e+00,\n", + "          8.9037e-03, 7.9188e-01, 6.2962e-02, 1.6297e-02, -2.5213e-02, -7.6349e-02, 3.5090e-01, 2.7268e-02,\n", + "          7.2343e-03, 7.0451e-01, -4.4822e-02, -5.1196e-02, -5.8375e-02, -9.5501e-02, 3.8930e-01, 2.0646e-01,\n", + "          5.5648e-04, 2.7935e-01, 1.7307e-01, -1.8219e-01, -3.6515e-01, 6.3671e-02, 1.0339e+00, 3.6450e-01], dtype=ht.float32, device=cpu:0, split=None)\n", + "Robust Scaler Var: DNDarray([0.0000e+00, 8.2254e-01, 3.5306e-01, 7.2170e-01, 7.3486e-01, 2.6521e-01, 1.1055e+01, 1.0756e+00, 8.8728e-03,\n", + "          1.1344e+00, 3.6266e-01, 3.2269e-01, 3.5721e-01, 2.5429e-01, 3.2136e+00, 6.8506e-01, 3.8876e-03, 7.9893e-01,\n", + "          3.2367e-01, 2.7812e-01, 2.6471e-01, 1.9584e-01, 1.1801e+00, 1.9226e-01, 1.1117e-03, 6.1845e-01, 2.6611e-01,\n", + "          2.4021e-01, 2.6269e-01, 2.3936e-01, 8.4890e-01, 2.2210e-03, 0.0000e+00, 7.5664e-01, 2.0398e-01, 2.3237e-01,\n", + "          3.5187e-01, 2.8467e-01, 3.4737e-01, 0.0000e+00, 2.1067e-02, 2.2216e+00, 2.1796e-01, 2.1157e-01, 2.3171e-01,\n", + "          3.2421e-01, 3.8259e-01, 9.4415e-02, 4.1684e-02, 3.0474e+00, 2.6316e-01, 3.3711e-01, 2.8096e-01, 2.1512e-01,\n", + "          4.9361e-01, 9.6851e-01, 5.5617e-04, 8.7243e-01, 3.2131e-01, 7.6509e-01, 6.7584e-01, 2.4165e-01, 4.1808e+00,\n", + "          3.4581e+00], dtype=ht.float64, device=cpu:0, split=None)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "# Standard Scaler\n", + "scaler = ht.preprocessing.StandardScaler()\n", + "X_standardized = scaler.fit_transform(X)\n", + "standardized_mean = ht.mean(X_standardized,axis=0)\n", + "standardized_var = ht.var(X_standardized,axis=0)\n", + "print(f\"Standard Scaler Mean: {standardized_mean}\")\n", + "print(f\"Standard Scaler Var: {standardized_var}\")\n", + "\n", + "# Robust Scaler\n", + "scaler = ht.preprocessing.RobustScaler()\n", + "X_robust = scaler.fit_transform(X)\n", + "robust_mean = ht.mean(X_robust,axis=0)\n", + "robust_var = ht.var(X_robust,axis=0)\n", + "\n", + "print(f\"Robust Scaler Mean: {robust_mean}\")\n", + "print(f\"Robust Scaler Var: {robust_var}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Within Heat, you have several options to apply memory-distributed machine learning algorithms on your data. Check out our dedicated \"clustering\" notebook for an example.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Is the algorithm you're looking for not yet implemented? [Let us know](https://github.com/helmholtz-analytics/heat/issues/new/choose)! 
" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "heat-dev-311", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.8" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/tutorials/local/5_matrix_factorizations.ipynb b/doc/source/tutorials/notebooks/4_matrix_factorizations.ipynb similarity index 56% rename from tutorials/local/5_matrix_factorizations.ipynb rename to doc/source/tutorials/notebooks/4_matrix_factorizations.ipynb index 0abbd6f0ae..3a862220e4 100644 --- a/tutorials/local/5_matrix_factorizations.ipynb +++ b/doc/source/tutorials/notebooks/4_matrix_factorizations.ipynb @@ -92,7 +92,15 @@ "cell_type": "code", "execution_count": null, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "4 engines found\n" + ] + } + ], "source": [ "from ipyparallel import Client\n", "rc = Client(profile=\"default\")\n", @@ -108,11 +116,151 @@ "cell_type": "code", "execution_count": null, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:41]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 1, + "engine_uuid": "4a6ffcbf-4b7c9961beb0aa49f4f299a5", + "error": null, + "execute_input": "import heat as ht\nimport sklearn\nimport sklearn.datasets\n\nX,_ = sklearn.datasets.load_digits(return_X_y=True)\nX = ht.array(X, split=0)\nX\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 41, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:27:27.875170Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[2:41]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 2, + "engine_uuid": "e3e9e719-1b11a826b66969f71d179e21", + "error": null, + "execute_input": "import heat as ht\nimport sklearn\nimport sklearn.datasets\n\nX,_ = sklearn.datasets.load_digits(return_X_y=True)\nX = ht.array(X, split=0)\nX\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 41, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:27:27.875244Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:59]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "import heat as ht\nimport sklearn\nimport sklearn.datasets\n\nX,_ = sklearn.datasets.load_digits(return_X_y=True)\nX = ht.array(X, split=0)\nX\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 59, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:27:27.874886Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:41]: 
\u001b[0m" + ] + }, + "metadata": { + "after": [], + "completed": "2025-05-19T19:27:27.893702Z", + "data": {}, + "engine_id": 3, + "engine_uuid": "b9f6f6e8-01c224a4024814eaffce2266", + "error": null, + "execute_input": "import heat as ht\nimport sklearn\nimport sklearn.datasets\n\nX,_ = sklearn.datasets.load_digits(return_X_y=True)\nX = ht.array(X, split=0)\nX\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 41, + "metadata": {} + }, + "follow": [], + "is_broadcast": false, + "is_coalescing": false, + "msg_id": "09810356-47db3eecea6fcfe880a7f49c_231811_4", + "outputs": [], + "received": "2025-05-19T19:27:27.898332Z", + "started": "2025-05-19T19:27:27.879051Z", + "status": "ok", + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:27:27.875269Z" + }, + "output_type": "display_data" + } + ], "source": [ "%%px\n", "import heat as ht\n", - "X = ht.load_hdf5(\"/p/scratch/training2404/data/JPL_SBDB/sbdb_asteroids.h5\",dataset=\"data\",split=0).T" + "import sklearn\n", + "import sklearn.datasets\n", + "\n", + "X,_ = sklearn.datasets.load_digits(return_X_y=True)\n", + "X = ht.array(X, split=0)\n", + "X" ] }, { @@ -133,7 +281,49 @@ "cell_type": "code", "execution_count": null, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:3] relative residual: rank: 55\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:1] relative residual: rank: 55\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:0] hSVD level 0...\t processes 0\t\t1\t\t2\t\t3\n", + " current ranks: 55\t\t56\t\t58\t\t56\n", + "hSVD level 1...\t processes 0\t\t2\n", + " current ranks: 57\t\t59\n", + "hSVD level 2...\t processes 0\n", + "relative residual: DNDarray(0.0085, dtype=ht.float64, device=cpu:0, split=None) rank: 55\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:2] relative residual: rank: 55\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], "source": [ "%%px\n", "# compute truncated SVD w.r.t. relative tolerance \n", @@ -152,7 +342,47 @@ "cell_type": "code", "execution_count": null, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:2] relative residual: rank: 3\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:1] relative residual: rank: 3\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:0] hSVD level 0...\t processes 0\t\t1\t\t2\t\t3\n", + " current ranks: 8\t\t8\t\t8\t\t8\n", + "hSVD level 1...\t processes 0\n", + "relative residual: DNDarray(0.5713, dtype=ht.float64, device=cpu:0, split=None) rank: 3\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:3] relative residual: rank: 3\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], "source": [ "%%px\n", "# compute truncated SVD w.r.t. 
a fixed truncation rank \n", @@ -206,10 +436,24 @@ } ], "metadata": { + "kernelspec": { + "display_name": "heat-dev-311", + "language": "python", + "name": "python3" + }, "language_info": { - "name": "python" + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.8" } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 } diff --git a/doc/source/tutorials/notebooks/5_clustering.ipynb b/doc/source/tutorials/notebooks/5_clustering.ipynb new file mode 100644 index 0000000000..2a603bad6d --- /dev/null +++ b/doc/source/tutorials/notebooks/5_clustering.ipynb @@ -0,0 +1,776 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Cluster Analysis\n", + "================\n", + "\n", + "This tutorial is an interactive version of our static [clustering tutorial on ReadTheDocs](https://heat.readthedocs.io/en/stable/tutorial_clustering.html). \n", + "\n", + "We will demonstrate memory-distributed analysis with k-means and k-medians from the ``heat.cluster`` module. As usual, we will run the analysis on a small dataset for demonstration. We need to have an `ipcluster` running to distribute the computation.\n", + "\n", + "We will use matplotlib for visualization of data and results." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "4 engines found\n" + ] + } + ], + "source": [ + "from ipyparallel import Client\n", + "rc = Client(profile=\"default\")\n", + "rc.ids\n", + "\n", + "if len(rc.ids) == 0:\n", + "    print(\"No engines found\")\n", + "else:\n", + "    print(f\"{len(rc.ids)} engines found\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%px import heat as ht\n", + "%matplotlib inline" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Spherical Clouds of Datapoints\n", + "------------------------------\n", + "For a simple demonstration of the clustering process and the differences between the algorithms, we will create an\n", + "artificial dataset consisting of two circularly shaped clusters positioned at $(x_1=2, y_1=2)$ and $(x_2=-2, y_2=-2)$ in 2D space.\n", + "For each cluster we will sample 100 arbitrary points from a circle with radius $R = 1.0$ by drawing random numbers\n", + "for the polar coordinates $( r\\in [0,R], \\phi \\in [0,2\\pi])$, translating these to cartesian coordinates\n", + "and shifting them by $+2$ for cluster ``c1`` and $-2$ for cluster ``c2``. The resulting concatenated dataset ``data`` has shape\n", + "$(200, 2)$ and is distributed among the ``p`` processes along axis 0 (sample axis)."
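+ , + "\n", + "\n", + "Concretely, each sampled pair $(r, \\phi)$ is mapped to cartesian coordinates via\n", + "\n", + "$$x = r \\cos(\\phi) \\pm 2, \\qquad y = r \\sin(\\phi) \\pm 2,$$\n", + "\n", + "with the $+2$ shift for cluster ``c1`` and the $-2$ shift for cluster ``c2``."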
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%px\n", + "\n", + "num_ele = 100\n", + "R = 1.0\n", + "\n", + "# Create default spherical point cloud\n", + "# Sample radius between 0 and 1, and phi between 0 and 2pi\n", + "r = ht.random.rand(num_ele, split=0) * R\n", + "phi = ht.random.rand(num_ele, split=0) * 2 * ht.constants.PI\n", + "\n", + "# Transform spherical coordinates to cartesian coordinates\n", + "x = r * ht.cos(phi)\n", + "y = r * ht.sin(phi)\n", + "\n", + "\n", + "# Stack the sampled points and shift them to locations (2,2) and (-2, -2)\n", + "cluster1 = ht.stack((x + 2, y + 2), axis=1)\n", + "cluster2 = ht.stack((x - 2, y - 2), axis=1)\n", + "\n", + "data = ht.concatenate((cluster1, cluster2), axis=0)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's plot the data for illustration. In order to do so with matplotlib, we need to unsplit the data (gather it from\n", + "all processes) and transform it into a numpy array. Plotting can only be done on rank 0.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%px\n", + "data_np = ht.resplit(data, axis=None).numpy() " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:65]: \u001b[0m[]" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "import matplotlib.pyplot as plt\nplt.plot(data_np[:,0], data_np[:,1], 'bo')\n", + "execute_result": { + "data": { + "text/plain": "[]" + }, + "execution_count": 65, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:28:37.872913Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[output:0]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAiIAAAGdCAYAAAAvwBgXAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8fJSN1AAAACXBIWXMAAA9hAAAPYQGoP6dpAAA5TUlEQVR4nO3df5BV9X3/8ddlE1aJsHFxRWAvgjSTjNNqOyYmkDCFSGNtm+BsFo02SoxxqkEj0mAkJkE6dbCEBIwxUWMCnYnrL1iync5YjRbQjtFEE6apiU5kYIBFZYFmF0lniXfv94/zPbt3754fn/Prfs7d+3zM3Nns3XPvOffg5PO+n8/7834XyuVyWQAAABZMsH0BAACgcRGIAAAAawhEAACANQQiAADAGgIRAABgDYEIAACwhkAEAABYQyACAACseZftCwgyNDSkQ4cOafLkySoUCrYvBwAAGCiXyzp+/LhmzJihCROC5zxyHYgcOnRIxWLR9mUAAIAYDhw4oPb29sBjch2ITJ48WZLzQaZMmWL5agAAgImBgQEVi8XhcTxIrgMRdzlmypQpBCIAANQZk7QKklUBAIA1BCIAAMAaAhEAAGBNpoHI97//fZ133nnDOR7z5s3TE088keUpAQBAHck0EGlvb9ddd92ll19+WS+99JI+/vGPa8mSJXrllVeyPC0AAKgThXK5XK7lCVtbW/XNb35T1157beixAwMDamlpUX9/P7tmAACoE1HG75pt3y2VSnr88cd14sQJzZs3z/OYwcFBDQ4ODv8+MDBQq8sDAAAWZJ6s+utf/1qnnXaampubdf3112v79u0699xzPY9dt26dWlpahh9UVQUAYHzLfGnm5MmT2r9/v/r7+7V161Y9+OCD2rVrl2cw4jUjUiwWWZoBAFhVKknPPSe98YY0fbq0YIHU1GT7qvIrytJMzXNEFi9erLlz5+r+++8PPZYcEQCAbd3d0s03SwcPjjzX3i7dfbfU0WHvuvIsyvhd8zoiQ0NDo2Y9AADIq+5uqbNzdBAiSb29zvPd3XauazzJNFl19erVuuSSSzRr1iwdP35cXV1d2rlzp5588sksTwsAQGKlkjMT4rVuUC5LhYK0YoW0ZAnLNElkGogcPnxYV199td544w21tLTovPPO05NPPqm/+qu/yvK0AAAk9txzY2dCKpXL0oEDznELF9bsssadTAORH/7wh1m+PQAAmXnjjXSPgzd6zQAA4GH69HSPgzcCEQAAPCxY4OyOKRS8/14oSMWicxziIxABAMBDU5OzRVcaG4y4v2/aRKJqUgQiAAD46OiQtm6VZs4c/Xx7u/M8dUSSq1mvGQAA6lFHh7NFl8qq2SAQAQAgRFMTW3SzQiACAMhcnF4t9HdpDAQiAIBMxenVUk/9XQiYkiFZFQAwrFSSdu6UHn7Y+VkqJXu/OL1a6qm/S3e3NHu2tGiRdOWVzs/Zs/N1jXlX8+67UdB9FwBqJ+1ZiFLJGZT9yqQXCs777907MoMQ5zW2uAFT9Sjqbu3Nw64aW7M1ue6+CwDInyxmIaL0aknymjREnQkKa4gnOQ3xks4oJVEvszUEIgDQ4LIaVOP0arHR3yXOgJ11wJR0iayelrcIRACgwWU1qMbp1VLr/i5xB+wsA6akMxn1MFtTiUAEABpcVoNqnF4tYa+RnByHI0eiXYuXJAN2VgFTGjMZtpa34iIQAYAGl9WgGqdXS+Vr/JRK0mWXJV9eSDJgZ9EQL62ZDBvLW0kQiABAg8uyy2ycXi0dHdKjj4bv7ki6vJBkwM6iIV5aMxm1Xt5KikAEABpc1l1mOzqkffukHTukri7n5969wVtbf/vb4CDDZFAOS/hMOmCn3RAvrZmMLAPLLFBZFQAwPKh61RHZtCl5PYwovVq6u6U1a8yO9RuUTWqiuAN2b6/3cohbsyRowE6zIV5aMxluYNnZ6XyGys+WRmCZNgqaAQCG2S5XHlbQrNqOHWMDnCiFxtxjJe8BO+uiZJX3+8wzpc99zj8wkqS2NufeTJwY/t5ewVixmE5gGSbK+E0gAgDIjZ07ne2qJorFsRVW41RmtTVge5136lTp6NGxMxmVolS7rYfKqizNAAByI8pODq/lhSgJn+5MSprLK6b8Zm2OHXN+trY6AYkXdyuvyWxNlCUxWwhEAAC5YZonsXat9yAcN+GzlgN22DbdQkE65RTpjDO866W4x6xY4QRQecn1iItABACQG2EJpJLz99tv9/5bWgmf1bkbknT4cDqzJSazNr29we/hNbNTrwhEAAC5EbTjw9XZ6QzA1QFBqeQ8WltHljiqmeyE8crdqJSkI7GUbiGxvBQlS4I6IgCAXPGrz+EGHZs2je2/4vZnWbw4OAhxX+83o+FXYr1S0sZxaRYSy0tRsiTYNQMAyCV3eaSnxwkeqrmBxZe/LG3Y4L+U4wrbCRNl67DX7htT7nmC6pfMnOn87dCh4Boncc5fC1HGb2ZEAAC51NTkLKFs3er993LZeXz728FBSGur9PTT4dVcw3I3qs8dt3GcSSXbu++WvvOd4GPyVJQsCQIRAEBumQQHYf1mjh1zBuywQTtOvkXcHA2T8vBpl5DPK5JVAQC51dOTzvuYBAxx8i2S5GiY1C+xUeOk1ghEAAC5VCpJDz2UznuZBAwmW4ddJrtvTATVL6muinrZZeMrAHERiADAOGa7d0wSzz0n9fWFHzdhwki+SDWvgMHvnphsHXbfU8o2R8Okad94QY4IAIxT7pbWRYukK68cu+U170zzL/7mb5yfJkmdYffELy+jUnu79OijThLsww87/XHC8lSi8NtCnHTbcF6xfRcAxqEoHWjzyrQB3saNTnBwyy3Bjeui3JOgyqpHjow9V1qzFXGa9uUR3XcBoIElHczyspwTVm+jUnu7s423rc37utMa4LMO8EyDrx078l3anToiANDAonSgrZan5ZymJme2w+Trcm+vdPnlzlbdK65wBunKgCLJPXGFNauTnEZ0SZZp4jbtc5VKTjCTxZJRVghEAGCciTuY5S03obvbWQIxERYIJB3gpXSCmTBJmvZFDSLzErQQiADAOBNnMKvFt/0oTHq+VAsKBOLek8qBOqwjritJIzp3C3F14q2rUHByX6q3DUcNIvM088X2XQAYZ8LqYXhtaTX9tn/PPdK0adnkjri5Kb29zkxI3AxGr0Agyj0plaQ773SSTysb6LW1mZ0/SZGzoC3EftuGw4LIQsEJIpcscV7nl+fiBi21TmRmRgQAxhmTXibVg5npt/hbbslm2r/yG/pnP2tWP8SPVyBgek96epxAa82asV18jxwJPq/fbEVUUUu7R1kyytvMl0QgAgDjUtTBLM63+LSm/eMsw3hxZzVKJe8AKOyeSNKnPy0dPer9/kEzNGkXOevokPbtc3bHdHU5P/2a9kXJf6lFnktUbN8FgHHMdCtulK2ylaq3vUbd3hq2rTbKdZTL0tSpowMJt75HZb+W6rog7gxGlOs444zRMyQ2q55G2fL7xhtOcBimq8vZfRRXlPGbHBEAGMeCeplUH2dS3rxa5TfoBQui5S
[... base64-encoded PNG payload omitted: scatter plot of the raw 2-D data points, drawn as blue circles ...]", + "text/plain": [ + "
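Matplotlib itself is not MPI-aware, so the usual pattern is to gather the distributed `DNDarray` into a local numpy array first and then draw the figure on a single process. Below is a minimal, self-contained sketch of this gather-then-plot pattern; the example array `data` stands in for the one used in this notebook, `comm.rank` and `ht.random.randn` are standard Heat API, and writing to `data.png` is just one way to avoid opening a window per MPI rank.

```python
import heat as ht
import matplotlib.pyplot as plt

# distributed stand-in for the notebook's data, rows split over processes
data = ht.random.randn(200, 2, split=0)

# .numpy() collects the full array on every process (and copies from GPU if necessary)
data_np = data.numpy()

# draw on one process only
if data.comm.rank == 0:
    plt.plot(data_np[:, 0], data_np[:, 1], 'bo')
    plt.savefig("data.png")
```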
" + ] + }, + "metadata": { + "engine": 0 + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px --target 0\n", + "import matplotlib.pyplot as plt\n", + "plt.plot(data_np[:,0], data_np[:,1], 'bo')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we perform the clustering analysis with kmeans. We chose 'kmeans++' as an intelligent way of sampling the\n", + "initial centroids." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:0] Number of points assigned to c1: 100 \n", + "Number of points assigned to c2: 100 \n", + "Centroids = DNDarray([[ 2.0113, 1.9847],\n", + " [-1.9887, -2.0153]], dtype=ht.float32, device=cpu:0, split=None)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:3] Number of points assigned to c1: 100 \n", + "Number of points assigned to c2: 100 \n", + "Centroids = \n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:2] Number of points assigned to c1: 100 \n", + "Number of points assigned to c2: 100 \n", + "Centroids = \n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:1] Number of points assigned to c1: 100 \n", + "Number of points assigned to c2: 100 \n", + "Centroids = \n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "kmeans = ht.cluster.KMeans(n_clusters=2, init=\"kmeans++\")\n", + "labels = kmeans.fit_predict(data).squeeze()\n", + "centroids = kmeans.cluster_centers_\n", + "\n", + "# Select points assigned to clusters c1 and c2\n", + "c1 = data[ht.where(labels == 0), :]\n", + "c2 = data[ht.where(labels == 1), :]\n", + "# After slicing, the arrays are no longer distributed evenly among the processes; we might need to balance the load\n", + "c1.balance_() #in-place operation\n", + "c2.balance_()\n", + "\n", + "print(f\"Number of points assigned to c1: {c1.shape[0]} \\n\"\n", + " f\"Number of points assigned to c2: {c2.shape[0]} \\n\"\n", + " f\"Centroids = {centroids}\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's plot the assigned clusters and the respective centroids:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%px\n", + "# just for plotting: collect all the data on each process and extract the numpy arrays. 
This will copy data to CPU if necessary.\n", + "c1_np = c1.numpy()\n", + "c2_np = c2.numpy()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[output:0]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "[... base64-encoded PNG payload omitted: k-means result, the two clusters drawn as orange and grey crosses with the centroids marked as triangles ...]", + "text/plain": [ + "
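Slicing with `ht.where` selects a data-dependent number of rows on every process, which is why the result of `data[ht.where(labels == 0), :]` is generally distributed unevenly and `balance_()` redistributes the rows in place. A self-contained sketch of that effect follows; it is meant to be launched through MPI, the labels are synthetic, `lshape` reports the process-local shape, and the script name is arbitrary.

```python
# balance_demo.py -- sketch; run with e.g. `mpirun -n 4 python balance_demo.py`
import heat as ht

data = ht.random.randn(200, 2, split=0)    # rows distributed over the processes
labels = ht.where(data[:, 0] > 0, 1, 0)    # synthetic cluster labels

c1 = data[ht.where(labels == 1), :]        # data-dependent selection
print(f"before balance_: {c1.lshape[0]} of {c1.shape[0]} rows on this process")

c1.balance_()                              # in-place redistribution
print(f"after balance_:  {c1.lshape[0]} of {c1.shape[0]} rows on this process")
```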
" + ] + }, + "metadata": { + "engine": 0 + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px --target 0\n", + "# plotting on 1 process only\n", + "plt.plot(c1_np[:,0], c1_np[:,1], 'x', color='#f0781e')\n", + "plt.plot(c2_np[:,0], c2_np[:,1], 'x', color='#5a696e')\n", + "plt.plot(centroids[0,0],centroids[0,1], '^', markersize=10, markeredgecolor='black', color='#f0781e' )\n", + "plt.plot(centroids[1,0],centroids[1,1], '^', markersize=10, markeredgecolor='black',color='#5a696e')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can also cluster the data with kmedians. The respective advanced initial centroid sampling is called 'kmedians++'." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:0] Number of points assigned to c1: 100 \n", + "Number of points assigned to c2: 100 \n", + "Centroids = DNDarray([[ 1.9905, 1.9855],\n", + " [-2.0095, -2.0145]], dtype=ht.float32, device=cpu:0, split=None)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:1] Number of points assigned to c1: 100 \n", + "Number of points assigned to c2: 100 \n", + "Centroids = \n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:3] Number of points assigned to c1: 100 \n", + "Number of points assigned to c2: 100 \n", + "Centroids = \n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:2] Number of points assigned to c1: 100 \n", + "Number of points assigned to c2: 100 \n", + "Centroids = \n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "kmedians = ht.cluster.KMedians(n_clusters=2, init=\"kmedians++\")\n", + "labels = kmedians.fit_predict(data).squeeze()\n", + "centroids = kmedians.cluster_centers_\n", + "\n", + "# Select points assigned to clusters c1 and c2\n", + "c1 = data[ht.where(labels == 0), :]\n", + "c2 = data[ht.where(labels == 1), :]\n", + "# After slicing, the arrays are not distributed equally among the processes anymore; we need to balance\n", + "c1.balance_()\n", + "c2.balance_()\n", + "\n", + "print(f\"Number of points assigned to c1: {c1.shape[0]} \\n\"\n", + " f\"Number of points assigned to c2: {c2.shape[0]} \\n\"\n", + " f\"Centroids = {centroids}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Plotting the assigned clusters and the respective centroids:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%px\n", + "c1_np = c1.numpy()\n", + "c2_np = c2.numpy()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[output:0]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAiIAAAGdCAYAAAAvwBgXAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8fJSN1AAAACXBIWXMAAA9hAAAPYQGoP6dpAAA9EElEQVR4nO3df5QU9Z3v/1f3TJiRACNDJirLEBl+KUk052oU0O8uCCvCmkSu+s16NhFdL3c1kIjkZCPRxJsTCWbXo7hqjNG7kntXjYkE3cNXRMVfMYD4I5wov38lIARB0BlgZSYzXd8/Zqqorq7urqqu6qrufj7OYaVn+kd1k7OfV38+78/7kzIMwxAAAEAM0nFfAAAAqF0EEQAAEBuCCAAAiA1BBAAAxIYgAgAAYkMQAQAAsSGIAACA2BBEAABAbOrjvoBCMpmM9u3bp4EDByqVSsV9OQAAwAPDMHTkyBENHTpU6XThOY9EB5F9+/aptbU17ssAAAAB7NmzR8OGDSt4n0QHkYEDB0rqfSODBg2K+WoAAIAXHR0dam1ttcbxQhIdRMzlmEGDBhFEAACoMF7KKihWBQAAsSGIAACA2BBEAABAbCINIg888IDOOussq8ZjwoQJWrFiRZQvCQAAKkikQWTYsGG644479NZbb+nNN9/URRddpK985SvasGFDlC8LAAAqRMowDKOcL9jc3Kx//dd/1XXXXVf0vh0dHWpqalJ7ezu7ZgAAqBB+xu+ybd/t6enRr3/9ax07dkwTJkxwvU9nZ6c6Ozut2x0dHeW6PAAAEIPIi1XfeecdDRgwQA0NDbr++uu1bNkyjRs3zvW+ixYtUlNTk/WHrqoAAFS3yJdmurq6tHv3brW3t+vJJ5/Uww8/rFdeecU1jLjNiLS2trI0AwBABfGzNFP2GpGpU6dq5MiRevDBB4velxoRAECcjr9wt5RKq3HKjbm/W3WPZGTUOPWmGK4s2fyM32XvI5LJZLJmPQAASKxUWp0v3NUbOmyOr7pHnS/cJaVox1WqSItVFyxYoOnTp2v48OE6cuSIHnvsMb388stauXJllC8LAEAozJmQzhfusm6bIaRh6nzXmRL4E2kQOXDggK6++mr9+c9/VlNTk8466yytXLlSf/u3fxvlywIAEBp7GOl86T6pp4sQEqKy14j4QY0IACAp2m8dLfV0SXX91HT7trgvJ9ESXSMCAEClOb7qHiuEqKcrp2YEwRFEAAAowF4T0nT7NjVMne9awIpgytZZFQCASuNWmOpWwIrgCCIAAORjZFwLU63bRiaGi6ouFKsCAIBQUawKAAAqAkEEAADEhiACAIjM8Rfuzru75Piqe3rPcgnhMahcBBEAQHSCnNVSIee7EJjCwa4ZAEBkp8wGOaulYs536QtMUvYWXvu1ojiCCAAg0kE1yFktlXC+S5IDU1TBMgoEEQBA5INq45QbrUChun6eni/IY/wqdcBObGCqoNmaZCy0AQBi1zjlRqt9efuto0P9Zh/krJaynO8SQj1K45QbrWssNTCFVXdi/7c0ny8pszVOBBEAgCXMQdUU5KwW8zF1bRNdHxNWMWgYA3aogSnEQt0og2WYWJoBAFjcBtWSvuEHOKvFHkJ6dq7OuobOF+5S98616tm5OrTlhVKWV5zvzwoMLu/L77WYt0uZySjH8lapCCIAAEnhD6qSgp3VYnuM8xrsISTMQTXIgB3VgXhh1p2EHSyjQBABAEQ3qHoo9Cz4eJdBua5tQuiDaaABO8ID8cKYyYgkWEaAIAIASNYps44dH1mDsqT6tgkFH+53J0zQAbvUkFVIqTMZUQXLKBBEAACRDqp+OQdMSVYI8cTH1tUkDtihzGQkKVgWQRABACSOWxgxA0SxQdlXwWdMA3a+WZsThboTJCOTU6hrfz+FepwkKVgWQxABACSfo07CTxgpVPAZ24CdZ9ame8dqSSeWn3Lu0xdOktaUrBQEEQBAIpmDsrNOwutsRZK3ruab5ejZtTYnMCWxhXyYCCIAgMRxDsrOOomgjcaSNHh7mbVJbAv5ENFZFQCQKPkKSL10ZHV7Dq/dXKXsFuvOduv2bq5hdnYt1sk2im63ScKMCAAgWRwFpDmFnbYlmeOr7lH3jt+pfuQFJ+7vqKE4/sLdVi1I0d0nttqNrL/3PbZh6vxQazS8zNokfWanVAQRAECi5BSQFtmOW9c2Mfv3fUFGUlZg8FJbYl8KaZg635pJkZTznGH0Cim2TbdSmpKVgiACAEg0L9txswboqTflLer0281Vdf2sn4dZo+Glf4n59yT1OIkCQQQAkFjOZZnsVu8TrdmNsIs6nTtuJIVbo+Gxf0mlNCUrBUEEAJBcjmUZKxyk6tSzc7Xq28Zbdw1zu66zLkNSqDUaYZ3BUw0IIgCAxLLPdHTvXGuFEBk9qmubmFMzEkZRp7MY1aoRmTw363Y1hYE4sX0XAJBojVNuVF3bRPXsXJ0VQnp2rj6x1Tbgdl0ntxDiVrQa5Lk9X4Nj27Dz+sLYNpwkzIgAQBXyewJtkh1fdU9WCFFdPw2Y/bgVGrp3rlXPztVFizq9fCbSibqM4y/c7Xo2jfW5RVWj4ePQvmpAEAGAalRNg5mROTEj4rLs0r3jd96KOj18Jlk7bBxBzQwn+ZZ8wgp4vg7tqwIEEQCoQlU1mKXS1oyHjIy6d63LDhS2nhvOIJCvXbp52/dnUqaAVwut3U0EEQCoUn4HsyQu57j1C+nZuTqniZnXIFDqAF/OgJfkQ/vCRLEqAFQxX+eU9H3bdxZKWs3CUuUbMqyCTUe/DXvhqtlHxG8QKPXsFvu5N+23jo5slsltF1De+1ZwgStBBACqmJ/BzO1gudiWc8wlEMcMjX1GpGfnanW+/FPf1+fnM8kn6oPofO8C8hEikxZaWJoBgCoV5JySpNQmeFkCab91tO8g4Ol8Fy+7a1LpyA6i89L+PV9hrqclo4QVMhNEAKAKBRnMTDntzfuWP6KqHck38DdOuVHdO9e6hqIgzcuKfSbdO9eovm2C60B9fNU96t65Rj0711izMZEdROex/buT1xCZtEJmgggAVKOAg5mUu3TRvWtd79ZZFf8GHajgtcA3dKt/iG3mI/CJtEU+k+4dq10bmEknBm1nCLE/PqwwUkr7d68FrkmZ+ZIIIgBQlYIOZvkG+UK7VEqd9i/0Dd3ZP+ToQ1e5BgEzRLi9Py+zNuY2YPt1OsOIuX24vm18oIBXDn5mipKyK4cgAgCQVHzpwgwjUUz7u31Dz7cE4jxjRpLqR05Uz6616t65xuorIklHH/p79exc4xqA7N1UzVmcrOs3T92VTszGJHj3id+ZorDO5ikVQQQA0MvDck7Pn96MbNo/6xt63+m6+UKRc9B0+13v0s4a19fKd7CdVYhqP3W3b2no6ENXSTKs50xCoafb6+dbMjLfm1s4PPrQVbEd5kcQAQBIKr6cE/W0v/X8qbTr6bqmuhHjXZdA8gUg82f29+E2Q5O1FGTT0Fc0a24bti/ZxF3oafFSE9S3bOY8mydfk7hySRmGYZTt1Xzq6OhQU1OT2tvbNWjQoLgvBw
BqVr5p/3yDr7Us0Bdaig3S+WYo7I/Luo/t271T+/farMPxmm7f5vl67EsZkpQa3Kp+51zpWqhqXaPH95cU9uUt+8GBWbuRQuig62f8ZkYEAFCQ363AQWoV8s1QOHeuuD2nPZQcfeiq3hCilLWcUj/iPDVOvcm27OMeYLKlZHy4p/cxferbxqu+bbw1UCeh0LMQtx1M9n+39u+NcO1cW24EEQBAYT62AgfqX+Ly/FnLLKvukYyegkW0PTtXq+utX8v4cI9Sg1ut/5rLLEcf+nupp0sv//G4bn3xQ/3k5G9oxj//1LpmGRl171zb++JmfYht23DD5LlZMwVJKfQsKM8OJouRSUSIIogAAArytRXYFirs38idocW+BJDv+YvWmRgZW01H7wyGGgdp0D+/po5/uTAnjBiGoYWvd+vdA3/R/7rnf2vyfxujnl2vn2gZv2ttVvFmVg8T2+cQuI9JmRXawSQpMSEq0iCyaNEi/eY3v9HmzZt10kknaeLEifrJT36isWPHRvmyAICYZIUKxzdy11qPAorOOqTSJ8KC0dP3oA61LzhdkmHNjJhe3N2jN3Yd1je+OFA/feOIVjz0Y01pO8kKK3UjxucUbzqLOCX5n/GJkVsBr6REhahID7175ZVXNGfOHK1du1bPP/+8/vKXv+jiiy/WsWPHonxZAEACNE4Jfoiel0PfGqfc2HcCb19NiMWQlNKgf37txE8MQ3e88oG+OLSfFl50sr44tJ/ueK1dRsNApQcPk9Tbi8T+ugNmP66GqfNVP+I86/W7d6zOu0xlFdEmTOOUG7O2IztDVNED9SIW6YzIs88+m3V7yZIl+vSnP6233npLf/3Xfx3lSwMAkqBv+STveTEuOzS81pk4Zy4cL9w3M9LrxV3H9ca+Li39f1uUSqV084VNuvxXB/XinoymdK7JWk4qWLxpZIr2T/EjUEt8v69hbYu2zRzZxN0Vtqw1Iu3t7ZKk5uZm1993dnaqs7PTut3R0VGW6wIARMS+fJLnvJgcHopjc07iNbfsZj9R7/81DN3xWru+OLSfLhrRKEm6aERj76zIszv0t7fNPBE8grbGDxooIj4J12s9S9XWiNhlMhnNmzdPF1xwgT73uc+53mfRokX64Q9/WK5LAgDkEdY39UZbMzD7llpnQy3783kJA/aZC2vLbp5v/C8eHKg39u2xZkMkZc2KvHz4ZH3J5/vKETBQFCooLbU3SSknMJdT2YLInDlz9O677+q1117Le58FCxZo/vwT/1gdHR1qbW0tx+UBAOxC+qZuLp+cKBxNWcspRWdGCrDvYDGfb8Dsx0/sdpGkun5KDz9Hd/yfZVmzISZzVuRH9z6iSc0fKbOrd4kmSCApJVDk6whbckgo4QTmcipLEJk7d66WL1+uV199VcOGDct7v4aGBjU0NJTjkgAABYTxTT3/8klvGHE7SdcPt2WHrFqRni69fPjkrNoQO/usyPOrXtTFUy/Kan/ufC1PJ/gqWKAI0hK/6HMGXGYqt0h3zRiGoblz52rZsmV68cUXNWLEiChfDgAQIvuOivZbR/tfLrD3FOlrSqZUncxdLUFCyPEX7j6xu8Pxjb/rzV9J6t0V0jB1vtKnn68f3fuI62yIyaoVea1dPYf39M7enDzMdRZIqeJDZtYOFR+Bwm2rcq2INIjMmTNH//Ef/6HHHntMAwcO1P79+7V//359/PHHUb4sANSkrEHa+btV9wQ6wj7owCr1fiN3zqQ0/XjniTCSqvP/zbxvyej4qnus55ekow9dJeOj95Qa3Gr1LFl9+nV6Y1+Xbr6wKWc2xHq6vlmRN/Z1adXbvefSGB+9l3e7cbHP+OjPv+o7UHjZqlz0OSL4ty+XSIPIAw88oPb2dk2aNEmnnXaa9eeJJ56I8mUBoDbZBmk7P9/onYp9Uy86MD/09y6Hqp0oLD360FW+rqdxyo2qGzE+pzeJvQ7l6M+/KsMw9L9+8H198a8a886GmOyzIv2m3GRtN3adBXL5jI+/cLeOPnSVOl+4y+rOag8URx/6+7yvna+g1HcYieDfvlwirRFJ8MG+AFB1wt6B4WnrZ5Gi1roR4/Oe2GvWiPhtMV4/srcdu7MWo3vHavV8uEc9u9bq6XmTtHbdG661IU72WpGVy5/S5JN25Ww3Nrl9xt271lm1KW6fc8/ONfnfY0gFpVHuvokaZ80AQBUJaweG162fXgdAt5/Zj6G3P7+X92htCTZnayT17Oo9tM4wDP34ybUacXK9hvSv0/r9XUWfc0j/Oo04uV4//vUaTf7WudJH7+VtLe/2GadOHibjo/dcP7/ei3IPFGEWlEa2+yZiBBEAqDJBd2Bk9Q5xfFPP2TViG1g9DYAhbyWtbxt/YodMT1f2Kbw90r4jPdp7pEeTluz39bx/ydSr84M96j/mAtW3jVf3zjV5G4DZP+NB3/2dFT7iDAFR7L6JGkEEAKpM4CPq7csstm/qzm/3bs9VbAAM85u//Xo6zZoTydqFU7dzrVZ+/VV98F+94abur85S//++KOs5/us3N6tn7zuq+6uz1LP3D9bPW/qn1X/MBZKRsV6jvm1CThjJaptu+4ytzyBVF0ufjsD/9jEiiABAFSnliPpS6gzKNQDmLnlkd1I1l2yGNTVo2KC+otiezWr48LfZy0Q9W9TwtZv73u/mnNcxi06dszr2a3Ceztu9c23WmS7du9aF/v4LKeXfPk4EEQCoEmG09A5SZ1DWAbBvice8RuvAu77BP+sQvL5QZAYF63ocz2H/uyTX/ib5al3M26nBrVk7d8xrKNeMRKW0c3eT3P08AAB/CtRh+Dmi3k/vENcZE9uJu27bSUvpaWEu8dhDiNmfJDW490gQ62d9W2jNcGK+f/tzmNdtveciunf8zmpPL/VtJ26b2Ne+XjI+3KOGqfM1YPbjgfqBBBbSv30cCCIAUCXsDb5yfjflRs9np/jq8uk2APaduGsf/M3nDaWnRV/Qcc5c9Dvnypy7mgNx76xJOus5copxbTtwunesznkuSaofeYE102H9bMR5kvq2CNuatJUzBIT1bx8HlmYAABa/yyxuA5x9SaC+bbzr85aicepNOv7C3apvG++6M8f8u/N6sn6WpxjX/p5dl1Vssz3mc/fWgvS2rZfRk/W4ci6HhHVicrkRRAAAksKtM4i6p4VrAPKwM8c5WDuLX4+/cHfW0o39sZKyZns6X7jL2rVjrw2JrSajb9dT9861GjD7cevH9uJa+/tLCpZmAAC9Qq4z8FNrUjbOVuiOwlVz+Sbfe85a6umbAZFS8dWGOK7NXLIyW+c7d/gksdV7ykhwH/aOjg41NTWpvb1dgwYNivtyAAA+WMs6fWEkKV0+8y3F+Lk+sz29qa5tYs4sRFxLIda19e0kcquniZqf8Tt50QgAUPHCOFE2KuasRt6D7YowD9mzF6jaC1iDhJAwT88dMPtxK4RI7tuRk4QgAgAIVWgnykYo6LKRvW+IZGTNOvSetHtVsJ1BIZ6ea51wbD13XWJDiEQQAQCErQJ6WvjaomzXt2vGLEy1L32YTc2CzD64BbUgS0b2m
hBJVlAya0aSiBoRAEBNKVYjUmgbrFl/kbO92TYzYq8VCXptQepqnIWp5mPNay712vygRgQAABeelo0KLJOYA3rW9ua6fr1LIam6vuZmwZW00yhPo7cBsx/PajmfNPQRAQDUjgLLRubv3fqI5FsmcS7xlLo9tpTDA/M2elNvGDGLaJOGpRkAAFwUWyYJYxuw2+uF9Xxx8jN+MyMCAFWoUtt9J0njlButrrDOZZKwT7ut5NNzS0UQAYBq1FfnICnvAIrCCi6TeFji8SXs56sgLM0AQJWqpqn+cuOzKw1LMwCAyA+eq1ZJXybxsuwmqWKW5ti+CwBVLJEHz4UszPbokpLfkM1LF9YQO7VGjRkRAKhipWwHrRgh18MUmikI87MzZzZkZHJmL6yZjb7f26/JbXYm39KRl/vEjSACAFUqX52DVF07MPwMzInSF6DMZmNS9rXbm5M5eVl2q5SlOYpVAaAKFWrAlfgBOqBS2qPHxRk6nP8t9h7abx1tzXg13b4t8H3CRot3AKh1Sa9zCIm9PsRZD2P+PsnMf4+enaulVF3Wf4uFEC8H9wU+3K+MWJoBgCpUrjqH2NnqQyRlDbqV0i8lq3Ga1HtuTZHCYi/LbpWyNEcQAQBEJmiHV6+Pc9aHmMEjK5wknDVr0XeCr1J1BQuLvWwvNv+e1C3IdgQRAEAkjr9wt7p3rcsqxDSZR9PnnbEIuBPGXpQpJW/QdSpUI5L32j12Ya2UTq0EEQBANFJp10HVDCF1bRPzBgRfO2H6BuZ858IkadC1c9sd49w14xZGSl12S1ooY9cMACAyzsHWXH6oa5uoAbMf9/z4YjthKnLHTMA+IpXAz/hNEAEARMpeJClJStWp6cc7PT/evv20YdI3XAdtM+zUjzjPWtaphDBSrdi+CwBIjMYpN/bOhJiMHs/bSJ3bT7t3rctqXZ4745K2tsS6tThH8lAjAgCI1NGHrjqxG6RvWca1ENWxU8ZZmNq9c01OzYmMjGvzryQWZcIdQQQAEBl7YeqA2Y8XLsR09ASxhxDz7/VtE7IfX6AmJEkn5NpvO7ctJ+003HIjiAAAInF81T1ZIUTK3g1T1zYxa8Yi63cjxueEEOdOmJ4/vZncU4Wd24/7bnfvXJu1bTnowXzVhCACAIiGh34XzlkAexjp2f123sPckn6qsNv2YzOEmNuWq/ncHz8IIgCASATud2FkrO6iztmO46vu6asVWZP41uVup9+a9SzmTqBaDyESu2YAAAnTvWtdTqtz6cQyhj2ESEr0LhnnQXwDZj+edbvWQ4jEjAgAIEHsdSX2HTLWssaI8aofmduRNam7ZJxLSEcfuirRS0pxIIgAABIh72mxqbqcolc3SRvQne8n3w4iKXnXXk4EEQBAMjiKWxun3Hji/JhUXW/X1ArhFqrsMz32mZBaDyMEEQBAIjiLW53LGkpVUFmjc8eQ7bZ1joySu6RUTpw1AwBInHzLNOwyqQx+xm9mRAAAieIWOljGqF4EEQBAsnhohIbqEemC26uvvqovfelLGjp0qFKplJ566qkoXw4AUAUap96Ud8ajccqNNXsmS7WKNIgcO3ZMZ599tu6///4oXwYAAFSoSJdmpk+frunTp0f5EgAAoIJV0F4oAABQbRJVrNrZ2anOzk7rdkdHR4xXAwAAopaoGZFFixapqanJ+tPa2hr3JQEAgAglKogsWLBA7e3t1p89e/bEfUkAACBCiVqaaWhoUENDQ9yXAQAAyiTSIHL06FFt377dur1r1y6tX79ezc3NGj58eJQvDQAAKkCkQeTNN9/U5MmTrdvz58+XJM2aNUtLliyJ8qUBAEAFiDSITJo0SQk+Uw8AAMQsUcWqAACgthBEAABAbAgiAAAgNgQRAAAQG4IIAACIDUEEAADEhiACAABiQxABAACxIYgAAIDYEEQAAEBsCCIAACA2BBEAABAbgggAAIgNQQQAAMSGIAIAAGJDEAEAALEhiAAAgNgQRAAAQGwIIgAAIDYEEQAAEBuCCAAAiA1BBAAAxIYgAgAAYkMQAQAAsSGIAACA2BBEAABAbAgiAAAgNgQRAAAQG4IIAACIDUEEAADEhiACAABiQxABAACxIYgAAIDYEEQAAEBsCCIAACA2BBEAABAbgggAAIgNQQQAAMSGIAIAAGJDEAEAALEhiAAAgNgQRAAAQGwIIgAAIDYEEQAAEBuCCAAAiA1BBAAAxIYgAgAAYkMQAQAAsSlLELn//vt1+umnq7GxUeeff77WrVtXjpcFAAAJF3kQeeKJJzR//nzddtttevvtt3X22Wdr2rRpOnDgQNQvDQAAEi7yIHLXXXdp9uzZuvbaazVu3Dj97Gc/U//+/fXv//7vUb80AABIuEiDSFdXl9566y1NnTr1xAum05o6darWrFmTc//Ozk51dHRk/QEAANUr0iDywQcfqKenR6ecckrWz0855RTt378/5/6LFi1SU1OT9ae1tTXKywMAADFL1K6ZBQsWqL293fqzZ8+euC8JAABEqD7KJ//Upz6luro6vf/++1k/f//993Xqqafm3L+hoUENDQ1RXhIAAEiQSGdE+vXrp3POOUerVq2yfpbJZLRq1SpNmDAhypcGAAAVINIZEUmaP3++Zs2apXPPPVfnnXeeFi9erGPHjunaa6+N+qUBAEDCRR5EvvrVr+rgwYP6wQ9+oP379+sLX/iCnn322ZwCVgAAUHtShmEYcV9EPh0dHWpqalJ7e7sGDRoU9+UAAAAP/Izfido1AwAAagtBBAAAxIYgAgAAYkMQAQAAsSGIAACA2BBEAABAbAgiAAAgNgQRAAAQG4IIAACIDUEEAADEhiACAABiQxABAACxIYgAAIDYEEQAAEBsCCIAACA2BBEAABAbgggAAIgNQQQAAMSGIAIAAGJDEAEAALEhiAAAgNgQRAAAQGwIIgAAIDYEEQAAEBuCCAAAiA1BBAAAxIYgAgAAYkMQAQAAsSGIAACA2BBEAABAbAgiAAAgNgQRAAAQG4IIAACIDUEEAADEhiACAABiQxABAACxIYgAAIDYEEQAAEBsCCIAACA2BBEAABAbgggAAIgNQQQAAMSGIAIAAGJDEAEAALEhiAAAgNgQRAAAQGwIIgAAIDYEEQAAEJvIgsjChQs1ceJE9e/fXyeffHJULwMAACpYZEGkq6tLV155pW644YaoXgIAAFS4+qie+Ic//KEkacmSJVG9BAAAqHCRBZEgOjs71dnZad3u6OiI8WoAAEDUElWsumjRIjU1NVl/Wltb474kAAAQIV9B5Oabb1YqlSr4Z/PmzYEvZsGCBWpvb7f+7NmzJ/BzAQCA5PO1NPPtb39b11xzTcH7tLW1Bb6YhoYGNTQ0BH48ymvp8hVKp9OaOWNazu+WPbNSmUxGl186PYYrAwBUCl9BpKWlRS0tLVFdCypMOp3Wk8ufkaSsMLLsmZV6cvkzuuLSGXFdGgCgQkRWrLp7924dPnxYu3fvVk9Pj9avXy9JGjVqlAYMGBDVy6KMzPBhDyP2EOI2UwIAgF3KMAwjiie+5ppr9Itf/CLn5y+99JImTZrk
6Tk6OjrU1NSk9vZ2DRo0KOQrRFjM8FFfX6fu7h5CCADUOD/jd2S7ZpYsWSLDMHL+eA0hqBwzZ0yzQkh9fR0hBADgWaK276IyLXtmpRVCurt7tOyZlXFfEgCgQiSqoRkqj7MmxLwtiZkRAEBRBBEE5laY6lbACgBAPgQRBJbJZFwLU83bmUwmjssCAFSQyHbNhIFdMwAAVJ5E7JoBAAAohiACAABiQxCJ0dLlK/JudV32zEotXb4i1McBAJA0BJEYmWe1OEOFuRslnXb/5wn6uHIiLAEAvGDXjEdRnDQb9KyWSjjjhQPxAABeEEQ8impgtYeKp55d6fmslqCPK5ckh6UoQiUAIBiCiEdRDqwzZ0yzwoSfs1qCPs6rUgfspIYlZmsAIDniLyaoIDNnTNMVl87Qk8uf0axvzQ/t233Qs1qiPuMljFqUsA/EC6P2xP7vaD5XUmZrAKDWMCPiU9izEEHPajHvN27MaN0yb27O48JYYghjFsgtLJXymYU1m5HU2RoAqDUEEZ/CHFiDntViDyEbt27LuoYnlz+jjVu3aePWbaEsMZQyYEdxIF6YS2RRL20BAIojiPgQ9sAa9KwW++Oc12APIWENrEEG7CgPxAtrNiPs2RoAgH8EEY+iGFi9FHoW4zYonzl6VKgDapABO+oD8UqdzYhitgYA4B9BxKMknTTrrJOwD8qS9NmxY/I+1u9OmKADdhghq5BSZjOinK0BAPhDEPEo6oHVD+egKckKIcX4KfZM6oBd6mxGkkIlANQ6gkiFcgsjZogoNCj7KfaMc8DON3NjXuuZo0cpk8nkFOra31O+XUNJCpUAUOsIIlXCWSfhNYwUKvaMc8DON3OzYctWSSeWn5z3McMJjckAoDIQRCqYOSg76yS8zFgkfetqvlmOTdu254SmJLaRBwB4QxCpUM5B2VknUWwQroStq15mbmhMBgCVjSBSgfwUkbrVWtgbop0xamTeZRA39udzPre9LiOsw+O8zNwkfXYHAJAfQaQCOYtInYHAviSzefsObdy6TVL20oXZlXXcmNG+dsLYQ4v97+bjr7h0Rqg1Gl5mbiphdgcA4I4gUoGcswyFtuSaYcP8fSaTsUKI24xKsZ0w9tByxaUzrMPjpOxdO2EdBlhsmy6NyQCgshFEqoCXLbnmbXPWoNC2XD+vV19fZ/08zBoNL8tP5t+T1ucEAOAdQaSC2ZdkirV6D7uOwvl8kkKt0fDaw4TGZABQ2QgiFcxPq/ew6yiczyflbiMuRak9TJgJAYDKQBCpYF5bvYddR+EsRjWf67JLyrMs4ve8HABActVsEKmWwaxYq/eNW7flLUwNEhjcQojz7/YC1ijCiJ/zcgAAyVazQaRaBzO3Vu/2LbomtzoKL+FMOlGXsXT5CtfzacwAF1WNhp/zcgAAyVazQaSaBrOgrd6d79FLOLP/3DljZIaTfDUiYc400VEVAKpDzQYRqToGM2er99vvvjdvq/diQSCMcFbOmSY6qgJA5avpICJ5H8ySVlOydPkKq2uqPSR8duwYbdq2vWDjr0JKDWflnGmioyoAVL503BcQN7fBzI35Td/5e3OQTafL+1Gm0+mcFu1O5pKN3yAwc8Y06/MIMtMwc8Y0q2B11rfmRxZCzOf9xb/dZb1evn8/c8ko33MtXb4itGsDAHhX0zMifra1Jq2mxH495kyA246WWd+a73tWI4yZhiiXTfwc+mfyumSUtJkvAKh2NRtEggxmcdSUFBoYJenM0aPyXk+QIOAlnHkZrNPpdGTLJl67rrr9rliQrNbdVACQVDUbRIIMZubvnQN8lN+ivQyM23btygkcQWY1ioWzjVu36ZZ5c3OuyR4+7Cf7RnUQXdCuq16CZNJmvgCg2tVsEAk6mLkN8FFO+xcbGCW51rgE6aRaKJyZjdHsgcbeMM0MH4VO9k3CQXReloyqYTcVAFSKmg0iQeRbtnDrJhrmtH++gdH8mVvgcAaBDVu25g0CxWZtli5foXFjRmvcmNFZz2GGkFRKVvjIZDKeG6jFwetMEVuDAaA8an7XjFf5li3sAaTYThH7/c2ZC6/T/s6dLJJcr+fM0aNcH28egGfupJF6A8bCxfe57vqx7ySxByj7e9y4dZskyTB6m6mZyzNJLfT0s9PG624qAEBpmBHxyEtNyeWXTo9s2t85MG7YstX1cbfe9M2sduxur2vOAph9SJwzGM4ZGvtjzxw9yipENZnX9NvX1+ngocNZj3F7vjh4KU42g5T5M+dM04YtW3XrTd+M5w0AQJViRsSjyy+dnjcszJwxTZdfOt3zt2i/fTqc3+TPHD1Km7Ztz3tfM4Q4X98+I/O1OfOyajqKzdCYj920bXtWyBk3ZrR+8W93adyY0Tp46LBahjQHmvGJWqEgaS4pmTM/+a5307btzIwAQMiYEQmJn54kfna0uA3kXrunFioONQxD9fV1umXeXOtxfgszW4Y0a+PWbVq4+L6cgtWkFXo6l4TshcP2a9uwZas2bduuDVu2uhYGx13jAgDVhiASAj89SfwEFsn9m7z9ud0GTPt9zfuYNSLma6XTKXV392jh4vt0xqiRVihKp1Oug639OlOplAzD0MFDh5VOp6zwYYaaTCajrTt3JrrQM1/hsBnyNm3bHqgZHADAH4JICLz2JAnSRM3LAXX5Bkz7fexLOebMhTmjcfDQIavmY9/uP2nO//wfavo/v9DUqVOt63559RpJyglQmYyhdDqlM0aNtF6zEs6AKbYtmh0zAFAeBJEQeO1J4gwszr4i9pkIL7tMCm0xtT+3eR+7TzU36+Chw+p/0klWgalhGNr49ht6/8/79I25c7Vl0yb9+J77rd0x+WYGesNI2rruID1M4lBoW3TSgxQAVIvIgsgf//hH/ehHP9KLL76o/fv3a+jQofra176mW265Rf369YvqZRPNGSpKbSdeaObBfO6NW7fZllwMSbKCRcuQZiuESNKHB/Zr/969GnvWF7TlD+s144qvqvmU0/Sp5mZ9+lNDXGd2JFk9Ssy+IkltZubGGeakYM3gAADBRBZENm/erEwmowcffFCjRo3Su+++q9mzZ+vYsWO68847o3rZilJKO/FiMw/2hmPjxozWGaNGZm2vlaRDH35o/d0wDL3xu99qyCmn6r9N/H/0wfv79cbvfqt/+B//pEkTx1ut2/MtL5k/T3IzMzfOMOd36QwAUJrIgsgll1yiSy65xLrd1tamLVu26IEHHiCI2JidSN12meRbnvFSayIpayfL5u07coKAOUMiSX/es1uH3t+vSX/3FaVSKX3+3PP18v/3tP60Y4eePPyh9VpLl68IdEaPvUeHU6FlqCjP8XF+jrfffa/rtugkBykAqHRl7SPS3t6u5ubmvL/v7OxUR0dH1p9ql06ntXHrNmsXi1nrYQ6SboO3l54Y5n1umTfX6npqahmS/W9gGIbeefN1DTnlVJ3WOlySdFrrcA055VS99tILOnP0KOuazNdxY/ZTyfc+3bqYFnqfpTyuGLcwd+tN38zbbbXQewMABFe2YtXt27fr3nvvLTgbsmjRIv3whz8s1yW
VJKxv6s4zW8wttc7Ta+3P5+fAPnPpweSsC5FyZ0MkZc2KdB3tKLk7atBlqFKWrwoJevoyACBcKcMwjOJ3O+Hmm2/WT37yk4L32bRpk8444wzr9t69e/U3f/M3mjRpkh5++OG8j+vs7FRnZ6d1u6OjQ62trWpvb9egQYP8XGbk8g2GfgdJ8/5mQDB7dNj7cgQddM3Hnjl6lD47dowVeEz19XX6yrSLdeM3rtfHnZ26eOaVVhCRemdKnlv2a0nSxTOv1JVf+ruCS0Z+rsmsyfD7Ofl9HACg/Do6OtTU1ORp/PY9I/Ltb39b11xzTcH7tLW1WX/ft2+fJk+erIkTJ+rnP/95wcc1NDSooaHB7yXFIoxv6s77f33uPGUyhlKpVFbH0lJCiP2xmUxGBw8dshqRdXf3aP3bb+m93X/Kmg0x2WdFPjywP+d57a/lZwYoSI+OoI8DACSb7yDS0tKilpYWT/fdu3evJk+erHPOOUePPPJI4PX8pMrXh8LrIGlfHugdzA1rm60ZRvyGEHPJyG3pYfP2HTp46LC1iyaVSul/XndtVm2Ik1kr8sbvfqvBnz5VqVTKqhmR/B9oF7TZWSlN0qIseAUAlCayZLB3715NmjRJw4cP15133qmDBw9q//792r9/f1QvWdTS5SvyHlpmP/bej5kz/B1gZ2cepGcfzP/vfYuVTqdlGEbewbMQs7jT+VhzdqVlSLNumTdXl186XZ+sT+nQ+/v1+XPPz5kNMZmzIofe368/79lt/UzKnXUp9vnefve9WYf35SsMdXtskMc5P5OwC14BAKWLrFj1+eef1/bt27V9+3YNGzYs63c+y1JCU2oDMTfFvqkX+zZuHrLmLEw1ZzUWLr5Pt8yb6/l6Zs6YZjUYM28ve2alFUIOHjqs2+++V7fMm6sf/OAHajn1tLyzISZzVuSdN1/XWV/4gjZu3ebaVj7f52vfFuvccrxx67aCPTqCtMV3+0yc90/KqcAAUOsiCyLXXHNN0VqScgt7QPLSzrxY+Dlz9Kicx5u3zVmMoC3GnUtGG7Zs1cFDh7Vp23bddvuPtW7dOtfaECd7rcgf1q/XsNNPd50Byvf5miHE2ezMDEjjxowu2H8kjN0tpS6jAQCiUXNnzYQ1IHn9pu41/Lj9zL5rxv78xZgnyErKal1u/uxTzYP10/vv04BBTWo86SQdPnig6HM2nnSSBgxq0jtvvq7TWofrE5+od50BKnR+i7k84icA+tmqXAwFrwCQPDUXRKTgA5J9mcX5Td1Z9Gj/pu4l/ITZ18Lem0SS1bpc6p2VeGfTJn189Kj+69hRPfvkLz0/b+919CiT6dGYtjEyDMM1JNk/X+eylP0zsBe9lkMlnAoMALWmJoNI0AHJvsxi/6burDFxe65i4SfMb/72JQ973xCzN8nCxffp+H+/Up0ffyxJahs+XNf9w1eznuPhR3+pXbv3WLfN/iaf/+w4NZ082NrR89mxY3LCiPn5Outc7J+B1DtzUy6VdCowANSSmgsipQxIpdSYlOvbuDMU2YOIvTfJwEFN+uSAgUqn0/ro4+P60/6DWUGi/eNOXXjBBVmPbxnSrL0HPtDeAx+4zug4X99e57Jw8X0aN2Z0VpfXcgmj4BUAEI2aCiJh78DwWmNSzm/j5hKPeY2SrPAjZR+EZ/7cPHTPvB7zID5z1sN8LrM1vHNJpVidixlGzFDjvL6oQwDt3AEguWoqiIS5A8NrjUm+DqfOwd9+/1IabF1+6fSsoOMWfpzn2Dy5/JmsnSvmQXz2a/aypPLu5i05O2MkZS0ROT+vcoSRMJe9AADhqqlOTmYDMTczZ3g/XdVtmSUft/BjDvTObathNdjasGWrpNyeHePGjJYkfaq5OevnV1w6o+8E4LTrNdvfr/35nT53xlhru7Hdy6vX9r3vVNbnZb42MxIAULtqakYkDH6XWdzCjX15xwwHYTbYMg+5cz7PGaNGZv3XeT3vbt6Sc832GZMzRo3M6lLqNrPkbKa2cPF9+uBwb1v5cWNG5/y+XDMS9tb3zp08ziZytHsHgPIhiPgQZtFjlA228g2kXpYo3ApxzaUV+7JLvvdr9jB5cvkz+s0zz1rLUOZSVL6dNlEzA5R9mcjtPQbprgsACI4g4kPYRY9Ja7DlDBnOolVn+HJ7v/bnMGcZ7CHEbadNOThnoZ5c/oxVQOv2HgEA5ZEy4jr4xYOOjg41NTWpvb1dgwYNivtyQmd+GzdrTZIyEJZ6XfblKlOh5m/lZK/D6Q1KvacdJ+WzB4Bq4Gf8rqli1SQp9UTZKM2cEfxEYftSR6Hf+y3IDevkZPO9mbMxmYyRiNkoAKhVBJEY5Ks1SUoY8bMryPk48305g8iTy5/RwsX3BS7ItRfJur2m12Bj7/ra+7wpX+8RABAuakRikOQGW6U0X3M2U3M2LiulDqOUrrbO92YvvjX/S4dVAIgHQSQGSW2w5XVXkP3wP7vLL51udVHN1wytFKXsNHLbHePcNUMYAYDyI4jA4nWmxn74n7Mfh3Obr3OZZ8OWrSUN9EF3GpnvzdwJ5LYDyNlgDgAQPXbNIJB8SzjO3THF7hP0dZO20wgAcIKf8btmZ0TyLS9I8W4vrRTFlkmiOPG2nIcHAgDKo2aDSKHlBXuhJfIrtEwSdkFuFMEGABC/mg0iYezCqHVu23zNzy3sgtwk7zQCAARX8zUi1BwEE0X9BwCgOlAj4kPSznsJWxS1MElfJvHyniVRIwQACVDznVWDdhGtFGF1JLUrtExibpGNk5f3HMXnAgDwr6ZnRGphF0YUtTDlashmzmyYp/g6i4rNnztnL/y8Z2qEACBeNRtEkr68EKZSOpLGyZy1MLuhStmBwd4l1cnLe67UzwUAqknNBpFa24VRibUw9qBgtmDfuHVb1jkxhYKDl/dciZ8LAFSTmg0iST3vJUz2ok1nLczCxffpjFEjE1+QaQ8j6XRKG7duUzqd9nSAXqHtxX7uAwCITs0GkUrnZWeIubRhziKYA7d5MF2lsM9aSL2zVcVmL7zU/9RCjRAAJB1BJEY/uuvflE6ndcu8uTm/W7j4PmUyGX1//rdcH+ulM+zMGdOyljLMwda+tFEJMwDmrEU6nVImYyidThecvfBS/2P+vRZqhAAgyQgiMTKXGBYuvi8rjJgzFuPGjM77WK87Q84YNVKStHHrNs361vysgkx7T42kcham2v+bLzB4rf+ppRohAEiqmu+sGjd76Lhl3tyc28V47QxrhpD6+jr94t/uiuKthM5td0y+XTPMXgBAcvgZvwkiCeCs2fAaQkzFQkaltrEP2kcEABAvgkgF+odv3Gj9/dGf3uP5cc6QceboUfrs2DGuBZmStGHLVm3atr1iwggAoPJw1kyFWbj4vpzbfpZlnEsWm7Ztt+5jDyHm3z87dgwFmQCAROBAjZjZa0Ie/ek9Vt2DM5wsXb4i61wU50zH0uUrrLNepN7QsWHL1pwQMnPGtMScCQMAAEEkRm6FqbfMm+saRpyHtJk7QySz2VfvP6UZMs4cPUqbtm3XU8+6n58yc8a0WOsqnMHKfnvZMyu1dP
kK63fO2wCA6kEQiVEmk3EtTDXDiH3GwgwYZhgxQ0S+kHHrTd+06kaS2LrcGazM2wsX35cVrDgNFwCqGzUiMcrXrEySa43IzBnTtGHL1ryHtJk7SS6/dHriW5e79UFxa77GabgAUN0IIhXms2PHaNO27TkzHfZBu1Jal7udfmsuSzmbrwEAqhPz3RXMnOlwFq66tS63L+skycwZ07KWkG6ZNzfRS0oAgHAxI1JB3AKHOdNhBo+ly1dUVOtyt1OBk7ykBAAIF0GkQrjVS9hPpDUV2gmTtAHd+Z6cu4iSuqQEAAgPSzMVwnmQm30mQertmFpJ3OpY3E4FTuqSEgAgHMyIVAj7TEe+YtRKWsZwBiv7bfupwEldUgIAhIOzZipMvi2tbHUFACQFZ81UMedMgomZAwBAJWJGBAAAhMrP+B1pseqXv/xlDR8+XI2NjTrttNP09a9/Xfv27YvyJQEAQAWJNIhMnjxZv/rVr7RlyxYtXbpUO3bs0BVXXBHlSwIAgApS1qWZ//zP/9Rll12mzs5OfeITnyh6f5ZmAACoPIksVj18+LAeffRRTZw4MW8I6ezsVGdnp3W7o6OjXJcHAABiEHlDs+9+97v65Cc/qSFDhmj37t16+umn89530aJFampqsv60trZGfXkAACBGvoPIzTffrFQqVfDP5s2brft/5zvf0e9//3s999xzqqur09VXX618q0ELFixQe3u79WfPnj3B3xkAAEg83zUiBw8e1KFDhwrep62tTf369cv5+XvvvafW1latXr1aEyZMKPpa1IgAAFB5Iq0RaWlpUUtLS6ALM5tt2etAAABA7YqsWPX111/XG2+8oQsvvFCDBw/Wjh079P3vf18jR470NBsCAACqX2TFqv3799dvfvMbTZkyRWPHjtV1112ns846S6+88ooaGhqielkAAFBBIpsR+fznP68XX3yxpOcwy1fYxgsAQOUwx20vZaiJPvTuyJEjksQ2XgAAKtCRI0fU1NRU8D6JPvQuk8lo3759GjhwoFKplK/HdnR0qLW1VXv27GHHjU98dsHwuQXD5xYcn10wfG7B+PncDMPQkSNHNHToUKXThatAEj0jkk6nNWzYsJKeY9CgQfwPLSA+u2D43ILhcwuOzy4YPrdgvH5uxWZCTJF3VgUAAMiHIAIAAGJTtUGkoaFBt912G1uFA+CzC4bPLRg+t+D47ILhcwsmqs8t0cWqAACgulXtjAgAAEg+gggAAIgNQQQAAMSGIAIAAGJTM0Hky1/+soYPH67Gxkaddtpp+vrXv659+/bFfVmJ9sc//lHXXXedRowYoZNOOkkjR47Ubbfdpq6urrgvLfEWLlyoiRMnqn///jr55JPjvpxEu//++3X66aersbFR559/vtatWxf3JSXeq6++qi996UsaOnSoUqmUnnrqqbgvqSIsWrRIX/ziFzVw4EB9+tOf1mWXXaYtW7bEfVmJ98ADD+iss86yGplNmDBBK1asCO35ayaITJ48Wb/61a+0ZcsWLV26VDt27NAVV1wR92Ul2ubNm5XJZPTggw9qw4YNuvvuu/Wzn/1M3/ve9+K+tMTr6urSlVdeqRtuuCHuS0m0J554QvPnz9dtt92mt99+W2effbamTZumAwcOxH1piXbs2DGdffbZuv/+++O+lIryyiuvaM6cOVq7dq2ef/55/eUvf9HFF1+sY8eOxX1piTZs2DDdcccdeuutt/Tmm2/qoosu0le+8hVt2LAhnBcwatTTTz9tpFIpo6urK+5LqSj/8i//YowYMSLuy6gYjzzyiNHU1BT3ZSTWeeedZ8yZM8e63dPTYwwdOtRYtGhRjFdVWSQZy5Yti/syKtKBAwcMScYrr7wS96VUnMGDBxsPP/xwKM9VMzMidocPH9ajjz6qiRMn6hOf+ETcl1NR2tvb1dzcHPdloAp0dXXprbfe0tSpU62fpdNpTZ06VWvWrInxylAr2tvbJYn/n+ZDT0+PfvnLX+rYsWOaMGFCKM9ZU0Hku9/9rj75yU9qyJAh2r17t55++um4L6mibN++Xffee6/+6Z/+Ke5LQRX44IMP1NPTo1NOOSXr56eccor2798f01WhVmQyGc2bN08XXHCBPve5z8V9OYn3zjvvaMCAAWpoaND111+vZcuWady4caE8d0UHkZtvvlmpVKrgn82bN1v3/853vqPf//73eu6551RXV6err75aRg02lvX7uUnS3r17dckll+jKK6/U7NmzY7ryeAX53AAk05w5c/Tuu+/ql7/8ZdyXUhHGjh2r9evX6/XXX9cNN9ygWbNmaePGjaE8d0W3eD948KAOHTpU8D5tbW3q169fzs/fe+89tba2avXq1aFNL1UKv5/bvn37NGnSJI0fP15LlixROl3R+TWwIP97W7JkiebNm6ePPvoo4qurPF1dXerfv7+efPJJXXbZZdbPZ82apY8++ogZS49SqZSWLVuW9RmisLlz5+rpp5/Wq6++qhEjRsR9ORVp6tSpGjlypB588MGSn6s+hOuJTUtLi1paWgI9NpPJSJI6OzvDvKSK4Odz27t3ryZPnqxzzjlHjzzySM2GEKm0/70hV79+/XTOOedo1apV1iCayWS0atUqzZ07N96LQ1UyDEPf/OY3tWzZMr388suEkBJkMpnQxs+KDiJevf7663rjjTd04YUXavDgwdqxY4e+//3va+TIkTU3G+LH3r17NWnSJH3mM5/RnXfeqYMHD1q/O/XUU2O8suTbvXu3Dh8+rN27d6unp0fr16+XJI0aNUoDBgyI9+ISZP78+Zo1a5bOPfdcnXfeeVq8eLGOHTuma6+9Nu5LS7SjR49q+/bt1u1du3Zp/fr1am5u1vDhw2O8smSbM2eOHnvsMT399NMaOHCgVYvU1NSkk046KearS64FCxZo+vTpGj58uI4cOaLHHntML7/8slauXBnOC4Sy9ybh/vCHPxiTJ082mpubjYaGBuP00083rr/+euO9996L+9IS7ZFHHjEkuf5BYbNmzXL93F566aW4Ly1x7r33XmP48OFGv379jPPOO89Yu3Zt3JeUeC+99JLr/75mzZoV96UlWr7/f/bII4/EfWmJ9o//+I/GZz7zGaNfv35GS0uLMWXKFOO5554L7fkrukYEAABUttpd8AcAALEjiAAAgNgQRAAAQGwIIgAAIDYEEQAAEBuCCAAAiA1BBAAAxIYgAgAAYkMQAQAAsSGIAACA2BBEAABAbAgiAAAgNv8/S0UKfqPA5WAAAAAASUVORK5CYII=", + "text/plain": [ + "
" + ] + }, + "metadata": { + "engine": 0 + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px --target 0\n", + "plt.plot(c1_np[:,0], c1_np[:,1], 'x', color='#f0781e')\n", + "plt.plot(c2_np[:,0], c2_np[:,1], 'x', color='#5a696e')\n", + "plt.plot(centroids[0,0],centroids[0,1], '^', markersize=10, markeredgecolor='black', color='#f0781e' )\n", + "plt.plot(centroids[1,0],centroids[1,1], '^', markersize=10, markeredgecolor='black',color='#5a696e')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The Iris Dataset\n", + "------------------------------\n", + "The _iris_ dataset is a well known example for clustering analysis. It contains 4 measured features for samples from\n", + "three different types of iris flowers. A subset of 150 samples is included in formats h5, csv and netcdf in the [Heat repository under 'heat/heat/datasets'](https://github.com/helmholtz-analytics/heat/tree/main/heat/datasets), and can be loaded in a distributed manner with Heat's parallel dataloader.\n", + "\n", + "**NOTE: you might have to change the path to the dataset in the following cell.**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%px\n", + "iris = ht.load_csv(\"heat/datasets/iris.csv\", sep=\";\", split=0)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Feel free to try out the other [loading options](https://heat.readthedocs.io/en/stable/autoapi/heat/core/io/index.html#heat.core.io.load) as well.\n", + "\n", + "Fitting the dataset with `kmeans`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:54]: \u001b[0m\n", + "KMeans({\n", + " \"n_clusters\": 3,\n", + " \"init\": \"probability_based\",\n", + " \"max_iter\": 300,\n", + " \"tol\": 0.0001,\n", + " \"random_state\": null\n", + "})" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 1, + "engine_uuid": "4a6ffcbf-4b7c9961beb0aa49f4f299a5", + "error": null, + "execute_input": "k = 3\nkmeans = ht.cluster.KMeans(n_clusters=k, init=\"kmeans++\")\nkmeans.fit(iris)\n", + "execute_result": { + "data": { + "text/plain": "KMeans({\n \"n_clusters\": 3,\n \"init\": \"probability_based\",\n \"max_iter\": 300,\n \"tol\": 0.0001,\n \"random_state\": null\n})" + }, + "execution_count": 54, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:30:37.715568Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:54]: \u001b[0m\n", + "KMeans({\n", + " \"n_clusters\": 3,\n", + " \"init\": \"probability_based\",\n", + " \"max_iter\": 300,\n", + " \"tol\": 0.0001,\n", + " \"random_state\": null\n", + "})" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 3, + "engine_uuid": "b9f6f6e8-01c224a4024814eaffce2266", + "error": null, + "execute_input": "k = 3\nkmeans = ht.cluster.KMeans(n_clusters=k, init=\"kmeans++\")\nkmeans.fit(iris)\n", + "execute_result": { + "data": { + "text/plain": "KMeans({\n \"n_clusters\": 3,\n \"init\": \"probability_based\",\n \"max_iter\": 300,\n \"tol\": 0.0001,\n \"random_state\": null\n})" + }, + "execution_count": 54, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + 
"status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:30:37.715694Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:75]: \u001b[0m\n", + "KMeans({\n", + " \"n_clusters\": 3,\n", + " \"init\": \"probability_based\",\n", + " \"max_iter\": 300,\n", + " \"tol\": 0.0001,\n", + " \"random_state\": null\n", + "})" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "k = 3\nkmeans = ht.cluster.KMeans(n_clusters=k, init=\"kmeans++\")\nkmeans.fit(iris)\n", + "execute_result": { + "data": { + "text/plain": "KMeans({\n \"n_clusters\": 3,\n \"init\": \"probability_based\",\n \"max_iter\": 300,\n \"tol\": 0.0001,\n \"random_state\": null\n})" + }, + "execution_count": 75, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:30:37.715223Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[2:54]: \u001b[0m\n", + "KMeans({\n", + " \"n_clusters\": 3,\n", + " \"init\": \"probability_based\",\n", + " \"max_iter\": 300,\n", + " \"tol\": 0.0001,\n", + " \"random_state\": null\n", + "})" + ] + }, + "metadata": { + "after": [], + "completed": "2025-05-19T19:30:37.759682Z", + "data": {}, + "engine_id": 2, + "engine_uuid": "e3e9e719-1b11a826b66969f71d179e21", + "error": null, + "execute_input": "k = 3\nkmeans = ht.cluster.KMeans(n_clusters=k, init=\"kmeans++\")\nkmeans.fit(iris)\n", + "execute_result": { + "data": { + "text/plain": "KMeans({\n \"n_clusters\": 3,\n \"init\": \"probability_based\",\n \"max_iter\": 300,\n \"tol\": 0.0001,\n \"random_state\": null\n})" + }, + "execution_count": 54, + "metadata": {} + }, + "follow": [], + "is_broadcast": false, + "is_coalescing": false, + "msg_id": "762efd2c-13bb8db32ded39032c1e088e_231924_46", + "outputs": [], + "received": "2025-05-19T19:30:37.766196Z", + "started": "2025-05-19T19:30:37.717854Z", + "status": "ok", + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:30:37.715597Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "k = 3\n", + "kmeans = ht.cluster.KMeans(n_clusters=k, init=\"kmeans++\")\n", + "kmeans.fit(iris)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's see what the results are. In theory, there are 50 samples of each of the 3 iris types: setosa, versicolor and virginica. We will plot the results in a 3D scatter plot, coloring the samples according to the assigned cluster." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:2] Number of points assigned to c1: 97 \n", + "Number of points assigned to c2: 24 \n", + "Number of points assigned to c3: 29\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:3] Number of points assigned to c1: 97 \n", + "Number of points assigned to c2: 24 \n", + "Number of points assigned to c3: 29\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:0] Number of points assigned to c1: 97 \n", + "Number of points assigned to c2: 24 \n", + "Number of points assigned to c3: 29\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:1] Number of points assigned to c1: 97 \n", + "Number of points assigned to c2: 24 \n", + "Number of points assigned to c3: 29\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "labels = kmeans.predict(iris).squeeze()\n", + "\n", + "# Select points assigned to clusters c1, c2 and c3\n", + "c1 = iris[ht.where(labels == 0), :]\n", + "c2 = iris[ht.where(labels == 1), :]\n", + "c3 = iris[ht.where(labels == 2), :]\n", + "# After slicing, the arrays are not distributed equally among the processes anymore; we need to balance\n", + "#TODO is balancing really necessary?\n", + "c1.balance_()\n", + "c2.balance_()\n", + "c3.balance_()\n", + "\n", + "print(f\"Number of points assigned to c1: {c1.shape[0]} \\n\"\n", + " f\"Number of points assigned to c2: {c2.shape[0]} \\n\"\n", + " f\"Number of points assigned to c3: {c3.shape[0]}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Number of points assigned to c1: 38 \n", + "Number of points assigned to c2: 50 \n", + "Number of points assigned to c3: 62\n" + ] + } + ], + "source": [ + "# compare Heat results with sklearn\n", + "from sklearn.cluster import KMeans\n", + "import sklearn.datasets\n", + "k = 3\n", + "iris_sk = sklearn.datasets.load_iris().data\n", + "kmeans_sk = KMeans(n_clusters=k, init=\"k-means++\").fit(iris_sk)\n", + "labels_sk = kmeans_sk.predict(iris_sk)\n", + "\n", + "c1_sk = iris_sk[labels_sk == 0, :]\n", + "c2_sk = iris_sk[labels_sk == 1, :]\n", + "c3_sk = iris_sk[labels_sk == 2, :]\n", + "print(f\"Number of points assigned to c1: {c1_sk.shape[0]} \\n\"\n", + " f\"Number of points assigned to c2: {c2_sk.shape[0]} \\n\"\n", + " f\"Number of points assigned to c3: {c3_sk.shape[0]}\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "heat-dev-311", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.8" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/doc/source/tutorials/notebooks/6_profiling.ipynb b/doc/source/tutorials/notebooks/6_profiling.ipynb new file mode 100644 index 0000000000..973dcfe6b6 --- /dev/null +++ b/doc/source/tutorials/notebooks/6_profiling.ipynb @@ -0,0 +1,609 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "0", + "metadata": {}, + "source": [ + "# Distributed profiling and energy measurements with perun" + ] + }, + { + "cell_type": 
"markdown", + "id": "1", + "metadata": {}, + "source": [ + "How to locate performance issues on your distributed application, and fix them, in three steps:\n", + "\n", + "1. Find the problematic/slow function in your code.\n", + "2. Gather statistics and data about the slow function.\n", + "3. Fix it!\n", + "\n", + "---\n", + "\n", + "
\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "2", + "metadata": {}, + "source": [ + "If you want more information on perun, find any issues, or questions leaves us a message on [github](https://github.com/Helmholtz-AI-Energy/perun) or check the [documentation](https://perun.readthedocs.io/en/latest/?badge=latest)." + ] + }, + { + "cell_type": "markdown", + "id": "3", + "metadata": {}, + "source": [ + "## Installation\n", + "\n", + "Perun can be installed with ```pip```:\n", + "\n", + "```shell\n", + "pip install perun\n", + "```\n", + "\n", + "Thourgh pip, optional dependencies can be installed that target different hardware accelerators, as well as the optional MPI support.\n", + "\n", + "\n", + "```shell\n", + "pip install perun[mpi,nvidia]\n", + "# or\n", + "pip install perun[mpi,rocm]\n", + "```\n", + "\n", + "Running the cell below will install perun." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: perun[mpi,nvidia] in /home/juanpedroghm/.pyenv/versions/3.11.2/envs/heat-dev311/lib/python3.11/site-packages (0.9.0)\n", + "Requirement already satisfied: h5py>=3.5.9 in /home/juanpedroghm/.pyenv/versions/3.11.2/envs/heat-dev311/lib/python3.11/site-packages (from perun[mpi,nvidia]) (3.13.0)\n", + "Requirement already satisfied: numpy>=1.20.0 in /home/juanpedroghm/.pyenv/versions/3.11.2/envs/heat-dev311/lib/python3.11/site-packages (from perun[mpi,nvidia]) (2.2.5)\n", + "Requirement already satisfied: pandas>=1.3 in /home/juanpedroghm/.pyenv/versions/3.11.2/envs/heat-dev311/lib/python3.11/site-packages (from perun[mpi,nvidia]) (2.2.3)\n", + "Requirement already satisfied: psutil>=5.9.0 in /home/juanpedroghm/.pyenv/versions/3.11.2/envs/heat-dev311/lib/python3.11/site-packages (from perun[mpi,nvidia]) (7.0.0)\n", + "Requirement already satisfied: py-cpuinfo>=5.0.0 in /home/juanpedroghm/.pyenv/versions/3.11.2/envs/heat-dev311/lib/python3.11/site-packages (from perun[mpi,nvidia]) (9.0.0)\n", + "Requirement already satisfied: tabulate>=0.9 in /home/juanpedroghm/.pyenv/versions/3.11.2/envs/heat-dev311/lib/python3.11/site-packages (from perun[mpi,nvidia]) (0.9.0)\n", + "Requirement already satisfied: mpi4py>=3.1 in /home/juanpedroghm/.pyenv/versions/3.11.2/envs/heat-dev311/lib/python3.11/site-packages (from perun[mpi,nvidia]) (4.0.3)\n", + "Collecting nvidia-ml-py>=12.535.77 (from perun[mpi,nvidia])\n", + " Using cached nvidia_ml_py-12.575.51-py3-none-any.whl.metadata (9.3 kB)\n", + "Requirement already satisfied: python-dateutil>=2.8.2 in /home/juanpedroghm/.pyenv/versions/3.11.2/envs/heat-dev311/lib/python3.11/site-packages (from pandas>=1.3->perun[mpi,nvidia]) (2.9.0.post0)\n", + "Requirement already satisfied: pytz>=2020.1 in /home/juanpedroghm/.pyenv/versions/3.11.2/envs/heat-dev311/lib/python3.11/site-packages (from pandas>=1.3->perun[mpi,nvidia]) (2025.2)\n", + "Requirement already satisfied: tzdata>=2022.7 in /home/juanpedroghm/.pyenv/versions/3.11.2/envs/heat-dev311/lib/python3.11/site-packages (from pandas>=1.3->perun[mpi,nvidia]) (2025.2)\n", + "Requirement already satisfied: six>=1.5 in /home/juanpedroghm/.pyenv/versions/3.11.2/envs/heat-dev311/lib/python3.11/site-packages (from python-dateutil>=2.8.2->pandas>=1.3->perun[mpi,nvidia]) (1.17.0)\n", + "Using cached nvidia_ml_py-12.575.51-py3-none-any.whl (47 kB)\n", + "Installing collected packages: nvidia-ml-py\n", + "Successfully installed nvidia-ml-py-12.575.51\n" + ] + }, + 
{ + "name": "stderr", + "output_type": "stream", + "text": [ + "\n", + "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m25.0.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m25.1.1\u001b[0m\n", + "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "perun 0.9.0\n" + ] + } + ], + "source": [ + "%%bash\n", + "pip install perun[mpi,nvidia]\n", + "perun --version" + ] + }, + { + "cell_type": "markdown", + "id": "5", + "metadata": {}, + "source": [ + "## Basic command line usage\n", + "\n", + "Perun is primarily a command line tool. The complete functionality can be accessed through the ```perun``` command. On a terminal, simply type ```perun``` and click enter to get a help dialog with the available subcommands." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "usage: perun [-h] [-c CONFIGURATION] [-l {DEBUG,INFO,WARN,ERROR,CRITICAL}]\n", + " [--log_file LOG_FILE] [--version]\n", + " {showconf,sensors,metadata,export,monitor} ...\n", + "\n", + "Distributed performance and energy monitoring tool\n", + "\n", + "positional arguments:\n", + " {showconf,sensors,metadata,export,monitor}\n", + " showconf Print perun configuration in INI format.\n", + " sensors Print available sensors by host and rank.\n", + " metadata Print available metadata.\n", + " export Export existing output file to another format.\n", + " monitor Gather power consumption from hardware devices while\n", + " SCRIPT [SCRIPT_ARGS] is running. SCRIPT is a path to\n", + " the python script to monitor, run with arguments\n", + " SCRIPT_ARGS.\n", + "\n", + "options:\n", + " -h, --help show this help message and exit\n", + " -c CONFIGURATION, --configuration CONFIGURATION\n", + " Path to perun configuration file.\n", + " -l {DEBUG,INFO,WARN,ERROR,CRITICAL}, --log_lvl {DEBUG,INFO,WARN,ERROR,CRITICAL}\n", + " Logging level.\n", + " --log_file LOG_FILE Path to the log file. None by default. Writting to a\n", + " file disables logging in stdout.\n", + " --version show program's version number and exit\n" + ] + } + ], + "source": [ + "!perun" + ] + }, + { + "cell_type": "markdown", + "id": "7", + "metadata": {}, + "source": [ + "**perun** can already be used after this, without any further configuration or modification of the code. perun can monitor command line scripts, and other programs from the command lines. Try running the ```perun monitor -b sleep 10``` on a terminal, or by running the cell below." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "/home/juanpedroghm/code/heat/doc/source/tutorials/notebooks\n", + "[2025-05-20 16:59:39,969][\u001b[1;36mperun.core\u001b[0m][\u001b[1;35mbackends\u001b[0m][\u001b[1;31mERROR\u001b[0m] - R3/4:\u001b[1;31mUnknown error loading dependecy NVMLBackend\u001b[0m\n", + "[2025-05-20 16:59:39,969][\u001b[1;36mperun.core\u001b[0m][\u001b[1;35mbackends\u001b[0m][\u001b[1;31mERROR\u001b[0m] - R3/4:\u001b[1;31mNVML Shared Library Not Found\u001b[0m\n", + "[2025-05-20 16:59:39,969][\u001b[1;36mperun.core\u001b[0m][\u001b[1;35mbackends\u001b[0m][\u001b[1;31mERROR\u001b[0m] - R1/4:\u001b[1;31mUnknown error loading dependecy NVMLBackend\u001b[0m\n", + "[2025-05-20 16:59:39,970][\u001b[1;36mperun.core\u001b[0m][\u001b[1;35mbackends\u001b[0m][\u001b[1;31mERROR\u001b[0m] - R1/4:\u001b[1;31mNVML Shared Library Not Found\u001b[0m\n", + "[2025-05-20 16:59:39,970][\u001b[1;36mperun.core\u001b[0m][\u001b[1;35mbackends\u001b[0m][\u001b[1;31mERROR\u001b[0m] - R0/4:\u001b[1;31mUnknown error loading dependecy NVMLBackend\u001b[0m\n", + "[2025-05-20 16:59:39,970][\u001b[1;36mperun.core\u001b[0m][\u001b[1;35mbackends\u001b[0m][\u001b[1;31mERROR\u001b[0m] - R0/4:\u001b[1;31mNVML Shared Library Not Found\u001b[0m\n", + "[2025-05-20 16:59:39,976][\u001b[1;36mperun.core\u001b[0m][\u001b[1;35mbackends\u001b[0m][\u001b[1;31mERROR\u001b[0m] - R2/4:\u001b[1;31mUnknown error loading dependecy NVMLBackend\u001b[0m\n", + "[2025-05-20 16:59:39,976][\u001b[1;36mperun.core\u001b[0m][\u001b[1;35mbackends\u001b[0m][\u001b[1;31mERROR\u001b[0m] - R2/4:\u001b[1;31mNVML Shared Library Not Found\u001b[0m\n" + ] + } + ], + "source": [ + "%%bash\n", + "pwd\n", + "mpirun -n 4 perun monitor -b sleep 10" + ] + }, + { + "cell_type": "markdown", + "id": "9", + "metadata": {}, + "source": [ + "In the directory reported by ```pwd```, you should see a new directory called ```perun_results``` (it might be named ```bench_data``` if the current directory is the heat root directory), with two files, **sleep.hdf5** and **sleep_.txt**.\n", + "\n", + "The file **sleep_.txt** contains a summary of what was measured during the run, with the average power draw of different hardware components, memory usage, and the total energy. The available information depends on the available *sensors* that perun finds. 
You can see a list of the available sensors by running the sensors subcommand:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "10", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[2025-05-20 16:55:39,740][\u001b[1;36mperun.core\u001b[0m][\u001b[1;35mbackends\u001b[0m][\u001b[1;31mERROR\u001b[0m] - R0/1:\u001b[1;31mUnknown error loading dependecy NVMLBackend\u001b[0m\n", + "[2025-05-20 16:55:39,740][\u001b[1;36mperun.core\u001b[0m][\u001b[1;35mbackends\u001b[0m][\u001b[1;31mERROR\u001b[0m] - R0/1:\u001b[1;31mNVML Shared Library Not Found\u001b[0m\n", + "| Sensor | Source | Device | Unit |\n", + "|-----------------:|--------------:|----------------:|-------:|\n", + "| cpu_0_package-0 | powercap_rapl | DeviceType.CPU | J |\n", + "| CPU_FREQ_0 | psutil | DeviceType.CPU | Hz |\n", + "| CPU_FREQ_1 | psutil | DeviceType.CPU | Hz |\n", + "| CPU_FREQ_2 | psutil | DeviceType.CPU | Hz |\n", + "| CPU_FREQ_3 | psutil | DeviceType.CPU | Hz |\n", + "| CPU_FREQ_4 | psutil | DeviceType.CPU | Hz |\n", + "| CPU_FREQ_5 | psutil | DeviceType.CPU | Hz |\n", + "| CPU_FREQ_6 | psutil | DeviceType.CPU | Hz |\n", + "| CPU_FREQ_7 | psutil | DeviceType.CPU | Hz |\n", + "| CPU_USAGE | psutil | DeviceType.CPU | % |\n", + "| DISK_READ_BYTES | psutil | DeviceType.DISK | B |\n", + "| DISK_WRITE_BYTES | psutil | DeviceType.DISK | B |\n", + "| NET_READ_BYTES | psutil | DeviceType.NET | B |\n", + "| NET_WRITE_BYTES | psutil | DeviceType.NET | B |\n", + "| RAM_USAGE | psutil | DeviceType.RAM | B |\n", + "\n" + ] + } + ], + "source": [ + "!perun sensors" + ] + }, + { + "cell_type": "markdown", + "id": "11", + "metadata": {}, + "source": [ + "The other file, **sleep.hdf5**, contains all the raw data that perun collects, which can be used for later processing. To get an interactive view of the data, navigate to [myhdf5](https://myhdf5.hdfgroup.org) and upload the file there.\n", + "\n", + "This will let you explore the data tree that perun uses to store the hardware information. More info on the data tree can be found in the [data documentation](https://perun.readthedocs.io/en/latest/data.html)." + ] + }, + { + "cell_type": "markdown", + "id": "12", + "metadata": {}, + "source": [ + "The data stored in the hdf5 file can be exported to other formats. Supported formats are text (same as the text report), csv, json and bench. Run the cell below to export the last run of the sleep program to csv.\n",
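+ "\n", + "Once exported, the CSV can be inspected with standard tools. A minimal sketch with pandas (the exact output filename depends on your run):\n", + "\n", + "```python\n", + "import pandas as pd\n", + "\n", + "# load the exported samples; columns include 'sensor', 'timestep' and 'value'\n", + "df = pd.read_csv(\"perun_results/sleep_0.csv\")\n", + "# average reading per sensor over the whole run\n", + "print(df.groupby(\"sensor\")[\"value\"].mean())\n", + "```"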
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "13", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + ",run id,hostname,device_group,sensor,unit,magnitude,timestep,value\n", + "0,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,0.0,2021.14599609375\n", + "1,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,1.0068829,964.1939697265625\n", + "2,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,2.0126529,400.12799072265625\n", + "3,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,3.0183434,2600.0\n", + "4,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,4.024712,2800.0\n", + "5,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,5.0291414,2384.971923828125\n", + "6,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,6.033699,1418.0760498046875\n", + "7,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,7.0397954,2297.81298828125\n", + "8,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,8.047083,2893.419921875\n", + "9,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,9.0511675,2456.3759765625\n", + "10,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,10.060614,1828.7459716796875\n", + "11,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,10.068606,3012.5791015625\n", + "12,0,juan-20w000p2ge,cpu,CPU_FREQ_1,Hz,1000000.0,0.0,1211.6190185546875\n", + "13,0,juan-20w000p2ge,cpu,CPU_FREQ_1,Hz,1000000.0,1.0068829,2700.0\n", + "14,0,juan-20w000p2ge,cpu,CPU_FREQ_1,Hz,1000000.0,2.0126529,1569.219970703125\n", + "15,0,juan-20w000p2ge,cpu,CPU_FREQ_1,Hz,1000000.0,3.0183434,2497.64697265625\n", + "16,0,juan-20w000p2ge,cpu,CPU_FREQ_1,Hz,1000000.0,4.024712,2693.7109375\n", + "17,0,juan-20w000p2ge,cpu,CPU_FREQ_1,Hz,1000000.0,5.0291414,2240.751953125\n", + "18,0,juan-20w000p2ge,cpu,CPU_FREQ_1,Hz,1000000.0,6.033699,3000.02099609375\n", + "19,0,juan-20w000p2ge,cpu,CPU_FREQ_1,Hz,1000000.0,7.0397954,2600.0\n", + "20,0,juan-20w000p2ge,cpu,CPU_FREQ_1,Hz,1000000.0,8.047083,3100.0\n", + "21,0,juan-20w000p2ge,cpu,CPU_FREQ_1,Hz,1000000.0,9.0511675,1806.197021484375\n", + "22,0,juan-20w000p2ge,cpu,CPU_FREQ_1,Hz,1000000.0,10.060614,3102.570068359375\n", + "23,0,juan-20w000p2ge,cpu,CPU_FREQ_1,Hz,1000000.0,10.068606,2934.219970703125\n", + "24,0,juan-20w000p2ge,cpu,CPU_FREQ_2,Hz,1000000.0,0.0,2200.10595703125\n", + "25,0,juan-20w000p2ge,cpu,CPU_FREQ_2,Hz,1000000.0,1.0068829,2700.096923828125\n", + "26,0,juan-20w000p2ge,cpu,CPU_FREQ_2,Hz,1000000.0,2.0126529,2842.551025390625\n", + "27,0,juan-20w000p2ge,cpu,CPU_FREQ_2,Hz,1000000.0,3.0183434,2488.455078125\n", + "28,0,juan-20w000p2ge,cpu,CPU_FREQ_2,Hz,1000000.0,4.024712,2651.922119140625\n", + "29,0,juan-20w000p2ge,cpu,CPU_FREQ_2,Hz,1000000.0,5.0291414,2183.43310546875\n", + "30,0,juan-20w000p2ge,cpu,CPU_FREQ_2,Hz,1000000.0,6.033699,2751.02490234375\n", + "31,0,juan-20w000p2ge,cpu,CPU_FREQ_2,Hz,1000000.0,7.0397954,2544.83203125\n", + "32,0,juan-20w000p2ge,cpu,CPU_FREQ_2,Hz,1000000.0,8.047083,3044.756103515625\n", + "33,0,juan-20w000p2ge,cpu,CPU_FREQ_2,Hz,1000000.0,9.0511675,2271.235107421875\n", + "34,0,juan-20w000p2ge,cpu,CPU_FREQ_2,Hz,1000000.0,10.060614,2385.8291015625\n", + "35,0,juan-20w000p2ge,cpu,CPU_FREQ_2,Hz,1000000.0,10.068606,3200.0\n", + "36,0,juan-20w000p2ge,cpu,CPU_FREQ_3,Hz,1000000.0,0.0,2200.012939453125\n", + "37,0,juan-20w000p2ge,cpu,CPU_FREQ_3,Hz,1000000.0,1.0068829,2700.0\n", + "38,0,juan-20w000p2ge,cpu,CPU_FREQ_3,Hz,1000000.0,2.0126529,1869.530029296875\n", + "39,0,juan-20w000p2ge,cpu,CPU_FREQ_3,Hz,1000000.0,3.0183434,2600.0\n", + "40,0,juan-20w000p2ge,cpu,CPU_FREQ_3,Hz,1000000.0,4.024712,2800.0\n", + 
"41,0,juan-20w000p2ge,cpu,CPU_FREQ_3,Hz,1000000.0,5.0291414,2315.37109375\n", + "42,0,juan-20w000p2ge,cpu,CPU_FREQ_3,Hz,1000000.0,6.033699,2672.827880859375\n", + "43,0,juan-20w000p2ge,cpu,CPU_FREQ_3,Hz,1000000.0,7.0397954,2600.0\n", + "44,0,juan-20w000p2ge,cpu,CPU_FREQ_3,Hz,1000000.0,8.047083,2464.04296875\n", + "45,0,juan-20w000p2ge,cpu,CPU_FREQ_3,Hz,1000000.0,9.0511675,2410.884033203125\n", + "46,0,juan-20w000p2ge,cpu,CPU_FREQ_3,Hz,1000000.0,10.060614,3060.60791015625\n", + "47,0,juan-20w000p2ge,cpu,CPU_FREQ_3,Hz,1000000.0,10.068606,2562.06201171875\n", + "48,0,juan-20w000p2ge,cpu,CPU_FREQ_4,Hz,1000000.0,0.0,2156.548095703125\n", + "49,0,juan-20w000p2ge,cpu,CPU_FREQ_4,Hz,1000000.0,1.0068829,2499.455078125\n", + "50,0,juan-20w000p2ge,cpu,CPU_FREQ_4,Hz,1000000.0,2.0126529,400.62200927734375\n", + "51,0,juan-20w000p2ge,cpu,CPU_FREQ_4,Hz,1000000.0,3.0183434,2080.2919921875\n", + "52,0,juan-20w000p2ge,cpu,CPU_FREQ_4,Hz,1000000.0,4.024712,2777.10693359375\n", + "53,0,juan-20w000p2ge,cpu,CPU_FREQ_4,Hz,1000000.0,5.0291414,1521.5909423828125\n", + "54,0,juan-20w000p2ge,cpu,CPU_FREQ_4,Hz,1000000.0,6.033699,2873.384033203125\n", + "55,0,juan-20w000p2ge,cpu,CPU_FREQ_4,Hz,1000000.0,7.0397954,2195.196044921875\n", + "56,0,juan-20w000p2ge,cpu,CPU_FREQ_4,Hz,1000000.0,8.047083,2817.139892578125\n", + "57,0,juan-20w000p2ge,cpu,CPU_FREQ_4,Hz,1000000.0,9.0511675,2418.926025390625\n", + "58,0,juan-20w000p2ge,cpu,CPU_FREQ_4,Hz,1000000.0,10.060614,2187.868896484375\n", + "59,0,juan-20w000p2ge,cpu,CPU_FREQ_4,Hz,1000000.0,10.068606,2655.29296875\n", + "60,0,juan-20w000p2ge,cpu,CPU_FREQ_5,Hz,1000000.0,0.0,2137.35791015625\n", + "61,0,juan-20w000p2ge,cpu,CPU_FREQ_5,Hz,1000000.0,1.0068829,2700.0\n", + "62,0,juan-20w000p2ge,cpu,CPU_FREQ_5,Hz,1000000.0,2.0126529,769.7069702148438\n", + "63,0,juan-20w000p2ge,cpu,CPU_FREQ_5,Hz,1000000.0,3.0183434,1988.4849853515625\n", + "64,0,juan-20w000p2ge,cpu,CPU_FREQ_5,Hz,1000000.0,4.024712,2471.529052734375\n", + "65,0,juan-20w000p2ge,cpu,CPU_FREQ_5,Hz,1000000.0,5.0291414,1931.303955078125\n", + "66,0,juan-20w000p2ge,cpu,CPU_FREQ_5,Hz,1000000.0,6.033699,2886.305908203125\n", + "67,0,juan-20w000p2ge,cpu,CPU_FREQ_5,Hz,1000000.0,7.0397954,2543.840087890625\n", + "68,0,juan-20w000p2ge,cpu,CPU_FREQ_5,Hz,1000000.0,8.047083,3100.0\n", + "69,0,juan-20w000p2ge,cpu,CPU_FREQ_5,Hz,1000000.0,9.0511675,2055.845947265625\n", + "70,0,juan-20w000p2ge,cpu,CPU_FREQ_5,Hz,1000000.0,10.060614,2340.925048828125\n", + "71,0,juan-20w000p2ge,cpu,CPU_FREQ_5,Hz,1000000.0,10.068606,2812.739990234375\n", + "72,0,juan-20w000p2ge,cpu,CPU_FREQ_6,Hz,1000000.0,0.0,2176.281005859375\n", + "73,0,juan-20w000p2ge,cpu,CPU_FREQ_6,Hz,1000000.0,1.0068829,1221.010986328125\n", + "74,0,juan-20w000p2ge,cpu,CPU_FREQ_6,Hz,1000000.0,2.0126529,1433.5810546875\n", + "75,0,juan-20w000p2ge,cpu,CPU_FREQ_6,Hz,1000000.0,3.0183434,2562.242919921875\n", + "76,0,juan-20w000p2ge,cpu,CPU_FREQ_6,Hz,1000000.0,4.024712,2591.029052734375\n", + "77,0,juan-20w000p2ge,cpu,CPU_FREQ_6,Hz,1000000.0,5.0291414,2437.9990234375\n", + "78,0,juan-20w000p2ge,cpu,CPU_FREQ_6,Hz,1000000.0,6.033699,3000.0\n", + "79,0,juan-20w000p2ge,cpu,CPU_FREQ_6,Hz,1000000.0,7.0397954,2550.77392578125\n", + "80,0,juan-20w000p2ge,cpu,CPU_FREQ_6,Hz,1000000.0,8.047083,3063.29296875\n", + "81,0,juan-20w000p2ge,cpu,CPU_FREQ_6,Hz,1000000.0,9.0511675,2261.791015625\n", + "82,0,juan-20w000p2ge,cpu,CPU_FREQ_6,Hz,1000000.0,10.060614,3050.388916015625\n", + "83,0,juan-20w000p2ge,cpu,CPU_FREQ_6,Hz,1000000.0,10.068606,3017.64892578125\n", + 
"84,0,juan-20w000p2ge,cpu,CPU_FREQ_7,Hz,1000000.0,0.0,2199.987060546875\n", + "85,0,juan-20w000p2ge,cpu,CPU_FREQ_7,Hz,1000000.0,1.0068829,2698.6279296875\n", + "86,0,juan-20w000p2ge,cpu,CPU_FREQ_7,Hz,1000000.0,2.0126529,1597.2509765625\n", + "87,0,juan-20w000p2ge,cpu,CPU_FREQ_7,Hz,1000000.0,3.0183434,2600.0\n", + "88,0,juan-20w000p2ge,cpu,CPU_FREQ_7,Hz,1000000.0,4.024712,2800.0\n", + "89,0,juan-20w000p2ge,cpu,CPU_FREQ_7,Hz,1000000.0,5.0291414,2749.60400390625\n", + "90,0,juan-20w000p2ge,cpu,CPU_FREQ_7,Hz,1000000.0,6.033699,1021.1300048828125\n", + "91,0,juan-20w000p2ge,cpu,CPU_FREQ_7,Hz,1000000.0,7.0397954,1945.0069580078125\n", + "92,0,juan-20w000p2ge,cpu,CPU_FREQ_7,Hz,1000000.0,8.047083,3001.322998046875\n", + "93,0,juan-20w000p2ge,cpu,CPU_FREQ_7,Hz,1000000.0,9.0511675,2486.304931640625\n", + "94,0,juan-20w000p2ge,cpu,CPU_FREQ_7,Hz,1000000.0,10.060614,3200.0\n", + "95,0,juan-20w000p2ge,cpu,CPU_FREQ_7,Hz,1000000.0,10.068606,2859.821044921875\n", + "96,0,juan-20w000p2ge,cpu,CPU_USAGE,%,1.0,0.0,37.5\n", + "97,0,juan-20w000p2ge,cpu,CPU_USAGE,%,1.0,1.0068829,25.700000762939453\n", + "98,0,juan-20w000p2ge,cpu,CPU_USAGE,%,1.0,2.0126529,24.600000381469727\n", + "99,0,juan-20w000p2ge,cpu,CPU_USAGE,%,1.0,3.0183434,33.599998474121094\n", + "100,0,juan-20w000p2ge,cpu,CPU_USAGE,%,1.0,4.024712,31.5\n", + "101,0,juan-20w000p2ge,cpu,CPU_USAGE,%,1.0,5.0291414,23.100000381469727\n", + "102,0,juan-20w000p2ge,cpu,CPU_USAGE,%,1.0,6.033699,26.600000381469727\n", + "103,0,juan-20w000p2ge,cpu,CPU_USAGE,%,1.0,7.0397954,33.900001525878906\n", + "104,0,juan-20w000p2ge,cpu,CPU_USAGE,%,1.0,8.047083,24.700000762939453\n", + "105,0,juan-20w000p2ge,cpu,CPU_USAGE,%,1.0,9.0511675,23.600000381469727\n", + "106,0,juan-20w000p2ge,cpu,CPU_USAGE,%,1.0,10.060614,23.299999237060547\n", + "107,0,juan-20w000p2ge,cpu,CPU_USAGE,%,1.0,10.068606,50.0\n", + "108,0,juan-20w000p2ge,cpu,cpu_0_package-0,W,1.0,0.0,9.068116188049316\n", + "109,0,juan-20w000p2ge,cpu,cpu_0_package-0,W,1.0,1.0068829,9.068116188049316\n", + "110,0,juan-20w000p2ge,cpu,cpu_0_package-0,W,1.0,2.0126529,9.29400634765625\n", + "111,0,juan-20w000p2ge,cpu,cpu_0_package-0,W,1.0,3.0183434,10.591010093688965\n", + "112,0,juan-20w000p2ge,cpu,cpu_0_package-0,W,1.0,4.024712,9.672627449035645\n", + "113,0,juan-20w000p2ge,cpu,cpu_0_package-0,W,1.0,5.0291414,9.234281539916992\n", + "114,0,juan-20w000p2ge,cpu,cpu_0_package-0,W,1.0,6.033699,10.3326416015625\n", + "115,0,juan-20w000p2ge,cpu,cpu_0_package-0,W,1.0,7.0397954,10.53620433807373\n", + "116,0,juan-20w000p2ge,cpu,cpu_0_package-0,W,1.0,8.047083,8.992063522338867\n", + "117,0,juan-20w000p2ge,cpu,cpu_0_package-0,W,1.0,9.0511675,9.542298316955566\n", + "118,0,juan-20w000p2ge,cpu,cpu_0_package-0,W,1.0,10.060614,10.295360565185547\n", + "119,0,juan-20w000p2ge,cpu,cpu_0_package-0,W,1.0,10.068606,11.85925579071045\n", + "120,0,juan-20w000p2ge,disk,DISK_READ_BYTES,B,1.0,0.0,6371516416.0\n", + "121,0,juan-20w000p2ge,disk,DISK_READ_BYTES,B,1.0,1.0068829,6371516416.0\n", + "122,0,juan-20w000p2ge,disk,DISK_READ_BYTES,B,1.0,2.0126529,6371516416.0\n", + "123,0,juan-20w000p2ge,disk,DISK_READ_BYTES,B,1.0,3.0183434,6371516416.0\n", + "124,0,juan-20w000p2ge,disk,DISK_READ_BYTES,B,1.0,4.024712,6371516416.0\n", + "125,0,juan-20w000p2ge,disk,DISK_READ_BYTES,B,1.0,5.0291414,6371516416.0\n", + "126,0,juan-20w000p2ge,disk,DISK_READ_BYTES,B,1.0,6.033699,6371520512.0\n", + "127,0,juan-20w000p2ge,disk,DISK_READ_BYTES,B,1.0,7.0397954,6371520512.0\n", + "128,0,juan-20w000p2ge,disk,DISK_READ_BYTES,B,1.0,8.047083,6371520512.0\n", + 
"129,0,juan-20w000p2ge,disk,DISK_READ_BYTES,B,1.0,9.0511675,6371520512.0\n", + "130,0,juan-20w000p2ge,disk,DISK_READ_BYTES,B,1.0,10.060614,6371520512.0\n", + "131,0,juan-20w000p2ge,disk,DISK_READ_BYTES,B,1.0,10.068606,6371520512.0\n", + "132,0,juan-20w000p2ge,disk,DISK_WRITE_BYTES,B,1.0,0.0,35543599104.0\n", + "133,0,juan-20w000p2ge,disk,DISK_WRITE_BYTES,B,1.0,1.0068829,35543599104.0\n", + "134,0,juan-20w000p2ge,disk,DISK_WRITE_BYTES,B,1.0,2.0126529,35543599104.0\n", + "135,0,juan-20w000p2ge,disk,DISK_WRITE_BYTES,B,1.0,3.0183434,35543697408.0\n", + "136,0,juan-20w000p2ge,disk,DISK_WRITE_BYTES,B,1.0,4.024712,35556833280.0\n", + "137,0,juan-20w000p2ge,disk,DISK_WRITE_BYTES,B,1.0,5.0291414,35556833280.0\n", + "138,0,juan-20w000p2ge,disk,DISK_WRITE_BYTES,B,1.0,6.033699,35556923392.0\n", + "139,0,juan-20w000p2ge,disk,DISK_WRITE_BYTES,B,1.0,7.0397954,35556923392.0\n", + "140,0,juan-20w000p2ge,disk,DISK_WRITE_BYTES,B,1.0,8.047083,35556923392.0\n", + "141,0,juan-20w000p2ge,disk,DISK_WRITE_BYTES,B,1.0,9.0511675,35556923392.0\n", + "142,0,juan-20w000p2ge,disk,DISK_WRITE_BYTES,B,1.0,10.060614,35557033984.0\n", + "143,0,juan-20w000p2ge,disk,DISK_WRITE_BYTES,B,1.0,10.068606,35557033984.0\n", + "144,0,juan-20w000p2ge,net,NET_READ_BYTES,B,1.0,0.0,18377730529.0\n", + "145,0,juan-20w000p2ge,net,NET_READ_BYTES,B,1.0,1.0068829,18377732025.0\n", + "146,0,juan-20w000p2ge,net,NET_READ_BYTES,B,1.0,2.0126529,18377732426.0\n", + "147,0,juan-20w000p2ge,net,NET_READ_BYTES,B,1.0,3.0183434,18377740366.0\n", + "148,0,juan-20w000p2ge,net,NET_READ_BYTES,B,1.0,4.024712,18377741928.0\n", + "149,0,juan-20w000p2ge,net,NET_READ_BYTES,B,1.0,5.0291414,18377741994.0\n", + "150,0,juan-20w000p2ge,net,NET_READ_BYTES,B,1.0,6.033699,18377741994.0\n", + "151,0,juan-20w000p2ge,net,NET_READ_BYTES,B,1.0,7.0397954,18391531834.0\n", + "152,0,juan-20w000p2ge,net,NET_READ_BYTES,B,1.0,8.047083,18391531959.0\n", + "153,0,juan-20w000p2ge,net,NET_READ_BYTES,B,1.0,9.0511675,18391531959.0\n", + "154,0,juan-20w000p2ge,net,NET_READ_BYTES,B,1.0,10.060614,18391534144.0\n", + "155,0,juan-20w000p2ge,net,NET_READ_BYTES,B,1.0,10.068606,18391534144.0\n", + "156,0,juan-20w000p2ge,net,NET_WRITE_BYTES,B,1.0,0.0,304896333.0\n", + "157,0,juan-20w000p2ge,net,NET_WRITE_BYTES,B,1.0,1.0068829,304897829.0\n", + "158,0,juan-20w000p2ge,net,NET_WRITE_BYTES,B,1.0,2.0126529,304898025.0\n", + "159,0,juan-20w000p2ge,net,NET_WRITE_BYTES,B,1.0,3.0183434,304900338.0\n", + "160,0,juan-20w000p2ge,net,NET_WRITE_BYTES,B,1.0,4.024712,304901904.0\n", + "161,0,juan-20w000p2ge,net,NET_WRITE_BYTES,B,1.0,5.0291414,304901904.0\n", + "162,0,juan-20w000p2ge,net,NET_WRITE_BYTES,B,1.0,6.033699,304901904.0\n", + "163,0,juan-20w000p2ge,net,NET_WRITE_BYTES,B,1.0,7.0397954,304946475.0\n", + "164,0,juan-20w000p2ge,net,NET_WRITE_BYTES,B,1.0,8.047083,304946686.0\n", + "165,0,juan-20w000p2ge,net,NET_WRITE_BYTES,B,1.0,9.0511675,304946686.0\n", + "166,0,juan-20w000p2ge,net,NET_WRITE_BYTES,B,1.0,10.060614,304948698.0\n", + "167,0,juan-20w000p2ge,net,NET_WRITE_BYTES,B,1.0,10.068606,304948698.0\n", + "168,0,juan-20w000p2ge,ram,RAM_USAGE,B,1.0,0.0,7110832128.0\n", + "169,0,juan-20w000p2ge,ram,RAM_USAGE,B,1.0,1.0068829,7132991488.0\n", + "170,0,juan-20w000p2ge,ram,RAM_USAGE,B,1.0,2.0126529,7121014784.0\n", + "171,0,juan-20w000p2ge,ram,RAM_USAGE,B,1.0,3.0183434,7130132480.0\n", + "172,0,juan-20w000p2ge,ram,RAM_USAGE,B,1.0,4.024712,7077158912.0\n", + "173,0,juan-20w000p2ge,ram,RAM_USAGE,B,1.0,5.0291414,7070154752.0\n", + "174,0,juan-20w000p2ge,ram,RAM_USAGE,B,1.0,6.033699,7081443328.0\n", + 
"175,0,juan-20w000p2ge,ram,RAM_USAGE,B,1.0,7.0397954,7110733824.0\n", + "176,0,juan-20w000p2ge,ram,RAM_USAGE,B,1.0,8.047083,7109107712.0\n", + "177,0,juan-20w000p2ge,ram,RAM_USAGE,B,1.0,9.0511675,7103995904.0\n", + "178,0,juan-20w000p2ge,ram,RAM_USAGE,B,1.0,10.060614,7114371072.0\n", + "179,0,juan-20w000p2ge,ram,RAM_USAGE,B,1.0,10.068606,7114371072.0\n" + ] + } + ], + "source": [ + "%%bash\n", + "perun export perun_results/sleep.hdf5 csv\n", + "cat perun_results/sleep_*.csv" + ] + }, + { + "cell_type": "markdown", + "id": "14", + "metadata": {}, + "source": [ + "Let's move on to a slightly more interesting example, that we are going to profile in parallel inside our notebook using **ipyparallel**. " + ] + }, + { + "cell_type": "markdown", + "id": "15", + "metadata": {}, + "source": [ + "## Setup for a notebook" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "16", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "4 engines found\n" + ] + } + ], + "source": [ + "from ipyparallel import Client\n", + "rc = Client(profile=\"default\")\n", + "rc.ids\n", + "\n", + "if len(rc.ids) == 0:\n", + " print(\"No engines found\")\n", + "else:\n", + " print(f\"{len(rc.ids)} engines found\")" + ] + }, + { + "cell_type": "markdown", + "id": "17", + "metadata": {}, + "source": [ + "## Using the perun decorators\n", + "\n", + "perun offers an alternative way to start monitoring your code by using function decorators. The main goal is to isolate the region of the code that you want to monitor inside a function, and decorate it with the ```@perun``` decorator. Now, your code can be started using the normal python command, and perun will start gathering data only when that function is reached.\n", + "\n", + "**Carefull**: For each time the perun decorator is called, it will create a new output file and a new run, which could slow down your code significantly. If the function that you want to monitor will be run more than once, it is better to use the ```@monitor``` decorator. \n", + "\n", + "Let's look at the example below." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "18", + "metadata": {}, + "outputs": [], + "source": [ + "%%px\n", + "import sklearn\n", + "import heat as ht\n", + "from perun import perun, monitor\n", + "\n", + "@monitor()\n", + "def data_loading():\n", + " X,_ = sklearn.datasets.load_digits(return_X_y=True)\n", + " return ht.array(X, split=0)\n", + "\n", + "@monitor()\n", + "def fitting(X):\n", + " k = 10\n", + " kmeans = ht.cluster.KMeans(n_clusters=k, init=\"kmeans++\")\n", + " kmeans.fit(X)\n", + "\n", + "@perun(log_lvl=\"WARNING\", data_out=\"perun_data\", format=\"text\", sampling_period=0.1)\n", + "def main():\n", + " data = data_loading()\n", + " fitting(data)\n" + ] + }, + { + "cell_type": "markdown", + "id": "19", + "metadata": {}, + "source": [ + "The example has 3 functions, the ```main``` function with the ```@perun``` decorator, ```fitting``` and ```data_loading``` with the ```@monitor``` decorator. **perun** will start monitoring whenever we run the ```main``` function, and will record the entry and exit time of the other two functions marked with ```@monitor```. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "20", + "metadata": {}, + "outputs": [], + "source": [ + "%%px\n", + "main()" + ] + }, + { + "cell_type": "markdown", + "id": "21", + "metadata": {}, + "source": [ + "The text report will have an extra table with with all the monitored functions, outlining the average runtime, and power draw measured while the application was running, together with other metrics. The data can also be found in the hdf5 file, where the start and stop events of the functions are stored under the regions node of the individual runs. " + ] + }, + { + "cell_type": "markdown", + "id": "22", + "metadata": {}, + "source": [ + "If you want more information on perun check the [documentation](https://perun.readthedocs.io/en/latest/?badge=latest) or check the code in [github](https://github.com/Helmholtz-AI-Energy/perun). Thanks!" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "heat-dev311", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.2" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/source/tutorial_30_minutes.rst b/doc/source/tutorials/tutorial_30_minutes.rst similarity index 100% rename from doc/source/tutorial_30_minutes.rst rename to doc/source/tutorials/tutorial_30_minutes.rst diff --git a/doc/source/tutorial_clustering.rst b/doc/source/tutorials/tutorial_clustering.rst similarity index 97% rename from doc/source/tutorial_clustering.rst rename to doc/source/tutorials/tutorial_clustering.rst index 21b4157065..ccceb4248b 100644 --- a/doc/source/tutorial_clustering.rst +++ b/doc/source/tutorials/tutorial_clustering.rst @@ -50,7 +50,7 @@ all processes) and transform it into a numpy array. Plotting can only be done on This will render something like -.. image:: ../images/data.png +.. image:: ../_static/images/data.png Now we perform the clustering analysis with kmeans. We chose 'kmeans++' as an intelligent way of sampling the initial centroids. @@ -93,7 +93,7 @@ Let's plot the assigned clusters and the respective centroids: plt.plot(centroids[1,0],centroids[1,1], '^', markersize=10, markeredgecolor='black',color='#5a696e') plt.show() -.. image:: ../images/clustering.png +.. image:: ../_static/images/clustering.png We can also cluster the data with kmedians. The respective advanced initial centroid sampling is called 'kmedians++'. @@ -127,7 +127,7 @@ Plotting the assigned clusters and the respective centroids: plt.plot(centroids[1,0],centroids[1,1], '^', markersize=10, markeredgecolor='black',color='#5a696e') plt.show() -.. image:: ../images/clustering_kmeans.png +.. image:: ../_static/images/clustering_kmeans.png The Iris Dataset ------------------------------ @@ -139,6 +139,8 @@ dataloader .. code:: python iris = ht.load("heat/datasets/iris.csv", sep=";", split=0) + + Fitting the dataset with kmeans: .. code:: python diff --git a/doc/source/tutorials/tutorial_notebook_gallery.rst b/doc/source/tutorials/tutorial_notebook_gallery.rst new file mode 100644 index 0000000000..67c67ab40b --- /dev/null +++ b/doc/source/tutorials/tutorial_notebook_gallery.rst @@ -0,0 +1,25 @@ +Notebook gallery +================ + +Setup notebooks +~~~~~~~~~~~~~~~ + +Example notebooks explaining how to setup an MPI enabled notebook to work with heat in an interactive way. + +.. 
nbgallery:: + notebooks/0_setup/0_setup_local + notebooks/0_setup/0_setup_jsc + notebooks/0_setup/0_setup_haicore + +Example notebooks +~~~~~~~~~~~~~~~~~ + +These notebooks contain heat examples that have been used in interactive tutorials. + +.. nbgallery:: + notebooks/1_basics + notebooks/2_internals + notebooks/3_loading_preprocessing + notebooks/4_matrix_factorizations + notebooks/5_clustering + notebooks/6_profiling diff --git a/doc/source/tutorial_parallel_computation.rst b/doc/source/tutorials/tutorial_parallel_computation.rst similarity index 99% rename from doc/source/tutorial_parallel_computation.rst rename to doc/source/tutorials/tutorial_parallel_computation.rst index 684e775cea..2a10777726 100644 --- a/doc/source/tutorial_parallel_computation.rst +++ b/doc/source/tutorials/tutorial_parallel_computation.rst @@ -74,7 +74,7 @@ Distributed Computing With Heat you can even compute in distributed memory environments with multiple computation nodes, like modern high-performance cluster systems. For this, Heat makes use of the fact that operations performed on multi-dimensional arrays tend to be identical for all data items. Hence, they can be processed in data-parallel manner. Heat partitions the total number of data items equally among all processing nodes. A ``DNDarray`` assumes the role of a virtual overlay over these node-local data portions and manages them for you while offering the same interface. Consequently, operations can now be executed in parallel. Each processing node applies them locally to their own data chunk. If necessary, partial results are communicated and automatically combined behind the scenes for correct global results. -.. image:: ../images/split_array.svg +.. image:: ../_static/images/split_array.svg :align: center :width: 80% @@ -202,7 +202,7 @@ Technical Details On a technical level, Heat is inspired by the so-called `Bulk Synchronous Parallel (BSP) `_ processing model. Computations proceed in a series of hierarchical supersteps, each consisting of a number of node-local computations and subsequent communications. In contrast to the classical BSP model, communicated data is available immediately, rather than after the next global synchronization. In Heat, global synchronization only occurs for collective MPI calls as well as at the program start and termination. -.. image:: ../images/bsp.svg +.. image:: ../_static/images/bsp.svg :align: center :width: 60% diff --git a/doc/source/tutorials.rst b/doc/source/tutorials/tutorials.rst similarity index 68% rename from doc/source/tutorials.rst rename to doc/source/tutorials/tutorials.rst index 6cc42143f2..59b68fb2bf 100644 --- a/doc/source/tutorials.rst +++ b/doc/source/tutorials/tutorials.rst @@ -7,12 +7,14 @@ Heat Tutorials tutorial_30_minutes tutorial_parallel_computation tutorial_clustering + tutorial_notebook_gallery + .. container:: tutorial .. container:: tutorial-image - .. image:: ../images/tutorial_logo.svg + .. image:: ../_static/images/tutorial_logo.svg :target: tutorial_30_minutes.html .. raw:: html @@ -31,7 +33,7 @@ Heat Tutorials .. container:: tutorial-image - .. image:: ../images/tutorial_split_dndarray.svg + .. image:: ../_static/images/tutorial_split_dndarray.svg :target: tutorial_parallel_computation.html .. raw:: html @@ -50,7 +52,7 @@ Heat Tutorials .. container:: tutorial-image - .. image:: ../images/tutorial_clustering.svg + .. image:: ../_static/images/tutorial_clustering.svg :target: tutorial_clustering.html .. raw:: html @@ -63,3 +65,22 @@ Heat Tutorials For intermediate analysts.
+ + +.. container:: tutorial + + .. container:: tutorial-image + + .. image:: ../_static/images/jupyter.png + :target: tutorial_notebook_gallery.html + + .. raw:: html + + +
+ Example notebooks +

+

Ideal for people who like using Jupyter notebooks or other interactive environments.

+ Excellent for beginners. +
+
diff --git a/docker/Dockerfile.release b/docker/Dockerfile.release index 3aa43fde14..8ead42996a 100644 --- a/docker/Dockerfile.release +++ b/docker/Dockerfile.release @@ -1,5 +1,5 @@ ARG HEAT_VERSION=latest -ARG PYTORCH_IMG=23.05-py3 +ARG PYTORCH_IMG=25.07-py3 FROM nvcr.io/nvidia/pytorch:${PYTORCH_IMG} AS base COPY ./tzdata.seed /tmp/tzdata.seed diff --git a/docker/Dockerfile.source b/docker/Dockerfile.source index 2765d1cc41..93a017b359 100644 --- a/docker/Dockerfile.source +++ b/docker/Dockerfile.source @@ -1,4 +1,4 @@ -ARG PYTORCH_IMG=23.05-py3 +ARG PYTORCH_IMG=25.07-py3 ARG HEAT_BRANCH=main FROM nvcr.io/nvidia/pytorch:${PYTORCH_IMG} AS base diff --git a/docker/scripts/build_and_push.sh b/docker/scripts/build_and_push.sh index 10895596ab..8fdd1d309f 100755 --- a/docker/scripts/build_and_push.sh +++ b/docker/scripts/build_and_push.sh @@ -8,39 +8,35 @@ while [[ $# -gt 0 ]]; do case $1 in --heat-version) HEAT_VERSION="$2" - shift # past argument - shift # past value + shift 2 # Use 'shift 2' as a concise way to skip past the argument and its value ;; --pytorch-img) PYTORCH_IMG="$2" - shift # past argument - shift # past value + shift 2 ;; --torch-version) TORCH_VERSION="$2" - shift # past argument - shift # past value + shift 2 ;; --cuda-version) CUDA_VERSION="$2" - shift # past argument - shift # past value + shift 2 ;; --python-version) PYTHON_VERSION="$2" - shift # past argument - shift # past value + shift 2 ;; --upload) GHCR_UPLOAD=true - shift - shift + shift # FIX 1: This is a flag, not an option with a value, so only shift once. ;; - -*|--*) + -*) # FIX 2: Simplified from '-*|--*'. This correctly catches all unknown options. echo "Unknown option $1" exit 1 ;; - *) + *) # FIX 3: Added a 'shift' to handle positional arguments and prevent an infinite loop. + shift + ;; esac done @@ -56,13 +52,13 @@ ghcr_tag="ghcr.io/helmholtz-analytics/heat:${HEAT_VERSION}_torch${TORCH_VERSION} echo "Building image $ghcr_tag" docker build --file ../Dockerfile.release \ - --build-arg HEAT_VERSION=$HEAT_VERSION \ - --build-arg PYTORCH_IMG=$PYTORCH_IMG \ - --tag $ghcr_tag \ + --build-arg HEAT_VERSION="$HEAT_VERSION" \ + --build-arg PYTORCH_IMG="$PYTORCH_IMG" \ + --tag "$ghcr_tag" \ . if [ $GHCR_UPLOAD = true ]; then echo "Push image" echo "You might need to log in into ghcr.io (https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry#authenticating-to-the-container-registry)" - docker push $ghcr_tag + docker push "$ghcr_tag" fi diff --git a/docker/scripts/install_print_test.sh b/docker/scripts/install_print_test.sh index 9103be9562..08bbf33ae6 100755 --- a/docker/scripts/install_print_test.sh +++ b/docker/scripts/install_print_test.sh @@ -12,10 +12,10 @@ mpirun --version # Install heat from source. 
git clone https://github.com/helmholtz-analytics/heat.git -cd heat +cd heat || exit pip install --upgrade pip pip install mpi4py --no-binary :all: -pip install .[netcdf,hdf5,dev] +pip install '.[netcdf,hdf5,dev]' # Run tests HEAT_TEST_USE_DEVICE=gpu mpirun -n 1 pytest heat/ diff --git a/docker/scripts/test_nvidia_image_haicore_enroot.sh b/docker/scripts/test_nvidia_image_haicore_enroot.sh index 7b052b22ea..07c6b94f7d 100755 --- a/docker/scripts/test_nvidia_image_haicore_enroot.sh +++ b/docker/scripts/test_nvidia_image_haicore_enroot.sh @@ -12,7 +12,7 @@ SBATCH_PARAMS=( --gres gpu:1 --container-image ~/containers/nvidia+pytorch+23.05-py3.sqsh --container-writable - --container-mounts /etc/slurm/task_prolog.hk:/etc/slurm/task_prolog.hk,/scratch:/scratch + --container-mounts "/etc/slurm/task_prolog.hk:/etc/slurm/task_prolog.hk,/scratch:/scratch" --container-mount-home ) diff --git a/heat/classification/kneighborsclassifier.py b/heat/classification/kneighborsclassifier.py index dac65546bc..90d1859537 100644 --- a/heat/classification/kneighborsclassifier.py +++ b/heat/classification/kneighborsclassifier.py @@ -24,7 +24,7 @@ class KNeighborsClassifier(ht.BaseEstimator, ht.ClassificationMixin): The distance function used to identify the nearest neighbors, defaults to the Euclidean distance. References -------- + ---------- [1] T. Cover and P. Hart, "Nearest Neighbor Pattern Classification," in IEEE Transactions on Information Theory, vol. 13, no. 1, pp. 21-27, January 1967, doi: 10.1109/TIT.1967.1053964. """ @@ -122,7 +122,6 @@ def predict(self, x: DNDarray) -> DNDarray: """ distances = self.effective_metric_(x, self.x) _, indices = ht.topk(distances, self.n_neighbors, largest=False) - predictions = self.y[indices.flatten()] predictions.balance_() predictions = ht.reshape(predictions, (indices.gshape + (self.y.gshape[1],))) diff --git a/heat/cli.py b/heat/cli.py new file mode 100644 index 0000000000..29a91fdbc7 --- /dev/null +++ b/heat/cli.py @@ -0,0 +1,54 @@ +""" +Heat command line interface module. +""" + +import torch +import platform +import mpi4py +import argparse + +from heat.core.version import __version__ as ht_version +from heat.core.communication import CUDA_AWARE_MPI + + +def cli() -> None: + """ + Command line interface entrypoint. + """ + parser = argparse.ArgumentParser( + prog="heat", description="Command line utilities of the Helmholtz Analytics Toolkit" + ) + parser.add_argument( + "-i", "--info", action="store_true", help="Print version and platform information" + ) + + args = parser.parse_args() + if args.info: + platform_info() + else: + parser.print_help() + + +def platform_info(): + """ + Print the current software stack being used by heat, including available devices. 
+ """ + print("HeAT: Helmholtz Analytics Toolkit") + print(f" Version: {ht_version}") + print(f" Platform: {platform.platform()}") + + print(f" mpi4py Version: {mpi4py.__version__}") + print(f" MPI Library Version: {mpi4py.MPI.Get_library_version()}") + + print(f" Torch Version: {torch.__version__}") + print(f" CUDA Available: {torch.cuda.is_available()}") + if torch.cuda.is_available(): + def_device = torch.cuda.current_device() + print(f" Device count: {torch.cuda.device_count()}") + print(f" Default device: {def_device}") + print(f" Device name: {torch.cuda.get_device_name(def_device)}") + print(f" Device name: {torch.cuda.get_device_properties(def_device)}") + print( + f" Device memory: {torch.cuda.get_device_properties(def_device).total_memory / 1024**3} GiB" + ) + print(f" CUDA Aware MPI: {CUDA_AWARE_MPI}") diff --git a/heat/cluster/batchparallelclustering.py b/heat/cluster/batchparallelclustering.py index 257b88c18d..ef7dd45aba 100644 --- a/heat/cluster/batchparallelclustering.py +++ b/heat/cluster/batchparallelclustering.py @@ -308,13 +308,16 @@ def predict(self, x: DNDarray): if self._p == 2: self._functional_value = ( torch.norm( - x.larray - self._cluster_centers.larray[local_labels, :].squeeze(), p="fro" + x.larray - self._cluster_centers.larray[local_labels, :].squeeze(), + p="fro", ) ** 2 ) else: self._functional_value = torch.norm( - x.larray - self._cluster_centers.larray[local_labels, :].squeeze(), p=self._p, dim=1 + x.larray - self._cluster_centers.larray[local_labels, :].squeeze(), + p=self._p, + dim=1, ).sum() x.comm.Allreduce(ht.communication.MPI.IN_PLACE, self._functional_value) self._functional_value = self._functional_value.item() diff --git a/heat/cluster/kmedians.py b/heat/cluster/kmedians.py index c7d991b1fd..efca792ff6 100644 --- a/heat/cluster/kmedians.py +++ b/heat/cluster/kmedians.py @@ -32,7 +32,7 @@ class KMedians(_KCluster): Determines random number generation for centroid initialization. References - ------------- + ---------- [1] Hakimi, S., and O. Kariv. "An algorithmic approach to network location problems II: The p-medians." SIAM Journal on Applied Mathematics 37.3 (1979): 539-560. """ diff --git a/heat/cluster/kmedoids.py b/heat/cluster/kmedoids.py index 0eb38a5eb6..b0fd951ae2 100644 --- a/heat/cluster/kmedoids.py +++ b/heat/cluster/kmedoids.py @@ -10,9 +10,9 @@ class KMedoids(_KCluster): """ - This is not the original implementation of k-medoids using PAM as originally proposed by in [1]. - This is kmedoids with the Manhattan distance as fixed metric, calculating the median of the assigned cluster points as new cluster center + Kmedoids with the Manhattan distance as fixed metric, calculating the median of the assigned cluster points as new cluster center and snapping the centroid to the the nearest datapoint afterwards. + This is not the original implementation of k-medoids using PAM as originally proposed by in [1]. Parameters ---------- @@ -30,7 +30,7 @@ class KMedoids(_KCluster): Determines random number generation for centroid initialization. References - ----------- + ---------- [1] Kaufman, L. and Rousseeuw, P.J. (1987), Clustering by means of Medoids, in Statistical Data Analysis Based on the L1 Norm and Related Methods, edited by Y. Dodge, North-Holland, 405416. 
""" diff --git a/heat/cluster/tests/test_batchparallelclustering.py b/heat/cluster/tests/test_batchparallelclustering.py index 222e6f8fff..684d9d9247 100644 --- a/heat/cluster/tests/test_batchparallelclustering.py +++ b/heat/cluster/tests/test_batchparallelclustering.py @@ -1,4 +1,5 @@ import os +import platform import unittest import numpy as np import torch @@ -11,7 +12,12 @@ # test BatchParallelKCluster base class and auxiliary functions +# skip on MPS +envar = os.getenv("HEAT_TEST_USE_DEVICE", "cpu") +is_mps = envar == "gpu" and platform.system() == "Darwin" + +@unittest.skipIf(is_mps, "Batchparallelclustering fit() fails on MPS") class TestAuxiliaryFunctions(TestCase): def test_kmex(self): X = torch.rand(10, 3) @@ -50,6 +56,7 @@ def test_BatchParallelKClustering(self): # test BatchParallelKMeans and BatchParallelKMedians +@unittest.skipIf(is_mps, "Batchparallelclustering fit() fails on MPS") class TestBatchParallelKCluster(TestCase): def test_clusterer(self): for ParallelClusterer in [ht.cluster.BatchParallelKMeans, ht.cluster.BatchParallelKMedians]: @@ -84,6 +91,11 @@ def test_get_and_set_params(self): self.assertEqual(10, parallelclusterer.n_clusters) def test_spherical_clusters(self): + if self.is_mps: + dtypes = [ht.float32] + else: + dtypes = [ht.float32, ht.float64] + for ParallelClusterer in [ht.cluster.BatchParallelKMeans, ht.cluster.BatchParallelKMedians]: if ParallelClusterer is ht.cluster.BatchParallelKMeans: ppinitkws = ["k-means++"] @@ -91,7 +103,7 @@ def test_spherical_clusters(self): ppinitkws = ["k-medians++"] for seed in [1, None]: n = 20 * ht.MPI_WORLD.size - for dtype in [ht.float32, ht.float64]: + for dtype in dtypes: data = create_spherical_dataset( num_samples_cluster=n, radius=1.0, diff --git a/heat/cluster/tests/test_kmeans.py b/heat/cluster/tests/test_kmeans.py index 9fa79ed67e..ec6254c633 100644 --- a/heat/cluster/tests/test_kmeans.py +++ b/heat/cluster/tests/test_kmeans.py @@ -100,15 +100,20 @@ def test_spherical_clusters(self): # different datatype n = 20 * ht.MPI_WORLD.size + if self.is_mps: + # MPS does not support float64 + dtype = ht.float32 + else: + dtype = ht.float64 data = create_spherical_dataset( - num_samples_cluster=n, radius=1.0, offset=4.0, dtype=ht.float64, random_state=seed + num_samples_cluster=n, radius=1.0, offset=4.0, dtype=dtype, random_state=seed ) kmeans = ht.cluster.KMeans(n_clusters=4, init="kmeans++") kmeans.fit(data) self.assertIsInstance(kmeans.cluster_centers_, ht.DNDarray) self.assertEqual(kmeans.cluster_centers_.shape, (4, 3)) - # on Ints (different radius, offset and datatype + # on Ints (different radius, offset and datatype) data = create_spherical_dataset( num_samples_cluster=n, radius=10.0, offset=40.0, dtype=ht.int32, random_state=seed ) diff --git a/heat/cluster/tests/test_kmedians.py b/heat/cluster/tests/test_kmedians.py index 64c95eb740..ee8b534e50 100644 --- a/heat/cluster/tests/test_kmedians.py +++ b/heat/cluster/tests/test_kmedians.py @@ -100,8 +100,13 @@ def test_spherical_clusters(self): # different datatype n = 20 * ht.MPI_WORLD.size + # MPS does not support float64 + if self.is_mps: + dtype = ht.float32 + else: + dtype = ht.float64 data = create_spherical_dataset( - num_samples_cluster=n, radius=1.0, offset=4.0, dtype=ht.float64, random_state=seed + num_samples_cluster=n, radius=1.0, offset=4.0, dtype=dtype, random_state=seed ) kmedians = ht.cluster.KMedians(n_clusters=4, init="kmedians++") kmedians.fit(data) diff --git a/heat/cluster/tests/test_kmedoids.py b/heat/cluster/tests/test_kmedoids.py index 
b04d29a522..27ce5388bf 100644 --- a/heat/cluster/tests/test_kmedoids.py +++ b/heat/cluster/tests/test_kmedoids.py @@ -103,8 +103,13 @@ def test_spherical_clusters(self): # different datatype n = 20 * ht.MPI_WORLD.size + # MPS does not support float64 + if self.is_mps: + dtype = ht.float32 + else: + dtype = ht.float64 data = create_spherical_dataset( - num_samples_cluster=n, radius=1.0, offset=4.0, dtype=ht.float64, random_state=seed + num_samples_cluster=n, radius=1.0, offset=4.0, dtype=dtype, random_state=seed ) kmedoid = ht.cluster.KMedoids(n_clusters=4, init="kmedoids++") kmedoid.fit(data) diff --git a/heat/cluster/tests/test_spectral.py b/heat/cluster/tests/test_spectral.py index 9e24dddfc5..cd43433d9d 100644 --- a/heat/cluster/tests/test_spectral.py +++ b/heat/cluster/tests/test_spectral.py @@ -2,6 +2,7 @@ import unittest import heat as ht +import torch from ...core.tests.test_suites.basic_test import TestCase @@ -35,49 +36,51 @@ def test_get_and_set_params(self): self.assertEqual(10, spectral.n_clusters) def test_fit_iris(self): - # get some test data - iris = ht.load("heat/datasets/iris.csv", sep=";", split=0) - m = 10 - # fit the clusters - spectral = ht.cluster.Spectral( - n_clusters=3, gamma=1.0, metric="rbf", laplacian="fully_connected", n_lanczos=m - ) - spectral.fit(iris) - self.assertIsInstance(spectral.labels_, ht.DNDarray) + # skip on MPS, matmul on ComplexFloat not supported as of PyTorch 2.5 + if not self.is_mps: + # get some test data + iris = ht.load("heat/datasets/iris.csv", sep=";", split=0) + m = 10 + # fit the clusters + spectral = ht.cluster.Spectral( + n_clusters=3, gamma=1.0, metric="rbf", laplacian="fully_connected", n_lanczos=m + ) + spectral.fit(iris) + self.assertIsInstance(spectral.labels_, ht.DNDarray) - spectral = ht.cluster.Spectral( - metric="euclidean", - laplacian="eNeighbour", - threshold=0.5, - boundary="upper", - n_lanczos=m, - ) - labels = spectral.fit_predict(iris) - self.assertIsInstance(labels, ht.DNDarray) + spectral = ht.cluster.Spectral( + metric="euclidean", + laplacian="eNeighbour", + threshold=0.5, + boundary="upper", + n_lanczos=m, + ) + labels = spectral.fit_predict(iris) + self.assertIsInstance(labels, ht.DNDarray) - spectral = ht.cluster.Spectral( - gamma=0.1, - metric="rbf", - laplacian="eNeighbour", - threshold=0.5, - boundary="upper", - n_lanczos=m, - ) - labels = spectral.fit_predict(iris) - self.assertIsInstance(labels, ht.DNDarray) + spectral = ht.cluster.Spectral( + gamma=0.1, + metric="rbf", + laplacian="eNeighbour", + threshold=0.5, + boundary="upper", + n_lanczos=m, + ) + labels = spectral.fit_predict(iris) + self.assertIsInstance(labels, ht.DNDarray) - kmeans = {"kmeans++": "kmeans++", "max_iter": 30, "tol": -1} - spectral = ht.cluster.Spectral( - n_clusters=3, gamma=1.0, normalize=True, n_lanczos=m, params=kmeans - ) - labels = spectral.fit_predict(iris) - self.assertIsInstance(labels, ht.DNDarray) + kmeans = {"kmeans++": "kmeans++", "max_iter": 30, "tol": -1} + spectral = ht.cluster.Spectral( + n_clusters=3, gamma=1.0, normalize=True, n_lanczos=m, params=kmeans + ) + labels = spectral.fit_predict(iris) + self.assertIsInstance(labels, ht.DNDarray) - # Errors - with self.assertRaises(NotImplementedError): - spectral = ht.cluster.Spectral(metric="ahalanobis", n_lanczos=m) + # Errors + with self.assertRaises(NotImplementedError): + spectral = ht.cluster.Spectral(metric="ahalanobis", n_lanczos=m) - iris_split = ht.load("heat/datasets/iris.csv", sep=";", split=1) - spectral = ht.cluster.Spectral(n_lanczos=20) - with 
self.assertRaises(NotImplementedError): - spectral.fit(iris_split) + iris_split = ht.load("heat/datasets/iris.csv", sep=";", split=1) + spectral = ht.cluster.Spectral(n_lanczos=20) + with self.assertRaises(NotImplementedError): + spectral.fit(iris_split) diff --git a/heat/core/_operations.py b/heat/core/_operations.py index 4541ba08a7..3977975a7a 100644 --- a/heat/core/_operations.py +++ b/heat/core/_operations.py @@ -63,7 +63,7 @@ def __binary_op( MPI communication is necessary when both operands are distributed along the same dimension, but the distribution maps do not match. E.g.: ``` - a = ht.ones(10000, split=0) + a = ht.ones(10000, split=0) b = ht.zeros(10000, split=0) c = a[:-1] + b[1:] ``` @@ -197,6 +197,10 @@ def __get_out_params(target, other=None, map=None): sanitation.sanitize_out(out, output_shape, output_split, output_device, output_comm) t1, t2 = sanitation.sanitize_distribution(t1, t2, target=out) + # MPS does not support float64 + if t1.larray.is_mps and promoted_type == torch.float64: + promoted_type = torch.float32 + result = operation(t1.larray.to(promoted_type), t2.larray.to(promoted_type), **fn_kwargs) if out is None and where is True: @@ -282,6 +286,9 @@ def __cum_op( if dtype is not None: dtype = types.canonical_heat_type(dtype) + if x.larray.is_mps and dtype == types.float64: + warnings.warn("MPS does not support float64, will cast to float32") + dtype = types.float32 if out is not None: sanitation.sanitize_out(out, x.shape, x.split, x.device) @@ -350,13 +357,15 @@ def __local_op( out : DNDarray, optional A location in which to store the results. If provided, it must have a broadcastable shape. If not provided or set to None, a fresh tensor is allocated. + **kwargs: + Arguments to be passed to the operation. Warning ------- The gshape of the result DNDarray will be the same as that of x Raises - ------- + ------ TypeError If the input is not a tensor or the output is not a tensor or None. """ @@ -369,6 +378,8 @@ def __local_op( # we need floating point numbers here, due to PyTorch only providing sqrt() implementation for float32/64 if not no_cast: promoted_type = types.promote_types(x.dtype, types.float32) + if promoted_type is types.float64 and x.larray.is_mps: + promoted_type = types.float32 torch_type = promoted_type.torch_type() else: torch_type = x.larray.dtype @@ -426,6 +437,8 @@ def __reduce_op( Neutral element, i.e. an element that does not change the result of the reduction operation. Needed for those cases where 'x.gshape[x.split] < x.comm.rank', that is, the shape of the distributed tensor is such that one or more processes will be left without data. + **kwargs: + Arguments to be passed to the operation. 
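The pattern recurring in these _operations.py hunks — demote float64 to float32 whenever the local tensor lives on Apple's MPS backend — boils down to a small guard; a sketch for illustration, not code from the patch:

    import torch

    def mps_safe_dtype(tensor: torch.Tensor, wanted: torch.dtype) -> torch.dtype:
        # MPS has no double-precision support, so float64 is demoted to float32
        if tensor.is_mps and wanted == torch.float64:
            return torch.float32
        return wanted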
Raises ------ diff --git a/heat/core/arithmetics.py b/heat/core/arithmetics.py index d91c6e6d1b..1318e8a8ba 100644 --- a/heat/core/arithmetics.py +++ b/heat/core/arithmetics.py @@ -231,11 +231,11 @@ def bitwise_and( DNDarray(1, dtype=ht.int64, device=cpu:0, split=None) >>> ht.bitwise_and(14, 13) DNDarray(12, dtype=ht.int64, device=cpu:0, split=None) - >>> ht.bitwise_and(ht.array([14,3]), 13) + >>> ht.bitwise_and(ht.array([14, 3]), 13) DNDarray([12, 1], dtype=ht.int64, device=cpu:0, split=None) - >>> ht.bitwise_and(ht.array([11,7]), ht.array([4,25])) + >>> ht.bitwise_and(ht.array([11, 7]), ht.array([4, 25])) DNDarray([0, 1], dtype=ht.int64, device=cpu:0, split=None) - >>> ht.bitwise_and(ht.array([2,5,255]), ht.array([3,14,16])) + >>> ht.bitwise_and(ht.array([2, 5, 255]), ht.array([3, 14, 16])) DNDarray([ 2, 4, 16], dtype=ht.int64, device=cpu:0, split=None) >>> ht.bitwise_and(ht.array([True, True]), ht.array([False, True])) DNDarray([False, True], dtype=ht.bool, device=cpu:0, split=None) @@ -304,15 +304,15 @@ def bitwise_and_(t1: DNDarray, t2: Union[DNDarray, float]) -> DNDarray: DNDarray(16, dtype=ht.int64, device=cpu:0, split=None) >>> T2 DNDarray(16, dtype=ht.int64, device=cpu:0, split=None) - >>> T4 = ht.array([14,3]) + >>> T4 = ht.array([14, 3]) >>> s = 29 >>> T4 &= s >>> T4 DNDarray([12, 1], dtype=ht.int64, device=cpu:0, split=None) >>> s 29 - >>> T5 = ht.array([2,5,255]) - >>> T6 = ht.array([3,14,16]) + >>> T5 = ht.array([2, 5, 255]) + >>> T6 = ht.array([3, 14, 16]) >>> T5 &= T6 >>> T5 DNDarray([ 2, 4, 16], dtype=ht.int64, device=cpu:0, split=None) @@ -457,7 +457,7 @@ def bitwise_or_(t1: DNDarray, t2: Union[DNDarray, float]) -> DNDarray: DNDarray([33, 5], dtype=ht.int64, device=cpu:0, split=None) >>> s 1 - >>> T4 = ht.array([2,5,255]) + >>> T4 = ht.array([2, 5, 255]) >>> T5 = ht.array([4, 4, 4]) >>> T4 |= T5 >>> T4 @@ -524,9 +524,9 @@ def bitwise_xor( DNDarray(28, dtype=ht.int64, device=cpu:0, split=None) >>> ht.bitwise_xor(31, 5) DNDarray(26, dtype=ht.int64, device=cpu:0, split=None) - >>> ht.bitwise_xor(ht.array([31,3]), 5) + >>> ht.bitwise_xor(ht.array([31, 3]), 5) DNDarray([26, 6], dtype=ht.int64, device=cpu:0, split=None) - >>> ht.bitwise_xor(ht.array([31,3]), ht.array([5,6])) + >>> ht.bitwise_xor(ht.array([31, 3]), ht.array([5, 6])) DNDarray([26, 5], dtype=ht.int64, device=cpu:0, split=None) >>> ht.bitwise_xor(ht.array([True, True]), ht.array([False, True])) DNDarray([ True, False], dtype=ht.bool, device=cpu:0, split=None) @@ -598,7 +598,7 @@ def bitwise_xor_(t1: DNDarray, t2: Union[DNDarray, float]) -> DNDarray: DNDarray([26, 6], dtype=ht.int64, device=cpu:0, split=None) >>> s 5 - >>> T4 = ht.array([31,3,255]) + >>> T4 = ht.array([31, 3, 255]) >>> T5 = ht.array([5, 6, 4]) >>> T4 ^= T5 >>> T4 @@ -661,7 +661,7 @@ def copysign( -------- >>> ht.copysign(ht.array([3, 2, -8, -2, 4]), 1) DNDarray([3, 2, 8, 2, 4], dtype=ht.int64, device=cpu:0, split=None) - >>> ht.copysign(ht.array([3., 2., -8., -2., 4.]), ht.array([1., -1., 1., -1., 1.])) + >>> ht.copysign(ht.array([3.0, 2.0, -8.0, -2.0, 4.0]), ht.array([1.0, -1.0, 1.0, -1.0, 1.0])) DNDarray([ 3., -2., 8., -2., 4.], dtype=ht.float32, device=cpu:0, split=None) """ try: @@ -702,7 +702,7 @@ def copysign_(t1: DNDarray, t2: Union[DNDarray, float]) -> DNDarray: Examples -------- >>> import heat as ht - >>> T1 = ht.array([3., 2., -8., -2., 4.]) + >>> T1 = ht.array([3.0, 2.0, -8.0, -2.0, 4.0]) >>> s = 2.0 >>> T1.copysign_(s) DNDarray([3., 2., 8., 2., 4.], dtype=ht.float32, device=cpu:0, split=None) @@ -710,8 +710,8 @@ def copysign_(t1: 
DNDarray, t2: Union[DNDarray, float]) -> DNDarray: DNDarray([3., 2., 8., 2., 4.], dtype=ht.float32, device=cpu:0, split=None) >>> s 2.0 - >>> T2 = ht.array([[1., -1.],[1., -1.]]) - >>> T3 = ht.array([-5., 2.]) + >>> T2 = ht.array([[1.0, -1.0], [1.0, -1.0]]) + >>> T3 = ht.array([-5.0, 2.0]) >>> T2.copysign_(T3) DNDarray([[-1., 1.], [-1., 1.]], dtype=ht.float32, device=cpu:0, split=None) @@ -767,7 +767,7 @@ def cumprod(a: DNDarray, axis: int, dtype: datatype = None, out=None) -> DNDarra Examples -------- - >>> a = ht.full((3,3), 2) + >>> a = ht.full((3, 3), 2) >>> ht.cumprod(a, 0) DNDarray([[2., 2., 2.], [4., 4., 4.], @@ -796,7 +796,7 @@ def cumprod_(t: DNDarray, axis: int) -> DNDarray: Examples -------- >>> import heat as ht - >>> T = ht.full((3,3), 2) + >>> T = ht.full((3, 3), 2) >>> T.cumprod_(0) DNDarray([[2., 2., 2.], [4., 4., 4.], @@ -821,6 +821,14 @@ def wrap_cumprod_(a: torch.Tensor, b: int, out=None, dtype=None) -> torch.Tensor def wrap_mul_(a: torch.Tensor, b: torch.Tensor, out=None) -> torch.Tensor: return a.mul_(b) + axis = stride_tricks.sanitize_axis(t.shape, axis) + if axis is None: + raise NotImplementedError("cumprod_ is not implemented for axis=None") + + if not t.is_distributed(): + t.larray.cumprod_(dim=axis) + return t + return _operations.__cum_op(t, wrap_cumprod_, MPI.PROD, wrap_mul_, 1, axis, dtype=None, out=t) @@ -850,7 +858,7 @@ def cumsum(a: DNDarray, axis: int, dtype: datatype = None, out=None) -> DNDarray Examples -------- - >>> a = ht.ones((3,3)) + >>> a = ht.ones((3, 3)) >>> ht.cumsum(a, 0) DNDarray([[1., 1., 1.], [2., 2., 2.], @@ -874,7 +882,7 @@ def cumsum_(t: DNDarray, axis: int) -> DNDarray: Examples -------- >>> import heat as ht - >>> T = ht.ones((3,3)) + >>> T = ht.ones((3, 3)) >>> T.cumsum_(0) DNDarray([[1., 1., 1.], [2., 2., 2.], @@ -891,6 +899,14 @@ def wrap_cumsum_(a: torch.Tensor, b: int, out=None, dtype=None) -> torch.Tensor: def wrap_add_(a: torch.Tensor, b: torch.Tensor, out=None) -> torch.Tensor: return a.add_(b) + axis = stride_tricks.sanitize_axis(t.shape, axis) + if axis is None: + raise NotImplementedError("cumsum_ is not implemented for axis=None") + + if not t.is_distributed(): + t.larray.cumsum_(dim=axis) + return t + return _operations.__cum_op(t, wrap_cumsum_, MPI.SUM, wrap_add_, 0, axis, dtype=None, out=t) @@ -913,7 +929,7 @@ def diff( output array is balanced. Parameters - ------- + ---------- a : DNDarray Input array n : int, optional @@ -1622,8 +1638,8 @@ def wrap_gcd_(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor: def hypot( - a: DNDarray, - b: DNDarray, + t1: DNDarray, + t2: DNDarray, /, out: Optional[DNDarray] = None, *, @@ -1635,9 +1651,9 @@ def hypot( Parameters ---------- - a: DNDarray + t1: DNDarray The first input array - b: DNDarray + t2: DNDarray the second input array out: DNDarray, optional The output array. It must have a shape that the inputs broadcast to and matching split axis. @@ -1651,17 +1667,27 @@ def hypot( Examples -------- - >>> a = ht.array([2.]) - >>> b = ht.array([1.,3.,3.]) - >>> ht.hypot(a,b) + >>> a = ht.array([2.0]) + >>> b = ht.array([1.0, 3.0, 3.0]) + >>> ht.hypot(a, b) DNDarray([2.2361, 3.6056, 3.6056], dtype=ht.float32, device=cpu:0, split=None) """ + # catch int64 operation crash on MPS. 
TODO: issue still persists in 2.3.0, check 2.4, report to PyTorch + t1_ismps = getattr(getattr(t1, "device", "cpu"), "torch_device", "cpu").startswith("mps") + t2_ismps = getattr(getattr(t2, "device", "cpu"), "torch_device", "cpu").startswith("mps") + if t1_ismps or t2_ismps: + t1_isint64 = getattr(t1, "dtype", None) == types.int64 + t2_isint64 = getattr(t2, "dtype", None) == types.int64 + if t1_isint64 or t2_isint64: + raise TypeError( + f"hypot on MPS does not support int64 dtype, got {t1.dtype}, {t2.dtype}" + ) + try: - res = _operations.__binary_op(torch.hypot, a, b, out, where) + res = _operations.__binary_op(torch.hypot, t1, t2, out, where) except RuntimeError: # every other possibility is caught by __binary_op - raise TypeError(f"Not implemented for array dtype, got {a.dtype}, {b.dtype}") - + raise TypeError(f"hypot on CPU does not support Int dtype, got {t1.dtype}, {t2.dtype}") return res @@ -1691,8 +1717,8 @@ def hypot_(t1: DNDarray, t2: DNDarray) -> DNDarray: Examples -------- >>> import heat as ht - >>> T1 = ht.array([1.,3.,3.]) - >>> T2 = ht.array(2.) + >>> T1 = ht.array([1.0, 3.0, 3.0]) + >>> T2 = ht.array(2.0) >>> T1.hypot_(T2) DNDarray([2.2361, 3.6056, 3.6056], dtype=ht.float32, device=cpu:0, split=None) >>> T1 @@ -1704,6 +1730,17 @@ def hypot_(t1: DNDarray, t2: DNDarray) -> DNDarray: def wrap_hypot_(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor: return a.hypot_(b) + # catch int64 operation crash on MPS + t1_ismps = getattr(getattr(t1, "device", "cpu"), "torch_device", "cpu").startswith("mps") + t2_ismps = getattr(getattr(t2, "device", "cpu"), "torch_device", "cpu").startswith("mps") + if t1_ismps or t2_ismps: + t1_isint64 = getattr(t1, "dtype", None) == types.int64 + t2_isint64 = getattr(t2, "dtype", None) == types.int64 + if t1_isint64 or t2_isint64: + raise TypeError( + f"hypot_ on MPS does not support int64 dtype, got {t1.dtype}, {t2.dtype}" + ) + try: return _operations.__binary_op(wrap_hypot_, t1, t2, out=t1) except NotImplementedError: @@ -1711,7 +1748,7 @@ def wrap_hypot_(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor: f"In-place operation not allowed: operands are distributed along different axes. \n Operand 1 with shape {t1.shape} is split along axis {t1.split}. \n Operand 2 with shape {t2.shape} is split along axis {t2.split}." ) except RuntimeError: - raise TypeError(f"Not implemented for array dtype, got {t1.dtype}, {t2.dtype}") + raise TypeError(f"hypot on CPU does not support Int dtype, got {t1.dtype}, {t2.dtype}") DNDarray.hypot_ = hypot_ @@ -1724,7 +1761,7 @@ def invert(a: DNDarray, /, out: Optional[DNDarray] = None) -> DNDarray: Bitwise_not is an alias for invert. Parameters - --------- + ---------- a: DNDarray The input array to invert. 
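Returning to the hypot changes shown above: float inputs behave as before, integer inputs raise TypeError on CPU, and int64 inputs on MPS are now rejected up front. Illustrative usage, mirroring the doctest in the hunk:

    import heat as ht

    a = ht.array([2.0])
    b = ht.array([1.0, 3.0, 3.0])
    print(ht.hypot(a, b))  # DNDarray([2.2361, 3.6056, 3.6056], ...)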
Must be of integral or Boolean types out : DNDarray, optional @@ -1834,12 +1871,12 @@ def lcm( -------- >>> a = ht.array([6, 12, 15]) >>> b = ht.array([3, 4, 5]) - >>> ht.lcm(a,b) + >>> ht.lcm(a, b) DNDarray([ 6, 12, 15], dtype=ht.int64, device=cpu:0, split=None) >>> s = 2 - >>> ht.lcm(s,a) + >>> ht.lcm(s, a) DNDarray([ 6, 12, 30], dtype=ht.int64, device=cpu:0, split=None) - >>> ht.lcm(b,s) + >>> ht.lcm(b, s) DNDarray([ 6, 4, 10], dtype=ht.int64, device=cpu:0, split=None) """ try: @@ -1943,7 +1980,7 @@ def left_shift( Examples -------- - >>> ht.left_shift(ht.array([1,2,3]), 1) + >>> ht.left_shift(ht.array([1, 2, 3]), 1) DNDarray([2, 4, 6], dtype=ht.int64, device=cpu:0, split=None) """ dtypes = (heat_type_of(t1), heat_type_of(t2)) @@ -2000,7 +2037,7 @@ def left_shift_(t1: DNDarray, t2: Union[DNDarray, float]) -> DNDarray: Examples -------- >>> import heat as ht - >>> T1 = ht.array([1,2,3]) + >>> T1 = ht.array([1, 2, 3]) >>> s = 1 >>> T1.left_shift_(s) DNDarray([2, 4, 6], dtype=ht.int64, device=cpu:0, split=None) @@ -2208,7 +2245,7 @@ def nan_to_num( Examples -------- - >>> x = ht.array([float('nan'), float('inf'), -float('inf')]) + >>> x = ht.array([float("nan"), float("inf"), -float("inf")]) >>> ht.nan_to_num(x) DNDarray([ 0.0000e+00, 3.4028e+38, -3.4028e+38], dtype=ht.float32, device=cpu:0, split=None) """ @@ -2245,7 +2282,7 @@ def nan_to_num_( Examples -------- >>> import heat as ht - >>> T1 = ht.array([float('nan'), float('inf'), -float('inf')]) + >>> T1 = ht.array([float("nan"), float("inf"), -float("inf")]) >>> T1.nan_to_num_() DNDarray([ 0.0000e+00, 3.4028e+38, -3.4028e+38], dtype=ht.float32, device=cpu:0, split=None) >>> T1 @@ -2298,7 +2335,7 @@ def nanprod( Examples -------- - >>> ht.nanprod(ht.array([4.,ht.nan])) + >>> ht.nanprod(ht.array([4.0, ht.nan])) DNDarray(4., dtype=ht.float32, device=cpu:0, split=None) >>> ht.nanprod(ht.array([ [1.,ht.nan], @@ -2349,11 +2386,11 @@ def nansum( -------- >>> ht.sum(ht.ones(2)) DNDarray(2., dtype=ht.float32, device=cpu:0, split=None) - >>> ht.sum(ht.ones((3,3))) + >>> ht.sum(ht.ones((3, 3))) DNDarray(9., dtype=ht.float32, device=cpu:0, split=None) - >>> ht.sum(ht.ones((3,3)).astype(ht.int)) + >>> ht.sum(ht.ones((3, 3)).astype(ht.int)) DNDarray(9, dtype=ht.int64, device=cpu:0, split=None) - >>> ht.sum(ht.ones((3,2,1)), axis=-3) + >>> ht.sum(ht.ones((3, 2, 1)), axis=-3) DNDarray([[3.], [3.]], dtype=ht.float32, device=cpu:0, split=None) """ @@ -2377,7 +2414,7 @@ def neg(a: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: -------- >>> ht.neg(ht.array([-1, 1])) DNDarray([ 1, -1], dtype=ht.int64, device=cpu:0, split=None) - >>> -ht.array([-1., 1.]) + >>> -ht.array([-1.0, 1.0]) DNDarray([ 1., -1.], dtype=ht.float32, device=cpu:0, split=None) """ sanitation.sanitize_in(a) @@ -2411,7 +2448,7 @@ def neg_(t: DNDarray) -> DNDarray: DNDarray([ 1, -1], dtype=ht.int64, device=cpu:0, split=None) >>> T1 DNDarray([ 1, -1], dtype=ht.int64, device=cpu:0, split=None) - >>> T2 = ht.array([[-1., 2.5], [4. 
, 0.]]) + >>> T2 = ht.array([[-1.0, 2.5], [4.0, 0.0]]) >>> T2.neg_() DNDarray([[ 1.0000, -2.5000], [-4.0000, -0.0000]], dtype=ht.float32, device=cpu:0, split=None) @@ -2449,7 +2486,7 @@ def pos(a: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: -------- >>> ht.pos(ht.array([-1, 1])) DNDarray([-1, 1], dtype=ht.int64, device=cpu:0, split=None) - >>> +ht.array([-1., 1.]) + >>> +ht.array([-1.0, 1.0]) DNDarray([-1., 1.], dtype=ht.float32, device=cpu:0, split=None) """ sanitation.sanitize_in(a) @@ -2502,7 +2539,7 @@ def pow( Examples -------- - >>> ht.pow (3.0, 2.0) + >>> ht.pow(3.0, 2.0) DNDarray(9., dtype=ht.float32, device=cpu:0, split=None) >>> T1 = ht.float32([[1, 2], [3, 4]]) >>> T2 = ht.float32([[3, 3], [2, 2]]) @@ -2671,7 +2708,7 @@ def prod( Examples -------- - >>> ht.prod(ht.array([1.,2.])) + >>> ht.prod(ht.array([1.0, 2.0])) DNDarray(2., dtype=ht.float32, device=cpu:0, split=None) >>> ht.prod(ht.array([ [1.,2.], @@ -2857,7 +2894,7 @@ def right_shift( Examples -------- - >>> ht.right_shift(ht.array([1,2,3]), 1) + >>> ht.right_shift(ht.array([1, 2, 3]), 1) DNDarray([0, 1, 1], dtype=ht.int64, device=cpu:0, split=None) """ dtypes = (heat_type_of(t1), heat_type_of(t2)) @@ -2914,7 +2951,7 @@ def right_shift_(t1: DNDarray, t2: Union[DNDarray, float]) -> DNDarray: Examples -------- >>> import heat as ht - >>> T1 = ht.array([1,2,32]) + >>> T1 = ht.array([1, 2, 32]) >>> s = 1 >>> T1.right_shift_(s) DNDarray([ 0, 1, 16], dtype=ht.int64, device=cpu:0, split=None) @@ -3124,11 +3161,11 @@ def sum( -------- >>> ht.sum(ht.ones(2)) DNDarray(2., dtype=ht.float32, device=cpu:0, split=None) - >>> ht.sum(ht.ones((3,3))) + >>> ht.sum(ht.ones((3, 3))) DNDarray(9., dtype=ht.float32, device=cpu:0, split=None) - >>> ht.sum(ht.ones((3,3)).astype(ht.int)) + >>> ht.sum(ht.ones((3, 3)).astype(ht.int)) DNDarray(9, dtype=ht.int64, device=cpu:0, split=None) - >>> ht.sum(ht.ones((3,2,1)), axis=-3) + >>> ht.sum(ht.ones((3, 2, 1)), axis=-3) DNDarray([[3.], [3.]], dtype=ht.float32, device=cpu:0, split=None) """ diff --git a/heat/core/base.py b/heat/core/base.py index 9c1233ce4b..66c2000cbe 100644 --- a/heat/core/base.py +++ b/heat/core/base.py @@ -173,10 +173,10 @@ def transform(self, x: DNDarray) -> DNDarray: """ Transforms the input data. - Parameters - ---------- - x : DNDarray - Values to transform. Shape = (n_samples, n_features) + Parameters + ---------- + x : DNDarray + Values to transform. 
Shape = (n_samples, n_features)
         """
         raise NotImplementedError()
diff --git a/heat/core/communication.py b/heat/core/communication.py
index 6443d31b01..eb3443bc10 100644
--- a/heat/core/communication.py
+++ b/heat/core/communication.py
@@ -5,8 +5,6 @@
 from __future__ import annotations

 import numpy as np
-import os
-import subprocess
 import math
 import ctypes
 import torch
@@ -116,6 +114,8 @@ class MPICommunication(Communication):
         Handle for the mpi4py Communicator
     """

+    COUNT_LIMIT = torch.iinfo(torch.int32).max
+
     __mpi_type_mappings = {
         torch.bool: MPI.BOOL,
         torch.uint8: MPI.UNSIGNED_CHAR,
@@ -134,8 +134,8 @@ class MPICommunication(Communication):
     def __init__(self, handle=MPI.COMM_WORLD):
         self.handle = handle
         try:
-            self.rank = handle.Get_rank()
-            self.size = handle.Get_size()
+            self.rank: Optional[int] = handle.Get_rank()
+            self.size: Optional[int] = handle.Get_size()
         except MPI.Exception:
             # ranks not within the group will fail with an MPI.Exception, this is expected
             self.rank = None
@@ -281,7 +281,33 @@ def mpi_type_and_elements_of(
         if is_contiguous:
             if counts is None:
-                return mpi_type, elements
+                if elements > cls.COUNT_LIMIT:
+                    # Uses vector type to get around the MAX_INT limit on certain MPI implementations
+                    # This is at the moment only applied when sending contiguous data, as the construction of data types to get around non-contiguous data naturally alleviates the problem to a certain extent.
+                    # Thanks to: J. R. Hammond, A. Schäfer and R. Latham, "To INT_MAX... and Beyond! Exploring Large-Count Support in MPI," 2014 Workshop on Exascale MPI at Supercomputing Conference, New Orleans, LA, USA, 2014, pp. 1-8, doi: 10.1109/ExaMPI.2014.5.
+
+                    new_count = elements // cls.COUNT_LIMIT
+                    left_over = elements % cls.COUNT_LIMIT
+
+                    if new_count > cls.COUNT_LIMIT:
+                        raise ValueError("Tensor is too large")
+                    vector_type = mpi_type.Create_vector(
+                        new_count, cls.COUNT_LIMIT, cls.COUNT_LIMIT
+                    )
+                    if left_over > 0:
+                        left_over_mpi_type = mpi_type.Create_contiguous(left_over).Commit()
+                        _, old_type_extent = mpi_type.Get_extent()
+                        disp = cls.COUNT_LIMIT * new_count * old_type_extent
+                        struct_type = mpi_type.Create_struct(
+                            [1, 1], [0, disp], [vector_type, left_over_mpi_type]
+                        ).Commit()
+                        vector_type.Free()
+                        left_over_mpi_type.Free()
+                        return struct_type, 1
+                    else:
+                        return vector_type, 1
+                else:
+                    return mpi_type, elements
             factor = np.prod(obj.shape[1:], dtype=np.int32)
             return (
                 mpi_type,
@@ -310,7 +336,7 @@ def mpi_type_and_elements_of(
         return mpi_type, elements

     @classmethod
-    def as_mpi_memory(cls, obj) -> MPI.memory:
+    def as_mpi_memory(cls, obj: torch.Tensor) -> MPI.memory:
         """
         Converts the passed ``torch.Tensor`` into an MPI compatible memory view.
@@ -320,7 +346,8 @@ def as_mpi_memory(cls, obj) -> MPI.memory:
             The tensor to be converted into a MPI memory view.
         """
         # TODO: MPI.memory might be deprecated in future versions of mpi4py. The following code might need to be adapted and use MPI.buffer instead.
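The count arithmetic behind the COUNT_LIMIT workaround above is easy to check in isolation (a sketch with made-up numbers; COUNT_LIMIT mirrors torch.iinfo(torch.int32).max):

    COUNT_LIMIT = 2**31 - 1          # INT_MAX, the classic MPI count limit
    elements = 5_000_000_000         # example: more items than INT_MAX

    new_count = elements // COUNT_LIMIT  # 2 full vector blocks
    left_over = elements % COUNT_LIMIT   # 705032706 trailing items
    assert new_count * COUNT_LIMIT + left_over == elements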
- return MPI.memory.fromaddress(obj.data_ptr(), 0) + nbytes = obj.dtype.itemsize * obj.numel() + return MPI.memory.fromaddress(obj.data_ptr(), nbytes) @classmethod def as_buffer( @@ -494,7 +521,7 @@ def Irecv( Nonblocking receive Parameters - ------------ + ---------- buf: Union[DNDarray, torch.Tensor, Any] Buffer address where to place the received message source: int, optional @@ -523,7 +550,7 @@ def Recv( Blocking receive Parameters - ------------ + ---------- buf: Union[DNDarray, torch.Tensor, Any] Buffer address where to place the received message source: int, optional @@ -554,7 +581,7 @@ def __send_like( Generic function for sending a message to process with rank "dest" Parameters - ------------ + ---------- func: Callable The respective MPI sending function buf: Union[DNDarray, torch.Tensor, Any] @@ -578,7 +605,7 @@ def Bsend(self, buf: Union[DNDarray, torch.Tensor, Any], dest: int, tag: int = 0 Blocking buffered send Parameters - ------------ + ---------- buf: Union[DNDarray, torch.Tensor, Any] Buffer address of the message to be send dest: int, optional @@ -597,7 +624,7 @@ def Ibsend( Nonblocking buffered send Parameters - ------------ + ---------- buf: Union[DNDarray, torch.Tensor, Any] Buffer address of the message to be send dest: int, optional @@ -616,7 +643,7 @@ def Irsend( Nonblocking ready send Parameters - ------------ + ---------- buf: Union[DNDarray, torch.Tensor, Any] Buffer address of the message to be send dest: int, optional @@ -633,7 +660,7 @@ def Isend(self, buf: Union[DNDarray, torch.Tensor, Any], dest: int, tag: int = 0 Nonblocking send Parameters - ------------ + ---------- buf: Union[DNDarray, torch.Tensor, Any] Buffer address of the message to be send dest: int, optional @@ -652,7 +679,7 @@ def Issend( Nonblocking synchronous send Parameters - ------------ + ---------- buf: Union[DNDarray, torch.Tensor, Any] Buffer address of the message to be send dest: int, optional @@ -669,7 +696,7 @@ def Rsend(self, buf: Union[DNDarray, torch.Tensor, Any], dest: int, tag: int = 0 Blocking ready send Parameters - ------------ + ---------- buf: Union[DNDarray, torch.Tensor, Any] Buffer address of the message to be send dest: int, optional @@ -686,7 +713,7 @@ def Ssend(self, buf: Union[DNDarray, torch.Tensor, Any], dest: int, tag: int = 0 Blocking synchronous send Parameters - ------------ + ---------- buf: Union[DNDarray, torch.Tensor, Any] Buffer address of the message to be send dest: int, optional @@ -703,7 +730,7 @@ def Send(self, buf: Union[DNDarray, torch.Tensor, Any], dest: int, tag: int = 0) Blocking send Parameters - ------------ + ---------- buf: Union[DNDarray, torch.Tensor, Any] Buffer address of the message to be send dest: int, optional @@ -723,7 +750,7 @@ def __broadcast_like( communicator Parameters - ------------ + ---------- func: Callable The respective MPI broadcast function buf: Union[DNDarray, torch.Tensor, Any] @@ -747,7 +774,7 @@ def Bcast(self, buf: Union[DNDarray, torch.Tensor, Any], root: int = 0) -> None: Blocking Broadcast Parameters - ------------ + ---------- buf: Union[DNDarray, torch.Tensor, Any] Buffer address of the message to be broadcasted root: int @@ -765,7 +792,7 @@ def Ibcast(self, buf: Union[DNDarray, torch.Tensor, Any], root: int = 0) -> MPIR Nonblocking Broadcast Parameters - ------------ + ---------- buf: Union[DNDarray, torch.Tensor, Any] Buffer address of the message to be broadcasted root: int @@ -775,25 +802,91 @@ def Ibcast(self, buf: Union[DNDarray, torch.Tensor, Any], root: int = 0) -> MPIR Ibcast.__doc__ = 
MPI.Comm.Ibcast.__doc__

+    def __derived_op(
+        self, tensor: torch.Tensor, datatype: MPI.Datatype, operation: MPI.Op
+    ) -> Callable[[MPI.memory, MPI.memory, MPI.Datatype], None]:
+        # Based on this mpi4py discussion: https://groups.google.com/g/mpi4py/c/UkDT_9pp4V4?pli=1
+        shape = tensor.shape
+        dtype = tensor.dtype
+        stride = tensor.stride()
+        offset = tensor.storage_offset()
+        count = tensor.numel()
+
+        mpiOp2torch = {
+            MPI.SUM.handle: torch.add,
+            MPI.PROD.handle: torch.mul,
+            MPI.MIN.handle: torch.min,
+            MPI.MAX.handle: torch.max,
+            MPI.LAND.handle: torch.logical_and,
+            MPI.LOR.handle: torch.logical_or,
+            MPI.LXOR.handle: torch.logical_xor,
+            MPI.BAND.handle: torch.bitwise_and,
+            MPI.BOR.handle: torch.bitwise_or,
+            MPI.BXOR.handle: torch.bitwise_xor,
+            # MPI.MINLOC.handle: torch.argmin, Not supported, seems to be an invalid inplace operation
+            # MPI.MAXLOC.handle: torch.argmax
+        }
+        mpiDtype2Ctype = {
+            torch.bool: ctypes.c_bool,
+            torch.uint8: ctypes.c_uint8,
+            torch.uint16: ctypes.c_uint16,
+            torch.uint32: ctypes.c_uint32,
+            torch.uint64: ctypes.c_uint64,
+            torch.int8: ctypes.c_int8,
+            torch.int16: ctypes.c_int16,
+            torch.int32: ctypes.c_int32,
+            torch.int64: ctypes.c_int64,
+            torch.float32: ctypes.c_float,
+            torch.float64: ctypes.c_double,
+            torch.complex64: ctypes.c_double,
+            torch.complex128: ctypes.c_longdouble,
+        }
+        ctype_size = mpiDtype2Ctype[dtype]
+        torch_op = mpiOp2torch[operation.handle]
+
+        def op(sendbuf: MPI.memory, recvbuf: MPI.memory, datatype):
+            send_arr = (ctype_size * (count + offset)).from_address(sendbuf.address)
+            recv_arr = (ctype_size * (count + offset)).from_address(recvbuf.address)
+
+            send_tensor = torch.as_strided(
+                torch.frombuffer(send_arr, dtype=dtype, count=count, offset=offset), shape, stride
+            )
+            recv_tensor = torch.as_strided(
+                torch.frombuffer(recv_arr, dtype=dtype, count=count, offset=offset), shape, stride
+            )
+            torch_op(send_tensor, recv_tensor, out=recv_tensor)
+
+        op = MPI.Op.Create(op)
+
+        return op
+
     def __reduce_like(
         self,
         func: Callable,
         sendbuf: Union[DNDarray, torch.Tensor, Any],
         recvbuf: Union[DNDarray, torch.Tensor, Any],
-        *args,
-        **kwargs,
+        op: MPI.Op,
+        *args: Any,
+        **kwargs: Any,
     ) -> Tuple[Optional[DNDarray, torch.Tensor]]:
         """
         Generic function for reduction operations.

         Parameters
-        ------------
+        ----------
         func: Callable
             The respective MPI reduction operation
         sendbuf: Union[DNDarray, torch.Tensor, Any]
             Buffer address of the send message
         recvbuf: Union[DNDarray, torch.Tensor, Any]
             Buffer address where to store the result of the reduction
+        op: MPI.Op
+            Operation to apply during the reduction.
+        *args: Any
+            Additional positional arguments to be passed to the function
+        **kwargs: Any
+            Additional keyword arguments to be passed to the function
+
         """
         sbuf = None
         rbuf = None
@@ -808,56 +901,59 @@ def __reduce_like(
         # harmonize the input and output buffers
         # MPI requires send and receive buffers to be of same type and length. If the torch tensors are either not both
         # contiguous or differently strided, they have to be made matching (if possible) first.
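__derived_op builds on mpi4py's user-defined reduction operations. A minimal standalone sketch of that mechanism (simplified: NumPy buffers instead of strided torch tensors, and not the method above):

    from mpi4py import MPI
    import numpy as np

    def elementwise_sum(inbuf, outbuf, datatype):
        # reinterpret both raw buffers as float32 arrays and accumulate in place
        a = np.frombuffer(inbuf, dtype=np.float32)
        b = np.frombuffer(outbuf, dtype=np.float32)
        b += a

    op = MPI.Op.Create(elementwise_sum, commute=True)
    comm = MPI.COMM_WORLD
    x = np.ones(4, dtype=np.float32)
    y = np.empty(4, dtype=np.float32)
    comm.Allreduce(x, y, op=op)  # y == comm.size on every rank
    op.Free()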
-        if isinstance(sendbuf, torch.Tensor):
-            # convert the send buffer to a pointer, number of elements and type are identical to the receive buffer
-            dummy = (
-                sendbuf.contiguous()
-            )  # make a contiguous copy and reassign the storage, old will be collected
-            # In PyTorch Version >= 2.0.0 we can use untyped_storage() instead of storage
-            # to keep backward compatibility with earlier PyTorch versions (where no untyped_storage() exists) we use a try/except
-            # (this applies to all places of Heat where untyped_storage() is used without further comment)
-            try:
-                sendbuf.set_(
-                    dummy.untyped_storage(),
-                    dummy.storage_offset(),
-                    size=dummy.shape,
-                    stride=dummy.stride(),
-                )
-            except AttributeError:
-                sendbuf.set_(
-                    dummy.storage(),
-                    dummy.storage_offset(),
-                    size=dummy.shape,
-                    stride=dummy.stride(),
-                )
-            sbuf = sendbuf if CUDA_AWARE_MPI else sendbuf.cpu()
-            sendbuf = self.as_buffer(sbuf)
+        if sendbuf is not MPI.IN_PLACE:
+            # Send and recv buffer need the same number of elements.
+            if sendbuf.numel() != recvbuf.numel():
+                raise ValueError("Send and recv buffers need the same number of elements.")
+
+            # Stride and offset should be the same to create the same datatype and operation. If they differ, they should be made contiguous (at the expense of memory)
+            if (
+                sendbuf.stride() != recvbuf.stride()
+                or sendbuf.storage_offset() != recvbuf.storage_offset()
+            ):
+                if not sendbuf.is_contiguous():
+                    tmp = sendbuf.contiguous()
+                    try:
+                        sendbuf.set_(
+                            tmp.untyped_storage(),
+                            tmp.storage_offset(),
+                            size=tmp.shape,
+                            stride=tmp.stride(),
+                        )
+                    except AttributeError:
+                        sendbuf.set_(
+                            tmp.storage(), tmp.storage_offset(), size=tmp.shape, stride=tmp.stride()
+                        )
+                if not recvbuf.is_contiguous():
+                    tmp = recvbuf.contiguous()
+                    try:
+                        recvbuf.set_(
+                            tmp.untyped_storage(),
+                            tmp.storage_offset(),
+                            size=tmp.shape,
+                            stride=tmp.stride(),
+                        )
+                    except AttributeError:
+                        recvbuf.set_(
+                            tmp.storage(), tmp.storage_offset(), size=tmp.shape, stride=tmp.stride()
+                        )
+
         if isinstance(recvbuf, torch.Tensor):
+            # Datatype and count shall be derived from the recv buffer, and applied to both, as they should match after the last code block
             buf = recvbuf
-            # nothing matches, the buffers have to be made contiguous
-            dummy = recvbuf.contiguous()
-            try:
-                recvbuf.set_(
-                    dummy.untyped_storage(),
-                    dummy.storage_offset(),
-                    size=dummy.shape,
-                    stride=dummy.stride(),
-                )
-            except AttributeError:
-                recvbuf.set_(
-                    dummy.storage(),
-                    dummy.storage_offset(),
-                    size=dummy.shape,
-                    stride=dummy.stride(),
-                )
             rbuf = recvbuf if CUDA_AWARE_MPI else recvbuf.cpu()
-            if sendbuf is MPI.IN_PLACE:
-                recvbuf = self.as_buffer(rbuf)
-            else:
-                recvbuf = (self.as_mpi_memory(rbuf), sendbuf[1], sendbuf[2])
+            recvbuf: Tuple[MPI.memory, int, MPI.Datatype] = self.as_buffer(rbuf, is_contiguous=True)
+            if not recvbuf[2].is_predefined:
+                # If using a derived datatype, we need to define the reduce operation to be able to handle it.
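The harmonization rule in the hunk above — equalize stride and storage offset so that a single MPI datatype describes both buffers — in a standalone sketch (simplified, torch only):

    import torch

    send = torch.arange(6).reshape(2, 3).t()  # non-contiguous transposed view
    recv = torch.empty(3, 2)                  # contiguous target buffer

    if send.stride() != recv.stride() or send.storage_offset() != recv.storage_offset():
        send = send.contiguous()              # copy so the layouts match

    assert send.stride() == recv.stride()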
+ derived_op = self.__derived_op(rbuf, recvbuf[2], op) + op = derived_op + + if isinstance(sendbuf, torch.Tensor): + sbuf = sendbuf if CUDA_AWARE_MPI else sendbuf.cpu() + sendbuf = (self.as_mpi_memory(sbuf), recvbuf[1], recvbuf[2]) # perform the actual reduction operation - return func(sendbuf, recvbuf, *args, **kwargs), sbuf, rbuf, buf + return func(sendbuf, recvbuf, op, *args, **kwargs), sbuf, rbuf, buf def Allreduce( self, @@ -869,7 +965,7 @@ def Allreduce( Combines values from all processes and distributes the result back to all processes Parameters - --------- + ---------- sendbuf: Union[DNDarray, torch.Tensor, Any] Buffer address of the send message recvbuf: Union[DNDarray, torch.Tensor, Any] @@ -894,7 +990,7 @@ def Exscan( Computes the exclusive scan (partial reductions) of data on a collection of processes Parameters - ------------ + ---------- sendbuf: Union[DNDarray, torch.Tensor, Any] Buffer address of the send message recvbuf: Union[DNDarray, torch.Tensor, Any] @@ -919,7 +1015,7 @@ def Iallreduce( Nonblocking allreduce reducing values on all processes to a single value Parameters - --------- + ---------- sendbuf: Union[DNDarray, torch.Tensor, Any] Buffer address of the send message recvbuf: Union[DNDarray, torch.Tensor, Any] @@ -941,7 +1037,7 @@ def Iexscan( Nonblocking Exscan Parameters - ------------ + ---------- sendbuf: Union[DNDarray, torch.Tensor, Any] Buffer address of the send message recvbuf: Union[DNDarray, torch.Tensor, Any] @@ -963,7 +1059,7 @@ def Iscan( Nonblocking Scan Parameters - ------------ + ---------- sendbuf: Union[DNDarray, torch.Tensor, Any] Buffer address of the send message recvbuf: Union[DNDarray, torch.Tensor, Any] @@ -986,7 +1082,7 @@ def Ireduce( Nonblocking reduction operation Parameters - --------- + ---------- sendbuf: Union[DNDarray, torch.Tensor, Any] Buffer address of the send message recvbuf: Union[DNDarray, torch.Tensor, Any] @@ -1011,7 +1107,7 @@ def Reduce( Reduce values from all processes to a single value on process "root" Parameters - --------- + ---------- sendbuf: Union[DNDarray, torch.Tensor, Any] Buffer address of the send message recvbuf: Union[DNDarray, torch.Tensor, Any] @@ -1038,7 +1134,7 @@ def Scan( Computes the scan (partial reductions) of data on a collection of processes in a nonblocking way Parameters - ------------ + ---------- sendbuf: Union[DNDarray, torch.Tensor, Any] Buffer address of the send message recvbuf: Union[DNDarray, torch.Tensor, Any] @@ -1074,6 +1170,8 @@ def __allgather_like( Buffer address where to store the result axis: int Concatenation axis: The axis along which ``sendbuf`` is packed and along which ``recvbuf`` puts together individual chunks + **kwargs + Extra arguments to be passed to the function. """ # dummy allocation for *v calls # ToDO: Propper implementation of usage @@ -1269,6 +1367,8 @@ def __alltoall_like( - if ``send_axis`` or ``recv_axis`` are ``None``, an error will be thrown recv_axis: int Prior split axis, along which blocks are received from the individual ranks + **kwargs + Extra arguments to be passed to the function. """ if send_axis is None: raise NotImplementedError( @@ -1484,7 +1584,6 @@ def Alltoallw( lshape, subsizes, substarts = subarray_params if np.all(np.array(subsizes) > 0): - if is_contiguous: # Commit the source subarray datatypes # Subarray parameters are calculated based on the work by Dalcin et al. 
(https://arxiv.org/abs/1804.09536) @@ -1580,7 +1679,9 @@ def _create_recursive_vectortype( >>> datatype = MPI.INT >>> tensor_stride = [1, 2, 3] >>> subarray_sizes = [4, 5, 6] - >>> recursive_vectortype = create_recursive_vectortype(datatype, tensor_stride, subarray_sizes) + >>> recursive_vectortype = create_recursive_vectortype( + ... datatype, tensor_stride, subarray_sizes + ... ) """ datatype_history = [] current_datatype = datatype @@ -1718,6 +1819,8 @@ def __gather_like( Number of elements to be scattered (vor non-v-calls) recv_factor: int Number of elements to be gathered (vor non-v-calls) + **kwargs + Extra arguments to be passed to the function. """ sbuf, rbuf, recv_axis_permutation = None, None, None @@ -1960,6 +2063,8 @@ def __scatter_like( Number of elements to be scattered (vor non-v-calls) recv_factor: int Number of elements to be gathered (vor non-v-calls) + **kwargs + Extra arguments to be passed to the function. """ sbuf, rbuf, recv_axis_permutation = None, None, None diff --git a/heat/core/complex_math.py b/heat/core/complex_math.py index b33e913075..1384140a5d 100644 --- a/heat/core/complex_math.py +++ b/heat/core/complex_math.py @@ -1,5 +1,5 @@ """ -This module handles operations focussing on complex numbers. +Complex numbers module. """ import torch @@ -30,9 +30,9 @@ def angle(x: DNDarray, deg: bool = False, out: Optional[DNDarray] = None) -> DND Examples -------- - >>> ht.angle(ht.array([1.0, 1.0j, 1+1j, -2+2j, 3 - 3j])) + >>> ht.angle(ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j])) DNDarray([ 0.0000, 1.5708, 0.7854, 2.3562, -0.7854], dtype=ht.float32, device=cpu:0, split=None) - >>> ht.angle(ht.array([1.0, 1.0j, 1+1j, -2+2j, 3 - 3j]), deg=True) + >>> ht.angle(ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j]), deg=True) DNDarray([ 0., 90., 45., 135., -45.], dtype=ht.float32, device=cpu:0, split=None) """ a = _operations.__local_op(torch.angle, x, out) @@ -56,7 +56,7 @@ def conjugate(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: Examples -------- - >>> ht.conjugate(ht.array([1.0, 1.0j, 1+1j, -2+2j, 3 - 3j])) + >>> ht.conjugate(ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j])) DNDarray([ (1-0j), -1j, (1-1j), (-2-2j), (3+3j)], dtype=ht.complex64, device=cpu:0, split=None) """ return _operations.__local_op(torch.conj, x, out) @@ -81,7 +81,7 @@ def imag(x: DNDarray) -> DNDarray: Examples -------- - >>> ht.imag(ht.array([1.0, 1.0j, 1+1j, -2+2j, 3 - 3j])) + >>> ht.imag(ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j])) DNDarray([ 0., 1., 1., 2., -3.], dtype=ht.float32, device=cpu:0, split=None) """ if types.heat_type_is_complexfloating(x.dtype): @@ -101,7 +101,7 @@ def real(x: DNDarray) -> DNDarray: Examples -------- - >>> ht.real(ht.array([1.0, 1.0j, 1+1j, -2+2j, 3 - 3j])) + >>> ht.real(ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j])) DNDarray([ 1., 0., 1., -2., 3.], dtype=ht.float32, device=cpu:0, split=None) """ if types.heat_type_is_complexfloating(x.dtype): diff --git a/heat/core/constants.py b/heat/core/constants.py index 3641178c66..80a745f598 100644 --- a/heat/core/constants.py +++ b/heat/core/constants.py @@ -1,5 +1,5 @@ """ -This module defines constants used in HeAT. +Constants module. """ import torch diff --git a/heat/core/devices.py b/heat/core/devices.py index dfb69d2224..83a2be05c6 100644 --- a/heat/core/devices.py +++ b/heat/core/devices.py @@ -16,13 +16,13 @@ class Device: """ - Implements a compute device. HeAT can run computations on different compute devices or backends. + Implements a compute device. 
Heat can run computations on different compute devices or backends. A device describes the device type and id on which said computation should be carried out. Parameters ---------- device_type : str - Represents HeAT's device name + Represents Heat's device name device_id : int The device id torch_device : str @@ -34,6 +34,8 @@ class Device: device(cpu:0) >>> ht.Device("gpu", 0, "cuda:0") device(gpu:0) + >>> ht.Device("gpu", 0, "mps:0") # on Apple M1/M2 + device(gpu:0) """ def __init__(self, device_type: str, device_id: int, torch_device: str): @@ -133,6 +135,28 @@ def __eq__(self, other: Any) -> bool: # the GPU device should be exported as global symbol __all__.append("gpu") +elif torch.backends.mps.is_built() and torch.backends.mps.is_available(): + # Apple MPS available + gpu_id = 0 + # create a new GPU device + gpu = Device("gpu", gpu_id, "mps:{}".format(gpu_id)) + """ + The standard GPU Device on Apple M1/M2 + + Examples + -------- + >>> ht.cpu + device(cpu:0) + >>> ht.ones((2, 3), device=ht.gpu) + DNDarray([[1., 1., 1.], + [1., 1., 1.]], dtype=ht.float32, device=mps:0, split=None) + """ + # add a GPU device string + __device_mapping[gpu.device_type] = gpu + __device_mapping["mps"] = gpu + # the GPU device should be exported as global symbol + __all__.append("gpu") + def get_device() -> Device: """ @@ -165,7 +189,7 @@ def sanitize_device(device: Optional[Union[str, Device]] = None) -> Device: try: return __device_mapping[device.strip().lower()] except (AttributeError, KeyError, TypeError): - raise ValueError(f'Unknown device, must be one of {", ".join(__device_mapping.keys())}') + raise ValueError(f"Unknown device, must be one of {', '.join(__device_mapping.keys())}") def use_device(device: Optional[Union[str, Device]] = None) -> None: diff --git a/heat/core/dndarray.py b/heat/core/dndarray.py index 9d9bda1037..3a295531a0 100644 --- a/heat/core/dndarray.py +++ b/heat/core/dndarray.py @@ -188,7 +188,7 @@ def ndim(self) -> int: @property def __partitioned__(self) -> dict: """ - This will return a dictionary containing information useful for working with the partitioned + Return a dictionary containing information useful for working with the partitioned data. These items include the shape of the data on each process, the starting index of the data that a process has, the datatype of the data, the local devices, as well as the global partitioning scheme. @@ -208,13 +208,16 @@ def size(self) -> int: """ Number of total elements of the ``DNDarray`` """ - return ( - torch.prod( + if self.larray.is_mps: + # MPS does not support double precision + size = torch.prod( + torch.tensor(self.gshape, dtype=torch.float32, device=self.device.torch_device) + ) + else: + size = torch.prod( torch.tensor(self.gshape, dtype=torch.float64, device=self.device.torch_device) ) - .long() - .item() - ) + return size.long().item() @property def gnbytes(self) -> int: @@ -382,7 +385,7 @@ def __prephalo(self, start, end) -> torch.Tensor: except IndexError: print("Indices out of bound") - return self.__array[ix].clone().contiguous() + return self.__array[ix].clone() def get_halo(self, halo_size: int, prev: bool = True, next: bool = True) -> torch.Tensor: """ @@ -479,6 +482,34 @@ def __array__(self) -> np.ndarray: """ return self.larray.cpu().__array__() + def __array_ufunc__(self, ufunc, method, *inputs, **kwargs): + """ + Override NumPy's universal functions. 
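A hedged sketch of what these interoperability hooks enable, assuming the dispatch shown below resolves np.add to ht.add:

    import numpy as np
    import heat as ht

    x = ht.ones(4)
    y = np.add(x, 1)           # NumPy defers to DNDarray.__array_ufunc__ -> ht.add
    print(type(y).__name__)    # DNDarray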
+ """ + import heat + + # TODO support ufunc method variants + if method == "__call__": + try: + func = getattr(heat, ufunc.__name__) + except AttributeError: + return NotImplemented + return func(*inputs, **kwargs) + else: + return NotImplemented + + def __array_function__(self, func, types, args, kwargs): + """ + Augments NumPy's functions. + """ + import heat + + try: + ht_func = getattr(heat, func.__name__) + except AttributeError: + return NotImplemented + return ht_func(*args, **kwargs) + def astype(self, dtype, copy=True) -> DNDarray: """ Returns a casted version of this array. @@ -495,6 +526,21 @@ def astype(self, dtype, copy=True) -> DNDarray: """ dtype = canonical_heat_type(dtype) + if self.__array.is_mps: + if dtype == types.float64: + # print warning + warnings.warn( + "MPS does not support float64. Casting to float32 instead.", + ResourceWarning, + ) + dtype = types.float32 + elif dtype == types.complex128: + # print warning + warnings.warn( + "MPS does not support complex128. Casting to complex64 instead.", + ResourceWarning, + ) + dtype = types.complex64 casted_array = self.__array.type(dtype.torch_type()) if copy: return DNDarray( @@ -788,7 +834,7 @@ def __float__(self) -> DNDarray: Float scalar casting. See Also - --------- + -------- :func:`~heat.core.manipulations.flatten` """ return self.__cast(float) @@ -854,7 +900,7 @@ def __getitem__(self, key: Union[int, Tuple[int, ...], List[int, ...]]) -> DNDar >>> a[1:6] (1/2) >>> tensor([1, 2, 3, 4], dtype=torch.int32) (2/2) >>> tensor([5], dtype=torch.int32) - >>> a = ht.zeros((4,5), split=0) + >>> a = ht.zeros((4, 5), split=0) (1/2) >>> tensor([[0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.]]) (2/2) >>> tensor([[0., 0., 0., 0., 0.], @@ -1116,6 +1162,7 @@ def is_balanced(self, force_check: bool = False) -> bool: assessed via collective communication. Parameters + ---------- force_check : bool, optional If True, the balanced status of the ``DNDarray`` will be assessed via collective communication in any case. @@ -1156,7 +1203,7 @@ def item(self): raised (by pytorch) Examples - ------- + -------- >>> import heat as ht >>> x = ht.zeros((1)) >>> x.item() @@ -1189,11 +1236,20 @@ def numpy(self) -> np.array: dist = self.copy().resplit_(axis=None) return dist.larray.cpu().numpy() + def _repr_pretty_(self, p, cycle): + """ + Pretty print for IPython. + """ + if cycle: + p.text(printing.__str__(self)) + else: + p.text(printing.__str__(self)) + def __repr__(self) -> str: """ - Computes a printable representation of the passed DNDarray. + Returns a printable representation of the passed DNDarray, targeting developers. """ - return printing.__str__(self) + return printing.__repr__(self) def ravel(self): """ @@ -1205,9 +1261,9 @@ def ravel(self): Examples -------- - >>> a = ht.ones((2,3), split=0) + >>> a = ht.ones((2, 3), split=0) >>> b = a.ravel() - >>> a[0,0] = 4 + >>> a[0, 0] = 4 >>> b DNDarray([4., 1., 1., 1., 1., 1.], dtype=ht.float32, device=cpu:0, split=0) """ @@ -1423,7 +1479,13 @@ def resplit_(self, axis: int = None): Examples -------- - >>> a = ht.zeros((4, 5,), split=0) + >>> a = ht.zeros( + ... ( + ... 4, + ... 5, + ... ), + ... split=0, + ... ) >>> a.lshape (0/2) (2, 5) (1/2) (2, 5) @@ -1433,7 +1495,13 @@ def resplit_(self, axis: int = None): >>> a.lshape (0/2) (4, 5) (1/2) (4, 5) - >>> a = ht.zeros((4, 5,), split=0) + >>> a = ht.zeros( + ... ( + ... 4, + ... 5, + ... ), + ... split=0, + ... 
) >>> a.lshape (0/2) (2, 5) (1/2) (2, 5) @@ -1522,7 +1590,7 @@ def __setitem__( Examples -------- - >>> a = ht.zeros((4,5), split=0) + >>> a = ht.zeros((4, 5), split=0) (1/2) >>> tensor([[0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.]]) (2/2) >>> tensor([[0., 0., 0., 0., 0.], @@ -1798,7 +1866,7 @@ def __setter( Utility function for checking ``value`` and forwarding to :func:``__setitem__`` Raises - ------------- + ------ NotImplementedError If the type of ``value`` ist not supported """ @@ -1834,15 +1902,15 @@ def tolist(self, keepsplit: bool = False) -> List: Examples -------- - >>> a = ht.array([[0,1],[2,3]]) + >>> a = ht.array([[0, 1], [2, 3]]) >>> a.tolist() [[0, 1], [2, 3]] - >>> a = ht.array([[0,1],[2,3]], split=0) + >>> a = ht.array([[0, 1], [2, 3]], split=0) >>> a.tolist() [[0, 1], [2, 3]] - >>> a = ht.array([[0,1],[2,3]], split=1) + >>> a = ht.array([[0, 1], [2, 3]], split=1) >>> a.tolist(keepsplit=True) (1/2) [[0], [2]] (2/2) [[1], [3]] @@ -1852,6 +1920,21 @@ def tolist(self, keepsplit: bool = False) -> List: return self.__array.tolist() + @classmethod + def __torch_function__(cls, func, types, args=(), kwargs=None): + """ + Supports PyTorch's dispatch mechanism. + """ + import heat + + if kwargs is None: + kwargs = {} + try: + ht_func = getattr(heat, func.__name__) + except AttributeError: + return NotImplemented + return ht_func(*args, **kwargs) + def __torch_proxy__(self) -> torch.Tensor: """ Return a 1-element `torch.Tensor` strided as the global `self` shape. @@ -1899,6 +1982,7 @@ def __xitem_get_key_start_stop( from . import statistics from . import stride_tricks from . import tiling +from . import types from .devices import Device from .stride_tricks import sanitize_axis diff --git a/heat/core/exponential.py b/heat/core/exponential.py index 85778ef3d0..359bc4d7f1 100644 --- a/heat/core/exponential.py +++ b/heat/core/exponential.py @@ -1,5 +1,5 @@ """ -This module computes exponential and logarithmic operations. +Exponential and logarithmic operations module. """ import torch @@ -63,7 +63,7 @@ def expm1(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: Examples -------- - >>> ht.expm1(ht.arange(5)) + 1. + >>> ht.expm1(ht.arange(5)) + 1.0 DNDarray([ 1.0000, 2.7183, 7.3891, 20.0855, 54.5981], dtype=ht.float64, device=cpu:0, split=None) """ return _operations.__local_op(torch.expm1, x, out) @@ -303,7 +303,7 @@ def square(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: A location in which to store the results. If provided, it must have a broadcastable shape. If not provided or set to :keyword:`None`, a fresh array is allocated. - Examples: + Examples -------- >>> a = ht.random.rand(4) >>> a diff --git a/heat/core/factories.py b/heat/core/factories.py index dc2dbb4e01..389b671f24 100644 --- a/heat/core/factories.py +++ b/heat/core/factories.py @@ -59,15 +59,12 @@ def arange( Parameters ---------- - start : scalar, optional - Start of interval. The interval includes this value. The default start value is 0. - stop : scalar - End of interval. The interval does not include this value, except in some cases where ``step`` is not an - integer and floating point round-off affects the length of ``out``. - step : scalar, optional - Spacing between values. For any output ``out``, this is the distance between two adjacent values, - ``out[i+1]-out[i]``. The default step size is 1. If ``step`` is specified as a position argument, ``start`` - must also be given. + *args : int or float, optional + Positional arguments defining the interval. 
Can be: + - A single argument: interpreted as `stop`, with `start=0` and `step=1`. + - Two arguments: interpreted as `start` and `stop`, with `step=1`. + - Three arguments: interpreted as `start`, `stop`, and `step`. + The function raises a `TypeError` if more than three arguments are provided. dtype : datatype, optional The type of the output array. If `dtype` is not given, it is automatically inferred from the other input arguments. @@ -247,7 +244,7 @@ def array( 4 5 [torch.LongStorage of size 6] - >>> c = ht.array(a, order='F') + >>> c = ht.array(a, order="F") >>> c DNDarray([[0, 1, 2], [3, 4, 5]], dtype=ht.int64, device=cpu:0, split=None) @@ -264,7 +261,7 @@ def array( >>> a = np.arange(4 * 3).reshape(4, 3) >>> a.strides (24, 8) - >>> b = ht.array(a, order='F', split=0) + >>> b = ht.array(a, order="F", split=0) >>> b DNDarray([[ 0, 1, 2], [ 3, 4, 5], @@ -324,7 +321,6 @@ def array( f"'is_split' and the split axis of the object do not match ({is_split} != {obj.split}).\nIf you are trying to resplit an existing DNDarray in-place, use the method `DNDarray.resplit_()` instead." ) elif device is not None and device != obj.device and copy is False: - raise ValueError( "argument `copy` is set to False, but copy of input object is necessary as the array is being copied across devices.\nUse the method `DNDarray.cpu()` or `DNDarray.gpu()` to move the array to the desired device." ) @@ -516,24 +512,24 @@ def asarray( Examples -------- - >>> a = [1,2] + >>> a = [1, 2] >>> ht.asarray(a) DNDarray([1, 2], dtype=ht.int64, device=cpu:0, split=None) - >>> a = np.array([1,2,3]) + >>> a = np.array([1, 2, 3]) >>> n = ht.asarray(a) >>> n DNDarray([1, 2, 3], dtype=ht.int64, device=cpu:0, split=None) >>> n[0] = 0 >>> a DNDarray([0, 2, 3], dtype=ht.int64, device=cpu:0, split=None) - >>> a = torch.tensor([1,2,3]) + >>> a = torch.tensor([1, 2, 3]) >>> t = ht.asarray(a) >>> t DNDarray([1, 2, 3], dtype=ht.int64, device=cpu:0, split=None) >>> t[0] = 0 >>> a DNDarray([0, 2, 3], dtype=ht.int64, device=cpu:0, split=None) - >>> a = ht.array([1,2,3,4], dtype=ht.float32) + >>> a = ht.array([1, 2, 3, 4], dtype=ht.float32) >>> ht.asarray(a, dtype=ht.float32) is a True >>> ht.asarray(a, dtype=ht.float64) is a @@ -583,7 +579,12 @@ def empty( DNDarray([0., 0., 0.], dtype=ht.float32, device=cpu:0, split=None) >>> ht.empty(3, dtype=ht.int) DNDarray([59140784, 0, 59136816], dtype=ht.int32, device=cpu:0, split=None) - >>> ht.empty((2, 3,)) + >>> ht.empty( + ... ( + ... 2, + ... 3, + ... ) + ... ) DNDarray([[-1.7206e-10, 4.5905e-41, -1.7206e-10], [ 4.5905e-41, 4.4842e-44, 0.0000e+00]], dtype=ht.float32, device=cpu:0, split=None) """ @@ -629,7 +630,12 @@ def empty_like( Examples -------- - >>> x = ht.ones((2, 3,)) + >>> x = ht.ones( + ... ( + ... 2, + ... 3, + ... ) + ... ) >>> x DNDarray([[1., 1., 1.], [1., 1., 1.]], dtype=ht.float32, device=cpu:0, split=None) @@ -736,7 +742,7 @@ def __factory( shape : int or Sequence[ints,...] Desired shape of the output array, e.g. 1 or (1, 2, 3,). dtype : datatype - The desired HeAT data type for the array, defaults to ht.float32. + The desired Heat data type for the array, defaults to ht.float32. split : int or None The axis along which the array is split and distributed. local_factory : callable @@ -804,6 +810,8 @@ def __factory_like( Options: ``'C'`` or ``'F'``. Specifies the memory layout of the newly created array. Default is ``order='C'``, meaning the array will be stored in row-major order (C-like). If ``order=‘F’``, the array will be stored in column-major order (Fortran-like). 
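The rewritten arange docstring above maps onto the usual NumPy-style call patterns (illustrative):

    import heat as ht

    ht.arange(5)         # stop only: start=0, step=1 -> 0, 1, 2, 3, 4
    ht.arange(2, 5)      # start and stop            -> 2, 3, 4
    ht.arange(2, 10, 2)  # start, stop and step      -> 2, 4, 6, 8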
+ **kwargs + Keyword arguments for the factory method. Raises ------ @@ -867,7 +875,7 @@ def from_partitioned(x, comm: Optional[Communication] = None) -> DNDarray: comm: Communication, optional Handle to the nodes holding distributed parts or copies of this array. - See also + See Also -------- :func:`ht.core.DNDarray.create_partition_interface `. @@ -883,11 +891,11 @@ def from_partitioned(x, comm: Optional[Communication] = None) -> DNDarray: Examples -------- >>> import heat as ht - >>> a = ht.ones((44,55), split=0) + >>> a = ht.ones((44, 55), split=0) >>> b = ht.from_partitioned(a) - >>> assert (a==b).all() + >>> assert (a == b).all() >>> a[40] = 4711 - >>> assert (a==b).all() + >>> assert (a == b).all() """ comm = sanitize_comm(comm) parted = x.__partitioned__ @@ -912,7 +920,7 @@ def from_partition_dict(parted: dict, comm: Optional[Communication] = None) -> D comm: Communication, optional Handle to the nodes holding distributed parts or copies of this array. - See also + See Also -------- :func:`ht.core.DNDarray.create_partition_interface `. @@ -928,11 +936,11 @@ def from_partition_dict(parted: dict, comm: Optional[Communication] = None) -> D Examples -------- >>> import heat as ht - >>> a = ht.ones((44,55), split=0) + >>> a = ht.ones((44, 55), split=0) >>> b = ht.from_partition_dict(a.__partitioned__) - >>> assert (a==b).all() + >>> assert (a == b).all() >>> a[40] = 4711 - >>> assert (a==b).all() + >>> assert (a == b).all() """ comm = sanitize_comm(comm) return __from_partition_dict_helper(parted, comm) @@ -971,7 +979,7 @@ def __from_partition_dict_helper(parted: dict, comm: Communication): gshape_list = list(gshape) lshape_list = list(data.shape) shape_diff = torch.tensor( - [g - l for g, l in zip(gshape_list, lshape_list)] + [g_shape - l_shape for g_shape, l_shape in zip(gshape_list, lshape_list)] ) # dont care about device nz = torch.nonzero(shape_diff) @@ -1094,7 +1102,12 @@ def full_like( Examples -------- - >>> x = ht.zeros((2, 3,)) + >>> x = ht.zeros( + ... ( + ... 2, + ... 3, + ... ) + ... ) >>> x DNDarray([[0., 0., 0.], [0., 0., 0.]], dtype=ht.float32, device=cpu:0, split=None) @@ -1284,7 +1297,7 @@ def meshgrid(*arrays: Sequence[DNDarray], indexing: str = "xy") -> List[DNDarray -------- >>> x = ht.arange(4) >>> y = ht.arange(3) - >>> xx, yy = ht.meshgrid(x,y) + >>> xx, yy = ht.meshgrid(x, y) >>> xx DNDarray([[0, 1, 2, 3], [0, 1, 2, 3], @@ -1385,7 +1398,12 @@ def ones( DNDarray([1., 1., 1.], dtype=ht.float32, device=cpu:0, split=None) >>> ht.ones(3, dtype=ht.int) DNDarray([1, 1, 1], dtype=ht.int32, device=cpu:0, split=None) - >>> ht.ones((2, 3,)) + >>> ht.ones( + ... ( + ... 2, + ... 3, + ... ) + ... ) DNDarray([[1., 1., 1.], [1., 1., 1.]], dtype=ht.float32, device=cpu:0, split=None) """ @@ -1429,7 +1447,12 @@ def ones_like( Examples -------- - >>> x = ht.zeros((2, 3,)) + >>> x = ht.zeros( + ... ( + ... 2, + ... 3, + ... ) + ... ) >>> x DNDarray([[0., 0., 0.], [0., 0., 0.]], dtype=ht.float32, device=cpu:0, split=None) @@ -1481,7 +1504,12 @@ def zeros( DNDarray([0., 0., 0.], dtype=ht.float32, device=cpu:0, split=None) >>> ht.zeros(3, dtype=ht.int) DNDarray([0, 0, 0], dtype=ht.int32, device=cpu:0, split=None) - >>> ht.zeros((2, 3,)) + >>> ht.zeros( + ... ( + ... 2, + ... 3, + ... ) + ... ) DNDarray([[0., 0., 0.], [0., 0., 0.]], dtype=ht.float32, device=cpu:0, split=None) """ @@ -1525,7 +1553,12 @@ def zeros_like( Examples -------- - >>> x = ht.ones((2, 3,)) + >>> x = ht.ones( + ... ( + ... 2, + ... 3, + ... ) + ... 
) >>> x DNDarray([[1., 1., 1.], [1., 1., 1.]], dtype=ht.float32, device=cpu:0, split=None) diff --git a/heat/core/indexing.py b/heat/core/indexing.py index 33d94c04d0..e66ecd3203 100644 --- a/heat/core/indexing.py +++ b/heat/core/indexing.py @@ -115,14 +115,14 @@ def where( if only x or y is given or both are not DNDarrays or numerical scalars Notes - ------- + ----- When only condition is provided, this function is a shorthand for :func:`nonzero`. Examples -------- >>> import heat as ht >>> x = ht.arange(10, split=0) - >>> ht.where(x < 5, x, 10*x) + >>> ht.where(x < 5, x, 10 * x) DNDarray([ 0, 1, 2, 3, 4, 50, 60, 70, 80, 90], dtype=ht.int64, device=cpu:0, split=0) >>> y = ht.array([[0, 1, 2], [0, 2, 4], [0, 3, 6]]) >>> ht.where(y < 4, y, -1) diff --git a/heat/core/io.py b/heat/core/io.py index 427c7b8d49..aae6ab5b2c 100644 --- a/heat/core/io.py +++ b/heat/core/io.py @@ -2,6 +2,8 @@ from __future__ import annotations +from functools import reduce +import operator import os.path from math import log10 import numpy as np @@ -27,6 +29,7 @@ __HDF5_EXTENSIONS = frozenset([".h5", ".hdf5"]) __NETCDF_EXTENSIONS = frozenset([".nc", ".nc4", "netcdf"]) __NETCDF_DIM_TEMPLATE = "{}_dim_{}" +__ZARR_EXTENSIONS = frozenset([".zarr"]) __all__ = [ "load", @@ -36,8 +39,32 @@ "supports_hdf5", "supports_netcdf", "load_npy_from_path", + "supports_zarr", ] + +def size_from_slice(size: int, s: slice) -> Tuple[int, int]: + """ + Determines the size of a slice object. + + Parameters + ---------- + size: int + The size of the array the slice object is applied to. + s : slice + The slice object to determine the size of. + + Returns + ------- + int + The size of the sliced object. + int + The start index of the slice object. + """ + new_range = range(size)[s] + return len(new_range), new_range.start if len(new_range) > 0 else 0 + + try: import netCDF4 as nc except ImportError: @@ -99,20 +126,20 @@ def load_netcdf( The device id on which to place the data, defaults to globally set default device. Raises - ------- + ------ TypeError If any of the input parameters are not of correct type. Examples -------- - >>> a = ht.load_netcdf('data.nc', variable='DATA') + >>> a = ht.load_netcdf("data.nc", variable="DATA") >>> a.shape [0/2] (5,) [1/2] (5,) >>> a.lshape [0/2] (5,) [1/2] (5,) - >>> b = ht.load_netcdf('data.nc', variable='DATA', split=0) + >>> b = ht.load_netcdf("data.nc", variable="DATA", split=0) >>> b.shape [0/2] (5,) [1/2] (5,) @@ -189,7 +216,7 @@ def save_netcdf( additional arguments passed to the created dataset. Raises - ------- + ------ TypeError If any of the input parameters are not of correct type. ValueError @@ -199,7 +226,7 @@ def save_netcdf( Examples -------- >>> x = ht.arange(100, split=0) - >>> ht.save_netcdf(x, 'data.nc', dataset='DATA') + >>> ht.save_netcdf(x, "data.nc", dataset="DATA") """ if not isinstance(data, DNDarray): raise TypeError(f"data must be heat tensor, not {type(data)}") @@ -251,7 +278,7 @@ def __get_expanded_split( split-axis of dndarray. Raises - ------- + ------ ValueError If resulting shapes do not match. 
""" @@ -295,7 +322,7 @@ def __merge_slices( data_slices: Optional[Tuple[int, slice]] = None, ) -> Tuple[Union[int, slice]]: """ - This method allows replacing: + Allows replacing: ``var[var_slices][data_slices] = data`` (a `netcdf4.Variable.__getitem__` and a `numpy.ndarray.__setitem__` call) @@ -489,7 +516,7 @@ def load_hdf5( path: str, dataset: str, dtype: datatype = types.float32, - load_fraction: float = 1.0, + slices: Optional[Tuple[Optional[slice], ...]] = None, split: Optional[int] = None, device: Optional[str] = None, comm: Optional[Communication] = None, @@ -505,10 +532,8 @@ def load_hdf5( Name of the dataset to be read. dtype : datatype, optional Data type of the resulting array. - load_fraction : float between 0. (excluded) and 1. (included), default is 1. - if 1. (default), the whole dataset is loaded from the file specified in path - else, the dataset is loaded partially, with the fraction of the dataset (along the split axis) specified by load_fraction - If split is None, load_fraction is automatically set to 1., i.e. the whole dataset is loaded. + slices : tuple of slice objects, optional + Load only the specified slices of the dataset. split : int or None, optional The axis along which the data is distributed among the processing cores. device : str, optional @@ -517,26 +542,79 @@ def load_hdf5( The communication to use for the data distribution. Raises - ------- + ------ TypeError If any of the input parameters are not of correct type Examples -------- - >>> a = ht.load_hdf5('data.h5', dataset='DATA') + >>> a = ht.load_hdf5("data.h5", dataset="DATA") >>> a.shape [0/2] (5,) [1/2] (5,) >>> a.lshape [0/2] (5,) [1/2] (5,) - >>> b = ht.load_hdf5('data.h5', dataset='DATA', split=0) + >>> b = ht.load_hdf5("data.h5", dataset="DATA", split=0) >>> b.shape [0/2] (5,) [1/2] (5,) >>> b.lshape [0/2] (3,) [1/2] (2,) + + Using the slicing argument: + >>> not_sliced = ht.load_hdf5("other_data.h5", dataset="DATA", split=0) + >>> not_sliced.shape + [0/2] (10,2) + [1/2] (10,2) + >>> not_sliced.lshape + [0/2] (5,2) + [1/2] (5,2) + >>> not_sliced.larray + [0/2] [[ 0, 1], + [ 2, 3], + [ 4, 5], + [ 6, 7], + [ 8, 9]] + [1/2] [[10, 11], + [12, 13], + [14, 15], + [16, 17], + [18, 19]] + + >>> sliced = ht.load_hdf5("other_data.h5", dataset="DATA", split=0, slices=slice(8)) + >>> sliced.shape + [0/2] (8,2) + [1/2] (8,2) + >>> sliced.lshape + [0/2] (4,2) + [1/2] (4,2) + >>> sliced.larray + [0/2] [[ 0, 1], + [ 2, 3], + [ 4, 5], + [ 6, 7]] + [1/2] [[ 8, 9], + [10, 11], + [12, 13], + [14, 15], + [16, 17]] + + >>> sliced = ht.load_hdf5('other_data.h5', dataset='DATA', split=0, slices=(slice(2,8), slice(0,1)) + >>> sliced.shape + [0/2] (6,1) + [1/2] (6,1) + >>> sliced.lshape + [0/2] (3,1) + [1/2] (3,1) + >>> sliced.larray + [0/2] [[ 4, ], + [ 6, ], + [ 8, ]] + [1/2] [[10, ], + [12, ], + [14, ]] """ if not isinstance(path, str): raise TypeError(f"path must be str, not {type(path)}") @@ -545,14 +623,6 @@ def load_hdf5( elif split is not None and not isinstance(split, int): raise TypeError(f"split must be None or int, not {type(split)}") - if not isinstance(load_fraction, float): - raise TypeError(f"load_fraction must be float, but is {type(load_fraction)}") - else: - if split is not None and (load_fraction <= 0.0 or load_fraction > 1.0): - raise ValueError( - f"load_fraction must be between 0. (excluded) and 1. (included), but is {load_fraction}." 
- ) - # infer the type and communicator for the loaded array dtype = types.canonical_heat_type(dtype) # determine the comm and device the data will be placed on @@ -563,13 +633,33 @@ def load_hdf5( with h5py.File(path, "r") as handle: data = handle[dataset] gshape = data.shape - if split is not None: - gshape = list(gshape) - gshape[split] = int(gshape[split] * load_fraction) - gshape = tuple(gshape) + new_gshape = tuple() + offsets = [0] * len(gshape) + if slices is not None: + for i in range(len(gshape)): + if i < len(slices) and slices[i]: + s = slices[i] + if s.step is not None and s.step != 1: + raise ValueError("Slices with step != 1 are not supported") + new_axis_size, offset = size_from_slice(gshape[i], s) + new_gshape += (new_axis_size,) + offsets[i] = offset + else: + new_gshape += (gshape[i],) + offsets[i] = 0 + + gshape = new_gshape + dims = len(gshape) split = sanitize_axis(gshape, split) _, _, indices = comm.chunk(gshape, split) + + if slices is not None: + new_indices = tuple() + for offset, index in zip(offsets, indices): + new_indices += (slice(index.start + offset, index.stop + offset),) + indices = new_indices + balanced = True if split is None: data = torch.tensor( @@ -614,7 +704,7 @@ def save_hdf5( Additional arguments passed to the created dataset. Raises - ------- + ------ TypeError If any of the input parameters are not of correct type. ValueError @@ -623,7 +713,7 @@ def save_hdf5( Examples -------- >>> x = ht.arange(100, split=0) - >>> ht.save_hdf5(x, 'data.h5', dataset='DATA') + >>> ht.save_hdf5(x, "data.h5", dataset="DATA") """ if not isinstance(data, DNDarray): raise TypeError(f"data must be heat tensor, not {type(data)}") @@ -695,7 +785,7 @@ def load( Additional options passed to the particular functions. Raises - ------- + ------ ValueError If the file extension is not understood or known. RuntimeError @@ -703,10 +793,20 @@ def load( Examples -------- - >>> ht.load('data.h5', dataset='DATA') + >>> ht.load("data.h5", dataset="DATA") DNDarray([ 1.0000, 2.7183, 7.3891, 20.0855, 54.5981], dtype=ht.float32, device=cpu:0, split=None) - >>> ht.load('data.nc', variable='DATA') + >>> ht.load("data.nc", variable="DATA") DNDarray([ 1.0000, 2.7183, 7.3891, 20.0855, 54.5981], dtype=ht.float32, device=cpu:0, split=None) + + See Also + -------- + :func:`load_csv` : Loads data from a CSV file. + :func:`load_csv_from_folder` : Loads multiple .csv files into one DNDarray which will be returned. + :func:`load_hdf5` : Loads data from an HDF5 file. + :func:`load_netcdf` : Loads data from a NetCDF4 file. + :func:`load_npy_from_path` : Loads multiple .npy files into one DNDarray which will be returned. + :func:`load_zarr` : Loads zarr-Format into DNDarray which will be returned. + """ if not isinstance(path, str): raise TypeError(f"Expected path to be str, but was {type(path)}") @@ -724,6 +824,12 @@ def load( return load_netcdf(path, *args, **kwargs) else: raise RuntimeError(f"netcdf is required for file extension {extension}") + elif extension in __ZARR_EXTENSIONS: + if supports_zarr(): + return load_zarr(path, *args, **kwargs) + else: + raise RuntimeError(f"Package zarr is required for file extension {extension}") + else: raise ValueError(f"Unsupported file extension {extension}") @@ -762,14 +868,14 @@ def load_csv( The communication to use for the data distribution, defaults to global default Raises - ------- + ------ TypeError If any of the input parameters are not of correct type. 
Examples -------- >>> import heat as ht - >>> a = ht.load_csv('data.csv') + >>> a = ht.load_csv("data.csv") >>> a.shape [0/3] (150, 4) [1/3] (150, 4) @@ -780,7 +886,7 @@ def load_csv( [1/3] (38, 4) [2/3] (37, 4) [3/3] (37, 4) - >>> b = ht.load_csv('data.csv', header_lines=10) + >>> b = ht.load_csv("data.csv", header_lines=10) >>> b.shape [0/3] (140, 4) [1/3] (140, 4) @@ -833,12 +939,12 @@ def load_csv( f.seek(displs[rank], 0) line_starts = [] r = f.read(counts[rank]) - for pos, l in enumerate(r): - if chr(l) == "\n": + for pos, line in enumerate(r): + if chr(line) == "\n": # Check if it is part of '\r\n' if chr(r[pos - 1]) != "\r": line_starts.append(pos + 1) - elif chr(l) == "\r": + elif chr(line) == "\r": # check if file line is terminated by '\r\n' if pos + 1 < len(r) and chr(r[pos + 1]) == "\n": line_starts.append(pos + 2) @@ -1107,7 +1213,7 @@ def save( Additional options passed to the particular functions. Raises - ------- + ------ ValueError If the file extension is not understood or known. RuntimeError @@ -1116,7 +1222,7 @@ def save( Examples -------- >>> x = ht.arange(100, split=0) - >>> ht.save(x, 'data.h5', 'DATA', mode='a') + >>> ht.save(x, "data.h5", "DATA", mode="a") """ if not isinstance(path, str): raise TypeError(f"Expected path to be str, but was {type(path)}") @@ -1134,6 +1240,11 @@ def save( raise RuntimeError(f"netcdf is required for file extension {extension}") elif extension in __CSV_EXTENSION: save_csv(data, path, *args, **kwargs) + elif extension in __ZARR_EXTENSIONS: + if supports_zarr(): + return save_zarr(data, path, *args, **kwargs) + else: + raise RuntimeError(f"Package zarr is required for file extension {extension}") else: raise ValueError(f"Unsupported file extension {extension}") @@ -1283,3 +1394,227 @@ def load_csv_from_folder( larray = torch.from_numpy(larray) x = factories.array(larray, dtype=dtype, device=device, is_split=split, comm=comm) return x + + +try: + import zarr +except ModuleNotFoundError: + + def supports_zarr() -> bool: + """ + Returns ``True`` if zarr is installed, ``False`` otherwise. + """ + return False + +else: + __all__.extend(["load_zarr", "save_zarr"]) + + def supports_zarr() -> bool: + """ + Returns ``True`` if zarr is installed, ``False`` otherwise. + """ + return True + + def load_zarr( + path: str, + split: int = 0, + device: Optional[str] = None, + comm: Optional[Communication] = None, + slices: Union[None, slice, Iterable[Union[slice, None]]] = None, + **kwargs, + ) -> DNDarray: + """ + Loads zarr-Format into DNDarray which will be returned. + + Parameters + ---------- + path : str + Path to the directory in which a .zarr-file is located. + split : int + Along which axis the loaded arrays should be concatenated. + device : str, optional + The device id on which to place the data, defaults to globally set default device. 
+        comm : Communication, optional
+            The communication to use for the data distribution, default is 'heat.MPI_WORLD'
+        slices : Union[None, slice, Iterable[Union[slice, None]]]
+            Load only a slice of the array instead of the full array.
+        **kwargs : Any
+            Extra arguments passed to zarr.open.
+        """
+        if not isinstance(path, str):
+            raise TypeError(f"path must be str, not {type(path)}")
+        if split is not None and not isinstance(split, int):
+            raise TypeError(f"split must be None or int, not {type(split)}")
+        if device is not None and not isinstance(device, str):
+            raise TypeError(f"device must be None or str, not {type(device)}")
+        if not isinstance(slices, (slice, Iterable)) and slices is not None:
+            raise TypeError(f"slices argument must be slice, tuple or None, not {type(slices)}")
+        if isinstance(slices, Iterable):
+            for elem in slices:
+                if isinstance(elem, slice) or elem is None:
+                    continue
+                raise TypeError(f"Tuple values of slices must be slice or None, not {type(elem)}")
+
+        for extension in __ZARR_EXTENSIONS:
+            if fnmatch.fnmatch(path, f"*{extension}"):
+                break
+        else:
+            raise ValueError("File has no zarr extension.")
+
+        arr: zarr.Array = zarr.open_array(store=path, **kwargs)
+        shape = arr.shape
+
+        if isinstance(slices, slice) or slices is None:
+            slices = [slices]
+
+        if len(shape) < len(slices):
+            raise ValueError(
+                f"slices argument has more entries than the array has dimensions. {len(shape)} < {len(slices)}"
+            )
+
+        slices = [elem if elem is not None else slice(None) for elem in slices]
+        slices.extend([slice(None) for _ in range(abs(len(slices) - len(shape)))])
+
+        dtype = types.canonical_heat_type(arr.dtype)
+        device = devices.sanitize_device(device)
+        comm = sanitize_comm(comm)
+
+        slices = tuple(slices)
+        shape = [len(range(*tslice.indices(length))) for length, tslice in zip(shape, slices)]
+        offset, local_shape, local_slices = comm.chunk(shape, split)
+
+        return factories.array(
+            arr[slices][local_slices], dtype=dtype, is_split=split, device=device, comm=comm
+        )
+
+    def save_zarr(dndarray: DNDarray, path: str, overwrite: bool = False, **kwargs) -> None:
+        """
+        Writes the DNDarray to disk in zarr format.
+
+        Parameters
+        ----------
+        dndarray : DNDarray
+            DNDarray to save.
+        path : str
+            Path to save to.
+        overwrite : bool
+            Whether to overwrite an existing array.
+        **kwargs : Any
+            Extra arguments passed to zarr.open and zarr.create.
+
+        Raises
+        ------
+        TypeError
+            If given parameters do not have the required types.
+        ValueError
+            If ``path`` does not end with a zarr extension.
+        RuntimeError
+            If ``path`` already exists and ``overwrite`` is not set.
+
+        Notes
+        -----
+        Zarr functions by chunking the data, where a chunk is a file inside the store.
+        The problem is that only one process writes to a chunk at a time. Therefore, when two
+        processes try to write to the same chunk, one will fail unless the other has already
+        finished.
+
+        To alleviate this we define the chunk sizes ourselves: e.g., for split=0, a (4, 4) shape
+        and a world size of 4, we chunk with (1, 4).
+
+        A problem arises when a process gets a bigger chunk and interferes with another process.
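+
+        As a rough sketch of the rule used below (illustrative only; `gshape`, `split`, and
+        `nprocs` are stand-ins for the actual attributes used in the implementation):
+
+            chunks = list(gshape)
+            if gshape[split] % nprocs == 0:
+                chunks[split] = gshape[split] // nprocs  # every process owns exactly one chunk
+            else:
+                chunks[split] = 1  # uneven distribution: fall back to size 1 along the split axis
+
+        The worked example below illustrates the uneven case:
+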
+        Example:
+            N_PROCS = 4
+            SHAPE   = (9,10)
+            SPLIT   = 0
+            CHUNKS => (2,10)
+
+        Here one process gets a write region of 3 rows and would have to load 2 chunks to write
+        those 3 rows, so it can miss or overwrite what another process wrote, destroying the
+        parallel write.
+        To counteract this we set the chunk size along the split axis to 1. This prevents
+        overwrites, but may either cripple or even improve write speeds.
+
+        Another problem with this approach is that we tell zarr to use full chunks, i.e., if the array has shape (10_000, 10_000)
+        and we split it at axis=0 with 4 processes, we get chunks of (2_500, 10_000). Zarr will load the whole chunk into
+        memory, making this memory intensive and probably inefficient. A better approach would be a smaller chunk size,
+        for example half of it, but that cannot be determined in all cases, so the current approach is a compromise.
+
+        Another problem is the split=None scenario. In this case every process has the same data, so only one needs to write;
+        we then ignore chunking, let zarr decide the chunk size, and let only the process with rank=0 write.
+
+        To avoid errors when using NumPy arrays as chunk shape, the chunks argument is only passed to zarr.create if it is
+        not None. This prevents issues with ambiguous truth values or attribute errors on None.
+
+        """
+        if not isinstance(path, str):
+            raise TypeError(f"path must be str, not {type(path)}")
+
+        for extension in __ZARR_EXTENSIONS:
+            if fnmatch.fnmatch(path, f"*{extension}"):
+                break
+        else:
+            raise ValueError("path does not end with a zarr extension.")
+
+        if os.path.exists(path) and not overwrite:
+            raise RuntimeError("Given path already exists.")
+
+        if MPI_WORLD.rank == 0:
+            if dndarray.split is None or MPI_WORLD.size == 1:
+                chunks = None
+            else:
+                chunks = np.array(dndarray.gshape)
+                axis = dndarray.split
+
+                if chunks[axis] % MPI_WORLD.size != 0:
+                    chunks[axis] = 1
+                else:
+                    chunks[axis] //= MPI_WORLD.size
+
+                CODEC_LIMIT_BYTES = 2**31 - 1  # PR#1766
+
+                for _ in range(
+                    10
+                ):  # use a for loop instead of while True for better handling of edge cases
+                    byte_size = reduce(operator.mul, chunks, 1) * dndarray.larray.element_size()
+                    if byte_size > CODEC_LIMIT_BYTES:
+                        if chunks[axis] % 2 == 0:
+                            chunks[axis] /= 2
+                            continue
+                        else:
+                            chunks[axis] = 1
+                            break
+                    else:
+                        break
+                else:
+                    chunks[axis] = 1
+                    warnings.warn(
+                        "Calculation of chunk size for zarr format unexpectedly defaulted to 1 on the split axis"
+                    )
+
+            dtype = dndarray.dtype.char()
+
+            zarr_create_kwargs = {
+                "store": path,
+                "shape": dndarray.gshape,
+                "dtype": dtype,
+                "overwrite": overwrite,
+                **kwargs,
+            }
+
+            if chunks is not None:
+                zarr_create_kwargs["chunks"] = chunks.tolist()
+
+            zarr_array = zarr.create(**zarr_create_kwargs)
+
+        # Wait for the file creation to finish
+        MPI_WORLD.Barrier()
+        zarr_array = zarr.open(store=path, mode="r+", **kwargs)
+
+        if dndarray.split is not None:
+            _, _, slices = MPI_WORLD.chunk(dndarray.gshape, dndarray.split)
+
+            zarr_array[slices] = (
+                dndarray.larray.cpu().numpy()  # NumPy array needed, as zarr only understands numpy dtypes and infers them.
+            )
+        else:
+            if MPI_WORLD.rank == 0:
+                zarr_array[:] = dndarray.larray.cpu().numpy()
+
+        MPI_WORLD.Barrier()
diff --git a/heat/core/linalg/__init__.py b/heat/core/linalg/__init__.py
index d4a2b1f972..d3a75b2ae3 100644
--- a/heat/core/linalg/__init__.py
+++ b/heat/core/linalg/__init__.py
@@ -7,3 +7,5 @@
 from .qr import *
 from .svdtools import *
 from .svd import *
+from .polar import *
+from .eigh import *
diff --git a/heat/core/linalg/basics.py b/heat/core/linalg/basics.py
index 2c0a786138..7a403b08fd 100644
--- a/heat/core/linalg/basics.py
+++ b/heat/core/linalg/basics.py
@@ -24,8 +24,12 @@
 from .. import statistics
 from .. import stride_tricks
 from .. import types
+from ..random import randn
+from .qr import qr
+from .solver import solve_triangular
 
 __all__ = [
+    "condest",
     "cross",
     "det",
     "dot",
@@ -45,6 +49,116 @@
 ]
 
 
+def _estimate_largest_singularvalue(A: DNDarray, algorithm: str = "fro") -> DNDarray:
+    """
+    Computes an upper estimate for the largest singular value of the input 2D DNDarray.
+
+    Parameters
+    ----------
+    A : DNDarray
+        The matrix, i.e., a 2D DNDarray, for which the largest singular value should be estimated.
+    algorithm : str
+        The algorithm to use for the estimation. Currently, only "fro" (default) is implemented.
+        If "fro" is chosen, the Frobenius norm of the matrix is used as an upper estimate.
+    """
+    if not isinstance(algorithm, str):
+        raise TypeError(
+            f"Parameter 'algorithm' needs to be a string, but is {algorithm} with data type {type(algorithm)}."
+        )
+    if algorithm == "fro":
+        return matrix_norm(A, ord="fro").squeeze()
+    else:
+        raise NotImplementedError("So far, only algorithm='fro' is implemented.")
+
+
+def condest(
+    A: DNDarray, p: Union[int, str] = None, algorithm: str = "randomized", params: dict = None
+) -> DNDarray:
+    """
+    Computes a (possibly randomized) upper estimate of the l2-condition number of the input 2D DNDarray.
+
+    Parameters
+    ----------
+    A : DNDarray
+        The matrix, i.e., a 2D DNDarray, for which the condition number shall be estimated.
+    p : int or str (optional)
+        The norm to use for the condition number computation. If None, the l2-norm (default, p=2) is used.
+        So far, only p=2 is implemented.
+    algorithm : str
+        The algorithm to use for the estimation. Currently, only "randomized" (default) is implemented.
+    params : dict (optional)
+        A dictionary of parameters required for the chosen algorithm; if not provided, default values for the respective algorithm are chosen.
+        If `algorithm="randomized"` the number of random samples to use can be specified under the key "nsamples"; default is 10.
+
+    Notes
+    -----
+    The "randomized" algorithm follows the approach described in [1]; note that in the paper, the condition number w.r.t. the Frobenius norm is actually estimated.
+    However, this yields an upper bound for the condition number w.r.t. the l2-norm as well.
+
+    References
+    ----------
+    [1] T. Gudmundsson, C. S. Kenney, and A. J. Laub. Small-Sample Statistical Estimates for Matrix Norms. SIAM Journal on Matrix Analysis and Applications 1995 16:3, 776-792.
+    """
+    if p is None:
+        p = 2
+    if p != 2:
+        raise ValueError(
+            f"Only the case p=2 (condition number w.r.t. the Euclidean norm) is implemented so far, but input was p={p} (type: {type(p)})."
+        )
+    if not isinstance(algorithm, str):
+        raise TypeError(
+            f"Parameter 'algorithm' needs to be a string, but is {algorithm} with data type {type(algorithm)}."
+ ) + if algorithm == "randomized": + if params is None: + nsamples = 10 # set default value + else: + if not isinstance(params, dict) or "nsamples" not in params: + raise TypeError( + "If not None, 'params' needs to be a dictionary containing the number of samples under the key 'nsamples'." + ) + if not isinstance(params["nsamples"], int) or params["nsamples"] <= 0: + raise ValueError( + f"The number of samples needs to be a positive integer, but is {params['nsamples']} with data type {type(params['nsamples'])}." + ) + nsamples = params["nsamples"] + + m = A.shape[0] + n = A.shape[1] + + if n > m: + # the algorithm only works for m >= n, but fortunately, the condition number (w.r.t. l2-norm) is invariant under transposition + return condest(A.T, p=p, algorithm=algorithm, params=params) + + _, R = qr(A, mode="r") # only R factor is computed in QR + + # random samples from unit sphere + # regarding the split: if A.split == 1, then n is probably large and we should split along an axis of size n; otherwise, both n and nsamples should be small + Q, R_not_used = qr( + randn( + n, + nsamples, + dtype=A.dtype, + split=0 if A.split == 1 else None, + device=A.device, + comm=A.comm, + ) + ) + del R_not_used + + est = ( + matrix_norm(R @ Q) + * A.dtype((m / nsamples) ** 0.5, comm=A.comm) + * matrix_norm(solve_triangular(R, Q)) + ) + + return est.squeeze() + else: + raise NotImplementedError( + "So far only algorithm='randomized' is implemented. Please open an issue on GitHub if you would like to suggest implementing another algorithm." + ) + + def cross( a: DNDarray, b: DNDarray, axisa: int = -1, axisb: int = -1, axisc: int = -1, axis: int = -1 ) -> DNDarray: @@ -174,7 +288,7 @@ def det(a: DNDarray) -> DNDarray: Examples -------- - >>> a = ht.array([[-2,-1,2],[2,1,4],[-3,3,-1]]) + >>> a = ht.array([[-2, -1, 2], [2, 1, 4], [-3, 3, -1]]) >>> ht.linalg.det(a) DNDarray(54., dtype=ht.float64, device=cpu:0, split=None) """ @@ -328,7 +442,7 @@ def inv(a: DNDarray) -> DNDarray: Examples -------- - >>> a = ht.array([[1., 2], [2, 3]]) + >>> a = ht.array([[1.0, 2], [2, 3]]) >>> ht.linalg.inv(a) DNDarray([[-3., 2.], [ 2., -1.]], dtype=ht.float32, device=cpu:0, split=None) @@ -347,7 +461,13 @@ def inv(a: DNDarray) -> DNDarray: # no split in the square matrices if not a.is_distributed() or a.split < a.ndim - 2: - data = torch.inverse(a.larray) + try: + data = torch.inverse(a.larray) + except RuntimeError as e: + raise RuntimeError(e) + # torch.linalg.inv does not raise RuntimeError on MPS when inversion fails + if data.is_mps and torch.any(data.isnan()): + raise RuntimeError("linalg.inv: inversion could not be performed") return DNDarray( data, a.shape, @@ -428,23 +548,23 @@ def matmul(a: DNDarray, b: DNDarray, allow_resplit: bool = False) -> DNDarray: Batched inputs (with batch dimensions being leading dimensions) are allowed; see also the Notes below. Parameters - ----------- + ---------- a : DNDarray - matrix :math:`L \\times P` or vector :math:`P` or batch of matrices/vectors: :math:`B_1 \\times ... \\times B_k [\\times L] \\times P` + matrix :math:`L \\times P` or vector :math:`P` or batch of matrices: :math:`B_1 \\times ... \\times B_k \\times L \\times P` b : DNDarray - matrix :math:`P \\times Q` or vector :math:`P` or batch of matrices/vectors: :math:`B_1 \\times ... \\times B_k \\times P [\\times Q]` + matrix :math:`P \\times Q` or vector :math:`P` or batch of matrices: :math:`B_1 \\times ... 
\\times B_k \\times P \\times Q` allow_resplit : bool, optional Whether to distribute ``a`` in the case that both ``a.split is None`` and ``b.split is None``. Default is ``False``. If ``True``, if both are not split then ``a`` will be distributed in-place along axis 0. Notes - ----------- + ----- - For batched inputs, batch dimensions must coincide and if one matrix is split along a batch axis the other must be split along the same axis. - - If ``a`` or ``b`` is a (possibly batched) vector the result will also be a (possibly batched) vector. + - If ``a`` or ``b`` is a vector the result will also be a vector. - We recommend to avoid the particular split combinations ``1``-``0``, ``None``-``0``, and ``1``-``None`` (for ``a.split``-``b.split``) due to their comparably high memory consumption, if possible. Applying ``DNDarray.resplit_`` or ``heat.resplit`` on one of the two factors before calling ``matmul`` in these situations might improve performance of your code / might avoid memory bottlenecks. References - ----------- + ---------- [1] R. Gu, et al., "Improving Execution Concurrency of Large-scale Matrix Multiplication on Distributed Data-parallel Platforms," IEEE Transactions on Parallel and Distributed Systems, vol 28, no. 9. 2017. \n @@ -453,7 +573,7 @@ def matmul(a: DNDarray, b: DNDarray, allow_resplit: bool = False) -> DNDarray: Workshops (IPDPSW), Vancouver, BC, 2018, pp. 877-882. Examples - ----------- + -------- >>> a = ht.ones((n, m), split=1) >>> a[0] = ht.arange(1, m + 1) >>> a[:, -1] = ht.arange(1, n + 1).larray @@ -529,6 +649,10 @@ def matmul(a: DNDarray, b: DNDarray, allow_resplit: bool = False) -> DNDarray: raise NotImplementedError( "Both input matrices have to be split along the same batch axis!" ) + if vector_flag: # batched matrix vector multiplication not supported + raise NotImplementedError( + "Batched matrix-vector multiplication is not supported, try using expand_dims to make it a batched matrix-matrix multiplication." + ) comm = a.comm ndim = max(a.ndim, b.ndim) @@ -695,11 +819,11 @@ def matmul(a: DNDarray, b: DNDarray, allow_resplit: bool = False) -> DNDarray: kB, a.gshape[-1] ) # shouldnt this always be kB and be the same as for split 11? - if a.lshape[-1] % kB != 0 or ( - kB == 1 and a.lshape[-1] != 1 - ): # does kb == 1 imply a.lshape[-1] > 1? + if (kB == 1 and a.lshape[-1] != 1) or a.lshape[ + -1 + ] % kB != 0: # does kb == 1 imply a.lshape[-1] > 1? 
rem_a = 1 - if b.lshape[-2] % kB != 0 or (kB == 1 and b.lshape[-2] != 1): + if (kB == 1 and b.lshape[-2] != 1) or b.lshape[-2] % kB != 0: rem_b = 1 # get the lshape map to determine what needs to be sent where as well as M and N @@ -1236,9 +1360,9 @@ def matrix_norm( Examples -------- - >>> ht.matrix_norm(ht.array([[1,2],[3,4]])) + >>> ht.matrix_norm(ht.array([[1, 2], [3, 4]])) DNDarray([[5.4772]], dtype=ht.float64, device=cpu:0, split=None) - >>> ht.matrix_norm(ht.array([[1,2],[3,4]]), keepdims=True, ord=-1) + >>> ht.matrix_norm(ht.array([[1, 2], [3, 4]]), keepdims=True, ord=-1) DNDarray([[4.]], dtype=ht.float64, device=cpu:0, split=None) """ sanitation.sanitize_in(x) @@ -1382,9 +1506,9 @@ def norm( DNDarray(7.7460, dtype=ht.float32, device=cpu:0, split=None) >>> LA.norm(b) DNDarray(7.7460, dtype=ht.float32, device=cpu:0, split=None) - >>> LA.norm(b, ord='fro') + >>> LA.norm(b, ord="fro") DNDarray(7.7460, dtype=ht.float32, device=cpu:0, split=None) - >>> LA.norm(a, float('inf')) + >>> LA.norm(a, float("inf")) DNDarray([4.], dtype=ht.float32, device=cpu:0, split=None) >>> LA.norm(b, ht.inf) DNDarray([9.], dtype=ht.float32, device=cpu:0, split=None) @@ -1416,8 +1540,8 @@ def norm( DNDarray([3.7417, 4.2426], dtype=ht.float64, device=cpu:0, split=None) >>> LA.norm(c, axis=1, ord=1) DNDarray([6., 6.], dtype=ht.float64, device=cpu:0, split=None) - >>> m = ht.arange(8).reshape(2,2,2) - >>> LA.norm(m, axis=(1,2)) + >>> m = ht.arange(8).reshape(2, 2, 2) + >>> LA.norm(m, axis=(1, 2)) DNDarray([ 3.7417, 11.2250], dtype=ht.float32, device=cpu:0, split=None) >>> LA.norm(m[0, :, :]), LA.norm(m[1, :, :]) (DNDarray(3.7417, dtype=ht.float32, device=cpu:0, split=None), DNDarray(11.2250, dtype=ht.float32, device=cpu:0, split=None)) @@ -2329,11 +2453,11 @@ def vdot(x1: DNDarray, x2: DNDarray) -> DNDarray: Examples -------- - >>> a = ht.array([1+1j, 2+2j]) - >>> b = ht.array([1+2j, 3+4j]) - >>> ht.vdot(a,b) + >>> a = ht.array([1 + 1j, 2 + 2j]) + >>> b = ht.array([1 + 2j, 3 + 4j]) + >>> ht.vdot(a, b) DNDarray([(17+3j)], dtype=ht.complex64, device=cpu:0, split=None) - >>> ht.vdot(b,a) + >>> ht.vdot(b, a) DNDarray([(17-3j)], dtype=ht.complex64, device=cpu:0, split=None) """ x1 = manipulations.flatten(x1) @@ -2366,7 +2490,7 @@ def vecdot( Examples -------- - >>> ht.vecdot(ht.full((3,3,3),3), ht.ones((3,3)), axis=0) + >>> ht.vecdot(ht.full((3, 3, 3), 3), ht.ones((3, 3)), axis=0) DNDarray([[9., 9., 9.], [9., 9., 9.], [9., 9., 9.]], dtype=ht.float32, device=cpu:0, split=None) @@ -2433,9 +2557,9 @@ def vector_norm( Examples -------- - >>> ht.vector_norm(ht.array([1,2,3,4])) + >>> ht.vector_norm(ht.array([1, 2, 3, 4])) DNDarray([5.4772], dtype=ht.float64, device=cpu:0, split=None) - >>> ht.vector_norm(ht.array([[1,2],[3,4]]), axis=0, ord=1) + >>> ht.vector_norm(ht.array([[1, 2], [3, 4]]), axis=0, ord=1) DNDarray([[4., 6.]], dtype=ht.float64, device=cpu:0, split=None) """ sanitation.sanitize_in(x) diff --git a/heat/core/linalg/eigh.py b/heat/core/linalg/eigh.py new file mode 100644 index 0000000000..0a7524447d --- /dev/null +++ b/heat/core/linalg/eigh.py @@ -0,0 +1,309 @@ +""" +Implements Symmetric Eigenvalue Decomposition +""" + +import numpy as np +import collections +import torch +from typing import Type, Callable, Dict, Any, TypeVar, Union, Tuple + +from ..dndarray import DNDarray +from .. import factories +from .. 
import types
+from ..linalg import matrix_norm, vector_norm, matmul, qr, polar
+from ..indexing import where
+from ..random import randn
+from ..devices import Device
+from ..manipulations import vstack, hstack, concatenate, diag, balance
+from .. import statistics
+from mpi4py import MPI
+from ..sanitation import sanitize_in_nd_realfloating
+
+
+__all__ = ["eigh"]
+
+
+def _subspaceiteration(
+    A: DNDarray,
+    C: DNDarray,
+    silent: bool = True,
+    safetyparam: int = 3,
+    maxit: int = None,
+    tol: float = None,
+    depth: int = 0,
+) -> Tuple[DNDarray, int]:
+    """
+    Auxiliary function that implements the subspace iteration as required for symmetric eigenvalue decomposition
+    via polar decomposition; cf. Ref. 2 below. The algorithm for subspace iteration itself is taken from Ref. 1,
+    Algorithm 3 in Sect. 5.1.
+
+    Given a symmetric matrix ``A`` and a matrix ``C`` that is the orthogonal projection onto an invariant
+    subspace of A, this function computes and returns an orthogonal matrix ``Q`` such that Q = [V_1 V_2] with
+    C = V_1 V_1.T. Moreover, the dimension of the invariant subspace, i.e., the number of columns of V_1, is
+    returned as well.
+
+    References
+    ----------
+    1. Nakatsukasa, Y., & Higham, N. J. (2013). Stable and efficient spectral divide and conquer algorithms for
+       Hermitian eigenproblems. SIAM Journal on Scientific Computing, 35(3).
+    2. Nakatsukasa, Y., & Freund, R. W. (2016). Computing fundamental matrix decompositions accurately via the
+       matrix sign function in two iterations: The power of Zolotarev's functions. SIAM Review, 58(3).
+    """
+    # set parameters for convergence
+    if A.dtype == types.float64:
+        maxit = 3 if maxit is None else maxit
+        tol = 1e-8 if tol is None else tol
+    elif A.dtype == types.float32:
+        maxit = 6 if maxit is None else maxit
+        tol = 1e-4 if tol is None else tol
+    else:
+        raise TypeError(
+            f"Input DNDarray must be of data type float32 or float64, but is of type {A.dtype}."
+        )
+
+    Anorm = matrix_norm(A, ord="fro")
+
+    # this initialization is proposed in Ref. 1, Sect. 5.1
+    k = int(round(matrix_norm(C, ord="fro").item() ** 2))
+    columnnorms = vector_norm(C, axis=0)
+    idx = where(
+        columnnorms
+        >= factories.ones(
+            columnnorms.shape,
+            comm=columnnorms.comm,
+            split=columnnorms.split,
+            device=columnnorms.device,
+        )
+        * statistics.percentile(columnnorms, 100.0 * (1 - (k + safetyparam) / columnnorms.shape[0]))
+    )
+    X = C[:, idx].balance()
+
+    # actual subspace iteration
+    it = 1
+    while it < maxit + 1:
+        # enrich X by additional random columns to get a full orthonormal basis by QR
+        X = hstack(
+            [
+                X,
+                randn(
+                    X.shape[0],
+                    X.shape[0] - X.shape[1],
+                    dtype=X.dtype,
+                    device=X.device,
+                    comm=X.comm,
+                    split=X.split,
+                ),
+            ]
+        )
+        Q, _ = qr(X)
+        Q_k = Q[:, :k].balance()
+        Q_k_orth = Q[:, k:].balance()
+        E = (Q_k_orth.T @ A) @ Q_k
+        Enorm = matrix_norm(E, ord="fro")
+        if Enorm / Anorm < tol:
+            # exit if success
+            if A.comm.rank == 0 and not silent:
+                print("\t" * depth + f" Number of subspace iterations: {it}")
+            return Q, k
+        # else go on with iteration
+        X = C @ Q_k
+        it += 1
+    # warning if the iteration did not converge within the maximum number of iterations
+    if A.comm.rank == 0 and not silent:
+        print(
+            "\t" * depth
+            + f" Subspace iteration did not converge in {maxit} iterations. \n"
+            + "\t" * depth
+            + f" It holds ||E||_F/||A||_F = {Enorm / Anorm}, which might impair the accuracy of the result."  # noqa E226
+        )
+    return Q, k
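+
+
+# Rough picture of the divide-and-conquer step implemented in _eigh below (illustrative only):
+#   U = polar(A - sigma * I)              -> matrix sign information for the shift sigma
+#   C = (U + I) / 2                       -> orthogonal projector onto an invariant subspace
+#   V, k = _subspaceiteration(A, C, ...)  -> orthogonal V = [V1 V2] with V1 spanning that subspace
+#   V.T @ A @ V = blockdiag(A1, A2)       -> recurse on A1 (k x k) and A2 ((n-k) x (n-k))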
+def _eigh(
+    A: DNDarray,
+    r: int = None,
+    silent: bool = True,
+    r_max: int = 8,
+    depth: int = 0,
+    orig_lsize: int = 0,
+) -> Tuple[DNDarray, DNDarray]:
+    """
+    Auxiliary function for eigh containing the main algorithmic content.
+
+    Inputs are as for the public `eigh`-function, except for:
+    `depth`: an internal variable that is used to track the recursion depth,
+    `orig_lsize`: an internal variable that is used to propagate the local sizes of the original input matrix
+    through the recursions in order to determine when the direct solution of the reduced problems is possible,
+    `r`: a hyperparameter for the computation of the polar decomposition via :func:`heat.linalg.polar` which is
+    applied multiple times in this function. See the documentation of :func:`heat.linalg.polar` for more details.
+    In the actual implementation, this parameter is set to `None` for simplicity.
+    """
+    n = A.shape[0]
+    global_comm = A.comm
+    nprocs = global_comm.Get_size()
+    rank = global_comm.rank
+
+    # direct solution in torch if the problem is small enough
+    if n <= orig_lsize or not A.is_distributed():
+        orig_split = A.split
+        A.resplit_(None)
+        Lambda_loc, Q_loc = torch.linalg.eigh(A.larray)
+        Lambda = factories.array(torch.flip(Lambda_loc, (0,)), split=0, comm=A.comm)
+        V = factories.array(torch.flip(Q_loc, (1,)), split=orig_split, comm=A.comm)
+        A.resplit_(orig_split)
+        return Lambda, V
+
+    if orig_lsize == 0:
+        orig_lsize = min(A.lshape_map[:, A.split])
+
+    # now we handle the main case: Zolo-PD is used to reduce the problem to two independent problems
+    sigma = statistics.median(diag(A))
+
+    U = polar.polar(
+        A
+        - sigma * factories.eye((n, n), dtype=A.dtype, device=A.device, comm=A.comm, split=A.split),
+        r,
+        False,
+    )
+
+    V, k = _subspaceiteration(
+        A,
+        0.5
+        * (U + factories.eye((n, n), dtype=A.dtype, device=A.device, comm=A.comm, split=A.split)),
+        silent=silent,
+        depth=depth,
+    )
+    A = V.T @ A @ V
+
+    if A.comm.rank == 0 and not silent:
+        print(
+            "\t" * depth
+            + f"At depth {depth}: Zolo-PD(r={'auto' if r is None else r}) on {nprocs} processes reduced symmetric eigenvalue problem of size {n} to"
+        )
+        print(
+            "\t" * depth
+            + f" two independent problems of size {k} and {n - k} respectively."
+ ) + + # from the "global" A, two independent "local" A's are created + # the number of processes per local array is roughly proportional to their size with the constraint that + # each "local" A needs to get at least one process + nprocs1 = max(1, min(nprocs - 1, round(k / n * nprocs))) + nprocs2 = nprocs - nprocs1 + new_lshapes = torch.tensor( + [k // nprocs1 + (i < k % nprocs1) for i in range(nprocs1)] + + [(n - k) // nprocs2 + (i < (n - k) % nprocs2) for i in range(nprocs2)] + ) + new_lshape_map = A.lshape_map + new_lshape_map[:, A.split] = new_lshapes + A.redistribute_(target_map=new_lshape_map) + local_comm = A.comm.Split(color=rank < nprocs1, key=rank) + if A.split == 1: + A_local = factories.array( + A.larray[:k, :] if rank < nprocs1 else A.larray[k:, :], + comm=local_comm, + is_split=A.split, + ) + else: + A_local = factories.array( + A.larray[:, :k] if rank < nprocs1 else A.larray[:, k:], + comm=local_comm, + is_split=A.split, + ) + + Lambda_local, V_local = _eigh(A_local, r, silent, r_max, depth + 1, orig_lsize) + + Lambda = factories.array(Lambda_local.larray, is_split=0, comm=A.comm) + V_local_larray = V_local.larray + if A.split == 0: + if rank < nprocs1: + V_local_larray = torch.hstack( + [ + V_local_larray, + torch.zeros(V_local_larray.shape[0], n - k, device=V_local.device.torch_device), + ] + ) + else: + V_local_larray = torch.hstack( + [ + torch.zeros(V_local_larray.shape[0], k, device=V_local.device.torch_device), + V_local_larray, + ] + ) + else: + if rank < nprocs1: + V_local_larray = torch.vstack( + [ + V_local_larray, + torch.zeros(n - k, V_local_larray.shape[1], device=V_local.device.torch_device), + ] + ) + else: + V_local_larray = torch.vstack( + [ + torch.zeros(k, V_local_larray.shape[1], device=V_local.device.torch_device), + V_local_larray, + ] + ) + V_new = factories.array(V_local_larray, is_split=A.split, comm=A.comm, device=A.device) + V.balance_() + V_new.balance_() + V = V @ V_new + + if A.comm.rank == 0 and not silent: + print( + "\t" * depth + + f"At depth {depth}: solutions of two independent problems of size {k} and {n - k} have been merged successfully." + ) + + return Lambda, V + + +def eigh( + A: DNDarray, + r_max_zolopd: int = 8, + silent: bool = True, +) -> Tuple[DNDarray, DNDarray]: + """ + Computes the symmetric eigenvalue decomposition of a symmetric n x n - matrix A, provided as a DNDarray. + + The function returns DNDarrays Lambda (shape (n,) with split = 0) and V (shape (n,n)) such that + A = V @ diag(Lambda) @ V^T, where Lambda contains the eigenvalues of A and V is an orthonormal matrix + containing the corresponding eigenvectors as columns. + + Parameters + ---------- + A : DNDarray + The input matrix. Must be symmetric. + r_max_zolopd : int, optional + This is a hyperparameter for the computation of the polar decomposition via :func:`heat.linalg.polar` which is + applied multiple times in this function. See the documentation of :func:`heat.linalg.polar` for more details on its + meaning and the respective default value. + silent : bool, optional + If True (default), suppresses output messages; otherwise, some information on the recursion is printed to the console. + + Notes + ----- + Unlike the :func:`torch.linalg.eigh` function, the eigenvalues are returned in descending order. + Note that no check of symmetry is performed on the input matrix A; thus, applying this function to a non-symmetric matrix may + result in unpredictable behaviour without a specific error message pointing to this issue. 
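+
+    For illustration, a small round-trip check might look as follows (an illustrative sketch;
+    shapes and the tolerance are arbitrary):
+
+    >>> B = ht.random.randn(10, 10, split=0)
+    >>> A = B + B.T
+    >>> Lambda, V = ht.linalg.eigh(A)
+    >>> ht.allclose(V @ ht.diag(Lambda) @ V.T, A, atol=1e-4)
+    True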
+
+    The algorithm used for the computation of the symmetric eigenvalue decomposition is based on the Zolotarev polar decomposition;
+    see Algorithm 5.2 in:
+
+    Nakatsukasa, Y., & Freund, R. W. (2016). Computing fundamental matrix decompositions accurately via the
+    matrix sign function in two iterations: The power of Zolotarev's functions. SIAM Review, 58(3).
+
+    See Also
+    --------
+    :func:`heat.linalg.polar`
+    """
+    sanitize_in_nd_realfloating(A, "A", [2])
+    if A.shape[0] != A.shape[1]:
+        raise ValueError(
+            f"Input matrix must be symmetric and, consequently, square, but input shape was {A.shape[0]} x {A.shape[1]}."
+        )
+    if not isinstance(r_max_zolopd, int) or r_max_zolopd < 1 or r_max_zolopd > 8:
+        raise ValueError(
+            f"If provided, parameter r_max_zolopd must be an integer between 1 and 8, but was {r_max_zolopd} of type {type(r_max_zolopd)}."
+        )
+    return _eigh(A, None, silent, r_max_zolopd, 0, 0)
diff --git a/heat/core/linalg/polar.py b/heat/core/linalg/polar.py
new file mode 100644
index 0000000000..ef3e58268e
--- /dev/null
+++ b/heat/core/linalg/polar.py
@@ -0,0 +1,370 @@
+"""
+Implements polar decomposition (PD)
+"""
+
+import numpy as np
+import collections
+import torch
+from typing import Type, Callable, Dict, Any, TypeVar, Union, Tuple
+
+from ..communication import MPICommunication, MPI
+from ..dndarray import DNDarray
+from .. import factories
+from .. import types
+from . import matrix_norm, vector_norm, matmul, qr, solve_triangular
+from .basics import _estimate_largest_singularvalue, condest
+from ..indexing import where
+from ..random import randn
+from ..devices import Device
+from ..manipulations import vstack, hstack, concatenate, diag, balance
+from ..exponential import sqrt
+from .. import statistics
+
+from scipy.special import ellipj
+from scipy.special import ellipkm1
+
+__all__ = ["polar"]
+
+
+def _zolopd_n_iterations(r: int, kappa: float) -> int:
+    """
+    Returns the number of iterations required in the Zolotarev-PD algorithm.
+    See Table 3.1 in: Nakatsukasa, Y., & Freund, R. W. (2016). Computing Fundamental Matrix Decompositions Accurately via the Matrix Sign Function in Two Iterations: The Power of Zolotarev's Functions. SIAM Review, 58(3), DOI: https://doi.org/10.1137/140990334
+
+    Inputs are `r` and `kappa` (named as in the paper), and the output is the number of iterations.
+    """
+    if kappa <= 1e2:
+        its = [4, 3, 2, 2, 2, 2, 2, 2]
+    elif kappa <= 1e3:
+        its = [3, 3, 2, 2, 2, 2, 2, 2]
+    elif kappa <= 1e5:
+        its = [5, 3, 3, 3, 2, 2, 2, 2]
+    elif kappa <= 1e7:
+        its = [5, 4, 3, 3, 3, 2, 2, 2]
+    else:
+        its = [6, 4, 3, 3, 3, 3, 3, 2]
+    return its[r - 1]
+
+
+def _compute_zolotarev_coefficients(
+    r: int, ell: float, device: str, dtype: types.datatype = types.float64
+) -> Tuple[DNDarray, DNDarray, DNDarray]:
+    """
+    Computes c=(c_i)_i defined in equation (3.4), as well as a=(a_j)_j and Mhat defined in formulas (4.2)/(4.3) of the paper Nakatsukasa, Y., & Freund, R. W. (2016). Computing Fundamental Matrix Decompositions Accurately via the Matrix Sign Function in Two Iterations: The Power of Zolotarev's Functions. SIAM Review, 58(3), DOI: https://doi.org/10.1137/140990334.
+    Evaluations of the respective complete elliptic integral of the first kind and the Jacobi elliptic functions are imported from SciPy.
+
+    Inputs are `r` and `ell` (named as in the paper), as well as the Heat data type `dtype` of the output (required for reasons of consistency).
+    Output is a tuple containing the vectors `a` and `c` as DNDarrays and `Mhat`.
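+
+    Note that `scipy.special.ellipkm1(p)` evaluates the complete elliptic integral of the first
+    kind at parameter m = 1 - p; it is used below instead of `ellipk` because it remains accurate
+    for the very small values of `ell` that arise from large condition numbers.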
+ """ + uu = np.arange(1, 2 * r + 1) * ellipkm1(ell**2) / (2 * r + 1) + ellipfcts = np.asarray(ellipj(uu, 1 - ell**2)[:2]) + cc = ell**2 * ellipfcts[0, :] ** 2 / ellipfcts[1, :] ** 2 + aa = np.zeros(r) + Mhat = 1 + for j in range(1, r + 1): + p1 = 1 + p2 = 1 + for k in range(1, r + 1): + p1 *= cc[2 * j - 2] - cc[2 * k - 1] + if k != j: + p2 *= cc[2 * j - 2] - cc[2 * k - 2] + aa[j - 1] = -p1 / p2 + Mhat *= (1 + cc[2 * j - 2]) / (1 + cc[2 * j - 1]) + return ( + factories.array(cc, dtype=dtype, split=None, device=device), + factories.array(aa, dtype=dtype, split=None, device=device), + factories.array(Mhat, dtype=dtype, split=None, device=device), + ) + + +def _in_place_qr_with_q_only(A: DNDarray, procs_to_merge: int = 2) -> None: + r""" + Input A and procs_to_merge are as in heat.linalg.qr; difference it that this routine modified A in place and replaces it with Q. + """ + if not A.is_distributed() or A.split < A.ndim - 2: + # handle the case of a single process or split=None: just PyTorch QR + # difference to heat.linalg.qr: we only return Q and put it directly in place of A + A.larray, R = torch.linalg.qr(A.larray, mode="reduced") + del R + + elif A.split == A.ndim - 1: + # handle the case that A is split along the columns + # unlike in heat.linalg.qr, we know by assumption of Zolo-PD that A has at least as many rows as columns + + nprocs = A.comm.size + with torch.no_grad(): + for i in range(nprocs): + # this loop goes through all the column-blocks (i.e. local arrays) of the matrix + # this corresponds to the loop over all columns in classical Gram-Schmidt + A_lshapes = A.lshape_map + if i < nprocs - 1: + if A.comm.rank > i: + Q_buf = torch.zeros( + tuple(A_lshapes[i, :]), + dtype=A.larray.dtype, + device=A.device.torch_device, + ) + color = 0 if A.comm.rank < i else 1 + sub_comm = A.comm.Split(color, A.comm.rank) + + if A.comm.rank == i: + # orthogonalize the current block of columns by utilizing PyTorch QR + Q, R = torch.linalg.qr(A.larray, mode="reduced") + del R + A.larray[...] = Q + del Q + if i < nprocs - 1: + Q_buf = A.larray + + if i < nprocs - 1 and A.comm.rank >= i: + sub_comm.Bcast(Q_buf, root=0) + + if A.comm.rank > i: + # subtract the contribution of the current block of columns from the remaining columns + R_loc = torch.transpose(Q_buf, -2, -1) @ A.larray + A.larray -= Q_buf @ R_loc + del R_loc, Q_buf + + else: + A, r = qr(A) + del r + + +def polar( + A: DNDarray, + r: int = None, + calcH: bool = True, + condition_estimate: float = 1.0e16, + silent: bool = True, + r_max: int = 8, +) -> Tuple[DNDarray, DNDarray]: + """ + Computes the so-called polar decomposition of the input 2D DNDarray ``A``, i.e., it returns the orthogonal matrix ``U`` and the symmetric, positive definite + matrix ``H`` such that ``A = U @ H``. + + Input + ----- + A : ht.DNDarray, + The input matrix for which the polar decomposition is computed; + must be two-dimensional, of data type float32 or float64, and must have at least as many rows as columns. + r : int, optional, default: None + The parameter r used in the Zolotarev-PD algorithm; if provided, must be an integer between 1 and 8 that divides the number of MPI processes. + Higher values of r lead to faster convergence, but memory consumption is proportional to r. + If not provided, the largest 1 <= r <= r_max that divides the number of MPI processes is chosen. + calcH : bool, optional, default: True + If True, the function returns the symmetric, positive definite matrix H. If False, only the orthogonal matrix U is returned. 
+    condition_estimate : float, optional, default: 1.e16
+        This argument allows you to provide an estimate for the condition number of the input matrix ``A``, if such an estimate is already known.
+        If a positive number greater than 1., this value is used as an estimate for the condition number of A.
+        If smaller than or equal to 1., the condition number is estimated internally.
+        The default value of 1.e16 is the worst case scenario considered in [1].
+    silent : bool, optional, default: True
+        If True, the function does not print any output. If False, some information is printed during the computation.
+    r_max : int, optional, default: 8
+        See the description of r for the meaning; r_max is only taken into account if r is not provided.
+
+
+    Notes
+    -----
+    The implementation follows Algorithm 5.1 in Reference [1]; however, instead of switching from QR to Cholesky decomposition depending on the condition number,
+    we stick to QR decomposition in all iterations.
+
+    References
+    ----------
+    [1] Nakatsukasa, Y., & Freund, R. W. (2016). Computing Fundamental Matrix Decompositions Accurately via the Matrix Sign Function in Two Iterations: The Power of Zolotarev's Functions. SIAM Review, 58(3), DOI: https://doi.org/10.1137/140990334.
+    """
+    # check whether input is DNDarray of correct shape
+    if not isinstance(A, DNDarray):
+        raise TypeError(f"Input ``A`` needs to be a DNDarray but is {type(A)}.")
+    if not A.ndim == 2:
+        raise ValueError(f"Input ``A`` needs to be a 2D DNDarray, but its dimension is {A.ndim}.")
+    if A.shape[0] < A.shape[1]:
+        raise ValueError(
+            f"Input ``A`` must have at least as many rows as columns, but has shape {A.shape}."
+        )
+
+    # check if A is a real floating point matrix and choose tolerances tol accordingly
+    if A.dtype == types.float32:
+        tol = 1.19e-7
+    elif A.dtype == types.float64:
+        tol = 2.22e-16
+    else:
+        raise TypeError(
+            f"Input ``A`` must be of data type float32 or float64 but has data type {A.dtype}"
+        )
+
+    # check if input for r is reasonable
+    if r is not None:
+        if not isinstance(r, int) or r < 1 or r > 8:
+            raise ValueError(
+                f"If specified, input ``r`` must be an integer between 1 and 8, but is {r} of data type {type(r)}."
+            )
+        if A.is_distributed() and (A.comm.size % r != 0 or A.comm.size == r):
+            raise ValueError(
+                f"If specified, input ``r`` must be a non-trivial divisor of the number of MPI processes, but r={r} and A.comm.size={A.comm.size}."
+            )
+    else:
+        if not isinstance(r_max, int) or r_max < 1 or r_max > 8:
+            raise ValueError(
+                f"If specified, input ``r_max`` must be an integer between 1 and 8, but is {r_max} of data type {type(r_max)}."
+            )
+        for i in range(r_max, 0, -1):
+            if A.comm.size % i == 0 and A.comm.size // i > 1:
+                r = i
+                break
+        if not silent:
+            if A.comm.rank == 0:
+                print(f"Automatically chosen r={r} (r_max = {r_max}, {A.comm.size} processes).")
+
+    # check if input for condition_estimate is reasonable
+    if not isinstance(condition_estimate, float):
+        raise TypeError(
+            f"If specified, input ``condition_estimate`` must be a float but is {type(condition_estimate)}."
+ ) + + # early out for the non-distributed case + if not A.is_distributed(): + U, s, vh = torch.linalg.svd(A.larray, full_matrices=False) + U @= vh + H = vh.T @ torch.diag(s) @ vh + if calcH: + return factories.array(U, is_split=None, comm=A.comm), factories.array( + H, is_split=None, comm=A.comm + ) + else: + return factories.array(U, is_split=None, comm=A.comm) + + alpha = _estimate_largest_singularvalue(A).item() + + if condition_estimate <= 1.0: + kappa = condest(A).item() + else: + kappa = condition_estimate + + if A.comm.rank == 0 and not silent: + print( + f"Condition number estimate: {kappa:2.2e} / Estimate for largest singular value: {alpha:2.2e}." + ) + + # each of these communicators has size r, along these communicators we parallelize the r many QR decompositions that are performed in parallel + horizontal_comm = A.comm.Split(A.comm.rank // r, A.comm.rank) + + # each of these communicators has size MPI_WORLD.size / r and will carray a full copy of X for QR decomposition + vertical_comm = A.comm.Split(A.comm.rank % r, A.comm.rank) + + # in each horizontal communicator, collect the local array of X from all processes + local_shapes = horizontal_comm.allgather(A.lshape[A.split]) + new_local_shape = ( + (sum(local_shapes), A.shape[1]) if A.split == 0 else (A.shape[0], sum(local_shapes)) + ) + counts = tuple(local_shapes) + displacements = tuple(np.cumsum([0] + list(local_shapes))[:-1]) + X_collected_local = torch.zeros( + new_local_shape, dtype=A.dtype.torch_type(), device=A.device.torch_device + ) + horizontal_comm.Allgatherv( + A.larray, (X_collected_local, counts, displacements), recv_axis=A.split + ) + + X = factories.array(X_collected_local, is_split=A.split, comm=vertical_comm) + X.balance_() + X /= alpha + + # iteration counter and maximum number of iterations + it = 0 + itmax = _zolopd_n_iterations(r, kappa) + + # parameters and coefficients, see Ref. [1] for their meaning + ell = 1.0 / kappa + c, a, Mhat = _compute_zolotarev_coefficients(r, ell, A.device, dtype=A.dtype) + + itmax = _zolopd_n_iterations(r, kappa) + while it < itmax: + it += 1 + if not silent: + if A.comm.rank == 0: + print(f"Starting Zolotarev-PD iteration no. 
{it}...") + # remember current X for later convergence check + X_old = X.copy() + cId = factories.eye(X.shape[1], dtype=X.dtype, comm=X.comm, split=X.split, device=X.device) + cId *= c[2 * horizontal_comm.rank].item() ** 0.5 + X = concatenate([X, cId], axis=0) + del cId + if X.split == 0: + Q, R = qr(X) + del R + Q1 = Q[: A.shape[0], :].balance() + Q2 = Q[A.shape[0] :, :].transpose().balance() + Q1Q2 = matmul(Q1, Q2) + del Q1, Q2 + X = X[: A.shape[0], :].balance() + X /= r + else: + _in_place_qr_with_q_only(X) + Q1 = X[: A.shape[0], :].balance() + Q2 = X[A.shape[0] :, :].transpose().balance() + del X + Q1Q2 = matmul(Q1, Q2) + del Q1, Q2 + X = X_old / r + X += a[horizontal_comm.rank].item() / c[2 * horizontal_comm.rank].item() ** 0.5 * Q1Q2 + del Q1Q2 + X *= Mhat.item() + # finally, sum over the horizontal communicators + horizontal_comm.Allreduce(MPI.IN_PLACE, X.larray, op=MPI.SUM) + + # check for convergence and break if tolerance is reached + if it > 1 and matrix_norm(X - X_old, ord="fro") / matrix_norm(X, ord="fro") <= tol ** ( + 1 / (2 * r + 1) + ): + if not silent: + if A.comm.rank == 0: + print(f"Zolotarev-PD iteration converged after {it} iterations.") + break + elif it < itmax: + # if another iteration is necessary, update coefficients and parameters for next iteration + ellold = ell + ell = 1 + for j in range(r): + ell *= (ellold**2 + c[2 * j + 1].item()) / (ellold**2 + c[2 * j].item()) + ell *= Mhat.item() * ellold + if ell >= 1.0: + ell = 1.0 - tol + c, a, Mhat = _compute_zolotarev_coefficients(r, ell, A.device, dtype=A.dtype) + else: + if not silent: + if A.comm.rank == 0: + print( + f"Zolotarev-PD iteration did not reach the convergence criterion after {itmax} iterations, which is most likely due to limited numerical accuracy and/or poor estimation of the condition number. The result may still be useful, but should be handled with care!" + ) + + # as every process has much more data than required, we need to split the result into the parts that are actually + counts = [ + X.lshape[X.split] // horizontal_comm.size + (r < X.lshape[X.split] % horizontal_comm.size) + for r in range(horizontal_comm.size) + ] + displacements = [sum(counts[:r]) for r in range(horizontal_comm.size)] + + if A.split == 1: + U_local = X.larray[ + :, + displacements[horizontal_comm.rank] : displacements[horizontal_comm.rank] + + counts[horizontal_comm.rank], + ] + else: + U_local = X.larray[ + displacements[horizontal_comm.rank] : displacements[horizontal_comm.rank] + + counts[horizontal_comm.rank], + :, + ] + U = factories.array(U_local, is_split=A.split, comm=A.comm, device=A.device) + del X + U.balance_() + + # postprocessing: compute H if requested + if calcH: + H = matmul(U.T, A) + H = 0.5 * (H + H.T.resplit(H.split)) + return U, H.resplit(A.split) + else: + return U diff --git a/heat/core/linalg/qr.py b/heat/core/linalg/qr.py index f3cc5afe5b..4ca0c3fc01 100644 --- a/heat/core/linalg/qr.py +++ b/heat/core/linalg/qr.py @@ -1,5 +1,5 @@ """ -QR decomposition of (distributed) 2-D ``DNDarray``s. +QR decomposition of ``DNDarray``s. """ import collections @@ -7,6 +7,7 @@ from typing import Tuple from ..dndarray import DNDarray +from ..manipulations import concatenate from .. import factories from .. import communication from ..types import float32, float64 @@ -24,16 +25,18 @@ def qr( Factor the matrix ``A`` as *QR*, where ``Q`` is orthonormal and ``R`` is upper-triangular. 
If ``mode = "reduced``, function returns ``QR(Q=Q, R=R)``, if ``mode = "r"`` function returns ``QR(Q=None, R=R)`` + This function also works for batches of matrices; in this case, the last two dimensions of the input array are considered as the matrix dimensions. + The output arrays have the same leading batch dimensions as the input array. + Parameters ---------- - A : DNDarray of shape (M, N) - Array which will be decomposed. So far only 2D arrays with datatype float32 or float64 are supported - For split=0, the matrix must be tall skinny, i.e. the local chunks of data must have at least as many rows as columns. + A : DNDarray of shape (M, N), of shape (...,M,N) in the batched case + Array which will be decomposed. So far only arrays with datatype float32 or float64 are supported mode : str, optional - default "reduced" returns Q and R with dimensions (M, min(M,N)) and (min(M,N), N), respectively. + default "reduced" returns Q and R with dimensions (M, min(M,N)) and (min(M,N), N). Potential batch dimensions are not modified. "r" returns only R, with dimensions (min(M,N), N). procs_to_merge : int, optional - This parameter is only relevant for split=0 and determines the number of processes to be merged at one step during the so-called TS-QR algorithm. + This parameter is only relevant for split=0 (-2, in the batched case) and determines the number of processes to be merged at one step during the so-called TS-QR algorithm. The default is 2. Higher choices might be faster, but will probably result in higher memory consumption. 0 corresponds to merging all processes at once. We only recommend to modify this parameter if you are familiar with the TS-QR algorithm (see the references below). @@ -43,16 +46,20 @@ def qr( - If ``A`` is distributed along the columns (A.split = 1), so will be ``Q`` and ``R``. - - If ``A`` is distributed along the rows (A.split = 0), ``Q`` too will have `split=0`, but ``R`` won't be distributed, i.e. `R. split = None` and a full copy of ``R`` will be stored on each process. + - If ``A`` is distributed along the rows (A.split = 0), ``Q`` too will have `split=0`. ``R`` won't be distributed, i.e. `R. split = None`, if ``A`` is tall-skinny, i.e., if + the largest local chunk of data of ``A`` has at least as many rows as columns. Otherwise, ``R`` will be distributed along the rows as well, i.e., `R.split = 0`. Note that the argument `calc_q` allowed in earlier Heat versions is no longer supported; `calc_q = False` is equivalent to `mode = "r"`. Unlike ``numpy.linalg.qr()``, `ht.linalg.qr` only supports ``mode="reduced"`` or ``mode="r"`` for the moment, since "complete" may result in heavy memory usage. Heats QR function is built on top of PyTorchs QR function, ``torch.linalg.qr()``, using LAPACK (CPU) and MAGMA (CUDA) on - the backend. For split=0, tall-skinny QR (TS-QR) is implemented, while for split=1 a block-wise version of stabilized Gram-Schmidt orthogonalization is used. + the backend. Both cases split=0 and split=1 build on a column-block-wise version of stabilized Gram-Schmidt orthogonalization. + For split=1 (-1, in the batched case), this is directly applied to the local arrays of the input array. + For split=0, a tall-skinny QR (TS-QR) is implemented for the case of tall-skinny matrices (i.e., the largest local chunk of data has at least as many rows as columns), + and extended to non tall-skinny matrices by applying a block-wise version of stabilized Gram-Schmidt orthogonalization. 
References - ----------- + ---------- Basic information about QR factorization/decomposition can be found at, e.g.: - https://en.wikipedia.org/wiki/QR_factorization, @@ -87,65 +94,58 @@ def qr( if procs_to_merge == 0: procs_to_merge = A.comm.size - if A.ndim != 2: - raise ValueError( - f"Array 'A' must be 2 dimensional, buts has {A.ndim} dimensions. \n Please open an issue on GitHub if you require QR for batches of matrices similar to PyTorch." - ) if A.dtype not in [float32, float64]: raise TypeError(f"Array 'A' must have a datatype of float32 or float64, but has {A.dtype}") QR = collections.namedtuple("QR", "Q, R") - if not A.is_distributed(): + if A.ndim == 3: + single_proc_qr = torch.vmap(torch.linalg.qr, in_dims=0, out_dims=0) + else: + single_proc_qr = torch.linalg.qr + + if not A.is_distributed() or A.split < A.ndim - 2: # handle the case of a single process or split=None: just PyTorch QR - Q, R = torch.linalg.qr(A.larray, mode=mode) - R = DNDarray( - R, - gshape=R.shape, - dtype=A.dtype, - split=A.split, - device=A.device, - comm=A.comm, - balanced=True, - ) + Q, R = single_proc_qr(A.larray, mode=mode) + R = factories.array(R, is_split=A.split) if mode == "reduced": - Q = DNDarray( - Q, - gshape=Q.shape, - dtype=A.dtype, - split=A.split, - device=A.device, - comm=A.comm, - balanced=True, - ) + Q = factories.array(Q, is_split=A.split, device=A.device) else: Q = None return QR(Q, R) - if A.split == 1: + if A.split == A.ndim - 1: # handle the case that A is split along the columns # here, we apply a block-wise version of (stabilized) Gram-Schmidt orthogonalization # instead of orthogonalizing each column of A individually, we orthogonalize blocks of columns (i.e. the local arrays) at once - lshapes = A.lshape_map[:, 1] + lshapes = A.lshape_map[:, -1] lshapes_cum = torch.cumsum(lshapes, 0) nprocs = A.comm.size - if A.shape[0] >= A.shape[1]: + if A.shape[-2] >= A.shape[-1]: last_row_reached = nprocs - k = A.shape[1] + k = A.shape[-1] else: - last_row_reached = min(torch.argwhere(lshapes_cum >= A.shape[0]))[0] - k = A.shape[0] + last_row_reached = min(torch.argwhere(lshapes_cum >= A.shape[-2]))[0] + k = A.shape[-2] if mode == "reduced": - Q = factories.zeros(A.shape, dtype=A.dtype, split=1, device=A.device, comm=A.comm) + Q = factories.zeros( + A.shape, dtype=A.dtype, split=A.ndim - 1, device=A.device, comm=A.comm + ) - R = factories.zeros((k, A.shape[1]), dtype=A.dtype, split=1, device=A.device, comm=A.comm) + R = factories.zeros( + (*A.shape[:-2], k, A.shape[-1]), + dtype=A.dtype, + split=A.ndim - 1, + device=A.device, + comm=A.comm, + ) R_shapes = torch.hstack( [ torch.zeros(1, dtype=torch.int32, device=A.device.torch_device), - torch.cumsum(R.lshape_map[:, 1], 0), + torch.cumsum(R.lshape_map[:, -1], 0), ] ) @@ -154,157 +154,209 @@ def qr( for i in range(last_row_reached + 1): # this loop goes through all the column-blocks (i.e. 
local arrays) of the matrix # this corresponds to the loop over all columns in classical Gram-Schmidt + if i < nprocs - 1: - k_loc_i = min(A.shape[0], A.lshape_map[i, 1]) + k_loc_i = min(A.shape[-2], A.lshape_map[i, -1]) Q_buf = torch.zeros( - (A.shape[0], k_loc_i), dtype=A.larray.dtype, device=A.device.torch_device + (*A.shape[:-1], k_loc_i), + dtype=A.larray.dtype, + device=A.device.torch_device, ) if A.comm.rank == i: # orthogonalize the current block of columns by utilizing PyTorch QR - Q_curr, R_loc = torch.linalg.qr(A_columns, mode="reduced") + Q_curr, R_loc = single_proc_qr(A_columns, mode="reduced") if i < nprocs - 1: - Q_buf = Q_curr + Q_buf = Q_curr.contiguous() if mode == "reduced": Q.larray = Q_curr - r_size = R.larray[R_shapes[i] : R_shapes[i + 1], :].shape[0] - R.larray[R_shapes[i] : R_shapes[i + 1], :] = R_loc[:r_size, :] + r_size = R.larray[..., R_shapes[i] : R_shapes[i + 1], :].shape[-2] + R.larray[..., R_shapes[i] : R_shapes[i + 1], :] = R_loc[..., :r_size, :] if i < nprocs - 1: # broadcast the orthogonalized block of columns to all other processes - req = A.comm.Ibcast(Q_buf, root=i) - req.Wait() + A.comm.Bcast(Q_buf, root=i) if A.comm.rank > i: # subtract the contribution of the current block of columns from the remaining columns - R_loc = Q_buf.T @ A_columns + R_loc = torch.transpose(Q_buf, -2, -1) @ A_columns A_columns -= Q_buf @ R_loc - r_size = R.larray[R_shapes[i] : R_shapes[i + 1], :].shape[0] - R.larray[R_shapes[i] : R_shapes[i + 1], :] = R_loc[:r_size, :] + r_size = R.larray[..., R_shapes[i] : R_shapes[i + 1], :].shape[-2] + R.larray[..., R_shapes[i] : R_shapes[i + 1], :] = R_loc[..., :r_size, :] if mode == "reduced": - Q = Q[:, :k].balance() + Q = Q[..., :, :k].balance() else: Q = None return QR(Q, R) - if A.split == 0: - # implementation of TS-QR for split = 0 - # check that data distribution is reasonable for TS-QR (i.e. tall-skinny matrix with also tall-skinny local chunks of data) - if A.lshape_map[:, 0].max().item() < A.shape[1]: - raise ValueError( - "A is split along the rows and the local chunks of data are rectangular with more rows than columns. \n Applying TS-QR in this situation is not reasonable w.r.t. runtime and memory consumption. \n We recomment to split A along the columns instead. \n In case this is not an option for you, please open an issue on GitHub." 
+ if A.split == A.ndim - 2: + # check that data distribution is reasonable for TS-QR + # we regard a matrix with split = 0 as suitable for TS-QR if its largest local chunk of data has at least as many rows as columns + biggest_number_of_local_rows = A.lshape_map[:, -2].max().item() + if biggest_number_of_local_rows < A.shape[-1]: + column_idx = torch.cumsum(A.lshape_map[:, -2], 0) + column_idx = column_idx[column_idx < A.shape[-1]] + column_idx = torch.cat( + ( + torch.tensor([0], device=column_idx.device), + column_idx, + torch.tensor([A.shape[-1]], device=column_idx.device), + ) ) + A_copy = A.copy() + R = A.copy() + # Block-wise Gram-Schmidt orthogonalization, applied to groups of columns + offset = 1 if A.shape[-1] <= A.shape[-2] else 2 + for k in range(len(column_idx) - offset): + # since we only consider a group of columns, TS QR is applied to a tall-skinny matrix + Qnew, Rnew = qr( + A_copy[..., :, column_idx[k] : column_idx[k + 1]], + mode="reduced", + procs_to_merge=procs_to_merge, + ) - current_procs = [i for i in range(A.comm.size)] - current_comm = A.comm - local_comm = current_comm.Split(current_comm.rank // procs_to_merge, A.comm.rank) - Q_loc, R_loc = torch.linalg.qr(A.larray, mode=mode) - R_loc = R_loc.contiguous() # required for all the communication ops lateron - if mode == "reduced": - leave_comm = current_comm.Split(current_comm.rank, A.comm.rank) - - level = 1 - while len(current_procs) > 1: - if A.comm.rank in current_procs and local_comm.size > 1: - # create array to collect the R_loc's from all processes of the process group of at most n_procs_to_merge processes - shapes_R_loc = local_comm.gather(R_loc.shape[0], root=0) - if local_comm.rank == 0: - gathered_R_loc = torch.zeros( - (sum(shapes_R_loc), R_loc.shape[1]), - device=R_loc.device, - dtype=R_loc.dtype, + # usual update of the remaining columns + if R.comm.rank == k: + R.larray[ + ..., + : (column_idx[k + 1] - column_idx[k]), + column_idx[k] : column_idx[k + 1], + ] = Rnew.larray + if R.comm.rank > k: + R.larray[..., :, column_idx[k] : column_idx[k + 1]] *= 0 + if k < len(column_idx) - 2: + coeffs = ( + torch.transpose(Qnew.larray, -2, -1) + @ A_copy.larray[..., :, column_idx[k + 1] :] ) - counts = list(shapes_R_loc) - displs = torch.cumsum( - torch.tensor([0] + shapes_R_loc, dtype=torch.int32), 0 - ).tolist()[:-1] - else: - gathered_R_loc = torch.empty(0, device=R_loc.device, dtype=R_loc.dtype) - counts = None - displs = None - # gather the R_loc's from all processes of the process group of at most n_procs_to_merge processes - local_comm.Gatherv(R_loc, (gathered_R_loc, counts, displs), root=0, axis=0) - # perform QR decomposition on the concatenated, gathered R_loc's to obtain new R_loc - if local_comm.rank == 0: - previous_shape = R_loc.shape - Q_buf, R_loc = torch.linalg.qr(gathered_R_loc, mode=mode) - R_loc = R_loc.contiguous() - else: - Q_buf = torch.empty(0, device=R_loc.device, dtype=R_loc.dtype) + R.comm.Allreduce(communication.MPI.IN_PLACE, coeffs) + if R.comm.rank == k: + R.larray[..., :, column_idx[k + 1] :] = coeffs + A_copy.larray[..., :, column_idx[k + 1] :] -= Qnew.larray @ coeffs if mode == "reduced": - if local_comm.rank == 0: - Q_buf = Q_buf.contiguous() - scattered_Q_buf = torch.empty( - R_loc.shape if local_comm.rank != 0 else previous_shape, - device=R_loc.device, - dtype=R_loc.dtype, - ) - # scatter the Q_buf to all processes of the process group - local_comm.Scatterv((Q_buf, counts, displs), scattered_Q_buf, root=0, axis=0) - del gathered_R_loc, Q_buf + Q = Qnew if k == 0 else 
concatenate((Q, Qnew), axis=-1) + if A.shape[-1] < A.shape[-2]: + R = R[..., : A.shape[-1], :].balance() + if mode == "reduced": + return QR(Q, R) + else: + return QR(None, R) - # for each process in the current processes, broadcast the scattered_Q_buf of this process - # to all leaves (i.e. all original processes that merge to the current process) - if mode == "reduced" and leave_comm.size > 1: + else: + # in this case the input is tall-skinny and we apply the TS-QR algorithm + # it follows the implementation of TS-QR for split = 0 + current_procs = [i for i in range(A.comm.size)] + current_comm = A.comm + local_comm = current_comm.Split(current_comm.rank // procs_to_merge, A.comm.rank) + Q_loc, R_loc = single_proc_qr(A.larray, mode=mode) + R_loc = R_loc.contiguous() + if mode == "reduced": + leave_comm = current_comm.Split(current_comm.rank, A.comm.rank) + + level = 1 + while len(current_procs) > 1: + if A.comm.rank in current_procs and local_comm.size > 1: + # create array to collect the R_loc's from all processes of the process group of at most n_procs_to_merge processes + shapes_R_loc = local_comm.gather(R_loc.shape[-2], root=0) + if local_comm.rank == 0: + gathered_R_loc = torch.zeros( + (*R_loc.shape[:-2], sum(shapes_R_loc), R_loc.shape[-1]), + device=R_loc.device, + dtype=R_loc.dtype, + ) + counts = list(shapes_R_loc) + displs = torch.cumsum( + torch.tensor([0] + shapes_R_loc, dtype=torch.int32), 0 + ).tolist()[:-1] + else: + gathered_R_loc = torch.empty(0, device=R_loc.device, dtype=R_loc.dtype) + counts = None + displs = None + # gather the R_loc's from all processes of the process group of at most n_procs_to_merge processes + local_comm.Gatherv(R_loc, (gathered_R_loc, counts, displs), root=0, axis=-2) + # perform QR decomposition on the concatenated, gathered R_loc's to obtain new R_loc + if local_comm.rank == 0: + previous_shape = R_loc.shape + Q_buf, R_loc = single_proc_qr(gathered_R_loc, mode=mode) + R_loc = R_loc.contiguous() + else: + Q_buf = torch.empty(0, device=R_loc.device, dtype=R_loc.dtype) + if mode == "reduced": + if local_comm.rank == 0: + Q_buf = Q_buf.contiguous() + scattered_Q_buf = torch.empty( + R_loc.shape if local_comm.rank != 0 else previous_shape, + device=R_loc.device, + dtype=R_loc.dtype, + ) + # scatter the Q_buf to all processes of the process group + local_comm.Scatterv( + (Q_buf, counts, displs), scattered_Q_buf, root=0, axis=-2 + ) + del gathered_R_loc, Q_buf + + # for each process in the current processes, broadcast the scattered_Q_buf of this process + # to all leaves (i.e. 
all original processes that merge to the current process) + if mode == "reduced" and leave_comm.size > 1: + try: + scattered_Q_buf_shape = scattered_Q_buf.shape + except UnboundLocalError: + scattered_Q_buf_shape = None + scattered_Q_buf_shape = leave_comm.bcast(scattered_Q_buf_shape, root=0) + if scattered_Q_buf_shape is not None: + # this is needed to ensure that only those Q_loc get updates that are actually part of the current process group + if leave_comm.rank != 0: + scattered_Q_buf = torch.empty( + scattered_Q_buf_shape, device=Q_loc.device, dtype=Q_loc.dtype + ) + leave_comm.Bcast(scattered_Q_buf, root=0) + # update the local Q_loc by multiplying it with the scattered_Q_buf try: - scattered_Q_buf_shape = scattered_Q_buf.shape + Q_loc = Q_loc @ scattered_Q_buf + del scattered_Q_buf except UnboundLocalError: - scattered_Q_buf_shape = None - scattered_Q_buf_shape = leave_comm.bcast(scattered_Q_buf_shape, root=0) - if scattered_Q_buf_shape is not None: - # this is needed to ensure that only those Q_loc get updates that are actually part of the current process group - if leave_comm.rank != 0: - scattered_Q_buf = torch.empty( - scattered_Q_buf_shape, device=Q_loc.device, dtype=Q_loc.dtype + pass + + # update: determine processes to be active at next "merging" level, create new communicator and split it into groups for gathering + current_procs = [ + current_procs[i] for i in range(len(current_procs)) if i % procs_to_merge == 0 + ] + if len(current_procs) > 1: + new_group = A.comm.group.Incl(current_procs) + current_comm = A.comm.Create_group(new_group) + if A.comm.rank in current_procs: + local_comm = communication.MPICommunication( + current_comm.Split(current_comm.rank // procs_to_merge, A.comm.rank) ) - leave_comm.Bcast(scattered_Q_buf, root=0) - # update the local Q_loc by multiplying it with the scattered_Q_buf - try: - Q_loc = Q_loc @ scattered_Q_buf - del scattered_Q_buf - except UnboundLocalError: - pass - - # update: determine processes to be active at next "merging" level, create new communicator and split it into groups for gathering - current_procs = [ - current_procs[i] for i in range(len(current_procs)) if i % procs_to_merge == 0 - ] - if len(current_procs) > 1: - new_group = A.comm.group.Incl(current_procs) - current_comm = A.comm.Create_group(new_group) - if A.comm.rank in current_procs: - local_comm = communication.MPICommunication( - current_comm.Split(current_comm.rank // procs_to_merge, A.comm.rank) - ) - if mode == "reduced": - leave_comm = A.comm.Split(A.comm.rank // procs_to_merge**level, A.comm.rank) - level += 1 - # broadcast the final R_loc to all processes - R_gshape = (A.shape[1], A.shape[1]) - if A.comm.rank != 0: - R_loc = torch.empty(R_gshape, dtype=R_loc.dtype, device=R_loc.device) - A.comm.Bcast(R_loc, root=0) - R = DNDarray( - R_loc, - gshape=R_gshape, - dtype=A.dtype, - split=None, - device=A.device, - comm=A.comm, - balanced=True, - ) - if mode == "r": - Q = None - else: - Q = DNDarray( - Q_loc, - gshape=A.shape, + if mode == "reduced": + leave_comm = A.comm.Split(A.comm.rank // procs_to_merge**level, A.comm.rank) + level += 1 + # broadcast the final R_loc to all processes + R_gshape = (*A.shape[:-2], A.shape[-1], A.shape[-1]) + if A.comm.rank != 0: + R_loc = torch.empty(R_gshape, dtype=R_loc.dtype, device=R_loc.device) + A.comm.Bcast(R_loc, root=0) + R = DNDarray( + R_loc, + gshape=R_gshape, dtype=A.dtype, - split=0, + split=None, device=A.device, comm=A.comm, balanced=True, ) - return QR(Q, R) + if mode == "r": + Q = None + else: + Q = DNDarray( + 
Q_loc, + gshape=A.shape, + dtype=A.dtype, + split=A.split, + device=A.device, + comm=A.comm, + balanced=True, + ) + return QR(Q, R) diff --git a/heat/core/linalg/solver.py b/heat/core/linalg/solver.py index 1a8d156b70..845f4d419a 100644 --- a/heat/core/linalg/solver.py +++ b/heat/core/linalg/solver.py @@ -274,9 +274,10 @@ def lanczos( def solve_triangular(A: DNDarray, b: DNDarray) -> DNDarray: """ - This function provides a solver for (possibly batched) upper triangular systems of linear equations: it returns `x` in `Ax = b`, where `A` is a (possibly batched) upper triangular matrix and + Solver for (possibly batched) upper triangular systems of linear equations: it returns `x` in `Ax = b`, where `A` is a (possibly batched) upper triangular matrix and `b` a (possibly batched) vector or matrix of suitable shape, both provided as input to the function. The implementation builds on the corresponding solver in PyTorch and implements a memory-distributed, MPI-parallel block-wise version thereof. + Parameters ---------- A : DNDarray @@ -339,7 +340,7 @@ def solve_triangular(A: DNDarray, b: DNDarray) -> DNDarray: else: # A not split, b.split == -2 b_lshapes_cum = torch.hstack( [ - torch.zeros(1, dtype=torch.int32, device=tdev), + torch.zeros(1, dtype=torch.int64, device=tdev), torch.cumsum(b.lshape_map[:, -2], 0), ] ) @@ -387,7 +388,7 @@ def solve_triangular(A: DNDarray, b: DNDarray) -> DNDarray: if A.split >= batch_dim: # both splits in la dims A_lshapes_cum = torch.hstack( [ - torch.zeros(1, dtype=torch.int32, device=tdev), + torch.zeros(1, dtype=torch.int64, device=tdev), torch.cumsum(A.lshape_map[:, A.split], 0), ] ) @@ -411,7 +412,11 @@ def solve_triangular(A: DNDarray, b: DNDarray) -> DNDarray: displ[i:] = 0 res_send = torch.empty(0) - res_recv = torch.zeros((*batch_shape, count[comm.rank], b.shape[-1]), device=tdev) + res_recv = torch.zeros( + (*batch_shape, count[comm.rank], b.shape[-1]), + device=tdev, + dtype=b.dtype.torch_type(), + ) if comm.rank == i: x.larray = torch.linalg.solve_triangular( diff --git a/heat/core/linalg/svd.py b/heat/core/linalg/svd.py index eb7ff1c87a..86a939c55a 100644 --- a/heat/core/linalg/svd.py +++ b/heat/core/linalg/svd.py @@ -5,8 +5,11 @@ from typing import Tuple from ..dndarray import DNDarray from .qr import qr +from .polar import polar +from .eigh import eigh from ..types import float32, float64 import torch +from warnings import warn __all__ = ["svd"] @@ -16,6 +19,7 @@ def svd( full_matrices: bool = False, compute_uv: bool = True, qr_procs_to_merge: int = 2, + r_max_zolopd: int = 8, ) -> Tuple[DNDarray, DNDarray, DNDarray]: """ Computes the singular value decomposition of a matrix (the input array ``A``). @@ -39,16 +43,29 @@ def svd( If ``False``, only the vector ``S`` containing the singular values is returned. qr_procs_to_merge : int, optional the number of processes to merge in the tall skinny QR decomposition that is applied if the input array is tall skinny (``M > N``) or short fat (``M < N``). - See the corresponding remarks for ``heat.linalg.qr`` for more details. + See the corresponding remarks for :func:`heat.linalg.qr` for more details. + r_max_zolopd : int, optional + an internal parameter only relevant for the case that the input matrix is neither tall-skinny nor short-fat. + This parameter is passed to the Zolotarev polar decomposition and the symmetric eigenvalue decomposition that are applied in this case. + See the documentation of :func:`heat.linalg.polar` as well as of :func:`heat.linalg.eigh` for more details. 
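As an aside, here is a hypothetical usage sketch of the new parameter (the matrix shape, the split, and the reconstruction check are editorial assumptions, not part of the patch):

```python
import heat as ht

# a square-ish matrix: on two or more processes, no local chunk is tall-skinny,
# so the new Zolotarev-based code path is taken internally
A = ht.random.randn(2000, 1800, split=0, dtype=ht.float64)

# full SVD; r_max_zolopd is forwarded to polar() and eigh()
U, S, V = ht.linalg.svd(A, r_max_zolopd=8)

# sanity check: relative reconstruction error should be close to machine precision
err = ht.norm(U @ ht.diag(S) @ V.T - A) / ht.norm(A)
```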
- - Remarks - ---------- + Notes + ----- Unlike in NumPy, we currently do not support the option ``full_matrices=True``, since this can result in heavy memory consumption (in particular for tall skinny and short fat matrices) that should be avoided in the context Heat is designed for. If you nevertheless require this feature, please open an issue on GitHub. The algorithm used for the computation of the singular values depends on the shape of the input array ``A``. - For tall and skinny matrices (``M > N``), the algorithm is based on the tall-skinny QR decomposition; currently this is the only supported algorithm. + For tall and skinny matrices (``M > N``), the algorithm is based on the tall-skinny QR decomposition. For the remaining cases we use an approach that combines + the Zolotarev polar decomposition with a symmetric eigenvalue decomposition that is itself based on Zolotarev's functions; see Algorithm 5.3 in: + + Nakatsukasa, Y., & Freund, R. W. (2016). Computing fundamental matrix decompositions accurately via the + matrix sign function in two iterations: The power of Zolotarev's functions. SIAM Review, 58(3). + + See Also + -------- + :func:`heat.linalg.qr` + :func:`heat.linalg.polar` + :func:`heat.linalg.eigh` """ if full_matrices: raise NotImplementedError( @@ -67,6 +84,10 @@ def svd( ) if qr_procs_to_merge == 0: qr_procs_to_merge = A.comm.size + if not isinstance(r_max_zolopd, int) or r_max_zolopd < 0 or r_max_zolopd > 8: + raise ValueError( + f"r_max_zolopd must be an int between 0 and 8, but is currently {r_max_zolopd} of type {type(r_max_zolopd)}" + ) if A.ndim != 2: raise ValueError( f"Array ``A`` must be 2 dimensional, but has {A.ndim} dimensions. \n Please open an issue on GitHub if you require SVD for batches of matrices similar to PyTorch." ) @@ -76,88 +97,7 @@ def svd( f"Array ``A`` must have a datatype of float32 or float64, but has {A.dtype}" ) - if A.is_distributed() and A.split == 0: - if A.lshape_map[:, 0].max().item() < A.shape[1]: - raise ValueError( - "Input ``A`` is split along the rows and the local chunks of data are rectangular with more columns than rows. \n This case is not supported by the current implementation of SVD in Heat." - ) - else: - # this is the distributed, tall skinny case - # compute SVD via tall skinny QR - if compute_uv: - # compute full SVD: first full QR, then SVD of R - Q, R = qr(A, mode="reduced", procs_to_merge=qr_procs_to_merge) - Utilde_loc, S_loc, Vt_loc = torch.linalg.svd(R.larray, full_matrices=False) - Utilde = DNDarray( - Utilde_loc, - tuple(Utilde_loc.shape), - dtype=A.dtype, - split=None, - device=A.device, - comm=A.comm, - balanced=A.balanced, - ) - S = DNDarray( - S_loc, - tuple(S_loc.shape), - dtype=A.dtype, - split=None, - device=A.device, - comm=A.comm, - balanced=A.balanced, - ) - V = DNDarray( - Vt_loc.T, - tuple(Vt_loc.T.shape), - dtype=A.dtype, - split=None, - device=A.device, - comm=A.comm, - balanced=A.balanced, - ) - U = (Utilde.T @ Q.T).T - return U, S, V - else: - # compute only singular values: first only R of QR, then singular values only of R - _, R = qr(A, mode="r", procs_to_merge=qr_procs_to_merge) - S_loc = torch.linalg.svdvals(R.larray) - S = DNDarray( - S_loc, - tuple(S_loc.shape), - dtype=A.dtype, - split=None, - device=A.device, - comm=A.comm, - balanced=A.balanced, - ) - return S - if A.is_distributed() and A.split == 1: - - if A.lshape_map[:, 1].max().item() < A.shape[0]: - raise ValueError( - "Input ``A`` is split along the columns and the local chunks of data are rectangular with more rows than columns. 
\n This case is not supported by the current implementation of SVD in Heat." - ) - else: - # this is the distributed, short fat case - # apply the tall skinny SVD to the transpose of A - if compute_uv: - V, S, U = svd( - A.T, - full_matrices=full_matrices, - compute_uv=True, - qr_procs_to_merge=qr_procs_to_merge, - ) - return U, S, V - else: - S = svd( - A.T, - full_matrices=full_matrices, - compute_uv=False, - qr_procs_to_merge=qr_procs_to_merge, - ) - return S - - else: + if not A.is_distributed(): # this is the non-distributed case if compute_uv: U_loc, S_loc, Vt_loc = torch.linalg.svd(A.larray, full_matrices=full_matrices) @@ -201,3 +141,104 @@ def svd( balanced=A.balanced, ) return S + elif A.split == 0 and A.lshape_map[:, 0].max().item() >= A.shape[1]: + # this is the distributed, tall skinny case + # compute SVD via tall skinny QR + if compute_uv: + # compute full SVD: first full QR, then SVD of R + Q, R = qr(A, mode="reduced", procs_to_merge=qr_procs_to_merge) + Utilde_loc, S_loc, Vt_loc = torch.linalg.svd(R.larray, full_matrices=False) + Utilde = DNDarray( + Utilde_loc, + tuple(Utilde_loc.shape), + dtype=A.dtype, + split=None, + device=A.device, + comm=A.comm, + balanced=A.balanced, + ) + S = DNDarray( + S_loc, + tuple(S_loc.shape), + dtype=A.dtype, + split=None, + device=A.device, + comm=A.comm, + balanced=A.balanced, + ) + V = DNDarray( + Vt_loc.T, + tuple(Vt_loc.T.shape), + dtype=A.dtype, + split=None, + device=A.device, + comm=A.comm, + balanced=A.balanced, + ) + U = (Utilde.T @ Q.T).T + return U, S, V + else: + # compute only singular values: first only R of QR, then singular values only of R + _, R = qr(A, mode="r", procs_to_merge=qr_procs_to_merge) + S_loc = torch.linalg.svdvals(R.larray) + S = DNDarray( + S_loc, + tuple(S_loc.shape), + dtype=A.dtype, + split=None, + device=A.device, + comm=A.comm, + balanced=A.balanced, + ) + return S + elif A.split == 1 and A.lshape_map[:, 1].max().item() >= A.shape[0]: + # this is the distributed, short fat case + # apply the tall skinny SVD to the transpose of A + if compute_uv: + V, S, U = svd( + A.T, + full_matrices=full_matrices, + compute_uv=True, + qr_procs_to_merge=qr_procs_to_merge, + ) + return U, S, V + else: + S = svd( + A.T, + full_matrices=full_matrices, + compute_uv=False, + qr_procs_to_merge=qr_procs_to_merge, + ) + return S + + else: + # this is the general, distributed case in which the matrix is neither tall skinny nor short fat + # we apply the Zolotarev-Polar Decomposition and the symmetric eigenvalue decomposition + if A.shape[0] < A.shape[1]: + # Zolo-PD requires A.shape[0] >= A.shape[1], so we need to transpose in this case + if compute_uv: + V, S, U = svd( + A.T, + full_matrices=full_matrices, + compute_uv=True, + qr_procs_to_merge=qr_procs_to_merge, + ) + return U, S, V + else: + S = svd( + A.T, + full_matrices=full_matrices, + compute_uv=False, + qr_procs_to_merge=qr_procs_to_merge, + ) + return S + else: + warn( + "You are performing the full SVD of a distributed matrix that is neither of tall-skinny nor short-fat shape. \n This operation may be costly in terms of memory and compute time." + ) + U, H = polar(A, r_max=r_max_zolopd) + S, V = eigh(H, r_max_zolopd=r_max_zolopd) + if not compute_uv: + return S + else: + return U @ V, S, V diff --git a/heat/core/linalg/svdtools.py b/heat/core/linalg/svdtools.py index 3ff273a79c..fafe9fef46 100644 --- a/heat/core/linalg/svdtools.py +++ b/heat/core/linalg/svdtools.py @@ -11,21 +11,21 @@ from ..dndarray import DNDarray from .. import factories from .. 
import types -from ..linalg import matmul, vector_norm +from ..linalg import matmul, vector_norm, qr, svd from ..indexing import where from ..random import randn - +from ..sanitation import sanitize_in_nd_realfloating from ..manipulations import vstack, hstack, diag, balance from .. import statistics from math import log, ceil, floor, sqrt -__all__ = ["hsvd_rank", "hsvd_rtol", "hsvd"] +__all__ = ["hsvd_rank", "hsvd_rtol", "hsvd", "rsvd", "isvd"] ####################################################################################### -# user-friendly versions of hSVD +# hierarchical SVD "hSVD" ####################################################################################### @@ -40,61 +40,53 @@ def hsvd_rank( Tuple[DNDarray, DNDarray, DNDarray, float], Tuple[DNDarray, DNDarray, DNDarray], DNDarray ]: """ - Hierarchical SVD (hSVD) with prescribed truncation rank `maxrank`. - If A = U diag(sigma) V^T is the true SVD of A, this routine computes an approximation for U[:,:maxrank] (and sigma[:maxrank], V[:,:maxrank]). - - The accuracy of this approximation depends on the structure of A ("low-rank" is best) and appropriate choice of parameters. - - One can expect a similar outcome from this routine as for sci-kit learn's TruncatedSVD (with `algorithm='randomized'`) although a different, determinstic algorithm is applied here. Hereby, the parameters `n_components` - and `n_oversamples` (sci-kit learn) roughly correspond to `maxrank` and `safetyshift` (see below). - - Parameters - ---------- - A : DNDarray - 2D-array (float32/64) of which the hSVD has to be computed. - maxrank : int - truncation rank. (This parameter corresponds to `n_components` in sci-kit learn's TruncatedSVD.) - compute_sv : bool, optional - compute_sv=True implies that also Sigma and V are computed and returned. The default is False. - maxmergedim : int, optional - maximal size of the concatenation matrices during the merging procedure. The default is None and results in an appropriate choice depending on the size of the local slices of A and maxrank. - Too small choices for this parameter will result in failure if the maximal size of the concatenation matrices does not allow to merge at least two matrices. Too large choices for this parameter can cause memory errors if the resulting merging problem becomes too large. - safetyshift : int, optional - Increases the actual truncation rank within the computations by a safety shift. The default is 5. (There is some similarity to `n_oversamples` in sci-kit learn's TruncatedSVD.) - silent : bool, optional - silent=False implies that some information on the computations are printed. The default is True. - - Returns - ------- - (Union[ Tuple[DNDarray, DNDarray, DNDarray, float], Tuple[DNDarray, DNDarray, DNDarray], DNDarray]) - if compute_sv=True: U, Sigma, V, a-posteriori error estimate for the reconstruction error ||A-U Sigma V^T ||_F / ||A||_F (computed according to [2] along the "true" merging tree). - if compute_sv=False: U, a-posteriori error estimate - - Notes - ------- - The size of the process local SVDs to be computed during merging is proportional to the non-split size of the input A and (maxrank + safetyshift). Therefore, conservative choice of maxrank and safetyshift is advised to avoid memory issues. - Note that, as sci-kit learn's randomized SVD, this routine is different from `numpy.linalg.svd` because not all singular values and vectors are computed - and even those computed may be inaccurate if the input matrix exhibts a unfavorable structure. 
+ Hierarchical SVD (hSVD) with prescribed truncation rank `maxrank`. + If A = U diag(sigma) V^T is the true SVD of A, this routine computes an approximation for U[:,:maxrank] (and sigma[:maxrank], V[:,:maxrank]). + + The accuracy of this approximation depends on the structure of A ("low-rank" is best) and appropriate choice of parameters. + + One can expect a similar outcome from this routine as for scikit-learn's TruncatedSVD (with `algorithm='randomized'`) although a different, deterministic algorithm is applied here. Here, the parameters `n_components` + and `n_oversamples` (scikit-learn) roughly correspond to `maxrank` and `safetyshift` (see below). + + Parameters + ---------- + A : DNDarray + 2D-array (float32/64) of which the hSVD has to be computed. + maxrank : int + truncation rank. (This parameter corresponds to `n_components` in scikit-learn's TruncatedSVD.) + compute_sv : bool, optional + compute_sv=True implies that also Sigma and V are computed and returned. The default is False. + maxmergedim : int, optional + maximal size of the concatenation matrices during the merging procedure. The default is None and results in an appropriate choice depending on the size of the local slices of A and maxrank. + Too small choices for this parameter will result in failure if the maximal size of the concatenation matrices does not allow to merge at least two matrices. Too large choices for this parameter can cause memory errors if the resulting merging problem becomes too large. + safetyshift : int, optional + Increases the actual truncation rank within the computations by a safety shift. The default is 5. (There is some similarity to `n_oversamples` in scikit-learn's TruncatedSVD.) + silent : bool, optional + silent=False implies that some information on the computations is printed. The default is True. + + Returns + ------- + (Union[ Tuple[DNDarray, DNDarray, DNDarray, float], Tuple[DNDarray, DNDarray, DNDarray], DNDarray]) + if compute_sv=True: U, Sigma, V, a-posteriori error estimate for the reconstruction error ||A-U Sigma V^T ||_F / ||A||_F (computed according to [2] along the "true" merging tree). + if compute_sv=False: U, a-posteriori error estimate + + Notes + ----- + The size of the process local SVDs to be computed during merging is proportional to the non-split size of the input A and (maxrank + safetyshift). Therefore, conservative choice of maxrank and safetyshift is advised to avoid memory issues. + Note that, as with scikit-learn's randomized SVD, this routine is different from `numpy.linalg.svd` because not all singular values and vectors are computed + and even those computed may be inaccurate if the input matrix exhibits an unfavorable structure. See Also - --------- + -------- :func:`hsvd` :func:`hsvd_rtol` - References - ------- - [1] Iwen, Ong. A distributed and incremental SVD algorithm for agglomerative data analysis on large networks. SIAM J. Matrix Anal. Appl., 37(4), 2016. - [2] Himpe, Leibner, Rave. Hierarchical approximate proper orthogonal decomposition. SIAM J. Sci. Comput., 40 (5), 2018. + + References + ---------- + [1] Iwen, Ong. A distributed and incremental SVD algorithm for agglomerative data analysis on large networks. SIAM J. Matrix Anal. Appl., 37(4), 2016. + [2] Himpe, Leibner, Rave. Hierarchical approximate proper orthogonal decomposition. SIAM J. Sci. Comput., 40 (5), 2018. 
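For orientation, a usage sketch of this routine (the shapes and parameter values are editorial assumptions, not taken from the patch):

```python
import heat as ht

# tall data matrix, distributed along its columns
X = ht.random.randn(1000, 10000, split=1, dtype=ht.float32)

# rank-50 truncated SVD with the default safety shift of 5
U, S, V, err = ht.linalg.hsvd_rank(X, maxrank=50, compute_sv=True)
print(err)  # a-posteriori bound on ||X - U diag(S) V^T||_F / ||X||_F
```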
""" - if not isinstance(A, DNDarray): - raise TypeError(f"Argument needs to be a DNDarray but is {type(A)}.") - if not A.ndim == 2: - raise ValueError("A needs to be a 2D matrix") - if not A.dtype == types.float32 and not A.dtype == types.float64: - raise TypeError( - "Argument needs to be a DNDarray with datatype float32 or float64, but data type is {}.".format( - A.dtype - ) - ) + sanitize_in_nd_realfloating(A, "A", [2]) A_local_size = max(A.lshape_map[:, 1]) if maxmergedim is not None and maxmergedim < 2 * (maxrank + safetyshift) + 1: @@ -135,52 +127,52 @@ def hsvd_rtol( Tuple[DNDarray, DNDarray, DNDarray, float], Tuple[DNDarray, DNDarray, DNDarray], DNDarray ]: """ - Hierchical SVD (hSVD) with prescribed upper bound on the relative reconstruction error. - If A = U diag(sigma) V^T is the true SVD of A, this routine computes an approximation for U[:,:r] (and sigma[:r], V[:,:r]) - such that the rel. reconstruction error ||A-U[:,:r] diag(sigma[:r]) V[:,:r]^T ||_F / ||A||_F does not exceed rtol. - - The accuracy of this approximation depends on the structure of A ("low-rank" is best) and appropriate choice of parameters. This routine is similar to `hsvd_rank` with the difference that - truncation is not performed after a fixed number (namly `maxrank` many) singular values but after such a number of singular values that suffice to capture a prescribed fraction of the amount of information - contained in the input data (`rtol`). - - Parameters - ---------- - A : DNDarray - 2D-array (float32/64) of which the hSVD has to be computed. - rtol : float - desired upper bound on the relative reconstruction error ||A-U Sigma V^T ||_F / ||A||_F. This upper bound is processed into 'local' - tolerances during the actual computations assuming the worst case scenario of a binary "merging tree"; therefore, the a-posteriori - error for the relative error using the true "merging tree" (see output) may be significantly smaller than rtol. - Prescription of maxrank or maxmergedim (disabled in default) can result in loss of desired precision, but can help to avoid memory issues. - compute_sv : bool, optional - compute_sv=True implies that also Sigma and V are computed and returned. The default is False. - no_of_merges : int, optional - Maximum number of processes to be merged at each step. If no further arguments are provided (see below), - this completely determines the "merging tree" and may cause memory issues. The default is None and results in a binary merging tree. - Note that no_of_merges dominates maxrank and maxmergedim in the sense that at most no_of_merges processes are merged - even if maxrank and maxmergedim would allow merging more processes. - maxrank : int, optional - maximal truncation rank. The default is None. - Setting at least one of maxrank and maxmergedim is recommended to avoid memory issues, but can result in loss of desired precision. - Setting only maxrank (and not maxmergedim) results in an appropriate default choice for maxmergedim depending on the size of the local slices of A and the value of maxrank. - maxmergedim : int, optional - maximal size of the concatenation matrices during the merging procedure. The default is None and results in an appropriate choice depending on the size of the local slices of A and maxrank. The default is None. - Too small choices for this parameter will result in failure if the maximal size of the concatenation matrices does not allow to merge at least two matrices. 
Too large choices for this parameter can cause memory errors if the resulting merging problem becomes too large. - Setting at least one of maxrank and maxmergedim is recommended to avoid memory issues, but can result in loss of desired precision. - Setting only maxmergedim (and not maxrank) results in an appropriate default choice for maxrank. - safetyshift : int, optional - Increases the actual truncation rank within the computations by a safety shift. The default is 5. - silent : bool, optional - silent=False implies that some information on the computations are printed. The default is True. - - Returns - ------- - (Union[ Tuple[DNDarray, DNDarray, DNDarray, float], Tuple[DNDarray, DNDarray, DNDarray], DNDarray]) - if compute_sv=True: U, Sigma, V, a-posteriori error estimate for the reconstruction error ||A-U Sigma V^T ||_F / ||A||_F (computed according to [2] along the "true" merging tree used in the computations). - if compute_sv=False: U, a-posteriori error estimate - - Notes - ------- + Hierarchical SVD (hSVD) with prescribed upper bound on the relative reconstruction error. + If A = U diag(sigma) V^T is the true SVD of A, this routine computes an approximation for U[:,:r] (and sigma[:r], V[:,:r]) + such that the rel. reconstruction error ||A-U[:,:r] diag(sigma[:r]) V[:,:r]^T ||_F / ||A||_F does not exceed rtol. + + The accuracy of this approximation depends on the structure of A ("low-rank" is best) and appropriate choice of parameters. This routine is similar to `hsvd_rank` with the difference that + truncation is not performed after a fixed number (namely `maxrank` many) singular values but after as many singular values as are needed to capture a prescribed fraction of the amount of information + contained in the input data (`rtol`). + + Parameters + ---------- + A : DNDarray + 2D-array (float32/64) of which the hSVD has to be computed. + rtol : float + desired upper bound on the relative reconstruction error ||A-U Sigma V^T ||_F / ||A||_F. This upper bound is processed into 'local' + tolerances during the actual computations assuming the worst case scenario of a binary "merging tree"; therefore, the a-posteriori + error for the relative error using the true "merging tree" (see output) may be significantly smaller than rtol. + Prescription of maxrank or maxmergedim (disabled by default) can result in loss of desired precision, but can help to avoid memory issues. + compute_sv : bool, optional + compute_sv=True implies that also Sigma and V are computed and returned. The default is False. + no_of_merges : int, optional + Maximum number of processes to be merged at each step. If no further arguments are provided (see below), + this completely determines the "merging tree" and may cause memory issues. The default is None and results in a binary merging tree. + Note that no_of_merges dominates maxrank and maxmergedim in the sense that at most no_of_merges processes are merged + even if maxrank and maxmergedim would allow merging more processes. + maxrank : int, optional + maximal truncation rank. The default is None. + Setting at least one of maxrank and maxmergedim is recommended to avoid memory issues, but can result in loss of desired precision. + Setting only maxrank (and not maxmergedim) results in an appropriate default choice for maxmergedim depending on the size of the local slices of A and the value of maxrank. + maxmergedim : int, optional + maximal size of the concatenation matrices during the merging procedure. 
The default is None and results in an appropriate choice depending on the size of the local slices of A and maxrank. + Too small choices for this parameter will result in failure if the maximal size of the concatenation matrices does not allow to merge at least two matrices. Too large choices for this parameter can cause memory errors if the resulting merging problem becomes too large. + Setting at least one of maxrank and maxmergedim is recommended to avoid memory issues, but can result in loss of desired precision. + Setting only maxmergedim (and not maxrank) results in an appropriate default choice for maxrank. + safetyshift : int, optional + Increases the actual truncation rank within the computations by a safety shift. The default is 5. + silent : bool, optional + silent=False implies that some information on the computations is printed. The default is True. + + Returns + ------- + (Union[ Tuple[DNDarray, DNDarray, DNDarray, float], Tuple[DNDarray, DNDarray, DNDarray], DNDarray]) + if compute_sv=True: U, Sigma, V, a-posteriori error estimate for the reconstruction error ||A-U Sigma V^T ||_F / ||A||_F (computed according to [2] along the "true" merging tree used in the computations). + if compute_sv=False: U, a-posteriori error estimate + + Notes + ----- The maximum size of the process local SVDs to be computed during merging is proportional to the non-split size of the input A and (maxrank + safetyshift). Therefore, conservative choice of maxrank and safetyshift is advised to avoid memory issues. For similar reasons, prescribing only rtol and the number of processes to be merged in each step (without specifying maxrank or maxmergedim) may result in memory issues. Prescribing maxrank is therefore strongly recommended to avoid memory issues, but may result in loss of desired precision (rtol). If this occurs, a separate warning will be raised. @@ -188,25 +180,18 @@ def hsvd_rtol( Note that this routine is different from `numpy.linalg.svd` because not all singular values and vectors are computed and even those computed may be inaccurate if the input matrix exhibits an unfavorable structure. To avoid confusion, note that `rtol` in this routine does not have any similarity to `tol` in scikit-learn's TruncatedSVD. + See Also - --------- + -------- :func:`hsvd` :func:`hsvd_rank` - References - ------- + + References + ---------- [1] Iwen, Ong. A distributed and incremental SVD algorithm for agglomerative data analysis on large networks. SIAM J. Matrix Anal. Appl., 37(4), 2016. [2] Himpe, Leibner, Rave. Hierarchical approximate proper orthogonal decomposition. SIAM J. Sci. Comput., 40 (5), 2018. 
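Analogously to `hsvd_rank`, a brief usage sketch (shape, rtol, and maxrank are illustrative assumptions):

```python
import heat as ht

X = ht.random.randn(1000, 10000, split=1, dtype=ht.float32)

# keep as many singular triplets as needed for a 1% relative reconstruction error;
# capping maxrank guards against memory blow-up, possibly at the cost of missing rtol
U, S, V, err = ht.linalg.hsvd_rtol(X, rtol=1e-2, compute_sv=True, maxrank=100)
```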
""" - if not isinstance(A, DNDarray): - raise TypeError(f"Argument needs to be a DNDarray but is {type(A)}.") - if not A.ndim == 2: - raise ValueError("A needs to be a 2D matrix") - if not A.dtype == types.float32 and not A.dtype == types.float64: - raise TypeError( - "Argument needs to be a DNDarray with datatype float32 or float64, but data type is {}.".format( - A.dtype - ) - ) + sanitize_in_nd_realfloating(A, "A", [2]) A_local_size = max(A.lshape_map[:, 1]) if maxmergedim is not None and maxrank is None: @@ -252,11 +237,6 @@ def hsvd_rtol( ) -################################################################################################ -# hSVD - "full" routine for the experts -################################################################################################ - - def hsvd( A: DNDarray, maxrank: Optional[int] = None, @@ -271,7 +251,7 @@ def hsvd( Tuple[DNDarray, DNDarray, DNDarray, float], Tuple[DNDarray, DNDarray, DNDarray], DNDarray ]: """ - This function computes an approximate truncated SVD of A utilizing a distributed hiearchical algorithm; see the references. + Computes an approximate truncated SVD of A utilizing a distributed hiearchical algorithm; see the references. The present function `hsvd` is a low-level routine, provides many options/parameters, but no default values, and is not recommended for usage by non-experts since conflicts arising from inappropriate parameter choice will not be catched. We strongly recommend to use the corresponding high-level functions `hsvd_rank` and `hsvd_rtol` instead. @@ -303,12 +283,12 @@ def hsvd( if compute_sv=False: U, a-posteriori error estimate References - ------- + ---------- [1] Iwen, Ong. A distributed and incremental SVD algorithm for agglomerative data analysis on large networks. SIAM J. Matrix Anal. Appl., 37(4), 2016. [2] Himpe, Leibner, Rave. Hierarchical approximate proper orthogonal decomposition. SIAM J. Sci. Comput., 40 (5), 2018. 
See Also - --------- + -------- :func:`hsvd_rank` :func:`hsvd_rtol` """ @@ -338,7 +318,7 @@ def hsvd( "\t\t".join(["%d" % an for an in active_nodes]), ) - U_loc, sigma_loc, err_squared_loc = compute_local_truncated_svd( + U_loc, sigma_loc, err_squared_loc = _compute_local_truncated_svd( level, A.comm.rank, A.larray, maxrank, loc_atol, safetyshift ) U_loc = torch.matmul(U_loc, torch.diag(sigma_loc)) @@ -416,7 +396,7 @@ def hsvd( if len(future_nodes) == 1: safetyshift = 0 - U_loc, sigma_loc, err_squared_loc_new = compute_local_truncated_svd( + U_loc, sigma_loc, err_squared_loc_new = _compute_local_truncated_svd( level, A.comm.rank, U_loc, maxrank, loc_atol, safetyshift ) @@ -470,12 +450,7 @@ def hsvd( return U, rel_error_estimate -############################################################################################## -# AUXILIARY ROUTINES -############################################################################################## - - -def compute_local_truncated_svd( +def _compute_local_truncated_svd( level: int, proc_id: int, U_loc: torch.Tensor, @@ -529,3 +504,295 @@ def compute_local_truncated_svd( sigma_loc = torch.zeros(1, dtype=U_loc.dtype, device=U_loc.device) U_loc = torch.zeros(U_loc.shape[0], 1, dtype=U_loc.dtype, device=U_loc.device) return U_loc, sigma_loc, err_squared_loc + + +############################################################################################## +# Randomized SVD "rSVD" +############################################################################################## + + +def rsvd( + A: DNDarray, + rank: int, + n_oversamples: int = 10, + power_iter: int = 0, + qr_procs_to_merge: int = 2, +) -> Union[Tuple[DNDarray, DNDarray, DNDarray], Tuple[DNDarray, DNDarray]]: + r""" + Randomized SVD (rSVD) with prescribed truncation rank `rank`. + If :math:`A = U \operatorname{diag}(S) V^T` is the true SVD of A, this routine computes an approximation for U[:,:rank] (and S[:rank], V[:,:rank]). + + The accuracy of this approximation depends on the structure of A ("low-rank" is best) and appropriate choice of parameters. + + Parameters + ---------- + A : DNDarray + 2D-array (float32/64) of which the rSVD has to be computed. + rank : int + truncation rank. (This parameter corresponds to `n_components` in scikit-learn's TruncatedSVD.) + n_oversamples : int, optional + number of oversamples. The default is 10. + power_iter : int, optional + number of power iterations. The default is 0. + Choosing `power_iter > 0` can improve the accuracy of the SVD approximation in the case of slowly decaying singular values, but increases the computational cost. + qr_procs_to_merge : int, optional + number of processes to merge at each step of QR decomposition in the power iteration (if power_iter > 0). The default is 2. See the corresponding remarks for :func:`heat.linalg.qr` for more details. + + + Notes + ----- + Memory requirements: the SVD computation of a matrix of size (rank + n_oversamples) x A.shape[1] must fit into the memory of a single process. + The implementation follows Algorithm 4.4 (randomized range finder) and Algorithm 5.1 (direct SVD) in [1]. + + References + ---------- + [1] Halko, N., Martinsson, P. G., & Tropp, J. A. (2011). Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM review, 53(2), 217-288. 
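A quick usage sketch of the new routine (the shape and parameter values are assumptions for illustration, not from the patch):

```python
import heat as ht

X = ht.random.randn(20000, 4000, split=0, dtype=ht.float32)

# rank-30 randomized SVD; one power iteration sharpens the range estimate
# when the singular values decay slowly
U, S, V = ht.linalg.rsvd(X, rank=30, n_oversamples=10, power_iter=1)
```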
+ """ + sanitize_in_nd_realfloating(A, "A", [2]) + if not isinstance(rank, int): + raise TypeError(f"rank must be an integer, but is {type(rank)}.") + if rank < 1: + raise ValueError(f"rank must be positive, but is {rank}.") + if not isinstance(n_oversamples, int): + raise TypeError( + f"if provided, n_oversamples must be an integer, but is {type(n_oversamples)}." + ) + if n_oversamples < 0: + raise ValueError(f"n_oversamples must be non-negative, but is {n_oversamples}.") + if not isinstance(power_iter, int): + raise TypeError(f"if provided, power_iter must be an integer, but is {type(power_iter)}.") + if power_iter < 0: + raise ValueError(f"power_iter must be non-negative, but is {power_iter}.") + + ell = rank + n_oversamples + q = power_iter + + # random matrix + splitOmega = 1 if A.split == 0 else 0 + Omega = randn(A.shape[1], ell, dtype=A.dtype, device=A.device, split=splitOmega) + + # compute the range of A + Y = matmul(A, Omega) + Q, _ = qr(Y, procs_to_merge=qr_procs_to_merge) + + # power iterations + for _ in range(q): + if Q.split is not None and Q.shape[Q.split] < Q.comm.size: + Q.resplit_(None) + Y = matmul(A.T, Q) + Q, _ = qr(Y, procs_to_merge=qr_procs_to_merge) + if Q.split is not None and Q.shape[Q.split] < Q.comm.size: + Q.resplit_(None) + Y = matmul(A, Q) + Q, _ = qr(Y, procs_to_merge=qr_procs_to_merge) + + # compute the SVD of the projected matrix + if Q.split is not None and Q.shape[Q.split] < Q.comm.size: + Q.resplit_(None) + B = matmul(Q.T, A) + B.resplit_( + None + ) # B will be of size ell x ell and thus small enough to fit into memory of a single process + U, sigma, V = svd.svd(B) # actually just torch svd as input is not split anymore + U = matmul(Q, U)[:, :rank] + U.balance_() + S = sigma[:rank] + V = V[:, :rank] + V.balance_() + return U, S, V + + +############################################################################################## +# Incremental SVD "iSVD" +############################################################################################## + + +def _isvd( + new_data: DNDarray, + U_old: DNDarray, + S_old: DNDarray, + V_old: Optional[DNDarray] = None, + maxrank: Optional[int] = None, + old_matrix_size: Optional[int] = None, + old_rowwise_mean: Optional[DNDarray] = None, +) -> Union[Tuple[DNDarray, DNDarray, DNDarray], Tuple[DNDarray, DNDarray, DNDarray, DNDarray]]: + """ + Helper function for iSVD and iPCA; follows roughly the "incremental PCA with mean update", Fig.1 in: + David A. Ross, Jongwoo Lim, Ruei-Sung Lin, Ming-Hsuan Yang. Incremental Learning for Robust Visual Tracking. IJCV, 2008. + + Either incremental SVD / PCA or incremental SVD / PCA with mean subtraction is performed. 
+ + Parameters + ---------- + new_data: DNDarray + new data as DNDarray + U_old, S_old, V_old: DNDarrays + "old" SVD-factors + if no V_old is provided, only U and S are computed (PCA) + maxrank: int, optional + rank to which new SVD should be truncated + old_matrix_size: int, optional + size of the old matrix; this does not need to be identical to V_old.shape[0] as "old" SVD might have been truncated + old_rowwise_mean: DNDarray, optional + row-wise mean of the old matrix; if not provided, no mean subtraction is performed + """ + # old SVD is SVD of a matrix of dimension m x n and has rank r + # new data have shape m x d + d = new_data.shape[1] + n = V_old.shape[0] if V_old is not None else old_matrix_size + r = S_old.shape[0] + if maxrank is None: + maxrank = min(n + d, U_old.shape[0]) + else: + maxrank = min(maxrank, min(n + d, U_old.shape[0])) + + if old_rowwise_mean is not None: + new_data_rowwise_mean = statistics.mean(new_data, axis=1) + new_rowwise_mean = (old_matrix_size * old_rowwise_mean + d * new_data_rowwise_mean) / ( + old_matrix_size + d + ) + new_data -= new_data_rowwise_mean.reshape(-1, 1) + new_data = hstack( + [ + new_data, + (new_data_rowwise_mean - old_rowwise_mean) + * (d * old_matrix_size / (d + old_matrix_size)) ** 0.5, + ] + ) + d += 1 + + # orthogonalize and decompose new_data + UtC = U_old.T @ new_data + if U_old.split is not None: + new_data = new_data.resplit_(U_old.split) - U_old @ UtC + else: + new_data = new_data - (U_old @ UtC).resplit_(new_data.split) + P, Rc = qr(new_data) + + # prepare one component of "new" V-factor + if V_old is not None: + V_new = vstack( + [ + V_old, + factories.zeros( + (d, r), + device=V_old.device, + dtype=V_old.dtype, + split=V_old.split, + comm=V_old.comm, + ), + ] + ) + helper = vstack( + [ + factories.zeros( + (n, d), + device=V_old.device, + dtype=V_old.dtype, + split=V_old.split, + comm=V_old.comm, + ), + factories.eye( + d, device=V_old.device, dtype=V_old.dtype, split=V_old.split, comm=V_old.comm + ), + ] + ) + V_new = hstack([V_new, helper]) + del helper + + # prepare one component of "new" U-factor + U_new = hstack([U_old, P]) + + # prepare "inner" matrix that needs to be decomposed, decompose it + helper1 = vstack( + [ + diag(S_old), + factories.zeros( + (Rc.shape[0] + UtC.shape[0] - r, r), + device=S_old.device, + dtype=S_old.dtype, + split=S_old.split, + comm=S_old.comm, + ), + ] + ) + if r > d: + Rc = Rc.resplit_(UtC.split) + else: + UtC = UtC.resplit_(Rc.split) + helper2 = vstack([UtC, Rc]) + innermat = hstack([helper1, helper2]) + del (helper1, helper2) + # as innermat is small enough to fit into memory of a single process, we can use torch svd + u, s, v = svd.svd(innermat.resplit_(None)) + del innermat + + # truncate if desired + if maxrank < s.shape[0]: + u = u[:, :maxrank] + s = s[:maxrank] + v = v[:, :maxrank] + + U_new = U_new @ u + if V_old is not None: + V_new = V_new @ v + + if V_old is not None: # use-case: SVD + return U_new, s, V_new + if old_rowwise_mean is not None: # use-case PCA + return U_new, s, new_rowwise_mean + + +def isvd( + new_data: DNDarray, + U_old: DNDarray, + S_old: DNDarray, + V_old: DNDarray, + maxrank: Optional[int] = None, +) -> Tuple[DNDarray, DNDarray, DNDarray]: + r"""Incremental SVD (iSVD) for the addition of new data to an existing SVD. 
+ Given the SVD of an "old" matrix, :math:`X_\textnormal{old} = U_\textnormal{old} \cdot S_\textnormal{old} \cdot V_\textnormal{old}^T`, and additional columns :math:`N` ("`new_data`"), this routine computes + (a possibly approximate) SVD of the extended matrix :math:`X_\textnormal{new} = [ X_\textnormal{old} | N]`. + + Parameters + ---------- + new_data : DNDarray + 2D-array (float32/64) of columns that are added to the "old" SVD. It must hold `new_data.split != 1` if `U_old.split = 0`. + U_old : DNDarray + U-factor of the SVD of the "old" matrix, 2D-array (float32/64). It must hold `U_old.split != 0` if `new_data.split = 1`. + S_old : DNDarray + Sigma-factor of the SVD of the "old" matrix, 1D-array (float32/64) + V_old : DNDarray + V-factor of the SVD of the "old" matrix, 2D-array (float32/64) + maxrank : int, optional + truncation rank of the SVD of the extended matrix. The default is None, i.e., no bound on the maximal rank is imposed. + + Notes + ----- + Inexactness may arise due to truncation to maximal rank `maxrank` if the rank of the data to be processed exceeds this rank. + If you set `maxrank` to a high number (or None) in order to avoid inexactness, you may encounter memory issues. + The implementation follows the approach described in Ref. [1], Sect. 2. + + References + ---------- + [1] Brand, M. (2006). Fast low-rank modifications of the thin singular value decomposition. Linear algebra and its applications, 415(1), 20-30. + """ + # check if new_data, U_old, V_old are 2D DNDarrays and float32/64 + sanitize_in_nd_realfloating(new_data, "new_data", [2]) + sanitize_in_nd_realfloating(U_old, "U_old", [2]) + sanitize_in_nd_realfloating(S_old, "S_old", [1]) + sanitize_in_nd_realfloating(V_old, "V_old", [2]) + # check if number of columns of U_old and V_old match the number of elements in S_old + if U_old.shape[1] != S_old.shape[0]: + raise ValueError( + "The number of columns of U_old must match the number of elements in S_old." + ) + if V_old.shape[1] != S_old.shape[0]: + raise ValueError( + "The number of columns of V_old must match the number of elements in S_old. 
     def test_cross(self):
         a = ht.eye(3)
         b = ht.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])
 
-        # different types
-        cross = ht.cross(a, b)
-        self.assertEqual(cross.shape, a.shape)
-        self.assertEqual(cross.dtype, a.dtype)
-        self.assertEqual(cross.split, a.split)
-        self.assertEqual(cross.comm, a.comm)
-        self.assertEqual(cross.device, a.device)
-        self.assertTrue(ht.equal(cross, ht.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]])))
+        # different types - do not run on MPS
+        if not self.is_mps:
+            cross = ht.cross(a, b)
+            self.assertEqual(cross.shape, a.shape)
+            self.assertEqual(cross.dtype, a.dtype)
+            self.assertEqual(cross.split, a.split)
+            self.assertEqual(cross.comm, a.comm)
+            self.assertEqual(cross.device,
a.device) + self.assertTrue(ht.equal(cross, ht.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]]))) # axis a = ht.eye(3, split=0) @@ -32,7 +80,7 @@ def test_cross(self): self.assertEqual(cross.split, a.split) self.assertEqual(cross.comm, a.comm) self.assertEqual(cross.device, a.device) - self.assertTrue(ht.equal(cross, ht.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]]))) + self.assertTrue(ht.equal(cross, ht.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]], dtype=ht.int))) a = ht.eye(3, dtype=ht.int8, split=1) b = ht.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=ht.int8, split=1) @@ -47,8 +95,8 @@ def test_cross(self): # test axisa, axisb, axisc np.random.seed(42) - np_a = np.random.randn(40, 3, 50) - np_b = np.random.randn(3, 40, 50) + np_a = np.random.randn(40, 3, 50).astype(np.float32) + np_b = np.random.randn(3, 40, 50).astype(np.float32) np_cross = np.cross(np_a, np_b, axisa=1, axisb=0) a = ht.array(np_a, split=0) @@ -63,16 +111,26 @@ def test_cross(self): # test vector axes with 2 elements b_2d = ht.array(np_b[:-1, :, :], split=1) cross_3d_2d = ht.cross(a, b_2d, axisa=1, axisb=0) - np_cross_3d_2d = np.cross(np_a, np_b[:-1, :, :], axisa=1, axisb=0) + np_cross_3d_2d = np.cross( + np_a, + np.concatenate([np_b[:-1, :, :], np.zeros((1, 40, 50))], axis=0, dtype=np.float32), + axisa=1, + axisb=0, + ) self.assert_array_equal(cross_3d_2d, np_cross_3d_2d) a_2d = ht.array(np_a[:, :-1, :], split=0) cross_2d_3d = ht.cross(a_2d, b, axisa=1, axisb=0) - np_cross_2d_3d = np.cross(np_a[:, :-1, :], np_b, axisa=1, axisb=0) + np_cross_2d_3d = np.cross( + np.concatenate([np_a[:, :-1, :], np.zeros((40, 1, 50))], axis=1, dtype=np.float32), + np_b, + axisa=1, + axisb=0, + ) self.assert_array_equal(cross_2d_3d, np_cross_2d_3d) cross_z_comp = ht.cross(a_2d, b_2d, axisa=1, axisb=0) - np_cross_z_comp = np.cross(np_a[:, :-1, :], np_b[:-1, :, :], axisa=1, axisb=0) + np_cross_z_comp = np_a[:, 0, ...] * np_b[1, ...] - np_a[:, 1, ...] * np_b[0, ...] 
self.assert_array_equal(cross_z_comp, np_cross_z_comp) a_wrong_split = ht.array(np_a[:, :-1, :], split=2) @@ -93,7 +151,7 @@ def test_cross(self): def test_det(self): # (3,3) with pivoting ares = ht.array(54.0) - a = ht.array([[-2.0, -1, 2], [2, 1, 4], [-3, 3, -1]], split=0, dtype=ht.double) + a = ht.array([[-2.0, -1, 2], [2, 1, 4], [-3, 3, -1]], split=0, dtype=ht.float32) adet = ht.linalg.det(a) self.assertTupleEqual(adet.shape, ares.shape) @@ -102,7 +160,9 @@ def test_det(self): self.assertEqual(adet.device, a.device) self.assertTrue(ht.equal(adet, ares)) - a = ht.array([[-2.0, -1, 2], [2, 1, 4], [-3, 3, -1]], split=1, dtype=ht.double) + dtype = ht.float64 if not self.is_mps else ht.float32 + + a = ht.array([[-2.0, -1, 2], [2, 1, 4], [-3, 3, -1]], split=1, dtype=dtype) adet = ht.linalg.det(a) self.assertTupleEqual(adet.shape, ares.shape) @@ -113,7 +173,7 @@ def test_det(self): # det==0 ares = ht.array(0.0) - a = ht.array([[0, 0, 0], [2, 1, 4], [-3, 3, -1]], dtype=ht.float64, split=0) + a = ht.array([[0, 0, 0], [2, 1, 4], [-3, 3, -1]], dtype=dtype, split=0) adet = ht.linalg.det(a) self.assertTupleEqual(adet.shape, ares.shape) @@ -194,7 +254,11 @@ def test_dot(self): a1d = ht.array(data1d, dtype=ht.float32, split=0) b1d = ht.array(data1d, dtype=ht.float32, split=0) self.assertEqual(ht.dot(a1d, b1d), np.dot(data1d, data1d)) - # 2 1D arrays, + + dtype = np.float32 if self.is_mps else np.float64 + data1d = data1d.astype(dtype) + data2d = data2d.astype(dtype) + data3d = data3d.astype(dtype) a2d = ht.array(data2d, split=1) b2d = ht.array(data2d, split=1) @@ -210,13 +274,13 @@ def test_dot(self): const1 = 5 const2 = 6 # a is const - res = ht.dot(const1, b2d) - ht.array(np.dot(const1, data2d)) + res = ht.dot(const1, b2d) - ht.array(np.dot(const1, data2d).astype(dtype)) ret = 0 ht.dot(const1, b2d, out=ret) self.assertEqual(ht.equal(res, ht.zeros(res.shape)), 1) # b is const - res = ht.dot(a2d, const2) - ht.array(np.dot(data2d, const2)) + res = ht.dot(a2d, const2) - ht.array(np.dot(data2d, const2).astype(dtype)) self.assertEqual(ht.equal(res, ht.zeros(res.shape)), 1) # a and b and const self.assertEqual(ht.dot(const2, const1), 5 * 6) @@ -281,34 +345,38 @@ def test_inv(self): self.assertTrue(ht.allclose(ainv, ares, atol=1e-6)) # pivoting row change - ares = ht.array([[-1, 0, 2], [2, 0, -1], [-6, 3, 0]], dtype=ht.double, split=0) / 3.0 - a = ht.array([[1, 2, 0], [2, 4, 1], [2, 1, 0]], dtype=ht.double, split=0) + dtype = ht.float32 if self.is_mps else ht.float64 + atol = 1e-6 if dtype == ht.float32 else 1e-12 + + ares = ht.array([[-1, 0, 2], [2, 0, -1], [-6, 3, 0]], dtype=dtype, split=0) / 3.0 + a = ht.array([[1, 2, 0], [2, 4, 1], [2, 1, 0]], dtype=dtype, split=0) ainv = ht.linalg.inv(a) self.assertEqual(ainv.split, a.split) self.assertEqual(ainv.device, a.device) self.assertTupleEqual(ainv.shape, a.shape) - self.assertTrue(ht.allclose(ainv, ares, atol=1e-6)) + self.assertTrue(ht.allclose(ainv, ares, atol=atol)) - ares = ht.array([[-1, 0, 2], [2, 0, -1], [-6, 3, 0]], dtype=ht.double, split=1) / 3.0 - a = ht.array([[1, 2, 0], [2, 4, 1], [2, 1, 0]], dtype=ht.double, split=1) + ares = ht.array([[-1, 0, 2], [2, 0, -1], [-6, 3, 0]], dtype=dtype, split=1) / 3.0 + a = ht.array([[1, 2, 0], [2, 4, 1], [2, 1, 0]], dtype=dtype, split=1) ainv = ht.linalg.inv(a) self.assertEqual(ainv.split, a.split) self.assertEqual(ainv.device, a.device) self.assertTupleEqual(ainv.shape, a.shape) - self.assertTrue(ht.allclose(ainv, ares, atol=1e-15)) + self.assertTrue(ht.allclose(ainv, ares, atol=atol)) ht.random.seed(42) - a = 
ht.random.random((20, 20), dtype=ht.float64, split=1)
+        a = ht.random.random((20, 20), dtype=dtype, split=1)
         ainv = ht.linalg.inv(a)
         i = ht.eye(a.shape, split=1, dtype=a.dtype)
         # loss of precision in distributed floating-point ops
-        self.assertTrue(ht.allclose(a @ ainv, i, atol=1e-10))
+        self.assertTrue(ht.allclose(a @ ainv, i, atol=1e-5 if self.is_mps else atol))
 
         ht.random.seed(42)
-        a = ht.random.random((20, 20), dtype=ht.float64, split=0)
+        a = ht.random.random((20, 20), dtype=dtype, split=0)
         ainv = ht.linalg.inv(a)
         i = ht.eye(a.shape, split=0, dtype=a.dtype)
-        self.assertTrue(ht.allclose(a @ ainv, i, atol=1e-10))
+        self.assertTrue(ht.allclose(a @ ainv, i, atol=1e-5 if self.is_mps else atol * 1e2))
 
         with self.assertRaises(RuntimeError):
             ht.linalg.inv(ht.array([1, 2, 3], split=0))
@@ -827,14 +895,16 @@ def test_matmul(self):
             a = ht.zeros((3, 3, 3), split=0)
             b = ht.zeros((4, 3, 3), split=0)
             ht.matmul(a, b)
-        # not implemented split
-        """
-        todo
+        # split along different batch dimension
         with self.assertRaises(NotImplementedError):
-            a = ht.zeros((3, 3, 3))
-            b = ht.zeros((3, 3, 3))
+            a = ht.zeros((4, 3, 3, 3), split=0)
+            b = ht.zeros((4, 3, 3, 3), split=1)
+            ht.matmul(a, b)
+        # batched matrix-vector multiplication
+        with self.assertRaises(NotImplementedError):
+            a = ht.zeros((3, 3, 3), split=0)
+            b = ht.zeros((3, 3), split=0)
             ht.matmul(a, b)
-        """
 
         # batched, split batch
         n = 11  # number of batches
@@ -1071,7 +1141,10 @@ def test_outer(self):
         self.assertTrue((ht_outer_split.numpy() == np_outer).all())
 
         # a_split.ndim > 1 and a.split != 0
-        a_split_3d = ht.random.randn(3, 3, 3, dtype=ht.float64, split=2)
+        if self.is_mps:
+            a_split_3d = ht.random.randn(3, 3, 3, dtype=ht.float32, split=2)
+        else:
+            a_split_3d = ht.random.randn(3, 3, 3, dtype=ht.float64, split=2)
         ht_outer_split = ht.outer(a_split_3d, b_split)
         np_outer_3d = np.outer(a_split_3d.numpy(), b_split.numpy())
         self.assertTrue(ht_outer_split.split == 0)
@@ -1772,39 +1845,40 @@ def test_tril(self):
                 self.assertTrue((result.larray == comparison).all())
 
         local_ones = ht.ones((3, 4, 5, 6))
-
-        # 2D+ case, no offset, data is not split, module-level call
-        result = local_ones.tril()
-        comparison = torch.ones((5, 6), device=self.device.torch_device).tril()
-        self.assertIsInstance(result, ht.DNDarray)
-        self.assertEqual(result.shape, (3, 4, 5, 6))
-        self.assertEqual(result.lshape, (3, 4, 5, 6))
-        self.assertEqual(result.split, None)
-        for i in range(3):
-            for j in range(4):
-                self.assertTrue((result.larray[i, j] == comparison).all())
-
-        # 2D+ case, positive offset, data is not split, module-level call
-        result = local_ones.tril(k=2)
-        comparison = torch.ones((5, 6), device=self.device.torch_device).tril(diagonal=2)
-        self.assertIsInstance(result, ht.DNDarray)
-        self.assertEqual(result.shape, (3, 4, 5, 6))
-        self.assertEqual(result.lshape, (3, 4, 5, 6))
-        self.assertEqual(result.split, None)
-        for i in range(3):
-            for j in range(4):
-                self.assertTrue((result.larray[i, j] == comparison).all())
-
-        # # 2D+ case, negative offset, data is not split, module-level call
-        result = local_ones.tril(k=-2)
-        comparison = torch.ones((5, 6), device=self.device.torch_device).tril(diagonal=-2)
-        self.assertIsInstance(result, ht.DNDarray)
-        self.assertEqual(result.shape, (3, 4, 5, 6))
-        self.assertEqual(result.lshape, (3, 4, 5, 6))
-        self.assertEqual(result.split, None)
-        for i in range(3):
-            for j in range(4):
-                self.assertTrue((result.larray[i, j] == comparison).all())
+        if not self.is_mps:
+ # triu, tril fail on MPS for ndim > 2 + # 2D+ case, no offset, data is not split, module-level call + result = local_ones.tril() + comparison = torch.ones((5, 6), device=self.device.torch_device).tril() + self.assertIsInstance(result, ht.DNDarray) + self.assertEqual(result.shape, (3, 4, 5, 6)) + self.assertEqual(result.lshape, (3, 4, 5, 6)) + self.assertEqual(result.split, None) + for i in range(3): + for j in range(4): + self.assertTrue((result.larray[i, j] == comparison).all()) + + # 2D+ case, positive offset, data is not split, module-level call + result = local_ones.tril(k=2) + comparison = torch.ones((5, 6), device=self.device.torch_device).tril(diagonal=2) + self.assertIsInstance(result, ht.DNDarray) + self.assertEqual(result.shape, (3, 4, 5, 6)) + self.assertEqual(result.lshape, (3, 4, 5, 6)) + self.assertEqual(result.split, None) + for i in range(3): + for j in range(4): + self.assertTrue((result.larray[i, j] == comparison).all()) + + # # 2D+ case, negative offset, data is not split, module-level call + result = local_ones.tril(k=-2) + comparison = torch.ones((5, 6), device=self.device.torch_device).tril(diagonal=-2) + self.assertIsInstance(result, ht.DNDarray) + self.assertEqual(result.shape, (3, 4, 5, 6)) + self.assertEqual(result.lshape, (3, 4, 5, 6)) + self.assertEqual(result.split, None) + for i in range(3): + for j in range(4): + self.assertTrue((result.larray[i, j] == comparison).all()) distributed_ones = ht.ones((5,), split=0) @@ -1994,39 +2068,39 @@ def test_triu(self): self.assertTrue((result.larray == comparison).all()) local_ones = ht.ones((3, 4, 5, 6)) - - # 2D+ case, no offset, data is not split, module-level call - result = local_ones.triu() - comparison = torch.ones((5, 6), device=self.device.torch_device).triu() - self.assertIsInstance(result, ht.DNDarray) - self.assertEqual(result.shape, (3, 4, 5, 6)) - self.assertEqual(result.lshape, (3, 4, 5, 6)) - self.assertEqual(result.split, None) - for i in range(3): - for j in range(4): - self.assertTrue((result.larray[i, j] == comparison).all()) - - # 2D+ case, positive offset, data is not split, module-level call - result = local_ones.triu(k=2) - comparison = torch.ones((5, 6), device=self.device.torch_device).triu(diagonal=2) - self.assertIsInstance(result, ht.DNDarray) - self.assertEqual(result.shape, (3, 4, 5, 6)) - self.assertEqual(result.lshape, (3, 4, 5, 6)) - self.assertEqual(result.split, None) - for i in range(3): - for j in range(4): - self.assertTrue((result.larray[i, j] == comparison).all()) - - # # 2D+ case, negative offset, data is not split, module-level call - result = local_ones.triu(k=-2) - comparison = torch.ones((5, 6), device=self.device.torch_device).triu(diagonal=-2) - self.assertIsInstance(result, ht.DNDarray) - self.assertEqual(result.shape, (3, 4, 5, 6)) - self.assertEqual(result.lshape, (3, 4, 5, 6)) - self.assertEqual(result.split, None) - for i in range(3): - for j in range(4): - self.assertTrue((result.larray[i, j] == comparison).all()) + if not self.is_mps: + # 2D+ case, no offset, data is not split, module-level call + result = local_ones.triu() + comparison = torch.ones((5, 6), device=self.device.torch_device).triu() + self.assertIsInstance(result, ht.DNDarray) + self.assertEqual(result.shape, (3, 4, 5, 6)) + self.assertEqual(result.lshape, (3, 4, 5, 6)) + self.assertEqual(result.split, None) + for i in range(3): + for j in range(4): + self.assertTrue((result.larray[i, j] == comparison).all()) + + # 2D+ case, positive offset, data is not split, module-level call + result = 
local_ones.triu(k=2) + comparison = torch.ones((5, 6), device=self.device.torch_device).triu(diagonal=2) + self.assertIsInstance(result, ht.DNDarray) + self.assertEqual(result.shape, (3, 4, 5, 6)) + self.assertEqual(result.lshape, (3, 4, 5, 6)) + self.assertEqual(result.split, None) + for i in range(3): + for j in range(4): + self.assertTrue((result.larray[i, j] == comparison).all()) + + # # 2D+ case, negative offset, data is not split, module-level call + result = local_ones.triu(k=-2) + comparison = torch.ones((5, 6), device=self.device.torch_device).triu(diagonal=-2) + self.assertIsInstance(result, ht.DNDarray) + self.assertEqual(result.shape, (3, 4, 5, 6)) + self.assertEqual(result.lshape, (3, 4, 5, 6)) + self.assertEqual(result.split, None) + for i in range(3): + for j in range(4): + self.assertTrue((result.larray[i, j] == comparison).all()) distributed_ones = ht.ones((5,), split=0) @@ -2182,7 +2256,7 @@ def test_vecdot(self): c = ht.linalg.vecdot(a, b, axis=0, keepdims=True) self.assertEqual(c.dtype, ht.float32) self.assertEqual(c.device, a.device) - self.assertTrue(ht.equal(c, ht.array([[8, 8, 8, 8]]))) + self.assertTrue(ht.equal(c, ht.array([[8, 8, 8, 8]], dtype=ht.float32))) def test_vector_norm(self): a = ht.arange(9, dtype=ht.float) - 4 @@ -2236,22 +2310,23 @@ def test_vector_norm(self): ) # different dtype - vn = ht.linalg.vector_norm(ht.full((4, 4, 4), 1 + 1j, dtype=ht.int), axis=0, ord=4) - self.assertEqual(vn.split, None) - self.assertEqual(vn.dtype, ht.float) - self.assertTrue( - ht.equal( - vn, - ht.array( - [ - [2.0, 2.0, 2.0, 2.0], - [2.0, 2.0, 2.0, 2.0], - [2.0, 2.0, 2.0, 2.0], - [2.0, 2.0, 2.0, 2.0], - ] - ), + if not self.is_mps: + vn = ht.linalg.vector_norm(ht.full((4, 4, 4), 1 + 1j, dtype=ht.int), axis=0, ord=4) + self.assertEqual(vn.split, None) + self.assertEqual(vn.dtype, ht.float) + self.assertTrue( + ht.equal( + vn, + ht.array( + [ + [2.0, 2.0, 2.0, 2.0], + [2.0, 2.0, 2.0, 2.0], + [2.0, 2.0, 2.0, 2.0], + [2.0, 2.0, 2.0, 2.0], + ] + ), + ) ) - ) # bad ord with self.assertRaises(ValueError): diff --git a/heat/core/linalg/tests/test_eigh.py b/heat/core/linalg/tests/test_eigh.py new file mode 100644 index 0000000000..45b4eecb42 --- /dev/null +++ b/heat/core/linalg/tests/test_eigh.py @@ -0,0 +1,55 @@ +import heat as ht +import unittest +import numpy as np + +from ...tests.test_suites.basic_test import TestCase + + +class TestEigh(TestCase): + def _check_eigh_result(self, X, Lambda, H): + dtypetol = 1e-3 if X.dtype == ht.float32 else 1e-5 + self.assertEqual(Lambda.shape, (X.shape[0],)) + self.assertEqual(H.shape, X.shape) + self.assertEqual(H.split, X.split) + self.assertEqual(Lambda.split, 0) + self.assertEqual(H.dtype, X.dtype) + self.assertEqual(Lambda.dtype, X.dtype) + X_rec = H @ ht.diag(Lambda) @ H.T + self.assertTrue(ht.norm(X - X_rec) / ht.norm(X) < dtypetol) + HtH = H.T @ H + eye_size_H = ht.eye(HtH.shape[0], split=HtH.split, dtype=X.dtype) + self.assertTrue(ht.norm(HtH - eye_size_H) / ht.norm(eye_size_H) < dtypetol) + + def test_eigh(self): + # test with default values + splits = [None, 0, 1] + dtypes = [ht.float32, ht.float64] + i = 0 + for split in splits: + for dtype in dtypes: + with self.subTest(split=split, dtype=dtype): + ht.random.seed(41 + i) + X = ht.random.randn(100, 100, split=split, dtype=dtype) + X = X + X.T.resplit_(X.split) + Lambda, H = ht.linalg.eigh(X) + self._check_eigh_result(X, Lambda, H) + i += 1 + + def test_eigh_options(self): + # test non-default options + ht.random.seed(42) + X = ht.random.randn(101, 101, split=0, 
dtype=ht.float32) + X = X @ X.T + Lambda, H = ht.linalg.eigh(X, r_max_zolopd=1, silent=False) + self._check_eigh_result(X, Lambda, H) + + def test_eigh_catch_wrong_inputs(self): + # non-square DNDarray as input + X = ht.random.rand(100, 101, split=0, dtype=ht.float32) + with self.assertRaises(ValueError): + ht.linalg.eigh(X) + + # r_max_zolopd not of right type + X = ht.random.rand(100, 100, split=0, dtype=ht.float32) + with self.assertRaises(ValueError): + ht.linalg.eigh(X, r_max_zolopd=2.2) diff --git a/heat/core/linalg/tests/test_polar.py b/heat/core/linalg/tests/test_polar.py new file mode 100644 index 0000000000..2d65a9c04d --- /dev/null +++ b/heat/core/linalg/tests/test_polar.py @@ -0,0 +1,117 @@ +import heat as ht +import unittest +import torch +import numpy as np + +from ...tests.test_suites.basic_test import TestCase + + +class TestZolopolar(TestCase): + def _check_polar(self, A, U, H, dtypetol): + # check whether output has right type, shape and dtype + self.assertTrue(isinstance(U, ht.DNDarray)) + self.assertEqual(U.shape, A.shape) + self.assertEqual(U.dtype, A.dtype) + self.assertTrue(isinstance(H, ht.DNDarray)) + self.assertEqual(H.shape, (A.shape[1], A.shape[1])) + self.assertEqual(H.dtype, A.dtype) + + # check whether output is correct + A_np = A.numpy() + U_np = U.numpy() + H_np = H.numpy() + # U orthogonal + self.assertTrue( + np.allclose(U_np.T @ U_np, np.eye(U_np.shape[1]), atol=dtypetol, rtol=dtypetol) + ) + # H symmetric + self.assertTrue(np.allclose(H_np.T, H_np, atol=dtypetol, rtol=dtypetol)) + # H positive definite, i.e., eigenvalues > 0 + self.assertTrue((np.linalg.eigvalsh(H_np) > 0).all()) + # A = U H + self.assertTrue(np.allclose(A_np, U_np @ H_np, atol=dtypetol, rtol=dtypetol)) + + def test_catch_wrong_inputs(self): + # if A is not a DNDarray + with self.assertRaises(TypeError): + ht.polar("I am clearly not a DNDarray. 
Do you mind?") + # test wrong input dimension + with self.assertRaises(ValueError): + ht.polar(ht.zeros((10, 10, 10), dtype=ht.float32)) + # test wrong input shape + with self.assertRaises(ValueError): + ht.polar(ht.random.rand(10, 11, dtype=ht.float32)) + # test wrong input dtype + with self.assertRaises(TypeError): + ht.polar(ht.ones((10, 10), dtype=ht.int32)) + # wrong input for r + with self.assertRaises(ValueError): + ht.polar(ht.ones((11, 10)), r=1.0) + # wrong input for tol + with self.assertRaises(TypeError): + ht.polar(ht.ones((11, 10)), r=2, condition_estimate=1) + + def test_polar_split0(self): + # split=0, float32, no condition estimate provided, silent mode + for r in range(1, 9): + with self.subTest(r=r): + ht.random.seed(18112024) + A = ht.random.randn(100, 10 * r, split=0, dtype=ht.float32) + if ( + ht.MPI_WORLD.size % r == 0 and ht.MPI_WORLD.size != r + ) or ht.MPI_WORLD.size == 1: + U, H = ht.polar(A, r=r) + dtypetol = 1e-4 + self._check_polar(A, U, H, dtypetol) + else: + with self.assertRaises(ValueError): + U, H = ht.polar(A, r=r) + + # cases not covered so far + A = ht.random.randn(100, 100, split=0, dtype=ht.float64) + U, H = ht.polar(A, condition_estimate=1.0e16, silent=False) + dtypetol = 1e-7 + + self._check_polar(A, U, H, dtypetol) + + # case without calculating H + ht.random.seed(10122024) + A = ht.random.randn(100, 10, split=0, dtype=ht.float32) + U = ht.polar(A, calcH=False) + U_np = U.numpy() + self.assertTrue(np.allclose(U_np.T @ U_np, np.eye(U_np.shape[1]), atol=1e-4, rtol=1e-4)) + H_np = U_np.T @ A.numpy() + self.assertTrue(np.allclose(H_np.T, H_np, atol=1e-4, rtol=1e-4)) + self.assertTrue((np.linalg.eigvalsh(H_np) > 0).all()) + + def test_polar_split1(self): + # split=1, float64, condition estimate provided, non-silent mode + for r in range(1, 9): + with self.subTest(r=r): + ht.random.seed(623) + A = ht.random.randn(100, 99, split=1, dtype=ht.float64) + if ( + ht.MPI_WORLD.size % r == 0 and ht.MPI_WORLD.size != r + ) or ht.MPI_WORLD.size == 1: + U, H = ht.polar(A, r=r, silent=False, condition_estimate=1.0e16) + dtypetol = 1e-7 + + self._check_polar(A, U, H, dtypetol) + else: + with self.assertRaises(ValueError): + U, H = ht.polar(A, r=r) + + # cases not covered so far + A = ht.random.randn(100, 99, split=1, dtype=ht.float32) + U, H = ht.polar(A, silent=False, condition_estimate=1.0e16) + dtypetol = 1e-4 + self._check_polar(A, U, H, dtypetol) + + # case without calculating H + A = ht.random.randn(100, 100, split=1, dtype=ht.float64) + U = ht.polar(A, calcH=False, condition_estimate=1.0e16) + U_np = U.numpy() + self.assertTrue(np.allclose(U_np.T @ U_np, np.eye(U_np.shape[1]), atol=1e-7, rtol=1e-7)) + H_np = U_np.T @ A.numpy() + self.assertTrue(np.allclose(H_np.T, H_np, atol=1e-8, rtol=1e-8)) + self.assertTrue((np.linalg.eigvalsh(H_np) > 0).all()) diff --git a/heat/core/linalg/tests/test_qr.py b/heat/core/linalg/tests/test_qr.py index 6de9e091d8..dc31e03caf 100644 --- a/heat/core/linalg/tests/test_qr.py +++ b/heat/core/linalg/tests/test_qr.py @@ -8,17 +8,21 @@ class TestQR(TestCase): def test_qr_split1orNone(self): + if self.is_mps: + dtypes = [ht.float32] + else: + dtypes = [ht.float32, ht.float64] ht.random.seed(1234) for split in [1, None]: for mode in ["reduced", "r"]: - # note that split = 1 can be handeled for arbitrary shapes + # note that split = 1 can be handled for arbitrary shapes for shape in [ (20 * ht.MPI_WORLD.size + 1, 40 * ht.MPI_WORLD.size), (20 * ht.MPI_WORLD.size, 20 * ht.MPI_WORLD.size), (40 * ht.MPI_WORLD.size - 1, 20 * 
ht.MPI_WORLD.size), ]: - for dtype in [ht.float32, ht.float64]: + for dtype in dtypes: dtypetol = 1e-3 if dtype == ht.float32 else 1e-6 mat = ht.random.randn(*shape, dtype=dtype, split=split) qr = ht.linalg.qr(mat, mode=mode) @@ -72,12 +76,20 @@ def test_qr_split1orNone(self): ) def test_qr_split0(self): + if self.is_mps: + dtypes = [ht.float32] + else: + dtypes = [ht.float32, ht.float64] split = 0 for procs_to_merge in [0, 2, 3]: for mode in ["reduced", "r"]: - # split = 0 can be handeled only for tall skinny matrices s.t. the local chunks are at least square too - for shape in [(40 * ht.MPI_WORLD.size + 1, 40), (40 * ht.MPI_WORLD.size, 20)]: - for dtype in [ht.float32, ht.float64]: + # split = 0 can be handled only for tall skinny matrices s.t. the local chunks are at least square too + for shape in [ + (20 * ht.MPI_WORLD.size + 1, 40 * ht.MPI_WORLD.size), + (20 * ht.MPI_WORLD.size, 20 * ht.MPI_WORLD.size), + (40 * ht.MPI_WORLD.size - 1, 20 * ht.MPI_WORLD.size), + ]: + for dtype in dtypes: dtypetol = 1e-3 if dtype == ht.float32 else 1e-6 mat = ht.random.randn(*shape, dtype=dtype, split=split) @@ -124,13 +136,45 @@ def test_qr_split0(self): ) ) + def test_batched_qr_splitNone(self): + # two batch dimensions, float64 data type, "split = None" (split batch axis) + x = ht.random.rand(2, 2 * ht.MPI_WORLD.size, 10, 9, dtype=ht.float32, split=1) + _, r = ht.linalg.qr(x, mode="r") + self.assertEqual(r.shape, (2, 2 * ht.MPI_WORLD.size, 9, 9)) + self.assertEqual(r.split, 1) + + def test_batched_qr_split1(self): + # skip float64 tests on MPS + if not self.is_mps: + # two batch dimensions, float64 data type, "split = 1" (last dimension) + ht.random.seed(0) + x = ht.random.rand(3, 2, 50, ht.MPI_WORLD.size * 5 + 3, dtype=ht.float64, split=3) + q, r = ht.linalg.qr(x) + batched_id = ht.stack([ht.eye(q.shape[3], dtype=ht.float64) for _ in range(6)]).reshape( + 3, 2, q.shape[3], q.shape[3] + ) + + self.assertTrue( + ht.allclose(q.transpose([0, 1, 3, 2]) @ q, batched_id, atol=1e-6, rtol=1e-6) + ) + self.assertTrue(ht.allclose(q @ r, x, atol=1e-6, rtol=1e-6)) + + def test_batched_qr_split0(self): + ht.random.seed(424242) + # one batch dimension, float32 data type, "split = 0" (second last dimension) + x = ht.random.randn( + 8, ht.MPI_WORLD.size * 10 + 3, ht.MPI_WORLD.size * 10 - 1, dtype=ht.float32, split=1 + ) + q, r = ht.linalg.qr(x) + batched_id = ht.stack([ht.eye(q.shape[2], dtype=ht.float32) for _ in range(q.shape[0])]) + + self.assertTrue(ht.allclose(q.transpose([0, 2, 1]) @ q, batched_id, atol=1e-3, rtol=1e-3)) + self.assertTrue(ht.allclose(q @ r, x, atol=1e-3, rtol=1e-3)) + def test_wronginputs(self): # test wrong input type with self.assertRaises(TypeError): ht.linalg.qr([1, 2, 3]) - # test too many input dimensions - with self.assertRaises(ValueError): - ht.linalg.qr(ht.zeros((10, 10, 10))) # wrong data type for mode with self.assertRaises(TypeError): ht.linalg.qr(ht.zeros((10, 10)), mode=1) @@ -148,13 +192,6 @@ def test_wronginputs(self): # test wrong procs_to_merge with self.assertRaises(ValueError): ht.linalg.qr(ht.zeros((10, 10)), procs_to_merge=1) - # test wrong shape - with self.assertRaises(ValueError): - ht.linalg.qr(ht.zeros((10, 10, 10))) # test wrong dtype with self.assertRaises(TypeError): ht.linalg.qr(ht.zeros((10, 10), dtype=ht.int32)) - # test wrong shape for split=0 - if ht.MPI_WORLD.size > 1: - with self.assertRaises(ValueError): - ht.linalg.qr(ht.zeros((10, 10), split=0)) diff --git a/heat/core/linalg/tests/test_solver.py b/heat/core/linalg/tests/test_solver.py index 
660ab995d6..944305b63e 100644 --- a/heat/core/linalg/tests/test_solver.py +++ b/heat/core/linalg/tests/test_solver.py @@ -31,9 +31,14 @@ def test_cg(self): ht.linalg.cg(A, b, A) def test_lanczos(self): + # single precision tolerance for torch.inv() is pretty bad + tolerance = 1e-3 + + dtype, atol = (ht.float32, tolerance) if self.is_mps else (ht.float64, 1e-12) + # define positive definite matrix (n,n), split = 0 n = 100 - A = ht.random.randn(n, n, dtype=ht.float64, split=0) + A = ht.random.randn(n, n, dtype=dtype, split=0) B = A @ A.T # Lanczos decomposition with iterations m = n V, T = ht.lanczos(B, m=n) @@ -41,32 +46,27 @@ def test_lanczos(self): self.assertTrue(T.dtype is B.dtype) # V must be unitary V_inv = ht.linalg.inv(V) - self.assertTrue(ht.allclose(V_inv, V.T)) + self.assertTrue(ht.allclose(V_inv, V.T, atol=atol)) # V T V.T must be = B, V transposed = V inverse lanczos_B = V @ T @ V_inv - self.assertTrue(ht.allclose(lanczos_B, B)) + self.assertTrue(ht.allclose(lanczos_B, B, atol=atol)) # complex128, output buffers - A = ( - ht.random.rand(n, n, dtype=ht.float64, split=0) - + ht.random.rand(n, n, dtype=ht.float64, split=0) * 1j - ) - A_conj = ht.conj(A) - B = A @ A_conj.T - m = n - V_out = ht.zeros((n, m), dtype=B.dtype, split=B.split, device=B.device, comm=B.comm) - T_out = ht.zeros((m, m), dtype=ht.float64, device=B.device, comm=B.comm) - # Lanczos decomposition with iterations m = n - ht.lanczos(B, m=m, V_out=V_out, T_out=T_out) - # V must be unitary - V_inv = ht.linalg.inv(V_out) - self.assertTrue(ht.allclose(V_inv, ht.conj(V_out).T)) - # V T V* must be = B, V conjugate transpose = V inverse - lanczos_B = V_out @ T_out @ V_inv - self.assertTrue(ht.allclose(lanczos_B, B)) - - # single precision tolerance for torch.inv() is pretty bad - tolerance = 1e-3 + if not self.is_mps: + A = ht.random.rand(n, n, dtype=ht.complex128, split=0) + A_conj = ht.conj(A) + B = A @ A_conj.T + m = n + V_out = ht.zeros((n, m), dtype=B.dtype, split=B.split, device=B.device, comm=B.comm) + T_out = ht.zeros((m, m), dtype=ht.float64, device=B.device, comm=B.comm) + # Lanczos decomposition with iterations m = n + ht.lanczos(B, m=m, V_out=V_out, T_out=T_out) + # V must be unitary + V_inv = ht.linalg.inv(V_out) + self.assertTrue(ht.allclose(V_inv, ht.conj(V_out).T)) + # V T V* must be = B, V conjugate transpose = V inverse + lanczos_B = V_out @ T_out @ V_inv + self.assertTrue(ht.allclose(lanczos_B, B)) # float32, pre_defined v0, split mismatch A = ht.random.randn(n, n, dtype=ht.float32, split=0) @@ -77,46 +77,46 @@ def test_lanczos(self): V, T = ht.lanczos(B, m=n, v0=v0) self.assertTrue(V.dtype is B.dtype) self.assertTrue(T.dtype is B.dtype) - # V must be unitary - V_inv = ht.linalg.inv(V) - self.assertTrue(ht.allclose(V_inv, V.T, atol=tolerance)) - # V T V.T must be = B, V transposed = V inverse - lanczos_B = V @ T @ V_inv - self.assertTrue(ht.allclose(lanczos_B, B, atol=tolerance)) + # # skipping the following tests as torch.inv on float32 is too imprecise + # # V must be unitary + # V_inv = ht.linalg.inv(V) + # self.assertTrue(ht.allclose(V_inv, V.T, atol=atol)) + # # V T V.T must be = B, V transposed = V inverse + # lanczos_B = V @ T @ V_inv + # self.assertTrue(ht.allclose(lanczos_B, B, atol=atol)) # complex64 - A = ( - ht.random.randn(n, n, dtype=ht.float32, split=0) - + ht.random.randn(n, n, dtype=ht.float32, split=0) * 1j - ) - A_conj = ht.conj(A) - B = A @ A_conj.T - # Lanczos decomposition with iterations m = n - V, T = ht.lanczos(B, m=n) - # V must be unitary - # V T V* must be = B, V conjugate 
transpose = V inverse - V_conj = ht.conj(V) - lanczos_B = V @ T @ V_conj.T - self.assertTrue(ht.allclose(lanczos_B, B, atol=tolerance)) + if not self.is_mps: + # in principle, MPS supports complex64, but many operations are not implemented, e.g. matmul, div + A = ht.random.randn(n, n, dtype=ht.complex64, split=0) + A_conj = ht.conj(A) + B = A @ A_conj.T + # Lanczos decomposition with iterations m = n + V, T = ht.lanczos(B, m=n) + # V must be unitary + # V T V* must be = B, V conjugate transpose = V inverse + V_conj = ht.conj(V) + lanczos_B = V @ T @ V_conj.T + self.assertTrue(ht.allclose(lanczos_B, B, atol=tolerance)) # non-distributed - A = ht.random.randn(n, n, dtype=ht.float64, split=None) + A = ht.random.randn(n, n, dtype=dtype, split=None) B = A @ A.T # Lanczos decomposition with iterations m = n m = n V_out = ht.zeros((n, m), dtype=B.dtype, split=B.split, device=B.device, comm=B.comm) - T_out = ht.zeros((m, m), dtype=ht.float64, device=B.device, comm=B.comm) + T_out = ht.zeros((m, m), dtype=dtype, device=B.device, comm=B.comm) ht.lanczos(B, m=m, V_out=V_out, T_out=T_out) self.assertTrue(V_out.dtype is B.dtype) self.assertTrue(T_out.dtype is B.real.dtype) # V must be unitary V_inv = ht.linalg.inv(V_out) - self.assertTrue(ht.allclose(V_inv, V_out.T)) + self.assertTrue(ht.allclose(V_inv, V_out.T, atol=atol)) # without output buffers V, T = ht.lanczos(B, m=m) # V T V.T must be = B, V transposed = V inverse lanczos_B = V @ T @ V.T - self.assertTrue(ht.allclose(lanczos_B, B)) + self.assertTrue(ht.allclose(lanczos_B, B, atol=atol)) with self.assertRaises(TypeError): V, T = ht.lanczos(B, m="3") @@ -199,19 +199,20 @@ def test_solve_triangular(self): self.assertTrue(ht.equal(res, c)) # batched tests - batch_shapes = [ - (10,), - ( - 4, - 4, - 4, - 20, - ), - ] + if self.is_mps: + # reduction ops on tensors with ndim > 4 are not supported on MPS + # see e.g. 
https://github.com/pytorch/pytorch/issues/129960 + # fmt: off + batch_shapes = [(10,),] + # fmt: on + else: + # fmt: off + batch_shapes = [(10,), (4, 4, 4, 20,),] + # fmt: on m = 100 # data dimension size # exceptions - batch_shape = batch_shapes[1] + batch_shape = batch_shapes[-1] at = torch.rand((*batch_shape, m, m)) # at += torch.eye(k) @@ -235,7 +236,6 @@ def test_solve_triangular(self): for batch_shape in batch_shapes: # batch_shape = tuple() # no batch dimensions - at = torch.rand((*batch_shape, m, m)) # at += torch.eye(k) at += 1e2 * torch.ones_like(at) # make gaussian elimination more stable @@ -254,7 +254,6 @@ def test_solve_triangular(self): b.resplit_(s1) res = ht.linalg.solve_triangular(a, b) - self.assertTrue(ht.allclose(c, res)) # split in batch dimension @@ -264,5 +263,4 @@ def test_solve_triangular(self): c.resplit_(s) res = ht.linalg.solve_triangular(a, b) - self.assertTrue(ht.allclose(c, res)) diff --git a/heat/core/linalg/tests/test_svd.py b/heat/core/linalg/tests/test_svd.py index e25a5acd12..5badcb242d 100644 --- a/heat/core/linalg/tests/test_svd.py +++ b/heat/core/linalg/tests/test_svd.py @@ -8,7 +8,11 @@ class TestTallSkinnySVD(TestCase): def test_tallskinny_split0(self): - for dtype in [ht.float32, ht.float64]: + if self.is_mps: + dtypes = [ht.float32] + else: + dtypes = [ht.float32, ht.float64] + for dtype in dtypes: for n_merge in [0, None]: tol = 1e-5 if dtype == ht.float32 else 1e-10 X = ht.random.randn(ht.MPI_WORLD.size * 10 + 3, 10, split=0, dtype=dtype) @@ -30,7 +34,11 @@ def test_tallskinny_split0(self): self.assertTrue(ht.all(S >= 0)) def test_shortfat_split1(self): - for dtype in [ht.float32, ht.float64]: + if self.is_mps: + dtypes = [ht.float32] + else: + dtypes = [ht.float32, ht.float64] + for dtype in dtypes: tol = 1e-5 if dtype == ht.float32 else 1e-10 X = ht.random.randn(10, ht.MPI_WORLD.size * 10 + 3, split=1, dtype=dtype) U, S, V = ht.linalg.svd(X) @@ -48,7 +56,11 @@ def test_shortfat_split1(self): self.assertTrue(ht.all(S >= 0)) def test_singvals_only(self): - for dtype in [ht.float32, ht.float64]: + if self.is_mps: + dtypes = [ht.float32] + else: + dtypes = [ht.float32, ht.float64] + for dtype in dtypes: tol = 1e-5 if dtype == ht.float32 else 1e-10 for split in [0, 1]: shape = ( @@ -69,16 +81,6 @@ def test_singvals_only(self): ) def test_wrong_inputs(self): - # split = 0 but not tall skinny - X = ht.random.randn(10, 10, split=0) - if ht.MPI_WORLD.size > 1: - with self.assertRaises(ValueError): - ht.linalg.svd(X) - # split = 1 but not short fat - X = ht.random.randn(10, 10, split=1) - if ht.MPI_WORLD.size > 1: - with self.assertRaises(ValueError): - ht.linalg.svd(X) # full_matrices = True X = ht.random.rand(10 * ht.MPI_WORLD.size, 5, split=0) with self.assertRaises(NotImplementedError): @@ -100,3 +102,47 @@ def test_wrong_inputs(self): X = ht.ones((10 * ht.MPI_WORLD.size, 10), split=0, dtype=ht.int32) with self.assertRaises(TypeError): ht.linalg.svd(X) + + +class TestZoloSVD(TestCase): + def test_full_svd(self): + shapes = [(100, 100), (117, 100), (100, 103)] + splits = [None, 0, 1] + dtypes = [ht.float32, ht.float64] + for shape in shapes: + for split in splits: + for dtype in dtypes: + with self.subTest(shape=shape, split=split, dtype=dtype): + ht.random.seed(123) + tol = 1e-2 if dtype == ht.float32 else 1e-2 + X = ht.random.randn(*shape, split=split, dtype=dtype) + if split is not None and ht.MPI_WORLD.size > 1: + with self.assertWarns(UserWarning): + U, S, V = ht.linalg.svd(X) + else: + U, S, V = ht.linalg.svd(X) + self.assertTrue( + 
ht.allclose( + U.T @ U, ht.eye(U.shape[1], dtype=dtype), rtol=tol, atol=tol + ) + ) + self.assertTrue( + ht.allclose( + V.T @ V, ht.eye(V.shape[1], dtype=dtype), rtol=tol, atol=tol + ) + ) + self.assertTrue(ht.allclose(U @ ht.diag(S) @ V.T, X, rtol=tol, atol=tol)) + self.assertTrue(ht.all(S >= 0)) + + def test_options_full_svd(self): + # only singular values + X = ht.random.rand(101, 100, split=0, dtype=ht.float32) + S = ht.linalg.svd(X, compute_uv=False) + + # prescribed r_max_zolopd + U, S, V = ht.linalg.svd(X, r_max_zolopd=1) + + # catch error if r_max_zolopd is not provided properly + if X.is_distributed(): + with self.assertRaises(ValueError): + ht.linalg.svd(X, r_max_zolopd=0) diff --git a/heat/core/linalg/tests/test_svdtools.py b/heat/core/linalg/tests/test_svdtools.py index 2946f4f88c..dbb517ab76 100644 --- a/heat/core/linalg/tests/test_svdtools.py +++ b/heat/core/linalg/tests/test_svdtools.py @@ -10,157 +10,178 @@ class TestHSVD(TestCase): def test_hsvd_rank_part1(self): - nprocs = MPI.COMM_WORLD.Get_size() - test_matrices = [ - ht.random.randn(50, 15 * nprocs, dtype=ht.float32, split=1), - ht.random.randn(50, 15 * nprocs, dtype=ht.float64, split=1), - ht.random.randn(15 * nprocs, 50, dtype=ht.float32, split=0), - ht.random.randn(15 * nprocs, 50, dtype=ht.float64, split=0), - ht.random.randn(15 * nprocs, 50, dtype=ht.float32, split=None), - ht.random.randn(50, 15 * nprocs, dtype=ht.float64, split=None), - ht.zeros((50, 15 * nprocs), dtype=ht.float32, split=1), - ] - rtols = [1e-1, 1e-2, 1e-3] - ranks = [5, 10, 15] - - # check if hsvd yields "reasonable" results for random matrices, i.e. - # U (resp. V) is orthogonal for split=1 (resp. split=0) - # hsvd_rank yields the correct rank - # the true reconstruction error is <= error estimate - # for hsvd_rtol: true reconstruction error <= rtol (provided no further options) - - for A in test_matrices: - if A.dtype == ht.float64: - dtype_tol = 1e-8 - if A.dtype == ht.float32: - dtype_tol = 1e-3 + # not testing on MPS for now as torch.norm() is unstable + if not self.is_mps: + nprocs = MPI.COMM_WORLD.Get_size() + test_matrices = [ + ht.random.randn(50, 15 * nprocs, dtype=ht.float32, split=1), + ht.random.randn(50, 15 * nprocs, dtype=ht.float64, split=1), + ht.random.randn(15 * nprocs, 50, dtype=ht.float32, split=0), + ht.random.randn(15 * nprocs, 50, dtype=ht.float64, split=0), + ht.random.randn(15 * nprocs, 50, dtype=ht.float32, split=None), + ht.random.randn(50, 15 * nprocs, dtype=ht.float64, split=None), + ht.zeros((50, 15 * nprocs), dtype=ht.float32, split=1), + ] + rtols = [1e-1, 1e-2, 1e-3] + ranks = [5, 10, 15] - for r in ranks: - U, sigma, V, err_est = ht.linalg.hsvd_rank(A, r, compute_sv=True, silent=True) - hsvd_rk = U.shape[1] - - if ht.norm(A) > 0: - self.assertEqual(hsvd_rk, r) - if A.split == 1: - U_orth_err = ( - ht.norm( - U.T @ U - - ht.eye(hsvd_rk, dtype=U.dtype, split=U.T.split, device=U.device) + # check if hsvd yields "reasonable" results for random matrices, i.e. + # U (resp. V) is orthogonal for split=1 (resp. 
split=0) + # hsvd_rank yields the correct rank + # the true reconstruction error is <= error estimate + # for hsvd_rtol: true reconstruction error <= rtol (provided no further options) + + for i, A in enumerate(test_matrices): + print("Testing hsvd for matrix {} of {}".format(i + 1, len(test_matrices))) + if A.dtype == ht.float64: + dtype_tol = 1e-8 + if A.dtype == ht.float32: + dtype_tol = 1e-3 + + for r in ranks: + U, sigma, V, err_est = ht.linalg.hsvd_rank(A, r, compute_sv=True, silent=True) + hsvd_rk = U.shape[1] + + if ht.norm(A) > 0: + self.assertEqual(hsvd_rk, r) + if A.split == 1: + U_orth_err = ( + ht.norm( + U.T @ U + - ht.eye( + hsvd_rk, dtype=U.dtype, split=U.T.split, device=U.device + ) + ) + / hsvd_rk**0.5 ) - / hsvd_rk**0.5 - ) - self.assertTrue(U_orth_err <= dtype_tol) - if A.split == 0: - V_orth_err = ( - ht.norm( - V.T @ V - - ht.eye(hsvd_rk, dtype=V.dtype, split=V.T.split, device=V.device) + self.assertTrue(U_orth_err <= dtype_tol) + if A.split == 0: + V_orth_err = ( + ht.norm( + V.T @ V + - ht.eye( + hsvd_rk, dtype=V.dtype, split=V.T.split, device=V.device + ) + ) + / hsvd_rk**0.5 ) - / hsvd_rk**0.5 - ) - self.assertTrue(V_orth_err <= dtype_tol) - true_rel_err = ht.norm(U @ ht.diag(sigma) @ V.T - A) / ht.norm(A) - self.assertTrue(true_rel_err <= err_est or true_rel_err < dtype_tol) - else: - self.assertEqual(hsvd_rk, 1) - self.assertEqual(ht.norm(U), 0) - self.assertEqual(ht.norm(sigma), 0) - self.assertEqual(ht.norm(V), 0) - - # check if wrong parameter choice is caught - with self.assertRaises(RuntimeError): - ht.linalg.hsvd_rank(A, r, maxmergedim=4) - - for tol in rtols: - U, sigma, V, err_est = ht.linalg.hsvd_rtol(A, tol, compute_sv=True, silent=True) - hsvd_rk = U.shape[1] - - if ht.norm(A) > 0: - if A.split == 1: - U_orth_err = ( - ht.norm( - U.T @ U - - ht.eye(hsvd_rk, dtype=U.dtype, split=U.T.split, device=U.device) + self.assertTrue(V_orth_err <= dtype_tol) + true_rel_err = ht.norm(U @ ht.diag(sigma) @ V.T - A) / ht.norm(A) + self.assertTrue(true_rel_err <= err_est or true_rel_err < dtype_tol) + else: + self.assertEqual(hsvd_rk, 1) + self.assertEqual(ht.norm(U), 0) + self.assertEqual(ht.norm(sigma), 0) + self.assertEqual(ht.norm(V), 0) + + # check if wrong parameter choice is caught + with self.assertRaises(RuntimeError): + ht.linalg.hsvd_rank(A, r, maxmergedim=4) + + for tol in rtols: + U, sigma, V, err_est = ht.linalg.hsvd_rtol(A, tol, compute_sv=True, silent=True) + hsvd_rk = U.shape[1] + + if ht.norm(A) > 0: + if A.split == 1: + U_orth_err = ( + ht.norm( + U.T @ U + - ht.eye( + hsvd_rk, dtype=U.dtype, split=U.T.split, device=U.device + ) + ) + / hsvd_rk**0.5 ) - / hsvd_rk**0.5 - ) - # print(U_orth_err) - self.assertTrue(U_orth_err <= dtype_tol) - if A.split == 0: - V_orth_err = ( - ht.norm( - V.T @ V - - ht.eye(hsvd_rk, dtype=V.dtype, split=V.T.split, device=V.device) + # print(U_orth_err) + self.assertTrue(U_orth_err <= dtype_tol) + if A.split == 0: + V_orth_err = ( + ht.norm( + V.T @ V + - ht.eye( + hsvd_rk, dtype=V.dtype, split=V.T.split, device=V.device + ) + ) + / hsvd_rk**0.5 ) - / hsvd_rk**0.5 - ) - self.assertTrue(V_orth_err <= dtype_tol) - true_rel_err = ht.norm(U @ ht.diag(sigma) @ V.T - A) / ht.norm(A) - self.assertTrue(true_rel_err <= err_est or true_rel_err < dtype_tol) - self.assertTrue(true_rel_err <= tol) - else: - self.assertEqual(hsvd_rk, 1) - self.assertEqual(ht.norm(U), 0) - self.assertEqual(ht.norm(sigma), 0) - self.assertEqual(ht.norm(V), 0) - - # check if wrong parameter choices are catched - with self.assertRaises(ValueError): - 
ht.linalg.hsvd_rtol(A, tol, maxmergedim=4) - with self.assertRaises(ValueError): - ht.linalg.hsvd_rtol(A, tol, maxmergedim=10, maxrank=11) - with self.assertRaises(ValueError): - ht.linalg.hsvd_rtol(A, tol, no_of_merges=1) - - # check if wrong input arrays are catched - wrong_test_matrices = [ - 0, - ht.ones((50, 15 * nprocs), dtype=ht.int8, split=1), - ht.ones((50, 15 * nprocs), dtype=ht.int16, split=1), - ht.ones((50, 15 * nprocs), dtype=ht.int32, split=1), - ht.ones((50, 15 * nprocs), dtype=ht.int64, split=1), - ht.ones((50, 15 * nprocs), dtype=ht.complex64, split=1), - ht.ones((50, 15 * nprocs), dtype=ht.complex128, split=1), - ] - - for A in wrong_test_matrices: - with self.assertRaises(TypeError): - ht.linalg.hsvd_rank(A, 5) - with self.assertRaises(TypeError): - ht.linalg.hsvd_rank(A, 1e-1) - - wrong_test_matrices = [ - ht.ones((15, 15 * nprocs, 15), split=1, dtype=ht.float64), - ht.ones(15 * nprocs, split=0, dtype=ht.float64), - ] - for wrong_arr in wrong_test_matrices: - with self.assertRaises(ValueError): - ht.linalg.hsvd_rank(wrong_arr, 5) - with self.assertRaises(ValueError): - ht.linalg.hsvd_rtol(wrong_arr, 1e-1) - - # check if compute_sv=False yields the correct number of outputs (=1) - self.assertEqual(len(ht.linalg.hsvd_rank(test_matrices[0], 5)), 2) - self.assertEqual(len(ht.linalg.hsvd_rtol(test_matrices[0], 5e-1)), 2) + self.assertTrue(V_orth_err <= dtype_tol) + true_rel_err = ht.norm(U @ ht.diag(sigma) @ V.T - A) / ht.norm(A) + self.assertTrue(true_rel_err <= err_est or true_rel_err < dtype_tol) + self.assertTrue(true_rel_err <= tol) + else: + self.assertEqual(hsvd_rk, 1) + self.assertEqual(ht.norm(U), 0) + self.assertEqual(ht.norm(sigma), 0) + self.assertEqual(ht.norm(V), 0) + + # check if wrong parameter choices are catched + with self.assertRaises(ValueError): + ht.linalg.hsvd_rtol(A, tol, maxmergedim=4) + with self.assertRaises(ValueError): + ht.linalg.hsvd_rtol(A, tol, maxmergedim=10, maxrank=11) + with self.assertRaises(ValueError): + ht.linalg.hsvd_rtol(A, tol, no_of_merges=1) + + # check if wrong input arrays are catched + wrong_test_matrices = [ + 0, + ht.ones((50, 15 * nprocs), dtype=ht.int8, split=1), + ht.ones((50, 15 * nprocs), dtype=ht.int16, split=1), + ht.ones((50, 15 * nprocs), dtype=ht.int32, split=1), + ht.ones((50, 15 * nprocs), dtype=ht.int64, split=1), + ht.ones((50, 15 * nprocs), dtype=ht.complex64, split=1), + ht.ones((50, 15 * nprocs), dtype=ht.complex128, split=1), + ] + + for A in wrong_test_matrices: + with self.assertRaises(TypeError): + ht.linalg.hsvd_rank(A, 5) + with self.assertRaises(TypeError): + ht.linalg.hsvd_rank(A, 1e-1) + + wrong_test_matrices = [ + ht.ones((15, 15 * nprocs, 15), split=1, dtype=ht.float64), + ht.ones(15 * nprocs, split=0, dtype=ht.float64), + ] + for wrong_arr in wrong_test_matrices: + with self.assertRaises(ValueError): + ht.linalg.hsvd_rank(wrong_arr, 5) + with self.assertRaises(ValueError): + ht.linalg.hsvd_rtol(wrong_arr, 1e-1) + + # check if compute_sv=False yields the correct number of outputs (=1) + self.assertEqual(len(ht.linalg.hsvd_rank(test_matrices[0], 5)), 2) + self.assertEqual(len(ht.linalg.hsvd_rtol(test_matrices[0], 5e-1)), 2) def test_hsvd_rank_part2(self): # check if hsvd_rank yields correct results for maxrank <= truerank nprocs = MPI.COMM_WORLD.Get_size() true_rk = max(10, nprocs) - test_matrices_low_rank = [ - ht.utils.data.matrixgallery.random_known_rank( - 50, 15 * nprocs, true_rk, split=1, dtype=ht.float32 - ), - ht.utils.data.matrixgallery.random_known_rank( - 50, 15 * nprocs, true_rk, 
split=1, dtype=ht.float32 - ), - ht.utils.data.matrixgallery.random_known_rank( - 15 * nprocs, 50, true_rk, split=0, dtype=ht.float64 - ), - ht.utils.data.matrixgallery.random_known_rank( - 15 * nprocs, 50, true_rk, split=0, dtype=ht.float64 - ), - ] + if self.is_mps: + test_matrices_low_rank = [ + ht.utils.data.matrixgallery.random_known_rank( + 50, 15 * nprocs, true_rk, split=1, dtype=ht.float32 + ), + ht.utils.data.matrixgallery.random_known_rank( + 50, 15 * nprocs, true_rk, split=1, dtype=ht.float32 + ), + ] + else: + test_matrices_low_rank = [ + ht.utils.data.matrixgallery.random_known_rank( + 50, 15 * nprocs, true_rk, split=1, dtype=ht.float32 + ), + ht.utils.data.matrixgallery.random_known_rank( + 50, 15 * nprocs, true_rk, split=1, dtype=ht.float32 + ), + ht.utils.data.matrixgallery.random_known_rank( + 15 * nprocs, 50, true_rk, split=0, dtype=ht.float64 + ), + ht.utils.data.matrixgallery.random_known_rank( + 15 * nprocs, 50, true_rk, split=0, dtype=ht.float64 + ), + ] for mat in test_matrices_low_rank: A = mat[0] @@ -193,3 +214,120 @@ def test_hsvd_rank_part2(self): self.assertTrue(U_orth_err <= dtype_tol) self.assertTrue(V_orth_err <= dtype_tol) self.assertTrue(true_rel_err <= dtype_tol) + + +class TestRSVD(TestCase): + def test_rsvd(self): + if self.is_mps: + dtypes = [ht.float32] + else: + dtypes = [ht.float32, ht.float64] + for dtype in dtypes: + dtype_tol = 1e-4 if dtype == ht.float32 else 1e-10 + for split in [0, 1, None]: + X = ht.random.randn(200, 200, dtype=dtype, split=split) + for rank in [ht.MPI_WORLD.size, 10]: + for n_oversamples in [5, 10]: + for power_iter in [0, 1, 2, 3]: + U, S, V = ht.linalg.rsvd( + X, rank, n_oversamples=n_oversamples, power_iter=power_iter + ) + self.assertEqual(U.shape, (X.shape[0], rank)) + self.assertEqual(S.shape, (rank,)) + self.assertEqual(V.shape, (X.shape[1], rank)) + self.assertTrue(ht.all(S >= 0)) + self.assertTrue( + ht.allclose( + U.T @ U, + ht.eye(rank, dtype=U.dtype, split=U.split), + rtol=dtype_tol, + atol=dtype_tol, + ) + ) + self.assertTrue( + ht.allclose( + V.T @ V, + ht.eye(rank, dtype=V.dtype, split=V.split), + rtol=dtype_tol, + atol=dtype_tol, + ) + ) + + def test_rsvd_catch_wrong_inputs(self): + X = ht.random.randn(10, 10) + # wrong dtype for rank + with self.assertRaises(TypeError): + ht.linalg.rsvd(X, "a") + # rank zero + with self.assertRaises(ValueError): + ht.linalg.rsvd(X, 0) + # wrong dtype for n_oversamples + with self.assertRaises(TypeError): + ht.linalg.rsvd(X, 10, n_oversamples="a") + # n_oversamples negative + with self.assertRaises(ValueError): + ht.linalg.rsvd(X, 10, n_oversamples=-1) + # wrong dtype for power_iter + with self.assertRaises(TypeError): + ht.linalg.rsvd(X, 10, power_iter="a") + # power_iter negative + with self.assertRaises(ValueError): + ht.linalg.rsvd(X, 10, power_iter=-1) + + +class TestISVD(TestCase): + def test_isvd(self): + ht.random.seed(27183) + if self.is_mps: + dtypes = [ht.float32] + else: + dtypes = [ht.float32, ht.float64] + for dtype in dtypes: + dtypetol = 1e-5 if dtype == ht.float32 else 1e-10 + for old_split in [0, 1, None]: + X_old, SVD_old = ht.utils.data.matrixgallery.random_known_rank( + 250, 25, 3 * ht.MPI_WORLD.size, split=old_split, dtype=dtype + ) + U_old, S_old, V_old = SVD_old + for new_split in [0, 1, None]: + new_data = ht.random.randn( + 250, 2 * ht.MPI_WORLD.size, split=new_split, dtype=dtype + ) + U_new, S_new, V_new = ht.linalg.isvd(new_data, U_old, S_old, V_old) + # check if U_new, V_new are orthogonal + self.assertTrue( + ht.allclose( + U_new.T @ U_new, + 
ht.eye(U_new.shape[1], dtype=U_new.dtype, split=U_new.split), + atol=dtypetol, + rtol=dtypetol, + ) + ) + self.assertTrue( + ht.allclose( + V_new.T @ V_new, + ht.eye(V_new.shape[1], dtype=V_new.dtype, split=V_new.split), + atol=dtypetol, + rtol=dtypetol, + ) + ) + # check if entries of S_new are positive + self.assertTrue(ht.all(S_new >= 0)) + # check if the reconstruction error is small + X_new = ht.hstack([X_old, new_data.resplit_(X_old.split)]) + X_rec = U_new @ ht.diag(S_new) @ V_new.T + self.assertTrue(ht.allclose(X_rec, X_new, atol=dtypetol, rtol=dtypetol)) + + def test_isvd_catch_wrong_inputs(self): + u_old = ht.zeros((10, 2)) + s_old = ht.zeros((3,)) + v_old = ht.zeros((5, 3)) + new_data = ht.zeros((11, 5)) + with self.assertRaises(ValueError): + ht.linalg.isvd(new_data, u_old, s_old, v_old) + s_old = ht.zeros((2,)) + with self.assertRaises(ValueError): + ht.linalg.isvd(new_data, u_old, s_old, v_old) + v_old = ht.zeros((5, 2)) + with self.assertRaises(ValueError): + ht.linalg.isvd(new_data, u_old, s_old, v_old) diff --git a/heat/core/logical.py b/heat/core/logical.py index 59051006ed..9bb8204008 100644 --- a/heat/core/logical.py +++ b/heat/core/logical.py @@ -48,7 +48,7 @@ def all( reference to ``out`` is returned. Parameters - ----------- + ---------- x : DNDarray Input array or object that can be converted to an array. axis : None or int or Tuple[int,...], optional @@ -63,7 +63,7 @@ def all( With this option, the result will broadcast correctly against the original array. Examples - --------- + -------- >>> x = ht.random.randn(4, 5) >>> x DNDarray([[ 0.7199, 1.3718, 1.5008, 0.3435, 1.2884], @@ -114,7 +114,7 @@ def allclose( for all elements of ``x`` and ``y``, ``False`` otherwise Parameters - ----------- + ---------- x : DNDarray First array to compare y : DNDarray @@ -128,7 +128,7 @@ def allclose( the output array. Examples - --------- + -------- >>> x = ht.float32([[2, 2], [2, 2]]) >>> ht.allclose(x, x) True @@ -179,7 +179,7 @@ def any( The returning array is one dimensional unless axis is not ``None``. Parameters - ----------- + ---------- x : DNDarray Input tensor axis : int, optional @@ -193,7 +193,7 @@ def any( With this option, the result will broadcast correctly against the original array. Examples - --------- + -------- >>> x = ht.float32([[0.3, 0, 0.5]]) >>> x.any() DNDarray([True], dtype=ht.bool, device=cpu:0, split=None) @@ -234,7 +234,7 @@ def isclose( within the given tolerance. If both ``x`` and ``y`` are scalars, returns a single boolean value. Parameters - ----------- + ---------- x : DNDarray Input array to compare. y : DNDarray @@ -390,14 +390,14 @@ def logical_and(x: DNDarray, y: DNDarray) -> DNDarray: Compute the truth value of ``x`` AND ``y`` element-wise. Returns a boolean :class:`~heat.core.dndarray.DNDarray` containing the truth value of ``x`` AND ``y`` element-wise. Parameters - ----------- + ---------- x : DNDarray Input array of same shape y : DNDarray Input array of same shape Examples - --------- + -------- >>> ht.logical_and(ht.array([True, False]), ht.array([False, False])) DNDarray([False, False], dtype=ht.bool, device=cpu:0, split=None) """ @@ -411,7 +411,7 @@ def logical_not(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: Computes the element-wise logical NOT of the given input :class:`~heat.core.dndarray.DNDarray`. 
Parameters - ----------- + ---------- x : DNDarray Input array out : DNDarray, optional @@ -419,7 +419,7 @@ def logical_not(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: The output is a :class:`~heat.core.dndarray.DNDarray` with ``datatype=bool``. Examples - --------- + -------- >>> ht.logical_not(ht.array([True, False])) DNDarray([False, True], dtype=ht.bool, device=cpu:0, split=None) """ @@ -432,14 +432,14 @@ def logical_or(x: DNDarray, y: DNDarray) -> DNDarray: input :class:`~heat.core.dndarray.DNDarray`. Parameters - ----------- + ---------- x : DNDarray Input array of same shape y : DNDarray Input array of same shape Examples - --------- + -------- >>> ht.logical_or(ht.array([True, False]), ht.array([False, False])) DNDarray([ True, False], dtype=ht.bool, device=cpu:0, split=None) """ @@ -453,14 +453,14 @@ def logical_xor(x: DNDarray, y: DNDarray) -> DNDarray: Computes the element-wise logical XOR of the given input :class:`~heat.core.dndarray.DNDarray`. Parameters - ----------- + ---------- x : DNDarray Input array of same shape y : DNDarray Input array of same shape Examples - --------- + -------- >>> ht.logical_xor(ht.array([True, False, True]), ht.array([True, False, False])) DNDarray([False, False, True], dtype=ht.bool, device=cpu:0, split=None) """ @@ -473,7 +473,7 @@ def __sanitize_close_input(x: DNDarray, y: DNDarray) -> Tuple[DNDarray, DNDarray Provides copies of ``x`` and ``y`` distributed along the same split axis (if original split axes do not match). Parameters - ----------- + ---------- x : DNDarray The left-hand side operand. y : DNDarray @@ -493,7 +493,7 @@ def sanitize_input_type( In the former case, the scalar is wrapped in a :class:`~heat.core.dndarray.DNDarray`. Parameters - ----------- + ---------- x : Union[int, float, DNDarray] The left-hand side operand. 
y : Union[int, float, DNDarray] diff --git a/heat/core/manipulations.py b/heat/core/manipulations.py index 02ec09ec93..d685f4d5ad 100644 --- a/heat/core/manipulations.py +++ b/heat/core/manipulations.py @@ -192,7 +192,7 @@ def broadcast_to(x: DNDarray, shape: Tuple[int, ...]) -> DNDarray: -------- >>> import heat as ht >>> a = ht.arange(100, split=0) - >>> b = ht.broadcast_to(a, (10,100)) + >>> b = ht.broadcast_to(a, (10, 100)) >>> b.shape (10, 100) >>> b.split @@ -493,7 +493,12 @@ def concatenate(arrays: Sequence[DNDarray, ...], axis: int = 0) -> DNDarray: raise RuntimeError("Communicators of passed arrays mismatch.") # identify common data type + is_mps = arr0.larray.is_mps or arr1.larray.is_mps out_dtype = types.promote_types(arr0.dtype, arr1.dtype) + if is_mps and out_dtype == types.float64: + warnings.warn("MPS does not support float64, using float32 instead") + out_dtype = types.float32 + if arr0.dtype != out_dtype: arr0 = out_dtype(arr0, device=arr0.device) if arr1.dtype != out_dtype: @@ -503,7 +508,9 @@ def concatenate(arrays: Sequence[DNDarray, ...], axis: int = 0) -> DNDarray: # no splits, local concat if s0 is None and s1 is None: return factories.array( - torch.cat((arr0.larray, arr1.larray), dim=axis), device=arr0.device, comm=arr0.comm + torch.cat((arr0.larray, arr1.larray), dim=axis), + device=arr0.device, + comm=arr0.comm, ) # non-matching splits when both arrays are split @@ -770,10 +777,12 @@ def diag(a: DNDarray, offset: int = 0) -> DNDarray: (abs(offset),), dtype=a.dtype, split=None, device=a.device, comm=a.comm ) a = concatenate((padding, a)) - indices_x = torch.arange(max(0, min(abs(offset) - off, lshape[0])), lshape[0]) + indices_x = torch.arange( + max(0, min(abs(offset) - off, lshape[0])), lshape[0], device=a.device.torch_device + ) else: # Offset = 0 values on main diagonal - indices_x = torch.arange(0, lshape[0]) + indices_x = torch.arange(0, lshape[0], device=a.device.torch_device) indices_y = indices_x + off + offset a.balance_() @@ -887,7 +896,7 @@ def dsplit(x: Sequence[DNDarray, ...], indices_or_sections: Iterable) -> List[DN the array is always split along the third axis provided the array dimension is greater than or equal to 3. 
See Also - ------ + -------- :func:`split` :func:`hsplit` :func:`vsplit` @@ -945,7 +954,7 @@ def expand_dims(a: DNDarray, axis: int) -> DNDarray: Examples -------- - >>> x = ht.array([1,2]) + >>> x = ht.array([1, 2]) >>> x.shape (2,) >>> y = ht.expand_dims(x, axis=0) @@ -1023,7 +1032,7 @@ def flatten(a: DNDarray) -> DNDarray: Examples -------- - >>> a = ht.array([[[1,2],[3,4]],[[5,6],[7,8]]]) + >>> a = ht.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]) >>> ht.flatten(a) DNDarray([1, 2, 3, 4, 5, 6, 7, 8], dtype=ht.int64, device=cpu:0, split=None) """ @@ -1031,14 +1040,22 @@ if a.split is None: return factories.array( - torch.flatten(a.larray), dtype=a.dtype, is_split=None, device=a.device, comm=a.comm + torch.flatten(a.larray), + dtype=a.dtype, + is_split=None, + device=a.device, + comm=a.comm, ) if a.split > 0: a = resplit(a, 0) a = factories.array( - torch.flatten(a.larray), dtype=a.dtype, is_split=a.split, device=a.device, comm=a.comm + torch.flatten(a.larray), + dtype=a.dtype, + is_split=a.split, + device=a.device, + comm=a.comm, ) a.balance_() @@ -1068,12 +1085,12 @@ def flip(a: DNDarray, axis: Union[int, Tuple[int, ...]] = None) -> DNDarray: Examples -------- - >>> a = ht.array([[0,1],[2,3]]) + >>> a = ht.array([[0, 1], [2, 3]]) >>> ht.flip(a, [0]) DNDarray([[2, 3], [0, 1]], dtype=ht.int64, device=cpu:0, split=None) - >>> b = ht.array([[0,1,2],[3,4,5]], split=1) - >>> ht.flip(a, [0,1]) + >>> b = ht.array([[0, 1, 2], [3, 4, 5]], split=1) + >>> ht.flip(b, [0, 1]) (1/2) tensor([5,4,3]) (2/2) tensor([2,1,0]) """ @@ -1087,7 +1104,7 @@ def flip(a: DNDarray, axis: Union[int, Tuple[int, ...]] = None) -> DNDarray: flipped = torch.flip(a.larray, axis) - if a.split not in axis: + if not a.is_distributed() or a.split not in axis: return factories.array( flipped, dtype=a.dtype, is_split=a.split, device=a.device, comm=a.comm ) @@ -1125,11 +1142,11 @@ def fliplr(a: DNDarray) -> DNDarray: Examples -------- - >>> a = ht.array([[0,1],[2,3]]) + >>> a = ht.array([[0, 1], [2, 3]]) >>> ht.fliplr(a) DNDarray([[1, 0], [3, 2]], dtype=ht.int64, device=cpu:0, split=None) - >>> b = ht.array([[0,1,2],[3,4,5]], split=0) + >>> b = ht.array([[0, 1, 2], [3, 4, 5]], split=0) >>> ht.fliplr(b) (1/2) tensor([[2, 1, 0]]) (2/2) tensor([[5, 4, 3]]) @@ -1153,11 +1170,11 @@ def flipud(a: DNDarray) -> DNDarray: Examples -------- - >>> a = ht.array([[0,1],[2,3]]) + >>> a = ht.array([[0, 1], [2, 3]]) >>> ht.flipud(a) DNDarray([[2, 3], [0, 1]], dtype=ht.int64, device=cpu:0, split=None) - >>> b = ht.array([[0,1,2],[3,4,5]], split=0) + >>> b = ht.array([[0, 1, 2], [3, 4, 5]], split=0) >>> ht.flipud(b) (1/2) tensor([3,4,5]) (2/2) tensor([0,1,2]) @@ -1253,19 +1270,19 @@ def hstack(arrays: Sequence[DNDarray, ...]) -> DNDarray: Examples -------- - >>> a = ht.array((1,2,3)) - >>> b = ht.array((2,3,4)) - >>> ht.hstack((a,b)).larray + >>> a = ht.array((1, 2, 3)) + >>> b = ht.array((2, 3, 4)) + >>> ht.hstack((a, b)).larray [0/1] tensor([1, 2, 3, 2, 3, 4]) [1/1] tensor([1, 2, 3, 2, 3, 4]) - >>> a = ht.array((1,2,3), split=0) - >>> b = ht.array((2,3,4), split=0) - >>> ht.hstack((a,b)).larray + >>> a = ht.array((1, 2, 3), split=0) + >>> b = ht.array((2, 3, 4), split=0) + >>> ht.hstack((a, b)).larray [0/1] tensor([1, 2, 3]) [1/1] tensor([2, 3, 4]) - >>> a = ht.array([[1],[2],[3]], split=0) - >>> b = ht.array([[2],[3],[4]], split=0) - >>> ht.hstack((a,b)).larray + >>> a = ht.array([[1], [2], [3]], split=0) + >>> b = ht.array([[2], [3], [4]], split=0) + >>> ht.hstack((a, b)).larray [0/1] tensor([[1, 2], [0/1]
[2, 3]]) [1/1] tensor([[3, 4]]) @@ -1391,7 +1408,7 @@ def pad( Notes - ----------- + ----- This function follows the principle of datatype integrity. Therefore, an array can only be padded with values of the same datatype. All values that violate this rule are implicitly cast to the datatype of the `DNDarray`. @@ -1399,9 +1416,9 @@ def pad( Examples -------- >>> a = torch.arange(2 * 3 * 4).reshape(2, 3, 4) - >>> b = ht.array(a, split = 0) + >>> b = ht.array(a, split=0) Pad last dimension - >>> c = ht.pad(b, (2,1), constant_values=1) + >>> c = ht.pad(b, (2, 1), constant_values=1) tensor([[[ 1, 1, 0, 1, 2, 3, 1], [ 1, 1, 4, 5, 6, 7, 1], [ 1, 1, 8, 9, 10, 11, 1]], @@ -1409,7 +1426,7 @@ def pad( [ 1, 1, 16, 17, 18, 19, 1], [ 1, 1, 20, 21, 22, 23, 1]]]) Pad last 2 dimensions - >>> d = ht.pad(b, [(1,0), (2,1)]) + >>> d = ht.pad(b, [(1, 0), (2, 1)]) DNDarray([[[ 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 1, 2, 3, 0], [ 0, 0, 4, 5, 6, 7, 0], @@ -1420,7 +1437,7 @@ def pad( [ 0, 0, 16, 17, 18, 19, 0], [ 0, 0, 20, 21, 22, 23, 0]]], dtype=ht.int64, device=cpu:0, split=0) Pad last 3 dimensions - >>> e = ht.pad(b, ((2,1), [1,0], (2,1))) + >>> e = ht.pad(b, ((2, 1), [1, 0], (2, 1))) DNDarray([[[ 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0], @@ -1683,7 +1700,7 @@ def ravel(a: DNDarray) -> DNDarray: array to collapse Notes - ------ + ----- Returning a view of distributed data is only possible when `split != 0`. The returned DNDarray may be unbalanced. Otherwise, data must be communicated among processes, and `ravel` falls back to `flatten`. @@ -1693,9 +1710,9 @@ def ravel(a: DNDarray) -> DNDarray: Examples -------- - >>> a = ht.ones((2,3), split=0) + >>> a = ht.ones((2, 3), split=0) >>> b = ht.ravel(a) - >>> a[0,0] = 4 + >>> a[0, 0] = 4 >>> b DNDarray([4., 1., 1., 1., 1., 1.], dtype=ht.float32, device=cpu:0, split=0) """ @@ -1809,15 +1826,15 @@ def repeat(a: Iterable, repeats: Iterable, axis: Optional[int] = None) -> DNDarr >>> ht.repeat(3, 4) DNDarray([3, 3, 3, 3]) - >>> x = ht.array([[1,2],[3,4]]) + >>> x = ht.array([[1, 2], [3, 4]]) >>> ht.repeat(x, 2) DNDarray([1, 1, 2, 2, 3, 3, 4, 4]) - >>> x = ht.array([[1,2],[3,4]]) + >>> x = ht.array([[1, 2], [3, 4]]) >>> ht.repeat(x, [0, 1, 2, 0]) DNDarray([2, 3, 3]) - >>> ht.repeat(x, [1,2], axis=0) + >>> ht.repeat(x, [1, 2], axis=0) DNDarray([[1, 2], [3, 4], [3, 4]]) @@ -2030,6 +2047,8 @@ def reshape(a: DNDarray, *shape: Union[int, Tuple[int, ...]], **kwargs) -> DNDar The distribution axis of the reshaped array. If `new_split` is not provided, the reshaped array will have: - the same split axis as the input array, if the original dimensionality is unchanged; - split axis 0, if the number of dimensions is modified by reshaping. + **kwargs + Extra keyword arguments. 
Raises ------ @@ -2046,14 +2065,14 @@ def reshape(a: DNDarray, *shape: Union[int, Tuple[int, ...]], **kwargs) -> DNDar Examples -------- - >>> a = ht.zeros((3,4)) - >>> ht.reshape(a, (4,3)) + >>> a = ht.zeros((3, 4)) + >>> ht.reshape(a, (4, 3)) DNDarray([[0., 0., 0.], [0., 0., 0.], [0., 0., 0.], [0., 0., 0.]], dtype=ht.float32, device=cpu:0, split=None) >>> a = ht.linspace(0, 14, 8, split=0) - >>> ht.reshape(a, (2,4)) + >>> ht.reshape(a, (2, 4)) (1/2) tensor([[0., 2., 4., 6.]]) (2/2) tensor([[ 8., 10., 12., 14.]]) # 3-dim array, distributed along axis 1 @@ -2066,7 +2085,7 @@ def reshape(a: DNDarray, *shape: Union[int, Tuple[int, ...]], **kwargs) -> DNDar [[0.0680, 0.4944, 0.4114, 0.6669], [0.6423, 0.2625, 0.5413, 0.2225], [0.0197, 0.5079, 0.4739, 0.4387]]], dtype=ht.float32, device=cpu:0, split=1) - >>> a.reshape(-1, 3) # reshape to 2-dim array: split axis will be set to 0 + >>> a.reshape(-1, 3) # reshape to 2-dim array: split axis will be set to 0 DNDarray([[0.5525, 0.5434, 0.9477], [0.9503, 0.4165, 0.3924], [0.3310, 0.3935, 0.1008], @@ -2075,7 +2094,7 @@ def reshape(a: DNDarray, *shape: Union[int, Tuple[int, ...]], **kwargs) -> DNDar [0.6669, 0.6423, 0.2625], [0.5413, 0.2225, 0.0197], [0.5079, 0.4739, 0.4387]], dtype=ht.float32, device=cpu:0, split=0) - >>> a.reshape(2,3,2,2, new_split=1) # reshape to 4-dim array, specify distribution axis + >>> a.reshape(2, 3, 2, 2, new_split=1) # reshape to 4-dim array, specify distribution axis DNDarray([[[[0.5525, 0.5434], [0.9477, 0.9503]], @@ -2250,7 +2269,7 @@ def roll( Examples -------- - >>> a = ht.arange(20).reshape((4,5)) + >>> a = ht.arange(20).reshape((4, 5)) >>> a DNDarray([[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], @@ -2268,6 +2287,9 @@ def roll( [ 0, 1, 2, 3, 4]], dtype=ht.int32, device=cpu:0, split=None) """ sanitation.sanitize_in(x) + if isinstance(axis, list): + axis = tuple(axis) + axis = stride_tricks.sanitize_axis(x.shape, axis) if axis is None: return roll(x.flatten(), shift, 0).reshape(x.shape, new_split=x.split) @@ -2275,7 +2297,18 @@ def roll( # inputs are ints if isinstance(shift, int): if isinstance(axis, int): - if x.split is not None and (axis == x.split or (axis + x.ndim) == x.split): + if not x.is_distributed(): + return DNDarray( + torch.roll(x.larray, shift, axis), + gshape=x.shape, + dtype=x.dtype, + split=x.split, + device=x.device, + comm=x.comm, + balanced=x.balanced, + ) + # x is distributed + if axis == x.split: # roll along split axis size = x.comm.Get_size() rank = x.comm.Get_rank() @@ -2284,9 +2317,6 @@ def roll( lshape_map = x.create_lshape_map(force_check=False)[:, x.split] cumsum_map = torch.cumsum(lshape_map, dim=0) # cumulate along axis indices = torch.arange(size, device=x.device.torch_device) - # NOTE Can be removed when min version>=1.9 - if "1.8." 
in torch.__version__: # pragma: no cover - lshape_map = lshape_map.to(torch.int64) index_map = torch.repeat_interleave(indices, lshape_map) # index -> process # compute index positions @@ -2329,7 +2359,17 @@ def roll( raise TypeError(f"axis must be a int, list or a tuple, got {type(axis)}") shift = [shift] * len(axis) - + if not x.is_distributed(): + return DNDarray( + torch.roll(x.larray, shift, axis), + gshape=x.shape, + dtype=x.dtype, + split=x.split, + device=x.device, + comm=x.comm, + balanced=x.balanced, + ) + # x is distributed return roll(x, shift, axis) else: # input must be tuples now @@ -2354,7 +2394,18 @@ def roll( if not isinstance(axis[i], int): raise TypeError(f"Element {i} in axis is not an integer, got {type(axis[i])}") - if x.split is not None and (x.split in axis or (x.split - x.ndim) in axis): + if not x.is_distributed(): + return DNDarray( + torch.roll(x.larray, shift, axis), + gshape=x.shape, + dtype=x.dtype, + split=x.split, + device=x.device, + comm=x.comm, + balanced=x.balanced, + ) + # x is distributed + if x.split in axis: # remove split axis elements shift_split = 0 for y in (x.split, x.split - x.ndim): @@ -2416,7 +2467,7 @@ def rot90(m: DNDarray, k: int = 1, axes: Sequence[int, int] = (0, 1)) -> DNDarra Examples -------- - >>> m = ht.array([[1,2],[3,4]], dtype=ht.int) + >>> m = ht.array([[1, 2], [3, 4]], dtype=ht.int) >>> m DNDarray([[1, 2], [3, 4]], dtype=ht.int32, device=cpu:0, split=None) @@ -2426,8 +2477,8 @@ def rot90(m: DNDarray, k: int = 1, axes: Sequence[int, int] = (0, 1)) -> DNDarra >>> ht.rot90(m, 2) DNDarray([[4, 3], [2, 1]], dtype=ht.int32, device=cpu:0, split=None) - >>> m = ht.arange(8).reshape((2,2,2)) - >>> ht.rot90(m, 1, (1,2)) + >>> m = ht.arange(8).reshape((2, 2, 2)) + >>> ht.rot90(m, 1, (1, 2)) DNDarray([[[1, 3], [0, 2]], @@ -2536,7 +2587,7 @@ def sort(a: DNDarray, axis: int = -1, descending: bool = False, out: Optional[DN """ stride_tricks.sanitize_axis(a.shape, axis) - if a.split is None or axis != a.split: + if not a.is_distributed() or axis != a.split: # sorting is not affected by split -> we can just sort along the axis final_result, final_indices = torch.sort(a.larray, dim=axis, descending=descending) @@ -2791,7 +2842,7 @@ def split(x: DNDarray, indices_or_sections: Iterable, axis: int = 0) -> List[DND Examples -------- - >>> x = ht.arange(12).reshape((4,3)) + >>> x = ht.arange(12).reshape((4, 3)) >>> ht.split(x, 2) [ DNDarray([[0, 1, 2], [3, 4, 5]]), @@ -2989,7 +3040,7 @@ def squeeze(x: DNDarray, axis: Union[int, Tuple[int, ...]] = None) -> DNDarray: Split semantics: see Notes below. Parameters - ----------- + ---------- x : DNDarray Input data. axis : None or int or Tuple[int,...], optional @@ -3006,9 +3057,9 @@ def squeeze(x: DNDarray, axis: Union[int, Tuple[int, ...]] = None) -> DNDarray: which, depending on the squeeze axis, may result in a lower numerical `split` value (see Examples). 
Examples - --------- + -------- >>> import heat as ht - >>> a = ht.random.randn(1,3,1,5) + >>> a = ht.random.randn(1, 3, 1, 5) >>> a DNDarray([[[[-0.2604, 1.3512, 0.1175, 0.4197, 1.3590]], [[-0.2777, -1.1029, 0.0697, -1.3074, -1.1931]], @@ -3021,11 +3072,11 @@ def squeeze(x: DNDarray, axis: Union[int, Tuple[int, ...]] = None) -> DNDarray: DNDarray([[-0.2604, 1.3512, 0.1175, 0.4197, 1.3590], [-0.2777, -1.1029, 0.0697, -1.3074, -1.1931], [-0.4512, -1.2348, -1.1479, -0.0242, 0.4050]], dtype=ht.float32, device=cpu:0, split=None) - >>> ht.squeeze(a,axis=0).shape + >>> ht.squeeze(a, axis=0).shape (3, 1, 5) - >>> ht.squeeze(a,axis=-2).shape + >>> ht.squeeze(a, axis=-2).shape (1, 3, 5) - >>> ht.squeeze(a,axis=1).shape + >>> ht.squeeze(a, axis=1).shape Traceback (most recent call last): ... ValueError: Dimension along axis 1 is not 1 for shape (1, 3, 1, 5) @@ -3137,7 +3188,7 @@ def stack( -------- >>> a = ht.arange(20).reshape((4, 5)) >>> b = ht.arange(20, 40).reshape((4, 5)) - >>> ht.stack((a,b), axis=0).larray + >>> ht.stack((a, b), axis=0).larray tensor([[[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14], @@ -3149,7 +3200,7 @@ def stack( >>> # distributed DNDarrays, 3 processes, stack along last dimension >>> a = ht.arange(20, split=0).reshape(4, 5) >>> b = ht.arange(20, 40, split=0).reshape(4, 5) - >>> ht.stack((a,b), axis=-1).larray + >>> ht.stack((a, b), axis=-1).larray [0/2] tensor([[[ 0, 20], [0/2] [ 1, 21], [0/2] [ 2, 22], @@ -3241,7 +3292,7 @@ def swapaxes(x: DNDarray, axis1: int, axis2: int) -> DNDarray: Examples -------- - >>> x = ht.array([[[0,1],[2,3]],[[4,5],[6,7]]]) + >>> x = ht.array([[[0, 1], [2, 3]], [[4, 5], [6, 7]]]) >>> ht.swapaxes(x, 0, 1) DNDarray([[[0, 1], [4, 5]], @@ -3270,7 +3321,7 @@ def swapaxes(x: DNDarray, axis1: int, axis2: int) -> DNDarray: def unique( a: DNDarray, sorted: bool = False, return_inverse: bool = False, axis: int = None -) -> Tuple[DNDarray, torch.tensor]: +) -> Tuple[DNDarray, DNDarray]: """ Finds and returns the unique elements of a `DNDarray`. If return_inverse is `True`, the second tensor will hold the list of inverse indices @@ -3302,7 +3353,7 @@ def unique( array([[2, 3], [3, 1]]) """ - if a.split is None: + if not a.is_distributed(): torch_output = torch.unique( a.larray, sorted=sorted, return_inverse=return_inverse, dim=axis ) @@ -3467,8 +3518,12 @@ def unique( result.resplit_(a.split) return_value = result + if return_inverse: - return_value = [return_value, inverse_indices.to(a.device.torch_device)] + inverse_indices = factories.array( + inverse_indices, dtype=inverse_pos.dtype, device=a.device, comm=a.comm + ) + return_value = [return_value, inverse_indices] return return_value @@ -3485,6 +3540,7 @@ def unfold(a: DNDarray, axis: int, size: int, step: int = 1): """ Returns a DNDarray which contains all slices of size `size` in the axis `axis`. Behaves like torch.Tensor.unfold for DNDarrays. [torch.Tensor.unfold](https://pytorch.org/docs/stable/generated/torch.Tensor.unfold.html) + Parameters ---------- a : DNDarray @@ -3649,7 +3705,13 @@ def resplit(arr: DNDarray, axis: Optional[int] = None) -> DNDarray: Examples -------- - >>> a = ht.zeros((4, 5,), split=0) + >>> a = ht.zeros( + ... ( + ... 4, + ... 5, + ... ), + ... split=0, + ... ) >>> a.lshape (0/2) (2, 5) (1/2) (2, 5) @@ -3659,7 +3721,13 @@ def resplit(arr: DNDarray, axis: Optional[int] = None) -> DNDarray: >>> b.lshape (0/2) (4, 5) (1/2) (4, 5) - >>> a = ht.zeros((4, 5,), split=0) + >>> a = ht.zeros( + ... ( + ... 4, + ... 5, + ... ), + ... split=0, + ... 
) >>> a.lshape (0/2) (2, 5) (1/2) (2, 5) @@ -3762,8 +3830,17 @@ def _axis2axisResplit( return target_larray -DNDarray._axis2axisResplit = lambda self, comm, source_larray, source_split, source_tiles, target_larray, target_split, target_tile: _axis2axisResplit( - comm, source_larray, source_split, source_tiles, target_larray, target_split, target_tile +DNDarray._axis2axisResplit = ( + lambda self, + comm, + source_larray, + source_split, + source_tiles, + target_larray, + target_split, + target_tile: _axis2axisResplit( + comm, source_larray, source_split, source_tiles, target_larray, target_split, target_tile + ) ) DNDarray._axis2axisResplit.__doc__ = _axis2axisResplit.__doc__ @@ -3872,7 +3949,7 @@ def vstack(arrays: Sequence[DNDarray, ...]) -> DNDarray: 1-D arrays must have the same length. Notes - ------- + ----- The split axis will be switched to 1 in the case that both elements are 1D and split=0 See Also @@ -3888,21 +3965,21 @@ def vstack(arrays: Sequence[DNDarray, ...]) -> DNDarray: -------- >>> a = ht.array([1, 2, 3]) >>> b = ht.array([2, 3, 4]) - >>> ht.vstack((a,b)).larray + >>> ht.vstack((a, b)).larray [0/1] tensor([[1, 2, 3], [0/1] [2, 3, 4]]) [1/1] tensor([[1, 2, 3], [1/1] [2, 3, 4]]) >>> a = ht.array([1, 2, 3], split=0) >>> b = ht.array([2, 3, 4], split=0) - >>> ht.vstack((a,b)).larray + >>> ht.vstack((a, b)).larray [0/1] tensor([[1, 2], [0/1] [2, 3]]) [1/1] tensor([[3], [1/1] [4]]) >>> a = ht.array([[1], [2], [3]], split=0) >>> b = ht.array([[2], [3], [4]], split=0) - >>> ht.vstack((a,b)).larray + >>> ht.vstack((a, b)).larray [0] tensor([[1], [0] [2], [0] [3]]) @@ -3949,7 +4026,7 @@ def tile(x: DNDarray, reps: Sequence[int, ...]) -> DNDarray: Examples -------- - >>> x = ht.arange(12).reshape((4,3)).resplit_(0) + >>> x = ht.arange(12).reshape((4, 3)).resplit_(0) >>> x DNDarray([[ 0, 1, 2], [ 3, 4, 5], @@ -4185,7 +4262,7 @@ def topk( (Not Stable for split arrays) Parameters - ----------- + ---------- a: DNDarray Input data k: int @@ -4202,16 +4279,16 @@ def topk( Examples -------- >>> a = ht.array([1, 2, 3]) - >>> ht.topk(a,2) + >>> ht.topk(a, 2) (DNDarray([3, 2], dtype=ht.int64, device=cpu:0, split=None), DNDarray([2, 1], dtype=ht.int64, device=cpu:0, split=None)) - >>> a = ht.array([[1,2,3],[1,2,3]]) - >>> ht.topk(a,2,dim=1) + >>> a = ht.array([[1, 2, 3], [1, 2, 3]]) + >>> ht.topk(a, 2, dim=1) (DNDarray([[3, 2], [3, 2]], dtype=ht.int64, device=cpu:0, split=None), DNDarray([[2, 1], [2, 1]], dtype=ht.int64, device=cpu:0, split=None)) - >>> a = ht.array([[1,2,3],[1,2,3]], split=1) - >>> ht.topk(a,2,dim=1) + >>> a = ht.array([[1, 2, 3], [1, 2, 3]], split=1) + >>> ht.topk(a, 2, dim=1) (DNDarray([[3, 2], [3, 2]], dtype=ht.int64, device=cpu:0, split=1), DNDarray([[2, 1], @@ -4267,10 +4344,16 @@ def local_topk(*args, **kwargs): metadata = torch.tensor( [k, dim, largest, sorted, local_shape_len, *local_shape], device=indices.device ) - send_buffer = torch.cat( - (metadata.double(), result.double().flatten(), indices.flatten().double()) - ) + if result.is_mps: + # MPS does not support double precision + send_buffer = torch.cat( + (metadata.float(), result.float().flatten(), indices.flatten().float()) + ) + else: + send_buffer = torch.cat( + (metadata.double(), result.double().flatten(), indices.flatten().double()) + ) return send_buffer gres = _operations.__reduce_op( diff --git a/heat/core/memory.py b/heat/core/memory.py index 72b8cc7d9b..dbf2d8723e 100644 --- a/heat/core/memory.py +++ b/heat/core/memory.py @@ -1,5 +1,5 @@ """ -This module changes the internal memory of an array. 
+Utilities to manage the internal memory of an array. """ import torch @@ -21,7 +21,7 @@ def copy(x: DNDarray) -> DNDarray: Examples -------- - >>> a = ht.array([1,2,3]) + >>> a = ht.array([1, 2, 3]) >>> b = ht.copy(a) >>> b DNDarray([1, 2, 3], dtype=ht.int64, device=cpu:0, split=None) @@ -44,7 +44,7 @@ def sanitize_memory_layout(x: torch.Tensor, order: str = "C") -> torch.Tensor: Return the given object with memory layout as defined below. The default memory distribution is assumed. Parameters - ----------- + ---------- x: torch.Tensor Input data order: str, optional. diff --git a/heat/core/printing.py b/heat/core/printing.py index 5f9f95218d..660c333e39 100644 --- a/heat/core/printing.py +++ b/heat/core/printing.py @@ -205,6 +205,14 @@ def __str__(dndarray) -> str: ) +def __repr__(dndarray) -> str: + """ + Returns a printable representation of the passed DNDarray. + Unlike the __str__ method, which prints a representation targeted at users, this method targets developers by showing key internal parameters of the DNDarray. + """ + # field list assumed from the docstring above; the original literal was lost to markup stripping + return f"<DNDarray(shape={dndarray.shape}, dtype={dndarray.dtype}, split={dndarray.split}, device={dndarray.device}, comm={dndarray.comm})>" + + def _torch_data(dndarray, summarize) -> DNDarray: """ Extracts the data to be printed from the DNDarray in form of a torch tensor and returns it. diff --git a/heat/core/random.py b/heat/core/random.py index 5d22dbcc08..02e8bfad7d 100644 --- a/heat/core/random.py +++ b/heat/core/random.py @@ -129,8 +129,8 @@ def __counter_sequence( c_0 = (__counter & (max_count << 64)) >> 64 c_1 = __counter & max_count total_elements = torch.prod(torch.tensor(shape)) - if total_elements.item() > 2 * max_count: - raise ValueError(f"Shape is to big with {total_elements} elements") + # if total_elements.item() > 2 * max_count: + # raise ValueError(f"Shape is too big with {total_elements} elements") if split is None: values = total_elements.item() // 2 + total_elements.item() % 2 @@ -216,7 +216,7 @@ def __counter_sequence( tmp_counter += used_values __counter = tmp_counter & 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF # 128-bit mask - return x_0.contiguous(), x_1.contiguous(), lshape, lslice + return x_0, x_1, lshape, lslice def get_state() -> Tuple[str, int, int, int, float]: @@ -332,7 +332,7 @@ def normal( Examples -------- - >>> ht.random.normal(ht.array([-1,2]), ht.array([0.5, 2]), (2,)) + >>> ht.random.normal(ht.array([-1, 2]), ht.array([0.5, 2]), (2,)) DNDarray([-1.4669, 1.6596], dtype=ht.float64, device=cpu:0, split=None) """ if not (isinstance(mean, (float, int))) and not isinstance(mean, DNDarray): @@ -348,23 +348,26 @@ return mean + std * standard_normal(shape, dtype, split, device, comm) -def permutation(x: Union[int, DNDarray]) -> DNDarray: +def permutation(x: Union[int, DNDarray], **kwargs) -> DNDarray: """ Randomly permute a sequence, or return a permuted range. If ``x`` is a multi-dimensional array, it is only shuffled along its first index. Parameters - ----------- + ---------- x : int or DNDarray If ``x`` is an integer, call :func:`heat.random.randperm`. If ``x`` is an array, make a copy and shuffle the elements randomly. + kwargs : dict, optional + Additional keyword arguments passed to :func:`heat.random.randperm` if ``x`` is an integer. + See Also - ----------- + -------- :func:`heat.random.randperm` for randomly permuted ranges.
Examples - ---------- + -------- >>> ht.random.permutation(10) DNDarray([9, 1, 5, 4, 8, 2, 7, 6, 3, 0], dtype=ht.int64, device=cpu:0, split=None) >>> ht.random.permutation(ht.array([1, 4, 9, 12, 15])) @@ -381,7 +384,7 @@ Thus, the array containing these indices needs to fit into the memory of a single MPI-process. """ if isinstance(x, int): - return randperm(x) + return randperm(x, **kwargs) if not isinstance(x, DNDarray): raise TypeError("x must be int or DNDarray") @@ -434,7 +437,7 @@ def rand( - *args: List[int], + *d: int, dtype: Type[datatype] = types.float32, split: Optional[int] = None, device: Optional[Device] = None, @@ -446,7 +449,7 @@ Parameters ---------- - d1,d2,…,dn : List[int,...] + *d : int, optional The dimensions of the returned array, should all be positive. If no argument is given, a single random sample is generated. dtype : Type[datatype], optional @@ -472,11 +475,11 @@ DNDarray([0.1921, 0.9635, 0.5047], dtype=ht.float32, device=cpu:0, split=None) """ # if args are not set, generate a single sample - if not args: + if not d: shape = (1,) else: # ensure that the passed dimensions are positive integer-likes - shape = tuple(int(ele) for ele in args) + shape = tuple(int(ele) for ele in d) if any(ele <= 0 for ele in shape): raise ValueError("negative dimensions are not allowed") @@ -523,7 +526,7 @@ ) if split is None: x = x.resplit_(None) - if not args or shape == (): + if not d or shape == (): x = x.item() return x @@ -563,7 +566,7 @@ def randint( Handle to the nodes holding distributed parts or copies of this array. Raises - ------- + ------ TypeError If one of low or high is not an int. ValueError @@ -619,7 +622,6 @@ x_0, x_1 = __threefry32(x_0, x_1, seed=__seed) else: # torch.int64 x_0, x_1 = __threefry64(x_0, x_1, seed=__seed) - # stack the resulting sequence and normalize to given range values = torch.stack([x_0, x_1], dim=1).flatten()[lslice].reshape(lshape) # ATTENTION: this is biased and known, bias-free rejection sampling is difficult to do in parallel @@ -665,7 +667,7 @@ def random_integer( def randn( - *args: List[int], + *d: int, dtype: Type[datatype] = types.float32, split: Optional[int] = None, device: Optional[str] = None, @@ -676,7 +678,7 @@ Parameters ---------- - d1,d2,…,dn : List[int,...] + *d : int, optional The dimensions of the returned array, should all be positive. dtype : Type[datatype], optional The datatype of the returned values. Has to be one of :class:`~heat.core.types.float32` or @@ -697,7 +699,7 @@ Accepts arguments for mean and standard deviation. Raises - ------- + ------ TypeError If one of the dimensions in ``d`` is not an integer.
ValueError @@ -716,7 +718,7 @@ def randn( if __rng == "Threefry": # use threefry RNG and the Kundu transform to generate normally distributed random numbers # generate uniformly distributed random numbers first - normal_tensor = rand(*args, dtype=dtype, split=split, device=device, comm=comm) + normal_tensor = rand(*d, dtype=dtype, split=split, device=device, comm=comm) # convert the values to a normal distribution using the Kundu transform normal_tensor.larray = __kundu_transform(normal_tensor.larray) @@ -724,11 +726,11 @@ else: # use batchparallel RNG and torch's generation of normally distributed random numbers # if args are not set, generate a single sample - if not args: + if not d: shape = (1,) else: # ensure that the passed dimensions are positive integer-likes - shape = tuple(int(ele) for ele in args) + shape = tuple(int(ele) for ele in d) if any(ele <= 0 for ele in shape): raise ValueError("negative dimensions are not allowed") @@ -749,7 +751,7 @@ ) if split is None: x = x.resplit_(None) - if not args or shape == (): + if not d or shape == (): x = x.item() return x @@ -779,7 +781,7 @@ def randperm( Handle to the nodes holding distributed parts or copies of this array. Raises - ------- + ------ TypeError If ``n`` is not an integer. @@ -798,7 +800,7 @@ device = devices.sanitize_device(device) comm = communication.sanitize_comm(comm) perm = torch.randperm(n, dtype=dtype.torch_type(), device=device.torch_device) - if __rng != "Threefry": + if comm.Get_size() > 1 and __rng != "Threefry": comm.Bcast(perm, root=0) return factories.array(perm, dtype=dtype, device=device, split=split, comm=comm) diff --git a/heat/core/relational.py b/heat/core/relational.py index 940d4538df..19cd7646b8 100644 --- a/heat/core/relational.py +++ b/heat/core/relational.py @@ -16,6 +16,7 @@ from . import types from . import sanitation from . import factories +from .
import devices __all__ = [ "eq", @@ -48,9 +49,9 @@ def eq(x, y) -> DNDarray: The second operand involved in the comparison Examples - --------- + -------- >>> import heat as ht - >>> x = ht.float32([[1, 2],[3, 4]]) + >>> x = ht.float32([[1, 2], [3, 4]]) >>> ht.eq(x, 3.0) DNDarray([[False, False], [ True, False]], dtype=ht.bool, device=cpu:0, split=None) @@ -97,10 +98,10 @@ def equal(x: Union[DNDarray, float, int], y: Union[DNDarray, float, int]) -> boo The second operand involved in the comparison Examples - --------- + -------- >>> import heat as ht - >>> x = ht.float32([[1, 2],[3, 4]]) - >>> ht.equal(x, ht.float32([[1, 2],[3, 4]])) + >>> x = ht.float32([[1, 2], [3, 4]]) + >>> ht.equal(x, ht.float32([[1, 2], [3, 4]])) True >>> y = ht.float32([[2, 2], [2, 2]]) >>> ht.equal(x, y) @@ -144,7 +145,11 @@ def equal(x: Union[DNDarray, float, int], y: Union[DNDarray, float, int]) -> boo target_map[: x.comm.rank + 1, y.split].sum(), ) x = factories.array( - x.larray[tuple(idx)], is_split=y.split, copy=False, comm=x.comm, device=x.device + x.larray[tuple(idx)], + is_split=y.split, + copy=False, + comm=x.comm, + device=x.device, ) elif x.split is not None and y.split is None: if x.is_balanced(force_check=False): @@ -157,7 +162,11 @@ def equal(x: Union[DNDarray, float, int], y: Union[DNDarray, float, int]) -> boo target_map[: y.comm.rank + 1, x.split].sum(), ) y = factories.array( - y.larray[tuple(idx)], is_split=x.split, copy=False, comm=y.comm, device=y.device + y.larray[tuple(idx)], + is_split=x.split, + copy=False, + comm=y.comm, + device=y.device, ) elif x.split != y.split: raise ValueError( @@ -171,6 +180,9 @@ def equal(x: Union[DNDarray, float, int], y: Union[DNDarray, float, int]) -> boo y = y.balance() result_type = types.result_type(x, y) + is_mps = x.larray.is_mps or y.larray.is_mps + if is_mps and result_type is types.float64: + result_type = types.float32 x = x.astype(result_type) y = y.astype(result_type) @@ -196,9 +208,9 @@ def ge(x: Union[DNDarray, float, int], y: Union[DNDarray, float, int]) -> DNDarr The second operand to be compared less than or equal to first operand Examples - ------- + -------- >>> import heat as ht - >>> x = ht.float32([[1, 2],[3, 4]]) + >>> x = ht.float32([[1, 2], [3, 4]]) >>> ht.ge(x, 3.0) DNDarray([[False, False], [ True, True]], dtype=ht.bool, device=cpu:0, split=None) @@ -245,9 +257,9 @@ def gt(x: Union[DNDarray, float, int], y: Union[DNDarray, float, int]) -> DNDarr The second operand to be compared less than first operand Examples - ------- + -------- >>> import heat as ht - >>> x = ht.float32([[1, 2],[3, 4]]) + >>> x = ht.float32([[1, 2], [3, 4]]) >>> ht.gt(x, 3.0) DNDarray([[False, False], [False, True]], dtype=ht.bool, device=cpu:0, split=None) @@ -294,9 +306,9 @@ def le(x: Union[DNDarray, float, int], y: Union[DNDarray, float, int]) -> DNDarr The second operand to be compared greater than or equal to first operand Examples - ------- + -------- >>> import heat as ht - >>> x = ht.float32([[1, 2],[3, 4]]) + >>> x = ht.float32([[1, 2], [3, 4]]) >>> ht.le(x, 3.0) DNDarray([[ True, True], [ True, False]], dtype=ht.bool, device=cpu:0, split=None) @@ -343,9 +355,9 @@ def lt(x: Union[DNDarray, float, int], y: Union[DNDarray, float, int]) -> DNDarr The second operand to be compared greater than first operand Examples - ------- + -------- >>> import heat as ht - >>> x = ht.float32([[1, 2],[3, 4]]) + >>> x = ht.float32([[1, 2], [3, 4]]) >>> ht.lt(x, 3.0) DNDarray([[ True, True], [False, False]], dtype=ht.bool, device=cpu:0, split=None) @@ -393,9 +405,9 @@ def 
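The redistribution logic in ``equal`` above aligns operands whose splits differ before comparing; a small usage sketch:

    import heat as ht

    x = ht.arange(12, split=0).reshape((3, 4))  # distributed operand
    y = ht.arange(12).reshape((3, 4))           # replicated operand, split=None
    # equal() redistributes internally and returns a plain Python bool
    assert ht.equal(x, y)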
ne(x, y) -> DNDarray: The second operand involved in the comparison Examples - --------- + -------- >>> import heat as ht - >>> x = ht.float32([[1, 2],[3, 4]]) + >>> x = ht.float32([[1, 2], [3, 4]]) >>> ht.ne(x, 3.0) DNDarray([[ True, True], [False, True]], dtype=ht.bool, device=cpu:0, split=None) diff --git a/heat/core/rounding.py b/heat/core/rounding.py index dcee642b4c..32d5620ab9 100644 --- a/heat/core/rounding.py +++ b/heat/core/rounding.py @@ -45,7 +45,7 @@ def abs( precision. Raises - ------- + ------ TypeError If dtype is not a heat type. """ @@ -143,7 +143,7 @@ def clip(x: DNDarray, min, max, out: Optional[DNDarray] = None) -> DNDarray: the right shape to hold the output. Its type is preserved. Raises - ------- + ------ ValueError if either min or max is not set """ @@ -154,7 +154,13 @@ def clip(x: DNDarray, min, max, out: Optional[DNDarray] = None) -> DNDarray: if out is None: return dndarray.DNDarray( - x.larray.clamp(min, max), x.shape, x.dtype, x.split, x.device, x.comm, x.balanced + x.larray.clamp(min, max), + x.shape, + x.dtype, + x.split, + x.device, + x.comm, + x.balanced, ) sanitation.sanitize_out(out, x.gshape, x.split, x.device) @@ -237,7 +243,7 @@ def modf(x: DNDarray, out: Optional[Tuple[DNDarray, DNDarray]] = None) -> Tuple[ If not provided or ``None``, a freshly-allocated array is returned. Raises - ------- + ------ TypeError if ``x`` is not a :class:`~heat.core.dndarray.DNDarray` TypeError @@ -305,7 +311,7 @@ def round( precision. Raises - ------- + ------ TypeError if dtype is not a heat data type @@ -361,7 +367,7 @@ def sgn(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: >>> a = ht.array([-1, -0.5, 0, 0.5, 1]) >>> ht.sign(a) DNDarray([-1., -1., 0., 1., 1.], dtype=ht.float32, device=cpu:0, split=None) - >>> ht.sgn(ht.array([5-2j, 3+4j])) + >>> ht.sgn(ht.array([5 - 2j, 3 + 4j])) DNDarray([(0.9284766912460327-0.3713906705379486j), (0.6000000238418579+0.800000011920929j)], dtype=ht.complex64, device=cpu:0, split=None) """ return _operations.__local_op(torch.sgn, x, out) @@ -388,7 +394,7 @@ def sign(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: >>> a = ht.array([-1, -0.5, 0, 0.5, 1]) >>> ht.sign(a) DNDarray([-1., -1., 0., 1., 1.], dtype=ht.float32, device=cpu:0, split=None) - >>> ht.sign(ht.array([5-2j, 3+4j])) + >>> ht.sign(ht.array([5 - 2j, 3 + 4j])) DNDarray([(1+0j), (1+0j)], dtype=ht.complex64, device=cpu:0, split=None) """ # special case for complex values diff --git a/heat/core/sanitation.py b/heat/core/sanitation.py index a820c8b92e..cfebfb61dc 100644 --- a/heat/core/sanitation.py +++ b/heat/core/sanitation.py @@ -58,7 +58,7 @@ def sanitize_distribution( When the split-axes or sizes along the split-axis do not match. See Also - --------- + -------- :func:`~heat.core.dndarray.create_lshape_map` Function to create the lshape_map. """ @@ -139,8 +139,7 @@ def sanitize_distribution( ) elif not ( # False - target_balanced - and arg.is_balanced(force_check=False) + target_balanced and arg.is_balanced(force_check=False) ): # Split axes are the same and atleast one is not balanced current_map = arg.lshape_map out_map = current_map.clone() @@ -174,12 +173,29 @@ def sanitize_in(x: Any): raise TypeError(f"Input must be a DNDarray, is {type(x)}") +def sanitize_in_nd_realfloating(input: Any, inputname: str, allowed_ns: List[int]) -> None: + """ + Verify that input object ``input`` is a real floating point ``DNDarray`` with number of dimensions contained in ``allowed_ns``. + The argument ``inputname`` is used for error messages. 
+ """ + if not isinstance(input, DNDarray): + raise TypeError(f"Argument {inputname} needs to be a DNDarray but is {type(input)}.") + if input.ndim not in allowed_ns: + raise ValueError( + f"Argument {inputname} needs to be a {allowed_ns}-dimensional, but is {input.ndim}-dimensional." + ) + if not types.heat_type_is_realfloating(input.dtype): + raise TypeError( + f"Argument {inputname} needs to be a DNDarray with datatype float32 or float64, but data type is {input.dtype}." + ) + + def sanitize_infinity(x: Union[DNDarray, torch.Tensor]) -> Union[int, float]: """ Returns largest possible value for the ``dtype`` of the input array. Parameters - ----------- + ---------- x: Union[DNDarray, torch.Tensor] Input object. """ diff --git a/heat/core/signal.py b/heat/core/signal.py index 82cba98566..4f02482ae0 100644 --- a/heat/core/signal.py +++ b/heat/core/signal.py @@ -5,15 +5,15 @@ from .communication import MPI from .dndarray import DNDarray -from .types import promote_types +from .types import promote_types, float32, float64 from .manipulations import pad, flip -from .factories import array, zeros +from .factories import array, zeros, arange import torch.nn.functional as fc __all__ = ["convolve"] -def convolve(a: DNDarray, v: DNDarray, mode: str = "full") -> DNDarray: +def convolve(a: DNDarray, v: DNDarray, mode: str = "full", stride: int = 1) -> DNDarray: """ Returns the discrete, linear convolution of two one-dimensional `DNDarray`s or scalars. Unlike `numpy.signal.convolve`, if ``a`` and/or ``v`` have more than one dimension, batch-convolution along the last dimension will be attempted. See `Examples` below. @@ -30,7 +30,7 @@ def convolve(a: DNDarray, v: DNDarray, mode: str = "full") -> DNDarray: Can be 'full', 'valid', or 'same'. Default is 'full'. 'full': Returns the convolution at - each point of overlap, with an output shape of (N+M-1,). At + each point of overlap, with a length of '(N+M-2)//stride+1'. At the end-points of the convolution, the signals do not overlap completely, and boundary effects may be seen. 'same': @@ -38,34 +38,43 @@ def convolve(a: DNDarray, v: DNDarray, mode: str = "full") -> DNDarray: effects are still visible. This mode is not supported for even-sized filter weights 'valid': - Mode 'valid' returns output of length 'N-M+1'. The + Mode 'valid' returns output of length '(N-M)//stride+1'. The convolution product is only given for points where the signals overlap completely. Values outside the signal boundary have no effect. + stride : int + Stride of the convolution. Must be a positive integer. Default is 1. + Stride must be 1 for mode 'same'. 
Examples -------- Note how the convolution operator flips the second array before "sliding" the two across one another: - >>> a = ht.ones(10) + >>> a = ht.ones(5) >>> v = ht.arange(3).astype(ht.float) - >>> ht.convolve(a, v, mode='full') + >>> ht.convolve(a, v, mode="full") DNDarray([0., 1., 3., 3., 3., 3., 2.]) - >>> ht.convolve(a, v, mode='same') + >>> ht.convolve(a, v, mode="same") DNDarray([1., 3., 3., 3., 3.]) - >>> ht.convolve(a, v, mode='valid') + >>> ht.convolve(a, v, mode="valid") DNDarray([3., 3., 3.]) - >>> a = ht.ones(10, split = 0) - >>> v = ht.arange(3, split = 0).astype(ht.float) - >>> ht.convolve(a, v, mode='valid') + >>> ht.convolve(a, v, stride=2) + DNDarray([0., 3., 3., 2.]) + >>> ht.convolve(a, v, mode="valid", stride=2) + DNDarray([3., 3.]) + + >>> a = ht.ones(10, split=0) + >>> v = ht.arange(3, split=0).astype(ht.float) + >>> ht.convolve(a, v, mode="valid") DNDarray([3., 3., 3., 3., 3., 3., 3., 3.]) [0/3] DNDarray([3., 3., 3.]) [1/3] DNDarray([3., 3., 3.]) [2/3] DNDarray([3., 3.]) - >>> a = ht.ones(10, split = 0) - >>> v = ht.arange(3, split = 0) + + >>> a = ht.ones(10, split=0) + >>> v = ht.arange(3, split=0) >>> ht.convolve(a, v) DNDarray([0., 1., 3., 3., 3., 3., 3., 3., 3., 3., 3., 2.], dtype=ht.float32, device=cpu:0, split=0) @@ -73,10 +82,10 @@ def convolve(a: DNDarray, v: DNDarray, mode: str = "full") -> DNDarray: [1/3] DNDarray([3., 3., 3., 3.]) [2/3] DNDarray([3., 3., 3., 2.]) - >>> a = ht.arange(50, dtype = ht.float64, split=0) - >>> a = a.reshape(10, 5) # 10 signals of length 5 + >>> a = ht.arange(50, dtype=ht.float64, split=0) + >>> a = a.reshape(10, 5) # 10 signals of length 5 >>> v = ht.arange(3) - >>> ht.convolve(a, v) # batch processing: 10 signals convolved with filter v + >>> ht.convolve(a, v) # batch processing: 10 signals convolved with filter v DNDarray([[ 0., 0., 1., 4., 7., 10., 8.], [ 0., 5., 16., 19., 22., 25., 18.], [ 0., 10., 31., 34., 37., 40., 28.], @@ -88,8 +97,8 @@ def convolve(a: DNDarray, v: DNDarray, mode: str = "full") -> DNDarray: [ 0., 40., 121., 124., 127., 130., 88.], [ 0., 45., 136., 139., 142., 145., 98.]], dtype=ht.float64, device=cpu:0, split=0) - >>> v = ht.random.randint(0, 3, (10, 3), split=0) # 10 filters of length 3 - >>> ht.convolve(a, v) # batch processing: 10 signals convolved with 10 filters + >>> v = ht.random.randint(0, 3, (10, 3), split=0) # 10 filters of length 3 + >>> ht.convolve(a, v) # batch processing: 10 signals convolved with 10 filters DNDarray([[ 0., 0., 2., 4., 6., 8., 0.], [ 5., 6., 7., 8., 9., 0., 0.], [ 20., 42., 56., 61., 66., 41., 14.], @@ -116,6 +125,10 @@ def convolve(a: DNDarray, v: DNDarray, mode: str = "full") -> DNDarray: except TypeError: raise TypeError(f"non-supported type for filter: {type(v)}") promoted_type = promote_types(a.dtype, v.dtype) + if a.larray.is_mps and promoted_type == float64: + # cannot cast to float64 on MPS + promoted_type = float32 + a = a.astype(promoted_type) v = v.astype(promoted_type) @@ -152,6 +165,12 @@ def convolve(a: DNDarray, v: DNDarray, mode: str = "full") -> DNDarray: f"1-D convolution only supported for 1-dimensional signal and kernel. 
Signal: {a.shape}, Filter: {v.shape}" ) + # check mode and stride for value errors + if stride < 1: + raise ValueError("Stride must be a positive integer") + if stride > 1 and mode == "same": + raise ValueError("Stride must be 1 for mode 'same'") + if mode == "same" and v.shape[-1] % 2 == 0: raise ValueError("Mode 'same' cannot be used with even-sized kernel") if not v.is_balanced(): @@ -160,20 +179,23 @@ # calculate pad size according to mode if mode == "full": pad_size = v.shape[-1] - 1 - gshape = v.shape[-1] + a.shape[-1] - 1 elif mode == "same": pad_size = v.shape[-1] // 2 - gshape = a.shape[-1] elif mode == "valid": pad_size = 0 - gshape = a.shape[-1] - v.shape[-1] + 1 else: raise ValueError(f"Supported modes are 'full', 'valid', 'same', got {mode}") + gshape = (a.shape[-1] + 2 * pad_size - v.shape[-1]) // stride + 1 + + if v.is_distributed() and stride > 1: + gshape_stride_1 = a.shape[-1] + 2 * pad_size - v.shape[-1] + 1 + if batch_processing: # all operations are local torch operations, only the last dimension is convolved local_a = a.larray local_v = v.larray + # flip filter for convolution, as PyTorch conv1d computes correlations local_v = torch.flip(local_v, [-1]) local_batch_dims = tuple(local_a.shape[:-1]) @@ -204,7 +226,9 @@ # apply torch convolution operator if local signal isn't empty if torch.prod(torch.tensor(local_a.shape, device=local_a.device)) > 0: - local_convolved = fc.conv1d(local_a, local_v, padding=pad_size, groups=channels) + local_convolved = fc.conv1d( + local_a, local_v, padding=pad_size, groups=channels, stride=stride + ) else: empty_shape = tuple(local_a.shape[:-1] + (gshape,)) local_convolved = torch.empty(empty_shape, dtype=local_a.dtype, device=local_a.device) @@ -232,6 +256,23 @@ a.get_halo(halo_size) # apply halos to local array signal = a.array_with_halos + + # shift signal based on global kernel starts for any rank but first + if stride > 1 and not v.is_distributed(): + if a.comm.rank == 0: + local_index = 0 + else: + local_index = torch.sum(a.lshape_map[: a.comm.rank, 0]).item() - halo_size + local_index = local_index % stride + + if local_index != 0: + local_index = stride - local_index + + # even kernels can produce duplicate values + if v.shape[-1] % 2 == 0 and local_index == 0: + local_index = stride + + signal = signal[local_index:] else: signal = a.larray @@ -262,11 +303,15 @@ if v.is_distributed(): size = v.comm.size + # compute the stride-1 result first; the strided output is a subsample of it + if stride > 1: + gshape = gshape_stride_1 + for r in range(size): rec_v = t_v.clone() v.comm.Bcast(rec_v, root=r) t_v1 = rec_v.reshape(1, 1, rec_v.shape[0]) - local_signal_filtered = fc.conv1d(signal, t_v1) + local_signal_filtered = fc.conv1d(signal, t_v1, stride=1) # unpack 3D result into 1D local_signal_filtered = local_signal_filtered[0, 0, :] @@ -294,21 +339,29 @@ ) if r != size - 1: start_idx += v.lshape_map[r + 1][0].item() + + # the strided output is a subsample of the stride-1 result + if stride > 1: + signal_filtered = signal_filtered[::stride] + return signal_filtered else: # apply torch convolution operator + if signal.shape[-1] >= weight.shape[-1]: + signal_filtered = fc.conv1d(signal, weight,
stride=stride) - # unpack 3D result into 1D - signal_filtered = signal_filtered[0, 0, :] + # unpack 3D result into 1D + signal_filtered = signal_filtered[0, 0, :] + else: + signal_filtered = torch.tensor([], device=str(signal.device)) # if kernel shape along split axis is even we need to get rid of duplicated values - if a.comm.rank != 0 and v.shape[0] % 2 == 0: + if a.comm.rank != 0 and v.shape[0] % 2 == 0 and stride == 1: signal_filtered = signal_filtered[1:] return DNDarray( - signal_filtered.contiguous(), + signal_filtered, (gshape,), signal_filtered.dtype, a.split, diff --git a/heat/core/statistics.py b/heat/core/statistics.py index 29c557863d..57a9bbebc1 100644 --- a/heat/core/statistics.py +++ b/heat/core/statistics.py @@ -61,6 +61,8 @@ def argmax( By default, the index is into the flattened array, otherwise along the specified axis. out : DNDarray, optional. If provided, the result will be inserted into this array. It should be of the appropriate shape and dtype. + **kwargs + Extra keyword arguments Examples -------- @@ -98,7 +100,13 @@ def local_argmax(*args, **kwargs): offset, _, _ = x.comm.chunk(shape, x.split) indices += torch.tensor(offset, dtype=indices.dtype) - return torch.cat([maxima.double(), indices.double()]) + if maxima.is_mps: + # MPS framework doesn't support float64 + out = torch.cat([maxima.float(), indices.float()]) + else: + out = torch.cat([maxima.double(), indices.double()]) + + return out # axis sanitation if axis is not None and not isinstance(axis, int): @@ -132,6 +140,8 @@ def argmin( By default, the index is into the flattened array, otherwise along the specified axis. out : DNDarray, optional Issue #100 If provided, the result will be inserted into this array. It should be of the appropriate shape and dtype. + **kwargs + Extra keyword arguments Examples -------- @@ -156,21 +166,27 @@ def local_argmin(*args, **kwargs): # argmin will be the flattened index, computed standalone and the actual minimum value obtain separately if len(args) <= 1 and axis < 0: indices = torch.argmin(*args, **kwargs).reshape(1) - minimums = args[0].flatten()[indices] + minima = args[0].flatten()[indices] # artificially flatten the input tensor shape to correct the offset computation axis = 0 shape = [np.prod(shape)] # usual case where indices and minimum values are both returned. 
Axis is not equal to None else: - minimums, indices = torch.min(*args, **kwargs) + minima, indices = torch.min(*args, **kwargs) # add offset of data chunks if reduction is computed across split axis if axis == x.split: offset, _, _ = x.comm.chunk(shape, x.split) indices += torch.tensor(offset, dtype=indices.dtype) - return torch.cat([minimums.double(), indices.double()]) + if minima.is_mps: + # MPS framework doesn't support float64 + out = torch.cat([minima.float(), indices.float()]) + else: + out = torch.cat([minima.double(), indices.double()]) + + return out # axis sanitation if axis is not None and not isinstance(axis, int): @@ -235,17 +251,17 @@ def average( Examples -------- - >>> data = ht.arange(1,5, dtype=float) + >>> data = ht.arange(1, 5, dtype=float) >>> data DNDarray([1., 2., 3., 4.], dtype=ht.float32, device=cpu:0, split=None) >>> ht.average(data) DNDarray(2.5000, dtype=ht.float32, device=cpu:0, split=None) - >>> ht.average(ht.arange(1,11, dtype=float), weights=ht.arange(10,0,-1)) + >>> ht.average(ht.arange(1, 11, dtype=float), weights=ht.arange(10, 0, -1)) DNDarray([4.], dtype=ht.float64, device=cpu:0, split=None) >>> data = ht.array([[0, 1], [2, 3], [4, 5]], dtype=float, split=1) - >>> weights = ht.array([1./4, 3./4]) + >>> weights = ht.array([1.0 / 4, 3.0 / 4]) >>> ht.average(data, axis=1, weights=weights) DNDarray([0.7500, 2.7500, 4.7500], dtype=ht.float32, device=cpu:0, split=None) >>> ht.average(data, weights=weights) @@ -581,11 +597,11 @@ def digitize(x: DNDarray, bins: Union[DNDarray, torch.Tensor], right: bool = Fal Examples -------- - >>> x = ht.array([1.2, 10.0, 12.4, 15.5, 20.]) + >>> x = ht.array([1.2, 10.0, 12.4, 15.5, 20.0]) >>> bins = ht.array([0, 5, 10, 15, 20]) - >>> ht.digitize(x,bins,right=True) + >>> ht.digitize(x, bins, right=True) DNDarray([1, 2, 3, 4, 4], dtype=ht.int64, device=cpu:0, split=None) - >>> ht.digitize(x,bins,right=False) + >>> ht.digitize(x, bins, right=False) DNDarray([1, 3, 3, 4, 5], dtype=ht.int64, device=cpu:0, split=None) """ if isinstance(bins, DNDarray): @@ -642,7 +658,7 @@ def histc( Examples -------- - >>> ht.histc(ht.array([1., 2, 1]), bins=4, min=0, max=3) + >>> ht.histc(ht.array([1.0, 2, 1]), bins=4, min=0, max=3) DNDarray([0., 2., 1., 0.], dtype=ht.float32, device=cpu:0, split=None) >>> ht.histc(ht.arange(10, dtype=ht.float64, split=0), bins=10) DNDarray([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], dtype=ht.float64, device=cpu:0, split=None) @@ -659,7 +675,7 @@ def histc( out=out._DNDarray__array if out is not None and input.split is None else None, ) - if input.split is None: + if not input.is_distributed(): if out is None: out = DNDarray( hist, @@ -855,7 +871,7 @@ def maximum(x1: DNDarray, x2: DNDarray, out: Optional[DNDarray] = None) -> DNDar imaginary parts being ``NaN``. The net effect is that NaNs are propagated. Parameters - ----------- + ---------- x1 : DNDarray The first array containing the elements to be compared. x2 : DNDarray @@ -865,7 +881,7 @@ def maximum(x1: DNDarray, x2: DNDarray, out: Optional[DNDarray] = None) -> DNDar If not provided or ``None``, a freshly-allocated array is returned. 
Examples - --------- + -------- >>> import heat as ht >>> a = ht.random.randn(3, 4) >>> a @@ -920,12 +936,12 @@ def mean(x: DNDarray, axis: Optional[Union[int, Tuple[int, ...]]] = None) -> DND Examples -------- - >>> a = ht.random.randn(1,3) + >>> a = ht.random.randn(1, 3) >>> a DNDarray([[-0.1164, 1.0446, -0.4093]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.mean(a) DNDarray(0.1730, dtype=ht.float32, device=cpu:0, split=None) - >>> a = ht.random.randn(4,4) + >>> a = ht.random.randn(4, 4) >>> a DNDarray([[-1.0585, 0.7541, -1.1011, 0.5009], [-1.3575, 0.3344, 0.4506, 0.7379], @@ -935,13 +951,13 @@ def mean(x: DNDarray, axis: Optional[Union[int, Tuple[int, ...]]] = None) -> DND DNDarray([-0.2262, 0.0413, -0.8328, -0.2619], dtype=ht.float32, device=cpu:0, split=None) >>> ht.mean(a, 0) DNDarray([-0.5392, -0.1655, -0.7539, 0.1791], dtype=ht.float32, device=cpu:0, split=None) - >>> a = ht.random.randn(4,4) + >>> a = ht.random.randn(4, 4) >>> a DNDarray([[-0.1441, 0.5016, 0.8907, 0.6318], [-1.1690, -1.2657, 1.4840, -0.1014], [ 0.4133, 1.4168, 1.3499, 1.0340], [-0.9236, -0.7535, -0.2466, -0.9703]], dtype=ht.float32, device=cpu:0, split=None) - >>> ht.mean(a, (0,1)) + >>> ht.mean(a, (0, 1)) DNDarray(0.1342, dtype=ht.float32, device=cpu:0, split=None) """ @@ -984,7 +1000,7 @@ def reduce_means_elementwise(output_shape_i: torch.Tensor) -> DNDarray: # ---------------------------------------------------------------------------------------------- # sanitize dtype if types.heat_type_is_exact(x.dtype): - if x.dtype is types.int64: + if x.dtype is types.int64 and not x.larray.is_mps: x = x.astype(types.float64) else: x = x.astype(types.float32) @@ -1067,7 +1083,11 @@ def median( DNDarray.median: Callable[[DNDarray, int, bool, bool, float], DNDarray] = ( - lambda x, axis=None, keepdims=False, sketched=False, sketch_size=1.0 / MPI.COMM_WORLD.size: median( + lambda x, + axis=None, + keepdims=False, + sketched=False, + sketch_size=1.0 / MPI.COMM_WORLD.size: median( x, axis, keepdims, sketched=sketched, sketch_size=sketch_size ) ) @@ -1217,7 +1237,7 @@ def minimum(x1: DNDarray, x2: DNDarray, out: Optional[DNDarray] = None) -> DNDar imaginary parts being ``NaN``. The net effect is that NaNs are propagated. Parameters - ----------- + ---------- x1 : DNDarray The first array containing the elements to be compared. x2 : DNDarray @@ -1227,31 +1247,31 @@ def minimum(x1: DNDarray, x2: DNDarray, out: Optional[DNDarray] = None) -> DNDar If not provided or ``None``, a freshly-allocated array is returned. 
Examples - --------- + -------- >>> import heat as ht - >>> a = ht.random.randn(3,4) + >>> a = ht.random.randn(3, 4) >>> a DNDarray([[-0.5462, 0.0079, 1.2828, 1.4980], [ 0.6503, -1.1069, 1.2131, 1.4003], [-0.3203, -0.2318, 1.0388, 0.4439]], dtype=ht.float32, device=cpu:0, split=None) - >>> b = ht.random.randn(3,4) + >>> b = ht.random.randn(3, 4) >>> b DNDarray([[ 1.8505, 2.3055, -0.2825, -1.4718], [-0.3684, 1.6866, -0.8570, -0.4779], [ 1.0532, 0.3775, -0.8669, -1.7275]], dtype=ht.float32, device=cpu:0, split=None) - >>> ht.minimum(a,b) + >>> ht.minimum(a, b) DNDarray([[-0.5462, 0.0079, -0.2825, -1.4718], [-0.3684, -1.1069, -0.8570, -0.4779], [-0.3203, -0.2318, -0.8669, -1.7275]], dtype=ht.float32, device=cpu:0, split=None) - >>> c = ht.random.randn(1,4) + >>> c = ht.random.randn(1, 4) >>> c DNDarray([[-1.4358, 1.2914, -0.6042, -1.4009]], dtype=ht.float32, device=cpu:0, split=None) - >>> ht.minimum(a,c) + >>> ht.minimum(a, c) DNDarray([[-1.4358, 0.0079, -0.6042, -1.4009], [-1.4358, -1.1069, -0.6042, -1.4009], [-1.4358, -0.2318, -0.6042, -1.4009]], dtype=ht.float32, device=cpu:0, split=None) - >>> d = ht.random.randn(3,4,5) - >>> ht.minimum(a,d) + >>> d = ht.random.randn(3, 4, 5) + >>> ht.minimum(a, d) ValueError: operands could not be broadcast, input shapes (3, 4) (3, 4, 5) """ return _operations.__binary_op(torch.min, x1, x2, out) @@ -1597,7 +1617,10 @@ def _create_sketch( output_shape = perc_size + output_shape # output data type must be float - output_dtype = types.float32 if x.larray.element_size() == 4 else types.float64 + if x.larray.element_size() == 4 or x.larray.is_mps: + output_dtype = types.float32 + else: + output_dtype = types.float64 if out is not None: sanitation.sanitize_out(out, output_shape, output_split, x.device, x.comm) if output_dtype != out.dtype: @@ -1779,6 +1802,8 @@ def std( Delta Degrees of Freedom: the denominator implicitly used in the calculation is N - ddof, where N represents the number of elements. If ``ddof=1``, the Bessel correction will be applied. Setting ``ddof>1`` raises a ``NotImplementedError``. + **kwargs + Extra keyword arguments Examples -------- >>> a = ht.random.randn(1, 3) >>> a DNDarray([[ 0.5714, 0.0048, -0.2942]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.std(a) DNDarray(0.3590, dtype=ht.float32, device=cpu:0, split=None) - >>> a = ht.random.randn(4,4) + >>> a = ht.random.randn(4, 4) >>> a DNDarray([[ 0.8488, 1.2225, 1.2498, -1.4592], [-0.5820, -0.3928, 0.1509, -0.0174], [ 0.6425, 0.1926, 0.0430, 0.4177], [ 0.5645, 1.1319, 0.9578, 0.2237]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.std(a, 1, ddof=1) DNDarray([1.2961, 0.3372, 0.2724, 0.4024], dtype=ht.float32, device=cpu:0, split=None) """ # sanitize dtype if types.heat_type_is_exact(x.dtype): - if x.dtype is types.int64: + if x.dtype is types.int64 and not x.larray.is_mps: x = x.astype(types.float64) else: x = x.astype(types.float32) @@ -1919,6 +1944,9 @@ def var( Delta Degrees of Freedom: the denominator implicitly used in the calculation is N - ddof, where N represents the number of elements. If ``ddof=1``, the Bessel correction will be applied. Setting ``ddof>1`` raises a ``NotImplementedError``.
+ **kwargs + Extra keyword arguments + Notes ----- @@ -1938,14 +1966,14 @@ Examples -------- - >>> a = ht.random.randn(1,3) + >>> a = ht.random.randn(1, 3) >>> a DNDarray([[-2.3589, -0.2073, 0.8806]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.var(a) DNDarray(1.8119, dtype=ht.float32, device=cpu:0, split=None) >>> ht.var(a, ddof=1) DNDarray(2.7179, dtype=ht.float32, device=cpu:0, split=None) - >>> a = ht.random.randn(4,4) + >>> a = ht.random.randn(4, 4) >>> a DNDarray([[-0.8523, -1.4982, -0.5848, -0.2554], [ 0.8458, -0.3125, -0.2430, 1.9016], diff --git a/heat/core/stride_tricks.py b/heat/core/stride_tricks.py index 22e9fff694..a0d7ce0a15 100644 --- a/heat/core/stride_tricks.py +++ b/heat/core/stride_tricks.py @@ -22,20 +22,27 @@ def broadcast_shape(shape_a: Tuple[int, ...], shape_b: Tuple[int, ...]) -> Tuple Shape of second operand Raises - ------- + ------ ValueError If the two shapes cannot be broadcast. Examples -------- >>> import heat as ht - >>> ht.core.stride_tricks.broadcast_shape((5,4),(4,)) + >>> ht.core.stride_tricks.broadcast_shape((5, 4), (4,)) (5, 4) - >>> ht.core.stride_tricks.broadcast_shape((1,100,1),(10,1,5)) + >>> ht.core.stride_tricks.broadcast_shape((1, 100, 1), (10, 1, 5)) (10, 100, 5) - >>> ht.core.stride_tricks.broadcast_shape((8,1,6,1),(7,1,5,)) + >>> ht.core.stride_tricks.broadcast_shape( + ... (8, 1, 6, 1), + ... ( + ... 7, + ... 1, + ... 5, + ... ), + ... ) (8, 7, 6, 5) - >>> ht.core.stride_tricks.broadcast_shape((2,1),(8,4,3)) + >>> ht.core.stride_tricks.broadcast_shape((2, 1), (8, 4, 3)) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "heat/core/stride_tricks.py", line 42, in broadcast_shape @@ -69,20 +76,27 @@ def broadcast_shapes(*shapes: Tuple[int, ...]) -> Tuple[int, ...]: The broadcast output shape. Raises - ------- + ------ ValueError If the shapes cannot be broadcast. Examples -------- >>> import heat as ht - >>> ht.broadcast_shapes((5,4),(4,)) + >>> ht.broadcast_shapes((5, 4), (4,)) (5, 4) - >>> ht.broadcast_shapes((1,100,1),(10,1,5)) + >>> ht.broadcast_shapes((1, 100, 1), (10, 1, 5)) (10, 100, 5) - >>> ht.broadcast_shapes((8,1,6,1),(7,1,5,)) + >>> ht.broadcast_shapes( + ... (8, 1, 6, 1), + ... ( + ... 7, + ... 1, + ... 5, + ... ), + ... ) (8, 7, 6, 5) - >>> ht.broadcast_shapes((2,1),(8,4,3)) + >>> ht.broadcast_shapes((2, 1), (8, 4, 3)) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "heat/core/stride_tricks.py", line 100, in broadcast_shapes @@ -114,18 +128,18 @@ def sanitize_axis( The axis to be sanitized Raises - ------- + ------ ValueError if the axis cannot be sanitized, i.e. out of bounds. TypeError if the axis is not integral. Examples - ------- + -------- >>> import heat as ht - >>> ht.core.stride_tricks.sanitize_axis((5,4,4),1) + >>> ht.core.stride_tricks.sanitize_axis((5, 4, 4), 1) 1 - >>> ht.core.stride_tricks.sanitize_axis((5,4,4),-1) + >>> ht.core.stride_tricks.sanitize_axis((5, 4, 4), -1) 2 >>> ht.core.stride_tricks.sanitize_axis((5, 4), (1,)) (1,) @@ -178,7 +192,7 @@ def sanitize_shape(shape: Union[int, Tuple[int, ...]], lval: int = 0) -> Tuple[i Lowest legal value Raises - ------- + ------ ValueError If the shape contains illegal values, e.g. negative numbers.
TypeError diff --git a/heat/core/tests/test_arithmetics.py b/heat/core/tests/test_arithmetics.py index 8d01c82358..8b8a8a902d 100644 --- a/heat/core/tests/test_arithmetics.py +++ b/heat/core/tests/test_arithmetics.py @@ -1,5 +1,6 @@ import operator import math +import platform import heat as ht import numpy as np @@ -273,10 +274,11 @@ def test_add_(self): a += b # test function for invalid casting between data types - a = ht.ones(3, dtype=ht.float32) - b = ht.ones(3, dtype=ht.float64) - with self.assertRaises(TypeError): - a += b + if not self.is_mps: + a = ht.ones(3, dtype=ht.float32) + b = ht.ones(3, dtype=ht.float64) + with self.assertRaises(TypeError): + a += b def test_bitwise_and(self): int_result = ht.array([[0, 2], [2, 0]]) @@ -1441,7 +1443,11 @@ def test_cumsum_(self): a_string = self.a_string # reset def test_diff(self): - ht_array = ht.random.rand(20, 20, 20, split=None) + if self.is_mps: + dtype = ht.float32 + else: + dtype = ht.float64 + ht_array = ht.random.rand(20, 20, 20, split=None, dtype=dtype) arb_slice = [0] * 3 for dim in range(0, 3): # loop over 3 dimensions arb_slice[dim] = slice(None) @@ -1469,11 +1475,14 @@ def test_diff(self): ht_diff_pend = ht.diff(lp_array, n=nl, axis=ax, prepend=0, append=ht_append) np_append = np.ones(append_shape, dtype=lp_array.larray.cpu().numpy().dtype) np_diff_pend = ht.array( - np.diff(np_array, n=nl, axis=ax, prepend=0, append=np_append) + np.diff(np_array, n=nl, axis=ax, prepend=0, append=np_append).astype( + lp_array.larray.cpu().numpy().dtype + ), + dtype=dtype, ) self.assertTrue(ht.equal(ht_diff_pend, np_diff_pend)) self.assertEqual(ht_diff_pend.split, sp) - self.assertEqual(ht_diff_pend.dtype, ht.float64) + self.assertEqual(ht_diff_pend.dtype, dtype) np_array = ht_array.numpy() ht_diff = ht.diff(ht_array, n=2) @@ -1482,7 +1491,7 @@ def test_diff(self): self.assertEqual(ht_diff.split, None) self.assertEqual(ht_diff.dtype, ht_array.dtype) - ht_array = ht.random.rand(20, 20, 20, split=1, dtype=ht.float64) + ht_array = ht.random.rand(20, 20, 20, split=1, dtype=dtype) np_array = ht_array.copy().numpy() ht_diff = ht.diff(ht_array, n=2) np_diff = ht.array(np.diff(np_array, n=2)) @@ -1762,10 +1771,12 @@ def test_div_(self): a /= b # test function for invalid casting between data types - a = ht.ones(3, dtype=ht.float32) - b = ht.ones(3, dtype=ht.float64) - with self.assertRaises(TypeError): - a /= b + # MPS does not support float64 + if not self.is_mps: + a = ht.ones(3, dtype=ht.float32) + b = ht.ones(3, dtype=ht.float64) + with self.assertRaises(TypeError): + a /= b def test_divmod(self): # basic tests as floor_divide and mod are tested separately @@ -2058,10 +2069,12 @@ def test_floordiv_(self): a //= b # test function for invalid casting between data types - a = ht.ones(3, dtype=ht.float32) - b = ht.ones(3, dtype=ht.float64) - with self.assertRaises(TypeError): - a //= b + # MPS does not support float64 + if not self.is_mps: + a = ht.ones(3, dtype=ht.float32) + b = ht.ones(3, dtype=ht.float64) + with self.assertRaises(TypeError): + a //= b def test_fmod(self): result = ht.array([[1.0, 0.0], [1.0, 0.0]]) @@ -2253,10 +2266,12 @@ def test_fmod_(self): a.fmod_(b) # test function for invalid casting between data types - a = ht.ones(3, dtype=ht.float32) - b = ht.ones(3, dtype=ht.float64) - with self.assertRaises(TypeError): - a.fmod_(b) + # MPS does not support float64 + if not self.is_mps: + a = ht.ones(3, dtype=ht.float32) + b = ht.ones(3, dtype=ht.float64) + with self.assertRaises(TypeError): + a.fmod_(b) def test_gcd(self): a =
ht.array([5, 10, 15]) @@ -2426,26 +2441,28 @@ def test_gcd_(self): a.gcd_(b) # test function for invalid casting between data types - a = ht.ones(3, dtype=ht.float32) * 6 - b = ht.ones(3, dtype=ht.float64) * 4 - with self.assertRaises(TypeError): - a.gcd_(b) + # MPS does not support float64 + if not self.is_mps: + a = ht.ones(3, dtype=ht.float32) * 6 + b = ht.ones(3, dtype=ht.float64) * 4 + with self.assertRaises(TypeError): + a.gcd_(b) def test_hypot(self): a = ht.array([2.0]) b = ht.array([1.0, 3.0, 5.0]) gt = ht.array([5, 13, 29]) - result = (ht.hypot(a, b) ** 2).astype(ht.int64) - - self.assertTrue(ht.equal(gt, result)) - self.assertEqual(result.dtype, ht.int64) + result = ht.hypot(a, b) ** 2 + self.assertTrue(ht.allclose(gt, result)) with self.assertRaises(TypeError): ht.hypot(a) with self.assertRaises(TypeError): ht.hypot("a", "b") - with self.assertRaises(TypeError): - ht.hypot(a.astype(ht.int32), b.astype(ht.int32)) + if a.device.torch_device.startswith("cpu"): + # torch.hypot does not support Int datatypes on CPU + with self.assertRaises(TypeError): + ht.hypot(a.astype(ht.int32), b.astype(ht.int32)) def test_hypot_(self): # Copies of class variables for the in-place operations @@ -2589,10 +2606,11 @@ def test_hypot_(self): a.hypot_(b) # test function for invalid casting between data types - a = ht.ones(3, dtype=ht.float32) - b = ht.ones(3, dtype=ht.float64) - with self.assertRaises(TypeError): - a.hypot_(b) + if not self.is_mps: + a = ht.ones(3, dtype=ht.float32) + b = ht.ones(3, dtype=ht.float64) + with self.assertRaises(TypeError): + a.hypot_(b) def test_invert(self): int8_tensor = ht.array([[0, 1], [2, -2]], dtype=ht.int8) @@ -2853,10 +2871,11 @@ def test_lcm_(self): a.lcm_(b) # test function for invalid casting between data types - a = ht.ones(3, dtype=ht.float32) * 2 - b = ht.ones(3, dtype=ht.float64) * 3 - with self.assertRaises(TypeError): - a.lcm_(b) + if not self.is_mps: + a = ht.ones(3, dtype=ht.float32) * 2 + b = ht.ones(3, dtype=ht.float64) * 3 + with self.assertRaises(TypeError): + a.lcm_(b) def test_left_shift(self): int_tensor = ht.array([[0, 1], [2, 3]]) @@ -3239,10 +3258,11 @@ def test_mul_(self): a *= b # test function for invalid casting between data types - a = ht.ones(3, dtype=ht.float32) - b = ht.ones(3, dtype=ht.float64) - with self.assertRaises(TypeError): - a *= b + if not self.is_mps: + a = ht.ones(3, dtype=ht.float32) + b = ht.ones(3, dtype=ht.float64) + with self.assertRaises(TypeError): + a *= b def test_nan_to_num(self): arr = ht.array([1, 2, 3, ht.nan, ht.inf, -ht.inf]) @@ -3379,12 +3399,12 @@ def test_nansum(self): def test_neg(self): self.assertTrue(ht.equal(ht.neg(ht.array([-1, 1])), ht.array([1, -1]))) self.assertTrue(ht.equal(-ht.array([-1.0, 1.0]), ht.array([1.0, -1.0]))) - - a = ht.array([1 + 1j, 2 - 2j, 3, 4j, 5], split=0) - b = out = ht.empty(5, dtype=ht.complex64, split=0) - ht.negative(a, out=out) - self.assertTrue(ht.equal(out, ht.array([-1 - 1j, -2 + 2j, -3, -4j, -5], split=0))) - self.assertIs(out, b) + if not self.is_mps: + a = ht.array([1 + 1j, 2 - 2j, 3, 4j, 5], split=0) + b = out = ht.empty(5, dtype=ht.complex64, split=0) + ht.negative(a, out=out) + self.assertTrue(ht.equal(out, ht.array([-1 - 1j, -2 + 2j, -3, -4j, -5], split=0))) + self.assertIs(out, b) with self.assertRaises(TypeError): ht.neg(1) @@ -3400,8 +3420,10 @@ def test_neg_(self): result = ht.array([[-1.0, -2.0], [-3.0, -4.0]]) int_result = ht.array([-2, -2]) - a_complex_vector = a_complex_vector_double = ht.array([1 + 1j, 2 - 2j, 3, 4j, 5], split=0) - complex_result = 
ht.array([-1 - 1j, -2 + 2j, -3, -4j, -5], split=0) + a_complex_vector = a_complex_vector_double = ht.array( + [1 + 1j, 2 - 2j, 3, 4j, 5], split=0, dtype=ht.complex64 + ) + complex_result = ht.array([-1 - 1j, -2 + 2j, -3, -4j, -5], split=0, dtype=ht.complex64) # We identify the underlying PyTorch objects to check whether operations are really in-place underlying_torch_tensor = a_tensor.larray @@ -3423,8 +3445,12 @@ def test_neg_(self): self.assertIs(an_int_vector.larray, underlying_int_torch_tensor) underlying_int_torch_tensor.copy_(self.an_int_vector.larray) - self.assertTrue(ht.equal(a_complex_vector.neg_(), complex_result)) - self.assertTrue(ht.equal(a_complex_vector, complex_result)) + # test only on macOS 14.0 and higher + if self.is_mps: + macos_version = int(platform.mac_ver()[0].split(".")[0]) + if not self.is_mps or macos_version >= 14: + self.assertTrue(ht.equal(a_complex_vector.neg_(), complex_result)) + self.assertTrue(ht.equal(a_complex_vector, complex_result)) self.assertIs(a_complex_vector, a_complex_vector_double) self.assertIs(a_complex_vector.larray, underlying_complex_torch_tensor) @@ -3453,11 +3479,12 @@ def test_pos(self): self.assertTrue(ht.equal(ht.pos(ht.array([-1, 1])), ht.array([-1, 1]))) self.assertTrue(ht.equal(+ht.array([-1.0, 1.0]), ht.array([-1.0, 1.0]))) - a = ht.array([1 + 1j, 2 - 2j, 3, 4j, 5], split=0) - b = out = ht.empty(5, dtype=ht.complex64, split=0) - ht.positive(a, out=out) - self.assertTrue(ht.equal(out, a)) - self.assertIs(out, b) + if not self.is_mps: + a = ht.array([1 + 1j, 2 - 2j, 3, 4j, 5], split=0) + b = out = ht.empty(5, dtype=ht.complex64, split=0) + ht.positive(a, out=out) + self.assertTrue(ht.equal(out, a)) + self.assertIs(out, b) with self.assertRaises(TypeError): ht.pos(1) @@ -3661,10 +3688,11 @@ def test_pow_(self): a **= b # test function for invalid casting between data types - a = ht.ones(3, dtype=ht.float32) - b = ht.ones(3, dtype=ht.float64) - with self.assertRaises(TypeError): - a **= b + if not self.is_mps: + a = ht.ones(3, dtype=ht.float32) + b = ht.ones(3, dtype=ht.float64) + with self.assertRaises(TypeError): + a **= b def test_prod(self): array_len = 11 @@ -3976,10 +4004,11 @@ def test_remainder_(self): a %= b # test function for invalid casting between data types - a = ht.ones(3, dtype=ht.float32) - b = ht.ones(3, dtype=ht.float64) - with self.assertRaises(TypeError): - a %= b + if not self.is_mps: + a = ht.ones(3, dtype=ht.float32) + b = ht.ones(3, dtype=ht.float64) + with self.assertRaises(TypeError): + a %= b def test_right_shift(self): int_tensor = ht.array([[0, 1], [2, 3]]) @@ -4361,10 +4390,11 @@ def test_sub_(self): a -= b # test function for invalid casting between data types - a = ht.ones(3, dtype=ht.float32) - b = ht.ones(3, dtype=ht.float64) - with self.assertRaises(TypeError): - a -= b + if not self.is_mps: + a = ht.ones(3, dtype=ht.float32) + b = ht.ones(3, dtype=ht.float64) + with self.assertRaises(TypeError): + a -= b def test_sum(self): array_len = 11 diff --git a/heat/core/tests/test_communication.py b/heat/core/tests/test_communication.py index 48187a591b..131b21f79a 100644 --- a/heat/core/tests/test_communication.py +++ b/heat/core/tests/test_communication.py @@ -1,10 +1,18 @@ +import os +import unittest +import platform + import numpy as np import torch import heat as ht from .test_suites.basic_test import TestCase +envar = os.getenv("HEAT_TEST_USE_DEVICE", "cpu") +is_mps = envar == "gpu" and platform.machine() == "arm64" + +@unittest.skipIf(is_mps, "Distribution not supported on Apple MPS") class
TestCommunication(TestCase): @classmethod def setUpClass(cls): @@ -215,7 +223,6 @@ def test_allgather(self): # contiguous data data = ht.ones((1, 7)) output = ht.zeros((ht.MPI_WORLD.size, 7)) - # ensure prior invariants self.assertTrue(data.larray.is_contiguous()) self.assertTrue(output.larray.is_contiguous()) @@ -2492,3 +2499,44 @@ def test_alltoallSorting(self): test4.comm.Alltoallv(test4.larray, redistributed4, send_axis=2, recv_axis=2) with self.assertRaises(NotImplementedError): test4.comm.Alltoallv(test4.larray, redistributed4, send_axis=None) + + # The following test is only for the bool data type to save memory + @unittest.skipIf( + ht.MPI_WORLD.size == 1 or ht.MPI_WORLD.size > 2 or "rocm" in torch.__version__, + "Only for two or three processes and not on the AMD runner", + ) + def test_largecount_workaround_IsendRecv(self): + shape = (2**15, 2**16) + data = ( + torch.zeros(shape, dtype=torch.bool) + if ht.MPI_WORLD.rank % 2 == 0 + else torch.ones(shape, dtype=torch.bool) + ) + buf = torch.empty(shape, dtype=torch.bool) + req = ht.MPI_WORLD.Isend( + data, ht.MPI_WORLD.rank - 1 if ht.MPI_WORLD.rank > 0 else ht.MPI_WORLD.size - 1 + ) + ht.MPI_WORLD.Recv( + buf, ht.MPI_WORLD.rank + 1 if ht.MPI_WORLD.rank < ht.MPI_WORLD.size - 1 else 0 + ) + req.Wait() + self.assertTrue( + buf.all() + if (ht.MPI_WORLD.rank % 2 == 0 and ht.MPI_WORLD.rank != ht.MPI_WORLD.size - 1) + else not buf.all() + ) + + # the following test is only for up to three processes to save memory + @unittest.skipIf( + ht.MPI_WORLD.size == 1 or ht.MPI_WORLD.size > 2 or "rocm" in torch.__version__, + "Only for two or three processes and not on the AMD runner", + ) + def test_largecount_workaround_Allreduce(self): + shape = (2**10, 2**11, 2**10) + data = ( + torch.zeros(shape, dtype=torch.bool) + if ht.MPI_WORLD.rank % 2 == 0 + else torch.ones(shape, dtype=torch.bool) + ) + ht.MPI_WORLD.Allreduce(ht.MPI.IN_PLACE, data, op=ht.MPI.SUM) + self.assertTrue(data.all()) diff --git a/heat/core/tests/test_complex_math.py b/heat/core/tests/test_complex_math.py index dd0f8236cd..cc56088bce 100644 --- a/heat/core/tests/test_complex_math.py +++ b/heat/core/tests/test_complex_math.py @@ -1,210 +1,225 @@ import numpy as np import torch import heat as ht +import platform from .test_suites.basic_test import TestCase class TestComplex(TestCase): def test_abs(self): - a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j]) - absolute = ht.absolute(a) - res = torch.abs(a.larray) - - self.assertIs(absolute.device, self.device) - self.assertIs(absolute.dtype, ht.float) - self.assertEqual(absolute.shape, (5,)) - self.assertTrue(torch.equal(absolute.larray, res)) - - a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j], split=0) - absolute = ht.absolute(a) - res = torch.abs(a.larray) - - self.assertIs(absolute.device, self.device) - self.assertIs(absolute.dtype, ht.float) - self.assertEqual(absolute.shape, (5,)) - self.assertTrue(torch.equal(absolute.larray, res)) - - a = ht.array( - [[1.0, 1.0j], [1 + 1j, -2 + 2j], [3 - 3j, -4 - 4j]], split=1, dtype=ht.complex128 - ) - absolute = ht.absolute(a) - res = torch.abs(a.larray) - - self.assertIs(absolute.device, self.device) - self.assertIs(absolute.dtype, ht.double) - self.assertEqual(absolute.shape, (3, 2)) - self.assertTrue(torch.equal(absolute.larray, res)) + if not self.is_mps or int(platform.mac_ver()[0].split(".")[0]) >= 14: + a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j]) + absolute = ht.absolute(a) + res = torch.abs(a.larray) + + self.assertIs(absolute.device, self.device) + 
self.assertIs(absolute.dtype, ht.float) + self.assertEqual(absolute.shape, (5,)) + self.assertTrue(torch.equal(absolute.larray, res)) + + a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j], split=0) + absolute = ht.absolute(a) + res = torch.abs(a.larray) + + self.assertIs(absolute.device, self.device) + self.assertIs(absolute.dtype, ht.float) + self.assertEqual(absolute.shape, (5,)) + self.assertTrue(torch.equal(absolute.larray, res)) + + if not self.is_mps: + a = ht.array( + [[1.0, 1.0j], [1 + 1j, -2 + 2j], [3 - 3j, -4 - 4j]], + split=1, + dtype=ht.complex128, + ) + absolute = ht.absolute(a) + res = torch.abs(a.larray) + + self.assertIs(absolute.device, self.device) + self.assertIs(absolute.dtype, ht.double) + self.assertEqual(absolute.shape, (3, 2)) + self.assertTrue(torch.equal(absolute.larray, res)) def test_angle(self): - a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j]) - angle = ht.angle(a) - res = torch.angle(a.larray) - - self.assertIs(angle.device, self.device) - self.assertIs(angle.dtype, ht.float) - self.assertEqual(angle.shape, (5,)) - self.assertTrue(torch.equal(angle.larray, res)) - - a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j], split=0) - angle = ht.angle(a) - res = torch.angle(a.larray) - - self.assertIs(angle.device, self.device) - self.assertIs(angle.dtype, ht.float) - self.assertEqual(angle.shape, (5,)) - self.assertTrue(torch.equal(angle.larray, res)) - - a = ht.array([[1.0, 1.0j], [1 + 1j, -2 + 2j], [3 - 3j, -4 - 4j]], split=1) - angle = ht.angle(a, deg=True) - res = ht.array( - [[0.0, 90.0], [45.0, 135.0], [-45.0, -135.0]], - dtype=ht.float32, - device=self.device, - split=1, - ) - - self.assertIs(angle.device, self.device) - self.assertIs(angle.dtype, ht.float32) - self.assertEqual(angle.shape, (3, 2)) - self.assertTrue(ht.equal(angle, res)) - - # Not complex - a = ht.ones((4, 4), split=1) - angle = ht.angle(a) - res = ht.zeros((4, 4), split=1) - - self.assertIs(angle.device, self.device) - self.assertIs(angle.dtype, ht.float32) - self.assertEqual(angle.shape, (4, 4)) - self.assertTrue(ht.equal(angle, res)) + if not self.is_mps or int(platform.mac_ver()[0].split(".")[0]) >= 14: + a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j]) + angle = ht.angle(a) + res = torch.angle(a.larray) + + self.assertIs(angle.device, self.device) + self.assertIs(angle.dtype, ht.float) + self.assertEqual(angle.shape, (5,)) + self.assertTrue(torch.equal(angle.larray, res)) + + a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j], split=0) + angle = ht.angle(a) + res = torch.angle(a.larray) + + self.assertIs(angle.device, self.device) + self.assertIs(angle.dtype, ht.float) + self.assertEqual(angle.shape, (5,)) + self.assertTrue(torch.equal(angle.larray, res)) + + a = ht.array([[1.0, 1.0j], [1 + 1j, -2 + 2j], [3 - 3j, -4 - 4j]], split=1) + angle = ht.angle(a, deg=True) + res = ht.array( + [[0.0, 90.0], [45.0, 135.0], [-45.0, -135.0]], + dtype=ht.float32, + device=self.device, + split=1, + ) + + self.assertIs(angle.device, self.device) + self.assertIs(angle.dtype, ht.float32) + self.assertEqual(angle.shape, (3, 2)) + self.assertTrue(ht.equal(angle, res)) + + # Not complex + a = ht.ones((4, 4), split=1) + angle = ht.angle(a) + res = ht.zeros((4, 4), split=1) + + self.assertIs(angle.device, self.device) + self.assertIs(angle.dtype, ht.float32) + self.assertEqual(angle.shape, (4, 4)) + self.assertTrue(ht.equal(angle, res)) def test_conjugate(self): - a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j]) - conj = ht.conjugate(a) - res = ht.array( - [1 - 0j, -1j, 1 - 1j, -2 - 2j, 3 + 3j], 
dtype=ht.complex64, device=self.device - ) - - self.assertIs(conj.device, self.device) - self.assertIs(conj.dtype, ht.complex64) - self.assertEqual(conj.shape, (5,)) - # equal on complex numbers does not work on PyTorch - self.assertTrue(ht.equal(ht.real(conj), ht.real(res))) - self.assertTrue(ht.equal(ht.imag(conj), ht.imag(res))) - - a = ht.array([[1.0, 1.0j], [1 + 1j, -2 + 2j], [3 - 3j, -4 - 4j]], split=0) - conj = ht.conjugate(a) - res = ht.array( - [[1 - 0j, -1j], [1 - 1j, -2 - 2j], [3 + 3j, -4 + 4j]], - dtype=ht.complex64, - device=self.device, - split=0, - ) - - self.assertIs(conj.device, self.device) - self.assertIs(conj.dtype, ht.complex64) - self.assertEqual(conj.shape, (3, 2)) - # equal on complex numbers does not work on PyTorch - self.assertTrue(ht.equal(ht.real(conj), ht.real(res))) - self.assertTrue(ht.equal(ht.imag(conj), ht.imag(res))) - - a = ht.array( - [[1.0, 1.0j], [1 + 1j, -2 + 2j], [3 - 3j, -4 - 4j]], dtype=ht.complex128, split=1 - ) - conj = ht.conjugate(a) - res = ht.array( - [[1 - 0j, -1j], [1 - 1j, -2 - 2j], [3 + 3j, -4 + 4j]], - dtype=ht.complex128, - device=self.device, - split=1, - ) - - self.assertIs(conj.device, self.device) - self.assertIs(conj.dtype, ht.complex128) - self.assertEqual(conj.shape, (3, 2)) - # equal on complex numbers does not work on PyTorch - self.assertTrue(ht.equal(ht.real(conj), ht.real(res))) - self.assertTrue(ht.equal(ht.imag(conj), ht.imag(res))) - - # Not complex - a = ht.ones((4, 4)) - conj = ht.conj(a) - res = ht.ones((4, 4)) - - self.assertIs(conj.device, self.device) - self.assertIs(conj.dtype, ht.float32) - self.assertEqual(conj.shape, (4, 4)) - self.assertTrue(ht.equal(conj, res)) - - # DNDarray method - a = ht.array([1 + 1j, 1 - 1j]) - conj = a.conj() - res = ht.array([1 - 1j, 1 + 1j]) - - self.assertIs(conj.device, self.device) - self.assertTrue(ht.equal(conj, res)) + if not self.is_mps or int(platform.mac_ver()[0].split(".")[0]) >= 14: + a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j]) + conj = ht.conjugate(a) + res = ht.array( + [1 - 0j, -1j, 1 - 1j, -2 - 2j, 3 + 3j], dtype=ht.complex64, device=self.device + ) + + self.assertIs(conj.device, self.device) + self.assertIs(conj.dtype, ht.complex64) + self.assertEqual(conj.shape, (5,)) + # equal on complex numbers does not work on PyTorch + self.assertTrue(ht.equal(ht.real(conj), ht.real(res))) + if not self.is_mps: + # precision loss on imaginary part on MPS + self.assertTrue(ht.equal(ht.imag(conj), ht.imag(res))) + + a = ht.array([[1.0, 1.0j], [1 + 1j, -2 + 2j], [3 - 3j, -4 - 4j]], split=0) + conj = ht.conjugate(a) + res = ht.array( + [[1 - 0j, -1j], [1 - 1j, -2 - 2j], [3 + 3j, -4 + 4j]], + dtype=ht.complex64, + device=self.device, + split=0, + ) + + self.assertIs(conj.device, self.device) + self.assertIs(conj.dtype, ht.complex64) + self.assertEqual(conj.shape, (3, 2)) + # equal on complex numbers does not work on PyTorch + self.assertTrue(ht.equal(ht.real(conj), ht.real(res))) + if not self.is_mps: + # precision loss on imaginary part on MPS + self.assertTrue(ht.equal(ht.imag(conj), ht.imag(res))) + + if not self.is_mps: + # complex128 not supported on MPS + a = ht.array( + [[1.0, 1.0j], [1 + 1j, -2 + 2j], [3 - 3j, -4 - 4j]], + dtype=ht.complex128, + split=1, + ) + conj = ht.conjugate(a) + res = ht.array( + [[1 - 0j, -1j], [1 - 1j, -2 - 2j], [3 + 3j, -4 + 4j]], + dtype=ht.complex128, + device=self.device, + split=1, + ) + + self.assertIs(conj.device, self.device) + self.assertIs(conj.dtype, ht.complex128) + self.assertEqual(conj.shape, (3, 2)) + # equal on complex numbers 
does not work on PyTorch + self.assertTrue(ht.equal(ht.real(conj), ht.real(res))) + self.assertTrue(ht.equal(ht.imag(conj), ht.imag(res))) + + # Not complex + a = ht.ones((4, 4)) + conj = ht.conj(a) + res = ht.ones((4, 4)) + + self.assertIs(conj.device, self.device) + self.assertIs(conj.dtype, ht.float32) + self.assertEqual(conj.shape, (4, 4)) + self.assertTrue(ht.equal(conj, res)) + + # DNDarray method + a = ht.array([1 + 1j, 1 - 1j]) + conj = a.conj() + res = ht.array([1 - 1j, 1 + 1j]) + + self.assertIs(conj.device, self.device) + self.assertTrue(ht.equal(conj, res)) def test_imag(self): - a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j]) - imag = ht.imag(a) - res = ht.array([0.0, 1.0, 1.0, 2.0, -3.0], dtype=ht.float32, device=self.device) - - self.assertIs(imag.device, self.device) - self.assertIs(imag.dtype, ht.float) - self.assertEqual(imag.shape, (5,)) - self.assertTrue(ht.equal(imag, res)) - - a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j], split=0) - imag = ht.imag(a) - res = ht.array([0.0, 1.0, 1.0, 2.0, -3.0], dtype=ht.float32, device=self.device, split=0) - - self.assertIs(imag.device, self.device) - self.assertIs(imag.dtype, ht.float) - self.assertEqual(imag.shape, (5,)) - self.assertTrue(ht.equal(imag, res)) - - # Not complex - a = ht.ones((4, 4)) - imag = a.imag - res = ht.zeros((4, 4)) - - self.assertIs(imag.device, self.device) - self.assertIs(imag.dtype, ht.float32) - self.assertEqual(imag.shape, (4, 4)) - self.assertTrue(ht.equal(imag, res)) + if not self.is_mps or int(platform.mac_ver()[0].split(".")[0]) >= 14: + a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j]) + imag = ht.imag(a) + res = ht.array([0.0, 1.0, 1.0, 2.0, -3.0], dtype=ht.float32, device=self.device) + + self.assertIs(imag.device, self.device) + self.assertIs(imag.dtype, ht.float) + self.assertEqual(imag.shape, (5,)) + self.assertTrue(ht.equal(imag, res)) + + a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j], split=0) + imag = ht.imag(a) + res = ht.array( + [0.0, 1.0, 1.0, 2.0, -3.0], dtype=ht.float32, device=self.device, split=0 + ) + + self.assertIs(imag.device, self.device) + self.assertIs(imag.dtype, ht.float) + self.assertEqual(imag.shape, (5,)) + self.assertTrue(ht.equal(imag, res)) + + # Not complex + a = ht.ones((4, 4)) + imag = a.imag + res = ht.zeros((4, 4)) + + self.assertIs(imag.device, self.device) + self.assertIs(imag.dtype, ht.float32) + self.assertEqual(imag.shape, (4, 4)) + self.assertTrue(ht.equal(imag, res)) def test_real(self): - a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j]) - real = ht.real(a) - res = ht.array([1.0, 0.0, 1.0, -2.0, 3.0], dtype=ht.float32, device=self.device) - - self.assertIs(real.device, self.device) - self.assertIs(real.dtype, ht.float) - self.assertEqual(real.shape, (5,)) - self.assertTrue(ht.equal(real, res)) - - a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j], split=0) - real = ht.real(a) - res = ht.array([1.0, 0.0, 1.0, -2.0, 3.0], dtype=ht.float32, device=self.device, split=0) - - self.assertIs(real.device, self.device) - self.assertIs(real.dtype, ht.float) - self.assertEqual(real.shape, (5,)) - self.assertTrue(ht.equal(real, res)) - - # Not complex - a = ht.ones((4, 4), split=1) - real = a.real - res = ht.ones((4, 4), split=1) - - self.assertIs(real.device, self.device) - self.assertIs(real.dtype, ht.float32) - self.assertEqual(real.shape, (4, 4)) - self.assertIs(real, a) - - # This test will be redundant with PyTorch 1.7 - def test_full(self): - a = ht.full((4, 4), 1 + 1j) - - self.assertIs(a.dtype, ht.complex64) + if not self.is_mps or 
int(platform.mac_ver()[0].split(".")[0]) >= 14: + a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j]) + real = ht.real(a) + res = ht.array([1.0, 0.0, 1.0, -2.0, 3.0], dtype=ht.float32, device=self.device) + + self.assertIs(real.device, self.device) + self.assertIs(real.dtype, ht.float) + self.assertEqual(real.shape, (5,)) + self.assertTrue(ht.equal(real, res)) + + a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j], split=0) + real = ht.real(a) + res = ht.array( + [1.0, 0.0, 1.0, -2.0, 3.0], dtype=ht.float32, device=self.device, split=0 + ) + + self.assertIs(real.device, self.device) + self.assertIs(real.dtype, ht.float) + self.assertEqual(real.shape, (5,)) + self.assertTrue(ht.equal(real, res)) + + # Not complex + a = ht.ones((4, 4), split=1) + real = a.real + res = ht.ones((4, 4), split=1) + + self.assertIs(real.device, self.device) + self.assertIs(real.dtype, ht.float32) + self.assertEqual(real.shape, (4, 4)) + self.assertIs(real, a) diff --git a/heat/core/tests/test_dndarray.py b/heat/core/tests/test_dndarray.py index 9cc1361b80..c6123c1cf2 100644 --- a/heat/core/tests/test_dndarray.py +++ b/heat/core/tests/test_dndarray.py @@ -244,14 +244,39 @@ def test_array(self): self.assertEqual(x.__array__().shape, x.gshape) # distributed case - x = ht.arange(6 * 7 * 8, dtype=ht.float64, split=0).reshape((6, 7, 8)) - x_np = np.arange(6 * 7 * 8, dtype=np.float64).reshape((6, 7, 8)) + if self.is_mps: + dtype = ht.float32 + np_dtype = np.float32 + else: + dtype = ht.float64 + np_dtype = np.float64 + x = ht.arange(6 * 7 * 8, dtype=dtype, split=0).reshape((6, 7, 8)) + x_np = np.arange(6 * 7 * 8, dtype=np_dtype).reshape((6, 7, 8)) self.assertTrue((x.__array__() == x.larray.cpu().numpy()).all()) self.assertIsInstance(x.__array__(), np.ndarray) self.assertEqual(x.__array__().dtype, x_np.dtype) self.assertEqual(x.__array__().shape, x.lshape) + def test_array_ufunc(self): + arr = ht.array([1, 2, 3, 4]) + self.assertIsInstance(np.multiply(arr, 3), ht.DNDarray) + self.assertIsInstance(np.add(arr, 3), ht.DNDarray) + self.assertIsInstance(np.sin(arr), ht.DNDarray) + + with self.assertRaises(TypeError): + np.multiply.reduce(arr) + with self.assertRaises(TypeError): + np.heaviside(arr, 5) + + def test_array_function(self): + arr = ht.array([1, 2, 3, 4]) + self.assertIsInstance(np.concatenate([arr, arr]), ht.DNDarray) + self.assertIsInstance(np.sum(arr, axis=0), ht.DNDarray) + + with self.assertRaises(TypeError): + np.array_equiv(arr, arr) + def test_larray(self): # undistributed case x = ht.arange(6 * 7 * 8).reshape((6, 7, 8)) @@ -320,12 +345,13 @@ def test_astype(self): self.assertEqual(as_uint8.larray.dtype, torch.uint8) self.assertIsNot(as_uint8, data) - # check the copy case for uint8 - as_float64 = data.astype(ht.float64, copy=False) - self.assertIsInstance(as_float64, ht.DNDarray) - self.assertEqual(as_float64.dtype, ht.float64) - self.assertEqual(as_float64.larray.dtype, torch.float64) - self.assertIs(as_float64, data) + # check the copy case for float64 + if not self.is_mps: + as_float64 = data.astype(ht.float64, copy=False) + self.assertIsInstance(as_float64, ht.DNDarray) + self.assertEqual(as_float64.dtype, ht.float64) + self.assertEqual(as_float64.larray.dtype, torch.float64) + self.assertIs(as_float64, data) def test_balance_and_lshape_map(self): data = ht.zeros((70, 20), split=0) @@ -347,9 +373,10 @@ def test_balance_and_lshape_map(self): data.balance_() self.assertTrue(data.is_balanced()) - data = ht.zeros((70, 20), split=0, dtype=ht.float64) - data = ht.balance(data[:50], copy=True) - 
self.assertTrue(data.is_balanced()) + if not self.is_mps: + data = ht.zeros((70, 20), split=0, dtype=ht.float64) + data = ht.balance(data[:50], copy=True) + self.assertTrue(data.is_balanced()) data = ht.zeros((4, 120), split=1, dtype=ht.int64) data = data[:, 40:70].balance() @@ -357,7 +384,9 @@ def test_balance_and_lshape_map(self): data = np.loadtxt("heat/datasets/iris.csv", delimiter=";") htdata = ht.load("heat/datasets/iris.csv", sep=";", split=0) - self.assertTrue(ht.equal(htdata, ht.array(data, split=0, dtype=ht.float))) + self.assertTrue( + ht.equal(htdata, ht.array(data.astype(np.float32), split=0, dtype=ht.float)) + ) if ht.MPI_WORLD.size > 4: rank = ht.MPI_WORLD.rank @@ -697,7 +726,8 @@ def test_lnbytes(self): # float x_float32 = ht.arange(6 * 7 * 8, dtype=ht.float32).reshape((6, 7, 8)) - x_float64 = ht.arange(6 * 7 * 8, dtype=ht.float64).reshape((6, 7, 8)) + if not self.is_mps: + x_float64 = ht.arange(6 * 7 * 8, dtype=ht.float64).reshape((6, 7, 8)) # bool x_bool = ht.arange(6 * 7 * 8, dtype=ht.bool).reshape((6, 7, 8)) @@ -709,7 +739,8 @@ def test_lnbytes(self): self.assertEqual(x_int64.lnbytes, x_int64.gnbytes) self.assertEqual(x_float32.lnbytes, x_float32.gnbytes) - self.assertEqual(x_float64.lnbytes, x_float64.gnbytes) + if not self.is_mps: + self.assertEqual(x_float64.lnbytes, x_float64.gnbytes) self.assertEqual(x_bool.lnbytes, x_bool.gnbytes) @@ -724,7 +755,8 @@ def test_lnbytes(self): # float x_float32_d = ht.arange(6 * 7 * 8, split=0, dtype=ht.float32) - x_float64_d = ht.arange(6 * 7 * 8, split=0, dtype=ht.float64) + if not self.is_mps: + x_float64_d = ht.arange(6 * 7 * 8, split=0, dtype=ht.float64) # bool x_bool_d = ht.arange(6 * 7 * 8, split=0, dtype=ht.bool) @@ -736,7 +768,8 @@ def test_lnbytes(self): self.assertEqual(x_int64_d.lnbytes, x_int64_d.lnumel * 8) self.assertEqual(x_float32_d.lnbytes, x_float32_d.lnumel * 4) - self.assertEqual(x_float64_d.lnbytes, x_float64_d.lnumel * 8) + if not self.is_mps: + self.assertEqual(x_float64_d.lnbytes, x_float64_d.lnumel * 8) self.assertEqual(x_bool_d.lnbytes, x_bool_d.lnumel * 1) @@ -752,7 +785,8 @@ def test_nbytes(self): # float x_float32 = ht.arange(6 * 7 * 8, dtype=ht.float32).reshape((6, 7, 8)) - x_float64 = ht.arange(6 * 7 * 8, dtype=ht.float64).reshape((6, 7, 8)) + if not self.is_mps: + x_float64 = ht.arange(6 * 7 * 8, dtype=ht.float64).reshape((6, 7, 8)) # bool x_bool = ht.arange(6 * 7 * 8, dtype=ht.bool).reshape((6, 7, 8)) @@ -764,7 +798,8 @@ def test_nbytes(self): self.assertEqual(x_int64.nbytes, 336 * 8) self.assertEqual(x_float32.nbytes, 336 * 4) - self.assertEqual(x_float64.nbytes, 336 * 8) + if not self.is_mps: + self.assertEqual(x_float64.nbytes, 336 * 8) self.assertEqual(x_bool.nbytes, 336 * 1) @@ -776,7 +811,8 @@ def test_nbytes(self): self.assertEqual(x_int64.nbytes, x_int64.gnbytes) self.assertEqual(x_float32.nbytes, x_float32.gnbytes) - self.assertEqual(x_float64.nbytes, x_float64.gnbytes) + if not self.is_mps: + self.assertEqual(x_float64.nbytes, x_float64.gnbytes) self.assertEqual(x_bool.nbytes, x_bool.gnbytes) @@ -791,7 +827,8 @@ def test_nbytes(self): # float x_float32_d = ht.arange(6 * 7 * 8, split=0, dtype=ht.float32) - x_float64_d = ht.arange(6 * 7 * 8, split=0, dtype=ht.float64) + if not self.is_mps: + x_float64_d = ht.arange(6 * 7 * 8, split=0, dtype=ht.float64) # bool x_bool_d = ht.arange(6 * 7 * 8, split=0, dtype=ht.bool) @@ -803,7 +840,8 @@ def test_nbytes(self): self.assertEqual(x_int64_d.nbytes, 336 * 8) self.assertEqual(x_float32_d.nbytes, 336 * 4) - self.assertEqual(x_float64_d.nbytes, 336 * 8) 
+ if not self.is_mps: + self.assertEqual(x_float64_d.nbytes, 336 * 8) self.assertEqual(x_bool_d.nbytes, 336 * 1) @@ -815,7 +853,8 @@ def test_nbytes(self): self.assertEqual(x_int64_d.nbytes, x_int64_d.gnbytes) self.assertEqual(x_float32_d.nbytes, x_float32_d.gnbytes) - self.assertEqual(x_float64_d.nbytes, x_float64_d.gnbytes) + if not self.is_mps: + self.assertEqual(x_float64_d.nbytes, x_float64_d.gnbytes) self.assertEqual(x_bool_d.nbytes, x_bool_d.gnbytes) @@ -824,21 +863,23 @@ def test_ndim(self): self.assertEqual(a.ndim, 4) def test_numpy(self): - # ToDo: numpy does not work for distributed tensors du to issue# + # ToDo: numpy does not work for distributed tensors due to issue# # Add additional tests if the issue is solved - a = np.random.randn(10, 8) - b = ht.array(a) - self.assertIsInstance(b.numpy(), np.ndarray) - self.assertEqual(b.numpy().shape, a.shape) - self.assertEqual(b.numpy().tolist(), b.larray.cpu().numpy().tolist()) + if not self.is_mps: + a = np.random.randn(10, 8) + b = ht.array(a) + self.assertIsInstance(b.numpy(), np.ndarray) + self.assertEqual(b.numpy().shape, a.shape) + self.assertEqual(b.numpy().tolist(), b.larray.cpu().numpy().tolist()) a = ht.ones((10, 8), dtype=ht.float32) b = np.ones((2, 2)).astype("float32") self.assertEqual(a.numpy().dtype, b.dtype) - a = ht.ones((10, 8), dtype=ht.float64) - b = np.ones((2, 2)).astype("float64") - self.assertEqual(a.numpy().dtype, b.dtype) + if not self.is_mps: + a = ht.ones((10, 8), dtype=ht.float64) + b = np.ones((2, 2)).astype("float64") + self.assertEqual(a.numpy().dtype, b.dtype) a = ht.ones((10, 8), dtype=ht.int32) b = np.ones((2, 2)).astype("int32") @@ -857,18 +898,19 @@ def test_or(self): ) def test_partitioned(self): - a = ht.zeros((120, 120), split=0) - parted = a.__partitioned__ - self.assertEqual(parted["shape"], (120, 120)) - self.assertEqual(parted["partition_tiling"], (a.comm.size, 1)) - self.assertEqual(parted["partitions"][(0, 0)]["start"], (0, 0)) - - a.resplit_(None) - self.assertIsNone(a.__partitions_dict__) - parted = a.__partitioned__ - self.assertEqual(parted["shape"], (120, 120)) - self.assertEqual(parted["partition_tiling"], (1, 1)) - self.assertEqual(parted["partitions"][(0, 0)]["start"], (0, 0)) + if not self.is_mps: + a = ht.zeros((120, 120), split=0) + parted = a.__partitioned__ + self.assertEqual(parted["shape"], (120, 120)) + self.assertEqual(parted["partition_tiling"], (a.comm.size, 1)) + self.assertEqual(parted["partitions"][(0, 0)]["start"], (0, 0)) + + a.resplit_(None) + self.assertIsNone(a.__partitions_dict__) + parted = a.__partitioned__ + self.assertEqual(parted["shape"], (120, 120)) + self.assertEqual(parted["partition_tiling"], (1, 1)) + self.assertEqual(parted["partitions"][(0, 0)]["start"], (0, 0)) def test_redistribute(self): # need to test with 1, 2, 3, and 4 dims @@ -934,177 +976,175 @@ def test_redistribute(self): with self.assertRaises(ValueError): st.redistribute_(target_map=torch.zeros((2, 4))) - def test_repr(self): - a = ht.array([1, 2, 3, 4]) - self.assertEqual(a.__repr__(), a.__str__()) - def test_resplit(self): - # resplitting with same axis, should leave everything unchanged - shape = (ht.MPI_WORLD.size, ht.MPI_WORLD.size) - data = ht.zeros(shape, split=None) - data.resplit_(None) - - self.assertIsInstance(data, ht.DNDarray) - self.assertEqual(data.shape, shape) - self.assertEqual(data.lshape, shape) - self.assertEqual(data.split, None) - - # resplitting with same axis, should leave everything unchanged - shape = (ht.MPI_WORLD.size, ht.MPI_WORLD.size) - data = 
ht.zeros(shape, split=1) - data.resplit_(1) - - self.assertIsInstance(data, ht.DNDarray) - self.assertEqual(data.shape, shape) - self.assertEqual(data.lshape, (data.comm.size, 1)) - self.assertEqual(data.split, 1) - - # splitting an unsplit tensor should result in slicing the tensor locally - shape = (ht.MPI_WORLD.size, ht.MPI_WORLD.size) - data = ht.zeros(shape) - data.resplit_(-1) - - self.assertIsInstance(data, ht.DNDarray) - self.assertEqual(data.shape, shape) - self.assertEqual(data.lshape, (data.comm.size, 1)) - self.assertEqual(data.split, 1) - - # unsplitting, aka gathering a tensor - shape = (ht.MPI_WORLD.size + 1, ht.MPI_WORLD.size) - data = ht.ones(shape, split=0) - data.resplit_(None) - - self.assertIsInstance(data, ht.DNDarray) - self.assertEqual(data.shape, shape) - self.assertEqual(data.lshape, shape) - self.assertEqual(data.split, None) - - # assign and entirely new split axis - shape = (ht.MPI_WORLD.size + 2, ht.MPI_WORLD.size + 1) - data = ht.ones(shape, split=0) - data.resplit_(1) - - self.assertIsInstance(data, ht.DNDarray) - self.assertEqual(data.shape, shape) - self.assertEqual(data.lshape[0], ht.MPI_WORLD.size + 2) - self.assertTrue(data.lshape[1] == 1 or data.lshape[1] == 2) - self.assertEqual(data.split, 1) - - # test sorting order of resplit - a_tensor = self.reference_tensor.copy() - N = ht.MPI_WORLD.size - - # split along axis = 0 - a_tensor.resplit_(axis=0) - local_shape = (1, N + 1, 2 * N) - local_tensor = self.reference_tensor[ht.MPI_WORLD.rank, :, :] - self.assertEqual(a_tensor.lshape, local_shape) - self.assertTrue((a_tensor.larray == local_tensor.larray).all()) - - # unsplit - a_tensor.resplit_(axis=None) - self.assertTrue((a_tensor.larray == self.reference_tensor.larray).all()) - - # split along axis = 1 - a_tensor.resplit_(axis=1) - if ht.MPI_WORLD.rank == 0: - local_shape = (N, 2, 2 * N) - local_tensor = self.reference_tensor[:, 0:2, :] - else: - local_shape = (N, 1, 2 * N) - local_tensor = self.reference_tensor[ - :, ht.MPI_WORLD.rank + 1 : ht.MPI_WORLD.rank + 2, : - ] - - self.assertEqual(a_tensor.lshape, local_shape) - self.assertTrue((a_tensor.larray == local_tensor.larray).all()) + # MPS tests are always 1 process only + if not self.is_mps: + # resplitting with same axis, should leave everything unchanged + shape = (ht.MPI_WORLD.size, ht.MPI_WORLD.size) + data = ht.zeros(shape, split=None) + data.resplit_(None) + + self.assertIsInstance(data, ht.DNDarray) + self.assertEqual(data.shape, shape) + self.assertEqual(data.lshape, shape) + self.assertEqual(data.split, None) + + # resplitting with same axis, should leave everything unchanged + shape = (ht.MPI_WORLD.size, ht.MPI_WORLD.size) + data = ht.zeros(shape, split=1) + data.resplit_(1) + + self.assertIsInstance(data, ht.DNDarray) + self.assertEqual(data.shape, shape) + self.assertEqual(data.lshape, (data.comm.size, 1)) + self.assertEqual(data.split, 1) + + # splitting an unsplit tensor should result in slicing the tensor locally + shape = (ht.MPI_WORLD.size, ht.MPI_WORLD.size) + data = ht.zeros(shape) + data.resplit_(-1) + + self.assertIsInstance(data, ht.DNDarray) + self.assertEqual(data.shape, shape) + self.assertEqual(data.lshape, (data.comm.size, 1)) + self.assertEqual(data.split, 1) + + # unsplitting, aka gathering a tensor + shape = (ht.MPI_WORLD.size + 1, ht.MPI_WORLD.size) + data = ht.ones(shape, split=0) + data.resplit_(None) + + self.assertIsInstance(data, ht.DNDarray) + self.assertEqual(data.shape, shape) + self.assertEqual(data.lshape, shape) + self.assertEqual(data.split, None) + + # 
assign and entirely new split axis + shape = (ht.MPI_WORLD.size + 2, ht.MPI_WORLD.size + 1) + data = ht.ones(shape, split=0) + data.resplit_(1) + + self.assertIsInstance(data, ht.DNDarray) + self.assertEqual(data.shape, shape) + self.assertEqual(data.lshape[0], ht.MPI_WORLD.size + 2) + self.assertTrue(data.lshape[1] == 1 or data.lshape[1] == 2) + self.assertEqual(data.split, 1) + + # test sorting order of resplit + a_tensor = self.reference_tensor.copy() + N = ht.MPI_WORLD.size + + # split along axis = 0 + a_tensor.resplit_(axis=0) + local_shape = (1, N + 1, 2 * N) + local_tensor = self.reference_tensor[ht.MPI_WORLD.rank, :, :] + self.assertEqual(a_tensor.lshape, local_shape) + self.assertTrue((a_tensor.larray == local_tensor.larray).all()) + + # unsplit + a_tensor.resplit_(axis=None) + self.assertTrue((a_tensor.larray == self.reference_tensor.larray).all()) + + # split along axis = 1 + a_tensor.resplit_(axis=1) + if ht.MPI_WORLD.rank == 0: + local_shape = (N, 2, 2 * N) + local_tensor = self.reference_tensor[:, 0:2, :] + else: + local_shape = (N, 1, 2 * N) + local_tensor = self.reference_tensor[ + :, ht.MPI_WORLD.rank + 1 : ht.MPI_WORLD.rank + 2, : + ] - # unsplit - a_tensor.resplit_(axis=None) - self.assertTrue((a_tensor.larray == self.reference_tensor.larray).all()) + self.assertEqual(a_tensor.lshape, local_shape) + self.assertTrue((a_tensor.larray == local_tensor.larray).all()) - # split along axis = 2 - a_tensor.resplit_(axis=2) - local_shape = (N, N + 1, 2) - local_tensor = self.reference_tensor[ - :, :, 2 * ht.MPI_WORLD.rank : 2 * ht.MPI_WORLD.rank + 2 - ] + # unsplit + a_tensor.resplit_(axis=None) + self.assertTrue((a_tensor.larray == self.reference_tensor.larray).all()) - self.assertEqual(a_tensor.lshape, local_shape) - self.assertTrue((a_tensor.larray == local_tensor.larray).all()) + # split along axis = 2 + a_tensor.resplit_(axis=2) + local_shape = (N, N + 1, 2) + local_tensor = self.reference_tensor[ + :, :, 2 * ht.MPI_WORLD.rank : 2 * ht.MPI_WORLD.rank + 2 + ] - expected = torch.ones( - (ht.MPI_WORLD.size, 100), dtype=torch.int64, device=self.device.torch_device - ) - data = ht.array(expected, split=1) - data.resplit_(None) + self.assertEqual(a_tensor.lshape, local_shape) + self.assertTrue((a_tensor.larray == local_tensor.larray).all()) - self.assertTrue(torch.equal(data.larray, expected)) - self.assertFalse(data.is_distributed()) - self.assertIsNone(data.split) - self.assertEqual(data.dtype, ht.int64) - self.assertEqual(data.larray.dtype, expected.dtype) + expected = torch.ones( + (ht.MPI_WORLD.size, 100), dtype=torch.int64, device=self.device.torch_device + ) + data = ht.array(expected, split=1) + data.resplit_(None) - expected = torch.zeros( - (100, ht.MPI_WORLD.size), dtype=torch.uint8, device=self.device.torch_device - ) - data = ht.array(expected, split=0) - data.resplit_(None) + self.assertTrue(torch.equal(data.larray, expected)) + self.assertFalse(data.is_distributed()) + self.assertIsNone(data.split) + self.assertEqual(data.dtype, ht.int64) + self.assertEqual(data.larray.dtype, expected.dtype) - self.assertTrue(torch.equal(data.larray, expected)) - self.assertFalse(data.is_distributed()) - self.assertIsNone(data.split) - self.assertEqual(data.dtype, ht.uint8) - self.assertEqual(data.larray.dtype, expected.dtype) - - # "in place" - length = torch.tensor([i + 20 for i in range(2)], device=self.device.torch_device) - test = torch.arange( - torch.prod(length), dtype=torch.float64, device=self.device.torch_device - ).reshape([i + 20 for i in range(2)]) - a = ht.array(test, 
split=1) - a.resplit_(axis=0) - self.assertTrue(ht.equal(a, ht.array(test, split=0))) - self.assertEqual(a.split, 0) - self.assertEqual(a.dtype, ht.float64) - del a - - test = torch.arange(torch.prod(length), device=self.device.torch_device) - a = ht.array(test, split=0) - a.resplit_(axis=None) - self.assertTrue(ht.equal(a, ht.array(test, split=None))) - self.assertEqual(a.split, None) - self.assertEqual(a.dtype, ht.int64) - del a - - a = ht.array(test, split=None) - a.resplit_(axis=0) - self.assertTrue(ht.equal(a, ht.array(test, split=0))) - self.assertEqual(a.split, 0) - self.assertEqual(a.dtype, ht.int64) - del a - - a = ht.array(test, split=0) - resplit_a = ht.manipulations.resplit(a, axis=None) - self.assertTrue(ht.equal(resplit_a, ht.array(test, split=None))) - self.assertEqual(resplit_a.split, None) - self.assertEqual(resplit_a.dtype, ht.int64) - del a - - a = ht.array(test, split=None) - resplit_a = ht.manipulations.resplit(a, axis=0) - self.assertTrue(ht.equal(resplit_a, ht.array(test, split=0))) - self.assertEqual(resplit_a.split, 0) - self.assertEqual(resplit_a.dtype, ht.int64) - del a - - # 1D non-contiguous resplit testing - t1 = ht.arange(10 * 10, split=0).reshape((10, 10)) - t1_sub = t1[:, 1] # .expand_dims(0) - res = ht.array([1, 11, 21, 31, 41, 51, 61, 71, 81, 91]) - t1_sub.resplit_(axis=None) - self.assertTrue(ht.all(t1_sub == res)) - self.assertEqual(t1_sub.split, None) + expected = torch.zeros( + (100, ht.MPI_WORLD.size), dtype=torch.uint8, device=self.device.torch_device + ) + data = ht.array(expected, split=0) + data.resplit_(None) + + self.assertTrue(torch.equal(data.larray, expected)) + self.assertFalse(data.is_distributed()) + self.assertIsNone(data.split) + self.assertEqual(data.dtype, ht.uint8) + self.assertEqual(data.larray.dtype, expected.dtype) + + # "in place" + length = torch.tensor([i + 20 for i in range(2)], device=self.device.torch_device) + test = torch.arange( + torch.prod(length), dtype=torch.float64, device=self.device.torch_device + ).reshape([i + 20 for i in range(2)]) + a = ht.array(test, split=1) + a.resplit_(axis=0) + self.assertTrue(ht.equal(a, ht.array(test, split=0))) + self.assertEqual(a.split, 0) + self.assertEqual(a.dtype, ht.float64) + del a + + test = torch.arange(torch.prod(length), device=self.device.torch_device) + a = ht.array(test, split=0) + a.resplit_(axis=None) + self.assertTrue(ht.equal(a, ht.array(test, split=None))) + self.assertEqual(a.split, None) + self.assertEqual(a.dtype, ht.int64) + del a + + a = ht.array(test, split=None) + a.resplit_(axis=0) + self.assertTrue(ht.equal(a, ht.array(test, split=0))) + self.assertEqual(a.split, 0) + self.assertEqual(a.dtype, ht.int64) + del a + + a = ht.array(test, split=0) + resplit_a = ht.manipulations.resplit(a, axis=None) + self.assertTrue(ht.equal(resplit_a, ht.array(test, split=None))) + self.assertEqual(resplit_a.split, None) + self.assertEqual(resplit_a.dtype, ht.int64) + del a + + a = ht.array(test, split=None) + resplit_a = ht.manipulations.resplit(a, axis=0) + self.assertTrue(ht.equal(resplit_a, ht.array(test, split=0))) + self.assertEqual(resplit_a.split, 0) + self.assertEqual(resplit_a.dtype, ht.int64) + del a + + # 1D non-contiguous resplit testing + t1 = ht.arange(10 * 10, split=0).reshape((10, 10)) + t1_sub = t1[:, 1] # .expand_dims(0) + res = ht.array([1, 11, 21, 31, 41, 51, 61, 71, 81, 91]) + t1_sub.resplit_(axis=None) + self.assertTrue(ht.all(t1_sub == res)) + self.assertEqual(t1_sub.split, None) # 3D non-contiguous resplit testing (Column mayor ordering) torch_array = 
torch.arange(100, device=self.device.torch_device).reshape((10, 5, 2)) @@ -1257,14 +1297,20 @@ def test_setitem_getitem(self): # setting with heat tensor a = ht.zeros((4, 5), split=0) - a[1, 0:4] = ht.arange(4) + if self.is_mps: + a[1, 0:4] = ht.arange(4, dtype=a.dtype) + else: + a[1, 0:4] = ht.arange(4) # if a.comm.size == 2: for c, i in enumerate(range(4)): self.assertEqual(a[1, c], i) # setting with torch tensor a = ht.zeros((4, 5), split=0) - a[1, 0:4] = torch.arange(4, device=self.device.torch_device) + if self.is_mps: + a[1, 0:4] = torch.arange(4, dtype=a.larray.dtype, device=self.device.torch_device) + else: + a[1, 0:4] = torch.arange(4, device=self.device.torch_device) # if a.comm.size == 2: for c, i in enumerate(range(4)): self.assertEqual(a[1, c], i) @@ -1365,7 +1411,10 @@ def test_setitem_getitem(self): # setting with heat tensor a = ht.zeros((4, 5), split=1) - a[1, 0:4] = ht.arange(4) + if self.is_mps: + a[1, 0:4] = ht.arange(4, dtype=a.dtype) + else: + a[1, 0:4] = ht.arange(4) for c, i in enumerate(range(4)): b = a[1, c] if b.larray.numel() > 0: @@ -1373,7 +1422,10 @@ def test_setitem_getitem(self): # setting with torch tensor a = ht.zeros((4, 5), split=1) - a[1, 0:4] = torch.arange(4, device=self.device.torch_device) + if a.device.torch_device.startswith("mps"): + a[1, 0:4] = torch.arange(4, dtype=a.larray.dtype, device=self.device.torch_device) + else: + a[1, 0:4] = torch.arange(4, device=self.device.torch_device) for c, i in enumerate(range(4)): self.assertEqual(a[1, c], i) @@ -1615,20 +1667,22 @@ def test_stride_and_strides(self): self.assertEqual(heat_float32.strides, numpy_float32.strides) # Local, float64, column-major memory layout - torch_float64 = torch.arange( - 6 * 5 * 3 * 4 * 5 * 7, dtype=torch.float64, device=self.device.torch_device - ).reshape(6, 5, 3, 4, 5, 7) - heat_float64_F = ht.array(torch_float64, order="F") - numpy_float64_F = np.array(torch_float64.cpu().numpy(), order="F") - self.assertNotEqual(heat_float64_F.stride(), torch_float64.stride()) - if pytorch_major_version >= 2: - self.assertTrue( - ( - np.asarray(heat_float64_F.strides) * 8 == np.asarray(numpy_float64_F.strides) - ).all() - ) - else: - self.assertEqual(heat_float64_F.strides, numpy_float64_F.strides) + if not self.is_mps: + torch_float64 = torch.arange( + 6 * 5 * 3 * 4 * 5 * 7, dtype=torch.float64, device=self.device.torch_device + ).reshape(6, 5, 3, 4, 5, 7) + heat_float64_F = ht.array(torch_float64, order="F") + numpy_float64_F = np.array(torch_float64.cpu().numpy(), order="F") + self.assertNotEqual(heat_float64_F.stride(), torch_float64.stride()) + if pytorch_major_version >= 2: + self.assertTrue( + ( + np.asarray(heat_float64_F.strides) * 8 + == np.asarray(numpy_float64_F.strides) + ).all() + ) + else: + self.assertEqual(heat_float64_F.strides, numpy_float64_F.strides) # Distributed, int16, row-major memory layout size = ht.communication.MPI_WORLD.size @@ -1674,24 +1728,25 @@ def test_stride_and_strides(self): self.assertEqual(heat_float32_split.strides, numpy_float32_split_strides) # Distributed, float64, column-major memory layout - split = -2 - torch_float64 = torch.arange( - 6 * 5 * 3 * 4 * 5 * size * 7, dtype=torch.float64, device=self.device.torch_device - ).reshape(6, 5, 3, 4, 5 * size, 7) - heat_float64_F_split = ht.array(torch_float64, order="F", split=split) - numpy_float64_F = np.array(torch_float64.cpu().numpy(), order="F") - numpy_float64_F_split_strides = numpy_float64_F.strides[: split + 1] + tuple( - np.array(numpy_float64_F.strides[split + 1 :]) / size - ) - if 
pytorch_major_version >= 2: - self.assertTrue( - ( - np.asarray(heat_float64_F_split.strides) * 8 - == np.asarray(numpy_float64_F_split_strides) - ).all() + if not self.is_mps: + split = -2 + torch_float64 = torch.arange( + 6 * 5 * 3 * 4 * 5 * size * 7, dtype=torch.float64, device=self.device.torch_device + ).reshape(6, 5, 3, 4, 5 * size, 7) + heat_float64_F_split = ht.array(torch_float64, order="F", split=split) + numpy_float64_F = np.array(torch_float64.cpu().numpy(), order="F") + numpy_float64_F_split_strides = numpy_float64_F.strides[: split + 1] + tuple( + np.array(numpy_float64_F.strides[split + 1 :]) / size ) - else: - self.assertEqual(heat_float64_F_split.strides, numpy_float64_F_split_strides) + if pytorch_major_version >= 2: + self.assertTrue( + ( + np.asarray(heat_float64_F_split.strides) * 8 + == np.asarray(numpy_float64_F_split_strides) + ).all() + ) + else: + self.assertEqual(heat_float64_F_split.strides, numpy_float64_F_split_strides) def test_tolist(self): a = ht.zeros([ht.MPI_WORLD.size, ht.MPI_WORLD.size, ht.MPI_WORLD.size], dtype=ht.int32) @@ -1758,6 +1813,14 @@ def test_torch_proxy(self): ) self.assertTrue(dndarray_proxy_nbytes == 1) + def test_torch_function(self): + arr = ht.array([1, 2, 3, 4]) + self.assertIsInstance(torch.concatenate([arr, arr]), ht.DNDarray) + self.assertIsInstance(torch.sum(arr, axis=0), ht.DNDarray) + + with self.assertRaises(TypeError): + torch.sigmoid(arr) + def test_xor(self): int16_tensor = ht.array([[1, 1], [2, 2]], dtype=ht.int16) int16_vector = ht.array([[3, 4]], dtype=ht.int16) diff --git a/heat/core/tests/test_exponential.py b/heat/core/tests/test_exponential.py index 861e0166d2..b26cfe789a 100644 --- a/heat/core/tests/test_exponential.py +++ b/heat/core/tests/test_exponential.py @@ -6,9 +6,14 @@ class TestExponential(TestCase): + def set_torch_dtype(self): + dtype = torch.float32 if self.is_mps else torch.float64 + return dtype + def test_exp(self): elements = 10 - tmp = torch.arange(elements, dtype=torch.float64, device=self.device.torch_device).exp() + torch_dtype = self.set_torch_dtype() + tmp = torch.arange(elements, dtype=torch_dtype, device=self.device.torch_device).exp() comparison = ht.array(tmp) # exponential of float32 @@ -19,11 +24,12 @@ def test_exp(self): self.assertTrue(ht.allclose(float32_exp, comparison.astype(ht.float32))) # exponential of float64 - float64_tensor = ht.arange(elements, dtype=ht.float64) - float64_exp = ht.exp(float64_tensor) - self.assertIsInstance(float64_exp, ht.DNDarray) - self.assertEqual(float64_exp.dtype, ht.float64) - self.assertTrue(ht.allclose(float64_exp, comparison)) + if not self.is_mps: + float64_tensor = ht.arange(elements, dtype=ht.float64) + float64_exp = ht.exp(float64_tensor) + self.assertIsInstance(float64_exp, ht.DNDarray) + self.assertEqual(float64_exp.dtype, ht.float64) + self.assertTrue(ht.allclose(float64_exp, comparison)) # exponential of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(elements, dtype=ht.int32) @@ -33,11 +39,12 @@ def test_exp(self): self.assertTrue(ht.allclose(int32_exp, ht.float32(comparison))) # exponential of longs, automatic conversion to intermediate floats - int64_tensor = ht.arange(elements, dtype=ht.int64) - int64_exp = int64_tensor.exp() - self.assertIsInstance(int64_exp, ht.DNDarray) - self.assertEqual(int64_exp.dtype, ht.float64) - self.assertTrue(ht.allclose(int64_exp, comparison)) + if not self.is_mps: + int64_tensor = ht.arange(elements, dtype=ht.int64) + int64_exp = int64_tensor.exp() + self.assertIsInstance(int64_exp, 
ht.DNDarray) + self.assertEqual(int64_exp.dtype, ht.float64) + self.assertTrue(ht.allclose(int64_exp, comparison)) # check exceptions with self.assertRaises(TypeError): @@ -56,8 +63,9 @@ def test_exp(self): self.assertEqual(actual.dtype, ht.float32) def test_expm1(self): + torch_dtype = self.set_torch_dtype() elements = 10 - tmp = torch.arange(elements, dtype=torch.float64, device=self.device.torch_device).expm1() + tmp = torch.arange(elements, dtype=torch_dtype, device=self.device.torch_device).expm1() comparison = ht.array(tmp) # expm1 of float32 @@ -68,11 +76,12 @@ def test_expm1(self): self.assertTrue(ht.allclose(float32_expm1, comparison.astype(ht.float32))) # expm1 of float64 - float64_tensor = ht.arange(elements, dtype=ht.float64) - float64_expm1 = ht.expm1(float64_tensor) - self.assertIsInstance(float64_expm1, ht.DNDarray) - self.assertEqual(float64_expm1.dtype, ht.float64) - self.assertTrue(ht.allclose(float64_expm1, comparison)) + if not self.is_mps: + float64_tensor = ht.arange(elements, dtype=ht.float64) + float64_expm1 = ht.expm1(float64_tensor) + self.assertIsInstance(float64_expm1, ht.DNDarray) + self.assertEqual(float64_expm1.dtype, ht.float64) + self.assertTrue(ht.allclose(float64_expm1, comparison)) # expm1 of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(elements, dtype=ht.int32) @@ -82,11 +91,12 @@ def test_expm1(self): self.assertTrue(ht.allclose(int32_expm1, ht.float32(comparison))) # expm1 of longs, automatic conversion to intermediate floats - int64_tensor = ht.arange(elements, dtype=ht.int64) - int64_expm1 = int64_tensor.expm1() - self.assertIsInstance(int64_expm1, ht.DNDarray) - self.assertEqual(int64_expm1.dtype, ht.float64) - self.assertTrue(ht.allclose(int64_expm1, comparison)) + if not self.is_mps: + int64_tensor = ht.arange(elements, dtype=ht.int64) + int64_expm1 = int64_tensor.expm1() + self.assertIsInstance(int64_expm1, ht.DNDarray) + self.assertEqual(int64_expm1.dtype, ht.float64) + self.assertTrue(ht.allclose(int64_expm1, comparison)) # check exceptions with self.assertRaises(TypeError): @@ -95,8 +105,10 @@ def test_expm1(self): ht.expm1("hello world") def test_exp2(self): + torch_dtype = self.set_torch_dtype() elements = 10 - tmp = np.exp2(torch.arange(elements, dtype=torch.float64)) + tmp = np.exp2(torch.arange(elements, dtype=torch_dtype).numpy()) + tmp = torch.tensor(tmp) tmp = tmp.to(self.device.torch_device) comparison = ht.array(tmp, device=self.device) # exponential of float32 @@ -108,11 +120,12 @@ def test_exp2(self): self.assertTrue(ht.allclose(float32_exp2, comparison.astype(ht.float32))) # exponential of float64 - float64_tensor = ht.arange(elements, dtype=ht.float64) - float64_exp2 = ht.exp2(float64_tensor) - self.assertIsInstance(float64_exp2, ht.DNDarray) - self.assertEqual(float64_exp2.dtype, ht.float64) - self.assertTrue(ht.allclose(float64_exp2, comparison)) + if not self.is_mps: + float64_tensor = ht.arange(elements, dtype=ht.float64) + float64_exp2 = ht.exp2(float64_tensor) + self.assertIsInstance(float64_exp2, ht.DNDarray) + self.assertEqual(float64_exp2.dtype, ht.float64) + self.assertTrue(ht.allclose(float64_exp2, comparison)) # exponential of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(elements, dtype=ht.int32) @@ -122,11 +135,12 @@ def test_exp2(self): self.assertTrue(ht.allclose(int32_exp2, ht.float32(comparison))) # exponential of longs, automatic conversion to intermediate floats - int64_tensor = ht.arange(elements, dtype=ht.int64) - int64_exp2 =
int64_tensor.exp2() - self.assertIsInstance(int64_exp2, ht.DNDarray) - self.assertEqual(int64_exp2.dtype, ht.float64) - self.assertTrue(ht.allclose(int64_exp2, comparison)) + if not self.is_mps: + int64_tensor = ht.arange(elements, dtype=ht.int64) + int64_exp2 = int64_tensor.exp2() + self.assertIsInstance(int64_exp2, ht.DNDarray) + self.assertEqual(int64_exp2.dtype, ht.float64) + self.assertTrue(ht.allclose(int64_exp2, comparison)) # check exceptions with self.assertRaises(TypeError): @@ -135,8 +149,9 @@ def test_exp2(self): ht.exp2("hello world") def test_log(self): + torch_dtype = self.set_torch_dtype() elements = 15 - tmp = torch.arange(1, elements, dtype=torch.float64, device=self.device.torch_device).log() + tmp = torch.arange(1, elements, dtype=torch_dtype, device=self.device.torch_device).log() comparison = ht.array(tmp) # logarithm of float32 @@ -147,11 +162,12 @@ def test_log(self): self.assertTrue(ht.allclose(float32_log, comparison.astype(ht.float32))) # logarithm of float64 - float64_tensor = ht.arange(1, elements, dtype=ht.float64) - float64_log = ht.log(float64_tensor) - self.assertIsInstance(float64_log, ht.DNDarray) - self.assertEqual(float64_log.dtype, ht.float64) - self.assertTrue(ht.allclose(float64_log, comparison)) + if not self.is_mps: + float64_tensor = ht.arange(1, elements, dtype=ht.float64) + float64_log = ht.log(float64_tensor) + self.assertIsInstance(float64_log, ht.DNDarray) + self.assertEqual(float64_log.dtype, ht.float64) + self.assertTrue(ht.allclose(float64_log, comparison)) # logarithm of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(1, elements, dtype=ht.int32) @@ -161,11 +177,12 @@ def test_log(self): self.assertTrue(ht.allclose(int32_log, ht.float32(comparison))) # logarithm of longs, automatic conversion to intermediate floats - int64_tensor = ht.arange(1, elements, dtype=ht.int64) - int64_log = int64_tensor.log() - self.assertIsInstance(int64_log, ht.DNDarray) - self.assertEqual(int64_log.dtype, ht.float64) - self.assertTrue(ht.allclose(int64_log, comparison)) + if not self.is_mps: + int64_tensor = ht.arange(1, elements, dtype=ht.int64) + int64_log = int64_tensor.log() + self.assertIsInstance(int64_log, ht.DNDarray) + self.assertEqual(int64_log.dtype, ht.float64) + self.assertTrue(ht.allclose(int64_log, comparison)) # check exceptions with self.assertRaises(TypeError): @@ -174,8 +191,9 @@ def test_log(self): ht.log("hello world") def test_log2(self): + torch_dtype = self.set_torch_dtype() elements = 15 - tmp = torch.arange(1, elements, dtype=torch.float64, device=self.device.torch_device).log2() + tmp = torch.arange(1, elements, dtype=torch_dtype, device=self.device.torch_device).log2() comparison = ht.array(tmp) # logarithm of float32 @@ -186,11 +204,12 @@ def test_log2(self): self.assertTrue(ht.allclose(float32_log2, comparison.astype(ht.float32))) # logarithm of float64 - float64_tensor = ht.arange(1, elements, dtype=ht.float64) - float64_log2 = ht.log2(float64_tensor) - self.assertIsInstance(float64_log2, ht.DNDarray) - self.assertEqual(float64_log2.dtype, ht.float64) - self.assertTrue(ht.allclose(float64_log2, comparison)) + if not self.is_mps: + float64_tensor = ht.arange(1, elements, dtype=ht.float64) + float64_log2 = ht.log2(float64_tensor) + self.assertIsInstance(float64_log2, ht.DNDarray) + self.assertEqual(float64_log2.dtype, ht.float64) + self.assertTrue(ht.allclose(float64_log2, comparison)) # logarithm of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(1, elements, dtype=ht.int32) 
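The hunks above and below all apply the same guard: Apple's MPS backend has no float64 support, so the float64 branches of these tests are either skipped via `self.is_mps` or rerouted to float32 through `set_torch_dtype`. A minimal sketch of that pattern, outside the patch and using only public PyTorch calls; the helper name `pick_float_dtype` is illustrative, not part of this changeset:

import torch

def pick_float_dtype(is_mps: bool) -> torch.dtype:
    # MPS tensors cannot hold float64, so degrade to float32 there
    return torch.float32 if is_mps else torch.float64

# e.g. building the reference values the way the log tests here do
is_mps = torch.backends.mps.is_available()
reference = torch.arange(1, 15, dtype=pick_float_dtype(is_mps)).log()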
@@ -200,11 +219,12 @@ def test_log2(self): self.assertTrue(ht.allclose(int32_log2, ht.float32(comparison))) # logarithm of longs, automatic conversion to intermediate floats - int64_tensor = ht.arange(1, elements, dtype=ht.int64) - int64_log2 = int64_tensor.log2() - self.assertIsInstance(int64_log2, ht.DNDarray) - self.assertEqual(int64_log2.dtype, ht.float64) - self.assertTrue(ht.allclose(int64_log2, comparison)) + if not self.is_mps: + int64_tensor = ht.arange(1, elements, dtype=ht.int64) + int64_log2 = int64_tensor.log2() + self.assertIsInstance(int64_log2, ht.DNDarray) + self.assertEqual(int64_log2.dtype, ht.float64) + self.assertTrue(ht.allclose(int64_log2, comparison)) # check exceptions with self.assertRaises(TypeError): @@ -213,10 +233,9 @@ def test_log2(self): ht.log2("hello world") def test_log10(self): + torch_dtype = self.set_torch_dtype() elements = 15 - tmp = torch.arange( - 1, elements, dtype=torch.float64, device=self.device.torch_device - ).log10() + tmp = torch.arange(1, elements, dtype=torch_dtype, device=self.device.torch_device).log10() comparison = ht.array(tmp) # logarithm of float32 @@ -227,11 +246,12 @@ def test_log10(self): self.assertTrue(ht.allclose(float32_log10, comparison.astype(ht.float32))) # logarithm of float64 - float64_tensor = ht.arange(1, elements, dtype=ht.float64) - float64_log10 = ht.log10(float64_tensor) - self.assertIsInstance(float64_log10, ht.DNDarray) - self.assertEqual(float64_log10.dtype, ht.float64) - self.assertTrue(ht.allclose(float64_log10, comparison)) + if not self.is_mps: + float64_tensor = ht.arange(1, elements, dtype=ht.float64) + float64_log10 = ht.log10(float64_tensor) + self.assertIsInstance(float64_log10, ht.DNDarray) + self.assertEqual(float64_log10.dtype, ht.float64) + self.assertTrue(ht.allclose(float64_log10, comparison)) # logarithm of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(1, elements, dtype=ht.int32) @@ -241,11 +261,12 @@ def test_log10(self): self.assertTrue(ht.allclose(int32_log10, ht.float32(comparison))) # logarithm of longs, automatic conversion to intermediate floats - int64_tensor = ht.arange(1, elements, dtype=ht.int64) - int64_log10 = int64_tensor.log10() - self.assertIsInstance(int64_log10, ht.DNDarray) - self.assertEqual(int64_log10.dtype, ht.float64) - self.assertTrue(ht.allclose(int64_log10, comparison)) + if not self.is_mps: + int64_tensor = ht.arange(1, elements, dtype=ht.int64) + int64_log10 = int64_tensor.log10() + self.assertIsInstance(int64_log10, ht.DNDarray) + self.assertEqual(int64_log10.dtype, ht.float64) + self.assertTrue(ht.allclose(int64_log10, comparison)) # check exceptions with self.assertRaises(TypeError): @@ -254,10 +275,9 @@ def test_log10(self): ht.log10("hello world") def test_log1p(self): + torch_dtype = self.set_torch_dtype() elements = 15 - tmp = torch.arange( - 1, elements, dtype=torch.float64, device=self.device.torch_device - ).log1p() + tmp = torch.arange(1, elements, dtype=torch_dtype, device=self.device.torch_device).log1p() comparison = ht.array(tmp) # logarithm of float32 @@ -268,11 +288,12 @@ def test_log1p(self): self.assertTrue(ht.allclose(float32_log1p, comparison.astype(ht.float32))) # logarithm of float64 - float64_tensor = ht.arange(1, elements, dtype=ht.float64) - float64_log1p = ht.log1p(float64_tensor) - self.assertIsInstance(float64_log1p, ht.DNDarray) - self.assertEqual(float64_log1p.dtype, ht.float64) - self.assertTrue(ht.allclose(float64_log1p, comparison)) + if not self.is_mps: + float64_tensor = ht.arange(1, elements, 
dtype=ht.float64) + float64_log1p = ht.log1p(float64_tensor) + self.assertIsInstance(float64_log1p, ht.DNDarray) + self.assertEqual(float64_log1p.dtype, ht.float64) + self.assertTrue(ht.allclose(float64_log1p, comparison)) # logarithm of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(1, elements, dtype=ht.int32) @@ -282,11 +303,12 @@ def test_log1p(self): self.assertTrue(ht.allclose(int32_log1p, ht.float32(comparison))) # logarithm of longs, automatic conversion to intermediate floats - int64_tensor = ht.arange(1, elements, dtype=ht.int64) - int64_log1p = int64_tensor.log1p() - self.assertIsInstance(int64_log1p, ht.DNDarray) - self.assertEqual(int64_log1p.dtype, ht.float64) - self.assertTrue(ht.allclose(int64_log1p, comparison)) + if not self.is_mps: + int64_tensor = ht.arange(1, elements, dtype=ht.int64) + int64_log1p = int64_tensor.log1p() + self.assertIsInstance(int64_log1p, ht.DNDarray) + self.assertEqual(int64_log1p.dtype, ht.float64) + self.assertTrue(ht.allclose(int64_log1p, comparison)) # check exceptions with self.assertRaises(TypeError): @@ -295,8 +317,9 @@ def test_log1p(self): ht.log1p("hello world") def test_logaddexp(self): + torch_dtype = self.set_torch_dtype() elements = 15 - tmp = torch.arange(1, elements, dtype=torch.float64, device=self.device.torch_device) + tmp = torch.arange(1, elements, dtype=torch_dtype, device=self.device.torch_device) tmp = tmp.logaddexp(tmp) comparison = ht.array(tmp) @@ -308,11 +331,12 @@ def test_logaddexp(self): self.assertTrue(ht.allclose(float32_logaddexp, comparison.astype(ht.float32))) # logaddexp of float64 - float64_tensor = ht.arange(1, elements, dtype=ht.float64) - float64_logaddexp = ht.logaddexp(float64_tensor, float64_tensor) - self.assertIsInstance(float64_logaddexp, ht.DNDarray) - self.assertEqual(float64_logaddexp.dtype, ht.float64) - self.assertTrue(ht.allclose(float64_logaddexp, comparison)) + if not self.is_mps: + float64_tensor = ht.arange(1, elements, dtype=ht.float64) + float64_logaddexp = ht.logaddexp(float64_tensor, float64_tensor) + self.assertIsInstance(float64_logaddexp, ht.DNDarray) + self.assertEqual(float64_logaddexp.dtype, ht.float64) + self.assertTrue(ht.allclose(float64_logaddexp, comparison)) # check exceptions with self.assertRaises(TypeError): @@ -321,8 +345,9 @@ def test_logaddexp(self): ht.logaddexp("hello world", "hello world") def test_logaddexp2(self): + torch_dtype = self.set_torch_dtype() elements = 15 - tmp = torch.arange(1, elements, dtype=torch.float64, device=self.device.torch_device) + tmp = torch.arange(1, elements, dtype=torch_dtype, device=self.device.torch_device) tmp = tmp.logaddexp2(tmp) comparison = ht.array(tmp) @@ -334,11 +359,12 @@ def test_logaddexp2(self): self.assertTrue(ht.allclose(float32_logaddexp2, comparison.astype(ht.float32))) # logaddexp2 of float64 - float64_tensor = ht.arange(1, elements, dtype=ht.float64) - float64_logaddexp2 = ht.logaddexp2(float64_tensor, float64_tensor) - self.assertIsInstance(float64_logaddexp2, ht.DNDarray) - self.assertEqual(float64_logaddexp2.dtype, ht.float64) - self.assertTrue(ht.allclose(float64_logaddexp2, comparison)) + if not self.is_mps: + float64_tensor = ht.arange(1, elements, dtype=ht.float64) + float64_logaddexp2 = ht.logaddexp2(float64_tensor, float64_tensor) + self.assertIsInstance(float64_logaddexp2, ht.DNDarray) + self.assertEqual(float64_logaddexp2.dtype, ht.float64) + self.assertTrue(ht.allclose(float64_logaddexp2, comparison)) # check exceptions with self.assertRaises(TypeError): @@ -347,8 +373,9 @@ def 
test_logaddexp2(self): ht.logaddexp2("hello world", "hello world") def test_sqrt(self): + torch_dtype = self.set_torch_dtype() elements = 25 - tmp = torch.arange(elements, dtype=torch.float64, device=self.device.torch_device).sqrt() + tmp = torch.arange(elements, dtype=torch_dtype, device=self.device.torch_device).sqrt() comparison = ht.array(tmp) # square roots of float32 @@ -359,11 +386,12 @@ def test_sqrt(self): self.assertTrue(ht.allclose(float32_sqrt, comparison.astype(ht.float32), 1e-06)) # square roots of float64 - float64_tensor = ht.arange(elements, dtype=ht.float64) - float64_sqrt = ht.sqrt(float64_tensor) - self.assertIsInstance(float64_sqrt, ht.DNDarray) - self.assertEqual(float64_sqrt.dtype, ht.float64) - self.assertTrue(ht.allclose(float64_sqrt, comparison, 1e-06)) + if not self.is_mps: + float64_tensor = ht.arange(elements, dtype=ht.float64) + float64_sqrt = ht.sqrt(float64_tensor) + self.assertIsInstance(float64_sqrt, ht.DNDarray) + self.assertEqual(float64_sqrt.dtype, ht.float64) + self.assertTrue(ht.allclose(float64_sqrt, comparison, 1e-06)) # square roots of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(elements, dtype=ht.int32) @@ -373,11 +401,12 @@ def test_sqrt(self): self.assertTrue(ht.allclose(int32_sqrt, ht.float32(comparison), 1e-06)) # square roots of longs, automatic conversion to intermediate floats - int64_tensor = ht.arange(elements, dtype=ht.int64) - int64_sqrt = int64_tensor.sqrt() - self.assertIsInstance(int64_sqrt, ht.DNDarray) - self.assertEqual(int64_sqrt.dtype, ht.float64) - self.assertTrue(ht.allclose(int64_sqrt, comparison, 1e-06)) + if not self.is_mps: + int64_tensor = ht.arange(elements, dtype=ht.int64) + int64_sqrt = int64_tensor.sqrt() + self.assertIsInstance(int64_sqrt, ht.DNDarray) + self.assertEqual(int64_sqrt.dtype, ht.float64) + self.assertTrue(ht.allclose(int64_sqrt, comparison, 1e-06)) # check exceptions with self.assertRaises(TypeError): @@ -386,8 +415,9 @@ def test_sqrt(self): ht.sqrt("hello world") def test_sqrt_method(self): + torch_dtype = self.set_torch_dtype() elements = 25 - tmp = torch.arange(elements, dtype=torch.float64, device=self.device.torch_device).sqrt() + tmp = torch.arange(elements, dtype=torch_dtype, device=self.device.torch_device).sqrt() comparison = ht.array(tmp) # square roots of float32 @@ -397,10 +427,11 @@ def test_sqrt_method(self): self.assertTrue(ht.allclose(float32_sqrt, comparison.astype(ht.float32), 1e-05)) # square roots of float64 - float64_sqrt = ht.arange(elements, dtype=ht.float64).sqrt() - self.assertIsInstance(float64_sqrt, ht.DNDarray) - self.assertEqual(float64_sqrt.dtype, ht.float64) - self.assertTrue(ht.allclose(float64_sqrt, comparison, 1e-05)) + if not self.is_mps: + float64_sqrt = ht.arange(elements, dtype=ht.float64).sqrt() + self.assertIsInstance(float64_sqrt, ht.DNDarray) + self.assertEqual(float64_sqrt.dtype, ht.float64) + self.assertTrue(ht.allclose(float64_sqrt, comparison, 1e-05)) # square roots of ints, automatic conversion to intermediate floats int32_sqrt = ht.arange(elements, dtype=ht.int32).sqrt() @@ -409,10 +440,11 @@ def test_sqrt_method(self): self.assertTrue(ht.allclose(int32_sqrt, ht.float32(comparison), 1e-05)) # square roots of longs, automatic conversion to intermediate floats - int64_sqrt = ht.arange(elements, dtype=ht.int64).sqrt() - self.assertIsInstance(int64_sqrt, ht.DNDarray) - self.assertEqual(int64_sqrt.dtype, ht.float64) - self.assertTrue(ht.allclose(int64_sqrt, comparison, 1e-05)) + if not self.is_mps: + int64_sqrt = 
ht.arange(elements, dtype=ht.int64).sqrt() + self.assertIsInstance(int64_sqrt, ht.DNDarray) + self.assertEqual(int64_sqrt.dtype, ht.float64) + self.assertTrue(ht.allclose(int64_sqrt, comparison, 1e-05)) # check exceptions with self.assertRaises(TypeError): @@ -449,9 +481,10 @@ def test_sqrt_out_of_place(self): ht.sqrt(number_range, "hello world") def test_square(self): + torch_dtype = self.set_torch_dtype() elements = 25 tmp = torch.square( - torch.arange(elements, dtype=torch.float64, device=self.device.torch_device) + torch.arange(elements, dtype=torch_dtype, device=self.device.torch_device) ) comparison = ht.array(tmp) @@ -463,11 +496,12 @@ def test_square(self): self.assertTrue(ht.allclose(float32_square, comparison.astype(ht.float32), 1e-09)) # squares of float64 - float64_tensor = ht.arange(elements, dtype=ht.float64) - float64_square = ht.square(float64_tensor) - self.assertIsInstance(float64_square, ht.DNDarray) - self.assertEqual(float64_square.dtype, ht.float64) - self.assertTrue(ht.allclose(float64_square, comparison, 1e-09)) + if not self.is_mps: + float64_tensor = ht.arange(elements, dtype=ht.float64) + float64_square = ht.square(float64_tensor) + self.assertIsInstance(float64_square, ht.DNDarray) + self.assertEqual(float64_square.dtype, ht.float64) + self.assertTrue(ht.allclose(float64_square, comparison, 1e-09)) # squares of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(elements, dtype=ht.int32) @@ -477,11 +511,12 @@ def test_square(self): self.assertTrue(ht.allclose(int32_square, ht.float32(comparison), 1e-09)) # squares of longs, automatic conversion to intermediate floats - int64_tensor = ht.arange(elements, dtype=ht.int64) - int64_square = int64_tensor.square() - self.assertIsInstance(int64_square, ht.DNDarray) - self.assertEqual(int64_square.dtype, ht.float64) - self.assertTrue(ht.allclose(int64_square, comparison, 1e-09)) + if not self.is_mps: + int64_tensor = ht.arange(elements, dtype=ht.int64) + int64_square = int64_tensor.square() + self.assertIsInstance(int64_square, ht.DNDarray) + self.assertEqual(int64_square.dtype, ht.float64) + self.assertTrue(ht.allclose(int64_square, comparison, 1e-09)) # check exceptions with self.assertRaises(TypeError): diff --git a/heat/core/tests/test_factories.py b/heat/core/tests/test_factories.py index 25b3845f2a..a4ea615090 100644 --- a/heat/core/tests/test_factories.py +++ b/heat/core/tests/test_factories.py @@ -96,15 +96,17 @@ def test_arange(self): self.assertEqual(three_arg_arange_dtype_short.sum(axis=0, keepdims=True), 20) # testing setting dtype to float64 - three_arg_arange_dtype_float64 = ht.arange(0, 10, 2, dtype=torch.float64) - self.assertIsInstance(three_arg_arange_dtype_float64, ht.DNDarray) - self.assertEqual(three_arg_arange_dtype_float64.shape, (5,)) - self.assertLessEqual(three_arg_arange_dtype_float64.lshape[0], 5) - self.assertEqual(three_arg_arange_dtype_float64.dtype, ht.float64) - self.assertEqual(three_arg_arange_dtype_float64.larray.dtype, torch.float64) - self.assertEqual(three_arg_arange_dtype_float64.split, None) - # make an in direct check for the sequence, compare against the gaussian sum - self.assertEqual(three_arg_arange_dtype_float64.sum(axis=0, keepdims=True), 20.0) + if not self.is_mps: + three_arg_arange_dtype_float64 = ht.arange(0, 10, 2, dtype=torch.float64) + self.assertIsInstance(three_arg_arange_dtype_float64, ht.DNDarray) + self.assertEqual(three_arg_arange_dtype_float64.shape, (5,)) + self.assertLessEqual(three_arg_arange_dtype_float64.lshape[0], 5) + 
self.assertEqual(three_arg_arange_dtype_float64.dtype, ht.float64) + self.assertEqual(three_arg_arange_dtype_float64.larray.dtype, torch.float64) + self.assertEqual(three_arg_arange_dtype_float64.split, None) + # make an indirect check for the sequence, compare against the Gaussian sum + self.assertEqual(three_arg_arange_dtype_float64.sum(axis=0, keepdims=True), 20.0) + + check_precision = ht.arange(16777217.0, 16777218, 1, dtype=ht.float64) + self.assertEqual(check_precision.sum(), 16777217) - check_precision = ht.arange(16777217.0, 16777218, 1, dtype=ht.float64) - self.assertEqual(check_precision.sum(), 16777217) @@ -145,8 +149,9 @@ def test_array(self): == torch.tensor(tuple_data, dtype=torch.int8, device=self.device.torch_device) ).all() ) - check_precision = ht.array(16777217.0, dtype=ht.float64) - self.assertEqual(check_precision.sum(), 16777217) + if not self.is_mps: + check_precision = ht.array(16777217.0, dtype=ht.float64) + self.assertEqual(check_precision.sum(), 16777217) # basic array function, unsplit data, no copy torch_tensor = torch.tensor([6, 5, 4, 3, 2, 1], device=self.device.torch_device) @@ -190,10 +195,18 @@ def test_array(self): ) # distributed array, chunk local data (split), copy True - array_2d = np.array([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]) + if self.is_mps: + np_dtype = np.float32 + torch_dtype = torch.float32 + else: + np_dtype = np.float64 + torch_dtype = torch.float64 + ht_dtype = ht.types.canonical_heat_type(torch_dtype) + + array_2d = np.array([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]], dtype=np_dtype) dndarray_2d = ht.array(array_2d, split=0, copy=True) self.assertIsInstance(dndarray_2d, ht.DNDarray) - self.assertEqual(dndarray_2d.dtype, ht.float64) + self.assertEqual(dndarray_2d.dtype, ht_dtype) self.assertEqual(dndarray_2d.gshape, (3, 3)) self.assertEqual(len(dndarray_2d.lshape), 2) self.assertLessEqual(dndarray_2d.lshape[0], 3) @@ -208,12 +221,12 @@ def test_array(self): # distributed array, chunk local data (split), copy False, torch devices array_2d = torch.tensor( [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]], - dtype=torch.double, + dtype=torch_dtype, device=self.device.torch_device, ) - dndarray_2d = ht.array(array_2d, split=0, copy=False, dtype=ht.double) + dndarray_2d = ht.array(array_2d, split=0, copy=False, dtype=ht_dtype) self.assertIsInstance(dndarray_2d, ht.DNDarray) - self.assertEqual(dndarray_2d.dtype, ht.float64) + self.assertEqual(dndarray_2d.dtype, ht_dtype) self.assertEqual(dndarray_2d.gshape, (3, 3)) self.assertEqual(len(dndarray_2d.lshape), 2) self.assertLessEqual(dndarray_2d.lshape[0], 3) @@ -229,9 +242,9 @@ def test_array(self): self.assertIs(dndarray_2d.larray, array_2d) # The array should not change as all properties match - dndarray_2d_new = ht.array(dndarray_2d, split=0, copy=False, dtype=ht.double) + dndarray_2d_new = ht.array(dndarray_2d, split=0, copy=False, dtype=ht_dtype) self.assertIsInstance(dndarray_2d_new, ht.DNDarray) - self.assertEqual(dndarray_2d_new.dtype, ht.float64) + self.assertEqual(dndarray_2d_new.dtype, ht_dtype) self.assertEqual(dndarray_2d_new.gshape, (3, 3)) self.assertEqual(len(dndarray_2d_new.lshape), 2) self.assertLessEqual(dndarray_2d_new.lshape[0], 3) @@ -245,14 +258,14 @@ def test_array(self): # Reuse the same array self.assertIs(dndarray_2d_new.larray, dndarray_2d.larray) - # Should throw exeception because of resplit it causes a resplit + # Should raise an exception because it causes a resplit with self.assertRaises(ValueError): dndarray_2d_new = ht.array(dndarray_2d, 
split=1, copy=False, dtype=ht.double) # The array should not change as all properties match - dndarray_2d_new = ht.array(dndarray_2d, is_split=0, copy=False, dtype=ht.double) + dndarray_2d_new = ht.array(dndarray_2d, is_split=0, copy=False, dtype=ht_dtype) self.assertIsInstance(dndarray_2d_new, ht.DNDarray) - self.assertEqual(dndarray_2d_new.dtype, ht.float64) + self.assertEqual(dndarray_2d_new.dtype, ht_dtype) self.assertEqual(dndarray_2d_new.gshape, (3, 3)) self.assertEqual(len(dndarray_2d_new.lshape), 2) self.assertLessEqual(dndarray_2d_new.lshape[0], 3) @@ -574,65 +587,67 @@ def get_offset(tensor_array): def test_from_partitioned(self): a = ht.zeros((120, 120), split=0) - b = ht.from_partitioned(a, comm=a.comm) - a[2, :] = 128 - self.assertTrue(ht.equal(a, b)) - - a.resplit_(None) - b = ht.from_partitioned(a, comm=a.comm) - self.assertTrue(ht.equal(a, b)) - - a.resplit_(1) - b = ht.from_partitioned(a, comm=a.comm) - b[50] = 94 - self.assertTrue(ht.equal(a, b)) - - del b.__partitioned__["shape"] - with self.assertRaises(RuntimeError): - _ = ht.from_partitioned(b) - b.__partitions_dict__ = None - _ = b.__partitioned__ - - del b.__partitioned__["locals"] - with self.assertRaises(RuntimeError): - _ = ht.from_partitioned(b) - b.__partitions_dict__ = None - _ = b.__partitioned__ - - del b.__partitioned__["locals"] - with self.assertRaises(RuntimeError): - _ = ht.from_partitioned(b) - b.__partitions_dict__ = None - _ = b.__partitioned__ + if not self.is_mps: + b = ht.from_partitioned(a, comm=a.comm) + a[2, :] = 128 + self.assertTrue(ht.equal(a, b)) + + a.resplit_(None) + b = ht.from_partitioned(a, comm=a.comm) + self.assertTrue(ht.equal(a, b)) + + a.resplit_(1) + b = ht.from_partitioned(a, comm=a.comm) + b[50] = 94 + self.assertTrue(ht.equal(a, b)) + + del b.__partitioned__["shape"] + with self.assertRaises(RuntimeError): + _ = ht.from_partitioned(b) + b.__partitions_dict__ = None + _ = b.__partitioned__ + + del b.__partitioned__["locals"] + with self.assertRaises(RuntimeError): + _ = ht.from_partitioned(b) + b.__partitions_dict__ = None + _ = b.__partitioned__ + + del b.__partitioned__["locals"] + with self.assertRaises(RuntimeError): + _ = ht.from_partitioned(b) + b.__partitions_dict__ = None + _ = b.__partitioned__ def test_from_partition_dict(self): a = ht.zeros((120, 120), split=0) - b = ht.from_partition_dict(a.__partitioned__, comm=a.comm) - a[0, 0] = 100 - self.assertTrue(ht.equal(a, b)) - - a.resplit_(None) - a[0, 0] = 50 - b = ht.from_partition_dict(a.__partitioned__, comm=a.comm) - self.assertTrue(ht.equal(a, b)) - - del b.__partitioned__["shape"] - with self.assertRaises(RuntimeError): - _ = ht.from_partition_dict(b.__partitioned__) - b.__partitions_dict__ = None - _ = b.__partitioned__ - - del b.__partitioned__["locals"] - with self.assertRaises(RuntimeError): - _ = ht.from_partition_dict(b.__partitioned__) - b.__partitions_dict__ = None - _ = b.__partitioned__ - - del b.__partitioned__["locals"] - with self.assertRaises(RuntimeError): - _ = ht.from_partition_dict(b.__partitioned__) - b.__partitions_dict__ = None - _ = b.__partitioned__ + if not self.is_mps: + b = ht.from_partition_dict(a.__partitioned__, comm=a.comm) + a[0, 0] = 100 + self.assertTrue(ht.equal(a, b)) + + a.resplit_(None) + a[0, 0] = 50 + b = ht.from_partition_dict(a.__partitioned__, comm=a.comm) + self.assertTrue(ht.equal(a, b)) + + del b.__partitioned__["shape"] + with self.assertRaises(RuntimeError): + _ = ht.from_partition_dict(b.__partitioned__) + b.__partitions_dict__ = None + _ = b.__partitioned__ + + del 
b.__partitioned__["locals"] + with self.assertRaises(RuntimeError): + _ = ht.from_partition_dict(b.__partitioned__) + b.__partitions_dict__ = None + _ = b.__partitioned__ + + del b.__partitioned__["locals"] + with self.assertRaises(RuntimeError): + _ = ht.from_partition_dict(b.__partitioned__) + b.__partitions_dict__ = None + _ = b.__partitioned__ def test_full(self): # simple tensor @@ -732,8 +747,9 @@ def test_linspace(self): zero_samples = ht.linspace(-3, 5, num=0) self.assertEqual(zero_samples.size, 0) - check_precision = ht.linspace(0.0, 16777217.0, num=2, dtype=torch.float64) - self.assertEqual(check_precision.sum(), 16777217) + if not self.is_mps: + check_precision = ht.linspace(0.0, 16777217.0, num=2, dtype=torch.float64) + self.assertEqual(check_precision.sum(), 16777217) # simple inverse linear space descending = ht.linspace(-5, 3, num=100) diff --git a/heat/core/tests/test_io.py b/heat/core/tests/test_io.py index 6f75846e5f..ac5ebd4a6c 100644 --- a/heat/core/tests/test_io.py +++ b/heat/core/tests/test_io.py @@ -1,3 +1,4 @@ +from typing import Iterable import numpy as np import os import torch @@ -35,6 +36,11 @@ def setUpClass(cls): .to(cls.device.torch_device) ) + cls.ZARR_SHAPE = (100, 100) + cls.ZARR_OUT_PATH = pwd + "/zarr_test_out.zarr" + cls.ZARR_IN_PATH = pwd + "/zarr_test_in.zarr" + cls.ZARR_TEMP_PATH = pwd + "/zarr_temp.zarr" + def tearDown(self): # synchronize all nodes ht.MPI_WORLD.Barrier() @@ -53,9 +59,38 @@ def tearDown(self): pass # if ht.MPI_WORLD.rank == 0: + if ht.io.supports_zarr(): + for file in [self.ZARR_TEMP_PATH, self.ZARR_IN_PATH, self.ZARR_OUT_PATH]: + try: + shutil.rmtree(file) + except FileNotFoundError: + pass + # synchronize all nodes ht.MPI_WORLD.Barrier() + def test_size_from_slice(self): + test_cases = [ + (1000, slice(500)), + (10, slice(0, 10, 2)), + (100, slice(0, 100, 10)), + (1000, slice(0, 1000, 100)), + (0, slice(0)), + ] + for size, slice_obj in test_cases: + with self.subTest(size=size, slice=slice_obj): + expected_sequence = list(range(size))[slice_obj] + if len(expected_sequence) == 0: + expected_offset = 0 + else: + expected_offset = expected_sequence[0] + + expected_new_size = len(expected_sequence) + + new_size, offset = ht.io.size_from_slice(size, slice_obj) + self.assertEqual(expected_new_size, new_size) + self.assertEqual(expected_offset, offset) + # catch-all loading def test_load(self): # HDF5 @@ -154,12 +189,23 @@ def test_load_csv(self): "Requires the environment variable 'TMPDIR' to point to a globally accessible path. 
Otherwise the test will be skiped on multi-node setups.", ) def test_save_csv(self): - for rnd_type in [ - (ht.random.randint, ht.types.int32), - (ht.random.randint, ht.types.int64), - (ht.random.rand, ht.types.float32), - (ht.random.rand, ht.types.float64), - ]: + # Test for different random types + # include float64 only if device is not MPS + data = None + if self.is_mps: + rnd_types = [ + (ht.random.randint, ht.types.int32), + (ht.random.randint, ht.types.int64), + (ht.random.rand, ht.types.float32), + ] + else: + rnd_types = [ + (ht.random.randint, ht.types.int32), + (ht.random.randint, ht.types.int64), + (ht.random.rand, ht.types.float32), + (ht.random.rand, ht.types.float64), + ] + for rnd_type in rnd_types: for separator in [",", ";", "|"]: for split in [None, 0, 1]: for headers in [None, ["# This", "# is a", "# test."]]: @@ -541,10 +587,6 @@ def test_load_hdf5(self): self.assertEqual(iris.larray.dtype, torch.float32) self.assertTrue((self.IRIS == iris.larray).all()) - # cropped load - iris_cropped = ht.load_hdf5(self.HDF5_PATH, self.HDF5_DATASET, split=0, load_fraction=0.5) - self.assertEqual(iris_cropped.shape[0], iris.shape[0] // 2) - # positive split axis iris = ht.load_hdf5(self.HDF5_PATH, self.HDF5_DATASET, split=0) self.assertIsInstance(iris, ht.DNDarray) @@ -582,10 +624,6 @@ def test_load_hdf5_exception(self): ht.load_hdf5("iris.h5", 1) with self.assertRaises(TypeError): ht.load_hdf5("iris.h5", dataset="data", split=1.0) - with self.assertRaises(TypeError): - ht.load_hdf5(self.HDF5_PATH, self.HDF5_DATASET, load_fraction="a") - with self.assertRaises(ValueError): - ht.load_hdf5(self.HDF5_PATH, self.HDF5_DATASET, load_fraction=0.0, split=0) # file or dataset does not exist with self.assertRaises(IOError): @@ -783,17 +821,19 @@ def test_load_npy_float(self): float_array = np.concatenate(crea_array, 1) ht.MPI_WORLD.Barrier() - load_array = ht.load_npy_from_path( - os.path.join(os.getcwd(), "heat/datasets"), dtype=ht.float64, split=1 - ) - load_array_npy = load_array.numpy() - self.assertIsInstance(load_array, ht.DNDarray) - self.assertEqual(load_array.dtype, ht.float64) - if ht.MPI_WORLD.rank == 0: - self.assertTrue((load_array_npy == float_array).all) - for file in os.listdir(os.path.join(os.getcwd(), "heat/datasets")): - if fnmatch.fnmatch(file, "*.npy"): - os.remove(os.path.join(os.getcwd(), "heat/datasets", file)) + if not self.is_mps: + # float64 not supported in MPS + load_array = ht.load_npy_from_path( + os.path.join(os.getcwd(), "heat/datasets"), dtype=ht.float64, split=1 + ) + load_array_npy = load_array.numpy() + self.assertIsInstance(load_array, ht.DNDarray) + self.assertEqual(load_array.dtype, ht.float64) + if ht.MPI_WORLD.rank == 0: + self.assertTrue((load_array_npy == float_array).all) + for file in os.listdir(os.path.join(os.getcwd(), "heat/datasets")): + if fnmatch.fnmatch(file, "*.npy"): + os.remove(os.path.join(os.getcwd(), "heat/datasets", file)) def test_load_npy_exception(self): with self.assertRaises(TypeError): @@ -892,3 +932,235 @@ def test_load_multiple_csv_exception(self): ht.MPI_WORLD.Barrier() if ht.MPI_WORLD.rank == 0: shutil.rmtree(os.path.join(os.getcwd(), "heat/datasets/csv_tests")) + + def test_load_zarr(self): + if not ht.io.supports_zarr(): + self.skipTest("Requires zarr") + + import zarr + + test_data = np.arange(self.ZARR_SHAPE[0] * self.ZARR_SHAPE[1]).reshape(self.ZARR_SHAPE) + + if ht.MPI_WORLD.rank == 0: + try: + arr = zarr.create_array( + self.ZARR_TEMP_PATH, shape=self.ZARR_SHAPE, dtype=np.float64 + ) + except AttributeError: + arr = 
zarr.create( + store=self.ZARR_TEMP_PATH, shape=self.ZARR_SHAPE, dtype=np.float64 + ) + arr[:] = test_data + + ht.MPI_WORLD.handle.Barrier() + + dndarray = ht.load_zarr(self.ZARR_TEMP_PATH) + dndnumpy = dndarray.numpy() + + if ht.MPI_WORLD.rank == 0: + self.assertTrue((dndnumpy == test_data).all()) + + ht.MPI_WORLD.Barrier() + + def test_load_zarr_slice(self): + if not ht.io.supports_zarr(): + self.skipTest("Requires zarr") + + import zarr + + test_data = np.arange(25).reshape(5, 5) + + if ht.MPI_WORLD.rank == 0: + try: + arr = zarr.create_array( + self.ZARR_TEMP_PATH, shape=test_data.shape, dtype=test_data.dtype + ) + except AttributeError: + arr = zarr.create( + store=self.ZARR_TEMP_PATH, shape=test_data.shape, dtype=test_data.dtype + ) + arr[:] = test_data + + ht.MPI_WORLD.Barrier() + + slices_to_test = [ + None, + slice(None), + slice(1, -1), + [None], + [None, slice(None)], + [None, slice(1, -1)], + [slice(1, -1)], + [slice(1, -1), None], + ] + + for slices in slices_to_test: + with self.subTest(slices=slices): + dndarray = ht.load_zarr(self.ZARR_TEMP_PATH, slices=slices) + dndnumpy = dndarray.numpy() + + if not isinstance(slices, Iterable): + slices = [slices] + + slices = tuple( + slice(elem) if not isinstance(elem, slice) else elem for elem in slices + ) + + if ht.MPI_WORLD.rank == 0: + self.assertTrue((dndnumpy == test_data[slices]).all()) + + ht.MPI_WORLD.Barrier() + + def test_save_zarr_2d_split0(self): + if not ht.io.supports_zarr(): + self.skipTest("Requires zarr") + + import zarr + + for type in [ht.types.int32, ht.types.int64, ht.types.float32, ht.types.float64]: + for dims in [(i, self.ZARR_SHAPE[1]) for i in range(1, max(10, ht.MPI_WORLD.size + 1))]: + with self.subTest(type=type, dims=dims): + n = dims[0] * dims[1] + dndarray = ht.arange(0, n, dtype=type, split=0).reshape(dims) + ht.save_zarr(dndarray, self.ZARR_OUT_PATH, overwrite=True) + dndnumpy = dndarray.numpy() + zarr_array = zarr.open_array(self.ZARR_OUT_PATH) + + if ht.MPI_WORLD.rank == 0: + self.assertTrue((dndnumpy == zarr_array).all()) + + ht.MPI_WORLD.handle.Barrier() + + def test_save_zarr_2d_split1(self): + if not ht.io.supports_zarr(): + self.skipTest("Requires zarr") + + import zarr + + for type in [ht.types.int32, ht.types.int64, ht.types.float32, ht.types.float64]: + for dims in [(self.ZARR_SHAPE[0], i) for i in range(1, max(10, ht.MPI_WORLD.size + 1))]: + with self.subTest(type=type, dims=dims): + n = dims[0] * dims[1] + dndarray = ht.arange(0, n, dtype=type).reshape(dims).resplit(axis=1) + ht.save_zarr(dndarray, self.ZARR_OUT_PATH, overwrite=True) + dndnumpy = dndarray.numpy() + zarr_array = zarr.open_array(self.ZARR_OUT_PATH) + + if ht.MPI_WORLD.rank == 0: + self.assertTrue((dndnumpy == zarr_array).all()) + + ht.MPI_WORLD.handle.Barrier() + + def test_save_zarr_split_none(self): + if not ht.io.supports_zarr(): + self.skipTest("Requires zarr") + + import zarr + + for type in [ht.types.int32, ht.types.int64, ht.types.float32, ht.types.float64]: + for n in [10, 100, 1000]: + with self.subTest(type=type, n=n): + dndarray = ht.arange(n, dtype=type, split=None) + ht.save_zarr(dndarray, self.ZARR_OUT_PATH, overwrite=True) + arr = zarr.open_array(self.ZARR_OUT_PATH) + dndnumpy = dndarray.numpy() + if ht.MPI_WORLD.rank == 0: + self.assertTrue((dndnumpy == arr).all()) + + ht.MPI_WORLD.handle.Barrier() + + def test_save_zarr_1d_split_0(self): + if not ht.io.supports_zarr(): + self.skipTest("Requires zarr") + + import zarr + + for type in [ht.types.int32, ht.types.int64, ht.types.float32, ht.types.float64]: + for 
n in [10, 100, 1000]: + with self.subTest(type=type, n=n): + dndarray = ht.arange(n, dtype=type, split=0) + ht.save_zarr(dndarray, self.ZARR_OUT_PATH, overwrite=True) + arr = zarr.open_array(self.ZARR_OUT_PATH) + dndnumpy = dndarray.numpy() + if ht.MPI_WORLD.rank == 0: + self.assertTrue((dndnumpy == arr).all()) + + ht.MPI_WORLD.handle.Barrier() + + def test_load_zarr_arguments(self): + if not ht.io.supports_zarr(): + self.skipTest("Requires zarr") + + with self.assertRaises(TypeError): + ht.load_zarr(None) + with self.assertRaises(ValueError): + ht.load_zarr("data.npy") + with self.assertRaises(TypeError): + ht.load_zarr("", "") + with self.assertRaises(TypeError): + ht.load_zarr("", device=1) + with self.assertRaises(TypeError): + ht.load_zarr("", slices=0) + with self.assertRaises(TypeError): + ht.load_zarr("", slices=[0]) + + def test_save_zarr_arguments(self): + if not ht.io.supports_zarr(): + self.skipTest("Requires zarr") + + import zarr + + with self.assertRaises(TypeError): + ht.save_zarr(None, None) + with self.assertRaises(ValueError): + ht.save_zarr(None, "data.npy") + + comm = ht.MPI_WORLD + if comm.rank == 0: + zarr.create( + store=self.ZARR_TEMP_PATH, + shape=(4, 4), + dtype=ht.types.int.char(), + overwrite=True, + ) + comm.Barrier() + + with self.assertRaises(RuntimeError): + ht.save_zarr(ht.arange(16).reshape((4, 4)), self.ZARR_TEMP_PATH) + + @unittest.skipIf(not ht.io.supports_hdf5(), reason="Requires HDF5") + def test_load_partial_hdf5(self): + test_axis = [None, 0, 1] + test_slices = [ + (slice(0, 50, None), slice(None, None, None)), + (slice(0, 50, None), slice(0, 2, None)), + (slice(50, 100, None), slice(None, None, None)), + (slice(None, None, None), slice(2, 4, None)), + (slice(50), None), + (None, slice(0, 3, 2)), + (slice(50),), + (slice(50, 100),), + ] + test_cases = [(a, s) for a in test_axis for s in test_slices] + + for axis, slices in test_cases: + with self.subTest(axis=axis, slices=slices): + HDF5_PATH = os.path.join(os.getcwd(), "heat/datasets/iris.h5") + HDF5_DATASET = "data" + expect_error = False + for s in slices: + if s and s.step not in [None, 1]: + expect_error = True + break + + if expect_error: + with self.assertRaises(ValueError): + sliced_iris = ht.load_hdf5( + HDF5_PATH, HDF5_DATASET, split=axis, slices=slices + ) + else: + original_iris = ht.load_hdf5(HDF5_PATH, HDF5_DATASET, split=axis) + tmp_slices = tuple(slice(None) if s is None else s for s in slices) + expected_iris = original_iris[tmp_slices] + sliced_iris = ht.load_hdf5(HDF5_PATH, HDF5_DATASET, split=axis, slices=slices) + self.assertTrue(ht.equal(sliced_iris, expected_iris)) diff --git a/heat/core/tests/test_logical.py b/heat/core/tests/test_logical.py index 3e46fd144e..c2da61d64b 100644 --- a/heat/core/tests/test_logical.py +++ b/heat/core/tests/test_logical.py @@ -182,7 +182,9 @@ def test_allclose(self): c = ht.zeros((4, 6), split=0) d = ht.zeros((4, 6), split=1) e = ht.zeros((4, 6)) - f = ht.float64([[2.000005, 2.000005], [2.000005, 2.000005]]) + + if not self.is_mps: + f = ht.float64([[2.000005, 2.000005], [2.000005, 2.000005]]) self.assertFalse(ht.allclose(a, b)) self.assertTrue(ht.allclose(a, b, atol=1e-04)) @@ -190,7 +192,8 @@ def test_allclose(self): self.assertTrue(ht.allclose(a, 2)) self.assertTrue(ht.allclose(a, 2.0)) self.assertTrue(ht.allclose(2, a)) - self.assertTrue(ht.allclose(f, a)) + if not self.is_mps: + self.assertTrue(ht.allclose(f, a)) self.assertTrue(ht.allclose(c, d)) self.assertTrue(ht.allclose(c, e)) self.assertTrue(e.allclose(c)) @@ -223,13 +226,14 @@ def 
test_any(self): self.assertTrue(ht.equal(any_tensor, res)) # float values, no axis - x = ht.float64([[0, 0, 0], [0, 0, 0]]) - res = ht.zeros(1, dtype=ht.uint8) - any_tensor = ht.any(x) - self.assertIsInstance(any_tensor, ht.DNDarray) - self.assertEqual(any_tensor.shape, ()) - self.assertEqual(any_tensor.dtype, ht.bool) - self.assertTrue(ht.equal(any_tensor, res)) + if not self.is_mps: + x = ht.float64([[0, 0, 0], [0, 0, 0]]) + res = ht.zeros(1, dtype=ht.uint8) + any_tensor = ht.any(x) + self.assertIsInstance(any_tensor, ht.DNDarray) + self.assertEqual(any_tensor.shape, ()) + self.assertEqual(any_tensor.dtype, ht.bool) + self.assertTrue(ht.equal(any_tensor, res)) # split tensor, along axis x = ht.arange(10, split=0) diff --git a/heat/core/tests/test_manipulations.py b/heat/core/tests/test_manipulations.py index cefb95a01b..e3c5ad232d 100644 --- a/heat/core/tests/test_manipulations.py +++ b/heat/core/tests/test_manipulations.py @@ -56,9 +56,10 @@ def tests_broadcast_to(self): self.assertEqual(broadcasted.dtype, ht.float32) # check split - a = ht.zeros((5, 5), split=0) - broadcasted = ht.broadcast_to(a, (5, 5, 5)) - self.assertEqual(broadcasted.split, 1) + if not self.is_mps: + a = ht.zeros((5, 5), split=0) + broadcasted = ht.broadcast_to(a, (5, 5, 5)) + self.assertEqual(broadcasted.split, 1) # test view a = ht.arange(5) @@ -442,10 +443,11 @@ def test_concatenate(self): self.assertEqual(res.lshape, tuple(lshape)) # 0 0 0 - x = ht.ones((16,), split=0, dtype=ht.float64) + dtype = ht.float32 if self.is_mps else ht.float64 + x = ht.ones((16,), split=0, dtype=dtype) res = ht.concatenate((x, y), axis=0) self.assertEqual(res.gshape, (32,)) - self.assertEqual(res.dtype, ht.float64) + self.assertEqual(res.dtype, dtype) _, _, chk = res.comm.chunk((32,), res.split) lshape = [0] lshape[0] = chk[0].stop - chk[0].start @@ -455,7 +457,7 @@ def test_concatenate(self): y = ht.ones((16,), split=None, dtype=ht.int64) res = ht.concatenate((x, y), axis=0) self.assertEqual(res.gshape, (32,)) - self.assertEqual(res.dtype, ht.float64) + self.assertEqual(res.dtype, dtype) _, _, chk = res.comm.chunk((32,), res.split) lshape = [0] lshape[0] = chk[0].stop - chk[0].start @@ -571,13 +573,14 @@ def test_diag(self): numpy_args={"k": 2}, ) - self.assert_func_equal( - (5,), - heat_func=ht.diag, - numpy_func=np.diag, - heat_args={"offset": -3}, - numpy_args={"k": -3}, - ) + if not res.device.torch_device.startswith("mps"): + self.assert_func_equal( + (5,), + heat_func=ht.diag, + numpy_func=np.diag, + heat_args={"offset": -3}, + numpy_args={"k": -3}, + ) def test_diagonal(self): size = ht.MPI_WORLD.size @@ -685,7 +688,8 @@ def test_diagonal(self): res.balance_() self.assertTrue( torch.equal( - res.larray, torch.tensor([rank * 2, 1 + rank * 2], device=self.device.torch_device) + res.larray, + torch.tensor([rank * 2, 1 + rank * 2], device=self.device.torch_device), ) ) @@ -702,7 +706,8 @@ def test_diagonal(self): res.balance_() self.assertTrue( torch.equal( - res.larray, torch.tensor([rank * 2, 1 + rank * 2], device=self.device.torch_device) + res.larray, + torch.tensor([rank * 2, 1 + rank * 2], device=self.device.torch_device), ) ) @@ -824,29 +829,30 @@ def test_diagonal(self): with self.assertRaises(ValueError): ht.diagonal(data) - self.assert_func_equal( - (5, 5, 5), - heat_func=ht.diagonal, - numpy_func=np.diagonal, - heat_args={"dim1": 0, "dim2": 2}, - numpy_args={"axis1": 0, "axis2": 2}, - ) + if not res.device.torch_device.startswith("mps"): + self.assert_func_equal( + (5, 5, 5), + heat_func=ht.diagonal, + 
numpy_func=np.diagonal, + heat_args={"dim1": 0, "dim2": 2}, + numpy_args={"axis1": 0, "axis2": 2}, + ) - self.assert_func_equal( - (5, 4, 3, 2), - heat_func=ht.diagonal, - numpy_func=np.diagonal, - heat_args={"dim1": 1, "dim2": 2}, - numpy_args={"axis1": 1, "axis2": 2}, - ) + self.assert_func_equal( + (5, 4, 3, 2), + heat_func=ht.diagonal, + numpy_func=np.diagonal, + heat_args={"dim1": 1, "dim2": 2}, + numpy_args={"axis1": 1, "axis2": 2}, + ) - self.assert_func_equal( - (4, 6, 3), - heat_func=ht.diagonal, - numpy_func=np.diagonal, - heat_args={"dim1": 0, "dim2": 1}, - numpy_args={"axis1": 0, "axis2": 1}, - ) + self.assert_func_equal( + (4, 6, 3), + heat_func=ht.diagonal, + numpy_func=np.diagonal, + heat_args={"dim1": 0, "dim2": 1}, + numpy_args={"axis1": 0, "axis2": 1}, + ) def test_dsplit(self): # for further testing, see test_split @@ -1773,7 +1779,7 @@ def test_repeat(self): # ------------------- # a = np.ndarray # ------------------- - a = np.array([1.2, 2.4, 3, 4, 5]) + a = np.array([1.2, 2.4, 3, 4, 5]).astype(np.float32) # axis is None # repeats = scalar repeats = 2 @@ -2222,7 +2228,11 @@ def test_reshape(self): self.assertTrue(ht.equal(reshaped, result)) self.assertEqual(reshaped.device, result.device) - b = ht.arange(4 * 5 * 6, dtype=ht.float64) + if a.device.torch_device.startswith("mps"): + float_type = ht.float32 + else: + float_type = ht.float64 + b = ht.arange(4 * 5 * 6, dtype=float_type) # test *shape input reshaped = b.reshape(4, 5, 6) self.assertTrue(reshaped.gshape == (4, 5, 6)) @@ -2269,8 +2279,8 @@ def test_reshape(self): self.assertEqual(reshaped.device, result.device) # 1-dim distributed vector - a = ht.arange(8, dtype=ht.float64, split=0, device=self.device) - result = ht.array([[[0, 1], [2, 3]], [[4, 5], [6, 7]]], dtype=ht.float64, split=0) + a = ht.arange(8, dtype=float_type, split=0, device=self.device) + result = ht.array([[[0, 1], [2, 3]], [[4, 5], [6, 7]]], dtype=float_type, split=0) reshaped = ht.reshape(a, (2, 2, 2)) self.assertEqual(reshaped.size, result.size) @@ -2554,74 +2564,75 @@ def test_roll(self): self.assertEqual(rolled.split, a.split) self.assertTrue(np.array_equal(rolled.numpy(), compare)) - a = ht.arange(20, dtype=ht.complex64).reshape((4, 5), new_split=1) - - rolled = ht.roll(a, -1) - compare = np.roll(a.numpy(), -1) - self.assertEqual(rolled.device, a.device) - self.assertEqual(rolled.size, a.size) - self.assertEqual(rolled.dtype, a.dtype) - self.assertEqual(rolled.split, a.split) - self.assertTrue(np.array_equal(rolled.numpy(), compare)) - - rolled = ht.roll(a, 1, 0) - compare = np.roll(a.numpy(), 1, 0) - self.assertEqual(rolled.device, a.device) - self.assertEqual(rolled.size, a.size) - self.assertEqual(rolled.dtype, a.dtype) - self.assertEqual(rolled.split, a.split) - self.assertTrue(np.array_equal(rolled.numpy(), compare)) - - rolled = ht.roll(a, -2, [0, 1]) - compare = np.roll(a.numpy(), -2, [0, 1]) - self.assertEqual(rolled.device, a.device) - self.assertEqual(rolled.size, a.size) - self.assertEqual(rolled.dtype, a.dtype) - self.assertEqual(rolled.split, a.split) - self.assertTrue(np.array_equal(rolled.numpy(), compare)) - - rolled = ht.roll(a, [1, 2, 1], [0, 1, -2]) - compare = np.roll(a.numpy(), [1, 2, 1], [0, 1, -2]) - self.assertEqual(rolled.device, a.device) - self.assertEqual(rolled.size, a.size) - self.assertEqual(rolled.dtype, a.dtype) - self.assertEqual(rolled.split, a.split) - self.assertTrue(np.array_equal(rolled.numpy(), compare)) - - # added 3D test, only a quick test for functionality - a = ht.arange(4 * 5 * 6, 
dtype=ht.complex64).reshape((4, 5, 6), new_split=2) - - rolled = ht.roll(a, -1) - compare = np.roll(a.numpy(), -1) - self.assertEqual(rolled.device, a.device) - self.assertEqual(rolled.size, a.size) - self.assertEqual(rolled.dtype, a.dtype) - self.assertEqual(rolled.split, a.split) - self.assertTrue(np.array_equal(rolled.numpy(), compare)) - - rolled = ht.roll(a, 1, 0) - compare = np.roll(a.numpy(), 1, 0) - self.assertEqual(rolled.device, a.device) - self.assertEqual(rolled.size, a.size) - self.assertEqual(rolled.dtype, a.dtype) - self.assertEqual(rolled.split, a.split) - self.assertTrue(np.array_equal(rolled.numpy(), compare)) - - rolled = ht.roll(a, -2, [0, 1]) - compare = np.roll(a.numpy(), -2, [0, 1]) - self.assertEqual(rolled.device, a.device) - self.assertEqual(rolled.size, a.size) - self.assertEqual(rolled.dtype, a.dtype) - self.assertEqual(rolled.split, a.split) - self.assertTrue(np.array_equal(rolled.numpy(), compare)) - - rolled = ht.roll(a, [1, 2, 1], [0, 1, -2]) - compare = np.roll(a.numpy(), [1, 2, 1], [0, 1, -2]) - self.assertEqual(rolled.device, a.device) - self.assertEqual(rolled.size, a.size) - self.assertEqual(rolled.dtype, a.dtype) - self.assertEqual(rolled.split, a.split) - self.assertTrue(np.array_equal(rolled.numpy(), compare)) + if not a.device.torch_device.startswith("mps"): + a = ht.arange(20, dtype=ht.complex64).reshape((4, 5), new_split=1) + + rolled = ht.roll(a, -1) + compare = np.roll(a.numpy(), -1) + self.assertEqual(rolled.device, a.device) + self.assertEqual(rolled.size, a.size) + self.assertEqual(rolled.dtype, a.dtype) + self.assertEqual(rolled.split, a.split) + self.assertTrue(np.array_equal(rolled.numpy(), compare)) + + rolled = ht.roll(a, 1, 0) + compare = np.roll(a.numpy(), 1, 0) + self.assertEqual(rolled.device, a.device) + self.assertEqual(rolled.size, a.size) + self.assertEqual(rolled.dtype, a.dtype) + self.assertEqual(rolled.split, a.split) + self.assertTrue(np.array_equal(rolled.numpy(), compare)) + + rolled = ht.roll(a, -2, [0, 1]) + compare = np.roll(a.numpy(), -2, [0, 1]) + self.assertEqual(rolled.device, a.device) + self.assertEqual(rolled.size, a.size) + self.assertEqual(rolled.dtype, a.dtype) + self.assertEqual(rolled.split, a.split) + self.assertTrue(np.array_equal(rolled.numpy(), compare)) + + rolled = ht.roll(a, [1, 2, 1], [0, 1, -2]) + compare = np.roll(a.numpy(), [1, 2, 1], [0, 1, -2]) + self.assertEqual(rolled.device, a.device) + self.assertEqual(rolled.size, a.size) + self.assertEqual(rolled.dtype, a.dtype) + self.assertEqual(rolled.split, a.split) + self.assertTrue(np.array_equal(rolled.numpy(), compare)) + + # added 3D test, only a quick test for functionality + a = ht.arange(4 * 5 * 6, dtype=ht.complex64).reshape((4, 5, 6), new_split=2) + + rolled = ht.roll(a, -1) + compare = np.roll(a.numpy(), -1) + self.assertEqual(rolled.device, a.device) + self.assertEqual(rolled.size, a.size) + self.assertEqual(rolled.dtype, a.dtype) + self.assertEqual(rolled.split, a.split) + self.assertTrue(np.array_equal(rolled.numpy(), compare)) + + rolled = ht.roll(a, 1, 0) + compare = np.roll(a.numpy(), 1, 0) + self.assertEqual(rolled.device, a.device) + self.assertEqual(rolled.size, a.size) + self.assertEqual(rolled.dtype, a.dtype) + self.assertEqual(rolled.split, a.split) + self.assertTrue(np.array_equal(rolled.numpy(), compare)) + + rolled = ht.roll(a, -2, [0, 1]) + compare = np.roll(a.numpy(), -2, [0, 1]) + self.assertEqual(rolled.device, a.device) + self.assertEqual(rolled.size, a.size) + self.assertEqual(rolled.dtype, a.dtype) + 
self.assertEqual(rolled.split, a.split) + self.assertTrue(np.array_equal(rolled.numpy(), compare)) + + rolled = ht.roll(a, [1, 2, 1], [0, 1, -2]) + compare = np.roll(a.numpy(), [1, 2, 1], [0, 1, -2]) + self.assertEqual(rolled.device, a.device) + self.assertEqual(rolled.size, a.size) + self.assertEqual(rolled.dtype, a.dtype) + self.assertEqual(rolled.split, a.split) + self.assertTrue(np.array_equal(rolled.numpy(), compare)) with self.assertRaises(TypeError): ht.roll(a, 1.0, 0) @@ -2687,7 +2698,10 @@ def test_row_stack(self): # test local row_stack, 2-D arrays a = np.arange(10, dtype=np.float32).reshape(2, 5) b = np.arange(15, dtype=np.float32).reshape(3, 5) - np_rstack = np.row_stack((a, b)) + if np.lib.NumpyVersion(np.__version__) >= "2.0.0b1": + np_rstack = np.vstack((a, b)) + else: + np_rstack = np.row_stack((a, b)) ht_a = ht.array(a) ht_b = ht.array(b) ht_rstack = ht.row_stack((ht_a, ht_b)) @@ -2695,14 +2709,20 @@ def test_row_stack(self): # 2-D and 1-D arrays c = np.arange(5, dtype=np.float32) - np_rstack = np.row_stack((a, b, c)) + if np.lib.NumpyVersion(np.__version__) >= "2.0.0b1": + np_rstack = np.vstack((a, b, c)) + else: + np_rstack = np.row_stack((a, b, c)) ht_c = ht.array(c) ht_rstack = ht.row_stack((ht_a, ht_b, ht_c)) self.assertTrue((np_rstack == ht_rstack.numpy()).all()) # 2-D and 1-D arrays, distributed c = np.arange(5, dtype=np.float32) - np_rstack = np.row_stack((a, b, c)) + if np.lib.NumpyVersion(np.__version__) >= "2.0.0b1": + np_rstack = np.vstack((a, b, c)) + else: + np_rstack = np.row_stack((a, b, c)) ht_a = ht.array(a, split=0) ht_b = ht.array(b, split=0) ht_c = ht.array(c, split=0) @@ -2713,7 +2733,10 @@ def test_row_stack(self): # 1-D arrays, distributed, different dtypes d = np.arange(10).astype(np.float32) e = np.arange(10) - np_rstack = np.row_stack((d, e)) + if np.lib.NumpyVersion(np.__version__) >= "2.0.0b1": + np_rstack = np.vstack((d, e)) + else: + np_rstack = np.row_stack((d, e)) ht_d = ht.array(d, split=0) ht_e = ht.array(e, split=0) ht_rstack = ht.row_stack((ht_d, ht_e)) @@ -2814,20 +2837,22 @@ def test_sort(self): exp_axis_zero = torch.tensor( [[2, 3, 0], [0, 2, 3]], dtype=torch.int32, device=self.device.torch_device ) - if torch.cuda.is_available() and data.device == ht.gpu and size < 4: - indices_axis_zero = torch.tensor( - [[0, 2, 2], [3, 2, 0]], dtype=torch.int32, device=self.device.torch_device - ) - else: - indices_axis_zero = torch.tensor( - [[0, 2, 2], [3, 0, 0]], dtype=torch.int32, device=self.device.torch_device - ) + indices_axis_zero = torch.tensor( + [[0, 2, 2], [3, 0, 0]], dtype=torch.int32, device=self.device.torch_device + ) result, result_indices = ht.sort(data, axis=0) first = result[0].larray first_indices = result_indices[0].larray if rank == 0: self.assertTrue(torch.equal(first, exp_axis_zero)) - self.assertTrue(torch.equal(first_indices, indices_axis_zero)) + try: + self.assertTrue(torch.equal(first_indices, indices_axis_zero)) + except AssertionError: + # if environment is CUDA (not ROCm), the indices are not sorted correctly + indices_axis_zero = torch.tensor( + [[0, 2, 2], [3, 2, 0]], dtype=torch.int32, device=self.device.torch_device + ) + self.assertTrue(torch.equal(first_indices, indices_axis_zero)) data = ht.array(tensor, split=1) exp_axis_one = torch.tensor([[2, 2, 3]], dtype=torch.int32, device=self.device.torch_device) @@ -3475,13 +3500,14 @@ def test_tile(self): # test tile along split axis # len(reps) = x.ndim - split = 1 - x = ht.random.randn(3, 3, dtype=ht.float64, split=split) - reps = (2, 3) - tiled_along_split = 
ht.tile(x, reps) - np_tiled_along_split = np.tile(x.numpy(), reps) - self.assertTrue((tiled_along_split.numpy() == np_tiled_along_split).all()) - self.assertTrue(tiled_along_split.dtype is x.dtype) + if not self.is_mps: + split = 1 + x = ht.random.randn(3, 3, dtype=ht.float64, split=split) + reps = (2, 3) + tiled_along_split = ht.tile(x, reps) + np_tiled_along_split = np.tile(x.numpy(), reps) + self.assertTrue((tiled_along_split.numpy() == np_tiled_along_split).all()) + self.assertTrue(tiled_along_split.dtype is x.dtype) # test exceptions float_reps = (1, 2, 2, 1.5) @@ -3540,14 +3566,22 @@ def test_topk(self): self.assertTrue((out[1].larray == exp_zero.larray).all()) self.assertTrue(out[1].larray.dtype == exp_zero_indcs.larray.dtype) - torch_array = torch.arange( - size, dtype=torch.float64, device=self.device.torch_device - ).expand(size, size) + if self.is_mps: + float_type = torch.float32 + else: + float_type = torch.float64 + ht_float_type = ht.types.canonical_heat_type(float_type) + + torch_array = torch.arange(size, dtype=float_type, device=self.device.torch_device).expand( + size, size + ) split_zero = ht.array(torch_array, split=0) split_one = ht.array(torch_array, split=1) res, indcs = ht.topk(split_zero, 2, sorted=True) - exp_zero = ht.array([[size - 1, size - 2] for i in range(size)], dtype=ht.float64, split=0) + exp_zero = ht.array( + [[size - 1, size - 2] for i in range(size)], dtype=ht_float_type, split=0 + ) exp_zero_indcs = ht.array( [[size - 1, size - 2] for i in range(size)], dtype=ht.int64, split=0 ) @@ -3556,7 +3590,9 @@ def test_topk(self): self.assertTrue(indcs.larray.dtype == exp_zero_indcs.larray.dtype) res, indcs = ht.topk(split_one, 2, sorted=True) - exp_one = ht.array([[size - 1, size - 2] for i in range(size)], dtype=ht.float64, split=1) + exp_one = ht.array( + [[size - 1, size - 2] for i in range(size)], dtype=ht_float_type, split=1 + ) exp_one_indcs = ht.array( [[size - 1, size - 2] for i in range(size)], dtype=ht.int64, split=1 ) @@ -3570,7 +3606,7 @@ def test_topk(self): out = (ht.empty_like(exp_zero), ht.empty_like(exp_zero_indcs)) res, indcs = ht.topk(split_zero, 2, sorted=True, largest=False, out=out) with self.assertRaises(RuntimeError): - exp_zero = ht.array([[0, 1] for i in range(size)], dtype=ht.float64, split=0) + exp_zero = ht.array([[0, 1] for i in range(size)], dtype=ht_float_type, split=0) exp_zero_indcs = ht.array([[0, 1] for i in range(size)], dtype=ht.int16, split=0) out = (ht.empty_like(exp_zero), ht.empty_like(exp_zero_indcs)) res, indcs = ht.topk(split_zero, 2, sorted=True, largest=False, out=out) @@ -3619,11 +3655,15 @@ def test_unique(self): res, inv = ht.unique(data, return_inverse=True, axis=0) _, exp_inv = torch_array.unique(dim=0, return_inverse=True, sorted=True) - self.assertTrue(torch.equal(inv, exp_inv.to(dtype=inv.dtype))) + self.assertTrue( + (inv == ht.array(exp_inv.to(dtype=inv.larray.dtype), split=inv.split)).all() + ) res, inv = ht.unique(data, return_inverse=True, axis=1) _, exp_inv = torch_array.unique(dim=1, return_inverse=True, sorted=True) - self.assertTrue(torch.equal(inv, exp_inv.to(dtype=inv.dtype))) + self.assertTrue( + (inv == ht.array(exp_inv.to(dtype=inv.larray.dtype), split=inv.split)).all() + ) torch_array = torch.tensor( [[1, 1, 2], [1, 2, 2], [2, 1, 2], [1, 3, 2], [0, 1, 2]], @@ -3647,7 +3687,9 @@ def test_unique(self): data_split_zero = ht.array(torch_array, split=0) res, inv = ht.unique(data_split_zero, return_inverse=True, sorted=True) - self.assertTrue(torch.equal(inv, exp_inv.to(dtype=inv.dtype))) + 
self.assertTrue( + (inv == ht.array(exp_inv.to(dtype=inv.larray.dtype), split=inv.split)).all() + ) def test_vsplit(self): # for further testing, see test_split diff --git a/heat/core/tests/test_printing.py b/heat/core/tests/test_printing.py index cc8fd6d0a9..fd6e382e2a 100644 --- a/heat/core/tests/test_printing.py +++ b/heat/core/tests/test_printing.py @@ -430,10 +430,19 @@ def test_split_2_above_threshold(self): if dndarray.comm.rank == 0: self.assertEqual(comparison, __str) + def test___repr__(self): + a = ht.array([1, 2, 3, 4]) + r = a.__repr__() + self.assertEqual( + r, + f"", + ) + class TestPrintingGPU(TestCase): def test_print_GPU(self): # this test case also includes GPU now, checking the output is not done; only test whether the routine itself works... - a0 = ht.arange(2**20, dtype=ht.float32).reshape((2**10, 2**10)).resplit_(0) - a1 = ht.arange(2**20, dtype=ht.float32).reshape((2**10, 2**10)).resplit_(1) - print(a0, a1) + if not self.is_mps: + a0 = ht.arange(2**20, dtype=ht.float32).reshape((2**10, 2**10)).resplit_(0) + a1 = ht.arange(2**20, dtype=ht.float32).reshape((2**10, 2**10)).resplit_(1) + print(a0, a1) diff --git a/heat/core/tests/test_random.py b/heat/core/tests/test_random.py index c8e867c490..f0bc9b1f92 100644 --- a/heat/core/tests/test_random.py +++ b/heat/core/tests/test_random.py @@ -1,9 +1,16 @@ +import os +import platform +import unittest + import numpy as np import torch import heat as ht from .test_suites.basic_test import TestCase +envar = os.getenv("HEAT_TEST_USE_DEVICE", "cpu") +is_mps = envar == "gpu" and platform.system() == "Darwin" + class TestRandom_Batchparallel(TestCase): def test_default(self): @@ -53,10 +60,13 @@ def test_permutation(self): if self.device.torch_device == "cpu": state = torch.random.get_rng_state() else: - state = torch.cuda.get_rng_state(self.device.torch_device) + if self.is_mps: + state = torch.mps.get_rng_state() + else: + state = torch.cuda.get_rng_state(self.device.torch_device) # results - a = ht.random.permutation(10) + a = ht.random.permutation(10, device=self.device) b_arr = ht.arange(10, dtype=ht.float32) b = ht.random.permutation(ht.resplit(b_arr, 0)) @@ -70,7 +80,10 @@ def test_permutation(self): if self.device.torch_device == "cpu": torch.random.set_rng_state(state) else: - torch.cuda.set_rng_state(state, self.device.torch_device) + if self.is_mps: + torch.mps.set_rng_state(state) + else: + torch.cuda.set_rng_state(state, self.device.torch_device) # torch results to compare to a_cmp = torch.randperm(a.shape[0], device=self.device.torch_device) @@ -83,18 +96,19 @@ def test_permutation(self): self.assertEqual(a.dtype, ht.int64) self.assertEqual(b.dtype, ht.float32) - c0.resplit_(None) - c1.resplit_(None) - b.resplit_(None) + if not self.is_mps: + c0.resplit_(None) + c1.resplit_(None) + b.resplit_(None) - # due to different states of the torch RNG on different processes and due to construction of the permutation - # the values are only equal on process no 0 which has been used for generating the permutation - if ht.MPI_WORLD.rank == 0: - self.assertTrue((a.larray == a_cmp).all()) - self.assertTrue((b.larray == b_cmp).all()) - self.assertTrue((c.larray == c_cmp).all()) - self.assertTrue((c0.larray == c0_cmp).all()) - self.assertTrue((c1.larray == c1_cmp).all()) + # due to different states of the torch RNG on different processes and due to construction of the permutation + # the values are only equal on process no 0 which has been used for generating the permutation + if ht.MPI_WORLD.rank == 0: + self.assertTrue((a.larray == 
a_cmp).all()) + self.assertTrue((b.larray == b_cmp).all()) + self.assertTrue((c.larray == c_cmp).all()) + self.assertTrue((c0.larray == c0_cmp).all()) + self.assertTrue((c1.larray == c1_cmp).all()) with self.assertRaises(TypeError): ht.random.permutation("abc") @@ -122,19 +136,21 @@ def test_rand(self): self.assertTrue((counts <= 2).all()) # Two large arrays that were created after each other don't share too much values - b = ht.random.rand(14, 7, 3, 12, 18, 42, split=5, comm=ht.MPI_WORLD, dtype=ht.float64) - c = np.concatenate((a.flatten(), b.numpy().flatten())) - _, counts = np.unique(c, return_counts=True) - self.assertTrue((counts <= 2).all()) + if not self.is_mps: + # this condition is not met if b is float32, MPS does not support float64 + b = ht.random.rand(14, 7, 3, 12, 18, 42, split=5, comm=ht.MPI_WORLD, dtype=ht.float64) + c = np.concatenate((a.flatten(), b.numpy().flatten())) + _, counts = np.unique(c, return_counts=True) + self.assertTrue((counts <= 2).all()) - # Values should be spread evenly across the range [0, 1) - mean = np.mean(c) - median = np.median(c) - std = np.std(c) - self.assertTrue(0.49 < mean < 0.51) - self.assertTrue(0.49 < median < 0.51) - self.assertTrue(std < 0.3) - self.assertTrue(((0 <= c) & (c < 1)).all()) + # Values should be spread evenly across the range [0, 1) + mean = np.mean(c) + median = np.median(c) + std = np.std(c) + self.assertTrue(0.49 < mean < 0.51) + self.assertTrue(0.49 < median < 0.51) + self.assertTrue(std < 0.3) + self.assertTrue(((0 <= c) & (c < 1)).all()) # No arguments work correctly ht.random.seed(seed) @@ -196,7 +212,9 @@ def test_randint(self): ht.random.seed(13579) b = ht.random.randint(low=0, high=10000, size=shape, split=2, dtype=ht.int64) - self.assertTrue(ht.equal(a, b)) + if not self.is_mps: + # assertion fails on more than 4 dimensions on MPS + self.assertTrue(ht.equal(a, b)) mean = ht.mean(a) # median = ht.median(a) std = ht.std(a) @@ -252,11 +270,12 @@ def test_randint(self): self.assertTrue(ht.equal(a, b)) def test_randn(self): + float_dtype = ht.float32 if self.is_mps else ht.float64 # Test that the random values have the correct distribution ht.random.seed(54321) shape = (5, 10, 13, 23) - a = ht.random.randn(*shape, split=0, dtype=ht.float64) - self.assertEqual(a.dtype, ht.float64) + a = ht.random.randn(*shape, split=0, dtype=float_dtype) + self.assertEqual(a.dtype, float_dtype) mean = ht.mean(a) median = ht.median(a) std = ht.std(a) @@ -265,22 +284,23 @@ def test_randn(self): self.assertTrue(0.98 < std < 1.02) # Creating the same array two times without resetting seed results in different elements - c = ht.random.randn(*shape, split=0, dtype=ht.float64) + c = ht.random.randn(*shape, split=0, dtype=float_dtype) self.assertEqual(c.shape, a.shape) self.assertFalse(ht.allclose(a, c)) - # All the created values should be different - d = ht.concatenate((a, c)) - d.resplit_(None) - d = d.numpy() - _, counts = np.unique(d, return_counts=True) - self.assertTrue((counts == 1).all()) + if not self.is_mps: + # If dtype is float64, all the created values should be different + d = ht.concatenate((a, c)) + d.resplit_(None) + d = d.numpy() + _, counts = np.unique(d, return_counts=True) + self.assertTrue((counts == 1).all()) # Two arrays are the same for same seed and split-axis != 0 ht.random.seed(12345) - a = ht.random.randn(*shape, split=3, dtype=ht.float64) + a = ht.random.randn(*shape, split=3, dtype=float_dtype) ht.random.seed(12345) - b = ht.random.randn(*shape, split=3, dtype=ht.float64) + b = ht.random.randn(*shape, split=3, 
dtype=float_dtype) self.assertTrue(ht.equal(a, b)) # Tests with float32 @@ -313,32 +333,43 @@ def test_randn(self): self.assertTrue(isinstance(x, float)) def test_randperm(self): + # Reset RNG + ht.random.seed() if self.device.torch_device == "cpu": state = torch.random.get_rng_state() else: - state = torch.cuda.get_rng_state(self.device.torch_device) + if self.is_mps: + state = torch.mps.get_rng_state() + else: + state = torch.cuda.get_rng_state(self.device.torch_device) # results a = ht.random.randperm(10, dtype=ht.int32) b = ht.random.randperm(4, dtype=ht.float32, split=0) c = ht.random.randperm(5, split=0) - d = ht.random.randperm(5, dtype=ht.float64) + if not self.is_mps: + d = ht.random.randperm(5, dtype=ht.float64) if self.device.torch_device == "cpu": torch.random.set_rng_state(state) else: - torch.cuda.set_rng_state(state, self.device.torch_device) + if self.is_mps: + torch.mps.set_rng_state(state) + else: + torch.cuda.set_rng_state(state, self.device.torch_device) # torch results to compare to - a_cmp = torch.randperm(10, dtype=torch.int32, device=self.device.torch_device) + a_cmp = torch.randperm(10, dtype=torch.int32, device=a.larray.device) b_cmp = torch.randperm(4, dtype=torch.float32, device=self.device.torch_device) c_cmp = torch.randperm(5, dtype=torch.int64, device=self.device.torch_device) - d_cmp = torch.randperm(5, dtype=torch.float64, device=self.device.torch_device) + if not self.is_mps: + d_cmp = torch.randperm(5, dtype=torch.float64, device=self.device.torch_device) self.assertEqual(a.dtype, ht.int32) self.assertEqual(b.dtype, ht.float32) self.assertEqual(c.dtype, ht.int64) - self.assertEqual(d.dtype, ht.float64) + if not self.is_mps: + self.assertEqual(d.dtype, ht.float64) brsp = ht.resplit(b) crsp = ht.resplit(c) @@ -348,7 +379,8 @@ def test_randperm(self): self.assertTrue((a.larray == a_cmp).all()) self.assertTrue((brsp.larray == b_cmp).all()) self.assertTrue((crsp.larray == c_cmp).all()) - self.assertTrue((d.larray == d_cmp).all()) + if not self.is_mps: + self.assertTrue((d.larray == d_cmp).all()) with self.assertRaises(TypeError): ht.random.randperm("abc") @@ -411,6 +443,7 @@ def test_set_state(self): """ +@unittest.skipIf(is_mps, "Threefry not supported on Apple MPS") class TestRandom_Threefry(TestCase): def test_setting_threefry(self): ht.random.set_state(("Threefry", 12345, 0xFFF)) @@ -605,7 +638,7 @@ def test_rand(self): self.assertTrue(ht.equal(a, b)) # Too big arrays cant be created - with self.assertRaises(ValueError): + with self.assertRaises(RuntimeError): ht.random.randn(0x7FFFFFFFFFFFFFFF) with self.assertRaises(ValueError): ht.random.rand(3, 2, -2, 5, split=1) diff --git a/heat/core/tests/test_rounding.py b/heat/core/tests/test_rounding.py index 761742095d..597cd044f9 100644 --- a/heat/core/tests/test_rounding.py +++ b/heat/core/tests/test_rounding.py @@ -1,3 +1,5 @@ +import platform + import numpy as np import torch @@ -17,12 +19,14 @@ def test_abs(self): int16_absolute_values_fabs = ht.fabs(int16_tensor_fabs) int32_tensor_fabs = ht.arange(-10.5, 10.5, dtype=ht.int32, split=0) int32_absolute_values_fabs = ht.fabs(int32_tensor_fabs) - int64_tensor_fabs = ht.arange(-10.5, 10.5, dtype=ht.int64, split=0) - int64_absolute_values_fabs = ht.fabs(int64_tensor_fabs) + if not self.is_mps: + int64_tensor_fabs = ht.arange(-10.5, 10.5, dtype=ht.int64, split=0) + int64_absolute_values_fabs = ht.fabs(int64_tensor_fabs) float32_tensor_fabs = ht.arange(-10.5, 10.5, dtype=ht.float32, split=0) float32_absolute_values_fabs = ht.fabs(float32_tensor_fabs) - 
float64_tensor_fabs = ht.arange(-10.5, 10.5, dtype=ht.float64, split=0) - float64_absolute_values_fabs = ht.fabs(float64_tensor_fabs) + if not self.is_mps: + float64_tensor_fabs = ht.arange(-10.5, 10.5, dtype=ht.float64, split=0) + float64_absolute_values_fabs = ht.fabs(float64_tensor_fabs) # basic absolute test self.assertIsInstance(absolute_values, ht.DNDarray) @@ -32,9 +36,11 @@ def test_abs(self): self.assertEqual(int8_absolute_values_fabs.sum(axis=0), 100.0) self.assertEqual(int16_absolute_values_fabs.sum(axis=0), 100.0) self.assertEqual(int32_absolute_values_fabs.sum(axis=0), 100.0) - self.assertEqual(int64_absolute_values_fabs.sum(axis=0), 100.0) + if not self.is_mps: + self.assertEqual(int64_absolute_values_fabs.sum(axis=0), 100.0) self.assertEqual(float32_absolute_values_fabs.sum(axis=0), 110.5) - self.assertEqual(float64_absolute_values_fabs.sum(axis=0), 110.5) + if not self.is_mps: + self.assertEqual(float64_absolute_values_fabs.sum(axis=0), 110.5) # check whether output works # for abs==absolute @@ -65,9 +71,10 @@ def test_abs(self): self.assertEqual(int8_absolute_values_fabs.dtype, ht.float32) self.assertEqual(int16_absolute_values_fabs.dtype, ht.float32) self.assertEqual(int32_absolute_values_fabs.dtype, ht.float32) - self.assertEqual(int64_absolute_values_fabs.dtype, ht.float64) self.assertEqual(float32_absolute_values_fabs.dtype, ht.float32) - self.assertEqual(float64_absolute_values_fabs.dtype, ht.float64) + if not self.is_mps: + self.assertEqual(int64_absolute_values_fabs.dtype, ht.float64) + self.assertEqual(float64_absolute_values_fabs.dtype, ht.float64) # exceptions # for abs==absolute @@ -92,8 +99,9 @@ def test_abs(self): def test_ceil(self): start, end, step = -5.0, 5.0, 1.4 + float_dtype = torch.float32 if self.is_mps else torch.float64 comparison = torch.arange( - start, end, step, dtype=torch.float64, device=self.device.torch_device + start, end, step, dtype=float_dtype, device=self.device.torch_device ).ceil() # exponential of float32 @@ -105,12 +113,13 @@ def test_ceil(self): self.assertTrue((float32_floor.larray == comparison.float()).all()) # exponential of float64 - float64_tensor = ht.arange(start, end, step, dtype=ht.float64) - float64_floor = float64_tensor.ceil() - self.assertIsInstance(float64_floor, ht.DNDarray) - self.assertEqual(float64_floor.dtype, ht.float64) - self.assertEqual(float64_floor.dtype, ht.float64) - self.assertTrue((float64_floor.larray == comparison).all()) + if not self.is_mps: + float64_tensor = ht.arange(start, end, step, dtype=ht.float64) + float64_floor = float64_tensor.ceil() + self.assertIsInstance(float64_floor, ht.DNDarray) + self.assertEqual(float64_floor.dtype, ht.float64) + self.assertEqual(float64_floor.dtype, ht.float64) + self.assertTrue((float64_floor.larray == comparison).all()) # check exceptions with self.assertRaises(TypeError): @@ -159,12 +168,13 @@ def test_floor(self): self.assertTrue((float32_floor.larray == comparison.float()).all()) # exponential of float64 - float64_tensor = ht.arange(start, end, step, dtype=ht.float64) + 0.01 - float64_floor = float64_tensor.floor() - self.assertIsInstance(float64_floor, ht.DNDarray) - self.assertEqual(float64_floor.dtype, ht.float64) - self.assertEqual(float64_floor.dtype, ht.float64) - self.assertTrue((float64_floor.larray == comparison).all()) + if not self.is_mps: + float64_tensor = ht.arange(start, end, step, dtype=ht.float64) + 0.01 + float64_floor = float64_tensor.floor() + self.assertIsInstance(float64_floor, ht.DNDarray) + self.assertEqual(float64_floor.dtype, 
ht.float64) + self.assertEqual(float64_floor.dtype, ht.float64) + self.assertTrue((float64_floor.larray == comparison).all()) # check exceptions with self.assertRaises(TypeError): @@ -191,18 +201,19 @@ def test_modf(self): self.assert_array_equal(float32_modf[1], comparison[1]) # exponential of float64 - npArray = np.arange(start, end, step, np.float64) - comparison = np.modf(npArray) + if not self.is_mps: + npArray = np.arange(start, end, step, np.float64) + comparison = np.modf(npArray) - float64_tensor = ht.array(npArray, dtype=ht.float64) - float64_modf = float64_tensor.modf() - self.assertIsInstance(float64_modf[0], ht.DNDarray) - self.assertIsInstance(float64_modf[1], ht.DNDarray) - self.assertEqual(float64_modf[0].dtype, ht.float64) - self.assertEqual(float64_modf[1].dtype, ht.float64) + float64_tensor = ht.array(npArray, dtype=ht.float64) + float64_modf = float64_tensor.modf() + self.assertIsInstance(float64_modf[0], ht.DNDarray) + self.assertIsInstance(float64_modf[1], ht.DNDarray) + self.assertEqual(float64_modf[0].dtype, ht.float64) + self.assertEqual(float64_modf[1].dtype, ht.float64) - self.assert_array_equal(float64_modf[0], comparison[0]) - self.assert_array_equal(float64_modf[1], comparison[1]) + self.assert_array_equal(float64_modf[0], comparison[0]) + self.assert_array_equal(float64_modf[1], comparison[1]) # check exceptions with self.assertRaises(TypeError): @@ -211,8 +222,9 @@ def test_modf(self): ht.modf(object()) with self.assertRaises(TypeError): ht.modf(float32_tensor, 1) - with self.assertRaises(ValueError): - ht.modf(float32_tensor, (float32_tensor, float32_tensor, float64_tensor)) + if not self.is_mps: + with self.assertRaises(ValueError): + ht.modf(float32_tensor, (float32_tensor, float32_tensor, float64_tensor)) with self.assertRaises(TypeError): ht.modf(float32_tensor, (float32_tensor, 2)) @@ -233,23 +245,24 @@ def test_modf(self): self.assert_array_equal(float32_modf_distrbd[1], comparison[1]) # exponential of float64 - npArray = npArray = np.arange(start, end, step, np.float64) - comparison = np.modf(npArray) - - float64_tensor_distrbd = ht.array(npArray, split=0) - float64_modf_distrbd = ( - ht.zeros_like(float64_tensor_distrbd, dtype=float64_tensor_distrbd.dtype), - ht.zeros_like(float64_tensor_distrbd, dtype=float64_tensor_distrbd.dtype), - ) - # float64_modf_distrbd = float64_tensor_distrbd.modf() - float64_tensor_distrbd.modf(out=float64_modf_distrbd) - self.assertIsInstance(float64_modf_distrbd[0], ht.DNDarray) - self.assertIsInstance(float64_modf_distrbd[1], ht.DNDarray) - self.assertEqual(float64_modf_distrbd[0].dtype, ht.float64) - self.assertEqual(float64_modf_distrbd[1].dtype, ht.float64) - - self.assert_array_equal(float64_modf_distrbd[0], comparison[0]) - self.assert_array_equal(float64_modf_distrbd[1], comparison[1]) + if not self.is_mps: + npArray = npArray = np.arange(start, end, step, np.float64) + comparison = np.modf(npArray) + + float64_tensor_distrbd = ht.array(npArray, split=0) + float64_modf_distrbd = ( + ht.zeros_like(float64_tensor_distrbd, dtype=float64_tensor_distrbd.dtype), + ht.zeros_like(float64_tensor_distrbd, dtype=float64_tensor_distrbd.dtype), + ) + # float64_modf_distrbd = float64_tensor_distrbd.modf() + float64_tensor_distrbd.modf(out=float64_modf_distrbd) + self.assertIsInstance(float64_modf_distrbd[0], ht.DNDarray) + self.assertIsInstance(float64_modf_distrbd[1], ht.DNDarray) + self.assertEqual(float64_modf_distrbd[0].dtype, ht.float64) + self.assertEqual(float64_modf_distrbd[1].dtype, ht.float64) + + 
self.assert_array_equal(float64_modf_distrbd[0], comparison[0]) + self.assert_array_equal(float64_modf_distrbd[1], comparison[1]) def test_round(self): size = ht.communication.MPI_WORLD.size @@ -266,13 +279,14 @@ def test_round(self): self.assert_array_equal(float32_round, comparison) # exponential of float64 - comparison = torch.arange(start, end, step, dtype=torch.float64).round() - float64_tensor = ht.array(comparison, dtype=ht.float64) - float64_round = float64_tensor.round() - self.assertIsInstance(float64_round, ht.DNDarray) - self.assertEqual(float64_round.dtype, ht.float64) - self.assertEqual(float64_round.dtype, ht.float64) - self.assert_array_equal(float64_round, comparison) + if not self.is_mps: + comparison = torch.arange(start, end, step, dtype=torch.float64).round() + float64_tensor = ht.array(comparison, dtype=ht.float64) + float64_round = float64_tensor.round() + self.assertIsInstance(float64_round, ht.DNDarray) + self.assertEqual(float64_round.dtype, ht.float64) + self.assertEqual(float64_round.dtype, ht.float64) + self.assert_array_equal(float64_round, comparison) # check exceptions with self.assertRaises(TypeError): @@ -286,24 +300,25 @@ def test_round(self): # with split tensors - # exponential of float32 - comparison = torch.arange(start, end, step, dtype=torch.float32) # .round() - float32_tensor_distrbd = ht.array(comparison, split=0, dtype=ht.double) - comparison = comparison.round() - float32_round_distrbd = float32_tensor_distrbd.round(dtype=ht.float) - self.assertIsInstance(float32_round_distrbd, ht.DNDarray) - self.assertEqual(float32_round_distrbd.dtype, ht.float32) - self.assert_array_equal(float32_round_distrbd, comparison) - - # exponential of float64 - comparison = torch.arange(start, end, step, dtype=torch.float64) # .round() - float64_tensor_distrbd = ht.array(comparison, split=0) - comparison = comparison.round() - float64_round_distrbd = float64_tensor_distrbd.round() - self.assertIsInstance(float64_round_distrbd, ht.DNDarray) - self.assertEqual(float64_round_distrbd.dtype, ht.float64) - self.assertEqual(float64_round_distrbd.dtype, ht.float64) - self.assert_array_equal(float64_round_distrbd, comparison) + if not self.is_mps: + # exponential of float32 + comparison = torch.arange(start, end, step, dtype=torch.float32) # .round() + float32_tensor_distrbd = ht.array(comparison, split=0, dtype=ht.double) + comparison = comparison.round() + float32_round_distrbd = float32_tensor_distrbd.round(dtype=ht.float) + self.assertIsInstance(float32_round_distrbd, ht.DNDarray) + self.assertEqual(float32_round_distrbd.dtype, ht.float32) + self.assert_array_equal(float32_round_distrbd, comparison) + + # exponential of float64 + comparison = torch.arange(start, end, step, dtype=torch.float64) # .round() + float64_tensor_distrbd = ht.array(comparison, split=0) + comparison = comparison.round() + float64_round_distrbd = float64_tensor_distrbd.round() + self.assertIsInstance(float64_round_distrbd, ht.DNDarray) + self.assertEqual(float64_round_distrbd.dtype, ht.float64) + self.assertEqual(float64_round_distrbd.dtype, ht.float64) + self.assert_array_equal(float64_round_distrbd, comparison) def test_sgn(self): # floats @@ -325,7 +340,9 @@ def test_sgn(self): self.assertEqual(signed.dtype, ht.heat_type_of(comparison)) self.assertEqual(signed.shape, a.shape) self.assertEqual(signed.device, a.device) - self.assertTrue(ht.equal(signed, ht.array(comparison, split=0))) + # complex types only supported on MPS starting from MacOS 14.0+ + if not self.is_mps or platform.mac_ver()[0] >= 
"14.0": + self.assertTrue(ht.equal(signed, ht.array(comparison, split=0))) def test_sign(self): # floats 1d @@ -339,50 +356,54 @@ def test_sign(self): self.assertEqual(signed.split, a.split) self.assertTrue(ht.equal(signed, comparison)) - # complex + 2d + split - a = ht.array([[1 - 2j, -0.5 + 1j], [0, 4 + 6j]], split=0) - signed = ht.sign(a) - comparison = ht.array([[1 + 0j, -1 + 0j], [0 + 0j, 1 + 0j]], split=0) - - self.assertEqual(signed.dtype, comparison.dtype) - self.assertEqual(signed.shape, comparison.shape) - self.assertEqual(signed.device, a.device) - self.assertEqual(signed.split, a.split) - self.assertTrue(ht.allclose(signed.real, comparison.real)) - self.assertTrue(ht.allclose(signed.imag, comparison.imag, atol=2e-5)) - - # complex + split + out - a = ht.array([[1 - 2j, -0.5 + 1j], [0, 4 + 6j]], split=1) - b = ht.empty_like(a) - signed = ht.sign(a, b) - comparison = ht.array([[1 + 0j, -1 + 0j], [0 + 0j, 1 + 0j]], split=1) - - self.assertIs(b, signed) - self.assertEqual(signed.dtype, comparison.dtype) - self.assertEqual(signed.shape, comparison.shape) - self.assertEqual(signed.device, a.device) - self.assertEqual(signed.split, a.split) - self.assertTrue(ht.allclose(signed.real, comparison.real)) - self.assertTrue(ht.allclose(signed.imag, comparison.imag, atol=2e-5)) - - # zeros + 3d + complex + split - a = ht.zeros((4, 4, 4), dtype=ht.complex128, split=2) - signed = ht.sign(a) - comparison = ht.zeros((4, 4, 4), dtype=ht.complex128, split=2) - - self.assertEqual(signed.dtype, comparison.dtype) - self.assertEqual(signed.shape, comparison.shape) - self.assertEqual(signed.device, a.device) - self.assertEqual(signed.split, a.split) - self.assertTrue(ht.allclose(signed.real, comparison.real)) - self.assertTrue(ht.allclose(signed.imag, comparison.imag, atol=2e-5)) + # complex on MPS only from MacOS 14.0+ + if not self.is_mps or platform.mac_ver()[0] >= "14.0": + # complex + 2d + split + a = ht.array([[1 - 2j, -0.5 + 1j], [0, 4 + 6j]], split=0) + signed = ht.sign(a) + comparison = ht.array([[1 + 0j, -1 + 0j], [0 + 0j, 1 + 0j]], split=0) + + self.assertEqual(signed.dtype, comparison.dtype) + self.assertEqual(signed.shape, comparison.shape) + self.assertEqual(signed.device, a.device) + self.assertEqual(signed.split, a.split) + self.assertTrue(ht.allclose(signed.real, comparison.real)) + self.assertTrue(ht.allclose(signed.imag, comparison.imag, atol=2e-5)) + + # complex + split + out + a = ht.array([[1 - 2j, -0.5 + 1j], [0, 4 + 6j]], split=1) + b = ht.empty_like(a) + signed = ht.sign(a, b) + comparison = ht.array([[1 + 0j, -1 + 0j], [0 + 0j, 1 + 0j]], split=1) + + self.assertIs(b, signed) + self.assertEqual(signed.dtype, comparison.dtype) + self.assertEqual(signed.shape, comparison.shape) + self.assertEqual(signed.device, a.device) + self.assertEqual(signed.split, a.split) + self.assertTrue(ht.allclose(signed.real, comparison.real)) + self.assertTrue(ht.allclose(signed.imag, comparison.imag, atol=2e-5)) + + # zeros + 3d + complex + split + if not self.is_mps: + # double precision complex not supported on MPS + a = ht.zeros((4, 4, 4), dtype=ht.complex128, split=2) + signed = ht.sign(a) + comparison = ht.zeros((4, 4, 4), dtype=ht.complex128, split=2) + + self.assertEqual(signed.dtype, comparison.dtype) + self.assertEqual(signed.shape, comparison.shape) + self.assertEqual(signed.device, a.device) + self.assertEqual(signed.split, a.split) + self.assertTrue(ht.allclose(signed.real, comparison.real)) + self.assertTrue(ht.allclose(signed.imag, comparison.imag, atol=2e-5)) def test_trunc(self): 
         base_array = np.random.randn(20)
+        if self.is_mps:
+            base_array = base_array.astype(np.float32)
 
-        comparison = torch.tensor(
-            base_array, dtype=torch.float64, device=self.device.torch_device
-        ).trunc()
+        comparison = torch.tensor(base_array, device=self.device.torch_device).trunc()
 
         # trunc of float32
         float32_tensor = ht.array(base_array, dtype=ht.float32)
@@ -392,11 +413,12 @@ def test_trunc(self):
         self.assertTrue((float32_floor.larray == comparison.float()).all())
 
         # trunc of float64
-        float64_tensor = ht.array(base_array, dtype=ht.float64)
-        float64_floor = float64_tensor.trunc()
-        self.assertIsInstance(float64_floor, ht.DNDarray)
-        self.assertEqual(float64_floor.dtype, ht.float64)
-        self.assertTrue((float64_floor.larray == comparison).all())
+        if not self.is_mps:
+            float64_tensor = ht.array(base_array, dtype=ht.float64)
+            float64_floor = float64_tensor.trunc()
+            self.assertIsInstance(float64_floor, ht.DNDarray)
+            self.assertEqual(float64_floor.dtype, ht.float64)
+            self.assertTrue((float64_floor.larray == comparison).all())
 
         # check exceptions
         with self.assertRaises(TypeError):
diff --git a/heat/core/tests/test_sanitation.py b/heat/core/tests/test_sanitation.py
index 2e79a08105..fd08a1401f 100644
--- a/heat/core/tests/test_sanitation.py
+++ b/heat/core/tests/test_sanitation.py
@@ -14,6 +14,17 @@ def test_sanitize_in(self):
         with self.assertRaises(TypeError):
             ht.sanitize_in(np_x)
 
+    def test_sanitize_in_nd_realfloating(self):
+        x = "this is not a DNDarray"
+        with self.assertRaises(TypeError):
+            ht.sanitize_in_nd_realfloating(x, "x", [2])
+        x = ht.zeros(10, 10, 10, dtype=ht.float32, split=0)
+        with self.assertRaises(ValueError):
+            ht.sanitize_in_nd_realfloating(x, "x", [1, 2])
+        x = ht.zeros(10, 10, dtype=ht.int32, split=None)
+        with self.assertRaises(ValueError):
+            ht.sanitize_in_nd_realfloating(x, "x", [1, 2])
+
     def test_sanitize_out(self):
         output_shape = (4, 5, 6)
         output_split = 1
diff --git a/heat/core/tests/test_signal.py b/heat/core/tests/test_signal.py
index 818538cfea..ad3ecea12a 100644
--- a/heat/core/tests/test_signal.py
+++ b/heat/core/tests/test_signal.py
@@ -30,12 +30,12 @@ def test_convolve(self):
         with self.assertRaises(TypeError):
             signal_wrong_type = [0, 1, 2, "tre", 4, "five", 6, "ʻehiku", 8, 9, 10]
-            ht.convolve(signal_wrong_type, kernel_odd, mode="full")
+            ht.convolve(signal_wrong_type, kernel_odd, mode="full", stride=1)
         with self.assertRaises(TypeError):
             filter_wrong_type = [1, 1, "pizza", "pineapple"]
-            ht.convolve(dis_signal, filter_wrong_type, mode="full")
+            ht.convolve(dis_signal, filter_wrong_type, mode="full", stride=1)
         with self.assertRaises(ValueError):
-            ht.convolve(dis_signal, kernel_odd, mode="invalid")
+            ht.convolve(dis_signal, kernel_odd, mode="invalid", stride=1)
         if dis_signal.comm.size > 1:
             with self.assertRaises(ValueError):
                 s = dis_signal.reshape((2, -1)).resplit(axis=1)
@@ -59,17 +59,19 @@ def test_convolve(self):
         modes = ["full", "same", "valid"]
         for i, mode in enumerate(modes):
             # odd kernel size
-            conv = ht.convolve(dis_signal, kernel_odd, mode=mode)
-            gathered = manipulations.resplit(conv, axis=None)
-            self.assertTrue(ht.equal(full_odd[i : len(full_odd) - i], gathered))
+            if not self.is_mps:
+                # torch convolution does not support int on MPS
+                conv = ht.convolve(dis_signal, kernel_odd, mode=mode)
+                gathered = manipulations.resplit(conv, axis=None)
+                self.assertTrue(ht.equal(full_odd[i : len(full_odd) - i], gathered))
 
-            conv = ht.convolve(dis_signal, dis_kernel_odd, mode=mode)
-            gathered = manipulations.resplit(conv, axis=None)
-
self.assertTrue(ht.equal(full_odd[i : len(full_odd) - i], gathered)) + conv = ht.convolve(dis_signal, dis_kernel_odd, mode=mode) + gathered = manipulations.resplit(conv, axis=None) + self.assertTrue(ht.equal(full_odd[i : len(full_odd) - i], gathered)) - conv = ht.convolve(signal, dis_kernel_odd, mode=mode) - gathered = manipulations.resplit(conv, axis=None) - self.assertTrue(ht.equal(full_odd[i : len(full_odd) - i], gathered)) + conv = ht.convolve(signal, dis_kernel_odd, mode=mode).astype(ht.float) + gathered = manipulations.resplit(conv, axis=None) + self.assertTrue(ht.equal(full_odd[i : len(full_odd) - i], gathered)) # different data types conv = ht.convolve(dis_signal.astype(ht.float), kernel_odd) @@ -87,17 +89,36 @@ def test_convolve(self): # even kernel size # skip mode 'same' for even kernels if mode != "same": - conv = ht.convolve(dis_signal, kernel_even, mode=mode) - dis_conv = ht.convolve(dis_signal, dis_kernel_even, mode=mode) - gathered = manipulations.resplit(conv, axis=None) - dis_gathered = manipulations.resplit(dis_conv, axis=None) + # int tests not on MPS + if not self.is_mps: + conv = ht.convolve(dis_signal, kernel_even, mode=mode) + dis_conv = ht.convolve(dis_signal, dis_kernel_even, mode=mode) + gathered = manipulations.resplit(conv, axis=None) + dis_gathered = manipulations.resplit(dis_conv, axis=None) - if mode == "full": - self.assertTrue(ht.equal(full_even, gathered)) - self.assertTrue(ht.equal(full_even, dis_gathered)) + if mode == "full": + self.assertTrue(ht.equal(full_even, gathered)) + self.assertTrue(ht.equal(full_even, dis_gathered)) + else: + self.assertTrue(ht.equal(full_even[3:-3], gathered)) + self.assertTrue(ht.equal(full_even[3:-3], dis_gathered)) else: - self.assertTrue(ht.equal(full_even[3:-3], gathered)) - self.assertTrue(ht.equal(full_even[3:-3], dis_gathered)) + # float tests + conv = ht.convolve(dis_signal.astype(ht.float), kernel_even, mode=mode) + dis_conv = ht.convolve( + dis_signal.astype(ht.float), dis_kernel_even.astype(ht.float), mode=mode + ) + gathered = manipulations.resplit(conv, axis=None) + dis_gathered = manipulations.resplit(dis_conv, axis=None) + + if mode == "full": + self.assertTrue(ht.equal(full_even.astype(ht.float), gathered)) + self.assertTrue(ht.equal(full_even.astype(ht.float), dis_gathered)) + else: + self.assertTrue(ht.equal(full_even[3:-3].astype(ht.float), gathered)) + self.assertTrue( + ht.equal(full_even[3:-3].astype(ht.float), dis_gathered) + ) # distributed large signal and kernel np.random.seed(12) @@ -105,27 +126,44 @@ def test_convolve(self): np_b = np.random.randint(1000, size=1543) np_conv = np.convolve(np_a, np_b, mode=mode) - a = ht.array(np_a, split=0, dtype=ht.int32) - b = ht.array(np_b, split=0, dtype=ht.int32) - conv = ht.convolve(a, b, mode=mode) - self.assert_array_equal(conv, np_conv) + if self.is_mps: + # torch convolution only supports float on MPS + a = ht.array(np_a, split=0, dtype=ht.float32) + b = ht.array(np_b, split=0, dtype=ht.float32) + conv = ht.convolve(a, b, mode=mode) + self.assert_array_equal(conv, np_conv.astype(np.float32)) + else: + a = ht.array(np_a, split=0, dtype=ht.int32) + b = ht.array(np_b, split=0, dtype=ht.int32) + conv = ht.convolve(a, b, mode=mode) + self.assert_array_equal(conv, np_conv) # test edge cases # non-distributed signal, size-1 kernel - signal = ht.arange(0, 16).astype(ht.int) - alt_signal = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15) - kernel = ht.ones(1).astype(ht.int) - conv = ht.convolve(alt_signal, kernel) + if self.is_mps: + # torch convolution 
only supports float on MPS + signal = ht.arange(0, 16, dtype=ht.float32) + alt_signal = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15) + kernel = ht.ones(1, dtype=ht.float32) + conv = ht.convolve(alt_signal, kernel) + else: + signal = ht.arange(0, 16).astype(ht.int) + alt_signal = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15) + kernel = ht.ones(1).astype(ht.int) + conv = ht.convolve(alt_signal, kernel) self.assertTrue(ht.equal(signal, conv)) - conv = ht.convolve(1, 5) - self.assertTrue(ht.equal(ht.array([5]), conv)) + if not self.is_mps: + conv = ht.convolve(1, 5) + self.assertTrue(ht.equal(ht.array([5]), conv)) - # test batched convolutions, distributed along the first axis - signal = ht.random.randn(1000, dtype=ht.float64) - batch_signal = ht.empty((10, 1000), dtype=ht.float64, split=0) + # test batched convolutions + float_dtype = ht.float32 if self.is_mps else ht.float64 + # distributed along the first axis + signal = ht.random.randn(1000, dtype=float_dtype) + batch_signal = ht.empty((10, 1000), dtype=float_dtype, split=0) batch_signal.larray[:] = signal.larray - kernel = ht.random.randn(19, dtype=ht.float64) + kernel = ht.random.randn(19, dtype=float_dtype) batch_convolved = ht.convolve(batch_signal, kernel, mode="same") self.assertTrue(ht.equal(ht.convolve(signal, kernel, mode="same"), batch_convolved[0])) @@ -133,13 +171,13 @@ def test_convolve(self): dis_kernel = ht.array(kernel, split=0) batch_convolved = ht.convolve(batch_signal, dis_kernel) self.assertTrue(ht.equal(ht.convolve(signal, kernel), batch_convolved[0])) - batch_kernel = ht.empty((10, 19), dtype=ht.float64, split=1) + batch_kernel = ht.empty((10, 19), dtype=float_dtype, split=1) batch_kernel.larray[:] = dis_kernel.larray batch_convolved = ht.convolve(batch_signal, batch_kernel, mode="full") self.assertTrue(ht.equal(ht.convolve(signal, kernel, mode="full"), batch_convolved[0])) # n-D batch convolution - batch_signal = ht.empty((4, 3, 3, 1000), dtype=ht.float64, split=1) + batch_signal = ht.empty((4, 3, 3, 1000), dtype=float_dtype, split=1) batch_signal.larray[:, :, :] = signal.larray batch_convolved = ht.convolve(batch_signal, kernel, mode="valid") self.assertTrue( @@ -147,10 +185,260 @@ def test_convolve(self): ) # test batch-convolve exceptions - batch_kernel_wrong_shape = ht.random.randn(3, 19, dtype=ht.float64) + batch_kernel_wrong_shape = ht.random.randn(3, 19, dtype=float_dtype) with self.assertRaises(ValueError): ht.convolve(batch_signal, batch_kernel_wrong_shape) if kernel.comm.size > 1: batch_signal_wrong_split = batch_signal.resplit(-1) with self.assertRaises(ValueError): ht.convolve(batch_signal_wrong_split, kernel) + + def test_only_balanced_kernel(self): + signal = ht.arange(0, 16, split=0).astype(ht.float32) + dis_kernel = ht.array([1, 1, 1], split=0).astype(ht.float32) + + if self.comm.size > 1: + target_map = dis_kernel.lshape_map + target_map[0] = 3 + target_map[1:] = 0 + dis_kernel.redistribute_(dis_kernel.lshape_map, target_map) + with self.assertRaises(ValueError): + ht.convolve(signal, dis_kernel) + + def test_convolve_stride_errors(self): + dis_signal = ht.arange(0, 16, split=0).astype(ht.int) + kernel_odd = ht.ones(3).astype(ht.int) + kernel_even = [1, 1, 1, 1] + + # stride not positive integer + with self.assertRaises(ValueError): + ht.convolve(dis_signal, kernel_even, mode="full", stride=0) + + # stride > 1 for mode 'same' + with self.assertRaises(ValueError): + ht.convolve(dis_signal, kernel_odd, mode="same", stride=2) + + def test_convolve_stride_batch_convolutions(self): + 
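+        # A strided convolution should match slicing the unstrided result, i.e.
+        # ht.convolve(s, k, mode=m, stride=n) equals ht.convolve(s, k, mode=m)[::n],
+        # so every batched case below is checked against its 1-D counterpart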
+        float_dtype = ht.float32 if self.is_mps else ht.float64
+        signal = ht.random.randn(1000, dtype=float_dtype)
+        kernel = ht.random.randn(19, dtype=float_dtype)
+
+        # distributed input along the first axis
+        stride = 123
+        batch_signal = ht.empty((10, 1000), dtype=float_dtype, split=0)
+        batch_signal.larray[:] = signal.larray
+
+        batch_convolved = ht.convolve(batch_signal, kernel, mode="valid", stride=stride)
+        self.assertTrue(
+            ht.equal(ht.convolve(signal, kernel, mode="valid", stride=stride), batch_convolved[0])
+        )
+
+        # distributed kernel
+        stride = 142
+        dis_kernel = ht.array(kernel, split=0)
+
+        batch_convolved = ht.convolve(batch_signal, dis_kernel, stride=stride)
+        self.assertTrue(ht.equal(ht.convolve(signal, kernel, stride=stride), batch_convolved[0]))
+
+        # batch kernel
+        stride = 41
+        batch_kernel = ht.empty((10, 19), dtype=float_dtype, split=1)
+        batch_kernel.larray[:] = dis_kernel.larray
+
+        batch_convolved = ht.convolve(batch_signal, batch_kernel, mode="full", stride=stride)
+        self.assertTrue(
+            ht.equal(ht.convolve(signal, kernel, mode="full", stride=stride), batch_convolved[0])
+        )
+
+        # n-D batch convolution
+        stride = 55
+        batch_signal = ht.empty((4, 3, 3, 1000), dtype=float_dtype, split=1)
+        batch_signal.larray[:, :, :] = signal.larray
+
+        batch_convolved = ht.convolve(batch_signal, kernel, mode="valid", stride=stride)
+        self.assertTrue(
+            ht.equal(
+                ht.convolve(signal, kernel, mode="valid", stride=stride),
+                batch_convolved[1, 2, 0],
+            )
+        )
+
+    def assert_convolution_stride(self, signal, kernel, mode, stride, solution):
+        conv = ht.convolve(signal, kernel, mode=mode, stride=stride)
+        gathered = manipulations.resplit(conv, axis=None)
+        self.assertTrue(ht.equal(solution, gathered))
+
+    def test_convolve_stride_kernel_odd_mode_full(self):
+
+        ht_dtype = ht.int
+
+        mode = "full"
+        stride = 2
+        solution = ht.array([0, 3, 9, 15, 21, 27, 33, 39, 29]).astype(ht_dtype)
+
+        dis_signal = ht.arange(0, 16, split=0).astype(ht_dtype)
+        signal = ht.arange(0, 16).astype(ht_dtype)
+        kernel = ht.ones(3).astype(ht_dtype)
+        dis_kernel = ht.ones(3, split=0).astype(ht_dtype)
+
+        # avoid kernel larger than signal chunk
+        if self.comm.size <= 3:
+
+            if not self.is_mps:
+                # torch convolution does not support int on MPS
+                self.assert_convolution_stride(dis_signal, kernel, mode, stride, solution)
+                self.assert_convolution_stride(signal, dis_kernel, mode, stride, solution)
+                self.assert_convolution_stride(dis_signal, dis_kernel, mode, stride, solution)
+
+            # different data types of input and kernel
+            self.assert_convolution_stride(
+                dis_signal.astype(ht.float), kernel, mode, stride, solution
+            )
+            self.assert_convolution_stride(
+                signal.astype(ht.float), dis_kernel, mode, stride, solution
+            )
+            self.assert_convolution_stride(
+                dis_signal.astype(ht.float), dis_kernel, mode, stride, solution
+            )
+
+    def test_convolve_stride_kernel_odd_mode_valid(self):
+
+        ht_dtype = ht.int
+
+        mode = "valid"
+        stride = 2
+        solution = ht.array([3, 9, 15, 21, 27, 33, 39]).astype(ht_dtype)
+
+        dis_signal = ht.arange(0, 16, split=0).astype(ht_dtype)
+        signal = ht.arange(0, 16).astype(ht_dtype)
+        kernel = ht.ones(3).astype(ht_dtype)
+        dis_kernel = ht.ones(3, split=0).astype(ht_dtype)
+
+        # avoid kernel larger than signal chunk
+        if self.comm.size <= 3:
+
+            if not self.is_mps:
+                # torch convolution does not support int on MPS
+                self.assert_convolution_stride(dis_signal, kernel, mode, stride, solution)
+                self.assert_convolution_stride(signal, dis_kernel, mode, stride, solution)
+                self.assert_convolution_stride(dis_signal,
dis_kernel, mode, stride, solution) + + # different data types of input and kernel + self.assert_convolution_stride( + dis_signal.astype(ht.float), kernel, mode, stride, solution + ) + self.assert_convolution_stride( + signal.astype(ht.float), dis_kernel, mode, stride, solution + ) + self.assert_convolution_stride( + dis_signal.astype(ht.float), dis_kernel, mode, stride, solution + ) + + def test_convolve_stride_kernel_even_mode_full(self): + + ht_dtype = ht.int + + mode = "full" + stride = 2 + solution = ht.array([0, 3, 10, 18, 26, 34, 42, 50, 42, 15]).astype(ht_dtype) + + dis_signal = ht.arange(0, 16, split=0).astype(ht_dtype) + signal = ht.arange(0, 16).astype(ht_dtype) + kernel = [1, 1, 1, 1] + dis_kernel = ht.ones(4, split=0).astype(ht_dtype) + + # avoid kernel larger than signal chunk + if self.comm.size <= 3: + + if not self.is_mps: + # torch convolution does not support int on MPS + self.assert_convolution_stride(dis_signal, kernel, mode, stride, solution) + self.assert_convolution_stride(signal, dis_kernel, mode, stride, solution) + self.assert_convolution_stride(dis_signal, dis_kernel, mode, stride, solution) + + # different data types of input and kernel + self.assert_convolution_stride( + dis_signal.astype(ht.float), kernel, mode, stride, solution + ) + self.assert_convolution_stride( + signal.astype(ht.float), dis_kernel, mode, stride, solution + ) + self.assert_convolution_stride( + dis_signal.astype(ht.float), dis_kernel, mode, stride, solution + ) + + def test_convolve_stride_kernel_even_mode_valid(self): + + ht_dtype = ht.int + + mode = "valid" + stride = 2 + solution = ht.array([6, 14, 22, 30, 38, 46, 54]).astype(ht_dtype) + + dis_signal = ht.arange(0, 16, split=0).astype(ht_dtype) + signal = ht.arange(0, 16).astype(ht_dtype) + kernel = [1, 1, 1, 1] + dis_kernel = ht.ones(4, split=0).astype(ht_dtype) + + # avoid kernel larger than signal chunk + if self.comm.size <= 3: + + if not self.is_mps: + # torch convolution does not support int on MPS + self.assert_convolution_stride(dis_signal, kernel, mode, stride, solution) + self.assert_convolution_stride(signal, dis_kernel, mode, stride, solution) + self.assert_convolution_stride(dis_signal, dis_kernel, mode, stride, solution) + + # different data types of input and kernel + self.assert_convolution_stride( + dis_signal.astype(ht.float), kernel, mode, stride, solution + ) + self.assert_convolution_stride( + signal.astype(ht.float), dis_kernel, mode, stride, solution + ) + self.assert_convolution_stride( + dis_signal.astype(ht.float), dis_kernel, mode, stride, solution + ) + + def test_convolution_stride_large_signal_and_kernel_modes(self): + if self.comm.size <= 3: + # prep + np.random.seed(12) + np_a = np.random.randint(1000, size=4418) + np_b = np.random.randint(1000, size=154) + # torch convolution does not support int on MPS + ht_dtype = ht.float32 if self.is_mps else ht.int32 + np_type = np.float32 if self.is_mps else np.int32 + stride = np.random.randint(1, high=len(np_a), size=1)[0] + + for mode in ["full", "valid"]: + # solution + np_conv = np.convolve(np_a, np_b, mode=mode) + solution = np_conv[::stride].astype(np_type) + + # test + a = ht.array(np_a, split=0, dtype=ht_dtype) + b = ht.array(np_b, split=None, dtype=ht_dtype) + conv = ht.convolve(a, b, mode=mode, stride=stride) + self.assert_array_equal(conv, solution) + + b = ht.array(np_b, split=0, dtype=ht_dtype) + conv = ht.convolve(a, b, mode=mode, stride=stride) + self.assert_array_equal(conv, solution) + + def test_convolution_stride_kernel_size_1(self): + + # 
prep + ht_dtype = ht.float32 if self.is_mps else ht.int32 + + # non-distributed signal + signal = ht.arange(0, 16, dtype=ht_dtype) + alt_signal = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15) + kernel = ht.ones(1, dtype=ht_dtype) + conv = ht.convolve(alt_signal, kernel, stride=2) + self.assertTrue(ht.equal(signal[0::2], conv)) + + if not self.is_mps: + for s in [2, 3, 4]: + conv = ht.convolve(1, 5, stride=s) + self.assertTrue(ht.equal(ht.array([5]), conv)) diff --git a/heat/core/tests/test_statistics.py b/heat/core/tests/test_statistics.py index a6024d6b54..64579e7c73 100644 --- a/heat/core/tests/test_statistics.py +++ b/heat/core/tests/test_statistics.py @@ -60,7 +60,7 @@ def test_argmax(self): data = ht.tril(ht.ones((size, size), split=0), k=-1) result = ht.argmax(data, axis=0) - expected = torch.tensor(np.argmax(data.numpy(), axis=0)) + expected = torch.tensor(np.argmax(data.numpy(), axis=0), device=result.larray.device) self.assertIsInstance(result, ht.DNDarray) self.assertEqual(result.dtype, ht.int64) self.assertEqual(result.larray.dtype, torch.int64) @@ -77,7 +77,7 @@ def test_argmax(self): output = ht.empty((size,), dtype=ht.int64) result = ht.argmax(data, axis=0, out=output) - expected = torch.tensor(np.argmax(data.numpy(), axis=0)) + expected = torch.tensor(np.argmax(data.numpy(), axis=0), device=result.larray.device) self.assertIsInstance(result, ht.DNDarray) self.assertEqual(output.dtype, ht.int64) self.assertEqual(output.larray.dtype, torch.int64) @@ -151,7 +151,7 @@ def test_argmin(self): data = ht.triu(ht.ones((size, size), split=0), k=1) result = ht.argmin(data, axis=0) - expected = torch.tensor(np.argmin(data.numpy(), axis=0)) + expected = torch.tensor(np.argmin(data.numpy(), axis=0), device=result.larray.device) self.assertIsInstance(result, ht.DNDarray) self.assertEqual(result.dtype, ht.int64) self.assertEqual(result.larray.dtype, torch.int64) @@ -168,7 +168,7 @@ def test_argmin(self): output = ht.empty((size,), dtype=ht.int64) result = ht.argmin(data, axis=0, out=output) - expected = torch.tensor(np.argmin(data.numpy(), axis=0)) + expected = torch.tensor(np.argmin(data.numpy(), axis=0), device=result.larray.device) self.assertIsInstance(result, ht.DNDarray) self.assertEqual(output.dtype, ht.int64) self.assertEqual(output.larray.dtype, torch.int64) @@ -228,21 +228,25 @@ def test_average(self): self.assertEqual(avg_horizontal.larray.dtype, torch.float32) self.assertTrue((avg_horizontal.numpy() == np.average(comparison, axis=1)).all()) + if self.is_mps: + dtype = torch.float32 + else: + dtype = torch.float64 # check weighted average over all float elements of split 3d tensor, across split axis random_volume = ht.array( - torch.randn((3, 3, 3), dtype=torch.float64, device=self.device.torch_device), is_split=1 + torch.randn((3, 3, 3), dtype=dtype, device=self.device.torch_device), is_split=1 ) size = random_volume.comm.size random_weights = ht.array( - torch.randn((3 * size,), dtype=torch.float64, device=self.device.torch_device), split=0 + torch.randn((3 * size,), dtype=dtype, device=self.device.torch_device), split=0 ) avg_volume = ht.average(random_volume, weights=random_weights, axis=1) np_avg_volume = np.average(random_volume.numpy(), weights=random_weights.numpy(), axis=1) self.assertIsInstance(avg_volume, ht.DNDarray) self.assertEqual(avg_volume.shape, (3, 3)) self.assertEqual(avg_volume.lshape, (3, 3)) - self.assertEqual(avg_volume.dtype, ht.float64) - self.assertEqual(avg_volume.larray.dtype, torch.float64) + self.assertEqual(avg_volume.dtype, 
ht.types.canonical_heat_type(dtype)) + self.assertEqual(avg_volume.larray.dtype, dtype) self.assertEqual(avg_volume.split, None) self.assertAlmostEqual(avg_volume.numpy().all(), np_avg_volume.all()) avg_volume_with_cumwgt = ht.average( @@ -256,15 +260,15 @@ def test_average(self): # check weighted average over all float elements of split 3d tensor (3d weights) random_weights_3d = ht.array( - torch.randn((3, 3, 3), dtype=torch.float64, device=self.device.torch_device), is_split=1 + torch.randn((3, 3, 3), dtype=dtype, device=self.device.torch_device), is_split=1 ) avg_volume = ht.average(random_volume, weights=random_weights_3d, axis=1) np_avg_volume = np.average(random_volume.numpy(), weights=random_weights.numpy(), axis=1) self.assertIsInstance(avg_volume, ht.DNDarray) self.assertEqual(avg_volume.shape, (3, 3)) self.assertEqual(avg_volume.lshape, (3, 3)) - self.assertEqual(avg_volume.dtype, ht.float64) - self.assertEqual(avg_volume.larray.dtype, torch.float64) + self.assertEqual(avg_volume.dtype, ht.types.canonical_heat_type(dtype)) + self.assertEqual(avg_volume.larray.dtype, dtype) self.assertEqual(avg_volume.split, None) self.assertAlmostEqual(avg_volume.numpy().all(), np_avg_volume.all()) avg_volume_with_cumwgt = ht.average( @@ -344,8 +348,13 @@ def test_bincount(self): w = ht.arange(5) res = ht.bincount(a, weights=w) self.assertEqual(res.size, 5) - self.assertEqual(res.dtype, ht.float64) - self.assertTrue(ht.equal(res, ht.arange(5, dtype=ht.float64))) + if self.is_mps: + # torch.bincount on MPS returns int32 here + self.assertEqual(res.dtype, ht.int32) + self.assertTrue(ht.equal(res, ht.arange(5))) + else: + self.assertEqual(res.dtype, ht.float64) + self.assertTrue(ht.equal(res, ht.arange(5, dtype=ht.float64))) res = ht.bincount(a, minlength=8) self.assertEqual(res.size, 8) @@ -356,8 +365,13 @@ def test_bincount(self): w = ht.arange(4, split=0) res = ht.bincount(a, weights=w) self.assertEqual(res.size, 4) - self.assertEqual(res.dtype, ht.float64) - self.assertTrue(ht.equal(res, ht.arange(4, dtype=ht.float64))) + if self.is_mps: + # torch.bincount on MPS returns int32 here + self.assertEqual(res.dtype, ht.int32) + self.assertTrue(ht.equal(res, ht.arange(4))) + else: + self.assertEqual(res.dtype, ht.float64) + self.assertTrue(ht.equal(res, ht.arange(4, dtype=ht.float64))) with self.assertRaises(ValueError): ht.bincount(ht.array([0, 1, 2, 3], split=0), weights=ht.array([1, 2, 3, 4])) @@ -390,66 +404,73 @@ def test_bucketize(self): ht.bucketize(a, ht.array([0.0, 0.5, 1.0], split=0)) def test_cov(self): - x = ht.array([[0, 2], [1, 1], [2, 0]], dtype=ht.float, split=1).T + if self.is_mps: + dtype = ht.float32 + np_dtype = np.float32 + else: + dtype = ht.float64 + np_dtype = np.float64 + + x = ht.array([[0, 2], [1, 1], [2, 0]], dtype=dtype, split=1).T if x.comm.size < 3: cov = ht.cov(x) actual = ht.array([[1, -1], [-1, 1]], split=0) self.assertTrue(ht.equal(cov, actual)) data = np.loadtxt("heat/datasets/iris.csv", delimiter=";") - np_cov = np.cov(data[:, 0], data[:, 1:3], rowvar=False) + np_cov = np.cov(data[:, 0], data[:, 1:3], rowvar=False).astype(np_dtype) # split = None tests htdata = ht.load("heat/datasets/iris.csv", sep=";", split=None) ht_cov = ht.cov(htdata[:, 0], htdata[:, 1:3], rowvar=False) - comp = ht.array(np_cov, dtype=ht.float) + comp = ht.array(np_cov, dtype=dtype) self.assertTrue(ht.allclose(comp - ht_cov, 0, atol=1e-4)) - np_cov = np.cov(data, rowvar=False) + np_cov = np.cov(data, rowvar=False).astype(np_dtype) ht_cov = ht.cov(htdata, rowvar=False) - 
self.assertTrue(ht.allclose(ht.array(np_cov, dtype=ht.float) - ht_cov, 0, atol=1e-4)) + self.assertTrue(ht.allclose(ht.array(np_cov, dtype=dtype) - ht_cov, 0, atol=1e-4)) - np_cov = np.cov(data, rowvar=False, ddof=1) + np_cov = np.cov(data, rowvar=False, ddof=1).astype(np_dtype) ht_cov = ht.cov(htdata, rowvar=False, ddof=1) - self.assertTrue(ht.allclose(ht.array(np_cov, dtype=ht.float) - ht_cov, 0, atol=1e-4)) + self.assertTrue(ht.allclose(ht.array(np_cov, dtype=dtype) - ht_cov, 0, atol=1e-4)) - np_cov = np.cov(data, rowvar=False, bias=True) + np_cov = np.cov(data, rowvar=False, bias=True).astype(np_dtype) ht_cov = ht.cov(htdata, rowvar=False, bias=True) - self.assertTrue(ht.allclose(ht.array(np_cov, dtype=ht.float) - ht_cov, 0, atol=1e-4)) + self.assertTrue(ht.allclose(ht.array(np_cov, dtype=dtype) - ht_cov, 0, atol=1e-4)) # split = 0 tests data = np.loadtxt("heat/datasets/iris.csv", delimiter=";") - np_cov = np.cov(data[:, 0], data[:, 1:3], rowvar=False) + np_cov = np.cov(data[:, 0], data[:, 1:3], rowvar=False).astype(np_dtype) htdata = ht.load("heat/datasets/iris.csv", sep=";", split=0) ht_cov = ht.cov(htdata[:, 0], htdata[:, 1:3], rowvar=False) comp = ht.array(np_cov, dtype=ht.float) self.assertTrue(ht.allclose(comp - ht_cov, 0, atol=1e-4)) - np_cov = np.cov(data, rowvar=False) + np_cov = np.cov(data, rowvar=False).astype(np_dtype) ht_cov = ht.cov(htdata, rowvar=False) - self.assertTrue(ht.allclose(ht.array(np_cov, dtype=ht.float) - ht_cov, 0, atol=1e-4)) + self.assertTrue(ht.allclose(ht.array(np_cov, dtype=dtype) - ht_cov, 0, atol=1e-4)) - np_cov = np.cov(data, rowvar=False, ddof=1) + np_cov = np.cov(data, rowvar=False, ddof=1).astype(np_dtype) ht_cov = ht.cov(htdata, rowvar=False, ddof=1) - self.assertTrue(ht.allclose(ht.array(np_cov, dtype=ht.float) - ht_cov, 0, atol=1e-4)) + self.assertTrue(ht.allclose(ht.array(np_cov, dtype=dtype) - ht_cov, 0, atol=1e-4)) - np_cov = np.cov(data, rowvar=False, bias=True) + np_cov = np.cov(data, rowvar=False, bias=True).astype(np_dtype) ht_cov = ht.cov(htdata, rowvar=False, bias=True) - self.assertTrue(ht.allclose(ht.array(np_cov, dtype=ht.float) - ht_cov, 0, atol=1e-4)) + self.assertTrue(ht.allclose(ht.array(np_cov, dtype=dtype) - ht_cov, 0, atol=1e-4)) if 1 < x.comm.size < 5: # split 1 tests htdata = ht.load("heat/datasets/iris.csv", sep=";", split=1) - np_cov = np.cov(data, rowvar=False) + np_cov = np.cov(data, rowvar=False).astype(np_dtype) ht_cov = ht.cov(htdata, rowvar=False) - self.assertTrue(ht.allclose(ht.array(np_cov, dtype=ht.float), ht_cov, atol=1e-4)) + self.assertTrue(ht.allclose(ht.array(np_cov, dtype=dtype), ht_cov, atol=1e-4)) - np_cov = np.cov(data, data, rowvar=True) + np_cov = np.cov(data, data, rowvar=True).astype(np_dtype) htdata = ht.load("heat/datasets/iris.csv", sep=";", split=0) ht_cov = ht.cov(htdata, htdata, rowvar=True) - self.assertTrue(ht.allclose(ht.array(np_cov, dtype=ht.float), ht_cov, atol=1e-4)) + self.assertTrue(ht.allclose(ht.array(np_cov, dtype=dtype), ht_cov, atol=1e-4)) htdata = ht.load("heat/datasets/iris.csv", sep=";", split=0) with self.assertRaises(RuntimeError): @@ -516,14 +537,16 @@ def test_digitize(self): ht.digitize(a, ht.array([0.0, 0.5, 1.0], split=0)) def test_histc(self): - # few entries and float64 - c = torch.arange(4, dtype=torch.float64, device=self.device.torch_device) + dtype = torch.float32 if self.is_mps else torch.float64 + + # few entries and (if not MPS) float64 + c = torch.arange(4, dtype=dtype, device=self.device.torch_device) comp = torch.histc(c, 7) a = ht.array(c) res = 
ht.histc(a, 7) self.assertEqual(res.shape, (7,)) - self.assertEqual(res.dtype, ht.float64) + self.assertEqual(res.dtype, ht.types.canonical_heat_type(dtype)) self.assertEqual(res.device, self.device) self.assertEqual(res.split, None) self.assertTrue(torch.equal(res.larray, comp)) @@ -586,7 +609,7 @@ def test_histc(self): self.assertTrue(torch.equal(out.larray, comp)) # Alias - a = ht.arange(10, dtype=ht.float) + a = ht.arange(10, dtype=dtype) hist = ht.histc(a, 10) alias = ht.histogram(a) @@ -622,7 +645,9 @@ def __split_calc(ht_split, axis): # 1 dim ht_data = ht.random.rand(50) np_data = ht_data.copy().numpy() - np_kurtosis32 = ht.array((ss.kurtosis(np_data, bias=False)), dtype=ht_data.dtype) + np_kurtosis32 = ht.array( + (ss.kurtosis(np_data, bias=False)).astype(np_data.dtype), dtype=ht_data.dtype + ) self.assertAlmostEqual(ht.kurtosis(ht_data), np_kurtosis32.item(), places=5) ht_data = ht.resplit(ht_data, 0) self.assertAlmostEqual(ht.kurtosis(ht_data), np_kurtosis32.item(), places=5) @@ -651,21 +676,23 @@ def __split_calc(ht_split, axis): sp = __split_calc(ht_data.split, ax) self.assertEqual(ht_kurtosis.split, sp) - # 2 dim float64 - ht_data = ht.random.rand(50, 30, dtype=ht.float64) + # 2 dim float64 (if not MPS) + dtype = ht.float64 if not self.is_mps else ht.float32 + ht_data = ht.random.rand(50, 30, dtype=dtype) np_data = ht_data.copy().numpy() - np_kurtosis32 = ss.kurtosis(np_data, axis=None, bias=False) + np_kurtosis32 = ss.kurtosis(np_data, axis=None, bias=False).astype(np_data.dtype) self.assertAlmostEqual(ht.kurtosis(ht_data) - np_kurtosis32, 0, places=5) ht_data = ht.resplit(ht_data, 0) for ax in range(2): np_kurtosis32 = ht.array( - (ss.kurtosis(np_data, axis=ax, bias=False)), dtype=ht_data.dtype + (ss.kurtosis(np_data, axis=ax, bias=False)).astype(np_data.dtype), + dtype=ht_data.dtype, ) ht_kurtosis = ht.kurtosis(ht_data, axis=ax) self.assertTrue(ht.allclose(ht_kurtosis, np_kurtosis32, atol=1e-5)) sp = __split_calc(ht_data.split, ax) self.assertEqual(ht_kurtosis.split, sp) - self.assertEqual(ht_kurtosis.dtype, ht.float64) + self.assertEqual(ht_kurtosis.dtype, dtype) ht_data = ht.resplit(ht_data, 1) for ax in range(2): np_kurtosis32 = ht.array( @@ -675,7 +702,7 @@ def __split_calc(ht_split, axis): self.assertTrue(ht.allclose(ht_kurtosis, np_kurtosis32, atol=1e-5)) sp = __split_calc(ht_data.split, ax) self.assertEqual(ht_kurtosis.split, sp) - self.assertEqual(ht_kurtosis.dtype, ht.float64) + self.assertEqual(ht_kurtosis.dtype, dtype) # 3 dim ht_data = ht.random.rand(50, 30, 16) @@ -819,11 +846,12 @@ def test_maximum(self): self.assertTrue((maximum_volume.numpy() == np_maximum).all()) # check maximum against size-1 array - random_volume_1_split_none = ht.random.randn(1, split=None, dtype=ht.float64) + dtype = ht.float32 if self.is_mps else ht.float64 + random_volume_1_split_none = ht.random.randn(1, split=None, dtype=dtype) random_volume_2_splitdiff = ht.random.randn(3, 3, 4, split=1) maximum_volume_splitdiff = ht.maximum(random_volume_1_split_none, random_volume_2_splitdiff) self.assertEqual(maximum_volume_splitdiff.split, 1) - self.assertEqual(maximum_volume_splitdiff.dtype, ht.float64) + self.assertEqual(maximum_volume_splitdiff.dtype, dtype) random_volume_1_split_none = ht.random.randn(3, 3, 4, split=0) random_volume_2_splitdiff = ht.random.randn(1, split=None) @@ -1082,11 +1110,12 @@ def test_minimum(self): self.assertTrue((minimum_volume.numpy() == np_minimum).all()) # check minimum against size-1 array - random_volume_1_split_none = ht.random.randn(1, split=None, 
dtype=ht.float64) + dtype = ht.float32 if self.is_mps else ht.float64 + random_volume_1_split_none = ht.random.randn(1, split=None, dtype=dtype) random_volume_2_splitdiff = ht.random.randn(3, 3, 4, split=1) minimum_volume_splitdiff = ht.minimum(random_volume_1_split_none, random_volume_2_splitdiff) self.assertEqual(minimum_volume_splitdiff.split, 1) - self.assertEqual(minimum_volume_splitdiff.dtype, ht.float64) + self.assertEqual(minimum_volume_splitdiff.dtype, dtype) random_volume_1_split_none = ht.random.randn(3, 3, 4, split=0) random_volume_2_splitdiff = ht.random.randn(1, split=None) @@ -1182,12 +1211,13 @@ def test_percentile(self): # test list q and writing to output buffer q = [0.1, 2.3, 15.9, 50.0, 84.1, 97.7, 99.9] axis = 2 + out_dtype = ht.float32 if self.is_mps else ht.float64 try: p_np = np.percentile(x_np, q, axis=axis, method="lower", keepdims=True) except TypeError: p_np = np.percentile(x_np, q, axis=axis, interpolation="lower", keepdims=True) p_ht = ht.percentile(x_ht, q, axis=axis, interpolation="lower", keepdims=True) - out = ht.empty(p_np.shape, dtype=ht.float64, split=None, device=x_ht.device) + out = ht.empty(p_np.shape, dtype=out_dtype, split=None, device=x_ht.device) ht.percentile(x_ht, q, axis=axis, out=out, interpolation="lower", keepdims=True) self.assertEqual(p_ht.numpy()[5].all(), p_np[5].all()) self.assertEqual(out.numpy()[2].all(), p_np[2].all()) @@ -1225,8 +1255,9 @@ def test_percentile(self): # test tuple axis and out buffer q = (20, 50, 80) + dtype = ht.float32 if self.is_mps else ht.float64 for split in [None, 2, 1, 0]: - x_ht = ht.random.randn(3, 10, 10, dtype=ht.float64, split=split) + x_ht = ht.random.randn(3, 10, 10, dtype=dtype, split=split) x_np = x_ht.numpy() p_np = np.percentile(x_np, q, axis=(0, 1)) if isinstance(split, int) and split == 2: @@ -1250,13 +1281,14 @@ def test_percentile(self): t_out = torch.empty((len(q),), dtype=torch.float64) with self.assertRaises(TypeError): ht.percentile(x_ht, q, out=t_out) - out_wrong_dtype = ht.empty((len(q),), dtype=ht.float32) - with self.assertRaises(TypeError): - ht.percentile(x_ht, q, out=out_wrong_dtype) - out_wrong_shape = ht.empty((len(q) + 1,), dtype=ht.float64) + if not self.is_mps: + out_wrong_dtype = ht.empty((len(q),), dtype=ht.float32) + with self.assertRaises(TypeError): + ht.percentile(x_ht, q, out=out_wrong_dtype) + out_wrong_shape = ht.empty((len(q) + 1,), dtype=dtype) with self.assertRaises(ValueError): ht.percentile(x_ht, q, out=out_wrong_shape) - out_wrong_split = ht.empty((len(q),), dtype=ht.float32, split=0) + out_wrong_split = ht.empty((len(q),), dtype=dtype, split=0) with self.assertRaises(ValueError): ht.percentile(x_ht, q, out=out_wrong_split) @@ -1314,7 +1346,9 @@ def __split_calc(ht_split, axis): # 1 dim ht_data = ht.random.rand(50) np_data = ht_data.copy().numpy() - np_skew32 = ht.array(ss.skew(np_data, bias=False)).astype(ht_data.dtype) + np_skew32 = ht.array(ss.skew(np_data, bias=False).astype(np_data.dtype)).astype( + ht_data.dtype + ) self.assertAlmostEqual(ht.skew(ht_data), np_skew32.item(), places=5) ht_data = ht.resplit(ht_data, 0) self.assertAlmostEqual(ht.skew(ht_data), np_skew32.item(), places=5) @@ -1340,9 +1374,10 @@ def __split_calc(ht_split, axis): self.assertEqual(ht_skew.split, sp) # 2 dim float64 - ht_data = ht.random.rand(50, 30, dtype=ht.float64) + dtype = ht.float32 if self.is_mps else ht.float64 + ht_data = ht.random.rand(50, 30, dtype=dtype) np_data = ht_data.copy().numpy() - np_skew32 = ss.skew(np_data, axis=None, bias=False) + np_skew32 = ss.skew(np_data, 
axis=None, bias=False).astype(np_data.dtype) self.assertAlmostEqual(ht.skew(ht_data) - np_skew32, 0, places=5) ht_data = ht.resplit(ht_data, 0) for ax in range(2): @@ -1351,7 +1386,7 @@ def __split_calc(ht_split, axis): self.assertTrue(ht.allclose(ht_skew, np_skew32, atol=1e-5)) sp = __split_calc(ht_data.split, ax) self.assertEqual(ht_skew.split, sp) - self.assertEqual(ht_skew.dtype, ht.float64) + self.assertEqual(ht_skew.dtype, dtype) ht_data = ht.resplit(ht_data, 1) for ax in range(2): np_skew32 = ht.array((ss.skew(np_data, axis=ax, bias=False)), dtype=ht_data.dtype) @@ -1359,12 +1394,12 @@ def __split_calc(ht_split, axis): self.assertTrue(ht.allclose(ht_skew, np_skew32, atol=1e-5)) sp = __split_calc(ht_data.split, ax) self.assertEqual(ht_skew.split, sp) - self.assertEqual(ht_skew.dtype, ht.float64) + self.assertEqual(ht_skew.dtype, dtype) # 3 dim ht_data = ht.random.rand(50, 30, 16) np_data = ht_data.copy().numpy() - np_skew32 = ss.skew(np_data, axis=None, bias=False) + np_skew32 = ss.skew(np_data, axis=None, bias=False).astype(np_data.dtype) self.assertAlmostEqual(ht.skew(ht_data) - np_skew32, 0, places=5) for split in range(3): ht_data = ht.resplit(ht_data, split) @@ -1379,7 +1414,10 @@ def test_std(self): # test basics a = ht.arange(1, 5) self.assertAlmostEqual(a.std(), 1.118034) - self.assertAlmostEqual(a.std(bessel=True), 1.2909944) + if self.is_mps: + self.assertAlmostEqual(a.std(bessel=True).item(), 1.2909944, places=5) + else: + self.assertAlmostEqual(a.std(bessel=True), 1.2909944) # test raises x = ht.zeros((2, 3, 4)) @@ -1423,7 +1461,10 @@ def test_var(self): ht.var(x, axis=torch.Tensor([0, 0])) a = ht.arange(1, 5) - self.assertEqual(a.var(ddof=1), 1.666666666666666) + if self.is_mps: + self.assertAlmostEqual(a.var(ddof=1).item(), 1.666666666666666, places=5) + else: + self.assertEqual(a.var(ddof=1), 1.666666666666666) # ones dimensions = [] diff --git a/heat/core/tests/test_suites/basic_test.py b/heat/core/tests/test_suites/basic_test.py index a222203d91..79242e297f 100644 --- a/heat/core/tests/test_suites/basic_test.py +++ b/heat/core/tests/test_suites/basic_test.py @@ -1,30 +1,26 @@ -import unittest -import platform import os - -from heat.core import dndarray, MPICommunication, MPI, types, factories -import heat as ht +import platform +import unittest import numpy as np import torch +from typing import Optional, Callable, Any, Union + +import heat as ht +from heat.core import MPI, MPICommunication, dndarray, factories, types, Device # TODO adapt for GPU once this is working properly class TestCase(unittest.TestCase): __comm = MPICommunication() - __device = None - _hostnames: list[str] = None - - @property - def comm(self): - return TestCase.__comm + device: Device = ht.cpu + _hostnames: Optional[list[str]] = None + other_device: Optional[Device] = None + envar: Optional[str] = None - @property - def device(self): - return TestCase.__device @classmethod - def setUpClass(cls): + def setUpClass(cls) -> None: """ Read the environment variable 'HEAT_TEST_USE_DEVICE' and return the requested devices. 
Supported values @@ -36,8 +32,8 @@ def setUpClass(cls): RuntimeError if value of 'HEAT_TEST_USE_DEVICE' is not recognized """ - envar = os.getenv("HEAT_TEST_USE_DEVICE", "cpu") + is_mps = False if envar == "cpu": ht.use_device("cpu") @@ -46,35 +42,48 @@ def setUpClass(cls): if torch.cuda.is_available(): torch.cuda.set_device(torch.device(ht.gpu.torch_device)) other_device = ht.gpu - elif envar == "gpu" and torch.cuda.is_available(): - ht.use_device("gpu") - torch.cuda.set_device(torch.device(ht.gpu.torch_device)) - ht_device = ht.gpu - other_device = ht.cpu + elif torch.backends.mps.is_built() and torch.backends.mps.is_available(): + other_device = ht.gpu + elif envar == "gpu": + if torch.cuda.is_available(): + ht.use_device("gpu") + torch.cuda.set_device(torch.device(ht.gpu.torch_device)) + ht_device = ht.gpu + other_device = ht.cpu + elif torch.backends.mps.is_built() and torch.backends.mps.is_available(): + ht.use_device("gpu") + ht_device = ht.gpu + other_device = ht.cpu + is_mps = True else: raise RuntimeError( f"Value '{envar}' of environment variable 'HEAT_TEST_USE_DEVICE' is unsupported" ) - cls.device, cls.other_device, cls.envar = ht_device, other_device, envar + cls.device, cls.other_device, cls.envar, cls.is_mps = ht_device, other_device, envar, is_mps + + @property + def comm(self) -> MPICommunication: + return self.__comm - def get_rank(self): + + def get_rank(self) -> Optional[int]: return self.comm.rank - def get_size(self): + def get_size(self) -> Optional[int]: return self.comm.size @classmethod - def get_hostnames(cls): + def get_hostnames(cls) -> list[str]: if not cls._hostnames: if platform.system() == "Windows": host = platform.uname().node else: host = os.uname()[1] - cls._hostnames = set(cls.__comm.handle.allgather(host)) + cls._hostnames = list(set(cls.__comm.handle.allgather(host))) return cls._hostnames - def assert_array_equal(self, heat_array, expected_array): + def assert_array_equal(self, heat_array: ht.DNDarray, expected_array: Union[np.ndarray,torch.Tensor], rtol:float=1e-5, atol:float=1e-08) -> None: """ Check if the heat_array is equivalent to the expected_array. Therefore first the split heat_array is compared to the corresponding expected_array slice locally and second the heat_array is combined and fully compared with the @@ -142,23 +151,24 @@ def assert_array_equal(self, heat_array, expected_array): ) # compare local tensors to corresponding slice of expected_array is_allclose = torch.tensor( - np.allclose(heat_array.larray.cpu(), expected_array[slices]), dtype=torch.int32 + np.allclose(heat_array.larray.cpu(), expected_array[slices], atol=atol, rtol=rtol), + dtype=torch.int32, ) heat_array.comm.Allreduce(MPI.IN_PLACE, is_allclose, MPI.SUM) self.assertTrue(is_allclose == heat_array.comm.size) def assert_func_equal( self, - shape, - heat_func, - numpy_func, - distributed_result=True, - heat_args=None, - numpy_args=None, - data_types=(np.int32, np.int64, np.float32, np.float64), - low=-10000, - high=10000, - ): + shape: Union[tuple[Any, ...],list[Any]], + heat_func: Callable[..., Any], + numpy_func: Callable[..., Any], + distributed_result: bool=True, + heat_args: Optional[dict[str, Any]]=None, + numpy_args:Optional[dict[str, Any]]=None, + data_types: tuple[type,...]=(np.int32, np.int64, np.float32, np.float64), + low:int=-10000, + high:int=10000, + ) -> None: """ This function will create random tensors of the given shape with different data types. All of these tensors will be tested with `ht.assert_func_equal_for_tensor`. 
@@ -192,7 +202,7 @@ def assert_func_equal( Raises ------ - AssertionError if the functions to not perform equally. + AssertionError if the functions do not perform equally. Examples -------- @@ -204,13 +214,19 @@ def assert_func_equal( AssertionError: [...] >>> self.assert_func_equal((1, 3, 5), ht.any, np.any, distributed_result=False) - >>> heat_args = {'sorted': True, 'axis': 0} - >>> numpy_args = {'axis': 0} - >>> self.assert_func_equal([5, 5, 5, 5], ht.unique, np.unique, heat_arg=heat_args, numpy_args=numpy_args) + >>> heat_args = {"sorted": True, "axis": 0} + >>> numpy_args = {"axis": 0} + >>> self.assert_func_equal( + ... [5, 5, 5, 5], ht.unique, np.unique, heat_arg=heat_args, numpy_args=numpy_args + ... ) """ if not isinstance(shape, tuple) and not isinstance(shape, list): raise ValueError(f"The shape must be either a list or a tuple but was {type(shape)}") + if self.is_mps and np.float64 in data_types: + # MPS does not support float64 + data_types = [dtype for dtype in data_types if dtype != np.float64] + for dtype in data_types: tensor = self.__create_random_np_array(shape, dtype=dtype, low=low, high=high) self.assert_func_equal_for_tensor( @@ -224,13 +240,13 @@ def assert_func_equal( def assert_func_equal_for_tensor( self, - tensor, - heat_func, - numpy_func, - heat_args=None, - numpy_args=None, - distributed_result=True, - ): + tensor: Union[np.ndarray,torch.Tensor], + heat_func: Callable[..., Any], + numpy_func: Callable[..., Any], + heat_args:Optional[dict[str,Any]]=None, + numpy_args:Optional[dict[str,Any]]=None, + distributed_result:bool=True, + ) -> None: """ This function tests if the heat function and the numpy function create the equal result on the given tensor. @@ -268,9 +284,11 @@ def assert_func_equal_for_tensor( >>> self.assert_func_equal_for_tensor(a, ht.any, np.any, distributed_result=False) >>> a = torch.ones([5, 5, 5, 5]) - >>> heat_args = {'sorted': True, 'axis': 0} - >>> numpy_args = {'axis': 0} - >>> self.assert_func_equal_for_tensor(a, ht.unique, np.unique, heat_arg=heat_args, numpy_args=numpy_args) + >>> heat_args = {"sorted": True, "axis": 0} + >>> numpy_args = {"axis": 0} + >>> self.assert_func_equal_for_tensor( + ... a, ht.unique, np.unique, heat_arg=heat_args, numpy_args=numpy_args + ... ) """ self.assertTrue(callable(heat_func)) self.assertTrue(callable(numpy_func)) @@ -310,12 +328,12 @@ def assert_func_equal_for_tensor( else: self.assertTrue(np.array_equal(ht_res.larray.cpu().numpy(), np_res)) - def assertTrue_memory_layout(self, tensor, order): + def assertTrue_memory_layout(self, tensor: ht.DNDarray, order: str) -> None: """ Checks that the memory layout of a given heat tensor is as specified by argument order. - Parameters: - ----------- + Parameters + ---------- order: str, 'C' for C-like (row-major), 'F' for Fortran-like (column-major) memory layout. """ stride = tensor.larray.stride() @@ -328,7 +346,7 @@ def assertTrue_memory_layout(self, tensor, order): else: raise ValueError(f"expected order to be 'C' or 'F', but was {order}") - def __create_random_np_array(self, shape, dtype=np.float32, low=-10000, high=10000): + def __create_random_np_array(self, shape: Union[list[Any],tuple[Any]], dtype:type=np.float32, low:int=-10000, high:int=10000) -> np.ndarray: """ Creates a random array based on the input parameters. The used seed will be printed to stdout for debugging purposes. 
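The `is_mps` filtering in `assert_func_equal` above reflects a hard backend limit rather than a testing convenience: PyTorch's MPS backend on Apple GPUs has no 64-bit floating point support, so `np.float64` cases must be dropped (or recomputed in float32) before the comparison loop. A rough illustration of the failure being avoided; the exact exception type and message may vary between PyTorch versions:

```python
import torch

if torch.backends.mps.is_available():
    ok = torch.ones(3, device="mps", dtype=torch.float32)  # supported
    try:
        bad = torch.ones(3, device="mps", dtype=torch.float64)
    except (TypeError, RuntimeError) as err:
        # MPS rejects float64 tensors outright
        print(f"float64 on MPS fails as expected: {err}")
```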
@@ -364,7 +382,7 @@ def __create_random_np_array(self, shape, dtype=np.float32, low=-10000, high=100 self.comm.Bcast(seed, root=0) np.random.seed(seed=seed.item()) if issubclass(dtype, np.floating): - array = np.random.randn(*shape) + array: np.ndarray = np.random.randn(*shape) elif issubclass(dtype, np.integer): array = np.random.randint(low=low, high=high, size=shape) else: diff --git a/heat/core/tests/test_tiling.py b/heat/core/tests/test_tiling.py index f94784a33e..b6e00c3161 100644 --- a/heat/core/tests/test_tiling.py +++ b/heat/core/tests/test_tiling.py @@ -1,9 +1,17 @@ +import os +import platform +import unittest + import torch import heat as ht from .test_suites.basic_test import TestCase +envar = os.getenv("HEAT_TEST_USE_DEVICE", "cpu") +is_mps = envar == "gpu" and platform.machine() == "arm64" + +@unittest.skipIf(is_mps, "Distribution not supported on Apple MPS") class TestSplitTiles(TestCase): # most of the cases are covered by the resplit tests def test_raises(self): diff --git a/heat/core/tests/test_trigonometrics.py b/heat/core/tests/test_trigonometrics.py index 9202550589..7e09472b86 100644 --- a/heat/core/tests/test_trigonometrics.py +++ b/heat/core/tests/test_trigonometrics.py @@ -10,7 +10,7 @@ def test_arccos(self): # base elements elements = [-1.0, -0.83, -0.12, 0.0, 0.24, 0.67, 1.0] comparison = torch.tensor( - elements, dtype=torch.float64, device=self.device.torch_device + elements, dtype=torch.float32, device=self.device.torch_device ).acos() # arccos of float32 @@ -18,19 +18,20 @@ def test_arccos(self): float32_arccos = ht.acos(float32_tensor) self.assertIsInstance(float32_arccos, ht.DNDarray) self.assertEqual(float32_arccos.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_arccos.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_arccos.larray, comparison)) - # arccos of float64 - float64_tensor = ht.array(elements, dtype=ht.float64) - float64_arccos = ht.arccos(float64_tensor) - self.assertIsInstance(float64_arccos, ht.DNDarray) - self.assertEqual(float64_arccos.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_arccos.larray.double(), comparison)) + if not self.is_mps: + # arccos of float64 + float64_tensor = ht.array(elements, dtype=ht.float64) + float64_arccos = ht.arccos(float64_tensor) + self.assertIsInstance(float64_arccos, ht.DNDarray) + self.assertEqual(float64_arccos.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_arccos.larray, comparison.double())) # arccos of value out of domain nan_tensor = ht.array([1.2]) nan_arccos = ht.arccos(nan_tensor) - self.assertIsInstance(float64_arccos, ht.DNDarray) + self.assertIsInstance(nan_arccos, ht.DNDarray) self.assertEqual(nan_arccos.dtype, ht.float32) self.assertTrue(math.isnan(nan_arccos.larray.item())) @@ -43,7 +44,7 @@ def test_arccos(self): def test_acosh(self): # base elements comparison = torch.arange( - 1, 31, dtype=torch.float64, device=self.device.torch_device + 1, 31, dtype=torch.float32, device=self.device.torch_device ).acosh() # acosh of float32 @@ -51,28 +52,33 @@ def test_acosh(self): float32_acosh = ht.acosh(float32_tensor) self.assertIsInstance(float32_acosh, ht.DNDarray) self.assertEqual(float32_acosh.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_acosh.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_acosh.larray, comparison)) - # acosh of float64 - float64_tensor = ht.arange(1, 31, dtype=ht.float64) - float64_acosh = ht.acosh(float64_tensor) - self.assertIsInstance(float64_acosh, ht.DNDarray) - 
self.assertEqual(float64_acosh.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_acosh.larray.double(), comparison)) + if not self.is_mps: + # acosh of float64 + float64_tensor = ht.arange(1, 31, dtype=ht.float64) + float64_acosh = ht.acosh(float64_tensor) + self.assertIsInstance(float64_acosh, ht.DNDarray) + self.assertEqual(float64_acosh.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_acosh.larray, comparison.double())) # acosh of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(1, 31, dtype=ht.int32) int32_acosh = ht.acosh(int32_tensor) self.assertIsInstance(int32_acosh, ht.DNDarray) self.assertEqual(int32_acosh.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_acosh.larray.double(), comparison)) + self.assertTrue(torch.allclose(int32_acosh.larray, comparison)) # acosh of longs, automatic conversion to intermediate floats int64_tensor = ht.arange(1, 31, dtype=ht.int64) int64_acosh = ht.arccosh(int64_tensor) self.assertIsInstance(int64_acosh, ht.DNDarray) - self.assertEqual(int64_acosh.dtype, ht.float64) - self.assertTrue(torch.allclose(int64_acosh.larray.double(), comparison)) + if self.is_mps: + self.assertEqual(int64_acosh.dtype, ht.float32) + self.assertTrue(torch.allclose(int64_acosh.larray, comparison)) + else: + self.assertEqual(int64_acosh.dtype, ht.float64) + self.assertTrue(torch.allclose(int64_acosh.larray, comparison.double())) # check exceptions with self.assertRaises(TypeError): @@ -84,7 +90,7 @@ def test_arcsin(self): # base elements elements = [-1.0, -0.83, -0.12, 0.0, 0.24, 0.67, 1.0] comparison = torch.tensor( - elements, dtype=torch.float64, device=self.device.torch_device + elements, dtype=torch.float32, device=self.device.torch_device ).asin() # arcsin of float32 @@ -92,19 +98,20 @@ def test_arcsin(self): float32_arcsin = ht.asin(float32_tensor) self.assertIsInstance(float32_arcsin, ht.DNDarray) self.assertEqual(float32_arcsin.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_arcsin.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_arcsin.larray, comparison)) - # arcsin of float64 - float64_tensor = ht.array(elements, dtype=ht.float64) - float64_arcsin = ht.arcsin(float64_tensor) - self.assertIsInstance(float64_arcsin, ht.DNDarray) - self.assertEqual(float64_arcsin.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_arcsin.larray.double(), comparison)) + if not self.is_mps: + # arcsin of float64 + float64_tensor = ht.array(elements, dtype=ht.float64) + float64_arcsin = ht.arcsin(float64_tensor) + self.assertIsInstance(float64_arcsin, ht.DNDarray) + self.assertEqual(float64_arcsin.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_arcsin.larray, comparison.double())) # arcsin of value out of domain nan_tensor = ht.array([1.2]) nan_arcsin = ht.arcsin(nan_tensor) - self.assertIsInstance(float64_arcsin, ht.DNDarray) + self.assertIsInstance(nan_arcsin, ht.DNDarray) self.assertEqual(nan_arcsin.dtype, ht.float32) self.assertTrue(math.isnan(nan_arcsin.larray.item())) @@ -118,7 +125,7 @@ def test_asinh(self): # base elements elements = 30 comparison = torch.linspace( - -28, 30, elements, dtype=torch.float64, device=self.device.torch_device + -28, 30, elements, dtype=torch.float32, device=self.device.torch_device ).asinh() # asinh of float32 @@ -126,28 +133,33 @@ def test_asinh(self): float32_asinh = ht.asinh(float32_tensor) self.assertIsInstance(float32_asinh, ht.DNDarray) self.assertEqual(float32_asinh.dtype, ht.float32) - 
self.assertTrue(torch.allclose(float32_asinh.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_asinh.larray, comparison)) - # asinh of float64 - float64_tensor = ht.linspace(-28, 30, elements, dtype=ht.float64) - float64_asinh = ht.asinh(float64_tensor) - self.assertIsInstance(float64_asinh, ht.DNDarray) - self.assertEqual(float64_asinh.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_asinh.larray.double(), comparison)) + if not self.is_mps: + # asinh of float64 + float64_tensor = ht.linspace(-28, 30, elements, dtype=ht.float64) + float64_asinh = ht.asinh(float64_tensor) + self.assertIsInstance(float64_asinh, ht.DNDarray) + self.assertEqual(float64_asinh.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_asinh.larray, comparison.double())) # asinh of ints, automatic conversion to intermediate floats int32_tensor = ht.linspace(-28, 30, elements, dtype=ht.int32) int32_asinh = ht.asinh(int32_tensor) self.assertIsInstance(int32_asinh, ht.DNDarray) self.assertEqual(int32_asinh.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_asinh.larray.double(), comparison)) + self.assertTrue(torch.allclose(int32_asinh.larray, comparison)) # asinh of longs, automatic conversion to intermediate floats int64_tensor = ht.linspace(-28, 30, elements, dtype=ht.int64) int64_asinh = ht.arcsinh(int64_tensor) self.assertIsInstance(int64_asinh, ht.DNDarray) - self.assertEqual(int64_asinh.dtype, ht.float64) - self.assertTrue(torch.allclose(int64_asinh.larray.double(), comparison)) + if self.is_mps: + self.assertEqual(int64_asinh.dtype, ht.float32) + self.assertTrue(torch.allclose(int64_asinh.larray, comparison)) + else: + self.assertEqual(int64_asinh.dtype, ht.float64) + self.assertTrue(torch.allclose(int64_asinh.larray, comparison.double())) # check exceptions with self.assertRaises(TypeError): @@ -159,7 +171,7 @@ def test_arctan(self): # base elements elements = 30 comparison = torch.arange( - elements, dtype=torch.float64, device=self.device.torch_device + elements, dtype=torch.float32, device=self.device.torch_device ).atan() # arctan of float32 @@ -167,28 +179,33 @@ def test_arctan(self): float32_arctan = ht.arctan(float32_tensor) self.assertIsInstance(float32_arctan, ht.DNDarray) self.assertEqual(float32_arctan.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_arctan.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_arctan.larray, comparison)) - # arctan of float64 - float64_tensor = ht.arange(elements, dtype=ht.float64) - float64_arctan = ht.arctan(float64_tensor) - self.assertIsInstance(float64_arctan, ht.DNDarray) - self.assertEqual(float64_arctan.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_arctan.larray.double(), comparison)) + if not self.is_mps: + # arctan of float64 + float64_tensor = ht.arange(elements, dtype=ht.float64) + float64_arctan = ht.arctan(float64_tensor) + self.assertIsInstance(float64_arctan, ht.DNDarray) + self.assertEqual(float64_arctan.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_arctan.larray, comparison.double())) # arctan of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(elements, dtype=ht.int32) int32_arctan = ht.arctan(int32_tensor) self.assertIsInstance(int32_arctan, ht.DNDarray) self.assertEqual(int32_arctan.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_arctan.larray.double(), comparison)) + self.assertTrue(torch.allclose(int32_arctan.larray, comparison)) # arctan of longs, automatic conversion to intermediate floats
int64_tensor = ht.arange(elements, dtype=ht.int64) int64_arctan = ht.atan(int64_tensor) self.assertIsInstance(int64_arctan, ht.DNDarray) - self.assertEqual(int64_arctan.dtype, ht.float64) - self.assertTrue(torch.allclose(int64_arctan.larray.double(), comparison)) + if self.is_mps: + self.assertEqual(int64_arctan.dtype, ht.float32) + self.assertTrue(torch.allclose(int64_arctan.larray, comparison)) + else: + self.assertEqual(int64_arctan.dtype, ht.float64) + self.assertTrue(torch.allclose(int64_arctan.larray, comparison.double())) # check exceptions with self.assertRaises(TypeError): @@ -207,25 +224,25 @@ def test_arctan2(self): self.assertEqual(float32_arctan2.dtype, ht.float32) self.assertTrue(torch.allclose(float32_arctan2.larray, float32_comparison)) - float64_y = torch.randn(30, dtype=torch.float64, device=self.device.torch_device) - float64_x = torch.randn(30, dtype=torch.float64, device=self.device.torch_device) + if not self.is_mps: + float64_y = torch.randn(30, dtype=torch.float64, device=self.device.torch_device) + float64_x = torch.randn(30, dtype=torch.float64, device=self.device.torch_device) - float64_comparison = torch.atan2(float64_y, float64_x) - float64_arctan2 = ht.atan2(ht.array(float64_y), ht.array(float64_x)) + float64_comparison = torch.atan2(float64_y, float64_x) + float64_arctan2 = ht.atan2(ht.array(float64_y), ht.array(float64_x)) - self.assertIsInstance(float64_arctan2, ht.DNDarray) - self.assertEqual(float64_arctan2.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_arctan2.larray, float64_comparison)) + self.assertIsInstance(float64_arctan2, ht.DNDarray) + self.assertEqual(float64_arctan2.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_arctan2.larray, float64_comparison)) # Rare Special Case with integers - int32_x = ht.array([-1, +1, +1, -1]) - int32_y = ht.array([-1, -1, +1, +1]) - - int32_comparison = ht.array([-135.0, -45.0, 45.0, 135.0], dtype=ht.float64) + int32_x = ht.array([-1, +1, +1, -1], dtype=ht.int32) + int32_y = ht.array([-1, -1, +1, +1], dtype=ht.int32) + int32_comparison = ht.array([-135.0, -45.0, 45.0, 135.0], dtype=ht.float32) int32_arctan2 = ht.arctan2(int32_y, int32_x) * 180 / ht.pi self.assertIsInstance(int32_arctan2, ht.DNDarray) - self.assertEqual(int32_arctan2.dtype, ht.float64) + self.assertEqual(int32_arctan2.dtype, ht.float32) self.assertTrue(ht.allclose(int32_arctan2, int32_comparison)) int16_x = ht.array([-1, +1, +1, -1], dtype=ht.int16) @@ -242,7 +259,7 @@ def test_atanh(self): # base elements elements = [-1.0, -0.83, -0.12, 0.0, 0.24, 0.67, 1.0] comparison = torch.tensor( - elements, dtype=torch.float64, device=self.device.torch_device + elements, dtype=torch.float32, device=self.device.torch_device ).atanh() # atanh of float32 @@ -250,19 +267,20 @@ def test_atanh(self): float32_atanh = ht.atanh(float32_tensor) self.assertIsInstance(float32_atanh, ht.DNDarray) self.assertEqual(float32_atanh.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_atanh.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_atanh.larray, comparison)) - # atanh of float64 - float64_tensor = ht.array(elements, dtype=ht.float64) - float64_atanh = ht.atanh(float64_tensor) - self.assertIsInstance(float64_atanh, ht.DNDarray) - self.assertEqual(float64_atanh.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_atanh.larray.double(), comparison)) + if not self.is_mps: + # atanh of float64 + float64_tensor = ht.array(elements, dtype=ht.float64) + float64_atanh = ht.atanh(float64_tensor) + 
self.assertIsInstance(float64_atanh, ht.DNDarray) + self.assertEqual(float64_atanh.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_atanh.larray, comparison.double())) # atanh of value out of domain nan_tensor = ht.array([1.2]) nan_atanh = ht.arctanh(nan_tensor) - self.assertIsInstance(float64_atanh, ht.DNDarray) + self.assertIsInstance(nan_atanh, ht.DNDarray) self.assertEqual(nan_atanh.dtype, ht.float32) self.assertTrue(math.isnan(nan_atanh.larray.item())) @@ -277,7 +295,7 @@ def test_degrees(self): elements = [0.0, 0.2, 0.6, 0.9, 1.2, 2.7, 3.14] comparison = ( 180.0 - * torch.tensor(elements, dtype=torch.float64, device=self.device.torch_device) + * torch.tensor(elements, dtype=torch.float32, device=self.device.torch_device) / 3.141592653589793 ) @@ -286,14 +304,15 @@ def test_degrees(self): float32_degrees = ht.degrees(float32_tensor) self.assertIsInstance(float32_degrees, ht.DNDarray) self.assertEqual(float32_degrees.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_degrees.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_degrees.larray, comparison)) - # degrees with float64 - float64_tensor = ht.array(elements, dtype=ht.float64) - float64_degrees = ht.degrees(float64_tensor) - self.assertIsInstance(float64_degrees, ht.DNDarray) - self.assertEqual(float64_degrees.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_degrees.larray.double(), comparison)) + if not self.is_mps: + # degrees with float64 + float64_tensor = ht.array(elements, dtype=ht.float64) + float64_degrees = ht.degrees(float64_tensor) + self.assertIsInstance(float64_degrees, ht.DNDarray) + self.assertEqual(float64_degrees.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_degrees.larray, comparison.double())) # check exceptions with self.assertRaises(TypeError): @@ -306,7 +325,7 @@ def test_deg2rad(self): elements = [0.0, 20.0, 45.0, 78.0, 94.0, 120.0, 180.0, 270.0, 311.0] comparison = ( 3.141592653589793 - * torch.tensor(elements, dtype=torch.float64, device=self.device.torch_device) + * torch.tensor(elements, dtype=torch.float32, device=self.device.torch_device) / 180.0 ) @@ -315,14 +334,15 @@ def test_deg2rad(self): float32_deg2rad = ht.deg2rad(float32_tensor) self.assertIsInstance(float32_deg2rad, ht.DNDarray) self.assertEqual(float32_deg2rad.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_deg2rad.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_deg2rad.larray, comparison)) - # deg2rad with float64 - float64_tensor = ht.array(elements, dtype=ht.float64) - float64_deg2rad = ht.deg2rad(float64_tensor) - self.assertIsInstance(float64_deg2rad, ht.DNDarray) - self.assertEqual(float64_deg2rad.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_deg2rad.larray.double(), comparison)) + if not self.is_mps: + # deg2rad with float64 + float64_tensor = ht.array(elements, dtype=ht.float64) + float64_deg2rad = ht.deg2rad(float64_tensor) + self.assertIsInstance(float64_deg2rad, ht.DNDarray) + self.assertEqual(float64_deg2rad.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_deg2rad.larray, comparison.double())) # check exceptions with self.assertRaises(TypeError): @@ -334,7 +354,7 @@ def test_cos(self): # base elements elements = 30 comparison = torch.arange( - elements, dtype=torch.float64, device=self.device.torch_device + elements, dtype=torch.float32, device=self.device.torch_device ).cos() # cosine of float32 @@ -342,28 +362,33 @@ def test_cos(self): float32_cos = ht.cos(float32_tensor) 
self.assertIsInstance(float32_cos, ht.DNDarray) self.assertEqual(float32_cos.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_cos.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_cos.larray, comparison)) - # cosine of float64 - float64_tensor = ht.arange(elements, dtype=ht.float64) - float64_cos = ht.cos(float64_tensor) - self.assertIsInstance(float64_cos, ht.DNDarray) - self.assertEqual(float64_cos.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_cos.larray.double(), comparison)) + if not self.is_mps: + # cosine of float64 + float64_tensor = ht.arange(elements, dtype=ht.float64) + float64_cos = ht.cos(float64_tensor) + self.assertIsInstance(float64_cos, ht.DNDarray) + self.assertEqual(float64_cos.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_cos.larray, comparison.double())) # cosine of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(elements, dtype=ht.int32) int32_cos = ht.cos(int32_tensor) self.assertIsInstance(int32_cos, ht.DNDarray) self.assertEqual(int32_cos.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_cos.larray.double(), comparison)) + self.assertTrue(torch.allclose(int32_cos.larray, comparison)) # cosine of longs, automatic conversion to intermediate floats int64_tensor = ht.arange(elements, dtype=ht.int64) int64_cos = int64_tensor.cos() self.assertIsInstance(int64_cos, ht.DNDarray) - self.assertEqual(int64_cos.dtype, ht.float64) - self.assertTrue(torch.allclose(int64_cos.larray.double(), comparison)) + if self.is_mps: + self.assertEqual(int64_cos.dtype, ht.float32) + self.assertTrue(torch.allclose(int64_cos.larray, comparison)) + else: + self.assertEqual(int64_cos.dtype, ht.float64) + self.assertTrue(torch.allclose(int64_cos.larray, comparison.double())) # check exceptions with self.assertRaises(TypeError): @@ -375,7 +400,7 @@ def test_cosh(self): # base elements elements = 30 comparison = torch.arange( - elements, dtype=torch.float64, device=self.device.torch_device + elements, dtype=torch.float32, device=self.device.torch_device ).cosh() # hyperbolic cosine of float32 @@ -383,28 +408,33 @@ def test_cosh(self): float32_cosh = float32_tensor.cosh() self.assertIsInstance(float32_cosh, ht.DNDarray) self.assertEqual(float32_cosh.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_cosh.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_cosh.larray, comparison)) - # hyperbolic cosine of float64 - float64_tensor = ht.arange(elements, dtype=ht.float64) - float64_cosh = ht.cosh(float64_tensor) - self.assertIsInstance(float64_cosh, ht.DNDarray) - self.assertEqual(float64_cosh.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_cosh.larray.double(), comparison)) + if not self.is_mps: + # hyperbolic cosine of float64 + float64_tensor = ht.arange(elements, dtype=ht.float64) + float64_cosh = ht.cosh(float64_tensor) + self.assertIsInstance(float64_cosh, ht.DNDarray) + self.assertEqual(float64_cosh.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_cosh.larray, comparison.double())) # hyperbolic cosine of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(elements, dtype=ht.int32) int32_cosh = ht.cosh(int32_tensor) self.assertIsInstance(int32_cosh, ht.DNDarray) self.assertEqual(int32_cosh.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_cosh.larray.double(), comparison)) + self.assertTrue(torch.allclose(int32_cosh.larray, comparison)) # hyperbolic cosine of longs,
int64_tensor = ht.arange(elements, dtype=ht.int64) int64_cosh = ht.cosh(int64_tensor) self.assertIsInstance(int64_cosh, ht.DNDarray) - self.assertEqual(int64_cosh.dtype, ht.float64) - self.assertTrue(torch.allclose(int64_cosh.larray.double(), comparison)) + if self.is_mps: + self.assertEqual(int64_cosh.dtype, ht.float32) + self.assertTrue(torch.allclose(int64_cosh.larray, comparison)) + else: + self.assertEqual(int64_cosh.dtype, ht.float64) + self.assertTrue(torch.allclose(int64_cosh.larray, comparison.double())) # check exceptions with self.assertRaises(TypeError): @@ -417,7 +447,7 @@ def test_rad2deg(self): elements = [0.0, 0.2, 0.6, 0.9, 1.2, 2.7, 3.14] comparison = ( 180.0 - * torch.tensor(elements, dtype=torch.float64, device=self.device.torch_device) + * torch.tensor(elements, dtype=torch.float32, device=self.device.torch_device) / 3.141592653589793 ) @@ -426,14 +456,15 @@ def test_rad2deg(self): float32_rad2deg = ht.rad2deg(float32_tensor) self.assertIsInstance(float32_rad2deg, ht.DNDarray) self.assertEqual(float32_rad2deg.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_rad2deg.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_rad2deg.larray, comparison)) - # rad2deg with float64 - float64_tensor = ht.array(elements, dtype=ht.float64) - float64_rad2deg = ht.rad2deg(float64_tensor) - self.assertIsInstance(float64_rad2deg, ht.DNDarray) - self.assertEqual(float64_rad2deg.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_rad2deg.larray.double(), comparison)) + if not self.is_mps: + # rad2deg with float64 + float64_tensor = ht.array(elements, dtype=ht.float64) + float64_rad2deg = ht.rad2deg(float64_tensor) + self.assertIsInstance(float64_rad2deg, ht.DNDarray) + self.assertEqual(float64_rad2deg.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_rad2deg.larray, comparison.double())) # check exceptions with self.assertRaises(TypeError): @@ -446,7 +477,7 @@ def test_radians(self): elements = [0.0, 20.0, 45.0, 78.0, 94.0, 120.0, 180.0, 270.0, 311.0] comparison = ( 3.141592653589793 - * torch.tensor(elements, dtype=torch.float64, device=self.device.torch_device) + * torch.tensor(elements, dtype=torch.float32, device=self.device.torch_device) / 180.0 ) @@ -455,14 +486,15 @@ def test_radians(self): float32_radians = ht.radians(float32_tensor) self.assertIsInstance(float32_radians, ht.DNDarray) self.assertEqual(float32_radians.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_radians.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_radians.larray, comparison)) - # radians with float64 - float64_tensor = ht.array(elements, dtype=ht.float64) - float64_radians = ht.radians(float64_tensor) - self.assertIsInstance(float64_radians, ht.DNDarray) - self.assertEqual(float64_radians.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_radians.larray.double(), comparison)) + if not self.is_mps: + # radians with float64 + float64_tensor = ht.array(elements, dtype=ht.float64) + float64_radians = ht.radians(float64_tensor) + self.assertIsInstance(float64_radians, ht.DNDarray) + self.assertEqual(float64_radians.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_radians.larray, comparison.double())) # check exceptions with self.assertRaises(TypeError): @@ -474,7 +506,7 @@ def test_sin(self): # base elements elements = 30 comparison = torch.arange( - elements, dtype=torch.float64, device=self.device.torch_device + elements, dtype=torch.float32, device=self.device.torch_device ).sin() # sine of float32 @@ -482,28 
+514,33 @@ def test_sin(self): float32_sin = float32_tensor.sin() self.assertIsInstance(float32_sin, ht.DNDarray) self.assertEqual(float32_sin.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_sin.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_sin.larray, comparison)) - # sine of float64 - float64_tensor = ht.arange(elements, dtype=ht.float64) - float64_sin = ht.sin(float64_tensor) - self.assertIsInstance(float64_sin, ht.DNDarray) - self.assertEqual(float64_sin.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_sin.larray.double(), comparison)) + if not self.is_mps: + # sine of float64 + float64_tensor = ht.arange(elements, dtype=ht.float64) + float64_sin = ht.sin(float64_tensor) + self.assertIsInstance(float64_sin, ht.DNDarray) + self.assertEqual(float64_sin.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_sin.larray, comparison.double())) # sine of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(elements, dtype=ht.int32) int32_sin = ht.sin(int32_tensor) self.assertIsInstance(int32_sin, ht.DNDarray) self.assertEqual(int32_sin.dtype, ht.float32) - self.assertTrue(torch.allclose(int32_sin.larray.double(), comparison)) + self.assertTrue(torch.allclose(int32_sin.larray, comparison)) # sine of longs, automatic conversion to intermediate floats int64_tensor = ht.arange(elements, dtype=ht.int64) int64_sin = ht.sin(int64_tensor) self.assertIsInstance(int64_sin, ht.DNDarray) - self.assertEqual(int64_sin.dtype, ht.float64) - self.assertTrue(torch.allclose(int64_sin.larray.double(), comparison)) + if self.is_mps: + self.assertEqual(int64_sin.dtype, ht.float32) + self.assertTrue(torch.allclose(int64_sin.larray, comparison)) + else: + self.assertEqual(int64_sin.dtype, ht.float64) + self.assertTrue(torch.allclose(int64_sin.larray, comparison.double())) # check exceptions with self.assertRaises(TypeError): @@ -515,7 +552,7 @@ def test_sinh(self): # base elements elements = 30 comparison = torch.arange( - elements, dtype=torch.float64, device=self.device.torch_device + elements, dtype=torch.float32, device=self.device.torch_device ).sinh() # hyperbolic sine of float32 @@ -523,28 +560,33 @@ def test_sinh(self): float32_sinh = float32_tensor.sinh() self.assertIsInstance(float32_sinh, ht.DNDarray) self.assertEqual(float32_sinh.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_sinh.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_sinh.larray, comparison)) - # hyperbolic sine of float64 - float64_tensor = ht.arange(elements, dtype=ht.float64) - float64_sinh = ht.sinh(float64_tensor) - self.assertIsInstance(float64_sinh, ht.DNDarray) - self.assertEqual(float64_sinh.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_sinh.larray.double(), comparison)) + if not self.is_mps: + # hyperbolic sine of float64 + float64_tensor = ht.arange(elements, dtype=ht.float64) + float64_sinh = ht.sinh(float64_tensor) + self.assertIsInstance(float64_sinh, ht.DNDarray) + self.assertEqual(float64_sinh.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_sinh.larray, comparison.double())) # hyperbolic sine of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(elements, dtype=ht.int32) int32_sinh = ht.sinh(int32_tensor) self.assertIsInstance(int32_sinh, ht.DNDarray) self.assertEqual(int32_sinh.dtype, ht.float32) - self.assertTrue(torch.allclose(int32_sinh.larray.double(), comparison)) + self.assertTrue(torch.allclose(int32_sinh.larray, comparison)) # hyperbolic sine of longs, 
automatic conversion to intermediate floats int64_tensor = ht.arange(elements, dtype=ht.int64) int64_sinh = ht.sinh(int64_tensor) self.assertIsInstance(int64_sinh, ht.DNDarray) - self.assertEqual(int64_sinh.dtype, ht.float64) - self.assertTrue(torch.allclose(int64_sinh.larray.double(), comparison)) + if self.is_mps: + self.assertEqual(int64_sinh.dtype, ht.float32) + self.assertTrue(torch.allclose(int64_sinh.larray, comparison)) + else: + self.assertEqual(int64_sinh.dtype, ht.float64) + self.assertTrue(torch.allclose(int64_sinh.larray, comparison.double())) # check exceptions with self.assertRaises(TypeError): @@ -556,7 +598,7 @@ def test_tan(self): # base elements elements = 30 comparison = torch.arange( - elements, dtype=torch.float64, device=self.device.torch_device + elements, dtype=torch.float32, device=self.device.torch_device ).tan() # tangent of float32 @@ -564,28 +606,33 @@ def test_tan(self): float32_tan = float32_tensor.tan() self.assertIsInstance(float32_tan, ht.DNDarray) self.assertEqual(float32_tan.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_tan.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_tan.larray, comparison)) - # tangent of float64 - float64_tensor = ht.arange(elements, dtype=ht.float64) - float64_tan = ht.tan(float64_tensor) - self.assertIsInstance(float64_tan, ht.DNDarray) - self.assertEqual(float64_tan.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_tan.larray.double(), comparison)) + if not self.is_mps: + # tangent of float64 + float64_tensor = ht.arange(elements, dtype=ht.float64) + float64_tan = ht.tan(float64_tensor) + self.assertIsInstance(float64_tan, ht.DNDarray) + self.assertEqual(float64_tan.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_tan.larray, comparison.double())) # tangent of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(elements, dtype=ht.int32) int32_tan = ht.tan(int32_tensor) self.assertIsInstance(int32_tan, ht.DNDarray) self.assertEqual(int32_tan.dtype, ht.float32) - self.assertTrue(torch.allclose(int32_tan.larray.double(), comparison)) + self.assertTrue(torch.allclose(int32_tan.larray, comparison)) # tangent of longs, automatic conversion to intermediate floats int64_tensor = ht.arange(elements, dtype=ht.int64) int64_tan = ht.tan(int64_tensor) self.assertIsInstance(int64_tan, ht.DNDarray) - self.assertEqual(int64_tan.dtype, ht.float64) - self.assertTrue(torch.allclose(int64_tan.larray.double(), comparison)) + if self.is_mps: + self.assertEqual(int64_tan.dtype, ht.float32) + self.assertTrue(torch.allclose(int64_tan.larray, comparison)) + else: + self.assertEqual(int64_tan.dtype, ht.float64) + self.assertTrue(torch.allclose(int64_tan.larray, comparison.double())) # check exceptions with self.assertRaises(TypeError): @@ -597,7 +644,7 @@ def test_tanh(self): # base elements elements = 30 comparison = torch.arange( - elements, dtype=torch.float64, device=self.device.torch_device + elements, dtype=torch.float32, device=self.device.torch_device ).tanh() # hyperbolic tangent of float32 @@ -605,28 +652,33 @@ def test_tanh(self): float32_tanh = float32_tensor.tanh() self.assertIsInstance(float32_tanh, ht.DNDarray) self.assertEqual(float32_tanh.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_tanh.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_tanh.larray, comparison)) - # hyperbolic tangent of float64 - float64_tensor = ht.arange(elements, dtype=ht.float64) - float64_tanh = ht.tanh(float64_tensor) - 
self.assertIsInstance(float64_tanh, ht.DNDarray) - self.assertEqual(float64_tanh.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_tanh.larray.double(), comparison)) + if not self.is_mps: + # hyperbolic tangent of float64 + float64_tensor = ht.arange(elements, dtype=ht.float64) + float64_tanh = ht.tanh(float64_tensor) + self.assertIsInstance(float64_tanh, ht.DNDarray) + self.assertEqual(float64_tanh.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_tanh.larray, comparison.double())) # hyperbolic tangent of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(elements, dtype=ht.int32) int32_tanh = ht.tanh(int32_tensor) self.assertIsInstance(int32_tanh, ht.DNDarray) self.assertEqual(int32_tanh.dtype, ht.float32) - self.assertTrue(torch.allclose(int32_tanh.larray.double(), comparison)) + self.assertTrue(torch.allclose(int32_tanh.larray, comparison)) # hyperbolic tangent of longs, automatic conversion to intermediate floats int64_tensor = ht.arange(elements, dtype=ht.int64) int64_tanh = ht.tanh(int64_tensor) self.assertIsInstance(int64_tanh, ht.DNDarray) - self.assertEqual(int64_tanh.dtype, ht.float64) - self.assertTrue(torch.allclose(int64_tanh.larray.double(), comparison)) + if self.is_mps: + self.assertEqual(int64_tanh.dtype, ht.float32) + self.assertTrue(torch.allclose(int64_tanh.larray, comparison)) + else: + self.assertEqual(int64_tanh.dtype, ht.float64) + self.assertTrue(torch.allclose(int64_tanh.larray, comparison.double())) # check exceptions with self.assertRaises(TypeError): diff --git a/heat/core/tests/test_types.py b/heat/core/tests/test_types.py index 6aa765a070..42e0124ef2 100644 --- a/heat/core/tests/test_types.py +++ b/heat/core/tests/test_types.py @@ -23,7 +23,9 @@ def assert_is_instantiable_heat_type(self, heat_type, torch_type): no_value = heat_type() self.assertIsInstance(no_value, ht.DNDarray) self.assertEqual(no_value.shape, (1,)) - self.assertEqual((no_value.larray == 0).all().item(), 1) + if not self.is_mps and not ht.types.heat_type_is_complexfloating(heat_type): + # equal unstable on MPS and complex types + self.assertEqual((no_value.larray == 0).all().item(), 1) self.assertEqual(no_value.larray.dtype, torch_type) # check a type constructor with a complex value @@ -31,15 +33,17 @@ def assert_is_instantiable_heat_type(self, heat_type, torch_type): elaborate_value = heat_type(ground_truth) self.assertIsInstance(elaborate_value, ht.DNDarray) self.assertEqual(elaborate_value.shape, (2, 3)) - self.assertEqual( - ( - elaborate_value.larray - == torch.tensor(ground_truth, dtype=torch_type, device=self.device.torch_device) + if not self.is_mps and not ht.types.heat_type_is_complexfloating(heat_type): + # equal unstable on MPS and complex types + self.assertEqual( + ( + elaborate_value.larray + == torch.tensor(ground_truth, dtype=torch_type, device=self.device.torch_device) + ) + .all() + .item(), + 1, ) - .all() - .item(), - 1, - ) self.assertEqual(elaborate_value.larray.dtype, torch_type) # check exception when there is more than one parameter @@ -94,8 +98,9 @@ def test_float32(self): self.assert_is_instantiable_heat_type(ht.float_, torch.float32) def test_float64(self): - self.assert_is_instantiable_heat_type(ht.float64, torch.float64) - self.assert_is_instantiable_heat_type(ht.double, torch.float64) + if not self.is_mps: + self.assert_is_instantiable_heat_type(ht.float64, torch.float64) + self.assert_is_instantiable_heat_type(ht.double, torch.float64) def test_flexible(self): self.assert_non_instantiable_heat_type(ht.flexible) @@ 
-108,10 +113,11 @@ def test_complex64(self): self.assertEqual(ht.complex64.char(), "c8") def test_complex128(self): - self.assert_is_instantiable_heat_type(ht.complex128, torch.complex128) - self.assert_is_instantiable_heat_type(ht.cdouble, torch.complex128) + if not self.is_mps: + self.assert_is_instantiable_heat_type(ht.complex128, torch.complex128) + self.assert_is_instantiable_heat_type(ht.cdouble, torch.complex128) - self.assertEqual(ht.complex128.char(), "c16") + self.assertEqual(ht.complex128.char(), "c16") def test_iscomplex(self): a = ht.array([1, 1.2, 1 + 1j, 1 + 0j]) @@ -336,19 +342,20 @@ def test_result_type(self): self.assertEqual(ht.result_type(1.0, ht.array(1, dtype=ht.int32)), ht.float32) self.assertEqual(ht.result_type(ht.uint8, ht.int8), ht.int16) self.assertEqual(ht.result_type("b", "f4"), ht.float32) - self.assertEqual(ht.result_type(ht.array([1], dtype=ht.float64), "f4"), ht.float64) - self.assertEqual( - ht.result_type( - ht.array([1, 2, 3, 4], dtype=ht.float64, split=0), - 1, - ht.bool, - "u", - torch.uint8, - np.complex128, - ht.array(1, dtype=ht.int64), - ), - ht.complex128, - ) + if not self.is_mps: + self.assertEqual(ht.result_type(ht.array([1], dtype=ht.float64), "f4"), ht.float64) + self.assertEqual( + ht.result_type( + ht.array([1, 2, 3, 4], dtype=ht.float64, split=0), + 1, + ht.bool, + "u", + torch.uint8, + np.complex128, + ht.array(1, dtype=ht.int64), + ), + ht.complex128, + ) self.assertEqual( ht.result_type(np.array([1, 2, 3]), np.dtype("int32"), torch.tensor([1, 2, 3])), ht.int64, diff --git a/heat/core/tests/test_vmap.py b/heat/core/tests/test_vmap.py index 8fd1f4734d..0f7ba62d2e 100644 --- a/heat/core/tests/test_vmap.py +++ b/heat/core/tests/test_vmap.py @@ -1,5 +1,6 @@ import heat as ht import torch +import os from .test_suites.basic_test import TestCase @@ -79,51 +80,45 @@ def func(x0, m=1, scale=2): vfunc_torch = torch.vmap(func, (0,), (0,)) y0_torch = vfunc_torch(x0_torch, m=2, scale=3) - print(y0.resplit(None).larray, y0_torch) - self.assertTrue(torch.allclose(y0.resplit(None).larray, y0_torch)) def test_vmap_with_chunks(self): - # same as before but now with prescribed chunk sizes for the vmap - x0 = ht.random.randn(5 * ht.MPI_WORLD.size, 10, 10, split=0) - x1 = ht.random.randn(10, 5 * ht.MPI_WORLD.size, split=1) - out_dims = (0, 0) - - def func(x0, x1, k=2, scale=1e-2): - return torch.topk(torch.linalg.svdvals(x0), k)[0] ** 2, scale * x0 @ x1 - - vfunc = ht.vmap(func, out_dims, chunk_size=2) - y0, y1 = vfunc(x0, x1, k=2, scale=-2.2) - - # compare with torch - x0_torch = x0.resplit(None).larray - x1_torch = x1.resplit(None).larray - vfunc_torch = torch.vmap(func, (0, 1), (0, 0)) - y0_torch, y1_torch = vfunc_torch(x0_torch, x1_torch, k=2, scale=-2.2) - - self.assertTrue(torch.allclose(y0.resplit(None).larray, y0_torch)) - self.assertTrue(torch.allclose(y1.resplit(None).larray, y1_torch)) - - # two inputs (only one of them split), two outputs, including keyword arguments that are not vmapped - # output split along different axis - x0 = ht.random.randn(5 * ht.MPI_WORLD.size, 10, 10, split=0) - x1 = ht.random.randn(10, 5 * ht.MPI_WORLD.size, split=None) - out_dims = (0, 1) - - def func(x0, x1, k=2, scale=1e-2): - return torch.topk(torch.linalg.svdvals(x0), k)[0] ** 2, scale * x0 @ x1 - - vfunc = ht.vmap(func, out_dims, chunk_size=1) - y0, y1 = vfunc(x0, x1, k=5, scale=2.2) - - # compare with torch - x0_torch = x0.resplit(None).larray - x1_torch = x1.resplit(None).larray - vfunc_torch = torch.vmap(func, (0, None), (0, 1)) - y0_torch, y1_torch = 
vfunc_torch(x0_torch, x1_torch, k=5, scale=2.2) - - self.assertTrue(torch.allclose(y0.resplit(None).larray, y0_torch)) - self.assertTrue(torch.allclose(y1.resplit(None).larray, y1_torch)) + x1_splits = [None, 1] + chunk_sizes = list(range(1, 5)) + dtypes = [ht.float32, ht.float64] + for x1_split in x1_splits: + for cs in chunk_sizes: + for dtype in dtypes: + with self.subTest(x1_split=x1_split, chunk_size=cs, dtype=dtype): + # same as before but now with prescribed chunk sizes for the vmap + x0 = ht.random.randn( + 5 * ht.MPI_WORLD.size, 10, 10, split=0, dtype=dtype + ) + x1 = ht.random.randn( + 10, 5 * ht.MPI_WORLD.size, split=x1_split, dtype=dtype + ) + out_dims = (0, 0) + + def func(x0, x1, k=2, scale=1e-2): + return ( + torch.topk(torch.linalg.svdvals(x0), k)[0] ** 2, + scale * x0 @ x1, + ) + + vfunc = ht.vmap(func, out_dims, chunk_size=cs) + y0, y1 = vfunc(x0, x1, k=2, scale=-2.2) + + # compare with torch + x0_torch = x0.resplit(None).larray + x1_torch = x1.resplit(None).larray + vfunc_torch = torch.vmap(func, (0, x1_split), out_dims) + y0_torch, y1_torch = vfunc_torch(x0_torch, x1_torch, k=2, scale=-2.2) + + self.assertTrue(torch.allclose(y0.resplit(None).larray, y0_torch)) + tol = 1e-12 if dtype == ht.float64 else 1e-4 + self.assertTrue( + torch.allclose(y1.resplit(None).larray, y1_torch, atol=tol) + ) def test_vmap_catch_errors(self): # not a callable diff --git a/heat/core/tiling.py b/heat/core/tiling.py index 1418ae8245..aa1294497f 100644 --- a/heat/core/tiling.py +++ b/heat/core/tiling.py @@ -39,7 +39,13 @@ class SplitTiles: Examples -------- - >>> a = ht.zeros((10, 11,), split=None) + >>> a = ht.zeros( + ... ( + ... 10, + ... 11, + ... ), + ... split=None, + ... ) >>> a.create_split_tiles() >>> print(a.tiles.tile_ends_g) [0/2] tensor([[ 4, 7, 10], @@ -190,7 +196,9 @@ def __getitem__(self, key: Union[int, slice, Tuple[Union[int, slice], ...]]) -> Examples -------- - >>> test = torch.arange(np.prod([i + 6 for i in range(2)])).reshape([i + 6 for i in range(2)]) + >>> test = torch.arange(np.prod([i + 6 for i in range(2)])).reshape( + ... [i + 6 for i in range(2)] + ... ) >>> a = ht.array(test, split=0).larray [0/2] tensor([[ 0., 1., 2., 3., 4., 5., 6.], [0/2] [ 7., 8., 9., 10., 11., 12., 13.]]) @@ -387,7 +395,7 @@ class SquareDiagTiles: Default: 2 Attributes - ----------- + ---------- __col_per_proc_list : List List with one entry per process; each element gives the number of tile columns on the process whose rank equals the index @@ -408,7 +416,7 @@ class SquareDiagTiles: The generation of these tiles may unbalance the original ``DNDarray``! Notes - ----------- + ----- This tiling scheme is intended for use with the :func:`~heat.core.linalg.qr.qr` function. """ @@ -509,7 +517,6 @@ def __init__(self, arr: DNDarray, tiles_per_proc: int = 2) -> None: # noqa: D10 # if arr.split == 1: # adjust the 0th dim to be the cumsum row_inds = [0] + row_inds[:-1] row_inds = torch.tensor(row_inds, device=arr.larray.device).cumsum(dim=0) - for num, c in enumerate(col_inds): # set columns tile_map[:, num, 1] = c for num, r in enumerate(row_inds): # set rows @@ -1012,7 +1019,9 @@ def local_set( >>> a = ht.zeros((11, 10), split=0) >>> a_tiles = tiling.SquareDiagTiles(a, tiles_per_proc=2) # type: tiling.SquareDiagTiles >>> local = a_tiles.local_get(key=slice(None)) - >>> a_tiles.local_set(key=slice(None), value=torch.arange(local.numel()).reshape(local.shape)) + >>> a_tiles.local_set( + ... key=slice(None), value=torch.arange(local.numel()).reshape(local.shape) + ...
) >>> print(a.larray) [0/1] tensor([[ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.], [0/1] [10., 11., 12., 13., 14., 15., 16., 17., 18., 19.], diff --git a/heat/core/trigonometrics.py b/heat/core/trigonometrics.py index 63926127a2..4ffa3825bb 100644 --- a/heat/core/trigonometrics.py +++ b/heat/core/trigonometrics.py @@ -59,7 +59,7 @@ def arccos(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: Examples -------- - >>> ht.arccos(ht.array([-1.,-0., 0.83])) + >>> ht.arccos(ht.array([-1.0, -0.0, 0.83])) DNDarray([3.1416, 1.5708, 0.5917], dtype=ht.float32, device=cpu:0, split=None) """ return local_op(torch.acos, x, out) @@ -91,7 +91,7 @@ def acosh(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: Examples -------- - >>> ht.acosh(ht.array([1., 10., 20.])) + >>> ht.acosh(ht.array([1.0, 10.0, 20.0])) DNDarray([0.0000, 2.9932, 3.6883], dtype=ht.float32, device=cpu:0, split=None) """ return local_op(torch.acosh, x, out) @@ -117,7 +117,7 @@ def arcsin(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: Examples -------- - >>> ht.arcsin(ht.array([-1.,-0., 0.83])) + >>> ht.arcsin(ht.array([-1.0, -0.0, 0.83])) DNDarray([-1.5708, -0.0000, 0.9791], dtype=ht.float32, device=cpu:0, split=None) """ return local_op(torch.asin, x, out) @@ -149,7 +149,7 @@ def asinh(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: Examples -------- - >>> ht.asinh(ht.array([-10., 0., 10.])) + >>> ht.asinh(ht.array([-10.0, 0.0, 10.0])) DNDarray([-2.9982, 0.0000, 2.9982], dtype=ht.float32, device=cpu:0, split=None) """ return local_op(torch.asinh, x, out) @@ -211,10 +211,6 @@ def arctan2(x1: DNDarray, x2: DNDarray) -> DNDarray: >>> ht.arctan2(y, x) * 180 / ht.pi DNDarray([-135.0000, -45.0000, 45.0000, 135.0000], dtype=ht.float64, device=cpu:0, split=None) """ - # Cast integer to float because torch.atan2() only supports integer types on PyTorch 1.5.0. 
- x1 = x1.astype(types.promote_types(x1.dtype, types.float)) - x2 = x2.astype(types.promote_types(x2.dtype, types.float)) - return binary_op(torch.atan2, x1, x2) @@ -243,7 +239,7 @@ def atanh(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: Examples -------- - >>> ht.atanh(ht.array([-1.,-0., 0.83])) + >>> ht.atanh(ht.array([-1.0, -0.0, 0.83])) DNDarray([ -inf, -0.0000, 1.1881], dtype=ht.float32, device=cpu:0, split=None) """ return local_op(torch.atanh, x, out) @@ -321,7 +317,7 @@ def deg2rad(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: Examples -------- - >>> ht.deg2rad(ht.array([0.,20.,45.,78.,94.,120.,180., 270., 311.])) + >>> ht.deg2rad(ht.array([0.0, 20.0, 45.0, 78.0, 94.0, 120.0, 180.0, 270.0, 311.0])) DNDarray([0.0000, 0.3491, 0.7854, 1.3614, 1.6406, 2.0944, 3.1416, 4.7124, 5.4280], dtype=ht.float32, device=cpu:0, split=None) """ return local_op(torch.deg2rad, x, out) @@ -341,7 +337,7 @@ def degrees(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: Examples -------- - >>> ht.degrees(ht.array([0.,0.2,0.6,0.9,1.2,2.7,3.14])) + >>> ht.degrees(ht.array([0.0, 0.2, 0.6, 0.9, 1.2, 2.7, 3.14])) DNDarray([ 0.0000, 11.4592, 34.3775, 51.5662, 68.7549, 154.6986, 179.9088], dtype=ht.float32, device=cpu:0, split=None) """ return rad2deg(x, out=out) @@ -361,7 +357,7 @@ def rad2deg(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: Examples -------- - >>> ht.rad2deg(ht.array([0.,0.2,0.6,0.9,1.2,2.7,3.14])) + >>> ht.rad2deg(ht.array([0.0, 0.2, 0.6, 0.9, 1.2, 2.7, 3.14])) DNDarray([ 0.0000, 11.4592, 34.3775, 51.5662, 68.7549, 154.6986, 179.9088], dtype=ht.float32, device=cpu:0, split=None) """ return local_op(torch.rad2deg, x, out=out) @@ -381,7 +377,7 @@ def radians(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: Examples -------- - >>> ht.radians(ht.array([0., 20., 45., 78., 94., 120., 180., 270., 311.])) + >>> ht.radians(ht.array([0.0, 20.0, 45.0, 78.0, 94.0, 120.0, 180.0, 270.0, 311.0])) DNDarray([0.0000, 0.3491, 0.7854, 1.3614, 1.6406, 2.0944, 3.1416, 4.7124, 5.4280], dtype=ht.float32, device=cpu:0, split=None) """ return deg2rad(x, out) diff --git a/heat/core/types.py b/heat/core/types.py index 6bb8e0272c..6858b206ee 100644 --- a/heat/core/types.py +++ b/heat/core/types.py @@ -46,6 +46,8 @@ "canonical_heat_type", "heat_type_is_exact", "heat_type_is_inexact", + "heat_type_is_realfloating", + "heat_type_is_complexfloating", "iscomplex", "isreal", "issubdtype", @@ -502,7 +504,7 @@ def canonical_heat_type(a_type: Union[str, Type[datatype], Any]) -> Type[datatyp In the three former cases the according mapped type is looked up, in the latter the type is simply returned. Raises - ------- + ------ TypeError If the type cannot be converted. """ @@ -547,9 +549,26 @@ def heat_type_is_inexact(ht_dtype: Type[datatype]) -> bool: return ht_dtype in _inexact +def heat_type_is_realfloating(ht_dtype: Type[datatype]) -> bool: + """ + Check if Heat type is a real floating point number, i.e float32 or float64 + + Parameters + ---------- + ht_dtype: Type[datatype] + Heat type to check + + Returns + ------- + out: bool + True if ht_dtype is a real float, False otherwise + """ + return ht_dtype in (float32, float64) + + def heat_type_is_complexfloating(ht_dtype: Type[datatype]) -> bool: """ - Check if HeAT type is a complex floating point number, i.e complex64 + Check if Heat type is a complex floating point number, i.e complex64 Parameters ---------- @@ -580,7 +599,7 @@ def heat_type_of( The object for which to infer the type. 
Raises - ------- + ------ TypeError If the object's type cannot be inferred. """ @@ -696,7 +715,7 @@ def can_cast( Raises - ------- + ------ TypeError If the types are not understood or casting is not a string ValueError @@ -714,7 +733,7 @@ def can_cast( True >>> ht.can_cast(2.0e200, "u1") False - >>> ht.can_cast('i8', 'i4', 'no') + >>> ht.can_cast("i8", "i4", "no") False >>> ht.can_cast("i8", "i4", "safe") False @@ -774,7 +793,7 @@ def iscomplex(x: dndarray.DNDarray) -> dndarray.DNDarray: Examples -------- - >>> ht.iscomplex(ht.array([1+1j, 1])) + >>> ht.iscomplex(ht.array([1 + 1j, 1])) DNDarray([ True, False], dtype=ht.bool, device=cpu:0, split=None) """ sanitation.sanitize_in(x) @@ -796,7 +815,7 @@ def isreal(x: dndarray.DNDarray) -> dndarray.DNDarray: Examples -------- - >>> ht.iscomplex(ht.array([1+1j, 1])) + >>> ht.isreal(ht.array([1 + 1j, 1])) - DNDarray([ True, False], dtype=ht.bool, device=cpu:0, split=None) + DNDarray([False,  True], dtype=ht.bool, device=cpu:0, split=None) """ return _operations.__local_op(torch.isreal, x, None, no_cast=True) @@ -825,7 +844,7 @@ def issubdtype( False >>> ht.issubdtype(ht.float64, ht.float32) False - >>> ht.issubdtype('i', ht.integer) + >>> ht.issubdtype("i", ht.integer) True """ # Assure that each argument is a ht.dtype @@ -868,7 +887,7 @@ def promote_types( def result_type( - *arrays_and_types: Tuple[Union[dndarray.DNDarray, Type[datatype], Any]] + *arrays_and_types: Tuple[Union[dndarray.DNDarray, Type[datatype], Any]], ) -> Type[datatype]: """ Returns the data type that results from type promotion rules performed in an arithmetic operation. @@ -975,7 +994,7 @@ class finfo: Kind of floating point data-type about which to get information. Examples - --------- + -------- >>> import heat as ht >>> info = ht.types.finfo(ht.float32) >>> info.bits @@ -1023,7 +1042,7 @@ class iinfo: Kind of integer data-type about which to get information. Examples - --------- + -------- >>> import heat as ht >>> info = ht.types.iinfo(ht.int32) >>> info.bits diff --git a/heat/core/version.py b/heat/core/version.py index 0d7e23cc23..30094d536d 100644 --- a/heat/core/version.py +++ b/heat/core/version.py @@ -1,10 +1,10 @@ -"""This module contains Heat's version information.""" +"""Heat's version information.""" major: int = 1 """Indicates Heat's main version.""" -minor: int = 5 +minor: int = 6 """Indicates feature extension.""" -micro: int = 1 +micro: int = 0 """Indicates revisions for bugfixes.""" extension: str = None """Indicates special builds, e.g. for specific hardware.""" @@ -13,4 +13,4 @@ __version__: str = f"{major}.{minor}.{micro}" """The combined version string, consisting of major, minor, micro and possibly extension.""" else: - __version__: str = f"{major}.{minor}.{micro}-{extension}" + __version__ = f"{major}.{minor}.{micro}-{extension}" diff --git a/heat/core/vmap.py b/heat/core/vmap.py index 03cb0f449f..2defdfc928 100644 --- a/heat/core/vmap.py +++ b/heat/core/vmap.py @@ -1,4 +1,5 @@ """ +Vmap module. This implements a functionality similar to PyTorch's vmap function. Requires PyTorch 2.0.0 or higher. """ @@ -21,7 +22,7 @@ def vmap( chunk_size: int = None, ) -> Callable[[Tuple[DNDarray]], Tuple[DNDarray]]: """ - This function is used to apply a function to a DNDarray in a vectorized way. + Apply a function to a DNDarray in a vectorized way. `heat.vmap` returns a callable that can be applied to DNDarrays. Vectorization will automatically take place along the split axis/axes of the DNDarray(s); therefore, unlike in PyTorch, there is no argument `in_dims`.
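For orientation, a usage sketch of the `ht.vmap` API exercised by the tests above; the shapes are illustrative, the mapped function receives local `torch` tensors, and `out_dims` declares the split axis of each output:

```python
import heat as ht
import torch

# batched inputs; vectorization runs along the split axes
x0 = ht.random.randn(4 * ht.MPI_WORLD.size, 10, 10, split=0)
x1 = ht.random.randn(10, 4 * ht.MPI_WORLD.size, split=1)

def func(x0, x1, k=2, scale=1e-2):
    # operates on one batch element as a local torch tensor
    return torch.topk(torch.linalg.svdvals(x0), k)[0] ** 2, scale * x0 @ x1

vfunc = ht.vmap(func, out_dims=(0, 0), chunk_size=2)
y0, y1 = vfunc(x0, x1, k=2, scale=1.0)
```

`chunk_size` bounds how many batch elements are processed per call, trading peak memory for throughput; that is exactly the knob the rewritten `test_vmap_with_chunks` sweeps over.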
diff --git a/heat/decomposition/__init__.py b/heat/decomposition/__init__.py index 9a9721c92f..1589ee59fb 100644 --- a/heat/decomposition/__init__.py +++ b/heat/decomposition/__init__.py @@ -3,3 +3,4 @@ """ from .pca import * +from .dmd import * diff --git a/heat/decomposition/dmd.py b/heat/decomposition/dmd.py new file mode 100644 index 0000000000..556ce26d96 --- /dev/null +++ b/heat/decomposition/dmd.py @@ -0,0 +1,715 @@ +""" +Module implementing the Dynamic Mode Decomposition (DMD) algorithm. +""" + +import heat as ht +from typing import Optional, Union, List +import torch + +try: + from typing import Self +except ImportError: + from typing_extensions import Self + + +def _torch_matrix_diag(diagonal): + # auxiliary function to create a batch of diagonal matrices from a batch of diagonal vectors + # source: fmassa's comment on Oct 4, 2018 in https://github.com/pytorch/pytorch/issues/12160 [Accessed Oct 09, 2024] + N = diagonal.shape[-1] + shape = diagonal.shape[:-1] + (N, N) + device, dtype = diagonal.device, diagonal.dtype + result = torch.zeros(shape, dtype=dtype, device=device) + indices = torch.arange(result.numel(), device=device).reshape(shape) + indices = indices.diagonal(dim1=-2, dim2=-1) + result.view(-1)[indices] = diagonal + return result + + +class DMD(ht.RegressionMixin, ht.BaseEstimator): + """ + Dynamic Mode Decomposition (DMD), plain vanilla version with SVD-based implementation. + + The time series of which DMD shall be computed must be provided as a 2-D DNDarray of shape (n_features, n_timesteps). + Please note that this deviates from Heat's convention that data sets are handled as 2-D arrays with the feature axis being the second axis. + + Parameters + ---------- + svd_solver : str, optional + Specifies the algorithm to use for the singular value decomposition (SVD). Options are 'full' (default), 'hierarchical', and 'randomized'. + svd_rank : int, optional + The rank to which SVD shall be truncated. For `'full'` SVD, `svd_rank = None` together with `svd_tol = None` (default) will result in no truncation. + For `svd_solver='full'`, at most one of `svd_rank` or `svd_tol` may be specified. + For `svd_solver='hierarchical'`, either `svd_rank` (rank to truncate to) or `svd_tol` (tolerance to truncate to) must be specified. + For `svd_solver='randomized'`, `svd_rank` must be specified and determines the rank to truncate to. + svd_tol : float, optional + The tolerance to which SVD shall be truncated. For `'full'` SVD, `svd_tol = None` together with `svd_rank = None` (default) will result in no truncation. + For `svd_solver='hierarchical'`, either `svd_tol` (accuracy to truncate to) or `svd_rank` (rank to truncate to) must be specified. + For `svd_solver='randomized'`, `svd_tol` is meaningless and must be None. + + Attributes + ---------- + svd_solver : str + The algorithm used for the singular value decomposition (SVD). + svd_rank : int + The rank to which SVD shall be truncated. + svd_tol : float + The tolerance to which SVD shall be truncated. + rom_basis_ : DNDarray + The reduced order model basis. + rom_transfer_matrix_ : DNDarray + The reduced order model transfer matrix. + rom_eigenvalues_ : DNDarray + The reduced order model eigenvalues. + rom_eigenmodes_ : DNDarray + The reduced order model eigenmodes ("DMD modes"). + + Notes + ----- + We follow the "exact DMD" method as described in [1], Sect. 2.2. + + References + ---------- + [1] J. L. Proctor, S. L. Brunton, and J. N.
Kutz, "Dynamic Mode Decomposition with Control," SIAM Journal on Applied Dynamical Systems, vol. 15, no. 1, pp. 142-161, 2016. + """ + + def __init__( + self, + svd_solver: Optional[str] = "full", + svd_rank: Optional[int] = None, + svd_tol: Optional[float] = None, + ): + # check that 'svd_solver' is given as a string + if not isinstance(svd_solver, str): + raise TypeError( + f"Invalid type '{type(svd_solver)}' for 'svd_solver'. Must be a string." + ) + # check if the specified SVD algorithm is valid + if svd_solver not in ["full", "hierarchical", "randomized"]: + raise ValueError( + f"Invalid SVD algorithm '{svd_solver}'. Must be one of 'full', 'hierarchical', 'randomized'." + ) + # check if the respective algorithm got the right combination of non-None parameters + if svd_solver == "full" and svd_rank is not None and svd_tol is not None: + raise ValueError( + "For 'full' SVD, at most one of 'svd_rank' or 'svd_tol' may be specified." + ) + if svd_solver == "hierarchical": + if svd_rank is None and svd_tol is None: + raise ValueError( + "For 'hierarchical' SVD, exactly one of 'svd_rank' or 'svd_tol' must be specified, but none of them is specified." + ) + if svd_rank is not None and svd_tol is not None: + raise ValueError( + "For 'hierarchical' SVD, exactly one of 'svd_rank' or 'svd_tol' must be specified, but currently both are specified." + ) + if svd_solver == "randomized": + if svd_rank is None: + raise ValueError("For 'randomized' SVD, 'svd_rank' must be specified.") + if svd_tol is not None: + raise ValueError("For 'randomized' SVD, 'svd_tol' must be None.") + # check correct data types of non-None parameters + if svd_rank is not None: + if not isinstance(svd_rank, int): + raise TypeError( + f"Invalid type '{type(svd_rank)}' for 'svd_rank'. Must be an integer." + ) + if svd_rank < 1: + raise ValueError( + f"Invalid value '{svd_rank}' for 'svd_rank'. Must be a positive integer." + ) + if svd_tol is not None: + if not isinstance(svd_tol, float): + raise TypeError(f"Invalid type '{type(svd_tol)}' for 'svd_tol'. Must be a float.") + if svd_tol <= 0: + raise ValueError(f"Invalid value '{svd_tol}' for 'svd_tol'. Must be positive.") + # set or initialize the attributes + self.svd_solver = svd_solver + self.svd_rank = svd_rank + self.svd_tol = svd_tol + self.rom_basis_ = None + self.rom_transfer_matrix_ = None + self.rom_eigenvalues_ = None + self.rom_eigenmodes_ = None + self.dmdmodes_ = None + self.n_modes_ = None + + def fit(self, X: ht.DNDarray) -> Self: + """ + Fits the DMD model to the given data. + + Parameters + ---------- + X : DNDarray + The time series data to fit the DMD model to. Must be of shape (n_features, n_timesteps). + """ + ht.sanitize_in(X) + # check if the input data is a 2-D DNDarray + if X.ndim != 2: + raise ValueError( + f"Invalid shape '{X.shape}' for input data 'X'. Must be a 2-D DNDarray of shape (n_features, n_timesteps)." + ) + # check if the input data has at least two time steps + if X.shape[1] < 2: + raise ValueError( + f"Invalid number of time steps '{X.shape[1]}' in input data 'X'. Must have at least two time steps." + ) + # first step of DMD: compute the SVD of the input data from first to second last time step + if self.svd_solver == "full" or not X.is_distributed(): + U, S, V = ht.linalg.svd( + X[:, :-1] if X.split == 0 else X[:, :-1].balance(), full_matrices=False + ) + if self.svd_tol is not None: + # truncation w.r.t.
prescribed bound on explained variance + # determine svd_rank accordingly + total_variance = (S**2).sum() + variance_threshold = (1 - self.svd_tol) * total_variance.larray.item() + variance_cumsum = (S**2).larray.cumsum(0) + self.n_modes_ = len(variance_cumsum[variance_cumsum <= variance_threshold]) + 1 + elif self.svd_rank is not None: + # truncation w.r.t. prescribed rank + self.n_modes_ = self.svd_rank + else: + # no truncation + self.n_modes_ = S.shape[0] + self.rom_basis_ = U[:, : self.n_modes_] + V = V[:, : self.n_modes_] + S = S[: self.n_modes_] + # compute SVD via "hierarchical" SVD + elif self.svd_solver == "hierarchical": + if self.svd_tol is not None: + # hierarchical SVD with prescribed upper bound on relative error + U, S, V, _ = ht.linalg.hsvd_rtol( + X[:, :-1] if X.split == 0 else X[:, :-1].balance(), + self.svd_tol, + compute_sv=True, + safetyshift=5, + ) + else: + # hierarchical SVD with prescribed, fixed rank + U, S, V, _ = ht.linalg.hsvd_rank( + X[:, :-1] if X.split == 0 else X[:, :-1].balance(), + self.svd_rank, + compute_sv=True, + safetyshift=5, + ) + self.rom_basis_ = U + self.n_modes_ = U.shape[1] + else: + # compute SVD via "randomized" SVD + U, S, V = ht.linalg.rsvd( + X[:, :-1] if X.split == 0 else X[:, :-1].balance_(), + self.svd_rank, + ) + self.rom_basis_ = U + self.n_modes_ = U.shape[1] + # second step of DMD: compute the reduced order model transfer matrix + # we need to assume that the transfer matrix of the ROM is small enough to fit into the memory of one process + if X.split == 0 or X.split is None: + # if the split axis of the input data is 0, using X[:,1:] does not result in un-balancedness and corresponding problems in matmul + self.rom_transfer_matrix_ = self.rom_basis_.T @ X[:, 1:] @ V / S + else: + # if input is split along columns, X[:,1:] will be un-balanced and cause problems in matmul + Xplus = X[:, 1:] + Xplus.balance_() + self.rom_transfer_matrix_ = self.rom_basis_.T @ Xplus @ V / S + + self.rom_transfer_matrix_.resplit_(None) + # third step of DMD: compute the reduced order model eigenvalues and eigenmodes + eigvals_loc, eigvec_loc = torch.linalg.eig(self.rom_transfer_matrix_.larray) + self.rom_eigenvalues_ = ht.array(eigvals_loc, split=None, device=X.device) + self.rom_eigenmodes_ = ht.array(eigvec_loc, split=None, device=X.device) + self.dmdmodes_ = self.rom_basis_ @ self.rom_eigenmodes_ + + def predict_next(self, X: ht.DNDarray, n_steps: int = 1) -> ht.DNDarray: + """ + Predicts and returns the state(s) after `n_steps` time steps, given the current state(s). + + Parameters + ---------- + X : DNDarray + The current state(s) for the prediction. Must have the same number of features as the training data, but can be batched for multiple current states, + i.e., X can be of shape (n_features,) or (n_features, n_current_states). + The output will have the same shape as the input. + n_steps : int, optional + The number of steps to predict into the future. Default is 1, i.e., the next time step is predicted. + """ + if not isinstance(n_steps, int): + raise TypeError(f"Invalid type '{type(n_steps)}' for 'n_steps'. Must be an integer.") + if self.rom_basis_ is None: + raise RuntimeError("Model has not been fitted yet.
Call 'fit' first.") + # sanitize input data + ht.sanitize_in(X) + # if X is a 1-D DNDarray, we add an artificial batch dimension + if X.ndim == 1: + X = X.expand_dims(1) + # check if the input data has the right number of features + if X.shape[0] != self.rom_basis_.shape[0]: + raise ValueError( + f"Invalid number of features '{X.shape[0]}' in input data 'X'. Must have the same number of features as the training data." + ) + rom_mat = self.rom_transfer_matrix_.copy() + rom_mat.larray = torch.linalg.matrix_power(rom_mat.larray, n_steps) + # the following line looks that complicated because we have to make sure that splits of the resulting matrices in + # each of the products are split along the axis that deserves being splitted + nextX = (self.rom_basis_.T @ X).T.resplit_(None) @ (self.rom_basis_ @ rom_mat).T + return (nextX.T).squeeze() + + def predict(self, X: ht.DNDarray, steps: Union[int, List[int]]) -> ht.DNDarray: + """ + Predics and returns future states given a current state(s) and returns them all as an array of size (n_steps, n_features). + + This function avoids a time-stepping loop (i.e., repeated calls to 'predict_next') and computes the future states in one go. + To do so, the number of future times to predict must be of moderate size as an array of shape (n_steps, self.n_modes_, self.n_modes_) must fit into memory. + Moreover, it must be ensured that: + + - the array of initial states is not split or split along the batch axis (axis 1) and the feature axis is small (i.e., self.rom_basis_ is not split) + + Parameters + ---------- + X : DNDarray + The current state(s) for the prediction. Must have the same number of features as the training data, but can be batched for multiple current states, + i.e., X can be of shape (n_features,) or (n_current_states, n_features). + steps : int or List[int] + if int: predictions at time step 0, 1, ..., steps-1 are computed + if List[int]: predictions at time steps given in the list are computed + """ + if self.rom_basis_ is None: + raise RuntimeError("Model has not been fitted yet. Call 'fit' first.") + # sanitize input data + ht.sanitize_in(X) + # if X is a 1-D DNDarray, we add an artificial batch dimension + if X.ndim == 1: + X = X.expand_dims(1) + # check if the input data has the right number of features + if X.shape[0] != self.rom_basis_.shape[0]: + raise ValueError( + f"Invalid number of features '{X.shape[0]}' in input data 'X'. Must have the same number of features as the training data." + ) + if isinstance(steps, int): + steps = torch.arange(steps, dtype=torch.int32, device=X.device.torch_device) + elif isinstance(steps, list): + steps = torch.tensor(steps, dtype=torch.int32, device=X.device.torch_device) + else: + raise TypeError( + f"Invalid type '{type(steps)}' for 'steps'. Must be an integer or a list of integers." 
+ ) + steps = steps.reshape(-1, 1).repeat(1, self.rom_eigenvalues_.shape[0]) + X_rom = self.rom_basis_.T @ X + + transfer_mat = _torch_matrix_diag(torch.pow(self.rom_eigenvalues_.larray, steps)) + transfer_mat = ( + self.rom_eigenmodes_.larray @ transfer_mat @ self.rom_eigenmodes_.larray.inverse() + ) + transfer_mat = torch.real( + transfer_mat + ) # necessary to avoid imaginary parts due to numerical errors + + if self.rom_basis_.split is None and (X.split is None or X.split == 1): + result = ( + transfer_mat @ X_rom.larray + ) # here we assume that X_rom is not split or split along the second axis (axis 1) + del transfer_mat + + result = ( + self.rom_basis_.larray @ result + ) # here we assume that self.rom_basis_ is not split (i.e., the feature number is small) + result = ht.array(result, is_split=2 if X.split == 1 else None) + return result.squeeze().T + else: + raise NotImplementedError( + "Predicting multiple time steps in one go is not supported for the given data layout. Please use 'predict_next' instead, or open an issue on GitHub if you require this feature." + ) + + def __str__(self): + if ht.MPI_WORLD.rank == 0: + if self.rom_basis_ is not None: + return ( + f"-------------------- DMD (Dynamic Mode Decomposition) --------------------\n" + f"Number of modes: {self.n_modes_}\n" + f"State space dimension: {self.rom_basis_.shape[0]}\n" + f"DMD eigenvalues: {self.rom_eigenvalues_.larray}\n" + f"--------------------------------------------------------------------------\n" + f"ROM basis of shape {self.rom_basis_.shape}:\n" + f"\t split axis: {self.rom_basis_.split}\n" + f"\t device: {self.rom_basis_.device.__str__().split(':')[-2]}\n" + f"--------------------------------------------------------------------------\n" + ) + else: + return ( + f"---------------- UNFITTED DMD (Dynamic Mode Decomposition) ---------------\n" + f"Parameters for fit are as follows: \n" + f"\t SVD-solver: {self.svd_solver}\n" + f"\t SVD-rank: {self.svd_rank}\n" + f"\t SVD-tolerance: {self.svd_tol}\n" + f"--------------------------------------------------------------------------\n" + ) + else: + return ""
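For orientation, a minimal usage sketch of the DMD class introduced above; the (n_features, n_timesteps) layout follows the class docstring, while the data and parameter choices are illustrative assumptions only and not part of the patch:

    import heat as ht

    # 64 features observed over 20 time steps, distributed along the feature axis
    X = ht.random.randn(64, 20, split=0)

    dmd = ht.decomposition.DMD(svd_solver="full", svd_rank=4)
    dmd.fit(X)

    # one-step prediction from the last observed state
    x_next = dmd.predict_next(X[:, -1])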
+ + +class DMDc(ht.RegressionMixin, ht.BaseEstimator): + """ + Dynamic Mode Decomposition with Control (DMDc), plain vanilla version with SVD-based implementation. + + The time series of states and controls must be provided as 2-D DNDarrays of shapes (n_state_features, n_timesteps) and (n_control_features, n_timesteps), respectively. + Please note that this deviates from Heat's convention that data sets are handled as 2-D arrays with the feature axis being the second axis. + + Parameters + ---------- + svd_solver : str, optional + Specifies the algorithm to use for the singular value decomposition (SVD). Options are 'full' (default), 'hierarchical', and 'randomized'. + svd_rank : int, optional + The rank to which SVD of the states shall be truncated. For `'full'` SVD, `svd_rank = None` together with `svd_tol = None` (default) will result in no truncation. + For `svd_solver='full'`, at most one of `svd_rank` or `svd_tol` may be specified. + For `svd_solver='hierarchical'`, either `svd_rank` (rank to truncate to) or `svd_tol` (tolerance to truncate to) must be specified. + For `svd_solver='randomized'`, `svd_rank` must be specified and determines the rank to truncate to. + svd_tol : float, optional + The tolerance to which SVD of the states shall be truncated. For `'full'` SVD, `svd_tol = None` together with `svd_rank = None` (default) will result in no truncation. + For `svd_solver='hierarchical'`, either `svd_tol` (accuracy to truncate to) or `svd_rank` (rank to truncate to) must be specified. + For `svd_solver='randomized'`, `svd_tol` is meaningless and must be None. + + Attributes + ---------- + svd_solver : str + The algorithm used for the singular value decomposition (SVD). + svd_rank : int + The rank to which SVD shall be truncated. + svd_tol : float + The tolerance to which SVD shall be truncated. + rom_basis_ : DNDarray + The reduced order model basis. + rom_transfer_matrix_ : DNDarray + The reduced order model transfer matrix. + rom_control_matrix_ : DNDarray + The reduced order model control matrix. + rom_eigenvalues_ : DNDarray + The reduced order model eigenvalues. + rom_eigenmodes_ : DNDarray + The reduced order model eigenmodes ("DMD modes"). + + Notes + ----- + We follow the approach described in [1], Sects. 3.3 and 3.4. + In the case that svd_rank is prescribed, the rank of the SVD of the full system matrix is set to svd_rank + n_control_features; cf. https://github.com/dynamicslab/pykoopman + for the same approach. + + References + ---------- + [1] J. L. Proctor, S. L. Brunton, and J. N. Kutz, "Dynamic Mode Decomposition with Control," SIAM Journal on Applied Dynamical Systems, vol. 15, no. 1, pp. 142-161, 2016. + """ + + def __init__( + self, + svd_solver: Optional[str] = "full", + svd_rank: Optional[int] = None, + svd_tol: Optional[float] = None, + ): + # check that 'svd_solver' is given as a string + if not isinstance(svd_solver, str): + raise TypeError( + f"Invalid type '{type(svd_solver)}' for 'svd_solver'. Must be a string." + ) + # check if the specified SVD algorithm is valid + if svd_solver not in ["full", "hierarchical", "randomized"]: + raise ValueError( + f"Invalid SVD algorithm '{svd_solver}'. Must be one of 'full', 'hierarchical', 'randomized'." + ) + # check if the respective algorithm got the right combination of non-None parameters + if svd_solver == "full" and svd_rank is not None and svd_tol is not None: + raise ValueError( + "For 'full' SVD, at most one of 'svd_rank' or 'svd_tol' may be specified." + ) + if svd_solver == "hierarchical": + if svd_rank is None and svd_tol is None: + raise ValueError( + "For 'hierarchical' SVD, exactly one of 'svd_rank' or 'svd_tol' must be specified, but none of them is specified." + ) + if svd_rank is not None and svd_tol is not None: + raise ValueError( + "For 'hierarchical' SVD, exactly one of 'svd_rank' or 'svd_tol' must be specified, but currently both are specified." + ) + if svd_solver == "randomized": + if svd_rank is None: + raise ValueError("For 'randomized' SVD, 'svd_rank' must be specified.") + if svd_tol is not None: + raise ValueError("For 'randomized' SVD, 'svd_tol' must be None.") + # check correct data types of non-None parameters + if svd_rank is not None: + if not isinstance(svd_rank, int): + raise TypeError( + f"Invalid type '{type(svd_rank)}' for 'svd_rank'. Must be an integer." + ) + if svd_rank < 1: + raise ValueError( + f"Invalid value '{svd_rank}' for 'svd_rank'. Must be a positive integer." + ) + if svd_tol is not None: + if not isinstance(svd_tol, float): + raise TypeError(f"Invalid type '{type(svd_tol)}' for 'svd_tol'. Must be a float.") + if svd_tol <= 0: + raise ValueError(f"Invalid value '{svd_tol}' for 'svd_tol'.
Must be non-negative.") + # set or initialize the attributes + self.svd_solver = svd_solver + self.svd_rank = svd_rank + self.svd_tol = svd_tol + self.rom_basis_ = None + self.rom_transfer_matrix_ = None + self.rom_control_matrix_ = None + self.rom_eigenvalues_ = None + self.rom_eigenmodes_ = None + self.dmdmodes_ = None + self.n_modes_ = None + self.n_modes_system_ = None + + def fit(self, X: ht.DNDarray, C: ht.DNDarray) -> Self: + """ + Fits the DMD model to the given data. + + Parameters + ---------- + X : DNDarray + The time series data of states to fit the DMD model to. Must be of shape (n_state_features, n_timesteps). + C : DNDarray + The time series of control inputs to fit the DMD model to. Must be of shape (n_control_features, n_timesteps). + """ + ht.sanitize_in(X) + ht.sanitize_in(C) + # check if the input data is a 2-D DNDarray + if X.ndim != 2: + raise ValueError( + f"Invalid shape '{X.shape}' for input data 'X'. Must be a 2-D DNDarray of shape (n_state_features, n_timesteps)." + ) + if C.ndim != 2: + raise ValueError( + f"Invalid shape '{C.shape}' for input data 'C'. Must be a 2-D DNDarray of shape (n_control_features, n_timesteps)." + ) + # check if the input data has at least two time steps + if X.shape[1] < 2: + raise ValueError( + f"Invalid number of time steps '{X.shape[1]}' in input data 'X'. Must have at least two time steps." + ) + if C.shape[1] < 2: + raise ValueError( + f"Invalid number of time steps '{C.shape[1]}' in input data 'C'. Must have at least two time steps." + ) + # check if the input data has the same number of time steps + if X.shape[1] != C.shape[1]: + raise ValueError( + f"Invalid number of time steps {X.shape[1]} in input data 'X' and {C.shape[1]} in input data 'C'. Must have the same number of time steps." + ) + if X.split is not None and C.split is not None and X.split != C.split: + raise ValueError( + f"If both input data 'X' and 'C' are distributed, they must be distributed along the same axis, but X.split={X.split}, C.split={C.split}." + ) + Xplus = X[:, 1:] + Xplus.balance_() + Omega = ht.concatenate((X, C), axis=0)[:, :-1] + # first step of DMDc: compute the SVD of the input data from first to second last time step + # as well as of the full system matrix + if self.svd_solver == "full" or not X.is_distributed(): + U, S, V = ht.linalg.svd(Xplus, full_matrices=False) + Utilde, Stilde, Vtilde = ht.linalg.svd(Omega, full_matrices=False) + if self.svd_tol is not None: + # truncation w.r.t. prescribed bound on explained variance + # determine svd_rank accordingly + total_variance = (S**2).sum() + variance_threshold = (1 - self.svd_tol) * total_variance.larray.item() + variance_cumsum = (S**2).larray.cumsum(0) + self.n_modes_ = len(variance_cumsum[variance_cumsum <= variance_threshold]) + 1 + total_variance_system = (Stilde**2).sum() + variance_threshold_system = (1 - self.svd_tol) * total_variance_system.larray.item() + variance_cumsum_system = (Stilde**2).larray.cumsum(0) + self.n_modes_system_ = ( + len(variance_cumsum_system[variance_cumsum_system <= variance_threshold_system]) + + 1 + ) + elif self.svd_rank is not None: + # truncation w.r.t. 
prescribed rank + self.n_modes_ = self.svd_rank + self.n_modes_system_ = self.svd_rank + C.shape[0] + else: + # no truncation + self.n_modes_ = S.shape[0] + self.n_modes_system_ = Stilde.shape[0] + + self.rom_basis_ = U[:, : self.n_modes_] + V = V[:, : self.n_modes_] + S = S[: self.n_modes_] + Vtilde = Vtilde[:, : self.n_modes_system_] + Stilde = Stilde[: self.n_modes_system_] + Utilde1 = Utilde[: X.shape[0], : self.n_modes_system_] + Utilde2 = Utilde[X.shape[0] :, : self.n_modes_system_] + # compute SVD via "hierarchical" SVD + elif self.svd_solver == "hierarchical": + if self.svd_tol is not None: + # hierarchical SVD with prescribed upper bound on relative error + U, S, V, _ = ht.linalg.hsvd_rtol( + Xplus, + self.svd_tol, + compute_sv=True, + safetyshift=5, + ) + Utilde, Stilde, Vtilde, _ = ht.linalg.hsvd_rtol( + Omega, + self.svd_tol, + compute_sv=True, + safetyshift=5, + ) + else: + # hierarchical SVD with prescribed, fixed rank + U, S, V, _ = ht.linalg.hsvd_rank( + Xplus, + self.svd_rank, + compute_sv=True, + safetyshift=5, + ) + Utilde, Stilde, Vtilde, _ = ht.linalg.hsvd_rank( + Omega, + self.svd_rank + C.shape[0], + compute_sv=True, + safetyshift=5, + ) + self.rom_basis_ = U + self.n_modes_ = U.shape[1] + self.n_modes_system_ = Utilde.shape[1] + Utilde1 = Utilde[: X.shape[0], :] + Utilde2 = Utilde[X.shape[0] :, :] + else: + # compute SVD via "randomized" SVD + U, S, V = ht.linalg.rsvd( + Xplus, + self.svd_rank, + ) + Utilde, Stilde, Vtilde = ht.linalg.rsvd( + Omega, + self.svd_rank + C.shape[0], + ) + self.rom_basis_ = U + self.n_modes_ = U.shape[1] + self.n_modes_system_ = Utilde.shape[1] + Utilde1 = Utilde[: X.shape[0], :] + Utilde2 = Utilde[X.shape[0] :, :] + + # ensure that everything is balanced for the following steps + Utilde2.balance_() + Utilde1.balance_() + Vtilde.balance_() + if Utilde2.split is not None and Utilde2.shape[Utilde2.split] < Utilde2.comm.size: + Utilde2.resplit_((Utilde2.split + 1) % 2) + if Utilde1.split is not None and Utilde1.shape[Utilde1.split] < Utilde1.comm.size: + Utilde1.resplit_((Utilde1.split + 1) % 2) + if Vtilde.split is not None and Vtilde.shape[Vtilde.split] < Vtilde.comm.size: + Vtilde.resplit_((Vtilde.split + 1) % 2) + # second step of DMDc: compute the reduced order model transfer matrix + # we need to assume that the transfer matrix of the ROM is small enough to fit into the memory of one process + self.rom_transfer_matrix_ = ( + self.rom_basis_.T + @ Xplus + @ (Vtilde / Stilde) + @ (Utilde1.T @ self.rom_basis_).resplit_(None) + ) + self.rom_control_matrix_ = (self.rom_basis_.T @ Xplus) @ ( + (Vtilde / Stilde) @ Utilde2.T + ).resplit_(0) + self.rom_transfer_matrix_.resplit_(None) + self.rom_control_matrix_.resplit_(None) + + # third step of DMDc: compute the reduced order model eigenvalues and eigenmodes + eigvals_loc, eigvec_loc = torch.linalg.eig(self.rom_transfer_matrix_.larray) + self.rom_eigenvalues_ = ht.array(eigvals_loc, split=None, device=X.device) + self.rom_eigenmodes_ = ht.array(eigvec_loc, split=None, device=X.device) + self.dmdmodes_ = ( + Xplus @ (Vtilde / Stilde) @ Utilde1.T @ self.rom_basis_ @ self.rom_eigenmodes_ + ) + + def predict(self, X: ht.DNDarray, C: ht.DNDarray) -> ht.DNDarray: + """ + Predicts and returns future states given the current state(s) ``X`` and control trajectory ``C``. + + Parameters + ---------- + X : DNDarray + The current state(s) for the prediction.
Must have the same number of features as the training data, but can be batched for multiple current states, + i.e., X can be of shape (n_state_features,) or (n_batch, n_state_features). + C : DNDarray + The control trajectory for the prediction. Must have the same number of control features as the training data, i.e., C must be of shape + (n_control_features,) (for a single time step) or (n_control_features, n_timesteps). + """ + if self.rom_basis_ is None: + raise RuntimeError("Model has not been fitted yet. Call 'fit' first.") + # sanitize input data + ht.sanitize_in(X) + ht.sanitize_in(C) + # if X is a 1-D DNDarray, we add an artificial batch dimension; check correct dimensions for X + if X.ndim == 1: + X = X.expand_dims(0) + if X.ndim > 2: + raise ValueError( + f"Invalid shape '{X.shape}' for input data 'X'. Must be a 2-D DNDarray of shape (n_batch, n_state_features) or a 1-D DNDarray of shape (n_state_features,)." + ) + # if C is a 1-D DNDarray, we add an artificial dimension for the single time step; check correct dimensions for C + if C.ndim == 1: + C = C.expand_dims(1) + if C.ndim > 2: + raise ValueError( + f"Invalid shape '{C.shape}' for input data 'C'. Must be a 2-D DNDarray of shape (n_control_features, n_timesteps) or a 1-D DNDarray of shape (n_control_features,) for a single time step." + ) + # check if the input data has the right number of features for control and state space + if X.shape[1] != self.rom_basis_.shape[0]: + raise ValueError( + f"Invalid number of features '{X.shape[1]}' in input data 'X'. Must have the same number of features as the training data (={self.rom_basis_.shape[0]})." + ) + if C.shape[0] != self.rom_control_matrix_.shape[1]: + raise ValueError( + f"Invalid number of features '{C.shape[0]}' in input data 'C'. Must have the same number of features as the training data (={self.rom_control_matrix_.shape[1]})."
+ ) + # different cases + if C.split is not None: + raise ValueError("So far, only C.split = None is supported.") + # time evolution in the reduced order model + X_red = X @ self.rom_basis_ + X_red_full = ht.zeros( + (X.shape[0], self.rom_basis_.shape[1], C.shape[1]), + split=X_red.split, + device=X.device, + dtype=X.dtype, + ) + X_red_full[:, :, 0] = X_red + for i in range(1, C.shape[1]): + X_red_full[:, :, i] = (self.rom_transfer_matrix_ @ X_red_full[:, :, i - 1].T).T + ( + self.rom_control_matrix_ @ C[:, i - 1] + ).T + # reshape in order to be able to multiply with basis again + X_red_full = X_red_full.reshape(self.rom_basis_.shape[1], -1).resplit_( + 1 if X_red_full.split == 0 else None + ) + X_pred = self.rom_basis_ @ X_red_full + # reshape again and return + return X_pred.reshape(X.shape[0], X.shape[1], C.shape[1]) + + def __str__(self): + if ht.MPI_WORLD.rank == 0: + if self.rom_basis_ is not None: + return ( + f"----------- DMDc (Dynamic Mode Decomposition with control) ---------------\n" + f"Number of modes: {self.n_modes_}\n" + f"State space dimension: {self.rom_basis_.shape[0]}\n" + f"Control space dimension: {self.rom_control_matrix_.shape[1]}\n" + f"DMD eigenvalues: {self.rom_eigenvalues_.larray}\n" + f"--------------------------------------------------------------------------\n" + f"ROM basis of shape {self.rom_basis_.shape}:\n" + f"\t split axis: {self.rom_basis_.split}\n" + f"\t device: {self.rom_basis_.device.__str__().split(':')[-2]}\n" + f"--------------------------------------------------------------------------\n" + ) + else: + return ( + f"-------- UNFITTED DMDc (Dynamic Mode Decomposition with control) ---------\n" + f"Parameters for fit are as follows: \n" + f"\t SVD-solver: {self.svd_solver}\n" + f"\t SVD-rank: {self.svd_rank}\n" + f"\t SVD-tolerance: {self.svd_tol}\n" + f"--------------------------------------------------------------------------\n" + ) + else: + return ""
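Analogously, a minimal usage sketch of the DMDc class under the shape conventions from its docstring; data and parameter values are illustrative assumptions only, and `C` is kept non-distributed because `predict` currently requires `C.split = None`:

    import heat as ht

    # states: 32 features over 20 time steps; controls: 2 features over the same 20 steps
    X = ht.random.randn(32, 20, split=0)
    C = ht.random.randn(2, 20, split=None)

    dmdc = ht.decomposition.DMDc(svd_solver="full", svd_rank=4)
    dmdc.fit(X, C)

    # propagate the last state under a prescribed control trajectory of 5 steps
    X_pred = dmdc.predict(X[:, -1], C[:, :5])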
diff --git a/heat/decomposition/pca.py b/heat/decomposition/pca.py index dde1d15f5e..ac715ad594 100644 --- a/heat/decomposition/pca.py +++ b/heat/decomposition/pca.py @@ -4,6 +4,7 @@ import heat as ht from typing import Optional, Tuple, Union +from ..core.linalg.svdtools import _isvd try: from typing import Self @@ -36,16 +37,18 @@ class PCA(ht.TransformMixin, ht.BaseEstimator): svd_solver : {'full', 'hierarchical'}, default='hierarchical' 'full' : Full SVD is performed. In general, this is more accurate, but also slower. So far, this is only supported for tall-skinny or short-fat data. 'hierarchical' : Hierarchical SVD, i.e., an algorithm for computing an approximate, truncated SVD, is performed. Only available for data split along axis no. 0. + 'randomized' : Randomized SVD is performed. tol : float, default=None Not yet necessary as iterative methods for PCA are not yet implemented. - iterated_power : {'auto', int}, default='auto' - if svd_solver='randomized', ... (not yet supported) + iterated_power : int, default=0 + if svd_solver='randomized', this parameter is the number of iterations for the power method. + Choosing `iterated_power > 0` can lead to better results in the case of slowly decaying singular values but is computationally more expensive. n_oversamples : int, default=10 - if svd_solver='randomized', ... (not yet supported) + if svd_solver='randomized', this parameter is the number of additional random vectors to sample the range of X so that the range of X can be approximated more accurately. power_iteration_normalizer : {'qr'}, default='qr' - if svd_solver='randomized', ... (not yet supported) + if svd_solver='randomized', this parameter is the normalization form of the iterated power method. So far, only QR is supported. random_state : int, default=None - if svd_solver='randomized', ... (not yet supported) + if svd_solver='randomized', this parameter allows setting the seed for the random number generator. Attributes ---------- @@ -53,16 +56,17 @@ class PCA(ht.TransformMixin, ht.BaseEstimator): Principal axes in feature space, representing the directions of maximum variance in the data. The components are sorted by explained_variance_. explained_variance_ : DNDarray of shape (n_components,) The amount of variance explained by each of the selected components. - Not supported by svd_solver='hierarchical'. + Not supported by svd_solver='hierarchical' and svd_solver='randomized'. explained_variance_ratio_ : DNDarray of shape (n_components,) Percentage of variance explained by each of the selected components. - Not supported by svd_solver='hierarchical'. + Not supported by svd_solver='hierarchical' and svd_solver='randomized'. total_explained_variance_ratio_ : float The percentage of total variance explained by the selected components together. For svd_solver='hierarchical', a lower estimate for this quantity is provided; see :func:`ht.linalg.hsvd_rtol` and :func:`ht.linalg.hsvd_rank` for details. + Not supported by svd_solver='randomized'. singular_values_ : DNDarray of shape (n_components,) The singular values corresponding to each of the selected components. - Not supported by svd_solver='hierarchical'. + Not supported by svd_solver='hierarchical' and svd_solver='randomized'. mean_ : DNDarray of shape (n_features,) Per-feature empirical mean, estimated from the training set. n_components_ : int The estimated number of components. @@ -73,9 +77,10 @@ not yet implemented Notes - ------------ - Hieararchical SVD (`svd_solver = "hierarchical"`) computes and approximate, truncated SVD. Thus, the results are not exact, in general, unless the - truncation rank chose is larger than the actual rank (matrix rank) of the underlying data; see :func:`ht.linalg.hsvd_rank` and :func:`ht.linalg.hsvd_rtol` for details. + ----- + Hierarchical SVD (`svd_solver = "hierarchical"`) computes an approximate, truncated SVD. Thus, the results are not exact, in general, unless the + truncation rank chosen is larger than the actual rank (matrix rank) of the underlying data; see :func:`ht.linalg.hsvd_rank` and :func:`ht.linalg.hsvd_rtol` for details. + Randomized SVD (`svd_solver = "randomized"`) is a stochastic algorithm that computes an approximate, truncated SVD.
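To illustrate the newly supported solver, a short sketch of PCA with `svd_solver='randomized'`; the data and parameter values here are assumptions for illustration, not part of the patch:

    import heat as ht

    X = ht.random.randn(200, 10, split=0)

    # randomized SVD backend: two power iterations, fixed seed for reproducibility
    pca = ht.decomposition.PCA(
        n_components=3, svd_solver="randomized", iterated_power=2, random_state=42
    )
    pca.fit(X)
    X_reduced = pca.transform(X)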
""" def __init__( @@ -85,7 +90,7 @@ def __init__( self, whiten: bool = False, svd_solver: str = "hierarchical", tol: Optional[float] = None, - iterated_power: Union[str, int] = "auto", + iterated_power: int = 0, n_oversamples: int = 10, power_iteration_normalizer: str = "qr", random_state: Optional[int] = None, @@ -99,10 +104,12 @@ def __init__( raise NotImplementedError("Whitening is not yet supported. Please set whiten=False.") if not (svd_solver == "full" or svd_solver == "hierarchical" or svd_solver == "randomized"): raise ValueError( - "At the moment, only svd_solver='full' (for tall-skinny or short-fat data) and svd_solver='hierarchical' are supported. \n An implementation of the 'full' option for arbitrarily shaped data as well as the option 'randomized' are already planned." + "At the moment, only svd_solver='full' (for tall-skinny or short-fat data), svd_solver='hierarchical', and svd_solver='randomized' are supported. \n An implementation of the 'full' option for arbitrarily shaped data is already planned." + ) + if not isinstance(iterated_power, int): + raise TypeError( + "iterated_power must be an integer. The option 'auto' is not yet supported." ) - if iterated_power != "auto" and not isinstance(iterated_power, int): - raise TypeError("iterated_power must be 'auto' or an integer.") if isinstance(iterated_power, int) and iterated_power < 0: raise ValueError("if an integer, iterated_power must be greater or equal to 0.") if power_iteration_normalizer != "qr": @@ -113,10 +120,8 @@ def __init__( raise ValueError( "Argument tol is not yet necessary as iterative methods for PCA are not yet implemented. Please set tol=None." ) - if random_state is None: - random_state = 0 - if not isinstance(random_state, int): - raise ValueError("random_state must be None or an integer.") + if random_state is not None and not isinstance(random_state, int): + raise ValueError(f"random_state must be None or an integer, was {type(random_state)}.") if ( n_components is not None and not (isinstance(n_components, int) and n_components >= 1) @@ -135,6 +140,9 @@ self.n_oversamples = n_oversamples self.power_iteration_normalizer = power_iteration_normalizer self.random_state = random_state + if self.random_state is not None: + # set random seed accordingly + ht.random.seed(self.random_state) # set future attributes to None to initialize those that will not be computed later on with None (e.g., explained_variance_ for svd_solver='hierarchical') self.components_ = None @@ -220,10 +228,15 @@ def fit(self, X: ht.DNDarray, y=None) -> Self: self.total_explained_variance_ratio_ = 1 - info.larray.item() ** 2 else: - # here one could add other computational backends - raise NotImplementedError( - f"The chosen svd_solver {self.svd_solver} is not yet implemented." + # compute SVD via "randomized" SVD + _, S, V = ht.linalg.rsvd( + X_centered, + self.n_components_, + n_oversamples=self.n_oversamples, + power_iter=self.iterated_power, ) + self.components_ = V.T + self.n_components_ = V.shape[1] self.n_samples_ = X.shape[0] self.noise_variance_ = None # not yet implemented @@ -265,3 +278,156 @@ def inverse_transform(self, X: ht.DNDarray) -> ht.DNDarray: ) return X @ self.components_ + self.mean_ + + +class IncrementalPCA(ht.TransformMixin, ht.BaseEstimator): + """ + Incremental Principal Component Analysis (PCA). + + This class allows for incremental updates of the PCA model. This is especially useful for large data sets that do not fit into memory. + + An example of how to apply this class is given in, e.g., `benchmarks/cb/decomposition.py`. + + Parameters + ---------- + n_components : int, optional + Number of components to keep. If `n_components` is not set, all components are kept (default). + copy : bool, default=True + In-place operations are not yet supported. Please set `copy=True`. + whiten : bool, default=False + Not yet supported. + batch_size : int, optional + Currently not needed and only added for API consistency and possible future extensions. + + Attributes + ---------- + components_ : DNDarray of shape (n_components, n_features) + Principal axes in feature space, representing the directions of maximum variance in the data. The components are sorted by `explained_variance_`.
+ singular_values_ : DNDarray of shape (n_components,) + The singular values corresponding to each of the selected components. + mean_ : DNDarray of shape (n_features,) + Per-feature empirical mean, estimated from the training set. + n_components_ : int + The estimated number of components. + n_samples_seen_ : int + Number of samples processed so far. + """ + + def __init__( + self, + n_components: Optional[int] = None, + copy: bool = True, + whiten: bool = False, + batch_size: Optional[int] = None, + ): + if not copy: + raise NotImplementedError( + "In-place operations for PCA are not supported at the moment. Please set copy=True." + ) + if whiten: + raise NotImplementedError("Whitening is not yet supported. Please set whiten=False.") + if n_components is not None: + if not isinstance(n_components, int): + raise TypeError( + f"n_components must be None or an integer, but is {type(n_components)}." + ) + else: + if n_components < 1: + raise ValueError("if an integer, n_components must be greater or equal to 1.") + self.whiten = whiten + self.n_components = n_components + self.batch_size = batch_size + self.components_ = None + # self.explained_variance_ = None # not yet supported + # self.explained_variance_ratio_ = None # not yet supported + self.singular_values_ = None + self.mean_ = None + self.n_components_ = None + self.batch_size_ = None + self.n_samples_seen_ = 0 + + def fit(self, X, y=None) -> Self: + """ + Not yet implemented; please use `.partial_fit` instead. + Please open an issue on GitHub if you would like to see this method implemented and make a suggestion on how you would like to see it implemented. + """ + raise NotImplementedError( + f"You have called IncrementalPCA's `.fit`-method with an argument of type {type(X)}. \n So far, we have only implemented the method `.partial_fit` which performs a single-step update of incremental PCA. \n Please consider using `.partial_fit` for the moment, and open an issue on GitHub in which we can discuss what you would like to see implemented for the `.fit`-method." + ) + + def partial_fit(self, X: ht.DNDarray, y=None): + """ + A single step of incrementally building up the PCA. + Input X is the current batch of data that needs to be added to the existing PCA. + """ + ht.sanitize_in(X) + if y is not None: + raise ValueError( + "Argument y is unused and only present for API consistency by convention; please pass y=None." + ) + if self.n_samples_seen_ == 0: + # this is the first batch of data, hence we need to initialize everything + if self.n_components is None: + self.n_components_ = min(X.shape) + else: + self.n_components_ = min(X.shape[0], X.shape[1], self.n_components) + + self.mean_ = X.mean(axis=0) + X_centered = X - self.mean_ + _, S, V = ht.linalg.svd(X_centered) + self.components_ = V[:, : self.n_components_].T + self.singular_values_ = S[: self.n_components_] + self.n_samples_seen_ = X.shape[0] + + else: + # if batches of data have already been seen before, only an update is necessary + U, S, mean = _isvd( + X.T, + self.components_.T, + self.singular_values_, + V_old=None, + maxrank=self.n_components, + old_matrix_size=self.n_samples_seen_, + old_rowwise_mean=self.mean_, + ) + self.components_ = U.T + self.singular_values_ = S + self.mean_ = mean + self.n_samples_seen_ += X.shape[0] + self.n_components_ = self.components_.shape[0] + + def transform(self, X: ht.DNDarray) -> ht.DNDarray: + """ + Apply dimensionality reduction based on PCA to X. + + Parameters + ---------- + X : DNDarray of shape (n_samples, n_features) + Data set to be transformed.
+ """ + ht.sanitize_in(X) + if X.shape[1] != self.mean_.shape[0]: + raise ValueError( + f"X must have the same number of features as the training data. Expected {self.mean_.shape[0]} but got {X.shape[1]}." + ) + + # center data and apply PCA + X_centered = X - self.mean_ + return X_centered @ self.components_.T + + def inverse_transform(self, X: ht.DNDarray) -> ht.DNDarray: + """ + Transform data back to its original space. + + Parameters + ---------- + X : DNDarray of shape (n_samples, n_components) + Data set to be transformed back. + """ + ht.sanitize_in(X) + if X.shape[1] != self.n_components_: + raise ValueError( + f"Dimension mismatch. Expected input of shape n_points x {self.n_components_} but got {X.shape}." + ) + + return X @ self.components_ + self.mean_ diff --git a/heat/decomposition/tests/test_dmd.py b/heat/decomposition/tests/test_dmd.py new file mode 100644 index 0000000000..9aa803d6c6 --- /dev/null +++ b/heat/decomposition/tests/test_dmd.py @@ -0,0 +1,589 @@ +import os +import unittest +import platform +import numpy as np +import torch +import heat as ht + +from ...core.tests.test_suites.basic_test import TestCase + +# MPS does not support non-float matrix multiplication +envar = os.getenv("HEAT_TEST_USE_DEVICE", "cpu") +is_mps = envar == "gpu" and platform.system() == "Darwin" + + +@unittest.skipIf(is_mps, "MPS does not support non-float matrix multiplication") +class TestDMD(TestCase): + def test_dmd_setup_catch_wrong(self): + # catch wrong inputs during setup + with self.assertRaises(TypeError): + ht.decomposition.DMD(svd_solver=0) + with self.assertRaises(ValueError): + ht.decomposition.DMD(svd_solver="Gramian") + with self.assertRaises(ValueError): + ht.decomposition.DMD(svd_solver="full", svd_rank=3, svd_tol=1e-1) + with self.assertRaises(ValueError): + ht.decomposition.DMD(svd_solver="full", svd_tol=-0.031415926) + with self.assertRaises(ValueError): + ht.decomposition.DMD(svd_solver="hierarchical") + with self.assertRaises(ValueError): + ht.decomposition.DMD(svd_solver="hierarchical", svd_rank=3, svd_tol=1e-1) + with self.assertRaises(ValueError): + ht.decomposition.DMD(svd_solver="randomized") + with self.assertRaises(ValueError): + ht.decomposition.DMD(svd_solver="randomized", svd_rank=2, svd_tol=1e-1) + with self.assertRaises(TypeError): + ht.decomposition.DMD(svd_solver="full", svd_rank=0.1) + with self.assertRaises(ValueError): + ht.decomposition.DMD(svd_solver="hierarchical", svd_rank=0) + with self.assertRaises(TypeError): + ht.decomposition.DMD(svd_solver="hierarchical", svd_tol="auto") + with self.assertRaises(ValueError): + ht.decomposition.DMD(svd_solver="randomized", svd_rank=0) + + def test_dmd_fit_catch_wrong(self): + dmd = ht.decomposition.DMD(svd_solver="full") + with self.assertRaises(ValueError): + dmd.fit(ht.zeros((5 * ht.MPI_WORLD.size, 2, 2), split=0)) + with self.assertRaises(ValueError): + dmd.fit(ht.zeros((5 * ht.MPI_WORLD.size, 1), split=0)) + + def test_dmd_predict_catch_wrong(self): + # not yet fitted + dmd = ht.decomposition.DMD(svd_solver="full") + with self.assertRaises(RuntimeError): + dmd.predict_next(ht.zeros(10)) + with self.assertRaises(RuntimeError): + dmd.predict(ht.zeros(10), 10) + + X = ht.random.randn(1000, 10 * ht.MPI_WORLD.size, split=0, dtype=ht.float32) + dmd = ht.decomposition.DMD(svd_solver="randomized", svd_rank=4) + dmd.fit(X) + # wrong shape of input for prediction + with self.assertRaises(ValueError): + dmd.predict_next(ht.zeros((100, 4), split=0)) + with self.assertRaises(ValueError): + dmd.predict(ht.zeros((100, 
4), split=0), 10) + # wrong input for steps in predict + with self.assertRaises(TypeError): + dmd.predict( + ht.zeros((1000, 5), split=0), + "this is clearly neither an integer nor a list of integers", + ) + # check catching wrong n_steps argument + with self.assertRaises(TypeError): + dmd.predict_next(X, "this is clearly not an integer") + # what has not been implemented so far + with self.assertRaises(NotImplementedError): + dmd.predict(ht.zeros((1000, 5), split=0), 10) + + def test_dmd_functionality_split0_full(self): + # split=0, full SVD + X = ht.random.randn(10 * ht.MPI_WORLD.size, 10, split=0) + dmd = ht.decomposition.DMD(svd_solver="full") + dmd.fit(X) + self.assertTrue(dmd.rom_eigenmodes_.dtype == ht.complex64) + self.assertEqual(dmd.rom_eigenmodes_.shape, (dmd.n_modes_, dmd.n_modes_)) + dmd = ht.decomposition.DMD(svd_solver="full", svd_tol=1e-1) + dmd.fit(X) + self.assertTrue(dmd.rom_basis_.shape[0] == 10 * ht.MPI_WORLD.size) + dmd = ht.decomposition.DMD(svd_solver="full", svd_rank=3) + dmd.fit(X) + self.assertTrue(dmd.rom_basis_.shape[1] == 3) + self.assertTrue(dmd.dmdmodes_.shape == (10 * ht.MPI_WORLD.size, 3)) + + def test_dmd_functionality_split0_hierarchical(self): + # split=0, hierarchical SVD + X = ht.random.randn(10 * ht.MPI_WORLD.size, 10, split=0) + dmd = ht.decomposition.DMD(svd_solver="hierarchical", svd_rank=3) + dmd.fit(X) + self.assertTrue(dmd.rom_eigenvalues_.shape == (3,)) + dmd = ht.decomposition.DMD(svd_solver="hierarchical", svd_tol=1e-1) + dmd.fit(X) + Y = ht.random.randn(10 * ht.MPI_WORLD.size, split=0) + Z = dmd.predict_next(Y) + self.assertTrue(Z.shape == (10 * ht.MPI_WORLD.size,)) + self.assertTrue(dmd.rom_eigenvalues_.dtype == ht.complex64) + self.assertTrue(dmd.dmdmodes_.dtype == ht.complex64) + + def test_dmd_functionality_split0_randomized(self): + # split=0, randomized SVD + X = ht.random.randn(1000, 10 * ht.MPI_WORLD.size, split=0, dtype=ht.float32) + dmd = ht.decomposition.DMD(svd_solver="randomized", svd_rank=4) + dmd.fit(X) + Y = ht.random.rand(1000, 2 * ht.MPI_WORLD.size, split=1, dtype=ht.float32) + Z = dmd.predict_next(Y, 2) + self.assertTrue(Z.dtype == ht.float32) + self.assertEqual(Z.shape, Y.shape) + Y = ht.random.rand(1000, split=0, dtype=ht.float32) + Z = dmd.predict_next(Y, 2) + self.assertTrue(Z.dtype == ht.float32) + self.assertEqual(Z.shape, Y.shape) + + def test_dmd_functionality_split1_full(self): + # split=1, full SVD + X = ht.random.randn(10, 10 * ht.MPI_WORLD.size, split=1, dtype=ht.float64) + dmd = ht.decomposition.DMD(svd_solver="full") + print(dmd) + dmd.fit(X) + print(dmd) + self.assertTrue(dmd.dmdmodes_.shape[0] == 10) + dmd = ht.decomposition.DMD(svd_solver="full", svd_tol=1e-1) + dmd.fit(X) + dmd = ht.decomposition.DMD(svd_solver="full", svd_rank=3) + dmd.fit(X) + self.assertTrue(dmd.dmdmodes_.shape[1] == 3) + + def test_dmd_functionality_split1_hierarchical(self): + # split=1, hierarchical SVD + X = ht.random.randn(10, 10 * ht.MPI_WORLD.size, split=1, dtype=ht.float64) + dmd = ht.decomposition.DMD(svd_solver="hierarchical", svd_rank=3) + dmd.fit(X) + self.assertTrue(dmd.rom_transfer_matrix_.shape == (3, 3)) + self.assertTrue(dmd.rom_transfer_matrix_.dtype == ht.float64) + dmd = ht.decomposition.DMD(svd_solver="hierarchical", svd_tol=1e-1) + dmd.fit(X) + self.assertTrue(dmd.rom_eigenvalues_.dtype == ht.complex128) + Y = ht.random.randn(10, 2 * ht.MPI_WORLD.size, split=1) + Z = dmd.predict_next(Y) + self.assertTrue(Z.shape == Y.shape) + + def test_dmd_functionality_split1_randomized(self): + # split=1, randomized SVD + X =
ht.random.randn(1000, 10 * ht.MPI_WORLD.size, split=0) + dmd = ht.decomposition.DMD(svd_solver="randomized", svd_rank=4) + dmd.fit(X) + self.assertTrue(dmd.rom_eigenmodes_.shape == (4, 4)) + self.assertTrue(dmd.n_modes_ == 4) + Y = ht.random.randn(1000, 2, split=0, dtype=ht.float64) + Z = dmd.predict_next(Y) + self.assertTrue(Z.dtype == Y.dtype) + self.assertEqual(Z.shape, Y.shape) + + def test_dmd_correctness_split0(self): + ht.random.seed(25032025) + # test correctness using a constructed example with known solution + # to do so we need to use the exact SVD, i.e., the "full" solver + r = 6 + A_red = ht.array( + [ + [0.0, -1.0, 0.0, 0.0, 0.0, 0.0], + [1.0, 0.0, 0.0, 0.0, 0.0, 0.0], + [0.0, 0.0, 1.5, 0.0, 0.0, 0.0], + [0.0, 0.0, 0.0, 0.5, 0.0, 0.0], + [0.0, 0.0, 0.0, 0.0, -1.5, 0.0], + [0.0, 0.0, 0.0, 0.0, 0.0, -0.5], + ], + split=None, + dtype=ht.float32, + ) + x0_red = ht.random.randn(r, 1, split=None) + m, n = 25 * ht.MPI_WORLD.size, 15 + X = ht.hstack( + [ + (ht.array(torch.linalg.matrix_power(A_red.larray, i) @ x0_red.larray)) + for i in range(n + 1) + ] + ) + U = ht.random.randn(m, r, split=0) + U, _ = ht.linalg.qr(U) + X = U @ X + + dmd = ht.decomposition.DMD(svd_solver="full", svd_rank=r) + dmd.fit(X) + + # check whether the DMD-modes are correct + sorted_ev_1 = np.sort_complex(dmd.rom_eigenvalues_.numpy()) + sorted_ev_2 = np.sort_complex(np.linalg.eigvals(A_red.numpy())) + self.assertTrue(np.allclose(sorted_ev_1, sorted_ev_2, atol=1e-3, rtol=1e-3)) + + # check prediction of next states + Y = dmd.predict_next(X) + self.assertTrue(ht.allclose(Y[:, :n], X[:, 1:], atol=1e-3, rtol=1e-3)) + + # check prediction of previous states + Y = dmd.predict_next(X, -1) + self.assertTrue(ht.allclose(Y[:, 1:], X[:, :n], atol=1e-3, rtol=1e-3)) + + def test_dmd_correctness_split1(self): + # dtype is float64, transfer matrix with nontrivial kernel + r = 3 + A_red = ht.array( + [[0.0, 0.0, 1.0], [0.5, 0.0, 0.0], [0.5, 0.0, 0.0]], split=None, dtype=ht.float64 + ) + x0_red = ht.random.randn(r, 1, split=None, dtype=ht.float64) + m, n = 10, 15 * ht.MPI_WORLD.size + 2 + X = ht.hstack( + [ + (ht.array(torch.linalg.matrix_power(A_red.larray, i) @ x0_red.larray)) + for i in range(n + 1) + ] + ) + U = ht.random.randn(m, r, split=None, dtype=ht.float64) + U, _ = ht.linalg.qr(U) + X = U @ X + X = X.resplit_(1) + + dmd = ht.decomposition.DMD(svd_solver="hierarchical", svd_rank=r) + dmd.fit(X) + + # check whether the DMD-modes are correct + sorted_ev_1 = np.sort_complex(dmd.rom_eigenvalues_.numpy()) + sorted_ev_2 = np.sort_complex(np.linalg.eigvals(A_red.numpy())) + self.assertTrue(np.allclose(sorted_ev_1, sorted_ev_2, atol=1e-12, rtol=1e-12)) + + # check prediction of third-next step + Y = dmd.predict_next(X, 3) + self.assertTrue(ht.allclose(Y[:, : n - 2], X[:, 3:], atol=1e-12, rtol=1e-12)) + # note: checking previous steps doesn't make sense here, as kernel of A_red is nontrivial + + # check batch prediction (split = 1) + X_batch = X[:, : 5 * ht.MPI_WORLD.size] + X_batch.balance_() + Y = dmd.predict(X_batch, 5) + Y_np = Y.numpy() + X_np = X.numpy() + for i in range(5): + self.assertTrue(np.allclose(Y_np[i, :, :5], X_np[:, i : i + 5], atol=1e-12, rtol=1e-12)) + + # check batch prediction (split = None) + X_batch = ht.random.rand(10, 2 * ht.MPI_WORLD.size, split=None) + Y = dmd.predict(X_batch, [-1, 1, 3]) + + +class TestDMDc(TestCase): + def test_dmdc_setup_catch_wrong(self): + # catch wrong inputs + with self.assertRaises(TypeError): + ht.decomposition.DMDc(svd_solver=0) + with
self.assertRaises(ValueError): + ht.decomposition.DMDc(svd_solver="Gramian") + with self.assertRaises(ValueError): + ht.decomposition.DMDc(svd_solver="full", svd_rank=3, svd_tol=1e-1) + with self.assertRaises(ValueError): + ht.decomposition.DMDc(svd_solver="full", svd_tol=-0.031415926) + with self.assertRaises(ValueError): + ht.decomposition.DMDc(svd_solver="hierarchical") + with self.assertRaises(ValueError): + ht.decomposition.DMDc(svd_solver="hierarchical", svd_rank=3, svd_tol=1e-1) + with self.assertRaises(ValueError): + ht.decomposition.DMDc(svd_solver="randomized") + with self.assertRaises(ValueError): + ht.decomposition.DMDc(svd_solver="randomized", svd_rank=2, svd_tol=1e-1) + with self.assertRaises(TypeError): + ht.decomposition.DMDc(svd_solver="full", svd_rank=0.1) + with self.assertRaises(ValueError): + ht.decomposition.DMDc(svd_solver="hierarchical", svd_rank=0) + with self.assertRaises(TypeError): + ht.decomposition.DMDc(svd_solver="hierarchical", svd_tol="auto") + with self.assertRaises(ValueError): + ht.decomposition.DMDc(svd_solver="randomized", svd_rank=0) + + def test_dmdc_fit_catch_wrong(self): + dmd = ht.decomposition.DMDc(svd_solver="full") + # wrong dimensions of input + with self.assertRaises(ValueError): + dmd.fit(ht.zeros((5 * ht.MPI_WORLD.size, 2, 2), split=0), ht.zeros((2, 4), split=0)) + with self.assertRaises(ValueError): + dmd.fit(ht.zeros((2, 4), split=0), ht.zeros((5 * ht.MPI_WORLD.size, 2, 2), split=0)) + # less than two timesteps + with self.assertRaises(ValueError): + dmd.fit(ht.zeros((5 * ht.MPI_WORLD.size, 1), split=0), ht.zeros((2, 4), split=0)) + with self.assertRaises(ValueError): + dmd.fit(ht.zeros((2, 4), split=0), ht.zeros((5 * ht.MPI_WORLD.size, 1), split=0)) + # inconsistent number of timesteps + with self.assertRaises(ValueError): + dmd.fit(ht.zeros((5 * ht.MPI_WORLD.size, 3), split=0), ht.zeros((2, 4), split=0)) + # predict before fit + with self.assertRaises(RuntimeError): + dmd.predict(ht.zeros((5 * ht.MPI_WORLD.size, 3), split=0), ht.zeros((2, 4), split=0)) + X = ht.random.randn(1000, 10 * ht.MPI_WORLD.size, split=0, dtype=ht.float32) + dmd = ht.decomposition.DMDc(svd_solver="randomized", svd_rank=4) + # split mismatch for X and C + C = ht.random.randn(10, 10 * ht.MPI_WORLD.size, split=1) + with self.assertRaises(ValueError): + dmd.fit(X, C) + + def test_dmdc_predict_catch_wrong(self): + X = ht.random.randn(1000, 10 * ht.MPI_WORLD.size, split=0, dtype=ht.float32) + dmd = ht.decomposition.DMDc(svd_solver="randomized", svd_rank=4) + C = ht.random.randn(10, 10 * ht.MPI_WORLD.size, split=None) + dmd.fit(X, C) + Y = ht.random.randn(1000, 10 * ht.MPI_WORLD.size, split=1) + # wrong dimensions of input for prediction + with self.assertRaises(ValueError): + dmd.predict(Y, ht.zeros((5, 5, 5), split=0)) + with self.assertRaises(ValueError): + dmd.predict(ht.zeros((5, 5, 5), split=0), C) + # wrong sizes for inputs in predict + with self.assertRaises(ValueError): + dmd.predict(Y, ht.zeros((10, 5), split=0)) + with self.assertRaises(ValueError): + dmd.predict(ht.zeros((1000, 5), split=0), C) + # wrong split for C + with self.assertRaises(ValueError): + dmd.predict(Y, ht.zeros((10, 5), split=1)) + # wrong shape for C + with self.assertRaises(ValueError): + dmd.predict(Y, ht.zeros((5, 5), split=None)) + + def test_dmdc_functionality_split0_full(self): + # split=0, full SVD + X = ht.random.randn(10 * ht.MPI_WORLD.size, 10, split=0) + C = ht.random.randn(10, 10, split=0) + dmd = ht.decomposition.DMDc(svd_solver="full") + print(dmd)
+ dmd.fit(X, C) + print(dmd) + self.assertTrue(dmd.rom_eigenmodes_.dtype == ht.complex64) + self.assertEqual(dmd.rom_eigenmodes_.shape, (dmd.n_modes_, dmd.n_modes_)) + dmd = ht.decomposition.DMDc(svd_solver="full", svd_tol=1e-1) + dmd.fit(X, C) + self.assertTrue(dmd.rom_basis_.shape[0] == 10 * ht.MPI_WORLD.size) + dmd = ht.decomposition.DMDc(svd_solver="full", svd_rank=3) + dmd.fit(X, C) + self.assertTrue(dmd.rom_basis_.shape[1] == 3) + self.assertTrue(dmd.dmdmodes_.shape == (10 * ht.MPI_WORLD.size, 3)) + + def test_dmdc_functionality_split0_hierarchical(self): + # split=0, hierarchical SVD + X = ht.random.randn(10 * ht.MPI_WORLD.size, 10, split=0) + C = ht.random.randn(10, 10, split=0) + dmd = ht.decomposition.DMDc(svd_solver="hierarchical", svd_rank=3) + dmd.fit(X, C) + self.assertTrue(dmd.rom_eigenvalues_.shape == (3,)) + dmd = ht.decomposition.DMDc(svd_solver="hierarchical", svd_tol=1e-1) + dmd.fit(X, C) + Y = ht.random.randn(3, 10 * ht.MPI_WORLD.size, split=1) + C = ht.random.randn(10, 5, split=None) + Z = dmd.predict(Y, C) + self.assertTrue(Z.shape == (3, 10 * ht.MPI_WORLD.size, 5)) + self.assertTrue(dmd.rom_eigenvalues_.dtype == ht.complex64) + self.assertTrue(dmd.dmdmodes_.dtype == ht.complex64) + + def test_dmdc_functionality_split0_randomized(self): + # split=0, randomized SVD + X = ht.random.randn(1000, 10 * ht.MPI_WORLD.size, split=0, dtype=ht.float32) + dmd = ht.decomposition.DMDc(svd_solver="randomized", svd_rank=4) + C = ht.random.randn(10, 10 * ht.MPI_WORLD.size, split=None) + dmd.fit(X, C) + Y = ht.random.rand(2 * ht.MPI_WORLD.size, 1000, split=0, dtype=ht.float32) + C = ht.random.rand(10, 5, split=None) + Z = dmd.predict(Y, C) + self.assertTrue(Z.dtype == ht.float32) + self.assertEqual(Z.shape, (2 * ht.MPI_WORLD.size, 1000, 5)) + + def test_dmdc_functionality_split1_full(self): + # split=1, full SVD + X = ht.random.randn(10, 15 * ht.MPI_WORLD.size, split=1, dtype=ht.float64) + C = ht.random.randn(2, 15 * ht.MPI_WORLD.size, split=1, dtype=ht.float64) + dmd = ht.decomposition.DMDc(svd_solver="full") + dmd.fit(X, C) + self.assertTrue(dmd.dmdmodes_.shape[0] == 10) + dmd = ht.decomposition.DMDc(svd_solver="full", svd_tol=1e-1) + dmd.fit(X, C) + dmd = ht.decomposition.DMDc(svd_solver="full", svd_rank=3) + dmd.fit(X, C) + self.assertTrue(dmd.dmdmodes_.shape[1] == 3) + + def test_dmdc_functionality_split1_hierarchical(self): + # split=1, hierarchical SVD + X = ht.random.randn(10, 15 * ht.MPI_WORLD.size, split=1, dtype=ht.float64) + C = ht.random.randn(2, 15 * ht.MPI_WORLD.size, split=1, dtype=ht.float64) + dmd = ht.decomposition.DMDc(svd_solver="hierarchical", svd_rank=3) + dmd.fit(X, C) + self.assertTrue(dmd.rom_transfer_matrix_.shape == (3, 3)) + self.assertTrue(dmd.rom_transfer_matrix_.dtype == ht.float64) + dmd = ht.decomposition.DMDc(svd_solver="hierarchical", svd_tol=1e-1) + dmd.fit(X, C) + self.assertTrue(dmd.rom_eigenvalues_.dtype == ht.complex128) + Y = ht.random.randn(10 * ht.MPI_WORLD.size, 10, split=0) + C = ht.random.randn(2, split=None) + Z = dmd.predict(Y, C) + self.assertTrue(Z.shape == (10 * ht.MPI_WORLD.size, 10, 1)) + + def test_dmdc_functionality_split1_randomized(self): + # split=1, randomized SVD + X = ht.random.randn(1000, 10 * ht.MPI_WORLD.size, split=0) + C = ht.random.randn(10, 10 * ht.MPI_WORLD.size, split=None) + dmd = ht.decomposition.DMDc(svd_solver="randomized", svd_rank=8) + dmd.fit(X, C) + self.assertTrue(dmd.rom_eigenmodes_.shape == (8, 8)) + self.assertTrue(dmd.n_modes_ == 8) + Y = ht.random.randn(1000, split=0, dtype=ht.float64) + Z = 
dmd.predict(Y, C) + self.assertTrue(Z.dtype == Y.dtype) + self.assertEqual(Z.shape, (1, 1000, 10 * ht.MPI_WORLD.size)) + + def test_dmdc_correctness_split0(self): + # check correctness using a constructed example with known solution, + # thus only the "full" solver is used + r = 3 + A_red = ht.array( + [ + [0.0, 1, 0.0], + [-1.0, 0.0, 0.0], + [0.0, 0.0, 0.1], + ], + split=None, + dtype=ht.float64, + ) + B_red = ht.array( + [ + [1.0, 0.0], + [0.0, -1.0], + [0.0, 1.0], + ], + split=None, + dtype=ht.float64, + ) + x0_red = ht.array( + [ + [ + 10.0, + ], + [ + 5.0, + ], + [ + -10.0, + ], + ], + split=None, + dtype=ht.float64, + ) + m, n = 10 * ht.MPI_WORLD.size, 10 + C = 0.1 * ht.ones((2, n), split=None, dtype=ht.float64) + X_red = [x0_red] + for k in range(n - 1): + X_red.append(A_red @ X_red[-1] + B_red @ C[:, k].reshape(-1, 1)) + X = ht.stack(X_red, axis=1).squeeze() + U = ht.random.randn(m, r, split=0, dtype=ht.float64) + U, _ = ht.linalg.qr(U) + X = U @ X + + dmd = ht.decomposition.DMDc(svd_solver="full", svd_rank=3) + dmd.fit(X, C) + + # check whether the DMD-modes are correct + sorted_ev_1 = np.sort_complex(dmd.rom_eigenvalues_.numpy()) + sorted_ev_2 = np.sort_complex(np.linalg.eigvals(A_red.numpy())) + self.assertTrue(np.allclose(sorted_ev_1, sorted_ev_2, atol=1e-12, rtol=1e-12)) + + # check if DMD fits the data correctly + X_red = dmd.rom_basis_.T @ X + X_res = ( + X_red[:, 1:] + - dmd.rom_transfer_matrix_ @ X_red[:, :-1] + - dmd.rom_control_matrix_ @ C[:, :-1] + ) + self.assertTrue(ht.max(ht.abs(X_res)) < 1e-10) + + # check predict + Y = dmd.predict(X[:, 0], C[:, :10]).squeeze() + + # check prediction of next states + Y_red = dmd.rom_basis_.T @ Y + Y_res = ( + Y_red[:, 1:] + - dmd.rom_transfer_matrix_ @ Y_red[:, :-1] + - dmd.rom_control_matrix_ @ C[:, :-1] + ) + self.assertTrue(ht.max(ht.abs(Y_res)) < 1e-10) + self.assertTrue(ht.allclose(Y[:, :], X[:, :10], atol=1e-10, rtol=1e-10)) + + def test_dmdc_correctness_split1(self): + # check correctness using a constructed example with known solution, + # thus only the "full" solver is used + A_red = ht.array( + [ + [ + 1.0, + 0.0, + 0.0, + 0.0, + 0.0, + ], + [ + 0.0, + 1.05, + 0.0, + 0.0, + 0.0, + ], + [ + 0.0, + 0.0, + -0.1, + 0.0, + 0.0, + ], + [ + 0.0, + 0.0, + 0.0, + 0.0, + 0.5, + ], + [ + 0.0, + 0.0, + 0.0, + -0.5, + 0.0, + ], + ], + split=None, + dtype=ht.float32, + ) + B_red = ht.array( + [ + [1.0, 0.0], + [0.0, 1.0], + [1.0, 0.0], + [0.0, 1.0], + [0.0, 0.0], + ], + split=None, + dtype=ht.float32, + ) + x0_red = ht.ones((5, 1), split=None, dtype=ht.float32) + n = 20 * ht.MPI_WORLD.size + C = 0.1 * ht.random.randn(2, n, split=None, dtype=ht.float32) + X_red = [x0_red] + for k in range(n - 1): + X_red.append(A_red @ X_red[-1] + B_red @ C[:, k].reshape(-1, 1)) + X = ht.stack(X_red, axis=1).squeeze() + X.resplit_(1) + + dmd = ht.decomposition.DMDc(svd_solver="full") + dmd.fit(X, C) + + # check whether the DMD-modes are correct + sorted_ev_1 = np.sort_complex(dmd.rom_eigenvalues_.numpy()) + sorted_ev_2 = np.sort_complex(np.linalg.eigvals(A_red.numpy())) + self.assertTrue(np.allclose(sorted_ev_1, sorted_ev_2, atol=1e-4, rtol=1e-4)) + + # check if DMD fits the data correctly + X_red = dmd.rom_basis_.T @ X + X_red.resplit_(None) + X_res = ( + X_red[:, 1:] + - dmd.rom_transfer_matrix_ @ X_red[:, :-1] + - dmd.rom_control_matrix_ @ C[:, :-1] + ) + self.assertTrue(ht.max(ht.abs(X_res)) < 1e-2) + + # check predict + Y = dmd.predict(X[:, 0], C).squeeze() + + # check prediction of next states + Y_red = dmd.rom_basis_.T @ Y + Y_res
= ( + Y_red[:, 1:] + - dmd.rom_transfer_matrix_ @ Y_red[:, :-1] + - dmd.rom_control_matrix_ @ C[:, :-1] + ) + self.assertTrue(ht.max(ht.abs(Y_res)) < 1e-2) + self.assertTrue(ht.allclose(Y[:, :], X[:, :], atol=1e-2, rtol=1e-2)) diff --git a/heat/decomposition/tests/test_pca.py b/heat/decomposition/tests/test_pca.py index ffe6d52750..361272ec91 100644 --- a/heat/decomposition/tests/test_pca.py +++ b/heat/decomposition/tests/test_pca.py @@ -20,10 +20,10 @@ def test_pca_setup(self): self.assertEqual(pca.whiten, False) self.assertEqual(pca.svd_solver, "hierarchical") self.assertEqual(pca.tol, None) - self.assertEqual(pca.iterated_power, "auto") + self.assertEqual(pca.iterated_power, 0) self.assertEqual(pca.n_oversamples, 10) self.assertEqual(pca.power_iteration_normalizer, "qr") - self.assertEqual(pca.random_state, 0) + self.assertEqual(pca.random_state, None) # check catching of invalid parameters # wrong withening @@ -115,7 +115,6 @@ def test_pca_with_hiearchical_rtol(self): and pca.total_explained_variance_ratio_ >= 0.0 and pca.total_explained_variance_ratio_ <= 1.0 ) - print(pca.total_explained_variance_ratio_) self.assertTrue(pca.total_explained_variance_ratio_ >= ratio) if ht.MPI_WORLD.size > 1: self.assertEqual(pca.explained_variance_, None) @@ -192,8 +191,147 @@ def test_pca_with_full_rtol(self): self.assertEqual(pca.noise_variance_, None) def test_pca_randomized(self): - pca = ht.decomposition.PCA(n_components=2, svd_solver="randomized") + rank = 2 + pca = ht.decomposition.PCA(n_components=rank, svd_solver="randomized") data = ht.random.randn(15 * ht.MPI_WORLD.size, 5, split=0) + + pca.fit(data) + self.assertEqual(pca.components_.shape, (rank, 5)) + self.assertEqual(pca.n_components_, rank) + self.assertEqual(pca.mean_.shape, (5,)) + if ht.MPI_WORLD.size > 1: - with self.assertRaises(NotImplementedError): - pca.fit(data) + self.assertEqual(pca.total_explained_variance_ratio_, None) + self.assertEqual(pca.noise_variance_, None) + self.assertEqual(pca.explained_variance_, None) + self.assertEqual(pca.explained_variance_ratio_, None) + self.assertEqual(pca.singular_values_, None) + + pca = ht.decomposition.PCA(n_components=None, svd_solver="randomized", random_state=1234) + self.assertEqual(ht.random.get_state()[1], 1234) + + +class TestIncrementalPCA(TestCase): + def test_incrementalpca_setup(self): + pca = ht.decomposition.IncrementalPCA(n_components=2) + + # check correct base classes + self.assertTrue(ht.is_estimator(pca)) + self.assertTrue(ht.is_transformer(pca)) + + # check correct default values + self.assertEqual(pca.n_components, 2) + self.assertEqual(pca.whiten, False) + self.assertEqual(pca.batch_size, None) + self.assertEqual(pca.components_, None) + self.assertEqual(pca.singular_values_, None) + self.assertEqual(pca.mean_, None) + self.assertEqual(pca.n_components_, None) + self.assertEqual(pca.batch_size_, None) + self.assertEqual(pca.n_samples_seen_, 0) + + # check catching of invalid parameters + # whitening and in-place are not yet supported + with self.assertRaises(NotImplementedError): + ht.decomposition.IncrementalPCA(whiten=True) + with self.assertRaises(NotImplementedError): + ht.decomposition.IncrementalPCA(copy=False) + # wrong n_components + with self.assertRaises(TypeError): + ht.decomposition.IncrementalPCA(n_components=0.9) + with self.assertRaises(ValueError): + ht.decomposition.IncrementalPCA(n_components=0) + + def test_incrementalpca_full_rank_reached_split0(self): + # full rank is reached, split = 0 + # dtype float32 + pca = 
ht.decomposition.IncrementalPCA() + data0 = ht.random.randn(150 * ht.MPI_WORLD.size, 2 * ht.MPI_WORLD.size + 1, split=0) + data1 = 1.0 + ht.random.rand(50 * ht.MPI_WORLD.size, 2 * ht.MPI_WORLD.size + 1, split=0) + data = ht.vstack([data0, data1]) + data0_np = data0.numpy() + data_np = data.numpy() + + # test partial_fit, step 0 + pca.partial_fit(data0) + self.assertEqual( + pca.components_.shape, (2 * ht.MPI_WORLD.size + 1, 2 * ht.MPI_WORLD.size + 1) + ) + self.assertEqual(pca.n_components_, 2 * ht.MPI_WORLD.size + 1) + self.assertEqual(pca.mean_.shape, (2 * ht.MPI_WORLD.size + 1,)) + self.assertEqual(pca.singular_values_.shape, (2 * ht.MPI_WORLD.size + 1,)) + self.assertEqual(pca.n_samples_seen_, 150 * ht.MPI_WORLD.size) + s0_np = np.linalg.svd(data0_np - data0_np.mean(axis=0), compute_uv=False, hermitian=False) + self.assertTrue(np.allclose(s0_np, pca.singular_values_.numpy())) + + # test partial_fit, step 1 + pca.partial_fit(data1) + self.assertEqual( + pca.components_.shape, (2 * ht.MPI_WORLD.size + 1, 2 * ht.MPI_WORLD.size + 1) + ) + self.assertEqual(pca.n_components_, 2 * ht.MPI_WORLD.size + 1) + self.assertTrue(ht.allclose(pca.mean_, ht.mean(data, axis=0))) + self.assertEqual(pca.singular_values_.shape, (2 * ht.MPI_WORLD.size + 1,)) + self.assertEqual(pca.n_samples_seen_, 200 * ht.MPI_WORLD.size) + s_np = np.linalg.svd(data_np - data_np.mean(axis=0), compute_uv=False, hermitian=False) + self.assertTrue(np.allclose(s_np, pca.singular_values_.numpy())) + + # test transform (only possible here, as in the next test truncation happens) + new_data = ht.random.rand(100, 2 * ht.MPI_WORLD.size + 1, split=1) + Y = pca.transform(new_data) + Z = pca.inverse_transform(Y) + self.assertTrue(ht.allclose(new_data, Z, atol=1e-4, rtol=1e-4)) + + def test_incrementalpca_truncation_happens_split1(self): + # full rank not reached, but truncation happens, split = 1 + # dtype float64 unless on MPS + dtype = ht.float64 if not self.is_mps else ht.float32 + pca = ht.decomposition.IncrementalPCA(n_components=15) + data0 = ht.random.randn(9, 100 * ht.MPI_WORLD.size + 1, split=1, dtype=dtype) + data1 = 1.0 + ht.random.rand(11, 100 * ht.MPI_WORLD.size + 1, split=1, dtype=dtype) + data = ht.vstack([data0, data1]) + data0_np = data0.numpy() + data_np = data.numpy() + + # test partial_fit, step 0 + pca.partial_fit(data0) + self.assertEqual(pca.components_.shape, (9, 100 * ht.MPI_WORLD.size + 1)) + self.assertEqual(pca.components_.dtype, dtype) + self.assertEqual(pca.n_components_, 9) + self.assertEqual(pca.mean_.shape, (100 * ht.MPI_WORLD.size + 1,)) + self.assertEqual(pca.mean_.dtype, dtype) + self.assertEqual(pca.singular_values_.shape, (9,)) + self.assertEqual(pca.singular_values_.dtype, dtype) + self.assertEqual(pca.n_samples_seen_, 9) + s0_np = np.linalg.svd(data0_np - data0_np.mean(axis=0), compute_uv=False, hermitian=False) + if not self.is_mps: + self.assertTrue(np.allclose(s0_np, pca.singular_values_.numpy(), atol=1e-12)) + + # test partial_fit, step 1 + # here actually truncation happens as we have rank 20 but n_components=15 + pca.partial_fit(data1) + self.assertEqual(pca.components_.shape, (15, 100 * ht.MPI_WORLD.size + 1)) + self.assertEqual(pca.n_components_, 15) + self.assertEqual(pca.mean_.shape, (100 * ht.MPI_WORLD.size + 1,)) + self.assertEqual(pca.singular_values_.shape, (15,)) + self.assertEqual(pca.n_samples_seen_, 20) + s_np = np.linalg.svd(data_np - data_np.mean(axis=0), compute_uv=False, hermitian=False) + self.assertTrue(np.allclose(s_np[:15], pca.singular_values_.numpy())) + + def 
test_incrementalpca_catch_wrong_inputs(self): + pca = ht.decomposition.IncrementalPCA(n_components=1) + data0 = ht.random.randn(15, 15, split=None) + + # fit is not yet implemented + with self.assertRaises(NotImplementedError): + pca.fit(data0) + # wrong input for partial_fit + with self.assertRaises(ValueError): + pca.partial_fit(data0, y="Why can't we get rid of this argument?") + + pca.partial_fit(data0) + # wrong inputs for transform and inverse transform + with self.assertRaises(ValueError): + pca.transform(ht.zeros((15, 16), split=None)) + with self.assertRaises(ValueError): + pca.inverse_transform(ht.zeros((17, 2), split=None)) diff --git a/heat/fft/tests/test_fft.py b/heat/fft/tests/test_fft.py index e3ff6bc0de..b0ecdc68b0 100644 --- a/heat/fft/tests/test_fft.py +++ b/heat/fft/tests/test_fft.py @@ -1,24 +1,41 @@ import numpy as np import torch import unittest +import platform +import os import heat as ht from heat.core.tests.test_suites.basic_test import TestCase torch_ihfftn = hasattr(torch.fft, "ihfftn") +# On MPS, FFTs only supported for MacOS 14+ +envar = os.getenv("HEAT_TEST_USE_DEVICE", "cpu") +is_mps = envar == "gpu" and platform.system() == "Darwin" + +@unittest.skipIf( + is_mps and int(platform.mac_ver()[0].split(".")[0]) < 14, + "FFT on Apple MPS only supported on MacOS 14+", +) class TestFFT(TestCase): def test_fft_ifft(self): + dtype = ht.float32 if self.is_mps else ht.float64 # 1D non-distributed - x = ht.random.randn(6, dtype=ht.float64) + x = ht.random.randn(6, dtype=dtype) y = ht.fft.fft(x) np_y = np.fft.fft(x.numpy()) + if self.is_mps: + np_y = np_y.astype(np.complex64) self.assertIsInstance(y, ht.DNDarray) self.assertEqual(y.shape, x.shape) - self.assert_array_equal(y, np_y) - backwards = ht.fft.ifft(y) - self.assertTrue(ht.allclose(backwards, x)) + if not self.is_mps: + # precision loss on imaginary part of single elements of MPS tensor + self.assert_array_equal(y, np_y) + # backwards transform buggy on MPS, see + # https://github.com/pytorch/pytorch/issues/124096 + backwards = ht.fft.ifft(y) + self.assertTrue(ht.allclose(backwards, x)) # 1D distributed x = ht.random.randn(6, split=0) @@ -28,10 +45,12 @@ def test_fft_ifft(self): self.assertIsInstance(y, ht.DNDarray) self.assertEqual(y.shape, np_y.shape) self.assertTrue(y.split == 0) - self.assert_array_equal(y, np_y) + if not self.is_mps: + # precision loss on imaginary part of single elements of MPS tensor + self.assert_array_equal(y, np_y) # n-D distributed - x = ht.random.randn(10, 8, 6, dtype=ht.float64, split=0) + x = ht.random.randn(10, 8, 6, dtype=dtype, split=0) # FFT along last axis n = 5 y = ht.fft.fft(x, n=n) @@ -39,7 +58,9 @@ def test_fft_ifft(self): self.assertIsInstance(y, ht.DNDarray) self.assertEqual(y.shape, np_y.shape) self.assertTrue(y.split == 0) - self.assert_array_equal(y, np_y) + if not self.is_mps: + # precision loss on imaginary part of single elements of MPS tensor + self.assert_array_equal(y, np_y) # FFT along distributed axis, n not None n = 8 @@ -48,10 +69,12 @@ def test_fft_ifft(self): self.assertIsInstance(y, ht.DNDarray) self.assertEqual(y.shape, np_y.shape) self.assertTrue(y.split == 0) - self.assert_array_equal(y, np_y) + if not self.is_mps: + # precision loss on imaginary part of single elements of MPS tensor + self.assert_array_equal(y, np_y) # complex input - x = x + 1j * ht.random.randn(10, 8, 6, dtype=ht.float64, split=0) + x = x + 1j * ht.random.randn(10, 8, 6, dtype=dtype, split=0) # FFT along last axis (distributed) x.resplit_(axis=2) y = ht.fft.fft(x, n=n) @@ -75,24 
+98,32 @@ def test_fft_ifft(self): ht.fft.fft(x, axis=(0, 1)) def test_fft2_ifft2(self): + dtype = ht.float32 if self.is_mps else ht.float64 # 2D FFT along non-split axes - x = ht.random.randn(3, 6, 6, split=0, dtype=ht.float64) + x = ht.random.randn(3, 6, 6, split=0, dtype=dtype) y = ht.fft.fft2(x) np_y = np.fft.fft2(x.numpy()) self.assertTrue(y.split == 0) self.assert_array_equal(y, np_y) - backwards = ht.fft.ifft2(y) - self.assertTrue(ht.allclose(backwards, x)) + if not self.is_mps: + # backwards transform buggy on MPS, see + # https://github.com/pytorch/pytorch/issues/124096 + backwards = ht.fft.ifft2(y) + self.assertTrue(ht.allclose(backwards, x)) # 2D FFT along split axes - x = ht.random.randn(10, 6, 6, split=0, dtype=ht.float64) + x = ht.random.randn(10, 6, 6, split=0, dtype=dtype) axes = (0, 1) y = ht.fft.fft2(x, axes=axes) np_y = np.fft.fft2(x.numpy(), axes=axes) self.assertTrue(y.split == 0) - self.assert_array_equal(y, np_y) - backwards = ht.fft.ifft2(y, axes=axes) - self.assertTrue(ht.allclose(backwards, x)) + if not self.is_mps: + # precision loss on imaginary part of single elements of MPS tensor + self.assert_array_equal(y, np_y) + # backwards transform buggy on MPS, see + # https://github.com/pytorch/pytorch/issues/124096 + backwards = ht.fft.ifft2(y, axes=axes) + self.assertTrue(ht.allclose(backwards, x)) # exceptions x = ht.arange(10, split=0) @@ -100,6 +131,7 @@ def test_fft2_ifft2(self): ht.fft.fft2(x) def test_fftn_ifftn(self): + dtype = ht.float32 if self.is_mps else ht.float64 # 1D non-distributed x = ht.random.randn(6) y = ht.fft.fftn(x) @@ -107,8 +139,11 @@ def test_fftn_ifftn(self): self.assertIsInstance(y, ht.DNDarray) self.assertEqual(y.shape, x.shape) self.assert_array_equal(y, np_y) - backwards = ht.fft.ifftn(y) - self.assertTrue(ht.allclose(backwards, x, atol=1e-7)) + if not self.is_mps: + # backwards transform buggy on MPS, see + # https://github.com/pytorch/pytorch/issues/124096 + backwards = ht.fft.ifftn(y) + self.assertTrue(ht.allclose(backwards, x, atol=1e-7)) # 1D distributed x = ht.random.randn(6, split=0) @@ -120,10 +155,10 @@ def test_fftn_ifftn(self): self.assert_array_equal(y, np_y) # n-D distributed - x = ht.random.randn(10, 8, 6, dtype=ht.float64, split=0) + x = ht.random.randn(10, 8, 6, dtype=dtype, split=0) # FFT along last 2 axes y = ht.fft.fftn(x, s=(6, 6)) - np_y = np.fft.fftn(x.numpy(), s=(6, 6)) + np_y = np.fft.fftn(x.numpy(), s=(6, 6), axes=(1, 2)) self.assertIsInstance(y, ht.DNDarray) self.assertEqual(y.shape, np_y.shape) self.assertTrue(y.split == 0) @@ -206,8 +241,11 @@ def test_fftshift_ifftshift(self): np_y = np.fft.fftshift(x.numpy()) self.assertEqual(y.shape, np_y.shape) self.assert_array_equal(y, np_y) - backwards = ht.fft.ifftshift(y) - self.assertTrue(ht.allclose(backwards, x)) + if not self.is_mps: + # backwards transform buggy on MPS, see + # https://github.com/pytorch/pytorch/issues/124096 + backwards = ht.fft.ifftshift(y) + self.assertTrue(ht.allclose(backwards, x)) # distributed # (following fftshift example from torch.fft) @@ -223,8 +261,10 @@ def test_fftshift_ifftshift(self): with self.assertRaises(IndexError): ht.fft.fftshift(x, axes=(0, 2)) + @unittest.skipIf(is_mps, "Insufficient precision on MPS") def test_hfft_ihfft(self): - x = ht.zeros((3, 5), split=0, dtype=ht.float64) + dtype = ht.float32 if self.is_mps else ht.float64 + x = ht.zeros((3, 5), split=0, dtype=dtype) edges = [1, 3, 7] for i, n in enumerate(edges): x[i] = ht.linspace(0, n, 5) @@ -237,8 +277,10 @@ def test_hfft_ihfft(self): reconstructed_x = 
ht.fft.hfft(inv_fft, n=n) self.assertEqual(reconstructed_x.shape[-1], n) + @unittest.skipIf(is_mps, "Insufficient precision on MPS") def test_hfft2_ihfft2(self): - x = ht.random.randn(10, 6, 6, dtype=ht.float64) + dtype = ht.float32 if self.is_mps else ht.float64 + x = ht.random.randn(10, 6, 6, dtype=dtype) if torch_ihfftn: inv_fft = ht.fft.ihfft2(x) reconstructed_x = ht.fft.hfft2(inv_fft, s=x.shape[-2:]) @@ -247,8 +289,10 @@ def test_hfft2_ihfft2(self): with self.assertRaises(NotImplementedError): ht.fft.ihfft2(x) + @unittest.skipIf(is_mps, "Insufficient precision on MPS") def test_hfftn_ihfftn(self): - x = ht.random.randn(10, 6, 6, dtype=ht.float64) + dtype = ht.float32 if self.is_mps else ht.float64 + x = ht.random.randn(10, 6, 6, dtype=dtype) if torch_ihfftn: inv_fft = ht.fft.ihfftn(x) reconstructed_x = ht.fft.hfftn(inv_fft, s=x.shape) @@ -260,36 +304,45 @@ def test_hfftn_ihfftn(self): ht.fft.ihfftn(x) def test_rfft_irfft(self): + dtype = ht.float32 if self.is_mps else ht.float64 # n-D distributed - x = ht.random.randn(10, 8, 3, dtype=ht.float64, split=0) + x = ht.random.randn(10, 8, 3, dtype=dtype, split=0) # FFT along last axis y = ht.fft.rfft(x) np_y = np.fft.rfft(x.numpy()) self.assertTrue(y.split == 0) self.assert_array_equal(y, np_y) - backwards = ht.fft.irfft(y, n=x.shape[-1]) - self.assertTrue(ht.allclose(backwards, x)) - backwards_no_n = ht.fft.irfft(y) - self.assertEqual(backwards_no_n.shape[-1], 2 * (y.shape[-1] - 1)) + if not self.is_mps: + # backwards transform buggy on MPS, see + # https://github.com/pytorch/pytorch/issues/124096 + backwards = ht.fft.irfft(y, n=x.shape[-1]) + self.assertTrue(ht.allclose(backwards, x)) + backwards_no_n = ht.fft.irfft(y) + self.assertEqual(backwards_no_n.shape[-1], 2 * (y.shape[-1] - 1)) # exceptions # complex input - x = x + 1j * ht.random.randn(10, 8, 3, dtype=ht.float64, split=0) + x = x + 1j * ht.random.randn(10, 8, 3, dtype=dtype, split=0) with self.assertRaises(TypeError): ht.fft.rfft(x) def test_rfftn_irfftn(self): + dtype = ht.float32 if self.is_mps else ht.float64 # n-D distributed - x = ht.random.randn(10, 8, 6, dtype=ht.float64, split=0) + x = ht.random.randn(10, 8, 6, dtype=dtype, split=0) # FFT along last 2 axes y = ht.fft.rfftn(x, axes=(1, 2)) np_y = np.fft.rfftn(x.numpy(), axes=(1, 2)) self.assertIsInstance(y, ht.DNDarray) self.assertEqual(y.shape, np_y.shape) self.assertTrue(y.split == 0) - self.assert_array_equal(y, np_y) - backwards = ht.fft.irfftn(y, s=x.shape[-2:]) - self.assertTrue(ht.allclose(backwards, x)) + if not self.is_mps: + # precision loss on imaginary part of single elements of MPS tensor + self.assert_array_equal(y, np_y) + # backwards transform buggy on MPS, see + # https://github.com/pytorch/pytorch/issues/124096 + backwards = ht.fft.irfftn(y, s=x.shape[-2:]) + self.assertTrue(ht.allclose(backwards, x)) # FFT along all axes # TODO: comment this out after merging indexing PR # y = ht.fft.rfftn(x) @@ -298,13 +351,14 @@ def test_rfftn_irfftn(self): # exceptions # complex input - x = x + 1j * ht.random.randn(10, 8, 6, dtype=ht.float64, split=0) + x = x + 1j * ht.random.randn(10, 8, 6, dtype=dtype, split=0) with self.assertRaises(TypeError): ht.fft.rfftn(x) def test_rfft2_irfft2(self): + dtype = ht.float32 if self.is_mps else ht.float64 # n-D distributed - x = ht.random.randn(4, 8, 6, dtype=ht.float64, split=0) + x = ht.random.randn(4, 8, 6, dtype=dtype, split=0) # FFT along last 2 axes y = ht.fft.rfft2(x, axes=(1, 2)) np_y = np.fft.rfft2(x.numpy(), axes=(1, 2)) @@ -313,5 +367,8 @@ def test_rfft2_irfft2(self): 
self.assertTrue(y.split == 0) self.assert_array_equal(y, np_y) - backwards = ht.fft.irfft2(y, s=x.shape[-2:]) - self.assertTrue(ht.allclose(backwards, x)) + if not self.is_mps: + # backwards transform buggy on MPS, see + # https://github.com/pytorch/pytorch/issues/124096 + backwards = ht.fft.irfft2(y, s=x.shape[-2:]) + self.assertTrue(ht.allclose(backwards, x)) diff --git a/heat/naive_bayes/gaussianNB.py b/heat/naive_bayes/gaussianNB.py index 9baaa50504..2cbb10cf08 100644 --- a/heat/naive_bayes/gaussianNB.py +++ b/heat/naive_bayes/gaussianNB.py @@ -108,7 +108,7 @@ def __check_partial_fit_first_call(self, classes: Optional[DNDarray] = None) -> set on :class:`GaussianNB`. """ if getattr(self, "classes_", None) is None and classes is None: - raise ValueError("classes must be passed on the first call " "to partial_fit.") + raise ValueError("classes must be passed on the first call to partial_fit.") elif classes is not None: unique_labels = classes @@ -273,7 +273,7 @@ def __partial_fit( raise ValueError("Sample weights must be 1D tensor") if sample_weight.shape != (n_samples,): raise ValueError( - f"sample_weight.shape == {sample_weight.shape}, expected {(n_samples, )}!" + f"sample_weight.shape == {sample_weight.shape}, expected {(n_samples,)}!" ) # If the ratio of data variance between dimensions is too small, it @@ -293,8 +293,12 @@ def __partial_fit( self.theta_ = ht.zeros((n_classes, n_features), dtype=x.dtype, device=x.device) self.sigma_ = ht.zeros((n_classes, n_features), dtype=x.dtype, device=x.device) + if x.larray.is_mps: + class_count_dtype = ht.float32 + else: + class_count_dtype = ht.types.promote_types(x.dtype, ht.float) self.class_count_ = ht.zeros( - (x.comm.size, n_classes), dtype=ht.float64, device=x.device, split=0 + (x.comm.size, n_classes), dtype=class_count_dtype, device=x.device, split=0 ) # Initialise the class prior # Take into account the priors @@ -305,7 +309,7 @@ def __partial_fit( priors = self.priors # Check that the provide prior match the number of classes if len(priors) != n_classes: - raise ValueError("Number of priors must match number of" " classes.") + raise ValueError("Number of priors must match number of classes.") # Check that the sum is 1 if not ht.isclose(priors.sum(), ht.array(1.0, dtype=priors.dtype)): raise ValueError("The sum of the priors should be 1.") @@ -316,7 +320,7 @@ def __partial_fit( else: # Initialize the priors to zeros for each class self.class_prior_ = ht.zeros( - len(self.classes_), dtype=ht.float64, split=None, device=x.device + len(self.classes_), dtype=class_count_dtype, split=None, device=x.device ) else: if x.shape[1] != self.theta_.shape[1]: diff --git a/heat/naive_bayes/tests/test_gaussiannb.py b/heat/naive_bayes/tests/test_gaussiannb.py index 57fe5122bc..3918c6d4a0 100644 --- a/heat/naive_bayes/tests/test_gaussiannb.py +++ b/heat/naive_bayes/tests/test_gaussiannb.py @@ -23,14 +23,16 @@ def test_get_and_set_params(self): self.assertEqual(1e-10, gnb.var_smoothing) def test_fit_iris(self): + if self.is_mps: + dtype = ht.float32 + else: + dtype = ht.float64 # load sklearn train/test sets and resulting probabilities - X_train = ht.load("heat/datasets/iris_X_train.csv", sep=";", dtype=ht.float64) - X_test = ht.load("heat/datasets/iris_X_test.csv", sep=";", dtype=ht.float64) + X_train = ht.load("heat/datasets/iris_X_train.csv", sep=";", dtype=dtype) + X_test = ht.load("heat/datasets/iris_X_test.csv", sep=";", dtype=dtype) y_train = ht.load("heat/datasets/iris_y_train.csv", sep=";", dtype=ht.int64).squeeze() y_test = 
ht.load("heat/datasets/iris_y_test.csv", sep=";", dtype=ht.int64).squeeze() - y_pred_proba_sklearn = ht.load( - "heat/datasets/iris_y_pred_proba.csv", sep=";", dtype=ht.float64 - ) + y_pred_proba_sklearn = ht.load("heat/datasets/iris_y_pred_proba.csv", sep=";", dtype=dtype) # test ht.GaussianNB from heat.naive_bayes import GaussianNB diff --git a/heat/nn/__init__.py b/heat/nn/__init__.py index 4bac4f4f23..7da9d072a3 100644 --- a/heat/nn/__init__.py +++ b/heat/nn/__init__.py @@ -1,5 +1,5 @@ """ -This is the heat.nn submodule. +Neural network submodule. It contains data parallel specific nn modules. It also includes all of the modules in the torch.nn namespace """ diff --git a/heat/nn/data_parallel.py b/heat/nn/data_parallel.py index 4f9ceee02c..a3a9a0a434 100644 --- a/heat/nn/data_parallel.py +++ b/heat/nn/data_parallel.py @@ -1,5 +1,5 @@ """ -This file is for the general data parallel neural network classes. +General data parallel neural network classes. """ import warnings @@ -312,7 +312,7 @@ def _reset_parameters(module: tnn.Module) -> None: class DataParallelMultiGPU(tnn.Module): """ - This creates data parallel networks local to each node using PyTorch's distributed class. This does NOT + Creates data parallel networks local to each node using PyTorch's distributed class. This does NOT do any global synchronizations. To make optimal use of this structure, use :func:`ht.optim.DASO `. Notes diff --git a/heat/optim/__init__.py b/heat/optim/__init__.py index 5e1cc5399e..fb7f869897 100644 --- a/heat/optim/__init__.py +++ b/heat/optim/__init__.py @@ -1,5 +1,5 @@ """ -This is the heat.optimizer submodule. +Optimizer module. It contains data parallel specific optimizers and learning rate schedulers. It also includes all of the optimizers and learning rate schedulers in the torch namespace diff --git a/heat/optim/dp_optimizer.py b/heat/optim/dp_optimizer.py index 5e45545349..d1f219588d 100644 --- a/heat/optim/dp_optimizer.py +++ b/heat/optim/dp_optimizer.py @@ -862,9 +862,7 @@ class DataParallelOptimizer: use blocking communications or not. will typically be overwritten by :func:`nn.DataParallel ` """ - def __init__( - self, torch_optimizer: torch.optim.Optimizer, blocking: bool = False - ): # noqa: D107 + def __init__(self, torch_optimizer: torch.optim.Optimizer, blocking: bool = False): # noqa: D107 self.torch_optimizer = torch_optimizer if not isinstance(blocking, bool): raise TypeError(f"blocking parameter must be a boolean, currently {type(blocking)}") diff --git a/heat/preprocessing/preprocessing.py b/heat/preprocessing/preprocessing.py index 7057b22f20..442ffff933 100644 --- a/heat/preprocessing/preprocessing.py +++ b/heat/preprocessing/preprocessing.py @@ -443,7 +443,7 @@ def inverse_transform(self, Y: ht.DNDarray) -> Union[Self, ht.DNDarray]: class RobustScaler(ht.TransformMixin, ht.BaseEstimator): """ - This scaler transforms the features of a given data set making use of statistics + Scales the features of a given data set making use of statistics that are robust to outliers: it removes the median and scales the data according to the quantile range (defaults to IQR: Interquartile Range); this routine is similar to ``sklearn.preprocessing.RobustScaler``. 
diff --git a/heat/py.typed b/heat/py.typed new file mode 100644 index 0000000000..e69de29bb2 diff --git a/heat/regression/lasso.py b/heat/regression/lasso.py index 7d99a72454..8e9b2d45b7 100644 --- a/heat/regression/lasso.py +++ b/heat/regression/lasso.py @@ -42,7 +42,7 @@ class Lasso(ht.RegressionMixin, ht.BaseEstimator): Examples -------- >>> X = ht.random.randn(10, 4, split=0) - >>> y = ht.random.randn(10,1, split=0) + >>> y = ht.random.randn(10, 1, split=0) >>> estimator = ht.regression.lasso.Lasso(max_iter=100, tol=None) >>> estimator.fit(X, y) """ diff --git a/heat/sparse/factories.py b/heat/sparse/factories.py index 0966785cdf..dbdf111f16 100644 --- a/heat/sparse/factories.py +++ b/heat/sparse/factories.py @@ -141,7 +141,7 @@ def sparse_csc_matrix( Create a :class:`~heat.sparse.DCSC_matrix` from :class:`torch.Tensor` (layout ==> torch.sparse_csc) >>> indptr = torch.tensor([0, 2, 3, 6]) >>> indices = torch.tensor([0, 2, 2, 0, 1, 2]) - >>> data = torch.tensor([1., 4., 5., 2., 3., 6.], dtype=torch.float) + >>> data = torch.tensor([1.0, 4.0, 5.0, 2.0, 3.0, 6.0], dtype=torch.float) >>> torch_sparse_csc = torch.sparse_csc_tensor(indptr, indices, data) >>> heat_sparse_csc = ht.sparse.sparse_csc_matrix(torch_sparse_csc, split=1) >>> heat_sparse_csc diff --git a/heat/sparse/manipulations.py b/heat/sparse/manipulations.py index 355b04cbd1..2199f32ebd 100644 --- a/heat/sparse/manipulations.py +++ b/heat/sparse/manipulations.py @@ -43,7 +43,11 @@ def __to_sparse(array: DNDarray, orientation="row") -> __DCSX_matrix: array.balance_() method = sparse_csr_matrix if orientation == "row" else sparse_csc_matrix result = method( - array.larray, dtype=array.dtype, is_split=array.split, device=array.device, comm=array.comm + array.larray, + dtype=array.dtype, + is_split=array.split, + device=array.device, + comm=array.comm, ) return result diff --git a/heat/sparse/tests/test_arithmetics_csr.py b/heat/sparse/tests/test_arithmetics_csr.py index aac8ced1d5..38f23062a5 100644 --- a/heat/sparse/tests/test_arithmetics_csr.py +++ b/heat/sparse/tests/test_arithmetics_csr.py @@ -4,14 +4,19 @@ import heat as ht import os +import platform import random from heat.core.tests.test_suites.basic_test import TestCase +envar = os.getenv("HEAT_TEST_USE_DEVICE", "cpu") +is_mps = envar == "gpu" and platform.system() == "Darwin" + + @unittest.skipIf( - int(torch.__version__.split(".")[0]) <= 1 and int(torch.__version__.split(".")[1]) < 12, - f"ht.sparse requires torch >= 1.12. Found version {torch.__version__}.", + is_mps, + "sparse_csr_tensor not supported on MPS (PyTorch 2.3)", ) class TestArithmeticsCSR(TestCase): @classmethod diff --git a/heat/sparse/tests/test_dcscmatrix.py b/heat/sparse/tests/test_dcscmatrix.py index 595ae483cc..22386d1444 100644 --- a/heat/sparse/tests/test_dcscmatrix.py +++ b/heat/sparse/tests/test_dcscmatrix.py @@ -1,4 +1,6 @@ import unittest +import os +import platform import heat as ht import torch @@ -6,10 +8,13 @@ from typing import Tuple +envar = os.getenv("HEAT_TEST_USE_DEVICE", "cpu") +is_mps = envar == "gpu" and platform.system() == "Darwin" + @unittest.skipIf( - int(torch.__version__.split(".")[0]) <= 1 and int(torch.__version__.split(".")[1]) < 12, - f"ht.sparse requires torch >= 2.0. 
Found version {torch.__version__}.", + is_mps, + "sparse_csr_tensor not supported on MPS (PyTorch 2.3)", ) class TestDCSC_matrix(TestCase): @classmethod diff --git a/heat/sparse/tests/test_dcsrmatrix.py b/heat/sparse/tests/test_dcsrmatrix.py index 6cf86ebf87..4f5b99df64 100644 --- a/heat/sparse/tests/test_dcsrmatrix.py +++ b/heat/sparse/tests/test_dcsrmatrix.py @@ -1,4 +1,6 @@ import unittest +import os +import platform import heat as ht import torch @@ -7,9 +9,13 @@ from typing import Tuple +envar = os.getenv("HEAT_TEST_USE_DEVICE", "cpu") +is_mps = envar == "gpu" and platform.system() == "Darwin" + + @unittest.skipIf( - int(torch.__version__.split(".")[0]) <= 1 and int(torch.__version__.split(".")[1]) < 12, - f"ht.sparse requires torch >= 1.12. Found version {torch.__version__}.", + is_mps, + "sparse_csr_tensor not supported on MPS (PyTorch 2.3)", ) class TestDCSR_matrix(TestCase): @classmethod diff --git a/heat/sparse/tests/test_factories.py b/heat/sparse/tests/test_factories.py index b9422f1d3f..84dd5e2b5d 100644 --- a/heat/sparse/tests/test_factories.py +++ b/heat/sparse/tests/test_factories.py @@ -1,14 +1,19 @@ import unittest +import os +import platform import heat as ht import torch import scipy from heat.core.tests.test_suites.basic_test import TestCase +envar = os.getenv("HEAT_TEST_USE_DEVICE", "cpu") +is_mps = envar == "gpu" and platform.system() == "Darwin" + @unittest.skipIf( - int(torch.__version__.split(".")[0]) <= 1 and int(torch.__version__.split(".")[1]) < 12, - f"ht.sparse requires torch >= 1.12. Found version {torch.__version__}.", + is_mps, + "sparse_csr_tensor not supported on MPS (PyTorch 2.3)", ) class TestFactories(TestCase): @classmethod diff --git a/heat/sparse/tests/test_manipulations.py b/heat/sparse/tests/test_manipulations.py index 1de090a871..97b5ab5ca9 100644 --- a/heat/sparse/tests/test_manipulations.py +++ b/heat/sparse/tests/test_manipulations.py @@ -1,13 +1,18 @@ import unittest +import os +import platform import heat as ht import torch from heat.core.tests.test_suites.basic_test import TestCase +envar = os.getenv("HEAT_TEST_USE_DEVICE", "cpu") +is_mps = envar == "gpu" and platform.system() == "Darwin" + @unittest.skipIf( - int(torch.__version__.split(".")[0]) <= 1 and int(torch.__version__.split(".")[1]) < 12, - f"ht.sparse requires torch >= 1.12. 
Found version {torch.__version__}.", + is_mps, + "sparse_csr_tensor not supported on MPS (PyTorch 2.3)", ) class TestManipulations(TestCase): @classmethod diff --git a/heat/spatial/distance.py b/heat/spatial/distance.py index 03579fbdb7..5a92e727b7 100644 --- a/heat/spatial/distance.py +++ b/heat/spatial/distance.py @@ -227,7 +227,7 @@ def _dist(X: DNDarray, Y: DNDarray = None, metric: Callable = _euclidian) -> DND If metric requires additional arguments, it must be handed over as a lambda function: ``lambda x, y: metric(x, y, **args)`` Notes - ------- + ----- If ``X.split=None`` and ``Y.split=0``, result will be ``split=1`` """ diff --git a/heat/spatial/tests/test_distances.py b/heat/spatial/tests/test_distances.py index d8ce1a44ca..d5769c2009 100644 --- a/heat/spatial/tests/test_distances.py +++ b/heat/spatial/tests/test_distances.py @@ -238,10 +238,11 @@ def test_cdist(self): result = ht.array(res, dtype=ht.float32, split=0) self.assertTrue(ht.allclose(d, result, atol=1e-8)) - B = A.astype(ht.float64) - d = ht.spatial.cdist(A, B, quadratic_expansion=False) - result = ht.array(res, dtype=ht.float64, split=0) - self.assertTrue(ht.allclose(d, result, atol=1e-8)) + if not self.is_mps: + B = A.astype(ht.float64) + d = ht.spatial.cdist(A, B, quadratic_expansion=False) + result = ht.array(res, dtype=ht.float64, split=0) + self.assertTrue(ht.allclose(d, result, atol=1e-8)) B = A.astype(ht.int16) d = ht.spatial.cdist(A, B, quadratic_expansion=False) @@ -257,7 +258,8 @@ def test_cdist(self): result = ht.array(res, dtype=ht.float32, split=0) self.assertTrue(ht.allclose(d, result, atol=1e-8)) - B = A.astype(ht.float64) - d = ht.spatial.cdist(B, quadratic_expansion=False) - result = ht.array(res, dtype=ht.float64, split=0) - self.assertTrue(ht.allclose(d, result, atol=1e-8)) + if not self.is_mps: + B = A.astype(ht.float64) + d = ht.spatial.cdist(B, quadratic_expansion=False) + result = ht.array(res, dtype=ht.float64, split=0) + self.assertTrue(ht.allclose(d, result, atol=1e-8)) diff --git a/heat/tests/test_cli.py b/heat/tests/test_cli.py new file mode 100644 index 0000000000..2979c9744b --- /dev/null +++ b/heat/tests/test_cli.py @@ -0,0 +1,56 @@ +from unittest.mock import patch +import argparse +from heat import cli +import io +import contextlib + +class TestCLI: + @patch("argparse.ArgumentParser.parse_args", return_value=argparse.Namespace(info=False)) + def test_cli_help(self, mock_parse_args): + stdout = io.StringIO() + with contextlib.redirect_stdout(stdout): + cli.cli() + + print(stdout.getvalue()) + assert "usage: heat [-h] [-i]" in stdout.getvalue() + + @patch("platform.platform") + @patch("mpi4py.MPI.Get_library_version") + @patch("torch.cuda.is_available") + @patch("torch.cuda.device_count") + @patch("torch.cuda.current_device") + @patch("torch.cuda.get_device_name") + @patch("torch.cuda.get_device_properties") + def test_platform_info( + self, + mock_get_device_properties, + mock_get_device_name, + mock_get_default_device, + mock_device_count, + mock_cuda_current_device, + mock_mpi_lib_version, + mock_platform, + ): + mock_platform.return_value = "Test Platform" + mock_mpi_lib_version.return_value = "Test MPI Library" + mock_cuda_current_device.return_value = True + mock_device_count.return_value = 1 + mock_get_default_device.return_value = "cuda:0" + mock_get_device_name.return_value = "Test Device" + mock_get_device_properties.return_value.total_memory = 1024**4 # 1TiB + + stdout_stream = io.StringIO() + with contextlib.redirect_stdout(stdout_stream): + cli.plaform_info() + stdout = 
stdout_stream.getvalue() + print(stdout) + assert "HeAT: Helmholtz Analytics Toolkit" in stdout + assert "Platform: Test Platform" in stdout + assert "mpi4py Version:" in stdout + assert "MPI Library Version: Test MPI Library" in stdout + assert "Torch Version:" in stdout + assert "CUDA Available: True" in stdout + assert "Device count: 1" in stdout + assert "Default device: cuda:0" in stdout + assert "Device name: Test Device" in stdout + assert "Device memory: 1024.0 GiB" in stdout diff --git a/heat/utils/data/_utils.py b/heat/utils/data/_utils.py index d0a80a9c1d..a20cd2fb09 100644 --- a/heat/utils/data/_utils.py +++ b/heat/utils/data/_utils.py @@ -1,4 +1,5 @@ """ +Data utilities module. This file contains functions which may be useful for certain datatypes, but are not test in the heat framework This file contains standalone utilities for data preparation which may be useful The functions contained within are not tested, nor actively supported diff --git a/heat/utils/data/datatools.py b/heat/utils/data/datatools.py index 6bc92f4b75..044195ccbe 100644 --- a/heat/utils/data/datatools.py +++ b/heat/utils/data/datatools.py @@ -214,7 +214,7 @@ def __init__( def __getitem__(self, index: Union[int, slice, tuple, list, torch.Tensor]) -> torch.Tensor: """ - This is the most basic form of getitem. As the dataset is often very specific to the dataset, + Basic form of __getitem__. As the dataset is often very specific to the dataset, this should be overwritten by the user. In this form it only gets the raw items from the data. """ if self.transforms: diff --git a/heat/utils/data/partial_dataset.py b/heat/utils/data/partial_dataset.py index 5b48d72efa..f06d496790 100644 --- a/heat/utils/data/partial_dataset.py +++ b/heat/utils/data/partial_dataset.py @@ -174,6 +174,7 @@ def Ishuffle(self): def __getitem__(self, index: Union[int, slice, List[int], torch.Tensor]) -> torch.Tensor: """ + Abstract __getitem__ method. This should be defined by the user at runtime. This function needs to be designed such that the data is in the 0th dimension and the indexes called are only in the 0th dim! """ diff --git a/heat/utils/data/spherical.py b/heat/utils/data/spherical.py index af57bf4637..133f25c89a 100644 --- a/heat/utils/data/spherical.py +++ b/heat/utils/data/spherical.py @@ -63,7 +63,7 @@ def create_clusters( The clusters are of the same size (quantitatively) and distributed evenly over the processes, unless cluster_weight is specified. 
Parameters - ------------ + ---------- n_samples: int Number of overall samples n_features: int @@ -146,7 +146,7 @@ def create_clusters( for k in range(n_clusters) ] local_data = torch.cat(local_data, dim=0) - rand_perm = torch.randperm(local_shape[0]) + rand_perm = torch.randperm(local_shape[0], device=device.torch_device) local_data = local_data[rand_perm, :] data = ht.DNDarray( local_data, diff --git a/heat/utils/data/tests/test_matrixgallery.py b/heat/utils/data/tests/test_matrixgallery.py index e7696c44c3..17390cb013 100644 --- a/heat/utils/data/tests/test_matrixgallery.py +++ b/heat/utils/data/tests/test_matrixgallery.py @@ -32,13 +32,14 @@ def test_hermitian(self): self.assertTrue(A_err <= 1e-6) for posdef in [True, False]: - # test complex double precision - A = ht.utils.data.matrixgallery.hermitian( - 20, dtype=ht.complex128, split=0, positive_definite=posdef - ) - A_err = ht.norm(A - A.T.conj().resplit_(A.split)) / ht.norm(A) - self.assertTrue(A.dtype == ht.complex128) - self.assertTrue(A_err <= 1e-12) + if not self.is_mps: + # test complex double precision + A = ht.utils.data.matrixgallery.hermitian( + 20, dtype=ht.complex128, split=0, positive_definite=posdef + ) + A_err = ht.norm(A - A.T.conj().resplit_(A.split)) / ht.norm(A) + self.assertTrue(A.dtype == ht.complex128) + self.assertTrue(A_err <= 1e-12) # test real datatype A = ht.utils.data.matrixgallery.hermitian( diff --git a/pyproject.toml b/pyproject.toml index 5306055560..5168e89f48 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -2,5 +2,178 @@ requires = ["setuptools"] build-backend = "setuptools.build_meta" -[tool.black] +[project] +name="heat" +dynamic = ["version"] +description="A framework for high-performance data analytics and machine learning." +readme = "README.md" +authors = [ + { name = "Markus Götz", email = "markus.goetz@kit.edu"}, + { name = "Charlotte Debus", email = "charlotte.debus@kit.edu"}, + { name = "Daniel Coquelin", email = "daniel.coquelin@kit.edu"}, + { name = "Kai Krajsek", email = "k.krajsek@fz-juelich.de"}, + { name = "Claudia Comito", email = "c.comito@fz-juelich.de"}, + { name = "Philipp Knechtges", email = "philipp.knechtges@dlr.de"}, + { name = "Björn Hagemeier", email = "b.hagemeier@fz-juelich.de"}, + { name = "Martin Siggel", email = "martin.siggel@dlr.de"}, + { name = "Achim Basermann", email = "achim.basermann@dlr.de"}, + { name = "Achim Streit", email = "achim.streit@kit.de"}, +] +maintainers = [ + { name = "Claudia Comito", email = "c.comito@fz-juelich.de"}, + { name = "Michael Tarnawa", email = "m.tarnawa@fz-juelich.de"}, + { name = "Fabian Hoppe", email = "f.hoppe@dlr.de"}, + { name = "Juan Pedro Gutiérrez Hermosillo Muriedas", email = "juan.muriedas@kit.edu"}, + { name = "Hakan Akdag", email = "hakan.akdag@dlr.de"}, + { name = "Berkant Palazoglu", email = "b.palazoglu@fz-juelich.de"} +] +license = "MIT" +license-files = ["LICENSE"] +keywords=["data", "analytics", "tensors", "distributed", "gpu"] +classifiers=[ + "Development Status :: 5 - Production/Stable", + "Programming Language :: Python :: 3.10", + "Programming Language :: Python :: 3.11", + "Programming Language :: Python :: 3.12", + "Programming Language :: Python :: 3.13", + "Intended Audience :: Science/Research", + "Topic :: Scientific/Engineering", + "Topic :: Scientific/Engineering :: Artificial Intelligence", + "Topic :: Scientific/Engineering :: Information Analysis", + "Topic :: Scientific/Engineering :: Mathematics", + "Typing :: Typed" +] + +requires-python = ">=3.10" + +dependencies = [ + "mpi4py>=3.0.0", + 
"torch~=2.0,<2.8.0", + "torchvision~=0.15", + "scipy~=1.14", +] + +[project.optional-dependencies] +## IO Modules +hdf5 = ["h5py>=2.8.0"] +netcdf = ["netCDF4>=1.5.6"] +zarr = ["zarr"] + +## Examples and tutorial +examples = [ + "scikit-learn~=0.24", + "matplotlib~=3.1", + "jupyter", + "ipyparallel", + "pillow" +] + +dev = [ + # QA + "pre-commit", + "ruff", + "mypy", + + # Testing + "pytest", + "coverage", + + # Benchmarking + "perun", +] + +docs = [ + "sphinx", + "sphinx_rtd_theme", + "sphinx-autoapi", + "nbsphinx", + "sphinx-autobuild", + "sphinx-copybutton", +] + +[project.scripts] +heat = "heat.cli:cli" + +[project.urls] +Homepage = "https://github.com/helmholtz-analytics/heat" +Documentation = "https://heat.readthedocs.io/" +Repository = "https://github.com/helmholtz-analytics/heat" +Issues = "https://github.com/helmholtz-analytics/heat/issues" +Changelog = "https://github.com/helmholtz-analytics/heat/blob/main/CHANGELOG.md" + +[tool.setuptools.packages.find] +where = ["."] +include = ["heat", "heat.*"] +exclude = ["*tests*", "*benchmarks*"] + + +[tool.setuptools.package-data] +datasets = ["*.csv", "*.h5", "*.nc"] +heat = ["py.typed"] + +[tool.setuptools.dynamic] +version = {attr = "heat.core.version.__version__"} + +# Mypy +[tool.mypy] +packages=["heat"] +python_version="3.10" +exclude=[ + 'test_\w+\.py$', + '^benchmarks/', + '^examples/' +] + +# Strict configuration from https://careers.wolt.com/en/blog/tech/professional-grade-mypy-configuration +disallow_untyped_defs = true +disallow_any_unimported = true +no_implicit_optional = true +check_untyped_defs=true +warn_return_any=true +show_error_codes =true +warn_unused_ignores=true +follow_imports = "normal" +follow_untyped_imports = true + +# Ignore most the errors now, focus only ont eh core module +ignore_errors=true + +[[tool.mypy.overrides]] +module = "heat.core.*" +ignore_errors=false + + +# Ruff +[tool.ruff] +target-version = "py310" +exclude = ["tutorials", "examples", "benchmarks", "scripts", "**/tests/", "doc", "docker"] line-length = 100 + +[tool.ruff.lint] +select = ["E", "F", "D", "W", "D417"] + +ignore = [ + "E203", + "E402", + "E501", + "F401", + "F403", + "D105", + "D107", + "D200", + "D203", + "D205", + "D212", + "D301", + "D400", + "D401", + "D402", + "D410", + "D415", +] + +[tool.ruff.lint.pydocstyle] +convention = "numpy" + +[tool.ruff.format] +docstring-code-format = true diff --git a/scripts/numpy_coverage_tables.py b/scripts/numpy_coverage_tables.py index e4d68e64ae..1f98c5d86a 100644 --- a/scripts/numpy_coverage_tables.py +++ b/scripts/numpy_coverage_tables.py @@ -534,6 +534,47 @@ numpy_functions.append(numpy_random_operations) headers[str(len(headers))] = "NumPy Random Operations" +# numpy fft operations +numpy_fft_operations = [ + "fft.fft", + "fft.ifft", + "fft.fft2", + "fft.ifft2", + "fft.fftn", + "fft.ifftn", + "fft.rfft", + "fft.irfft", + "fft.fftshift", + "fft.ifftshift", +] +numpy_functions.append(numpy_fft_operations) +headers[str(len(headers))] = "NumPy FFT Operations" + +# numpy masked array operations +numpy_masked_array_operations = [ + "ma.masked_array", + "ma.masked_where", + "ma.fix_invalid", + "ma.is_masked", + "ma.mean", + "ma.median", + "ma.std", + "ma.var", + "ma.sum", + "ma.min", + "ma.max", + "ma.ptp", + "ma.count", + "ma.any", + "ma.all", + "ma.masked_equal", + "ma.masked_greater", + "ma.masked_less", + "ma.notmasked_contiguous", +] +numpy_functions.append(numpy_masked_array_operations) +headers[str(len(headers))] = "NumPy Masked Array Operations" + # initialize markdown file # open the 
file in write mode f = open("coverage_tables.md", "w") @@ -558,20 +599,23 @@ # Check if functions exist in the heat library and create table rows for func_name in function_list: - if ( - hasattr(heat, func_name) + if (hasattr(heat, func_name) or hasattr(heat.linalg, func_name.replace("linalg.", "")) or hasattr(heat.random, func_name.replace("random.", "")) + or (hasattr(heat, "fft") and hasattr(heat.fft, func_name.replace("fft.", ""))) + or (hasattr(heat, "ma") and hasattr(heat.ma, func_name.replace("ma.", ""))) ): support_status = "✅" # Green checkmark for supported functions else: support_status = "❌" # Red cross for unsupported functions - table_row = f"| {func_name} | {support_status} |" + # Create the issue search URL and add it to the row + issue_url = f"https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+{func_name}" + table_row = f"| {func_name} | {support_status} | [Search]({issue_url}) |" table_rows.append(table_row) - # Create the Markdown table header - table_header = f"| {headers[str(i)]} | Heat |\n|---|---|\n" + # Create the Markdown table header with the added "Issues" column + table_header = f"| {headers[str(i)]} | Heat | Issues |\n|---|---|---|\n" # Combine the header and table rows markdown_table = table_header + "\n".join(table_rows) diff --git a/setup.cfg b/setup.cfg deleted file mode 100644 index 70371c49dd..0000000000 --- a/setup.cfg +++ /dev/null @@ -1,14 +0,0 @@ -[metadata] -description_file = README.md - -[pycodestyle] -max-line-length = 100 -ignore = E203,E402,W503 - -[flake8] -max-line-length = 100 -ignore = E203,E402,W503,E501,F403,F401 - -[pydocstyle] -add-select = D417 -add-ignore = D105, D107, D200, D203, D205, D212, D301, D400, D401, D402, D410, D415 diff --git a/setup.py b/setup.py deleted file mode 100644 index 2fd4d61363..0000000000 --- a/setup.py +++ /dev/null @@ -1,52 +0,0 @@ -from setuptools import setup, find_packages -import codecs - - -with codecs.open("README.md", "r", "utf-8") as handle: - long_description = handle.read() - -__version__ = None # appeases flake, assignment in exec() below -with open("./heat/core/version.py") as handle: - exec(handle.read()) - -setup( - name="heat", - packages=find_packages(exclude=("*tests*", "*benchmarks*")), - package_data={"heat.datasets": ["*.csv", "*.h5", "*.nc"]}, - version=__version__, - description="A framework for high-performance data analytics and machine 
learning.", - long_description=long_description, - long_description_content_type="text/markdown", - author="Helmholtz Association", - author_email="martin.siggel@dlr.de", - url="https://github.com/helmholtz-analytics/heat", - keywords=["data", "analytics", "tensors", "distributed", "gpu"], - python_requires=">=3.9", - classifiers=[ - "Development Status :: 4 - Beta", - "Programming Language :: Python :: 3.9", - "Programming Language :: Python :: 3.10", - "Programming Language :: Python :: 3.11", - "Programming Language :: Python :: 3.12", - "License :: OSI Approved :: MIT License", - "Intended Audience :: Science/Research", - "Topic :: Scientific/Engineering", - ], - install_requires=[ - "mpi4py>=3.0.0", - "numpy>=1.22.0, <2", - "torch>=2.0.0, <2.6.1", - "scipy>=1.10.0", - "pillow>=6.0.0", - "torchvision>=0.15.2, <0.21.1", - ], - extras_require={ - "docutils": ["docutils>=0.16"], - "hdf5": ["h5py>=2.8.0"], - "netcdf": ["netCDF4>=1.5.6"], - "dev": ["pre-commit>=1.18.3"], - "examples": ["scikit-learn>=0.24.0", "matplotlib>=3.1.0"], - "cb": ["perun>=0.2.0"], - "pandas": ["pandas>=1.4"], - }, -) diff --git a/tutorials/hpc/2_basics.ipynb b/tutorials/hpc/2_basics.ipynb deleted file mode 120000 index 68f73c480c..0000000000 --- a/tutorials/hpc/2_basics.ipynb +++ /dev/null @@ -1 +0,0 @@ -../local/2_basics.ipynb \ No newline at end of file diff --git a/tutorials/hpc/3_internals.ipynb b/tutorials/hpc/3_internals.ipynb deleted file mode 120000 index 4105ea65c6..0000000000 --- a/tutorials/hpc/3_internals.ipynb +++ /dev/null @@ -1 +0,0 @@ -../local/3_internals.ipynb \ No newline at end of file diff --git a/tutorials/hpc/4_loading_preprocessing.ipynb b/tutorials/hpc/4_loading_preprocessing.ipynb deleted file mode 120000 index c2010bb811..0000000000 --- a/tutorials/hpc/4_loading_preprocessing.ipynb +++ /dev/null @@ -1 +0,0 @@ -../local/4_loading_preprocessing.ipynb \ No newline at end of file diff --git a/tutorials/hpc/5_matrix_factorizations.ipynb b/tutorials/hpc/5_matrix_factorizations.ipynb deleted file mode 120000 index 41ae51349c..0000000000 --- a/tutorials/hpc/5_matrix_factorizations.ipynb +++ /dev/null @@ -1 +0,0 @@ -../local/5_matrix_factorizations.ipynb \ No newline at end of file diff --git a/tutorials/hpc/6_clustering.ipynb b/tutorials/hpc/6_clustering.ipynb deleted file mode 120000 index 8668389f7e..0000000000 --- a/tutorials/hpc/6_clustering.ipynb +++ /dev/null @@ -1 +0,0 @@ -../local/6_clustering.ipynb \ No newline at end of file diff --git a/tutorials/local/2_basics.ipynb b/tutorials/local/2_basics.ipynb deleted file mode 100644 index 834169c76e..0000000000 --- a/tutorials/local/2_basics.ipynb +++ /dev/null @@ -1,780 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Heat Basics\n", - "\n", - "We have started an `ipcluster` with 4 engines at the end of the [Intro notebook](1_intro.ipynb).\n", - "\n", - "Let's start the interactive session with a look into the `heat` data object. But first, we need to import the `ipyparallel` client." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from ipyparallel import Client\n", - "rc = Client(profile=\"default\")\n", - "rc.ids\n", - "\n", - "if len(rc.ids) == 0:\n", - " print(\"No engines found\")\n", - "else:\n", - " print(f\"{len(rc.ids)} engines found\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We will always start `heat` cells with the `%%px` magic command to execute the cell on all engines. 
However, the first section of this tutorial doesn't deal with distributed arrays. In these cases, we will use the `%%px --target 0` magic command to execute the cell only on the first engine." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### DNDarrays\n", - "\n", - "\n", - "Similar to a NumPy `ndarray`, a Heat `dndarray` (we'll get to the `d` later) is a grid of values of a single (one particular) type. The number of dimensions is the number of axes of the array, while the shape of an array is a tuple of integers giving the number of elements of the array along each dimension. \n", - "\n", - "Heat emulates NumPy's API as closely as possible, allowing for the use of well-known **array creation functions**." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "%%px \n", - "import heat as ht\n", - "a = ht.array([1, 2, 3])\n", - "a\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "a = ht.ones((4, 5,))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "ht.arange(10)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "ht.full((3, 2,), fill_value=9)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Data Types\n", - "\n", - "Heat supports various data types and operations to retrieve and manipulate the type of a Heat array. However, in contrast to NumPy, Heat is limited to logical (bool) and numerical types (uint8, int16/32/64, float32/64, and complex64/128). \n", - "\n", - "**NOTE:** by default, Heat will allocate floating-point values in single precision, due to a much higher processing performance on GPUs. This is one of the main differences between Heat and NumPy." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "a = ht.zeros((3, 4,))\n", - "a" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "b = a.astype(ht.int64)\n", - "b" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Operations\n", - "\n", - "Heat supports many mathematical operations, ranging from simple element-wise functions, binary arithmetic operations, and linear algebra, to more powerful reductions. Operations are by default performed on the entire array or they can be performed along one or more of its dimensions when available. Most relevant for data-intensive applications is that **all Heat functionalities support memory-distributed computation and GPU acceleration**. This holds for all operations, including reductions, statistics, linear algebra, and high-level algorithms. \n", - "\n", - "You can try out the few simple examples below if you want, but we will skip to the [Parallel Processing](#Parallel-Processing) section to see memory-distributed operations in action." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "a = ht.full((3, 4,), 8)\n", - "b = ht.ones((3, 4,))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "a + b" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "ht.sub(a, b)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "ht.arange(5).sin()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "a.T" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "b.sum(axis=1)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "Heat implements the same broadcasting rules (implicit repetion of an operation when the rank/shape of the operands do not match) as NumPy does, e.g.:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "ht.arange(10) + 3" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "a = ht.ones((3, 4,))\n", - "b = ht.arange(4)\n", - "c = a + b\n", - "\n", - "a, b, c" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Indexing\n", - "\n", - "Heat allows the indexing of arrays, and thereby, the extraction of a partial view of the elements in an array. It is possible to obtain single values as well as entire chunks, i.e. slices." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "a = ht.arange(10)\n", - "a" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "a[3]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "a[1:7]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "a[::2]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**NOTE:** Indexing in Heat is undergoing a [major overhaul](https://github.com/helmholtz-analytics/heat/pull/938), to increase interoperability with NumPy/PyTorch indexing, and to provide a fully distributed item setting functionality. Stay tuned for this feature in the next release." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Documentation\n", - "\n", - "Heat is extensively documented. You may find the online API reference on Read the Docs: [Heat Documentation](https://heat.readthedocs.io/). It is also possible to look up the docs in an interactive session." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "help(ht.sum)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Parallel Processing\n", - "---\n", - "\n", - "Heat's actual power lies in the possibility to exploit the processing performance of modern accelerator hardware (GPUs) as well as distributed (high-performance) cluster systems. 
All operations executed on CPUs are, to a large extent, vectorized (AVX) and thread-parallelized (OpenMP). Heat builds on PyTorch, so it supports GPU acceleration on Nvidia and AMD GPUs. \n", - "\n", - "For distributed computations, your system or laptop needs to have Message Passing Interface (MPI) installed. For GPU computations, your system needs to have one or more suitable GPUs and an (MPI-aware) CUDA/ROCm ecosystem.\n", - "\n", - "**NOTE:** The GPU examples below will only properly execute on a computer with a GPU. Make sure to either start the notebook on an appropriate machine or copy and paste the examples into a script and execute it on a suitable device." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### GPUs\n", - "\n", - "Heat's array creation functions all support an additional parameter that places the data on a specific device. By default, the CPU is selected, but it is also possible to directly allocate the data on a GPU." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "<div class=\"alert alert-block alert-info\">
\n", - "The following cells will only work if you have a GPU available.\n", - "\n", - "
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "ht.zeros((3, 4,), device='gpu')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Arrays on the same device can be seamlessly used in any Heat operation." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "a = ht.zeros((3, 4,), device='gpu')\n", - "b = ht.ones((3, 4,), device='gpu')\n", - "a + b" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "However, performing operations on arrays with mismatching devices will purposefully result in an error (due to potentially large copy overhead)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "a = ht.full((3, 4,), 4, device='cpu')\n", - "b = ht.ones((3, 4,), device='gpu')\n", - "a + b" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "It is possible to explicitly move an array from one device to the other and back to avoid this error." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "a = ht.full((3, 4,), 4, device='gpu')\n", - "a.cpu()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We'll put our multi-GPU setup to the test in the next section." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Distributed Computing\n", - "\n", - "Heat is also able to make use of distributed processing capabilities such as those in high-performance cluster systems. For this, Heat exploits the fact that the operations performed on a multi-dimensional array are usually identical for all data items. Hence, a data-parallel processing strategy can be chosen, where the total number of data items is equally divided among all processing nodes. An operation is then performed individually on the local data chunks and, if necessary, communicates partial results behind the scenes. A Heat array assumes the role of a virtual overlay of the local chunks and realizes and coordinates the computations - see the figure below for a visual representation of this concept.\n", - "\n", - "\n", - "\n", - "The chunks are always split along a singular dimension (i.e. 1-D domain decomposition) of the array. You can specify this in Heat by using the `split` paramter. This parameter is present in all relevant functions, such as array creation (`zeros(), ones(), ...`) or I/O (`load()`) functions. \n", - "\n", - "\n", - "\n", - "\n", - "Examples are provided below. The result of an operation on a Heat tensor will in most cases preserve the split of the respective operands. However, in some cases the split axis might change. For example, a transpose of a Heat array will equally transpose the split axis. Furthermore, a reduction operations, e.g. `sum()` that is performed across the split axis, might remove data partitions entirely. The respective function behaviors can be found in Heat's documentation.\n", - "\n", - "You may also modify the data partitioning of a Heat array by using the `resplit()` function. This allows you to repartition the data as you so choose. Please note, that this should be used sparingly and for small data amounts only, as it entails significant data copying across the network. Finally, a Heat array without any split, i.e. 
`split=None` (default), will result in redundant copies of data on each computation node.\n", - "\n", - "On a technical level, Heat follows the so-called [Bulk Synchronous Parallel (BSP)](https://en.wikipedia.org/wiki/Bulk_synchronous_parallel) processing model. For the network communication, Heat utilizes the [Message Passing Interface (MPI)](https://computing.llnl.gov/tutorials/mpi/), a *de facto* standard on modern high-performance computing systems. It is also possible to use MPI on your laptop or desktop computer. Respective software packages are available for all major operating systems. In order to run a Heat script, you need to start it slightly differently than you are probably used to. This\n", - "\n", - "```bash\n", - "python ./my_script.py\n", - "```\n", - "\n", - "becomes this instead:\n", - "\n", - "```bash\n", - "mpirun -n <number_of_processes> python ./my_script.py\n", - "```\n", - "On an HPC cluster you'll of course use `sbatch` or similar.\n", - "\n", - "\n", - "Let's see some examples of working with distributed Heat:" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In the following examples, we'll recreate the array shown in the figure, a 3-dimensional DNDarray of integers ranging from 0 to 59 (5 matrices of size (4,3)). " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "import heat as ht\n", - "dndarray = ht.arange(60).reshape(5,4,3)\n", - "dndarray" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Notice the additional metadata printed with the DNDarray. Compared to a NumPy `ndarray`, the DNDarray has additional information on the device (in this case, the CPU) and the `split` axis. In the example above, the split axis is `None`, meaning that the DNDarray is not distributed and each MPI process has a full copy of the data.\n", - "\n", - "Let's experiment with a distributed DNDarray: we'll create the same DNDarray as above, but distribute it along the major axis." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "dndarray = ht.arange(60, split=0).reshape(5,4,3)\n", - "dndarray" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The `split` axis is now 0, meaning that the DNDarray is distributed along the first axis. Each MPI process has a slice of the data along the first axis. In order to see the data on each process, we can print the \"local array\" via the `larray` attribute." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "dndarray.larray" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Note that the `larray` is a `torch.Tensor` object. This is the underlying tensor that holds the data. The `dndarray` object is an MPI-aware wrapper around these process-local tensors, providing memory-distributed functionality and information." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The DNDarray can be distributed along any axis. Modify the `split` attribute when creating the DNDarray in the cell above, to distribute it along a different axis, and see how the `larray`s change. You'll notice that the distributed arrays are always load-balanced, meaning that the data are distributed as evenly as possible across the MPI processes."
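, - "\n", - "A compact way to see this without editing the cell above (a small sketch, not part of the original notebook; it reuses the running `%%px` engines and the `ht` import):\n", - "\n", - "```python\n", - "%%px\n", - "for split in (None, 0, 1, 2):\n", - "    x = ht.zeros((5, 4, 3), split=split)\n", - "    print(f\"split={split}: local shape {x.lshape}\")  # local extent shrinks along the split axis only\n", - "```"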
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The `DNDarray` object has a number of methods and attributes that are useful for distributed computing. In particular, it keeps track of its global and local (on a given process) shape through distributed operations and array manipulations. The DNDarray is also associated with a `comm` object, the MPI communicator.\n", - "\n", - "(In MPI, the *communicator* is a group of processes that can communicate with each other. By default, the `comm` object is the `MPI.COMM_WORLD` communicator, which includes all processes. The `comm` object is used to perform collective operations, such as reductions, scatter, gather, and broadcast, as well as point-to-point communication between processes.)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "print(f\"Global shape of the dndarray: {dndarray.shape}\")\n", - "print(f\"On rank {dndarray.comm.rank}/{dndarray.comm.size}, local shape of the dndarray: {dndarray.lshape}\")\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can perform a vast number of operations on DNDarrays distributed over multi-node and/or multi-GPU resources. Check out our [Numpy coverage tables](https://github.com/helmholtz-analytics/heat/blob/main/coverage_tables.md) to see what operations are already supported. \n", - "\n", - "The result of an operation on DNDarrays will in most cases preserve the `split` or distribution axis of the respective operands. However, in some cases the split axis might change. For example, a transpose of a Heat array will equally transpose the split axis. Furthermore, a reduction operation, e.g. `sum()`, that is performed across the split axis might remove data partitions entirely. The respective function behaviors can be found in Heat's documentation." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px \n", - "# transpose \n", - "dndarray.T\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "# reduction operation along the distribution axis\n", - "%timeit -n 1 dndarray.sum(axis=0)\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px \n", - "# reduction operation along non-distribution axis: no communication required\n", - "%timeit -n 1 dndarray.sum(axis=1)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Operations between tensors with equal split or no split are fully parallelizable and therefore very fast." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "other_dndarray = ht.arange(60,120, split=0).reshape(5,4,3) # distributed reshape\n", - "\n", - "# element-wise multiplication\n", - "dndarray * other_dndarray\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "As we saw earlier, because the underlying data objects are PyTorch tensors, we can easily create DNDarrays on GPUs or move DNDarrays to GPUs. This allows us to perform distributed array operations on multi-GPU systems.\n", - "\n", - "So far we have demonstrated small, easy-to-parallelize arithmetic operations. Let's move to linear algebra. 
Heat's `linalg` module supports a wide range of linear algebra operations, including matrix multiplication. Matrix multiplication is a very common operation in data analysis; it is computationally intensive and not trivial to parallelize. \n", - "\n", - "With Heat, you can perform matrix multiplication on distributed DNDarrays, and the operation will be parallelized across the MPI processes. Here, on 4 GPUs:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "# free up memory if necessary\n", - "try:\n", - "    del x, y, z\n", - "except NameError:\n", - "    pass\n", - "\n", - "n, m = 40000, 40000\n", - "x = ht.random.randn(n, m, split=0, device=\"gpu\") # distributed RNG\n", - "y = ht.random.randn(m, n, split=None, device=\"gpu\")\n", - "z = x @ y\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "`ht.linalg.matmul` or `@` breaks down the matrix multiplication into a series of smaller `torch` matrix multiplications, which are then distributed across the MPI processes. This operation can be very communication-intensive on huge matrices when both require distribution, and users should choose the `split` axis carefully to minimize communication overhead." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can experiment with sizes and the `split` parameter (distribution axis) for both matrices and time the result. Note that:\n", - "- If you set **`split=None` for both matrices**, each process (in this case, each GPU) will attempt to multiply the entire matrices. Depending on the matrix sizes, the GPU memory might be insufficient. (And if you can multiply the matrices on a single GPU, it's much more efficient to stick to PyTorch's `torch.linalg.matmul` function.)\n", - "- If **`split` is not None for both matrices**, each process will only hold a slice of the data, and will need to communicate data with other processes in order to perform the multiplication. This **introduces huge communication overhead**, but allows you to perform the multiplication on larger matrices than would fit in the memory of a single GPU.\n", - "- If **`split` is None for one matrix and not None for the other**, the multiplication does not require communication, and the result will be distributed. If your data size allows it, you should always favor this option.\n", - "\n", - "Time the multiplication for different split parameters and see how the performance changes; a ready-made sweep is sketched a little further below.\n", - "\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "z = %timeit -n 1 -r 5 x @ y " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Heat supports many linear algebra operations:\n", - "```bash\n", - ">>> ht.linalg.\n", - "ht.linalg.basics ht.linalg.hsvd_rtol( ht.linalg.projection( ht.linalg.triu(\n", - "ht.linalg.cg( ht.linalg.inv( ht.linalg.qr( ht.linalg.vdot(\n", - "ht.linalg.cross( ht.linalg.lanczos( ht.linalg.solver ht.linalg.vecdot(\n", - "ht.linalg.det( ht.linalg.matmul( ht.linalg.svdtools ht.linalg.vector_norm(\n", - "ht.linalg.dot( ht.linalg.matrix_norm( ht.linalg.trace( \n", - "ht.linalg.hsvd( ht.linalg.norm( ht.linalg.transpose( \n", - "ht.linalg.hsvd_rank( ht.linalg.outer( ht.linalg.tril( \n", - "```\n", - "\n", - "and a lot more is in the works, including distributed eigendecompositions and SVD. 
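\n", - "\n", - "To make the communication trade-offs above concrete, here is the promised rough timing sweep over three `split` combinations (a sketch, not part of the original notebook; sizes are scaled down so it also runs on CPU, and for rigorous measurements you would synchronize, e.g. with `x.comm.Barrier()`, before starting the clock):\n", - "\n", - "```python\n", - "%%px\n", - "import time\n", - "n = 4000\n", - "for sx, sy in ((None, 0), (0, None), (0, 0)):\n", - "    x = ht.random.randn(n, n, split=sx)\n", - "    y = ht.random.randn(n, n, split=sy)\n", - "    t0 = time.perf_counter()\n", - "    z = x @ y\n", - "    print(f\"split=({sx}, {sy}): {time.perf_counter() - t0:.3f} s\")\n", - "```\n", - "\n", - "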
If the operation you need is not yet supported, leave us a note [here](tinyurl.com/demoissues) and we'll get back to you." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can of course perform all operations on CPUs as well; in that case, simply leave out the `device` attribute." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Interoperability\n", - "\n", - "We can easily create DNDarrays from PyTorch tensors and numpy ndarrays. We can also convert DNDarrays to PyTorch tensors and numpy ndarrays. This makes it easy to integrate Heat into existing PyTorch and numpy workflows. Here is a basic example with xarray:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "import xarray as xr\n", - "\n", - "local_xr = xr.DataArray(dndarray.larray, dims=(\"z\", \"y\", \"x\"))\n", - "# proceed with local xarray operations\n", - "local_xr\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**NOTE:** this is not a distributed `xarray`, but local xarray objects on each rank.\n", - "Work on [expanding xarray support](https://github.com/helmholtz-analytics/heat/pull/1183) is ongoing.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Heat will try to reuse the memory of the original array as much as possible. If you would prefer a copy with different memory, the `copy` keyword argument can be used when creating a DNDarray from other libraries." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "import torch\n", - "torch_array = torch.arange(5)\n", - "heat_array = ht.array(torch_array, copy=False)\n", - "heat_array[0] = -1\n", - "print(torch_array)\n", - "\n", - "torch_array = torch.arange(5)\n", - "heat_array = ht.array(torch_array, copy=True)\n", - "heat_array[0] = -1\n", - "print(torch_array)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Interoperability is a key feature of Heat, and we are constantly working to increase Heat's compliance with the [Python array API standard](https://data-apis.org/array-api/latest/). As usual, please [let us know](tinyurl.com/demoissues) if you encounter any issues or have any feature requests." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In the [next notebook](3_internals.ipynb), let's have a look at Heat's most important internal functions." - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "heat-dev-torch2", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.18" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/tutorials/local/3_internals.ipynb b/tutorials/local/3_internals.ipynb deleted file mode 100644 index f592f77ed2..0000000000 --- a/tutorials/local/3_internals.ipynb +++ /dev/null @@ -1,301 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Heat as infrastructure for MPI applications\n", - "\n", - "In this section, we'll go through some Heat-specific functionalities that simplify the implementation of a data-parallel application in Python. 
We'll demonstrate them on small arrays and 4 processes on a single cluster node, but the functionalities are indeed meant for a multi-node setup with huge arrays that cannot be processed on a single node." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Your IPython cluster should still be running. Let's check it out." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from ipyparallel import Client\n", - "rc = Client(profile=\"default\")\n", - "rc.ids\n", - "\n", - "if len(rc.ids) == 0:\n", - "    print(\"No engines found\")\n", - "else:\n", - "    print(f\"{len(rc.ids)} engines found\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If no engines are found, go back to the [Intro](1_intro.ipynb) for instructions." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We already mentioned that the DNDarray object is \"MPI-aware\". Each DNDarray is associated with an MPI communicator, it is aware of the number of processes in the communicator, and it knows the rank of the process that owns it. \n", - "\n", - "We will use the %%px magic in every cell that executes MPI code." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "%%px\n", - "a = ht.random.randn(7,4,3, split=0)\n", - "a.comm" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "# MPI size = total number of processes \n", - "size = a.comm.size\n", - "\n", - "print(f\"a is distributed over {size} processes\")\n", - "print(f\"a is a distributed {a.ndim}-dimensional array with global shape {a.shape}\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "# MPI rank = rank of each process\n", - "rank = a.comm.rank\n", - "# Local shape = shape of the data on each process\n", - "local_shape = a.lshape\n", - "print(f\"Rank {rank} holds a slice of a with local shape {local_shape}\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Distribution map\n", - "\n", - "On many occasions, when building a memory-distributed pipeline, it will be convenient for each rank to have information on which rank holds which slice of the distributed array. \n", - "\n", - "The `lshape_map` attribute of a DNDarray gathers (or, if possible, calculates) this info from all processes and stores it as metadata of the DNDarray. Because it is meant for internal use, it is stored in a torch tensor, not a DNDarray. \n", - "\n", - "The `lshape_map` tensor is a 2D tensor, where the first dimension is the number of processes and the second dimension is the number of dimensions of the array. Each row of the tensor contains the local shape of the array on a process. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "lshape_map = a.lshape_map\n", - "lshape_map" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Go back to where we created the DNDarray and create `a` with a different split axis. See how the `lshape_map` changes."
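, - "\n", - "One handy pattern built on top of `lshape_map` (a sketch, not from the original tutorial): the global start offset of each rank's slice along the split axis is just a shifted cumulative sum of the local extents:\n", - "\n", - "```python\n", - "%%px\n", - "import torch\n", - "extents = a.lshape_map[:, a.split]\n", - "offsets = torch.cumsum(extents, dim=0) - extents  # global start index of every rank's slice\n", - "print(f\"Rank {a.comm.rank} starts at global index {offsets[a.comm.rank].item()}\")\n", - "```"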
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Modifying the DNDarray distribution\n", - "\n", - "In a distributed pipeline, it is sometimes necessary to change the distribution of a DNDarray when the array is not distributed in the most convenient way for the next operation or algorithm.\n", - "\n", - "Depending on your needs, you can choose between:\n", - "- `DNDarray.redistribute_()`: This method keeps the original split axis, but redistributes the data of the DNDarray according to a \"target map\".\n", - "- `DNDarray.resplit_()`: This method changes the split axis of the DNDarray. This is a more expensive operation, and should be used only when absolutely necessary. Depending on your needs and available resources, in some cases it might be wiser to keep a copy of the DNDarray with a different split axis.\n", - "\n", - "Let's see some examples." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "# redistribute\n", - "import torch\n", - "target_map = a.lshape_map\n", - "target_map[:, a.split] = torch.tensor([1, 2, 2, 2])\n", - "# in-place redistribution (see ht.redistribute for out-of-place)\n", - "a.redistribute_(target_map=target_map)\n", - "\n", - "# new lshape map after redistribution\n", - "a.lshape_map" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "# local arrays after redistribution\n", - "a.larray" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "# resplit\n", - "a.resplit_(axis=1)\n", - "\n", - "a.lshape_map" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can use the `resplit_` method (in-place), or `ht.resplit` (out-of-place), to change the distribution axis, but also to set the distribution axis to `None`. The latter corresponds to an `MPI.Allgather` operation that gathers the entire array on each process. This is useful when you've reached a data size small enough to be processed on a single device, and you want to avoid communication overhead." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "# \"un-split\" distributed array\n", - "a.resplit_(axis=None)\n", - "# each process now holds a copy of the entire array" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The opposite is not true, i.e. you cannot use `resplit_` to distribute an array with `split=None`. In that case, you must use the `ht.array()` factory function:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "# make `a` split again\n", - "a = ht.array(a, split=0)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Making disjoint data into a global DNDarray\n", - "\n", - "Another common occurrence in a data-parallel pipeline: you have addressed the embarrassingly parallel part of your algorithm with any array framework, each process working independently from the others. You now want to perform a non-embarrassingly-parallel operation on the entire dataset, with Heat as a backend.\n", - "\n", - "You can use the `ht.array` factory function with the `is_split` argument to create a DNDarray from a disjoint (on each MPI process) set of arrays. 
The `is_split` argument indicates the axis along which the disjoint data is to be \"joined\" into a global, distributed DNDarray." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "# create some random local arrays on each process\n", - "import numpy as np\n", - "local_array = np.random.rand(3, 4)\n", - "\n", - "# join them into a distributed array\n", - "a_0 = ht.array(local_array, is_split=0)\n", - "a_0.shape" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Change the cell above and join the arrays along a different axis. Note that the shapes of the local arrays must be consistent along the non-split axes. They can differ along the split axis." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The `ht.array` function takes as input any data object that can be converted to a torch tensor. " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Once you've made your disjoint data into a DNDarray, you can apply any Heat operation or algorithm to it and exploit the cumulative RAM of all the processes in the communicator. " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can access the MPI communication functionalities of the DNDarray through the `comm` attribute, i.e.:\n", - "\n", - "```python\n", - "# these are just examples, this cell won't do anything\n", - "a.comm.Allreduce(a, b, op=MPI.SUM)\n", - "\n", - "a.comm.Allgather(a, b)\n", - "a.comm.Isend(a, dest=1, tag=0)\n", - "```\n", - "\n", - "etc." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In the next notebooks, we'll show you how we use Heat's distributed-array infrastructure to scale complex data analysis workflows to large datasets and high-performance computing resources.\n", - "\n", - "- [Data loading and preprocessing](4_loading_preprocessing.ipynb)\n", - "- [Matrix factorization algorithms](5_matrix_factorizations.ipynb)\n", - "- [Clustering algorithms](6_clustering.ipynb)" - ] - } - ], - "metadata": { - "language_info": { - "name": "python" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/tutorials/local/4_loading_preprocessing.ipynb b/tutorials/local/4_loading_preprocessing.ipynb deleted file mode 100644 index 9abf4f3f55..0000000000 --- a/tutorials/local/4_loading_preprocessing.ipynb +++ /dev/null @@ -1,209 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Loading and Preprocessing\n", - "\n", - "### Refresher\n", - "\n", - "Using PyTorch as its compute engine and mpi4py for communication, Heat implements a number of array operations and algorithms that are optimized for memory-distributed data volumes. This allows you to tackle datasets that are too large for single-node (or worse, single-GPU) processing. \n", - "\n", - "As opposed to task-parallel frameworks, Heat takes a data-parallel approach, meaning that each \"worker\" or MPI process performs the same tasks on different slices of the data. Many operations and algorithms are not embarrassingly parallel, and involve data exchange between processes. 
Heat operations and algorithms are designed to minimize this communication overhead, and to make it transparent to the user.\n", - "\n", - "In other words: \n", - "- you don't have to worry about optimizing data chunk sizes; \n", - "- you don't have to make sure your research problem is embarrassingly parallel, or artificially make your dataset smaller so your RAM is sufficient; \n", - "- you do have to make sure that you have sufficient **overall** RAM to run your global task (e.g. number of nodes / GPUs)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The following shows some I/O and preprocessing examples. We'll use small datasets here, as each of us has access to only one node." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### I/O\n", - "\n", - "Let's start with loading a data set. Heat supports reading and writing from/into shared memory for a number of formats, including HDF5, NetCDF, and because we love scientists, csv. Check out the `ht.load` and `ht.save` functions for more details. Here we will load data in [HDF5 format](https://en.wikipedia.org/wiki/Hierarchical_Data_Format).\n", - "\n", - "This particular example data set (generated from all Asteroids from the [JPL Small Body Database](https://ssd.jpl.nasa.gov/sb/)) is really small, but it allows us to demonstrate the basic functionality of Heat. \n", - " " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Your ipcluster should still be running (see the [Intro](1_intro.ipynb)). Let's test it:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from ipyparallel import Client\n", - "rc = Client(profile=\"default\")\n", - "rc.ids" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above cell should return [0, 1, 2, 3].\n", - "\n", - "Now let's import `heat` and load the data set." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "import heat as ht\n", - "X = ht.load_hdf5(\"/p/scratch/training2404/data/JPL_SBDB/sbdb_asteroids.h5\",dtype=ht.float64,dataset=\"data\",split=0)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We have loaded the entire data onto 4 MPI processes, each with 12 cores. We have created `X` with `split=0`, so each process stores evenly-sized slices of the data along dimension 0." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Data exploration\n", - "\n", - "Let's get an idea of the size of the data." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px \n", - "# print global metadata once only\n", - "if X.comm.rank == 0:\n", - "    print(f\"X is a {X.ndim}-dimensional array with shape {X.shape}\")\n", - "    print(f\"X takes up {X.nbytes/1e6} MB of memory.\")\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "X is a matrix of shape *(datapoints, features)*. \n", - "\n", - "To get a first overview, we can print the data and determine its feature-wise mean, variance, min, max etc. These are reduction operations along the datapoints dimension, which is also the `split` dimension. You don't have to implement [`MPI.Allreduce`](https://mpitutorial.com/tutorials/mpi-reduce-and-allreduce/) operations yourself; communication is handled by Heat operations."
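, - "\n", - "For contrast, a hand-rolled mean along the split axis would look roughly like this (a sketch using mpi4py-style calls on the process-local torch tensor; the names `local_sum` and `manual_mean` are illustrative, and Heat does the equivalent for you inside `ht.mean`):\n", - "\n", - "```python\n", - "%%px\n", - "from mpi4py import MPI\n", - "local_sum = X.larray.sum(axis=0)                      # sum over the local rows only\n", - "global_sum = X.comm.allreduce(local_sum, op=MPI.SUM)  # combine partial sums across all ranks\n", - "manual_mean = global_sum / X.shape[0]\n", - "```"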
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "features_mean = ht.mean(X,axis=0)\n", - "features_var = ht.var(X,axis=0)\n", - "features_max = ht.max(X,axis=0)\n", - "features_min = ht.min(X,axis=0)\n", - "# ht.percentile is buggy, see #1389, we'll leave it out for now\n", - "#features_median = ht.percentile(X,50.,axis=0)\n", - "\n", - "if ht.MPI_WORLD.rank == 0:\n", - "    print(f\"Mean: {features_mean}\")\n", - "    print(f\"Var: {features_var}\")\n", - "    print(f\"Max: {features_max}\")\n", - "    print(f\"Min: {features_min}\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Note that the `features_...` DNDarrays are no longer distributed, i.e. a copy of these results exists on each process, as the split dimension of the input data has been lost in the reduction operations. " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Preprocessing/scaling\n", - "\n", - "Next, we can preprocess the data, e.g., by standardizing and/or normalizing. Heat offers several preprocessing routines for doing so; the API is similar to [`sklearn.preprocessing`](https://scikit-learn.org/stable/modules/preprocessing.html), so adapting existing code shouldn't be too complicated.\n", - "\n", - "Again, please let us know if you're missing any features." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "# Standard Scaler\n", - "scaler = ht.preprocessing.StandardScaler()\n", - "X_standardized = scaler.fit_transform(X)\n", - "standardized_mean = ht.mean(X_standardized,axis=0)\n", - "standardized_var = ht.var(X_standardized,axis=0)\n", - "print(f\"Standard Scaler Mean: {standardized_mean}\")\n", - "print(f\"Standard Scaler Var: {standardized_var}\")\n", - "\n", - "# Robust Scaler\n", - "scaler = ht.preprocessing.RobustScaler()\n", - "X_robust = scaler.fit_transform(X)\n", - "robust_mean = ht.mean(X_robust,axis=0)\n", - "robust_var = ht.var(X_robust,axis=0)\n", - "\n", - "print(f\"Robust Scaler Mean: {robust_mean}\")\n", - "print(f\"Robust Scaler Var: {robust_var}\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Within Heat, you have several options to apply memory-distributed machine learning algorithms to your data. Check out our dedicated \"clustering\" notebook for an example.\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Is the algorithm you're looking for not yet implemented? [Let us know](https://github.com/helmholtz-analytics/heat/issues/new/choose)! " - ] - } - ], - "metadata": { - "language_info": { - "name": "python" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/tutorials/local/6_clustering.ipynb b/tutorials/local/6_clustering.ipynb deleted file mode 100644 index 6e6960b405..0000000000 --- a/tutorials/local/6_clustering.ipynb +++ /dev/null @@ -1,787 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Cluster Analysis\n", - "================\n", - "\n", - "This tutorial is an interactive version of our static [clustering tutorial on ReadTheDocs](https://heat.readthedocs.io/en/stable/tutorial_clustering.html). \n", - "\n", - "We will demonstrate memory-distributed analysis with k-means and k-medians from the ``heat.cluster`` module. As usual, we will run the analysis on a small dataset for demonstration. 
We need to have an `ipcluster` running to distribute the computation.\n", - "\n", - "We will use matplotlib for visualization of data and results." - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "4 engines found\n" - ] - } - ], - "source": [ - "from ipyparallel import Client\n", - "rc = Client(profile=\"default\")\n", - "rc.ids\n", - "\n", - "if len(rc.ids) == 0:\n", - "    print(\"No engines found\")\n", - "else:\n", - "    print(f\"{len(rc.ids)} engines found\")" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "%px import heat as ht\n", - "%matplotlib inline" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Spherical Clouds of Datapoints\n", - "------------------------------\n", - "For a simple demonstration of the clustering process and the differences between the algorithms, we will create an\n", - "artificial dataset consisting of two circularly shaped clusters positioned at $(x_1=2, y_1=2)$ and $(x_2=-2, y_2=-2)$ in 2D space.\n", - "For each cluster we will sample 100 arbitrary points from a circle with radius $R = 1.0$ by drawing random numbers\n", - "for the polar coordinates $(r \in [0,R], \phi \in [0,2\pi])$, translating these to Cartesian coordinates,\n", - "and shifting them by $+2$ for cluster ``c1`` and $-2$ for cluster ``c2``. The resulting concatenated dataset ``data`` has shape\n", - "$(200, 2)$ and is distributed among the ``p`` processes along axis 0 (sample axis)." - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "\n", - "num_ele = 100\n", - "R = 1.0\n", - "\n", - "# Create default circular point cloud\n", - "# Sample radius between 0 and 1, and phi between 0 and 2pi\n", - "r = ht.random.rand(num_ele, split=0) * R\n", - "phi = ht.random.rand(num_ele, split=0) * 2 * ht.constants.PI\n", - "\n", - "# Transform polar coordinates to Cartesian coordinates\n", - "x = r * ht.cos(phi)\n", - "y = r * ht.sin(phi)\n", - "\n", - "\n", - "# Stack the sampled points and shift them to locations (2,2) and (-2, -2)\n", - "cluster1 = ht.stack((x + 2, y + 2), axis=1)\n", - "cluster2 = ht.stack((x - 2, y - 2), axis=1)\n", - "\n", - "data = ht.concatenate((cluster1, cluster2), axis=0)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's plot the data for illustration. In order to do so with matplotlib, we need to unsplit the data (gather it from\n", - "all processes) and transform it into a numpy array. 
Plotting can only be done on rank 0.\n" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "data_np = ht.resplit(data, axis=None).numpy() " - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "\u001b[0;31mOut[0:13]: \u001b[0m[]" - ] - }, - "metadata": { - "after": null, - "completed": null, - "data": {}, - "engine_id": 0, - "engine_uuid": "e3649dd0-f970dcd5e37935a1f3fe07c8", - "error": null, - "execute_input": "import matplotlib.pyplot as plt\nplt.plot(data_np[:,0], data_np[:,1], 'bo')\n", - "execute_result": { - "data": { - "text/plain": "[]" - }, - "execution_count": 13, - "metadata": {} - }, - "follow": null, - "msg_id": null, - "outputs": [], - "received": null, - "started": null, - "status": null, - "stderr": "", - "stdout": "", - "submitted": "2024-03-21T09:43:55.286159Z" - }, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "[output:0]" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "image/png": "<base64-encoded PNG omitted: scatter plot of the 200 sampled data points>", - "text/plain": [ - "" - ] - }, - "metadata": { - "engine": 0 - }, - "output_type": "display_data" - } - ], - "source": [ - "%%px --target 0\n", - "import matplotlib.pyplot as plt\n", - "plt.plot(data_np[:,0], data_np[:,1], 'bo')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now we perform the clustering analysis with kmeans. We chose 'kmeans++' as an intelligent way of sampling the\n", - "initial centroids." - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[stdout:3] Number of points assigned to c1: 100 \n", - "Number of points assigned to c2: 100 \n", - "Centroids = DNDarray([[ 2.0065, 2.0425],\n", - " [-1.9935, -1.9575]], dtype=ht.float32, device=cpu:0, split=None)\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "[stdout:2] Number of points assigned to c1: 100 \n", - "Number of points assigned to c2: 100 \n", - "Centroids = DNDarray([[ 2.0065, 2.0425],\n", - " [-1.9935, -1.9575]], dtype=ht.float32, device=cpu:0, split=None)\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "[stdout:0] Number of points assigned to c1: 100 \n", - "Number of points assigned to c2: 100 \n", - "Centroids = DNDarray([[ 2.0065, 2.0425],\n", - " [-1.9935, -1.9575]], dtype=ht.float32, device=cpu:0, split=None)\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "[stdout:1] Number of points assigned to c1: 100 \n", - "Number of points assigned to c2: 100 \n", - "Centroids = DNDarray([[ 2.0065, 2.0425],\n", - " [-1.9935, -1.9575]], dtype=ht.float32, device=cpu:0, split=None)\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "%%px\n", - "kmeans = ht.cluster.KMeans(n_clusters=2, init=\"kmeans++\")\n", - "labels = kmeans.fit_predict(data).squeeze()\n", - "centroids = kmeans.cluster_centers_\n", - "\n", - "# Select points assigned to clusters c1 and c2\n", - "c1 = data[ht.where(labels == 0), :]\n", - "c2 = data[ht.where(labels == 1), :]\n", - "# After slicing, the arrays are no longer distributed evenly among the processes; we might need to balance the load\n", - "c1.balance_() # in-place operation\n", - "c2.balance_()\n", - "\n", - "print(f\"Number of points assigned to c1: {c1.shape[0]} \\n\"\n", - "      f\"Number of points assigned to c2: {c2.shape[0]} \\n\"\n", - "      f\"Centroids = {centroids}\")\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's plot the assigned clusters and the respective centroids:\n" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "# just for plotting: collect all the data on each process and extract the numpy arrays. 
This will copy data to CPU if necessary.\n", - "c1_np = c1.numpy()\n", - "c2_np = c2.numpy()" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[output:0]" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAiIAAAGdCAYAAAAvwBgXAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/H5lhTAAAACXBIWXMAAA9hAAAPYQGoP6dpAAA98klEQVR4nO3df5RU1Z3v/U9Vd+jGQLdg+4uHRml+iRq9KiY2o3NFCALRiFEnetc4zk3ijD62V+R55kYljz8mEsyMEY2ORk0uybor0RgRyWVABBN/ZBEu4siYCwINTdIkHVQk6QYmNOmu8/zRvQ+nTp1Tfar6nDqnut6vtVihq6vq7Kpk5nzY+7u/O2VZliUAAIAYpOMeAAAAqFwEEQAAEBuCCAAAiA1BBAAAxIYgAgAAYkMQAQAAsSGIAACA2BBEAABAbKrjHkA+mUxGHR0dGjlypFKpVNzDAQAAAViWpYMHD2rMmDFKp/PPeSQ6iHR0dKixsTHuYQAAgCLs3btXY8eOzfucRAeRkSNHSur7IHV1dTGPBgAABNHV1aXGxkb7Pp5PooOIWY6pq6sjiAAAUGaClFVQrAoAAGJDEAEAALEhiAAAgNgQRAAAQGwIIgAAIDYEEQAAEBuCCAAAiE2kQeSpp57SOeecY/cBaW5u1po1a6K8JAAAKCORBpGxY8fqoYce0ubNm7V582Zddtlluuqqq7R169YoLwsAAMpEyrIsq5QXHD16tP75n/9ZX/7ylwd8bldXl+rr69XZ2UlnVQBA6I6sXyql0qqdeUfu7157TLIyqp11ZwwjK2+F3L9LViPS29ur559/XocPH1Zzc7Pnc7q7u9XV1ZX1BwCAyKTS6l7/SF/ocDjy2mPqXv+IlKKUMmqRf8O/+tWvNGLECNXU1OiWW27RihUrdOaZZ3o+d8mSJaqvr7f/cPIuACAqR9YvlSTVzFqYFUZMCKlqavacKUG4Il+aOXr0qNrb2/XHP/5Ry5cv13e/+1298cYbnmGku7tb3d3d9s/m9D6WZgBg6Cv1MokJHDWzFkpS3wxI1TCp96ikvoBCEClOopZmhg0bpokTJ2ratGlasmSJzj33XD322GOez62pqbF32HDiLgBUmBIvk9TOvMOeDZFECIlJdakvaFlW1qwHAACS7Bu/CQa1M+/ImrWIIhi4rzkQr1kb85iknFkbCl4HFmkQueeeezR37lw1Njbq4MGDev755/X666/rlVdeifKyAIAy5QwG3T9/Quo9WtrZiaphqpnRkhWGsvTP2mT9zvGYWeaRpEPPfFG9ezZmPWYQUI6JNIh88MEHuvHGG/X73/9e9fX1Ouecc/TKK6/os5/9bJSXBQCUsdqZd9ghRFXDIg0h9rKPlLM04xVGvGZt/N63d8/GvNf0CiiVKNIg8r3vfS/KtwcADEFHXnvMDiHqPaojrz0WWRjp2b1B0rGaEGdIqJm1ULIyOa/xm7XJ95h5XdRLTeWo5DUiAAD4cd+onTMWYd+4zayFMxQ4Q0bNrIWeSyd2TYiZQXHM2nQ7QpRzvLEtNZUBgggAIBG8ZguCLoUUxcp4hgL7Z4/ZEElZNSHOWZueto2S1SulqrJmckq51FSOCCIAgGQoNhgUKV+haNCwUDOjRdKxoFTVNF0jbn4uu/ZEKtlSUzkiiAAAEsEEA68tsubvce828WyClqqyf1/ddJGk3JmcUiw1lSuCCAAgWby2yKr43Sahdmx1zdrYSy6pKtXMvMNz1sbZKj7SpaYyRRABACRK6I3NQgw2Oc3KHEsu7t+XeqmpXBFEAACJE2Zjsyg6tgbZ3RNGDUolIIgAABIpzN0mYQabku/uGeIiP/QOAIBiuJc+Dj17Q+5z1i/VoWdv6KsD8Xi98/HamXd49v4oWJ4lF3cTtCPrl+Yc4uc3vkpFEAEAJI5z1qH+wVZVNU1Xb9uGnDDSs2eTets2qGfPJs/XO0/s9erYWozaWXf6hpjamXdkL8mU+EThcsQ3AABIFK+ljxE3P5cTRo689ph62zbYj5ubvdfr3cHGnCVTbBgJysySOK9Fm/dsKcuyrLgH4aerq0v19fXq7OxUXV1d3MMBAJRAvu22h569Qb1tG+xZjZxiUdfjkv+Nv5SBIN/4Bv3eYW5PDkkh929mRAAAiZJv6WPEzc951nnkrf8ooKYjKl7jC61+pMyXf5I9OgBARXPfrHMKWJ/5oufjztcUVNMREc/xhRQgyn35h+27AIC8Yp36dx4wJ2U1IOte/4h692y0l2uS2kbdr+dIzayFdoCQBtffJMztyaVGEAEA5Bdyy/VCeJ3ZYn6umbVQPW0b7YJVr54ePW0b+5ZzXIoJUMUEsoF6jjjDSBiN28rxlF+WZgCgAhVSnxD31H/tzDtU1dQsqe9sF+d1q8d/WlVN01U9/tMer8neTeP8fGbpo6A6jWKWUgLUp4TV3ySs7cmlxowIAFSiAmc54p76H3Hz8+r82qTcItU8Mxojbn4uZ5kmSGt2yft7KKZVfJA2714BotDvNejnSiKCCABUoKJuqjFO/Rd7sx4oQBX6PYQdyMIIEOXecp4gAgAVqpCb6pH1S/u6l3qEgbALVt21GO7ZiZ62XxZ0gx0oQBUaLsIKZKEFiDI/5ZcgAgAVLOhN1bRST41qVN1//4V9E3UWix5ZvzScMDLATpmaWQtV3dQc+GYdZDalkHARxlKKpNACRLmf8kuxKgBUsCAFjqaVempUo6w/7NWhZ2/IKgZNjWrs63YaUuMsZ3Fsz+4NOSGkduYdkpVRVdN0z5u1s8g0aGv3oIWeYbaKT0J/kyRgRgQAKlTg+gTHv9xNz47Oe5okq9cOJ2EXrjqXKHrb/y13uSSVVm/bBlU3XeT7mYIufRRTwFqOtRhJRRABgAqUVXdhZbKWF9w3afN3qW8nigkhkiIJIYbfcompIfFrBlbV1Gy3UB9o6aOgcFHmtRhJRRABgErkuKl6zgC4btJGX2Fq77H3SVVFNgvgW4vRX0Pi1QxMkqr7e44E2jobIKzYj5V5LUZSEUQAoAI5b6pBt7Aem3GY3l8TUiVZvTr07A2e3UsHI8hyiR2STDMwKavDqvv9vHb2EC7iRxABAAy4hdUdQtw1I2GGkUKWS5y7ayTZnVSDNGlDMrBrBgAgyfuoelv/LhVnCJH6akb8WqkXLUBb9BxVw7J215TjKbSVKmVZlhX3IPx0dXWpvr5enZ2dqquri3s4ADCk2csf/WHEffOO9RR
ev7FKWeOV5P24lUnM2CtBIfdvZkQAAIH6YySp70XP7g2SlDNe85hSVdkzO8UcWIeSoEYEACqE34yGe9urlNz+GKbVfO+ejTk1JD1tG+06Flm9np1PCzlbB6VBEAGASuFz4q6ZXTDbXo1E9sfob2Tm3h3j7P7qrGPx220TxwnC8EYQAYAK4Tcr4J5d8HpNUjg/g/PQPa8dPe7nm5/jOkEY3ihWBYAKM1BR6qDfvwRFrV6fIUhBqj0rFNFnRx+KVQEAvvJu0w1DCQpDvT7DQMW0zo6sgz2wDuFhaQYAKkxox9j7KEVhqP0Z+nfHeDUxc868cGBdchFEAKCCBD5xd5CiLAx114RUNU33DT02DqxLLGpEAKBC+M1KRLmNtfNrk+zZl/oHWwf9fn5ByhlK3AWrSZWkBnFho0YEAJCrmNbpg+C1BDRors9gxm4O4SuXECKJJmv9WJoBgApRypNmo1oC8voMWVty89y8i5lliHLWgiZrfQgiAIBQlbow1D3zYq7xC+ss/b933qGHlz6mi1Nbc+pGgoQMvyZwYZ3oS5M1gggAIGwlLAz1m3k5su5bWrQypX/f3q5Ft39Za6+yVPvZ/yd7TK6QYYKJpJwg1b3+EfXs3qARf/fj0GctKr3JGkEEABAqr6WKKJY48s28rH5msTZt/0j/94Uj9eTb7Xrrpr/XlT7ByD6t1xFM7GDTf7aNJPXu2WgX35qZkCPrl4bSnC3K7dRJVxmVMACAeHkUZh5Zv1SHnr3BszDzyGuP9YWX/ueZ1zn/LivTf8BdJus1NZf9N33zveG6cMwwLb7seF04Zpi+seLf5LVJ1BS72ksjLj17Ntm7cZwN1CSFUlAa5NTjoY4ZEQBA5LxqRJw3+bz1F46ZCvffe9s2qLrpomPbeMdfpHWbb9Wm7e1a/lcnKpVK6a6L63XNC29r1Tdv1WenTcmZwXAvjdTMaMmq2TBbgiVl1aEMdmmGJmt9CCIAgJLwKsw0N3n3AXZ+N+eaWQvtWQNJdlgxvUSqxn9GX3/gq7pwzDBdNr5WknTZ+FpdOGaYvv74Ms36wTdzxuVeGpHkmv3wb7c1qJ0zNFmTFHFDsyVLluill17S9u3bNXz4cE2fPl3f/OY3NWXKlECvp6EZABQv0K4QqeRNtdxNzoIewud+niT77ybQvPbhCF3zP7Zp+V+dqJlNw+3Xvtb2J13zwkf66ZK/15V3fcd+v57dG7JOH3ZuM866jtTXTt7qzZo1kVRxu1yCSExDszfeeEO33XabNm7cqHXr1qmnp0ezZ8/W4cOHo7wsAEAK1jCrxE21fAszAxzC536e8+/V4z8tHT9WD72yK2s2xLhsfK0u/L9q9OAPVsuyLPvzOUOIW82MFlU1NUuSUqMa+0JI/9k2PW0bJUlV4y8ihAxSpEszr7zyStbPy5Yt00knnaR33nlHf/mXfxnlpQGg4hXSMKsUTbX8ttr2tG0MtGsk3xJKz55Neu3fWvV2x1G7NsQplUrprr+o0zUv7NWqb96qv+xco6qmZlU3NUtWRoee+aJdc2LXplgZVTc1q7djq6w/7FVqVKOsP+y1O7hWNU3XiJufC+37qVQlrRHp7OyUJI0ePdrz993d3eru7rZ/7urqKsm4AGCoCtIwqxRNtfxqP3raNto39erxn1bPnk2ehZqHnr3BDglmOUXqm7WQpCPrvqWH3u71nA0xnLUib/zgmxo+a4E9NvN+zsJZ5zJNalSjhp1/jbpff9I+9ddZ24LilWz7rmVZWrhwoS6++GKdffbZns9ZsmSJ6uvr7T+NjY2lGh4ADFlBlj6ynpOq8p2RMFtqC+ZRmHnktceyQoiZkTCn6ZrlIhNCUseP7Qsu/aEhNarRDgqvH2nS23sO6K6L63NmQwyzg+btjqN6s3dq3xj6P0/V+IskyQ4XXiFEqfSxGRmrN2vrsHkvv223g/ruhriSBZGWlha99957eu45/2msu+++W52dnfafvXv3lmp4ADBkBTl8zn5Of0HmoWdvyPn9YGpGamfdmRtu+sPJiJufs39vDrAzN3lnWBk27a/sv1c1TbeXS7rf/rG+8ZNf5p0NMcysyL0L/k5/Wv+oeto25nyu7vWPZIUQ6w977ZkaZ7+PvoP2HN8Hh9gVpSRLM7fffrt++tOf6s0339TYsWN9n1dTU6OamppSDAkAKkKQw+fczzEzEIeevUEjbn4uspoRvwPspL4w0PubzZ7LRM4eI71tG/Ra2598a0PcjvUVadeaZ7+heX+3SNUTptvbf+1+IX3PlvWHvfbjA/X7SNIhdlEe1he2SIOIZVm6/fbbtWLFCr3++usaP358lJcDADgEaZhl/u6sjRhx83N2GOm8+3RJVklvpPnOXskaf9UwWZalh37RqfHHV+uE46q0Zd9Rv7e1nXBclcYfX62HftGpuTdbdq2I8zvpY9nLRtVNubtjvPp91M68Qz27N/jW25QsBER8WF+YIg0it912m370ox9p5cqVGjlypPbt2ydJqq+v1/Dhwwd4NQBgUAI2zHI3FZP6wogJIX41I1EZ6OwVZ1A52it1HOzV7w726tLv7yvoOn+urtPRo39WvruR6dyad0uxS/WE6X11LK4gVcoQkKTZmYFEGkSeeuopSdKll16a9fiyZcv0t3/7t1FeGgAqXr5/dbtvRO4lm74aEcuuGSnV7pCgS0lm+25NdUprbzxZ+//D1YU0Xa2Rt62UJHVv+pFkWTr69vN94av/dyeddJLqx47NbmKmVNbnlgbZbt1Rk1PqEFCK3VBhiHxpBgCQfFk3rdces3eFOGtEnM+LwkBLSUff+YnSoxrtOg4zkzO2rlqN48bKOtIlHTnW9qHmD2/1vf788/uWmk6utgNGzR/eUu35fe/ds7vv/Uxhqnlf8599zcyCtVt3z3o4C1/jCAHuZS5TAJyk2hHOmgEASOq/afWHEKWq7GZdJTuIzbGU5Cy2dPYb6f3DXqm2Tqnh9VlhwfrjbyUdCxNmvD1tG5U50C7rj7/1DFaS1LtnY05Bqn2InilgDbDjxStI2SEgJu5lLnPQoJSc2hGCCABUOHPTl3SsjXn/ckzfY45/JUd4EFvWv8RdxZZ9BbTXq7ftl32zHrV1ObtcUsePVd1//0XWGTLO2ROvYFXV1Nx387UyWbUgzjqa6qaLgn1uV01OTgho+6VU4lobr2Uu06dFSkbtCEEEACqd46bvVZth/pVcypuU1yxMdVNzXxCRZP3xt+rtnwWpGn+R3QzNLDsc6j8Lpk8qpxW7M2gUUoiad8yOIOUXAuKqtZHcAWx6YmpHCCIAgETyK7Z0L3dUT5iedbO3l5ekvOfXRHXjdc48eIWAnraNqh7/6WhrMQLsmDJ9WvIdNFgKBBEAqHT9Ny0p96Zvfh+WQhtt5RRbSjk1F85ZE2cIybfzptAxZi9fZXJmP9zLV2bZyDSFM9c2tS7mPaMKIwPtmBpoi3QpEUQAoMI5b1p+jcRCU2CjLfcN07y2qmm6qpsusn/OmQlR326YI/Je5sm7Q8RrjK7lq3zjNu/p1aHWWWBb3XRRYd9dSEytjVdQk1TynT
MEEQCApIEbiYWhkEZbXlthpb6dMeZGXjNr4bGbaH8IcYYUc0Cec0lioB0iQXcJDVTkmdWh9p4me0u0u118KfWFoV9mPeb+vKXeOUMQAQAEaiRW8Hua5Qwrk7XUYZYonLMY+UKI2c5b1dQsKZV1Qm/NrIVZW3YlZe1+cfbxKGSHiF99itdjvkFl/VJVN12k3j3/2w5JJoSY35f8vBfXMpyU/d9v1Xj/LrJRIYgAQIULciZNUTen/uUM5zZbEwb6fk7Z24Vz3t9VbOlVk+FcmpHke4PtaftlUTtE/M68Cbx85VjOOfZYlT3GOHp2uINPEnbOEEQAoIyFcspqwDNpCuW1XdQUa9qzGD4t5IO0p7f7nEg5oSArjMy8Q51fm1Rw7YvXUpWk4pev+j9rUs578QtapTZwqzgAQHL1/6s766Ysx9kxATqC1s66M28vjcEsH9TOvEM1sxb2dyetsmdCTCv1+m+02XUe7s8wENOa3R0UzDVNgPINFHk4Z4nqH2y1x+j1mNf7Obfw2tJVx8ae1eckHsV8L1FgRgQAyljSTln1mqHJ+pe3JMmSs8lYMT02jrz2mHr3bPStaXHOmhRa+1JMHUnO+zm28Gb1PqkapqrTpuWcdlxqUdQEFYsgAgBlzq+wMpabnMfWV+dpucdYvj02BtrWGrSmpejaF6+lKkeRp3Opym/5qnbWnXaxqqSsmYfqpouCt42PQGQ1QUVKWQk+Irerq0v19fXq7OxUXV1d3MMBgERz1kHUP9ga2zj8tt1KyjrR1izPjLj5OR165otZMxzu93PWugStiwmlfmaQ/GYe4qwRKcX3Usj9mxkRABgCktQp0/2va8N9Mzb9QEyPDS/5Goblu7bf80oZTpI282AE/f5KhSACAGUuSev9Rla79VSVahx1G87ljO6fPd73nKphqpnRkndpJZQQ4dxS6+hv4g48oYSSiHYjDTUEEQAoY0n9V/cRE0L6Z2jcjrVZz36OvTvFq9alwPbwXtxbinvbNti1KeaQurBCSdJmHpKKIAIA5SyB/+oOMkOTr3bCDiau3hZh7RDKep/+LcVmmci0ZPccFyJBsSoAIDSFnBvj9xxJdhjxK17tXv9I3ucEYRf39jcaM/9pCmiTUFharihWBQDEI+AMTb6be1VTs0bc/LxvrUsYHUHdxb3OMNLbtsEOKYSQ6BFEAAChKbYuopBal8HuEHJfyyzHZM2MxNz2vJLQ4h0AEL88Mynudu1B26x78apNMYWqsnqVGtWYFUZK0fb8yPqlvtc58tpjfbuFhjBmRAAAsQt6yN2gdwg5Ao/fzIjZTWMO6gv0voMRwm6gckYQAQCUhxB2CGUFHlcoce+WkZVRddNFkYeRpJ0XVGrsmgEAVLwgzdIkRdqVNazdQElQyP2bGhEAQMWrnXWn702/duYdfQGjfwnFXc9hB4jU4G6ptTPv8O2hMpSxNAMAQABRL6Ek6bygUiKIAAAQkDOMeLahL1ISzwsqFYIIAKAiFXuIXhgN1dzXSuJ5QaVCjQgAoDIVWfPhtYSS9ftC+4IE7KEyVDEjAgCoSMXUfARaQimwL0iln9JLEAEAVKxCaj4GWkLpaduoETc/lxNwJKln9wb17tno/95FLhMNBSzNAAAqWuBts3mWUEw3VrMkY5ZVutc/ou71j+QNIZIi3xqcZMyIAAAqWtBts/lmJEbc/Fz+nS4DFLVWcndVgggAoGKFuW3Wa5lHUuC+IFFtDU66oTvXAwAomXI8Qdav5qPQE32dspZ5pIJPCa7E7qrMiAAABq8cT5AN4RA9N3uZxyVoX5BK7K5KEAEADFo51jiEvW3WfN6qpmZVNzVLyv4+Bgo4ldpdlSACAAhFpdY4SN7LPIZnGAnw+krprkoQAQCEJuz252VjsMs8ESwTlYuUZVlW3IPw09XVpfr6enV2dqquri7u4QAABmAvJ/TXOFTKjAiyFXL/ZtcMACAUzuWFQnaKoLKxNAMAGLQoaxwquf15JSCIAAAGL8oah3LcGozACCIAgEGL8gTZctwajOAIIgCAxKvkrcFDXaTFqm+++aauvPJKjRkzRqlUSi+//HKUlwMADGGV2P68EkQaRA4fPqxzzz1XTzzxRJSXAQBUAK/25yh/kS7NzJ07V3Pnzo3yEgCAClCp7c8rQaJqRLq7u9Xd3W3/3NXVFeNoAABJUMntzytBooLIkiVL9MADD8Q9DABAklRw+/NKULIW76lUSitWrND8+fN9n+M1I9LY2EiLdwAAykghLd4TNSNSU1OjmpqauIcBAABKhLNmAABAbCKdETl06JB27dpl/7xnzx5t2bJFo0eP1rhx46K8NAAAKAORBpHNmzdrxowZ9s8LF/adB3DTTTfp+9//fpSXBgAAZSDSIHLppZeqRLWwAACgDFEjAgAAYkMQAQAAsSGIAACA2BBEAABAbAgiAAAgNgQRAAAQG4IIAACIDUEEAADEhiACAABiQxABAACxIYgAAIDYEEQAAEBsCCIAACA2BBEAABAbgggAAIgNQQQAAMSGIAIAAGJDEAEAALEhiAAAgNgQRAAAQGwIIgAAIDYEEQAAEBuCCAAAiA1BBAAAxIYgAgAAYkMQAQAAsSGIAACA2BBEAABAbAgiAAAgNgQRAAAQG4IIAACIDUEEAADEhiACAABiQxABAACxIYgAAIDYEEQAAEBsCCIAACA2BBEAABAbgggAAIgNQQQAAMSGIAIAAGJDEAEAALEhiAAAgNgQRAAAQGwIIgAAIDYEEQAAEJuSBJEnn3xS48ePV21trS644AK99dZbpbgsAABIuMiDyI9//GMtWLBAixYt0rvvvqtLLrlEc+fOVXt7e9SXBgAACZeyLMuK8gKf+cxndP755+upp56yH5s6darmz5+vJUuW5H1tV1eX6uvr1dnZqbq6uiiHCQAAQlLI/TvSGZGjR4/qnXfe0ezZs7Menz17tjZs2JDz/O7ubnV1dWX9AQAAQ1ekQWT//v3q7e3VySefnPX4ySefrH379uU8f8mSJaqvr7f/NDY2Rjk8AAAQs5IUq6ZSqayfLcvKeUyS7r77bnV2dtp/9u7dW4rhAQCAmFRH+eYNDQ2qqqrKmf348MMPc2ZJJKmmpkY1NTVRDgkAACRIpDMiw4YN0wUXXKB169ZlPb5u3TpNnz49yksDAIAyEOmMiCQtXLhQN954o6ZNm6bm5mY988wzam9v1y233BL1pQEAQMJFHkS++MUv6uOPP9Y//uM/6ve//73OPvtsrV69WqeddlrUlwYAAAkXeR+RwaCPCAAA5ScxfUQAAADyIYgAAIDYEEQAAEBsCCIAACA2BBEAABAbgggAAIgNQQQAAMSGIAIAAGJDEAEAALEhiAAAgNgQRAAAQGwIIgAAIDYEEQAAEBuCCAAAiA1BBAAAxIYgAgAAYkMQAQAAsSGIAACA2BBEAABAbAgiAAAgNgQRAAAQG4IIAACIDUEEAADEhiACAABiQxABAACxIYgAAIDYEEQAAEBsCCIAACA2BBEAABAbgggAAIgNQQQAAMSGIAIAAGJDEAEAALEhiAAAgNgQRAAAQ
GwIIgAAIDYEEQAAEBuCCAAAiA1BBAAAxIYgAgAAYkMQAQAAsSGIAACA2BBEAABAbAgiAAAgNgQRAAAQG4IIAACITaRBZPHixZo+fbqOO+44HX/88VFeCgAAlKFIg8jRo0d13XXX6dZbb43yMgAAoExVR/nmDzzwgCTp+9//fpSXAQAAZSrSIFKo7u5udXd32z93dXXFOBq4LV+1Rul0WlfPuzzndytWr1Umk9E1V8yNYWQAgHKVqGLVJUuWqL6+3v7T2NgY95DQb/mqNdq+a7deXLVaK1avzfrd4kef0IurViudTtT/nAAAZaDgO8f999+vVCqV98/mzZuLGszdd9+tzs5O+8/evXuLep9KsHzVmpxAYKxYvVbLV60J9XrpdFrbdrbqzMmTssLI4kefsB/3mikBACCfgpdmWlpadP311+d9zumnn17UYGpqalRTU1PUaytNOp3Wi6tWS1JWAFixeq1eXLVa114xL9TrmWu8uGq1HUZeWv2KMpmMzpw8SYsWtIR6PQBAZSg4iDQ0NKihoSGKsaAAzmBgfnaGkChmJ9zXzGQySqfTviHEr6bELPOcMXFCTk0JtSYAUFkiLVZtb2/XgQMH1N7ert7eXm3ZskWSNHHiRI0YMSLKS1cEZzB4+ZW16unpjSyEOK/50uo1ymQsSX1hZMXqtZ7X9Ju12b5rt7btbM16rgkn23a25szmEE4AYOiKtLrw3nvv1Xnnnaf77rtPhw4d0nnnnafzzjuv6BoS5Lp63uWqrq5ST0+vqqurIq/TWPzoE8pkLKXTKUnKqRlxj+3aK+Zl/X7F6rV2Tcm2na324yaEuGtNzCwPhbAAMDSlLMuy4h6En66uLtXX16uzs1N1dXVxDyeRzI3ahJEoZ0SchamLFrTY1zahwu/afmN0P+5+n6iXmgAA0Sjk/p2oPiIojPtGbX6WFPqN2zmTYWpC3AWsmUwm53WmTsRv1iadTtmPO8NNqZaaAADxIoiUKa/ZAq8C1rBkMhnPUGB+9qvhcNaJmDBilmO8Hr963uV2CCnFUhMAIF4EkTIVJBiEKV+haNCwMH9OdlCSlDObs21nqx1CnOEEADA0EUTKlAkGXltkzd/j3m3i7mli6kEMZ2Hq1fMu17adrZ41KOb3AIChhyBS5sJubBbmeTLuWRuz5JJKpTR10kSdMXFC1nubEGIej3KpCQCQDASRMhd2Y7Mwg40zsKxYvTZrycW9TbfUS00AgGQgiAwBYTY2i6Jja5DdPWHUoAAAyg9BZIgIc7dJmMGm1Lt7AADlhXaVQ4R76WPxo0/4Pi/IybxhdWzNt+Ry7RXzWHIBgApHEBkCnLMOP/j2I3aHUncYeXDp477t0t0BxR1svFq4B3HNFXN9Q8zV8y7PWZJZvmqN77WChigAQPkgiJQ5r6WPRQtacsLIitVr9X7rrrzvYQKKO9i4z4uJkimWdV+LM2cAYGiiRqTM+S19LFrQYp8Nc9N/W2jXeUj5C1HjrumIoljWKcztyQCAwSOIlLl8N81FC1rsEOKu8/ArRE3CNtowi2Xdwu67AgAYHOa5hzC/Oo98haiF1nRExT3GTCYTSu2IKZJ1Lv9wyi8AxIcZkZDFOfXvvLb75rr40SeyznhJ+nku7hC1fddubdvZKmnwMxlRzrgAAApDEAlZnFP/5trmzBZn3Ydpn27Glq+5WBgGE8j86lac4x9s7Qin/AJAMhBEAijkphp1sWU+zmubFuruALRtZ6umTproWYi6bWerFi1oGfAzBlFsIBuoWNaEkTAarSV9VggAKgFBJIBCb6pxTv07r+3cLXP1vMu1fNUa30JUM4viviE7P2MpAlmQYtmdbW2DmskI0nIeAFAaBJEAirmpxjn173ftgXbYuG/IQW/YYQaygc6cGexMRtzbkwEA2QgiARVyU12+ao2279rtecMMu2DVa5bC3KzT6XRBN+uBPmPcgSyMmYwkbE8GABxDEClA0Juq2eFx5uRJWTMNziLSsLiXjZyFne4C1aBhJN9nLHSWI6xajLBmMjjlFwCShT4iBQhy/orZodIwenRWzYUzGEgK7cwUZ18Ms0XXXOvaK+Zp0YIWTZ000bdFezFnzAQ9EC/MVvEcngcAQxMzIgEFXRYwN0yp71/qzpmDMydPkmVZoW/jdc4MpNOprK27knTWlMl6v3WXtu7Ymbe2o5CakIFmOcKuxWAmAwCGJoJIAIXcVN03zBdXrbZv2maZJIodNH5LKqaGxDkb4QwZZitv0M9YaCCjFgMAkA9BJADnTdVZHOq+qeYrRO3p6Y20l4jfLIWpIbn2inl2GDGBReqbLXF/RifnZxxMIPN6TwAACCIBOG+q7uJQc1PNt8zhvPFHIcgshfm9CSqS7KZn7s/o1xMkXx8SiVkOAEDhCCIFKmS5woQS50xF2L0qCpmlcJ41I2nABmZuzHIAAMJGECnCQFtY3QWr7oDiLhodjGJqMaqrqzR/zuV2Ma15PqfQAgBKLWVZlhX3IPx0dXWpvr5enZ2dqquri3s4OUwL9erqKv3g249k/c7vph7Hzd65VGNmZpxByf14JpOJ7QRhAED5K+T+TR+RIg3UbyNJfS+27tgpSTn9PMxj6XQqa7eNqYNxfyYTaNJp/mcDAAgHSzNFCFIcmoR6CtNq/v3WXTk1JNt2ttrNzzIZy7MnCMs2AICoEUR8+J006+6/ISX30LR0Om13c3UvD23b2aoTTxid1fzMb7dNqU8QBgBUDoKID/c2XcMsc5j+G0YSt7A6w4S7kZmzDXy+QBXXCcIAgMpAEPHhtzzhXubwek0h/GZepHAKQ/1mNjKZTM5MifP5poFZGAfWAQDghyCSRymWJ/xmXvL18yhUoTMbhbRyBwBgMAgiA4h6eaIUhaFmZiOdTnvObLhnXsI+sA4AAD8EkQGUYnkiypkXd02IOXjPXNdr5oUD6wAApUJDszzytW6PYkYgX4O0YviN3xlK3AWrSRZ1PQ0AIBw0NAuB3/KEaQbmbvYVxvXyNUgrhntmw4x/285WpdOpsgohkmi0BgBDEEszPkq5PBFVYajX7ICz5iWVSuUdU6EzDKXcAWR+ptEaAJQ3goiPUnVGLXVhqHvmJeiOnSAho1Q7gCQarQHAUEEQiVkSZl5eXLVa//7uv+nF53+ka6//L2r93b6cMblDhgkmknKClDlh+Gt33h7JjAWN1gBg6KBYtULkOw34J//rX/Xqip/o4w/26YSTT9Ez31umL3xuTt73kI7N2Jj3NGfbbNvZKin3pN+wiknNOJzvTxgBgOQo5P7NjEgCeS2DOGcg3Dd0Z/2F87XOv5tOqmaGxfmad9/ZrOc+2Kcp5/wn7Xhviz5Z7V074pzxqK6uyvm9CSFnTp6knW1t9oyFeU0YSzM0WgOAoYUgkkBetRbOx5w3dPcshfN57r+bkOB8jWVZev6H/1MnnHyKzp9+ifZ/sE+333GHdrz/vmcxq3tZZP6cy7PqNcyWYElZdShhzFrQaA0Ahh6CSAIFvbkOdGO+9op59nZjSVlLKua8mf96a4tad+zQpZ+7SqlUSp+a9hm9/q8r9V9v
bdGVV1yRs5TiLnaVjgWO6uqqrCASNhqtAcDQE1mNyK9//Wt9/etf189+9jPt27dPY8aM0V//9V9r0aJFGjZsWKD3qPQaEa9aCEmB6iPcr5WOBYapkyZKkj76+GP98LtPKyXps1dfp1QqJcuy9OqKn0iSFnz1Hn3tzttz3tNrWcR5HffSjJk1OXPyJJ0xcQJNxwBgiEtEQ7Pt27crk8no6aef1tatW7V06VJ95zvf0T333BPVJQdl+ao1vk3EVqxeq+Wr1gR6Tpiunnd51mzD1fMu93wsyGudfz9rymS937pL723Zoo8/2Kezp33GXoYxsyIff7BP3Qc77fdb/OgTeZdY5s+53A4423a25syamJkSmo4BAJwiW5qZM2eO5sw5tvOiqalJO3bs0FNPPaWHH344qssWLWgPjKj7ZDj5dVsNcvZNviWUrTt2auqkiVr70gs64eRTdGrjuKzXnto4TiedOkav/uv/0j/+f4v08ppX7fqSTCajFavXauuOnXq/dVfWjpizpkzWRx8f0P4DByT1hRPnd8buFgCAW0lrRDo7OzV69Gjf33d3d6u7u9v+uaurqxTDklRY185SdPbMtwwy0I4Rv/DkDAYd7b/Rxx/ss2tDnFKplM48/0K9/q8rdd+D39Du33+Yc80Gx3+P7rFIUsPo0b67awAAMEoWRHbv3q3HH39c3/rWt3yfs2TJEj3wwAOlGlKOIF07S9HZM0i4MVtzncWo2UFhlLbtbNW2na1qGD1aJzWcYL9nvtkQ49TGcTrh5FP05L88oWe+tyyrGNbMhpiTfJ3X2X/ggKZOmqiv3Xl71iF+8+dcnlVMygF2AACpiBqR+++/X6lUKu+fzZs3Z72mo6NDc+bM0XXXXaevfOUrvu999913q7Oz0/6zd+/ewj/RIAWpwXA+J9/NtNiaEa/dIeYxs9vFvZ3X3OS37tgpSTqpoUHbdrbqxBP6woFZWnl9wy/12muv6eMP9ulTjtoQN2etyCerU1qxeq2+/si3tWL1Wp01ZbIkZR2el0ql7CWZs6ZM9lwacgYLDrADAEhFzIi0tLTo+uuvz/uc008/3f57R0eHZsyYoebmZj3zzDN5X1dTU6OamppChxQqr7oMd9Awz0mn08pkMlr86BNatKAl6/eDqRnxmgnwmx0w17nmirlasXqtXbfhnB0xhaLbdrbKsiz9avP/zjsbYphZka/8/d/rkrmf11lTJudsC85k+jZdmc1X7i3CfktISTrAjtkZAIhPwUGkoaFBDQ0NgZ77u9/9TjNmzNAFF1ygZcuWJf5fuUG6drqfs/jRJ7RtZ6sdRkp5Mx1omcj5+3Q6pUzG0u/3tvvWhrg5+4oMs3qzPt+Zkyd5vsYEniBNx5JygF0pDusDAHiLrEako6NDl156qcaNG6eHH35YH330kf27U045JarLFi1I107z9zMnT7J/t2hBix1G/vq2O2RZpd0dMtABcM7fm9mQEXX1qh0+XAc++nDA968dPlwj6uq16uWX9NKVV+oLn5tjh41USnJ2oTEzL87vxzkOKbfpWN8yU8pz/KWajUjS7AwAVJrIgsirr76qXbt2adeuXRo7dmzW75J4zl7Qrp3mZutcslm0oMUOIel0qqQ3roGWko4tI6X05z/36E+HDuk/Dh/SKy8+X9B1hn2iWkePHtWK1WtzQoi5tgkh7u/H8Ppetu/arUzGssOIeV2pZyOSMjsDAJWG03eL4Lc8Y5Y/SnUD81tKMsWrzkPoTNv1w4cOKpXJ6Ihjm7QkzbzkL3TZxdP1s19s0Dv//iv9satLqXRKVsbSzEv+Qtd/Yb7Gjh2bc8KulF0XMnXSRJ01ZXKgmQx3DYv7P+MIAs6dPp+f/VlqRwCgCJy+GzHnv55fWv2KfbKts4bC+bwoDLSUdNzw4fqPP/1JJ54w2g4N114xT2/8cqM++viAjnO937vbWzVh4iR1Z6R0Ta0u/otPZX2eCe9t1dixY+3D88z7mes5C1jPmjI5cAhxhyjn4XylDiHu2SVn4KJ2BACiQRApgtllYXbNpNPprF0zUydNjDyMOJeSnLs+rp53uV3DceIJo/XRxwc0vLZWn5t1mbbtbNVHHx+wx3jWlMn2dt/3W3fZYzahyjl+Z78Q81p3kzfn9uJCxm+uY5ZE0um0zpg4IbwvKwC/YGR6pZgxUjsCAOEiiBTBucvChBHTD8PcpMzyRFS8enJIfTdL98zMn44csf9uZkpMkDA31/dbd9nv5wxV5j0l6f9s3zFgHU3QpYqBTvUt5Q6rgWaXTBihdgQAwkcQGaQvzOs7TyfO81T8tsd6mTdzRtZznX83NS5+hab53ncwnznItukoBSlUdp4mTAgBgPAQRArkdY5LVOepFNJoy2/Xh6lhMdz1HEaYNS6FjNvv7BozVtOEbeqkiZEVhuZ7XxOMghw0CAAoHEGkQO5/PTt7eLjPUxmsQhttuXuKbNvZqkwmo1QqJcuy7N0o7vBkajKcN1h3XYTfsotX6DDj3razVWdMnJATOpzjNoW+Zlzmms5aF0l2W/lSe3Dp457dap3jZ+cMABSPIFIg501noPNUBqvQRlvu8TgLVt1bY814U6lU1lZf53VN7Uu+HSJeYckdIszYvMbtDinOm7xzd04cMxDu2hkp978Tds4AwOAQRIoURV2D1+xC9lbhNb59SryWjCTpPzdfZP/s7CdiXPO5uVm/L3SHiF9YcoYe05sj6Ps4w0ichaFm9suMS8r+73bqpIks0QDAIBFEihCkHXwxNyjnkoZ754okuwNpvhBitvP69fgwzpw8yQ4ezt8Xs0MkX1dSZ4OwfO9jQphzdsm8Jq7mYe7rsXMGAMKX7FPoQrZ81ZqcY+eNFavXavmqNYHeJ98ui6B9NLxcPe9yexZh8aNP2OPKrkmwcj6DezzXXDHXrrPwGk86ndKiBS32753LDGdMnGCHgUJ2iFw97/Kc13kVefoxIcyEEEnq6enV4kefsBudxcnr8wEABq+iZkTCOmV1oF0Wg+E8RO/GljuzQoTfMkGQ8Ty49HFJytn54e4BUuwOEffrzGcoZunKOVsTV5dVN3bOAEA0KiqIFFr8GZdFC1o8Q4i7k6l7V4ofU3TpFwrM+xZb9+J39o4zQORbunJex8wI7Wxrs3/vd4heqcTd5wQAhrKKCiJSsk5Z9eu3YWoi/Dh3pZw5eVLeawStZym27sXrdaY9uztAOGdfnDKZTFbbeGddyfw5l2vrjp2RdqnNJ6p6IABAn4oLIlJuv424biReS0XumhDT7dRrV0qQABWka2ghzwvy/s4tue7XeY13oC3RX7vz9ryfMUrFfi8AgGBSlmVZcQ/CTyHHCBfC3OzNzS7OZRm/bbeScpY6pGM1Hs6D6dzvV65NtvyWQJK0bAYAGFgh9++K2jUjZd/sfvDtR+xdI/l2dETJuXPlpdXHdu04b76LFrTYSzB9B8Kl7GUPJ/PZwtph4txl5N5x5NxlVMiOIz9+SyBx//cDAIhWRS3NJHW937lUlE6n9IV5c3PGYYo4zVLNQM3HCjnvxY/7lGHz9607dtrFr+4ZnWJ
nZFgCAYDKVFFBJKk3O6+6CPfvvZYs8jUfC2OrsjOkeTVFMwWzXksphYpySzQAILkqKogk8WY30NbQgWZx0um0Z9FtWFuVne+TfVBeKmuLLvUcAIBiVGSxalL43bydj2cyGd8lFlPEmq/oNqzCXOeWWulYrUomYyWi6BcAkByF3L8rakYkaYIsFfnN4ri38Po12Qpjq7LX0pH5u9+MDAAAQRBEYlTsUlEhRbeDbU3ut714/pxjTdVMGClF99MwinABAMlBEClDQYtuB9ua3CuEuAOJ2c3j3sUTlbDOCwIAJANBpAwFmUkJY6uyM/AsX7UmK9BI0tRJE7VoQYs9E1GKMFIu5wUBAIKhWHWIinIJI8h7S4p0CSVJ3XEBANkKuX8TRBCJIDuCBhscnDt5fvDtRwY7ZABASNg1g9hFvYQy2CJcAEAyEEQQGWcY8er+WqzBFuECAJKDIIJAiq05CaOPiftaSTwvCABQnIo7fRfFMdtmCz3x12sJxcl9qq/7te5TffNtXTadaAEA5YMZEQRSTM1HkCWUQvuCJPG8IABA8QgiCKyQmo+BllC27WzVogUtOQFHkrbu2Kn3W3exJRcAKgBBBAUJWvORbwnFtIY3O128wki+EEKbdwAYOqgRQUEGqvkwrrlirm+QWLSgRddeMc+z5kTSgEWtxdarAACShxkRBBbmtlmvZR5JgfqC0OYdAIYOgkgMynFpIYpts85lHkkFBZyoepQAAEqLIBKDcjxBNuiJv4UwyzxuQQNO2D1KAAClRxCJQTkuLYS9bdZ83qmTJuqsKZMlZX8fQQIObd4BoPwRRGJSyUsL+UKXVxgJ8h60eQeA8kQQiVGlLi0MdpmHNu8AMHQQRGJUqUsLg13miaJeBQAQD4JITFhaKB5t3gFg6CCIxCDKpYVy3BoMAKhcBJEYRLm0UI5bgwEAlYsgEoMolxbKcWswAKByEUSGoEreGgwAKC+Rng72+c9/XuPGjVNtba1OPfVU3Xjjjero6Ijykuh39bzL7d04lbQ1GABQXiINIjNmzNALL7ygHTt2aPny5dq9e7euvfbaKC+JfkFPyQUAIE6RLs3ceeed9t9PO+003XXXXZo/f77+/Oc/6xOf+ESUl65obA0GAJSLktWIHDhwQD/84Q81ffp03xDS3d2t7u5u++eurq5SDW/IoOsoAKCcRLo0I0lf/epX9clPflInnHCC2tvbtXLlSt/nLlmyRPX19fafxsbGqIc35OTbGnztFfPoOgoASJSUZVlWIS+4//779cADD+R9zttvv61p06ZJkvbv368DBw7oN7/5jR544AHV19dr1apVSqVSOa/zmhFpbGxUZ2en6urqChkmAACISVdXl+rr6wPdvwsOIvv379f+/fvzPuf0009XbW1tzuO//e1v1djYqA0bNqi5uXnAaxXyQQAAQDIUcv8uuEakoaFBDQ0NRQ3MZB7nrAcAAKhckRWrbtq0SZs2bdLFF1+sUaNGqa2tTffee68mTJgQaDYEAAAMfZEVqw4fPlwvvfSSZs6cqSlTpuhLX/qSzj77bL3xxhuqqamJ6rIAAKCMRDYj8qlPfUo/+9nPonp7AAAwBES+fRcAAMAPQQQAAMSGIAIAAGJDEAEAALEhiAAAgNiU7NC7YpgGaBx+BwBA+TD37SDN2xMdRA4ePChJHH4HAEAZOnjwoOrr6/M+p+CzZkopk8moo6NDI0eO9Dwkz4s5KG/v3r2cT1MAvrfC8Z0Vh++tcHxnxeF7K1xY35llWTp48KDGjBmjdDp/FUiiZ0TS6bTGjh1b1Gvr6ur4H14R+N4Kx3dWHL63wvGdFYfvrXBhfGcDzYQYFKsCAIDYEEQAAEBshlwQqamp0X333cfBegXieysc31lx+N4Kx3dWHL63wsXxnSW6WBUAAAxtQ25GBAAAlA+CCAAAiA1BBAAAxIYgAgAAYjPkg8jnP/95jRs3TrW1tTr11FN14403qqOjI+5hJdavf/1rffnLX9b48eM1fPhwTZgwQffdd5+OHj0a99ASb/HixZo+fbqOO+44HX/88XEPJ5GefPJJjR8/XrW1tbrgggv01ltvxT2kxHvzzTd15ZVXasyYMUqlUnr55ZfjHlLiLVmyRBdeeKFGjhypk046SfPnz9eOHTviHlaiPfXUUzrnnHPsRmbNzc1as2ZNSa495IPIjBkz9MILL2jHjh1avny5du/erWuvvTbuYSXW9u3blclk9PTTT2vr1q1aunSpvvOd7+iee+6Je2iJd/ToUV133XW69dZb4x5KIv34xz/WggULtGjRIr377ru65JJLNHfuXLW3t8c9tEQ7fPiwzj33XD3xxBNxD6VsvPHGG7rtttu0ceNGrVu3Tj09PZo9e7YOHz4c99ASa+zYsXrooYe0efNmbd68WZdddpmuuuoqbd26NfqLWxVm5cqVViqVso4ePRr3UMrGP/3TP1njx4+PexhlY9myZVZ9fX3cw0icT3/609Ytt9yS9dgZZ5xh3XXXXTGNqPxIslasWBH3MMrOhx9+aEmy3njjjbiHUlZGjRplffe73438OkN+RsTpwIED+uEPf6jp06frE5/4RNzDKRudnZ0aPXp03MNAGTt69KjeeecdzZ49O+vx2bNna8OGDTGNCpWis7NTkvj/YwH19vbq+eef1+HDh9Xc3Bz59SoiiHz1q1/VJz/5SZ1wwglqb2/XypUr4x5S2di9e7cef/xx3XLLLXEPBWVs//796u3t1cknn5z1+Mknn6x9+/bFNCpUAsuytHDhQl188cU6++yz4x5Oov3qV7/SiBEjVFNTo1tuuUUrVqzQmWeeGfl1yzKI3H///UqlUnn/bN682X7+P/zDP+jdd9/Vq6++qqqqKv3N3/yNrAprKFvodyZJHR0dmjNnjq677jp95StfiWnk8Srme4O/VCqV9bNlWTmPAWFqaWnRe++9p+eeey7uoSTelClTtGXLFm3cuFG33nqrbrrpJm3bti3y61ZHfoUItLS06Prrr8/7nNNPP93+e0NDgxoaGjR58mRNnTpVjY2N2rhxY0mmnJKi0O+so6NDM2bMUHNzs5555pmIR5dchX5v8NbQ0KCqqqqc2Y8PP/wwZ5YECMvtt9+un/70p3rzzTc1duzYuIeTeMOGDdPEiRMlSdOmTdPbb7+txx57TE8//XSk1y3LIGKCRTHMTEh3d3eYQ0q8Qr6z3/3ud5oxY4YuuOACLVu2TOl0WU6chWIw/1vDMcOGDdMFF1ygdevW6eqrr7YfX7duna666qoYR4ahyLIs3X777VqxYoVef/11jR8/Pu4hlSXLskpyryzLIBLUpk2btGnTJl188cUaNWqU2tradO+992rChAkVNRtSiI6ODl166aUaN26cHn74YX300Uf270455ZQYR5Z87e3tOnDggNrb29Xb26stW7ZIkiZOnKgRI0bEO7gEWLhwoW688UZNmzbNnmlrb2+n/mgAhw4d0q5du+yf9+zZoy1btmj06NEaN25cjCNLrttuu00/+tGPtHLlSo0cOdKeiauvr9fw4cNjHl0y3XPPPZo7d64aGx
t18OBBPf/883r99df1yiuvRH/xyPflxOi9996zZsyYYY0ePdqqqamxTj/9dOuWW26xfvvb38Y9tMRatmyZJcnzD/K76aabPL+3n//853EPLTH+5V/+xTrttNOsYcOGWeeffz7bKQP4+c9/7vm/q5tuuinuoSWW3/8PW7ZsWdxDS6wvfelL9v9tnnjiidbMmTOtV199tSTXTllWhVVtAgCAxKjcxX8AABA7gggAAIgNQQQAAMSGIAIAAGJDEAEAALEhiAAAgNgQRAAAQGwIIgAAIDYEEQAAEBuCCAAAiA1BBAAAxIYgAgAAYvP/A7VSxPISKof1AAAAAElFTkSuQmCC", - "text/plain": [ - "
" - ] - }, - "metadata": { - "engine": 0 - }, - "output_type": "display_data" - } - ], - "source": [ - "%%px --target 0\n", - "# plotting on 1 process only\n", - "plt.plot(c1_np[:,0], c1_np[:,1], 'x', color='#f0781e')\n", - "plt.plot(c2_np[:,0], c2_np[:,1], 'x', color='#5a696e')\n", - "plt.plot(centroids[0,0],centroids[0,1], '^', markersize=10, markeredgecolor='black', color='#f0781e' )\n", - "plt.plot(centroids[1,0],centroids[1,1], '^', markersize=10, markeredgecolor='black',color='#5a696e')\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can also cluster the data with kmedians. The respective advanced initial centroid sampling is called 'kmedians++'." - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[stdout:3] Number of points assigned to c1: 100 \n", - "Number of points assigned to c2: 100 \n", - "Centroids = DNDarray([[-2.0081, -2.0299],\n", - " [ 1.9919, 1.9701]], dtype=ht.float32, device=cpu:0, split=None)\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "[stdout:2] Number of points assigned to c1: 100 \n", - "Number of points assigned to c2: 100 \n", - "Centroids = DNDarray([[-2.0081, -2.0299],\n", - " [ 1.9919, 1.9701]], dtype=ht.float32, device=cpu:0, split=None)\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "[stdout:0] Number of points assigned to c1: 100 \n", - "Number of points assigned to c2: 100 \n", - "Centroids = DNDarray([[-2.0081, -2.0299],\n", - " [ 1.9919, 1.9701]], dtype=ht.float32, device=cpu:0, split=None)\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "[stdout:1] Number of points assigned to c1: 100 \n", - "Number of points assigned to c2: 100 \n", - "Centroids = DNDarray([[-2.0081, -2.0299],\n", - " [ 1.9919, 1.9701]], dtype=ht.float32, device=cpu:0, split=None)\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "%%px\n", - "kmedians = ht.cluster.KMedians(n_clusters=2, init=\"kmedians++\")\n", - "labels = kmedians.fit_predict(data).squeeze()\n", - "centroids = kmedians.cluster_centers_\n", - "\n", - "# Select points assigned to clusters c1 and c2\n", - "c1 = data[ht.where(labels == 0), :]\n", - "c2 = data[ht.where(labels == 1), :]\n", - "# After slicing, the arrays are not distributed equally among the processes anymore; we need to balance\n", - "c1.balance_()\n", - "c2.balance_()\n", - "\n", - "print(f\"Number of points assigned to c1: {c1.shape[0]} \\n\"\n", - " f\"Number of points assigned to c2: {c2.shape[0]} \\n\"\n", - " f\"Centroids = {centroids}\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Plotting the assigned clusters and the respective centroids:\n" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "c1_np = c1.numpy()\n", - "c2_np = c2.numpy()" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[output:0]" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAiIAAAGdCAYAAAAvwBgXAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/H5lhTAAAACXBIWXMAAA9hAAAPYQGoP6dpAAA9RklEQVR4nO3df5RU9Z3n/1d1d+jGQLc2jShLE2l+CTG6Cv6A0RkRB4E1ETc6ibthnJOEObjiSDjf2UTJ+mOig5lxRCYmjpoczNkxqCMSHQZQwKjJEIL4lTEB5VcTG22JIptuYENzuuvuH92fy61b91bdqr637q2u5+McztLVVXU/VWbnvvh83p/3J2VZliUAAIAYVMU9AAAAULkIIgAAIDYEEQAAEBuCCAAAiA1BBAAAxIYgAgAAYkMQAQAAsSGIAACA2NTEPYBc0um02tvbNXToUKVSqbiHAwAAArAsS0ePHtXIkSNVVZV7ziPRQaS9vV3Nzc1xDwMAABTh4MGDGjVqVM7nJDqIDB06VFLvB6mvr495NAAAIIjOzk41Nzfb9/FcEh1EzHJMfX09QQQAgDITpKyCYlUAABAbgggAAIgNQQQAAMSGIAIAAGJDEAEAALEhiAAAgNgQRAAAQGwiDSKPPvqozj//fLsPyLRp07R+/fooLwkAAMpIpEFk1KhReuCBB7R9+3Zt375dV111la677jrt3LkzyssCAIAykbIsyyrlBRsbG/X3f//3+trXvpb3uZ2dnWpoaFBHRwedVQEAoVu9dr2qqqp0/dxrsn63Zt1LSqfT+uK1c2IYWXkr5P5dshqRnp4ePf300zp+/LimTZvm+Zyuri51dnZm/AEAICpVVVV6bu06rVn3Usbja9a9pOfWrst7ciz6L/Jv+Ne//rWGDBmi2tpaLVy4UGvWrNHkyZM9n7ts2TI1NDTYfzh5FwAQldVre2sWb7h2bkYYMSFk0vhxnjMlCFfkSzMnT55UW1ubfv/732v16tX64Q9/qNdee80zjHR1damrq8v+2Zzex9IMAAx8pV4mMYHjhmvnSpKeW7tONTXV6u7ukdQbUAgixUnU0sygQYM0btw4TZ06VcuWLdMFF1ygFStWeD63trbW3mHDibsAUFlKvUxy/dxr7NkQSYSQmNSU+oKWZWXMegAAIMm+8ZtgcP3cazJmLaIIBu5r5uM1a2Mek5Q1a0PBa36RBpE777xTc+bMUXNzs44ePaqnn35ar776qjZs2BDlZQEAZcoZDH664SV1d/eUdHaipqZa82ZfkxGGnMysjfN3zsfMMo8k3bf8e3pn776MxwwCyimRBpHf/e53mj9/vj788EM1NDTo/PPP14YNG/Snf/qnUV4WAFDGrp97jR1CamqqIw0hZsZFyl6a8QojXrM2fu/7zt59Oa/pFVAqUaRB5Ec/+lGUbw8AGIDWrDsVQrq7e7Rm3UuRhZGdu/dIOlUT4gwJN1w7V+l0Ous1frM2uR4zr4t6qakclbxGBAAAP+4btXPGIuwbt5m1cIYCZ8i44dq5nksnpibEBCXnrM3z6zZ4zuTEtdRUDggiAIBE8JotCLoUUox0Ou0ZCszPXrMhUmZNiHPWZteevUqn06qqqsqYySnlUlM5IogAABKh2GBQrFyFokHDwrzZmUFp8oTxWrp4UcZMjqSSLTWVI4IIACARTDDw2iJr/h73bhOvJmhVVSn795MnjJeUPZNTiqWmckUQAQAkitcWWan43SZhdmx1z9qYJZeqqpT+69w5nrM2zlbxUS41lSuCCAAgUcJubBZmsHE3K3Muubh/X+qlpnJFEAEAJE6Yjc2i6NgaZHdPGDUolYAgAgBIpDB3m4QZbEq9u2egi/zQOwAAiuFe+rj/4UeynrN67Xrd//AjWr12vefrnY9fP/caz94fhcq15OJugrZ67fqsQ/z8xlepCCIAgMRxzjr8+B8f0uQJ47Vrz96sMPLuvv3atWev3t233/P1zhN7vTq2FuOL187xDTHXz70mY0mm1CcKlyO+AQBAongtfSxdvCgrjJgmYuZxc7P3er072JizZIoNI0GZWRLntWjznillWZYV9yD8dHZ2qqGhQR0dHaqvr497OACAEsi13fb+hx/Rrj177VkNd7Go+3HJ/8ZfykCQa3z9Feb25LAUcv9mRgQAkCi5lj6WLl7kWeeRq/6jkJqOqHiNL6z6kXJf/kn26AAAFc19s3bXedy3/HuejztfU0hNR1S8xhdWgCj35R+27wIAcopz6t/ZjExSRgOy59au0zt799nLNUlto+7Xc+SGa+faAULqX3+TMLcnlxpBBACQU9gt1wvhdWaL+fmGa+dq1569dsGqV0+PXXv2auniRVnvW0yAKiaQ5es54gwjYTRuK8dTflmaAYAKVEh9QtxT/9fPvUaTxo+T1Hu2i/O6544bq8kTxuvccWOzXuPeTeP8fGbpo5DvoZillCD1KWH1Nwlre3KpMSMCABWo0FmOuKf+v/2N23TzXy3JulnnmtFYunhR1jJNkNbskvf3UEyr+CBt3r0CRKHfa9DPlUQEEQCoQMXcVOOc+i/2Zp0vQBX6PYQdyMIIEOXecp4gAgAVqpCb6uq16/Xuvv2eYSDsglV3LYZ7dmLn7j0F3WDzBahCw0VYgSysAFHup/wSRACgggW9qZpW6sOHNerh79xt30SdxaKr164PJYzk2ylzw7Vz9dmJEwLfrIPMphQSLsJYSpHCCxDlfsovxaoAUMGCFDiaVurDhzXq40+O6P6HH8koBh0+rFG79uwNrXGWszh25+49WSHk+rnXKJ1Oa/KE8Z43a2eRadDW7kELPcNsFZ+E/iZJwIwIAFSooPUJzn+5m54d8xctVjpt2eEk7MJV5xLF3gMHspZLqqqq7JkYv88UdOmjmALWcqzFSCqCCABUIOdNNZ1OZywvuG/S5u9S704UE0IkRRJCDL/lElND4tcMbNL4cXYL9XxLH4WEi3KvxUgqgggAVCDnTdVrBsB9kzZ6C1NPnZXq1+ArDH61GKaGxKsZmCR9duIEScFqJ4KEFaPcazGSiiACABXIeVMNuoXVPGZqQ6qqqpROp3X/w494di/tjyDLJeb3JqhIyuiw6n4/r509hIv4EUQAAHm3sLpDiLtmJMwwUshyiXN3jSS7k2qQJm1IBnbNAAAkeR9Vb5hdKs4QIvXWjPi1Ui9WkLbobjU11Rm7a8rxFNpKlbIsy8r/tHh0dnaqoaFBHR0dqq+vj3s4ADCgmZu2CSPum3ecp/D6jVVSxngleT6eTqcTM/ZKUMj9mxkRAECg/hhJ6nuxc/ceScoar3msqiqVMbNTzIF1KA1qRACgQvjNaLi3vUrJ7Y9hWs2/s3dfVg3Jrj177TqWdNry7HxayNk6KA2CCABUCL8Td83sgtn2aiSxP4azkZm7INXZ5TXfbps4ThCGN4IIAFQIv1kB9+yC12uSwvkZnIfuee3ocT/f/BzXCcLwRrEqAFSYfEWp/VWKolavzxCkINXMCkX12dGLYlUAgK9c23TDUIrCUK/PkK+Y1tmRtb8H1iE8LM0AQIUJ6xh7P6UoDD
WfoaqqyvMzuGdeOLAuuQgiAFBBgp40219RFoa6a0ImTxjvG3oMDqxLLmpEAKBC+M1KRLmN9ea/WmLPvvz4Hx/q9/v5BSlnKHEXrCZVkhrEhY0aEQBAlmJap/eH1xJQf7k/gxl77yF8qbIJIVJpamnKAUszAFAhSnnSbFRLQF6fwbklN5VK5RxTobMMUc5a0GStF0EEABCqUheGumdevK7hVTcSJGT4NYEL60RfmqwRRAAAIStlYajfzEuQWQZ3yDDBRFLW859bu047d+/Rt79xW+izFpXeZI1iVQBA5KJY4shXfCspb+My98yG8+A8E07e3bdfu/bszXo/SZE1Zyv3MFLI/ZsZEQBA5LyWOJw3efcShzOcOEOM8+/pdLrvgLt01msk6fl16/POMjhnPGpqqrN+b8Y3ecJ47Wlttd/PvKa/SzOl2k6dZAQRAEDkvGpEnDf5XPUXzhDj/rt5vfME4TXrXtJ/vPX/a+3TT2nq5X+sM0eOsnemeM1guJdG5s2+JqNmw2wJlpRRh9LfmQuarPUiiAAASsKrMNPc5N0H2PndnG+4dq7dml1SxpLK5AnjNXnCeP3Lv/6bXl7zL/r9J4fV+eH7+pM/uTLr+U7uYldJGe3jc1Uw9GfnDE3WekUaRJYtW6bnn39e7777rgYPHqzp06fru9/9riZOnBjlZQEAClaXIamkTbXcsw9LFy+yw0euXSN+SyjuQPPxJ5/ow4Nt+uR3hzTx/P+sbdu26bThZ2vk6M94fr6du/dknD7sVV/yzt599vfknjWRvMNNEKXcTp1kkXZLee2113Trrbdq69at2rhxo7q7uzVr1iwdP348yssCABSsYVapm2r5nXMT5BA+9/Ocfz933FgNH9aojw5/ond3bNfws87WRdOv0LARZ+nX238ly7Ky6kmeW7suI4S4zZt9jSaNHydJGj6s0d7O293dYy/VTBo/rqJCQxQinRHZsGFDxs8rV67UmWeeqTfffFN//Md/HOWlAaDiFdIwqxRNtfwKM3ft2RvoEL5cSyjv7tuvjz85oj90/B8d+uADXflfrlMqldLnpl6qV//tBX14sE03fv6/ZFx30vhx+uzECUqn07pv+feUSqUyCmfT6bQ+O3GC3nv/A338yRENH9aojz85YndwnTxhvJYuXhTa91OpSloj0tHRIUlqbGz0/H1XV5e6urrsnzs7O0syLgAYqII0zCpFUy2/2o9de/baN/Vzx43Vu/v2exZq3v/wI3ZIMMspUu+shRn7pPHj9PB3/1bDRpyls5tHS5LObh6tYSPO0r7f7NC//Ou/eX6+Netest/PWTjrXKYZPqxRV1x6iV58eWPfqb+pjNoWFK9kjewty9KSJUt0+eWX67zzzvN8zrJly9TQ0GD/aW5uLtXwAGDACrL04XxOrpqR1WvXFzUGr8LMNeteygghzl0wzuUiE0KaGhu1a89eOzQMH9ZoB4XJE8Zr8+bN+uR3h/S5qZfard7NrMj7772n331wMOs7MJ/HLMGYcOEVQpw1Iul05lKPeS+/83T6890NdCWbEVm0aJHefvtt/eIXv/B9zh133KElS5bYP3d2dhJGAKCf/OoyvJ5TVVWldDqt+x9+JGPZob8tzb0KM/12jZgdMOl0OiOsmIAyecJ4Sb2hYfiwRr265Zf6+JMj+vX2X2XMhhhnN49W04iz9B/bturs5tEZB/C9s3ef3tm7z35Pc33DLMc4+524T/01om4HP1CVJIjcdtttevHFF/X6669r1KhRvs+rra1VbW1tKYYEABUhSMMs93PMDIQJI1HVjPgdYCf1hgHTQMyrnsXc1E3RqNkpY2pDnFKplM7rqxUZM2K4Lpp6ccaOl89OnGCHCvN+5nUff3LEfjxfv48kHWIX5WF9YYs0iFiWpdtuu01r1qzRq6++qjFjxkR5OQCAQ5CGWebvztqIpYsX2WHkK7feLstSSW+kuc5e8drGa1mW72yIYWZFXv63f9WFU6ZmXc+8p5PZaXPuuLFZTdecr3Muz1w/9xrt3L3Ht96mVCGgnGZnIq0RufXWW/XP//zP+slPfqKhQ4fq0KFDOnTokP7whz9EeVkAgHI3zLrh2rlKp9N2m3RTG2EsXbxIqZRkWVJVVaqk/5r3Wkpyj9+5c8bMhjhrQ9zMrMi2bdv0vUf/yW6Mlq9p2K49e31nFsxY3KHisxMnSFJWkIpqS7TfuEzjN/P9xTU7k0+kMyKPPvqoJOnKK6/MeHzlypX6i7/4iygvDQAVr5CGWe4lm/sffsQOIem0VbLdIUGXkkwICTIbYpgdNL/Z/ivNm7PGDi3Oa5wKX1V2SOlPu3VnkCp1CCjFbqgwRL40AwBIPudN6/l1G+yZEmeNiPN5Uci3lPTqlq06s2mYXccxecJ4bdq0ybc2xM3ZV+Tu+/5Wf/O/lkqSdu7eI+lUYaqZITL/b1NjY+B2616n+bpP9C0l9zKXKQBOUu0IZ80AACT13rRMCKmqqrJ3zZTqIDbnUpKz2NLZb+TwkSM6bXCdTht8mnbu3qN9v9mhIfUNqhs8WEc+/ijvNeoGD9bQhgb94PuP6D9fNEWpVMreNeO3K8Ysz+TjFaRMCIiLe5nL7P6RklM7QhABgApnbvqS7BBi/uVsHjP/So7yIDbnv8TdxZZLFy/Sfcu/p3f27tP//cMJnTb4NE0cO0YvHP5E//f4MW147umCrtVYU6NnX/xXnXfuuXatiLMg1VmI6u4X4sddk+MOATt37yl5rY3XMpfZBi3Fu7PHSFkJXj/p7OxUQ0ODOjo6VF9fH/dwAGBAci69eNVmxHWDylUvYhw/dlQjhzcppZRa29o084o/0lWXT9ePnnpGrW1t9vPuv+OvM1535pln6o23d0a2FOE39lJ9l37Xc8/0mJAU9rgKuX8zIwIASCS/YkvncsenhwzVn86cmXGz/489+5ROW2ocfqZ9o33v0MdZN9pcfa36w3mz96p12bVnr84dNzbSWoxcO6bM702fllwHDZYCQQQAKpy5aUnZN33z+7AU2mjLXWwpKavmwrnM8Py69Uqneyf6c+28KXSM7uUr5xjd43ZuiXZ2qHXWupj3jCqM5NsxFaTbbqkQRACgwjlvWn6NxMJSaKMt9w3TebaMs9bh1G6fU9UGZjeMX8dTv2UZrzE6H3OO0Wvc5j29OtQ6d+M428OXkqm18VvyKvXOGYIIAEBSsDNp+quQNuheW2Gl3m225kZumnZJp2ZunCHFHJDnXJLIt0Mk6C6hfHUfzg618xd9I2OmJM66G/OdGO7PW+qdMwQRAECgRmKFMssZZieOs15i1569GT1LcoUQs5130vhxSqVSGSf03nDtXLv/h+GszXD28SikaNSvPsXrMb/3Wb12vSZPGK939+23Q5IJIeb3pe7Z4V6GkzL/+04aP67kAYkgAgAVLsiZNMXcnMxyhvMwORMGdu3Zq1QqlRVSDHc48arJcC7NSPK9weY6+yUXvzNvgi5fOZdzTj2WsscYR88Od/BJQtdVgggAlLEwTlkNssOiGM4wY0KDKdY0sxjOniXO6wdpT+88g8YdCtyn4
t78V0sKrn3xO/Om2OUr0y4/Kee95DpcsJSiP3kHABAZ869u98FwhRyw9sVr5xR0qFshzOFrpjtp70yI7Fbq//uR5VmHswVlilHdQcF5qJ+U/xA9L85Zoh//40P2GL0e83o/5xZew/nfwswQxamY7yUKzIgAQBkrpPizFLxmaJz/8pZ6D5VLpeTZQj5ojw1TdOlX0+I+8baQ2pdi6kjc7+cuTHXOPExoabFPO45rFiKKmqBiEUQAoMwl6ZRVr62vztNyDcuSb4+NfNtag9a0FFv74rVU5SzydC5V+S1fffHaOXaxqpS5nGN29UTZLj+XqGqCikUQAYABICnr/e4bmvvvZpbAbME1YcTd28LJq2FYkJqWYmtfvGZjgtSseL0mV6v3Uu+YMaKqCSoWZ80AwABgbnBRnR1S7Hic3Ddjd8Gq8znu9wnr84RR3BtUkP4ocResRoWzZgCggiRpvd9wtluvqkrpv86dk7UMkE6n9dMNLyudTqumplrzZl+Tc2kljBDhXDpybh12NzkLI5QkbeYhqQgiAFDGkrbe7xxXOm3ZMzRuzjbrzueYnShetS6Ftof34t5SbOpSTG1KmKGkmOWcSkQQAYAylsR/dQeZoclVO2GCibvWJawdQs73MVuKnTUrzhbshYQcFIcaEQBAaAo5N8bvOZJy1rqEVQ9jmpydakPf23Bs8oTx9iF1A72WIyrUiAAAYhF0hibXzX3S+HH69jdu8611CWOHkLuZlwkhVVW9Z9mYkEIIiR5BBAAQmmLrIgqpdenvKcHua5nlmFNhpCr2bdCVhBbvAIDY5ZpJcbdrD9pm3YtXbYopVE2nLQ0f1mjvpilV2/PVa9f7XmfNupe0eu36yMcQJ2ZEAACxC3rIXX93CDkDj9/MiNlN4zzdN8qZkTB2A5UzgggAoCyEsUPIGXjcocS9W8acFxN1GEnaeUGlxq4ZAEDFC9IsTVKkXVmT1h23Pwq5f1MjAgCoeF+8do7vTf/6udfoi9fOsZdQ3PUcJkBUVfXvlnr93Gt8e6gMZCzNAAAQQNRLKP3dDVSuCCIAAATkDCNebeiLlcTzgkqFIAIAqEjFHqIXRkM197WSeF5QqVAjAgCoSMXWfHgtoTgV2hckaA+VgYoZEQBARSqm5iPIEkqhfUEq/ZRegggAoGIVUvORbwll1569Wrp4UVbAkaSdu/fonb37fN+72GWigYClGQBARQu6bTbXEorpxmqWZMyyynNr1+m5tetyhhCp+GWigYAZEQBARQu6bTbXjMTSxYty7nTJV9Rayd1VCSIAgIoV5rZZr2UeSYH7gkS1NTjpBu5cDwCgZMrxBFm/mo9CT/R1ci7zSCr4lOBK7K7KjAgAoN/K8QTZMA7RczPLPG5B+4JUYndVgggAoN/KscYh7G2z5vNOGj9On504QVLm95Ev4FRqd1WCCAAgFJVa4yB5L/MYXmEkyOsrpbsqQQQAEJqw25+Xi/4u80SxTFQuUpZlWXEPwk9nZ6caGhrU0dGh+vr6uIcDAMjD/Mve1DhUyowIMhVy/2ZGBAAQikqtcUD/EEQAAP0WZY1DJbc/rwQEEQBAv0VZ41COW4MRHEEEANBvUZ4gW45bgxEcQQQAkHiVvDV4oIu0xfvrr7+uz3/+8xo5cqRSqZR++tOfRnk5AMAAVontzytBpEHk+PHjuuCCC/TII49EeRkAQAXwan+O8hfp0sycOXM0Zw6VzACA/mFr8MCVqBqRrq4udXV12T93dnbGOBoAQBJUcvvzSpCoILJs2TLde++9cQ8DAJAgldz+vBKUrMV7KpXSmjVrNG/ePN/neM2INDc30+IdAIAyUrYt3mtra1VbWxv3MAAAQIlEumsGAAAgl0hnRI4dO6Z9+/bZPx84cEA7duxQY2OjRo8eHeWlAQBAGYg0iGzfvl0zZsywf16yZIkk6eabb9aTTz4Z5aUBAEAZiDSIXHnllSpRLSwAAChD1IgAAIDYEEQAAEBsCCIAACA2BBEAABAbgggAAIgNQQQAAMSGIAIAAGJDEAEAALEhiAAAgNgQRAAAQGwIIgAAIDYEEQAAEBuCCAAAiA1BBAAAxIYgAgAAYkMQAQAAsSGIAACA2BBEAABAbAgiAAAgNgQRAAAQG4IIAACIDUEEAADEhiACAABiQxABAACxIYgAAIDYEEQAAEBsCCIAACA2BBEAABAbgggAAIgNQQQAAMSGIAIAAGJDEAEAALEhiAAAgNgQRAAAQGwIIgAAIDYEEQAAEBuCCAAAiA1BBAAAxIYgAgAAYkMQAQAAsSGIAACA2BBEAABAbAgiAAAgNgQRAAAQG4IIAACIDUEEAADEpiRB5Ac/+IHGjBmjuro6TZkyRT//+c9LcVkAAJBwkQeRZ555RosXL9bSpUv11ltv6YorrtCcOXPU1tYW9aUBAEDCpSzLsqK8wKWXXqqLLrpIjz76qP3YpEmTNG/ePC1btiznazs7O9XQ0KCOjg7V19dHOUwAABCSQu7fkc6InDx5Um+++aZmzZqV8fisWbO0ZcuWrOd3dXWps7Mz4w8AABi4Ig0ihw8fVk9Pj0aMGJHx+IgRI3To0KGs5y9btkwNDQ32n+bm5iiHBwAAYlaSYtVUKpXxs2VZWY9J0h133KGOjg77z8GDB0sxPAAAEJOaKN+8qalJ1dXVWbMfH330UdYsiSTV1taqtrY2yiEBAIAEiXRGZNCgQZoyZYo2btyY8fjGjRs1ffr0KC8NAADKQKQzIpK0ZMkSzZ8/X1OnTtW0adP0+OOPq62tTQsXLoz60gAAIOEiDyJf+tKX9Mknn+hv/uZv9OGHH+q8887TunXr9JnPfCbqSwMAgISLvI9If9BHBACA8pOYPiIAAAC5EEQAAEBsCCIAACA2BBEAABAbgggAAIgNQQQAAMSGIAIAAGJDEAEAALEhiAAAgNgQRAAAQGwIIgAAIDYEEQAAEBuCCAAAiA1BBAAAxIYgAgAAYkMQAQAAsSGIAACA2BBEAABAbAgiAAAgNgQRAAAQG4IIAACIDUEEAADEhiACAABiQxABAACxIYgAAIDYEEQAAEBsCCIAACA2BBEAABAbgggAAIgNQQQAAMSGIAIAAGJDEAEAALEhiAAAgNgQRAAAQGwIIgAAIDYEEQAAEBuCCAAAiA1BBAAAxIYgAgAAYkMQAQAAsSGIAACA2BBEAABAbAgiAAAgNgQRAAAQG4IIAACIDUEEAADEJtIgcv/992v69Ok67bTTdPrpp0d5KQAAUIYiDSInT57UjTfeqFtuuSXKywAAgDJVE+Wb33vvvZKkJ598MsrLAACAMhVpEClUV1eXurq67J87OztjHA3cTmxaLqWqVDfz9uzfbV4hWWnVXf2NGEYGAChXiSpWXbZsmRoaGuw/zc3NcQ8JfU5sWq7uA9vUtemh3tDhcOyJm9S16SEplaj/OQEAykDBd4577rlHqVQq55/t27cXNZg77rhDHR0d9p+DBw8W9T6V4MSm5VmBwP7d5hW9sxdhSlWpp3WLqlumZ4SRY0/cZD/uNVMCAEAuBS/NLFq0SF/+8pdzPuecc84pajC1
tbWqra0t6rUVJ1XVOwshZQSAE5tXqGvTQ6q9ekmolzPX6Nr0kB1GujavkKweVbdM15AFq0K9HgCgMhQcRJqamtTU1BTFWFAAZzAwPztDSBSzE+5ryuqRUtW+IcSvpsQs89SMuSSrpoRaEwCoLJEWq7a1tenIkSNqa2tTT0+PduzYIUkaN26chgwZEuWlK4IzGHT97BGp52RkIcR5TTMTIkmyenRi8wrva/rM2nQf2Kae1i0ZTzXhpKd1S9ZsDuEEAAauSKsL77rrLl144YW6++67dezYMV144YW68MILi64hQba6mbdL1YOknpNS9aDI6zSOPXGTPRMiKatmxD222quXZPz+xOYVdk1JT+sW+3ETQty1JmaWh0JYABiYUpZlWXEPwk9nZ6caGhrU0dGh+vr6uIeTSPaNui+MRDkj4ixMHbJglX1tEyr8ru03Rvfj7veJeqkJABCNQu7fieojgsK4b9T2jV0K/cbtnMkwNSHuAlZZ6ezX9dWJ+M7apKrtx53hplRLTQCAeBFEypTXbIFXAWtorLRnKLB/9qvhcNSJmDBilmO8Hq+bebsdQkqx1AQAiBdBpFwFCAZhylUoGjQs1M5YJMkRQKSs2Zzu1q12CHGGEwDAwEQQKVMmGHhtkTV/j3u3ibuniV0P0sdZmFo383Z1t271rEExvwcADDwEkXIXcmOzUM+Tcc3a2EsuqSpVj7lMNWMuyXhvE0LM45EuNQEAEoEgUuZCb2wWYrBxBpYTm1dkLLnUtFyWObYSLzUBAJKBIDIAhNnYLIqOrUF294RRgwIAKD8EkQEizN0mYQabku/uAQCUFdpVDhDupY9jT9zk+7wgJ/OG1rE1x5JL7dVLWHIBgApHEBkAnLMODffttTuUusPIsce/5Nsu3R1Q3MHGq4V7EHVXf8M3xNTNvD370LtNy32vFTREAQDKB0GkzHktfQxZsCorjJzYvEI9B7bmfA8TUNzBxn1eTKT6imXd1+LMGQAYmKgRKXc+Sx9DFqyyz4bp+PZ4u85Dyl2IGndNRxTFsk6hbk8GAPQbQaTM5bppDlmwyg4h7joP30LUBGyjDbNYNkvIfVcAAP3DPPcA5lfnkasQtdCajqi4xygrHUrtiCmSdS7/cMovAMSHGZGQxTn177y2++Z67ImbMs54Sfp5Lu4Q1X1gm3pat0gKodFalDMuAICCMCMStjiLLfuubUKHs+7DtE/v2vRQSQpR+7P7xatY1jn+MGYyQtueDADoF2ZEAihkliPqYstcnNc2B8q5Zwx6b+jTPAtRu1u3asiCVXk/YyBF1mLkK5a1w1QIjdaSPisEAJWAIBJEgTfVOKf+ndd27papm3m7Tmxa7luIak6+dd+QnZ+xJIEsQLFsz3vb+zWTEaTlPACgNAgiARRzUw2z5Xox4/W6dr4dNu4bctAbdpiBLN+ZM/2dyYh7ezIAIBNBJKBCbqonNi1X94FtnjfMsAtWvWYp7Jt1qrqgm3W+zxh3IAtlJiMB25MBAKcQRAoQ9KZqdnhUt0zPmGkwyx+h9qpwLRuZa5nOqqamwvy+v5+x0FmOsGoxwprJ4JRfAEgWds0UIMj5K2aHSur0URk1F85gICm0M1OcfTHMbhlzrdqrl/S1e5/muzOmmDNmgu44CbVVPIfnAcCARBAJKPBNte+GOWjqn0k6VTR6KoRYoW/jNTfjntYtUqraDiHmpl3TMk2S1L1/i+dnKvSMmaCBzGsGo9gwkpRGawCAcLE0E0AhywLuG2LXpofsm3ZNy2WRbeP1W1IxNSTOAJC5hDOtoDNmAtdpUIsBAAiAIBKE46bqLA5131RzFqL2nIy0l4hvLUZfDUnt1UvsMGIHFp2aLQkSHPoTyDzfEwBQ8QgiAWTcVF3FofbMg2sLq/Nn540/CkFmKeyxmdoOyW565v6Mfj1BcvUhkcQsBwCgYASRAhWyXGHvjnHMVITdq6KQWYqMs2akvA3Msj47sxwAgJBRrFoEZ9Flx7fHZy+59C1zSKdmIkzxpyR1t/4yvMEUs5ukelDG+DiFFgAQl5RlWVbcg/DT2dmphoYGdXR0qL6+Pu7hZDEt1FU9SA337c34nd9NPY6bvXOpxszMOINI1uNWOrYThAEA5a+Q+zczIkXKu4U1QX0vzLZd97Zc85jpwGrvtonzBGEAQEWhRqQIQYpDk1BPYVrN9xzYmlVD0t261W5+JqvHs/NpHCcIAwAqC0HEh99Js+7+G1KCD01LVdmN1NzLQz2tW5Q6ozmj+ZnfbptSnyAMAKgcBBE/rm26hlnmsPtv9EniFlZnmMhuZDY9qwOrV6CK6wRhAEBlIIj48FuecC9zeL2mEH4zL1I4haF+Mxuy0qppuSxvA7MwDqwDAMAPQSSHkixP+My85OrnUahCZzYKauUOAEA/EETyiHp5ohSFofbMRt/uGK8mZs6Zl0KapAEA0B8EkTxKsTwR5cyLuyakumW6b+ixcWAdAKBEaGiWQ67W7VHMCORqkFYMv/E7Q4m7YDXJoq6nAQCEg4ZmIfBbnjDNwLIamIVwvZwN0orhmtkw4+9p3SKlqssqhEii0RoADEAszfgp4fJEVIWhXrMDGTUvOW7cxcwwlHIHkPmZRmsAUN4IIj5K1Rm11IWh7pmXoDt2goSMUu0Akmi0BgADBUEkbgmYeQk0w+AKGSaYSMp6ftemh9S9f4uG/OUzkcxY0GgNAAYOilUrRL7TgCVlnMDrN/PhnNlwHpxnwkn3gW29NSiu95MUWjGpPeY84wUAxKOQ+zczIgnktQzinIFw39Cd9RfO12a8j5XuO+AunfUaSepyLNn43dQzloyqB2X93oSQ6pbp6nlv+6klICm0pRkarQHAwMI2gyTy2h3S95h7d0jWjhHna11/790tU+W9y8R1Aq+fupm3289T9SB7F1HHt8dnbAl216GEMWtR6p1MAIDoMSOSQEGLVfMVutZevcS+SUvKWFIx580ce+ImbX7lFf2vX1Xpocd/pMtTO3trPFq3qmbMJVlLKe5iV0kZwaSm5bJTSzNho9EaAAw4kdWI/Pa3v9V3vvMdvfLKKzp06JBGjhypr3zlK1q6dKkGDcqe1vdS6TUiXrUQkgLVR7hfK8n+e3XLNMmylP79B0ofadOsZ4/rjQNHdNklF+uV+/+bTv77j6QTnapuma4hC1ZlvafXsojzOu6lmdoZi+xGal7hBgAwsCSiRuTdd99VOp3WY489pnHjxuk3v/mNFixYoOPHj+vBBx+M6rJFC7Q9VSppZ0+/3SFBdoy4XyvJMWsxzQ4Qr3w8VG8cOKj/cfFQ/WDbG1r/xG81s2WwUmc0q2bMJfb7HXvippwN0GpnLFJ36y/V0/rLrGJVSfaSTU3LZWF9PQCAASCyIDJ79mzNnj3b/rmlpUW7d+/Wo48+msggErQHRtR9Mpx8u60GOPsm1xJKd+svVd0yXd37/10PbNini/9Tre6/6nS98UGXHvhFh66adKYa/ucvMt7L1H/ISuvE5hXq3r9FPQe2ZuyIqWmZpvSRg7J+/76k3nDi/M7Y3QIAcCtpjUhHR4caGxt9f9/V1aW
uri77587OzlIMS1JhXTtL0dkz1zJIvh0jvttsXcHg1RMteqP9oFb/2XClUil96/IGffHZj/XKOx9pbl/A8RtH6ozmjO8uY5lGUur0Ub67awAAMErWR2T//v266KKL9A//8A/6+te/7vmce+65R/fee2/W46WsEQnSoyLqPhZe4cYdREwnU0neQeH0/6Sqxs+op3WLUqePUlXjaHtppbt1q7r3/7tm/e/fSZJenj9CqVRKlmX1PlY7RC//2aeVqqn1/HzHHv+Seg5szTo8L3X6KFm/f1/VYy7TkL98JuMQv9oZizKWrjjADgAGrkgPvbvnnnuUSqVy/tm+fXvGa9rb2zV79mzdeOONviFEku644w51dHTYfw4ePFjo8PrNvT3V60aZ8ZxUte/N9MSm5cUNwmt3SN9jzhCSFUwkdbf+UpJOhZAzmmX9/n07MJzc/qx6Wrf01oa0n9S3Lm9QKpWSJHtW5I0DR/TKb09mfAcnNq/Qscf/TCc2r1DN2OmSlHF4npSyl2Rqxk73XBrKCBYcYAcAUBEzIocPH9bhw4dzPuecc85RXV2dpN4QMmPGDF166aV68sknVVUV/AYTx66ZgmZEUtWS1ZN3d0nUY3XPhrh/tnt79LFnPnRqNsTzd38+Uimls2Y+3Es+Tu4twrmWr/KNv1SYnQGAcEW6a6apqUlNTU2BnvvBBx9oxowZmjJlilauXFlQCIlDkK6d7ueY3STHnrhJQxasKunNNN8BcBl1L32hSZJeOXBCb7SftGtDnJy1Ij+vn6UZTZ12CHF+vuqW6Z5j6m7dmrW7xq8vSmIOsCvBYX0AAG+RFau2t7fryiuv1OjRo/Xggw/q448/tn931llnRXXZogU5Bdf8vbpluv27IQtW2WGk445zJFklvZnmOwAu4/fqnfF44BcdunjkIF01ps7zPa8aU6eLRw7Sd77/Y13xlTNVM/aP1NO6xd6hY8KGlJJ0akLNOXMSuOmYle4NSR7jL9VsRCGFygCAcEU2RfHyyy9r3759euWVVzRq1CidffbZ9p9EytG1067BsE4tUzhrG3qXZfpuyj41I1Hx3eLr/n2qWtKp2RBnbYibXSvyQZdeea9bQxassr8Ds5U3I4T07YxxLt94tVuvm3l7VqjoPrCtd6amL4yY15W6VsTZKr7j2+MJIQBQIpy+WwS/5Rmz/FGqG1iuGgtZ6cxD6Fq35KwNcXM+97Uff1eDr17ce033CbvKrAupbpmmmpZpgWYy3DUsXrUopQ4CGTt9rvwf1I4AQBES0Vl1IMuobdi8IqNgtVSnweZdSqqrl050KnVGsx0afn76XL3R/phnbYibs1Zk/RN/q7mpVO/7m8PzlF2YamYUalqmBQ4hWfU4fbtwvJZ3ouaeXXIGLmpHACAaBJEimF0WdgFoqjpj10y1o4V6ZDdTx1KSc9eHs4YjdUazrP9zUKqr16A/+pq+85ff0ZjTazTstGrtOHQy7yWGnVatMWcM0nffHqyrNv6D/b7VYy5TzdjMoOAMI4EOn3MthWXUsqSqM9rLl0KuHUfUjgBAdAgixXDssjBhxNQ2mJuUWZ6IildPDqn3ZumemdGJTh19abnaD3fog6M9uvLJQwVdq3vwSaX+5P+TDv4q7+m3QZcq8p7qW8I+Ivlml0wYiXVnDwAMUASRfqp17biI4ybltz3WqbZGemn+CHV+7iZJ0slfPaVBl/53+++Sem/+VlqDLv3vqr3kv9mvPfPMM9UwalSgMRQjyLbpSOUoVDa/d54mTAgBgPAQRArkeY5LROepFNJoy68nh6lhkaRR9TXSe//SO/bx3zo1Y3LWoKwal9rxzUXfcAsZt9/ZNeazdLdu7TtQ77LICkNzva8JRkEOGgQAFI4gUijXv56dPTzMeSqhKbDRlrunSO9NvEdmq63ZjZIVnvpqMpw3WHddhN+yi2fo6Bt3d+tW1Yy5JCt0ZIzbsSXaec3MfiWy28qX2rEnvqye1l96ztZIYucMAPQTQaRAzptO3vNU+nutAhttucfjLFh1b40145VSktXjvUOkr29Izh0iHmHJHSLM2LzG7Q4pzpu8c3dOHDMQvT1TfpnxmPu/CTtnAKB/CCJFiqKuwWt2wWurcK7zb9znwAyacqP9s/vMGUmq7QsCxe4Q8QtLztBjenMEfR9nGIm1MLRv9suMS8r8b1s95jKWaACgnwgiRQjSDr6oG5RjScO5HdjWt1U4Vwgx23n9enwY1S3TVdNyWdbvi9khkuvMGGeDsFzvY2+JdswuOU/+jWMJxH09ds4AQPiSfQpdyE5sWu7ZelzqvZmf2LQ82BsFaQdfhLqZt9uzCMeeuMkeV2ZNQk/2Z3D35Lj6G3adhed4+vqeONuaS72zDzVjLjkVBgrYIVI38/as1+VrP585pr4lnh5Hf5Oekzr2xE0lbfXux+vzAQD6r7JmREI6ZTXfLov+yDhE784We8eLJN9lgiDjOfb4l3ofcO38cPcAKXaHiPt15jMUs3TlnK2Jq8uqGztnACAaFRVEyuWU1SELVnmGEHcnU/euFD8nNq9Qz4GtvqHAvG+xdS9+Z+84A0SupSvndcyMUM972+3fO0/+jUPsfU4AYACrqCAi5a5nKDW/fhsnHL0/vDh3pdS0XJb7GgHrWYqte/F6nWnP7g4QztmXDFbaPizPXVdSO2ORuvdvibRLbS6R1QMBACRVYBCRsvttxHYj8VgqyqoJ6Wsh77UrJVCACtA1tKDnBXh/55Zc9+u8xptvS/SQv3wm92eMUrHfCwAgkJRlWVbcg/BTyDHChbBv9n03uziXZfy23UrKWuqQZI/ZdEL1er9ybbLltwSSpGUzAEB+hdy/K2rXjJR5s2u4b6+9ayTnjo4IOXeudDnG4Lz5DlmwStUtfZ1F+06nNcseTnbACmmHiXOXkXvHkXOXUUE7jvyu5bMEEvd/HwBAtCpqaSap6/0ZS0WpatU6d7P0qWm5rHdWpG+pJl/zsULOe/GVccrwqb93799iF7+6Z3SKnpFhCQQAKlJFBZGk3uy86iLcv/dassjZfCyErcrOkObVFM0UzHotpRQqyi3RAIDkqqggksSbXb6toXlncVLVnkW3YW1Vzngf10F5zi261HMAAIpRkcWqSeF3886YWbDSvkssdhFrjqLbsApznVtqJdnLSLJ6ElH0CwBIjkLu3xU1I5I4AZaK/GZx3Ft4/ZpshbFV2XPpyPzdZ0YGAIAgCCIxKnapqJCi2/62JvfbXlw7Y5FdI2LCSCm6n4ZShAsASAyCSDkKWHTb39bkXiHEHUhMS3b3Lp7IhHReEAAgGQgiZSjITEooW5UdgefEpuUZgUaSqlumaciCVfZMhDmoLtB7F6lczgsCAARDseoAFeUSRpD3lhTpEkqSuuMCADIVcv8miCASQXYE9Tc4OHfyNNy3t79DBgCEhF0ziF3USyj9LcIFACQDQQSRcYYRz+6vRepvES4AIDkIIgik2JqTMPqYuK+VxPOCAADFqbjTd1Gkvm2zhZ7467WEkvF716m+7tdmneqbY+uy3YkWAFA2mBFBIMXUfARaQimwL0
gSzwsCABSPIILACqn5yLeE0t26VUMWrMoKOJLUvX+Leg5sZUsuAFQAgggKErjmI8cSimkNb3a6eIWRXCGENu8AMHBQI4KC5Kv5MOqu/oZvkBiyYJVqr17iWXMiKX9Ra5H1KgCA5GFGBIGFuW3Wa5lHUqC+ILR5B4CBgyASg3JcWohi22zGMo9UUMCJqkcJAKC0CCJxKMcTZAOe+FsIe5nHJWjACbtHCQCg9AgiMSjHpYWwt82az1vdMk01LdMkZX4fQQIObd4BoPwRRGJSyUsLuUKXZxgJ8B60eQeA8kQQiVHFLi30c5mHNu8AMHAQRGJUqUsL/V7miaBeBQAQD4JITFhaKB5t3gFg4CCIxCDKpYVy3BoMAKhcBJE4RLm0UI5bgwEAFYsgEoMolxbKcWswAKByEUQGoEreGgwAKC+Rng72hS98QaNHj1ZdXZ3OPvtszZ8/X+3t7VFeEn3qZt5u78apqK3BAICyEmkQmTFjhp599lnt3r1bq1ev1v79+3XDDTdEeUn0CXpKLgAAcUpZlmWV6mIvvvii5s2bp66uLn3qU5/K+/zOzk41NDSoo6ND9fX1JRjhwOC3NZjlGQBAKRRy/y5ZjciRI0f01FNPafr06b4hpKurS11dXfbPnZ2dpRregEHXUQBAOYl0aUaSvvnNb+rTn/60hg0bpra2Nr3wwgu+z122bJkaGhrsP83NzVEPb+DJsTW49uoldB0FACRKwUsz99xzj+69996cz3njjTc0depUSdLhw4d15MgRvffee7r33nvV0NCgtWvXKpVKZb3Oa0akubmZpRkAAMpIIUszBQeRw4cP6/Dhwzmfc84556iuri7r8ffff1/Nzc3asmWLpk2blvda1IgAAFB+Iq0RaWpqUlNTU1EDM5nHOesBAAAqV2TFqtu2bdO2bdt0+eWX64wzzlBra6vuuusujR07NtBsCAAAGPgiK1YdPHiwnn/+ec2cOVMTJ07UV7/6VZ133nl67bXXVFtbG9VlAQBAGYlsRuRzn/ucXnnllajeHgAADACRb98FAADwQxABAACxIYgAAIDYEEQAAEBsCCIAACA2JTv0rhimARqH3wEAUD7MfTtI8/ZEB5GjR49KEoffAQBQho4ePaqGhoaczyn4rJlSSqfTam9v19ChQz0PyfNiDso7ePAg59MUgO+tcHxnxeF7KxzfWXH43goX1ndmWZaOHj2qkSNHqqoqdxVIomdEqqqqNGrUqKJeW19fz//wisD3Vji+s+LwvRWO76w4fG+FC+M7yzcTYlCsCgAAYkMQAQAAsRlwQaS2tlZ33303B+sViO+tcHxnxeF7KxzfWXH43goXx3eW6GJVAAAwsA24GREAAFA+CCIAACA2BBEAABAbgggAAIjNgA8iX/jCFzR69GjV1dXp7LPP1vz589Xe3h73sBLrt7/9rb72ta9pzJgxGjx4sMaOHau7775bJ0+ejHtoiXf//fdr+vTpOu2003T66afHPZxE+sEPfqAxY8aorq5OU6ZM0c9//vO4h5R4r7/+uj7/+c9r5MiRSqVS+ulPfxr3kBJv2bJluvjiizV06FCdeeaZmjdvnnbv3h33sBLt0Ucf1fnnn283Mps2bZrWr19fkmsP+CAyY8YMPfvss9q9e7dWr16t/fv364Ybboh7WIn17rvvKp1O67HHHtPOnTu1fPly/dM//ZPuvPPOuIeWeCdPntSNN96oW265Je6hJNIzzzyjxYsXa+nSpXrrrbd0xRVXaM6cOWpra4t7aIl2/PhxXXDBBXrkkUfiHkrZeO2113Trrbdq69at2rhxo7q7uzVr1iwdP3487qEl1qhRo/TAAw9o+/bt2r59u6666ipdd9112rlzZ/QXtyrMCy+8YKVSKevkyZNxD6Vs/N3f/Z01ZsyYuIdRNlauXGk1NDTEPYzEueSSS6yFCxdmPHbuueda3/rWt2IaUfmRZK1ZsybuYZSdjz76yJJkvfbaa3EPpaycccYZ1g9/+MPIrzPgZ0Scjhw5oqeeekrTp0/Xpz71qbiHUzY6OjrU2NgY9zBQxk6ePKk333xTs2bNynh81qxZ2rJlS0yjQqXo6OiQJP7vWEA9PT16+umndfz4cU2bNi3y61VEEPnmN7+pT3/60xo2bJja2tr0wgsvxD2ksrF//35973vf08KFC+MeCsrY4cOH1dPToxEjRmQ8PmLECB06dCimUaESWJalJUuW6PLLL9d5550X93AS7de//rWGDBmi2tpaLVy4UGvWrNHkyZMjv25ZBpF77rlHqVQq55/t27fbz//rv/5rvfXWW3r55ZdVXV2tP//zP5dVYQ1lC/3OJKm9vV2zZ8/WjTfeqK9//esxjTxexXxv8JdKpTJ+tiwr6zEgTIsWLdLbb7+tVatWxT2UxJs4caJ27NihrVu36pZbbtHNN9+sXbt2RX7dmsivEIFFixbpy1/+cs7nnHPOOfbfm5qa1NTUpAkTJmjSpElqbm7W1q1bSzLllBSFfmft7e2aMWOGpk2bpscffzzi0SVXod8bvDU1Nam6ujpr9uOjjz7KmiUBwnLbbbfpxRdf1Ouvv65Ro0bFPZzEGzRokMaNGydJmjp1qt544w2tWLFCjz32WKTXLcsgYoJFMcxMSFdXV5hDSrxCvrMPPvhAM2bM0JQpU7Ry5UpVVZXlxFko+vO/NZwyaNAgTZkyRRs3btT1119vP75x40Zdd911MY4MA5FlWbrtttu0Zs0avfrqqxozZkzcQypLlmWV5F5ZlkEkqG3btmnbtm26/PLLdcYZZ6i1tVV33XWXxo4dW1GzIYVob2/XlVdeqdGjR+vBBx/Uxx9/bP/urLPOinFkydfW1qYjR46ora1NPT092rFjhyRp3LhxGjJkSLyDS4AlS5Zo/vz5mjp1qj3T1tbWRv1RHseOHdO+ffvsnw8cOKAdO3aosbFRo0ePjnFkyXXrrbfqJz/5iV544QUNHTrUnolraGjQ4MGDYx5dMt15552aM2eOmpubdfToUT399NN69dVXtWHDhugvHvm+nBi9/fbb1owZM6zGxkartrbWOuecc6yFCxda77//ftxDS6yVK1dakjz/ILebb77Z83v72c9+FvfQEuP73/++9ZnPfMYaNGiQddFFF7GdMoCf/exnnv+7uvnmm+MeWmL5/d+wlStXxj20xPrqV79q///N4cOHWzNnzrRefvnlklw7ZVkVVrUJAAASo3IX/wEAQOwIIgAAIDYEEQAAEBuCCAAAiA1BBAAAxIYgAgAAYkMQAQAAsSGIAACA2BBEAABAbAgiAAAgNgQRAAAQG4IIAACIzf8DYp3+lvuVfPsAAAAASUVORK5CYII=", - "text/plain": [ - "
" - ] - }, - "metadata": { - "engine": 0 - }, - "output_type": "display_data" - } - ], - "source": [ - "%%px --target 0\n", - "plt.plot(c1_np[:,0], c1_np[:,1], 'x', color='#f0781e')\n", - "plt.plot(c2_np[:,0], c2_np[:,1], 'x', color='#5a696e')\n", - "plt.plot(centroids[0,0],centroids[0,1], '^', markersize=10, markeredgecolor='black', color='#f0781e' )\n", - "plt.plot(centroids[1,0],centroids[1,1], '^', markersize=10, markeredgecolor='black',color='#5a696e')\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The Iris Dataset\n", - "------------------------------\n", - "The _iris_ dataset is a well known example for clustering analysis. It contains 4 measured features for samples from\n", - "three different types of iris flowers. A subset of 150 samples is included in formats h5, csv and netcdf in the [Heat repository under 'heat/heat/datasets'](https://github.com/helmholtz-analytics/heat/tree/main/heat/datasets), and can be loaded in a distributed manner with Heat's parallel dataloader.\n", - "\n", - "**NOTE: you might have to change the path to the dataset in the following cell.**" - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "iris = ht.load(\"./heat/datasets/iris.csv\", sep=\";\", split=0)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Feel free to try out the other [loading options](https://heat.readthedocs.io/en/stable/autoapi/heat/core/io/index.html#heat.core.io.load) as well.\n", - "\n", - "Fitting the dataset with `kmeans`:" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "\u001b[0;31mOut[2:20]: \u001b[0m\n", - "KMeans({\n", - " \"n_clusters\": 3,\n", - " \"init\": \"probability_based\",\n", - " \"max_iter\": 300,\n", - " \"tol\": 0.0001,\n", - " \"random_state\": null\n", - "})" - ] - }, - "metadata": { - "after": null, - "completed": null, - "data": {}, - "engine_id": 2, - "engine_uuid": "69de46f1-abc608b965c5bc79faeb092a", - "error": null, - "execute_input": "k = 3\nkmeans = ht.cluster.KMeans(n_clusters=k, init=\"kmeans++\")\nkmeans.fit(iris)\n", - "execute_result": { - "data": { - "text/plain": "KMeans({\n \"n_clusters\": 3,\n \"init\": \"probability_based\",\n \"max_iter\": 300,\n \"tol\": 0.0001,\n \"random_state\": null\n})" - }, - "execution_count": 20, - "metadata": {} - }, - "follow": null, - "msg_id": null, - "outputs": [], - "received": null, - "started": null, - "status": null, - "stderr": "", - "stdout": "", - "submitted": "2024-03-21T09:47:32.371869Z" - }, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "\u001b[0;31mOut[3:20]: \u001b[0m\n", - "KMeans({\n", - " \"n_clusters\": 3,\n", - " \"init\": \"probability_based\",\n", - " \"max_iter\": 300,\n", - " \"tol\": 0.0001,\n", - " \"random_state\": null\n", - "})" - ] - }, - "metadata": { - "after": null, - "completed": null, - "data": {}, - "engine_id": 3, - "engine_uuid": "a4657187-cf8e91c40f19240ba56a42f6", - "error": null, - "execute_input": "k = 3\nkmeans = ht.cluster.KMeans(n_clusters=k, init=\"kmeans++\")\nkmeans.fit(iris)\n", - "execute_result": { - "data": { - "text/plain": "KMeans({\n \"n_clusters\": 3,\n \"init\": \"probability_based\",\n \"max_iter\": 300,\n \"tol\": 0.0001,\n \"random_state\": null\n})" - }, - "execution_count": 20, - "metadata": {} - }, - "follow": null, - "msg_id": null, - "outputs": [], - "received": null, - "started": null, - "status": 
null, - "stderr": "", - "stdout": "", - "submitted": "2024-03-21T09:47:32.371965Z" - }, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "\u001b[0;31mOut[1:20]: \u001b[0m\n", - "KMeans({\n", - " \"n_clusters\": 3,\n", - " \"init\": \"probability_based\",\n", - " \"max_iter\": 300,\n", - " \"tol\": 0.0001,\n", - " \"random_state\": null\n", - "})" - ] - }, - "metadata": { - "after": null, - "completed": null, - "data": {}, - "engine_id": 1, - "engine_uuid": "689d2228-122a4c5ed76d6d5375819746", - "error": null, - "execute_input": "k = 3\nkmeans = ht.cluster.KMeans(n_clusters=k, init=\"kmeans++\")\nkmeans.fit(iris)\n", - "execute_result": { - "data": { - "text/plain": "KMeans({\n \"n_clusters\": 3,\n \"init\": \"probability_based\",\n \"max_iter\": 300,\n \"tol\": 0.0001,\n \"random_state\": null\n})" - }, - "execution_count": 20, - "metadata": {} - }, - "follow": null, - "msg_id": null, - "outputs": [], - "received": null, - "started": null, - "status": null, - "stderr": "", - "stdout": "", - "submitted": "2024-03-21T09:47:32.371782Z" - }, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "\u001b[0;31mOut[0:26]: \u001b[0m\n", - "KMeans({\n", - " \"n_clusters\": 3,\n", - " \"init\": \"probability_based\",\n", - " \"max_iter\": 300,\n", - " \"tol\": 0.0001,\n", - " \"random_state\": null\n", - "})" - ] - }, - "metadata": { - "after": null, - "completed": null, - "data": {}, - "engine_id": 0, - "engine_uuid": "e3649dd0-f970dcd5e37935a1f3fe07c8", - "error": null, - "execute_input": "k = 3\nkmeans = ht.cluster.KMeans(n_clusters=k, init=\"kmeans++\")\nkmeans.fit(iris)\n", - "execute_result": { - "data": { - "text/plain": "KMeans({\n \"n_clusters\": 3,\n \"init\": \"probability_based\",\n \"max_iter\": 300,\n \"tol\": 0.0001,\n \"random_state\": null\n})" - }, - "execution_count": 26, - "metadata": {} - }, - "follow": null, - "msg_id": null, - "outputs": [], - "received": null, - "started": null, - "status": null, - "stderr": "", - "stdout": "", - "submitted": "2024-03-21T09:47:32.371675Z" - }, - "output_type": "display_data" - } - ], - "source": [ - "%%px\n", - "k = 3\n", - "kmeans = ht.cluster.KMeans(n_clusters=k, init=\"kmeans++\")\n", - "kmeans.fit(iris)\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's see what the results are. In theory, there are 50 samples of each of the 3 iris types: setosa, versicolor and virginica. We will plot the results in a 3D scatter plot, coloring the samples according to the assigned cluster." 
- ] - }, - { - "cell_type": "code", - "execution_count": 21, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[stdout:3] Number of points assigned to c1: 50 \n", - "Number of points assigned to c2: 38 \n", - "Number of points assigned to c3: 62\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "[stdout:0] Number of points assigned to c1: 50 \n", - "Number of points assigned to c2: 38 \n", - "Number of points assigned to c3: 62\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "[stdout:1] Number of points assigned to c1: 50 \n", - "Number of points assigned to c2: 38 \n", - "Number of points assigned to c3: 62\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "[stdout:2] Number of points assigned to c1: 50 \n", - "Number of points assigned to c2: 38 \n", - "Number of points assigned to c3: 62\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "%%px\n", - "labels = kmeans.predict(iris).squeeze()\n", - "\n", - "# Select points assigned to clusters c1, c2 and c3\n", - "c1 = iris[ht.where(labels == 0), :]\n", - "c2 = iris[ht.where(labels == 1), :]\n", - "c3 = iris[ht.where(labels == 2), :]\n", - "# After slicing, the arrays are not distributed equally among the processes anymore; we need to balance\n", - "#TODO is balancing really necessary?\n", - "c1.balance_()\n", - "c2.balance_()\n", - "c3.balance_()\n", - "\n", - "print(f\"Number of points assigned to c1: {c1.shape[0]} \\n\"\n", - " f\"Number of points assigned to c2: {c2.shape[0]} \\n\"\n", - " f\"Number of points assigned to c3: {c3.shape[0]}\")" - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Number of points assigned to c1: 50 \n", - "Number of points assigned to c2: 39 \n", - "Number of points assigned to c3: 61\n" - ] - } - ], - "source": [ - "# compare Heat results with sklearn\n", - "from sklearn.cluster import KMeans\n", - "import sklearn.datasets\n", - "k = 3\n", - "iris_sk = sklearn.datasets.load_iris().data\n", - "kmeans_sk = KMeans(n_clusters=k, init=\"k-means++\").fit(iris_sk)\n", - "labels_sk = kmeans_sk.predict(iris_sk)\n", - "\n", - "c1_sk = iris_sk[labels_sk == 0, :]\n", - "c2_sk = iris_sk[labels_sk == 1, :]\n", - "c3_sk = iris_sk[labels_sk == 2, :]\n", - "print(f\"Number of points assigned to c1: {c1_sk.shape[0]} \\n\"\n", - " f\"Number of points assigned to c2: {c2_sk.shape[0]} \\n\"\n", - " f\"Number of points assigned to c3: {c3_sk.shape[0]}\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "heat_env", - "language": "python", - "name": "heat_env" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.11.8" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/tutorials/scripts/hpc/01_basics/01_basics_dndarrays.py b/tutorials/scripts/hpc/01_basics/01_basics_dndarrays.py new file mode 100644 index 0000000000..3c7fb8ec3d --- /dev/null +++ b/tutorials/scripts/hpc/01_basics/01_basics_dndarrays.py @@ -0,0 +1,25 @@ +import heat as ht + +# ### DNDarrays +# +# +# Similar to a 
NumPy `ndarray`, a Heat `dndarray` (we'll get to the `d` later) is a grid of values of a single (one particular) type. The number of dimensions is the number of axes of the array, while the shape of an array is a tuple of integers giving the number of elements of the array along each dimension.
+#
+# Heat emulates NumPy's API as closely as possible, allowing for the use of well-known **array creation functions**.
+
+
+a = ht.array([1, 2, 3])
+print("array creation with values [1,2,3] with the heat array method:")
+print(a)
+
+a = ht.ones((4, 5))
+print("array creation of shape = (4, 5) example with the heat ones method:")
+print(a)
+
+a = ht.arange(10)
+print("array creation with [0,1,...,9] example with the heat arange method:")
+print(a)
+
+a = ht.full((3, 2), fill_value=9)
+print("array creation filled with value 9 and shape = (3, 2) with the heat full method:")
+print(a)
diff --git a/tutorials/scripts/hpc/01_basics/02_basics_datatypes.py b/tutorials/scripts/hpc/01_basics/02_basics_datatypes.py
new file mode 100644
index 0000000000..5c6ab039d3
--- /dev/null
+++ b/tutorials/scripts/hpc/01_basics/02_basics_datatypes.py
@@ -0,0 +1,22 @@
+import heat as ht
+import numpy as np
+import torch
+
+# ### Data Types
+#
+# Heat supports various data types and operations to retrieve and manipulate the type of a Heat array. However, in contrast to NumPy, Heat is limited to logical (bool) and numerical types (uint8, int16/32/64, float32/64, and complex64/128).
+#
+# **NOTE:** by default, Heat allocates floating-point values in single precision, due to the much higher processing performance on GPUs. This is one of the main differences between Heat and NumPy.
+
+a = ht.zeros((3, 4))
+print(f"single precision is the default for floating-point values: {a.dtype}")
+
+b = torch.zeros(3, 4)
+print(f"like in PyTorch: {b.dtype}")
+
+
+b = np.zeros((3, 4))
+print(f"whereas double precision is the default for floating-point values in NumPy: {b.dtype}")
+
+b = a.astype(ht.int64)
+print(f"casting to int64: {b}")
diff --git a/tutorials/scripts/hpc/01_basics/03_basics_operations.py b/tutorials/scripts/hpc/01_basics/03_basics_operations.py
new file mode 100644
index 0000000000..f2ea879388
--- /dev/null
+++ b/tutorials/scripts/hpc/01_basics/03_basics_operations.py
@@ -0,0 +1,30 @@
+import heat as ht
+
+# ### Operations
+#
+# Heat supports many mathematical operations, ranging from simple element-wise functions and binary arithmetic operations to linear algebra and more powerful reductions. By default, operations are performed on the entire array, but most of them can also be applied along one or more of its dimensions. Most relevant for data-intensive applications is that **all Heat functionalities support memory-distributed computation and GPU acceleration**. This holds for all operations, including reductions, statistics, linear algebra, and high-level algorithms.
+#
+# You can try out the few simple examples below if you want, but we will skip to the [Parallel Processing](#Parallel-Processing) section to see memory-distributed operations in action.
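+
+# As a small sketch of axis-wise operations (the array `x` below is introduced
+# purely for illustration), the same reduction can act on the whole array or
+# along a single dimension via the `axis` argument:
+
+x = ht.arange(6).reshape(2, 3)
+print("sum over the entire array:")
+print(x.sum())
+print("sum along the first dimension only:")
+print(x.sum(axis=0))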
+
+a = ht.full((3, 4), 8)
+b = ht.ones((3, 4))
+c = a + b
+print("matrix addition a + b:")
+print(c)
+
+
+c = ht.sub(a, b)
+print("matrix subtraction a - b:")
+print(c)
+
+c = ht.arange(5).sin()
+print("application of sin() elementwise:")
+print(c)
+
+c = a.T
+print("transpose operation:")
+print(c)
+
+c = b.sum(axis=1)
+print("summation of array elements along axis 1:")
+print(c)
diff --git a/tutorials/scripts/hpc/01_basics/04_basics_indexing.py b/tutorials/scripts/hpc/01_basics/04_basics_indexing.py
new file mode 100644
index 0000000000..0949a21f09
--- /dev/null
+++ b/tutorials/scripts/hpc/01_basics/04_basics_indexing.py
@@ -0,0 +1,13 @@
+import heat as ht
+
+# ## Indexing
+
+# Heat allows indexing of arrays and thereby the extraction of a partial view of the elements in an array. It is possible to obtain single values as well as entire chunks, i.e. slices.
+
+a = ht.arange(10)
+
+print(a[3])
+print(a[1:7])
+print(a[::2])
+
+# **NOTE:** Indexing in Heat is undergoing a [major overhaul](https://github.com/helmholtz-analytics/heat/pull/938), to increase interoperability with NumPy/PyTorch indexing, and to provide fully distributed item-setting functionality. Stay tuned for this feature in the next release.
diff --git a/tutorials/scripts/hpc/01_basics/05_basics_broadcast.py b/tutorials/scripts/hpc/01_basics/05_basics_broadcast.py
new file mode 100644
index 0000000000..e84663b164
--- /dev/null
+++ b/tutorials/scripts/hpc/01_basics/05_basics_broadcast.py
@@ -0,0 +1,14 @@
+import heat as ht
+
+# ---
+# Heat implements the same broadcasting rules (implicit repetition of an operation when the ranks/shapes of the operands do not match) as NumPy does, e.g.:
+
+a = ht.arange(10) + 3
+print(f"broadcast example of adding single value 3 to [0, 1, ..., 9]: {a}")
+
+
+a = ht.ones((3, 4))
+b = ht.arange(4)
+print(
+    f"broadcasting across the first dimension of {a} with shape = (3, 4) and {b} with shape = (4): {a + b}"
+)
diff --git a/tutorials/scripts/hpc/01_basics/06_basics_gpu.py b/tutorials/scripts/hpc/01_basics/06_basics_gpu.py
new file mode 100644
index 0000000000..6383d8dda4
--- /dev/null
+++ b/tutorials/scripts/hpc/01_basics/06_basics_gpu.py
@@ -0,0 +1,39 @@
+import heat as ht
+import torch
+
+# ## Parallel Processing
+# ---
+#
+# Heat's true power lies in its ability to exploit the processing performance of modern accelerator hardware (GPUs) as well as distributed (high-performance) cluster systems. All operations executed on CPUs are, to a large extent, vectorized (AVX) and thread-parallelized (OpenMP). Heat builds on PyTorch, so it supports GPU acceleration on NVIDIA and AMD GPUs.
+#
+# For distributed computations, your system or laptop needs to have the Message Passing Interface (MPI) installed. For GPU computations, your system needs to have one or more suitable GPUs and an (MPI-aware) CUDA/ROCm ecosystem.
+#
+# **NOTE:** The GPU examples below will only properly execute on a computer with a GPU. Make sure to run this script on an appropriate machine, or copy and paste the examples into a script that you execute on a suitable device.
+
+# ### GPUs
+#
+# Heat's array creation functions all support an additional `device` parameter that places the data on a specific device. By default, the CPU is selected, but it is also possible to directly allocate the data on a GPU.
+
+if torch.cuda.is_available():
+    ht.zeros((3, 4), device="gpu")
+
+# Arrays on the same device can be seamlessly used in any Heat operation.
+
+if torch.cuda.is_available():
+    a = ht.zeros((3, 4), device="gpu")
+    b = ht.ones((3, 4), device="gpu")
+    print(a + b)
+
+# However, performing operations on arrays with mismatching devices will purposefully result in an error (due to potentially large copy overhead).
+
+if torch.cuda.is_available():
+    a = ht.full((3, 4), 4, device="cpu")
+    b = ht.ones((3, 4), device="gpu")
+    try:
+        print(a + b)
+    except Exception as e:
+        print(f"mismatching devices raise an error: {e}")
+
+# It is possible to explicitly move an array from one device to the other to avoid this error.
+
+if torch.cuda.is_available():
+    a = ht.full((3, 4), 4, device="cpu")
+    a = a.gpu()  # move the array to the GPU, where b already resides
+    print(a + b)
diff --git a/tutorials/scripts/hpc/01_basics/07_basics_distributed.py b/tutorials/scripts/hpc/01_basics/07_basics_distributed.py
new file mode 100644
index 0000000000..b92eb169be
--- /dev/null
+++ b/tutorials/scripts/hpc/01_basics/07_basics_distributed.py
@@ -0,0 +1,70 @@
+import heat as ht
+
+# ### Distributed Computing
+#
+# Heat is also able to make use of distributed processing capabilities such as those in high-performance cluster systems. For this, Heat exploits the fact that the operations performed on a multi-dimensional array are usually identical for all data items. Hence, a data-parallel processing strategy can be chosen, where the total number of data items is equally divided among all processing nodes. An operation is then performed individually on the local data chunks and, if necessary, partial results are communicated behind the scenes. A Heat array assumes the role of a virtual overlay over the local chunks and realizes and coordinates the computations - see the figure below for a visual representation of this concept.
+#
+#
+#
+# The chunks are always split along a single dimension (i.e. 1-D domain decomposition) of the array. You can specify this in Heat by using the `split` parameter. This parameter is present in all relevant functions, such as array creation (`zeros(), ones(), ...`) or I/O (`load()`) functions.
+#
+#
+#
+#
+# Examples are provided below. The result of an operation on a Heat array will in most cases preserve the split of the respective operands. However, in some cases the split axis might change. For example, a transpose of a Heat array will equally transpose the split axis. Furthermore, a reduction operation, e.g. `sum()`, performed across the split axis might remove the data partitioning entirely. The respective function behaviors can be found in Heat's documentation.
+#
+# You may also modify the data partitioning of a Heat array by using the `resplit()` function. This allows you to repartition the data as you choose. Please note that this should be used sparingly and for small data amounts only, as it entails significant data copying across the network. Finally, a Heat array without any split, i.e. `split=None` (default), will result in redundant copies of the data on each computation node.
+#
+# On a technical level, Heat follows the so-called [Bulk Synchronous Parallel (BSP)](https://en.wikipedia.org/wiki/Bulk_synchronous_parallel) processing model. For the network communication, Heat utilizes the [Message Passing Interface (MPI)](https://computing.llnl.gov/tutorials/mpi/), a *de facto* standard on modern high-performance computing systems. It is also possible to use MPI on your laptop or desktop computer. Respective software packages are available for all major operating systems. In order to run a Heat script, you need to start it slightly differently than you are probably used to.
This
+#
+# ```bash
+# python ./my_script.py
+# ```
+#
+# becomes this instead:
+#
+# ```bash
+# mpirun -n <number of processes> python ./my_script.py
+# ```
+# On an HPC cluster you'll of course use `sbatch` or similar.
+#
+#
+# Let's see some examples of working with distributed Heat:
+
+# In the following examples, we'll recreate the array shown in the figure, a 3-dimensional DNDarray of integers ranging from 0 to 59 (5 matrices of size (4,3)).
+
+
+dndarray = ht.arange(60).reshape(5, 4, 3)
+if dndarray.comm.rank == 0:
+    print("3-dimensional DNDarray of integers ranging from 0 to 59:")
+print(dndarray)
+
+
+# Notice the additional metadata printed with the DNDarray. Compared to a NumPy ndarray, the DNDarray has additional information on the device (in this case, the CPU) and the `split` axis. In the example above, the split axis is `None`, meaning that the DNDarray is not distributed and each MPI process has a full copy of the data.
+#
+# Let's experiment with a distributed DNDarray: we'll create the same DNDarray as above, but distribute it along the major axis.
+
+
+dndarray = ht.arange(60, split=0).reshape(5, 4, 3)
+if dndarray.comm.rank == 0:
+    print("3-dimensional DNDarray of integers ranging from 0 to 59:")
+print(dndarray)
+
+
+# The `split` axis is now 0, meaning that the DNDarray is distributed along the first axis. Each MPI process has a slice of the data along the first axis. In order to see the data on each process, we can print the "local array" via the `larray` attribute.
+
+print(f"data on process no {dndarray.comm.rank}: {dndarray.larray}")
+
+
+# Note that the `larray` is a `torch.Tensor` object. This is the underlying tensor that holds the data. The `dndarray` object is an MPI-aware wrapper around these process-local tensors, providing memory-distributed functionality and information.
+
+# The DNDarray can be distributed along any axis. Modify the `split` attribute when creating the DNDarray in the code above, to distribute it along a different axis, and see how the `larray`s change. You'll notice that the distributed arrays are always load-balanced, meaning that the data are distributed as evenly as possible across the MPI processes.
+
+# The `DNDarray` object has a number of methods and attributes that are useful for distributed computing. In particular, it keeps track of its global and local (on a given process) shape through distributed operations and array manipulations. The DNDarray is also associated with a `comm` object, the MPI communicator.
+#
+# (In MPI, the *communicator* is a group of processes that can communicate with each other. The `comm` object is an `MPI.COMM_WORLD` communicator, which is the default communicator that includes all processes. The `comm` object is used to perform collective operations, such as reductions, scatter, gather, and broadcast. It is also used to perform point-to-point communication between processes.)
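+
+# A minimal sketch of using the communicator directly (an assumption worth
+# flagging: Heat's communicator forwards standard mpi4py methods such as
+# `bcast` and `Barrier` to the underlying MPI handle; the names `comm` and
+# `message` below are introduced just for illustration):
+
+comm = dndarray.comm
+comm.Barrier()  # synchronize all processes
+message = "hello from rank 0" if comm.rank == 0 else None
+message = comm.bcast(message, root=0)  # broadcast a Python object to every rank
+print(f"rank {comm.rank} received: {message}")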
+
+
+print(f"Global shape on rank {dndarray.comm.rank}: {dndarray.shape}")
+print(f"Local shape on rank {dndarray.comm.rank}: {dndarray.lshape}")
+print(f"Local device on rank {dndarray.comm.rank}: {dndarray.device}")
diff --git a/tutorials/scripts/hpc/01_basics/08_basics_distributed_operations.py b/tutorials/scripts/hpc/01_basics/08_basics_distributed_operations.py
new file mode 100644
index 0000000000..a8bf106585
--- /dev/null
+++ b/tutorials/scripts/hpc/01_basics/08_basics_distributed_operations.py
@@ -0,0 +1,24 @@
+import heat as ht
+
+dndarray = ht.arange(60, split=0).reshape(5, 4, 3)
+
+# You can perform a vast number of operations on DNDarrays distributed over multi-node and/or multi-GPU resources. Check out our [NumPy coverage tables](https://github.com/helmholtz-analytics/heat/blob/main/coverage_tables.md) to see what operations are already supported.
+#
+# The result of an operation on DNDarrays will in most cases preserve the `split` or distribution axis of the respective operands. However, in some cases the split axis might change. For example, a transpose of a Heat array will equally transpose the split axis. Furthermore, a reduction operation, e.g. `sum()`, performed across the split axis might remove the data partitioning entirely. The respective function behaviors can be found in Heat's documentation.
+
+
+# transpose
+print(dndarray.T)
+
+
+# reduction operation along the distribution axis
+print(dndarray.sum(axis=0))
+
+# min / max etc.
+print(ht.sin(dndarray).min(axis=0))
+
+
+other_dndarray = ht.arange(60, 120, split=0).reshape(5, 4, 3)  # distributed reshape
+
+# element-wise multiplication
+print(dndarray * other_dndarray)
diff --git a/tutorials/scripts/hpc/01_basics/09_basics_distributed_matmul.py b/tutorials/scripts/hpc/01_basics/09_basics_distributed_matmul.py
new file mode 100644
index 0000000000..d15ea26eb8
--- /dev/null
+++ b/tutorials/scripts/hpc/01_basics/09_basics_distributed_matmul.py
@@ -0,0 +1,55 @@
+# As we saw earlier, because the underlying data objects are PyTorch tensors, we can easily create DNDarrays on GPUs or move DNDarrays to GPUs. This allows us to perform distributed array operations on multi-GPU systems.
+#
+# So far we have demonstrated small, easy-to-parallelize arithmetical operations. Let's move to linear algebra. Heat's `linalg` module supports a wide range of linear algebra operations, including matrix multiplication. Matrix multiplication is a very common operation in data analysis; it is computationally intensive and not trivial to parallelize.
+#
+# With Heat, you can perform matrix multiplication on distributed DNDarrays, and the operation will be parallelized across the MPI processes. Here, on 4 GPUs (falling back to CPU if no GPU is available):
+
+import heat as ht
+import torch
+
+if torch.cuda.is_available():
+    device = "gpu"
+else:
+    device = "cpu"
+
+n, m = 400, 400
+x = ht.random.randn(n, m, split=0, device=device)  # distributed RNG
+y = ht.random.randn(m, n, split=None, device=device)
+z = x @ y
+print(z)
+
+# `ht.linalg.matmul` or `@` breaks down the matrix multiplication into a series of smaller `torch` matrix multiplications, which are then distributed across the MPI processes. This operation can be very communication-intensive on huge matrices that both require distribution, and users should choose the `split` axis carefully to minimize communication overhead.
+
+# You can experiment with sizes and the `split` parameter (distribution axis) for both matrices and time the result.
Note that:
+# - If you set **`split=None` for both matrices**, each process (in this case, each GPU) will attempt to multiply the entire matrices. Depending on the matrix sizes, the GPU memory might be insufficient. (And if you can multiply the matrices on a single GPU, it's much more efficient to stick to PyTorch's `torch.linalg.matmul` function.)
+# - If **`split` is not None for both matrices**, each process will only hold a slice of the data, and will need to communicate data with other processes in order to perform the multiplication. This **introduces huge communication overhead**, but allows you to perform the multiplication on larger matrices than would fit in the memory of a single GPU.
+# - If **`split` is None for one matrix and not None for the other**, the multiplication does not require communication, and the result will be distributed. If your data size allows it, you should always favor this option.
+#
+# Time the multiplication for different split parameters and see how the performance changes.
+#
+#
+
+
+import time
+
+start = time.time()
+z = x @ y
+end = time.time()
+print("runtime: ", end - start)
+
+
+# Heat supports many linear algebra operations:
+# ```bash
+# >>> ht.linalg.
+# ht.linalg.basics        ht.linalg.hsvd_rtol(    ht.linalg.projection(   ht.linalg.triu(
+# ht.linalg.cg(           ht.linalg.inv(          ht.linalg.qr(           ht.linalg.vdot(
+# ht.linalg.cross(        ht.linalg.lanczos(      ht.linalg.solver        ht.linalg.vecdot(
+# ht.linalg.det(          ht.linalg.matmul(       ht.linalg.svdtools      ht.linalg.vector_norm(
+# ht.linalg.dot(          ht.linalg.matrix_norm(  ht.linalg.trace(
+# ht.linalg.hsvd(         ht.linalg.norm(         ht.linalg.transpose(
+# ht.linalg.hsvd_rank(    ht.linalg.outer(        ht.linalg.tril(
+# ```
+#
+# and a lot more is in the works, including distributed eigendecomposition and SVD. If the operation you need is not yet supported, leave us a note [here](tinyurl.com/demoissues) and we'll get back to you.
+
+# You can of course perform all operations on CPUs: simply leave out the `device` argument entirely.
diff --git a/tutorials/scripts/hpc/01_basics/10_interoperability.py b/tutorials/scripts/hpc/01_basics/10_interoperability.py
new file mode 100644
index 0000000000..f3ec217425
--- /dev/null
+++ b/tutorials/scripts/hpc/01_basics/10_interoperability.py
@@ -0,0 +1,26 @@
+# ### Interoperability
+#
+# We can easily create DNDarrays from PyTorch tensors and NumPy ndarrays. We can also convert DNDarrays to PyTorch tensors and NumPy ndarrays. This makes it easy to integrate Heat into existing PyTorch and NumPy workflows.
+#
+
+# Heat will try to reuse the memory of the original array as much as possible. If you would prefer a copy with different memory, the `copy` keyword argument can be used when creating a DNDarray from other libraries.
+
+import heat as ht
+import torch
+import numpy as np
+
+torch_array = torch.arange(ht.MPI_WORLD.rank, ht.MPI_WORLD.rank + 5)
+heat_array = ht.array(torch_array, copy=False, is_split=0)
+heat_array[0] = -1
+print(torch_array)
+
+torch_array = torch.arange(ht.MPI_WORLD.rank, ht.MPI_WORLD.rank + 5)
+heat_array = ht.array(torch_array, copy=True, is_split=0)
+heat_array[0] = -1
+print(torch_array)
+
+np_array = heat_array.numpy()
+print(np_array)
+
+
+# Interoperability is a key feature of Heat, and we are constantly working to increase Heat's compliance with the [Python array API standard](https://data-apis.org/array-api/latest/). As usual, please [let us know](tinyurl.com/demoissues) if you encounter any issues or have any feature requests.
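+
+# One more short sketch covering the remaining directions (NumPy -> Heat and
+# Heat -> torch); `np_data` is introduced here just for illustration:
+
+np_data = np.arange(5)
+heat_from_np = ht.array(np_data)   # NumPy ndarray -> DNDarray (replicated, split=None)
+torch_local = heat_from_np.larray  # process-local torch.Tensor underlying the DNDarray
+print(type(torch_local))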
diff --git a/tutorials/scripts/hpc/01_basics/11_internals_1.py b/tutorials/scripts/hpc/01_basics/11_internals_1.py
new file mode 100644
index 0000000000..d8c1dae30d
--- /dev/null
+++ b/tutorials/scripts/hpc/01_basics/11_internals_1.py
@@ -0,0 +1,44 @@
+import heat as ht
+import torch
+
+# # Heat as infrastructure for MPI applications
+#
+# In this section, we'll go through some Heat-specific functionalities that simplify the implementation of a data-parallel application in Python. We'll demonstrate them on small arrays and 4 processes on a single cluster node, but the functionalities are indeed meant for a multi-node setup with huge arrays that cannot be processed on a single node.
+
+
+# We already mentioned that the DNDarray object is "MPI-aware". Each DNDarray is associated with an MPI communicator, is aware of the number of processes in that communicator, and knows the rank of the local process.
+#
+
+a = ht.random.randn(7, 4, 3, split=1)
+if a.comm.rank == 0:
+    print(f"a.comm gets the communicator {a.comm} associated with DNDarray a")
+
+# MPI size = total number of processes
+size = a.comm.size
+
+if a.comm.rank == 0:
+    print(f"a is distributed over {size} processes")
+    print(f"a is a distributed {a.ndim}-dimensional array with global shape {a.shape}")
+
+
+# MPI rank = rank of each process
+rank = a.comm.rank
+# Local shape = shape of the data on each process
+local_shape = a.lshape
+print(f"Rank {rank} holds a slice of a with local shape {local_shape}")
+
+
+# ### Distribution map
+#
+# On many occasions when building a memory-distributed pipeline, it is convenient for each rank to know which rank holds which slice of the distributed array.
+#
+# The `lshape_map` attribute of a DNDarray gathers (or, if possible, calculates) this info from all processes and stores it as metadata of the DNDarray. Because it is meant for internal use, it is stored in a torch tensor, not a DNDarray.
+#
+# The `lshape_map` tensor is a 2D tensor, where the first dimension is the number of processes and the second dimension is the number of dimensions of the array. Each row of the tensor contains the local shape of the array on a process.
+
+
+lshape_map = a.lshape_map
+if a.comm.rank == 0:
+    print(f"lshape_map available on any process: {lshape_map}")
+
+# Go back to where we created the DNDarray and create `a` with a different split axis. See how the `lshape_map` changes.
diff --git a/tutorials/scripts/hpc/01_basics/12_internals_2.py b/tutorials/scripts/hpc/01_basics/12_internals_2.py
new file mode 100644
index 0000000000..94d71a445d
--- /dev/null
+++ b/tutorials/scripts/hpc/01_basics/12_internals_2.py
@@ -0,0 +1,71 @@
+import heat as ht
+import torch
+
+# ### Modifying the DNDarray distribution
+#
+# In a distributed pipeline, it is sometimes necessary to change the distribution of a DNDarray, when the array is not distributed in the most convenient way for the next operation / algorithm.
+#
+# Depending on your needs, you can choose between:
+# - `DNDarray.redistribute_()`: This method keeps the original split axis, but redistributes the data of the DNDarray according to a "target map".
+# - `DNDarray.resplit_()`: This method changes the split axis of the DNDarray. This is a more expensive operation and should be used only when absolutely necessary. Depending on your needs and available resources, in some cases it might be wiser to keep a copy of the DNDarray with a different split axis.
+#
+# Let's see some examples.
+
+a = ht.random.randn(7, 4, 3, split=1)
+
+# redistribute
+target_map = a.lshape_map
+target_map[:, a.split] = torch.tensor([1, 2, 2, 2])
+# in-place redistribution (see ht.redistribute for out-of-place)
+a.redistribute_(target_map=target_map)
+
+# new lshape map after redistribution
+print(a.lshape_map)
+
+# local arrays after redistribution
+print(a.larray)
+
+
+# resplit
+a.resplit_(axis=1)
+
+print(a.lshape_map)
+
+
+# You can use the `resplit_` method (in-place), or `ht.resplit` (out-of-place) to change the distribution axis, but also to set the distribution axis to None. The latter corresponds to an MPI.Allgather operation that gathers the entire array on each process. This is useful when the data size has become small enough to be processed on a single device and you want to avoid communication overhead.
+
+
+# "un-split" distributed array
+a.resplit_(axis=None)
+# each process now holds a copy of the entire array
+
+
+# The opposite is not true, i.e. you cannot use `resplit_` to distribute an array with split=None. In that case, you must use the `ht.array()` factory function:
+
+
+# make `a` split again
+a = ht.array(a, split=0)
+
+
+# ### Making disjoint data into a global DNDarray
+#
+# Another common occurrence in a data-parallel pipeline: you have addressed the embarrassingly-parallel part of your algorithm with any array framework, each process working independently from the others. You now want to perform a non-embarrassingly-parallel operation on the entire dataset, with Heat as a backend.
+#
+# You can use the `ht.array` factory function with the `is_split` argument to create a DNDarray from a disjoint (on each MPI process) set of arrays. The `is_split` argument indicates the axis along which the disjoint data is to be "joined" into a global, distributed DNDarray.
+
+
+# create some random local arrays on each process
+import numpy as np
+
+local_array = np.random.rand(3, 4)
+
+# join them into a distributed array
+a_0 = ht.array(local_array, is_split=0)
+print(a_0.shape)
+
+
+# Change the code above and join the arrays along a different axis. Note that the shapes of the local arrays must be consistent along the non-split axes. They can differ along the split axis.
+
+# The `ht.array` function takes as input any data object that can be converted to a torch tensor.
+
+# Once you've made your disjoint data into a DNDarray, you can apply any Heat operation or algorithm to it and exploit the cumulative RAM of all the processes in the communicator.
diff --git a/tutorials/scripts/hpc/02_loading_preprocessing/01_IO.py b/tutorials/scripts/hpc/02_loading_preprocessing/01_IO.py
new file mode 100644
index 0000000000..ea8aec1545
--- /dev/null
+++ b/tutorials/scripts/hpc/02_loading_preprocessing/01_IO.py
@@ -0,0 +1,40 @@
+# # Loading and Preprocessing
+#
+# ### Refresher
+#
+# Using PyTorch as a compute engine and mpi4py for communication, Heat implements a number of array operations and algorithms that are optimized for memory-distributed data volumes. This allows you to tackle datasets that are too large for single-node (or worse, single-GPU) processing.
+#
+# As opposed to task-parallel frameworks, Heat takes a data-parallel approach, meaning that each "worker" or MPI process performs the same tasks on different slices of the data. Many operations and algorithms are not embarrassingly parallel and involve data exchange between processes. Heat operations and algorithms are designed to minimize this communication overhead, and to make it transparent to the user.
+
+#
+# In other words:
+# - you don't have to worry about optimizing data chunk sizes;
+# - you don't have to make sure your research problem is embarrassingly parallel, or artificially make your dataset smaller so your RAM is sufficient;
+# - you do have to make sure that you have sufficient **overall** RAM to run your global task (e.g. number of nodes / GPUs).
+
+# The following shows some I/O and preprocessing examples. We'll use small datasets here, as each of us has access to only one node.
+
+# ### I/O
+#
+# Let's start with loading a data set. Heat supports parallel reading and writing for a number of formats, including HDF5, NetCDF, and, because we love scientists, CSV. Check out the `ht.load` and `ht.save` functions for more details. Here we will load data in [HDF5 format](https://en.wikipedia.org/wiki/Hierarchical_Data_Format).
+#
+# Now let's import `heat` and load a data set.
+
+import heat as ht
+
+# A small example dataset for small-scale tests
+iris = ht.load("~/heat/tutorials/02_loading_preprocessing/iris.csv", sep=";", split=0)
+print(iris)
+
+# We have loaded the entire dataset onto 4 MPI processes, each with 12 cores. We have created `iris` with `split=0`, so each process stores evenly-sized slices of the data along dimension 0.
+
+# similar for HDF5
+
+# first, we generate some data
+X = ht.random.randn(10000, 100, split=0)
+
+# ... and save it to file
+ht.save(X, "~/mydata.h5", "mydata", mode="a")
+
+# ... then we can load it again
+Y = ht.load_hdf5("~/mydata.h5", dataset="mydata", split=0)
+print(ht.allclose(X, Y))
diff --git a/tutorials/scripts/hpc/02_loading_preprocessing/02_preprocessing.py b/tutorials/scripts/hpc/02_loading_preprocessing/02_preprocessing.py
new file mode 100644
index 0000000000..d3195ab5c1
--- /dev/null
+++ b/tutorials/scripts/hpc/02_loading_preprocessing/02_preprocessing.py
@@ -0,0 +1,69 @@
+import heat as ht
+
+X = ht.random.randn(1000, 3, split=0, device="gpu")
+
+# We have created the data on 4 MPI processes, each with 12 cores. `X` has `split=0`, so each process stores evenly-sized slices of the data along dimension 0.
+
+# ### Data exploration
+#
+# Let's get an idea of the size of the data.
+
+
+# print global metadata once only
+if X.comm.rank == 0:
+    print(f"X is a {X.ndim}-dimensional array with shape {X.shape}")
+    print(f"X takes up {X.nbytes / 1e6} MB of memory.")
+
+# X is a matrix of shape *(datapoints, features)*.
+#
+# To get a first overview, we can print the data and determine its feature-wise mean, variance, min, max etc. These are reduction operations along the datapoints dimension, which is also the `split` dimension. You don't have to implement [`MPI.Allreduce`](https://mpitutorial.com/tutorials/mpi-reduce-and-allreduce/) operations yourself; communication is handled by Heat operations.
+
+
+features_mean = ht.mean(X, axis=0)
+features_var = ht.var(X, axis=0)
+features_max = ht.max(X, axis=0)
+features_min = ht.min(X, axis=0)
+# ht.percentile is buggy, see #1389, we'll leave it out for now
+# features_median = ht.percentile(X,50.,axis=0)
+
+
+if ht.MPI_WORLD.rank == 0:
+    print(f"Mean: {features_mean}")
+    print(f"Var: {features_var}")
+    print(f"Max: {features_max}")
+    print(f"Min: {features_min}")
+
+
+# Note that the `features_...` DNDarrays are no longer distributed, i.e. a copy of these results exists on each process, as the split dimension of the input data has been lost in the reduction operations.
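+
+# A quick sanity check (sketch): the reduced arrays carry `split=None`, i.e.
+# they are replicated on every process.
+
+if X.comm.rank == 0:
+    print(f"split axis of features_mean: {features_mean.split}")  # expected: None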
+
+# ### Preprocessing/scaling
+#
+# Next, we can preprocess the data, e.g., by standardizing and/or normalizing. Heat offers several preprocessing routines for doing so; the API is similar to [`sklearn.preprocessing`](https://scikit-learn.org/stable/modules/preprocessing.html), so adapting existing code shouldn't be too complicated.
+#
+# Again, please let us know if you're missing any features.
+
+
+# Standard Scaler
+scaler = ht.preprocessing.StandardScaler()
+X_standardized = scaler.fit_transform(X)
+standardized_mean = ht.mean(X_standardized, axis=0)
+standardized_var = ht.var(X_standardized, axis=0)
+
+if ht.MPI_WORLD.rank == 0:
+    print(f"Standard Scaler Mean: {standardized_mean}")
+    print(f"Standard Scaler Var: {standardized_var}")
+
+# Robust Scaler
+scaler = ht.preprocessing.RobustScaler()
+X_robust = scaler.fit_transform(X)
+robust_mean = ht.mean(X_robust, axis=0)
+robust_var = ht.var(X_robust, axis=0)
+
+if ht.MPI_WORLD.rank == 0:
+    print(f"Robust Scaler Mean: {robust_mean}")
+    print(f"Robust Scaler Var: {robust_var}")
+
+
+# Within Heat, you have several options to apply memory-distributed machine learning algorithms to your data.
+#
+# Is the algorithm you're looking for not yet implemented? [Let us know](https://github.com/helmholtz-analytics/heat/issues/new/choose)!
diff --git a/tutorials/scripts/hpc/02_loading_preprocessing/iris.csv b/tutorials/scripts/hpc/02_loading_preprocessing/iris.csv
new file mode 100644
index 0000000000..8bc57da193
--- /dev/null
+++ b/tutorials/scripts/hpc/02_loading_preprocessing/iris.csv
@@ -0,0 +1,150 @@
+5.1;3.5;1.4;0.2
+4.9;3.0;1.4;0.2
+4.7;3.2;1.3;0.2
+4.6;3.1;1.5;0.2
+5.0;3.6;1.4;0.2
+5.4;3.9;1.7;0.4
+4.6;3.4;1.4;0.3
+5.0;3.4;1.5;0.2
+4.4;2.9;1.4;0.2
+4.9;3.1;1.5;0.1
+5.4;3.7;1.5;0.2
+4.8;3.4;1.6;0.2
+4.8;3.0;1.4;0.1
+4.3;3.0;1.1;0.1
+5.8;4.0;1.2;0.2
+5.7;4.4;1.5;0.4
+5.4;3.9;1.3;0.4
+5.1;3.5;1.4;0.3
+5.7;3.8;1.7;0.3
+5.1;3.8;1.5;0.3
+5.4;3.4;1.7;0.2
+5.1;3.7;1.5;0.4
+4.6;3.6;1.0;0.2
+5.1;3.3;1.7;0.5
+4.8;3.4;1.9;0.2
+5.0;3.0;1.6;0.2
+5.0;3.4;1.6;0.4
+5.2;3.5;1.5;0.2
+5.2;3.4;1.4;0.2
+4.7;3.2;1.6;0.2
+4.8;3.1;1.6;0.2
+5.4;3.4;1.5;0.4
+5.2;4.1;1.5;0.1
+5.5;4.2;1.4;0.2
+4.9;3.1;1.5;0.1
+5.0;3.2;1.2;0.2
+5.5;3.5;1.3;0.2
+4.9;3.1;1.5;0.1
+4.4;3.0;1.3;0.2
+5.1;3.4;1.5;0.2
+5.0;3.5;1.3;0.3
+4.5;2.3;1.3;0.3
+4.4;3.2;1.3;0.2
+5.0;3.5;1.6;0.6
+5.1;3.8;1.9;0.4
+4.8;3.0;1.4;0.3
+5.1;3.8;1.6;0.2
+4.6;3.2;1.4;0.2
+5.3;3.7;1.5;0.2
+5.0;3.3;1.4;0.2
+7.0;3.2;4.7;1.4
+6.4;3.2;4.5;1.5
+6.9;3.1;4.9;1.5
+5.5;2.3;4.0;1.3
+6.5;2.8;4.6;1.5
+5.7;2.8;4.5;1.3
+6.3;3.3;4.7;1.6
+4.9;2.4;3.3;1.0
+6.6;2.9;4.6;1.3
+5.2;2.7;3.9;1.4
+5.0;2.0;3.5;1.0
+5.9;3.0;4.2;1.5
+6.0;2.2;4.0;1.0
+6.1;2.9;4.7;1.4
+5.6;2.9;3.6;1.3
+6.7;3.1;4.4;1.4
+5.6;3.0;4.5;1.5
+5.8;2.7;4.1;1.0
+6.2;2.2;4.5;1.5
+5.6;2.5;3.9;1.1
+5.9;3.2;4.8;1.8
+6.1;2.8;4.0;1.3
+6.3;2.5;4.9;1.5
+6.1;2.8;4.7;1.2
+6.4;2.9;4.3;1.3
+6.6;3.0;4.4;1.4
+6.8;2.8;4.8;1.4
+6.7;3.0;5.0;1.7
+6.0;2.9;4.5;1.5
+5.7;2.6;3.5;1.0
+5.5;2.4;3.8;1.1
+5.5;2.4;3.7;1.0
+5.8;2.7;3.9;1.2
+6.0;2.7;5.1;1.6
+5.4;3.0;4.5;1.5
+6.0;3.4;4.5;1.6
+6.7;3.1;4.7;1.5
+6.3;2.3;4.4;1.3
+5.6;3.0;4.1;1.3
+5.5;2.5;4.0;1.3
+5.5;2.6;4.4;1.2
+6.1;3.0;4.6;1.4
+5.8;2.6;4.0;1.2
+5.0;2.3;3.3;1.0
+5.6;2.7;4.2;1.3
+5.7;3.0;4.2;1.2
+5.7;2.9;4.2;1.3
+6.2;2.9;4.3;1.3
+5.1;2.5;3.0;1.1
+5.7;2.8;4.1;1.3
+6.3;3.3;6.0;2.5
+5.8;2.7;5.1;1.9
+7.1;3.0;5.9;2.1
+6.3;2.9;5.6;1.8
+6.5;3.0;5.8;2.2
+7.6;3.0;6.6;2.1
+4.9;2.5;4.5;1.7
+7.3;2.9;6.3;1.8
+6.7;2.5;5.8;1.8
+7.2;3.6;6.1;2.5
+6.5;3.2;5.1;2.0
+6.4;2.7;5.3;1.9
+6.8;3.0;5.5;2.1
+5.7;2.5;5.0;2.0
+5.8;2.8;5.1;2.4
+6.4;3.2;5.3;2.3
+6.5;3.0;5.5;1.8
+7.7;3.8;6.7;2.2
+7.7;2.6;6.9;2.3
+6.0;2.2;5.0;1.5
+6.9;3.2;5.7;2.3
+5.6;2.8;4.9;2.0
+7.7;2.8;6.7;2.0
+6.3;2.7;4.9;1.8
+6.7;3.3;5.7;2.1
+7.2;3.2;6.0;1.8
+6.2;2.8;4.8;1.8
+6.1;3.0;4.9;1.8
+6.4;2.8;5.6;2.1
+7.2;3.0;5.8;1.6
+7.4;2.8;6.1;1.9
+7.9;3.8;6.4;2.0
+6.4;2.8;5.6;2.2
+6.3;2.8;5.1;1.5
+6.1;2.6;5.6;1.4
+7.7;3.0;6.1;2.3
+6.3;3.4;5.6;2.4
+6.4;3.1;5.5;1.8
+6.0;3.0;4.8;1.8
+6.9;3.1;5.4;2.1
+6.7;3.1;5.6;2.4
+6.9;3.1;5.1;2.3
+5.8;2.7;5.1;1.9
+6.8;3.2;5.9;2.3
+6.7;3.3;5.7;2.5
+6.7;3.0;5.2;2.3
+6.3;2.5;5.0;1.9
+6.5;3.0;5.2;2.0
+6.2;3.4;5.4;2.3
+5.9;3.0;5.1;1.8
diff --git a/tutorials/scripts/hpc/03_matrix_factorizations/matrix_factorizations.py b/tutorials/scripts/hpc/03_matrix_factorizations/matrix_factorizations.py
new file mode 100644
index 0000000000..1543c81efe
--- /dev/null
+++ b/tutorials/scripts/hpc/03_matrix_factorizations/matrix_factorizations.py
@@ -0,0 +1,99 @@
+# # Matrix factorizations
+#
+# ### Refresher
+#
+# Using PyTorch as a compute engine and mpi4py for communication, Heat implements a number of array operations and algorithms that are optimized for memory-distributed data volumes. This allows you to tackle datasets that are too large for single-node (or worse, single-GPU) processing.
+#
+# As opposed to task-parallel frameworks, Heat takes a data-parallel approach, meaning that each "worker" or MPI process performs the same tasks on different slices of the data. Many operations and algorithms are not embarrassingly parallel and involve data exchange between processes. Heat operations and algorithms are designed to minimize this communication overhead, and to make it transparent to the user.
+#
+# In other words:
+# - you don't have to worry about optimizing data chunk sizes;
+# - you don't have to make sure your research problem is embarrassingly parallel, or artificially make your dataset smaller so your RAM is sufficient;
+# - you do have to make sure that you have sufficient **overall** RAM to run your global task (e.g. number of nodes / GPUs).
+
+# In the following, we will demonstrate the usage of Heat's truncated SVD algorithm.
+
+# ### SVD and its truncated counterparts in a nutshell
+#
+# Let $X \in \mathbb{R}^{m \times n}$ be a matrix, e.g., given by a data set consisting of $m$ data points $\in \mathbb{R}^n$ stacked together. The so-called **singular value decomposition (SVD)** of $X$ is given by
+#
+# $$
+# X = U \Sigma V^T
+# $$
+#
+# where $U \in \mathbb{R}^{m \times r_X}$ and $V \in \mathbb{R}^{n \times r_X}$ have orthonormal columns, $\Sigma = \text{diag}(\sigma_1,...,\sigma_{r_X}) \in \mathbb{R}^{r_X \times r_X}$ is a diagonal matrix containing the so-called singular values $\sigma_1 \geq \sigma_2 \geq ... \geq \sigma_{r_X} > 0$, and $r_X \leq \min(m,n)$ denotes the rank of $X$ (i.e. the dimension of the subspace of $\mathbb{R}^m$ spanned by the columns of $X$). Since $\Sigma = U^T X V$ is diagonal, one can imagine this decomposition as finding orthogonal coordinate transformations under which the action of $X$ reduces to a diagonal scaling.

+# ### SVD in data science
+#
+# In data science, SVD is more often known as **principal component analysis (PCA)**, the columns of $U$ being called the principal components of $X$.
In fact, in many applications **truncated SVD/PCA** suffices: to reduce $X$ to the "essential" information, one chooses a truncation rank $0 < r \leq r_X$ and considers the truncated SVD/PCA given by
+#
+# $$
+# X \approx X_r := U_{[:,:r]} \Sigma_{[:r,:r]} V_{[:,:r]}^T
+# $$
+#
+# where we have used `numpy`-like notation for selecting only the first $r$ columns of $U$ and $V$, respectively. The rationale behind this is that if the first $r$ singular values of $X$ are much larger than the remaining ones, $X_r$ will still contain all "essential" information contained in $X$; in mathematical terms:
+#
+# $$
+# \lVert X_r - X \rVert_{F}^2 = \sum_{i=r+1}^{r_X} \sigma_i^2,
+# $$
+#
+# where $\lVert \cdot \rVert_F$ denotes the Frobenius norm. Thus, truncated SVD/PCA may be used, e.g.,
+# * to filter away non-essential information in order to get a "feeling" for the main characteristics of your data set,
+# * to detect linear (or "almost" linear) dependencies in your data,
+# * to generate features for further processing of your data.
+#
+# Moreover, there are plenty of more advanced data analytics and data-based simulation techniques, e.g., Proper Orthogonal Decomposition (POD) or Dynamic Mode Decomposition (DMD), that are based on SVD/PCA.
+
+# ### Truncated SVD in Heat
+#
+# In Heat we have currently implemented an algorithm for computing an approximate truncated SVD, where truncation takes place either w.r.t. a fixed truncation rank (`heat.linalg.hsvd_rank`) or w.r.t. a desired accuracy (`heat.linalg.hsvd_rtol`). In the latter case, the following bound on the "reconstruction error" is guaranteed:
+#
+# $$
+# \frac{\lVert X - U U^T X \rVert_F}{\lVert X \rVert_F} \overset{!}{\leq} \text{rtol},
+# $$
+#
+# where $U$ denotes the approximate left-singular vectors of $X$ computed by `heat.linalg.hsvd_rtol`.
+#
+
+# To demonstrate the usage of Heat's truncated SVD algorithm, we will load the data set from the last example and then compute its truncated SVD. As usual, first we need to gain access to the MPI environment.
+
+
+import heat as ht
+
+X = ht.load_hdf5("~/mydata.h5", dataset="mydata", split=0).T
+
+
+# Note that due to the transpose, `X` is now distributed along the columns; this is required by the hSVD algorithm.
+
+# Let's first compute the truncated SVD by setting the relative tolerance.
+
+
+# compute truncated SVD w.r.t. relative tolerance
+svd_with_reltol = ht.linalg.hsvd_rtol(X, rtol=1.0e-2, compute_sv=True, silent=False)
+print("relative residual:", svd_with_reltol[3], "rank: ", svd_with_reltol[0].shape[1])
+
+
+# Alternatively, you can compute a truncated SVD with a fixed truncation rank:
+
+# compute truncated SVD w.r.t. a fixed truncation rank
+svd_with_rank = ht.linalg.hsvd_rank(X, maxrank=3, compute_sv=True, silent=False)
+print("relative residual:", svd_with_rank[3], "rank: ", svd_with_rank[0].shape[1])
+
+# Once we have computed the truncated SVD, we can use it to approximate the original data matrix `X` by the truncated matrix `X_r`.
+#
+# Check out https://helmholtz-analytics.github.io/heat/2023/06/16/new-feature-hsvd.html to see how Heat's truncated SVD algorithm scales with the number of MPI processes and size of the dataset.
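+
+# As a sketch of that reconstruction (assuming, as the indexing above suggests,
+# that with `compute_sv=True` the returned tuple holds the factors U, sigma, V
+# in positions 0, 1 and 2), we can form X_r and check its relative error:
+
+U, sigma, V = svd_with_rank[0], svd_with_rank[1], svd_with_rank[2]
+X_r = U @ ht.diag(sigma) @ V.T
+rel_err = ht.linalg.norm(X_r - X) / ht.linalg.norm(X)
+print("relative reconstruction error of the rank-3 approximation:", rel_err)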
+
+# ### Other factorizations
+#
+# Other common factorization algorithms are supported in Heat, such as:
+# - QR decomposition (`heat.linalg.qr`)
+# - Lanczos algorithm for computing the largest eigenvalues and corresponding eigenvectors (`heat.linalg.lanczos`)
+#
+# Check out our [`linalg` PRs](https://github.com/helmholtz-analytics/heat/pulls?q=is%3Aopen+is%3Apr+label%3Alinalg) to see what's in progress.
+#
+
+# **References for hierarchical SVD**
+#
+# 1. Iwen, Ong. *A distributed and incremental SVD algorithm for agglomerative data analysis on large networks.* SIAM J. Matrix Anal. Appl., **37** (4), 2016.
+# 2. Himpe, Leibner, Rave. *Hierarchical approximate proper orthogonal decomposition.* SIAM J. Sci. Comput., **40** (5), 2018.
+# 3. Halko, Martinsson, Tropp. *Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions.* SIAM Rev., **53** (2), 2011.
diff --git a/tutorials/scripts/hpc/04_clustering/clustering.py b/tutorials/scripts/hpc/04_clustering/clustering.py
new file mode 100644
index 0000000000..85c6e2c5e3
--- /dev/null
+++ b/tutorials/scripts/hpc/04_clustering/clustering.py
@@ -0,0 +1,68 @@
+# Cluster Analysis
+# ================
+#
+# This tutorial is a script version of our static [clustering tutorial on ReadTheDocs](https://heat.readthedocs.io/en/stable/tutorial_clustering.html).
+#
+# We will demonstrate memory-distributed analysis with k-means and k-medians from the ``heat.cluster`` module. As usual, we will run the analysis on a small dataset for demonstration; the script needs to be launched with several MPI processes to distribute the computation.
+#
+# At the end, we will compare Heat's results with scikit-learn's.
+
+
+import heat as ht
+
+# The Iris Dataset
+# ------------------------------
+# The _iris_ dataset is a well-known example for clustering analysis. It contains 4 measured features for samples from
+# three different types of iris flowers. A subset of 150 samples is included in formats h5, csv and netcdf in the [Heat repository under 'heat/heat/datasets'](https://github.com/helmholtz-analytics/heat/tree/main/heat/datasets), and can be loaded in a distributed manner with Heat's parallel dataloader.
+#
+# **NOTE: you might have to change the path to the dataset in the following line.**
+
+iris = ht.load("~/heat/tutorials/hpc/02_loading_preprocessing/iris.csv", sep=";", split=0)
+
+
+# Feel free to try out the other [loading options](https://heat.readthedocs.io/en/stable/autoapi/heat/core/io/index.html#heat.core.io.load) as well.
+#
+# Fitting the dataset with `kmeans`:
+
+k = 3
+kmeans = ht.cluster.KMeans(n_clusters=k, init="kmeans++")
+kmeans.fit(iris)
+
+# Let's see what the results are. In theory, there are 50 samples of each of the 3 iris types: setosa, versicolor and virginica. We will count how many samples were assigned to each cluster.
+
+labels = kmeans.predict(iris).squeeze()
+
+# Select points assigned to clusters c1, c2 and c3
+c1 = iris[ht.where(labels == 0), :]
+c2 = iris[ht.where(labels == 1), :]
+c3 = iris[ht.where(labels == 2), :]
+# After slicing, the arrays are not distributed equally among the processes anymore; we need to balance
+# TODO is balancing really necessary?
+c1.balance_()
+c2.balance_()
+c3.balance_()
+
+print(
+    f"Number of points assigned to c1: {c1.shape[0]} \n"
+    f"Number of points assigned to c2: {c2.shape[0]} \n"
+    f"Number of points assigned to c3: {c3.shape[0]}"
+)
+
+
+# compare Heat results with sklearn
+from sklearn.cluster import KMeans
+import sklearn.datasets
+
+k = 3
+iris_sk = sklearn.datasets.load_iris().data
+kmeans_sk = KMeans(n_clusters=k, init="k-means++").fit(iris_sk)
+labels_sk = kmeans_sk.predict(iris_sk)
+
+c1_sk = iris_sk[labels_sk == 0, :]
+c2_sk = iris_sk[labels_sk == 1, :]
+c3_sk = iris_sk[labels_sk == 2, :]
+print(
+    f"Number of points assigned to c1: {c1_sk.shape[0]} \n"
+    f"Number of points assigned to c2: {c2_sk.shape[0]} \n"
+    f"Number of points assigned to c3: {c3_sk.shape[0]}"
+)
diff --git a/tutorials/scripts/hpc/05_your_turn/now_its_your_turn.py b/tutorials/scripts/hpc/05_your_turn/now_its_your_turn.py
new file mode 100644
index 0000000000..42e215a52a
--- /dev/null
+++ b/tutorials/scripts/hpc/05_your_turn/now_its_your_turn.py
@@ -0,0 +1,44 @@
+import heat as ht
+import numpy as np
+import h5py
+
+# Now it's your turn! Download one of the following three data sets and play around with it.
+# Possible ideas:
+# get familiar with the data: shape, min, max, avg, std (possibly along axes?)
+# try SVD and/or QR to detect linear dependence
+# K-Means Clustering (Asteroids, CERN?)
+# Lasso (CERN?)
+# n-dim FFT (CAMELS?)...
+
+
+# "Asteroids": Asteroids of the Solar System
+# Download the data set of the asteroids from the JPL Small Body Database from https://ssd.jpl.nasa.gov/tools/sbdb_lookup.html#/
+# and load the resulting csv file into Heat.
+
+
+# ... to be completed ...
+
+# "CAMELS": 1000 simulated universes on 128 x 128 x 128 grids
+# Take a bunch of 1000 simulated universes from the CAMELS data set (8GB):
+# ```
+# wget https://users.flatironinstitute.org/~fvillaescusa/priv/DEPnzxoWlaTQ6CjrXqsm0vYi8L7Jy/CMD/3D_grids/data/Nbody/Grids_Mtot_Nbody_Astrid_LH_128_z=0.0.npy -O ~/Grids_Mtot_Nbody_Astrid_LH_128_z=0.0.npy
+# ```
+# load them in NumPy, convert to PyTorch and Heat...
+
+import os
+
+X_np = np.load(os.path.expanduser("~/Grids_Mtot_Nbody_Astrid_LH_128_z=0.0.npy"))
+
+# ... to be completed ...
+
+# "CERN": A particle physics data set from CERN
+# Take a small part of the ATLAS Top Tagging Data Set from CERN (7.6GB, actually the "test" part; the "train" part is much larger...)
+# ```
+# wget https://opendata.cern.ch/record/15013/files/test.h5 -O ~/test.h5
+# ```
+# and load it directly into Heat (watch out: the h5 file contains different datasets that need to be stacked...)
+
+filename = os.path.expanduser("~/test.h5")
+with h5py.File(filename, "r") as f:
+    features = f.keys()
+    arrays = [ht.load_hdf5(filename, feature, split=0) for feature in features]
+
+# ... to be completed ...
diff --git a/tutorials/scripts/hpc/README.md b/tutorials/scripts/hpc/README.md
new file mode 100644
index 0000000000..53304b16a1
--- /dev/null
+++ b/tutorials/scripts/hpc/README.md
@@ -0,0 +1,17 @@
+There are two example scripts in this directory, `slurm_script_cpu.sh` and `slurm_script_gpu.sh`, that demonstrate how to run a Heat application on an HPC system with SLURM as the resource manager.
+
+1. `slurm_script_cpu.sh` is an example script that runs a Heat application on a CPU node. You must specify the name of the respective partition of your cluster. Moreover, the
+   number of CPU cores available on a node of your system must be greater than or equal to the product of the `tasks-per-node` and `cpus-per-task` arguments (= 8 x 16 = 128 in the example).
+
+2.
`slurm_script_gpu.sh` is an example script that runs a Heat application on a GPU node. You must specify the name of the respective partition of your cluster. Moreover, the
+   number of GPU devices available on a node of your system must be greater than or equal to the number of GPUs requested in the script (= 4 in the example).
+
+## Remarks
+
+* Please have a look at the documentation of your HPC system for its detailed configuration and properties; you may have to adjust the script to your system.
+* You need to load the required modules (e.g., for MPI, CUDA etc.) from the module system of your HPC system before running the script. Moreover, you need to install Heat in a virtual environment (and activate it). Alternatively, you may use Spack (if available on your system) for installing Heat and its dependencies.
+* Depending on the configuration of SLURM and MPI on your system, you might need to replace `srun python ...` by
+  ```
+  mpirun -n $SLURM_NTASKS --bind-to hwthread --map-by socket:PE=${SLURM_CPUS_PER_TASK} python ...
+  ```
+  or similar.
diff --git a/tutorials/scripts/hpc/slurm_script_cpu.sh b/tutorials/scripts/hpc/slurm_script_cpu.sh
new file mode 100644
index 0000000000..6e534d3309
--- /dev/null
+++ b/tutorials/scripts/hpc/slurm_script_cpu.sh
@@ -0,0 +1,12 @@
+#!/bin/bash
+
+#SBATCH --partition=
+#SBATCH --nodes=1
+#SBATCH --tasks-per-node=8
+#SBATCH --cpus-per-task=16
+#SBATCH --time="00:01:00"
+
+export MKL_NUM_THREADS=$SLURM_CPUS_PER_TASK
+export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
+
+srun python ~/heat/tutorials/hpc/01_basics/01_basics_dndarrays.py
diff --git a/tutorials/scripts/hpc/slurm_script_gpu.sh b/tutorials/scripts/hpc/slurm_script_gpu.sh
new file mode 100644
index 0000000000..9ffdc619f6
--- /dev/null
+++ b/tutorials/scripts/hpc/slurm_script_gpu.sh
@@ -0,0 +1,13 @@
+#!/bin/bash
+
+#SBATCH --partition=
+#SBATCH --nodes=1
+#SBATCH --tasks-per-node=4
+#SBATCH --cpus-per-task=2
+#SBATCH --gres=gpu:4
+#SBATCH --time="00:01:00"
+
+export MKL_NUM_THREADS=$SLURM_CPUS_PER_TASK
+export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
+
+srun python ~/heat/tutorials/hpc/01_basics/01_basics_dndarrays.py

From 3c2b559b70f4950275937a898d560c12065d48ea Mon Sep 17 00:00:00 2001
From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com>
Date: Tue, 30 Sep 2025 13:18:36 +0200
Subject: [PATCH 02/15] Sturdier MPI Check (#1926) (#1979)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* sturdier mpi check

* GPU_AWARE as relevant global check, checks can depend on version now

* gpu aware compatiblity changes

* wip: communication layer fixes for gpu_aware_mpi

* _moveToCompDevice function

* fix: disabled OpenMPI cuda awareness due to unreliable tests and benchmarks

---------

(cherry picked from commit 363766239475f0e074eb44477cf0a068a86f13e2)

Co-authored-by: Juan Pedro Gutiérrez Hermosillo Muriedas
Co-authored-by: Fabian Hoppe <112093564+mrfh92@users.noreply.github.com>
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
---
 .perun.ini                            |   2 +-
 .pre-commit-config.yaml               |   8 +-
 benchmarks/cb/heat_signal.py          |   6 +-
 benchmarks/cb/main.py                 |   6 ++
 heat/cli.py                           |   6 +-
 heat/core/_config.py                  | 113 ++++++++++++++------
 heat/core/communication.py            | 129 ++++++++++++++------------
 heat/core/linalg/tests/test_basics.py |   2 +-
 heat/core/linalg/tests/test_qr.py     |   4 +-
 heat/core/tests/test_communication.py |  10 +-
 10 files changed, 179 insertions(+), 107 deletions(-)

diff --git a/.perun.ini b/.perun.ini
index
b594eac4df..99863465da 100644 --- a/.perun.ini +++ b/.perun.ini @@ -19,7 +19,7 @@ format = bench data_out = ./bench_data [benchmarking] -rounds = 10 +rounds = 5 warmup_rounds = 1 metrics = runtime,energy region_metrics = runtime,power diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 4b2b239cc2..08262173e8 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -39,10 +39,12 @@ repos: - id: "validate-cff" args: - "--verbose" - - repo: https://github.com/gitleaks/gitleaks - rev: v8.28.0 + - repo: https://github.com/thoughtworks/talisman + rev: 'v1.37.0' # Update me! hooks: - - id: gitleaks + # both pre-commit and pre-push supported + # - id: talisman-push + - id: talisman-commit - repo: https://github.com/shellcheck-py/shellcheck-py rev: v0.11.0.1 hooks: diff --git a/benchmarks/cb/heat_signal.py b/benchmarks/cb/heat_signal.py index 9ecf26a443..57f986f066 100644 --- a/benchmarks/cb/heat_signal.py +++ b/benchmarks/cb/heat_signal.py @@ -43,7 +43,7 @@ def convolution_batch_processing_stride(signal, kernel, stride): def run_signal_benchmarks(): - n_s = 1000000000 + n_s = 2000000 n_k = 10003 stride = 3 @@ -75,8 +75,8 @@ def run_signal_benchmarks(): del signal, kernel # batch processing - n_s = 90000 - n_b = 90000 + n_s = 50000 + n_b = 1000 n_k = 503 signal = ht.random.random((n_b, n_s), split=0) kernel = ht.random.random_integer(0, 1, (n_b, n_k), split=0) diff --git a/benchmarks/cb/main.py b/benchmarks/cb/main.py index 2dd4680ae0..9b958920e6 100644 --- a/benchmarks/cb/main.py +++ b/benchmarks/cb/main.py @@ -18,8 +18,14 @@ from heat_signal import run_signal_benchmarks run_linalg_benchmarks() +print("Linalg finished") run_cluster_benchmarks() +print("Cluster finished") run_manipulation_benchmarks() +print("Manipulation finished") run_preprocessing_benchmarks() +print("Preprocessing finished") run_decomposition_benchmarks() +print("Decomposition finished") run_signal_benchmarks() +print("Signal finished") diff --git a/heat/cli.py b/heat/cli.py index 29a91fdbc7..defe338102 100644 --- a/heat/cli.py +++ b/heat/cli.py @@ -8,7 +8,7 @@ import argparse from heat.core.version import __version__ as ht_version -from heat.core.communication import CUDA_AWARE_MPI +from heat.core._config import CUDA_AWARE_MPI, ROCM_AWARE_MPI, GPU_AWARE_MPI def cli() -> None: @@ -49,6 +49,8 @@ def plaform_info(): print(f" Device name: {torch.cuda.get_device_name(def_device)}") print(f" Device name: {torch.cuda.get_device_properties(def_device)}") print( - f" Device memory: {torch.cuda.get_device_properties(def_device).total_memory / 1024**3} GiB" + f" Device memory: {torch.cuda.get_device_properties(def_device).total_memory / 1024**3} GiB" ) print(f" CUDA Aware MPI: {CUDA_AWARE_MPI}") + print(f" ROCM Aware MPI: {ROCM_AWARE_MPI}") + print(f" GPU Aware MPI: {GPU_AWARE_MPI}") diff --git a/heat/core/_config.py b/heat/core/_config.py index da327835a1..48d0a3e22b 100644 --- a/heat/core/_config.py +++ b/heat/core/_config.py @@ -9,41 +9,94 @@ import os import warnings import re +import dataclasses +from enum import Enum -PLATFORM = platform.platform() -MPI_LIBRARY_VERSION = mpi4py.MPI.Get_library_version() -TORCH_VERSION = torch.__version__ -TORCH_CUDA_IS_AVAILABLE = torch.cuda.is_available() -CUDA_IS_ACTUALLY_ROCM = "rocm" in TORCH_VERSION -CUDA_AWARE_MPI = False -ROCM_AWARE_MPI = False +class MPILibrary(Enum): + OpenMPI = "ompi" + IntelMPI = "impi" + MVAPICH = "mvapich" + MPICH = "mpich" + CrayMPI = "craympi" + ParastationMPI = "psmpi" + + +@dataclasses.dataclass +class MPILibraryInfo: + name: 
MPILibrary + version: str + -# check whether there is CUDA- or ROCm-aware OpenMPI -try: - buffer = subprocess.check_output(["ompi_info", "--parsable", "--all"]) - CUDA_AWARE_MPI = b"mpi_built_with_cuda_support:value:true" in buffer - pattern = re.compile(r"^MPI extensions:.*", re.MULTILINE) - match = pattern.search(buffer) - ROCM_AWARE_MPI = "rocm" in match.group(0) -except: # noqa E722 - pass +def _get_mpi_library() -> MPILibraryInfo: + library = mpi4py.MPI.Get_library_version().split() + match library: + case ["Open", "MPI", *_]: + return MPILibraryInfo(MPILibrary.OpenMPI, library[2]) + case ["Intel(R)", "MPI", *_]: + return MPILibraryInfo(MPILibrary.IntelMPI, library[3]) + case ["MPICH", "Version:", *_]: + return MPILibraryInfo(MPILibrary.MPICH, library[2]) + ### Missing libraries + case _: + print("Did not find a matching library") -# do the same for MVAPICH -CUDA_AWARE_MPI = CUDA_AWARE_MPI or os.environ.get("MV2_USE_CUDA") == "1" -CUDA_AWARE_MPI = ROCM_AWARE_MPI or os.environ.get("MV2_USE_ROCM") == "1" -# do the same for MPICH, TODO: outdated? -CUDA_AWARE_MPI = CUDA_AWARE_MPI or os.environ.get("MPIR_CVAR_ENABLE_HCOLL") == "1" +def _check_gpu_aware_mpi(library: MPILibraryInfo) -> tuple[bool, bool]: + match library.name: + case MPILibrary.OpenMPI: + try: + parsable_ompi_info = subprocess.check_output( + ["ompi_info", "--parsable", "--all"] + ).decode("utf-8") + ompi_info = subprocess.check_output(["ompi_info"]).decode("utf-8") -# Cray MPICH -CUDA_AWARE_MPI = os.environ.get("MPICH_GPU_SUPPORT_ENABLED") == "1" -ROCM_AWARE_MPI = os.environ.get("MPICH_GPU_SUPPORT_ENABLED") == "1" + # Check for CUDA support flag + cuda_support_flag = "mpi_built_with_cuda_support:value:true" in parsable_ompi_info -# do the same for ParaStationMPI, seems to have CUDA-support only -CUDA_AWARE_MPI = CUDA_AWARE_MPI or os.environ.get("PSP_CUDA") == "1" + # Check for extensions + match = re.search(r"MPI extensions: (.*)", ompi_info) + extensions = [ext.strip() for ext in match.group(0).split(":")[1].split(",")] + cuda = cuda_support_flag and "cuda" in extensions + if library.version.startswith("v4."): + rocm = cuda + elif library.version.startswith("v5."): + rocm = "rocm" in extensions or "hip" in extensions + # Seems to be broken, disabled by default for now + # return cuda, rocm + return False, False + except Exception as e: # noqa E722 + return False, False + case MPILibrary.IntelMPI: + return False, False + case MPILibrary.MVAPICH: + cuda = os.environ.get("MV2_USE_CUDA") == "1" + rocm = os.environ.get("MV2_USE_ROCM") == "1" + return cuda, rocm + case MPILibrary.MPICH: + cuda = os.environ.get("MPIR_CVAR_ENABLE_HCOLL") == "1" + rocm = False + return cuda, rocm + case MPILibrary.CrayMPI: + cuda = os.environ.get("MPICH_GPU_SUPPORT_ENABLED") == "1" + rocm = os.environ.get("MPICH_GPU_SUPPORT_ENABLED") == "1" + return cuda, rocm + case MPILibrary.ParastationMPI: + cuda = os.environ.get("PSP_CUDA") == "1" + rocm = False + return cuda, rocm + case _: + return False, False -# Intel-MPI? 
+ +PLATFORM = platform.platform() +TORCH_VERSION = torch.__version__ +TORCH_CUDA_IS_AVAILABLE = torch.cuda.is_available() +CUDA_IS_ACTUALLY_ROCM = "rocm" in TORCH_VERSION + +mpi_library = _get_mpi_library() +CUDA_AWARE_MPI, ROCM_AWARE_MPI = _check_gpu_aware_mpi(mpi_library) +GPU_AWARE_MPI = False # warn the user if CUDA/ROCm-aware MPI is not available, but PyTorch can use GPUs with CUDA/ROCm if TORCH_CUDA_IS_AVAILABLE: @@ -52,11 +105,11 @@ f"Heat has CUDA GPU-support (PyTorch version {TORCH_VERSION} and `torch.cuda.is_available() = True`), but CUDA-awareness of MPI could not be detected. This may lead to performance degradation as direct MPI-communication between GPUs is not possible.", UserWarning, ) + elif CUDA_IS_ACTUALLY_ROCM and not ROCM_AWARE_MPI: warnings.warn( f"Heat has ROCm GPU-support (PyTorch version {TORCH_VERSION} and `torch.cuda.is_available() = True`), but ROCm-awareness of MPI could not be detected. This may lead to performance degradation as direct MPI-communication between GPUs is not possible.", UserWarning, ) - GPU_AWARE_MPI = True -else: - GPU_AWARE_MPI = False + else: + GPU_AWARE_MPI = True diff --git a/heat/core/communication.py b/heat/core/communication.py index eb3443bc10..3387ad7326 100644 --- a/heat/core/communication.py +++ b/heat/core/communication.py @@ -12,9 +12,10 @@ from mpi4py import MPI from typing import Any, Callable, Optional, List, Tuple, Union + from .stride_tricks import sanitize_axis -from ._config import CUDA_AWARE_MPI +from ._config import GPU_AWARE_MPI class MPIRequest: @@ -57,7 +58,7 @@ def Wait(self, status: MPI.Status = None): if self.tensor is not None and isinstance(self.tensor, torch.Tensor): if self.permutation is not None: self.recvbuf = self.recvbuf.permute(self.permutation) - if self.tensor is not None and self.tensor.is_cuda and not CUDA_AWARE_MPI: + if self.tensor is not None and self.tensor.is_cuda and not GPU_AWARE_MPI: self.tensor.copy_(self.recvbuf) def __getattr__(self, name: str) -> Callable: @@ -389,6 +390,17 @@ def as_buffer( obj.squeeze_(-1) return [mpi_mem, elements, mpi_type] + def _moveToCompDevice(self, x: torch.Tensor, func: Callable | None) -> torch.Tensor: + """Moves the torch tensor to the relevant device, in case the function is not compatible with the MPI+GPU library.""" + if x.is_cuda: + if GPU_AWARE_MPI: + torch.cuda.synchronize(x.device) + return x + else: + return x.cpu() + else: + return x + def alltoall_sendbuffer( self, obj: torch.Tensor ) -> List[Union[MPI.memory, Tuple[int, int], MPI.Datatype]]: @@ -534,7 +546,7 @@ def Irecv( if not isinstance(buf, torch.Tensor): return MPIRequest(self.handle.Irecv(buf, source, tag)) - rbuf = buf if CUDA_AWARE_MPI else buf.cpu() + rbuf = self._moveToCompDevice(buf, self.handle.Irecv) return MPIRequest(self.handle.Irecv(self.as_buffer(rbuf), source, tag), None, rbuf, buf) Irecv.__doc__ = MPI.Comm.Irecv.__doc__ @@ -565,10 +577,10 @@ def Recv( if not isinstance(buf, torch.Tensor): return self.handle.Recv(buf, source, tag, status) - rbuf = buf if CUDA_AWARE_MPI else buf.cpu() + rbuf = self._moveToCompDevice(buf, self.handle.Recv) ret = self.handle.Recv(self.as_buffer(rbuf), source, tag, status) - if isinstance(buf, torch.Tensor) and buf.is_cuda and not CUDA_AWARE_MPI: + if isinstance(buf, torch.Tensor) and buf.is_cuda and not GPU_AWARE_MPI: buf.copy_(rbuf) return ret @@ -597,7 +609,8 @@ def __send_like( return func(buf, dest, tag), None # in case of GPUs, the memory has to be copied to host memory if CUDA-aware MPI is not supported - sbuf = buf if CUDA_AWARE_MPI else 
buf.cpu() + sbuf = self._moveToCompDevice(buf, func) + return func(self.as_buffer(sbuf), dest, tag), sbuf def Bsend(self, buf: Union[DNDarray, torch.Tensor, Any], dest: int, tag: int = 0): @@ -765,7 +778,7 @@ def __broadcast_like( if not isinstance(buf, torch.Tensor): return func(buf, root), None, None, None - srbuf = buf if CUDA_AWARE_MPI else buf.cpu() + srbuf = self._moveToCompDevice(buf, func) return func(self.as_buffer(srbuf), root), srbuf, srbuf, buf @@ -781,7 +794,7 @@ def Bcast(self, buf: Union[DNDarray, torch.Tensor, Any], root: int = 0) -> None: Rank of the root process, that broadcasts the message """ ret, sbuf, rbuf, buf = self.__broadcast_like(self.handle.Bcast, buf, root) - if buf is not None and isinstance(buf, torch.Tensor) and buf.is_cuda and not CUDA_AWARE_MPI: + if buf is not None and isinstance(buf, torch.Tensor) and buf.is_cuda and not GPU_AWARE_MPI: buf.copy_(rbuf) return ret @@ -941,7 +954,7 @@ def __reduce_like( if isinstance(recvbuf, torch.Tensor): # Datatype and count shall be derived from the recv buffer, and applied to both, as they should match after the last code block buf = recvbuf - rbuf = recvbuf if CUDA_AWARE_MPI else recvbuf.cpu() + rbuf = self._moveToCompDevice(buf, func) recvbuf: Tuple[MPI.memory, int, MPI.Datatype] = self.as_buffer(rbuf, is_contiguous=True) if not recvbuf[2].is_predefined: # If using a derived datatype, we need to define the reduce operation to be able to handle the it. @@ -949,7 +962,7 @@ def __reduce_like( op = derived_op if isinstance(sendbuf, torch.Tensor): - sbuf = sendbuf if CUDA_AWARE_MPI else sendbuf.cpu() + sbuf = self._moveToCompDevice(sendbuf, func) sendbuf = (self.as_mpi_memory(sbuf), recvbuf[1], recvbuf[2]) # perform the actual reduction operation @@ -974,7 +987,7 @@ def Allreduce( The operation to perform upon reduction """ ret, sbuf, rbuf, buf = self.__reduce_like(self.handle.Allreduce, sendbuf, recvbuf, op) - if buf is not None and isinstance(buf, torch.Tensor) and buf.is_cuda and not CUDA_AWARE_MPI: + if buf is not None and isinstance(buf, torch.Tensor) and buf.is_cuda and not GPU_AWARE_MPI: buf.copy_(rbuf) return ret @@ -999,7 +1012,7 @@ def Exscan( The operation to perform upon reduction """ ret, sbuf, rbuf, buf = self.__reduce_like(self.handle.Exscan, sendbuf, recvbuf, op) - if buf is not None and isinstance(buf, torch.Tensor) and buf.is_cuda and not CUDA_AWARE_MPI: + if buf is not None and isinstance(buf, torch.Tensor) and buf.is_cuda and not GPU_AWARE_MPI: buf.copy_(rbuf) return ret @@ -1118,7 +1131,7 @@ def Reduce( Rank of the root process """ ret, sbuf, rbuf, buf = self.__reduce_like(self.handle.Reduce, sendbuf, recvbuf, op, root) - if buf is not None and isinstance(buf, torch.Tensor) and buf.is_cuda and not CUDA_AWARE_MPI: + if buf is not None and isinstance(buf, torch.Tensor) and buf.is_cuda and not GPU_AWARE_MPI: buf.copy_(rbuf) return ret @@ -1143,7 +1156,7 @@ def Scan( The operation to perform upon reduction """ ret, sbuf, rbuf, buf = self.__reduce_like(self.handle.Scan, sendbuf, recvbuf, op) - if buf is not None and isinstance(buf, torch.Tensor) and buf.is_cuda and not CUDA_AWARE_MPI: + if buf is not None and isinstance(buf, torch.Tensor) and buf.is_cuda and not GPU_AWARE_MPI: buf.copy_(rbuf) return ret @@ -1213,23 +1226,24 @@ def __allgather_like( else: recv_axis_permutation = None - sbuf = sendbuf if CUDA_AWARE_MPI or not isinstance(sendbuf, torch.Tensor) else sendbuf.cpu() - rbuf = recvbuf if CUDA_AWARE_MPI or not isinstance(recvbuf, torch.Tensor) else recvbuf.cpu() - - # prepare buffer objects - if 
sendbuf is MPI.IN_PLACE or not isinstance(sendbuf, torch.Tensor): - mpi_sendbuf = sbuf - else: + if isinstance(sendbuf, torch.Tensor): + sbuf = self._moveToCompDevice(sendbuf, func) mpi_sendbuf = self.as_buffer(sbuf, send_counts, send_displs, sbuf_is_contiguous) if send_counts is not None: mpi_sendbuf[1] = mpi_sendbuf[1][0][self.rank] - - if recvbuf is MPI.IN_PLACE or not isinstance(recvbuf, torch.Tensor): - mpi_recvbuf = rbuf else: + sbuf = sendbuf + mpi_sendbuf = sendbuf + + if isinstance(recvbuf, torch.Tensor): + rbuf = self._moveToCompDevice(recvbuf, func) mpi_recvbuf = self.as_buffer(rbuf, recv_counts, recv_displs, rbuf_is_contiguous) if recv_counts is None: mpi_recvbuf[1] //= self.size + else: + rbuf = recvbuf + mpi_recvbuf = recvbuf + # perform the scatter operation exit_code = func(mpi_sendbuf, mpi_recvbuf, **kwargs) return exit_code, sbuf, rbuf, original_recvbuf, recv_axis_permutation @@ -1257,7 +1271,7 @@ def Allgather( ) if buf is not None and isinstance(buf, torch.Tensor) and permutation is not None: rbuf = rbuf.permute(permutation) - if isinstance(buf, torch.Tensor) and buf.is_cuda and not CUDA_AWARE_MPI: + if isinstance(buf, torch.Tensor) and buf.is_cuda and not GPU_AWARE_MPI: buf.copy_(rbuf) return ret @@ -1286,7 +1300,7 @@ def Allgatherv( ) if buf is not None and isinstance(buf, torch.Tensor) and permutation is not None: rbuf = rbuf.permute(permutation) - if isinstance(buf, torch.Tensor) and buf.is_cuda and not CUDA_AWARE_MPI: + if isinstance(buf, torch.Tensor) and buf.is_cuda and not GPU_AWARE_MPI: buf.copy_(rbuf) return ret @@ -1417,20 +1431,12 @@ def __alltoall_like( recvbuf = recvbuf.permute(*recv_axis_permutation) # prepare buffer objects - sbuf = ( - sendbuf - if CUDA_AWARE_MPI or not isinstance(sendbuf, torch.Tensor) - else sendbuf.cpu() - ) + sbuf = self._moveToCompDevice(sendbuf, func) mpi_sendbuf = self.as_buffer(sbuf, send_counts, send_displs) if send_counts is None: mpi_sendbuf[1] //= self.size - rbuf = ( - recvbuf - if CUDA_AWARE_MPI or not isinstance(recvbuf, torch.Tensor) - else recvbuf.cpu() - ) + rbuf = self._moveToCompDevice(recvbuf, func) mpi_recvbuf = self.as_buffer(rbuf, recv_counts, recv_displs) if recv_counts is None: mpi_recvbuf[1] //= self.size @@ -1461,16 +1467,8 @@ def __alltoall_like( recvbuf = recvbuf.permute(*axis_permutation) # prepare buffer objects - sbuf = ( - sendbuf - if CUDA_AWARE_MPI or not isinstance(sendbuf, torch.Tensor) - else sendbuf.cpu() - ) - rbuf = ( - recvbuf - if CUDA_AWARE_MPI or not isinstance(recvbuf, torch.Tensor) - else recvbuf.cpu() - ) + sbuf = self._moveToCompDevice(sendbuf, func) + rbuf = self._moveToCompDevice(recvbuf, func) mpi_sendbuf = self.alltoall_sendbuffer(sbuf) mpi_recvbuf = self.alltoall_recvbuffer(rbuf) @@ -1510,7 +1508,7 @@ def Alltoall( ) if buf is not None and isinstance(buf, torch.Tensor) and permutation is not None: rbuf = rbuf.permute(permutation) - if isinstance(buf, torch.Tensor) and buf.is_cuda and not CUDA_AWARE_MPI: + if isinstance(buf, torch.Tensor) and buf.is_cuda and not GPU_AWARE_MPI: buf.copy_(rbuf) return ret @@ -1546,7 +1544,7 @@ def Alltoallv( ) if buf is not None and isinstance(buf, torch.Tensor) and permutation is not None: rbuf = rbuf.permute(permutation) - if isinstance(buf, torch.Tensor) and buf.is_cuda and not CUDA_AWARE_MPI: + if isinstance(buf, torch.Tensor) and buf.is_cuda and not GPU_AWARE_MPI: buf.copy_(rbuf) return ret @@ -1570,7 +1568,7 @@ def Alltoallw( """ # Unpack sendbuffer information sendbuf_tensor, (send_counts, send_displs), subarray_params_list = sendbuf - sendbuf = 
sendbuf_tensor if CUDA_AWARE_MPI else sendbuf_tensor.cpu() + sendbuf = self._moveToCompDevice(sendbuf_tensor, self.handle.Alltoallw) is_contiguous = sendbuf.is_contiguous() stride = sendbuf.stride() @@ -1605,7 +1603,7 @@ def Alltoallw( # Unpack recvbuf information recvbuf_tensor, (recv_counts, recv_displs), subarray_params_list = recvbuf - recvbuf = recvbuf_tensor if CUDA_AWARE_MPI else recvbuf_tensor.cpu() + recvbuf = self._moveToCompDevice(recvbuf_tensor, self.handle.Alltoallw) recvbuf_ptr, _, recv_datatype = self.as_buffer(recvbuf) # Commit the receive subarray datatypes @@ -1633,7 +1631,7 @@ def Alltoallw( if ( isinstance(recvbuf_tensor, torch.Tensor) and recvbuf_tensor.is_cuda - and not CUDA_AWARE_MPI + and not GPU_AWARE_MPI ): recvbuf_tensor.copy_(recvbuf) else: @@ -1860,9 +1858,12 @@ def __gather_like( recv_axis_permutation[0], recv_axis_permutation[recv_axis] = recv_axis, 0 recvbuf = recvbuf.permute(*recv_axis_permutation) - # prepare buffer objects - sbuf = sendbuf if CUDA_AWARE_MPI or not isinstance(sendbuf, torch.Tensor) else sendbuf.cpu() - rbuf = recvbuf if CUDA_AWARE_MPI or not isinstance(recvbuf, torch.Tensor) else recvbuf.cpu() + sbuf = ( + self._moveToCompDevice(sendbuf, func) if isinstance(sendbuf, torch.Tensor) else sendbuf + ) + rbuf = ( + self._moveToCompDevice(recvbuf, func) if isinstance(recvbuf, torch.Tensor) else recvbuf + ) if sendbuf is not MPI.IN_PLACE: mpi_sendbuf = self.as_buffer(sbuf, send_counts, send_displs) @@ -1870,6 +1871,7 @@ def __gather_like( mpi_sendbuf[1] //= send_factor else: mpi_sendbuf = sbuf + if recvbuf is not MPI.IN_PLACE: mpi_recvbuf = self.as_buffer(rbuf, recv_counts, recv_displs) if recv_counts is None: @@ -1916,7 +1918,7 @@ def Gather( ) if buf is not None and isinstance(buf, torch.Tensor) and permutation is not None: rbuf = rbuf.permute(permutation) - if isinstance(buf, torch.Tensor) and buf.is_cuda and not CUDA_AWARE_MPI: + if isinstance(buf, torch.Tensor) and buf.is_cuda and not GPU_AWARE_MPI: buf.copy_(rbuf) return ret @@ -1951,7 +1953,7 @@ def Gatherv( ) if buf is not None and isinstance(buf, torch.Tensor) and permutation is not None: rbuf = rbuf.permute(permutation) - if isinstance(buf, torch.Tensor) and buf.is_cuda and not CUDA_AWARE_MPI: + if isinstance(buf, torch.Tensor) and buf.is_cuda and not GPU_AWARE_MPI: buf.copy_(rbuf) return ret @@ -2105,8 +2107,15 @@ def __scatter_like( recvbuf = recvbuf.permute(*recv_axis_permutation) # prepare buffer objects - sbuf = sendbuf if CUDA_AWARE_MPI or not isinstance(sendbuf, torch.Tensor) else sendbuf.cpu() - rbuf = recvbuf if CUDA_AWARE_MPI or not isinstance(recvbuf, torch.Tensor) else recvbuf.cpu() + if isinstance(sendbuf, torch.Tensor): + sbuf = self._moveToCompDevice(sendbuf, func) + else: + sbuf = sendbuf + + if isinstance(recvbuf, torch.Tensor): + rbuf = self._moveToCompDevice(recvbuf, func) + else: + rbuf = recvbuf if sendbuf is not MPI.IN_PLACE: mpi_sendbuf = self.as_buffer(sbuf, send_counts, send_displs) @@ -2236,7 +2245,7 @@ def Scatter( ) if buf is not None and isinstance(buf, torch.Tensor) and permutation is not None: rbuf = rbuf.permute(permutation) - if isinstance(buf, torch.Tensor) and buf.is_cuda and not CUDA_AWARE_MPI: + if isinstance(buf, torch.Tensor) and buf.is_cuda and not GPU_AWARE_MPI: buf.copy_(rbuf) return ret @@ -2277,7 +2286,7 @@ def Scatterv( ) if buf is not None and isinstance(buf, torch.Tensor) and permutation is not None: rbuf = rbuf.permute(permutation) - if isinstance(buf, torch.Tensor) and buf.is_cuda and not CUDA_AWARE_MPI: + if isinstance(buf, torch.Tensor) 
and buf.is_cuda and not GPU_AWARE_MPI: buf.copy_(rbuf) return ret diff --git a/heat/core/linalg/tests/test_basics.py b/heat/core/linalg/tests/test_basics.py index f8c500a72b..5fc901a5d1 100644 --- a/heat/core/linalg/tests/test_basics.py +++ b/heat/core/linalg/tests/test_basics.py @@ -375,7 +375,7 @@ def test_inv(self): a = ht.random.random((20, 20), dtype=dtype, split=0) ainv = ht.linalg.inv(a) i = ht.eye(a.shape, split=0, dtype=a.dtype) - print(f"Local result of rank {a.comm.Get_rank()}: {(a @ ainv).larray}") + # print(f"Local result of rank {a.comm.Get_rank()}: {(a @ ainv).larray}") self.assertTrue(ht.allclose(a @ ainv, i, atol=1e-5 if self.is_mps else atol * 1e2)) with self.assertRaises(RuntimeError): diff --git a/heat/core/linalg/tests/test_qr.py b/heat/core/linalg/tests/test_qr.py index dc31e03caf..acf814d3c0 100644 --- a/heat/core/linalg/tests/test_qr.py +++ b/heat/core/linalg/tests/test_qr.py @@ -32,8 +32,8 @@ def test_qr_split1orNone(self): if not allclose: diff = qr.Q @ qr.R - mat max_diff = ht.max(diff) - print(f"diff: {diff}") - print(f"max_diff: {max_diff}m") + #print(f"diff: {diff}") + #print(f"max_diff: {max_diff}m") self.assertTrue( ht.allclose(qr.Q @ qr.R, mat, atol=dtypetol, rtol=dtypetol) diff --git a/heat/core/tests/test_communication.py b/heat/core/tests/test_communication.py index 131b21f79a..162a5b0d45 100644 --- a/heat/core/tests/test_communication.py +++ b/heat/core/tests/test_communication.py @@ -74,8 +74,8 @@ def test_mpi_communicator(self): self.assertEqual(len(chunks), len(self.data.shape)) def test_cuda_aware_mpi(self): - self.assertTrue(hasattr(ht.communication, "CUDA_AWARE_MPI")) - self.assertIsInstance(ht.communication.CUDA_AWARE_MPI, bool) + self.assertTrue(hasattr(ht.communication, "GPU_AWARE_MPI")) + self.assertIsInstance(ht.communication.GPU_AWARE_MPI, bool) def test_contiguous_memory_buffer(self): # vector heat tensor @@ -139,7 +139,7 @@ def test_non_contiguous_memory_buffer(self): # check that after sending the data everything is equal self.assertTrue((non_contiguous_data.larray == contiguous_out.larray).all()) - if ht.get_device().device_type == "cpu" or ht.communication.CUDA_AWARE_MPI: + if ht.get_device().device_type == "cpu" or ht.communication.GPU_AWARE_MPI: self.assertTrue(contiguous_out.larray.is_contiguous()) # non-contiguous destination @@ -158,7 +158,7 @@ def test_non_contiguous_memory_buffer(self): req.Wait() # check that after sending the data everything is equal self.assertTrue((contiguous_data.larray == non_contiguous_out.larray).all()) - if ht.get_device().device_type == "cpu" or ht.communication.CUDA_AWARE_MPI: + if ht.get_device().device_type == "cpu" or ht.communication.GPU_AWARE_MPI: self.assertFalse(non_contiguous_out.larray.is_contiguous()) # non-contiguous destination @@ -181,7 +181,7 @@ def test_non_contiguous_memory_buffer(self): req.Wait() # check that after sending the data everything is equal self.assertTrue((both_non_contiguous_data.larray == both_non_contiguous_out.larray).all()) - if ht.get_device().device_type == "cpu" or ht.communication.CUDA_AWARE_MPI: + if ht.get_device().device_type == "cpu" or ht.communication.GPU_AWARE_MPI: self.assertFalse(both_non_contiguous_out.larray.is_contiguous()) def test_default_comm(self): From 80756c5de62f913b9ff9779ee9753fa062232dcf Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Tue, 28 Oct 2025 09:22:44 +0100 Subject: [PATCH 03/15] Bugs/1990 Fix handling of zarr groups (#1991) (#1999) * introduce variable to handle zarr 
groups * expand tests * edit docs * do not test with float64 on MPS * files housekeeping on MPS * load zarr variable from multiple dirs * fix path reading * fix import * expand docs * set device * fix dtype on empty ranks, balance output, refactor * adapt tests * Apply suggestions from code review * expand tests * fix deadlock at split sanitation * enable dtype setting * expand tests --------- (cherry picked from commit dc4bd1cd831fd08df64f1a394ee8cb1c5da0c202) Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- heat/core/io.py | 148 +++++++++++++++++++++++++++-- heat/core/tests/test_io.py | 186 +++++++++++++++++++++++++++++++++---- 2 files changed, 306 insertions(+), 28 deletions(-) diff --git a/heat/core/io.py b/heat/core/io.py index aae6ab5b2c..ca7d7bb0c8 100644 --- a/heat/core/io.py +++ b/heat/core/io.py @@ -3,6 +3,8 @@ from __future__ import annotations from functools import reduce +import glob + import operator import os.path from math import log10 @@ -797,6 +799,12 @@ def load( DNDarray([ 1.0000, 2.7183, 7.3891, 20.0855, 54.5981], dtype=ht.float32, device=cpu:0, split=None) >>> ht.load("data.nc", variable="DATA") DNDarray([ 1.0000, 2.7183, 7.3891, 20.0855, 54.5981], dtype=ht.float32, device=cpu:0, split=None) + >>> ht.load("my_data.zarr", variable="RECEIVER_1/DATA") + DNDarray([ 1.0000, 2.7183, 7.3891, 20.0855, 54.5981], dtype=ht.float32, device=cpu:0, split=0) + >>> ht.load("my_data.zarr", variable="RECEIVER_*/DATA") + DNDarray([[ 1.0000, 2.7183, 7.3891, 20.0855, 54.5981], + [ 1.0000, 2.7183, 7.3891, 20.0855, 54.5981], + [ 1.0000, 2.7183, 7.3891, 20.0855, 54.5981]], dtype=ht.float32, device=cpu:0, split=0) See Also -------- @@ -1417,6 +1425,7 @@ def supports_zarr() -> bool: def load_zarr( path: str, + variable: str = None, split: int = 0, device: Optional[str] = None, comm: Optional[Communication] = None, @@ -1424,12 +1433,15 @@ def load_zarr( **kwargs, ) -> DNDarray: """ - Loads zarr-Format into DNDarray which will be returned. + Loads data from a zarr store into DNDarray. `path` can either point to a single zarr array or a zarr group. In the latter case, `variable` must be provided to specify which array in the group to load. If `variable` contains a wildcard pattern (e.g. `RECEIVER_*/DATA`), all matching arrays will be loaded and concatenated along the specified `split` axis. Parameters ---------- path : str Path to the directory in which a .zarr-file is located. + variable : str, optional + If the zarr store is a group, the variable (or path to variable) to load from the group. + Can contain a wildcard pattern to load and concatenate arrays stored in slices in different directories. split : int Along which axis the loaded arrays should be concatenated. 
device : str, optional @@ -1441,12 +1453,13 @@ def load_zarr( **kwargs : Any extra Arguments to pass to zarr.open """ + # sanitize inputs + device = devices.sanitize_device(device) + torch_device = device.torch_device + comm = sanitize_comm(comm) + if not isinstance(path, str): raise TypeError(f"path must be str, not {type(path)}") - if split is not None and not isinstance(split, int): - raise TypeError(f"split must be None or int, not {type(split)}") - if device is not None and not isinstance(device, str): - raise TypeError(f"device must be None or str, not {type(split)}") if not isinstance(slices, (slice, Iterable)) and slices is not None: raise TypeError(f"Slices Argument must be slice, tuple or None and not {type(slices)}") if isinstance(slices, Iterable): @@ -1461,7 +1474,127 @@ def load_zarr( else: raise ValueError("File has no zarr extension.") - arr: zarr.Array = zarr.open_array(store=path, **kwargs) + store_path = os.path.join(path, variable) if variable else path + + output_dtype = kwargs.pop("dtype", None) + torch_output_dtype = output_dtype.torch_type() if output_dtype else None + + if variable and "*" in variable: + # `variable` contains a wildcard pattern + # e.g. data were chunked at write-out and stored in multiple directories + if slices is not None: + raise NotImplementedError("Slicing is not supported when loading with a wildcard.") + + base_paths = sorted(glob.glob(store_path)) + + if not base_paths: + raise FileNotFoundError( + f"Zarr wildcard pattern '{variable}' did not match any arrays in store '{path}'" + ) + + variable_paths = [os.path.relpath(p, start=path) for p in base_paths] + + # each rank reads data from its assigned directories and concatenates locally + + # determine which directories to open on rank + dummy_array = factories.empty((len(base_paths),), dtype=types.float32) + _, _, local_dir_slice = dummy_array.comm.chunk( + dummy_array.shape, rank=dummy_array.comm.rank, split=0 + ) + + # load data to torch tensors + local_tensors = [] + for i, var_path in enumerate(variable_paths[local_dir_slice[0]]): + local_tensor = torch.from_numpy(zarr.open(path)[var_path][:]) + if torch_output_dtype: + local_tensor = local_tensor.to(torch_output_dtype) + local_tensors.append(local_tensor) + + # Have rank 0 determine the single-store shape and broadcast it to all ranks for sanitation + target_ndims = torch.zeros(1, dtype=torch.int32) + if dummy_array.comm.rank == 0: + if len(local_tensors) == 0: + raise ValueError( + f"Zarr wildcard pattern '{variable}' did not match any arrays in store '{path}'" + ) + # broadcast shape of first local tensor to allow sanitation on empty ranks + target_ndims = torch.tensor(local_tensors[0].ndim, dtype=torch.int32) + dummy_array.comm.Bcast(target_ndims, root=0) + # sanitize split axis + proxy_shape = (1,) * target_ndims.item() + split = sanitize_axis(proxy_shape, axis=split) + + # concatenate locally + if len(local_tensors) >= 1: + if len(local_tensors) == 1: + local_tensor = local_tensors[0] + else: + local_tensor = torch.cat(local_tensors, dim=split if split is not None else 0) + empty_ranks = torch.tensor([0], dtype=torch.int32) + ht_type_code = types.__type_codes[types.canonical_heat_type(local_tensor.dtype)] + else: + # no local tensors i.e. 
no data assigned to rank + local_tensor = torch.empty((0,)) + empty_ranks = torch.tensor([1], dtype=torch.int32) + # dummy dtype code + ht_type_code = -1 + # check for empty ranks + dummy_array.comm.Allreduce(MPI.IN_PLACE, empty_ranks, op=MPI.SUM) + if empty_ranks.item() > 0: + # fix local shape and dtype of empty tensors, otherwise DNDarray construction will fail + # Rank 0 broadcasts the info to all other ranks + target_shape = torch.zeros( + ( + 1, + target_ndims.item() + 1, + ), + dtype=torch.int64, + ) + if local_tensor.numel() > 0: + target_shape[0, :-1] = torch.tensor(local_tensor.shape, dtype=torch.int64) + # encode dtype as last entry + target_shape[0, -1] = ht_type_code + # share info about target shape and dtype + target_shapes = torch.zeros( + (dummy_array.comm.size, target_ndims.item() + 1), dtype=torch.int64 + ) + dummy_array.comm.Allgather(target_shape, target_shapes) + if local_tensor.numel() == 0: + ht_type_code = target_shapes[0, -1].item() + target_shape = target_shapes[0, :-1].clone() + target_shape[split] = 0 + for dtype, dtype_code in types.__type_codes.items(): + if dtype_code == ht_type_code: + ht_type = dtype + break + local_tensor = torch.empty( + tuple(target_shape.tolist()), dtype=ht_type.torch_type() + ) + # discard dtype code column + target_shapes = target_shapes[:, :-1] + # calculate global array shape + out_gshape = target_shapes[0, :].clone() + out_gshape[split] = target_shapes[:, split].sum().item() + # wrap local tensors in DNDarray + dndarray = DNDarray( + local_tensor.to(device=torch_device), + gshape=tuple(out_gshape.tolist()), + dtype=output_dtype + if output_dtype + else types.canonical_heat_type(local_tensor.dtype), + split=split, + device=device, + comm=comm, + balanced=False, + ) + else: + # all ranks are populated, create DNDarray directly + dndarray = factories.array(local_tensor, is_split=split, device=device, comm=comm) + dndarray.balance_() + return dndarray + + # standard single zarr array + arr: zarr.Array = zarr.open_array(store=store_path, **kwargs) shape = arr.shape if isinstance(slices, slice) or slices is None: @@ -1476,8 +1609,7 @@ def load_zarr( slices.extend([slice(None) for _ in range(abs(len(slices) - len(shape)))]) dtype = types.canonical_heat_type(arr.dtype) - device = devices.sanitize_device(device) - comm = sanitize_comm(comm) + split = sanitize_axis(shape, axis=split) # slices = tuple(slice(*tslice.indices(length)) for length, tslice in zip(shape, slices)) slices = tuple(slices) diff --git a/heat/core/tests/test_io.py b/heat/core/tests/test_io.py index ac5ebd4a6c..b61325dd3c 100644 --- a/heat/core/tests/test_io.py +++ b/heat/core/tests/test_io.py @@ -40,9 +40,16 @@ def setUpClass(cls): cls.ZARR_OUT_PATH = pwd + "/zarr_test_out.zarr" cls.ZARR_IN_PATH = pwd + "/zarr_test_in.zarr" cls.ZARR_TEMP_PATH = pwd + "/zarr_temp.zarr" + cls.ZARR_NESTED_PATH = pwd + "/zarr_test_nested.zarr" + + # device-aware dtypes + testing_types = [ht.int32, ht.int64, ht.float32] + if not cls.is_mps: + testing_types.append(ht.float64) + cls.testing_types = testing_types def tearDown(self): - # synchronize all nodes + # synchronize all processes ht.MPI_WORLD.Barrier() # clean up of temporary files @@ -57,16 +64,20 @@ def tearDown(self): os.remove(self.NETCDF_OUT_PATH) except FileNotFoundError: pass - # if ht.MPI_WORLD.rank == 0: if ht.io.supports_zarr(): - for file in [self.ZARR_TEMP_PATH, self.ZARR_IN_PATH, self.ZARR_OUT_PATH]: - try: - shutil.rmtree(file) - except FileNotFoundError: - pass + if ht.MPI_WORLD.rank == 0: + for file in [ + 
self.ZARR_TEMP_PATH, + self.ZARR_IN_PATH, + self.ZARR_OUT_PATH, + self.ZARR_NESTED_PATH, + ]: + try: + shutil.rmtree(file) + except FileNotFoundError: + pass - # synchronize all nodes ht.MPI_WORLD.Barrier() def test_size_from_slice(self): @@ -831,9 +842,10 @@ def test_load_npy_float(self): self.assertEqual(load_array.dtype, ht.float64) if ht.MPI_WORLD.rank == 0: self.assertTrue((load_array_npy == float_array).all) - for file in os.listdir(os.path.join(os.getcwd(), "heat/datasets")): - if fnmatch.fnmatch(file, "*.npy"): - os.remove(os.path.join(os.getcwd(), "heat/datasets", file)) + if ht.MPI_WORLD.rank == 0: + for file in os.listdir(os.path.join(os.getcwd(), "heat/datasets")): + if fnmatch.fnmatch(file, "*.npy"): + os.remove(os.path.join(os.getcwd(), "heat/datasets", file)) def test_load_npy_exception(self): with self.assertRaises(TypeError): @@ -940,15 +952,15 @@ def test_load_zarr(self): import zarr test_data = np.arange(self.ZARR_SHAPE[0] * self.ZARR_SHAPE[1]).reshape(self.ZARR_SHAPE) - + dtype = np.float32 if ht.MPI_WORLD.rank == 0: try: arr = zarr.create_array( - self.ZARR_TEMP_PATH, shape=self.ZARR_SHAPE, dtype=np.float64 + self.ZARR_TEMP_PATH, shape=self.ZARR_SHAPE, dtype=dtype ) except AttributeError: arr = zarr.create( - store=self.ZARR_TEMP_PATH, shape=self.ZARR_SHAPE, dtype=np.float64 + store=self.ZARR_TEMP_PATH, shape=self.ZARR_SHAPE, dtype=dtype ) arr[:] = test_data @@ -962,6 +974,140 @@ def test_load_zarr(self): ht.MPI_WORLD.Barrier() + def test_load_zarr_group(self): + if not ht.io.supports_zarr(): + self.skipTest("Requires zarr") + + import zarr + + # Write out a nested Zarr store + original_data = np.arange(np.prod(self.ZARR_SHAPE)).reshape(self.ZARR_SHAPE) + nested_group_name = "MAIN_0" + array_name = "DATA" + variable_path = f"{nested_group_name}/{array_name}" + + if ht.MPI_WORLD.rank == 0: + root = zarr.open_group(self.ZARR_NESTED_PATH, mode="w") + main_0 = root.create_group(nested_group_name) + main_0.create_dataset( + array_name, + shape=original_data.shape, + dtype=original_data.dtype, + data=original_data, + ) + + ht.MPI_WORLD.Barrier() + + # Test loading using both positional and keyword arguments for different splits + for split in [None, 0, 1]: + # Test with positional argument + with self.subTest(split=split, arg_type="positional"): + ht_tensor_pos = ht.load(self.ZARR_NESTED_PATH, variable_path, split=split) + self.assertIsInstance(ht_tensor_pos, ht.DNDarray) + self.assertEqual(ht_tensor_pos.gshape, original_data.shape) + self.assertTrue(np.array_equal(ht_tensor_pos.numpy(), original_data)) + + # Test with keyword argument + with self.subTest(split=split, arg_type="keyword"): + ht_tensor_kw = ht.load( + self.ZARR_NESTED_PATH, variable=variable_path, split=split + ) + self.assertIsInstance(ht_tensor_kw, ht.DNDarray) + self.assertEqual(ht_tensor_kw.gshape, original_data.shape) + self.assertTrue(np.array_equal(ht_tensor_kw.numpy(), original_data)) + + ht.MPI_WORLD.Barrier() + + # test loading with wildcard + num_chunks = self.comm.size * 2 + 1 + if self.comm.size > 3: + # test empty ranks + num_chunks = self.comm.size - 1 + + np_testing_types = [np.int32, np.int64, np.float32, np.complex64] + if not self.is_mps: + np_testing_types.extend([np.float64, np.complex128]) + + ht.MPI_WORLD.Barrier() + for dtype in np_testing_types: + global_data_shape = (num_chunks * 10, num_chunks * 5, 7) + global_data = np.arange(np.prod(global_data_shape), dtype=dtype).reshape(global_data_shape) + if self.comm.rank == 0: + # create zarr store for split=0 and split=1 + 
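+                # Layout sketch (illustrative numbers): with num_chunks == 3 the global
+                # array has shape (30, 15, 7); CHUNK_i_SPLIT0/DATA then holds rows
+                # [10*i, 10*(i+1)) and CHUNK_i_SPLIT1/DATA holds columns [5*i, 5*(i+1)),
+                # so loading with the matching wildcard and split axis reassembles the
+                # full array.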
chunk_shape_split0 = (10, global_data_shape[1], global_data_shape[2]) + chunk_shape_split1 = (global_data_shape[0], 5, global_data_shape[2]) + + root_zarr = zarr.open_group(self.ZARR_OUT_PATH, mode="w") + + for i in range(num_chunks): + chunk_data_split0 = global_data[i * chunk_shape_split0[0] : (i + 1) * chunk_shape_split0[0], :, :] + chunk_group_split0 = root_zarr.create_group(f"CHUNK_{i}_SPLIT0") + chunk_group_split0.create_dataset( + "DATA", + shape=chunk_data_split0.shape, + dtype=chunk_data_split0.dtype, + data=chunk_data_split0 + ) + + chunk_data_split1 = global_data[:, i * chunk_shape_split1[1] : (i + 1) * chunk_shape_split1[1], :] + chunk_group_split1 = root_zarr.create_group(f"CHUNK_{i}_SPLIT1") + chunk_group_split1.create_dataset( + "DATA", + shape=chunk_data_split1.shape, + dtype=chunk_data_split1.dtype, + data=chunk_data_split1 + ) + ht.MPI_WORLD.Barrier() + + # test wildcard loading for split=0 + with self.subTest(dtype=dtype, split=0): + ht_array_split0 = ht.load(self.ZARR_OUT_PATH, variable="CHUNK_*_SPLIT0/DATA", split=0, device=self.device) + self.assertIsInstance(ht_array_split0, ht.DNDarray) + self.assertEqual(ht_array_split0.gshape, global_data_shape) + ht_array_split0.balance_() + self.assertTrue((ht_array_split0.numpy() == global_data).all()) + self.assertTrue(ht_array_split0.dtype == ht.types.canonical_heat_type(dtype)) + + # test wildcard loading for split=1 + with self.subTest(dtype=dtype, split=1): + ht_array_split1 = ht.load(self.ZARR_OUT_PATH, variable="CHUNK_*_SPLIT1/DATA", split=1, device=self.device) + self.assertIsInstance(ht_array_split1, ht.DNDarray) + self.assertEqual(ht_array_split1.gshape, global_data_shape) + self.assertTrue((ht_array_split1.numpy() == global_data).all()) + self.assertTrue(ht_array_split1.dtype == ht.types.canonical_heat_type(dtype)) + + # test wildcard loading with dtype conversion + with self.subTest(dtype=dtype, split="dtype_conversion"): + # only for non-complex dtypes + if not np.issubdtype(dtype, np.complexfloating): + ht_array_split0 = ht.load(self.ZARR_OUT_PATH, variable="CHUNK_*_SPLIT0/DATA", split=0, device=self.device, dtype=ht.float32) + self.assertIsInstance(ht_array_split0, ht.DNDarray) + self.assertEqual(ht_array_split0.gshape, global_data_shape) + self.assertTrue((ht_array_split0.numpy() == global_data).all()) + self.assertTrue(ht_array_split0.dtype == ht.float32) + + ht.MPI_WORLD.Barrier() + + # Test data misconstruction when using the wrong split axis + with self.subTest(split="split_mismatch_0", dtype=dtype): + with self.assertRaises(ValueError): + test = ht.load(self.ZARR_OUT_PATH, variable="CHUNK_*_SPLIT1/DATA", split=0, device=self.device) + self.assertTrue((test.numpy() == global_data).all()) + + with self.subTest(split="split_mismatch_1", dtype=dtype): + with self.assertRaises(ValueError): + test = ht.load(self.ZARR_OUT_PATH, variable="CHUNK_*_SPLIT0/DATA", split=1, device=self.device) + self.assertFalse((test.numpy() == global_data).all()) + + # test exceptions + with self.subTest(split="split_exception", dtype=dtype): + with self.assertRaises(ValueError): + test = ht.load(self.ZARR_OUT_PATH, variable="CHUNK_*_SPLIT0/DATA", split=3) + with self.assertRaises(NotImplementedError): + test = ht.load(self.ZARR_OUT_PATH, variable="CHUNK_*_SPLIT0/DATA", slices=slice(0,10)) + with self.assertRaises(FileNotFoundError): + test = ht.load(self.ZARR_OUT_PATH, variable="NONEXSISTENT_CHUNK_*_SPLIT0/DATA", split=0) + def test_load_zarr_slice(self): if not ht.io.supports_zarr(): self.skipTest("Requires zarr") @@ -1017,7 +1163,7 
@@ def test_save_zarr_2d_split0(self): import zarr - for type in [ht.types.int32, ht.types.int64, ht.types.float32, ht.types.float64]: + for type in self.testing_types: for dims in [(i, self.ZARR_SHAPE[1]) for i in range(1, max(10, ht.MPI_WORLD.size + 1))]: with self.subTest(type=type, dims=dims): n = dims[0] * dims[1] @@ -1037,7 +1183,7 @@ def test_save_zarr_2d_split1(self): import zarr - for type in [ht.types.int32, ht.types.int64, ht.types.float32, ht.types.float64]: + for type in self.testing_types: for dims in [(self.ZARR_SHAPE[0], i) for i in range(1, max(10, ht.MPI_WORLD.size + 1))]: with self.subTest(type=type, dims=dims): n = dims[0] * dims[1] @@ -1057,7 +1203,7 @@ def test_save_zarr_split_none(self): import zarr - for type in [ht.types.int32, ht.types.int64, ht.types.float32, ht.types.float64]: + for type in self.testing_types: for n in [10, 100, 1000]: with self.subTest(type=type, n=n): dndarray = ht.arange(n, dtype=type, split=None) @@ -1075,7 +1221,7 @@ def test_save_zarr_1d_split_0(self): import zarr - for type in [ht.types.int32, ht.types.int64, ht.types.float32, ht.types.float64]: + for type in self.testing_types: for n in [10, 100, 1000]: with self.subTest(type=type, n=n): dndarray = ht.arange(n, dtype=type, split=0) @@ -1095,9 +1241,9 @@ def test_load_zarr_arguments(self): ht.load_zarr(None) with self.assertRaises(ValueError): ht.load_zarr("data.npy") - with self.assertRaises(TypeError): + with self.assertRaises(ValueError): ht.load_zarr("", "") - with self.assertRaises(TypeError): + with self.assertRaises(ValueError): ht.load_zarr("", device=1) with self.assertRaises(TypeError): ht.load_zarr("", slices=0) From eb98ba836d21f60e539941bfa0e9b54da053935d Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Mon, 17 Nov 2025 11:40:34 +0100 Subject: [PATCH 04/15] Set correct dtype when loading and saving hdf5 (#2014) (#2024) * fixed load_hdf5 * fixed save_hdf5 * fixed different behavior in tests * test torch dtype for save_hdf5 --------- (cherry picked from commit 678cd47a551d40687bcab2548e0d504d53b3f0d4) Co-authored-by: Marc-Jindra Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> --- heat/core/io.py | 31 +++++++++++++++++++++++-------- heat/core/tests/test_io.py | 12 ++++++------ 2 files changed, 29 insertions(+), 14 deletions(-) diff --git a/heat/core/io.py b/heat/core/io.py index ca7d7bb0c8..beb3ea9de8 100644 --- a/heat/core/io.py +++ b/heat/core/io.py @@ -517,7 +517,7 @@ def supports_hdf5() -> bool: def load_hdf5( path: str, dataset: str, - dtype: datatype = types.float32, + dtype: Optional[datatype] = None, slices: Optional[Tuple[Optional[slice], ...]] = None, split: Optional[int] = None, device: Optional[str] = None, @@ -533,7 +533,7 @@ dataset : str Name of the dataset to be read. dtype : datatype, optional - Data type of the resulting array. + Data type of the resulting array, defaults to the loaded dataset's type. slices : tuple of slice objects, optional Load only the specified slices of the dataset. 
split : int or None, optional @@ -625,8 +625,6 @@ def load_hdf5( elif split is not None and not isinstance(split, int): raise TypeError(f"split must be None or int, not {type(split)}") - # infer the type and communicator for the loaded array - dtype = types.canonical_heat_type(dtype) # determine the comm and device the data will be placed on device = devices.sanitize_device(device) comm = sanitize_comm(comm) @@ -637,6 +635,9 @@ def load_hdf5( gshape = data.shape new_gshape = tuple() offsets = [0] * len(gshape) + if dtype is None: + dtype = data.dtype + dtype = types.canonical_heat_type(dtype) if slices is not None: for i in range(len(gshape)): if i < len(slices) and slices[i]: @@ -687,7 +688,12 @@ def load_hdf5( return DNDarray(data, gshape, dtype, split, device, comm, balanced) def save_hdf5( - data: DNDarray, path: str, dataset: str, mode: str = "w", **kwargs: Dict[str, object] + data: DNDarray, + path: str, + dataset: str, + mode: str = "w", + dtype: Optional[datatype] = None, + **kwargs: Dict[str, object], ): """ Saves ``data`` to an HDF5 file. Attempts to utilize parallel I/O if possible. @@ -702,6 +708,8 @@ def save_hdf5( Name of the dataset the data is saved to. mode : str, optional File access mode, one of ``'w', 'a', 'r+'`` + dtype : datatype, optional + Data type of the saved data kwargs : dict, optional Additional arguments passed to the created dataset. @@ -732,16 +740,23 @@ def save_hdf5( is_split = data.split is not None _, _, slices = data.comm.chunk(data.gshape, data.split if is_split else 0) + if dtype is None: + dtype = data.dtype + elif type(dtype) == torch.dtype: + dtype = str(dtype).split(".")[-1] + if type(dtype) is not str: + dtype = dtype.__name__ + # attempt to perform parallel I/O if possible if h5py.get_config().mpi: with h5py.File(path, mode, driver="mpio", comm=data.comm.handle) as handle: - dset = handle.create_dataset(dataset, data.shape, **kwargs) + dset = handle.create_dataset(dataset, data.shape, dtype=dtype, **kwargs) dset[slices] = data.larray.cpu() if is_split else data.larray[slices].cpu() # otherwise a single rank only write is performed in case of local data (i.e. 
no split) elif data.comm.rank == 0: with h5py.File(path, mode) as handle: - dset = handle.create_dataset(dataset, data.shape, **kwargs) + dset = handle.create_dataset(dataset, data.shape, dtype=dtype, **kwargs) if is_split: dset[slices] = data.larray.cpu() else: @@ -763,7 +778,7 @@ next_rank = (data.comm.rank + 1) % data.comm.size data.comm.Isend([None, 0, MPI.INT], dest=next_rank) - DNDarray.save_hdf5 = lambda self, path, dataset, mode="w", **kwargs: save_hdf5( + DNDarray.save_hdf5 = lambda self, path, dataset, mode="w", dtype=None, **kwargs: save_hdf5( - self, path, dataset, mode, **kwargs + self, path, dataset, mode, dtype, **kwargs ) DNDarray.save_hdf5.__doc__ = save_hdf5.__doc__ diff --git a/heat/core/tests/test_io.py b/heat/core/tests/test_io.py index b61325dd3c..69369d2214 100644 --- a/heat/core/tests/test_io.py +++ b/heat/core/tests/test_io.py @@ -106,7 +106,7 @@ def test_size_from_slice(self): def test_load(self): # HDF5 if ht.io.supports_hdf5(): - iris = ht.load(self.HDF5_PATH, dataset="data") + iris = ht.load(self.HDF5_PATH, dataset="data", dtype=ht.float32) self.assertIsInstance(iris, ht.DNDarray) # shape invariant self.assertEqual(iris.shape, self.IRIS.shape) @@ -591,7 +591,7 @@ def test_load_hdf5(self): self.skipTest("Requires HDF5") # default parameters - iris = ht.load_hdf5(self.HDF5_PATH, self.HDF5_DATASET) + iris = ht.load_hdf5(self.HDF5_PATH, self.HDF5_DATASET, dtype=ht.float32) self.assertIsInstance(iris, ht.DNDarray) self.assertEqual(iris.shape, self.IRIS.shape) self.assertEqual(iris.dtype, ht.float32) @@ -602,13 +602,13 @@ iris = ht.load_hdf5(self.HDF5_PATH, self.HDF5_DATASET, split=0) self.assertIsInstance(iris, ht.DNDarray) self.assertEqual(iris.shape, self.IRIS.shape) - self.assertEqual(iris.dtype, ht.float32) + self.assertEqual(iris.dtype, ht.float64) lshape = iris.lshape self.assertLessEqual(lshape[0], self.IRIS.shape[0]) self.assertEqual(lshape[1], self.IRIS.shape[1]) # negative split axis - iris = ht.load_hdf5(self.HDF5_PATH, self.HDF5_DATASET, split=-1) + iris = ht.load_hdf5(self.HDF5_PATH, self.HDF5_DATASET, split=-1, dtype=ht.float32) self.assertIsInstance(iris, ht.DNDarray) self.assertEqual(iris.shape, self.IRIS.shape) self.assertEqual(iris.dtype, ht.float32) @@ -650,7 +650,7 @@ def test_save_hdf5(self): # local unsplit data local_data = ht.arange(100) ht.save_hdf5( - local_data, self.HDF5_OUT_PATH, self.HDF5_DATASET, dtype=local_data.dtype.char() + local_data, self.HDF5_OUT_PATH, self.HDF5_DATASET, dtype=torch.int32 ) if local_data.comm.rank == 0: with ht.io.h5py.File(self.HDF5_OUT_PATH, "r") as handle: @@ -662,7 +662,7 @@ # distributed data range split_data = ht.arange(100, split=0) ht.save_hdf5( - split_data, self.HDF5_OUT_PATH, self.HDF5_DATASET, dtype=split_data.dtype.char() + split_data, self.HDF5_OUT_PATH, self.HDF5_DATASET ) if split_data.comm.rank == 0: with ht.io.h5py.File(self.HDF5_OUT_PATH, "r") as handle: From 267b7d5a5a1c2d94d041bef922b602613262f43f Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Mon, 1 Dec 2025 11:30:07 +0100 Subject: [PATCH 05/15] Supporting negative indices for flip operations (#2030) (#2050) * Converting negative indices * use sanitize_axis * Added tests * edit docs --------- (cherry picked from commit 3c0bc279bc981e51721de5953586990f55efe7bc) Co-authored-by: Marc-Jindra Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> --- heat/core/manipulations.py | 8 ++++++-- heat/core/tests/test_manipulations.py | 21 
++++++++++++++++++++- 2 files changed, 26 insertions(+), 3 deletions(-) diff --git a/heat/core/manipulations.py b/heat/core/manipulations.py index d685f4d5ad..a7d9c542df 100644 --- a/heat/core/manipulations.py +++ b/heat/core/manipulations.py @@ -1076,7 +1076,7 @@ def flip(a: DNDarray, axis: Union[int, Tuple[int, ...]] = None) -> DNDarray: a: DNDarray Input array to be flipped axis: int or Tuple[int,...] - A list of axes to be flipped + The axis or sequence of axes to be flipped See Also -------- @@ -1100,7 +1100,11 @@ def flip(a: DNDarray, axis: Union[int, Tuple[int, ...]] = None) -> DNDarray: # torch.flip only accepts tuples if isinstance(axis, int): - axis = [axis] + axis = (axis,) + elif isinstance(axis, list): + axis = tuple(axis) + + axis = stride_tricks.sanitize_axis(a.shape, axis) flipped = torch.flip(a.larray, axis) diff --git a/heat/core/tests/test_manipulations.py b/heat/core/tests/test_manipulations.py index e3c5ad232d..30138730d1 100644 --- a/heat/core/tests/test_manipulations.py +++ b/heat/core/tests/test_manipulations.py @@ -1089,6 +1089,25 @@ def test_flip(self): r_a = ht.array([[[3, 2], [1, 0]], [[7, 6], [5, 4]]], split=0, dtype=ht.uint8) self.assertTrue(ht.equal(ht.flip(a, [1, 2]), r_a)) + # test negative axis + a = ht.array([[1, 2], [3, 4]]) + r_a = ht.array([[2, 1], [4, 3]]) + self.assertTrue(ht.equal(ht.flip(a, -1), r_a)) + + a = ht.array([[1, 2], [3, 4]]) + r_a = ht.array([[3, 4], [1, 2]]) + self.assertTrue(ht.equal(ht.flip(a, -2), r_a)) + + a = ht.array([[1, 2], [3, 4]]) + r_a = ht.array([[4, 3], [2, 1]]) + self.assertTrue(ht.equal(ht.flip(a, (-2, -1)), r_a)) + + # test negative axis with split + a = ht.array([[2, 3], [4, 5], [6, 7], [8, 9]], split=1, dtype=ht.float32) + r_a = ht.array([[9, 8], [7, 6], [5, 4], [3, 2]], split=1, dtype=ht.float32) + self.assertTrue(ht.equal(ht.flip(a, (0, -1)), r_a)) + self.assertTrue(ht.equal(ht.flip(a, (-2, -1)), r_a)) + def test_fliplr(self): b = ht.array([[1, 2], [3, 4]]) r_b = ht.array([[2, 1], [4, 3]]) @@ -1119,7 +1138,7 @@ def test_fliplr(self): # test exception a = ht.arange(10) - with self.assertRaises(IndexError): + with self.assertRaises(ValueError): ht.fliplr(a) def test_flipud(self): From f72990bf78542c15add263e15b77f06544ae53e0 Mon Sep 17 00:00:00 2001 From: Heat Release Bot <> Date: Mon, 1 Dec 2025 15:25:32 +0000 Subject: [PATCH 06/15] Bump version to 1.7.0 --- CHANGELOG.md | 16 ++++++++++++++++ heat/core/version.py | 2 +- 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 818a7dde09..eb9cc88bbd 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,3 +1,19 @@ +# v1.7.0 - Heat Minor Release - 1.7.0 +## Changes + +- #2050 [Backport stable] Supporting negative indices for flip operations (by @[github-actions[bot]](https://github.com/apps/github-actions)) +- #2024 [Backport stable] Set correct dtype when loading and saving hdf5 (by @[github-actions[bot]](https://github.com/apps/github-actions)) +- #1979 [Backport stable] Sturdier MPI Check (by @[github-actions[bot]](https://github.com/apps/github-actions)) + +### Bug Fixes + +- #1999 [Backport stable] Bugs/1990 Fix handling of zarr groups (by @[github-actions[bot]](https://github.com/apps/github-actions)) + +## Contributors + +@github-actions[bot] and [github-actions[bot]](https://github.com/apps/github-actions) + + # v1.6.0 ## Highlights diff --git a/heat/core/version.py b/heat/core/version.py index b81bec1221..bb133399f7 100644 --- a/heat/core/version.py +++ b/heat/core/version.py @@ -6,7 +6,7 @@ """Indicates feature extension.""" 
micro: int = 0 """Indicates revisions for bugfixes.""" -extension: str = "dev" +extension: str = None """Indicates special builds, e.g. for specific hardware.""" if not extension: From 2611ca1ce613000a1a6cf9639158c233b96ad5c2 Mon Sep 17 00:00:00 2001 From: Heat Release Bot <> Date: Mon, 1 Dec 2025 15:25:32 +0000 Subject: [PATCH 07/15] Update pytorch image in Dockerfile.release and Dockerfile.source to version --- docker/Dockerfile.release | 2 +- docker/Dockerfile.source | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docker/Dockerfile.release b/docker/Dockerfile.release index 8ead42996a..99e8e14370 100644 --- a/docker/Dockerfile.release +++ b/docker/Dockerfile.release @@ -1,5 +1,5 @@ ARG HEAT_VERSION=latest -ARG PYTORCH_IMG=25.07-py3 +ARG PYTORCH_IMG=25.11-py3 FROM nvcr.io/nvidia/pytorch:${PYTORCH_IMG} AS base COPY ./tzdata.seed /tmp/tzdata.seed diff --git a/docker/Dockerfile.source b/docker/Dockerfile.source index 93a017b359..ceb9fae49b 100644 --- a/docker/Dockerfile.source +++ b/docker/Dockerfile.source @@ -1,4 +1,4 @@ -ARG PYTORCH_IMG=25.07-py3 +ARG PYTORCH_IMG=25.11-py3 ARG HEAT_BRANCH=main FROM nvcr.io/nvidia/pytorch:${PYTORCH_IMG} AS base From 5ab622a7ee6e5ac77f631e5b5de1c71b6f316554 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Guti=C3=A9rrez=20Hermosillo=20Muriedas=2C=20Juan=20Pedro?= Date: Mon, 1 Dec 2025 17:10:13 +0100 Subject: [PATCH 08/15] prep for 1.7.0 --- .github/workflows/inactivity.yml | 2 ++ .pre-commit-config.yaml | 4 ++-- .talismanrc | 2 ++ CHANGELOG.md | 14 +++++++------- CITATION.cff | 10 +++------- README.md | 2 +- .../notebooks/0_setup/0_setup_haicore.ipynb | 4 ---- .../tutorials/notebooks/0_setup/0_setup_jsc.ipynb | 3 --- 8 files changed, 17 insertions(+), 24 deletions(-) create mode 100644 .talismanrc diff --git a/.github/workflows/inactivity.yml b/.github/workflows/inactivity.yml index bf6d997838..f3529d9e42 100644 --- a/.github/workflows/inactivity.yml +++ b/.github/workflows/inactivity.yml @@ -31,3 +31,5 @@ jobs: stale-pr-message: "This pull request is stale because it has been open for 60 days with no activity." close-pr-message: "This pull request was closed because it has been inactive for 60 days since being marked as stale." repo-token: ${{ secrets.GITHUB_TOKEN }} + exempt-issue-labels: "epic,discussion,good first issue,RFC,student project" + exempt-pr-labels: "epic,discussion,good first issue,RFC,student project" diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 684b582c1f..5aecdbab0d 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -13,7 +13,7 @@ repos: - id: check-added-large-files - id: check-toml - repo: https://github.com/pre-commit/mirrors-mypy - rev: v1.18.2 # Use the sha / tag you want to point at + rev: v1.19.0 # Use the sha / tag you want to point at hooks: - id: mypy args: [--config-file, pyproject.toml, --ignore-missing-imports] @@ -26,7 +26,7 @@ repos: - repo: https://github.com/astral-sh/ruff-pre-commit # Ruff version. - rev: v0.14.6 + rev: v0.14.7 hooks: # Run the linter. 
- id: ruff diff --git a/.talismanrc b/.talismanrc new file mode 100644 index 0000000000..2fa4d07d95 --- /dev/null +++ b/.talismanrc @@ -0,0 +1,2 @@ +allowed_patterns: +- 'uses: [A-Za-z-\/]+@[\w\d]+ # v\d+\.\d+\.\d+' diff --git a/CHANGELOG.md b/CHANGELOG.md index eb9cc88bbd..f0821824c6 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,18 +1,18 @@ # v1.7.0 - Heat Minor Release - 1.7.0 ## Changes -- #2050 [Backport stable] Supporting negative indices for flip operations (by @[github-actions[bot]](https://github.com/apps/github-actions)) -- #2024 [Backport stable] Set correct dtype when loading and saving hdf5 (by @[github-actions[bot]](https://github.com/apps/github-actions)) -- #1979 [Backport stable] Sturdier MPI Check (by @[github-actions[bot]](https://github.com/apps/github-actions)) ### Bug Fixes -- #1999 [Backport stable] Bugs/1990 Fix handling of zarr groups (by @[github-actions[bot]](https://github.com/apps/github-actions)) +* Sturdier MPI+GPU compatibility check by @JuanPedroGHM in https://github.com/helmholtz-analytics/heat/pull/1979 +* Fix handling of zarr groups by @ClaudiaComito in https://github.com/helmholtz-analytics/heat/pull/1990 +* Supporting negative indices for flip operations by @Marc-Jindra in https://github.com/helmholtz-analytics/heat/pull/2030 +* Fixed issue where matrices returned by ```eigh``` were not on the expected device by @GioPede in https://github.com/helmholtz-analytics/heat/pull/2046 +* Fixed issue where matrices returned by ```qr``` were not on the expected device by @GioPede in https://github.com/helmholtz-analytics/heat/pull/2045 +* Dtype is now set correctly when loading and saving hdf5 files by @Marc-Jindra in https://github.com/helmholtz-analytics/heat/pull/2014 ## Contributors - -@github-actions[bot] and [github-actions[bot]](https://github.com/apps/github-actions) - +@Marc-Jindra, @ClaudiaComito, @JuanPedroGHM # v1.6.0 ## Highlights diff --git a/CITATION.cff b/CITATION.cff index b09e7f80a5..99e317226a 100644 --- a/CITATION.cff +++ b/CITATION.cff @@ -7,19 +7,15 @@ message: 'If you use this software, please cite it as below.' type: software authors: # release highlights + - family-names: Comito + given-names: Claudia - family-names: Hoppe given-names: Fabian - family-names: Gutiérrez Hermosillo Muriedas given-names: Juan Pedro - - family-names: Palazoglu - given-names: Berkant - - family-names: Fischer - given-names: Carola +# active contributors in alphabetic order - family-names: Akdag given-names: Hakan - - family-names: Comito - given-names: Claudia -# active contributors in alphabetic order - family-names: Hees given-names: Jörn - family-names: Jindra diff --git a/README.md b/README.md index 0f6ca711d1..0ad84becf8 100644 --- a/README.md +++ b/README.md @@ -19,7 +19,7 @@ Heat is a distributed tensor framework for high performance data analytics. 
[![OpenSSF Scorecard](https://api.securityscorecards.dev/projects/github.com/helmholtz-analytics/heat/badge)](https://securityscorecards.dev/viewer/?uri=github.com/helmholtz-analytics/heat) [![OpenSSF Best Practices](https://bestpractices.coreinfrastructure.org/projects/7688/badge)](https://bestpractices.coreinfrastructure.org/projects/7688) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.2531472.svg)](https://doi.org/10.5281/zenodo.2531472) -[![Benchmarks](https://img.shields.io/badge/Grafana-Benchmarks-2ea44f)](https://57bc8d92-72f2-4869-accd-435ec06365cb.ka.bw-cloud-instance.org:3000/d/adjpqduq9r7k0a/heat-cb?orgId=1) +[![Benchmarks](https://img.shields.io/badge/Grafana-Benchmarks-2ea44f)](https://930000e0-e69a-4939-912e-89a92316b420.ka.bw-cloud-instance.org/grafana) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) [![JuRSE Code Pick of the Month](https://img.shields.io/badge/JuRSE_Code_Pick-August_2024-blue)](https://www.fz-juelich.de/en/rse/jurse-community/jurse-code-of-the-month/august-2024) diff --git a/doc/source/tutorials/notebooks/0_setup/0_setup_haicore.ipynb b/doc/source/tutorials/notebooks/0_setup/0_setup_haicore.ipynb index 6e4662a701..90758679bd 100644 --- a/doc/source/tutorials/notebooks/0_setup/0_setup_haicore.ipynb +++ b/doc/source/tutorials/notebooks/0_setup/0_setup_haicore.ipynb @@ -36,11 +36,7 @@ "tags": [] }, "source": [ - "\n", - "\n", - "\n", "## Introduction\n", - "---\n", "
\n", "Note:\n", "This notebook expects that you will be working on the JupyterLab hosted in HAICORE, at the Karlsruhe Institute of Technology.\n", diff --git a/doc/source/tutorials/notebooks/0_setup/0_setup_jsc.ipynb b/doc/source/tutorials/notebooks/0_setup/0_setup_jsc.ipynb index ee00ae6115..9ad18751a9 100644 --- a/doc/source/tutorials/notebooks/0_setup/0_setup_jsc.ipynb +++ b/doc/source/tutorials/notebooks/0_setup/0_setup_jsc.ipynb @@ -20,13 +20,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "\n", "
\n", " \n", "
\n", "\n", "## Introduction\n", - "---\n", "
\n", "Note:\n", "This tutorial is designed to run on Jupyter-JSC, a JupyterLab environment provided by the Jülich Supercomputing Centre. \n", @@ -156,7 +154,6 @@ "metadata": {}, "source": [ "## What is Heat for?\n", - "---\n", "\n", "[**deRSE24 NOTE**: do attend Fabian Hoppe's talk [TODAY at 16:30](https://events.hifis.net/event/994/contributions/7940/) for more details, benchmarks, and an overview of the parallel Python ecosystem.] \n", "\n", From 1cc54821b8f464a427e6c5efb4c4748a89d0c030 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Wed, 10 Dec 2025 12:47:10 +0100 Subject: [PATCH 09/15] Add device parameter to QR output arrays (#2051) Update QR decomposition to include device parameter for R and Q arrays. (cherry picked from commit bd5df7cf283d0bedf48f954aed5d9aeebe294b82) Co-authored-by: Giovanni Pederiva Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> --- heat/core/linalg/qr.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/heat/core/linalg/qr.py b/heat/core/linalg/qr.py index 4ca0c3fc01..d28183f297 100644 --- a/heat/core/linalg/qr.py +++ b/heat/core/linalg/qr.py @@ -107,7 +107,7 @@ def qr( if not A.is_distributed() or A.split < A.ndim - 2: # handle the case of a single process or split=None: just PyTorch QR Q, R = single_proc_qr(A.larray, mode=mode) - R = factories.array(R, is_split=A.split) + R = factories.array(R, is_split=A.split, device=A.device) if mode == "reduced": Q = factories.array(Q, is_split=A.split, device=A.device) else: From 38df20877e14642d808d56f26a6840779088a51f Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Wed, 10 Dec 2025 13:30:20 +0100 Subject: [PATCH 10/15] Handling of unknown MPI Libraries (#2032) (#2060) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * fix: Added 'other' as option for mpi library * fix: corrected parastation configuration * Update heat/core/_config.py --------- (cherry picked from commit 50080e08c8cea0aef45bfe50be87032feb0c334e) Co-authored-by: Juan Pedro Gutiérrez Hermosillo Muriedas Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> --- heat/core/_config.py | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/heat/core/_config.py b/heat/core/_config.py index 48d0a3e22b..c82063449e 100644 --- a/heat/core/_config.py +++ b/heat/core/_config.py @@ -19,7 +19,8 @@ class MPILibrary(Enum): MVAPICH = "mvapich" MPICH = "mpich" CrayMPI = "craympi" - ParastationMPI = "psmpi" + ParaStationMPI = "psmpi" + Other = "other" @dataclasses.dataclass @@ -37,9 +38,12 @@ def _get_mpi_library() -> MPILibraryInfo: return MPILibraryInfo(MPILibrary.IntelMPI, library[3]) case ["MPICH", "Version:", *_]: return MPILibraryInfo(MPILibrary.MPICH, library[2]) - ### Missing libraries + case ["MVAPICH", "Version:", *_]: + return MPILibraryInfo(MPILibrary.MVAPICH, library[2]) + case ["===", "ParaStation", "MPI", *_]: + return MPILibraryInfo(MPILibrary.ParaStationMPI, library[3]) case _: - print("Did not find a matching library") + return MPILibraryInfo(MPILibrary.Other, "unknown") def _check_gpu_aware_mpi(library: MPILibraryInfo) -> tuple[bool, bool]: @@ -81,7 +85,7 @@ def _check_gpu_aware_mpi(library: MPILibraryInfo) -> tuple[bool, bool]: cuda = os.environ.get("MPICH_GPU_SUPPORT_ENABLED") == "1" rocm = 
os.environ.get("MPICH_GPU_SUPPORT_ENABLED") == "1" return cuda, rocm - case MPILibrary.ParastationMPI: + case MPILibrary.ParaStationMPI: cuda = os.environ.get("PSP_CUDA") == "1" rocm = False return cuda, rocm From b0961b0daecc087a9b25f2098744becc8d40d17b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Guti=C3=A9rrez=20Hermosillo=20Muriedas=2C=20Juan=20Pedro?= Date: Fri, 19 Dec 2025 16:19:04 +0100 Subject: [PATCH 11/15] changelog --- CHANGELOG.md | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index f0821824c6..5c18de1608 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,6 +1,21 @@ # v1.7.0 - Heat Minor Release - 1.7.0 + +## Highlights + +1) Randomized Symmetric eignevalue decomposition (eigh) +2) DistributedSampler for efficient data loading and shuffling across multiple nodes with PyTorch +3) Incremental SVD directly from an HDF5 file +4) Partial support of the Array API Standard (version: '2020.10'), and API namespace under `x.__array_namespace__(api_version='2020.10')` +5) Distributed PTP (peak to peak) function + ## Changes +### Features +* Randomized Symmetric Eigenvalue Decomposition (eigh) by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1964 +* Incremental SVD directly from HDF5 file by @LScheib in https://github.com/helmholtz-analytics/heat/pull/2005 +* Array API Namespace by @mtar in https://github.com/helmholtz-analytics/heat/pull/1022 +* Distributed Peak to Peak (ptp) function by @ivansherbakov9 in https://github.com/helmholtz-analytics/heat/pull/1954 +* PyTorch compatible DistributedSampler by @Berkant03 in https://githubcom/helmholtz-analytics/heat/pull/1807 ### Bug Fixes @@ -10,6 +25,9 @@ * Fixed issue where matrices returned by ```eigh``` were not on the expected device by @GioPede in https://github.com/helmholtz-analytics/heat/pull/2046 * Fixed issue where matrices returned by ```qr``` were not on the expected device by @GioPede in https://github.com/helmholtz-analytics/heat/pull/2045 * Dtype is now set correctly when loading and saving hdf5 files by @Marc-Jindra in https://github.com/helmholtz-analytics/heat/pull/2014 +* Fix MPI large count issues when respliting by @JuanPedroGHM in https://github.com/helmholtz-analytics/heat/pull/1973 +* Default GPU+MPI compatibility settings for unknown MPI implementations by @JuanPedroGHM in https://github.com/helmholtz-analytics/heat/pull/2060 + ## Contributors @Marc-Jindra, @ClaudiaComito, @JuanPedroGHM From 70efcf924e7ded25a9ec23957fb30020f2add4d9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Juan=20Pedro=20Guti=C3=A9rrez=20Hermosillo=20Muriedas?= Date: Mon, 22 Dec 2025 15:03:11 +0100 Subject: [PATCH 12/15] Update bug_report.yml --- .github/ISSUE_TEMPLATE/bug_report.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/ISSUE_TEMPLATE/bug_report.yml b/.github/ISSUE_TEMPLATE/bug_report.yml index 12a163dde4..93ea99fb6a 100644 --- a/.github/ISSUE_TEMPLATE/bug_report.yml +++ b/.github/ISSUE_TEMPLATE/bug_report.yml @@ -34,7 +34,7 @@ body: description: What version of Heat are you running? 
      options:
        - main (development branch)
-       - 1.6.x
+       - 1.7.x
        - other
    validations:
      required: true

From 7100ea70737e31910443a70c7459ee1f2e36ec28 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Guti=C3=A9rrez=20Hermosillo=20Muriedas=2C=20Juan=20Pedro?=
Date: Mon, 22 Dec 2025 15:04:44 +0100
Subject: [PATCH 13/15] final changelog, citation and readme

---
 CHANGELOG.md | 17 ++++++++++++-----
 CITATION.cff |  6 ++++--
 README.md    |  5 +++--
 3 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 5c18de1608..dd5a6440d1 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,21 +2,25 @@

 ## Highlights

-1) Randomized Symmetric eigenvalue decomposition (eigh)
-2) DistributedSampler for efficient data loading and shuffling across multiple nodes with PyTorch
+1) DistributedSampler for efficient data loading and shuffling across multiple nodes with PyTorch
+2) Randomized Symmetric eigenvalue decomposition (eigh)
 3) Incremental SVD directly from an HDF5 file
 4) Partial support of the Array API Standard (version: '2020.10'), and an API namespace under `x.__array_namespace__(api_version='2020.10')`
 5) Distributed PTP (peak to peak) function

+*SVD, PCA, and DMD have been implemented within the project ESAPCA funded by the European Space Agency (ESA). This support is gratefully acknowledged.*
+
 ## Changes

 ### Features
-
 * Randomized Symmetric Eigenvalue Decomposition (eigh) by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1964
 * Incremental SVD directly from HDF5 file by @LScheib in https://github.com/helmholtz-analytics/heat/pull/2005
-* Array API Namespace by @mtar in https://github.com/helmholtz-analytics/heat/pull/1022
 * Distributed Peak to Peak (ptp) function by @ivansherbakov9 in https://github.com/helmholtz-analytics/heat/pull/1954
 * PyTorch compatible DistributedSampler by @Berkant03 in https://github.com/helmholtz-analytics/heat/pull/1807

+### Interoperability
+* Support PyTorch 2.9.1 by @github-actions[bot] in https://github.com/helmholtz-analytics/heat/pull/2001
+* Array API Namespace by @mtar in https://github.com/helmholtz-analytics/heat/pull/1022
+
 ### Bug Fixes

 * Sturdier MPI+GPU compatibility check by @JuanPedroGHM in https://github.com/helmholtz-analytics/heat/pull/1979
@@ -30,7 +34,10 @@

 ## Contributors

-@Marc-Jindra, @ClaudiaComito, @JuanPedroGHM
+@Marc-Jindra, @ClaudiaComito, @JuanPedroGHM, @GioPede, @ivansherbakov9, @LScheib, @Berkant03, @mrfh92, @mtar
+
+#### Acknowledgement and Disclaimer
+*This work is partially carried out under a [programme](https://activities.esa.int/index.php/4000144045) of, and funded by, the European Space Agency. Any view expressed in this repository or related publications can in no way be taken to reflect the official opinion of the European Space Agency.*

 # v1.6.0
 ## Highlights
diff --git a/CITATION.cff b/CITATION.cff
index 99e317226a..1e8e02c9c0 100644
--- a/CITATION.cff
+++ b/CITATION.cff
@@ -7,10 +7,12 @@ message: 'If you use this software, please cite it as below.'
 type: software
 authors:
 # release highlights
-  - family-names: Comito
-    given-names: Claudia
   - family-names: Hoppe
     given-names: Fabian
+  - family-names: Palazoglu
+    given-names: Berkant
+  - family-names: Comito
+    given-names: Claudia
   - family-names: Gutiérrez Hermosillo Muriedas
     given-names: Juan Pedro
 # active contributors in alphabetic order
diff --git a/README.md b/README.md
index 0ad84becf8..9a0866ca69 100644
--- a/README.md
+++ b/README.md
@@ -118,6 +118,7 @@ computational and memory needs of your laptop and desktop.
 ### Parallel I/O
 - h5py
 - netCDF4
+- zarr

 ### GPU support
 In order to do computations on your GPU(s):
@@ -132,10 +133,10 @@ On most HPC-systems you will not be able to install/compile MPI or CUDA/ROCm you
 Install the latest version with

 ```bash
-pip install heat[hdf5,netcdf]
+pip install heat[hdf5,netcdf,zarr]
 ```

 where the part in brackets is a list of optional dependencies. You can omit
-it, if you do not need HDF5 or NetCDF support.
+it if you do not need HDF5, NetCDF, or Zarr support.

 ## **conda**

From b1a60b8fd550166d1f7005a0125e0f9a886083d4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Guti=C3=A9rrez=20Hermosillo=20Muriedas=2C=20Juan=20Pedro?=
Date: Mon, 22 Dec 2025 15:36:55 +0100
Subject: [PATCH 14/15] fix: removed p3.14 tests

---
 .github/workflows/ci.yaml |  8 ++++----
 CHANGELOG.md              |  2 +-
 CITATION.cff              | 18 +++++++++---------
 3 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml
index af57812b6c..55fa7882c6 100644
--- a/.github/workflows/ci.yaml
+++ b/.github/workflows/ci.yaml
@@ -27,10 +27,10 @@ jobs:
           - 'torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1'
           - 'torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0'
           - 'torch==2.9.1 torchvision==0.24.1 torchaudio==2.9.1'
-        include:
-          - py-version: '3.14'
-            pytorch-version: 'torch==2.9.1 torchvision==0.24.1 torchaudio==2.9.1'
-            install-options: '.'
+        # include:
+        #   - py-version: '3.14'
+        #     pytorch-version: 'torch==2.9.1 torchvision==0.24.1 torchaudio==2.9.1'
+        #     install-options: '.'
         exclude:
           - py-version: '3.13'
             pytorch-version: 'numpy==1.26 torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2'
diff --git a/CHANGELOG.md b/CHANGELOG.md
index dd5a6440d1..a4bf58db96 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,4 +1,4 @@
-# v1.7.0 - Heat Minor Release - 1.7.0
+# v1.7.0

 ## Highlights

diff --git a/CITATION.cff b/CITATION.cff
index 1e8e02c9c0..225160bac8 100644
--- a/CITATION.cff
+++ b/CITATION.cff
@@ -7,17 +7,21 @@ message: 'If you use this software, please cite it as below.'
type: software authors: # release highlights - - family-names: Hoppe - given-names: Fabian - family-names: Palazoglu given-names: Berkant + - family-names: Hoppe + given-names: Fabian + - family-names: Scheib + given-names: Lukas + - family-names: Tarnawa + given-names: Michael +# active contributors in alphabetic order + - family-names: Akdag + given-names: Hakan - family-names: Comito given-names: Claudia - family-names: Gutiérrez Hermosillo Muriedas given-names: Juan Pedro -# active contributors in alphabetic order - - family-names: Akdag - given-names: Hakan - family-names: Hees given-names: Jörn - family-names: Jindra @@ -28,10 +32,6 @@ authors: given-names: Kai - family-names: Lemmen given-names: Jonas - - family-names: Scheib - given-names: Lukas - - family-names: Tarnawa - given-names: Michael # historic core team - family-names: Coquelin given-names: Daniel From 3f78649d5a93597fe7677068cb1d17ca497fc10d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Guti=C3=A9rrez=20Hermosillo=20Muriedas=2C=20Juan=20Pedro?= Date: Fri, 9 Jan 2026 14:37:51 +0100 Subject: [PATCH 15/15] fix: dev version number --- heat/core/version.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/heat/core/version.py b/heat/core/version.py index bb133399f7..ea6b038234 100644 --- a/heat/core/version.py +++ b/heat/core/version.py @@ -2,11 +2,11 @@ major: int = 1 """Indicates Heat's main version.""" -minor: int = 7 +minor: int = 8 """Indicates feature extension.""" micro: int = 0 """Indicates revisions for bugfixes.""" -extension: str = None +extension: str = "dev" """Indicates special builds, e.g. for specific hardware.""" if not extension:
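
Usage note for [PATCH 09/15]: after this fix, both factors returned by `ht.linalg.qr` follow the input array's device in the single-process/split=None path, not only `Q`. A minimal sketch, assuming a GPU-enabled Heat build; the array sizes and the `"gpu"` device string are illustrative:

```python
import heat as ht

# Assumes Heat was built with CUDA/ROCm support; sizes are arbitrary.
A = ht.random.randn(1000, 100, split=0, device="gpu")
Q, R = ht.linalg.qr(A)

# Before #2051 the non-distributed code path created R without a device
# argument, so it could land on the default (CPU) device.
assert Q.device == A.device
assert R.device == A.device
```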
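Background for [PATCH 10/15]: `_get_mpi_library()` tokenizes the banner returned by `MPI.Get_library_version()` and structurally matches its leading tokens; unknown vendors now map to `MPILibrary.Other` instead of falling through with only a print and an implicit `None`. A self-contained sketch of that dispatch — the banner strings and the `classify` helper are illustrative, not Heat API:

```python
from enum import Enum


class MPILibrary(Enum):
    MPICH = "mpich"
    ParaStationMPI = "psmpi"
    Other = "other"


def classify(banner: str) -> MPILibrary:
    # Match on the leading tokens of the vendor banner, as in heat/core/_config.py.
    match banner.split():
        case ["MPICH", "Version:", *_]:
            return MPILibrary.MPICH
        case ["===", "ParaStation", "MPI", *_]:
            return MPILibrary.ParaStationMPI
        case _:
            # Unknown vendors no longer fall through without a return value.
            return MPILibrary.Other


print(classify("MPICH Version: 4.1.2"))  # MPILibrary.MPICH
print(classify("SomeVendorMPI 1.0"))     # MPILibrary.Other
```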
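Note on [PATCH 15/15]: bumping `minor` to 8 and setting `extension = "dev"` reopens the development cycle after the 1.7.0 release. The diff is cut off above at `if not extension:`; the sketch below shows how such fields are typically assembled into a version string and is an assumption, not the verbatim remainder of `heat/core/version.py`:

```python
# Hypothetical assembly of the version fields shown in the diff above.
major, minor, micro = 1, 8, 0
extension = "dev"

if not extension:
    __version__ = f"{major}.{minor}.{micro}"  # plain release, e.g. "1.7.0"
else:
    __version__ = f"{major}.{minor}.{micro}-{extension}"  # e.g. "1.8.0-dev"

print(__version__)
```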