From 83727cb27bf07ee5af204a86bfdf085442c2ad22 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com>
Date: Wed, 3 Sep 2025 13:12:48 +0200
Subject: [PATCH 01/15] Heat 1.6.0 - Release (#1957)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Update heat/core/io.py
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
* added test for dndarray.info
* added tests for two uncovered exception lines
* one additional line from DMD covered
* one more line in DMD covered
* debugging
* build(deps): bump actions/setup-python from 5.4.0 to 5.5.0
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5.4.0 to 5.5.0.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](https://github.com/actions/setup-python/compare/42375524e23c412d93fb67b49958b491fce71c38...8d9ed9ac5c53483de85588cdf95a591a75ab9f55)
---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot]
* [pre-commit.ci] pre-commit autoupdate
updates:
- [github.com/PyCQA/flake8: 7.1.2 → 7.2.0](https://github.com/PyCQA/flake8/compare/7.1.2...7.2.0)
* build(deps): bump github/codeql-action from 3.28.12 to 3.28.13
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.28.12 to 3.28.13.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/5f8171a638ada777af81d42b55959a643bb29017...1b549b9259bda1cb5ddde3b41741a82a2d15a841)
---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
* further work on eigh
* eigh completed for split = 0
* flake8
* tests for eigh, now split=none,0,1
* build(deps): bump step-security/harden-runner from 2.11.0 to 2.11.1 (#1851)
Bumps [step-security/harden-runner](https://github.com/step-security/harden-runner) from 2.11.0 to 2.11.1.
- [Release notes](https://github.com/step-security/harden-runner/releases)
- [Commits](https://github.com/step-security/harden-runner/compare/4d991eb9b905ef189e4c376166672c3f2f230481...c6295a65d1254861815972266d5933fd6e532bdf)
---
updated-dependencies:
- dependency-name: step-security/harden-runner
  dependency-version: 2.11.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump actions/dependency-review-action from 4.5.0 to 4.6.0 (#1850)
Bumps [actions/dependency-review-action](https://github.com/actions/dependency-review-action) from 4.5.0 to 4.6.0.
- [Release notes](https://github.com/actions/dependency-review-action/releases)
- [Commits](https://github.com/actions/dependency-review-action/compare/3b139cfc5fae8b618d3eae3675e383bb1769c019...ce3cf9537a52e8119d91fd484ab5b8a807627bf8)
---
updated-dependencies:
- dependency-name: actions/dependency-review-action
  dependency-version: 4.6.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
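A minimal usage sketch for the eigh work noted in the bullets above (hedged: the call signature is assumed to mirror `numpy.linalg.eigh`, returning eigenvalues and eigenvectors of a symmetric matrix; split=None, 0, and 1 are the configurations the tests exercise):

```python
import heat as ht

# a symmetric test matrix, distributed along split=0 (split=None and
# split=1 are the other configurations covered by the tests above)
x = ht.random.randn(200, 200, split=0)
a = x @ x.T  # symmetric by construction

# assumed interface, analogous to numpy.linalg.eigh
eigenvalues, eigenvectors = ht.linalg.eigh(a)
```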
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* minor modifications due to Copilot's review
* added SVD for general case
* reformatting
* tests for SVD
* tests for SVD completed
* added module _config in core which is intended to handle MPI, CUDA, and ROCm versioning
* added variable GPU_AWARE_MPI
* added MPICH
* changed method info into __repr__
* moved __repr__ to printing module
* removed dead code
* restructuring of tests
* added further test to DMDc
* small typo corrected
* adapted tolerances for last test; errors grow w.r.t. timesteps (this is in the nature of DMD), so the largest number of procs determines the tolerance
* Update test_dmd.py
lower tolerances for the AMD runner
* dummy commit since something was wrong with pre-commit
* corrected tests
* debugging of tests
* Remove unnecessary contiguous calls (#1831)
* removed contiguous calls from manipulations.py
* removed the contiguous calls from linalg/qr.py
* removed unnecessary contiguous call in factories.py
* removed some more unnecessary contiguous calls
* reinstate contiguous() calls if needed
* removed the contiguous calls from linalg/qr.py
* reinstate setting Q_buf to Q_curr
---------
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
* build(deps): bump github/codeql-action from 3.28.13 to 3.28.15
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.28.13 to 3.28.15.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/1b549b9259bda1cb5ddde3b41741a82a2d15a841...45775bd8235c68ba998cffa5171334d58593da47)
---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 3.28.15
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
* removed debugging prints that were forgotten before merging into main
* subTest'ified the zarr tests, added some strange exception handling that is likely necessary to accommodate zarr versions compatible with Python 3.10
* small bug fix
* bugfix in eigh
* removed unnecessary numpy import
* changed representation string according to review
* debugging of memory consumption in Polar
* bug fixes for devices in polar and eigh
* bugfixes for certain device configurations
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update README.md
* Support latest PyTorch release
* Make unit tests compatible with NumPy 2.x (#1826)
* changed row_stack to vstack for numpy >= 2.0.0
* Changed the numpy version check to the numpy-suggested method
* Changed numpy version requirement
* fixed DeprecationWarning from missing axes for np.fft.fftn
* Fixed one __array_wrap__ DeprecationWarning
* Stopped testing cross of vector axes with 2 elements for numpy >= 2.0
* changed requirements to avoid errors with numpy >= 2
* Using Python 3.10 for ReceivePR
* changed python-version to '3.10' because 3.10 was interpreted as 3.1
* changed ci.yaml to exclude python 3.9
* Fixed two DeprecationWarnings of np.cross() by adding a third dimension with zeros
* Fixed the last np.cross() warning by performing the operation manually
* changed dtype of np.cross() to float32
---------
Co-authored-by: Fabian Hoppe <112093564+mrfh92@users.noreply.github.com>
Co-authored-by: Juan Pedro Gutiérrez Hermosillo Muriedas
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
* build(deps): bump actions/setup-python from 5.5.0 to 5.6.0 (#1863)
* build(deps): bump actions/setup-python from 5.5.0 to 5.6.0
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5.5.0 to 5.6.0.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](https://github.com/actions/setup-python/compare/8d9ed9ac5c53483de85588cdf95a591a75ab9f55...a26af69be951a213d495a4c3e4e4022e16d87065)
---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-version: 5.6.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot]
* Debugging
---------
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
* build(deps): bump docker/build-push-action from 6.15.0 to 6.16.0 (#1860)
Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 6.15.0 to 6.16.0.
- [Release notes](https://github.com/docker/build-push-action/releases)
- [Commits](https://github.com/docker/build-push-action/compare/471d1dc4e07e5cdedd4c2171150001c434f0b7a4...14487ce63c7a62a4a324b0bfb37086795e31c6c1)
---
updated-dependencies:
- dependency-name: docker/build-push-action
  dependency-version: 6.16.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump step-security/harden-runner from 2.11.1 to 2.12.0 (#1861)
Bumps [step-security/harden-runner](https://github.com/step-security/harden-runner) from 2.11.1 to 2.12.0.
- [Release notes](https://github.com/step-security/harden-runner/releases)
- [Commits](https://github.com/step-security/harden-runner/compare/c6295a65d1254861815972266d5933fd6e532bdf...0634a2670c59f64b4a01f0f96f84700a4088b9f0)
---
updated-dependencies:
- dependency-name: step-security/harden-runner
  dependency-version: 2.12.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump github/codeql-action from 3.28.15 to 3.28.17 (#1866)
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.28.15 to 3.28.17.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/45775bd8235c68ba998cffa5171334d58593da47...60168efe1c415ce0f5521ea06d5c2062adbeed1b)
---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 3.28.17
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump actions/dependency-review-action from 4.6.0 to 4.7.0
Bumps [actions/dependency-review-action](https://github.com/actions/dependency-review-action) from 4.6.0 to 4.7.0.
- [Release notes](https://github.com/actions/dependency-review-action/releases)
- [Commits](https://github.com/actions/dependency-review-action/compare/ce3cf9537a52e8119d91fd484ab5b8a807627bf8...38ecb5b593bf0eb19e335c03f97670f792489a8b)
---
updated-dependencies:
- dependency-name: actions/dependency-review-action
  dependency-version: 4.7.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot]
* added benchmarks for eigh, svd, and rsvd
* dummy commit to trigger benchmark runs
* Support latest PyTorch release
* changed torchvision version to <0.22.1
* retrigger checks
* build(deps): bump github/codeql-action from 3.28.17 to 3.28.18
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.28.17 to 3.28.18.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/60168efe1c415ce0f5521ea06d5c2062adbeed1b...ff0a06e83cb2de871e5a09832bc6a81e7276941f)
---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 3.28.18
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
* build(deps): bump docker/build-push-action from 6.16.0 to 6.17.0
Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 6.16.0 to 6.17.0.
- [Release notes](https://github.com/docker/build-push-action/releases)
- [Commits](https://github.com/docker/build-push-action/compare/14487ce63c7a62a4a324b0bfb37086795e31c6c1...1dc73863535b631f98b2378be8619f83b136f4a0)
---
updated-dependencies:
- dependency-name: docker/build-push-action
  dependency-version: 6.17.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot]
* build(deps): bump actions/dependency-review-action from 4.7.0 to 4.7.1
Bumps [actions/dependency-review-action](https://github.com/actions/dependency-review-action) from 4.7.0 to 4.7.1.
- [Release notes](https://github.com/actions/dependency-review-action/releases)
- [Commits](https://github.com/actions/dependency-review-action/compare/38ecb5b593bf0eb19e335c03f97670f792489a8b...da24556b548a50705dd671f47852072ea4c105d9)
---
updated-dependencies:
- dependency-name: actions/dependency-review-action
  dependency-version: 4.7.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
* RTD Notebook gallery and profiling notebook with perun. (#1867)
* docs: notebook gallery in rtd
* docs: missing makefiles
* docs: reverted changes to gitignore
* haicore notebook setup
* ompi in readthedocs build
* correct apt package for mpi
* docs: replaced small bodies dataset with digits from sklearn (boring, but easier to access in the long term)
* perun notebook
* wrong cell type
* added pytorch 2.7 to ci workflow
* docs: post practice run fixes
* notebook thumbnails, formatting and corrections for tutorial
* forgot to uncomment autoapi
* build(deps): bump github/codeql-action from 3.28.18 to 3.28.19 (#1881)
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.28.18 to 3.28.19.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/ff0a06e83cb2de871e5a09832bc6a81e7276941f...fca7ace96b7d713c7035871441bd52efbe39e27e)
---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 3.28.19
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump docker/build-push-action from 6.17.0 to 6.18.0 (#1877)
Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 6.17.0 to 6.18.0.
- [Release notes](https://github.com/docker/build-push-action/releases)
- [Commits](https://github.com/docker/build-push-action/compare/1dc73863535b631f98b2378be8619f83b136f4a0...263435318d21b8e681c14492fe198d362a7d2c83)
---
updated-dependencies:
- dependency-name: docker/build-push-action
  dependency-version: 6.18.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump ossf/scorecard-action from 2.4.1 to 2.4.2 (#1878)
Bumps [ossf/scorecard-action](https://github.com/ossf/scorecard-action) from 2.4.1 to 2.4.2.
- [Release notes](https://github.com/ossf/scorecard-action/releases)
- [Changelog](https://github.com/ossf/scorecard-action/blob/main/RELEASE.md)
- [Commits](https://github.com/ossf/scorecard-action/compare/f49aabe0b5af0936a0987cfb85d86b75731b0186...05b42c624433fc40578a4040d5cf5e36ddca8cde)
---
updated-dependencies:
- dependency-name: ossf/scorecard-action
  dependency-version: 2.4.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump step-security/harden-runner from 2.12.0 to 2.12.1
Bumps [step-security/harden-runner](https://github.com/step-security/harden-runner) from 2.12.0 to 2.12.1.
- [Release notes](https://github.com/step-security/harden-runner/releases)
- [Commits](https://github.com/step-security/harden-runner/compare/0634a2670c59f64b4a01f0f96f84700a4088b9f0...002fdce3c6a235733a90a27c80493a3241e56863)
---
updated-dependencies:
- dependency-name: step-security/harden-runner
  dependency-version: 2.12.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
* build(deps): bump github/codeql-action from 3.28.19 to 3.29.0
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.28.19 to 3.29.0.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/fca7ace96b7d713c7035871441bd52efbe39e27e...ce28f5bb42b7a9f2c824e633a3f6ee835bab6858)
---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 3.29.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot]
* Add special methods for operations in NumPy
* add tests for NumPy related array methods
* fix variable name
* Exit installation if conda environment cannot be activated (#1880)
* exit 0_setup_conda.sh if environment cannot be activated
otherwise the script might install into the base environment
* exit 0_setup_pip.sh if environment cannot be activated
---------
Co-authored-by: Juan Pedro Gutiérrez Hermosillo Muriedas
* fix item access
* add contiguous call again
* same as before
* [pre-commit.ci] pre-commit autoupdate (#1894)
updates:
- [github.com/PyCQA/flake8: 7.2.0 → 7.3.0](https://github.com/PyCQA/flake8/compare/7.2.0...7.3.0)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* bugfix in rSVD for the case where the rank is smaller than the number of processes
* build(deps): bump github/codeql-action from 3.29.0 to 3.29.1
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.29.0 to 3.29.1.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/ce28f5bb42b7a9f2c824e633a3f6ee835bab6858...39edc492dbe16b1465b0cafca41432d857bdb31a)
---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 3.29.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
* resolved bug in rSVD, actually in one-process QR
* Support PyTorch 2.7.1 (#1883)
* Support latest PyTorch release
* Update ci.yaml
pytorch: add v2.2, drop v1.11
* debugging test_lasso
* Support latest PyTorch release
* Update bug_report.yml
Add latest versions for options
* update torchvision
* do not test latest torch in matrix
* pin zarr version
* remove dead code
* pin zarr to 3.0.8
* add back latest pytorch to matrix
* edit PR body
* Update ci.yaml
---------
Co-authored-by: ClaudiaComito <39374113+ClaudiaComito@users.noreply.github.com>
Co-authored-by: Michael Tarnawa
Co-authored-by: Michael Tarnawa <18899420+mtar@users.noreply.github.com>
* build(deps): bump docker/setup-buildx-action from 3.10.0 to 3.11.1
Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 3.10.0 to 3.11.1.
- [Release notes](https://github.com/docker/setup-buildx-action/releases)
- [Commits](https://github.com/docker/setup-buildx-action/compare/b5ca514318bd6ebac0fb2aedd5d36ec1b5c232a2...e468171a9de216ec08956ac3ada2f0791b6bd435)
---
updated-dependencies:
- dependency-name: docker/setup-buildx-action
  dependency-version: 3.11.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot]
* Update polar.py
* Revert "build(deps): bump docker/setup-buildx-action from 3.10.0 to 3.11.1" (#1908)
* support torch_function (#1895)
Co-authored-by: Fabian Hoppe <112093564+mrfh92@users.noreply.github.com>
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
* Features/1845 Update citations (#1846)
* Update README.md: added citation possibilities
* Update README.md: link to ZENODO
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update README.md
* Update README.md
* Update README.md
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Juan Pedro Gutiérrez Hermosillo Muriedas
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
* Transition to pyproject.toml, Ruff, and mypy (#1832)
* Support latest PyTorch release
* Support latest PyTorch release
* retrigger checks
* wip: toml, ruff, mypy and cli
* ci: better mypy error filter (down to 3041)
* ci: mypy in list of dev deps
* ci: skipping mypy for now
* dependency specification
* missing argument description for permute method
* fixed dynamic version in pyproject.toml
* properly skipping mypy in the pre-commit.ci
* fixed dependencies, removed references to python 3.9, added python 3.13
* doc -> docs
* fixed tests
* cli tests
* Update .github/ISSUE_TEMPLATE/bug_report.yml
Co-authored-by: Michael Tarnawa <18899420+mtar@users.noreply.github.com>
* removed problematic pytorch version?!
* consistency is important
* did not work
* vulnerability removal, other changes to toml
* It's not working :(
* fixing tests
* Update .pre-commit-config.yaml
* fix: cli does not change the default device
* fix: support for older pytorch version on the cli, limit zarr package version
* Update pytorch exclude with py-3.13
---------
Co-authored-by: ClaudiaComito <39374113+ClaudiaComito@users.noreply.github.com>
Co-authored-by: Marc-Jindra
Co-authored-by: Michael Tarnawa <18899420+mtar@users.noreply.github.com>
* build(deps): bump docker/setup-buildx-action from 3.10.0 to 3.11.1 (#1911)
Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 3.10.0 to 3.11.1.
- [Release notes](https://github.com/docker/setup-buildx-action/releases)
- [Commits](https://github.com/docker/setup-buildx-action/compare/b5ca514318bd6ebac0fb2aedd5d36ec1b5c232a2...e468171a9de216ec08956ac3ada2f0791b6bd435)
---
updated-dependencies:
- dependency-name: docker/setup-buildx-action
  dependency-version: 3.11.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump github/codeql-action from 3.29.1 to 3.29.2 (#1910)
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.29.1 to 3.29.2.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/39edc492dbe16b1465b0cafca41432d857bdb31a...181d5eefc20863364f96762470ba6f862bdef56b)
---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 3.29.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump step-security/harden-runner from 2.12.1 to 2.12.2 (#1909)
Bumps [step-security/harden-runner](https://github.com/step-security/harden-runner) from 2.12.1 to 2.12.2.
- [Release notes](https://github.com/step-security/harden-runner/releases)
- [Commits](https://github.com/step-security/harden-runner/compare/002fdce3c6a235733a90a27c80493a3241e56863...6c439dc8bdf85cadbbce9ed30d1c7b959517bc49)
---
updated-dependencies:
- dependency-name: step-security/harden-runner
  dependency-version: 2.12.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
* took review into account
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* pre-commit stuff
* [pre-commit.ci] pre-commit autoupdate (#1912)
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.11.13 → v0.12.3](https://github.com/astral-sh/ruff-pre-commit/compare/v0.11.13...v0.12.3)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Updated release_prep.yml to incorporate up-to-date Dockerfile Pytorch versions (#1903)
* Create update_docker.yml
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update release-prep.yml
Added the update functionality of the Dockerfile.source and Dockerfile.release Pytorch Image versions.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Removed update_docker.yml since the functionality was moved to release prep
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
* [StepSecurity] Apply security best practices (#1891)
* [StepSecurity] Apply security best practices
Signed-off-by: StepSecurity Bot
* Test with shellcheck-py
* Update .pre-commit-config.yaml
* shellcheck updates
* Update build_and_push.sh
* Update increment_version.sh
* Update increment_version.sh
* Update build_and_push.sh
* Update test_nvidia_image_haicore_enroot.sh
* Update test_nvidia_image_haicore_enroot.sh
* Update build_and_push.sh
* Update 0_setup_pip.sh
---------
Signed-off-by: StepSecurity Bot
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
* build(deps): bump korthout/backport-action from 3.2.0 to 3.2.1
Bumps [korthout/backport-action](https://github.com/korthout/backport-action) from 3.2.0 to 3.2.1.
- [Release notes](https://github.com/korthout/backport-action/releases)
- [Commits](https://github.com/korthout/backport-action/compare/436145e922f9561fc5ea157ff406f21af2d6b363...0193454f0c5947491d348f33a275c119f30eb736)
---
updated-dependencies:
- dependency-name: korthout/backport-action
  dependency-version: 3.2.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
* build(deps): bump step-security/harden-runner from 2.12.1 to 2.13.0
Bumps [step-security/harden-runner](https://github.com/step-security/harden-runner) from 2.12.1 to 2.13.0.
- [Release notes](https://github.com/step-security/harden-runner/releases)
- [Commits](https://github.com/step-security/harden-runner/compare/v2.12.1...ec9f2d5744a09debf3a187a3f4f675c53b671911)
---
updated-dependencies:
- dependency-name: step-security/harden-runner
  dependency-version: 2.13.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot]
* [pre-commit.ci] pre-commit autoupdate
updates:
- [github.com/pre-commit/mirrors-mypy: v1.16.1 → v1.17.0](https://github.com/pre-commit/mirrors-mypy/compare/v1.16.1...v1.17.0)
- [github.com/astral-sh/ruff-pre-commit: v0.12.3 → v0.12.4](https://github.com/astral-sh/ruff-pre-commit/compare/v0.12.3...v0.12.4)
- [github.com/gitleaks/gitleaks: v8.16.3 → v8.28.0](https://github.com/gitleaks/gitleaks/compare/v8.16.3...v8.28.0)
* Fix ValueError in save_zarr by conditional handling of chunks argument
Fixes error when calling zarr.create by only passing chunks as a list if not None.
* Unfix zarr version in pyproject.toml to test CI job `test_amd`
* Add note in docstring of save_zarr()
* build(deps): bump tj-actions/branch-names from 8.2.1 to 9.0.1
Bumps [tj-actions/branch-names](https://github.com/tj-actions/branch-names) from 8.2.1 to 9.0.1.
- [Release notes](https://github.com/tj-actions/branch-names/releases)
- [Changelog](https://github.com/tj-actions/branch-names/blob/main/HISTORY.md)
- [Commits](https://github.com/tj-actions/branch-names/compare/dde14ac574a8b9b1cedc59a1cf312788af43d8d8...386e117ea34339627a40843704a60a3bc9359234)
---
updated-dependencies:
- dependency-name: tj-actions/branch-names
  dependency-version: 9.0.1
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot]
* build(deps): bump github/codeql-action from 3.29.2 to 3.29.4
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.29.2 to 3.29.4.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/181d5eefc20863364f96762470ba6f862bdef56b...4e828ff8d448a8a6e532957b1811f387a63867e8)
---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 3.29.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
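A hedged sketch of the conditional chunks handling described in the save_zarr fix above (function and variable names are illustrative, not Heat's actual heat/core/io.py code):

```python
import zarr

def create_zarr_array(shape, dtype, store, chunks=None):
    # only forward chunks when the caller actually provided them; passing
    # chunks=None straight through to zarr.create raised the ValueError
    # described in the commit message above
    kwargs = {}
    if chunks is not None:
        kwargs["chunks"] = list(chunks)
    return zarr.create(shape=shape, dtype=dtype, store=store, **kwargs)
```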
Signed-off-by: dependabot[bot]
* Setuptools build fix on pyproject.toml (#1919)
* [pre-commit.ci] pre-commit autoupdate
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.12.4 → v0.12.5](https://github.com/astral-sh/ruff-pre-commit/compare/v0.12.4...v0.12.5)
* Stride argument for convolution (#1865)
* Add: enable stride as input option for convolution
Belongs to Issue 1755
Changes include
- stride as optional input
- stride in docstring (except for output dim computations and as example)
- raise ValueError for incorrect option (stride < 1) and combinations (stride > 1 and mode = 'same')
* Add: Failing stride tests in test_signal.py
- Added correct results for odd and even kernel
- Added calls for convolution with stride=2 except with exception for mode `same`
- Covers is_mps true and false
- Stopped at tests for large distributed signals
* Add: Pass stride tests for standard cases
Belongs to Issue 1755
- edge cases and batch-processing not yet implemented or tested
- Fix test error
- Implement stride to pass tests
- Write convolution_stride.py script for debugging purposes, will be removed later
* Add: test for large signals with stride
Belongs to Issue 1755
- passes
* Add: Test cases until batch
Belongs to issue 1755
- Passes tests
* Add: Failing tests for batch convolution with stride
Belongs to Issue 1755
* Add: batch processing with stride
Belongs to Issue 1755
- Added stride to conv1D call in batch_convolution
- Passes tests
* Remove: Remove script to test convolution with stride
* Update: Docstring of convolution
Belongs to Issue 1755
- Correct stride information
- Add examples
* Fail: Tests fail for mpirun -n3 ...
- Issue with halo size
- Problem marked in code
- Not solved
* Update: Split stride tests
- Different configurations in different test functions
- If process number == 1: all pass
- If multiple processes < 3 (because only tested then), 5 fail
- This needs to be fixed, likely fails due to wrong halo in presence of stride
* Add: Enable stride on distributed arrays but not kernels
Adjust the signal on each rank such that it starts with the next kernel position according to stride
- Added the computation of starting values for each rank
- Avoid duplicates for even and odd kernels
Halo size computation is independent of stride
Added a script for debugging, will be removed
Ideas: generalize it for stride 1 (should work out of the box)
To do: Adapt for distributed kernels
* Add: Distributed kernels and optimized start index computation
Optimized start index computation:
- Remove global index array
- Use lshape map and a simple modulo operation only
Distributed kernels
- Any stride > 1 is a subset of the solution for stride 1
- Not the most efficient but at least functional
* Fix: Improve test coverage
Still missing
- Cuda code bits
- else statement beginning line 229
* Delete: conv_test.py
Test script no longer needed
* Add: Add benchmarks for signal.py
* Add: script to test benchmark
- empty so far
* Fix: Benchmarks
- Fix run_signal_benchmarks
- Add run_signal_benchmarks to main.py
Remove: print statements from convolution in signal.py
* Fix: benchmarks/cb/signal.py
- add () to monitor decorator of perun
* Fix: Rename signal.py to avoid clash with python3.12 signal
- change signal.py to heat_signal.py
* Fix: Adjust array numbers for benchmarking
* Remove: benchmark script from scripts/
* Fix: Benchmarking signal.py and testing signal.py
- Benchmark improved
- removed import pytest from test_signal.py
* Fix: test_signal.py batch convolution with stride
- Stride was not passed as a single value but different values for different ranks
- Solution: Do not randomly create stride values but use fixed values
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix: test_convolution-stride_large_signal_and_kernel_modes
- Remove torch arrays
- Instead work with np.convolve similar to test without stride
* Fix: Add device for empty torch tensors
- Stride test fails for large arrays
- Error message says that the device does not match
- Due to a large stride, a potentially empty tensor may be created -> add device to that tensor
* Fix: torch_device instead of device
- when the empty tensor is created, use torch_device
* Fix: Missed .device in previous commit
* Fix: torch device not accessible
- instead use str(ht_array.device)
- Use signal.device to get correct rank
---------
Co-authored-by: Fabian Hoppe <112093564+mrfh92@users.noreply.github.com>
Co-authored-by: Juan Pedro Gutiérrez Hermosillo Muriedas
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* build(deps): bump tj-actions/branch-names from 9.0.1 to 9.0.2 (#1930)
Bumps [tj-actions/branch-names](https://github.com/tj-actions/branch-names) from 9.0.1 to 9.0.2.
- [Release notes](https://github.com/tj-actions/branch-names/releases)
- [Changelog](https://github.com/tj-actions/branch-names/blob/main/HISTORY.md)
- [Commits](https://github.com/tj-actions/branch-names/compare/386e117ea34339627a40843704a60a3bc9359234...5250492686b253f06fa55861556d1027b067aeb5)
---
updated-dependencies:
- dependency-name: tj-actions/branch-names
  dependency-version: 9.0.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump github/codeql-action from 3.29.4 to 3.29.8 (#1933)
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.29.4 to 3.29.8.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/4e828ff8d448a8a6e532957b1811f387a63867e8...76621b61decf072c1cee8dd1ce2d2a82d33c17ed)
---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 3.29.8
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump docker/login-action from 3.4.0 to 3.5.0 (#1934)
Bumps [docker/login-action](https://github.com/docker/login-action) from 3.4.0 to 3.5.0.
- [Release notes](https://github.com/docker/login-action/releases)
- [Commits](https://github.com/docker/login-action/compare/74a5d142397b4f367a81961eba4e8cd7edddf772...184bdaa0721073962dff0199f1fb9940f07167d1)
---
updated-dependencies:
- dependency-name: docker/login-action
  dependency-version: 3.5.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
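The distributed-kernel item above notes that "any stride > 1 is a subset of the solution for stride 1". A minimal single-process sketch of that idea in NumPy (not Heat's actual implementation, which additionally handles halos and per-rank start indices):

```python
import numpy as np

def strided_convolve(signal, kernel, stride=1):
    # compute the stride-1 'valid' convolution first ...
    full = np.convolve(signal, kernel, mode="valid")
    # ... then the strided result is simply every stride-th entry of it
    return full[::stride]

signal = np.arange(10.0)
kernel = np.array([1.0, 0.5, 0.25])
out = strided_convolve(signal, kernel, stride=2)
assert np.array_equal(out, np.convolve(signal, kernel, mode="valid")[::2])
```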
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [pre-commit.ci] pre-commit autoupdate (#1931)
updates:
- [github.com/pre-commit/mirrors-mypy: v1.17.0 → v1.17.1](https://github.com/pre-commit/mirrors-mypy/compare/v1.17.0...v1.17.1)
- [github.com/astral-sh/ruff-pre-commit: v0.12.5 → v0.12.7](https://github.com/astral-sh/ruff-pre-commit/compare/v0.12.5...v0.12.7)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Juan Pedro Gutiérrez Hermosillo Muriedas
* Update CODE_OF_CONDUCT.md
New email for reporting
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* build(deps): bump github/codeql-action from 3.29.8 to 3.29.9 (#1943)
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.29.8 to 3.29.9.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/76621b61decf072c1cee8dd1ce2d2a82d33c17ed...df559355d593797519d70b90fc8edd5db049e7a2)
---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 3.29.9
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump actions/checkout from 4.2.2 to 5.0.0 (#1942)
Bumps [actions/checkout](https://github.com/actions/checkout) from 4.2.2 to 5.0.0.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/11bd71901bbe5b1630ceea73d27597364c9af683...08c6903cd8c0fde910a37f88322edcfb5dd907a8)
---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: 5.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump korthout/backport-action from 3.2.1 to 3.3.0 (#1944)
Bumps [korthout/backport-action](https://github.com/korthout/backport-action) from 3.2.1 to 3.3.0.
- [Release notes](https://github.com/korthout/backport-action/releases)
- [Commits](https://github.com/korthout/backport-action/compare/0193454f0c5947491d348f33a275c119f30eb736...ca4972adce8039ff995e618f5fc02d1b7961f27a)
---
updated-dependencies:
- dependency-name: korthout/backport-action
  dependency-version: 3.3.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
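Two of the NumPy 2.x test-compatibility bullets earlier in this message concern np.cross() on 2-element vectors, which NumPy 2.0 deprecates. A hedged sketch of both workarounds described there (array values are illustrative, not taken from the actual tests):

```python
import numpy as np

a = np.array([1.0, 2.0], dtype=np.float32)
b = np.array([3.0, 4.0], dtype=np.float32)

# workaround 1: pad a third dimension with zeros, so np.cross operates on
# 3D vectors and emits no DeprecationWarning; the z-component is the
# 2D cross product
a3 = np.append(a, 0.0)
b3 = np.append(b, 0.0)
z = np.cross(a3, b3)[2]

# workaround 2: perform the 2D cross product manually
z_manual = a[0] * b[1] - a[1] * b[0]

assert np.isclose(z, z_manual)
```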
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [pre-commit.ci] pre-commit autoupdate (#1936)
updates:
- [github.com/pre-commit/pre-commit-hooks: v5.0.0 → v6.0.0](https://github.com/pre-commit/pre-commit-hooks/compare/v5.0.0...v6.0.0)
- [github.com/astral-sh/ruff-pre-commit: v0.12.7 → v0.12.8](https://github.com/astral-sh/ruff-pre-commit/compare/v0.12.7...v0.12.8)
- [github.com/shellcheck-py/shellcheck-py: v0.10.0.1 → v0.11.0.1](https://github.com/shellcheck-py/shellcheck-py/compare/v0.10.0.1...v0.11.0.1)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
* Update latest-pytorch-support.yml (#1950)
move to pyproject.toml
* [pre-commit.ci] pre-commit autoupdate (#1949)
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.12.8 → v0.12.9](https://github.com/astral-sh/ruff-pre-commit/compare/v0.12.8...v0.12.9)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
* Bump version to 1.6.0
* Update pytorch image in Dockerfile.release and Dockerfile.source to version
* update coverage, add link to issue search
* add FFT, masked arrays
* Update authors and contributors in CITATION.cff
* fix: typo in release drafter
* Updated Changelog
* changelog highlights
* Update micro version to 0
* Update CHANGELOG.md
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: dependabot[bot]
Signed-off-by: StepSecurity Bot
Co-authored-by: Fabian Hoppe <112093564+mrfh92@users.noreply.github.com>
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
Co-authored-by: jolemse <35252911+jolemse@users.noreply.github.com>
Co-authored-by: Hoppe
Co-authored-by: Juan Pedro Gutiérrez Hermosillo Muriedas
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Marc-Jindra
Co-authored-by: Hakdag97 <72792786+Hakdag97@users.noreply.github.com>
Co-authored-by: Michael Tarnawa <18899420+mtar@users.noreply.github.com>
Co-authored-by: Till Korten
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michael Tarnawa
Co-authored-by: StepSecurity Bot
Co-authored-by: Berkant <51971304+Berkant03@users.noreply.github.com>
Co-authored-by: Lukas Scheib
Co-authored-by: Lukas Scheib <146953413+LScheib@users.noreply.github.com>
Co-authored-by: lolacaro
Co-authored-by: Björn Hagemeier
Co-authored-by: Heat Release Bot <>
---
 .github/ISSUE_TEMPLATE/bug_report.yml | 7 +-
 .github/PULL_REQUEST_TEMPLATE.md | 1 +
 .github/rd-release-config.yml | 84 +-
 .github/workflows/CIBase.yml | 6 +-
 .github/workflows/CISupport.yml | 7 +-
 .github/workflows/CommentPR.yml | 6 +-
 .github/workflows/ReceivePR.yml | 10 +-
 .github/workflows/backport.yml | 9 +-
 .github/workflows/bench_trigger.yml | 7 +-
 .github/workflows/changelog-updater.yml | 55 -
 .github/workflows/ci.yaml | 36 +-
 .github/workflows/codeql.yml | 10 +-
 .github/workflows/create-branch-on-assignment.yml | 4 +-
 .github/workflows/dependency-review.yml | 6 +-
 .github/workflows/docker.yml | 14 +-
 .github/workflows/inactivity.yml | 4 +-
 .github/workflows/increment_version.sh | 4 +-
 .github/workflows/latest-pytorch-support.yml | 16 +-
 .github/workflows/markdown-links-check.yml | 4 +-
 .github/workflows/pytorch-latest-release.yml | 4 +-
 .github/workflows/release-drafter.yml | 13 +-
 .github/workflows/release-prep.yml | 90 +-
 .github/workflows/scorecard.yml | 10 +-
 .gitignore | 1 +
 .perun.ini | 23 +-
 .pre-commit-config.yaml | 47 +-
 .readthedocs.yaml | 8 +-
 CHANGELOG.md | 85 +
 CITATION.cff | 42 +-
 CODE_OF_CONDUCT.md | 2 +-
 README.md | 73 +-
 benchmarks/cb/decomposition.py | 17 +
 benchmarks/cb/heat_signal.py | 85 +
 benchmarks/cb/linalg.py | 98 +
 benchmarks/cb/main.py | 8 +
 coverage_tables.md | 763 ++--
 doc/Makefile | 20 +
 doc/make.bat | 268 +-
 doc/requirements.txt | 5 -
 .../_static}/images/GSoC-Horizontal.svg | 0
 doc/{ => source/_static}/images/bsp.svg | 0
 .../_static}/images/clustering.png | Bin
 .../_static}/images/clustering_kmeans.png | Bin
 doc/{ => source/_static}/images/data.png | Bin
 doc/{ => source/_static}/images/dlr_logo.svg | 0
 doc/{ => source/_static}/images/fzj_logo.svg | 0
 .../_static}/images/hSVD_bench_rank5.png | Bin
 .../_static}/images/hSVD_bench_rank50.png | Bin
 .../_static}/images/hSVD_bench_rank500.png | Bin
 .../_static}/images/heat_split_array.png | Bin
 .../_static}/images/heat_split_array.svg | 0
 .../heatvsdask_strong_smalldata_without.png | Bin
 .../heatvsdask_weak_smalldata_without.png | Bin
 .../_static}/images/helmholtz_logo.svg | 0
 doc/source/_static/images/jsc_logo.png | Bin 0 -> 16766 bytes
 doc/source/_static/images/jupyter.png | Bin 0 -> 22885 bytes
 doc/{ => source/_static}/images/kit_logo.svg | 0
 doc/source/_static/images/local_laptop.png | Bin 0 -> 24793 bytes
 doc/{ => source/_static}/images/logo.png | Bin
 .../_static}/images/logo_emblem.png | Bin
 .../_static}/images/logo_emblem.svg | 0
 .../_static}/images/logo_white.png | Bin
 .../_static}/images/logo_white.svg | 0
 doc/source/_static/images/nhr_verein_logo.jpg | Bin 0 -> 8167 bytes
 doc/source/_static/images/perun_logo.svg | 112 +
 .../_static}/images/split_array.png | Bin
 .../_static}/images/split_array.svg | 0
 .../_static}/images/tutorial_clustering.svg | 0
 .../_static}/images/tutorial_dpnn.svg | 0
 .../_static}/images/tutorial_logo.svg | 0
 .../images/tutorial_split_dndarray.svg | 0
 .../images/weak_scaling_gpu_terrabyte.png | Bin
 doc/source/case_studies.rst | 6 +-
 doc/source/conf.py | 21 +-
 doc/source/index.rst | 2 +-
 doc/source/tutorial_dpnn.rst | 4 -
 .../notebooks/0_setup/0_setup_conda.sh | 15 +
 .../notebooks/0_setup/0_setup_haicore.ipynb | 620 ++++
 .../notebooks/0_setup/0_setup_jsc.ipynb | 80 +-
 .../notebooks/0_setup/0_setup_local.ipynb | 48 +-
 .../notebooks/0_setup/0_setup_pip.sh | 25 +
 doc/source/tutorials/notebooks/1_basics.ipynb | 3165 +++++++++++++++++
 .../tutorials/notebooks/2_internals.ipynb | 1417 ++++++++
 .../notebooks/3_loading_preprocessing.ipynb | 488 +++
 .../notebooks/4_matrix_factorizations.ipynb | 258 +-
 .../tutorials/notebooks/5_clustering.ipynb | 776 ++++
 .../tutorials/notebooks/6_profiling.ipynb | 609 ++++
 .../{ => tutorials}/tutorial_30_minutes.rst | 0
 .../{ => tutorials}/tutorial_clustering.rst | 8 +-
 .../tutorials/tutorial_notebook_gallery.rst | 25 +
 .../tutorial_parallel_computation.rst | 4 +-
 doc/source/{ => tutorials}/tutorials.rst | 27 +-
 docker/Dockerfile.release | 2 +-
 docker/Dockerfile.source | 2 +-
 docker/scripts/build_and_push.sh | 32 +-
 docker/scripts/install_print_test.sh | 4 +-
 .../test_nvidia_image_haicore_enroot.sh | 2 +-
 heat/classification/kneighborsclassifier.py | 3 +-
 heat/cli.py | 54 +
 heat/cluster/batchparallelclustering.py | 7 +-
 heat/cluster/kmedians.py | 2 +-
 heat/cluster/kmedoids.py | 6 +-
 .../tests/test_batchparallelclustering.py | 14 +-
 heat/cluster/tests/test_kmeans.py | 9 +-
 heat/cluster/tests/test_kmedians.py | 7 +-
 heat/cluster/tests/test_kmedoids.py | 7 +-
 heat/cluster/tests/test_spectral.py | 85 +-
 heat/core/_operations.py | 17 +-
 heat/core/arithmetics.py | 145 +-
 heat/core/base.py | 8 +-
 heat/core/communication.py | 263 +-
 heat/core/complex_math.py | 12 +-
 heat/core/constants.py | 2 +-
 heat/core/devices.py | 30 +-
 heat/core/dndarray.py | 126 +-
 heat/core/exponential.py | 6 +-
 heat/core/factories.py | 101 +-
 heat/core/indexing.py | 4 +-
 heat/core/io.py | 415 ++-
 heat/core/linalg/__init__.py | 2 +
 heat/core/linalg/basics.py | 178 +-
 heat/core/linalg/eigh.py | 309 ++
 heat/core/linalg/polar.py | 370 ++
 heat/core/linalg/qr.py | 372 +-
 heat/core/linalg/solver.py | 13 +-
 heat/core/linalg/svd.py | 215 +-
 heat/core/linalg/svdtools.py | 525 ++-
 heat/core/linalg/tests/test_basics.py | 321 +-
 heat/core/linalg/tests/test_eigh.py | 55 +
 heat/core/linalg/tests/test_polar.py | 117 +
 heat/core/linalg/tests/test_qr.py | 67 +-
 heat/core/linalg/tests/test_solver.py | 116 +-
 heat/core/linalg/tests/test_svd.py | 72 +-
 heat/core/linalg/tests/test_svdtools.py | 420 ++-
 heat/core/logical.py | 34 +-
 heat/core/manipulations.py | 251 +-
 heat/core/memory.py | 6 +-
 heat/core/printing.py | 8 +
 heat/core/random.py | 52 +-
 heat/core/relational.py | 46 +-
 heat/core/rounding.py | 20 +-
 heat/core/sanitation.py | 24 +-
 heat/core/signal.py | 111 +-
 heat/core/statistics.py | 96 +-
 heat/core/stride_tricks.py | 44 +-
 heat/core/tests/test_arithmetics.py | 168 +-
 heat/core/tests/test_communication.py | 50 +-
 heat/core/tests/test_complex_math.py | 401 ++-
 heat/core/tests/test_dndarray.py | 545 +--
 heat/core/tests/test_exponential.py | 283 +-
 heat/core/tests/test_factories.py | 174 +-
 heat/core/tests/test_io.py | 322 +-
 heat/core/tests/test_logical.py | 22 +-
 heat/core/tests/test_manipulations.py | 316 +-
 heat/core/tests/test_printing.py | 15 +-
 heat/core/tests/test_random.py | 125 +-
 heat/core/tests/test_rounding.py | 264 +-
 heat/core/tests/test_sanitation.py | 11 +
 heat/core/tests/test_signal.py | 364 +-
 heat/core/tests/test_statistics.py | 171 +-
 heat/core/tests/test_suites/basic_test.py | 132 +-
 heat/core/tests/test_tiling.py | 8 +
 heat/core/tests/test_trigonometrics.py | 392 +-
 heat/core/tests/test_types.py | 61 +-
 heat/core/tests/test_vmap.py | 79 +-
 heat/core/tiling.py | 21 +-
 heat/core/trigonometrics.py | 22 +-
 heat/core/types.py | 41 +-
 heat/core/version.py | 8 +-
 heat/core/vmap.py | 3 +-
 heat/decomposition/__init__.py | 1 +
 heat/decomposition/dmd.py | 715 ++++
 heat/decomposition/pca.py | 210 +-
 heat/decomposition/tests/test_dmd.py | 589 +++
 heat/decomposition/tests/test_pca.py | 150 +-
 heat/fft/tests/test_fft.py | 135 +-
 heat/naive_bayes/gaussianNB.py | 14 +-
 heat/naive_bayes/tests/test_gaussiannb.py | 12 +-
 heat/nn/__init__.py | 2 +-
 heat/nn/data_parallel.py | 4 +-
 heat/optim/__init__.py | 2 +-
 heat/optim/dp_optimizer.py | 4 +-
 heat/preprocessing/preprocessing.py | 2 +-
 heat/py.typed | 0
 heat/regression/lasso.py | 2 +-
 heat/sparse/factories.py | 2 +-
 heat/sparse/manipulations.py | 6 +-
 heat/sparse/tests/test_arithmetics_csr.py | 9 +-
 heat/sparse/tests/test_dcscmatrix.py | 9 +-
 heat/sparse/tests/test_dcsrmatrix.py | 10 +-
 heat/sparse/tests/test_factories.py | 9 +-
 heat/sparse/tests/test_manipulations.py | 9 +-
 heat/spatial/distance.py | 2 +-
 heat/spatial/tests/test_distances.py | 18 +-
 heat/tests/test_cli.py | 56 +
 heat/utils/data/_utils.py | 1 +
 heat/utils/data/datatools.py | 2 +-
 heat/utils/data/partial_dataset.py | 1 +
 heat/utils/data/spherical.py | 4 +-
 heat/utils/data/tests/test_matrixgallery.py | 15 +-
 pyproject.toml | 175 +-
 scripts/numpy_coverage_tables.py | 86 +-
 setup.cfg | 14 -
 setup.py | 52 -
 tutorials/hpc/2_basics.ipynb | 1 -
 tutorials/hpc/3_internals.ipynb | 1 -
 tutorials/hpc/4_loading_preprocessing.ipynb | 1 -
 tutorials/hpc/5_matrix_factorizations.ipynb | 1 -
 tutorials/hpc/6_clustering.ipynb | 1 -
 tutorials/local/2_basics.ipynb | 780 ----
 tutorials/local/3_internals.ipynb | 301 --
 tutorials/local/4_loading_preprocessing.ipynb | 209 --
 tutorials/local/6_clustering.ipynb | 787 ----
 .../hpc/01_basics/01_basics_dndarrays.py | 25 +
 .../hpc/01_basics/02_basics_datatypes.py | 22 +
 .../hpc/01_basics/03_basics_operations.py | 30 +
 .../hpc/01_basics/04_basics_indexing.py | 13 +
 .../hpc/01_basics/05_basics_broadcast.py | 14 +
 .../scripts/hpc/01_basics/06_basics_gpu.py | 39 +
 .../hpc/01_basics/07_basics_distributed.py | 70 +
 .../08_basics_distributed_operations.py | 24 +
 .../01_basics/09_basics_distributed_matmul.py | 55 +
 .../hpc/01_basics/10_interoperability.py | 26 +
 .../scripts/hpc/01_basics/11_internals_1.py | 44 +
 .../scripts/hpc/01_basics/12_internals_2.py | 71 +
 .../hpc/02_loading_preprocessing/01_IO.py | 40 +
 .../02_preprocessing.py | 69 +
 .../hpc/02_loading_preprocessing/iris.csv | 150 +
 .../matrix_factorizations.py | 99 +
 .../scripts/hpc/04_clustering/clustering.py | 68 +
 .../hpc/05_your_turn/now_its_your_turn.py | 44 +
 tutorials/scripts/hpc/README.md | 17 +
 tutorials/scripts/hpc/slurm_script_cpu.sh | 12 +
 tutorials/scripts/hpc/slurm_script_gpu.sh | 13 +
 234 files changed, 18199 insertions(+), 6019 deletions(-)
 delete mode 100644 .github/workflows/changelog-updater.yml
 create mode 100644 benchmarks/cb/decomposition.py
 create mode 100644 benchmarks/cb/heat_signal.py
 create mode 100644 doc/Makefile
 delete mode 100644 doc/requirements.txt
 rename doc/{ => source/_static}/images/GSoC-Horizontal.svg (100%)
 rename doc/{ => source/_static}/images/bsp.svg (100%)
 rename doc/{ => source/_static}/images/clustering.png (100%)
 rename doc/{ => source/_static}/images/clustering_kmeans.png (100%)
 rename doc/{ => source/_static}/images/data.png (100%)
 rename doc/{ => source/_static}/images/dlr_logo.svg (100%)
 rename doc/{ => source/_static}/images/fzj_logo.svg (100%)
 rename doc/{ => source/_static}/images/hSVD_bench_rank5.png (100%)
 rename doc/{ => source/_static}/images/hSVD_bench_rank50.png (100%)
 rename doc/{ => source/_static}/images/hSVD_bench_rank500.png (100%)
 rename doc/{ => source/_static}/images/heat_split_array.png (100%)
 rename doc/{ => source/_static}/images/heat_split_array.svg (100%)
 rename doc/{ => source/_static}/images/heatvsdask_strong_smalldata_without.png (100%)
 rename doc/{ => source/_static}/images/heatvsdask_weak_smalldata_without.png (100%)
 rename doc/{ => source/_static}/images/helmholtz_logo.svg (100%)
 create mode 100644 doc/source/_static/images/jsc_logo.png
 create mode 100644 doc/source/_static/images/jupyter.png
 rename doc/{ => source/_static}/images/kit_logo.svg (100%)
 create mode 100644 doc/source/_static/images/local_laptop.png
 rename doc/{ => source/_static}/images/logo.png (100%)
 rename doc/{ => source/_static}/images/logo_emblem.png (100%)
 rename doc/{ => source/_static}/images/logo_emblem.svg (100%)
 rename doc/{ => source/_static}/images/logo_white.png (100%)
 rename doc/{ => source/_static}/images/logo_white.svg (100%)
 create mode 100644 doc/source/_static/images/nhr_verein_logo.jpg
 create mode 100644 doc/source/_static/images/perun_logo.svg
 rename doc/{ => source/_static}/images/split_array.png (100%)
 rename doc/{ => source/_static}/images/split_array.svg (100%)
 rename doc/{ => source/_static}/images/tutorial_clustering.svg (100%)
 rename doc/{ => source/_static}/images/tutorial_dpnn.svg (100%)
 rename doc/{ => source/_static}/images/tutorial_logo.svg (100%)
 rename doc/{ => source/_static}/images/tutorial_split_dndarray.svg (100%)
 rename doc/{ => source/_static}/images/weak_scaling_gpu_terrabyte.png (100%)
 delete mode 100644 doc/source/tutorial_dpnn.rst
 create mode 100755 doc/source/tutorials/notebooks/0_setup/0_setup_conda.sh
 create mode 100644 doc/source/tutorials/notebooks/0_setup/0_setup_haicore.ipynb
 rename tutorials/hpc/1_intro.ipynb => doc/source/tutorials/notebooks/0_setup/0_setup_jsc.ipynb (78%)
 rename tutorials/local/1_intro.ipynb => doc/source/tutorials/notebooks/0_setup/0_setup_local.ipynb (79%)
 create mode 100755 doc/source/tutorials/notebooks/0_setup/0_setup_pip.sh
 create mode 100644 doc/source/tutorials/notebooks/1_basics.ipynb
 create mode 100644 doc/source/tutorials/notebooks/2_internals.ipynb
 create mode 100644 doc/source/tutorials/notebooks/3_loading_preprocessing.ipynb
 rename tutorials/local/5_matrix_factorizations.ipynb => doc/source/tutorials/notebooks/4_matrix_factorizations.ipynb (56%)
 create mode 100644 doc/source/tutorials/notebooks/5_clustering.ipynb
 create mode 100644 doc/source/tutorials/notebooks/6_profiling.ipynb
 rename doc/source/{ => tutorials}/tutorial_30_minutes.rst (100%)
 rename doc/source/{ => tutorials}/tutorial_clustering.rst (97%)
 create mode 100644 doc/source/tutorials/tutorial_notebook_gallery.rst
 rename doc/source/{ => tutorials}/tutorial_parallel_computation.rst (99%)
 rename doc/source/{ => tutorials}/tutorials.rst (68%)
 create mode 100644 heat/cli.py
 create mode 100644 heat/core/linalg/eigh.py
 create mode 100644 heat/core/linalg/polar.py
 create mode 100644 heat/core/linalg/tests/test_eigh.py
 create mode 100644 heat/core/linalg/tests/test_polar.py
 create mode 100644 heat/decomposition/dmd.py
 create mode 100644 heat/decomposition/tests/test_dmd.py
 create mode 100644 heat/py.typed
 create mode 100644 heat/tests/test_cli.py
 delete mode 100644 setup.cfg
 delete mode 100644 setup.py
 delete mode 120000 tutorials/hpc/2_basics.ipynb
 delete mode 120000 tutorials/hpc/3_internals.ipynb
 delete mode 120000 tutorials/hpc/4_loading_preprocessing.ipynb
 delete mode 120000 tutorials/hpc/5_matrix_factorizations.ipynb
 delete mode 120000 tutorials/hpc/6_clustering.ipynb
 delete mode 100644 tutorials/local/2_basics.ipynb
 delete mode 100644 tutorials/local/3_internals.ipynb
 delete mode 100644 tutorials/local/4_loading_preprocessing.ipynb
 delete mode 100644 tutorials/local/6_clustering.ipynb
 create mode 100644 tutorials/scripts/hpc/01_basics/01_basics_dndarrays.py
 create mode 100644 tutorials/scripts/hpc/01_basics/02_basics_datatypes.py
 create mode 100644 tutorials/scripts/hpc/01_basics/03_basics_operations.py
 create mode 100644 tutorials/scripts/hpc/01_basics/04_basics_indexing.py
 create mode 100644 tutorials/scripts/hpc/01_basics/05_basics_broadcast.py
 create mode 100644 tutorials/scripts/hpc/01_basics/06_basics_gpu.py
 create mode 100644 tutorials/scripts/hpc/01_basics/07_basics_distributed.py
 create mode 100644 tutorials/scripts/hpc/01_basics/08_basics_distributed_operations.py
 create mode 100644 tutorials/scripts/hpc/01_basics/09_basics_distributed_matmul.py
 create mode 100644 tutorials/scripts/hpc/01_basics/10_interoperability.py
 create mode 100644 tutorials/scripts/hpc/01_basics/11_internals_1.py
 create mode 100644 tutorials/scripts/hpc/01_basics/12_internals_2.py
 create mode 100644 tutorials/scripts/hpc/02_loading_preprocessing/01_IO.py
 create mode 100644 tutorials/scripts/hpc/02_loading_preprocessing/02_preprocessing.py
 create mode 100644 tutorials/scripts/hpc/02_loading_preprocessing/iris.csv
 create mode 100644 tutorials/scripts/hpc/03_matrix_factorizations/matrix_factorizations.py
 create mode 100644 tutorials/scripts/hpc/04_clustering/clustering.py
 create mode 100644 tutorials/scripts/hpc/05_your_turn/now_its_your_turn.py
 create mode 100644 tutorials/scripts/hpc/README.md
 create mode 100644 tutorials/scripts/hpc/slurm_script_cpu.sh
 create mode 100644 tutorials/scripts/hpc/slurm_script_gpu.sh

diff --git a/.github/ISSUE_TEMPLATE/bug_report.yml b/.github/ISSUE_TEMPLATE/bug_report.yml
index 5ef72f6bd1..b70c72b57e 100644
--- a/.github/ISSUE_TEMPLATE/bug_report.yml
+++ b/.github/ISSUE_TEMPLATE/bug_report.yml
@@ -35,6 +35,7 @@ body:
       options:
         - main (development branch)
         - 1.5.x
+        - other
     validations:
       required: true
   - type: dropdown
@@ -43,16 +44,18 @@ body:
       label: Python version
       description: What Python version?
       options:
+        - 3.13
         - 3.12
         - 3.11
-        - "3.10"
-        - 3.9
+        - '3.10'
   - type: dropdown
     id: pytorch-version
     attributes:
       label: PyTorch version
      description: What PyTorch version?
       options:
+        - 2.7
+        - 2.6
         - 2.5
         - 2.4
         - 2.3
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
index b7ac0c46da..9fec41ba22 100644
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -6,6 +6,7 @@
 - Implementation:
   - [ ] unit tests: all split configurations tested
   - [ ] unit tests: multiple dtypes tested
+  - [ ] **NEW** unit tests: MPS tested (1 MPI process, 1 GPU)
   - [ ] benchmarks: created for new functionality
   - [ ] benchmarks: performance improved or maintained
   - [ ] documentation updated where needed
diff --git a/.github/rd-release-config.yml b/.github/rd-release-config.yml
index 6f1d103d27..a45fa74a14 100644
--- a/.github/rd-release-config.yml
+++ b/.github/rd-release-config.yml
@@ -94,11 +94,12 @@ autolabeler:
   - label: 'docker'
     files:
       - 'docker/**/*'
-  - label: 'backport release'
+  - label: 'backport stable'
     title:
       - '/bug/i'
       - '/resolve/i'
       - '/fix/i'
+      - '/\[pre\-commit\.ci\]/i'
     branch:
       - '/bug/i'
       - '/fix/i'
@@ -113,9 +114,6 @@ autolabeler:
   - label: 'interoperability'
     title:
       - '/Support.+/'
-  - label: 'testing'
-    files:
-      - '**/tests/**/*'
   - label: 'classification'
     files:
       - 'heat/classification/**/*'
@@ -164,6 +162,84 @@ autolabeler:
   - label: 'linalg'
     files:
       - 'heat/core/linalg/**/*'
+  - label: 'arithmetics'
+    files:
+      - 'heat/core/arithmetics.py'
+  - label: 'base'
+    files:
+      - 'heat/core/base.py'
+  - label: 'communication'
+    files:
+      - 'heat/core/communication.py'
+  - label: 'complex_math'
+    files:
+      - 'heat/core/complex_math.py'
+  - label: 'constants'
+    files:
+      - 'heat/core/constants.py'
+  - label: 'devices'
+    files:
+      - 'heat/core/devices.py'
+  - label: 'dndarray'
+    files:
+      - 'heat/core/dndarray.py'
+  - label: 'exponential'
+    files:
+      - 'heat/core/exponential.py'
+  - label: 'indexing'
+    files:
+      - 'heat/core/indexing.py'
+  - label: 'io'
+    files:
+      - 'heat/core/io.py'
+  - label: 'logical'
+    files:
+      - 'heat/core/logical.py'
+  - label: 'manipulations'
+    files:
+      - 'heat/core/manipulations.py'
+  - label: 'memory'
+    files:
+      - 'heat/core/memory.py'
+  - label: 'printing'
+    files:
+      - 'heat/core/printing.py'
+  - label: 'random'
+ files: + - 'heat/core/random.py' + - label: 'relational' + files: + - 'heat/core/relational.py' + - label: 'rounding' + files: + - 'heat/core/rounding.py' + - label: 'sanitation' + files: + - 'heat/core/sanitation.py' + - label: 'signal' + files: + - 'heat/core/signal.py' + - label: 'statistics' + files: + - 'heat/core/statistics.py' + - label: 'stride_tricks' + files: + - 'heat/core/stride_tricks.py' + - label: 'tiling' + files: + - 'heat/core/tiling.py' + - label: 'trigonometrics' + files: + - 'heat/core/trigonometrics.py' + - label: 'types' + files: + - 'heat/core/types.py' + - label: 'version' + files: + - 'heat/core/version.py' + - label: 'vmap' + files: + - 'heat/core/vmap.py' change-template: '- #$NUMBER $TITLE (by @$AUTHOR)' category-template: '### $TITLE' diff --git a/.github/workflows/CIBase.yml b/.github/workflows/CIBase.yml index 7f97485236..1454e239e5 100644 --- a/.github/workflows/CIBase.yml +++ b/.github/workflows/CIBase.yml @@ -4,7 +4,7 @@ on: push: branches: - 'main' - - 'release/**' + - 'stable' permissions: contents: read @@ -14,13 +14,13 @@ jobs: runs-on: ubuntu-latest steps: - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit - name: Get branch names id: branch-names - uses: tj-actions/branch-names@v8 + uses: tj-actions/branch-names@5250492686b253f06fa55861556d1027b067aeb5 # v9.0.2 - name: 'start test' run: | curl -s -X POST \ diff --git a/.github/workflows/CISupport.yml b/.github/workflows/CISupport.yml index 7f06369842..4f65e0186f 100644 --- a/.github/workflows/CISupport.yml +++ b/.github/workflows/CISupport.yml @@ -9,9 +9,14 @@ jobs: starter: runs-on: ubuntu-latest steps: + - name: Harden the runner (Audit all outbound calls) + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 + with: + egress-policy: audit + - name: Get branch names id: branch-names - uses: tj-actions/branch-names@v8 + uses: tj-actions/branch-names@5250492686b253f06fa55861556d1027b067aeb5 # v9.0.2 - name: 'start test' run: | curl -s -X POST \ diff --git a/.github/workflows/CommentPR.yml b/.github/workflows/CommentPR.yml index b371663ca4..c156bc28ee 100644 --- a/.github/workflows/CommentPR.yml +++ b/.github/workflows/CommentPR.yml @@ -16,7 +16,7 @@ jobs: PR_NR: ${{ steps.step1.outputs.test }} steps: - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit @@ -65,11 +65,11 @@ jobs: runs-on: ubuntu-latest steps: - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit - - uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7 + - uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0 - name: 'Trigger Workflow' run: | diff --git a/.github/workflows/ReceivePR.yml b/.github/workflows/ReceivePR.yml index 6c26e28266..82b27c8989 100644 --- a/.github/workflows/ReceivePR.yml +++ b/.github/workflows/ReceivePR.yml @@ -13,16 +13,16 @@ jobs: steps: - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: 
egress-policy: audit - - uses: actions/checkout@v4.1.7 + - uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0 - name: Use Python - uses: actions/setup-python@f677139bbe7f9c59b41e40162b753c062f5d49a3 # v5.2.0 + uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0 with: - python-version: 3.9 + python-version: '3.10' architecture: x64 - name: Setup MPI @@ -42,7 +42,7 @@ jobs: run: | mkdir -p ./pr echo $PR_NUMBER > ./pr/pr_number - - uses: actions/upload-artifact@50769540e7f4bd5e21e526ee35c689e35e0d6874 # v4.4.0 + - uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2 with: name: pr_number path: pr/ diff --git a/.github/workflows/backport.yml b/.github/workflows/backport.yml index 253eb39ee7..ab09a22d03 100644 --- a/.github/workflows/backport.yml +++ b/.github/workflows/backport.yml @@ -12,8 +12,13 @@ jobs: # Don't run on closed unmerged pull requests if: github.event.pull_request.merged steps: - - uses: actions/checkout@v4.1.7 + - name: Harden the runner (Audit all outbound calls) + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 + with: + egress-policy: audit + + - uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0 - name: Create backport pull requests - uses: korthout/backport-action@v3 + uses: korthout/backport-action@ca4972adce8039ff995e618f5fc02d1b7961f27a # v3.3.0 with: merge_commits: 'skip' diff --git a/.github/workflows/bench_trigger.yml b/.github/workflows/bench_trigger.yml index 53fa95477f..cdee9d3af8 100644 --- a/.github/workflows/bench_trigger.yml +++ b/.github/workflows/bench_trigger.yml @@ -6,18 +6,21 @@ on: pull_request: types: [synchronize] +permissions: + contents: read + jobs: trigger-benchmark: name: Trigger Benchmarks runs-on: ubuntu-latest steps: - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit - name: Checkout - uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7 + uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0 - name: Trigger benchmarks (PR) id: setup_pr if: contains(github.event.pull_request.labels.*.name, 'benchmark PR') diff --git a/.github/workflows/changelog-updater.yml b/.github/workflows/changelog-updater.yml deleted file mode 100644 index 1739b9a876..0000000000 --- a/.github/workflows/changelog-updater.yml +++ /dev/null @@ -1,55 +0,0 @@ -name: 'Update Changelog' - -on: - release: - types: [released] - -permissions: - contents: read - -jobs: - update-changelog: - permissions: - contents: write # for stefanzweifel/git-auto-commit-action to push code in repo - runs-on: ubuntu-latest - steps: - - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 - with: - egress-policy: audit - - - name: Checkout code - uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7 - with: - repository: helmholtz-analytics/heat - ref: ${{ github.event.release.target_commitish }} - - name: Update Changelog - run: | - echo $RELEASE_TITLE > cl_title.md - echo "$RELEASE_BODY" > cl_new_body.md - echo "" > newline.txt - cat cl_title.md newline.txt cl_new_body.md newline.txt CHANGELOG.md > tmp - mv tmp CHANGELOG.md - rm cl_title.md - rm cl_new_body.md - rm newline.txt - cat CHANGELOG.md - env: - RELEASE_TITLE: ${{ format('# {0} - {1}', github.event.release.tag_name, 
github.event.release.name) }} - RELEASE_BODY: ${{ github.event.release.body }} - - name: Create PR - uses: peter-evans/create-pull-request@c5a7806660adbe173f04e3e038b0ccdcd758773c # v6.1.0 - with: - base: main - branch: post-release-changelog-update - delete-branch: true - token: ${{ secrets.GITHUB_TOKEN }} - commit-message: Update Changelog post release - title: Update Changelog post release - body: | - This PR updates the changelog post release. - - Changed files should include an updated CHANGELOG.md containing the release notes from the latest release. - - reviewers: ClaudiaComito, mtar, JuanPedroGHM - labels: chore, github_actions diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml index fe6ceb8d25..84cb3d7d80 100644 --- a/.github/workflows/ci.yaml +++ b/.github/workflows/ci.yaml @@ -12,44 +12,54 @@ jobs: fail-fast: false matrix: py-version: - - 3.9 - '3.10' - 3.11 - 3.12 + - 3.13 mpi: [ 'openmpi' ] - install-options: [ '.', '.[hdf5,netcdf,pandas]' ] + install-options: [ '.', '.[hdf5,netcdf,pandas,zarr]' ] pytorch-version: - - 'torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2' - - 'torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2' - - 'torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2' + - 'numpy==1.26 torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2' + - 'numpy==1.26 torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2' - 'torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1' - 'torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1' - 'torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1' - 'torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0' + - 'torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1' exclude: + - py-version: '3.13' + pytorch-version: 'numpy==1.26 torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2' + - py-version: '3.13' + pytorch-version: 'numpy==1.26 torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2' + - py-version: '3.13' + pytorch-version: 'torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1' + - py-version: '3.13' + pytorch-version: 'torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1' + - py-version: '3.13' + pytorch-version: 'torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1' - py-version: '3.12' - pytorch-version: 'torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2' + pytorch-version: 'numpy==1.26 torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2' - py-version: '3.12' - pytorch-version: 'torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2' - - py-version: '3.12' - pytorch-version: 'torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2' + pytorch-version: 'numpy==1.26 torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2' + - py-version: '3.10' + install-options: '.[hdf5,netcdf,pandas,zarr]' name: Python ${{ matrix.py-version }} with ${{ matrix.pytorch-version }}; options ${{ matrix.install-options }} steps: - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit - name: Checkout - uses: actions/checkout@v4.1.7 + uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0 - name: Setup MPI - uses: mpi4py/setup-mpi@v1.2.0 + uses: mpi4py/setup-mpi@3969f247e8fceef153418744f9d9ee6fdaeda29f # v1.2.0 with: mpi: ${{ matrix.mpi }} - name: Use Python ${{ matrix.py-version }} - uses: actions/setup-python@f677139bbe7f9c59b41e40162b753c062f5d49a3 # v5.2.0 + uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0 with: python-version: ${{ matrix.py-version }} architecture: 
x64 diff --git a/.github/workflows/codeql.yml b/.github/workflows/codeql.yml index 28bd3fbb72..9ad611e838 100644 --- a/.github/workflows/codeql.yml +++ b/.github/workflows/codeql.yml @@ -41,16 +41,16 @@ jobs: steps: - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit - name: Checkout repository - uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7 + uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0 # Initializes the CodeQL tools for scanning. - name: Initialize CodeQL - uses: github/codeql-action/init@4dd16135b69a43b6c8efb853346f8437d92d3c93 # v3.26.6 + uses: github/codeql-action/init@df559355d593797519d70b90fc8edd5db049e7a2 # v3.29.9 with: languages: ${{ matrix.language }} # If you wish to specify custom queries, you can do so here or in a config file. @@ -60,7 +60,7 @@ jobs: # Autobuild attempts to build any compiled languages (C/C++, C#, or Java). # If this step fails, then you should remove it and run the build manually (see below) - name: Autobuild - uses: github/codeql-action/autobuild@4dd16135b69a43b6c8efb853346f8437d92d3c93 # v3.26.6 + uses: github/codeql-action/autobuild@df559355d593797519d70b90fc8edd5db049e7a2 # v3.29.9 # ℹ️ Command-line programs to run using the OS shell. # 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun @@ -73,6 +73,6 @@ jobs: # ./location_of_script_within_repo/buildscript.sh - name: Perform CodeQL Analysis - uses: github/codeql-action/analyze@4dd16135b69a43b6c8efb853346f8437d92d3c93 # v3.26.6 + uses: github/codeql-action/analyze@df559355d593797519d70b90fc8edd5db049e7a2 # v3.29.9 with: category: "/language:${{matrix.language}}" diff --git a/.github/workflows/create-branch-on-assignment.yml b/.github/workflows/create-branch-on-assignment.yml index 3e5abc9add..87ee737027 100644 --- a/.github/workflows/create-branch-on-assignment.yml +++ b/.github/workflows/create-branch-on-assignment.yml @@ -11,11 +11,11 @@ jobs: runs-on: ubuntu-latest steps: - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit - name: Create Issue Branch - uses: robvanderleek/create-issue-branch@6bb28dd55d6790ee022ca0de60deca378e628ab3 # main + uses: robvanderleek/create-issue-branch@dfe19372d9a9198999c0fd8a81f0dbe00951afd9 # main env: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} diff --git a/.github/workflows/dependency-review.yml b/.github/workflows/dependency-review.yml index bf2dcfbae9..3eb2f162ad 100644 --- a/.github/workflows/dependency-review.yml +++ b/.github/workflows/dependency-review.yml @@ -17,11 +17,11 @@ jobs: runs-on: ubuntu-latest steps: - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit - name: 'Checkout Repository' - uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7 + uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0 - name: 'Dependency Review' - uses: actions/dependency-review-action@5a2ce3f5b92ee19cbb1541a4984c76d921601d7c # v4.3.4 + uses: actions/dependency-review-action@da24556b548a50705dd671f47852072ea4c105d9 # v4.7.1 diff 
--git a/.github/workflows/docker.yml b/.github/workflows/docker.yml index 8327f935d0..49922bf604 100644 --- a/.github/workflows/docker.yml +++ b/.github/workflows/docker.yml @@ -25,31 +25,31 @@ jobs: runs-on: ubuntu-latest steps: - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit - name: Checkout - uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7 + uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0 - name: Set up QEMU - uses: docker/setup-qemu-action@49b3bc8e6bdd4a60e6116a5414239cba5943d3cf # v3.2.0 + uses: docker/setup-qemu-action@29109295f81e9208d7d86ff1c6c12d2833863392 # v3.6.0 - name: Set up Docker Buildx - uses: docker/setup-buildx-action@988b5a0280414f521da01fcc63a27aeeb4b104db # v3.6.1 + uses: docker/setup-buildx-action@e468171a9de216ec08956ac3ada2f0791b6bd435 # v3.11.1 with: driver: docker - name: Login to GitHub Container Registry - uses: docker/login-action@9780b0c442fbb1117ed29e0efdff1e18412f7567 # v3.3.0 + uses: docker/login-action@184bdaa0721073962dff0199f1fb9940f07167d1 # v3.5.0 with: registry: ghcr.io username: ${{ github.repository_owner }} password: ${{ secrets.GITHUB_TOKEN }} - name: Build - uses: docker/build-push-action@5cd11c3a4ced054e52742c5fd54dca954e0edd85 # v6.7.0 + uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 # v6.18.0 with: file: docker/Dockerfile.release build-args: | @@ -65,7 +65,7 @@ jobs: docker run -v `pwd`:`pwd` -w `pwd` --rm test_${{ inputs.name }} pytest - name: Build and push - uses: docker/build-push-action@5cd11c3a4ced054e52742c5fd54dca954e0edd85 # v6.7.0 + uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 # v6.18.0 with: file: docker/Dockerfile.release build-args: | diff --git a/.github/workflows/inactivity.yml b/.github/workflows/inactivity.yml index ef96defa50..636bc7de18 100644 --- a/.github/workflows/inactivity.yml +++ b/.github/workflows/inactivity.yml @@ -14,11 +14,11 @@ jobs: pull-requests: write steps: - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit - - uses: actions/stale@28ca1036281a5e5922ead5184a1bbf96e5fc984e # v9.0.0 + - uses: actions/stale@5bef64f19d7facfb25b37b414482c7164d639639 # v9.1.0 with: days-before-issue-stale: 60 days-before-issue-close: 60 diff --git a/.github/workflows/increment_version.sh b/.github/workflows/increment_version.sh index cfc6481460..8c7b417170 100755 --- a/.github/workflows/increment_version.sh +++ b/.github/workflows/increment_version.sh @@ -2,6 +2,6 @@ version=$1 # assume version is passed as an argument IFS='.' 
read -r -a parts <<< "$version" # split by dots last_index=$(( ${#parts[@]} - 1 )) # get last index -parts[$last_index]=$(( ${parts[$last_index]} + 1 )) # increment last part +parts[last_index]=$(( parts[last_index] + 1 )) # increment last part new_version=$(IFS=.; echo "${parts[*]}") # join by dots -echo $new_version # print new version +echo "$new_version" # print new version diff --git a/.github/workflows/latest-pytorch-support.yml b/.github/workflows/latest-pytorch-support.yml index 0b2060d302..916ca2cf9c 100644 --- a/.github/workflows/latest-pytorch-support.yml +++ b/.github/workflows/latest-pytorch-support.yml @@ -18,11 +18,11 @@ jobs: runs-on: ubuntu-latest steps: - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit - - uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7 + - uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0 - uses: JasonEtco/create-an-issue@1b14a70e4d8dc185e5cc76d3bec9eab20257b2c5 # v2.9.2 env: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} @@ -34,23 +34,23 @@ jobs: update_existing: true search_existing: open - name: Check out new branch - uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7 + uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0 with: token: ${{ secrets.GITHUB_TOKEN }} ref: '${{ inputs.working_branch }}' - name: Set env variables run: | - echo "previous_pytorch=$(grep 'torch>=' setup.py | awk -F '<' '{print $2}' | tr -d '",')" >> $GITHUB_ENV + echo "previous_pytorch=$(grep 'torch~=' pyproject.toml | awk -F '<' '{print $2}' | tr -d '",')" >> $GITHUB_ENV echo "new_pytorch=$(<.github/pytorch-release-versions/pytorch-latest.txt)" >> $GITHUB_ENV - name: Increment version run: | chmod +x .github/workflows/increment_version.sh echo "setup_pytorch=$(.github/workflows/increment_version.sh ${{ env.new_pytorch }})" >> $GITHUB_ENV - - name: Update setup.py + - name: Update pyproject.toml run: | - sed -i '/torch>=/ s/'"${{ env.previous_pytorch }}"'/'"${{ env.setup_pytorch }}"'/g' setup.py + sed -i '/torch~=/ s/'"${{ env.previous_pytorch }}"'/'"${{ env.setup_pytorch }}"'/g' pyproject.toml - name: Create PR from branch - uses: peter-evans/create-pull-request@c5a7806660adbe173f04e3e038b0ccdcd758773c # v6.1.0 + uses: peter-evans/create-pull-request@271a8d0340265f705b14b6d32b9829c1cb33d45e # v7.0.8 with: base: ${{ inputs.base_branch }} branch: ${{ inputs.working_branch }} @@ -62,7 +62,7 @@ jobs: Run tests on latest PyTorch release Issue/s resolved: #${{ steps.create-issue.outputs.number }} TODO: - - [ ] update `.github/workflows/ci.yaml` to include `n-1` Pytorch version + - [ ] update `.github/workflows/ci.yaml` to include latest Pytorch version - [ ] update Nvidia and AMD Docker images on gitlab CI Auto-generated by [create-pull-request][1] [1]: https://github.com/peter-evans/create-pull-request diff --git a/.github/workflows/markdown-links-check.yml b/.github/workflows/markdown-links-check.yml index c09084ba65..db4266ff13 100644 --- a/.github/workflows/markdown-links-check.yml +++ b/.github/workflows/markdown-links-check.yml @@ -12,11 +12,11 @@ jobs: runs-on: ubuntu-latest steps: - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit - - uses: 
actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # master + - uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # master - uses: gaurav-nelson/github-action-markdown-link-check@5c5dfc0ac2e225883c0e5f03a85311ec2830d368 # v1 # checks all markdown files from root but ignores subfolders # By Removing the max-depth variable we can modify it -> to check all the .md files in the entire repo. diff --git a/.github/workflows/pytorch-latest-release.yml b/.github/workflows/pytorch-latest-release.yml index f39adc6183..2bd71c9d9c 100644 --- a/.github/workflows/pytorch-latest-release.yml +++ b/.github/workflows/pytorch-latest-release.yml @@ -14,11 +14,11 @@ jobs: if: ${{ github.repository }} == 'hemlholtz-analytics/heat' steps: - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit - - uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7 + - uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0 with: ref: '${{ env.base_branch }}' - name: Fetch PyTorch release version diff --git a/.github/workflows/release-drafter.yml b/.github/workflows/release-drafter.yml index f599bb28ea..0cd25717a3 100644 --- a/.github/workflows/release-drafter.yml +++ b/.github/workflows/release-drafter.yml @@ -1,8 +1,11 @@ name: Release Drafter on: - pull_request_target: - types: [opened, reopened, synchronize] + pull_request: + types: [opened, reopened] +permissions: + contents: read + jobs: update_release_draft: permissions: @@ -11,14 +14,14 @@ jobs: runs-on: ubuntu-latest steps: - name: Harden Runner - uses: step-security/harden-runner@0080882f6c36860b6ba35c610c98ce87d4e2f26f # v2.10.2 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit - - uses: release-drafter/release-drafter@v6 # v6.0.0 + - uses: release-drafter/release-drafter@b1476f6e6eb133afa41ed8589daba6dc69b4d3f5 # v6.1.0 env: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} with: - commitish: 'release' + commitish: 'main' name: ${{ github.ref_name }} - Draft release config-name: rd-release-config.yml diff --git a/.github/workflows/release-prep.yml b/.github/workflows/release-prep.yml index 1e28bae9e0..b1f50d83ae 100644 --- a/.github/workflows/release-prep.yml +++ b/.github/workflows/release-prep.yml @@ -11,6 +11,10 @@ on: description: "The base branch to create the release branch from" required: true default: "main" + title: + description: "Release title" + required: False + default: "Heat" permissions: contents: write @@ -22,13 +26,21 @@ jobs: runs-on: ubuntu-latest steps: - name: Harden Runner - uses: step-security/harden-runner@91182cccc01eb5e619899d80e4e971d6181294a7 # v2.10.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit - - uses: actions/checkout@v4 + - uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0 with: ref: ${{ github.event.inputs.base_branch }} + - uses: release-drafter/release-drafter@b1476f6e6eb133afa41ed8589daba6dc69b4d3f5 # v6.1.0 + id: release_drafter + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + with: + commitish: 'stable' + name: ${{ github.event.inputs.title }} + config-name: rd-release-config.yml - name: Bump version.py and create PR env: PR_BRANCH: pre-release/${{ inputs.release_version }} @@ -43,43 +55,97 @@ jobs: MINOR=$(echo $VERSION | cut -d. 
-f2) MICRO=$(echo $VERSION | cut -d. -f3) + ## ----- START Update Dockerfiles ------- + # Extract the current version from the Dockerfile + FILE_VERSION=$(grep -oP 'ARG PYTORCH_IMG=\K\d{2}\.\d{2}' docker/Dockerfile.release) + FILE_VERSION_SOURCE=$(grep -oP 'ARG PYTORCH_IMG=\K\d{2}\.\d{2}' docker/Dockerfile.source) + + # Construct the date for the new version + DATE=$(date '+%y.%m') + + # Separate year and month + YEAR=$(echo $DATE | cut -d'.' -f1) + MONTH=$(echo $DATE | cut -d'.' -f2) + + ## --- Handling of special cases --- + # Move to the previous year + if [ "$MONTH" == "01" ]; then + PREV_MONTH="12" + YEAR=$(($YEAR - 1)) + # 09 and 08 will be interpreted in Octal, so they have to be handled differently + elif [ "$MONTH" == "09" ]; then + PREV_MONTH="08" + elif [ "$MONTH" == "08" ]; then + PREV_MONTH="07" + else + PREV_MONTH=$(($MONTH - 1)) + # Ensure the previous month is 2 digits + PREV_MONTH=$(printf "%02d" $PREV_MONTH) + fi + + # Construct the new version + NEW_VERSION="${YEAR}.${PREV_MONTH}" + + sed -i "s/$FILE_VERSION/$NEW_VERSION/g" docker/Dockerfile.release + sed -i "s/$FILE_VERSION_SOURCE/$NEW_VERSION/g" docker/Dockerfile.source + + ## ----- END Workflow to update Dockerfile Images ------- + # Write onto the version.py file sed -i "s/major: int = \([0-9]\+\)/major: int = $MAJOR/g" heat/core/version.py sed -i "s/minor: int = \([0-9]\+\)/minor: int = $MINOR/g" heat/core/version.py sed -i "s/micro: int = \([0-9]\+\)/micro: int = $MICRO/g" heat/core/version.py sed -i "s/extension: str = .*/extension: str = None/g" heat/core/version.py + { echo -e "# v${MAJOR}.${MINOR}.${MICRO} - ${{github.event.inputs.title}}\n${{ steps.release_drafter.outputs.body}}\n"; cat CHANGELOG.md; } > tmp.md + mv tmp.md CHANGELOG.md + # Git configuration with anonymous user and email git config --global user.email "" git config --global user.name "Heat Release Bot" # Commit the changes - git add heat/core/version.py + git add heat/core/version.py CHANGELOG.md git commit -m "Bump version to $VERSION" + # Commit Dockerfile changes + git add docker/Dockerfile.release + git add docker/Dockerfile.source + git commit -m "Update pytorch image in Dockerfile.release and Dockerfile.source to version $NEW_VERSION" + # Push the changes git push --set-upstream origin pre-release/${VERSION} # Create PR for release gh pr create \ - --base release \ + --base stable \ --head ${{ env.PR_BRANCH }} \ --title "Heat ${{ env.VERSION }} - Release" \ --body "Pre-release branch for Heat ${{ env.VERSION }}. - Any release work should be done on this branch, and then merged into the release branch and main, following git-flow. + Any release work should be done on this branch, and then merged into \`stable\` and \`main\`, following git-flow guidelines. TODO: - [x] Update version.py - - [ ] update the Requirements section on README.md if needed - - [ ] Update CITATION.cff if needed - - [ ] Ensure the Changelog is up to date + - [ ] Ensure Citation file \`CITATION.cff\` is up to date. + - [ ] Ensure the Changelog is up to date. - [1]: https://github.com/peter-evans/create-pull-request" \ --label invalid + DO NOT DELETE BRANCH AFTER MERGING!" \ + --label "pre-release" # Create PR for main gh pr create --base main \ --head ${{ env.PR_BRANCH }} \ --title "Heat ${{ env.VERSION }} - Main" \ - --body "Copy of latest pre-release PR targeting release." \ - --label invalid + --draft \ + --body "Copy of latest pre-release PR targeting release. + DO NOT CHANGE ANYTHING UNTIL \`Heat ${{ env.VERSION }} - Release\` HAS BEEN MERGED.
+ + TODO: + - [ ] Make sure version.py is updated to reflect the dev version. + - [ ] Ensure Citation file is up to date. + - [ ] Ensure the Changelog is up to date. + - [ ] Test and merge conda-forge build (PR is usually created within a few hours of PyPI release) + - [ ] Update docker image and related documentation (see #1716) + - [ ] Update spack recipe + - [ ] Update easybuild recipe" \ + --label "post-release" diff --git a/.github/workflows/scorecard.yml b/.github/workflows/scorecard.yml index f24e301680..243858735a 100644 --- a/.github/workflows/scorecard.yml +++ b/.github/workflows/scorecard.yml @@ -32,17 +32,17 @@ jobs: steps: - name: Harden Runner - uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde # v2.9.1 + uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0 with: egress-policy: audit - name: "Checkout code" - uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7 + uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0 with: persist-credentials: false - name: "Run analysis" - uses: ossf/scorecard-action@62b2cac7ed8198b15735ed49ab1e5cf35480ba46 # v2.4.0 + uses: ossf/scorecard-action@05b42c624433fc40578a4040d5cf5e36ddca8cde # v2.4.2 with: results_file: results.sarif results_format: sarif @@ -64,7 +64,7 @@ jobs: # Upload the results as artifacts (optional). Commenting out will disable uploads of run results in SARIF # format to the repository Actions tab. - name: "Upload artifact" - uses: actions/upload-artifact@50769540e7f4bd5e21e526ee35c689e35e0d6874 # v4.4.0 + uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2 with: name: SARIF file path: results.sarif @@ -72,6 +72,6 @@ jobs: # Upload the results to GitHub's code scanning dashboard.
- name: "Upload to code-scanning" - uses: github/codeql-action/upload-sarif@4dd16135b69a43b6c8efb853346f8437d92d3c93 # v3.26.6 + uses: github/codeql-action/upload-sarif@df559355d593797519d70b90fc8edd5db049e7a2 # v3.29.9 with: sarif_file: results.sarif diff --git a/.gitignore b/.gitignore index 6cf3342594..f59b5da737 100644 --- a/.gitignore +++ b/.gitignore @@ -308,3 +308,4 @@ heat/datasets/MNISTDataset perun_results/ bench_data/ my_dev_stuff/ +docs/source/autoapi diff --git a/.perun.ini b/.perun.ini index 0919670d6e..b594eac4df 100644 --- a/.perun.ini +++ b/.perun.ini @@ -1,12 +1,28 @@ +[post-processing] +power_overhead = 100 +pue = 1.05 +emissions_factor = 417.8 +price_factor = 0.3251 +price_unit = € + +[monitor] +sampling_period = 0.1 +include_backends = +include_sensors = +exclude_backends = +exclude_sensors = CPU_FREQ_\d + [output] +app_name +run_id format = bench data_out = ./bench_data [benchmarking] rounds = 10 warmup_rounds = 1 -metrics=runtime -region_metrics=runtime +metrics = runtime,energy +region_metrics = runtime,power [benchmarking.units] joule = k @@ -14,3 +30,6 @@ second = percent = watt = byte = G + +[debug] +log_lvl = WARNING diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index b45d931840..4b2b239cc2 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -1,30 +1,53 @@ # See https://pre-commit.com for more information # See https://pre-commit.com/hooks.html for more hooks +ci: + skip: + - "mypy" # Skip mypy in CI, as it is run manually repos: - repo: https://github.com/pre-commit/pre-commit-hooks - rev: v5.0.0 + rev: v6.0.0 hooks: - id: trailing-whitespace - id: end-of-file-fixer - id: check-yaml - id: check-added-large-files - id: check-toml - - repo: https://github.com/psf/black-pre-commit-mirror - rev: 25.1.0 + - repo: https://github.com/pre-commit/mirrors-mypy + rev: v1.17.1 # Use the sha / tag you want to point at hooks: - - id: black - - repo: https://github.com/PyCQA/flake8 - rev: 7.2.0 + - id: mypy + args: [--config-file, pyproject.toml, --ignore-missing-imports] + additional_dependencies: + - torch + - h5py + - zarr + pass_filenames: false + stages: [manual] + + - repo: https://github.com/astral-sh/ruff-pre-commit + # Ruff version. + rev: v0.12.9 hooks: - - id: flake8 - - repo: https://github.com/pycqa/pydocstyle - rev: 6.3.0 # pick a git hash / tag to point to - hooks: - - id: pydocstyle - exclude: "tutorials|tests|benchmarks|examples|scripts|setup.py" #|heat/utils/data/mnist.py|heat/utils/data/_utils.py ? + # Run the linter. + - id: ruff + args: [ --fix ] + # Run the formatter. + - id: ruff-format - repo: "https://github.com/citation-file-format/cffconvert" rev: "054bda51dbe278b3e86f27c890e3f3ac877d616c" hooks: - id: "validate-cff" args: - "--verbose" + - repo: https://github.com/gitleaks/gitleaks + rev: v8.28.0 + hooks: + - id: gitleaks + - repo: https://github.com/shellcheck-py/shellcheck-py + rev: v0.11.0.1 + hooks: + - id: shellcheck + #- repo: https://github.com/jumanjihouse/pre-commit-hooks + # rev: 3.0.0 + # hooks: + # - id: shellcheck diff --git a/.readthedocs.yaml b/.readthedocs.yaml index 49dafa0fa7..e09599c26e 100644 --- a/.readthedocs.yaml +++ b/.readthedocs.yaml @@ -10,6 +10,9 @@ build: os: ubuntu-22.04 tools: python: "3.11" + apt_packages: + - pandoc + - libopenmpi-dev # Build documentation in the docs/ directory with Sphinx sphinx: @@ -19,4 +22,7 @@ sphinx: # https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html python: install: - - requirements: doc/requirements.txt + - method: pip + path: . 
+ extra_requirements: + - docs diff --git a/CHANGELOG.md b/CHANGELOG.md index 4187c196b3..818a7dde09 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,3 +1,88 @@ +# v1.6.0 +## Highlights + +1) Linear algebra: Singular Value Decomposition, Symmetric Eigenvalue Decomposition and Polar Decomposition via the "Zolotarev approach" +2) MPI Layer: Support for communicating buffers larger than 2^31-1 +3) Dynamic Mode Decomposition (with and without control) +4) IO: Zarr format support +5) Signal Processing: Strided 1D convolution +6) Expanded QR decomposition +7) Apple MPS Support +8) Tutorial overhaul + +*SVD, PCA, and DMD have been implemented within the project ESAPCA funded by the European Space Agency (ESA). This support is gratefully acknowledged.* + +## Changes + +### Features +* Decomposition module and PCA interface by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1538 +* Distributed randomized SVD by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1561 +* Incremental SVD/PCA by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1629 +* Dynamic Mode Decomposition (DMD) by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1639 +* `heat.eq`, `heat.ne` now allow non-array operands by @Marc-Jindra in https://github.com/helmholtz-analytics/heat/pull/1773 +* Large data counts support for MPI Communication by @JuanPedroGHM in https://github.com/helmholtz-analytics/heat/pull/1765 +* Added `slice` argument for `load_hdf5` by @JuanPedroGHM in https://github.com/helmholtz-analytics/heat/pull/1753 +* Support Apple MPS acceleration by @ClaudiaComito in https://github.com/helmholtz-analytics/heat/pull/1129 +* QR decomposition for non tall-skinny matrices and `split=0` by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1744 +* Support for the `zarr` data format by @Berkant03 in https://github.com/helmholtz-analytics/heat/pull/1766 +* Polar decomposition by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1697 +* Dynamic Mode Decomposition with Control (DMDc) by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1794 +* Expand `np.funcs` to heat by @mtar in https://github.com/helmholtz-analytics/heat/pull/1888 +* Extends torch functions to DNDarrays by @mtar in https://github.com/helmholtz-analytics/heat/pull/1895 +* Symmetric Eigenvalue Decomposition (eigh) and full SVD (svd) based on Zolotarev Polar Decomposition by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1824 +* Stride argument for convolution by @lolacaro in https://github.com/helmholtz-analytics/heat/pull/1865 + +### Interoperability +* Support PyTorch 2.4.1 by @github-actions[bot] in https://github.com/helmholtz-analytics/heat/pull/1655 +* Support PyTorch 2.5.1 by @github-actions[bot] in https://github.com/helmholtz-analytics/heat/pull/1701 +* Support PyTorch 2.6.0 / Add zarr as optional dependency by @github-actions[bot] in https://github.com/helmholtz-analytics/heat/pull/1775 +* Make unit tests compatible with NumPy 2.x by @Marc-Jindra in https://github.com/helmholtz-analytics/heat/pull/1826 +* Support PyTorch 2.7.0 by @github-actions[bot] in https://github.com/helmholtz-analytics/heat/pull/1869 +* Support PyTorch 2.7.1 by @github-actions[bot] in https://github.com/helmholtz-analytics/heat/pull/1883 +* More generic check for CUDA-aware MPI by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1793 + + +### Fixes +* Raise Error for batched vector inputs on matmul by @FOsterfeld in https://github.com/helmholtz-analytics/heat/pull/1646 +*
Refactor `test_random` to minimize collective calls by @ClaudiaComito in https://github.com/helmholtz-analytics/heat/pull/1677 +* Printing non-distributed data by @ClaudiaComito in https://github.com/helmholtz-analytics/heat/pull/1756 +* Fixed precision loss in several functions when dtype is float64 by @neosunhan in https://github.com/helmholtz-analytics/heat/pull/993 +* Remove unnecessary contiguous calls by @Marc-Jindra in https://github.com/helmholtz-analytics/heat/pull/1831 +* Zarr tests fail on main by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1859 +* Decrease accuracy on `ht.vmap` tests on multi-node GPU runs by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1738 +* Bug-fixes during ESAPCA benchmarking by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1893 +* Exit installation if conda environment cannot be activated by @thawn in https://github.com/helmholtz-analytics/heat/pull/1880 +* Resolve bug in rSVD / wrong citation in polar.py by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1905 +* Fix IO test failures with Zarr v3.0.9 in save_zarr() by @LScheib in https://github.com/helmholtz-analytics/heat/pull/1921 + +### Build system +* Modernise setup.py configuration by @mtar in https://github.com/helmholtz-analytics/heat/pull/1731 +* Transition to pyproject.toml, Ruff, and mypy by @JuanPedroGHM in https://github.com/helmholtz-analytics/heat/pull/1832 +* Move to pyproject.toml in release action by @mtar in https://github.com/helmholtz-analytics/heat/pull/1950 +* Setuptools configuration in pyproject.toml by @JuanPedroGHM in https://github.com/helmholtz-analytics/heat/pull/1919 + +### Docs and Cx +* Documentation updates after new release by @ClaudiaComito in https://github.com/helmholtz-analytics/heat/pull/1704 +* Release drafter action handles multi-branch releases by @JuanPedroGHM in https://github.com/helmholtz-analytics/heat/pull/1660 +* Release drafter update and autolabeler by @JuanPedroGHM in https://github.com/helmholtz-analytics/heat/pull/1681 +* Update tutorials instructions for `ipcluster` initialization by @Marc-Jindra in https://github.com/helmholtz-analytics/heat/pull/1679 +* Added Dalcin et al 2018 reference to `manipulations._axis2axisResplit` by @ClaudiaComito in https://github.com/helmholtz-analytics/heat/pull/1695 +* Make it easier to get to GitHub from the docs by @joernhees in https://github.com/helmholtz-analytics/heat/pull/1741 +* Linters will no longer format tutorials by @ClaudiaComito in https://github.com/helmholtz-analytics/heat/pull/1748 +* Features/HPC-tutorial via python script by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1527 +* Add marker for providing type annotation by @mtar in https://github.com/helmholtz-analytics/heat/pull/1733 +* Expanded post-release checklist by @JuanPedroGHM in https://github.com/helmholtz-analytics/heat/pull/1821 +* Skip large-count communication tests on AMD runner by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1834 +* Update `test_dmd.py` by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1852 +* RTD Notebook gallery and profiling notebook with perun. 
by @JuanPedroGHM in https://github.com/helmholtz-analytics/heat/pull/1867 +* Features/1845 Update citations by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1846 +* Updated release_prep.yml to incorporate up-to-date Dockerfile Pytorch versions by @jolemse in https://github.com/helmholtz-analytics/heat/pull/1903 +* Update CODE_OF_CONDUCT.md by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1939 + + +#### Acknowledgement and Disclaimer +*This work is partially carried out under a [programme](https://activities.esa.int/index.php/4000144045) of, and funded by, the European Space Agency. Any view expressed in this repository or related publications can in no way be taken to reflect the official opinion of the European Space Agency.* + # v1.5.1 ## Changes diff --git a/CITATION.cff b/CITATION.cff index 78d184e4a1..b09e7f80a5 100644 --- a/CITATION.cff +++ b/CITATION.cff @@ -9,46 +9,44 @@ authors: # release highlights - family-names: Hoppe given-names: Fabian - - family-names: Osterfeld - given-names: Fynn - family-names: Gutiérrez Hermosillo Muriedas given-names: Juan Pedro - - family-names: Vaithinathan Aravindan - given-names: Ashwath + - family-names: Palazoglu + given-names: Berkant + - family-names: Fischer + given-names: Carola + - family-names: Akdag + given-names: Hakan - family-names: Comito given-names: Claudia +# active contributors in alphabetic order + - family-names: Hees + given-names: Jörn + - family-names: Jindra + given-names: Marc + - family-names: Korten + given-names: Till - family-names: Krajsek given-names: Kai - - family-names: Nguyen Xuan - given-names: Tu + - family-names: Lemmen + given-names: Jonas + - family-names: Scheib + given-names: Lukas - family-names: Tarnawa given-names: Michael - - family-names: Hees - given-names: Jörn -# core team - # - family-names: Comito - # given-names: Claudia +# historic core team - family-names: Coquelin given-names: Daniel - family-names: Debus given-names: Charlotte - - family-names: Götz - given-names: Markus -# - family-names: Gutiérrez Hermosillo Muriedas -# given-names: Juan Pedro - family-names: Hagemeier given-names: Björn - # - family-names: Hoppe - # given-names: Fabian - family-names: Knechtges given-names: Philipp - # - family-names: Krajsek - # given-names: Kai - family-names: Rüttgers given-names: Alexander - # - family-names: Tarnawa - # given-names: Michael -# release contributors - add as needed below + - family-names: Götz + given-names: Markus repository-code: 'https://github.com/helmholtz-analytics/heat' url: 'https://helmholtz-analytics.github.io/heat/' repository: 'https://heat.readthedocs.io/en/stable/' diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md index 8a4d8d0db2..1b50eebf8c 100644 --- a/CODE_OF_CONDUCT.md +++ b/CODE_OF_CONDUCT.md @@ -55,7 +55,7 @@ further defined and clarified by project maintainers. ## Enforcement Instances of abusive, harassing, or otherwise unacceptable behavior may be -reported by contacting the project team at . All +reported by contacting the project team at . All complaints will be reviewed and investigated and will result in a response that is deemed necessary and appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. diff --git a/README.md b/README.md index 4944df3ed5..0f6ca711d1 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,5 @@
- +
--- @@ -110,7 +110,7 @@ computational and memory needs of your laptop and desktop. ## Requirements ### Basics -- python >= 3.9 +- python >= 3.10 - MPI (OpenMPI, MPICH, Intel MPI, etc.) - mpi4py >= 3.0.0 - pytorch >= 2.0.0 @@ -184,35 +184,66 @@ Heat is distributed under the MIT license, see our -Please do mention Heat in your publications if it helped your research. You can cite: +If Heat contributed to a publication, please cite our main paper. -* Götz, M., Debus, C., Coquelin, D., Krajsek, K., Comito, C., Knechtges, P., Hagemeier, B., Tarnawa, M., Hanselmann, S., Siggel, S., Basermann, A. & Streit, A. (2020). HeAT - a Distributed and GPU-accelerated Tensor Framework for Data Analytics. In 2020 IEEE International Conference on Big Data (Big Data) (pp. 276-287). IEEE, DOI: 10.1109/BigData50022.2020.9378050. +**Preferred Citation:** -``` +Götz, M., Debus, C., Coquelin, D., et al. (2020). HeAT - a Distributed and GPU-accelerated Tensor Framework for Data Analytics. In *2020 IEEE International Conference on Big Data (Big Data)* (pp. 276-287). IEEE. DOI: 10.1109/BigData50022.2020.9378050. + +```bibtex @inproceedings{heat2020, title={{HeAT -- a Distributed and GPU-accelerated Tensor Framework for Data Analytics}}, - author={ - Markus Götz and - Charlotte Debus and - Daniel Coquelin and - Kai Krajsek and - Claudia Comito and - Philipp Knechtges and - Björn Hagemeier and - Michael Tarnawa and - Simon Hanselmann and - Martin Siggel and - Achim Basermann and - Achim Streit - }, + author={Markus Götz and Charlotte Debus and Daniel Coquelin and Kai Krajsek and Claudia Comito and Philipp Knechtges and Björn Hagemeier and Michael Tarnawa and Simon Hanselmann and Martin Siggel and Achim Basermann and Achim Streit}, booktitle={2020 IEEE International Conference on Big Data (Big Data)}, year={2020}, pages={276-287}, - month={December}, publisher={IEEE}, doi={10.1109/BigData50022.2020.9378050} } ``` + +### Other Relevant Publications + +**For the RSE perspective and latest benchmarks:** + +Hoppe, F., et al. (2025). *Engineering a large-scale data analytics and array computing library for research: Heat*. Electronic Communications of the EASST, 83. + +```bibtex +@article{heat2025rse, + title={Engineering a large-scale data analytics and array computing library for research: Heat}, + volume={83}, + url={https://eceasst.org/index.php/eceasst/article/view/2626}, + DOI={10.14279/eceasst.v83.2626}, + journal={Electronic Communications of the EASST}, + author={Hoppe, Fabian and Gutiérrez Hermosillo Muriedas, Juan Pedro and Tarnawa, Michael and Knechtges, Philipp and Hagemeier, Björn and Krajsek, Kai and Rüttgers, Alexander and Götz, Markus and Comito, Claudia}, + year={2025} +} +``` + +**For the neural networks module (DASO):** + +Coquelin, D., et al. (2022). *Accelerating neural network training with distributed asynchronous and selective optimization (DASO)*. J Big Data 9, 14. + +```bibtex +@Article{DASO2022, + author={Coquelin, Daniel and Debus, Charlotte and G{\"o}tz, Markus and von der Lehr, Fabrice and Kahn, James and Siggel, Martin and Streit, Achim}, + title={Accelerating neural network training with distributed asynchronous and selective optimization (DASO)}, + journal={Journal of Big Data}, + year={2022}, + volume={9}, + number={1}, + pages={14}, + doi={10.1186/s40537-021-00556-1} +} +``` + + +**For specific software versions:** +Please use the [Zenodo DOI](https://doi.org/10.5281/zenodo.2531472)
provided with each release. + + + + # FAQ Work in progress... @@ -235,4 +266,4 @@ Any view expressed in this repository or related publications can in no way be t ---
- + diff --git a/benchmarks/cb/decomposition.py b/benchmarks/cb/decomposition.py new file mode 100644 index 0000000000..44d9cf1c4a --- /dev/null +++ b/benchmarks/cb/decomposition.py @@ -0,0 +1,17 @@ +# flake8: noqa +import heat as ht +from mpi4py import MPI +from perun import monitor +from heat.decomposition import IncrementalPCA + + +@monitor() +def incremental_pca_split0(list_of_X, n_components): + ipca = IncrementalPCA(n_components=n_components) + for X in list_of_X: + ipca.partial_fit(X) + + +def run_decomposition_benchmarks(): + list_of_X = [ht.random.rand(50000, 500, split=0) for _ in range(10)] + incremental_pca_split0(list_of_X, 50) diff --git a/benchmarks/cb/heat_signal.py b/benchmarks/cb/heat_signal.py new file mode 100644 index 0000000000..9ecf26a443 --- /dev/null +++ b/benchmarks/cb/heat_signal.py @@ -0,0 +1,85 @@ +import heat as ht +from perun import monitor + + +@monitor() +def convolution_array_distributed(signal, kernel): + ht.convolve(signal, kernel, mode="full") + + +@monitor() +def convolution_kernel_distributed(signal, kernel): + ht.convolve(signal, kernel, mode="full") + + +@monitor() +def convolution_distributed(signal, kernel): + ht.convolve(signal, kernel, mode="full") + + +@monitor() +def convolution_batch_processing(signal, kernel): + ht.convolve(signal, kernel, mode="full") + + +@monitor() +def convolution_array_distributed_stride(signal, kernel, stride): + ht.convolve(signal, kernel, mode="full", stride=stride) + + +@monitor() +def convolution_kernel_distributed_stride(signal, kernel, stride): + ht.convolve(signal, kernel, mode="full", stride=stride) + + +@monitor() +def convolution_distributed_stride(signal, kernel, stride): + ht.convolve(signal, kernel, mode="full", stride=stride) + + +@monitor() +def convolution_batch_processing_stride(signal, kernel, stride): + ht.convolve(signal, kernel, mode="full", stride=stride) + + +def run_signal_benchmarks(): + n_s = 1000000000 + n_k = 10003 + stride = 3 + + # signal distributed + signal = ht.random.random((n_s,), split=0) + kernel = ht.random.random_integer(0, 1, (n_k,), split=None) + + convolution_array_distributed(signal, kernel) + convolution_array_distributed_stride(signal, kernel, stride) + + del signal, kernel + + # kernel distributed + signal = ht.random.random((n_s,), split=None) + kernel = ht.random.random_integer(0, 1, (n_k,), split=0) + + convolution_kernel_distributed(signal, kernel) + convolution_kernel_distributed_stride(signal, kernel, stride) + + del signal, kernel + + # signal and kernel distributed + signal = ht.random.random((n_s,), split=0) + kernel = ht.random.random_integer(0, 1, (n_k,), split=0) + + convolution_distributed(signal, kernel) + convolution_distributed_stride(signal, kernel, stride) + + del signal, kernel + + # batch processing + n_s = 90000 + n_b = 90000 + n_k = 503 + signal = ht.random.random((n_b, n_s), split=0) + kernel = ht.random.random_integer(0, 1, (n_b, n_k), split=0) + + convolution_batch_processing(signal, kernel) + convolution_batch_processing_stride(signal, kernel, stride) diff --git a/benchmarks/cb/linalg.py b/benchmarks/cb/linalg.py index 3596d4916f..a6526f6c7e 100644 --- a/benchmarks/cb/linalg.py +++ b/benchmarks/cb/linalg.py @@ -19,6 +19,11 @@ def qr_split_0(a): qr = ht.linalg.qr(a) +@monitor() +def qr_split_0_square(a): + qr = ht.linalg.qr(a) + + @monitor() def qr_split_1(a): qr = ht.linalg.qr(a) @@ -39,6 +44,51 @@ def lanczos(B): V, T = ht.lanczos(B, m=B.shape[0]) +@monitor() +def zolopd_split0(A): + U, H = ht.linalg.polar(A) + + +@monitor() +def zolopd_split1(A): 
+ U, H = ht.linalg.polar(A) + + +@monitor() +def eigh_split0(A): + H, Lambda = ht.linalg.eigh(A) + + +@monitor() +def eigh_split1(A): + H, Lambda = ht.linalg.eigh(A) + + +@monitor() +def svd_ts(a): + svd = ht.linalg.svd(a) + + +@monitor() +def svd_zolo_split0(a): + svd = ht.linalg.svd(a) + + +@monitor() +def svd_zolo_split1(a): + svd = ht.linalg.svd(a) + + +@monitor() +def randomized_svd_split0(a, r): + svd = ht.linalg.rsvd(a, r) + + +@monitor() +def randomized_svd_split1(a, r): + svd = ht.linalg.rsvd(a, r) + + def run_linalg_benchmarks(): n = 3000 a = ht.random.random((n, n), split=0) @@ -57,6 +107,11 @@ def run_linalg_benchmarks(): qr_split_0(a_0) del a_0 + n = 2000 + a_0 = ht.random.random((n, n), split=0) + qr_split_0_square(a_0) + del a_0 + n = 2000 a_1 = ht.random.random((n, n), split=1) qr_split_1(a_1) @@ -74,3 +129,46 @@ def run_linalg_benchmarks(): hierachical_svd_rank(data, 10) hierachical_svd_tol(data, 1e-2) del data + + n = 1000 + A = ht.random.random((n, n), split=0) + zolopd_split0(A) + del A + + A = ht.random.random((n, n), split=1) + zolopd_split1(A) + del A + + n = 1000 + A = ht.random.random((n, n), split=0) + A += A.T.resplit_(0) + eigh_split0(A) + del A + + A = ht.random.random((n, n), split=1) + A += A.T.resplit_(1) + eigh_split1(A) + del A + + n = int((4000000 // MPI.COMM_WORLD.size) ** 0.5) + m = MPI.COMM_WORLD.size * n + a_0 = ht.random.random((m, n), split=0) + svd_ts(a_0) + del a_0 + + n = 1000 + A = ht.random.random((n, n), split=0) + svd_zolo_split0(A) + del A + + A = ht.random.random((n, n), split=1) + svd_zolo_split1(A) + del A + + A = ht.random.random((500 * MPI.COMM_WORLD.Get_size(), 1000), split=0) + randomized_svd_split0(A, 10) + del A + + A = ht.random.random((1000, 500 * MPI.COMM_WORLD.Get_size()), split=1) + randomized_svd_split1(A, 10) + del A diff --git a/benchmarks/cb/main.py b/benchmarks/cb/main.py index 52cd18d76f..2dd4680ae0 100644 --- a/benchmarks/cb/main.py +++ b/benchmarks/cb/main.py @@ -6,12 +6,20 @@ ht.use_device(os.environ["HEAT_DEVICE"] if os.environ["HEAT_DEVICE"] else "cpu") ht.random.seed(12345) +world_size = ht.MPI_WORLD.size +rank = ht.MPI_WORLD.rank +print(f"{rank}/{world_size}: Working on {ht.get_device()}") + from linalg import run_linalg_benchmarks from cluster import run_cluster_benchmarks from manipulations import run_manipulation_benchmarks from preprocessing import run_preprocessing_benchmarks +from decomposition import run_decomposition_benchmarks +from heat_signal import run_signal_benchmarks run_linalg_benchmarks() run_cluster_benchmarks() run_manipulation_benchmarks() run_preprocessing_benchmarks() +run_decomposition_benchmarks() +run_signal_benchmarks() diff --git a/coverage_tables.md b/coverage_tables.md index f90dadfba4..c9286a8834 100644 --- a/coverage_tables.md +++ b/coverage_tables.md @@ -13,395 +13,436 @@ The following tables show the NumPy functions supported by Heat. 8. [NumPy Sorting Operations](#numpy-sorting-operations) 9. [NumPy Statistical Operations](#numpy-statistical-operations) 10. [NumPy Random Operations](#numpy-random-operations) +11. [NumPy FFT Operations](#numpy-fft-operations) +12. 
[NumPy Masked Array Operations](#numpy-masked-array-operations) ## NumPy Mathematical Functions [Back to Table of Contents](#table-of-contents) -| NumPy Mathematical Functions | Heat | -|---|---| -| sin | ✅ | -| cos | ✅ | -| tan | ✅ | -| arcsin | ✅ | -| arccos | ✅ | -| arctan | ✅ | -| hypot | ✅ | -| arctan2 | ✅ | -| degrees | ✅ | -| radians | ✅ | -| unwrap | ❌ | -| deg2rad | ✅ | -| rad2deg | ✅ | -| sinh | ✅ | -| cosh | ✅ | -| tanh | ✅ | -| arcsinh | ✅ | -| arccosh | ✅ | -| arctanh | ✅ | -| round | ✅ | -| around | ❌ | -| rint | ❌ | -| fix | ❌ | -| floor | ✅ | -| ceil | ✅ | -| trunc | ✅ | -| prod | ✅ | -| sum | ✅ | -| nanprod | ✅ | -| nansum | ✅ | -| cumprod | ✅ | -| cumsum | ✅ | -| nancumprod | ❌ | -| nancumsum | ❌ | -| diff | ✅ | -| ediff1d | ❌ | -| gradient | ❌ | -| cross | ✅ | -| trapz | ❌ | -| exp | ✅ | -| expm1 | ✅ | -| exp2 | ✅ | -| log | ✅ | -| log10 | ✅ | -| log2 | ✅ | -| log1p | ✅ | -| logaddexp | ✅ | -| logaddexp2 | ✅ | -| i0 | ❌ | -| sinc | ❌ | -| signbit | ✅ | -| copysign | ✅ | -| frexp | ❌ | -| ldexp | ❌ | -| nextafter | ❌ | -| spacing | ❌ | -| lcm | ✅ | -| gcd | ✅ | -| add | ✅ | -| reciprocal | ❌ | -| positive | ✅ | -| negative | ✅ | -| multiply | ✅ | -| divide | ✅ | -| power | ✅ | -| subtract | ✅ | -| true_divide | ❌ | -| floor_divide | ✅ | -| float_power | ❌ | -| fmod | ✅ | -| mod | ✅ | -| modf | ✅ | -| remainder | ✅ | -| divmod | ❌ | -| angle | ✅ | -| real | ✅ | -| imag | ✅ | -| conj | ✅ | -| conjugate | ✅ | -| maximum | ✅ | -| max | ✅ | -| amax | ❌ | -| fmax | ❌ | -| nanmax | ❌ | -| minimum | ✅ | -| min | ✅ | -| amin | ❌ | -| fmin | ❌ | -| nanmin | ❌ | -| convolve | ✅ | -| clip | ✅ | -| sqrt | ✅ | -| cbrt | ❌ | -| square | ✅ | -| absolute | ✅ | -| fabs | ✅ | -| sign | ✅ | -| heaviside | ❌ | -| nan_to_num | ✅ | -| real_if_close | ❌ | -| interp | ❌ | +| NumPy Mathematical Functions | Heat | Issues | +|---|---|---| +| sin | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+sin) | +| cos | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+cos) | +| tan | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+tan) | +| arcsin | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+arcsin) | +| arccos | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+arccos) | +| arctan | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+arctan) | +| hypot | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+hypot) | +| arctan2 | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+arctan2) | +| degrees | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+degrees) | +| radians | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+radians) | +| unwrap | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+unwrap) | +| deg2rad | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+deg2rad) | +| rad2deg | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+rad2deg) | +| sinh | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+sinh) | +| cosh | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+cosh) | +| tanh | ✅ | 
[Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+tanh) | +| arcsinh | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+arcsinh) | +| arccosh | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+arccosh) | +| arctanh | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+arctanh) | +| round | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+round) | +| around | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+around) | +| rint | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+rint) | +| fix | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fix) | +| floor | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+floor) | +| ceil | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ceil) | +| trunc | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+trunc) | +| prod | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+prod) | +| sum | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+sum) | +| nanprod | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nanprod) | +| nansum | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nansum) | +| cumprod | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+cumprod) | +| cumsum | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+cumsum) | +| nancumprod | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nancumprod) | +| nancumsum | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nancumsum) | +| diff | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+diff) | +| ediff1d | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ediff1d) | +| gradient | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+gradient) | +| cross | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+cross) | +| trapz | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+trapz) | +| exp | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+exp) | +| expm1 | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+expm1) | +| exp2 | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+exp2) | +| log | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+log) | +| log10 | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+log10) | +| log2 | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+log2) | +| log1p | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+log1p) | +| logaddexp | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+logaddexp) | +| logaddexp2 | ✅ | 
[Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+logaddexp2) | +| i0 | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+i0) | +| sinc | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+sinc) | +| signbit | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+signbit) | +| copysign | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+copysign) | +| frexp | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+frexp) | +| ldexp | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ldexp) | +| nextafter | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nextafter) | +| spacing | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+spacing) | +| lcm | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+lcm) | +| gcd | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+gcd) | +| add | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+add) | +| reciprocal | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+reciprocal) | +| positive | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+positive) | +| negative | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+negative) | +| multiply | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+multiply) | +| divide | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+divide) | +| power | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+power) | +| subtract | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+subtract) | +| true_divide | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+true_divide) | +| floor_divide | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+floor_divide) | +| float_power | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+float_power) | +| fmod | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fmod) | +| mod | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+mod) | +| modf | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+modf) | +| remainder | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+remainder) | +| divmod | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+divmod) | +| angle | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+angle) | +| real | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+real) | +| imag | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+imag) | +| conj | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+conj) | +| conjugate | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+conjugate) | +| maximum | ✅ | 
[Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+maximum) | +| max | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+max) | +| amax | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+amax) | +| fmax | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fmax) | +| nanmax | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nanmax) | +| minimum | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+minimum) | +| min | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+min) | +| amin | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+amin) | +| fmin | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fmin) | +| nanmin | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nanmin) | +| convolve | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+convolve) | +| clip | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+clip) | +| sqrt | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+sqrt) | +| cbrt | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+cbrt) | +| square | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+square) | +| absolute | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+absolute) | +| fabs | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fabs) | +| sign | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+sign) | +| heaviside | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+heaviside) | +| nan_to_num | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nan_to_num) | +| real_if_close | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+real_if_close) | +| interp | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+interp) | ## NumPy Array Creation [Back to Table of Contents](#table-of-contents) -| NumPy Array Creation | Heat | -|---|---| -| empty | ✅ | -| empty_like | ✅ | -| eye | ✅ | -| identity | ❌ | -| ones | ✅ | -| ones_like | ✅ | -| zeros | ✅ | -| zeros_like | ✅ | -| full | ✅ | -| full_like | ✅ | -| array | ✅ | -| asarray | ✅ | -| asanyarray | ❌ | -| ascontiguousarray | ❌ | -| asmatrix | ❌ | -| copy | ✅ | -| frombuffer | ❌ | -| from_dlpack | ❌ | -| fromfile | ❌ | -| fromfunction | ❌ | -| fromiter | ❌ | -| fromstring | ❌ | -| loadtxt | ❌ | -| arange | ✅ | -| linspace | ✅ | -| logspace | ✅ | -| geomspace | ❌ | -| meshgrid | ✅ | -| mgrid | ❌ | -| ogrid | ❌ | -| diag | ✅ | -| diagflat | ❌ | -| tri | ❌ | -| tril | ✅ | -| triu | ✅ | -| vander | ❌ | -| mat | ❌ | -| bmat | ❌ | +| NumPy Array Creation | Heat | Issues | +|---|---|---| +| empty | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+empty) | +| empty_like | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+empty_like) | +| eye | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+eye) | +| identity | ❌ | 
[Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+identity) | +| ones | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ones) | +| ones_like | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ones_like) | +| zeros | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+zeros) | +| zeros_like | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+zeros_like) | +| full | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+full) | +| full_like | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+full_like) | +| array | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+array) | +| asarray | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+asarray) | +| asanyarray | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+asanyarray) | +| ascontiguousarray | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ascontiguousarray) | +| asmatrix | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+asmatrix) | +| copy | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+copy) | +| frombuffer | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+frombuffer) | +| from_dlpack | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+from_dlpack) | +| fromfile | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fromfile) | +| fromfunction | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fromfunction) | +| fromiter | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fromiter) | +| fromstring | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fromstring) | +| loadtxt | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+loadtxt) | +| arange | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+arange) | +| linspace | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linspace) | +| logspace | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+logspace) | +| geomspace | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+geomspace) | +| meshgrid | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+meshgrid) | +| mgrid | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+mgrid) | +| ogrid | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ogrid) | +| diag | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+diag) | +| diagflat | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+diagflat) | +| tri | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+tri) | +| tril | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+tril) | +| triu | ✅ | 
[Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+triu) | +| vander | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+vander) | +| mat | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+mat) | +| bmat | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+bmat) | ## NumPy Array Manipulation [Back to Table of Contents](#table-of-contents) -| NumPy Array Manipulation | Heat | -|---|---| -| copyto | ❌ | -| shape | ✅ | -| reshape | ✅ | -| ravel | ✅ | -| flat | ❌ | -| flatten | ✅ | -| moveaxis | ✅ | -| rollaxis | ❌ | -| swapaxes | ✅ | -| T | ❌ | -| transpose | ✅ | -| atleast_1d | ❌ | -| atleast_2d | ❌ | -| atleast_3d | ❌ | -| broadcast | ❌ | -| broadcast_to | ✅ | -| broadcast_arrays | ✅ | -| expand_dims | ✅ | -| squeeze | ✅ | -| asarray | ✅ | -| asanyarray | ❌ | -| asmatrix | ❌ | -| asfarray | ❌ | -| asfortranarray | ❌ | -| ascontiguousarray | ❌ | -| asarray_chkfinite | ❌ | -| require | ❌ | -| concatenate | ✅ | -| stack | ✅ | -| block | ❌ | -| vstack | ✅ | -| hstack | ✅ | -| dstack | ❌ | -| column_stack | ✅ | -| row_stack | ✅ | -| split | ✅ | -| array_split | ❌ | -| dsplit | ✅ | -| hsplit | ✅ | -| vsplit | ✅ | -| tile | ✅ | -| repeat | ✅ | -| delete | ❌ | -| insert | ❌ | -| append | ❌ | -| resize | ❌ | -| trim_zeros | ❌ | -| unique | ✅ | -| flip | ✅ | -| fliplr | ✅ | -| flipud | ✅ | -| reshape | ✅ | -| roll | ✅ | -| rot90 | ✅ | +| NumPy Array Manipulation | Heat | Issues | +|---|---|---| +| copyto | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+copyto) | +| shape | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+shape) | +| reshape | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+reshape) | +| ravel | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ravel) | +| flat | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+flat) | +| flatten | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+flatten) | +| moveaxis | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+moveaxis) | +| rollaxis | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+rollaxis) | +| swapaxes | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+swapaxes) | +| T | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+T) | +| transpose | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+transpose) | +| atleast_1d | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+atleast_1d) | +| atleast_2d | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+atleast_2d) | +| atleast_3d | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+atleast_3d) | +| broadcast | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+broadcast) | +| broadcast_to | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+broadcast_to) | +| broadcast_arrays | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+broadcast_arrays) | +| expand_dims | ✅ | 
[Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+expand_dims) | +| squeeze | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+squeeze) | +| asarray | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+asarray) | +| asanyarray | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+asanyarray) | +| asmatrix | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+asmatrix) | +| asfarray | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+asfarray) | +| asfortranarray | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+asfortranarray) | +| ascontiguousarray | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ascontiguousarray) | +| asarray_chkfinite | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+asarray_chkfinite) | +| require | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+require) | +| concatenate | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+concatenate) | +| stack | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+stack) | +| block | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+block) | +| vstack | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+vstack) | +| hstack | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+hstack) | +| dstack | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+dstack) | +| column_stack | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+column_stack) | +| row_stack | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+row_stack) | +| split | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+split) | +| array_split | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+array_split) | +| dsplit | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+dsplit) | +| hsplit | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+hsplit) | +| vsplit | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+vsplit) | +| tile | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+tile) | +| repeat | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+repeat) | +| delete | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+delete) | +| insert | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+insert) | +| append | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+append) | +| resize | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+resize) | +| trim_zeros | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+trim_zeros) | +| unique | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+unique) | +| flip | ✅ | 
[Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+flip) | +| fliplr | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fliplr) | +| flipud | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+flipud) | +| reshape | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+reshape) | +| roll | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+roll) | +| rot90 | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+rot90) | ## NumPy Binary Operations [Back to Table of Contents](#table-of-contents) -| NumPy Binary Operations | Heat | -|---|---| -| bitwise_and | ✅ | -| bitwise_or | ✅ | -| bitwise_xor | ✅ | -| invert | ✅ | -| left_shift | ✅ | -| right_shift | ✅ | -| packbits | ❌ | -| unpackbits | ❌ | -| binary_repr | ❌ | +| NumPy Binary Operations | Heat | Issues | +|---|---|---| +| bitwise_and | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+bitwise_and) | +| bitwise_or | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+bitwise_or) | +| bitwise_xor | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+bitwise_xor) | +| invert | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+invert) | +| left_shift | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+left_shift) | +| right_shift | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+right_shift) | +| packbits | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+packbits) | +| unpackbits | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+unpackbits) | +| binary_repr | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+binary_repr) | ## NumPy IO Operations [Back to Table of Contents](#table-of-contents) -| NumPy IO Operations | Heat | -|---|---| -| load | ✅ | -| save | ✅ | -| savez | ❌ | -| savez_compressed | ❌ | -| loadtxt | ❌ | -| savetxt | ❌ | -| genfromtxt | ❌ | -| fromregex | ❌ | -| fromstring | ❌ | -| tofile | ❌ | -| tolist | ❌ | -| array2string | ❌ | -| array_repr | ❌ | -| array_str | ❌ | -| format_float_positional | ❌ | -| format_float_scientific | ❌ | -| memmap | ❌ | -| open_memmap | ❌ | -| set_printoptions | ✅ | -| get_printoptions | ✅ | -| set_string_function | ❌ | -| printoptions | ❌ | -| binary_repr | ❌ | -| base_repr | ❌ | -| DataSource | ❌ | -| format | ❌ | +| NumPy IO Operations | Heat | Issues | +|---|---|---| +| load | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+load) | +| save | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+save) | +| savez | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+savez) | +| savez_compressed | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+savez_compressed) | +| loadtxt | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+loadtxt) | +| savetxt | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+savetxt) | +| genfromtxt | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+genfromtxt) | +| fromregex | ❌ | 
[Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fromregex) | +| fromstring | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fromstring) | +| tofile | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+tofile) | +| tolist | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+tolist) | +| array2string | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+array2string) | +| array_repr | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+array_repr) | +| array_str | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+array_str) | +| format_float_positional | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+format_float_positional) | +| format_float_scientific | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+format_float_scientific) | +| memmap | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+memmap) | +| open_memmap | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+open_memmap) | +| set_printoptions | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+set_printoptions) | +| get_printoptions | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+get_printoptions) | +| set_string_function | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+set_string_function) | +| printoptions | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+printoptions) | +| binary_repr | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+binary_repr) | +| base_repr | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+base_repr) | +| DataSource | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+DataSource) | +| format | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+format) | ## NumPy LinAlg Operations [Back to Table of Contents](#table-of-contents) -| NumPy LinAlg Operations | Heat | -|---|---| -| dot | ✅ | -| linalg.multi_dot | ❌ | -| vdot | ✅ | -| inner | ❌ | -| outer | ✅ | -| matmul | ✅ | -| tensordot | ❌ | -| einsum | ❌ | -| einsum_path | ❌ | -| linalg.matrix_power | ❌ | -| kron | ❌ | -| linalg.cholesky | ❌ | -| linalg.qr | ✅ | -| linalg.svd | ❌ | -| linalg.eig | ❌ | -| linalg.eigh | ❌ | -| linalg.eigvals | ❌ | -| linalg.eigvalsh | ❌ | -| linalg.norm | ✅ | -| linalg.cond | ❌ | -| linalg.det | ✅ | -| linalg.matrix_rank | ❌ | -| linalg.slogdet | ❌ | -| trace | ✅ | -| linalg.solve | ❌ | -| linalg.tensorsolve | ❌ | -| linalg.lstsq | ❌ | -| linalg.inv | ✅ | -| linalg.pinv | ❌ | -| linalg.tensorinv | ❌ | +| NumPy LinAlg Operations | Heat | Issues | +|---|---|---| +| dot | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+dot) | +| linalg.multi_dot | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.multi_dot) | +| vdot | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+vdot) | +| inner | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+inner) | +| outer | ✅ | 
[Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+outer) | +| matmul | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+matmul) | +| tensordot | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+tensordot) | +| einsum | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+einsum) | +| einsum_path | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+einsum_path) | +| linalg.matrix_power | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.matrix_power) | +| kron | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+kron) | +| linalg.cholesky | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.cholesky) | +| linalg.qr | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.qr) | +| linalg.svd | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.svd) | +| linalg.eig | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.eig) | +| linalg.eigh | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.eigh) | +| linalg.eigvals | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.eigvals) | +| linalg.eigvalsh | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.eigvalsh) | +| linalg.norm | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.norm) | +| linalg.cond | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.cond) | +| linalg.det | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.det) | +| linalg.matrix_rank | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.matrix_rank) | +| linalg.slogdet | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.slogdet) | +| trace | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+trace) | +| linalg.solve | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.solve) | +| linalg.tensorsolve | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.tensorsolve) | +| linalg.lstsq | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.lstsq) | +| linalg.inv | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.inv) | +| linalg.pinv | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.pinv) | +| linalg.tensorinv | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+linalg.tensorinv) | ## NumPy Logic Functions [Back to Table of Contents](#table-of-contents) -| NumPy Logic Functions | Heat | -|---|---| -| all | ✅ | -| any | ✅ | -| isfinite | ✅ | -| isinf | ✅ | -| isnan | ✅ | -| isnat | ❌ | -| isneginf | ✅ | -| isposinf | ✅ | -| iscomplex | ✅ | -| iscomplexobj | ❌ | -| isfortran | ❌ | -| isreal | ✅ | -| isrealobj | ❌ | -| isscalar | ❌ | -| logical_and | ✅ | -| logical_or | ✅ | -| logical_not | ✅ | -| logical_xor | ✅ | -| allclose | ✅ | -| 
isclose | ✅ | -| array_equal | ❌ | -| array_equiv | ❌ | -| greater | ✅ | -| greater_equal | ✅ | -| less | ✅ | -| less_equal | ✅ | -| equal | ✅ | -| not_equal | ✅ | +| NumPy Logic Functions | Heat | Issues | +|---|---|---| +| all | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+all) | +| any | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+any) | +| isfinite | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+isfinite) | +| isinf | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+isinf) | +| isnan | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+isnan) | +| isnat | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+isnat) | +| isneginf | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+isneginf) | +| isposinf | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+isposinf) | +| iscomplex | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+iscomplex) | +| iscomplexobj | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+iscomplexobj) | +| isfortran | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+isfortran) | +| isreal | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+isreal) | +| isrealobj | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+isrealobj) | +| isscalar | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+isscalar) | +| logical_and | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+logical_and) | +| logical_or | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+logical_or) | +| logical_not | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+logical_not) | +| logical_xor | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+logical_xor) | +| allclose | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+allclose) | +| isclose | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+isclose) | +| array_equal | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+array_equal) | +| array_equiv | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+array_equiv) | +| greater | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+greater) | +| greater_equal | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+greater_equal) | +| less | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+less) | +| less_equal | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+less_equal) | +| equal | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+equal) | +| not_equal | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+not_equal) | ## NumPy Sorting Operations [Back to Table of Contents](#table-of-contents) -| NumPy Sorting Operations | Heat | -|---|---| -| sort | ✅ | -| lexsort | ❌ | -| 
argsort | ❌ | -| sort | ✅ | -| sort_complex | ❌ | -| partition | ❌ | -| argpartition | ❌ | -| argmax | ✅ | -| nanargmax | ❌ | -| argmin | ✅ | -| nanargmin | ❌ | -| argwhere | ❌ | -| nonzero | ✅ | -| flatnonzero | ❌ | -| where | ✅ | -| searchsorted | ❌ | -| extract | ❌ | -| count_nonzero | ❌ | +| NumPy Sorting Operations | Heat | Issues | +|---|---|---| +| sort | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+sort) | +| lexsort | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+lexsort) | +| argsort | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+argsort) | +| sort | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+sort) | +| sort_complex | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+sort_complex) | +| partition | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+partition) | +| argpartition | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+argpartition) | +| argmax | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+argmax) | +| nanargmax | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nanargmax) | +| argmin | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+argmin) | +| nanargmin | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nanargmin) | +| argwhere | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+argwhere) | +| nonzero | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nonzero) | +| flatnonzero | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+flatnonzero) | +| where | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+where) | +| searchsorted | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+searchsorted) | +| extract | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+extract) | +| count_nonzero | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+count_nonzero) | ## NumPy Statistical Operations [Back to Table of Contents](#table-of-contents) -| NumPy Statistical Operations | Heat | -|---|---| -| ptp | ❌ | -| percentile | ✅ | -| nanpercentile | ❌ | -| quantile | ❌ | -| nanquantile | ❌ | -| median | ✅ | -| average | ✅ | -| mean | ✅ | -| std | ✅ | -| var | ✅ | -| nanmedian | ❌ | -| nanmean | ❌ | -| nanstd | ❌ | -| nanvar | ❌ | -| corrcoef | ❌ | -| correlate | ❌ | -| cov | ✅ | -| histogram | ✅ | -| histogram2d | ❌ | -| histogramdd | ❌ | -| bincount | ✅ | -| histogram_bin_edges | ❌ | -| digitize | ✅ | +| NumPy Statistical Operations | Heat | Issues | +|---|---|---| +| ptp | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ptp) | +| percentile | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+percentile) | +| nanpercentile | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nanpercentile) | +| quantile | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+quantile) | +| nanquantile | ❌ | 
[Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nanquantile) | +| median | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+median) | +| average | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+average) | +| mean | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+mean) | +| std | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+std) | +| var | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+var) | +| nanmedian | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nanmedian) | +| nanmean | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nanmean) | +| nanstd | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nanstd) | +| nanvar | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+nanvar) | +| corrcoef | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+corrcoef) | +| correlate | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+correlate) | +| cov | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+cov) | +| histogram | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+histogram) | +| histogram2d | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+histogram2d) | +| histogramdd | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+histogramdd) | +| bincount | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+bincount) | +| histogram_bin_edges | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+histogram_bin_edges) | +| digitize | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+digitize) | ## NumPy Random Operations [Back to Table of Contents](#table-of-contents) -| NumPy Random Operations | Heat | -|---|---| -| random.rand | ✅ | -| random.randn | ✅ | -| random.randint | ✅ | -| random.random_integers | ❌ | -| random.random_sample | ✅ | -| random.ranf | ✅ | -| random.sample | ✅ | -| random.choice | ❌ | -| random.bytes | ❌ | -| random.shuffle | ❌ | -| random.permutation | ✅ | -| random.seed | ✅ | -| random.get_state | ✅ | -| random.set_state | ✅ | +| NumPy Random Operations | Heat | Issues | +|---|---|---| +| random.rand | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+random.rand) | +| random.randn | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+random.randn) | +| random.randint | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+random.randint) | +| random.random_integers | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+random.random_integers) | +| random.random_sample | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+random.random_sample) | +| random.ranf | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+random.ranf) | +| random.sample | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+random.sample) | +| random.choice | ❌ | 
[Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+random.choice) | +| random.bytes | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+random.bytes) | +| random.shuffle | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+random.shuffle) | +| random.permutation | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+random.permutation) | +| random.seed | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+random.seed) | +| random.get_state | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+random.get_state) | +| random.set_state | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+random.set_state) | +## NumPy FFT Operations +[Back to Table of Contents](#table-of-contents) + +| NumPy FFT Operations | Heat | Issues | +|---|---|---| +| fft.fft | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fft.fft) | +| fft.ifft | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fft.ifft) | +| fft.fft2 | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fft.fft2) | +| fft.ifft2 | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fft.ifft2) | +| fft.fftn | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fft.fftn) | +| fft.ifftn | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fft.ifftn) | +| fft.rfft | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fft.rfft) | +| fft.irfft | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fft.irfft) | +| fft.fftshift | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fft.fftshift) | +| fft.ifftshift | ✅ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+fft.ifftshift) | +## NumPy Masked Array Operations +[Back to Table of Contents](#table-of-contents) + +| NumPy Masked Array Operations | Heat | Issues | +|---|---|---| +| ma.masked_array | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.masked_array) | +| ma.masked_where | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.masked_where) | +| ma.fix_invalid | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.fix_invalid) | +| ma.is_masked | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.is_masked) | +| ma.mean | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.mean) | +| ma.median | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.median) | +| ma.std | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.std) | +| ma.var | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.var) | +| ma.sum | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.sum) | +| ma.min | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.min) | +| ma.max | ❌ | 
[Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.max) | +| ma.ptp | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.ptp) | +| ma.count | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.count) | +| ma.any | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.any) | +| ma.all | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.all) | +| ma.masked_equal | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.masked_equal) | +| ma.masked_greater | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.masked_greater) | +| ma.masked_less | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.masked_less) | +| ma.notmasked_contiguous | ❌ | [Search](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+ma.notmasked_contiguous) | diff --git a/doc/Makefile b/doc/Makefile new file mode 100644 index 0000000000..d0c3cbf102 --- /dev/null +++ b/doc/Makefile @@ -0,0 +1,20 @@ +# Minimal makefile for Sphinx documentation +# + +# You can set these variables from the command line, and also +# from the environment for the first two. +SPHINXOPTS ?= +SPHINXBUILD ?= sphinx-build +SOURCEDIR = source +BUILDDIR = build + +# Put it first so that "make" without argument is like "make help". +help: + @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) + +.PHONY: help Makefile + +# Catch-all target: route all unknown targets to Sphinx using the new +# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). +%: Makefile + @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) diff --git a/doc/make.bat b/doc/make.bat index 02b8e03b50..747ffb7b30 100644 --- a/doc/make.bat +++ b/doc/make.bat @@ -1,64 +1,16 @@ @ECHO OFF +pushd %~dp0 + REM Command file for Sphinx documentation if "%SPHINXBUILD%" == "" ( set SPHINXBUILD=sphinx-build ) +set SOURCEDIR=source set BUILDDIR=build -set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% source -set I18NSPHINXOPTS=%SPHINXOPTS% source -if NOT "%PAPER%" == "" ( - set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS% - set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS% -) - -if "%1" == "" goto help - -if "%1" == "help" ( - :help - echo.Please use `make ^` where ^ is one of - echo. html to make standalone HTML files - echo. dirhtml to make HTML files named index.html in directories - echo. singlehtml to make a single large HTML file - echo. pickle to make pickle files - echo. json to make JSON files - echo. htmlhelp to make HTML files and a HTML help project - echo. qthelp to make HTML files and a qthelp project - echo. devhelp to make HTML files and a Devhelp project - echo. epub to make an epub - echo. epub3 to make an epub3 - echo. latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter - echo. text to make text files - echo. man to make manual pages - echo. texinfo to make Texinfo files - echo. gettext to make PO message catalogs - echo. changes to make an overview over all changed/added/deprecated items - echo. xml to make Docutils-native XML files - echo. pseudoxml to make pseudoxml-XML files for display purposes - echo. linkcheck to check all external links for integrity - echo. doctest to run all doctests embedded in the documentation if enabled - echo. 
coverage to run coverage check of the documentation if enabled - echo. dummy to check syntax errors of document sources - goto end -) - -if "%1" == "clean" ( - for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i - del /q /s %BUILDDIR%\* - goto end -) - - -REM Check if sphinx-build is available and fallback to Python version if any -%SPHINXBUILD% 1>NUL 2>NUL -if errorlevel 9009 goto sphinx_python -goto sphinx_ok - -:sphinx_python -set SPHINXBUILD=python -m sphinx.__init__ -%SPHINXBUILD% 2> nul +%SPHINXBUILD% >NUL 2>NUL if errorlevel 9009 ( echo. echo.The 'sphinx-build' command was not found. Make sure you have Sphinx @@ -67,215 +19,17 @@ if errorlevel 9009 ( echo.may add the Sphinx directory to PATH. echo. echo.If you don't have Sphinx installed, grab it from - echo.http://sphinx-doc.org/ + echo.https://www.sphinx-doc.org/ exit /b 1 ) -:sphinx_ok - - -if "%1" == "html" ( - %SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The HTML pages are in %BUILDDIR%/html. - goto end -) - -if "%1" == "dirhtml" ( - %SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml. - goto end -) - -if "%1" == "singlehtml" ( - %SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml. - goto end -) - -if "%1" == "pickle" ( - %SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle - if errorlevel 1 exit /b 1 - echo. - echo.Build finished; now you can process the pickle files. - goto end -) - -if "%1" == "json" ( - %SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json - if errorlevel 1 exit /b 1 - echo. - echo.Build finished; now you can process the JSON files. - goto end -) - -if "%1" == "htmlhelp" ( - %SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp - if errorlevel 1 exit /b 1 - echo. - echo.Build finished; now you can run HTML Help Workshop with the ^ -.hhp project file in %BUILDDIR%/htmlhelp. - goto end -) - -if "%1" == "qthelp" ( - %SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp - if errorlevel 1 exit /b 1 - echo. - echo.Build finished; now you can run "qcollectiongenerator" with the ^ -.qhcp project file in %BUILDDIR%/qthelp, like this: - echo.^> qcollectiongenerator %BUILDDIR%\qthelp\HeAT.qhcp - echo.To view the help file: - echo.^> assistant -collectionFile %BUILDDIR%\qthelp\HeAT.ghc - goto end -) - -if "%1" == "devhelp" ( - %SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. - goto end -) - -if "%1" == "epub" ( - %SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The epub file is in %BUILDDIR%/epub. - goto end -) - -if "%1" == "epub3" ( - %SPHINXBUILD% -b epub3 %ALLSPHINXOPTS% %BUILDDIR%/epub3 - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The epub3 file is in %BUILDDIR%/epub3. - goto end -) - -if "%1" == "latex" ( - %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex - if errorlevel 1 exit /b 1 - echo. - echo.Build finished; the LaTeX files are in %BUILDDIR%/latex. - goto end -) - -if "%1" == "latexpdf" ( - %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex - cd %BUILDDIR%/latex - make all-pdf - cd %~dp0 - echo. - echo.Build finished; the PDF files are in %BUILDDIR%/latex. 
- goto end -) - -if "%1" == "latexpdfja" ( - %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex - cd %BUILDDIR%/latex - make all-pdf-ja - cd %~dp0 - echo. - echo.Build finished; the PDF files are in %BUILDDIR%/latex. - goto end -) - -if "%1" == "text" ( - %SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The text files are in %BUILDDIR%/text. - goto end -) - -if "%1" == "man" ( - %SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The manual pages are in %BUILDDIR%/man. - goto end -) - -if "%1" == "texinfo" ( - %SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo. - goto end -) - -if "%1" == "gettext" ( - %SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The message catalogs are in %BUILDDIR%/locale. - goto end -) - -if "%1" == "changes" ( - %SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes - if errorlevel 1 exit /b 1 - echo. - echo.The overview file is in %BUILDDIR%/changes. - goto end -) - -if "%1" == "linkcheck" ( - %SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck - if errorlevel 1 exit /b 1 - echo. - echo.Link check complete; look for any errors in the above output ^ -or in %BUILDDIR%/linkcheck/output.txt. - goto end -) - -if "%1" == "doctest" ( - %SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest - if errorlevel 1 exit /b 1 - echo. - echo.Testing of doctests in the sources finished, look at the ^ -results in %BUILDDIR%/doctest/output.txt. - goto end -) - -if "%1" == "coverage" ( - %SPHINXBUILD% -b coverage %ALLSPHINXOPTS% %BUILDDIR%/coverage - if errorlevel 1 exit /b 1 - echo. - echo.Testing of coverage in the sources finished, look at the ^ -results in %BUILDDIR%/coverage/python.txt. - goto end -) - -if "%1" == "xml" ( - %SPHINXBUILD% -b xml %ALLSPHINXOPTS% %BUILDDIR%/xml - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The XML files are in %BUILDDIR%/xml. - goto end -) +if "%1" == "" goto help -if "%1" == "pseudoxml" ( - %SPHINXBUILD% -b pseudoxml %ALLSPHINXOPTS% %BUILDDIR%/pseudoxml - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The pseudo-XML files are in %BUILDDIR%/pseudoxml. - goto end -) +%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% +goto end -if "%1" == "dummy" ( - %SPHINXBUILD% -b dummy %ALLSPHINXOPTS% %BUILDDIR%/dummy - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. Dummy builder generates no files. 
- goto end -) +:help +%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% :end +popd diff --git a/doc/requirements.txt b/doc/requirements.txt deleted file mode 100644 index d65f26ec21..0000000000 --- a/doc/requirements.txt +++ /dev/null @@ -1,5 +0,0 @@ -Sphinx==7.4.7 -sphinx-autoapi===3.3.0 -sphinx_rtd_theme==2.0.0 -sphinxcontrib-napoleon==0.7 -sphinx-copybutton==0.5.2 diff --git a/doc/images/GSoC-Horizontal.svg b/doc/source/_static/images/GSoC-Horizontal.svg similarity index 100% rename from doc/images/GSoC-Horizontal.svg rename to doc/source/_static/images/GSoC-Horizontal.svg diff --git a/doc/images/bsp.svg b/doc/source/_static/images/bsp.svg similarity index 100% rename from doc/images/bsp.svg rename to doc/source/_static/images/bsp.svg diff --git a/doc/images/clustering.png b/doc/source/_static/images/clustering.png similarity index 100% rename from doc/images/clustering.png rename to doc/source/_static/images/clustering.png diff --git a/doc/images/clustering_kmeans.png b/doc/source/_static/images/clustering_kmeans.png similarity index 100% rename from doc/images/clustering_kmeans.png rename to doc/source/_static/images/clustering_kmeans.png diff --git a/doc/images/data.png b/doc/source/_static/images/data.png similarity index 100% rename from doc/images/data.png rename to doc/source/_static/images/data.png diff --git a/doc/images/dlr_logo.svg b/doc/source/_static/images/dlr_logo.svg similarity index 100% rename from doc/images/dlr_logo.svg rename to doc/source/_static/images/dlr_logo.svg diff --git a/doc/images/fzj_logo.svg b/doc/source/_static/images/fzj_logo.svg similarity index 100% rename from doc/images/fzj_logo.svg rename to doc/source/_static/images/fzj_logo.svg diff --git a/doc/images/hSVD_bench_rank5.png b/doc/source/_static/images/hSVD_bench_rank5.png similarity index 100% rename from doc/images/hSVD_bench_rank5.png rename to doc/source/_static/images/hSVD_bench_rank5.png diff --git a/doc/images/hSVD_bench_rank50.png b/doc/source/_static/images/hSVD_bench_rank50.png similarity index 100% rename from doc/images/hSVD_bench_rank50.png rename to doc/source/_static/images/hSVD_bench_rank50.png diff --git a/doc/images/hSVD_bench_rank500.png b/doc/source/_static/images/hSVD_bench_rank500.png similarity index 100% rename from doc/images/hSVD_bench_rank500.png rename to doc/source/_static/images/hSVD_bench_rank500.png diff --git a/doc/images/heat_split_array.png b/doc/source/_static/images/heat_split_array.png similarity index 100% rename from doc/images/heat_split_array.png rename to doc/source/_static/images/heat_split_array.png diff --git a/doc/images/heat_split_array.svg b/doc/source/_static/images/heat_split_array.svg similarity index 100% rename from doc/images/heat_split_array.svg rename to doc/source/_static/images/heat_split_array.svg diff --git a/doc/images/heatvsdask_strong_smalldata_without.png b/doc/source/_static/images/heatvsdask_strong_smalldata_without.png similarity index 100% rename from doc/images/heatvsdask_strong_smalldata_without.png rename to doc/source/_static/images/heatvsdask_strong_smalldata_without.png diff --git a/doc/images/heatvsdask_weak_smalldata_without.png b/doc/source/_static/images/heatvsdask_weak_smalldata_without.png similarity index 100% rename from doc/images/heatvsdask_weak_smalldata_without.png rename to doc/source/_static/images/heatvsdask_weak_smalldata_without.png diff --git a/doc/images/helmholtz_logo.svg b/doc/source/_static/images/helmholtz_logo.svg similarity index 100% rename from doc/images/helmholtz_logo.svg rename to 
doc/source/_static/images/helmholtz_logo.svg diff --git a/doc/source/_static/images/jsc_logo.png b/doc/source/_static/images/jsc_logo.png new file mode 100644 index 0000000000000000000000000000000000000000..d7f23da6ec57004903a67ae80ad51ee441937a4a GIT binary patch literal 16766 zcmV)qK$^daP)4e@w0yZ5}OiVGx4HvAoQ{H>?-+gaamcir;Tast-P6n}7guFX*=JtDz;1-YKG1!Mt zyadIgh^pdIL{;%9qN;clk0PoD^B^uE#j8+6)t(>7B@X>c08%b?T}%jJ|FCbk1ob;H z{RRf`F$lts@NX24BC2*iI7%pjuxn#tW7ncD5HS(45J4Eeztiau$A%xU9QXnH!Y%@+ zh^k%UiW(=pPsWMPUGY-9Kb zIS$1Q1&zZ`3}2D$WHF&XW)e4xonC~qMF8zXsG`W&RhGvbp36gV_=iHO+jTcv_WJIg z54Lr#{&34%>o&gj{<>G*Ui0Ex@4WcdyZn5q@9V$cTlww>tJZG#_lC_MZ0%gXrIXvb zB8&A&XaNv!o1)^35A)6T<2W$@kl-UeY+g$h64VxZ%>CNBY_0ZXW zxG{6#RZZtyI`za0D&jx+?&1>SFF)}&*N<6nyf9{(aP%Uns;vafPfSo_N+>M^<4Rpx zqCnB&)B?$&q^L$kEk^%He}KLa*iVYgR>3#HR>cMO!-AHir&71nR}(q1{=`?;ZL<0t za1m7niz;6GW+^>|5przD>2;tK`CfO=#WycIcIl7DE<8>gH&ZCDmx6KG=#Y)2qMDY~ z1T{he1q#3-E`;J@ZBz_Jq}mAmDjH!?kL1kK4#$`1Z>x6vzbgH;>5*C_a?Yj9Q1!!6 z-VGNIucNB(>F1crhvcqFa5o{;R4iY!etPD|Us4zBueHk6O-djv2O?51DyT_OPm94M zhZI;Qtq&g&L1PgTJ@u%tv$OjTr^IZY5{!O%QtWR}J;zE}sEDcpLsehWhoIbO(Flqr zb`EhnF?na}_P;#-{E_q0!ndkN1Y2m7t%N${P=}(Y6zZh3gsju>DB`0&js{H3bJ0`O zQ=*;$2S-g8oC*F-uuK~L8Uv8!%u@1`s1ntfPm7fV+rBZmr5B8)>3$4VMGxprR1xOv zFzb)d0Nkc%!pOmvneBo|+;ty)?X8n8zIN=wj4_52znagZYU3mkN~&+b+1G zhZ<4a#Ka`UVHoeDiZ|(?af>VmS=2tg{K3XEuKt^U!Yr|Bp`x>7Eff`vs6Z1pBL+Et zZO-L!yET-kO86*z$4-6v-L(KR547^#~xdb zZNo&0d(ZQ)f9II+h+uQ67Ll}QX(%B>{!d{7lJWCjquU{n69Bjm7>N< zsurJjuYios9b?}RXl zxIES;8Fp0hSc*E_m{=DV_2|ez^F@ET11eNbuL}dn{R^PQV=tqM+bV}fy9nDha=I~D z_OBNY4%L^|v?$cqXw)!JB&7u{!@efL85yx*hZOHLCn>DuP;BIsxc~6UAHifh?`0RO zpPoY%Oi9r=H|x6VP3vcWzS}o`ZfP*AuoAh##+YbGE^0}d^s!($DukkY166U>&&eqC z4IDKqeZlYOTelq3g>4)N3Ug}_#bcOJMf$qJ{iZi*s7s%C`mC>5vwp#8KNk*}RuXJ0 zQNtxhf`-Nt@7C4EsMBF*L@=V_-a!@ED$z*$=-U`OW#zgJ-I(SUFk*`M2nDR)6j3$2 zsN&UAgW|esa+5_wT)X;J!VN{7eVpOOCALh4Vw$pC44YxgFoW*(my?Z?mj>o_}N1 z#AEz{hLTW5s0>4iT=&?i#YLDlj)27)H~0F-SM$2HPZhuhtM#X3~;@6Xs9xDl@1Lp;kk)vw3AA=!5Reo`bxcQ-jEVjD{+Jtc87r#{|&fafwMy}2X!46SR zNISW?#4ZP$h8Ily&BSCqrqssdi3{3(a?wZd{A1VKD>OXDgY6=OvIKV{2a&x~_#f&V z5V4pRcOtyebr+rfa|)@FI_T7j+7dzQ5OP_b2Mra}U?nYbP+i+~_djMbE(VW+Y3I|y zWIHA=ty$Z}X89rP_&mi`usqJpFLivYx0~}$J6E1CTMo7hYFyNz2_wQl3+%Ue5K;of z=`p_%J9u*QhmiOIp9k?~T(x6p7V)e7$ZP-pO1bv<>#KU93E<`f_5wx~=~qu62sv5e zzWL#{$(b{J<@0>d*Gs~tlpdFie7j9ccn42n9#zzW`a|iGV0-$)--BDhS%=*qPdYx} zi9p!Fu4TiTh+7vw(Rb9OZ%$cs?}}IHpXps-(-k=1+8~95Dn_otnN@7~tikxF3W_1h{VU!~lhYDi%n&^y9%jAk<22qp{Q}=ZvbHuLN58u7T)H zRDHr;QlsUoX{*hgZGtPeA(QoLtj-p=sqeFG2UP z6d)zyz{1Wp)5-kuDzUmzG@=m2s!_#Y>$NKN=8Z=zNYRwg0svY=y|_3?sB+mB19oiN zqz}SlZ>J~mXj=Qq`VS{GoFJ6XD+#3(BQ;zz=6jARaD0+MAM1>)XGRC|O&@35=k9|4_6GI89I$Wkj%f`=I{{9vdxZ`XX$EDs7Ryqbb z6%qEtP~{;--=>48-+qstt+57Q^G;G4!BX4b_otlJ_S8uDo|8liuzNg#!iQn0<$$cS2UL?~8v#FP&l z*M#1)O(S;xxyy!GJD2W!NX!=Ak}#2^|yTGGfMR^V~R=&Mp`uDg;GgpYxPN*Rrrjk zaOmVG|Gfsv366sh@+_os&o+j{d%f)(6kMPLqp)pe?6QFYn?f{aD$|o?fws|s#iw3z z!&c&Mw=6eTYe3$xRXI%#(&eB{uKUfY4N9=ZuSa1}I2eJlXDC@1&6ynwrBPq7?eGOl zJ2_W2;~mYfQoEi-6}2(Hy6KMn^=2A!fI%`~EQ}BH?2IbN4P`ar*P?Q;{YzTY{jaRr z&i0%z%5qBZy^1Q3Pe2x2aGHI9p=v+pl94sdxsRd63JXm6GF4pGlfu{;SKRsFM-K8T z8*z@_o6&l+AqoQI-cy#I>DSv8HAcgaT{c_ATr?8pjdKXLs_Mr#9>2kMdK`K)9X8^Z zr()jY$4J=S4@WZx;GCvG-#sf{{gS%aAB+?pnEF4;T2c;1zE;=z%6lJl+O~I5m0vB% zZRoo0v#VBrV^&x)!jc+O)TnHH+R|*FucvWjvV{KFp;J$~+d)>=b1$x zRGZJf=H?!5t_I0O3%le0T-U~EJ8?tNlSXK5qLCB~`m7glTqWsof1qXI4}Z=IKv{Zy zJxz87nW=3es0A!M?L6wZ%e0gas3^j~>7@;zI!L2+nwc zitGbK6&tl+PYL9G2*$&%yZ4dN6?03WdK+fZZsDLxP4X=a;BeIZ)y}@X`vhY@iwR?o zx&F!J5Uet>BjVi&1*lgc)}W+!VeI^6e&NV8(8mXfrLWMrgZ zWWQkZb@x93W9`_<*N>c`v{eKg1ISq;oBplWJp7bcK8HnxOMwNCZH|kDQ&x~UaV>pJ 
zA-j$V<@1lfOs5 zGuHN)PF@{&w^wln7FWT%eQv*x{#E5o8K2rrZw$paeK#yxj|hbmSLIQap%Il&i|!X_ zyyNNTAxgJ;SQ-T7AmZAdL>1v$_NzBLNQZ}D4GzeaV}p2EO6t;OmGc*! zc#g}LCcuuMAwv~CAfzJquXynbT73y{Ru^)AhCUSrHByWxTpM53k61PB5MFX8K*F7jWr_=~TBUy_NZjuDMscN*WrlIAm zi@;pjJwt~oHl63<&9+mQIa{c1qqjsXWK>D8$&{9Yk%Q}+f#k{a5fukLf0I|=_1Hm^ zB627JPF!yvOwMrVU(w$gmx68Io*iDbb(;f_8nUs(xjD! z>avj$CN`dP!|l|eSZ23ZmuT=#BW1}RiJKDCSRr-&cAi%AYxMFZ$^!K-uHE1eJJ<5^ zq={ma^PGzqHvV3G`~Aaar$!qQSak>G-{=lW-`$W@$qNVP&xE3d6^gA;#$VIYa`q+L zEC={-u+275If2Q5io?P{tS>=8s+IZG?}Z8TWi_b)6-TU4J)i;dU2>ph+VSUp(A~pA zM$@yQl*?-TZ1M?pJLnJhJmx=ot}oC^gDNQ} zs_g^wfJ|LcI7EewGc5zfDlL-Vs7V%Fu}Z3I`}&NgSJ$nFhXwIrTy4xQo$4X}vFq&FosCK<^=MOBi?SJSw;L#$}7O`X+Evl&>AhN3=jnG8&3Y@k)p z_q%Oz%p|`~A+=NpCZOa~h^W$F?!_0ZpT6`A$U+=w(0F&4*yTK!wvDqit5|aRb;9^X zu!9wfigmDQ4V(kJpN#JURd9WRuqQ0*9ZDcB95!{64J;otq=Ha^IGF5twBGMGJp9rd zzN2SK(1TpU=y;-qiK?Vbzm7Zh&gHMLl{(ud1bcQbnN2$pnk=E-tm{fgO;teWCrup% zm@8luiix$U!L(J>s|MQ?YOd6@tRiS!>1o4UPGF>3#k6Eq9gF)C+;|r)^>l z=C$Ip0$N+Pe}dI7@2vWYSD(Af0)zby4$qtBL=Cr}Q>H}FI^YANg{@0{d zrKUw1=If@056~K}yZYDo>^i|FkRW{>H^G~BtfE9y8 ziQWt~sRU!=n@@gy(`HsLMyv|~S{#I}zIhQ<$g~N5_~ljpi3@0|PxCTqIG*POCyimK zqM2ZN%>3hCcxR2-k3@9PDhm(2^r|p!YU!j7IRLJNN!AubXcbdx`?_3L zJYtFhmpa0C#D`7IfTPM!Bk6IouKN39JuYMGBm<)48`vZatA2?`*2!wS97w^;LZ}e5 zLpZ9SQk9YeZQnVzjc{>#&(7%OSuxqP0wnUkx$S<~dI-fC|6+95q=bRy3an#H*5YDi zed?Eg*b3`a&Hx9|2G+5GB{{dE^L$Xv zNsDS~m_=VXy9o{06)pDPliD_AO~!9z58fn<13QOLta@KKY!Yk>X-Q}>7s4>HOi`sK zgo=e7mtETh6`-t*EpIwN1Qky<#Y7y7p=#l2=lH7Xl~6{c2_Wyc4rAAq$5m2ohp)Cn z`0nKQI=iz7vHb`#IJSx~4w7}I{oj+DWGy0r7^zonmBvN0DDZ?tbxg zFjK@D$Rzs!Q3b)C*dh zZL8_5ON8-@OKOv1Fe0;Uv%P{UKEW(jG{k@UJ61_E2eNlMKu#Fy^c`yZR&Mz4uzAT6 zJuG=uL4;p-QP04srTuDj%>3jVo41)Db3}o~X*oO>uoFJ_$%F&Ck@fbr&Jp8h`h!t1 z5cL_UT;Qz5hY?jdlP>|^T}-ZSQQJ?2;T((x&jYgO8?y!J=-C(TA8M6%o?C|ojyHNZ z?7Al2L&mJ<14JojQ*!5f~6e^4aG0GiG_ z)u=*m?qLR?Y7pnv5#LH9*g}~4)wc7l@K-i5&TbaaalIPvpi0k3db(7P`Hr4(%hS&R zs*ue#Mf&QD#CxbBCcHyn=pC>iIp4Ra0gZJn7?Aq)2r9yrA7y578EEPbL8<> zw#zctc4z1O@chwwi(DNSf*qokQlQol9p1n^za_#&vqC_+3Z#s|$o^`>nO9zK<@FAT z=NVsE_6Ld|=UsD)QZaADl#ZMS*t@8rk1MrR)DMQ@(!^QIo>@LGSh-*W+O4IwkoG16=RE+aVZckt5`TGdJ1uQwc7SRix3;o zqcN-VPIs@K{;?cr6hm<#v{yP4A463Vrcc?pguiOut^Zn%bMAUjJm8K97qiJp8zIo- zxcTva$zvyvGGa#95qN$31Vna>SNgK1Kg>8fpDFk9Fw!eb2 z1V6Ou-H}EMkiD>`)80oFoa}8FQMqc~)ek;482f@!Cd$qg$4qXTvqFHrBrH41=%Cfh^YT-Fny z4>QO#aJ*l^fD57lG~JDe!MLDiWMc{a8%y-oZyX!G@1@teawT9OQ}O833oI2lY}?&6 zLA?w{3AZBrmD&cypdKd*EX$fFY!#c`OG-w9dcfZ;yWbr|Tea(IuuB6h({k|!;+}fd zjU%h)mj)w>nv(S;qMDMxVzuYQdIrO2vDQHpJU#nulFm7C(t@5T4Rr|P=6`Q??6zlL zpb>hHgLs#Tc(66IBk{`#&^u62fXe61Pd_glHVIG#!^G^;4S!qgleUU|Q_+*9l?$$1 zcAx7GC`=W0|8Zm5;xuya!S)u6uDJd0zOmCws>A-!aiY<|CSqH2veJWqD#khtb=Nc? 
z2_PGTuSOCm`Hz~bwVnE})$dYBu^Au)W|Vz;%cEp3F4qa?FAytOkDHqqp*AT7A5m`; zybxFE>^iG}s*;KG|8VyMLv5@08l8hl7J_#dvf@Wye`nPAX`_sAN!=3a2gRE9&&=7y z6~=a9{9yr02vXyep&0emW@J6?3nfNPNsU$;XB~I$n(f`-@c5)6Fqq`UXuOFk=vP3n ziX010*L`B;s{O0x`PGONN-$Lf&!8$8kw8Hyp-h~6&x)7v5ZkK00S-0=Kp-nL>D77d zjf%#MIAP96BdJu!cCE;>TkwKz4E(n#DHti6l=5q>zM2L9J0sfiqw`(Hu|(+Y0oFXO z@yhcHn-JtlRKaEALNT@l8dBW_wmcK-X z9##Fz7BqOrre!+7bk+OAD{t7pW`VD!S>`(TJ2;6F|4{;pZIT!?k4^A>H1Xq35Gqn5 zYU5)00%gLC*(aa-*z2q5d+4EmuVXom$rP@09D{7m!XhN=&ADM?TGCeC~O&9?`*JShy~rGj=Ff=T}Eh&=uFx+zP~5hu(l zGib6FgLP*Z>1R<9y;wDEAj=M>KEf*UT(yfU2SF!2&tLr}b?)t9;W*?|GkJh83 z>y|2kgseld%EuhU{4*h|J5CBCIhgiQUmlDrG|aDVrC=)6W25xQ7uBY(h34P!-0Kuf zoesYLKd%8MSv3C@763z#!0fYH3+Z zf#j>&E>tcO#?AbK(G*Ob+3>>)Zhhik^lveN(9AZoJua6TD-0lg+^Dn2&$5W~)|TyG zsjXMEu;5AAPFCX2!2IB$2CIYY6;#31GU@dq*aD7H)-!P!6xQ@uH$VRL^^ZS$(q%V% zdup@rT}?P-l5oT7`r=#DwidvzEoCF+Zu$bh<>-nCVPQF7DQ2k5g!wb56b@+PX}AgJhy)h!i` z$GJIPkzqy^%m^Ts%;#=BP^I!Y8Vrk*h=!*GS&nVP261i)i0N78a6;H%xs7tPZl;-q z_#ZKOf?N%X7okY*j|F0zK9)IAP&rUqbANyLLqhp<|Kz9`4100&8-xO)pzKvy-|&-* z8ReHVXmjhIrF^x+mTa!$e&s>zNMu^t(p;LPc1GRH@9y=@+( z?*dfb6?W%cKmPvc!)3V^jS3E*6)(}7l-e-FRUVWQR^tkkITAwo;LTKiR=Sr1LQYa3b(4OR3rKkeXhf>0Kro7XWH9__g6;b6G|F;8D zki(jY+_U19FB#4LP`=;bNihVN#!H^3K*NH_rti;br<~T zmr`YeVq|hd(jHVt2lkFAsBi__E82g!*|IRn@}Zp~s=Sc_vMB|Z(Ytpc^2~edgt60n zFl3S^Q1Bqb0PF6Hp_owJ+I;>M+o{{eJ%|iQbfk!?VX7+-Y$BBH!MN9T%NM8pfo7OG z1QwwP3?v$!i>UIziVNk7e!c9GUP6PaUMPtaQRM+hv7#N66?zfsp|o8ilC zekb}EM?A#V)Nh1=`sfkW4fi~~!eTp1cyH9JipMUCc$hI%WRYvH+WgUxiN#ncy`s>4xfo8#T?KpMJmTdrtM~Fkbd3`%jAg*BsI2k(w7jt$=syHOO0rt z?c@#JunW(6AjNu>M-o7krRR3rd2YkDuTPJa=n>gSeF9aUsHoKRn;ft0N62 zi+}OhBM%6&$)Luz;EYSfiHm%p1c=fU1Ov1dJwR`~{BQTVO!lSU*UWv`!oSGif zU7{~d;;uvPsENmptff9BCDO!7jd&79DG}ZtqklG%l9BYOZHLTgaAB1SXg6`FBz$_x{OlUFtOTTw$YOFY#OkuGEv{ReI-{g&k<7Op8RdkS zqrLR3vPMkO)4pI_K5EVtx8K*p8G+bNEONO@b+NWSjAqKsa|Y3S;IQTVAN=MzseEQB z)5;QQm;!4>(P3sJ{EVpLEUz6tHGYuRxN`j_n~TF^%=pBaB`%_B$o(BA`Kf0#BlM#MrWc0LojioVP#PUfyuLzq(0MqrSSNq(ut{RY|cn;nO4EnUQ+_z4h3&c8^k^h^pQ7 zee6J(6R%<0hN*6cxZ99>OzJdY{5)T6Mlv!2#8Qz0(Wp*HYT93$@mJN)KK>kwbuf#z zYA+lpx3O(I$m+#zr|T|0<-9Ly4L&0()}}-lAdX6Eq1dW~KXjZ_+3p)V{qZ;61}ZRY z6;ZVpEEdj}60!(yMELCMZY!%;;8WXZ*elfmUw3YtvOqkb988R;4p%JbpdTyiWPy30 zh^nElo9Fq#8HXQ2h*2e5SqB!8&-~-fB@^Z_qqu&9Sf7)+|4Ya>N6aek3;6S_hARiGHnL2Q!lx4M8%vDYP)CkICrJ{ z%6ZRny05a4_NmcOsh~&j+j}d#zxjH^=WYs69~qc zqE2B%>rx-nxY1Fj$K-L-|MJ8$HYBzV`>4#$YJk^7;oKJ9K7D9KT?{NlQwM4$orf|2e^BvwD@D@#~3$^v37k zUXw*Mr-ftBuU-*V{~<6(1oyhA$HBcl7;nVnqT3%GIdRq~Jt76d{!q%7Q}OFxN{ALl z%MVnB(t_T>==m2MfAJr;SfKhmSnT3ORP8F9;&BvqvgSt)>NQ>R()tavPWicT_zb@q zk>L?fF*=ucfwYZxW_sc3uhjRcsghvpn3>Vn)@?MAO~$=S&ogabkqTAjV;&l)AK7)iTP6j4D5No9kL&qixJKV$mmTb`ezrJwQ30aCI)i ziSQicI0!k|>0;9d_dN4_?A*(hBc_!FTV$23UxL_9v>;uc7GtPND8ZDhrWGUQuS@yW zw*9IXT=(eHTR^4|8c^8X!Aw;|)$Sb7kcUZg5W(SsT?8Nnh4Z(!cD9^z@t3ExNtKIA z15FCs4QE?&F{O}ECFu02NYhi&lVWxAmqP6gr(d|q0qG4`-Jmff!D#UK$oC;s^~rJ} z2ulh7nl-bQWp!F0DEh0L?)qZotkRk$g*7xpJuIjZfpH?p6kb}i5FwSL_k`8fl%ysl zHCz^K_};XZwY`qXAVpxu4Do=V7+38LKrAg0T%;Hw{QJN>uU9^G)9Fm9H=CQ=F|9O~>qW<%84MI$Qej9^$xNNP&dQiWPX zpr;n`sgW|ZO&NdemG?gaohY*xN(nGV#U-?JnnQ(wDxPc+E?V<36Z_c=E`O4M37_x$ z=f8sqSBW9GFR>$BKM4+a5JB?`K`#QP(O6uy*0vjdc){1FHIEF`OMzxK7EE!I$wQ}| zow58`U5=Nsu!Mo#kE_An?)vz&iflDX}Q6`8MsHe+~;~6o*;A{?zfYhKtdg_4MSY=~oy-6b}V$v|3;Gw(kLyIa5in)Dx z>LAqFa(Z2a&lBVny%?{OLtKg+$96%55NL7w@l1TMy>r<=m;d6Hd%rU)AsjZjOmC9| z?LH$836aquhL+@k_4znsk`ZI(3I<&SkR*=zsvEvHE3$ISwl0iqWDk%o!CR=>5z26e zZi4#^%NEjbE;r_vFeqw^)wJL}j^1W1*so zbJS2{)oGeM6#S$LAQ^Df&f6@xNjk3S?YdpA`_4z5FRop`;@$V}e&y}x1;72qqz3u$ zNxt!O{WZ;Hp|}tTi?HY&gMJHrVZ>#|d;*uA+AWp$F%shE2~uFIlGGSWp@bA@I50T( 
z&ksG>3s#Ecw?^?$OE3;Cs;G(Ta?lNT|NW=GzwU~g?zr)RC;tBIi>o$n>)owiNMZE$ zN4*cf^!j!8J$l8h_x<#`znysL4YjE=zdp6Cq;jq>Zl+MNSPr%-YP(DUq@@&NsiZ9t z)PzV;1+!2TJv5NwOK{xlC0XQ=77E=1vJGjz*2n5qCL>e|0{O#C~u(}&DUmNz6PrB0i9!ddf9I5YB-U$&q5 z^W=FKpK#IT$Nl2c%+G$E`o-npvoDUFdrAH27tA^SjJo(q6B;|dw;=tkIq8EYwH;U+ z*60S`V{$M_ z-Jf4;D;YQa)_=T|1*I?S4(~CfaTV$8+(I4scE{f0x(5g6enD-bASh8IrFz5{Xe_C2 zD5+Xl7HlZd87URIl^9r9^WzZR1M z5hWNap`Z!SfGIyj5Q=?{Qn`Vj^l@S%E^Pcu3`X{=jqhJQ@0tgn+U~l&woRkjVHQ3g zT3baetz(-MSe>@DHtSS29Y<4KKfMq&5ucuxpk^JB)u^l|eOgiu#F!{~Le^8PBAjHF z9Ilk2#}r^}j!-+Kgi->;Nogq}g}C*Fk0v!K*s6$DVjzY1JZsu-zZZNX8N<(5WdU2&XTY zd!bMpPO+uN6#!XGq3EG7f^iN`GoQ@ANc3l15k0)$GF01F6iL)dJ#0foNhw%8Bh_xpEFePN|j@ytbO6QGe&tVt!VD;cOG|LghPBJ9_e-R<0UKL&XbJ^y-bQOO}++8$Um} z;{EkovZje`1amGlFGBWiZ?=09RXL$1MBr1w`XY8r$EKmShJf77$6N zzRU-S8&`~lbVlOzH!v&!C+wTE?1rI=6`z2_65#lGsx}RXZzH#S3=x%Fo038any-{r z)>kZE^48|9)Z1mT1qzY$1gQ`Zph2IbXuV%8DsO@jLNtD~T=$y$9{-BIXoT8I!6XKg zexpMSvic;n8g@l_sO8U3Wj-VG~k zP?hi7(5%F9h-13gblo$qy8f$^o8_wJvQVmo8u{urvCpsZNL-8V^QhvK8wmyGRFkE8 z_zS^BQ{$&>G8x$|;42mraAsS}g_rj}_Gd~s%n>nI>i$zYRcgkXO7NP6WQW|l7zs`uAylU(A9$vF?9Fn7y zq}q+H$bU$qSv-gH+lsH-{HWy5!j z@^9xE1hQ|UYL{Noq@e0HdwN`Vi|d~LyFYzvYKyO8L0Ks7(_?-u;#Z@J$}e}6SFqS^ zh~&!{SiA0&%dZu_U0qVskN)B}>p|q%m zC1dA2QI8d3MU+`w2%Dr3cyPJ4L#|B=Rjs~Y`vFEoILx@};U_n_ZZ}43kAbNVay+q3 zD@=;0`t))`4qY(Vj=0FIOMN0iy$0aE@@~u~KET4+2{XuOu5`i1*vwD-S0AQhAODWVLXb^gX}L{HSZO!-*fAf~Vl!AnH|ERyub3+e#F&a{6hKTN9t5D%yj9kdFsm3$ z$$_LCh>kR(zQCeUHFLi|vu)Y(SGT*M1($_=HwO`q=69bfRJruQg-EN%vO8^JXPGVg z_m<8JuDwN#{pi5Dh;Y;bDG2L+xw!_Il&azV>XHh@Q!uR5CRpnu$^0V(SUZYF{klVYTF<2LHQ0VwFj`id9%-cf zG=7@c;+wc=zw(*MpZ(_XRcq+`+>V?qrfD4leh~;99m1+n^mDP69!*tn(E|=g^?`!O zW~=Xr2U?6Yh}$swteaNWr2m3FX87S%+j}ng+x>OXQ@%Z|`Rmid`>73b)nc)-S*~uE z*(#>kuM`QJ7lRSS!(b})u+@!nbz@o8;;+@V9XcyMes1aqSN!FHS612Vjah{BT3H)P zz9u{%%qX!~+~FCW;|?bAYY|odj|a@BhPgrnFfZmZMDQ9m5WjG4A+Tg9&zrN!v>tis zwO`%5EOXHx=bU!o(G4e;R?HU;n=TwRS14aBRX2;(Ei~RzjJU5hA@ff5u3L)v{E>@` zKwTpOfm&6wSk)vb zgiEiPaoqXeogNkruN4li7RF2yj+iMNF;|$dNSN3tR5S^d&AG3YO+rPpFn*D6)O_LS z1;XLeg+nF@-w%D+sMiu_gfIBb**D&P>$9)_bM?C~t=sfoSNDewm`oeMfSN7t*-{1Z4U!Pol-wSUne{bVj6B}7cEWWOp)NS6H-@H=_5J>=WxMN|zd zp2$uiS#Gea0T}`2{?7*?o@Oy81Od%wRV`zRrid9L47nS4VD6NOn=63tyT`u3Hk*_9 z;~`AmS8~wp;JzX*XLZW8jsP;QV-p9lNoWUNrE!ye#x_pWtZ~5Q_ALC&*tca@5 zIj~4S61IMMdmpg+2V^@KDBZjv%IEuTzyH)F{j5=6O(2Iovacd)A3{~}D59!(6j4~Iw&SxhuyGzbKODK96b3W31I{riK84F1Khg|8n1(PojC zdadsEZGQnpD{<-(<#}KGDBFTAEWJkYA*|7aMBJOGv!(il=RH z&2V%}Qk6iZ3OeujC=7w3QtbsHAyNg!z5eA#8HB@yN`MoD14aG|AAkexrqq(afua&2 zqN9T60zbgLMx`>rVW9*sV+8!)UJB}Rfq?PUEi4!{spH+$(7Yg@Um*%4!6QKU;j(|5 z50Q72m6c)43EI5)djY1I%&x8;Z?3L(nonldwmDpAAY@5axP$m%Lj)|{ba)Hm_I4Nr zjt&ma($mxT4)|jmX5j+@K!yZbXoC&eOtVhr0XgH|&a!v;qRxmGQL;CjWM){cUtqwF{Gk&(`dv z*_ZH!AA7SkeGO-u19N7pKGx5HY2u{i<>e1N9`Eb)Cnm0-{S)RtHCSs0GDYw7`T1*w z94{=;|65OII9|^q*t!v9wD-PgED8yoAm+2H>rYa|a>26l17D{7$NBmB%+=xhmVTF*B2!a&u?CL|ODmY8 zme9Y?H>p?WP_Q@gbBga*#u||x%zB1c^Zonata##OUo2=CFC$~|5xj&W%uxiL%B271 zj4-4kqb=bhuuuh*y2n;mLxV5+fHcj-%xtv=2lL;CwWtUpX&`@II_jqOH+fF1)!HqM zgdC85LHlR>{e0EpwHhl;nwh~IgwWnz1*`5Jw14ZUVrF1?i6}8!ks(RwdN}W%S5gvn z@RH!!e=pfCu{|>q4kQpj(ft<8sJEi5d=Vcc5O|BYicRo3)= z08dp%XENge;|nSnAPkdi8Y{c#(i-KL)W2hFmYkF{uh%EENl=uV3vajKjr`x6)a~^V zp|b<&{0B3GAO2^G2%KCK;lO5{T3ajU)B(7Eiz#VrZ2SS|_fa@h3^8C3`+q;$J2sZ6 zs_+KvtLQll;eQrGfWtWV>|C;xrQH4BnM%!&_%rlTI8bc1+Tsjlj)(>K-!Yo0u^78` zg>i5DF<=p=>fyFNxY* zkl%C|1K%}MOHjr)mA3JhAjuy7CZeNIgOr5Nr?$xW2~J`kvfM+c`S6$x5rNOnhoHe{ z!$V#{-l?>X&mcv)Lm*d-e-Pla9~*9e222_Ei^6=>-)tQnVf*`wYADbqh+3PE1gYu_ z=gEHHniFJsk&{%J0Y|k~r!TSV_)P z?n}vrjN1Mfg{RQb$$vaP(J*`El81PWn2iS!sLFU3Xo*I-#v#>tAXxtra_W8<0G)lT 
z^HK`FXSxdJK$?Rauo%kx?OIt`$yb??2RCuH)pGGIG_;Eq7O2IVE`z4gmPvEFaK-nmbVnc3=3jn$OLJBS_RbAzsSkIh`6V(D#-A}ZtVRP9j* zR)(-d0xrZ z+AgGF$Fk*Hsx4L+cU_DXDe4|=ZE4ztgeit_Ip@_H-4SxXhu=zp{7E?w(SSabiM|m* zd05B6&-^fD95s<;B(I=APzZii8EX|$!RUk{N1K}=cnp>!`DopBj$5;Q;9xbcs(GPL!tz4tQB#Efm*bbAKH7i{z+%bQ?|EbXekdt|QV4yUu6k{^K2R|+l z1shA$ZF?9O^2sb{U;!&#fRo4i{+flxx^I)QJadb2@X-DJD|s(;NRSrC+pvIu=W^9Y z9M)BmLyYtvwpCzwJEj9^+cU05iw|Ei7I#?1vXMFQ()KbD8< zHCMZZ`kKG$b6Ff8TYuuUwO*!L-c8Cd3LcgTkX*rbcWVdM0lvp<7-iXbWk&yTGLsm-{j(75XllJvkmD3IAYHDNk zw69)yBS6}stL1}#THO_}H=U2_C@U+^-9vcr(pf2=463(5VCdc1+${%KZuQ0IeB^F( zH0X@m+WQ!q#WKOZyPz=&GZu(eiH^WGsg|^6%fjpr@fR zUf$@_{GzU@8OyP5_ELeW#b$2uWUVuL4j%{S^AwMKIzc)MB?=blv8dDa=NQ4O2*y8w zuObOu&(F@PdMvV16QB+CfmzLe^@LkUAws`0P*UH1SvU^!6?P*10{{3$(0pp*r{+Oq zL_}wPe*RksB@}Zl)Dp5?$EM$?I(BxO&!VgCuh;)mfu6(OPC#}d}{?-(`H}W&$KqO6$cE%-% zt1LEFVO%)c8KN?<;c#@T`w6%D1)rt_7dN+P^J3+O5Q(?2fPmM{g{C2^NuHgtnrTsW z{E?S=YOfM6HiUF$Fi^`ebguNtA@rB&oIu$kfvY8YMc*bt5)%`XkTw!OFeXoDMx$RTqD-zQ?^Ge{d$fKWg zk2C#M2k)@bie$%kS6)(LQse7g{6is3v|hl3UA)anurc)w){iFP`|0$R>`vx{M#i3vwH=Y`H`J-D68(z#b? zaa}S?0MtyZ&+n@1MUP|{lAc%Geuz3?K@~NXIGCJL-6(QJtRh1siUn2#6EHP3HHvY% zw%|*=hV$WYZBgbZK#I8%JHL6&C?EB=MlepAdRX7bT2W_UlIX~fCG>!&8x?%`?@f=h zR(HSk4&5CuXA0l{JfaT?Uf?p%QXVg|CRV@O*&ZVe`D*ik7ldKb1)TR}1^di5-aO+^ zo(Z#gB2vCuk&@H=bOm$&1O?J}Y;0_PU0>y&cMfm!sZGzcXGhD#NgilSP(8{`azzd^ z^lm!&qicF8jr;%LZ(owUq%eHbJ%J|8bLA|PWh@HrbQtI5YWmuQP$%9`#> zL~rgBQqfq_wRoQ%w_U7ha80}_8Km^bFm~MYzhB7jN%XZCD;$tgvuQEzilHdit1v1k zhCQ+Ll8pq^-yhJ!W&aeslkQeW^~W#r_2wT;m@<=nGO3cKRAvc}NK@0+cJH2^D}bEF zkO{FE{{HZHt8)=XS7U$FtdY{9AJBm?^IYMh)WGCa)!xuP+DBpZ(To8Kf02swwOIaz zdS~WQx%51xm>6z8yT!&Xxw)|jM#kkyyR^pF9o;nj6ZBA)?^{kgPa$WG3n$hnuV;Bv z;Z?P@<(0C;L{I+w4eWbHC5o}2l+I^^g&y1*OQ)Ph%W2yhQmlZkGi8(26uHQZ$*q}% zS`j{1cSU@c@^nTuwGavxVX2kKNU1#E=$?4hhArOT%g&&Md|OK9AZFbfg!J*_MZE$l>t* zW|0)Dhy>!Ka&(X5XPI4{FWD92o+0F1KMYQ?nt_3VCdjpoC!bi{9r|(HE%&h?+h&74 zxRv<%1$RV6aBGE*GAv&zGmswC5?Lt;4m{|J6_=V+lQXWV=HB5Lkp){CEN`QB?WDU@ zks&>>vrrPq*uSc9w3+v85DXF?)?rTv$deRF;y(_=kdyh~ASCDR}u@MfLaX%B_7QfU$cF zV^A#q+6P|#th(eg+MFOxrA&U)RD#))81aBhOd+Vg-hi1)cU}5t29sVix4RIKP@>uJwj#288|wps@Zx93K*j@aBPB$UzSB#nL)42O zFCa*0K@3kV=$2R4>>`j>IE4cS92EF*I7p_KU~4qVzF9;s#qRk`#od1xy@S8+`r1pOBWgEVHdBJ8RjMDc5Zds(f?~h}??o%WrZfE%8 zwBqrVxp2yaN-}|cQi8D&#Pu~+_0?12*5P}MGK1?@ZZ10E8UK$- zVqnd#g(bg))pT&OnV3CdAlH1!Q>2j6N)mELhnGaD-7X1*5cIIraXV5zL;Jf;GnnJo z*J3Go3hRx>i!Pz@xZ2jCuOUCGc?Wjw7BUA-C@8QXl5p9gbE$8mDkyMqt;MNNS>|3_ zlM&!>e^-tn%9D#;K!>mb%c|p>49_0GD-YjM%MR~}6ul3PY$VTdorwu1miejhn`!5j zUBSMC-FK#)pWRCPvJB@HHB)p2)x#gV(`gR6*cu022y{b`+S_*%jc=J%jfZB38G|B8 zhkQ<4^idv6_?ezBeSfzNjWB+|8V>(V6N~~ zJG@8gHhcMtgZ!}Pv@!DYgUn`;q69l|&xQIdHhf~HjNcN?<;3l9OuQm3yiTZPy6eV~ zmbb@Y9l)jv(mL?q$xpuH?9yIRTCI2Qn;A5U{N?u*RZVC3$7Y|sCge}x0Sj{X3UZXT z9R=-d-+|CE)!av%NCyWAA7)kwDtWKOFE0i?dE!RvRYK%H;krv8dNuY9O@lEYi5dHk z$Dv_{r7PWm@JaT14H>I9LA3UWO-i#h&wfW-jpT)JysveSI(P65raeJ=ud(`xl4H3X z_nIK4+ChAmXK#)pCcQS|fCYw#9v06DLMHECmp%lZps7KUQFMoc%YlrYk|%yG#*WG4 zrKv`4`f2S#Rg!e7qo0H(?CewP%5;}dd4g1BM%$MUaGaHCLm{?ihfA45E-}8jS3P9z zOm?GJ_XwOekWTo0}9|6id9TS*_YIf8dR1-(DP;VT)0uwdQby9k58`e52W}8CqoDrtD(f#;4Er6`hM$ z8OvvC-q&GY(yOwRgm z5AVO|x{vYRwT}Wdyp;?;sui0omD@Yvsi3f`X>4E5Zx7wBF~A(NAK^FyZr8u!NL%jP zLq2svub3dW>H5bc4dK~Cv6rGUYuSt@73veOf*6~}b>o{IBNRMLiQ%@}Rg`Ig{T=DU5Fd4kZ%AwPaXdop4+ z0{XK-VRiu~5abxOx98|Sz*$6nBXx88{3zy8zHZ8WsWWJ`#G8rnwQ_;VWb+00Wn`TE zcLnqVe8S66&K+TMPJ9`NUq;>SCJgKP&(1@Z7{8`33>jUHX52dCgxK-m-qma;3W`~z z`xG7HZ5Y0Jj`@@8O%`fuD)mJq`9;rC3BvWyxUb~F#9hnU3+`(vas#7T(SPBMC_9Sn z;P>!at_Q~ZVpP#I-9@3bGblsz9T(gBmc7k5mH4OJ!YB+wM_FC-j9wiW+(=sjL$QPo zh~7UG=VJ%J2-#^J3wT6q+_ORY>W-kGUE#3T#+VvFrw@=+OyFXUH>9{4B*gFTqa*ve 
zy<1a%)0#(-W_`zph8_lI*@B^;&C~>2q$Md*>EFYxpI~#_tksKMbPGj=?$SJR3*Rwu zT}gA8N^>iY!WCfe%g8h?bF5~d)6FzTx5DSro9<3MI`@#!?mh|QS&NOi|BWXUuw4{D zV1o{+G;C!|sVCr7&6~(EC)>$w; z@`n?N?;n-M_`HuqgzC9dmhKQB9GAR*l}@w7b*$HIW_d*5Mo;aw!>?+UcHc{X|Bk$bsgRbE8C_2l;}ePQe!Yjk z?u*zmm76z${kq*-U6cb-l#|fw@P-1@LK@=c3rRg z3a&T{IjS~R(N4OL)Ba8QVSch_u7NV`aLr~)@)M^v2XiP=K3qh<_u4amI1dr&jvcOT z=dK%ZpLYARGVYH&L2+Ev8Jnj-oG(N8+_PP@ zjN84$P>`bAgYG+*Uqcm)BEo<4DyGjTJ*WeLYOjs#QmMJ>Q()ZMZu0SzuuhfP#ZQ9; z_~hTTLu0@E?xR=F%Pb&4WwJ!9%N8jrt+{q|s; znEYa~bsVvJm32|aW_k9@Po$JKi-ER_SIJO9r!LE$^u{3jA*?%b6FmfntH7+>PTQ29 z+DHU4@w>Wo^&b1IKc)9{g*UEf+%EH`66)e&C>e`~*7c2m-2K^6CA6E5QPrqnqNqI| z!Os&xahKHF%z1S{CtwPCjJp2BDV;j#pxrqeLcZm4Zn$xMM6*`mMrqQpFC&F7$S$y} ztW!^6n;lL=^EuOaQ;Pr{z1z6I?Q{jPyXvr0<#PkZ*VXgEAqso_CVL(xC|}Ilrt_4= z+H2o=eGnC9ZO#;VcfpyTk6?+$5K4-yUJ#>L;P|7d;w>z68Oxw{L}i;6DX?N0o(wKjh}8Tzjw8s{!g z`@Cy^f1mT&vuEt83SBr*p=dJU%LzcX9~ujjRI-72E#bwU?$2uTRV`Z5`Ry)|{r=D& zc|%d>M&e$y!3U_|eZJhEDW4wnv&J?+_1tnY(8Mt3j@v-{JY3(!1;!6?gZJB8BU>VdFu?i2 z$ICe-9z=yi#Ks<-bW!+54fgi#VnW~*iqWa^>6n--zpF4b4!MZgpccUH9=~yXgi=GC zhR4l%5?YUHcf>FKFKcX#)-x*CbH z@(?EELjpWJ#0xU4^@pM0GI~dXg+Q4u=&j|py}yLYq6(b6va;P(2ccm}VbUPa^2KC* zZtkjdg;vRe7>_DNEM>rp0QWJf4OV(saThlhu9}geRuh0mt8c|fgBR0Bk}zC^TxH+)k$1v=ibe@#_j{p ze{*0!YJ;FFt>3?YZ|dy73Ig5B{Gy^3NmPsD3l@AdaW4%x?G~ShFhk#m_ZdG+87oss zz&Wn}owsqK;rk@x?p~+n5Xrk?X<)|hFg}d7 z{mZj6?rZ$~SCLUsUd&=*VhxRrBKIfG9L8u0;CFM}hD{UQsLz{$-d(I#v^=3{`0$!8 zkre_#!87@YNr00-alSM5lSg!L!$DX4rs1}m1}qUFrH{1%>bZNnMxcvB%z6 zQS2?M)Lv+%77G?PLv{=}FP)#PxSe;$_X_*}gsiWdq=bcqQ2n3GwjG zBHz-bmvxz|h(WwwkduoEi-_FYprC|CLq2OavFG$99?jQPeLh+3*dtE4_Z{CG{WOv* zZ3nz69`~+>fzKP4jA%eo5nEM3_q~QjD}9#NX~kQ*F*n(1<14u31QiCB?4&PvP}07> zzPYWU@&PSB}nwRCkAb_xI;e1w8c z?ls-jRn$E(anTYRE)^@5I5swhL_|dN(Wt^L29}OWF5>p<^WEivrrA*DLldqKekqa^ z3xR7{Ny+^mRBvq-7D_q7*%%2OdOEt8M1_KqjktkMCQIXZr+1XrdeAj}Agp`bq7$BAU_EJI829@etP*9LGpt&=O zhhQ~5Jw5My|44nZnriWVGIW|*Okz}n64!^sWn>Ht_4L%`?%XUcE~0}4dO}m0mq(5& zHKt(7cZUr}&Eq+447#?KBc`=)E zP^rv2W5^nWKtt?$%rk!IIX=Gn3w-=)c@_eSsMy%2`;*Xy$;h8vsn(ZUS-w8o4_8Z5 zd$2h%@7w$D#eO1#;;;DmT`RJ(vPNusm_S4M#?32`P9aCdO(Nw$F+-^4Z)832ch(7{ zH+U2jStQi)a#{A29sn!SSrU5+EiD0Jo$g(qteGpP@f6Et*`OoBPnc(U!%oMP#YKI= zDHIKgGxzxnPJ28EL3yCVm2f`r#2;0Nf=w!b14LW~SD)?eXQqJ12Il1C>;Ox4%q|js zK`ZyC)c@pFg#>Ml~e5S236&%r5+S`A?kh!=9l|WY<^_^rOACb%`7$&bEhWuq!~O z>qtWOUFXR4FkBj}z0b1IYdMmk2IJ-pv%tMd35OU9WXC=!-&uhavwRN%T=T5Uo7J6j9$fIuf_( ziXkg6)2c9vy>+%N?LYup8Zry7DxPIqH@9o%zAvm}ZaQz$0b&oVfM1fTexdWH)VR1~ znAjX4D&hsa|A$hTOI=-^Ow-rQ`1qWIOMRy2;Tl9$%6zY0y&8b{!C{IXuC6AKl8^{1 zP~wd9!pcFk%B7MF=^q=;_Xc3&M$<6jc9My68cRl@u#*aAdi_(QG{VYci2wl|bu@Sp zi57qYX1rW=ONFPVmly?z`3Bh7u?6+jLqkKp%)-LD35*z@KDAkkm+C%asR0_?Q3>$auWG2Iff-0 zOFpro^y>)?74^Ht9IO;(BW$zOtY4m&9+F>B&{S}6Yb+t*r$jhZ`WN7%86MP&#O>=d zG&D3qv|#?Ig`-^QNa}If)*+8594-M-(WbvQ3r^WXp43O6H}Va9qouGzP>JB=qkAv1 zU~81>@Xk)d&DK~I>D4<~{KgvlAt58viPF*0;n4(>jV8XxgYkg~smqY8K*Fn|!<76y z^(~*FZUdpA*U47rxiKTqTuFjA2#M43S+pzLzYCKnV-RuWLn(21&YwQj0y$V|3ks%F zMlj5_H;IuQ8NdloE_-C~KND=hukP*LnS$-e)5W1-V3eF)TttuLSZwvBuwze}&w1aT z%X^53{-(1{VbdSwJ66sEIEzUg^ywQ_nySZSS6A2en$mP!Qd0h6>ZgU_H9Wev(-yst zo>zw%)5FFh%V=r$rxm&ofTlsL521;&c5BIVIgQBfg+B=P;ro@|c?ITTT;41{=u@kvRok6C*r zA?T=F1QP(>qdBF6GyuZRISHT~_}-kFHqq^zou4D-sQgw|BZ10Qfrg<)y>#_!5L$Ss z^79J$>SIJ|k#BBp@_0*YY4Fi_$;e)+1)_8BXrW>}CnvWZ&0~H83`x|NHCCit&L=BN z9lwoLaIVZckoI)5mF2N?xw^G=l?Pi*N=!U+F)+v+JaZPsrJyLc*?pcNo2eW4aCc?Z zkV-xE_8svzs3<$T@`qL*_SDaF1aL)MuVq%Yx3xYDJ|e=u{_>X2b94Ku=sS9t0bw5f z+pou4mj`po@P37AAIN;39G00T0R(BUW1wzh$XV;1E{WRaI3*56+1= zhh&lxrl^X6f$fiv%WuK4kS_|UK-7|mdv|^?r@5A^*4eprhHk=@3Nsthv(rfjq0#ht 
zxYQbN5;$_f;ZoEY=yE_4lSu5&Mi7}^jJ?fM(=#~u$ReD6kv=f{^;1JF4GoQAO;kz(B1<~8|vPxZW$%wA=& zdt^n7zYCnww+`pM55T<*1+vui^RTYAHp#h!TC!MjQ6c#h8~FlNlw$QC_Q5i`#O_$J zN?mmo8LtVGRMl8HtwtOASO8CMlB}nv=cJ=Lt^&}N_^2~Lw#eekb|q_tmPk0xYHo3K zX_Sa>0FKcQHAxw1)C&HQ!crsA*v?k?fRg0VBocriNRVtBn|S}OSc)k(Dnq(Ni>ABY zFEW|p_FIEwY8k{_e2rN1sK9JBmE6uKV3NL?ns_-{Og3V~SS3SeD&U&#njoZ9CdY3n zuWxLm?vBU3d#9jSPh+(A>z4Yb)7D@{CmU*hvK(#iTRFK8JO`gZhRXpNcDhoxV_ne0 zLkspO&HeN$&B`NA^#>@vCWZ3+hw;@7mfcIDd%`-2wVvl-59vlP|`^w7aw!8Tb}Xe7Ya> zAhkMH8hNRLwjENG^NNN+F*qtTaL2#og*P#5jD&9OWR(^mMJbhS4`Egx@y!^ znD+V$3cB3djssTB-<=X0xxC4q;gq*@dxfQ#RQV$`{(*skHH*;`0D_5jc6Oe=o-wks z$EvHVM~xnjTLCn~<>$)_9(5u4sV^fFVxxa;5%9~8=Xrs83Y=4+9`6#hl4P$*B{Ka0 zH`2ckom5r-%XN2Le(i&VK5B65UJ#CrF^`+?g=fc+@!3%eZ|Zr-K*!}qLB0P(Gme# z+VxD%+jK6|UM?FOHpAxoSej_y%`L?%jL85Gcz28xtXI!&bhgfKcZ#xPzS3iIyn!&~r&sZEZb3M21TlxjfVXvzXUG+(Hjwb!HWrlX1JwXm)XWh`aH{pu zE#kj^efhXf0X_ozCreTQR)djSI_1jm-@hliQ2!noiEjE*X0BMg0XTP)b-z!%`$sqn zT4)$s;;r+KA3sj{T$X?bWXYpegsEvAZ_Dik?pltqJxfSPc#aI2wOx6Sgo08;RJCN# znQ1$I)}&fI{*HxUkL$rz`oo70!JK!tKm>2Cqmu^ncZ_ye z{^Nf$JFB9g46^TS)oJHRP%F;*`T1R>%>osDM8|5(Zr(sdLlhGY^rSL9@kRh z$W|E|Hg-#Xn+MB7^W&|R;#5Q&QXBlJTmHuaQ3?Ofb;{&(qY;m%-RpXWbmYEVeiq$J1Q-oD&wx-&L2Gt;ceIpyQmz1dk%`&%`z&Qh**#V!@MX)zXM2)iC8$>ZFK+YNpbG_Jm& z1SRZGpjf0prP85S!9Yik;H&hcs$HHpv*gC6OM8<@`QFxz^^qikNsg=EpiNZh+!v*f#0J2{hd{-d< zUT~2->W%=1QV&#+&mi01_Z%4ttnXtErb>0Ms-&nb&V#j!)o0nbxK6#Kr8_9e(hOgM zuk_e?S=@6rB%Xx`Nki^&9^r^z?m1cOB6)HVS^MO@*-w<%uUG>r-1oB@Zb7or)}H~} z@hn_wbeVXwyAoI=;nyz?8|LW;>0-bqb>oT&&AY01qEXz%KZjR0Ko3y7D~_R-*KN2DR|#SRq^KAG-K zmWp$)*Fkes_RfzBp^O7#lXSwucY%gJH_~h4XCcSu=RrM*^z87EWEd<6rv=n`kGN!H z51Hyryt|C00EB91o4xPcjBTrmt1jS;U^l5A7rQl@DGe3mc!%?KXZN7A`p2|c9bYb9 zU%UOCvuk;f23srzOzK7IoXQKX13{}WsU*>+f2vY3dg%rztaIwkL{1q*&9(|DB z0muqXK`oLBNK06fDmyd`hP`mBU1i2P4`kR5m?*@XOq4UR;`>h{6jZbYYNyGpYs|V{&T0YKv_+ElB!!8J zfuRoS=KF71S>n1=TNvab0)y4XJn1|Zql>_iWOj$gK{?EC1C+m80RaJn;5Gs(6&mET z#*jnn)B0$M5V%a2z5UpYqrO_K-t^QNolyCL2Q&|z75d3%q6Qp44N z0pCD{B3mpoaPaX2moHRx_8ZP96F>h{6F5_b8XA+V;^N|QIM2Vc$tl|;(#Ok%AR{8a zv?&ZI>@P$|El4sSck5GdUf@9yrpc8Aqp0p68QlC~5u z5I!K)i>`?(>3n6yiyJQG+uq)eF>x{|t#%BEs6suXoSt7Z@Cotpf5-TY_l#w-5I)Vp zY~De0I25`%3eVqh_}pLHgVKHWqV~v22>u;K_)g{h&v$NaZo@#?|Jc{ju|cn9-2#JM z)-tLUM(I?Vgfm7~E5Rcm)Ufewtgbq!(mp40)=BQXJ|9z_I=%5S{-$I#e#;KRE*Bl0 zS(plgSBk3TmtvPp7m(T}D%FjtPg}!%d?{1Z`XR!KHddLiB!kN|;#Q?;`sEzB0>;U# zQL-&&KKRXiK{2nwPenoaB~4)2xcF1c)5B2;0s=x2$>&;CNhV-J8W3W1SH_Fg5^uZ` z`x|poK%hpmsVxLa4$;Py9m_5aYsD@20a!pK? 
zZwGi}`im?f@{8otn|lm@iDL+46x3P2mWGCu*F@Ae0h?KIqhtFb#naRCh{KpR1F(aS z^i)(UU)7m1(gA-Mu1cKWHrltlB>e?+VYGF1txCucF5>6gz()B;$HwviKJq}rl)*&i zP&si3$N3lCg4?DskQnL#%Cx?#yhshCG-4ZDTT|OK}bl%%ku?U3+RiKs4$3_WvcsT=vcke(<5~PWZEOh*$p7WUsQLcQ0vNl;zL72v+|P) z{}FZbda2dnv9Z1Y7+Ir4ItWyQI9_Up@**DAv| zxcK+>2bbm21H|HPf>l<^ceuaa&3P8TPOd~vw zw`v9uRTtwq+AtLclMOxX1)+qc3YW5-4?J32T>KicoKuWhzynH4i?j}coTmhZ0@fHZVHa?R&i*fr zo!yZQU^X^Wg^{Vg%;>>^s*4_^h&o3;Cue8ECYe&*v!Hms18nBgv0DX$q5|ls7>~Zd z#jT{}b0x7;RUPF3(Lh1B;Tay?@(y!;vO1G9Xk!S-pb{09+FTtjnBU#ptc;*oa#{cI zlPegV1cde-R99GG2iMp4sc%z0xa7T(5>|SZo(oGFZcN53M?C&QF1)co{#x8xx3O@%CNs`;doWKx*vITeDo zH(c@e@5|sy3+5AOcwRvyJZ%mh?(bnBwze6W$x44IC@H~ERa47SR#JNW*0zE%y24lk z?z|;}TQ#;{smP%O9hT^F_V%1T-Q6$K0Lw}NL7hCYxT06ioF9&G{~n$wN2>%l_yx+u zp3U7`Qhy1YCWnn)Izd)UsD_#Z7P$yRe_!98GULJV-BHWk_~hgzeUQ_rG2X6d$FJ9c z$lCq1!C~IJG}Gcw_f6!uqlujvBCN2VD|mpyuQ1$ z^EeyeZ@~jJwpmPQi8=s`ji9)Pq)~p7PpU+q&gBGMG6Bc3&d@mCUup2GD=V+zmhmB= z84mj3wUSV%2_%jTAKz)Ft81(J=TGK2PMhahL+toy!RWj-Cl?nrv+)mVg>P@f0PG$i z<;M(dQN)CWxdGN1J5~tC136tEkBrJYC^)z{QVK<3M-(z$1%i7GH0PLjt`4Wt1Q;nG zjuD__6JQD6E&S9huL7(k_Vi$29Pb~R|88Sfq}l%yC$PK$B83a>a4c(c3Dg;6y1NgF z_aYXRNQi9oF&g=uPNu@h(C&I5rHN8oXqwA7AhrHfi4-)(e6H&4V2$i}O1XF!iNghM z+qnD^d3E_l_-L8g+Wey=7}P}+YD(H0X637!lFh{WY0!l{D9*X~P4r1Jz1^%0p@7E= zr(@d$R~p_K_}osClR-2c{affk%9;{a0n*_As5suAhOqUnPcQk!5yZ3}A59OoeWBBr z*q47?3F*XpYn7LI>m70Yjs|#4CSH zj5o;Ct+Kfl0GEh@ShSR|#>*qMB@v}ATwrUfaPj@}Fq%u-oayqUm6(mvm0?E%H;D$y9RVE zfbMvHY$sSn1^BD~lzgXP~Wf&G&^h=4YARKh~0OuYxS9WG` zfz>5WLA7|nm_F3Q-z?}Kt%pi==6QWU@-8B28WFyI<0w|}7=BH#RA~7}k(R_G1^sac z+UF~snc=omUHY}X$xs>2iQ0pHQcNlny0-NkcleX!ZdBtwu?E}nh zcnz8QLXp<(+*C1;TdRCI~HtVBN+%<;Ek3o#A?KK;?m0UnD-{N6xr79{ z)JbC|^~CZF+}fNl*=q72Xhc3`nettmUpn=`g&TIp=$76({(ADwAkQ{kfC$=+nj}y1 zus7PQfao_^Urk@O;A06cg!C0Eo9P4PnzjfE=jB4E-S8TbHwB$k#9ksFQYyy_==b=q zKc3&siXKT#7INgsy&;B-}13JhW(9dC9H2 zdItv@@_p`jmCIB<{!(pRIY$B;b}izYyQ8Yj{!i=JHP8@3$l(F2|FV5c3ct)Sq%?0> z<+IQKV+q@p#|9b{SB9_M^+3*cE|O~pBob&{#UwOu@%wgHaKi|ADXz)Q8Mb|w_zUZtA9xiq7V}#2xiMe=4?U)fNZR%z(SqgrRdm zv>SG)P!D>Q@>GeBPpF7+MWTEw&{E~Uy}7LmMOhO1-k5gNMAd-+>D{OEB{mND#D@?o z;q>{)JL^b5g1AaXj+OoMIYyU7x(}BTOOOh}E?<;(HcQ>Dbe>B>1EyE;O-u9B%=f?6 z7(np)y%HN%EZq2rU&^*K5;nj=>i%HAm~ZOoW;2_)j2N`QYa|tvCeo4IK6oxODR+~4 z;0X6Y#TX_H{BoLQe~)qb$ga_k57d4!yNO09jH5<_#KyY#%gy?+BjrGYdLGFM3sq6m zto_(7Vj9k}d8AfO5~3j(tFzZbM4(|4Weo!~5xM1LtbSY43mGB{M~u*69)socp@Sly z1j~FI>T#!}4GI0t6^@NmG71Qeu_AN+FHhV4m!VU9%buzzS`z2>4_xlY7x>R`ICkw> z;cg?x&e@mN5u;eSH(hWm-}jcR893V?9l%3!IBVbNHFz&suT)G$*C~P<5>m>id0EjZ zSvQN@2uO^7^bPK8g~>MeufLA1*#^A~*Qr@78^&3(@GoO{l>uKRw!-p`n*!Uape z-6hZ266Nv92gRGThHf{w-(;jicMV+!J~=prL4b5!tD!A9n;cMZx};m@ZtWYsJ zB*(`?!s;)Dh4dQ{nvG52v_=IS68fpvo_^a`#D6~QsPZt0V^f{ya|CVf119nePyX5TNB<9wiX!ZyspRHZ{pU&z=}X z-iP^d_@1dEBllY$XfEs!t8wVD^Xl^uTV3M~ zQ^5HXieF0GS*8AW5121VKDyEyG-43kc%lRAls#8iJ+9y%!D2Uxr9)| z%PWbN?30bdg1}hzmg&D+F>R|?DZ6X)Zf(+k(VbXWQpm&`(mtyIHE_Z@c>ss({+v0| zkO?Wfv35iiwhU5_^?B5F@t^cvO5bMCIf8O!)xS6{uWWzY{C`X9mq`Kx^GdV?e)fKeV)@e+Y3LW7u2k5h_Xv0|LFA} zPm|bXDs#7xgEHV1B0u)7CmcKmRJ660xX8v>yNH|aFA0h<-tp=L^7BpUrpRxgO#RcZUF7)Y93Wnu75S%r^W^7L>Zy%=$jkNbp#m1P{wRc)7%bF*W`Od!~Yiq41?BtGf-+Nwu4BH0;^Gbi>FExzGwH$f09wz6TY6ZQ9- zeeJLqPpOK^kQdjacVM4q#&|kC?|iq`jS)ot|| z7n^(~$xY$In#Se}M$LAIcN*IKBnf7w23p}~wje`z34j46+dTOe&2OT5nZ2MNmtaAP zg4GQsSZ4&-yXE~%h&R)}w)1A*|;LW&%PB;)Kh^UCw`j#WP>@{MI9y<&QNIaj76 z3s*hv-TPwf+vdY5rfvvm_(*qsI z^~>lOxQim+Ymi~>Exsd5jy!FIvzE_au{@X$C$v>*8~bO`fJeh$S5Z#L0tC%T?#0E- z0XsTZh7`&MNrv{k6*9VO{Q3nUIjJ5s+@(_evx}@p+;RIyl2jHdj}om9s<{VqklN|F ztk*dQ{P!N#=9N?|)p8bAEbAUEWcsd8ouodE(*?}|?4``x2JCihZd?yHrQ+WzJn4N6 z6YG|DWf)Hww+4S5@?v?d%3faEFowJ@LZP$&_kFGfkcEsAW~|X0NJM$MSztcn%|W9g 
zmQp3Y(gGg7LrFlUE{~j>#_-$eo&txmgUYXKnCISXj6vNl-F0g-EF`gii@P@%6kc@R zvt%3ei>fbT(j#xD`p5U915@GyribqHC8KO7K|4;<^bu#&Ai4Zzb}#JbY?zVJLBYVz zZzo4K1|-G!qDHg?RfFJ1+FLGW1AX(0L*9+n7VaD7O)6FH%1LD?!m`Wk-Q*G%_bsjF z`pm|vBF+aF9+*!sf4zYBPDXR_=nj_;mlumFlh&V#$_*Sl_J~T{Seu(#zH!tm@tZnq z__L7jc8E*rcHo9O&s`%w8o2XLbUCHfU)PbDl0Q*PT+boY&`!?n(npFd-paj=JcSIF zX`yVpGprL*#lAJM9^tQg!xEz)2k)nHq7^z(*PKq#L>4sHUd8?fyIuZ#PVx@hsTaj% z{mzplOAV8-A@@WbL>X`)l3%PC_9W;N1)g(Enk_}Nm-Auc)2R<y4P!0x5_D z1CxypZ&$PAt~1V(El}1RC;beY5V35$956w5pDo49scx(>a$8~3welE^QH#Sth`g^* zeqR-~#yu<5V$kQJT;lk@f5Y^5Gv4oUi}lp2*@!OR3OPQ+yzdFa4CYS=%TZ8JKzRhp zLhY;b&KqJ0*9utj$bQMU$cYtu4eq8TBVyn-HgJ1&AJgSFX_3%=C)eOqHaezR=-34r>!?&MF)qjWIGzn@JAx}&+jPc| zZiVjA%^94CGpfasApIK{Hu=qw!3{NbL0PD$q<9qM0WjZr&K6YlY-9={X$HoL)Gj!i z{~pnt|Mm4M3f!NCZZs@vrB*#i=b_?ihyCX)+*ceoPa=^&g;jqAS(bxx5ifgIonPzR zY@QV*&6?edh9LuvqQyudo9FS2iLA3^yM;aho)avgv%Vj_wftzJ=qV6IH_N2P+kIDi z9}OIKR6dr}_q!A&z>?J>{kW%hyIeqICSHPm6`VNN-gV5+)4FwL%INPXg86nEWe-%~ zNiv6&clP1YXaL3o;w{SMl~=PrKcOIiQrzr7eu%R%enWbG*an~@gCtSt8w37wi{Ud`YJc`haof^60W@5C{_91*n&0i zlG`nwkEIKrUc{jiHzss=%2+$gTA$)<@nWJaHt>Z@{vnO0N}gNQ{!#bc1QYvYx7c?3 z+NLCz9jD@x!n>)VWX+ECDJMn;k#pySwmv6UVH<=PHl52E=c=AChaGyKFf~8=C`wwYKm7qsck#Y9 z6Zhtm`I2`qm=nZrwDo&o5$^!R{2^xgpH^crR8{s7<3=MC;a>!%yp~ zP&KLR9|-8K9yD#rtVgnkyL;Z(xjCg-$NE)?kLo8!24&H7?5G8t{PwAgkkF}CeThzZcpr==Eru{vOx9Cj#B z!#&$+A+jMUXf>6m-I^9z;#ZH>*4FqE>I#)aGJQi~pi(2T+vbmY%@pMxl3SpWBQY(v zDK>w(#PGd;`wja9{>!QQekN=#h8r3;9&2S&^1O50GAYa1d@1rtK<;7&IdohDl=Ael zv$I=+>T6SNM03oaUnhF}69SI)K{}XZc5wrZ$nQ_z3uj@J;fMz$MRpROx`bK)9sMbf z9V3BvH#3h9`#HBgb|UxJ+R zW4Y^eG?DaRU{Fb6B9n(R#>#t)g-$&Rm{oQef6b5QzSkg~da{Dc=PvkWT(@X-1s|1P z6ALr&%ehg_fM-0IeiJhpZ!NEzaASO}!xjlW&P!_K_XP(X@D`$Xle!Y!Q<6qBbUUIuow>c_v^>WYK z&j@<~Ed6*{^OMUHql|c2bP~usDx<;^_W>%!-#%z=lXsF zBVM=~;=wWH?%tj&qB-jMLz>q6#{3LBMmWK2v9?6a3wa1EG64({Ris-L6B1~EktUP-4B56v?JOXW9nBX&ovdq^kBg0wMac7y3 z2s-F4yGWLfv6ajhC`@l#i{ts z+qW}rjpp7Z(m<4@%zLe9U4MUnbqfnZJ68TUQY_O9jFtF#wFd*hHErR+{wS2Xxo~D4 zLh_!UPMB6&M%c<=YMEL9fPQBK4{mI39_7Q*;yKUcK#d$beT-^;^AR~(g&I2krn56N zpBNo(lThLTDsLN!^Z=5cmWokIS~?je!;u_EVuRt19>@2K>-$jxFepV7=9ZAqJk5@p zV>h;G+AUs913-(zLg6_%IpSCCj1{FTSX%nN}9?kcaeAg0) zv;Y#Ae(!z!rlP{7i5Ly&{P_7ZmrsriShC^TcvCP|Tyr$RK z*?SKwg8`~bahxe65{VECGxyV|b(YLY#j*qNIc9k-#8d%Nz{xhiP!piKqkTQ)0v@7r zSJw{!24AI7B1og77zR!N*ezXwq@Mh)X8HG>|F<+=*+ zSkRyA2pru96$NPoGtB|8AKA&w2?2J?{y?%7f_BTW>|}!6+_oR^&VCp$#F61CI)P1W z3C3s~N21CATinSNoO_*oMyE) z-RZ#{R(=-&7%jz$G>8d3`S^_m&l&RNMPA-t#It7#`F|k*cMRkx5fmrofToJ`2n~(% zWMZa;vvjuZ#IU0Tsf`A>CyI0MK2q(H<)GgJO{=sq&e6oo`9-m8UBeN_y-QeC0N(0F z;lm`ChkJxX{zAIB{iMWNc=WN=gWcNueSrYlSq8chIX*sK^6-tnqMfjT*H97Iy8QqG zF3NXWpa1|JHv#*7!&Q})<~oRUa*nfi-8O*)x4Sj?4L_ey+B-lLBY1IKVP$0n=lC6~ z$rQJ&m1ZOc+N2Jor>Acf$y^8E*0%7|v6-1%`#MvrTEbR*ER2sT=9jvL#`f^=@Tm;w z*{8tIuYR#d4CFg6K*fQ(uM!&@8x^$?+29E;!DjcHZeeu>VPq?w-MO9{mY<&PD}t^7 zt%+kG)P>+x^#IlWS0(GS)K`yYrxwfV{CXxu#(sv>OiVEDX7%ONNM8@BYl*iJe&XER z-D$bGmQ_=mcjC^j1Hkb`C>a2RcKdSyzyd2R6#)Dp|L+JIbUd~e{r?WH-iH9d`MDE; VLnDRRDjfhnT59@;>U%a3{|83V1Ev4~ literal 0 HcmV?d00001 diff --git a/doc/images/kit_logo.svg b/doc/source/_static/images/kit_logo.svg similarity index 100% rename from doc/images/kit_logo.svg rename to doc/source/_static/images/kit_logo.svg diff --git a/doc/source/_static/images/local_laptop.png b/doc/source/_static/images/local_laptop.png new file mode 100644 index 0000000000000000000000000000000000000000..b247b98614f80d03f8d02427e83a58d930eb080f GIT binary patch literal 24793 zcmeFZc|4Tu`!_yPwAe~zr&6huu_QZnEACRx`(#8nR>?k&rSs*fWeY}tNah%ulfAn>^cM9(W zfk50B|2}U70&M|)+ydIp0sOPz(Y_1<-AcW9{@hiM-sxn{_`??lnBLI0wl=n7Hjd2QPM4(%LIv2@dTfXkkZUV9 
zY?gT+^7BV9jhr zZ_h}!EUAg&<$o@kNd_&g&AhU~E+e5bQ^-#q$Gi4Q4e-E(Mib}&V|A)2;L=d3oxdkg z>$#P4wM%W<4xZI_1}^ch&5xHT)xEYt>-|+|)7Do-&yZ$}WJs>HusrC|+R4mT+VNn| z`s$2Vqmt@D_wYLke@~|!Cy`8!wgNI1 z=9QS$HNMZ0dhz!o!yPdxx*VzfcA=?nh5R@wo~_wkxWIjnJD~4xbeVTY`3|xFvpxmP z4`Ny-8CbQW&*2w~1A@<3o?Y*2Xt>}s`#+P4zYLT>9qlDse=koq5$M36-Ieq@>WdB^ zF>vh_ab1@b0}T{FcdI1UcHGPIYln9N+u8NSfp~oRZm{0{?b+@BSz6Im)7iMR>kZY{ zy9)?h-_Y6Ew53c-y6$>cf6bZB_X+ONS?3f=RaFrOgAIH%db zb#)BDLvF1Zhqg51np>}9d=Uxd<$7g+u2hi!bjP}#nU^My~~E~n3L+ZPJ6%au2$L0 zh79(jQc`CM`lQ1{gK34d5(?(@ww0ANo$m}W(xu^5u8J>g$1)n*>|I0MZ z=gQ0JwEpb4$jC^w3%7%fIxpQvW1GltkFS_4SixYi1HzB+UW8E}nvW8rpIm>Z;*-F3 zOp-B=qz1XLhwWg!;aRl6vr&=QQ)Y{N27$YOe!#SGfpA)6)gv?Eh?*U&Go+0!@wI>N ze!_M8e$E>^;O7G753KIf&gv2i8kw5r$OGBNrEm5U`!k*eS4gK)$aDF}OOlL`GR$Ar z{w)0ZKG%GnwwqaPLt{Rz*~;qm=_uXsr7$jHDG?diy87H~Dd#~(5Q^fZusXDzE#6`) z4I1uUW=+Y&15+IYm&jqN?`Xa*%+ETk`%=g%h-ygRvKHy+xhuT>Z5eVb$g&*r%uo4k z7oL1cGV47il5Wt_=V4LYeqOE?1N6dRanZ>07VDiUO=d5dblgW-;X0WUO^R-0Y{=Itad16_P20!!b&-q%=OYgAV?Wfs) z%nt3%NapZcZE|7sd3pI{ELptmtp~ z9EEdx7@cF3ZGNmiaO~33IUyt^(f z{ak{zy`wdgTNnAa+dh}TVdL1wBh4S;RuHI|sQQ>b@z@j>xRiMYbUZ5O3XNrXE1f}9 zmGx+ROyE6tT?#gIL*;1PqZO5t$u=%o+*`Hc99)Lo(TzsJ5e-Gv*H~4}61N&R%toZH%r zMHG_u$xaq8#Tg5MrqzdHIV*nI2pT_VTb;E8MxXtb+dn_vqc{13YYT(5Z6Jlv;|2GV zSQa10>(~ssf!L$`U+vP~<*5}Vfy?^RLt&|v)vAnk1<8xbUw!2~?K!J>CYC zaKRu~maFjYhPS7fZ@YDz0S#tZY6!&Qm**W&eOb-+2MKKVmR^{var<>xXPgP+XW;{| zM)p;ZMM4^q;I$X^ccmmd-%*Wauyi1wkl$BuouV!1J^E$uX?>Q_G=z1hrxVbhtz!di5v}{4{jcl_AVH=#kGS13{^bnLbax>q?M(p>f zx9lRCRF;})Jcs;Yzi~F8dn>D}>qxK=|R(iUYyeM+ifsVwO z8$+2_rtOEmA2kt-Mb^o_ZGY?horT+)>MESyjA+m#j&Up6gLjQ0ewO-HmlZ%wX;LPF z&-&yyoD759Ex;_)s+%w!Db>i_1$BAQsHOE*#U{{cpTBdHIlfJNTpt7g7iVx_Dzvo= zsu2u%GZe|TJzAU|7o#1-b#@?bBtZex`JY2nGuZ$d=Py-c|7T(84_E#zaU{wWWRHG1 za^G*=;^TTj@i`?gSvGkJ;H_R)mfN_`Dw1T*ZM!H_yz`s;<@+`^UZ*x%B|-tD27odj z3c!4RlDTizn-9B1iL%AZXF!H7A50bU!&jG%1#(Z9a;z{PQmT+K4|otZpn70CejwR? zMC&PlRtF%Wj7{`G>aRyuSsZ>tcX0)n#yPGEztOf0mZ>_A&goNyV@Q>B%T^ z`mOJfYAg#~2ZL_G{oC3=Cu4!L2ijnh1`h$7MESQh0o!yGwgO8cte0j3M1cTt`udm- z?Do=%+jS@n&sZX#1_hxI6~8zZfkoYT7VZK4z@u|E8eYl=> z8B1tkkVLf>X+q1uf8&vu8@h7E?F{gw49B<1r3+@vOw8e|U+0{@3f26EqCh1Lje=Gq znt&k4;5Q9d->$E;(Kg+MK{=BBwi^4HWSBVSW|uJEx_@!9ik9YT=(@8JU6a@WbXC)5Uyy|(6aL3 z#xY8numQ73#@P@awBSrkyZ}Dnf|T=5umK5c9_|8M1nI`QQvu*XgcwIJ8J{c>ul^&7 zn$dG<5&}*BKAbMYq`RPn9TQq2qtS<9E1J3PPor;t7>@U z5-_F_BY-lka8SxD%Sa2^j`{d@?}S(QN7sX@j>IMWhh?y^vjOCEK2NLS*L~1_J>cy5 zZLmpzB?O@$t1fsWd|qXl9(a>~S6Xoj!T>(E{2QMQShkI->-ZC&;fCyfOVniZO++C@ zo%@Fo&6?XU0*CM-A*41P3Cir-x2hs1tKkCdgI2V)pl0EK^1w!ao|ux-d+#`77nffq z4x)b-^m+b?*`Hj4<*j$mjEu@zHA(0bPnibHB;X? zA6j5b+#D@QMA>Ds<$1kdM|S`o!j>8s77HYc2;3i0D)~q;;3-JI9Y-;7Sst{!)e)e< z8nD(jYAxt$bQS7N|G5hy;Mt?aBch-_xg(y!2iB1_;|yqQLgp^ZCIN6Pu#9Fr0)ntI zy?@R{=>eWX%sAoID@(nNR*C2O7On!)tOCn}Nar<7Bu?D7+p4R6Ik%kq>`QaI&c^rY zt3Ju0RR7jN#)EihjudfCUHFuVlUT3=EsXqJ2U(657Ex#!_S>}=^(cpYnq3y>>;jBw z6BT9|X&-2Bxu*S7)UAq(cR{!)y(YHQHP^R-op(XXhq@7i3#UOLM}V_;b|YsH0c;Yv z>ZpJ7THvK8Sf(eCD>czjaKQ(aSJU9lE4XUi0>rDkD6dP4qVDrw6p1K{ZEv|2Ulz? 
z>4;t`FlrhCG>#r)BJZ*y$jjyaI>-j#S!@>X{B2t>WJS#pBq|pJV^S!7x!Fh`f_6eA zzI=z|4Ze{FIoD;3fqtxoH7x7|0zlE-QwC06!hY4Pvv*)4XJFZ+$*QB?NN;0UrngJ~ zU-UMGh0hBA#%F$3KaQJd?+Sm0w6plO)Fz%BC}qG77~l=|!E~0Cgul4yNNAR3XE$@` z5wX=E4?)zuNoITERz;0Jc*5Jv%riHGnOoQgv;VgN-4p8nqk!%?lZVyQzm5LGQ(-*P}{+zDY7Z!3BJMyV92~{Gmcz%mqA!#_z-FP+HFAr&(d90J;zXSy<4N z2X!1Kf{#Q0X|35+dQk?T>+Y8?rf~edj;z_-e#Kvb_(=lS^^@!DhpsnT1E8zMggsYZ zj6~(Ixbq*eKOa+tJo7rQtfS5Lwy~Ifu25A(ZUS|Ro^DLq;uROS=dR6xg^++-Y#`Ma(ezTFD0X;?X znhk+(I2j4~6^$6Vt_6j0yew`DoZIm2i&DxO+~01q&K}=}kgJ_#lhUh>dLzB9XN73n zHpN;jdJ8vT;j_@c@%cHcAEiyScZENLAF%lLW}aLboym4H82+AZc13gkZ<0hn#uDK+ zNImp_r%9ZuhCgPTy-I7y=JIn-accN8>;t^D>UZBxYhpw*_2L=OM3WmJft~Pj0^*&G zz^ViU{Z;TlR!=z-pRuyelW@{l!dI%}1KV8!XNte)`CQqAt||b!qJ~eOj|_An4Cn!X9^6%w0SQbIZU-vK=D}OJ)H(J*4fz3OFNT{v$Pf z+zszRg#b#LA?*-1pwY%JjHBL*{sl17^21$#rx5?GwkTObIL;ylKM47yp}}5W5<-3t zqL$)bs%709-iA3G2her3w}%bb->f5RJI6QkfebFdCO0d#C;oN;JO#j=6%^X+)-JJzVVCak2`+J?}s|4Eu1^ zbqP@27q8L+VBCHw<_T*ciy8D}5d}|Cr^mQB2U3a+NJm{i7Z}rJfZnW~g;gUJ@%!Et z_+eJyq;fXbS#y_cm9?O-A&^C=%eg?hTdDl3ni_ESg4QK9z$PECY;xafK)*56y9<~e zb5pDZ=s$o{$YkL&*T3=kJgc9?O|*A~KMN=VJQ=%*C(jK5%Fn54txnL=F-Eot({s@dKTzJWYWFh;L(oK-U@8vP^eD+Ii?*pTMr z$0c1n{jdm#H1-uR%zYZlbEm&;q^*expzAnPC99`@B?WdRngfFG(M{+o3ZN_2SvWP2 zn3FB<2=Y>SDhDVHAt~lof({KExQsdw^c&F#dzHLRy!WFvi@*nqSpg^dw_aB>*&T$M z!d`%2NylK3&vZEsT=rUkT<19`4|0P<*#XiYAlv-5GcbFq!E7hj+16nJjWWRA6))2C z>cy^pCT?+z(gWzam~m7l+iz9%0sF1o86ytBCjZcf53?kyKDmrnuYMJ5v`Tn0Xeyp1 z#cjy=I8{}SO!R@y*A>__HL9+Mi!31y`OLG>YY%EIrQm&naSp|C)SJzA4rMloY@zw6UlG$G`D8jn&VUO|*AK-w0vxWWpw% zOqm6cFcv3a2lCm=Y-c4RHXVsTCB=>&AoMR01n~|FJftJMqEp}gDf})hwCm^*A-6#4 z`CL|%fml2FBF^m%T%0otS^-Y&2R6S9FKLoFy_q6IPy3`dn(tOFFgt))7|fB5rjYPw`EM z7(AAL{2~YrK+gtZnNkP1>h#t8@JG)T&9w@(DQ^%g!-&fNJ+CZvR=!{l^N*YtFAnv)$^Ve0 zeYynbqjQRD>v6_F$S;<2xpQ+k1R};UWBIk$GTZhkvLMLp$k>?;T2$E{cs2Fd`mEKk z{J#HDpTH1g+Iprt(>u+v3HF70%JU>+&Q1OI1HIb5dRuSo0Mt@RCS768#A3;khB|z3 zPUIF2JY?6q4!ulVa)&qF^(5ZL&^^_?px{x z%}K8|<<=kwq=|+CzonO64>|nx$*L7;H&FAANw+FhVS|-aRGjS#Zu*)DK7;JvvA8SJ z9A#v{xqM>I6*rGMvx}9wtZ=O1jJdRQa2*d!K3vD(>tz4hj2UcI@Q!yHivkztplc12O&Gx9 zdV8fhZpe#{16nIQ4<7P7WFlcundZ~<`IL~%G(p8pnS{a!2#WuUn|TUI`t&fDQcZ)T1|=RT$-00FZZiXWU)cY`?zZy z;PyT}L@8%>$=Ue=_=3nOk585f9FC?}!TW(n7N1kXbj{}Os95fG)q<+T(t&Z%FZ>cz zHzklijnui^X;dV{J(ilxxVxnqKQnZTr-Eh-uuNWCiX|rwPC2AC?0W9YjuU-cn96`D ze#`KVzR+V`cSkJl&}!gZpA2eSmBzB3yBOlx9H80{@kfWg+SWn~L#yzaK0f>sY-Wsdw@u9;JbrLab4Me-~^iE(xNWQ>q!*M+yH zb2mS9YKS&2!lQbyN{LOnYBw~T8s1%#CeL4F72rw7D|jK;iOlbZwC^lGV%~+cU}Gbp;TXuZoQ^nEzDmhM7yug3 zm3cneIv6~O@wxRLEg(Q`GH{ZfFT0PK97#^sx4l+9Mc6ir8rnAb z2fFtNriPi~uNeho=EXFIrIdb&%h};oVqUn7 zV6GVM5N<{|SP#E3cZb4jct)^sxEkO5Wu$Hr_EaTO&u7LHH!tyJFga&paA=o}#`W*V zD?E_E!6lEAPW4^O@9{|;OtPzs?xH&`TU+&5UI>JJJwh<=7!!Xzzu52UC2{Z>@bcuG z<>w?X<|q4}Pn^3jDzWOG14xyT7Xg=c8}xq!BT8mwx9q}%Yib8!(n$bejsD68@+wNIkFYIFhyEs>0lUf# zwdw7;P(`3`6WEkk-V(&bT!b=x`;TihR$B41{GQoFsnoWfIQNFkx8D0@VUiOsvB5T= zJrSI1>E&5jsV=Hm5VJ7AA}(j!gTDsbI)-PBtMU3d9S>T>MMS*yua2NbvFYThm(jOV zK(!f81|LT*TwvsGYginy?(*`Tx)@uUrLN8GeKITJx?Kd7&OoR) z@T5G1x4jSTj22TkNVt{%nb)%{OlLT9?zlgO8FMKe{id!kl+>qW@TFv*^if`gUfqjV z{0km9HrWgxl$*YyX-LfuP_ZPONcDWTL4gT0rwNo%39g|FJ3`c#p&tti%x&Hh{=T+v zv@3kE9+o1Gx|A`esq5_s$>~Bbc}!w*7w7hi)wI?`Hfg!Y))%LU+v_URn1^ctV4PXdlxo1~23HQJhGFUWL(EDwA*)OKc zNMxYF`QlRd=>$c8XkbX$%+pAZv{t5t_*Ol4FoW49b)Cc2JDPTInBZ-68BUAf$yFj5 z`yJf9xkT{TH|q?t+TRQbbpP|ZF1)DXQP+p`=i$Qm>9qIZI4L#B3QBT%4MVt0^G%WJ zggOj4Fi^YWhWFQr(5ivji#p}zxU78IdqX(cxto#@y$wE=r2w~$VynGpLoyk7-K}o@ z`W1JJ^I=O!2Q_ZA0ZsUIiEG=W<+E31ofkBBNiwa{ncfq7_A$pbS`_DSVM3nR2+u=@ z5owt5K`-nzK4Qe?RF}Y6T35{g-Rv;4&AMAx&Hge}X~w>MB+D6gGXj#|;}M!uBhy6^ 
z)^jc&O`GRSUX}`7_6^Nx{+mW9*AZjoRE5j37R@BYJZck)E2eotRr}OPH0!$$Bwh zpvP+j>85~HK&r%;v=iZUGFJJBhf?vqsg4Ju_UZXP&vKfr0d`s}RX8H;|pQDDrHG>3aB#MVZ#Ug4h9UjEAByNlJv?;2A#C z&io9c-}LP-DCkZ`UMMfE$g0Gjv?9ih8rKj8aD!xxV{hTGjDSgcNkWju`iWW&7e zI533y>p(HgR$K2QRx^fy)KlO$2Jzlh`fuF_}itbh?#ckWs8R4o8 zTD7E~I<~(iK8~qzF2yraIH>vIs#vUWzlNy$7~!;dn^xOvWl_5+P+?PRbS7k+{7&;2t7M-WJP6pET`sra5D-J1N96Ij*0#*sbA zKggm3$1RW%tu2jH#f3}Mhv7A+k+KuRnmlR|wn9Uil~%To&Lc}6HTd-8UVg#&T4!iW zxfb9ZRM|`u+tq~cb~S&lU8cFXw~QW~q=Jp<4k`fzObXdomqK=x?0gt_duexoufMjl zZ+HcavS&Zsl(F}o6mN5KK`Qd{r)6J+9LbthE_jRHR6!~){E;Zrvy-`JKhv#s0A6!P zLt%zocSZR@!Gus+>CM;&1zyhv>!h}Ur_)RS&OIcHGh>AD0!g zbxcn`Rdv8WGY%f+NdrS%ynEwr%r=Ue&W_&@e{yJI5Ft8(c#l-{V$>;4e1e_w^kS6V zLil!0XX6p~pmuFzfB)s1bWjOzm`@_n)J`7PU9Lq1R<{rDunE605<)BW^coLJs9;2e zQpmVl4lO(Aif>cg?>1@mO-^4Mq3n5$!G9*Bl(QnINPM zh*j;bsC1RHRpDbM#^iBNZslXS{`rJLwD@#@xjJR94i3e~94d+zgED!fB)> zMd`-u<;AjvWsR};68+$kBKTf?Cb!Tq`9Xo$xKFt~#%IBWSb`9(uKJ=`<>*cSbhknL z_6UN8mCCp))s7ab$VG$n7LB92k57`Zw^MXNv!tYL@6@6We<^KKm(*s|9g4f`U@F_} zTzl?_v+h!3C`I5S;WZiT%1I2>=sQQ%Fm~`2Yl9>w6ln3Oc#M6z+eAe#z%y0GA(ldd z)bOmX?C|BB(hifE;xZBn6VxE%9_qvPDk?G?q9hTha@3@Xdo!O^)L z<{0(1a~ug9&eAG6((ShGCn0@KSaEK&m}hdhNvBj|G7r|G*bRlogS+A%ez&rPB~#wizELV1&9)ZEf5xu}<`Hokqw^9t?XYPvVy>q?}LT`Oj;%4C#EjOk<2a4)5focg6@^*^>kLlZSs!TO9?|gv-a_SAJ0K(Ig zkB-&*cnafSuUk>2-BwgZ#JIgnw}iQ4-l$6()Ac17y0u0h8jd*mmhLNn9<{Gos{XJ{ zc@wbwH)-!J!78Bm6lE1pKR)xL#qH($z<88Fo>qxSH4|O&o&9veT##X4m@Dd>#;>f7 z2guXnC7J^Rg1(-Kk>DKBF0_keQ~5+5j9GCD)o4|2ej7tko@#WI>qBM*B8uw;T>na5 zqLx^U>SY*F{35};kK(~va)COy^Qzi(ar;@Yb37P)rn{w^o62Et5?yE!Oo9TZey4h; zp>L^UElGYYkjyyAkiLFd{KScAcg{MmVhXheZ6-A}c10n@k;^p+U^Fmo6sC-fv2#@E z>*yfqUp|SoZzo*f8Mf8bfd^GC7s(y%VBqp#O+5>afDz)s|D^pKZ)Kh1vDv4!|Np z?xW4b+5>%d=8^e+U&A>y>W26}fFK+)y?*iMoG~s!biC|;x%2pYO=WeziZ%1N6wt%b0BRIsOtp+`3DS56=_o%8J(l+ zkR8$=s^hbGx_Iac5HEHdRlF}K#pxP2_>wdk!#R6em=t}=V~pUPCVEB=SqKo}_cLEV znG^2Mz4;cvtC{z(-7!n}QRk9|;Y;@0KW|Hru%9{uXuR0@?RCZ78E=!z@ z7vbZ_kC}ICeXTTLYhyY%*hIKS;f6qC>TZ+6AVg^^JDW7@P-JRRFxV7C~$>>o<6cz%f48Bfoz zeMQac;I_>oE}=%9=I8UXTG}iFLnc*wSX z=vxj1|Mw`u_gh`_90_Tqn(>Kc2$Gl9vHH*oEJQ=cb+=ZrD>f(v45$wu{pSriNyOch zqSi)Rw+F+zV(kRo5&dW#>WgSCW~CvcZuF6KXm^8(5MH&`_LT@Y?EWXI$!RHVdi{@@~eOCQ-rP#2gs8L3rU9DpM3r^Ab(31<@1G-aOmJo&pkMXbx_ z)C4){R@Xl1hT&)IqF}L7>UWhLVel9H{HG?hYlbejLUVugA^XcsX*H9gjm||6N9#3s zEYoljn+q8Hk4#>DI{)=s_11kuTM_%`sg}YcT0>xN!Uvsu;Z8KFxZ|vWt54Y&hcI3_ z6pL*1*-<$c(f{zlUdy+5EbEF;97&p2Ual9-E{exG>r6q0l@U#H)j6VHB9O1>k8kx) z<&~8g6Cl7H%W&r&*foiCGS~pxkBaYmf|$-jE#GY{qEXmqm@;jX+Jb&))goi9zII7K zthQ)J%1Yr+wo0jOj><~T(M;aQzf4T*Pbrvta@JhH^+jNk-~#>#0B2JV19qq6i!#cY zA+Koe1(lQZa8CiLdL)1i!(QdJ`O{GGqtRuWRmuHk`}N~R%Mu%wV=}?QM9;KgngaBU z&xc{3GrY0IwN})nrrfdSIzflioc9tjC{f>qNPZkasrVh_pr(ZUGO8e)8iyD8h(ZlB z>V3Aj64=Oa=x5*tsch9UT)q=M?eF>KwXTavHAMLy%7wF+r1uXPs0m!Z%v&aioK` z;vy8@p7qH2jGOr>Cz#=u>gj;?OuIK?RE z8f5yD=1BMp9ZgidGhO_C6_X21@-C{-Y6JcUc)&#SPUO-Qqb56w;BJbVnE3~ZEmX_J zmW3^$`mE@e(#tdNe*=c@?#c*X>X+v1XVS0mGTODf54I(JMco{oP>bu*=qh%(oZERY~!hBR(h|!0P4`4=>nLKh!NN5s}^<4gNJqPL1 zqnXfDQ@OCbZ<@SKW)f;|Keaa zG@tYJk<)RpoXA3H*&QzW<4sR4i8p46T%$;e?3f7`!R^BaMsj1>xla6 z9|ZoBZ~LS*Gp&m-K0Y2GR#>PcLR9(TOi!_%MGr53;v!k?G^- zh)InR&g^pZDJP_Aii_rd8fxL`HEYQP%7~M~pv`_s89o-p=c^(5%?gFGW#5Se1qohM zFOigz)`c|h9F~Ux(uXf_w30~x>598OH%jK29~~29*oxt?Z;9=sA^)!DsQRYJ9s( ziV?$1I)MUE_$;9PX0Yi2vlakWK7MwyPe2!wD_gV(mPjxSYo@#c;^;Dn(S(>T4;&6HRH3Ziw@aasB!!DbtB)nhrXh(TQ2^z zbcEiiVeBG88}xk1*z`FEp4&Wuf@5+btT;OW|3pdB;G)~?VH?JApltkE+=Cjoo%ltb zC<#_RcCeZ_f?kXu9g zFlka^>B$}O{Qh;9X|j@hyW`cv!|8I`{Af;Bk&&W%5`7EK-C3F=U*5?-3=HR+F)ertLXRn+s!ahFTdtQ0iU ztXpVWxuth>YeG@o9V}PBzLR?OprgAeO6%49o+`ZNbde^@H3K<1UKfqm#?yMK_<1$< 
zLjE@xYyE-k{RX!N?)jiY#0n$yniyyEV3RKXS@&EcWS8TjfkvH-Ozauk-6-SRm+40F zsYz0Vz{d&2$cLQtyPPP(uv}&%ti77Fvzel|a8Y;d0(Qwo8hO%f_`M`i9!nYIRC7q`vmX1Ww8ELD`MMHF%1iT&qKe-0dR|IRf#a|dE`7x@ zOu^_D3DlZEW!wf^umq1@3SScyx33^3U)m(Z`pmvwd40S+UcWGuI1??UuB=s z%hjV%mJ(FkVJ{_MjGrWL-ZdXi6jf?=xGajoE5lma=I^qU+~Qo`>80j-ZK(n(GyUa4 z{Fz|2nUj#avnIR;5WEU0RY_w-zUH=zoZmq{%^84tmq!+&hp6M0d4NHd8VX3M?$@SV zZB&mW9BQe8z-Af%q_pneHt)^j(5xwut{Ycehnl4wIlAc@FF2x!!?gg8@p*>F(v)^9hEg~RgTdq;9Tg-XaoK`XsZ!ypeWSAq-tcI zJ6x?nK#=CfUW+f8m^SIZf*AYa`_eJIr-G;*UOo0M{%aBDqet1XY#GrGw7;B_iqH6~ zCBjl^Kjb!FWma$wxkXpyE2b~N!oYdJ0tfl?(weft>O!hN^L0q6g8=^C)|?P}|E%cg zw}B#E9$2>1`U;T#lBQs=%KV^2spn@W72b_}UQA=PI38;PD24Zz_?w1pCs?|~>j z?=diuu;t^qeP~u~#=~~>q*zFkfpsr4CcJvmS`O=+KD_KvfI;XRAT0~$i|n+jAOnVw z`y+bsH9ttpzE(FF1VN4Er1VMp;GTFV$c94qv6Boc*q!VcX^s}0Vs~@rsbdyukdtl# z<9!t_(&iXoq4Y($%Xyfle_7+9n<&a)M)jyUqj*?sv{`=6fI(;qpB&C86K_$=ijWe< zI*)+nJVPucfYjOfu4(Jh`o-5ezUt_Y-L+`?SbcnLOY^7}hrOGVP~F_tf&$8ya=|A%flY@E-Do+JYa-)J zEgP;`xIQdAs*Jx9E`qlUCA`oRbcBkc1STn;yxj1hE|OtRM~YoLT9bXeU;oVQg8QK4 z3`!N=ql==hs_Lax2C$Z~bc-(CoPw#X>6+qNe#=E6yK^$RQ#BXQAi`C0Bh*HCl4~Nn zkfh5r$&}(Wy_p1x8gSfmhBd~rwWHGkyKo1Qh)dh}oq*)z*W3tJZ2OBOr-}Jm5{Lscsn`x< zMLOw{E~=#w&8hYkQ(&{+EkM2#xDk`E&C-z>bkv6TPMIyZcLv0o>Lwv~c3sMSS z&b##Gdq;e9$FnqRt|;v9K+B=~-HgP$qY5-Tb1m{S@0>D{NS{!;^A6aF-t&C$1A4bb zJ_bRI)Xc|HHTQqsozLupGsBE!mgCrYx*gsAg5PYpDUnM>vCU8HaJ$(uOWi*yDUPq5 z>K7NJwsLtymu0iPOt$=N*Tm8nA`RtP!^^R6qG?1@Rv2a{T*RX`5&@kJ*ObO1CnIlY>ilvxiO2019 z=t$t>Iu=#<@`Ae!HuRh5xKnl*lj&+Ee19n_0d?giHY_~cLGJcp+miY#HVpfUkP^${ z8AfCL*B9R9%wGS(8}L~${1 zBo|?z=k%x4Z*(bn<~~XHDK8ta{PbIisCdb z4u@nI6(4vD*R%qWM4)5!ME2Ua5bJWh1A8q>?~$+Ccq(1Z#SE&XCVXAyJPnGWV~4Wl zQtUKs#J^?3AkT|25j?4K8jV41&)6Q7V?(kd@;QIJLcGB@uxkIS_$)av`C&<765j|^ z?aO^01E5@TF;H1z(z(6!ArNPK7rfIZAQ~owEjp3r6JOPjI%qvg0?+*#$GjDPo*0Z& zZbqpc1!@n^q$JUD(tR+RER~zUGGFKRcvxmlEglZZ*aPcyvpp=je6fZ!Berlu*CCRo z#8HyMAWZ;68k>R5Igb#`p<~}qPiNpmf`*Lwv^|78oQJ=`MZXae<;sWU*o!a`Xwm1i zeO*MeFYhUS@eE1;dluH%dSYyWZKNUbibj<0Jg8Z`==(Uq>|$}JEv=bsLF0Ximr;9g z8QccPvc%5Z=yS@Ali_MJ@jHgSuY;{3Y6T}a(iO)gB@BHBe)02$+_@~Ws+UPE~wcf*D_Wt zBj_4Dqu{qhKYY|iQBfP~MxPOww=m&407X~|Z783V%`9gX-f;d9qFbTpXL99p$)} z$ZpNq4Qz%dD*V1AQ_~Miw#sMnEA5B4`tmYbQ)Lf_J|i zGn|>vpbeigI|J^b#_6Pge2*yR_BAI&)*LCTHq>;4RLOFivx?)CYu`FI+pEAT-!n(jm|Egz@$N>Az}Jj`feipTWg?N-1nd$a{?#kYt&s-nm618? z(lqUWeJZhGN-aMomy}z!!Y%dy;=9;cn&tjBF)at++y+cY;Hd(|;g&NGajFnNvlza; zFb;4*lyqp(m2JHgaiL)-ta({v?Usxl5fOg__IWR-Zr!ZDhH^vfDeoX^bCyrYSJw^* zFJLf>ncup-A*CsDe$C`AL-w$!oa;H|7<$t4*%CR*oC9#4*;7<5FaBVomG{MgB_!IY zae%+JgV)lV#!PK1kdXW8osmprCZ7BotI@if!cTr ztd!%&$$dv??&+-_e_bvw{cO^I5(i~hcl3}C9EAx=hq?dUkD!Jw(YV85*p_O1c^;u$ z1jw;{@*=(S1wIB~6zyVRLAdt#voU(RcYSpo{5 z`yb*5;GH{chE%2=u71bSou448YoNbheVgEme^*#Hi%um&nXe?+A37GAh@Shz`mm39 z=|Gj*w;`j&T^t@|8@`G1Hsn@F3DX)s!$_X0B|p8Ea~*RbR%7v=#NndJcI*7|U|JH* zr^=br5MZEa18p3X4(mC6hJ z3xjoHHD~fpKd2+}Orf{FXqk;wb}Xs?b#gkrr^Ow0qzi{Er|pSivwZAzvTrZJ?+xdArC++F$e%s7zk%$C?qlLMXvN-O_jJ$ZRf z2>4k-=sJ-wIfJ<4r*ya!eV|h>mVAEY4@TO&!e@wa$OvQ4*`2j&d><`8l}gsydTbl9 zIBk!E3&wCF{nU$T;PXIVNGnY@U|#$Dl$fho)d8(KkYlX}_QLx(b%H9;ADgEDcgT{2 z&S-`7F+zF}DGL9;YCF@grmi%OgP=GNmViM(NLY#tP$$^(2nHG!k+qL3;!?m+u_M$$ z159adB(j!8F;EuESYk4O78onosSG^B1&Yu?q7odLVmlaF3xlO0Ned`Q=Dm=crTsed zG0!>YJY8RJSBe%d-Mbm< zRzo|(RMICtOXU?uLzV|wgru%J?1ly>mnspI=scB?F?r6U9|v;A>>G+`_C>aSZScF5){tPWbHMON-lmPRaT4%9y0_?j!T1!#R+nwQhp%ypjz#ekh-tTP z{CW}`I>W2kXvgZlq_=EQkfww{Z#*PPrAyK3t>IkeUxp`Q$!NZup5I|&Kqd4=LZ1vD zM7-MJ-&U?A9u0luO>KZ?@RkkxGrf`e>{&T&zJyfre0cFpeX0vMl-GX$uuj7TK)I@P&!NI4WItq|Lp=yRr5z@A5`*bCeN7Qc zL<&XuS^viaYhExqobe-WS4xSabyFt->3#BC7r_?WzlW}-EPeTu*=#CJOk0kGaCoB? 
zp7fUKb~Ldy*oj0aq7m`S9oa8vmBNBqqgDpRhlnLkK{xq}(_wkFa$m<4LpM`C)i-)6 zB%!#BwNqxz_v@>kAh9nhDu5tcjCV{c1MIIk}l?W;r z_Pte`N!pli^uRMG0ahxNsE%Pcb`)Ln%YU~|erY+NKeK~fpG|dyTtc6_!*Z2w^c_BB z%q)>}$s-_XjuCapPi^7BV2sjgSTS;9c9-Ykzv+D~W* zG>bEXZCxrOWjFZ1|Kq*3UUz=#Pw1+^LIDI+c}2TPy4`tkHh9pFo%bFmhCVo|lI-6C zA9lyuw7pDnGiB%^Mn`q*VW6v6J~`JDbPGAesGV8?o(59xf{Z_q1Cx~7b@(d$LB zz4^u>7Q?FOSx|z}z^ciD9vL0)3*%0UpfwUS+!-}b-(wj$B1ul6z(nqO=X365b6aFl z$bu^{F7ba$C5I%G_SRm#YI3kk><=ty>J$FR(QFKn`|sYW5Z%OYnMA?r2UbObTd^Nn zVjrxL&SIDOpK4+35g#EpY}>dYE_WhTvfi)M^}dQTY73{|ayb3-v$dOULgRoRnggL5 z8}g=IF=Eld=9GBFz)irqSRR1!K{39MMkCW_7U&A-RsV{X=4}3sqnP}N6d}C z%82fEk0i#W=%_HrK`68MWJkjG>?8c8o7TBsmm4s;Ih-yt1x4biqpH+Vy!>_Y@{Ug# z*c0=0w{@ZGUR*bEmc&|jFZ^Uw9eE9F{!@#6rq(+X*A1ltMKF#J zc7E6yR{glK@mRzjW|O%leYNf4maI#lKJ#;Uf}I(FK&*1|RcQ6G(!a)?ZG0N*tjRsG ztabKw6`qgY*H4m8d9}d$VUC=2$FvkvAgv||!rV43kkKcOZAjo_+1%?~jH`CBYMr2W zyW{YjbaLE$<7C2s*)G?(H>(L)j$tVp-lrc2E{R!=GH(a_VJzrO)g)`*_z4{D%-D)Xf`Yx}U8gakA{iAlwq$d4URfNXSS}&Wa zDCNj6&=YtAf!6*DSc=LPouMU3G{bxFiH#;rjfh7s!dAZcgOz-6v~G5TxT@2?c?Q=j zFHCk{!_B*IE7us>yv^0WnW8YUML-cBwlFo9p@B`xVLZ znar~}6Ga6J`8cV4zwG5J8FArq?%sv8JmThb0tM)z<7vM{#=nNPH~oI>cUGHEx`OsB z+WRTV~d*%LKKvYdEUi{w>bylFAH?S4ju|>34AyIzbiLVCIA2c literal 0 HcmV?d00001 diff --git a/doc/images/logo.png b/doc/source/_static/images/logo.png similarity index 100% rename from doc/images/logo.png rename to doc/source/_static/images/logo.png diff --git a/doc/images/logo_emblem.png b/doc/source/_static/images/logo_emblem.png similarity index 100% rename from doc/images/logo_emblem.png rename to doc/source/_static/images/logo_emblem.png diff --git a/doc/images/logo_emblem.svg b/doc/source/_static/images/logo_emblem.svg similarity index 100% rename from doc/images/logo_emblem.svg rename to doc/source/_static/images/logo_emblem.svg diff --git a/doc/images/logo_white.png b/doc/source/_static/images/logo_white.png similarity index 100% rename from doc/images/logo_white.png rename to doc/source/_static/images/logo_white.png diff --git a/doc/images/logo_white.svg b/doc/source/_static/images/logo_white.svg similarity index 100% rename from doc/images/logo_white.svg rename to doc/source/_static/images/logo_white.svg diff --git a/doc/source/_static/images/nhr_verein_logo.jpg b/doc/source/_static/images/nhr_verein_logo.jpg new file mode 100644 index 0000000000000000000000000000000000000000..d1156ac64e5804beb8dc081eecc52acc453b4d8d GIT binary patch literal 8167 zcmb7oWl&sA)Aq8kID|!ly9f8+5Zonra0nJ8xNC5CcXxLS9^3*1776aI!N1M@KKJug zz5m`mQ*}F&C^r)#e1d0l*62Vlrb%18oWU;qFZ=mU6N28aP*fxy25Dsa#NLIi=} z;6TU-2=Iug$f&3&$S5di=$II2=wNgd6pXhRU@UAL930d)xOi`|@i4J*u>U%NfrZ+@ zfsjBTBy2PkH0=Mgy!HXWh=5_hFf0rO00@SG1;e}!0EnO3^r7Zx}cb5FP>c6#_to{s&^fVnF{L#{R4EBrF2EcbJ^R(GL7D8EU+vs66*< zue$1C@afU2tE_zA=(0z>O296meGQl7!wRYO^Y;u9!yCP-#~PNpV{8BpgN+TR>uPk{ zSoG8)U-=BLrOs-63q)|s_7qLHl~YI+FU$Q0am+}E;+4vBqRl~E%jFQ{hL|=*y;A+R zqXm)(%%^&d*)+X6kENZ*EI+H#eT%BI7AO7nEu>2QiiK*Kus;W$90`V2X~Y4~UA)NV z*_tkMH`|j1W3K?gPHU{76|nNxmBh`_TayTx201+1V}`%){KdGwPksjBgoU z0isvC6F=RHXN*5@SrKI{=w>vXEoKO7*~wR(BHLn#ZEHMAmM^l8f&d`f5xn#>$|3!l zK7F(D<7fM&u5MrBmqvX#HlIShiza4;BWCBNyKuAXSz_{}uDUN-Y54he#o+F25OrsvTnSq1>Wus{S?pySIC zK3~(%8F3%`gFSb*U|e=*5AalSOk}Lu#f7S6`HSy>OTVEnp;rV3iAPtd^vM@;msP5p zU&ka@KcC;hq8TQzxQrA5+Mzb^?Z%ryGs|0f=lJF=Hw)AtDnWYmALMk=@;|hrEM0Qyi8j!{O*5bp?AaQ{{)^a}(7fXBcluvar?-R{4mbs%!_>i4>JKagF#gAY57g=)sYuFk4w zE&MT)UkrUb2%o#G4e%?1DA8e3YF9PAqd04ZeBMTS-KH;+W(U6l>^|WRS?#3m5ukeS zK+P@e2ZlG>lp!lHEcckR#Z+^ z^-B8kd}E)Xo?*K;TP>)9B?N!O$kk0PCAH#`6l~;-2(S1?70}njC4d6LCfWdOy54_v 
zzgLL+xEVyiHE!QDS&&;99EBb|Z1Uy&>2#R7v39azL>(x&Gmf!DI}eUEB-R3!nM6tv zyZ*K7cY$4UugbfxbWJ0D`xn2xu(uL&iud9X6g}|^VFF2u8SWrc-aXh9^WA_BF8n_+ z4lYrTVQEd8Bog%-{|ytW7&D3OOXZU2h;LPSkEQ!hp74ysw?4Zw>d|>s`&8i6k6uQb zq|Z}1p08mKqSQs3r`t43d~uACpQooF4Bjnfa_qyNRA|#qFP{*~CdA$#mx0?J@z|zQ z()v+Cs-!2X9IZ3F!FZ8q&xmG(AmE5Nrk|Z<+k&luQyPIUhY{r*dK_@nnW$X9cm3EZ z@~&ekv$>0JFM zk+8Dk^?{fAUYO=h%GI@pFr*>(g4-rsJ~WkX{}nKo)%AVik_k3VPpm$ISc!hJiZ66@ zVxV@nIGXApCQCH@|IRsW{6A65fwu5@5vUUSdosYxun%T1ag zyxDNNWCYF(`sVv}%tTC$j+ChWHM31O*zjZ~?YdiVZEC-b)O!cA9ZLJTUU4wYpl)Ii z(oQg3?c8Uvv8|Y8<%GP|LL+lY%{m;q^T}&oR#c#|O2#b1s zgEKviwD#euYIA#omFj$W5F1Y1wjf9*da_8SusXGf7gJgrm^SZ5Dz4L@uaIw*V7B~H z1Kmp4zyKI1LnDBY{-rt?AS@gJ41&kNB4=guufyczvhT$vV;5CXH8gU_`)5CavZgT1 z@89C69XR&!EvIUEzVC8y7YnN*)*!5{^ywQX=Q!aTQA2jRle@UiU+#$+SluEr+tKj# zYB#eHZjQ7A-rFLnznD0N2ty(vN`&Z1;f9mr$u04@EVhZApoKq`8i>99iS-Shkh+wC z;lV)rqQX3hcPrxU+SW0-2TGeW1Yh$8wXRC)wB#VTF9ZRHG)eGiCm)wte^$B%37xj} zb@ki0m~iIZMkhtw&0iiA&_CHZ+*FQi?2X%gz55i0R-a&Rq*Xt?{R_oxOH#uf(R*ml zrPO*yJ8heWq%^DW^h4;ncb{x?PVaYUim1T=APn>b1KnW%BhVB9;J_G|tXLePi0vT@CH~5``gpGZVD;k1{}nzQ=2wLs%xj6|2ao8AIMqa`!ny z{>T{_EBm4gP+!yKE&*Ry`g`^VOn;}>Wd1y~XP5H6*Xe#1zL|f?624g&_gZ&5dg=#1 zJISPt2sRlp_^x_JnufP@>*J&jk5L>ucV`*xm((x`=uloDzjHn3R1XTRGzjmY+qYF~ zK#%VIEt+Cqm4O?fEV06Cxj91`YOz4JS97M7@iaeMX5u9pzEC&j@p$pF>iy>xVB7s9 z_0s+dU=?0|1)T7Gu}cbFHLz@HPnV(nJi0Inf|DFA2r8e|o4z@(VD4SILHg^^a)|wO zssL+8`_l>T-L}HtM5MWfamO(6rSSWZmMXmseG*e-CC#qts}oWqb`3}GHj!WBILbE| ze_>#M&hUXSa}zB8B)tCzo#l|*;e0qk!-tx_NA$E#^IziM)1Fi$S1bzLNxG2xr;$S`-4yaC#%u;S}ea5BrD`1E)*8NRuuC8^=Nlh{Qy*rHxS&>96P3!A% zZd{owf55EdE5K@Hcg8N#mMbavW* zkn?EC={LS}0+UGUhn4iozc*P@pk&c+gHH*vwlT|vq)`(&pv~LE#Z^_~5B2N(BpfdA zQN*Sip85T4p-_ryc9~6UXhUcCtb2g_+)AE0RXX5qLxw#7r~QzKA2wZ406vU8fSp&& zgC@54t<9$=m$$_(-0HYAiOeLQB_UKs4Jj-sRYvE(PZV!Y89XvCDCw@sHiH@@O*BY8 zp((8kB?X-$54yc$L98_2p*9iT<}zj@r6CtRWw81-wNN*%Kl>wAlD~Pjb*1kn@)Zz! 
zy*eNK$Wm}}G|^ocnqmG)zjjeTBr0hx(CK)mk-C0&OJ$2I5Qmu1uGDrx>$hP0A)^Sd zEU>xx!D%&Bry!7M2~;CVP9I8=6m%v>I8sHqNilyNYL6BYTdh&swV!LeojE=+u)v8c zyeZA!&P4>%u(PE)@I$+bQ9JagbYmiitl7%(En6J%RpKyRcoZ(d5Y-=cWW%b&xY`?4 zB_ZlM^pvr$0HQY2sQT6|EjZ2HVMU`~^>vY{1;D-?9CgtR<;g1_?x^mDPYmLPcF03m}-P^)2J`+pUfp++I-l=z>i3Nq* zh|uY!qNK+~q-9}ePN*o+uoaEB&>Pw6zUVg{Zy}qGE8J$6-zGW^Q@AA7q7CTO76!1l z*B#>)V~o~o*^FArODcIgPPW}wEX7d5i71SmmSy8&GOIFo%?BBpJm?!_*c`%{!!(sH z9CzWyC2kBu6pz$vX;-Y;?iB{l^e3Nvl8)~)cfxdcZL5-E(kZi>u0kEa2KdM4cYIPa_{mrS(TAPFq<=j!RND3(|gJUV39&>i6))Cm^ zLZ8RJi%TwbU0NS4D|m4jom|1RNHrt{4Vf)^Rv5OpM1IbT|B~;WgWP@Q_R`K~*Jr|M zK4xR&%jC^TXSz3uBRh#8Y;t| zw&tY{z>?pBCA%X9rav$A*(Kc(2urI;eeba=z~XI|!4bJEAWELd$}qQ_VW(H2AHe@; z2dg}^N+p6G6rzqoW9(>tGQ%ljJ@iUIATVz#Gk@KU~tk`{U1)XX<2xaIg3V zBVV@SJ`7|dm?(uD&_xkWmHTE8AU971XrPVWm?@I1wIo4Im>mQE{GLooi<*)OTX*67 zG^gg#df2K{^DVcg9AoS}!SQf{% z7xdAzc>_@qs@oi$V#_@$$hc0z{T*tPE$aA8vOwOW?kZ7lKj)WWcRo0cHJ{#X7TvB_ZEYKaH0RN(aUMA&Q}0ZOoK^B6lxJZrqdYy^TX5M5S-8XG4$+a zHes=Ie#>3N#|(9ACg%wM5q&%A1bINid}-tr^o;VkvxpLH9Y7{n>G&esBI@RG>!%$F zsvIfql2Yn$p#&1+?bd@1=M^g%X`Kh0A3ro@)P-f?mi&}Oj{IK!Uh6u7k>Ek-cTZ5w zL@XNMJ(oyfL$I*oeu!Lun;~6++OAFC@DhrXlGMQAItWBW(oR90v4f)K32%2o48al1 zrK<^c`MxqfXhl=P%AwIH7+RB0*Pm=`*Vv%jNBI24K6MyQ9mPJR!!NS-{o@!Ci12() zb+v9pVjt@ald6LoCF3knsJO=6!FnA-Tn^T}X77c5fQ8ihsXp?{S&^8{Ls;D0gA(nS zgjeC!iW1itw^F~|%kx9do6E5rWOJUDQ|tFi3+~M3BMKE48Qwhp`dCd4BLWIrbKZvM zmj0eR2)X@D0S4oyyzg3~2RdE>A;2-Ng5MEjo&K#AseN4p@1sJn(TR~MV^u=2tl4G^ zJZzTI-}3^)_bgBWMWdZ7ZDa3nkv>=0YvNegerish+{eN2w;R`+KVvELOR+T)kG0V| z_n4j65oDJQQJb%yZV5qFw)yTB18<5Y8Cz9^B4w3k;A~|ewMjeIueaz22?;KeZPM(p zakx;e{i-jcYf#kTY)mb+Om?Y;<7$UyXp9q>tH+mrdSN=^9$jW2*LiY=V|Y}=r5$iH z9PpgmDO;1a{Ql-oR{R9h==ZAB71&XNY8Sub!FwKTgRk{P9TCUHzf>eq+mX1q=Rfy0 z)CwlsJF<{6TP85ybM59CWh}1}9tjDQ9;-nXb@99wKZkwgUYj9pi(`UM@idSLj!&Gc zu8|EsuE(jeJ@{T3V>4Z~^HPiiZ~g5fN5;1|_{`-4FW(I;$Rno%iq@klA_9ZCKd^p9 z49$a>8&ye3ZWrV9mc{q9=sU!R68G|p)rT6r30OMem-jbi8Sb=V+89vpvg*vF2*9fI zME*%_G5z$=V{{`Pb~rqOONJdn%0(*z-`Bt0V00?4}mL4M@ZBL_x;n=l-1S6 zbs5v+)f8_2vHxPlS{Ef{*c1q%d0`QaaWfOb7#067~58LKF?->K`J zIT!Jd`~Ph##uoQnIUu%|LgZr2PIh7RZ zVion-iA-+06mqDAb^jrd*FUXRu(a>s>O!oXh+oNo6Fd;Cs}n*ef8u0mCTyufX)LW` zG%WGn;F(u#9V0U>b=;}A2yN2#vPpdVO;dCG%F9lC7Q?C2FFJ`*6XW3qlMkEy5UE>2ozlJcq0N#AfdJcv2WQH7 zCu#RxqOIzSo4-gYY)@ zQT3xE=TMl*v?_~9*w7X_!tvQtD(}zvL{gR2CA=D|AYbZ$!`gjh3sd#^ag%Ob7TsNi zy5)Q$Wfpi%jc0`a5c)rS4)bq6{dc`Yp^Q#u01hO|*8A61CNNsDe~pIi&p`KzkHJ*&5j;)PrvyR`0*;%;>tolNlaHwGBx z5L3`f;g7-@pYhsyK~@{R<`GOxw7FDcEBj z_H7177VpWEy_{k-zrt>*DLFIu_>z;X`U>8wjbAoh$9MkthDX9NwRe$zk*j=6dmwQ=o>m{?y-e>lH4>o*>JMEc6R0zuJ;3`*xX# zA*DIxql0|iLGe!Jwq`$Vx>(V6AVT?QZI|hj*jDDi56s73E57M(43l0U(ql~IJNdp1 zXR^oD<=Pr;ei|;tSPUtKtDo=+!uH)!OuyMO8^^sXU)3YpmSGHvA$)Xxy4xw4{aVU; z`sOKfc0rqHj;_z0PUh1h=>%9LS9~&aJ=SGRwQsF0FJa9={)@Lf@L z1OsuKry9({=@G-~5i}a5!i~b1(%6eb;|#y?{86*|=vTm66oClkSLvuKgH47Piqb6R z00McGD5J3e*Atn~*uw_R*M6nCJ(>zFn+5LQ4|mTzBBX(ghkq%by4D zt=`S~n?9jON%x^D=>&3h7YjYWXJ{}gP)ue*-M2StLX ziMJf9B2%%m%x*-K z2U3Mcj1k|GO|)xdmHlq%;+7pU_GIIFa4iKxPr}glw!^JO5&MJ^YZWQpo5Zd;y9KEj ze~hhE+;+|NgR4HqAyHi-56w12sN4^wPMTt#;4KTxD`Ddg6?Nk^n5Ge}#hn0AE|PJX znUNzAtjLnw!3X=Dr>yz;!5R+bN9V%vN7)VI7?984n?dkAXV~8M!YmkmM%2x-RJ6n( z5z!s~-rD%ff+m*Q@@RTp3x3QLjUPUr40(dA<9G(6lvRTqh1oJ9XxMl-`A53=84NlD zc8Et{I?%c&9Ijv5)^bs#HZjN=a}>nE$`%}5p9hGOcYQ~ROVnC8QA9S)crh#&L=K6J zFGDa+wZUAES2(APR4x~725MwBOAbh%@eDTnvY~cg>Z6Nkdi%NX6lKEz+i8LJt;oQ2 z+bxmiF9*%li=QlOm#WbX5redr$#IIPr?T&-1NPfLk}wBP!#(YXEPefDnh_bokGYl)>`)k$h>_)kVP&#(5gjDP!b;G@ zyi(RmlIXB>q2$~MV-JML*jq7Gg09{lxR4fAs*Ef@z|FDj zHHR==B;e12TM+-`D(e4bZAK=!C14XTqqmDs*{CmQrN;q`(D-FqhwXP--JI 
z87cch1v?IdW>%peQ)9c$ak{9?5m64wMPMc!1Gz2CZ+w1r%8k~&kEd+Btf!Thdl0dC zLry%5uBwRfU7Z0Fd1T{NGN_q!BWgI=Mta{rao6fG=Bw#5ks%xWC2kEB$8g;2&Q7pbIGd9 z2gC;&CekvmS*@|s(DKVbI3~kkD?^iOw@c_FbFsE0)-+^*_cT`*(hWqDTAt}_E%c!| zBz~k!(j`Vn%s34UCSqhK$QiV;DQ9oQ_=*rd1JpNql`D0hH0xV!4sy%;`n9C_r$Y^! zW$eKOLaHX9KwXR40Y2jorTXTQ%uF(=VB;P7NL(^;+z?4ECT&rPB<)XK!Q+VigXM6x wVWMiaygR|7ds6$j9%QQ*vWaAotrE9fZqs@>%R^yYj$KN}X9uDLzOPIF57k*5AOHXW literal 0 HcmV?d00001 diff --git a/doc/source/_static/images/perun_logo.svg b/doc/source/_static/images/perun_logo.svg new file mode 100644 index 0000000000..ff794fba41 --- /dev/null +++ b/doc/source/_static/images/perun_logo.svg @@ -0,0 +1,112 @@ + + + + diff --git a/doc/images/split_array.png b/doc/source/_static/images/split_array.png similarity index 100% rename from doc/images/split_array.png rename to doc/source/_static/images/split_array.png diff --git a/doc/images/split_array.svg b/doc/source/_static/images/split_array.svg similarity index 100% rename from doc/images/split_array.svg rename to doc/source/_static/images/split_array.svg diff --git a/doc/images/tutorial_clustering.svg b/doc/source/_static/images/tutorial_clustering.svg similarity index 100% rename from doc/images/tutorial_clustering.svg rename to doc/source/_static/images/tutorial_clustering.svg diff --git a/doc/images/tutorial_dpnn.svg b/doc/source/_static/images/tutorial_dpnn.svg similarity index 100% rename from doc/images/tutorial_dpnn.svg rename to doc/source/_static/images/tutorial_dpnn.svg diff --git a/doc/images/tutorial_logo.svg b/doc/source/_static/images/tutorial_logo.svg similarity index 100% rename from doc/images/tutorial_logo.svg rename to doc/source/_static/images/tutorial_logo.svg diff --git a/doc/images/tutorial_split_dndarray.svg b/doc/source/_static/images/tutorial_split_dndarray.svg similarity index 100% rename from doc/images/tutorial_split_dndarray.svg rename to doc/source/_static/images/tutorial_split_dndarray.svg diff --git a/doc/images/weak_scaling_gpu_terrabyte.png b/doc/source/_static/images/weak_scaling_gpu_terrabyte.png similarity index 100% rename from doc/images/weak_scaling_gpu_terrabyte.png rename to doc/source/_static/images/weak_scaling_gpu_terrabyte.png diff --git a/doc/source/case_studies.rst b/doc/source/case_studies.rst index 61ec3a1983..184e11571f 100644 --- a/doc/source/case_studies.rst +++ b/doc/source/case_studies.rst @@ -5,7 +5,7 @@ Case Studies .. container:: case-image - .. image:: ../images/fzj_logo.svg + .. image:: _static/images/fzj_logo.svg .. container:: case-text @@ -17,7 +17,7 @@ Case Studies .. container:: case-image - .. image:: ../images/dlr_logo.svg + .. image:: _static/images/dlr_logo.svg .. container:: case-text @@ -29,7 +29,7 @@ Case Studies .. container:: case-image - .. image:: ../images/kit_logo.svg + .. image:: _static/images/kit_logo.svg .. container:: case-text diff --git a/doc/source/conf.py b/doc/source/conf.py index 2cadd075ba..c2da12b04f 100644 --- a/doc/source/conf.py +++ b/doc/source/conf.py @@ -21,8 +21,6 @@ import os import sys -import sphinx_rtd_theme -from sphinx.ext.napoleon.docstring import NumpyDocstring, GoogleDocstring # sys.path.insert(0, os.path.abspath('.')) sys.path.insert(0, os.path.abspath("../../heat")) @@ -46,6 +44,7 @@ "sphinx.ext.napoleon", "sphinx.ext.mathjax", "sphinx_copybutton", + "nbsphinx", ] # Document Python Code @@ -133,7 +132,7 @@ def setup(sphinx): # # This is also used if you do content translation via gettext catalogs. # Usually you set "language" from the command line for these cases. 
-language = None +language = "en" # There are two options for replacing |today|: either, you set today to some # non-false value, then it is used: @@ -209,7 +208,7 @@ def setup(sphinx): # The name of an image file (relative to this directory) to place at the top # of the sidebar. # -html_logo = "../images/logo_emblem.png" +html_logo = "_static/images/logo_emblem.png" # The name of an image file (relative to this directory) to use as a favicon of # the docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 @@ -409,3 +408,17 @@ def setup(sphinx): # If true, do not generate a @detailmenu in the "Top" node's menu. # # texinfo_no_detailmenu = False + +# NBSPHINX +nbsphinx_execute = "never" +nbsphinx_thumbnails = { + "tutorials/notebooks/0_setup/0_setup_jsc": "_static/images/jsc_logo.png", + "tutorials/notebooks/0_setup/0_setup_local": "_static/images/local_laptop.png", + "tutorials/notebooks/0_setup/0_setup_haicore": "_static/images/nhr_verein_logo.jpg", + "tutorials/notebooks/1_basics": "_static/images/logo_emblem.png", + "tutorials/notebooks/2_internals": "_static/images/tutorial_split_dndarray.svg", + "tutorials/notebooks/3_loading_preprocessing": "_static/images/jupyter.png", + "tutorials/notebooks/4_matrix_factorizations": "_static/images/hSVD_bench_rank5.png", + # "tutorials/notebooks/5_clustering": "_static/images/tutorial_split_dndarray.svg", + "tutorials/notebooks/6_profiling": "_static/images/perun_logo.svg", +} diff --git a/doc/source/index.rst b/doc/source/index.rst index a176a45340..08b96be0f3 100644 --- a/doc/source/index.rst +++ b/doc/source/index.rst @@ -16,7 +16,7 @@ Heat is a distributed tensor framework for high performance data analytics. introduction getting_started - tutorials + tutorials/tutorials case_studies documentation_howto diff --git a/doc/source/tutorial_dpnn.rst b/doc/source/tutorial_dpnn.rst deleted file mode 100644 index 7cffac3b59..0000000000 --- a/doc/source/tutorial_dpnn.rst +++ /dev/null @@ -1,4 +0,0 @@ -Data-parallel Neural Networks -============================= - -1 diff --git a/doc/source/tutorials/notebooks/0_setup/0_setup_conda.sh b/doc/source/tutorials/notebooks/0_setup/0_setup_conda.sh new file mode 100755 index 0000000000..231c62c24c --- /dev/null +++ b/doc/source/tutorials/notebooks/0_setup/0_setup_conda.sh @@ -0,0 +1,15 @@ +#!/bin/sh + +## 1. If necessary, install conda: https://www.anaconda.com/docs/getting-started/miniconda/install + + +## 2. Setup conda environment +conda create --name heat-env python=3.11 +conda activate heat-env || exit 1 +conda install -c conda-forge heat xarray jupyter scikit-learn ipyparallel + +## 3. Setup kernel +python -m ipykernel install --user --name=heat-env + +## 3. 
+jupyter notebook
diff --git a/doc/source/tutorials/notebooks/0_setup/0_setup_haicore.ipynb b/doc/source/tutorials/notebooks/0_setup/0_setup_haicore.ipynb
new file mode 100644
index 0000000000..6e4662a701
--- /dev/null
+++ b/doc/source/tutorials/notebooks/0_setup/0_setup_haicore.ipynb
@@ -0,0 +1,620 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "editable": true,
+ "slideshow": {
+ "slide_type": ""
+ },
+ "tags": []
+ },
+ "source": [
+ "# Setting up a parallel notebook with Heat, SLURM, and ipyparallel on HAICORE/Horeka"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "editable": true,
+ "slideshow": {
+ "slide_type": ""
+ },
+ "tags": []
+ },
+ "source": [
+ "The original version of this tutorial was inspired by the [CS228 tutorial](https://github.com/kuleshov/cs228-material/blob/master/tutorials/python/cs228-python-tutorial.ipynb) by Volodymyr Kuleshov and Isaac Caswell."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "editable": true,
+ "slideshow": {
+ "slide_type": ""
+ },
+ "tags": []
+ },
+ "source": [
+ "\n",
+ "\n",
+ "\n",
+ "## Introduction\n",
+ "---\n",
+ "
\n", + "Note:\n", + "This notebook expects that you will be working on the JupyterLab hosted in HAICORE, at the Karlsruhe Institute of Technology.\n", + "\n", + "If you want to run the tutorial on your local machine, or on another systems, please refer to the local setup notebook in this repository for reference, or to our notebook gallery for more examples.\n", + "
\n", + "\n", + "
\n", + " \n", + "
\n", + "\n", + "\n", + "## Setting up the environment\n", + "\n", + "The rest of this tutorial assumes you have started a JupyterLab at [Jupyter for HAICORE](https://haicore-jupyter.scc.kit.edu/) with the following parameters:\n", + "\n", + "| **Resources** | |\n", + "| --- | --- |\n", + "| Nodes | 1 |\n", + "| GPUs | 4 |\n", + "| Runtime (hours) | 4 |\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "### Resources\n", + "\n", + "We will be running the tutorial on the GPU partition of the [HAICORE](https://www.nhr.kit.edu/userdocs/haicore/hardware/) cluster, with the following hardware:\n", + "\n", + "- 2× Intel Xeon Platinum 8368, 2 × 38 cores\n", + "- 4x NVIDIA A100-40\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "### Setup environment" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The first step is to load (and unload) the right modules on HAICORE+Jupyter. \n", + "\n", + "On the left bar on Jupyter Lab, open the modules tab, and make to unload any ```jupyter``` modules, and the load ```mpi/openmpi/4.1``` and ```devel/cuda/12.4```.\n", + "\n", + "Afterwards, run the cell below." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\n", + "Currently Loaded Modules:\n", + "i/4.1dot 3) numlib/mkl/2022.0.2 5) mpi/openmp\n", + " 2) compiler/intel/2023.1.0 4) devel/cuda/12.4 (E)\n", + "\n", + " Where:\n", + " E: Experimental\n", + "\n", + " \n", + "\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: heat in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (1.5.1)\n", + "Requirement already satisfied: mpi4py>=3.0.0 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from heat) (4.0.3)\n", + "Requirement already satisfied: numpy<2,>=1.22.0 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from heat) (1.26.4)\n", + "Requirement already satisfied: torch<2.6.1,>=2.0.0 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from heat) (2.6.0)\n", + "Requirement already satisfied: scipy>=1.10.0 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from heat) (1.15.3)\n", + "Requirement already satisfied: pillow>=6.0.0 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from heat) (11.2.1)\n", + "Requirement already satisfied: torchvision<0.21.1,>=0.15.2 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from heat) (0.21.0)\n", + "Requirement already satisfied: filelock in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (3.18.0)\n", + "Requirement already satisfied: typing-extensions>=4.10.0 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (4.13.2)\n", + "Requirement already satisfied: networkx in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (3.4.2)\n", + "Requirement already satisfied: jinja2 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (3.1.6)\n", + "Requirement already satisfied: fsspec in 
/home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (2025.3.2)\n", + "Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.4.127 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (12.4.127)\n", + "Requirement already satisfied: nvidia-cuda-runtime-cu12==12.4.127 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (12.4.127)\n", + "Requirement already satisfied: nvidia-cuda-cupti-cu12==12.4.127 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (12.4.127)\n", + "Requirement already satisfied: nvidia-cudnn-cu12==9.1.0.70 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (9.1.0.70)\n", + "Requirement already satisfied: nvidia-cublas-cu12==12.4.5.8 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (12.4.5.8)\n", + "Requirement already satisfied: nvidia-cufft-cu12==11.2.1.3 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (11.2.1.3)\n", + "Requirement already satisfied: nvidia-curand-cu12==10.3.5.147 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (10.3.5.147)\n", + "Requirement already satisfied: nvidia-cusolver-cu12==11.6.1.9 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (11.6.1.9)\n", + "Requirement already satisfied: nvidia-cusparse-cu12==12.3.1.170 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (12.3.1.170)\n", + "Requirement already satisfied: nvidia-cusparselt-cu12==0.6.2 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (0.6.2)\n", + "Requirement already satisfied: nvidia-nccl-cu12==2.21.5 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (2.21.5)\n", + "Requirement already satisfied: nvidia-nvtx-cu12==12.4.127 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (12.4.127)\n", + "Requirement already satisfied: nvidia-nvjitlink-cu12==12.4.127 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (12.4.127)\n", + "Requirement already satisfied: triton==3.2.0 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (3.2.0)\n", + "Requirement already satisfied: sympy==1.13.1 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from torch<2.6.1,>=2.0.0->heat) (1.13.1)\n", + "Requirement already satisfied: mpmath<1.4,>=1.1.0 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from sympy==1.13.1->torch<2.6.1,>=2.0.0->heat) (1.3.0)\n", + "Requirement already satisfied: MarkupSafe>=2.0 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from jinja2->torch<2.6.1,>=2.0.0->heat) (3.0.2)\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "WARNING: There was an error checking the latest version of pip.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: ipyparallel in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (9.0.1)\n", + "Requirement already satisfied: decorator in 
/home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipyparallel) (5.2.1)\n", + "Requirement already satisfied: ipykernel>=6.9.1 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipyparallel) (6.29.5)\n", + "Requirement already satisfied: ipython>=5 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipyparallel) (9.2.0)\n", + "Requirement already satisfied: jupyter-client>=7 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipyparallel) (8.6.3)\n", + "Requirement already satisfied: psutil in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipyparallel) (7.0.0)\n", + "Requirement already satisfied: python-dateutil>=2.1 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipyparallel) (2.9.0.post0)\n", + "Requirement already satisfied: pyzmq>=25 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipyparallel) (26.4.0)\n", + "Requirement already satisfied: tornado>=6.1 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipyparallel) (6.4.2)\n", + "Requirement already satisfied: tqdm in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipyparallel) (4.67.1)\n", + "Requirement already satisfied: traitlets>=5 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipyparallel) (5.14.3)\n", + "Requirement already satisfied: comm>=0.1.1 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipykernel>=6.9.1->ipyparallel) (0.2.2)\n", + "Requirement already satisfied: debugpy>=1.6.5 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipykernel>=6.9.1->ipyparallel) (1.8.14)\n", + "Requirement already satisfied: jupyter-core!=5.0.*,>=4.12 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipykernel>=6.9.1->ipyparallel) (5.7.2)\n", + "Requirement already satisfied: matplotlib-inline>=0.1 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipykernel>=6.9.1->ipyparallel) (0.1.7)\n", + "Requirement already satisfied: nest-asyncio in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipykernel>=6.9.1->ipyparallel) (1.6.0)\n", + "Requirement already satisfied: packaging in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipykernel>=6.9.1->ipyparallel) (25.0)\n", + "Requirement already satisfied: ipython-pygments-lexers in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipython>=5->ipyparallel) (1.1.1)\n", + "Requirement already satisfied: jedi>=0.16 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipython>=5->ipyparallel) (0.19.2)\n", + "Requirement already satisfied: pexpect>4.3 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipython>=5->ipyparallel) (4.9.0)\n", + "Requirement already satisfied: prompt_toolkit<3.1.0,>=3.0.41 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipython>=5->ipyparallel) (3.0.51)\n", + "Requirement already satisfied: pygments>=2.4.0 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipython>=5->ipyparallel) (2.19.1)\n", + "Requirement already satisfied: stack_data in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from ipython>=5->ipyparallel) (0.6.3)\n", + "Requirement already satisfied: typing_extensions>=4.6 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from 
ipython>=5->ipyparallel) (4.13.2)\n", + "Requirement already satisfied: six>=1.5 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from python-dateutil>=2.1->ipyparallel) (1.17.0)\n", + "Requirement already satisfied: parso<0.9.0,>=0.8.4 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from jedi>=0.16->ipython>=5->ipyparallel) (0.8.4)\n", + "Requirement already satisfied: platformdirs>=2.5 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from jupyter-core!=5.0.*,>=4.12->ipykernel>=6.9.1->ipyparallel) (4.3.8)\n", + "Requirement already satisfied: ptyprocess>=0.5 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from pexpect>4.3->ipython>=5->ipyparallel) (0.7.0)\n", + "Requirement already satisfied: wcwidth in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from prompt_toolkit<3.1.0,>=3.0.41->ipython>=5->ipyparallel) (0.2.13)\n", + "Requirement already satisfied: executing>=1.2.0 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from stack_data->ipython>=5->ipyparallel) (2.2.0)\n", + "Requirement already satisfied: asttokens>=2.1.0 in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from stack_data->ipython>=5->ipyparallel) (3.0.0)\n", + "Requirement already satisfied: pure-eval in /home/scc/io3047/venvs/heat_nb_env/lib64/python3.11/site-packages (from stack_data->ipython>=5->ipyparallel) (0.2.3)\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "WARNING: There was an error checking the latest version of pip.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Installed kernelspec myEnv in /hkfs/home/haicore/scc/io3047/.local/share/jupyter/kernels/myenv\n" + ] + } + ], + "source": [ + "%%bash\n", + "# Report modules\n", + "ml list\n", + "\n", + "# Create a virtual environment\n", + "python3.11 -m venv heat-env\n", + "source heat-env/bin/activate\n", + "pip install heat[hdf5] ipyparallel xarray matplotlib scikit-learn perun[nvidia]\n", + "\n", + "python -m ipykernel install \\\n", + " --user \\\n", + " --name heat-env \\\n", + " --display-name \"heat-env\"\n", + "deactivate" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "To be able to run this tutorial interactively for parallel computing, we need to start an [IPython cluster](https://ipyparallel.readthedocs.io/en/latest/tutorial/process.html).\n", + "\n", + "\n", + "In the terminal, type:\n", + "\n", + "```bash\n", + "ipcluster start -n 4 --engines=MPI --MPILauncher.mpi_args=\"--oversubscribe\"\n", + "```\n", + "On your terminal, you should see something like this:\n", + "\n", + "```bash\n", + "2024-03-04 16:30:24.740 [IPController] Registering 4 new hearts\n", + "2024-03-04 16:30:24.740 [IPController] registration::finished registering engine 0:63ac2343-f1deab70b14c0e14ca4c1630 in 5672ms\n", + "2024-03-04 16:30:24.740 [IPController] engine::Engine Connected: 0\n", + "2024-03-04 16:30:24.744 [IPController] registration::finished registering engine 3:673ce83c-eb7ccae6c69c52382c8349c1 in 5397ms\n", + "2024-03-04 16:30:24.744 [IPController] engine::Engine Connected: 3\n", + "2024-03-04 16:30:24.745 [IPController] registration::finished registering engine 1:d7936040-5ab6c117b845850a3103b2e8 in 5627ms\n", + "2024-03-04 16:30:24.745 [IPController] engine::Engine Connected: 1\n", + "2024-03-04 16:30:24.745 [IPController] registration::finished registering 
engine 2:ca57a419-2f2c89914a6c17865103c3e7 in 5508ms\n", + "2024-03-04 16:30:24.745 [IPController] engine::Engine Connected: 2\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "Note:\n", + "You must now reload the kernel to be able to access the IPython cluster.\n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To be able to start working with Heat on an HPC cluster, we first need to check the health of the available processes. We will use `ipyparallel` for this. For a great intro on `ipyparallel` usage on our supercomputers, check out Jan Meinke's tutorial [\"Interactive Parallel Computing with IPython Parallel\"](https://gitlab.jsc.fz-juelich.de/sdlbio-courses/hpc-python/-/blob/master/06_LocalParallel.ipynb) or the [ipyparallel docs](https://ipyparallel.readthedocs.io/en/latest/)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from ipyparallel import Client\n", + "rc = Client(profile=\"default\")\n", + "rc.wait_for_engines(4)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Earlier, we have started an IPython cluster with 4 processes. We can now check if the processes are available." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[0, 1, 2, 3]" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "rc.ids" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `px` magic command allows you to execute Python commands or a Jupyter cell on the ipyparallel engines interactively ([%%px documentation](https://ipyparallel.readthedocs.io/en/latest/tutorial/magics.html))." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can now finally import `heat` on our 4-process cluster." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "%px: 100%|██████████| 4/4 [00:01<00:00, 2.77tasks/s]\n" + ] + } + ], + "source": [ + "%%px\n", + "import heat as ht" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%px\n", + "ht.use_device(\"gpu\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:3]: \u001b[0m\n", + "tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n", + " [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]], device='cuda:1')" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 3, + "engine_uuid": "f2f42775-a79bbfdf74b1451745b1b33b", + "error": null, + "execute_input": "x = ht.ones((10,10), split=0)\nx.larray\n", + "execute_result": { + "data": { + "text/plain": "tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]], device='cuda:1')" + }, + "execution_count": 3, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-13T13:57:17.136864Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:3]: \u001b[0m\n", + "tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n", + " [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n", + " [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]], device='cuda:0')" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "41c89ea2-836a289f0df22369ee3a4a41", + "error": null, + "execute_input": "x = ht.ones((10,10), 
split=0)\nx.larray\n", + "execute_result": { + "data": { + "text/plain": "tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]], device='cuda:0')" + }, + "execution_count": 3, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-13T13:57:17.136703Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[2:3]: \u001b[0m\n", + "tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n", + " [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]], device='cuda:0')" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 2, + "engine_uuid": "9a961c9d-e3973d86ed7923c48e730123", + "error": null, + "execute_input": "x = ht.ones((10,10), split=0)\nx.larray\n", + "execute_result": { + "data": { + "text/plain": "tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]], device='cuda:0')" + }, + "execution_count": 3, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-13T13:57:17.136816Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:3]: \u001b[0m\n", + "tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n", + " [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n", + " [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]], device='cuda:1')" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 1, + "engine_uuid": "db282267-fc4496217b3a865d3c3b5ae8", + "error": null, + "execute_input": "x = ht.ones((10,10), split=0)\nx.larray\n", + "execute_result": { + "data": { + "text/plain": "tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]], device='cuda:1')" + }, + "execution_count": 3, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-13T13:57:17.136769Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "x = ht.ones((10,10), split=0)\n", + "x.larray\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:0] True\n", + "2\n", + "0,1\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:2] True\n", + "2\n", + "0,1\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:3] True\n", + "2\n", + "0,1\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:1] True\n", + "2\n", + "0,1\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "import torch\n", + "import os\n", + "\n", + "print(torch.cuda.is_available())\n", + "print(torch.cuda.device_count())\n", + "print(os.environ[\"CUDA_VISIBLE_DEVICES\"])" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python (heat_nb_env)", + "language": "python", + "name": "myenv" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": 
"text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.2" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/tutorials/hpc/1_intro.ipynb b/doc/source/tutorials/notebooks/0_setup/0_setup_jsc.ipynb similarity index 78% rename from tutorials/hpc/1_intro.ipynb rename to doc/source/tutorials/notebooks/0_setup/0_setup_jsc.ipynb index 34ad179a7c..ee00ae6115 100644 --- a/tutorials/hpc/1_intro.ipynb +++ b/doc/source/tutorials/notebooks/0_setup/0_setup_jsc.ipynb @@ -4,8 +4,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Heat Tutorial\n", - "---" + "# Setting up a parallel notebook with SLURM, ipyparallel at JSC" ] }, { @@ -23,7 +22,7 @@ "source": [ "\n", "
\n", - " \n", + " \n", "
\n", "\n", "## Introduction\n", @@ -80,13 +79,6 @@ "Navigate to `$HOME`, then `tutorials/hpc`. Open `1_intro.ipynb`." ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, { "cell_type": "markdown", "metadata": {}, @@ -197,9 +189,9 @@ "metadata": {}, "source": [ "
\n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", "
" ] }, @@ -231,7 +223,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "metadata": { "tags": [] }, @@ -242,7 +234,7 @@ "[0, 1, 2, 3]" ] }, - "execution_count": 3, + "execution_count": null, "metadata": {}, "output_type": "execute_result" } @@ -267,59 +259,11 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "metadata": { "tags": [] }, - "outputs": [ - { - "data": { - "text/plain": [ - "[stderr:3] /p/scratch/training2404/jupyter/kernels/heat1.3.1/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", - " from .autonotebook import tqdm as notebook_tqdm\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "[stderr:2] /p/scratch/training2404/jupyter/kernels/heat1.3.1/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", - " from .autonotebook import tqdm as notebook_tqdm\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "[stderr:0] /p/scratch/training2404/jupyter/kernels/heat1.3.1/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", - " from .autonotebook import tqdm as notebook_tqdm\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "[stderr:1] /p/scratch/training2404/jupyter/kernels/heat1.3.1/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", - " from .autonotebook import tqdm as notebook_tqdm\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "%px: 100%|██████████| 4/4 [00:07<00:00, 1.96s/tasks]\n" - ] - } - ], + "outputs": [], "source": [ "%px import heat as ht" ] @@ -327,9 +271,9 @@ ], "metadata": { "kernelspec": { - "display_name": "heat1.3.1", + "display_name": "heat-dev-311", "language": "python", - "name": "heat1.3.1" + "name": "python3" }, "language_info": { "codemirror_mode": { @@ -341,7 +285,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.4" + "version": "3.11.8" } }, "nbformat": 4, diff --git a/tutorials/local/1_intro.ipynb b/doc/source/tutorials/notebooks/0_setup/0_setup_local.ipynb similarity index 79% rename from tutorials/local/1_intro.ipynb rename to doc/source/tutorials/notebooks/0_setup/0_setup_local.ipynb index 16da4e7563..8656c09896 100644 --- a/tutorials/local/1_intro.ipynb +++ b/doc/source/tutorials/notebooks/0_setup/0_setup_local.ipynb @@ -4,8 +4,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Heat Tutorial\n", - "---" + "# Running Heat in parallel on a Jupyter Notebook" ] }, { @@ -21,20 +20,12 @@ "source": [ "\n", "
\n", - " \n", + " \n", "
\n", "\n", - "## Introduction\n", - "---\n", + "## Installation\n", "\n", - "\n", - "This tutorial is designed to run on your local machine. Generally, to run Heat you need an [MPI](https://hpc-tutorials.llnl.gov/mpi/) installation, and a Python environment with PyTorch and mpi4py. The easiest way to set up such an environment is to install `heat` via conda: \n", - "\n", - "```shell\n", - "conda create --name heat-env python=3.11\n", - "conda activate heat-env\n", - "conda install -c conda-forge heat\n", - "```\n" + "Run the scripts to install our notebook setup, either with ```pip``` or with anaconda. They can be found ```doc/source/tutorials/notebooks/0_setup```." ] }, { @@ -43,9 +34,15 @@ "source": [ "## Setting up the IPyParallel environment\n", "\n", - "In this tutorial, we want to demonstrate how Heat distributes arrays and operations across multiple MPI processes. We can do this interactively in a Jupyter Notebook using IPyParallel. In your virtual environment, install the following packages:\n", + "In this tutorial, we want to demonstrate how Heat distributes arrays and operations across multiple MPI processes. We can do this interactively in a Jupyter Notebook using IPyParallel.\n", + "\n", + "Now you can select the `heat-env` kernel when creating a new notebook.\n", + "\n", + "Finally, you need to start an IPyParallel cluster to access multiple MPI processes. You can do this by running the following command in a terminal inside the jupyter:\n", "\n", "```bash\n", +<<<<<<< HEAD:doc/source/tutorials/notebooks/0_setup/0_setup_local.ipynb +======= "conda install ipyparallel jupyter\n", "```" ] @@ -76,6 +73,7 @@ "Finally, you need to start an IPyParallel cluster to access multiple MPI processes. You can do this by running the following command in a terminal:\n", "\n", "```bash\n", +>>>>>>> stable:tutorials/local/1_intro.ipynb "ipcluster start -n 4 --engines=mpi\n", "```\n", "\n", @@ -93,7 +91,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "metadata": { "tags": [] }, @@ -112,7 +110,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "metadata": { "tags": [] }, @@ -123,7 +121,7 @@ "[0, 1, 2, 3]" ] }, - "execution_count": 3, + "execution_count": null, "metadata": {}, "output_type": "execute_result" } @@ -148,7 +146,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "metadata": { "tags": [] }, @@ -156,7 +154,7 @@ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": "01f69af457ad4f9e818c5571fa4d17b3", + "model_id": "d51a2e5dfcad4264a317401c208b0f6d", "version_major": 2, "version_minor": 0 }, @@ -173,18 +171,18 @@ ] }, { - "cell_type": "code", - "execution_count": null, + "cell_type": "markdown", "metadata": {}, - "outputs": [], - "source": [] + "source": [ + "The server can be ```ipcluster``` server can be stopped by stopping the command with CTRL+C." + ] } ], "metadata": { "kernelspec": { - "display_name": "heat_env", + "display_name": "heat-dev-311", "language": "python", - "name": "heat_env" + "name": "python3" }, "language_info": { "codemirror_mode": { diff --git a/doc/source/tutorials/notebooks/0_setup/0_setup_pip.sh b/doc/source/tutorials/notebooks/0_setup/0_setup_pip.sh new file mode 100755 index 0000000000..f2e21c518a --- /dev/null +++ b/doc/source/tutorials/notebooks/0_setup/0_setup_pip.sh @@ -0,0 +1,25 @@ +#!/bin/sh + +# 1. If necessary, install openmpi +# Heat can also be installed with pip, but ```openmpi``` has to be available on the system. 
+# To install ```openmpi``` on Linux/macOS:
+
+# Ubuntu
+# sudo apt install openmpi-bin libopenmpi-dev
+
+# Arch
+# sudo pacman -S openmpi
+
+# MacOS
+# brew install openmpi
+
+# 2. Create environment and install dependencies
+python -m venv heat-env
+# shellcheck disable=SC1091
+. heat-env/bin/activate || exit 1
+pip install heat xarray jupyter scikit-learn ipyparallel
+
+# 3. Setup jupyter kernel
+python -m ipykernel install --user --name=heat-env
+
+# 4. Start jupyter
+jupyter notebook
diff --git a/doc/source/tutorials/notebooks/1_basics.ipynb b/doc/source/tutorials/notebooks/1_basics.ipynb
new file mode 100644
index 0000000000..73d3c48b84
--- /dev/null
+++ b/doc/source/tutorials/notebooks/1_basics.ipynb
@@ -0,0 +1,3165 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Heat Basics"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "## What is Heat for?\n",
+ "\n",
+ "\n",
+ "\n",
+ "Straight from our [GitHub repository](https://github.com/helmholtz-analytics/heat):\n",
+ "\n",
+ "Heat builds on [PyTorch](https://pytorch.org/) and [mpi4py](https://mpi4py.readthedocs.io) to provide high-performance computing infrastructure for memory-intensive applications within the NumPy/SciPy ecosystem.\n",
+ "\n",
+ "\n",
+ "With Heat you can:\n",
+ "- port existing NumPy/SciPy code from single-CPU to multi-node clusters with minimal coding effort;\n",
+ "- exploit the entire, cumulative RAM of your many nodes for memory-intensive operations and algorithms;\n",
+ "- run your NumPy/SciPy code on GPUs (CUDA, ROCm, limited support for Apple MPS).\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Why?\n",
+ "\n",
+ "- significant **scalability** with respect to task-parallel frameworks;\n",
+ "- analysis of massive datasets without breaking them up into artificially independent chunks;\n",
+ "- ease of use: script and test on your laptop, port straight to an HPC cluster; \n",
+ "- PyTorch-based: GPU support beyond the CUDA ecosystem."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "
\n", + " \n", + " \n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Connecting to ipyparallel cluster\n", + "\n", + "We have started an `ipcluster` with 4 engines at the end of the [Setup notebook](0_setup/0_setup_local.ipynb).\n", + "\n", + "Let's start the interactive session with a look into the `heat` data object. But first, we need to import the `ipyparallel` client." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "4 engines found\n" + ] + } + ], + "source": [ + "from ipyparallel import Client\n", + "rc = Client(profile=\"default\")\n", + "rc.ids\n", + "\n", + "if len(rc.ids) == 0:\n", + " print(\"No engines found\")\n", + "else:\n", + " print(f\"{len(rc.ids)} engines found\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We will always start `heat` cells with the `%%px` magic command to execute the cell on all engines. However, the first section of this tutorial doesn't deal with distributed arrays. In these cases, we will use the `%%px --target 0` magic command to execute the cell only on the first engine.\n", + "\n", + "---" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## DNDarrays\n", + "\n", + "\n", + "Similar to a NumPy `ndarray`, a Heat `dndarray` (we'll get to the `d` later) is a grid of values of a single (one particular) type. The number of dimensions is the number of axes of the array, while the shape of an array is a tuple of integers giving the number of elements of the array along each dimension. \n", + "\n", + "Heat emulates NumPy's API as closely as possible, allowing for the use of well-known **array creation functions**." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "a9e49353486a4ec5b84c5718033ed9d2", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "%px: 0%| | 0/4 [00:00 'DNDarray'\n", + " Sum of array elements over a given axis. An array with the same shape as ``self.__array`` except\n", + " for the specified axis which becomes one, e.g.\n", + " ``a.shape=(1, 2, 3)`` => ``ht.ones((1, 2, 3)).sum(axis=1).shape=(1, 1, 3)``\n", + " \n", + " Parameters\n", + " ----------\n", + " a : DNDarray\n", + " Input array.\n", + " axis : None or int or Tuple[int,...], optional\n", + " Axis along which a sum is performed. The default, ``axis=None``, will sum all of the\n", + " elements of the input array. If ``axis`` is negative it counts from the last to the first\n", + " axis. If ``axis`` is a tuple of ints, a sum is performed on all of the axes specified in the\n", + " tuple instead of a single axis or all the axes as before.\n", + " out : DNDarray, optional\n", + " Alternative output array in which to place the result. It must have the same shape as the\n", + " expected output, but the datatype of the output values will be cast if necessary.\n", + " keepdims : bool, optional\n", + " If this is set to ``True``, the axes which are reduced are left in the result as dimensions\n", + " with size one. 
With this option, the result will broadcast correctly against the input\n",
+ " array.\n",
+ " \n",
+ " Examples\n",
+ " --------\n",
+ " >>> ht.sum(ht.ones(2))\n",
+ " DNDarray(2., dtype=ht.float32, device=cpu:0, split=None)\n",
+ " >>> ht.sum(ht.ones((3,3)))\n",
+ " DNDarray(9., dtype=ht.float32, device=cpu:0, split=None)\n",
+ " >>> ht.sum(ht.ones((3,3)).astype(ht.int))\n",
+ " DNDarray(9, dtype=ht.int64, device=cpu:0, split=None)\n",
+ " >>> ht.sum(ht.ones((3,2,1)), axis=-3)\n",
+ " DNDarray([[3.],\n",
+ " [3.]], dtype=ht.float32, device=cpu:0, split=None)\n",
+ "\n"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "%%px --target 0\n",
+ "help(ht.sum)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Parallel Processing\n",
+ "\n",
+ "Heat's actual power lies in its ability to exploit the processing performance of modern accelerator hardware (GPUs) as well as distributed (high-performance) cluster systems. All operations executed on CPUs are, to a large extent, vectorized (AVX) and thread-parallelized (OpenMP). Heat builds on PyTorch, so it supports GPU acceleration on Nvidia and AMD GPUs. \n",
+ "\n",
+ "For distributed computations, your system or laptop needs to have Message Passing Interface (MPI) installed. For GPU computations, your system needs to have one or more suitable GPUs and an (MPI-aware) CUDA/ROCm ecosystem.\n",
+ "\n",
+ "**NOTE:** The GPU examples below will only properly execute on a computer with a GPU. Make sure to either start the notebook on an appropriate machine or copy and paste the examples into a script and execute it on a suitable device."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### GPUs\n",
+ "\n",
+ "Heat's array creation functions all support an additional parameter that places the data on a specific device. By default, the CPU is selected, but it is also possible to directly allocate the data on a GPU."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "
\n", + "The following cells will only work if you have a GPU available.\n", + "\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "[0:execute]\n", + "\u001b[31m---------------------------------------------------------------------------\u001b[39m\n", + "\u001b[31mKeyError\u001b[39m Traceback (most recent call last)\n", + "\u001b[36mFile \u001b[39m\u001b[32m~/code/heat/heat/core/devices.py:190\u001b[39m, in \u001b[36msanitize_device\u001b[39m\u001b[34m(device)\u001b[39m\n", + "\u001b[32m 189\u001b[39m \u001b[38;5;28;01mtry\u001b[39;00m:\n", + "\u001b[32m--> \u001b[39m\u001b[32m190\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43m__device_mapping\u001b[49m\u001b[43m[\u001b[49m\u001b[43mdevice\u001b[49m\u001b[43m.\u001b[49m\u001b[43mstrip\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m.\u001b[49m\u001b[43mlower\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m]\u001b[49m\n", + "\u001b[32m 191\u001b[39m \u001b[38;5;28;01mexcept\u001b[39;00m (\u001b[38;5;167;01mAttributeError\u001b[39;00m, \u001b[38;5;167;01mKeyError\u001b[39;00m, \u001b[38;5;167;01mTypeError\u001b[39;00m):\n", + "\n", + "\u001b[31mKeyError\u001b[39m: 'gpu'\n", + "\n", + "During handling of the above exception, another exception occurred:\n", + "\n", + "\u001b[31mValueError\u001b[39m Traceback (most recent call last)\n", + "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[20]\u001b[39m\u001b[32m, line 1\u001b[39m\n", + "\u001b[32m----> \u001b[39m\u001b[32m1\u001b[39m \u001b[43mht\u001b[49m\u001b[43m.\u001b[49m\u001b[43mzeros\u001b[49m\u001b[43m(\u001b[49m\u001b[43m(\u001b[49m\u001b[32;43m3\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[32;43m4\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdevice\u001b[49m\u001b[43m=\u001b[49m\u001b[33;43m'\u001b[39;49m\u001b[33;43mgpu\u001b[39;49m\u001b[33;43m'\u001b[39;49m\u001b[43m)\u001b[49m\n", + "\n", + "\u001b[36mFile \u001b[39m\u001b[32m~/code/heat/heat/core/factories.py:1489\u001b[39m, in \u001b[36mzeros\u001b[39m\u001b[34m(shape, dtype, split, device, comm, order)\u001b[39m\n", + "\u001b[32m 1451\u001b[39m \u001b[38;5;250m\u001b[39m\u001b[33;03m\"\"\"\u001b[39;00m\n", + "\u001b[32m 1452\u001b[39m \u001b[33;03mReturns a new :class:`~heat.core.dndarray.DNDarray` of given shape and data type filled with zero values.\u001b[39;00m\n", + "\u001b[32m 1453\u001b[39m \u001b[33;03mMay be allocated split up across multiple nodes along the specified axis.\u001b[39;00m\n", + "\u001b[32m (...)\u001b[39m\u001b[32m 1486\u001b[39m \u001b[33;03m [0., 0., 0.]], dtype=ht.float32, device=cpu:0, split=None)\u001b[39;00m\n", + "\u001b[32m 1487\u001b[39m \u001b[33;03m\"\"\"\u001b[39;00m\n", + "\u001b[32m 1488\u001b[39m \u001b[38;5;66;03m# TODO: implement 'K' option when torch.clone() fix to preserve memory layout is released.\u001b[39;00m\n", + "\u001b[32m-> \u001b[39m\u001b[32m1489\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43m__factory\u001b[49m\u001b[43m(\u001b[49m\u001b[43mshape\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdtype\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43msplit\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mtorch\u001b[49m\u001b[43m.\u001b[49m\u001b[43mzeros\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdevice\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mcomm\u001b[49m\u001b[43m,\u001b[49m\u001b[43m 
\u001b[49m\u001b[43morder\u001b[49m\u001b[43m=\u001b[49m\u001b[43morder\u001b[49m\u001b[43m)\u001b[49m\n", + "\n", + "\u001b[36mFile \u001b[39m\u001b[32m~/code/heat/heat/core/factories.py:763\u001b[39m, in \u001b[36m__factory\u001b[39m\u001b[34m(shape, dtype, split, local_factory, device, comm, order)\u001b[39m\n", + "\u001b[32m 761\u001b[39m dtype = types.canonical_heat_type(dtype)\n", + "\u001b[32m 762\u001b[39m split = sanitize_axis(shape, split)\n", + "\u001b[32m--> \u001b[39m\u001b[32m763\u001b[39m device = \u001b[43mdevices\u001b[49m\u001b[43m.\u001b[49m\u001b[43msanitize_device\u001b[49m\u001b[43m(\u001b[49m\u001b[43mdevice\u001b[49m\u001b[43m)\u001b[49m\n", + "\u001b[32m 764\u001b[39m comm = sanitize_comm(comm)\n", + "\u001b[32m 766\u001b[39m \u001b[38;5;66;03m# chunk the shape if necessary\u001b[39;00m\n", + "\n", + "\u001b[36mFile \u001b[39m\u001b[32m~/code/heat/heat/core/devices.py:192\u001b[39m, in \u001b[36msanitize_device\u001b[39m\u001b[34m(device)\u001b[39m\n", + "\u001b[32m 190\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m __device_mapping[device.strip().lower()]\n", + "\u001b[32m 191\u001b[39m \u001b[38;5;28;01mexcept\u001b[39;00m (\u001b[38;5;167;01mAttributeError\u001b[39;00m, \u001b[38;5;167;01mKeyError\u001b[39;00m, \u001b[38;5;167;01mTypeError\u001b[39;00m):\n", + "\u001b[32m--> \u001b[39m\u001b[32m192\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\u001b[33mf\u001b[39m\u001b[33m'\u001b[39m\u001b[33mUnknown device, must be one of \u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[33m\"\u001b[39m\u001b[33m, \u001b[39m\u001b[33m\"\u001b[39m.join(__device_mapping.keys())\u001b[38;5;132;01m}\u001b[39;00m\u001b[33m'\u001b[39m)\n", + "\n", + "\u001b[31mValueError\u001b[39m: Unknown device, must be one of cpu\n" + ] + }, + { + "ename": "RemoteError", + "evalue": "[0:execute] ValueError: Unknown device, must be one of cpu", + "output_type": "error", + "traceback": [ + "[0:execute]", + "\u001b[31m---------------------------------------------------------------------------\u001b[39m", + "\u001b[31mKeyError\u001b[39m Traceback (most recent call last)", + "\u001b[36mFile \u001b[39m\u001b[32m~/code/heat/heat/core/devices.py:190\u001b[39m, in \u001b[36msanitize_device\u001b[39m\u001b[34m(device)\u001b[39m", + "\u001b[32m 189\u001b[39m \u001b[38;5;28;01mtry\u001b[39;00m:", + "\u001b[32m--> \u001b[39m\u001b[32m190\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43m__device_mapping\u001b[49m\u001b[43m[\u001b[49m\u001b[43mdevice\u001b[49m\u001b[43m.\u001b[49m\u001b[43mstrip\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m.\u001b[49m\u001b[43mlower\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m]\u001b[49m", + "\u001b[32m 191\u001b[39m \u001b[38;5;28;01mexcept\u001b[39;00m (\u001b[38;5;167;01mAttributeError\u001b[39;00m, \u001b[38;5;167;01mKeyError\u001b[39;00m, \u001b[38;5;167;01mTypeError\u001b[39;00m):", + "", + "\u001b[31mKeyError\u001b[39m: 'gpu'", + "", + "During handling of the above exception, another exception occurred:", + "", + "\u001b[31mValueError\u001b[39m Traceback (most recent call last)", + "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[20]\u001b[39m\u001b[32m, line 1\u001b[39m", + "\u001b[32m----> \u001b[39m\u001b[32m1\u001b[39m \u001b[43mht\u001b[49m\u001b[43m.\u001b[49m\u001b[43mzeros\u001b[49m\u001b[43m(\u001b[49m\u001b[43m(\u001b[49m\u001b[32;43m3\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m 
\u001b[49m\u001b[32;43m4\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdevice\u001b[49m\u001b[43m=\u001b[49m\u001b[33;43m'\u001b[39;49m\u001b[33;43mgpu\u001b[39;49m\u001b[33;43m'\u001b[39;49m\u001b[43m)\u001b[49m", + "", + "\u001b[36mFile \u001b[39m\u001b[32m~/code/heat/heat/core/factories.py:1489\u001b[39m, in \u001b[36mzeros\u001b[39m\u001b[34m(shape, dtype, split, device, comm, order)\u001b[39m", + "\u001b[32m 1451\u001b[39m \u001b[38;5;250m\u001b[39m\u001b[33;03m\"\"\"\u001b[39;00m", + "\u001b[32m 1452\u001b[39m \u001b[33;03mReturns a new :class:`~heat.core.dndarray.DNDarray` of given shape and data type filled with zero values.\u001b[39;00m", + "\u001b[32m 1453\u001b[39m \u001b[33;03mMay be allocated split up across multiple nodes along the specified axis.\u001b[39;00m", + "\u001b[32m (...)\u001b[39m\u001b[32m 1486\u001b[39m \u001b[33;03m [0., 0., 0.]], dtype=ht.float32, device=cpu:0, split=None)\u001b[39;00m", + "\u001b[32m 1487\u001b[39m \u001b[33;03m\"\"\"\u001b[39;00m", + "\u001b[32m 1488\u001b[39m \u001b[38;5;66;03m# TODO: implement 'K' option when torch.clone() fix to preserve memory layout is released.\u001b[39;00m", + "\u001b[32m-> \u001b[39m\u001b[32m1489\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43m__factory\u001b[49m\u001b[43m(\u001b[49m\u001b[43mshape\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdtype\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43msplit\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mtorch\u001b[49m\u001b[43m.\u001b[49m\u001b[43mzeros\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdevice\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mcomm\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43morder\u001b[49m\u001b[43m=\u001b[49m\u001b[43morder\u001b[49m\u001b[43m)\u001b[49m", + "", + "\u001b[36mFile \u001b[39m\u001b[32m~/code/heat/heat/core/factories.py:763\u001b[39m, in \u001b[36m__factory\u001b[39m\u001b[34m(shape, dtype, split, local_factory, device, comm, order)\u001b[39m", + "\u001b[32m 761\u001b[39m dtype = types.canonical_heat_type(dtype)", + "\u001b[32m 762\u001b[39m split = sanitize_axis(shape, split)", + "\u001b[32m--> \u001b[39m\u001b[32m763\u001b[39m device = \u001b[43mdevices\u001b[49m\u001b[43m.\u001b[49m\u001b[43msanitize_device\u001b[49m\u001b[43m(\u001b[49m\u001b[43mdevice\u001b[49m\u001b[43m)\u001b[49m", + "\u001b[32m 764\u001b[39m comm = sanitize_comm(comm)", + "\u001b[32m 766\u001b[39m \u001b[38;5;66;03m# chunk the shape if necessary\u001b[39;00m", + "", + "\u001b[36mFile \u001b[39m\u001b[32m~/code/heat/heat/core/devices.py:192\u001b[39m, in \u001b[36msanitize_device\u001b[39m\u001b[34m(device)\u001b[39m", + "\u001b[32m 190\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m __device_mapping[device.strip().lower()]", + "\u001b[32m 191\u001b[39m \u001b[38;5;28;01mexcept\u001b[39;00m (\u001b[38;5;167;01mAttributeError\u001b[39;00m, \u001b[38;5;167;01mKeyError\u001b[39;00m, \u001b[38;5;167;01mTypeError\u001b[39;00m):", + "\u001b[32m--> \u001b[39m\u001b[32m192\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\u001b[33mf\u001b[39m\u001b[33m'\u001b[39m\u001b[33mUnknown device, must be one of \u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[33m\"\u001b[39m\u001b[33m, \u001b[39m\u001b[33m\"\u001b[39m.join(__device_mapping.keys())\u001b[38;5;132;01m}\u001b[39;00m\u001b[33m'\u001b[39m)", + "", + "\u001b[31mValueError\u001b[39m: Unknown device, must be one of 
cpu" + ] + } + ], + "source": [ + "%%px --target 0\n", + "ht.zeros((3, 4,), device='gpu')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Arrays on the same device can be seamlessly used in any Heat operation." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:21]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "a = ht.zeros((3, 4,), device='gpu')\nb = ht.ones((3, 4,), device='gpu')\na + b\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 21, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:40.413421Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px --target 0\n", + "a = ht.zeros((3, 4,), device='gpu')\n", + "b = ht.ones((3, 4,), device='gpu')\n", + "a + b" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "However, performing operations on arrays with mismatching devices will purposefully result in an error (due to potentially large copy overhead)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "[0:execute]\n", + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n", + "\u001b[0;31mRuntimeError\u001b[0m Traceback (most recent call last)\n", + "Cell \u001b[0;32mIn[22], line 3\u001b[0m\n", + "\u001b[1;32m 1\u001b[0m a \u001b[38;5;241m=\u001b[39m ht\u001b[38;5;241m.\u001b[39mfull((\u001b[38;5;241m3\u001b[39m, \u001b[38;5;241m4\u001b[39m,), \u001b[38;5;241m4\u001b[39m, device\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mcpu\u001b[39m\u001b[38;5;124m'\u001b[39m)\n", + "\u001b[1;32m 2\u001b[0m b \u001b[38;5;241m=\u001b[39m ht\u001b[38;5;241m.\u001b[39mones((\u001b[38;5;241m3\u001b[39m, \u001b[38;5;241m4\u001b[39m,), device\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mgpu\u001b[39m\u001b[38;5;124m'\u001b[39m)\n", + "\u001b[0;32m----> 3\u001b[0m \u001b[43ma\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m+\u001b[39;49m\u001b[43m \u001b[49m\u001b[43mb\u001b[49m\n", + "\n", + "File \u001b[0;32m~/code/heat/heat/core/arithmetics.py:124\u001b[0m, in \u001b[0;36m_add\u001b[0;34m(self, other)\u001b[0m\n", + "\u001b[1;32m 122\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21m_add\u001b[39m(\u001b[38;5;28mself\u001b[39m, other):\n", + "\u001b[1;32m 123\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n", + "\u001b[0;32m--> 124\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43madd\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mother\u001b[49m\u001b[43m)\u001b[49m\n", + "\u001b[1;32m 125\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mTypeError\u001b[39;00m:\n", + "\u001b[1;32m 126\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mNotImplemented\u001b[39m\n", + "\n", + "File \u001b[0;32m~/code/heat/heat/core/arithmetics.py:119\u001b[0m, in \u001b[0;36madd\u001b[0;34m(t1, t2, out, where)\u001b[0m\n", + "\u001b[1;32m 74\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21madd\u001b[39m(\n", + 
"\u001b[1;32m 75\u001b[0m t1: Union[DNDarray, \u001b[38;5;28mfloat\u001b[39m],\n", + "\u001b[1;32m 76\u001b[0m t2: Union[DNDarray, \u001b[38;5;28mfloat\u001b[39m],\n", + "\u001b[0;32m (...)\u001b[0m\n", + "\u001b[1;32m 80\u001b[0m where: Union[\u001b[38;5;28mbool\u001b[39m, DNDarray] \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mTrue\u001b[39;00m,\n", + "\u001b[1;32m 81\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m DNDarray:\n", + "\u001b[1;32m 82\u001b[0m \u001b[38;5;250m \u001b[39m\u001b[38;5;124;03m\"\"\"\u001b[39;00m\n", + "\u001b[1;32m 83\u001b[0m \u001b[38;5;124;03m Element-wise addition of values from two operands, commutative.\u001b[39;00m\n", + "\u001b[1;32m 84\u001b[0m \u001b[38;5;124;03m Takes the first and second operand (scalar or :class:`~heat.core.dndarray.DNDarray`) whose\u001b[39;00m\n", + "\u001b[0;32m (...)\u001b[0m\n", + "\u001b[1;32m 117\u001b[0m \u001b[38;5;124;03m [5., 6.]], dtype=ht.float32, device=cpu:0, split=None)\u001b[39;00m\n", + "\u001b[1;32m 118\u001b[0m \u001b[38;5;124;03m \"\"\"\u001b[39;00m\n", + "\u001b[0;32m--> 119\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43m_operations\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m__binary_op\u001b[49m\u001b[43m(\u001b[49m\u001b[43mtorch\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43madd\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mt1\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mt2\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mout\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mwhere\u001b[49m\u001b[43m)\u001b[49m\n", + "\n", + "File \u001b[0;32m~/code/heat/heat/core/_operations.py:204\u001b[0m, in \u001b[0;36m__binary_op\u001b[0;34m(operation, t1, t2, out, where, fn_kwargs)\u001b[0m\n", + "\u001b[1;32m 201\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m t1\u001b[38;5;241m.\u001b[39mlarray\u001b[38;5;241m.\u001b[39mis_mps \u001b[38;5;129;01mand\u001b[39;00m promoted_type \u001b[38;5;241m==\u001b[39m torch\u001b[38;5;241m.\u001b[39mfloat64:\n", + "\u001b[1;32m 202\u001b[0m promoted_type \u001b[38;5;241m=\u001b[39m torch\u001b[38;5;241m.\u001b[39mfloat32\n", + "\u001b[0;32m--> 204\u001b[0m result \u001b[38;5;241m=\u001b[39m \u001b[43moperation\u001b[49m\u001b[43m(\u001b[49m\u001b[43mt1\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mlarray\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mto\u001b[49m\u001b[43m(\u001b[49m\u001b[43mpromoted_type\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mt2\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mlarray\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mto\u001b[49m\u001b[43m(\u001b[49m\u001b[43mpromoted_type\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mfn_kwargs\u001b[49m\u001b[43m)\u001b[49m\n", + "\u001b[1;32m 206\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m out \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;129;01mand\u001b[39;00m where \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mTrue\u001b[39;00m:\n", + "\u001b[1;32m 207\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m DNDarray(\n", + "\u001b[1;32m 208\u001b[0m result,\n", + "\u001b[1;32m 209\u001b[0m output_shape,\n", + "\u001b[0;32m (...)\u001b[0m\n", + "\u001b[1;32m 214\u001b[0m balanced\u001b[38;5;241m=\u001b[39moutput_balanced,\n", + "\u001b[1;32m 215\u001b[0m )\n", + "\n", + "\u001b[0;31mRuntimeError\u001b[0m: Expected all tensors 
to be on the same device, but found at least two devices, cuda:0 and cpu!\n" + ] + }, + { + "ename": "RemoteError", + "evalue": "[0:execute] RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!", + "output_type": "error", + "traceback": [ + "[0:execute]", + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mRuntimeError\u001b[0m Traceback (most recent call last)", + "Cell \u001b[0;32mIn[22], line 3\u001b[0m", + "\u001b[1;32m 1\u001b[0m a \u001b[38;5;241m=\u001b[39m ht\u001b[38;5;241m.\u001b[39mfull((\u001b[38;5;241m3\u001b[39m, \u001b[38;5;241m4\u001b[39m,), \u001b[38;5;241m4\u001b[39m, device\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mcpu\u001b[39m\u001b[38;5;124m'\u001b[39m)", + "\u001b[1;32m 2\u001b[0m b \u001b[38;5;241m=\u001b[39m ht\u001b[38;5;241m.\u001b[39mones((\u001b[38;5;241m3\u001b[39m, \u001b[38;5;241m4\u001b[39m,), device\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mgpu\u001b[39m\u001b[38;5;124m'\u001b[39m)", + "\u001b[0;32m----> 3\u001b[0m \u001b[43ma\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m+\u001b[39;49m\u001b[43m \u001b[49m\u001b[43mb\u001b[49m", + "", + "File \u001b[0;32m~/code/heat/heat/core/arithmetics.py:124\u001b[0m, in \u001b[0;36m_add\u001b[0;34m(self, other)\u001b[0m", + "\u001b[1;32m 122\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21m_add\u001b[39m(\u001b[38;5;28mself\u001b[39m, other):", + "\u001b[1;32m 123\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:", + "\u001b[0;32m--> 124\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43madd\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mother\u001b[49m\u001b[43m)\u001b[49m", + "\u001b[1;32m 125\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mTypeError\u001b[39;00m:", + "\u001b[1;32m 126\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mNotImplemented\u001b[39m", + "", + "File \u001b[0;32m~/code/heat/heat/core/arithmetics.py:119\u001b[0m, in \u001b[0;36madd\u001b[0;34m(t1, t2, out, where)\u001b[0m", + "\u001b[1;32m 74\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21madd\u001b[39m(", + "\u001b[1;32m 75\u001b[0m t1: Union[DNDarray, \u001b[38;5;28mfloat\u001b[39m],", + "\u001b[1;32m 76\u001b[0m t2: Union[DNDarray, \u001b[38;5;28mfloat\u001b[39m],", + "\u001b[0;32m (...)\u001b[0m", + "\u001b[1;32m 80\u001b[0m where: Union[\u001b[38;5;28mbool\u001b[39m, DNDarray] \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mTrue\u001b[39;00m,", + "\u001b[1;32m 81\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m DNDarray:", + "\u001b[1;32m 82\u001b[0m \u001b[38;5;250m \u001b[39m\u001b[38;5;124;03m\"\"\"\u001b[39;00m", + "\u001b[1;32m 83\u001b[0m \u001b[38;5;124;03m Element-wise addition of values from two operands, commutative.\u001b[39;00m", + "\u001b[1;32m 84\u001b[0m \u001b[38;5;124;03m Takes the first and second operand (scalar or :class:`~heat.core.dndarray.DNDarray`) whose\u001b[39;00m", + "\u001b[0;32m (...)\u001b[0m", + "\u001b[1;32m 117\u001b[0m \u001b[38;5;124;03m [5., 6.]], dtype=ht.float32, device=cpu:0, split=None)\u001b[39;00m", + "\u001b[1;32m 118\u001b[0m \u001b[38;5;124;03m \"\"\"\u001b[39;00m", + "\u001b[0;32m--> 119\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m 
\u001b[43m_operations\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m__binary_op\u001b[49m\u001b[43m(\u001b[49m\u001b[43mtorch\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43madd\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mt1\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mt2\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mout\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mwhere\u001b[49m\u001b[43m)\u001b[49m", + "", + "File \u001b[0;32m~/code/heat/heat/core/_operations.py:204\u001b[0m, in \u001b[0;36m__binary_op\u001b[0;34m(operation, t1, t2, out, where, fn_kwargs)\u001b[0m", + "\u001b[1;32m 201\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m t1\u001b[38;5;241m.\u001b[39mlarray\u001b[38;5;241m.\u001b[39mis_mps \u001b[38;5;129;01mand\u001b[39;00m promoted_type \u001b[38;5;241m==\u001b[39m torch\u001b[38;5;241m.\u001b[39mfloat64:", + "\u001b[1;32m 202\u001b[0m promoted_type \u001b[38;5;241m=\u001b[39m torch\u001b[38;5;241m.\u001b[39mfloat32", + "\u001b[0;32m--> 204\u001b[0m result \u001b[38;5;241m=\u001b[39m \u001b[43moperation\u001b[49m\u001b[43m(\u001b[49m\u001b[43mt1\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mlarray\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mto\u001b[49m\u001b[43m(\u001b[49m\u001b[43mpromoted_type\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mt2\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mlarray\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mto\u001b[49m\u001b[43m(\u001b[49m\u001b[43mpromoted_type\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mfn_kwargs\u001b[49m\u001b[43m)\u001b[49m", + "\u001b[1;32m 206\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m out \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;129;01mand\u001b[39;00m where \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mTrue\u001b[39;00m:", + "\u001b[1;32m 207\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m DNDarray(", + "\u001b[1;32m 208\u001b[0m result,", + "\u001b[1;32m 209\u001b[0m output_shape,", + "\u001b[0;32m (...)\u001b[0m", + "\u001b[1;32m 214\u001b[0m balanced\u001b[38;5;241m=\u001b[39moutput_balanced,", + "\u001b[1;32m 215\u001b[0m )", + "", + "\u001b[0;31mRuntimeError\u001b[0m: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!" + ] + } + ], + "source": [ + "%%px --target 0\n", + "a = ht.full((3, 4,), 4, device='cpu')\n", + "b = ht.ones((3, 4,), device='gpu')\n", + "a + b" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It is possible to explicitly move an array from one device to the other and back to avoid this error." 
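As a minimal sketch of such a round trip (assuming a CUDA-capable GPU is visible to the process; `cpu()` is demonstrated in the next cell, and the complementary `gpu()` method is assumed here to behave symmetrically):

```python
import heat as ht

a = ht.ones((3, 4), device='cpu')      # created in CPU memory
a = a.gpu()                            # copy the underlying torch tensor to the GPU
a = a.cpu()                            # ... and back to the CPU
a + ht.full((3, 4), 4, device='cpu')   # both operands now live on the same device
```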
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:23]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "a = ht.full((3, 4,), 4, device='gpu')\na.cpu()\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 23, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.011333Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px --target 0\n", + "a = ht.full((3, 4,), 4, device='gpu')\n", + "a.cpu()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We'll put our multi-GPU setup to the test in the next section." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Distributed Computing\n", + "\n", + "Heat is also able to make use of distributed processing capabilities such as those in high-performance cluster systems. For this, Heat exploits the fact that the operations performed on a multi-dimensional array are usually identical for all data items. Hence, a data-parallel processing strategy can be chosen, where the total number of data items is equally divided among all processing nodes. An operation is then performed individually on the local data chunks, and partial results are communicated behind the scenes where necessary. A Heat array assumes the role of a virtual overlay of the local chunks and realizes and coordinates the computations - see the figure below for a visual representation of this concept.\n", + "\n", + "\n", + "\n", + "The chunks are always split along a single dimension (i.e. 1-D domain decomposition) of the array. You can specify this in Heat by using the `split` parameter (a short sketch follows below). This parameter is present in all relevant functions, such as array creation (`zeros()`, `ones()`, ...) or I/O (`load()`) functions. \n", + "\n", + "\n", + "\n", + "\n", + "Examples are provided below. The result of an operation on a Heat tensor will in most cases preserve the split of the respective operands. However, in some cases the split axis might change. For example, a transpose of a Heat array will equally transpose the split axis. Furthermore, a reduction operation, e.g. `sum()`, that is performed across the split axis might remove data partitions entirely. The respective function behaviors can be found in Heat's documentation.\n", + "\n", + "You may also modify the data partitioning of a Heat array by using the `resplit()` function. This allows you to repartition the data as you choose. Please note that this should be used sparingly and for small amounts of data only, as it entails significant data copying across the network. Finally, a Heat array without any split, i.e. `split=None` (the default), will result in redundant copies of the data on each computation node.\n", + "\n", + "On a technical level, Heat follows the so-called [Bulk Synchronous Parallel (BSP)](https://en.wikipedia.org/wiki/Bulk_synchronous_parallel) processing model. For the network communication, Heat utilizes the [Message Passing Interface (MPI)](https://computing.llnl.gov/tutorials/mpi/), a *de facto* standard on modern high-performance computing systems. It is also possible to use MPI on your laptop or desktop computer.
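As referenced above, here is a minimal sketch of the `split` parameter at array creation (it assumes Heat is installed and that the script is launched with several MPI processes):

```python
import heat as ht

# replicated: every process holds a full copy (split=None is the default)
replicated = ht.zeros((4, 5))

# distributed: axis 0 is chunked as evenly as possible across the processes
distributed = ht.zeros((4, 5), split=0)
```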
Respective software packages are available for all major operating systems. In order to run a Heat script, you need to start it slightly differently than you are probably used to. This\n", + "\n", + "```bash\n", + "python ./my_script.py\n", + "```\n", + "\n", + "becomes this instead:\n", + "\n", + "```bash\n", + "mpirun -n <number of processes> python ./my_script.py\n", + "```\n", + "On an HPC cluster you'll of course submit the job via `sbatch` or a similar scheduler command.\n", + "\n", + "\n", + "Let's see some examples of working with distributed Heat:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the following examples, we'll recreate the array shown in the figure, a 3-dimensional DNDarray of integers ranging from 0 to 59 (5 matrices of size (4,3)). " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:6]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 1, + "engine_uuid": "4a6ffcbf-4b7c9961beb0aa49f4f299a5", + "error": null, + "execute_input": "import heat as ht\ndndarray = ht.arange(60).reshape(5,4,3)\ndndarray\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 6, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.052126Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[2:6]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 2, + "engine_uuid": "e3e9e719-1b11a826b66969f71d179e21", + "error": null, + "execute_input": "import heat as ht\ndndarray = ht.arange(60).reshape(5,4,3)\ndndarray\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 6, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.052193Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:24]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "import heat as ht\ndndarray = ht.arange(60).reshape(5,4,3)\ndndarray\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 24, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.052033Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:6]: \u001b[0m" + ] + }, + "metadata": { + "after": [], + "completed": "2025-05-19T19:17:51.061012Z", + "data": {}, + "engine_id": 3, + "engine_uuid": "b9f6f6e8-01c224a4024814eaffce2266", + "error": null, + "execute_input": "import heat as ht\ndndarray = ht.arange(60).reshape(5,4,3)\ndndarray\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 6, + "metadata": {} + }, + "follow": [], + "is_broadcast": false, + "is_coalescing": false, + "msg_id": "0799d273-c85f091368add7dbf88c8344_231252_42", + "outputs": [], + "received": "2025-05-19T19:17:51.067009Z", + "started": "2025-05-19T19:17:51.055404Z", + "status": "ok", + "stderr": "", +
"stdout": "", + "submitted": "2025-05-19T19:17:51.052218Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "import heat as ht\n", + "dndarray = ht.arange(60).reshape(5,4,3)\n", + "dndarray" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Notice the additional metadata printed with the DNDarray. With respect to a numpy ndarray, the DNDarray has additional information on the device (in this case, the CPU) and the `split` axis. In the example above, the split axis is `None`, meaning that the DNDarray is not distributed and each MPI process has a full copy of the data.\n", + "\n", + "Let's experiment with a distributed DNDarray: we'll split the same DNDarray as above, but distributed along the major axis." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:7]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 1, + "engine_uuid": "4a6ffcbf-4b7c9961beb0aa49f4f299a5", + "error": null, + "execute_input": "dndarray = ht.arange(60, split=0).reshape(5,4,3)\ndndarray\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 7, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.106705Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:25]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "dndarray = ht.arange(60, split=0).reshape(5,4,3)\ndndarray\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 25, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.106454Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:7]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 3, + "engine_uuid": "b9f6f6e8-01c224a4024814eaffce2266", + "error": null, + "execute_input": "dndarray = ht.arange(60, split=0).reshape(5,4,3)\ndndarray\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 7, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.106872Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[2:7]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 2, + "engine_uuid": "e3e9e719-1b11a826b66969f71d179e21", + "error": null, + "execute_input": "dndarray = ht.arange(60, split=0).reshape(5,4,3)\ndndarray\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 7, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.106799Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "dndarray = ht.arange(60, 
split=0).reshape(5,4,3)\n", + "dndarray" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `split` axis is now 0, meaning that the DNDarray is distributed along the first axis. Each MPI process has a slice of the data along the first axis. In order to see the data on each process, we can print the \"local array\" via the `larray` attribute." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:8]: \u001b[0m\n", + "tensor([[[24, 25, 26],\n", + " [27, 28, 29],\n", + " [30, 31, 32],\n", + " [33, 34, 35]]], dtype=torch.int32)" + ] + }, + "metadata": { + "after": [], + "completed": "2025-05-19T19:17:51.194662Z", + "data": {}, + "engine_id": 1, + "engine_uuid": "4a6ffcbf-4b7c9961beb0aa49f4f299a5", + "error": null, + "execute_input": "dndarray.larray\n", + "execute_result": { + "data": { + "text/plain": "tensor([[[24, 25, 26],\n [27, 28, 29],\n [30, 31, 32],\n [33, 34, 35]]], dtype=torch.int32)" + }, + "execution_count": 8, + "metadata": {} + }, + "follow": [], + "is_broadcast": false, + "is_coalescing": false, + "msg_id": "0799d273-c85f091368add7dbf88c8344_231252_48", + "outputs": [], + "received": "2025-05-19T19:17:51.198154Z", + "started": "2025-05-19T19:17:51.190508Z", + "status": "ok", + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.178849Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:8]: \u001b[0m\n", + "tensor([[[48, 49, 50],\n", + " [51, 52, 53],\n", + " [54, 55, 56],\n", + " [57, 58, 59]]], dtype=torch.int32)" + ] + }, + "metadata": { + "after": [], + "completed": "2025-05-19T19:17:51.194657Z", + "data": {}, + "engine_id": 3, + "engine_uuid": "b9f6f6e8-01c224a4024814eaffce2266", + "error": null, + "execute_input": "dndarray.larray\n", + "execute_result": { + "data": { + "text/plain": "tensor([[[48, 49, 50],\n [51, 52, 53],\n [54, 55, 56],\n [57, 58, 59]]], dtype=torch.int32)" + }, + "execution_count": 8, + "metadata": {} + }, + "follow": [], + "is_broadcast": false, + "is_coalescing": false, + "msg_id": "0799d273-c85f091368add7dbf88c8344_231252_50", + "outputs": [], + "received": "2025-05-19T19:17:51.197555Z", + "started": "2025-05-19T19:17:51.191263Z", + "status": "ok", + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.180344Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:26]: \u001b[0m\n", + "tensor([[[ 0, 1, 2],\n", + " [ 3, 4, 5],\n", + " [ 6, 7, 8],\n", + " [ 9, 10, 11]],\n", + "\n", + " [[12, 13, 14],\n", + " [15, 16, 17],\n", + " [18, 19, 20],\n", + " [21, 22, 23]]], dtype=torch.int32)" + ] + }, + "metadata": { + "after": [], + "completed": "2025-05-19T19:17:51.195580Z", + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "dndarray.larray\n", + "execute_result": { + "data": { + "text/plain": "tensor([[[ 0, 1, 2],\n [ 3, 4, 5],\n [ 6, 7, 8],\n [ 9, 10, 11]],\n\n [[12, 13, 14],\n [15, 16, 17],\n [18, 19, 20],\n [21, 22, 23]]], dtype=torch.int32)" + }, + "execution_count": 26, + "metadata": {} + }, + "follow": [], + "is_broadcast": false, + "is_coalescing": false, + "msg_id": "0799d273-c85f091368add7dbf88c8344_231252_47", + "outputs": [], + "received": "2025-05-19T19:17:51.198806Z", + "started": "2025-05-19T19:17:51.190353Z", + "status": "ok", + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.178691Z" + }, + "output_type": 
"display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[2:8]: \u001b[0m\n", + "tensor([[[36, 37, 38],\n", + " [39, 40, 41],\n", + " [42, 43, 44],\n", + " [45, 46, 47]]], dtype=torch.int32)" + ] + }, + "metadata": { + "after": [], + "completed": "2025-05-19T19:17:51.196655Z", + "data": {}, + "engine_id": 2, + "engine_uuid": "e3e9e719-1b11a826b66969f71d179e21", + "error": null, + "execute_input": "dndarray.larray\n", + "execute_result": { + "data": { + "text/plain": "tensor([[[36, 37, 38],\n [39, 40, 41],\n [42, 43, 44],\n [45, 46, 47]]], dtype=torch.int32)" + }, + "execution_count": 8, + "metadata": {} + }, + "follow": [], + "is_broadcast": false, + "is_coalescing": false, + "msg_id": "0799d273-c85f091368add7dbf88c8344_231252_49", + "outputs": [], + "received": "2025-05-19T19:17:51.204676Z", + "started": "2025-05-19T19:17:51.191467Z", + "status": "ok", + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.179122Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "dndarray.larray" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note that the `larray` is a `torch.Tensor` object. This is the underlying tensor that holds the data. The `dndarray` object is an MPI-aware wrapper around these process-local tensors, providing memory-distributed functionality and information." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The DNDarray can be distributed along any axis. Modify the `split` attribute when creating the DNDarray in the cell above, to distribute it along a different axis, and see how the `larray`s change. You'll notice that the distributed arrays are always load-balanced, meaning that the data are distributed as evenly as possible across the MPI processes." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `DNDarray` object has a number of methods and attributes that are useful for distributed computing. In particular, it keeps track of its global and local (on a given process) shape through distributed operations and array manipulations. The DNDarray is also associated to a `comm` object, the MPI communicator.\n", + "\n", + "(In MPI, the *communicator* is a group of processes that can communicate with each other. The `comm` object is a `MPI.COMM_WORLD` communicator, which is the default communicator that includes all the processes. The `comm` object is used to perform collective operations, such as reductions, scatter, gather, and broadcast. 
The `comm` object is also used to perform point-to-point communication between processes.)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:0] Global shape of the dndarray: (5, 4, 3)\n", + "On rank 0/4, local shape of the dndarray: (2, 4, 3)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:1] Global shape of the dndarray: (5, 4, 3)\n", + "On rank 1/4, local shape of the dndarray: (1, 4, 3)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:2] Global shape of the dndarray: (5, 4, 3)\n", + "On rank 2/4, local shape of the dndarray: (1, 4, 3)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:3] Global shape of the dndarray: (5, 4, 3)\n", + "On rank 3/4, local shape of the dndarray: (1, 4, 3)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "print(f\"Global shape of the dndarray: {dndarray.shape}\")\n", + "print(f\"On rank {dndarray.comm.rank}/{dndarray.comm.size}, local shape of the dndarray: {dndarray.lshape}\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can perform a vast number of operations on DNDarrays distributed over multi-node and/or multi-GPU resources. Check out our [Numpy coverage tables](https://github.com/helmholtz-analytics/heat/blob/main/coverage_tables.md) to see what operations are already supported. \n", + "\n", + "The result of an operation on DNDarrays will in most cases preserve the `split` or distribution axis of the respective operands. However, in some cases the split axis might change. For example, a transpose of a Heat array will equally transpose the split axis. Furthermore, a reduction operation, e.g. `sum()`, that is performed across the split axis might remove data partitions entirely. The respective function behaviors can be found in Heat's documentation."
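A minimal sketch (for a `%%px` cell) that makes these rules visible by inspecting the `split` attribute before and after such operations:

```python
%%px
x = ht.arange(60, split=0).reshape(5, 4, 3)
print(x.split)              # 0: distributed along the first axis
print(x.T.split)            # the transpose carries the split axis along with the axes
print(x.sum(axis=0).split)  # reducing across the split axis removes the partitioning
```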
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:28]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "# transpose \ndndarray.T\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 28, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.287542Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[2:10]: \u001b[0m" + ] + }, + "metadata": { + "after": [], + "completed": "2025-05-19T19:17:51.294221Z", + "data": {}, + "engine_id": 2, + "engine_uuid": "e3e9e719-1b11a826b66969f71d179e21", + "error": null, + "execute_input": "# transpose \ndndarray.T\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 10, + "metadata": {} + }, + "follow": [], + "is_broadcast": false, + "is_coalescing": false, + "msg_id": "0799d273-c85f091368add7dbf88c8344_231252_57", + "outputs": [], + "received": "2025-05-19T19:17:51.297046Z", + "started": "2025-05-19T19:17:51.290699Z", + "status": "ok", + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.288331Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:10]: \u001b[0m" + ] + }, + "metadata": { + "after": [], + "completed": "2025-05-19T19:17:51.295026Z", + "data": {}, + "engine_id": 1, + "engine_uuid": "4a6ffcbf-4b7c9961beb0aa49f4f299a5", + "error": null, + "execute_input": "# transpose \ndndarray.T\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 10, + "metadata": {} + }, + "follow": [], + "is_broadcast": false, + "is_coalescing": false, + "msg_id": "0799d273-c85f091368add7dbf88c8344_231252_56", + "outputs": [], + "received": "2025-05-19T19:17:51.297591Z", + "started": "2025-05-19T19:17:51.290440Z", + "status": "ok", + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.288210Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:10]: \u001b[0m" + ] + }, + "metadata": { + "after": [], + "completed": "2025-05-19T19:17:51.296667Z", + "data": {}, + "engine_id": 3, + "engine_uuid": "b9f6f6e8-01c224a4024814eaffce2266", + "error": null, + "execute_input": "# transpose \ndndarray.T\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 10, + "metadata": {} + }, + "follow": [], + "is_broadcast": false, + "is_coalescing": false, + "msg_id": "0799d273-c85f091368add7dbf88c8344_231252_58", + "outputs": [], + "received": "2025-05-19T19:17:51.300499Z", + "started": "2025-05-19T19:17:51.293633Z", + "status": "ok", + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.288398Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px \n", + "# transpose \n", + "dndarray.T\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:1] The slowest run took 31.60 times longer than the fastest. This could mean that an intermediate result is being cached.\n", + "504 µs ± 876 µs per loop (mean ± std. dev. 
of 7 runs, 1 loop each)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:2] The slowest run took 28.84 times longer than the fastest. This could mean that an intermediate result is being cached.\n", + "501 µs ± 864 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:0] The slowest run took 29.75 times longer than the fastest. This could mean that an intermediate result is being cached.\n", + "503 µs ± 880 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:3] The slowest run took 8.36 times longer than the fastest. This could mean that an intermediate result is being cached.\n", + "237 µs ± 216 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "# reduction operation along the distribution axis\n", + "%timeit -n 1 dndarray.sum(axis=0)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:0] The slowest run took 13.43 times longer than the fastest. This could mean that an intermediate result is being cached.\n", + "114 µs ± 141 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:2] 72.7 µs ± 32.2 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:1] 71.7 µs ± 35.8 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:3] The slowest run took 15.67 times longer than the fastest. This could mean that an intermediate result is being cached.\n", + "183 µs ± 291 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%px \n", + "# reduction operation along non-distribution axis: no communication required\n", + "%timeit -n 1 dndarray.sum(axis=1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Operations between tensors with equal split or no split are fully parallelizable and therefore very fast." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:13]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 1, + "engine_uuid": "4a6ffcbf-4b7c9961beb0aa49f4f299a5", + "error": null, + "execute_input": "other_dndarray = ht.arange(60,120, split=0).reshape(5,4,3) # distributed reshape\n\n# element-wise multiplication\ndndarray * other_dndarray\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 13, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.515462Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:31]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "other_dndarray = ht.arange(60,120, split=0).reshape(5,4,3) # distributed reshape\n\n# element-wise multiplication\ndndarray * other_dndarray\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 31, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.514668Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:13]: \u001b[0m" + ] + }, + "metadata": { + "after": [], + "completed": "2025-05-19T19:17:51.529643Z", + "data": {}, + "engine_id": 3, + "engine_uuid": "b9f6f6e8-01c224a4024814eaffce2266", + "error": null, + "execute_input": "other_dndarray = ht.arange(60,120, split=0).reshape(5,4,3) # distributed reshape\n\n# element-wise multiplication\ndndarray * other_dndarray\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 13, + "metadata": {} + }, + "follow": [], + "is_broadcast": false, + "is_coalescing": false, + "msg_id": "0799d273-c85f091368add7dbf88c8344_231252_70", + "outputs": [], + "received": "2025-05-19T19:17:51.532912Z", + "started": "2025-05-19T19:17:51.518984Z", + "status": "ok", + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.516415Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[2:13]: \u001b[0m" + ] + }, + "metadata": { + "after": [], + "completed": "2025-05-19T19:17:51.529626Z", + "data": {}, + "engine_id": 2, + "engine_uuid": "e3e9e719-1b11a826b66969f71d179e21", + "error": null, + "execute_input": "other_dndarray = ht.arange(60,120, split=0).reshape(5,4,3) # distributed reshape\n\n# element-wise multiplication\ndndarray * other_dndarray\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 13, + "metadata": {} + }, + "follow": [], + "is_broadcast": false, + "is_coalescing": false, + "msg_id": "0799d273-c85f091368add7dbf88c8344_231252_69", + "outputs": [], + "received": "2025-05-19T19:17:51.534904Z", + "started": "2025-05-19T19:17:51.522307Z", + "status": "ok", + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:17:51.516241Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "other_dndarray = ht.arange(60,120, split=0).reshape(5,4,3) # distributed reshape\n", + "\n", + "# element-wise multiplication\n", + "dndarray * 
other_dndarray\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As we saw earlier, because the underlying data objects are PyTorch tensors, we can easily create DNDarrays on GPUs or move DNDarrays to GPUs. This allows us to perform distributed array operations on multi-GPU systems.\n", + "\n", + "So far we have demostrated small, easy-to-parallelize arithmetical operations. Let's move to linear algebra. Heat's `linalg` module supports a wide range of linear algebra operations, including matrix multiplication. Matrix multiplication is a very common operation data analysis, it is computationally intensive, and not trivial to parallelize. \n", + "\n", + "With Heat, you can perform matrix multiplication on distributed DNDarrays, and the operation will be parallelized across the MPI processes. Here on 4 GPUs:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%px\n", + "# free up memory if necessary\n", + "try:\n", + " del x, y, z\n", + "except NameError:\n", + " pass\n", + "\n", + "n, m = 4000, 4000\n", + "x = ht.random.randn(n, m, split=0, device=\"gpu\") # distributed RNG\n", + "y = ht.random.randn(m, n, split=None, device=\"gpu\")\n", + "z = x @ y\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`ht.linalg.matmul` or `@` breaks down the matrix multiplication into a series of smaller `torch` matrix multiplications, which are then distributed across the MPI processes. This operation can be very communication-intensive on huge matrices that both require distribution, and users should choose the `split` axis carefully to minimize communication overhead." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can experiment with sizes and the `split` parameter (distribution axis) for both matrices and time the result. Note that:\n", + "- If you set **`split=None` for both matrices**, each process (in this case, each GPU) will attempt to multiply the entire matrices. Depending on the matrix sizes, the GPU memory might be insufficient. (And if you can multiply the matrices on a single GPU, it's much more efficient to stick to PyTorch's `torch.linalg.matmul` function.)\n", + "- If **`split` is not None for both matrices**, each process will only hold a slice of the data, and will need to communicate data with other processes in order to perform the multiplication. This **introduces huge communication overhead**, but allows you to perform the multiplication on larger matrices than would fit in the memory of a single GPU.\n", + "- If **`split` is None for one matrix and not None for the other**, the multiplication does not require communication, and the result will be distributed. If your data size allows it, you should always favor this option.\n", + "\n", + "Time the multiplication for different split parameters and see how the performance changes.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:1] The slowest run took 15.33 times longer than the fastest. This could mean that an intermediate result is being cached.\n", + "2.78 ms ± 2.76 ms per loop (mean ± std. dev. of 5 runs, 1 loop each)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:2] The slowest run took 14.90 times longer than the fastest. 
This could mean that an intermediate result is being cached.\n", + "2.69 ms ± 2.65 ms per loop (mean ± std. dev. of 5 runs, 1 loop each)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:3] The slowest run took 14.88 times longer than the fastest. This could mean that an intermediate result is being cached.\n", + "2.22 ms ± 2.24 ms per loop (mean ± std. dev. of 5 runs, 1 loop each)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:0] The slowest run took 14.81 times longer than the fastest. This could mean that an intermediate result is being cached.\n", + "2.7 ms ± 2.66 ms per loop (mean ± std. dev. of 5 runs, 1 loop each)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "z = %timeit -n 1 -r 5 x @ y " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Heat supports many linear algebra operations:\n", + "```bash\n", + ">>> ht.linalg.\n", + "ht.linalg.basics         ht.linalg.hsvd_rtol(     ht.linalg.projection(    ht.linalg.triu(\n", + "ht.linalg.cg(            ht.linalg.inv(           ht.linalg.qr(            ht.linalg.vdot(\n", + "ht.linalg.cross(         ht.linalg.lanczos(       ht.linalg.solver         ht.linalg.vecdot(\n", + "ht.linalg.det(           ht.linalg.matmul(        ht.linalg.svdtools       ht.linalg.vector_norm(\n", + "ht.linalg.dot(           ht.linalg.matrix_norm(   ht.linalg.trace(         \n", + "ht.linalg.hsvd(          ht.linalg.norm(          ht.linalg.transpose(     \n", + "ht.linalg.hsvd_rank(     ht.linalg.outer(         ht.linalg.tril(          \n", + "```\n", + "\n", + "and a lot more is in the works, including distributed eigendecompositions and SVD. If the operation you need is not yet supported, leave us a note [here](https://github.com/helmholtz-analytics/heat/issues) and we'll get back to you." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can of course perform all operations on CPUs as well: simply leave out the `device` attribute entirely." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Interoperability\n", + "\n", + "We can easily create DNDarrays from PyTorch tensors and numpy ndarrays, and convert DNDarrays back to PyTorch tensors and numpy ndarrays. This makes it easy to integrate Heat into existing PyTorch and numpy workflows, as the short sketch below shows. 
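For instance, a minimal sketch of round-tripping between numpy, PyTorch, and Heat (variable names are illustrative):

```python
%%px
import numpy as np

np_in = np.arange(12.0).reshape(3, 4)
x = ht.array(np_in, split=0)   # DNDarray from a numpy ndarray, distributed along axis 0

back_to_np = x.numpy()         # gathers the distributed data into a local numpy ndarray
local_torch = x.larray         # the process-local torch.Tensor behind the DNDarray
```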
Here a basic example with xarrays:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "[0:execute]\n", + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n", + "\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)\n", + "Cell \u001b[0;32mIn[34], line 1\u001b[0m\n", + "\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mxarray\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m \u001b[38;5;21;01mxr\u001b[39;00m\n", + "\u001b[1;32m 3\u001b[0m local_xr \u001b[38;5;241m=\u001b[39m xr\u001b[38;5;241m.\u001b[39mDataArray(dndarray\u001b[38;5;241m.\u001b[39mlarray, dims\u001b[38;5;241m=\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mz\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124my\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mx\u001b[39m\u001b[38;5;124m\"\u001b[39m))\n", + "\u001b[1;32m 4\u001b[0m \u001b[38;5;66;03m# proceed with local xarray operations\u001b[39;00m\n", + "\n", + "\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'xarray'\n", + "[2:execute]\n", + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n", + "\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)\n", + "Cell \u001b[0;32mIn[16], line 1\u001b[0m\n", + "\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mxarray\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m \u001b[38;5;21;01mxr\u001b[39;00m\n", + "\u001b[1;32m 3\u001b[0m local_xr \u001b[38;5;241m=\u001b[39m xr\u001b[38;5;241m.\u001b[39mDataArray(dndarray\u001b[38;5;241m.\u001b[39mlarray, dims\u001b[38;5;241m=\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mz\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124my\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mx\u001b[39m\u001b[38;5;124m\"\u001b[39m))\n", + "\u001b[1;32m 4\u001b[0m \u001b[38;5;66;03m# proceed with local xarray operations\u001b[39;00m\n", + "\n", + "\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'xarray'\n", + "[1:execute]\n", + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n", + "\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)\n", + "Cell \u001b[0;32mIn[16], line 1\u001b[0m\n", + "\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mxarray\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m \u001b[38;5;21;01mxr\u001b[39;00m\n", + "\u001b[1;32m 3\u001b[0m local_xr \u001b[38;5;241m=\u001b[39m xr\u001b[38;5;241m.\u001b[39mDataArray(dndarray\u001b[38;5;241m.\u001b[39mlarray, dims\u001b[38;5;241m=\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mz\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124my\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mx\u001b[39m\u001b[38;5;124m\"\u001b[39m))\n", + "\u001b[1;32m 4\u001b[0m \u001b[38;5;66;03m# proceed with local xarray operations\u001b[39;00m\n", + "\n", + "\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'xarray'\n", + "[3:execute]\n", + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n", + "\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call 
last)\n", + "Cell \u001b[0;32mIn[16], line 1\u001b[0m\n", + "\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mxarray\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m \u001b[38;5;21;01mxr\u001b[39;00m\n", + "\u001b[1;32m 3\u001b[0m local_xr \u001b[38;5;241m=\u001b[39m xr\u001b[38;5;241m.\u001b[39mDataArray(dndarray\u001b[38;5;241m.\u001b[39mlarray, dims\u001b[38;5;241m=\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mz\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124my\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mx\u001b[39m\u001b[38;5;124m\"\u001b[39m))\n", + "\u001b[1;32m 4\u001b[0m \u001b[38;5;66;03m# proceed with local xarray operations\u001b[39;00m\n", + "\n", + "\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'xarray'\n" + ] + }, + { + "ename": "AlreadyDisplayedError", + "evalue": "4 errors", + "output_type": "error", + "traceback": [ + "4 errors" + ] + } + ], + "source": [ + "%%px\n", + "import xarray as xr\n", + "\n", + "local_xr = xr.DataArray(dndarray.larray, dims=(\"z\", \"y\", \"x\"))\n", + "# proceed with local xarray operations\n", + "local_xr\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**NOTE:** this is not a distributed `xarray`, but local xarray objects on each rank.\n", + "Work on [expanding xarray support](https://github.com/helmholtz-analytics/heat/pull/1183) is ongoing.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Heat will try to reuse the memory of the original array as much as possible. If you would prefer a copy with different memory, the ```copy``` keyword argument can be used when creating a DNDArray from other libraries." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:0] tensor([-1, 1, 2, 3, 4])\n", + "tensor([0, 1, 2, 3, 4])\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:1] tensor([-1, 1, 2, 3, 4])\n", + "tensor([0, 1, 2, 3, 4])\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:2] tensor([-1, 1, 2, 3, 4])\n", + "tensor([0, 1, 2, 3, 4])\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:3] tensor([-1, 1, 2, 3, 4])\n", + "tensor([0, 1, 2, 3, 4])\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "import torch\n", + "torch_array = torch.arange(5)\n", + "heat_array = ht.array(torch_array, copy=False)\n", + "heat_array[0] = -1\n", + "print(torch_array)\n", + "\n", + "torch_array = torch.arange(5)\n", + "heat_array = ht.array(torch_array, copy=True)\n", + "heat_array[0] = -1\n", + "print(torch_array)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Interoperability is a key feature of Heat, and we are constantly working to increase Heat's compliance to the [Python array API standard](https://data-apis.org/array-api/latest/). As usual, please [let us know](https://github.com/helmholtz-analytics/heat/issues) if you encounter any issues or have any feature requests." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the [next notebook](2_internals.ipynb), let's have a look at Heat's most important internal functions." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "heat-dev311", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.2" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/doc/source/tutorials/notebooks/2_internals.ipynb b/doc/source/tutorials/notebooks/2_internals.ipynb new file mode 100644 index 0000000000..27f823ba78 --- /dev/null +++ b/doc/source/tutorials/notebooks/2_internals.ipynb @@ -0,0 +1,1417 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Heat as infrastructure for MPI applications\n", + "\n", + "In this section, we'll go through some Heat-specific functionalities that simplify the implementation of a data-parallel application in Python. We'll demonstrate them on small arrays and 4 processes on a single cluster node, but the functionalities are indeed meant for a multi-node setup with huge arrays that cannot be processed on a single node." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Your IPython cluster should still be running. Let's check it out." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "4 engines found\n" + ] + } + ], + "source": [ + "from ipyparallel import Client\n", + "rc = Client(profile=\"default\")\n", + "\n", + "if len(rc.ids) == 0:\n", + "    print(\"No engines found\")\n", + "else:\n", + "    print(f\"{len(rc.ids)} engines found\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If no engines are found, go back to the [Intro](0_setup/0_setup_local.ipynb) for instructions." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We already mentioned that the DNDarray object is \"MPI-aware\". Each DNDarray is associated with an MPI communicator; it is aware of the number of processes in the communicator, and it knows the rank of the process that owns it. \n", + "\n", + "We will use the %%px magic in every cell that executes MPI code."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:22]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 1, + "engine_uuid": "4a6ffcbf-4b7c9961beb0aa49f4f299a5", + "error": null, + "execute_input": "import torch\nimport heat as ht\n\na = ht.random.randn(7,4,3, split=0)\na.comm\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 22, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:41.928917Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[2:22]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 2, + "engine_uuid": "e3e9e719-1b11a826b66969f71d179e21", + "error": null, + "execute_input": "import torch\nimport heat as ht\n\na = ht.random.randn(7,4,3, split=0)\na.comm\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 22, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:41.928982Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:22]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 3, + "engine_uuid": "b9f6f6e8-01c224a4024814eaffce2266", + "error": null, + "execute_input": "import torch\nimport heat as ht\n\na = ht.random.randn(7,4,3, split=0)\na.comm\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 22, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:41.929045Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:40]: \u001b[0m" + ] + }, + "metadata": { + "after": [], + "completed": "2025-05-19T19:20:41.934024Z", + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "import torch\nimport heat as ht\n\na = ht.random.randn(7,4,3, split=0)\na.comm\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 40, + "metadata": {} + }, + "follow": [], + "is_broadcast": false, + "is_coalescing": false, + "msg_id": "cf6f5092-7287c6a9544d4c34e4c3830f_231404_21", + "outputs": [], + "received": "2025-05-19T19:20:41.941201Z", + "started": "2025-05-19T19:20:41.930851Z", + "status": "ok", + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:41.928786Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "import torch\n", + "import heat as ht\n", + "\n", + "a = ht.random.randn(7,4,3, split=0)\n", + "a.comm" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:3] a is distributed over 4 processes\n", + "a is a distributed 3-dimensional array with global shape (7, 4, 3)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:1] a is distributed over 4 processes\n", + "a is a distributed 3-dimensional array 
with global shape (7, 4, 3)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:0] a is distributed over 4 processes\n", + "a is a distributed 3-dimensional array with global shape (7, 4, 3)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:2] a is distributed over 4 processes\n", + "a is a distributed 3-dimensional array with global shape (7, 4, 3)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "# MPI size = total number of processes \n", + "size = a.comm.size\n", + "\n", + "print(f\"a is distributed over {size} processes\")\n", + "print(f\"a is a distributed {a.ndim}-dimensional array with global shape {a.shape}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:1] Rank 1 holds a slice of a with local shape (2, 4, 3)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:0] Rank 0 holds a slice of a with local shape (2, 4, 3)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:2] Rank 2 holds a slice of a with local shape (2, 4, 3)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:3] Rank 3 holds a slice of a with local shape (1, 4, 3)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "# MPI rank = rank of each process\n", + "rank = a.comm.rank\n", + "# Local shape = shape of the data on each process\n", + "local_shape = a.lshape\n", + "print(f\"Rank {rank} holds a slice of a with local shape {local_shape}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Distribution map\n", + "\n", + "On many occasions, when building a memory-distributed pipeline, it will be convenient for each rank to know which rank holds which slice of the distributed array. \n", + "\n", + "The `lshape_map` attribute of a DNDarray gathers (or, if possible, calculates) this info from all processes and stores it as metadata of the DNDarray. Because it is meant for internal use, it is stored in a torch tensor, not a DNDarray. \n", + "\n", + "The `lshape_map` tensor is a 2D tensor, where the first dimension is the number of processes and the second dimension is the number of dimensions of the array. Each row of the tensor contains the local shape of the array on a process. 
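For example (a sketch, assuming the split DNDarray `a` from the cells above), the global start offset of each rank's chunk along the split axis can be derived from `lshape_map` with a cumulative sum:

```python
%%px
counts = a.lshape_map[:, a.split]           # chunk sizes along the split axis
offsets = torch.cumsum(counts, 0) - counts  # offsets[r]: global start index of rank r's chunk
print(offsets)
```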
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:25]: \u001b[0m\n", + "tensor([[2, 4, 3],\n", + " [2, 4, 3],\n", + " [2, 4, 3],\n", + " [1, 4, 3]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 3, + "engine_uuid": "b9f6f6e8-01c224a4024814eaffce2266", + "error": null, + "execute_input": "lshape_map = a.lshape_map\nlshape_map\n", + "execute_result": { + "data": { + "text/plain": "tensor([[2, 4, 3],\n [2, 4, 3],\n [2, 4, 3],\n [1, 4, 3]])" + }, + "execution_count": 25, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:45.543757Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:25]: \u001b[0m\n", + "tensor([[2, 4, 3],\n", + " [2, 4, 3],\n", + " [2, 4, 3],\n", + " [1, 4, 3]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 1, + "engine_uuid": "4a6ffcbf-4b7c9961beb0aa49f4f299a5", + "error": null, + "execute_input": "lshape_map = a.lshape_map\nlshape_map\n", + "execute_result": { + "data": { + "text/plain": "tensor([[2, 4, 3],\n [2, 4, 3],\n [2, 4, 3],\n [1, 4, 3]])" + }, + "execution_count": 25, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:45.543320Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[2:25]: \u001b[0m\n", + "tensor([[2, 4, 3],\n", + " [2, 4, 3],\n", + " [2, 4, 3],\n", + " [1, 4, 3]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 2, + "engine_uuid": "e3e9e719-1b11a826b66969f71d179e21", + "error": null, + "execute_input": "lshape_map = a.lshape_map\nlshape_map\n", + "execute_result": { + "data": { + "text/plain": "tensor([[2, 4, 3],\n [2, 4, 3],\n [2, 4, 3],\n [1, 4, 3]])" + }, + "execution_count": 25, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:45.543554Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:43]: \u001b[0m\n", + "tensor([[2, 4, 3],\n", + " [2, 4, 3],\n", + " [2, 4, 3],\n", + " [1, 4, 3]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "lshape_map = a.lshape_map\nlshape_map\n", + "execute_result": { + "data": { + "text/plain": "tensor([[2, 4, 3],\n [2, 4, 3],\n [2, 4, 3],\n [1, 4, 3]])" + }, + "execution_count": 43, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:45.543032Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "lshape_map = a.lshape_map\n", + "lshape_map" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Go back to where we created the DNDarray and and create `a` with a different split axis. See how the `lshape_map` changes." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Modifying the DNDarray distribution\n", + "\n", + "In a distributed pipeline, it is sometimes necessary to change the distribution of a DNDarray, when the array is not distributed in the most convenient way for the next operation / algorithm.\n", + "\n", + "Depending on your needs, you can choose between:\n", + "- `DNDarray.redistribute_()`: This method keeps the original split axis, but redistributes the data of the DNDarray according to a \"target map\".\n", + "- `DNDarray.resplit_()`: This method changes the split axis of the DNDarray. This is a more expensive operation, and should be used only when absolutely necessary. Depending on your needs and available resources, in some cases it might be wiser to keep a copy of the DNDarray with a different split axis.\n", + "\n", + "Let's see some examples." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:44]: \u001b[0m\n", + "tensor([[1, 4, 3],\n", + " [2, 4, 3],\n", + " [2, 4, 3],\n", + " [2, 4, 3]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "#redistribute\ntarget_map = a.lshape_map\ntarget_map[:, a.split] = torch.tensor([1, 2, 2, 2])\n# in-place redistribution (see ht.redistribute for out-of-place)\na.redistribute_(target_map=target_map)\n\n# new lshape map after redistribution\na.lshape_map\n", + "execute_result": { + "data": { + "text/plain": "tensor([[1, 4, 3],\n [2, 4, 3],\n [2, 4, 3],\n [2, 4, 3]])" + }, + "execution_count": 44, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:47.671295Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[2:26]: \u001b[0m\n", + "tensor([[1, 4, 3],\n", + " [2, 4, 3],\n", + " [2, 4, 3],\n", + " [2, 4, 3]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 2, + "engine_uuid": "e3e9e719-1b11a826b66969f71d179e21", + "error": null, + "execute_input": "#redistribute\ntarget_map = a.lshape_map\ntarget_map[:, a.split] = torch.tensor([1, 2, 2, 2])\n# in-place redistribution (see ht.redistribute for out-of-place)\na.redistribute_(target_map=target_map)\n\n# new lshape map after redistribution\na.lshape_map\n", + "execute_result": { + "data": { + "text/plain": "tensor([[1, 4, 3],\n [2, 4, 3],\n [2, 4, 3],\n [2, 4, 3]])" + }, + "execution_count": 26, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:47.671730Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:26]: \u001b[0m\n", + "tensor([[1, 4, 3],\n", + " [2, 4, 3],\n", + " [2, 4, 3],\n", + " [2, 4, 3]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 3, + "engine_uuid": "b9f6f6e8-01c224a4024814eaffce2266", + "error": null, + "execute_input": "#redistribute\ntarget_map = a.lshape_map\ntarget_map[:, a.split] = torch.tensor([1, 2, 2, 2])\n# in-place redistribution (see ht.redistribute for out-of-place)\na.redistribute_(target_map=target_map)\n\n# new lshape map after 
redistribution\na.lshape_map\n", + "execute_result": { + "data": { + "text/plain": "tensor([[1, 4, 3],\n [2, 4, 3],\n [2, 4, 3],\n [2, 4, 3]])" + }, + "execution_count": 26, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:47.671921Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:26]: \u001b[0m\n", + "tensor([[1, 4, 3],\n", + " [2, 4, 3],\n", + " [2, 4, 3],\n", + " [2, 4, 3]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 1, + "engine_uuid": "4a6ffcbf-4b7c9961beb0aa49f4f299a5", + "error": null, + "execute_input": "#redistribute\ntarget_map = a.lshape_map\ntarget_map[:, a.split] = torch.tensor([1, 2, 2, 2])\n# in-place redistribution (see ht.redistribute for out-of-place)\na.redistribute_(target_map=target_map)\n\n# new lshape map after redistribution\na.lshape_map\n", + "execute_result": { + "data": { + "text/plain": "tensor([[1, 4, 3],\n [2, 4, 3],\n [2, 4, 3],\n [2, 4, 3]])" + }, + "execution_count": 26, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:47.671506Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "#redistribute\n", + "target_map = a.lshape_map\n", + "target_map[:, a.split] = torch.tensor([1, 2, 2, 2])\n", + "# in-place redistribution (see ht.redistribute for out-of-place)\n", + "a.redistribute_(target_map=target_map)\n", + "\n", + "# new lshape map after redistribution\n", + "a.lshape_map" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:27]: \u001b[0m\n", + "tensor([[[ 0.5730, -1.0918, -0.8577],\n", + " [ 0.6610, -0.4874, 0.9850],\n", + " [ 1.0930, -0.8518, -0.7061],\n", + " [-0.7625, 0.6767, 0.1940]],\n", + "\n", + " [[-1.1230, 0.2482, 0.7127],\n", + " [-0.3202, -0.3510, -1.2052],\n", + " [-1.0595, -0.5830, 0.4192],\n", + " [ 0.5600, -1.2777, -0.1323]]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 3, + "engine_uuid": "b9f6f6e8-01c224a4024814eaffce2266", + "error": null, + "execute_input": "# local arrays after redistribution\na.larray\n", + "execute_result": { + "data": { + "text/plain": "tensor([[[ 0.5730, -1.0918, -0.8577],\n [ 0.6610, -0.4874, 0.9850],\n [ 1.0930, -0.8518, -0.7061],\n [-0.7625, 0.6767, 0.1940]],\n\n [[-1.1230, 0.2482, 0.7127],\n [-0.3202, -0.3510, -1.2052],\n [-1.0595, -0.5830, 0.4192],\n [ 0.5600, -1.2777, -0.1323]]])" + }, + "execution_count": 27, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:48.893023Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[2:27]: \u001b[0m\n", + "tensor([[[ 1.6286, 0.4707, -0.5730],\n", + " [ 0.3841, -0.4789, -0.8033],\n", + " [ 0.1299, -0.6602, -2.0182],\n", + " [ 0.5541, -0.1653, -0.4314]],\n", + "\n", + " [[ 1.1544, -0.8126, -0.7634],\n", + " [-0.0817, -1.5430, -0.6341],\n", + " [ 0.0291, 0.9677, 0.1294],\n", + " [-0.3747, -1.4987, -0.1063]]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 2, + "engine_uuid": 
"e3e9e719-1b11a826b66969f71d179e21", + "error": null, + "execute_input": "# local arrays after redistribution\na.larray\n", + "execute_result": { + "data": { + "text/plain": "tensor([[[ 1.6286, 0.4707, -0.5730],\n [ 0.3841, -0.4789, -0.8033],\n [ 0.1299, -0.6602, -2.0182],\n [ 0.5541, -0.1653, -0.4314]],\n\n [[ 1.1544, -0.8126, -0.7634],\n [-0.0817, -1.5430, -0.6341],\n [ 0.0291, 0.9677, 0.1294],\n [-0.3747, -1.4987, -0.1063]]])" + }, + "execution_count": 27, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:48.892765Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:27]: \u001b[0m\n", + "tensor([[[-0.0919, -0.7646, 0.1660],\n", + " [-0.9814, 0.9445, -1.8339],\n", + " [-1.0218, 0.8454, -0.6050],\n", + " [-0.4161, -0.0764, 0.4383]],\n", + "\n", + " [[ 0.3151, -2.1761, 0.9970],\n", + " [ 0.9423, 0.7667, 0.6834],\n", + " [ 1.9586, -0.0994, 0.0186],\n", + " [-0.0961, -0.3901, 1.2133]]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 1, + "engine_uuid": "4a6ffcbf-4b7c9961beb0aa49f4f299a5", + "error": null, + "execute_input": "# local arrays after redistribution\na.larray\n", + "execute_result": { + "data": { + "text/plain": "tensor([[[-0.0919, -0.7646, 0.1660],\n [-0.9814, 0.9445, -1.8339],\n [-1.0218, 0.8454, -0.6050],\n [-0.4161, -0.0764, 0.4383]],\n\n [[ 0.3151, -2.1761, 0.9970],\n [ 0.9423, 0.7667, 0.6834],\n [ 1.9586, -0.0994, 0.0186],\n [-0.0961, -0.3901, 1.2133]]])" + }, + "execution_count": 27, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:48.892426Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:45]: \u001b[0m\n", + "tensor([[[-0.1776, -0.8116, -0.6636],\n", + " [ 0.3238, 2.4110, 0.4005],\n", + " [-0.7808, -2.0984, 1.7691],\n", + " [ 0.9370, 0.0141, 0.6934]]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "# local arrays after redistribution\na.larray\n", + "execute_result": { + "data": { + "text/plain": "tensor([[[-0.1776, -0.8116, -0.6636],\n [ 0.3238, 2.4110, 0.4005],\n [-0.7808, -2.0984, 1.7691],\n [ 0.9370, 0.0141, 0.6934]]])" + }, + "execution_count": 45, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:48.891730Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "# local arrays after redistribution\n", + "a.larray" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:28]: \u001b[0m\n", + "tensor([[7, 1, 3],\n", + " [7, 1, 3],\n", + " [7, 1, 3],\n", + " [7, 1, 3]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 1, + "engine_uuid": "4a6ffcbf-4b7c9961beb0aa49f4f299a5", + "error": null, + "execute_input": "# resplit\na.resplit_(axis=1)\n\na.lshape_map\n", + "execute_result": { + "data": { + "text/plain": "tensor([[7, 1, 3],\n [7, 1, 3],\n [7, 1, 3],\n [7, 1, 3]])" + }, + "execution_count": 28, + 
"metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:49.681796Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[2:28]: \u001b[0m\n", + "tensor([[7, 1, 3],\n", + " [7, 1, 3],\n", + " [7, 1, 3],\n", + " [7, 1, 3]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 2, + "engine_uuid": "e3e9e719-1b11a826b66969f71d179e21", + "error": null, + "execute_input": "# resplit\na.resplit_(axis=1)\n\na.lshape_map\n", + "execute_result": { + "data": { + "text/plain": "tensor([[7, 1, 3],\n [7, 1, 3],\n [7, 1, 3],\n [7, 1, 3]])" + }, + "execution_count": 28, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:49.682052Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:28]: \u001b[0m\n", + "tensor([[7, 1, 3],\n", + " [7, 1, 3],\n", + " [7, 1, 3],\n", + " [7, 1, 3]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 3, + "engine_uuid": "b9f6f6e8-01c224a4024814eaffce2266", + "error": null, + "execute_input": "# resplit\na.resplit_(axis=1)\n\na.lshape_map\n", + "execute_result": { + "data": { + "text/plain": "tensor([[7, 1, 3],\n [7, 1, 3],\n [7, 1, 3],\n [7, 1, 3]])" + }, + "execution_count": 28, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:49.682295Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:46]: \u001b[0m\n", + "tensor([[7, 1, 3],\n", + " [7, 1, 3],\n", + " [7, 1, 3],\n", + " [7, 1, 3]])" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "# resplit\na.resplit_(axis=1)\n\na.lshape_map\n", + "execute_result": { + "data": { + "text/plain": "tensor([[7, 1, 3],\n [7, 1, 3],\n [7, 1, 3],\n [7, 1, 3]])" + }, + "execution_count": 46, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:49.681493Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "# resplit\n", + "a.resplit_(axis=1)\n", + "\n", + "a.lshape_map" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can use the `resplit_` method (in-place), or `ht.resplit` (out-of-place) to change the distribution axis, but also to set the distribution axis to None. The latter corresponds to an MPI.Allgather operation that gathers the entire array on each process. This is useful when you've achieved a small enough data size that can be processed on a single device, and you want to avoid communication overhead." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:47]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "# \"un-split\" distributed array\na.resplit_(axis=None)\n# each process now holds a copy of the entire array\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 47, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:53.077278Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:29]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 1, + "engine_uuid": "4a6ffcbf-4b7c9961beb0aa49f4f299a5", + "error": null, + "execute_input": "# \"un-split\" distributed array\na.resplit_(axis=None)\n# each process now holds a copy of the entire array\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 29, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:53.077581Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[2:29]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 2, + "engine_uuid": "e3e9e719-1b11a826b66969f71d179e21", + "error": null, + "execute_input": "# \"un-split\" distributed array\na.resplit_(axis=None)\n# each process now holds a copy of the entire array\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 29, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:53.077833Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:29]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 3, + "engine_uuid": "b9f6f6e8-01c224a4024814eaffce2266", + "error": null, + "execute_input": "# \"un-split\" distributed array\na.resplit_(axis=None)\n# each process now holds a copy of the entire array\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 29, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:20:53.078115Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "# \"un-split\" distributed array\n", + "a.resplit_(axis=None)\n", + "# each process now holds a copy of the entire array" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The opposite is not true, i.e. you cannot use `resplit_` to distribute an array with split=None. 
In that case, you must use the `ht.array()` factory function:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%px\n", + "# make `a` split again\n", + "a = ht.array(a, split=0)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Making disjoint data into a global DNDarray\n", + "\n", + "Another common occurrence in a data-parallel pipeline: you have addressed the embarrassingly-parallel part of your algorithm with any array framework, each process working independently from the others. You now want to perform a non-embarrassingly-parallel operation on the entire dataset, with Heat as a backend.\n", + "\n", + "You can use the `ht.array` factory function with the `is_split` argument to create a DNDarray from a disjoint (on each MPI process) set of arrays. The `is_split` argument indicates the axis along which the disjoint data is to be \"joined\" into a global, distributed DNDarray." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:49]: \u001b[0m(12, 4)" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "# create some random local arrays on each process\nimport numpy as np\nlocal_array = np.random.rand(3, 4)\n\n# join them into a distributed array\na_0 = ht.array(local_array, is_split=0)\na_0.shape\n", + "execute_result": { + "data": { + "text/plain": "(12, 4)" + }, + "execution_count": 49, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:21:07.545019Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:31]: \u001b[0m(12, 4)" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 1, + "engine_uuid": "4a6ffcbf-4b7c9961beb0aa49f4f299a5", + "error": null, + "execute_input": "# create some random local arrays on each process\nimport numpy as np\nlocal_array = np.random.rand(3, 4)\n\n# join them into a distributed array\na_0 = ht.array(local_array, is_split=0)\na_0.shape\n", + "execute_result": { + "data": { + "text/plain": "(12, 4)" + }, + "execution_count": 31, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:21:07.545093Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:31]: \u001b[0m(12, 4)" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 3, + "engine_uuid": "b9f6f6e8-01c224a4024814eaffce2266", + "error": null, + "execute_input": "# create some random local arrays on each process\nimport numpy as np\nlocal_array = np.random.rand(3, 4)\n\n# join them into a distributed array\na_0 = ht.array(local_array, is_split=0)\na_0.shape\n", + "execute_result": { + "data": { + "text/plain": "(12, 4)" + }, + "execution_count": 31, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:21:07.545314Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ +
"\u001b[0;31mOut[2:31]: \u001b[0m(12, 4)" + ] + }, + "metadata": { + "after": [], + "completed": "2025-05-19T19:21:07.555075Z", + "data": {}, + "engine_id": 2, + "engine_uuid": "e3e9e719-1b11a826b66969f71d179e21", + "error": null, + "execute_input": "# create some random local arrays on each process\nimport numpy as np\nlocal_array = np.random.rand(3, 4)\n\n# join them into a distributed array\na_0 = ht.array(local_array, is_split=0)\na_0.shape\n", + "execute_result": { + "data": { + "text/plain": "(12, 4)" + }, + "execution_count": 31, + "metadata": {} + }, + "follow": [], + "is_broadcast": false, + "is_coalescing": false, + "msg_id": "cf6f5092-7287c6a9544d4c34e4c3830f_231404_59", + "outputs": [], + "received": "2025-05-19T19:21:07.560600Z", + "started": "2025-05-19T19:21:07.547257Z", + "status": "ok", + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:21:07.545116Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "# create some random local arrays on each process\n", + "import numpy as np\n", + "local_array = np.random.rand(3, 4)\n", + "\n", + "# join them into a distributed array\n", + "a_0 = ht.array(local_array, is_split=0)\n", + "a_0.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Change the cell above and join the arrays along a different axis. Note that the shapes of the local arrays must be consistent along the non-split axes. They can differ along the split axis." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `ht.array` function takes any data object as an input that can be converted to a torch tensor. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Once you've made your disjoint data into a DNDarray, you can apply any Heat operation or algorithm to it and exploit the cumulative RAM of all the processes in the communicator. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can access the MPI communication functionalities of the DNDarray through the `comm` attribute, i.e.:\n", + "\n", + "```python\n", + "# these are just examples, this cell won't do anything\n", + "a.comm.Allreduce(a, b, op=MPI.SUM)\n", + "\n", + "a.comm.Allgather(a, b)\n", + "a.comm.Isend(a, dest=1, tag=0)\n", + "```\n", + "\n", + "etc." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the next notebooks, we'll show you how we use Heat's distributed-array infrastructure to scale complex data analysis workflows to large datasets and high-performance computing resources.\n", + "\n", + "- [Data loading and preprocessing](3_loading_preprocessing.ipynb)\n", + "- [Matrix factorization algorithms](4_matrix_factorizations.ipynb)\n", + "- [Clustering algorithms](5_clustering.ipynb)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "heat-dev-311", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.8" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/doc/source/tutorials/notebooks/3_loading_preprocessing.ipynb b/doc/source/tutorials/notebooks/3_loading_preprocessing.ipynb new file mode 100644 index 0000000000..9db5a38216 --- /dev/null +++ b/doc/source/tutorials/notebooks/3_loading_preprocessing.ipynb @@ -0,0 +1,488 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Loading and Preprocessing\n", + "\n", + "### Refresher\n", + "\n", + "Using PyTorch as compute engine and mpi4py for communication, Heat implements a number of array operations and algorithms that are optimized for memory-distributed data volumes. This allows you to tackle datasets that are too large for single-node (or worse, single-GPU) processing. \n", + "\n", + "As opposed to task-parallel frameworks, Heat takes a data-parallel approach, meaning that each \"worker\" or MPI process performs the same tasks on different slices of the data. Many operations and algorithms are not embarrassingly parallel, and involve data exchange between processes. Heat operations and algorithms are designed to minimize this communication overhead, and to make it transparent to the user.\n", + "\n", + "In other words: \n", + "- you don't have to worry about optimizing data chunk sizes; \n", + "- you don't have to make sure your research problem is embarrassingly parallel, or artificially make your dataset smaller so your RAM is sufficient; \n", + "- you do have to make sure that you have sufficient **overall** RAM to run your global task (e.g. number of nodes / GPUs)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following shows some I/O and preprocessing examples. We'll use small datasets here, as each of us has access to only one node." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### I/O\n", + "\n", + "Let's start with loading a data set. Heat supports reading and writing from/into shared memory for a number of formats, including HDF5, NetCDF, and because we love scientists, csv. Check out the `ht.load` and `ht.save` functions for more details. Here we will load data in [HDF5 format](https://en.wikipedia.org/wiki/Hierarchical_Data_Format).\n", + "\n", + "This particular example data set (generated from all Asteroids from the [JPL Small Body Database](https://ssd.jpl.nasa.gov/sb/)) is really small, but it allows us to demonstrate the basic functionality of Heat. (In this interactive version of the notebook, we create the DNDarray from a small `scikit-learn` example dataset instead; see the cells below.)\n", + " " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Your ipcluster should still be running (see the [Intro](0_setup/0_setup_local.ipynb)). 
Let's test it:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[0, 1, 2, 3]" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from ipyparallel import Client\n", + "rc = Client(profile=\"default\")\n", + "rc.ids" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The above cell should return [0, 1, 2, 3].\n", + "\n", + "Now let's import `heat` and load the data set." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:54]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "import heat as ht\nimport sklearn\nimport sklearn.datasets\n\nX,_ = sklearn.datasets.load_digits(return_X_y=True)\nX = ht.array(X, split=0)\nX\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 54, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:24:32.711141Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[2:36]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 2, + "engine_uuid": "e3e9e719-1b11a826b66969f71d179e21", + "error": null, + "execute_input": "import heat as ht\nimport sklearn\nimport sklearn.datasets\n\nX,_ = sklearn.datasets.load_digits(return_X_y=True)\nX = ht.array(X, split=0)\nX\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 36, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:24:32.711423Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:36]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 3, + "engine_uuid": "b9f6f6e8-01c224a4024814eaffce2266", + "error": null, + "execute_input": "import heat as ht\nimport sklearn\nimport sklearn.datasets\n\nX,_ = sklearn.datasets.load_digits(return_X_y=True)\nX = ht.array(X, split=0)\nX\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 36, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:24:32.711532Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:36]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 1, + "engine_uuid": "4a6ffcbf-4b7c9961beb0aa49f4f299a5", + "error": null, + "execute_input": "import heat as ht\nimport sklearn\nimport sklearn.datasets\n\nX,_ = sklearn.datasets.load_digits(return_X_y=True)\nX = ht.array(X, split=0)\nX\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 36, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:24:32.711290Z" + }, 
+ "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "import heat as ht\n", + "import sklearn\n", + "import sklearn.datasets\n", + "\n", + "X,_ = sklearn.datasets.load_digits(return_X_y=True)\n", + "X = ht.array(X, split=0)\n", + "X\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We have loaded the entire data onto 4 MPI processes. We have created `X` with `split=0`, so each process stores evenly-sized slices of the data along dimension 0." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Data exploration\n", + "\n", + "Let's get an idea of the size of the data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:0] X is a 2-dimensional array with shape(1797, 64)\n", + "X takes up 0.920064 MB of memory.\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%px \n", + "# print global metadata once only\n", + "if X.comm.rank == 0:\n", + " print(f\"X is a {X.ndim}-dimensional array with shape{X.shape}\")\n", + " print(f\"X takes up {X.nbytes/1e6} MB of memory.\")\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "X is a matrix of shape *(datapoints, features)*. \n", + "\n", + "To get a first overview, we can print the data and determine its feature-wise mean, variance, min, max etc. These are reduction operations along the datapoints dimension, which is also the `split` dimension. You don't have to implement [`MPI.Allreduce`](https://mpitutorial.com/tutorials/mpi-reduce-and-allreduce/) operations yourself, communication is handled by Heat operations." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:0] Mean: DNDarray([0.0000e+00, 3.0384e-01, 5.2048e+00, 1.1836e+01, 1.1848e+01, 5.7819e+00, 1.3623e+00, 1.2966e-01, 5.5648e-03,\n", + " 1.9939e+00, 1.0382e+01, 1.1979e+01, 1.0279e+01, 8.1758e+00, 1.8464e+00, 1.0796e-01, 2.7824e-03, 2.6016e+00,\n", + " 9.9032e+00, 6.9928e+00, 7.0979e+00, 7.8063e+00, 1.7885e+00, 5.0083e-02, 1.1130e-03, 2.4697e+00, 9.0913e+00,\n", + " 8.8214e+00, 9.9271e+00, 7.5515e+00, 2.3178e+00, 2.2259e-03, 0.0000e+00, 2.3395e+00, 7.6672e+00, 9.0718e+00,\n", + " 1.0302e+01, 8.7440e+00, 2.9093e+00, 0.0000e+00, 8.9037e-03, 1.5838e+00, 6.8815e+00, 7.2282e+00, 7.6722e+00,\n", + " 8.2365e+00, 3.4563e+00, 2.7268e-02, 7.2343e-03, 7.0451e-01, 7.5070e+00, 9.5392e+00, 9.4162e+00, 8.7585e+00,\n", + " 3.7251e+00, 2.0646e-01, 5.5648e-04, 2.7935e-01, 5.5576e+00, 1.2089e+01, 1.1809e+01, 6.7641e+00, 2.0679e+00,\n", + " 3.6450e-01], dtype=ht.float32, device=cpu:0, split=None)\n", + "Var: DNDarray([0.0000e+00, 8.2254e-01, 2.2596e+01, 1.8043e+01, 1.8371e+01, 3.2090e+01, 1.1055e+01, 1.0756e+00, 8.8728e-03,\n", + " 1.0210e+01, 2.9376e+01, 1.5812e+01, 2.2861e+01, 3.6618e+01, 1.2855e+01, 6.8506e-01, 3.8876e-03, 1.2783e+01,\n", + " 3.2367e+01, 3.3652e+01, 3.8118e+01, 3.8385e+01, 1.0621e+01, 1.9226e-01, 1.1117e-03, 9.8952e+00, 3.8320e+01,\n", + " 3.4590e+01, 3.7827e+01, 3.4468e+01, 1.3582e+01, 2.2210e-03, 0.0000e+00, 1.2106e+01, 3.9979e+01, 3.9271e+01,\n", + " 3.5187e+01, 3.4445e+01, 1.2505e+01, 0.0000e+00, 2.1067e-02, 8.8863e+00, 4.2721e+01, 4.1468e+01, 3.9160e+01,\n", + " 3.2421e+01, 1.8747e+01, 9.4415e-02, 4.1684e-02, 3.0474e+00, 3.1843e+01, 2.7306e+01, 2.8096e+01, 3.6355e+01,\n", + " 2.4187e+01, 9.6851e-01, 5.5617e-04, 8.7243e-01, 2.6026e+01, 1.9127e+01, 2.4330e+01, 3.4798e+01, 
1.6723e+01,\n", + "          3.4581e+00], dtype=ht.float64, device=cpu:0, split=None)\n", + "Max: DNDarray([ 0., 8., 16., 16., 16., 16., 16., 15., 2., 16., 16., 16., 16., 16., 16., 12., 2., 16., 16., 16., 16., 16.,\n", + "          16., 8., 1., 15., 16., 16., 16., 16., 15., 1., 0., 14., 16., 16., 16., 16., 14., 0., 4., 16., 16., 16.,\n", + "          16., 16., 16., 6., 8., 16., 16., 16., 16., 16., 16., 13., 1., 9., 16., 16., 16., 16., 16., 16.], dtype=ht.float64, device=cpu:0, split=None)\n", + "Min: DNDarray([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n", + "          0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=ht.float64, device=cpu:0, split=None)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "features_mean = ht.mean(X,axis=0)\n", + "features_var = ht.var(X,axis=0)\n", + "features_max = ht.max(X,axis=0)\n", + "features_min = ht.min(X,axis=0)\n", + "\n", + "if ht.MPI_WORLD.rank == 0:\n", + "    print(f\"Mean: {features_mean}\")\n", + "    print(f\"Var: {features_var}\")\n", + "    print(f\"Max: {features_max}\")\n", + "    print(f\"Min: {features_min}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note that the `features_...` DNDarrays are no longer distributed, i.e. a copy of these results exists on each process, as the split dimension of the input data has been lost in the reduction operations. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Preprocessing/scaling\n", + "\n", + "Next, we can preprocess the data, e.g., by standardizing and/or normalizing. Heat offers several preprocessing routines for doing so; the API is similar to [`sklearn.preprocessing`](https://scikit-learn.org/stable/modules/preprocessing.html), so adapting existing code shouldn't be too complicated.\n", + "\n", + "Again, please let us know if you're missing any features." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:1] At least one of the features is almost constant (w.r.t. machine precision) and will not be scaled for this reason.\n", + "Standard Scaler Mean: \n", + "Standard Scaler Var: \n", + "At least one of the features is almost constant (w.r.t. machine precision) and will not be scaled for this reason.\n", + "Robust Scaler Mean: \n", + "Robust Scaler Var: \n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:2] At least one of the features is almost constant (w.r.t. machine precision) and will not be scaled for this reason.\n", + "Standard Scaler Mean: \n", + "Standard Scaler Var: \n", + "At least one of the features is almost constant (w.r.t. machine precision) and will not be scaled for this reason.\n", + "Robust Scaler Mean: \n", + "Robust Scaler Var: \n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:3] At least one of the features is almost constant (w.r.t.
machine precision) and will not be scaled for this reason.\n", + "Standard Scaler Mean: \n", + "Standard Scaler Var: \n", + "At least one of the features is almost constant (w.r.t. machine precision) and will not be scaled for this reason.\n", + "Robust Scaler Mean: \n", + "Robust Scaler Var: \n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:0] At least one of the features is almost constant (w.r.t. machine precision) and will not be scaled for this reason.\n", + "Standard Scaler Mean: DNDarray([ 0.0000e+00, -1.0710e-08, -1.1292e-08, 3.9116e-08, 1.0431e-07, -4.6566e-08, -7.4506e-09, -1.8626e-09,\n", + "          0.0000e+00, -1.0710e-08, -6.3796e-08, -1.1176e-08, -1.1502e-07, -5.9605e-08, -2.2352e-08, 1.8626e-09,\n", + "          -2.7940e-09, -1.6764e-08, -9.5344e-08, 5.5879e-08, 1.3970e-08, 5.5181e-08, -2.9802e-08, -7.4506e-09,\n", + "          9.3132e-10, -1.6764e-08, 6.4261e-08, -3.9116e-08, -6.7055e-08, -6.2399e-08, -2.1420e-08, -1.8626e-09,\n", + "          0.0000e+00, 0.0000e+00, 2.9802e-08, -9.0338e-08, -1.3970e-09, 3.5390e-08, 2.6077e-08, 0.0000e+00,\n", + "          -1.8626e-09, 3.1199e-08, -2.3749e-08, -6.7055e-08, -2.8871e-08, -4.0978e-08, -3.0384e-08, -5.3551e-09,\n", + "          -2.7940e-09, -7.4506e-09, -1.0245e-08, -3.7253e-08, -3.7253e-09, -6.3330e-08, -1.8626e-09, 3.7253e-09,\n", + "          -2.7940e-09, -3.7253e-09, -4.0513e-08, 7.6252e-08, 8.9407e-08, -4.0978e-08, 7.4506e-09, 0.0000e+00], dtype=ht.float32, device=cpu:0, split=None)\n", + "Standard Scaler Var: DNDarray([0.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000,\n", + "          1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000,\n", + "          1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 0.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000,\n", + "          0.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000,\n", + "          1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000], dtype=ht.float64, device=cpu:0, split=None)\n", + "At least one of the features is almost constant (w.r.t.
machine precision) and will not be scaled for this reason.\n", + "Robust Scaler Mean: DNDarray([ 0.0000e+00, 3.0384e-01, 1.5060e-01, -2.3283e-01, -2.3038e-01, 1.6199e-01, 1.3623e+00, 1.2966e-01,\n", + "          5.5648e-03, 6.6463e-01, -1.7974e-01, -1.4580e-01, -9.0081e-02, -6.8679e-02, 9.2321e-01, 1.0796e-01,\n", + "          2.7824e-03, 4.0039e-01, -2.0968e-01, 9.0251e-02, 9.1495e-02, -1.3833e-02, 5.9618e-01, 5.0083e-02,\n", + "          1.1130e-03, 3.6742e-01, -1.5906e-01, -9.8219e-02, -1.7274e-01, 4.5956e-02, 5.7944e-01, 2.2259e-03,\n", + "          0.0000e+00, 5.8486e-01, -2.3770e-02, -7.1401e-02, -2.6984e-01, -1.1418e-01, 3.1822e-01, 0.0000e+00,\n", + "          8.9037e-03, 7.9188e-01, 6.2962e-02, 1.6297e-02, -2.5213e-02, -7.6349e-02, 3.5090e-01, 2.7268e-02,\n", + "          7.2343e-03, 7.0451e-01, -4.4822e-02, -5.1196e-02, -5.8375e-02, -9.5501e-02, 3.8930e-01, 2.0646e-01,\n", + "          5.5648e-04, 2.7935e-01, 1.7307e-01, -1.8219e-01, -3.6515e-01, 6.3671e-02, 1.0339e+00, 3.6450e-01], dtype=ht.float32, device=cpu:0, split=None)\n", + "Robust Scaler Var: DNDarray([0.0000e+00, 8.2254e-01, 3.5306e-01, 7.2170e-01, 7.3486e-01, 2.6521e-01, 1.1055e+01, 1.0756e+00, 8.8728e-03,\n", + "          1.1344e+00, 3.6266e-01, 3.2269e-01, 3.5721e-01, 2.5429e-01, 3.2136e+00, 6.8506e-01, 3.8876e-03, 7.9893e-01,\n", + "          3.2367e-01, 2.7812e-01, 2.6471e-01, 1.9584e-01, 1.1801e+00, 1.9226e-01, 1.1117e-03, 6.1845e-01, 2.6611e-01,\n", + "          2.4021e-01, 2.6269e-01, 2.3936e-01, 8.4890e-01, 2.2210e-03, 0.0000e+00, 7.5664e-01, 2.0398e-01, 2.3237e-01,\n", + "          3.5187e-01, 2.8467e-01, 3.4737e-01, 0.0000e+00, 2.1067e-02, 2.2216e+00, 2.1796e-01, 2.1157e-01, 2.3171e-01,\n", + "          3.2421e-01, 3.8259e-01, 9.4415e-02, 4.1684e-02, 3.0474e+00, 2.6316e-01, 3.3711e-01, 2.8096e-01, 2.1512e-01,\n", + "          4.9361e-01, 9.6851e-01, 5.5617e-04, 8.7243e-01, 3.2131e-01, 7.6509e-01, 6.7584e-01, 2.4165e-01, 4.1808e+00,\n", + "          3.4581e+00], dtype=ht.float64, device=cpu:0, split=None)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "# Standard Scaler\n", + "scaler = ht.preprocessing.StandardScaler()\n", + "X_standardized = scaler.fit_transform(X)\n", + "standardized_mean = ht.mean(X_standardized,axis=0)\n", + "standardized_var = ht.var(X_standardized,axis=0)\n", + "print(f\"Standard Scaler Mean: {standardized_mean}\")\n", + "print(f\"Standard Scaler Var: {standardized_var}\")\n", + "\n", + "# Robust Scaler\n", + "scaler = ht.preprocessing.RobustScaler()\n", + "X_robust = scaler.fit_transform(X)\n", + "robust_mean = ht.mean(X_robust,axis=0)\n", + "robust_var = ht.var(X_robust,axis=0)\n", + "\n", + "print(f\"Robust Scaler Mean: {robust_mean}\")\n", + "print(f\"Robust Scaler Var: {robust_var}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Within Heat, you have several options to apply memory-distributed machine learning algorithms on your data. Check out our dedicated \"clustering\" notebook for an example.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Is the algorithm you're looking for not yet implemented? [Let us know](https://github.com/helmholtz-analytics/heat/issues/new/choose)! 
" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "heat-dev-311", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.8" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/tutorials/local/5_matrix_factorizations.ipynb b/doc/source/tutorials/notebooks/4_matrix_factorizations.ipynb similarity index 56% rename from tutorials/local/5_matrix_factorizations.ipynb rename to doc/source/tutorials/notebooks/4_matrix_factorizations.ipynb index 0abbd6f0ae..3a862220e4 100644 --- a/tutorials/local/5_matrix_factorizations.ipynb +++ b/doc/source/tutorials/notebooks/4_matrix_factorizations.ipynb @@ -92,7 +92,15 @@ "cell_type": "code", "execution_count": null, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "4 engines found\n" + ] + } + ], "source": [ "from ipyparallel import Client\n", "rc = Client(profile=\"default\")\n", @@ -108,11 +116,151 @@ "cell_type": "code", "execution_count": null, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:41]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 1, + "engine_uuid": "4a6ffcbf-4b7c9961beb0aa49f4f299a5", + "error": null, + "execute_input": "import heat as ht\nimport sklearn\nimport sklearn.datasets\n\nX,_ = sklearn.datasets.load_digits(return_X_y=True)\nX = ht.array(X, split=0)\nX\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 41, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:27:27.875170Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[2:41]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 2, + "engine_uuid": "e3e9e719-1b11a826b66969f71d179e21", + "error": null, + "execute_input": "import heat as ht\nimport sklearn\nimport sklearn.datasets\n\nX,_ = sklearn.datasets.load_digits(return_X_y=True)\nX = ht.array(X, split=0)\nX\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 41, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:27:27.875244Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:59]: \u001b[0m" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "import heat as ht\nimport sklearn\nimport sklearn.datasets\n\nX,_ = sklearn.datasets.load_digits(return_X_y=True)\nX = ht.array(X, split=0)\nX\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 59, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:27:27.874886Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:41]: 
\u001b[0m" + ] + }, + "metadata": { + "after": [], + "completed": "2025-05-19T19:27:27.893702Z", + "data": {}, + "engine_id": 3, + "engine_uuid": "b9f6f6e8-01c224a4024814eaffce2266", + "error": null, + "execute_input": "import heat as ht\nimport sklearn\nimport sklearn.datasets\n\nX,_ = sklearn.datasets.load_digits(return_X_y=True)\nX = ht.array(X, split=0)\nX\n", + "execute_result": { + "data": { + "text/plain": "" + }, + "execution_count": 41, + "metadata": {} + }, + "follow": [], + "is_broadcast": false, + "is_coalescing": false, + "msg_id": "09810356-47db3eecea6fcfe880a7f49c_231811_4", + "outputs": [], + "received": "2025-05-19T19:27:27.898332Z", + "started": "2025-05-19T19:27:27.879051Z", + "status": "ok", + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:27:27.875269Z" + }, + "output_type": "display_data" + } + ], "source": [ "%%px\n", "import heat as ht\n", - "X = ht.load_hdf5(\"/p/scratch/training2404/data/JPL_SBDB/sbdb_asteroids.h5\",dataset=\"data\",split=0).T" + "import sklearn\n", + "import sklearn.datasets\n", + "\n", + "X,_ = sklearn.datasets.load_digits(return_X_y=True)\n", + "X = ht.array(X, split=0)\n", + "X" ] }, { @@ -133,7 +281,49 @@ "cell_type": "code", "execution_count": null, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:3] relative residual: rank: 55\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:1] relative residual: rank: 55\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:0] hSVD level 0...\t processes 0\t\t1\t\t2\t\t3\n", + " current ranks: 55\t\t56\t\t58\t\t56\n", + "hSVD level 1...\t processes 0\t\t2\n", + " current ranks: 57\t\t59\n", + "hSVD level 2...\t processes 0\n", + "relative residual: DNDarray(0.0085, dtype=ht.float64, device=cpu:0, split=None) rank: 55\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:2] relative residual: rank: 55\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], "source": [ "%%px\n", "# compute truncated SVD w.r.t. relative tolerance \n", @@ -152,7 +342,47 @@ "cell_type": "code", "execution_count": null, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:2] relative residual: rank: 3\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:1] relative residual: rank: 3\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:0] hSVD level 0...\t processes 0\t\t1\t\t2\t\t3\n", + " current ranks: 8\t\t8\t\t8\t\t8\n", + "hSVD level 1...\t processes 0\n", + "relative residual: DNDarray(0.5713, dtype=ht.float64, device=cpu:0, split=None) rank: 3\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:3] relative residual: rank: 3\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], "source": [ "%%px\n", "# compute truncated SVD w.r.t. 
a fixed truncation rank \n", @@ -206,10 +436,24 @@ } ], "metadata": { + "kernelspec": { + "display_name": "heat-dev-311", + "language": "python", + "name": "python3" + }, "language_info": { - "name": "python" + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.8" } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 } diff --git a/doc/source/tutorials/notebooks/5_clustering.ipynb b/doc/source/tutorials/notebooks/5_clustering.ipynb new file mode 100644 index 0000000000..2a603bad6d --- /dev/null +++ b/doc/source/tutorials/notebooks/5_clustering.ipynb @@ -0,0 +1,776 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Cluster Analysis\n", + "================\n", + "\n", + "This tutorial is an interactive version of our static [clustering tutorial on ReadTheDocs](https://heat.readthedocs.io/en/stable/tutorial_clustering.html). \n", + "\n", + "We will demonstrate memory-distributed analysis with k-means and k-medians from the ``heat.cluster`` module. As usual, we will run the analysis on a small dataset for demonstration. We need to have an `ipcluster` running to distribute the computation.\n", + "\n", + "We will use matplotlib for visualization of data and results." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "4 engines found\n" + ] + } + ], + "source": [ + "from ipyparallel import Client\n", + "rc = Client(profile=\"default\")\n", + "rc.ids\n", + "\n", + "if len(rc.ids) == 0:\n", + "    print(\"No engines found\")\n", + "else:\n", + "    print(f\"{len(rc.ids)} engines found\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%px import heat as ht\n", + "%matplotlib inline" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Spherical Clouds of Datapoints\n", + "------------------------------\n", + "For a simple demonstration of the clustering process and the differences between the algorithms, we will create an\n", + "artificial dataset consisting of two circularly shaped clusters positioned at $(x_1=2, y_1=2)$ and $(x_2=-2, y_2=-2)$ in 2D space.\n", + "For each cluster we will sample 100 arbitrary points from a circle with radius $R = 1.0$ by drawing random numbers\n", + "for the polar coordinates $( r\\in [0,R], \\phi \\in [0,2\\pi])$, translating these to cartesian coordinates\n", + "and shifting them by $+2$ for cluster ``c1`` and $-2$ for cluster ``c2``. The resulting concatenated dataset ``data`` has shape\n", + "$(200, 2)$ and is distributed among the ``p`` processes along axis 0 (sample axis)."
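+ , + "\n", + "\n", + "Concretely, each sampled pair $(r, \\phi)$ is mapped to cartesian coordinates via\n", + "\n", + "$$x = r \\cos(\\phi) \\pm 2, \\qquad y = r \\sin(\\phi) \\pm 2,$$\n", + "\n", + "with the $+2$ shift for cluster ``c1`` and the $-2$ shift for cluster ``c2``."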
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%px\n", + "\n", + "num_ele = 100\n", + "R = 1.0\n", + "\n", + "# Create default spherical point cloud\n", + "# Sample radius between 0 and 1, and phi between 0 and 2pi\n", + "r = ht.random.rand(num_ele, split=0) * R\n", + "phi = ht.random.rand(num_ele, split=0) * 2 * ht.constants.PI\n", + "\n", + "# Transform spherical coordinates to cartesian coordinates\n", + "x = r * ht.cos(phi)\n", + "y = r * ht.sin(phi)\n", + "\n", + "\n", + "# Stack the sampled points and shift them to locations (2,2) and (-2, -2)\n", + "cluster1 = ht.stack((x + 2, y + 2), axis=1)\n", + "cluster2 = ht.stack((x - 2, y - 2), axis=1)\n", + "\n", + "data = ht.concatenate((cluster1, cluster2), axis=0)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's plot the data for illustration. In order to do so with matplotlib, we need to unsplit the data (gather it from\n", + "all processes) and transform it into a numpy array. Plotting can only be done on rank 0.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%px\n", + "data_np = ht.resplit(data, axis=None).numpy() " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:65]: \u001b[0m[]" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "import matplotlib.pyplot as plt\nplt.plot(data_np[:,0], data_np[:,1], 'bo')\n", + "execute_result": { + "data": { + "text/plain": "[]" + }, + "execution_count": 65, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:28:37.872913Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[output:0]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAiIAAAGdCAYAAAAvwBgXAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8fJSN1AAAACXBIWXMAAA9hAAAPYQGoP6dpAAA5TUlEQVR4nO3df5BV9X3/8ddlE1aJsHFxRWAvgjSTjNNqOyYmkDCFSGNtm+BsFo02SoxxqkEj0mAkJkE6dbCEBIwxUWMCnYnrL1iync5YjRbQjtFEE6apiU5kYIBFZYFmF0lniXfv94/zPbt3754fn/Prfs7d+3zM3Nns3XPvOffg5PO+n8/7834XyuVyWQAAABZMsH0BAACgcRGIAAAAawhEAACANQQiAADAGgIRAABgDYEIAACwhkAEAABYQyACAACseZftCwgyNDSkQ4cOafLkySoUCrYvBwAAGCiXyzp+/LhmzJihCROC5zxyHYgcOnRIxWLR9mUAAIAYDhw4oPb29sBjch2ITJ48WZLzQaZMmWL5agAAgImBgQEVi8XhcTxIrgMRdzlmypQpBCIAANQZk7QKklUBAIA1BCIAAMAaAhEAAGBNpoHI97//fZ133nnDOR7z5s3TE088keUpAQBAHck0EGlvb9ddd92ll19+WS+99JI+/vGPa8mSJXrllVeyPC0AAKgThXK5XK7lCVtbW/XNb35T1157beixAwMDamlpUX9/P7tmAACoE1HG75pt3y2VSnr88cd14sQJzZs3z/OYwcFBDQ4ODv8+MDBQq8sDAAAWZJ6s+utf/1qnnXaampubdf3112v79u0699xzPY9dt26dWlpahh9UVQUAYHzLfGnm5MmT2r9/v/r7+7V161Y9+OCD2rVrl2cw4jUjUiwWWZoBAFhVKknPPSe98YY0fbq0YIHU1GT7qvIrytJMzXNEFi9erLlz5+r+++8PPZYcEQCAbd3d0s03SwcPjjzX3i7dfbfU0WHvuvIsyvhd8zoiQ0NDo2Y9AADIq+5uqbNzdBAiSb29zvPd3XauazzJNFl19erVuuSSSzRr1iwdP35cXV1d2rlzp5588sksTwsAQGKlkjMT4rVuUC5LhYK0YoW0ZAnLNElkGogcPnxYV199td544w21tLTovPPO05NPPqm/+qu/yvK0AAAk9txzY2dCKpXL0oEDznELF9bsssadTAORH/7wh1m+PQAAmXnjjXSPgzd6zQAA4GH69HSPgzcCEQAAPCxY4OyOKRS8/14oSMWicxziIxABAMBDU5OzRVcaG4y4v2/aRKJqUgQiAAD46OiQtm6VZs4c/Xx7u/M8dUSSq1mvGQAA6lFHh7NFl8qq2SAQAQAgRFMTW3SzQiACAMhcnF4t9HdpDAQiAIBMxenVUk/9XQiYkiFZFQAwrFSSdu6UHn7Y+VkqJXu/OL1a6qm/S3e3NHu2tGiRdOWVzs/Zs/N1jXlX8+67UdB9FwBqJ+1ZiFLJGZT9yqQXCs777907MoMQ5zW2uAFT9Sjqbu3Nw64aW7M1ue6+CwDInyxmIaL0aknymjREnQkKa4gnOQ3xks4oJVEvszUEIgDQ4LIaVOP0arHR3yXOgJ11wJR0iayelrcIRACgwWU1qMbp1VLr/i5xB+wsA6akMxn1MFtTiUAEABpcVoNqnF4tYa+RnByHI0eiXYuXJAN2VgFTGjMZtpa34iIQAYAGl9WgGqdXS+Vr/JRK0mWXJV9eSDJgZ9EQL62ZDBvLW0kQiABAg8uyy2ycXi0dHdKjj4bv7ki6vJBkwM6iIV5aMxm1Xt5KikAEABpc1l1mOzqkffukHTukri7n5969wVtbf/vb4CDDZFAOS/hMOmCn3RAvrZmMLAPLLFBZFQAwPKh61RHZtCl5PYwovVq6u6U1a8yO9RuUTWqiuAN2b6/3cohbsyRowE6zIV5aMxluYNnZ6XyGys+WRmCZNgqaAQCG2S5XHlbQrNqOHWMDnCiFxtxjJe8BO+uiZJX3+8wzpc99zj8wkqS2NufeTJwY/t5ewVixmE5gGSbK+E0gAgDIjZ07ne2qJorFsRVW41RmtTVge5136lTp6NGxMxmVolS7rYfKqizNAAByI8pODq/lhSgJn+5MSprLK6b8Zm2OHXN+trY6AYkXdyuvyWxNlCUxWwhEAAC5YZonsXat9yAcN+GzlgN22DbdQkE65RTpjDO866W4x6xY4QRQecn1iItABACQG2EJpJLz99tv9/5bWgmf1bkbknT4cDqzJSazNr29we/hNbNTrwhEAAC5EbTjw9XZ6QzA1QFBqeQ8WltHljiqmeyE8crdqJSkI7GUbiGxvBQlS4I6IgCAXPGrz+EGHZs2je2/4vZnWbw4OAhxX+83o+FXYr1S0sZxaRYSy0tRsiTYNQMAyCV3eaSnxwkeqrmBxZe/LG3Y4L+U4wrbCRNl67DX7htT7nmC6pfMnOn87dCh4Boncc5fC1HGb2ZEAAC51NTkLKFs3er993LZeXz728FBSGur9PTT4dVcw3I3qs8dt3GcSSXbu++WvvOd4GPyVJQsCQIRAEBumQQHYf1mjh1zBuywQTtOvkXcHA2T8vBpl5DPK5JVAQC51dOTzvuYBAxx8i2S5GiY1C+xUeOk1ghEAAC5VCpJDz2UznuZBAwmW4ddJrtvTATVL6muinrZZeMrAHERiADAOGa7d0wSzz0n9fWFHzdhwki+SDWvgMHvnphsHXbfU8o2R8Okad94QY4IAIxT7pbWRYukK68cu+U170zzL/7mb5yfJkmdYffELy+jUnu79OijThLsww87/XHC8lSi8NtCnHTbcF6xfRcAxqEoHWjzyrQB3saNTnBwyy3Bjeui3JOgyqpHjow9V1qzFXGa9uUR3XcBoIElHczyspwTVm+jUnu7s423rc37utMa4LMO8EyDrx078l3anToiANDAonSgrZan5ZymJme2w+Trcm+vdPnlzlbdK65wBunKgCLJPXGFNauTnEZ0SZZp4jbtc5VKTjCTxZJRVghEAGCciTuY5S03obvbWQIxERYIJB3gpXSCmTBJmvZFDSLzErQQiADAOBNnMKvFt/0oTHq+VAsKBOLek8qBOqwjritJIzp3C3F14q2rUHByX6q3DUcNIvM088X2XQAYZ8LqYXhtaTX9tn/PPdK0adnkjri5Kb29zkxI3AxGr0Agyj0plaQ773SSTysb6LW1mZ0/SZGzoC3EftuGw4LIQsEJIpcscV7nl+fiBi21TmRmRgQAxhmTXibVg5npt/hbbslm2r/yG/pnP2tWP8SPVyBgek96epxAa82asV18jxwJPq/fbEVUUUu7R1kyytvMl0QgAgDjUtTBLM63+LSm/eMsw3hxZzVKJe8AKOyeSNKnPy0dPer9/kEzNGkXOevokPbtc3bHdHU5P/2a9kXJf6lFnktUbN8FgHHMdCtulK2ylaq3vUbd3hq2rTbKdZTL0tSpowMJt75HZb+W6rog7gxGlOs444zRMyQ2q55G2fL7xhtOcBimq8vZfRRXlPGbHBEAGMeCeplUH2dS3rxa5TfoBQui5S
[... base64-encoded PNG payload omitted: scatter plot of the raw 2-D data points, drawn as blue circles ...]", + "text/plain": [ + "
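Matplotlib itself is not MPI-aware, so the usual pattern is to gather the distributed `DNDarray` into a local numpy array first and then draw the figure on a single process. Below is a minimal, self-contained sketch of this gather-then-plot pattern; the example array `data` stands in for the one used in this notebook, `comm.rank` and `ht.random.randn` are standard Heat API, and writing to `data.png` is just one way to avoid opening a window per MPI rank.

```python
import heat as ht
import matplotlib.pyplot as plt

# distributed stand-in for the notebook's data, rows split over processes
data = ht.random.randn(200, 2, split=0)

# .numpy() collects the full array on every process (and copies from GPU if necessary)
data_np = data.numpy()

# draw on one process only
if data.comm.rank == 0:
    plt.plot(data_np[:, 0], data_np[:, 1], 'bo')
    plt.savefig("data.png")
```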
" + ] + }, + "metadata": { + "engine": 0 + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px --target 0\n", + "import matplotlib.pyplot as plt\n", + "plt.plot(data_np[:,0], data_np[:,1], 'bo')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we perform the clustering analysis with kmeans. We chose 'kmeans++' as an intelligent way of sampling the\n", + "initial centroids." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:0] Number of points assigned to c1: 100 \n", + "Number of points assigned to c2: 100 \n", + "Centroids = DNDarray([[ 2.0113, 1.9847],\n", + " [-1.9887, -2.0153]], dtype=ht.float32, device=cpu:0, split=None)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:3] Number of points assigned to c1: 100 \n", + "Number of points assigned to c2: 100 \n", + "Centroids = \n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:2] Number of points assigned to c1: 100 \n", + "Number of points assigned to c2: 100 \n", + "Centroids = \n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:1] Number of points assigned to c1: 100 \n", + "Number of points assigned to c2: 100 \n", + "Centroids = \n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "kmeans = ht.cluster.KMeans(n_clusters=2, init=\"kmeans++\")\n", + "labels = kmeans.fit_predict(data).squeeze()\n", + "centroids = kmeans.cluster_centers_\n", + "\n", + "# Select points assigned to clusters c1 and c2\n", + "c1 = data[ht.where(labels == 0), :]\n", + "c2 = data[ht.where(labels == 1), :]\n", + "# After slicing, the arrays are no longer distributed evenly among the processes; we might need to balance the load\n", + "c1.balance_() #in-place operation\n", + "c2.balance_()\n", + "\n", + "print(f\"Number of points assigned to c1: {c1.shape[0]} \\n\"\n", + " f\"Number of points assigned to c2: {c2.shape[0]} \\n\"\n", + " f\"Centroids = {centroids}\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's plot the assigned clusters and the respective centroids:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%px\n", + "# just for plotting: collect all the data on each process and extract the numpy arrays. 
This will copy data to CPU if necessary.\n", + "c1_np = c1.numpy()\n", + "c2_np = c2.numpy()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[output:0]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "[... base64-encoded PNG payload omitted: k-means result, the two clusters drawn as orange and grey crosses with the centroids marked as triangles ...]", + "text/plain": [ + "
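Slicing with `ht.where` selects a data-dependent number of rows on every process, which is why the result of `data[ht.where(labels == 0), :]` is generally distributed unevenly and `balance_()` redistributes the rows in place. A self-contained sketch of that effect follows; it is meant to be launched through MPI, the labels are synthetic, `lshape` reports the process-local shape, and the script name is arbitrary.

```python
# balance_demo.py -- sketch; run with e.g. `mpirun -n 4 python balance_demo.py`
import heat as ht

data = ht.random.randn(200, 2, split=0)    # rows distributed over the processes
labels = ht.where(data[:, 0] > 0, 1, 0)    # synthetic cluster labels

c1 = data[ht.where(labels == 1), :]        # data-dependent selection
print(f"before balance_: {c1.lshape[0]} of {c1.shape[0]} rows on this process")

c1.balance_()                              # in-place redistribution
print(f"after balance_:  {c1.lshape[0]} of {c1.shape[0]} rows on this process")
```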
" + ] + }, + "metadata": { + "engine": 0 + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px --target 0\n", + "# plotting on 1 process only\n", + "plt.plot(c1_np[:,0], c1_np[:,1], 'x', color='#f0781e')\n", + "plt.plot(c2_np[:,0], c2_np[:,1], 'x', color='#5a696e')\n", + "plt.plot(centroids[0,0],centroids[0,1], '^', markersize=10, markeredgecolor='black', color='#f0781e' )\n", + "plt.plot(centroids[1,0],centroids[1,1], '^', markersize=10, markeredgecolor='black',color='#5a696e')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can also cluster the data with kmedians. The respective advanced initial centroid sampling is called 'kmedians++'." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:0] Number of points assigned to c1: 100 \n", + "Number of points assigned to c2: 100 \n", + "Centroids = DNDarray([[ 1.9905, 1.9855],\n", + " [-2.0095, -2.0145]], dtype=ht.float32, device=cpu:0, split=None)\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:1] Number of points assigned to c1: 100 \n", + "Number of points assigned to c2: 100 \n", + "Centroids = \n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:3] Number of points assigned to c1: 100 \n", + "Number of points assigned to c2: 100 \n", + "Centroids = \n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:2] Number of points assigned to c1: 100 \n", + "Number of points assigned to c2: 100 \n", + "Centroids = \n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "kmedians = ht.cluster.KMedians(n_clusters=2, init=\"kmedians++\")\n", + "labels = kmedians.fit_predict(data).squeeze()\n", + "centroids = kmedians.cluster_centers_\n", + "\n", + "# Select points assigned to clusters c1 and c2\n", + "c1 = data[ht.where(labels == 0), :]\n", + "c2 = data[ht.where(labels == 1), :]\n", + "# After slicing, the arrays are not distributed equally among the processes anymore; we need to balance\n", + "c1.balance_()\n", + "c2.balance_()\n", + "\n", + "print(f\"Number of points assigned to c1: {c1.shape[0]} \\n\"\n", + " f\"Number of points assigned to c2: {c2.shape[0]} \\n\"\n", + " f\"Centroids = {centroids}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Plotting the assigned clusters and the respective centroids:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%px\n", + "c1_np = c1.numpy()\n", + "c2_np = c2.numpy()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[output:0]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAiIAAAGdCAYAAAAvwBgXAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8fJSN1AAAACXBIWXMAAA9hAAAPYQGoP6dpAAA9EElEQVR4nO3df5QU9Z3v/1f3TJiRACNDJirLEBl+KUk052oU0O8uCCvCmkSu+s16NhFdL3c1kIjkZCPRxJsTCWbXo7hqjNG7kntXjYkE3cNXRMVfMYD4I5wov38lIARB0BlgZSYzXd8/Zqqorq7urqqu6qrufj7OYaVn+kd1k7OfV38+78/7kzIMwxAAAEAM0nFfAAAAqF0EEQAAEBuCCAAAiA1BBAAAxIYgAgAAYkMQAQAAsSGIAACA2BBEAABAbOrjvoBCMpmM9u3bp4EDByqVSsV9OQAAwAPDMHTkyBENHTpU6XThOY9EB5F9+/aptbU17ssAAAAB7NmzR8OGDSt4n0QHkYEDB0rqfSODBg2K+WoAAIAXHR0dam1ttcbxQhIdRMzlmEGDBhFEAACoMF7KKihWBQAAsSGIAACA2BBEAABAbCINIg888IDOOussq8ZjwoQJWrFiRZQvCQAAKkikQWTYsGG644479NZbb+nNN9/URRddpK985SvasGFDlC8LAAAqRMowDKOcL9jc3Kx//dd/1XXXXVf0vh0dHWpqalJ7ezu7ZgAAqBB+xu+ybd/t6enRr3/9ax07dkwTJkxwvU9nZ6c6Ozut2x0dHeW6PAAAEIPIi1XfeecdDRgwQA0NDbr++uu1bNkyjRs3zvW+ixYtUlNTk/WHrqoAAFS3yJdmurq6tHv3brW3t+vJJ5/Uww8/rFdeecU1jLjNiLS2trI0AwBABfGzNFP2GpGpU6dq5MiRevDBB4velxoRAECcjr9wt5RKq3HKjbm/W3WPZGTUOPWmGK4s2fyM32XvI5LJZLJmPQAASKxUWp0v3NUbOmyOr7pHnS/cJaVox1WqSItVFyxYoOnTp2v48OE6cuSIHnvsMb388stauXJllC8LAEAozJmQzhfusm6bIaRh6nzXmRL4E2kQOXDggK6++mr9+c9/VlNTk8466yytXLlSf/u3fxvlywIAEBp7GOl86T6pp4sQEqKy14j4QY0IACAp2m8dLfV0SXX91HT7trgvJ9ESXSMCAEClOb7qHiuEqKcrp2YEwRFEAAAowF4T0nT7NjVMne9awIpgytZZFQCASuNWmOpWwIrgCCIAAORjZFwLU63bRiaGi6ouFKsCAIBQUawKAAAqAkEEAADEhiACAIjM8Rfuzru75Piqe3rPcgnhMahcBBEAQHSCnNVSIee7EJjCwa4ZAEBkp8wGOaulYs536QtMUvYWXvu1ojiCCAAg0kE1yFktlXC+S5IDU1TBMgoEEQBA5INq45QbrUChun6eni/IY/wqdcBObGCqoNmaZCy0AQBi1zjlRqt9efuto0P9Zh/krJaynO8SQj1K45QbrWssNTCFVXdi/7c0ny8pszVOBBEAgCXMQdUU5KwW8zF1bRNdHxNWMWgYA3aogSnEQt0og2WYWJoBAFjcBtWSvuEHOKvFHkJ6dq7OuobOF+5S98616tm5OrTlhVKWV5zvzwoMLu/L77WYt0uZySjH8lapCCIAAEnhD6qSgp3VYnuM8xrsISTMQTXIgB3VgXhh1p2EHSyjQBABAEQ3qHoo9Cz4eJdBua5tQuiDaaABO8ID8cKYyYgkWEaAIAIASNYps44dH1mDsqT6tgkFH+53J0zQAbvUkFVIqTMZUQXLKBBEAACRDqp+OQdMSVYI8cTH1tUkDtihzGQkKVgWQRABACSOWxgxA0SxQdlXwWdMA3a+WZsThboTJCOTU6hrfz+FepwkKVgWQxABACSfo07CTxgpVPAZ24CdZ9ame8dqSSeWn3Lu0xdOktaUrBQEEQBAIpmDsrNOwutsRZK3ruab5ejZtTYnMCWxhXyYCCIAgMRxDsrOOomgjcaSNHh7mbVJbAv5ENFZFQCQKPkKSL10ZHV7Dq/dXKXsFuvOduv2bq5hdnYt1sk2im63ScKMCAAgWRwFpDmFnbYlmeOr7lH3jt+pfuQFJ+7vqKE4/sLdVi1I0d0nttqNrL/3PbZh6vxQazS8zNokfWanVAQRAECi5BSQFtmOW9c2Mfv3fUFGUlZg8FJbYl8KaZg635pJkZTznGH0Cim2TbdSmpKVgiACAEg0L9txswboqTflLer0281Vdf2sn4dZo+Glf4n59yT1OIkCQQQAkFjOZZnsVu8TrdmNsIs6nTtuJIVbo+Gxf0mlNCUrBUEEAJBcjmUZKxyk6tSzc7Xq28Zbdw1zu66zLkNSqDUaYZ3BUw0IIgCAxLLPdHTvXGuFEBk9qmubmFMzEkZRp7MY1aoRmTw363Y1hYE4sX0XAJBojVNuVF3bRPXsXJ0VQnp2rj6x1Tbgdl0ntxDiVrQa5Lk9X4Nj27Dz+sLYNpwkzIgAQBXyewJtkh1fdU9WCFFdPw2Y/bgVGrp3rlXPztVFizq9fCbSibqM4y/c7Xo2jfW5RVWj4ePQvmpAEAGAalRNg5mROTEj4rLs0r3jd96KOj18Jlk7bBxBzQwn+ZZ8wgp4vg7tqwIEEQCoQlU1mKXS1oyHjIy6d63LDhS2nhvOIJCvXbp52/dnUqaAVwut3U0EEQCoUn4HsyQu57j1C+nZuTqniZnXIFDqAF/OgJfkQ/vCRLEqAFQxX+eU9H3bdxZKWs3CUuUbMqyCTUe/DXvhqtlHxG8QKPXsFvu5N+23jo5slsltF1De+1ZwgStBBACqmJ/BzO1gudiWc8wlEMcMjX1GpGfnanW+/FPf1+fnM8kn6oPofO8C8hEikxZaWJoBgCoV5JySpNQmeFkCab91tO8g4Ol8Fy+7a1LpyA6i89L+PV9hrqclo4QVMhNEAKAKBRnMTDntzfuWP6KqHck38DdOuVHdO9e6hqIgzcuKfSbdO9eovm2C60B9fNU96t65Rj0711izMZEdROex/buT1xCZtEJmgggAVKOAg5mUu3TRvWtd79ZZFf8GHajgtcA3dKt/iG3mI/CJtEU+k+4dq10bmEknBm1nCLE/PqwwUkr7d68FrkmZ+ZIIIgBQlYIOZvkG+UK7VEqd9i/0Dd3ZP+ToQ1e5BgEzRLi9Py+zNuY2YPt1OsOIuX24vm18oIBXDn5mipKyK4cgAgCQVHzpwgwjUUz7u31Dz7cE4jxjRpLqR05Uz6616t65xuorIklHH/p79exc4xqA7N1UzVmcrOs3T92VTszGJHj3id+ZorDO5ikVQQQA0MvDck7Pn96MbNo/6xt63+m6+UKRc9B0+13v0s4a19fKd7CdVYhqP3W3b2no6ENXSTKs50xCoafb6+dbMjLfm1s4PPrQVbEd5kcQAQBIKr6cE/W0v/X8qbTr6bqmuhHjXZdA8gUg82f29+E2Q5O1FGTT0Fc0a24bti/ZxF3oafFSE9S3bOY8mydfk7hySRmGYZTt1Xzq6OhQU1OT2tvbNWjQoLgvBw
BqVr5p/3yDr7Us0Bdaig3S+WYo7I/Luo/t271T+/farMPxmm7f5vl67EsZkpQa3Kp+51zpWqhqXaPH95cU9uUt+8GBWbuRQuig62f8ZkYEAFCQ363AQWoV8s1QOHeuuD2nPZQcfeiq3hCilLWcUj/iPDVOvcm27OMeYLKlZHy4p/cxferbxqu+bbw1UCeh0LMQtx1M9n+39u+NcO1cW24EEQBAYT62AgfqX+Ly/FnLLKvukYyegkW0PTtXq+utX8v4cI9Sg1ut/5rLLEcf+nupp0sv//G4bn3xQ/3k5G9oxj//1LpmGRl171zb++JmfYht23DD5LlZMwVJKfQsKM8OJouRSUSIIogAAArytRXYFirs38idocW+BJDv+YvWmRgZW01H7wyGGgdp0D+/po5/uTAnjBiGoYWvd+vdA3/R/7rnf2vyfxujnl2vn2gZv2ttVvFmVg8T2+cQuI9JmRXawSQpMSEq0iCyaNEi/eY3v9HmzZt10kknaeLEifrJT36isWPHRvmyAICYZIUKxzdy11qPAorOOqTSJ8KC0dP3oA61LzhdkmHNjJhe3N2jN3Yd1je+OFA/feOIVjz0Y01pO8kKK3UjxucUbzqLOCX5n/GJkVsBr6REhahID7175ZVXNGfOHK1du1bPP/+8/vKXv+jiiy/WsWPHonxZAEACNE4Jfoiel0PfGqfc2HcCb19NiMWQlNKgf37txE8MQ3e88oG+OLSfFl50sr44tJ/ueK1dRsNApQcPk9Tbi8T+ugNmP66GqfNVP+I86/W7d6zOu0xlFdEmTOOUG7O2IztDVNED9SIW6YzIs88+m3V7yZIl+vSnP6233npLf/3Xfx3lSwMAkqBv+STveTEuOzS81pk4Zy4cL9w3M9LrxV3H9ca+Li39f1uUSqV084VNuvxXB/XinoymdK7JWk4qWLxpZIr2T/EjUEt8v69hbYu2zRzZxN0Vtqw1Iu3t7ZKk5uZm1993dnaqs7PTut3R0VGW6wIARMS+fJLnvJgcHopjc07iNbfsZj9R7/81DN3xWru+OLSfLhrRKEm6aERj76zIszv0t7fNPBE8grbGDxooIj4J12s9S9XWiNhlMhnNmzdPF1xwgT73uc+53mfRokX64Q9/WK5LAgDkEdY39UZbMzD7llpnQy3783kJA/aZC2vLbp5v/C8eHKg39u2xZkMkZc2KvHz4ZH3J5/vKETBQFCooLbU3SSknMJdT2YLInDlz9O677+q1117Le58FCxZo/vwT/1gdHR1qbW0tx+UBAOxC+qZuLp+cKBxNWcspRWdGCrDvYDGfb8Dsx0/sdpGkun5KDz9Hd/yfZVmzISZzVuRH9z6iSc0fKbOrd4kmSCApJVDk6whbckgo4QTmcipLEJk7d66WL1+uV199VcOGDct7v4aGBjU0NJTjkgAABYTxTT3/8klvGHE7SdcPt2WHrFqRni69fPjkrNoQO/usyPOrXtTFUy/Kan/ufC1PJ/gqWKAI0hK/6HMGXGYqt0h3zRiGoblz52rZsmV68cUXNWLEiChfDgAQIvuOivZbR/tfLrD3FOlrSqZUncxdLUFCyPEX7j6xu8Pxjb/rzV9J6t0V0jB1vtKnn68f3fuI62yIyaoVea1dPYf39M7enDzMdRZIqeJDZtYOFR+Bwm2rcq2INIjMmTNH//Ef/6HHHntMAwcO1P79+7V//359/PHHUb4sANSkrEHa+btV9wQ6wj7owCr1fiN3zqQ0/XjniTCSqvP/zbxvyej4qnus55ekow9dJeOj95Qa3Gr1LFl9+nV6Y1+Xbr6wKWc2xHq6vlmRN/Z1adXbvefSGB+9l3e7cbHP+OjPv+o7UHjZqlz0OSL4ty+XSIPIAw88oPb2dk2aNEmnnXaa9eeJJ56I8mUBoDbZBmk7P9/onYp9Uy86MD/09y6Hqp0oLD360FW+rqdxyo2qGzE+pzeJvQ7l6M+/KsMw9L9+8H198a8a886GmOyzIv2m3GRtN3adBXL5jI+/cLeOPnSVOl+4y+rOag8URx/6+7yvna+g1HcYieDfvlwirRFJ8MG+AFB1wt6B4WnrZ5Gi1roR4/Oe2GvWiPhtMV4/srcdu7MWo3vHavV8uEc9u9bq6XmTtHbdG661IU72WpGVy5/S5JN25Ww3Nrl9xt271lm1KW6fc8/ONfnfY0gFpVHuvokaZ80AQBUJaweG162fXgdAt5/Zj6G3P7+X92htCTZnayT17Oo9tM4wDP34ybUacXK9hvSv0/r9XUWfc0j/Oo04uV4//vUaTf7WudJH7+VtLe/2GadOHibjo/dcP7/ei3IPFGEWlEa2+yZiBBEAqDJBd2Bk9Q5xfFPP2TViG1g9DYAhbyWtbxt/YodMT1f2Kbw90r4jPdp7pEeTluz39bx/ydSr84M96j/mAtW3jVf3zjV5G4DZP+NB3/2dFT7iDAFR7L6JGkEEAKpM4CPq7csstm/qzm/3bs9VbAAM85u//Xo6zZoTydqFU7dzrVZ+/VV98F+94abur85S//++KOs5/us3N6tn7zuq+6uz1LP3D9bPW/qn1X/MBZKRsV6jvm1CThjJaptu+4ytzyBVF0ufjsD/9jEiiABAFSnliPpS6gzKNQDmLnlkd1I1l2yGNTVo2KC+otiezWr48LfZy0Q9W9TwtZv73u/mnNcxi06dszr2a3Ceztu9c23WmS7du9aF/v4LKeXfPk4EEQCoEmG09A5SZ1DWAbBvice8RuvAu77BP+sQvL5QZAYF63ocz2H/uyTX/ib5al3M26nBrVk7d8xrKNeMRKW0c3eT3P08AAB/CtRh+Dmi3k/vENcZE9uJu27bSUvpaWEu8dhDiNmfJDW490gQ62d9W2jNcGK+f/tzmNdtveciunf8zmpPL/VtJ26b2Ne+XjI+3KOGqfM1YPbjgfqBBBbSv30cCCIAUCXsDb5yfjflRs9np/jq8uk2APaduGsf/M3nDaWnRV/Qcc5c9Dvnypy7mgNx76xJOus5copxbTtwunesznkuSaofeYE102H9bMR5kvq2CNuatJUzBIT1bx8HlmYAABa/yyxuA5x9SaC+bbzr85aicepNOv7C3apvG++6M8f8u/N6sn6WpxjX/p5dl1Vssz3mc/fWgvS2rZfRk/W4ci6HhHVicrkRRAAAksKtM4i6p4VrAPKwM8c5WDuLX4+/cHfW0o39sZKyZns6X7jL2rVjrw2JrSajb9dT9861GjD7cevH9uJa+/tLCpZmAAC9Qq4z8FNrUjbOVuiOwlVz+Sbfe85a6umbAZFS8dWGOK7NXLIyW+c7d/gksdV7ykhwH/aOjg41NTWpvb1dgwYNivtyAAA+WMs6fWEkKV0+8y3F+Lk+sz29qa5tYs4sRFxLIda19e0kcquniZqf8Tt50QgAUPHCOFE2KuasRt6D7YowD9mzF6jaC1iDhJAwT88dMPtxK4RI7tuRk4QgAgAIVWgnykYo6LKRvW+IZGTNOvSetHtVsJ1BIZ6ea51wbD13XWJDiEQQAQCErQJ6WvjaomzXt2vGLEy1L32YTc2CzD64BbUgS0b2m
hBJVlAya0aSiBoRAEBNKVYjUmgbrFl/kbO92TYzYq8VCXptQepqnIWp5mPNay712vygRgQAABeelo0KLJOYA3rW9ua6fr1LIam6vuZmwZW00yhPo7cBsx/PajmfNPQRAQDUjgLLRubv3fqI5FsmcS7xlLo9tpTDA/M2elNvGDGLaJOGpRkAAFwUWyYJYxuw2+uF9Xxx8jN+MyMCAFWoUtt9J0njlButrrDOZZKwT7ut5NNzS0UQAYBq1FfnICnvAIrCCi6TeFji8SXs56sgLM0AQJWqpqn+cuOzKw1LMwCAyA+eq1ZJXybxsuwmqWKW5ti+CwBVLJEHz4UszPbokpLfkM1LF9YQO7VGjRkRAKhipWwHrRgh18MUmikI87MzZzZkZHJmL6yZjb7f26/JbXYm39KRl/vEjSACAFUqX52DVF07MPwMzInSF6DMZmNS9rXbm5M5eVl2q5SlOYpVAaAKFWrAlfgBOqBS2qPHxRk6nP8t9h7abx1tzXg13b4t8H3CRot3AKh1Sa9zCIm9PsRZD2P+PsnMf4+enaulVF3Wf4uFEC8H9wU+3K+MWJoBgCpUrjqH2NnqQyRlDbqV0i8lq3Ga1HtuTZHCYi/LbpWyNEcQAQBEJmiHV6+Pc9aHmMEjK5wknDVr0XeCr1J1BQuLvWwvNv+e1C3IdgQRAEAkjr9wt7p3rcsqxDSZR9PnnbEIuBPGXpQpJW/QdSpUI5L32j12Ya2UTq0EEQBANFJp10HVDCF1bRPzBgRfO2H6BuZ858IkadC1c9sd49w14xZGSl12S1ooY9cMACAyzsHWXH6oa5uoAbMf9/z4YjthKnLHTMA+IpXAz/hNEAEARMpeJClJStWp6cc7PT/evv20YdI3XAdtM+zUjzjPWtaphDBSrdi+CwBIjMYpN/bOhJiMHs/bSJ3bT7t3rctqXZ4745K2tsS6tThH8lAjAgCI1NGHrjqxG6RvWca1ENWxU8ZZmNq9c01OzYmMjGvzryQWZcIdQQQAEBl7YeqA2Y8XLsR09ASxhxDz7/VtE7IfX6AmJEkn5NpvO7ctJ+003HIjiAAAInF81T1ZIUTK3g1T1zYxa8Yi63cjxueEEOdOmJ4/vZncU4Wd24/7bnfvXJu1bTnowXzVhCACAIiGh34XzlkAexjp2f123sPckn6qsNv2YzOEmNuWq/ncHz8IIgCASATud2FkrO6iztmO46vu6asVWZP41uVup9+a9SzmTqBaDyESu2YAAAnTvWtdTqtz6cQyhj2ESEr0LhnnQXwDZj+edbvWQ4jEjAgAIEHsdSX2HTLWssaI8aofmduRNam7ZJxLSEcfuirRS0pxIIgAABIh72mxqbqcolc3SRvQne8n3w4iKXnXXk4EEQBAMjiKWxun3Hji/JhUXW/X1ArhFqrsMz32mZBaDyMEEQBAIjiLW53LGkpVUFmjc8eQ7bZ1joySu6RUTpw1AwBInHzLNOwyqQx+xm9mRAAAieIWOljGqF4EEQBAsnhohIbqEemC26uvvqovfelLGjp0qFKplJ566qkoXw4AUAUap96Ud8ajccqNNXsmS7WKNIgcO3ZMZ599tu6///4oXwYAAFSoSJdmpk+frunTp0f5EgAAoIJV0F4oAABQbRJVrNrZ2anOzk7rdkdHR4xXAwAAopaoGZFFixapqanJ+tPa2hr3JQEAgAglKogsWLBA7e3t1p89e/bEfUkAACBCiVqaaWhoUENDQ9yXAQAAyiTSIHL06FFt377dur1r1y6tX79ezc3NGj58eJQvDQAAKkCkQeTNN9/U5MmTrdvz58+XJM2aNUtLliyJ8qUBAEAFiDSITJo0SQk+Uw8AAMQsUcWqAACgthBEAABAbAgiAAAgNgQRAAAQG4IIAACIDUEEAADEhiACAABiQxABAACxIYgAAIDYEEQAAEBsCCIAACA2BBEAABAbgggAAIgNQQQAAMSGIAIAAGJDEAEAALEhiAAAgNgQRAAAQGwIIgAAIDYEEQAAEBuCCAAAiA1BBAAAxIYgAgAAYkMQAQAAsSGIAACA2BBEAABAbAgiAAAgNgQRAAAQG4IIAACIDUEEAADEhiACAABiQxABAACxIYgAAIDYEEQAAEBsCCIAACA2BBEAABAbgggAAIgNQQQAAMSGIAIAAGJDEAEAALEhiAAAgNgQRAAAQGwIIgAAIDYEEQAAEBuCCAAAiA1BBAAAxIYgAgAAYkMQAQAAsSlLELn//vt1+umnq7GxUeeff77WrVtXjpcFAAAJF3kQeeKJJzR//nzddtttevvtt3X22Wdr2rRpOnDgQNQvDQAAEi7yIHLXXXdp9uzZuvbaazVu3Dj97Gc/U//+/fXv//7vUb80AABIuEiDSFdXl9566y1NnTr1xAum05o6darWrFmTc//Ozk51dHRk/QEAANUr0iDywQcfqKenR6ecckrWz0855RTt378/5/6LFi1SU1OT9ae1tTXKywMAADFL1K6ZBQsWqL293fqzZ8+euC8JAABEqD7KJ//Upz6luro6vf/++1k/f//993Xqqafm3L+hoUENDQ1RXhIAAEiQSGdE+vXrp3POOUerVq2yfpbJZLRq1SpNmDAhypcGAAAVINIZEUmaP3++Zs2apXPPPVfnnXeeFi9erGPHjunaa6+N+qUBAEDCRR5EvvrVr+rgwYP6wQ9+oP379+sLX/iCnn322ZwCVgAAUHtShmEYcV9EPh0dHWpqalJ7e7sGDRoU9+UAAAAP/Izfido1AwAAagtBBAAAxIYgAgAAYkMQAQAAsSGIAACA2BBEAABAbAgiAAAgNgQRAAAQG4IIAACIDUEEAADEhiACAABiQxABAACxIYgAAIDYEEQAAEBsCCIAACA2BBEAABAbgggAAIgNQQQAAMSGIAIAAGJDEAEAALEhiAAAgNgQRAAAQGwIIgAAIDYEEQAAEBuCCAAAiA1BBAAAxIYgAgAAYkMQAQAAsSGIAACA2BBEAABAbAgiAAAgNgQRAAAQG4IIAACIDUEEAADEhiACAABiQxABAACxIYgAAIDYEEQAAEBsCCIAACA2BBEAABAbgggAAIgNQQQAAMSGIAIAAGJDEAEAALEhiAAAgNgQRAAAQGwIIgAAIDYEEQAAEJvIgsjChQs1ceJE9e/fXyeffHJULwMAACpYZEGkq6tLV155pW644YaoXgIAAFS4+qie+Ic//KEkacmSJVG9BAAAqHCRBZEgOjs71dnZad3u6OiI8WoAAEDUElWsumjRIjU1NVl/Wltb474kAAAQIV9B5Oabb1YqlSr4Z/PmzYEvZsGCBWpvb7f+7NmzJ/BzAQCA5PO1NPPtb39b11xzTcH7tLW1Bb6YhoYGNTQ0BH48ymvp8hVKp9OaOWNazu+WPbNSmUxGl186PYYrAwBUCl9BpKWlRS0tLVFdCypMOp3Wk8ufkaSsMLLsmZV6cvkzuuLSGXFdGgCgQkRWrLp7924dPnxYu3fvVk9Pj9avXy9JGjVqlAYMGBDVy6KMzPBhDyP2EOI2UwIAgF3KMAwjiie+5ppr9Itf/CLn5y+99JImTZrk
6Tk6OjrU1NSk9vZ2DRo0KOQrRFjM8FFfX6fu7h5CCADUOD/jd2S7ZpYsWSLDMHL+eA0hqBwzZ0yzQkh9fR0hBADgWaK276IyLXtmpRVCurt7tOyZlXFfEgCgQiSqoRkqj7MmxLwtiZkRAEBRBBEE5laY6lbACgBAPgQRBJbJZFwLU83bmUwmjssCAFSQyHbNhIFdMwAAVJ5E7JoBAAAohiACAABiQxCJ0dLlK/JudV32zEotXb4i1McBAJA0BJEYmWe1OEOFuRslnXb/5wn6uHIiLAEAvGDXjEdRnDQb9KyWSjjjhQPxAABeEEQ8impgtYeKp55d6fmslqCPK5ckh6UoQiUAIBiCiEdRDqwzZ0yzwoSfs1qCPs6rUgfspIYlZmsAIDniLyaoIDNnTNMVl87Qk8uf0axvzQ/t233Qs1qiPuMljFqUsA/EC6P2xP7vaD5XUmZrAKDWMCPiU9izEEHPajHvN27MaN0yb27O48JYYghjFsgtLJXymYU1m5HU2RoAqDUEEZ/CHFiDntViDyEbt27LuoYnlz+jjVu3aePWbaEsMZQyYEdxIF6YS2RRL20BAIojiPgQ9sAa9KwW++Oc12APIWENrEEG7CgPxAtrNiPs2RoAgH8EEY+iGFi9FHoW4zYonzl6VKgDapABO+oD8UqdzYhitgYA4B9BxKMknTTrrJOwD8qS9NmxY/I+1u9OmKADdhghq5BSZjOinK0BAPhDEPEo6oHVD+egKckKIcX4KfZM6oBd6mxGkkIlANQ6gkiFcgsjZogoNCj7KfaMc8DON3NjXuuZo0cpk8nkFOra31O+XUNJCpUAUOsIIlXCWSfhNYwUKvaMc8DON3OzYctWSSeWn5z3McMJjckAoDIQRCqYOSg76yS8zFgkfetqvlmOTdu254SmJLaRBwB4QxCpUM5B2VknUWwQroStq15mbmhMBgCVjSBSgfwUkbrVWtgbop0xamTeZRA39udzPre9LiOsw+O8zNwkfXYHAJAfQaQCOYtInYHAviSzefsObdy6TVL20oXZlXXcmNG+dsLYQ4v97+bjr7h0Rqg1Gl5mbiphdgcA4I4gUoGcswyFtuSaYcP8fSaTsUKI24xKsZ0w9tByxaUzrMPjpOxdO2EdBlhsmy6NyQCgshFEqoCXLbnmbXPWoNC2XD+vV19fZ/08zBoNL8tP5t+T1ucEAOAdQaSC2ZdkirV6D7uOwvl8kkKt0fDaw4TGZABQ2QgiFcxPq/ew6yiczyflbiMuRak9TJgJAYDKQBCpYF5bvYddR+EsRjWf67JLyrMs4ve8HABActVsEKmWwaxYq/eNW7flLUwNEhjcQojz7/YC1ijCiJ/zcgAAyVazQaRaBzO3Vu/2LbomtzoKL+FMOlGXsXT5CtfzacwAF1WNhp/zcgAAyVazQaSaBrOgrd6d79FLOLP/3DljZIaTfDUiYc400VEVAKpDzQYRqToGM2er99vvvjdvq/diQSCMcFbOmSY6qgJA5avpICJ5H8ySVlOydPkKq2uqPSR8duwYbdq2vWDjr0JKDWflnGmioyoAVL503BcQN7fBzI35Td/5e3OQTafL+1Gm0+mcFu1O5pKN3yAwc8Y06/MIMtMwc8Y0q2B11rfmRxZCzOf9xb/dZb1evn8/c8ko33MtXb4itGsDAHhX0zMifra1Jq2mxH495kyA246WWd+a73tWI4yZhiiXTfwc+mfyumSUtJkvAKh2NRtEggxmcdSUFBoYJenM0aPyXk+QIOAlnHkZrNPpdGTLJl67rrr9rliQrNbdVACQVDUbRIIMZubvnQN8lN+ivQyM23btygkcQWY1ioWzjVu36ZZ5c3OuyR4+7Cf7RnUQXdCuq16CZNJmvgCg2tVsEAk6mLkN8FFO+xcbGCW51rgE6aRaKJyZjdHsgcbeMM0MH4VO9k3CQXReloyqYTcVAFSKmg0iQeRbtnDrJhrmtH++gdH8mVvgcAaBDVu25g0CxWZtli5foXFjRmvcmNFZz2GGkFRKVvjIZDKeG6jFwetMEVuDAaA8an7XjFf5li3sAaTYThH7/c2ZC6/T/s6dLJJcr+fM0aNcH28egGfupJF6A8bCxfe57vqx7ySxByj7e9y4dZskyTB6m6mZyzNJLfT0s9PG624qAEBpmBHxyEtNyeWXTo9s2t85MG7YstX1cbfe9M2sduxur2vOAph9SJwzGM4ZGvtjzxw9yipENZnX9NvX1+ngocNZj3F7vjh4KU42g5T5M+dM04YtW3XrTd+M5w0AQJViRsSjyy+dnjcszJwxTZdfOt3zt2i/fTqc3+TPHD1Km7Ztz3tfM4Q4X98+I/O1OfOyajqKzdCYj920bXtWyBk3ZrR+8W93adyY0Tp46LBahjQHmvGJWqEgaS4pmTM/+a5307btzIwAQMiYEQmJn54kfna0uA3kXrunFioONQxD9fV1umXeXOtxfgszW4Y0a+PWbVq4+L6cgtWkFXo6l4TshcP2a9uwZas2bduuDVu2uhYGx13jAgDVhiASAj89SfwEFsn9m7z9ud0GTPt9zfuYNSLma6XTKXV392jh4vt0xqiRVihKp1Oug639OlOplAzD0MFDh5VOp6zwYYaaTCajrTt3JrrQM1/hsBnyNm3bHqgZHADAH4JICLz2JAnSRM3LAXX5Bkz7fexLOebMhTmjcfDQIavmY9/uP2nO//wfavo/v9DUqVOt63559RpJyglQmYyhdDqlM0aNtF6zEs6AKbYtmh0zAFAeBJEQeO1J4gwszr4i9pkIL7tMCm0xtT+3eR+7TzU36+Chw+p/0klWgalhGNr49ht6/8/79I25c7Vl0yb9+J77rd0x+WYGesNI2rruID1M4lBoW3TSgxQAVIvIgsgf//hH/ehHP9KLL76o/fv3a+jQofra176mW265Rf369YvqZRPNGSpKbSdeaObBfO6NW7fZllwMSbKCRcuQZiuESNKHB/Zr/969GnvWF7TlD+s144qvqvmU0/Sp5mZ9+lNDXGd2JFk9Ssy+IkltZubGGeakYM3gAADBRBZENm/erEwmowcffFCjRo3Su+++q9mzZ+vYsWO68847o3rZilJKO/FiMw/2hmPjxozWGaNGZm2vlaRDH35o/d0wDL3xu99qyCmn6r9N/H/0wfv79cbvfqt/+B//pEkTx1ut2/MtL5k/T3IzMzfOMOd36QwAUJrIgsgll1yiSy65xLrd1tamLVu26IEHHiCI2JidSN12meRbnvFSayIpayfL5u07coKAOUMiSX/es1uH3t+vSX/3FaVSKX3+3PP18v/3tP60Y4eePPyh9VpLl68IdEaPvUeHU6FlqCjP8XF+jrfffa/rtugkBykAqHRl7SPS3t6u5ubmvL/v7OxUR0dH1p9ql06ntXHrNmsXi1nrYQ6SboO3l54Y5n1umTfX6npqahmS/W9gGIbeefN1DTnlVJ3WOlySdFrrcA055VS99tILOnP0KOuazNdxY/ZTyfc+3bqYFnqfpTyuGLcwd+tN38zbbbXQewMABFe2YtXt27fr3nvvLTgbsmjRIv3whz8s1yW
VJKxv6s4zW8wttc7Ta+3P5+fAPnPpweSsC5FyZ0MkZc2KdB3tKLk7atBlqFKWrwoJevoyACBcKcMwjOJ3O+Hmm2/WT37yk4L32bRpk8444wzr9t69e/U3f/M3mjRpkh5++OG8j+vs7FRnZ6d1u6OjQ62trWpvb9egQYP8XGbk8g2GfgdJ8/5mQDB7dNj7cgQddM3Hnjl6lD47dowVeEz19XX6yrSLdeM3rtfHnZ26eOaVVhCRemdKnlv2a0nSxTOv1JVf+ruCS0Z+rsmsyfD7Ofl9HACg/Do6OtTU1ORp/PY9I/Ltb39b11xzTcH7tLW1WX/ft2+fJk+erIkTJ+rnP/95wcc1NDSooaHB7yXFIoxv6s77f33uPGUyhlKpVFbH0lJCiP2xmUxGBw8dshqRdXf3aP3bb+m93X/Kmg0x2WdFPjywP+d57a/lZwYoSI+OoI8DACSb7yDS0tKilpYWT/fdu3evJk+erHPOOUePPPJI4PX8pMrXh8LrIGlfHugdzA1rm60ZRvyGEHPJyG3pYfP2HTp46LC1iyaVSul/XndtVm2Ik1kr8sbvfqvBnz5VqVTKqhmR/B9oF7TZWSlN0qIseAUAlCayZLB3715NmjRJw4cP15133qmDBw9q//792r9/f1QvWdTS5SvyHlpmP/bej5kz/B1gZ2cepGcfzP/vfYuVTqdlGEbewbMQs7jT+VhzdqVlSLNumTdXl186XZ+sT+nQ+/v1+XPPz5kNMZmzIofe368/79lt/UzKnXUp9vnefve9WYf35SsMdXtskMc5P5OwC14BAKWLrFj1+eef1/bt27V9+3YNGzYs63c+y1JCU2oDMTfFvqkX+zZuHrLmLEw1ZzUWLr5Pt8yb6/l6Zs6YZjUYM28ve2alFUIOHjqs2+++V7fMm6sf/OAHajn1tLyzISZzVuSdN1/XWV/4gjZu3ebaVj7f52vfFuvccrxx67aCPTqCtMV3+0yc90/KqcAAUOsiCyLXXHNN0VqScgt7QPLSzrxY+Dlz9Kicx5u3zVmMoC3GnUtGG7Zs1cFDh7Vp23bddvuPtW7dOtfaECd7rcgf1q/XsNNPd50Byvf5miHE2ezMDEjjxowu2H8kjN0tpS6jAQCiUXNnzYQ1IHn9pu41/Lj9zL5rxv78xZgnyErKal1u/uxTzYP10/vv04BBTWo86SQdPnig6HM2nnSSBgxq0jtvvq7TWofrE5+od50BKnR+i7k84icA+tmqXAwFrwCQPDUXRKTgA5J9mcX5Td1Z9Gj/pu4l/ITZ18Lem0SS1bpc6p2VeGfTJn189Kj+69hRPfvkLz0/b+919CiT6dGYtjEyDMM1JNk/X+eylP0zsBe9lkMlnAoMALWmJoNI0AHJvsxi/6burDFxe65i4SfMb/72JQ973xCzN8nCxffp+H+/Up0ffyxJahs+XNf9w1eznuPhR3+pXbv3WLfN/iaf/+w4NZ082NrR89mxY3LCiPn5Outc7J+B1DtzUy6VdCowANSSmgsipQxIpdSYlOvbuDMU2YOIvTfJwEFN+uSAgUqn0/ro4+P60/6DWUGi/eNOXXjBBVmPbxnSrL0HPtDeAx+4zug4X99e57Jw8X0aN2Z0VpfXcgmj4BUAEI2aCiJh78DwWmNSzm/j5hKPeY2SrPAjZR+EZ/7cPHTPvB7zID5z1sN8LrM1vHNJpVidixlGzFDjvL6oQwDt3AEguWoqiIS5A8NrjUm+DqfOwd9+/1IabF1+6fSsoOMWfpzn2Dy5/JmsnSvmQXz2a/aypPLu5i05O2MkZS0ROT+vcoSRMJe9AADhqqlOTmYDMTczZ3g/XdVtmSUft/BjDvTObathNdjasGWrpNyeHePGjJYkfaq5OevnV1w6o+8E4LTrNdvfr/35nT53xlhru7Hdy6vX9r3vVNbnZb42MxIAULtqakYkDH6XWdzCjX15xwwHYTbYMg+5cz7PGaNGZv3XeT3vbt6Sc832GZMzRo3M6lLqNrPkbKa2cPF9+uBwb1v5cWNG5/y+XDMS9tb3zp08ziZytHsHgPIhiPgQZtFjlA228g2kXpYo3ApxzaUV+7JLvvdr9jB5cvkz+s0zz1rLUOZSVL6dNlEzA5R9mcjtPQbprgsACI4g4kPYRY9Ja7DlDBnOolVn+HJ7v/bnMGcZ7CHEbadNOThnoZ5c/oxVQOv2HgEA5ZEy4jr4xYOOjg41NTWpvb1dgwYNivtyQmd+GzdrTZIyEJZ6XfblKlOh5m/lZK/D6Q1KvacdJ+WzB4Bq4Gf8rqli1SQp9UTZKM2cEfxEYftSR6Hf+y3IDevkZPO9mbMxmYyRiNkoAKhVBJEY5Ks1SUoY8bMryPk48305g8iTy5/RwsX3BS7ItRfJur2m12Bj7/ra+7wpX+8RABAuakRikOQGW6U0X3M2U3M2LiulDqOUrrbO92YvvjX/S4dVAIgHQSQGSW2w5XVXkP3wP7vLL51udVHN1wytFKXsNHLbHePcNUMYAYDyI4jA4nWmxn74n7Mfh3Obr3OZZ8OWrSUN9EF3GpnvzdwJ5LYDyNlgDgAQPXbNIJB8SzjO3THF7hP0dZO20wgAcIKf8btmZ0TyLS9I8W4vrRTFlkmiOPG2nIcHAgDKo2aDSKHlBXuhJfIrtEwSdkFuFMEGABC/mg0iYezCqHVu23zNzy3sgtwk7zQCAARX8zUi1BwEE0X9BwCgOlAj4kPSznsJWxS1MElfJvHyniVRIwQACVDznVWDdhGtFGF1JLUrtExibpGNk5f3HMXnAgDwr6ZnRGphF0YUtTDlashmzmyYp/g6i4rNnztnL/y8Z2qEACBeNRtEkr68EKZSOpLGyZy1MLuhStmBwd4l1cnLe67UzwUAqknNBpFa24VRibUw9qBgtmDfuHVb1jkxhYKDl/dciZ8LAFSTmg0iST3vJUz2ok1nLczCxffpjFEjE1+QaQ8j6XRKG7duUzqd9nSAXqHtxX7uAwCITs0GkUrnZWeIubRhziKYA7d5MF2lsM9aSL2zVcVmL7zU/9RCjRAAJB1BJEY/uuvflE6ndcu8uTm/W7j4PmUyGX1//rdcH+ulM+zMGdOyljLMwda+tFEJMwDmrEU6nVImYyidThecvfBS/2P+vRZqhAAgyQgiMTKXGBYuvi8rjJgzFuPGjM77WK87Q84YNVKStHHrNs361vysgkx7T42kcham2v+bLzB4rf+ppRohAEiqmu+sGjd76Lhl3tyc28V47QxrhpD6+jr94t/uiuKthM5td0y+XTPMXgBAcvgZvwkiCeCs2fAaQkzFQkaltrEP2kcEABAvgkgF+odv3Gj9/dGf3uP5cc6QceboUfrs2DGuBZmStGHLVm3atr1iwggAoPJw1kyFWbj4vpzbfpZlnEsWm7Ztt+5jDyHm3z87dgwFmQCAROBAjZjZa0Ie/ek9Vt2DM5wsXb4i61wU50zH0uUrrLNepN7QsWHL1pwQMnPGtMScCQMAAEEkRm6FqbfMm+saRpyHtJk7QySz2VfvP6UZMs4cPUqbtm3XU8+6n58yc8a0WOsqnMHKfnvZMyu1dP
kK63fO2wCA6kEQiVEmk3EtTDXDiH3GwgwYZhgxQ0S+kHHrTd+06kaS2LrcGazM2wsX35cVrDgNFwCqGzUiMcrXrEySa43IzBnTtGHL1ryHtJk7SS6/dHriW5e79UFxa77GabgAUN0IIhXms2PHaNO27TkzHfZBu1Jal7udfmsuSzmbrwEAqhPz3RXMnOlwFq66tS63L+skycwZ07KWkG6ZNzfRS0oAgHAxI1JB3AKHOdNhBo+ly1dUVOtyt1OBk7ykBAAIF0GkQrjVS9hPpDUV2gmTtAHd+Z6cu4iSuqQEAAgPSzMVwnmQm30mQertmFpJ3OpY3E4FTuqSEgAgHMyIVAj7TEe+YtRKWsZwBiv7bfupwEldUgIAhIOzZipMvi2tbHUFACQFZ81UMedMgomZAwBAJWJGBAAAhMrP+B1pseqXv/xlDR8+XI2NjTrttNP09a9/Xfv27YvyJQEAQAWJNIhMnjxZv/rVr7RlyxYtXbpUO3bs0BVXXBHlSwIAgApS1qWZ//zP/9Rll12mzs5OfeITnyh6f5ZmAACoPIksVj18+LAeffRRTZw4MW8I6ezsVGdnp3W7o6OjXJcHAABiEHlDs+9+97v65Cc/qSFDhmj37t16+umn89530aJFampqsv60trZGfXkAACBGvoPIzTffrFQqVfDP5s2brft/5zvf0e9//3s999xzqqur09VXX618q0ELFixQe3u79WfPnj3B3xkAAEg83zUiBw8e1KFDhwrep62tTf369cv5+XvvvafW1latXr1aEyZMKPpa1IgAAFB5Iq0RaWlpUUtLS6ALM5tt2etAAABA7YqsWPX111/XG2+8oQsvvFCDBw/Wjh079P3vf18jR470NBsCAACqX2TFqv3799dvfvMbTZkyRWPHjtV1112ns846S6+88ooaGhqielkAAFBBIpsR+fznP68XX3yxpOcwy1fYxgsAQOUwx20vZaiJPvTuyJEjksQ2XgAAKtCRI0fU1NRU8D6JPvQuk8lo3759GjhwoFKplK/HdnR0qLW1VXv27GHHjU98dsHwuQXD5xYcn10wfG7B+PncDMPQkSNHNHToUKXThatAEj0jkk6nNWzYsJKeY9CgQfwPLSA+u2D43ILhcwuOzy4YPrdgvH5uxWZCTJF3VgUAAMiHIAIAAGJTtUGkoaFBt912G1uFA+CzC4bPLRg+t+D47ILhcwsmqs8t0cWqAACgulXtjAgAAEg+gggAAIgNQQQAAMSGIAIAAGJTM0Hky1/+soYPH67Gxkaddtpp+vrXv659+/bFfVmJ9sc//lHXXXedRowYoZNOOkkjR47Ubbfdpq6urrgvLfEWLlyoiRMnqn///jr55JPjvpxEu//++3X66aersbFR559/vtatWxf3JSXeq6++qi996UsaOnSoUqmUnnrqqbgvqSIsWrRIX/ziFzVw4EB9+tOf1mWXXaYtW7bEfVmJ98ADD+iss86yGplNmDBBK1asCO35ayaITJ48Wb/61a+0ZcsWLV26VDt27NAVV1wR92Ul2ubNm5XJZPTggw9qw4YNuvvuu/Wzn/1M3/ve9+K+tMTr6urSlVdeqRtuuCHuS0m0J554QvPnz9dtt92mt99+W2effbamTZumAwcOxH1piXbs2DGdffbZuv/+++O+lIryyiuvaM6cOVq7dq2ef/55/eUvf9HFF1+sY8eOxX1piTZs2DDdcccdeuutt/Tmm2/qoosu0le+8hVt2LAhnBcwatTTTz9tpFIpo6urK+5LqSj/8i//YowYMSLuy6gYjzzyiNHU1BT3ZSTWeeedZ8yZM8e63dPTYwwdOtRYtGhRjFdVWSQZy5Yti/syKtKBAwcMScYrr7wS96VUnMGDBxsPP/xwKM9VMzMidocPH9ajjz6qiRMn6hOf+ETcl1NR2tvb1dzcHPdloAp0dXXprbfe0tSpU62fpdNpTZ06VWvWrInxylAr2tvbJYn/n+ZDT0+PfvnLX+rYsWOaMGFCKM9ZU0Hku9/9rj75yU9qyJAh2r17t55++um4L6mibN++Xffee6/+6Z/+Ke5LQRX44IMP1NPTo1NOOSXr56eccor2798f01WhVmQyGc2bN08XXHCBPve5z8V9OYn3zjvvaMCAAWpoaND111+vZcuWady4caE8d0UHkZtvvlmpVKrgn82bN1v3/853vqPf//73eu6551RXV6err75aRg02lvX7uUnS3r17dckll+jKK6/U7NmzY7ryeAX53AAk05w5c/Tuu+/ql7/8ZdyXUhHGjh2r9evX6/XXX9cNN9ygWbNmaePGjaE8d0W3eD948KAOHTpU8D5tbW3q169fzs/fe+89tba2avXq1aFNL1UKv5/bvn37NGnSJI0fP15LlixROl3R+TWwIP97W7JkiebNm6ePPvoo4qurPF1dXerfv7+efPJJXXbZZdbPZ82apY8++ogZS49SqZSWLVuW9RmisLlz5+rpp5/Wq6++qhEjRsR9ORVp6tSpGjlypB588MGSn6s+hOuJTUtLi1paWgI9NpPJSJI6OzvDvKSK4Odz27t3ryZPnqxzzjlHjzzySM2GEKm0/70hV79+/XTOOedo1apV1iCayWS0atUqzZ07N96LQ1UyDEPf/OY3tWzZMr388suEkBJkMpnQxs+KDiJevf7663rjjTd04YUXavDgwdqxY4e+//3va+TIkTU3G+LH3r17NWnSJH3mM5/RnXfeqYMHD1q/O/XUU2O8suTbvXu3Dh8+rN27d6unp0fr16+XJI0aNUoDBgyI9+ISZP78+Zo1a5bOPfdcnXfeeVq8eLGOHTuma6+9Nu5LS7SjR49q+/bt1u1du3Zp/fr1am5u1vDhw2O8smSbM2eOHnvsMT399NMaOHCgVYvU1NSkk046KearS64FCxZo+vTpGj58uI4cOaLHHntML7/8slauXBnOC4Sy9ybh/vCHPxiTJ082mpubjYaGBuP00083rr/+euO9996L+9IS7ZFHHjEkuf5BYbNmzXL93F566aW4Ly1x7r33XmP48OFGv379jPPOO89Yu3Zt3JeUeC+99JLr/75mzZoV96UlWr7/f/bII4/EfWmJ9o//+I/GZz7zGaNfv35GS0uLMWXKFOO5554L7fkrukYEAABUttpd8AcAALEjiAAAgNgQRAAAQGwIIgAAIDYEEQAAEBuCCAAAiA1BBAAAxIYgAgAAYkMQAQAAsSGIAACA2BBEAABAbAgiAAAgNv8/S0UKfqPA5WAAAAAASUVORK5CYII=", + "text/plain": [ + "
" + ] + }, + "metadata": { + "engine": 0 + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px --target 0\n", + "plt.plot(c1_np[:,0], c1_np[:,1], 'x', color='#f0781e')\n", + "plt.plot(c2_np[:,0], c2_np[:,1], 'x', color='#5a696e')\n", + "plt.plot(centroids[0,0],centroids[0,1], '^', markersize=10, markeredgecolor='black', color='#f0781e' )\n", + "plt.plot(centroids[1,0],centroids[1,1], '^', markersize=10, markeredgecolor='black',color='#5a696e')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The Iris Dataset\n", + "------------------------------\n", + "The _iris_ dataset is a well known example for clustering analysis. It contains 4 measured features for samples from\n", + "three different types of iris flowers. A subset of 150 samples is included in formats h5, csv and netcdf in the [Heat repository under 'heat/heat/datasets'](https://github.com/helmholtz-analytics/heat/tree/main/heat/datasets), and can be loaded in a distributed manner with Heat's parallel dataloader.\n", + "\n", + "**NOTE: you might have to change the path to the dataset in the following cell.**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%px\n", + "iris = ht.load_csv(\"heat/datasets/iris.csv\", sep=\";\", split=0)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Feel free to try out the other [loading options](https://heat.readthedocs.io/en/stable/autoapi/heat/core/io/index.html#heat.core.io.load) as well.\n", + "\n", + "Fitting the dataset with `kmeans`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[1:54]: \u001b[0m\n", + "KMeans({\n", + " \"n_clusters\": 3,\n", + " \"init\": \"probability_based\",\n", + " \"max_iter\": 300,\n", + " \"tol\": 0.0001,\n", + " \"random_state\": null\n", + "})" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 1, + "engine_uuid": "4a6ffcbf-4b7c9961beb0aa49f4f299a5", + "error": null, + "execute_input": "k = 3\nkmeans = ht.cluster.KMeans(n_clusters=k, init=\"kmeans++\")\nkmeans.fit(iris)\n", + "execute_result": { + "data": { + "text/plain": "KMeans({\n \"n_clusters\": 3,\n \"init\": \"probability_based\",\n \"max_iter\": 300,\n \"tol\": 0.0001,\n \"random_state\": null\n})" + }, + "execution_count": 54, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:30:37.715568Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[3:54]: \u001b[0m\n", + "KMeans({\n", + " \"n_clusters\": 3,\n", + " \"init\": \"probability_based\",\n", + " \"max_iter\": 300,\n", + " \"tol\": 0.0001,\n", + " \"random_state\": null\n", + "})" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 3, + "engine_uuid": "b9f6f6e8-01c224a4024814eaffce2266", + "error": null, + "execute_input": "k = 3\nkmeans = ht.cluster.KMeans(n_clusters=k, init=\"kmeans++\")\nkmeans.fit(iris)\n", + "execute_result": { + "data": { + "text/plain": "KMeans({\n \"n_clusters\": 3,\n \"init\": \"probability_based\",\n \"max_iter\": 300,\n \"tol\": 0.0001,\n \"random_state\": null\n})" + }, + "execution_count": 54, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + 
"status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:30:37.715694Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[0:75]: \u001b[0m\n", + "KMeans({\n", + " \"n_clusters\": 3,\n", + " \"init\": \"probability_based\",\n", + " \"max_iter\": 300,\n", + " \"tol\": 0.0001,\n", + " \"random_state\": null\n", + "})" + ] + }, + "metadata": { + "after": null, + "completed": null, + "data": {}, + "engine_id": 0, + "engine_uuid": "26ba0021-35d3d060b50582f7d11d6ead", + "error": null, + "execute_input": "k = 3\nkmeans = ht.cluster.KMeans(n_clusters=k, init=\"kmeans++\")\nkmeans.fit(iris)\n", + "execute_result": { + "data": { + "text/plain": "KMeans({\n \"n_clusters\": 3,\n \"init\": \"probability_based\",\n \"max_iter\": 300,\n \"tol\": 0.0001,\n \"random_state\": null\n})" + }, + "execution_count": 75, + "metadata": {} + }, + "follow": null, + "msg_id": null, + "outputs": [], + "received": null, + "started": null, + "status": null, + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:30:37.715223Z" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\u001b[0;31mOut[2:54]: \u001b[0m\n", + "KMeans({\n", + " \"n_clusters\": 3,\n", + " \"init\": \"probability_based\",\n", + " \"max_iter\": 300,\n", + " \"tol\": 0.0001,\n", + " \"random_state\": null\n", + "})" + ] + }, + "metadata": { + "after": [], + "completed": "2025-05-19T19:30:37.759682Z", + "data": {}, + "engine_id": 2, + "engine_uuid": "e3e9e719-1b11a826b66969f71d179e21", + "error": null, + "execute_input": "k = 3\nkmeans = ht.cluster.KMeans(n_clusters=k, init=\"kmeans++\")\nkmeans.fit(iris)\n", + "execute_result": { + "data": { + "text/plain": "KMeans({\n \"n_clusters\": 3,\n \"init\": \"probability_based\",\n \"max_iter\": 300,\n \"tol\": 0.0001,\n \"random_state\": null\n})" + }, + "execution_count": 54, + "metadata": {} + }, + "follow": [], + "is_broadcast": false, + "is_coalescing": false, + "msg_id": "762efd2c-13bb8db32ded39032c1e088e_231924_46", + "outputs": [], + "received": "2025-05-19T19:30:37.766196Z", + "started": "2025-05-19T19:30:37.717854Z", + "status": "ok", + "stderr": "", + "stdout": "", + "submitted": "2025-05-19T19:30:37.715597Z" + }, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "k = 3\n", + "kmeans = ht.cluster.KMeans(n_clusters=k, init=\"kmeans++\")\n", + "kmeans.fit(iris)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's see what the results are. In theory, there are 50 samples of each of the 3 iris types: setosa, versicolor and virginica. We will plot the results in a 3D scatter plot, coloring the samples according to the assigned cluster." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[stdout:2] Number of points assigned to c1: 97 \n", + "Number of points assigned to c2: 24 \n", + "Number of points assigned to c3: 29\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:3] Number of points assigned to c1: 97 \n", + "Number of points assigned to c2: 24 \n", + "Number of points assigned to c3: 29\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:0] Number of points assigned to c1: 97 \n", + "Number of points assigned to c2: 24 \n", + "Number of points assigned to c3: 29\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "[stdout:1] Number of points assigned to c1: 97 \n", + "Number of points assigned to c2: 24 \n", + "Number of points assigned to c3: 29\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%px\n", + "labels = kmeans.predict(iris).squeeze()\n", + "\n", + "# Select points assigned to clusters c1, c2 and c3\n", + "c1 = iris[ht.where(labels == 0), :]\n", + "c2 = iris[ht.where(labels == 1), :]\n", + "c3 = iris[ht.where(labels == 2), :]\n", + "# After slicing, the arrays are not distributed equally among the processes anymore; we need to balance\n", + "#TODO is balancing really necessary?\n", + "c1.balance_()\n", + "c2.balance_()\n", + "c3.balance_()\n", + "\n", + "print(f\"Number of points assigned to c1: {c1.shape[0]} \\n\"\n", + " f\"Number of points assigned to c2: {c2.shape[0]} \\n\"\n", + " f\"Number of points assigned to c3: {c3.shape[0]}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Number of points assigned to c1: 38 \n", + "Number of points assigned to c2: 50 \n", + "Number of points assigned to c3: 62\n" + ] + } + ], + "source": [ + "# compare Heat results with sklearn\n", + "from sklearn.cluster import KMeans\n", + "import sklearn.datasets\n", + "k = 3\n", + "iris_sk = sklearn.datasets.load_iris().data\n", + "kmeans_sk = KMeans(n_clusters=k, init=\"k-means++\").fit(iris_sk)\n", + "labels_sk = kmeans_sk.predict(iris_sk)\n", + "\n", + "c1_sk = iris_sk[labels_sk == 0, :]\n", + "c2_sk = iris_sk[labels_sk == 1, :]\n", + "c3_sk = iris_sk[labels_sk == 2, :]\n", + "print(f\"Number of points assigned to c1: {c1_sk.shape[0]} \\n\"\n", + " f\"Number of points assigned to c2: {c2_sk.shape[0]} \\n\"\n", + " f\"Number of points assigned to c3: {c3_sk.shape[0]}\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "heat-dev-311", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.8" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/doc/source/tutorials/notebooks/6_profiling.ipynb b/doc/source/tutorials/notebooks/6_profiling.ipynb new file mode 100644 index 0000000000..973dcfe6b6 --- /dev/null +++ b/doc/source/tutorials/notebooks/6_profiling.ipynb @@ -0,0 +1,609 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "0", + "metadata": {}, + "source": [ + "# Distributed profiling and energy measurements with perun" + ] + }, + { + "cell_type": 
"markdown", + "id": "1", + "metadata": {}, + "source": [ + "How to locate performance issues on your distributed application, and fix them, in three steps:\n", + "\n", + "1. Find the problematic/slow function in your code.\n", + "2. Gather statistics and data about the slow function.\n", + "3. Fix it!\n", + "\n", + "---\n", + "\n", + "
\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "2", + "metadata": {}, + "source": [ + "If you want more information on perun, find any issues, or questions leaves us a message on [github](https://github.com/Helmholtz-AI-Energy/perun) or check the [documentation](https://perun.readthedocs.io/en/latest/?badge=latest)." + ] + }, + { + "cell_type": "markdown", + "id": "3", + "metadata": {}, + "source": [ + "## Installation\n", + "\n", + "Perun can be installed with ```pip```:\n", + "\n", + "```shell\n", + "pip install perun\n", + "```\n", + "\n", + "Thourgh pip, optional dependencies can be installed that target different hardware accelerators, as well as the optional MPI support.\n", + "\n", + "\n", + "```shell\n", + "pip install perun[mpi,nvidia]\n", + "# or\n", + "pip install perun[mpi,rocm]\n", + "```\n", + "\n", + "Running the cell below will install perun." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: perun[mpi,nvidia] in /home/juanpedroghm/.pyenv/versions/3.11.2/envs/heat-dev311/lib/python3.11/site-packages (0.9.0)\n", + "Requirement already satisfied: h5py>=3.5.9 in /home/juanpedroghm/.pyenv/versions/3.11.2/envs/heat-dev311/lib/python3.11/site-packages (from perun[mpi,nvidia]) (3.13.0)\n", + "Requirement already satisfied: numpy>=1.20.0 in /home/juanpedroghm/.pyenv/versions/3.11.2/envs/heat-dev311/lib/python3.11/site-packages (from perun[mpi,nvidia]) (2.2.5)\n", + "Requirement already satisfied: pandas>=1.3 in /home/juanpedroghm/.pyenv/versions/3.11.2/envs/heat-dev311/lib/python3.11/site-packages (from perun[mpi,nvidia]) (2.2.3)\n", + "Requirement already satisfied: psutil>=5.9.0 in /home/juanpedroghm/.pyenv/versions/3.11.2/envs/heat-dev311/lib/python3.11/site-packages (from perun[mpi,nvidia]) (7.0.0)\n", + "Requirement already satisfied: py-cpuinfo>=5.0.0 in /home/juanpedroghm/.pyenv/versions/3.11.2/envs/heat-dev311/lib/python3.11/site-packages (from perun[mpi,nvidia]) (9.0.0)\n", + "Requirement already satisfied: tabulate>=0.9 in /home/juanpedroghm/.pyenv/versions/3.11.2/envs/heat-dev311/lib/python3.11/site-packages (from perun[mpi,nvidia]) (0.9.0)\n", + "Requirement already satisfied: mpi4py>=3.1 in /home/juanpedroghm/.pyenv/versions/3.11.2/envs/heat-dev311/lib/python3.11/site-packages (from perun[mpi,nvidia]) (4.0.3)\n", + "Collecting nvidia-ml-py>=12.535.77 (from perun[mpi,nvidia])\n", + " Using cached nvidia_ml_py-12.575.51-py3-none-any.whl.metadata (9.3 kB)\n", + "Requirement already satisfied: python-dateutil>=2.8.2 in /home/juanpedroghm/.pyenv/versions/3.11.2/envs/heat-dev311/lib/python3.11/site-packages (from pandas>=1.3->perun[mpi,nvidia]) (2.9.0.post0)\n", + "Requirement already satisfied: pytz>=2020.1 in /home/juanpedroghm/.pyenv/versions/3.11.2/envs/heat-dev311/lib/python3.11/site-packages (from pandas>=1.3->perun[mpi,nvidia]) (2025.2)\n", + "Requirement already satisfied: tzdata>=2022.7 in /home/juanpedroghm/.pyenv/versions/3.11.2/envs/heat-dev311/lib/python3.11/site-packages (from pandas>=1.3->perun[mpi,nvidia]) (2025.2)\n", + "Requirement already satisfied: six>=1.5 in /home/juanpedroghm/.pyenv/versions/3.11.2/envs/heat-dev311/lib/python3.11/site-packages (from python-dateutil>=2.8.2->pandas>=1.3->perun[mpi,nvidia]) (1.17.0)\n", + "Using cached nvidia_ml_py-12.575.51-py3-none-any.whl (47 kB)\n", + "Installing collected packages: nvidia-ml-py\n", + "Successfully installed nvidia-ml-py-12.575.51\n" + ] + }, + 
{ + "name": "stderr", + "output_type": "stream", + "text": [ + "\n", + "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m25.0.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m25.1.1\u001b[0m\n", + "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "perun 0.9.0\n" + ] + } + ], + "source": [ + "%%bash\n", + "pip install perun[mpi,nvidia]\n", + "perun --version" + ] + }, + { + "cell_type": "markdown", + "id": "5", + "metadata": {}, + "source": [ + "## Basic command line usage\n", + "\n", + "Perun is primarily a command line tool. The complete functionality can be accessed through the ```perun``` command. On a terminal, simply type ```perun``` and click enter to get a help dialog with the available subcommands." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "usage: perun [-h] [-c CONFIGURATION] [-l {DEBUG,INFO,WARN,ERROR,CRITICAL}]\n", + " [--log_file LOG_FILE] [--version]\n", + " {showconf,sensors,metadata,export,monitor} ...\n", + "\n", + "Distributed performance and energy monitoring tool\n", + "\n", + "positional arguments:\n", + " {showconf,sensors,metadata,export,monitor}\n", + " showconf Print perun configuration in INI format.\n", + " sensors Print available sensors by host and rank.\n", + " metadata Print available metadata.\n", + " export Export existing output file to another format.\n", + " monitor Gather power consumption from hardware devices while\n", + " SCRIPT [SCRIPT_ARGS] is running. SCRIPT is a path to\n", + " the python script to monitor, run with arguments\n", + " SCRIPT_ARGS.\n", + "\n", + "options:\n", + " -h, --help show this help message and exit\n", + " -c CONFIGURATION, --configuration CONFIGURATION\n", + " Path to perun configuration file.\n", + " -l {DEBUG,INFO,WARN,ERROR,CRITICAL}, --log_lvl {DEBUG,INFO,WARN,ERROR,CRITICAL}\n", + " Logging level.\n", + " --log_file LOG_FILE Path to the log file. None by default. Writting to a\n", + " file disables logging in stdout.\n", + " --version show program's version number and exit\n" + ] + } + ], + "source": [ + "!perun" + ] + }, + { + "cell_type": "markdown", + "id": "7", + "metadata": {}, + "source": [ + "**perun** can already be used after this, without any further configuration or modification of the code. perun can monitor command line scripts, and other programs from the command lines. Try running the ```perun monitor -b sleep 10``` on a terminal, or by running the cell below." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "/home/juanpedroghm/code/heat/doc/source/tutorials/notebooks\n", + "[2025-05-20 16:59:39,969][\u001b[1;36mperun.core\u001b[0m][\u001b[1;35mbackends\u001b[0m][\u001b[1;31mERROR\u001b[0m] - R3/4:\u001b[1;31mUnknown error loading dependecy NVMLBackend\u001b[0m\n", + "[2025-05-20 16:59:39,969][\u001b[1;36mperun.core\u001b[0m][\u001b[1;35mbackends\u001b[0m][\u001b[1;31mERROR\u001b[0m] - R3/4:\u001b[1;31mNVML Shared Library Not Found\u001b[0m\n", + "[2025-05-20 16:59:39,969][\u001b[1;36mperun.core\u001b[0m][\u001b[1;35mbackends\u001b[0m][\u001b[1;31mERROR\u001b[0m] - R1/4:\u001b[1;31mUnknown error loading dependecy NVMLBackend\u001b[0m\n", + "[2025-05-20 16:59:39,970][\u001b[1;36mperun.core\u001b[0m][\u001b[1;35mbackends\u001b[0m][\u001b[1;31mERROR\u001b[0m] - R1/4:\u001b[1;31mNVML Shared Library Not Found\u001b[0m\n", + "[2025-05-20 16:59:39,970][\u001b[1;36mperun.core\u001b[0m][\u001b[1;35mbackends\u001b[0m][\u001b[1;31mERROR\u001b[0m] - R0/4:\u001b[1;31mUnknown error loading dependecy NVMLBackend\u001b[0m\n", + "[2025-05-20 16:59:39,970][\u001b[1;36mperun.core\u001b[0m][\u001b[1;35mbackends\u001b[0m][\u001b[1;31mERROR\u001b[0m] - R0/4:\u001b[1;31mNVML Shared Library Not Found\u001b[0m\n", + "[2025-05-20 16:59:39,976][\u001b[1;36mperun.core\u001b[0m][\u001b[1;35mbackends\u001b[0m][\u001b[1;31mERROR\u001b[0m] - R2/4:\u001b[1;31mUnknown error loading dependecy NVMLBackend\u001b[0m\n", + "[2025-05-20 16:59:39,976][\u001b[1;36mperun.core\u001b[0m][\u001b[1;35mbackends\u001b[0m][\u001b[1;31mERROR\u001b[0m] - R2/4:\u001b[1;31mNVML Shared Library Not Found\u001b[0m\n" + ] + } + ], + "source": [ + "%%bash\n", + "pwd\n", + "mpirun -n 4 perun monitor -b sleep 10" + ] + }, + { + "cell_type": "markdown", + "id": "9", + "metadata": {}, + "source": [ + "In the directory reported by ```pwd```, you should see a new directory called ```perun_results``` (it might be named ```bench_data``` if the current directory is the heat root directory), with two files, **sleep.hdf5** and **sleep_.txt**.\n", + "\n", + "The file **sleep_.txt** contains a summary of what was measured during the run, with the average power draw of different hardware components, memory usage, and the total energy. The available information depends on the available *sensors* that perun finds. 
You can see a list of the available sensors by running the sensors subcommand:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "10", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[2025-05-20 16:55:39,740][\u001b[1;36mperun.core\u001b[0m][\u001b[1;35mbackends\u001b[0m][\u001b[1;31mERROR\u001b[0m] - R0/1:\u001b[1;31mUnknown error loading dependecy NVMLBackend\u001b[0m\n", + "[2025-05-20 16:55:39,740][\u001b[1;36mperun.core\u001b[0m][\u001b[1;35mbackends\u001b[0m][\u001b[1;31mERROR\u001b[0m] - R0/1:\u001b[1;31mNVML Shared Library Not Found\u001b[0m\n", + "| Sensor | Source | Device | Unit |\n", + "|-----------------:|--------------:|----------------:|-------:|\n", + "| cpu_0_package-0 | powercap_rapl | DeviceType.CPU | J |\n", + "| CPU_FREQ_0 | psutil | DeviceType.CPU | Hz |\n", + "| CPU_FREQ_1 | psutil | DeviceType.CPU | Hz |\n", + "| CPU_FREQ_2 | psutil | DeviceType.CPU | Hz |\n", + "| CPU_FREQ_3 | psutil | DeviceType.CPU | Hz |\n", + "| CPU_FREQ_4 | psutil | DeviceType.CPU | Hz |\n", + "| CPU_FREQ_5 | psutil | DeviceType.CPU | Hz |\n", + "| CPU_FREQ_6 | psutil | DeviceType.CPU | Hz |\n", + "| CPU_FREQ_7 | psutil | DeviceType.CPU | Hz |\n", + "| CPU_USAGE | psutil | DeviceType.CPU | % |\n", + "| DISK_READ_BYTES | psutil | DeviceType.DISK | B |\n", + "| DISK_WRITE_BYTES | psutil | DeviceType.DISK | B |\n", + "| NET_READ_BYTES | psutil | DeviceType.NET | B |\n", + "| NET_WRITE_BYTES | psutil | DeviceType.NET | B |\n", + "| RAM_USAGE | psutil | DeviceType.RAM | B |\n", + "\n" + ] + } + ], + "source": [ + "!perun sensors" + ] + }, + { + "cell_type": "markdown", + "id": "11", + "metadata": {}, + "source": [ + "The other file, **sleep.hdf5**, contains all the raw data that perun collects, which can be used for later processing. To get an interactive view of the data, navigate to [myhdf5](https://myhdf5.hdfgroup.org) and upload the file there.\n", + "\n", + "This will let you explore the data tree that perun uses to store the hardware information. More info on the data tree can be found in the [data documentation](https://perun.readthedocs.io/en/latest/data.html)." + ] + }, + { + "cell_type": "markdown", + "id": "12", + "metadata": {}, + "source": [ + "The data stored in the hdf5 file can be exported to other formats. Supported formats are text (same as the text report), csv, json and bench. Run the cell below to export the last run of the sleep program to csv.\n",
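+ "\n", + "Once exported, the CSV can be inspected with standard tools. A minimal sketch with pandas (the exact output filename depends on your run):\n", + "\n", + "```python\n", + "import pandas as pd\n", + "\n", + "# load the exported samples; columns include 'sensor', 'timestep' and 'value'\n", + "df = pd.read_csv(\"perun_results/sleep_0.csv\")\n", + "# average reading per sensor over the whole run\n", + "print(df.groupby(\"sensor\")[\"value\"].mean())\n", + "```"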
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "13", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + ",run id,hostname,device_group,sensor,unit,magnitude,timestep,value\n", + "0,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,0.0,2021.14599609375\n", + "1,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,1.0068829,964.1939697265625\n", + "2,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,2.0126529,400.12799072265625\n", + "3,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,3.0183434,2600.0\n", + "4,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,4.024712,2800.0\n", + "5,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,5.0291414,2384.971923828125\n", + "6,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,6.033699,1418.0760498046875\n", + "7,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,7.0397954,2297.81298828125\n", + "8,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,8.047083,2893.419921875\n", + "9,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,9.0511675,2456.3759765625\n", + "10,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,10.060614,1828.7459716796875\n", + "11,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,10.068606,3012.5791015625\n", + "12,0,juan-20w000p2ge,cpu,CPU_FREQ_1,Hz,1000000.0,0.0,1211.6190185546875\n", + "13,0,juan-20w000p2ge,cpu,CPU_FREQ_1,Hz,1000000.0,1.0068829,2700.0\n", + "14,0,juan-20w000p2ge,cpu,CPU_FREQ_1,Hz,1000000.0,2.0126529,1569.219970703125\n", + "15,0,juan-20w000p2ge,cpu,CPU_FREQ_1,Hz,1000000.0,3.0183434,2497.64697265625\n", + "16,0,juan-20w000p2ge,cpu,CPU_FREQ_1,Hz,1000000.0,4.024712,2693.7109375\n", + "17,0,juan-20w000p2ge,cpu,CPU_FREQ_1,Hz,1000000.0,5.0291414,2240.751953125\n", + "18,0,juan-20w000p2ge,cpu,CPU_FREQ_1,Hz,1000000.0,6.033699,3000.02099609375\n", + "19,0,juan-20w000p2ge,cpu,CPU_FREQ_1,Hz,1000000.0,7.0397954,2600.0\n", + "20,0,juan-20w000p2ge,cpu,CPU_FREQ_1,Hz,1000000.0,8.047083,3100.0\n", + "21,0,juan-20w000p2ge,cpu,CPU_FREQ_1,Hz,1000000.0,9.0511675,1806.197021484375\n", + "22,0,juan-20w000p2ge,cpu,CPU_FREQ_1,Hz,1000000.0,10.060614,3102.570068359375\n", + "23,0,juan-20w000p2ge,cpu,CPU_FREQ_1,Hz,1000000.0,10.068606,2934.219970703125\n", + "24,0,juan-20w000p2ge,cpu,CPU_FREQ_2,Hz,1000000.0,0.0,2200.10595703125\n", + "25,0,juan-20w000p2ge,cpu,CPU_FREQ_2,Hz,1000000.0,1.0068829,2700.096923828125\n", + "26,0,juan-20w000p2ge,cpu,CPU_FREQ_2,Hz,1000000.0,2.0126529,2842.551025390625\n", + "27,0,juan-20w000p2ge,cpu,CPU_FREQ_2,Hz,1000000.0,3.0183434,2488.455078125\n", + "28,0,juan-20w000p2ge,cpu,CPU_FREQ_2,Hz,1000000.0,4.024712,2651.922119140625\n", + "29,0,juan-20w000p2ge,cpu,CPU_FREQ_2,Hz,1000000.0,5.0291414,2183.43310546875\n", + "30,0,juan-20w000p2ge,cpu,CPU_FREQ_2,Hz,1000000.0,6.033699,2751.02490234375\n", + "31,0,juan-20w000p2ge,cpu,CPU_FREQ_2,Hz,1000000.0,7.0397954,2544.83203125\n", + "32,0,juan-20w000p2ge,cpu,CPU_FREQ_2,Hz,1000000.0,8.047083,3044.756103515625\n", + "33,0,juan-20w000p2ge,cpu,CPU_FREQ_2,Hz,1000000.0,9.0511675,2271.235107421875\n", + "34,0,juan-20w000p2ge,cpu,CPU_FREQ_2,Hz,1000000.0,10.060614,2385.8291015625\n", + "35,0,juan-20w000p2ge,cpu,CPU_FREQ_2,Hz,1000000.0,10.068606,3200.0\n", + "36,0,juan-20w000p2ge,cpu,CPU_FREQ_3,Hz,1000000.0,0.0,2200.012939453125\n", + "37,0,juan-20w000p2ge,cpu,CPU_FREQ_3,Hz,1000000.0,1.0068829,2700.0\n", + "38,0,juan-20w000p2ge,cpu,CPU_FREQ_3,Hz,1000000.0,2.0126529,1869.530029296875\n", + "39,0,juan-20w000p2ge,cpu,CPU_FREQ_3,Hz,1000000.0,3.0183434,2600.0\n", + "40,0,juan-20w000p2ge,cpu,CPU_FREQ_3,Hz,1000000.0,4.024712,2800.0\n", + 
"41,0,juan-20w000p2ge,cpu,CPU_FREQ_3,Hz,1000000.0,5.0291414,2315.37109375\n", + "42,0,juan-20w000p2ge,cpu,CPU_FREQ_3,Hz,1000000.0,6.033699,2672.827880859375\n", + "43,0,juan-20w000p2ge,cpu,CPU_FREQ_3,Hz,1000000.0,7.0397954,2600.0\n", + "44,0,juan-20w000p2ge,cpu,CPU_FREQ_3,Hz,1000000.0,8.047083,2464.04296875\n", + "45,0,juan-20w000p2ge,cpu,CPU_FREQ_3,Hz,1000000.0,9.0511675,2410.884033203125\n", + "46,0,juan-20w000p2ge,cpu,CPU_FREQ_3,Hz,1000000.0,10.060614,3060.60791015625\n", + "47,0,juan-20w000p2ge,cpu,CPU_FREQ_3,Hz,1000000.0,10.068606,2562.06201171875\n", + "48,0,juan-20w000p2ge,cpu,CPU_FREQ_4,Hz,1000000.0,0.0,2156.548095703125\n", + "49,0,juan-20w000p2ge,cpu,CPU_FREQ_4,Hz,1000000.0,1.0068829,2499.455078125\n", + "50,0,juan-20w000p2ge,cpu,CPU_FREQ_4,Hz,1000000.0,2.0126529,400.62200927734375\n", + "51,0,juan-20w000p2ge,cpu,CPU_FREQ_4,Hz,1000000.0,3.0183434,2080.2919921875\n", + "52,0,juan-20w000p2ge,cpu,CPU_FREQ_4,Hz,1000000.0,4.024712,2777.10693359375\n", + "53,0,juan-20w000p2ge,cpu,CPU_FREQ_4,Hz,1000000.0,5.0291414,1521.5909423828125\n", + "54,0,juan-20w000p2ge,cpu,CPU_FREQ_4,Hz,1000000.0,6.033699,2873.384033203125\n", + "55,0,juan-20w000p2ge,cpu,CPU_FREQ_4,Hz,1000000.0,7.0397954,2195.196044921875\n", + "56,0,juan-20w000p2ge,cpu,CPU_FREQ_4,Hz,1000000.0,8.047083,2817.139892578125\n", + "57,0,juan-20w000p2ge,cpu,CPU_FREQ_4,Hz,1000000.0,9.0511675,2418.926025390625\n", + "58,0,juan-20w000p2ge,cpu,CPU_FREQ_4,Hz,1000000.0,10.060614,2187.868896484375\n", + "59,0,juan-20w000p2ge,cpu,CPU_FREQ_4,Hz,1000000.0,10.068606,2655.29296875\n", + "60,0,juan-20w000p2ge,cpu,CPU_FREQ_5,Hz,1000000.0,0.0,2137.35791015625\n", + "61,0,juan-20w000p2ge,cpu,CPU_FREQ_5,Hz,1000000.0,1.0068829,2700.0\n", + "62,0,juan-20w000p2ge,cpu,CPU_FREQ_5,Hz,1000000.0,2.0126529,769.7069702148438\n", + "63,0,juan-20w000p2ge,cpu,CPU_FREQ_5,Hz,1000000.0,3.0183434,1988.4849853515625\n", + "64,0,juan-20w000p2ge,cpu,CPU_FREQ_5,Hz,1000000.0,4.024712,2471.529052734375\n", + "65,0,juan-20w000p2ge,cpu,CPU_FREQ_5,Hz,1000000.0,5.0291414,1931.303955078125\n", + "66,0,juan-20w000p2ge,cpu,CPU_FREQ_5,Hz,1000000.0,6.033699,2886.305908203125\n", + "67,0,juan-20w000p2ge,cpu,CPU_FREQ_5,Hz,1000000.0,7.0397954,2543.840087890625\n", + "68,0,juan-20w000p2ge,cpu,CPU_FREQ_5,Hz,1000000.0,8.047083,3100.0\n", + "69,0,juan-20w000p2ge,cpu,CPU_FREQ_5,Hz,1000000.0,9.0511675,2055.845947265625\n", + "70,0,juan-20w000p2ge,cpu,CPU_FREQ_5,Hz,1000000.0,10.060614,2340.925048828125\n", + "71,0,juan-20w000p2ge,cpu,CPU_FREQ_5,Hz,1000000.0,10.068606,2812.739990234375\n", + "72,0,juan-20w000p2ge,cpu,CPU_FREQ_6,Hz,1000000.0,0.0,2176.281005859375\n", + "73,0,juan-20w000p2ge,cpu,CPU_FREQ_6,Hz,1000000.0,1.0068829,1221.010986328125\n", + "74,0,juan-20w000p2ge,cpu,CPU_FREQ_6,Hz,1000000.0,2.0126529,1433.5810546875\n", + "75,0,juan-20w000p2ge,cpu,CPU_FREQ_6,Hz,1000000.0,3.0183434,2562.242919921875\n", + "76,0,juan-20w000p2ge,cpu,CPU_FREQ_6,Hz,1000000.0,4.024712,2591.029052734375\n", + "77,0,juan-20w000p2ge,cpu,CPU_FREQ_6,Hz,1000000.0,5.0291414,2437.9990234375\n", + "78,0,juan-20w000p2ge,cpu,CPU_FREQ_6,Hz,1000000.0,6.033699,3000.0\n", + "79,0,juan-20w000p2ge,cpu,CPU_FREQ_6,Hz,1000000.0,7.0397954,2550.77392578125\n", + "80,0,juan-20w000p2ge,cpu,CPU_FREQ_6,Hz,1000000.0,8.047083,3063.29296875\n", + "81,0,juan-20w000p2ge,cpu,CPU_FREQ_6,Hz,1000000.0,9.0511675,2261.791015625\n", + "82,0,juan-20w000p2ge,cpu,CPU_FREQ_6,Hz,1000000.0,10.060614,3050.388916015625\n", + "83,0,juan-20w000p2ge,cpu,CPU_FREQ_6,Hz,1000000.0,10.068606,3017.64892578125\n", + 
"84,0,juan-20w000p2ge,cpu,CPU_FREQ_7,Hz,1000000.0,0.0,2199.987060546875\n", + "85,0,juan-20w000p2ge,cpu,CPU_FREQ_7,Hz,1000000.0,1.0068829,2698.6279296875\n", + "86,0,juan-20w000p2ge,cpu,CPU_FREQ_7,Hz,1000000.0,2.0126529,1597.2509765625\n", + "87,0,juan-20w000p2ge,cpu,CPU_FREQ_7,Hz,1000000.0,3.0183434,2600.0\n", + "88,0,juan-20w000p2ge,cpu,CPU_FREQ_7,Hz,1000000.0,4.024712,2800.0\n", + "89,0,juan-20w000p2ge,cpu,CPU_FREQ_7,Hz,1000000.0,5.0291414,2749.60400390625\n", + "90,0,juan-20w000p2ge,cpu,CPU_FREQ_7,Hz,1000000.0,6.033699,1021.1300048828125\n", + "91,0,juan-20w000p2ge,cpu,CPU_FREQ_7,Hz,1000000.0,7.0397954,1945.0069580078125\n", + "92,0,juan-20w000p2ge,cpu,CPU_FREQ_7,Hz,1000000.0,8.047083,3001.322998046875\n", + "93,0,juan-20w000p2ge,cpu,CPU_FREQ_7,Hz,1000000.0,9.0511675,2486.304931640625\n", + "94,0,juan-20w000p2ge,cpu,CPU_FREQ_7,Hz,1000000.0,10.060614,3200.0\n", + "95,0,juan-20w000p2ge,cpu,CPU_FREQ_7,Hz,1000000.0,10.068606,2859.821044921875\n", + "96,0,juan-20w000p2ge,cpu,CPU_USAGE,%,1.0,0.0,37.5\n", + "97,0,juan-20w000p2ge,cpu,CPU_USAGE,%,1.0,1.0068829,25.700000762939453\n", + "98,0,juan-20w000p2ge,cpu,CPU_USAGE,%,1.0,2.0126529,24.600000381469727\n", + "99,0,juan-20w000p2ge,cpu,CPU_USAGE,%,1.0,3.0183434,33.599998474121094\n", + "100,0,juan-20w000p2ge,cpu,CPU_USAGE,%,1.0,4.024712,31.5\n", + "101,0,juan-20w000p2ge,cpu,CPU_USAGE,%,1.0,5.0291414,23.100000381469727\n", + "102,0,juan-20w000p2ge,cpu,CPU_USAGE,%,1.0,6.033699,26.600000381469727\n", + "103,0,juan-20w000p2ge,cpu,CPU_USAGE,%,1.0,7.0397954,33.900001525878906\n", + "104,0,juan-20w000p2ge,cpu,CPU_USAGE,%,1.0,8.047083,24.700000762939453\n", + "105,0,juan-20w000p2ge,cpu,CPU_USAGE,%,1.0,9.0511675,23.600000381469727\n", + "106,0,juan-20w000p2ge,cpu,CPU_USAGE,%,1.0,10.060614,23.299999237060547\n", + "107,0,juan-20w000p2ge,cpu,CPU_USAGE,%,1.0,10.068606,50.0\n", + "108,0,juan-20w000p2ge,cpu,cpu_0_package-0,W,1.0,0.0,9.068116188049316\n", + "109,0,juan-20w000p2ge,cpu,cpu_0_package-0,W,1.0,1.0068829,9.068116188049316\n", + "110,0,juan-20w000p2ge,cpu,cpu_0_package-0,W,1.0,2.0126529,9.29400634765625\n", + "111,0,juan-20w000p2ge,cpu,cpu_0_package-0,W,1.0,3.0183434,10.591010093688965\n", + "112,0,juan-20w000p2ge,cpu,cpu_0_package-0,W,1.0,4.024712,9.672627449035645\n", + "113,0,juan-20w000p2ge,cpu,cpu_0_package-0,W,1.0,5.0291414,9.234281539916992\n", + "114,0,juan-20w000p2ge,cpu,cpu_0_package-0,W,1.0,6.033699,10.3326416015625\n", + "115,0,juan-20w000p2ge,cpu,cpu_0_package-0,W,1.0,7.0397954,10.53620433807373\n", + "116,0,juan-20w000p2ge,cpu,cpu_0_package-0,W,1.0,8.047083,8.992063522338867\n", + "117,0,juan-20w000p2ge,cpu,cpu_0_package-0,W,1.0,9.0511675,9.542298316955566\n", + "118,0,juan-20w000p2ge,cpu,cpu_0_package-0,W,1.0,10.060614,10.295360565185547\n", + "119,0,juan-20w000p2ge,cpu,cpu_0_package-0,W,1.0,10.068606,11.85925579071045\n", + "120,0,juan-20w000p2ge,disk,DISK_READ_BYTES,B,1.0,0.0,6371516416.0\n", + "121,0,juan-20w000p2ge,disk,DISK_READ_BYTES,B,1.0,1.0068829,6371516416.0\n", + "122,0,juan-20w000p2ge,disk,DISK_READ_BYTES,B,1.0,2.0126529,6371516416.0\n", + "123,0,juan-20w000p2ge,disk,DISK_READ_BYTES,B,1.0,3.0183434,6371516416.0\n", + "124,0,juan-20w000p2ge,disk,DISK_READ_BYTES,B,1.0,4.024712,6371516416.0\n", + "125,0,juan-20w000p2ge,disk,DISK_READ_BYTES,B,1.0,5.0291414,6371516416.0\n", + "126,0,juan-20w000p2ge,disk,DISK_READ_BYTES,B,1.0,6.033699,6371520512.0\n", + "127,0,juan-20w000p2ge,disk,DISK_READ_BYTES,B,1.0,7.0397954,6371520512.0\n", + "128,0,juan-20w000p2ge,disk,DISK_READ_BYTES,B,1.0,8.047083,6371520512.0\n", + 
"129,0,juan-20w000p2ge,disk,DISK_READ_BYTES,B,1.0,9.0511675,6371520512.0\n", + "130,0,juan-20w000p2ge,disk,DISK_READ_BYTES,B,1.0,10.060614,6371520512.0\n", + "131,0,juan-20w000p2ge,disk,DISK_READ_BYTES,B,1.0,10.068606,6371520512.0\n", + "132,0,juan-20w000p2ge,disk,DISK_WRITE_BYTES,B,1.0,0.0,35543599104.0\n", + "133,0,juan-20w000p2ge,disk,DISK_WRITE_BYTES,B,1.0,1.0068829,35543599104.0\n", + "134,0,juan-20w000p2ge,disk,DISK_WRITE_BYTES,B,1.0,2.0126529,35543599104.0\n", + "135,0,juan-20w000p2ge,disk,DISK_WRITE_BYTES,B,1.0,3.0183434,35543697408.0\n", + "136,0,juan-20w000p2ge,disk,DISK_WRITE_BYTES,B,1.0,4.024712,35556833280.0\n", + "137,0,juan-20w000p2ge,disk,DISK_WRITE_BYTES,B,1.0,5.0291414,35556833280.0\n", + "138,0,juan-20w000p2ge,disk,DISK_WRITE_BYTES,B,1.0,6.033699,35556923392.0\n", + "139,0,juan-20w000p2ge,disk,DISK_WRITE_BYTES,B,1.0,7.0397954,35556923392.0\n", + "140,0,juan-20w000p2ge,disk,DISK_WRITE_BYTES,B,1.0,8.047083,35556923392.0\n", + "141,0,juan-20w000p2ge,disk,DISK_WRITE_BYTES,B,1.0,9.0511675,35556923392.0\n", + "142,0,juan-20w000p2ge,disk,DISK_WRITE_BYTES,B,1.0,10.060614,35557033984.0\n", + "143,0,juan-20w000p2ge,disk,DISK_WRITE_BYTES,B,1.0,10.068606,35557033984.0\n", + "144,0,juan-20w000p2ge,net,NET_READ_BYTES,B,1.0,0.0,18377730529.0\n", + "145,0,juan-20w000p2ge,net,NET_READ_BYTES,B,1.0,1.0068829,18377732025.0\n", + "146,0,juan-20w000p2ge,net,NET_READ_BYTES,B,1.0,2.0126529,18377732426.0\n", + "147,0,juan-20w000p2ge,net,NET_READ_BYTES,B,1.0,3.0183434,18377740366.0\n", + "148,0,juan-20w000p2ge,net,NET_READ_BYTES,B,1.0,4.024712,18377741928.0\n", + "149,0,juan-20w000p2ge,net,NET_READ_BYTES,B,1.0,5.0291414,18377741994.0\n", + "150,0,juan-20w000p2ge,net,NET_READ_BYTES,B,1.0,6.033699,18377741994.0\n", + "151,0,juan-20w000p2ge,net,NET_READ_BYTES,B,1.0,7.0397954,18391531834.0\n", + "152,0,juan-20w000p2ge,net,NET_READ_BYTES,B,1.0,8.047083,18391531959.0\n", + "153,0,juan-20w000p2ge,net,NET_READ_BYTES,B,1.0,9.0511675,18391531959.0\n", + "154,0,juan-20w000p2ge,net,NET_READ_BYTES,B,1.0,10.060614,18391534144.0\n", + "155,0,juan-20w000p2ge,net,NET_READ_BYTES,B,1.0,10.068606,18391534144.0\n", + "156,0,juan-20w000p2ge,net,NET_WRITE_BYTES,B,1.0,0.0,304896333.0\n", + "157,0,juan-20w000p2ge,net,NET_WRITE_BYTES,B,1.0,1.0068829,304897829.0\n", + "158,0,juan-20w000p2ge,net,NET_WRITE_BYTES,B,1.0,2.0126529,304898025.0\n", + "159,0,juan-20w000p2ge,net,NET_WRITE_BYTES,B,1.0,3.0183434,304900338.0\n", + "160,0,juan-20w000p2ge,net,NET_WRITE_BYTES,B,1.0,4.024712,304901904.0\n", + "161,0,juan-20w000p2ge,net,NET_WRITE_BYTES,B,1.0,5.0291414,304901904.0\n", + "162,0,juan-20w000p2ge,net,NET_WRITE_BYTES,B,1.0,6.033699,304901904.0\n", + "163,0,juan-20w000p2ge,net,NET_WRITE_BYTES,B,1.0,7.0397954,304946475.0\n", + "164,0,juan-20w000p2ge,net,NET_WRITE_BYTES,B,1.0,8.047083,304946686.0\n", + "165,0,juan-20w000p2ge,net,NET_WRITE_BYTES,B,1.0,9.0511675,304946686.0\n", + "166,0,juan-20w000p2ge,net,NET_WRITE_BYTES,B,1.0,10.060614,304948698.0\n", + "167,0,juan-20w000p2ge,net,NET_WRITE_BYTES,B,1.0,10.068606,304948698.0\n", + "168,0,juan-20w000p2ge,ram,RAM_USAGE,B,1.0,0.0,7110832128.0\n", + "169,0,juan-20w000p2ge,ram,RAM_USAGE,B,1.0,1.0068829,7132991488.0\n", + "170,0,juan-20w000p2ge,ram,RAM_USAGE,B,1.0,2.0126529,7121014784.0\n", + "171,0,juan-20w000p2ge,ram,RAM_USAGE,B,1.0,3.0183434,7130132480.0\n", + "172,0,juan-20w000p2ge,ram,RAM_USAGE,B,1.0,4.024712,7077158912.0\n", + "173,0,juan-20w000p2ge,ram,RAM_USAGE,B,1.0,5.0291414,7070154752.0\n", + "174,0,juan-20w000p2ge,ram,RAM_USAGE,B,1.0,6.033699,7081443328.0\n", + 
"175,0,juan-20w000p2ge,ram,RAM_USAGE,B,1.0,7.0397954,7110733824.0\n", + "176,0,juan-20w000p2ge,ram,RAM_USAGE,B,1.0,8.047083,7109107712.0\n", + "177,0,juan-20w000p2ge,ram,RAM_USAGE,B,1.0,9.0511675,7103995904.0\n", + "178,0,juan-20w000p2ge,ram,RAM_USAGE,B,1.0,10.060614,7114371072.0\n", + "179,0,juan-20w000p2ge,ram,RAM_USAGE,B,1.0,10.068606,7114371072.0\n" + ] + } + ], + "source": [ + "%%bash\n", + "perun export perun_results/sleep.hdf5 csv\n", + "cat perun_results/sleep_*.csv" + ] + }, + { + "cell_type": "markdown", + "id": "14", + "metadata": {}, + "source": [ + "Let's move on to a slightly more interesting example, that we are going to profile in parallel inside our notebook using **ipyparallel**. " + ] + }, + { + "cell_type": "markdown", + "id": "15", + "metadata": {}, + "source": [ + "## Setup for a notebook" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "16", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "4 engines found\n" + ] + } + ], + "source": [ + "from ipyparallel import Client\n", + "rc = Client(profile=\"default\")\n", + "rc.ids\n", + "\n", + "if len(rc.ids) == 0:\n", + " print(\"No engines found\")\n", + "else:\n", + " print(f\"{len(rc.ids)} engines found\")" + ] + }, + { + "cell_type": "markdown", + "id": "17", + "metadata": {}, + "source": [ + "## Using the perun decorators\n", + "\n", + "perun offers an alternative way to start monitoring your code by using function decorators. The main goal is to isolate the region of the code that you want to monitor inside a function, and decorate it with the ```@perun``` decorator. Now, your code can be started using the normal python command, and perun will start gathering data only when that function is reached.\n", + "\n", + "**Carefull**: For each time the perun decorator is called, it will create a new output file and a new run, which could slow down your code significantly. If the function that you want to monitor will be run more than once, it is better to use the ```@monitor``` decorator. \n", + "\n", + "Let's look at the example below." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "18", + "metadata": {}, + "outputs": [], + "source": [ + "%%px\n", + "import sklearn\n", + "import heat as ht\n", + "from perun import perun, monitor\n", + "\n", + "@monitor()\n", + "def data_loading():\n", + " X,_ = sklearn.datasets.load_digits(return_X_y=True)\n", + " return ht.array(X, split=0)\n", + "\n", + "@monitor()\n", + "def fitting(X):\n", + " k = 10\n", + " kmeans = ht.cluster.KMeans(n_clusters=k, init=\"kmeans++\")\n", + " kmeans.fit(X)\n", + "\n", + "@perun(log_lvl=\"WARNING\", data_out=\"perun_data\", format=\"text\", sampling_period=0.1)\n", + "def main():\n", + " data = data_loading()\n", + " fitting(data)\n" + ] + }, + { + "cell_type": "markdown", + "id": "19", + "metadata": {}, + "source": [ + "The example has 3 functions, the ```main``` function with the ```@perun``` decorator, ```fitting``` and ```data_loading``` with the ```@monitor``` decorator. **perun** will start monitoring whenever we run the ```main``` function, and will record the entry and exit time of the other two functions marked with ```@monitor```. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "20", + "metadata": {}, + "outputs": [], + "source": [ + "%%px\n", + "main()" + ] + }, + { + "cell_type": "markdown", + "id": "21", + "metadata": {}, + "source": [ + "The text report will have an extra table with with all the monitored functions, outlining the average runtime, and power draw measured while the application was running, together with other metrics. The data can also be found in the hdf5 file, where the start and stop events of the functions are stored under the regions node of the individual runs. " + ] + }, + { + "cell_type": "markdown", + "id": "22", + "metadata": {}, + "source": [ + "If you want more information on perun check the [documentation](https://perun.readthedocs.io/en/latest/?badge=latest) or check the code in [github](https://github.com/Helmholtz-AI-Energy/perun). Thanks!" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "heat-dev311", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.2" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/source/tutorial_30_minutes.rst b/doc/source/tutorials/tutorial_30_minutes.rst similarity index 100% rename from doc/source/tutorial_30_minutes.rst rename to doc/source/tutorials/tutorial_30_minutes.rst diff --git a/doc/source/tutorial_clustering.rst b/doc/source/tutorials/tutorial_clustering.rst similarity index 97% rename from doc/source/tutorial_clustering.rst rename to doc/source/tutorials/tutorial_clustering.rst index 21b4157065..ccceb4248b 100644 --- a/doc/source/tutorial_clustering.rst +++ b/doc/source/tutorials/tutorial_clustering.rst @@ -50,7 +50,7 @@ all processes) and transform it into a numpy array. Plotting can only be done on This will render something like -.. image:: ../images/data.png +.. image:: ../_static/images/data.png Now we perform the clustering analysis with kmeans. We chose 'kmeans++' as an intelligent way of sampling the initial centroids. @@ -93,7 +93,7 @@ Let's plot the assigned clusters and the respective centroids: plt.plot(centroids[1,0],centroids[1,1], '^', markersize=10, markeredgecolor='black',color='#5a696e') plt.show() -.. image:: ../images/clustering.png +.. image:: ../_static/images/clustering.png We can also cluster the data with kmedians. The respective advanced initial centroid sampling is called 'kmedians++'. @@ -127,7 +127,7 @@ Plotting the assigned clusters and the respective centroids: plt.plot(centroids[1,0],centroids[1,1], '^', markersize=10, markeredgecolor='black',color='#5a696e') plt.show() -.. image:: ../images/clustering_kmeans.png +.. image:: ../_static/images/clustering_kmeans.png The Iris Dataset ------------------------------ @@ -139,6 +139,8 @@ dataloader .. code:: python iris = ht.load("heat/datasets/iris.csv", sep=";", split=0) + + Fitting the dataset with kmeans: .. code:: python diff --git a/doc/source/tutorials/tutorial_notebook_gallery.rst b/doc/source/tutorials/tutorial_notebook_gallery.rst new file mode 100644 index 0000000000..67c67ab40b --- /dev/null +++ b/doc/source/tutorials/tutorial_notebook_gallery.rst @@ -0,0 +1,25 @@ +Notebook gallery +================ + +Setup notebooks +~~~~~~~~~~~~~~~ + +Example notebooks explaining how to setup an MPI enabled notebook to work with heat in an interactive way. + +.. 
nbgallery:: + notebooks/0_setup/0_setup_local + notebooks/0_setup/0_setup_jsc + notebooks/0_setup/0_setup_haicore + +Example notebooks +~~~~~~~~~~~~~~~~~ + +These notebooks contain heat examples that have been used in interactive tutorials. + +.. nbgallery:: + notebooks/1_basics + notebooks/2_internals + notebooks/3_loading_preprocessing + notebooks/4_matrix_factorizations + notebooks/5_clustering + notebooks/6_profiling diff --git a/doc/source/tutorial_parallel_computation.rst b/doc/source/tutorials/tutorial_parallel_computation.rst similarity index 99% rename from doc/source/tutorial_parallel_computation.rst rename to doc/source/tutorials/tutorial_parallel_computation.rst index 684e775cea..2a10777726 100644 --- a/doc/source/tutorial_parallel_computation.rst +++ b/doc/source/tutorials/tutorial_parallel_computation.rst @@ -74,7 +74,7 @@ Distributed Computing With Heat you can even compute in distributed memory environments with multiple computation nodes, like modern high-performance cluster systems. For this, Heat makes use of the fact that operations performed on multi-dimensional arrays tend to be identical for all data items. Hence, they can be processed in data-parallel manner. Heat partitions the total number of data items equally among all processing nodes. A ``DNDarray`` assumes the role of a virtual overlay over these node-local data portions and manages them for you while offering the same interface. Consequently, operations can now be executed in parallel. Each processing node applies them locally to their own data chunk. If necessary, partial results are communicated and automatically combined behind the scenes for correct global results. -.. image:: ../images/split_array.svg +.. image:: ../_static/images/split_array.svg :align: center :width: 80% @@ -202,7 +202,7 @@ Technical Details On a technical level, Heat is inspired by the so-called `Bulk Synchronous Parallel (BSP) `_ processing model. Computations proceed in a series of hierarchical supersteps, each consisting of a number of node-local computations and subsequent communications. In contrast to the classical BSP model, communicated data is available immediately, rather than after the next global synchronization. In Heat, global synchronization only occurs for collective MPI calls as well as at the program start and termination. -.. image:: ../images/bsp.svg +.. image:: ../_static/images/bsp.svg :align: center :width: 60% diff --git a/doc/source/tutorials.rst b/doc/source/tutorials/tutorials.rst similarity index 68% rename from doc/source/tutorials.rst rename to doc/source/tutorials/tutorials.rst index 6cc42143f2..59b68fb2bf 100644 --- a/doc/source/tutorials.rst +++ b/doc/source/tutorials/tutorials.rst @@ -7,12 +7,14 @@ Heat Tutorials tutorial_30_minutes tutorial_parallel_computation tutorial_clustering + tutorial_notebook_gallery + .. container:: tutorial .. container:: tutorial-image - .. image:: ../images/tutorial_logo.svg + .. image:: ../_static/images/tutorial_logo.svg :target: tutorial_30_minutes.html .. raw:: html @@ -31,7 +33,7 @@ Heat Tutorials .. container:: tutorial-image - .. image:: ../images/tutorial_split_dndarray.svg + .. image:: ../_static/images/tutorial_split_dndarray.svg :target: tutorial_parallel_computation.html .. raw:: html @@ -50,7 +52,7 @@ Heat Tutorials .. container:: tutorial-image - .. image:: ../images/tutorial_clustering.svg + .. image:: ../_static/images/tutorial_clustering.svg :target: tutorial_clustering.html .. raw:: html @@ -63,3 +65,22 @@ Heat Tutorials For intermediate analysts.
+ + +.. container:: tutorial + + .. container:: tutorial-image + + .. image:: ../_static/images/jupyter.png + :target: tutorial_notebook_gallery.html + + .. raw:: html + + +
+ Example notebooks +

+

Ideal for people who like using Jupyter notebooks or other interactive environments.

+ Excellent for beginners. +
+
diff --git a/docker/Dockerfile.release b/docker/Dockerfile.release index 3aa43fde14..8ead42996a 100644 --- a/docker/Dockerfile.release +++ b/docker/Dockerfile.release @@ -1,5 +1,5 @@ ARG HEAT_VERSION=latest -ARG PYTORCH_IMG=23.05-py3 +ARG PYTORCH_IMG=25.07-py3 FROM nvcr.io/nvidia/pytorch:${PYTORCH_IMG} AS base COPY ./tzdata.seed /tmp/tzdata.seed diff --git a/docker/Dockerfile.source b/docker/Dockerfile.source index 2765d1cc41..93a017b359 100644 --- a/docker/Dockerfile.source +++ b/docker/Dockerfile.source @@ -1,4 +1,4 @@ -ARG PYTORCH_IMG=23.05-py3 +ARG PYTORCH_IMG=25.07-py3 ARG HEAT_BRANCH=main FROM nvcr.io/nvidia/pytorch:${PYTORCH_IMG} AS base diff --git a/docker/scripts/build_and_push.sh b/docker/scripts/build_and_push.sh index 10895596ab..8fdd1d309f 100755 --- a/docker/scripts/build_and_push.sh +++ b/docker/scripts/build_and_push.sh @@ -8,39 +8,35 @@ while [[ $# -gt 0 ]]; do case $1 in --heat-version) HEAT_VERSION="$2" - shift # past argument - shift # past value + shift 2 # Use 'shift 2' as a concise way to skip past the argument and its value ;; --pytorch-img) PYTORCH_IMG="$2" - shift # past argument - shift # past value + shift 2 ;; --torch-version) TORCH_VERSION="$2" - shift # past argument - shift # past value + shift 2 ;; --cuda-version) CUDA_VERSION="$2" - shift # past argument - shift # past value + shift 2 ;; --python-version) PYTHON_VERSION="$2" - shift # past argument - shift # past value + shift 2 ;; --upload) GHCR_UPLOAD=true - shift - shift + shift # FIX 1: This is a flag, not an option with a value, so only shift once. ;; - -*|--*) + -*) # FIX 2: Simplified from '-*|--*'. This correctly catches all unknown options. echo "Unknown option $1" exit 1 ;; - *) + *) # FIX 3: Added a 'shift' to handle positional arguments and prevent an infinite loop. + shift + ;; esac done @@ -56,13 +52,13 @@ ghcr_tag="ghcr.io/helmholtz-analytics/heat:${HEAT_VERSION}_torch${TORCH_VERSION} echo "Building image $ghcr_tag" docker build --file ../Dockerfile.release \ - --build-arg HEAT_VERSION=$HEAT_VERSION \ - --build-arg PYTORCH_IMG=$PYTORCH_IMG \ - --tag $ghcr_tag \ + --build-arg HEAT_VERSION="$HEAT_VERSION" \ + --build-arg PYTORCH_IMG="$PYTORCH_IMG" \ + --tag "$ghcr_tag" \ . if [ $GHCR_UPLOAD = true ]; then echo "Push image" echo "You might need to log in into ghcr.io (https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry#authenticating-to-the-container-registry)" - docker push $ghcr_tag + docker push "$ghcr_tag" fi diff --git a/docker/scripts/install_print_test.sh b/docker/scripts/install_print_test.sh index 9103be9562..08bbf33ae6 100755 --- a/docker/scripts/install_print_test.sh +++ b/docker/scripts/install_print_test.sh @@ -12,10 +12,10 @@ mpirun --version # Install heat from source. 
git clone https://github.com/helmholtz-analytics/heat.git -cd heat +cd heat || exit pip install --upgrade pip pip install mpi4py --no-binary :all: -pip install .[netcdf,hdf5,dev] +pip install '.[netcdf,hdf5,dev]' # Run tests HEAT_TEST_USE_DEVICE=gpu mpirun -n 1 pytest heat/ diff --git a/docker/scripts/test_nvidia_image_haicore_enroot.sh b/docker/scripts/test_nvidia_image_haicore_enroot.sh index 7b052b22ea..07c6b94f7d 100755 --- a/docker/scripts/test_nvidia_image_haicore_enroot.sh +++ b/docker/scripts/test_nvidia_image_haicore_enroot.sh @@ -12,7 +12,7 @@ SBATCH_PARAMS=( --gres gpu:1 --container-image ~/containers/nvidia+pytorch+23.05-py3.sqsh --container-writable - --container-mounts /etc/slurm/task_prolog.hk:/etc/slurm/task_prolog.hk,/scratch:/scratch + --container-mounts "/etc/slurm/task_prolog.hk:/etc/slurm/task_prolog.hk,/scratch:/scratch" --container-mount-home ) diff --git a/heat/classification/kneighborsclassifier.py b/heat/classification/kneighborsclassifier.py index dac65546bc..90d1859537 100644 --- a/heat/classification/kneighborsclassifier.py +++ b/heat/classification/kneighborsclassifier.py @@ -24,7 +24,7 @@ class KNeighborsClassifier(ht.BaseEstimator, ht.ClassificationMixin): The distance function used to identify the nearest neighbors, defaults to the Euclidean distance. References -------- + ---------- [1] T. Cover and P. Hart, "Nearest Neighbor Pattern Classification," in IEEE Transactions on Information Theory, vol. 13, no. 1, pp. 21-27, January 1967, doi: 10.1109/TIT.1967.1053964. """ @@ -122,7 +122,6 @@ def predict(self, x: DNDarray) -> DNDarray: """ distances = self.effective_metric_(x, self.x) _, indices = ht.topk(distances, self.n_neighbors, largest=False) - predictions = self.y[indices.flatten()] predictions.balance_() predictions = ht.reshape(predictions, (indices.gshape + (self.y.gshape[1],))) diff --git a/heat/cli.py b/heat/cli.py new file mode 100644 index 0000000000..29a91fdbc7 --- /dev/null +++ b/heat/cli.py @@ -0,0 +1,54 @@ +""" +Heat command line interface module. +""" + +import torch +import platform +import mpi4py +import argparse + +from heat.core.version import __version__ as ht_version +from heat.core.communication import CUDA_AWARE_MPI + + +def cli() -> None: + """ + Command line interface entrypoint. + """ + parser = argparse.ArgumentParser( + prog="heat", description="Command line utilities of the Helmholtz Analytics Toolkit" + ) + parser.add_argument( + "-i", "--info", action="store_true", help="Print version and platform information" + ) + + args = parser.parse_args() + if args.info: + platform_info() + else: + parser.print_help() + + +def platform_info(): + """ + Print the current software stack being used by heat, including available devices. 
+ """ + print("HeAT: Helmholtz Analytics Toolkit") + print(f" Version: {ht_version}") + print(f" Platform: {platform.platform()}") + + print(f" mpi4py Version: {mpi4py.__version__}") + print(f" MPI Library Version: {mpi4py.MPI.Get_library_version()}") + + print(f" Torch Version: {torch.__version__}") + print(f" CUDA Available: {torch.cuda.is_available()}") + if torch.cuda.is_available(): + def_device = torch.cuda.current_device() + print(f" Device count: {torch.cuda.device_count()}") + print(f" Default device: {def_device}") + print(f" Device name: {torch.cuda.get_device_name(def_device)}") + print(f" Device name: {torch.cuda.get_device_properties(def_device)}") + print( + f" Device memory: {torch.cuda.get_device_properties(def_device).total_memory / 1024**3} GiB" + ) + print(f" CUDA Aware MPI: {CUDA_AWARE_MPI}") diff --git a/heat/cluster/batchparallelclustering.py b/heat/cluster/batchparallelclustering.py index 257b88c18d..ef7dd45aba 100644 --- a/heat/cluster/batchparallelclustering.py +++ b/heat/cluster/batchparallelclustering.py @@ -308,13 +308,16 @@ def predict(self, x: DNDarray): if self._p == 2: self._functional_value = ( torch.norm( - x.larray - self._cluster_centers.larray[local_labels, :].squeeze(), p="fro" + x.larray - self._cluster_centers.larray[local_labels, :].squeeze(), + p="fro", ) ** 2 ) else: self._functional_value = torch.norm( - x.larray - self._cluster_centers.larray[local_labels, :].squeeze(), p=self._p, dim=1 + x.larray - self._cluster_centers.larray[local_labels, :].squeeze(), + p=self._p, + dim=1, ).sum() x.comm.Allreduce(ht.communication.MPI.IN_PLACE, self._functional_value) self._functional_value = self._functional_value.item() diff --git a/heat/cluster/kmedians.py b/heat/cluster/kmedians.py index c7d991b1fd..efca792ff6 100644 --- a/heat/cluster/kmedians.py +++ b/heat/cluster/kmedians.py @@ -32,7 +32,7 @@ class KMedians(_KCluster): Determines random number generation for centroid initialization. References - ------------- + ---------- [1] Hakimi, S., and O. Kariv. "An algorithmic approach to network location problems II: The p-medians." SIAM Journal on Applied Mathematics 37.3 (1979): 539-560. """ diff --git a/heat/cluster/kmedoids.py b/heat/cluster/kmedoids.py index 0eb38a5eb6..b0fd951ae2 100644 --- a/heat/cluster/kmedoids.py +++ b/heat/cluster/kmedoids.py @@ -10,9 +10,9 @@ class KMedoids(_KCluster): """ - This is not the original implementation of k-medoids using PAM as originally proposed by in [1]. - This is kmedoids with the Manhattan distance as fixed metric, calculating the median of the assigned cluster points as new cluster center + Kmedoids with the Manhattan distance as fixed metric, calculating the median of the assigned cluster points as new cluster center and snapping the centroid to the the nearest datapoint afterwards. + This is not the original implementation of k-medoids using PAM as originally proposed by in [1]. Parameters ---------- @@ -30,7 +30,7 @@ class KMedoids(_KCluster): Determines random number generation for centroid initialization. References - ----------- + ---------- [1] Kaufman, L. and Rousseeuw, P.J. (1987), Clustering by means of Medoids, in Statistical Data Analysis Based on the L1 Norm and Related Methods, edited by Y. Dodge, North-Holland, 405416. 
""" diff --git a/heat/cluster/tests/test_batchparallelclustering.py b/heat/cluster/tests/test_batchparallelclustering.py index 222e6f8fff..684d9d9247 100644 --- a/heat/cluster/tests/test_batchparallelclustering.py +++ b/heat/cluster/tests/test_batchparallelclustering.py @@ -1,4 +1,5 @@ import os +import platform import unittest import numpy as np import torch @@ -11,7 +12,12 @@ # test BatchParallelKCluster base class and auxiliary functions +# skip on MPS +envar = os.getenv("HEAT_TEST_USE_DEVICE", "cpu") +is_mps = envar == "gpu" and platform.system() == "Darwin" + +@unittest.skipIf(is_mps, "Batchparallelclustering fit() fails on MPS") class TestAuxiliaryFunctions(TestCase): def test_kmex(self): X = torch.rand(10, 3) @@ -50,6 +56,7 @@ def test_BatchParallelKClustering(self): # test BatchParallelKMeans and BatchParallelKMedians +@unittest.skipIf(is_mps, "Batchparallelclustering fit() fails on MPS") class TestBatchParallelKCluster(TestCase): def test_clusterer(self): for ParallelClusterer in [ht.cluster.BatchParallelKMeans, ht.cluster.BatchParallelKMedians]: @@ -84,6 +91,11 @@ def test_get_and_set_params(self): self.assertEqual(10, parallelclusterer.n_clusters) def test_spherical_clusters(self): + if self.is_mps: + dtypes = [ht.float32] + else: + dtypes = [ht.float32, ht.float64] + for ParallelClusterer in [ht.cluster.BatchParallelKMeans, ht.cluster.BatchParallelKMedians]: if ParallelClusterer is ht.cluster.BatchParallelKMeans: ppinitkws = ["k-means++"] @@ -91,7 +103,7 @@ def test_spherical_clusters(self): ppinitkws = ["k-medians++"] for seed in [1, None]: n = 20 * ht.MPI_WORLD.size - for dtype in [ht.float32, ht.float64]: + for dtype in dtypes: data = create_spherical_dataset( num_samples_cluster=n, radius=1.0, diff --git a/heat/cluster/tests/test_kmeans.py b/heat/cluster/tests/test_kmeans.py index 9fa79ed67e..ec6254c633 100644 --- a/heat/cluster/tests/test_kmeans.py +++ b/heat/cluster/tests/test_kmeans.py @@ -100,15 +100,20 @@ def test_spherical_clusters(self): # different datatype n = 20 * ht.MPI_WORLD.size + if self.is_mps: + # MPS does not support float64 + dtype = ht.float32 + else: + dtype = ht.float64 data = create_spherical_dataset( - num_samples_cluster=n, radius=1.0, offset=4.0, dtype=ht.float64, random_state=seed + num_samples_cluster=n, radius=1.0, offset=4.0, dtype=dtype, random_state=seed ) kmeans = ht.cluster.KMeans(n_clusters=4, init="kmeans++") kmeans.fit(data) self.assertIsInstance(kmeans.cluster_centers_, ht.DNDarray) self.assertEqual(kmeans.cluster_centers_.shape, (4, 3)) - # on Ints (different radius, offset and datatype + # on Ints (different radius, offset and datatype) data = create_spherical_dataset( num_samples_cluster=n, radius=10.0, offset=40.0, dtype=ht.int32, random_state=seed ) diff --git a/heat/cluster/tests/test_kmedians.py b/heat/cluster/tests/test_kmedians.py index 64c95eb740..ee8b534e50 100644 --- a/heat/cluster/tests/test_kmedians.py +++ b/heat/cluster/tests/test_kmedians.py @@ -100,8 +100,13 @@ def test_spherical_clusters(self): # different datatype n = 20 * ht.MPI_WORLD.size + # MPS does not support float64 + if self.is_mps: + dtype = ht.float32 + else: + dtype = ht.float64 data = create_spherical_dataset( - num_samples_cluster=n, radius=1.0, offset=4.0, dtype=ht.float64, random_state=seed + num_samples_cluster=n, radius=1.0, offset=4.0, dtype=dtype, random_state=seed ) kmedians = ht.cluster.KMedians(n_clusters=4, init="kmedians++") kmedians.fit(data) diff --git a/heat/cluster/tests/test_kmedoids.py b/heat/cluster/tests/test_kmedoids.py index 
b04d29a522..27ce5388bf 100644 --- a/heat/cluster/tests/test_kmedoids.py +++ b/heat/cluster/tests/test_kmedoids.py @@ -103,8 +103,13 @@ def test_spherical_clusters(self): # different datatype n = 20 * ht.MPI_WORLD.size + # MPS does not support float64 + if self.is_mps: + dtype = ht.float32 + else: + dtype = ht.float64 data = create_spherical_dataset( - num_samples_cluster=n, radius=1.0, offset=4.0, dtype=ht.float64, random_state=seed + num_samples_cluster=n, radius=1.0, offset=4.0, dtype=dtype, random_state=seed ) kmedoid = ht.cluster.KMedoids(n_clusters=4, init="kmedoids++") kmedoid.fit(data) diff --git a/heat/cluster/tests/test_spectral.py b/heat/cluster/tests/test_spectral.py index 9e24dddfc5..cd43433d9d 100644 --- a/heat/cluster/tests/test_spectral.py +++ b/heat/cluster/tests/test_spectral.py @@ -2,6 +2,7 @@ import unittest import heat as ht +import torch from ...core.tests.test_suites.basic_test import TestCase @@ -35,49 +36,51 @@ def test_get_and_set_params(self): self.assertEqual(10, spectral.n_clusters) def test_fit_iris(self): - # get some test data - iris = ht.load("heat/datasets/iris.csv", sep=";", split=0) - m = 10 - # fit the clusters - spectral = ht.cluster.Spectral( - n_clusters=3, gamma=1.0, metric="rbf", laplacian="fully_connected", n_lanczos=m - ) - spectral.fit(iris) - self.assertIsInstance(spectral.labels_, ht.DNDarray) + # skip on MPS, matmul on ComplexFloat not supported as of PyTorch 2.5 + if not self.is_mps: + # get some test data + iris = ht.load("heat/datasets/iris.csv", sep=";", split=0) + m = 10 + # fit the clusters + spectral = ht.cluster.Spectral( + n_clusters=3, gamma=1.0, metric="rbf", laplacian="fully_connected", n_lanczos=m + ) + spectral.fit(iris) + self.assertIsInstance(spectral.labels_, ht.DNDarray) - spectral = ht.cluster.Spectral( - metric="euclidean", - laplacian="eNeighbour", - threshold=0.5, - boundary="upper", - n_lanczos=m, - ) - labels = spectral.fit_predict(iris) - self.assertIsInstance(labels, ht.DNDarray) + spectral = ht.cluster.Spectral( + metric="euclidean", + laplacian="eNeighbour", + threshold=0.5, + boundary="upper", + n_lanczos=m, + ) + labels = spectral.fit_predict(iris) + self.assertIsInstance(labels, ht.DNDarray) - spectral = ht.cluster.Spectral( - gamma=0.1, - metric="rbf", - laplacian="eNeighbour", - threshold=0.5, - boundary="upper", - n_lanczos=m, - ) - labels = spectral.fit_predict(iris) - self.assertIsInstance(labels, ht.DNDarray) + spectral = ht.cluster.Spectral( + gamma=0.1, + metric="rbf", + laplacian="eNeighbour", + threshold=0.5, + boundary="upper", + n_lanczos=m, + ) + labels = spectral.fit_predict(iris) + self.assertIsInstance(labels, ht.DNDarray) - kmeans = {"kmeans++": "kmeans++", "max_iter": 30, "tol": -1} - spectral = ht.cluster.Spectral( - n_clusters=3, gamma=1.0, normalize=True, n_lanczos=m, params=kmeans - ) - labels = spectral.fit_predict(iris) - self.assertIsInstance(labels, ht.DNDarray) + kmeans = {"kmeans++": "kmeans++", "max_iter": 30, "tol": -1} + spectral = ht.cluster.Spectral( + n_clusters=3, gamma=1.0, normalize=True, n_lanczos=m, params=kmeans + ) + labels = spectral.fit_predict(iris) + self.assertIsInstance(labels, ht.DNDarray) - # Errors - with self.assertRaises(NotImplementedError): - spectral = ht.cluster.Spectral(metric="ahalanobis", n_lanczos=m) + # Errors + with self.assertRaises(NotImplementedError): + spectral = ht.cluster.Spectral(metric="ahalanobis", n_lanczos=m) - iris_split = ht.load("heat/datasets/iris.csv", sep=";", split=1) - spectral = ht.cluster.Spectral(n_lanczos=20) - with 
self.assertRaises(NotImplementedError): - spectral.fit(iris_split) + iris_split = ht.load("heat/datasets/iris.csv", sep=";", split=1) + spectral = ht.cluster.Spectral(n_lanczos=20) + with self.assertRaises(NotImplementedError): + spectral.fit(iris_split) diff --git a/heat/core/_operations.py b/heat/core/_operations.py index 4541ba08a7..3977975a7a 100644 --- a/heat/core/_operations.py +++ b/heat/core/_operations.py @@ -63,7 +63,7 @@ def __binary_op( MPI communication is necessary when both operands are distributed along the same dimension, but the distribution maps do not match. E.g.: ``` - a = ht.ones(10000, split=0) + a = ht.ones(10000, split=0) b = ht.zeros(10000, split=0) c = a[:-1] + b[1:] ``` @@ -197,6 +197,10 @@ def __get_out_params(target, other=None, map=None): sanitation.sanitize_out(out, output_shape, output_split, output_device, output_comm) t1, t2 = sanitation.sanitize_distribution(t1, t2, target=out) + # MPS does not support float64 + if t1.larray.is_mps and promoted_type == torch.float64: + promoted_type = torch.float32 + result = operation(t1.larray.to(promoted_type), t2.larray.to(promoted_type), **fn_kwargs) if out is None and where is True: @@ -282,6 +286,9 @@ def __cum_op( if dtype is not None: dtype = types.canonical_heat_type(dtype) + if x.larray.is_mps and dtype == types.float64: + warnings.warn("MPS does not support float64, will cast to float32") + dtype = types.float32 if out is not None: sanitation.sanitize_out(out, x.shape, x.split, x.device) @@ -350,13 +357,15 @@ def __local_op( out : DNDarray, optional A location in which to store the results. If provided, it must have a broadcastable shape. If not provided or set to None, a fresh tensor is allocated. + **kwargs: + Arguments to be passed to the operation. Warning ------- The gshape of the result DNDarray will be the same as that of x Raises - ------- + ------ TypeError If the input is not a tensor or the output is not a tensor or None. """ @@ -369,6 +378,8 @@ def __local_op( # we need floating point numbers here, due to PyTorch only providing sqrt() implementation for float32/64 if not no_cast: promoted_type = types.promote_types(x.dtype, types.float32) + if promoted_type is types.float64 and x.larray.is_mps: + promoted_type = types.float32 torch_type = promoted_type.torch_type() else: torch_type = x.larray.dtype @@ -426,6 +437,8 @@ def __reduce_op( Neutral element, i.e. an element that does not change the result of the reduction operation. Needed for those cases where 'x.gshape[x.split] < x.comm.rank', that is, the shape of the distributed tensor is such that one or more processes will be left without data. + **kwargs: + Arguments to be passed to the operation. 
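The pattern recurring in these _operations.py hunks — demote float64 to float32 whenever the local tensor lives on Apple's MPS backend — boils down to a small guard; a sketch for illustration, not code from the patch:

    import torch

    def mps_safe_dtype(tensor: torch.Tensor, wanted: torch.dtype) -> torch.dtype:
        # MPS has no double-precision support, so float64 is demoted to float32
        if tensor.is_mps and wanted == torch.float64:
            return torch.float32
        return wanted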
Raises ------ diff --git a/heat/core/arithmetics.py b/heat/core/arithmetics.py index d91c6e6d1b..1318e8a8ba 100644 --- a/heat/core/arithmetics.py +++ b/heat/core/arithmetics.py @@ -231,11 +231,11 @@ def bitwise_and( DNDarray(1, dtype=ht.int64, device=cpu:0, split=None) >>> ht.bitwise_and(14, 13) DNDarray(12, dtype=ht.int64, device=cpu:0, split=None) - >>> ht.bitwise_and(ht.array([14,3]), 13) + >>> ht.bitwise_and(ht.array([14, 3]), 13) DNDarray([12, 1], dtype=ht.int64, device=cpu:0, split=None) - >>> ht.bitwise_and(ht.array([11,7]), ht.array([4,25])) + >>> ht.bitwise_and(ht.array([11, 7]), ht.array([4, 25])) DNDarray([0, 1], dtype=ht.int64, device=cpu:0, split=None) - >>> ht.bitwise_and(ht.array([2,5,255]), ht.array([3,14,16])) + >>> ht.bitwise_and(ht.array([2, 5, 255]), ht.array([3, 14, 16])) DNDarray([ 2, 4, 16], dtype=ht.int64, device=cpu:0, split=None) >>> ht.bitwise_and(ht.array([True, True]), ht.array([False, True])) DNDarray([False, True], dtype=ht.bool, device=cpu:0, split=None) @@ -304,15 +304,15 @@ def bitwise_and_(t1: DNDarray, t2: Union[DNDarray, float]) -> DNDarray: DNDarray(16, dtype=ht.int64, device=cpu:0, split=None) >>> T2 DNDarray(16, dtype=ht.int64, device=cpu:0, split=None) - >>> T4 = ht.array([14,3]) + >>> T4 = ht.array([14, 3]) >>> s = 29 >>> T4 &= s >>> T4 DNDarray([12, 1], dtype=ht.int64, device=cpu:0, split=None) >>> s 29 - >>> T5 = ht.array([2,5,255]) - >>> T6 = ht.array([3,14,16]) + >>> T5 = ht.array([2, 5, 255]) + >>> T6 = ht.array([3, 14, 16]) >>> T5 &= T6 >>> T5 DNDarray([ 2, 4, 16], dtype=ht.int64, device=cpu:0, split=None) @@ -457,7 +457,7 @@ def bitwise_or_(t1: DNDarray, t2: Union[DNDarray, float]) -> DNDarray: DNDarray([33, 5], dtype=ht.int64, device=cpu:0, split=None) >>> s 1 - >>> T4 = ht.array([2,5,255]) + >>> T4 = ht.array([2, 5, 255]) >>> T5 = ht.array([4, 4, 4]) >>> T4 |= T5 >>> T4 @@ -524,9 +524,9 @@ def bitwise_xor( DNDarray(28, dtype=ht.int64, device=cpu:0, split=None) >>> ht.bitwise_xor(31, 5) DNDarray(26, dtype=ht.int64, device=cpu:0, split=None) - >>> ht.bitwise_xor(ht.array([31,3]), 5) + >>> ht.bitwise_xor(ht.array([31, 3]), 5) DNDarray([26, 6], dtype=ht.int64, device=cpu:0, split=None) - >>> ht.bitwise_xor(ht.array([31,3]), ht.array([5,6])) + >>> ht.bitwise_xor(ht.array([31, 3]), ht.array([5, 6])) DNDarray([26, 5], dtype=ht.int64, device=cpu:0, split=None) >>> ht.bitwise_xor(ht.array([True, True]), ht.array([False, True])) DNDarray([ True, False], dtype=ht.bool, device=cpu:0, split=None) @@ -598,7 +598,7 @@ def bitwise_xor_(t1: DNDarray, t2: Union[DNDarray, float]) -> DNDarray: DNDarray([26, 6], dtype=ht.int64, device=cpu:0, split=None) >>> s 5 - >>> T4 = ht.array([31,3,255]) + >>> T4 = ht.array([31, 3, 255]) >>> T5 = ht.array([5, 6, 4]) >>> T4 ^= T5 >>> T4 @@ -661,7 +661,7 @@ def copysign( -------- >>> ht.copysign(ht.array([3, 2, -8, -2, 4]), 1) DNDarray([3, 2, 8, 2, 4], dtype=ht.int64, device=cpu:0, split=None) - >>> ht.copysign(ht.array([3., 2., -8., -2., 4.]), ht.array([1., -1., 1., -1., 1.])) + >>> ht.copysign(ht.array([3.0, 2.0, -8.0, -2.0, 4.0]), ht.array([1.0, -1.0, 1.0, -1.0, 1.0])) DNDarray([ 3., -2., 8., -2., 4.], dtype=ht.float32, device=cpu:0, split=None) """ try: @@ -702,7 +702,7 @@ def copysign_(t1: DNDarray, t2: Union[DNDarray, float]) -> DNDarray: Examples -------- >>> import heat as ht - >>> T1 = ht.array([3., 2., -8., -2., 4.]) + >>> T1 = ht.array([3.0, 2.0, -8.0, -2.0, 4.0]) >>> s = 2.0 >>> T1.copysign_(s) DNDarray([3., 2., 8., 2., 4.], dtype=ht.float32, device=cpu:0, split=None) @@ -710,8 +710,8 @@ def copysign_(t1: 
DNDarray, t2: Union[DNDarray, float]) -> DNDarray: DNDarray([3., 2., 8., 2., 4.], dtype=ht.float32, device=cpu:0, split=None) >>> s 2.0 - >>> T2 = ht.array([[1., -1.],[1., -1.]]) - >>> T3 = ht.array([-5., 2.]) + >>> T2 = ht.array([[1.0, -1.0], [1.0, -1.0]]) + >>> T3 = ht.array([-5.0, 2.0]) >>> T2.copysign_(T3) DNDarray([[-1., 1.], [-1., 1.]], dtype=ht.float32, device=cpu:0, split=None) @@ -767,7 +767,7 @@ def cumprod(a: DNDarray, axis: int, dtype: datatype = None, out=None) -> DNDarra Examples -------- - >>> a = ht.full((3,3), 2) + >>> a = ht.full((3, 3), 2) >>> ht.cumprod(a, 0) DNDarray([[2., 2., 2.], [4., 4., 4.], @@ -796,7 +796,7 @@ def cumprod_(t: DNDarray, axis: int) -> DNDarray: Examples -------- >>> import heat as ht - >>> T = ht.full((3,3), 2) + >>> T = ht.full((3, 3), 2) >>> T.cumprod_(0) DNDarray([[2., 2., 2.], [4., 4., 4.], @@ -821,6 +821,14 @@ def wrap_cumprod_(a: torch.Tensor, b: int, out=None, dtype=None) -> torch.Tensor def wrap_mul_(a: torch.Tensor, b: torch.Tensor, out=None) -> torch.Tensor: return a.mul_(b) + axis = stride_tricks.sanitize_axis(t.shape, axis) + if axis is None: + raise NotImplementedError("cumprod_ is not implemented for axis=None") + + if not t.is_distributed(): + t.larray.cumprod_(dim=axis) + return t + return _operations.__cum_op(t, wrap_cumprod_, MPI.PROD, wrap_mul_, 1, axis, dtype=None, out=t) @@ -850,7 +858,7 @@ def cumsum(a: DNDarray, axis: int, dtype: datatype = None, out=None) -> DNDarray Examples -------- - >>> a = ht.ones((3,3)) + >>> a = ht.ones((3, 3)) >>> ht.cumsum(a, 0) DNDarray([[1., 1., 1.], [2., 2., 2.], @@ -874,7 +882,7 @@ def cumsum_(t: DNDarray, axis: int) -> DNDarray: Examples -------- >>> import heat as ht - >>> T = ht.ones((3,3)) + >>> T = ht.ones((3, 3)) >>> T.cumsum_(0) DNDarray([[1., 1., 1.], [2., 2., 2.], @@ -891,6 +899,14 @@ def wrap_cumsum_(a: torch.Tensor, b: int, out=None, dtype=None) -> torch.Tensor: def wrap_add_(a: torch.Tensor, b: torch.Tensor, out=None) -> torch.Tensor: return a.add_(b) + axis = stride_tricks.sanitize_axis(t.shape, axis) + if axis is None: + raise NotImplementedError("cumsum_ is not implemented for axis=None") + + if not t.is_distributed(): + t.larray.cumsum_(dim=axis) + return t + return _operations.__cum_op(t, wrap_cumsum_, MPI.SUM, wrap_add_, 0, axis, dtype=None, out=t) @@ -913,7 +929,7 @@ def diff( output array is balanced. Parameters - ------- + ---------- a : DNDarray Input array n : int, optional @@ -1622,8 +1638,8 @@ def wrap_gcd_(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor: def hypot( - a: DNDarray, - b: DNDarray, + t1: DNDarray, + t2: DNDarray, /, out: Optional[DNDarray] = None, *, @@ -1635,9 +1651,9 @@ def hypot( Parameters ---------- - a: DNDarray + t1: DNDarray The first input array - b: DNDarray + t2: DNDarray the second input array out: DNDarray, optional The output array. It must have a shape that the inputs broadcast to and matching split axis. @@ -1651,17 +1667,27 @@ def hypot( Examples -------- - >>> a = ht.array([2.]) - >>> b = ht.array([1.,3.,3.]) - >>> ht.hypot(a,b) + >>> a = ht.array([2.0]) + >>> b = ht.array([1.0, 3.0, 3.0]) + >>> ht.hypot(a, b) DNDarray([2.2361, 3.6056, 3.6056], dtype=ht.float32, device=cpu:0, split=None) """ + # catch int64 operation crash on MPS. 
TODO: issue still persists in 2.3.0, check 2.4, report to PyTorch + t1_ismps = getattr(getattr(t1, "device", "cpu"), "torch_device", "cpu").startswith("mps") + t2_ismps = getattr(getattr(t2, "device", "cpu"), "torch_device", "cpu").startswith("mps") + if t1_ismps or t2_ismps: + t1_isint64 = getattr(t1, "dtype", None) == types.int64 + t2_isint64 = getattr(t2, "dtype", None) == types.int64 + if t1_isint64 or t2_isint64: + raise TypeError( + f"hypot on MPS does not support int64 dtype, got {t1.dtype}, {t2.dtype}" + ) + try: - res = _operations.__binary_op(torch.hypot, a, b, out, where) + res = _operations.__binary_op(torch.hypot, t1, t2, out, where) except RuntimeError: # every other possibility is caught by __binary_op - raise TypeError(f"Not implemented for array dtype, got {a.dtype}, {b.dtype}") - + raise TypeError(f"hypot on CPU does not support Int dtype, got {t1.dtype}, {t2.dtype}") return res @@ -1691,8 +1717,8 @@ def hypot_(t1: DNDarray, t2: DNDarray) -> DNDarray: Examples -------- >>> import heat as ht - >>> T1 = ht.array([1.,3.,3.]) - >>> T2 = ht.array(2.) + >>> T1 = ht.array([1.0, 3.0, 3.0]) + >>> T2 = ht.array(2.0) >>> T1.hypot_(T2) DNDarray([2.2361, 3.6056, 3.6056], dtype=ht.float32, device=cpu:0, split=None) >>> T1 @@ -1704,6 +1730,17 @@ def hypot_(t1: DNDarray, t2: DNDarray) -> DNDarray: def wrap_hypot_(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor: return a.hypot_(b) + # catch int64 operation crash on MPS + t1_ismps = getattr(getattr(t1, "device", "cpu"), "torch_device", "cpu").startswith("mps") + t2_ismps = getattr(getattr(t2, "device", "cpu"), "torch_device", "cpu").startswith("mps") + if t1_ismps or t2_ismps: + t1_isint64 = getattr(t1, "dtype", None) == types.int64 + t2_isint64 = getattr(t2, "dtype", None) == types.int64 + if t1_isint64 or t2_isint64: + raise TypeError( + f"hypot_ on MPS does not support int64 dtype, got {t1.dtype}, {t2.dtype}" + ) + try: return _operations.__binary_op(wrap_hypot_, t1, t2, out=t1) except NotImplementedError: @@ -1711,7 +1748,7 @@ def wrap_hypot_(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor: f"In-place operation not allowed: operands are distributed along different axes. \n Operand 1 with shape {t1.shape} is split along axis {t1.split}. \n Operand 2 with shape {t2.shape} is split along axis {t2.split}." ) except RuntimeError: - raise TypeError(f"Not implemented for array dtype, got {t1.dtype}, {t2.dtype}") + raise TypeError(f"hypot on CPU does not support Int dtype, got {t1.dtype}, {t2.dtype}") DNDarray.hypot_ = hypot_ @@ -1724,7 +1761,7 @@ def invert(a: DNDarray, /, out: Optional[DNDarray] = None) -> DNDarray: Bitwise_not is an alias for invert. Parameters - --------- + ---------- a: DNDarray The input array to invert. 
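Returning to the hypot changes shown above: float inputs behave as before, integer inputs raise TypeError on CPU, and int64 inputs on MPS are now rejected up front. Illustrative usage, mirroring the doctest in the hunk:

    import heat as ht

    a = ht.array([2.0])
    b = ht.array([1.0, 3.0, 3.0])
    print(ht.hypot(a, b))  # DNDarray([2.2361, 3.6056, 3.6056], ...)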
Must be of integral or Boolean types out : DNDarray, optional @@ -1834,12 +1871,12 @@ def lcm( -------- >>> a = ht.array([6, 12, 15]) >>> b = ht.array([3, 4, 5]) - >>> ht.lcm(a,b) + >>> ht.lcm(a, b) DNDarray([ 6, 12, 15], dtype=ht.int64, device=cpu:0, split=None) >>> s = 2 - >>> ht.lcm(s,a) + >>> ht.lcm(s, a) DNDarray([ 6, 12, 30], dtype=ht.int64, device=cpu:0, split=None) - >>> ht.lcm(b,s) + >>> ht.lcm(b, s) DNDarray([ 6, 4, 10], dtype=ht.int64, device=cpu:0, split=None) """ try: @@ -1943,7 +1980,7 @@ def left_shift( Examples -------- - >>> ht.left_shift(ht.array([1,2,3]), 1) + >>> ht.left_shift(ht.array([1, 2, 3]), 1) DNDarray([2, 4, 6], dtype=ht.int64, device=cpu:0, split=None) """ dtypes = (heat_type_of(t1), heat_type_of(t2)) @@ -2000,7 +2037,7 @@ def left_shift_(t1: DNDarray, t2: Union[DNDarray, float]) -> DNDarray: Examples -------- >>> import heat as ht - >>> T1 = ht.array([1,2,3]) + >>> T1 = ht.array([1, 2, 3]) >>> s = 1 >>> T1.left_shift_(s) DNDarray([2, 4, 6], dtype=ht.int64, device=cpu:0, split=None) @@ -2208,7 +2245,7 @@ def nan_to_num( Examples -------- - >>> x = ht.array([float('nan'), float('inf'), -float('inf')]) + >>> x = ht.array([float("nan"), float("inf"), -float("inf")]) >>> ht.nan_to_num(x) DNDarray([ 0.0000e+00, 3.4028e+38, -3.4028e+38], dtype=ht.float32, device=cpu:0, split=None) """ @@ -2245,7 +2282,7 @@ def nan_to_num_( Examples -------- >>> import heat as ht - >>> T1 = ht.array([float('nan'), float('inf'), -float('inf')]) + >>> T1 = ht.array([float("nan"), float("inf"), -float("inf")]) >>> T1.nan_to_num_() DNDarray([ 0.0000e+00, 3.4028e+38, -3.4028e+38], dtype=ht.float32, device=cpu:0, split=None) >>> T1 @@ -2298,7 +2335,7 @@ def nanprod( Examples -------- - >>> ht.nanprod(ht.array([4.,ht.nan])) + >>> ht.nanprod(ht.array([4.0, ht.nan])) DNDarray(4., dtype=ht.float32, device=cpu:0, split=None) >>> ht.nanprod(ht.array([ [1.,ht.nan], @@ -2349,11 +2386,11 @@ def nansum( -------- >>> ht.sum(ht.ones(2)) DNDarray(2., dtype=ht.float32, device=cpu:0, split=None) - >>> ht.sum(ht.ones((3,3))) + >>> ht.sum(ht.ones((3, 3))) DNDarray(9., dtype=ht.float32, device=cpu:0, split=None) - >>> ht.sum(ht.ones((3,3)).astype(ht.int)) + >>> ht.sum(ht.ones((3, 3)).astype(ht.int)) DNDarray(9, dtype=ht.int64, device=cpu:0, split=None) - >>> ht.sum(ht.ones((3,2,1)), axis=-3) + >>> ht.sum(ht.ones((3, 2, 1)), axis=-3) DNDarray([[3.], [3.]], dtype=ht.float32, device=cpu:0, split=None) """ @@ -2377,7 +2414,7 @@ def neg(a: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: -------- >>> ht.neg(ht.array([-1, 1])) DNDarray([ 1, -1], dtype=ht.int64, device=cpu:0, split=None) - >>> -ht.array([-1., 1.]) + >>> -ht.array([-1.0, 1.0]) DNDarray([ 1., -1.], dtype=ht.float32, device=cpu:0, split=None) """ sanitation.sanitize_in(a) @@ -2411,7 +2448,7 @@ def neg_(t: DNDarray) -> DNDarray: DNDarray([ 1, -1], dtype=ht.int64, device=cpu:0, split=None) >>> T1 DNDarray([ 1, -1], dtype=ht.int64, device=cpu:0, split=None) - >>> T2 = ht.array([[-1., 2.5], [4. 
, 0.]]) + >>> T2 = ht.array([[-1.0, 2.5], [4.0, 0.0]]) >>> T2.neg_() DNDarray([[ 1.0000, -2.5000], [-4.0000, -0.0000]], dtype=ht.float32, device=cpu:0, split=None) @@ -2449,7 +2486,7 @@ def pos(a: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: -------- >>> ht.pos(ht.array([-1, 1])) DNDarray([-1, 1], dtype=ht.int64, device=cpu:0, split=None) - >>> +ht.array([-1., 1.]) + >>> +ht.array([-1.0, 1.0]) DNDarray([-1., 1.], dtype=ht.float32, device=cpu:0, split=None) """ sanitation.sanitize_in(a) @@ -2502,7 +2539,7 @@ def pow( Examples -------- - >>> ht.pow (3.0, 2.0) + >>> ht.pow(3.0, 2.0) DNDarray(9., dtype=ht.float32, device=cpu:0, split=None) >>> T1 = ht.float32([[1, 2], [3, 4]]) >>> T2 = ht.float32([[3, 3], [2, 2]]) @@ -2671,7 +2708,7 @@ def prod( Examples -------- - >>> ht.prod(ht.array([1.,2.])) + >>> ht.prod(ht.array([1.0, 2.0])) DNDarray(2., dtype=ht.float32, device=cpu:0, split=None) >>> ht.prod(ht.array([ [1.,2.], @@ -2857,7 +2894,7 @@ def right_shift( Examples -------- - >>> ht.right_shift(ht.array([1,2,3]), 1) + >>> ht.right_shift(ht.array([1, 2, 3]), 1) DNDarray([0, 1, 1], dtype=ht.int64, device=cpu:0, split=None) """ dtypes = (heat_type_of(t1), heat_type_of(t2)) @@ -2914,7 +2951,7 @@ def right_shift_(t1: DNDarray, t2: Union[DNDarray, float]) -> DNDarray: Examples -------- >>> import heat as ht - >>> T1 = ht.array([1,2,32]) + >>> T1 = ht.array([1, 2, 32]) >>> s = 1 >>> T1.right_shift_(s) DNDarray([ 0, 1, 16], dtype=ht.int64, device=cpu:0, split=None) @@ -3124,11 +3161,11 @@ def sum( -------- >>> ht.sum(ht.ones(2)) DNDarray(2., dtype=ht.float32, device=cpu:0, split=None) - >>> ht.sum(ht.ones((3,3))) + >>> ht.sum(ht.ones((3, 3))) DNDarray(9., dtype=ht.float32, device=cpu:0, split=None) - >>> ht.sum(ht.ones((3,3)).astype(ht.int)) + >>> ht.sum(ht.ones((3, 3)).astype(ht.int)) DNDarray(9, dtype=ht.int64, device=cpu:0, split=None) - >>> ht.sum(ht.ones((3,2,1)), axis=-3) + >>> ht.sum(ht.ones((3, 2, 1)), axis=-3) DNDarray([[3.], [3.]], dtype=ht.float32, device=cpu:0, split=None) """ diff --git a/heat/core/base.py b/heat/core/base.py index 9c1233ce4b..66c2000cbe 100644 --- a/heat/core/base.py +++ b/heat/core/base.py @@ -173,10 +173,10 @@ def transform(self, x: DNDarray) -> DNDarray: """ Transforms the input data. - Parameters - ---------- - x : DNDarray - Values to transform. Shape = (n_samples, n_features) + Parameters + ---------- + x : DNDarray + Values to transform. 
Shape = (n_samples, n_features)
         """
         raise NotImplementedError()
diff --git a/heat/core/communication.py b/heat/core/communication.py
index 6443d31b01..eb3443bc10 100644
--- a/heat/core/communication.py
+++ b/heat/core/communication.py
@@ -5,8 +5,6 @@
 from __future__ import annotations

 import numpy as np
-import os
-import subprocess
 import math
 import ctypes
 import torch
@@ -116,6 +114,8 @@ class MPICommunication(Communication):
         Handle for the mpi4py Communicator
     """

+    COUNT_LIMIT = torch.iinfo(torch.int32).max
+
     __mpi_type_mappings = {
         torch.bool: MPI.BOOL,
         torch.uint8: MPI.UNSIGNED_CHAR,
@@ -134,8 +134,8 @@ class MPICommunication(Communication):
     def __init__(self, handle=MPI.COMM_WORLD):
         self.handle = handle
         try:
-            self.rank = handle.Get_rank()
-            self.size = handle.Get_size()
+            self.rank: Optional[int] = handle.Get_rank()
+            self.size: Optional[int] = handle.Get_size()
         except MPI.Exception:
             # ranks not within the group will fail with an MPI.Exception, this is expected
             self.rank = None
@@ -281,7 +281,33 @@ def mpi_type_and_elements_of(
         if is_contiguous:
             if counts is None:
-                return mpi_type, elements
+                if elements > cls.COUNT_LIMIT:
+                    # Uses vector type to get around the MAX_INT limit on certain MPI implementations
+                    # This is at the moment only applied when sending contiguous data, as the construction of data types to get around non-contiguous data naturally alleviates the problem to a certain extent.
+                    # Thanks to: J. R. Hammond, A. Schäfer and R. Latham, "To INT_MAX... and Beyond! Exploring Large-Count Support in MPI," 2014 Workshop on Exascale MPI at Supercomputing Conference, New Orleans, LA, USA, 2014, pp. 1-8, doi: 10.1109/ExaMPI.2014.5.
+
+                    new_count = elements // cls.COUNT_LIMIT
+                    left_over = elements % cls.COUNT_LIMIT
+
+                    if new_count > cls.COUNT_LIMIT:
+                        raise ValueError("Tensor is too large")
+                    vector_type = mpi_type.Create_vector(
+                        new_count, cls.COUNT_LIMIT, cls.COUNT_LIMIT
+                    )
+                    if left_over > 0:
+                        left_over_mpi_type = mpi_type.Create_contiguous(left_over).Commit()
+                        _, old_type_extent = mpi_type.Get_extent()
+                        disp = cls.COUNT_LIMIT * new_count * old_type_extent
+                        struct_type = mpi_type.Create_struct(
+                            [1, 1], [0, disp], [vector_type, left_over_mpi_type]
+                        ).Commit()
+                        vector_type.Free()
+                        left_over_mpi_type.Free()
+                        return struct_type, 1
+                    else:
+                        return vector_type, 1
+                else:
+                    return mpi_type, elements
             factor = np.prod(obj.shape[1:], dtype=np.int32)
             return (
                 mpi_type,
@@ -310,7 +336,7 @@ def mpi_type_and_elements_of(
         return mpi_type, elements

     @classmethod
-    def as_mpi_memory(cls, obj) -> MPI.memory:
+    def as_mpi_memory(cls, obj: torch.Tensor) -> MPI.memory:
         """
         Converts the passed ``torch.Tensor`` into an MPI compatible memory view.
@@ -320,7 +346,8 @@ def as_mpi_memory(cls, obj) -> MPI.memory:
             The tensor to be converted into a MPI memory view.
         """
         # TODO: MPI.memory might be deprecated in future versions of mpi4py. The following code might need to be adapted and use MPI.buffer instead.
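The count arithmetic behind the COUNT_LIMIT workaround above is easy to check in isolation (a sketch with made-up numbers; COUNT_LIMIT mirrors torch.iinfo(torch.int32).max):

    COUNT_LIMIT = 2**31 - 1          # INT_MAX, the classic MPI count limit
    elements = 5_000_000_000         # example: more items than INT_MAX

    new_count = elements // COUNT_LIMIT  # 2 full vector blocks
    left_over = elements % COUNT_LIMIT   # 705032706 trailing items
    assert new_count * COUNT_LIMIT + left_over == elements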
- return MPI.memory.fromaddress(obj.data_ptr(), 0) + nbytes = obj.dtype.itemsize * obj.numel() + return MPI.memory.fromaddress(obj.data_ptr(), nbytes) @classmethod def as_buffer( @@ -494,7 +521,7 @@ def Irecv( Nonblocking receive Parameters - ------------ + ---------- buf: Union[DNDarray, torch.Tensor, Any] Buffer address where to place the received message source: int, optional @@ -523,7 +550,7 @@ def Recv( Blocking receive Parameters - ------------ + ---------- buf: Union[DNDarray, torch.Tensor, Any] Buffer address where to place the received message source: int, optional @@ -554,7 +581,7 @@ def __send_like( Generic function for sending a message to process with rank "dest" Parameters - ------------ + ---------- func: Callable The respective MPI sending function buf: Union[DNDarray, torch.Tensor, Any] @@ -578,7 +605,7 @@ def Bsend(self, buf: Union[DNDarray, torch.Tensor, Any], dest: int, tag: int = 0 Blocking buffered send Parameters - ------------ + ---------- buf: Union[DNDarray, torch.Tensor, Any] Buffer address of the message to be send dest: int, optional @@ -597,7 +624,7 @@ def Ibsend( Nonblocking buffered send Parameters - ------------ + ---------- buf: Union[DNDarray, torch.Tensor, Any] Buffer address of the message to be send dest: int, optional @@ -616,7 +643,7 @@ def Irsend( Nonblocking ready send Parameters - ------------ + ---------- buf: Union[DNDarray, torch.Tensor, Any] Buffer address of the message to be send dest: int, optional @@ -633,7 +660,7 @@ def Isend(self, buf: Union[DNDarray, torch.Tensor, Any], dest: int, tag: int = 0 Nonblocking send Parameters - ------------ + ---------- buf: Union[DNDarray, torch.Tensor, Any] Buffer address of the message to be send dest: int, optional @@ -652,7 +679,7 @@ def Issend( Nonblocking synchronous send Parameters - ------------ + ---------- buf: Union[DNDarray, torch.Tensor, Any] Buffer address of the message to be send dest: int, optional @@ -669,7 +696,7 @@ def Rsend(self, buf: Union[DNDarray, torch.Tensor, Any], dest: int, tag: int = 0 Blocking ready send Parameters - ------------ + ---------- buf: Union[DNDarray, torch.Tensor, Any] Buffer address of the message to be send dest: int, optional @@ -686,7 +713,7 @@ def Ssend(self, buf: Union[DNDarray, torch.Tensor, Any], dest: int, tag: int = 0 Blocking synchronous send Parameters - ------------ + ---------- buf: Union[DNDarray, torch.Tensor, Any] Buffer address of the message to be send dest: int, optional @@ -703,7 +730,7 @@ def Send(self, buf: Union[DNDarray, torch.Tensor, Any], dest: int, tag: int = 0) Blocking send Parameters - ------------ + ---------- buf: Union[DNDarray, torch.Tensor, Any] Buffer address of the message to be send dest: int, optional @@ -723,7 +750,7 @@ def __broadcast_like( communicator Parameters - ------------ + ---------- func: Callable The respective MPI broadcast function buf: Union[DNDarray, torch.Tensor, Any] @@ -747,7 +774,7 @@ def Bcast(self, buf: Union[DNDarray, torch.Tensor, Any], root: int = 0) -> None: Blocking Broadcast Parameters - ------------ + ---------- buf: Union[DNDarray, torch.Tensor, Any] Buffer address of the message to be broadcasted root: int @@ -765,7 +792,7 @@ def Ibcast(self, buf: Union[DNDarray, torch.Tensor, Any], root: int = 0) -> MPIR Nonblocking Broadcast Parameters - ------------ + ---------- buf: Union[DNDarray, torch.Tensor, Any] Buffer address of the message to be broadcasted root: int @@ -775,25 +802,91 @@ def Ibcast(self, buf: Union[DNDarray, torch.Tensor, Any], root: int = 0) -> MPIR Ibcast.__doc__ = 
MPI.Comm.Ibcast.__doc__

+    def __derived_op(
+        self, tensor: torch.Tensor, datatype: MPI.Datatype, operation: MPI.Op
+    ) -> Callable[[MPI.memory, MPI.memory, MPI.Datatype], None]:
+        # Based on this mpi4py discussion: https://groups.google.com/g/mpi4py/c/UkDT_9pp4V4?pli=1
+        shape = tensor.shape
+        dtype = tensor.dtype
+        stride = tensor.stride()
+        offset = tensor.storage_offset()
+        count = tensor.numel()
+
+        mpiOp2torch = {
+            MPI.SUM.handle: torch.add,
+            MPI.PROD.handle: torch.mul,
+            MPI.MIN.handle: torch.min,
+            MPI.MAX.handle: torch.max,
+            MPI.LAND.handle: torch.logical_and,
+            MPI.LOR.handle: torch.logical_or,
+            MPI.LXOR.handle: torch.logical_xor,
+            MPI.BAND.handle: torch.bitwise_and,
+            MPI.BOR.handle: torch.bitwise_or,
+            MPI.BXOR.handle: torch.bitwise_xor,
+            # MPI.MINLOC.handle: torch.argmin, Not supported, seems to be an invalid inplace operation
+            # MPI.MAXLOC.handle: torch.argmax
+        }
+        mpiDtype2Ctype = {
+            torch.bool: ctypes.c_bool,
+            torch.uint8: ctypes.c_uint8,
+            torch.uint16: ctypes.c_uint16,
+            torch.uint32: ctypes.c_uint32,
+            torch.uint64: ctypes.c_uint64,
+            torch.int8: ctypes.c_int8,
+            torch.int16: ctypes.c_int16,
+            torch.int32: ctypes.c_int32,
+            torch.int64: ctypes.c_int64,
+            torch.float32: ctypes.c_float,
+            torch.float64: ctypes.c_double,
+            torch.complex64: ctypes.c_double,
+            torch.complex128: ctypes.c_longdouble,
+        }
+        ctype_size = mpiDtype2Ctype[dtype]
+        torch_op = mpiOp2torch[operation.handle]
+
+        def op(sendbuf: MPI.memory, recvbuf: MPI.memory, datatype):
+            send_arr = (ctype_size * (count + offset)).from_address(sendbuf.address)
+            recv_arr = (ctype_size * (count + offset)).from_address(recvbuf.address)
+
+            send_tensor = torch.as_strided(
+                torch.frombuffer(send_arr, dtype=dtype, count=count, offset=offset), shape, stride
+            )
+            recv_tensor = torch.as_strided(
+                torch.frombuffer(recv_arr, dtype=dtype, count=count, offset=offset), shape, stride
+            )
+            torch_op(send_tensor, recv_tensor, out=recv_tensor)
+
+        op = MPI.Op.Create(op)
+
+        return op
+
     def __reduce_like(
         self,
         func: Callable,
         sendbuf: Union[DNDarray, torch.Tensor, Any],
         recvbuf: Union[DNDarray, torch.Tensor, Any],
-        *args,
-        **kwargs,
+        op: MPI.Op,
+        *args: Any,
+        **kwargs: Any,
     ) -> Tuple[Optional[DNDarray, torch.Tensor]]:
         """
         Generic function for reduction operations.

         Parameters
-        ------------
+        ----------
         func: Callable
             The respective MPI reduction operation
         sendbuf: Union[DNDarray, torch.Tensor, Any]
             Buffer address of the send message
         recvbuf: Union[DNDarray, torch.Tensor, Any]
             Buffer address where to store the result of the reduction
+        op: MPI.Op
+            Operation to apply during the reduction.
+        *args: Any
+            Additional positional arguments to be passed to the function
+        **kwargs: Any
+            Additional keyword arguments to be passed to the function
+
         """
         sbuf = None
         rbuf = None
@@ -808,56 +901,59 @@ def __reduce_like(
         # harmonize the input and output buffers
         # MPI requires send and receive buffers to be of same type and length. If the torch tensors are either not both
         # contiguous or differently strided, they have to be made matching (if possible) first.
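__derived_op builds on mpi4py's user-defined reduction operations. A minimal standalone sketch of that mechanism (simplified: NumPy buffers instead of strided torch tensors, and not the method above):

    from mpi4py import MPI
    import numpy as np

    def elementwise_sum(inbuf, outbuf, datatype):
        # reinterpret both raw buffers as float32 arrays and accumulate in place
        a = np.frombuffer(inbuf, dtype=np.float32)
        b = np.frombuffer(outbuf, dtype=np.float32)
        b += a

    op = MPI.Op.Create(elementwise_sum, commute=True)
    comm = MPI.COMM_WORLD
    x = np.ones(4, dtype=np.float32)
    y = np.empty(4, dtype=np.float32)
    comm.Allreduce(x, y, op=op)  # y == comm.size on every rank
    op.Free()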
-        if isinstance(sendbuf, torch.Tensor):
-            # convert the send buffer to a pointer, number of elements and type are identical to the receive buffer
-            dummy = (
-                sendbuf.contiguous()
-            )  # make a contiguous copy and reassign the storage, old will be collected
-            # In PyTorch Version >= 2.0.0 we can use untyped_storage() instead of storage
-            # to keep backward compatibility with earlier PyTorch versions (where no untyped_storage() exists) we use a try/except
-            # (this applies to all places of Heat where untyped_storage() is used without further comment)
-            try:
-                sendbuf.set_(
-                    dummy.untyped_storage(),
-                    dummy.storage_offset(),
-                    size=dummy.shape,
-                    stride=dummy.stride(),
-                )
-            except AttributeError:
-                sendbuf.set_(
-                    dummy.storage(),
-                    dummy.storage_offset(),
-                    size=dummy.shape,
-                    stride=dummy.stride(),
-                )
-            sbuf = sendbuf if CUDA_AWARE_MPI else sendbuf.cpu()
-            sendbuf = self.as_buffer(sbuf)
+        if sendbuf is not MPI.IN_PLACE:
+            # Send and recv buffer need the same number of elements.
+            if sendbuf.numel() != recvbuf.numel():
+                raise ValueError("Send and recv buffers need the same number of elements.")
+
+            # Stride and offset should be the same to create the same datatype and operation. If they differ, they should be made contiguous (at the expense of memory)
+            if (
+                sendbuf.stride() != recvbuf.stride()
+                or sendbuf.storage_offset() != recvbuf.storage_offset()
+            ):
+                if not sendbuf.is_contiguous():
+                    tmp = sendbuf.contiguous()
+                    try:
+                        sendbuf.set_(
+                            tmp.untyped_storage(),
+                            tmp.storage_offset(),
+                            size=tmp.shape,
+                            stride=tmp.stride(),
+                        )
+                    except AttributeError:
+                        sendbuf.set_(
+                            tmp.storage(), tmp.storage_offset(), size=tmp.shape, stride=tmp.stride()
+                        )
+                if not recvbuf.is_contiguous():
+                    tmp = recvbuf.contiguous()
+                    try:
+                        recvbuf.set_(
+                            tmp.untyped_storage(),
+                            tmp.storage_offset(),
+                            size=tmp.shape,
+                            stride=tmp.stride(),
+                        )
+                    except AttributeError:
+                        recvbuf.set_(
+                            tmp.storage(), tmp.storage_offset(), size=tmp.shape, stride=tmp.stride()
+                        )
+
         if isinstance(recvbuf, torch.Tensor):
+            # Datatype and count shall be derived from the recv buffer, and applied to both, as they should match after the last code block
             buf = recvbuf
-            # nothing matches, the buffers have to be made contiguous
-            dummy = recvbuf.contiguous()
-            try:
-                recvbuf.set_(
-                    dummy.untyped_storage(),
-                    dummy.storage_offset(),
-                    size=dummy.shape,
-                    stride=dummy.stride(),
-                )
-            except AttributeError:
-                recvbuf.set_(
-                    dummy.storage(),
-                    dummy.storage_offset(),
-                    size=dummy.shape,
-                    stride=dummy.stride(),
-                )
             rbuf = recvbuf if CUDA_AWARE_MPI else recvbuf.cpu()
-            if sendbuf is MPI.IN_PLACE:
-                recvbuf = self.as_buffer(rbuf)
-            else:
-                recvbuf = (self.as_mpi_memory(rbuf), sendbuf[1], sendbuf[2])
+            recvbuf: Tuple[MPI.memory, int, MPI.Datatype] = self.as_buffer(rbuf, is_contiguous=True)
+            if not recvbuf[2].is_predefined:
+                # If using a derived datatype, we need to define the reduce operation to be able to handle it.
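The harmonization rule in the hunk above — equalize stride and storage offset so that a single MPI datatype describes both buffers — in a standalone sketch (simplified, torch only):

    import torch

    send = torch.arange(6).reshape(2, 3).t()  # non-contiguous transposed view
    recv = torch.empty(3, 2)                  # contiguous target buffer

    if send.stride() != recv.stride() or send.storage_offset() != recv.storage_offset():
        send = send.contiguous()              # copy so the layouts match

    assert send.stride() == recv.stride()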
+ derived_op = self.__derived_op(rbuf, recvbuf[2], op) + op = derived_op + + if isinstance(sendbuf, torch.Tensor): + sbuf = sendbuf if CUDA_AWARE_MPI else sendbuf.cpu() + sendbuf = (self.as_mpi_memory(sbuf), recvbuf[1], recvbuf[2]) # perform the actual reduction operation - return func(sendbuf, recvbuf, *args, **kwargs), sbuf, rbuf, buf + return func(sendbuf, recvbuf, op, *args, **kwargs), sbuf, rbuf, buf def Allreduce( self, @@ -869,7 +965,7 @@ def Allreduce( Combines values from all processes and distributes the result back to all processes Parameters - --------- + ---------- sendbuf: Union[DNDarray, torch.Tensor, Any] Buffer address of the send message recvbuf: Union[DNDarray, torch.Tensor, Any] @@ -894,7 +990,7 @@ def Exscan( Computes the exclusive scan (partial reductions) of data on a collection of processes Parameters - ------------ + ---------- sendbuf: Union[DNDarray, torch.Tensor, Any] Buffer address of the send message recvbuf: Union[DNDarray, torch.Tensor, Any] @@ -919,7 +1015,7 @@ def Iallreduce( Nonblocking allreduce reducing values on all processes to a single value Parameters - --------- + ---------- sendbuf: Union[DNDarray, torch.Tensor, Any] Buffer address of the send message recvbuf: Union[DNDarray, torch.Tensor, Any] @@ -941,7 +1037,7 @@ def Iexscan( Nonblocking Exscan Parameters - ------------ + ---------- sendbuf: Union[DNDarray, torch.Tensor, Any] Buffer address of the send message recvbuf: Union[DNDarray, torch.Tensor, Any] @@ -963,7 +1059,7 @@ def Iscan( Nonblocking Scan Parameters - ------------ + ---------- sendbuf: Union[DNDarray, torch.Tensor, Any] Buffer address of the send message recvbuf: Union[DNDarray, torch.Tensor, Any] @@ -986,7 +1082,7 @@ def Ireduce( Nonblocking reduction operation Parameters - --------- + ---------- sendbuf: Union[DNDarray, torch.Tensor, Any] Buffer address of the send message recvbuf: Union[DNDarray, torch.Tensor, Any] @@ -1011,7 +1107,7 @@ def Reduce( Reduce values from all processes to a single value on process "root" Parameters - --------- + ---------- sendbuf: Union[DNDarray, torch.Tensor, Any] Buffer address of the send message recvbuf: Union[DNDarray, torch.Tensor, Any] @@ -1038,7 +1134,7 @@ def Scan( Computes the scan (partial reductions) of data on a collection of processes in a nonblocking way Parameters - ------------ + ---------- sendbuf: Union[DNDarray, torch.Tensor, Any] Buffer address of the send message recvbuf: Union[DNDarray, torch.Tensor, Any] @@ -1074,6 +1170,8 @@ def __allgather_like( Buffer address where to store the result axis: int Concatenation axis: The axis along which ``sendbuf`` is packed and along which ``recvbuf`` puts together individual chunks + **kwargs + Extra arguments to be passed to the function. """ # dummy allocation for *v calls # ToDO: Propper implementation of usage @@ -1269,6 +1367,8 @@ def __alltoall_like( - if ``send_axis`` or ``recv_axis`` are ``None``, an error will be thrown recv_axis: int Prior split axis, along which blocks are received from the individual ranks + **kwargs + Extra arguments to be passed to the function. """ if send_axis is None: raise NotImplementedError( @@ -1484,7 +1584,6 @@ def Alltoallw( lshape, subsizes, substarts = subarray_params if np.all(np.array(subsizes) > 0): - if is_contiguous: # Commit the source subarray datatypes # Subarray parameters are calculated based on the work by Dalcin et al. 
(https://arxiv.org/abs/1804.09536) @@ -1580,7 +1679,9 @@ def _create_recursive_vectortype( >>> datatype = MPI.INT >>> tensor_stride = [1, 2, 3] >>> subarray_sizes = [4, 5, 6] - >>> recursive_vectortype = create_recursive_vectortype(datatype, tensor_stride, subarray_sizes) + >>> recursive_vectortype = create_recursive_vectortype( + ... datatype, tensor_stride, subarray_sizes + ... ) """ datatype_history = [] current_datatype = datatype @@ -1718,6 +1819,8 @@ def __gather_like( Number of elements to be scattered (vor non-v-calls) recv_factor: int Number of elements to be gathered (vor non-v-calls) + **kwargs + Extra arguments to be passed to the function. """ sbuf, rbuf, recv_axis_permutation = None, None, None @@ -1960,6 +2063,8 @@ def __scatter_like( Number of elements to be scattered (vor non-v-calls) recv_factor: int Number of elements to be gathered (vor non-v-calls) + **kwargs + Extra arguments to be passed to the function. """ sbuf, rbuf, recv_axis_permutation = None, None, None diff --git a/heat/core/complex_math.py b/heat/core/complex_math.py index b33e913075..1384140a5d 100644 --- a/heat/core/complex_math.py +++ b/heat/core/complex_math.py @@ -1,5 +1,5 @@ """ -This module handles operations focussing on complex numbers. +Complex numbers module. """ import torch @@ -30,9 +30,9 @@ def angle(x: DNDarray, deg: bool = False, out: Optional[DNDarray] = None) -> DND Examples -------- - >>> ht.angle(ht.array([1.0, 1.0j, 1+1j, -2+2j, 3 - 3j])) + >>> ht.angle(ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j])) DNDarray([ 0.0000, 1.5708, 0.7854, 2.3562, -0.7854], dtype=ht.float32, device=cpu:0, split=None) - >>> ht.angle(ht.array([1.0, 1.0j, 1+1j, -2+2j, 3 - 3j]), deg=True) + >>> ht.angle(ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j]), deg=True) DNDarray([ 0., 90., 45., 135., -45.], dtype=ht.float32, device=cpu:0, split=None) """ a = _operations.__local_op(torch.angle, x, out) @@ -56,7 +56,7 @@ def conjugate(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: Examples -------- - >>> ht.conjugate(ht.array([1.0, 1.0j, 1+1j, -2+2j, 3 - 3j])) + >>> ht.conjugate(ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j])) DNDarray([ (1-0j), -1j, (1-1j), (-2-2j), (3+3j)], dtype=ht.complex64, device=cpu:0, split=None) """ return _operations.__local_op(torch.conj, x, out) @@ -81,7 +81,7 @@ def imag(x: DNDarray) -> DNDarray: Examples -------- - >>> ht.imag(ht.array([1.0, 1.0j, 1+1j, -2+2j, 3 - 3j])) + >>> ht.imag(ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j])) DNDarray([ 0., 1., 1., 2., -3.], dtype=ht.float32, device=cpu:0, split=None) """ if types.heat_type_is_complexfloating(x.dtype): @@ -101,7 +101,7 @@ def real(x: DNDarray) -> DNDarray: Examples -------- - >>> ht.real(ht.array([1.0, 1.0j, 1+1j, -2+2j, 3 - 3j])) + >>> ht.real(ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j])) DNDarray([ 1., 0., 1., -2., 3.], dtype=ht.float32, device=cpu:0, split=None) """ if types.heat_type_is_complexfloating(x.dtype): diff --git a/heat/core/constants.py b/heat/core/constants.py index 3641178c66..80a745f598 100644 --- a/heat/core/constants.py +++ b/heat/core/constants.py @@ -1,5 +1,5 @@ """ -This module defines constants used in HeAT. +Constants module. """ import torch diff --git a/heat/core/devices.py b/heat/core/devices.py index dfb69d2224..83a2be05c6 100644 --- a/heat/core/devices.py +++ b/heat/core/devices.py @@ -16,13 +16,13 @@ class Device: """ - Implements a compute device. HeAT can run computations on different compute devices or backends. + Implements a compute device. 
Heat can run computations on different compute devices or backends. A device describes the device type and id on which said computation should be carried out. Parameters ---------- device_type : str - Represents HeAT's device name + Represents Heat's device name device_id : int The device id torch_device : str @@ -34,6 +34,8 @@ class Device: device(cpu:0) >>> ht.Device("gpu", 0, "cuda:0") device(gpu:0) + >>> ht.Device("gpu", 0, "mps:0") # on Apple M1/M2 + device(gpu:0) """ def __init__(self, device_type: str, device_id: int, torch_device: str): @@ -133,6 +135,28 @@ def __eq__(self, other: Any) -> bool: # the GPU device should be exported as global symbol __all__.append("gpu") +elif torch.backends.mps.is_built() and torch.backends.mps.is_available(): + # Apple MPS available + gpu_id = 0 + # create a new GPU device + gpu = Device("gpu", gpu_id, "mps:{}".format(gpu_id)) + """ + The standard GPU Device on Apple M1/M2 + + Examples + -------- + >>> ht.cpu + device(cpu:0) + >>> ht.ones((2, 3), device=ht.gpu) + DNDarray([[1., 1., 1.], + [1., 1., 1.]], dtype=ht.float32, device=mps:0, split=None) + """ + # add a GPU device string + __device_mapping[gpu.device_type] = gpu + __device_mapping["mps"] = gpu + # the GPU device should be exported as global symbol + __all__.append("gpu") + def get_device() -> Device: """ @@ -165,7 +189,7 @@ def sanitize_device(device: Optional[Union[str, Device]] = None) -> Device: try: return __device_mapping[device.strip().lower()] except (AttributeError, KeyError, TypeError): - raise ValueError(f'Unknown device, must be one of {", ".join(__device_mapping.keys())}') + raise ValueError(f"Unknown device, must be one of {', '.join(__device_mapping.keys())}") def use_device(device: Optional[Union[str, Device]] = None) -> None: diff --git a/heat/core/dndarray.py b/heat/core/dndarray.py index 9d9bda1037..3a295531a0 100644 --- a/heat/core/dndarray.py +++ b/heat/core/dndarray.py @@ -188,7 +188,7 @@ def ndim(self) -> int: @property def __partitioned__(self) -> dict: """ - This will return a dictionary containing information useful for working with the partitioned + Return a dictionary containing information useful for working with the partitioned data. These items include the shape of the data on each process, the starting index of the data that a process has, the datatype of the data, the local devices, as well as the global partitioning scheme. @@ -208,13 +208,16 @@ def size(self) -> int: """ Number of total elements of the ``DNDarray`` """ - return ( - torch.prod( + if self.larray.is_mps: + # MPS does not support double precision + size = torch.prod( + torch.tensor(self.gshape, dtype=torch.float32, device=self.device.torch_device) + ) + else: + size = torch.prod( torch.tensor(self.gshape, dtype=torch.float64, device=self.device.torch_device) ) - .long() - .item() - ) + return size.long().item() @property def gnbytes(self) -> int: @@ -382,7 +385,7 @@ def __prephalo(self, start, end) -> torch.Tensor: except IndexError: print("Indices out of bound") - return self.__array[ix].clone().contiguous() + return self.__array[ix].clone() def get_halo(self, halo_size: int, prev: bool = True, next: bool = True) -> torch.Tensor: """ @@ -479,6 +482,34 @@ def __array__(self) -> np.ndarray: """ return self.larray.cpu().__array__() + def __array_ufunc__(self, ufunc, method, *inputs, **kwargs): + """ + Override NumPy's universal functions. 
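A hedged sketch of what these interoperability hooks enable, assuming the dispatch shown below resolves np.add to ht.add:

    import numpy as np
    import heat as ht

    x = ht.ones(4)
    y = np.add(x, 1)           # NumPy defers to DNDarray.__array_ufunc__ -> ht.add
    print(type(y).__name__)    # DNDarray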
+ """ + import heat + + # TODO support ufunc method variants + if method == "__call__": + try: + func = getattr(heat, ufunc.__name__) + except AttributeError: + return NotImplemented + return func(*inputs, **kwargs) + else: + return NotImplemented + + def __array_function__(self, func, types, args, kwargs): + """ + Augments NumPy's functions. + """ + import heat + + try: + ht_func = getattr(heat, func.__name__) + except AttributeError: + return NotImplemented + return ht_func(*args, **kwargs) + def astype(self, dtype, copy=True) -> DNDarray: """ Returns a casted version of this array. @@ -495,6 +526,21 @@ def astype(self, dtype, copy=True) -> DNDarray: """ dtype = canonical_heat_type(dtype) + if self.__array.is_mps: + if dtype == types.float64: + # print warning + warnings.warn( + "MPS does not support float64. Casting to float32 instead.", + ResourceWarning, + ) + dtype = types.float32 + elif dtype == types.complex128: + # print warning + warnings.warn( + "MPS does not support complex128. Casting to complex64 instead.", + ResourceWarning, + ) + dtype = types.complex64 casted_array = self.__array.type(dtype.torch_type()) if copy: return DNDarray( @@ -788,7 +834,7 @@ def __float__(self) -> DNDarray: Float scalar casting. See Also - --------- + -------- :func:`~heat.core.manipulations.flatten` """ return self.__cast(float) @@ -854,7 +900,7 @@ def __getitem__(self, key: Union[int, Tuple[int, ...], List[int, ...]]) -> DNDar >>> a[1:6] (1/2) >>> tensor([1, 2, 3, 4], dtype=torch.int32) (2/2) >>> tensor([5], dtype=torch.int32) - >>> a = ht.zeros((4,5), split=0) + >>> a = ht.zeros((4, 5), split=0) (1/2) >>> tensor([[0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.]]) (2/2) >>> tensor([[0., 0., 0., 0., 0.], @@ -1116,6 +1162,7 @@ def is_balanced(self, force_check: bool = False) -> bool: assessed via collective communication. Parameters + ---------- force_check : bool, optional If True, the balanced status of the ``DNDarray`` will be assessed via collective communication in any case. @@ -1156,7 +1203,7 @@ def item(self): raised (by pytorch) Examples - ------- + -------- >>> import heat as ht >>> x = ht.zeros((1)) >>> x.item() @@ -1189,11 +1236,20 @@ def numpy(self) -> np.array: dist = self.copy().resplit_(axis=None) return dist.larray.cpu().numpy() + def _repr_pretty_(self, p, cycle): + """ + Pretty print for IPython. + """ + if cycle: + p.text(printing.__str__(self)) + else: + p.text(printing.__str__(self)) + def __repr__(self) -> str: """ - Computes a printable representation of the passed DNDarray. + Returns a printable representation of the passed DNDarray, targeting developers. """ - return printing.__str__(self) + return printing.__repr__(self) def ravel(self): """ @@ -1205,9 +1261,9 @@ def ravel(self): Examples -------- - >>> a = ht.ones((2,3), split=0) + >>> a = ht.ones((2, 3), split=0) >>> b = a.ravel() - >>> a[0,0] = 4 + >>> a[0, 0] = 4 >>> b DNDarray([4., 1., 1., 1., 1., 1.], dtype=ht.float32, device=cpu:0, split=0) """ @@ -1423,7 +1479,13 @@ def resplit_(self, axis: int = None): Examples -------- - >>> a = ht.zeros((4, 5,), split=0) + >>> a = ht.zeros( + ... ( + ... 4, + ... 5, + ... ), + ... split=0, + ... ) >>> a.lshape (0/2) (2, 5) (1/2) (2, 5) @@ -1433,7 +1495,13 @@ def resplit_(self, axis: int = None): >>> a.lshape (0/2) (4, 5) (1/2) (4, 5) - >>> a = ht.zeros((4, 5,), split=0) + >>> a = ht.zeros( + ... ( + ... 4, + ... 5, + ... ), + ... split=0, + ... 
) >>> a.lshape (0/2) (2, 5) (1/2) (2, 5) @@ -1522,7 +1590,7 @@ def __setitem__( Examples -------- - >>> a = ht.zeros((4,5), split=0) + >>> a = ht.zeros((4, 5), split=0) (1/2) >>> tensor([[0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.]]) (2/2) >>> tensor([[0., 0., 0., 0., 0.], @@ -1798,7 +1866,7 @@ def __setter( Utility function for checking ``value`` and forwarding to :func:``__setitem__`` Raises - ------------- + ------ NotImplementedError If the type of ``value`` ist not supported """ @@ -1834,15 +1902,15 @@ def tolist(self, keepsplit: bool = False) -> List: Examples -------- - >>> a = ht.array([[0,1],[2,3]]) + >>> a = ht.array([[0, 1], [2, 3]]) >>> a.tolist() [[0, 1], [2, 3]] - >>> a = ht.array([[0,1],[2,3]], split=0) + >>> a = ht.array([[0, 1], [2, 3]], split=0) >>> a.tolist() [[0, 1], [2, 3]] - >>> a = ht.array([[0,1],[2,3]], split=1) + >>> a = ht.array([[0, 1], [2, 3]], split=1) >>> a.tolist(keepsplit=True) (1/2) [[0], [2]] (2/2) [[1], [3]] @@ -1852,6 +1920,21 @@ def tolist(self, keepsplit: bool = False) -> List: return self.__array.tolist() + @classmethod + def __torch_function__(cls, func, types, args=(), kwargs=None): + """ + Supports PyTorch's dispatch mechanism. + """ + import heat + + if kwargs is None: + kwargs = {} + try: + ht_func = getattr(heat, func.__name__) + except AttributeError: + return NotImplemented + return ht_func(*args, **kwargs) + def __torch_proxy__(self) -> torch.Tensor: """ Return a 1-element `torch.Tensor` strided as the global `self` shape. @@ -1899,6 +1982,7 @@ def __xitem_get_key_start_stop( from . import statistics from . import stride_tricks from . import tiling +from . import types from .devices import Device from .stride_tricks import sanitize_axis diff --git a/heat/core/exponential.py b/heat/core/exponential.py index 85778ef3d0..359bc4d7f1 100644 --- a/heat/core/exponential.py +++ b/heat/core/exponential.py @@ -1,5 +1,5 @@ """ -This module computes exponential and logarithmic operations. +Exponential and logarithmic operations module. """ import torch @@ -63,7 +63,7 @@ def expm1(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: Examples -------- - >>> ht.expm1(ht.arange(5)) + 1. + >>> ht.expm1(ht.arange(5)) + 1.0 DNDarray([ 1.0000, 2.7183, 7.3891, 20.0855, 54.5981], dtype=ht.float64, device=cpu:0, split=None) """ return _operations.__local_op(torch.expm1, x, out) @@ -303,7 +303,7 @@ def square(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: A location in which to store the results. If provided, it must have a broadcastable shape. If not provided or set to :keyword:`None`, a fresh array is allocated. - Examples: + Examples -------- >>> a = ht.random.rand(4) >>> a diff --git a/heat/core/factories.py b/heat/core/factories.py index dc2dbb4e01..389b671f24 100644 --- a/heat/core/factories.py +++ b/heat/core/factories.py @@ -59,15 +59,12 @@ def arange( Parameters ---------- - start : scalar, optional - Start of interval. The interval includes this value. The default start value is 0. - stop : scalar - End of interval. The interval does not include this value, except in some cases where ``step`` is not an - integer and floating point round-off affects the length of ``out``. - step : scalar, optional - Spacing between values. For any output ``out``, this is the distance between two adjacent values, - ``out[i+1]-out[i]``. The default step size is 1. If ``step`` is specified as a position argument, ``start`` - must also be given. + *args : int or float, optional + Positional arguments defining the interval. 
Can be: + - A single argument: interpreted as `stop`, with `start=0` and `step=1`. + - Two arguments: interpreted as `start` and `stop`, with `step=1`. + - Three arguments: interpreted as `start`, `stop`, and `step`. + The function raises a `TypeError` if more than three arguments are provided. dtype : datatype, optional The type of the output array. If `dtype` is not given, it is automatically inferred from the other input arguments. @@ -247,7 +244,7 @@ def array( 4 5 [torch.LongStorage of size 6] - >>> c = ht.array(a, order='F') + >>> c = ht.array(a, order="F") >>> c DNDarray([[0, 1, 2], [3, 4, 5]], dtype=ht.int64, device=cpu:0, split=None) @@ -264,7 +261,7 @@ def array( >>> a = np.arange(4 * 3).reshape(4, 3) >>> a.strides (24, 8) - >>> b = ht.array(a, order='F', split=0) + >>> b = ht.array(a, order="F", split=0) >>> b DNDarray([[ 0, 1, 2], [ 3, 4, 5], @@ -324,7 +321,6 @@ def array( f"'is_split' and the split axis of the object do not match ({is_split} != {obj.split}).\nIf you are trying to resplit an existing DNDarray in-place, use the method `DNDarray.resplit_()` instead." ) elif device is not None and device != obj.device and copy is False: - raise ValueError( "argument `copy` is set to False, but copy of input object is necessary as the array is being copied across devices.\nUse the method `DNDarray.cpu()` or `DNDarray.gpu()` to move the array to the desired device." ) @@ -516,24 +512,24 @@ def asarray( Examples -------- - >>> a = [1,2] + >>> a = [1, 2] >>> ht.asarray(a) DNDarray([1, 2], dtype=ht.int64, device=cpu:0, split=None) - >>> a = np.array([1,2,3]) + >>> a = np.array([1, 2, 3]) >>> n = ht.asarray(a) >>> n DNDarray([1, 2, 3], dtype=ht.int64, device=cpu:0, split=None) >>> n[0] = 0 >>> a DNDarray([0, 2, 3], dtype=ht.int64, device=cpu:0, split=None) - >>> a = torch.tensor([1,2,3]) + >>> a = torch.tensor([1, 2, 3]) >>> t = ht.asarray(a) >>> t DNDarray([1, 2, 3], dtype=ht.int64, device=cpu:0, split=None) >>> t[0] = 0 >>> a DNDarray([0, 2, 3], dtype=ht.int64, device=cpu:0, split=None) - >>> a = ht.array([1,2,3,4], dtype=ht.float32) + >>> a = ht.array([1, 2, 3, 4], dtype=ht.float32) >>> ht.asarray(a, dtype=ht.float32) is a True >>> ht.asarray(a, dtype=ht.float64) is a @@ -583,7 +579,12 @@ def empty( DNDarray([0., 0., 0.], dtype=ht.float32, device=cpu:0, split=None) >>> ht.empty(3, dtype=ht.int) DNDarray([59140784, 0, 59136816], dtype=ht.int32, device=cpu:0, split=None) - >>> ht.empty((2, 3,)) + >>> ht.empty( + ... ( + ... 2, + ... 3, + ... ) + ... ) DNDarray([[-1.7206e-10, 4.5905e-41, -1.7206e-10], [ 4.5905e-41, 4.4842e-44, 0.0000e+00]], dtype=ht.float32, device=cpu:0, split=None) """ @@ -629,7 +630,12 @@ def empty_like( Examples -------- - >>> x = ht.ones((2, 3,)) + >>> x = ht.ones( + ... ( + ... 2, + ... 3, + ... ) + ... ) >>> x DNDarray([[1., 1., 1.], [1., 1., 1.]], dtype=ht.float32, device=cpu:0, split=None) @@ -736,7 +742,7 @@ def __factory( shape : int or Sequence[ints,...] Desired shape of the output array, e.g. 1 or (1, 2, 3,). dtype : datatype - The desired HeAT data type for the array, defaults to ht.float32. + The desired Heat data type for the array, defaults to ht.float32. split : int or None The axis along which the array is split and distributed. local_factory : callable @@ -804,6 +810,8 @@ def __factory_like( Options: ``'C'`` or ``'F'``. Specifies the memory layout of the newly created array. Default is ``order='C'``, meaning the array will be stored in row-major order (C-like). If ``order=‘F’``, the array will be stored in column-major order (Fortran-like). 
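The rewritten arange docstring above maps onto the usual NumPy-style call patterns (illustrative):

    import heat as ht

    ht.arange(5)         # stop only: start=0, step=1 -> 0, 1, 2, 3, 4
    ht.arange(2, 5)      # start and stop            -> 2, 3, 4
    ht.arange(2, 10, 2)  # start, stop and step      -> 2, 4, 6, 8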
+ **kwargs + Keyword arguments for the factory method. Raises ------ @@ -867,7 +875,7 @@ def from_partitioned(x, comm: Optional[Communication] = None) -> DNDarray: comm: Communication, optional Handle to the nodes holding distributed parts or copies of this array. - See also + See Also -------- :func:`ht.core.DNDarray.create_partition_interface `. @@ -883,11 +891,11 @@ def from_partitioned(x, comm: Optional[Communication] = None) -> DNDarray: Examples -------- >>> import heat as ht - >>> a = ht.ones((44,55), split=0) + >>> a = ht.ones((44, 55), split=0) >>> b = ht.from_partitioned(a) - >>> assert (a==b).all() + >>> assert (a == b).all() >>> a[40] = 4711 - >>> assert (a==b).all() + >>> assert (a == b).all() """ comm = sanitize_comm(comm) parted = x.__partitioned__ @@ -912,7 +920,7 @@ def from_partition_dict(parted: dict, comm: Optional[Communication] = None) -> D comm: Communication, optional Handle to the nodes holding distributed parts or copies of this array. - See also + See Also -------- :func:`ht.core.DNDarray.create_partition_interface `. @@ -928,11 +936,11 @@ def from_partition_dict(parted: dict, comm: Optional[Communication] = None) -> D Examples -------- >>> import heat as ht - >>> a = ht.ones((44,55), split=0) + >>> a = ht.ones((44, 55), split=0) >>> b = ht.from_partition_dict(a.__partitioned__) - >>> assert (a==b).all() + >>> assert (a == b).all() >>> a[40] = 4711 - >>> assert (a==b).all() + >>> assert (a == b).all() """ comm = sanitize_comm(comm) return __from_partition_dict_helper(parted, comm) @@ -971,7 +979,7 @@ def __from_partition_dict_helper(parted: dict, comm: Communication): gshape_list = list(gshape) lshape_list = list(data.shape) shape_diff = torch.tensor( - [g - l for g, l in zip(gshape_list, lshape_list)] + [g_shape - l_shape for g_shape, l_shape in zip(gshape_list, lshape_list)] ) # dont care about device nz = torch.nonzero(shape_diff) @@ -1094,7 +1102,12 @@ def full_like( Examples -------- - >>> x = ht.zeros((2, 3,)) + >>> x = ht.zeros( + ... ( + ... 2, + ... 3, + ... ) + ... ) >>> x DNDarray([[0., 0., 0.], [0., 0., 0.]], dtype=ht.float32, device=cpu:0, split=None) @@ -1284,7 +1297,7 @@ def meshgrid(*arrays: Sequence[DNDarray], indexing: str = "xy") -> List[DNDarray -------- >>> x = ht.arange(4) >>> y = ht.arange(3) - >>> xx, yy = ht.meshgrid(x,y) + >>> xx, yy = ht.meshgrid(x, y) >>> xx DNDarray([[0, 1, 2, 3], [0, 1, 2, 3], @@ -1385,7 +1398,12 @@ def ones( DNDarray([1., 1., 1.], dtype=ht.float32, device=cpu:0, split=None) >>> ht.ones(3, dtype=ht.int) DNDarray([1, 1, 1], dtype=ht.int32, device=cpu:0, split=None) - >>> ht.ones((2, 3,)) + >>> ht.ones( + ... ( + ... 2, + ... 3, + ... ) + ... ) DNDarray([[1., 1., 1.], [1., 1., 1.]], dtype=ht.float32, device=cpu:0, split=None) """ @@ -1429,7 +1447,12 @@ def ones_like( Examples -------- - >>> x = ht.zeros((2, 3,)) + >>> x = ht.zeros( + ... ( + ... 2, + ... 3, + ... ) + ... ) >>> x DNDarray([[0., 0., 0.], [0., 0., 0.]], dtype=ht.float32, device=cpu:0, split=None) @@ -1481,7 +1504,12 @@ def zeros( DNDarray([0., 0., 0.], dtype=ht.float32, device=cpu:0, split=None) >>> ht.zeros(3, dtype=ht.int) DNDarray([0, 0, 0], dtype=ht.int32, device=cpu:0, split=None) - >>> ht.zeros((2, 3,)) + >>> ht.zeros( + ... ( + ... 2, + ... 3, + ... ) + ... ) DNDarray([[0., 0., 0.], [0., 0., 0.]], dtype=ht.float32, device=cpu:0, split=None) """ @@ -1525,7 +1553,12 @@ def zeros_like( Examples -------- - >>> x = ht.ones((2, 3,)) + >>> x = ht.ones( + ... ( + ... 2, + ... 3, + ... ) + ... 
) >>> x DNDarray([[1., 1., 1.], [1., 1., 1.]], dtype=ht.float32, device=cpu:0, split=None) diff --git a/heat/core/indexing.py b/heat/core/indexing.py index 33d94c04d0..e66ecd3203 100644 --- a/heat/core/indexing.py +++ b/heat/core/indexing.py @@ -115,14 +115,14 @@ def where( if only x or y is given or both are not DNDarrays or numerical scalars Notes - ------- + ----- When only condition is provided, this function is a shorthand for :func:`nonzero`. Examples -------- >>> import heat as ht >>> x = ht.arange(10, split=0) - >>> ht.where(x < 5, x, 10*x) + >>> ht.where(x < 5, x, 10 * x) DNDarray([ 0, 1, 2, 3, 4, 50, 60, 70, 80, 90], dtype=ht.int64, device=cpu:0, split=0) >>> y = ht.array([[0, 1, 2], [0, 2, 4], [0, 3, 6]]) >>> ht.where(y < 4, y, -1) diff --git a/heat/core/io.py b/heat/core/io.py index 427c7b8d49..aae6ab5b2c 100644 --- a/heat/core/io.py +++ b/heat/core/io.py @@ -2,6 +2,8 @@ from __future__ import annotations +from functools import reduce +import operator import os.path from math import log10 import numpy as np @@ -27,6 +29,7 @@ __HDF5_EXTENSIONS = frozenset([".h5", ".hdf5"]) __NETCDF_EXTENSIONS = frozenset([".nc", ".nc4", "netcdf"]) __NETCDF_DIM_TEMPLATE = "{}_dim_{}" +__ZARR_EXTENSIONS = frozenset([".zarr"]) __all__ = [ "load", @@ -36,8 +39,32 @@ "supports_hdf5", "supports_netcdf", "load_npy_from_path", + "supports_zarr", ] + +def size_from_slice(size: int, s: slice) -> Tuple[int, int]: + """ + Determines the size of a slice object. + + Parameters + ---------- + size: int + The size of the array the slice object is applied to. + s : slice + The slice object to determine the size of. + + Returns + ------- + int + The size of the sliced object. + int + The start index of the slice object. + """ + new_range = range(size)[s] + return len(new_range), new_range.start if len(new_range) > 0 else 0 + + try: import netCDF4 as nc except ImportError: @@ -99,20 +126,20 @@ def load_netcdf( The device id on which to place the data, defaults to globally set default device. Raises - ------- + ------ TypeError If any of the input parameters are not of correct type. Examples -------- - >>> a = ht.load_netcdf('data.nc', variable='DATA') + >>> a = ht.load_netcdf("data.nc", variable="DATA") >>> a.shape [0/2] (5,) [1/2] (5,) >>> a.lshape [0/2] (5,) [1/2] (5,) - >>> b = ht.load_netcdf('data.nc', variable='DATA', split=0) + >>> b = ht.load_netcdf("data.nc", variable="DATA", split=0) >>> b.shape [0/2] (5,) [1/2] (5,) @@ -189,7 +216,7 @@ def save_netcdf( additional arguments passed to the created dataset. Raises - ------- + ------ TypeError If any of the input parameters are not of correct type. ValueError @@ -199,7 +226,7 @@ def save_netcdf( Examples -------- >>> x = ht.arange(100, split=0) - >>> ht.save_netcdf(x, 'data.nc', dataset='DATA') + >>> ht.save_netcdf(x, "data.nc", dataset="DATA") """ if not isinstance(data, DNDarray): raise TypeError(f"data must be heat tensor, not {type(data)}") @@ -251,7 +278,7 @@ def __get_expanded_split( split-axis of dndarray. Raises - ------- + ------ ValueError If resulting shapes do not match. 
""" @@ -295,7 +322,7 @@ def __merge_slices( data_slices: Optional[Tuple[int, slice]] = None, ) -> Tuple[Union[int, slice]]: """ - This method allows replacing: + Allows replacing: ``var[var_slices][data_slices] = data`` (a `netcdf4.Variable.__getitem__` and a `numpy.ndarray.__setitem__` call) @@ -489,7 +516,7 @@ def load_hdf5( path: str, dataset: str, dtype: datatype = types.float32, - load_fraction: float = 1.0, + slices: Optional[Tuple[Optional[slice], ...]] = None, split: Optional[int] = None, device: Optional[str] = None, comm: Optional[Communication] = None, @@ -505,10 +532,8 @@ def load_hdf5( Name of the dataset to be read. dtype : datatype, optional Data type of the resulting array. - load_fraction : float between 0. (excluded) and 1. (included), default is 1. - if 1. (default), the whole dataset is loaded from the file specified in path - else, the dataset is loaded partially, with the fraction of the dataset (along the split axis) specified by load_fraction - If split is None, load_fraction is automatically set to 1., i.e. the whole dataset is loaded. + slices : tuple of slice objects, optional + Load only the specified slices of the dataset. split : int or None, optional The axis along which the data is distributed among the processing cores. device : str, optional @@ -517,26 +542,79 @@ def load_hdf5( The communication to use for the data distribution. Raises - ------- + ------ TypeError If any of the input parameters are not of correct type Examples -------- - >>> a = ht.load_hdf5('data.h5', dataset='DATA') + >>> a = ht.load_hdf5("data.h5", dataset="DATA") >>> a.shape [0/2] (5,) [1/2] (5,) >>> a.lshape [0/2] (5,) [1/2] (5,) - >>> b = ht.load_hdf5('data.h5', dataset='DATA', split=0) + >>> b = ht.load_hdf5("data.h5", dataset="DATA", split=0) >>> b.shape [0/2] (5,) [1/2] (5,) >>> b.lshape [0/2] (3,) [1/2] (2,) + + Using the slicing argument: + >>> not_sliced = ht.load_hdf5("other_data.h5", dataset="DATA", split=0) + >>> not_sliced.shape + [0/2] (10,2) + [1/2] (10,2) + >>> not_sliced.lshape + [0/2] (5,2) + [1/2] (5,2) + >>> not_sliced.larray + [0/2] [[ 0, 1], + [ 2, 3], + [ 4, 5], + [ 6, 7], + [ 8, 9]] + [1/2] [[10, 11], + [12, 13], + [14, 15], + [16, 17], + [18, 19]] + + >>> sliced = ht.load_hdf5("other_data.h5", dataset="DATA", split=0, slices=slice(8)) + >>> sliced.shape + [0/2] (8,2) + [1/2] (8,2) + >>> sliced.lshape + [0/2] (4,2) + [1/2] (4,2) + >>> sliced.larray + [0/2] [[ 0, 1], + [ 2, 3], + [ 4, 5], + [ 6, 7]] + [1/2] [[ 8, 9], + [10, 11], + [12, 13], + [14, 15], + [16, 17]] + + >>> sliced = ht.load_hdf5('other_data.h5', dataset='DATA', split=0, slices=(slice(2,8), slice(0,1)) + >>> sliced.shape + [0/2] (6,1) + [1/2] (6,1) + >>> sliced.lshape + [0/2] (3,1) + [1/2] (3,1) + >>> sliced.larray + [0/2] [[ 4, ], + [ 6, ], + [ 8, ]] + [1/2] [[10, ], + [12, ], + [14, ]] """ if not isinstance(path, str): raise TypeError(f"path must be str, not {type(path)}") @@ -545,14 +623,6 @@ def load_hdf5( elif split is not None and not isinstance(split, int): raise TypeError(f"split must be None or int, not {type(split)}") - if not isinstance(load_fraction, float): - raise TypeError(f"load_fraction must be float, but is {type(load_fraction)}") - else: - if split is not None and (load_fraction <= 0.0 or load_fraction > 1.0): - raise ValueError( - f"load_fraction must be between 0. (excluded) and 1. (included), but is {load_fraction}." 
- ) - # infer the type and communicator for the loaded array dtype = types.canonical_heat_type(dtype) # determine the comm and device the data will be placed on @@ -563,13 +633,33 @@ def load_hdf5( with h5py.File(path, "r") as handle: data = handle[dataset] gshape = data.shape - if split is not None: - gshape = list(gshape) - gshape[split] = int(gshape[split] * load_fraction) - gshape = tuple(gshape) + new_gshape = tuple() + offsets = [0] * len(gshape) + if slices is not None: + for i in range(len(gshape)): + if i < len(slices) and slices[i]: + s = slices[i] + if s.step is not None and s.step != 1: + raise ValueError("Slices with step != 1 are not supported") + new_axis_size, offset = size_from_slice(gshape[i], s) + new_gshape += (new_axis_size,) + offsets[i] = offset + else: + new_gshape += (gshape[i],) + offsets[i] = 0 + + gshape = new_gshape + dims = len(gshape) split = sanitize_axis(gshape, split) _, _, indices = comm.chunk(gshape, split) + + if slices is not None: + new_indices = tuple() + for offset, index in zip(offsets, indices): + new_indices += (slice(index.start + offset, index.stop + offset),) + indices = new_indices + balanced = True if split is None: data = torch.tensor( @@ -614,7 +704,7 @@ def save_hdf5( Additional arguments passed to the created dataset. Raises - ------- + ------ TypeError If any of the input parameters are not of correct type. ValueError @@ -623,7 +713,7 @@ def save_hdf5( Examples -------- >>> x = ht.arange(100, split=0) - >>> ht.save_hdf5(x, 'data.h5', dataset='DATA') + >>> ht.save_hdf5(x, "data.h5", dataset="DATA") """ if not isinstance(data, DNDarray): raise TypeError(f"data must be heat tensor, not {type(data)}") @@ -695,7 +785,7 @@ def load( Additional options passed to the particular functions. Raises - ------- + ------ ValueError If the file extension is not understood or known. RuntimeError @@ -703,10 +793,20 @@ def load( Examples -------- - >>> ht.load('data.h5', dataset='DATA') + >>> ht.load("data.h5", dataset="DATA") DNDarray([ 1.0000, 2.7183, 7.3891, 20.0855, 54.5981], dtype=ht.float32, device=cpu:0, split=None) - >>> ht.load('data.nc', variable='DATA') + >>> ht.load("data.nc", variable="DATA") DNDarray([ 1.0000, 2.7183, 7.3891, 20.0855, 54.5981], dtype=ht.float32, device=cpu:0, split=None) + + See Also + -------- + :func:`load_csv` : Loads data from a CSV file. + :func:`load_csv_from_folder` : Loads multiple .csv files into one DNDarray which will be returned. + :func:`load_hdf5` : Loads data from an HDF5 file. + :func:`load_netcdf` : Loads data from a NetCDF4 file. + :func:`load_npy_from_path` : Loads multiple .npy files into one DNDarray which will be returned. + :func:`load_zarr` : Loads zarr-Format into DNDarray which will be returned. + """ if not isinstance(path, str): raise TypeError(f"Expected path to be str, but was {type(path)}") @@ -724,6 +824,12 @@ def load( return load_netcdf(path, *args, **kwargs) else: raise RuntimeError(f"netcdf is required for file extension {extension}") + elif extension in __ZARR_EXTENSIONS: + if supports_zarr(): + return load_zarr(path, *args, **kwargs) + else: + raise RuntimeError(f"Package zarr is required for file extension {extension}") + else: raise ValueError(f"Unsupported file extension {extension}") @@ -762,14 +868,14 @@ def load_csv( The communication to use for the data distribution, defaults to global default Raises - ------- + ------ TypeError If any of the input parameters are not of correct type. 
Examples -------- >>> import heat as ht - >>> a = ht.load_csv('data.csv') + >>> a = ht.load_csv("data.csv") >>> a.shape [0/3] (150, 4) [1/3] (150, 4) @@ -780,7 +886,7 @@ def load_csv( [1/3] (38, 4) [2/3] (37, 4) [3/3] (37, 4) - >>> b = ht.load_csv('data.csv', header_lines=10) + >>> b = ht.load_csv("data.csv", header_lines=10) >>> b.shape [0/3] (140, 4) [1/3] (140, 4) @@ -833,12 +939,12 @@ def load_csv( f.seek(displs[rank], 0) line_starts = [] r = f.read(counts[rank]) - for pos, l in enumerate(r): - if chr(l) == "\n": + for pos, line in enumerate(r): + if chr(line) == "\n": # Check if it is part of '\r\n' if chr(r[pos - 1]) != "\r": line_starts.append(pos + 1) - elif chr(l) == "\r": + elif chr(line) == "\r": # check if file line is terminated by '\r\n' if pos + 1 < len(r) and chr(r[pos + 1]) == "\n": line_starts.append(pos + 2) @@ -1107,7 +1213,7 @@ def save( Additional options passed to the particular functions. Raises - ------- + ------ ValueError If the file extension is not understood or known. RuntimeError @@ -1116,7 +1222,7 @@ def save( Examples -------- >>> x = ht.arange(100, split=0) - >>> ht.save(x, 'data.h5', 'DATA', mode='a') + >>> ht.save(x, "data.h5", "DATA", mode="a") """ if not isinstance(path, str): raise TypeError(f"Expected path to be str, but was {type(path)}") @@ -1134,6 +1240,11 @@ def save( raise RuntimeError(f"netcdf is required for file extension {extension}") elif extension in __CSV_EXTENSION: save_csv(data, path, *args, **kwargs) + elif extension in __ZARR_EXTENSIONS: + if supports_zarr(): + return save_zarr(data, path, *args, **kwargs) + else: + raise RuntimeError(f"Package zarr is required for file extension {extension}") else: raise ValueError(f"Unsupported file extension {extension}") @@ -1283,3 +1394,227 @@ def load_csv_from_folder( larray = torch.from_numpy(larray) x = factories.array(larray, dtype=dtype, device=device, is_split=split, comm=comm) return x + + +try: + import zarr +except ModuleNotFoundError: + + def supports_zarr() -> bool: + """ + Returns ``True`` if zarr is installed, ``False`` otherwise. + """ + return False + +else: + __all__.extend(["load_zarr", "save_zarr"]) + + def supports_zarr() -> bool: + """ + Returns ``True`` if zarr is installed, ``False`` otherwise. + """ + return True + + def load_zarr( + path: str, + split: int = 0, + device: Optional[str] = None, + comm: Optional[Communication] = None, + slices: Union[None, slice, Iterable[Union[slice, None]]] = None, + **kwargs, + ) -> DNDarray: + """ + Loads zarr-Format into DNDarray which will be returned. + + Parameters + ---------- + path : str + Path to the directory in which a .zarr-file is located. + split : int + Along which axis the loaded arrays should be concatenated. + device : str, optional + The device id on which to place the data, defaults to globally set default device. 
+        comm : Communication, optional
+            The communication to use for the data distribution, default is 'heat.MPI_WORLD'
+        slices : Union[None, slice, Iterable[Union[slice, None]]]
+            Load only a slice of the array instead of the full array.
+        **kwargs : Any
+            Extra arguments passed to zarr.open.
+        """
+        if not isinstance(path, str):
+            raise TypeError(f"path must be str, not {type(path)}")
+        if split is not None and not isinstance(split, int):
+            raise TypeError(f"split must be None or int, not {type(split)}")
+        if device is not None and not isinstance(device, str):
+            raise TypeError(f"device must be None or str, not {type(device)}")
+        if not isinstance(slices, (slice, Iterable)) and slices is not None:
+            raise TypeError(f"slices argument must be slice, tuple or None, not {type(slices)}")
+        if isinstance(slices, Iterable):
+            for elem in slices:
+                if isinstance(elem, slice) or elem is None:
+                    continue
+                raise TypeError(f"Tuple values of slices must be slice or None, not {type(elem)}")
+
+        for extension in __ZARR_EXTENSIONS:
+            if fnmatch.fnmatch(path, f"*{extension}"):
+                break
+        else:
+            raise ValueError("File has no zarr extension.")
+
+        arr: zarr.Array = zarr.open_array(store=path, **kwargs)
+        shape = arr.shape
+
+        if isinstance(slices, slice) or slices is None:
+            slices = [slices]
+
+        if len(shape) < len(slices):
+            raise ValueError(
+                f"slices argument has more entries than the array has dimensions. {len(shape)} < {len(slices)}"
+            )
+
+        slices = [elem if elem is not None else slice(None) for elem in slices]
+        slices.extend([slice(None) for _ in range(abs(len(slices) - len(shape)))])
+
+        dtype = types.canonical_heat_type(arr.dtype)
+        device = devices.sanitize_device(device)
+        comm = sanitize_comm(comm)
+
+        slices = tuple(slices)
+        shape = [len(range(*tslice.indices(length))) for length, tslice in zip(shape, slices)]
+        offset, local_shape, local_slices = comm.chunk(shape, split)
+
+        return factories.array(
+            arr[slices][local_slices], dtype=dtype, is_split=split, device=device, comm=comm
+        )
+
+    def save_zarr(dndarray: DNDarray, path: str, overwrite: bool = False, **kwargs) -> None:
+        """
+        Writes the DNDarray to disk in zarr format.
+
+        Parameters
+        ----------
+        dndarray : DNDarray
+            DNDarray to save.
+        path : str
+            Path to save to.
+        overwrite : bool
+            Whether to overwrite an existing array.
+        **kwargs : Any
+            Extra arguments passed to zarr.open and zarr.create.
+
+        Raises
+        ------
+        TypeError
+            If given parameters do not have the required types.
+        ValueError
+            If ``path`` does not end with a zarr extension.
+        RuntimeError
+            If ``path`` already exists and ``overwrite`` is not set.
+
+        Notes
+        -----
+        Zarr functions by chunking the data, where a chunk is a file inside the store.
+        The problem is that only one process writes to a chunk at a time. Therefore, when two
+        processes try to write to the same chunk, one will fail unless the other has already
+        finished.
+
+        To alleviate this we define the chunk sizes ourselves: e.g., for split=0, a (4, 4) shape
+        and a world size of 4, we chunk with (1, 4).
+
+        A problem arises when a process gets a bigger chunk and interferes with another process.
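+
+        As a rough sketch of the rule used below (illustrative only; `gshape`, `split`, and
+        `nprocs` are stand-ins for the actual attributes used in the implementation):
+
+            chunks = list(gshape)
+            if gshape[split] % nprocs == 0:
+                chunks[split] = gshape[split] // nprocs  # every process owns exactly one chunk
+            else:
+                chunks[split] = 1  # uneven distribution: fall back to size 1 along the split axis
+
+        The worked example below illustrates the uneven case:
+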
+        Example:
+            N_PROCS = 4
+            SHAPE   = (9,10)
+            SPLIT   = 0
+            CHUNKS => (2,10)
+
+        Here one process gets a write region of 3 rows and would have to load 2 chunks to write
+        those 3 rows, so it can miss or overwrite what another process wrote, destroying the
+        parallel write.
+        To counteract this we set the chunk size along the split axis to 1. This prevents
+        overwrites, but may either cripple or even improve write speeds.
+
+        Another problem with this approach is that we tell zarr to use full chunks, i.e., if the array has shape (10_000, 10_000)
+        and we split it at axis=0 with 4 processes, we get chunks of (2_500, 10_000). Zarr will load the whole chunk into
+        memory, making this memory intensive and probably inefficient. A better approach would be a smaller chunk size,
+        for example half of it, but that cannot be determined in all cases, so the current approach is a compromise.
+
+        Another problem is the split=None scenario. In this case every process has the same data, so only one needs to write;
+        we then ignore chunking, let zarr decide the chunk size, and let only the process with rank=0 write.
+
+        To avoid errors when using NumPy arrays as chunk shape, the chunks argument is only passed to zarr.create if it is
+        not None. This prevents issues with ambiguous truth values or attribute errors on None.
+
+        """
+        if not isinstance(path, str):
+            raise TypeError(f"path must be str, not {type(path)}")
+
+        for extension in __ZARR_EXTENSIONS:
+            if fnmatch.fnmatch(path, f"*{extension}"):
+                break
+        else:
+            raise ValueError("path does not end with a zarr extension.")
+
+        if os.path.exists(path) and not overwrite:
+            raise RuntimeError("Given path already exists.")
+
+        if MPI_WORLD.rank == 0:
+            if dndarray.split is None or MPI_WORLD.size == 1:
+                chunks = None
+            else:
+                chunks = np.array(dndarray.gshape)
+                axis = dndarray.split
+
+                if chunks[axis] % MPI_WORLD.size != 0:
+                    chunks[axis] = 1
+                else:
+                    chunks[axis] //= MPI_WORLD.size
+
+                CODEC_LIMIT_BYTES = 2**31 - 1  # PR#1766
+
+                for _ in range(
+                    10
+                ):  # use a for loop instead of while True for better handling of edge cases
+                    byte_size = reduce(operator.mul, chunks, 1) * dndarray.larray.element_size()
+                    if byte_size > CODEC_LIMIT_BYTES:
+                        if chunks[axis] % 2 == 0:
+                            chunks[axis] /= 2
+                            continue
+                        else:
+                            chunks[axis] = 1
+                            break
+                    else:
+                        break
+                else:
+                    chunks[axis] = 1
+                    warnings.warn(
+                        "Calculation of chunk size for zarr format unexpectedly defaulted to 1 on the split axis"
+                    )
+
+            dtype = dndarray.dtype.char()
+
+            zarr_create_kwargs = {
+                "store": path,
+                "shape": dndarray.gshape,
+                "dtype": dtype,
+                "overwrite": overwrite,
+                **kwargs,
+            }
+
+            if chunks is not None:
+                zarr_create_kwargs["chunks"] = chunks.tolist()
+
+            zarr_array = zarr.create(**zarr_create_kwargs)
+
+        # Wait for the file creation to finish
+        MPI_WORLD.Barrier()
+        zarr_array = zarr.open(store=path, mode="r+", **kwargs)
+
+        if dndarray.split is not None:
+            _, _, slices = MPI_WORLD.chunk(dndarray.gshape, dndarray.split)
+
+            zarr_array[slices] = (
+                dndarray.larray.cpu().numpy()  # NumPy array needed, as zarr only understands numpy dtypes and infers them.
+            )
+        else:
+            if MPI_WORLD.rank == 0:
+                zarr_array[:] = dndarray.larray.cpu().numpy()
+
+        MPI_WORLD.Barrier()
diff --git a/heat/core/linalg/__init__.py b/heat/core/linalg/__init__.py
index d4a2b1f972..d3a75b2ae3 100644
--- a/heat/core/linalg/__init__.py
+++ b/heat/core/linalg/__init__.py
@@ -7,3 +7,5 @@
 from .qr import *
 from .svdtools import *
 from .svd import *
+from .polar import *
+from .eigh import *
diff --git a/heat/core/linalg/basics.py b/heat/core/linalg/basics.py
index 2c0a786138..7a403b08fd 100644
--- a/heat/core/linalg/basics.py
+++ b/heat/core/linalg/basics.py
@@ -24,8 +24,12 @@
 from .. import statistics
 from .. import stride_tricks
 from .. import types
+from ..random import randn
+from .qr import qr
+from .solver import solve_triangular
 
 __all__ = [
+    "condest",
     "cross",
     "det",
     "dot",
@@ -45,6 +49,116 @@
 ]
 
 
+def _estimate_largest_singularvalue(A: DNDarray, algorithm: str = "fro") -> DNDarray:
+    """
+    Computes an upper estimate for the largest singular value of the input 2D DNDarray.
+
+    Parameters
+    ----------
+    A : DNDarray
+        The matrix, i.e., a 2D DNDarray, for which the largest singular value should be estimated.
+    algorithm : str
+        The algorithm to use for the estimation. Currently, only "fro" (default) is implemented.
+        If "fro" is chosen, the Frobenius norm of the matrix is used as an upper estimate.
+    """
+    if not isinstance(algorithm, str):
+        raise TypeError(
+            f"Parameter 'algorithm' needs to be a string, but is {algorithm} with data type {type(algorithm)}."
+        )
+    if algorithm == "fro":
+        return matrix_norm(A, ord="fro").squeeze()
+    else:
+        raise NotImplementedError("So far, only algorithm='fro' is implemented.")
+
+
+def condest(
+    A: DNDarray, p: Union[int, str] = None, algorithm: str = "randomized", params: dict = None
+) -> DNDarray:
+    """
+    Computes a (possibly randomized) upper estimate of the l2-condition number of the input 2D DNDarray.
+
+    Parameters
+    ----------
+    A : DNDarray
+        The matrix, i.e., a 2D DNDarray, for which the condition number shall be estimated.
+    p : int or str (optional)
+        The norm to use for the condition number computation. If None, the l2-norm (default, p=2) is used.
+        So far, only p=2 is implemented.
+    algorithm : str
+        The algorithm to use for the estimation. Currently, only "randomized" (default) is implemented.
+    params : dict (optional)
+        A dictionary of parameters required for the chosen algorithm; if not provided, default values for the respective algorithm are chosen.
+        If `algorithm="randomized"` the number of random samples to use can be specified under the key "nsamples"; default is 10.
+
+    Notes
+    -----
+    The "randomized" algorithm follows the approach described in [1]; note that in the paper, the condition number w.r.t. the Frobenius norm is actually estimated.
+    However, this yields an upper bound for the condition number w.r.t. the l2-norm as well.
+
+    References
+    ----------
+    [1] T. Gudmundsson, C. S. Kenney, and A. J. Laub. Small-Sample Statistical Estimates for Matrix Norms. SIAM Journal on Matrix Analysis and Applications 1995 16:3, 776-792.
+    """
+    if p is None:
+        p = 2
+    if p != 2:
+        raise ValueError(
+            f"Only the case p=2 (condition number w.r.t. the Euclidean norm) is implemented so far, but input was p={p} (type: {type(p)})."
+        )
+    if not isinstance(algorithm, str):
+        raise TypeError(
+            f"Parameter 'algorithm' needs to be a string, but is {algorithm} with data type {type(algorithm)}."
+ ) + if algorithm == "randomized": + if params is None: + nsamples = 10 # set default value + else: + if not isinstance(params, dict) or "nsamples" not in params: + raise TypeError( + "If not None, 'params' needs to be a dictionary containing the number of samples under the key 'nsamples'." + ) + if not isinstance(params["nsamples"], int) or params["nsamples"] <= 0: + raise ValueError( + f"The number of samples needs to be a positive integer, but is {params['nsamples']} with data type {type(params['nsamples'])}." + ) + nsamples = params["nsamples"] + + m = A.shape[0] + n = A.shape[1] + + if n > m: + # the algorithm only works for m >= n, but fortunately, the condition number (w.r.t. l2-norm) is invariant under transposition + return condest(A.T, p=p, algorithm=algorithm, params=params) + + _, R = qr(A, mode="r") # only R factor is computed in QR + + # random samples from unit sphere + # regarding the split: if A.split == 1, then n is probably large and we should split along an axis of size n; otherwise, both n and nsamples should be small + Q, R_not_used = qr( + randn( + n, + nsamples, + dtype=A.dtype, + split=0 if A.split == 1 else None, + device=A.device, + comm=A.comm, + ) + ) + del R_not_used + + est = ( + matrix_norm(R @ Q) + * A.dtype((m / nsamples) ** 0.5, comm=A.comm) + * matrix_norm(solve_triangular(R, Q)) + ) + + return est.squeeze() + else: + raise NotImplementedError( + "So far only algorithm='randomized' is implemented. Please open an issue on GitHub if you would like to suggest implementing another algorithm." + ) + + def cross( a: DNDarray, b: DNDarray, axisa: int = -1, axisb: int = -1, axisc: int = -1, axis: int = -1 ) -> DNDarray: @@ -174,7 +288,7 @@ def det(a: DNDarray) -> DNDarray: Examples -------- - >>> a = ht.array([[-2,-1,2],[2,1,4],[-3,3,-1]]) + >>> a = ht.array([[-2, -1, 2], [2, 1, 4], [-3, 3, -1]]) >>> ht.linalg.det(a) DNDarray(54., dtype=ht.float64, device=cpu:0, split=None) """ @@ -328,7 +442,7 @@ def inv(a: DNDarray) -> DNDarray: Examples -------- - >>> a = ht.array([[1., 2], [2, 3]]) + >>> a = ht.array([[1.0, 2], [2, 3]]) >>> ht.linalg.inv(a) DNDarray([[-3., 2.], [ 2., -1.]], dtype=ht.float32, device=cpu:0, split=None) @@ -347,7 +461,13 @@ def inv(a: DNDarray) -> DNDarray: # no split in the square matrices if not a.is_distributed() or a.split < a.ndim - 2: - data = torch.inverse(a.larray) + try: + data = torch.inverse(a.larray) + except RuntimeError as e: + raise RuntimeError(e) + # torch.linalg.inv does not raise RuntimeError on MPS when inversion fails + if data.is_mps and torch.any(data.isnan()): + raise RuntimeError("linalg.inv: inversion could not be performed") return DNDarray( data, a.shape, @@ -428,23 +548,23 @@ def matmul(a: DNDarray, b: DNDarray, allow_resplit: bool = False) -> DNDarray: Batched inputs (with batch dimensions being leading dimensions) are allowed; see also the Notes below. Parameters - ----------- + ---------- a : DNDarray - matrix :math:`L \\times P` or vector :math:`P` or batch of matrices/vectors: :math:`B_1 \\times ... \\times B_k [\\times L] \\times P` + matrix :math:`L \\times P` or vector :math:`P` or batch of matrices: :math:`B_1 \\times ... \\times B_k \\times L \\times P` b : DNDarray - matrix :math:`P \\times Q` or vector :math:`P` or batch of matrices/vectors: :math:`B_1 \\times ... \\times B_k \\times P [\\times Q]` + matrix :math:`P \\times Q` or vector :math:`P` or batch of matrices: :math:`B_1 \\times ... 
\\times B_k \\times P \\times Q` allow_resplit : bool, optional Whether to distribute ``a`` in the case that both ``a.split is None`` and ``b.split is None``. Default is ``False``. If ``True``, if both are not split then ``a`` will be distributed in-place along axis 0. Notes - ----------- + ----- - For batched inputs, batch dimensions must coincide and if one matrix is split along a batch axis the other must be split along the same axis. - - If ``a`` or ``b`` is a (possibly batched) vector the result will also be a (possibly batched) vector. + - If ``a`` or ``b`` is a vector the result will also be a vector. - We recommend to avoid the particular split combinations ``1``-``0``, ``None``-``0``, and ``1``-``None`` (for ``a.split``-``b.split``) due to their comparably high memory consumption, if possible. Applying ``DNDarray.resplit_`` or ``heat.resplit`` on one of the two factors before calling ``matmul`` in these situations might improve performance of your code / might avoid memory bottlenecks. References - ----------- + ---------- [1] R. Gu, et al., "Improving Execution Concurrency of Large-scale Matrix Multiplication on Distributed Data-parallel Platforms," IEEE Transactions on Parallel and Distributed Systems, vol 28, no. 9. 2017. \n @@ -453,7 +573,7 @@ def matmul(a: DNDarray, b: DNDarray, allow_resplit: bool = False) -> DNDarray: Workshops (IPDPSW), Vancouver, BC, 2018, pp. 877-882. Examples - ----------- + -------- >>> a = ht.ones((n, m), split=1) >>> a[0] = ht.arange(1, m + 1) >>> a[:, -1] = ht.arange(1, n + 1).larray @@ -529,6 +649,10 @@ def matmul(a: DNDarray, b: DNDarray, allow_resplit: bool = False) -> DNDarray: raise NotImplementedError( "Both input matrices have to be split along the same batch axis!" ) + if vector_flag: # batched matrix vector multiplication not supported + raise NotImplementedError( + "Batched matrix-vector multiplication is not supported, try using expand_dims to make it a batched matrix-matrix multiplication." + ) comm = a.comm ndim = max(a.ndim, b.ndim) @@ -695,11 +819,11 @@ def matmul(a: DNDarray, b: DNDarray, allow_resplit: bool = False) -> DNDarray: kB, a.gshape[-1] ) # shouldnt this always be kB and be the same as for split 11? - if a.lshape[-1] % kB != 0 or ( - kB == 1 and a.lshape[-1] != 1 - ): # does kb == 1 imply a.lshape[-1] > 1? + if (kB == 1 and a.lshape[-1] != 1) or a.lshape[ + -1 + ] % kB != 0: # does kb == 1 imply a.lshape[-1] > 1? 
rem_a = 1 - if b.lshape[-2] % kB != 0 or (kB == 1 and b.lshape[-2] != 1): + if (kB == 1 and b.lshape[-2] != 1) or b.lshape[-2] % kB != 0: rem_b = 1 # get the lshape map to determine what needs to be sent where as well as M and N @@ -1236,9 +1360,9 @@ def matrix_norm( Examples -------- - >>> ht.matrix_norm(ht.array([[1,2],[3,4]])) + >>> ht.matrix_norm(ht.array([[1, 2], [3, 4]])) DNDarray([[5.4772]], dtype=ht.float64, device=cpu:0, split=None) - >>> ht.matrix_norm(ht.array([[1,2],[3,4]]), keepdims=True, ord=-1) + >>> ht.matrix_norm(ht.array([[1, 2], [3, 4]]), keepdims=True, ord=-1) DNDarray([[4.]], dtype=ht.float64, device=cpu:0, split=None) """ sanitation.sanitize_in(x) @@ -1382,9 +1506,9 @@ def norm( DNDarray(7.7460, dtype=ht.float32, device=cpu:0, split=None) >>> LA.norm(b) DNDarray(7.7460, dtype=ht.float32, device=cpu:0, split=None) - >>> LA.norm(b, ord='fro') + >>> LA.norm(b, ord="fro") DNDarray(7.7460, dtype=ht.float32, device=cpu:0, split=None) - >>> LA.norm(a, float('inf')) + >>> LA.norm(a, float("inf")) DNDarray([4.], dtype=ht.float32, device=cpu:0, split=None) >>> LA.norm(b, ht.inf) DNDarray([9.], dtype=ht.float32, device=cpu:0, split=None) @@ -1416,8 +1540,8 @@ def norm( DNDarray([3.7417, 4.2426], dtype=ht.float64, device=cpu:0, split=None) >>> LA.norm(c, axis=1, ord=1) DNDarray([6., 6.], dtype=ht.float64, device=cpu:0, split=None) - >>> m = ht.arange(8).reshape(2,2,2) - >>> LA.norm(m, axis=(1,2)) + >>> m = ht.arange(8).reshape(2, 2, 2) + >>> LA.norm(m, axis=(1, 2)) DNDarray([ 3.7417, 11.2250], dtype=ht.float32, device=cpu:0, split=None) >>> LA.norm(m[0, :, :]), LA.norm(m[1, :, :]) (DNDarray(3.7417, dtype=ht.float32, device=cpu:0, split=None), DNDarray(11.2250, dtype=ht.float32, device=cpu:0, split=None)) @@ -2329,11 +2453,11 @@ def vdot(x1: DNDarray, x2: DNDarray) -> DNDarray: Examples -------- - >>> a = ht.array([1+1j, 2+2j]) - >>> b = ht.array([1+2j, 3+4j]) - >>> ht.vdot(a,b) + >>> a = ht.array([1 + 1j, 2 + 2j]) + >>> b = ht.array([1 + 2j, 3 + 4j]) + >>> ht.vdot(a, b) DNDarray([(17+3j)], dtype=ht.complex64, device=cpu:0, split=None) - >>> ht.vdot(b,a) + >>> ht.vdot(b, a) DNDarray([(17-3j)], dtype=ht.complex64, device=cpu:0, split=None) """ x1 = manipulations.flatten(x1) @@ -2366,7 +2490,7 @@ def vecdot( Examples -------- - >>> ht.vecdot(ht.full((3,3,3),3), ht.ones((3,3)), axis=0) + >>> ht.vecdot(ht.full((3, 3, 3), 3), ht.ones((3, 3)), axis=0) DNDarray([[9., 9., 9.], [9., 9., 9.], [9., 9., 9.]], dtype=ht.float32, device=cpu:0, split=None) @@ -2433,9 +2557,9 @@ def vector_norm( Examples -------- - >>> ht.vector_norm(ht.array([1,2,3,4])) + >>> ht.vector_norm(ht.array([1, 2, 3, 4])) DNDarray([5.4772], dtype=ht.float64, device=cpu:0, split=None) - >>> ht.vector_norm(ht.array([[1,2],[3,4]]), axis=0, ord=1) + >>> ht.vector_norm(ht.array([[1, 2], [3, 4]]), axis=0, ord=1) DNDarray([[4., 6.]], dtype=ht.float64, device=cpu:0, split=None) """ sanitation.sanitize_in(x) diff --git a/heat/core/linalg/eigh.py b/heat/core/linalg/eigh.py new file mode 100644 index 0000000000..0a7524447d --- /dev/null +++ b/heat/core/linalg/eigh.py @@ -0,0 +1,309 @@ +""" +Implements Symmetric Eigenvalue Decomposition +""" + +import numpy as np +import collections +import torch +from typing import Type, Callable, Dict, Any, TypeVar, Union, Tuple + +from ..dndarray import DNDarray +from .. import factories +from .. 
import types
+from ..linalg import matrix_norm, vector_norm, matmul, qr, polar
+from ..indexing import where
+from ..random import randn
+from ..devices import Device
+from ..manipulations import vstack, hstack, concatenate, diag, balance
+from .. import statistics
+from mpi4py import MPI
+from ..sanitation import sanitize_in_nd_realfloating
+
+
+__all__ = ["eigh"]
+
+
+def _subspaceiteration(
+    A: DNDarray,
+    C: DNDarray,
+    silent: bool = True,
+    safetyparam: int = 3,
+    maxit: int = None,
+    tol: float = None,
+    depth: int = 0,
+) -> Tuple[DNDarray, int]:
+    """
+    Auxiliary function that implements the subspace iteration as required for symmetric eigenvalue decomposition
+    via polar decomposition; cf. Ref. 2 below. The algorithm for subspace iteration itself is taken from Ref. 1,
+    Algorithm 3 in Sect. 5.1.
+
+    Given a symmetric matrix ``A`` and a matrix ``C`` that is the orthogonal projection onto an invariant
+    subspace of A, this function computes and returns an orthogonal matrix ``Q`` such that Q = [V_1 V_2] with
+    C = V_1 V_1.T. Moreover, the dimension of the invariant subspace, i.e., the number of columns of V_1, is
+    returned as well.
+
+    References
+    ----------
+    1. Nakatsukasa, Y., & Higham, N. J. (2013). Stable and efficient spectral divide and conquer algorithms for
+       Hermitian eigenproblems. SIAM Journal on Scientific Computing, 35(3).
+    2. Nakatsukasa, Y., & Freund, R. W. (2016). Computing fundamental matrix decompositions accurately via the
+       matrix sign function in two iterations: The power of Zolotarev's functions. SIAM Review, 58(3).
+    """
+    # set parameters for convergence
+    if A.dtype == types.float64:
+        maxit = 3 if maxit is None else maxit
+        tol = 1e-8 if tol is None else tol
+    elif A.dtype == types.float32:
+        maxit = 6 if maxit is None else maxit
+        tol = 1e-4 if tol is None else tol
+    else:
+        raise TypeError(
+            f"Input DNDarray must be of data type float32 or float64, but is of type {A.dtype}."
+        )
+
+    Anorm = matrix_norm(A, ord="fro")
+
+    # this initialization is proposed in Ref. 1, Sect. 5.1
+    k = int(round(matrix_norm(C, ord="fro").item() ** 2))
+    columnnorms = vector_norm(C, axis=0)
+    idx = where(
+        columnnorms
+        >= factories.ones(
+            columnnorms.shape,
+            comm=columnnorms.comm,
+            split=columnnorms.split,
+            device=columnnorms.device,
+        )
+        * statistics.percentile(columnnorms, 100.0 * (1 - (k + safetyparam) / columnnorms.shape[0]))
+    )
+    X = C[:, idx].balance()
+
+    # actual subspace iteration
+    it = 1
+    while it < maxit + 1:
+        # enrich X by additional random columns to get a full orthonormal basis by QR
+        X = hstack(
+            [
+                X,
+                randn(
+                    X.shape[0],
+                    X.shape[0] - X.shape[1],
+                    dtype=X.dtype,
+                    device=X.device,
+                    comm=X.comm,
+                    split=X.split,
+                ),
+            ]
+        )
+        Q, _ = qr(X)
+        Q_k = Q[:, :k].balance()
+        Q_k_orth = Q[:, k:].balance()
+        E = (Q_k_orth.T @ A) @ Q_k
+        Enorm = matrix_norm(E, ord="fro")
+        if Enorm / Anorm < tol:
+            # exit if success
+            if A.comm.rank == 0 and not silent:
+                print("\t" * depth + f" Number of subspace iterations: {it}")
+            return Q, k
+        # else go on with iteration
+        X = C @ Q_k
+        it += 1
+    # warning if the iteration did not converge within the maximum number of iterations
+    if A.comm.rank == 0 and not silent:
+        print(
+            "\t" * depth
+            + f" Subspace iteration did not converge in {maxit} iterations. \n"
+            + "\t" * depth
+            + f" It holds ||E||_F/||A||_F = {Enorm / Anorm}, which might impair the accuracy of the result."  # noqa E226
+        )
+    return Q, k
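+
+
+# Rough picture of the divide-and-conquer step implemented in _eigh below (illustrative only):
+#   U = polar(A - sigma * I)              -> matrix sign information for the shift sigma
+#   C = (U + I) / 2                       -> orthogonal projector onto an invariant subspace
+#   V, k = _subspaceiteration(A, C, ...)  -> orthogonal V = [V1 V2] with V1 spanning that subspace
+#   V.T @ A @ V = blockdiag(A1, A2)       -> recurse on A1 (k x k) and A2 ((n-k) x (n-k))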
+def _eigh(
+    A: DNDarray,
+    r: int = None,
+    silent: bool = True,
+    r_max: int = 8,
+    depth: int = 0,
+    orig_lsize: int = 0,
+) -> Tuple[DNDarray, DNDarray]:
+    """
+    Auxiliary function for eigh containing the main algorithmic content.
+
+    Inputs are as for the public `eigh`-function, except for:
+    `depth`: an internal variable that is used to track the recursion depth,
+    `orig_lsize`: an internal variable that is used to propagate the local sizes of the original input matrix
+    through the recursions in order to determine when the direct solution of the reduced problems is possible,
+    `r`: a hyperparameter for the computation of the polar decomposition via :func:`heat.linalg.polar` which is
+    applied multiple times in this function. See the documentation of :func:`heat.linalg.polar` for more details.
+    In the actual implementation, this parameter is set to `None` for simplicity.
+    """
+    n = A.shape[0]
+    global_comm = A.comm
+    nprocs = global_comm.Get_size()
+    rank = global_comm.rank
+
+    # direct solution in torch if the problem is small enough
+    if n <= orig_lsize or not A.is_distributed():
+        orig_split = A.split
+        A.resplit_(None)
+        Lambda_loc, Q_loc = torch.linalg.eigh(A.larray)
+        Lambda = factories.array(torch.flip(Lambda_loc, (0,)), split=0, comm=A.comm)
+        V = factories.array(torch.flip(Q_loc, (1,)), split=orig_split, comm=A.comm)
+        A.resplit_(orig_split)
+        return Lambda, V
+
+    if orig_lsize == 0:
+        orig_lsize = min(A.lshape_map[:, A.split])
+
+    # now we handle the main case: Zolo-PD is used to reduce the problem to two independent problems
+    sigma = statistics.median(diag(A))
+
+    U = polar.polar(
+        A
+        - sigma * factories.eye((n, n), dtype=A.dtype, device=A.device, comm=A.comm, split=A.split),
+        r,
+        False,
+    )
+
+    V, k = _subspaceiteration(
+        A,
+        0.5
+        * (U + factories.eye((n, n), dtype=A.dtype, device=A.device, comm=A.comm, split=A.split)),
+        silent=silent,
+        depth=depth,
+    )
+    A = V.T @ A @ V
+
+    if A.comm.rank == 0 and not silent:
+        print(
+            "\t" * depth
+            + f"At depth {depth}: Zolo-PD(r={'auto' if r is None else r}) on {nprocs} processes reduced symmetric eigenvalue problem of size {n} to"
+        )
+        print(
+            "\t" * depth
+            + f" two independent problems of size {k} and {n - k} respectively."
+ ) + + # from the "global" A, two independent "local" A's are created + # the number of processes per local array is roughly proportional to their size with the constraint that + # each "local" A needs to get at least one process + nprocs1 = max(1, min(nprocs - 1, round(k / n * nprocs))) + nprocs2 = nprocs - nprocs1 + new_lshapes = torch.tensor( + [k // nprocs1 + (i < k % nprocs1) for i in range(nprocs1)] + + [(n - k) // nprocs2 + (i < (n - k) % nprocs2) for i in range(nprocs2)] + ) + new_lshape_map = A.lshape_map + new_lshape_map[:, A.split] = new_lshapes + A.redistribute_(target_map=new_lshape_map) + local_comm = A.comm.Split(color=rank < nprocs1, key=rank) + if A.split == 1: + A_local = factories.array( + A.larray[:k, :] if rank < nprocs1 else A.larray[k:, :], + comm=local_comm, + is_split=A.split, + ) + else: + A_local = factories.array( + A.larray[:, :k] if rank < nprocs1 else A.larray[:, k:], + comm=local_comm, + is_split=A.split, + ) + + Lambda_local, V_local = _eigh(A_local, r, silent, r_max, depth + 1, orig_lsize) + + Lambda = factories.array(Lambda_local.larray, is_split=0, comm=A.comm) + V_local_larray = V_local.larray + if A.split == 0: + if rank < nprocs1: + V_local_larray = torch.hstack( + [ + V_local_larray, + torch.zeros(V_local_larray.shape[0], n - k, device=V_local.device.torch_device), + ] + ) + else: + V_local_larray = torch.hstack( + [ + torch.zeros(V_local_larray.shape[0], k, device=V_local.device.torch_device), + V_local_larray, + ] + ) + else: + if rank < nprocs1: + V_local_larray = torch.vstack( + [ + V_local_larray, + torch.zeros(n - k, V_local_larray.shape[1], device=V_local.device.torch_device), + ] + ) + else: + V_local_larray = torch.vstack( + [ + torch.zeros(k, V_local_larray.shape[1], device=V_local.device.torch_device), + V_local_larray, + ] + ) + V_new = factories.array(V_local_larray, is_split=A.split, comm=A.comm, device=A.device) + V.balance_() + V_new.balance_() + V = V @ V_new + + if A.comm.rank == 0 and not silent: + print( + "\t" * depth + + f"At depth {depth}: solutions of two independent problems of size {k} and {n - k} have been merged successfully." + ) + + return Lambda, V + + +def eigh( + A: DNDarray, + r_max_zolopd: int = 8, + silent: bool = True, +) -> Tuple[DNDarray, DNDarray]: + """ + Computes the symmetric eigenvalue decomposition of a symmetric n x n - matrix A, provided as a DNDarray. + + The function returns DNDarrays Lambda (shape (n,) with split = 0) and V (shape (n,n)) such that + A = V @ diag(Lambda) @ V^T, where Lambda contains the eigenvalues of A and V is an orthonormal matrix + containing the corresponding eigenvectors as columns. + + Parameters + ---------- + A : DNDarray + The input matrix. Must be symmetric. + r_max_zolopd : int, optional + This is a hyperparameter for the computation of the polar decomposition via :func:`heat.linalg.polar` which is + applied multiple times in this function. See the documentation of :func:`heat.linalg.polar` for more details on its + meaning and the respective default value. + silent : bool, optional + If True (default), suppresses output messages; otherwise, some information on the recursion is printed to the console. + + Notes + ----- + Unlike the :func:`torch.linalg.eigh` function, the eigenvalues are returned in descending order. + Note that no check of symmetry is performed on the input matrix A; thus, applying this function to a non-symmetric matrix may + result in unpredictable behaviour without a specific error message pointing to this issue. 
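+
+    For illustration, a small round-trip check might look as follows (an illustrative sketch;
+    shapes and the tolerance are arbitrary):
+
+    >>> B = ht.random.randn(10, 10, split=0)
+    >>> A = B + B.T
+    >>> Lambda, V = ht.linalg.eigh(A)
+    >>> ht.allclose(V @ ht.diag(Lambda) @ V.T, A, atol=1e-4)
+    True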
+
+    The algorithm used for the computation of the symmetric eigenvalue decomposition is based on the Zolotarev polar decomposition;
+    see Algorithm 5.2 in:
+
+    Nakatsukasa, Y., & Freund, R. W. (2016). Computing fundamental matrix decompositions accurately via the
+    matrix sign function in two iterations: The power of Zolotarev's functions. SIAM Review, 58(3).
+
+    See Also
+    --------
+    :func:`heat.linalg.polar`
+    """
+    sanitize_in_nd_realfloating(A, "A", [2])
+    if A.shape[0] != A.shape[1]:
+        raise ValueError(
+            f"Input matrix must be symmetric and, consequently, square, but input shape was {A.shape[0]} x {A.shape[1]}."
+        )
+    if not isinstance(r_max_zolopd, int) or r_max_zolopd < 1 or r_max_zolopd > 8:
+        raise ValueError(
+            f"If provided, parameter r_max_zolopd must be an integer between 1 and 8, but was {r_max_zolopd} of type {type(r_max_zolopd)}."
+        )
+    return _eigh(A, None, silent, r_max_zolopd, 0, 0)
diff --git a/heat/core/linalg/polar.py b/heat/core/linalg/polar.py
new file mode 100644
index 0000000000..ef3e58268e
--- /dev/null
+++ b/heat/core/linalg/polar.py
@@ -0,0 +1,370 @@
+"""
+Implements polar decomposition (PD)
+"""
+
+import numpy as np
+import collections
+import torch
+from typing import Type, Callable, Dict, Any, TypeVar, Union, Tuple
+
+from ..communication import MPICommunication, MPI
+from ..dndarray import DNDarray
+from .. import factories
+from .. import types
+from . import matrix_norm, vector_norm, matmul, qr, solve_triangular
+from .basics import _estimate_largest_singularvalue, condest
+from ..indexing import where
+from ..random import randn
+from ..devices import Device
+from ..manipulations import vstack, hstack, concatenate, diag, balance
+from ..exponential import sqrt
+from .. import statistics
+
+from scipy.special import ellipj
+from scipy.special import ellipkm1
+
+__all__ = ["polar"]
+
+
+def _zolopd_n_iterations(r: int, kappa: float) -> int:
+    """
+    Returns the number of iterations required in the Zolotarev-PD algorithm.
+    See Table 3.1 in: Nakatsukasa, Y., & Freund, R. W. (2016). Computing Fundamental Matrix Decompositions Accurately via the Matrix Sign Function in Two Iterations: The Power of Zolotarev's Functions. SIAM Review, 58(3), DOI: https://doi.org/10.1137/140990334
+
+    Inputs are `r` and `kappa` (named as in the paper), and the output is the number of iterations.
+    """
+    if kappa <= 1e2:
+        its = [4, 3, 2, 2, 2, 2, 2, 2]
+    elif kappa <= 1e3:
+        its = [3, 3, 2, 2, 2, 2, 2, 2]
+    elif kappa <= 1e5:
+        its = [5, 3, 3, 3, 2, 2, 2, 2]
+    elif kappa <= 1e7:
+        its = [5, 4, 3, 3, 3, 2, 2, 2]
+    else:
+        its = [6, 4, 3, 3, 3, 3, 3, 2]
+    return its[r - 1]
+
+
+def _compute_zolotarev_coefficients(
+    r: int, ell: float, device: str, dtype: types.datatype = types.float64
+) -> Tuple[DNDarray, DNDarray, DNDarray]:
+    """
+    Computes c=(c_i)_i defined in equation (3.4), as well as a=(a_j)_j and Mhat defined in formulas (4.2)/(4.3) of the paper Nakatsukasa, Y., & Freund, R. W. (2016). Computing Fundamental Matrix Decompositions Accurately via the Matrix Sign Function in Two Iterations: The Power of Zolotarev's Functions. SIAM Review, 58(3), DOI: https://doi.org/10.1137/140990334.
+    Evaluations of the respective complete elliptic integral of the first kind and the Jacobi elliptic functions are imported from SciPy.
+
+    Inputs are `r` and `ell` (named as in the paper), as well as the Heat data type `dtype` of the output (required for reasons of consistency).
+    Output is a tuple containing the vectors `a` and `c` as DNDarrays and `Mhat`.
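+
+    Note that `scipy.special.ellipkm1(p)` evaluates the complete elliptic integral of the first
+    kind at parameter m = 1 - p; it is used below instead of `ellipk` because it remains accurate
+    for the very small values of `ell` that arise from large condition numbers.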
+ """ + uu = np.arange(1, 2 * r + 1) * ellipkm1(ell**2) / (2 * r + 1) + ellipfcts = np.asarray(ellipj(uu, 1 - ell**2)[:2]) + cc = ell**2 * ellipfcts[0, :] ** 2 / ellipfcts[1, :] ** 2 + aa = np.zeros(r) + Mhat = 1 + for j in range(1, r + 1): + p1 = 1 + p2 = 1 + for k in range(1, r + 1): + p1 *= cc[2 * j - 2] - cc[2 * k - 1] + if k != j: + p2 *= cc[2 * j - 2] - cc[2 * k - 2] + aa[j - 1] = -p1 / p2 + Mhat *= (1 + cc[2 * j - 2]) / (1 + cc[2 * j - 1]) + return ( + factories.array(cc, dtype=dtype, split=None, device=device), + factories.array(aa, dtype=dtype, split=None, device=device), + factories.array(Mhat, dtype=dtype, split=None, device=device), + ) + + +def _in_place_qr_with_q_only(A: DNDarray, procs_to_merge: int = 2) -> None: + r""" + Input A and procs_to_merge are as in heat.linalg.qr; difference it that this routine modified A in place and replaces it with Q. + """ + if not A.is_distributed() or A.split < A.ndim - 2: + # handle the case of a single process or split=None: just PyTorch QR + # difference to heat.linalg.qr: we only return Q and put it directly in place of A + A.larray, R = torch.linalg.qr(A.larray, mode="reduced") + del R + + elif A.split == A.ndim - 1: + # handle the case that A is split along the columns + # unlike in heat.linalg.qr, we know by assumption of Zolo-PD that A has at least as many rows as columns + + nprocs = A.comm.size + with torch.no_grad(): + for i in range(nprocs): + # this loop goes through all the column-blocks (i.e. local arrays) of the matrix + # this corresponds to the loop over all columns in classical Gram-Schmidt + A_lshapes = A.lshape_map + if i < nprocs - 1: + if A.comm.rank > i: + Q_buf = torch.zeros( + tuple(A_lshapes[i, :]), + dtype=A.larray.dtype, + device=A.device.torch_device, + ) + color = 0 if A.comm.rank < i else 1 + sub_comm = A.comm.Split(color, A.comm.rank) + + if A.comm.rank == i: + # orthogonalize the current block of columns by utilizing PyTorch QR + Q, R = torch.linalg.qr(A.larray, mode="reduced") + del R + A.larray[...] = Q + del Q + if i < nprocs - 1: + Q_buf = A.larray + + if i < nprocs - 1 and A.comm.rank >= i: + sub_comm.Bcast(Q_buf, root=0) + + if A.comm.rank > i: + # subtract the contribution of the current block of columns from the remaining columns + R_loc = torch.transpose(Q_buf, -2, -1) @ A.larray + A.larray -= Q_buf @ R_loc + del R_loc, Q_buf + + else: + A, r = qr(A) + del r + + +def polar( + A: DNDarray, + r: int = None, + calcH: bool = True, + condition_estimate: float = 1.0e16, + silent: bool = True, + r_max: int = 8, +) -> Tuple[DNDarray, DNDarray]: + """ + Computes the so-called polar decomposition of the input 2D DNDarray ``A``, i.e., it returns the orthogonal matrix ``U`` and the symmetric, positive definite + matrix ``H`` such that ``A = U @ H``. + + Input + ----- + A : ht.DNDarray, + The input matrix for which the polar decomposition is computed; + must be two-dimensional, of data type float32 or float64, and must have at least as many rows as columns. + r : int, optional, default: None + The parameter r used in the Zolotarev-PD algorithm; if provided, must be an integer between 1 and 8 that divides the number of MPI processes. + Higher values of r lead to faster convergence, but memory consumption is proportional to r. + If not provided, the largest 1 <= r <= r_max that divides the number of MPI processes is chosen. + calcH : bool, optional, default: True + If True, the function returns the symmetric, positive definite matrix H. If False, only the orthogonal matrix U is returned. 
+    condition_estimate : float, optional, default: 1.e16
+        This argument allows you to provide an estimate for the condition number of the input matrix ``A``, if such an estimate is already known.
+        If a positive number greater than 1., this value is used as an estimate for the condition number of A.
+        If smaller than or equal to 1., the condition number is estimated internally.
+        The default value of 1.e16 is the worst case scenario considered in [1].
+    silent : bool, optional, default: True
+        If True, the function does not print any output. If False, some information is printed during the computation.
+    r_max : int, optional, default: 8
+        See the description of r for the meaning; r_max is only taken into account if r is not provided.
+
+
+    Notes
+    -----
+    The implementation follows Algorithm 5.1 in Reference [1]; however, instead of switching from QR to Cholesky decomposition depending on the condition number,
+    we stick to QR decomposition in all iterations.
+
+    References
+    ----------
+    [1] Nakatsukasa, Y., & Freund, R. W. (2016). Computing Fundamental Matrix Decompositions Accurately via the Matrix Sign Function in Two Iterations: The Power of Zolotarev's Functions. SIAM Review, 58(3), DOI: https://doi.org/10.1137/140990334.
+    """
+    # check whether input is DNDarray of correct shape
+    if not isinstance(A, DNDarray):
+        raise TypeError(f"Input ``A`` needs to be a DNDarray but is {type(A)}.")
+    if not A.ndim == 2:
+        raise ValueError(f"Input ``A`` needs to be a 2D DNDarray, but its dimension is {A.ndim}.")
+    if A.shape[0] < A.shape[1]:
+        raise ValueError(
+            f"Input ``A`` must have at least as many rows as columns, but has shape {A.shape}."
+        )
+
+    # check if A is a real floating point matrix and choose tolerances tol accordingly
+    if A.dtype == types.float32:
+        tol = 1.19e-7
+    elif A.dtype == types.float64:
+        tol = 2.22e-16
+    else:
+        raise TypeError(
+            f"Input ``A`` must be of data type float32 or float64 but has data type {A.dtype}"
+        )
+
+    # check if input for r is reasonable
+    if r is not None:
+        if not isinstance(r, int) or r < 1 or r > 8:
+            raise ValueError(
+                f"If specified, input ``r`` must be an integer between 1 and 8, but is {r} of data type {type(r)}."
+            )
+        if A.is_distributed() and (A.comm.size % r != 0 or A.comm.size == r):
+            raise ValueError(
+                f"If specified, input ``r`` must be a non-trivial divisor of the number of MPI processes, but r={r} and A.comm.size={A.comm.size}."
+            )
+    else:
+        if not isinstance(r_max, int) or r_max < 1 or r_max > 8:
+            raise ValueError(
+                f"If specified, input ``r_max`` must be an integer between 1 and 8, but is {r_max} of data type {type(r_max)}."
+            )
+        for i in range(r_max, 0, -1):
+            if A.comm.size % i == 0 and A.comm.size // i > 1:
+                r = i
+                break
+        if not silent:
+            if A.comm.rank == 0:
+                print(f"Automatically chosen r={r} (r_max = {r_max}, {A.comm.size} processes).")
+
+    # check if input for condition_estimate is reasonable
+    if not isinstance(condition_estimate, float):
+        raise TypeError(
+            f"If specified, input ``condition_estimate`` must be a float but is {type(condition_estimate)}."
+ ) + + # early out for the non-distributed case + if not A.is_distributed(): + U, s, vh = torch.linalg.svd(A.larray, full_matrices=False) + U @= vh + H = vh.T @ torch.diag(s) @ vh + if calcH: + return factories.array(U, is_split=None, comm=A.comm), factories.array( + H, is_split=None, comm=A.comm + ) + else: + return factories.array(U, is_split=None, comm=A.comm) + + alpha = _estimate_largest_singularvalue(A).item() + + if condition_estimate <= 1.0: + kappa = condest(A).item() + else: + kappa = condition_estimate + + if A.comm.rank == 0 and not silent: + print( + f"Condition number estimate: {kappa:2.2e} / Estimate for largest singular value: {alpha:2.2e}." + ) + + # each of these communicators has size r, along these communicators we parallelize the r many QR decompositions that are performed in parallel + horizontal_comm = A.comm.Split(A.comm.rank // r, A.comm.rank) + + # each of these communicators has size MPI_WORLD.size / r and will carray a full copy of X for QR decomposition + vertical_comm = A.comm.Split(A.comm.rank % r, A.comm.rank) + + # in each horizontal communicator, collect the local array of X from all processes + local_shapes = horizontal_comm.allgather(A.lshape[A.split]) + new_local_shape = ( + (sum(local_shapes), A.shape[1]) if A.split == 0 else (A.shape[0], sum(local_shapes)) + ) + counts = tuple(local_shapes) + displacements = tuple(np.cumsum([0] + list(local_shapes))[:-1]) + X_collected_local = torch.zeros( + new_local_shape, dtype=A.dtype.torch_type(), device=A.device.torch_device + ) + horizontal_comm.Allgatherv( + A.larray, (X_collected_local, counts, displacements), recv_axis=A.split + ) + + X = factories.array(X_collected_local, is_split=A.split, comm=vertical_comm) + X.balance_() + X /= alpha + + # iteration counter and maximum number of iterations + it = 0 + itmax = _zolopd_n_iterations(r, kappa) + + # parameters and coefficients, see Ref. [1] for their meaning + ell = 1.0 / kappa + c, a, Mhat = _compute_zolotarev_coefficients(r, ell, A.device, dtype=A.dtype) + + itmax = _zolopd_n_iterations(r, kappa) + while it < itmax: + it += 1 + if not silent: + if A.comm.rank == 0: + print(f"Starting Zolotarev-PD iteration no. 
{it}...") + # remember current X for later convergence check + X_old = X.copy() + cId = factories.eye(X.shape[1], dtype=X.dtype, comm=X.comm, split=X.split, device=X.device) + cId *= c[2 * horizontal_comm.rank].item() ** 0.5 + X = concatenate([X, cId], axis=0) + del cId + if X.split == 0: + Q, R = qr(X) + del R + Q1 = Q[: A.shape[0], :].balance() + Q2 = Q[A.shape[0] :, :].transpose().balance() + Q1Q2 = matmul(Q1, Q2) + del Q1, Q2 + X = X[: A.shape[0], :].balance() + X /= r + else: + _in_place_qr_with_q_only(X) + Q1 = X[: A.shape[0], :].balance() + Q2 = X[A.shape[0] :, :].transpose().balance() + del X + Q1Q2 = matmul(Q1, Q2) + del Q1, Q2 + X = X_old / r + X += a[horizontal_comm.rank].item() / c[2 * horizontal_comm.rank].item() ** 0.5 * Q1Q2 + del Q1Q2 + X *= Mhat.item() + # finally, sum over the horizontal communicators + horizontal_comm.Allreduce(MPI.IN_PLACE, X.larray, op=MPI.SUM) + + # check for convergence and break if tolerance is reached + if it > 1 and matrix_norm(X - X_old, ord="fro") / matrix_norm(X, ord="fro") <= tol ** ( + 1 / (2 * r + 1) + ): + if not silent: + if A.comm.rank == 0: + print(f"Zolotarev-PD iteration converged after {it} iterations.") + break + elif it < itmax: + # if another iteration is necessary, update coefficients and parameters for next iteration + ellold = ell + ell = 1 + for j in range(r): + ell *= (ellold**2 + c[2 * j + 1].item()) / (ellold**2 + c[2 * j].item()) + ell *= Mhat.item() * ellold + if ell >= 1.0: + ell = 1.0 - tol + c, a, Mhat = _compute_zolotarev_coefficients(r, ell, A.device, dtype=A.dtype) + else: + if not silent: + if A.comm.rank == 0: + print( + f"Zolotarev-PD iteration did not reach the convergence criterion after {itmax} iterations, which is most likely due to limited numerical accuracy and/or poor estimation of the condition number. The result may still be useful, but should be handled with care!" + ) + + # as every process has much more data than required, we need to split the result into the parts that are actually + counts = [ + X.lshape[X.split] // horizontal_comm.size + (r < X.lshape[X.split] % horizontal_comm.size) + for r in range(horizontal_comm.size) + ] + displacements = [sum(counts[:r]) for r in range(horizontal_comm.size)] + + if A.split == 1: + U_local = X.larray[ + :, + displacements[horizontal_comm.rank] : displacements[horizontal_comm.rank] + + counts[horizontal_comm.rank], + ] + else: + U_local = X.larray[ + displacements[horizontal_comm.rank] : displacements[horizontal_comm.rank] + + counts[horizontal_comm.rank], + :, + ] + U = factories.array(U_local, is_split=A.split, comm=A.comm, device=A.device) + del X + U.balance_() + + # postprocessing: compute H if requested + if calcH: + H = matmul(U.T, A) + H = 0.5 * (H + H.T.resplit(H.split)) + return U, H.resplit(A.split) + else: + return U diff --git a/heat/core/linalg/qr.py b/heat/core/linalg/qr.py index f3cc5afe5b..4ca0c3fc01 100644 --- a/heat/core/linalg/qr.py +++ b/heat/core/linalg/qr.py @@ -1,5 +1,5 @@ """ -QR decomposition of (distributed) 2-D ``DNDarray``s. +QR decomposition of ``DNDarray``s. """ import collections @@ -7,6 +7,7 @@ from typing import Tuple from ..dndarray import DNDarray +from ..manipulations import concatenate from .. import factories from .. import communication from ..types import float32, float64 @@ -24,16 +25,18 @@ def qr( Factor the matrix ``A`` as *QR*, where ``Q`` is orthonormal and ``R`` is upper-triangular. 
If ``mode = "reduced``, function returns ``QR(Q=Q, R=R)``, if ``mode = "r"`` function returns ``QR(Q=None, R=R)`` + This function also works for batches of matrices; in this case, the last two dimensions of the input array are considered as the matrix dimensions. + The output arrays have the same leading batch dimensions as the input array. + Parameters ---------- - A : DNDarray of shape (M, N) - Array which will be decomposed. So far only 2D arrays with datatype float32 or float64 are supported - For split=0, the matrix must be tall skinny, i.e. the local chunks of data must have at least as many rows as columns. + A : DNDarray of shape (M, N), of shape (...,M,N) in the batched case + Array which will be decomposed. So far only arrays with datatype float32 or float64 are supported mode : str, optional - default "reduced" returns Q and R with dimensions (M, min(M,N)) and (min(M,N), N), respectively. + default "reduced" returns Q and R with dimensions (M, min(M,N)) and (min(M,N), N). Potential batch dimensions are not modified. "r" returns only R, with dimensions (min(M,N), N). procs_to_merge : int, optional - This parameter is only relevant for split=0 and determines the number of processes to be merged at one step during the so-called TS-QR algorithm. + This parameter is only relevant for split=0 (-2, in the batched case) and determines the number of processes to be merged at one step during the so-called TS-QR algorithm. The default is 2. Higher choices might be faster, but will probably result in higher memory consumption. 0 corresponds to merging all processes at once. We only recommend to modify this parameter if you are familiar with the TS-QR algorithm (see the references below). @@ -43,16 +46,20 @@ def qr( - If ``A`` is distributed along the columns (A.split = 1), so will be ``Q`` and ``R``. - - If ``A`` is distributed along the rows (A.split = 0), ``Q`` too will have `split=0`, but ``R`` won't be distributed, i.e. `R. split = None` and a full copy of ``R`` will be stored on each process. + - If ``A`` is distributed along the rows (A.split = 0), ``Q`` too will have `split=0`. ``R`` won't be distributed, i.e. `R. split = None`, if ``A`` is tall-skinny, i.e., if + the largest local chunk of data of ``A`` has at least as many rows as columns. Otherwise, ``R`` will be distributed along the rows as well, i.e., `R.split = 0`. Note that the argument `calc_q` allowed in earlier Heat versions is no longer supported; `calc_q = False` is equivalent to `mode = "r"`. Unlike ``numpy.linalg.qr()``, `ht.linalg.qr` only supports ``mode="reduced"`` or ``mode="r"`` for the moment, since "complete" may result in heavy memory usage. Heats QR function is built on top of PyTorchs QR function, ``torch.linalg.qr()``, using LAPACK (CPU) and MAGMA (CUDA) on - the backend. For split=0, tall-skinny QR (TS-QR) is implemented, while for split=1 a block-wise version of stabilized Gram-Schmidt orthogonalization is used. + the backend. Both cases split=0 and split=1 build on a column-block-wise version of stabilized Gram-Schmidt orthogonalization. + For split=1 (-1, in the batched case), this is directly applied to the local arrays of the input array. + For split=0, a tall-skinny QR (TS-QR) is implemented for the case of tall-skinny matrices (i.e., the largest local chunk of data has at least as many rows as columns), + and extended to non tall-skinny matrices by applying a block-wise version of stabilized Gram-Schmidt orthogonalization. 
References - ----------- + ---------- Basic information about QR factorization/decomposition can be found at, e.g.: - https://en.wikipedia.org/wiki/QR_factorization, @@ -87,65 +94,58 @@ def qr( if procs_to_merge == 0: procs_to_merge = A.comm.size - if A.ndim != 2: - raise ValueError( - f"Array 'A' must be 2 dimensional, buts has {A.ndim} dimensions. \n Please open an issue on GitHub if you require QR for batches of matrices similar to PyTorch." - ) if A.dtype not in [float32, float64]: raise TypeError(f"Array 'A' must have a datatype of float32 or float64, but has {A.dtype}") QR = collections.namedtuple("QR", "Q, R") - if not A.is_distributed(): + if A.ndim == 3: + single_proc_qr = torch.vmap(torch.linalg.qr, in_dims=0, out_dims=0) + else: + single_proc_qr = torch.linalg.qr + + if not A.is_distributed() or A.split < A.ndim - 2: # handle the case of a single process or split=None: just PyTorch QR - Q, R = torch.linalg.qr(A.larray, mode=mode) - R = DNDarray( - R, - gshape=R.shape, - dtype=A.dtype, - split=A.split, - device=A.device, - comm=A.comm, - balanced=True, - ) + Q, R = single_proc_qr(A.larray, mode=mode) + R = factories.array(R, is_split=A.split) if mode == "reduced": - Q = DNDarray( - Q, - gshape=Q.shape, - dtype=A.dtype, - split=A.split, - device=A.device, - comm=A.comm, - balanced=True, - ) + Q = factories.array(Q, is_split=A.split, device=A.device) else: Q = None return QR(Q, R) - if A.split == 1: + if A.split == A.ndim - 1: # handle the case that A is split along the columns # here, we apply a block-wise version of (stabilized) Gram-Schmidt orthogonalization # instead of orthogonalizing each column of A individually, we orthogonalize blocks of columns (i.e. the local arrays) at once - lshapes = A.lshape_map[:, 1] + lshapes = A.lshape_map[:, -1] lshapes_cum = torch.cumsum(lshapes, 0) nprocs = A.comm.size - if A.shape[0] >= A.shape[1]: + if A.shape[-2] >= A.shape[-1]: last_row_reached = nprocs - k = A.shape[1] + k = A.shape[-1] else: - last_row_reached = min(torch.argwhere(lshapes_cum >= A.shape[0]))[0] - k = A.shape[0] + last_row_reached = min(torch.argwhere(lshapes_cum >= A.shape[-2]))[0] + k = A.shape[-2] if mode == "reduced": - Q = factories.zeros(A.shape, dtype=A.dtype, split=1, device=A.device, comm=A.comm) + Q = factories.zeros( + A.shape, dtype=A.dtype, split=A.ndim - 1, device=A.device, comm=A.comm + ) - R = factories.zeros((k, A.shape[1]), dtype=A.dtype, split=1, device=A.device, comm=A.comm) + R = factories.zeros( + (*A.shape[:-2], k, A.shape[-1]), + dtype=A.dtype, + split=A.ndim - 1, + device=A.device, + comm=A.comm, + ) R_shapes = torch.hstack( [ torch.zeros(1, dtype=torch.int32, device=A.device.torch_device), - torch.cumsum(R.lshape_map[:, 1], 0), + torch.cumsum(R.lshape_map[:, -1], 0), ] ) @@ -154,157 +154,209 @@ def qr( for i in range(last_row_reached + 1): # this loop goes through all the column-blocks (i.e. 
local arrays) of the matrix # this corresponds to the loop over all columns in classical Gram-Schmidt + if i < nprocs - 1: - k_loc_i = min(A.shape[0], A.lshape_map[i, 1]) + k_loc_i = min(A.shape[-2], A.lshape_map[i, -1]) Q_buf = torch.zeros( - (A.shape[0], k_loc_i), dtype=A.larray.dtype, device=A.device.torch_device + (*A.shape[:-1], k_loc_i), + dtype=A.larray.dtype, + device=A.device.torch_device, ) if A.comm.rank == i: # orthogonalize the current block of columns by utilizing PyTorch QR - Q_curr, R_loc = torch.linalg.qr(A_columns, mode="reduced") + Q_curr, R_loc = single_proc_qr(A_columns, mode="reduced") if i < nprocs - 1: - Q_buf = Q_curr + Q_buf = Q_curr.contiguous() if mode == "reduced": Q.larray = Q_curr - r_size = R.larray[R_shapes[i] : R_shapes[i + 1], :].shape[0] - R.larray[R_shapes[i] : R_shapes[i + 1], :] = R_loc[:r_size, :] + r_size = R.larray[..., R_shapes[i] : R_shapes[i + 1], :].shape[-2] + R.larray[..., R_shapes[i] : R_shapes[i + 1], :] = R_loc[..., :r_size, :] if i < nprocs - 1: # broadcast the orthogonalized block of columns to all other processes - req = A.comm.Ibcast(Q_buf, root=i) - req.Wait() + A.comm.Bcast(Q_buf, root=i) if A.comm.rank > i: # subtract the contribution of the current block of columns from the remaining columns - R_loc = Q_buf.T @ A_columns + R_loc = torch.transpose(Q_buf, -2, -1) @ A_columns A_columns -= Q_buf @ R_loc - r_size = R.larray[R_shapes[i] : R_shapes[i + 1], :].shape[0] - R.larray[R_shapes[i] : R_shapes[i + 1], :] = R_loc[:r_size, :] + r_size = R.larray[..., R_shapes[i] : R_shapes[i + 1], :].shape[-2] + R.larray[..., R_shapes[i] : R_shapes[i + 1], :] = R_loc[..., :r_size, :] if mode == "reduced": - Q = Q[:, :k].balance() + Q = Q[..., :, :k].balance() else: Q = None return QR(Q, R) - if A.split == 0: - # implementation of TS-QR for split = 0 - # check that data distribution is reasonable for TS-QR (i.e. tall-skinny matrix with also tall-skinny local chunks of data) - if A.lshape_map[:, 0].max().item() < A.shape[1]: - raise ValueError( - "A is split along the rows and the local chunks of data are rectangular with more rows than columns. \n Applying TS-QR in this situation is not reasonable w.r.t. runtime and memory consumption. \n We recomment to split A along the columns instead. \n In case this is not an option for you, please open an issue on GitHub." 
+ if A.split == A.ndim - 2: + # check that data distribution is reasonable for TS-QR + # we regard a matrix with split = 0 as suitable for TS-QR if its largest local chunk of data has at least as many rows as columns + biggest_number_of_local_rows = A.lshape_map[:, -2].max().item() + if biggest_number_of_local_rows < A.shape[-1]: + column_idx = torch.cumsum(A.lshape_map[:, -2], 0) + column_idx = column_idx[column_idx < A.shape[-1]] + column_idx = torch.cat( + ( + torch.tensor([0], device=column_idx.device), + column_idx, + torch.tensor([A.shape[-1]], device=column_idx.device), + ) ) + A_copy = A.copy() + R = A.copy() + # Block-wise Gram-Schmidt orthogonalization, applied to groups of columns + offset = 1 if A.shape[-1] <= A.shape[-2] else 2 + for k in range(len(column_idx) - offset): + # since we only consider a group of columns, TS QR is applied to a tall-skinny matrix + Qnew, Rnew = qr( + A_copy[..., :, column_idx[k] : column_idx[k + 1]], + mode="reduced", + procs_to_merge=procs_to_merge, + ) - current_procs = [i for i in range(A.comm.size)] - current_comm = A.comm - local_comm = current_comm.Split(current_comm.rank // procs_to_merge, A.comm.rank) - Q_loc, R_loc = torch.linalg.qr(A.larray, mode=mode) - R_loc = R_loc.contiguous() # required for all the communication ops lateron - if mode == "reduced": - leave_comm = current_comm.Split(current_comm.rank, A.comm.rank) - - level = 1 - while len(current_procs) > 1: - if A.comm.rank in current_procs and local_comm.size > 1: - # create array to collect the R_loc's from all processes of the process group of at most n_procs_to_merge processes - shapes_R_loc = local_comm.gather(R_loc.shape[0], root=0) - if local_comm.rank == 0: - gathered_R_loc = torch.zeros( - (sum(shapes_R_loc), R_loc.shape[1]), - device=R_loc.device, - dtype=R_loc.dtype, + # usual update of the remaining columns + if R.comm.rank == k: + R.larray[ + ..., + : (column_idx[k + 1] - column_idx[k]), + column_idx[k] : column_idx[k + 1], + ] = Rnew.larray + if R.comm.rank > k: + R.larray[..., :, column_idx[k] : column_idx[k + 1]] *= 0 + if k < len(column_idx) - 2: + coeffs = ( + torch.transpose(Qnew.larray, -2, -1) + @ A_copy.larray[..., :, column_idx[k + 1] :] ) - counts = list(shapes_R_loc) - displs = torch.cumsum( - torch.tensor([0] + shapes_R_loc, dtype=torch.int32), 0 - ).tolist()[:-1] - else: - gathered_R_loc = torch.empty(0, device=R_loc.device, dtype=R_loc.dtype) - counts = None - displs = None - # gather the R_loc's from all processes of the process group of at most n_procs_to_merge processes - local_comm.Gatherv(R_loc, (gathered_R_loc, counts, displs), root=0, axis=0) - # perform QR decomposition on the concatenated, gathered R_loc's to obtain new R_loc - if local_comm.rank == 0: - previous_shape = R_loc.shape - Q_buf, R_loc = torch.linalg.qr(gathered_R_loc, mode=mode) - R_loc = R_loc.contiguous() - else: - Q_buf = torch.empty(0, device=R_loc.device, dtype=R_loc.dtype) + R.comm.Allreduce(communication.MPI.IN_PLACE, coeffs) + if R.comm.rank == k: + R.larray[..., :, column_idx[k + 1] :] = coeffs + A_copy.larray[..., :, column_idx[k + 1] :] -= Qnew.larray @ coeffs if mode == "reduced": - if local_comm.rank == 0: - Q_buf = Q_buf.contiguous() - scattered_Q_buf = torch.empty( - R_loc.shape if local_comm.rank != 0 else previous_shape, - device=R_loc.device, - dtype=R_loc.dtype, - ) - # scatter the Q_buf to all processes of the process group - local_comm.Scatterv((Q_buf, counts, displs), scattered_Q_buf, root=0, axis=0) - del gathered_R_loc, Q_buf + Q = Qnew if k == 0 else 
concatenate((Q, Qnew), axis=-1) + if A.shape[-1] < A.shape[-2]: + R = R[..., : A.shape[-1], :].balance() + if mode == "reduced": + return QR(Q, R) + else: + return QR(None, R) - # for each process in the current processes, broadcast the scattered_Q_buf of this process - # to all leaves (i.e. all original processes that merge to the current process) - if mode == "reduced" and leave_comm.size > 1: + else: + # in this case the input is tall-skinny and we apply the TS-QR algorithm + # it follows the implementation of TS-QR for split = 0 + current_procs = [i for i in range(A.comm.size)] + current_comm = A.comm + local_comm = current_comm.Split(current_comm.rank // procs_to_merge, A.comm.rank) + Q_loc, R_loc = single_proc_qr(A.larray, mode=mode) + R_loc = R_loc.contiguous() + if mode == "reduced": + leave_comm = current_comm.Split(current_comm.rank, A.comm.rank) + + level = 1 + while len(current_procs) > 1: + if A.comm.rank in current_procs and local_comm.size > 1: + # create array to collect the R_loc's from all processes of the process group of at most n_procs_to_merge processes + shapes_R_loc = local_comm.gather(R_loc.shape[-2], root=0) + if local_comm.rank == 0: + gathered_R_loc = torch.zeros( + (*R_loc.shape[:-2], sum(shapes_R_loc), R_loc.shape[-1]), + device=R_loc.device, + dtype=R_loc.dtype, + ) + counts = list(shapes_R_loc) + displs = torch.cumsum( + torch.tensor([0] + shapes_R_loc, dtype=torch.int32), 0 + ).tolist()[:-1] + else: + gathered_R_loc = torch.empty(0, device=R_loc.device, dtype=R_loc.dtype) + counts = None + displs = None + # gather the R_loc's from all processes of the process group of at most n_procs_to_merge processes + local_comm.Gatherv(R_loc, (gathered_R_loc, counts, displs), root=0, axis=-2) + # perform QR decomposition on the concatenated, gathered R_loc's to obtain new R_loc + if local_comm.rank == 0: + previous_shape = R_loc.shape + Q_buf, R_loc = single_proc_qr(gathered_R_loc, mode=mode) + R_loc = R_loc.contiguous() + else: + Q_buf = torch.empty(0, device=R_loc.device, dtype=R_loc.dtype) + if mode == "reduced": + if local_comm.rank == 0: + Q_buf = Q_buf.contiguous() + scattered_Q_buf = torch.empty( + R_loc.shape if local_comm.rank != 0 else previous_shape, + device=R_loc.device, + dtype=R_loc.dtype, + ) + # scatter the Q_buf to all processes of the process group + local_comm.Scatterv( + (Q_buf, counts, displs), scattered_Q_buf, root=0, axis=-2 + ) + del gathered_R_loc, Q_buf + + # for each process in the current processes, broadcast the scattered_Q_buf of this process + # to all leaves (i.e. 
all original processes that merge to the current process) + if mode == "reduced" and leave_comm.size > 1: + try: + scattered_Q_buf_shape = scattered_Q_buf.shape + except UnboundLocalError: + scattered_Q_buf_shape = None + scattered_Q_buf_shape = leave_comm.bcast(scattered_Q_buf_shape, root=0) + if scattered_Q_buf_shape is not None: + # this is needed to ensure that only those Q_loc get updates that are actually part of the current process group + if leave_comm.rank != 0: + scattered_Q_buf = torch.empty( + scattered_Q_buf_shape, device=Q_loc.device, dtype=Q_loc.dtype + ) + leave_comm.Bcast(scattered_Q_buf, root=0) + # update the local Q_loc by multiplying it with the scattered_Q_buf try: - scattered_Q_buf_shape = scattered_Q_buf.shape + Q_loc = Q_loc @ scattered_Q_buf + del scattered_Q_buf except UnboundLocalError: - scattered_Q_buf_shape = None - scattered_Q_buf_shape = leave_comm.bcast(scattered_Q_buf_shape, root=0) - if scattered_Q_buf_shape is not None: - # this is needed to ensure that only those Q_loc get updates that are actually part of the current process group - if leave_comm.rank != 0: - scattered_Q_buf = torch.empty( - scattered_Q_buf_shape, device=Q_loc.device, dtype=Q_loc.dtype + pass + + # update: determine processes to be active at next "merging" level, create new communicator and split it into groups for gathering + current_procs = [ + current_procs[i] for i in range(len(current_procs)) if i % procs_to_merge == 0 + ] + if len(current_procs) > 1: + new_group = A.comm.group.Incl(current_procs) + current_comm = A.comm.Create_group(new_group) + if A.comm.rank in current_procs: + local_comm = communication.MPICommunication( + current_comm.Split(current_comm.rank // procs_to_merge, A.comm.rank) ) - leave_comm.Bcast(scattered_Q_buf, root=0) - # update the local Q_loc by multiplying it with the scattered_Q_buf - try: - Q_loc = Q_loc @ scattered_Q_buf - del scattered_Q_buf - except UnboundLocalError: - pass - - # update: determine processes to be active at next "merging" level, create new communicator and split it into groups for gathering - current_procs = [ - current_procs[i] for i in range(len(current_procs)) if i % procs_to_merge == 0 - ] - if len(current_procs) > 1: - new_group = A.comm.group.Incl(current_procs) - current_comm = A.comm.Create_group(new_group) - if A.comm.rank in current_procs: - local_comm = communication.MPICommunication( - current_comm.Split(current_comm.rank // procs_to_merge, A.comm.rank) - ) - if mode == "reduced": - leave_comm = A.comm.Split(A.comm.rank // procs_to_merge**level, A.comm.rank) - level += 1 - # broadcast the final R_loc to all processes - R_gshape = (A.shape[1], A.shape[1]) - if A.comm.rank != 0: - R_loc = torch.empty(R_gshape, dtype=R_loc.dtype, device=R_loc.device) - A.comm.Bcast(R_loc, root=0) - R = DNDarray( - R_loc, - gshape=R_gshape, - dtype=A.dtype, - split=None, - device=A.device, - comm=A.comm, - balanced=True, - ) - if mode == "r": - Q = None - else: - Q = DNDarray( - Q_loc, - gshape=A.shape, + if mode == "reduced": + leave_comm = A.comm.Split(A.comm.rank // procs_to_merge**level, A.comm.rank) + level += 1 + # broadcast the final R_loc to all processes + R_gshape = (*A.shape[:-2], A.shape[-1], A.shape[-1]) + if A.comm.rank != 0: + R_loc = torch.empty(R_gshape, dtype=R_loc.dtype, device=R_loc.device) + A.comm.Bcast(R_loc, root=0) + R = DNDarray( + R_loc, + gshape=R_gshape, dtype=A.dtype, - split=0, + split=None, device=A.device, comm=A.comm, balanced=True, ) - return QR(Q, R) + if mode == "r": + Q = None + else: + Q = DNDarray( + 
Q_loc, + gshape=A.shape, + dtype=A.dtype, + split=A.split, + device=A.device, + comm=A.comm, + balanced=True, + ) + return QR(Q, R) diff --git a/heat/core/linalg/solver.py b/heat/core/linalg/solver.py index 1a8d156b70..845f4d419a 100644 --- a/heat/core/linalg/solver.py +++ b/heat/core/linalg/solver.py @@ -274,9 +274,10 @@ def lanczos( def solve_triangular(A: DNDarray, b: DNDarray) -> DNDarray: """ - This function provides a solver for (possibly batched) upper triangular systems of linear equations: it returns `x` in `Ax = b`, where `A` is a (possibly batched) upper triangular matrix and + Solver for (possibly batched) upper triangular systems of linear equations: it returns `x` in `Ax = b`, where `A` is a (possibly batched) upper triangular matrix and `b` a (possibly batched) vector or matrix of suitable shape, both provided as input to the function. The implementation builds on the corresponding solver in PyTorch and implements a memory-distributed, MPI-parallel block-wise version thereof. + Parameters ---------- A : DNDarray @@ -339,7 +340,7 @@ def solve_triangular(A: DNDarray, b: DNDarray) -> DNDarray: else: # A not split, b.split == -2 b_lshapes_cum = torch.hstack( [ - torch.zeros(1, dtype=torch.int32, device=tdev), + torch.zeros(1, dtype=torch.int64, device=tdev), torch.cumsum(b.lshape_map[:, -2], 0), ] ) @@ -387,7 +388,7 @@ def solve_triangular(A: DNDarray, b: DNDarray) -> DNDarray: if A.split >= batch_dim: # both splits in la dims A_lshapes_cum = torch.hstack( [ - torch.zeros(1, dtype=torch.int32, device=tdev), + torch.zeros(1, dtype=torch.int64, device=tdev), torch.cumsum(A.lshape_map[:, A.split], 0), ] ) @@ -411,7 +412,11 @@ def solve_triangular(A: DNDarray, b: DNDarray) -> DNDarray: displ[i:] = 0 res_send = torch.empty(0) - res_recv = torch.zeros((*batch_shape, count[comm.rank], b.shape[-1]), device=tdev) + res_recv = torch.zeros( + (*batch_shape, count[comm.rank], b.shape[-1]), + device=tdev, + dtype=b.dtype.torch_type(), + ) if comm.rank == i: x.larray = torch.linalg.solve_triangular( diff --git a/heat/core/linalg/svd.py b/heat/core/linalg/svd.py index eb7ff1c87a..86a939c55a 100644 --- a/heat/core/linalg/svd.py +++ b/heat/core/linalg/svd.py @@ -5,8 +5,11 @@ from typing import Tuple from ..dndarray import DNDarray from .qr import qr +from .polar import polar +from .eigh import eigh from ..types import float32, float64 import torch +from warnings import warn __all__ = ["svd"] @@ -16,6 +19,7 @@ def svd( full_matrices: bool = False, compute_uv: bool = True, qr_procs_to_merge: int = 2, + r_max_zolopd: int = 8, ) -> Tuple[DNDarray, DNDarray, DNDarray]: """ Computes the singular value decomposition of a matrix (the input array ``A``). @@ -39,16 +43,29 @@ def svd( If ``False``, only the vector ``S`` containing the singular values is returned. qr_procs_to_merge : int, optional the number of processes to merge in the tall skinny QR decomposition that is applied if the input array is tall skinny (``M > N``) or short fat (``M < N``). - See the corresponding remarks for ``heat.linalg.qr`` for more details. + See the corresponding remarks for :func:`heat.linalg.qr` for more details. + r_max_zolopd : int, optional + an internal parameter only relevant for the case that the input matrix is neither tall-skinny nor short-fat. + This parameter is passed to the Zolotarev polar decomposition and the symmetric eigenvalue decomposition that are applied in this case. + See the documentation of :func:`heat.linalg.polar` as well as of :func:`heat.linalg.eigh` for more details. 
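As an aside, here is a hypothetical usage sketch of the new parameter (the matrix shape, the split, and the reconstruction check are editorial assumptions, not part of the patch):

```python
import heat as ht

# a square-ish matrix: on two or more processes, no local chunk is tall-skinny,
# so the new Zolotarev-based code path is taken internally
A = ht.random.randn(2000, 1800, split=0, dtype=ht.float64)

# full SVD; r_max_zolopd is forwarded to polar() and eigh()
U, S, V = ht.linalg.svd(A, r_max_zolopd=8)

# sanity check: relative reconstruction error should be close to machine precision
err = ht.norm(U @ ht.diag(S) @ V.T - A) / ht.norm(A)
```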
- - Remarks - ---------- + Notes + ----- Unlike in NumPy, we currently do not support the option ``full_matrices=True``, since this can result in heavy memory consumption (in particular for tall skinny and short fat matrices) that should be avoided in the context Heat is designed for. If you nevertheless require this feature, please open an issue on GitHub. The algorithm used for the computation of the singular values depends on the shape of the input array ``A``. - For tall and skinny matrices (``M > N``), the algorithm is based on the tall-skinny QR decomposition; currently this is the only supported algorithm. + For tall and skinny matrices (``M > N``), the algorithm is based on the tall-skinny QR decomposition. For the remaining cases we use an approach that combines + the Zolotarev polar decomposition with a symmetric eigenvalue decomposition that is itself based on Zolotarev's functions; see Algorithm 5.3 in: + + Nakatsukasa, Y., & Freund, R. W. (2016). Computing fundamental matrix decompositions accurately via the + matrix sign function in two iterations: The power of Zolotarev's functions. SIAM Review, 58(3). + + See Also + -------- + :func:`heat.linalg.qr` + :func:`heat.linalg.polar` + :func:`heat.linalg.eigh` """ if full_matrices: raise NotImplementedError( @@ -67,6 +84,10 @@ def svd( ) if qr_procs_to_merge == 0: qr_procs_to_merge = A.comm.size + if not isinstance(r_max_zolopd, int) or r_max_zolopd < 0 or r_max_zolopd > 8: + raise ValueError( + f"r_max_zolopd must be an int between 0 and 8, but is currently {r_max_zolopd} of type {type(r_max_zolopd)}" + ) if A.ndim != 2: raise ValueError( f"Array ``A`` must be 2 dimensional, but has {A.ndim} dimensions. \n Please open an issue on GitHub if you require SVD for batches of matrices similar to PyTorch." ) @@ -76,88 +97,7 @@ def svd( f"Array ``A`` must have a datatype of float32 or float64, but has {A.dtype}" ) - if A.is_distributed() and A.split == 0: - if A.lshape_map[:, 0].max().item() < A.shape[1]: - raise ValueError( - "Input ``A`` is split along the rows and the local chunks of data are rectangular with more columns than rows. \n This case is not supported by the current implementation of SVD in Heat." - ) - else: - # this is the distributed, tall skinny case - # compute SVD via tall skinny QR - if compute_uv: - # compute full SVD: first full QR, then SVD of R - Q, R = qr(A, mode="reduced", procs_to_merge=qr_procs_to_merge) - Utilde_loc, S_loc, Vt_loc = torch.linalg.svd(R.larray, full_matrices=False) - Utilde = DNDarray( - Utilde_loc, - tuple(Utilde_loc.shape), - dtype=A.dtype, - split=None, - device=A.device, - comm=A.comm, - balanced=A.balanced, - ) - S = DNDarray( - S_loc, - tuple(S_loc.shape), - dtype=A.dtype, - split=None, - device=A.device, - comm=A.comm, - balanced=A.balanced, - ) - V = DNDarray( - Vt_loc.T, - tuple(Vt_loc.T.shape), - dtype=A.dtype, - split=None, - device=A.device, - comm=A.comm, - balanced=A.balanced, - ) - U = (Utilde.T @ Q.T).T - return U, S, V - else: - # compute only singular values: first only R of QR, then singular values only of R - _, R = qr(A, mode="r", procs_to_merge=qr_procs_to_merge) - S_loc = torch.linalg.svdvals(R.larray) - S = DNDarray( - S_loc, - tuple(S_loc.shape), - dtype=A.dtype, - split=None, - device=A.device, - comm=A.comm, - balanced=A.balanced, - ) - return S - if A.is_distributed() and A.split == 1: - - if A.lshape_map[:, 1].max().item() < A.shape[0]: - raise ValueError( - "Input ``A`` is split along the columns and the local chunks of data are rectangular with more rows than columns. 
\n This case is not supported by the current implementation of SVD in Heat." - ) - else: - # this is the distributed, short fat case - # apply the tall skinny SVD to the transpose of A - if compute_uv: - V, S, U = svd( - A.T, - full_matrices=full_matrices, - compute_uv=True, - qr_procs_to_merge=qr_procs_to_merge, - ) - return U, S, V - else: - S = svd( - A.T, - full_matrices=full_matrices, - compute_uv=False, - qr_procs_to_merge=qr_procs_to_merge, - ) - return S - - else: + if not A.is_distributed(): # this is the non-distributed case if compute_uv: U_loc, S_loc, Vt_loc = torch.linalg.svd(A.larray, full_matrices=full_matrices) @@ -201,3 +141,104 @@ def svd( balanced=A.balanced, ) return S + elif A.split == 0 and A.lshape_map[:, 0].max().item() >= A.shape[1]: + # this is the distributed, tall skinny case + # compute SVD via tall skinny QR + if compute_uv: + # compute full SVD: first full QR, then SVD of R + Q, R = qr(A, mode="reduced", procs_to_merge=qr_procs_to_merge) + Utilde_loc, S_loc, Vt_loc = torch.linalg.svd(R.larray, full_matrices=False) + Utilde = DNDarray( + Utilde_loc, + tuple(Utilde_loc.shape), + dtype=A.dtype, + split=None, + device=A.device, + comm=A.comm, + balanced=A.balanced, + ) + S = DNDarray( + S_loc, + tuple(S_loc.shape), + dtype=A.dtype, + split=None, + device=A.device, + comm=A.comm, + balanced=A.balanced, + ) + V = DNDarray( + Vt_loc.T, + tuple(Vt_loc.T.shape), + dtype=A.dtype, + split=None, + device=A.device, + comm=A.comm, + balanced=A.balanced, + ) + U = (Utilde.T @ Q.T).T + return U, S, V + else: + # compute only singular values: first only R of QR, then singular values only of R + _, R = qr(A, mode="r", procs_to_merge=qr_procs_to_merge) + S_loc = torch.linalg.svdvals(R.larray) + S = DNDarray( + S_loc, + tuple(S_loc.shape), + dtype=A.dtype, + split=None, + device=A.device, + comm=A.comm, + balanced=A.balanced, + ) + return S + elif A.split == 1 and A.lshape_map[:, 1].max().item() >= A.shape[0]: + # this is the distributed, short fat case + # apply the tall skinny SVD to the transpose of A + if compute_uv: + V, S, U = svd( + A.T, + full_matrices=full_matrices, + compute_uv=True, + qr_procs_to_merge=qr_procs_to_merge, + ) + return U, S, V + else: + S = svd( + A.T, + full_matrices=full_matrices, + compute_uv=False, + qr_procs_to_merge=qr_procs_to_merge, + ) + return S + + else: + # this is the general, distributed case in which the matrix is neither tall skinny nor short fat + # we apply the Zolotarev-Polar Decomposition and the symmetric eigenvalue decomposition + if A.shape[0] < A.shape[1]: + # Zolo-PD requires A.shape[0] >= A.shape[1], so we need to transpose in this case + if compute_uv: + V, S, U = svd( + A.T, + full_matrices=full_matrices, + compute_uv=True, + qr_procs_to_merge=qr_procs_to_merge, + ) + return U, S, V + else: + S = svd( + A.T, + full_matrices=full_matrices, + compute_uv=False, + qr_procs_to_merge=qr_procs_to_merge, + ) + return S + else: + warn( + "You are performing the full SVD of a distributed matrix that is neither of tall-skinny nor short-fat shape. \n This operation may be costly in terms of memory and compute time." + ) + U, H = polar(A, r_max=r_max_zolopd) + S, V = eigh(H, r_max_zolopd=r_max_zolopd) + if not compute_uv: + return S + else: + return U @ V, S, V diff --git a/heat/core/linalg/svdtools.py b/heat/core/linalg/svdtools.py index 3ff273a79c..fafe9fef46 100644 --- a/heat/core/linalg/svdtools.py +++ b/heat/core/linalg/svdtools.py @@ -11,21 +11,21 @@ from ..dndarray import DNDarray from .. import factories from .. 
import types -from ..linalg import matmul, vector_norm +from ..linalg import matmul, vector_norm, qr, svd from ..indexing import where from ..random import randn - +from ..sanitation import sanitize_in_nd_realfloating from ..manipulations import vstack, hstack, diag, balance from .. import statistics from math import log, ceil, floor, sqrt -__all__ = ["hsvd_rank", "hsvd_rtol", "hsvd"] +__all__ = ["hsvd_rank", "hsvd_rtol", "hsvd", "rsvd", "isvd"] ####################################################################################### -# user-friendly versions of hSVD +# hierarchical SVD "hSVD" ####################################################################################### @@ -40,61 +40,53 @@ def hsvd_rank( Tuple[DNDarray, DNDarray, DNDarray, float], Tuple[DNDarray, DNDarray, DNDarray], DNDarray ]: """ - Hierarchical SVD (hSVD) with prescribed truncation rank `maxrank`. - If A = U diag(sigma) V^T is the true SVD of A, this routine computes an approximation for U[:,:maxrank] (and sigma[:maxrank], V[:,:maxrank]). - - The accuracy of this approximation depends on the structure of A ("low-rank" is best) and appropriate choice of parameters. - - One can expect a similar outcome from this routine as for sci-kit learn's TruncatedSVD (with `algorithm='randomized'`) although a different, determinstic algorithm is applied here. Hereby, the parameters `n_components` - and `n_oversamples` (sci-kit learn) roughly correspond to `maxrank` and `safetyshift` (see below). - - Parameters - ---------- - A : DNDarray - 2D-array (float32/64) of which the hSVD has to be computed. - maxrank : int - truncation rank. (This parameter corresponds to `n_components` in sci-kit learn's TruncatedSVD.) - compute_sv : bool, optional - compute_sv=True implies that also Sigma and V are computed and returned. The default is False. - maxmergedim : int, optional - maximal size of the concatenation matrices during the merging procedure. The default is None and results in an appropriate choice depending on the size of the local slices of A and maxrank. - Too small choices for this parameter will result in failure if the maximal size of the concatenation matrices does not allow to merge at least two matrices. Too large choices for this parameter can cause memory errors if the resulting merging problem becomes too large. - safetyshift : int, optional - Increases the actual truncation rank within the computations by a safety shift. The default is 5. (There is some similarity to `n_oversamples` in sci-kit learn's TruncatedSVD.) - silent : bool, optional - silent=False implies that some information on the computations are printed. The default is True. - - Returns - ------- - (Union[ Tuple[DNDarray, DNDarray, DNDarray, float], Tuple[DNDarray, DNDarray, DNDarray], DNDarray]) - if compute_sv=True: U, Sigma, V, a-posteriori error estimate for the reconstruction error ||A-U Sigma V^T ||_F / ||A||_F (computed according to [2] along the "true" merging tree). - if compute_sv=False: U, a-posteriori error estimate - - Notes - ------- - The size of the process local SVDs to be computed during merging is proportional to the non-split size of the input A and (maxrank + safetyshift). Therefore, conservative choice of maxrank and safetyshift is advised to avoid memory issues. - Note that, as sci-kit learn's randomized SVD, this routine is different from `numpy.linalg.svd` because not all singular values and vectors are computed - and even those computed may be inaccurate if the input matrix exhibts a unfavorable structure. 
+ Hierarchical SVD (hSVD) with prescribed truncation rank `maxrank`. + If A = U diag(sigma) V^T is the true SVD of A, this routine computes an approximation for U[:,:maxrank] (and sigma[:maxrank], V[:,:maxrank]). + + The accuracy of this approximation depends on the structure of A ("low-rank" is best) and appropriate choice of parameters. + + One can expect a similar outcome from this routine as for scikit-learn's TruncatedSVD (with `algorithm='randomized'`) although a different, deterministic algorithm is applied here. Here, the parameters `n_components` + and `n_oversamples` (scikit-learn) roughly correspond to `maxrank` and `safetyshift` (see below). + + Parameters + ---------- + A : DNDarray + 2D-array (float32/64) of which the hSVD has to be computed. + maxrank : int + truncation rank. (This parameter corresponds to `n_components` in scikit-learn's TruncatedSVD.) + compute_sv : bool, optional + compute_sv=True implies that also Sigma and V are computed and returned. The default is False. + maxmergedim : int, optional + maximal size of the concatenation matrices during the merging procedure. The default is None and results in an appropriate choice depending on the size of the local slices of A and maxrank. + Too small choices for this parameter will result in failure if the maximal size of the concatenation matrices does not allow to merge at least two matrices. Too large choices for this parameter can cause memory errors if the resulting merging problem becomes too large. + safetyshift : int, optional + Increases the actual truncation rank within the computations by a safety shift. The default is 5. (There is some similarity to `n_oversamples` in scikit-learn's TruncatedSVD.) + silent : bool, optional + silent=False implies that some information on the computations is printed. The default is True. + + Returns + ------- + (Union[ Tuple[DNDarray, DNDarray, DNDarray, float], Tuple[DNDarray, DNDarray, DNDarray], DNDarray]) + if compute_sv=True: U, Sigma, V, a-posteriori error estimate for the reconstruction error ||A-U Sigma V^T ||_F / ||A||_F (computed according to [2] along the "true" merging tree). + if compute_sv=False: U, a-posteriori error estimate + + Notes + ----- + The size of the process local SVDs to be computed during merging is proportional to the non-split size of the input A and (maxrank + safetyshift). Therefore, conservative choice of maxrank and safetyshift is advised to avoid memory issues. + Note that, as with scikit-learn's randomized SVD, this routine is different from `numpy.linalg.svd` because not all singular values and vectors are computed + and even those computed may be inaccurate if the input matrix exhibits an unfavorable structure. See Also - --------- + -------- :func:`hsvd` :func:`hsvd_rtol` - References - ------- - [1] Iwen, Ong. A distributed and incremental SVD algorithm for agglomerative data analysis on large networks. SIAM J. Matrix Anal. Appl., 37(4), 2016. - [2] Himpe, Leibner, Rave. Hierarchical approximate proper orthogonal decomposition. SIAM J. Sci. Comput., 40 (5), 2018. + + References + ---------- + [1] Iwen, Ong. A distributed and incremental SVD algorithm for agglomerative data analysis on large networks. SIAM J. Matrix Anal. Appl., 37(4), 2016. + [2] Himpe, Leibner, Rave. Hierarchical approximate proper orthogonal decomposition. SIAM J. Sci. Comput., 40 (5), 2018. 
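For orientation, a usage sketch of this routine (the shapes and parameter values are editorial assumptions, not taken from the patch):

```python
import heat as ht

# tall data matrix, distributed along its columns
X = ht.random.randn(1000, 10000, split=1, dtype=ht.float32)

# rank-50 truncated SVD with the default safety shift of 5
U, S, V, err = ht.linalg.hsvd_rank(X, maxrank=50, compute_sv=True)
print(err)  # a-posteriori bound on ||X - U diag(S) V^T||_F / ||X||_F
```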
""" - if not isinstance(A, DNDarray): - raise TypeError(f"Argument needs to be a DNDarray but is {type(A)}.") - if not A.ndim == 2: - raise ValueError("A needs to be a 2D matrix") - if not A.dtype == types.float32 and not A.dtype == types.float64: - raise TypeError( - "Argument needs to be a DNDarray with datatype float32 or float64, but data type is {}.".format( - A.dtype - ) - ) + sanitize_in_nd_realfloating(A, "A", [2]) A_local_size = max(A.lshape_map[:, 1]) if maxmergedim is not None and maxmergedim < 2 * (maxrank + safetyshift) + 1: @@ -135,52 +127,52 @@ def hsvd_rtol( Tuple[DNDarray, DNDarray, DNDarray, float], Tuple[DNDarray, DNDarray, DNDarray], DNDarray ]: """ - Hierchical SVD (hSVD) with prescribed upper bound on the relative reconstruction error. - If A = U diag(sigma) V^T is the true SVD of A, this routine computes an approximation for U[:,:r] (and sigma[:r], V[:,:r]) - such that the rel. reconstruction error ||A-U[:,:r] diag(sigma[:r]) V[:,:r]^T ||_F / ||A||_F does not exceed rtol. - - The accuracy of this approximation depends on the structure of A ("low-rank" is best) and appropriate choice of parameters. This routine is similar to `hsvd_rank` with the difference that - truncation is not performed after a fixed number (namly `maxrank` many) singular values but after such a number of singular values that suffice to capture a prescribed fraction of the amount of information - contained in the input data (`rtol`). - - Parameters - ---------- - A : DNDarray - 2D-array (float32/64) of which the hSVD has to be computed. - rtol : float - desired upper bound on the relative reconstruction error ||A-U Sigma V^T ||_F / ||A||_F. This upper bound is processed into 'local' - tolerances during the actual computations assuming the worst case scenario of a binary "merging tree"; therefore, the a-posteriori - error for the relative error using the true "merging tree" (see output) may be significantly smaller than rtol. - Prescription of maxrank or maxmergedim (disabled in default) can result in loss of desired precision, but can help to avoid memory issues. - compute_sv : bool, optional - compute_sv=True implies that also Sigma and V are computed and returned. The default is False. - no_of_merges : int, optional - Maximum number of processes to be merged at each step. If no further arguments are provided (see below), - this completely determines the "merging tree" and may cause memory issues. The default is None and results in a binary merging tree. - Note that no_of_merges dominates maxrank and maxmergedim in the sense that at most no_of_merges processes are merged - even if maxrank and maxmergedim would allow merging more processes. - maxrank : int, optional - maximal truncation rank. The default is None. - Setting at least one of maxrank and maxmergedim is recommended to avoid memory issues, but can result in loss of desired precision. - Setting only maxrank (and not maxmergedim) results in an appropriate default choice for maxmergedim depending on the size of the local slices of A and the value of maxrank. - maxmergedim : int, optional - maximal size of the concatenation matrices during the merging procedure. The default is None and results in an appropriate choice depending on the size of the local slices of A and maxrank. The default is None. - Too small choices for this parameter will result in failure if the maximal size of the concatenation matrices does not allow to merge at least two matrices. 
Too large choices for this parameter can cause memory errors if the resulting merging problem becomes too large. - Setting at least one of maxrank and maxmergedim is recommended to avoid memory issues, but can result in loss of desired precision. - Setting only maxmergedim (and not maxrank) results in an appropriate default choice for maxrank. - safetyshift : int, optional - Increases the actual truncation rank within the computations by a safety shift. The default is 5. - silent : bool, optional - silent=False implies that some information on the computations are printed. The default is True. - - Returns - ------- - (Union[ Tuple[DNDarray, DNDarray, DNDarray, float], Tuple[DNDarray, DNDarray, DNDarray], DNDarray]) - if compute_sv=True: U, Sigma, V, a-posteriori error estimate for the reconstruction error ||A-U Sigma V^T ||_F / ||A||_F (computed according to [2] along the "true" merging tree used in the computations). - if compute_sv=False: U, a-posteriori error estimate - - Notes - ------- + Hierarchical SVD (hSVD) with prescribed upper bound on the relative reconstruction error. + If A = U diag(sigma) V^T is the true SVD of A, this routine computes an approximation for U[:,:r] (and sigma[:r], V[:,:r]) + such that the rel. reconstruction error ||A-U[:,:r] diag(sigma[:r]) V[:,:r]^T ||_F / ||A||_F does not exceed rtol. + + The accuracy of this approximation depends on the structure of A ("low-rank" is best) and appropriate choice of parameters. This routine is similar to `hsvd_rank` with the difference that + truncation is not performed after a fixed number (namely `maxrank` many) singular values but after as many singular values as are needed to capture a prescribed fraction of the amount of information + contained in the input data (`rtol`). + + Parameters + ---------- + A : DNDarray + 2D-array (float32/64) of which the hSVD has to be computed. + rtol : float + desired upper bound on the relative reconstruction error ||A-U Sigma V^T ||_F / ||A||_F. This upper bound is processed into 'local' + tolerances during the actual computations assuming the worst case scenario of a binary "merging tree"; therefore, the a-posteriori + error for the relative error using the true "merging tree" (see output) may be significantly smaller than rtol. + Prescription of maxrank or maxmergedim (disabled by default) can result in loss of desired precision, but can help to avoid memory issues. + compute_sv : bool, optional + compute_sv=True implies that also Sigma and V are computed and returned. The default is False. + no_of_merges : int, optional + Maximum number of processes to be merged at each step. If no further arguments are provided (see below), + this completely determines the "merging tree" and may cause memory issues. The default is None and results in a binary merging tree. + Note that no_of_merges dominates maxrank and maxmergedim in the sense that at most no_of_merges processes are merged + even if maxrank and maxmergedim would allow merging more processes. + maxrank : int, optional + maximal truncation rank. The default is None. + Setting at least one of maxrank and maxmergedim is recommended to avoid memory issues, but can result in loss of desired precision. + Setting only maxrank (and not maxmergedim) results in an appropriate default choice for maxmergedim depending on the size of the local slices of A and the value of maxrank. + maxmergedim : int, optional + maximal size of the concatenation matrices during the merging procedure. 
The default is None and results in an appropriate choice depending on the size of the local slices of A and maxrank. + Too small choices for this parameter will result in failure if the maximal size of the concatenation matrices does not allow to merge at least two matrices. Too large choices for this parameter can cause memory errors if the resulting merging problem becomes too large. + Setting at least one of maxrank and maxmergedim is recommended to avoid memory issues, but can result in loss of desired precision. + Setting only maxmergedim (and not maxrank) results in an appropriate default choice for maxrank. + safetyshift : int, optional + Increases the actual truncation rank within the computations by a safety shift. The default is 5. + silent : bool, optional + silent=False implies that some information on the computations is printed. The default is True. + + Returns + ------- + (Union[ Tuple[DNDarray, DNDarray, DNDarray, float], Tuple[DNDarray, DNDarray, DNDarray], DNDarray]) + if compute_sv=True: U, Sigma, V, a-posteriori error estimate for the reconstruction error ||A-U Sigma V^T ||_F / ||A||_F (computed according to [2] along the "true" merging tree used in the computations). + if compute_sv=False: U, a-posteriori error estimate + + Notes + ----- The maximum size of the process local SVDs to be computed during merging is proportional to the non-split size of the input A and (maxrank + safetyshift). Therefore, conservative choice of maxrank and safetyshift is advised to avoid memory issues. For similar reasons, prescribing only rtol and the number of processes to be merged in each step (without specifying maxrank or maxmergedim) may result in memory issues. Prescribing maxrank is therefore strongly recommended to avoid memory issues, but may result in loss of desired precision (rtol). If this occurs, a separate warning will be raised. @@ -188,25 +180,18 @@ def hsvd_rtol( Note that this routine is different from `numpy.linalg.svd` because not all singular values and vectors are computed and even those computed may be inaccurate if the input matrix exhibits an unfavorable structure. To avoid confusion, note that `rtol` in this routine does not have any similarity to `tol` in scikit-learn's TruncatedSVD. + See Also - --------- + -------- :func:`hsvd` :func:`hsvd_rank` - References - ------- + + References + ---------- [1] Iwen, Ong. A distributed and incremental SVD algorithm for agglomerative data analysis on large networks. SIAM J. Matrix Anal. Appl., 37(4), 2016. [2] Himpe, Leibner, Rave. Hierarchical approximate proper orthogonal decomposition. SIAM J. Sci. Comput., 40 (5), 2018. 
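Analogously to `hsvd_rank`, a brief usage sketch (shape, rtol, and maxrank are illustrative assumptions):

```python
import heat as ht

X = ht.random.randn(1000, 10000, split=1, dtype=ht.float32)

# keep as many singular triplets as needed for a 1% relative reconstruction error;
# capping maxrank guards against memory blow-up, possibly at the cost of missing rtol
U, S, V, err = ht.linalg.hsvd_rtol(X, rtol=1e-2, compute_sv=True, maxrank=100)
```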
""" - if not isinstance(A, DNDarray): - raise TypeError(f"Argument needs to be a DNDarray but is {type(A)}.") - if not A.ndim == 2: - raise ValueError("A needs to be a 2D matrix") - if not A.dtype == types.float32 and not A.dtype == types.float64: - raise TypeError( - "Argument needs to be a DNDarray with datatype float32 or float64, but data type is {}.".format( - A.dtype - ) - ) + sanitize_in_nd_realfloating(A, "A", [2]) A_local_size = max(A.lshape_map[:, 1]) if maxmergedim is not None and maxrank is None: @@ -252,11 +237,6 @@ def hsvd_rtol( ) -################################################################################################ -# hSVD - "full" routine for the experts -################################################################################################ - - def hsvd( A: DNDarray, maxrank: Optional[int] = None, @@ -271,7 +251,7 @@ def hsvd( Tuple[DNDarray, DNDarray, DNDarray, float], Tuple[DNDarray, DNDarray, DNDarray], DNDarray ]: """ - This function computes an approximate truncated SVD of A utilizing a distributed hiearchical algorithm; see the references. + Computes an approximate truncated SVD of A utilizing a distributed hiearchical algorithm; see the references. The present function `hsvd` is a low-level routine, provides many options/parameters, but no default values, and is not recommended for usage by non-experts since conflicts arising from inappropriate parameter choice will not be catched. We strongly recommend to use the corresponding high-level functions `hsvd_rank` and `hsvd_rtol` instead. @@ -303,12 +283,12 @@ def hsvd( if compute_sv=False: U, a-posteriori error estimate References - ------- + ---------- [1] Iwen, Ong. A distributed and incremental SVD algorithm for agglomerative data analysis on large networks. SIAM J. Matrix Anal. Appl., 37(4), 2016. [2] Himpe, Leibner, Rave. Hierarchical approximate proper orthogonal decomposition. SIAM J. Sci. Comput., 40 (5), 2018. 
See Also - --------- + -------- :func:`hsvd_rank` :func:`hsvd_rtol` """ @@ -338,7 +318,7 @@ def hsvd( "\t\t".join(["%d" % an for an in active_nodes]), ) - U_loc, sigma_loc, err_squared_loc = compute_local_truncated_svd( + U_loc, sigma_loc, err_squared_loc = _compute_local_truncated_svd( level, A.comm.rank, A.larray, maxrank, loc_atol, safetyshift ) U_loc = torch.matmul(U_loc, torch.diag(sigma_loc)) @@ -416,7 +396,7 @@ def hsvd( if len(future_nodes) == 1: safetyshift = 0 - U_loc, sigma_loc, err_squared_loc_new = compute_local_truncated_svd( + U_loc, sigma_loc, err_squared_loc_new = _compute_local_truncated_svd( level, A.comm.rank, U_loc, maxrank, loc_atol, safetyshift ) @@ -470,12 +450,7 @@ def hsvd( return U, rel_error_estimate -############################################################################################## -# AUXILIARY ROUTINES -############################################################################################## - - -def compute_local_truncated_svd( +def _compute_local_truncated_svd( level: int, proc_id: int, U_loc: torch.Tensor, @@ -529,3 +504,295 @@ def compute_local_truncated_svd( sigma_loc = torch.zeros(1, dtype=U_loc.dtype, device=U_loc.device) U_loc = torch.zeros(U_loc.shape[0], 1, dtype=U_loc.dtype, device=U_loc.device) return U_loc, sigma_loc, err_squared_loc + + +############################################################################################## +# Randomized SVD "rSVD" +############################################################################################## + + +def rsvd( + A: DNDarray, + rank: int, + n_oversamples: int = 10, + power_iter: int = 0, + qr_procs_to_merge: int = 2, +) -> Union[Tuple[DNDarray, DNDarray, DNDarray], Tuple[DNDarray, DNDarray]]: + r""" + Randomized SVD (rSVD) with prescribed truncation rank `rank`. + If :math:`A = U \operatorname{diag}(S) V^T` is the true SVD of A, this routine computes an approximation for U[:,:rank] (and S[:rank], V[:,:rank]). + + The accuracy of this approximation depends on the structure of A ("low-rank" is best) and appropriate choice of parameters. + + Parameters + ---------- + A : DNDarray + 2D-array (float32/64) of which the rSVD has to be computed. + rank : int + truncation rank. (This parameter corresponds to `n_components` in scikit-learn's TruncatedSVD.) + n_oversamples : int, optional + number of oversamples. The default is 10. + power_iter : int, optional + number of power iterations. The default is 0. + Choosing `power_iter > 0` can improve the accuracy of the SVD approximation in the case of slowly decaying singular values, but increases the computational cost. + qr_procs_to_merge : int, optional + number of processes to merge at each step of QR decomposition in the power iteration (if power_iter > 0). The default is 2. See the corresponding remarks for :func:`heat.linalg.qr` for more details. + + + Notes + ----- + Memory requirements: the SVD computation of a matrix of size (rank + n_oversamples) x A.shape[1] must fit into the memory of a single process. + The implementation follows Algorithm 4.4 (randomized range finder) and Algorithm 5.1 (direct SVD) in [1]. + + References + ---------- + [1] Halko, N., Martinsson, P. G., & Tropp, J. A. (2011). Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM review, 53(2), 217-288. 
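A quick usage sketch of the new routine (the shape and parameter values are assumptions for illustration, not from the patch):

```python
import heat as ht

X = ht.random.randn(20000, 4000, split=0, dtype=ht.float32)

# rank-30 randomized SVD; one power iteration sharpens the range estimate
# when the singular values decay slowly
U, S, V = ht.linalg.rsvd(X, rank=30, n_oversamples=10, power_iter=1)
```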
+ """ + sanitize_in_nd_realfloating(A, "A", [2]) + if not isinstance(rank, int): + raise TypeError(f"rank must be an integer, but is {type(rank)}.") + if rank < 1: + raise ValueError(f"rank must be positive, but is {rank}.") + if not isinstance(n_oversamples, int): + raise TypeError( + f"if provided, n_oversamples must be an integer, but is {type(n_oversamples)}." + ) + if n_oversamples < 0: + raise ValueError(f"n_oversamples must be non-negative, but is {n_oversamples}.") + if not isinstance(power_iter, int): + raise TypeError(f"if provided, power_iter must be an integer, but is {type(power_iter)}.") + if power_iter < 0: + raise ValueError(f"power_iter must be non-negative, but is {power_iter}.") + + ell = rank + n_oversamples + q = power_iter + + # random matrix + splitOmega = 1 if A.split == 0 else 0 + Omega = randn(A.shape[1], ell, dtype=A.dtype, device=A.device, split=splitOmega) + + # compute the range of A + Y = matmul(A, Omega) + Q, _ = qr(Y, procs_to_merge=qr_procs_to_merge) + + # power iterations + for _ in range(q): + if Q.split is not None and Q.shape[Q.split] < Q.comm.size: + Q.resplit_(None) + Y = matmul(A.T, Q) + Q, _ = qr(Y, procs_to_merge=qr_procs_to_merge) + if Q.split is not None and Q.shape[Q.split] < Q.comm.size: + Q.resplit_(None) + Y = matmul(A, Q) + Q, _ = qr(Y, procs_to_merge=qr_procs_to_merge) + + # compute the SVD of the projected matrix + if Q.split is not None and Q.shape[Q.split] < Q.comm.size: + Q.resplit_(None) + B = matmul(Q.T, A) + B.resplit_( + None + ) # B will be of size ell x ell and thus small enough to fit into memory of a single process + U, sigma, V = svd.svd(B) # actually just torch svd as input is not split anymore + U = matmul(Q, U)[:, :rank] + U.balance_() + S = sigma[:rank] + V = V[:, :rank] + V.balance_() + return U, S, V + + +############################################################################################## +# Incremental SVD "iSVD" +############################################################################################## + + +def _isvd( + new_data: DNDarray, + U_old: DNDarray, + S_old: DNDarray, + V_old: Optional[DNDarray] = None, + maxrank: Optional[int] = None, + old_matrix_size: Optional[int] = None, + old_rowwise_mean: Optional[DNDarray] = None, +) -> Union[Tuple[DNDarray, DNDarray, DNDarray], Tuple[DNDarray, DNDarray, DNDarray, DNDarray]]: + """ + Helper function for iSVD and iPCA; follows roughly the "incremental PCA with mean update", Fig.1 in: + David A. Ross, Jongwoo Lim, Ruei-Sung Lin, Ming-Hsuan Yang. Incremental Learning for Robust Visual Tracking. IJCV, 2008. + + Either incremental SVD / PCA or incremental SVD / PCA with mean subtraction is performed. 
+ + Parameters + ---------- + new_data: DNDarray + new data as DNDarray + U_old, S_old, V_old: DNDarrays + "old" SVD-factors + if no V_old is provided, only U and S are computed (PCA) + maxrank: int, optional + rank to which new SVD should be truncated + old_matrix_size: int, optional + size of the old matrix; this does not need to be identical to V_old.shape[0] as "old" SVD might have been truncated + old_rowwise_mean: DNDarray, optional + row-wise mean of the old matrix; if not provided, no mean subtraction is performed + """ + # old SVD is SVD of a matrix of dimension m x n and has rank r + # new data have shape m x d + d = new_data.shape[1] + n = V_old.shape[0] if V_old is not None else old_matrix_size + r = S_old.shape[0] + if maxrank is None: + maxrank = min(n + d, U_old.shape[0]) + else: + maxrank = min(maxrank, min(n + d, U_old.shape[0])) + + if old_rowwise_mean is not None: + new_data_rowwise_mean = statistics.mean(new_data, axis=1) + new_rowwise_mean = (old_matrix_size * old_rowwise_mean + d * new_data_rowwise_mean) / ( + old_matrix_size + d + ) + new_data -= new_data_rowwise_mean.reshape(-1, 1) + new_data = hstack( + [ + new_data, + (new_data_rowwise_mean - old_rowwise_mean) + * (d * old_matrix_size / (d + old_matrix_size)) ** 0.5, + ] + ) + d += 1 + + # orthogonalize and decompose new_data + UtC = U_old.T @ new_data + if U_old.split is not None: + new_data = new_data.resplit_(U_old.split) - U_old @ UtC + else: + new_data = new_data - (U_old @ UtC).resplit_(new_data.split) + P, Rc = qr(new_data) + + # prepare one component of "new" V-factor + if V_old is not None: + V_new = vstack( + [ + V_old, + factories.zeros( + (d, r), + device=V_old.device, + dtype=V_old.dtype, + split=V_old.split, + comm=V_old.comm, + ), + ] + ) + helper = vstack( + [ + factories.zeros( + (n, d), + device=V_old.device, + dtype=V_old.dtype, + split=V_old.split, + comm=V_old.comm, + ), + factories.eye( + d, device=V_old.device, dtype=V_old.dtype, split=V_old.split, comm=V_old.comm + ), + ] + ) + V_new = hstack([V_new, helper]) + del helper + + # prepare one component of "new" U-factor + U_new = hstack([U_old, P]) + + # prepare "inner" matrix that needs to be decomposed, decompose it + helper1 = vstack( + [ + diag(S_old), + factories.zeros( + (Rc.shape[0] + UtC.shape[0] - r, r), + device=S_old.device, + dtype=S_old.dtype, + split=S_old.split, + comm=S_old.comm, + ), + ] + ) + if r > d: + Rc = Rc.resplit_(UtC.split) + else: + UtC = UtC.resplit_(Rc.split) + helper2 = vstack([UtC, Rc]) + innermat = hstack([helper1, helper2]) + del (helper1, helper2) + # as innermat is small enough to fit into memory of a single process, we can use torch svd + u, s, v = svd.svd(innermat.resplit_(None)) + del innermat + + # truncate if desired + if maxrank < s.shape[0]: + u = u[:, :maxrank] + s = s[:maxrank] + v = v[:, :maxrank] + + U_new = U_new @ u + if V_old is not None: + V_new = V_new @ v + + if V_old is not None: # use-case: SVD + return U_new, s, V_new + if old_rowwise_mean is not None: # use-case PCA + return U_new, s, new_rowwise_mean + + +def isvd( + new_data: DNDarray, + U_old: DNDarray, + S_old: DNDarray, + V_old: DNDarray, + maxrank: Optional[int] = None, +) -> Tuple[DNDarray, DNDarray, DNDarray]: + r"""Incremental SVD (iSVD) for the addition of new data to an existing SVD. 
+ Given the SVD of an "old" matrix, :math:`X_\textnormal{old} = U_\textnormal{old} \cdot S_\textnormal{old} \cdot V_\textnormal{old}^T`, and additional columns :math:`N` ("`new_data`"), this routine computes + (a possibly approximate) SVD of the extended matrix :math:`X_\textnormal{new} = [ X_\textnormal{old} | N]`. + + Parameters + ---------- + new_data : DNDarray + 2D-array (float32/64) of columns that are added to the "old" SVD. It must hold `new_data.split != 1` if `U_old.split = 0`. + U_old : DNDarray + U-factor of the SVD of the "old" matrix, 2D-array (float32/64). It must hold `U_old.split != 0` if `new_data.split = 1`. + S_old : DNDarray + Sigma-factor of the SVD of the "old" matrix, 1D-array (float32/64) + V_old : DNDarray + V-factor of the SVD of the "old" matrix, 2D-array (float32/64) + maxrank : int, optional + truncation rank of the SVD of the extended matrix. The default is None, i.e., no bound on the maximal rank is imposed. + + Notes + ----- + Inexactness may arise due to truncation to maximal rank `maxrank` if the rank of the data to be processed exceeds this rank. + If you set `maxrank` to a high number (or None) in order to avoid inexactness, you may encounter memory issues. + The implementation follows the approach described in Ref. [1], Sect. 2. + + References + ---------- + [1] Brand, M. (2006). Fast low-rank modifications of the thin singular value decomposition. Linear algebra and its applications, 415(1), 20-30. + """ + # check if new_data, U_old, V_old are 2D DNDarrays and float32/64 + sanitize_in_nd_realfloating(new_data, "new_data", [2]) + sanitize_in_nd_realfloating(U_old, "U_old", [2]) + sanitize_in_nd_realfloating(S_old, "S_old", [1]) + sanitize_in_nd_realfloating(V_old, "V_old", [2]) + # check if number of columns of U_old and V_old match the number of elements in S_old + if U_old.shape[1] != S_old.shape[0]: + raise ValueError( + "The number of columns of U_old must match the number of elements in S_old." + ) + if V_old.shape[1] != S_old.shape[0]: + raise ValueError( + "The number of columns of V_old must match the number of elements in S_old. 
     def test_cross(self):
         a = ht.eye(3)
         b = ht.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])
 
-        # different types
-        cross = ht.cross(a, b)
-        self.assertEqual(cross.shape, a.shape)
-        self.assertEqual(cross.dtype, a.dtype)
-        self.assertEqual(cross.split, a.split)
-        self.assertEqual(cross.comm, a.comm)
-        self.assertEqual(cross.device, a.device)
-        self.assertTrue(ht.equal(cross, ht.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]])))
+        # different types - do not run on MPS
+        if not self.is_mps:
+            cross = ht.cross(a, b)
+            self.assertEqual(cross.shape, a.shape)
+            self.assertEqual(cross.dtype, a.dtype)
+            self.assertEqual(cross.split, a.split)
+            self.assertEqual(cross.comm, a.comm)
+            self.assertEqual(cross.device,
a.device) + self.assertTrue(ht.equal(cross, ht.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]]))) # axis a = ht.eye(3, split=0) @@ -32,7 +80,7 @@ def test_cross(self): self.assertEqual(cross.split, a.split) self.assertEqual(cross.comm, a.comm) self.assertEqual(cross.device, a.device) - self.assertTrue(ht.equal(cross, ht.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]]))) + self.assertTrue(ht.equal(cross, ht.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]], dtype=ht.int))) a = ht.eye(3, dtype=ht.int8, split=1) b = ht.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=ht.int8, split=1) @@ -47,8 +95,8 @@ def test_cross(self): # test axisa, axisb, axisc np.random.seed(42) - np_a = np.random.randn(40, 3, 50) - np_b = np.random.randn(3, 40, 50) + np_a = np.random.randn(40, 3, 50).astype(np.float32) + np_b = np.random.randn(3, 40, 50).astype(np.float32) np_cross = np.cross(np_a, np_b, axisa=1, axisb=0) a = ht.array(np_a, split=0) @@ -63,16 +111,26 @@ def test_cross(self): # test vector axes with 2 elements b_2d = ht.array(np_b[:-1, :, :], split=1) cross_3d_2d = ht.cross(a, b_2d, axisa=1, axisb=0) - np_cross_3d_2d = np.cross(np_a, np_b[:-1, :, :], axisa=1, axisb=0) + np_cross_3d_2d = np.cross( + np_a, + np.concatenate([np_b[:-1, :, :], np.zeros((1, 40, 50))], axis=0, dtype=np.float32), + axisa=1, + axisb=0, + ) self.assert_array_equal(cross_3d_2d, np_cross_3d_2d) a_2d = ht.array(np_a[:, :-1, :], split=0) cross_2d_3d = ht.cross(a_2d, b, axisa=1, axisb=0) - np_cross_2d_3d = np.cross(np_a[:, :-1, :], np_b, axisa=1, axisb=0) + np_cross_2d_3d = np.cross( + np.concatenate([np_a[:, :-1, :], np.zeros((40, 1, 50))], axis=1, dtype=np.float32), + np_b, + axisa=1, + axisb=0, + ) self.assert_array_equal(cross_2d_3d, np_cross_2d_3d) cross_z_comp = ht.cross(a_2d, b_2d, axisa=1, axisb=0) - np_cross_z_comp = np.cross(np_a[:, :-1, :], np_b[:-1, :, :], axisa=1, axisb=0) + np_cross_z_comp = np_a[:, 0, ...] * np_b[1, ...] - np_a[:, 1, ...] * np_b[0, ...] 
self.assert_array_equal(cross_z_comp, np_cross_z_comp) a_wrong_split = ht.array(np_a[:, :-1, :], split=2) @@ -93,7 +151,7 @@ def test_cross(self): def test_det(self): # (3,3) with pivoting ares = ht.array(54.0) - a = ht.array([[-2.0, -1, 2], [2, 1, 4], [-3, 3, -1]], split=0, dtype=ht.double) + a = ht.array([[-2.0, -1, 2], [2, 1, 4], [-3, 3, -1]], split=0, dtype=ht.float32) adet = ht.linalg.det(a) self.assertTupleEqual(adet.shape, ares.shape) @@ -102,7 +160,9 @@ def test_det(self): self.assertEqual(adet.device, a.device) self.assertTrue(ht.equal(adet, ares)) - a = ht.array([[-2.0, -1, 2], [2, 1, 4], [-3, 3, -1]], split=1, dtype=ht.double) + dtype = ht.float64 if not self.is_mps else ht.float32 + + a = ht.array([[-2.0, -1, 2], [2, 1, 4], [-3, 3, -1]], split=1, dtype=dtype) adet = ht.linalg.det(a) self.assertTupleEqual(adet.shape, ares.shape) @@ -113,7 +173,7 @@ def test_det(self): # det==0 ares = ht.array(0.0) - a = ht.array([[0, 0, 0], [2, 1, 4], [-3, 3, -1]], dtype=ht.float64, split=0) + a = ht.array([[0, 0, 0], [2, 1, 4], [-3, 3, -1]], dtype=dtype, split=0) adet = ht.linalg.det(a) self.assertTupleEqual(adet.shape, ares.shape) @@ -194,7 +254,11 @@ def test_dot(self): a1d = ht.array(data1d, dtype=ht.float32, split=0) b1d = ht.array(data1d, dtype=ht.float32, split=0) self.assertEqual(ht.dot(a1d, b1d), np.dot(data1d, data1d)) - # 2 1D arrays, + + dtype = np.float32 if self.is_mps else np.float64 + data1d = data1d.astype(dtype) + data2d = data2d.astype(dtype) + data3d = data3d.astype(dtype) a2d = ht.array(data2d, split=1) b2d = ht.array(data2d, split=1) @@ -210,13 +274,13 @@ def test_dot(self): const1 = 5 const2 = 6 # a is const - res = ht.dot(const1, b2d) - ht.array(np.dot(const1, data2d)) + res = ht.dot(const1, b2d) - ht.array(np.dot(const1, data2d).astype(dtype)) ret = 0 ht.dot(const1, b2d, out=ret) self.assertEqual(ht.equal(res, ht.zeros(res.shape)), 1) # b is const - res = ht.dot(a2d, const2) - ht.array(np.dot(data2d, const2)) + res = ht.dot(a2d, const2) - ht.array(np.dot(data2d, const2).astype(dtype)) self.assertEqual(ht.equal(res, ht.zeros(res.shape)), 1) # a and b and const self.assertEqual(ht.dot(const2, const1), 5 * 6) @@ -281,34 +345,38 @@ def test_inv(self): self.assertTrue(ht.allclose(ainv, ares, atol=1e-6)) # pivoting row change - ares = ht.array([[-1, 0, 2], [2, 0, -1], [-6, 3, 0]], dtype=ht.double, split=0) / 3.0 - a = ht.array([[1, 2, 0], [2, 4, 1], [2, 1, 0]], dtype=ht.double, split=0) + dtype = ht.float32 if self.is_mps else ht.float64 + atol = 1e-6 if dtype == ht.float32 else 1e-12 + + ares = ht.array([[-1, 0, 2], [2, 0, -1], [-6, 3, 0]], dtype=dtype, split=0) / 3.0 + a = ht.array([[1, 2, 0], [2, 4, 1], [2, 1, 0]], dtype=dtype, split=0) ainv = ht.linalg.inv(a) self.assertEqual(ainv.split, a.split) self.assertEqual(ainv.device, a.device) self.assertTupleEqual(ainv.shape, a.shape) - self.assertTrue(ht.allclose(ainv, ares, atol=1e-6)) + self.assertTrue(ht.allclose(ainv, ares, atol=atol)) - ares = ht.array([[-1, 0, 2], [2, 0, -1], [-6, 3, 0]], dtype=ht.double, split=1) / 3.0 - a = ht.array([[1, 2, 0], [2, 4, 1], [2, 1, 0]], dtype=ht.double, split=1) + ares = ht.array([[-1, 0, 2], [2, 0, -1], [-6, 3, 0]], dtype=dtype, split=1) / 3.0 + a = ht.array([[1, 2, 0], [2, 4, 1], [2, 1, 0]], dtype=dtype, split=1) ainv = ht.linalg.inv(a) self.assertEqual(ainv.split, a.split) self.assertEqual(ainv.device, a.device) self.assertTupleEqual(ainv.shape, a.shape) - self.assertTrue(ht.allclose(ainv, ares, atol=1e-15)) + self.assertTrue(ht.allclose(ainv, ares, atol=atol)) ht.random.seed(42) - a = 
ht.random.random((20, 20), dtype=ht.float64, split=1)
+        a = ht.random.random((20, 20), dtype=dtype, split=1)
         ainv = ht.linalg.inv(a)
         i = ht.eye(a.shape, split=1, dtype=a.dtype)
         # loss of precision in distributed floating-point ops
-        self.assertTrue(ht.allclose(a @ ainv, i, atol=1e-10))
+        self.assertTrue(ht.allclose(a @ ainv, i, atol=1e-5 if self.is_mps else atol))
 
         ht.random.seed(42)
-        a = ht.random.random((20, 20), dtype=ht.float64, split=0)
+        a = ht.random.random((20, 20), dtype=dtype, split=0)
         ainv = ht.linalg.inv(a)
         i = ht.eye(a.shape, split=0, dtype=a.dtype)
-        self.assertTrue(ht.allclose(a @ ainv, i, atol=1e-10))
+        self.assertTrue(ht.allclose(a @ ainv, i, atol=1e-5 if self.is_mps else atol * 1e2))
 
         with self.assertRaises(RuntimeError):
             ht.linalg.inv(ht.array([1, 2, 3], split=0))
@@ -827,14 +895,16 @@ def test_matmul(self):
             a = ht.zeros((3, 3, 3), split=0)
             b = ht.zeros((4, 3, 3), split=0)
             ht.matmul(a, b)
-        # not implemented split
-        """
-        todo
+        # split along different batch dimension
         with self.assertRaises(NotImplementedError):
-            a = ht.zeros((3, 3, 3))
-            b = ht.zeros((3, 3, 3))
+            a = ht.zeros((4, 3, 3, 3), split=0)
+            b = ht.zeros((4, 3, 3, 3), split=1)
+            ht.matmul(a, b)
+        # batched matrix-vector multiplication
+        with self.assertRaises(NotImplementedError):
+            a = ht.zeros((3, 3, 3), split=0)
+            b = ht.zeros((3, 3), split=0)
             ht.matmul(a, b)
-        """
 
         # batched, split batch
         n = 11  # number of batches
@@ -1071,7 +1141,10 @@ def test_outer(self):
         self.assertTrue((ht_outer_split.numpy() == np_outer).all())
 
         # a_split.ndim > 1 and a.split != 0
-        a_split_3d = ht.random.randn(3, 3, 3, dtype=ht.float64, split=2)
+        if self.is_mps:
+            a_split_3d = ht.random.randn(3, 3, 3, dtype=ht.float32, split=2)
+        else:
+            a_split_3d = ht.random.randn(3, 3, 3, dtype=ht.float64, split=2)
         ht_outer_split = ht.outer(a_split_3d, b_split)
         np_outer_3d = np.outer(a_split_3d.numpy(), b_split.numpy())
         self.assertTrue(ht_outer_split.split == 0)
@@ -1772,39 +1845,40 @@ def test_tril(self):
                 self.assertTrue((result.larray == comparison).all())
 
         local_ones = ht.ones((3, 4, 5, 6))
-
-        # 2D+ case, no offset, data is not split, module-level call
-        result = local_ones.tril()
-        comparison = torch.ones((5, 6), device=self.device.torch_device).tril()
-        self.assertIsInstance(result, ht.DNDarray)
-        self.assertEqual(result.shape, (3, 4, 5, 6))
-        self.assertEqual(result.lshape, (3, 4, 5, 6))
-        self.assertEqual(result.split, None)
-        for i in range(3):
-            for j in range(4):
-                self.assertTrue((result.larray[i, j] == comparison).all())
-
-        # 2D+ case, positive offset, data is not split, module-level call
-        result = local_ones.tril(k=2)
-        comparison = torch.ones((5, 6), device=self.device.torch_device).tril(diagonal=2)
-        self.assertIsInstance(result, ht.DNDarray)
-        self.assertEqual(result.shape, (3, 4, 5, 6))
-        self.assertEqual(result.lshape, (3, 4, 5, 6))
-        self.assertEqual(result.split, None)
-        for i in range(3):
-            for j in range(4):
-                self.assertTrue((result.larray[i, j] == comparison).all())
-
-        # # 2D+ case, negative offset, data is not split, module-level call
-        result = local_ones.tril(k=-2)
-        comparison = torch.ones((5, 6), device=self.device.torch_device).tril(diagonal=-2)
-        self.assertIsInstance(result, ht.DNDarray)
-        self.assertEqual(result.shape, (3, 4, 5, 6))
-        self.assertEqual(result.lshape, (3, 4, 5, 6))
-        self.assertEqual(result.split, None)
-        for i in range(3):
-            for j in range(4):
-                self.assertTrue((result.larray[i, j] == comparison).all())
+        if not self.is_mps:
+ # triu, tril fail on MPS for ndim > 2 + # 2D+ case, no offset, data is not split, module-level call + result = local_ones.tril() + comparison = torch.ones((5, 6), device=self.device.torch_device).tril() + self.assertIsInstance(result, ht.DNDarray) + self.assertEqual(result.shape, (3, 4, 5, 6)) + self.assertEqual(result.lshape, (3, 4, 5, 6)) + self.assertEqual(result.split, None) + for i in range(3): + for j in range(4): + self.assertTrue((result.larray[i, j] == comparison).all()) + + # 2D+ case, positive offset, data is not split, module-level call + result = local_ones.tril(k=2) + comparison = torch.ones((5, 6), device=self.device.torch_device).tril(diagonal=2) + self.assertIsInstance(result, ht.DNDarray) + self.assertEqual(result.shape, (3, 4, 5, 6)) + self.assertEqual(result.lshape, (3, 4, 5, 6)) + self.assertEqual(result.split, None) + for i in range(3): + for j in range(4): + self.assertTrue((result.larray[i, j] == comparison).all()) + + # # 2D+ case, negative offset, data is not split, module-level call + result = local_ones.tril(k=-2) + comparison = torch.ones((5, 6), device=self.device.torch_device).tril(diagonal=-2) + self.assertIsInstance(result, ht.DNDarray) + self.assertEqual(result.shape, (3, 4, 5, 6)) + self.assertEqual(result.lshape, (3, 4, 5, 6)) + self.assertEqual(result.split, None) + for i in range(3): + for j in range(4): + self.assertTrue((result.larray[i, j] == comparison).all()) distributed_ones = ht.ones((5,), split=0) @@ -1994,39 +2068,39 @@ def test_triu(self): self.assertTrue((result.larray == comparison).all()) local_ones = ht.ones((3, 4, 5, 6)) - - # 2D+ case, no offset, data is not split, module-level call - result = local_ones.triu() - comparison = torch.ones((5, 6), device=self.device.torch_device).triu() - self.assertIsInstance(result, ht.DNDarray) - self.assertEqual(result.shape, (3, 4, 5, 6)) - self.assertEqual(result.lshape, (3, 4, 5, 6)) - self.assertEqual(result.split, None) - for i in range(3): - for j in range(4): - self.assertTrue((result.larray[i, j] == comparison).all()) - - # 2D+ case, positive offset, data is not split, module-level call - result = local_ones.triu(k=2) - comparison = torch.ones((5, 6), device=self.device.torch_device).triu(diagonal=2) - self.assertIsInstance(result, ht.DNDarray) - self.assertEqual(result.shape, (3, 4, 5, 6)) - self.assertEqual(result.lshape, (3, 4, 5, 6)) - self.assertEqual(result.split, None) - for i in range(3): - for j in range(4): - self.assertTrue((result.larray[i, j] == comparison).all()) - - # # 2D+ case, negative offset, data is not split, module-level call - result = local_ones.triu(k=-2) - comparison = torch.ones((5, 6), device=self.device.torch_device).triu(diagonal=-2) - self.assertIsInstance(result, ht.DNDarray) - self.assertEqual(result.shape, (3, 4, 5, 6)) - self.assertEqual(result.lshape, (3, 4, 5, 6)) - self.assertEqual(result.split, None) - for i in range(3): - for j in range(4): - self.assertTrue((result.larray[i, j] == comparison).all()) + if not self.is_mps: + # 2D+ case, no offset, data is not split, module-level call + result = local_ones.triu() + comparison = torch.ones((5, 6), device=self.device.torch_device).triu() + self.assertIsInstance(result, ht.DNDarray) + self.assertEqual(result.shape, (3, 4, 5, 6)) + self.assertEqual(result.lshape, (3, 4, 5, 6)) + self.assertEqual(result.split, None) + for i in range(3): + for j in range(4): + self.assertTrue((result.larray[i, j] == comparison).all()) + + # 2D+ case, positive offset, data is not split, module-level call + result = 
local_ones.triu(k=2) + comparison = torch.ones((5, 6), device=self.device.torch_device).triu(diagonal=2) + self.assertIsInstance(result, ht.DNDarray) + self.assertEqual(result.shape, (3, 4, 5, 6)) + self.assertEqual(result.lshape, (3, 4, 5, 6)) + self.assertEqual(result.split, None) + for i in range(3): + for j in range(4): + self.assertTrue((result.larray[i, j] == comparison).all()) + + # # 2D+ case, negative offset, data is not split, module-level call + result = local_ones.triu(k=-2) + comparison = torch.ones((5, 6), device=self.device.torch_device).triu(diagonal=-2) + self.assertIsInstance(result, ht.DNDarray) + self.assertEqual(result.shape, (3, 4, 5, 6)) + self.assertEqual(result.lshape, (3, 4, 5, 6)) + self.assertEqual(result.split, None) + for i in range(3): + for j in range(4): + self.assertTrue((result.larray[i, j] == comparison).all()) distributed_ones = ht.ones((5,), split=0) @@ -2182,7 +2256,7 @@ def test_vecdot(self): c = ht.linalg.vecdot(a, b, axis=0, keepdims=True) self.assertEqual(c.dtype, ht.float32) self.assertEqual(c.device, a.device) - self.assertTrue(ht.equal(c, ht.array([[8, 8, 8, 8]]))) + self.assertTrue(ht.equal(c, ht.array([[8, 8, 8, 8]], dtype=ht.float32))) def test_vector_norm(self): a = ht.arange(9, dtype=ht.float) - 4 @@ -2236,22 +2310,23 @@ def test_vector_norm(self): ) # different dtype - vn = ht.linalg.vector_norm(ht.full((4, 4, 4), 1 + 1j, dtype=ht.int), axis=0, ord=4) - self.assertEqual(vn.split, None) - self.assertEqual(vn.dtype, ht.float) - self.assertTrue( - ht.equal( - vn, - ht.array( - [ - [2.0, 2.0, 2.0, 2.0], - [2.0, 2.0, 2.0, 2.0], - [2.0, 2.0, 2.0, 2.0], - [2.0, 2.0, 2.0, 2.0], - ] - ), + if not self.is_mps: + vn = ht.linalg.vector_norm(ht.full((4, 4, 4), 1 + 1j, dtype=ht.int), axis=0, ord=4) + self.assertEqual(vn.split, None) + self.assertEqual(vn.dtype, ht.float) + self.assertTrue( + ht.equal( + vn, + ht.array( + [ + [2.0, 2.0, 2.0, 2.0], + [2.0, 2.0, 2.0, 2.0], + [2.0, 2.0, 2.0, 2.0], + [2.0, 2.0, 2.0, 2.0], + ] + ), + ) ) - ) # bad ord with self.assertRaises(ValueError): diff --git a/heat/core/linalg/tests/test_eigh.py b/heat/core/linalg/tests/test_eigh.py new file mode 100644 index 0000000000..45b4eecb42 --- /dev/null +++ b/heat/core/linalg/tests/test_eigh.py @@ -0,0 +1,55 @@ +import heat as ht +import unittest +import numpy as np + +from ...tests.test_suites.basic_test import TestCase + + +class TestEigh(TestCase): + def _check_eigh_result(self, X, Lambda, H): + dtypetol = 1e-3 if X.dtype == ht.float32 else 1e-5 + self.assertEqual(Lambda.shape, (X.shape[0],)) + self.assertEqual(H.shape, X.shape) + self.assertEqual(H.split, X.split) + self.assertEqual(Lambda.split, 0) + self.assertEqual(H.dtype, X.dtype) + self.assertEqual(Lambda.dtype, X.dtype) + X_rec = H @ ht.diag(Lambda) @ H.T + self.assertTrue(ht.norm(X - X_rec) / ht.norm(X) < dtypetol) + HtH = H.T @ H + eye_size_H = ht.eye(HtH.shape[0], split=HtH.split, dtype=X.dtype) + self.assertTrue(ht.norm(HtH - eye_size_H) / ht.norm(eye_size_H) < dtypetol) + + def test_eigh(self): + # test with default values + splits = [None, 0, 1] + dtypes = [ht.float32, ht.float64] + i = 0 + for split in splits: + for dtype in dtypes: + with self.subTest(split=split, dtype=dtype): + ht.random.seed(41 + i) + X = ht.random.randn(100, 100, split=split, dtype=dtype) + X = X + X.T.resplit_(X.split) + Lambda, H = ht.linalg.eigh(X) + self._check_eigh_result(X, Lambda, H) + i += 1 + + def test_eigh_options(self): + # test non-default options + ht.random.seed(42) + X = ht.random.randn(101, 101, split=0, 
dtype=ht.float32) + X = X @ X.T + Lambda, H = ht.linalg.eigh(X, r_max_zolopd=1, silent=False) + self._check_eigh_result(X, Lambda, H) + + def test_eigh_catch_wrong_inputs(self): + # non-square DNDarray as input + X = ht.random.rand(100, 101, split=0, dtype=ht.float32) + with self.assertRaises(ValueError): + ht.linalg.eigh(X) + + # r_max_zolopd not of right type + X = ht.random.rand(100, 100, split=0, dtype=ht.float32) + with self.assertRaises(ValueError): + ht.linalg.eigh(X, r_max_zolopd=2.2) diff --git a/heat/core/linalg/tests/test_polar.py b/heat/core/linalg/tests/test_polar.py new file mode 100644 index 0000000000..2d65a9c04d --- /dev/null +++ b/heat/core/linalg/tests/test_polar.py @@ -0,0 +1,117 @@ +import heat as ht +import unittest +import torch +import numpy as np + +from ...tests.test_suites.basic_test import TestCase + + +class TestZolopolar(TestCase): + def _check_polar(self, A, U, H, dtypetol): + # check whether output has right type, shape and dtype + self.assertTrue(isinstance(U, ht.DNDarray)) + self.assertEqual(U.shape, A.shape) + self.assertEqual(U.dtype, A.dtype) + self.assertTrue(isinstance(H, ht.DNDarray)) + self.assertEqual(H.shape, (A.shape[1], A.shape[1])) + self.assertEqual(H.dtype, A.dtype) + + # check whether output is correct + A_np = A.numpy() + U_np = U.numpy() + H_np = H.numpy() + # U orthogonal + self.assertTrue( + np.allclose(U_np.T @ U_np, np.eye(U_np.shape[1]), atol=dtypetol, rtol=dtypetol) + ) + # H symmetric + self.assertTrue(np.allclose(H_np.T, H_np, atol=dtypetol, rtol=dtypetol)) + # H positive definite, i.e., eigenvalues > 0 + self.assertTrue((np.linalg.eigvalsh(H_np) > 0).all()) + # A = U H + self.assertTrue(np.allclose(A_np, U_np @ H_np, atol=dtypetol, rtol=dtypetol)) + + def test_catch_wrong_inputs(self): + # if A is not a DNDarray + with self.assertRaises(TypeError): + ht.polar("I am clearly not a DNDarray. 
Do you mind?") + # test wrong input dimension + with self.assertRaises(ValueError): + ht.polar(ht.zeros((10, 10, 10), dtype=ht.float32)) + # test wrong input shape + with self.assertRaises(ValueError): + ht.polar(ht.random.rand(10, 11, dtype=ht.float32)) + # test wrong input dtype + with self.assertRaises(TypeError): + ht.polar(ht.ones((10, 10), dtype=ht.int32)) + # wrong input for r + with self.assertRaises(ValueError): + ht.polar(ht.ones((11, 10)), r=1.0) + # wrong input for tol + with self.assertRaises(TypeError): + ht.polar(ht.ones((11, 10)), r=2, condition_estimate=1) + + def test_polar_split0(self): + # split=0, float32, no condition estimate provided, silent mode + for r in range(1, 9): + with self.subTest(r=r): + ht.random.seed(18112024) + A = ht.random.randn(100, 10 * r, split=0, dtype=ht.float32) + if ( + ht.MPI_WORLD.size % r == 0 and ht.MPI_WORLD.size != r + ) or ht.MPI_WORLD.size == 1: + U, H = ht.polar(A, r=r) + dtypetol = 1e-4 + self._check_polar(A, U, H, dtypetol) + else: + with self.assertRaises(ValueError): + U, H = ht.polar(A, r=r) + + # cases not covered so far + A = ht.random.randn(100, 100, split=0, dtype=ht.float64) + U, H = ht.polar(A, condition_estimate=1.0e16, silent=False) + dtypetol = 1e-7 + + self._check_polar(A, U, H, dtypetol) + + # case without calculating H + ht.random.seed(10122024) + A = ht.random.randn(100, 10, split=0, dtype=ht.float32) + U = ht.polar(A, calcH=False) + U_np = U.numpy() + self.assertTrue(np.allclose(U_np.T @ U_np, np.eye(U_np.shape[1]), atol=1e-4, rtol=1e-4)) + H_np = U_np.T @ A.numpy() + self.assertTrue(np.allclose(H_np.T, H_np, atol=1e-4, rtol=1e-4)) + self.assertTrue((np.linalg.eigvalsh(H_np) > 0).all()) + + def test_polar_split1(self): + # split=1, float64, condition estimate provided, non-silent mode + for r in range(1, 9): + with self.subTest(r=r): + ht.random.seed(623) + A = ht.random.randn(100, 99, split=1, dtype=ht.float64) + if ( + ht.MPI_WORLD.size % r == 0 and ht.MPI_WORLD.size != r + ) or ht.MPI_WORLD.size == 1: + U, H = ht.polar(A, r=r, silent=False, condition_estimate=1.0e16) + dtypetol = 1e-7 + + self._check_polar(A, U, H, dtypetol) + else: + with self.assertRaises(ValueError): + U, H = ht.polar(A, r=r) + + # cases not covered so far + A = ht.random.randn(100, 99, split=1, dtype=ht.float32) + U, H = ht.polar(A, silent=False, condition_estimate=1.0e16) + dtypetol = 1e-4 + self._check_polar(A, U, H, dtypetol) + + # case without calculating H + A = ht.random.randn(100, 100, split=1, dtype=ht.float64) + U = ht.polar(A, calcH=False, condition_estimate=1.0e16) + U_np = U.numpy() + self.assertTrue(np.allclose(U_np.T @ U_np, np.eye(U_np.shape[1]), atol=1e-7, rtol=1e-7)) + H_np = U_np.T @ A.numpy() + self.assertTrue(np.allclose(H_np.T, H_np, atol=1e-8, rtol=1e-8)) + self.assertTrue((np.linalg.eigvalsh(H_np) > 0).all()) diff --git a/heat/core/linalg/tests/test_qr.py b/heat/core/linalg/tests/test_qr.py index 6de9e091d8..dc31e03caf 100644 --- a/heat/core/linalg/tests/test_qr.py +++ b/heat/core/linalg/tests/test_qr.py @@ -8,17 +8,21 @@ class TestQR(TestCase): def test_qr_split1orNone(self): + if self.is_mps: + dtypes = [ht.float32] + else: + dtypes = [ht.float32, ht.float64] ht.random.seed(1234) for split in [1, None]: for mode in ["reduced", "r"]: - # note that split = 1 can be handeled for arbitrary shapes + # note that split = 1 can be handled for arbitrary shapes for shape in [ (20 * ht.MPI_WORLD.size + 1, 40 * ht.MPI_WORLD.size), (20 * ht.MPI_WORLD.size, 20 * ht.MPI_WORLD.size), (40 * ht.MPI_WORLD.size - 1, 20 * 
ht.MPI_WORLD.size), ]: - for dtype in [ht.float32, ht.float64]: + for dtype in dtypes: dtypetol = 1e-3 if dtype == ht.float32 else 1e-6 mat = ht.random.randn(*shape, dtype=dtype, split=split) qr = ht.linalg.qr(mat, mode=mode) @@ -72,12 +76,20 @@ def test_qr_split1orNone(self): ) def test_qr_split0(self): + if self.is_mps: + dtypes = [ht.float32] + else: + dtypes = [ht.float32, ht.float64] split = 0 for procs_to_merge in [0, 2, 3]: for mode in ["reduced", "r"]: - # split = 0 can be handeled only for tall skinny matrices s.t. the local chunks are at least square too - for shape in [(40 * ht.MPI_WORLD.size + 1, 40), (40 * ht.MPI_WORLD.size, 20)]: - for dtype in [ht.float32, ht.float64]: + # split = 0 can be handled only for tall skinny matrices s.t. the local chunks are at least square too + for shape in [ + (20 * ht.MPI_WORLD.size + 1, 40 * ht.MPI_WORLD.size), + (20 * ht.MPI_WORLD.size, 20 * ht.MPI_WORLD.size), + (40 * ht.MPI_WORLD.size - 1, 20 * ht.MPI_WORLD.size), + ]: + for dtype in dtypes: dtypetol = 1e-3 if dtype == ht.float32 else 1e-6 mat = ht.random.randn(*shape, dtype=dtype, split=split) @@ -124,13 +136,45 @@ def test_qr_split0(self): ) ) + def test_batched_qr_splitNone(self): + # two batch dimensions, float64 data type, "split = None" (split batch axis) + x = ht.random.rand(2, 2 * ht.MPI_WORLD.size, 10, 9, dtype=ht.float32, split=1) + _, r = ht.linalg.qr(x, mode="r") + self.assertEqual(r.shape, (2, 2 * ht.MPI_WORLD.size, 9, 9)) + self.assertEqual(r.split, 1) + + def test_batched_qr_split1(self): + # skip float64 tests on MPS + if not self.is_mps: + # two batch dimensions, float64 data type, "split = 1" (last dimension) + ht.random.seed(0) + x = ht.random.rand(3, 2, 50, ht.MPI_WORLD.size * 5 + 3, dtype=ht.float64, split=3) + q, r = ht.linalg.qr(x) + batched_id = ht.stack([ht.eye(q.shape[3], dtype=ht.float64) for _ in range(6)]).reshape( + 3, 2, q.shape[3], q.shape[3] + ) + + self.assertTrue( + ht.allclose(q.transpose([0, 1, 3, 2]) @ q, batched_id, atol=1e-6, rtol=1e-6) + ) + self.assertTrue(ht.allclose(q @ r, x, atol=1e-6, rtol=1e-6)) + + def test_batched_qr_split0(self): + ht.random.seed(424242) + # one batch dimension, float32 data type, "split = 0" (second last dimension) + x = ht.random.randn( + 8, ht.MPI_WORLD.size * 10 + 3, ht.MPI_WORLD.size * 10 - 1, dtype=ht.float32, split=1 + ) + q, r = ht.linalg.qr(x) + batched_id = ht.stack([ht.eye(q.shape[2], dtype=ht.float32) for _ in range(q.shape[0])]) + + self.assertTrue(ht.allclose(q.transpose([0, 2, 1]) @ q, batched_id, atol=1e-3, rtol=1e-3)) + self.assertTrue(ht.allclose(q @ r, x, atol=1e-3, rtol=1e-3)) + def test_wronginputs(self): # test wrong input type with self.assertRaises(TypeError): ht.linalg.qr([1, 2, 3]) - # test too many input dimensions - with self.assertRaises(ValueError): - ht.linalg.qr(ht.zeros((10, 10, 10))) # wrong data type for mode with self.assertRaises(TypeError): ht.linalg.qr(ht.zeros((10, 10)), mode=1) @@ -148,13 +192,6 @@ def test_wronginputs(self): # test wrong procs_to_merge with self.assertRaises(ValueError): ht.linalg.qr(ht.zeros((10, 10)), procs_to_merge=1) - # test wrong shape - with self.assertRaises(ValueError): - ht.linalg.qr(ht.zeros((10, 10, 10))) # test wrong dtype with self.assertRaises(TypeError): ht.linalg.qr(ht.zeros((10, 10), dtype=ht.int32)) - # test wrong shape for split=0 - if ht.MPI_WORLD.size > 1: - with self.assertRaises(ValueError): - ht.linalg.qr(ht.zeros((10, 10), split=0)) diff --git a/heat/core/linalg/tests/test_solver.py b/heat/core/linalg/tests/test_solver.py index 
660ab995d6..944305b63e 100644 --- a/heat/core/linalg/tests/test_solver.py +++ b/heat/core/linalg/tests/test_solver.py @@ -31,9 +31,14 @@ def test_cg(self): ht.linalg.cg(A, b, A) def test_lanczos(self): + # single precision tolerance for torch.inv() is pretty bad + tolerance = 1e-3 + + dtype, atol = (ht.float32, tolerance) if self.is_mps else (ht.float64, 1e-12) + # define positive definite matrix (n,n), split = 0 n = 100 - A = ht.random.randn(n, n, dtype=ht.float64, split=0) + A = ht.random.randn(n, n, dtype=dtype, split=0) B = A @ A.T # Lanczos decomposition with iterations m = n V, T = ht.lanczos(B, m=n) @@ -41,32 +46,27 @@ def test_lanczos(self): self.assertTrue(T.dtype is B.dtype) # V must be unitary V_inv = ht.linalg.inv(V) - self.assertTrue(ht.allclose(V_inv, V.T)) + self.assertTrue(ht.allclose(V_inv, V.T, atol=atol)) # V T V.T must be = B, V transposed = V inverse lanczos_B = V @ T @ V_inv - self.assertTrue(ht.allclose(lanczos_B, B)) + self.assertTrue(ht.allclose(lanczos_B, B, atol=atol)) # complex128, output buffers - A = ( - ht.random.rand(n, n, dtype=ht.float64, split=0) - + ht.random.rand(n, n, dtype=ht.float64, split=0) * 1j - ) - A_conj = ht.conj(A) - B = A @ A_conj.T - m = n - V_out = ht.zeros((n, m), dtype=B.dtype, split=B.split, device=B.device, comm=B.comm) - T_out = ht.zeros((m, m), dtype=ht.float64, device=B.device, comm=B.comm) - # Lanczos decomposition with iterations m = n - ht.lanczos(B, m=m, V_out=V_out, T_out=T_out) - # V must be unitary - V_inv = ht.linalg.inv(V_out) - self.assertTrue(ht.allclose(V_inv, ht.conj(V_out).T)) - # V T V* must be = B, V conjugate transpose = V inverse - lanczos_B = V_out @ T_out @ V_inv - self.assertTrue(ht.allclose(lanczos_B, B)) - - # single precision tolerance for torch.inv() is pretty bad - tolerance = 1e-3 + if not self.is_mps: + A = ht.random.rand(n, n, dtype=ht.complex128, split=0) + A_conj = ht.conj(A) + B = A @ A_conj.T + m = n + V_out = ht.zeros((n, m), dtype=B.dtype, split=B.split, device=B.device, comm=B.comm) + T_out = ht.zeros((m, m), dtype=ht.float64, device=B.device, comm=B.comm) + # Lanczos decomposition with iterations m = n + ht.lanczos(B, m=m, V_out=V_out, T_out=T_out) + # V must be unitary + V_inv = ht.linalg.inv(V_out) + self.assertTrue(ht.allclose(V_inv, ht.conj(V_out).T)) + # V T V* must be = B, V conjugate transpose = V inverse + lanczos_B = V_out @ T_out @ V_inv + self.assertTrue(ht.allclose(lanczos_B, B)) # float32, pre_defined v0, split mismatch A = ht.random.randn(n, n, dtype=ht.float32, split=0) @@ -77,46 +77,46 @@ def test_lanczos(self): V, T = ht.lanczos(B, m=n, v0=v0) self.assertTrue(V.dtype is B.dtype) self.assertTrue(T.dtype is B.dtype) - # V must be unitary - V_inv = ht.linalg.inv(V) - self.assertTrue(ht.allclose(V_inv, V.T, atol=tolerance)) - # V T V.T must be = B, V transposed = V inverse - lanczos_B = V @ T @ V_inv - self.assertTrue(ht.allclose(lanczos_B, B, atol=tolerance)) + # # skipping the following tests as torch.inv on float32 is too imprecise + # # V must be unitary + # V_inv = ht.linalg.inv(V) + # self.assertTrue(ht.allclose(V_inv, V.T, atol=atol)) + # # V T V.T must be = B, V transposed = V inverse + # lanczos_B = V @ T @ V_inv + # self.assertTrue(ht.allclose(lanczos_B, B, atol=atol)) # complex64 - A = ( - ht.random.randn(n, n, dtype=ht.float32, split=0) - + ht.random.randn(n, n, dtype=ht.float32, split=0) * 1j - ) - A_conj = ht.conj(A) - B = A @ A_conj.T - # Lanczos decomposition with iterations m = n - V, T = ht.lanczos(B, m=n) - # V must be unitary - # V T V* must be = B, V conjugate 
transpose = V inverse - V_conj = ht.conj(V) - lanczos_B = V @ T @ V_conj.T - self.assertTrue(ht.allclose(lanczos_B, B, atol=tolerance)) + if not self.is_mps: + # in principle, MPS supports complex64, but many operations are not implemented, e.g. matmul, div + A = ht.random.randn(n, n, dtype=ht.complex64, split=0) + A_conj = ht.conj(A) + B = A @ A_conj.T + # Lanczos decomposition with iterations m = n + V, T = ht.lanczos(B, m=n) + # V must be unitary + # V T V* must be = B, V conjugate transpose = V inverse + V_conj = ht.conj(V) + lanczos_B = V @ T @ V_conj.T + self.assertTrue(ht.allclose(lanczos_B, B, atol=tolerance)) # non-distributed - A = ht.random.randn(n, n, dtype=ht.float64, split=None) + A = ht.random.randn(n, n, dtype=dtype, split=None) B = A @ A.T # Lanczos decomposition with iterations m = n m = n V_out = ht.zeros((n, m), dtype=B.dtype, split=B.split, device=B.device, comm=B.comm) - T_out = ht.zeros((m, m), dtype=ht.float64, device=B.device, comm=B.comm) + T_out = ht.zeros((m, m), dtype=dtype, device=B.device, comm=B.comm) ht.lanczos(B, m=m, V_out=V_out, T_out=T_out) self.assertTrue(V_out.dtype is B.dtype) self.assertTrue(T_out.dtype is B.real.dtype) # V must be unitary V_inv = ht.linalg.inv(V_out) - self.assertTrue(ht.allclose(V_inv, V_out.T)) + self.assertTrue(ht.allclose(V_inv, V_out.T, atol=atol)) # without output buffers V, T = ht.lanczos(B, m=m) # V T V.T must be = B, V transposed = V inverse lanczos_B = V @ T @ V.T - self.assertTrue(ht.allclose(lanczos_B, B)) + self.assertTrue(ht.allclose(lanczos_B, B, atol=atol)) with self.assertRaises(TypeError): V, T = ht.lanczos(B, m="3") @@ -199,19 +199,20 @@ def test_solve_triangular(self): self.assertTrue(ht.equal(res, c)) # batched tests - batch_shapes = [ - (10,), - ( - 4, - 4, - 4, - 20, - ), - ] + if self.is_mps: + # reduction ops on tensors with ndim > 4 are not supported on MPS + # see e.g. 
https://github.com/pytorch/pytorch/issues/129960 + # fmt: off + batch_shapes = [(10,),] + # fmt: on + else: + # fmt: off + batch_shapes = [(10,), (4, 4, 4, 20,),] + # fmt: on m = 100 # data dimension size # exceptions - batch_shape = batch_shapes[1] + batch_shape = batch_shapes[-1] at = torch.rand((*batch_shape, m, m)) # at += torch.eye(k) @@ -235,7 +236,6 @@ def test_solve_triangular(self): for batch_shape in batch_shapes: # batch_shape = tuple() # no batch dimensions - at = torch.rand((*batch_shape, m, m)) # at += torch.eye(k) at += 1e2 * torch.ones_like(at) # make gaussian elimination more stable @@ -254,7 +254,6 @@ def test_solve_triangular(self): b.resplit_(s1) res = ht.linalg.solve_triangular(a, b) - self.assertTrue(ht.allclose(c, res)) # split in batch dimension @@ -264,5 +263,4 @@ def test_solve_triangular(self): c.resplit_(s) res = ht.linalg.solve_triangular(a, b) - self.assertTrue(ht.allclose(c, res)) diff --git a/heat/core/linalg/tests/test_svd.py b/heat/core/linalg/tests/test_svd.py index e25a5acd12..5badcb242d 100644 --- a/heat/core/linalg/tests/test_svd.py +++ b/heat/core/linalg/tests/test_svd.py @@ -8,7 +8,11 @@ class TestTallSkinnySVD(TestCase): def test_tallskinny_split0(self): - for dtype in [ht.float32, ht.float64]: + if self.is_mps: + dtypes = [ht.float32] + else: + dtypes = [ht.float32, ht.float64] + for dtype in dtypes: for n_merge in [0, None]: tol = 1e-5 if dtype == ht.float32 else 1e-10 X = ht.random.randn(ht.MPI_WORLD.size * 10 + 3, 10, split=0, dtype=dtype) @@ -30,7 +34,11 @@ def test_tallskinny_split0(self): self.assertTrue(ht.all(S >= 0)) def test_shortfat_split1(self): - for dtype in [ht.float32, ht.float64]: + if self.is_mps: + dtypes = [ht.float32] + else: + dtypes = [ht.float32, ht.float64] + for dtype in dtypes: tol = 1e-5 if dtype == ht.float32 else 1e-10 X = ht.random.randn(10, ht.MPI_WORLD.size * 10 + 3, split=1, dtype=dtype) U, S, V = ht.linalg.svd(X) @@ -48,7 +56,11 @@ def test_shortfat_split1(self): self.assertTrue(ht.all(S >= 0)) def test_singvals_only(self): - for dtype in [ht.float32, ht.float64]: + if self.is_mps: + dtypes = [ht.float32] + else: + dtypes = [ht.float32, ht.float64] + for dtype in dtypes: tol = 1e-5 if dtype == ht.float32 else 1e-10 for split in [0, 1]: shape = ( @@ -69,16 +81,6 @@ def test_singvals_only(self): ) def test_wrong_inputs(self): - # split = 0 but not tall skinny - X = ht.random.randn(10, 10, split=0) - if ht.MPI_WORLD.size > 1: - with self.assertRaises(ValueError): - ht.linalg.svd(X) - # split = 1 but not short fat - X = ht.random.randn(10, 10, split=1) - if ht.MPI_WORLD.size > 1: - with self.assertRaises(ValueError): - ht.linalg.svd(X) # full_matrices = True X = ht.random.rand(10 * ht.MPI_WORLD.size, 5, split=0) with self.assertRaises(NotImplementedError): @@ -100,3 +102,47 @@ def test_wrong_inputs(self): X = ht.ones((10 * ht.MPI_WORLD.size, 10), split=0, dtype=ht.int32) with self.assertRaises(TypeError): ht.linalg.svd(X) + + +class TestZoloSVD(TestCase): + def test_full_svd(self): + shapes = [(100, 100), (117, 100), (100, 103)] + splits = [None, 0, 1] + dtypes = [ht.float32, ht.float64] + for shape in shapes: + for split in splits: + for dtype in dtypes: + with self.subTest(shape=shape, split=split, dtype=dtype): + ht.random.seed(123) + tol = 1e-2 if dtype == ht.float32 else 1e-2 + X = ht.random.randn(*shape, split=split, dtype=dtype) + if split is not None and ht.MPI_WORLD.size > 1: + with self.assertWarns(UserWarning): + U, S, V = ht.linalg.svd(X) + else: + U, S, V = ht.linalg.svd(X) + self.assertTrue( + 
ht.allclose( + U.T @ U, ht.eye(U.shape[1], dtype=dtype), rtol=tol, atol=tol + ) + ) + self.assertTrue( + ht.allclose( + V.T @ V, ht.eye(V.shape[1], dtype=dtype), rtol=tol, atol=tol + ) + ) + self.assertTrue(ht.allclose(U @ ht.diag(S) @ V.T, X, rtol=tol, atol=tol)) + self.assertTrue(ht.all(S >= 0)) + + def test_options_full_svd(self): + # only singular values + X = ht.random.rand(101, 100, split=0, dtype=ht.float32) + S = ht.linalg.svd(X, compute_uv=False) + + # prescribed r_max_zolopd + U, S, V = ht.linalg.svd(X, r_max_zolopd=1) + + # catch error if r_max_zolopd is not provided properly + if X.is_distributed(): + with self.assertRaises(ValueError): + ht.linalg.svd(X, r_max_zolopd=0) diff --git a/heat/core/linalg/tests/test_svdtools.py b/heat/core/linalg/tests/test_svdtools.py index 2946f4f88c..dbb517ab76 100644 --- a/heat/core/linalg/tests/test_svdtools.py +++ b/heat/core/linalg/tests/test_svdtools.py @@ -10,157 +10,178 @@ class TestHSVD(TestCase): def test_hsvd_rank_part1(self): - nprocs = MPI.COMM_WORLD.Get_size() - test_matrices = [ - ht.random.randn(50, 15 * nprocs, dtype=ht.float32, split=1), - ht.random.randn(50, 15 * nprocs, dtype=ht.float64, split=1), - ht.random.randn(15 * nprocs, 50, dtype=ht.float32, split=0), - ht.random.randn(15 * nprocs, 50, dtype=ht.float64, split=0), - ht.random.randn(15 * nprocs, 50, dtype=ht.float32, split=None), - ht.random.randn(50, 15 * nprocs, dtype=ht.float64, split=None), - ht.zeros((50, 15 * nprocs), dtype=ht.float32, split=1), - ] - rtols = [1e-1, 1e-2, 1e-3] - ranks = [5, 10, 15] - - # check if hsvd yields "reasonable" results for random matrices, i.e. - # U (resp. V) is orthogonal for split=1 (resp. split=0) - # hsvd_rank yields the correct rank - # the true reconstruction error is <= error estimate - # for hsvd_rtol: true reconstruction error <= rtol (provided no further options) - - for A in test_matrices: - if A.dtype == ht.float64: - dtype_tol = 1e-8 - if A.dtype == ht.float32: - dtype_tol = 1e-3 + # not testing on MPS for now as torch.norm() is unstable + if not self.is_mps: + nprocs = MPI.COMM_WORLD.Get_size() + test_matrices = [ + ht.random.randn(50, 15 * nprocs, dtype=ht.float32, split=1), + ht.random.randn(50, 15 * nprocs, dtype=ht.float64, split=1), + ht.random.randn(15 * nprocs, 50, dtype=ht.float32, split=0), + ht.random.randn(15 * nprocs, 50, dtype=ht.float64, split=0), + ht.random.randn(15 * nprocs, 50, dtype=ht.float32, split=None), + ht.random.randn(50, 15 * nprocs, dtype=ht.float64, split=None), + ht.zeros((50, 15 * nprocs), dtype=ht.float32, split=1), + ] + rtols = [1e-1, 1e-2, 1e-3] + ranks = [5, 10, 15] - for r in ranks: - U, sigma, V, err_est = ht.linalg.hsvd_rank(A, r, compute_sv=True, silent=True) - hsvd_rk = U.shape[1] - - if ht.norm(A) > 0: - self.assertEqual(hsvd_rk, r) - if A.split == 1: - U_orth_err = ( - ht.norm( - U.T @ U - - ht.eye(hsvd_rk, dtype=U.dtype, split=U.T.split, device=U.device) + # check if hsvd yields "reasonable" results for random matrices, i.e. + # U (resp. V) is orthogonal for split=1 (resp. 
split=0) + # hsvd_rank yields the correct rank + # the true reconstruction error is <= error estimate + # for hsvd_rtol: true reconstruction error <= rtol (provided no further options) + + for i, A in enumerate(test_matrices): + print("Testing hsvd for matrix {} of {}".format(i + 1, len(test_matrices))) + if A.dtype == ht.float64: + dtype_tol = 1e-8 + if A.dtype == ht.float32: + dtype_tol = 1e-3 + + for r in ranks: + U, sigma, V, err_est = ht.linalg.hsvd_rank(A, r, compute_sv=True, silent=True) + hsvd_rk = U.shape[1] + + if ht.norm(A) > 0: + self.assertEqual(hsvd_rk, r) + if A.split == 1: + U_orth_err = ( + ht.norm( + U.T @ U + - ht.eye( + hsvd_rk, dtype=U.dtype, split=U.T.split, device=U.device + ) + ) + / hsvd_rk**0.5 ) - / hsvd_rk**0.5 - ) - self.assertTrue(U_orth_err <= dtype_tol) - if A.split == 0: - V_orth_err = ( - ht.norm( - V.T @ V - - ht.eye(hsvd_rk, dtype=V.dtype, split=V.T.split, device=V.device) + self.assertTrue(U_orth_err <= dtype_tol) + if A.split == 0: + V_orth_err = ( + ht.norm( + V.T @ V + - ht.eye( + hsvd_rk, dtype=V.dtype, split=V.T.split, device=V.device + ) + ) + / hsvd_rk**0.5 ) - / hsvd_rk**0.5 - ) - self.assertTrue(V_orth_err <= dtype_tol) - true_rel_err = ht.norm(U @ ht.diag(sigma) @ V.T - A) / ht.norm(A) - self.assertTrue(true_rel_err <= err_est or true_rel_err < dtype_tol) - else: - self.assertEqual(hsvd_rk, 1) - self.assertEqual(ht.norm(U), 0) - self.assertEqual(ht.norm(sigma), 0) - self.assertEqual(ht.norm(V), 0) - - # check if wrong parameter choice is caught - with self.assertRaises(RuntimeError): - ht.linalg.hsvd_rank(A, r, maxmergedim=4) - - for tol in rtols: - U, sigma, V, err_est = ht.linalg.hsvd_rtol(A, tol, compute_sv=True, silent=True) - hsvd_rk = U.shape[1] - - if ht.norm(A) > 0: - if A.split == 1: - U_orth_err = ( - ht.norm( - U.T @ U - - ht.eye(hsvd_rk, dtype=U.dtype, split=U.T.split, device=U.device) + self.assertTrue(V_orth_err <= dtype_tol) + true_rel_err = ht.norm(U @ ht.diag(sigma) @ V.T - A) / ht.norm(A) + self.assertTrue(true_rel_err <= err_est or true_rel_err < dtype_tol) + else: + self.assertEqual(hsvd_rk, 1) + self.assertEqual(ht.norm(U), 0) + self.assertEqual(ht.norm(sigma), 0) + self.assertEqual(ht.norm(V), 0) + + # check if wrong parameter choice is caught + with self.assertRaises(RuntimeError): + ht.linalg.hsvd_rank(A, r, maxmergedim=4) + + for tol in rtols: + U, sigma, V, err_est = ht.linalg.hsvd_rtol(A, tol, compute_sv=True, silent=True) + hsvd_rk = U.shape[1] + + if ht.norm(A) > 0: + if A.split == 1: + U_orth_err = ( + ht.norm( + U.T @ U + - ht.eye( + hsvd_rk, dtype=U.dtype, split=U.T.split, device=U.device + ) + ) + / hsvd_rk**0.5 ) - / hsvd_rk**0.5 - ) - # print(U_orth_err) - self.assertTrue(U_orth_err <= dtype_tol) - if A.split == 0: - V_orth_err = ( - ht.norm( - V.T @ V - - ht.eye(hsvd_rk, dtype=V.dtype, split=V.T.split, device=V.device) + # print(U_orth_err) + self.assertTrue(U_orth_err <= dtype_tol) + if A.split == 0: + V_orth_err = ( + ht.norm( + V.T @ V + - ht.eye( + hsvd_rk, dtype=V.dtype, split=V.T.split, device=V.device + ) + ) + / hsvd_rk**0.5 ) - / hsvd_rk**0.5 - ) - self.assertTrue(V_orth_err <= dtype_tol) - true_rel_err = ht.norm(U @ ht.diag(sigma) @ V.T - A) / ht.norm(A) - self.assertTrue(true_rel_err <= err_est or true_rel_err < dtype_tol) - self.assertTrue(true_rel_err <= tol) - else: - self.assertEqual(hsvd_rk, 1) - self.assertEqual(ht.norm(U), 0) - self.assertEqual(ht.norm(sigma), 0) - self.assertEqual(ht.norm(V), 0) - - # check if wrong parameter choices are catched - with self.assertRaises(ValueError): - 
ht.linalg.hsvd_rtol(A, tol, maxmergedim=4) - with self.assertRaises(ValueError): - ht.linalg.hsvd_rtol(A, tol, maxmergedim=10, maxrank=11) - with self.assertRaises(ValueError): - ht.linalg.hsvd_rtol(A, tol, no_of_merges=1) - - # check if wrong input arrays are catched - wrong_test_matrices = [ - 0, - ht.ones((50, 15 * nprocs), dtype=ht.int8, split=1), - ht.ones((50, 15 * nprocs), dtype=ht.int16, split=1), - ht.ones((50, 15 * nprocs), dtype=ht.int32, split=1), - ht.ones((50, 15 * nprocs), dtype=ht.int64, split=1), - ht.ones((50, 15 * nprocs), dtype=ht.complex64, split=1), - ht.ones((50, 15 * nprocs), dtype=ht.complex128, split=1), - ] - - for A in wrong_test_matrices: - with self.assertRaises(TypeError): - ht.linalg.hsvd_rank(A, 5) - with self.assertRaises(TypeError): - ht.linalg.hsvd_rank(A, 1e-1) - - wrong_test_matrices = [ - ht.ones((15, 15 * nprocs, 15), split=1, dtype=ht.float64), - ht.ones(15 * nprocs, split=0, dtype=ht.float64), - ] - for wrong_arr in wrong_test_matrices: - with self.assertRaises(ValueError): - ht.linalg.hsvd_rank(wrong_arr, 5) - with self.assertRaises(ValueError): - ht.linalg.hsvd_rtol(wrong_arr, 1e-1) - - # check if compute_sv=False yields the correct number of outputs (=1) - self.assertEqual(len(ht.linalg.hsvd_rank(test_matrices[0], 5)), 2) - self.assertEqual(len(ht.linalg.hsvd_rtol(test_matrices[0], 5e-1)), 2) + self.assertTrue(V_orth_err <= dtype_tol) + true_rel_err = ht.norm(U @ ht.diag(sigma) @ V.T - A) / ht.norm(A) + self.assertTrue(true_rel_err <= err_est or true_rel_err < dtype_tol) + self.assertTrue(true_rel_err <= tol) + else: + self.assertEqual(hsvd_rk, 1) + self.assertEqual(ht.norm(U), 0) + self.assertEqual(ht.norm(sigma), 0) + self.assertEqual(ht.norm(V), 0) + + # check if wrong parameter choices are catched + with self.assertRaises(ValueError): + ht.linalg.hsvd_rtol(A, tol, maxmergedim=4) + with self.assertRaises(ValueError): + ht.linalg.hsvd_rtol(A, tol, maxmergedim=10, maxrank=11) + with self.assertRaises(ValueError): + ht.linalg.hsvd_rtol(A, tol, no_of_merges=1) + + # check if wrong input arrays are catched + wrong_test_matrices = [ + 0, + ht.ones((50, 15 * nprocs), dtype=ht.int8, split=1), + ht.ones((50, 15 * nprocs), dtype=ht.int16, split=1), + ht.ones((50, 15 * nprocs), dtype=ht.int32, split=1), + ht.ones((50, 15 * nprocs), dtype=ht.int64, split=1), + ht.ones((50, 15 * nprocs), dtype=ht.complex64, split=1), + ht.ones((50, 15 * nprocs), dtype=ht.complex128, split=1), + ] + + for A in wrong_test_matrices: + with self.assertRaises(TypeError): + ht.linalg.hsvd_rank(A, 5) + with self.assertRaises(TypeError): + ht.linalg.hsvd_rank(A, 1e-1) + + wrong_test_matrices = [ + ht.ones((15, 15 * nprocs, 15), split=1, dtype=ht.float64), + ht.ones(15 * nprocs, split=0, dtype=ht.float64), + ] + for wrong_arr in wrong_test_matrices: + with self.assertRaises(ValueError): + ht.linalg.hsvd_rank(wrong_arr, 5) + with self.assertRaises(ValueError): + ht.linalg.hsvd_rtol(wrong_arr, 1e-1) + + # check if compute_sv=False yields the correct number of outputs (=1) + self.assertEqual(len(ht.linalg.hsvd_rank(test_matrices[0], 5)), 2) + self.assertEqual(len(ht.linalg.hsvd_rtol(test_matrices[0], 5e-1)), 2) def test_hsvd_rank_part2(self): # check if hsvd_rank yields correct results for maxrank <= truerank nprocs = MPI.COMM_WORLD.Get_size() true_rk = max(10, nprocs) - test_matrices_low_rank = [ - ht.utils.data.matrixgallery.random_known_rank( - 50, 15 * nprocs, true_rk, split=1, dtype=ht.float32 - ), - ht.utils.data.matrixgallery.random_known_rank( - 50, 15 * nprocs, true_rk, 
split=1, dtype=ht.float32 - ), - ht.utils.data.matrixgallery.random_known_rank( - 15 * nprocs, 50, true_rk, split=0, dtype=ht.float64 - ), - ht.utils.data.matrixgallery.random_known_rank( - 15 * nprocs, 50, true_rk, split=0, dtype=ht.float64 - ), - ] + if self.is_mps: + test_matrices_low_rank = [ + ht.utils.data.matrixgallery.random_known_rank( + 50, 15 * nprocs, true_rk, split=1, dtype=ht.float32 + ), + ht.utils.data.matrixgallery.random_known_rank( + 50, 15 * nprocs, true_rk, split=1, dtype=ht.float32 + ), + ] + else: + test_matrices_low_rank = [ + ht.utils.data.matrixgallery.random_known_rank( + 50, 15 * nprocs, true_rk, split=1, dtype=ht.float32 + ), + ht.utils.data.matrixgallery.random_known_rank( + 50, 15 * nprocs, true_rk, split=1, dtype=ht.float32 + ), + ht.utils.data.matrixgallery.random_known_rank( + 15 * nprocs, 50, true_rk, split=0, dtype=ht.float64 + ), + ht.utils.data.matrixgallery.random_known_rank( + 15 * nprocs, 50, true_rk, split=0, dtype=ht.float64 + ), + ] for mat in test_matrices_low_rank: A = mat[0] @@ -193,3 +214,120 @@ def test_hsvd_rank_part2(self): self.assertTrue(U_orth_err <= dtype_tol) self.assertTrue(V_orth_err <= dtype_tol) self.assertTrue(true_rel_err <= dtype_tol) + + +class TestRSVD(TestCase): + def test_rsvd(self): + if self.is_mps: + dtypes = [ht.float32] + else: + dtypes = [ht.float32, ht.float64] + for dtype in dtypes: + dtype_tol = 1e-4 if dtype == ht.float32 else 1e-10 + for split in [0, 1, None]: + X = ht.random.randn(200, 200, dtype=dtype, split=split) + for rank in [ht.MPI_WORLD.size, 10]: + for n_oversamples in [5, 10]: + for power_iter in [0, 1, 2, 3]: + U, S, V = ht.linalg.rsvd( + X, rank, n_oversamples=n_oversamples, power_iter=power_iter + ) + self.assertEqual(U.shape, (X.shape[0], rank)) + self.assertEqual(S.shape, (rank,)) + self.assertEqual(V.shape, (X.shape[1], rank)) + self.assertTrue(ht.all(S >= 0)) + self.assertTrue( + ht.allclose( + U.T @ U, + ht.eye(rank, dtype=U.dtype, split=U.split), + rtol=dtype_tol, + atol=dtype_tol, + ) + ) + self.assertTrue( + ht.allclose( + V.T @ V, + ht.eye(rank, dtype=V.dtype, split=V.split), + rtol=dtype_tol, + atol=dtype_tol, + ) + ) + + def test_rsvd_catch_wrong_inputs(self): + X = ht.random.randn(10, 10) + # wrong dtype for rank + with self.assertRaises(TypeError): + ht.linalg.rsvd(X, "a") + # rank zero + with self.assertRaises(ValueError): + ht.linalg.rsvd(X, 0) + # wrong dtype for n_oversamples + with self.assertRaises(TypeError): + ht.linalg.rsvd(X, 10, n_oversamples="a") + # n_oversamples negative + with self.assertRaises(ValueError): + ht.linalg.rsvd(X, 10, n_oversamples=-1) + # wrong dtype for power_iter + with self.assertRaises(TypeError): + ht.linalg.rsvd(X, 10, power_iter="a") + # power_iter negative + with self.assertRaises(ValueError): + ht.linalg.rsvd(X, 10, power_iter=-1) + + +class TestISVD(TestCase): + def test_isvd(self): + ht.random.seed(27183) + if self.is_mps: + dtypes = [ht.float32] + else: + dtypes = [ht.float32, ht.float64] + for dtype in dtypes: + dtypetol = 1e-5 if dtype == ht.float32 else 1e-10 + for old_split in [0, 1, None]: + X_old, SVD_old = ht.utils.data.matrixgallery.random_known_rank( + 250, 25, 3 * ht.MPI_WORLD.size, split=old_split, dtype=dtype + ) + U_old, S_old, V_old = SVD_old + for new_split in [0, 1, None]: + new_data = ht.random.randn( + 250, 2 * ht.MPI_WORLD.size, split=new_split, dtype=dtype + ) + U_new, S_new, V_new = ht.linalg.isvd(new_data, U_old, S_old, V_old) + # check if U_new, V_new are orthogonal + self.assertTrue( + ht.allclose( + U_new.T @ U_new, + 
ht.eye(U_new.shape[1], dtype=U_new.dtype, split=U_new.split), + atol=dtypetol, + rtol=dtypetol, + ) + ) + self.assertTrue( + ht.allclose( + V_new.T @ V_new, + ht.eye(V_new.shape[1], dtype=V_new.dtype, split=V_new.split), + atol=dtypetol, + rtol=dtypetol, + ) + ) + # check if entries of S_new are positive + self.assertTrue(ht.all(S_new >= 0)) + # check if the reconstruction error is small + X_new = ht.hstack([X_old, new_data.resplit_(X_old.split)]) + X_rec = U_new @ ht.diag(S_new) @ V_new.T + self.assertTrue(ht.allclose(X_rec, X_new, atol=dtypetol, rtol=dtypetol)) + + def test_isvd_catch_wrong_inputs(self): + u_old = ht.zeros((10, 2)) + s_old = ht.zeros((3,)) + v_old = ht.zeros((5, 3)) + new_data = ht.zeros((11, 5)) + with self.assertRaises(ValueError): + ht.linalg.isvd(new_data, u_old, s_old, v_old) + s_old = ht.zeros((2,)) + with self.assertRaises(ValueError): + ht.linalg.isvd(new_data, u_old, s_old, v_old) + v_old = ht.zeros((5, 2)) + with self.assertRaises(ValueError): + ht.linalg.isvd(new_data, u_old, s_old, v_old) diff --git a/heat/core/logical.py b/heat/core/logical.py index 59051006ed..9bb8204008 100644 --- a/heat/core/logical.py +++ b/heat/core/logical.py @@ -48,7 +48,7 @@ def all( reference to ``out`` is returned. Parameters - ----------- + ---------- x : DNDarray Input array or object that can be converted to an array. axis : None or int or Tuple[int,...], optional @@ -63,7 +63,7 @@ def all( With this option, the result will broadcast correctly against the original array. Examples - --------- + -------- >>> x = ht.random.randn(4, 5) >>> x DNDarray([[ 0.7199, 1.3718, 1.5008, 0.3435, 1.2884], @@ -114,7 +114,7 @@ def allclose( for all elements of ``x`` and ``y``, ``False`` otherwise Parameters - ----------- + ---------- x : DNDarray First array to compare y : DNDarray @@ -128,7 +128,7 @@ def allclose( the output array. Examples - --------- + -------- >>> x = ht.float32([[2, 2], [2, 2]]) >>> ht.allclose(x, x) True @@ -179,7 +179,7 @@ def any( The returning array is one dimensional unless axis is not ``None``. Parameters - ----------- + ---------- x : DNDarray Input tensor axis : int, optional @@ -193,7 +193,7 @@ def any( With this option, the result will broadcast correctly against the original array. Examples - --------- + -------- >>> x = ht.float32([[0.3, 0, 0.5]]) >>> x.any() DNDarray([True], dtype=ht.bool, device=cpu:0, split=None) @@ -234,7 +234,7 @@ def isclose( within the given tolerance. If both ``x`` and ``y`` are scalars, returns a single boolean value. Parameters - ----------- + ---------- x : DNDarray Input array to compare. y : DNDarray @@ -390,14 +390,14 @@ def logical_and(x: DNDarray, y: DNDarray) -> DNDarray: Compute the truth value of ``x`` AND ``y`` element-wise. Returns a boolean :class:`~heat.core.dndarray.DNDarray` containing the truth value of ``x`` AND ``y`` element-wise. Parameters - ----------- + ---------- x : DNDarray Input array of same shape y : DNDarray Input array of same shape Examples - --------- + -------- >>> ht.logical_and(ht.array([True, False]), ht.array([False, False])) DNDarray([False, False], dtype=ht.bool, device=cpu:0, split=None) """ @@ -411,7 +411,7 @@ def logical_not(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: Computes the element-wise logical NOT of the given input :class:`~heat.core.dndarray.DNDarray`. 
Parameters - ----------- + ---------- x : DNDarray Input array out : DNDarray, optional @@ -419,7 +419,7 @@ def logical_not(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: The output is a :class:`~heat.core.dndarray.DNDarray` with ``datatype=bool``. Examples - --------- + -------- >>> ht.logical_not(ht.array([True, False])) DNDarray([False, True], dtype=ht.bool, device=cpu:0, split=None) """ @@ -432,14 +432,14 @@ def logical_or(x: DNDarray, y: DNDarray) -> DNDarray: input :class:`~heat.core.dndarray.DNDarray`. Parameters - ----------- + ---------- x : DNDarray Input array of same shape y : DNDarray Input array of same shape Examples - --------- + -------- >>> ht.logical_or(ht.array([True, False]), ht.array([False, False])) DNDarray([ True, False], dtype=ht.bool, device=cpu:0, split=None) """ @@ -453,14 +453,14 @@ def logical_xor(x: DNDarray, y: DNDarray) -> DNDarray: Computes the element-wise logical XOR of the given input :class:`~heat.core.dndarray.DNDarray`. Parameters - ----------- + ---------- x : DNDarray Input array of same shape y : DNDarray Input array of same shape Examples - --------- + -------- >>> ht.logical_xor(ht.array([True, False, True]), ht.array([True, False, False])) DNDarray([False, False, True], dtype=ht.bool, device=cpu:0, split=None) """ @@ -473,7 +473,7 @@ def __sanitize_close_input(x: DNDarray, y: DNDarray) -> Tuple[DNDarray, DNDarray Provides copies of ``x`` and ``y`` distributed along the same split axis (if original split axes do not match). Parameters - ----------- + ---------- x : DNDarray The left-hand side operand. y : DNDarray @@ -493,7 +493,7 @@ def sanitize_input_type( In the former case, the scalar is wrapped in a :class:`~heat.core.dndarray.DNDarray`. Parameters - ----------- + ---------- x : Union[int, float, DNDarray] The left-hand side operand. 
y : Union[int, float, DNDarray] diff --git a/heat/core/manipulations.py b/heat/core/manipulations.py index 02ec09ec93..d685f4d5ad 100644 --- a/heat/core/manipulations.py +++ b/heat/core/manipulations.py @@ -192,7 +192,7 @@ def broadcast_to(x: DNDarray, shape: Tuple[int, ...]) -> DNDarray: -------- >>> import heat as ht >>> a = ht.arange(100, split=0) - >>> b = ht.broadcast_to(a, (10,100)) + >>> b = ht.broadcast_to(a, (10, 100)) >>> b.shape (10, 100) >>> b.split @@ -493,7 +493,12 @@ def concatenate(arrays: Sequence[DNDarray, ...], axis: int = 0) -> DNDarray: raise RuntimeError("Communicators of passed arrays mismatch.") # identify common data type + is_mps = arr0.larray.is_mps or arr1.larray.is_mps out_dtype = types.promote_types(arr0.dtype, arr1.dtype) + if is_mps and out_dtype == types.float64: + warnings.warn("MPS does not support float64, using float32 instead") + out_dtype = types.float32 + if arr0.dtype != out_dtype: arr0 = out_dtype(arr0, device=arr0.device) if arr1.dtype != out_dtype: @@ -503,7 +508,9 @@ def concatenate(arrays: Sequence[DNDarray, ...], axis: int = 0) -> DNDarray: # no splits, local concat if s0 is None and s1 is None: return factories.array( - torch.cat((arr0.larray, arr1.larray), dim=axis), device=arr0.device, comm=arr0.comm + torch.cat((arr0.larray, arr1.larray), dim=axis), + device=arr0.device, + comm=arr0.comm, ) # non-matching splits when both arrays are split @@ -770,10 +777,12 @@ def diag(a: DNDarray, offset: int = 0) -> DNDarray: (abs(offset),), dtype=a.dtype, split=None, device=a.device, comm=a.comm ) a = concatenate((padding, a)) - indices_x = torch.arange(max(0, min(abs(offset) - off, lshape[0])), lshape[0]) + indices_x = torch.arange( + max(0, min(abs(offset) - off, lshape[0])), lshape[0], device=a.device.torch_device + ) else: # Offset = 0 values on main diagonal - indices_x = torch.arange(0, lshape[0]) + indices_x = torch.arange(0, lshape[0], device=a.device.torch_device) indices_y = indices_x + off + offset a.balance_() @@ -887,7 +896,7 @@ def dsplit(x: Sequence[DNDarray, ...], indices_or_sections: Iterable) -> List[DN the array is always split along the third axis provided the array dimension is greater than or equal to 3. 
See Also - ------ + -------- :func:`split` :func:`hsplit` :func:`vsplit` @@ -945,7 +954,7 @@ def expand_dims(a: DNDarray, axis: int) -> DNDarray: Examples -------- - >>> x = ht.array([1,2]) + >>> x = ht.array([1, 2]) >>> x.shape (2,) >>> y = ht.expand_dims(x, axis=0) @@ -1023,7 +1032,7 @@ def flatten(a: DNDarray) -> DNDarray: Examples -------- - >>> a = ht.array([[[1,2],[3,4]],[[5,6],[7,8]]]) + >>> a = ht.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]) >>> ht.flatten(a) DNDarray([1, 2, 3, 4, 5, 6, 7, 8], dtype=ht.int64, device=cpu:0, split=None) """ @@ -1031,14 +1040,22 @@ if a.split is None: return factories.array( - torch.flatten(a.larray), dtype=a.dtype, is_split=None, device=a.device, comm=a.comm + torch.flatten(a.larray), + dtype=a.dtype, + is_split=None, + device=a.device, + comm=a.comm, ) if a.split > 0: a = resplit(a, 0) a = factories.array( - torch.flatten(a.larray), dtype=a.dtype, is_split=a.split, device=a.device, comm=a.comm + torch.flatten(a.larray), + dtype=a.dtype, + is_split=a.split, + device=a.device, + comm=a.comm, ) a.balance_() @@ -1068,12 +1085,12 @@ def flip(a: DNDarray, axis: Union[int, Tuple[int, ...]] = None) -> DNDarray: Examples -------- - >>> a = ht.array([[0,1],[2,3]]) + >>> a = ht.array([[0, 1], [2, 3]]) >>> ht.flip(a, [0]) DNDarray([[2, 3], [0, 1]], dtype=ht.int64, device=cpu:0, split=None) - >>> b = ht.array([[0,1,2],[3,4,5]], split=1) - >>> ht.flip(a, [0,1]) + >>> b = ht.array([[0, 1, 2], [3, 4, 5]], split=1) + >>> ht.flip(b, [0, 1]) (1/2) tensor([5,4,3]) (2/2) tensor([2,1,0]) """ @@ -1087,7 +1104,7 @@ def flip(a: DNDarray, axis: Union[int, Tuple[int, ...]] = None) -> DNDarray: flipped = torch.flip(a.larray, axis) - if a.split not in axis: + if not a.is_distributed() or a.split not in axis: return factories.array( flipped, dtype=a.dtype, is_split=a.split, device=a.device, comm=a.comm ) @@ -1125,11 +1142,11 @@ def fliplr(a: DNDarray) -> DNDarray: Examples -------- - >>> a = ht.array([[0,1],[2,3]]) + >>> a = ht.array([[0, 1], [2, 3]]) >>> ht.fliplr(a) DNDarray([[1, 0], [3, 2]], dtype=ht.int64, device=cpu:0, split=None) - >>> b = ht.array([[0,1,2],[3,4,5]], split=0) + >>> b = ht.array([[0, 1, 2], [3, 4, 5]], split=0) >>> ht.fliplr(b) (1/2) tensor([[2, 1, 0]]) (2/2) tensor([[5, 4, 3]]) @@ -1153,11 +1170,11 @@ def flipud(a: DNDarray) -> DNDarray: Examples -------- - >>> a = ht.array([[0,1],[2,3]]) + >>> a = ht.array([[0, 1], [2, 3]]) >>> ht.flipud(a) DNDarray([[2, 3], [0, 1]], dtype=ht.int64, device=cpu:0, split=None) - >>> b = ht.array([[0,1,2],[3,4,5]], split=0) + >>> b = ht.array([[0, 1, 2], [3, 4, 5]], split=0) >>> ht.flipud(b) (1/2) tensor([3,4,5]) (2/2) tensor([0,1,2]) @@ -1253,19 +1270,19 @@ def hstack(arrays: Sequence[DNDarray, ...]) -> DNDarray: Examples -------- - >>> a = ht.array((1,2,3)) - >>> b = ht.array((2,3,4)) - >>> ht.hstack((a,b)).larray + >>> a = ht.array((1, 2, 3)) + >>> b = ht.array((2, 3, 4)) + >>> ht.hstack((a, b)).larray [0/1] tensor([1, 2, 3, 2, 3, 4]) [1/1] tensor([1, 2, 3, 2, 3, 4]) - >>> a = ht.array((1,2,3), split=0) - >>> b = ht.array((2,3,4), split=0) - >>> ht.hstack((a,b)).larray + >>> a = ht.array((1, 2, 3), split=0) + >>> b = ht.array((2, 3, 4), split=0) + >>> ht.hstack((a, b)).larray [0/1] tensor([1, 2, 3]) [1/1] tensor([2, 3, 4]) - >>> a = ht.array([[1],[2],[3]], split=0) - >>> b = ht.array([[2],[3],[4]], split=0) - >>> ht.hstack((a,b)).larray + >>> a = ht.array([[1], [2], [3]], split=0) + >>> b = ht.array([[2], [3], [4]], split=0) + >>> ht.hstack((a, b)).larray [0/1] tensor([[1, 2], [0/1]
[2, 3]]) [1/1] tensor([[3, 4]]) @@ -1391,7 +1408,7 @@ def pad( Notes - ----------- + ----- This function follows the principle of datatype integrity. Therefore, an array can only be padded with values of the same datatype. All values that violate this rule are implicitly cast to the datatype of the `DNDarray`. @@ -1399,9 +1416,9 @@ def pad( Examples -------- >>> a = torch.arange(2 * 3 * 4).reshape(2, 3, 4) - >>> b = ht.array(a, split = 0) + >>> b = ht.array(a, split=0) Pad last dimension - >>> c = ht.pad(b, (2,1), constant_values=1) + >>> c = ht.pad(b, (2, 1), constant_values=1) tensor([[[ 1, 1, 0, 1, 2, 3, 1], [ 1, 1, 4, 5, 6, 7, 1], [ 1, 1, 8, 9, 10, 11, 1]], @@ -1409,7 +1426,7 @@ def pad( [ 1, 1, 16, 17, 18, 19, 1], [ 1, 1, 20, 21, 22, 23, 1]]]) Pad last 2 dimensions - >>> d = ht.pad(b, [(1,0), (2,1)]) + >>> d = ht.pad(b, [(1, 0), (2, 1)]) DNDarray([[[ 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 1, 2, 3, 0], [ 0, 0, 4, 5, 6, 7, 0], @@ -1420,7 +1437,7 @@ def pad( [ 0, 0, 16, 17, 18, 19, 0], [ 0, 0, 20, 21, 22, 23, 0]]], dtype=ht.int64, device=cpu:0, split=0) Pad last 3 dimensions - >>> e = ht.pad(b, ((2,1), [1,0], (2,1))) + >>> e = ht.pad(b, ((2, 1), [1, 0], (2, 1))) DNDarray([[[ 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0], @@ -1683,7 +1700,7 @@ def ravel(a: DNDarray) -> DNDarray: array to collapse Notes - ------ + ----- Returning a view of distributed data is only possible when `split != 0`. The returned DNDarray may be unbalanced. Otherwise, data must be communicated among processes, and `ravel` falls back to `flatten`. @@ -1693,9 +1710,9 @@ def ravel(a: DNDarray) -> DNDarray: Examples -------- - >>> a = ht.ones((2,3), split=0) + >>> a = ht.ones((2, 3), split=0) >>> b = ht.ravel(a) - >>> a[0,0] = 4 + >>> a[0, 0] = 4 >>> b DNDarray([4., 1., 1., 1., 1., 1.], dtype=ht.float32, device=cpu:0, split=0) """ @@ -1809,15 +1826,15 @@ def repeat(a: Iterable, repeats: Iterable, axis: Optional[int] = None) -> DNDarr >>> ht.repeat(3, 4) DNDarray([3, 3, 3, 3]) - >>> x = ht.array([[1,2],[3,4]]) + >>> x = ht.array([[1, 2], [3, 4]]) >>> ht.repeat(x, 2) DNDarray([1, 1, 2, 2, 3, 3, 4, 4]) - >>> x = ht.array([[1,2],[3,4]]) + >>> x = ht.array([[1, 2], [3, 4]]) >>> ht.repeat(x, [0, 1, 2, 0]) DNDarray([2, 3, 3]) - >>> ht.repeat(x, [1,2], axis=0) + >>> ht.repeat(x, [1, 2], axis=0) DNDarray([[1, 2], [3, 4], [3, 4]]) @@ -2030,6 +2047,8 @@ def reshape(a: DNDarray, *shape: Union[int, Tuple[int, ...]], **kwargs) -> DNDar The distribution axis of the reshaped array. If `new_split` is not provided, the reshaped array will have: - the same split axis as the input array, if the original dimensionality is unchanged; - split axis 0, if the number of dimensions is modified by reshaping. + **kwargs + Extra keyword arguments. 
Raises ------ @@ -2046,14 +2065,14 @@ def reshape(a: DNDarray, *shape: Union[int, Tuple[int, ...]], **kwargs) -> DNDar Examples -------- - >>> a = ht.zeros((3,4)) - >>> ht.reshape(a, (4,3)) + >>> a = ht.zeros((3, 4)) + >>> ht.reshape(a, (4, 3)) DNDarray([[0., 0., 0.], [0., 0., 0.], [0., 0., 0.], [0., 0., 0.]], dtype=ht.float32, device=cpu:0, split=None) >>> a = ht.linspace(0, 14, 8, split=0) - >>> ht.reshape(a, (2,4)) + >>> ht.reshape(a, (2, 4)) (1/2) tensor([[0., 2., 4., 6.]]) (2/2) tensor([[ 8., 10., 12., 14.]]) # 3-dim array, distributed along axis 1 @@ -2066,7 +2085,7 @@ def reshape(a: DNDarray, *shape: Union[int, Tuple[int, ...]], **kwargs) -> DNDar [[0.0680, 0.4944, 0.4114, 0.6669], [0.6423, 0.2625, 0.5413, 0.2225], [0.0197, 0.5079, 0.4739, 0.4387]]], dtype=ht.float32, device=cpu:0, split=1) - >>> a.reshape(-1, 3) # reshape to 2-dim array: split axis will be set to 0 + >>> a.reshape(-1, 3) # reshape to 2-dim array: split axis will be set to 0 DNDarray([[0.5525, 0.5434, 0.9477], [0.9503, 0.4165, 0.3924], [0.3310, 0.3935, 0.1008], @@ -2075,7 +2094,7 @@ def reshape(a: DNDarray, *shape: Union[int, Tuple[int, ...]], **kwargs) -> DNDar [0.6669, 0.6423, 0.2625], [0.5413, 0.2225, 0.0197], [0.5079, 0.4739, 0.4387]], dtype=ht.float32, device=cpu:0, split=0) - >>> a.reshape(2,3,2,2, new_split=1) # reshape to 4-dim array, specify distribution axis + >>> a.reshape(2, 3, 2, 2, new_split=1) # reshape to 4-dim array, specify distribution axis DNDarray([[[[0.5525, 0.5434], [0.9477, 0.9503]], @@ -2250,7 +2269,7 @@ def roll( Examples -------- - >>> a = ht.arange(20).reshape((4,5)) + >>> a = ht.arange(20).reshape((4, 5)) >>> a DNDarray([[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], @@ -2268,6 +2287,9 @@ def roll( [ 0, 1, 2, 3, 4]], dtype=ht.int32, device=cpu:0, split=None) """ sanitation.sanitize_in(x) + if isinstance(axis, list): + axis = tuple(axis) + axis = stride_tricks.sanitize_axis(x.shape, axis) if axis is None: return roll(x.flatten(), shift, 0).reshape(x.shape, new_split=x.split) @@ -2275,7 +2297,18 @@ def roll( # inputs are ints if isinstance(shift, int): if isinstance(axis, int): - if x.split is not None and (axis == x.split or (axis + x.ndim) == x.split): + if not x.is_distributed(): + return DNDarray( + torch.roll(x.larray, shift, axis), + gshape=x.shape, + dtype=x.dtype, + split=x.split, + device=x.device, + comm=x.comm, + balanced=x.balanced, + ) + # x is distributed + if axis == x.split: # roll along split axis size = x.comm.Get_size() rank = x.comm.Get_rank() @@ -2284,9 +2317,6 @@ def roll( lshape_map = x.create_lshape_map(force_check=False)[:, x.split] cumsum_map = torch.cumsum(lshape_map, dim=0) # cumulate along axis indices = torch.arange(size, device=x.device.torch_device) - # NOTE Can be removed when min version>=1.9 - if "1.8." 
in torch.__version__: # pragma: no cover - lshape_map = lshape_map.to(torch.int64) index_map = torch.repeat_interleave(indices, lshape_map) # index -> process # compute index positions @@ -2329,7 +2359,17 @@ def roll( raise TypeError(f"axis must be a int, list or a tuple, got {type(axis)}") shift = [shift] * len(axis) - + if not x.is_distributed(): + return DNDarray( + torch.roll(x.larray, shift, axis), + gshape=x.shape, + dtype=x.dtype, + split=x.split, + device=x.device, + comm=x.comm, + balanced=x.balanced, + ) + # x is distributed return roll(x, shift, axis) else: # input must be tuples now @@ -2354,7 +2394,18 @@ def roll( if not isinstance(axis[i], int): raise TypeError(f"Element {i} in axis is not an integer, got {type(axis[i])}") - if x.split is not None and (x.split in axis or (x.split - x.ndim) in axis): + if not x.is_distributed(): + return DNDarray( + torch.roll(x.larray, shift, axis), + gshape=x.shape, + dtype=x.dtype, + split=x.split, + device=x.device, + comm=x.comm, + balanced=x.balanced, + ) + # x is distributed + if x.split in axis: # remove split axis elements shift_split = 0 for y in (x.split, x.split - x.ndim): @@ -2416,7 +2467,7 @@ def rot90(m: DNDarray, k: int = 1, axes: Sequence[int, int] = (0, 1)) -> DNDarra Examples -------- - >>> m = ht.array([[1,2],[3,4]], dtype=ht.int) + >>> m = ht.array([[1, 2], [3, 4]], dtype=ht.int) >>> m DNDarray([[1, 2], [3, 4]], dtype=ht.int32, device=cpu:0, split=None) @@ -2426,8 +2477,8 @@ def rot90(m: DNDarray, k: int = 1, axes: Sequence[int, int] = (0, 1)) -> DNDarra >>> ht.rot90(m, 2) DNDarray([[4, 3], [2, 1]], dtype=ht.int32, device=cpu:0, split=None) - >>> m = ht.arange(8).reshape((2,2,2)) - >>> ht.rot90(m, 1, (1,2)) + >>> m = ht.arange(8).reshape((2, 2, 2)) + >>> ht.rot90(m, 1, (1, 2)) DNDarray([[[1, 3], [0, 2]], @@ -2536,7 +2587,7 @@ def sort(a: DNDarray, axis: int = -1, descending: bool = False, out: Optional[DN """ stride_tricks.sanitize_axis(a.shape, axis) - if a.split is None or axis != a.split: + if not a.is_distributed() or axis != a.split: # sorting is not affected by split -> we can just sort along the axis final_result, final_indices = torch.sort(a.larray, dim=axis, descending=descending) @@ -2791,7 +2842,7 @@ def split(x: DNDarray, indices_or_sections: Iterable, axis: int = 0) -> List[DND Examples -------- - >>> x = ht.arange(12).reshape((4,3)) + >>> x = ht.arange(12).reshape((4, 3)) >>> ht.split(x, 2) [ DNDarray([[0, 1, 2], [3, 4, 5]]), @@ -2989,7 +3040,7 @@ def squeeze(x: DNDarray, axis: Union[int, Tuple[int, ...]] = None) -> DNDarray: Split semantics: see Notes below. Parameters - ----------- + ---------- x : DNDarray Input data. axis : None or int or Tuple[int,...], optional @@ -3006,9 +3057,9 @@ def squeeze(x: DNDarray, axis: Union[int, Tuple[int, ...]] = None) -> DNDarray: which, depending on the squeeze axis, may result in a lower numerical `split` value (see Examples). 
Examples - --------- + -------- >>> import heat as ht - >>> a = ht.random.randn(1,3,1,5) + >>> a = ht.random.randn(1, 3, 1, 5) >>> a DNDarray([[[[-0.2604, 1.3512, 0.1175, 0.4197, 1.3590]], [[-0.2777, -1.1029, 0.0697, -1.3074, -1.1931]], @@ -3021,11 +3072,11 @@ def squeeze(x: DNDarray, axis: Union[int, Tuple[int, ...]] = None) -> DNDarray: DNDarray([[-0.2604, 1.3512, 0.1175, 0.4197, 1.3590], [-0.2777, -1.1029, 0.0697, -1.3074, -1.1931], [-0.4512, -1.2348, -1.1479, -0.0242, 0.4050]], dtype=ht.float32, device=cpu:0, split=None) - >>> ht.squeeze(a,axis=0).shape + >>> ht.squeeze(a, axis=0).shape (3, 1, 5) - >>> ht.squeeze(a,axis=-2).shape + >>> ht.squeeze(a, axis=-2).shape (1, 3, 5) - >>> ht.squeeze(a,axis=1).shape + >>> ht.squeeze(a, axis=1).shape Traceback (most recent call last): ... ValueError: Dimension along axis 1 is not 1 for shape (1, 3, 1, 5) @@ -3137,7 +3188,7 @@ def stack( -------- >>> a = ht.arange(20).reshape((4, 5)) >>> b = ht.arange(20, 40).reshape((4, 5)) - >>> ht.stack((a,b), axis=0).larray + >>> ht.stack((a, b), axis=0).larray tensor([[[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14], @@ -3149,7 +3200,7 @@ def stack( >>> # distributed DNDarrays, 3 processes, stack along last dimension >>> a = ht.arange(20, split=0).reshape(4, 5) >>> b = ht.arange(20, 40, split=0).reshape(4, 5) - >>> ht.stack((a,b), axis=-1).larray + >>> ht.stack((a, b), axis=-1).larray [0/2] tensor([[[ 0, 20], [0/2] [ 1, 21], [0/2] [ 2, 22], @@ -3241,7 +3292,7 @@ def swapaxes(x: DNDarray, axis1: int, axis2: int) -> DNDarray: Examples -------- - >>> x = ht.array([[[0,1],[2,3]],[[4,5],[6,7]]]) + >>> x = ht.array([[[0, 1], [2, 3]], [[4, 5], [6, 7]]]) >>> ht.swapaxes(x, 0, 1) DNDarray([[[0, 1], [4, 5]], @@ -3270,7 +3321,7 @@ def swapaxes(x: DNDarray, axis1: int, axis2: int) -> DNDarray: def unique( a: DNDarray, sorted: bool = False, return_inverse: bool = False, axis: int = None -) -> Tuple[DNDarray, torch.tensor]: +) -> Tuple[DNDarray, DNDarray]: """ Finds and returns the unique elements of a `DNDarray`. If return_inverse is `True`, the second tensor will hold the list of inverse indices @@ -3302,7 +3353,7 @@ def unique( array([[2, 3], [3, 1]]) """ - if a.split is None: + if not a.is_distributed(): torch_output = torch.unique( a.larray, sorted=sorted, return_inverse=return_inverse, dim=axis ) @@ -3467,8 +3518,12 @@ def unique( result.resplit_(a.split) return_value = result + if return_inverse: - return_value = [return_value, inverse_indices.to(a.device.torch_device)] + inverse_indices = factories.array( + inverse_indices, dtype=inverse_pos.dtype, device=a.device, comm=a.comm + ) + return_value = [return_value, inverse_indices] return return_value @@ -3485,6 +3540,7 @@ def unfold(a: DNDarray, axis: int, size: int, step: int = 1): """ Returns a DNDarray which contains all slices of size `size` in the axis `axis`. Behaves like torch.Tensor.unfold for DNDarrays. [torch.Tensor.unfold](https://pytorch.org/docs/stable/generated/torch.Tensor.unfold.html) + Parameters ---------- a : DNDarray @@ -3649,7 +3705,13 @@ def resplit(arr: DNDarray, axis: Optional[int] = None) -> DNDarray: Examples -------- - >>> a = ht.zeros((4, 5,), split=0) + >>> a = ht.zeros( + ... ( + ... 4, + ... 5, + ... ), + ... split=0, + ... ) >>> a.lshape (0/2) (2, 5) (1/2) (2, 5) @@ -3659,7 +3721,13 @@ def resplit(arr: DNDarray, axis: Optional[int] = None) -> DNDarray: >>> b.lshape (0/2) (4, 5) (1/2) (4, 5) - >>> a = ht.zeros((4, 5,), split=0) + >>> a = ht.zeros( + ... ( + ... 4, + ... 5, + ... ), + ... split=0, + ... 
) >>> a.lshape (0/2) (2, 5) (1/2) (2, 5) @@ -3762,8 +3830,17 @@ def _axis2axisResplit( return target_larray -DNDarray._axis2axisResplit = lambda self, comm, source_larray, source_split, source_tiles, target_larray, target_split, target_tile: _axis2axisResplit( - comm, source_larray, source_split, source_tiles, target_larray, target_split, target_tile +DNDarray._axis2axisResplit = ( + lambda self, + comm, + source_larray, + source_split, + source_tiles, + target_larray, + target_split, + target_tile: _axis2axisResplit( + comm, source_larray, source_split, source_tiles, target_larray, target_split, target_tile + ) ) DNDarray._axis2axisResplit.__doc__ = _axis2axisResplit.__doc__ @@ -3872,7 +3949,7 @@ def vstack(arrays: Sequence[DNDarray, ...]) -> DNDarray: 1-D arrays must have the same length. Notes - ------- + ----- The split axis will be switched to 1 in the case that both elements are 1D and split=0 See Also @@ -3888,21 +3965,21 @@ def vstack(arrays: Sequence[DNDarray, ...]) -> DNDarray: -------- >>> a = ht.array([1, 2, 3]) >>> b = ht.array([2, 3, 4]) - >>> ht.vstack((a,b)).larray + >>> ht.vstack((a, b)).larray [0/1] tensor([[1, 2, 3], [0/1] [2, 3, 4]]) [1/1] tensor([[1, 2, 3], [1/1] [2, 3, 4]]) >>> a = ht.array([1, 2, 3], split=0) >>> b = ht.array([2, 3, 4], split=0) - >>> ht.vstack((a,b)).larray + >>> ht.vstack((a, b)).larray [0/1] tensor([[1, 2], [0/1] [2, 3]]) [1/1] tensor([[3], [1/1] [4]]) >>> a = ht.array([[1], [2], [3]], split=0) >>> b = ht.array([[2], [3], [4]], split=0) - >>> ht.vstack((a,b)).larray + >>> ht.vstack((a, b)).larray [0] tensor([[1], [0] [2], [0] [3]]) @@ -3949,7 +4026,7 @@ def tile(x: DNDarray, reps: Sequence[int, ...]) -> DNDarray: Examples -------- - >>> x = ht.arange(12).reshape((4,3)).resplit_(0) + >>> x = ht.arange(12).reshape((4, 3)).resplit_(0) >>> x DNDarray([[ 0, 1, 2], [ 3, 4, 5], @@ -4185,7 +4262,7 @@ def topk( (Not Stable for split arrays) Parameters - ----------- + ---------- a: DNDarray Input data k: int @@ -4202,16 +4279,16 @@ def topk( Examples -------- >>> a = ht.array([1, 2, 3]) - >>> ht.topk(a,2) + >>> ht.topk(a, 2) (DNDarray([3, 2], dtype=ht.int64, device=cpu:0, split=None), DNDarray([2, 1], dtype=ht.int64, device=cpu:0, split=None)) - >>> a = ht.array([[1,2,3],[1,2,3]]) - >>> ht.topk(a,2,dim=1) + >>> a = ht.array([[1, 2, 3], [1, 2, 3]]) + >>> ht.topk(a, 2, dim=1) (DNDarray([[3, 2], [3, 2]], dtype=ht.int64, device=cpu:0, split=None), DNDarray([[2, 1], [2, 1]], dtype=ht.int64, device=cpu:0, split=None)) - >>> a = ht.array([[1,2,3],[1,2,3]], split=1) - >>> ht.topk(a,2,dim=1) + >>> a = ht.array([[1, 2, 3], [1, 2, 3]], split=1) + >>> ht.topk(a, 2, dim=1) (DNDarray([[3, 2], [3, 2]], dtype=ht.int64, device=cpu:0, split=1), DNDarray([[2, 1], @@ -4267,10 +4344,16 @@ def local_topk(*args, **kwargs): metadata = torch.tensor( [k, dim, largest, sorted, local_shape_len, *local_shape], device=indices.device ) - send_buffer = torch.cat( - (metadata.double(), result.double().flatten(), indices.flatten().double()) - ) + if result.is_mps: + # MPS does not support double precision + send_buffer = torch.cat( + (metadata.float(), result.float().flatten(), indices.flatten().float()) + ) + else: + send_buffer = torch.cat( + (metadata.double(), result.double().flatten(), indices.flatten().double()) + ) return send_buffer gres = _operations.__reduce_op( diff --git a/heat/core/memory.py b/heat/core/memory.py index 72b8cc7d9b..dbf2d8723e 100644 --- a/heat/core/memory.py +++ b/heat/core/memory.py @@ -1,5 +1,5 @@ """ -This module changes the internal memory of an array. 
+Utilities to manage the internal memory of an array. """ import torch @@ -21,7 +21,7 @@ def copy(x: DNDarray) -> DNDarray: Examples -------- - >>> a = ht.array([1,2,3]) + >>> a = ht.array([1, 2, 3]) >>> b = ht.copy(a) >>> b DNDarray([1, 2, 3], dtype=ht.int64, device=cpu:0, split=None) @@ -44,7 +44,7 @@ def sanitize_memory_layout(x: torch.Tensor, order: str = "C") -> torch.Tensor: Return the given object with memory layout as defined below. The default memory distribution is assumed. Parameters - ----------- + ---------- x: torch.Tensor Input data order: str, optional. diff --git a/heat/core/printing.py b/heat/core/printing.py index 5f9f95218d..660c333e39 100644 --- a/heat/core/printing.py +++ b/heat/core/printing.py @@ -205,6 +205,14 @@ def __str__(dndarray) -> str: ) +def __repr__(dndarray) -> str: + """ + Returns a printable representation of the passed DNDarray. + Unlike the __str__ method, which prints a representation targeted at users, this method targets developers by showing key internal parameters of the DNDarray. + """ + # field list assumed from the docstring above; the original literal was lost to markup stripping + return f"<DNDarray(shape={dndarray.shape}, dtype={dndarray.dtype}, split={dndarray.split}, device={dndarray.device}, comm={dndarray.comm})>" + + def _torch_data(dndarray, summarize) -> DNDarray: """ Extracts the data to be printed from the DNDarray in form of a torch tensor and returns it. diff --git a/heat/core/random.py b/heat/core/random.py index 5d22dbcc08..02e8bfad7d 100644 --- a/heat/core/random.py +++ b/heat/core/random.py @@ -129,8 +129,8 @@ def __counter_sequence( c_0 = (__counter & (max_count << 64)) >> 64 c_1 = __counter & max_count total_elements = torch.prod(torch.tensor(shape)) - if total_elements.item() > 2 * max_count: - raise ValueError(f"Shape is to big with {total_elements} elements") + # if total_elements.item() > 2 * max_count: + # raise ValueError(f"Shape is too big with {total_elements} elements") if split is None: values = total_elements.item() // 2 + total_elements.item() % 2 @@ -216,7 +216,7 @@ def __counter_sequence( tmp_counter += used_values __counter = tmp_counter & 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF # 128-bit mask - return x_0.contiguous(), x_1.contiguous(), lshape, lslice + return x_0, x_1, lshape, lslice def get_state() -> Tuple[str, int, int, int, float]: @@ -332,7 +332,7 @@ def normal( Examples -------- - >>> ht.random.normal(ht.array([-1,2]), ht.array([0.5, 2]), (2,)) + >>> ht.random.normal(ht.array([-1, 2]), ht.array([0.5, 2]), (2,)) DNDarray([-1.4669, 1.6596], dtype=ht.float64, device=cpu:0, split=None) """ if not (isinstance(mean, (float, int))) and not isinstance(mean, DNDarray): @@ -348,23 +348,26 @@ return mean + std * standard_normal(shape, dtype, split, device, comm) -def permutation(x: Union[int, DNDarray]) -> DNDarray: +def permutation(x: Union[int, DNDarray], **kwargs) -> DNDarray: """ Randomly permute a sequence, or return a permuted range. If ``x`` is a multi-dimensional array, it is only shuffled along its first index. Parameters - ----------- + ---------- x : int or DNDarray If ``x`` is an integer, call :func:`heat.random.randperm`. If ``x`` is an array, make a copy and shuffle the elements randomly. + kwargs : dict, optional + Additional keyword arguments passed to :func:`heat.random.randperm` if ``x`` is an integer. + See Also - ----------- + -------- :func:`heat.random.randperm` for randomly permuted ranges.
Examples - ---------- + -------- >>> ht.random.permutation(10) DNDarray([9, 1, 5, 4, 8, 2, 7, 6, 3, 0], dtype=ht.int64, device=cpu:0, split=None) >>> ht.random.permutation(ht.array([1, 4, 9, 12, 15])) @@ -381,7 +384,7 @@ Thus, the array containing these indices needs to fit into the memory of a single MPI-process. """ if isinstance(x, int): - return randperm(x) + return randperm(x, **kwargs) if not isinstance(x, DNDarray): raise TypeError("x must be int or DNDarray") @@ -434,7 +437,7 @@ def rand( - *args: List[int], + *d: int, dtype: Type[datatype] = types.float32, split: Optional[int] = None, device: Optional[Device] = None, @@ -446,7 +449,7 @@ Parameters ---------- - d1,d2,…,dn : List[int,...] + *d : int, optional The dimensions of the returned array, should all be positive. If no argument is given, a single random sample is generated. dtype : Type[datatype], optional @@ -472,11 +475,11 @@ DNDarray([0.1921, 0.9635, 0.5047], dtype=ht.float32, device=cpu:0, split=None) """ # if args are not set, generate a single sample - if not args: + if not d: shape = (1,) else: # ensure that the passed dimensions are positive integer-likes - shape = tuple(int(ele) for ele in args) + shape = tuple(int(ele) for ele in d) if any(ele <= 0 for ele in shape): raise ValueError("negative dimensions are not allowed") @@ -523,7 +526,7 @@ ) if split is None: x = x.resplit_(None) - if not args or shape == (): + if not d or shape == (): x = x.item() return x @@ -563,7 +566,7 @@ def randint( Handle to the nodes holding distributed parts or copies of this array. Raises - ------- + ------ TypeError If one of low or high is not an int. ValueError @@ -619,7 +622,6 @@ x_0, x_1 = __threefry32(x_0, x_1, seed=__seed) else: # torch.int64 x_0, x_1 = __threefry64(x_0, x_1, seed=__seed) - # stack the resulting sequence and normalize to given range values = torch.stack([x_0, x_1], dim=1).flatten()[lslice].reshape(lshape) # ATTENTION: this is biased and known, bias-free rejection sampling is difficult to do in parallel @@ -665,7 +667,7 @@ def random_integer( def randn( - *args: List[int], + *d: int, dtype: Type[datatype] = types.float32, split: Optional[int] = None, device: Optional[str] = None, @@ -676,7 +678,7 @@ Parameters ---------- - d1,d2,…,dn : List[int,...] + *d : int, optional The dimensions of the returned array, should all be positive. dtype : Type[datatype], optional The datatype of the returned values. Has to be one of :class:`~heat.core.types.float32` or @@ -697,7 +699,7 @@ Accepts arguments for mean and standard deviation. Raises - ------- + ------ TypeError If one of the dimensions in ``d`` is not an integer.
ValueError @@ -716,7 +718,7 @@ def randn( if __rng == "Threefry": # use threefry RNG and the Kundu transform to generate normally distributed random numbers # generate uniformly distributed random numbers first - normal_tensor = rand(*args, dtype=dtype, split=split, device=device, comm=comm) + normal_tensor = rand(*d, dtype=dtype, split=split, device=device, comm=comm) # convert the values to a normal distribution using the Kundu transform normal_tensor.larray = __kundu_transform(normal_tensor.larray) @@ -724,11 +726,11 @@ else: # use batchparallel RNG and torch's generation of normally distributed random numbers # if args are not set, generate a single sample - if not args: + if not d: shape = (1,) else: # ensure that the passed dimensions are positive integer-likes - shape = tuple(int(ele) for ele in args) + shape = tuple(int(ele) for ele in d) if any(ele <= 0 for ele in shape): raise ValueError("negative dimensions are not allowed") @@ -749,7 +751,7 @@ ) if split is None: x = x.resplit_(None) - if not args or shape == (): + if not d or shape == (): x = x.item() return x @@ -779,7 +781,7 @@ def randperm( Handle to the nodes holding distributed parts or copies of this array. Raises - ------- + ------ TypeError If ``n`` is not an integer. @@ -798,7 +800,7 @@ device = devices.sanitize_device(device) comm = communication.sanitize_comm(comm) perm = torch.randperm(n, dtype=dtype.torch_type(), device=device.torch_device) - if __rng != "Threefry": + if comm.Get_size() > 1 and __rng != "Threefry": comm.Bcast(perm, root=0) return factories.array(perm, dtype=dtype, device=device, split=split, comm=comm) diff --git a/heat/core/relational.py b/heat/core/relational.py index 940d4538df..19cd7646b8 100644 --- a/heat/core/relational.py +++ b/heat/core/relational.py @@ -16,6 +16,7 @@ from . import types from . import sanitation from . import factories +from .
import devices __all__ = [ "eq", @@ -48,9 +49,9 @@ def eq(x, y) -> DNDarray: The second operand involved in the comparison Examples - --------- + -------- >>> import heat as ht - >>> x = ht.float32([[1, 2],[3, 4]]) + >>> x = ht.float32([[1, 2], [3, 4]]) >>> ht.eq(x, 3.0) DNDarray([[False, False], [ True, False]], dtype=ht.bool, device=cpu:0, split=None) @@ -97,10 +98,10 @@ def equal(x: Union[DNDarray, float, int], y: Union[DNDarray, float, int]) -> boo The second operand involved in the comparison Examples - --------- + -------- >>> import heat as ht - >>> x = ht.float32([[1, 2],[3, 4]]) - >>> ht.equal(x, ht.float32([[1, 2],[3, 4]])) + >>> x = ht.float32([[1, 2], [3, 4]]) + >>> ht.equal(x, ht.float32([[1, 2], [3, 4]])) True >>> y = ht.float32([[2, 2], [2, 2]]) >>> ht.equal(x, y) @@ -144,7 +145,11 @@ def equal(x: Union[DNDarray, float, int], y: Union[DNDarray, float, int]) -> boo target_map[: x.comm.rank + 1, y.split].sum(), ) x = factories.array( - x.larray[tuple(idx)], is_split=y.split, copy=False, comm=x.comm, device=x.device + x.larray[tuple(idx)], + is_split=y.split, + copy=False, + comm=x.comm, + device=x.device, ) elif x.split is not None and y.split is None: if x.is_balanced(force_check=False): @@ -157,7 +162,11 @@ def equal(x: Union[DNDarray, float, int], y: Union[DNDarray, float, int]) -> boo target_map[: y.comm.rank + 1, x.split].sum(), ) y = factories.array( - y.larray[tuple(idx)], is_split=x.split, copy=False, comm=y.comm, device=y.device + y.larray[tuple(idx)], + is_split=x.split, + copy=False, + comm=y.comm, + device=y.device, ) elif x.split != y.split: raise ValueError( @@ -171,6 +180,9 @@ def equal(x: Union[DNDarray, float, int], y: Union[DNDarray, float, int]) -> boo y = y.balance() result_type = types.result_type(x, y) + is_mps = x.larray.is_mps or y.larray.is_mps + if is_mps and result_type is types.float64: + result_type = types.float32 x = x.astype(result_type) y = y.astype(result_type) @@ -196,9 +208,9 @@ def ge(x: Union[DNDarray, float, int], y: Union[DNDarray, float, int]) -> DNDarr The second operand to be compared less than or equal to first operand Examples - ------- + -------- >>> import heat as ht - >>> x = ht.float32([[1, 2],[3, 4]]) + >>> x = ht.float32([[1, 2], [3, 4]]) >>> ht.ge(x, 3.0) DNDarray([[False, False], [ True, True]], dtype=ht.bool, device=cpu:0, split=None) @@ -245,9 +257,9 @@ def gt(x: Union[DNDarray, float, int], y: Union[DNDarray, float, int]) -> DNDarr The second operand to be compared less than first operand Examples - ------- + -------- >>> import heat as ht - >>> x = ht.float32([[1, 2],[3, 4]]) + >>> x = ht.float32([[1, 2], [3, 4]]) >>> ht.gt(x, 3.0) DNDarray([[False, False], [False, True]], dtype=ht.bool, device=cpu:0, split=None) @@ -294,9 +306,9 @@ def le(x: Union[DNDarray, float, int], y: Union[DNDarray, float, int]) -> DNDarr The second operand to be compared greater than or equal to first operand Examples - ------- + -------- >>> import heat as ht - >>> x = ht.float32([[1, 2],[3, 4]]) + >>> x = ht.float32([[1, 2], [3, 4]]) >>> ht.le(x, 3.0) DNDarray([[ True, True], [ True, False]], dtype=ht.bool, device=cpu:0, split=None) @@ -343,9 +355,9 @@ def lt(x: Union[DNDarray, float, int], y: Union[DNDarray, float, int]) -> DNDarr The second operand to be compared greater than first operand Examples - ------- + -------- >>> import heat as ht - >>> x = ht.float32([[1, 2],[3, 4]]) + >>> x = ht.float32([[1, 2], [3, 4]]) >>> ht.lt(x, 3.0) DNDarray([[ True, True], [False, False]], dtype=ht.bool, device=cpu:0, split=None) @@ -393,9 +405,9 @@ def 
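The redistribution logic in ``equal`` above aligns operands whose splits differ before comparing; a small usage sketch:

    import heat as ht

    x = ht.arange(12, split=0).reshape((3, 4))  # distributed operand
    y = ht.arange(12).reshape((3, 4))           # replicated operand, split=None
    # equal() redistributes internally and returns a plain Python bool
    assert ht.equal(x, y)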
ne(x, y) -> DNDarray: The second operand involved in the comparison Examples - --------- + -------- >>> import heat as ht - >>> x = ht.float32([[1, 2],[3, 4]]) + >>> x = ht.float32([[1, 2], [3, 4]]) >>> ht.ne(x, 3.0) DNDarray([[ True, True], [False, True]], dtype=ht.bool, device=cpu:0, split=None) diff --git a/heat/core/rounding.py b/heat/core/rounding.py index dcee642b4c..32d5620ab9 100644 --- a/heat/core/rounding.py +++ b/heat/core/rounding.py @@ -45,7 +45,7 @@ def abs( precision. Raises - ------- + ------ TypeError If dtype is not a heat type. """ @@ -143,7 +143,7 @@ def clip(x: DNDarray, min, max, out: Optional[DNDarray] = None) -> DNDarray: the right shape to hold the output. Its type is preserved. Raises - ------- + ------ ValueError if either min or max is not set """ @@ -154,7 +154,13 @@ def clip(x: DNDarray, min, max, out: Optional[DNDarray] = None) -> DNDarray: if out is None: return dndarray.DNDarray( - x.larray.clamp(min, max), x.shape, x.dtype, x.split, x.device, x.comm, x.balanced + x.larray.clamp(min, max), + x.shape, + x.dtype, + x.split, + x.device, + x.comm, + x.balanced, ) sanitation.sanitize_out(out, x.gshape, x.split, x.device) @@ -237,7 +243,7 @@ def modf(x: DNDarray, out: Optional[Tuple[DNDarray, DNDarray]] = None) -> Tuple[ If not provided or ``None``, a freshly-allocated array is returned. Raises - ------- + ------ TypeError if ``x`` is not a :class:`~heat.core.dndarray.DNDarray` TypeError @@ -305,7 +311,7 @@ def round( precision. Raises - ------- + ------ TypeError if dtype is not a heat data type @@ -361,7 +367,7 @@ def sgn(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: >>> a = ht.array([-1, -0.5, 0, 0.5, 1]) >>> ht.sign(a) DNDarray([-1., -1., 0., 1., 1.], dtype=ht.float32, device=cpu:0, split=None) - >>> ht.sgn(ht.array([5-2j, 3+4j])) + >>> ht.sgn(ht.array([5 - 2j, 3 + 4j])) DNDarray([(0.9284766912460327-0.3713906705379486j), (0.6000000238418579+0.800000011920929j)], dtype=ht.complex64, device=cpu:0, split=None) """ return _operations.__local_op(torch.sgn, x, out) @@ -388,7 +394,7 @@ def sign(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: >>> a = ht.array([-1, -0.5, 0, 0.5, 1]) >>> ht.sign(a) DNDarray([-1., -1., 0., 1., 1.], dtype=ht.float32, device=cpu:0, split=None) - >>> ht.sign(ht.array([5-2j, 3+4j])) + >>> ht.sign(ht.array([5 - 2j, 3 + 4j])) DNDarray([(1+0j), (1+0j)], dtype=ht.complex64, device=cpu:0, split=None) """ # special case for complex values diff --git a/heat/core/sanitation.py b/heat/core/sanitation.py index a820c8b92e..cfebfb61dc 100644 --- a/heat/core/sanitation.py +++ b/heat/core/sanitation.py @@ -58,7 +58,7 @@ def sanitize_distribution( When the split-axes or sizes along the split-axis do not match. See Also - --------- + -------- :func:`~heat.core.dndarray.create_lshape_map` Function to create the lshape_map. """ @@ -139,8 +139,7 @@ def sanitize_distribution( ) elif not ( # False - target_balanced - and arg.is_balanced(force_check=False) + target_balanced and arg.is_balanced(force_check=False) ): # Split axes are the same and atleast one is not balanced current_map = arg.lshape_map out_map = current_map.clone() @@ -174,12 +173,29 @@ def sanitize_in(x: Any): raise TypeError(f"Input must be a DNDarray, is {type(x)}") +def sanitize_in_nd_realfloating(input: Any, inputname: str, allowed_ns: List[int]) -> None: + """ + Verify that input object ``input`` is a real floating point ``DNDarray`` with number of dimensions contained in ``allowed_ns``. + The argument ``inputname`` is used for error messages. 
+ """ + if not isinstance(input, DNDarray): + raise TypeError(f"Argument {inputname} needs to be a DNDarray but is {type(input)}.") + if input.ndim not in allowed_ns: + raise ValueError( + f"Argument {inputname} needs to be a {allowed_ns}-dimensional, but is {input.ndim}-dimensional." + ) + if not types.heat_type_is_realfloating(input.dtype): + raise TypeError( + f"Argument {inputname} needs to be a DNDarray with datatype float32 or float64, but data type is {input.dtype}." + ) + + def sanitize_infinity(x: Union[DNDarray, torch.Tensor]) -> Union[int, float]: """ Returns largest possible value for the ``dtype`` of the input array. Parameters - ----------- + ---------- x: Union[DNDarray, torch.Tensor] Input object. """ diff --git a/heat/core/signal.py b/heat/core/signal.py index 82cba98566..4f02482ae0 100644 --- a/heat/core/signal.py +++ b/heat/core/signal.py @@ -5,15 +5,15 @@ from .communication import MPI from .dndarray import DNDarray -from .types import promote_types +from .types import promote_types, float32, float64 from .manipulations import pad, flip -from .factories import array, zeros +from .factories import array, zeros, arange import torch.nn.functional as fc __all__ = ["convolve"] -def convolve(a: DNDarray, v: DNDarray, mode: str = "full") -> DNDarray: +def convolve(a: DNDarray, v: DNDarray, mode: str = "full", stride: int = 1) -> DNDarray: """ Returns the discrete, linear convolution of two one-dimensional `DNDarray`s or scalars. Unlike `numpy.signal.convolve`, if ``a`` and/or ``v`` have more than one dimension, batch-convolution along the last dimension will be attempted. See `Examples` below. @@ -30,7 +30,7 @@ def convolve(a: DNDarray, v: DNDarray, mode: str = "full") -> DNDarray: Can be 'full', 'valid', or 'same'. Default is 'full'. 'full': Returns the convolution at - each point of overlap, with an output shape of (N+M-1,). At + each point of overlap, with a length of '(N+M-2)//stride+1'. At the end-points of the convolution, the signals do not overlap completely, and boundary effects may be seen. 'same': @@ -38,34 +38,43 @@ def convolve(a: DNDarray, v: DNDarray, mode: str = "full") -> DNDarray: effects are still visible. This mode is not supported for even-sized filter weights 'valid': - Mode 'valid' returns output of length 'N-M+1'. The + Mode 'valid' returns output of length '(N-M)//stride+1'. The convolution product is only given for points where the signals overlap completely. Values outside the signal boundary have no effect. + stride : int + Stride of the convolution. Must be a positive integer. Default is 1. + Stride must be 1 for mode 'same'. 
Examples -------- Note how the convolution operator flips the second array before "sliding" the two across one another: - >>> a = ht.ones(10) + >>> a = ht.ones(5) >>> v = ht.arange(3).astype(ht.float) - >>> ht.convolve(a, v, mode='full') + >>> ht.convolve(a, v, mode="full") DNDarray([0., 1., 3., 3., 3., 3., 2.]) - >>> ht.convolve(a, v, mode='same') + >>> ht.convolve(a, v, mode="same") DNDarray([1., 3., 3., 3., 3.]) - >>> ht.convolve(a, v, mode='valid') + >>> ht.convolve(a, v, mode="valid") DNDarray([3., 3., 3.]) - >>> a = ht.ones(10, split = 0) - >>> v = ht.arange(3, split = 0).astype(ht.float) - >>> ht.convolve(a, v, mode='valid') + >>> ht.convolve(a, v, stride=2) + DNDarray([0., 3., 3., 2.]) + >>> ht.convolve(a, v, mode="valid", stride=2) + DNDarray([3., 3.]) + + >>> a = ht.ones(10, split=0) + >>> v = ht.arange(3, split=0).astype(ht.float) + >>> ht.convolve(a, v, mode="valid") DNDarray([3., 3., 3., 3., 3., 3., 3., 3.]) [0/3] DNDarray([3., 3., 3.]) [1/3] DNDarray([3., 3., 3.]) [2/3] DNDarray([3., 3.]) - >>> a = ht.ones(10, split = 0) - >>> v = ht.arange(3, split = 0) + + >>> a = ht.ones(10, split=0) + >>> v = ht.arange(3, split=0) >>> ht.convolve(a, v) DNDarray([0., 1., 3., 3., 3., 3., 3., 3., 3., 3., 3., 2.], dtype=ht.float32, device=cpu:0, split=0) @@ -73,10 +82,10 @@ def convolve(a: DNDarray, v: DNDarray, mode: str = "full") -> DNDarray: [1/3] DNDarray([3., 3., 3., 3.]) [2/3] DNDarray([3., 3., 3., 2.]) - >>> a = ht.arange(50, dtype = ht.float64, split=0) - >>> a = a.reshape(10, 5) # 10 signals of length 5 + >>> a = ht.arange(50, dtype=ht.float64, split=0) + >>> a = a.reshape(10, 5) # 10 signals of length 5 >>> v = ht.arange(3) - >>> ht.convolve(a, v) # batch processing: 10 signals convolved with filter v + >>> ht.convolve(a, v) # batch processing: 10 signals convolved with filter v DNDarray([[ 0., 0., 1., 4., 7., 10., 8.], [ 0., 5., 16., 19., 22., 25., 18.], [ 0., 10., 31., 34., 37., 40., 28.], @@ -88,8 +97,8 @@ def convolve(a: DNDarray, v: DNDarray, mode: str = "full") -> DNDarray: [ 0., 40., 121., 124., 127., 130., 88.], [ 0., 45., 136., 139., 142., 145., 98.]], dtype=ht.float64, device=cpu:0, split=0) - >>> v = ht.random.randint(0, 3, (10, 3), split=0) # 10 filters of length 3 - >>> ht.convolve(a, v) # batch processing: 10 signals convolved with 10 filters + >>> v = ht.random.randint(0, 3, (10, 3), split=0) # 10 filters of length 3 + >>> ht.convolve(a, v) # batch processing: 10 signals convolved with 10 filters DNDarray([[ 0., 0., 2., 4., 6., 8., 0.], [ 5., 6., 7., 8., 9., 0., 0.], [ 20., 42., 56., 61., 66., 41., 14.], @@ -116,6 +125,10 @@ def convolve(a: DNDarray, v: DNDarray, mode: str = "full") -> DNDarray: except TypeError: raise TypeError(f"non-supported type for filter: {type(v)}") promoted_type = promote_types(a.dtype, v.dtype) + if a.larray.is_mps and promoted_type == float64: + # cannot cast to float64 on MPS + promoted_type = float32 + a = a.astype(promoted_type) v = v.astype(promoted_type) @@ -152,6 +165,12 @@ def convolve(a: DNDarray, v: DNDarray, mode: str = "full") -> DNDarray: f"1-D convolution only supported for 1-dimensional signal and kernel. 
Signal: {a.shape}, Filter: {v.shape}" ) + # check mode and stride for value errors + if stride < 1: + raise ValueError("Stride must be a positive integer") + if stride > 1 and mode == "same": + raise ValueError("Stride must be 1 for mode 'same'") + if mode == "same" and v.shape[-1] % 2 == 0: raise ValueError("Mode 'same' cannot be used with even-sized kernel") if not v.is_balanced(): @@ -160,20 +179,23 @@ # calculate pad size according to mode if mode == "full": pad_size = v.shape[-1] - 1 - gshape = v.shape[-1] + a.shape[-1] - 1 elif mode == "same": pad_size = v.shape[-1] // 2 - gshape = a.shape[-1] elif mode == "valid": pad_size = 0 - gshape = a.shape[-1] - v.shape[-1] + 1 else: raise ValueError(f"Supported modes are 'full', 'valid', 'same', got {mode}") + gshape = (a.shape[-1] + 2 * pad_size - v.shape[-1]) // stride + 1 + + if v.is_distributed() and stride > 1: + gshape_stride_1 = a.shape[-1] + 2 * pad_size - v.shape[-1] + 1 + if batch_processing: # all operations are local torch operations, only the last dimension is convolved local_a = a.larray local_v = v.larray + # flip filter for convolution, as PyTorch conv1d computes correlations local_v = torch.flip(local_v, [-1]) local_batch_dims = tuple(local_a.shape[:-1]) @@ -204,7 +226,9 @@ # apply torch convolution operator if local signal isn't empty if torch.prod(torch.tensor(local_a.shape, device=local_a.device)) > 0: - local_convolved = fc.conv1d(local_a, local_v, padding=pad_size, groups=channels) + local_convolved = fc.conv1d( + local_a, local_v, padding=pad_size, groups=channels, stride=stride + ) else: empty_shape = tuple(local_a.shape[:-1] + (gshape,)) local_convolved = torch.empty(empty_shape, dtype=local_a.dtype, device=local_a.device) @@ -232,6 +256,23 @@ a.get_halo(halo_size) # apply halos to local array signal = a.array_with_halos + + # shift signal based on global kernel starts for any rank but first + if stride > 1 and not v.is_distributed(): + if a.comm.rank == 0: + local_index = 0 + else: + local_index = torch.sum(a.lshape_map[: a.comm.rank, 0]).item() - halo_size + local_index = local_index % stride + + if local_index != 0: + local_index = stride - local_index + + # even kernels can produce duplicate values + if v.shape[-1] % 2 == 0 and local_index == 0: + local_index = stride + + signal = signal[local_index:] else: signal = a.larray @@ -262,11 +303,15 @@ if v.is_distributed(): size = v.comm.size + # compute the stride-1 result first; the strided output is a subsample of it + if stride > 1: + gshape = gshape_stride_1 + for r in range(size): rec_v = t_v.clone() v.comm.Bcast(rec_v, root=r) t_v1 = rec_v.reshape(1, 1, rec_v.shape[0]) - local_signal_filtered = fc.conv1d(signal, t_v1) + local_signal_filtered = fc.conv1d(signal, t_v1, stride=1) # unpack 3D result into 1D local_signal_filtered = local_signal_filtered[0, 0, :] @@ -294,21 +339,29 @@ ) if r != size - 1: start_idx += v.lshape_map[r + 1][0].item() + + # the strided output is a subsample of the stride-1 result + if stride > 1: + signal_filtered = signal_filtered[::stride] + return signal_filtered else: # apply torch convolution operator + if signal.shape[-1] >= weight.shape[-1]: + signal_filtered = fc.conv1d(signal, weight,
stride=stride) - # unpack 3D result into 1D - signal_filtered = signal_filtered[0, 0, :] + # unpack 3D result into 1D + signal_filtered = signal_filtered[0, 0, :] + else: + signal_filtered = torch.tensor([], device=str(signal.device)) # if kernel shape along split axis is even we need to get rid of duplicated values - if a.comm.rank != 0 and v.shape[0] % 2 == 0: + if a.comm.rank != 0 and v.shape[0] % 2 == 0 and stride == 1: signal_filtered = signal_filtered[1:] return DNDarray( - signal_filtered.contiguous(), + signal_filtered, (gshape,), signal_filtered.dtype, a.split, diff --git a/heat/core/statistics.py b/heat/core/statistics.py index 29c557863d..57a9bbebc1 100644 --- a/heat/core/statistics.py +++ b/heat/core/statistics.py @@ -61,6 +61,8 @@ def argmax( By default, the index is into the flattened array, otherwise along the specified axis. out : DNDarray, optional. If provided, the result will be inserted into this array. It should be of the appropriate shape and dtype. + **kwargs + Extra keyword arguments Examples -------- @@ -98,7 +100,13 @@ def local_argmax(*args, **kwargs): offset, _, _ = x.comm.chunk(shape, x.split) indices += torch.tensor(offset, dtype=indices.dtype) - return torch.cat([maxima.double(), indices.double()]) + if maxima.is_mps: + # MPS framework doesn't support float64 + out = torch.cat([maxima.float(), indices.float()]) + else: + out = torch.cat([maxima.double(), indices.double()]) + + return out # axis sanitation if axis is not None and not isinstance(axis, int): @@ -132,6 +140,8 @@ def argmin( By default, the index is into the flattened array, otherwise along the specified axis. out : DNDarray, optional Issue #100 If provided, the result will be inserted into this array. It should be of the appropriate shape and dtype. + **kwargs + Extra keyword arguments Examples -------- @@ -156,21 +166,27 @@ def local_argmin(*args, **kwargs): # argmin will be the flattened index, computed standalone and the actual minimum value obtain separately if len(args) <= 1 and axis < 0: indices = torch.argmin(*args, **kwargs).reshape(1) - minimums = args[0].flatten()[indices] + minima = args[0].flatten()[indices] # artificially flatten the input tensor shape to correct the offset computation axis = 0 shape = [np.prod(shape)] # usual case where indices and minimum values are both returned. 
Axis is not equal to None else: - minimums, indices = torch.min(*args, **kwargs) + minima, indices = torch.min(*args, **kwargs) # add offset of data chunks if reduction is computed across split axis if axis == x.split: offset, _, _ = x.comm.chunk(shape, x.split) indices += torch.tensor(offset, dtype=indices.dtype) - return torch.cat([minimums.double(), indices.double()]) + if minima.is_mps: + # MPS framework doesn't support float64 + out = torch.cat([minima.float(), indices.float()]) + else: + out = torch.cat([minima.double(), indices.double()]) + + return out # axis sanitation if axis is not None and not isinstance(axis, int): @@ -235,17 +251,17 @@ def average( Examples -------- - >>> data = ht.arange(1,5, dtype=float) + >>> data = ht.arange(1, 5, dtype=float) >>> data DNDarray([1., 2., 3., 4.], dtype=ht.float32, device=cpu:0, split=None) >>> ht.average(data) DNDarray(2.5000, dtype=ht.float32, device=cpu:0, split=None) - >>> ht.average(ht.arange(1,11, dtype=float), weights=ht.arange(10,0,-1)) + >>> ht.average(ht.arange(1, 11, dtype=float), weights=ht.arange(10, 0, -1)) DNDarray([4.], dtype=ht.float64, device=cpu:0, split=None) >>> data = ht.array([[0, 1], [2, 3], [4, 5]], dtype=float, split=1) - >>> weights = ht.array([1./4, 3./4]) + >>> weights = ht.array([1.0 / 4, 3.0 / 4]) >>> ht.average(data, axis=1, weights=weights) DNDarray([0.7500, 2.7500, 4.7500], dtype=ht.float32, device=cpu:0, split=None) >>> ht.average(data, weights=weights) @@ -581,11 +597,11 @@ def digitize(x: DNDarray, bins: Union[DNDarray, torch.Tensor], right: bool = Fal Examples -------- - >>> x = ht.array([1.2, 10.0, 12.4, 15.5, 20.]) + >>> x = ht.array([1.2, 10.0, 12.4, 15.5, 20.0]) >>> bins = ht.array([0, 5, 10, 15, 20]) - >>> ht.digitize(x,bins,right=True) + >>> ht.digitize(x, bins, right=True) DNDarray([1, 2, 3, 4, 4], dtype=ht.int64, device=cpu:0, split=None) - >>> ht.digitize(x,bins,right=False) + >>> ht.digitize(x, bins, right=False) DNDarray([1, 3, 3, 4, 5], dtype=ht.int64, device=cpu:0, split=None) """ if isinstance(bins, DNDarray): @@ -642,7 +658,7 @@ def histc( Examples -------- - >>> ht.histc(ht.array([1., 2, 1]), bins=4, min=0, max=3) + >>> ht.histc(ht.array([1.0, 2, 1]), bins=4, min=0, max=3) DNDarray([0., 2., 1., 0.], dtype=ht.float32, device=cpu:0, split=None) >>> ht.histc(ht.arange(10, dtype=ht.float64, split=0), bins=10) DNDarray([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], dtype=ht.float64, device=cpu:0, split=None) @@ -659,7 +675,7 @@ def histc( out=out._DNDarray__array if out is not None and input.split is None else None, ) - if input.split is None: + if not input.is_distributed(): if out is None: out = DNDarray( hist, @@ -855,7 +871,7 @@ def maximum(x1: DNDarray, x2: DNDarray, out: Optional[DNDarray] = None) -> DNDar imaginary parts being ``NaN``. The net effect is that NaNs are propagated. Parameters - ----------- + ---------- x1 : DNDarray The first array containing the elements to be compared. x2 : DNDarray @@ -865,7 +881,7 @@ def maximum(x1: DNDarray, x2: DNDarray, out: Optional[DNDarray] = None) -> DNDar If not provided or ``None``, a freshly-allocated array is returned. 
Examples - --------- + -------- >>> import heat as ht >>> a = ht.random.randn(3, 4) >>> a @@ -920,12 +936,12 @@ def mean(x: DNDarray, axis: Optional[Union[int, Tuple[int, ...]]] = None) -> DND Examples -------- - >>> a = ht.random.randn(1,3) + >>> a = ht.random.randn(1, 3) >>> a DNDarray([[-0.1164, 1.0446, -0.4093]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.mean(a) DNDarray(0.1730, dtype=ht.float32, device=cpu:0, split=None) - >>> a = ht.random.randn(4,4) + >>> a = ht.random.randn(4, 4) >>> a DNDarray([[-1.0585, 0.7541, -1.1011, 0.5009], [-1.3575, 0.3344, 0.4506, 0.7379], @@ -935,13 +951,13 @@ def mean(x: DNDarray, axis: Optional[Union[int, Tuple[int, ...]]] = None) -> DND DNDarray([-0.2262, 0.0413, -0.8328, -0.2619], dtype=ht.float32, device=cpu:0, split=None) >>> ht.mean(a, 0) DNDarray([-0.5392, -0.1655, -0.7539, 0.1791], dtype=ht.float32, device=cpu:0, split=None) - >>> a = ht.random.randn(4,4) + >>> a = ht.random.randn(4, 4) >>> a DNDarray([[-0.1441, 0.5016, 0.8907, 0.6318], [-1.1690, -1.2657, 1.4840, -0.1014], [ 0.4133, 1.4168, 1.3499, 1.0340], [-0.9236, -0.7535, -0.2466, -0.9703]], dtype=ht.float32, device=cpu:0, split=None) - >>> ht.mean(a, (0,1)) + >>> ht.mean(a, (0, 1)) DNDarray(0.1342, dtype=ht.float32, device=cpu:0, split=None) """ @@ -984,7 +1000,7 @@ def reduce_means_elementwise(output_shape_i: torch.Tensor) -> DNDarray: # ---------------------------------------------------------------------------------------------- # sanitize dtype if types.heat_type_is_exact(x.dtype): - if x.dtype is types.int64: + if x.dtype is types.int64 and not x.larray.is_mps: x = x.astype(types.float64) else: x = x.astype(types.float32) @@ -1067,7 +1083,11 @@ def median( DNDarray.median: Callable[[DNDarray, int, bool, bool, float], DNDarray] = ( - lambda x, axis=None, keepdims=False, sketched=False, sketch_size=1.0 / MPI.COMM_WORLD.size: median( + lambda x, + axis=None, + keepdims=False, + sketched=False, + sketch_size=1.0 / MPI.COMM_WORLD.size: median( x, axis, keepdims, sketched=sketched, sketch_size=sketch_size ) ) @@ -1217,7 +1237,7 @@ def minimum(x1: DNDarray, x2: DNDarray, out: Optional[DNDarray] = None) -> DNDar imaginary parts being ``NaN``. The net effect is that NaNs are propagated. Parameters - ----------- + ---------- x1 : DNDarray The first array containing the elements to be compared. x2 : DNDarray @@ -1227,31 +1247,31 @@ def minimum(x1: DNDarray, x2: DNDarray, out: Optional[DNDarray] = None) -> DNDar If not provided or ``None``, a freshly-allocated array is returned. 
Examples - --------- + -------- >>> import heat as ht - >>> a = ht.random.randn(3,4) + >>> a = ht.random.randn(3, 4) >>> a DNDarray([[-0.5462, 0.0079, 1.2828, 1.4980], [ 0.6503, -1.1069, 1.2131, 1.4003], [-0.3203, -0.2318, 1.0388, 0.4439]], dtype=ht.float32, device=cpu:0, split=None) - >>> b = ht.random.randn(3,4) + >>> b = ht.random.randn(3, 4) >>> b DNDarray([[ 1.8505, 2.3055, -0.2825, -1.4718], [-0.3684, 1.6866, -0.8570, -0.4779], [ 1.0532, 0.3775, -0.8669, -1.7275]], dtype=ht.float32, device=cpu:0, split=None) - >>> ht.minimum(a,b) + >>> ht.minimum(a, b) DNDarray([[-0.5462, 0.0079, -0.2825, -1.4718], [-0.3684, -1.1069, -0.8570, -0.4779], [-0.3203, -0.2318, -0.8669, -1.7275]], dtype=ht.float32, device=cpu:0, split=None) - >>> c = ht.random.randn(1,4) + >>> c = ht.random.randn(1, 4) >>> c DNDarray([[-1.4358, 1.2914, -0.6042, -1.4009]], dtype=ht.float32, device=cpu:0, split=None) - >>> ht.minimum(a,c) + >>> ht.minimum(a, c) DNDarray([[-1.4358, 0.0079, -0.6042, -1.4009], [-1.4358, -1.1069, -0.6042, -1.4009], [-1.4358, -0.2318, -0.6042, -1.4009]], dtype=ht.float32, device=cpu:0, split=None) - >>> d = ht.random.randn(3,4,5) - >>> ht.minimum(a,d) + >>> d = ht.random.randn(3, 4, 5) + >>> ht.minimum(a, d) ValueError: operands could not be broadcast, input shapes (3, 4) (3, 4, 5) """ return _operations.__binary_op(torch.min, x1, x2, out) @@ -1597,7 +1617,10 @@ def _create_sketch( output_shape = perc_size + output_shape # output data type must be float - output_dtype = types.float32 if x.larray.element_size() == 4 else types.float64 + if x.larray.element_size() == 4 or x.larray.is_mps: + output_dtype = types.float32 + else: + output_dtype = types.float64 if out is not None: sanitation.sanitize_out(out, output_shape, output_split, x.device, x.comm) if output_dtype != out.dtype: @@ -1779,6 +1802,8 @@ def std( Delta Degrees of Freedom: the denominator implicitly used in the calculation is N - ddof, where N represents the number of elements. If ``ddof=1``, the Bessel correction will be applied. Setting ``ddof>1`` raises a ``NotImplementedError``. + **kwargs + Extra keyword arguments Examples -------- >>> a = ht.random.randn(1, 3) >>> a DNDarray([[ 0.5714, 0.0048, -0.2942]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.std(a) DNDarray(0.3590, dtype=ht.float32, device=cpu:0, split=None) - >>> a = ht.random.randn(4,4) + >>> a = ht.random.randn(4, 4) >>> a DNDarray([[ 0.8488, 1.2225, 1.2498, -1.4592], [-0.5820, -0.3928, 0.1509, -0.0174], [ 0.6425, 0.1926, 0.0430, 0.4177], [ 0.5645, 1.1319, 0.9578, 0.2237]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.std(a, 1, ddof=1) DNDarray([1.2961, 0.3372, 0.2724, 0.4024], dtype=ht.float32, device=cpu:0, split=None) """ # sanitize dtype if types.heat_type_is_exact(x.dtype): - if x.dtype is types.int64: + if x.dtype is types.int64 and not x.larray.is_mps: x = x.astype(types.float64) else: x = x.astype(types.float32) @@ -1919,6 +1944,9 @@ def var( Delta Degrees of Freedom: the denominator implicitly used in the calculation is N - ddof, where N represents the number of elements. If ``ddof=1``, the Bessel correction will be applied. Setting ``ddof>1`` raises a ``NotImplementedError``.
+ **kwargs + Extra keyword arguments + Notes ----- @@ -1938,14 +1966,14 @@ Examples -------- - >>> a = ht.random.randn(1,3) + >>> a = ht.random.randn(1, 3) >>> a DNDarray([[-2.3589, -0.2073, 0.8806]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.var(a) DNDarray(1.8119, dtype=ht.float32, device=cpu:0, split=None) >>> ht.var(a, ddof=1) DNDarray(2.7179, dtype=ht.float32, device=cpu:0, split=None) - >>> a = ht.random.randn(4,4) + >>> a = ht.random.randn(4, 4) >>> a DNDarray([[-0.8523, -1.4982, -0.5848, -0.2554], [ 0.8458, -0.3125, -0.2430, 1.9016], diff --git a/heat/core/stride_tricks.py b/heat/core/stride_tricks.py index 22e9fff694..a0d7ce0a15 100644 --- a/heat/core/stride_tricks.py +++ b/heat/core/stride_tricks.py @@ -22,20 +22,27 @@ def broadcast_shape(shape_a: Tuple[int, ...], shape_b: Tuple[int, ...]) -> Tuple Shape of second operand Raises - ------- + ------ ValueError If the two shapes cannot be broadcast. Examples -------- >>> import heat as ht - >>> ht.core.stride_tricks.broadcast_shape((5,4),(4,)) + >>> ht.core.stride_tricks.broadcast_shape((5, 4), (4,)) (5, 4) - >>> ht.core.stride_tricks.broadcast_shape((1,100,1),(10,1,5)) + >>> ht.core.stride_tricks.broadcast_shape((1, 100, 1), (10, 1, 5)) (10, 100, 5) - >>> ht.core.stride_tricks.broadcast_shape((8,1,6,1),(7,1,5,)) + >>> ht.core.stride_tricks.broadcast_shape( + ... (8, 1, 6, 1), + ... ( + ... 7, + ... 1, + ... 5, + ... ), + ... ) (8, 7, 6, 5) - >>> ht.core.stride_tricks.broadcast_shape((2,1),(8,4,3)) + >>> ht.core.stride_tricks.broadcast_shape((2, 1), (8, 4, 3)) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "heat/core/stride_tricks.py", line 42, in broadcast_shape @@ -69,20 +76,27 @@ def broadcast_shapes(*shapes: Tuple[int, ...]) -> Tuple[int, ...]: The broadcast output shape. Raises - ------- + ------ ValueError If the shapes cannot be broadcast. Examples -------- >>> import heat as ht - >>> ht.broadcast_shapes((5,4),(4,)) + >>> ht.broadcast_shapes((5, 4), (4,)) (5, 4) - >>> ht.broadcast_shapes((1,100,1),(10,1,5)) + >>> ht.broadcast_shapes((1, 100, 1), (10, 1, 5)) (10, 100, 5) - >>> ht.broadcast_shapes((8,1,6,1),(7,1,5,)) + >>> ht.broadcast_shapes( + ... (8, 1, 6, 1), + ... ( + ... 7, + ... 1, + ... 5, + ... ), + ... ) (8, 7, 6, 5) - >>> ht.broadcast_shapes((2,1),(8,4,3)) + >>> ht.broadcast_shapes((2, 1), (8, 4, 3)) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "heat/core/stride_tricks.py", line 100, in broadcast_shapes @@ -114,18 +128,18 @@ def sanitize_axis( The axis to be sanitized Raises - ------- + ------ ValueError if the axis cannot be sanitized, i.e. out of bounds. TypeError if the axis is not integral. Examples - ------- + -------- >>> import heat as ht - >>> ht.core.stride_tricks.sanitize_axis((5,4,4),1) + >>> ht.core.stride_tricks.sanitize_axis((5, 4, 4), 1) 1 - >>> ht.core.stride_tricks.sanitize_axis((5,4,4),-1) + >>> ht.core.stride_tricks.sanitize_axis((5, 4, 4), -1) 2 >>> ht.core.stride_tricks.sanitize_axis((5, 4), (1,)) (1,) @@ -178,7 +192,7 @@ def sanitize_shape(shape: Union[int, Tuple[int, ...]], lval: int = 0) -> Tuple[i Lowest legal value Raises - ------- + ------ ValueError If the shape contains illegal values, e.g. negative numbers.
TypeError diff --git a/heat/core/tests/test_arithmetics.py b/heat/core/tests/test_arithmetics.py index 8d01c82358..8b8a8a902d 100644 --- a/heat/core/tests/test_arithmetics.py +++ b/heat/core/tests/test_arithmetics.py @@ -1,5 +1,6 @@ import operator import math +import platform import heat as ht import numpy as np @@ -273,10 +274,11 @@ def test_add_(self): a += b # test function for invalid casting between data types - a = ht.ones(3, dtype=ht.float32) - b = ht.ones(3, dtype=ht.float64) - with self.assertRaises(TypeError): - a += b + if not self.is_mps: + a = ht.ones(3, dtype=ht.float32) + b = ht.ones(3, dtype=ht.float64) + with self.assertRaises(TypeError): + a += b def test_bitwise_and(self): int_result = ht.array([[0, 2], [2, 0]]) @@ -1441,7 +1443,11 @@ def test_cumsum_(self): a_string = self.a_string # reset def test_diff(self): - ht_array = ht.random.rand(20, 20, 20, split=None) + if self.is_mps: + dtype = ht.float32 + else: + dtype = ht.float64 + ht_array = ht.random.rand(20, 20, 20, split=None, dtype=dtype) arb_slice = [0] * 3 for dim in range(0, 3): # loop over 3 dimensions arb_slice[dim] = slice(None) @@ -1469,11 +1475,14 @@ def test_diff(self): ht_diff_pend = ht.diff(lp_array, n=nl, axis=ax, prepend=0, append=ht_append) np_append = np.ones(append_shape, dtype=lp_array.larray.cpu().numpy().dtype) np_diff_pend = ht.array( - np.diff(np_array, n=nl, axis=ax, prepend=0, append=np_append) + np.diff(np_array, n=nl, axis=ax, prepend=0, append=np_append).astype( + lp_array.larray.cpu().numpy().dtype + ), + dtype=dtype, ) self.assertTrue(ht.equal(ht_diff_pend, np_diff_pend)) self.assertEqual(ht_diff_pend.split, sp) - self.assertEqual(ht_diff_pend.dtype, ht.float64) + self.assertEqual(ht_diff_pend.dtype, dtype) np_array = ht_array.numpy() ht_diff = ht.diff(ht_array, n=2) @@ -1482,7 +1491,7 @@ def test_diff(self): self.assertEqual(ht_diff.split, None) self.assertEqual(ht_diff.dtype, ht_array.dtype) - ht_array = ht.random.rand(20, 20, 20, split=1, dtype=ht.float64) + ht_array = ht.random.rand(20, 20, 20, split=1, dtype=dtype) np_array = ht_array.copy().numpy() ht_diff = ht.diff(ht_array, n=2) np_diff = ht.array(np.diff(np_array, n=2)) @@ -1762,10 +1771,12 @@ def test_div_(self): a /= b # test function for invalid casting between data types - a = ht.ones(3, dtype=ht.float32) - b = ht.ones(3, dtype=ht.float64) - with self.assertRaises(TypeError): - a /= b + # MPS does not support float64 + if not self.is_mps: + a = ht.ones(3, dtype=ht.float32) + b = ht.ones(3, dtype=ht.float64) + with self.assertRaises(TypeError): + a /= b def test_divmod(self): # basic tests as floor_divide and mod are tested separately @@ -2058,10 +2069,12 @@ def test_floordiv_(self): a //= b # test function for invalid casting between data types - a = ht.ones(3, dtype=ht.float32) - b = ht.ones(3, dtype=ht.float64) - with self.assertRaises(TypeError): - a //= b + # MPS does not support float64 + if not self.is_mps: + a = ht.ones(3, dtype=ht.float32) + b = ht.ones(3, dtype=ht.float64) + with self.assertRaises(TypeError): + a //= b def test_fmod(self): result = ht.array([[1.0, 0.0], [1.0, 0.0]]) @@ -2253,10 +2266,12 @@ def test_fmod_(self): a.fmod_(b) # test function for invalid casting between data types - a = ht.ones(3, dtype=ht.float32) - b = ht.ones(3, dtype=ht.float64) - with self.assertRaises(TypeError): - a.fmod_(b) + # MPS does not support float64 + if not self.is_mps: + a = ht.ones(3, dtype=ht.float32) + b = ht.ones(3, dtype=ht.float64) + with self.assertRaises(TypeError): + a.fmod_(b) def test_gcd(self): a =
ht.array([5, 10, 15]) @@ -2426,26 +2441,28 @@ def test_gcd_(self): a.gcd_(b) # test function for invalid casting between data types - a = ht.ones(3, dtype=ht.float32) * 6 - b = ht.ones(3, dtype=ht.float64) * 4 - with self.assertRaises(TypeError): - a.gcd_(b) + # MPS does not support float64 + if not self.is_mps: + a = ht.ones(3, dtype=ht.float32) * 6 + b = ht.ones(3, dtype=ht.float64) * 4 + with self.assertRaises(TypeError): + a.gcd_(b) def test_hypot(self): a = ht.array([2.0]) b = ht.array([1.0, 3.0, 5.0]) gt = ht.array([5, 13, 29]) - result = (ht.hypot(a, b) ** 2).astype(ht.int64) - - self.assertTrue(ht.equal(gt, result)) - self.assertEqual(result.dtype, ht.int64) + result = ht.hypot(a, b) ** 2 + self.assertTrue(ht.allclose(gt, result)) with self.assertRaises(TypeError): ht.hypot(a) with self.assertRaises(TypeError): ht.hypot("a", "b") - with self.assertRaises(TypeError): - ht.hypot(a.astype(ht.int32), b.astype(ht.int32)) + if a.device.torch_device.startswith("cpu"): + # torch.hypot does not support Int datatypes on CPU + with self.assertRaises(TypeError): + ht.hypot(a.astype(ht.int32), b.astype(ht.int32)) def test_hypot_(self): # Copies of class variables for the in-place operations @@ -2589,10 +2606,11 @@ def test_hypot_(self): a.hypot_(b) # test function for invalid casting between data types - a = ht.ones(3, dtype=ht.float32) - b = ht.ones(3, dtype=ht.float64) - with self.assertRaises(TypeError): - a.hypot_(b) + if not self.is_mps: + a = ht.ones(3, dtype=ht.float32) + b = ht.ones(3, dtype=ht.float64) + with self.assertRaises(TypeError): + a.hypot_(b) def test_invert(self): int8_tensor = ht.array([[0, 1], [2, -2]], dtype=ht.int8) @@ -2853,10 +2871,11 @@ def test_lcm_(self): a.lcm_(b) # test function for invalid casting between data types - a = ht.ones(3, dtype=ht.float32) * 2 - b = ht.ones(3, dtype=ht.float64) * 3 - with self.assertRaises(TypeError): - a.lcm_(b) + if not self.is_mps: + a = ht.ones(3, dtype=ht.float32) * 2 + b = ht.ones(3, dtype=ht.float64) * 3 + with self.assertRaises(TypeError): + a.lcm_(b) def test_left_shift(self): int_tensor = ht.array([[0, 1], [2, 3]]) @@ -3239,10 +3258,11 @@ def test_mul_(self): a *= b # test function for invalid casting between data types - a = ht.ones(3, dtype=ht.float32) - b = ht.ones(3, dtype=ht.float64) - with self.assertRaises(TypeError): - a *= b + if not self.is_mps: + a = ht.ones(3, dtype=ht.float32) + b = ht.ones(3, dtype=ht.float64) + with self.assertRaises(TypeError): + a *= b def test_nan_to_num(self): arr = ht.array([1, 2, 3, ht.nan, ht.inf, -ht.inf]) @@ -3379,12 +3399,12 @@ def test_nansum(self): def test_neg(self): self.assertTrue(ht.equal(ht.neg(ht.array([-1, 1])), ht.array([1, -1]))) self.assertTrue(ht.equal(-ht.array([-1.0, 1.0]), ht.array([1.0, -1.0]))) - - a = ht.array([1 + 1j, 2 - 2j, 3, 4j, 5], split=0) - b = out = ht.empty(5, dtype=ht.complex64, split=0) - ht.negative(a, out=out) - self.assertTrue(ht.equal(out, ht.array([-1 - 1j, -2 + 2j, -3, -4j, -5], split=0))) - self.assertIs(out, b) + if not self.is_mps: + a = ht.array([1 + 1j, 2 - 2j, 3, 4j, 5], split=0) + b = out = ht.empty(5, dtype=ht.complex64, split=0) + ht.negative(a, out=out) + self.assertTrue(ht.equal(out, ht.array([-1 - 1j, -2 + 2j, -3, -4j, -5], split=0))) + self.assertIs(out, b) with self.assertRaises(TypeError): ht.neg(1) @@ -3400,8 +3420,10 @@ def test_neg_(self): result = ht.array([[-1.0, -2.0], [-3.0, -4.0]]) int_result = ht.array([-2, -2]) - a_complex_vector = a_complex_vector_double = ht.array([1 + 1j, 2 - 2j, 3, 4j, 5], split=0) - complex_result = 
ht.array([-1 - 1j, -2 + 2j, -3, -4j, -5], split=0) + a_complex_vector = a_complex_vector_double = ht.array( + [1 + 1j, 2 - 2j, 3, 4j, 5], split=0, dtype=ht.complex64 + ) + complex_result = ht.array([-1 - 1j, -2 + 2j, -3, -4j, -5], split=0, dtype=ht.complex64) # We identify the underlying PyTorch objects to check whether operations are really in-place underlying_torch_tensor = a_tensor.larray @@ -3423,8 +3445,12 @@ def test_neg_(self): self.assertIs(an_int_vector.larray, underlying_int_torch_tensor) underlying_int_torch_tensor.copy_(self.an_int_vector.larray) - self.assertTrue(ht.equal(a_complex_vector.neg_(), complex_result)) - self.assertTrue(ht.equal(a_complex_vector, complex_result)) + # test only on macOS 14.0 and higher + if self.is_mps: + macos_version = int(platform.mac_ver()[0].split(".")[0]) + if not self.is_mps or macos_version >= 14: + self.assertTrue(ht.equal(a_complex_vector.neg_(), complex_result)) + self.assertTrue(ht.equal(a_complex_vector, complex_result)) self.assertIs(a_complex_vector, a_complex_vector_double) self.assertIs(a_complex_vector.larray, underlying_complex_torch_tensor) @@ -3453,11 +3479,12 @@ def test_pos(self): self.assertTrue(ht.equal(ht.pos(ht.array([-1, 1])), ht.array([-1, 1]))) self.assertTrue(ht.equal(+ht.array([-1.0, 1.0]), ht.array([-1.0, 1.0]))) - a = ht.array([1 + 1j, 2 - 2j, 3, 4j, 5], split=0) - b = out = ht.empty(5, dtype=ht.complex64, split=0) - ht.positive(a, out=out) - self.assertTrue(ht.equal(out, a)) - self.assertIs(out, b) + if not self.is_mps: + a = ht.array([1 + 1j, 2 - 2j, 3, 4j, 5], split=0) + b = out = ht.empty(5, dtype=ht.complex64, split=0) + ht.positive(a, out=out) + self.assertTrue(ht.equal(out, a)) + self.assertIs(out, b) with self.assertRaises(TypeError): ht.pos(1) @@ -3661,10 +3688,11 @@ def test_pow_(self): a **= b # test function for invalid casting between data types - a = ht.ones(3, dtype=ht.float32) - b = ht.ones(3, dtype=ht.float64) - with self.assertRaises(TypeError): - a **= b + if not self.is_mps: + a = ht.ones(3, dtype=ht.float32) + b = ht.ones(3, dtype=ht.float64) + with self.assertRaises(TypeError): + a **= b def test_prod(self): array_len = 11 @@ -3976,10 +4004,11 @@ def test_remainder_(self): a %= b # test function for invalid casting between data types - a = ht.ones(3, dtype=ht.float32) - b = ht.ones(3, dtype=ht.float64) - with self.assertRaises(TypeError): - a %= b + if not self.is_mps: + a = ht.ones(3, dtype=ht.float32) + b = ht.ones(3, dtype=ht.float64) + with self.assertRaises(TypeError): + a %= b def test_right_shift(self): int_tensor = ht.array([[0, 1], [2, 3]]) @@ -4361,10 +4390,11 @@ def test_sub_(self): a -= b # test function for invalid casting between data types - a = ht.ones(3, dtype=ht.float32) - b = ht.ones(3, dtype=ht.float64) - with self.assertRaises(TypeError): - a -= b + if not self.is_mps: + a = ht.ones(3, dtype=ht.float32) + b = ht.ones(3, dtype=ht.float64) + with self.assertRaises(TypeError): + a -= b def test_sum(self): array_len = 11 diff --git a/heat/core/tests/test_communication.py b/heat/core/tests/test_communication.py index 48187a591b..131b21f79a 100644 --- a/heat/core/tests/test_communication.py +++ b/heat/core/tests/test_communication.py @@ -1,10 +1,18 @@ +import os +import unittest +import platform + import numpy as np import torch import heat as ht from .test_suites.basic_test import TestCase +envar = os.getenv("HEAT_TEST_USE_DEVICE", "cpu") +is_mps = envar == "gpu" and platform.machine() == "arm64" + +@unittest.skipIf(is_mps, "Distribution not supported on Apple MPS") class
TestCommunication(TestCase): @classmethod def setUpClass(cls): @@ -215,7 +223,6 @@ def test_allgather(self): # contiguous data data = ht.ones((1, 7)) output = ht.zeros((ht.MPI_WORLD.size, 7)) - # ensure prior invariants self.assertTrue(data.larray.is_contiguous()) self.assertTrue(output.larray.is_contiguous()) @@ -2492,3 +2499,44 @@ def test_alltoallSorting(self): test4.comm.Alltoallv(test4.larray, redistributed4, send_axis=2, recv_axis=2) with self.assertRaises(NotImplementedError): test4.comm.Alltoallv(test4.larray, redistributed4, send_axis=None) + + # The following test is only for the bool data type to save memory + @unittest.skipIf( + ht.MPI_WORLD.size == 1 or ht.MPI_WORLD.size > 2 or "rocm" in torch.__version__, + "Only for two or three processes and not on the AMD runner", + ) + def test_largecount_workaround_IsendRecv(self): + shape = (2**15, 2**16) + data = ( + torch.zeros(shape, dtype=torch.bool) + if ht.MPI_WORLD.rank % 2 == 0 + else torch.ones(shape, dtype=torch.bool) + ) + buf = torch.empty(shape, dtype=torch.bool) + req = ht.MPI_WORLD.Isend( + data, ht.MPI_WORLD.rank - 1 if ht.MPI_WORLD.rank > 0 else ht.MPI_WORLD.size - 1 + ) + ht.MPI_WORLD.Recv( + buf, ht.MPI_WORLD.rank + 1 if ht.MPI_WORLD.rank < ht.MPI_WORLD.size - 1 else 0 + ) + req.Wait() + self.assertTrue( + buf.all() + if (ht.MPI_WORLD.rank % 2 == 0 and ht.MPI_WORLD.rank != ht.MPI_WORLD.size - 1) + else not buf.all() + ) + + # the following test is only for up to three processes to save memory + @unittest.skipIf( + ht.MPI_WORLD.size == 1 or ht.MPI_WORLD.size > 2 or "rocm" in torch.__version__, + "Only for two or three processes and not on the AMD runner", + ) + def test_largecount_workaround_Allreduce(self): + shape = (2**10, 2**11, 2**10) + data = ( + torch.zeros(shape, dtype=torch.bool) + if ht.MPI_WORLD.rank % 2 == 0 + else torch.ones(shape, dtype=torch.bool) + ) + ht.MPI_WORLD.Allreduce(ht.MPI.IN_PLACE, data, op=ht.MPI.SUM) + self.assertTrue(data.all()) diff --git a/heat/core/tests/test_complex_math.py b/heat/core/tests/test_complex_math.py index dd0f8236cd..cc56088bce 100644 --- a/heat/core/tests/test_complex_math.py +++ b/heat/core/tests/test_complex_math.py @@ -1,210 +1,225 @@ import numpy as np import torch import heat as ht +import platform from .test_suites.basic_test import TestCase class TestComplex(TestCase): def test_abs(self): - a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j]) - absolute = ht.absolute(a) - res = torch.abs(a.larray) - - self.assertIs(absolute.device, self.device) - self.assertIs(absolute.dtype, ht.float) - self.assertEqual(absolute.shape, (5,)) - self.assertTrue(torch.equal(absolute.larray, res)) - - a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j], split=0) - absolute = ht.absolute(a) - res = torch.abs(a.larray) - - self.assertIs(absolute.device, self.device) - self.assertIs(absolute.dtype, ht.float) - self.assertEqual(absolute.shape, (5,)) - self.assertTrue(torch.equal(absolute.larray, res)) - - a = ht.array( - [[1.0, 1.0j], [1 + 1j, -2 + 2j], [3 - 3j, -4 - 4j]], split=1, dtype=ht.complex128 - ) - absolute = ht.absolute(a) - res = torch.abs(a.larray) - - self.assertIs(absolute.device, self.device) - self.assertIs(absolute.dtype, ht.double) - self.assertEqual(absolute.shape, (3, 2)) - self.assertTrue(torch.equal(absolute.larray, res)) + if not self.is_mps or int(platform.mac_ver()[0].split(".")[0]) >= 14: + a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j]) + absolute = ht.absolute(a) + res = torch.abs(a.larray) + + self.assertIs(absolute.device, self.device) + 
self.assertIs(absolute.dtype, ht.float) + self.assertEqual(absolute.shape, (5,)) + self.assertTrue(torch.equal(absolute.larray, res)) + + a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j], split=0) + absolute = ht.absolute(a) + res = torch.abs(a.larray) + + self.assertIs(absolute.device, self.device) + self.assertIs(absolute.dtype, ht.float) + self.assertEqual(absolute.shape, (5,)) + self.assertTrue(torch.equal(absolute.larray, res)) + + if not self.is_mps: + a = ht.array( + [[1.0, 1.0j], [1 + 1j, -2 + 2j], [3 - 3j, -4 - 4j]], + split=1, + dtype=ht.complex128, + ) + absolute = ht.absolute(a) + res = torch.abs(a.larray) + + self.assertIs(absolute.device, self.device) + self.assertIs(absolute.dtype, ht.double) + self.assertEqual(absolute.shape, (3, 2)) + self.assertTrue(torch.equal(absolute.larray, res)) def test_angle(self): - a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j]) - angle = ht.angle(a) - res = torch.angle(a.larray) - - self.assertIs(angle.device, self.device) - self.assertIs(angle.dtype, ht.float) - self.assertEqual(angle.shape, (5,)) - self.assertTrue(torch.equal(angle.larray, res)) - - a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j], split=0) - angle = ht.angle(a) - res = torch.angle(a.larray) - - self.assertIs(angle.device, self.device) - self.assertIs(angle.dtype, ht.float) - self.assertEqual(angle.shape, (5,)) - self.assertTrue(torch.equal(angle.larray, res)) - - a = ht.array([[1.0, 1.0j], [1 + 1j, -2 + 2j], [3 - 3j, -4 - 4j]], split=1) - angle = ht.angle(a, deg=True) - res = ht.array( - [[0.0, 90.0], [45.0, 135.0], [-45.0, -135.0]], - dtype=ht.float32, - device=self.device, - split=1, - ) - - self.assertIs(angle.device, self.device) - self.assertIs(angle.dtype, ht.float32) - self.assertEqual(angle.shape, (3, 2)) - self.assertTrue(ht.equal(angle, res)) - - # Not complex - a = ht.ones((4, 4), split=1) - angle = ht.angle(a) - res = ht.zeros((4, 4), split=1) - - self.assertIs(angle.device, self.device) - self.assertIs(angle.dtype, ht.float32) - self.assertEqual(angle.shape, (4, 4)) - self.assertTrue(ht.equal(angle, res)) + if not self.is_mps or int(platform.mac_ver()[0].split(".")[0]) >= 14: + a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j]) + angle = ht.angle(a) + res = torch.angle(a.larray) + + self.assertIs(angle.device, self.device) + self.assertIs(angle.dtype, ht.float) + self.assertEqual(angle.shape, (5,)) + self.assertTrue(torch.equal(angle.larray, res)) + + a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j], split=0) + angle = ht.angle(a) + res = torch.angle(a.larray) + + self.assertIs(angle.device, self.device) + self.assertIs(angle.dtype, ht.float) + self.assertEqual(angle.shape, (5,)) + self.assertTrue(torch.equal(angle.larray, res)) + + a = ht.array([[1.0, 1.0j], [1 + 1j, -2 + 2j], [3 - 3j, -4 - 4j]], split=1) + angle = ht.angle(a, deg=True) + res = ht.array( + [[0.0, 90.0], [45.0, 135.0], [-45.0, -135.0]], + dtype=ht.float32, + device=self.device, + split=1, + ) + + self.assertIs(angle.device, self.device) + self.assertIs(angle.dtype, ht.float32) + self.assertEqual(angle.shape, (3, 2)) + self.assertTrue(ht.equal(angle, res)) + + # Not complex + a = ht.ones((4, 4), split=1) + angle = ht.angle(a) + res = ht.zeros((4, 4), split=1) + + self.assertIs(angle.device, self.device) + self.assertIs(angle.dtype, ht.float32) + self.assertEqual(angle.shape, (4, 4)) + self.assertTrue(ht.equal(angle, res)) def test_conjugate(self): - a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j]) - conj = ht.conjugate(a) - res = ht.array( - [1 - 0j, -1j, 1 - 1j, -2 - 2j, 3 + 3j], 
dtype=ht.complex64, device=self.device - ) - - self.assertIs(conj.device, self.device) - self.assertIs(conj.dtype, ht.complex64) - self.assertEqual(conj.shape, (5,)) - # equal on complex numbers does not work on PyTorch - self.assertTrue(ht.equal(ht.real(conj), ht.real(res))) - self.assertTrue(ht.equal(ht.imag(conj), ht.imag(res))) - - a = ht.array([[1.0, 1.0j], [1 + 1j, -2 + 2j], [3 - 3j, -4 - 4j]], split=0) - conj = ht.conjugate(a) - res = ht.array( - [[1 - 0j, -1j], [1 - 1j, -2 - 2j], [3 + 3j, -4 + 4j]], - dtype=ht.complex64, - device=self.device, - split=0, - ) - - self.assertIs(conj.device, self.device) - self.assertIs(conj.dtype, ht.complex64) - self.assertEqual(conj.shape, (3, 2)) - # equal on complex numbers does not work on PyTorch - self.assertTrue(ht.equal(ht.real(conj), ht.real(res))) - self.assertTrue(ht.equal(ht.imag(conj), ht.imag(res))) - - a = ht.array( - [[1.0, 1.0j], [1 + 1j, -2 + 2j], [3 - 3j, -4 - 4j]], dtype=ht.complex128, split=1 - ) - conj = ht.conjugate(a) - res = ht.array( - [[1 - 0j, -1j], [1 - 1j, -2 - 2j], [3 + 3j, -4 + 4j]], - dtype=ht.complex128, - device=self.device, - split=1, - ) - - self.assertIs(conj.device, self.device) - self.assertIs(conj.dtype, ht.complex128) - self.assertEqual(conj.shape, (3, 2)) - # equal on complex numbers does not work on PyTorch - self.assertTrue(ht.equal(ht.real(conj), ht.real(res))) - self.assertTrue(ht.equal(ht.imag(conj), ht.imag(res))) - - # Not complex - a = ht.ones((4, 4)) - conj = ht.conj(a) - res = ht.ones((4, 4)) - - self.assertIs(conj.device, self.device) - self.assertIs(conj.dtype, ht.float32) - self.assertEqual(conj.shape, (4, 4)) - self.assertTrue(ht.equal(conj, res)) - - # DNDarray method - a = ht.array([1 + 1j, 1 - 1j]) - conj = a.conj() - res = ht.array([1 - 1j, 1 + 1j]) - - self.assertIs(conj.device, self.device) - self.assertTrue(ht.equal(conj, res)) + if not self.is_mps or int(platform.mac_ver()[0].split(".")[0]) >= 14: + a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j]) + conj = ht.conjugate(a) + res = ht.array( + [1 - 0j, -1j, 1 - 1j, -2 - 2j, 3 + 3j], dtype=ht.complex64, device=self.device + ) + + self.assertIs(conj.device, self.device) + self.assertIs(conj.dtype, ht.complex64) + self.assertEqual(conj.shape, (5,)) + # equal on complex numbers does not work on PyTorch + self.assertTrue(ht.equal(ht.real(conj), ht.real(res))) + if not self.is_mps: + # precision loss on imaginary part on MPS + self.assertTrue(ht.equal(ht.imag(conj), ht.imag(res))) + + a = ht.array([[1.0, 1.0j], [1 + 1j, -2 + 2j], [3 - 3j, -4 - 4j]], split=0) + conj = ht.conjugate(a) + res = ht.array( + [[1 - 0j, -1j], [1 - 1j, -2 - 2j], [3 + 3j, -4 + 4j]], + dtype=ht.complex64, + device=self.device, + split=0, + ) + + self.assertIs(conj.device, self.device) + self.assertIs(conj.dtype, ht.complex64) + self.assertEqual(conj.shape, (3, 2)) + # equal on complex numbers does not work on PyTorch + self.assertTrue(ht.equal(ht.real(conj), ht.real(res))) + if not self.is_mps: + # precision loss on imaginary part on MPS + self.assertTrue(ht.equal(ht.imag(conj), ht.imag(res))) + + if not self.is_mps: + # complex128 not supported on MPS + a = ht.array( + [[1.0, 1.0j], [1 + 1j, -2 + 2j], [3 - 3j, -4 - 4j]], + dtype=ht.complex128, + split=1, + ) + conj = ht.conjugate(a) + res = ht.array( + [[1 - 0j, -1j], [1 - 1j, -2 - 2j], [3 + 3j, -4 + 4j]], + dtype=ht.complex128, + device=self.device, + split=1, + ) + + self.assertIs(conj.device, self.device) + self.assertIs(conj.dtype, ht.complex128) + self.assertEqual(conj.shape, (3, 2)) + # equal on complex numbers 
does not work on PyTorch + self.assertTrue(ht.equal(ht.real(conj), ht.real(res))) + self.assertTrue(ht.equal(ht.imag(conj), ht.imag(res))) + + # Not complex + a = ht.ones((4, 4)) + conj = ht.conj(a) + res = ht.ones((4, 4)) + + self.assertIs(conj.device, self.device) + self.assertIs(conj.dtype, ht.float32) + self.assertEqual(conj.shape, (4, 4)) + self.assertTrue(ht.equal(conj, res)) + + # DNDarray method + a = ht.array([1 + 1j, 1 - 1j]) + conj = a.conj() + res = ht.array([1 - 1j, 1 + 1j]) + + self.assertIs(conj.device, self.device) + self.assertTrue(ht.equal(conj, res)) def test_imag(self): - a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j]) - imag = ht.imag(a) - res = ht.array([0.0, 1.0, 1.0, 2.0, -3.0], dtype=ht.float32, device=self.device) - - self.assertIs(imag.device, self.device) - self.assertIs(imag.dtype, ht.float) - self.assertEqual(imag.shape, (5,)) - self.assertTrue(ht.equal(imag, res)) - - a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j], split=0) - imag = ht.imag(a) - res = ht.array([0.0, 1.0, 1.0, 2.0, -3.0], dtype=ht.float32, device=self.device, split=0) - - self.assertIs(imag.device, self.device) - self.assertIs(imag.dtype, ht.float) - self.assertEqual(imag.shape, (5,)) - self.assertTrue(ht.equal(imag, res)) - - # Not complex - a = ht.ones((4, 4)) - imag = a.imag - res = ht.zeros((4, 4)) - - self.assertIs(imag.device, self.device) - self.assertIs(imag.dtype, ht.float32) - self.assertEqual(imag.shape, (4, 4)) - self.assertTrue(ht.equal(imag, res)) + if not self.is_mps or int(platform.mac_ver()[0].split(".")[0]) >= 14: + a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j]) + imag = ht.imag(a) + res = ht.array([0.0, 1.0, 1.0, 2.0, -3.0], dtype=ht.float32, device=self.device) + + self.assertIs(imag.device, self.device) + self.assertIs(imag.dtype, ht.float) + self.assertEqual(imag.shape, (5,)) + self.assertTrue(ht.equal(imag, res)) + + a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j], split=0) + imag = ht.imag(a) + res = ht.array( + [0.0, 1.0, 1.0, 2.0, -3.0], dtype=ht.float32, device=self.device, split=0 + ) + + self.assertIs(imag.device, self.device) + self.assertIs(imag.dtype, ht.float) + self.assertEqual(imag.shape, (5,)) + self.assertTrue(ht.equal(imag, res)) + + # Not complex + a = ht.ones((4, 4)) + imag = a.imag + res = ht.zeros((4, 4)) + + self.assertIs(imag.device, self.device) + self.assertIs(imag.dtype, ht.float32) + self.assertEqual(imag.shape, (4, 4)) + self.assertTrue(ht.equal(imag, res)) def test_real(self): - a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j]) - real = ht.real(a) - res = ht.array([1.0, 0.0, 1.0, -2.0, 3.0], dtype=ht.float32, device=self.device) - - self.assertIs(real.device, self.device) - self.assertIs(real.dtype, ht.float) - self.assertEqual(real.shape, (5,)) - self.assertTrue(ht.equal(real, res)) - - a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j], split=0) - real = ht.real(a) - res = ht.array([1.0, 0.0, 1.0, -2.0, 3.0], dtype=ht.float32, device=self.device, split=0) - - self.assertIs(real.device, self.device) - self.assertIs(real.dtype, ht.float) - self.assertEqual(real.shape, (5,)) - self.assertTrue(ht.equal(real, res)) - - # Not complex - a = ht.ones((4, 4), split=1) - real = a.real - res = ht.ones((4, 4), split=1) - - self.assertIs(real.device, self.device) - self.assertIs(real.dtype, ht.float32) - self.assertEqual(real.shape, (4, 4)) - self.assertIs(real, a) - - # This test will be redundant with PyTorch 1.7 - def test_full(self): - a = ht.full((4, 4), 1 + 1j) - - self.assertIs(a.dtype, ht.complex64) + if not self.is_mps or 
int(platform.mac_ver()[0].split(".")[0]) >= 14: + a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j]) + real = ht.real(a) + res = ht.array([1.0, 0.0, 1.0, -2.0, 3.0], dtype=ht.float32, device=self.device) + + self.assertIs(real.device, self.device) + self.assertIs(real.dtype, ht.float) + self.assertEqual(real.shape, (5,)) + self.assertTrue(ht.equal(real, res)) + + a = ht.array([1.0, 1.0j, 1 + 1j, -2 + 2j, 3 - 3j], split=0) + real = ht.real(a) + res = ht.array( + [1.0, 0.0, 1.0, -2.0, 3.0], dtype=ht.float32, device=self.device, split=0 + ) + + self.assertIs(real.device, self.device) + self.assertIs(real.dtype, ht.float) + self.assertEqual(real.shape, (5,)) + self.assertTrue(ht.equal(real, res)) + + # Not complex + a = ht.ones((4, 4), split=1) + real = a.real + res = ht.ones((4, 4), split=1) + + self.assertIs(real.device, self.device) + self.assertIs(real.dtype, ht.float32) + self.assertEqual(real.shape, (4, 4)) + self.assertIs(real, a) diff --git a/heat/core/tests/test_dndarray.py b/heat/core/tests/test_dndarray.py index 9cc1361b80..c6123c1cf2 100644 --- a/heat/core/tests/test_dndarray.py +++ b/heat/core/tests/test_dndarray.py @@ -244,14 +244,39 @@ def test_array(self): self.assertEqual(x.__array__().shape, x.gshape) # distributed case - x = ht.arange(6 * 7 * 8, dtype=ht.float64, split=0).reshape((6, 7, 8)) - x_np = np.arange(6 * 7 * 8, dtype=np.float64).reshape((6, 7, 8)) + if self.is_mps: + dtype = ht.float32 + np_dtype = np.float32 + else: + dtype = ht.float64 + np_dtype = np.float64 + x = ht.arange(6 * 7 * 8, dtype=dtype, split=0).reshape((6, 7, 8)) + x_np = np.arange(6 * 7 * 8, dtype=np_dtype).reshape((6, 7, 8)) self.assertTrue((x.__array__() == x.larray.cpu().numpy()).all()) self.assertIsInstance(x.__array__(), np.ndarray) self.assertEqual(x.__array__().dtype, x_np.dtype) self.assertEqual(x.__array__().shape, x.lshape) + def test_array_ufunc(self): + arr = ht.array([1, 2, 3, 4]) + self.assertIsInstance(np.multiply(arr, 3), ht.DNDarray) + self.assertIsInstance(np.add(arr, 3), ht.DNDarray) + self.assertIsInstance(np.sin(arr), ht.DNDarray) + + with self.assertRaises(TypeError): + np.multiply.reduce(arr) + with self.assertRaises(TypeError): + np.heaviside(arr, 5) + + def test_array_function(self): + arr = ht.array([1, 2, 3, 4]) + self.assertIsInstance(np.concatenate([arr, arr]), ht.DNDarray) + self.assertIsInstance(np.sum(arr, axis=0), ht.DNDarray) + + with self.assertRaises(TypeError): + np.array_equiv(arr, arr) + def test_larray(self): # undistributed case x = ht.arange(6 * 7 * 8).reshape((6, 7, 8)) @@ -320,12 +345,13 @@ def test_astype(self): self.assertEqual(as_uint8.larray.dtype, torch.uint8) self.assertIsNot(as_uint8, data) - # check the copy case for uint8 - as_float64 = data.astype(ht.float64, copy=False) - self.assertIsInstance(as_float64, ht.DNDarray) - self.assertEqual(as_float64.dtype, ht.float64) - self.assertEqual(as_float64.larray.dtype, torch.float64) - self.assertIs(as_float64, data) + # check the copy case for float64 + if not self.is_mps: + as_float64 = data.astype(ht.float64, copy=False) + self.assertIsInstance(as_float64, ht.DNDarray) + self.assertEqual(as_float64.dtype, ht.float64) + self.assertEqual(as_float64.larray.dtype, torch.float64) + self.assertIs(as_float64, data) def test_balance_and_lshape_map(self): data = ht.zeros((70, 20), split=0) @@ -347,9 +373,10 @@ def test_balance_and_lshape_map(self): data.balance_() self.assertTrue(data.is_balanced()) - data = ht.zeros((70, 20), split=0, dtype=ht.float64) - data = ht.balance(data[:50], copy=True) - 
self.assertTrue(data.is_balanced()) + if not self.is_mps: + data = ht.zeros((70, 20), split=0, dtype=ht.float64) + data = ht.balance(data[:50], copy=True) + self.assertTrue(data.is_balanced()) data = ht.zeros((4, 120), split=1, dtype=ht.int64) data = data[:, 40:70].balance() @@ -357,7 +384,9 @@ def test_balance_and_lshape_map(self): data = np.loadtxt("heat/datasets/iris.csv", delimiter=";") htdata = ht.load("heat/datasets/iris.csv", sep=";", split=0) - self.assertTrue(ht.equal(htdata, ht.array(data, split=0, dtype=ht.float))) + self.assertTrue( + ht.equal(htdata, ht.array(data.astype(np.float32), split=0, dtype=ht.float)) + ) if ht.MPI_WORLD.size > 4: rank = ht.MPI_WORLD.rank @@ -697,7 +726,8 @@ def test_lnbytes(self): # float x_float32 = ht.arange(6 * 7 * 8, dtype=ht.float32).reshape((6, 7, 8)) - x_float64 = ht.arange(6 * 7 * 8, dtype=ht.float64).reshape((6, 7, 8)) + if not self.is_mps: + x_float64 = ht.arange(6 * 7 * 8, dtype=ht.float64).reshape((6, 7, 8)) # bool x_bool = ht.arange(6 * 7 * 8, dtype=ht.bool).reshape((6, 7, 8)) @@ -709,7 +739,8 @@ def test_lnbytes(self): self.assertEqual(x_int64.lnbytes, x_int64.gnbytes) self.assertEqual(x_float32.lnbytes, x_float32.gnbytes) - self.assertEqual(x_float64.lnbytes, x_float64.gnbytes) + if not self.is_mps: + self.assertEqual(x_float64.lnbytes, x_float64.gnbytes) self.assertEqual(x_bool.lnbytes, x_bool.gnbytes) @@ -724,7 +755,8 @@ def test_lnbytes(self): # float x_float32_d = ht.arange(6 * 7 * 8, split=0, dtype=ht.float32) - x_float64_d = ht.arange(6 * 7 * 8, split=0, dtype=ht.float64) + if not self.is_mps: + x_float64_d = ht.arange(6 * 7 * 8, split=0, dtype=ht.float64) # bool x_bool_d = ht.arange(6 * 7 * 8, split=0, dtype=ht.bool) @@ -736,7 +768,8 @@ def test_lnbytes(self): self.assertEqual(x_int64_d.lnbytes, x_int64_d.lnumel * 8) self.assertEqual(x_float32_d.lnbytes, x_float32_d.lnumel * 4) - self.assertEqual(x_float64_d.lnbytes, x_float64_d.lnumel * 8) + if not self.is_mps: + self.assertEqual(x_float64_d.lnbytes, x_float64_d.lnumel * 8) self.assertEqual(x_bool_d.lnbytes, x_bool_d.lnumel * 1) @@ -752,7 +785,8 @@ def test_nbytes(self): # float x_float32 = ht.arange(6 * 7 * 8, dtype=ht.float32).reshape((6, 7, 8)) - x_float64 = ht.arange(6 * 7 * 8, dtype=ht.float64).reshape((6, 7, 8)) + if not self.is_mps: + x_float64 = ht.arange(6 * 7 * 8, dtype=ht.float64).reshape((6, 7, 8)) # bool x_bool = ht.arange(6 * 7 * 8, dtype=ht.bool).reshape((6, 7, 8)) @@ -764,7 +798,8 @@ def test_nbytes(self): self.assertEqual(x_int64.nbytes, 336 * 8) self.assertEqual(x_float32.nbytes, 336 * 4) - self.assertEqual(x_float64.nbytes, 336 * 8) + if not self.is_mps: + self.assertEqual(x_float64.nbytes, 336 * 8) self.assertEqual(x_bool.nbytes, 336 * 1) @@ -776,7 +811,8 @@ def test_nbytes(self): self.assertEqual(x_int64.nbytes, x_int64.gnbytes) self.assertEqual(x_float32.nbytes, x_float32.gnbytes) - self.assertEqual(x_float64.nbytes, x_float64.gnbytes) + if not self.is_mps: + self.assertEqual(x_float64.nbytes, x_float64.gnbytes) self.assertEqual(x_bool.nbytes, x_bool.gnbytes) @@ -791,7 +827,8 @@ def test_nbytes(self): # float x_float32_d = ht.arange(6 * 7 * 8, split=0, dtype=ht.float32) - x_float64_d = ht.arange(6 * 7 * 8, split=0, dtype=ht.float64) + if not self.is_mps: + x_float64_d = ht.arange(6 * 7 * 8, split=0, dtype=ht.float64) # bool x_bool_d = ht.arange(6 * 7 * 8, split=0, dtype=ht.bool) @@ -803,7 +840,8 @@ def test_nbytes(self): self.assertEqual(x_int64_d.nbytes, 336 * 8) self.assertEqual(x_float32_d.nbytes, 336 * 4) - self.assertEqual(x_float64_d.nbytes, 336 * 8) 
+ if not self.is_mps: + self.assertEqual(x_float64_d.nbytes, 336 * 8) self.assertEqual(x_bool_d.nbytes, 336 * 1) @@ -815,7 +853,8 @@ def test_nbytes(self): self.assertEqual(x_int64_d.nbytes, x_int64_d.gnbytes) self.assertEqual(x_float32_d.nbytes, x_float32_d.gnbytes) - self.assertEqual(x_float64_d.nbytes, x_float64_d.gnbytes) + if not self.is_mps: + self.assertEqual(x_float64_d.nbytes, x_float64_d.gnbytes) self.assertEqual(x_bool_d.nbytes, x_bool_d.gnbytes) @@ -824,21 +863,23 @@ def test_ndim(self): self.assertEqual(a.ndim, 4) def test_numpy(self): - # ToDo: numpy does not work for distributed tensors du to issue# + # ToDo: numpy does not work for distributed tensors due to issue# # Add additional tests if the issue is solved - a = np.random.randn(10, 8) - b = ht.array(a) - self.assertIsInstance(b.numpy(), np.ndarray) - self.assertEqual(b.numpy().shape, a.shape) - self.assertEqual(b.numpy().tolist(), b.larray.cpu().numpy().tolist()) + if not self.is_mps: + a = np.random.randn(10, 8) + b = ht.array(a) + self.assertIsInstance(b.numpy(), np.ndarray) + self.assertEqual(b.numpy().shape, a.shape) + self.assertEqual(b.numpy().tolist(), b.larray.cpu().numpy().tolist()) a = ht.ones((10, 8), dtype=ht.float32) b = np.ones((2, 2)).astype("float32") self.assertEqual(a.numpy().dtype, b.dtype) - a = ht.ones((10, 8), dtype=ht.float64) - b = np.ones((2, 2)).astype("float64") - self.assertEqual(a.numpy().dtype, b.dtype) + if not self.is_mps: + a = ht.ones((10, 8), dtype=ht.float64) + b = np.ones((2, 2)).astype("float64") + self.assertEqual(a.numpy().dtype, b.dtype) a = ht.ones((10, 8), dtype=ht.int32) b = np.ones((2, 2)).astype("int32") @@ -857,18 +898,19 @@ def test_or(self): ) def test_partitioned(self): - a = ht.zeros((120, 120), split=0) - parted = a.__partitioned__ - self.assertEqual(parted["shape"], (120, 120)) - self.assertEqual(parted["partition_tiling"], (a.comm.size, 1)) - self.assertEqual(parted["partitions"][(0, 0)]["start"], (0, 0)) - - a.resplit_(None) - self.assertIsNone(a.__partitions_dict__) - parted = a.__partitioned__ - self.assertEqual(parted["shape"], (120, 120)) - self.assertEqual(parted["partition_tiling"], (1, 1)) - self.assertEqual(parted["partitions"][(0, 0)]["start"], (0, 0)) + if not self.is_mps: + a = ht.zeros((120, 120), split=0) + parted = a.__partitioned__ + self.assertEqual(parted["shape"], (120, 120)) + self.assertEqual(parted["partition_tiling"], (a.comm.size, 1)) + self.assertEqual(parted["partitions"][(0, 0)]["start"], (0, 0)) + + a.resplit_(None) + self.assertIsNone(a.__partitions_dict__) + parted = a.__partitioned__ + self.assertEqual(parted["shape"], (120, 120)) + self.assertEqual(parted["partition_tiling"], (1, 1)) + self.assertEqual(parted["partitions"][(0, 0)]["start"], (0, 0)) def test_redistribute(self): # need to test with 1, 2, 3, and 4 dims @@ -934,177 +976,175 @@ def test_redistribute(self): with self.assertRaises(ValueError): st.redistribute_(target_map=torch.zeros((2, 4))) - def test_repr(self): - a = ht.array([1, 2, 3, 4]) - self.assertEqual(a.__repr__(), a.__str__()) - def test_resplit(self): - # resplitting with same axis, should leave everything unchanged - shape = (ht.MPI_WORLD.size, ht.MPI_WORLD.size) - data = ht.zeros(shape, split=None) - data.resplit_(None) - - self.assertIsInstance(data, ht.DNDarray) - self.assertEqual(data.shape, shape) - self.assertEqual(data.lshape, shape) - self.assertEqual(data.split, None) - - # resplitting with same axis, should leave everything unchanged - shape = (ht.MPI_WORLD.size, ht.MPI_WORLD.size) - data = 
ht.zeros(shape, split=1) - data.resplit_(1) - - self.assertIsInstance(data, ht.DNDarray) - self.assertEqual(data.shape, shape) - self.assertEqual(data.lshape, (data.comm.size, 1)) - self.assertEqual(data.split, 1) - - # splitting an unsplit tensor should result in slicing the tensor locally - shape = (ht.MPI_WORLD.size, ht.MPI_WORLD.size) - data = ht.zeros(shape) - data.resplit_(-1) - - self.assertIsInstance(data, ht.DNDarray) - self.assertEqual(data.shape, shape) - self.assertEqual(data.lshape, (data.comm.size, 1)) - self.assertEqual(data.split, 1) - - # unsplitting, aka gathering a tensor - shape = (ht.MPI_WORLD.size + 1, ht.MPI_WORLD.size) - data = ht.ones(shape, split=0) - data.resplit_(None) - - self.assertIsInstance(data, ht.DNDarray) - self.assertEqual(data.shape, shape) - self.assertEqual(data.lshape, shape) - self.assertEqual(data.split, None) - - # assign and entirely new split axis - shape = (ht.MPI_WORLD.size + 2, ht.MPI_WORLD.size + 1) - data = ht.ones(shape, split=0) - data.resplit_(1) - - self.assertIsInstance(data, ht.DNDarray) - self.assertEqual(data.shape, shape) - self.assertEqual(data.lshape[0], ht.MPI_WORLD.size + 2) - self.assertTrue(data.lshape[1] == 1 or data.lshape[1] == 2) - self.assertEqual(data.split, 1) - - # test sorting order of resplit - a_tensor = self.reference_tensor.copy() - N = ht.MPI_WORLD.size - - # split along axis = 0 - a_tensor.resplit_(axis=0) - local_shape = (1, N + 1, 2 * N) - local_tensor = self.reference_tensor[ht.MPI_WORLD.rank, :, :] - self.assertEqual(a_tensor.lshape, local_shape) - self.assertTrue((a_tensor.larray == local_tensor.larray).all()) - - # unsplit - a_tensor.resplit_(axis=None) - self.assertTrue((a_tensor.larray == self.reference_tensor.larray).all()) - - # split along axis = 1 - a_tensor.resplit_(axis=1) - if ht.MPI_WORLD.rank == 0: - local_shape = (N, 2, 2 * N) - local_tensor = self.reference_tensor[:, 0:2, :] - else: - local_shape = (N, 1, 2 * N) - local_tensor = self.reference_tensor[ - :, ht.MPI_WORLD.rank + 1 : ht.MPI_WORLD.rank + 2, : - ] - - self.assertEqual(a_tensor.lshape, local_shape) - self.assertTrue((a_tensor.larray == local_tensor.larray).all()) + # MPS tests are always 1 process only + if not self.is_mps: + # resplitting with same axis, should leave everything unchanged + shape = (ht.MPI_WORLD.size, ht.MPI_WORLD.size) + data = ht.zeros(shape, split=None) + data.resplit_(None) + + self.assertIsInstance(data, ht.DNDarray) + self.assertEqual(data.shape, shape) + self.assertEqual(data.lshape, shape) + self.assertEqual(data.split, None) + + # resplitting with same axis, should leave everything unchanged + shape = (ht.MPI_WORLD.size, ht.MPI_WORLD.size) + data = ht.zeros(shape, split=1) + data.resplit_(1) + + self.assertIsInstance(data, ht.DNDarray) + self.assertEqual(data.shape, shape) + self.assertEqual(data.lshape, (data.comm.size, 1)) + self.assertEqual(data.split, 1) + + # splitting an unsplit tensor should result in slicing the tensor locally + shape = (ht.MPI_WORLD.size, ht.MPI_WORLD.size) + data = ht.zeros(shape) + data.resplit_(-1) + + self.assertIsInstance(data, ht.DNDarray) + self.assertEqual(data.shape, shape) + self.assertEqual(data.lshape, (data.comm.size, 1)) + self.assertEqual(data.split, 1) + + # unsplitting, aka gathering a tensor + shape = (ht.MPI_WORLD.size + 1, ht.MPI_WORLD.size) + data = ht.ones(shape, split=0) + data.resplit_(None) + + self.assertIsInstance(data, ht.DNDarray) + self.assertEqual(data.shape, shape) + self.assertEqual(data.lshape, shape) + self.assertEqual(data.split, None) + + # 
assign and entirely new split axis + shape = (ht.MPI_WORLD.size + 2, ht.MPI_WORLD.size + 1) + data = ht.ones(shape, split=0) + data.resplit_(1) + + self.assertIsInstance(data, ht.DNDarray) + self.assertEqual(data.shape, shape) + self.assertEqual(data.lshape[0], ht.MPI_WORLD.size + 2) + self.assertTrue(data.lshape[1] == 1 or data.lshape[1] == 2) + self.assertEqual(data.split, 1) + + # test sorting order of resplit + a_tensor = self.reference_tensor.copy() + N = ht.MPI_WORLD.size + + # split along axis = 0 + a_tensor.resplit_(axis=0) + local_shape = (1, N + 1, 2 * N) + local_tensor = self.reference_tensor[ht.MPI_WORLD.rank, :, :] + self.assertEqual(a_tensor.lshape, local_shape) + self.assertTrue((a_tensor.larray == local_tensor.larray).all()) + + # unsplit + a_tensor.resplit_(axis=None) + self.assertTrue((a_tensor.larray == self.reference_tensor.larray).all()) + + # split along axis = 1 + a_tensor.resplit_(axis=1) + if ht.MPI_WORLD.rank == 0: + local_shape = (N, 2, 2 * N) + local_tensor = self.reference_tensor[:, 0:2, :] + else: + local_shape = (N, 1, 2 * N) + local_tensor = self.reference_tensor[ + :, ht.MPI_WORLD.rank + 1 : ht.MPI_WORLD.rank + 2, : + ] - # unsplit - a_tensor.resplit_(axis=None) - self.assertTrue((a_tensor.larray == self.reference_tensor.larray).all()) + self.assertEqual(a_tensor.lshape, local_shape) + self.assertTrue((a_tensor.larray == local_tensor.larray).all()) - # split along axis = 2 - a_tensor.resplit_(axis=2) - local_shape = (N, N + 1, 2) - local_tensor = self.reference_tensor[ - :, :, 2 * ht.MPI_WORLD.rank : 2 * ht.MPI_WORLD.rank + 2 - ] + # unsplit + a_tensor.resplit_(axis=None) + self.assertTrue((a_tensor.larray == self.reference_tensor.larray).all()) - self.assertEqual(a_tensor.lshape, local_shape) - self.assertTrue((a_tensor.larray == local_tensor.larray).all()) + # split along axis = 2 + a_tensor.resplit_(axis=2) + local_shape = (N, N + 1, 2) + local_tensor = self.reference_tensor[ + :, :, 2 * ht.MPI_WORLD.rank : 2 * ht.MPI_WORLD.rank + 2 + ] - expected = torch.ones( - (ht.MPI_WORLD.size, 100), dtype=torch.int64, device=self.device.torch_device - ) - data = ht.array(expected, split=1) - data.resplit_(None) + self.assertEqual(a_tensor.lshape, local_shape) + self.assertTrue((a_tensor.larray == local_tensor.larray).all()) - self.assertTrue(torch.equal(data.larray, expected)) - self.assertFalse(data.is_distributed()) - self.assertIsNone(data.split) - self.assertEqual(data.dtype, ht.int64) - self.assertEqual(data.larray.dtype, expected.dtype) + expected = torch.ones( + (ht.MPI_WORLD.size, 100), dtype=torch.int64, device=self.device.torch_device + ) + data = ht.array(expected, split=1) + data.resplit_(None) - expected = torch.zeros( - (100, ht.MPI_WORLD.size), dtype=torch.uint8, device=self.device.torch_device - ) - data = ht.array(expected, split=0) - data.resplit_(None) + self.assertTrue(torch.equal(data.larray, expected)) + self.assertFalse(data.is_distributed()) + self.assertIsNone(data.split) + self.assertEqual(data.dtype, ht.int64) + self.assertEqual(data.larray.dtype, expected.dtype) - self.assertTrue(torch.equal(data.larray, expected)) - self.assertFalse(data.is_distributed()) - self.assertIsNone(data.split) - self.assertEqual(data.dtype, ht.uint8) - self.assertEqual(data.larray.dtype, expected.dtype) - - # "in place" - length = torch.tensor([i + 20 for i in range(2)], device=self.device.torch_device) - test = torch.arange( - torch.prod(length), dtype=torch.float64, device=self.device.torch_device - ).reshape([i + 20 for i in range(2)]) - a = ht.array(test, 
split=1) - a.resplit_(axis=0) - self.assertTrue(ht.equal(a, ht.array(test, split=0))) - self.assertEqual(a.split, 0) - self.assertEqual(a.dtype, ht.float64) - del a - - test = torch.arange(torch.prod(length), device=self.device.torch_device) - a = ht.array(test, split=0) - a.resplit_(axis=None) - self.assertTrue(ht.equal(a, ht.array(test, split=None))) - self.assertEqual(a.split, None) - self.assertEqual(a.dtype, ht.int64) - del a - - a = ht.array(test, split=None) - a.resplit_(axis=0) - self.assertTrue(ht.equal(a, ht.array(test, split=0))) - self.assertEqual(a.split, 0) - self.assertEqual(a.dtype, ht.int64) - del a - - a = ht.array(test, split=0) - resplit_a = ht.manipulations.resplit(a, axis=None) - self.assertTrue(ht.equal(resplit_a, ht.array(test, split=None))) - self.assertEqual(resplit_a.split, None) - self.assertEqual(resplit_a.dtype, ht.int64) - del a - - a = ht.array(test, split=None) - resplit_a = ht.manipulations.resplit(a, axis=0) - self.assertTrue(ht.equal(resplit_a, ht.array(test, split=0))) - self.assertEqual(resplit_a.split, 0) - self.assertEqual(resplit_a.dtype, ht.int64) - del a - - # 1D non-contiguous resplit testing - t1 = ht.arange(10 * 10, split=0).reshape((10, 10)) - t1_sub = t1[:, 1] # .expand_dims(0) - res = ht.array([1, 11, 21, 31, 41, 51, 61, 71, 81, 91]) - t1_sub.resplit_(axis=None) - self.assertTrue(ht.all(t1_sub == res)) - self.assertEqual(t1_sub.split, None) + expected = torch.zeros( + (100, ht.MPI_WORLD.size), dtype=torch.uint8, device=self.device.torch_device + ) + data = ht.array(expected, split=0) + data.resplit_(None) + + self.assertTrue(torch.equal(data.larray, expected)) + self.assertFalse(data.is_distributed()) + self.assertIsNone(data.split) + self.assertEqual(data.dtype, ht.uint8) + self.assertEqual(data.larray.dtype, expected.dtype) + + # "in place" + length = torch.tensor([i + 20 for i in range(2)], device=self.device.torch_device) + test = torch.arange( + torch.prod(length), dtype=torch.float64, device=self.device.torch_device + ).reshape([i + 20 for i in range(2)]) + a = ht.array(test, split=1) + a.resplit_(axis=0) + self.assertTrue(ht.equal(a, ht.array(test, split=0))) + self.assertEqual(a.split, 0) + self.assertEqual(a.dtype, ht.float64) + del a + + test = torch.arange(torch.prod(length), device=self.device.torch_device) + a = ht.array(test, split=0) + a.resplit_(axis=None) + self.assertTrue(ht.equal(a, ht.array(test, split=None))) + self.assertEqual(a.split, None) + self.assertEqual(a.dtype, ht.int64) + del a + + a = ht.array(test, split=None) + a.resplit_(axis=0) + self.assertTrue(ht.equal(a, ht.array(test, split=0))) + self.assertEqual(a.split, 0) + self.assertEqual(a.dtype, ht.int64) + del a + + a = ht.array(test, split=0) + resplit_a = ht.manipulations.resplit(a, axis=None) + self.assertTrue(ht.equal(resplit_a, ht.array(test, split=None))) + self.assertEqual(resplit_a.split, None) + self.assertEqual(resplit_a.dtype, ht.int64) + del a + + a = ht.array(test, split=None) + resplit_a = ht.manipulations.resplit(a, axis=0) + self.assertTrue(ht.equal(resplit_a, ht.array(test, split=0))) + self.assertEqual(resplit_a.split, 0) + self.assertEqual(resplit_a.dtype, ht.int64) + del a + + # 1D non-contiguous resplit testing + t1 = ht.arange(10 * 10, split=0).reshape((10, 10)) + t1_sub = t1[:, 1] # .expand_dims(0) + res = ht.array([1, 11, 21, 31, 41, 51, 61, 71, 81, 91]) + t1_sub.resplit_(axis=None) + self.assertTrue(ht.all(t1_sub == res)) + self.assertEqual(t1_sub.split, None) # 3D non-contiguous resplit testing (Column mayor ordering) torch_array = 
torch.arange(100, device=self.device.torch_device).reshape((10, 5, 2)) @@ -1257,14 +1297,20 @@ def test_setitem_getitem(self): # setting with heat tensor a = ht.zeros((4, 5), split=0) - a[1, 0:4] = ht.arange(4) + if self.is_mps: + a[1, 0:4] = ht.arange(4, dtype=a.dtype) + else: + a[1, 0:4] = ht.arange(4) # if a.comm.size == 2: for c, i in enumerate(range(4)): self.assertEqual(a[1, c], i) # setting with torch tensor a = ht.zeros((4, 5), split=0) - a[1, 0:4] = torch.arange(4, device=self.device.torch_device) + if self.is_mps: + a[1, 0:4] = torch.arange(4, dtype=a.larray.dtype, device=self.device.torch_device) + else: + a[1, 0:4] = torch.arange(4, device=self.device.torch_device) # if a.comm.size == 2: for c, i in enumerate(range(4)): self.assertEqual(a[1, c], i) @@ -1365,7 +1411,10 @@ def test_setitem_getitem(self): # setting with heat tensor a = ht.zeros((4, 5), split=1) - a[1, 0:4] = ht.arange(4) + if self.is_mps: + a[1, 0:4] = ht.arange(4, dtype=a.dtype) + else: + a[1, 0:4] = ht.arange(4) for c, i in enumerate(range(4)): b = a[1, c] if b.larray.numel() > 0: @@ -1373,7 +1422,10 @@ def test_setitem_getitem(self): # setting with torch tensor a = ht.zeros((4, 5), split=1) - a[1, 0:4] = torch.arange(4, device=self.device.torch_device) + if a.device.torch_device.startswith("mps"): + a[1, 0:4] = torch.arange(4, dtype=a.larray.dtype, device=self.device.torch_device) + else: + a[1, 0:4] = torch.arange(4, device=self.device.torch_device) for c, i in enumerate(range(4)): self.assertEqual(a[1, c], i) @@ -1615,20 +1667,22 @@ def test_stride_and_strides(self): self.assertEqual(heat_float32.strides, numpy_float32.strides) # Local, float64, column-major memory layout - torch_float64 = torch.arange( - 6 * 5 * 3 * 4 * 5 * 7, dtype=torch.float64, device=self.device.torch_device - ).reshape(6, 5, 3, 4, 5, 7) - heat_float64_F = ht.array(torch_float64, order="F") - numpy_float64_F = np.array(torch_float64.cpu().numpy(), order="F") - self.assertNotEqual(heat_float64_F.stride(), torch_float64.stride()) - if pytorch_major_version >= 2: - self.assertTrue( - ( - np.asarray(heat_float64_F.strides) * 8 == np.asarray(numpy_float64_F.strides) - ).all() - ) - else: - self.assertEqual(heat_float64_F.strides, numpy_float64_F.strides) + if not self.is_mps: + torch_float64 = torch.arange( + 6 * 5 * 3 * 4 * 5 * 7, dtype=torch.float64, device=self.device.torch_device + ).reshape(6, 5, 3, 4, 5, 7) + heat_float64_F = ht.array(torch_float64, order="F") + numpy_float64_F = np.array(torch_float64.cpu().numpy(), order="F") + self.assertNotEqual(heat_float64_F.stride(), torch_float64.stride()) + if pytorch_major_version >= 2: + self.assertTrue( + ( + np.asarray(heat_float64_F.strides) * 8 + == np.asarray(numpy_float64_F.strides) + ).all() + ) + else: + self.assertEqual(heat_float64_F.strides, numpy_float64_F.strides) # Distributed, int16, row-major memory layout size = ht.communication.MPI_WORLD.size @@ -1674,24 +1728,25 @@ def test_stride_and_strides(self): self.assertEqual(heat_float32_split.strides, numpy_float32_split_strides) # Distributed, float64, column-major memory layout - split = -2 - torch_float64 = torch.arange( - 6 * 5 * 3 * 4 * 5 * size * 7, dtype=torch.float64, device=self.device.torch_device - ).reshape(6, 5, 3, 4, 5 * size, 7) - heat_float64_F_split = ht.array(torch_float64, order="F", split=split) - numpy_float64_F = np.array(torch_float64.cpu().numpy(), order="F") - numpy_float64_F_split_strides = numpy_float64_F.strides[: split + 1] + tuple( - np.array(numpy_float64_F.strides[split + 1 :]) / size - ) - if 
pytorch_major_version >= 2: - self.assertTrue( - ( - np.asarray(heat_float64_F_split.strides) * 8 - == np.asarray(numpy_float64_F_split_strides) - ).all() + if not self.is_mps: + split = -2 + torch_float64 = torch.arange( + 6 * 5 * 3 * 4 * 5 * size * 7, dtype=torch.float64, device=self.device.torch_device + ).reshape(6, 5, 3, 4, 5 * size, 7) + heat_float64_F_split = ht.array(torch_float64, order="F", split=split) + numpy_float64_F = np.array(torch_float64.cpu().numpy(), order="F") + numpy_float64_F_split_strides = numpy_float64_F.strides[: split + 1] + tuple( + np.array(numpy_float64_F.strides[split + 1 :]) / size ) - else: - self.assertEqual(heat_float64_F_split.strides, numpy_float64_F_split_strides) + if pytorch_major_version >= 2: + self.assertTrue( + ( + np.asarray(heat_float64_F_split.strides) * 8 + == np.asarray(numpy_float64_F_split_strides) + ).all() + ) + else: + self.assertEqual(heat_float64_F_split.strides, numpy_float64_F_split_strides) def test_tolist(self): a = ht.zeros([ht.MPI_WORLD.size, ht.MPI_WORLD.size, ht.MPI_WORLD.size], dtype=ht.int32) @@ -1758,6 +1813,14 @@ def test_torch_proxy(self): ) self.assertTrue(dndarray_proxy_nbytes == 1) + def test_torch_function(self): + arr = ht.array([1, 2, 3, 4]) + self.assertIsInstance(torch.concatenate([arr, arr]), ht.DNDarray) + self.assertIsInstance(torch.sum(arr, axis=0), ht.DNDarray) + + with self.assertRaises(TypeError): + torch.sigmoid(arr) + def test_xor(self): int16_tensor = ht.array([[1, 1], [2, 2]], dtype=ht.int16) int16_vector = ht.array([[3, 4]], dtype=ht.int16) diff --git a/heat/core/tests/test_exponential.py b/heat/core/tests/test_exponential.py index 861e0166d2..b26cfe789a 100644 --- a/heat/core/tests/test_exponential.py +++ b/heat/core/tests/test_exponential.py @@ -6,9 +6,14 @@ class TestExponential(TestCase): + def set_torch_dtype(self): + dtype = torch.float32 if self.is_mps else torch.float64 + return dtype + def test_exp(self): elements = 10 - tmp = torch.arange(elements, dtype=torch.float64, device=self.device.torch_device).exp() + torch_dtype = self.set_torch_dtype() + tmp = torch.arange(elements, dtype=torch_dtype, device=self.device.torch_device).exp() comparison = ht.array(tmp) # exponential of float32 @@ -19,11 +24,12 @@ def test_exp(self): self.assertTrue(ht.allclose(float32_exp, comparison.astype(ht.float32))) # exponential of float64 - float64_tensor = ht.arange(elements, dtype=ht.float64) - float64_exp = ht.exp(float64_tensor) - self.assertIsInstance(float64_exp, ht.DNDarray) - self.assertEqual(float64_exp.dtype, ht.float64) - self.assertTrue(ht.allclose(float64_exp, comparison)) + if not self.is_mps: + float64_tensor = ht.arange(elements, dtype=ht.float64) + float64_exp = ht.exp(float64_tensor) + self.assertIsInstance(float64_exp, ht.DNDarray) + self.assertEqual(float64_exp.dtype, ht.float64) + self.assertTrue(ht.allclose(float64_exp, comparison)) # exponential of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(elements, dtype=ht.int32) @@ -33,11 +39,12 @@ def test_exp(self): self.assertTrue(ht.allclose(int32_exp, ht.float32(comparison))) # exponential of longs, automatic conversion to intermediate floats - int64_tensor = ht.arange(elements, dtype=ht.int64) - int64_exp = int64_tensor.exp() - self.assertIsInstance(int64_exp, ht.DNDarray) - self.assertEqual(int64_exp.dtype, ht.float64) - self.assertTrue(ht.allclose(int64_exp, comparison)) + if not self.is_mps: + int64_tensor = ht.arange(elements, dtype=ht.int64) + int64_exp = int64_tensor.exp() + self.assertIsInstance(int64_exp, 
ht.DNDarray) + self.assertEqual(int64_exp.dtype, ht.float64) + self.assertTrue(ht.allclose(int64_exp, comparison)) # check exceptions with self.assertRaises(TypeError): @@ -56,8 +63,9 @@ def test_exp(self): self.assertEqual(actual.dtype, ht.float32) def test_expm1(self): + torch_dtype = self.set_torch_dtype() elements = 10 - tmp = torch.arange(elements, dtype=torch.float64, device=self.device.torch_device).expm1() + tmp = torch.arange(elements, dtype=torch_dtype, device=self.device.torch_device).expm1() comparison = ht.array(tmp) # expm1 of float32 @@ -68,11 +76,12 @@ def test_expm1(self): self.assertTrue(ht.allclose(float32_expm1, comparison.astype(ht.float32))) # expm1 of float64 - float64_tensor = ht.arange(elements, dtype=ht.float64) - float64_expm1 = ht.expm1(float64_tensor) - self.assertIsInstance(float64_expm1, ht.DNDarray) - self.assertEqual(float64_expm1.dtype, ht.float64) - self.assertTrue(ht.allclose(float64_expm1, comparison)) + if not self.is_mps: + float64_tensor = ht.arange(elements, dtype=ht.float64) + float64_expm1 = ht.expm1(float64_tensor) + self.assertIsInstance(float64_expm1, ht.DNDarray) + self.assertEqual(float64_expm1.dtype, ht.float64) + self.assertTrue(ht.allclose(float64_expm1, comparison)) # expm1 of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(elements, dtype=ht.int32) @@ -82,11 +91,12 @@ def test_expm1(self): self.assertTrue(ht.allclose(int32_expm1, ht.float32(comparison))) # expm1 of longs, automatic conversion to intermediate floats - int64_tensor = ht.arange(elements, dtype=ht.int64) - int64_expm1 = int64_tensor.expm1() - self.assertIsInstance(int64_expm1, ht.DNDarray) - self.assertEqual(int64_expm1.dtype, ht.float64) - self.assertTrue(ht.allclose(int64_expm1, comparison)) + if not self.is_mps: + int64_tensor = ht.arange(elements, dtype=ht.int64) + int64_expm1 = int64_tensor.expm1() + self.assertIsInstance(int64_expm1, ht.DNDarray) + self.assertEqual(int64_expm1.dtype, ht.float64) + self.assertTrue(ht.allclose(int64_expm1, comparison)) # check exceptions with self.assertRaises(TypeError): @@ -95,8 +105,10 @@ def test_expm1(self): ht.expm1("hello world") def test_exp2(self): + torch_dtype = self.set_torch_dtype() elements = 10 - tmp = np.exp2(torch.arange(elements, dtype=torch.float64)) + tmp = np.exp2(torch.arange(elements, dtype=torch_dtype).numpy()) + tmp = torch.tensor(tmp) tmp = tmp.to(self.device.torch_device) comparison = ht.array(tmp, device=self.device) # exponential of float32 @@ -108,11 +120,12 @@ def test_exp2(self): self.assertTrue(ht.allclose(float32_exp2, comparison.astype(ht.float32))) # exponential of float64 - float64_tensor = ht.arange(elements, dtype=ht.float64) - float64_exp2 = ht.exp2(float64_tensor) - self.assertIsInstance(float64_exp2, ht.DNDarray) - self.assertEqual(float64_exp2.dtype, ht.float64) - self.assertTrue(ht.allclose(float64_exp2, comparison)) + if not self.is_mps: + float64_tensor = ht.arange(elements, dtype=ht.float64) + float64_exp2 = ht.exp2(float64_tensor) + self.assertIsInstance(float64_exp2, ht.DNDarray) + self.assertEqual(float64_exp2.dtype, ht.float64) + self.assertTrue(ht.allclose(float64_exp2, comparison)) # exponential of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(elements, dtype=ht.int32) @@ -122,11 +135,12 @@ def test_exp2(self): self.assertTrue(ht.allclose(int32_exp2, ht.float32(comparison))) # exponential of longs, automatic conversion to intermediate floats - int64_tensor = ht.arange(elements, dtype=ht.int64) - int64_exp2 =
int64_tensor.exp2() - self.assertIsInstance(int64_exp2, ht.DNDarray) - self.assertEqual(int64_exp2.dtype, ht.float64) - self.assertTrue(ht.allclose(int64_exp2, comparison)) + if not self.is_mps: + int64_tensor = ht.arange(elements, dtype=ht.int64) + int64_exp2 = int64_tensor.exp2() + self.assertIsInstance(int64_exp2, ht.DNDarray) + self.assertEqual(int64_exp2.dtype, ht.float64) + self.assertTrue(ht.allclose(int64_exp2, comparison)) # check exceptions with self.assertRaises(TypeError): @@ -135,8 +149,9 @@ def test_exp2(self): ht.exp2("hello world") def test_log(self): + torch_dtype = self.set_torch_dtype() elements = 15 - tmp = torch.arange(1, elements, dtype=torch.float64, device=self.device.torch_device).log() + tmp = torch.arange(1, elements, dtype=torch_dtype, device=self.device.torch_device).log() comparison = ht.array(tmp) # logarithm of float32 @@ -147,11 +162,12 @@ def test_log(self): self.assertTrue(ht.allclose(float32_log, comparison.astype(ht.float32))) # logarithm of float64 - float64_tensor = ht.arange(1, elements, dtype=ht.float64) - float64_log = ht.log(float64_tensor) - self.assertIsInstance(float64_log, ht.DNDarray) - self.assertEqual(float64_log.dtype, ht.float64) - self.assertTrue(ht.allclose(float64_log, comparison)) + if not self.is_mps: + float64_tensor = ht.arange(1, elements, dtype=ht.float64) + float64_log = ht.log(float64_tensor) + self.assertIsInstance(float64_log, ht.DNDarray) + self.assertEqual(float64_log.dtype, ht.float64) + self.assertTrue(ht.allclose(float64_log, comparison)) # logarithm of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(1, elements, dtype=ht.int32) @@ -161,11 +177,12 @@ def test_log(self): self.assertTrue(ht.allclose(int32_log, ht.float32(comparison))) # logarithm of longs, automatic conversion to intermediate floats - int64_tensor = ht.arange(1, elements, dtype=ht.int64) - int64_log = int64_tensor.log() - self.assertIsInstance(int64_log, ht.DNDarray) - self.assertEqual(int64_log.dtype, ht.float64) - self.assertTrue(ht.allclose(int64_log, comparison)) + if not self.is_mps: + int64_tensor = ht.arange(1, elements, dtype=ht.int64) + int64_log = int64_tensor.log() + self.assertIsInstance(int64_log, ht.DNDarray) + self.assertEqual(int64_log.dtype, ht.float64) + self.assertTrue(ht.allclose(int64_log, comparison)) # check exceptions with self.assertRaises(TypeError): @@ -174,8 +191,9 @@ def test_log(self): ht.log("hello world") def test_log2(self): + torch_dtype = self.set_torch_dtype() elements = 15 - tmp = torch.arange(1, elements, dtype=torch.float64, device=self.device.torch_device).log2() + tmp = torch.arange(1, elements, dtype=torch_dtype, device=self.device.torch_device).log2() comparison = ht.array(tmp) # logarithm of float32 @@ -186,11 +204,12 @@ def test_log2(self): self.assertTrue(ht.allclose(float32_log2, comparison.astype(ht.float32))) # logarithm of float64 - float64_tensor = ht.arange(1, elements, dtype=ht.float64) - float64_log2 = ht.log2(float64_tensor) - self.assertIsInstance(float64_log2, ht.DNDarray) - self.assertEqual(float64_log2.dtype, ht.float64) - self.assertTrue(ht.allclose(float64_log2, comparison)) + if not self.is_mps: + float64_tensor = ht.arange(1, elements, dtype=ht.float64) + float64_log2 = ht.log2(float64_tensor) + self.assertIsInstance(float64_log2, ht.DNDarray) + self.assertEqual(float64_log2.dtype, ht.float64) + self.assertTrue(ht.allclose(float64_log2, comparison)) # logarithm of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(1, elements, dtype=ht.int32) 
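The hunks above and below all apply the same guard: Apple's MPS backend has no float64 support, so the float64 branches of these tests are either skipped via `self.is_mps` or rerouted to float32 through `set_torch_dtype`. A minimal sketch of that pattern, outside the patch and using only public PyTorch calls; the helper name `pick_float_dtype` is illustrative, not part of this changeset:

import torch

def pick_float_dtype(is_mps: bool) -> torch.dtype:
    # MPS tensors cannot hold float64, so degrade to float32 there
    return torch.float32 if is_mps else torch.float64

# e.g. building the reference values the way the log tests here do
is_mps = torch.backends.mps.is_available()
reference = torch.arange(1, 15, dtype=pick_float_dtype(is_mps)).log()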
@@ -200,11 +219,12 @@ def test_log2(self): self.assertTrue(ht.allclose(int32_log2, ht.float32(comparison))) # logarithm of longs, automatic conversion to intermediate floats - int64_tensor = ht.arange(1, elements, dtype=ht.int64) - int64_log2 = int64_tensor.log2() - self.assertIsInstance(int64_log2, ht.DNDarray) - self.assertEqual(int64_log2.dtype, ht.float64) - self.assertTrue(ht.allclose(int64_log2, comparison)) + if not self.is_mps: + int64_tensor = ht.arange(1, elements, dtype=ht.int64) + int64_log2 = int64_tensor.log2() + self.assertIsInstance(int64_log2, ht.DNDarray) + self.assertEqual(int64_log2.dtype, ht.float64) + self.assertTrue(ht.allclose(int64_log2, comparison)) # check exceptions with self.assertRaises(TypeError): @@ -213,10 +233,9 @@ def test_log2(self): ht.log2("hello world") def test_log10(self): + torch_dtype = self.set_torch_dtype() elements = 15 - tmp = torch.arange( - 1, elements, dtype=torch.float64, device=self.device.torch_device - ).log10() + tmp = torch.arange(1, elements, dtype=torch_dtype, device=self.device.torch_device).log10() comparison = ht.array(tmp) # logarithm of float32 @@ -227,11 +246,12 @@ def test_log10(self): self.assertTrue(ht.allclose(float32_log10, comparison.astype(ht.float32))) # logarithm of float64 - float64_tensor = ht.arange(1, elements, dtype=ht.float64) - float64_log10 = ht.log10(float64_tensor) - self.assertIsInstance(float64_log10, ht.DNDarray) - self.assertEqual(float64_log10.dtype, ht.float64) - self.assertTrue(ht.allclose(float64_log10, comparison)) + if not self.is_mps: + float64_tensor = ht.arange(1, elements, dtype=ht.float64) + float64_log10 = ht.log10(float64_tensor) + self.assertIsInstance(float64_log10, ht.DNDarray) + self.assertEqual(float64_log10.dtype, ht.float64) + self.assertTrue(ht.allclose(float64_log10, comparison)) # logarithm of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(1, elements, dtype=ht.int32) @@ -241,11 +261,12 @@ def test_log10(self): self.assertTrue(ht.allclose(int32_log10, ht.float32(comparison))) # logarithm of longs, automatic conversion to intermediate floats - int64_tensor = ht.arange(1, elements, dtype=ht.int64) - int64_log10 = int64_tensor.log10() - self.assertIsInstance(int64_log10, ht.DNDarray) - self.assertEqual(int64_log10.dtype, ht.float64) - self.assertTrue(ht.allclose(int64_log10, comparison)) + if not self.is_mps: + int64_tensor = ht.arange(1, elements, dtype=ht.int64) + int64_log10 = int64_tensor.log10() + self.assertIsInstance(int64_log10, ht.DNDarray) + self.assertEqual(int64_log10.dtype, ht.float64) + self.assertTrue(ht.allclose(int64_log10, comparison)) # check exceptions with self.assertRaises(TypeError): @@ -254,10 +275,9 @@ def test_log10(self): ht.log10("hello world") def test_log1p(self): + torch_dtype = self.set_torch_dtype() elements = 15 - tmp = torch.arange( - 1, elements, dtype=torch.float64, device=self.device.torch_device - ).log1p() + tmp = torch.arange(1, elements, dtype=torch_dtype, device=self.device.torch_device).log1p() comparison = ht.array(tmp) # logarithm of float32 @@ -268,11 +288,12 @@ def test_log1p(self): self.assertTrue(ht.allclose(float32_log1p, comparison.astype(ht.float32))) # logarithm of float64 - float64_tensor = ht.arange(1, elements, dtype=ht.float64) - float64_log1p = ht.log1p(float64_tensor) - self.assertIsInstance(float64_log1p, ht.DNDarray) - self.assertEqual(float64_log1p.dtype, ht.float64) - self.assertTrue(ht.allclose(float64_log1p, comparison)) + if not self.is_mps: + float64_tensor = ht.arange(1, elements, 
dtype=ht.float64) + float64_log1p = ht.log1p(float64_tensor) + self.assertIsInstance(float64_log1p, ht.DNDarray) + self.assertEqual(float64_log1p.dtype, ht.float64) + self.assertTrue(ht.allclose(float64_log1p, comparison)) # logarithm of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(1, elements, dtype=ht.int32) @@ -282,11 +303,12 @@ def test_log1p(self): self.assertTrue(ht.allclose(int32_log1p, ht.float32(comparison))) # logarithm of longs, automatic conversion to intermediate floats - int64_tensor = ht.arange(1, elements, dtype=ht.int64) - int64_log1p = int64_tensor.log1p() - self.assertIsInstance(int64_log1p, ht.DNDarray) - self.assertEqual(int64_log1p.dtype, ht.float64) - self.assertTrue(ht.allclose(int64_log1p, comparison)) + if not self.is_mps: + int64_tensor = ht.arange(1, elements, dtype=ht.int64) + int64_log1p = int64_tensor.log1p() + self.assertIsInstance(int64_log1p, ht.DNDarray) + self.assertEqual(int64_log1p.dtype, ht.float64) + self.assertTrue(ht.allclose(int64_log1p, comparison)) # check exceptions with self.assertRaises(TypeError): @@ -295,8 +317,9 @@ def test_log1p(self): ht.log1p("hello world") def test_logaddexp(self): + torch_dtype = self.set_torch_dtype() elements = 15 - tmp = torch.arange(1, elements, dtype=torch.float64, device=self.device.torch_device) + tmp = torch.arange(1, elements, dtype=torch_dtype, device=self.device.torch_device) tmp = tmp.logaddexp(tmp) comparison = ht.array(tmp) @@ -308,11 +331,12 @@ def test_logaddexp(self): self.assertTrue(ht.allclose(float32_logaddexp, comparison.astype(ht.float32))) # logaddexp of float64 - float64_tensor = ht.arange(1, elements, dtype=ht.float64) - float64_logaddexp = ht.logaddexp(float64_tensor, float64_tensor) - self.assertIsInstance(float64_logaddexp, ht.DNDarray) - self.assertEqual(float64_logaddexp.dtype, ht.float64) - self.assertTrue(ht.allclose(float64_logaddexp, comparison)) + if not self.is_mps: + float64_tensor = ht.arange(1, elements, dtype=ht.float64) + float64_logaddexp = ht.logaddexp(float64_tensor, float64_tensor) + self.assertIsInstance(float64_logaddexp, ht.DNDarray) + self.assertEqual(float64_logaddexp.dtype, ht.float64) + self.assertTrue(ht.allclose(float64_logaddexp, comparison)) # check exceptions with self.assertRaises(TypeError): @@ -321,8 +345,9 @@ def test_logaddexp(self): ht.logaddexp("hello world", "hello world") def test_logaddexp2(self): + torch_dtype = self.set_torch_dtype() elements = 15 - tmp = torch.arange(1, elements, dtype=torch.float64, device=self.device.torch_device) + tmp = torch.arange(1, elements, dtype=torch_dtype, device=self.device.torch_device) tmp = tmp.logaddexp2(tmp) comparison = ht.array(tmp) @@ -334,11 +359,12 @@ def test_logaddexp2(self): self.assertTrue(ht.allclose(float32_logaddexp2, comparison.astype(ht.float32))) # logaddexp2 of float64 - float64_tensor = ht.arange(1, elements, dtype=ht.float64) - float64_logaddexp2 = ht.logaddexp2(float64_tensor, float64_tensor) - self.assertIsInstance(float64_logaddexp2, ht.DNDarray) - self.assertEqual(float64_logaddexp2.dtype, ht.float64) - self.assertTrue(ht.allclose(float64_logaddexp2, comparison)) + if not self.is_mps: + float64_tensor = ht.arange(1, elements, dtype=ht.float64) + float64_logaddexp2 = ht.logaddexp2(float64_tensor, float64_tensor) + self.assertIsInstance(float64_logaddexp2, ht.DNDarray) + self.assertEqual(float64_logaddexp2.dtype, ht.float64) + self.assertTrue(ht.allclose(float64_logaddexp2, comparison)) # check exceptions with self.assertRaises(TypeError): @@ -347,8 +373,9 @@ def 
test_logaddexp2(self): ht.logaddexp2("hello world", "hello world") def test_sqrt(self): + torch_dtype = self.set_torch_dtype() elements = 25 - tmp = torch.arange(elements, dtype=torch.float64, device=self.device.torch_device).sqrt() + tmp = torch.arange(elements, dtype=torch_dtype, device=self.device.torch_device).sqrt() comparison = ht.array(tmp) # square roots of float32 @@ -359,11 +386,12 @@ def test_sqrt(self): self.assertTrue(ht.allclose(float32_sqrt, comparison.astype(ht.float32), 1e-06)) # square roots of float64 - float64_tensor = ht.arange(elements, dtype=ht.float64) - float64_sqrt = ht.sqrt(float64_tensor) - self.assertIsInstance(float64_sqrt, ht.DNDarray) - self.assertEqual(float64_sqrt.dtype, ht.float64) - self.assertTrue(ht.allclose(float64_sqrt, comparison, 1e-06)) + if not self.is_mps: + float64_tensor = ht.arange(elements, dtype=ht.float64) + float64_sqrt = ht.sqrt(float64_tensor) + self.assertIsInstance(float64_sqrt, ht.DNDarray) + self.assertEqual(float64_sqrt.dtype, ht.float64) + self.assertTrue(ht.allclose(float64_sqrt, comparison, 1e-06)) # square roots of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(elements, dtype=ht.int32) @@ -373,11 +401,12 @@ def test_sqrt(self): self.assertTrue(ht.allclose(int32_sqrt, ht.float32(comparison), 1e-06)) # square roots of longs, automatic conversion to intermediate floats - int64_tensor = ht.arange(elements, dtype=ht.int64) - int64_sqrt = int64_tensor.sqrt() - self.assertIsInstance(int64_sqrt, ht.DNDarray) - self.assertEqual(int64_sqrt.dtype, ht.float64) - self.assertTrue(ht.allclose(int64_sqrt, comparison, 1e-06)) + if not self.is_mps: + int64_tensor = ht.arange(elements, dtype=ht.int64) + int64_sqrt = int64_tensor.sqrt() + self.assertIsInstance(int64_sqrt, ht.DNDarray) + self.assertEqual(int64_sqrt.dtype, ht.float64) + self.assertTrue(ht.allclose(int64_sqrt, comparison, 1e-06)) # check exceptions with self.assertRaises(TypeError): @@ -386,8 +415,9 @@ def test_sqrt(self): ht.sqrt("hello world") def test_sqrt_method(self): + torch_dtype = self.set_torch_dtype() elements = 25 - tmp = torch.arange(elements, dtype=torch.float64, device=self.device.torch_device).sqrt() + tmp = torch.arange(elements, dtype=torch_dtype, device=self.device.torch_device).sqrt() comparison = ht.array(tmp) # square roots of float32 @@ -397,10 +427,11 @@ def test_sqrt_method(self): self.assertTrue(ht.allclose(float32_sqrt, comparison.astype(ht.float32), 1e-05)) # square roots of float64 - float64_sqrt = ht.arange(elements, dtype=ht.float64).sqrt() - self.assertIsInstance(float64_sqrt, ht.DNDarray) - self.assertEqual(float64_sqrt.dtype, ht.float64) - self.assertTrue(ht.allclose(float64_sqrt, comparison, 1e-05)) + if not self.is_mps: + float64_sqrt = ht.arange(elements, dtype=ht.float64).sqrt() + self.assertIsInstance(float64_sqrt, ht.DNDarray) + self.assertEqual(float64_sqrt.dtype, ht.float64) + self.assertTrue(ht.allclose(float64_sqrt, comparison, 1e-05)) # square roots of ints, automatic conversion to intermediate floats int32_sqrt = ht.arange(elements, dtype=ht.int32).sqrt() @@ -409,10 +440,11 @@ def test_sqrt_method(self): self.assertTrue(ht.allclose(int32_sqrt, ht.float32(comparison), 1e-05)) # square roots of longs, automatic conversion to intermediate floats - int64_sqrt = ht.arange(elements, dtype=ht.int64).sqrt() - self.assertIsInstance(int64_sqrt, ht.DNDarray) - self.assertEqual(int64_sqrt.dtype, ht.float64) - self.assertTrue(ht.allclose(int64_sqrt, comparison, 1e-05)) + if not self.is_mps: + int64_sqrt = 
ht.arange(elements, dtype=ht.int64).sqrt() + self.assertIsInstance(int64_sqrt, ht.DNDarray) + self.assertEqual(int64_sqrt.dtype, ht.float64) + self.assertTrue(ht.allclose(int64_sqrt, comparison, 1e-05)) # check exceptions with self.assertRaises(TypeError): @@ -449,9 +481,10 @@ def test_sqrt_out_of_place(self): ht.sqrt(number_range, "hello world") def test_square(self): + torch_dtype = self.set_torch_dtype() elements = 25 tmp = torch.square( - torch.arange(elements, dtype=torch.float64, device=self.device.torch_device) + torch.arange(elements, dtype=torch_dtype, device=self.device.torch_device) ) comparison = ht.array(tmp) @@ -463,11 +496,12 @@ def test_square(self): self.assertTrue(ht.allclose(float32_square, comparison.astype(ht.float32), 1e-09)) # squares of float64 - float64_tensor = ht.arange(elements, dtype=ht.float64) - float64_square = ht.square(float64_tensor) - self.assertIsInstance(float64_square, ht.DNDarray) - self.assertEqual(float64_square.dtype, ht.float64) - self.assertTrue(ht.allclose(float64_square, comparison, 1e-09)) + if not self.is_mps: + float64_tensor = ht.arange(elements, dtype=ht.float64) + float64_square = ht.square(float64_tensor) + self.assertIsInstance(float64_square, ht.DNDarray) + self.assertEqual(float64_square.dtype, ht.float64) + self.assertTrue(ht.allclose(float64_square, comparison, 1e-09)) # squares of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(elements, dtype=ht.int32) @@ -477,11 +511,12 @@ def test_square(self): self.assertTrue(ht.allclose(int32_square, ht.float32(comparison), 1e-09)) # squares of longs, automatic conversion to intermediate floats - int64_tensor = ht.arange(elements, dtype=ht.int64) - int64_square = int64_tensor.square() - self.assertIsInstance(int64_square, ht.DNDarray) - self.assertEqual(int64_square.dtype, ht.float64) - self.assertTrue(ht.allclose(int64_square, comparison, 1e-09)) + if not self.is_mps: + int64_tensor = ht.arange(elements, dtype=ht.int64) + int64_square = int64_tensor.square() + self.assertIsInstance(int64_square, ht.DNDarray) + self.assertEqual(int64_square.dtype, ht.float64) + self.assertTrue(ht.allclose(int64_square, comparison, 1e-09)) # check exceptions with self.assertRaises(TypeError): diff --git a/heat/core/tests/test_factories.py b/heat/core/tests/test_factories.py index 25b3845f2a..a4ea615090 100644 --- a/heat/core/tests/test_factories.py +++ b/heat/core/tests/test_factories.py @@ -96,15 +96,17 @@ def test_arange(self): self.assertEqual(three_arg_arange_dtype_short.sum(axis=0, keepdims=True), 20) # testing setting dtype to float64 - three_arg_arange_dtype_float64 = ht.arange(0, 10, 2, dtype=torch.float64) - self.assertIsInstance(three_arg_arange_dtype_float64, ht.DNDarray) - self.assertEqual(three_arg_arange_dtype_float64.shape, (5,)) - self.assertLessEqual(three_arg_arange_dtype_float64.lshape[0], 5) - self.assertEqual(three_arg_arange_dtype_float64.dtype, ht.float64) - self.assertEqual(three_arg_arange_dtype_float64.larray.dtype, torch.float64) - self.assertEqual(three_arg_arange_dtype_float64.split, None) - # make an in direct check for the sequence, compare against the gaussian sum - self.assertEqual(three_arg_arange_dtype_float64.sum(axis=0, keepdims=True), 20.0) + if not self.is_mps: + three_arg_arange_dtype_float64 = ht.arange(0, 10, 2, dtype=torch.float64) + self.assertIsInstance(three_arg_arange_dtype_float64, ht.DNDarray) + self.assertEqual(three_arg_arange_dtype_float64.shape, (5,)) + self.assertLessEqual(three_arg_arange_dtype_float64.lshape[0], 5) + 
self.assertEqual(three_arg_arange_dtype_float64.dtype, ht.float64) + self.assertEqual(three_arg_arange_dtype_float64.larray.dtype, torch.float64) + self.assertEqual(three_arg_arange_dtype_float64.split, None) + # make an indirect check for the sequence, compare against the Gaussian sum + self.assertEqual(three_arg_arange_dtype_float64.sum(axis=0, keepdims=True), 20.0) + + check_precision = ht.arange(16777217.0, 16777218, 1, dtype=ht.float64) + self.assertEqual(check_precision.sum(), 16777217) - check_precision = ht.arange(16777217.0, 16777218, 1, dtype=ht.float64) - self.assertEqual(check_precision.sum(), 16777217) @@ -145,8 +149,9 @@ def test_array(self): == torch.tensor(tuple_data, dtype=torch.int8, device=self.device.torch_device) ).all() ) - check_precision = ht.array(16777217.0, dtype=ht.float64) - self.assertEqual(check_precision.sum(), 16777217) + if not self.is_mps: + check_precision = ht.array(16777217.0, dtype=ht.float64) + self.assertEqual(check_precision.sum(), 16777217) # basic array function, unsplit data, no copy torch_tensor = torch.tensor([6, 5, 4, 3, 2, 1], device=self.device.torch_device) @@ -190,10 +195,18 @@ def test_array(self): ) # distributed array, chunk local data (split), copy True - array_2d = np.array([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]) + if self.is_mps: + np_dtype = np.float32 + torch_dtype = torch.float32 + else: + np_dtype = np.float64 + torch_dtype = torch.float64 + ht_dtype = ht.types.canonical_heat_type(torch_dtype) + + array_2d = np.array([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]], dtype=np_dtype) dndarray_2d = ht.array(array_2d, split=0, copy=True) self.assertIsInstance(dndarray_2d, ht.DNDarray) - self.assertEqual(dndarray_2d.dtype, ht.float64) + self.assertEqual(dndarray_2d.dtype, ht_dtype) self.assertEqual(dndarray_2d.gshape, (3, 3)) self.assertEqual(len(dndarray_2d.lshape), 2) self.assertLessEqual(dndarray_2d.lshape[0], 3) @@ -208,12 +221,12 @@ def test_array(self): # distributed array, chunk local data (split), copy False, torch devices array_2d = torch.tensor( [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]], - dtype=torch.double, + dtype=torch_dtype, device=self.device.torch_device, ) - dndarray_2d = ht.array(array_2d, split=0, copy=False, dtype=ht.double) + dndarray_2d = ht.array(array_2d, split=0, copy=False, dtype=ht_dtype) self.assertIsInstance(dndarray_2d, ht.DNDarray) - self.assertEqual(dndarray_2d.dtype, ht.float64) + self.assertEqual(dndarray_2d.dtype, ht_dtype) self.assertEqual(dndarray_2d.gshape, (3, 3)) self.assertEqual(len(dndarray_2d.lshape), 2) self.assertLessEqual(dndarray_2d.lshape[0], 3) @@ -229,9 +242,9 @@ def test_array(self): self.assertIs(dndarray_2d.larray, array_2d) # The array should not change as all properties match - dndarray_2d_new = ht.array(dndarray_2d, split=0, copy=False, dtype=ht.double) + dndarray_2d_new = ht.array(dndarray_2d, split=0, copy=False, dtype=ht_dtype) self.assertIsInstance(dndarray_2d_new, ht.DNDarray) - self.assertEqual(dndarray_2d_new.dtype, ht.float64) + self.assertEqual(dndarray_2d_new.dtype, ht_dtype) self.assertEqual(dndarray_2d_new.gshape, (3, 3)) self.assertEqual(len(dndarray_2d_new.lshape), 2) self.assertLessEqual(dndarray_2d_new.lshape[0], 3) @@ -245,14 +258,14 @@ def test_array(self): # Reuse the same array self.assertIs(dndarray_2d_new.larray, dndarray_2d.larray) - # Should throw exeception because of resplit it causes a resplit + # Should raise an exception because it causes a resplit with self.assertRaises(ValueError): dndarray_2d_new = ht.array(dndarray_2d, 
split=1, copy=False, dtype=ht.double) # The array should not change as all properties match - dndarray_2d_new = ht.array(dndarray_2d, is_split=0, copy=False, dtype=ht.double) + dndarray_2d_new = ht.array(dndarray_2d, is_split=0, copy=False, dtype=ht_dtype) self.assertIsInstance(dndarray_2d_new, ht.DNDarray) - self.assertEqual(dndarray_2d_new.dtype, ht.float64) + self.assertEqual(dndarray_2d_new.dtype, ht_dtype) self.assertEqual(dndarray_2d_new.gshape, (3, 3)) self.assertEqual(len(dndarray_2d_new.lshape), 2) self.assertLessEqual(dndarray_2d_new.lshape[0], 3) @@ -574,65 +587,67 @@ def get_offset(tensor_array): def test_from_partitioned(self): a = ht.zeros((120, 120), split=0) - b = ht.from_partitioned(a, comm=a.comm) - a[2, :] = 128 - self.assertTrue(ht.equal(a, b)) - - a.resplit_(None) - b = ht.from_partitioned(a, comm=a.comm) - self.assertTrue(ht.equal(a, b)) - - a.resplit_(1) - b = ht.from_partitioned(a, comm=a.comm) - b[50] = 94 - self.assertTrue(ht.equal(a, b)) - - del b.__partitioned__["shape"] - with self.assertRaises(RuntimeError): - _ = ht.from_partitioned(b) - b.__partitions_dict__ = None - _ = b.__partitioned__ - - del b.__partitioned__["locals"] - with self.assertRaises(RuntimeError): - _ = ht.from_partitioned(b) - b.__partitions_dict__ = None - _ = b.__partitioned__ - - del b.__partitioned__["locals"] - with self.assertRaises(RuntimeError): - _ = ht.from_partitioned(b) - b.__partitions_dict__ = None - _ = b.__partitioned__ + if not self.is_mps: + b = ht.from_partitioned(a, comm=a.comm) + a[2, :] = 128 + self.assertTrue(ht.equal(a, b)) + + a.resplit_(None) + b = ht.from_partitioned(a, comm=a.comm) + self.assertTrue(ht.equal(a, b)) + + a.resplit_(1) + b = ht.from_partitioned(a, comm=a.comm) + b[50] = 94 + self.assertTrue(ht.equal(a, b)) + + del b.__partitioned__["shape"] + with self.assertRaises(RuntimeError): + _ = ht.from_partitioned(b) + b.__partitions_dict__ = None + _ = b.__partitioned__ + + del b.__partitioned__["locals"] + with self.assertRaises(RuntimeError): + _ = ht.from_partitioned(b) + b.__partitions_dict__ = None + _ = b.__partitioned__ + + del b.__partitioned__["locals"] + with self.assertRaises(RuntimeError): + _ = ht.from_partitioned(b) + b.__partitions_dict__ = None + _ = b.__partitioned__ def test_from_partition_dict(self): a = ht.zeros((120, 120), split=0) - b = ht.from_partition_dict(a.__partitioned__, comm=a.comm) - a[0, 0] = 100 - self.assertTrue(ht.equal(a, b)) - - a.resplit_(None) - a[0, 0] = 50 - b = ht.from_partition_dict(a.__partitioned__, comm=a.comm) - self.assertTrue(ht.equal(a, b)) - - del b.__partitioned__["shape"] - with self.assertRaises(RuntimeError): - _ = ht.from_partition_dict(b.__partitioned__) - b.__partitions_dict__ = None - _ = b.__partitioned__ - - del b.__partitioned__["locals"] - with self.assertRaises(RuntimeError): - _ = ht.from_partition_dict(b.__partitioned__) - b.__partitions_dict__ = None - _ = b.__partitioned__ - - del b.__partitioned__["locals"] - with self.assertRaises(RuntimeError): - _ = ht.from_partition_dict(b.__partitioned__) - b.__partitions_dict__ = None - _ = b.__partitioned__ + if not self.is_mps: + b = ht.from_partition_dict(a.__partitioned__, comm=a.comm) + a[0, 0] = 100 + self.assertTrue(ht.equal(a, b)) + + a.resplit_(None) + a[0, 0] = 50 + b = ht.from_partition_dict(a.__partitioned__, comm=a.comm) + self.assertTrue(ht.equal(a, b)) + + del b.__partitioned__["shape"] + with self.assertRaises(RuntimeError): + _ = ht.from_partition_dict(b.__partitioned__) + b.__partitions_dict__ = None + _ = b.__partitioned__ + + del 
b.__partitioned__["locals"] + with self.assertRaises(RuntimeError): + _ = ht.from_partition_dict(b.__partitioned__) + b.__partitions_dict__ = None + _ = b.__partitioned__ + + del b.__partitioned__["locals"] + with self.assertRaises(RuntimeError): + _ = ht.from_partition_dict(b.__partitioned__) + b.__partitions_dict__ = None + _ = b.__partitioned__ def test_full(self): # simple tensor @@ -732,8 +747,9 @@ def test_linspace(self): zero_samples = ht.linspace(-3, 5, num=0) self.assertEqual(zero_samples.size, 0) - check_precision = ht.linspace(0.0, 16777217.0, num=2, dtype=torch.float64) - self.assertEqual(check_precision.sum(), 16777217) + if not self.is_mps: + check_precision = ht.linspace(0.0, 16777217.0, num=2, dtype=torch.float64) + self.assertEqual(check_precision.sum(), 16777217) # simple inverse linear space descending = ht.linspace(-5, 3, num=100) diff --git a/heat/core/tests/test_io.py b/heat/core/tests/test_io.py index 6f75846e5f..ac5ebd4a6c 100644 --- a/heat/core/tests/test_io.py +++ b/heat/core/tests/test_io.py @@ -1,3 +1,4 @@ +from typing import Iterable import numpy as np import os import torch @@ -35,6 +36,11 @@ def setUpClass(cls): .to(cls.device.torch_device) ) + cls.ZARR_SHAPE = (100, 100) + cls.ZARR_OUT_PATH = pwd + "/zarr_test_out.zarr" + cls.ZARR_IN_PATH = pwd + "/zarr_test_in.zarr" + cls.ZARR_TEMP_PATH = pwd + "/zarr_temp.zarr" + def tearDown(self): # synchronize all nodes ht.MPI_WORLD.Barrier() @@ -53,9 +59,38 @@ def tearDown(self): pass # if ht.MPI_WORLD.rank == 0: + if ht.io.supports_zarr(): + for file in [self.ZARR_TEMP_PATH, self.ZARR_IN_PATH, self.ZARR_OUT_PATH]: + try: + shutil.rmtree(file) + except FileNotFoundError: + pass + # synchronize all nodes ht.MPI_WORLD.Barrier() + def test_size_from_slice(self): + test_cases = [ + (1000, slice(500)), + (10, slice(0, 10, 2)), + (100, slice(0, 100, 10)), + (1000, slice(0, 1000, 100)), + (0, slice(0)), + ] + for size, slice_obj in test_cases: + with self.subTest(size=size, slice=slice_obj): + expected_sequence = list(range(size))[slice_obj] + if len(expected_sequence) == 0: + expected_offset = 0 + else: + expected_offset = expected_sequence[0] + + expected_new_size = len(expected_sequence) + + new_size, offset = ht.io.size_from_slice(size, slice_obj) + self.assertEqual(expected_new_size, new_size) + self.assertEqual(expected_offset, offset) + # catch-all loading def test_load(self): # HDF5 @@ -154,12 +189,23 @@ def test_load_csv(self): "Requires the environment variable 'TMPDIR' to point to a globally accessible path. 
Otherwise the test will be skiped on multi-node setups.", ) def test_save_csv(self): - for rnd_type in [ - (ht.random.randint, ht.types.int32), - (ht.random.randint, ht.types.int64), - (ht.random.rand, ht.types.float32), - (ht.random.rand, ht.types.float64), - ]: + # Test for different random types + # include float64 only if device is not MPS + data = None + if self.is_mps: + rnd_types = [ + (ht.random.randint, ht.types.int32), + (ht.random.randint, ht.types.int64), + (ht.random.rand, ht.types.float32), + ] + else: + rnd_types = [ + (ht.random.randint, ht.types.int32), + (ht.random.randint, ht.types.int64), + (ht.random.rand, ht.types.float32), + (ht.random.rand, ht.types.float64), + ] + for rnd_type in rnd_types: for separator in [",", ";", "|"]: for split in [None, 0, 1]: for headers in [None, ["# This", "# is a", "# test."]]: @@ -541,10 +587,6 @@ def test_load_hdf5(self): self.assertEqual(iris.larray.dtype, torch.float32) self.assertTrue((self.IRIS == iris.larray).all()) - # cropped load - iris_cropped = ht.load_hdf5(self.HDF5_PATH, self.HDF5_DATASET, split=0, load_fraction=0.5) - self.assertEqual(iris_cropped.shape[0], iris.shape[0] // 2) - # positive split axis iris = ht.load_hdf5(self.HDF5_PATH, self.HDF5_DATASET, split=0) self.assertIsInstance(iris, ht.DNDarray) @@ -582,10 +624,6 @@ def test_load_hdf5_exception(self): ht.load_hdf5("iris.h5", 1) with self.assertRaises(TypeError): ht.load_hdf5("iris.h5", dataset="data", split=1.0) - with self.assertRaises(TypeError): - ht.load_hdf5(self.HDF5_PATH, self.HDF5_DATASET, load_fraction="a") - with self.assertRaises(ValueError): - ht.load_hdf5(self.HDF5_PATH, self.HDF5_DATASET, load_fraction=0.0, split=0) # file or dataset does not exist with self.assertRaises(IOError): @@ -783,17 +821,19 @@ def test_load_npy_float(self): float_array = np.concatenate(crea_array, 1) ht.MPI_WORLD.Barrier() - load_array = ht.load_npy_from_path( - os.path.join(os.getcwd(), "heat/datasets"), dtype=ht.float64, split=1 - ) - load_array_npy = load_array.numpy() - self.assertIsInstance(load_array, ht.DNDarray) - self.assertEqual(load_array.dtype, ht.float64) - if ht.MPI_WORLD.rank == 0: - self.assertTrue((load_array_npy == float_array).all) - for file in os.listdir(os.path.join(os.getcwd(), "heat/datasets")): - if fnmatch.fnmatch(file, "*.npy"): - os.remove(os.path.join(os.getcwd(), "heat/datasets", file)) + if not self.is_mps: + # float64 not supported in MPS + load_array = ht.load_npy_from_path( + os.path.join(os.getcwd(), "heat/datasets"), dtype=ht.float64, split=1 + ) + load_array_npy = load_array.numpy() + self.assertIsInstance(load_array, ht.DNDarray) + self.assertEqual(load_array.dtype, ht.float64) + if ht.MPI_WORLD.rank == 0: + self.assertTrue((load_array_npy == float_array).all) + for file in os.listdir(os.path.join(os.getcwd(), "heat/datasets")): + if fnmatch.fnmatch(file, "*.npy"): + os.remove(os.path.join(os.getcwd(), "heat/datasets", file)) def test_load_npy_exception(self): with self.assertRaises(TypeError): @@ -892,3 +932,235 @@ def test_load_multiple_csv_exception(self): ht.MPI_WORLD.Barrier() if ht.MPI_WORLD.rank == 0: shutil.rmtree(os.path.join(os.getcwd(), "heat/datasets/csv_tests")) + + def test_load_zarr(self): + if not ht.io.supports_zarr(): + self.skipTest("Requires zarr") + + import zarr + + test_data = np.arange(self.ZARR_SHAPE[0] * self.ZARR_SHAPE[1]).reshape(self.ZARR_SHAPE) + + if ht.MPI_WORLD.rank == 0: + try: + arr = zarr.create_array( + self.ZARR_TEMP_PATH, shape=self.ZARR_SHAPE, dtype=np.float64 + ) + except AttributeError: + arr = 
zarr.create( + store=self.ZARR_TEMP_PATH, shape=self.ZARR_SHAPE, dtype=np.float64 + ) + arr[:] = test_data + + ht.MPI_WORLD.handle.Barrier() + + dndarray = ht.load_zarr(self.ZARR_TEMP_PATH) + dndnumpy = dndarray.numpy() + + if ht.MPI_WORLD.rank == 0: + self.assertTrue((dndnumpy == test_data).all()) + + ht.MPI_WORLD.Barrier() + + def test_load_zarr_slice(self): + if not ht.io.supports_zarr(): + self.skipTest("Requires zarr") + + import zarr + + test_data = np.arange(25).reshape(5, 5) + + if ht.MPI_WORLD.rank == 0: + try: + arr = zarr.create_array( + self.ZARR_TEMP_PATH, shape=test_data.shape, dtype=test_data.dtype + ) + except AttributeError: + arr = zarr.create( + store=self.ZARR_TEMP_PATH, shape=test_data.shape, dtype=test_data.dtype + ) + arr[:] = test_data + + ht.MPI_WORLD.Barrier() + + slices_to_test = [ + None, + slice(None), + slice(1, -1), + [None], + [None, slice(None)], + [None, slice(1, -1)], + [slice(1, -1)], + [slice(1, -1), None], + ] + + for slices in slices_to_test: + with self.subTest(slices=slices): + dndarray = ht.load_zarr(self.ZARR_TEMP_PATH, slices=slices) + dndnumpy = dndarray.numpy() + + if not isinstance(slices, Iterable): + slices = [slices] + + slices = tuple( + slice(elem) if not isinstance(elem, slice) else elem for elem in slices + ) + + if ht.MPI_WORLD.rank == 0: + self.assertTrue((dndnumpy == test_data[slices]).all()) + + ht.MPI_WORLD.Barrier() + + def test_save_zarr_2d_split0(self): + if not ht.io.supports_zarr(): + self.skipTest("Requires zarr") + + import zarr + + for type in [ht.types.int32, ht.types.int64, ht.types.float32, ht.types.float64]: + for dims in [(i, self.ZARR_SHAPE[1]) for i in range(1, max(10, ht.MPI_WORLD.size + 1))]: + with self.subTest(type=type, dims=dims): + n = dims[0] * dims[1] + dndarray = ht.arange(0, n, dtype=type, split=0).reshape(dims) + ht.save_zarr(dndarray, self.ZARR_OUT_PATH, overwrite=True) + dndnumpy = dndarray.numpy() + zarr_array = zarr.open_array(self.ZARR_OUT_PATH) + + if ht.MPI_WORLD.rank == 0: + self.assertTrue((dndnumpy == zarr_array).all()) + + ht.MPI_WORLD.handle.Barrier() + + def test_save_zarr_2d_split1(self): + if not ht.io.supports_zarr(): + self.skipTest("Requires zarr") + + import zarr + + for type in [ht.types.int32, ht.types.int64, ht.types.float32, ht.types.float64]: + for dims in [(self.ZARR_SHAPE[0], i) for i in range(1, max(10, ht.MPI_WORLD.size + 1))]: + with self.subTest(type=type, dims=dims): + n = dims[0] * dims[1] + dndarray = ht.arange(0, n, dtype=type).reshape(dims).resplit(axis=1) + ht.save_zarr(dndarray, self.ZARR_OUT_PATH, overwrite=True) + dndnumpy = dndarray.numpy() + zarr_array = zarr.open_array(self.ZARR_OUT_PATH) + + if ht.MPI_WORLD.rank == 0: + self.assertTrue((dndnumpy == zarr_array).all()) + + ht.MPI_WORLD.handle.Barrier() + + def test_save_zarr_split_none(self): + if not ht.io.supports_zarr(): + self.skipTest("Requires zarr") + + import zarr + + for type in [ht.types.int32, ht.types.int64, ht.types.float32, ht.types.float64]: + for n in [10, 100, 1000]: + with self.subTest(type=type, n=n): + dndarray = ht.arange(n, dtype=type, split=None) + ht.save_zarr(dndarray, self.ZARR_OUT_PATH, overwrite=True) + arr = zarr.open_array(self.ZARR_OUT_PATH) + dndnumpy = dndarray.numpy() + if ht.MPI_WORLD.rank == 0: + self.assertTrue((dndnumpy == arr).all()) + + ht.MPI_WORLD.handle.Barrier() + + def test_save_zarr_1d_split_0(self): + if not ht.io.supports_zarr(): + self.skipTest("Requires zarr") + + import zarr + + for type in [ht.types.int32, ht.types.int64, ht.types.float32, ht.types.float64]: + for 
n in [10, 100, 1000]: + with self.subTest(type=type, n=n): + dndarray = ht.arange(n, dtype=type, split=0) + ht.save_zarr(dndarray, self.ZARR_OUT_PATH, overwrite=True) + arr = zarr.open_array(self.ZARR_OUT_PATH) + dndnumpy = dndarray.numpy() + if ht.MPI_WORLD.rank == 0: + self.assertTrue((dndnumpy == arr).all()) + + ht.MPI_WORLD.handle.Barrier() + + def test_load_zarr_arguments(self): + if not ht.io.supports_zarr(): + self.skipTest("Requires zarr") + + with self.assertRaises(TypeError): + ht.load_zarr(None) + with self.assertRaises(ValueError): + ht.load_zarr("data.npy") + with self.assertRaises(TypeError): + ht.load_zarr("", "") + with self.assertRaises(TypeError): + ht.load_zarr("", device=1) + with self.assertRaises(TypeError): + ht.load_zarr("", slices=0) + with self.assertRaises(TypeError): + ht.load_zarr("", slices=[0]) + + def test_save_zarr_arguments(self): + if not ht.io.supports_zarr(): + self.skipTest("Requires zarr") + + import zarr + + with self.assertRaises(TypeError): + ht.save_zarr(None, None) + with self.assertRaises(ValueError): + ht.save_zarr(None, "data.npy") + + comm = ht.MPI_WORLD + if comm.rank == 0: + zarr.create( + store=self.ZARR_TEMP_PATH, + shape=(4, 4), + dtype=ht.types.int.char(), + overwrite=True, + ) + comm.Barrier() + + with self.assertRaises(RuntimeError): + ht.save_zarr(ht.arange(16).reshape((4, 4)), self.ZARR_TEMP_PATH) + + @unittest.skipIf(not ht.io.supports_hdf5(), reason="Requires HDF5") + def test_load_partial_hdf5(self): + test_axis = [None, 0, 1] + test_slices = [ + (slice(0, 50, None), slice(None, None, None)), + (slice(0, 50, None), slice(0, 2, None)), + (slice(50, 100, None), slice(None, None, None)), + (slice(None, None, None), slice(2, 4, None)), + (slice(50), None), + (None, slice(0, 3, 2)), + (slice(50),), + (slice(50, 100),), + ] + test_cases = [(a, s) for a in test_axis for s in test_slices] + + for axis, slices in test_cases: + with self.subTest(axis=axis, slices=slices): + HDF5_PATH = os.path.join(os.getcwd(), "heat/datasets/iris.h5") + HDF5_DATASET = "data" + expect_error = False + for s in slices: + if s and s.step not in [None, 1]: + expect_error = True + break + + if expect_error: + with self.assertRaises(ValueError): + sliced_iris = ht.load_hdf5( + HDF5_PATH, HDF5_DATASET, split=axis, slices=slices + ) + else: + original_iris = ht.load_hdf5(HDF5_PATH, HDF5_DATASET, split=axis) + tmp_slices = tuple(slice(None) if s is None else s for s in slices) + expected_iris = original_iris[tmp_slices] + sliced_iris = ht.load_hdf5(HDF5_PATH, HDF5_DATASET, split=axis, slices=slices) + self.assertTrue(ht.equal(sliced_iris, expected_iris)) diff --git a/heat/core/tests/test_logical.py b/heat/core/tests/test_logical.py index 3e46fd144e..c2da61d64b 100644 --- a/heat/core/tests/test_logical.py +++ b/heat/core/tests/test_logical.py @@ -182,7 +182,9 @@ def test_allclose(self): c = ht.zeros((4, 6), split=0) d = ht.zeros((4, 6), split=1) e = ht.zeros((4, 6)) - f = ht.float64([[2.000005, 2.000005], [2.000005, 2.000005]]) + + if not self.is_mps: + f = ht.float64([[2.000005, 2.000005], [2.000005, 2.000005]]) self.assertFalse(ht.allclose(a, b)) self.assertTrue(ht.allclose(a, b, atol=1e-04)) @@ -190,7 +192,8 @@ def test_allclose(self): self.assertTrue(ht.allclose(a, 2)) self.assertTrue(ht.allclose(a, 2.0)) self.assertTrue(ht.allclose(2, a)) - self.assertTrue(ht.allclose(f, a)) + if not self.is_mps: + self.assertTrue(ht.allclose(f, a)) self.assertTrue(ht.allclose(c, d)) self.assertTrue(ht.allclose(c, e)) self.assertTrue(e.allclose(c)) @@ -223,13 +226,14 @@ def 
test_any(self): self.assertTrue(ht.equal(any_tensor, res)) # float values, no axis - x = ht.float64([[0, 0, 0], [0, 0, 0]]) - res = ht.zeros(1, dtype=ht.uint8) - any_tensor = ht.any(x) - self.assertIsInstance(any_tensor, ht.DNDarray) - self.assertEqual(any_tensor.shape, ()) - self.assertEqual(any_tensor.dtype, ht.bool) - self.assertTrue(ht.equal(any_tensor, res)) + if not self.is_mps: + x = ht.float64([[0, 0, 0], [0, 0, 0]]) + res = ht.zeros(1, dtype=ht.uint8) + any_tensor = ht.any(x) + self.assertIsInstance(any_tensor, ht.DNDarray) + self.assertEqual(any_tensor.shape, ()) + self.assertEqual(any_tensor.dtype, ht.bool) + self.assertTrue(ht.equal(any_tensor, res)) # split tensor, along axis x = ht.arange(10, split=0) diff --git a/heat/core/tests/test_manipulations.py b/heat/core/tests/test_manipulations.py index cefb95a01b..e3c5ad232d 100644 --- a/heat/core/tests/test_manipulations.py +++ b/heat/core/tests/test_manipulations.py @@ -56,9 +56,10 @@ def tests_broadcast_to(self): self.assertEqual(broadcasted.dtype, ht.float32) # check split - a = ht.zeros((5, 5), split=0) - broadcasted = ht.broadcast_to(a, (5, 5, 5)) - self.assertEqual(broadcasted.split, 1) + if not self.is_mps: + a = ht.zeros((5, 5), split=0) + broadcasted = ht.broadcast_to(a, (5, 5, 5)) + self.assertEqual(broadcasted.split, 1) # test view a = ht.arange(5) @@ -442,10 +443,11 @@ def test_concatenate(self): self.assertEqual(res.lshape, tuple(lshape)) # 0 0 0 - x = ht.ones((16,), split=0, dtype=ht.float64) + dtype = ht.float32 if self.is_mps else ht.float64 + x = ht.ones((16,), split=0, dtype=dtype) res = ht.concatenate((x, y), axis=0) self.assertEqual(res.gshape, (32,)) - self.assertEqual(res.dtype, ht.float64) + self.assertEqual(res.dtype, dtype) _, _, chk = res.comm.chunk((32,), res.split) lshape = [0] lshape[0] = chk[0].stop - chk[0].start @@ -455,7 +457,7 @@ def test_concatenate(self): y = ht.ones((16,), split=None, dtype=ht.int64) res = ht.concatenate((x, y), axis=0) self.assertEqual(res.gshape, (32,)) - self.assertEqual(res.dtype, ht.float64) + self.assertEqual(res.dtype, dtype) _, _, chk = res.comm.chunk((32,), res.split) lshape = [0] lshape[0] = chk[0].stop - chk[0].start @@ -571,13 +573,14 @@ def test_diag(self): numpy_args={"k": 2}, ) - self.assert_func_equal( - (5,), - heat_func=ht.diag, - numpy_func=np.diag, - heat_args={"offset": -3}, - numpy_args={"k": -3}, - ) + if not res.device.torch_device.startswith("mps"): + self.assert_func_equal( + (5,), + heat_func=ht.diag, + numpy_func=np.diag, + heat_args={"offset": -3}, + numpy_args={"k": -3}, + ) def test_diagonal(self): size = ht.MPI_WORLD.size @@ -685,7 +688,8 @@ def test_diagonal(self): res.balance_() self.assertTrue( torch.equal( - res.larray, torch.tensor([rank * 2, 1 + rank * 2], device=self.device.torch_device) + res.larray, + torch.tensor([rank * 2, 1 + rank * 2], device=self.device.torch_device), ) ) @@ -702,7 +706,8 @@ def test_diagonal(self): res.balance_() self.assertTrue( torch.equal( - res.larray, torch.tensor([rank * 2, 1 + rank * 2], device=self.device.torch_device) + res.larray, + torch.tensor([rank * 2, 1 + rank * 2], device=self.device.torch_device), ) ) @@ -824,29 +829,30 @@ def test_diagonal(self): with self.assertRaises(ValueError): ht.diagonal(data) - self.assert_func_equal( - (5, 5, 5), - heat_func=ht.diagonal, - numpy_func=np.diagonal, - heat_args={"dim1": 0, "dim2": 2}, - numpy_args={"axis1": 0, "axis2": 2}, - ) + if not res.device.torch_device.startswith("mps"): + self.assert_func_equal( + (5, 5, 5), + heat_func=ht.diagonal, + 
numpy_func=np.diagonal, + heat_args={"dim1": 0, "dim2": 2}, + numpy_args={"axis1": 0, "axis2": 2}, + ) - self.assert_func_equal( - (5, 4, 3, 2), - heat_func=ht.diagonal, - numpy_func=np.diagonal, - heat_args={"dim1": 1, "dim2": 2}, - numpy_args={"axis1": 1, "axis2": 2}, - ) + self.assert_func_equal( + (5, 4, 3, 2), + heat_func=ht.diagonal, + numpy_func=np.diagonal, + heat_args={"dim1": 1, "dim2": 2}, + numpy_args={"axis1": 1, "axis2": 2}, + ) - self.assert_func_equal( - (4, 6, 3), - heat_func=ht.diagonal, - numpy_func=np.diagonal, - heat_args={"dim1": 0, "dim2": 1}, - numpy_args={"axis1": 0, "axis2": 1}, - ) + self.assert_func_equal( + (4, 6, 3), + heat_func=ht.diagonal, + numpy_func=np.diagonal, + heat_args={"dim1": 0, "dim2": 1}, + numpy_args={"axis1": 0, "axis2": 1}, + ) def test_dsplit(self): # for further testing, see test_split @@ -1773,7 +1779,7 @@ def test_repeat(self): # ------------------- # a = np.ndarray # ------------------- - a = np.array([1.2, 2.4, 3, 4, 5]) + a = np.array([1.2, 2.4, 3, 4, 5]).astype(np.float32) # axis is None # repeats = scalar repeats = 2 @@ -2222,7 +2228,11 @@ def test_reshape(self): self.assertTrue(ht.equal(reshaped, result)) self.assertEqual(reshaped.device, result.device) - b = ht.arange(4 * 5 * 6, dtype=ht.float64) + if a.device.torch_device.startswith("mps"): + float_type = ht.float32 + else: + float_type = ht.float64 + b = ht.arange(4 * 5 * 6, dtype=float_type) # test *shape input reshaped = b.reshape(4, 5, 6) self.assertTrue(reshaped.gshape == (4, 5, 6)) @@ -2269,8 +2279,8 @@ def test_reshape(self): self.assertEqual(reshaped.device, result.device) # 1-dim distributed vector - a = ht.arange(8, dtype=ht.float64, split=0, device=self.device) - result = ht.array([[[0, 1], [2, 3]], [[4, 5], [6, 7]]], dtype=ht.float64, split=0) + a = ht.arange(8, dtype=float_type, split=0, device=self.device) + result = ht.array([[[0, 1], [2, 3]], [[4, 5], [6, 7]]], dtype=float_type, split=0) reshaped = ht.reshape(a, (2, 2, 2)) self.assertEqual(reshaped.size, result.size) @@ -2554,74 +2564,75 @@ def test_roll(self): self.assertEqual(rolled.split, a.split) self.assertTrue(np.array_equal(rolled.numpy(), compare)) - a = ht.arange(20, dtype=ht.complex64).reshape((4, 5), new_split=1) - - rolled = ht.roll(a, -1) - compare = np.roll(a.numpy(), -1) - self.assertEqual(rolled.device, a.device) - self.assertEqual(rolled.size, a.size) - self.assertEqual(rolled.dtype, a.dtype) - self.assertEqual(rolled.split, a.split) - self.assertTrue(np.array_equal(rolled.numpy(), compare)) - - rolled = ht.roll(a, 1, 0) - compare = np.roll(a.numpy(), 1, 0) - self.assertEqual(rolled.device, a.device) - self.assertEqual(rolled.size, a.size) - self.assertEqual(rolled.dtype, a.dtype) - self.assertEqual(rolled.split, a.split) - self.assertTrue(np.array_equal(rolled.numpy(), compare)) - - rolled = ht.roll(a, -2, [0, 1]) - compare = np.roll(a.numpy(), -2, [0, 1]) - self.assertEqual(rolled.device, a.device) - self.assertEqual(rolled.size, a.size) - self.assertEqual(rolled.dtype, a.dtype) - self.assertEqual(rolled.split, a.split) - self.assertTrue(np.array_equal(rolled.numpy(), compare)) - - rolled = ht.roll(a, [1, 2, 1], [0, 1, -2]) - compare = np.roll(a.numpy(), [1, 2, 1], [0, 1, -2]) - self.assertEqual(rolled.device, a.device) - self.assertEqual(rolled.size, a.size) - self.assertEqual(rolled.dtype, a.dtype) - self.assertEqual(rolled.split, a.split) - self.assertTrue(np.array_equal(rolled.numpy(), compare)) - - # added 3D test, only a quick test for functionality - a = ht.arange(4 * 5 * 6, 
dtype=ht.complex64).reshape((4, 5, 6), new_split=2) - - rolled = ht.roll(a, -1) - compare = np.roll(a.numpy(), -1) - self.assertEqual(rolled.device, a.device) - self.assertEqual(rolled.size, a.size) - self.assertEqual(rolled.dtype, a.dtype) - self.assertEqual(rolled.split, a.split) - self.assertTrue(np.array_equal(rolled.numpy(), compare)) - - rolled = ht.roll(a, 1, 0) - compare = np.roll(a.numpy(), 1, 0) - self.assertEqual(rolled.device, a.device) - self.assertEqual(rolled.size, a.size) - self.assertEqual(rolled.dtype, a.dtype) - self.assertEqual(rolled.split, a.split) - self.assertTrue(np.array_equal(rolled.numpy(), compare)) - - rolled = ht.roll(a, -2, [0, 1]) - compare = np.roll(a.numpy(), -2, [0, 1]) - self.assertEqual(rolled.device, a.device) - self.assertEqual(rolled.size, a.size) - self.assertEqual(rolled.dtype, a.dtype) - self.assertEqual(rolled.split, a.split) - self.assertTrue(np.array_equal(rolled.numpy(), compare)) - - rolled = ht.roll(a, [1, 2, 1], [0, 1, -2]) - compare = np.roll(a.numpy(), [1, 2, 1], [0, 1, -2]) - self.assertEqual(rolled.device, a.device) - self.assertEqual(rolled.size, a.size) - self.assertEqual(rolled.dtype, a.dtype) - self.assertEqual(rolled.split, a.split) - self.assertTrue(np.array_equal(rolled.numpy(), compare)) + if not a.device.torch_device.startswith("mps"): + a = ht.arange(20, dtype=ht.complex64).reshape((4, 5), new_split=1) + + rolled = ht.roll(a, -1) + compare = np.roll(a.numpy(), -1) + self.assertEqual(rolled.device, a.device) + self.assertEqual(rolled.size, a.size) + self.assertEqual(rolled.dtype, a.dtype) + self.assertEqual(rolled.split, a.split) + self.assertTrue(np.array_equal(rolled.numpy(), compare)) + + rolled = ht.roll(a, 1, 0) + compare = np.roll(a.numpy(), 1, 0) + self.assertEqual(rolled.device, a.device) + self.assertEqual(rolled.size, a.size) + self.assertEqual(rolled.dtype, a.dtype) + self.assertEqual(rolled.split, a.split) + self.assertTrue(np.array_equal(rolled.numpy(), compare)) + + rolled = ht.roll(a, -2, [0, 1]) + compare = np.roll(a.numpy(), -2, [0, 1]) + self.assertEqual(rolled.device, a.device) + self.assertEqual(rolled.size, a.size) + self.assertEqual(rolled.dtype, a.dtype) + self.assertEqual(rolled.split, a.split) + self.assertTrue(np.array_equal(rolled.numpy(), compare)) + + rolled = ht.roll(a, [1, 2, 1], [0, 1, -2]) + compare = np.roll(a.numpy(), [1, 2, 1], [0, 1, -2]) + self.assertEqual(rolled.device, a.device) + self.assertEqual(rolled.size, a.size) + self.assertEqual(rolled.dtype, a.dtype) + self.assertEqual(rolled.split, a.split) + self.assertTrue(np.array_equal(rolled.numpy(), compare)) + + # added 3D test, only a quick test for functionality + a = ht.arange(4 * 5 * 6, dtype=ht.complex64).reshape((4, 5, 6), new_split=2) + + rolled = ht.roll(a, -1) + compare = np.roll(a.numpy(), -1) + self.assertEqual(rolled.device, a.device) + self.assertEqual(rolled.size, a.size) + self.assertEqual(rolled.dtype, a.dtype) + self.assertEqual(rolled.split, a.split) + self.assertTrue(np.array_equal(rolled.numpy(), compare)) + + rolled = ht.roll(a, 1, 0) + compare = np.roll(a.numpy(), 1, 0) + self.assertEqual(rolled.device, a.device) + self.assertEqual(rolled.size, a.size) + self.assertEqual(rolled.dtype, a.dtype) + self.assertEqual(rolled.split, a.split) + self.assertTrue(np.array_equal(rolled.numpy(), compare)) + + rolled = ht.roll(a, -2, [0, 1]) + compare = np.roll(a.numpy(), -2, [0, 1]) + self.assertEqual(rolled.device, a.device) + self.assertEqual(rolled.size, a.size) + self.assertEqual(rolled.dtype, a.dtype) + 
self.assertEqual(rolled.split, a.split) + self.assertTrue(np.array_equal(rolled.numpy(), compare)) + + rolled = ht.roll(a, [1, 2, 1], [0, 1, -2]) + compare = np.roll(a.numpy(), [1, 2, 1], [0, 1, -2]) + self.assertEqual(rolled.device, a.device) + self.assertEqual(rolled.size, a.size) + self.assertEqual(rolled.dtype, a.dtype) + self.assertEqual(rolled.split, a.split) + self.assertTrue(np.array_equal(rolled.numpy(), compare)) with self.assertRaises(TypeError): ht.roll(a, 1.0, 0) @@ -2687,7 +2698,10 @@ def test_row_stack(self): # test local row_stack, 2-D arrays a = np.arange(10, dtype=np.float32).reshape(2, 5) b = np.arange(15, dtype=np.float32).reshape(3, 5) - np_rstack = np.row_stack((a, b)) + if np.lib.NumpyVersion(np.__version__) >= "2.0.0b1": + np_rstack = np.vstack((a, b)) + else: + np_rstack = np.row_stack((a, b)) ht_a = ht.array(a) ht_b = ht.array(b) ht_rstack = ht.row_stack((ht_a, ht_b)) @@ -2695,14 +2709,20 @@ def test_row_stack(self): # 2-D and 1-D arrays c = np.arange(5, dtype=np.float32) - np_rstack = np.row_stack((a, b, c)) + if np.lib.NumpyVersion(np.__version__) >= "2.0.0b1": + np_rstack = np.vstack((a, b, c)) + else: + np_rstack = np.row_stack((a, b, c)) ht_c = ht.array(c) ht_rstack = ht.row_stack((ht_a, ht_b, ht_c)) self.assertTrue((np_rstack == ht_rstack.numpy()).all()) # 2-D and 1-D arrays, distributed c = np.arange(5, dtype=np.float32) - np_rstack = np.row_stack((a, b, c)) + if np.lib.NumpyVersion(np.__version__) >= "2.0.0b1": + np_rstack = np.vstack((a, b, c)) + else: + np_rstack = np.row_stack((a, b, c)) ht_a = ht.array(a, split=0) ht_b = ht.array(b, split=0) ht_c = ht.array(c, split=0) @@ -2713,7 +2733,10 @@ def test_row_stack(self): # 1-D arrays, distributed, different dtypes d = np.arange(10).astype(np.float32) e = np.arange(10) - np_rstack = np.row_stack((d, e)) + if np.lib.NumpyVersion(np.__version__) >= "2.0.0b1": + np_rstack = np.vstack((d, e)) + else: + np_rstack = np.row_stack((d, e)) ht_d = ht.array(d, split=0) ht_e = ht.array(e, split=0) ht_rstack = ht.row_stack((ht_d, ht_e)) @@ -2814,20 +2837,22 @@ def test_sort(self): exp_axis_zero = torch.tensor( [[2, 3, 0], [0, 2, 3]], dtype=torch.int32, device=self.device.torch_device ) - if torch.cuda.is_available() and data.device == ht.gpu and size < 4: - indices_axis_zero = torch.tensor( - [[0, 2, 2], [3, 2, 0]], dtype=torch.int32, device=self.device.torch_device - ) - else: - indices_axis_zero = torch.tensor( - [[0, 2, 2], [3, 0, 0]], dtype=torch.int32, device=self.device.torch_device - ) + indices_axis_zero = torch.tensor( + [[0, 2, 2], [3, 0, 0]], dtype=torch.int32, device=self.device.torch_device + ) result, result_indices = ht.sort(data, axis=0) first = result[0].larray first_indices = result_indices[0].larray if rank == 0: self.assertTrue(torch.equal(first, exp_axis_zero)) - self.assertTrue(torch.equal(first_indices, indices_axis_zero)) + try: + self.assertTrue(torch.equal(first_indices, indices_axis_zero)) + except AssertionError: + # if environment is CUDA (not ROCm), the indices are not sorted correctly + indices_axis_zero = torch.tensor( + [[0, 2, 2], [3, 2, 0]], dtype=torch.int32, device=self.device.torch_device + ) + self.assertTrue(torch.equal(first_indices, indices_axis_zero)) data = ht.array(tensor, split=1) exp_axis_one = torch.tensor([[2, 2, 3]], dtype=torch.int32, device=self.device.torch_device) @@ -3475,13 +3500,14 @@ def test_tile(self): # test tile along split axis # len(reps) = x.ndim - split = 1 - x = ht.random.randn(3, 3, dtype=ht.float64, split=split) - reps = (2, 3) - tiled_along_split = 
ht.tile(x, reps) - np_tiled_along_split = np.tile(x.numpy(), reps) - self.assertTrue((tiled_along_split.numpy() == np_tiled_along_split).all()) - self.assertTrue(tiled_along_split.dtype is x.dtype) + if not self.is_mps: + split = 1 + x = ht.random.randn(3, 3, dtype=ht.float64, split=split) + reps = (2, 3) + tiled_along_split = ht.tile(x, reps) + np_tiled_along_split = np.tile(x.numpy(), reps) + self.assertTrue((tiled_along_split.numpy() == np_tiled_along_split).all()) + self.assertTrue(tiled_along_split.dtype is x.dtype) # test exceptions float_reps = (1, 2, 2, 1.5) @@ -3540,14 +3566,22 @@ def test_topk(self): self.assertTrue((out[1].larray == exp_zero.larray).all()) self.assertTrue(out[1].larray.dtype == exp_zero_indcs.larray.dtype) - torch_array = torch.arange( - size, dtype=torch.float64, device=self.device.torch_device - ).expand(size, size) + if self.is_mps: + float_type = torch.float32 + else: + float_type = torch.float64 + ht_float_type = ht.types.canonical_heat_type(float_type) + + torch_array = torch.arange(size, dtype=float_type, device=self.device.torch_device).expand( + size, size + ) split_zero = ht.array(torch_array, split=0) split_one = ht.array(torch_array, split=1) res, indcs = ht.topk(split_zero, 2, sorted=True) - exp_zero = ht.array([[size - 1, size - 2] for i in range(size)], dtype=ht.float64, split=0) + exp_zero = ht.array( + [[size - 1, size - 2] for i in range(size)], dtype=ht_float_type, split=0 + ) exp_zero_indcs = ht.array( [[size - 1, size - 2] for i in range(size)], dtype=ht.int64, split=0 ) @@ -3556,7 +3590,9 @@ def test_topk(self): self.assertTrue(indcs.larray.dtype == exp_zero_indcs.larray.dtype) res, indcs = ht.topk(split_one, 2, sorted=True) - exp_one = ht.array([[size - 1, size - 2] for i in range(size)], dtype=ht.float64, split=1) + exp_one = ht.array( + [[size - 1, size - 2] for i in range(size)], dtype=ht_float_type, split=1 + ) exp_one_indcs = ht.array( [[size - 1, size - 2] for i in range(size)], dtype=ht.int64, split=1 ) @@ -3570,7 +3606,7 @@ def test_topk(self): out = (ht.empty_like(exp_zero), ht.empty_like(exp_zero_indcs)) res, indcs = ht.topk(split_zero, 2, sorted=True, largest=False, out=out) with self.assertRaises(RuntimeError): - exp_zero = ht.array([[0, 1] for i in range(size)], dtype=ht.float64, split=0) + exp_zero = ht.array([[0, 1] for i in range(size)], dtype=ht_float_type, split=0) exp_zero_indcs = ht.array([[0, 1] for i in range(size)], dtype=ht.int16, split=0) out = (ht.empty_like(exp_zero), ht.empty_like(exp_zero_indcs)) res, indcs = ht.topk(split_zero, 2, sorted=True, largest=False, out=out) @@ -3619,11 +3655,15 @@ def test_unique(self): res, inv = ht.unique(data, return_inverse=True, axis=0) _, exp_inv = torch_array.unique(dim=0, return_inverse=True, sorted=True) - self.assertTrue(torch.equal(inv, exp_inv.to(dtype=inv.dtype))) + self.assertTrue( + (inv == ht.array(exp_inv.to(dtype=inv.larray.dtype), split=inv.split)).all() + ) res, inv = ht.unique(data, return_inverse=True, axis=1) _, exp_inv = torch_array.unique(dim=1, return_inverse=True, sorted=True) - self.assertTrue(torch.equal(inv, exp_inv.to(dtype=inv.dtype))) + self.assertTrue( + (inv == ht.array(exp_inv.to(dtype=inv.larray.dtype), split=inv.split)).all() + ) torch_array = torch.tensor( [[1, 1, 2], [1, 2, 2], [2, 1, 2], [1, 3, 2], [0, 1, 2]], @@ -3647,7 +3687,9 @@ def test_unique(self): data_split_zero = ht.array(torch_array, split=0) res, inv = ht.unique(data_split_zero, return_inverse=True, sorted=True) - self.assertTrue(torch.equal(inv, exp_inv.to(dtype=inv.dtype))) + 
self.assertTrue( + (inv == ht.array(exp_inv.to(dtype=inv.larray.dtype), split=inv.split)).all() + ) def test_vsplit(self): # for further testing, see test_split diff --git a/heat/core/tests/test_printing.py b/heat/core/tests/test_printing.py index cc8fd6d0a9..fd6e382e2a 100644 --- a/heat/core/tests/test_printing.py +++ b/heat/core/tests/test_printing.py @@ -430,10 +430,19 @@ def test_split_2_above_threshold(self): if dndarray.comm.rank == 0: self.assertEqual(comparison, __str) + def test___repr__(self): + a = ht.array([1, 2, 3, 4]) + r = a.__repr__() + self.assertEqual( + r, + f"", + ) + class TestPrintingGPU(TestCase): def test_print_GPU(self): # this test case also includes GPU now, checking the output is not done; only test whether the routine itself works... - a0 = ht.arange(2**20, dtype=ht.float32).reshape((2**10, 2**10)).resplit_(0) - a1 = ht.arange(2**20, dtype=ht.float32).reshape((2**10, 2**10)).resplit_(1) - print(a0, a1) + if not self.is_mps: + a0 = ht.arange(2**20, dtype=ht.float32).reshape((2**10, 2**10)).resplit_(0) + a1 = ht.arange(2**20, dtype=ht.float32).reshape((2**10, 2**10)).resplit_(1) + print(a0, a1) diff --git a/heat/core/tests/test_random.py b/heat/core/tests/test_random.py index c8e867c490..f0bc9b1f92 100644 --- a/heat/core/tests/test_random.py +++ b/heat/core/tests/test_random.py @@ -1,9 +1,16 @@ +import os +import platform +import unittest + import numpy as np import torch import heat as ht from .test_suites.basic_test import TestCase +envar = os.getenv("HEAT_TEST_USE_DEVICE", "cpu") +is_mps = envar == "gpu" and platform.system() == "Darwin" + class TestRandom_Batchparallel(TestCase): def test_default(self): @@ -53,10 +60,13 @@ def test_permutation(self): if self.device.torch_device == "cpu": state = torch.random.get_rng_state() else: - state = torch.cuda.get_rng_state(self.device.torch_device) + if self.is_mps: + state = torch.mps.get_rng_state() + else: + state = torch.cuda.get_rng_state(self.device.torch_device) # results - a = ht.random.permutation(10) + a = ht.random.permutation(10, device=self.device) b_arr = ht.arange(10, dtype=ht.float32) b = ht.random.permutation(ht.resplit(b_arr, 0)) @@ -70,7 +80,10 @@ def test_permutation(self): if self.device.torch_device == "cpu": torch.random.set_rng_state(state) else: - torch.cuda.set_rng_state(state, self.device.torch_device) + if self.is_mps: + torch.mps.set_rng_state(state) + else: + torch.cuda.set_rng_state(state, self.device.torch_device) # torch results to compare to a_cmp = torch.randperm(a.shape[0], device=self.device.torch_device) @@ -83,18 +96,19 @@ def test_permutation(self): self.assertEqual(a.dtype, ht.int64) self.assertEqual(b.dtype, ht.float32) - c0.resplit_(None) - c1.resplit_(None) - b.resplit_(None) + if not self.is_mps: + c0.resplit_(None) + c1.resplit_(None) + b.resplit_(None) - # due to different states of the torch RNG on different processes and due to construction of the permutation - # the values are only equal on process no 0 which has been used for generating the permutation - if ht.MPI_WORLD.rank == 0: - self.assertTrue((a.larray == a_cmp).all()) - self.assertTrue((b.larray == b_cmp).all()) - self.assertTrue((c.larray == c_cmp).all()) - self.assertTrue((c0.larray == c0_cmp).all()) - self.assertTrue((c1.larray == c1_cmp).all()) + # due to different states of the torch RNG on different processes and due to construction of the permutation + # the values are only equal on process no 0 which has been used for generating the permutation + if ht.MPI_WORLD.rank == 0: + self.assertTrue((a.larray == 
a_cmp).all()) + self.assertTrue((b.larray == b_cmp).all()) + self.assertTrue((c.larray == c_cmp).all()) + self.assertTrue((c0.larray == c0_cmp).all()) + self.assertTrue((c1.larray == c1_cmp).all()) with self.assertRaises(TypeError): ht.random.permutation("abc") @@ -122,19 +136,21 @@ def test_rand(self): self.assertTrue((counts <= 2).all()) # Two large arrays that were created after each other don't share too much values - b = ht.random.rand(14, 7, 3, 12, 18, 42, split=5, comm=ht.MPI_WORLD, dtype=ht.float64) - c = np.concatenate((a.flatten(), b.numpy().flatten())) - _, counts = np.unique(c, return_counts=True) - self.assertTrue((counts <= 2).all()) + if not self.is_mps: + # this condition is not met if b is float32, MPS does not support float64 + b = ht.random.rand(14, 7, 3, 12, 18, 42, split=5, comm=ht.MPI_WORLD, dtype=ht.float64) + c = np.concatenate((a.flatten(), b.numpy().flatten())) + _, counts = np.unique(c, return_counts=True) + self.assertTrue((counts <= 2).all()) - # Values should be spread evenly across the range [0, 1) - mean = np.mean(c) - median = np.median(c) - std = np.std(c) - self.assertTrue(0.49 < mean < 0.51) - self.assertTrue(0.49 < median < 0.51) - self.assertTrue(std < 0.3) - self.assertTrue(((0 <= c) & (c < 1)).all()) + # Values should be spread evenly across the range [0, 1) + mean = np.mean(c) + median = np.median(c) + std = np.std(c) + self.assertTrue(0.49 < mean < 0.51) + self.assertTrue(0.49 < median < 0.51) + self.assertTrue(std < 0.3) + self.assertTrue(((0 <= c) & (c < 1)).all()) # No arguments work correctly ht.random.seed(seed) @@ -196,7 +212,9 @@ def test_randint(self): ht.random.seed(13579) b = ht.random.randint(low=0, high=10000, size=shape, split=2, dtype=ht.int64) - self.assertTrue(ht.equal(a, b)) + if not self.is_mps: + # assertion fails on more than 4 dimensions on MPS + self.assertTrue(ht.equal(a, b)) mean = ht.mean(a) # median = ht.median(a) std = ht.std(a) @@ -252,11 +270,12 @@ def test_randint(self): self.assertTrue(ht.equal(a, b)) def test_randn(self): + float_dtype = ht.float32 if self.is_mps else ht.float64 # Test that the random values have the correct distribution ht.random.seed(54321) shape = (5, 10, 13, 23) - a = ht.random.randn(*shape, split=0, dtype=ht.float64) - self.assertEqual(a.dtype, ht.float64) + a = ht.random.randn(*shape, split=0, dtype=float_dtype) + self.assertEqual(a.dtype, float_dtype) mean = ht.mean(a) median = ht.median(a) std = ht.std(a) @@ -265,22 +284,23 @@ def test_randn(self): self.assertTrue(0.98 < std < 1.02) # Creating the same array two times without resetting seed results in different elements - c = ht.random.randn(*shape, split=0, dtype=ht.float64) + c = ht.random.randn(*shape, split=0, dtype=float_dtype) self.assertEqual(c.shape, a.shape) self.assertFalse(ht.allclose(a, c)) - # All the created values should be different - d = ht.concatenate((a, c)) - d.resplit_(None) - d = d.numpy() - _, counts = np.unique(d, return_counts=True) - self.assertTrue((counts == 1).all()) + if not self.is_mps: + # If dtype is float64, all the created values should be different + d = ht.concatenate((a, c)) + d.resplit_(None) + d = d.numpy() + _, counts = np.unique(d, return_counts=True) + self.assertTrue((counts == 1).all()) # Two arrays are the same for same seed and split-axis != 0 ht.random.seed(12345) - a = ht.random.randn(*shape, split=3, dtype=ht.float64) + a = ht.random.randn(*shape, split=3, dtype=float_dtype) ht.random.seed(12345) - b = ht.random.randn(*shape, split=3, dtype=ht.float64) + b = ht.random.randn(*shape, split=3, 
dtype=float_dtype) self.assertTrue(ht.equal(a, b)) # Tests with float32 @@ -313,32 +333,43 @@ def test_randn(self): self.assertTrue(isinstance(x, float)) def test_randperm(self): + # Reset RNG + ht.random.seed() if self.device.torch_device == "cpu": state = torch.random.get_rng_state() else: - state = torch.cuda.get_rng_state(self.device.torch_device) + if self.is_mps: + state = torch.mps.get_rng_state() + else: + state = torch.cuda.get_rng_state(self.device.torch_device) # results a = ht.random.randperm(10, dtype=ht.int32) b = ht.random.randperm(4, dtype=ht.float32, split=0) c = ht.random.randperm(5, split=0) - d = ht.random.randperm(5, dtype=ht.float64) + if not self.is_mps: + d = ht.random.randperm(5, dtype=ht.float64) if self.device.torch_device == "cpu": torch.random.set_rng_state(state) else: - torch.cuda.set_rng_state(state, self.device.torch_device) + if self.is_mps: + torch.mps.set_rng_state(state) + else: + torch.cuda.set_rng_state(state, self.device.torch_device) # torch results to compare to - a_cmp = torch.randperm(10, dtype=torch.int32, device=self.device.torch_device) + a_cmp = torch.randperm(10, dtype=torch.int32, device=a.larray.device) b_cmp = torch.randperm(4, dtype=torch.float32, device=self.device.torch_device) c_cmp = torch.randperm(5, dtype=torch.int64, device=self.device.torch_device) - d_cmp = torch.randperm(5, dtype=torch.float64, device=self.device.torch_device) + if not self.is_mps: + d_cmp = torch.randperm(5, dtype=torch.float64, device=self.device.torch_device) self.assertEqual(a.dtype, ht.int32) self.assertEqual(b.dtype, ht.float32) self.assertEqual(c.dtype, ht.int64) - self.assertEqual(d.dtype, ht.float64) + if not self.is_mps: + self.assertEqual(d.dtype, ht.float64) brsp = ht.resplit(b) crsp = ht.resplit(c) @@ -348,7 +379,8 @@ def test_randperm(self): self.assertTrue((a.larray == a_cmp).all()) self.assertTrue((brsp.larray == b_cmp).all()) self.assertTrue((crsp.larray == c_cmp).all()) - self.assertTrue((d.larray == d_cmp).all()) + if not self.is_mps: + self.assertTrue((d.larray == d_cmp).all()) with self.assertRaises(TypeError): ht.random.randperm("abc") @@ -411,6 +443,7 @@ def test_set_state(self): """ +@unittest.skipIf(is_mps, "Threefry not supported on Apple MPS") class TestRandom_Threefry(TestCase): def test_setting_threefry(self): ht.random.set_state(("Threefry", 12345, 0xFFF)) @@ -605,7 +638,7 @@ def test_rand(self): self.assertTrue(ht.equal(a, b)) # Too big arrays cant be created - with self.assertRaises(ValueError): + with self.assertRaises(RuntimeError): ht.random.randn(0x7FFFFFFFFFFFFFFF) with self.assertRaises(ValueError): ht.random.rand(3, 2, -2, 5, split=1) diff --git a/heat/core/tests/test_rounding.py b/heat/core/tests/test_rounding.py index 761742095d..597cd044f9 100644 --- a/heat/core/tests/test_rounding.py +++ b/heat/core/tests/test_rounding.py @@ -1,3 +1,5 @@ +import platform + import numpy as np import torch @@ -17,12 +19,14 @@ def test_abs(self): int16_absolute_values_fabs = ht.fabs(int16_tensor_fabs) int32_tensor_fabs = ht.arange(-10.5, 10.5, dtype=ht.int32, split=0) int32_absolute_values_fabs = ht.fabs(int32_tensor_fabs) - int64_tensor_fabs = ht.arange(-10.5, 10.5, dtype=ht.int64, split=0) - int64_absolute_values_fabs = ht.fabs(int64_tensor_fabs) + if not self.is_mps: + int64_tensor_fabs = ht.arange(-10.5, 10.5, dtype=ht.int64, split=0) + int64_absolute_values_fabs = ht.fabs(int64_tensor_fabs) float32_tensor_fabs = ht.arange(-10.5, 10.5, dtype=ht.float32, split=0) float32_absolute_values_fabs = ht.fabs(float32_tensor_fabs) - 
float64_tensor_fabs = ht.arange(-10.5, 10.5, dtype=ht.float64, split=0) - float64_absolute_values_fabs = ht.fabs(float64_tensor_fabs) + if not self.is_mps: + float64_tensor_fabs = ht.arange(-10.5, 10.5, dtype=ht.float64, split=0) + float64_absolute_values_fabs = ht.fabs(float64_tensor_fabs) # basic absolute test self.assertIsInstance(absolute_values, ht.DNDarray) @@ -32,9 +36,11 @@ def test_abs(self): self.assertEqual(int8_absolute_values_fabs.sum(axis=0), 100.0) self.assertEqual(int16_absolute_values_fabs.sum(axis=0), 100.0) self.assertEqual(int32_absolute_values_fabs.sum(axis=0), 100.0) - self.assertEqual(int64_absolute_values_fabs.sum(axis=0), 100.0) + if not self.is_mps: + self.assertEqual(int64_absolute_values_fabs.sum(axis=0), 100.0) self.assertEqual(float32_absolute_values_fabs.sum(axis=0), 110.5) - self.assertEqual(float64_absolute_values_fabs.sum(axis=0), 110.5) + if not self.is_mps: + self.assertEqual(float64_absolute_values_fabs.sum(axis=0), 110.5) # check whether output works # for abs==absolute @@ -65,9 +71,10 @@ def test_abs(self): self.assertEqual(int8_absolute_values_fabs.dtype, ht.float32) self.assertEqual(int16_absolute_values_fabs.dtype, ht.float32) self.assertEqual(int32_absolute_values_fabs.dtype, ht.float32) - self.assertEqual(int64_absolute_values_fabs.dtype, ht.float64) self.assertEqual(float32_absolute_values_fabs.dtype, ht.float32) - self.assertEqual(float64_absolute_values_fabs.dtype, ht.float64) + if not self.is_mps: + self.assertEqual(int64_absolute_values_fabs.dtype, ht.float64) + self.assertEqual(float64_absolute_values_fabs.dtype, ht.float64) # exceptions # for abs==absolute @@ -92,8 +99,9 @@ def test_abs(self): def test_ceil(self): start, end, step = -5.0, 5.0, 1.4 + float_dtype = torch.float32 if self.is_mps else torch.float64 comparison = torch.arange( - start, end, step, dtype=torch.float64, device=self.device.torch_device + start, end, step, dtype=float_dtype, device=self.device.torch_device ).ceil() # exponential of float32 @@ -105,12 +113,13 @@ def test_ceil(self): self.assertTrue((float32_floor.larray == comparison.float()).all()) # exponential of float64 - float64_tensor = ht.arange(start, end, step, dtype=ht.float64) - float64_floor = float64_tensor.ceil() - self.assertIsInstance(float64_floor, ht.DNDarray) - self.assertEqual(float64_floor.dtype, ht.float64) - self.assertEqual(float64_floor.dtype, ht.float64) - self.assertTrue((float64_floor.larray == comparison).all()) + if not self.is_mps: + float64_tensor = ht.arange(start, end, step, dtype=ht.float64) + float64_floor = float64_tensor.ceil() + self.assertIsInstance(float64_floor, ht.DNDarray) + self.assertEqual(float64_floor.dtype, ht.float64) + self.assertEqual(float64_floor.dtype, ht.float64) + self.assertTrue((float64_floor.larray == comparison).all()) # check exceptions with self.assertRaises(TypeError): @@ -159,12 +168,13 @@ def test_floor(self): self.assertTrue((float32_floor.larray == comparison.float()).all()) # exponential of float64 - float64_tensor = ht.arange(start, end, step, dtype=ht.float64) + 0.01 - float64_floor = float64_tensor.floor() - self.assertIsInstance(float64_floor, ht.DNDarray) - self.assertEqual(float64_floor.dtype, ht.float64) - self.assertEqual(float64_floor.dtype, ht.float64) - self.assertTrue((float64_floor.larray == comparison).all()) + if not self.is_mps: + float64_tensor = ht.arange(start, end, step, dtype=ht.float64) + 0.01 + float64_floor = float64_tensor.floor() + self.assertIsInstance(float64_floor, ht.DNDarray) + self.assertEqual(float64_floor.dtype, 
ht.float64) + self.assertEqual(float64_floor.dtype, ht.float64) + self.assertTrue((float64_floor.larray == comparison).all()) # check exceptions with self.assertRaises(TypeError): @@ -191,18 +201,19 @@ def test_modf(self): self.assert_array_equal(float32_modf[1], comparison[1]) # exponential of float64 - npArray = np.arange(start, end, step, np.float64) - comparison = np.modf(npArray) + if not self.is_mps: + npArray = np.arange(start, end, step, np.float64) + comparison = np.modf(npArray) - float64_tensor = ht.array(npArray, dtype=ht.float64) - float64_modf = float64_tensor.modf() - self.assertIsInstance(float64_modf[0], ht.DNDarray) - self.assertIsInstance(float64_modf[1], ht.DNDarray) - self.assertEqual(float64_modf[0].dtype, ht.float64) - self.assertEqual(float64_modf[1].dtype, ht.float64) + float64_tensor = ht.array(npArray, dtype=ht.float64) + float64_modf = float64_tensor.modf() + self.assertIsInstance(float64_modf[0], ht.DNDarray) + self.assertIsInstance(float64_modf[1], ht.DNDarray) + self.assertEqual(float64_modf[0].dtype, ht.float64) + self.assertEqual(float64_modf[1].dtype, ht.float64) - self.assert_array_equal(float64_modf[0], comparison[0]) - self.assert_array_equal(float64_modf[1], comparison[1]) + self.assert_array_equal(float64_modf[0], comparison[0]) + self.assert_array_equal(float64_modf[1], comparison[1]) # check exceptions with self.assertRaises(TypeError): @@ -211,8 +222,9 @@ def test_modf(self): ht.modf(object()) with self.assertRaises(TypeError): ht.modf(float32_tensor, 1) - with self.assertRaises(ValueError): - ht.modf(float32_tensor, (float32_tensor, float32_tensor, float64_tensor)) + if not self.is_mps: + with self.assertRaises(ValueError): + ht.modf(float32_tensor, (float32_tensor, float32_tensor, float64_tensor)) with self.assertRaises(TypeError): ht.modf(float32_tensor, (float32_tensor, 2)) @@ -233,23 +245,24 @@ def test_modf(self): self.assert_array_equal(float32_modf_distrbd[1], comparison[1]) # exponential of float64 - npArray = npArray = np.arange(start, end, step, np.float64) - comparison = np.modf(npArray) - - float64_tensor_distrbd = ht.array(npArray, split=0) - float64_modf_distrbd = ( - ht.zeros_like(float64_tensor_distrbd, dtype=float64_tensor_distrbd.dtype), - ht.zeros_like(float64_tensor_distrbd, dtype=float64_tensor_distrbd.dtype), - ) - # float64_modf_distrbd = float64_tensor_distrbd.modf() - float64_tensor_distrbd.modf(out=float64_modf_distrbd) - self.assertIsInstance(float64_modf_distrbd[0], ht.DNDarray) - self.assertIsInstance(float64_modf_distrbd[1], ht.DNDarray) - self.assertEqual(float64_modf_distrbd[0].dtype, ht.float64) - self.assertEqual(float64_modf_distrbd[1].dtype, ht.float64) - - self.assert_array_equal(float64_modf_distrbd[0], comparison[0]) - self.assert_array_equal(float64_modf_distrbd[1], comparison[1]) + if not self.is_mps: + npArray = npArray = np.arange(start, end, step, np.float64) + comparison = np.modf(npArray) + + float64_tensor_distrbd = ht.array(npArray, split=0) + float64_modf_distrbd = ( + ht.zeros_like(float64_tensor_distrbd, dtype=float64_tensor_distrbd.dtype), + ht.zeros_like(float64_tensor_distrbd, dtype=float64_tensor_distrbd.dtype), + ) + # float64_modf_distrbd = float64_tensor_distrbd.modf() + float64_tensor_distrbd.modf(out=float64_modf_distrbd) + self.assertIsInstance(float64_modf_distrbd[0], ht.DNDarray) + self.assertIsInstance(float64_modf_distrbd[1], ht.DNDarray) + self.assertEqual(float64_modf_distrbd[0].dtype, ht.float64) + self.assertEqual(float64_modf_distrbd[1].dtype, ht.float64) + + 
self.assert_array_equal(float64_modf_distrbd[0], comparison[0]) + self.assert_array_equal(float64_modf_distrbd[1], comparison[1]) def test_round(self): size = ht.communication.MPI_WORLD.size @@ -266,13 +279,14 @@ def test_round(self): self.assert_array_equal(float32_round, comparison) # exponential of float64 - comparison = torch.arange(start, end, step, dtype=torch.float64).round() - float64_tensor = ht.array(comparison, dtype=ht.float64) - float64_round = float64_tensor.round() - self.assertIsInstance(float64_round, ht.DNDarray) - self.assertEqual(float64_round.dtype, ht.float64) - self.assertEqual(float64_round.dtype, ht.float64) - self.assert_array_equal(float64_round, comparison) + if not self.is_mps: + comparison = torch.arange(start, end, step, dtype=torch.float64).round() + float64_tensor = ht.array(comparison, dtype=ht.float64) + float64_round = float64_tensor.round() + self.assertIsInstance(float64_round, ht.DNDarray) + self.assertEqual(float64_round.dtype, ht.float64) + self.assertEqual(float64_round.dtype, ht.float64) + self.assert_array_equal(float64_round, comparison) # check exceptions with self.assertRaises(TypeError): @@ -286,24 +300,25 @@ def test_round(self): # with split tensors - # exponential of float32 - comparison = torch.arange(start, end, step, dtype=torch.float32) # .round() - float32_tensor_distrbd = ht.array(comparison, split=0, dtype=ht.double) - comparison = comparison.round() - float32_round_distrbd = float32_tensor_distrbd.round(dtype=ht.float) - self.assertIsInstance(float32_round_distrbd, ht.DNDarray) - self.assertEqual(float32_round_distrbd.dtype, ht.float32) - self.assert_array_equal(float32_round_distrbd, comparison) - - # exponential of float64 - comparison = torch.arange(start, end, step, dtype=torch.float64) # .round() - float64_tensor_distrbd = ht.array(comparison, split=0) - comparison = comparison.round() - float64_round_distrbd = float64_tensor_distrbd.round() - self.assertIsInstance(float64_round_distrbd, ht.DNDarray) - self.assertEqual(float64_round_distrbd.dtype, ht.float64) - self.assertEqual(float64_round_distrbd.dtype, ht.float64) - self.assert_array_equal(float64_round_distrbd, comparison) + if not self.is_mps: + # exponential of float32 + comparison = torch.arange(start, end, step, dtype=torch.float32) # .round() + float32_tensor_distrbd = ht.array(comparison, split=0, dtype=ht.double) + comparison = comparison.round() + float32_round_distrbd = float32_tensor_distrbd.round(dtype=ht.float) + self.assertIsInstance(float32_round_distrbd, ht.DNDarray) + self.assertEqual(float32_round_distrbd.dtype, ht.float32) + self.assert_array_equal(float32_round_distrbd, comparison) + + # exponential of float64 + comparison = torch.arange(start, end, step, dtype=torch.float64) # .round() + float64_tensor_distrbd = ht.array(comparison, split=0) + comparison = comparison.round() + float64_round_distrbd = float64_tensor_distrbd.round() + self.assertIsInstance(float64_round_distrbd, ht.DNDarray) + self.assertEqual(float64_round_distrbd.dtype, ht.float64) + self.assertEqual(float64_round_distrbd.dtype, ht.float64) + self.assert_array_equal(float64_round_distrbd, comparison) def test_sgn(self): # floats @@ -325,7 +340,9 @@ def test_sgn(self): self.assertEqual(signed.dtype, ht.heat_type_of(comparison)) self.assertEqual(signed.shape, a.shape) self.assertEqual(signed.device, a.device) - self.assertTrue(ht.equal(signed, ht.array(comparison, split=0))) + # complex types only supported on MPS starting from MacOS 14.0+ + if not self.is_mps or platform.mac_ver()[0] >= 
"14.0": + self.assertTrue(ht.equal(signed, ht.array(comparison, split=0))) def test_sign(self): # floats 1d @@ -339,50 +356,54 @@ def test_sign(self): self.assertEqual(signed.split, a.split) self.assertTrue(ht.equal(signed, comparison)) - # complex + 2d + split - a = ht.array([[1 - 2j, -0.5 + 1j], [0, 4 + 6j]], split=0) - signed = ht.sign(a) - comparison = ht.array([[1 + 0j, -1 + 0j], [0 + 0j, 1 + 0j]], split=0) - - self.assertEqual(signed.dtype, comparison.dtype) - self.assertEqual(signed.shape, comparison.shape) - self.assertEqual(signed.device, a.device) - self.assertEqual(signed.split, a.split) - self.assertTrue(ht.allclose(signed.real, comparison.real)) - self.assertTrue(ht.allclose(signed.imag, comparison.imag, atol=2e-5)) - - # complex + split + out - a = ht.array([[1 - 2j, -0.5 + 1j], [0, 4 + 6j]], split=1) - b = ht.empty_like(a) - signed = ht.sign(a, b) - comparison = ht.array([[1 + 0j, -1 + 0j], [0 + 0j, 1 + 0j]], split=1) - - self.assertIs(b, signed) - self.assertEqual(signed.dtype, comparison.dtype) - self.assertEqual(signed.shape, comparison.shape) - self.assertEqual(signed.device, a.device) - self.assertEqual(signed.split, a.split) - self.assertTrue(ht.allclose(signed.real, comparison.real)) - self.assertTrue(ht.allclose(signed.imag, comparison.imag, atol=2e-5)) - - # zeros + 3d + complex + split - a = ht.zeros((4, 4, 4), dtype=ht.complex128, split=2) - signed = ht.sign(a) - comparison = ht.zeros((4, 4, 4), dtype=ht.complex128, split=2) - - self.assertEqual(signed.dtype, comparison.dtype) - self.assertEqual(signed.shape, comparison.shape) - self.assertEqual(signed.device, a.device) - self.assertEqual(signed.split, a.split) - self.assertTrue(ht.allclose(signed.real, comparison.real)) - self.assertTrue(ht.allclose(signed.imag, comparison.imag, atol=2e-5)) + # complex on MPS only from MacOS 14.0+ + if not self.is_mps or platform.mac_ver()[0] >= "14.0": + # complex + 2d + split + a = ht.array([[1 - 2j, -0.5 + 1j], [0, 4 + 6j]], split=0) + signed = ht.sign(a) + comparison = ht.array([[1 + 0j, -1 + 0j], [0 + 0j, 1 + 0j]], split=0) + + self.assertEqual(signed.dtype, comparison.dtype) + self.assertEqual(signed.shape, comparison.shape) + self.assertEqual(signed.device, a.device) + self.assertEqual(signed.split, a.split) + self.assertTrue(ht.allclose(signed.real, comparison.real)) + self.assertTrue(ht.allclose(signed.imag, comparison.imag, atol=2e-5)) + + # complex + split + out + a = ht.array([[1 - 2j, -0.5 + 1j], [0, 4 + 6j]], split=1) + b = ht.empty_like(a) + signed = ht.sign(a, b) + comparison = ht.array([[1 + 0j, -1 + 0j], [0 + 0j, 1 + 0j]], split=1) + + self.assertIs(b, signed) + self.assertEqual(signed.dtype, comparison.dtype) + self.assertEqual(signed.shape, comparison.shape) + self.assertEqual(signed.device, a.device) + self.assertEqual(signed.split, a.split) + self.assertTrue(ht.allclose(signed.real, comparison.real)) + self.assertTrue(ht.allclose(signed.imag, comparison.imag, atol=2e-5)) + + # zeros + 3d + complex + split + if not self.is_mps: + # double precision complex not supported on MPS + a = ht.zeros((4, 4, 4), dtype=ht.complex128, split=2) + signed = ht.sign(a) + comparison = ht.zeros((4, 4, 4), dtype=ht.complex128, split=2) + + self.assertEqual(signed.dtype, comparison.dtype) + self.assertEqual(signed.shape, comparison.shape) + self.assertEqual(signed.device, a.device) + self.assertEqual(signed.split, a.split) + self.assertTrue(ht.allclose(signed.real, comparison.real)) + self.assertTrue(ht.allclose(signed.imag, comparison.imag, atol=2e-5)) def test_trunc(self): 
         base_array = np.random.randn(20)
+        if self.is_mps:
+            base_array = base_array.astype(np.float32)
 
-        comparison = torch.tensor(
-            base_array, dtype=torch.float64, device=self.device.torch_device
-        ).trunc()
+        comparison = torch.tensor(base_array, device=self.device.torch_device).trunc()
 
         # trunc of float32
         float32_tensor = ht.array(base_array, dtype=ht.float32)
@@ -392,11 +413,12 @@ def test_trunc(self):
         self.assertTrue((float32_floor.larray == comparison.float()).all())
 
         # trunc of float64
-        float64_tensor = ht.array(base_array, dtype=ht.float64)
-        float64_floor = float64_tensor.trunc()
-        self.assertIsInstance(float64_floor, ht.DNDarray)
-        self.assertEqual(float64_floor.dtype, ht.float64)
-        self.assertTrue((float64_floor.larray == comparison).all())
+        if not self.is_mps:
+            float64_tensor = ht.array(base_array, dtype=ht.float64)
+            float64_floor = float64_tensor.trunc()
+            self.assertIsInstance(float64_floor, ht.DNDarray)
+            self.assertEqual(float64_floor.dtype, ht.float64)
+            self.assertTrue((float64_floor.larray == comparison).all())
 
         # check exceptions
         with self.assertRaises(TypeError):
diff --git a/heat/core/tests/test_sanitation.py b/heat/core/tests/test_sanitation.py
index 2e79a08105..fd08a1401f 100644
--- a/heat/core/tests/test_sanitation.py
+++ b/heat/core/tests/test_sanitation.py
@@ -14,6 +14,17 @@ def test_sanitize_in(self):
         with self.assertRaises(TypeError):
             ht.sanitize_in(np_x)
 
+    def test_sanitize_in_nd_realfloating(self):
+        x = "this is not a DNDarray"
+        with self.assertRaises(TypeError):
+            ht.sanitize_in_nd_realfloating(x, "x", [2])
+        x = ht.zeros(10, 10, 10, dtype=ht.float32, split=0)
+        with self.assertRaises(ValueError):
+            ht.sanitize_in_nd_realfloating(x, "x", [1, 2])
+        x = ht.zeros(10, 10, dtype=ht.int32, split=None)
+        with self.assertRaises(ValueError):
+            ht.sanitize_in_nd_realfloating(x, "x", [1, 2])
+
     def test_sanitize_out(self):
         output_shape = (4, 5, 6)
         output_split = 1
diff --git a/heat/core/tests/test_signal.py b/heat/core/tests/test_signal.py
index 818538cfea..ad3ecea12a 100644
--- a/heat/core/tests/test_signal.py
+++ b/heat/core/tests/test_signal.py
@@ -30,12 +30,12 @@ def test_convolve(self):
         with self.assertRaises(TypeError):
             signal_wrong_type = [0, 1, 2, "tre", 4, "five", 6, "ʻehiku", 8, 9, 10]
-            ht.convolve(signal_wrong_type, kernel_odd, mode="full")
+            ht.convolve(signal_wrong_type, kernel_odd, mode="full", stride=1)
         with self.assertRaises(TypeError):
             filter_wrong_type = [1, 1, "pizza", "pineapple"]
-            ht.convolve(dis_signal, filter_wrong_type, mode="full")
+            ht.convolve(dis_signal, filter_wrong_type, mode="full", stride=1)
         with self.assertRaises(ValueError):
-            ht.convolve(dis_signal, kernel_odd, mode="invalid")
+            ht.convolve(dis_signal, kernel_odd, mode="invalid", stride=1)
         if dis_signal.comm.size > 1:
             with self.assertRaises(ValueError):
                 s = dis_signal.reshape((2, -1)).resplit(axis=1)
@@ -59,17 +59,19 @@ def test_convolve(self):
         modes = ["full", "same", "valid"]
         for i, mode in enumerate(modes):
             # odd kernel size
-            conv = ht.convolve(dis_signal, kernel_odd, mode=mode)
-            gathered = manipulations.resplit(conv, axis=None)
-            self.assertTrue(ht.equal(full_odd[i : len(full_odd) - i], gathered))
+            if not self.is_mps:
+                # torch convolution does not support int on MPS
+                conv = ht.convolve(dis_signal, kernel_odd, mode=mode)
+                gathered = manipulations.resplit(conv, axis=None)
+                self.assertTrue(ht.equal(full_odd[i : len(full_odd) - i], gathered))
 
-            conv = ht.convolve(dis_signal, dis_kernel_odd, mode=mode)
-            gathered = manipulations.resplit(conv, axis=None)
-
self.assertTrue(ht.equal(full_odd[i : len(full_odd) - i], gathered)) + conv = ht.convolve(dis_signal, dis_kernel_odd, mode=mode) + gathered = manipulations.resplit(conv, axis=None) + self.assertTrue(ht.equal(full_odd[i : len(full_odd) - i], gathered)) - conv = ht.convolve(signal, dis_kernel_odd, mode=mode) - gathered = manipulations.resplit(conv, axis=None) - self.assertTrue(ht.equal(full_odd[i : len(full_odd) - i], gathered)) + conv = ht.convolve(signal, dis_kernel_odd, mode=mode).astype(ht.float) + gathered = manipulations.resplit(conv, axis=None) + self.assertTrue(ht.equal(full_odd[i : len(full_odd) - i], gathered)) # different data types conv = ht.convolve(dis_signal.astype(ht.float), kernel_odd) @@ -87,17 +89,36 @@ def test_convolve(self): # even kernel size # skip mode 'same' for even kernels if mode != "same": - conv = ht.convolve(dis_signal, kernel_even, mode=mode) - dis_conv = ht.convolve(dis_signal, dis_kernel_even, mode=mode) - gathered = manipulations.resplit(conv, axis=None) - dis_gathered = manipulations.resplit(dis_conv, axis=None) + # int tests not on MPS + if not self.is_mps: + conv = ht.convolve(dis_signal, kernel_even, mode=mode) + dis_conv = ht.convolve(dis_signal, dis_kernel_even, mode=mode) + gathered = manipulations.resplit(conv, axis=None) + dis_gathered = manipulations.resplit(dis_conv, axis=None) - if mode == "full": - self.assertTrue(ht.equal(full_even, gathered)) - self.assertTrue(ht.equal(full_even, dis_gathered)) + if mode == "full": + self.assertTrue(ht.equal(full_even, gathered)) + self.assertTrue(ht.equal(full_even, dis_gathered)) + else: + self.assertTrue(ht.equal(full_even[3:-3], gathered)) + self.assertTrue(ht.equal(full_even[3:-3], dis_gathered)) else: - self.assertTrue(ht.equal(full_even[3:-3], gathered)) - self.assertTrue(ht.equal(full_even[3:-3], dis_gathered)) + # float tests + conv = ht.convolve(dis_signal.astype(ht.float), kernel_even, mode=mode) + dis_conv = ht.convolve( + dis_signal.astype(ht.float), dis_kernel_even.astype(ht.float), mode=mode + ) + gathered = manipulations.resplit(conv, axis=None) + dis_gathered = manipulations.resplit(dis_conv, axis=None) + + if mode == "full": + self.assertTrue(ht.equal(full_even.astype(ht.float), gathered)) + self.assertTrue(ht.equal(full_even.astype(ht.float), dis_gathered)) + else: + self.assertTrue(ht.equal(full_even[3:-3].astype(ht.float), gathered)) + self.assertTrue( + ht.equal(full_even[3:-3].astype(ht.float), dis_gathered) + ) # distributed large signal and kernel np.random.seed(12) @@ -105,27 +126,44 @@ def test_convolve(self): np_b = np.random.randint(1000, size=1543) np_conv = np.convolve(np_a, np_b, mode=mode) - a = ht.array(np_a, split=0, dtype=ht.int32) - b = ht.array(np_b, split=0, dtype=ht.int32) - conv = ht.convolve(a, b, mode=mode) - self.assert_array_equal(conv, np_conv) + if self.is_mps: + # torch convolution only supports float on MPS + a = ht.array(np_a, split=0, dtype=ht.float32) + b = ht.array(np_b, split=0, dtype=ht.float32) + conv = ht.convolve(a, b, mode=mode) + self.assert_array_equal(conv, np_conv.astype(np.float32)) + else: + a = ht.array(np_a, split=0, dtype=ht.int32) + b = ht.array(np_b, split=0, dtype=ht.int32) + conv = ht.convolve(a, b, mode=mode) + self.assert_array_equal(conv, np_conv) # test edge cases # non-distributed signal, size-1 kernel - signal = ht.arange(0, 16).astype(ht.int) - alt_signal = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15) - kernel = ht.ones(1).astype(ht.int) - conv = ht.convolve(alt_signal, kernel) + if self.is_mps: + # torch convolution 
only supports float on MPS + signal = ht.arange(0, 16, dtype=ht.float32) + alt_signal = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15) + kernel = ht.ones(1, dtype=ht.float32) + conv = ht.convolve(alt_signal, kernel) + else: + signal = ht.arange(0, 16).astype(ht.int) + alt_signal = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15) + kernel = ht.ones(1).astype(ht.int) + conv = ht.convolve(alt_signal, kernel) self.assertTrue(ht.equal(signal, conv)) - conv = ht.convolve(1, 5) - self.assertTrue(ht.equal(ht.array([5]), conv)) + if not self.is_mps: + conv = ht.convolve(1, 5) + self.assertTrue(ht.equal(ht.array([5]), conv)) - # test batched convolutions, distributed along the first axis - signal = ht.random.randn(1000, dtype=ht.float64) - batch_signal = ht.empty((10, 1000), dtype=ht.float64, split=0) + # test batched convolutions + float_dtype = ht.float32 if self.is_mps else ht.float64 + # distributed along the first axis + signal = ht.random.randn(1000, dtype=float_dtype) + batch_signal = ht.empty((10, 1000), dtype=float_dtype, split=0) batch_signal.larray[:] = signal.larray - kernel = ht.random.randn(19, dtype=ht.float64) + kernel = ht.random.randn(19, dtype=float_dtype) batch_convolved = ht.convolve(batch_signal, kernel, mode="same") self.assertTrue(ht.equal(ht.convolve(signal, kernel, mode="same"), batch_convolved[0])) @@ -133,13 +171,13 @@ def test_convolve(self): dis_kernel = ht.array(kernel, split=0) batch_convolved = ht.convolve(batch_signal, dis_kernel) self.assertTrue(ht.equal(ht.convolve(signal, kernel), batch_convolved[0])) - batch_kernel = ht.empty((10, 19), dtype=ht.float64, split=1) + batch_kernel = ht.empty((10, 19), dtype=float_dtype, split=1) batch_kernel.larray[:] = dis_kernel.larray batch_convolved = ht.convolve(batch_signal, batch_kernel, mode="full") self.assertTrue(ht.equal(ht.convolve(signal, kernel, mode="full"), batch_convolved[0])) # n-D batch convolution - batch_signal = ht.empty((4, 3, 3, 1000), dtype=ht.float64, split=1) + batch_signal = ht.empty((4, 3, 3, 1000), dtype=float_dtype, split=1) batch_signal.larray[:, :, :] = signal.larray batch_convolved = ht.convolve(batch_signal, kernel, mode="valid") self.assertTrue( @@ -147,10 +185,260 @@ def test_convolve(self): ) # test batch-convolve exceptions - batch_kernel_wrong_shape = ht.random.randn(3, 19, dtype=ht.float64) + batch_kernel_wrong_shape = ht.random.randn(3, 19, dtype=float_dtype) with self.assertRaises(ValueError): ht.convolve(batch_signal, batch_kernel_wrong_shape) if kernel.comm.size > 1: batch_signal_wrong_split = batch_signal.resplit(-1) with self.assertRaises(ValueError): ht.convolve(batch_signal_wrong_split, kernel) + + def test_only_balanced_kernel(self): + signal = ht.arange(0, 16, split=0).astype(ht.float32) + dis_kernel = ht.array([1, 1, 1], split=0).astype(ht.float32) + + if self.comm.size > 1: + target_map = dis_kernel.lshape_map + target_map[0] = 3 + target_map[1:] = 0 + dis_kernel.redistribute_(dis_kernel.lshape_map, target_map) + with self.assertRaises(ValueError): + ht.convolve(signal, dis_kernel) + + def test_convolve_stride_errors(self): + dis_signal = ht.arange(0, 16, split=0).astype(ht.int) + kernel_odd = ht.ones(3).astype(ht.int) + kernel_even = [1, 1, 1, 1] + + # stride not positive integer + with self.assertRaises(ValueError): + ht.convolve(dis_signal, kernel_even, mode="full", stride=0) + + # stride > 1 for mode 'same' + with self.assertRaises(ValueError): + ht.convolve(dis_signal, kernel_odd, mode="same", stride=2) + + def test_convolve_stride_batch_convolutions(self): + 
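+        # A strided convolution should match slicing the unstrided result, i.e.
+        # ht.convolve(s, k, mode=m, stride=n) equals ht.convolve(s, k, mode=m)[::n],
+        # so every batched case below is checked against its 1-D counterpart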
+        float_dtype = ht.float32 if self.is_mps else ht.float64
+        signal = ht.random.randn(1000, dtype=float_dtype)
+        kernel = ht.random.randn(19, dtype=float_dtype)
+
+        # distributed input along the first axis
+        stride = 123
+        batch_signal = ht.empty((10, 1000), dtype=float_dtype, split=0)
+        batch_signal.larray[:] = signal.larray
+
+        batch_convolved = ht.convolve(batch_signal, kernel, mode="valid", stride=stride)
+        self.assertTrue(
+            ht.equal(ht.convolve(signal, kernel, mode="valid", stride=stride), batch_convolved[0])
+        )
+
+        # distributed kernel
+        stride = 142
+        dis_kernel = ht.array(kernel, split=0)
+
+        batch_convolved = ht.convolve(batch_signal, dis_kernel, stride=stride)
+        self.assertTrue(ht.equal(ht.convolve(signal, kernel, stride=stride), batch_convolved[0]))
+
+        # batch kernel
+        stride = 41
+        batch_kernel = ht.empty((10, 19), dtype=float_dtype, split=1)
+        batch_kernel.larray[:] = dis_kernel.larray
+
+        batch_convolved = ht.convolve(batch_signal, batch_kernel, mode="full", stride=stride)
+        self.assertTrue(
+            ht.equal(ht.convolve(signal, kernel, mode="full", stride=stride), batch_convolved[0])
+        )
+
+        # n-D batch convolution
+        stride = 55
+        batch_signal = ht.empty((4, 3, 3, 1000), dtype=float_dtype, split=1)
+        batch_signal.larray[:, :, :] = signal.larray
+
+        batch_convolved = ht.convolve(batch_signal, kernel, mode="valid", stride=stride)
+        self.assertTrue(
+            ht.equal(
+                ht.convolve(signal, kernel, mode="valid", stride=stride),
+                batch_convolved[1, 2, 0],
+            )
+        )
+
+    def assert_convolution_stride(self, signal, kernel, mode, stride, solution):
+        conv = ht.convolve(signal, kernel, mode=mode, stride=stride)
+        gathered = manipulations.resplit(conv, axis=None)
+        self.assertTrue(ht.equal(solution, gathered))
+
+    def test_convolve_stride_kernel_odd_mode_full(self):
+
+        ht_dtype = ht.int
+
+        mode = "full"
+        stride = 2
+        solution = ht.array([0, 3, 9, 15, 21, 27, 33, 39, 29]).astype(ht_dtype)
+
+        dis_signal = ht.arange(0, 16, split=0).astype(ht_dtype)
+        signal = ht.arange(0, 16).astype(ht_dtype)
+        kernel = ht.ones(3).astype(ht_dtype)
+        dis_kernel = ht.ones(3, split=0).astype(ht_dtype)
+
+        # avoid kernel larger than signal chunk
+        if self.comm.size <= 3:
+
+            if not self.is_mps:
+                # torch convolution does not support int on MPS
+                self.assert_convolution_stride(dis_signal, kernel, mode, stride, solution)
+                self.assert_convolution_stride(signal, dis_kernel, mode, stride, solution)
+                self.assert_convolution_stride(dis_signal, dis_kernel, mode, stride, solution)
+
+            # different data types of input and kernel
+            self.assert_convolution_stride(
+                dis_signal.astype(ht.float), kernel, mode, stride, solution
+            )
+            self.assert_convolution_stride(
+                signal.astype(ht.float), dis_kernel, mode, stride, solution
+            )
+            self.assert_convolution_stride(
+                dis_signal.astype(ht.float), dis_kernel, mode, stride, solution
+            )
+
+    def test_convolve_stride_kernel_odd_mode_valid(self):
+
+        ht_dtype = ht.int
+
+        mode = "valid"
+        stride = 2
+        solution = ht.array([3, 9, 15, 21, 27, 33, 39]).astype(ht_dtype)
+
+        dis_signal = ht.arange(0, 16, split=0).astype(ht_dtype)
+        signal = ht.arange(0, 16).astype(ht_dtype)
+        kernel = ht.ones(3).astype(ht_dtype)
+        dis_kernel = ht.ones(3, split=0).astype(ht_dtype)
+
+        # avoid kernel larger than signal chunk
+        if self.comm.size <= 3:
+
+            if not self.is_mps:
+                # torch convolution does not support int on MPS
+                self.assert_convolution_stride(dis_signal, kernel, mode, stride, solution)
+                self.assert_convolution_stride(signal, dis_kernel, mode, stride, solution)
+                self.assert_convolution_stride(dis_signal,
dis_kernel, mode, stride, solution) + + # different data types of input and kernel + self.assert_convolution_stride( + dis_signal.astype(ht.float), kernel, mode, stride, solution + ) + self.assert_convolution_stride( + signal.astype(ht.float), dis_kernel, mode, stride, solution + ) + self.assert_convolution_stride( + dis_signal.astype(ht.float), dis_kernel, mode, stride, solution + ) + + def test_convolve_stride_kernel_even_mode_full(self): + + ht_dtype = ht.int + + mode = "full" + stride = 2 + solution = ht.array([0, 3, 10, 18, 26, 34, 42, 50, 42, 15]).astype(ht_dtype) + + dis_signal = ht.arange(0, 16, split=0).astype(ht_dtype) + signal = ht.arange(0, 16).astype(ht_dtype) + kernel = [1, 1, 1, 1] + dis_kernel = ht.ones(4, split=0).astype(ht_dtype) + + # avoid kernel larger than signal chunk + if self.comm.size <= 3: + + if not self.is_mps: + # torch convolution does not support int on MPS + self.assert_convolution_stride(dis_signal, kernel, mode, stride, solution) + self.assert_convolution_stride(signal, dis_kernel, mode, stride, solution) + self.assert_convolution_stride(dis_signal, dis_kernel, mode, stride, solution) + + # different data types of input and kernel + self.assert_convolution_stride( + dis_signal.astype(ht.float), kernel, mode, stride, solution + ) + self.assert_convolution_stride( + signal.astype(ht.float), dis_kernel, mode, stride, solution + ) + self.assert_convolution_stride( + dis_signal.astype(ht.float), dis_kernel, mode, stride, solution + ) + + def test_convolve_stride_kernel_even_mode_valid(self): + + ht_dtype = ht.int + + mode = "valid" + stride = 2 + solution = ht.array([6, 14, 22, 30, 38, 46, 54]).astype(ht_dtype) + + dis_signal = ht.arange(0, 16, split=0).astype(ht_dtype) + signal = ht.arange(0, 16).astype(ht_dtype) + kernel = [1, 1, 1, 1] + dis_kernel = ht.ones(4, split=0).astype(ht_dtype) + + # avoid kernel larger than signal chunk + if self.comm.size <= 3: + + if not self.is_mps: + # torch convolution does not support int on MPS + self.assert_convolution_stride(dis_signal, kernel, mode, stride, solution) + self.assert_convolution_stride(signal, dis_kernel, mode, stride, solution) + self.assert_convolution_stride(dis_signal, dis_kernel, mode, stride, solution) + + # different data types of input and kernel + self.assert_convolution_stride( + dis_signal.astype(ht.float), kernel, mode, stride, solution + ) + self.assert_convolution_stride( + signal.astype(ht.float), dis_kernel, mode, stride, solution + ) + self.assert_convolution_stride( + dis_signal.astype(ht.float), dis_kernel, mode, stride, solution + ) + + def test_convolution_stride_large_signal_and_kernel_modes(self): + if self.comm.size <= 3: + # prep + np.random.seed(12) + np_a = np.random.randint(1000, size=4418) + np_b = np.random.randint(1000, size=154) + # torch convolution does not support int on MPS + ht_dtype = ht.float32 if self.is_mps else ht.int32 + np_type = np.float32 if self.is_mps else np.int32 + stride = np.random.randint(1, high=len(np_a), size=1)[0] + + for mode in ["full", "valid"]: + # solution + np_conv = np.convolve(np_a, np_b, mode=mode) + solution = np_conv[::stride].astype(np_type) + + # test + a = ht.array(np_a, split=0, dtype=ht_dtype) + b = ht.array(np_b, split=None, dtype=ht_dtype) + conv = ht.convolve(a, b, mode=mode, stride=stride) + self.assert_array_equal(conv, solution) + + b = ht.array(np_b, split=0, dtype=ht_dtype) + conv = ht.convolve(a, b, mode=mode, stride=stride) + self.assert_array_equal(conv, solution) + + def test_convolution_stride_kernel_size_1(self): + + # 
prep + ht_dtype = ht.float32 if self.is_mps else ht.int32 + + # non-distributed signal + signal = ht.arange(0, 16, dtype=ht_dtype) + alt_signal = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15) + kernel = ht.ones(1, dtype=ht_dtype) + conv = ht.convolve(alt_signal, kernel, stride=2) + self.assertTrue(ht.equal(signal[0::2], conv)) + + if not self.is_mps: + for s in [2, 3, 4]: + conv = ht.convolve(1, 5, stride=s) + self.assertTrue(ht.equal(ht.array([5]), conv)) diff --git a/heat/core/tests/test_statistics.py b/heat/core/tests/test_statistics.py index a6024d6b54..64579e7c73 100644 --- a/heat/core/tests/test_statistics.py +++ b/heat/core/tests/test_statistics.py @@ -60,7 +60,7 @@ def test_argmax(self): data = ht.tril(ht.ones((size, size), split=0), k=-1) result = ht.argmax(data, axis=0) - expected = torch.tensor(np.argmax(data.numpy(), axis=0)) + expected = torch.tensor(np.argmax(data.numpy(), axis=0), device=result.larray.device) self.assertIsInstance(result, ht.DNDarray) self.assertEqual(result.dtype, ht.int64) self.assertEqual(result.larray.dtype, torch.int64) @@ -77,7 +77,7 @@ def test_argmax(self): output = ht.empty((size,), dtype=ht.int64) result = ht.argmax(data, axis=0, out=output) - expected = torch.tensor(np.argmax(data.numpy(), axis=0)) + expected = torch.tensor(np.argmax(data.numpy(), axis=0), device=result.larray.device) self.assertIsInstance(result, ht.DNDarray) self.assertEqual(output.dtype, ht.int64) self.assertEqual(output.larray.dtype, torch.int64) @@ -151,7 +151,7 @@ def test_argmin(self): data = ht.triu(ht.ones((size, size), split=0), k=1) result = ht.argmin(data, axis=0) - expected = torch.tensor(np.argmin(data.numpy(), axis=0)) + expected = torch.tensor(np.argmin(data.numpy(), axis=0), device=result.larray.device) self.assertIsInstance(result, ht.DNDarray) self.assertEqual(result.dtype, ht.int64) self.assertEqual(result.larray.dtype, torch.int64) @@ -168,7 +168,7 @@ def test_argmin(self): output = ht.empty((size,), dtype=ht.int64) result = ht.argmin(data, axis=0, out=output) - expected = torch.tensor(np.argmin(data.numpy(), axis=0)) + expected = torch.tensor(np.argmin(data.numpy(), axis=0), device=result.larray.device) self.assertIsInstance(result, ht.DNDarray) self.assertEqual(output.dtype, ht.int64) self.assertEqual(output.larray.dtype, torch.int64) @@ -228,21 +228,25 @@ def test_average(self): self.assertEqual(avg_horizontal.larray.dtype, torch.float32) self.assertTrue((avg_horizontal.numpy() == np.average(comparison, axis=1)).all()) + if self.is_mps: + dtype = torch.float32 + else: + dtype = torch.float64 # check weighted average over all float elements of split 3d tensor, across split axis random_volume = ht.array( - torch.randn((3, 3, 3), dtype=torch.float64, device=self.device.torch_device), is_split=1 + torch.randn((3, 3, 3), dtype=dtype, device=self.device.torch_device), is_split=1 ) size = random_volume.comm.size random_weights = ht.array( - torch.randn((3 * size,), dtype=torch.float64, device=self.device.torch_device), split=0 + torch.randn((3 * size,), dtype=dtype, device=self.device.torch_device), split=0 ) avg_volume = ht.average(random_volume, weights=random_weights, axis=1) np_avg_volume = np.average(random_volume.numpy(), weights=random_weights.numpy(), axis=1) self.assertIsInstance(avg_volume, ht.DNDarray) self.assertEqual(avg_volume.shape, (3, 3)) self.assertEqual(avg_volume.lshape, (3, 3)) - self.assertEqual(avg_volume.dtype, ht.float64) - self.assertEqual(avg_volume.larray.dtype, torch.float64) + self.assertEqual(avg_volume.dtype, 
ht.types.canonical_heat_type(dtype)) + self.assertEqual(avg_volume.larray.dtype, dtype) self.assertEqual(avg_volume.split, None) self.assertAlmostEqual(avg_volume.numpy().all(), np_avg_volume.all()) avg_volume_with_cumwgt = ht.average( @@ -256,15 +260,15 @@ def test_average(self): # check weighted average over all float elements of split 3d tensor (3d weights) random_weights_3d = ht.array( - torch.randn((3, 3, 3), dtype=torch.float64, device=self.device.torch_device), is_split=1 + torch.randn((3, 3, 3), dtype=dtype, device=self.device.torch_device), is_split=1 ) avg_volume = ht.average(random_volume, weights=random_weights_3d, axis=1) np_avg_volume = np.average(random_volume.numpy(), weights=random_weights.numpy(), axis=1) self.assertIsInstance(avg_volume, ht.DNDarray) self.assertEqual(avg_volume.shape, (3, 3)) self.assertEqual(avg_volume.lshape, (3, 3)) - self.assertEqual(avg_volume.dtype, ht.float64) - self.assertEqual(avg_volume.larray.dtype, torch.float64) + self.assertEqual(avg_volume.dtype, ht.types.canonical_heat_type(dtype)) + self.assertEqual(avg_volume.larray.dtype, dtype) self.assertEqual(avg_volume.split, None) self.assertAlmostEqual(avg_volume.numpy().all(), np_avg_volume.all()) avg_volume_with_cumwgt = ht.average( @@ -344,8 +348,13 @@ def test_bincount(self): w = ht.arange(5) res = ht.bincount(a, weights=w) self.assertEqual(res.size, 5) - self.assertEqual(res.dtype, ht.float64) - self.assertTrue(ht.equal(res, ht.arange(5, dtype=ht.float64))) + if self.is_mps: + # torch.bincount on MPS returns int32 here + self.assertEqual(res.dtype, ht.int32) + self.assertTrue(ht.equal(res, ht.arange(5))) + else: + self.assertEqual(res.dtype, ht.float64) + self.assertTrue(ht.equal(res, ht.arange(5, dtype=ht.float64))) res = ht.bincount(a, minlength=8) self.assertEqual(res.size, 8) @@ -356,8 +365,13 @@ def test_bincount(self): w = ht.arange(4, split=0) res = ht.bincount(a, weights=w) self.assertEqual(res.size, 4) - self.assertEqual(res.dtype, ht.float64) - self.assertTrue(ht.equal(res, ht.arange(4, dtype=ht.float64))) + if self.is_mps: + # torch.bincount on MPS returns int32 here + self.assertEqual(res.dtype, ht.int32) + self.assertTrue(ht.equal(res, ht.arange(4))) + else: + self.assertEqual(res.dtype, ht.float64) + self.assertTrue(ht.equal(res, ht.arange(4, dtype=ht.float64))) with self.assertRaises(ValueError): ht.bincount(ht.array([0, 1, 2, 3], split=0), weights=ht.array([1, 2, 3, 4])) @@ -390,66 +404,73 @@ def test_bucketize(self): ht.bucketize(a, ht.array([0.0, 0.5, 1.0], split=0)) def test_cov(self): - x = ht.array([[0, 2], [1, 1], [2, 0]], dtype=ht.float, split=1).T + if self.is_mps: + dtype = ht.float32 + np_dtype = np.float32 + else: + dtype = ht.float64 + np_dtype = np.float64 + + x = ht.array([[0, 2], [1, 1], [2, 0]], dtype=dtype, split=1).T if x.comm.size < 3: cov = ht.cov(x) actual = ht.array([[1, -1], [-1, 1]], split=0) self.assertTrue(ht.equal(cov, actual)) data = np.loadtxt("heat/datasets/iris.csv", delimiter=";") - np_cov = np.cov(data[:, 0], data[:, 1:3], rowvar=False) + np_cov = np.cov(data[:, 0], data[:, 1:3], rowvar=False).astype(np_dtype) # split = None tests htdata = ht.load("heat/datasets/iris.csv", sep=";", split=None) ht_cov = ht.cov(htdata[:, 0], htdata[:, 1:3], rowvar=False) - comp = ht.array(np_cov, dtype=ht.float) + comp = ht.array(np_cov, dtype=dtype) self.assertTrue(ht.allclose(comp - ht_cov, 0, atol=1e-4)) - np_cov = np.cov(data, rowvar=False) + np_cov = np.cov(data, rowvar=False).astype(np_dtype) ht_cov = ht.cov(htdata, rowvar=False) - 
self.assertTrue(ht.allclose(ht.array(np_cov, dtype=ht.float) - ht_cov, 0, atol=1e-4)) + self.assertTrue(ht.allclose(ht.array(np_cov, dtype=dtype) - ht_cov, 0, atol=1e-4)) - np_cov = np.cov(data, rowvar=False, ddof=1) + np_cov = np.cov(data, rowvar=False, ddof=1).astype(np_dtype) ht_cov = ht.cov(htdata, rowvar=False, ddof=1) - self.assertTrue(ht.allclose(ht.array(np_cov, dtype=ht.float) - ht_cov, 0, atol=1e-4)) + self.assertTrue(ht.allclose(ht.array(np_cov, dtype=dtype) - ht_cov, 0, atol=1e-4)) - np_cov = np.cov(data, rowvar=False, bias=True) + np_cov = np.cov(data, rowvar=False, bias=True).astype(np_dtype) ht_cov = ht.cov(htdata, rowvar=False, bias=True) - self.assertTrue(ht.allclose(ht.array(np_cov, dtype=ht.float) - ht_cov, 0, atol=1e-4)) + self.assertTrue(ht.allclose(ht.array(np_cov, dtype=dtype) - ht_cov, 0, atol=1e-4)) # split = 0 tests data = np.loadtxt("heat/datasets/iris.csv", delimiter=";") - np_cov = np.cov(data[:, 0], data[:, 1:3], rowvar=False) + np_cov = np.cov(data[:, 0], data[:, 1:3], rowvar=False).astype(np_dtype) htdata = ht.load("heat/datasets/iris.csv", sep=";", split=0) ht_cov = ht.cov(htdata[:, 0], htdata[:, 1:3], rowvar=False) comp = ht.array(np_cov, dtype=ht.float) self.assertTrue(ht.allclose(comp - ht_cov, 0, atol=1e-4)) - np_cov = np.cov(data, rowvar=False) + np_cov = np.cov(data, rowvar=False).astype(np_dtype) ht_cov = ht.cov(htdata, rowvar=False) - self.assertTrue(ht.allclose(ht.array(np_cov, dtype=ht.float) - ht_cov, 0, atol=1e-4)) + self.assertTrue(ht.allclose(ht.array(np_cov, dtype=dtype) - ht_cov, 0, atol=1e-4)) - np_cov = np.cov(data, rowvar=False, ddof=1) + np_cov = np.cov(data, rowvar=False, ddof=1).astype(np_dtype) ht_cov = ht.cov(htdata, rowvar=False, ddof=1) - self.assertTrue(ht.allclose(ht.array(np_cov, dtype=ht.float) - ht_cov, 0, atol=1e-4)) + self.assertTrue(ht.allclose(ht.array(np_cov, dtype=dtype) - ht_cov, 0, atol=1e-4)) - np_cov = np.cov(data, rowvar=False, bias=True) + np_cov = np.cov(data, rowvar=False, bias=True).astype(np_dtype) ht_cov = ht.cov(htdata, rowvar=False, bias=True) - self.assertTrue(ht.allclose(ht.array(np_cov, dtype=ht.float) - ht_cov, 0, atol=1e-4)) + self.assertTrue(ht.allclose(ht.array(np_cov, dtype=dtype) - ht_cov, 0, atol=1e-4)) if 1 < x.comm.size < 5: # split 1 tests htdata = ht.load("heat/datasets/iris.csv", sep=";", split=1) - np_cov = np.cov(data, rowvar=False) + np_cov = np.cov(data, rowvar=False).astype(np_dtype) ht_cov = ht.cov(htdata, rowvar=False) - self.assertTrue(ht.allclose(ht.array(np_cov, dtype=ht.float), ht_cov, atol=1e-4)) + self.assertTrue(ht.allclose(ht.array(np_cov, dtype=dtype), ht_cov, atol=1e-4)) - np_cov = np.cov(data, data, rowvar=True) + np_cov = np.cov(data, data, rowvar=True).astype(np_dtype) htdata = ht.load("heat/datasets/iris.csv", sep=";", split=0) ht_cov = ht.cov(htdata, htdata, rowvar=True) - self.assertTrue(ht.allclose(ht.array(np_cov, dtype=ht.float), ht_cov, atol=1e-4)) + self.assertTrue(ht.allclose(ht.array(np_cov, dtype=dtype), ht_cov, atol=1e-4)) htdata = ht.load("heat/datasets/iris.csv", sep=";", split=0) with self.assertRaises(RuntimeError): @@ -516,14 +537,16 @@ def test_digitize(self): ht.digitize(a, ht.array([0.0, 0.5, 1.0], split=0)) def test_histc(self): - # few entries and float64 - c = torch.arange(4, dtype=torch.float64, device=self.device.torch_device) + dtype = torch.float32 if self.is_mps else torch.float64 + + # few entries and (if not MPS) float64 + c = torch.arange(4, dtype=dtype, device=self.device.torch_device) comp = torch.histc(c, 7) a = ht.array(c) res = 
ht.histc(a, 7) self.assertEqual(res.shape, (7,)) - self.assertEqual(res.dtype, ht.float64) + self.assertEqual(res.dtype, ht.types.canonical_heat_type(dtype)) self.assertEqual(res.device, self.device) self.assertEqual(res.split, None) self.assertTrue(torch.equal(res.larray, comp)) @@ -586,7 +609,7 @@ def test_histc(self): self.assertTrue(torch.equal(out.larray, comp)) # Alias - a = ht.arange(10, dtype=ht.float) + a = ht.arange(10, dtype=dtype) hist = ht.histc(a, 10) alias = ht.histogram(a) @@ -622,7 +645,9 @@ def __split_calc(ht_split, axis): # 1 dim ht_data = ht.random.rand(50) np_data = ht_data.copy().numpy() - np_kurtosis32 = ht.array((ss.kurtosis(np_data, bias=False)), dtype=ht_data.dtype) + np_kurtosis32 = ht.array( + (ss.kurtosis(np_data, bias=False)).astype(np_data.dtype), dtype=ht_data.dtype + ) self.assertAlmostEqual(ht.kurtosis(ht_data), np_kurtosis32.item(), places=5) ht_data = ht.resplit(ht_data, 0) self.assertAlmostEqual(ht.kurtosis(ht_data), np_kurtosis32.item(), places=5) @@ -651,21 +676,23 @@ def __split_calc(ht_split, axis): sp = __split_calc(ht_data.split, ax) self.assertEqual(ht_kurtosis.split, sp) - # 2 dim float64 - ht_data = ht.random.rand(50, 30, dtype=ht.float64) + # 2 dim float64 (if not MPS) + dtype = ht.float64 if not self.is_mps else ht.float32 + ht_data = ht.random.rand(50, 30, dtype=dtype) np_data = ht_data.copy().numpy() - np_kurtosis32 = ss.kurtosis(np_data, axis=None, bias=False) + np_kurtosis32 = ss.kurtosis(np_data, axis=None, bias=False).astype(np_data.dtype) self.assertAlmostEqual(ht.kurtosis(ht_data) - np_kurtosis32, 0, places=5) ht_data = ht.resplit(ht_data, 0) for ax in range(2): np_kurtosis32 = ht.array( - (ss.kurtosis(np_data, axis=ax, bias=False)), dtype=ht_data.dtype + (ss.kurtosis(np_data, axis=ax, bias=False)).astype(np_data.dtype), + dtype=ht_data.dtype, ) ht_kurtosis = ht.kurtosis(ht_data, axis=ax) self.assertTrue(ht.allclose(ht_kurtosis, np_kurtosis32, atol=1e-5)) sp = __split_calc(ht_data.split, ax) self.assertEqual(ht_kurtosis.split, sp) - self.assertEqual(ht_kurtosis.dtype, ht.float64) + self.assertEqual(ht_kurtosis.dtype, dtype) ht_data = ht.resplit(ht_data, 1) for ax in range(2): np_kurtosis32 = ht.array( @@ -675,7 +702,7 @@ def __split_calc(ht_split, axis): self.assertTrue(ht.allclose(ht_kurtosis, np_kurtosis32, atol=1e-5)) sp = __split_calc(ht_data.split, ax) self.assertEqual(ht_kurtosis.split, sp) - self.assertEqual(ht_kurtosis.dtype, ht.float64) + self.assertEqual(ht_kurtosis.dtype, dtype) # 3 dim ht_data = ht.random.rand(50, 30, 16) @@ -819,11 +846,12 @@ def test_maximum(self): self.assertTrue((maximum_volume.numpy() == np_maximum).all()) # check maximum against size-1 array - random_volume_1_split_none = ht.random.randn(1, split=None, dtype=ht.float64) + dtype = ht.float32 if self.is_mps else ht.float64 + random_volume_1_split_none = ht.random.randn(1, split=None, dtype=dtype) random_volume_2_splitdiff = ht.random.randn(3, 3, 4, split=1) maximum_volume_splitdiff = ht.maximum(random_volume_1_split_none, random_volume_2_splitdiff) self.assertEqual(maximum_volume_splitdiff.split, 1) - self.assertEqual(maximum_volume_splitdiff.dtype, ht.float64) + self.assertEqual(maximum_volume_splitdiff.dtype, dtype) random_volume_1_split_none = ht.random.randn(3, 3, 4, split=0) random_volume_2_splitdiff = ht.random.randn(1, split=None) @@ -1082,11 +1110,12 @@ def test_minimum(self): self.assertTrue((minimum_volume.numpy() == np_minimum).all()) # check minimum against size-1 array - random_volume_1_split_none = ht.random.randn(1, split=None, 
dtype=ht.float64) + dtype = ht.float32 if self.is_mps else ht.float64 + random_volume_1_split_none = ht.random.randn(1, split=None, dtype=dtype) random_volume_2_splitdiff = ht.random.randn(3, 3, 4, split=1) minimum_volume_splitdiff = ht.minimum(random_volume_1_split_none, random_volume_2_splitdiff) self.assertEqual(minimum_volume_splitdiff.split, 1) - self.assertEqual(minimum_volume_splitdiff.dtype, ht.float64) + self.assertEqual(minimum_volume_splitdiff.dtype, dtype) random_volume_1_split_none = ht.random.randn(3, 3, 4, split=0) random_volume_2_splitdiff = ht.random.randn(1, split=None) @@ -1182,12 +1211,13 @@ def test_percentile(self): # test list q and writing to output buffer q = [0.1, 2.3, 15.9, 50.0, 84.1, 97.7, 99.9] axis = 2 + out_dtype = ht.float32 if self.is_mps else ht.float64 try: p_np = np.percentile(x_np, q, axis=axis, method="lower", keepdims=True) except TypeError: p_np = np.percentile(x_np, q, axis=axis, interpolation="lower", keepdims=True) p_ht = ht.percentile(x_ht, q, axis=axis, interpolation="lower", keepdims=True) - out = ht.empty(p_np.shape, dtype=ht.float64, split=None, device=x_ht.device) + out = ht.empty(p_np.shape, dtype=out_dtype, split=None, device=x_ht.device) ht.percentile(x_ht, q, axis=axis, out=out, interpolation="lower", keepdims=True) self.assertEqual(p_ht.numpy()[5].all(), p_np[5].all()) self.assertEqual(out.numpy()[2].all(), p_np[2].all()) @@ -1225,8 +1255,9 @@ def test_percentile(self): # test tuple axis and out buffer q = (20, 50, 80) + dtype = ht.float32 if self.is_mps else ht.float64 for split in [None, 2, 1, 0]: - x_ht = ht.random.randn(3, 10, 10, dtype=ht.float64, split=split) + x_ht = ht.random.randn(3, 10, 10, dtype=dtype, split=split) x_np = x_ht.numpy() p_np = np.percentile(x_np, q, axis=(0, 1)) if isinstance(split, int) and split == 2: @@ -1250,13 +1281,14 @@ def test_percentile(self): t_out = torch.empty((len(q),), dtype=torch.float64) with self.assertRaises(TypeError): ht.percentile(x_ht, q, out=t_out) - out_wrong_dtype = ht.empty((len(q),), dtype=ht.float32) - with self.assertRaises(TypeError): - ht.percentile(x_ht, q, out=out_wrong_dtype) - out_wrong_shape = ht.empty((len(q) + 1,), dtype=ht.float64) + if not self.is_mps: + out_wrong_dtype = ht.empty((len(q),), dtype=ht.float32) + with self.assertRaises(TypeError): + ht.percentile(x_ht, q, out=out_wrong_dtype) + out_wrong_shape = ht.empty((len(q) + 1,), dtype=dtype) with self.assertRaises(ValueError): ht.percentile(x_ht, q, out=out_wrong_shape) - out_wrong_split = ht.empty((len(q),), dtype=ht.float32, split=0) + out_wrong_split = ht.empty((len(q),), dtype=dtype, split=0) with self.assertRaises(ValueError): ht.percentile(x_ht, q, out=out_wrong_split) @@ -1314,7 +1346,9 @@ def __split_calc(ht_split, axis): # 1 dim ht_data = ht.random.rand(50) np_data = ht_data.copy().numpy() - np_skew32 = ht.array(ss.skew(np_data, bias=False)).astype(ht_data.dtype) + np_skew32 = ht.array(ss.skew(np_data, bias=False).astype(np_data.dtype)).astype( + ht_data.dtype + ) self.assertAlmostEqual(ht.skew(ht_data), np_skew32.item(), places=5) ht_data = ht.resplit(ht_data, 0) self.assertAlmostEqual(ht.skew(ht_data), np_skew32.item(), places=5) @@ -1340,9 +1374,10 @@ def __split_calc(ht_split, axis): self.assertEqual(ht_skew.split, sp) # 2 dim float64 - ht_data = ht.random.rand(50, 30, dtype=ht.float64) + dtype = ht.float32 if self.is_mps else ht.float64 + ht_data = ht.random.rand(50, 30, dtype=dtype) np_data = ht_data.copy().numpy() - np_skew32 = ss.skew(np_data, axis=None, bias=False) + np_skew32 = ss.skew(np_data, 
axis=None, bias=False).astype(np_data.dtype) self.assertAlmostEqual(ht.skew(ht_data) - np_skew32, 0, places=5) ht_data = ht.resplit(ht_data, 0) for ax in range(2): @@ -1351,7 +1386,7 @@ def __split_calc(ht_split, axis): self.assertTrue(ht.allclose(ht_skew, np_skew32, atol=1e-5)) sp = __split_calc(ht_data.split, ax) self.assertEqual(ht_skew.split, sp) - self.assertEqual(ht_skew.dtype, ht.float64) + self.assertEqual(ht_skew.dtype, dtype) ht_data = ht.resplit(ht_data, 1) for ax in range(2): np_skew32 = ht.array((ss.skew(np_data, axis=ax, bias=False)), dtype=ht_data.dtype) @@ -1359,12 +1394,12 @@ def __split_calc(ht_split, axis): self.assertTrue(ht.allclose(ht_skew, np_skew32, atol=1e-5)) sp = __split_calc(ht_data.split, ax) self.assertEqual(ht_skew.split, sp) - self.assertEqual(ht_skew.dtype, ht.float64) + self.assertEqual(ht_skew.dtype, dtype) # 3 dim ht_data = ht.random.rand(50, 30, 16) np_data = ht_data.copy().numpy() - np_skew32 = ss.skew(np_data, axis=None, bias=False) + np_skew32 = ss.skew(np_data, axis=None, bias=False).astype(np_data.dtype) self.assertAlmostEqual(ht.skew(ht_data) - np_skew32, 0, places=5) for split in range(3): ht_data = ht.resplit(ht_data, split) @@ -1379,7 +1414,10 @@ def test_std(self): # test basics a = ht.arange(1, 5) self.assertAlmostEqual(a.std(), 1.118034) - self.assertAlmostEqual(a.std(bessel=True), 1.2909944) + if self.is_mps: + self.assertAlmostEqual(a.std(bessel=True).item(), 1.2909944, places=5) + else: + self.assertAlmostEqual(a.std(bessel=True), 1.2909944) # test raises x = ht.zeros((2, 3, 4)) @@ -1423,7 +1461,10 @@ def test_var(self): ht.var(x, axis=torch.Tensor([0, 0])) a = ht.arange(1, 5) - self.assertEqual(a.var(ddof=1), 1.666666666666666) + if self.is_mps: + self.assertAlmostEqual(a.var(ddof=1).item(), 1.666666666666666, places=5) + else: + self.assertEqual(a.var(ddof=1), 1.666666666666666) # ones dimensions = [] diff --git a/heat/core/tests/test_suites/basic_test.py b/heat/core/tests/test_suites/basic_test.py index a222203d91..79242e297f 100644 --- a/heat/core/tests/test_suites/basic_test.py +++ b/heat/core/tests/test_suites/basic_test.py @@ -1,30 +1,26 @@ -import unittest -import platform import os - -from heat.core import dndarray, MPICommunication, MPI, types, factories -import heat as ht +import platform +import unittest import numpy as np import torch +from typing import Optional, Callable, Any, Union + +import heat as ht +from heat.core import MPI, MPICommunication, dndarray, factories, types, Device # TODO adapt for GPU once this is working properly class TestCase(unittest.TestCase): __comm = MPICommunication() - __device = None - _hostnames: list[str] = None - - @property - def comm(self): - return TestCase.__comm + device: Device = ht.cpu + _hostnames: Optional[list[str]] = None + other_device: Optional[Device] = None + envar: Optional[str] = None - @property - def device(self): - return TestCase.__device @classmethod - def setUpClass(cls): + def setUpClass(cls) -> None: """ Read the environment variable 'HEAT_TEST_USE_DEVICE' and return the requested devices. 
Supported values @@ -36,8 +32,8 @@ def setUpClass(cls): RuntimeError if value of 'HEAT_TEST_USE_DEVICE' is not recognized """ - envar = os.getenv("HEAT_TEST_USE_DEVICE", "cpu") + is_mps = False if envar == "cpu": ht.use_device("cpu") @@ -46,35 +42,48 @@ def setUpClass(cls): if torch.cuda.is_available(): torch.cuda.set_device(torch.device(ht.gpu.torch_device)) other_device = ht.gpu - elif envar == "gpu" and torch.cuda.is_available(): - ht.use_device("gpu") - torch.cuda.set_device(torch.device(ht.gpu.torch_device)) - ht_device = ht.gpu - other_device = ht.cpu + elif torch.backends.mps.is_built() and torch.backends.mps.is_available(): + other_device = ht.gpu + elif envar == "gpu": + if torch.cuda.is_available(): + ht.use_device("gpu") + torch.cuda.set_device(torch.device(ht.gpu.torch_device)) + ht_device = ht.gpu + other_device = ht.cpu + elif torch.backends.mps.is_built() and torch.backends.mps.is_available(): + ht.use_device("gpu") + ht_device = ht.gpu + other_device = ht.cpu + is_mps = True else: raise RuntimeError( f"Value '{envar}' of environment variable 'HEAT_TEST_USE_DEVICE' is unsupported" ) - cls.device, cls.other_device, cls.envar = ht_device, other_device, envar + cls.device, cls.other_device, cls.envar, cls.is_mps = ht_device, other_device, envar, is_mps + + @property + def comm(self) -> MPICommunication: + return self.__comm - def get_rank(self): + + def get_rank(self) -> Optional[int]: return self.comm.rank - def get_size(self): + def get_size(self) -> Optional[int]: return self.comm.size @classmethod - def get_hostnames(cls): + def get_hostnames(cls) -> list[str]: if not cls._hostnames: if platform.system() == "Windows": host = platform.uname().node else: host = os.uname()[1] - cls._hostnames = set(cls.__comm.handle.allgather(host)) + cls._hostnames = list(set(cls.__comm.handle.allgather(host))) return cls._hostnames - def assert_array_equal(self, heat_array, expected_array): + def assert_array_equal(self, heat_array: ht.DNDarray, expected_array: Union[np.ndarray,torch.Tensor], rtol:float=1e-5, atol:float=1e-08) -> None: """ Check if the heat_array is equivalent to the expected_array. Therefore first the split heat_array is compared to the corresponding expected_array slice locally and second the heat_array is combined and fully compared with the @@ -142,23 +151,24 @@ def assert_array_equal(self, heat_array, expected_array): ) # compare local tensors to corresponding slice of expected_array is_allclose = torch.tensor( - np.allclose(heat_array.larray.cpu(), expected_array[slices]), dtype=torch.int32 + np.allclose(heat_array.larray.cpu(), expected_array[slices], atol=atol, rtol=rtol), + dtype=torch.int32, ) heat_array.comm.Allreduce(MPI.IN_PLACE, is_allclose, MPI.SUM) self.assertTrue(is_allclose == heat_array.comm.size) def assert_func_equal( self, - shape, - heat_func, - numpy_func, - distributed_result=True, - heat_args=None, - numpy_args=None, - data_types=(np.int32, np.int64, np.float32, np.float64), - low=-10000, - high=10000, - ): + shape: Union[tuple[Any, ...],list[Any]], + heat_func: Callable[..., Any], + numpy_func: Callable[..., Any], + distributed_result: bool=True, + heat_args: Optional[dict[str, Any]]=None, + numpy_args:Optional[dict[str, Any]]=None, + data_types: tuple[type,...]=(np.int32, np.int64, np.float32, np.float64), + low:int=-10000, + high:int=10000, + ) -> None: """ This function will create random tensors of the given shape with different data types. All of these tensors will be tested with `ht.assert_func_equal_for_tensor`. 
@@ -192,7 +202,7 @@ def assert_func_equal( Raises ------ - AssertionError if the functions to not perform equally. + AssertionError if the functions do not perform equally. Examples -------- @@ -204,13 +214,19 @@ def assert_func_equal( AssertionError: [...] >>> self.assert_func_equal((1, 3, 5), ht.any, np.any, distributed_result=False) - >>> heat_args = {'sorted': True, 'axis': 0} - >>> numpy_args = {'axis': 0} - >>> self.assert_func_equal([5, 5, 5, 5], ht.unique, np.unique, heat_arg=heat_args, numpy_args=numpy_args) + >>> heat_args = {"sorted": True, "axis": 0} + >>> numpy_args = {"axis": 0} + >>> self.assert_func_equal( + ... [5, 5, 5, 5], ht.unique, np.unique, heat_arg=heat_args, numpy_args=numpy_args + ... ) """ if not isinstance(shape, tuple) and not isinstance(shape, list): raise ValueError(f"The shape must be either a list or a tuple but was {type(shape)}") + if self.is_mps and np.float64 in data_types: + # MPS does not support float64 + data_types = [dtype for dtype in data_types if dtype != np.float64] + for dtype in data_types: tensor = self.__create_random_np_array(shape, dtype=dtype, low=low, high=high) self.assert_func_equal_for_tensor( @@ -224,13 +240,13 @@ def assert_func_equal( def assert_func_equal_for_tensor( self, - tensor, - heat_func, - numpy_func, - heat_args=None, - numpy_args=None, - distributed_result=True, - ): + tensor: Union[np.ndarray,torch.Tensor], + heat_func: Callable[..., Any], + numpy_func: Callable[..., Any], + heat_args:Optional[dict[str,Any]]=None, + numpy_args:Optional[dict[str,Any]]=None, + distributed_result:bool=True, + ) -> None: """ This function tests if the heat function and the numpy function create the equal result on the given tensor. @@ -268,9 +284,11 @@ def assert_func_equal_for_tensor( >>> self.assert_func_equal_for_tensor(a, ht.any, np.any, distributed_result=False) >>> a = torch.ones([5, 5, 5, 5]) - >>> heat_args = {'sorted': True, 'axis': 0} - >>> numpy_args = {'axis': 0} - >>> self.assert_func_equal_for_tensor(a, ht.unique, np.unique, heat_arg=heat_args, numpy_args=numpy_args) + >>> heat_args = {"sorted": True, "axis": 0} + >>> numpy_args = {"axis": 0} + >>> self.assert_func_equal_for_tensor( + ... a, ht.unique, np.unique, heat_arg=heat_args, numpy_args=numpy_args + ... ) """ self.assertTrue(callable(heat_func)) self.assertTrue(callable(numpy_func)) @@ -310,12 +328,12 @@ def assert_func_equal_for_tensor( else: self.assertTrue(np.array_equal(ht_res.larray.cpu().numpy(), np_res)) - def assertTrue_memory_layout(self, tensor, order): + def assertTrue_memory_layout(self, tensor: ht.DNDarray, order: str) -> None: """ Checks that the memory layout of a given heat tensor is as specified by argument order. - Parameters: - ----------- + Parameters + ---------- order: str, 'C' for C-like (row-major), 'F' for Fortran-like (column-major) memory layout. """ stride = tensor.larray.stride() @@ -328,7 +346,7 @@ def assertTrue_memory_layout(self, tensor, order): else: raise ValueError(f"expected order to be 'C' or 'F', but was {order}") - def __create_random_np_array(self, shape, dtype=np.float32, low=-10000, high=10000): + def __create_random_np_array(self, shape: Union[list[Any],tuple[Any]], dtype:type=np.float32, low:int=-10000, high:int=10000) -> np.ndarray: """ Creates a random array based on the input parameters. The used seed will be printed to stdout for debugging purposes. 
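The `is_mps` filtering in `assert_func_equal` above reflects a hard backend limit rather than a testing convenience: PyTorch's MPS backend on Apple GPUs has no 64-bit floating point support, so `np.float64` cases must be dropped (or recomputed in float32) before the comparison loop. A rough illustration of the failure being avoided; the exact exception type and message may vary between PyTorch versions:

```python
import torch

if torch.backends.mps.is_available():
    ok = torch.ones(3, device="mps", dtype=torch.float32)  # supported
    try:
        bad = torch.ones(3, device="mps", dtype=torch.float64)
    except (TypeError, RuntimeError) as err:
        # MPS rejects float64 tensors outright
        print(f"float64 on MPS fails as expected: {err}")
```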
@@ -364,7 +382,7 @@ def __create_random_np_array(self, shape, dtype=np.float32, low=-10000, high=100 self.comm.Bcast(seed, root=0) np.random.seed(seed=seed.item()) if issubclass(dtype, np.floating): - array = np.random.randn(*shape) + array: np.ndarray = np.random.randn(*shape) elif issubclass(dtype, np.integer): array = np.random.randint(low=low, high=high, size=shape) else: diff --git a/heat/core/tests/test_tiling.py b/heat/core/tests/test_tiling.py index f94784a33e..b6e00c3161 100644 --- a/heat/core/tests/test_tiling.py +++ b/heat/core/tests/test_tiling.py @@ -1,9 +1,17 @@ +import os +import platform +import unittest + import torch import heat as ht from .test_suites.basic_test import TestCase +envar = os.getenv("HEAT_TEST_USE_DEVICE", "cpu") +is_mps = envar == "gpu" and platform.machine() == "arm64" + +@unittest.skipIf(is_mps, "Distribution not supported on Apple MPS") class TestSplitTiles(TestCase): # most of the cases are covered by the resplit tests def test_raises(self): diff --git a/heat/core/tests/test_trigonometrics.py b/heat/core/tests/test_trigonometrics.py index 9202550589..7e09472b86 100644 --- a/heat/core/tests/test_trigonometrics.py +++ b/heat/core/tests/test_trigonometrics.py @@ -10,7 +10,7 @@ def test_arccos(self): # base elements elements = [-1.0, -0.83, -0.12, 0.0, 0.24, 0.67, 1.0] comparison = torch.tensor( - elements, dtype=torch.float64, device=self.device.torch_device + elements, dtype=torch.float32, device=self.device.torch_device ).acos() # arccos of float32 @@ -18,19 +18,20 @@ def test_arccos(self): float32_arccos = ht.acos(float32_tensor) self.assertIsInstance(float32_arccos, ht.DNDarray) self.assertEqual(float32_arccos.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_arccos.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_arccos.larray, comparison)) - # arccos of float64 - float64_tensor = ht.array(elements, dtype=ht.float64) - float64_arccos = ht.arccos(float64_tensor) - self.assertIsInstance(float64_arccos, ht.DNDarray) - self.assertEqual(float64_arccos.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_arccos.larray.double(), comparison)) + if not self.is_mps: + # arccos of float64 + float64_tensor = ht.array(elements, dtype=ht.float64) + float64_arccos = ht.arccos(float64_tensor) + self.assertIsInstance(float64_arccos, ht.DNDarray) + self.assertEqual(float64_arccos.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_arccos.larray, comparison.double())) # arccos of value out of domain nan_tensor = ht.array([1.2]) nan_arccos = ht.arccos(nan_tensor) - self.assertIsInstance(float64_arccos, ht.DNDarray) + self.assertIsInstance(nan_arccos, ht.DNDarray) self.assertEqual(nan_arccos.dtype, ht.float32) self.assertTrue(math.isnan(nan_arccos.larray.item())) @@ -43,7 +44,7 @@ def test_arccos(self): def test_acosh(self): # base elements comparison = torch.arange( - 1, 31, dtype=torch.float64, device=self.device.torch_device + 1, 31, dtype=torch.float32, device=self.device.torch_device ).acosh() # acosh of float32 @@ -51,28 +52,33 @@ def test_acosh(self): float32_acosh = ht.acosh(float32_tensor) self.assertIsInstance(float32_acosh, ht.DNDarray) self.assertEqual(float32_acosh.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_acosh.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_acosh.larray, comparison)) - # acosh of float64 - float64_tensor = ht.arange(1, 31, dtype=ht.float64) - float64_acosh = ht.acosh(float64_tensor) - self.assertIsInstance(float64_acosh, ht.DNDarray) - 
self.assertEqual(float64_acosh.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_acosh.larray.double(), comparison)) + if not self.is_mps: + # acosh of float64 + float64_tensor = ht.arange(1, 31, dtype=ht.float64) + float64_acosh = ht.acosh(float64_tensor) + self.assertIsInstance(float64_acosh, ht.DNDarray) + self.assertEqual(float64_acosh.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_acosh.larray, comparison.double())) # acosh of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(1, 31, dtype=ht.int32) int32_acosh = ht.acosh(int32_tensor) self.assertIsInstance(int32_acosh, ht.DNDarray) self.assertEqual(int32_acosh.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_acosh.larray.double(), comparison)) + self.assertTrue(torch.allclose(int32_acosh.larray, comparison)) # acosh of longs, automatic conversion to intermediate floats int64_tensor = ht.arange(1, 31, dtype=ht.int64) int64_acosh = ht.arccosh(int64_tensor) self.assertIsInstance(int64_acosh, ht.DNDarray) - self.assertEqual(int64_acosh.dtype, ht.float64) - self.assertTrue(torch.allclose(int64_acosh.larray.double(), comparison)) + if self.is_mps: + self.assertEqual(int64_acosh.dtype, ht.float32) + self.assertTrue(torch.allclose(int64_acosh.larray, comparison)) + else: + self.assertEqual(int64_acosh.dtype, ht.float64) + self.assertTrue(torch.allclose(int64_acosh.larray, comparison.double())) # check exceptions with self.assertRaises(TypeError): @@ -84,7 +90,7 @@ def test_arcsin(self): # base elements elements = [-1.0, -0.83, -0.12, 0.0, 0.24, 0.67, 1.0] comparison = torch.tensor( - elements, dtype=torch.float64, device=self.device.torch_device + elements, dtype=torch.float32, device=self.device.torch_device ).asin() # arcsin of float32 @@ -92,19 +98,20 @@ def test_arcsin(self): float32_arcsin = ht.asin(float32_tensor) self.assertIsInstance(float32_arcsin, ht.DNDarray) self.assertEqual(float32_arcsin.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_arcsin.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_arcsin.larray, comparison)) - # arcsin of float64 - float64_tensor = ht.array(elements, dtype=ht.float64) - float64_arcsin = ht.arcsin(float64_tensor) - self.assertIsInstance(float64_arcsin, ht.DNDarray) - self.assertEqual(float64_arcsin.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_arcsin.larray.double(), comparison)) + if not self.is_mps: + # arcsin of float64 + float64_tensor = ht.array(elements, dtype=ht.float64) + float64_arcsin = ht.arcsin(float64_tensor) + self.assertIsInstance(float64_arcsin, ht.DNDarray) + self.assertEqual(float64_arcsin.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_arcsin.larray, comparison.double())) # arcsin of value out of domain nan_tensor = ht.array([1.2]) nan_arcsin = ht.arcsin(nan_tensor) - self.assertIsInstance(float64_arcsin, ht.DNDarray) + self.assertIsInstance(nan_arcsin, ht.DNDarray) self.assertEqual(nan_arcsin.dtype, ht.float32) self.assertTrue(math.isnan(nan_arcsin.larray.item())) @@ -118,7 +125,7 @@ def test_asinh(self): # base elements elements = 30 comparison = torch.linspace( - -28, 30, elements, dtype=torch.float64, device=self.device.torch_device + -28, 30, elements, dtype=torch.float32, device=self.device.torch_device ).asinh() # asinh of float32 @@ -126,28 +133,33 @@ def test_asinh(self): float32_asinh = ht.asinh(float32_tensor) self.assertIsInstance(float32_asinh, ht.DNDarray) self.assertEqual(float32_asinh.dtype, ht.float32) - 
self.assertTrue(torch.allclose(float32_asinh.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_asinh.larray, comparison)) - # asinh of float64 - float64_tensor = ht.linspace(-28, 30, elements, dtype=ht.float64) - float64_asinh = ht.asinh(float64_tensor) - self.assertIsInstance(float64_asinh, ht.DNDarray) - self.assertEqual(float64_asinh.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_asinh.larray.double(), comparison)) + if not self.is_mps: + # asinh of float64 + float64_tensor = ht.linspace(-28, 30, elements, dtype=ht.float64) + float64_asinh = ht.asinh(float64_tensor) + self.assertIsInstance(float64_asinh, ht.DNDarray) + self.assertEqual(float64_asinh.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_asinh.larray, comparison.double())) # asinh of ints, automatic conversion to intermediate floats int32_tensor = ht.linspace(-28, 30, elements, dtype=ht.int32) int32_asinh = ht.asinh(int32_tensor) self.assertIsInstance(int32_asinh, ht.DNDarray) self.assertEqual(int32_asinh.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_asinh.larray.double(), comparison)) + self.assertTrue(torch.allclose(int32_asinh.larray, comparison)) # asinh of longs, automatic conversion to intermediate floats int64_tensor = ht.linspace(-28, 30, elements, dtype=ht.int64) int64_asinh = ht.arcsinh(int64_tensor) self.assertIsInstance(int64_asinh, ht.DNDarray) - self.assertEqual(int64_asinh.dtype, ht.float64) - self.assertTrue(torch.allclose(int64_asinh.larray.double(), comparison)) + if self.is_mps: + self.assertEqual(int64_asinh.dtype, ht.float32) + self.assertTrue(torch.allclose(int64_asinh.larray, comparison)) + else: + self.assertEqual(int64_asinh.dtype, ht.float64) + self.assertTrue(torch.allclose(int64_asinh.larray, comparison.double())) # check exceptions with self.assertRaises(TypeError): @@ -159,7 +171,7 @@ def test_arctan(self): # base elements elements = 30 comparison = torch.arange( - elements, dtype=torch.float64, device=self.device.torch_device + elements, dtype=torch.float32, device=self.device.torch_device ).atan() # arctan of float32 @@ -167,28 +179,33 @@ def test_arctan(self): float32_arctan = ht.arctan(float32_tensor) self.assertIsInstance(float32_arctan, ht.DNDarray) self.assertEqual(float32_arctan.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_arctan.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_arctan.larray, comparison)) - # arctan of float64 - float64_tensor = ht.arange(elements, dtype=ht.float64) - float64_arctan = ht.arctan(float64_tensor) - self.assertIsInstance(float64_arctan, ht.DNDarray) - self.assertEqual(float64_arctan.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_arctan.larray.double(), comparison)) + if not self.is_mps: + # arctan of float64 + float64_tensor = ht.arange(elements, dtype=ht.float64) + float64_arctan = ht.arctan(float64_tensor) + self.assertIsInstance(float64_arctan, ht.DNDarray) + self.assertEqual(float64_arctan.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_arctan.larray, comparison.double())) # arctan of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(elements, dtype=ht.int32) int32_arctan = ht.arctan(int32_tensor) self.assertIsInstance(int32_arctan, ht.DNDarray) self.assertEqual(int32_arctan.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_arctan.larray.double(), comparison)) + self.assertTrue(torch.allclose(int32_arctan.larray, comparison)) # arctan of longs, automatic conversion to intermediate floats
int64_tensor = ht.arange(elements, dtype=ht.int64) int64_arctan = ht.atan(int64_tensor) self.assertIsInstance(int64_arctan, ht.DNDarray) - self.assertEqual(int64_arctan.dtype, ht.float64) - self.assertTrue(torch.allclose(int64_arctan.larray.double(), comparison)) + if self.is_mps: + self.assertEqual(int64_arctan.dtype, ht.float32) + self.assertTrue(torch.allclose(int64_arctan.larray, comparison)) + else: + self.assertEqual(int64_arctan.dtype, ht.float64) + self.assertTrue(torch.allclose(int64_arctan.larray, comparison.double())) # check exceptions with self.assertRaises(TypeError): @@ -207,25 +224,25 @@ def test_arctan2(self): self.assertEqual(float32_arctan2.dtype, ht.float32) self.assertTrue(torch.allclose(float32_arctan2.larray, float32_comparison)) - float64_y = torch.randn(30, dtype=torch.float64, device=self.device.torch_device) - float64_x = torch.randn(30, dtype=torch.float64, device=self.device.torch_device) + if not self.is_mps: + float64_y = torch.randn(30, dtype=torch.float64, device=self.device.torch_device) + float64_x = torch.randn(30, dtype=torch.float64, device=self.device.torch_device) - float64_comparison = torch.atan2(float64_y, float64_x) - float64_arctan2 = ht.atan2(ht.array(float64_y), ht.array(float64_x)) + float64_comparison = torch.atan2(float64_y, float64_x) + float64_arctan2 = ht.atan2(ht.array(float64_y), ht.array(float64_x)) - self.assertIsInstance(float64_arctan2, ht.DNDarray) - self.assertEqual(float64_arctan2.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_arctan2.larray, float64_comparison)) + self.assertIsInstance(float64_arctan2, ht.DNDarray) + self.assertEqual(float64_arctan2.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_arctan2.larray, float64_comparison)) # Rare Special Case with integers - int32_x = ht.array([-1, +1, +1, -1]) - int32_y = ht.array([-1, -1, +1, +1]) - - int32_comparison = ht.array([-135.0, -45.0, 45.0, 135.0], dtype=ht.float64) + int32_x = ht.array([-1, +1, +1, -1], dtype=ht.int32) + int32_y = ht.array([-1, -1, +1, +1], dtype=ht.int32) + int32_comparison = ht.array([-135.0, -45.0, 45.0, 135.0], dtype=ht.float32) int32_arctan2 = ht.arctan2(int32_y, int32_x) * 180 / ht.pi self.assertIsInstance(int32_arctan2, ht.DNDarray) - self.assertEqual(int32_arctan2.dtype, ht.float64) + self.assertEqual(int32_arctan2.dtype, ht.float32) self.assertTrue(ht.allclose(int32_arctan2, int32_comparison)) int16_x = ht.array([-1, +1, +1, -1], dtype=ht.int16) @@ -242,7 +259,7 @@ def test_atanh(self): # base elements elements = [-1.0, -0.83, -0.12, 0.0, 0.24, 0.67, 1.0] comparison = torch.tensor( - elements, dtype=torch.float64, device=self.device.torch_device + elements, dtype=torch.float32, device=self.device.torch_device ).atanh() # atanh of float32 @@ -250,19 +267,20 @@ def test_atanh(self): float32_atanh = ht.atanh(float32_tensor) self.assertIsInstance(float32_atanh, ht.DNDarray) self.assertEqual(float32_atanh.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_atanh.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_atanh.larray, comparison)) - # atanh of float64 - float64_tensor = ht.array(elements, dtype=ht.float64) - float64_atanh = ht.atanh(float64_tensor) - self.assertIsInstance(float64_atanh, ht.DNDarray) - self.assertEqual(float64_atanh.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_atanh.larray.double(), comparison)) + if not self.is_mps: + # atanh of float64 + float64_tensor = ht.array(elements, dtype=ht.float64) + float64_atanh = ht.atanh(float64_tensor) + 
self.assertIsInstance(float64_atanh, ht.DNDarray) + self.assertEqual(float64_atanh.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_atanh.larray, comparison.double())) # atanh of value out of domain nan_tensor = ht.array([1.2]) nan_atanh = ht.arctanh(nan_tensor) - self.assertIsInstance(float64_atanh, ht.DNDarray) + self.assertIsInstance(nan_atanh, ht.DNDarray) self.assertEqual(nan_atanh.dtype, ht.float32) self.assertTrue(math.isnan(nan_atanh.larray.item())) @@ -277,7 +295,7 @@ def test_degrees(self): elements = [0.0, 0.2, 0.6, 0.9, 1.2, 2.7, 3.14] comparison = ( 180.0 - * torch.tensor(elements, dtype=torch.float64, device=self.device.torch_device) + * torch.tensor(elements, dtype=torch.float32, device=self.device.torch_device) / 3.141592653589793 ) @@ -286,14 +304,15 @@ def test_degrees(self): float32_degrees = ht.degrees(float32_tensor) self.assertIsInstance(float32_degrees, ht.DNDarray) self.assertEqual(float32_degrees.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_degrees.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_degrees.larray, comparison)) - # degrees with float64 - float64_tensor = ht.array(elements, dtype=ht.float64) - float64_degrees = ht.degrees(float64_tensor) - self.assertIsInstance(float64_degrees, ht.DNDarray) - self.assertEqual(float64_degrees.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_degrees.larray.double(), comparison)) + if not self.is_mps: + # degrees with float64 + float64_tensor = ht.array(elements, dtype=ht.float64) + float64_degrees = ht.degrees(float64_tensor) + self.assertIsInstance(float64_degrees, ht.DNDarray) + self.assertEqual(float64_degrees.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_degrees.larray, comparison.double())) # check exceptions with self.assertRaises(TypeError): @@ -306,7 +325,7 @@ def test_deg2rad(self): elements = [0.0, 20.0, 45.0, 78.0, 94.0, 120.0, 180.0, 270.0, 311.0] comparison = ( 3.141592653589793 - * torch.tensor(elements, dtype=torch.float64, device=self.device.torch_device) + * torch.tensor(elements, dtype=torch.float32, device=self.device.torch_device) / 180.0 ) @@ -315,14 +334,15 @@ def test_deg2rad(self): float32_deg2rad = ht.deg2rad(float32_tensor) self.assertIsInstance(float32_deg2rad, ht.DNDarray) self.assertEqual(float32_deg2rad.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_deg2rad.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_deg2rad.larray, comparison)) - # deg2rad with float64 - float64_tensor = ht.array(elements, dtype=ht.float64) - float64_deg2rad = ht.deg2rad(float64_tensor) - self.assertIsInstance(float64_deg2rad, ht.DNDarray) - self.assertEqual(float64_deg2rad.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_deg2rad.larray.double(), comparison)) + if not self.is_mps: + # deg2rad with float64 + float64_tensor = ht.array(elements, dtype=ht.float64) + float64_deg2rad = ht.deg2rad(float64_tensor) + self.assertIsInstance(float64_deg2rad, ht.DNDarray) + self.assertEqual(float64_deg2rad.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_deg2rad.larray, comparison.double())) # check exceptions with self.assertRaises(TypeError): @@ -334,7 +354,7 @@ def test_cos(self): # base elements elements = 30 comparison = torch.arange( - elements, dtype=torch.float64, device=self.device.torch_device + elements, dtype=torch.float32, device=self.device.torch_device ).cos() # cosine of float32 @@ -342,28 +362,33 @@ def test_cos(self): float32_cos = ht.cos(float32_tensor) 
self.assertIsInstance(float32_cos, ht.DNDarray) self.assertEqual(float32_cos.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_cos.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_cos.larray, comparison)) - # cosine of float64 - float64_tensor = ht.arange(elements, dtype=ht.float64) - float64_cos = ht.cos(float64_tensor) - self.assertIsInstance(float64_cos, ht.DNDarray) - self.assertEqual(float64_cos.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_cos.larray.double(), comparison)) + if not self.is_mps: + # cosine of float64 + float64_tensor = ht.arange(elements, dtype=ht.float64) + float64_cos = ht.cos(float64_tensor) + self.assertIsInstance(float64_cos, ht.DNDarray) + self.assertEqual(float64_cos.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_cos.larray, comparison.double())) # cosine of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(elements, dtype=ht.int32) int32_cos = ht.cos(int32_tensor) self.assertIsInstance(int32_cos, ht.DNDarray) self.assertEqual(int32_cos.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_cos.larray.double(), comparison)) + self.assertTrue(torch.allclose(int32_cos.larray, comparison)) # cosine of longs, automatic conversion to intermediate floats int64_tensor = ht.arange(elements, dtype=ht.int64) int64_cos = int64_tensor.cos() self.assertIsInstance(int64_cos, ht.DNDarray) - self.assertEqual(int64_cos.dtype, ht.float64) - self.assertTrue(torch.allclose(int64_cos.larray.double(), comparison)) + if self.is_mps: + self.assertEqual(int64_cos.dtype, ht.float32) + self.assertTrue(torch.allclose(int64_cos.larray, comparison)) + else: + self.assertEqual(int64_cos.dtype, ht.float64) + self.assertTrue(torch.allclose(int64_cos.larray, comparison.double())) # check exceptions with self.assertRaises(TypeError): @@ -375,7 +400,7 @@ def test_cosh(self): # base elements elements = 30 comparison = torch.arange( - elements, dtype=torch.float64, device=self.device.torch_device + elements, dtype=torch.float32, device=self.device.torch_device ).cosh() # hyperbolic cosine of float32 @@ -383,28 +408,33 @@ def test_cosh(self): float32_cosh = float32_tensor.cosh() self.assertIsInstance(float32_cosh, ht.DNDarray) self.assertEqual(float32_cosh.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_cosh.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_cosh.larray, comparison)) - # hyperbolic cosine of float64 - float64_tensor = ht.arange(elements, dtype=ht.float64) - float64_cosh = ht.cosh(float64_tensor) - self.assertIsInstance(float64_cosh, ht.DNDarray) - self.assertEqual(float64_cosh.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_cosh.larray.double(), comparison)) + if not self.is_mps: + # hyperbolic cosine of float64 + float64_tensor = ht.arange(elements, dtype=ht.float64) + float64_cosh = ht.cosh(float64_tensor) + self.assertIsInstance(float64_cosh, ht.DNDarray) + self.assertEqual(float64_cosh.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_cosh.larray, comparison.double())) # hyperbolic cosine of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(elements, dtype=ht.int32) int32_cosh = ht.cosh(int32_tensor) self.assertIsInstance(int32_cosh, ht.DNDarray) self.assertEqual(int32_cosh.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_cosh.larray.double(), comparison)) + self.assertTrue(torch.allclose(int32_cosh.larray, comparison)) # hyperbolic cosine of longs,
int64_tensor = ht.arange(elements, dtype=ht.int64) int64_cosh = ht.cosh(int64_tensor) self.assertIsInstance(int64_cosh, ht.DNDarray) - self.assertEqual(int64_cosh.dtype, ht.float64) - self.assertTrue(torch.allclose(int64_cosh.larray.double(), comparison)) + if self.is_mps: + self.assertEqual(int64_cosh.dtype, ht.float32) + self.assertTrue(torch.allclose(int64_cosh.larray, comparison)) + else: + self.assertEqual(int64_cosh.dtype, ht.float64) + self.assertTrue(torch.allclose(int64_cosh.larray, comparison.double())) # check exceptions with self.assertRaises(TypeError): @@ -417,7 +447,7 @@ def test_rad2deg(self): elements = [0.0, 0.2, 0.6, 0.9, 1.2, 2.7, 3.14] comparison = ( 180.0 - * torch.tensor(elements, dtype=torch.float64, device=self.device.torch_device) + * torch.tensor(elements, dtype=torch.float32, device=self.device.torch_device) / 3.141592653589793 ) @@ -426,14 +456,15 @@ def test_rad2deg(self): float32_rad2deg = ht.rad2deg(float32_tensor) self.assertIsInstance(float32_rad2deg, ht.DNDarray) self.assertEqual(float32_rad2deg.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_rad2deg.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_rad2deg.larray, comparison)) - # rad2deg with float64 - float64_tensor = ht.array(elements, dtype=ht.float64) - float64_rad2deg = ht.rad2deg(float64_tensor) - self.assertIsInstance(float64_rad2deg, ht.DNDarray) - self.assertEqual(float64_rad2deg.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_rad2deg.larray.double(), comparison)) + if not self.is_mps: + # rad2deg with float64 + float64_tensor = ht.array(elements, dtype=ht.float64) + float64_rad2deg = ht.rad2deg(float64_tensor) + self.assertIsInstance(float64_rad2deg, ht.DNDarray) + self.assertEqual(float64_rad2deg.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_rad2deg.larray, comparison.double())) # check exceptions with self.assertRaises(TypeError): @@ -446,7 +477,7 @@ def test_radians(self): elements = [0.0, 20.0, 45.0, 78.0, 94.0, 120.0, 180.0, 270.0, 311.0] comparison = ( 3.141592653589793 - * torch.tensor(elements, dtype=torch.float64, device=self.device.torch_device) + * torch.tensor(elements, dtype=torch.float32, device=self.device.torch_device) / 180.0 ) @@ -455,14 +486,15 @@ def test_radians(self): float32_radians = ht.radians(float32_tensor) self.assertIsInstance(float32_radians, ht.DNDarray) self.assertEqual(float32_radians.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_radians.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_radians.larray, comparison)) - # radians with float64 - float64_tensor = ht.array(elements, dtype=ht.float64) - float64_radians = ht.radians(float64_tensor) - self.assertIsInstance(float64_radians, ht.DNDarray) - self.assertEqual(float64_radians.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_radians.larray.double(), comparison)) + if not self.is_mps: + # radians with float64 + float64_tensor = ht.array(elements, dtype=ht.float64) + float64_radians = ht.radians(float64_tensor) + self.assertIsInstance(float64_radians, ht.DNDarray) + self.assertEqual(float64_radians.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_radians.larray, comparison.double())) # check exceptions with self.assertRaises(TypeError): @@ -474,7 +506,7 @@ def test_sin(self): # base elements elements = 30 comparison = torch.arange( - elements, dtype=torch.float64, device=self.device.torch_device + elements, dtype=torch.float32, device=self.device.torch_device ).sin() # sine of float32 @@ -482,28 
+514,33 @@ def test_sin(self): float32_sin = float32_tensor.sin() self.assertIsInstance(float32_sin, ht.DNDarray) self.assertEqual(float32_sin.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_sin.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_sin.larray, comparison)) - # sine of float64 - float64_tensor = ht.arange(elements, dtype=ht.float64) - float64_sin = ht.sin(float64_tensor) - self.assertIsInstance(float64_sin, ht.DNDarray) - self.assertEqual(float64_sin.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_sin.larray.double(), comparison)) + if not self.is_mps: + # sine of float64 + float64_tensor = ht.arange(elements, dtype=ht.float64) + float64_sin = ht.sin(float64_tensor) + self.assertIsInstance(float64_sin, ht.DNDarray) + self.assertEqual(float64_sin.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_sin.larray, comparison.double())) # sine of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(elements, dtype=ht.int32) int32_sin = ht.sin(int32_tensor) self.assertIsInstance(int32_sin, ht.DNDarray) self.assertEqual(int32_sin.dtype, ht.float32) - self.assertTrue(torch.allclose(int32_sin.larray.double(), comparison)) + self.assertTrue(torch.allclose(int32_sin.larray, comparison)) # sine of longs, automatic conversion to intermediate floats int64_tensor = ht.arange(elements, dtype=ht.int64) int64_sin = ht.sin(int64_tensor) self.assertIsInstance(int64_sin, ht.DNDarray) - self.assertEqual(int64_sin.dtype, ht.float64) - self.assertTrue(torch.allclose(int64_sin.larray.double(), comparison)) + if self.is_mps: + self.assertEqual(int64_sin.dtype, ht.float32) + self.assertTrue(torch.allclose(int64_sin.larray, comparison)) + else: + self.assertEqual(int64_sin.dtype, ht.float64) + self.assertTrue(torch.allclose(int64_sin.larray, comparison.double())) # check exceptions with self.assertRaises(TypeError): @@ -515,7 +552,7 @@ def test_sinh(self): # base elements elements = 30 comparison = torch.arange( - elements, dtype=torch.float64, device=self.device.torch_device + elements, dtype=torch.float32, device=self.device.torch_device ).sinh() # hyperbolic sine of float32 @@ -523,28 +560,33 @@ def test_sinh(self): float32_sinh = float32_tensor.sinh() self.assertIsInstance(float32_sinh, ht.DNDarray) self.assertEqual(float32_sinh.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_sinh.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_sinh.larray, comparison)) - # hyperbolic sine of float64 - float64_tensor = ht.arange(elements, dtype=ht.float64) - float64_sinh = ht.sinh(float64_tensor) - self.assertIsInstance(float64_sinh, ht.DNDarray) - self.assertEqual(float64_sinh.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_sinh.larray.double(), comparison)) + if not self.is_mps: + # hyperbolic sine of float64 + float64_tensor = ht.arange(elements, dtype=ht.float64) + float64_sinh = ht.sinh(float64_tensor) + self.assertIsInstance(float64_sinh, ht.DNDarray) + self.assertEqual(float64_sinh.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_sinh.larray, comparison.double())) # hyperbolic sine of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(elements, dtype=ht.int32) int32_sinh = ht.sinh(int32_tensor) self.assertIsInstance(int32_sinh, ht.DNDarray) self.assertEqual(int32_sinh.dtype, ht.float32) - self.assertTrue(torch.allclose(int32_sinh.larray.double(), comparison)) + self.assertTrue(torch.allclose(int32_sinh.larray, comparison)) # hyperbolic sine of longs, 
automatic conversion to intermediate floats int64_tensor = ht.arange(elements, dtype=ht.int64) int64_sinh = ht.sinh(int64_tensor) self.assertIsInstance(int64_sinh, ht.DNDarray) - self.assertEqual(int64_sinh.dtype, ht.float64) - self.assertTrue(torch.allclose(int64_sinh.larray.double(), comparison)) + if self.is_mps: + self.assertEqual(int64_sinh.dtype, ht.float32) + self.assertTrue(torch.allclose(int64_sinh.larray, comparison)) + else: + self.assertEqual(int64_sinh.dtype, ht.float64) + self.assertTrue(torch.allclose(int64_sinh.larray, comparison.double())) # check exceptions with self.assertRaises(TypeError): @@ -556,7 +598,7 @@ def test_tan(self): # base elements elements = 30 comparison = torch.arange( - elements, dtype=torch.float64, device=self.device.torch_device + elements, dtype=torch.float32, device=self.device.torch_device ).tan() # tangent of float32 @@ -564,28 +606,33 @@ def test_tan(self): float32_tan = float32_tensor.tan() self.assertIsInstance(float32_tan, ht.DNDarray) self.assertEqual(float32_tan.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_tan.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_tan.larray, comparison)) - # tangent of float64 - float64_tensor = ht.arange(elements, dtype=ht.float64) - float64_tan = ht.tan(float64_tensor) - self.assertIsInstance(float64_tan, ht.DNDarray) - self.assertEqual(float64_tan.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_tan.larray.double(), comparison)) + if not self.is_mps: + # tangent of float64 + float64_tensor = ht.arange(elements, dtype=ht.float64) + float64_tan = ht.tan(float64_tensor) + self.assertIsInstance(float64_tan, ht.DNDarray) + self.assertEqual(float64_tan.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_tan.larray, comparison.double())) # tangent of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(elements, dtype=ht.int32) int32_tan = ht.tan(int32_tensor) self.assertIsInstance(int32_tan, ht.DNDarray) self.assertEqual(int32_tan.dtype, ht.float32) - self.assertTrue(torch.allclose(int32_tan.larray.double(), comparison)) + self.assertTrue(torch.allclose(int32_tan.larray, comparison)) # tangent of longs, automatic conversion to intermediate floats int64_tensor = ht.arange(elements, dtype=ht.int64) int64_tan = ht.tan(int64_tensor) self.assertIsInstance(int64_tan, ht.DNDarray) - self.assertEqual(int64_tan.dtype, ht.float64) - self.assertTrue(torch.allclose(int64_tan.larray.double(), comparison)) + if self.is_mps: + self.assertEqual(int64_tan.dtype, ht.float32) + self.assertTrue(torch.allclose(int64_tan.larray, comparison)) + else: + self.assertEqual(int64_tan.dtype, ht.float64) + self.assertTrue(torch.allclose(int64_tan.larray, comparison.double())) # check exceptions with self.assertRaises(TypeError): @@ -597,7 +644,7 @@ def test_tanh(self): # base elements elements = 30 comparison = torch.arange( - elements, dtype=torch.float64, device=self.device.torch_device + elements, dtype=torch.float32, device=self.device.torch_device ).tanh() # hyperbolic tangent of float32 @@ -605,28 +652,33 @@ def test_tanh(self): float32_tanh = float32_tensor.tanh() self.assertIsInstance(float32_tanh, ht.DNDarray) self.assertEqual(float32_tanh.dtype, ht.float32) - self.assertTrue(torch.allclose(float32_tanh.larray.double(), comparison)) + self.assertTrue(torch.allclose(float32_tanh.larray, comparison)) - # hyperbolic tangent of float64 - float64_tensor = ht.arange(elements, dtype=ht.float64) - float64_tanh = ht.tanh(float64_tensor) - 
self.assertIsInstance(float64_tanh, ht.DNDarray) - self.assertEqual(float64_tanh.dtype, ht.float64) - self.assertTrue(torch.allclose(float64_tanh.larray.double(), comparison)) + if not self.is_mps: + # hyperbolic tangent of float64 + float64_tensor = ht.arange(elements, dtype=ht.float64) + float64_tanh = ht.tanh(float64_tensor) + self.assertIsInstance(float64_tanh, ht.DNDarray) + self.assertEqual(float64_tanh.dtype, ht.float64) + self.assertTrue(torch.allclose(float64_tanh.larray, comparison.double())) # hyperbolic tangent of ints, automatic conversion to intermediate floats int32_tensor = ht.arange(elements, dtype=ht.int32) int32_tanh = ht.tanh(int32_tensor) self.assertIsInstance(int32_tanh, ht.DNDarray) self.assertEqual(int32_tanh.dtype, ht.float32) - self.assertTrue(torch.allclose(int32_tanh.larray.double(), comparison)) + self.assertTrue(torch.allclose(int32_tanh.larray, comparison)) # hyperbolic tangent of longs, automatic conversion to intermediate floats int64_tensor = ht.arange(elements, dtype=ht.int64) int64_tanh = ht.tanh(int64_tensor) self.assertIsInstance(int64_tanh, ht.DNDarray) - self.assertEqual(int64_tanh.dtype, ht.float64) - self.assertTrue(torch.allclose(int64_tanh.larray.double(), comparison)) + if self.is_mps: + self.assertEqual(int64_tanh.dtype, ht.float32) + self.assertTrue(torch.allclose(int64_tanh.larray, comparison)) + else: + self.assertEqual(int64_tanh.dtype, ht.float64) + self.assertTrue(torch.allclose(int64_tanh.larray, comparison.double())) # check exceptions with self.assertRaises(TypeError): diff --git a/heat/core/tests/test_types.py b/heat/core/tests/test_types.py index 6aa765a070..42e0124ef2 100644 --- a/heat/core/tests/test_types.py +++ b/heat/core/tests/test_types.py @@ -23,7 +23,9 @@ def assert_is_instantiable_heat_type(self, heat_type, torch_type): no_value = heat_type() self.assertIsInstance(no_value, ht.DNDarray) self.assertEqual(no_value.shape, (1,)) - self.assertEqual((no_value.larray == 0).all().item(), 1) + if not self.is_mps and not ht.types.heat_type_is_complexfloating(heat_type): + # equal unstable on MPS and complex types + self.assertEqual((no_value.larray == 0).all().item(), 1) self.assertEqual(no_value.larray.dtype, torch_type) # check a type constructor with a complex value @@ -31,15 +33,17 @@ def assert_is_instantiable_heat_type(self, heat_type, torch_type): elaborate_value = heat_type(ground_truth) self.assertIsInstance(elaborate_value, ht.DNDarray) self.assertEqual(elaborate_value.shape, (2, 3)) - self.assertEqual( - ( - elaborate_value.larray - == torch.tensor(ground_truth, dtype=torch_type, device=self.device.torch_device) + if not self.is_mps and not ht.types.heat_type_is_complexfloating(heat_type): + # equal unstable on MPS and complex types + self.assertEqual( + ( + elaborate_value.larray + == torch.tensor(ground_truth, dtype=torch_type, device=self.device.torch_device) + ) + .all() + .item(), + 1, ) - .all() - .item(), - 1, - ) self.assertEqual(elaborate_value.larray.dtype, torch_type) # check exception when there is more than one parameter @@ -94,8 +98,9 @@ def test_float32(self): self.assert_is_instantiable_heat_type(ht.float_, torch.float32) def test_float64(self): - self.assert_is_instantiable_heat_type(ht.float64, torch.float64) - self.assert_is_instantiable_heat_type(ht.double, torch.float64) + if not self.is_mps: + self.assert_is_instantiable_heat_type(ht.float64, torch.float64) + self.assert_is_instantiable_heat_type(ht.double, torch.float64) def test_flexible(self): self.assert_non_instantiable_heat_type(ht.flexible) @@ 
-108,10 +113,11 @@ def test_complex64(self): self.assertEqual(ht.complex64.char(), "c8") def test_complex128(self): - self.assert_is_instantiable_heat_type(ht.complex128, torch.complex128) - self.assert_is_instantiable_heat_type(ht.cdouble, torch.complex128) + if not self.is_mps: + self.assert_is_instantiable_heat_type(ht.complex128, torch.complex128) + self.assert_is_instantiable_heat_type(ht.cdouble, torch.complex128) - self.assertEqual(ht.complex128.char(), "c16") + self.assertEqual(ht.complex128.char(), "c16") def test_iscomplex(self): a = ht.array([1, 1.2, 1 + 1j, 1 + 0j]) @@ -336,19 +342,20 @@ def test_result_type(self): self.assertEqual(ht.result_type(1.0, ht.array(1, dtype=ht.int32)), ht.float32) self.assertEqual(ht.result_type(ht.uint8, ht.int8), ht.int16) self.assertEqual(ht.result_type("b", "f4"), ht.float32) - self.assertEqual(ht.result_type(ht.array([1], dtype=ht.float64), "f4"), ht.float64) - self.assertEqual( - ht.result_type( - ht.array([1, 2, 3, 4], dtype=ht.float64, split=0), - 1, - ht.bool, - "u", - torch.uint8, - np.complex128, - ht.array(1, dtype=ht.int64), - ), - ht.complex128, - ) + if not self.is_mps: + self.assertEqual(ht.result_type(ht.array([1], dtype=ht.float64), "f4"), ht.float64) + self.assertEqual( + ht.result_type( + ht.array([1, 2, 3, 4], dtype=ht.float64, split=0), + 1, + ht.bool, + "u", + torch.uint8, + np.complex128, + ht.array(1, dtype=ht.int64), + ), + ht.complex128, + ) self.assertEqual( ht.result_type(np.array([1, 2, 3]), np.dtype("int32"), torch.tensor([1, 2, 3])), ht.int64, diff --git a/heat/core/tests/test_vmap.py b/heat/core/tests/test_vmap.py index 8fd1f4734d..0f7ba62d2e 100644 --- a/heat/core/tests/test_vmap.py +++ b/heat/core/tests/test_vmap.py @@ -1,5 +1,6 @@ import heat as ht import torch +import os from .test_suites.basic_test import TestCase @@ -79,51 +80,45 @@ def func(x0, m=1, scale=2): vfunc_torch = torch.vmap(func, (0,), (0,)) y0_torch = vfunc_torch(x0_torch, m=2, scale=3) - print(y0.resplit(None).larray, y0_torch) - self.assertTrue(torch.allclose(y0.resplit(None).larray, y0_torch)) def test_vmap_with_chunks(self): - # same as before but now with prescribed chunk sizes for the vmap - x0 = ht.random.randn(5 * ht.MPI_WORLD.size, 10, 10, split=0) - x1 = ht.random.randn(10, 5 * ht.MPI_WORLD.size, split=1) - out_dims = (0, 0) - - def func(x0, x1, k=2, scale=1e-2): - return torch.topk(torch.linalg.svdvals(x0), k)[0] ** 2, scale * x0 @ x1 - - vfunc = ht.vmap(func, out_dims, chunk_size=2) - y0, y1 = vfunc(x0, x1, k=2, scale=-2.2) - - # compare with torch - x0_torch = x0.resplit(None).larray - x1_torch = x1.resplit(None).larray - vfunc_torch = torch.vmap(func, (0, 1), (0, 0)) - y0_torch, y1_torch = vfunc_torch(x0_torch, x1_torch, k=2, scale=-2.2) - - self.assertTrue(torch.allclose(y0.resplit(None).larray, y0_torch)) - self.assertTrue(torch.allclose(y1.resplit(None).larray, y1_torch)) - - # two inputs (only one of them split), two outputs, including keyword arguments that are not vmapped - # output split along different axis - x0 = ht.random.randn(5 * ht.MPI_WORLD.size, 10, 10, split=0) - x1 = ht.random.randn(10, 5 * ht.MPI_WORLD.size, split=None) - out_dims = (0, 1) - - def func(x0, x1, k=2, scale=1e-2): - return torch.topk(torch.linalg.svdvals(x0), k)[0] ** 2, scale * x0 @ x1 - - vfunc = ht.vmap(func, out_dims, chunk_size=1) - y0, y1 = vfunc(x0, x1, k=5, scale=2.2) - - # compare with torch - x0_torch = x0.resplit(None).larray - x1_torch = x1.resplit(None).larray - vfunc_torch = torch.vmap(func, (0, None), (0, 1)) - y0_torch, y1_torch = 
vfunc_torch(x0_torch, x1_torch, k=5, scale=2.2) - - self.assertTrue(torch.allclose(y0.resplit(None).larray, y0_torch)) - self.assertTrue(torch.allclose(y1.resplit(None).larray, y1_torch)) + x1_splits = [None, 1] + chunk_sizes = list(range(1, 5)) + dtypes = [ht.float32, ht.float64] + for x1_split in x1_splits: + for cs in chunk_sizes: + for dtype in dtypes: + with self.subTest(x1_split=x1_split, chunk_size=cs, dtype=dtype): + # same as before but now with prescribed chunk sizes for the vmap + x0 = ht.random.randn( + 5 * ht.MPI_WORLD.size, 10, 10, split=0, dtype=dtype + ) + x1 = ht.random.randn( + 10, 5 * ht.MPI_WORLD.size, split=x1_split, dtype=dtype + ) + out_dims = (0, 0) + + def func(x0, x1, k=2, scale=1e-2): + return ( + torch.topk(torch.linalg.svdvals(x0), k)[0] ** 2, + scale * x0 @ x1, + ) + + vfunc = ht.vmap(func, out_dims, chunk_size=cs) + y0, y1 = vfunc(x0, x1, k=2, scale=-2.2) + + # compare with torch + x0_torch = x0.resplit(None).larray + x1_torch = x1.resplit(None).larray + vfunc_torch = torch.vmap(func, (0, x1_split), out_dims) + y0_torch, y1_torch = vfunc_torch(x0_torch, x1_torch, k=2, scale=-2.2) + + self.assertTrue(torch.allclose(y0.resplit(None).larray, y0_torch)) + tol = 1e-12 if dtype == ht.float64 else 1e-4 + self.assertTrue( + torch.allclose(y1.resplit(None).larray, y1_torch, atol=tol) + ) def test_vmap_catch_errors(self): # not a callable diff --git a/heat/core/tiling.py b/heat/core/tiling.py index 1418ae8245..aa1294497f 100644 --- a/heat/core/tiling.py +++ b/heat/core/tiling.py @@ -39,7 +39,13 @@ class SplitTiles: Examples -------- - >>> a = ht.zeros((10, 11,), split=None) + >>> a = ht.zeros( + ... ( + ... 10, + ... 11, + ... ), + ... split=None, + ... ) >>> a.create_split_tiles() >>> print(a.tiles.tile_ends_g) [0/2] tensor([[ 4, 7, 10], @@ -190,7 +196,9 @@ def __getitem__(self, key: Union[int, slice, Tuple[Union[int, slice], ...]]) -> Examples -------- - >>> test = torch.arange(np.prod([i + 6 for i in range(2)])).reshape([i + 6 for i in range(2)]) + >>> test = torch.arange(np.prod([i + 6 for i in range(2)])).reshape( + ... [i + 6 for i in range(2)] + ... ) >>> a = ht.array(test, split=0).larray [0/2] tensor([[ 0., 1., 2., 3., 4., 5., 6.], [0/2] [ 7., 8., 9., 10., 11., 12., 13.]]) @@ -387,7 +395,7 @@ class SquareDiagTiles: Default: 2 Attributes - ----------- + ---------- __col_per_proc_list : List List with one entry per process; each element gives the number of tile columns on the process whose rank equals the index @@ -408,7 +416,7 @@ class SquareDiagTiles: The generation of these tiles may unbalance the original ``DNDarray``! Notes - ----------- + ----- This tiling scheme is intended for use with the :func:`~heat.core.linalg.qr.qr` function. """ @@ -509,7 +517,6 @@ def __init__(self, arr: DNDarray, tiles_per_proc: int = 2) -> None: # noqa: D10 # if arr.split == 1: # adjust the 0th dim to be the cumsum row_inds = [0] + row_inds[:-1] row_inds = torch.tensor(row_inds, device=arr.larray.device).cumsum(dim=0) - for num, c in enumerate(col_inds): # set columns tile_map[:, num, 1] = c for num, r in enumerate(row_inds): # set rows @@ -1012,7 +1019,9 @@ def local_set( >>> a = ht.zeros((11, 10), split=0) >>> a_tiles = tiling.SquareDiagTiles(a, tiles_per_proc=2) # type: tiling.SquareDiagTiles >>> local = a_tiles.local_get(key=slice(None)) - >>> a_tiles.local_set(key=slice(None), value=torch.arange(local.numel()).reshape(local.shape)) + >>> a_tiles.local_set( + ... key=slice(None), value=torch.arange(local.numel()).reshape(local.shape) + ...
) >>> print(a.larray) [0/1] tensor([[ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.], [0/1] [10., 11., 12., 13., 14., 15., 16., 17., 18., 19.], diff --git a/heat/core/trigonometrics.py b/heat/core/trigonometrics.py index 63926127a2..4ffa3825bb 100644 --- a/heat/core/trigonometrics.py +++ b/heat/core/trigonometrics.py @@ -59,7 +59,7 @@ def arccos(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: Examples -------- - >>> ht.arccos(ht.array([-1.,-0., 0.83])) + >>> ht.arccos(ht.array([-1.0, -0.0, 0.83])) DNDarray([3.1416, 1.5708, 0.5917], dtype=ht.float32, device=cpu:0, split=None) """ return local_op(torch.acos, x, out) @@ -91,7 +91,7 @@ def acosh(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: Examples -------- - >>> ht.acosh(ht.array([1., 10., 20.])) + >>> ht.acosh(ht.array([1.0, 10.0, 20.0])) DNDarray([0.0000, 2.9932, 3.6883], dtype=ht.float32, device=cpu:0, split=None) """ return local_op(torch.acosh, x, out) @@ -117,7 +117,7 @@ def arcsin(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: Examples -------- - >>> ht.arcsin(ht.array([-1.,-0., 0.83])) + >>> ht.arcsin(ht.array([-1.0, -0.0, 0.83])) DNDarray([-1.5708, -0.0000, 0.9791], dtype=ht.float32, device=cpu:0, split=None) """ return local_op(torch.asin, x, out) @@ -149,7 +149,7 @@ def asinh(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: Examples -------- - >>> ht.asinh(ht.array([-10., 0., 10.])) + >>> ht.asinh(ht.array([-10.0, 0.0, 10.0])) DNDarray([-2.9982, 0.0000, 2.9982], dtype=ht.float32, device=cpu:0, split=None) """ return local_op(torch.asinh, x, out) @@ -211,10 +211,6 @@ def arctan2(x1: DNDarray, x2: DNDarray) -> DNDarray: >>> ht.arctan2(y, x) * 180 / ht.pi DNDarray([-135.0000, -45.0000, 45.0000, 135.0000], dtype=ht.float64, device=cpu:0, split=None) """ - # Cast integer to float because torch.atan2() only supports integer types on PyTorch 1.5.0. 
- x1 = x1.astype(types.promote_types(x1.dtype, types.float)) - x2 = x2.astype(types.promote_types(x2.dtype, types.float)) - return binary_op(torch.atan2, x1, x2) @@ -243,7 +239,7 @@ def atanh(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: Examples -------- - >>> ht.atanh(ht.array([-1.,-0., 0.83])) + >>> ht.atanh(ht.array([-1.0, -0.0, 0.83])) DNDarray([ -inf, -0.0000, 1.1881], dtype=ht.float32, device=cpu:0, split=None) """ return local_op(torch.atanh, x, out) @@ -321,7 +317,7 @@ def deg2rad(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: Examples -------- - >>> ht.deg2rad(ht.array([0.,20.,45.,78.,94.,120.,180., 270., 311.])) + >>> ht.deg2rad(ht.array([0.0, 20.0, 45.0, 78.0, 94.0, 120.0, 180.0, 270.0, 311.0])) DNDarray([0.0000, 0.3491, 0.7854, 1.3614, 1.6406, 2.0944, 3.1416, 4.7124, 5.4280], dtype=ht.float32, device=cpu:0, split=None) """ return local_op(torch.deg2rad, x, out) @@ -341,7 +337,7 @@ def degrees(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: Examples -------- - >>> ht.degrees(ht.array([0.,0.2,0.6,0.9,1.2,2.7,3.14])) + >>> ht.degrees(ht.array([0.0, 0.2, 0.6, 0.9, 1.2, 2.7, 3.14])) DNDarray([ 0.0000, 11.4592, 34.3775, 51.5662, 68.7549, 154.6986, 179.9088], dtype=ht.float32, device=cpu:0, split=None) """ return rad2deg(x, out=out) @@ -361,7 +357,7 @@ def rad2deg(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: Examples -------- - >>> ht.rad2deg(ht.array([0.,0.2,0.6,0.9,1.2,2.7,3.14])) + >>> ht.rad2deg(ht.array([0.0, 0.2, 0.6, 0.9, 1.2, 2.7, 3.14])) DNDarray([ 0.0000, 11.4592, 34.3775, 51.5662, 68.7549, 154.6986, 179.9088], dtype=ht.float32, device=cpu:0, split=None) """ return local_op(torch.rad2deg, x, out=out) @@ -381,7 +377,7 @@ def radians(x: DNDarray, out: Optional[DNDarray] = None) -> DNDarray: Examples -------- - >>> ht.radians(ht.array([0., 20., 45., 78., 94., 120., 180., 270., 311.])) + >>> ht.radians(ht.array([0.0, 20.0, 45.0, 78.0, 94.0, 120.0, 180.0, 270.0, 311.0])) DNDarray([0.0000, 0.3491, 0.7854, 1.3614, 1.6406, 2.0944, 3.1416, 4.7124, 5.4280], dtype=ht.float32, device=cpu:0, split=None) """ return deg2rad(x, out) diff --git a/heat/core/types.py b/heat/core/types.py index 6bb8e0272c..6858b206ee 100644 --- a/heat/core/types.py +++ b/heat/core/types.py @@ -46,6 +46,8 @@ "canonical_heat_type", "heat_type_is_exact", "heat_type_is_inexact", + "heat_type_is_realfloating", + "heat_type_is_complexfloating", "iscomplex", "isreal", "issubdtype", @@ -502,7 +504,7 @@ def canonical_heat_type(a_type: Union[str, Type[datatype], Any]) -> Type[datatyp In the three former cases the according mapped type is looked up, in the latter the type is simply returned. Raises - ------- + ------ TypeError If the type cannot be converted. """ @@ -547,9 +549,26 @@ def heat_type_is_inexact(ht_dtype: Type[datatype]) -> bool: return ht_dtype in _inexact +def heat_type_is_realfloating(ht_dtype: Type[datatype]) -> bool: + """ + Check if Heat type is a real floating point number, i.e float32 or float64 + + Parameters + ---------- + ht_dtype: Type[datatype] + Heat type to check + + Returns + ------- + out: bool + True if ht_dtype is a real float, False otherwise + """ + return ht_dtype in (float32, float64) + + def heat_type_is_complexfloating(ht_dtype: Type[datatype]) -> bool: """ - Check if HeAT type is a complex floating point number, i.e complex64 + Check if Heat type is a complex floating point number, i.e complex64 Parameters ---------- @@ -580,7 +599,7 @@ def heat_type_of( The object for which to infer the type. 
Raises - ------- + ------ TypeError If the object's type cannot be inferred. """ @@ -696,7 +715,7 @@ def can_cast( Raises - ------- + ------ TypeError If the types are not understood or casting is not a string ValueError @@ -714,7 +733,7 @@ def can_cast( True >>> ht.can_cast(2.0e200, "u1") False - >>> ht.can_cast('i8', 'i4', 'no') + >>> ht.can_cast("i8", "i4", "no") False >>> ht.can_cast("i8", "i4", "safe") False @@ -774,7 +793,7 @@ def iscomplex(x: dndarray.DNDarray) -> dndarray.DNDarray: Examples -------- - >>> ht.iscomplex(ht.array([1+1j, 1])) + >>> ht.iscomplex(ht.array([1 + 1j, 1])) DNDarray([ True, False], dtype=ht.bool, device=cpu:0, split=None) """ sanitation.sanitize_in(x) @@ -796,7 +815,7 @@ def isreal(x: dndarray.DNDarray) -> dndarray.DNDarray: Examples -------- - >>> ht.iscomplex(ht.array([1+1j, 1])) + >>> ht.isreal(ht.array([1 + 1j, 1])) - DNDarray([ True, False], dtype=ht.bool, device=cpu:0, split=None) + DNDarray([False,  True], dtype=ht.bool, device=cpu:0, split=None) """ return _operations.__local_op(torch.isreal, x, None, no_cast=True) @@ -825,7 +844,7 @@ def issubdtype( False >>> ht.issubdtype(ht.float64, ht.float32) False - >>> ht.issubdtype('i', ht.integer) + >>> ht.issubdtype("i", ht.integer) True """ # Assure that each argument is a ht.dtype @@ -868,7 +887,7 @@ def promote_types( def result_type( - *arrays_and_types: Tuple[Union[dndarray.DNDarray, Type[datatype], Any]] + *arrays_and_types: Tuple[Union[dndarray.DNDarray, Type[datatype], Any]], ) -> Type[datatype]: """ Returns the data type that results from type promotion rules performed in an arithmetic operation. @@ -975,7 +994,7 @@ class finfo: Kind of floating point data-type about which to get information. Examples - --------- + -------- >>> import heat as ht >>> info = ht.types.finfo(ht.float32) >>> info.bits @@ -1023,7 +1042,7 @@ class iinfo: Kind of integer data-type about which to get information. Examples - --------- + -------- >>> import heat as ht >>> info = ht.types.iinfo(ht.int32) >>> info.bits diff --git a/heat/core/version.py b/heat/core/version.py index 0d7e23cc23..30094d536d 100644 --- a/heat/core/version.py +++ b/heat/core/version.py @@ -1,10 +1,10 @@ -"""This module contains Heat's version information.""" +"""Heat's version information.""" major: int = 1 """Indicates Heat's main version.""" -minor: int = 5 +minor: int = 6 """Indicates feature extension.""" -micro: int = 1 +micro: int = 0 """Indicates revisions for bugfixes.""" extension: str = None """Indicates special builds, e.g. for specific hardware.""" @@ -13,4 +13,4 @@ __version__: str = f"{major}.{minor}.{micro}" """The combined version string, consisting of major, minor, micro and possibly extension.""" else: - __version__: str = f"{major}.{minor}.{micro}-{extension}" + __version__ = f"{major}.{minor}.{micro}-{extension}" diff --git a/heat/core/vmap.py b/heat/core/vmap.py index 03cb0f449f..2defdfc928 100644 --- a/heat/core/vmap.py +++ b/heat/core/vmap.py @@ -1,4 +1,5 @@ """ +Vmap module. This implements a functionality similar to PyTorch's vmap function. Requires PyTorch 2.0.0 or higher. """ @@ -21,7 +22,7 @@ def vmap( chunk_size: int = None, ) -> Callable[[Tuple[DNDarray]], Tuple[DNDarray]]: """ - This function is used to apply a function to a DNDarray in a vectorized way. + Apply a function to a DNDarray in a vectorized way. `heat.vmap` returns a callable that can be applied to DNDarrays. Vectorization will automatically take place along the split axis/axes of the DNDarray(s); therefore, unlike in PyTorch, there is no argument `in_dims`.
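For orientation, a usage sketch of the `ht.vmap` API exercised by the tests above; the shapes are illustrative, the mapped function receives local `torch` tensors, and `out_dims` declares the split axis of each output:

```python
import heat as ht
import torch

# batched inputs; vectorization runs along the split axes
x0 = ht.random.randn(4 * ht.MPI_WORLD.size, 10, 10, split=0)
x1 = ht.random.randn(10, 4 * ht.MPI_WORLD.size, split=1)

def func(x0, x1, k=2, scale=1e-2):
    # operates on one batch element as a local torch tensor
    return torch.topk(torch.linalg.svdvals(x0), k)[0] ** 2, scale * x0 @ x1

vfunc = ht.vmap(func, out_dims=(0, 0), chunk_size=2)
y0, y1 = vfunc(x0, x1, k=2, scale=1.0)
```

`chunk_size` bounds how many batch elements are processed per call, trading peak memory for throughput; that is exactly the knob the rewritten `test_vmap_with_chunks` sweeps over.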
diff --git a/heat/decomposition/__init__.py b/heat/decomposition/__init__.py index 9a9721c92f..1589ee59fb 100644 --- a/heat/decomposition/__init__.py +++ b/heat/decomposition/__init__.py @@ -3,3 +3,4 @@ """ from .pca import * +from .dmd import * diff --git a/heat/decomposition/dmd.py b/heat/decomposition/dmd.py new file mode 100644 index 0000000000..556ce26d96 --- /dev/null +++ b/heat/decomposition/dmd.py @@ -0,0 +1,715 @@ +""" +Module implementing the Dynamic Mode Decomposition (DMD) algorithm. +""" + +import heat as ht +from typing import Optional, Union, List +import torch + +try: + from typing import Self +except ImportError: + from typing_extensions import Self + + +def _torch_matrix_diag(diagonal): + # auxiliary function to create a batch of diagonal matrices from a batch of diagonal vectors + # source: fmassa's comment on Oct 4, 2018 in https://github.com/pytorch/pytorch/issues/12160 [Accessed Oct 09, 2024] + N = diagonal.shape[-1] + shape = diagonal.shape[:-1] + (N, N) + device, dtype = diagonal.device, diagonal.dtype + result = torch.zeros(shape, dtype=dtype, device=device) + indices = torch.arange(result.numel(), device=device).reshape(shape) + indices = indices.diagonal(dim1=-2, dim2=-1) + result.view(-1)[indices] = diagonal + return result + + +class DMD(ht.RegressionMixin, ht.BaseEstimator): + """ + Dynamic Mode Decomposition (DMD), plain vanilla version with SVD-based implementation. + + The time series of which DMD shall be computed must be provided as a 2-D DNDarray of shape (n_features, n_timesteps). + Please note that this deviates from Heat's convention that data sets are handled as 2-D arrays with the feature axis being the second axis. + + Parameters + ---------- + svd_solver : str, optional + Specifies the algorithm to use for the singular value decomposition (SVD). Options are 'full' (default), 'hierarchical', and 'randomized'. + svd_rank : int, optional + The rank to which SVD shall be truncated. For `'full'` SVD, `svd_rank = None` together with `svd_tol = None` (default) will result in no truncation. + For `svd_solver='full'`, at most one of `svd_rank` or `svd_tol` may be specified. + For `svd_solver='hierarchical'`, either `svd_rank` (rank to truncate to) or `svd_tol` (tolerance to truncate to) must be specified. + For `svd_solver='randomized'`, `svd_rank` must be specified and determines the rank to truncate to. + svd_tol : float, optional + The tolerance to which SVD shall be truncated. For `'full'` SVD, `svd_tol = None` together with `svd_rank = None` (default) will result in no truncation. + For `svd_solver='hierarchical'`, either `svd_tol` (accuracy to truncate to) or `svd_rank` (rank to truncate to) must be specified. + For `svd_solver='randomized'`, `svd_tol` is meaningless and must be None. + + Attributes + ---------- + svd_solver : str + The algorithm used for the singular value decomposition (SVD). + svd_rank : int + The rank to which SVD shall be truncated. + svd_tol : float + The tolerance to which SVD shall be truncated. + rom_basis_ : DNDarray + The reduced order model basis. + rom_transfer_matrix_ : DNDarray + The reduced order model transfer matrix. + rom_eigenvalues_ : DNDarray + The reduced order model eigenvalues. + rom_eigenmodes_ : DNDarray + The reduced order model eigenmodes ("DMD modes"). + + Notes + ----- + We follow the "exact DMD" method as described in [1], Sect. 2.2. + + References + ---------- + [1] J. L. Proctor, S. L. Brunton, and J. N.
Kutz, "Dynamic Mode Decomposition with Control," SIAM Journal on Applied Dynamical Systems, vol. 15, no. 1, pp. 142-161, 2016. + """ + + def __init__( + self, + svd_solver: Optional[str] = "full", + svd_rank: Optional[int] = None, + svd_tol: Optional[float] = None, + ): + # check that 'svd_solver' is given as a string + if not isinstance(svd_solver, str): + raise TypeError( + f"Invalid type '{type(svd_solver)}' for 'svd_solver'. Must be a string." + ) + # check if the specified SVD algorithm is valid + if svd_solver not in ["full", "hierarchical", "randomized"]: + raise ValueError( + f"Invalid SVD algorithm '{svd_solver}'. Must be one of 'full', 'hierarchical', 'randomized'." + ) + # check if the respective algorithm got the right combination of non-None parameters + if svd_solver == "full" and svd_rank is not None and svd_tol is not None: + raise ValueError( + "For 'full' SVD, at most one of 'svd_rank' or 'svd_tol' may be specified." + ) + if svd_solver == "hierarchical": + if svd_rank is None and svd_tol is None: + raise ValueError( + "For 'hierarchical' SVD, exactly one of 'svd_rank' or 'svd_tol' must be specified, but none of them is specified." + ) + if svd_rank is not None and svd_tol is not None: + raise ValueError( + "For 'hierarchical' SVD, exactly one of 'svd_rank' or 'svd_tol' must be specified, but currently both are specified." + ) + if svd_solver == "randomized": + if svd_rank is None: + raise ValueError("For 'randomized' SVD, 'svd_rank' must be specified.") + if svd_tol is not None: + raise ValueError("For 'randomized' SVD, 'svd_tol' must be None.") + # check correct data types of non-None parameters + if svd_rank is not None: + if not isinstance(svd_rank, int): + raise TypeError( + f"Invalid type '{type(svd_rank)}' for 'svd_rank'. Must be an integer." + ) + if svd_rank < 1: + raise ValueError( + f"Invalid value '{svd_rank}' for 'svd_rank'. Must be a positive integer." + ) + if svd_tol is not None: + if not isinstance(svd_tol, float): + raise TypeError(f"Invalid type '{type(svd_tol)}' for 'svd_tol'. Must be a float.") + if svd_tol <= 0: + raise ValueError(f"Invalid value '{svd_tol}' for 'svd_tol'. Must be positive.") + # set or initialize the attributes + self.svd_solver = svd_solver + self.svd_rank = svd_rank + self.svd_tol = svd_tol + self.rom_basis_ = None + self.rom_transfer_matrix_ = None + self.rom_eigenvalues_ = None + self.rom_eigenmodes_ = None + self.dmdmodes_ = None + self.n_modes_ = None + + def fit(self, X: ht.DNDarray) -> Self: + """ + Fits the DMD model to the given data. + + Parameters + ---------- + X : DNDarray + The time series data to fit the DMD model to. Must be of shape (n_features, n_timesteps). + """ + ht.sanitize_in(X) + # check if the input data is a 2-D DNDarray + if X.ndim != 2: + raise ValueError( + f"Invalid shape '{X.shape}' for input data 'X'. Must be a 2-D DNDarray of shape (n_features, n_timesteps)." + ) + # check if the input data has at least two time steps + if X.shape[1] < 2: + raise ValueError( + f"Invalid number of time steps '{X.shape[1]}' in input data 'X'. Must have at least two time steps." + ) + # first step of DMD: compute the SVD of the input data from first to second last time step + if self.svd_solver == "full" or not X.is_distributed(): + U, S, V = ht.linalg.svd( + X[:, :-1] if X.split == 0 else X[:, :-1].balance(), full_matrices=False + ) + if self.svd_tol is not None: + # truncation w.r.t.
prescribed bound on explained variance + # determine svd_rank accordingly + total_variance = (S**2).sum() + variance_threshold = (1 - self.svd_tol) * total_variance.larray.item() + variance_cumsum = (S**2).larray.cumsum(0) + self.n_modes_ = len(variance_cumsum[variance_cumsum <= variance_threshold]) + 1 + elif self.svd_rank is not None: + # truncation w.r.t. prescribed rank + self.n_modes_ = self.svd_rank + else: + # no truncation + self.n_modes_ = S.shape[0] + self.rom_basis_ = U[:, : self.n_modes_] + V = V[:, : self.n_modes_] + S = S[: self.n_modes_] + # compute SVD via "hierarchical" SVD + elif self.svd_solver == "hierarchical": + if self.svd_tol is not None: + # hierarchical SVD with prescribed upper bound on relative error + U, S, V, _ = ht.linalg.hsvd_rtol( + X[:, :-1] if X.split == 0 else X[:, :-1].balance(), + self.svd_tol, + compute_sv=True, + safetyshift=5, + ) + else: + # hierarchical SVD with prescribed, fixed rank + U, S, V, _ = ht.linalg.hsvd_rank( + X[:, :-1] if X.split == 0 else X[:, :-1].balance(), + self.svd_rank, + compute_sv=True, + safetyshift=5, + ) + self.rom_basis_ = U + self.n_modes_ = U.shape[1] + else: + # compute SVD via "randomized" SVD + U, S, V = ht.linalg.rsvd( + X[:, :-1] if X.split == 0 else X[:, :-1].balance_(), + self.svd_rank, + ) + self.rom_basis_ = U + self.n_modes_ = U.shape[1] + # second step of DMD: compute the reduced order model transfer matrix + # we need to assume that the transfer matrix of the ROM is small enough to fit into the memory of one process + if X.split == 0 or X.split is None: + # if the split axis of the input data is 0, using X[:,1:] does not result in un-balancedness and corresponding problems in matmul + self.rom_transfer_matrix_ = self.rom_basis_.T @ X[:, 1:] @ V / S + else: + # if input is split along columns, X[:,1:] will be un-balanced and cause problems in matmul + Xplus = X[:, 1:] + Xplus.balance_() + self.rom_transfer_matrix_ = self.rom_basis_.T @ Xplus @ V / S + + self.rom_transfer_matrix_.resplit_(None) + # third step of DMD: compute the reduced order model eigenvalues and eigenmodes + eigvals_loc, eigvec_loc = torch.linalg.eig(self.rom_transfer_matrix_.larray) + self.rom_eigenvalues_ = ht.array(eigvals_loc, split=None, device=X.device) + self.rom_eigenmodes_ = ht.array(eigvec_loc, split=None, device=X.device) + self.dmdmodes_ = self.rom_basis_ @ self.rom_eigenmodes_ + + def predict_next(self, X: ht.DNDarray, n_steps: int = 1) -> ht.DNDarray: + """ + Predicts and returns the state(s) after `n_steps` time steps, given the current state(s). + + Parameters + ---------- + X : DNDarray + The current state(s) for the prediction. Must have the same number of features as the training data, but can be batched for multiple current states, + i.e., X can be of shape (n_features,) or (n_features, n_current_states). + The output will have the same shape as the input. + n_steps : int, optional + The number of steps to predict into the future. Default is 1, i.e., the next time step is predicted. + """ + if not isinstance(n_steps, int): + raise TypeError(f"Invalid type '{type(n_steps)}' for 'n_steps'. Must be an integer.") + if self.rom_basis_ is None: + raise RuntimeError("Model has not been fitted yet.
Call 'fit' first.") + # sanitize input data + ht.sanitize_in(X) + # if X is a 1-D DNDarray, we add an artificial batch dimension + if X.ndim == 1: + X = X.expand_dims(1) + # check if the input data has the right number of features + if X.shape[0] != self.rom_basis_.shape[0]: + raise ValueError( + f"Invalid number of features '{X.shape[0]}' in input data 'X'. Must have the same number of features as the training data." + ) + rom_mat = self.rom_transfer_matrix_.copy() + rom_mat.larray = torch.linalg.matrix_power(rom_mat.larray, n_steps) + # the following line looks that complicated because we have to make sure that splits of the resulting matrices in + # each of the products are split along the axis that deserves being splitted + nextX = (self.rom_basis_.T @ X).T.resplit_(None) @ (self.rom_basis_ @ rom_mat).T + return (nextX.T).squeeze() + + def predict(self, X: ht.DNDarray, steps: Union[int, List[int]]) -> ht.DNDarray: + """ + Predics and returns future states given a current state(s) and returns them all as an array of size (n_steps, n_features). + + This function avoids a time-stepping loop (i.e., repeated calls to 'predict_next') and computes the future states in one go. + To do so, the number of future times to predict must be of moderate size as an array of shape (n_steps, self.n_modes_, self.n_modes_) must fit into memory. + Moreover, it must be ensured that: + + - the array of initial states is not split or split along the batch axis (axis 1) and the feature axis is small (i.e., self.rom_basis_ is not split) + + Parameters + ---------- + X : DNDarray + The current state(s) for the prediction. Must have the same number of features as the training data, but can be batched for multiple current states, + i.e., X can be of shape (n_features,) or (n_current_states, n_features). + steps : int or List[int] + if int: predictions at time step 0, 1, ..., steps-1 are computed + if List[int]: predictions at time steps given in the list are computed + """ + if self.rom_basis_ is None: + raise RuntimeError("Model has not been fitted yet. Call 'fit' first.") + # sanitize input data + ht.sanitize_in(X) + # if X is a 1-D DNDarray, we add an artificial batch dimension + if X.ndim == 1: + X = X.expand_dims(1) + # check if the input data has the right number of features + if X.shape[0] != self.rom_basis_.shape[0]: + raise ValueError( + f"Invalid number of features '{X.shape[0]}' in input data 'X'. Must have the same number of features as the training data." + ) + if isinstance(steps, int): + steps = torch.arange(steps, dtype=torch.int32, device=X.device.torch_device) + elif isinstance(steps, list): + steps = torch.tensor(steps, dtype=torch.int32, device=X.device.torch_device) + else: + raise TypeError( + f"Invalid type '{type(steps)}' for 'steps'. Must be an integer or a list of integers." 
+ ) + steps = steps.reshape(-1, 1).repeat(1, self.rom_eigenvalues_.shape[0]) + X_rom = self.rom_basis_.T @ X + + transfer_mat = _torch_matrix_diag(torch.pow(self.rom_eigenvalues_.larray, steps)) + transfer_mat = ( + self.rom_eigenmodes_.larray @ transfer_mat @ self.rom_eigenmodes_.larray.inverse() + ) + transfer_mat = torch.real( + transfer_mat + ) # necessary to avoid imaginary parts due to numerical errors + + if self.rom_basis_.split is None and (X.split is None or X.split == 1): + result = ( + transfer_mat @ X_rom.larray + ) # here we assume that X_rom is not split or split along the second axis (axis 1) + del transfer_mat + + result = ( + self.rom_basis_.larray @ result + ) # here we assume that self.rom_basis_ is not split (i.e., the feature number is small) + result = ht.array(result, is_split=2 if X.split == 1 else None) + return result.squeeze().T + else: + raise NotImplementedError( + "Predicting multiple time steps in one go is not supported for the given data layout. Please use 'predict_next' instead, or open an issue on GitHub if you require this feature." + ) + + def __str__(self): + if ht.MPI_WORLD.rank == 0: + if self.rom_basis_ is not None: + return ( + f"-------------------- DMD (Dynamic Mode Decomposition) --------------------\n" + f"Number of modes: {self.n_modes_}\n" + f"State space dimension: {self.rom_basis_.shape[0]}\n" + f"DMD eigenvalues: {self.rom_eigenvalues_.larray}\n" + f"--------------------------------------------------------------------------\n" + f"ROM basis of shape {self.rom_basis_.shape}:\n" + f"\t split axis: {self.rom_basis_.split}\n" + f"\t device: {self.rom_basis_.device.__str__().split(':')[-2]}\n" + f"--------------------------------------------------------------------------\n" + ) + else: + return ( + f"---------------- UNFITTED DMD (Dynamic Mode Decomposition) ---------------\n" + f"Parameters for fit are as follows: \n" + f"\t SVD-solver: {self.svd_solver}\n" + f"\t SVD-rank: {self.svd_rank}\n" + f"\t SVD-tolerance: {self.svd_tol}\n" + f"--------------------------------------------------------------------------\n" + ) + else: + return ""
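For orientation, a minimal usage sketch of the DMD class introduced above; the (n_features, n_timesteps) layout follows the class docstring, while the data and parameter choices are illustrative assumptions only and not part of the patch:

    import heat as ht

    # 64 features observed over 20 time steps, distributed along the feature axis
    X = ht.random.randn(64, 20, split=0)

    dmd = ht.decomposition.DMD(svd_solver="full", svd_rank=4)
    dmd.fit(X)

    # one-step prediction from the last observed state
    x_next = dmd.predict_next(X[:, -1])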
+ + +class DMDc(ht.RegressionMixin, ht.BaseEstimator): + """ + Dynamic Mode Decomposition with Control (DMDc), plain vanilla version with SVD-based implementation. + + The time series of states and controls must be provided as 2-D DNDarrays of shapes (n_state_features, n_timesteps) and (n_control_features, n_timesteps), respectively. + Please note that this deviates from Heat's convention that data sets are handled as 2-D arrays with the feature axis being the second axis. + + Parameters + ---------- + svd_solver : str, optional + Specifies the algorithm to use for the singular value decomposition (SVD). Options are 'full' (default), 'hierarchical', and 'randomized'. + svd_rank : int, optional + The rank to which SVD of the states shall be truncated. For `'full'` SVD, `svd_rank = None` together with `svd_tol = None` (default) will result in no truncation. + For `svd_solver='full'`, at most one of `svd_rank` or `svd_tol` may be specified. + For `svd_solver='hierarchical'`, either `svd_rank` (rank to truncate to) or `svd_tol` (tolerance to truncate to) must be specified. + For `svd_solver='randomized'`, `svd_rank` must be specified and determines the rank to truncate to. + svd_tol : float, optional + The tolerance to which SVD of the states shall be truncated. For `'full'` SVD, `svd_tol = None` together with `svd_rank = None` (default) will result in no truncation. + For `svd_solver='hierarchical'`, either `svd_tol` (accuracy to truncate to) or `svd_rank` (rank to truncate to) must be specified. + For `svd_solver='randomized'`, `svd_tol` is meaningless and must be None. + + Attributes + ---------- + svd_solver : str + The algorithm used for the singular value decomposition (SVD). + svd_rank : int + The rank to which SVD shall be truncated. + svd_tol : float + The tolerance to which SVD shall be truncated. + rom_basis_ : DNDarray + The reduced order model basis. + rom_transfer_matrix_ : DNDarray + The reduced order model transfer matrix. + rom_control_matrix_ : DNDarray + The reduced order model control matrix. + rom_eigenvalues_ : DNDarray + The reduced order model eigenvalues. + rom_eigenmodes_ : DNDarray + The reduced order model eigenmodes ("DMD modes"). + + Notes + ----- + We follow the approach described in [1], Sects. 3.3 and 3.4. + In the case that svd_rank is prescribed, the rank of the SVD of the full system matrix is set to svd_rank + n_control_features; cf. https://github.com/dynamicslab/pykoopman + for the same approach. + + References + ---------- + [1] J. L. Proctor, S. L. Brunton, and J. N. Kutz, "Dynamic Mode Decomposition with Control," SIAM Journal on Applied Dynamical Systems, vol. 15, no. 1, pp. 142-161, 2016. + """ + + def __init__( + self, + svd_solver: Optional[str] = "full", + svd_rank: Optional[int] = None, + svd_tol: Optional[float] = None, + ): + # check that 'svd_solver' is given as a string + if not isinstance(svd_solver, str): + raise TypeError( + f"Invalid type '{type(svd_solver)}' for 'svd_solver'. Must be a string." + ) + # check if the specified SVD algorithm is valid + if svd_solver not in ["full", "hierarchical", "randomized"]: + raise ValueError( + f"Invalid SVD algorithm '{svd_solver}'. Must be one of 'full', 'hierarchical', 'randomized'." + ) + # check if the respective algorithm got the right combination of non-None parameters + if svd_solver == "full" and svd_rank is not None and svd_tol is not None: + raise ValueError( + "For 'full' SVD, at most one of 'svd_rank' or 'svd_tol' may be specified." + ) + if svd_solver == "hierarchical": + if svd_rank is None and svd_tol is None: + raise ValueError( + "For 'hierarchical' SVD, exactly one of 'svd_rank' or 'svd_tol' must be specified, but none of them is specified." + ) + if svd_rank is not None and svd_tol is not None: + raise ValueError( + "For 'hierarchical' SVD, exactly one of 'svd_rank' or 'svd_tol' must be specified, but currently both are specified." + ) + if svd_solver == "randomized": + if svd_rank is None: + raise ValueError("For 'randomized' SVD, 'svd_rank' must be specified.") + if svd_tol is not None: + raise ValueError("For 'randomized' SVD, 'svd_tol' must be None.") + # check correct data types of non-None parameters + if svd_rank is not None: + if not isinstance(svd_rank, int): + raise TypeError( + f"Invalid type '{type(svd_rank)}' for 'svd_rank'. Must be an integer." + ) + if svd_rank < 1: + raise ValueError( + f"Invalid value '{svd_rank}' for 'svd_rank'. Must be a positive integer." + ) + if svd_tol is not None: + if not isinstance(svd_tol, float): + raise TypeError(f"Invalid type '{type(svd_tol)}' for 'svd_tol'. Must be a float.") + if svd_tol <= 0: + raise ValueError(f"Invalid value '{svd_tol}' for 'svd_tol'.
Must be non-negative.") + # set or initialize the attributes + self.svd_solver = svd_solver + self.svd_rank = svd_rank + self.svd_tol = svd_tol + self.rom_basis_ = None + self.rom_transfer_matrix_ = None + self.rom_control_matrix_ = None + self.rom_eigenvalues_ = None + self.rom_eigenmodes_ = None + self.dmdmodes_ = None + self.n_modes_ = None + self.n_modes_system_ = None + + def fit(self, X: ht.DNDarray, C: ht.DNDarray) -> Self: + """ + Fits the DMD model to the given data. + + Parameters + ---------- + X : DNDarray + The time series data of states to fit the DMD model to. Must be of shape (n_state_features, n_timesteps). + C : DNDarray + The time series of control inputs to fit the DMD model to. Must be of shape (n_control_features, n_timesteps). + """ + ht.sanitize_in(X) + ht.sanitize_in(C) + # check if the input data is a 2-D DNDarray + if X.ndim != 2: + raise ValueError( + f"Invalid shape '{X.shape}' for input data 'X'. Must be a 2-D DNDarray of shape (n_state_features, n_timesteps)." + ) + if C.ndim != 2: + raise ValueError( + f"Invalid shape '{C.shape}' for input data 'C'. Must be a 2-D DNDarray of shape (n_control_features, n_timesteps)." + ) + # check if the input data has at least two time steps + if X.shape[1] < 2: + raise ValueError( + f"Invalid number of time steps '{X.shape[1]}' in input data 'X'. Must have at least two time steps." + ) + if C.shape[1] < 2: + raise ValueError( + f"Invalid number of time steps '{C.shape[1]}' in input data 'C'. Must have at least two time steps." + ) + # check if the input data has the same number of time steps + if X.shape[1] != C.shape[1]: + raise ValueError( + f"Invalid number of time steps {X.shape[1]} in input data 'X' and {C.shape[1]} in input data 'C'. Must have the same number of time steps." + ) + if X.split is not None and C.split is not None and X.split != C.split: + raise ValueError( + f"If both input data 'X' and 'C' are distributed, they must be distributed along the same axis, but X.split={X.split}, C.split={C.split}." + ) + Xplus = X[:, 1:] + Xplus.balance_() + Omega = ht.concatenate((X, C), axis=0)[:, :-1] + # first step of DMDc: compute the SVD of the input data from first to second last time step + # as well as of the full system matrix + if self.svd_solver == "full" or not X.is_distributed(): + U, S, V = ht.linalg.svd(Xplus, full_matrices=False) + Utilde, Stilde, Vtilde = ht.linalg.svd(Omega, full_matrices=False) + if self.svd_tol is not None: + # truncation w.r.t. prescribed bound on explained variance + # determine svd_rank accordingly + total_variance = (S**2).sum() + variance_threshold = (1 - self.svd_tol) * total_variance.larray.item() + variance_cumsum = (S**2).larray.cumsum(0) + self.n_modes_ = len(variance_cumsum[variance_cumsum <= variance_threshold]) + 1 + total_variance_system = (Stilde**2).sum() + variance_threshold_system = (1 - self.svd_tol) * total_variance_system.larray.item() + variance_cumsum_system = (Stilde**2).larray.cumsum(0) + self.n_modes_system_ = ( + len(variance_cumsum_system[variance_cumsum_system <= variance_threshold_system]) + + 1 + ) + elif self.svd_rank is not None: + # truncation w.r.t. 
prescribed rank + self.n_modes_ = self.svd_rank + self.n_modes_system_ = self.svd_rank + C.shape[0] + else: + # no truncation + self.n_modes_ = S.shape[0] + self.n_modes_system_ = Stilde.shape[0] + + self.rom_basis_ = U[:, : self.n_modes_] + V = V[:, : self.n_modes_] + S = S[: self.n_modes_] + Vtilde = Vtilde[:, : self.n_modes_system_] + Stilde = Stilde[: self.n_modes_system_] + Utilde1 = Utilde[: X.shape[0], : self.n_modes_system_] + Utilde2 = Utilde[X.shape[0] :, : self.n_modes_system_] + # compute SVD via "hierarchical" SVD + elif self.svd_solver == "hierarchical": + if self.svd_tol is not None: + # hierarchical SVD with prescribed upper bound on relative error + U, S, V, _ = ht.linalg.hsvd_rtol( + Xplus, + self.svd_tol, + compute_sv=True, + safetyshift=5, + ) + Utilde, Stilde, Vtilde, _ = ht.linalg.hsvd_rtol( + Omega, + self.svd_tol, + compute_sv=True, + safetyshift=5, + ) + else: + # hierarchical SVD with prescribed, fixed rank + U, S, V, _ = ht.linalg.hsvd_rank( + Xplus, + self.svd_rank, + compute_sv=True, + safetyshift=5, + ) + Utilde, Stilde, Vtilde, _ = ht.linalg.hsvd_rank( + Omega, + self.svd_rank + C.shape[0], + compute_sv=True, + safetyshift=5, + ) + self.rom_basis_ = U + self.n_modes_ = U.shape[1] + self.n_modes_system_ = Utilde.shape[1] + Utilde1 = Utilde[: X.shape[0], :] + Utilde2 = Utilde[X.shape[0] :, :] + else: + # compute SVD via "randomized" SVD + U, S, V = ht.linalg.rsvd( + Xplus, + self.svd_rank, + ) + Utilde, Stilde, Vtilde = ht.linalg.rsvd( + Omega, + self.svd_rank + C.shape[0], + ) + self.rom_basis_ = U + self.n_modes_ = U.shape[1] + self.n_modes_system_ = Utilde.shape[1] + Utilde1 = Utilde[: X.shape[0], :] + Utilde2 = Utilde[X.shape[0] :, :] + + # ensure that everything is balanced for the following steps + Utilde2.balance_() + Utilde1.balance_() + Vtilde.balance_() + if Utilde2.split is not None and Utilde2.shape[Utilde2.split] < Utilde2.comm.size: + Utilde2.resplit_((Utilde2.split + 1) % 2) + if Utilde1.split is not None and Utilde1.shape[Utilde1.split] < Utilde1.comm.size: + Utilde1.resplit_((Utilde1.split + 1) % 2) + if Vtilde.split is not None and Vtilde.shape[Vtilde.split] < Vtilde.comm.size: + Vtilde.resplit_((Vtilde.split + 1) % 2) + # second step of DMDc: compute the reduced order model transfer matrix + # we need to assume that the transfer matrix of the ROM is small enough to fit into the memory of one process + self.rom_transfer_matrix_ = ( + self.rom_basis_.T + @ Xplus + @ (Vtilde / Stilde) + @ (Utilde1.T @ self.rom_basis_).resplit_(None) + ) + self.rom_control_matrix_ = (self.rom_basis_.T @ Xplus) @ ( + (Vtilde / Stilde) @ Utilde2.T + ).resplit_(0) + self.rom_transfer_matrix_.resplit_(None) + self.rom_control_matrix_.resplit_(None) + + # third step of DMDc: compute the reduced order model eigenvalues and eigenmodes + eigvals_loc, eigvec_loc = torch.linalg.eig(self.rom_transfer_matrix_.larray) + self.rom_eigenvalues_ = ht.array(eigvals_loc, split=None, device=X.device) + self.rom_eigenmodes_ = ht.array(eigvec_loc, split=None, device=X.device) + self.dmdmodes_ = ( + Xplus @ (Vtilde / Stilde) @ Utilde1.T @ self.rom_basis_ @ self.rom_eigenmodes_ + ) + + def predict(self, X: ht.DNDarray, C: ht.DNDarray) -> ht.DNDarray: + """ + Predicts and returns future states given the current state(s) ``X`` and control trajectory ``C``. + + Parameters + ---------- + X : DNDarray + The current state(s) for the prediction.
Must have the same number of features as the training data, but can be batched for multiple current states, + i.e., X can be of shape (n_state_features,) or (n_batch, n_state_features). + C : DNDarray + The control trajectory for the prediction. Must have the same number of control features as the training data, i.e., C must be of shape + (n_control_features,) (for a single time step) or (n_control_features, n_timesteps). + """ + if self.rom_basis_ is None: + raise RuntimeError("Model has not been fitted yet. Call 'fit' first.") + # sanitize input data + ht.sanitize_in(X) + ht.sanitize_in(C) + # if X is a 1-D DNDarray, we add an artificial batch dimension; check correct dimensions for X + if X.ndim == 1: + X = X.expand_dims(0) + if X.ndim > 2: + raise ValueError( + f"Invalid shape '{X.shape}' for input data 'X'. Must be a 2-D DNDarray of shape (n_batch, n_state_features) or a 1-D DNDarray of shape (n_state_features,)." + ) + # if C is a 1-D DNDarray, we add an artificial dimension for the single time step; check correct dimensions for C + if C.ndim == 1: + C = C.expand_dims(1) + if C.ndim > 2: + raise ValueError( + f"Invalid shape '{C.shape}' for input data 'C'. Must be a 2-D DNDarray of shape (n_control_features, n_timesteps) or a 1-D DNDarray of shape (n_control_features,) for a single time step." + ) + # check if the input data has the right number of features for control and state space + if X.shape[1] != self.rom_basis_.shape[0]: + raise ValueError( + f"Invalid number of features '{X.shape[1]}' in input data 'X'. Must have the same number of features as the training data (={self.rom_basis_.shape[0]})." + ) + if C.shape[0] != self.rom_control_matrix_.shape[1]: + raise ValueError( + f"Invalid number of features '{C.shape[0]}' in input data 'C'. Must have the same number of features as the training data (={self.rom_control_matrix_.shape[1]})."
+ ) + # different cases + if C.split is not None: + raise ValueError("So far, only C.split = None is supported.") + # time evolution in the reduced order model + X_red = X @ self.rom_basis_ + X_red_full = ht.zeros( + (X.shape[0], self.rom_basis_.shape[1], C.shape[1]), + split=X_red.split, + device=X.device, + dtype=X.dtype, + ) + X_red_full[:, :, 0] = X_red + for i in range(1, C.shape[1]): + X_red_full[:, :, i] = (self.rom_transfer_matrix_ @ X_red_full[:, :, i - 1].T).T + ( + self.rom_control_matrix_ @ C[:, i - 1] + ).T + # reshape in order to be able to multiply with basis again + X_red_full = X_red_full.reshape(self.rom_basis_.shape[1], -1).resplit_( + 1 if X_red_full.split == 0 else None + ) + X_pred = self.rom_basis_ @ X_red_full + # reshape again and return + return X_pred.reshape(X.shape[0], X.shape[1], C.shape[1]) + + def __str__(self): + if ht.MPI_WORLD.rank == 0: + if self.rom_basis_ is not None: + return ( + f"----------- DMDc (Dynamic Mode Decomposition with control) ---------------\n" + f"Number of modes: {self.n_modes_}\n" + f"State space dimension: {self.rom_basis_.shape[0]}\n" + f"Control space dimension: {self.rom_control_matrix_.shape[1]}\n" + f"DMD eigenvalues: {self.rom_eigenvalues_.larray}\n" + f"--------------------------------------------------------------------------\n" + f"ROM basis of shape {self.rom_basis_.shape}:\n" + f"\t split axis: {self.rom_basis_.split}\n" + f"\t device: {self.rom_basis_.device.__str__().split(':')[-2]}\n" + f"--------------------------------------------------------------------------\n" + ) + else: + return ( + f"-------- UNFITTED DMDc (Dynamic Mode Decomposition with control) ---------\n" + f"Parameters for fit are as follows: \n" + f"\t SVD-solver: {self.svd_solver}\n" + f"\t SVD-rank: {self.svd_rank}\n" + f"\t SVD-tolerance: {self.svd_tol}\n" + f"--------------------------------------------------------------------------\n" + ) + else: + return ""
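Analogously, a minimal usage sketch of the DMDc class under the shape conventions from its docstring; data and parameter values are illustrative assumptions only, and `C` is kept non-distributed because `predict` currently requires `C.split = None`:

    import heat as ht

    # states: 32 features over 20 time steps; controls: 2 features over the same 20 steps
    X = ht.random.randn(32, 20, split=0)
    C = ht.random.randn(2, 20, split=None)

    dmdc = ht.decomposition.DMDc(svd_solver="full", svd_rank=4)
    dmdc.fit(X, C)

    # propagate the last state under a prescribed control trajectory of 5 steps
    X_pred = dmdc.predict(X[:, -1], C[:, :5])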
diff --git a/heat/decomposition/pca.py b/heat/decomposition/pca.py index dde1d15f5e..ac715ad594 100644 --- a/heat/decomposition/pca.py +++ b/heat/decomposition/pca.py @@ -4,6 +4,7 @@ import heat as ht from typing import Optional, Tuple, Union +from ..core.linalg.svdtools import _isvd try: from typing import Self @@ -36,16 +37,18 @@ class PCA(ht.TransformMixin, ht.BaseEstimator): svd_solver : {'full', 'hierarchical'}, default='hierarchical' 'full' : Full SVD is performed. In general, this is more accurate, but also slower. So far, this is only supported for tall-skinny or short-fat data. 'hierarchical' : Hierarchical SVD, i.e., an algorithm for computing an approximate, truncated SVD, is performed. Only available for data split along axis no. 0. + 'randomized' : Randomized SVD is performed. tol : float, default=None Not yet necessary as iterative methods for PCA are not yet implemented. - iterated_power : {'auto', int}, default='auto' - if svd_solver='randomized', ... (not yet supported) + iterated_power : int, default=0 + if svd_solver='randomized', this parameter is the number of iterations for the power method. + Choosing `iterated_power > 0` can lead to better results in the case of slowly decaying singular values but is computationally more expensive. n_oversamples : int, default=10 - if svd_solver='randomized', ... (not yet supported) + if svd_solver='randomized', this parameter is the number of additional random vectors to sample the range of X so that the range of X can be approximated more accurately. power_iteration_normalizer : {'qr'}, default='qr' - if svd_solver='randomized', ... (not yet supported) + if svd_solver='randomized', this parameter is the normalization form of the iterated power method. So far, only QR is supported. random_state : int, default=None - if svd_solver='randomized', ... (not yet supported) + if svd_solver='randomized', this parameter allows setting the seed for the random number generator. Attributes ---------- @@ -53,16 +56,17 @@ class PCA(ht.TransformMixin, ht.BaseEstimator): Principal axes in feature space, representing the directions of maximum variance in the data. The components are sorted by explained_variance_. explained_variance_ : DNDarray of shape (n_components,) The amount of variance explained by each of the selected components. - Not supported by svd_solver='hierarchical'. + Not supported by svd_solver='hierarchical' and svd_solver='randomized'. explained_variance_ratio_ : DNDarray of shape (n_components,) Percentage of variance explained by each of the selected components. - Not supported by svd_solver='hierarchical'. + Not supported by svd_solver='hierarchical' and svd_solver='randomized'. total_explained_variance_ratio_ : float The percentage of total variance explained by the selected components together. For svd_solver='hierarchical', a lower estimate for this quantity is provided; see :func:`ht.linalg.hsvd_rtol` and :func:`ht.linalg.hsvd_rank` for details. + Not supported by svd_solver='randomized'. singular_values_ : DNDarray of shape (n_components,) The singular values corresponding to each of the selected components. - Not supported by svd_solver='hierarchical'. + Not supported by svd_solver='hierarchical' and svd_solver='randomized'. mean_ : DNDarray of shape (n_features,) Per-feature empirical mean, estimated from the training set. n_components_ : int The estimated number of components. @@ -73,9 +77,10 @@ not yet implemented Notes - ------------ - Hieararchical SVD (`svd_solver = "hierarchical"`) computes and approximate, truncated SVD. Thus, the results are not exact, in general, unless the - truncation rank chose is larger than the actual rank (matrix rank) of the underlying data; see :func:`ht.linalg.hsvd_rank` and :func:`ht.linalg.hsvd_rtol` for details. + ----- + Hierarchical SVD (`svd_solver = "hierarchical"`) computes an approximate, truncated SVD. Thus, the results are not exact, in general, unless the + truncation rank chosen is larger than the actual rank (matrix rank) of the underlying data; see :func:`ht.linalg.hsvd_rank` and :func:`ht.linalg.hsvd_rtol` for details. + Randomized SVD (`svd_solver = "randomized"`) is a stochastic algorithm that computes an approximate, truncated SVD.
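To illustrate the newly supported solver, a short sketch of PCA with `svd_solver='randomized'`; the data and parameter values here are assumptions for illustration, not part of the patch:

    import heat as ht

    X = ht.random.randn(200, 10, split=0)

    # randomized SVD backend: two power iterations, fixed seed for reproducibility
    pca = ht.decomposition.PCA(
        n_components=3, svd_solver="randomized", iterated_power=2, random_state=42
    )
    pca.fit(X)
    X_reduced = pca.transform(X)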
""" def __init__( @@ -85,7 +90,7 @@ def __init__( self, whiten: bool = False, svd_solver: str = "hierarchical", tol: Optional[float] = None, - iterated_power: Union[str, int] = "auto", + iterated_power: int = 0, n_oversamples: int = 10, power_iteration_normalizer: str = "qr", random_state: Optional[int] = None, @@ -99,10 +104,12 @@ def __init__( raise NotImplementedError("Whitening is not yet supported. Please set whiten=False.") if not (svd_solver == "full" or svd_solver == "hierarchical" or svd_solver == "randomized"): raise ValueError( - "At the moment, only svd_solver='full' (for tall-skinny or short-fat data) and svd_solver='hierarchical' are supported. \n An implementation of the 'full' option for arbitrarily shaped data as well as the option 'randomized' are already planned." + "At the moment, only svd_solver='full' (for tall-skinny or short-fat data), svd_solver='hierarchical', and svd_solver='randomized' are supported. \n An implementation of the 'full' option for arbitrarily shaped data is already planned." + ) + if not isinstance(iterated_power, int): + raise TypeError( + "iterated_power must be an integer. The option 'auto' is not yet supported." ) - if iterated_power != "auto" and not isinstance(iterated_power, int): - raise TypeError("iterated_power must be 'auto' or an integer.") if isinstance(iterated_power, int) and iterated_power < 0: raise ValueError("if an integer, iterated_power must be greater or equal to 0.") if power_iteration_normalizer != "qr": @@ -113,10 +120,8 @@ def __init__( raise ValueError( "Argument tol is not yet necessary as iterative methods for PCA are not yet implemented. Please set tol=None." ) - if random_state is None: - random_state = 0 - if not isinstance(random_state, int): - raise ValueError("random_state must be None or an integer.") + if random_state is not None and not isinstance(random_state, int): + raise ValueError(f"random_state must be None or an integer, was {type(random_state)}.") if ( n_components is not None and not (isinstance(n_components, int) and n_components >= 1) @@ -135,6 +140,9 @@ self.n_oversamples = n_oversamples self.power_iteration_normalizer = power_iteration_normalizer self.random_state = random_state + if self.random_state is not None: + # set random seed accordingly + ht.random.seed(self.random_state) # set future attributes to None to initialize those that will not be computed later on with None (e.g., explained_variance_ for svd_solver='hierarchical') self.components_ = None @@ -220,10 +228,15 @@ def fit(self, X: ht.DNDarray, y=None) -> Self: self.total_explained_variance_ratio_ = 1 - info.larray.item() ** 2 else: - # here one could add other computational backends - raise NotImplementedError( - f"The chosen svd_solver {self.svd_solver} is not yet implemented." + # compute SVD via "randomized" SVD + _, S, V = ht.linalg.rsvd( + X_centered, + self.n_components_, + n_oversamples=self.n_oversamples, + power_iter=self.iterated_power, ) + self.components_ = V.T + self.n_components_ = V.shape[1] self.n_samples_ = X.shape[0] self.noise_variance_ = None # not yet implemented @@ -265,3 +278,156 @@ def inverse_transform(self, X: ht.DNDarray) -> ht.DNDarray: ) return X @ self.components_ + self.mean_ + + +class IncrementalPCA(ht.TransformMixin, ht.BaseEstimator): + """ + Incremental Principal Component Analysis (PCA). + + This class allows for incremental updates of the PCA model. This is especially useful for large data sets that do not fit into memory. + + An example of how to apply this class is given in, e.g., `benchmarks/cb/decomposition.py`. + + Parameters + ---------- + n_components : int, optional + Number of components to keep. If `n_components` is not set, all components are kept (default). + copy : bool, default=True + In-place operations are not yet supported. Please set `copy=True`. + whiten : bool, default=False + Not yet supported. + batch_size : int, optional + Currently not needed and only added for API consistency and possible future extensions. + + Attributes + ---------- + components_ : DNDarray of shape (n_components, n_features) + Principal axes in feature space, representing the directions of maximum variance in the data. The components are sorted by `explained_variance_`.
+ singular_values_ : DNDarray of shape (n_components,) + The singular values corresponding to each of the selected components. + mean_ : DNDarray of shape (n_features,) + Per-feature empirical mean, estimated from the training set. + n_components_ : int + The estimated number of components. + n_samples_seen_ : int + Number of samples processed so far. + """ + + def __init__( + self, + n_components: Optional[int] = None, + copy: bool = True, + whiten: bool = False, + batch_size: Optional[int] = None, + ): + if not copy: + raise NotImplementedError( + "In-place operations for PCA are not supported at the moment. Please set copy=True." + ) + if whiten: + raise NotImplementedError("Whitening is not yet supported. Please set whiten=False.") + if n_components is not None: + if not isinstance(n_components, int): + raise TypeError( + f"n_components must be None or an integer, but is {type(n_components)}." + ) + else: + if n_components < 1: + raise ValueError("if an integer, n_components must be greater or equal to 1.") + self.whiten = whiten + self.n_components = n_components + self.batch_size = batch_size + self.components_ = None + # self.explained_variance_ = None # not yet supported + # self.explained_variance_ratio_ = None # not yet supported + self.singular_values_ = None + self.mean_ = None + self.n_components_ = None + self.batch_size_ = None + self.n_samples_seen_ = 0 + + def fit(self, X, y=None) -> Self: + """ + Not yet implemented; please use `.partial_fit` instead. + Please open an issue on GitHub if you would like to see this method implemented and make a suggestion on how you would like to see it implemented. + """ + raise NotImplementedError( + f"You have called IncrementalPCA's `.fit`-method with an argument of type {type(X)}. \n So far, we have only implemented the method `.partial_fit` which performs a single-step update of incremental PCA. \n Please consider using `.partial_fit` for the moment, and open an issue on GitHub in which we can discuss what you would like to see implemented for the `.fit`-method." + ) + + def partial_fit(self, X: ht.DNDarray, y=None): + """ + A single step of incrementally building up the PCA. + Input X is the current batch of data that needs to be added to the existing PCA. + """ + ht.sanitize_in(X) + if y is not None: + raise ValueError( + "Argument y is unused and only present for API consistency by convention; please pass y=None." + ) + if self.n_samples_seen_ == 0: + # this is the first batch of data, hence we need to initialize everything + if self.n_components is None: + self.n_components_ = min(X.shape) + else: + self.n_components_ = min(X.shape[0], X.shape[1], self.n_components) + + self.mean_ = X.mean(axis=0) + X_centered = X - self.mean_ + _, S, V = ht.linalg.svd(X_centered) + self.components_ = V[:, : self.n_components_].T + self.singular_values_ = S[: self.n_components_] + self.n_samples_seen_ = X.shape[0] + + else: + # if batches of data have already been seen before, only an update is necessary + U, S, mean = _isvd( + X.T, + self.components_.T, + self.singular_values_, + V_old=None, + maxrank=self.n_components, + old_matrix_size=self.n_samples_seen_, + old_rowwise_mean=self.mean_, + ) + self.components_ = U.T + self.singular_values_ = S + self.mean_ = mean + self.n_samples_seen_ += X.shape[0] + self.n_components_ = self.components_.shape[0] + + def transform(self, X: ht.DNDarray) -> ht.DNDarray: + """ + Apply dimensionality reduction based on PCA to X. + + Parameters + ---------- + X : DNDarray of shape (n_samples, n_features) + Data set to be transformed.
+ """ + ht.sanitize_in(X) + if X.shape[1] != self.mean_.shape[0]: + raise ValueError( + f"X must have the same number of features as the training data. Expected {self.mean_.shape[0]} but got {X.shape[1]}." + ) + + # center data and apply PCA + X_centered = X - self.mean_ + return X_centered @ self.components_.T + + def inverse_transform(self, X: ht.DNDarray) -> ht.DNDarray: + """ + Transform data back to its original space. + + Parameters + ---------- + X : DNDarray of shape (n_samples, n_components) + Data set to be transformed back. + """ + ht.sanitize_in(X) + if X.shape[1] != self.n_components_: + raise ValueError( + f"Dimension mismatch. Expected input of shape n_points x {self.n_components_} but got {X.shape}." + ) + + return X @ self.components_ + self.mean_ diff --git a/heat/decomposition/tests/test_dmd.py b/heat/decomposition/tests/test_dmd.py new file mode 100644 index 0000000000..9aa803d6c6 --- /dev/null +++ b/heat/decomposition/tests/test_dmd.py @@ -0,0 +1,589 @@ +import os +import unittest +import platform +import numpy as np +import torch +import heat as ht + +from ...core.tests.test_suites.basic_test import TestCase + +# MPS does not support non-float matrix multiplication +envar = os.getenv("HEAT_TEST_USE_DEVICE", "cpu") +is_mps = envar == "gpu" and platform.system() == "Darwin" + + +@unittest.skipIf(is_mps, "MPS does not support non-float matrix multiplication") +class TestDMD(TestCase): + def test_dmd_setup_catch_wrong(self): + # catch wrong inputs during setup + with self.assertRaises(TypeError): + ht.decomposition.DMD(svd_solver=0) + with self.assertRaises(ValueError): + ht.decomposition.DMD(svd_solver="Gramian") + with self.assertRaises(ValueError): + ht.decomposition.DMD(svd_solver="full", svd_rank=3, svd_tol=1e-1) + with self.assertRaises(ValueError): + ht.decomposition.DMD(svd_solver="full", svd_tol=-0.031415926) + with self.assertRaises(ValueError): + ht.decomposition.DMD(svd_solver="hierarchical") + with self.assertRaises(ValueError): + ht.decomposition.DMD(svd_solver="hierarchical", svd_rank=3, svd_tol=1e-1) + with self.assertRaises(ValueError): + ht.decomposition.DMD(svd_solver="randomized") + with self.assertRaises(ValueError): + ht.decomposition.DMD(svd_solver="randomized", svd_rank=2, svd_tol=1e-1) + with self.assertRaises(TypeError): + ht.decomposition.DMD(svd_solver="full", svd_rank=0.1) + with self.assertRaises(ValueError): + ht.decomposition.DMD(svd_solver="hierarchical", svd_rank=0) + with self.assertRaises(TypeError): + ht.decomposition.DMD(svd_solver="hierarchical", svd_tol="auto") + with self.assertRaises(ValueError): + ht.decomposition.DMD(svd_solver="randomized", svd_rank=0) + + def test_dmd_fit_catch_wrong(self): + dmd = ht.decomposition.DMD(svd_solver="full") + with self.assertRaises(ValueError): + dmd.fit(ht.zeros((5 * ht.MPI_WORLD.size, 2, 2), split=0)) + with self.assertRaises(ValueError): + dmd.fit(ht.zeros((5 * ht.MPI_WORLD.size, 1), split=0)) + + def test_dmd_predict_catch_wrong(self): + # not yet fitted + dmd = ht.decomposition.DMD(svd_solver="full") + with self.assertRaises(RuntimeError): + dmd.predict_next(ht.zeros(10)) + with self.assertRaises(RuntimeError): + dmd.predict(ht.zeros(10), 10) + + X = ht.random.randn(1000, 10 * ht.MPI_WORLD.size, split=0, dtype=ht.float32) + dmd = ht.decomposition.DMD(svd_solver="randomized", svd_rank=4) + dmd.fit(X) + # wrong shape of input for prediction + with self.assertRaises(ValueError): + dmd.predict_next(ht.zeros((100, 4), split=0)) + with self.assertRaises(ValueError): + dmd.predict(ht.zeros((100, 
4), split=0), 10) + # wrong input for steps in predict + with self.assertRaises(TypeError): + dmd.predict( + ht.zeros((1000, 5), split=0), + "this is clearly neither an integer nor a list of integers", + ) + # check catching wrong n_steps argument + with self.assertRaises(TypeError): + dmd.predict_next(X, "this is clearly not an integer") + # what has not been implemented so far + with self.assertRaises(NotImplementedError): + dmd.predict(ht.zeros((1000, 5), split=0), 10) + + def test_dmd_functionality_split0_full(self): + # split=0, full SVD + X = ht.random.randn(10 * ht.MPI_WORLD.size, 10, split=0) + dmd = ht.decomposition.DMD(svd_solver="full") + dmd.fit(X) + self.assertTrue(dmd.rom_eigenmodes_.dtype == ht.complex64) + self.assertEqual(dmd.rom_eigenmodes_.shape, (dmd.n_modes_, dmd.n_modes_)) + dmd = ht.decomposition.DMD(svd_solver="full", svd_tol=1e-1) + dmd.fit(X) + self.assertTrue(dmd.rom_basis_.shape[0] == 10 * ht.MPI_WORLD.size) + dmd = ht.decomposition.DMD(svd_solver="full", svd_rank=3) + dmd.fit(X) + self.assertTrue(dmd.rom_basis_.shape[1] == 3) + self.assertTrue(dmd.dmdmodes_.shape == (10 * ht.MPI_WORLD.size, 3)) + + def test_dmd_functionality_split0_hierarchical(self): + # split=0, hierarchical SVD + X = ht.random.randn(10 * ht.MPI_WORLD.size, 10, split=0) + dmd = ht.decomposition.DMD(svd_solver="hierarchical", svd_rank=3) + dmd.fit(X) + self.assertTrue(dmd.rom_eigenvalues_.shape == (3,)) + dmd = ht.decomposition.DMD(svd_solver="hierarchical", svd_tol=1e-1) + dmd.fit(X) + Y = ht.random.randn(10 * ht.MPI_WORLD.size, split=0) + Z = dmd.predict_next(Y) + self.assertTrue(Z.shape == (10 * ht.MPI_WORLD.size,)) + self.assertTrue(dmd.rom_eigenvalues_.dtype == ht.complex64) + self.assertTrue(dmd.dmdmodes_.dtype == ht.complex64) + + def test_dmd_functionality_split0_randomized(self): + # split=0, randomized SVD + X = ht.random.randn(1000, 10 * ht.MPI_WORLD.size, split=0, dtype=ht.float32) + dmd = ht.decomposition.DMD(svd_solver="randomized", svd_rank=4) + dmd.fit(X) + Y = ht.random.rand(1000, 2 * ht.MPI_WORLD.size, split=1, dtype=ht.float32) + Z = dmd.predict_next(Y, 2) + self.assertTrue(Z.dtype == ht.float32) + self.assertEqual(Z.shape, Y.shape) + Y = ht.random.rand(1000, split=0, dtype=ht.float32) + Z = dmd.predict_next(Y, 2) + self.assertTrue(Z.dtype == ht.float32) + self.assertEqual(Z.shape, Y.shape) + + def test_dmd_functionality_split1_full(self): + # split=1, full SVD + X = ht.random.randn(10, 10 * ht.MPI_WORLD.size, split=1, dtype=ht.float64) + dmd = ht.decomposition.DMD(svd_solver="full") + print(dmd) + dmd.fit(X) + print(dmd) + self.assertTrue(dmd.dmdmodes_.shape[0] == 10) + dmd = ht.decomposition.DMD(svd_solver="full", svd_tol=1e-1) + dmd.fit(X) + dmd = ht.decomposition.DMD(svd_solver="full", svd_rank=3) + dmd.fit(X) + self.assertTrue(dmd.dmdmodes_.shape[1] == 3) + + def test_dmd_functionality_split1_hierarchical(self): + # split=1, hierarchical SVD + X = ht.random.randn(10, 10 * ht.MPI_WORLD.size, split=1, dtype=ht.float64) + dmd = ht.decomposition.DMD(svd_solver="hierarchical", svd_rank=3) + dmd.fit(X) + self.assertTrue(dmd.rom_transfer_matrix_.shape == (3, 3)) + self.assertTrue(dmd.rom_transfer_matrix_.dtype == ht.float64) + dmd = ht.decomposition.DMD(svd_solver="hierarchical", svd_tol=1e-1) + dmd.fit(X) + self.assertTrue(dmd.rom_eigenvalues_.dtype == ht.complex128) + Y = ht.random.randn(10, 2 * ht.MPI_WORLD.size, split=1) + Z = dmd.predict_next(Y) + self.assertTrue(Z.shape == Y.shape) + + def test_dmd_functionality_split1_randomized(self): + # split=1, randomized SVD + X =
ht.random.randn(1000, 10 * ht.MPI_WORLD.size, split=0) + dmd = ht.decomposition.DMD(svd_solver="randomized", svd_rank=4) + dmd.fit(X) + self.assertTrue(dmd.rom_eigenmodes_.shape == (4, 4)) + self.assertTrue(dmd.n_modes_ == 4) + Y = ht.random.randn(1000, 2, split=0, dtype=ht.float64) + Z = dmd.predict_next(Y) + self.assertTrue(Z.dtype == Y.dtype) + self.assertEqual(Z.shape, Y.shape) + + def test_dmd_correctness_split0(self): + ht.random.seed(25032025) + # test correctness using a constructed example with known solution + # to do so we need to use the exact SVD, i.e., the "full" solver + r = 6 + A_red = ht.array( + [ + [0.0, -1.0, 0.0, 0.0, 0.0, 0.0], + [1.0, 0.0, 0.0, 0.0, 0.0, 0.0], + [0.0, 0.0, 1.5, 0.0, 0.0, 0.0], + [0.0, 0.0, 0.0, 0.5, 0.0, 0.0], + [0.0, 0.0, 0.0, 0.0, -1.5, 0.0], + [0.0, 0.0, 0.0, 0.0, 0.0, -0.5], + ], + split=None, + dtype=ht.float32, + ) + x0_red = ht.random.randn(r, 1, split=None) + m, n = 25 * ht.MPI_WORLD.size, 15 + X = ht.hstack( + [ + (ht.array(torch.linalg.matrix_power(A_red.larray, i) @ x0_red.larray)) + for i in range(n + 1) + ] + ) + U = ht.random.randn(m, r, split=0) + U, _ = ht.linalg.qr(U) + X = U @ X + + dmd = ht.decomposition.DMD(svd_solver="full", svd_rank=r) + dmd.fit(X) + + # check whether the DMD-modes are correct + sorted_ev_1 = np.sort_complex(dmd.rom_eigenvalues_.numpy()) + sorted_ev_2 = np.sort_complex(np.linalg.eigvals(A_red.numpy())) + self.assertTrue(np.allclose(sorted_ev_1, sorted_ev_2, atol=1e-3, rtol=1e-3)) + + # check prediction of next states + Y = dmd.predict_next(X) + self.assertTrue(ht.allclose(Y[:, :n], X[:, 1:], atol=1e-3, rtol=1e-3)) + + # check prediction of previous states + Y = dmd.predict_next(X, -1) + self.assertTrue(ht.allclose(Y[:, 1:], X[:, :n], atol=1e-3, rtol=1e-3)) + + def test_dmd_correctness_split1(self): + # dtype is float64, transfer matrix with nontrivial kernel + r = 3 + A_red = ht.array( + [[0.0, 0.0, 1.0], [0.5, 0.0, 0.0], [0.5, 0.0, 0.0]], split=None, dtype=ht.float64 + ) + x0_red = ht.random.randn(r, 1, split=None, dtype=ht.float64) + m, n = 10, 15 * ht.MPI_WORLD.size + 2 + X = ht.hstack( + [ + (ht.array(torch.linalg.matrix_power(A_red.larray, i) @ x0_red.larray)) + for i in range(n + 1) + ] + ) + U = ht.random.randn(m, r, split=None, dtype=ht.float64) + U, _ = ht.linalg.qr(U) + X = U @ X + X = X.resplit_(1) + + dmd = ht.decomposition.DMD(svd_solver="hierarchical", svd_rank=r) + dmd.fit(X) + + # check whether the DMD-modes are correct + sorted_ev_1 = np.sort_complex(dmd.rom_eigenvalues_.numpy()) + sorted_ev_2 = np.sort_complex(np.linalg.eigvals(A_red.numpy())) + self.assertTrue(np.allclose(sorted_ev_1, sorted_ev_2, atol=1e-12, rtol=1e-12)) + + # check prediction of third-next step + Y = dmd.predict_next(X, 3) + self.assertTrue(ht.allclose(Y[:, : n - 2], X[:, 3:], atol=1e-12, rtol=1e-12)) + # note: checking previous steps doesn't make sense here, as kernel of A_red is nontrivial + + # check batch prediction (split = 1) + X_batch = X[:, : 5 * ht.MPI_WORLD.size] + X_batch.balance_() + Y = dmd.predict(X_batch, 5) + Y_np = Y.numpy() + X_np = X.numpy() + for i in range(5): + self.assertTrue(np.allclose(Y_np[i, :, :5], X_np[:, i : i + 5], atol=1e-12, rtol=1e-12)) + + # check batch prediction (split = None) + X_batch = ht.random.rand(10, 2 * ht.MPI_WORLD.size, split=None) + Y = dmd.predict(X_batch, [-1, 1, 3]) + + +class TestDMDc(TestCase): + def test_dmdc_setup_catch_wrong(self): + # catch wrong inputs + with self.assertRaises(TypeError): + ht.decomposition.DMDc(svd_solver=0) + with
self.assertRaises(ValueError): + ht.decomposition.DMDc(svd_solver="Gramian") + with self.assertRaises(ValueError): + ht.decomposition.DMDc(svd_solver="full", svd_rank=3, svd_tol=1e-1) + with self.assertRaises(ValueError): + ht.decomposition.DMDc(svd_solver="full", svd_tol=-0.031415926) + with self.assertRaises(ValueError): + ht.decomposition.DMDc(svd_solver="hierarchical") + with self.assertRaises(ValueError): + ht.decomposition.DMDc(svd_solver="hierarchical", svd_rank=3, svd_tol=1e-1) + with self.assertRaises(ValueError): + ht.decomposition.DMDc(svd_solver="randomized") + with self.assertRaises(ValueError): + ht.decomposition.DMDc(svd_solver="randomized", svd_rank=2, svd_tol=1e-1) + with self.assertRaises(TypeError): + ht.decomposition.DMDc(svd_solver="full", svd_rank=0.1) + with self.assertRaises(ValueError): + ht.decomposition.DMDc(svd_solver="hierarchical", svd_rank=0) + with self.assertRaises(TypeError): + ht.decomposition.DMDc(svd_solver="hierarchical", svd_tol="auto") + with self.assertRaises(ValueError): + ht.decomposition.DMDc(svd_solver="randomized", svd_rank=0) + + def test_dmdc_fit_catch_wrong(self): + dmd = ht.decomposition.DMDc(svd_solver="full") + # wrong dimensions of input + with self.assertRaises(ValueError): + dmd.fit(ht.zeros((5 * ht.MPI_WORLD.size, 2, 2), split=0), ht.zeros((2, 4), split=0)) + with self.assertRaises(ValueError): + dmd.fit(ht.zeros((2, 4), split=0), ht.zeros((5 * ht.MPI_WORLD.size, 2, 2), split=0)) + # less than two timesteps + with self.assertRaises(ValueError): + dmd.fit(ht.zeros((5 * ht.MPI_WORLD.size, 1), split=0), ht.zeros((2, 4), split=0)) + with self.assertRaises(ValueError): + dmd.fit(ht.zeros((2, 4), split=0), ht.zeros((5 * ht.MPI_WORLD.size, 1), split=0)) + # inconsistent number of timesteps + with self.assertRaises(ValueError): + dmd.fit(ht.zeros((5 * ht.MPI_WORLD.size, 3), split=0), ht.zeros((2, 4), split=0)) + # predict before fit + with self.assertRaises(RuntimeError): + dmd.predict(ht.zeros((5 * ht.MPI_WORLD.size, 3), split=0), ht.zeros((2, 4), split=0)) + X = ht.random.randn(1000, 10 * ht.MPI_WORLD.size, split=0, dtype=ht.float32) + dmd = ht.decomposition.DMDc(svd_solver="randomized", svd_rank=4) + # split mismatch for X and C + C = ht.random.randn(10, 10 * ht.MPI_WORLD.size, split=1) + with self.assertRaises(ValueError): + dmd.fit(X, C) + + def test_dmdc_predict_catch_wrong(self): + X = ht.random.randn(1000, 10 * ht.MPI_WORLD.size, split=0, dtype=ht.float32) + dmd = ht.decomposition.DMDc(svd_solver="randomized", svd_rank=4) + C = ht.random.randn(10, 10 * ht.MPI_WORLD.size, split=None) + dmd.fit(X, C) + Y = ht.random.randn(1000, 10 * ht.MPI_WORLD.size, split=1) + # wrong dimensions of input for prediction + with self.assertRaises(ValueError): + dmd.predict(Y, ht.zeros((5, 5, 5), split=0)) + with self.assertRaises(ValueError): + dmd.predict(ht.zeros((5, 5, 5), split=0), C) + # wrong sizes for inputs in predict + with self.assertRaises(ValueError): + dmd.predict(Y, ht.zeros((10, 5), split=0)) + with self.assertRaises(ValueError): + dmd.predict(ht.zeros((1000, 5), split=0), C) + # wrong split for C + with self.assertRaises(ValueError): + dmd.predict(Y, ht.zeros((10, 5), split=1)) + # wrong shape for C + with self.assertRaises(ValueError): + dmd.predict(Y, ht.zeros((5, 5), split=None)) + + def test_dmdc_functionality_split0_full(self): + # split=0, full SVD + X = ht.random.randn(10 * ht.MPI_WORLD.size, 10, split=0) + C = ht.random.randn(10, 10, split=0) + dmd = ht.decomposition.DMDc(svd_solver="full") + print(dmd)
+ dmd.fit(X, C) + print(dmd) + self.assertTrue(dmd.rom_eigenmodes_.dtype == ht.complex64) + self.assertEqual(dmd.rom_eigenmodes_.shape, (dmd.n_modes_, dmd.n_modes_)) + dmd = ht.decomposition.DMDc(svd_solver="full", svd_tol=1e-1) + dmd.fit(X, C) + self.assertTrue(dmd.rom_basis_.shape[0] == 10 * ht.MPI_WORLD.size) + dmd = ht.decomposition.DMDc(svd_solver="full", svd_rank=3) + dmd.fit(X, C) + self.assertTrue(dmd.rom_basis_.shape[1] == 3) + self.assertTrue(dmd.dmdmodes_.shape == (10 * ht.MPI_WORLD.size, 3)) + + def test_dmdc_functionality_split0_hierarchical(self): + # split=0, hierarchical SVD + X = ht.random.randn(10 * ht.MPI_WORLD.size, 10, split=0) + C = ht.random.randn(10, 10, split=0) + dmd = ht.decomposition.DMDc(svd_solver="hierarchical", svd_rank=3) + dmd.fit(X, C) + self.assertTrue(dmd.rom_eigenvalues_.shape == (3,)) + dmd = ht.decomposition.DMDc(svd_solver="hierarchical", svd_tol=1e-1) + dmd.fit(X, C) + Y = ht.random.randn(3, 10 * ht.MPI_WORLD.size, split=1) + C = ht.random.randn(10, 5, split=None) + Z = dmd.predict(Y, C) + self.assertTrue(Z.shape == (3, 10 * ht.MPI_WORLD.size, 5)) + self.assertTrue(dmd.rom_eigenvalues_.dtype == ht.complex64) + self.assertTrue(dmd.dmdmodes_.dtype == ht.complex64) + + def test_dmdc_functionality_split0_randomized(self): + # split=0, randomized SVD + X = ht.random.randn(1000, 10 * ht.MPI_WORLD.size, split=0, dtype=ht.float32) + dmd = ht.decomposition.DMDc(svd_solver="randomized", svd_rank=4) + C = ht.random.randn(10, 10 * ht.MPI_WORLD.size, split=None) + dmd.fit(X, C) + Y = ht.random.rand(2 * ht.MPI_WORLD.size, 1000, split=0, dtype=ht.float32) + C = ht.random.rand(10, 5, split=None) + Z = dmd.predict(Y, C) + self.assertTrue(Z.dtype == ht.float32) + self.assertEqual(Z.shape, (2 * ht.MPI_WORLD.size, 1000, 5)) + + def test_dmdc_functionality_split1_full(self): + # split=1, full SVD + X = ht.random.randn(10, 15 * ht.MPI_WORLD.size, split=1, dtype=ht.float64) + C = ht.random.randn(2, 15 * ht.MPI_WORLD.size, split=1, dtype=ht.float64) + dmd = ht.decomposition.DMDc(svd_solver="full") + dmd.fit(X, C) + self.assertTrue(dmd.dmdmodes_.shape[0] == 10) + dmd = ht.decomposition.DMDc(svd_solver="full", svd_tol=1e-1) + dmd.fit(X, C) + dmd = ht.decomposition.DMDc(svd_solver="full", svd_rank=3) + dmd.fit(X, C) + self.assertTrue(dmd.dmdmodes_.shape[1] == 3) + + def test_dmdc_functionality_split1_hierarchical(self): + # split=1, hierarchical SVD + X = ht.random.randn(10, 15 * ht.MPI_WORLD.size, split=1, dtype=ht.float64) + C = ht.random.randn(2, 15 * ht.MPI_WORLD.size, split=1, dtype=ht.float64) + dmd = ht.decomposition.DMDc(svd_solver="hierarchical", svd_rank=3) + dmd.fit(X, C) + self.assertTrue(dmd.rom_transfer_matrix_.shape == (3, 3)) + self.assertTrue(dmd.rom_transfer_matrix_.dtype == ht.float64) + dmd = ht.decomposition.DMDc(svd_solver="hierarchical", svd_tol=1e-1) + dmd.fit(X, C) + self.assertTrue(dmd.rom_eigenvalues_.dtype == ht.complex128) + Y = ht.random.randn(10 * ht.MPI_WORLD.size, 10, split=0) + C = ht.random.randn(2, split=None) + Z = dmd.predict(Y, C) + self.assertTrue(Z.shape == (10 * ht.MPI_WORLD.size, 10, 1)) + + def test_dmdc_functionality_split1_randomized(self): + # split=1, randomized SVD + X = ht.random.randn(1000, 10 * ht.MPI_WORLD.size, split=0) + C = ht.random.randn(10, 10 * ht.MPI_WORLD.size, split=None) + dmd = ht.decomposition.DMDc(svd_solver="randomized", svd_rank=8) + dmd.fit(X, C) + self.assertTrue(dmd.rom_eigenmodes_.shape == (8, 8)) + self.assertTrue(dmd.n_modes_ == 8) + Y = ht.random.randn(1000, split=0, dtype=ht.float64) + Z = 
dmd.predict(Y, C) + self.assertTrue(Z.dtype == Y.dtype) + self.assertEqual(Z.shape, (1, 1000, 10 * ht.MPI_WORLD.size)) + + def test_dmdc_correctness_split0(self): + # check correctness using a constructed example with known solution, + # thus only the "full" solver is used + r = 3 + A_red = ht.array( + [ + [0.0, 1, 0.0], + [-1.0, 0.0, 0.0], + [0.0, 0.0, 0.1], + ], + split=None, + dtype=ht.float64, + ) + B_red = ht.array( + [ + [1.0, 0.0], + [0.0, -1.0], + [0.0, 1.0], + ], + split=None, + dtype=ht.float64, + ) + x0_red = ht.array( + [ + [ + 10.0, + ], + [ + 5.0, + ], + [ + -10.0, + ], + ], + split=None, + dtype=ht.float64, + ) + m, n = 10 * ht.MPI_WORLD.size, 10 + C = 0.1 * ht.ones((2, n), split=None, dtype=ht.float64) + X_red = [x0_red] + for k in range(n - 1): + X_red.append(A_red @ X_red[-1] + B_red @ C[:, k].reshape(-1, 1)) + X = ht.stack(X_red, axis=1).squeeze() + U = ht.random.randn(m, r, split=0, dtype=ht.float64) + U, _ = ht.linalg.qr(U) + X = U @ X + + dmd = ht.decomposition.DMDc(svd_solver="full", svd_rank=3) + dmd.fit(X, C) + + # check whether the DMD-modes are correct + sorted_ev_1 = np.sort_complex(dmd.rom_eigenvalues_.numpy()) + sorted_ev_2 = np.sort_complex(np.linalg.eigvals(A_red.numpy())) + self.assertTrue(np.allclose(sorted_ev_1, sorted_ev_2, atol=1e-12, rtol=1e-12)) + + # check if DMD fits the data correctly + X_red = dmd.rom_basis_.T @ X + X_res = ( + X_red[:, 1:] + - dmd.rom_transfer_matrix_ @ X_red[:, :-1] + - dmd.rom_control_matrix_ @ C[:, :-1] + ) + self.assertTrue(ht.max(ht.abs(X_res)) < 1e-10) + + # check predict + Y = dmd.predict(X[:, 0], C[:, :10]).squeeze() + + # check prediction of next states + Y_red = dmd.rom_basis_.T @ Y + Y_res = ( + Y_red[:, 1:] + - dmd.rom_transfer_matrix_ @ Y_red[:, :-1] + - dmd.rom_control_matrix_ @ C[:, :-1] + ) + self.assertTrue(ht.max(ht.abs(Y_res)) < 1e-10) + self.assertTrue(ht.allclose(Y[:, :], X[:, :10], atol=1e-10, rtol=1e-10)) + + def test_dmdc_correctness_split1(self): + # check correctness using a constructed example with known solution, + # thus only the "full" solver is used + A_red = ht.array( + [ + [ + 1.0, + 0.0, + 0.0, + 0.0, + 0.0, + ], + [ + 0.0, + 1.05, + 0.0, + 0.0, + 0.0, + ], + [ + 0.0, + 0.0, + -0.1, + 0.0, + 0.0, + ], + [ + 0.0, + 0.0, + 0.0, + 0.0, + 0.5, + ], + [ + 0.0, + 0.0, + 0.0, + -0.5, + 0.0, + ], + ], + split=None, + dtype=ht.float32, + ) + B_red = ht.array( + [ + [1.0, 0.0], + [0.0, 1.0], + [1.0, 0.0], + [0.0, 1.0], + [0.0, 0.0], + ], + split=None, + dtype=ht.float32, + ) + x0_red = ht.ones((5, 1), split=None, dtype=ht.float32) + n = 20 * ht.MPI_WORLD.size + C = 0.1 * ht.random.randn(2, n, split=None, dtype=ht.float32) + X_red = [x0_red] + for k in range(n - 1): + X_red.append(A_red @ X_red[-1] + B_red @ C[:, k].reshape(-1, 1)) + X = ht.stack(X_red, axis=1).squeeze() + X.resplit_(1) + + dmd = ht.decomposition.DMDc(svd_solver="full") + dmd.fit(X, C) + + # check whether the DMD-modes are correct + sorted_ev_1 = np.sort_complex(dmd.rom_eigenvalues_.numpy()) + sorted_ev_2 = np.sort_complex(np.linalg.eigvals(A_red.numpy())) + self.assertTrue(np.allclose(sorted_ev_1, sorted_ev_2, atol=1e-4, rtol=1e-4)) + + # check if DMD fits the data correctly + X_red = dmd.rom_basis_.T @ X + X_red.resplit_(None) + X_res = ( + X_red[:, 1:] + - dmd.rom_transfer_matrix_ @ X_red[:, :-1] + - dmd.rom_control_matrix_ @ C[:, :-1] + ) + self.assertTrue(ht.max(ht.abs(X_res)) < 1e-2) + + # check predict + Y = dmd.predict(X[:, 0], C).squeeze() + + # check prediction of next states + Y_red = dmd.rom_basis_.T @ Y + Y_res
= ( + Y_red[:, 1:] + - dmd.rom_transfer_matrix_ @ Y_red[:, :-1] + - dmd.rom_control_matrix_ @ C[:, :-1] + ) + self.assertTrue(ht.max(ht.abs(Y_res)) < 1e-2) + self.assertTrue(ht.allclose(Y[:, :], X[:, :], atol=1e-2, rtol=1e-2)) diff --git a/heat/decomposition/tests/test_pca.py b/heat/decomposition/tests/test_pca.py index ffe6d52750..361272ec91 100644 --- a/heat/decomposition/tests/test_pca.py +++ b/heat/decomposition/tests/test_pca.py @@ -20,10 +20,10 @@ def test_pca_setup(self): self.assertEqual(pca.whiten, False) self.assertEqual(pca.svd_solver, "hierarchical") self.assertEqual(pca.tol, None) - self.assertEqual(pca.iterated_power, "auto") + self.assertEqual(pca.iterated_power, 0) self.assertEqual(pca.n_oversamples, 10) self.assertEqual(pca.power_iteration_normalizer, "qr") - self.assertEqual(pca.random_state, 0) + self.assertEqual(pca.random_state, None) # check catching of invalid parameters # wrong withening @@ -115,7 +115,6 @@ def test_pca_with_hiearchical_rtol(self): and pca.total_explained_variance_ratio_ >= 0.0 and pca.total_explained_variance_ratio_ <= 1.0 ) - print(pca.total_explained_variance_ratio_) self.assertTrue(pca.total_explained_variance_ratio_ >= ratio) if ht.MPI_WORLD.size > 1: self.assertEqual(pca.explained_variance_, None) @@ -192,8 +191,147 @@ def test_pca_with_full_rtol(self): self.assertEqual(pca.noise_variance_, None) def test_pca_randomized(self): - pca = ht.decomposition.PCA(n_components=2, svd_solver="randomized") + rank = 2 + pca = ht.decomposition.PCA(n_components=rank, svd_solver="randomized") data = ht.random.randn(15 * ht.MPI_WORLD.size, 5, split=0) + + pca.fit(data) + self.assertEqual(pca.components_.shape, (rank, 5)) + self.assertEqual(pca.n_components_, rank) + self.assertEqual(pca.mean_.shape, (5,)) + if ht.MPI_WORLD.size > 1: - with self.assertRaises(NotImplementedError): - pca.fit(data) + self.assertEqual(pca.total_explained_variance_ratio_, None) + self.assertEqual(pca.noise_variance_, None) + self.assertEqual(pca.explained_variance_, None) + self.assertEqual(pca.explained_variance_ratio_, None) + self.assertEqual(pca.singular_values_, None) + + pca = ht.decomposition.PCA(n_components=None, svd_solver="randomized", random_state=1234) + self.assertEqual(ht.random.get_state()[1], 1234) + + +class TestIncrementalPCA(TestCase): + def test_incrementalpca_setup(self): + pca = ht.decomposition.IncrementalPCA(n_components=2) + + # check correct base classes + self.assertTrue(ht.is_estimator(pca)) + self.assertTrue(ht.is_transformer(pca)) + + # check correct default values + self.assertEqual(pca.n_components, 2) + self.assertEqual(pca.whiten, False) + self.assertEqual(pca.batch_size, None) + self.assertEqual(pca.components_, None) + self.assertEqual(pca.singular_values_, None) + self.assertEqual(pca.mean_, None) + self.assertEqual(pca.n_components_, None) + self.assertEqual(pca.batch_size_, None) + self.assertEqual(pca.n_samples_seen_, 0) + + # check catching of invalid parameters + # whitening and in-place are not yet supported + with self.assertRaises(NotImplementedError): + ht.decomposition.IncrementalPCA(whiten=True) + with self.assertRaises(NotImplementedError): + ht.decomposition.IncrementalPCA(copy=False) + # wrong n_components + with self.assertRaises(TypeError): + ht.decomposition.IncrementalPCA(n_components=0.9) + with self.assertRaises(ValueError): + ht.decomposition.IncrementalPCA(n_components=0) + + def test_incrementalpca_full_rank_reached_split0(self): + # full rank is reached, split = 0 + # dtype float32 + pca = 
ht.decomposition.IncrementalPCA() + data0 = ht.random.randn(150 * ht.MPI_WORLD.size, 2 * ht.MPI_WORLD.size + 1, split=0) + data1 = 1.0 + ht.random.rand(50 * ht.MPI_WORLD.size, 2 * ht.MPI_WORLD.size + 1, split=0) + data = ht.vstack([data0, data1]) + data0_np = data0.numpy() + data_np = data.numpy() + + # test partial_fit, step 0 + pca.partial_fit(data0) + self.assertEqual( + pca.components_.shape, (2 * ht.MPI_WORLD.size + 1, 2 * ht.MPI_WORLD.size + 1) + ) + self.assertEqual(pca.n_components_, 2 * ht.MPI_WORLD.size + 1) + self.assertEqual(pca.mean_.shape, (2 * ht.MPI_WORLD.size + 1,)) + self.assertEqual(pca.singular_values_.shape, (2 * ht.MPI_WORLD.size + 1,)) + self.assertEqual(pca.n_samples_seen_, 150 * ht.MPI_WORLD.size) + s0_np = np.linalg.svd(data0_np - data0_np.mean(axis=0), compute_uv=False, hermitian=False) + self.assertTrue(np.allclose(s0_np, pca.singular_values_.numpy())) + + # test partial_fit, step 1 + pca.partial_fit(data1) + self.assertEqual( + pca.components_.shape, (2 * ht.MPI_WORLD.size + 1, 2 * ht.MPI_WORLD.size + 1) + ) + self.assertEqual(pca.n_components_, 2 * ht.MPI_WORLD.size + 1) + self.assertTrue(ht.allclose(pca.mean_, ht.mean(data, axis=0))) + self.assertEqual(pca.singular_values_.shape, (2 * ht.MPI_WORLD.size + 1,)) + self.assertEqual(pca.n_samples_seen_, 200 * ht.MPI_WORLD.size) + s_np = np.linalg.svd(data_np - data_np.mean(axis=0), compute_uv=False, hermitian=False) + self.assertTrue(np.allclose(s_np, pca.singular_values_.numpy())) + + # test transform (only possible here, as in the next test truncation happens) + new_data = ht.random.rand(100, 2 * ht.MPI_WORLD.size + 1, split=1) + Y = pca.transform(new_data) + Z = pca.inverse_transform(Y) + self.assertTrue(ht.allclose(new_data, Z, atol=1e-4, rtol=1e-4)) + + def test_incrementalpca_truncation_happens_split1(self): + # full rank not reached, but truncation happens, split = 1 + # dtype float64 unless on MPS + dtype = ht.float64 if not self.is_mps else ht.float32 + pca = ht.decomposition.IncrementalPCA(n_components=15) + data0 = ht.random.randn(9, 100 * ht.MPI_WORLD.size + 1, split=1, dtype=dtype) + data1 = 1.0 + ht.random.rand(11, 100 * ht.MPI_WORLD.size + 1, split=1, dtype=dtype) + data = ht.vstack([data0, data1]) + data0_np = data0.numpy() + data_np = data.numpy() + + # test partial_fit, step 0 + pca.partial_fit(data0) + self.assertEqual(pca.components_.shape, (9, 100 * ht.MPI_WORLD.size + 1)) + self.assertEqual(pca.components_.dtype, dtype) + self.assertEqual(pca.n_components_, 9) + self.assertEqual(pca.mean_.shape, (100 * ht.MPI_WORLD.size + 1,)) + self.assertEqual(pca.mean_.dtype, dtype) + self.assertEqual(pca.singular_values_.shape, (9,)) + self.assertEqual(pca.singular_values_.dtype, dtype) + self.assertEqual(pca.n_samples_seen_, 9) + s0_np = np.linalg.svd(data0_np - data0_np.mean(axis=0), compute_uv=False, hermitian=False) + if not self.is_mps: + self.assertTrue(np.allclose(s0_np, pca.singular_values_.numpy(), atol=1e-12)) + + # test partial_fit, step 1 + # here actually truncation happens as we have rank 20 but n_components=15 + pca.partial_fit(data1) + self.assertEqual(pca.components_.shape, (15, 100 * ht.MPI_WORLD.size + 1)) + self.assertEqual(pca.n_components_, 15) + self.assertEqual(pca.mean_.shape, (100 * ht.MPI_WORLD.size + 1,)) + self.assertEqual(pca.singular_values_.shape, (15,)) + self.assertEqual(pca.n_samples_seen_, 20) + s_np = np.linalg.svd(data_np - data_np.mean(axis=0), compute_uv=False, hermitian=False) + self.assertTrue(np.allclose(s_np[:15], pca.singular_values_.numpy())) + + def 
test_incrementalpca_catch_wrong_inputs(self): + pca = ht.decomposition.IncrementalPCA(n_components=1) + data0 = ht.random.randn(15, 15, split=None) + + # fit is not yet implemented + with self.assertRaises(NotImplementedError): + pca.fit(data0) + # wrong input for partial_fit + with self.assertRaises(ValueError): + pca.partial_fit(data0, y="Why can't we get rid of this argument?") + + pca.partial_fit(data0) + # wrong inputs for transform and inverse transform + with self.assertRaises(ValueError): + pca.transform(ht.zeros((15, 16), split=None)) + with self.assertRaises(ValueError): + pca.inverse_transform(ht.zeros((17, 2), split=None)) diff --git a/heat/fft/tests/test_fft.py b/heat/fft/tests/test_fft.py index e3ff6bc0de..b0ecdc68b0 100644 --- a/heat/fft/tests/test_fft.py +++ b/heat/fft/tests/test_fft.py @@ -1,24 +1,41 @@ import numpy as np import torch import unittest +import platform +import os import heat as ht from heat.core.tests.test_suites.basic_test import TestCase torch_ihfftn = hasattr(torch.fft, "ihfftn") +# On MPS, FFTs only supported for MacOS 14+ +envar = os.getenv("HEAT_TEST_USE_DEVICE", "cpu") +is_mps = envar == "gpu" and platform.system() == "Darwin" + +@unittest.skipIf( + is_mps and int(platform.mac_ver()[0].split(".")[0]) < 14, + "FFT on Apple MPS only supported on MacOS 14+", +) class TestFFT(TestCase): def test_fft_ifft(self): + dtype = ht.float32 if self.is_mps else ht.float64 # 1D non-distributed - x = ht.random.randn(6, dtype=ht.float64) + x = ht.random.randn(6, dtype=dtype) y = ht.fft.fft(x) np_y = np.fft.fft(x.numpy()) + if self.is_mps: + np_y = np_y.astype(np.complex64) self.assertIsInstance(y, ht.DNDarray) self.assertEqual(y.shape, x.shape) - self.assert_array_equal(y, np_y) - backwards = ht.fft.ifft(y) - self.assertTrue(ht.allclose(backwards, x)) + if not self.is_mps: + # precision loss on imaginary part of single elements of MPS tensor + self.assert_array_equal(y, np_y) + # backwards transform buggy on MPS, see + # https://github.com/pytorch/pytorch/issues/124096 + backwards = ht.fft.ifft(y) + self.assertTrue(ht.allclose(backwards, x)) # 1D distributed x = ht.random.randn(6, split=0) @@ -28,10 +45,12 @@ def test_fft_ifft(self): self.assertIsInstance(y, ht.DNDarray) self.assertEqual(y.shape, np_y.shape) self.assertTrue(y.split == 0) - self.assert_array_equal(y, np_y) + if not self.is_mps: + # precision loss on imaginary part of single elements of MPS tensor + self.assert_array_equal(y, np_y) # n-D distributed - x = ht.random.randn(10, 8, 6, dtype=ht.float64, split=0) + x = ht.random.randn(10, 8, 6, dtype=dtype, split=0) # FFT along last axis n = 5 y = ht.fft.fft(x, n=n) @@ -39,7 +58,9 @@ def test_fft_ifft(self): self.assertIsInstance(y, ht.DNDarray) self.assertEqual(y.shape, np_y.shape) self.assertTrue(y.split == 0) - self.assert_array_equal(y, np_y) + if not self.is_mps: + # precision loss on imaginary part of single elements of MPS tensor + self.assert_array_equal(y, np_y) # FFT along distributed axis, n not None n = 8 @@ -48,10 +69,12 @@ def test_fft_ifft(self): self.assertIsInstance(y, ht.DNDarray) self.assertEqual(y.shape, np_y.shape) self.assertTrue(y.split == 0) - self.assert_array_equal(y, np_y) + if not self.is_mps: + # precision loss on imaginary part of single elements of MPS tensor + self.assert_array_equal(y, np_y) # complex input - x = x + 1j * ht.random.randn(10, 8, 6, dtype=ht.float64, split=0) + x = x + 1j * ht.random.randn(10, 8, 6, dtype=dtype, split=0) # FFT along last axis (distributed) x.resplit_(axis=2) y = ht.fft.fft(x, n=n) @@ -75,24 
+98,32 @@ def test_fft_ifft(self): ht.fft.fft(x, axis=(0, 1)) def test_fft2_ifft2(self): + dtype = ht.float32 if self.is_mps else ht.float64 # 2D FFT along non-split axes - x = ht.random.randn(3, 6, 6, split=0, dtype=ht.float64) + x = ht.random.randn(3, 6, 6, split=0, dtype=dtype) y = ht.fft.fft2(x) np_y = np.fft.fft2(x.numpy()) self.assertTrue(y.split == 0) self.assert_array_equal(y, np_y) - backwards = ht.fft.ifft2(y) - self.assertTrue(ht.allclose(backwards, x)) + if not self.is_mps: + # backwards transform buggy on MPS, see + # https://github.com/pytorch/pytorch/issues/124096 + backwards = ht.fft.ifft2(y) + self.assertTrue(ht.allclose(backwards, x)) # 2D FFT along split axes - x = ht.random.randn(10, 6, 6, split=0, dtype=ht.float64) + x = ht.random.randn(10, 6, 6, split=0, dtype=dtype) axes = (0, 1) y = ht.fft.fft2(x, axes=axes) np_y = np.fft.fft2(x.numpy(), axes=axes) self.assertTrue(y.split == 0) - self.assert_array_equal(y, np_y) - backwards = ht.fft.ifft2(y, axes=axes) - self.assertTrue(ht.allclose(backwards, x)) + if not self.is_mps: + # precision loss on imaginary part of single elements of MPS tensor + self.assert_array_equal(y, np_y) + # backwards transform buggy on MPS, see + # https://github.com/pytorch/pytorch/issues/124096 + backwards = ht.fft.ifft2(y, axes=axes) + self.assertTrue(ht.allclose(backwards, x)) # exceptions x = ht.arange(10, split=0) @@ -100,6 +131,7 @@ def test_fft2_ifft2(self): ht.fft.fft2(x) def test_fftn_ifftn(self): + dtype = ht.float32 if self.is_mps else ht.float64 # 1D non-distributed x = ht.random.randn(6) y = ht.fft.fftn(x) @@ -107,8 +139,11 @@ def test_fftn_ifftn(self): self.assertIsInstance(y, ht.DNDarray) self.assertEqual(y.shape, x.shape) self.assert_array_equal(y, np_y) - backwards = ht.fft.ifftn(y) - self.assertTrue(ht.allclose(backwards, x, atol=1e-7)) + if not self.is_mps: + # backwards transform buggy on MPS, see + # https://github.com/pytorch/pytorch/issues/124096 + backwards = ht.fft.ifftn(y) + self.assertTrue(ht.allclose(backwards, x, atol=1e-7)) # 1D distributed x = ht.random.randn(6, split=0) @@ -120,10 +155,10 @@ def test_fftn_ifftn(self): self.assert_array_equal(y, np_y) # n-D distributed - x = ht.random.randn(10, 8, 6, dtype=ht.float64, split=0) + x = ht.random.randn(10, 8, 6, dtype=dtype, split=0) # FFT along last 2 axes y = ht.fft.fftn(x, s=(6, 6)) - np_y = np.fft.fftn(x.numpy(), s=(6, 6)) + np_y = np.fft.fftn(x.numpy(), s=(6, 6), axes=(1, 2)) self.assertIsInstance(y, ht.DNDarray) self.assertEqual(y.shape, np_y.shape) self.assertTrue(y.split == 0) @@ -206,8 +241,11 @@ def test_fftshift_ifftshift(self): np_y = np.fft.fftshift(x.numpy()) self.assertEqual(y.shape, np_y.shape) self.assert_array_equal(y, np_y) - backwards = ht.fft.ifftshift(y) - self.assertTrue(ht.allclose(backwards, x)) + if not self.is_mps: + # backwards transform buggy on MPS, see + # https://github.com/pytorch/pytorch/issues/124096 + backwards = ht.fft.ifftshift(y) + self.assertTrue(ht.allclose(backwards, x)) # distributed # (following fftshift example from torch.fft) @@ -223,8 +261,10 @@ def test_fftshift_ifftshift(self): with self.assertRaises(IndexError): ht.fft.fftshift(x, axes=(0, 2)) + @unittest.skipIf(is_mps, "Insufficient precision on MPS") def test_hfft_ihfft(self): - x = ht.zeros((3, 5), split=0, dtype=ht.float64) + dtype = ht.float32 if self.is_mps else ht.float64 + x = ht.zeros((3, 5), split=0, dtype=dtype) edges = [1, 3, 7] for i, n in enumerate(edges): x[i] = ht.linspace(0, n, 5) @@ -237,8 +277,10 @@ def test_hfft_ihfft(self): reconstructed_x = 
ht.fft.hfft(inv_fft, n=n) self.assertEqual(reconstructed_x.shape[-1], n) + @unittest.skipIf(is_mps, "Insufficient precision on MPS") def test_hfft2_ihfft2(self): - x = ht.random.randn(10, 6, 6, dtype=ht.float64) + dtype = ht.float32 if self.is_mps else ht.float64 + x = ht.random.randn(10, 6, 6, dtype=dtype) if torch_ihfftn: inv_fft = ht.fft.ihfft2(x) reconstructed_x = ht.fft.hfft2(inv_fft, s=x.shape[-2:]) @@ -247,8 +289,10 @@ def test_hfft2_ihfft2(self): with self.assertRaises(NotImplementedError): ht.fft.ihfft2(x) + @unittest.skipIf(is_mps, "Insufficient precision on MPS") def test_hfftn_ihfftn(self): - x = ht.random.randn(10, 6, 6, dtype=ht.float64) + dtype = ht.float32 if self.is_mps else ht.float64 + x = ht.random.randn(10, 6, 6, dtype=dtype) if torch_ihfftn: inv_fft = ht.fft.ihfftn(x) reconstructed_x = ht.fft.hfftn(inv_fft, s=x.shape) @@ -260,36 +304,45 @@ def test_hfftn_ihfftn(self): ht.fft.ihfftn(x) def test_rfft_irfft(self): + dtype = ht.float32 if self.is_mps else ht.float64 # n-D distributed - x = ht.random.randn(10, 8, 3, dtype=ht.float64, split=0) + x = ht.random.randn(10, 8, 3, dtype=dtype, split=0) # FFT along last axis y = ht.fft.rfft(x) np_y = np.fft.rfft(x.numpy()) self.assertTrue(y.split == 0) self.assert_array_equal(y, np_y) - backwards = ht.fft.irfft(y, n=x.shape[-1]) - self.assertTrue(ht.allclose(backwards, x)) - backwards_no_n = ht.fft.irfft(y) - self.assertEqual(backwards_no_n.shape[-1], 2 * (y.shape[-1] - 1)) + if not self.is_mps: + # backwards transform buggy on MPS, see + # https://github.com/pytorch/pytorch/issues/124096 + backwards = ht.fft.irfft(y, n=x.shape[-1]) + self.assertTrue(ht.allclose(backwards, x)) + backwards_no_n = ht.fft.irfft(y) + self.assertEqual(backwards_no_n.shape[-1], 2 * (y.shape[-1] - 1)) # exceptions # complex input - x = x + 1j * ht.random.randn(10, 8, 3, dtype=ht.float64, split=0) + x = x + 1j * ht.random.randn(10, 8, 3, dtype=dtype, split=0) with self.assertRaises(TypeError): ht.fft.rfft(x) def test_rfftn_irfftn(self): + dtype = ht.float32 if self.is_mps else ht.float64 # n-D distributed - x = ht.random.randn(10, 8, 6, dtype=ht.float64, split=0) + x = ht.random.randn(10, 8, 6, dtype=dtype, split=0) # FFT along last 2 axes y = ht.fft.rfftn(x, axes=(1, 2)) np_y = np.fft.rfftn(x.numpy(), axes=(1, 2)) self.assertIsInstance(y, ht.DNDarray) self.assertEqual(y.shape, np_y.shape) self.assertTrue(y.split == 0) - self.assert_array_equal(y, np_y) - backwards = ht.fft.irfftn(y, s=x.shape[-2:]) - self.assertTrue(ht.allclose(backwards, x)) + if not self.is_mps: + # precision loss on imaginary part of single elements of MPS tensor + self.assert_array_equal(y, np_y) + # backwards transform buggy on MPS, see + # https://github.com/pytorch/pytorch/issues/124096 + backwards = ht.fft.irfftn(y, s=x.shape[-2:]) + self.assertTrue(ht.allclose(backwards, x)) # FFT along all axes # TODO: comment this out after merging indexing PR # y = ht.fft.rfftn(x) @@ -298,13 +351,14 @@ def test_rfftn_irfftn(self): # exceptions # complex input - x = x + 1j * ht.random.randn(10, 8, 6, dtype=ht.float64, split=0) + x = x + 1j * ht.random.randn(10, 8, 6, dtype=dtype, split=0) with self.assertRaises(TypeError): ht.fft.rfftn(x) def test_rfft2_irfft2(self): + dtype = ht.float32 if self.is_mps else ht.float64 # n-D distributed - x = ht.random.randn(4, 8, 6, dtype=ht.float64, split=0) + x = ht.random.randn(4, 8, 6, dtype=dtype, split=0) # FFT along last 2 axes y = ht.fft.rfft2(x, axes=(1, 2)) np_y = np.fft.rfft2(x.numpy(), axes=(1, 2)) @@ -313,5 +367,8 @@ def test_rfft2_irfft2(self): 
self.assertTrue(y.split == 0) self.assert_array_equal(y, np_y) - backwards = ht.fft.irfft2(y, s=x.shape[-2:]) - self.assertTrue(ht.allclose(backwards, x)) + if not self.is_mps: + # backwards transform buggy on MPS, see + # https://github.com/pytorch/pytorch/issues/124096 + backwards = ht.fft.irfft2(y, s=x.shape[-2:]) + self.assertTrue(ht.allclose(backwards, x)) diff --git a/heat/naive_bayes/gaussianNB.py b/heat/naive_bayes/gaussianNB.py index 9baaa50504..2cbb10cf08 100644 --- a/heat/naive_bayes/gaussianNB.py +++ b/heat/naive_bayes/gaussianNB.py @@ -108,7 +108,7 @@ def __check_partial_fit_first_call(self, classes: Optional[DNDarray] = None) -> set on :class:`GaussianNB`. """ if getattr(self, "classes_", None) is None and classes is None: - raise ValueError("classes must be passed on the first call " "to partial_fit.") + raise ValueError("classes must be passed on the first call to partial_fit.") elif classes is not None: unique_labels = classes @@ -273,7 +273,7 @@ def __partial_fit( raise ValueError("Sample weights must be 1D tensor") if sample_weight.shape != (n_samples,): raise ValueError( - f"sample_weight.shape == {sample_weight.shape}, expected {(n_samples, )}!" + f"sample_weight.shape == {sample_weight.shape}, expected {(n_samples,)}!" ) # If the ratio of data variance between dimensions is too small, it @@ -293,8 +293,12 @@ def __partial_fit( self.theta_ = ht.zeros((n_classes, n_features), dtype=x.dtype, device=x.device) self.sigma_ = ht.zeros((n_classes, n_features), dtype=x.dtype, device=x.device) + if x.larray.is_mps: + class_count_dtype = ht.float32 + else: + class_count_dtype = ht.types.promote_types(x.dtype, ht.float) self.class_count_ = ht.zeros( - (x.comm.size, n_classes), dtype=ht.float64, device=x.device, split=0 + (x.comm.size, n_classes), dtype=class_count_dtype, device=x.device, split=0 ) # Initialise the class prior # Take into account the priors @@ -305,7 +309,7 @@ def __partial_fit( priors = self.priors # Check that the provide prior match the number of classes if len(priors) != n_classes: - raise ValueError("Number of priors must match number of" " classes.") + raise ValueError("Number of priors must match number of classes.") # Check that the sum is 1 if not ht.isclose(priors.sum(), ht.array(1.0, dtype=priors.dtype)): raise ValueError("The sum of the priors should be 1.") @@ -316,7 +320,7 @@ def __partial_fit( else: # Initialize the priors to zeros for each class self.class_prior_ = ht.zeros( - len(self.classes_), dtype=ht.float64, split=None, device=x.device + len(self.classes_), dtype=class_count_dtype, split=None, device=x.device ) else: if x.shape[1] != self.theta_.shape[1]: diff --git a/heat/naive_bayes/tests/test_gaussiannb.py b/heat/naive_bayes/tests/test_gaussiannb.py index 57fe5122bc..3918c6d4a0 100644 --- a/heat/naive_bayes/tests/test_gaussiannb.py +++ b/heat/naive_bayes/tests/test_gaussiannb.py @@ -23,14 +23,16 @@ def test_get_and_set_params(self): self.assertEqual(1e-10, gnb.var_smoothing) def test_fit_iris(self): + if self.is_mps: + dtype = ht.float32 + else: + dtype = ht.float64 # load sklearn train/test sets and resulting probabilities - X_train = ht.load("heat/datasets/iris_X_train.csv", sep=";", dtype=ht.float64) - X_test = ht.load("heat/datasets/iris_X_test.csv", sep=";", dtype=ht.float64) + X_train = ht.load("heat/datasets/iris_X_train.csv", sep=";", dtype=dtype) + X_test = ht.load("heat/datasets/iris_X_test.csv", sep=";", dtype=dtype) y_train = ht.load("heat/datasets/iris_y_train.csv", sep=";", dtype=ht.int64).squeeze() y_test = 
ht.load("heat/datasets/iris_y_test.csv", sep=";", dtype=ht.int64).squeeze() - y_pred_proba_sklearn = ht.load( - "heat/datasets/iris_y_pred_proba.csv", sep=";", dtype=ht.float64 - ) + y_pred_proba_sklearn = ht.load("heat/datasets/iris_y_pred_proba.csv", sep=";", dtype=dtype) # test ht.GaussianNB from heat.naive_bayes import GaussianNB diff --git a/heat/nn/__init__.py b/heat/nn/__init__.py index 4bac4f4f23..7da9d072a3 100644 --- a/heat/nn/__init__.py +++ b/heat/nn/__init__.py @@ -1,5 +1,5 @@ """ -This is the heat.nn submodule. +Neural network submodule. It contains data parallel specific nn modules. It also includes all of the modules in the torch.nn namespace """ diff --git a/heat/nn/data_parallel.py b/heat/nn/data_parallel.py index 4f9ceee02c..a3a9a0a434 100644 --- a/heat/nn/data_parallel.py +++ b/heat/nn/data_parallel.py @@ -1,5 +1,5 @@ """ -This file is for the general data parallel neural network classes. +General data parallel neural network classes. """ import warnings @@ -312,7 +312,7 @@ def _reset_parameters(module: tnn.Module) -> None: class DataParallelMultiGPU(tnn.Module): """ - This creates data parallel networks local to each node using PyTorch's distributed class. This does NOT + Creates data parallel networks local to each node using PyTorch's distributed class. This does NOT do any global synchronizations. To make optimal use of this structure, use :func:`ht.optim.DASO `. Notes diff --git a/heat/optim/__init__.py b/heat/optim/__init__.py index 5e1cc5399e..fb7f869897 100644 --- a/heat/optim/__init__.py +++ b/heat/optim/__init__.py @@ -1,5 +1,5 @@ """ -This is the heat.optimizer submodule. +Optimizer module. It contains data parallel specific optimizers and learning rate schedulers. It also includes all of the optimizers and learning rate schedulers in the torch namespace diff --git a/heat/optim/dp_optimizer.py b/heat/optim/dp_optimizer.py index 5e45545349..d1f219588d 100644 --- a/heat/optim/dp_optimizer.py +++ b/heat/optim/dp_optimizer.py @@ -862,9 +862,7 @@ class DataParallelOptimizer: use blocking communications or not. will typically be overwritten by :func:`nn.DataParallel ` """ - def __init__( - self, torch_optimizer: torch.optim.Optimizer, blocking: bool = False - ): # noqa: D107 + def __init__(self, torch_optimizer: torch.optim.Optimizer, blocking: bool = False): # noqa: D107 self.torch_optimizer = torch_optimizer if not isinstance(blocking, bool): raise TypeError(f"blocking parameter must be a boolean, currently {type(blocking)}") diff --git a/heat/preprocessing/preprocessing.py b/heat/preprocessing/preprocessing.py index 7057b22f20..442ffff933 100644 --- a/heat/preprocessing/preprocessing.py +++ b/heat/preprocessing/preprocessing.py @@ -443,7 +443,7 @@ def inverse_transform(self, Y: ht.DNDarray) -> Union[Self, ht.DNDarray]: class RobustScaler(ht.TransformMixin, ht.BaseEstimator): """ - This scaler transforms the features of a given data set making use of statistics + Scales the features of a given data set making use of statistics that are robust to outliers: it removes the median and scales the data according to the quantile range (defaults to IQR: Interquartile Range); this routine is similar to ``sklearn.preprocessing.RobustScaler``. 
diff --git a/heat/py.typed b/heat/py.typed new file mode 100644 index 0000000000..e69de29bb2 diff --git a/heat/regression/lasso.py b/heat/regression/lasso.py index 7d99a72454..8e9b2d45b7 100644 --- a/heat/regression/lasso.py +++ b/heat/regression/lasso.py @@ -42,7 +42,7 @@ class Lasso(ht.RegressionMixin, ht.BaseEstimator): Examples -------- >>> X = ht.random.randn(10, 4, split=0) - >>> y = ht.random.randn(10,1, split=0) + >>> y = ht.random.randn(10, 1, split=0) >>> estimator = ht.regression.lasso.Lasso(max_iter=100, tol=None) >>> estimator.fit(X, y) """ diff --git a/heat/sparse/factories.py b/heat/sparse/factories.py index 0966785cdf..dbdf111f16 100644 --- a/heat/sparse/factories.py +++ b/heat/sparse/factories.py @@ -141,7 +141,7 @@ def sparse_csc_matrix( Create a :class:`~heat.sparse.DCSC_matrix` from :class:`torch.Tensor` (layout ==> torch.sparse_csc) >>> indptr = torch.tensor([0, 2, 3, 6]) >>> indices = torch.tensor([0, 2, 2, 0, 1, 2]) - >>> data = torch.tensor([1., 4., 5., 2., 3., 6.], dtype=torch.float) + >>> data = torch.tensor([1.0, 4.0, 5.0, 2.0, 3.0, 6.0], dtype=torch.float) >>> torch_sparse_csc = torch.sparse_csc_tensor(indptr, indices, data) >>> heat_sparse_csc = ht.sparse.sparse_csc_matrix(torch_sparse_csc, split=1) >>> heat_sparse_csc diff --git a/heat/sparse/manipulations.py b/heat/sparse/manipulations.py index 355b04cbd1..2199f32ebd 100644 --- a/heat/sparse/manipulations.py +++ b/heat/sparse/manipulations.py @@ -43,7 +43,11 @@ def __to_sparse(array: DNDarray, orientation="row") -> __DCSX_matrix: array.balance_() method = sparse_csr_matrix if orientation == "row" else sparse_csc_matrix result = method( - array.larray, dtype=array.dtype, is_split=array.split, device=array.device, comm=array.comm + array.larray, + dtype=array.dtype, + is_split=array.split, + device=array.device, + comm=array.comm, ) return result diff --git a/heat/sparse/tests/test_arithmetics_csr.py b/heat/sparse/tests/test_arithmetics_csr.py index aac8ced1d5..38f23062a5 100644 --- a/heat/sparse/tests/test_arithmetics_csr.py +++ b/heat/sparse/tests/test_arithmetics_csr.py @@ -4,14 +4,19 @@ import heat as ht import os +import platform import random from heat.core.tests.test_suites.basic_test import TestCase +envar = os.getenv("HEAT_TEST_USE_DEVICE", "cpu") +is_mps = envar == "gpu" and platform.system() == "Darwin" + + @unittest.skipIf( - int(torch.__version__.split(".")[0]) <= 1 and int(torch.__version__.split(".")[1]) < 12, - f"ht.sparse requires torch >= 1.12. Found version {torch.__version__}.", + is_mps, + "sparse_csr_tensor not supported on MPS (PyTorch 2.3)", ) class TestArithmeticsCSR(TestCase): @classmethod diff --git a/heat/sparse/tests/test_dcscmatrix.py b/heat/sparse/tests/test_dcscmatrix.py index 595ae483cc..22386d1444 100644 --- a/heat/sparse/tests/test_dcscmatrix.py +++ b/heat/sparse/tests/test_dcscmatrix.py @@ -1,4 +1,6 @@ import unittest +import os +import platform import heat as ht import torch @@ -6,10 +8,13 @@ from typing import Tuple +envar = os.getenv("HEAT_TEST_USE_DEVICE", "cpu") +is_mps = envar == "gpu" and platform.system() == "Darwin" + @unittest.skipIf( - int(torch.__version__.split(".")[0]) <= 1 and int(torch.__version__.split(".")[1]) < 12, - f"ht.sparse requires torch >= 2.0. 
Found version {torch.__version__}.", + is_mps, + "sparse_csr_tensor not supported on MPS (PyTorch 2.3)", ) class TestDCSC_matrix(TestCase): @classmethod diff --git a/heat/sparse/tests/test_dcsrmatrix.py b/heat/sparse/tests/test_dcsrmatrix.py index 6cf86ebf87..4f5b99df64 100644 --- a/heat/sparse/tests/test_dcsrmatrix.py +++ b/heat/sparse/tests/test_dcsrmatrix.py @@ -1,4 +1,6 @@ import unittest +import os +import platform import heat as ht import torch @@ -7,9 +9,13 @@ from typing import Tuple +envar = os.getenv("HEAT_TEST_USE_DEVICE", "cpu") +is_mps = envar == "gpu" and platform.system() == "Darwin" + + @unittest.skipIf( - int(torch.__version__.split(".")[0]) <= 1 and int(torch.__version__.split(".")[1]) < 12, - f"ht.sparse requires torch >= 1.12. Found version {torch.__version__}.", + is_mps, + "sparse_csr_tensor not supported on MPS (PyTorch 2.3)", ) class TestDCSR_matrix(TestCase): @classmethod diff --git a/heat/sparse/tests/test_factories.py b/heat/sparse/tests/test_factories.py index b9422f1d3f..84dd5e2b5d 100644 --- a/heat/sparse/tests/test_factories.py +++ b/heat/sparse/tests/test_factories.py @@ -1,14 +1,19 @@ import unittest +import os +import platform import heat as ht import torch import scipy from heat.core.tests.test_suites.basic_test import TestCase +envar = os.getenv("HEAT_TEST_USE_DEVICE", "cpu") +is_mps = envar == "gpu" and platform.system() == "Darwin" + @unittest.skipIf( - int(torch.__version__.split(".")[0]) <= 1 and int(torch.__version__.split(".")[1]) < 12, - f"ht.sparse requires torch >= 1.12. Found version {torch.__version__}.", + is_mps, + "sparse_csr_tensor not supported on MPS (PyTorch 2.3)", ) class TestFactories(TestCase): @classmethod diff --git a/heat/sparse/tests/test_manipulations.py b/heat/sparse/tests/test_manipulations.py index 1de090a871..97b5ab5ca9 100644 --- a/heat/sparse/tests/test_manipulations.py +++ b/heat/sparse/tests/test_manipulations.py @@ -1,13 +1,18 @@ import unittest +import os +import platform import heat as ht import torch from heat.core.tests.test_suites.basic_test import TestCase +envar = os.getenv("HEAT_TEST_USE_DEVICE", "cpu") +is_mps = envar == "gpu" and platform.system() == "Darwin" + @unittest.skipIf( - int(torch.__version__.split(".")[0]) <= 1 and int(torch.__version__.split(".")[1]) < 12, - f"ht.sparse requires torch >= 1.12. 
Found version {torch.__version__}.", + is_mps, + "sparse_csr_tensor not supported on MPS (PyTorch 2.3)", ) class TestManipulations(TestCase): @classmethod diff --git a/heat/spatial/distance.py b/heat/spatial/distance.py index 03579fbdb7..5a92e727b7 100644 --- a/heat/spatial/distance.py +++ b/heat/spatial/distance.py @@ -227,7 +227,7 @@ def _dist(X: DNDarray, Y: DNDarray = None, metric: Callable = _euclidian) -> DND If metric requires additional arguments, it must be handed over as a lambda function: ``lambda x, y: metric(x, y, **args)`` Notes - ------- + ----- If ``X.split=None`` and ``Y.split=0``, result will be ``split=1`` """ diff --git a/heat/spatial/tests/test_distances.py b/heat/spatial/tests/test_distances.py index d8ce1a44ca..d5769c2009 100644 --- a/heat/spatial/tests/test_distances.py +++ b/heat/spatial/tests/test_distances.py @@ -238,10 +238,11 @@ def test_cdist(self): result = ht.array(res, dtype=ht.float32, split=0) self.assertTrue(ht.allclose(d, result, atol=1e-8)) - B = A.astype(ht.float64) - d = ht.spatial.cdist(A, B, quadratic_expansion=False) - result = ht.array(res, dtype=ht.float64, split=0) - self.assertTrue(ht.allclose(d, result, atol=1e-8)) + if not self.is_mps: + B = A.astype(ht.float64) + d = ht.spatial.cdist(A, B, quadratic_expansion=False) + result = ht.array(res, dtype=ht.float64, split=0) + self.assertTrue(ht.allclose(d, result, atol=1e-8)) B = A.astype(ht.int16) d = ht.spatial.cdist(A, B, quadratic_expansion=False) @@ -257,7 +258,8 @@ def test_cdist(self): result = ht.array(res, dtype=ht.float32, split=0) self.assertTrue(ht.allclose(d, result, atol=1e-8)) - B = A.astype(ht.float64) - d = ht.spatial.cdist(B, quadratic_expansion=False) - result = ht.array(res, dtype=ht.float64, split=0) - self.assertTrue(ht.allclose(d, result, atol=1e-8)) + if not self.is_mps: + B = A.astype(ht.float64) + d = ht.spatial.cdist(B, quadratic_expansion=False) + result = ht.array(res, dtype=ht.float64, split=0) + self.assertTrue(ht.allclose(d, result, atol=1e-8)) diff --git a/heat/tests/test_cli.py b/heat/tests/test_cli.py new file mode 100644 index 0000000000..2979c9744b --- /dev/null +++ b/heat/tests/test_cli.py @@ -0,0 +1,56 @@ +from unittest.mock import patch +import argparse +from heat import cli +import io +import contextlib + +class TestCLI: + @patch("argparse.ArgumentParser.parse_args", return_value=argparse.Namespace(info=False)) + def test_cli_help(self, mock_parse_args): + stdout = io.StringIO() + with contextlib.redirect_stdout(stdout): + cli.cli() + + print(stdout.getvalue()) + assert "usage: heat [-h] [-i]" in stdout.getvalue() + + @patch("platform.platform") + @patch("mpi4py.MPI.Get_library_version") + @patch("torch.cuda.is_available") + @patch("torch.cuda.device_count") + @patch("torch.cuda.current_device") + @patch("torch.cuda.get_device_name") + @patch("torch.cuda.get_device_properties") + def test_platform_info( + self, + mock_get_device_properties, + mock_get_device_name, + mock_get_default_device, + mock_device_count, + mock_cuda_current_device, + mock_mpi_lib_version, + mock_platform, + ): + mock_platform.return_value = "Test Platform" + mock_mpi_lib_version.return_value = "Test MPI Library" + mock_cuda_current_device.return_value = True + mock_device_count.return_value = 1 + mock_get_default_device.return_value = "cuda:0" + mock_get_device_name.return_value = "Test Device" + mock_get_device_properties.return_value.total_memory = 1024**4 # 1TiB + + stdout_stream = io.StringIO() + with contextlib.redirect_stdout(stdout_stream): + cli.plaform_info() + stdout = 
stdout_stream.getvalue() + print(stdout) + assert "HeAT: Helmholtz Analytics Toolkit" in stdout + assert "Platform: Test Platform" in stdout + assert "mpi4py Version:" in stdout + assert "MPI Library Version: Test MPI Library" in stdout + assert "Torch Version:" in stdout + assert "CUDA Available: True" in stdout + assert "Device count: 1" in stdout + assert "Default device: cuda:0" in stdout + assert "Device name: Test Device" in stdout + assert "Device memory: 1024.0 GiB" in stdout diff --git a/heat/utils/data/_utils.py b/heat/utils/data/_utils.py index d0a80a9c1d..a20cd2fb09 100644 --- a/heat/utils/data/_utils.py +++ b/heat/utils/data/_utils.py @@ -1,4 +1,5 @@ """ +Data utilities module. This file contains functions which may be useful for certain datatypes, but are not test in the heat framework This file contains standalone utilities for data preparation which may be useful The functions contained within are not tested, nor actively supported diff --git a/heat/utils/data/datatools.py b/heat/utils/data/datatools.py index 6bc92f4b75..044195ccbe 100644 --- a/heat/utils/data/datatools.py +++ b/heat/utils/data/datatools.py @@ -214,7 +214,7 @@ def __init__( def __getitem__(self, index: Union[int, slice, tuple, list, torch.Tensor]) -> torch.Tensor: """ - This is the most basic form of getitem. As the dataset is often very specific to the dataset, + Basic form of __getitem__. As the dataset is often very specific to the dataset, this should be overwritten by the user. In this form it only gets the raw items from the data. """ if self.transforms: diff --git a/heat/utils/data/partial_dataset.py b/heat/utils/data/partial_dataset.py index 5b48d72efa..f06d496790 100644 --- a/heat/utils/data/partial_dataset.py +++ b/heat/utils/data/partial_dataset.py @@ -174,6 +174,7 @@ def Ishuffle(self): def __getitem__(self, index: Union[int, slice, List[int], torch.Tensor]) -> torch.Tensor: """ + Abstract __getitem__ method. This should be defined by the user at runtime. This function needs to be designed such that the data is in the 0th dimension and the indexes called are only in the 0th dim! """ diff --git a/heat/utils/data/spherical.py b/heat/utils/data/spherical.py index af57bf4637..133f25c89a 100644 --- a/heat/utils/data/spherical.py +++ b/heat/utils/data/spherical.py @@ -63,7 +63,7 @@ def create_clusters( The clusters are of the same size (quantitatively) and distributed evenly over the processes, unless cluster_weight is specified. 
Parameters - ------------ + ---------- n_samples: int Number of overall samples n_features: int @@ -146,7 +146,7 @@ def create_clusters( for k in range(n_clusters) ] local_data = torch.cat(local_data, dim=0) - rand_perm = torch.randperm(local_shape[0]) + rand_perm = torch.randperm(local_shape[0], device=device.torch_device) local_data = local_data[rand_perm, :] data = ht.DNDarray( local_data, diff --git a/heat/utils/data/tests/test_matrixgallery.py b/heat/utils/data/tests/test_matrixgallery.py index e7696c44c3..17390cb013 100644 --- a/heat/utils/data/tests/test_matrixgallery.py +++ b/heat/utils/data/tests/test_matrixgallery.py @@ -32,13 +32,14 @@ def test_hermitian(self): self.assertTrue(A_err <= 1e-6) for posdef in [True, False]: - # test complex double precision - A = ht.utils.data.matrixgallery.hermitian( - 20, dtype=ht.complex128, split=0, positive_definite=posdef - ) - A_err = ht.norm(A - A.T.conj().resplit_(A.split)) / ht.norm(A) - self.assertTrue(A.dtype == ht.complex128) - self.assertTrue(A_err <= 1e-12) + if not self.is_mps: + # test complex double precision + A = ht.utils.data.matrixgallery.hermitian( + 20, dtype=ht.complex128, split=0, positive_definite=posdef + ) + A_err = ht.norm(A - A.T.conj().resplit_(A.split)) / ht.norm(A) + self.assertTrue(A.dtype == ht.complex128) + self.assertTrue(A_err <= 1e-12) # test real datatype A = ht.utils.data.matrixgallery.hermitian( diff --git a/pyproject.toml b/pyproject.toml index 5306055560..5168e89f48 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -2,5 +2,178 @@ requires = ["setuptools"] build-backend = "setuptools.build_meta" -[tool.black] +[project] +name="heat" +dynamic = ["version"] +description="A framework for high-performance data analytics and machine learning." +readme = "README.md" +authors = [ + { name = "Markus Götz", email = "markus.goetz@kit.edu"}, + { name = "Charlotte Debus", email = "charlotte.debus@kit.edu"}, + { name = "Daniel Coquelin", email = "daniel.coquelin@kit.edu"}, + { name = "Kai Krajsek", email = "k.krajsek@fz-juelich.de"}, + { name = "Claudia Comito", email = "c.comito@fz-juelich.de"}, + { name = "Philipp Knechtges", email = "philipp.knechtges@dlr.de"}, + { name = "Björn Hagemeier", email = "b.hagemeier@fz-juelich.de"}, + { name = "Martin Siggel", email = "martin.siggel@dlr.de"}, + { name = "Achim Basermann", email = "achim.basermann@dlr.de"}, + { name = "Achim Streit", email = "achim.streit@kit.de"}, +] +maintainers = [ + { name = "Claudia Comito", email = "c.comito@fz-juelich.de"}, + { name = "Michael Tarnawa", email = "m.tarnawa@fz-juelich.de"}, + { name = "Fabian Hoppe", email = "f.hoppe@dlr.de"}, + { name = "Juan Pedro Gutiérrez Hermosillo Muriedas", email = "juan.muriedas@kit.edu"}, + { name = "Hakan Akdag", email = "hakan.akdag@dlr.de"}, + { name = "Berkant Palazoglu", email = "b.palazoglu@fz-juelich.de"} +] +license = "MIT" +license-files = ["LICENSE"] +keywords=["data", "analytics", "tensors", "distributed", "gpu"] +classifiers=[ + "Development Status :: 5 - Production/Stable", + "Programming Language :: Python :: 3.10", + "Programming Language :: Python :: 3.11", + "Programming Language :: Python :: 3.12", + "Programming Language :: Python :: 3.13", + "Intended Audience :: Science/Research", + "Topic :: Scientific/Engineering", + "Topic :: Scientific/Engineering :: Artificial Intelligence", + "Topic :: Scientific/Engineering :: Information Analysis", + "Topic :: Scientific/Engineering :: Mathematics", + "Typing :: Typed" +] + +requires-python = ">=3.10" + +dependencies = [ + "mpi4py>=3.0.0", + 
"torch~=2.0,<2.8.0", + "torchvision~=0.15", + "scipy~=1.14", +] + +[project.optional-dependencies] +## IO Modules +hdf5 = ["h5py>=2.8.0"] +netcdf = ["netCDF4>=1.5.6"] +zarr = ["zarr"] + +## Examples and tutorial +examples = [ + "scikit-learn~=0.24", + "matplotlib~=3.1", + "jupyter", + "ipyparallel", + "pillow" +] + +dev = [ + # QA + "pre-commit", + "ruff", + "mypy", + + # Testing + "pytest", + "coverage", + + # Benchmarking + "perun", +] + +docs = [ + "sphinx", + "sphinx_rtd_theme", + "sphinx-autoapi", + "nbsphinx", + "sphinx-autobuild", + "sphinx-copybutton", +] + +[project.scripts] +heat = "heat.cli:cli" + +[project.urls] +Homepage = "https://github.com/helmholtz-analytics/heat" +Documentation = "https://heat.readthedocs.io/" +Repository = "https://github.com/helmholtz-analytics/heat" +Issues = "https://github.com/helmholtz-analytics/heat/issues" +Changelog = "https://github.com/helmholtz-analytics/heat/blob/main/CHANGELOG.md" + +[tool.setuptools.packages.find] +where = ["."] +include = ["heat", "heat.*"] +exclude = ["*tests*", "*benchmarks*"] + + +[tool.setuptools.package-data] +datasets = ["*.csv", "*.h5", "*.nc"] +heat = ["py.typed"] + +[tool.setuptools.dynamic] +version = {attr = "heat.core.version.__version__"} + +# Mypy +[tool.mypy] +packages=["heat"] +python_version="3.10" +exclude=[ + 'test_\w+\.py$', + '^benchmarks/', + '^examples/' +] + +# Strict configuration from https://careers.wolt.com/en/blog/tech/professional-grade-mypy-configuration +disallow_untyped_defs = true +disallow_any_unimported = true +no_implicit_optional = true +check_untyped_defs=true +warn_return_any=true +show_error_codes =true +warn_unused_ignores=true +follow_imports = "normal" +follow_untyped_imports = true + +# Ignore most the errors now, focus only ont eh core module +ignore_errors=true + +[[tool.mypy.overrides]] +module = "heat.core.*" +ignore_errors=false + + +# Ruff +[tool.ruff] +target-version = "py310" +exclude = ["tutorials", "examples", "benchmarks", "scripts", "**/tests/", "doc", "docker"] line-length = 100 + +[tool.ruff.lint] +select = ["E", "F", "D", "W", "D417"] + +ignore = [ + "E203", + "E402", + "E501", + "F401", + "F403", + "D105", + "D107", + "D200", + "D203", + "D205", + "D212", + "D301", + "D400", + "D401", + "D402", + "D410", + "D415", +] + +[tool.ruff.lint.pydocstyle] +convention = "numpy" + +[tool.ruff.format] +docstring-code-format = true diff --git a/scripts/numpy_coverage_tables.py b/scripts/numpy_coverage_tables.py index e4d68e64ae..1f98c5d86a 100644 --- a/scripts/numpy_coverage_tables.py +++ b/scripts/numpy_coverage_tables.py @@ -534,6 +534,47 @@ numpy_functions.append(numpy_random_operations) headers[str(len(headers))] = "NumPy Random Operations" +# numpy fft operations +numpy_fft_operations = [ + "fft.fft", + "fft.ifft", + "fft.fft2", + "fft.ifft2", + "fft.fftn", + "fft.ifftn", + "fft.rfft", + "fft.irfft", + "fft.fftshift", + "fft.ifftshift", +] +numpy_functions.append(numpy_fft_operations) +headers[str(len(headers))] = "NumPy FFT Operations" + +# numpy masked array operations +numpy_masked_array_operations = [ + "ma.masked_array", + "ma.masked_where", + "ma.fix_invalid", + "ma.is_masked", + "ma.mean", + "ma.median", + "ma.std", + "ma.var", + "ma.sum", + "ma.min", + "ma.max", + "ma.ptp", + "ma.count", + "ma.any", + "ma.all", + "ma.masked_equal", + "ma.masked_greater", + "ma.masked_less", + "ma.notmasked_contiguous", +] +numpy_functions.append(numpy_masked_array_operations) +headers[str(len(headers))] = "NumPy Masked Array Operations" + # initialize markdown file # open the 
file in write mode f = open("coverage_tables.md", "w") @@ -558,20 +599,23 @@ # Check if functions exist in the heat library and create table rows for func_name in function_list: - if ( - hasattr(heat, func_name) + if (hasattr(heat, func_name) or hasattr(heat.linalg, func_name.replace("linalg.", "")) or hasattr(heat.random, func_name.replace("random.", "")) + or (hasattr(heat, "fft") and hasattr(heat.fft, func_name.replace("fft.", ""))) + or (hasattr(heat, "ma") and hasattr(heat.ma, func_name.replace("ma.", ""))) ): support_status = "✅" # Green checkmark for supported functions else: support_status = "❌" # Red cross for unsupported functions - table_row = f"| {func_name} | {support_status} |" + # Create the issue search URL and add it to the row + issue_url = f"https://github.com/helmholtz-analytics/heat/issues?q=is%3Aissue+is%3Aopen+{func_name}" + table_row = f"| {func_name} | {support_status} | [Search]({issue_url}) |" table_rows.append(table_row) - # Create the Markdown table header - table_header = f"| {headers[str(i)]} | Heat |\n|---|---|\n" + # Create the Markdown table header with the added "Issues" column + table_header = f"| {headers[str(i)]} | Heat | Issues |\n|---|---|---|\n" # Combine the header and table rows markdown_table = table_header + "\n".join(table_rows) diff --git a/setup.cfg b/setup.cfg deleted file mode 100644 index 70371c49dd..0000000000 --- a/setup.cfg +++ /dev/null @@ -1,14 +0,0 @@ -[metadata] -description_file = README.md - -[pycodestyle] -max-line-length = 100 -ignore = E203,E402,W503 - -[flake8] -max-line-length = 100 -ignore = E203,E402,W503,E501,F403,F401 - -[pydocstyle] -add-select = D417 -add-ignore = D105, D107, D200, D203, D205, D212, D301, D400, D401, D402, D410, D415 diff --git a/setup.py b/setup.py deleted file mode 100644 index 2fd4d61363..0000000000 --- a/setup.py +++ /dev/null @@ -1,52 +0,0 @@ -from setuptools import setup, find_packages -import codecs - - -with codecs.open("README.md", "r", "utf-8") as handle: - long_description = handle.read() - -__version__ = None # appeases flake, assignment in exec() below -with open("./heat/core/version.py") as handle: - exec(handle.read()) - -setup( - name="heat", - packages=find_packages(exclude=("*tests*", "*benchmarks*")), - package_data={"heat.datasets": ["*.csv", "*.h5", "*.nc"]}, - version=__version__, - description="A framework for high-performance data analytics and machine 
learning.", - long_description=long_description, - long_description_content_type="text/markdown", - author="Helmholtz Association", - author_email="martin.siggel@dlr.de", - url="https://github.com/helmholtz-analytics/heat", - keywords=["data", "analytics", "tensors", "distributed", "gpu"], - python_requires=">=3.9", - classifiers=[ - "Development Status :: 4 - Beta", - "Programming Language :: Python :: 3.9", - "Programming Language :: Python :: 3.10", - "Programming Language :: Python :: 3.11", - "Programming Language :: Python :: 3.12", - "License :: OSI Approved :: MIT License", - "Intended Audience :: Science/Research", - "Topic :: Scientific/Engineering", - ], - install_requires=[ - "mpi4py>=3.0.0", - "numpy>=1.22.0, <2", - "torch>=2.0.0, <2.6.1", - "scipy>=1.10.0", - "pillow>=6.0.0", - "torchvision>=0.15.2, <0.21.1", - ], - extras_require={ - "docutils": ["docutils>=0.16"], - "hdf5": ["h5py>=2.8.0"], - "netcdf": ["netCDF4>=1.5.6"], - "dev": ["pre-commit>=1.18.3"], - "examples": ["scikit-learn>=0.24.0", "matplotlib>=3.1.0"], - "cb": ["perun>=0.2.0"], - "pandas": ["pandas>=1.4"], - }, -) diff --git a/tutorials/hpc/2_basics.ipynb b/tutorials/hpc/2_basics.ipynb deleted file mode 120000 index 68f73c480c..0000000000 --- a/tutorials/hpc/2_basics.ipynb +++ /dev/null @@ -1 +0,0 @@ -../local/2_basics.ipynb \ No newline at end of file diff --git a/tutorials/hpc/3_internals.ipynb b/tutorials/hpc/3_internals.ipynb deleted file mode 120000 index 4105ea65c6..0000000000 --- a/tutorials/hpc/3_internals.ipynb +++ /dev/null @@ -1 +0,0 @@ -../local/3_internals.ipynb \ No newline at end of file diff --git a/tutorials/hpc/4_loading_preprocessing.ipynb b/tutorials/hpc/4_loading_preprocessing.ipynb deleted file mode 120000 index c2010bb811..0000000000 --- a/tutorials/hpc/4_loading_preprocessing.ipynb +++ /dev/null @@ -1 +0,0 @@ -../local/4_loading_preprocessing.ipynb \ No newline at end of file diff --git a/tutorials/hpc/5_matrix_factorizations.ipynb b/tutorials/hpc/5_matrix_factorizations.ipynb deleted file mode 120000 index 41ae51349c..0000000000 --- a/tutorials/hpc/5_matrix_factorizations.ipynb +++ /dev/null @@ -1 +0,0 @@ -../local/5_matrix_factorizations.ipynb \ No newline at end of file diff --git a/tutorials/hpc/6_clustering.ipynb b/tutorials/hpc/6_clustering.ipynb deleted file mode 120000 index 8668389f7e..0000000000 --- a/tutorials/hpc/6_clustering.ipynb +++ /dev/null @@ -1 +0,0 @@ -../local/6_clustering.ipynb \ No newline at end of file diff --git a/tutorials/local/2_basics.ipynb b/tutorials/local/2_basics.ipynb deleted file mode 100644 index 834169c76e..0000000000 --- a/tutorials/local/2_basics.ipynb +++ /dev/null @@ -1,780 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Heat Basics\n", - "\n", - "We have started an `ipcluster` with 4 engines at the end of the [Intro notebook](1_intro.ipynb).\n", - "\n", - "Let's start the interactive session with a look into the `heat` data object. But first, we need to import the `ipyparallel` client." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from ipyparallel import Client\n", - "rc = Client(profile=\"default\")\n", - "rc.ids\n", - "\n", - "if len(rc.ids) == 0:\n", - " print(\"No engines found\")\n", - "else:\n", - " print(f\"{len(rc.ids)} engines found\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We will always start `heat` cells with the `%%px` magic command to execute the cell on all engines. 
However, the first section of this tutorial doesn't deal with distributed arrays. In these cases, we will use the `%%px --target 0` magic command to execute the cell only on the first engine." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### DNDarrays\n", - "\n", - "\n", - "Similar to a NumPy `ndarray`, a Heat `dndarray` (we'll get to the `d` later) is a grid of values of a single (one particular) type. The number of dimensions is the number of axes of the array, while the shape of an array is a tuple of integers giving the number of elements of the array along each dimension. \n", - "\n", - "Heat emulates NumPy's API as closely as possible, allowing for the use of well-known **array creation functions**." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "%%px \n", - "import heat as ht\n", - "a = ht.array([1, 2, 3])\n", - "a\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "a = ht.ones((4, 5,))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "ht.arange(10)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "ht.full((3, 2,), fill_value=9)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Data Types\n", - "\n", - "Heat supports various data types and operations to retrieve and manipulate the type of a Heat array. However, in contrast to NumPy, Heat is limited to logical (bool) and numerical types (uint8, int16/32/64, float32/64, and complex64/128). \n", - "\n", - "**NOTE:** by default, Heat will allocate floating-point values in single precision, due to a much higher processing performance on GPUs. This is one of the main differences between Heat and NumPy." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "a = ht.zeros((3, 4,))\n", - "a" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "b = a.astype(ht.int64)\n", - "b" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Operations\n", - "\n", - "Heat supports many mathematical operations, ranging from simple element-wise functions, binary arithmetic operations, and linear algebra, to more powerful reductions. Operations are by default performed on the entire array or they can be performed along one or more of its dimensions when available. Most relevant for data-intensive applications is that **all Heat functionalities support memory-distributed computation and GPU acceleration**. This holds for all operations, including reductions, statistics, linear algebra, and high-level algorithms. \n", - "\n", - "You can try out the few simple examples below if you want, but we will skip to the [Parallel Processing](#Parallel-Processing) section to see memory-distributed operations in action." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "a = ht.full((3, 4,), 8)\n", - "b = ht.ones((3, 4,))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "a + b" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "ht.sub(a, b)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "ht.arange(5).sin()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "a.T" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "b.sum(axis=1)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "Heat implements the same broadcasting rules (implicit repetion of an operation when the rank/shape of the operands do not match) as NumPy does, e.g.:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "ht.arange(10) + 3" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "a = ht.ones((3, 4,))\n", - "b = ht.arange(4)\n", - "c = a + b\n", - "\n", - "a, b, c" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Indexing\n", - "\n", - "Heat allows the indexing of arrays, and thereby, the extraction of a partial view of the elements in an array. It is possible to obtain single values as well as entire chunks, i.e. slices." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "a = ht.arange(10)\n", - "a" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "a[3]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "a[1:7]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "a[::2]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**NOTE:** Indexing in Heat is undergoing a [major overhaul](https://github.com/helmholtz-analytics/heat/pull/938), to increase interoperability with NumPy/PyTorch indexing, and to provide a fully distributed item setting functionality. Stay tuned for this feature in the next release." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Documentation\n", - "\n", - "Heat is extensively documented. You may find the online API reference on Read the Docs: [Heat Documentation](https://heat.readthedocs.io/). It is also possible to look up the docs in an interactive session." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "help(ht.sum)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Parallel Processing\n", - "---\n", - "\n", - "Heat's actual power lies in the possibility to exploit the processing performance of modern accelerator hardware (GPUs) as well as distributed (high-performance) cluster systems. 
All operations executed on CPUs are, to a large extent, vectorized (AVX) and thread-parallelized (OpenMP). Heat builds on PyTorch, so it supports GPU acceleration on Nvidia and AMD GPUs. \n", - "\n", - "For distributed computations, your system or laptop needs to have Message Passing Interface (MPI) installed. For GPU computations, your system needs to have one or more suitable GPUs and an (MPI-aware) CUDA/ROCm ecosystem.\n", - "\n", - "**NOTE:** The GPU examples below will only properly execute on a computer with a GPU. Make sure to either start the notebook on an appropriate machine or copy and paste the examples into a script and execute it on a suitable device." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### GPUs\n", - "\n", - "Heat's array creation functions all support an additional parameter that places the data on a specific device. By default, the CPU is selected, but it is also possible to directly allocate the data on a GPU." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "<div class=\"alert alert-block alert-info\">
\n", - "The following cells will only work if you have a GPU available.\n", - "\n", - "
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "ht.zeros((3, 4,), device='gpu')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Arrays on the same device can be seamlessly used in any Heat operation." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "a = ht.zeros((3, 4,), device='gpu')\n", - "b = ht.ones((3, 4,), device='gpu')\n", - "a + b" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "However, performing operations on arrays with mismatching devices will purposefully result in an error (due to potentially large copy overhead)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "a = ht.full((3, 4,), 4, device='cpu')\n", - "b = ht.ones((3, 4,), device='gpu')\n", - "a + b" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "It is possible to explicitly move an array from one device to the other and back to avoid this error." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px --target 0\n", - "a = ht.full((3, 4,), 4, device='gpu')\n", - "a.cpu()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We'll put our multi-GPU setup to the test in the next section." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Distributed Computing\n", - "\n", - "Heat is also able to make use of distributed processing capabilities such as those in high-performance cluster systems. For this, Heat exploits the fact that the operations performed on a multi-dimensional array are usually identical for all data items. Hence, a data-parallel processing strategy can be chosen, where the total number of data items is equally divided among all processing nodes. An operation is then performed individually on the local data chunks and, if necessary, communicates partial results behind the scenes. A Heat array assumes the role of a virtual overlay of the local chunks and realizes and coordinates the computations - see the figure below for a visual representation of this concept.\n", - "\n", - "\n", - "\n", - "The chunks are always split along a singular dimension (i.e. 1-D domain decomposition) of the array. You can specify this in Heat by using the `split` paramter. This parameter is present in all relevant functions, such as array creation (`zeros(), ones(), ...`) or I/O (`load()`) functions. \n", - "\n", - "\n", - "\n", - "\n", - "Examples are provided below. The result of an operation on a Heat tensor will in most cases preserve the split of the respective operands. However, in some cases the split axis might change. For example, a transpose of a Heat array will equally transpose the split axis. Furthermore, a reduction operations, e.g. `sum()` that is performed across the split axis, might remove data partitions entirely. The respective function behaviors can be found in Heat's documentation.\n", - "\n", - "You may also modify the data partitioning of a Heat array by using the `resplit()` function. This allows you to repartition the data as you so choose. Please note, that this should be used sparingly and for small data amounts only, as it entails significant data copying across the network. Finally, a Heat array without any split, i.e. 
`split=None` (default), will result in redundant copies of data on each computation node.\n", - "\n", - "On a technical level, Heat follows the so-called [Bulk Synchronous Parallel (BSP)](https://en.wikipedia.org/wiki/Bulk_synchronous_parallel) processing model. For the network communication, Heat utilizes the [Message Passing Interface (MPI)](https://computing.llnl.gov/tutorials/mpi/), a *de facto* standard on modern high-performance computing systems. It is also possible to use MPI on your laptop or desktop computer. Respective software packages are available for all major operating systems. In order to run a Heat script, you need to start it slightly differently than you are probably used to. This\n", - "\n", - "```bash\n", - "python ./my_script.py\n", - "```\n", - "\n", - "becomes this instead:\n", - "\n", - "```bash\n", - "mpirun -n <number_of_processes> python ./my_script.py\n", - "```\n", - "On an HPC cluster you'll of course use `sbatch` or similar.\n", - "\n", - "\n", - "Let's see some examples of working with distributed Heat:" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In the following examples, we'll recreate the array shown in the figure, a 3-dimensional DNDarray of integers ranging from 0 to 59 (5 matrices of size (4,3)). " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "import heat as ht\n", - "dndarray = ht.arange(60).reshape(5,4,3)\n", - "dndarray" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Notice the additional metadata printed with the DNDarray. Compared to a NumPy `ndarray`, the DNDarray has additional information on the device (in this case, the CPU) and the `split` axis. In the example above, the split axis is `None`, meaning that the DNDarray is not distributed and each MPI process has a full copy of the data.\n", - "\n", - "Let's experiment with a distributed DNDarray: we'll create the same DNDarray as above, but distribute it along the major axis." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "dndarray = ht.arange(60, split=0).reshape(5,4,3)\n", - "dndarray" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The `split` axis is now 0, meaning that the DNDarray is distributed along the first axis. Each MPI process has a slice of the data along the first axis. In order to see the data on each process, we can print the \"local array\" via the `larray` attribute." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "dndarray.larray" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Note that the `larray` is a `torch.Tensor` object. This is the underlying tensor that holds the data. The `dndarray` object is an MPI-aware wrapper around these process-local tensors, providing memory-distributed functionality and information." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The DNDarray can be distributed along any axis. Modify the `split` attribute when creating the DNDarray in the cell above, to distribute it along a different axis, and see how the `larray`s change. You'll notice that the distributed arrays are always load-balanced, meaning that the data are distributed as evenly as possible across the MPI processes."
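, - "\n", - "A compact way to see this without editing the cell above (a small sketch, not part of the original notebook; it reuses the running `%%px` engines and the `ht` import):\n", - "\n", - "```python\n", - "%%px\n", - "for split in (None, 0, 1, 2):\n", - "    x = ht.zeros((5, 4, 3), split=split)\n", - "    print(f\"split={split}: local shape {x.lshape}\")  # local extent shrinks along the split axis only\n", - "```"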
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The `DNDarray` object has a number of methods and attributes that are useful for distributed computing. In particular, it keeps track of its global and local (on a given process) shape through distributed operations and array manipulations. The DNDarray is also associated with a `comm` object, the MPI communicator.\n", - "\n", - "(In MPI, the *communicator* is a group of processes that can communicate with each other. By default, the `comm` object is the `MPI.COMM_WORLD` communicator, which includes all processes. The `comm` object is used to perform collective operations, such as reductions, scatter, gather, and broadcast, as well as point-to-point communication between processes.)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "print(f\"Global shape of the dndarray: {dndarray.shape}\")\n", - "print(f\"On rank {dndarray.comm.rank}/{dndarray.comm.size}, local shape of the dndarray: {dndarray.lshape}\")\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can perform a vast number of operations on DNDarrays distributed over multi-node and/or multi-GPU resources. Check out our [Numpy coverage tables](https://github.com/helmholtz-analytics/heat/blob/main/coverage_tables.md) to see what operations are already supported. \n", - "\n", - "The result of an operation on DNDarrays will in most cases preserve the `split` or distribution axis of the respective operands. However, in some cases the split axis might change. For example, a transpose of a Heat array will equally transpose the split axis. Furthermore, a reduction operation, e.g. `sum()`, that is performed across the split axis might remove data partitions entirely. The respective function behaviors can be found in Heat's documentation." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px \n", - "# transpose \n", - "dndarray.T\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "# reduction operation along the distribution axis\n", - "%timeit -n 1 dndarray.sum(axis=0)\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px \n", - "# reduction operation along non-distribution axis: no communication required\n", - "%timeit -n 1 dndarray.sum(axis=1)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Operations between tensors with equal split or no split are fully parallelizable and therefore very fast." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "other_dndarray = ht.arange(60,120, split=0).reshape(5,4,3) # distributed reshape\n", - "\n", - "# element-wise multiplication\n", - "dndarray * other_dndarray\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "As we saw earlier, because the underlying data objects are PyTorch tensors, we can easily create DNDarrays on GPUs or move DNDarrays to GPUs. This allows us to perform distributed array operations on multi-GPU systems.\n", - "\n", - "So far we have demonstrated small, easy-to-parallelize arithmetic operations. Let's move to linear algebra. 
Heat's `linalg` module supports a wide range of linear algebra operations, including matrix multiplication. Matrix multiplication is a very common operation in data analysis; it is computationally intensive and not trivial to parallelize. \n", - "\n", - "With Heat, you can perform matrix multiplication on distributed DNDarrays, and the operation will be parallelized across the MPI processes. Here, on 4 GPUs:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "# free up memory if necessary\n", - "try:\n", - "    del x, y, z\n", - "except NameError:\n", - "    pass\n", - "\n", - "n, m = 40000, 40000\n", - "x = ht.random.randn(n, m, split=0, device=\"gpu\") # distributed RNG\n", - "y = ht.random.randn(m, n, split=None, device=\"gpu\")\n", - "z = x @ y\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "`ht.linalg.matmul` or `@` breaks down the matrix multiplication into a series of smaller `torch` matrix multiplications, which are then distributed across the MPI processes. This operation can be very communication-intensive on huge matrices when both require distribution, and users should choose the `split` axis carefully to minimize communication overhead." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can experiment with sizes and the `split` parameter (distribution axis) for both matrices and time the result. Note that:\n", - "- If you set **`split=None` for both matrices**, each process (in this case, each GPU) will attempt to multiply the entire matrices. Depending on the matrix sizes, the GPU memory might be insufficient. (And if you can multiply the matrices on a single GPU, it's much more efficient to stick to PyTorch's `torch.linalg.matmul` function.)\n", - "- If **`split` is not None for both matrices**, each process will only hold a slice of the data, and will need to communicate data with other processes in order to perform the multiplication. This **introduces huge communication overhead**, but allows you to perform the multiplication on larger matrices than would fit in the memory of a single GPU.\n", - "- If **`split` is None for one matrix and not None for the other**, the multiplication does not require communication, and the result will be distributed. If your data size allows it, you should always favor this option.\n", - "\n", - "Time the multiplication for different split parameters and see how the performance changes; a ready-made sweep is sketched a little further below.\n", - "\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "z = %timeit -n 1 -r 5 x @ y " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Heat supports many linear algebra operations:\n", - "```bash\n", - ">>> ht.linalg.\n", - "ht.linalg.basics ht.linalg.hsvd_rtol( ht.linalg.projection( ht.linalg.triu(\n", - "ht.linalg.cg( ht.linalg.inv( ht.linalg.qr( ht.linalg.vdot(\n", - "ht.linalg.cross( ht.linalg.lanczos( ht.linalg.solver ht.linalg.vecdot(\n", - "ht.linalg.det( ht.linalg.matmul( ht.linalg.svdtools ht.linalg.vector_norm(\n", - "ht.linalg.dot( ht.linalg.matrix_norm( ht.linalg.trace( \n", - "ht.linalg.hsvd( ht.linalg.norm( ht.linalg.transpose( \n", - "ht.linalg.hsvd_rank( ht.linalg.outer( ht.linalg.tril( \n", - "```\n", - "\n", - "and a lot more is in the works, including distributed eigendecompositions and SVD. 
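\n", - "\n", - "To make the communication trade-offs above concrete, here is the promised rough timing sweep over three `split` combinations (a sketch, not part of the original notebook; sizes are scaled down so it also runs on CPU, and for rigorous measurements you would synchronize, e.g. with `x.comm.Barrier()`, before starting the clock):\n", - "\n", - "```python\n", - "%%px\n", - "import time\n", - "n = 4000\n", - "for sx, sy in ((None, 0), (0, None), (0, 0)):\n", - "    x = ht.random.randn(n, n, split=sx)\n", - "    y = ht.random.randn(n, n, split=sy)\n", - "    t0 = time.perf_counter()\n", - "    z = x @ y\n", - "    print(f\"split=({sx}, {sy}): {time.perf_counter() - t0:.3f} s\")\n", - "```\n", - "\n", - "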
If the operation you need is not yet supported, leave us a note [here](tinyurl.com/demoissues) and we'll get back to you." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can of course perform all operations on CPUs as well; in that case, simply leave out the `device` attribute." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Interoperability\n", - "\n", - "We can easily create DNDarrays from PyTorch tensors and numpy ndarrays. We can also convert DNDarrays to PyTorch tensors and numpy ndarrays. This makes it easy to integrate Heat into existing PyTorch and numpy workflows. Here is a basic example with xarray:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "import xarray as xr\n", - "\n", - "local_xr = xr.DataArray(dndarray.larray, dims=(\"z\", \"y\", \"x\"))\n", - "# proceed with local xarray operations\n", - "local_xr\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**NOTE:** this is not a distributed `xarray`, but local xarray objects on each rank.\n", - "Work on [expanding xarray support](https://github.com/helmholtz-analytics/heat/pull/1183) is ongoing.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Heat will try to reuse the memory of the original array as much as possible. If you would prefer a copy with different memory, the `copy` keyword argument can be used when creating a DNDarray from other libraries." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "import torch\n", - "torch_array = torch.arange(5)\n", - "heat_array = ht.array(torch_array, copy=False)\n", - "heat_array[0] = -1\n", - "print(torch_array)\n", - "\n", - "torch_array = torch.arange(5)\n", - "heat_array = ht.array(torch_array, copy=True)\n", - "heat_array[0] = -1\n", - "print(torch_array)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Interoperability is a key feature of Heat, and we are constantly working to increase Heat's compliance with the [Python array API standard](https://data-apis.org/array-api/latest/). As usual, please [let us know](tinyurl.com/demoissues) if you encounter any issues or have any feature requests." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In the [next notebook](3_internals.ipynb), let's have a look at Heat's most important internal functions." - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "heat-dev-torch2", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.18" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/tutorials/local/3_internals.ipynb b/tutorials/local/3_internals.ipynb deleted file mode 100644 index f592f77ed2..0000000000 --- a/tutorials/local/3_internals.ipynb +++ /dev/null @@ -1,301 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Heat as infrastructure for MPI applications\n", - "\n", - "In this section, we'll go through some Heat-specific functionalities that simplify the implementation of a data-parallel application in Python. 
We'll demonstrate them on small arrays and 4 processes on a single cluster node, but the functionalities are indeed meant for a multi-node setup with huge arrays that cannot be processed on a single node." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Your IPython cluster should still be running. Let's check it out." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from ipyparallel import Client\n", - "rc = Client(profile=\"default\")\n", - "rc.ids\n", - "\n", - "if len(rc.ids) == 0:\n", - "    print(\"No engines found\")\n", - "else:\n", - "    print(f\"{len(rc.ids)} engines found\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If no engines are found, go back to the [Intro](1_intro.ipynb) for instructions." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We already mentioned that the DNDarray object is \"MPI-aware\". Each DNDarray is associated with an MPI communicator, it is aware of the number of processes in the communicator, and it knows the rank of the process that owns it. \n", - "\n", - "We will use the %%px magic in every cell that executes MPI code." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "%%px\n", - "a = ht.random.randn(7,4,3, split=0)\n", - "a.comm" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "# MPI size = total number of processes \n", - "size = a.comm.size\n", - "\n", - "print(f\"a is distributed over {size} processes\")\n", - "print(f\"a is a distributed {a.ndim}-dimensional array with global shape {a.shape}\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "# MPI rank = rank of each process\n", - "rank = a.comm.rank\n", - "# Local shape = shape of the data on each process\n", - "local_shape = a.lshape\n", - "print(f\"Rank {rank} holds a slice of a with local shape {local_shape}\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Distribution map\n", - "\n", - "On many occasions, when building a memory-distributed pipeline, it will be convenient for each rank to have information on which rank holds which slice of the distributed array. \n", - "\n", - "The `lshape_map` attribute of a DNDarray gathers (or, if possible, calculates) this info from all processes and stores it as metadata of the DNDarray. Because it is meant for internal use, it is stored in a torch tensor, not a DNDarray. \n", - "\n", - "The `lshape_map` tensor is a 2D tensor, where the first dimension is the number of processes and the second dimension is the number of dimensions of the array. Each row of the tensor contains the local shape of the array on a process. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "lshape_map = a.lshape_map\n", - "lshape_map" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Go back to where we created the DNDarray and create `a` with a different split axis. See how the `lshape_map` changes."
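, - "\n", - "One handy pattern built on top of `lshape_map` (a sketch, not from the original tutorial): the global start offset of each rank's slice along the split axis is just a shifted cumulative sum of the local extents:\n", - "\n", - "```python\n", - "%%px\n", - "import torch\n", - "extents = a.lshape_map[:, a.split]\n", - "offsets = torch.cumsum(extents, dim=0) - extents  # global start index of every rank's slice\n", - "print(f\"Rank {a.comm.rank} starts at global index {offsets[a.comm.rank].item()}\")\n", - "```"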
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Modifying the DNDarray distribution\n", - "\n", - "In a distributed pipeline, it is sometimes necessary to change the distribution of a DNDarray when the array is not distributed in the most convenient way for the next operation or algorithm.\n", - "\n", - "Depending on your needs, you can choose between:\n", - "- `DNDarray.redistribute_()`: This method keeps the original split axis, but redistributes the data of the DNDarray according to a \"target map\".\n", - "- `DNDarray.resplit_()`: This method changes the split axis of the DNDarray. This is a more expensive operation, and should be used only when absolutely necessary. Depending on your needs and available resources, in some cases it might be wiser to keep a copy of the DNDarray with a different split axis.\n", - "\n", - "Let's see some examples." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "# redistribute\n", - "import torch\n", - "target_map = a.lshape_map\n", - "target_map[:, a.split] = torch.tensor([1, 2, 2, 2])\n", - "# in-place redistribution (see ht.redistribute for out-of-place)\n", - "a.redistribute_(target_map=target_map)\n", - "\n", - "# new lshape map after redistribution\n", - "a.lshape_map" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "# local arrays after redistribution\n", - "a.larray" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "# resplit\n", - "a.resplit_(axis=1)\n", - "\n", - "a.lshape_map" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can use the `resplit_` method (in-place), or `ht.resplit` (out-of-place), to change the distribution axis, but also to set the distribution axis to `None`. The latter corresponds to an `MPI.Allgather` operation that gathers the entire array on each process. This is useful when you've reached a data size small enough to be processed on a single device, and you want to avoid communication overhead." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "# \"un-split\" distributed array\n", - "a.resplit_(axis=None)\n", - "# each process now holds a copy of the entire array" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The opposite is not true, i.e. you cannot use `resplit_` to distribute an array with `split=None`. In that case, you must use the `ht.array()` factory function:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "# make `a` split again\n", - "a = ht.array(a, split=0)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Making disjoint data into a global DNDarray\n", - "\n", - "Another common occurrence in a data-parallel pipeline: you have addressed the embarrassingly parallel part of your algorithm with any array framework, each process working independently from the others. You now want to perform a non-embarrassingly-parallel operation on the entire dataset, with Heat as a backend.\n", - "\n", - "You can use the `ht.array` factory function with the `is_split` argument to create a DNDarray from a disjoint (on each MPI process) set of arrays. 
The `is_split` argument indicates the axis along which the disjoint data is to be \"joined\" into a global, distributed DNDarray." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "# create some random local arrays on each process\n", - "import numpy as np\n", - "local_array = np.random.rand(3, 4)\n", - "\n", - "# join them into a distributed array\n", - "a_0 = ht.array(local_array, is_split=0)\n", - "a_0.shape" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Change the cell above and join the arrays along a different axis. Note that the shapes of the local arrays must be consistent along the non-split axes. They can differ along the split axis." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The `ht.array` function takes as input any data object that can be converted to a torch tensor. " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Once you've made your disjoint data into a DNDarray, you can apply any Heat operation or algorithm to it and exploit the cumulative RAM of all the processes in the communicator. " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can access the MPI communication functionalities of the DNDarray through the `comm` attribute, i.e.:\n", - "\n", - "```python\n", - "# these are just examples, this cell won't do anything\n", - "a.comm.Allreduce(a, b, op=MPI.SUM)\n", - "\n", - "a.comm.Allgather(a, b)\n", - "a.comm.Isend(a, dest=1, tag=0)\n", - "```\n", - "\n", - "etc." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In the next notebooks, we'll show you how we use Heat's distributed-array infrastructure to scale complex data analysis workflows to large datasets and high-performance computing resources.\n", - "\n", - "- [Data loading and preprocessing](4_loading_preprocessing.ipynb)\n", - "- [Matrix factorization algorithms](5_matrix_factorizations.ipynb)\n", - "- [Clustering algorithms](6_clustering.ipynb)" - ] - } - ], - "metadata": { - "language_info": { - "name": "python" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/tutorials/local/4_loading_preprocessing.ipynb b/tutorials/local/4_loading_preprocessing.ipynb deleted file mode 100644 index 9abf4f3f55..0000000000 --- a/tutorials/local/4_loading_preprocessing.ipynb +++ /dev/null @@ -1,209 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Loading and Preprocessing\n", - "\n", - "### Refresher\n", - "\n", - "Using PyTorch as its compute engine and mpi4py for communication, Heat implements a number of array operations and algorithms that are optimized for memory-distributed data volumes. This allows you to tackle datasets that are too large for single-node (or worse, single-GPU) processing. \n", - "\n", - "As opposed to task-parallel frameworks, Heat takes a data-parallel approach, meaning that each \"worker\" or MPI process performs the same tasks on different slices of the data. Many operations and algorithms are not embarrassingly parallel, and involve data exchange between processes. 
Heat operations and algorithms are designed to minimize this communication overhead, and to make it transparent to the user.\n", - "\n", - "In other words: \n", - "- you don't have to worry about optimizing data chunk sizes; \n", - "- you don't have to make sure your research problem is embarrassingly parallel, or artificially make your dataset smaller so your RAM is sufficient; \n", - "- you do have to make sure that you have sufficient **overall** RAM to run your global task (e.g. number of nodes / GPUs)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The following shows some I/O and preprocessing examples. We'll use small datasets here, as each of us has access to only one node." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### I/O\n", - "\n", - "Let's start with loading a data set. Heat supports reading and writing from/into shared memory for a number of formats, including HDF5, NetCDF, and because we love scientists, csv. Check out the `ht.load` and `ht.save` functions for more details. Here we will load data in [HDF5 format](https://en.wikipedia.org/wiki/Hierarchical_Data_Format).\n", - "\n", - "This particular example data set (generated from all Asteroids from the [JPL Small Body Database](https://ssd.jpl.nasa.gov/sb/)) is really small, but it allows us to demonstrate the basic functionality of Heat. \n", - " " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Your ipcluster should still be running (see the [Intro](1_intro.ipynb)). Let's test it:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from ipyparallel import Client\n", - "rc = Client(profile=\"default\")\n", - "rc.ids" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above cell should return [0, 1, 2, 3].\n", - "\n", - "Now let's import `heat` and load the data set." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "import heat as ht\n", - "X = ht.load_hdf5(\"/p/scratch/training2404/data/JPL_SBDB/sbdb_asteroids.h5\",dtype=ht.float64,dataset=\"data\",split=0)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We have loaded the entire data onto 4 MPI processes, each with 12 cores. We have created `X` with `split=0`, so each process stores evenly-sized slices of the data along dimension 0." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Data exploration\n", - "\n", - "Let's get an idea of the size of the data." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px \n", - "# print global metadata once only\n", - "if X.comm.rank == 0:\n", - "    print(f\"X is a {X.ndim}-dimensional array with shape {X.shape}\")\n", - "    print(f\"X takes up {X.nbytes/1e6} MB of memory.\")\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "X is a matrix of shape *(datapoints, features)*. \n", - "\n", - "To get a first overview, we can print the data and determine its feature-wise mean, variance, min, max etc. These are reduction operations along the datapoints dimension, which is also the `split` dimension. You don't have to implement [`MPI.Allreduce`](https://mpitutorial.com/tutorials/mpi-reduce-and-allreduce/) operations yourself; communication is handled by Heat operations."
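, - "\n", - "For contrast, a hand-rolled mean along the split axis would look roughly like this (a sketch using mpi4py-style calls on the process-local torch tensor; the names `local_sum` and `manual_mean` are illustrative, and Heat does the equivalent for you inside `ht.mean`):\n", - "\n", - "```python\n", - "%%px\n", - "from mpi4py import MPI\n", - "local_sum = X.larray.sum(axis=0)                      # sum over the local rows only\n", - "global_sum = X.comm.allreduce(local_sum, op=MPI.SUM)  # combine partial sums across all ranks\n", - "manual_mean = global_sum / X.shape[0]\n", - "```"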
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "features_mean = ht.mean(X,axis=0)\n", - "features_var = ht.var(X,axis=0)\n", - "features_max = ht.max(X,axis=0)\n", - "features_min = ht.min(X,axis=0)\n", - "# ht.percentile is buggy, see #1389, we'll leave it out for now\n", - "#features_median = ht.percentile(X,50.,axis=0)\n", - "\n", - "if ht.MPI_WORLD.rank == 0:\n", - "    print(f\"Mean: {features_mean}\")\n", - "    print(f\"Var: {features_var}\")\n", - "    print(f\"Max: {features_max}\")\n", - "    print(f\"Min: {features_min}\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Note that the `features_...` DNDarrays are no longer distributed, i.e. a copy of these results exists on each process, as the split dimension of the input data has been lost in the reduction operations. " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Preprocessing/scaling\n", - "\n", - "Next, we can preprocess the data, e.g., by standardizing and/or normalizing. Heat offers several preprocessing routines for doing so; the API is similar to [`sklearn.preprocessing`](https://scikit-learn.org/stable/modules/preprocessing.html), so adapting existing code shouldn't be too complicated.\n", - "\n", - "Again, please let us know if you're missing any features." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "# Standard Scaler\n", - "scaler = ht.preprocessing.StandardScaler()\n", - "X_standardized = scaler.fit_transform(X)\n", - "standardized_mean = ht.mean(X_standardized,axis=0)\n", - "standardized_var = ht.var(X_standardized,axis=0)\n", - "print(f\"Standard Scaler Mean: {standardized_mean}\")\n", - "print(f\"Standard Scaler Var: {standardized_var}\")\n", - "\n", - "# Robust Scaler\n", - "scaler = ht.preprocessing.RobustScaler()\n", - "X_robust = scaler.fit_transform(X)\n", - "robust_mean = ht.mean(X_robust,axis=0)\n", - "robust_var = ht.var(X_robust,axis=0)\n", - "\n", - "print(f\"Robust Scaler Mean: {robust_mean}\")\n", - "print(f\"Robust Scaler Var: {robust_var}\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Within Heat, you have several options to apply memory-distributed machine learning algorithms to your data. Check out our dedicated \"clustering\" notebook for an example.\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Is the algorithm you're looking for not yet implemented? [Let us know](https://github.com/helmholtz-analytics/heat/issues/new/choose)! " - ] - } - ], - "metadata": { - "language_info": { - "name": "python" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/tutorials/local/6_clustering.ipynb b/tutorials/local/6_clustering.ipynb deleted file mode 100644 index 6e6960b405..0000000000 --- a/tutorials/local/6_clustering.ipynb +++ /dev/null @@ -1,787 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Cluster Analysis\n", - "================\n", - "\n", - "This tutorial is an interactive version of our static [clustering tutorial on ReadTheDocs](https://heat.readthedocs.io/en/stable/tutorial_clustering.html). \n", - "\n", - "We will demonstrate memory-distributed analysis with k-means and k-medians from the ``heat.cluster`` module. As usual, we will run the analysis on a small dataset for demonstration. 
We need to have an `ipcluster` running to distribute the computation.\n", - "\n", - "We will use matplotlib for visualization of data and results." - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "4 engines found\n" - ] - } - ], - "source": [ - "from ipyparallel import Client\n", - "rc = Client(profile=\"default\")\n", - "rc.ids\n", - "\n", - "if len(rc.ids) == 0:\n", - "    print(\"No engines found\")\n", - "else:\n", - "    print(f\"{len(rc.ids)} engines found\")" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "%px import heat as ht\n", - "%matplotlib inline" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Spherical Clouds of Datapoints\n", - "------------------------------\n", - "For a simple demonstration of the clustering process and the differences between the algorithms, we will create an\n", - "artificial dataset consisting of two circularly shaped clusters positioned at $(x_1=2, y_1=2)$ and $(x_2=-2, y_2=-2)$ in 2D space.\n", - "For each cluster we will sample 100 arbitrary points from a circle with radius $R = 1.0$ by drawing random numbers\n", - "for the polar coordinates $(r \in [0,R], \phi \in [0,2\pi])$, translating these to Cartesian coordinates,\n", - "and shifting them by $+2$ for cluster ``c1`` and $-2$ for cluster ``c2``. The resulting concatenated dataset ``data`` has shape\n", - "$(200, 2)$ and is distributed among the ``p`` processes along axis 0 (sample axis)." - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "\n", - "num_ele = 100\n", - "R = 1.0\n", - "\n", - "# Create default circular point cloud\n", - "# Sample radius between 0 and 1, and phi between 0 and 2pi\n", - "r = ht.random.rand(num_ele, split=0) * R\n", - "phi = ht.random.rand(num_ele, split=0) * 2 * ht.constants.PI\n", - "\n", - "# Transform polar coordinates to Cartesian coordinates\n", - "x = r * ht.cos(phi)\n", - "y = r * ht.sin(phi)\n", - "\n", - "\n", - "# Stack the sampled points and shift them to locations (2,2) and (-2, -2)\n", - "cluster1 = ht.stack((x + 2, y + 2), axis=1)\n", - "cluster2 = ht.stack((x - 2, y - 2), axis=1)\n", - "\n", - "data = ht.concatenate((cluster1, cluster2), axis=0)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's plot the data for illustration. In order to do so with matplotlib, we need to unsplit the data (gather it from\n", - "all processes) and transform it into a numpy array. 
Plotting can only be done on rank 0.\n" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "data_np = ht.resplit(data, axis=None).numpy() " - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "\u001b[0;31mOut[0:13]: \u001b[0m[]" - ] - }, - "metadata": { - "after": null, - "completed": null, - "data": {}, - "engine_id": 0, - "engine_uuid": "e3649dd0-f970dcd5e37935a1f3fe07c8", - "error": null, - "execute_input": "import matplotlib.pyplot as plt\nplt.plot(data_np[:,0], data_np[:,1], 'bo')\n", - "execute_result": { - "data": { - "text/plain": "[]" - }, - "execution_count": 13, - "metadata": {} - }, - "follow": null, - "msg_id": null, - "outputs": [], - "received": null, - "started": null, - "status": null, - "stderr": "", - "stdout": "", - "submitted": "2024-03-21T09:43:55.286159Z" - }, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "[output:0]" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "image/png": "<base64-encoded PNG omitted: scatter plot of the 200 sampled data points>", - "text/plain": [ - "" - ] - }, - "metadata": { - "engine": 0 - }, - "output_type": "display_data" - } - ], - "source": [ - "%%px --target 0\n", - "import matplotlib.pyplot as plt\n", - "plt.plot(data_np[:,0], data_np[:,1], 'bo')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now we perform the clustering analysis with kmeans. We chose 'kmeans++' as an intelligent way of sampling the\n", - "initial centroids." - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[stdout:3] Number of points assigned to c1: 100 \n", - "Number of points assigned to c2: 100 \n", - "Centroids = DNDarray([[ 2.0065, 2.0425],\n", - " [-1.9935, -1.9575]], dtype=ht.float32, device=cpu:0, split=None)\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "[stdout:2] Number of points assigned to c1: 100 \n", - "Number of points assigned to c2: 100 \n", - "Centroids = DNDarray([[ 2.0065, 2.0425],\n", - " [-1.9935, -1.9575]], dtype=ht.float32, device=cpu:0, split=None)\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "[stdout:0] Number of points assigned to c1: 100 \n", - "Number of points assigned to c2: 100 \n", - "Centroids = DNDarray([[ 2.0065, 2.0425],\n", - " [-1.9935, -1.9575]], dtype=ht.float32, device=cpu:0, split=None)\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "[stdout:1] Number of points assigned to c1: 100 \n", - "Number of points assigned to c2: 100 \n", - "Centroids = DNDarray([[ 2.0065, 2.0425],\n", - " [-1.9935, -1.9575]], dtype=ht.float32, device=cpu:0, split=None)\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "%%px\n", - "kmeans = ht.cluster.KMeans(n_clusters=2, init=\"kmeans++\")\n", - "labels = kmeans.fit_predict(data).squeeze()\n", - "centroids = kmeans.cluster_centers_\n", - "\n", - "# Select points assigned to clusters c1 and c2\n", - "c1 = data[ht.where(labels == 0), :]\n", - "c2 = data[ht.where(labels == 1), :]\n", - "# After slicing, the arrays are no longer distributed evenly among the processes; we might need to balance the load\n", - "c1.balance_() # in-place operation\n", - "c2.balance_()\n", - "\n", - "print(f\"Number of points assigned to c1: {c1.shape[0]} \\n\"\n", - "      f\"Number of points assigned to c2: {c2.shape[0]} \\n\"\n", - "      f\"Centroids = {centroids}\")\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's plot the assigned clusters and the respective centroids:\n" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "# just for plotting: collect all the data on each process and extract the numpy arrays. 
This will copy data to CPU if necessary.\n", - "c1_np = c1.numpy()\n", - "c2_np = c2.numpy()" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[output:0]" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAiIAAAGdCAYAAAAvwBgXAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/H5lhTAAAACXBIWXMAAA9hAAAPYQGoP6dpAAA98klEQVR4nO3df5RU1Z3v/U9Vd+jGQLdg+4uHRml+iRq9KiY2o3NFCALRiFEnetc4zk3ijD62V+R55kYljz8mEsyMEY2ORk0uybor0RgRyWVABBN/ZBEu4siYCwINTdIkHVQk6QYmNOmu8/zRvQ+nTp1Tfar6nDqnut6vtVihq6vq7Kpk5nzY+7u/O2VZliUAAIAYpOMeAAAAqFwEEQAAEBuCCAAAiA1BBAAAxIYgAgAAYkMQAQAAsSGIAACA2BBEAABAbKrjHkA+mUxGHR0dGjlypFKpVNzDAQAAAViWpYMHD2rMmDFKp/PPeSQ6iHR0dKixsTHuYQAAgCLs3btXY8eOzfucRAeRkSNHSur7IHV1dTGPBgAABNHV1aXGxkb7Pp5PooOIWY6pq6sjiAAAUGaClFVQrAoAAGJDEAEAALEhiAAAgNgQRAAAQGwIIgAAIDYEEQAAEBuCCAAAiE2kQeSpp57SOeecY/cBaW5u1po1a6K8JAAAKCORBpGxY8fqoYce0ubNm7V582Zddtlluuqqq7R169YoLwsAAMpEyrIsq5QXHD16tP75n/9ZX/7ylwd8bldXl+rr69XZ2UlnVQBA6I6sXyql0qqdeUfu7157TLIyqp11ZwwjK2+F3L9LViPS29ur559/XocPH1Zzc7Pnc7q7u9XV1ZX1BwCAyKTS6l7/SF/ocDjy2mPqXv+IlKKUMmqRf8O/+tWvNGLECNXU1OiWW27RihUrdOaZZ3o+d8mSJaqvr7f/cPIuACAqR9YvlSTVzFqYFUZMCKlqavacKUG4Il+aOXr0qNrb2/XHP/5Ry5cv13e/+1298cYbnmGku7tb3d3d9s/m9D6WZgBg6Cv1MokJHDWzFkpS3wxI1TCp96ikvoBCEClOopZmhg0bpokTJ2ratGlasmSJzj33XD322GOez62pqbF32HDiLgBUmBIvk9TOvMOeDZFECIlJdakvaFlW1qwHAACS7Bu/CQa1M+/ImrWIIhi4rzkQr1kb85iknFkbCl4HFmkQueeeezR37lw1Njbq4MGDev755/X666/rlVdeifKyAIAy5QwG3T9/Quo9WtrZiaphqpnRkhWGsvTP2mT9zvGYWeaRpEPPfFG9ezZmPWYQUI6JNIh88MEHuvHGG/X73/9e9fX1Ouecc/TKK6/os5/9bJSXBQCUsdqZd9ghRFXDIg0h9rKPlLM04xVGvGZt/N63d8/GvNf0CiiVKNIg8r3vfS/KtwcADEFHXnvMDiHqPaojrz0WWRjp2b1B0rGaEGdIqJm1ULIyOa/xm7XJ95h5XdRLTeWo5DUiAAD4cd+onTMWYd+4zayFMxQ4Q0bNrIWeSyd2TYiZQXHM2nQ7QpRzvLEtNZUBgggAIBG8ZguCLoUUxcp4hgL7Z4/ZEElZNSHOWZueto2S1SulqrJmckq51FSOCCIAgGQoNhgUKV+haNCwUDOjRdKxoFTVNF0jbn4uu/ZEKtlSUzkiiAAAEsEEA68tsubvce828WyClqqyf1/ddJGk3JmcUiw1lSuCCAAgWby2yKr43Sahdmx1zdrYSy6pKtXMvMNz1sbZKj7SpaYyRRABACRK6I3NQgw2Oc3KHEsu7t+XeqmpXBFEAACJE2Zjsyg6tgbZ3RNGDUolIIgAABIpzN0mYQabku/uGeIiP/QOAIBiuJc+Dj17Q+5z1i/VoWdv6KsD8Xi98/HamXd49v4oWJ4lF3cTtCPrl+Yc4uc3vkpFEAEAJI5z1qH+wVZVNU1Xb9uGnDDSs2eTets2qGfPJs/XO0/s9erYWozaWXf6hpjamXdkL8mU+EThcsQ3AABIFK+ljxE3P5cTRo689ph62zbYj5ubvdfr3cHGnCVTbBgJysySOK9Fm/dsKcuyrLgH4aerq0v19fXq7OxUXV1d3MMBAJRAvu22h569Qb1tG+xZjZxiUdfjkv+Nv5SBIN/4Bv3eYW5PDkkh929mRAAAiZJv6WPEzc951nnkrf8ooKYjKl7jC61+pMyXf5I9OgBARXPfrHMKWJ/5oufjztcUVNMREc/xhRQgyn35h+27AIC8Yp36dx4wJ2U1IOte/4h692y0l2uS2kbdr+dIzayFdoCQBtffJMztyaVGEAEA5Bdyy/VCeJ3ZYn6umbVQPW0b7YJVr54ePW0b+5ZzXIoJUMUEsoF6jjjDSBiN28rxlF+WZgCgAhVSnxD31H/tzDtU1dQsqe9sF+d1q8d/WlVN01U9/tMer8neTeP8fGbpo6A6jWKWUgLUp4TV3ySs7cmlxowIAFSiAmc54p76H3Hz8+r82qTcItU8Mxojbn4uZ5kmSGt2yft7KKZVfJA2714BotDvNejnSiKCCABUoKJuqjFO/Rd7sx4oQBX6PYQdyMIIEOXecp4gAgAVqpCb6pH1S/u6l3qEgbALVt21GO7ZiZ62XxZ0gx0oQBUaLsIKZKEFiDI/5ZcgAgAVLOhN1bRST41qVN1//4V9E3UWix5ZvzScMDLATpmaWQtV3dQc+GYdZDalkHARxlKKpNACRLmf8kuxKgBUsCAFjqaVempUo6w/7NWhZ2/IKgZNjWrs63YaUuMsZ3Fsz+4NOSGkduYdkpVRVdN0z5u1s8g0aGv3oIWeYbaKT0J/kyRgRgQAKlTg+gTHv9xNz47Oe5okq9cOJ2EXrjqXKHrb/y13uSSVVm/bBlU3XeT7mYIufRRTwFqOtRhJRRABgAqUVXdhZbKWF9w3afN3qW8nigkhkiIJIYbfcompIfFrBlbV1Gy3UB9o6aOgcFHmtRhJRRABgErkuKl6zgC4btJGX2Fq77H3SVVFNgvgW4vRX0Pi1QxMkqr7e44E2jobIKzYj5V5LUZSEUQAoAI5b6pBt7Aem3GY3l8TUiVZvTr07A2e3UsHI8hyiR2STDMwKavDqvv9vHb2EC7iRxABAAy4hdUdQtw1I2GGkUKWS5y7ayTZnVSDNGlDMrBrBgAgyfuoelv/LhVnCJH6akb8WqkXLUBb9BxVw7J215TjKbSVKmVZlhX3IPx0dXWpvr5enZ2dqquri3s4ADCk2csf/WHEffOO9RR
ev7FKWeOV5P24lUnM2CtBIfdvZkQAAIH6YySp70XP7g2SlDNe85hSVdkzO8UcWIeSoEYEACqE34yGe9urlNz+GKbVfO+ejTk1JD1tG+06Flm9np1PCzlbB6VBEAGASuFz4q6ZXTDbXo1E9sfob2Tm3h3j7P7qrGPx220TxwnC8EYQAYAK4Tcr4J5d8HpNUjg/g/PQPa8dPe7nm5/jOkEY3ihWBYAKM1BR6qDfvwRFrV6fIUhBqj0rFNFnRx+KVQEAvvJu0w1DCQpDvT7DQMW0zo6sgz2wDuFhaQYAKkxox9j7KEVhqP0Z+nfHeDUxc868cGBdchFEAKCCBD5xd5CiLAx114RUNU33DT02DqxLLGpEAKBC+M1KRLmNtfNrk+zZl/oHWwf9fn5ByhlK3AWrSZWkBnFho0YEAJCrmNbpg+C1BDRors9gxm4O4SuXECKJJmv9WJoBgApRypNmo1oC8voMWVty89y8i5lliHLWgiZrfQgiAIBQlbow1D3zYq7xC+ss/b933qGHlz6mi1Nbc+pGgoQMvyZwYZ3oS5M1gggAIGwlLAz1m3k5su5bWrQypX/f3q5Ft39Za6+yVPvZ/yd7TK6QYYKJpJwg1b3+EfXs3qARf/fj0GctKr3JGkEEABAqr6WKKJY48s28rH5msTZt/0j/94Uj9eTb7Xrrpr/XlT7ByD6t1xFM7GDTf7aNJPXu2WgX35qZkCPrl4bSnC3K7dRJVxmVMACAeHkUZh5Zv1SHnr3BszDzyGuP9YWX/ueZ1zn/LivTf8BdJus1NZf9N33zveG6cMwwLb7seF04Zpi+seLf5LVJ1BS72ksjLj17Ntm7cZwN1CSFUlAa5NTjoY4ZEQBA5LxqRJw3+bz1F46ZCvffe9s2qLrpomPbeMdfpHWbb9Wm7e1a/lcnKpVK6a6L63XNC29r1Tdv1WenTcmZwXAvjdTMaMmq2TBbgiVl1aEMdmmGJmt9CCIAgJLwKsw0N3n3AXZ+N+eaWQvtWQNJdlgxvUSqxn9GX3/gq7pwzDBdNr5WknTZ+FpdOGaYvv74Ms36wTdzxuVeGpHkmv3wb7c1qJ0zNFmTFHFDsyVLluill17S9u3bNXz4cE2fPl3f/OY3NWXKlECvp6EZABQv0K4QqeRNtdxNzoIewud+niT77ybQvPbhCF3zP7Zp+V+dqJlNw+3Xvtb2J13zwkf66ZK/15V3fcd+v57dG7JOH3ZuM866jtTXTt7qzZo1kVRxu1yCSExDszfeeEO33XabNm7cqHXr1qmnp0ezZ8/W4cOHo7wsAEAK1jCrxE21fAszAxzC536e8+/V4z8tHT9WD72yK2s2xLhsfK0u/L9q9OAPVsuyLPvzOUOIW82MFlU1NUuSUqMa+0JI/9k2PW0bJUlV4y8ihAxSpEszr7zyStbPy5Yt00knnaR33nlHf/mXfxnlpQGg4hXSMKsUTbX8ttr2tG0MtGsk3xJKz55Neu3fWvV2x1G7NsQplUrprr+o0zUv7NWqb96qv+xco6qmZlU3NUtWRoee+aJdc2LXplgZVTc1q7djq6w/7FVqVKOsP+y1O7hWNU3XiJufC+37qVQlrRHp7OyUJI0ePdrz993d3eru7rZ/7urqKsm4AGCoCtIwqxRNtfxqP3raNto39erxn1bPnk2ehZqHnr3BDglmOUXqm7WQpCPrvqWH3u71nA0xnLUib/zgmxo+a4E9NvN+zsJZ5zJNalSjhp1/jbpff9I+9ddZ24LilWz7rmVZWrhwoS6++GKdffbZns9ZsmSJ6uvr7T+NjY2lGh4ADFlBlj6ynpOq8p2RMFtqC+ZRmHnktceyQoiZkTCn6ZrlIhNCUseP7Qsu/aEhNarRDgqvH2nS23sO6K6L63NmQwyzg+btjqN6s3dq3xj6P0/V+IskyQ4XXiFEqfSxGRmrN2vrsHkvv223g/ruhriSBZGWlha99957eu45/2msu+++W52dnfafvXv3lmp4ADBkBTl8zn5Of0HmoWdvyPn9YGpGamfdmRtu+sPJiJufs39vDrAzN3lnWBk27a/sv1c1TbeXS7rf/rG+8ZNf5p0NMcysyL0L/k5/Wv+oeto25nyu7vWPZIUQ6w977ZkaZ7+PvoP2HN8Hh9gVpSRLM7fffrt++tOf6s0339TYsWN9n1dTU6OamppSDAkAKkKQw+fczzEzEIeevUEjbn4uspoRvwPspL4w0PubzZ7LRM4eI71tG/Ra2598a0PcjvUVadeaZ7+heX+3SNUTptvbf+1+IX3PlvWHvfbjA/X7SNIhdlEe1he2SIOIZVm6/fbbtWLFCr3++usaP358lJcDADgEaZhl/u6sjRhx83N2GOm8+3RJVklvpPnOXskaf9UwWZalh37RqfHHV+uE46q0Zd9Rv7e1nXBclcYfX62HftGpuTdbdq2I8zvpY9nLRtVNubtjvPp91M68Qz27N/jW25QsBER8WF+YIg0it912m370ox9p5cqVGjlypPbt2ydJqq+v1/Dhwwd4NQBgUAI2zHI3FZP6wogJIX41I1EZ6OwVZ1A52it1HOzV7w726tLv7yvoOn+urtPRo39WvruR6dyad0uxS/WE6X11LK4gVcoQkKTZmYFEGkSeeuopSdKll16a9fiyZcv0t3/7t1FeGgAqXr5/dbtvRO4lm74aEcuuGSnV7pCgS0lm+25NdUprbzxZ+//D1YU0Xa2Rt62UJHVv+pFkWTr69vN94av/dyeddJLqx47NbmKmVNbnlgbZbt1Rk1PqEFCK3VBhiHxpBgCQfFk3rdces3eFOGtEnM+LwkBLSUff+YnSoxrtOg4zkzO2rlqN48bKOtIlHTnW9qHmD2/1vf788/uWmk6utgNGzR/eUu35fe/ds7vv/Uxhqnlf8599zcyCtVt3z3o4C1/jCAHuZS5TAJyk2hHOmgEASOq/afWHEKWq7GZdJTuIzbGU5Cy2dPYb6f3DXqm2Tqnh9VlhwfrjbyUdCxNmvD1tG5U50C7rj7/1DFaS1LtnY05Bqn2InilgDbDjxStI2SEgJu5lLnPQoJSc2hGCCABUOHPTl3SsjXn/ckzfY45/JUd4EFvWv8RdxZZ9BbTXq7ftl32zHrV1ObtcUsePVd1//0XWGTLO2ROvYFXV1Nx387UyWbUgzjqa6qaLgn1uV01OTgho+6VU4lobr2Uu06dFSkbtCEEEACqd46bvVZth/pVcypuU1yxMdVNzXxCRZP3xt+rtnwWpGn+R3QzNLDsc6j8Lpk8qpxW7M2gUUoiad8yOIOUXAuKqtZHcAWx6YmpHCCIAgETyK7Z0L3dUT5iedbO3l5ekvOfXRHXjdc48eIWAnraNqh7/6WhrMQLsmDJ9WvIdNFgKBBEAqHT9Ny0p96Zvfh+WQhtt5RRbSjk1F85ZE2cIybfzptAxZi9fZXJmP9zLV2bZyDSFM9c2tS7mPaMKIwPtmBpoi3QpEUQAoMI5b1p+jcRCU2CjLfcN07y2qmm6qpsusn/OmQlR326YI/Je5sm7Q8RrjK7lq3zjNu/p1aHWWWBb3XRRYd9dSEytjVdQk1TynT
MEEQCApIEbiYWhkEZbXlthpb6dMeZGXjNr4bGbaH8IcYYUc0Cec0lioB0iQXcJDVTkmdWh9p4me0u0u118KfWFoV9mPeb+vKXeOUMQAQAEaiRW8Hua5Qwrk7XUYZYonLMY+UKI2c5b1dQsKZV1Qm/NrIVZW3YlZe1+cfbxKGSHiF99itdjvkFl/VJVN12k3j3/2w5JJoSY35f8vBfXMpyU/d9v1Xj/LrJRIYgAQIULciZNUTen/uUM5zZbEwb6fk7Z24Vz3t9VbOlVk+FcmpHke4PtaftlUTtE/M68Cbx85VjOOfZYlT3GOHp2uINPEnbOEEQAoIyFcspqwDNpCuW1XdQUa9qzGD4t5IO0p7f7nEg5oSArjMy8Q51fm1Rw7YvXUpWk4pev+j9rUs578QtapTZwqzgAQHL1/6s766Ysx9kxATqC1s66M28vjcEsH9TOvEM1sxb2dyetsmdCTCv1+m+02XUe7s8wENOa3R0UzDVNgPINFHk4Z4nqH2y1x+j1mNf7Obfw2tJVx8ae1eckHsV8L1FgRgQAyljSTln1mqHJ+pe3JMmSs8lYMT02jrz2mHr3bPStaXHOmhRa+1JMHUnO+zm28Gb1PqkapqrTpuWcdlxqUdQEFYsgAgBlzq+wMpabnMfWV+dpucdYvj02BtrWGrSmpejaF6+lKkeRp3Opym/5qnbWnXaxqqSsmYfqpouCt42PQGQ1QUVKWQk+Irerq0v19fXq7OxUXV1d3MMBgERz1kHUP9ga2zj8tt1KyjrR1izPjLj5OR165otZMxzu93PWugStiwmlfmaQ/GYe4qwRKcX3Usj9mxkRABgCktQp0/2va8N9Mzb9QEyPDS/5Goblu7bf80oZTpI282AE/f5KhSACAGUuSev9Rla79VSVahx1G87ljO6fPd73nKphqpnRkndpJZQQ4dxS6+hv4g48oYSSiHYjDTUEEQAoY0n9V/cRE0L6Z2jcjrVZz36OvTvFq9alwPbwXtxbinvbNti1KeaQurBCSdJmHpKKIAIA5SyB/+oOMkOTr3bCDiau3hZh7RDKep/+LcVmmci0ZPccFyJBsSoAIDSFnBvj9xxJdhjxK17tXv9I3ucEYRf39jcaM/9pCmiTUFharihWBQDEI+AMTb6be1VTs0bc/LxvrUsYHUHdxb3OMNLbtsEOKYSQ6BFEAAChKbYuopBal8HuEHJfyyzHZM2MxNz2vJLQ4h0AEL88Mynudu1B26x78apNMYWqsnqVGtWYFUZK0fb8yPqlvtc58tpjfbuFhjBmRAAAsQt6yN2gdwg5Ao/fzIjZTWMO6gv0voMRwm6gckYQAQCUhxB2CGUFHlcoce+WkZVRddNFkYeRpJ0XVGrsmgEAVLwgzdIkRdqVNazdQElQyP2bGhEAQMWrnXWn702/duYdfQGjfwnFXc9hB4jU4G6ptTPv8O2hMpSxNAMAQABRL6Ek6bygUiKIAAAQkDOMeLahL1ISzwsqFYIIAKAiFXuIXhgN1dzXSuJ5QaVCjQgAoDIVWfPhtYSS9ftC+4IE7KEyVDEjAgCoSMXUfARaQimwL0iln9JLEAEAVKxCaj4GWkLpaduoETc/lxNwJKln9wb17tno/95FLhMNBSzNAAAqWuBts3mWUEw3VrMkY5ZVutc/ou71j+QNIZIi3xqcZMyIAAAqWtBts/lmJEbc/Fz+nS4DFLVWcndVgggAoGKFuW3Wa5lHUuC+IFFtDU66oTvXAwAomXI8Qdav5qPQE32dspZ5pIJPCa7E7qrMiAAABq8cT5AN4RA9N3uZxyVoX5BK7K5KEAEADFo51jiEvW3WfN6qpmZVNzVLyv4+Bgo4ldpdlSACAAhFpdY4SN7LPIZnGAnw+krprkoQAQCEJuz252VjsMs8ESwTlYuUZVlW3IPw09XVpfr6enV2dqquri7u4QAABmAvJ/TXOFTKjAiyFXL/ZtcMACAUzuWFQnaKoLKxNAMAGLQoaxwquf15JSCIAAAGL8oah3LcGozACCIAgEGL8gTZctwajOAIIgCAxKvkrcFDXaTFqm+++aauvPJKjRkzRqlUSi+//HKUlwMADGGV2P68EkQaRA4fPqxzzz1XTzzxRJSXAQBUAK/25yh/kS7NzJ07V3Pnzo3yEgCAClCp7c8rQaJqRLq7u9Xd3W3/3NXVFeNoAABJUMntzytBooLIkiVL9MADD8Q9DABAklRw+/NKULIW76lUSitWrND8+fN9n+M1I9LY2EiLdwAAykghLd4TNSNSU1OjmpqauIcBAABKhLNmAABAbCKdETl06JB27dpl/7xnzx5t2bJFo0eP1rhx46K8NAAAKAORBpHNmzdrxowZ9s8LF/adB3DTTTfp+9//fpSXBgAAZSDSIHLppZeqRLWwAACgDFEjAgAAYkMQAQAAsSGIAACA2BBEAABAbAgiAAAgNgQRAAAQG4IIAACIDUEEAADEhiACAABiQxABAACxIYgAAIDYEEQAAEBsCCIAACA2BBEAABAbgggAAIgNQQQAAMSGIAIAAGJDEAEAALEhiAAAgNgQRAAAQGwIIgAAIDYEEQAAEBuCCAAAiA1BBAAAxIYgAgAAYkMQAQAAsSGIAACA2BBEAABAbAgiAAAgNgQRAAAQG4IIAACIDUEEAADEhiACAABiQxABAACxIYgAAIDYEEQAAEBsCCIAACA2BBEAABAbgggAAIgNQQQAAMSGIAIAAGJDEAEAALEhiAAAgNgQRAAAQGwIIgAAIDYEEQAAEJuSBJEnn3xS48ePV21trS644AK99dZbpbgsAABIuMiDyI9//GMtWLBAixYt0rvvvqtLLrlEc+fOVXt7e9SXBgAACZeyLMuK8gKf+cxndP755+upp56yH5s6darmz5+vJUuW5H1tV1eX6uvr1dnZqbq6uiiHCQAAQlLI/TvSGZGjR4/qnXfe0ezZs7Menz17tjZs2JDz/O7ubnV1dWX9AQAAQ1ekQWT//v3q7e3VySefnPX4ySefrH379uU8f8mSJaqvr7f/NDY2Rjk8AAAQs5IUq6ZSqayfLcvKeUyS7r77bnV2dtp/9u7dW4rhAQCAmFRH+eYNDQ2qqqrKmf348MMPc2ZJJKmmpkY1NTVRDgkAACRIpDMiw4YN0wUXXKB169ZlPb5u3TpNnz49yksDAIAyEOmMiCQtXLhQN954o6ZNm6bm5mY988wzam9v1y233BL1pQEAQMJFHkS++MUv6uOPP9Y//uM/6ve//73OPvtsrV69WqeddlrUlwYAAAkXeR+RwaCPCAAA5ScxfUQAAADyIYgAAIDYEEQAAEBsCCIAACA2BBEAABAbgggAAIgNQQQAAMSGIAIAAGJDEAEAALEhiAAAgNgQRAAAQGwIIgAAIDYEEQAAEBuCCAAAiA1BBAAAxIYgAgAAYkMQAQAAsSGIAACA2BBEAABAbAgiAAAgNgQRAAAQG4IIAACIDUEEAADEhiACAABiQxABAACxIYgAAIDYEEQAAEBsCCIAACA2BBEAABAbgggAAIgNQQQAAMSGIAIAAGJDEAEAALEhiAAAgNgQRAAAQ
GwIIgAAIDYEEQAAEBuCCAAAiA1BBAAAxIYgAgAAYkMQAQAAsSGIAACA2BBEAABAbAgiAAAgNgQRAAAQG4IIAACITaRBZPHixZo+fbqOO+44HX/88VFeCgAAlKFIg8jRo0d13XXX6dZbb43yMgAAoExVR/nmDzzwgCTp+9//fpSXAQAAZSrSIFKo7u5udXd32z93dXXFOBq4LV+1Rul0WlfPuzzndytWr1Umk9E1V8yNYWQAgHKVqGLVJUuWqL6+3v7T2NgY95DQb/mqNdq+a7deXLVaK1avzfrd4kef0IurViudTtT/nAAAZaDgO8f999+vVCqV98/mzZuLGszdd9+tzs5O+8/evXuLep9KsHzVmpxAYKxYvVbLV60J9XrpdFrbdrbqzMmTssLI4kefsB/3mikBACCfgpdmWlpadP311+d9zumnn17UYGpqalRTU1PUaytNOp3Wi6tWS1JWAFixeq1eXLVa114xL9TrmWu8uGq1HUZeWv2KMpmMzpw8SYsWtIR6PQBAZSg4iDQ0NKihoSGKsaAAzmBgfnaGkChmJ9zXzGQySqfTviHEr6bELPOcMXFCTk0JtSYAUFkiLVZtb2/XgQMH1N7ert7eXm3ZskWSNHHiRI0YMSLKS1cEZzB4+ZW16unpjSyEOK/50uo1ymQsSX1hZMXqtZ7X9Ju12b5rt7btbM16rgkn23a25szmEE4AYOiKtLrw3nvv1Xnnnaf77rtPhw4d0nnnnafzzjuv6BoS5Lp63uWqrq5ST0+vqqurIq/TWPzoE8pkLKXTKUnKqRlxj+3aK+Zl/X7F6rV2Tcm2na324yaEuGtNzCwPhbAAMDSlLMuy4h6En66uLtXX16uzs1N1dXVxDyeRzI3ahJEoZ0SchamLFrTY1zahwu/afmN0P+5+n6iXmgAA0Sjk/p2oPiIojPtGbX6WFPqN2zmTYWpC3AWsmUwm53WmTsRv1iadTtmPO8NNqZaaAADxIoiUKa/ZAq8C1rBkMhnPUGB+9qvhcNaJmDBilmO8Hr963uV2CCnFUhMAIF4EkTIVJBiEKV+haNCwMH9OdlCSlDObs21nqx1CnOEEADA0EUTKlAkGXltkzd/j3m3i7mli6kEMZ2Hq1fMu17adrZ41KOb3AIChhyBS5sJubBbmeTLuWRuz5JJKpTR10kSdMXFC1nubEGIej3KpCQCQDASRMhd2Y7Mwg40zsKxYvTZrycW9TbfUS00AgGQgiAwBYTY2i6Jja5DdPWHUoAAAyg9BZIgIc7dJmMGm1Lt7AADlhXaVQ4R76WPxo0/4Pi/IybxhdWzNt+Ry7RXzWHIBgApHEBkCnLMOP/j2I3aHUncYeXDp477t0t0BxR1svFq4B3HNFXN9Q8zV8y7PWZJZvmqN77WChigAQPkgiJQ5r6WPRQtacsLIitVr9X7rrrzvYQKKO9i4z4uJkimWdV+LM2cAYGiiRqTM+S19LFrQYp8Nc9N/W2jXeUj5C1HjrumIoljWKcztyQCAwSOIlLl8N81FC1rsEOKu8/ArRE3CNtowi2Xdwu67AgAYHOa5hzC/Oo98haiF1nRExT3GTCYTSu2IKZJ1Lv9wyi8AxIcZkZDFOfXvvLb75rr40SeyznhJ+nku7hC1fddubdvZKmnwMxlRzrgAAApDEAlZnFP/5trmzBZn3Ydpn27Glq+5WBgGE8j86lac4x9s7Qin/AJAMhBEAijkphp1sWU+zmubFuruALRtZ6umTproWYi6bWerFi1oGfAzBlFsIBuoWNaEkTAarSV9VggAKgFBJIBCb6pxTv07r+3cLXP1vMu1fNUa30JUM4viviE7P2MpAlmQYtmdbW2DmskI0nIeAFAaBJEAirmpxjn173ftgXbYuG/IQW/YYQaygc6cGexMRtzbkwEA2QgiARVyU12+ao2279rtecMMu2DVa5bC3KzT6XRBN+uBPmPcgSyMmYwkbE8GABxDEClA0Juq2eFx5uRJWTMNziLSsLiXjZyFne4C1aBhJN9nLHSWI6xajLBmMjjlFwCShT4iBQhy/orZodIwenRWzYUzGEgK7cwUZ18Ms0XXXOvaK+Zp0YIWTZ000bdFezFnzAQ9EC/MVvEcngcAQxMzIgEFXRYwN0yp71/qzpmDMydPkmVZoW/jdc4MpNOprK27knTWlMl6v3WXtu7Ymbe2o5CakIFmOcKuxWAmAwCGJoJIAIXcVN03zBdXrbZv2maZJIodNH5LKqaGxDkb4QwZZitv0M9YaCCjFgMAkA9BJADnTdVZHOq+qeYrRO3p6Y20l4jfLIWpIbn2inl2GDGBReqbLXF/RifnZxxMIPN6TwAACCIBOG+q7uJQc1PNt8zhvPFHIcgshfm9CSqS7KZn7s/o1xMkXx8SiVkOAEDhCCIFKmS5woQS50xF2L0qCpmlcJ41I2nABmZuzHIAAMJGECnCQFtY3QWr7oDiLhodjGJqMaqrqzR/zuV2Ma15PqfQAgBKLWVZlhX3IPx0dXWpvr5enZ2dqquri3s4OUwL9erqKv3g249k/c7vph7Hzd65VGNmZpxByf14JpOJ7QRhAED5K+T+TR+RIg3UbyNJfS+27tgpSTn9PMxj6XQqa7eNqYNxfyYTaNJp/mcDAAgHSzNFCFIcmoR6CtNq/v3WXTk1JNt2ttrNzzIZy7MnCMs2AICoEUR8+J006+6/ISX30LR0Om13c3UvD23b2aoTTxid1fzMb7dNqU8QBgBUDoKID/c2XcMsc5j+G0YSt7A6w4S7kZmzDXy+QBXXCcIAgMpAEPHhtzzhXubwek0h/GZepHAKQ/1mNjKZTM5MifP5poFZGAfWAQDghyCSRymWJ/xmXvL18yhUoTMbhbRyBwBgMAgiA4h6eaIUhaFmZiOdTnvObLhnXsI+sA4AAD8EkQGUYnkiypkXd02IOXjPXNdr5oUD6wAApUJDszzytW6PYkYgX4O0YviN3xlK3AWrSRZ1PQ0AIBw0NAuB3/KEaQbmbvYVxvXyNUgrhntmw4x/285WpdOpsgohkmi0BgBDEEszPkq5PBFVYajX7ICz5iWVSuUdU6EzDKXcAWR+ptEaAJQ3goiPUnVGLXVhqHvmJeiOnSAho1Q7gCQarQHAUEEQiVkSZl5eXLVa//7uv+nF53+ka6//L2r93b6cMblDhgkmknKClDlh+Gt33h7JjAWN1gBg6KBYtULkOw34J//rX/Xqip/o4w/26YSTT9Ez31umL3xuTt73kI7N2Jj3NGfbbNvZKin3pN+wiknNOJzvTxgBgOQo5P7NjEgCeS2DOGcg3Dd0Z/2F87XOv5tOqmaGxfmad9/ZrOc+2Kcp5/wn7Xhviz5Z7V074pzxqK6uyvm9CSFnTp6knW1t9oyFeU0YSzM0WgOAoYUgkkBetRbOx5w3dPcshfN57r+bkOB8jWVZev6H/1MnnHyKzp9+ifZ/sE+333GHdrz/vmcxq3tZZP6cy7PqNcyWYElZdShhzFrQaA0Ahh6CSAIFvbkOdGO+9op59nZjSVlLKua8mf96a4tad+zQpZ+7SqlUSp+a9hm9/q8r9V9v
bdGVV1yRs5TiLnaVjgWO6uqqrCASNhqtAcDQE1mNyK9//Wt9/etf189+9jPt27dPY8aM0V//9V9r0aJFGjZsWKD3qPQaEa9aCEmB6iPcr5WOBYapkyZKkj76+GP98LtPKyXps1dfp1QqJcuy9OqKn0iSFnz1Hn3tzttz3tNrWcR5HffSjJk1OXPyJJ0xcQJNxwBgiEtEQ7Pt27crk8no6aef1tatW7V06VJ95zvf0T333BPVJQdl+ao1vk3EVqxeq+Wr1gR6Tpiunnd51mzD1fMu93wsyGudfz9rymS937pL723Zoo8/2Kezp33GXoYxsyIff7BP3Qc77fdb/OgTeZdY5s+53A4423a25syamJkSmo4BAJwiW5qZM2eO5sw5tvOiqalJO3bs0FNPPaWHH344qssWLWgPjKj7ZDj5dVsNcvZNviWUrTt2auqkiVr70gs64eRTdGrjuKzXnto4TiedOkav/uv/0j/+f4v08ppX7fqSTCajFavXauuOnXq/dVfWjpizpkzWRx8f0P4DByT1hRPnd8buFgCAW0lrRDo7OzV69Gjf33d3d6u7u9v+uaurqxTDklRY185SdPbMtwwy0I4Rv/DkDAYd7b/Rxx/ss2tDnFKplM48/0K9/q8rdd+D39Du33+Yc80Gx3+P7rFIUsPo0b67awAAMEoWRHbv3q3HH39c3/rWt3yfs2TJEj3wwAOlGlKOIF07S9HZM0i4MVtzncWo2UFhlLbtbNW2na1qGD1aJzWcYL9nvtkQ49TGcTrh5FP05L88oWe+tyyrGNbMhpiTfJ3X2X/ggKZOmqiv3Xl71iF+8+dcnlVMygF2AACpiBqR+++/X6lUKu+fzZs3Z72mo6NDc+bM0XXXXaevfOUrvu999913q7Oz0/6zd+/ewj/RIAWpwXA+J9/NtNiaEa/dIeYxs9vFvZ3X3OS37tgpSTqpoUHbdrbqxBP6woFZWnl9wy/12muv6eMP9ulTjtoQN2etyCerU1qxeq2+/si3tWL1Wp01ZbIkZR2el0ql7CWZs6ZM9lwacgYLDrADAEhFzIi0tLTo+uuvz/uc008/3f57R0eHZsyYoebmZj3zzDN5X1dTU6OamppChxQqr7oMd9Awz0mn08pkMlr86BNatKAl6/eDqRnxmgnwmx0w17nmirlasXqtXbfhnB0xhaLbdrbKsiz9avP/zjsbYphZka/8/d/rkrmf11lTJudsC85k+jZdmc1X7i3CfktISTrAjtkZAIhPwUGkoaFBDQ0NgZ77u9/9TjNmzNAFF1ygZcuWJf5fuUG6drqfs/jRJ7RtZ6sdRkp5Mx1omcj5+3Q6pUzG0u/3tvvWhrg5+4oMs3qzPt+Zkyd5vsYEniBNx5JygF0pDusDAHiLrEako6NDl156qcaNG6eHH35YH330kf27U045JarLFi1I107z9zMnT7J/t2hBix1G/vq2O2RZpd0dMtABcM7fm9mQEXX1qh0+XAc++nDA968dPlwj6uq16uWX9NKVV+oLn5tjh41USnJ2oTEzL87vxzkOKbfpWN8yU8pz/KWajUjS7AwAVJrIgsirr76qXbt2adeuXRo7dmzW75J4zl7Qrp3mZutcslm0oMUOIel0qqQ3roGWko4tI6X05z/36E+HDuk/Dh/SKy8+X9B1hn2iWkePHtWK1WtzQoi5tgkh7u/H8Ppetu/arUzGssOIeV2pZyOSMjsDAJWG03eL4Lc8Y5Y/SnUD81tKMsWrzkPoTNv1w4cOKpXJ6Ihjm7QkzbzkL3TZxdP1s19s0Dv//iv9satLqXRKVsbSzEv+Qtd/Yb7Gjh2bc8KulF0XMnXSRJ01ZXKgmQx3DYv7P+MIAs6dPp+f/VlqRwCgCJy+GzHnv55fWv2KfbKts4bC+bwoDLSUdNzw4fqPP/1JJ54w2g4N114xT2/8cqM++viAjnO937vbWzVh4iR1Z6R0Ta0u/otPZX2eCe9t1dixY+3D88z7mes5C1jPmjI5cAhxhyjn4XylDiHu2SVn4KJ2BACiQRApgtllYXbNpNPprF0zUydNjDyMOJeSnLs+rp53uV3DceIJo/XRxwc0vLZWn5t1mbbtbNVHHx+wx3jWlMn2dt/3W3fZYzahyjl+Z78Q81p3kzfn9uJCxm+uY5ZE0um0zpg4IbwvKwC/YGR6pZgxUjsCAOEiiBTBucvChBHTD8PcpMzyRFS8enJIfTdL98zMn44csf9uZkpMkDA31/dbd9nv5wxV5j0l6f9s3zFgHU3QpYqBTvUt5Q6rgWaXTBihdgQAwkcQGaQvzOs7TyfO81T8tsd6mTdzRtZznX83NS5+hab53ncwnznItukoBSlUdp4mTAgBgPAQRArkdY5LVOepFNJoy2/Xh6lhMdz1HEaYNS6FjNvv7BozVtOEbeqkiZEVhuZ7XxOMghw0CAAoHEGkQO5/PTt7eLjPUxmsQhttuXuKbNvZqkwmo1QqJcuy7N0o7vBkajKcN1h3XYTfsotX6DDj3razVWdMnJATOpzjNoW+Zlzmms5aF0l2W/lSe3Dp457dap3jZ+cMABSPIFIg501noPNUBqvQRlvu8TgLVt1bY814U6lU1lZf53VN7Uu+HSJeYckdIszYvMbtDinOm7xzd04cMxDu2hkp978Tds4AwOAQRIoURV2D1+xC9lbhNb59SryWjCTpPzdfZP/s7CdiXPO5uVm/L3SHiF9YcoYe05sj6Ps4w0ichaFm9suMS8r+73bqpIks0QDAIBFEihCkHXwxNyjnkoZ754okuwNpvhBitvP69fgwzpw8yQ4ezt8Xs0MkX1dSZ4OwfO9jQphzdsm8Jq7mYe7rsXMGAMKX7FPoQrZ81ZqcY+eNFavXavmqNYHeJ98ui6B9NLxcPe9yexZh8aNP2OPKrkmwcj6DezzXXDHXrrPwGk86ndKiBS32753LDGdMnGCHgUJ2iFw97/Kc13kVefoxIcyEEEnq6enV4kefsBudxcnr8wEABq+iZkTCOmV1oF0Wg+E8RO/GljuzQoTfMkGQ8Ty49HFJytn54e4BUuwOEffrzGcoZunKOVsTV5dVN3bOAEA0KiqIFFr8GZdFC1o8Q4i7k6l7V4ofU3TpFwrM+xZb9+J39o4zQORbunJex8wI7Wxrs3/vd4heqcTd5wQAhrKKCiJSsk5Z9eu3YWoi/Dh3pZw5eVLeawStZym27sXrdaY9uztAOGdfnDKZTFbbeGddyfw5l2vrjp2RdqnNJ6p6IABAn4oLIlJuv424biReS0XumhDT7dRrV0qQABWka2ghzwvy/s4tue7XeY13oC3RX7vz9ryfMUrFfi8AgGBSlmVZcQ/CTyHHCBfC3OzNzS7OZRm/bbeScpY6pGM1Hs6D6dzvV65NtvyWQJK0bAYAGFgh9++K2jUjZd/sfvDtR+xdI/l2dETJuXPlpdXHdu04b76LFrTYSzB9B8Kl7GUPJ/PZwtph4txl5N5x5NxlVMiOIz9+SyBx//cDAIhWRS3NJHW937lUlE6n9IV5c3PGYYo4zVLNQM3HCjnvxY/7lGHz9607dtrFr+4ZnWJ
nZFgCAYDKVFFBJKk3O6+6CPfvvZYs8jUfC2OrsjOkeTVFMwWzXksphYpySzQAILkqKogk8WY30NbQgWZx0um0Z9FtWFuVne+TfVBeKmuLLvUcAIBiVGSxalL43bydj2cyGd8lFlPEmq/oNqzCXOeWWulYrUomYyWi6BcAkByF3L8rakYkaYIsFfnN4ri38Po12Qpjq7LX0pH5u9+MDAAAQRBEYlTsUlEhRbeDbU3ut714/pxjTdVMGClF99MwinABAMlBEClDQYtuB9ua3CuEuAOJ2c3j3sUTlbDOCwIAJANBpAwFmUkJY6uyM/AsX7UmK9BI0tRJE7VoQYs9E1GKMFIu5wUBAIKhWHWIinIJI8h7S4p0CSVJ3XEBANkKuX8TRBCJIDuCBhscnDt5fvDtRwY7ZABASNg1g9hFvYQy2CJcAEAyEEQQGWcY8er+WqzBFuECAJKDIIJAiq05CaOPiftaSTwvCABQnIo7fRfFMdtmCz3x12sJxcl9qq/7te5TffNtXTadaAEA5YMZEQRSTM1HkCWUQvuCJPG8IABA8QgiCKyQmo+BllC27WzVogUtOQFHkrbu2Kn3W3exJRcAKgBBBAUJWvORbwnFtIY3O128wki+EEKbdwAYOqgRQUEGqvkwrrlirm+QWLSgRddeMc+z5kTSgEWtxdarAACShxkRBBbmtlmvZR5JgfqC0OYdAIYOgkgMynFpIYpts85lHkkFBZyoepQAAEqLIBKDcjxBNuiJv4UwyzxuQQNO2D1KAAClRxCJQTkuLYS9bdZ83qmTJuqsKZMlZX8fQQIObd4BoPwRRGJSyUsL+UKXVxgJ8h60eQeA8kQQiVGlLi0MdpmHNu8AMHQQRGJUqUsLg13miaJeBQAQD4JITFhaKB5t3gFg6CCIxCDKpYVy3BoMAKhcBJEYRLm0UI5bgwEAlYsgEoMolxbKcWswAKByEUSGoEreGgwAKC+Rng72+c9/XuPGjVNtba1OPfVU3Xjjjero6Ijykuh39bzL7d04lbQ1GABQXiINIjNmzNALL7ygHTt2aPny5dq9e7euvfbaKC+JfkFPyQUAIE6RLs3ceeed9t9PO+003XXXXZo/f77+/Oc/6xOf+ESUl65obA0GAJSLktWIHDhwQD/84Q81ffp03xDS3d2t7u5u++eurq5SDW/IoOsoAKCcRLo0I0lf/epX9clPflInnHCC2tvbtXLlSt/nLlmyRPX19fafxsbGqIc35OTbGnztFfPoOgoASJSUZVlWIS+4//779cADD+R9zttvv61p06ZJkvbv368DBw7oN7/5jR544AHV19dr1apVSqVSOa/zmhFpbGxUZ2en6urqChkmAACISVdXl+rr6wPdvwsOIvv379f+/fvzPuf0009XbW1tzuO//e1v1djYqA0bNqi5uXnAaxXyQQAAQDIUcv8uuEakoaFBDQ0NRQ3MZB7nrAcAAKhckRWrbtq0SZs2bdLFF1+sUaNGqa2tTffee68mTJgQaDYEAAAMfZEVqw4fPlwvvfSSZs6cqSlTpuhLX/qSzj77bL3xxhuqqamJ6rIAAKCMRDYj8qlPfUo/+9nPonp7AAAwBES+fRcAAMAPQQQAAMSGIAIAAGJDEAEAALEhiAAAgNiU7NC7YpgGaBx+BwBA+TD37SDN2xMdRA4ePChJHH4HAEAZOnjwoOrr6/M+p+CzZkopk8moo6NDI0eO9Dwkz4s5KG/v3r2cT1MAvrfC8Z0Vh++tcHxnxeF7K1xY35llWTp48KDGjBmjdDp/FUiiZ0TS6bTGjh1b1Gvr6ur4H14R+N4Kx3dWHL63wvGdFYfvrXBhfGcDzYQYFKsCAIDYEEQAAEBshlwQqamp0X333cfBegXieysc31lx+N4Kx3dWHL63wsXxnSW6WBUAAAxtQ25GBAAAlA+CCAAAiA1BBAAAxIYgAgAAYjPkg8jnP/95jRs3TrW1tTr11FN14403qqOjI+5hJdavf/1rffnLX9b48eM1fPhwTZgwQffdd5+OHj0a99ASb/HixZo+fbqOO+44HX/88XEPJ5GefPJJjR8/XrW1tbrgggv01ltvxT2kxHvzzTd15ZVXasyYMUqlUnr55ZfjHlLiLVmyRBdeeKFGjhypk046SfPnz9eOHTviHlaiPfXUUzrnnHPsRmbNzc1as2ZNSa495IPIjBkz9MILL2jHjh1avny5du/erWuvvTbuYSXW9u3blclk9PTTT2vr1q1aunSpvvOd7+iee+6Je2iJd/ToUV133XW69dZb4x5KIv34xz/WggULtGjRIr377ru65JJLNHfuXLW3t8c9tEQ7fPiwzj33XD3xxBNxD6VsvPHGG7rtttu0ceNGrVu3Tj09PZo9e7YOHz4c99ASa+zYsXrooYe0efNmbd68WZdddpmuuuoqbd26NfqLWxVm5cqVViqVso4ePRr3UMrGP/3TP1njx4+PexhlY9myZVZ9fX3cw0icT3/609Ytt9yS9dgZZ5xh3XXXXTGNqPxIslasWBH3MMrOhx9+aEmy3njjjbiHUlZGjRplffe73438OkN+RsTpwIED+uEPf6jp06frE5/4RNzDKRudnZ0aPXp03MNAGTt69KjeeecdzZ49O+vx2bNna8OGDTGNCpWis7NTkvj/YwH19vbq+eef1+HDh9Xc3Bz59SoiiHz1q1/VJz/5SZ1wwglqb2/XypUr4x5S2di9e7cef/xx3XLLLXEPBWVs//796u3t1cknn5z1+Mknn6x9+/bFNCpUAsuytHDhQl188cU6++yz4x5Oov3qV7/SiBEjVFNTo1tuuUUrVqzQmWeeGfl1yzKI3H///UqlUnn/bN682X7+P/zDP+jdd9/Vq6++qqqqKv3N3/yNrAprKFvodyZJHR0dmjNnjq677jp95StfiWnk8Srme4O/VCqV9bNlWTmPAWFqaWnRe++9p+eeey7uoSTelClTtGXLFm3cuFG33nqrbrrpJm3bti3y61ZHfoUItLS06Prrr8/7nNNPP93+e0NDgxoaGjR58mRNnTpVjY2N2rhxY0mmnJKi0O+so6NDM2bMUHNzs5555pmIR5dchX5v8NbQ0KCqqqqc2Y8PP/wwZ5YECMvtt9+un/70p3rzzTc1duzYuIeTeMOGDdPEiRMlSdOmTdPbb7+txx57TE8//XSk1y3LIGKCRTHMTEh3d3eYQ0q8Qr6z3/3ud5oxY4YuuOACLVu2TOl0WU6chWIw/1vDMcOGDdMFF1ygdevW6eqrr7YfX7duna666qoYR4ahyLIs3X777VqxYoVef/11jR8/Pu4hlSXLskpyryzLIBLUpk2btGnTJl188cUaNWqU2tradO+992rChAkVNRtSiI6ODl166aUaN26cHn74YX300Uf270455ZQYR5Z87e3tOnDggNrb29Xb26stW7ZIkiZOnKgRI0bEO7gEWLhwoW688UZNmzbNnmlrb2+n/mgAhw4d0q5du+yf9+zZoy1btmj06NEaN25cjCNLrttuu00/+tGPtHLlSo0cOdKeiauvr9fw4cNjHl0y3XPPPZo7d64aGx
t18OBBPf/883r99df1yiuvRH/xyPflxOi9996zZsyYYY0ePdqqqamxTj/9dOuWW26xfvvb38Y9tMRatmyZJcnzD/K76aabPL+3n//853EPLTH+5V/+xTrttNOsYcOGWeeffz7bKQP4+c9/7vm/q5tuuinuoSWW3/8PW7ZsWdxDS6wvfelL9v9tnnjiidbMmTOtV199tSTXTllWhVVtAgCAxKjcxX8AABA7gggAAIgNQQQAAMSGIAIAAGJDEAEAALEhiAAAgNgQRAAAQGwIIgAAIDYEEQAAEBuCCAAAiA1BBAAAxIYgAgAAYvP/A7VSxPISKof1AAAAAElFTkSuQmCC", - "text/plain": [ - "
" - ] - }, - "metadata": { - "engine": 0 - }, - "output_type": "display_data" - } - ], - "source": [ - "%%px --target 0\n", - "# plotting on 1 process only\n", - "plt.plot(c1_np[:,0], c1_np[:,1], 'x', color='#f0781e')\n", - "plt.plot(c2_np[:,0], c2_np[:,1], 'x', color='#5a696e')\n", - "plt.plot(centroids[0,0],centroids[0,1], '^', markersize=10, markeredgecolor='black', color='#f0781e' )\n", - "plt.plot(centroids[1,0],centroids[1,1], '^', markersize=10, markeredgecolor='black',color='#5a696e')\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can also cluster the data with kmedians. The respective advanced initial centroid sampling is called 'kmedians++'." - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[stdout:3] Number of points assigned to c1: 100 \n", - "Number of points assigned to c2: 100 \n", - "Centroids = DNDarray([[-2.0081, -2.0299],\n", - " [ 1.9919, 1.9701]], dtype=ht.float32, device=cpu:0, split=None)\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "[stdout:2] Number of points assigned to c1: 100 \n", - "Number of points assigned to c2: 100 \n", - "Centroids = DNDarray([[-2.0081, -2.0299],\n", - " [ 1.9919, 1.9701]], dtype=ht.float32, device=cpu:0, split=None)\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "[stdout:0] Number of points assigned to c1: 100 \n", - "Number of points assigned to c2: 100 \n", - "Centroids = DNDarray([[-2.0081, -2.0299],\n", - " [ 1.9919, 1.9701]], dtype=ht.float32, device=cpu:0, split=None)\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "[stdout:1] Number of points assigned to c1: 100 \n", - "Number of points assigned to c2: 100 \n", - "Centroids = DNDarray([[-2.0081, -2.0299],\n", - " [ 1.9919, 1.9701]], dtype=ht.float32, device=cpu:0, split=None)\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "%%px\n", - "kmedians = ht.cluster.KMedians(n_clusters=2, init=\"kmedians++\")\n", - "labels = kmedians.fit_predict(data).squeeze()\n", - "centroids = kmedians.cluster_centers_\n", - "\n", - "# Select points assigned to clusters c1 and c2\n", - "c1 = data[ht.where(labels == 0), :]\n", - "c2 = data[ht.where(labels == 1), :]\n", - "# After slicing, the arrays are not distributed equally among the processes anymore; we need to balance\n", - "c1.balance_()\n", - "c2.balance_()\n", - "\n", - "print(f\"Number of points assigned to c1: {c1.shape[0]} \\n\"\n", - " f\"Number of points assigned to c2: {c2.shape[0]} \\n\"\n", - " f\"Centroids = {centroids}\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Plotting the assigned clusters and the respective centroids:\n" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "c1_np = c1.numpy()\n", - "c2_np = c2.numpy()" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[output:0]" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAiIAAAGdCAYAAAAvwBgXAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/H5lhTAAAACXBIWXMAAA9hAAAPYQGoP6dpAAA9RklEQVR4nO3df5RU9Z3n/1d1d+jGQLc2jShLE2l+CTG6Cv6A0RkRB4E1ETc6ibthnJOEObjiSDjf2UTJ+mOig5lxRCYmjpoczNkxqCMSHQZQwKjJEIL4lTEB5VcTG22JIptuYENzuuvuH92fy61b91bdqr637q2u5+McztLVVXU/VWbnvvh83p/3J2VZliUAAIAYVMU9AAAAULkIIgAAIDYEEQAAEBuCCAAAiA1BBAAAxIYgAgAAYkMQAQAAsSGIAACA2NTEPYBc0um02tvbNXToUKVSqbiHAwAAArAsS0ePHtXIkSNVVZV7ziPRQaS9vV3Nzc1xDwMAABTh4MGDGjVqVM7nJDqIDB06VFLvB6mvr495NAAAIIjOzk41Nzfb9/FcEh1EzHJMfX09QQQAgDITpKyCYlUAABAbgggAAIgNQQQAAMSGIAIAAGJDEAEAALEhiAAAgNgQRAAAQGwiDSKPPvqozj//fLsPyLRp07R+/fooLwkAAMpIpEFk1KhReuCBB7R9+3Zt375dV111la677jrt3LkzyssCAIAykbIsyyrlBRsbG/X3f//3+trXvpb3uZ2dnWpoaFBHRwedVQEAoVu9dr2qqqp0/dxrsn63Zt1LSqfT+uK1c2IYWXkr5P5dshqRnp4ePf300zp+/LimTZvm+Zyuri51dnZm/AEAICpVVVV6bu06rVn3Usbja9a9pOfWrst7ciz6L/Jv+Ne//rWGDBmi2tpaLVy4UGvWrNHkyZM9n7ts2TI1NDTYfzh5FwAQldVre2sWb7h2bkYYMSFk0vhxnjMlCFfkSzMnT55UW1ubfv/732v16tX64Q9/qNdee80zjHR1damrq8v+2Zzex9IMAAx8pV4mMYHjhmvnSpKeW7tONTXV6u7ukdQbUAgixUnU0sygQYM0btw4TZ06VcuWLdMFF1ygFStWeD63trbW3mHDibsAUFlKvUxy/dxr7NkQSYSQmNSU+oKWZWXMegAAIMm+8ZtgcP3cazJmLaIIBu5r5uM1a2Mek5Q1a0PBa36RBpE777xTc+bMUXNzs44ePaqnn35ar776qjZs2BDlZQEAZcoZDH664SV1d/eUdHaipqZa82ZfkxGGnMysjfN3zsfMMo8k3bf8e3pn776MxwwCyimRBpHf/e53mj9/vj788EM1NDTo/PPP14YNG/Snf/qnUV4WAFDGrp97jR1CamqqIw0hZsZFyl6a8QojXrM2fu/7zt59Oa/pFVAqUaRB5Ec/+lGUbw8AGIDWrDsVQrq7e7Rm3UuRhZGdu/dIOlUT4gwJN1w7V+l0Ous1frM2uR4zr4t6qakclbxGBAAAP+4btXPGIuwbt5m1cIYCZ8i44dq5nksnpibEBCXnrM3z6zZ4zuTEtdRUDggiAIBE8JotCLoUUox0Ou0ZCszPXrMhUmZNiHPWZteevUqn06qqqsqYySnlUlM5IogAABKh2GBQrFyFokHDwrzZmUFp8oTxWrp4UcZMjqSSLTWVI4IIACARTDDw2iJr/h73bhOvJmhVVSn795MnjJeUPZNTiqWmckUQAQAkitcWWan43SZhdmx1z9qYJZeqqpT+69w5nrM2zlbxUS41lSuCCAAgUcJubBZmsHE3K3Muubh/X+qlpnJFEAEAJE6Yjc2i6NgaZHdPGDUolYAgAgBIpDB3m4QZbEq9u2egi/zQOwAAiuFe+rj/4UeynrN67Xrd//AjWr12vefrnY9fP/caz94fhcq15OJugrZ67fqsQ/z8xlepCCIAgMRxzjr8+B8f0uQJ47Vrz96sMPLuvv3atWev3t233/P1zhN7vTq2FuOL187xDTHXz70mY0mm1CcKlyO+AQBAongtfSxdvCgrjJgmYuZxc7P3er072JizZIoNI0GZWRLntWjznillWZYV9yD8dHZ2qqGhQR0dHaqvr497OACAEsi13fb+hx/Rrj177VkNd7Go+3HJ/8ZfykCQa3z9Feb25LAUcv9mRgQAkCi5lj6WLl7kWeeRq/6jkJqOqHiNL6z6kXJf/kn26AAAFc19s3bXedy3/HuejztfU0hNR1S8xhdWgCj35R+27wIAcopz6t/ZjExSRgOy59au0zt799nLNUlto+7Xc+SGa+faAULqX3+TMLcnlxpBBACQU9gt1wvhdWaL+fmGa+dq1569dsGqV0+PXXv2auniRVnvW0yAKiaQ5es54gwjYTRuK8dTflmaAYAKVEh9QtxT/9fPvUaTxo+T1Hu2i/O6544bq8kTxuvccWOzXuPeTeP8fGbpo5DvoZillCD1KWH1Nwlre3KpMSMCABWo0FmOuKf+v/2N23TzXy3JulnnmtFYunhR1jJNkNbskvf3UEyr+CBt3r0CRKHfa9DPlUQEEQCoQMXcVOOc+i/2Zp0vQBX6PYQdyMIIEOXecp4gAgAVqpCb6uq16/Xuvv2eYSDsglV3LYZ7dmLn7j0F3WDzBahCw0VYgSysAFHup/wSRACgggW9qZpW6sOHNerh79xt30SdxaKr164PJYzk2ylzw7Vz9dmJEwLfrIPMphQSLsJYSpHCCxDlfsovxaoAUMGCFDiaVurDhzXq40+O6P6HH8koBh0+rFG79uwNrXGWszh25+49WSHk+rnXKJ1Oa/KE8Z43a2eRadDW7kELPcNsFZ+E/iZJwIwIAFSooPUJzn+5m54d8xctVjpt2eEk7MJV5xLF3gMHspZLqqqq7JkYv88UdOmjmALWcqzFSCqCCABUIOdNNZ1OZywvuG/S5u9S704UE0IkRRJCDL/lElND4tcMbNL4cXYL9XxLH4WEi3KvxUgqgggAVCDnTdVrBsB9kzZ6C1NPnZXq1+ArDH61GKaGxKsZmCR9duIEScFqJ4KEFaPcazGSiiACABXIeVMNuoXVPGZqQ6qqqpROp3X/w494di/tjyDLJeb3JqhIyuiw6n4/r509hIv4EUQAAHm3sLpDiLtmJMwwUshyiXN3jSS7k2qQJm1IBnbNAAAkeR9Vb5hdKs4QIvXWjPi1Ui9WkLbobjU11Rm7a8rxFNpKlbIsy8r/tHh0dnaqoaFBHR0dqq+vj3s4ADCgmZu2CSPum3ecp/D6jVVSxngleT6eTqcTM/ZKUMj9mxkRAECg/hhJ6nuxc/ceScoar3msqiqVMbNTzIF1KA1qRACgQvjNaLi3vUrJ7Y9hWs2/s3dfVg3Jrj177TqWdNry7HxayNk6KA2CCABUCL8Td83sgtn2aiSxP4azkZm7INXZ5TXfbps4ThCGN4IIAFQIv1kB9+yC12uSwvkZnIfuee3ocT/f/BzXCcLwRrEqAFSYfEWp/VWKolavzxCkINXMCkX12dGLYlUAgK9c23TDUIrCUK/PkK+Y1tmRtb8H1iE8LM0AQIUJ6xh7P6UoDD
WfoaqqyvMzuGdeOLAuuQgiAFBBgp40219RFoa6a0ImTxjvG3oMDqxLLmpEAKBC+M1KRLmN9ea/WmLPvvz4Hx/q9/v5BSlnKHEXrCZVkhrEhY0aEQBAlmJap/eH1xJQf7k/gxl77yF8qbIJIVJpamnKAUszAFAhSnnSbFRLQF6fwbklN5VK5RxTobMMUc5a0GStF0EEABCqUheGumdevK7hVTcSJGT4NYEL60RfmqwRRAAAIStlYajfzEuQWQZ3yDDBRFLW859bu047d+/Rt79xW+izFpXeZI1iVQBA5KJY4shXfCspb+My98yG8+A8E07e3bdfu/bszXo/SZE1Zyv3MFLI/ZsZEQBA5LyWOJw3efcShzOcOEOM8+/pdLrvgLt01msk6fl16/POMjhnPGpqqrN+b8Y3ecJ47Wlttd/PvKa/SzOl2k6dZAQRAEDkvGpEnDf5XPUXzhDj/rt5vfME4TXrXtJ/vPX/a+3TT2nq5X+sM0eOsnemeM1guJdG5s2+JqNmw2wJlpRRh9LfmQuarPUiiAAASsKrMNPc5N0H2PndnG+4dq7dml1SxpLK5AnjNXnCeP3Lv/6bXl7zL/r9J4fV+eH7+pM/uTLr+U7uYldJGe3jc1Uw9GfnDE3WekUaRJYtW6bnn39e7777rgYPHqzp06fru9/9riZOnBjlZQEAClaXIamkTbXcsw9LFy+yw0euXSN+SyjuQPPxJ5/ow4Nt+uR3hzTx/P+sbdu26bThZ2vk6M94fr6du/dknD7sVV/yzt599vfknjWRvMNNEKXcTp1kkXZLee2113Trrbdq69at2rhxo7q7uzVr1iwdP348yssCABSsYVapm2r5nXMT5BA+9/Ocfz933FgNH9aojw5/ond3bNfws87WRdOv0LARZ+nX238ly7Ky6kmeW7suI4S4zZt9jSaNHydJGj6s0d7O293dYy/VTBo/rqJCQxQinRHZsGFDxs8rV67UmWeeqTfffFN//Md/HOWlAaDiFdIwqxRNtfwKM3ft2RvoEL5cSyjv7tuvjz85oj90/B8d+uADXflfrlMqldLnpl6qV//tBX14sE03fv6/ZFx30vhx+uzECUqn07pv+feUSqUyCmfT6bQ+O3GC3nv/A338yRENH9aojz85YndwnTxhvJYuXhTa91OpSloj0tHRIUlqbGz0/H1XV5e6urrsnzs7O0syLgAYqII0zCpFUy2/2o9de/baN/Vzx43Vu/v2exZq3v/wI3ZIMMspUu+shRn7pPHj9PB3/1bDRpyls5tHS5LObh6tYSPO0r7f7NC//Ou/eX6+Netest/PWTjrXKYZPqxRV1x6iV58eWPfqb+pjNoWFK9kjewty9KSJUt0+eWX67zzzvN8zrJly9TQ0GD/aW5uLtXwAGDACrL04XxOrpqR1WvXFzUGr8LMNeteygghzl0wzuUiE0KaGhu1a89eOzQMH9ZoB4XJE8Zr8+bN+uR3h/S5qZfard7NrMj7772n331wMOs7MJ/HLMGYcOEVQpw1Iul05lKPeS+/83T6890NdCWbEVm0aJHefvtt/eIXv/B9zh133KElS5bYP3d2dhJGAKCf/OoyvJ5TVVWldDqt+x9+JGPZob8tzb0KM/12jZgdMOl0OiOsmIAyecJ4Sb2hYfiwRr265Zf6+JMj+vX2X2XMhhhnN49W04iz9B/bturs5tEZB/C9s3ef3tm7z35Pc33DLMc4+524T/01om4HP1CVJIjcdtttevHFF/X6669r1KhRvs+rra1VbW1tKYYEABUhSMMs93PMDIQJI1HVjPgdYCf1hgHTQMyrnsXc1E3RqNkpY2pDnFKplM7rqxUZM2K4Lpp6ccaOl89OnGCHCvN+5nUff3LEfjxfv48kHWIX5WF9YYs0iFiWpdtuu01r1qzRq6++qjFjxkR5OQCAQ5CGWebvztqIpYsX2WHkK7feLstSSW+kuc5e8drGa1mW72yIYWZFXv63f9WFU6ZmXc+8p5PZaXPuuLFZTdecr3Muz1w/9xrt3L3Ht96mVCGgnGZnIq0RufXWW/XP//zP+slPfqKhQ4fq0KFDOnTokP7whz9EeVkAgHI3zLrh2rlKp9N2m3RTG2EsXbxIqZRkWVJVVaqk/5r3Wkpyj9+5c8bMhjhrQ9zMrMi2bdv0vUf/yW6Mlq9p2K49e31nFsxY3KHisxMnSFJWkIpqS7TfuEzjN/P9xTU7k0+kMyKPPvqoJOnKK6/MeHzlypX6i7/4iygvDQAVr5CGWe4lm/sffsQOIem0VbLdIUGXkkwICTIbYpgdNL/Z/ivNm7PGDi3Oa5wKX1V2SOlPu3VnkCp1CCjFbqgwRL40AwBIPudN6/l1G+yZEmeNiPN5Uci3lPTqlq06s2mYXccxecJ4bdq0ybc2xM3ZV+Tu+/5Wf/O/lkqSdu7eI+lUYaqZITL/b1NjY+B2616n+bpP9C0l9zKXKQBOUu0IZ80AACT13rRMCKmqqrJ3zZTqIDbnUpKz2NLZb+TwkSM6bXCdTht8mnbu3qN9v9mhIfUNqhs8WEc+/ijvNeoGD9bQhgb94PuP6D9fNEWpVMreNeO3K8Ysz+TjFaRMCIiLe5nL7P6RklM7QhABgApnbvqS7BBi/uVsHjP/So7yIDbnv8TdxZZLFy/Sfcu/p3f27tP//cMJnTb4NE0cO0YvHP5E//f4MW147umCrtVYU6NnX/xXnXfuuXatiLMg1VmI6u4X4sddk+MOATt37yl5rY3XMpfZBi3Fu7PHSFkJXj/p7OxUQ0ODOjo6VF9fH/dwAGBAci69eNVmxHWDylUvYhw/dlQjhzcppZRa29o084o/0lWXT9ePnnpGrW1t9vPuv+OvM1535pln6o23d0a2FOE39lJ9l37Xc8/0mJAU9rgKuX8zIwIASCS/YkvncsenhwzVn86cmXGz/489+5ROW2ocfqZ9o33v0MdZN9pcfa36w3mz96p12bVnr84dNzbSWoxcO6bM702fllwHDZYCQQQAKpy5aUnZN33z+7AU2mjLXWwpKavmwrnM8Py69Uqneyf6c+28KXSM7uUr5xjd43ZuiXZ2qHXWupj3jCqM5NsxFaTbbqkQRACgwjlvWn6NxMJSaKMt9w3TebaMs9bh1G6fU9UGZjeMX8dTv2UZrzE6H3OO0Wvc5j29OtQ6d+M428OXkqm18VvyKvXOGYIIAEBSsDNp+quQNuheW2Gl3m225kZumnZJp2ZunCHFHJDnXJLIt0Mk6C6hfHUfzg618xd9I2OmJM66G/OdGO7PW+qdMwQRAECgRmKFMssZZieOs15i1569GT1LcoUQs5130vhxSqVSGSf03nDtXLv/h+GszXD28SikaNSvPsXrMb/3Wb12vSZPGK939+23Q5IJIeb3pe7Z4V6GkzL/+04aP67kAYkgAgAVLsiZNMXcnMxyhvMwORMGdu3Zq1QqlRVSDHc48arJcC7NSPK9weY6+yUXvzNvgi5fOZdzTj2WsscYR88Od/BJQtdVgggAlLEwTlkNssOiGM4wY0KDKdY0sxjOniXO6wdpT+88g8YdCtyn4
t78V0sKrn3xO/Om2OUr0y4/Kee95DpcsJSiP3kHABAZ869u98FwhRyw9sVr5xR0qFshzOFrpjtp70yI7Fbq//uR5VmHswVlilHdQcF5qJ+U/xA9L85Zoh//40P2GL0e83o/5xZew/nfwswQxamY7yUKzIgAQBkrpPizFLxmaJz/8pZ6D5VLpeTZQj5ojw1TdOlX0+I+8baQ2pdi6kjc7+cuTHXOPExoabFPO45rFiKKmqBiEUQAoMwl6ZRVr62vztNyDcuSb4+NfNtag9a0FFv74rVU5SzydC5V+S1fffHaOXaxqpS5nGN29UTZLj+XqGqCikUQAYABICnr/e4bmvvvZpbAbME1YcTd28LJq2FYkJqWYmtfvGZjgtSseL0mV6v3Uu+YMaKqCSoWZ80AwABgbnBRnR1S7Hic3Ddjd8Gq8znu9wnr84RR3BtUkP4ocResRoWzZgCggiRpvd9wtluvqkrpv86dk7UMkE6n9dMNLyudTqumplrzZl+Tc2kljBDhXDpybh12NzkLI5QkbeYhqQgiAFDGkrbe7xxXOm3ZMzRuzjbrzueYnShetS6Ftof34t5SbOpSTG1KmKGkmOWcSkQQAYAylsR/dQeZoclVO2GCibvWJawdQs73MVuKnTUrzhbshYQcFIcaEQBAaAo5N8bvOZJy1rqEVQ9jmpydakPf23Bs8oTx9iF1A72WIyrUiAAAYhF0hibXzX3S+HH69jdu8611CWOHkLuZlwkhVVW9Z9mYkEIIiR5BBAAQmmLrIgqpdenvKcHua5nlmFNhpCr2bdCVhBbvAIDY5ZpJcbdrD9pm3YtXbYopVE2nLQ0f1mjvpilV2/PVa9f7XmfNupe0eu36yMcQJ2ZEAACxC3rIXX93CDkDj9/MiNlN4zzdN8qZkTB2A5UzgggAoCyEsUPIGXjcocS9W8acFxN1GEnaeUGlxq4ZAEDFC9IsTVKkXVmT1h23Pwq5f1MjAgCoeF+8do7vTf/6udfoi9fOsZdQ3PUcJkBUVfXvlnr93Gt8e6gMZCzNAAAQQNRLKP3dDVSuCCIAAATkDCNebeiLlcTzgkqFIAIAqEjFHqIXRkM197WSeF5QqVAjAgCoSMXWfHgtoTgV2hckaA+VgYoZEQBARSqm5iPIEkqhfUEq/ZRegggAoGIVUvORbwll1569Wrp4UVbAkaSdu/fonb37fN+72GWigYClGQBARQu6bTbXEorpxmqWZMyyynNr1+m5tetyhhCp+GWigYAZEQBARQu6bTbXjMTSxYty7nTJV9Rayd1VCSIAgIoV5rZZr2UeSYH7gkS1NTjpBu5cDwCgZMrxBFm/mo9CT/R1ci7zSCr4lOBK7K7KjAgAoN/K8QTZMA7RczPLPG5B+4JUYndVgggAoN/KscYh7G2z5vNOGj9On504QVLm95Ev4FRqd1WCCAAgFJVa4yB5L/MYXmEkyOsrpbsqQQQAEJqw25+Xi/4u80SxTFQuUpZlWXEPwk9nZ6caGhrU0dGh+vr6uIcDAMjD/Mve1DhUyowIMhVy/2ZGBAAQikqtcUD/EEQAAP0WZY1DJbc/rwQEEQBAv0VZ41COW4MRHEEEANBvUZ4gW45bgxEcQQQAkHiVvDV4oIu0xfvrr7+uz3/+8xo5cqRSqZR++tOfRnk5AMAAVontzytBpEHk+PHjuuCCC/TII49EeRkAQAXwan+O8hfp0sycOXM0Zw6VzACA/mFr8MCVqBqRrq4udXV12T93dnbGOBoAQBJUcvvzSpCoILJs2TLde++9cQ8DAJAgldz+vBKUrMV7KpXSmjVrNG/ePN/neM2INDc30+IdAIAyUrYt3mtra1VbWxv3MAAAQIlEumsGAAAgl0hnRI4dO6Z9+/bZPx84cEA7duxQY2OjRo8eHeWlAQBAGYg0iGzfvl0zZsywf16yZIkk6eabb9aTTz4Z5aUBAEAZiDSIXHnllSpRLSwAAChD1IgAAIDYEEQAAEBsCCIAACA2BBEAABAbgggAAIgNQQQAAMSGIAIAAGJDEAEAALEhiAAAgNgQRAAAQGwIIgAAIDYEEQAAEBuCCAAAiA1BBAAAxIYgAgAAYkMQAQAAsSGIAACA2BBEAABAbAgiAAAgNgQRAAAQG4IIAACIDUEEAADEhiACAABiQxABAACxIYgAAIDYEEQAAEBsCCIAACA2BBEAABAbgggAAIgNQQQAAMSGIAIAAGJDEAEAALEhiAAAgNgQRAAAQGwIIgAAIDYEEQAAEBuCCAAAiA1BBAAAxIYgAgAAYkMQAQAAsSGIAACA2BBEAABAbAgiAAAgNgQRAAAQG4IIAACIDUEEAADEpiRB5Ac/+IHGjBmjuro6TZkyRT//+c9LcVkAAJBwkQeRZ555RosXL9bSpUv11ltv6YorrtCcOXPU1tYW9aUBAEDCpSzLsqK8wKWXXqqLLrpIjz76qP3YpEmTNG/ePC1btiznazs7O9XQ0KCOjg7V19dHOUwAABCSQu7fkc6InDx5Um+++aZmzZqV8fisWbO0ZcuWrOd3dXWps7Mz4w8AABi4Ig0ihw8fVk9Pj0aMGJHx+IgRI3To0KGs5y9btkwNDQ32n+bm5iiHBwAAYlaSYtVUKpXxs2VZWY9J0h133KGOjg77z8GDB0sxPAAAEJOaKN+8qalJ1dXVWbMfH330UdYsiSTV1taqtrY2yiEBAIAEiXRGZNCgQZoyZYo2btyY8fjGjRs1ffr0KC8NAADKQKQzIpK0ZMkSzZ8/X1OnTtW0adP0+OOPq62tTQsXLoz60gAAIOEiDyJf+tKX9Mknn+hv/uZv9OGHH+q8887TunXr9JnPfCbqSwMAgISLvI9If9BHBACA8pOYPiIAAAC5EEQAAEBsCCIAACA2BBEAABAbgggAAIgNQQQAAMSGIAIAAGJDEAEAALEhiAAAgNgQRAAAQGwIIgAAIDYEEQAAEBuCCAAAiA1BBAAAxIYgAgAAYkMQAQAAsSGIAACA2BBEAABAbAgiAAAgNgQRAAAQG4IIAACIDUEEAADEhiACAABiQxABAACxIYgAAIDYEEQAAEBsCCIAACA2BBEAABAbgggAAIgNQQQAAMSGIAIAAGJDEAEAALEhiAAAgNgQRAAAQGwIIgAAIDYEEQAAEBuCCAAAiA1BBAAAxIYgAgAAYkMQAQAAsSGIAACA2BBEAABAbAgiAAAgNgQRAAAQG4IIAACIDUEEAADEJtIgcv/992v69Ok67bTTdPrpp0d5KQAAUIYiDSInT57UjTfeqFtuuSXKywAAgDJVE+Wb33vvvZKkJ598MsrLAACAMhVpEClUV1eXurq67J87OztjHA3cTmxaLqWqVDfz9uzfbV4hWWnVXf2NGEYGAChXiSpWXbZsmRoaGuw/zc3NcQ8JfU5sWq7uA9vUtemh3tDhcOyJm9S16SEplaj/OQEAykDBd4577rlHqVQq55/t27cXNZg77rhDHR0d9p+DBw8W9T6V4MSm5VmBwP7d5hW9sxdhSlWpp3WLqlumZ4SRY0/cZD/uNVMCAEAuBS/NLFq0SF/+8pdzPuecc84pajC1
tbWqra0t6rUVJ1XVOwshZQSAE5tXqGvTQ6q9ekmolzPX6Nr0kB1GujavkKweVbdM15AFq0K9HgCgMhQcRJqamtTU1BTFWFAAZzAwPztDSBSzE+5ryuqRUtW+IcSvpsQs89SMuSSrpoRaEwCoLJEWq7a1tenIkSNqa2tTT0+PduzYIUkaN26chgwZEuWlK4IzGHT97BGp52RkIcR5TTMTIkmyenRi8wrva/rM2nQf2Kae1i0ZTzXhpKd1S9ZsDuEEAAauSKsL77rrLl144YW6++67dezYMV144YW68MILi64hQba6mbdL1YOknpNS9aDI6zSOPXGTPRMiKatmxD222quXZPz+xOYVdk1JT+sW+3ETQty1JmaWh0JYABiYUpZlWXEPwk9nZ6caGhrU0dGh+vr6uIeTSPaNui+MRDkj4ixMHbJglX1tEyr8ru03Rvfj7veJeqkJABCNQu7fieojgsK4b9T2jV0K/cbtnMkwNSHuAlZZ6ezX9dWJ+M7apKrtx53hplRLTQCAeBFEypTXbIFXAWtorLRnKLB/9qvhcNSJmDBilmO8Hq+bebsdQkqx1AQAiBdBpFwFCAZhylUoGjQs1M5YJMkRQKSs2Zzu1q12CHGGEwDAwEQQKVMmGHhtkTV/j3u3ibuniV0P0sdZmFo383Z1t271rEExvwcADDwEkXIXcmOzUM+Tcc3a2EsuqSpVj7lMNWMuyXhvE0LM45EuNQEAEoEgUuZCb2wWYrBxBpYTm1dkLLnUtFyWObYSLzUBAJKBIDIAhNnYLIqOrUF294RRgwIAKD8EkQEizN0mYQabku/uAQCUFdpVDhDupY9jT9zk+7wgJ/OG1rE1x5JL7dVLWHIBgApHEBkAnLMODffttTuUusPIsce/5Nsu3R1Q3MHGq4V7EHVXf8M3xNTNvD370LtNy32vFTREAQDKB0GkzHktfQxZsCorjJzYvEI9B7bmfA8TUNzBxn1eTKT6imXd1+LMGQAYmKgRKXc+Sx9DFqyyz4bp+PZ4u85Dyl2IGndNRxTFsk6hbk8GAPQbQaTM5bppDlmwyg4h7joP30LUBGyjDbNYNkvIfVcAAP3DPPcA5lfnkasQtdCajqi4xygrHUrtiCmSdS7/cMovAMSHGZGQxTn177y2++Z67ImbMs54Sfp5Lu4Q1X1gm3pat0gKodFalDMuAICCMCMStjiLLfuubUKHs+7DtE/v2vRQSQpR+7P7xatY1jn+MGYyQtueDADoF2ZEAihkliPqYstcnNc2B8q5Zwx6b+jTPAtRu1u3asiCVXk/YyBF1mLkK5a1w1QIjdaSPisEAJWAIBJEgTfVOKf+ndd27papm3m7Tmxa7luIak6+dd+QnZ+xJIEsQLFsz3vb+zWTEaTlPACgNAgiARRzUw2z5Xox4/W6dr4dNu4bctAbdpiBLN+ZM/2dyYh7ezIAIBNBJKBCbqonNi1X94FtnjfMsAtWvWYp7Jt1qrqgm3W+zxh3IAtlJiMB25MBAKcQRAoQ9KZqdnhUt0zPmGkwyx+h9qpwLRuZa5nOqqamwvy+v5+x0FmOsGoxwprJ4JRfAEgWds0UIMj5K2aHSur0URk1F85gICm0M1OcfTHMbhlzrdqrl/S1e5/muzOmmDNmgu44CbVVPIfnAcCARBAJKPBNte+GOWjqn0k6VTR6KoRYoW/jNTfjntYtUqraDiHmpl3TMk2S1L1/i+dnKvSMmaCBzGsGo9gwkpRGawCAcLE0E0AhywLuG2LXpofsm3ZNy2WRbeP1W1IxNSTOAJC5hDOtoDNmAtdpUIsBAAiAIBKE46bqLA5131RzFqL2nIy0l4hvLUZfDUnt1UvsMGIHFp2aLQkSHPoTyDzfEwBQ8QgiAWTcVF3FofbMg2sLq/Nn540/CkFmKeyxmdoOyW565v6Mfj1BcvUhkcQsBwCgYASRAhWyXGHvjnHMVITdq6KQWYqMs2akvA3Msj47sxwAgJBRrFoEZ9Flx7fHZy+59C1zSKdmIkzxpyR1t/4yvMEUs5ukelDG+DiFFgAQl5RlWVbcg/DT2dmphoYGdXR0qL6+Pu7hZDEt1FU9SA337c34nd9NPY6bvXOpxszMOINI1uNWOrYThAEA5a+Q+zczIkXKu4U1QX0vzLZd97Zc85jpwGrvtonzBGEAQEWhRqQIQYpDk1BPYVrN9xzYmlVD0t261W5+JqvHs/NpHCcIAwAqC0HEh99Js+7+G1KCD01LVdmN1NzLQz2tW5Q6ozmj+ZnfbptSnyAMAKgcBBE/rm26hlnmsPtv9EniFlZnmMhuZDY9qwOrV6CK6wRhAEBlIIj48FuecC9zeL2mEH4zL1I4haF+Mxuy0qppuSxvA7MwDqwDAMAPQSSHkixP+My85OrnUahCZzYKauUOAEA/EETyiHp5ohSFofbMRt/uGK8mZs6Zl0KapAEA0B8EkTxKsTwR5cyLuyakumW6b+ixcWAdAKBEaGiWQ67W7VHMCORqkFYMv/E7Q4m7YDXJoq6nAQCEg4ZmIfBbnjDNwLIamIVwvZwN0orhmtkw4+9p3SKlqssqhEii0RoADEAszfgp4fJEVIWhXrMDGTUvOW7cxcwwlHIHkPmZRmsAUN4IIj5K1Rm11IWh7pmXoDt2goSMUu0Akmi0BgADBUEkbgmYeQk0w+AKGSaYSMp6ftemh9S9f4uG/OUzkcxY0GgNAAYOilUrRL7TgCVlnMDrN/PhnNlwHpxnwkn3gW29NSiu95MUWjGpPeY84wUAxKOQ+zczIgnktQzinIFw39Cd9RfO12a8j5XuO+AunfUaSepyLNn43dQzloyqB2X93oSQ6pbp6nlv+6klICm0pRkarQHAwMI2gyTy2h3S95h7d0jWjhHna11/790tU+W9y8R1Aq+fupm3289T9SB7F1HHt8dnbAl216GEMWtR6p1MAIDoMSOSQEGLVfMVutZevcS+SUvKWFIx580ce+ImbX7lFf2vX1Xpocd/pMtTO3trPFq3qmbMJVlLKe5iV0kZwaSm5bJTSzNho9EaAAw4kdWI/Pa3v9V3vvMdvfLKKzp06JBGjhypr3zlK1q6dKkGDcqe1vdS6TUiXrUQkgLVR7hfK8n+e3XLNMmylP79B0ofadOsZ4/rjQNHdNklF+uV+/+bTv77j6QTnapuma4hC1ZlvafXsojzOu6lmdoZi+xGal7hBgAwsCSiRuTdd99VOp3WY489pnHjxuk3v/mNFixYoOPHj+vBBx+M6rJFC7Q9VSppZ0+/3SFBdoy4XyvJMWsxzQ4Qr3w8VG8cOKj/cfFQ/WDbG1r/xG81s2WwUmc0q2bMJfb7HXvippwN0GpnLFJ36y/V0/rLrGJVSfaSTU3LZWF9PQCAASCyIDJ79mzNnj3b/rmlpUW7d+/Wo48+msggErQHRtR9Mpx8u60GOPsm1xJKd+svVd0yXd37/10PbNini/9Tre6/6nS98UGXHvhFh66adKYa/ucvMt7L1H/ISuvE5hXq3r9FPQe2ZuyIqWmZpvSRg7J+/76k3nDi/M7Y3QIAcCtpjUhHR4caGxt9f9/V1aW
uri77587OzlIMS1JhXTtL0dkz1zJIvh0jvttsXcHg1RMteqP9oFb/2XClUil96/IGffHZj/XKOx9pbl/A8RtH6ozmjO8uY5lGUur0Ub67awAAMErWR2T//v266KKL9A//8A/6+te/7vmce+65R/fee2/W46WsEQnSoyLqPhZe4cYdREwnU0neQeH0/6Sqxs+op3WLUqePUlXjaHtppbt1q7r3/7tm/e/fSZJenj9CqVRKlmX1PlY7RC//2aeVqqn1/HzHHv+Seg5szTo8L3X6KFm/f1/VYy7TkL98JuMQv9oZizKWrjjADgAGrkgPvbvnnnuUSqVy/tm+fXvGa9rb2zV79mzdeOONviFEku644w51dHTYfw4ePFjo8PrNvT3V60aZ8ZxUte/N9MSm5cUNwmt3SN9jzhCSFUwkdbf+UpJOhZAzmmX9/n07MJzc/qx6Wrf01oa0n9S3Lm9QKpWSJHtW5I0DR/TKb09mfAcnNq/Qscf/TCc2r1DN2OmSlHF4npSyl2Rqxk73XBrKCBYcYAcAUBEzIocPH9bhw4dzPuecc85RXV2dpN4QMmPGDF166aV68sknVVUV/AYTx66ZgmZEUtWS1ZN3d0nUY3XPhrh/tnt79LFnPnRqNsTzd38+Uimls2Y+3Es+Tu4twrmWr/KNv1SYnQGAcEW6a6apqUlNTU2BnvvBBx9oxowZmjJlilauXFlQCIlDkK6d7ueY3STHnrhJQxasKunNNN8BcBl1L32hSZJeOXBCb7SftGtDnJy1Ij+vn6UZTZ12CHF+vuqW6Z5j6m7dmrW7xq8vSmIOsCvBYX0AAG+RFau2t7fryiuv1OjRo/Xggw/q448/tn931llnRXXZogU5Bdf8vbpluv27IQtW2WGk445zJFklvZnmOwAu4/fqnfF44BcdunjkIF01ps7zPa8aU6eLRw7Sd77/Y13xlTNVM/aP1NO6xd6hY8KGlJJ0akLNOXMSuOmYle4NSR7jL9VsRCGFygCAcEU2RfHyyy9r3759euWVVzRq1CidffbZ9p9EytG1067BsE4tUzhrG3qXZfpuyj41I1Hx3eLr/n2qWtKp2RBnbYibXSvyQZdeea9bQxassr8Ds5U3I4T07YxxLt94tVuvm3l7VqjoPrCtd6amL4yY15W6VsTZKr7j2+MJIQBQIpy+WwS/5Rmz/FGqG1iuGgtZ6cxD6Fq35KwNcXM+97Uff1eDr17ce033CbvKrAupbpmmmpZpgWYy3DUsXrUopQ4CGTt9rvwf1I4AQBES0Vl1IMuobdi8IqNgtVSnweZdSqqrl050KnVGsx0afn76XL3R/phnbYibs1Zk/RN/q7mpVO/7m8PzlF2YamYUalqmBQ4hWfU4fbtwvJZ3ouaeXXIGLmpHACAaBJEimF0WdgFoqjpj10y1o4V6ZDdTx1KSc9eHs4YjdUazrP9zUKqr16A/+pq+85ff0ZjTazTstGrtOHQy7yWGnVatMWcM0nffHqyrNv6D/b7VYy5TzdjMoOAMI4EOn3MthWXUsqSqM9rLl0KuHUfUjgBAdAgixXDssjBhxNQ2mJuUWZ6IildPDqn3ZumemdGJTh19abnaD3fog6M9uvLJQwVdq3vwSaX+5P+TDv4q7+m3QZcq8p7qW8I+Ivlml0wYiXVnDwAMUASRfqp17biI4ybltz3WqbZGemn+CHV+7iZJ0slfPaVBl/53+++Sem/+VlqDLv3vqr3kv9mvPfPMM9UwalSgMRQjyLbpSOUoVDa/d54mTAgBgPAQRArkeY5LROepFNJoy68nh6lhkaRR9TXSe//SO/bx3zo1Y3LWoKwal9rxzUXfcAsZt9/ZNeazdLdu7TtQ77LICkNzva8JRkEOGgQAFI4gUijXv56dPTzMeSqhKbDRlrunSO9NvEdmq63ZjZIVnvpqMpw3WHddhN+yi2fo6Bt3d+tW1Yy5JCt0ZIzbsSXaec3MfiWy28qX2rEnvqye1l96ztZIYucMAPQTQaRAzptO3vNU+nutAhttucfjLFh1b40145VSktXjvUOkr29Izh0iHmHJHSLM2LzG7Q4pzpu8c3dOHDMQvT1TfpnxmPu/CTtnAKB/CCJFiqKuwWt2wWurcK7zb9znwAyacqP9s/vMGUmq7QsCxe4Q8QtLztBjenMEfR9nGIm1MLRv9suMS8r8b1s95jKWaACgnwgiRQjSDr6oG5RjScO5HdjWt1U4Vwgx23n9enwY1S3TVdNyWdbvi9khkuvMGGeDsFzvY2+JdswuOU/+jWMJxH09ds4AQPiSfQpdyE5sWu7ZelzqvZmf2LQ82BsFaQdfhLqZt9uzCMeeuMkeV2ZNQk/2Z3D35Lj6G3adhed4+vqeONuaS72zDzVjLjkVBgrYIVI38/as1+VrP585pr4lnh5Hf5Oekzr2xE0lbfXux+vzAQD6r7JmREI6ZTXfLov+yDhE784We8eLJN9lgiDjOfb4l3ofcO38cPcAKXaHiPt15jMUs3TlnK2Jq8uqGztnACAaFRVEyuWU1SELVnmGEHcnU/euFD8nNq9Qz4GtvqHAvG+xdS9+Z+84A0SupSvndcyMUM972+3fO0/+jUPsfU4AYACrqCAi5a5nKDW/fhsnHL0/vDh3pdS0XJb7GgHrWYqte/F6nWnP7g4QztmXDFbaPizPXVdSO2ORuvdvibRLbS6R1QMBACRVYBCRsvttxHYj8VgqyqoJ6Wsh77UrJVCACtA1tKDnBXh/55Zc9+u8xptvS/SQv3wm92eMUrHfCwAgkJRlWVbcg/BTyDHChbBv9n03uziXZfy23UrKWuqQZI/ZdEL1er9ybbLltwSSpGUzAEB+hdy/K2rXjJR5s2u4b6+9ayTnjo4IOXeudDnG4Lz5DlmwStUtfZ1F+06nNcseTnbACmmHiXOXkXvHkXOXUUE7jvyu5bMEEvd/HwBAtCpqaSap6/0ZS0WpatU6d7P0qWm5rHdWpG+pJl/zsULOe/GVccrwqb93799iF7+6Z3SKnpFhCQQAKlJFBZGk3uy86iLcv/dassjZfCyErcrOkObVFM0UzHotpRQqyi3RAIDkqqggksSbXb6toXlncVLVnkW3YW1Vzngf10F5zi261HMAAIpRkcWqSeF3886YWbDSvkssdhFrjqLbsApznVtqJdnLSLJ6ElH0CwBIjkLu3xU1I5I4AZaK/GZx3Ft4/ZpshbFV2XPpyPzdZ0YGAIAgCCIxKnapqJCi2/62JvfbXlw7Y5FdI2LCSCm6n4ZShAsASAyCSDkKWHTb39bkXiHEHUhMS3b3Lp7IhHReEAAgGQgiZSjITEooW5UdgefEpuUZgUaSqlumaciCVfZMhDmoLtB7F6lczgsCAARDseoAFeUSRpD3lhTpEkqSuuMCADIVcv8miCASQXYE9Tc4OHfyNNy3t79DBgCEhF0ziF3USyj9LcIFACQDQQSRcYYRz+6vRepvES4AIDkIIgik2JqTMPqYuK+VxPOCAADFqbjTd1Gkvm2zhZ7467WEkvF716m+7tdmneqbY+uy3YkWAFA2mBFBIMXUfARaQimwL0
gSzwsCABSPIILACqn5yLeE0t26VUMWrMoKOJLUvX+Leg5sZUsuAFQAgggKErjmI8cSimkNb3a6eIWRXCGENu8AMHBQI4KC5Kv5MOqu/oZvkBiyYJVqr17iWXMiKX9Ra5H1KgCA5GFGBIGFuW3Wa5lHUqC+ILR5B4CBgyASg3JcWohi22zGMo9UUMCJqkcJAKC0CCJxKMcTZAOe+FsIe5nHJWjACbtHCQCg9AgiMSjHpYWwt82az1vdMk01LdMkZX4fQQIObd4BoPwRRGJSyUsLuUKXZxgJ8B60eQeA8kQQiVHFLi30c5mHNu8AMHAQRGJUqUsL/V7miaBeBQAQD4JITFhaKB5t3gFg4CCIxCDKpYVy3BoMAKhcBJE4RLm0UI5bgwEAFYsgEoMolxbKcWswAKByEUQGoEreGgwAKC+Rng72hS98QaNHj1ZdXZ3OPvtszZ8/X+3t7VFeEn3qZt5u78apqK3BAICyEmkQmTFjhp599lnt3r1bq1ev1v79+3XDDTdEeUn0CXpKLgAAcUpZlmWV6mIvvvii5s2bp66uLn3qU5/K+/zOzk41NDSoo6ND9fX1JRjhwOC3NZjlGQBAKRRy/y5ZjciRI0f01FNPafr06b4hpKurS11dXfbPnZ2dpRregEHXUQBAOYl0aUaSvvnNb+rTn/60hg0bpra2Nr3wwgu+z122bJkaGhrsP83NzVEPb+DJsTW49uoldB0FACRKwUsz99xzj+69996cz3njjTc0depUSdLhw4d15MgRvffee7r33nvV0NCgtWvXKpVKZb3Oa0akubmZpRkAAMpIIUszBQeRw4cP6/Dhwzmfc84556iuri7r8ffff1/Nzc3asmWLpk2blvda1IgAAFB+Iq0RaWpqUlNTU1EDM5nHOesBAAAqV2TFqtu2bdO2bdt0+eWX64wzzlBra6vuuusujR07NtBsCAAAGPgiK1YdPHiwnn/+ec2cOVMTJ07UV7/6VZ133nl67bXXVFtbG9VlAQBAGYlsRuRzn/ucXnnllajeHgAADACRb98FAADwQxABAACxIYgAAIDYEEQAAEBsCCIAACA2JTv0rhimARqH3wEAUD7MfTtI8/ZEB5GjR49KEoffAQBQho4ePaqGhoaczyn4rJlSSqfTam9v19ChQz0PyfNiDso7ePAg59MUgO+tcHxnxeF7KxzfWXH43goX1ndmWZaOHj2qkSNHqqoqdxVIomdEqqqqNGrUqKJeW19fz//wisD3Vji+s+LwvRWO76w4fG+FC+M7yzcTYlCsCgAAYkMQAQAAsRlwQaS2tlZ33303B+sViO+tcHxnxeF7KxzfWXH43goXx3eW6GJVAAAwsA24GREAAFA+CCIAACA2BBEAABAbgggAAIjNgA8iX/jCFzR69GjV1dXp7LPP1vz589Xe3h73sBLrt7/9rb72ta9pzJgxGjx4sMaOHau7775bJ0+ejHtoiXf//fdr+vTpOu2003T66afHPZxE+sEPfqAxY8aorq5OU6ZM0c9//vO4h5R4r7/+uj7/+c9r5MiRSqVS+ulPfxr3kBJv2bJluvjiizV06FCdeeaZmjdvnnbv3h33sBLt0Ucf1fnnn283Mps2bZrWr19fkmsP+CAyY8YMPfvss9q9e7dWr16t/fv364Ybboh7WIn17rvvKp1O67HHHtPOnTu1fPly/dM//ZPuvPPOuIeWeCdPntSNN96oW265Je6hJNIzzzyjxYsXa+nSpXrrrbd0xRVXaM6cOWpra4t7aIl2/PhxXXDBBXrkkUfiHkrZeO2113Trrbdq69at2rhxo7q7uzVr1iwdP3487qEl1qhRo/TAAw9o+/bt2r59u6666ipdd9112rlzZ/QXtyrMCy+8YKVSKevkyZNxD6Vs/N3f/Z01ZsyYuIdRNlauXGk1NDTEPYzEueSSS6yFCxdmPHbuueda3/rWt2IaUfmRZK1ZsybuYZSdjz76yJJkvfbaa3EPpaycccYZ1g9/+MPIrzPgZ0Scjhw5oqeeekrTp0/Xpz71qbiHUzY6OjrU2NgY9zBQxk6ePKk333xTs2bNynh81qxZ2rJlS0yjQqXo6OiQJP7vWEA9PT16+umndfz4cU2bNi3y61VEEPnmN7+pT3/60xo2bJja2tr0wgsvxD2ksrF//35973vf08KFC+MeCsrY4cOH1dPToxEjRmQ8PmLECB06dCimUaESWJalJUuW6PLLL9d5550X93AS7de//rWGDBmi2tpaLVy4UGvWrNHkyZMjv25ZBpF77rlHqVQq55/t27fbz//rv/5rvfXWW3r55ZdVXV2tP//zP5dVYQ1lC/3OJKm9vV2zZ8/WjTfeqK9//esxjTxexXxv8JdKpTJ+tiwr6zEgTIsWLdLbb7+tVatWxT2UxJs4caJ27NihrVu36pZbbtHNN9+sXbt2RX7dmsivEIFFixbpy1/+cs7nnHPOOfbfm5qa1NTUpAkTJmjSpElqbm7W1q1bSzLllBSFfmft7e2aMWOGpk2bpscffzzi0SVXod8bvDU1Nam6ujpr9uOjjz7KmiUBwnLbbbfpxRdf1Ouvv65Ro0bFPZzEGzRokMaNGydJmjp1qt544w2tWLFCjz32WKTXLcsgYoJFMcxMSFdXV5hDSrxCvrMPPvhAM2bM0JQpU7Ry5UpVVZXlxFko+vO/NZwyaNAgTZkyRRs3btT1119vP75x40Zdd911MY4MA5FlWbrtttu0Zs0avfrqqxozZkzcQypLlmWV5F5ZlkEkqG3btmnbtm26/PLLdcYZZ6i1tVV33XWXxo4dW1GzIYVob2/XlVdeqdGjR+vBBx/Uxx9/bP/urLPOinFkydfW1qYjR46ora1NPT092rFjhyRp3LhxGjJkSLyDS4AlS5Zo/vz5mjp1qj3T1tbWRv1RHseOHdO+ffvsnw8cOKAdO3aosbFRo0ePjnFkyXXrrbfqJz/5iV544QUNHTrUnolraGjQ4MGDYx5dMt15552aM2eOmpubdfToUT399NN69dVXtWHDhugvHvm+nBi9/fbb1owZM6zGxkartrbWOuecc6yFCxda77//ftxDS6yVK1dakjz/ILebb77Z83v72c9+FvfQEuP73/++9ZnPfMYaNGiQddFFF7GdMoCf/exnnv+7uvnmm+MeWmL5/d+wlStXxj20xPrqV79q///N4cOHWzNnzrRefvnlklw7ZVkVVrUJAAASo3IX/wEAQOwIIgAAIDYEEQAAEBuCCAAAiA1BBAAAxIYgAgAAYkMQAQAAsSGIAACA2BBEAABAbAgiAAAgNgQRAAAQG4IIAACIzf8DYp3+lvuVfPsAAAAASUVORK5CYII=", - "text/plain": [ - "
" - ] - }, - "metadata": { - "engine": 0 - }, - "output_type": "display_data" - } - ], - "source": [ - "%%px --target 0\n", - "plt.plot(c1_np[:,0], c1_np[:,1], 'x', color='#f0781e')\n", - "plt.plot(c2_np[:,0], c2_np[:,1], 'x', color='#5a696e')\n", - "plt.plot(centroids[0,0],centroids[0,1], '^', markersize=10, markeredgecolor='black', color='#f0781e' )\n", - "plt.plot(centroids[1,0],centroids[1,1], '^', markersize=10, markeredgecolor='black',color='#5a696e')\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The Iris Dataset\n", - "------------------------------\n", - "The _iris_ dataset is a well known example for clustering analysis. It contains 4 measured features for samples from\n", - "three different types of iris flowers. A subset of 150 samples is included in formats h5, csv and netcdf in the [Heat repository under 'heat/heat/datasets'](https://github.com/helmholtz-analytics/heat/tree/main/heat/datasets), and can be loaded in a distributed manner with Heat's parallel dataloader.\n", - "\n", - "**NOTE: you might have to change the path to the dataset in the following cell.**" - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "metadata": {}, - "outputs": [], - "source": [ - "%%px\n", - "iris = ht.load(\"./heat/datasets/iris.csv\", sep=\";\", split=0)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Feel free to try out the other [loading options](https://heat.readthedocs.io/en/stable/autoapi/heat/core/io/index.html#heat.core.io.load) as well.\n", - "\n", - "Fitting the dataset with `kmeans`:" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "\u001b[0;31mOut[2:20]: \u001b[0m\n", - "KMeans({\n", - " \"n_clusters\": 3,\n", - " \"init\": \"probability_based\",\n", - " \"max_iter\": 300,\n", - " \"tol\": 0.0001,\n", - " \"random_state\": null\n", - "})" - ] - }, - "metadata": { - "after": null, - "completed": null, - "data": {}, - "engine_id": 2, - "engine_uuid": "69de46f1-abc608b965c5bc79faeb092a", - "error": null, - "execute_input": "k = 3\nkmeans = ht.cluster.KMeans(n_clusters=k, init=\"kmeans++\")\nkmeans.fit(iris)\n", - "execute_result": { - "data": { - "text/plain": "KMeans({\n \"n_clusters\": 3,\n \"init\": \"probability_based\",\n \"max_iter\": 300,\n \"tol\": 0.0001,\n \"random_state\": null\n})" - }, - "execution_count": 20, - "metadata": {} - }, - "follow": null, - "msg_id": null, - "outputs": [], - "received": null, - "started": null, - "status": null, - "stderr": "", - "stdout": "", - "submitted": "2024-03-21T09:47:32.371869Z" - }, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "\u001b[0;31mOut[3:20]: \u001b[0m\n", - "KMeans({\n", - " \"n_clusters\": 3,\n", - " \"init\": \"probability_based\",\n", - " \"max_iter\": 300,\n", - " \"tol\": 0.0001,\n", - " \"random_state\": null\n", - "})" - ] - }, - "metadata": { - "after": null, - "completed": null, - "data": {}, - "engine_id": 3, - "engine_uuid": "a4657187-cf8e91c40f19240ba56a42f6", - "error": null, - "execute_input": "k = 3\nkmeans = ht.cluster.KMeans(n_clusters=k, init=\"kmeans++\")\nkmeans.fit(iris)\n", - "execute_result": { - "data": { - "text/plain": "KMeans({\n \"n_clusters\": 3,\n \"init\": \"probability_based\",\n \"max_iter\": 300,\n \"tol\": 0.0001,\n \"random_state\": null\n})" - }, - "execution_count": 20, - "metadata": {} - }, - "follow": null, - "msg_id": null, - "outputs": [], - "received": null, - "started": null, - "status": 
null, - "stderr": "", - "stdout": "", - "submitted": "2024-03-21T09:47:32.371965Z" - }, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "\u001b[0;31mOut[1:20]: \u001b[0m\n", - "KMeans({\n", - " \"n_clusters\": 3,\n", - " \"init\": \"probability_based\",\n", - " \"max_iter\": 300,\n", - " \"tol\": 0.0001,\n", - " \"random_state\": null\n", - "})" - ] - }, - "metadata": { - "after": null, - "completed": null, - "data": {}, - "engine_id": 1, - "engine_uuid": "689d2228-122a4c5ed76d6d5375819746", - "error": null, - "execute_input": "k = 3\nkmeans = ht.cluster.KMeans(n_clusters=k, init=\"kmeans++\")\nkmeans.fit(iris)\n", - "execute_result": { - "data": { - "text/plain": "KMeans({\n \"n_clusters\": 3,\n \"init\": \"probability_based\",\n \"max_iter\": 300,\n \"tol\": 0.0001,\n \"random_state\": null\n})" - }, - "execution_count": 20, - "metadata": {} - }, - "follow": null, - "msg_id": null, - "outputs": [], - "received": null, - "started": null, - "status": null, - "stderr": "", - "stdout": "", - "submitted": "2024-03-21T09:47:32.371782Z" - }, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "\u001b[0;31mOut[0:26]: \u001b[0m\n", - "KMeans({\n", - " \"n_clusters\": 3,\n", - " \"init\": \"probability_based\",\n", - " \"max_iter\": 300,\n", - " \"tol\": 0.0001,\n", - " \"random_state\": null\n", - "})" - ] - }, - "metadata": { - "after": null, - "completed": null, - "data": {}, - "engine_id": 0, - "engine_uuid": "e3649dd0-f970dcd5e37935a1f3fe07c8", - "error": null, - "execute_input": "k = 3\nkmeans = ht.cluster.KMeans(n_clusters=k, init=\"kmeans++\")\nkmeans.fit(iris)\n", - "execute_result": { - "data": { - "text/plain": "KMeans({\n \"n_clusters\": 3,\n \"init\": \"probability_based\",\n \"max_iter\": 300,\n \"tol\": 0.0001,\n \"random_state\": null\n})" - }, - "execution_count": 26, - "metadata": {} - }, - "follow": null, - "msg_id": null, - "outputs": [], - "received": null, - "started": null, - "status": null, - "stderr": "", - "stdout": "", - "submitted": "2024-03-21T09:47:32.371675Z" - }, - "output_type": "display_data" - } - ], - "source": [ - "%%px\n", - "k = 3\n", - "kmeans = ht.cluster.KMeans(n_clusters=k, init=\"kmeans++\")\n", - "kmeans.fit(iris)\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's see what the results are. In theory, there are 50 samples of each of the 3 iris types: setosa, versicolor and virginica. We will plot the results in a 3D scatter plot, coloring the samples according to the assigned cluster." 
- ] - }, - { - "cell_type": "code", - "execution_count": 21, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[stdout:3] Number of points assigned to c1: 50 \n", - "Number of points assigned to c2: 38 \n", - "Number of points assigned to c3: 62\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "[stdout:0] Number of points assigned to c1: 50 \n", - "Number of points assigned to c2: 38 \n", - "Number of points assigned to c3: 62\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "[stdout:1] Number of points assigned to c1: 50 \n", - "Number of points assigned to c2: 38 \n", - "Number of points assigned to c3: 62\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "[stdout:2] Number of points assigned to c1: 50 \n", - "Number of points assigned to c2: 38 \n", - "Number of points assigned to c3: 62\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "%%px\n", - "labels = kmeans.predict(iris).squeeze()\n", - "\n", - "# Select points assigned to clusters c1, c2 and c3\n", - "c1 = iris[ht.where(labels == 0), :]\n", - "c2 = iris[ht.where(labels == 1), :]\n", - "c3 = iris[ht.where(labels == 2), :]\n", - "# After slicing, the arrays are not distributed equally among the processes anymore; we need to balance\n", - "#TODO is balancing really necessary?\n", - "c1.balance_()\n", - "c2.balance_()\n", - "c3.balance_()\n", - "\n", - "print(f\"Number of points assigned to c1: {c1.shape[0]} \\n\"\n", - " f\"Number of points assigned to c2: {c2.shape[0]} \\n\"\n", - " f\"Number of points assigned to c3: {c3.shape[0]}\")" - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Number of points assigned to c1: 50 \n", - "Number of points assigned to c2: 39 \n", - "Number of points assigned to c3: 61\n" - ] - } - ], - "source": [ - "# compare Heat results with sklearn\n", - "from sklearn.cluster import KMeans\n", - "import sklearn.datasets\n", - "k = 3\n", - "iris_sk = sklearn.datasets.load_iris().data\n", - "kmeans_sk = KMeans(n_clusters=k, init=\"k-means++\").fit(iris_sk)\n", - "labels_sk = kmeans_sk.predict(iris_sk)\n", - "\n", - "c1_sk = iris_sk[labels_sk == 0, :]\n", - "c2_sk = iris_sk[labels_sk == 1, :]\n", - "c3_sk = iris_sk[labels_sk == 2, :]\n", - "print(f\"Number of points assigned to c1: {c1_sk.shape[0]} \\n\"\n", - " f\"Number of points assigned to c2: {c2_sk.shape[0]} \\n\"\n", - " f\"Number of points assigned to c3: {c3_sk.shape[0]}\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "heat_env", - "language": "python", - "name": "heat_env" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.11.8" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/tutorials/scripts/hpc/01_basics/01_basics_dndarrays.py b/tutorials/scripts/hpc/01_basics/01_basics_dndarrays.py new file mode 100644 index 0000000000..3c7fb8ec3d --- /dev/null +++ b/tutorials/scripts/hpc/01_basics/01_basics_dndarrays.py @@ -0,0 +1,25 @@ +import heat as ht + +# ### DNDarrays +# +# +# Similar to a 
NumPy `ndarray`, a Heat `dndarray` (we'll get to the `d` later) is a grid of values of a single (one particular) type. The number of dimensions is the number of axes of the array, while the shape of an array is a tuple of integers giving the number of elements of the array along each dimension.
+#
+# Heat emulates NumPy's API as closely as possible, allowing for the use of well-known **array creation functions**.
+
+
+a = ht.array([1, 2, 3])
+print("array creation with values [1,2,3] with the heat array method:")
+print(a)
+
+a = ht.ones((4, 5))
+print("array creation of shape = (4, 5) example with the heat ones method:")
+print(a)
+
+a = ht.arange(10)
+print("array creation with [0,1,...,9] example with the heat arange method:")
+print(a)
+
+a = ht.full((3, 2), fill_value=9)
+print("array creation filled with value 9 and shape = (3, 2) with the heat full method:")
+print(a)
diff --git a/tutorials/scripts/hpc/01_basics/02_basics_datatypes.py b/tutorials/scripts/hpc/01_basics/02_basics_datatypes.py
new file mode 100644
index 0000000000..5c6ab039d3
--- /dev/null
+++ b/tutorials/scripts/hpc/01_basics/02_basics_datatypes.py
@@ -0,0 +1,22 @@
+import heat as ht
+import numpy as np
+import torch
+
+# ### Data Types
+#
+# Heat supports various data types and operations to retrieve and manipulate the type of a Heat array. However, in contrast to NumPy, Heat is limited to logical (bool) and numerical types (uint8, int16/32/64, float32/64, and complex64/128).
+#
+# **NOTE:** by default, Heat allocates floating-point values in single precision, due to the much higher processing performance on GPUs. This is one of the main differences between Heat and NumPy.
+
+a = ht.zeros((3, 4))
+print(f"single precision is the default for floating-point values: {a.dtype}")
+
+b = torch.zeros(3, 4)
+print(f"like in PyTorch: {b.dtype}")
+
+
+b = np.zeros((3, 4))
+print(f"whereas double precision is the default for floating-point values in NumPy: {b.dtype}")
+
+b = a.astype(ht.int64)
+print(f"casting to int64: {b}")
diff --git a/tutorials/scripts/hpc/01_basics/03_basics_operations.py b/tutorials/scripts/hpc/01_basics/03_basics_operations.py
new file mode 100644
index 0000000000..f2ea879388
--- /dev/null
+++ b/tutorials/scripts/hpc/01_basics/03_basics_operations.py
@@ -0,0 +1,30 @@
+import heat as ht
+
+# ### Operations
+#
+# Heat supports many mathematical operations, ranging from simple element-wise functions and binary arithmetic operations to linear algebra and more powerful reductions. By default, operations are performed on the entire array, but most of them can also be applied along one or more of its dimensions. Most relevant for data-intensive applications is that **all Heat functionalities support memory-distributed computation and GPU acceleration**. This holds for all operations, including reductions, statistics, linear algebra, and high-level algorithms.
+#
+# You can try out the few simple examples below if you want, but we will skip to the [Parallel Processing](#Parallel-Processing) section to see memory-distributed operations in action.
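+
+# As a small sketch of axis-wise operations (the array `x` below is introduced
+# purely for illustration), the same reduction can act on the whole array or
+# along a single dimension via the `axis` argument:
+
+x = ht.arange(6).reshape(2, 3)
+print("sum over the entire array:")
+print(x.sum())
+print("sum along the first dimension only:")
+print(x.sum(axis=0))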
+
+a = ht.full((3, 4), 8)
+b = ht.ones((3, 4))
+c = a + b
+print("matrix addition a + b:")
+print(c)
+
+
+c = ht.sub(a, b)
+print("matrix subtraction a - b:")
+print(c)
+
+c = ht.arange(5).sin()
+print("application of sin() elementwise:")
+print(c)
+
+c = a.T
+print("transpose operation:")
+print(c)
+
+c = b.sum(axis=1)
+print("summation of array elements along axis 1:")
+print(c)
diff --git a/tutorials/scripts/hpc/01_basics/04_basics_indexing.py b/tutorials/scripts/hpc/01_basics/04_basics_indexing.py
new file mode 100644
index 0000000000..0949a21f09
--- /dev/null
+++ b/tutorials/scripts/hpc/01_basics/04_basics_indexing.py
@@ -0,0 +1,13 @@
+import heat as ht
+
+# ## Indexing
+
+# Heat allows indexing of arrays and thereby the extraction of a partial view of the elements in an array. It is possible to obtain single values as well as entire chunks, i.e. slices.
+
+a = ht.arange(10)
+
+print(a[3])
+print(a[1:7])
+print(a[::2])
+
+# **NOTE:** Indexing in Heat is undergoing a [major overhaul](https://github.com/helmholtz-analytics/heat/pull/938), to increase interoperability with NumPy/PyTorch indexing, and to provide fully distributed item-setting functionality. Stay tuned for this feature in the next release.
diff --git a/tutorials/scripts/hpc/01_basics/05_basics_broadcast.py b/tutorials/scripts/hpc/01_basics/05_basics_broadcast.py
new file mode 100644
index 0000000000..e84663b164
--- /dev/null
+++ b/tutorials/scripts/hpc/01_basics/05_basics_broadcast.py
@@ -0,0 +1,14 @@
+import heat as ht
+
+# ---
+# Heat implements the same broadcasting rules (implicit repetition of an operation when the ranks/shapes of the operands do not match) as NumPy does, e.g.:
+
+a = ht.arange(10) + 3
+print(f"broadcast example of adding single value 3 to [0, 1, ..., 9]: {a}")
+
+
+a = ht.ones((3, 4))
+b = ht.arange(4)
+print(
+    f"broadcasting across the first dimension of {a} with shape = (3, 4) and {b} with shape = (4): {a + b}"
+)
diff --git a/tutorials/scripts/hpc/01_basics/06_basics_gpu.py b/tutorials/scripts/hpc/01_basics/06_basics_gpu.py
new file mode 100644
index 0000000000..6383d8dda4
--- /dev/null
+++ b/tutorials/scripts/hpc/01_basics/06_basics_gpu.py
@@ -0,0 +1,39 @@
+import heat as ht
+import torch
+
+# ## Parallel Processing
+# ---
+#
+# Heat's true power lies in its ability to exploit the processing performance of modern accelerator hardware (GPUs) as well as distributed (high-performance) cluster systems. All operations executed on CPUs are, to a large extent, vectorized (AVX) and thread-parallelized (OpenMP). Heat builds on PyTorch, so it supports GPU acceleration on NVIDIA and AMD GPUs.
+#
+# For distributed computations, your system or laptop needs to have the Message Passing Interface (MPI) installed. For GPU computations, your system needs to have one or more suitable GPUs and an (MPI-aware) CUDA/ROCm ecosystem.
+#
+# **NOTE:** The GPU examples below will only properly execute on a computer with a GPU. Make sure to run this script on an appropriate machine, or copy and paste the examples into a script that you execute on a suitable device.
+
+# ### GPUs
+#
+# Heat's array creation functions all support an additional `device` parameter that places the data on a specific device. By default, the CPU is selected, but it is also possible to directly allocate the data on a GPU.
+
+if torch.cuda.is_available():
+    ht.zeros((3, 4), device="gpu")
+
+# Arrays on the same device can be seamlessly used in any Heat operation.
+
+if torch.cuda.is_available():
+    a = ht.zeros((3, 4), device="gpu")
+    b = ht.ones((3, 4), device="gpu")
+    print(a + b)
+
+# However, performing operations on arrays with mismatching devices will purposefully result in an error (due to potentially large copy overhead).
+
+if torch.cuda.is_available():
+    a = ht.full((3, 4), 4, device="cpu")
+    b = ht.ones((3, 4), device="gpu")
+    try:
+        print(a + b)
+    except Exception as e:
+        print(f"mismatching devices raise an error: {e}")
+
+# It is possible to explicitly move an array from one device to the other to avoid this error.
+
+if torch.cuda.is_available():
+    a = ht.full((3, 4), 4, device="cpu")
+    a = a.gpu()  # move the array to the GPU, where b already resides
+    print(a + b)
diff --git a/tutorials/scripts/hpc/01_basics/07_basics_distributed.py b/tutorials/scripts/hpc/01_basics/07_basics_distributed.py
new file mode 100644
index 0000000000..b92eb169be
--- /dev/null
+++ b/tutorials/scripts/hpc/01_basics/07_basics_distributed.py
@@ -0,0 +1,70 @@
+import heat as ht
+
+# ### Distributed Computing
+#
+# Heat is also able to make use of distributed processing capabilities such as those in high-performance cluster systems. For this, Heat exploits the fact that the operations performed on a multi-dimensional array are usually identical for all data items. Hence, a data-parallel processing strategy can be chosen, where the total number of data items is equally divided among all processing nodes. An operation is then performed individually on the local data chunks and, if necessary, partial results are communicated behind the scenes. A Heat array assumes the role of a virtual overlay over the local chunks and realizes and coordinates the computations - see the figure below for a visual representation of this concept.
+#
+#
+#
+# The chunks are always split along a single dimension (i.e. 1-D domain decomposition) of the array. You can specify this in Heat by using the `split` parameter. This parameter is present in all relevant functions, such as array creation (`zeros(), ones(), ...`) or I/O (`load()`) functions.
+#
+#
+#
+#
+# Examples are provided below. The result of an operation on a Heat array will in most cases preserve the split of the respective operands. However, in some cases the split axis might change. For example, a transpose of a Heat array will equally transpose the split axis. Furthermore, a reduction operation, e.g. `sum()`, performed across the split axis might remove the data partitioning entirely. The respective function behaviors can be found in Heat's documentation.
+#
+# You may also modify the data partitioning of a Heat array by using the `resplit()` function. This allows you to repartition the data as you choose. Please note that this should be used sparingly and for small data amounts only, as it entails significant data copying across the network. Finally, a Heat array without any split, i.e. `split=None` (default), will result in redundant copies of the data on each computation node.
+#
+# On a technical level, Heat follows the so-called [Bulk Synchronous Parallel (BSP)](https://en.wikipedia.org/wiki/Bulk_synchronous_parallel) processing model. For the network communication, Heat utilizes the [Message Passing Interface (MPI)](https://computing.llnl.gov/tutorials/mpi/), a *de facto* standard on modern high-performance computing systems. It is also possible to use MPI on your laptop or desktop computer. Respective software packages are available for all major operating systems. In order to run a Heat script, you need to start it slightly differently than you are probably used to.
This
+#
+# ```bash
+# python ./my_script.py
+# ```
+#
+# becomes this instead:
+#
+# ```bash
+# mpirun -n <number of processes> python ./my_script.py
+# ```
+# On an HPC cluster you'll of course use `sbatch` or similar.
+#
+#
+# Let's see some examples of working with distributed Heat:
+
+# In the following examples, we'll recreate the array shown in the figure, a 3-dimensional DNDarray of integers ranging from 0 to 59 (5 matrices of size (4,3)).
+
+
+dndarray = ht.arange(60).reshape(5, 4, 3)
+if dndarray.comm.rank == 0:
+    print("3-dimensional DNDarray of integers ranging from 0 to 59:")
+print(dndarray)
+
+
+# Notice the additional metadata printed with the DNDarray. Compared to a NumPy ndarray, the DNDarray has additional information on the device (in this case, the CPU) and the `split` axis. In the example above, the split axis is `None`, meaning that the DNDarray is not distributed and each MPI process has a full copy of the data.
+#
+# Let's experiment with a distributed DNDarray: we'll create the same DNDarray as above, but distribute it along the major axis.
+
+
+dndarray = ht.arange(60, split=0).reshape(5, 4, 3)
+if dndarray.comm.rank == 0:
+    print("3-dimensional DNDarray of integers ranging from 0 to 59:")
+print(dndarray)
+
+
+# The `split` axis is now 0, meaning that the DNDarray is distributed along the first axis. Each MPI process has a slice of the data along the first axis. In order to see the data on each process, we can print the "local array" via the `larray` attribute.
+
+print(f"data on process no {dndarray.comm.rank}: {dndarray.larray}")
+
+
+# Note that the `larray` is a `torch.Tensor` object. This is the underlying tensor that holds the data. The `dndarray` object is an MPI-aware wrapper around these process-local tensors, providing memory-distributed functionality and information.
+
+# The DNDarray can be distributed along any axis. Modify the `split` attribute when creating the DNDarray in the code above, to distribute it along a different axis, and see how the `larray`s change. You'll notice that the distributed arrays are always load-balanced, meaning that the data are distributed as evenly as possible across the MPI processes.
+
+# The `DNDarray` object has a number of methods and attributes that are useful for distributed computing. In particular, it keeps track of its global and local (on a given process) shape through distributed operations and array manipulations. The DNDarray is also associated with a `comm` object, the MPI communicator.
+#
+# (In MPI, the *communicator* is a group of processes that can communicate with each other. The `comm` object is an `MPI.COMM_WORLD` communicator, which is the default communicator that includes all processes. The `comm` object is used to perform collective operations, such as reductions, scatter, gather, and broadcast. It is also used to perform point-to-point communication between processes.)
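+
+# A minimal sketch of using the communicator directly (an assumption worth
+# flagging: Heat's communicator forwards standard mpi4py methods such as
+# `bcast` and `Barrier` to the underlying MPI handle; the names `comm` and
+# `message` below are introduced just for illustration):
+
+comm = dndarray.comm
+comm.Barrier()  # synchronize all processes
+message = "hello from rank 0" if comm.rank == 0 else None
+message = comm.bcast(message, root=0)  # broadcast a Python object to every rank
+print(f"rank {comm.rank} received: {message}")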
+
+
+print(f"Global shape on rank {dndarray.comm.rank}: {dndarray.shape}")
+print(f"Local shape on rank {dndarray.comm.rank}: {dndarray.lshape}")
+print(f"Local device on rank {dndarray.comm.rank}: {dndarray.device}")
diff --git a/tutorials/scripts/hpc/01_basics/08_basics_distributed_operations.py b/tutorials/scripts/hpc/01_basics/08_basics_distributed_operations.py
new file mode 100644
index 0000000000..a8bf106585
--- /dev/null
+++ b/tutorials/scripts/hpc/01_basics/08_basics_distributed_operations.py
@@ -0,0 +1,24 @@
+import heat as ht
+
+dndarray = ht.arange(60, split=0).reshape(5, 4, 3)
+
+# You can perform a vast number of operations on DNDarrays distributed over multi-node and/or multi-GPU resources. Check out our [NumPy coverage tables](https://github.com/helmholtz-analytics/heat/blob/main/coverage_tables.md) to see what operations are already supported.
+#
+# The result of an operation on DNDarrays will in most cases preserve the `split` or distribution axis of the respective operands. However, in some cases the split axis might change. For example, a transpose of a Heat array will equally transpose the split axis. Furthermore, a reduction operation, e.g. `sum()`, performed across the split axis might remove the data partitioning entirely. The respective function behaviors can be found in Heat's documentation.
+
+
+# transpose
+print(dndarray.T)
+
+
+# reduction operation along the distribution axis
+print(dndarray.sum(axis=0))
+
+# min / max etc.
+print(ht.sin(dndarray).min(axis=0))
+
+
+other_dndarray = ht.arange(60, 120, split=0).reshape(5, 4, 3)  # distributed reshape
+
+# element-wise multiplication
+print(dndarray * other_dndarray)
diff --git a/tutorials/scripts/hpc/01_basics/09_basics_distributed_matmul.py b/tutorials/scripts/hpc/01_basics/09_basics_distributed_matmul.py
new file mode 100644
index 0000000000..d15ea26eb8
--- /dev/null
+++ b/tutorials/scripts/hpc/01_basics/09_basics_distributed_matmul.py
@@ -0,0 +1,55 @@
+# As we saw earlier, because the underlying data objects are PyTorch tensors, we can easily create DNDarrays on GPUs or move DNDarrays to GPUs. This allows us to perform distributed array operations on multi-GPU systems.
+#
+# So far we have demonstrated small, easy-to-parallelize arithmetical operations. Let's move to linear algebra. Heat's `linalg` module supports a wide range of linear algebra operations, including matrix multiplication. Matrix multiplication is a very common operation in data analysis; it is computationally intensive and not trivial to parallelize.
+#
+# With Heat, you can perform matrix multiplication on distributed DNDarrays, and the operation will be parallelized across the MPI processes. Here, on 4 GPUs (falling back to CPU if no GPU is available):
+
+import heat as ht
+import torch
+
+if torch.cuda.is_available():
+    device = "gpu"
+else:
+    device = "cpu"
+
+n, m = 400, 400
+x = ht.random.randn(n, m, split=0, device=device)  # distributed RNG
+y = ht.random.randn(m, n, split=None, device=device)
+z = x @ y
+print(z)
+
+# `ht.linalg.matmul` or `@` breaks down the matrix multiplication into a series of smaller `torch` matrix multiplications, which are then distributed across the MPI processes. This operation can be very communication-intensive on huge matrices that both require distribution, and users should choose the `split` axis carefully to minimize communication overhead.
+
+# You can experiment with sizes and the `split` parameter (distribution axis) for both matrices and time the result.
Note that:
+# - If you set **`split=None` for both matrices**, each process (in this case, each GPU) will attempt to multiply the entire matrices. Depending on the matrix sizes, the GPU memory might be insufficient. (And if you can multiply the matrices on a single GPU, it's much more efficient to stick to PyTorch's `torch.linalg.matmul` function.)
+# - If **`split` is not None for both matrices**, each process will only hold a slice of the data, and will need to communicate data with other processes in order to perform the multiplication. This **introduces huge communication overhead**, but allows you to perform the multiplication on larger matrices than would fit in the memory of a single GPU.
+# - If **`split` is None for one matrix and not None for the other**, the multiplication does not require communication, and the result will be distributed. If your data size allows it, you should always favor this option.
+#
+# Time the multiplication for different split parameters and see how the performance changes.
+#
+#
+
+
+import time
+
+start = time.time()
+z = x @ y
+end = time.time()
+print("runtime: ", end - start)
+
+
+# Heat supports many linear algebra operations:
+# ```bash
+# >>> ht.linalg.
+# ht.linalg.basics        ht.linalg.hsvd_rtol(    ht.linalg.projection(   ht.linalg.triu(
+# ht.linalg.cg(           ht.linalg.inv(          ht.linalg.qr(           ht.linalg.vdot(
+# ht.linalg.cross(        ht.linalg.lanczos(      ht.linalg.solver        ht.linalg.vecdot(
+# ht.linalg.det(          ht.linalg.matmul(       ht.linalg.svdtools      ht.linalg.vector_norm(
+# ht.linalg.dot(          ht.linalg.matrix_norm(  ht.linalg.trace(
+# ht.linalg.hsvd(         ht.linalg.norm(         ht.linalg.transpose(
+# ht.linalg.hsvd_rank(    ht.linalg.outer(        ht.linalg.tril(
+# ```
+#
+# and a lot more is in the works, including distributed eigendecomposition and SVD. If the operation you need is not yet supported, leave us a note [here](tinyurl.com/demoissues) and we'll get back to you.
+
+# You can of course perform all operations on CPUs: simply leave out the `device` argument entirely.
diff --git a/tutorials/scripts/hpc/01_basics/10_interoperability.py b/tutorials/scripts/hpc/01_basics/10_interoperability.py
new file mode 100644
index 0000000000..f3ec217425
--- /dev/null
+++ b/tutorials/scripts/hpc/01_basics/10_interoperability.py
@@ -0,0 +1,26 @@
+# ### Interoperability
+#
+# We can easily create DNDarrays from PyTorch tensors and NumPy ndarrays. We can also convert DNDarrays to PyTorch tensors and NumPy ndarrays. This makes it easy to integrate Heat into existing PyTorch and NumPy workflows.
+#
+
+# Heat will try to reuse the memory of the original array as much as possible. If you would prefer a copy with different memory, the `copy` keyword argument can be used when creating a DNDarray from other libraries.
+
+import heat as ht
+import torch
+import numpy as np
+
+torch_array = torch.arange(ht.MPI_WORLD.rank, ht.MPI_WORLD.rank + 5)
+heat_array = ht.array(torch_array, copy=False, is_split=0)
+heat_array[0] = -1
+print(torch_array)
+
+torch_array = torch.arange(ht.MPI_WORLD.rank, ht.MPI_WORLD.rank + 5)
+heat_array = ht.array(torch_array, copy=True, is_split=0)
+heat_array[0] = -1
+print(torch_array)
+
+np_array = heat_array.numpy()
+print(np_array)
+
+
+# Interoperability is a key feature of Heat, and we are constantly working to increase Heat's compliance with the [Python array API standard](https://data-apis.org/array-api/latest/). As usual, please [let us know](tinyurl.com/demoissues) if you encounter any issues or have any feature requests.
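+
+# One more short sketch covering the remaining directions (NumPy -> Heat and
+# Heat -> torch); `np_data` is introduced here just for illustration:
+
+np_data = np.arange(5)
+heat_from_np = ht.array(np_data)   # NumPy ndarray -> DNDarray (replicated, split=None)
+torch_local = heat_from_np.larray  # process-local torch.Tensor underlying the DNDarray
+print(type(torch_local))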
diff --git a/tutorials/scripts/hpc/01_basics/11_internals_1.py b/tutorials/scripts/hpc/01_basics/11_internals_1.py
new file mode 100644
index 0000000000..d8c1dae30d
--- /dev/null
+++ b/tutorials/scripts/hpc/01_basics/11_internals_1.py
@@ -0,0 +1,44 @@
+import heat as ht
+import torch
+
+# # Heat as infrastructure for MPI applications
+#
+# In this section, we'll go through some Heat-specific functionalities that simplify the implementation of a data-parallel application in Python. We'll demonstrate them on small arrays and 4 processes on a single cluster node, but the functionalities are indeed meant for a multi-node setup with huge arrays that cannot be processed on a single node.
+
+
+# We already mentioned that the DNDarray object is "MPI-aware". Each DNDarray is associated with an MPI communicator, is aware of the number of processes in that communicator, and knows the rank of the local process.
+#
+
+a = ht.random.randn(7, 4, 3, split=1)
+if a.comm.rank == 0:
+    print(f"a.comm gets the communicator {a.comm} associated with DNDarray a")
+
+# MPI size = total number of processes
+size = a.comm.size
+
+if a.comm.rank == 0:
+    print(f"a is distributed over {size} processes")
+    print(f"a is a distributed {a.ndim}-dimensional array with global shape {a.shape}")
+
+
+# MPI rank = rank of each process
+rank = a.comm.rank
+# Local shape = shape of the data on each process
+local_shape = a.lshape
+print(f"Rank {rank} holds a slice of a with local shape {local_shape}")
+
+
+# ### Distribution map
+#
+# On many occasions when building a memory-distributed pipeline, it is convenient for each rank to know which rank holds which slice of the distributed array.
+#
+# The `lshape_map` attribute of a DNDarray gathers (or, if possible, calculates) this info from all processes and stores it as metadata of the DNDarray. Because it is meant for internal use, it is stored in a torch tensor, not a DNDarray.
+#
+# The `lshape_map` tensor is a 2D tensor, where the first dimension is the number of processes and the second dimension is the number of dimensions of the array. Each row of the tensor contains the local shape of the array on a process.
+
+
+lshape_map = a.lshape_map
+if a.comm.rank == 0:
+    print(f"lshape_map available on any process: {lshape_map}")
+
+# Go back to where we created the DNDarray and create `a` with a different split axis. See how the `lshape_map` changes.
diff --git a/tutorials/scripts/hpc/01_basics/12_internals_2.py b/tutorials/scripts/hpc/01_basics/12_internals_2.py
new file mode 100644
index 0000000000..94d71a445d
--- /dev/null
+++ b/tutorials/scripts/hpc/01_basics/12_internals_2.py
@@ -0,0 +1,71 @@
+import heat as ht
+import torch
+
+# ### Modifying the DNDarray distribution
+#
+# In a distributed pipeline, it is sometimes necessary to change the distribution of a DNDarray, when the array is not distributed in the most convenient way for the next operation / algorithm.
+#
+# Depending on your needs, you can choose between:
+# - `DNDarray.redistribute_()`: This method keeps the original split axis, but redistributes the data of the DNDarray according to a "target map".
+# - `DNDarray.resplit_()`: This method changes the split axis of the DNDarray. This is a more expensive operation and should be used only when absolutely necessary. Depending on your needs and available resources, in some cases it might be wiser to keep a copy of the DNDarray with a different split axis.
+#
+# Let's see some examples.
+
+a = ht.random.randn(7, 4, 3, split=1)
+
+# redistribute
+target_map = a.lshape_map
+target_map[:, a.split] = torch.tensor([1, 2, 2, 2])
+# in-place redistribution (see ht.redistribute for out-of-place)
+a.redistribute_(target_map=target_map)
+
+# new lshape map after redistribution
+print(a.lshape_map)
+
+# local arrays after redistribution
+print(a.larray)
+
+
+# resplit
+a.resplit_(axis=1)
+
+print(a.lshape_map)
+
+
+# You can use the `resplit_` method (in-place), or `ht.resplit` (out-of-place) to change the distribution axis, but also to set the distribution axis to None. The latter corresponds to an MPI.Allgather operation that gathers the entire array on each process. This is useful when the data size has become small enough to be processed on a single device and you want to avoid communication overhead.
+
+
+# "un-split" distributed array
+a.resplit_(axis=None)
+# each process now holds a copy of the entire array
+
+
+# The opposite is not true, i.e. you cannot use `resplit_` to distribute an array with split=None. In that case, you must use the `ht.array()` factory function:
+
+
+# make `a` split again
+a = ht.array(a, split=0)
+
+
+# ### Making disjoint data into a global DNDarray
+#
+# Another common occurrence in a data-parallel pipeline: you have addressed the embarrassingly-parallel part of your algorithm with any array framework, each process working independently from the others. You now want to perform a non-embarrassingly-parallel operation on the entire dataset, with Heat as a backend.
+#
+# You can use the `ht.array` factory function with the `is_split` argument to create a DNDarray from a disjoint (on each MPI process) set of arrays. The `is_split` argument indicates the axis along which the disjoint data is to be "joined" into a global, distributed DNDarray.
+
+
+# create some random local arrays on each process
+import numpy as np
+
+local_array = np.random.rand(3, 4)
+
+# join them into a distributed array
+a_0 = ht.array(local_array, is_split=0)
+print(a_0.shape)
+
+
+# Change the code above and join the arrays along a different axis. Note that the shapes of the local arrays must be consistent along the non-split axes. They can differ along the split axis.
+
+# The `ht.array` function takes as input any data object that can be converted to a torch tensor.
+
+# Once you've made your disjoint data into a DNDarray, you can apply any Heat operation or algorithm to it and exploit the cumulative RAM of all the processes in the communicator.
diff --git a/tutorials/scripts/hpc/02_loading_preprocessing/01_IO.py b/tutorials/scripts/hpc/02_loading_preprocessing/01_IO.py
new file mode 100644
index 0000000000..ea8aec1545
--- /dev/null
+++ b/tutorials/scripts/hpc/02_loading_preprocessing/01_IO.py
@@ -0,0 +1,40 @@
+# # Loading and Preprocessing
+#
+# ### Refresher
+#
+# Using PyTorch as a compute engine and mpi4py for communication, Heat implements a number of array operations and algorithms that are optimized for memory-distributed data volumes. This allows you to tackle datasets that are too large for single-node (or worse, single-GPU) processing.
+#
+# As opposed to task-parallel frameworks, Heat takes a data-parallel approach, meaning that each "worker" or MPI process performs the same tasks on different slices of the data. Many operations and algorithms are not embarrassingly parallel and involve data exchange between processes. Heat operations and algorithms are designed to minimize this communication overhead, and to make it transparent to the user.
+
+#
+# In other words:
+# - you don't have to worry about optimizing data chunk sizes;
+# - you don't have to make sure your research problem is embarrassingly parallel, or artificially make your dataset smaller so your RAM is sufficient;
+# - you do have to make sure that you have sufficient **overall** RAM to run your global task (e.g. number of nodes / GPUs).
+
+# The following shows some I/O and preprocessing examples. We'll use small datasets here, as each of us has access to only one node.
+
+# ### I/O
+#
+# Let's start with loading a data set. Heat supports parallel reading and writing for a number of formats, including HDF5, NetCDF, and, because we love scientists, CSV. Check out the `ht.load` and `ht.save` functions for more details. Here we will load data in [HDF5 format](https://en.wikipedia.org/wiki/Hierarchical_Data_Format).
+#
+# Now let's import `heat` and load a data set.
+
+import heat as ht
+
+# A small example dataset for small-scale tests
+iris = ht.load("~/heat/tutorials/02_loading_preprocessing/iris.csv", sep=";", split=0)
+print(iris)
+
+# We have loaded the entire dataset onto 4 MPI processes, each with 12 cores. We have created `iris` with `split=0`, so each process stores evenly-sized slices of the data along dimension 0.
+
+# similar for HDF5
+
+# first, we generate some data
+X = ht.random.randn(10000, 100, split=0)
+
+# ... and save it to file
+ht.save(X, "~/mydata.h5", "mydata", mode="a")
+
+# ... then we can load it again
+Y = ht.load_hdf5("~/mydata.h5", dataset="mydata", split=0)
+print(ht.allclose(X, Y))
diff --git a/tutorials/scripts/hpc/02_loading_preprocessing/02_preprocessing.py b/tutorials/scripts/hpc/02_loading_preprocessing/02_preprocessing.py
new file mode 100644
index 0000000000..d3195ab5c1
--- /dev/null
+++ b/tutorials/scripts/hpc/02_loading_preprocessing/02_preprocessing.py
@@ -0,0 +1,69 @@
+import heat as ht
+
+X = ht.random.randn(1000, 3, split=0, device="gpu")
+
+# We have created the data on 4 MPI processes, each with 12 cores. `X` has `split=0`, so each process stores evenly-sized slices of the data along dimension 0.
+
+# ### Data exploration
+#
+# Let's get an idea of the size of the data.
+
+
+# print global metadata once only
+if X.comm.rank == 0:
+    print(f"X is a {X.ndim}-dimensional array with shape {X.shape}")
+    print(f"X takes up {X.nbytes / 1e6} MB of memory.")
+
+# X is a matrix of shape *(datapoints, features)*.
+#
+# To get a first overview, we can print the data and determine its feature-wise mean, variance, min, max etc. These are reduction operations along the datapoints dimension, which is also the `split` dimension. You don't have to implement [`MPI.Allreduce`](https://mpitutorial.com/tutorials/mpi-reduce-and-allreduce/) operations yourself; communication is handled by Heat operations.
+
+
+features_mean = ht.mean(X, axis=0)
+features_var = ht.var(X, axis=0)
+features_max = ht.max(X, axis=0)
+features_min = ht.min(X, axis=0)
+# ht.percentile is buggy, see #1389, we'll leave it out for now
+# features_median = ht.percentile(X,50.,axis=0)
+
+
+if ht.MPI_WORLD.rank == 0:
+    print(f"Mean: {features_mean}")
+    print(f"Var: {features_var}")
+    print(f"Max: {features_max}")
+    print(f"Min: {features_min}")
+
+
+# Note that the `features_...` DNDarrays are no longer distributed, i.e. a copy of these results exists on each process, as the split dimension of the input data has been lost in the reduction operations.
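+
+# A quick sanity check (sketch): the reduced arrays carry `split=None`, i.e.
+# they are replicated on every process.
+
+if X.comm.rank == 0:
+    print(f"split axis of features_mean: {features_mean.split}")  # expected: None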
+
+# ### Preprocessing/scaling
+#
+# Next, we can preprocess the data, e.g., by standardizing and/or normalizing. Heat offers several preprocessing routines for doing so; the API is similar to [`sklearn.preprocessing`](https://scikit-learn.org/stable/modules/preprocessing.html), so adapting existing code shouldn't be too complicated.
+#
+# Again, please let us know if you're missing any features.
+
+
+# Standard Scaler
+scaler = ht.preprocessing.StandardScaler()
+X_standardized = scaler.fit_transform(X)
+standardized_mean = ht.mean(X_standardized, axis=0)
+standardized_var = ht.var(X_standardized, axis=0)
+
+if ht.MPI_WORLD.rank == 0:
+    print(f"Standard Scaler Mean: {standardized_mean}")
+    print(f"Standard Scaler Var: {standardized_var}")
+
+# Robust Scaler
+scaler = ht.preprocessing.RobustScaler()
+X_robust = scaler.fit_transform(X)
+robust_mean = ht.mean(X_robust, axis=0)
+robust_var = ht.var(X_robust, axis=0)
+
+if ht.MPI_WORLD.rank == 0:
+    print(f"Robust Scaler Mean: {robust_mean}")
+    print(f"Robust Scaler Var: {robust_var}")
+
+
+# Within Heat, you have several options to apply memory-distributed machine learning algorithms to your data.
+#
+# Is the algorithm you're looking for not yet implemented? [Let us know](https://github.com/helmholtz-analytics/heat/issues/new/choose)!
diff --git a/tutorials/scripts/hpc/02_loading_preprocessing/iris.csv b/tutorials/scripts/hpc/02_loading_preprocessing/iris.csv
new file mode 100644
index 0000000000..8bc57da193
--- /dev/null
+++ b/tutorials/scripts/hpc/02_loading_preprocessing/iris.csv
@@ -0,0 +1,150 @@
+5.1;3.5;1.4;0.2
+4.9;3.0;1.4;0.2
+4.7;3.2;1.3;0.2
+4.6;3.1;1.5;0.2
+5.0;3.6;1.4;0.2
+5.4;3.9;1.7;0.4
+4.6;3.4;1.4;0.3
+5.0;3.4;1.5;0.2
+4.4;2.9;1.4;0.2
+4.9;3.1;1.5;0.1
+5.4;3.7;1.5;0.2
+4.8;3.4;1.6;0.2
+4.8;3.0;1.4;0.1
+4.3;3.0;1.1;0.1
+5.8;4.0;1.2;0.2
+5.7;4.4;1.5;0.4
+5.4;3.9;1.3;0.4
+5.1;3.5;1.4;0.3
+5.7;3.8;1.7;0.3
+5.1;3.8;1.5;0.3
+5.4;3.4;1.7;0.2
+5.1;3.7;1.5;0.4
+4.6;3.6;1.0;0.2
+5.1;3.3;1.7;0.5
+4.8;3.4;1.9;0.2
+5.0;3.0;1.6;0.2
+5.0;3.4;1.6;0.4
+5.2;3.5;1.5;0.2
+5.2;3.4;1.4;0.2
+4.7;3.2;1.6;0.2
+4.8;3.1;1.6;0.2
+5.4;3.4;1.5;0.4
+5.2;4.1;1.5;0.1
+5.5;4.2;1.4;0.2
+4.9;3.1;1.5;0.1
+5.0;3.2;1.2;0.2
+5.5;3.5;1.3;0.2
+4.9;3.1;1.5;0.1
+4.4;3.0;1.3;0.2
+5.1;3.4;1.5;0.2
+5.0;3.5;1.3;0.3
+4.5;2.3;1.3;0.3
+4.4;3.2;1.3;0.2
+5.0;3.5;1.6;0.6
+5.1;3.8;1.9;0.4
+4.8;3.0;1.4;0.3
+5.1;3.8;1.6;0.2
+4.6;3.2;1.4;0.2
+5.3;3.7;1.5;0.2
+5.0;3.3;1.4;0.2
+7.0;3.2;4.7;1.4
+6.4;3.2;4.5;1.5
+6.9;3.1;4.9;1.5
+5.5;2.3;4.0;1.3
+6.5;2.8;4.6;1.5
+5.7;2.8;4.5;1.3
+6.3;3.3;4.7;1.6
+4.9;2.4;3.3;1.0
+6.6;2.9;4.6;1.3
+5.2;2.7;3.9;1.4
+5.0;2.0;3.5;1.0
+5.9;3.0;4.2;1.5
+6.0;2.2;4.0;1.0
+6.1;2.9;4.7;1.4
+5.6;2.9;3.6;1.3
+6.7;3.1;4.4;1.4
+5.6;3.0;4.5;1.5
+5.8;2.7;4.1;1.0
+6.2;2.2;4.5;1.5
+5.6;2.5;3.9;1.1
+5.9;3.2;4.8;1.8
+6.1;2.8;4.0;1.3
+6.3;2.5;4.9;1.5
+6.1;2.8;4.7;1.2
+6.4;2.9;4.3;1.3
+6.6;3.0;4.4;1.4
+6.8;2.8;4.8;1.4
+6.7;3.0;5.0;1.7
+6.0;2.9;4.5;1.5
+5.7;2.6;3.5;1.0
+5.5;2.4;3.8;1.1
+5.5;2.4;3.7;1.0
+5.8;2.7;3.9;1.2
+6.0;2.7;5.1;1.6
+5.4;3.0;4.5;1.5
+6.0;3.4;4.5;1.6
+6.7;3.1;4.7;1.5
+6.3;2.3;4.4;1.3
+5.6;3.0;4.1;1.3
+5.5;2.5;4.0;1.3
+5.5;2.6;4.4;1.2
+6.1;3.0;4.6;1.4
+5.8;2.6;4.0;1.2
+5.0;2.3;3.3;1.0
+5.6;2.7;4.2;1.3
+5.7;3.0;4.2;1.2
+5.7;2.9;4.2;1.3
+6.2;2.9;4.3;1.3
+5.1;2.5;3.0;1.1
+5.7;2.8;4.1;1.3
+6.3;3.3;6.0;2.5
+5.8;2.7;5.1;1.9
+7.1;3.0;5.9;2.1
+6.3;2.9;5.6;1.8
+6.5;3.0;5.8;2.2
+7.6;3.0;6.6;2.1
+4.9;2.5;4.5;1.7
+7.3;2.9;6.3;1.8
+6.7;2.5;5.8;1.8
+7.2;3.6;6.1;2.5
+6.5;3.2;5.1;2.0
+6.4;2.7;5.3;1.9
+6.8;3.0;5.5;2.1
+5.7;2.5;5.0;2.0
+5.8;2.8;5.1;2.4
+6.4;3.2;5.3;2.3
+6.5;3.0;5.5;1.8
+7.7;3.8;6.7;2.2
+7.7;2.6;6.9;2.3
+6.0;2.2;5.0;1.5
+6.9;3.2;5.7;2.3
+5.6;2.8;4.9;2.0
+7.7;2.8;6.7;2.0
+6.3;2.7;4.9;1.8
+6.7;3.3;5.7;2.1
+7.2;3.2;6.0;1.8
+6.2;2.8;4.8;1.8
+6.1;3.0;4.9;1.8
+6.4;2.8;5.6;2.1
+7.2;3.0;5.8;1.6
+7.4;2.8;6.1;1.9
+7.9;3.8;6.4;2.0
+6.4;2.8;5.6;2.2
+6.3;2.8;5.1;1.5
+6.1;2.6;5.6;1.4
+7.7;3.0;6.1;2.3
+6.3;3.4;5.6;2.4
+6.4;3.1;5.5;1.8
+6.0;3.0;4.8;1.8
+6.9;3.1;5.4;2.1
+6.7;3.1;5.6;2.4
+6.9;3.1;5.1;2.3
+5.8;2.7;5.1;1.9
+6.8;3.2;5.9;2.3
+6.7;3.3;5.7;2.5
+6.7;3.0;5.2;2.3
+6.3;2.5;5.0;1.9
+6.5;3.0;5.2;2.0
+6.2;3.4;5.4;2.3
+5.9;3.0;5.1;1.8
diff --git a/tutorials/scripts/hpc/03_matrix_factorizations/matrix_factorizations.py b/tutorials/scripts/hpc/03_matrix_factorizations/matrix_factorizations.py
new file mode 100644
index 0000000000..1543c81efe
--- /dev/null
+++ b/tutorials/scripts/hpc/03_matrix_factorizations/matrix_factorizations.py
@@ -0,0 +1,99 @@
+# # Matrix factorizations
+#
+# ### Refresher
+#
+# Using PyTorch as a compute engine and mpi4py for communication, Heat implements a number of array operations and algorithms that are optimized for memory-distributed data volumes. This allows you to tackle datasets that are too large for single-node (or worse, single-GPU) processing.
+#
+# As opposed to task-parallel frameworks, Heat takes a data-parallel approach, meaning that each "worker" or MPI process performs the same tasks on different slices of the data. Many operations and algorithms are not embarrassingly parallel and involve data exchange between processes. Heat operations and algorithms are designed to minimize this communication overhead, and to make it transparent to the user.
+#
+# In other words:
+# - you don't have to worry about optimizing data chunk sizes;
+# - you don't have to make sure your research problem is embarrassingly parallel, or artificially make your dataset smaller so your RAM is sufficient;
+# - you do have to make sure that you have sufficient **overall** RAM to run your global task (e.g. number of nodes / GPUs).
+
+# In the following, we will demonstrate the usage of Heat's truncated SVD algorithm.
+
+# ### SVD and its truncated counterparts in a nutshell
+#
+# Let $X \in \mathbb{R}^{m \times n}$ be a matrix, e.g., given by a data set consisting of $m$ data points $\in \mathbb{R}^n$ stacked together. The so-called **singular value decomposition (SVD)** of $X$ is given by
+#
+# $$
+# X = U \Sigma V^T
+# $$
+#
+# where $U \in \mathbb{R}^{m \times r_X}$ and $V \in \mathbb{R}^{n \times r_X}$ have orthonormal columns, $\Sigma = \text{diag}(\sigma_1,...,\sigma_{r_X}) \in \mathbb{R}^{r_X \times r_X}$ is a diagonal matrix containing the so-called singular values $\sigma_1 \geq \sigma_2 \geq ... \geq \sigma_{r_X} > 0$, and $r_X \leq \min(m,n)$ denotes the rank of $X$ (i.e. the dimension of the subspace of $\mathbb{R}^m$ spanned by the columns of $X$). Since $\Sigma = U^T X V$ is diagonal, one can imagine this decomposition as finding orthogonal coordinate transformations under which the action of $X$ reduces to a diagonal scaling.

+# ### SVD in data science
+#
+# In data science, SVD is more often known as **principal component analysis (PCA)**, the columns of $U$ being called the principal components of $X$.
In fact, in many applications **truncated SVD/PCA** suffices: to reduce $X$ to the "essential" information, one chooses a truncation rank $0 < r \leq r_X$ and considers the truncated SVD/PCA given by
+#
+# $$
+# X \approx X_r := U_{[:,:r]} \Sigma_{[:r,:r]} V_{[:,:r]}^T
+# $$
+#
+# where we have used `numpy`-like notation for selecting only the first $r$ columns of $U$ and $V$, respectively. The rationale behind this is that if the first $r$ singular values of $X$ are much larger than the remaining ones, $X_r$ will still contain all "essential" information contained in $X$; in mathematical terms:
+#
+# $$
+# \lVert X_r - X \rVert_{F}^2 = \sum_{i=r+1}^{r_X} \sigma_i^2,
+# $$
+#
+# where $\lVert \cdot \rVert_F$ denotes the Frobenius norm. Thus, truncated SVD/PCA may be used, e.g.,
+# * to filter away non-essential information in order to get a "feeling" for the main characteristics of your data set,
+# * to detect linear (or "almost" linear) dependencies in your data,
+# * to generate features for further processing of your data.
+#
+# Moreover, there are plenty of more advanced data analytics and data-based simulation techniques, e.g., Proper Orthogonal Decomposition (POD) or Dynamic Mode Decomposition (DMD), that are based on SVD/PCA.
+
+# ### Truncated SVD in Heat
+#
+# In Heat we have currently implemented an algorithm for computing an approximate truncated SVD, where truncation takes place either w.r.t. a fixed truncation rank (`heat.linalg.hsvd_rank`) or w.r.t. a desired accuracy (`heat.linalg.hsvd_rtol`). In the latter case, the following bound on the "reconstruction error" is guaranteed:
+#
+# $$
+# \frac{\lVert X - U U^T X \rVert_F}{\lVert X \rVert_F} \overset{!}{\leq} \text{rtol},
+# $$
+#
+# where $U$ denotes the approximate left-singular vectors of $X$ computed by `heat.linalg.hsvd_rtol`.
+#
+
+# To demonstrate the usage of Heat's truncated SVD algorithm, we will load the data set from the last example and then compute its truncated SVD. As usual, first we need to gain access to the MPI environment.
+
+
+import heat as ht
+
+X = ht.load_hdf5("~/mydata.h5", dataset="mydata", split=0).T
+
+
+# Note that due to the transpose, `X` is now distributed along the columns; this is required by the hSVD algorithm.
+
+# Let's first compute the truncated SVD by setting the relative tolerance.
+
+
+# compute truncated SVD w.r.t. relative tolerance
+svd_with_reltol = ht.linalg.hsvd_rtol(X, rtol=1.0e-2, compute_sv=True, silent=False)
+print("relative residual:", svd_with_reltol[3], "rank: ", svd_with_reltol[0].shape[1])
+
+
+# Alternatively, you can compute a truncated SVD with a fixed truncation rank:
+
+# compute truncated SVD w.r.t. a fixed truncation rank
+svd_with_rank = ht.linalg.hsvd_rank(X, maxrank=3, compute_sv=True, silent=False)
+print("relative residual:", svd_with_rank[3], "rank: ", svd_with_rank[0].shape[1])
+
+# Once we have computed the truncated SVD, we can use it to approximate the original data matrix `X` by the truncated matrix `X_r`.
+#
+# Check out https://helmholtz-analytics.github.io/heat/2023/06/16/new-feature-hsvd.html to see how Heat's truncated SVD algorithm scales with the number of MPI processes and size of the dataset.
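+
+# As a sketch of that reconstruction (assuming, as the indexing above suggests,
+# that with `compute_sv=True` the returned tuple holds the factors U, sigma, V
+# in positions 0, 1 and 2), we can form X_r and check its relative error:
+
+U, sigma, V = svd_with_rank[0], svd_with_rank[1], svd_with_rank[2]
+X_r = U @ ht.diag(sigma) @ V.T
+rel_err = ht.linalg.norm(X_r - X) / ht.linalg.norm(X)
+print("relative reconstruction error of the rank-3 approximation:", rel_err)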
+
+# ### Other factorizations
+#
+# Other common factorization algorithms are supported in Heat, such as:
+# - QR decomposition (`heat.linalg.qr`)
+# - Lanczos algorithm for computing the largest eigenvalues and corresponding eigenvectors (`heat.linalg.lanczos`)
+#
+# Check out our [`linalg` PRs](https://github.com/helmholtz-analytics/heat/pulls?q=is%3Aopen+is%3Apr+label%3Alinalg) to see what's in progress.
+#
+
+# **References for hierarchical SVD**
+#
+# 1. Iwen, Ong. *A distributed and incremental SVD algorithm for agglomerative data analysis on large networks.* SIAM J. Matrix Anal. Appl., **37** (4), 2016.
+# 2. Himpe, Leibner, Rave. *Hierarchical approximate proper orthogonal decomposition.* SIAM J. Sci. Comput., **40** (5), 2018.
+# 3. Halko, Martinsson, Tropp. *Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions.* SIAM Rev., **53** (2), 2011.
diff --git a/tutorials/scripts/hpc/04_clustering/clustering.py b/tutorials/scripts/hpc/04_clustering/clustering.py
new file mode 100644
index 0000000000..85c6e2c5e3
--- /dev/null
+++ b/tutorials/scripts/hpc/04_clustering/clustering.py
@@ -0,0 +1,68 @@
+# Cluster Analysis
+# ================
+#
+# This tutorial is a script version of our static [clustering tutorial on ReadTheDocs](https://heat.readthedocs.io/en/stable/tutorial_clustering.html).
+#
+# We will demonstrate memory-distributed analysis with k-means and k-medians from the ``heat.cluster`` module. As usual, we will run the analysis on a small dataset for demonstration; the script needs to be launched with several MPI processes to distribute the computation.
+#
+# At the end, we will compare Heat's results with scikit-learn's.
+
+
+import heat as ht
+
+# The Iris Dataset
+# ------------------------------
+# The _iris_ dataset is a well-known example for clustering analysis. It contains 4 measured features for samples from
+# three different types of iris flowers. A subset of 150 samples is included in formats h5, csv and netcdf in the [Heat repository under 'heat/heat/datasets'](https://github.com/helmholtz-analytics/heat/tree/main/heat/datasets), and can be loaded in a distributed manner with Heat's parallel dataloader.
+#
+# **NOTE: you might have to change the path to the dataset in the following line.**
+
+iris = ht.load("~/heat/tutorials/hpc/02_loading_preprocessing/iris.csv", sep=";", split=0)
+
+
+# Feel free to try out the other [loading options](https://heat.readthedocs.io/en/stable/autoapi/heat/core/io/index.html#heat.core.io.load) as well.
+#
+# Fitting the dataset with `kmeans`:
+
+k = 3
+kmeans = ht.cluster.KMeans(n_clusters=k, init="kmeans++")
+kmeans.fit(iris)
+
+# Let's see what the results are. In theory, there are 50 samples of each of the 3 iris types: setosa, versicolor and virginica. We will count how many samples were assigned to each cluster.
+
+labels = kmeans.predict(iris).squeeze()
+
+# Select points assigned to clusters c1, c2 and c3
+c1 = iris[ht.where(labels == 0), :]
+c2 = iris[ht.where(labels == 1), :]
+c3 = iris[ht.where(labels == 2), :]
+# After slicing, the arrays are not distributed equally among the processes anymore; we need to balance
+# TODO is balancing really necessary?
+c1.balance_()
+c2.balance_()
+c3.balance_()
+
+print(
+    f"Number of points assigned to c1: {c1.shape[0]} \n"
+    f"Number of points assigned to c2: {c2.shape[0]} \n"
+    f"Number of points assigned to c3: {c3.shape[0]}"
+)
+
+
+# compare Heat results with sklearn
+from sklearn.cluster import KMeans
+import sklearn.datasets
+
+k = 3
+iris_sk = sklearn.datasets.load_iris().data
+kmeans_sk = KMeans(n_clusters=k, init="k-means++").fit(iris_sk)
+labels_sk = kmeans_sk.predict(iris_sk)
+
+c1_sk = iris_sk[labels_sk == 0, :]
+c2_sk = iris_sk[labels_sk == 1, :]
+c3_sk = iris_sk[labels_sk == 2, :]
+print(
+    f"Number of points assigned to c1: {c1_sk.shape[0]} \n"
+    f"Number of points assigned to c2: {c2_sk.shape[0]} \n"
+    f"Number of points assigned to c3: {c3_sk.shape[0]}"
+)
diff --git a/tutorials/scripts/hpc/05_your_turn/now_its_your_turn.py b/tutorials/scripts/hpc/05_your_turn/now_its_your_turn.py
new file mode 100644
index 0000000000..42e215a52a
--- /dev/null
+++ b/tutorials/scripts/hpc/05_your_turn/now_its_your_turn.py
@@ -0,0 +1,44 @@
+import heat as ht
+import numpy as np
+import h5py
+
+# Now it's your turn! Download one of the following three data sets and play around with it.
+# Possible ideas:
+# get familiar with the data: shape, min, max, avg, std (possibly along axes?)
+# try SVD and/or QR to detect linear dependence
+# K-Means Clustering (Asteroids, CERN?)
+# Lasso (CERN?)
+# n-dim FFT (CAMELS?)...
+
+
+# "Asteroids": Asteroids of the Solar System
+# Download the data set of the asteroids from the JPL Small Body Database from https://ssd.jpl.nasa.gov/tools/sbdb_lookup.html#/
+# and load the resulting csv file into Heat.
+
+
+# ... to be completed ...
+
+# "CAMELS": 1000 simulated universes on 128 x 128 x 128 grids
+# Take a bunch of 1000 simulated universes from the CAMELS data set (8GB):
+# ```
+# wget https://users.flatironinstitute.org/~fvillaescusa/priv/DEPnzxoWlaTQ6CjrXqsm0vYi8L7Jy/CMD/3D_grids/data/Nbody/Grids_Mtot_Nbody_Astrid_LH_128_z=0.0.npy -O ~/Grids_Mtot_Nbody_Astrid_LH_128_z=0.0.npy
+# ```
+# load them in NumPy, convert to PyTorch and Heat...
+
+import os
+
+X_np = np.load(os.path.expanduser("~/Grids_Mtot_Nbody_Astrid_LH_128_z=0.0.npy"))
+
+# ... to be completed ...
+
+# "CERN": A particle physics data set from CERN
+# Take a small part of the ATLAS Top Tagging Data Set from CERN (7.6GB, actually the "test" part; the "train" part is much larger...)
+# ```
+# wget https://opendata.cern.ch/record/15013/files/test.h5 -O ~/test.h5
+# ```
+# and load it directly into Heat (watch out: the h5 file contains different datasets that need to be stacked...)
+
+filename = os.path.expanduser("~/test.h5")
+with h5py.File(filename, "r") as f:
+    features = f.keys()
+    arrays = [ht.load_hdf5(filename, feature, split=0) for feature in features]
+
+# ... to be completed ...
diff --git a/tutorials/scripts/hpc/README.md b/tutorials/scripts/hpc/README.md
new file mode 100644
index 0000000000..53304b16a1
--- /dev/null
+++ b/tutorials/scripts/hpc/README.md
@@ -0,0 +1,17 @@
+There are two example scripts in this directory, `slurm_script_cpu.sh` and `slurm_script_gpu.sh`, that demonstrate how to run a Heat application on an HPC system with SLURM as the resource manager.
+
+1. `slurm_script_cpu.sh` is an example script that runs a Heat application on a CPU node. You must specify the name of the respective partition of your cluster. Moreover, the
+   number of CPU cores available on a node of your system must be greater than or equal to the product of the `tasks-per-node` and `cpus-per-task` arguments (= 8 x 16 = 128 in the example).
+
+2.
`slurm_script_gpu.sh` is an example script that runs a Heat application on a GPU node. You must specify the name of the respective partition of your cluster. Moreover, the
+   number of GPU devices available on a node of your system must be greater than or equal to the number of GPUs requested in the script (= 4 in the example).
+
+## Remarks
+
+* Please have a look at the documentation of your HPC system for its detailed configuration and properties; you may have to adjust the script to your system.
+* You need to load the required modules (e.g., for MPI, CUDA etc.) from the module system of your HPC system before running the script. Moreover, you need to install Heat in a virtual environment (and activate it). Alternatively, you may use Spack (if available on your system) for installing Heat and its dependencies.
+* Depending on the configuration of SLURM and MPI on your system, you might need to replace `srun python ...` by
+  ```
+  mpirun -n $SLURM_NTASKS --bind-to hwthread --map-by socket:PE=${SLURM_CPUS_PER_TASK} python ...
+  ```
+  or similar.
diff --git a/tutorials/scripts/hpc/slurm_script_cpu.sh b/tutorials/scripts/hpc/slurm_script_cpu.sh
new file mode 100644
index 0000000000..6e534d3309
--- /dev/null
+++ b/tutorials/scripts/hpc/slurm_script_cpu.sh
@@ -0,0 +1,12 @@
+#!/bin/bash
+
+#SBATCH --partition=
+#SBATCH --nodes=1
+#SBATCH --tasks-per-node=8
+#SBATCH --cpus-per-task=16
+#SBATCH --time="00:01:00"
+
+export MKL_NUM_THREADS=$SLURM_CPUS_PER_TASK
+export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
+
+srun python ~/heat/tutorials/hpc/01_basics/01_basics_dndarrays.py
diff --git a/tutorials/scripts/hpc/slurm_script_gpu.sh b/tutorials/scripts/hpc/slurm_script_gpu.sh
new file mode 100644
index 0000000000..9ffdc619f6
--- /dev/null
+++ b/tutorials/scripts/hpc/slurm_script_gpu.sh
@@ -0,0 +1,13 @@
+#!/bin/bash
+
+#SBATCH --partition=
+#SBATCH --nodes=1
+#SBATCH --tasks-per-node=4
+#SBATCH --cpus-per-task=2
+#SBATCH --gres=gpu:4
+#SBATCH --time="00:01:00"
+
+export MKL_NUM_THREADS=$SLURM_CPUS_PER_TASK
+export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
+
+srun python ~/heat/tutorials/hpc/01_basics/01_basics_dndarrays.py

From 3c2b559b70f4950275937a898d560c12065d48ea Mon Sep 17 00:00:00 2001
From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com>
Date: Tue, 30 Sep 2025 13:18:36 +0200
Subject: [PATCH 02/15] Sturdier MPI Check (#1926) (#1979)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* sturdier mpi check

* GPU_AWARE as relevant global check, checks can depend on version now

* gpu aware compatiblity changes

* wip: communication layer fixes for gpu_aware_mpi

* _moveToCompDevice function

* fix: disabled OpenMPI cuda awareness due to unreliable tests and benchmarks

---------

(cherry picked from commit 363766239475f0e074eb44477cf0a068a86f13e2)

Co-authored-by: Juan Pedro Gutiérrez Hermosillo Muriedas
Co-authored-by: Fabian Hoppe <112093564+mrfh92@users.noreply.github.com>
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
---
 .perun.ini                            |   2 +-
 .pre-commit-config.yaml               |   8 +-
 benchmarks/cb/heat_signal.py          |   6 +-
 benchmarks/cb/main.py                 |   6 ++
 heat/cli.py                           |   6 +-
 heat/core/_config.py                  | 113 ++++++++++++++------
 heat/core/communication.py            | 129 ++++++++++++++------------
 heat/core/linalg/tests/test_basics.py |   2 +-
 heat/core/linalg/tests/test_qr.py     |   4 +-
 heat/core/tests/test_communication.py |  10 +-
 10 files changed, 179 insertions(+), 107 deletions(-)

diff --git a/.perun.ini b/.perun.ini
index
b594eac4df..99863465da 100644 --- a/.perun.ini +++ b/.perun.ini @@ -19,7 +19,7 @@ format = bench data_out = ./bench_data [benchmarking] -rounds = 10 +rounds = 5 warmup_rounds = 1 metrics = runtime,energy region_metrics = runtime,power diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 4b2b239cc2..08262173e8 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -39,10 +39,12 @@ repos: - id: "validate-cff" args: - "--verbose" - - repo: https://github.com/gitleaks/gitleaks - rev: v8.28.0 + - repo: https://github.com/thoughtworks/talisman + rev: 'v1.37.0' # Update me! hooks: - - id: gitleaks + # both pre-commit and pre-push supported + # - id: talisman-push + - id: talisman-commit - repo: https://github.com/shellcheck-py/shellcheck-py rev: v0.11.0.1 hooks: diff --git a/benchmarks/cb/heat_signal.py b/benchmarks/cb/heat_signal.py index 9ecf26a443..57f986f066 100644 --- a/benchmarks/cb/heat_signal.py +++ b/benchmarks/cb/heat_signal.py @@ -43,7 +43,7 @@ def convolution_batch_processing_stride(signal, kernel, stride): def run_signal_benchmarks(): - n_s = 1000000000 + n_s = 2000000 n_k = 10003 stride = 3 @@ -75,8 +75,8 @@ def run_signal_benchmarks(): del signal, kernel # batch processing - n_s = 90000 - n_b = 90000 + n_s = 50000 + n_b = 1000 n_k = 503 signal = ht.random.random((n_b, n_s), split=0) kernel = ht.random.random_integer(0, 1, (n_b, n_k), split=0) diff --git a/benchmarks/cb/main.py b/benchmarks/cb/main.py index 2dd4680ae0..9b958920e6 100644 --- a/benchmarks/cb/main.py +++ b/benchmarks/cb/main.py @@ -18,8 +18,14 @@ from heat_signal import run_signal_benchmarks run_linalg_benchmarks() +print("Linalg finished") run_cluster_benchmarks() +print("Cluster finished") run_manipulation_benchmarks() +print("Manipulation finished") run_preprocessing_benchmarks() +print("Preprocessing finished") run_decomposition_benchmarks() +print("Decomposition finished") run_signal_benchmarks() +print("Signal finished") diff --git a/heat/cli.py b/heat/cli.py index 29a91fdbc7..defe338102 100644 --- a/heat/cli.py +++ b/heat/cli.py @@ -8,7 +8,7 @@ import argparse from heat.core.version import __version__ as ht_version -from heat.core.communication import CUDA_AWARE_MPI +from heat.core._config import CUDA_AWARE_MPI, ROCM_AWARE_MPI, GPU_AWARE_MPI def cli() -> None: @@ -49,6 +49,8 @@ def plaform_info(): print(f" Device name: {torch.cuda.get_device_name(def_device)}") print(f" Device name: {torch.cuda.get_device_properties(def_device)}") print( - f" Device memory: {torch.cuda.get_device_properties(def_device).total_memory / 1024**3} GiB" + f" Device memory: {torch.cuda.get_device_properties(def_device).total_memory / 1024**3} GiB" ) print(f" CUDA Aware MPI: {CUDA_AWARE_MPI}") + print(f" ROCM Aware MPI: {ROCM_AWARE_MPI}") + print(f" GPU Aware MPI: {GPU_AWARE_MPI}") diff --git a/heat/core/_config.py b/heat/core/_config.py index da327835a1..48d0a3e22b 100644 --- a/heat/core/_config.py +++ b/heat/core/_config.py @@ -9,41 +9,94 @@ import os import warnings import re +import dataclasses +from enum import Enum -PLATFORM = platform.platform() -MPI_LIBRARY_VERSION = mpi4py.MPI.Get_library_version() -TORCH_VERSION = torch.__version__ -TORCH_CUDA_IS_AVAILABLE = torch.cuda.is_available() -CUDA_IS_ACTUALLY_ROCM = "rocm" in TORCH_VERSION -CUDA_AWARE_MPI = False -ROCM_AWARE_MPI = False +class MPILibrary(Enum): + OpenMPI = "ompi" + IntelMPI = "impi" + MVAPICH = "mvapich" + MPICH = "mpich" + CrayMPI = "craympi" + ParastationMPI = "psmpi" + + +@dataclasses.dataclass +class MPILibraryInfo: + name: 
MPILibrary + version: str + -# check whether there is CUDA- or ROCm-aware OpenMPI -try: - buffer = subprocess.check_output(["ompi_info", "--parsable", "--all"]) - CUDA_AWARE_MPI = b"mpi_built_with_cuda_support:value:true" in buffer - pattern = re.compile(r"^MPI extensions:.*", re.MULTILINE) - match = pattern.search(buffer) - ROCM_AWARE_MPI = "rocm" in match.group(0) -except: # noqa E722 - pass +def _get_mpi_library() -> MPILibraryInfo: + library = mpi4py.MPI.Get_library_version().split() + match library: + case ["Open", "MPI", *_]: + return MPILibraryInfo(MPILibrary.OpenMPI, library[2]) + case ["Intel(R)", "MPI", *_]: + return MPILibraryInfo(MPILibrary.IntelMPI, library[3]) + case ["MPICH", "Version:", *_]: + return MPILibraryInfo(MPILibrary.MPICH, library[2]) + ### Missing libraries + case _: + print("Did not find a matching library") -# do the same for MVAPICH -CUDA_AWARE_MPI = CUDA_AWARE_MPI or os.environ.get("MV2_USE_CUDA") == "1" -CUDA_AWARE_MPI = ROCM_AWARE_MPI or os.environ.get("MV2_USE_ROCM") == "1" -# do the same for MPICH, TODO: outdated? -CUDA_AWARE_MPI = CUDA_AWARE_MPI or os.environ.get("MPIR_CVAR_ENABLE_HCOLL") == "1" +def _check_gpu_aware_mpi(library: MPILibraryInfo) -> tuple[bool, bool]: + match library.name: + case MPILibrary.OpenMPI: + try: + parsable_ompi_info = subprocess.check_output( + ["ompi_info", "--parsable", "--all"] + ).decode("utf-8") + ompi_info = subprocess.check_output(["ompi_info"]).decode("utf-8") -# Cray MPICH -CUDA_AWARE_MPI = os.environ.get("MPICH_GPU_SUPPORT_ENABLED") == "1" -ROCM_AWARE_MPI = os.environ.get("MPICH_GPU_SUPPORT_ENABLED") == "1" + # Check for CUDA support flag + cuda_support_flag = "mpi_built_with_cuda_support:value:true" in parsable_ompi_info -# do the same for ParaStationMPI, seems to have CUDA-support only -CUDA_AWARE_MPI = CUDA_AWARE_MPI or os.environ.get("PSP_CUDA") == "1" + # Check for extensions + match = re.search(r"MPI extensions: (.*)", ompi_info) + extensions = [ext.strip() for ext in match.group(0).split(":")[1].split(",")] + cuda = cuda_support_flag and "cuda" in extensions + if library.version.startswith("v4."): + rocm = cuda + elif library.version.startswith("v5."): + rocm = "rocm" in extensions or "hip" in extensions + # Seems to be broken, disabled by default for now + # return cuda, rocm + return False, False + except Exception as e: # noqa E722 + return False, False + case MPILibrary.IntelMPI: + return False, False + case MPILibrary.MVAPICH: + cuda = os.environ.get("MV2_USE_CUDA") == "1" + rocm = os.environ.get("MV2_USE_ROCM") == "1" + return cuda, rocm + case MPILibrary.MPICH: + cuda = os.environ.get("MPIR_CVAR_ENABLE_HCOLL") == "1" + rocm = False + return cuda, rocm + case MPILibrary.CrayMPI: + cuda = os.environ.get("MPICH_GPU_SUPPORT_ENABLED") == "1" + rocm = os.environ.get("MPICH_GPU_SUPPORT_ENABLED") == "1" + return cuda, rocm + case MPILibrary.ParastationMPI: + cuda = os.environ.get("PSP_CUDA") == "1" + rocm = False + return cuda, rocm + case _: + return False, False -# Intel-MPI? 
+ +PLATFORM = platform.platform() +TORCH_VERSION = torch.__version__ +TORCH_CUDA_IS_AVAILABLE = torch.cuda.is_available() +CUDA_IS_ACTUALLY_ROCM = "rocm" in TORCH_VERSION + +mpi_library = _get_mpi_library() +CUDA_AWARE_MPI, ROCM_AWARE_MPI = _check_gpu_aware_mpi(mpi_library) +GPU_AWARE_MPI = False # warn the user if CUDA/ROCm-aware MPI is not available, but PyTorch can use GPUs with CUDA/ROCm if TORCH_CUDA_IS_AVAILABLE: @@ -52,11 +105,11 @@ f"Heat has CUDA GPU-support (PyTorch version {TORCH_VERSION} and `torch.cuda.is_available() = True`), but CUDA-awareness of MPI could not be detected. This may lead to performance degradation as direct MPI-communication between GPUs is not possible.", UserWarning, ) + elif CUDA_IS_ACTUALLY_ROCM and not ROCM_AWARE_MPI: warnings.warn( f"Heat has ROCm GPU-support (PyTorch version {TORCH_VERSION} and `torch.cuda.is_available() = True`), but ROCm-awareness of MPI could not be detected. This may lead to performance degradation as direct MPI-communication between GPUs is not possible.", UserWarning, ) - GPU_AWARE_MPI = True -else: - GPU_AWARE_MPI = False + else: + GPU_AWARE_MPI = True diff --git a/heat/core/communication.py b/heat/core/communication.py index eb3443bc10..3387ad7326 100644 --- a/heat/core/communication.py +++ b/heat/core/communication.py @@ -12,9 +12,10 @@ from mpi4py import MPI from typing import Any, Callable, Optional, List, Tuple, Union + from .stride_tricks import sanitize_axis -from ._config import CUDA_AWARE_MPI +from ._config import GPU_AWARE_MPI class MPIRequest: @@ -57,7 +58,7 @@ def Wait(self, status: MPI.Status = None): if self.tensor is not None and isinstance(self.tensor, torch.Tensor): if self.permutation is not None: self.recvbuf = self.recvbuf.permute(self.permutation) - if self.tensor is not None and self.tensor.is_cuda and not CUDA_AWARE_MPI: + if self.tensor is not None and self.tensor.is_cuda and not GPU_AWARE_MPI: self.tensor.copy_(self.recvbuf) def __getattr__(self, name: str) -> Callable: @@ -389,6 +390,17 @@ def as_buffer( obj.squeeze_(-1) return [mpi_mem, elements, mpi_type] + def _moveToCompDevice(self, x: torch.Tensor, func: Callable | None) -> torch.Tensor: + """Moves the torch tensor to the relevant device, in case the function is not compatible with the MPI+GPU library.""" + if x.is_cuda: + if GPU_AWARE_MPI: + torch.cuda.synchronize(x.device) + return x + else: + return x.cpu() + else: + return x + def alltoall_sendbuffer( self, obj: torch.Tensor ) -> List[Union[MPI.memory, Tuple[int, int], MPI.Datatype]]: @@ -534,7 +546,7 @@ def Irecv( if not isinstance(buf, torch.Tensor): return MPIRequest(self.handle.Irecv(buf, source, tag)) - rbuf = buf if CUDA_AWARE_MPI else buf.cpu() + rbuf = self._moveToCompDevice(buf, self.handle.Irecv) return MPIRequest(self.handle.Irecv(self.as_buffer(rbuf), source, tag), None, rbuf, buf) Irecv.__doc__ = MPI.Comm.Irecv.__doc__ @@ -565,10 +577,10 @@ def Recv( if not isinstance(buf, torch.Tensor): return self.handle.Recv(buf, source, tag, status) - rbuf = buf if CUDA_AWARE_MPI else buf.cpu() + rbuf = self._moveToCompDevice(buf, self.handle.Recv) ret = self.handle.Recv(self.as_buffer(rbuf), source, tag, status) - if isinstance(buf, torch.Tensor) and buf.is_cuda and not CUDA_AWARE_MPI: + if isinstance(buf, torch.Tensor) and buf.is_cuda and not GPU_AWARE_MPI: buf.copy_(rbuf) return ret @@ -597,7 +609,8 @@ def __send_like( return func(buf, dest, tag), None # in case of GPUs, the memory has to be copied to host memory if CUDA-aware MPI is not supported - sbuf = buf if CUDA_AWARE_MPI else 
buf.cpu() + sbuf = self._moveToCompDevice(buf, func) + return func(self.as_buffer(sbuf), dest, tag), sbuf def Bsend(self, buf: Union[DNDarray, torch.Tensor, Any], dest: int, tag: int = 0): @@ -765,7 +778,7 @@ def __broadcast_like( if not isinstance(buf, torch.Tensor): return func(buf, root), None, None, None - srbuf = buf if CUDA_AWARE_MPI else buf.cpu() + srbuf = self._moveToCompDevice(buf, func) return func(self.as_buffer(srbuf), root), srbuf, srbuf, buf @@ -781,7 +794,7 @@ def Bcast(self, buf: Union[DNDarray, torch.Tensor, Any], root: int = 0) -> None: Rank of the root process, that broadcasts the message """ ret, sbuf, rbuf, buf = self.__broadcast_like(self.handle.Bcast, buf, root) - if buf is not None and isinstance(buf, torch.Tensor) and buf.is_cuda and not CUDA_AWARE_MPI: + if buf is not None and isinstance(buf, torch.Tensor) and buf.is_cuda and not GPU_AWARE_MPI: buf.copy_(rbuf) return ret @@ -941,7 +954,7 @@ def __reduce_like( if isinstance(recvbuf, torch.Tensor): # Datatype and count shall be derived from the recv buffer, and applied to both, as they should match after the last code block buf = recvbuf - rbuf = recvbuf if CUDA_AWARE_MPI else recvbuf.cpu() + rbuf = self._moveToCompDevice(buf, func) recvbuf: Tuple[MPI.memory, int, MPI.Datatype] = self.as_buffer(rbuf, is_contiguous=True) if not recvbuf[2].is_predefined: # If using a derived datatype, we need to define the reduce operation to be able to handle the it. @@ -949,7 +962,7 @@ def __reduce_like( op = derived_op if isinstance(sendbuf, torch.Tensor): - sbuf = sendbuf if CUDA_AWARE_MPI else sendbuf.cpu() + sbuf = self._moveToCompDevice(sendbuf, func) sendbuf = (self.as_mpi_memory(sbuf), recvbuf[1], recvbuf[2]) # perform the actual reduction operation @@ -974,7 +987,7 @@ def Allreduce( The operation to perform upon reduction """ ret, sbuf, rbuf, buf = self.__reduce_like(self.handle.Allreduce, sendbuf, recvbuf, op) - if buf is not None and isinstance(buf, torch.Tensor) and buf.is_cuda and not CUDA_AWARE_MPI: + if buf is not None and isinstance(buf, torch.Tensor) and buf.is_cuda and not GPU_AWARE_MPI: buf.copy_(rbuf) return ret @@ -999,7 +1012,7 @@ def Exscan( The operation to perform upon reduction """ ret, sbuf, rbuf, buf = self.__reduce_like(self.handle.Exscan, sendbuf, recvbuf, op) - if buf is not None and isinstance(buf, torch.Tensor) and buf.is_cuda and not CUDA_AWARE_MPI: + if buf is not None and isinstance(buf, torch.Tensor) and buf.is_cuda and not GPU_AWARE_MPI: buf.copy_(rbuf) return ret @@ -1118,7 +1131,7 @@ def Reduce( Rank of the root process """ ret, sbuf, rbuf, buf = self.__reduce_like(self.handle.Reduce, sendbuf, recvbuf, op, root) - if buf is not None and isinstance(buf, torch.Tensor) and buf.is_cuda and not CUDA_AWARE_MPI: + if buf is not None and isinstance(buf, torch.Tensor) and buf.is_cuda and not GPU_AWARE_MPI: buf.copy_(rbuf) return ret @@ -1143,7 +1156,7 @@ def Scan( The operation to perform upon reduction """ ret, sbuf, rbuf, buf = self.__reduce_like(self.handle.Scan, sendbuf, recvbuf, op) - if buf is not None and isinstance(buf, torch.Tensor) and buf.is_cuda and not CUDA_AWARE_MPI: + if buf is not None and isinstance(buf, torch.Tensor) and buf.is_cuda and not GPU_AWARE_MPI: buf.copy_(rbuf) return ret @@ -1213,23 +1226,24 @@ def __allgather_like( else: recv_axis_permutation = None - sbuf = sendbuf if CUDA_AWARE_MPI or not isinstance(sendbuf, torch.Tensor) else sendbuf.cpu() - rbuf = recvbuf if CUDA_AWARE_MPI or not isinstance(recvbuf, torch.Tensor) else recvbuf.cpu() - - # prepare buffer objects - if 
sendbuf is MPI.IN_PLACE or not isinstance(sendbuf, torch.Tensor): - mpi_sendbuf = sbuf - else: + if isinstance(sendbuf, torch.Tensor): + sbuf = self._moveToCompDevice(sendbuf, func) mpi_sendbuf = self.as_buffer(sbuf, send_counts, send_displs, sbuf_is_contiguous) if send_counts is not None: mpi_sendbuf[1] = mpi_sendbuf[1][0][self.rank] - - if recvbuf is MPI.IN_PLACE or not isinstance(recvbuf, torch.Tensor): - mpi_recvbuf = rbuf else: + sbuf = sendbuf + mpi_sendbuf = sendbuf + + if isinstance(recvbuf, torch.Tensor): + rbuf = self._moveToCompDevice(recvbuf, func) mpi_recvbuf = self.as_buffer(rbuf, recv_counts, recv_displs, rbuf_is_contiguous) if recv_counts is None: mpi_recvbuf[1] //= self.size + else: + rbuf = recvbuf + mpi_recvbuf = recvbuf + # perform the scatter operation exit_code = func(mpi_sendbuf, mpi_recvbuf, **kwargs) return exit_code, sbuf, rbuf, original_recvbuf, recv_axis_permutation @@ -1257,7 +1271,7 @@ def Allgather( ) if buf is not None and isinstance(buf, torch.Tensor) and permutation is not None: rbuf = rbuf.permute(permutation) - if isinstance(buf, torch.Tensor) and buf.is_cuda and not CUDA_AWARE_MPI: + if isinstance(buf, torch.Tensor) and buf.is_cuda and not GPU_AWARE_MPI: buf.copy_(rbuf) return ret @@ -1286,7 +1300,7 @@ def Allgatherv( ) if buf is not None and isinstance(buf, torch.Tensor) and permutation is not None: rbuf = rbuf.permute(permutation) - if isinstance(buf, torch.Tensor) and buf.is_cuda and not CUDA_AWARE_MPI: + if isinstance(buf, torch.Tensor) and buf.is_cuda and not GPU_AWARE_MPI: buf.copy_(rbuf) return ret @@ -1417,20 +1431,12 @@ def __alltoall_like( recvbuf = recvbuf.permute(*recv_axis_permutation) # prepare buffer objects - sbuf = ( - sendbuf - if CUDA_AWARE_MPI or not isinstance(sendbuf, torch.Tensor) - else sendbuf.cpu() - ) + sbuf = self._moveToCompDevice(sendbuf, func) mpi_sendbuf = self.as_buffer(sbuf, send_counts, send_displs) if send_counts is None: mpi_sendbuf[1] //= self.size - rbuf = ( - recvbuf - if CUDA_AWARE_MPI or not isinstance(recvbuf, torch.Tensor) - else recvbuf.cpu() - ) + rbuf = self._moveToCompDevice(recvbuf, func) mpi_recvbuf = self.as_buffer(rbuf, recv_counts, recv_displs) if recv_counts is None: mpi_recvbuf[1] //= self.size @@ -1461,16 +1467,8 @@ def __alltoall_like( recvbuf = recvbuf.permute(*axis_permutation) # prepare buffer objects - sbuf = ( - sendbuf - if CUDA_AWARE_MPI or not isinstance(sendbuf, torch.Tensor) - else sendbuf.cpu() - ) - rbuf = ( - recvbuf - if CUDA_AWARE_MPI or not isinstance(recvbuf, torch.Tensor) - else recvbuf.cpu() - ) + sbuf = self._moveToCompDevice(sendbuf, func) + rbuf = self._moveToCompDevice(recvbuf, func) mpi_sendbuf = self.alltoall_sendbuffer(sbuf) mpi_recvbuf = self.alltoall_recvbuffer(rbuf) @@ -1510,7 +1508,7 @@ def Alltoall( ) if buf is not None and isinstance(buf, torch.Tensor) and permutation is not None: rbuf = rbuf.permute(permutation) - if isinstance(buf, torch.Tensor) and buf.is_cuda and not CUDA_AWARE_MPI: + if isinstance(buf, torch.Tensor) and buf.is_cuda and not GPU_AWARE_MPI: buf.copy_(rbuf) return ret @@ -1546,7 +1544,7 @@ def Alltoallv( ) if buf is not None and isinstance(buf, torch.Tensor) and permutation is not None: rbuf = rbuf.permute(permutation) - if isinstance(buf, torch.Tensor) and buf.is_cuda and not CUDA_AWARE_MPI: + if isinstance(buf, torch.Tensor) and buf.is_cuda and not GPU_AWARE_MPI: buf.copy_(rbuf) return ret @@ -1570,7 +1568,7 @@ def Alltoallw( """ # Unpack sendbuffer information sendbuf_tensor, (send_counts, send_displs), subarray_params_list = sendbuf - sendbuf = 
sendbuf_tensor if CUDA_AWARE_MPI else sendbuf_tensor.cpu() + sendbuf = self._moveToCompDevice(sendbuf_tensor, self.handle.Alltoallw) is_contiguous = sendbuf.is_contiguous() stride = sendbuf.stride() @@ -1605,7 +1603,7 @@ def Alltoallw( # Unpack recvbuf information recvbuf_tensor, (recv_counts, recv_displs), subarray_params_list = recvbuf - recvbuf = recvbuf_tensor if CUDA_AWARE_MPI else recvbuf_tensor.cpu() + recvbuf = self._moveToCompDevice(recvbuf_tensor, self.handle.Alltoallw) recvbuf_ptr, _, recv_datatype = self.as_buffer(recvbuf) # Commit the receive subarray datatypes @@ -1633,7 +1631,7 @@ def Alltoallw( if ( isinstance(recvbuf_tensor, torch.Tensor) and recvbuf_tensor.is_cuda - and not CUDA_AWARE_MPI + and not GPU_AWARE_MPI ): recvbuf_tensor.copy_(recvbuf) else: @@ -1860,9 +1858,12 @@ def __gather_like( recv_axis_permutation[0], recv_axis_permutation[recv_axis] = recv_axis, 0 recvbuf = recvbuf.permute(*recv_axis_permutation) - # prepare buffer objects - sbuf = sendbuf if CUDA_AWARE_MPI or not isinstance(sendbuf, torch.Tensor) else sendbuf.cpu() - rbuf = recvbuf if CUDA_AWARE_MPI or not isinstance(recvbuf, torch.Tensor) else recvbuf.cpu() + sbuf = ( + self._moveToCompDevice(sendbuf, func) if isinstance(sendbuf, torch.Tensor) else sendbuf + ) + rbuf = ( + self._moveToCompDevice(recvbuf, func) if isinstance(recvbuf, torch.Tensor) else recvbuf + ) if sendbuf is not MPI.IN_PLACE: mpi_sendbuf = self.as_buffer(sbuf, send_counts, send_displs) @@ -1870,6 +1871,7 @@ def __gather_like( mpi_sendbuf[1] //= send_factor else: mpi_sendbuf = sbuf + if recvbuf is not MPI.IN_PLACE: mpi_recvbuf = self.as_buffer(rbuf, recv_counts, recv_displs) if recv_counts is None: @@ -1916,7 +1918,7 @@ def Gather( ) if buf is not None and isinstance(buf, torch.Tensor) and permutation is not None: rbuf = rbuf.permute(permutation) - if isinstance(buf, torch.Tensor) and buf.is_cuda and not CUDA_AWARE_MPI: + if isinstance(buf, torch.Tensor) and buf.is_cuda and not GPU_AWARE_MPI: buf.copy_(rbuf) return ret @@ -1951,7 +1953,7 @@ def Gatherv( ) if buf is not None and isinstance(buf, torch.Tensor) and permutation is not None: rbuf = rbuf.permute(permutation) - if isinstance(buf, torch.Tensor) and buf.is_cuda and not CUDA_AWARE_MPI: + if isinstance(buf, torch.Tensor) and buf.is_cuda and not GPU_AWARE_MPI: buf.copy_(rbuf) return ret @@ -2105,8 +2107,15 @@ def __scatter_like( recvbuf = recvbuf.permute(*recv_axis_permutation) # prepare buffer objects - sbuf = sendbuf if CUDA_AWARE_MPI or not isinstance(sendbuf, torch.Tensor) else sendbuf.cpu() - rbuf = recvbuf if CUDA_AWARE_MPI or not isinstance(recvbuf, torch.Tensor) else recvbuf.cpu() + if isinstance(sendbuf, torch.Tensor): + sbuf = self._moveToCompDevice(sendbuf, func) + else: + sbuf = sendbuf + + if isinstance(recvbuf, torch.Tensor): + rbuf = self._moveToCompDevice(recvbuf, func) + else: + rbuf = recvbuf if sendbuf is not MPI.IN_PLACE: mpi_sendbuf = self.as_buffer(sbuf, send_counts, send_displs) @@ -2236,7 +2245,7 @@ def Scatter( ) if buf is not None and isinstance(buf, torch.Tensor) and permutation is not None: rbuf = rbuf.permute(permutation) - if isinstance(buf, torch.Tensor) and buf.is_cuda and not CUDA_AWARE_MPI: + if isinstance(buf, torch.Tensor) and buf.is_cuda and not GPU_AWARE_MPI: buf.copy_(rbuf) return ret @@ -2277,7 +2286,7 @@ def Scatterv( ) if buf is not None and isinstance(buf, torch.Tensor) and permutation is not None: rbuf = rbuf.permute(permutation) - if isinstance(buf, torch.Tensor) and buf.is_cuda and not CUDA_AWARE_MPI: + if isinstance(buf, torch.Tensor) 
and buf.is_cuda and not GPU_AWARE_MPI: buf.copy_(rbuf) return ret diff --git a/heat/core/linalg/tests/test_basics.py b/heat/core/linalg/tests/test_basics.py index f8c500a72b..5fc901a5d1 100644 --- a/heat/core/linalg/tests/test_basics.py +++ b/heat/core/linalg/tests/test_basics.py @@ -375,7 +375,7 @@ def test_inv(self): a = ht.random.random((20, 20), dtype=dtype, split=0) ainv = ht.linalg.inv(a) i = ht.eye(a.shape, split=0, dtype=a.dtype) - print(f"Local result of rank {a.comm.Get_rank()}: {(a @ ainv).larray}") + # print(f"Local result of rank {a.comm.Get_rank()}: {(a @ ainv).larray}") self.assertTrue(ht.allclose(a @ ainv, i, atol=1e-5 if self.is_mps else atol * 1e2)) with self.assertRaises(RuntimeError): diff --git a/heat/core/linalg/tests/test_qr.py b/heat/core/linalg/tests/test_qr.py index dc31e03caf..acf814d3c0 100644 --- a/heat/core/linalg/tests/test_qr.py +++ b/heat/core/linalg/tests/test_qr.py @@ -32,8 +32,8 @@ def test_qr_split1orNone(self): if not allclose: diff = qr.Q @ qr.R - mat max_diff = ht.max(diff) - print(f"diff: {diff}") - print(f"max_diff: {max_diff}m") + #print(f"diff: {diff}") + #print(f"max_diff: {max_diff}m") self.assertTrue( ht.allclose(qr.Q @ qr.R, mat, atol=dtypetol, rtol=dtypetol) diff --git a/heat/core/tests/test_communication.py b/heat/core/tests/test_communication.py index 131b21f79a..162a5b0d45 100644 --- a/heat/core/tests/test_communication.py +++ b/heat/core/tests/test_communication.py @@ -74,8 +74,8 @@ def test_mpi_communicator(self): self.assertEqual(len(chunks), len(self.data.shape)) def test_cuda_aware_mpi(self): - self.assertTrue(hasattr(ht.communication, "CUDA_AWARE_MPI")) - self.assertIsInstance(ht.communication.CUDA_AWARE_MPI, bool) + self.assertTrue(hasattr(ht.communication, "GPU_AWARE_MPI")) + self.assertIsInstance(ht.communication.GPU_AWARE_MPI, bool) def test_contiguous_memory_buffer(self): # vector heat tensor @@ -139,7 +139,7 @@ def test_non_contiguous_memory_buffer(self): # check that after sending the data everything is equal self.assertTrue((non_contiguous_data.larray == contiguous_out.larray).all()) - if ht.get_device().device_type == "cpu" or ht.communication.CUDA_AWARE_MPI: + if ht.get_device().device_type == "cpu" or ht.communication.GPU_AWARE_MPI: self.assertTrue(contiguous_out.larray.is_contiguous()) # non-contiguous destination @@ -158,7 +158,7 @@ def test_non_contiguous_memory_buffer(self): req.Wait() # check that after sending the data everything is equal self.assertTrue((contiguous_data.larray == non_contiguous_out.larray).all()) - if ht.get_device().device_type == "cpu" or ht.communication.CUDA_AWARE_MPI: + if ht.get_device().device_type == "cpu" or ht.communication.GPU_AWARE_MPI: self.assertFalse(non_contiguous_out.larray.is_contiguous()) # non-contiguous destination @@ -181,7 +181,7 @@ def test_non_contiguous_memory_buffer(self): req.Wait() # check that after sending the data everything is equal self.assertTrue((both_non_contiguous_data.larray == both_non_contiguous_out.larray).all()) - if ht.get_device().device_type == "cpu" or ht.communication.CUDA_AWARE_MPI: + if ht.get_device().device_type == "cpu" or ht.communication.GPU_AWARE_MPI: self.assertFalse(both_non_contiguous_out.larray.is_contiguous()) def test_default_comm(self): From 80756c5de62f913b9ff9779ee9753fa062232dcf Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Tue, 28 Oct 2025 09:22:44 +0100 Subject: [PATCH 03/15] Bugs/1990 Fix handling of zarr groups (#1991) (#1999) * introduce variable to handle zarr 
groups * expand tests * edit docs * do not test with float64 on MPS * files housekeeping on MPS * load zarr variable from multiple dirs * fix path reading * fix import * expand docs * set device * fix dtype on empty ranks, balance output, refactor * adapt tests * Apply suggestions from code review * expand tests * fix deadlock at split sanitation * enable dtype setting * expand tests --------- (cherry picked from commit dc4bd1cd831fd08df64f1a394ee8cb1c5da0c202) Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- heat/core/io.py | 148 +++++++++++++++++++++++++++-- heat/core/tests/test_io.py | 186 +++++++++++++++++++++++++++++++++---- 2 files changed, 306 insertions(+), 28 deletions(-) diff --git a/heat/core/io.py b/heat/core/io.py index aae6ab5b2c..ca7d7bb0c8 100644 --- a/heat/core/io.py +++ b/heat/core/io.py @@ -3,6 +3,8 @@ from __future__ import annotations from functools import reduce +import glob + import operator import os.path from math import log10 @@ -797,6 +799,12 @@ def load( DNDarray([ 1.0000, 2.7183, 7.3891, 20.0855, 54.5981], dtype=ht.float32, device=cpu:0, split=None) >>> ht.load("data.nc", variable="DATA") DNDarray([ 1.0000, 2.7183, 7.3891, 20.0855, 54.5981], dtype=ht.float32, device=cpu:0, split=None) + >>> ht.load("my_data.zarr", variable="RECEIVER_1/DATA") + DNDarray([ 1.0000, 2.7183, 7.3891, 20.0855, 54.5981], dtype=ht.float32, device=cpu:0, split=0) + >>> ht.load("my_data.zarr", variable="RECEIVER_*/DATA") + DNDarray([[ 1.0000, 2.7183, 7.3891, 20.0855, 54.5981], + [ 1.0000, 2.7183, 7.3891, 20.0855, 54.5981], + [ 1.0000, 2.7183, 7.3891, 20.0855, 54.5981]], dtype=ht.float32, device=cpu:0, split=0) See Also -------- @@ -1417,6 +1425,7 @@ def supports_zarr() -> bool: def load_zarr( path: str, + variable: str = None, split: int = 0, device: Optional[str] = None, comm: Optional[Communication] = None, @@ -1424,12 +1433,15 @@ def load_zarr( **kwargs, ) -> DNDarray: """ - Loads zarr-Format into DNDarray which will be returned. + Loads data from a zarr store into DNDarray. `path` can either point to a single zarr array or a zarr group. In the latter case, `variable` must be provided to specify which array in the group to load. If `variable` contains a wildcard pattern (e.g. `RECEIVER_*/DATA`), all matching arrays will be loaded and concatenated along the specified `split` axis. Parameters ---------- path : str Path to the directory in which a .zarr-file is located. + variable : str, optional + If the zarr store is a group, the variable (or path to variable) to load from the group. + Can contain a wildcard pattern to load and concatenate arrays stored in slices in different directories. split : int Along which axis the loaded arrays should be concatenated. 
device : str, optional @@ -1441,12 +1453,13 @@ def load_zarr( **kwargs : Any extra Arguments to pass to zarr.open """ + # sanitize inputs + device = devices.sanitize_device(device) + torch_device = device.torch_device + comm = sanitize_comm(comm) + if not isinstance(path, str): raise TypeError(f"path must be str, not {type(path)}") - if split is not None and not isinstance(split, int): - raise TypeError(f"split must be None or int, not {type(split)}") - if device is not None and not isinstance(device, str): - raise TypeError(f"device must be None or str, not {type(split)}") if not isinstance(slices, (slice, Iterable)) and slices is not None: raise TypeError(f"Slices Argument must be slice, tuple or None and not {type(slices)}") if isinstance(slices, Iterable): @@ -1461,7 +1474,127 @@ def load_zarr( else: raise ValueError("File has no zarr extension.") - arr: zarr.Array = zarr.open_array(store=path, **kwargs) + store_path = os.path.join(path, variable) if variable else path + + output_dtype = kwargs.pop("dtype", None) + torch_output_dtype = output_dtype.torch_type() if output_dtype else None + + if variable and "*" in variable: + # `variable` contains a wildcard pattern + # e.g. data were chunked at write-out and stored in multiple directories + if slices is not None: + raise NotImplementedError("Slicing is not supported when loading with a wildcard.") + + base_paths = sorted(glob.glob(store_path)) + + if not base_paths: + raise FileNotFoundError( + f"Zarr wildcard pattern '{variable}' did not match any arrays in store '{path}'" + ) + + variable_paths = [os.path.relpath(p, start=path) for p in base_paths] + + # each rank reads data from its assigned directories and concatenates locally + + # determine which directories to open on rank + dummy_array = factories.empty((len(base_paths),), dtype=types.float32) + _, _, local_dir_slice = dummy_array.comm.chunk( + dummy_array.shape, rank=dummy_array.comm.rank, split=0 + ) + + # load data to torch tensors + local_tensors = [] + for i, var_path in enumerate(variable_paths[local_dir_slice[0]]): + local_tensor = torch.from_numpy(zarr.open(path)[var_path][:]) + if torch_output_dtype: + local_tensor = local_tensor.to(torch_output_dtype) + local_tensors.append(local_tensor) + + # Have rank 0 determine the single-store shape and broadcast it to all ranks for sanitation + target_ndims = torch.zeros(1, dtype=torch.int32) + if dummy_array.comm.rank == 0: + if len(local_tensors) == 0: + raise ValueError( + f"Zarr wildcard pattern '{variable}' did not match any arrays in store '{path}'" + ) + # broadcast shape of first local tensor to allow sanitation on empty ranks + target_ndims = torch.tensor(local_tensors[0].ndim, dtype=torch.int32) + dummy_array.comm.Bcast(target_ndims, root=0) + # sanitize split axis + proxy_shape = (1,) * target_ndims.item() + split = sanitize_axis(proxy_shape, axis=split) + + # concatenate locally + if len(local_tensors) >= 1: + if len(local_tensors) == 1: + local_tensor = local_tensors[0] + else: + local_tensor = torch.cat(local_tensors, dim=split if split is not None else 0) + empty_ranks = torch.tensor([0], dtype=torch.int32) + ht_type_code = types.__type_codes[types.canonical_heat_type(local_tensor.dtype)] + else: + # no local tensors i.e. 
no data assigned to rank + local_tensor = torch.empty((0,)) + empty_ranks = torch.tensor([1], dtype=torch.int32) + # dummy dtype code + ht_type_code = -1 + # check for empty ranks + dummy_array.comm.Allreduce(MPI.IN_PLACE, empty_ranks, op=MPI.SUM) + if empty_ranks.item() > 0: + # fix local shape and dtype of empty tensors, otherwise DNDarray construction will fail + # Rank 0 broadcasts the info to all other ranks + target_shape = torch.zeros( + ( + 1, + target_ndims.item() + 1, + ), + dtype=torch.int64, + ) + if local_tensor.numel() > 0: + target_shape[0, :-1] = torch.tensor(local_tensor.shape, dtype=torch.int64) + # encode dtype as last entry + target_shape[0, -1] = ht_type_code + # share info about target shape and dtype + target_shapes = torch.zeros( + (dummy_array.comm.size, target_ndims.item() + 1), dtype=torch.int64 + ) + dummy_array.comm.Allgather(target_shape, target_shapes) + if local_tensor.numel() == 0: + ht_type_code = target_shapes[0, -1].item() + target_shape = target_shapes[0, :-1].clone() + target_shape[split] = 0 + for dtype, dtype_code in types.__type_codes.items(): + if dtype_code == ht_type_code: + ht_type = dtype + break + local_tensor = torch.empty( + tuple(target_shape.tolist()), dtype=ht_type.torch_type() + ) + # discard dtype code column + target_shapes = target_shapes[:, :-1] + # calculate global array shape + out_gshape = target_shapes[0, :].clone() + out_gshape[split] = target_shapes[:, split].sum().item() + # wrap local tensors in DNDarray + dndarray = DNDarray( + local_tensor.to(device=torch_device), + gshape=tuple(out_gshape.tolist()), + dtype=output_dtype + if output_dtype + else types.canonical_heat_type(local_tensor.dtype), + split=split, + device=device, + comm=comm, + balanced=False, + ) + else: + # all ranks are populated, create DNDarray directly + dndarray = factories.array(local_tensor, is_split=split, device=device, comm=comm) + dndarray.balance_() + return dndarray + + # standard single zarr array + arr: zarr.Array = zarr.open_array(store=store_path, **kwargs) shape = arr.shape if isinstance(slices, slice) or slices is None: @@ -1476,8 +1609,7 @@ def load_zarr( slices.extend([slice(None) for _ in range(abs(len(slices) - len(shape)))]) dtype = types.canonical_heat_type(arr.dtype) - device = devices.sanitize_device(device) - comm = sanitize_comm(comm) + split = sanitize_axis(shape, axis=split) # slices = tuple(slice(*tslice.indices(length)) for length, tslice in zip(shape, slices)) slices = tuple(slices) diff --git a/heat/core/tests/test_io.py b/heat/core/tests/test_io.py index ac5ebd4a6c..b61325dd3c 100644 --- a/heat/core/tests/test_io.py +++ b/heat/core/tests/test_io.py @@ -40,9 +40,16 @@ def setUpClass(cls): cls.ZARR_OUT_PATH = pwd + "/zarr_test_out.zarr" cls.ZARR_IN_PATH = pwd + "/zarr_test_in.zarr" cls.ZARR_TEMP_PATH = pwd + "/zarr_temp.zarr" + cls.ZARR_NESTED_PATH = pwd + "/zarr_test_nested.zarr" + + # device-aware dtypes + testing_types = [ht.int32, ht.int64, ht.float32] + if not cls.is_mps: + testing_types.append(ht.float64) + cls.testing_types = testing_types def tearDown(self): - # synchronize all nodes + # synchronize all processes ht.MPI_WORLD.Barrier() # clean up of temporary files @@ -57,16 +64,20 @@ def tearDown(self): os.remove(self.NETCDF_OUT_PATH) except FileNotFoundError: pass - # if ht.MPI_WORLD.rank == 0: if ht.io.supports_zarr(): - for file in [self.ZARR_TEMP_PATH, self.ZARR_IN_PATH, self.ZARR_OUT_PATH]: - try: - shutil.rmtree(file) - except FileNotFoundError: - pass + if ht.MPI_WORLD.rank == 0: + for file in [ + 
self.ZARR_TEMP_PATH, + self.ZARR_IN_PATH, + self.ZARR_OUT_PATH, + self.ZARR_NESTED_PATH, + ]: + try: + shutil.rmtree(file) + except FileNotFoundError: + pass - # synchronize all nodes ht.MPI_WORLD.Barrier() def test_size_from_slice(self): @@ -831,9 +842,10 @@ def test_load_npy_float(self): self.assertEqual(load_array.dtype, ht.float64) if ht.MPI_WORLD.rank == 0: self.assertTrue((load_array_npy == float_array).all) - for file in os.listdir(os.path.join(os.getcwd(), "heat/datasets")): - if fnmatch.fnmatch(file, "*.npy"): - os.remove(os.path.join(os.getcwd(), "heat/datasets", file)) + if ht.MPI_WORLD.rank == 0: + for file in os.listdir(os.path.join(os.getcwd(), "heat/datasets")): + if fnmatch.fnmatch(file, "*.npy"): + os.remove(os.path.join(os.getcwd(), "heat/datasets", file)) def test_load_npy_exception(self): with self.assertRaises(TypeError): @@ -940,15 +952,15 @@ def test_load_zarr(self): import zarr test_data = np.arange(self.ZARR_SHAPE[0] * self.ZARR_SHAPE[1]).reshape(self.ZARR_SHAPE) - + dtype = np.float32 if ht.MPI_WORLD.rank == 0: try: arr = zarr.create_array( - self.ZARR_TEMP_PATH, shape=self.ZARR_SHAPE, dtype=np.float64 + self.ZARR_TEMP_PATH, shape=self.ZARR_SHAPE, dtype=dtype ) except AttributeError: arr = zarr.create( - store=self.ZARR_TEMP_PATH, shape=self.ZARR_SHAPE, dtype=np.float64 + store=self.ZARR_TEMP_PATH, shape=self.ZARR_SHAPE, dtype=dtype ) arr[:] = test_data @@ -962,6 +974,140 @@ def test_load_zarr(self): ht.MPI_WORLD.Barrier() + def test_load_zarr_group(self): + if not ht.io.supports_zarr(): + self.skipTest("Requires zarr") + + import zarr + + # Write out a nested Zarr store + original_data = np.arange(np.prod(self.ZARR_SHAPE)).reshape(self.ZARR_SHAPE) + nested_group_name = "MAIN_0" + array_name = "DATA" + variable_path = f"{nested_group_name}/{array_name}" + + if ht.MPI_WORLD.rank == 0: + root = zarr.open_group(self.ZARR_NESTED_PATH, mode="w") + main_0 = root.create_group(nested_group_name) + main_0.create_dataset( + array_name, + shape=original_data.shape, + dtype=original_data.dtype, + data=original_data, + ) + + ht.MPI_WORLD.Barrier() + + # Test loading using both positional and keyword arguments for different splits + for split in [None, 0, 1]: + # Test with positional argument + with self.subTest(split=split, arg_type="positional"): + ht_tensor_pos = ht.load(self.ZARR_NESTED_PATH, variable_path, split=split) + self.assertIsInstance(ht_tensor_pos, ht.DNDarray) + self.assertEqual(ht_tensor_pos.gshape, original_data.shape) + self.assertTrue(np.array_equal(ht_tensor_pos.numpy(), original_data)) + + # Test with keyword argument + with self.subTest(split=split, arg_type="keyword"): + ht_tensor_kw = ht.load( + self.ZARR_NESTED_PATH, variable=variable_path, split=split + ) + self.assertIsInstance(ht_tensor_kw, ht.DNDarray) + self.assertEqual(ht_tensor_kw.gshape, original_data.shape) + self.assertTrue(np.array_equal(ht_tensor_kw.numpy(), original_data)) + + ht.MPI_WORLD.Barrier() + + # test loading with wildcard + num_chunks = self.comm.size * 2 + 1 + if self.comm.size > 3: + # test empty ranks + num_chunks = self.comm.size - 1 + + np_testing_types = [np.int32, np.int64, np.float32, np.complex64] + if not self.is_mps: + np_testing_types.extend([np.float64, np.complex128]) + + ht.MPI_WORLD.Barrier() + for dtype in np_testing_types: + global_data_shape = (num_chunks * 10, num_chunks * 5, 7) + global_data = np.arange(np.prod(global_data_shape), dtype=dtype).reshape(global_data_shape) + if self.comm.rank == 0: + # create zarr store for split=0 and split=1 + 
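+                # Layout sketch (illustrative numbers): with num_chunks == 3 the global
+                # array has shape (30, 15, 7); CHUNK_i_SPLIT0/DATA then holds rows
+                # [10*i, 10*(i+1)) and CHUNK_i_SPLIT1/DATA holds columns [5*i, 5*(i+1)),
+                # so loading with the matching wildcard and split axis reassembles the
+                # full array.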
chunk_shape_split0 = (10, global_data_shape[1], global_data_shape[2]) + chunk_shape_split1 = (global_data_shape[0], 5, global_data_shape[2]) + + root_zarr = zarr.open_group(self.ZARR_OUT_PATH, mode="w") + + for i in range(num_chunks): + chunk_data_split0 = global_data[i * chunk_shape_split0[0] : (i + 1) * chunk_shape_split0[0], :, :] + chunk_group_split0 = root_zarr.create_group(f"CHUNK_{i}_SPLIT0") + chunk_group_split0.create_dataset( + "DATA", + shape=chunk_data_split0.shape, + dtype=chunk_data_split0.dtype, + data=chunk_data_split0 + ) + + chunk_data_split1 = global_data[:, i * chunk_shape_split1[1] : (i + 1) * chunk_shape_split1[1], :] + chunk_group_split1 = root_zarr.create_group(f"CHUNK_{i}_SPLIT1") + chunk_group_split1.create_dataset( + "DATA", + shape=chunk_data_split1.shape, + dtype=chunk_data_split1.dtype, + data=chunk_data_split1 + ) + ht.MPI_WORLD.Barrier() + + # test wildcard loading for split=0 + with self.subTest(dtype=dtype, split=0): + ht_array_split0 = ht.load(self.ZARR_OUT_PATH, variable="CHUNK_*_SPLIT0/DATA", split=0, device=self.device) + self.assertIsInstance(ht_array_split0, ht.DNDarray) + self.assertEqual(ht_array_split0.gshape, global_data_shape) + ht_array_split0.balance_() + self.assertTrue((ht_array_split0.numpy() == global_data).all()) + self.assertTrue(ht_array_split0.dtype == ht.types.canonical_heat_type(dtype)) + + # test wildcard loading for split=1 + with self.subTest(dtype=dtype, split=1): + ht_array_split1 = ht.load(self.ZARR_OUT_PATH, variable="CHUNK_*_SPLIT1/DATA", split=1, device=self.device) + self.assertIsInstance(ht_array_split1, ht.DNDarray) + self.assertEqual(ht_array_split1.gshape, global_data_shape) + self.assertTrue((ht_array_split1.numpy() == global_data).all()) + self.assertTrue(ht_array_split1.dtype == ht.types.canonical_heat_type(dtype)) + + # test wildcard loading with dtype conversion + with self.subTest(dtype=dtype, split="dtype_conversion"): + # only for non-complex dtypes + if not np.issubdtype(dtype, np.complexfloating): + ht_array_split0 = ht.load(self.ZARR_OUT_PATH, variable="CHUNK_*_SPLIT0/DATA", split=0, device=self.device, dtype=ht.float32) + self.assertIsInstance(ht_array_split0, ht.DNDarray) + self.assertEqual(ht_array_split0.gshape, global_data_shape) + self.assertTrue((ht_array_split0.numpy() == global_data).all()) + self.assertTrue(ht_array_split0.dtype == ht.float32) + + ht.MPI_WORLD.Barrier() + + # Test data misconstruction when using the wrong split axis + with self.subTest(split="split_mismatch_0", dtype=dtype): + with self.assertRaises(ValueError): + test = ht.load(self.ZARR_OUT_PATH, variable="CHUNK_*_SPLIT1/DATA", split=0, device=self.device) + self.assertTrue((test.numpy() == global_data).all()) + + with self.subTest(split="split_mismatch_1", dtype=dtype): + with self.assertRaises(ValueError): + test = ht.load(self.ZARR_OUT_PATH, variable="CHUNK_*_SPLIT0/DATA", split=1, device=self.device) + self.assertFalse((test.numpy() == global_data).all()) + + # test exceptions + with self.subTest(split="split_exception", dtype=dtype): + with self.assertRaises(ValueError): + test = ht.load(self.ZARR_OUT_PATH, variable="CHUNK_*_SPLIT0/DATA", split=3) + with self.assertRaises(NotImplementedError): + test = ht.load(self.ZARR_OUT_PATH, variable="CHUNK_*_SPLIT0/DATA", slices=slice(0,10)) + with self.assertRaises(FileNotFoundError): + test = ht.load(self.ZARR_OUT_PATH, variable="NONEXSISTENT_CHUNK_*_SPLIT0/DATA", split=0) + def test_load_zarr_slice(self): if not ht.io.supports_zarr(): self.skipTest("Requires zarr") @@ -1017,7 +1163,7 
@@ def test_save_zarr_2d_split0(self): import zarr - for type in [ht.types.int32, ht.types.int64, ht.types.float32, ht.types.float64]: + for type in self.testing_types: for dims in [(i, self.ZARR_SHAPE[1]) for i in range(1, max(10, ht.MPI_WORLD.size + 1))]: with self.subTest(type=type, dims=dims): n = dims[0] * dims[1] @@ -1037,7 +1183,7 @@ def test_save_zarr_2d_split1(self): import zarr - for type in [ht.types.int32, ht.types.int64, ht.types.float32, ht.types.float64]: + for type in self.testing_types: for dims in [(self.ZARR_SHAPE[0], i) for i in range(1, max(10, ht.MPI_WORLD.size + 1))]: with self.subTest(type=type, dims=dims): n = dims[0] * dims[1] @@ -1057,7 +1203,7 @@ def test_save_zarr_split_none(self): import zarr - for type in [ht.types.int32, ht.types.int64, ht.types.float32, ht.types.float64]: + for type in self.testing_types: for n in [10, 100, 1000]: with self.subTest(type=type, n=n): dndarray = ht.arange(n, dtype=type, split=None) @@ -1075,7 +1221,7 @@ def test_save_zarr_1d_split_0(self): import zarr - for type in [ht.types.int32, ht.types.int64, ht.types.float32, ht.types.float64]: + for type in self.testing_types: for n in [10, 100, 1000]: with self.subTest(type=type, n=n): dndarray = ht.arange(n, dtype=type, split=0) @@ -1095,9 +1241,9 @@ def test_load_zarr_arguments(self): ht.load_zarr(None) with self.assertRaises(ValueError): ht.load_zarr("data.npy") - with self.assertRaises(TypeError): + with self.assertRaises(ValueError): ht.load_zarr("", "") - with self.assertRaises(TypeError): + with self.assertRaises(ValueError): ht.load_zarr("", device=1) with self.assertRaises(TypeError): ht.load_zarr("", slices=0) From eb98ba836d21f60e539941bfa0e9b54da053935d Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Mon, 17 Nov 2025 11:40:34 +0100 Subject: [PATCH 04/15] Set correct dtype when loading and saving hdf5 (#2014) (#2024) * fixed load_hdf5 * fixed save_hdf5 * fixed different behavior in tests * test torch dtype for save_hdf5 --------- (cherry picked from commit 678cd47a551d40687bcab2548e0d504d53b3f0d4) Co-authored-by: Marc-Jindra Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> --- heat/core/io.py | 31 +++++++++++++++++++++++-------- heat/core/tests/test_io.py | 12 ++++++------ 2 files changed, 29 insertions(+), 14 deletions(-) diff --git a/heat/core/io.py b/heat/core/io.py index ca7d7bb0c8..beb3ea9de8 100644 --- a/heat/core/io.py +++ b/heat/core/io.py @@ -517,7 +517,7 @@ def supports_hdf5() -> bool: def load_hdf5( path: str, dataset: str, - dtype: datatype = types.float32, + dtype: Optional[datatype] = None, slices: Optional[Tuple[Optional[slice], ...]] = None, split: Optional[int] = None, device: Optional[str] = None, @@ -533,7 +533,7 @@ dataset : str Name of the dataset to be read. dtype : datatype, optional - Data type of the resulting array. + Data type of the resulting array, defaults to the loaded dataset's type. slices : tuple of slice objects, optional Load only the specified slices of the dataset. 
split : int or None, optional @@ -625,8 +625,6 @@ def load_hdf5( elif split is not None and not isinstance(split, int): raise TypeError(f"split must be None or int, not {type(split)}") - # infer the type and communicator for the loaded array - dtype = types.canonical_heat_type(dtype) # determine the comm and device the data will be placed on device = devices.sanitize_device(device) comm = sanitize_comm(comm) @@ -637,6 +635,9 @@ def load_hdf5( gshape = data.shape new_gshape = tuple() offsets = [0] * len(gshape) + if dtype is None: + dtype = data.dtype + dtype = types.canonical_heat_type(dtype) if slices is not None: for i in range(len(gshape)): if i < len(slices) and slices[i]: @@ -687,7 +688,12 @@ def load_hdf5( return DNDarray(data, gshape, dtype, split, device, comm, balanced) def save_hdf5( - data: DNDarray, path: str, dataset: str, mode: str = "w", **kwargs: Dict[str, object] + data: DNDarray, + path: str, + dataset: str, + mode: str = "w", + dtype: Optional[datatype] = None, + **kwargs: Dict[str, object], ): """ Saves ``data`` to an HDF5 file. Attempts to utilize parallel I/O if possible. @@ -702,6 +708,8 @@ def save_hdf5( Name of the dataset the data is saved to. mode : str, optional File access mode, one of ``'w', 'a', 'r+'`` + dtype : datatype, optional + Data type of the saved data kwargs : dict, optional Additional arguments passed to the created dataset. @@ -732,16 +740,23 @@ def save_hdf5( is_split = data.split is not None _, _, slices = data.comm.chunk(data.gshape, data.split if is_split else 0) + if dtype is None: + dtype = data.dtype + elif type(dtype) == torch.dtype: + dtype = str(dtype).split(".")[-1] + if type(dtype) is not str: + dtype = dtype.__name__ + # attempt to perform parallel I/O if possible if h5py.get_config().mpi: with h5py.File(path, mode, driver="mpio", comm=data.comm.handle) as handle: - dset = handle.create_dataset(dataset, data.shape, **kwargs) + dset = handle.create_dataset(dataset, data.shape, dtype=dtype, **kwargs) dset[slices] = data.larray.cpu() if is_split else data.larray[slices].cpu() # otherwise a single rank only write is performed in case of local data (i.e. 
no split) elif data.comm.rank == 0: with h5py.File(path, mode) as handle: - dset = handle.create_dataset(dataset, data.shape, **kwargs) + dset = handle.create_dataset(dataset, data.shape, dtype=dtype, **kwargs) if is_split: dset[slices] = data.larray.cpu() else: @@ -763,7 +778,7 @@ next_rank = (data.comm.rank + 1) % data.comm.size data.comm.Isend([None, 0, MPI.INT], dest=next_rank) - DNDarray.save_hdf5 = lambda self, path, dataset, mode="w", **kwargs: save_hdf5( + DNDarray.save_hdf5 = lambda self, path, dataset, mode="w", dtype=None, **kwargs: save_hdf5( - self, path, dataset, mode, **kwargs + self, path, dataset, mode, dtype, **kwargs ) DNDarray.save_hdf5.__doc__ = save_hdf5.__doc__ diff --git a/heat/core/tests/test_io.py b/heat/core/tests/test_io.py index b61325dd3c..69369d2214 100644 --- a/heat/core/tests/test_io.py +++ b/heat/core/tests/test_io.py @@ -106,7 +106,7 @@ def test_size_from_slice(self): def test_load(self): # HDF5 if ht.io.supports_hdf5(): - iris = ht.load(self.HDF5_PATH, dataset="data") + iris = ht.load(self.HDF5_PATH, dataset="data", dtype=ht.float32) self.assertIsInstance(iris, ht.DNDarray) # shape invariant self.assertEqual(iris.shape, self.IRIS.shape) @@ -591,7 +591,7 @@ def test_load_hdf5(self): self.skipTest("Requires HDF5") # default parameters - iris = ht.load_hdf5(self.HDF5_PATH, self.HDF5_DATASET) + iris = ht.load_hdf5(self.HDF5_PATH, self.HDF5_DATASET, dtype=ht.float32) self.assertIsInstance(iris, ht.DNDarray) self.assertEqual(iris.shape, self.IRIS.shape) self.assertEqual(iris.dtype, ht.float32) @@ -602,13 +602,13 @@ iris = ht.load_hdf5(self.HDF5_PATH, self.HDF5_DATASET, split=0) self.assertIsInstance(iris, ht.DNDarray) self.assertEqual(iris.shape, self.IRIS.shape) - self.assertEqual(iris.dtype, ht.float32) + self.assertEqual(iris.dtype, ht.float64) lshape = iris.lshape self.assertLessEqual(lshape[0], self.IRIS.shape[0]) self.assertEqual(lshape[1], self.IRIS.shape[1]) # negative split axis - iris = ht.load_hdf5(self.HDF5_PATH, self.HDF5_DATASET, split=-1) + iris = ht.load_hdf5(self.HDF5_PATH, self.HDF5_DATASET, split=-1, dtype=ht.float32) self.assertIsInstance(iris, ht.DNDarray) self.assertEqual(iris.shape, self.IRIS.shape) self.assertEqual(iris.dtype, ht.float32) @@ -650,7 +650,7 @@ def test_save_hdf5(self): # local unsplit data local_data = ht.arange(100) ht.save_hdf5( - local_data, self.HDF5_OUT_PATH, self.HDF5_DATASET, dtype=local_data.dtype.char() + local_data, self.HDF5_OUT_PATH, self.HDF5_DATASET, dtype=torch.int32 ) if local_data.comm.rank == 0: with ht.io.h5py.File(self.HDF5_OUT_PATH, "r") as handle: @@ -662,7 +662,7 @@ # distributed data range split_data = ht.arange(100, split=0) ht.save_hdf5( - split_data, self.HDF5_OUT_PATH, self.HDF5_DATASET, dtype=split_data.dtype.char() + split_data, self.HDF5_OUT_PATH, self.HDF5_DATASET ) if split_data.comm.rank == 0: with ht.io.h5py.File(self.HDF5_OUT_PATH, "r") as handle: From 267b7d5a5a1c2d94d041bef922b602613262f43f Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Mon, 1 Dec 2025 11:30:07 +0100 Subject: [PATCH 05/15] Supporting negative indices for flip operations (#2030) (#2050) * Converting negative indices * use sanitize_axis * Added tests * edit docs --------- (cherry picked from commit 3c0bc279bc981e51721de5953586990f55efe7bc) Co-authored-by: Marc-Jindra Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> --- heat/core/manipulations.py | 8 ++++++-- heat/core/tests/test_manipulations.py | 21 
++++++++++++++++++++- 2 files changed, 26 insertions(+), 3 deletions(-) diff --git a/heat/core/manipulations.py b/heat/core/manipulations.py index d685f4d5ad..a7d9c542df 100644 --- a/heat/core/manipulations.py +++ b/heat/core/manipulations.py @@ -1076,7 +1076,7 @@ def flip(a: DNDarray, axis: Union[int, Tuple[int, ...]] = None) -> DNDarray: a: DNDarray Input array to be flipped axis: int or Tuple[int,...] - A list of axes to be flipped + The axis or sequence of axes to be flipped See Also -------- @@ -1100,7 +1100,11 @@ def flip(a: DNDarray, axis: Union[int, Tuple[int, ...]] = None) -> DNDarray: # torch.flip only accepts tuples if isinstance(axis, int): - axis = [axis] + axis = (axis,) + elif isinstance(axis, list): + axis = tuple(axis) + + axis = stride_tricks.sanitize_axis(a.shape, axis) flipped = torch.flip(a.larray, axis) diff --git a/heat/core/tests/test_manipulations.py b/heat/core/tests/test_manipulations.py index e3c5ad232d..30138730d1 100644 --- a/heat/core/tests/test_manipulations.py +++ b/heat/core/tests/test_manipulations.py @@ -1089,6 +1089,25 @@ def test_flip(self): r_a = ht.array([[[3, 2], [1, 0]], [[7, 6], [5, 4]]], split=0, dtype=ht.uint8) self.assertTrue(ht.equal(ht.flip(a, [1, 2]), r_a)) + # test negative axis + a = ht.array([[1, 2], [3, 4]]) + r_a = ht.array([[2, 1], [4, 3]]) + self.assertTrue(ht.equal(ht.flip(a, -1), r_a)) + + a = ht.array([[1, 2], [3, 4]]) + r_a = ht.array([[3, 4], [1, 2]]) + self.assertTrue(ht.equal(ht.flip(a, -2), r_a)) + + a = ht.array([[1, 2], [3, 4]]) + r_a = ht.array([[4, 3], [2, 1]]) + self.assertTrue(ht.equal(ht.flip(a, (-2, -1)), r_a)) + + # test negative axis with split + a = ht.array([[2, 3], [4, 5], [6, 7], [8, 9]], split=1, dtype=ht.float32) + r_a = ht.array([[9, 8], [7, 6], [5, 4], [3, 2]], split=1, dtype=ht.float32) + self.assertTrue(ht.equal(ht.flip(a, (0, -1)), r_a)) + self.assertTrue(ht.equal(ht.flip(a, (-2, -1)), r_a)) + def test_fliplr(self): b = ht.array([[1, 2], [3, 4]]) r_b = ht.array([[2, 1], [4, 3]]) @@ -1119,7 +1138,7 @@ def test_fliplr(self): # test exception a = ht.arange(10) - with self.assertRaises(IndexError): + with self.assertRaises(ValueError): ht.fliplr(a) def test_flipud(self): From f72990bf78542c15add263e15b77f06544ae53e0 Mon Sep 17 00:00:00 2001 From: Heat Release Bot <> Date: Mon, 1 Dec 2025 15:25:32 +0000 Subject: [PATCH 06/15] Bump version to 1.7.0 --- CHANGELOG.md | 16 ++++++++++++++++ heat/core/version.py | 2 +- 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 818a7dde09..eb9cc88bbd 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,3 +1,19 @@ +# v1.7.0 - Heat Minor Release - 1.7.0 +## Changes + +- #2050 [Backport stable] Supporting negative indices for flip operations (by @[github-actions[bot]](https://github.com/apps/github-actions)) +- #2024 [Backport stable] Set correct dtype when loading and saving hdf5 (by @[github-actions[bot]](https://github.com/apps/github-actions)) +- #1979 [Backport stable] Sturdier MPI Check (by @[github-actions[bot]](https://github.com/apps/github-actions)) + +### Bug Fixes + +- #1999 [Backport stable] Bugs/1990 Fix handling of zarr groups (by @[github-actions[bot]](https://github.com/apps/github-actions)) + +## Contributors + +@github-actions[bot] and [github-actions[bot]](https://github.com/apps/github-actions) + + # v1.6.0 ## Highlights diff --git a/heat/core/version.py b/heat/core/version.py index b81bec1221..bb133399f7 100644 --- a/heat/core/version.py +++ b/heat/core/version.py @@ -6,7 +6,7 @@ """Indicates feature extension.""" 
micro: int = 0 """Indicates revisions for bugfixes.""" -extension: str = "dev" +extension: str = None """Indicates special builds, e.g. for specific hardware.""" if not extension: From 2611ca1ce613000a1a6cf9639158c233b96ad5c2 Mon Sep 17 00:00:00 2001 From: Heat Release Bot <> Date: Mon, 1 Dec 2025 15:25:32 +0000 Subject: [PATCH 07/15] Update pytorch image in Dockerfile.release and Dockerfile.source to version --- docker/Dockerfile.release | 2 +- docker/Dockerfile.source | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docker/Dockerfile.release b/docker/Dockerfile.release index 8ead42996a..99e8e14370 100644 --- a/docker/Dockerfile.release +++ b/docker/Dockerfile.release @@ -1,5 +1,5 @@ ARG HEAT_VERSION=latest -ARG PYTORCH_IMG=25.07-py3 +ARG PYTORCH_IMG=25.11-py3 FROM nvcr.io/nvidia/pytorch:${PYTORCH_IMG} AS base COPY ./tzdata.seed /tmp/tzdata.seed diff --git a/docker/Dockerfile.source b/docker/Dockerfile.source index 93a017b359..ceb9fae49b 100644 --- a/docker/Dockerfile.source +++ b/docker/Dockerfile.source @@ -1,4 +1,4 @@ -ARG PYTORCH_IMG=25.07-py3 +ARG PYTORCH_IMG=25.11-py3 ARG HEAT_BRANCH=main FROM nvcr.io/nvidia/pytorch:${PYTORCH_IMG} AS base From 5ab622a7ee6e5ac77f631e5b5de1c71b6f316554 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Guti=C3=A9rrez=20Hermosillo=20Muriedas=2C=20Juan=20Pedro?= Date: Mon, 1 Dec 2025 17:10:13 +0100 Subject: [PATCH 08/15] prep for 1.7.0 --- .github/workflows/inactivity.yml | 2 ++ .pre-commit-config.yaml | 4 ++-- .talismanrc | 2 ++ CHANGELOG.md | 14 +++++++------- CITATION.cff | 10 +++------- README.md | 2 +- .../notebooks/0_setup/0_setup_haicore.ipynb | 4 ---- .../tutorials/notebooks/0_setup/0_setup_jsc.ipynb | 3 --- 8 files changed, 17 insertions(+), 24 deletions(-) create mode 100644 .talismanrc diff --git a/.github/workflows/inactivity.yml b/.github/workflows/inactivity.yml index bf6d997838..f3529d9e42 100644 --- a/.github/workflows/inactivity.yml +++ b/.github/workflows/inactivity.yml @@ -31,3 +31,5 @@ jobs: stale-pr-message: "This pull request is stale because it has been open for 60 days with no activity." close-pr-message: "This pull request was closed because it has been inactive for 60 days since being marked as stale." repo-token: ${{ secrets.GITHUB_TOKEN }} + exempt-issue-labels: "epic,discussion,good first issue,RFC,student project" + exempt-pr-labels: "epic,discussion,good first issue,RFC,student project" diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 684b582c1f..5aecdbab0d 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -13,7 +13,7 @@ repos: - id: check-added-large-files - id: check-toml - repo: https://github.com/pre-commit/mirrors-mypy - rev: v1.18.2 # Use the sha / tag you want to point at + rev: v1.19.0 # Use the sha / tag you want to point at hooks: - id: mypy args: [--config-file, pyproject.toml, --ignore-missing-imports] @@ -26,7 +26,7 @@ repos: - repo: https://github.com/astral-sh/ruff-pre-commit # Ruff version. - rev: v0.14.6 + rev: v0.14.7 hooks: # Run the linter. 
- id: ruff diff --git a/.talismanrc b/.talismanrc new file mode 100644 index 0000000000..2fa4d07d95 --- /dev/null +++ b/.talismanrc @@ -0,0 +1,2 @@ +allowed_patterns: +- 'uses: [A-Za-z-\/]+@[\w\d]+ # v\d+\.\d+\.\d+' diff --git a/CHANGELOG.md b/CHANGELOG.md index eb9cc88bbd..f0821824c6 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,18 +1,18 @@ # v1.7.0 - Heat Minor Release - 1.7.0 ## Changes -- #2050 [Backport stable] Supporting negative indices for flip operations (by @[github-actions[bot]](https://github.com/apps/github-actions)) -- #2024 [Backport stable] Set correct dtype when loading and saving hdf5 (by @[github-actions[bot]](https://github.com/apps/github-actions)) -- #1979 [Backport stable] Sturdier MPI Check (by @[github-actions[bot]](https://github.com/apps/github-actions)) ### Bug Fixes -- #1999 [Backport stable] Bugs/1990 Fix handling of zarr groups (by @[github-actions[bot]](https://github.com/apps/github-actions)) +* Sturdier MPI+GPU compatibility check by @JuanPedroGHM in https://github.com/helmholtz-analytics/heat/pull/1979 +* Fix handling of zarr groups by @ClaudiaComito in https://github.com/helmholtz-analytics/heat/pull/1990 +* Supporting negative indices for flip operations by @Marc-Jindra in https://github.com/helmholtz-analytics/heat/pull/2030 +* Fixed issue where matrices returned by ```eigh``` were not on the expected device by @GioPede in https://github.com/helmholtz-analytics/heat/pull/2046 +* Fixed issue where matrices returned by ```qr``` were not on the expected device by @GioPede in https://github.com/helmholtz-analytics/heat/pull/2045 +* Dtype is now set correctly when loading and saving hdf5 files by @Marc-Jindra in https://github.com/helmholtz-analytics/heat/pull/2014 ## Contributors - -@github-actions[bot] and [github-actions[bot]](https://github.com/apps/github-actions) - +@Marc-Jindra, @ClaudiaComito, @JuanPedroGHM # v1.6.0 ## Highlights diff --git a/CITATION.cff b/CITATION.cff index b09e7f80a5..99e317226a 100644 --- a/CITATION.cff +++ b/CITATION.cff @@ -7,19 +7,15 @@ message: 'If you use this software, please cite it as below.' type: software authors: # release highlights + - family-names: Comito + given-names: Claudia - family-names: Hoppe given-names: Fabian - family-names: Gutiérrez Hermosillo Muriedas given-names: Juan Pedro - - family-names: Palazoglu - given-names: Berkant - - family-names: Fischer - given-names: Carola +# active contributors in alphabetic order - family-names: Akdag given-names: Hakan - - family-names: Comito - given-names: Claudia -# active contributors in alphabetic order - family-names: Hees given-names: Jörn - family-names: Jindra diff --git a/README.md b/README.md index 0f6ca711d1..0ad84becf8 100644 --- a/README.md +++ b/README.md @@ -19,7 +19,7 @@ Heat is a distributed tensor framework for high performance data analytics. 
[![OpenSSF Scorecard](https://api.securityscorecards.dev/projects/github.com/helmholtz-analytics/heat/badge)](https://securityscorecards.dev/viewer/?uri=github.com/helmholtz-analytics/heat) [![OpenSSF Best Practices](https://bestpractices.coreinfrastructure.org/projects/7688/badge)](https://bestpractices.coreinfrastructure.org/projects/7688) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.2531472.svg)](https://doi.org/10.5281/zenodo.2531472) -[![Benchmarks](https://img.shields.io/badge/Grafana-Benchmarks-2ea44f)](https://57bc8d92-72f2-4869-accd-435ec06365cb.ka.bw-cloud-instance.org:3000/d/adjpqduq9r7k0a/heat-cb?orgId=1) +[![Benchmarks](https://img.shields.io/badge/Grafana-Benchmarks-2ea44f)](https://930000e0-e69a-4939-912e-89a92316b420.ka.bw-cloud-instance.org/grafana) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) [![JuRSE Code Pick of the Month](https://img.shields.io/badge/JuRSE_Code_Pick-August_2024-blue)](https://www.fz-juelich.de/en/rse/jurse-community/jurse-code-of-the-month/august-2024) diff --git a/doc/source/tutorials/notebooks/0_setup/0_setup_haicore.ipynb b/doc/source/tutorials/notebooks/0_setup/0_setup_haicore.ipynb index 6e4662a701..90758679bd 100644 --- a/doc/source/tutorials/notebooks/0_setup/0_setup_haicore.ipynb +++ b/doc/source/tutorials/notebooks/0_setup/0_setup_haicore.ipynb @@ -36,11 +36,7 @@ "tags": [] }, "source": [ - "\n", - "\n", - "\n", "## Introduction\n", - "---\n", "
\n", "Note:\n", "This notebook expects that you will be working on the JupyterLab hosted in HAICORE, at the Karlsruhe Institute of Technology.\n", diff --git a/doc/source/tutorials/notebooks/0_setup/0_setup_jsc.ipynb b/doc/source/tutorials/notebooks/0_setup/0_setup_jsc.ipynb index ee00ae6115..9ad18751a9 100644 --- a/doc/source/tutorials/notebooks/0_setup/0_setup_jsc.ipynb +++ b/doc/source/tutorials/notebooks/0_setup/0_setup_jsc.ipynb @@ -20,13 +20,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "\n", "
\n", " \n", "
\n", "\n", "## Introduction\n", - "---\n", "
\n", "Note:\n", "This tutorial is designed to run on Jupyter-JSC, a JupyterLab environment provided by the Jülich Supercomputing Centre. \n", @@ -156,7 +154,6 @@ "metadata": {}, "source": [ "## What is Heat for?\n", - "---\n", "\n", "[**deRSE24 NOTE**: do attend Fabian Hoppe's talk [TODAY at 16:30](https://events.hifis.net/event/994/contributions/7940/) for more details, benchmarks, and an overview of the parallel Python ecosystem.] \n", "\n", From 1cc54821b8f464a427e6c5efb4c4748a89d0c030 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Wed, 10 Dec 2025 12:47:10 +0100 Subject: [PATCH 09/15] Add device parameter to QR output arrays (#2051) Update QR decomposition to include device parameter for R and Q arrays. (cherry picked from commit bd5df7cf283d0bedf48f954aed5d9aeebe294b82) Co-authored-by: Giovanni Pederiva Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> --- heat/core/linalg/qr.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/heat/core/linalg/qr.py b/heat/core/linalg/qr.py index 4ca0c3fc01..d28183f297 100644 --- a/heat/core/linalg/qr.py +++ b/heat/core/linalg/qr.py @@ -107,7 +107,7 @@ def qr( if not A.is_distributed() or A.split < A.ndim - 2: # handle the case of a single process or split=None: just PyTorch QR Q, R = single_proc_qr(A.larray, mode=mode) - R = factories.array(R, is_split=A.split) + R = factories.array(R, is_split=A.split, device=A.device) if mode == "reduced": Q = factories.array(Q, is_split=A.split, device=A.device) else: From 38df20877e14642d808d56f26a6840779088a51f Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Wed, 10 Dec 2025 13:30:20 +0100 Subject: [PATCH 10/15] Handling of unknown MPI Libraries (#2032) (#2060) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * fix: Added 'other' as option for mpi library * fix: corrected parastation configuration * Update heat/core/_config.py --------- (cherry picked from commit 50080e08c8cea0aef45bfe50be87032feb0c334e) Co-authored-by: Juan Pedro Gutiérrez Hermosillo Muriedas Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> --- heat/core/_config.py | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/heat/core/_config.py b/heat/core/_config.py index 48d0a3e22b..c82063449e 100644 --- a/heat/core/_config.py +++ b/heat/core/_config.py @@ -19,7 +19,8 @@ class MPILibrary(Enum): MVAPICH = "mvapich" MPICH = "mpich" CrayMPI = "craympi" - ParastationMPI = "psmpi" + ParaStationMPI = "psmpi" + Other = "other" @dataclasses.dataclass @@ -37,9 +38,12 @@ def _get_mpi_library() -> MPILibraryInfo: return MPILibraryInfo(MPILibrary.IntelMPI, library[3]) case ["MPICH", "Version:", *_]: return MPILibraryInfo(MPILibrary.MPICH, library[2]) - ### Missing libraries + case ["MVAPICH", "Version:", *_]: + return MPILibraryInfo(MPILibrary.MVAPICH, library[2]) + case ["===", "ParaStation", "MPI", *_]: + return MPILibraryInfo(MPILibrary.ParaStationMPI, library[3]) case _: - print("Did not find a matching library") + return MPILibraryInfo(MPILibrary.Other, "unknown") def _check_gpu_aware_mpi(library: MPILibraryInfo) -> tuple[bool, bool]: @@ -81,7 +85,7 @@ def _check_gpu_aware_mpi(library: MPILibraryInfo) -> tuple[bool, bool]: cuda = os.environ.get("MPICH_GPU_SUPPORT_ENABLED") == "1" rocm = 
os.environ.get("MPICH_GPU_SUPPORT_ENABLED") == "1" return cuda, rocm - case MPILibrary.ParastationMPI: + case MPILibrary.ParaStationMPI: cuda = os.environ.get("PSP_CUDA") == "1" rocm = False return cuda, rocm From b0961b0daecc087a9b25f2098744becc8d40d17b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Guti=C3=A9rrez=20Hermosillo=20Muriedas=2C=20Juan=20Pedro?= Date: Fri, 19 Dec 2025 16:19:04 +0100 Subject: [PATCH 11/15] changelog --- CHANGELOG.md | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index f0821824c6..5c18de1608 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,6 +1,21 @@ # v1.7.0 - Heat Minor Release - 1.7.0 + +## Highlights + +1) Randomized Symmetric eignevalue decomposition (eigh) +2) DistributedSampler for efficient data loading and shuffling across multiple nodes with PyTorch +3) Incremental SVD directly from an HDF5 file +4) Partial support of the Array API Standard (version: '2020.10'), and API namespace under `x.__array_namespace__(api_version='2020.10')` +5) Distributed PTP (peak to peak) function + ## Changes +### Features +* Randomized Symmetric Eigenvalue Decomposition (eigh) by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1964 +* Incremental SVD directly from HDF5 file by @LScheib in https://github.com/helmholtz-analytics/heat/pull/2005 +* Array API Namespace by @mtar in https://github.com/helmholtz-analytics/heat/pull/1022 +* Distributed Peak to Peak (ptp) function by @ivansherbakov9 in https://github.com/helmholtz-analytics/heat/pull/1954 +* PyTorch compatible DistributedSampler by @Berkant03 in https://githubcom/helmholtz-analytics/heat/pull/1807 ### Bug Fixes @@ -10,6 +25,9 @@ * Fixed issue where matrices returned by ```eigh``` were not on the expected device by @GioPede in https://github.com/helmholtz-analytics/heat/pull/2046 * Fixed issue where matrices returned by ```qr``` were not on the expected device by @GioPede in https://github.com/helmholtz-analytics/heat/pull/2045 * Dtype is now set correctly when loading and saving hdf5 files by @Marc-Jindra in https://github.com/helmholtz-analytics/heat/pull/2014 +* Fix MPI large count issues when respliting by @JuanPedroGHM in https://github.com/helmholtz-analytics/heat/pull/1973 +* Default GPU+MPI compatibility settings for unknown MPI implementations by @JuanPedroGHM in https://github.com/helmholtz-analytics/heat/pull/2060 + ## Contributors @Marc-Jindra, @ClaudiaComito, @JuanPedroGHM From 70efcf924e7ded25a9ec23957fb30020f2add4d9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Juan=20Pedro=20Guti=C3=A9rrez=20Hermosillo=20Muriedas?= Date: Mon, 22 Dec 2025 15:03:11 +0100 Subject: [PATCH 12/15] Update bug_report.yml --- .github/ISSUE_TEMPLATE/bug_report.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/ISSUE_TEMPLATE/bug_report.yml b/.github/ISSUE_TEMPLATE/bug_report.yml index 12a163dde4..93ea99fb6a 100644 --- a/.github/ISSUE_TEMPLATE/bug_report.yml +++ b/.github/ISSUE_TEMPLATE/bug_report.yml @@ -34,7 +34,7 @@ body: description: What version of Heat are you running? 
      options:
        - main (development branch)
-       - 1.6.x
+       - 1.7.x
        - other
    validations:
      required: true

From 7100ea70737e31910443a70c7459ee1f2e36ec28 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Guti=C3=A9rrez=20Hermosillo=20Muriedas=2C=20Juan=20Pedro?=
Date: Mon, 22 Dec 2025 15:04:44 +0100
Subject: [PATCH 13/15] final changelog, citation and readme

---
 CHANGELOG.md | 17 ++++++++++++-----
 CITATION.cff |  6 ++++--
 README.md    |  5 +++--
 3 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 5c18de1608..dd5a6440d1 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,21 +2,25 @@

 ## Highlights

-1) Randomized Symmetric eigenvalue decomposition (eigh)
-2) DistributedSampler for efficient data loading and shuffling across multiple nodes with PyTorch
+1) DistributedSampler for efficient data loading and shuffling across multiple nodes with PyTorch
+2) Randomized Symmetric eigenvalue decomposition (eigh)
 3) Incremental SVD directly from an HDF5 file
 4) Partial support of the Array API Standard (version: '2020.10'), and an API namespace under `x.__array_namespace__(api_version='2020.10')`
 5) Distributed PTP (peak to peak) function

+*SVD, PCA, and DMD have been implemented within the project ESAPCA funded by the European Space Agency (ESA). This support is gratefully acknowledged.*
+
 ## Changes

 ### Features
-
 * Randomized Symmetric Eigenvalue Decomposition (eigh) by @mrfh92 in https://github.com/helmholtz-analytics/heat/pull/1964
 * Incremental SVD directly from HDF5 file by @LScheib in https://github.com/helmholtz-analytics/heat/pull/2005
-* Array API Namespace by @mtar in https://github.com/helmholtz-analytics/heat/pull/1022
 * Distributed Peak to Peak (ptp) function by @ivansherbakov9 in https://github.com/helmholtz-analytics/heat/pull/1954
 * PyTorch compatible DistributedSampler by @Berkant03 in https://github.com/helmholtz-analytics/heat/pull/1807

+### Interoperability
+* Support PyTorch 2.9.1 by @github-actions[bot] in https://github.com/helmholtz-analytics/heat/pull/2001
+* Array API Namespace by @mtar in https://github.com/helmholtz-analytics/heat/pull/1022
+
 ### Bug Fixes

 * Sturdier MPI+GPU compatibility check by @JuanPedroGHM in https://github.com/helmholtz-analytics/heat/pull/1979
@@ -30,7 +34,10 @@

 ## Contributors

-@Marc-Jindra, @ClaudiaComito, @JuanPedroGHM
+@Marc-Jindra, @ClaudiaComito, @JuanPedroGHM, @GioPede, @ivansherbakov9, @LScheib, @Berkant03, @mrfh92, @mtar
+
+#### Acknowledgement and Disclaimer
+*This work is partially carried out under a [programme](https://activities.esa.int/index.php/4000144045) of, and funded by, the European Space Agency. Any view expressed in this repository or related publications can in no way be taken to reflect the official opinion of the European Space Agency.*

 # v1.6.0
 ## Highlights
diff --git a/CITATION.cff b/CITATION.cff
index 99e317226a..1e8e02c9c0 100644
--- a/CITATION.cff
+++ b/CITATION.cff
@@ -7,10 +7,12 @@ message: 'If you use this software, please cite it as below.'
 type: software
 authors:
 # release highlights
-  - family-names: Comito
-    given-names: Claudia
   - family-names: Hoppe
     given-names: Fabian
+  - family-names: Palazoglu
+    given-names: Berkant
+  - family-names: Comito
+    given-names: Claudia
   - family-names: Gutiérrez Hermosillo Muriedas
     given-names: Juan Pedro
 # active contributors in alphabetic order
diff --git a/README.md b/README.md
index 0ad84becf8..9a0866ca69 100644
--- a/README.md
+++ b/README.md
@@ -118,6 +118,7 @@ computational and memory needs of your laptop and desktop.
 ### Parallel I/O
 - h5py
 - netCDF4
+- zarr

 ### GPU support
 In order to do computations on your GPU(s):
@@ -132,10 +133,10 @@ On most HPC-systems you will not be able to install/compile MPI or CUDA/ROCm you
 Install the latest version with

 ```bash
-pip install heat[hdf5,netcdf]
+pip install heat[hdf5,netcdf,zarr]
 ```

 where the part in brackets is a list of optional dependencies. You can omit
-it, if you do not need HDF5 or NetCDF support.
+it if you do not need HDF5, NetCDF, or Zarr support.

 ## **conda**

From b1a60b8fd550166d1f7005a0125e0f9a886083d4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Guti=C3=A9rrez=20Hermosillo=20Muriedas=2C=20Juan=20Pedro?=
Date: Mon, 22 Dec 2025 15:36:55 +0100
Subject: [PATCH 14/15] fix: removed p3.14 tests

---
 .github/workflows/ci.yaml |  8 ++++----
 CHANGELOG.md              |  2 +-
 CITATION.cff              | 18 +++++++++---------
 3 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml
index af57812b6c..55fa7882c6 100644
--- a/.github/workflows/ci.yaml
+++ b/.github/workflows/ci.yaml
@@ -27,10 +27,10 @@ jobs:
           - 'torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1'
           - 'torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0'
           - 'torch==2.9.1 torchvision==0.24.1 torchaudio==2.9.1'
-        include:
-          - py-version: '3.14'
-            pytorch-version: 'torch==2.9.1 torchvision==0.24.1 torchaudio==2.9.1'
-            install-options: '.'
+        # include:
+        #   - py-version: '3.14'
+        #     pytorch-version: 'torch==2.9.1 torchvision==0.24.1 torchaudio==2.9.1'
+        #     install-options: '.'
         exclude:
           - py-version: '3.13'
             pytorch-version: 'numpy==1.26 torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2'
diff --git a/CHANGELOG.md b/CHANGELOG.md
index dd5a6440d1..a4bf58db96 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,4 +1,4 @@
-# v1.7.0 - Heat Minor Release - 1.7.0
+# v1.7.0

 ## Highlights

diff --git a/CITATION.cff b/CITATION.cff
index 1e8e02c9c0..225160bac8 100644
--- a/CITATION.cff
+++ b/CITATION.cff
@@ -7,17 +7,21 @@ message: 'If you use this software, please cite it as below.'
type: software authors: # release highlights - - family-names: Hoppe - given-names: Fabian - family-names: Palazoglu given-names: Berkant + - family-names: Hoppe + given-names: Fabian + - family-names: Scheib + given-names: Lukas + - family-names: Tarnawa + given-names: Michael +# active contributors in alphabetic order + - family-names: Akdag + given-names: Hakan - family-names: Comito given-names: Claudia - family-names: Gutiérrez Hermosillo Muriedas given-names: Juan Pedro -# active contributors in alphabetic order - - family-names: Akdag - given-names: Hakan - family-names: Hees given-names: Jörn - family-names: Jindra @@ -28,10 +32,6 @@ authors: given-names: Kai - family-names: Lemmen given-names: Jonas - - family-names: Scheib - given-names: Lukas - - family-names: Tarnawa - given-names: Michael # historic core team - family-names: Coquelin given-names: Daniel From 3f78649d5a93597fe7677068cb1d17ca497fc10d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Guti=C3=A9rrez=20Hermosillo=20Muriedas=2C=20Juan=20Pedro?= Date: Fri, 9 Jan 2026 14:37:51 +0100 Subject: [PATCH 15/15] fix: dev version number --- heat/core/version.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/heat/core/version.py b/heat/core/version.py index bb133399f7..ea6b038234 100644 --- a/heat/core/version.py +++ b/heat/core/version.py @@ -2,11 +2,11 @@ major: int = 1 """Indicates Heat's main version.""" -minor: int = 7 +minor: int = 8 """Indicates feature extension.""" micro: int = 0 """Indicates revisions for bugfixes.""" -extension: str = None +extension: str = "dev" """Indicates special builds, e.g. for specific hardware.""" if not extension:
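
Usage note for [PATCH 09/15]: after this fix, both factors returned by `ht.linalg.qr` follow the input array's device in the single-process/split=None path, not only `Q`. A minimal sketch, assuming a GPU-enabled Heat build; the array sizes and the `"gpu"` device string are illustrative:

```python
import heat as ht

# Assumes Heat was built with CUDA/ROCm support; sizes are arbitrary.
A = ht.random.randn(1000, 100, split=0, device="gpu")
Q, R = ht.linalg.qr(A)

# Before #2051 the non-distributed code path created R without a device
# argument, so it could land on the default (CPU) device.
assert Q.device == A.device
assert R.device == A.device
```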
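Background for [PATCH 10/15]: `_get_mpi_library()` tokenizes the banner returned by `MPI.Get_library_version()` and structurally matches its leading tokens; unknown vendors now map to `MPILibrary.Other` instead of falling through with only a print and an implicit `None`. A self-contained sketch of that dispatch — the banner strings and the `classify` helper are illustrative, not Heat API:

```python
from enum import Enum


class MPILibrary(Enum):
    MPICH = "mpich"
    ParaStationMPI = "psmpi"
    Other = "other"


def classify(banner: str) -> MPILibrary:
    # Match on the leading tokens of the vendor banner, as in heat/core/_config.py.
    match banner.split():
        case ["MPICH", "Version:", *_]:
            return MPILibrary.MPICH
        case ["===", "ParaStation", "MPI", *_]:
            return MPILibrary.ParaStationMPI
        case _:
            # Unknown vendors no longer fall through without a return value.
            return MPILibrary.Other


print(classify("MPICH Version: 4.1.2"))  # MPILibrary.MPICH
print(classify("SomeVendorMPI 1.0"))     # MPILibrary.Other
```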
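Note on [PATCH 15/15]: bumping `minor` to 8 and setting `extension = "dev"` reopens the development cycle after the 1.7.0 release. The diff is cut off above at `if not extension:`; the sketch below shows how such fields are typically assembled into a version string and is an assumption, not the verbatim remainder of `heat/core/version.py`:

```python
# Hypothetical assembly of the version fields shown in the diff above.
major, minor, micro = 1, 8, 0
extension = "dev"

if not extension:
    __version__ = f"{major}.{minor}.{micro}"  # plain release, e.g. "1.7.0"
else:
    __version__ = f"{major}.{minor}.{micro}-{extension}"  # e.g. "1.8.0-dev"

print(__version__)
```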