helmholtz-analytics · JuanPedroGHM · Nov 7, 2025
diff --git a/.gitignore b/.gitignore
@@ -309,3 +309,5 @@ perun_results/
 bench_data/
 my_dev_stuff/
 docs/source/autoapi
+
+.DS_Store
diff --git a/.readthedocs.yaml b/.readthedocs.yaml
@@ -1,28 +1,13 @@
-# .readthedocs.yaml
-# Read the Docs configuration file
-# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
-
-# Required
 version: 2
 
-# Set the version of Python and other tools you might need
 build:
   os: ubuntu-22.04
   tools:
-    python: "3.11"
-  apt_packages:
-    - pandoc
-    - libopenmpi-dev
-
-# Build documentation in the docs/ directory with Sphinx
-sphinx:
-  configuration: doc/source/conf.py
+    python: "3.12"
 
-# We recommend specifying your dependencies to enable reproducible builds:
-# https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
 python:
   install:
-  - method: pip
-    path: .
-    extra_requirements:
-    - docs
+  - requirements: doc/requirements.txt
+
+mkdocs:
+  configuration: mkdocs.yml
diff --git a/.talismanrc b/.talismanrc
@@ -0,0 +1,27 @@
+allowed_patterns:
+- 'uses: [A-Za-z-\/]+@[\w\d]+ # v\d+\.\d+\.\d+'
+fileignoreconfig:
+- filename: doc/api/heat/core/communication.md
+  checksum: bc3c94844563914d8699c1402a4abde3299fd116770d7ef5e1c070a033f4eacf
+- filename: doc/api/heat/core/random.md
+  checksum: c1eb4ea8d8435c712639e949296411aff4789c2d89751fd2951019bbe6bfc1da
+- filename: doc/api/heat/core/tiling.md
+  checksum: a64c827bbf08bd0f61470d3352ee9e711b23521aea8944854c34c7c00c393f33
+- filename: doc/api/heat/core/io.md
+  checksum: f5289cd5507487a1cfd432a038083c18b53b30b338a2a6187116b1ee3d821536
+- filename: doc/api/heat/core/exponential.md
+  checksum: 2a840d5c6bb43caada5ed7b4aa92739fdfa9523ce50fc31f606ddc1dd00bcddf
+- filename: doc/api/heat/core/factories.md
+  checksum: 3be4e1f2ec3ffc1fc55bca4a082eff60cb2426917814dcb0c6789b60f5995c27
+- filename: doc/api/heat/core/linalg/basics.md
+  checksum: 59448ee5640ea8007856ce844e68faf195571ba99ec4588ed9a20cf310c58d1f
+- filename: doc/api/heat/core/dndarray.md
+  checksum: 904826a791036d03728861a1d432faa95f3d4f9a358f678baea4ada7fddb789f
+- filename: doc/api/heat/graph/laplacian.md
+  checksum: d8d4ec48750ae7c5d34e11ff7e6a79157f87a220accd57f4d8fa4d1a82254605
+- filename: doc/api/heat/sparse/factories.md
+  checksum: 7f57c6834ad98632f9a5e599c9fb9ec927c391fa8d44041d8a6efb1806c6a98e
+- filename: doc/api/heat/optim/dp_optimizer.md
+  checksum: 47492db3eb665a09e16b9346487925554f2c96d020f3d8a0d2180a5f0b5d511b
+- filename: doc/api/heat/optim/index.md
+  checksum: 0965bdff7bc43743551b0043720b525cc0992247ed4840a69228d4fe28ebb812
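The `allowed_patterns` entry presumably stops Talisman from flagging commit-SHA-pinned GitHub Actions references, whose long hex SHAs resemble secrets. The snippet below checks which lines the pattern accepts, using Python's `re` as a stand-in. Talisman itself is written in Go, and we assume, without having confirmed it, that the pattern is searched for within individual lines. The pattern uses only syntax that should behave the same in both engines. `is_allowed` is a hypothetical helper, not part of Talisman.

```python
import re

# Pattern copied verbatim from `allowed_patterns` in .talismanrc (single-quoted YAML,
# so the backslashes reach the regex engine unchanged).
ALLOWED_USES = re.compile(r"uses: [A-Za-z-\/]+@[\w\d]+ # v\d+\.\d+\.\d+")


def is_allowed(line: str) -> bool:
    """True if the line contains a `uses: owner/action@<ref> # vX.Y.Z` reference."""
    return ALLOWED_USES.search(line) is not None


# A SHA-pinned action followed by a version comment matches the pattern.
print(is_allowed("      - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1"))  # True
# A tag-pinned reference without the trailing version comment does not.
print(is_allowed("      - uses: actions/checkout@v4"))  # False
```

Note that the character class `[A-Za-z-\/]` admits no digits, dots, or underscores, so an action path such as `owner/action2` would not match even with a version comment.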
diff --git a/doc/api/heat/classification/index.md b/doc/api/heat/classification/index.md
@@ -0,0 +1,8 @@
+Module heat.classification
+==========================
+Provides classification algorithms.
+
+Sub-modules
+-----------
+* heat.classification.kneighborsclassifier
+* heat.classification.tests
diff --git a/doc/api/heat/classification/kneighborsclassifier.md b/doc/api/heat/classification/kneighborsclassifier.md
@@ -0,0 +1,75 @@
+Module heat.classification.kneighborsclassifier
+===============================================
+Implements the k-nearest neighbors (kNN) classifier
+
+Classes
+-------
+
+`KNeighborsClassifier(n_neighbors: int = 5, effective_metric_: Callable = None)`
+:   Implementation of the k-nearest-neighbors algorithm [1].
+
+    This algorithm predicts labels for data vectors by using a labeled training dataset as reference. Each input vector
+    to be predicted is compared to the training vectors by computing the distance between them (the Euclidean distance
+    by default). The predicted class is the majority vote among the labels of the k nearest, i.e. closest, training
+    vectors.
+
+    Parameters
+    ----------
+    n_neighbors : int, optional, default: 5
+        Number of neighbors to consider when choosing a label.
+    effective_metric_ : Callable, optional
+        The distance function used to identify the nearest neighbors, defaults to the Euclidean distance.
+
+    References
+    ----------
+    [1] T. Cover and P. Hart, "Nearest Neighbor Pattern Classification," in IEEE Transactions on Information Theory,
+    vol. 13, no. 1, pp. 21-27, January 1967, doi: 10.1109/TIT.1967.1053964.
+
+    ### Ancestors (in MRO)
+
+    * heat.core.base.BaseEstimator
+    * heat.core.base.ClassificationMixin
+
+    ### Static methods
+
+    `one_hot_encoding(x: heat.core.dndarray.DNDarray) ‑> heat.core.dndarray.DNDarray`
+    :   One-hot-encodes the passed vector or single-column matrix.
+
+        Parameters
+        ----------
+        x : DNDarray
+            The data to be encoded.
+
+    ### Methods
+
+    `fit(self, x: heat.core.dndarray.DNDarray, y: heat.core.dndarray.DNDarray)`
+    :   Fit the k-nearest neighbors classifier from the training dataset.
+
+        Parameters
+        ----------
+        x : DNDarray
+            Labeled training vectors used for comparison in predictions, Shape=(n_samples, n_features).
+        y : DNDarray
+            Corresponding labels for the training feature vectors. Must have the same number of samples as ``x``.
+            Shape=(n_samples,) for integer labels or Shape=(n_samples, n_classes) if one-hot-encoded.
+
+        Raises
+        ------
+        TypeError
+            If ``x`` or ``y`` are not DNDarrays.
+        ValueError
+            If ``x`` and ``y`` shapes mismatch or are not two-dimensional matrices.
+
+        Examples
+        --------
+        >>> samples = ht.rand(10, 3)
+        >>> labels = ht.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])
+        >>> knn = KNeighborsClassifier(n_neighbors=1)
+        >>> knn.fit(samples, labels)
+
+    `predict(self, x: heat.core.dndarray.DNDarray) ‑> heat.core.dndarray.DNDarray`
+    :   Predict the class labels for the provided data.
+
+        Parameters
+        ----------
+        x : DNDarray
+            The test samples.
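The majority-vote rule described for `KNeighborsClassifier` can be illustrated with a short pure-Python sketch. It runs in a single process and does not use Heat or `DNDarray`s; the helpers `one_hot` and `knn_predict` are hypothetical, not Heat's API. Counting votes by summing one-hot rows is one way to realize the vote, chosen here because it also illustrates what `one_hot_encoding` produces; we do not claim it mirrors Heat's internals.

```python
import math


def one_hot(labels):
    """One-hot-encode integer class labels 0..C-1 as a list of rows (hypothetical helper)."""
    n_classes = max(labels) + 1
    return [[1 if c == label else 0 for c in range(n_classes)] for label in labels]


def knn_predict(train_x, train_y, test_x, n_neighbors=5, metric=math.dist):
    """Majority-vote kNN prediction, Euclidean distance by default (hypothetical helper)."""
    encoded = one_hot(train_y)
    n_classes = len(encoded[0])
    predictions = []
    for query in test_x:
        # Rank all training vectors by their distance to the query.
        order = sorted(range(len(train_x)), key=lambda i: metric(query, train_x[i]))
        nearest = order[:n_neighbors]
        # Summing the one-hot rows of the k nearest neighbors gives one vote count per class.
        votes = [sum(encoded[i][c] for i in nearest) for c in range(n_classes)]
        # argmax over the votes; on a tie, max() keeps the smallest class index.
        predictions.append(max(range(n_classes), key=votes.__getitem__))
    return predictions


train_x = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
train_y = [0, 0, 1, 1]
print(knn_predict(train_x, train_y, [[0.05, 0.1], [4.9, 5.2]], n_neighbors=3))  # [0, 1]
```

Ties are resolved toward the smallest class index in this sketch; the docstring above does not say how Heat breaks ties.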
diff --git a/doc/api/heat/classification/tests/index.md b/doc/api/heat/classification/tests/index.md
@@ -0,0 +1,6 @@
+Module heat.classification.tests
+================================
+
+Sub-modules
+-----------
+* heat.classification.tests.test_knn
diff --git a/doc/api/heat/classification/tests/test_knn.md b/doc/api/heat/classification/tests/test_knn.md
@@ -0,0 +1,63 @@
+Module heat.classification.tests.test_knn
+=========================================
+
+Classes
+-------
+
+`TestKNN(methodName='runTest')`
+:   A class whose instances are single test cases.
+
+    By default, the test code itself should be placed in a method named
+    'runTest'.
+
+    If the fixture may be used for many test cases, create as
+    many test methods as are needed. When instantiating such a TestCase
+    subclass, specify in the constructor arguments the name of the test method
+    that the instance is to execute.
+
+    Test authors should subclass TestCase for their own tests. Construction
+    and deconstruction of the test's environment ('fixture') can be
+    implemented by overriding the 'setUp' and 'tearDown' methods respectively.
+
+    If it is necessary to override the __init__ method, the base class
+    __init__ method must always be called. It is important that subclasses
+    should not change the signature of their __init__ method, since instances
+    of the classes are instantiated automatically by parts of the framework
+    in order to be run.
+
+    When subclassing TestCase, you can set these attributes:
+    * failureException: determines which exception will be raised when
+        the instance's assertion methods fail; test methods raising this
+        exception will be deemed to have 'failed' rather than 'errored'.
+    * longMessage: determines whether long messages (including repr of
+        objects used in assert methods) will be printed on failure in *addition*
+        to any explicit message passed.
+    * maxDiff: sets the maximum length of a diff in failure messages
+        by assert methods using difflib. It is looked up as an instance
+        attribute so can be configured by individual tests if required.
+
+    Create an instance of the class that will use the named test
+    method when executed. Raises a ValueError if the instance does
+    not have a method with the specified name.
+
+    ### Ancestors (in MRO)
+
+    * heat.core.tests.test_suites.basic_test.TestCase
+    * unittest.case.TestCase
+
+    ### Methods
+
+    `test_exception(self)`
+    :
+
+    `test_fit_one_hot(self)`
+    :
+
+    `test_split_none(self)`
+    :
+
+    `test_split_zero(self)`
+    :
+
+    `test_utility(self)`
+    :
diff --git a/doc/api/heat/cli.md b/doc/api/heat/cli.md
@@ -0,0 +1,12 @@
+Module heat.cli
+===============
+Heat command line interface module.
+
+Functions
+---------
+
+`cli() ‑> None`
+:   Command line interface entrypoint.
+
+`plaform_info()`
+:   Print the current software stack being used by heat, including available devices.
diff --git a/doc/api/heat/cluster/batchparallelclustering.md b/doc/api/heat/cluster/batchparallelclustering.md
@@ -0,0 +1,82 @@
+Module heat.cluster.batchparallelclustering
+===========================================
+Module implementing some clustering algorithms that work in parallel on batches of data.
+
+Classes
+-------
+
+`BatchParallelKMeans(n_clusters: int = 8, init: str = 'k-means++', max_iter: int = 300, tol: float = 0.0001, random_state: int = None, n_procs_to_merge: int = None)`
+:   Batch-parallel K-Means clustering algorithm from Ref. [1].
+    The input must be a ``DNDarray`` of shape `(n_samples, n_features)`, with split=0 (i.e. split along the sample axis).
+    This method performs K-Means clustering on each batch (i.e. on each process-local chunk) of data individually and in parallel.
+    After that, all centroids from the local K-Means runs are gathered, and another K-Means is run on them to determine the final centroids.
+    To keep this approach scalable on large numbers of processes, the procedure can be applied hierarchically via the parameter `n_procs_to_merge`.
+
+    Attributes
+    ----------
+    n_clusters : int
+        The number of clusters to form as well as the number of centroids to generate.
+    init : str
+        Initialization method for the local and global k-means:
+        - 'k-means++' : selects initial cluster centers in a smart way to speed up convergence [2].
+        - 'random' : chooses k observations (rows) at random from the data as initial centroids (not implemented yet).
+    max_iter : int
+        Maximum number of iterations of the local and global k-means algorithms.
+    tol : float
+        Relative tolerance with regard to inertia to declare convergence, for both local and global k-means.
+    random_state : int
+        Determines random number generation for centroid initialization.
+    n_procs_to_merge : int
+        Number of processes to merge after each iteration of the local k-means. If None, all processes are merged after each iteration.
+
+
+    References
+    ----------
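+    [1] Rasim M. Alguliyev, Ramiz M. Aliguliyev, Lyudmila V. Sukhostat, Parallel batch k-means for Big data clustering, Computers & Industrial Engineering, Volume 152 (2021). https://doi.org/10.1016/j.cie.2020.107023.
+    [2] David Arthur, Sergei Vassilvitskii, k-means++: The Advantages of Careful Seeding, Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2007.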
+
+    ### Ancestors (in MRO)
+
+    * heat.cluster.batchparallelclustering._BatchParallelKCluster
+    * heat.core.base.ClusteringMixin
+    * heat.core.base.BaseEstimator
+
+`BatchParallelKMedians(n_clusters: int = 8, init: str = 'k-medians++', max_iter: int = 300, tol: float = 0.0001, random_state: int = None, n_procs_to_merge: int = None)`
+:   Batch-parallel K-Medians clustering algorithm, in analogy to the K-means algorithm from Ref. [1].
+    This requires data to be given as DNDarray of shape (n_samples, n_features) with split=0 (i.e. split along the sample axis).
+    The idea of the method is to perform the classical K-Medians on each batch of data (i.e. on each process-local chunk of data) individually and in parallel.
+    After that, all centroids from the local K-Medians runs are gathered, and another K-Medians is run on them to determine the final centroids.
+    To keep this approach scalable on large numbers of processes, the procedure can be applied hierarchically via the parameter `n_procs_to_merge`.
+
+    Attributes
+    ----------
+    n_clusters : int
+        The number of clusters to form as well as the number of centroids to generate.
+    init : str
+        Initialization method for the local and global k-medians:
+        - 'k-medians++' : selects initial cluster centers in a smart way to speed up convergence [2].
+        - 'random' : chooses k observations (rows) at random from the data as initial centroids (not implemented yet).
+    max_iter : int
+        Maximum number of iterations of the local and global k-medians algorithms.
+    tol : float
+        Relative tolerance with regard to inertia to declare convergence, for both local and global k-medians.
+    random_state : int
+        Determines random number generation for centroid initialization.
+    n_procs_to_merge : int
+        Number of processes to merge after each iteration of the local k-medians. If None, all processes are merged after each iteration.
+
+
+    References
+    ----------
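+    [1] Rasim M. Alguliyev, Ramiz M. Aliguliyev, Lyudmila V. Sukhostat, Parallel batch k-means for Big data clustering, Computers & Industrial Engineering, Volume 152 (2021). https://doi.org/10.1016/j.cie.2020.107023.
+    [2] David Arthur, Sergei Vassilvitskii, k-means++: The Advantages of Careful Seeding, Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2007.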
+
+    ### Ancestors (in MRO)
+
+    * heat.cluster.batchparallelclustering._BatchParallelKCluster
+    * heat.core.base.ClusteringMixin
+    * heat.core.base.BaseEstimator
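The two-stage scheme behind `BatchParallelKMeans` (cluster each process-local chunk, then cluster the gathered centroids) can be simulated in a single process. This is an illustration under stated simplifications, not Heat's implementation: `kmeans` and `batch_parallel_kmeans` are hypothetical helpers, "processes" are just contiguous list chunks mimicking `split=0`, initialization is random rather than 'k-means++', convergence is checked on the absolute centroid shift instead of the relative inertia, and the hierarchical merging controlled by `n_procs_to_merge` is not modeled (all local centroids are merged at once).

```python
import math
import random


def kmeans(points, k, max_iter=300, tol=1e-4, seed=None):
    """Lloyd's k-means on a list of points; returns k centroids as tuples."""
    rng = random.Random(seed)
    # Simplification: random initial centroids instead of the documented 'k-means++'.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # New centroid = coordinate-wise mean; an empty cluster keeps its old centroid.
        updated = [
            tuple(sum(coords) / len(c) for coords in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        # Simplification: stop on the absolute centroid shift, not the relative inertia change.
        shift = max(math.dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids


def batch_parallel_kmeans(points, k, n_procs, seed=0):
    """Single-process simulation of the local-then-global clustering scheme."""
    # Mimic split=0: each simulated process owns a contiguous chunk of samples.
    size = math.ceil(len(points) / n_procs)
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    # Local stage: cluster every chunk independently (in parallel on a real cluster).
    local_centroids = [c for rank, chunk in enumerate(chunks) for c in kmeans(chunk, k, seed=seed + rank)]
    # Global stage: cluster all gathered local centroids into the final k centroids.
    return kmeans(local_centroids, k, seed=seed)


data = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10), (1, 1), (11, 11)]
print(sorted(batch_parallel_kmeans(data, k=2, n_procs=2)))  # [(0.5, 0.5), (10.5, 10.5)]
```

Replacing the coordinate-wise mean with a coordinate-wise median (for example via `statistics.median`) gives a rough analogue of the same simplification for `BatchParallelKMedians`.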
diff --git a/doc/api/heat/cluster/index.md b/doc/api/heat/cluster/index.md
@@ -0,0 +1,12 @@
+Module heat.cluster
+===================
+Add the clustering functions to the ht.cluster namespace
+
+Sub-modules
+-----------
+* heat.cluster.batchparallelclustering
+* heat.cluster.kmeans
+* heat.cluster.kmedians
+* heat.cluster.kmedoids
+* heat.cluster.spectral
+* heat.cluster.tests