In our research project we have non-numerical data points that we would like to pass to Variogram as the coordinates. A distance function computes the distance between any pair of points, and each point has a fitness (quality) measure, which would go in as the values.
This is conceptually fine, but Variogram and MetricSpace convert their inputs (coordinates and coords respectively) into NumPy arrays, which fails for this kind of data. Below are my investigations, ending in a hack that seems to work.
Would you be interested in a pull request that allows MetricSpace(dist_matrix, dist_metric='precomputed'), or some similar design?
# test non-numpy data
import numpy as np
from skgstat import MetricSpace, Variogram
import skgstat
print(skgstat.__version__)
# '1.0.23' - I used pip install -e from GitHub
# x data is not Numpy-compatible
# just an example: our real data is made up of complex nested tuples
x = [(1, 2, 3), (2, 3), (3, 1, 2), (4, 1, 2, 3), (1, 2, 3, 5, 4)]
y = [len(xi) for xi in x]
n = len(x)
def dist(a, b): return abs(len(a) - len(b))
dist_matrix = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        dist_matrix[i, j] = dist(x[i], x[j])
print(dist_matrix) # abs(len(a) - len(b))
# [[0. 1. 0. 1. 2.]
# [1. 0. 1. 2. 3.]
# [0. 1. 0. 1. 2.]
# [1. 2. 1. 0. 1.]
# [2. 3. 2. 1. 0.]]
print("The x data can't go directly into a Numpy array, so this is a crash in Variogram:")
try:
    v = Variogram(x, y, dist_func=dist)
except Exception as e:
    print(e)
print("Same for MetricSpace:")
try:
    ms = MetricSpace(x, dist_metric=dist)
except Exception as e:
    print(e)
print("This method I thought might work, by providing precomputed distances")
print("It doesn't crash, but it is treating the distance matrix (nxn) as")
print("coordinates, not distances, ie n points in an n-dimensional space")
ms = MetricSpace(dist_matrix)
print(ms.dists)
# [[0. 2.23606798 0. 2.23606798 4. ]
# [2.23606798 0. 2.23606798 3.46410162 4.58257569]
# [0. 2.23606798 0. 2.23606798 4. ]
# [2.23606798 3.46410162 2.23606798 0. 2.23606798]
# [4. 4.58257569 4. 2.23606798 0. ]]
# e.g. entry [0, 1] is the Euclidean distance between rows 0 and 1 of dist_matrix:
# sqrt(1 + 1 + 1 + 1 + 1) = 2.23606798, not the intended distance of 1
print("This crashes, as scipy doesn't expect 'precomputed'")
print("Something like this could be a nice API?")
try:
    ms = MetricSpace(dist_matrix, dist_metric="precomputed")
except Exception as e:
    print(e)
print("Claude suggested this hack, which seems to work ok")
dummy = np.random.random((n, 2))  # dummy coords, we will overwrite dists
ms = MetricSpace(dummy)
ms._dists = dist_matrix
print(ms.dists)
# [[0. 1. 0. 1. 2.]
# [1. 0. 1. 2. 3.]
# [0. 1. 0. 1. 2.]
# [1. 2. 1. 0. 1.]
# [2. 3. 2. 1. 0.]]
v = Variogram(coordinates=ms, values=y)
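If a 'precomputed' option were added, it would presumably need to validate the matrix it is given. Here is a minimal sketch of what that could look like, using only NumPy. The helper names pairwise_distance_matrix and check_precomputed are hypothetical and not part of skgstat, and the checks (square, finite, non-negative, symmetric, zero diagonal) are my assumption about what a distance matrix should satisfy.

```python
import numpy as np


def pairwise_distance_matrix(items, dist):
    """Build a full (n, n) distance matrix for arbitrary Python objects.

    Only the upper triangle is computed and then mirrored, so `dist`
    is assumed to be symmetric with dist(a, a) == 0.
    """
    n = len(items)
    matrix = np.zeros((n, n), dtype=float)
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i, j] = matrix[j, i] = dist(items[i], items[j])
    return matrix


def check_precomputed(matrix, n_values=None):
    """Raise ValueError unless `matrix` looks like a valid distance matrix."""
    matrix = np.asarray(matrix, dtype=float)
    if matrix.ndim != 2 or matrix.shape[0] != matrix.shape[1]:
        raise ValueError(f"expected a square matrix, got shape {matrix.shape}")
    if n_values is not None and matrix.shape[0] != n_values:
        raise ValueError("matrix size does not match the number of values")
    if not np.all(np.isfinite(matrix)) or np.any(matrix < 0):
        raise ValueError("distances must be finite and non-negative")
    if not np.allclose(matrix, matrix.T):
        raise ValueError("distance matrix must be symmetric")
    if not np.allclose(np.diag(matrix), 0.0):
        raise ValueError("diagonal of a distance matrix must be zero")
    return matrix


# Same toy data as in the script above
x = [(1, 2, 3), (2, 3), (3, 1, 2), (4, 1, 2, 3), (1, 2, 3, 5, 4)]
y = [len(xi) for xi in x]

dist_matrix = check_precomputed(
    pairwise_distance_matrix(x, lambda a, b: abs(len(a) - len(b))),
    n_values=len(y),
)
print(dist_matrix)
```

The validated dist_matrix could then be used in place of the hand-built one in the workaround above. Computing only the upper triangle halves the number of calls to the distance function, which matters when the function is expensive.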