
non-Numpy data in Variogram and MetricSpace #206

@jmmcd


In our research project we have some non-numerical data points that we would like to pass to Variogram as the coordinates. We have a distance function that computes the distance between any pair of points, and a measure of fitness (quality) for each point, which would serve as the values.
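
Since a variogram needs only pairwise distances, the distance matrix can be built in plain Python without ever placing the raw objects in an array. Below is a minimal sketch. It assumes dist is symmetric with dist(a, a) == 0, which holds for the toy dist used below. pairwise_distances is a hypothetical helper, not part of skgstat.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def pairwise_distances(points: Sequence[T], dist: Callable[[T, T], float]) -> List[List[float]]:
    """Square distance matrix for arbitrary Python objects (hypothetical helper).

    Assumes dist is symmetric and dist(a, a) == 0, so only the upper triangle
    is evaluated: n*(n-1)/2 calls instead of n*n.
    """
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = float(dist(points[i], points[j]))
            matrix[i][j] = d  # fill both halves from one call
            matrix[j][i] = d
    return matrix


def dist(a, b):
    # toy distance from this issue: difference in tuple lengths
    return abs(len(a) - len(b))


x = [(1, 2, 3), (2, 3), (3, 1, 2), (4, 1, 2, 3), (1, 2, 3, 5, 4)]
D = pairwise_distances(x, dist)
for row in D:
    print(row)
```

The result is a rectangular list of floats, so np.asarray can convert it even though the points themselves are ragged.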

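Conceptually this is fine, since a variogram only needs pairwise distances and values. However, Variogram and MetricSpace convert their input (coordinates and coords, respectively) to Numpy arrays, and because our points are tuples of varying length, construction fails. Below are my investigations, ending in a hack that seems to work.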

Would you be interested in a pull request that allows MetricSpace(dist_matrix, dist_metric='precomputed'), or a similar design?
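
Purely as an illustration of what a dist_metric='precomputed' option might need to check, here is a standalone sketch. PrecomputedMetricSpace and its methods are hypothetical. They are not skgstat's API and do not subclass the real MetricSpace. The class only validates a user-supplied matrix (square, zero diagonal, non-negative, symmetric) and exposes it.

```python
from typing import List, Sequence


class PrecomputedMetricSpace:
    """Hypothetical container for a user-supplied distance matrix.

    Not part of skgstat: a sketch of the checks a dist_metric='precomputed'
    option might run before distances reach a variogram.
    """

    def __init__(self, dists: Sequence[Sequence[float]], tol: float = 1e-12):
        matrix = [[float(v) for v in row] for row in dists]
        n = len(matrix)
        if any(len(row) != n for row in matrix):
            raise ValueError("precomputed distances must form a square matrix")
        for i in range(n):
            # `not ... <= tol` also rejects NaN on the diagonal
            if not abs(matrix[i][i]) <= tol:
                raise ValueError(f"diagonal entry ({i}, {i}) is not zero")
            for j in range(i + 1, n):
                a, b = matrix[i][j], matrix[j][i]
                if not (a >= 0 and b >= 0):  # negative or NaN
                    raise ValueError(f"entry ({i}, {j}) is not a valid distance")
                if abs(a - b) > tol:
                    raise ValueError(f"matrix is not symmetric at ({i}, {j})")
        self._dists = matrix
        self.n = n

    @property
    def dists(self) -> List[List[float]]:
        """Square distance matrix, as validated."""
        return self._dists

    def condensed(self) -> List[float]:
        """Upper triangle row by row: (0, 1), (0, 2), ..., (1, 2), ..."""
        n = self.n
        return [self._dists[i][j] for i in range(n) for j in range(i + 1, n)]


D = [[0, 1, 0, 1, 2], [1, 0, 1, 2, 3], [0, 1, 0, 1, 2], [1, 2, 1, 0, 1], [2, 3, 2, 1, 0]]
pms = PrecomputedMetricSpace(D)
print(pms.condensed())  # [1.0, 0.0, 1.0, 2.0, 1.0, 2.0, 3.0, 1.0, 2.0, 1.0]
```

The condensed ordering is the same row-major upper triangle that scipy.spatial.distance.pdist returns. That may make it easier to plug into code that already works with condensed distances.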

# test non-numpy data

import numpy as np
from skgstat import MetricSpace, Variogram
import skgstat

print(skgstat.__version__)
# '1.0.23' - I used pip install -e from GitHub

# x data is not Numpy-compatible
# just an example: our real data is made up of complex nested tuples
x = [(1, 2, 3), (2, 3), (3, 1, 2), (4, 1, 2, 3), (1, 2, 3, 5, 4)]
y = [len(xi) for xi in x]
n = len(x)

def dist(a, b):
    # toy distance: difference in tuple lengths
    return abs(len(a) - len(b))
dist_matrix = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        dist_matrix[i, j] = dist(x[i], x[j])
print(dist_matrix) # abs(len(a) - len(b)) 
# [[0. 1. 0. 1. 2.] 
#  [1. 0. 1. 2. 3.]
#  [0. 1. 0. 1. 2.]
#  [1. 2. 1. 0. 1.]
#  [2. 3. 2. 1. 0.]]

print("The x data can't go directly into a Numpy array, so this is a crash in Variogram:")
try:
    v = Variogram(x, y, dist_func=dist)
except Exception as e:
    print(e)

print("Same for MetricSpace:")
try:
    ms = MetricSpace(x, dist_metric=dist)
except Exception as e:
    print(e)

print("I thought this might work by passing the precomputed distances.")
print("It doesn't crash, but it treats the distance matrix (n x n) as")
print("coordinates, not distances, i.e. n points in an n-dimensional space")
ms = MetricSpace(dist_matrix)
print(ms.dists)
# [[0.         2.23606798 0.         2.23606798 4.        ]
#  [2.23606798 0.         2.23606798 3.46410162 4.58257569]
#  [0.         2.23606798 0.         2.23606798 4.        ]
#  [2.23606798 3.46410162 2.23606798 0.         2.23606798]
#  [4.         4.58257569 4.         2.23606798 0.        ]]

print("This crashes, as scipy doesn't expect 'precomputed'")
print("Something like this could be a nice API?")
try:
    ms = MetricSpace(dist_matrix, dist_metric="precomputed")
except Exception as e:
    print(e)


print("Claude suggested this hack, which seems to work ok")
# build the MetricSpace on dummy coordinates, then overwrite the
# private _dists attribute with the precomputed matrix
dummy = np.random.random((n, 2))
ms = MetricSpace(dummy)
ms._dists = dist_matrix
print(ms.dists)
# [[0. 1. 0. 1. 2.]
#  [1. 0. 1. 2. 3.]
#  [0. 1. 0. 1. 2.]
#  [1. 2. 1. 0. 1.]
#  [2. 3. 2. 1. 0.]]
v = Variogram(coordinates=ms, values=y)
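
For reference, an empirical variogram touches only the pairwise distances and the values, which is why the dummy-coordinate hack can work. Below is a simplified sketch of the classic Matheron estimator, gamma(h) = 1 / (2 N(h)) * sum of (z_i - z_j)^2 over the N(h) pairs whose distance falls in bin h, computed straight from a precomputed matrix. The bin edges are chosen by hand, and matheron_from_dists is a hypothetical helper. It is not skgstat's binning or implementation.

```python
from typing import List, Optional, Sequence


def matheron_from_dists(
    dists: Sequence[Sequence[float]],
    values: Sequence[float],
    edges: Sequence[float],
) -> List[Optional[float]]:
    """Empirical semivariance per lag bin from a square distance matrix.

    Bin k collects pairs with edges[k] < d <= edges[k + 1]; with edges[0] == 0,
    zero-distance pairs are excluded. Each unordered pair (i < j) counts once.
    Empty bins return None.
    """
    n = len(values)
    n_bins = len(edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(n):
        for j in range(i + 1, n):
            d = dists[i][j]
            for k in range(n_bins):
                if edges[k] < d <= edges[k + 1]:
                    sums[k] += (values[i] - values[j]) ** 2
                    counts[k] += 1
                    break
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]


D = [[0, 1, 0, 1, 2], [1, 0, 1, 2, 3], [0, 1, 0, 1, 2], [1, 2, 1, 0, 1], [2, 3, 2, 1, 0]]
y = [3, 2, 3, 4, 5]  # same as [len(xi) for xi in x] above
print(matheron_from_dists(D, y, edges=[0, 1, 2, 3]))  # [0.5, 2.0, 4.5]
```

Because the toy y equals the tuple length and dist is the absolute difference in length, every squared difference equals d squared. Each bin therefore gives gamma(h) = h^2 / 2, which is a quick sanity check.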
