Description
It's currently possible to do something like this:
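```python
import numpy as np
import xugrid
a = xugrid.data.elevation_nl()
permutation = np.random.permutation(a.ugrid.grid.n_face)  # shuffle the face order
b = a.isel(mesh2d_nFaces=permutation)
c = a + b  # face i of a is added to a different face of b
```

This works without complaint, because the xugrid machinery doesn't check whether the two topologies match. Xarray checks the coordinates and dimensions, but these are always appropriate here (1...nface), even though face i of a and face i of b are generally different faces.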
Anyway, reasoning from the structured case: there, xarray aligns the operands before the operation (e.g. if one has decreasing and the other increasing y-coordinates, or if one is a subset of the other). I'm not too fond of this, because it tends to hide (unintended or unknown) incommensurability of datasets, and because it is somewhat expensive.
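For comparison, a minimal sketch of the structured behaviour with plain xarray, using a 1D stand-in for a raster (the values are made up). Under xarray's default inner join, arithmetic first aligns both operands on their coordinate labels: reversed y-coordinates are matched up silently, and a subset silently shrinks the result.

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.arange(3.0), coords={"y": [0.0, 1.0, 2.0]}, dims="y")
b = a.isel(y=[2, 1, 0])  # same values and labels, but y is now decreasing

c = a + b  # aligned on the y labels, so each y is added to itself
print(c.sel(y=2.0).item())  # 4.0

subset = a.isel(y=[0, 1])
d = a + subset  # the inner join silently drops y=2
print(d.sizes["y"])  # 2
```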
Alignment is, of course, far more expensive for unstructured topologies. Failure can be cheap, e.g. when the first face already differs; success, however, requires exhaustively checking every face.
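To illustrate the asymmetry, a sketch of a face-by-face comparison with early exit, assuming both topologies are given as per-face arrays (e.g. face_node_connectivity). The helper first_mismatch is hypothetical and not part of xugrid.

```python
import numpy as np


def first_mismatch(faces_a: np.ndarray, faces_b: np.ndarray) -> int:
    """Hypothetical helper: index of the first differing face, or -1 if all match."""
    if faces_a.shape != faces_b.shape:
        return 0  # treat a shape difference as an immediate mismatch
    for i, (fa, fb) in enumerate(zip(faces_a, faces_b)):
        if not np.array_equal(fa, fb):
            return i  # cheap when the mismatch comes early
    return -1  # proving equality required visiting every face


faces = np.array([[0, 1, 2], [1, 3, 2]])
print(first_mismatch(faces, faces[::-1]))   # 0
print(first_mismatch(faces, faces.copy()))  # -1
```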
A pragmatic workaround seems to be hashing. The question then is: what makes two topologies equivalent? This can be asked broadly or narrowly: for 2D, are all nodes, edges, and faces exactly the same, or are the faces the same while the edges are ordered differently?
Given that xugrid/xarray will dispatch on the dimension, it seems to me that these can reasonably be separated. In other words, if a and b share faces but not edges, it's okay to add their face-associated data.
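The separation by dimension could look roughly like this, assuming each topology carries separate hashes for its nodes, faces, and edges. TopologyHashes and compatible are hypothetical names, and the plain "node"/"face"/"edge" strings stand in for xugrid's actual dimension names (such as mesh2d_nFaces).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopologyHashes:
    """Hypothetical per-topology hashes; not an existing xugrid class."""
    node: str
    face: str
    edge: str


def compatible(a: TopologyHashes, b: TopologyHashes, dim: str) -> bool:
    # Only the parts of the topology relevant to the operated-on dimension are compared.
    if dim == "face":
        return a.node == b.node and a.face == b.face
    if dim == "edge":
        return a.node == b.node and a.edge == b.edge
    if dim == "node":
        return a.node == b.node
    raise ValueError(f"unknown dimension: {dim}")


a = TopologyHashes(node="n1", face="f1", edge="e1")
b = TopologyHashes(node="n1", face="f1", edge="e2")  # same faces, edges ordered differently
print(compatible(a, b, "face"), compatible(a, b, "edge"))  # True False
```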
The full, explicit coordinates are the best measure of topological equality:
- node_coordinates (n_node, 2)
- face_node_coordinates (n_face, max_n_node_per_face, 2)
- edge_node_coordinates (n_edge, 2, 2)
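A sketch of how the two derived arrays follow from node_coordinates and the connectivity arrays, assuming padded faces in face_node_connectivity are marked with a fill value of -1 (an assumption here; the function names are hypothetical):

```python
import numpy as np

FILL_VALUE = -1  # assumed fill value for padded entries of face_node_connectivity


def face_node_coordinates(node_xy: np.ndarray, face_nodes: np.ndarray) -> np.ndarray:
    """(n_face, max_n_node_per_face, 2) array; padded slots become NaN."""
    coords = node_xy[face_nodes].astype(float)  # fancy indexing returns a copy
    coords[face_nodes == FILL_VALUE] = np.nan  # -1 indexed the last node; blank it out
    return coords


def edge_node_coordinates(node_xy: np.ndarray, edge_nodes: np.ndarray) -> np.ndarray:
    """(n_edge, 2, 2) array: the xy of both end nodes of every edge."""
    return node_xy[edge_nodes]


node_xy = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [2.0, 0.0]])
face_nodes = np.array([[0, 1, 2, 3], [1, 4, 2, FILL_VALUE]])  # a quad and a triangle
edge_nodes = np.array([[0, 1], [1, 2]])
print(face_node_coordinates(node_xy, face_nodes).shape)  # (2, 4, 2)
print(edge_node_coordinates(node_xy, edge_nodes).shape)  # (2, 2, 2)
```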
However, the latter two are lazy properties, since they are rarely needed. I think it's reasonable to substitute:
- hash(face_node_coordinates) -> hash(node_coordinates) & hash(face_node_connectivity)
- hash(edge_node_coordinates) -> hash(node_coordinates) & hash(edge_node_connectivity)
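The substitution above could be sketched as follows, using the standard library's hashlib.md5 over each array's dtype, shape, and raw bytes. array_digest, face_topology_key, and edge_topology_key are hypothetical names; combining the two digests as a tuple is one choice among several.

```python
import hashlib

import numpy as np


def array_digest(arr: np.ndarray) -> str:
    """MD5 over dtype, shape and raw bytes, so reshaped or recast arrays differ."""
    h = hashlib.md5()
    h.update(str(arr.dtype).encode())
    h.update(str(arr.shape).encode())
    h.update(arr.tobytes())  # tobytes() returns C-ordered data, also for views
    return h.hexdigest()


def face_topology_key(node_xy: np.ndarray, face_nodes: np.ndarray) -> tuple:
    # Stand-in for hash(face_node_coordinates).
    return (array_digest(node_xy), array_digest(face_nodes))


def edge_topology_key(node_xy: np.ndarray, edge_nodes: np.ndarray) -> tuple:
    # Stand-in for hash(edge_node_coordinates).
    return (array_digest(node_xy), array_digest(edge_nodes))


node_xy = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
face_nodes = np.array([[0, 1, 2], [0, 2, 3]])
key = face_topology_key(node_xy, face_nodes)
print(key == face_topology_key(node_xy.copy(), face_nodes.copy()))  # True
print(key == face_topology_key(node_xy, face_nodes[::-1]))  # False
```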
Worth noting, this doesn't commute exactly: if the nodes are reordered, we will get a different hash on node_coordinates, but the face_node_connectivity may have been updated to take the new ordering into account and still result in the same face_node_coordinates. However, this scenario seems rare enough that we can classify it as topological inequality (easily resolved with reindex_like) without much worry.
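A small demonstration of that caveat, with made-up coordinates and no fill values (remapping -1 entries would need extra care). The node permutation is undone in the connectivity via its inverse, np.argsort(perm), so the explicit face coordinates are unchanged while the node digest is not.

```python
import hashlib

import numpy as np


def digest(arr: np.ndarray) -> str:
    return hashlib.md5(arr.tobytes()).hexdigest()


node_xy = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
face_nodes = np.array([[0, 1, 2], [0, 2, 3]])  # two triangles

perm = np.array([2, 0, 3, 1])  # new node i is old node perm[i]
new_node_xy = node_xy[perm]
new_face_nodes = np.argsort(perm)[face_nodes]  # map old node ids to new ones

same_geometry = np.array_equal(node_xy[face_nodes], new_node_xy[new_face_nodes])
same_hash = digest(node_xy) == digest(new_node_xy)
print(same_geometry, same_hash)  # True False
```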
Similarly, we will not worry about floating point roundoff; the general use of xugrid is with a static topology that may be reordered and subsetted, but will not be changed (and reindex_like provides a tolerance keyword here anyway).
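A quick illustration of why exact hashing ignores tolerances (values made up): a roundoff-sized shift changes the digest, while numpy's np.allclose with its default tolerances still considers the coordinates equal. Tolerance-based matching stays the job of reindex_like.

```python
import hashlib

import numpy as np


def digest(arr: np.ndarray) -> str:
    return hashlib.md5(arr.tobytes()).hexdigest()


node_xy = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
perturbed = node_xy + 1e-12  # roundoff-sized change

print(digest(node_xy) == digest(perturbed))  # False
print(np.allclose(node_xy, perturbed))  # True
```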
In terms of hashing algorithm, we don't need something cryptographically secure. I'm wondering whether we can just use the built-in hash() (apparently "SipHash"), or whether it's worthwhile to pull in an additional dependency in the form of xxhash. (For comparison: joblib offers md5 and sha.)
In general, these hashes are created at instantiation or reordering, with generally smallish arrays (compared to the full datasets with time, layer, etc. dimensions). MD5 has two benefits: it is deterministic, whereas hash() differs across sessions, and it is in the standard library (hashlib).
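The session dependence can be checked directly. np.ndarray is unhashable, so hash() has to be applied to the array's bytes, and Python randomizes the hash of str and bytes per process (controlled by PYTHONHASHSEED). This sketch evaluates both hashes in fresh interpreters with different seeds; in_new_session is a hypothetical helper, and xxhash (a third-party package) is not shown.

```python
import hashlib
import os
import subprocess
import sys

import numpy as np


def in_new_session(expr: str, data: bytes, seed: str) -> str:
    """Evaluate expr (with data bound to d) in a fresh interpreter using PYTHONHASHSEED=seed."""
    code = f"import hashlib, sys; d = sys.stdin.buffer.read(); print({expr})"
    env = {**os.environ, "PYTHONHASHSEED": seed}
    result = subprocess.run(
        [sys.executable, "-c", code],
        input=data, capture_output=True, env=env, check=True,
    )
    return result.stdout.decode().strip()


data = np.arange(6, dtype=np.int64).tobytes()

builtin_1 = in_new_session("hash(d)", data, "1")
builtin_2 = in_new_session("hash(d)", data, "2")
md5_1 = in_new_session("hashlib.md5(d).hexdigest()", data, "1")
md5_2 = in_new_session("hashlib.md5(d).hexdigest()", data, "2")
print(builtin_1 == builtin_2)  # False: hash() depends on the session's seed
print(md5_1 == md5_2)  # True: MD5 is stable across sessions
```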
Anyway, this is best taken up in #35.