Skip to content

Conversation

@fulmicoton
Copy link
Collaborator

@fulmicoton fulmicoton commented Dec 1, 2025

rebased PR of #2729

Implement a SPATIAL flag for use in creating a spatial field.
Encodes triangles with the bounding box in the first four words,
enabling efficient spatial pruning during tree traversal without
reconstructing the full triangle. The remaining words contain an
additional vertex and packed reconstruction metadata, allowing exact
triangle recovery when needed.
The `triangulate` function takes a polygon with floating-point lat/lon
coordinates, converts to integer coordinates with millimeter precision
(using 2^32 scaling), performs constrained Delaunay triangulation, and
encodes the resulting triangles with boundary edge information for block
kd-tree spatial indexing.

It handles polygons with holes correctly, preserving which triangle
edges lie on the original polygon boundaries versus internal
tessellation edges.
Implemented byte-wise histogram selection to find median values without
comparisons, enabling efficient partitioning of spatial data during
block kd-tree construction. Processes values through multiple passes,
building histograms for each byte position after a common prefix,
avoiding the need to sort or compare elements directly.
Implemented a `Surveyor` that will evaluate the bounding boxes of a set
of triangles and determine the dimension with the maximum spread and the
shared prefix for the values of dimension with the maximum spread.
Implements dimension-major bit-packing with zigzag encoding for signed i32
deltas, enabling compression of spatially-clustered triangles from 32-bit
coordinates down to 4-19 bits per delta depending on spatial extent.
Implement an immutable bulk-loaded spatial index using recursive median
partitioning on bounding box dimensions. Each leaf stores up to 512
triangles with delta-compressed coordinates and doc IDs. The tree
provides three query types (intersects, within, contains) that use exact
integer arithmetic for geometric predicates and accumulate results in
bit sets for efficient deduplication across leaves.

The serialized format stores compressed leaf pages followed by the tree
structure (leaf and branch nodes), enabling zero-copy access through
memory-mapped segments without upfront decompression.
Lossless compression for floating-point lat/lon coordinates using XOR
delta encoding on IEEE 754 bit patterns with variable-length integer
encoding. Designed for per-polygon random access in the document store,
where each polygon compresses independently without requiring sequential
decompression.
The triangulation function in `triangle.rs` is now called
`delaunay_to_triangles` and it accepts the output of a Delaunay
triangulation from `i_triangle` and not a GeoRust multi-polygon. The
translation of user polygons to `i_triangle` polygons and subsequent
triangulation will take place outside of `triangle.rs`.
Implemented a geometry document field with a minimal `Geometry` enum.
Now able to add that Geometry from GeoJSON parsed from a JSON document.
Geometry is triangulated if it is a polygon, otherwise it is correctly
encoded as a degenerate triangle if it is a point or a line string.
Write accumulated triangles to a block kd-tree on commit.

Serialize the original `f64` polygon for retrieval from search.

Created a query method for intersection. Query against the memory mapped
block kd-tree. Return hits and original `f64` polygon.

Implemented a merge of one or more block kd-trees from one or more
segments during merge.

Updated the block kd-tree to write to a Tantivy `WritePtr` instead of
more generic Rust I/O.
Ended up using `select_nth_unstable_by_key` from the Rust standard
library instead.
Read node structures using `from_le_bytes` instead of casting memory.
After an inspection of columnar storage, it appears that this is the
standard practice in Rust and in the Tantivy code base. Left the
structure alignment for now in case it tends to align with cache
boundaries.
Addressed all `todo!()` markers created when adding `Spatial` field type
and `Geometry` value type to existing code paths:

- Dynamic field handling: `Geometry` not supported in dynamic JSON
  fields, return `unimplemented!()` consistently with other complex
  types.
- Fast field writer: Panic if geometry routed incorrectly (internal
  error.)
- `OwnedValue` serialization: Implement `Geometry` to GeoJSON
  serialization and reference-to-owned conversion.
- Field type: Return `None` for `get_index_record_option()` since
  spatial fields use BKD trees, not inverted index.
- Space usage tracking: Add spatial field to `SegmentSpaceUsage` with
  proper integration through `SegmentReader`.
- Spatial query explain: Implement `explain()` method following pattern of
  other binary/constant-score queries.

Fixed `MultiPolygon` deserialization bug: count total points across all
rings, not number of rings.

Added clippy expects for legitimate too_many_arguments cases in geometric
predicates.
This reverts commit 19eab16.

Restore radix select in order to implement a merge solution that will
not require a temporary file.
Existing code behaved as if the result of `select_nth_unstable_by_key`
was either a sorted array or the product of an algorithm that gathered
partition values as in the Dutch national flag problem. The existing
code was written knowing that the former isn't true and the latter isn't
advertised. Knowing, but not remembering. Quite the oversight.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants