Add explicit language around edges and ensure available options capture engine behaviours that import/export Parquet #250

Open
wants to merge 4 commits into
base: main
Changes from 2 commits
44 changes: 41 additions & 3 deletions format-specs/geoparquet.md
Original file line number Diff line number Diff line change
Expand Up @@ -181,14 +181,52 @@ It is RECOMMENDED to always set the orientation (to counterclockwise) if `edges`

#### edges

This attribute indicates how to interpret the edges of the geometries: whether the line between two points is a straight cartesian line or the shortest line on the sphere (geodesic line). Available values are:
- `"planar"`: use a flat cartesian coordinate system.
- `"spherical"`: use a spherical coordinate system and radius derived from the spheroid defined by the coordinate reference system.
This attribute describes the interpretation of edges between explicitly
defined vertices.

- `"planar"`: edges will be interpreted following the language of
[Simple features access](https://www.opengeospatial.org/standards/sfa):

> **simple feature** feature with all geometric attributes described piecewise
> by straight line or planar interpolation between sets of points (Section 4.19).

- `"spherical"`: Edges follow the shortest distance between vertices approximated
Contributor


Can one describe the perfect sphere via a geodetic CRS? That way, we can merge spherical and geodesic together and recommend everyone to look into the crs key for interpreting the edge?

Contributor


yes, a sphere is described by an ellipsoid with an inverse flattening conventionally set to 0 (it should actually be infinity, but the 0 convention is heavily used)
I believe one issue, though (mentioned in another discussion), is that in some situations software might apply the formulas for spheres to non-spherical ellipsoids as a simplification, to avoid using elliptic integrals.

Collaborator Author


If we did this we would lose the original CRS information, which is important for ensuring that the position of every single vertex is precisely defined (e.g., a CRS transform from a perfect sphere ellipsoid to UTM is not the transform we want to invoke on the longitude and latitude of the vertices). The spherical approximation is strictly limited to the (usually small) space between adjacent vertices.

Collaborator Author


The difference between geodesic and spherical edges is probably always trivial (I can at some point do some math to get a handle on the maximum/average error), but if we want a precise definition, we need it to be there.

Contributor


Now I understand. spherical mode actually overrides the edge specified in the crs. That's where the confusion came from.

I think this is making great progress. I will schedule another meeting among all parties to settle on the geography type of Iceberg.

as the shortest distance between the vertices on a perfect sphere. This edge
interpretation is used by
[BigQuery Geography](https://cloud.google.com/bigquery/docs/geospatial-data#coordinate_systems_and_edges),
and [Snowflake Geography](https://docs.snowflake.com/en/sql-reference/data-types-geospatial).
A common library for interpreting edges in this way is
[Google's s2geometry](https://github.com/google/s2geometry).
- `"geodesic"`: Edges follow the shortest distance between vertices on the
ellipsoid defined by the `crs` key. This edge interpretation is used by
[Microsoft SQL Server Geography](https://learn.microsoft.com/en-us/sql/t-sql/spatial-geography/spatial-types-geography),
[Amazon Redshift Geography](https://docs.aws.amazon.com/redshift/latest/dg/geospatial-overview.html),
and [PostGIS](https://postgis.net/docs/geography.html). A common library for
interpreting edges in this way is
[GeographicLib](https://github.com/geographiclib/geographiclib).

If no value is set, the default value to assume is `"planar"`.
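
To make the difference between `"planar"` and `"spherical"` edges concrete, the sketch below interpolates the midpoint of a single edge both ways. It is a minimal illustration and not part of the specification: the function names are hypothetical, coordinates are assumed to be longitude/latitude in degrees, and the `"geodesic"` case is omitted because it needs an ellipsoidal solver such as GeographicLib.

```python
import math


def planar_midpoint(lon1, lat1, lon2, lat2):
    """Midpoint of a straight (cartesian) edge in lon/lat space."""
    return ((lon1 + lon2) / 2.0, (lat1 + lat2) / 2.0)


def spherical_midpoint(lon1, lat1, lon2, lat2):
    """Midpoint of the shortest path between two vertices on a perfect sphere."""

    def to_unit_vector(lon, lat):
        lam, phi = math.radians(lon), math.radians(lat)
        return (
            math.cos(phi) * math.cos(lam),
            math.cos(phi) * math.sin(lam),
            math.sin(phi),
        )

    x1, y1, z1 = to_unit_vector(lon1, lat1)
    x2, y2, z2 = to_unit_vector(lon2, lat2)
    # The midpoint of the great-circle arc lies along the sum of the two vectors.
    x, y, z = x1 + x2, y1 + y2, z1 + z2
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("antipodal vertices: the shortest path is not unique")
    # Clamp to guard against floating-point values marginally outside [-1, 1].
    sin_lat = max(-1.0, min(1.0, z / norm))
    return (math.degrees(math.atan2(y, x)), math.degrees(math.asin(sin_lat)))
```

For an edge from (-45, 45) to (45, 45), the planar midpoint stays at latitude 45, while the spherical midpoint bulges poleward to roughly latitude 54.7. This is why the two interpretations can disagree noticeably when vertices are far apart.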

Note if `edges` is `"spherical"` then it is RECOMMENDED that `orientation` is always ensured to be `"counterclockwise"`. If it is not set, it is not clear how polygons should be interpreted within spherical coordinate systems, which can lead to major analytical errors if interpreted incorrectly. In this case, software will typically interpret the rings of a polygon such that it encloses at most half of the sphere (i.e. the smallest polygon of both ways it could be interpreted). But the specification itself does not make any guarantee about this.

If an implementation only has support for a single edge interpretation (e.g.,
a library with only planar edge support), a column with a different edge type
may be imported without losing information if the geometries in the column
do not contain edges (i.e., the column only contains points or empty geometries).
For columns that contain edges, the error introduced by ignoring the original
edge interpretation is similar to the error introduced by applying a coordinate
transformation to vertices (which is usually small but may be large or create
invalid geometries, particularly if vertices are not closely spaced). Ignoring
the original edge interpretation will silently introduce invalid and/or
misinterpreted geometries for any edge that crosses the antimeridian (i.e.,
longitude 180/-180) when translating from `"spherical"` or `"geodesic"` edges
to planar edges.
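
A simple check for the antimeridian case described above can be sketched as follows. It assumes longitudes are normalized to [-180, 180] degrees and that each edge is meant to follow the shortest path between its vertices. The function names are hypothetical, and the rule is a heuristic for `"spherical"` or `"geodesic"` edges rather than anything defined by the specification.

```python
def crosses_antimeridian(lon1, lon2):
    """True if the shortest path between two longitudes passes through +/-180.

    With longitudes in [-180, 180], a raw difference above 180 degrees means
    the short way around goes across the antimeridian.
    """
    return abs(lon2 - lon1) > 180.0


def any_edge_crosses_antimeridian(coords):
    """Check consecutive (lon, lat) vertex pairs of a linestring or ring."""
    return any(
        crosses_antimeridian(a[0], b[0]) for a, b in zip(coords, coords[1:])
    )
```

An importer with only planar edge support could use a check like this to flag geometries that would be misinterpreted if the original edge type were ignored.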

Implementations may implicitly import columns with an unsupported edge type if the
columns do not contain edges. Implementations may otherwise import columns with an
unsupported edge type with an explicit opt-in from a user or if accompanied
by a prominent warning.
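
The import rules above could be sketched roughly as below. This is one possible reading, not a reference implementation: `check_edges_on_import`, its parameters, and the shape of `column_meta` (a dict holding the column's GeoParquet metadata) are hypothetical names chosen for illustration.

```python
import warnings


def check_edges_on_import(column_meta, contains_edges,
                          supported=("planar",), opt_in=False):
    """Return the declared edge type, warning when it has to be ignored.

    column_meta:    dict of column metadata; a missing "edges" means "planar".
    contains_edges: False if the column holds only points or empty geometries.
    supported:      edge interpretations this (hypothetical) reader implements.
    opt_in:         True if the user explicitly accepted an unsupported type.
    """
    declared = column_meta.get("edges", "planar")
    if declared in supported:
        return declared
    if not contains_edges:
        # No edges to reinterpret, so importing implicitly loses nothing.
        return declared
    if not opt_in:
        # Without an explicit opt-in, import only with a prominent warning.
        warnings.warn(
            f"Column declares edges={declared!r}, which is not supported; "
            "edges will be reinterpreted and geometries may become invalid.",
            UserWarning,
            stacklevel=2,
        )
    return declared
```

An implementation could equally choose to raise an error instead of warning when no opt-in is given; the text above permits importing in that case but does not require it.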

#### bbox

Bounding boxes are used to help define the spatial extent of each geometry column. Implementations of this schema may choose to use those bounding boxes to filter partitions (files) of a partitioned dataset.