Commit 1b68805

kylebarron and cholmes authored
CI linting for consistent whitespace (#75)
* Whitespace lint step

Co-authored-by: Chris Holmes <[email protected]>
1 parent dd9e32f commit 1b68805

File tree

5 files changed, +69 −23 lines


.github/workflows/lint.yml

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+name: Lint
+
+on:
+  push:
+    branches:
+      - main
+  pull_request:
+
+jobs:
+  lint:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v2
+
+      - name: Set up Python 3.8
+        uses: actions/setup-python@v2
+        with:
+          python-version: 3.8
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          python -m pip install pre-commit
+
+      - name: Run pre-commit
+        run: pre-commit run --all-files

.pre-commit-config.yaml

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+# See https://pre-commit.com for more information
+# See https://pre-commit.com/hooks.html for more hooks
+
+# Default to Python 3
+default_language_version:
+  python: python3
+
+# Optionally both commit and push
+default_stages: [commit]
+
+# Regex for files to exclude
+# Don't lint the generated JSON metadata files
+exclude: "examples/.*json"
+
+repos:
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v4.0.1
+    hooks:
+      - id: trailing-whitespace
+      - id: end-of-file-fixer
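
For contributors, the same checks can be run locally before opening a pull request. A minimal sketch, assuming Python and git are available on the PATH; it simply mirrors the commands the CI job above runs:

    # install the hook runner, the same dependency the CI job installs
    python -m pip install pre-commit

    # optional: register the hooks so they run automatically on each git commit
    pre-commit install

    # one-off check of every file in the repository, mirroring the CI step
    pre-commit run --all-files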

README.md

Lines changed: 20 additions & 20 deletions (trailing whitespace stripped; text otherwise unchanged)
@@ -2,19 +2,19 @@
 
 ## About
 
-This repository defines how to store geospatial [vector data](https://gisgeography.com/spatial-data-types-vector-raster/) (point,
-lines, polygons) in [Apache Parquet](https://parquet.apache.org/), a popular columnar storage format for tabular data - see
-[this vendor explanation](https://databricks.com/glossary/what-is-parquet) for more on what that means. Our goal is to standardize how
-geospatial data is represented in Parquet to further geospatial interoperability among tools using Parquet today, and hopefully
-help push forward what's possible with 'cloud-native geospatial' workflows.
-
-**Warning:** This is not (yet) a stable specification that can be relied upon. All 0.X releases are made to gather wider feedback, and we anticipate that some things may change. For now we reserve the right to make changes in backwards incompatible
-ways (though will try not to), see the [versioning](#versioning) section below for more info. If you are excited about the potential
+This repository defines how to store geospatial [vector data](https://gisgeography.com/spatial-data-types-vector-raster/) (point,
+lines, polygons) in [Apache Parquet](https://parquet.apache.org/), a popular columnar storage format for tabular data - see
+[this vendor explanation](https://databricks.com/glossary/what-is-parquet) for more on what that means. Our goal is to standardize how
+geospatial data is represented in Parquet to further geospatial interoperability among tools using Parquet today, and hopefully
+help push forward what's possible with 'cloud-native geospatial' workflows.
+
+**Warning:** This is not (yet) a stable specification that can be relied upon. All 0.X releases are made to gather wider feedback, and we anticipate that some things may change. For now we reserve the right to make changes in backwards incompatible
+ways (though will try not to), see the [versioning](#versioning) section below for more info. If you are excited about the potential
 please collaborate with us by building implementations, sounding in on the issues and contributing PR's!
 
-Early contributors include developers from GeoPandas, GeoTrellis, OpenLayers, Vis.gl, Voltron Data, Microsoft, Carto, Azavea, Planet & Unfolded.
+Early contributors include developers from GeoPandas, GeoTrellis, OpenLayers, Vis.gl, Voltron Data, Microsoft, Carto, Azavea, Planet & Unfolded.
 Anyone is welcome to join us, by building implementations, trying it out, giving feedback through issues and contributing to the spec via pull requests.
-Initial work started in the [geo-arrow-spec](https://github.com/geopandas/geo-arrow-spec/) GeoPandas repository, and that will continue on
+Initial work started in the [geo-arrow-spec](https://github.com/geopandas/geo-arrow-spec/) GeoPandas repository, and that will continue on
 Arrow work in a compatible way, with this specification focused solely on Parquet.
 
 ## Goals
@@ -26,12 +26,12 @@ There are a few core goals driving the initial development.
 of the most popular formats, and hopefully establish a good pattern for how to do so.
 * **Introduce columnar data formats to the geospatial world** - And most of the geospatial world is not yet benefitting from all the breakthroughs in data analysis
 in the broader IT world, so we are excited to enable interesting geospatial analysis with a wider range of tools.
-* **Enable interoperability among cloud data warehouses** - BigQuery, Snowflake, Redshift and others all support spatial operations but importing and exporting data
+* **Enable interoperability among cloud data warehouses** - BigQuery, Snowflake, Redshift and others all support spatial operations but importing and exporting data
 with existing formats can be problematic. All support and often recommend Parquet, so defining a solid GeoParquet can help enable interoperability.
-* **Persist geospatial data from Apache Arrow** - GeoParquet is developed in parallel with a [GeoArrow spec](https://github.com/geopandas/geo-arrow-spec), to
+* **Persist geospatial data from Apache Arrow** - GeoParquet is developed in parallel with a [GeoArrow spec](https://github.com/geopandas/geo-arrow-spec), to
 enable cross-language in-memory analytics of geospatial information with Arrow. Parquet is already well-supported by Arrow as the key on disk persistance format.
 
-And our broader goal is to innovate with 'cloud-native vector' providing a stable base to try out new ideas for cloud-native & streaming workflows.
+And our broader goal is to innovate with 'cloud-native vector' providing a stable base to try out new ideas for cloud-native & streaming workflows.
 
 
 ## Features
@@ -44,16 +44,16 @@ A quick overview of what geoparquet supports (or at least plans to support).
 * **Multiple geometry columns** - There is a default geometry column, but additional geometry columns can be included.
 * **Great compression / small files** - Parquet is designed to compress very well, so data benefits by taking up less disk space & being more efficient over
 the network.
-* **Work with both planar and spherical coordinates** - Most cloud data warehouses support spherical coordinates, and so GeoParquet aims to help persist those
+* **Work with both planar and spherical coordinates** - Most cloud data warehouses support spherical coordinates, and so GeoParquet aims to help persist those
 and be clear about what is supported.
-* **Great at read-heavy analytic workflows** - Columnar formats enable cheap reading of a subset of columns, and Parquet in particular enables efficient filtering
+* **Great at read-heavy analytic workflows** - Columnar formats enable cheap reading of a subset of columns, and Parquet in particular enables efficient filtering
 of chunks based on column statistics, so the format will perform well in a variety of modern analytic workflows.
 * **Support for data partitioning** - Parquet has a nice ability to partition data into different files for efficiency, and we aim to enable geospatial partitions.
-* **Enable spatial indices** - To enable top performance a spatial index is essential. This will be the focus of the
+* **Enable spatial indices** - To enable top performance a spatial index is essential. This will be the focus of the
 [0.4](https://github.com/opengeospatial/geoparquet/milestone/5) release.
-
+
 It should be noted what GeoParquet is less good for. The biggest one is that it is not a good choice for write-heavy interactions. A row-based format
-will work much better if it is backing a system that is constantly updating the data and adding new data.
+will work much better if it is backing a system that is constantly updating the data and adding new data.
 
 ## Roadmap
 
@@ -73,11 +73,11 @@ Our detailed roadmap is in the [Milestones](https://github.com/opengeospatial/ge
 
 After we reach version 1.0 we will follow [SemVer](https://semver.org/), so at that point any breaking change will require the spec to go to 2.0.0.
 Currently implementors should expect breaking changes, though at some point, hopefully relatively soon (0.4?), we will declare that we don't *think* there
-will be any more potential breaking changes. Though the full commitment to that won't be made until 1.0.0.
+will be any more potential breaking changes. Though the full commitment to that won't be made until 1.0.0.
 
 ## Current Implementations & Examples
 
-Examples of geoparquet files following the current spec can be found in the [examples/](examples/) folder. There is also a
+Examples of geoparquet files following the current spec can be found in the [examples/](examples/) folder. There is also a
 larger sample dataset [nz-buildings-outlines.parquet](https://storage.googleapis.com/open-geodata/linz-examples/nz-buildings-outlines.parquet)
 available on Google Cloud Storage.
 

format-specs/geoparquet.md

Lines changed: 2 additions & 2 deletions (trailing whitespace stripped)
@@ -168,8 +168,8 @@ This attribute indicates how to interpret the edges of the geometries: whether t
 
 If no value is set, the default value to assume is 'planar'.
 
-Note if `edges` is 'spherical' then it is recommended that `orientation` is always set to 'counterclockwise'. If it is not set, it is not clear how polygons should be interpreted within spherical coordinate systems, which can lead to major analytical errors if interpreted incorrectly.
-then implementations should choose the smaller polygon for interpretation (but note this can only work with polygons
+Note if `edges` is 'spherical' then it is recommended that `orientation` is always set to 'counterclockwise'. If it is not set, it is not clear how polygons should be interpreted within spherical coordinate systems, which can lead to major analytical errors if interpreted incorrectly.
+then implementations should choose the smaller polygon for interpretation (but note this can only work with polygons
 smaller than hemisphere, which is why it is recommended to always set it).
 
 #### bbox

validator/python/.gitignore

Lines changed: 1 addition & 1 deletion (whitespace-only change)
@@ -157,4 +157,4 @@ cython_debug/
 # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
 # and can be added to the global gitignore or merged into this file. For a more nuclear
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
-#.idea/
+#.idea/
