Replies: 1 comment
Sorry for the long post, but this way you don't have to run the query to see the state of things :)
Category Feedback
Add details
Expected behaviour you didn't see
A clean, near-definitive listing of less transient places (i.e., more permanent places such as airports and galleries, rather than restaurants).
Unexpected behaviour
A lot of duplication and inconsistent naming in the data for some of the larger places (airports, see below). This seems to come from multiple sources that are perhaps less focused on data quality than, say, OS, yet data from OS itself seems to be missing.
Suggested functionality
-- Just looking for an update on the data clean-up that was announced in June. I know it will be a massive job, but transport hubs are key to my project :)
-- Where are the other data sources?
-- Should the categories differentiate between passenger, cargo and private jet terminals? (This sort of distinction may apply to other sub-categories too.)
Steps to reproduce the problem
The query below returns results with a lot of duplication, inconsistent naming conventions, places incorrectly marked as an airport or terminal, incorrect or incomplete addresses, and more.
SELECT
    addresses[1] AS address,
    names.primary AS name,
    categories.main AS category,
    sources[1]['dataset'] AS src
FROM read_parquet('..../release/theme=places/type=/', filename=true, hive_partitioning=1)
WHERE addresses[1].country = 'GB'
  AND names.primary ILIKE '%heathrow%'
  AND categories.main IN ('airport', 'airport_terminal');
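To put a rough number on the duplication, one option is to normalise the names and group the rows. Below is a minimal Python sketch, assuming the query results have been exported as (name, src) pairs. The `normalise` rules, the `group_duplicates` helper and the sample rows are all made up for illustration; they are not real query output and not how the dataset is actually cleaned.

```python
import re
from collections import defaultdict


def normalise(name):
    """Lower-case, replace punctuation with spaces, collapse whitespace (hypothetical rules)."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    return " ".join(cleaned.split())


def group_duplicates(rows):
    """Group (name, src) rows by normalised name; keep only groups with more than one row."""
    groups = defaultdict(list)
    for name, src in rows:
        groups[normalise(name)].append((name, src))
    return {key: members for key, members in groups.items() if len(members) > 1}


# Made-up rows for illustration only, not real query output.
sample_rows = [
    ("Heathrow Airport", "source_a"),
    ("heathrow  airport", "source_b"),
    ("Heathrow Airport - Terminal 5", "source_a"),
    ("Heathrow Airport Terminal 5", "source_b"),
    ("Heathrow Cargo Centre", "source_a"),
]

duplicates = group_duplicates(sample_rows)
for key, members in duplicates.items():
    # Print each normalised name alongside how many raw rows collapsed into it.
    print(key, len(members))
```

Exact matching on a normalised name only catches the simplest cases; entries that differ in wording rather than punctuation or casing would need fuzzier matching or a comparison on location.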
Dependency with other categories, if any.
None