-
Notifications
You must be signed in to change notification settings - Fork 27
I saw something: broken boundary relations #273
Comments
Good question; I’m a bit removed from the underlying OSM mechanics of this. I believe @heffergm created the underlying extract service, and would be able to answer any questions about possible differences between these new extracts and the older, weekly ones. |
All the osmconvert code amongst the two systems is identical in function. All the ODES code was heavily borrowed (read largely cut/paste) from the old metroextractor system. ... except the hash-memory fix that snuck into the old system didn't get ported over, so I'll add that now. It may have an impact, although I'm somewhat doubtful given the size of the extracts in question. But if you want to re-cut an extract @missinglink, let me know if it looks any different. We've tried all kinds permutations of messing with including/excluding nodes. There are fundamental, systemic reasons we can't allow recursion (OOM) and at the end of the day every time we seem to choose one way, we get complaints about having preferred the other. I'm happy to change it to anything (provided it still functions) but at the rate it's happened in the past, I expect someone to ask me to switch it back (or to something else) within a few months. |
here some hard data: for sake of simplicity, in my examples I used the problem effects any element with member elements, so both old extracts$ wget https://s3.amazonaws.com/metro-extracts.mapzen.com/new-york_new-york.osm.pbf
$ ls -lah new-york_new-york.osm.pbf
-rw-rw-r-- 1 peter peter 79M Aug 13 14:53 new-york_new-york.osm.pbf
$ shasum new-york_new-york.osm.pbf
428d95b58dad852bee12c299d8c7e0eaa9ab50b3 new-york_new-york.osm.pbf I found the following ways using a script I wrote (it's only looking at ways which are a member of a relation with a single node ref): way 24315304 only has 1 ref
way 242967138 only has 1 ref
way 37788892 only has 1 ref
way 242967138 only has 1 ref
way 37788892 only has 1 ref
way 37561688 only has 1 ref
way 37788892 only has 1 ref
way 37788892 only has 1 ref
way 242967138 only has 1 ref
way 37788892 only has 1 ref
way 242967138 only has 1 ref
way 37788892 only has 1 ref
way 24315304 only has 1 ref which can be confirmed with $ osmconvert --out-osm new-york_new-york.osm.pbf | grep -A6 24315304
<way id="24315304" version="12" timestamp="2013-10-05T18:00:30Z" changeset="18199919" uid="113450" user="nfgusedautoparts">
<nd ref="263599185"/>
<tag k="source" v="county_import_v0.1_20080508235456"/>
<tag k="boundary" v="administrative"/>
<tag k="admin_level" v="6"/>
<tag k="border_type" v="county"/>
</way> compared to the source http://www.openstreetmap.org/way/24315304 you can see that this this is a similar issue to what I reported about the relations but not as severe, for road segments it's not a huge deal but for polygons it converts them to linestrings or cuts holes in them which is completely corrupting the data. ways with a single node should definitely be removed from the extract as they are essentially just a single point, i'm not sure how GIS systems will handle this sort of thing, in this case it's actually supposed to be a new extractsI created a custom extract geofabrikI also tried running my script against a small extract from geofabrik and found that they are also including $ wget http://download.geofabrik.de/europe/andorra-latest.osm.pbf
$ shasum andorra-latest.osm.pbf
56c4d3cc54d7e9fcda2c8706c00127691487d132 andorra-latest.osm.pbf way 45356002 not found in extract
way 45356023 not found in extract
way 45356009 not found in extract
... many more $ osmconvert --out-osm andorra-latest.osm.pbf | grep 45356002
<member type="way" ref="45356002" role="outer"/>
<member type="way" ref="45356002" role="outer"/>
<member type="way" ref="45356002" role="outer"/>
... 3x references in a relation but no actual way data in the file |
I hate to be the bearer of bad news but it seems that both the There is also a bunch of data that could be removed to thin them down, depending on the use-case. I got frustrated, so I wrote a linter https://github.com/missinglink/pbflint which you can use to validate the extracts, it's not feature complete but it's good enough for now to find some of the invalid entities. re: your comment @heffergm:
I know what you mean, it's probably down to the way that people use the data, for the purposes of display the truncated road segments are probably fine, it's possible that the road finishes slightly before the extract border but that's not a huge deal. The thing for me is; for analytical and data science purposes the extracts are not valid because they mutate the source data, this is a big problem and there is no indication in the file which entities have been mutated and which have not. I see a 4 different options when dealing with entity relationships:
no 1. would be my preference but due to the OOM errors this is probably not going to work at scale. thoughts? |
It sounds like you're the expert, so you tell me ;) What I can point to is what we're doing now, which in the case of PBF extracts is this (I think we're talking about pbf, yes?): https://github.com/mapzen/chef-mapzen_odes/blob/master/templates/default/extracts.sh.erb#L5 Note The short of it is, if osmconvert supports it, I can have it to whatever you'd prefer. |
ha! yea I'm becoming quite the expert :) so you feed the exporter i'll have a look in to the settings for |
The short answer is no, not without completely re-architecting the entire thing... We do currently use postgres, but only to produce shp files (the extracts we cut from the planet file are loaded into postgres then dumped as shapes). |
right, so I slept on this and I have figured out a way producing valid extracts using option 1. without having memory issues and which can cut small city extracts in under a minute. the idea is to use option 1, which recurses and includes ways/relations that are only partially inside the bbox (with the exception of super relations) and then; since they are valid for all child bboxes, we can just use any larger extract to produce the smaller ones. Eg. use @migurski if you want to take this further let me know and we can set up a video call |
heya,
I've been trying to extract
boundary:administrative
data from our PBF metro extracts and found that either all, or some of the members are missing from therelation
.An example would be http://www.openstreetmap.org/relation/1638991 which contains 43
outer
members.In the old metro extracts (wellington_new-zealand.osm.pbf) only 25 of the members were referenced, so unfortunately the polygon was invalid due to it not being able to close properly using the data available in the PBF file.
In the new metro extracts, all the members have been removed except for the
admin_centre
node.The wiki for boundary says that a relation marked as
boundary:*
should have at least one member with the role ofouter
.I understand that including all the member data in each extract would frequently include a bunch of data outside the bbox and could potentially recurse to include the whole planet file!!
Is it possible to do one of the following?
osmconvert --all-to-nodes
)The text was updated successfully, but these errors were encountered: