Deep-nesting prevents easily performing table-based analysis #19

e-lo · 2023-06-22T18:26:54Z

e-lo
Jun 22, 2023

As somebody potentially developing tools to analyze and edit roadway networks, I'd like to have a more straightforward conversion between the json-flavored data and a table-based schema which can be more efficiently analyzed and manipulated.

The advent of osmnx as a tool which took OSM data and put it in pandas GeoDataFrames has unlocked a great deal of research and development and subsequent tooling in analyzing roadway networks around the world.

I'm concerned that the deep nesting in the draft schema would effectively prevent straightforward translation of the JSON-based data into a series of related tables. While I understand why a JSON-based data format was chosen, it can (and should 🤞) be structured so that relationships can be cleanly broken out into sub-tables.

See GMNS for a rough example of a table-based format that you might hope to arrive at by manipulating or summarizing the JSON-based data. A goal could (should 🤞) be to summarize several example datasets in Overture's schema into GMNS or a similar format without significant loss of data or hard-coding.
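To make the "related tables" idea concrete, here is a minimal sketch that splits nested features into a segment table and a lane table linked by a key. The function name `split_tables` and the column names (`segment_id`, `lane_index`, and so on) are assumptions made up for illustration; they are not GMNS field names and not part of the draft schema.

```python
def split_tables(features):
    """Split nested features into two flat, related tables (lists of row dicts)."""
    segments, lanes = [], []
    for feature in features:
        props = feature["properties"]
        road = props.get("road", {})
        # One row per segment, holding only scalar attributes.
        segments.append({
            "segment_id": feature["id"],
            "sub_type": props.get("subType"),
            "road_class": road.get("class"),
        })
        # One row per lane, pointing back at its parent segment.
        for index, lane in enumerate(road.get("lanes", [])):
            lanes.append({
                "segment_id": feature["id"],  # foreign key into the segment table
                "lane_index": index,
                "direction": lane.get("direction"),
            })
    return segments, lanes


# Hypothetical, heavily trimmed feature in the shape of the draft schema.
feature = {
    "id": "234",
    "properties": {
        "subType": "road",
        "road": {
            "class": "secondary",
            "lanes": [{"direction": "forward"}] * 3,
        },
    },
}
segments, lanes = split_tables([feature])
```

Each list of row dicts can be handed to `pd.DataFrame(...)` as-is. The catch is that every nested array (lanes, restrictions, conditions) needs its own hand-written child table, which is exactly the hard-coding I would like to avoid.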

vcschapp · 2023-06-23T00:42:51Z

vcschapp
Jun 23, 2023
Maintainer

Hi @e-lo!
Thanks for the feedback, it is greatly appreciated.

One thing I should be upfront about is that we're viewing GeoJSON as a "mental model" or "canonical data format" that allows us to describe the schema, but not necessarily as a viable final sharing format. That being said, in our current line of thinking, we're looking at data formats that would be losslessly convertible to/from the GeoJSON "canonical model", so your core concerns would probably still remain regardless of the data format.

Can I ask you a couple of questions that would help us put your feedback in context?

What's your definition of deep nesting? Is there some level that is not problematic in your use case, and another level that is too deep?
Is there a toolchain that you prefer to use for converting data to a more tabular format that would work with a different data format, but not with data conforming to our draft schema?
Is there a standard data format that you would prefer to see used?
Assuming for the sake of argument that the nesting level didn't change much in the future, can you imagine some kind of toolchain or cookbook that we could provide that would help you in your tabular conversion use case?


e-lo · 2023-06-23T17:47:47Z

e-lo
Jun 23, 2023
Author

> What's your definition of deep nesting? Is there some level that is not problematic in your use case, and another level that is too deep?

That's a good question. I don't know of a good definition (one probably exists!), but what I do know is that my initial test of some of the example data provided got icky quickly. For example, a few rows of this:

schema/examples/transportation/segment/road/lanes/restrictions/lanes-hov-occupancy-scoped.yaml

Lines 2 to 56 in 874ca44

    
    id: "234"
    type: Feature
    geometry:
      type: LineString
      coordinates: [[0, 0], [1, 1]]
    properties:
      # Overture properties
      theme: transportation
      type: segment
      subType: road
      updateTime: "2023-05-10T12:02:30-08:00"
      version: 0
      road:
        class: secondary
        restrictions:
          speedLimits:
            - maxSpeed:
                - 100
                - "km/h"
        lanes:
          # one-way road with access and speed limit restrictions
          # digitization: S->N
          # |   |   |   |
          # |   |   |   | => max speeds: 100 km/h for whole segment
          # | h |   |   |    but on lane 2 is limited to 80 km/h for hgv vehicles
          # | o |   |   |
          # | v |   |   |
          # |   |   |   |
          # | 0 | 1 | 2 |
          - direction: forward # lane 0 -> hov only that allows also bicycles
            restrictions:
              access:
                - allowed:
                    when:
                      mode:
                        - hov
                        - bicycle
              minOccupancy:
                - isAtLeast: 3
                  when:
                    mode:
                      - hov
                - isAtLeast: 1
                  when:
                    mode:
                      - hov
                    recognized:
                      - asPermitted
                - isAtLeast: 1
                  when:
                    mode:
                    - bicycle
          - direction: forward # lane 1
          - direction: forward # lane 2

pd.DataFrame([flatten_json.flatten_json(row['properties']) for row in data_json])

...turns into a really ugly dataframe where each level of nesting, and each array index, ends up baked into the column name, in a way that isn't easily predictable for an arbitrary dataset.

|  | theme | type | subType | updateTime | version | road_class | road_restrictions_speedLimits_0_maxSpeed_0 | road_restrictions_speedLimits_0_maxSpeed_1 | road_lanes_0_direction | road_lanes_0_restrictions_access_0_allowed_when_mode_0 | road_lanes_0_restrictions_access_0_allowed_when_mode_1 | road_lanes_0_restrictions_minOccupancy_0_isAtLeast | road_lanes_0_restrictions_minOccupancy_0_when_mode_0 | road_lanes_0_restrictions_minOccupancy_1_isAtLeast | road_lanes_0_restrictions_minOccupancy_1_when_mode_0 | road_lanes_0_restrictions_minOccupancy_1_when_recognized_0 | road_lanes_0_restrictions_minOccupancy_2_isAtLeast | road_lanes_0_restrictions_minOccupancy_2_when_mode_0 | road_lanes_1_direction | road_lanes_2_direction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | transportation | segment | road | 2023-05-10T12:02:30-08:00 | 0 | secondary | 100 | km/h | forward | hov | bicycle | 3 | hov | 1 | hov | asPermitted | 1 | bicycle | forward | forward |
| 1 | transportation | segment | road | 2023-05-10T12:02:30-08:00 | 0 | secondary | 100 | km/h | forward | hov | bicycle | 3 | hov | 1 | hov | asPermitted | 1 | bicycle | forward | forward |
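To show why the column names are not predictable, here is a small stand-in for the flattening step. The `flatten` function below is my own sketch of the general technique (join keys and list indices with `_`); it is not the `flatten_json` library's implementation, and the two sample records are invented for illustration.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts/lists into one dict, joining keys and list indices with '_'."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        # Leaf value: the accumulated prefix becomes the column name.
        return {prefix: value}
    flat = {}
    for key, child in items:
        name = f"{prefix}_{key}" if prefix else str(key)
        flat.update(flatten(child, name))
    return flat


# Two hypothetical records that differ only in how many lanes they have.
one_lane = {"road": {"lanes": [{"direction": "forward"}]}}
two_lanes = {"road": {"lanes": [{"direction": "forward"}, {"direction": "backward"}]}}

print(sorted(flatten(one_lane)))
# ['road_lanes_0_direction']
print(sorted(flatten(two_lanes)))
# ['road_lanes_0_direction', 'road_lanes_1_direction']
```

The set of columns depends on the data itself: every extra lane, restriction, or `when` condition adds new columns, so two datasets that follow the same schema can flatten to different table layouts.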

...when I started looking at writing some simple code to make it more usable, it got complex fast, especially for array items. Rather than invest more time in that (for now), I thought I would write this discussion issue instead ;-) (If nothing else, I would love to learn about a straightforward solution to handling the deep nesting even if the schema doesn't change.)
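For what it's worth, one workaround for the array problem is to go "long" instead of "wide": one row per leaf value, with the position in the nesting kept as data rather than as a column name. This is only a sketch of that idea; the function name `to_long_rows` and the three column names are hypothetical, and the sample properties are trimmed down from the example above.

```python
def to_long_rows(feature_id, value, path=()):
    """Yield one {feature_id, path, value} row per leaf of a nested structure."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from to_long_rows(feature_id, child, path + (key,))
    elif isinstance(value, list):
        # Array positions become path elements instead of column-name suffixes.
        for index, child in enumerate(value):
            yield from to_long_rows(feature_id, child, path + (index,))
    else:
        yield {"feature_id": feature_id, "path": path, "value": value}


properties = {
    "road": {
        "class": "secondary",
        "lanes": [
            {
                "direction": "forward",
                "restrictions": {
                    "minOccupancy": [{"isAtLeast": 3, "when": {"mode": ["hov"]}}]
                },
            },
            {"direction": "forward"},
        ],
    },
}
rows = list(to_long_rows("234", properties))

# Pick out every lane direction without knowing how many lanes exist.
lane_directions = [
    row for row in rows
    if row["path"][:2] == ("road", "lanes") and row["path"][-1] == "direction"
]
```

The table always has the same three columns no matter how deep the data goes, but the `value` column mixes types and most analysis still needs a pivot or filter step afterwards, so I would not call it a real fix.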


e-lo · 2023-06-23T18:00:18Z

e-lo
Jun 23, 2023
Author

I don't have any terrific answers to your other questions; I think if there were any, there would be less need for (and attention on) what you all are doing!

> Is there a standard data format that you would prefer to see used?

Something similar to GMNS is a useful "target" for what it could look like in table format. Although I don't have the perfect GeoJSON representation of it, you could construct one by joining the relational tables.

> Is there a toolchain that you prefer to use for converting data to a more tabular format that would work with a different data format, but not with data conforming to our draft schema?

In general, I see a lot of useful usage of osmnx, although it's certainly not a perfect interpretation of all of the data in OSM.
I tend to use pandas and its close companions in my work, but plenty of others in my field are in the PostGIS or R environs.


e-lo · 2023-06-23T18:00:58Z

e-lo
Jun 23, 2023
Author

> can you imagine some kind of toolchain or cookbook that we could provide that would help you in your tabular conversion use case?

I can imagine that being very useful!

