Data model for ROI information is awkward #58

stuarteberg · 2021-06-20T20:44:26Z

The way we handle ROIs in the neuprint data model has always rubbed me the wrong way. There's got to be a better way.

Disclaimer: I wasn't involved in the development of the original data model, so perhaps the tweaks I'd like to propose were already considered and rejected.

The Problems

There are two independent reasons handling ROI information is awkward in neuprint:

1. ROIs are permitted to overlap. That sounds flexible, but it introduces unnecessary complications and inconveniences. This forces us to distinguish between "primary" and non-primary ROIs. If one is interested in a non-primary ROI, one must be very careful when constructing queries and interpreting the results to ensure that duplicate results are accounted for properly. I'm certain this is not intuitive to newcomers.
2. ROI information is hidden away in a JSON property (roiInfo) within certain nodes. This makes filtering or otherwise manipulating the ROI information awkward. One must rely on special-purpose Cypher functions (e.g. apoc.convert.fromJsonMap()) or simply download a lot of little roiInfo JSON objects and perform the filtering/manipulation on the client side. Yuck.
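
To make both problems concrete, here is a rough sketch of the client-side work one ends up doing today. The rows, body IDs, and counts are invented for illustration, the helper name is hypothetical, and the roiInfo shape (ROI name mapped to "pre"/"post" counts) is simplified.

```python
import json

# Hypothetical query results: (bodyId, raw roiInfo JSON string).
# The IDs and counts are made up; here PB and EB are sub-regions of CX,
# so the CX entry overlaps with both of them.
rows = [
    (1001, '{"CX": {"pre": 40, "post": 120}, "PB": {"pre": 40, "post": 100}, "EB": {"post": 20}}'),
    (1002, '{"CX": {"pre": 5, "post": 8}, "EB": {"pre": 5, "post": 8}}'),
    (1003, '{}'),
]

def post_counts_in_roi(rows, roi, min_post=1):
    """Return (bodyId, post count) for rows with >= min_post sites in roi."""
    hits = []
    for body_id, roi_info_json in rows:
        roi_info = json.loads(roi_info_json)  # every row must be parsed client-side
        post = roi_info.get(roi, {}).get("post", 0)
        if post >= min_post:
            hits.append((body_id, post))
    return hits

pb_hits = post_counts_in_roi(rows, "PB", min_post=50)
eb_hits = post_counts_in_roi(rows, "EB")

# Problem 1 in miniature: summing over every ROI double-counts, because
# the CX total already includes the PB and EB sites.
first = json.loads(rows[0][1])
naive_total = sum(counts.get("post", 0) for counts in first.values())
# naive_total is 240, although this neuron has only 120 postsynaptic sites.
```

Getting the right total requires knowing which ROIs are nested inside which others, and that knowledge lives outside the roiInfo object itself.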

Possible Remedies

1. To address problem 1, I think we should simply require that ROIs are strictly hierarchical, and track each :Element, :Synapse, etc. according to the single bottom-level ROI it is contained in. When fully-qualified, ROI names will include the complete hierarchy, e.g. CX.PB.PB(L1), or possibly even hemibrain.CX.PB.PB(L1). Where necessary, convenience functions can be provided to map from a simple ROI name to its fully-qualified name. In client libraries such as neuprint-python, we'll make use of Cypher's regular expression features to refer to higher-level ROIs, e.g. CX.PB.* to capture everything in the PB. Note that under this scheme, there is no need to assign some ROIs a "primary" status.
2. To address problem 2, I think we can encode ROI information as additional nodes or edges in the data graph. There are probably multiple ways to do that, but the simplest that comes to mind is to add parallel nodes or edges in every place where we'd normally use an roiInfo.
- For :Element nodes (including :Synapse), non-overlapping ROIs as described above would allow us to replace roiInfo with a simple string property.
- Looking at the data model diagram, I think we might want to add parallel :ConnectsTo edges (one per ROI) between :Neuron nodes and also parallel edges between :SynapseSet nodes. Alternatively, we could add parallel :SynapseSet nodes themselves, but I'm not sure if that would make things more or less confusing.
  
  [Edit: There are other possibilities. One is to add properties to each :ConnectsTo edge that hold the connection's synapse totals, one property per ROI.]
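
The hierarchical naming idea from the first remedy could look roughly like this on the client side. Everything here is an assumption for the sake of illustration: the toy hierarchy, the helper names, the `roi` string property on :Synapse, and the `.` separator. None of it is an existing neuprint or neuprint-python API.

```python
import re

# Toy hierarchy (child -> parent); None marks a top-level ROI.
PARENT = {
    "CX": None,
    "PB": "CX",
    "EB": "CX",
    "PB(L1)": "PB",
    "PB(R1)": "PB",
}

def fully_qualified(roi):
    """Map a simple ROI name to its dotted, fully-qualified name."""
    parts = []
    while roi is not None:
        parts.append(roi)
        roi = PARENT[roi]
    return ".".join(reversed(parts))

def subtree_pattern(roi):
    """Regex matching the ROI itself and everything beneath it.

    Dots and parentheses in ROI names are regex metacharacters, so the
    name is escaped before the optional ".<anything>" suffix is appended.
    """
    return re.escape(fully_qualified(roi)) + r"(\..*)?"

def synapse_count_query(roi):
    """Build a Cypher query plus parameters for a hypothetical s.roi property.

    Passing the pattern as a parameter avoids having to escape backslashes
    inside a Cypher string literal.
    """
    query = "MATCH (s:Synapse) WHERE s.roi =~ $pattern RETURN count(s) AS n"
    return query, {"pattern": subtree_pattern(roi)}

names = [fully_qualified(r) for r in PARENT]
pattern = subtree_pattern("PB")
# re.fullmatch stands in for Cypher's =~, which must match the whole string.
in_pb = [n for n in names if re.fullmatch(pattern, n)]
query, params = synapse_count_query("PB")
```

One design note: an unescaped pattern like CX.PB.* would also match a hypothetical sibling such as CX.PBX, since `.` matches any character. Escaping the name and requiring a literal dot before the suffix keeps the match inside the intended subtree.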
