[draft] Support multi-entity in v2/observation #1867
Code Review
This pull request introduces support for multi-entity observations within the Spanner data source, including necessary protobuf updates and logic for querying and reconstructing observations from the normalized schema. The implementation also adds support for hydrating dimension nodes. Feedback focuses on improving data integrity and efficiency by removing hardcoded property exclusions that lead to data loss, eliminating redundant in-memory sorting of results already ordered by the database, and addressing potential data truncation during node hydration by properly handling result limits.
    var multiEntityDimensionExcludedProperties = map[string]struct{}{
        "facetId": {},
        "importName": {},
        "isDcAggregate": {},
        "measurementMethod": {},
        "observationAbout": {},
        "observationPeriod": {},
        "provenanceUrl": {},
        "scalingFactor": {},
        "unit": {},
    }
The property observationAbout is hardcoded in the multiEntityDimensionExcludedProperties list, which leads to dimension information being lost in the response. Following repository guidelines, avoid hardcoding property-specific behaviors such as special value filtering. The API or schema should dictate the type of value and how it is handled.
References
- Avoid hardcoding property-specific behaviors, such as special value filtering. Instead, the API/schema should dictate the type of value and how it's handled.
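One way to read this guideline is that the data source asks schema metadata which properties are dimensions instead of consulting a package-level exclusion map. The sketch below illustrates that idea only. `PropertyRole`, `SchemaLookup`, and `dimensionProperties` are hypothetical names that do not come from the repository, and the role assignments in `main` are made up for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// PropertyRole is a hypothetical classification that would be derived from
// schema metadata rather than from a hardcoded list in the data source.
type PropertyRole int

const (
	RoleDimension PropertyRole = iota
	RoleFacet
)

// SchemaLookup is a hypothetical hook into the schema. ok is false when the
// schema has no information about the property.
type SchemaLookup func(property string) (role PropertyRole, ok bool)

// dimensionProperties keeps every property the schema marks as a dimension.
// Properties the schema does not know are kept as well, so nothing is
// silently dropped from the response.
func dimensionProperties(props []string, lookup SchemaLookup) []string {
	dims := make([]string, 0, len(props))
	for _, p := range props {
		role, ok := lookup(p)
		if !ok || role == RoleDimension {
			dims = append(dims, p)
		}
	}
	sort.Strings(dims)
	return dims
}

func main() {
	// Illustrative role assignments only; real values would come from schema.
	roles := map[string]PropertyRole{
		"observationAbout":  RoleDimension,
		"unit":              RoleFacet,
		"measurementMethod": RoleFacet,
	}
	lookup := func(p string) (PropertyRole, bool) {
		r, ok := roles[p]
		return r, ok
	}
	fmt.Println(dimensionProperties([]string{"unit", "observationAbout", "gender"}, lookup))
}
```

With a lookup like this, `observationAbout` stays in the response because the schema classifies it as a dimension, and no property name is special-cased in the data source itself.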
    sort.Slice(pointStats, func(i, j int) bool {
        return pointStats[i].GetDate() < pointStats[j].GetDate()
    })
The explicit sort of pointStats in Go is redundant because the underlying Spanner query already includes an ORDER BY svo.date ASC clause within the ARRAY subquery (see statements_normalized.go), and Spanner preserves the order of elements in the resulting array.
    sort.Slice(pointStats, func(i, j int) bool {
        return pointStats[i].GetDate() < pointStats[j].GetDate()
    })
    if len(pointStats) == 0 {
        return nil
    }
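If the sort is dropped and the query's ORDER BY becomes the only ordering guarantee, a cheap ordering check in a unit test can catch a regression in the query. The sketch below is an assumption-laden illustration. `point` is a hypothetical stand-in for the generated point message, `isDateOrdered` is not a function from the repository, and dates are assumed to be ISO 8601 strings so that string comparison matches chronological order.

```go
package main

import (
	"fmt"
	"sort"
)

// point is a hypothetical stand-in for the generated point-stat message.
type point struct{ Date string }

// GetDate mirrors the getter used in the reviewed snippet.
func (p point) GetDate() string { return p.Date }

// isDateOrdered reports whether points are already in ascending date order.
// It inspects the slice without modifying it or allocating.
func isDateOrdered(points []point) bool {
	return sort.SliceIsSorted(points, func(i, j int) bool {
		return points[i].GetDate() < points[j].GetDate()
	})
}

func main() {
	fromQuery := []point{{"2019"}, {"2020"}, {"2021"}}
	fmt.Println(isDateOrdered(fromQuery)) // true

	shuffled := []point{{"2021"}, {"2019"}, {"2020"}}
	fmt.Println(isDateOrdered(shuffled)) // false
}
```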
    nodeResp, err := sds.Node(ctx, &pbv2.NodeRequest{
        Nodes:    nodeDcids,
        Property: nodePropertyExpression(nodeProperties),
    }, datasources.DefaultPageSize)
Using datasources.DefaultPageSize as a limit for sds.Node might result in incomplete hydration of dimension nodes if the number of unique DCIDs in the multi-entity observations exceeds the default page size. Unless there is a strong, documented domain constraint ensuring the number of results will not exceed this limit, pagination should be handled or the limit should be adjusted to len(nodeDcids).
References
- Pagination for API calls can be omitted if there's a strong, documented domain constraint ensuring the number of results will not exceed the page size limit.
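The pagination option can be sketched as a loop that keeps requesting pages until no continuation token is returned. This is a minimal illustration, not the repository's API. `page`, `fetchFunc`, `hydrateAll`, and `fakeFetch` are hypothetical, and the real `sds.Node` call and response types may differ in shape.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// page is a hypothetical stand-in for a node response: hydrated nodes plus
// a continuation token that is empty on the last page.
type page struct {
	Nodes     map[string]string
	NextToken string
}

// fetchFunc is a hypothetical wrapper around a single paged node lookup.
type fetchFunc func(dcids []string, token string, limit int) (page, error)

// hydrateAll follows continuation tokens until every page has been merged,
// so the result is not truncated at the page size.
func hydrateAll(dcids []string, limit int, fetch fetchFunc) (map[string]string, error) {
	if limit <= 0 {
		return nil, errors.New("limit must be positive")
	}
	out := make(map[string]string, len(dcids))
	token := ""
	for {
		p, err := fetch(dcids, token, limit)
		if err != nil {
			return nil, err
		}
		for dcid, node := range p.Nodes {
			out[dcid] = node
		}
		if p.NextToken == "" {
			return out, nil
		}
		token = p.NextToken
	}
}

// fakeFetch simulates a backend that returns at most limit nodes per call
// and encodes the next start offset in the token.
func fakeFetch(dcids []string, token string, limit int) (page, error) {
	start := 0
	if token != "" {
		n, err := strconv.Atoi(token)
		if err != nil {
			return page{}, err
		}
		start = n
	}
	end := start + limit
	if end > len(dcids) {
		end = len(dcids)
	}
	p := page{Nodes: map[string]string{}}
	for _, d := range dcids[start:end] {
		p.Nodes[d] = "node for " + d
	}
	if end < len(dcids) {
		p.NextToken = strconv.Itoa(end)
	}
	return p, nil
}

func main() {
	dcids := []string{"geoId/06", "geoId/36", "geoId/48", "country/USA", "Earth"}
	nodes, err := hydrateAll(dcids, 2, fakeFetch)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(nodes)) // 5
}
```

Passing `len(nodeDcids)` as the limit, as the comment also suggests, avoids the loop entirely but only works if the backend accepts an arbitrarily large limit.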
…obuf models, and multi-entity facet resolution
d943e78 to c76a0e5
No description provided.