[v0/v1 migration] Implement CheckVariableGroupExistence to support StatVarGroup existence checks in Spanner by nick-nlb · Pull Request #1892 · datacommonsorg/mixer

nick-nlb · 2026-05-12T22:39:42Z

Issue

Description

Legacy observation existence checks were able to resolve checks for SVGs in addition to stat vars. This functionality was used to power the "Data source" and "Dataset" filters in the Stat Var Explorer. These checks used the BT cache, which is no longer available in the Spanner path and resulted in this functionality being broken.

This PR brings this functionality to Spanner by bringing existence check support to SVGs, specifically for imports / datasets.

Notes

Mirroring legacy code, we use a substring slice (importName = subjectID[8:]) to extract the raw import name from the subject_id found in the Edge table. We need to do this because the Edge table stores source nodes with a prefix (dc/base/AustraliaStatistics) while observations stores the name itself AustraliaStatistics in its import_name column. We are doing this in Go here (to avoid a substring in a join in Spanner). As a separate consideration from this PR, we could look at whether this shouldbe resolved ad the schema or data ingestion level so that the join can be performed on exact matches without string manipulation (this method does seem brittle).

Separately, this PR implements the SVG level existence checks, but only for imports and datasets (but not entities such as places). This is the only thing needed for the functionality as used here; however we could consider expanding this in another PR as a follow-up.

Testing

In the stat var explorer, when on master but with Spanner on, the "Data source" filter is empty.

When on this branch, the "Data source" filter is populated and should match production. Selecting "Data sources" should then both filter the SVGs and populate the "Dataset" list, with outputs that match production.

…ce checks in Spanner datasource

gemini-code-assist

Code Review

This pull request introduces the CheckVariableGroupExistence method to the Spanner client to support existence checks for variable groups and topics. The Observation data source has been updated to split incoming variables into standard statistical variables and groups/topics, routing them to the appropriate existence check logic. Feedback suggests avoiding the hardcoded string offset used for prefix stripping, ensuring that the import filter logic correctly handles requests with mixed entity types rather than assuming uniformity based on the first entity, and restoring the warning logs for malformed database rows to maintain observability.

keyurva

Thanks Nick! This was a deep change and would've taken a lot of figuring out and getting things in order. Much appreciated!

keyurva · 2026-05-13T20:58:49Z

 	if isExistenceRequest {
-		rows, err := sds.client.CheckVariableExistence(ctx, variables, entities)
+		if len(variables) == 0 {
+			rows, err := sds.client.CheckVariableExistence(ctx, variables, entities)


Can we create helpers like handleEntityOnlyExistence, sortNodesByType, etc. for better readability?

keyurva · 2026-05-13T21:02:04Z

 				continue
 			}
 			obs = append(obs, &Observation{
 				VariableMeasured: row[0],


For my understanding: This is interesting. So the current implementation returns SVs, SVGs and topics under VariableMeasured?

keyurva · 2026-05-13T21:06:11Z


+// CheckVariableGroupExistence checks for the existence of observations for the given variable groups/topics and entities.
+// Returns a slice of rows, where each row contains [variable, entity] that has at least one observation.
+func (sc *spannerDatabaseClient) CheckVariableGroupExistence(ctx context.Context, variables []string, entities []string) ([][]string, error) {


variables here are actually groups or topics, correct? If so, should we name them groupsOrTopics or something similar?

keyurva · 2026-05-13T21:06:52Z

+	// We are doing this because a query on root involve a massive join involving
+	// a large number of variables (causing a slow query and risk of timeout).
+	stmtRoots := spanner.Statement{
+		SQL: `SELECT DISTINCT subject_id


Our pattern is to define statements in statements.go. Can we do that for these statements as well?

keyurva · 2026-05-13T21:08:18Z

+	stmtRoots := spanner.Statement{
+		SQL: `SELECT DISTINCT subject_id
+			FROM Edge
+			WHERE predicate = 'specializationOf'


Can you describe how topics are handled because they won't have any specializationOf predicates.

keyurva · 2026-05-13T21:16:44Z


+// CheckVariableGroupExistence checks for the existence of observations for the given variable groups/topics and entities.
+// Returns a slice of rows, where each row contains [variable, entity] that has at least one observation.
+func (sc *spannerDatabaseClient) CheckVariableGroupExistence(ctx context.Context, variables []string, entities []string) ([][]string, error) {


To manage the size of this method, can we create helpers - e.g. identifyRoots, handleRoots, handleNonRoots, etc.?

keyurva · 2026-05-13T21:19:24Z

+	params := map[string]interface{}{}
+	for _, p := range keys {
+		ents := entitiesByPredicate[p]
+		filters = append(filters, fmt.Sprintf(`(predicate = '%s' AND object_id IN UNNEST(@%s))`, p, p))


I believe (not certain) these filter statements are declared in statements.go as well. Let's declare them there if that's the case for other filters as well.

keyurva · 2026-05-13T21:19:57Z

+	}
+
+	sourceEdgesStmt := spanner.Statement{
+		SQL: `SELECT DISTINCT subject_id, object_id


Can this be declared in statements.go as well?

keyurva · 2026-05-13T21:20:44Z

+	// Handle non-roots
+	if len(nonRoots) > 0 {
+		existenceQueryStmt := spanner.Statement{
+			SQL: `SELECT DISTINCT e.object_id AS variable, o.import_name


Similar statements.go comment as others.

keyurva · 2026-05-13T22:23:29Z


+// CheckVariableGroupExistence checks for the existence of observations for the given variable groups/topics and entities.
+// Returns a slice of rows, where each row contains [variable, entity] that has at least one observation.
+func (sc *spannerDatabaseClient) CheckVariableGroupExistence(ctx context.Context, variables []string, entities []string) ([][]string, error) {


nit: From offline chat, we learned that variables are actually SVGs or topics and entities are sources.

And the outputs are SVGs / topics and import names.

Let's rename variables / add comments wherever appropriate to make this clear.

Implement CheckVariableGroupExistence to support StatVarGroup existen…

47d0cd2

…ce checks in Spanner datasource

nick-nlb changed the title ~~[v0/v1 migration] Implement CheckVariableGroupExistence to support StatVarGroup existece checks in Spanner~~ [v0/v1 migration] Implement CheckVariableGroupExistence to support StatVarGroup existence checks in Spanner May 12, 2026

gemini-code-assist Bot reviewed May 12, 2026

View reviewed changes

Comment thread internal/server/spanner/query.go

Comment thread internal/server/spanner/query.go Outdated

Comment thread internal/server/spanner/datasource.go

nick-nlb added 3 commits May 12, 2026 15:47

Implement CheckVariableGroupExistence in coordinateMockSpannerClient

495d50f

Update source edge query to support multiple predicates

132f8b2

Add back a slog that was hastily removed.

760aa53

nick-nlb marked this pull request as ready for review May 12, 2026 23:08

nick-nlb requested a review from keyurva May 12, 2026 23:08

keyurva reviewed May 13, 2026

View reviewed changes

Conversation

nick-nlb commented May 12, 2026

Issue

Description

Notes

Testing

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

keyurva left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants