
feat(energy_quantified): add Energy Quantified connector #177

Open
mattslack-db wants to merge 4 commits into databrickslabs:master from mattslack-db:feat/connector-energy_quantified

Conversation

@mattslack-db

What's included

  • API research doc: src/databricks/labs/community_connector/sources/energy_quantified/energy_quantified_api_doc.md
  • Implementation: energy_quantified.py (+ energy_quantified_schemas.py, energy_quantified_utils.py)
  • Simulator spec: source_simulator/specs/energy_quantified/endpoints.yaml + 6 corpus JSON files
  • Connector spec: connector_spec.yaml
  • Documentation: README.md
  • Tests: tests/unit/sources/energy_quantified/test_energy_quantified_lakeflow_connect.py + configs/dev_table_config.json

Tables (6)

European energy market data via Energy Quantified's curve-centric model:

  • Catalog: curves (snapshot, page-based pagination)
  • Time-series: timeseries (append), ohlc (append), srmc (append) — all partitioned over time windows
  • Forecasts: instances (append, backward cursor)
  • Open intervals: periods (cdc with lookback, partitioned)

Simulate-mode test results

18 passed, 1 skipped in 1.18s

(test_read_table_deletes skipped — no cdc_with_deletes tables.)

Notable design choices

  • Single connection parameter (api_key) — every per-curve parameter lives in table_options
  • `+` characters in curve names preserved as %2B via urllib.parse.quote(name, safe='')
  • Client-side rate pacing at ~15 req/s; exponential backoff with jitter on 429/5xx
  • All PKs flat top-level (#174)
  • No from __future__ import annotations (#173)
  • get_partitions raises NotImplementedError for non-partitioned tables (curves, instances)
  • Partitioned reads on the four range-query tables for parallel ingestion
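The `+` encoding choice above can be sketched with the standard library. This is an illustrative helper, not the connector's actual code, and the curve names are made up:

```python
from urllib.parse import quote

def encode_curve_name(name: str) -> str:
    """Percent-encode a curve name for use in a URL path segment.

    safe='' forces every reserved character to be encoded, so '+'
    becomes %2B instead of surviving as a literal plus (which many
    servers would decode back to a space).
    """
    return quote(name, safe="")

print(encode_curve_name("A+B"))    # -> A%2BB
print(encode_curve_name("a b/c"))  # -> a%20b%2Fc
```

With the default `safe='/'`, slashes would pass through unencoded, which is why `safe=''` is spelled out explicitly.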

The simulator spec and corpus have not yet been validated against the live API.

Next step: live testing

/validate-connector energy_quantified

This pull request and its description were written by Isaac.

…testing]

Implements LakeflowConnect + SupportsPartitionedStream for the
Energy Quantified REST API (European power, gas, weather, coal,
carbon market data). 6 tables across the curve-centric data model:

- curves (snapshot, catalog walk)
- timeseries (append, partitioned over time windows)
- instances (append, backward cursor walk from latest)
- periods (cdc with lookback, partitioned)
- ohlc (append, partitioned)
- srmc (append, partitioned)

Notable design choices:
- Single connection parameter (api_key, X-API-Key header) — every
  table parameterisation lives in table_options to avoid the
  INVALID_DATASOURCE_OPTION_OVERRIDE_ATTEMPT trap.
- curve_name is required on all 5 non-catalog tables; %2B encoding
  on '+' characters preserved via urllib.parse.quote(name, safe='').
- Client-side rate pacing at ~15 req/s; exponential backoff with
  jitter on 429/5xx.
- _init_ts cap clamps every cursor returned to read_table so
  Trigger.AvailableNow converges.
- Partitioned reads for the four range-query tables (timeseries,
  periods, ohlc, srmc); curves and instances opt out and rely on
  the framework's read_table fallback.
- All PKs flat top-level columns (lesson from databrickslabs#174). OHLC product
  block and SRMC contract block exploded so traded_at / period /
  front / delivery are flat.
- No `from __future__ import annotations` anywhere (lesson from
  databrickslabs#173 — merge script doesn't special-case it).
- get_partitions raises NotImplementedError for curves/instances so
  the framework falls back to read_table for non-partitioned tables.
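The pacing-and-retry policy described above might look roughly like this; the helper names, retry limit, and backoff base are illustrative assumptions, not the connector's actual implementation:

```python
import random
import time

RATE_LIMIT_RPS = 15   # client-side pacing target (~15 req/s)
MAX_RETRIES = 5
_last_request_at = 0.0

def _pace() -> None:
    """Sleep just long enough to stay under ~15 requests per second."""
    global _last_request_at
    min_interval = 1.0 / RATE_LIMIT_RPS
    elapsed = time.monotonic() - _last_request_at
    if elapsed < min_interval:
        time.sleep(min_interval - elapsed)
    _last_request_at = time.monotonic()

def request_with_backoff(send, base: float = 1.0):
    """Call send() (returns (status, body)), retrying 429/5xx with
    exponential backoff plus full jitter."""
    for attempt in range(MAX_RETRIES):
        _pace()
        status, body = send()
        if status == 429 or status >= 500:
            # Full jitter: sleep anywhere in [0, base * 2**attempt).
            time.sleep(random.uniform(0, base * (2 ** attempt)))
            continue
        return body
    raise RuntimeError("retries exhausted after repeated 429/5xx")
```

Full jitter (rather than a fixed exponential delay) spreads retries from concurrent partitioned readers so they don't re-collide on the rate limiter.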

Simulate-mode tests: 18 passed, 1 skipped (cdc_with_deletes N/A).
Live API testing deferred to /validate-connector.

Co-authored-by: Isaac
@mattslack-db mattslack-db requested a review from yyoli-db as a code owner May 11, 2026 21:12
Surfaced by record-mode validation against a live Energy Quantified
account. Three real connector bugs found (live requests failed) plus
spec/corpus reshapes for fields the simulator was missing.

Connector bugs (live API rejected the request or returned wrong data):
- /api/metadata/curves/ returns a flat JSON array, not an envelope
  with `data`. _read_curves now handles flat-list responses and uses
  Link/X-Total-Pages headers for pagination instead of looking for a
  wrapper.
- /api/instances/{curve} rejects ISO 8601 datetimes with a 'T'
  separator (400 VALIDATION ERROR: "Not a valid date time"). EQ
  requires a space separator. Added to_eq_datetime helper in utils
  and switched _init_ts and cursor formatting to use it.
- /api/srmc/{curve}/timeseries/ wraps records under
  body.timeseries.data, not body.data. _explode_srmc_response now
  reads from the correct path and pulls srmc_options out of
  body.srmc_options.
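A minimal sketch of the datetime fix: the real to_eq_datetime lives in energy_quantified_utils, and this version only shows the space-separator formatting the API requires:

```python
from datetime import datetime

def to_eq_datetime(dt: datetime) -> str:
    """Format a datetime the way the EQ API accepts: ISO 8601 fields,
    but with a space between date and time instead of 'T'."""
    return dt.strftime("%Y-%m-%d %H:%M:%S")

print(to_eq_datetime(datetime(2026, 5, 11, 21, 12, 0)))
# -> 2026-05-11 21:12:00
# dt.isoformat() would yield 2026-05-11T21:12:00, which the live API
# rejects with 400 VALIDATION ERROR: "Not a valid date time".
```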

Schema/shape reshapes (records still parsed but fields were missing):
- curve.resolution.{frequency,timezone} now flat on every embedded
  curve via _normalise_curve / _embedded_curve so they're addressable
  as flat columns where needed.
- srmc top-level metadata (contract, denominator, srmc_options) is
  exploded out of the response envelope rather than left nested.

Spec / corpus:
- endpoints.yaml: curves declares a flat-list response (no wrapper);
  timeseries / periods / ohlc / srmc embed resolution.{frequency,
  timezone} inside the curve; srmc uses records_key: timeseries.data
  to model the nested envelope.
- All six corpora reseeded from the live cassette via
  tools.cassette_to_corpus. Curves trimmed to 20 records across 4
  curve types so the simulator has variety.

Tests:
- tests/unit/sources/energy_quantified/configs/dev_table_config.json:
  switched fictional simulator stand-in curve names to real subscribed
  curves on the account; lowercase period values ("month"); limit on
  instances so simulate mode terminates within the partition budget.

Regenerated _generated_energy_quantified_python_source.py.

Co-authored-by: Isaac
DLT's append_flow silently drops rows where any primary_key column
is null. The srmc table's PK included both `front` and `delivery`,
but the SRMC API's `contract` block only returns whichever of the
two the caller queried with — the other is null on every record.
For front-based queries (most common usage) this meant every srmc
row had delivery=null, so the deployed pipeline ingested 0 rows
into shared.matt_slack.srmc while local tests passed.

Drop `delivery` from the srmc PK; (curve_name, date, period, front)
is sufficient for uniqueness within a single request. `delivery`
remains a regular nullable column for downstream filtering.

The same shape exists on `ohlc`, but the OHLC API populates BOTH
front and delivery on every record (delivery is the resolved
contract month), so its PK works as-is. Left unchanged.
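The failure mode can be illustrated outside DLT with a hypothetical stand-in for the null-PK filtering (the predicate is assumed behavior, not DLT's source): under the old PK every front-based srmc row is dropped, under the new PK all survive.

```python
# Hypothetical stand-in for append_flow dropping rows with a null PK column.
def surviving_rows(rows, pk):
    return [r for r in rows if all(r.get(col) is not None for col in pk)]

# Shape of front-based SRMC records: `delivery` is null on every row.
rows = [
    {"curve_name": "c1", "date": "2026-05-01", "period": "month",
     "front": 1, "delivery": None},
    {"curve_name": "c1", "date": "2026-05-01", "period": "month",
     "front": 2, "delivery": None},
]

old_pk = ["curve_name", "date", "period", "front", "delivery"]
new_pk = ["curve_name", "date", "period", "front"]

print(len(surviving_rows(rows, old_pk)))  # -> 0, every row silently dropped
print(len(surviving_rows(rows, new_pk)))  # -> 2, all rows kept
```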

Co-authored-by: Isaac
CI's pylint runs with the same flag set as /self-review-connector
but flags three findings the connector-dev pass missed:

- energy_quantified.py:43 (line-too-long): the absolute import path
  to energy_quantified_schemas is 101 chars. Adding
  `disable-next=line-too-long` since renaming the module or switching
  to a relative import is cosmetic at best.
- energy_quantified_utils.py:25: same long absolute import — same fix.
- energy_quantified.py _read_instances: 21 branches (max 20) and 57
  statements (max 50). The function is doing legitimate pagination
  with multiple termination conditions; splitting it makes the cursor
  bookkeeping harder to follow. Disable `too-many-branches` and
  `too-many-statements` locally on the function rather than refactor.

Co-authored-by: Isaac
@yyoli-db
Collaborator

please file LPP following go/community-connectors and go/community-connectors/tracking
and also run /self-review-connector @mattslack-db
