
feat(energy_quantified): add Energy Quantified connector #177

Open
mattslack-db wants to merge 4 commits into databrickslabs:master from mattslack-db:feat/connector-energy_quantified

Conversation

@mattslack-db

What's included

  • API research doc: src/databricks/labs/community_connector/sources/energy_quantified/energy_quantified_api_doc.md
  • Implementation: energy_quantified.py (+ energy_quantified_schemas.py, energy_quantified_utils.py)
  • Simulator spec: source_simulator/specs/energy_quantified/endpoints.yaml + 6 corpus JSON files
  • Connector spec: connector_spec.yaml
  • Documentation: README.md
  • Tests: tests/unit/sources/energy_quantified/test_energy_quantified_lakeflow_connect.py + configs/dev_table_config.json

Tables (6)

European energy market data via Energy Quantified's curve-centric model:

  • Catalog: curves (snapshot, page-based pagination)
  • Time-series: timeseries (append), ohlc (append), srmc (append) — all partitioned over time windows
  • Forecasts: instances (append, backward cursor)
  • Open intervals: periods (cdc with lookback, partitioned)

Simulate-mode test results

18 passed, 1 skipped in 1.18s

(test_read_table_deletes skipped — no cdc_with_deletes tables.)

Notable design choices

  • Single connection parameter (api_key) — every per-curve parameter lives in table_options
  • `+` characters in curve names preserved as %2B via urllib.parse.quote(name, safe='')
  • Client-side rate pacing at ~15 req/s; exponential backoff with jitter on 429/5xx
  • All PKs flat top-level (#174)
  • No from __future__ import annotations (#173)
  • get_partitions raises NotImplementedError for non-partitioned tables (curves, instances)
  • Partitioned reads on the four range-query tables for parallel ingestion
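The `+` encoding choice above can be sketched with the standard library. This is an illustrative helper, not the connector's actual code, and the curve names are made up:

```python
from urllib.parse import quote

def encode_curve_name(name: str) -> str:
    """Percent-encode a curve name for use in a URL path segment.

    safe='' forces every reserved character to be encoded, so '+'
    becomes %2B instead of surviving as a literal plus (which many
    servers would decode back to a space).
    """
    return quote(name, safe="")

print(encode_curve_name("A+B"))    # -> A%2BB
print(encode_curve_name("a b/c"))  # -> a%20b%2Fc
```

With the default `safe='/'`, slashes would pass through unencoded, which is why `safe=''` is spelled out explicitly.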

The simulator spec and corpus have not yet been validated against the live API.

Next step: live testing

/validate-connector energy_quantified

This pull request and its description were written by Isaac.

…testing]

Implements LakeflowConnect + SupportsPartitionedStream for the
Energy Quantified REST API (European power, gas, weather, coal,
carbon market data). 6 tables across the curve-centric data model:

- curves (snapshot, catalog walk)
- timeseries (append, partitioned over time windows)
- instances (append, backward cursor walk from latest)
- periods (cdc with lookback, partitioned)
- ohlc (append, partitioned)
- srmc (append, partitioned)

Notable design choices:
- Single connection parameter (api_key, X-API-Key header) — every
  table parameterisation lives in table_options to avoid the
  INVALID_DATASOURCE_OPTION_OVERRIDE_ATTEMPT trap.
- curve_name is required on all 5 non-catalog tables; %2B encoding
  on '+' characters preserved via urllib.parse.quote(name, safe='').
- Client-side rate pacing at ~15 req/s; exponential backoff with
  jitter on 429/5xx.
- _init_ts cap clamps every cursor returned to read_table so
  Trigger.AvailableNow converges.
- Partitioned reads for the four range-query tables (timeseries,
  periods, ohlc, srmc); curves and instances opt out and rely on
  the framework's read_table fallback.
- All PKs flat top-level columns (lesson from databrickslabs#174). OHLC product
  block and SRMC contract block exploded so traded_at / period /
  front / delivery are flat.
- No `from __future__ import annotations` anywhere (lesson from
  databrickslabs#173 — merge script doesn't special-case it).
- get_partitions raises NotImplementedError for curves/instances so
  the framework falls back to read_table for non-partitioned tables.
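The pacing-and-retry policy described above might look roughly like this; the helper names, retry limit, and backoff base are illustrative assumptions, not the connector's actual implementation:

```python
import random
import time

RATE_LIMIT_RPS = 15   # client-side pacing target (~15 req/s)
MAX_RETRIES = 5
_last_request_at = 0.0

def _pace() -> None:
    """Sleep just long enough to stay under ~15 requests per second."""
    global _last_request_at
    min_interval = 1.0 / RATE_LIMIT_RPS
    elapsed = time.monotonic() - _last_request_at
    if elapsed < min_interval:
        time.sleep(min_interval - elapsed)
    _last_request_at = time.monotonic()

def request_with_backoff(send, base: float = 1.0):
    """Call send() (returns (status, body)), retrying 429/5xx with
    exponential backoff plus full jitter."""
    for attempt in range(MAX_RETRIES):
        _pace()
        status, body = send()
        if status == 429 or status >= 500:
            # Full jitter: sleep anywhere in [0, base * 2**attempt).
            time.sleep(random.uniform(0, base * (2 ** attempt)))
            continue
        return body
    raise RuntimeError("retries exhausted after repeated 429/5xx")
```

Full jitter (rather than a fixed exponential delay) spreads retries from concurrent partitioned readers so they don't re-collide on the rate limiter.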

Simulate-mode tests: 18 passed, 1 skipped (cdc_with_deletes N/A).
Live API testing deferred to /validate-connector.

Co-authored-by: Isaac
@mattslack-db mattslack-db requested a review from yyoli-db as a code owner May 11, 2026 21:12
Surfaced by record-mode validation against a live Energy Quantified
account. Three real connector bugs found (live requests failed) plus
spec/corpus reshapes for fields the simulator was missing.

Connector bugs (live API rejected the request or returned wrong data):
- /api/metadata/curves/ returns a flat JSON array, not an envelope
  with `data`. _read_curves now handles flat-list responses and uses
  Link/X-Total-Pages headers for pagination instead of looking for a
  wrapper.
- /api/instances/{curve} rejects ISO 8601 datetimes with a 'T'
  separator (400 VALIDATION ERROR: "Not a valid date time"). EQ
  requires a space separator. Added to_eq_datetime helper in utils
  and switched _init_ts and cursor formatting to use it.
- /api/srmc/{curve}/timeseries/ wraps records under
  body.timeseries.data, not body.data. _explode_srmc_response now
  reads from the correct path and pulls srmc_options out of
  body.srmc_options.
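A minimal sketch of the datetime fix: the real to_eq_datetime lives in energy_quantified_utils, and this version only shows the space-separator formatting the API requires:

```python
from datetime import datetime

def to_eq_datetime(dt: datetime) -> str:
    """Format a datetime the way the EQ API accepts: ISO 8601 fields,
    but with a space between date and time instead of 'T'."""
    return dt.strftime("%Y-%m-%d %H:%M:%S")

print(to_eq_datetime(datetime(2026, 5, 11, 21, 12, 0)))
# -> 2026-05-11 21:12:00
# dt.isoformat() would yield 2026-05-11T21:12:00, which the live API
# rejects with 400 VALIDATION ERROR: "Not a valid date time".
```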

Schema/shape reshapes (records still parsed but fields were missing):
- curve.resolution.{frequency,timezone} now flat on every embedded
  curve via _normalise_curve / _embedded_curve so they're addressable
  as flat columns where needed.
- srmc top-level metadata (contract, denominator, srmc_options) is
  exploded out of the response envelope rather than left nested.

Spec / corpus:
- endpoints.yaml: curves declares a flat-list response (no wrapper);
  timeseries / periods / ohlc / srmc embed resolution.{frequency,
  timezone} inside the curve; srmc uses records_key: timeseries.data
  to model the nested envelope.
- All six corpora reseeded from the live cassette via
  tools.cassette_to_corpus. Curves trimmed to 20 records across 4
  curve types so the simulator has variety.

Tests:
- tests/unit/sources/energy_quantified/configs/dev_table_config.json:
  switched fictional simulator stand-in curve names to real subscribed
  curves on the account; lowercase period values ("month"); limit on
  instances so simulate mode terminates within the partition budget.

Regenerated _generated_energy_quantified_python_source.py.

Co-authored-by: Isaac
DLT's append_flow silently drops rows where any primary_key column
is null. The srmc table's PK included both `front` and `delivery`,
but the SRMC API's `contract` block only returns whichever of the
two the caller queried with — the other is null on every record.
For front-based queries (most common usage) this meant every srmc
row had delivery=null, so the deployed pipeline ingested 0 rows
into shared.matt_slack.srmc while local tests passed.

Drop `delivery` from the srmc PK; (curve_name, date, period, front)
is sufficient for uniqueness within a single request. `delivery`
remains a regular nullable column for downstream filtering.

The same shape exists on `ohlc`, but the OHLC API populates BOTH
front and delivery on every record (delivery is the resolved
contract month), so its PK works as-is. Left unchanged.
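The failure mode can be illustrated outside DLT with a hypothetical stand-in for the null-PK filtering (the predicate is assumed behavior, not DLT's source): under the old PK every front-based srmc row is dropped, under the new PK all survive.

```python
# Hypothetical stand-in for append_flow dropping rows with a null PK column.
def surviving_rows(rows, pk):
    return [r for r in rows if all(r.get(col) is not None for col in pk)]

# Shape of front-based SRMC records: `delivery` is null on every row.
rows = [
    {"curve_name": "c1", "date": "2026-05-01", "period": "month",
     "front": 1, "delivery": None},
    {"curve_name": "c1", "date": "2026-05-01", "period": "month",
     "front": 2, "delivery": None},
]

old_pk = ["curve_name", "date", "period", "front", "delivery"]
new_pk = ["curve_name", "date", "period", "front"]

print(len(surviving_rows(rows, old_pk)))  # -> 0, every row silently dropped
print(len(surviving_rows(rows, new_pk)))  # -> 2, all rows kept
```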

Co-authored-by: Isaac
CI's pylint runs with the same flag set as /self-review-connector
but flags three findings the connector-dev pass missed:

- energy_quantified.py:43 (line-too-long): the absolute import path
  to energy_quantified_schemas is 101 chars. Adding
  `disable-next=line-too-long` since renaming the module or switching
  to a relative import is cosmetic at best.
- energy_quantified_utils.py:25: same long absolute import — same fix.
- energy_quantified.py _read_instances: 21 branches (max 20) and 57
  statements (max 50). The function is doing legitimate pagination
  with multiple termination conditions; splitting it makes the cursor
  bookkeeping harder to follow. Disable `too-many-branches` and
  `too-many-statements` locally on the function rather than refactor.

Co-authored-by: Isaac
@yyoli-db
Collaborator

please file LPP following go/community-connectors and go/community-connectors/tracking
and also run /self-review-connector @mattslack-db
