feat(energy_quantified): add Energy Quantified connector #177
Open
mattslack-db wants to merge 4 commits into
Conversation
…testing]

Implements LakeflowConnect + SupportsPartitionedStream for the Energy
Quantified REST API (European power, gas, weather, coal, carbon market
data). 6 tables across the curve-centric data model:

- curves (snapshot, catalog walk)
- timeseries (append, partitioned over time windows)
- instances (append, backward cursor walk from latest)
- periods (cdc with lookback, partitioned)
- ohlc (append, partitioned)
- srmc (append, partitioned)

Notable design choices:

- Single connection parameter (api_key, X-API-Key header); every table
  parameterisation lives in table_options to avoid the
  INVALID_DATASOURCE_OPTION_OVERRIDE_ATTEMPT trap.
- curve_name is required on all 5 non-catalog tables; %2B encoding of
  '+' characters preserved via urllib.parse.quote(name, safe='').
- Client-side rate pacing at ~15 req/s; exponential backoff with jitter
  on 429/5xx (sketch below).
- _init_ts cap clamps every cursor returned to read_table so
  Trigger.AvailableNow converges.
- Partitioned reads for the four range-query tables (timeseries,
  periods, ohlc, srmc); curves and instances opt out and rely on the
  framework's read_table fallback.
- All PKs are flat top-level columns (lesson from databrickslabs#174).
  OHLC product block and SRMC contract block exploded so
  traded_at / period / front / delivery are flat.
- No `from __future__ import annotations` anywhere (lesson from
  databrickslabs#173: the merge script doesn't special-case it).
- get_partitions raises NotImplementedError for curves/instances so the
  framework falls back to read_table for non-partitioned tables.

Simulate-mode tests: 18 passed, 1 skipped (cdc_with_deletes N/A). Live
API testing deferred to /validate-connector.

Co-authored-by: Isaac
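Below is a minimal sketch of two of the design choices above: the ~15 req/s client-side pacing with exponential backoff and jitter on 429/5xx, and the safe='' quoting that preserves %2B for '+' in curve names. The base URL, helper names, and retry budget are assumptions for illustration, and the session is assumed to already carry the X-API-Key header; only the pacing rate, the backoff/jitter behaviour, and urllib.parse.quote(name, safe='') come from the commit above.

```python
import random
import time
import urllib.parse

import requests

BASE_URL = "https://api.energyquantified.com"  # assumed; not stated in the PR
MIN_INTERVAL = 1.0 / 15  # client-side pacing at ~15 req/s

_last_request = 0.0

def _paced_get(session: requests.Session, path: str, **kwargs) -> requests.Response:
    """GET with client-side pacing, plus exponential backoff with jitter on 429/5xx."""
    global _last_request
    resp = None
    for attempt in range(5):  # retry budget is an assumption
        # Pace: never fire faster than ~15 requests per second.
        wait = MIN_INTERVAL - (time.monotonic() - _last_request)
        if wait > 0:
            time.sleep(wait)
        _last_request = time.monotonic()

        resp = session.get(BASE_URL + path, **kwargs)
        if resp.status_code != 429 and resp.status_code < 500:
            return resp
        # Exponential backoff with full jitter before the next attempt.
        time.sleep(random.uniform(0.0, 2.0 ** attempt))
    return resp

def _curve_path(endpoint: str, curve_name: str) -> str:
    # Curve names can contain '+', which must reach the API as %2B;
    # safe='' percent-encodes every reserved character, including '/'.
    return f"/api/{endpoint}/{urllib.parse.quote(curve_name, safe='')}/"
```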
Surfaced by record-mode validation against a live Energy Quantified
account. Three real connector bugs found (live requests failed) plus
spec/corpus reshapes for fields the simulator was missing.
Connector bugs (live API rejected the request or returned wrong data):
- /api/metadata/curves/ returns a flat JSON array, not an envelope
with `data`. _read_curves now handles flat-list responses and uses
Link/X-Total-Pages headers for pagination instead of looking for a
wrapper.
- /api/instances/{curve} rejects ISO 8601 datetimes with a 'T'
separator (400 VALIDATION ERROR: "Not a valid date time"). EQ
requires a space separator. Added to_eq_datetime helper in utils
and switched _init_ts and cursor formatting to use it (see the
sketch after this list).
- /api/srmc/{curve}/timeseries/ wraps records under
body.timeseries.data, not body.data. _explode_srmc_response now
reads from the correct path and pulls srmc_options out of
body.srmc_options.
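For the second fix, a plausible shape for the to_eq_datetime helper; the real implementation in energy_quantified_utils.py may differ, e.g. in how it treats timezones or sub-second precision:

```python
from datetime import datetime

def to_eq_datetime(dt: datetime) -> str:
    """Format a datetime the way the EQ API expects: a space separator
    rather than ISO 8601's 'T', which the API rejects with
    400 VALIDATION ERROR: 'Not a valid date time'."""
    return dt.strftime("%Y-%m-%d %H:%M:%S")

# datetime(2024, 1, 15, 12, 30).isoformat()      -> '2024-01-15T12:30:00' (rejected)
# to_eq_datetime(datetime(2024, 1, 15, 12, 30))  -> '2024-01-15 12:30:00' (accepted)
```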
Schema/shape reshapes (records still parsed but fields were missing):
- curve.resolution.{frequency,timezone} now flat on every embedded
curve via _normalise_curve / _embedded_curve so they're addressable
as flat columns where needed (see the sketch after this list).
- srmc top-level metadata (contract, denominator, srmc_options) is
exploded out of the response envelope rather than left nested.
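A hypothetical sketch of the first reshape; the actual _normalise_curve / _embedded_curve logic, and the flat column names it emits, may differ:

```python
def _normalise_curve(curve: dict) -> dict:
    """Lift resolution.{frequency,timezone} out of the nested block so both
    are addressable as flat columns. Output column names are assumptions."""
    flat = dict(curve)
    resolution = flat.pop("resolution", None) or {}
    flat["frequency"] = resolution.get("frequency")
    flat["timezone"] = resolution.get("timezone")
    return flat

# {"name": "...", "resolution": {"frequency": "PT15M", "timezone": "CET"}}
#   -> {"name": "...", "frequency": "PT15M", "timezone": "CET"}
```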
Spec / corpus:
- endpoints.yaml: curves declares a flat-list response (no wrapper);
timeseries / periods / ohlc / srmc embed resolution.{frequency,
timezone} inside the curve; srmc uses records_key: timeseries.data
to model the nested envelope (see the sketch after this list).
- All six corpora reseeded from the live cassette via
tools.cassette_to_corpus. Curves trimmed to 20 records across 4
curve types so the simulator has variety.
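The records_key: timeseries.data entry presumably resolves as a dotted path into the response envelope. A minimal sketch of that lookup, with an illustrative srmc body (the simulator's actual spec handling is not part of this PR):

```python
def _records_at(body: dict, records_key: str):
    """Walk a dotted records_key (e.g. 'timeseries.data') into the envelope."""
    node = body
    for part in records_key.split("."):
        node = node[part]
    return node

body = {"timeseries": {"data": [{"date": "2024-01-15", "value": 41.7}]},
        "srmc_options": {"fuel": "gas"}}  # field names illustrative
assert _records_at(body, "timeseries.data") == [{"date": "2024-01-15", "value": 41.7}]
```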
Tests:
- tests/unit/sources/energy_quantified/configs/dev_table_config.json:
switched fictional simulator stand-in curve names to real subscribed
curves on the account; lowercase period values ("month"); limit on
instances so simulate mode terminates within the partition budget.
Regenerated _generated_energy_quantified_python_source.py.
Co-authored-by: Isaac
DLT's append_flow silently drops rows where any primary_key column is
null. The srmc table's PK included both `front` and `delivery`, but the
SRMC API's `contract` block only returns whichever of the two the caller
queried with; the other is null on every record. For front-based queries
(the most common usage) this meant every srmc row had delivery=null, so
the deployed pipeline ingested 0 rows into shared.matt_slack.srmc while
local tests passed (illustrated below).

Drop `delivery` from the srmc PK; (curve_name, date, period, front) is
sufficient for uniqueness within a single request. `delivery` remains a
regular nullable column for downstream filtering.

The same shape exists on `ohlc`, but the OHLC API populates BOTH front
and delivery on every record (delivery is the resolved contract month),
so its PK works as-is. Left unchanged.

Co-authored-by: Isaac
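A self-contained illustration of the failure mode. rows_surviving_append_flow is a hypothetical stand-in that only mimics the null-PK drop behaviour described above; it is not DLT code, and the sample curve name and values are made up:

```python
def rows_surviving_append_flow(rows, primary_keys):
    """Mimic the behaviour above: silently drop any row where a
    primary_key column is null. Hypothetical helper, not DLT code."""
    return [r for r in rows if all(r.get(k) is not None for k in primary_keys)]

# A front-based SRMC query: the API fills `front` but leaves `delivery` null.
row = {"curve_name": "DE Coal SRMC", "date": "2024-01-15", "period": "month",
       "front": 1, "delivery": None, "value": 87.3}

# Old PK (with `delivery`): every row dropped -> 0 rows ingested in deployment.
assert rows_surviving_append_flow(
    [row], ["curve_name", "date", "period", "front", "delivery"]) == []

# New PK (without `delivery`): the row survives; `delivery` stays nullable.
assert len(rows_surviving_append_flow(
    [row], ["curve_name", "date", "period", "front"])) == 1
```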
CI's pylint runs with the same flag set as /self-review-connector but
flags three findings the connector-dev pass missed:

- energy_quantified.py:43 (line-too-long): the absolute import path to
  energy_quantified_schemas is 101 chars. Adding
  `disable-next=line-too-long`, since renaming the module or switching
  to a relative import is cosmetic at best.
- energy_quantified_utils.py:25: same long absolute import; same fix.
- energy_quantified.py _read_instances: 21 branches (max 20) and 57
  statements (max 50). The function is doing legitimate pagination with
  multiple termination conditions; splitting it would make the cursor
  bookkeeping harder to follow. Disable `too-many-branches` and
  `too-many-statements` locally on the function rather than refactor
  (placement sketched below).

Co-authored-by: Isaac
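Illustrative placement of the suppressions; the import spelling and the function signature are assumptions, and this is a placement sketch rather than standalone-runnable code:

```python
# energy_quantified.py (sketch)

# pylint: disable-next=line-too-long
from databricks.labs.community_connector.sources.energy_quantified import energy_quantified_schemas

def _read_instances(table, table_options, cursor):  # pylint: disable=too-many-branches,too-many-statements
    """Backward cursor walk with several termination conditions; kept as
    one function so the cursor bookkeeping stays in one place."""
    ...
```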
Collaborator

please file LPP following go/community-connectors and go/community-connectors/tracking
What's included
- src/databricks/labs/community_connector/sources/energy_quantified/energy_quantified_api_doc.md
- energy_quantified.py (+ energy_quantified_schemas.py, energy_quantified_utils.py)
- source_simulator/specs/energy_quantified/endpoints.yaml + 6 corpus JSON files
- connector_spec.yaml
- README.md
- tests/unit/sources/energy_quantified/test_energy_quantified_lakeflow_connect.py + configs/dev_table_config.json

Tables (6)
European energy market data via Energy Quantified's curve-centric model:
- curves (snapshot, page-based pagination)
- timeseries (append), ohlc (append), srmc (append): all partitioned over time windows
- instances (append, backward cursor)
- periods (cdc with lookback, partitioned)

Simulate-mode test results
18 passed, 1 skipped. (test_read_table_deletes skipped: no cdc_with_deletes tables.)

Notable design choices
- Single connection parameter (api_key); every per-curve parameter lives in table_options
- %2B encoding of + in curve names via urllib.parse.quote(name, safe='')
- Flat top-level PK columns (lesson from #174)
- No from __future__ import annotations (#173)
- get_partitions raises NotImplementedError for non-partitioned tables (curves, instances)

The simulator spec and corpus have not yet been validated against the live API.
Next step: live testing
This pull request and its description were written by Isaac.