feat: static cube:Cube metadata + ObservationSet link (#41) #51

Open
redlink-code-engine wants to merge 1 commit into main from feature/issue-41-static-metadata

@redlink-code-engine
Collaborator

Closes #41 (partial — see Out of scope).

Summary

  • New src/ogd_to_lod/metadata/ module with a deterministic MetadataGenerator that emits a companion metadata.ttl for every mapping.
  • The Turtle declares the dataset as a cube:Cube (IRI = base_uri) and its cube:ObservationSet (IRI = <base_uri>observation-set), populated from dataset_context.
  • ISO dates are typed as xsd:date and ISO datetimes as xsd:dateTime; license values that are URLs are emitted as IRIs, all other license values as plain string literals.
  • generate_node now also populates state.generated_metadata; create_pr_node / GitHubService.create_mapping_pr commit metadata.ttl as a third file alongside the YARRRML mapping and the CSV.
  • The YARRRML prompt is extended with a required observationSetLink mapping so that each row also emits <observation-set> cube:observation <obs> — keeping cube.link's forward-only linking convention (no back-pointer on cube:Observation).
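
The literal-typing rules in the summary can be illustrated with a minimal sketch. The helper names `date_literal` and `license_term` are hypothetical and are not taken from the MetadataGenerator in this PR; the sketch also assumes that only http(s) URLs count as license IRIs and that unparseable date values fall back to plain strings.

```python
import re
from datetime import date, datetime
from urllib.parse import urlparse

# Hypothetical helpers; the real MetadataGenerator may be structured differently.
_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def _escape(value: str) -> str:
    """Escape backslashes, quotes and newlines for a short Turtle string literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def date_literal(value: str) -> str:
    """Type ISO dates as xsd:date, ISO datetimes as xsd:dateTime, else plain string."""
    if _DATE_RE.fullmatch(value):
        try:
            date.fromisoformat(value)  # rejects e.g. month 13
            return f'"{value}"^^xsd:date'
        except ValueError:
            pass
    elif "T" in value:
        try:
            # Older Python versions do not parse a trailing "Z", so normalise it
            # for validation only; the original lexical form is what gets emitted.
            datetime.fromisoformat(value.replace("Z", "+00:00"))
            return f'"{value}"^^xsd:dateTime'
        except ValueError:
            pass
    return f'"{_escape(value)}"'


def license_term(value: str) -> str:
    """Emit an IRI for http(s) URLs, otherwise a plain string literal."""
    parsed = urlparse(value)
    is_url = parsed.scheme in ("http", "https") and bool(parsed.netloc)
    if is_url and not re.search(r'[\s<>"]', value):
        return f"<{value}>"
    return f'"{_escape(value)}"'
```

Keeping the original lexical form in the output, rather than re-serialising the parsed value, is one way to keep the generated file byte-for-byte reproducible for the same dataset_context.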

IRI conventions chosen

  • Cube: <base_uri> (the dataset's root URI is the cube)
  • ObservationSet: <base_uri>observation-set
  • Observation: unchanged, ex-obs:$(...)
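
As a rough illustration of these conventions, the sketch below derives the two static IRIs by plain concatenation and renders the forward links as N-Triples lines. The function names are hypothetical, the example base URI is made up, and the sketch assumes base_uri already ends in a separator such as `/` or `#`.

```python
from typing import Iterable, List

CUBE_NS = "https://cube.link/"  # cube.link vocabulary namespace


def cube_iri(base_uri: str) -> str:
    """The dataset's root URI doubles as the cube IRI."""
    return base_uri


def observation_set_iri(base_uri: str) -> str:
    """ObservationSet IRI: base_uri with 'observation-set' appended."""
    return f"{base_uri}observation-set"


def static_link(base_uri: str) -> str:
    """Cube -> ObservationSet link, emitted once per dataset."""
    return (
        f"<{cube_iri(base_uri)}> <{CUBE_NS}observationSet> "
        f"<{observation_set_iri(base_uri)}> ."
    )


def row_links(base_uri: str, observation_iris: Iterable[str]) -> List[str]:
    """ObservationSet -> Observation links, one per row (forward-only)."""
    set_iri = observation_set_iri(base_uri)
    return [f"<{set_iri}> <{CUBE_NS}observation> <{obs}> ." for obs in observation_iris]
```

For example, `static_link("https://example.org/ds/")` yields a single triple whose object is `<https://example.org/ds/observation-set>`. No triple has an observation as its subject, which matches the forward-only convention described in the summary.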

Out of scope (deferred to follow-up)

  • Per-property cube:DimensionConstraint / cube:MeasureConstraint blocks under a cube:ObservationConstraint
  • Static dimension values from context (e.g. fixed RAUM = Zürich)

Test plan

  • pytest -q → 373 passed, 3 skipped (integration tests requiring RMLMapper/Java)
  • Prompt template still formats correctly with {base_uri} substitution
  • Manual end-to-end run on a real dataset to inspect the generated metadata.ttl and the YARRRML's observationSetLink block
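
The prompt-template check could look roughly like the sketch below. `PROMPT_TEMPLATE` here is a made-up stand-in for the real prompt, and `template_fields` / `render_prompt` are hypothetical helpers; the point is only that `{base_uri}` is the sole `str.format` field and that YARRRML-style `$(...)` references pass through untouched.

```python
from string import Formatter

# Hypothetical stand-in for the real prompt text.
PROMPT_TEMPLATE = (
    "Add a mapping named observationSetLink whose subject is "
    "<{base_uri}observation-set> and whose object is ex-obs:$(id).\n"
)


def template_fields(template: str) -> set:
    """Return the names of all replacement fields str.format would expect."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def render_prompt(base_uri: str) -> str:
    """Substitute base_uri; raises KeyError if the template needs other fields."""
    return PROMPT_TEMPLATE.format(base_uri=base_uri)
```

A test along these lines would fail as soon as someone adds an unescaped `{` or `}` to the prompt, since `Formatter().parse` either reports an extra field or raises `ValueError`.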

Adds a deterministic Turtle generator that produces a companion
metadata.ttl alongside the YARRRML mapping, declaring the dataset
as a cube.link Cube and its ObservationSet:

- New module src/ogd_to_lod/metadata/ with MetadataGenerator
- Cube IRI = base_uri itself, ObservationSet = <base_uri>observation-set
- Emits schema:name/description/publisher/keywords, dcterms:identifier/
  issued/modified/license from dataset_context (when present)
- ISO dates → xsd:date, ISO datetimes → xsd:dateTime, license URLs → IRI
- Always emits cube:observationSet link, even with empty context

Wired into the flow so generate_node populates state.generated_metadata
and create_pr_node commits metadata.ttl (3rd file in the PR alongside
mapping.yarrrml.yaml and the CSV).

Per-row Set→Observation linking is delegated to YARRRML: the prompt is
extended with a required `observationSetLink` mapping that emits
cube:observation triples on the static set IRI, keeping cube.link's
forward-only linking convention.

Per-property cube:DimensionConstraint / cube:MeasureConstraint and
static dimension values from issue #41 are out of scope here and
deferred to a follow-up issue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Generate static RDF triples alongside YARRRML (cube:Cube, property definitions, constraints)