Companion to #14. This roadmap presumes the layered-module refactor proposed there lands first — several sections below reference classes/modules (cmdr-clinical.yaml, Measurement, Provider, OmopCoded mixin) that don't exist in cmdr today.
Framing: what "interoperability" actually means here
Three layers, often conflated:
- Schema-level compatibility — cmdr's class shapes, slot names, and enum codings don't fight OMOP's.
- Vocabulary-level compatibility — coded fields bind to OMOP concept IDs using a shared source-vs-standard pattern.
- Data-level transformation — a deterministic, testable pipeline that converts cmdr instance data to OMOP CDM rows (and, more modestly, the reverse).
All three matter, but they're sequential: data transforms are brittle unless schema + vocabulary conventions hold first.
Directional asymmetry
- cmdr → OMOP (common path): export research cohort data for analytics in an OMOP-shaped warehouse. High-fidelity for clinical events (Condition, DrugExposure, Measurement, Visit, Procedure, Death). Lossy for cmdr's research-context additions (Study, Consent, Specimen activity stack, Questionnaire, Family). Acceptable loss, as long as we're explicit about it.
- OMOP → cmdr (niche but useful): wrap an existing OMOP dataset as a cmdr cohort, e.g., when a research project draws from an EHR warehouse. Clinical tables map in trivially; cmdr's Study, Consent, Participant metadata has to be synthesized from project context.
Treat the cmdr → OMOP direction as load-bearing and ship it first; leave OMOP → cmdr for a later phase.
Phased roadmap
Phase 0 — Baseline (pre-req)
- Rename MeasurementObservation → Measurement; keep SdohObservation and siblings as Observation subclasses so names align literally with OMOP.
- […] ObservationPeriod to cmdr-clinical.yaml.
- Add a Provider subclass of Person/Agent.
Phase 1 — Schema alignment with OMOP conventions
- Document an explicit principle in cmdr: "when a concept is clinical and an OMOP concept exists, bind to it by default."
- Sweep existing bdchm-style meaning: OMOP:xxxxx bindings in bdc.yaml/bdchm.yaml into the cmdr enums they should have been in all along.
- Define cmdr temporal conventions that round-trip to OMOP's *_date / *_datetime pairs (cmdr's TimePoint already supports both).
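A minimal sketch of the date/datetime pairing, assuming a cmdr TimePoint can be reduced to an ISO 8601 string. The helper name `to_omop_pair` and the choice to leave the datetime empty for date-only values are illustrative assumptions, not conventions cmdr or OMOP has settled on.

```python
from datetime import date, datetime
from typing import Optional, Tuple


def to_omop_pair(iso_value: str) -> Tuple[date, Optional[datetime]]:
    """Split an ISO 8601 string into an OMOP-style (*_date, *_datetime) pair.

    A date-only input yields (date, None); a full timestamp yields both parts.
    """
    if "T" in iso_value or " " in iso_value:
        stamp = datetime.fromisoformat(iso_value)
        return stamp.date(), stamp
    return date.fromisoformat(iso_value), None
```

Returning `None` for the datetime, rather than inventing midnight, keeps the reverse direction honest about what precision was recorded.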
Phase 2 — The concept-triple mixin
Add a LinkML mixin OmopCoded with three slots:
- concept_id: OMOP standard concept (integer)
- source_value: the raw recorded string
- source_concept_id: OMOP concept for the source vocabulary's own term
Apply it to Condition.code, DrugExposure.drug, Measurement.observation_type, Procedure.procedure_type, Visit.category, Observation.type, etc. — via slot-level mixin application so downstream schemas can opt out. Rationale: OMOP's most durable contribution is the three-part provenance of a coded value. cmdr should preserve that, even when OMOP concept_ids aren't yet known (nullable).
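To make the triple concrete, here is a sketch of its shape as a Python dataclass rather than LinkML. The slot names come from the list above; the `omop_columns` helper and its prefix argument are hypothetical, and the integers in the usage note are placeholders, not real OMOP concept IDs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OmopCoded:
    """Concept triple: raw source string, standard concept, source concept."""

    source_value: str
    concept_id: Optional[int] = None         # OMOP standard concept, if known
    source_concept_id: Optional[int] = None  # OMOP concept for the source term

    def omop_columns(self, prefix: str) -> dict:
        """Unpack into OMOP-style column names, e.g. prefix='condition'.

        Unknown concept IDs are written as 0, the OMOP convention for
        unmapped values; the source string is always kept.
        """
        return {
            f"{prefix}_concept_id": self.concept_id or 0,
            f"{prefix}_source_value": self.source_value,
            f"{prefix}_source_concept_id": self.source_concept_id or 0,
        }
```

For example, `OmopCoded("E11.9", 111, 222).omop_columns("condition")` yields the three `condition_*` columns, while `OmopCoded("E11.9")` yields the same keys with both concept IDs set to 0.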
Phase 3 — The cmdr-to-omop transform (see "Transform approach" below)
- Author project/mappings/cmdr-to-omop.transform.yaml using linkml-transformer.
- Ship a fixture in examples/ that runs end to end: synthetic Participant + Condition + DrugExposure + Measurement + Specimen + Questionnaire → OMOP CSV rows that validate against the OMOP CDM DDL.
- Run this as a CI conformance test on every cmdr release.
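The conformance idea can be sketched locally before any CI wiring exists. This example uses an in-memory SQLite database and a hand-abbreviated PERSON table as stand-ins; both are assumptions made so the sketch stays self-contained, and the real check would load the published OMOP CDM DDL instead. `load_violations` is a hypothetical helper name.

```python
import sqlite3

# Abbreviated stand-in for the PERSON DDL (required columns only).
PERSON_DDL = """
CREATE TABLE person (
    person_id INTEGER NOT NULL PRIMARY KEY,
    gender_concept_id INTEGER NOT NULL,
    year_of_birth INTEGER NOT NULL,
    race_concept_id INTEGER NOT NULL,
    ethnicity_concept_id INTEGER NOT NULL,
    person_source_value TEXT
)
"""


def load_violations(rows: list) -> list:
    """Insert rows into a fresh in-memory table and return constraint errors."""
    conn = sqlite3.connect(":memory:")
    conn.execute(PERSON_DDL)
    errors = []
    for row in rows:
        # Column names come from trusted fixture keys, so they are
        # interpolated directly; values go through bound parameters.
        cols = ", ".join(row)
        marks = ", ".join("?" for _ in row)
        try:
            conn.execute(
                f"INSERT INTO person ({cols}) VALUES ({marks})",
                list(row.values()),
            )
        except sqlite3.Error as exc:
            errors.append(f"{row.get('person_source_value')}: {exc}")
    conn.close()
    return errors
```

A CI job could fail the release whenever the returned list is non-empty; a row missing `year_of_birth`, for instance, is reported against its source value rather than silently dropped.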
Phase 4 — Reverse direction (OMOP → cmdr), lightweight
- A small adapter that wraps a PERSON + event set as a cmdr Participant with a placeholder Study and empty Consent, and translates clinical events back via the inverse rules.
- Scope this narrowly — don't try to reconstruct research context that isn't there.
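A rough sketch of how narrow that adapter could be. The `Study` and `Participant` shapes here are simplified stand-ins, not cmdr's actual classes, and `wrap_person` along with the `omop-person:` and `study:` ID prefixes are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Study:
    id: str
    name: str


@dataclass
class Participant:
    id: str
    year_of_birth: Optional[int]
    study: Study
    consents: List[dict] = field(default_factory=list)  # deliberately empty


def wrap_person(person_row: dict, project_name: str) -> Participant:
    """Wrap one OMOP PERSON row as a Participant under a placeholder Study."""
    placeholder = Study(id=f"study:{project_name}", name=project_name)
    # Prefer the original source ID when the warehouse kept one.
    source_id = (
        person_row.get("person_source_value")
        or f"omop-person:{person_row['person_id']}"
    )
    return Participant(
        id=source_id,
        year_of_birth=person_row.get("year_of_birth"),
        study=placeholder,
    )
```

Nothing is synthesized beyond the placeholder Study: consent stays an empty list so downstream code cannot mistake a guess for recorded research context.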
Phase 5 — Tooling, docs, ecosystem
- Python helper that chains: validate cmdr JSON → transform → write OMOP CSV / Parquet / SQL bulk-load.
- Publish a "cmdr for OMOP users" and "OMOP for cmdr users" page on the cmdr site.
- Engage with the OHDSI community for feedback.
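The chaining helper could be as small as the sketch below, which covers only the CSV branch. The `validate` and `transform` callables are injected placeholders for whatever cmdr validation and linkml-transformer invocation end up being used; `export_table` is a hypothetical name.

```python
import csv
import io
from typing import Callable, Dict, Iterable

Row = Dict[str, object]


def export_table(
    records: Iterable[Row],
    validate: Callable[[Row], None],
    transform: Callable[[Row], Row],
    out: io.TextIOBase,
) -> int:
    """Validate each record, transform it, and write the rows as CSV.

    `validate` is expected to raise on bad input, which aborts the export
    before anything is written.
    """
    rows = []
    for record in records:
        validate(record)
        rows.append(transform(record))
    if not rows:
        return 0
    writer = csv.DictWriter(out, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)
```

Collecting all rows before writing means a validation failure leaves no partial file behind, at the cost of holding one table in memory.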
Transform approach: cmdr-to-omop.transform.yaml
Use linkml-transformer's derivation rules. Per target table, define: source class(es), ID strategy, concept-triple unpacking, date decomposition, type-concept mapping, and foreign-key resolution.
Identity strategy (universal rule):
- cmdr uses string IDs; OMOP uses integer surrogate keys. Introduce a stable deterministic mapping (e.g., xxhash64(cmdr_id) mod 2^31) generated once per transform run and persisted alongside output for reproducibility.
- Preserve the original cmdr ID in each table's *_source_value column wherever OMOP allows.
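A sketch of the identity rule. It swaps in `hashlib.blake2b` from the standard library for xxhash64 so the example has no third-party dependency; that substitution is an assumption, not a recommendation. Because a 31-bit key space makes collisions realistic once IDs number in the tens of thousands, the sketch checks for them while building the map. `surrogate_id` and `build_id_map` are hypothetical names.

```python
import hashlib
import json


def surrogate_id(cmdr_id: str) -> int:
    """Deterministic 31-bit integer derived from a cmdr string ID."""
    digest = hashlib.blake2b(cmdr_id.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % (2 ** 31)


def build_id_map(cmdr_ids) -> dict:
    """Map each cmdr ID to its surrogate key, failing loudly on collisions."""
    owners = {}  # surrogate key -> cmdr ID that first claimed it
    id_map = {}
    for cmdr_id in cmdr_ids:
        key = surrogate_id(cmdr_id)
        owner = owners.setdefault(key, cmdr_id)
        if owner != cmdr_id:
            raise ValueError(f"collision: {owner!r} and {cmdr_id!r} -> {key}")
        id_map[cmdr_id] = key
    return id_map


def dump_id_map(id_map: dict) -> str:
    """Serialize the map in a stable order so it can be stored with the output."""
    return json.dumps(id_map, sort_keys=True, indent=2)
```

Persisting the output of `dump_id_map` next to the OMOP files lets a later run confirm it produced the same keys.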
Class-by-class (abridged):
| cmdr class | OMOP target | ID | Key field mappings |
|---|---|---|---|
| Participant | PERSON | person_id from hash(Participant.id); original → person_source_value | demography.sex → gender_concept_id (OMOP:8507/8532); race → race_concept_id; ethnicity → ethnicity_concept_id; yearOfBirth → year_of_birth |
| ObservationPeriod | OBSERVATION_PERIOD | sequence | period.start/end → observation_period_start/end_date |
| Visit | VISIT_OCCURRENCE | sequence | category (OmopCoded) → visit_concept_id+source; participant → person_id; period → visit_start/end_datetime |
| Condition | CONDITION_OCCURRENCE | sequence | code (OmopCoded) → condition_concept_id/source; provenance → condition_type_concept_id; period → condition_start/end_date; visit → visit_occurrence_id |
| DrugExposure | DRUG_EXPOSURE | sequence | drug (OmopCoded) → drug_concept_id/source; provenance → drug_type_concept_id; dose/route/quantity → quantity, route_concept_id, dose_unit_source_value |
| DeviceExposure | DEVICE_EXPOSURE | sequence | analogous |
| Procedure | PROCEDURE_OCCURRENCE | sequence | analogous |
| Measurement | MEASUREMENT | sequence | type (OmopCoded) → measurement_concept_id; value → value_as_number or value_as_concept_id; unit (OmopCoded) → unit_concept_id |
| Observation / SdohObservation | OBSERVATION | sequence | analogous to Measurement but qualitative/SDOH |
| CauseOfDeath | DEATH | none | period.start → death_date; cause (OmopCoded) → cause_concept_id |
| Organization | CARE_SITE (+ LOCATION sidecar) | sequence | name → care_site_name; address slots → LOCATION fields |
| Provider (new abstract) | PROVIDER | sequence | name → provider_name; specialty (OmopCoded) → specialty_concept_id |
| Specimen (core) | SPECIMEN | sequence | type (OmopCoded) → specimen_concept_id; collection date → specimen_date; quantity → quantity+unit_concept_id |
| Relationship / Family | FACT_RELATIONSHIP | none | relationshipType (OmopCoded) → relationship_concept_id |
| QuestionnaireResponseItem | OBSERVATION | sequence | item (OmopCoded, e.g. PROMIS/LOINC) → observation_concept_id; response value → value_as_* |
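As an illustration of one row of the table, the sketch below derives a CONDITION_OCCURRENCE row in plain Python instead of linkml-transformer rules. The input dictionary keys (`participant`, `code`, `period`, `visit`, `provenance_concept_id`) are assumed shapes for a cmdr Condition, not its actual schema, and `condition_to_row` is a hypothetical name.

```python
from datetime import date
from itertools import count


def condition_to_row(condition: dict, person_ids: dict, visit_ids: dict, seq) -> dict:
    """Derive one CONDITION_OCCURRENCE row from a cmdr-style Condition dict."""
    code = condition["code"]
    period = condition["period"]
    end = period.get("end")
    return {
        # ID strategy: plain sequence for event tables.
        "condition_occurrence_id": next(seq),
        # Foreign keys resolved through maps built earlier in the run.
        "person_id": person_ids[condition["participant"]],
        "visit_occurrence_id": visit_ids.get(condition.get("visit")),
        # Concept-triple unpacking; unmapped concepts become 0.
        "condition_concept_id": code.get("concept_id") or 0,
        "condition_source_value": code["source_value"],
        "condition_source_concept_id": code.get("source_concept_id") or 0,
        # Date decomposition (date-only in this sketch).
        "condition_start_date": date.fromisoformat(period["start"]),
        "condition_end_date": date.fromisoformat(end) if end else None,
        # Type-concept mapping from provenance; 0 when unknown.
        "condition_type_concept_id": condition.get("provenance_concept_id") or 0,
    }
```

A caller would create one `count(1)` per target table and reuse the `person_ids` and `visit_ids` maps across all event classes, so foreign keys stay consistent within a run.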
Hard cases / accepted losses:
- Study / ResearchStudy: no OMOP target. Write to OMOP's METADATA table as free-form provenance rows, and emit a sidecar cmdr_study.json alongside the OMOP output. Document that OMOP consumers can ignore it.
- Consent: no OMOP target. Same sidecar pattern. (OHDSI's nascent work on data-use compatibility could inform future alignment.)
- SpecimenContainer / SpecimenActivity stack (creation/processing/storage/transport): OMOP's SPECIMEN is a single flat row. Accept the loss; emit a cmdr_specimen_lineage.json sidecar for recoverability.
- Questionnaire (instrument structure, skip logic, item grouping): only the responses survive in OMOP as OBSERVATION rows; the instrument definition goes to sidecar.
- Group / Characteristic / cohort criteria: could go to OMOP COHORT_DEFINITION, but only if the cohort was derivable from data; otherwise sidecar.
- Unmapped concepts: set *_concept_id = 0 (OMOP convention), keep original in *_source_value. Don't silently drop.
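The sidecar pattern might look like the sketch below. Only the two file names given above are used; the `class` key on each record and the function names are assumptions, and classes without a named sidecar (such as Consent) are left out rather than guessed at.

```python
import json
from pathlib import Path

# Sidecar file names taken from the list above; the class keys are assumed.
SIDECAR_FILES = {
    "Study": "cmdr_study.json",
    "SpecimenActivity": "cmdr_specimen_lineage.json",
}


def split_for_export(records):
    """Separate records bound for OMOP tables from those kept in sidecars."""
    omop_bound, sidecars = [], {}
    for record in records:
        name = SIDECAR_FILES.get(record["class"])
        if name is None:
            omop_bound.append(record)
        else:
            sidecars.setdefault(name, []).append(record)
    return omop_bound, sidecars


def write_sidecars(sidecars: dict, out_dir: str) -> list:
    """Write each sidecar list as JSON next to the OMOP output."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, items in sorted(sidecars.items()):
        path = out / name
        path.write_text(json.dumps(items, indent=2, sort_keys=True), encoding="utf-8")
        written.append(path)
    return written
```

Routing by class before the transform runs keeps the accepted losses explicit: anything not bound for OMOP lands in a named file instead of disappearing.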
Governance & versioning
- Target OMOP version: pin to v5.4 initially. Add a 6.0 track when 6.0 uptake passes a threshold (currently low).
- Mapping ownership: lives in the linkml/cmdr repo so it evolves with cmdr. Tag releases matching cmdr's semver.
- Conformance: CI loads transform output into a Postgres container with OMOP v5.4 DDL; failures block release.
- Vocabulary currency: OMOP concept_ids change over time. Pin to an ATHENA vocabulary snapshot and record the snapshot ID in transform provenance.
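One possible shape for the transform provenance record, assuming a small JSON document written next to the output. The field names are hypothetical, and the version and snapshot strings a caller passes in are whatever identifiers the project settles on.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def provenance_record(
    cmdr_version: str,
    vocabulary_snapshot: str,
    omop_version: str = "5.4",
    now: Optional[datetime] = None,
) -> dict:
    """Describe which versions and vocabulary snapshot produced an export."""
    stamp = now or datetime.now(timezone.utc)
    return {
        "cmdr_version": cmdr_version,
        "omop_cdm_version": omop_version,
        "vocabulary_snapshot": vocabulary_snapshot,
        "generated_at": stamp.isoformat(),
    }


def dump_provenance(record: dict) -> str:
    """Serialize the record with stable key order for diff-friendly storage."""
    return json.dumps(record, sort_keys=True, indent=2)
```

Accepting `now` as a parameter keeps the record reproducible in tests while defaulting to the current UTC time in normal runs.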
Open design questions
- Where should the OmopCoded mixin live: in core cmdr.yaml, or in cmdr-clinical.yaml? (Argument for core: observations and non-clinical things also benefit. Argument for clinical: keeps core domain-neutral.)
- Do we want the transform to be lossless round-trippable (cmdr → OMOP+sidecars → cmdr), or accept it as one-way export? Round-trippability is a strong invariant but costs complexity.
- What's the minimal cmdr profile we promise maps cleanly? Declare a "cmdr-omop-conformant" subset; schemas adding beyond it carry their own mapping burden.
Success criteria (what "done" looks like for v1)
- A published cmdr-to-omop.transform.yaml that covers the table above.
- CI that validates transform output against OMOP v5.4 DDL on a fixture dataset.
- Documented list of accepted losses and sidecar conventions.
- One real-world consumer (likely bdchm) using the mapping in anger.