feat(domains): add demographic-driven domain services with JSON Schema validation for LLM tool support #119
Merged
ake2l merged 1 commit into development on Oct 12, 2025
…address/person/patient/doctor services with schema validation, locale packs, determinism utils, examples, and tests; make mypy/ruff clean
Why
- Provide first-class, deterministic domain services (address, person, patient, doctor) that compose locale data and demographic components.
- Enforce contracts at boundaries via JSON Schemas and structured domain errors.
- Keep the codebase typing- and lint-clean so CI stays green and onboarding stays smooth.
What Changed
- Domains
  - Added services and packages for address, person, patient, and doctor, backed by domain datasets.
  - Introduced a locales module that loads locale packs (people/address/doctor/patient) with automatic delimiter detection and strict-mode support.
  - Added common/demographics components: profile metadata, sampler composition, and resolution of the supported component datasets.
  - Added a facade that centralizes orchestration behind a thin boundary.
- Contracts & Validation
  - Added schema_registry with cached schema loading and validate_payload() for request/response enforcement.
  - Added domain JSON Schemas for v1 requests/responses (person, patient, doctor, address).
  - Introduced DomainError with structured details and a consistent to_dict() payload.
- Determinism
  - New determinism helpers: canonicalization, canonical JSON, stable UUIDs, seed derivation/mixing, provenance hashing, a frozen clock, and an explicit RNG wrapper.
- Data & Examples
  - Added demographic group CSVs and per-country component/profile metadata (DE, US, VN).
  - Added examples (requests and responses) for all four domains.
- Tooling & Hygiene
  - mypy: fixed inferred type narrowing and added targeted type: ignore comments for the missing jsonschema stubs and for asdict narrowing.
  - ruff: removed unused/deprecated typing imports and aligned the code with repo style.
  - Updated README.md and docs/README.md to reflect the new capabilities.
  - Minor update to datamimic_ce/domains/utils/dataset_path.py for dataset path compliance.
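A minimal sketch of the Contracts & Validation boundary, assuming a flat directory of <name>.json schema files. The function load_schema, the code/message/details fields and the to_dict() payload shape are hypothetical illustrations; this PR only names schema_registry, validate_payload() and DomainError. The actual payload check would go through the jsonschema package and is not shown here.

```python
from __future__ import annotations

import json
from functools import lru_cache
from pathlib import Path
from typing import Any


class DomainError(Exception):
    """Hypothetical shape: a machine-readable code plus structured details."""

    def __init__(self, code: str, message: str, details: dict[str, Any] | None = None) -> None:
        super().__init__(message)
        self.code = code
        self.message = message
        self.details = dict(details or {})

    def to_dict(self) -> dict[str, Any]:
        # One consistent payload shape for every boundary failure.
        return {"code": self.code, "message": self.message, "details": self.details}


@lru_cache(maxsize=None)
def load_schema(schema_dir: str, name: str) -> dict[str, Any]:
    """Read <schema_dir>/<name>.json once; later calls return the same cached dict.

    lru_cache does not cache raised exceptions, so a schema added later is still found.
    Callers must treat the returned dict as read-only because it is shared.
    """
    path = Path(schema_dir) / f"{name}.json"
    if not path.is_file():
        raise DomainError("schema_not_found", f"no schema named {name!r}", {"path": str(path)})
    schema: dict[str, Any] = json.loads(path.read_text(encoding="utf-8"))
    return schema
```

Caching on (schema_dir, name) keeps repeated boundary checks from re-reading disk, which is the cost concern raised under Risks.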
Tests
- Added API-level tests covering:
  - Locale loading and dataset coverage.
  - Component resolution and strict-mode behavior.
  - Group schema integrity, loader path compliance, and absence of JSON duplication.
  - Sampler composition/runtime and cross-entity context behavior.
  - Profile seed determinism and profile vs. baseline distribution.
  - Service purity and property tests.
- Tests are deterministic by default (seeded, filesystem-isolated where applicable).
- Local runs:
  - uv run mypy datamimic_ce → Success: no issues
  - ruff check datamimic_ce --unsafe-fixes --fix → All checks passed
  - uv run pytest -q (please confirm locally; CI should enforce)
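Seeded, deterministic tests depend on services never reading the wall clock or the global random module. A minimal sketch of that pattern follows; FrozenClock, ExplicitRng and sample_person are hypothetical names, since the PR's actual helper names and signatures are not shown.

```python
from __future__ import annotations

import random
from collections.abc import Sequence
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class FrozenClock:
    """Always returns the same instant, so timestamps never vary between runs."""

    instant: datetime

    def now(self) -> datetime:
        return self.instant


class ExplicitRng:
    """Wraps a private random.Random; never touches the global random state."""

    def __init__(self, seed: int) -> None:
        self._rng = random.Random(seed)

    def choice(self, items: Sequence[str]) -> str:
        return self._rng.choice(items)

    def randint(self, low: int, high: int) -> int:
        return self._rng.randint(low, high)


def sample_person(rng: ExplicitRng, clock: FrozenClock, given_names: Sequence[str]) -> dict[str, Any]:
    # Hypothetical service: all randomness and time are passed in explicitly.
    return {
        "given_name": rng.choice(given_names),
        "age": rng.randint(18, 90),
        "generated_at": clock.now().isoformat(),
    }


def test_same_seed_same_person() -> None:
    clock = FrozenClock(datetime(2025, 1, 1, tzinfo=timezone.utc))
    names = ["Anna", "Ben", "Chi"]
    first = sample_person(ExplicitRng(42), clock, names)
    second = sample_person(ExplicitRng(42), clock, names)
    assert first == second
```

Because the clock and RNG are parameters, a service stays pure: the same inputs always produce the same output, which is what a purity or property test can assert.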
Risks & Roll-back
- Risk: Missing domain_data files for a locale can raise errors in strict mode.
  - Mitigation: clear error messages; examples provided; strict mode can be toggled via existing config.
- Risk: Runtime JSON Schema validation adds cost on hot paths.
  - Mitigation: cached validators; validation can be gated and/or limited to boundary checks.
- Roll-back: Revert the domain additions and schema registry changes in a single revert; typing/lint-only changes are isolated and safe.
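The strict-mode risk can be made concrete with a small loader. This is a sketch under assumptions: the {dataset}_{locale}.csv naming scheme, the US fallback locale and LocaleDataError are hypothetical, and the standard library's csv.Sniffer stands in for whatever delimiter detection the locales module actually uses.

```python
from __future__ import annotations

import csv
from pathlib import Path


class LocaleDataError(LookupError):
    """Hypothetical stand-in for the PR's structured DomainError."""


def load_locale_rows(
    data_dir: Path, dataset: str, locale: str, *, strict: bool, fallback_locale: str = "US"
) -> list[dict[str, str]]:
    path = data_dir / f"{dataset}_{locale}.csv"
    if not path.is_file():
        if strict:
            # Strict mode: fail loudly with a message naming exactly what is missing.
            raise LocaleDataError(f"missing {dataset!r} data for locale {locale!r} (expected {path})")
        # Lenient mode: fall back; open() below still raises if the fallback is missing too.
        path = data_dir / f"{dataset}_{fallback_locale}.csv"
    with path.open(newline="", encoding="utf-8") as fh:
        sample = fh.read(4096)
        fh.seek(0)
        try:
            # Restrict detection to plausible delimiters to avoid picking a letter.
            dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
        except csv.Error:
            dialect = csv.excel  # e.g. single-column files have no detectable delimiter
        return list(csv.DictReader(fh, dialect=dialect))
```

Failing in strict mode with the expected path in the message is what keeps the "clear error messages" mitigation cheap to verify in a test.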
Docs/CLI
- README/docs updated to link new domain services and examples.
- No public CLI changes; commands remain stable.
Assumptions & Alternatives
- Assumption: JSON Schema at runtime is acceptable. Alternative: migrate boundary validation to Pydantic models for tighter typing, and generate JSON Schema from the models to keep docs and contracts in sync.
- Assumption: jsonschema stubs are not required short-term. Alternative: add types-jsonschema to dev deps to drop the type: ignore.
- Assumption: locale data shipped as CSV remains manageable. Alternatives: consolidate into a compact serialized format, or lazy-load per component to reduce memory.
Performance Notes
- CSV access is file-backed with simple line iteration; hotspot risk is low.
- Determinism utilities use SHA-256 and minimal allocations; no O(N²) patterns detected in added paths.
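The determinism helpers listed under What Changed can be sketched with the standard library alone. The function names and the namespace below are hypothetical; seed derivation and provenance hashing use SHA-256 as these notes describe, while stable_uuid uses the standard name-based uuid5 (which hashes with SHA-1) as a stand-in for whatever the PR implements.

```python
from __future__ import annotations

import hashlib
import json
import uuid
from typing import Any

# Hypothetical namespace; any fixed UUID works as long as it never changes.
DOMAIN_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "datamimic-ce:domains")


def canonical_json(obj: Any) -> str:
    # Sorted keys and fixed separators give one exact text per logical value.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False, allow_nan=False)


def stable_uuid(obj: Any) -> uuid.UUID:
    # Same canonical content -> same UUID, independent of dict insertion order.
    return uuid.uuid5(DOMAIN_NAMESPACE, canonical_json(obj))


def derive_seed(root_seed: int, *labels: str) -> int:
    # Mix a root seed with labels (e.g. entity, field) into a 64-bit sub-seed.
    h = hashlib.sha256(str(root_seed).encode("utf-8"))
    for label in labels:
        h.update(b"\x00")  # separator keeps ("ab", "c") distinct from ("a", "bc")
        h.update(label.encode("utf-8"))
    return int.from_bytes(h.digest()[:8], "big")


def provenance_hash(request: Any) -> str:
    # Fingerprint of the exact request that produced a payload.
    return hashlib.sha256(canonical_json(request).encode("utf-8")).hexdigest()
```

Feeding derive_seed(root, "person", "given_name") into random.Random gives each field its own stream, so adding a new field does not shift the values already generated for existing ones.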
Compatibility
- Additive feature set; no breaking changes to existing public APIs.
- SemVer: minor version bump appropriate.
DoD Checklist
- Types and style pass (mypy/ruff)
- Tests added and deterministic
- Public errors are human-readable; internal exceptions typed
- Docs updated; examples runnable
- No material performance regressions observed
- No architecture violations (SOC/SPOT/DRY/KISS)