Generate realistic, interconnected, and reproducible test data for finance, healthcare, and beyond.
Faker gives you random data. DATAMIMIC gives you consistent, explainable datasets that respect business logic and domain constraints.
- 🧬 Patient medical histories that match age and demographics
- 💳 Bank transactions that obey balance constraints
- 🛡 Insurance policies aligned with real risk profiles
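What separates the bullets above from independent random values is conditioning: later fields are drawn from distributions that depend on earlier ones. The snippet below is a stdlib-only sketch of that idea, not DATAMIMIC's implementation. The `CONDITION_MIN_AGE` table and its age thresholds are illustrative assumptions.

```python
import random

# Hypothetical table: condition -> minimum plausible age.
# Illustrative thresholds only; not DATAMIMIC's reference data.
CONDITION_MIN_AGE = {
    "Asthma": 1,
    "Hypertension": 30,
    "Diabetes": 30,
    "Osteoarthritis": 45,
    "Alzheimer's": 60,
}


def generate_patient(rng: random.Random) -> dict:
    """Draw the age first, then sample only conditions plausible at that age."""
    age = rng.randint(1, 99)
    eligible = [c for c, min_age in CONDITION_MIN_AGE.items() if age >= min_age]
    # Pick 0 to 2 distinct conditions from the eligible pool.
    k = rng.randint(0, min(2, len(eligible)))
    return {"age": age, "conditions": rng.sample(eligible, k)}


rng = random.Random(42)
for patient in (generate_patient(rng) for _ in range(3)):
    print(patient)
```

Because `eligible` only contains conditions whose age floor is met, this sketch can never pair Alzheimer's with a 25-year-old.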
Typical data generators (like Faker) produce isolated random values. That's fine for unit tests, but of little use for system, analytics, or compliance testing, where the relationships between values matter.
Example:
# Faker – broken relationships
from faker import Faker

fake = Faker()
patient_name = fake.name()
patient_age = fake.random_int(1, 99)
conditions = [fake.word()]
# "25-year-old with Alzheimer's" – nonsense data.

# DATAMIMIC – contextual realism
from datamimic_ce.domains.healthcare.services import PatientService
patient = PatientService().generate()
print(f"{patient.full_name}, {patient.age}, {patient.conditions}")
# "Shirley Thompson, 72, ['Diabetes', 'Hypertension']"

Install and run:

pip install datamimic-ce

DATAMIMIC lets you generate the same data every time, across machines, environments, and CI pipelines. Seeds, clocks, and UUIDv5 namespaces keep your synthetic datasets reproducible and traceable, no matter where or when they are generated.
from datamimic_ce.domains.facade import generate_domain
request = {
"domain": "person",
"version": "v1",
"count": 1,
"seed": "docs-demo", # identical seed → identical output
"locale": "en_US",
"clock": "2025-01-01T00:00:00Z" # fixed clock = stable time context
}
response = generate_domain(request)
print(response["items"][0]["id"])

Result:
Same input → same output.
Behind the scenes, every deterministic request combines:
- A stable seed (for repeatable random draws),
- A frozen clock (for time-dependent values), and
- A UUIDv5 namespace (for globally consistent identifiers).
Together, they form a reproducibility contract, ideal for CI/CD pipelines, agentic pipelines, and analytics verification.
Agents can safely re-invoke the same generation call and receive byte-for-byte identical data.
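The three ingredients above (seed, frozen clock, UUIDv5 namespace) can be illustrated with the standard library alone. This is a sketch of the general technique, not DATAMIMIC's internals. The `generate` function, the namespace URL, and the `f"{seed}:{i}"` naming scheme are assumptions made for the example.

```python
import random
import uuid
from datetime import datetime

# Hypothetical namespace for this sketch; any fixed UUID would do.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/synthetic/person")


def generate(seed: str, clock: str, count: int) -> list[dict]:
    # A str seed is converted to an int deterministically (not via hash(),
    # so PYTHONHASHSEED has no effect): same sequence in every process.
    rng = random.Random(seed)
    # Frozen clock: parse the fixed timestamp instead of calling datetime.now().
    now = datetime.fromisoformat(clock.replace("Z", "+00:00"))
    items = []
    for i in range(count):
        age = rng.randint(18, 90)
        items.append({
            # UUIDv5 hashes namespace + name with SHA-1: same inputs, same ID everywhere.
            "id": str(uuid.uuid5(NAMESPACE, f"{seed}:{i}")),
            "age": age,
            # Time-dependent value derived from the frozen clock.
            "birth_year": now.year - age,
        })
    return items


first = generate("docs-demo", "2025-01-01T00:00:00Z", count=2)
second = generate("docs-demo", "2025-01-01T00:00:00Z", count=2)
assert first == second  # same input -> same output
print(first[0]["id"])
```

Running this twice, or on two machines, prints the same ID; changing `seed` changes both the sampled ages and the identifiers.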
from datamimic_ce.domains.healthcare.services import PatientService
patient = PatientService().generate()
print(patient.full_name, patient.conditions)

- PatientService – Demographically realistic patients
- DoctorService – Specialties match conditions
- HospitalService – Realistic bed capacities and types
- MedicalRecordService – Longitudinal health records
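One way to keep "specialties match conditions" true by construction is to derive the required specialty first and only then choose a doctor. A sketch of that pattern; the `SPECIALTY_FOR` mapping and the doctor roster are hypothetical, and this is not a description of how `DoctorService` works internally.

```python
import random

# Hypothetical condition -> specialty mapping (illustrative only).
SPECIALTY_FOR = {
    "Diabetes": "Endocrinology",
    "Hypertension": "Cardiology",
    "Asthma": "Pulmonology",
    "Alzheimer's": "Neurology",
}

# Made-up roster for the example.
DOCTORS = [
    {"name": "Dr. Chen", "specialty": "Endocrinology"},
    {"name": "Dr. Okafor", "specialty": "Cardiology"},
    {"name": "Dr. Silva", "specialty": "Cardiology"},
    {"name": "Dr. Weber", "specialty": "Pulmonology"},
    {"name": "Dr. Nguyen", "specialty": "Neurology"},
]


def assign_doctor(condition: str, rng: random.Random) -> dict:
    """Derive the required specialty from the condition, then pick among matching doctors."""
    specialty = SPECIALTY_FOR[condition]
    candidates = [d for d in DOCTORS if d["specialty"] == specialty]
    if not candidates:
        raise ValueError(f"no doctor covers {specialty!r}")
    return rng.choice(candidates)


rng = random.Random(7)
print(assign_doctor("Hypertension", rng)["specialty"])  # → Cardiology
```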
from datamimic_ce.domains.finance.services import BankAccountService
account = BankAccountService().generate()
print(account.account_number, account.balance)

- Balances respect transactions
- Card/IBAN formats per locale
- Distributions tuned for fraud analytics and reconciliation
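"Balances respect transactions" can be enforced by computing the balance from the transaction stream rather than drawing both at random. A sketch in integer cents (to avoid floating-point drift); the 50/50 debit/credit split and the 500.00 credit cap are arbitrary assumptions, and this is not `BankAccountService`'s actual logic.

```python
import random


def generate_transactions(opening_cents: int, n: int, rng: random.Random) -> tuple[list[int], int]:
    """Return n signed amounts (in cents) and the closing balance.

    Debits are capped at the current balance, so the running balance never goes negative.
    """
    balance = opening_cents
    amounts = []
    for _ in range(n):
        if balance > 0 and rng.random() < 0.5:
            amount = -rng.randint(1, balance)  # debit: at most what is in the account
        else:
            amount = rng.randint(1, 50_000)  # credit: up to 500.00
        balance += amount
        amounts.append(amount)
    return amounts, balance


amounts, closing = generate_transactions(10_000, 12, random.Random(99))
# The balance is derived from the transactions, never sampled independently.
assert closing == 10_000 + sum(amounts)
print(closing / 100)
```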
- PersonService – Culturally consistent names, addresses, and phone patterns
- Locale packs for DE / US / VN, versioned and auditable
- Frozen clocks and canonical hashing → reproducible IDs
- Seeded random generators → identical outputs across runs
- Schema validation (XSD, JSONSchema) → structural integrity
- Provenance hashing → audit-friendly lineage
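Canonical hashing makes an identifier depend only on a record's content, not on key order or whitespace. A sketch using SHA-256 over sorted-key JSON; these canonicalization rules and the hypothetical `lineage_hash` chaining scheme are assumptions for illustration, not DATAMIMIC's documented format.

```python
import hashlib
import json
from typing import Optional


def canonical_hash(record: dict) -> str:
    """SHA-256 of a canonical JSON encoding: sorted keys, no insignificant whitespace."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def lineage_hash(record: dict, parent: Optional[str]) -> str:
    """Chain a record's hash to its parent's, so any upstream change alters every descendant."""
    return canonical_hash({"parent": parent, "record": record})


a = {"name": "Shirley Thompson", "age": 72}
b = {"age": 72, "name": "Shirley Thompson"}
assert canonical_hash(a) == canonical_hash(b)  # key order does not matter
print(lineage_hash({"visit": "2025-01-01"}, parent=canonical_hash(a)))
```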
📘 See Developer Guide
Python-based generation:
from random import Random
from datamimic_ce.domains.common.models.demographic_config import DemographicConfig
from datamimic_ce.domains.healthcare.services import PatientService
cfg = DemographicConfig(age_min=70, age_max=75)
svc = PatientService(dataset="US", demographic_config=cfg, rng=Random(1337))
print(svc.generate().to_dict())

Equivalent XML model:
<setup>
<generate name="seeded_seniors" count="3" target="CSV">
<variable name="patient" entity="Patient" dataset="US" ageMin="70" ageMax="75" rngSeed="1337" />
<key name="full_name" script="patient.full_name" />
<key name="age" script="patient.age" />
<array name="conditions" script="patient.conditions" />
</generate>
</setup>

| Feature | Community (CE) | Enterprise (EE) |
|---|---|---|
| Deterministic domain generation | ✅ | ✅ |
| XML + Python pipelines | ✅ | ✅ |
| Healthcare & Finance domains | ✅ | ✅ |
| Multi-user collaboration | ❌ | ✅ |
| Governance & lineage dashboards | ❌ | ✅ |
| ML engines (Mostly AI, Gretel, Sklearn) | ❌ | ✅ |
| RBAC & audit logging (HIPAA/GDPR/PCI) | ❌ | ✅ |
| Managed EDIFACT / SWIFT adapters | ❌ | ✅ |
👉 Compare editions • Book a strategy call
# Run instant healthcare demo
datamimic demo create healthcare-example
datamimic run ./healthcare-example/datamimic.xml
# Verify version
datamimic version

- Core pipeline: Determinism kit + domain services + schema validators
- Governance layer: Group tables, linkage audits, provenance hashing
- Execution layer: CLI, API, and XML runners
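The governance layer's "linkage audits" are not specified in detail here; one plausible reading is a referential-integrity check across generated tables. A minimal sketch under that assumption, with hypothetical table and column names:

```python
def linkage_audit(parents: list[dict], children: list[dict], key: str) -> list[dict]:
    """Return child rows whose foreign key has no matching parent row."""
    known = {row[key] for row in parents}
    return [row for row in children if row[key] not in known]


accounts = [{"account_id": "A1"}, {"account_id": "A2"}]
transactions = [
    {"txn_id": "T1", "account_id": "A1"},
    {"txn_id": "T2", "account_id": "A3"},  # references an account that does not exist
]
print(linkage_audit(accounts, transactions, "account_id"))
# → [{'txn_id': 'T2', 'account_id': 'A3'}]
```

An empty result means every child row links to an existing parent.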
- Simulate SWIFT / ISO 20022 flows
- Replay hashed PCI transaction histories
- Validate fraud and reconciliation pipelines
- Generate deterministic patient journeys
- Integrate HL7/FHIR/EDIFACT exchanges
- Reproduce QA datasets for regression testing
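A deterministic patient journey can be pictured as a seeded sequence of dated encounters. A sketch under stated assumptions: the encounter types, the 7 to 90 day gaps, and seeding from the patient ID are illustrative choices, not DATAMIMIC features.

```python
import random
from datetime import date, timedelta

# Illustrative encounter types; not a DATAMIMIC vocabulary.
ENCOUNTER_TYPES = ["checkup", "lab work", "specialist visit", "follow-up"]


def patient_journey(patient_id: str, start: date, n: int) -> list[tuple[date, str]]:
    """Return n dated encounters; the same patient_id always yields the same journey."""
    rng = random.Random(f"journey:{patient_id}")  # per-patient seed
    day = start  # fixed start date plays the role of a frozen clock
    events = []
    for _ in range(n):
        day += timedelta(days=rng.randint(7, 90))  # dates strictly increase
        events.append((day, rng.choice(ENCOUNTER_TYPES)))
    return events


for when, kind in patient_journey("P-0001", date(2025, 1, 1), 4):
    print(when.isoformat(), kind)
```

Seeding per patient rather than once globally keeps each journey stable even if other patients are added or generated in a different order.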
pip install datamimic-ce

Generate data that makes sense, deterministically. ⭐ Star us on GitHub if DATAMIMIC improves your testing workflow.