Initial federated COVID-rich ICU database documentation. #209

---
title: "MIMIC-Northwestern documentation"
linktitle: Multi-center
weight: 45

cascade:
- type: "docs"
  _target:
    path: "/**"

description: >
  MIMIC-Northwestern: A Harmonized Multi-center COVID-rich ICU Database
---
We introduce MIMIC-Northwestern, a large, harmonized, multi-center, COVID-rich ICU database. It comprises deidentified health-related data from Beth Israel Deaconess Medical Center (BIDMC) and Northwestern Memorial HealthCare (NMHC) spanning 2020 to 2022, capturing the data distribution shifts during this critical period. The database adopts a data structure similar to that of MIMIC-IV v2.2.

NMHC uses the Epic electronic medical record (EMR) system. To make EMR data available for research and quality assurance, the NM EMR systems transfer selected data into a relational Enterprise Data Warehouse (NM EDW).

Following data warehousing conventions, the NM EDW tables fall into two primary categories: Fact and Dimension. As implemented in the NM EDW, Fact tables primarily contain events (such as encounters, admissions, diagnosis events, procedure orders, and medication orders), while Dimension tables describe persistent attributes of entities (such as patients, procedure names, and the medication formulary).
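
To illustrate the Fact/Dimension split, here is a minimal Python sketch using an in-memory SQLite database. The table and column names (`medication_order_fact`, `medication_dim`, `medication_key`) and all rows are hypothetical placeholders, not the actual NM EDW schema: each event row stores only a key, and the persistent attribute (the medication name) is resolved through the dimension table.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only; these are not
# actual NM EDW table or column names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE medication_dim (
    medication_key INTEGER PRIMARY KEY,  -- surrogate key
    medication_name TEXT NOT NULL        -- persistent attribute
);
CREATE TABLE medication_order_fact (
    order_id INTEGER PRIMARY KEY,
    patient_key INTEGER NOT NULL,
    medication_key INTEGER NOT NULL REFERENCES medication_dim(medication_key),
    order_time TEXT NOT NULL             -- when the order event happened
);
""")

# Placeholder rows, not real data.
conn.executemany("INSERT INTO medication_dim VALUES (?, ?)",
                 [(1, "drug A"), (2, "drug B")])
conn.executemany("INSERT INTO medication_order_fact VALUES (?, ?, ?, ?)",
                 [(10, 100, 1, "2020-04-01 08:00"),
                  (11, 100, 2, "2020-04-01 09:30"),
                  (12, 101, 1, "2020-04-02 10:15")])


def orders_with_names(conn):
    """Resolve each order event to its medication name via the dimension table."""
    return conn.execute("""
        SELECT f.order_id, f.order_time, d.medication_name
        FROM medication_order_fact AS f
        JOIN medication_dim AS d ON d.medication_key = f.medication_key
        ORDER BY f.order_id
    """).fetchall()


for row in orders_with_names(conn):
    print(row)
```

The same pattern appears in MIMIC-style tables, where an events table stores an integer identifier and a `d_` dimension table holds its description.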

The NM EDW also includes auxiliary tables not directly related to patient care, such as a list of International Classification of Diseases codes (ICD-9 and ICD-10). In response to the COVID-19 pandemic caused by SARS-CoV-2, a COVID-19 data mart was created within the EDW to provide convenient access to information on COVID-19 patients, lab results, medications, and treatments.

The MIMIC-Northwestern database is currently organized into two distinct modules that reflect the source of the data:

- [Hosp](/docs/multi-center/modules/hosp/) - Hospital-level data including patients, admissions, labs, ICD diagnoses for billing purposes, prescriptions, and electronic medication administration records.
- [ICU](/docs/multi-center/modules/icu/) - ICU-level data including ICU stays, procedure events, and charted events such as vital signs (*chartevents*).

{{% pageinfo %}}
The MIMIC-Northwestern database is not yet released and its structure is subject to change.
{{% /pageinfo %}}

The table structures of each module, adapted to align with MIMIC's data structure, are detailed in the respective sections. Additionally, we have incorporated COVID-related concepts, standard terminologies (LOINC, RxNorm, SNOMED, ICD-9/10), and derived mappings (for drug administration) into the dataset. This integration supports the current multi-center initiative and also facilitates interoperability, enabling data exchange and collaboration across healthcare systems.

---
title: "Modules"
linkTitle: "Modules"
weight: 3
date: 2023-09-18
description: >
  Description of the data contained in each of the MIMIC-Northwestern modules.
---

Data within the modules will be made available on PhysioNet.

---
title: "Hosp"
linkTitle: "Hosp"
date: 2023-09-18
weight: 20
description: >
  The Hosp module comprises data sourced from the comprehensive Electronic Health Record (EHR) systems of both BIDMC and NMHC. It covers patient and admission information, laboratory measurements, billed diagnoses, medication orders, and electronic medication administration records.
---

The Hosp module contains data derived from the hospital-wide EHRs of BIDMC and NMHC. These data are predominantly recorded during the hospital stay, though some tables also include data from outside the hospital (e.g. outpatient laboratory tests in *labevents*).

Information includes patient and admission details (*patients*, *admissions*), laboratory measurements (*labevents*, *d_labitems*), hospital billing information (*diagnoses_icd*, *d_icd_diagnoses*), medication orders (*prescriptions*), and electronic medication administration records (*emar*).

---
title: "admissions table"
linktitle: "admissions"
date: 2023-09-18
weight: 1
description: >
  Detailed information about hospital stays, including admission, discharge, and death times, as well as admission type, admission location, and discharge location; additionally, patient details such as insurance, language, marital status, and race are recorded at the hospital stay level.
---

The *admissions* table gives information regarding a patient's admission to the hospital.

### Links to

* *patients* on `subject_id`

## Table columns

Name | Postgres data type
---- | ----
`subject_id` | INTEGER NOT NULL
`hadm_id` | INTEGER NOT NULL
`admittime` | TIMESTAMP NOT NULL
`dischtime` | TIMESTAMP
`deathtime` | TIMESTAMP
`admission_type` | VARCHAR(40) NOT NULL
`admission_location` | VARCHAR(60)
`discharge_location` | VARCHAR(60)
`insurance` | VARCHAR(255)
`language` | VARCHAR(10)
`marital_status` | VARCHAR(30)
`race` | VARCHAR(80)
`ethnicity` | VARCHAR(80)
`edregtime` | TIMESTAMP
`edouttime` | TIMESTAMP
`hospital_expire_flag` | SMALLINT

## Detailed description

The *admissions* table defines all hospitalizations in the database. Each hospitalization is assigned a unique random integer known as the `hadm_id`.

### `subject_id`

`subject_id` is a unique identifier for each patient. It is a cryptographic random number, and each patient's `subject_id` is consistent across tables, so it can be used to identify all data associated with a specific patient.

### `hadm_id`

Each row of this table contains a unique `hadm_id`, which represents a single patient's admission to the hospital. The same `subject_id` may appear in multiple rows, indicating that the patient had multiple admissions to the hospital. The *admissions* table can be linked to the *patients* table using `subject_id`.
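
The relationship between `subject_id` and `hadm_id` can be checked with a short Python sketch. It operates on hypothetical in-memory `(subject_id, hadm_id)` pairs rather than the real table; the identifiers are placeholders.

```python
# Hypothetical (subject_id, hadm_id) pairs; the identifiers are placeholders.
admissions = [
    (1001, 20001),
    (1001, 20002),  # same patient, second hospitalization
    (1002, 20003),
]


def hadm_ids_unique(rows):
    """True if every row of the admissions extract has a distinct hadm_id."""
    hadm_ids = [hadm_id for _, hadm_id in rows]
    return len(hadm_ids) == len(set(hadm_ids))


def admissions_per_patient(rows):
    """Count the distinct hospitalizations (hadm_id) recorded for each subject_id."""
    per_patient = {}
    for subject_id, hadm_id in rows:
        per_patient.setdefault(subject_id, set()).add(hadm_id)
    return {subject_id: len(ids) for subject_id, ids in per_patient.items()}


print(hadm_ids_unique(admissions))         # True
print(admissions_per_patient(admissions))  # {1001: 2, 1002: 1}
```

Against the database itself, the same count corresponds to a `GROUP BY subject_id` over *admissions*.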

### `admittime`

`admittime` provides the date and time the patient was admitted to the hospital.

### `dischtime`

`dischtime` provides the date and time the patient was discharged from the hospital.

### `deathtime`

`deathtime` provides the time of in-hospital death for the patient. `deathtime` is only present if the patient died in-hospital, and, if present, is almost always the same as the patient's `dischtime`. However, there may be some discrepancies.
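
Because `deathtime` and `dischtime` can disagree, an analysis may want to locate those admissions first. The Python sketch below does this on placeholder `(hadm_id, dischtime, deathtime)` tuples; the `tolerance` parameter is an illustrative choice, not a documented threshold.

```python
from datetime import datetime, timedelta


def death_discharge_gap(dischtime, deathtime):
    """Return deathtime - dischtime, or None when either timestamp is missing
    (deathtime is NULL for patients who did not die in hospital)."""
    if dischtime is None or deathtime is None:
        return None
    return deathtime - dischtime


def discrepant_admissions(rows, tolerance=timedelta(0)):
    """Return hadm_ids whose deathtime and dischtime differ by more than `tolerance`.

    `rows` holds (hadm_id, dischtime, deathtime) tuples.
    """
    flagged = []
    for hadm_id, dischtime, deathtime in rows:
        gap = death_discharge_gap(dischtime, deathtime)
        if gap is not None and abs(gap) > tolerance:
            flagged.append(hadm_id)
    return flagged


# Placeholder rows for illustration only.
rows = [
    (20001, datetime(2020, 5, 1, 10, 0), datetime(2020, 5, 1, 10, 0)),  # identical
    (20002, datetime(2020, 6, 3, 14, 0), None),                         # survived
    (20003, datetime(2020, 7, 9, 8, 0), datetime(2020, 7, 9, 6, 30)),   # 90 min apart
]
print(discrepant_admissions(rows))                      # [20003]
print(discrepant_admissions(rows, timedelta(hours=2)))  # []
```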

### `admission_type`

`admission_type` is useful for classifying the urgency of the admission. The NM EDW contributes 6 additional distinct admission types, including 'Emergency', 'Urgent', 'Elective', 'Elective-Routine', and 'Trauma'.

### `admission_location`

`admission_location` provides information about the hospital department into which the patient was initially admitted. There are 24 admission locations from the NM EDW, including 'Neurology', 'Radiation Oncology', 'Pediatrics', 'Medicine', 'Respiratory Therapy', 'Cardiology', 'Cardiac Rehabilitation', 'Pre-Admission Testing', 'Neurological Intensive Care', 'Orthopaedic Surgery', 'Sleep Medicine', 'Gastroenterology', 'Unknown', 'Obstetrics and Gynecology', 'Emergency Medicine', 'Research', 'Intensive Care', 'Gynecology', 'Pediatric Intensive Care', 'Radiology', 'Pathology', 'Obstetrics', and 'Surgery'. Note that 'Pediatrics' is the name of a unit or room that is not necessarily exclusive to pediatric patients: the shared data pertain to the adult hospital, and all patients are aged 18 and above.

### `discharge_location`

Similarly, `discharge_location` is the disposition of the patient after they are discharged from the hospital. There are 33 discharge locations from the NM EDW, some of which are suppressed under 'Other Facility' for privacy.

NMHC discharge locations:

| Discharge Location | Explanation |
| --- | --- |
| Expired | Died |
| Planned Readmission - DC/transferred to acute inpatient rehab | |
| ED Dismissed-Never Arrived | |
| Home with Equipment or O2 | |
| Shelter | |
| Expired - Hospice | Died in Hospice |
| Home with Home Health Care | |
| Planned Readmission - DC/transferred to skilled nursing facility | |
| Acute Inpatient Rehabilitation | |
| Home or Self Care | |
| Group Home | |
| Planned Readmission - Discharged to home/self-care | |
| Left Against Medical Advice | |
| Inpatient Hospice | |
| Admitted to L&D | Admitted to Labor and Delivery |
| Planned Readmission - DC/transferred to nursing home (custodial) | |
| unknown | |
| Cancer Center or Children's Hospital | |
| Home with Outpatient Services | |
| Critical Access Hospital | |
| Planned Readmission - DC/transferred to other type of healthcare institution | |
| Gift of Hope / Still a Patient | |
| Nursing Home (Custodial) | |
| Home with Hospice | |
| VA System Facility | |
| Planned Readmission - DC/transferred to Long-term Acute Care Hospital (LTAC) | |
| Swing Bed | |
| Against Medical Advice (AMA) or Elopement | |
| Skilled Nursing Facility or Subacute Rehab Care | |
| Designated Disaster Alternative Care Site | |
| Acute Care Hospital | |
| Long-Term Acute Care Hospital (LTAC) | |

### `insurance`, `language`, `marital_status`, `race`, `ethnicity`

The `insurance`, `language`, `marital_status`, `race`, and `ethnicity` columns provide information about patient demographics for the given hospitalization. BIDMC data provide only a single `race` column; the `ethnicity` column was added to incorporate NMHC data, which record ethnicity separately.

The race column in NMHC includes:

- American Indian or Alaska Native
- Other
- Unknown
- 2 or more races
- Unable to Answer
- Native Hawaiian or Other Pacific Islander
- Asian
- White
- Declined
- Black or African American

The ethnicity column in NMHC includes:

- Not Hispanic or Latino
- Hispanic or Latino
- Declined
- Unable to Answer
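
Both NMHC columns mix substantive categories with responses such as 'Declined' and 'Unable to Answer', so analyses need to decide how to treat the latter. One possible choice, sketched below in Python and labeled as an assumption rather than a database convention, is to treat 'Unknown', 'Declined', and 'Unable to Answer' as missing while leaving every other value, including 'Other' and '2 or more races', unchanged.

```python
# Responses that carry no race or ethnicity information. Collapsing them to
# missing is an analysis choice made for this sketch, not a database convention.
NON_INFORMATIVE = {"Unknown", "Declined", "Unable to Answer"}


def clean_category(value):
    """Return None for NULL or non-informative responses; otherwise the value unchanged."""
    if value is None or value in NON_INFORMATIVE:
        return None
    return value


def clean_demographics(race, ethnicity):
    """Apply the same rule to the separate race and ethnicity columns."""
    return {"race": clean_category(race), "ethnicity": clean_category(ethnicity)}


print(clean_demographics("Asian", "Declined"))
# {'race': 'Asian', 'ethnicity': None}
```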

### `edregtime`

The date and time at which the patient's arrival in the emergency department was registered.

### `edouttime`

The date and time at which the patient left the emergency department, either because they were discharged from the hospital or because they were transferred.
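
Time spent in the emergency department follows directly from these two columns. The Python sketch below computes it in hours; it assumes, without confirmation from this page, that both timestamps are NULL for admissions that did not pass through the ED, and the timestamps used are placeholders.

```python
from datetime import datetime


def ed_length_of_stay_hours(edregtime, edouttime):
    """Hours between ED registration and ED departure.

    Returns None when either timestamp is missing, which this sketch assumes
    happens for admissions that did not pass through the ED.
    """
    if edregtime is None or edouttime is None:
        return None
    return (edouttime - edregtime).total_seconds() / 3600.0


# Placeholder timestamps: registered 22:15, left the ED at 04:45 the next day.
print(ed_length_of_stay_hours(datetime(2021, 1, 5, 22, 15), datetime(2021, 1, 6, 4, 45)))
# 6.5
```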

### `hospital_expire_flag`

This is a binary flag which indicates whether the patient died within the given hospitalization. `1` indicates death in the hospital, as noted in the `dod` column of the *patients* table, and `0` indicates survival to hospital discharge.
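
Since `deathtime` is only present for in-hospital deaths, `hospital_expire_flag` would be expected to equal `1` exactly when `deathtime` is populated. The Python sketch below checks that assumed relationship on placeholder `(hadm_id, deathtime, hospital_expire_flag)` tuples; mismatches are candidates for inspection (for example against `dod` in *patients*), not necessarily errors.

```python
from datetime import datetime


def expected_expire_flag(deathtime):
    """Assumed relationship: 1 if an in-hospital death time is recorded, else 0."""
    return 1 if deathtime is not None else 0


def flag_mismatches(rows):
    """Return hadm_ids whose hospital_expire_flag disagrees with deathtime.

    `rows` holds (hadm_id, deathtime, hospital_expire_flag) tuples.
    """
    return [
        hadm_id
        for hadm_id, deathtime, flag in rows
        if flag != expected_expire_flag(deathtime)
    ]


# Placeholder rows for illustration only.
rows = [
    (20001, datetime(2020, 5, 1, 10, 0), 1),
    (20002, None, 0),
    (20003, None, 1),  # flagged as a death, but no deathtime recorded
]
print(flag_mismatches(rows))  # [20003]
```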

---
title: "d_icd_diagnoses"
linktitle: "d_icd_diagnoses"
weight: 6
date: 2023-09-18
description: >
  Dimension table for *diagnoses_icd*; provides a description of ICD-9/ICD-10 billed diagnoses.
---

The *d_icd_diagnoses* table defines International Classification of Diseases (ICD) Version 9 and 10 codes for **diagnoses**. These codes are assigned at the end of the patient's stay and are used by the hospital to bill for care provided.

### Links to

* *diagnoses_icd* on `icd_code` and `icd_version`

## Table columns

Name | Postgres data type
---- | ----
`icd_code` | CHAR(7) NOT NULL
`icd_version` | INTEGER NOT NULL
`long_title` | VARCHAR(255)

## Detailed Description

### `icd_code`

`icd_code` is the International Classification of Diseases (ICD) code.

### `icd_version`

There are two versions of this coding system: version 9 (ICD-9) and version 10 (ICD-10). These can be differentiated using the `icd_version` column. [ICD-9](https://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/codes) and [ICD-10](https://www.cms.gov/Medicare/Coding/ICD10/2019-ICD-10-CM.html) diagnosis codes are obtained from the Centers for Medicare & Medicaid Services (CMS). Because the data span 2020 to 2022, well after CMS mandated ICD-10 in 2015, diagnoses in this database are expected to be coded in ICD-10; the description of ICD-9 is provided for informational purposes only.

In general, ICD-10 codes are more detailed, though code mappings (or "cross-walks") exist which convert ICD-9 codes to ICD-10 codes.

Both ICD-9 and ICD-10 codes are often presented with a decimal. This decimal is not required for interpretation of an ICD code; e.g. the `icd_code` '0010' is equivalent to '001.0'.
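
When codes arrive from different sources, some with the decimal and some without, they can be normalized before comparison. The Python sketch below (function names are hypothetical) strips the decimal while keeping the code a string, and compares codes on the (`icd_code`, `icd_version`) pair used to link *diagnoses_icd* and *d_icd_diagnoses*.

```python
def normalize_icd_code(code):
    """Drop the decimal point and surrounding whitespace and upper-case the code.

    The result stays a string so that meaningful leading zeros are preserved.
    """
    return code.strip().replace(".", "").upper()


def same_diagnosis(code_a, version_a, code_b, version_b):
    """Compare two codes on the (icd_code, icd_version) pair."""
    return (normalize_icd_code(code_a), version_a) == (normalize_icd_code(code_b), version_b)


print(normalize_icd_code("001.0"))            # 0010
print(same_diagnosis("001.0", 9, "0010", 9))   # True
print(same_diagnosis("001.0", 9, "0010", 10))  # False: different icd_version
```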

ICD-9 and ICD-10 codes have distinct formats. ICD-9 codes are strings of 3 to 5 characters that are entirely numeric, with the exception of codes prefixed with "E" or "V", which are used for external causes of injury or supplemental classification. Importantly, ICD-9 codes are stored as strings in the database because the leading 0s in codes are meaningful.

ICD-10 codes are 3 to 7 characters long: they always begin with a letter followed by a digit, and the remaining characters may be digits or letters.
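
The format rules above can be expressed as regular expressions. The Python sketch below reports which versions a code's format is consistent with; it checks format only, not whether the code exists. Some "E" and "V" codes match both formats, which is why `icd_version` should always be used rather than inferring the version from the code.

```python
import re

# Format rules described above, applied to codes without the decimal point.
ICD9_FORMAT = re.compile(r"\d{3,5}|V\d{2,4}|E\d{3,4}")
ICD10_FORMAT = re.compile(r"[A-Z]\d[A-Z0-9]{1,5}")


def plausible_versions(code):
    """Return the set of ICD versions whose format `code` matches.

    A format match does not prove that the code exists.
    """
    code = code.strip().replace(".", "").upper()
    versions = set()
    if ICD9_FORMAT.fullmatch(code):
        versions.add(9)
    if ICD10_FORMAT.fullmatch(code):
        versions.add(10)
    return versions


print(plausible_versions("001.0") == {9})       # True
print(plausible_versions("U07.1") == {10})      # True
print(plausible_versions("E812.0") == {9, 10})  # True: ambiguous format
```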

ICD-11 became the official [WHO standard](https://www.who.int/standards/classifications/classification-of-diseases) on January 1, 2022 but has not been adopted in the US. The US Centers for Medicare & Medicaid Services (CMS) and HIPAA have required ICD-10 since October 1, 2015.

### `long_title`

The `long_title` provides a description of the ICD code. For example, the ICD-10 code U07.1 has a `long_title` of 'COVID-19 (confirmed by laboratory testing)'.

The tables below list ICD-10 codes related to COVID-19 or long COVID.

ICD-10 codes that serve as COVID-19 markers:

| icd_code | long_title |
| -------- | ---------- |
| U07.1 | COVID-19 (confirmed by laboratory testing) |
| U07.2 | COVID-19, virus not identified |
| U10.9 | Multisystem inflammatory syndrome associated with COVID-19 |
| J12.81 | Pneumonia due to SARS-associated coronavirus |
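
A first-pass COVID-19 cohort could be selected by matching diagnoses against the marker codes above. The Python sketch below assumes that *diagnoses_icd* rows carry `hadm_id`, `icd_code`, and `icd_version` as in MIMIC-IV; the rows are placeholders, and which codes belong in a cohort definition remains a study-specific decision.

```python
# COVID-19 marker codes from the table above, written without the decimal point.
COVID_MARKER_CODES = {"U071", "U072", "U109", "J1281"}


def covid_admissions(diagnoses):
    """Return the hadm_ids with at least one ICD-10 COVID-19 marker code.

    `diagnoses` holds (hadm_id, icd_code, icd_version) tuples; this layout is
    an assumption modeled on MIMIC-IV's diagnoses_icd table.
    """
    return {
        hadm_id
        for hadm_id, icd_code, icd_version in diagnoses
        if icd_version == 10
        and icd_code.strip().replace(".", "").upper() in COVID_MARKER_CODES
    }


# Placeholder rows for illustration only.
diagnoses = [
    (20001, "U071", 10),
    (20001, "I10", 10),
    (20002, "J12.81", 10),
    (20003, "0010", 9),
]
print(sorted(covid_admissions(diagnoses)))  # [20001, 20002]
```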

ICD-10 codes that serve as long COVID markers:

| icd_code | long_title |
| -------- | ---------- |
| U09.9 | Post COVID-19 condition, unspecified |

ICD-10 codes related to other aspects of COVID-19:

| icd_code | long_title |
| -------- | ---------- |
| U08.9 | Personal history of COVID-19, unspecified (not a marker) |
| B97.2 | Coronavirus as the cause of diseases classified elsewhere (not necessarily COVID-19) |
| B97.21 | SARS-associated coronavirus as the cause of diseases classified elsewhere |
| Z28.31 | Underimmunization for COVID-19 status (see detailed codes below) |
| Z28.310 | Unvaccinated for COVID-19 |
| Z28.311 | Partially vaccinated for COVID-19 |
| B97.29 | Other coronavirus as the cause of diseases classified elsewhere (SUPERSEDED; early coding guidelines) |
| Z20.822 | Contact with and (suspected) exposure to COVID-19 (unconfirmed) |
| Z86.16 | Personal history of COVID-19 |

---
title: "d_labitems"
linktitle: "d_labitems"
weight: 4
date: 2023-09-18
description: >
  Dimension table for *labevents*; provides a description of all lab items.
---

## *d_labitems*

*d_labitems* contains definitions for all `itemid` values associated with lab measurements in the database. All data in *labevents* link to the *d_labitems* table. Each unique (`fluid`, `category`, `label`) tuple in the hospital database was assigned an `itemid` in this table, and the use of this `itemid` facilitates efficient storage and querying of the data.
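
The tuple-to-`itemid` assignment described above can be sketched in Python. This shows the idea only: the starting value `50000` and the rows are placeholders, and nothing here claims to reproduce how the actual `itemid` values were generated.

```python
def assign_itemids(rows, first_itemid=50000):
    """Give each unique (fluid, category, label) tuple one integer itemid.

    Returns the dimension mapping (tuple -> itemid) and, for each input row,
    the itemid an events table would store instead of three strings.
    `first_itemid` is an arbitrary illustrative starting value.
    """
    dimension = {}
    event_itemids = []
    for fluid, category, label in rows:
        key = (fluid, category, label)
        if key not in dimension:
            dimension[key] = first_itemid + len(dimension)
        event_itemids.append(dimension[key])
    return dimension, event_itemids


# Placeholder measurements for illustration only.
measurements = [
    ("Blood", "Chemistry", "Sodium"),
    ("Blood", "Chemistry", "Potassium"),
    ("Blood", "Chemistry", "Sodium"),
    ("Urine", "Chemistry", "Sodium"),  # same label, different fluid: new itemid
]
dimension, event_itemids = assign_itemids(measurements)
print(event_itemids)  # [50000, 50001, 50000, 50002]
```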

Laboratory data contains information collected and recorded in the hospital laboratory database. This includes measurements made in wards within the hospital and in clinics outside the hospital. Most concepts in this table have been mapped to LOINC codes, an openly available ontology which facilitates interoperability.

For the data sourced from NMHC, Illinois law defines certain categories of information as Sensitive Protected Health Information (SPHI), which require special treatment. SPHI includes genetic counseling but does not include genetic testing.

To facilitate further multi-center initiatives, the lab mappings to standard terminologies (LOINC) will be released.

### Links to

* *labevents* on `itemid`

## Table columns

Name | Postgres data type
---- | ----
`itemid` | INTEGER
`label` | VARCHAR(50)
`fluid` | VARCHAR(50)
`category` | VARCHAR(50)

## Detailed Description

### `itemid`

A unique identifier for a laboratory concept. `itemid` is unique to each row and can be used to identify data in *labevents* associated with a specific concept.

### `label`

The `label` column describes the concept which is represented by the `itemid`.

Common COVID-19 tests and measurements in the database, as defined by LOINC terminology, include:

- SARS-CoV-2 (COVID-19) [Presence] in Specimen by Organism specific culture
- SARS-CoV-2 (COVID-19) Ag [Presence] in Respiratory specimen by Rapid immunoassay
- SARS-CoV-2 (COVID-19) N gene [Cycle Threshold #] in Specimen by NAA with probe detection
- SARS-CoV-2 (COVID-19) E gene [Cycle Threshold #] in Respiratory specimen by NAA with probe detection
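
Because `label` is limited to 50 characters, the full LOINC names listed above may not appear verbatim in the table, so a case-insensitive substring match is a more forgiving way to find candidate items. The Python sketch below uses placeholder `(itemid, label)` pairs; label matching is a heuristic, and the LOINC mappings, once released, would be a more reliable way to identify these tests.

```python
def covid_lab_itemids(d_labitems):
    """Return sorted itemids whose label mentions SARS-CoV-2 (case-insensitive).

    `d_labitems` holds (itemid, label) pairs. `label` may be NULL,
    so missing labels are skipped.
    """
    return sorted(
        itemid
        for itemid, label in d_labitems
        if label is not None and "sars-cov-2" in label.lower()
    )


# Placeholder itemids and labels for illustration only.
d_labitems = [
    (90001, "SARS-CoV-2 (COVID-19) Ag, rapid"),
    (90002, "Sodium"),
    (90003, "sars-cov-2 N gene Ct"),
    (90004, None),
]
print(covid_lab_itemids(d_labitems))  # [90001, 90003]
```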

### `fluid`

`fluid` describes the substance on which the measurement was made. These include blood, cerebrospinal fluid, joint fluid, ascites, urine, and other body fluids.

### `category`

`category` provides higher-level information about the type of measurement. Categories include hematology, chemistry, and blood gas. For example, a category of 'ABG' indicates that the measurement is an arterial blood gas.