
Commit 085929b

Update README.md
1 parent eb395c2 commit 085929b

1 file changed, +66 -24 lines changed

README.md

Lines changed: 66 additions & 24 deletions
@@ -1,24 +1,66 @@
-# DfMD
-Datasheets for Medical Datasets app.
-
-Requirements for Data and Data Dictionary files:
-- .csv or .xlsx files
-- no value are case sensitive
-- List of NaN values:
-- ["#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan", "1.#IND", "1.#QNAN", "<NA>", "N/A", "NA", "NULL", "NaN", "None", "n/a", "nan", "null", "na", "-"]
-
-Data Dictionary:
-- Requiered Columns:
-- "variable type"
-- only values permitted:
--"continuous", "categorical", "date and time"
-- "role"
-- only values permitted:
-- "outcome", "feature", "identifier", "other"
-- variables labeled as "other" are not evaluated in the app.
-- If multiple variables are labeled as "identifier" only first is checked
-
-Data:
-- List of allowed characters:
-- '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
-
+# DAIMS: Datasheets for AI and Medical Datasets
+
+![GitHub license](https://img.shields.io/github/license/PERSIMUNE/DAIMS)
+![GitHub last commit](https://img.shields.io/github/last-commit/PERSIMUNE/DAIMS)
+![GitHub issues](https://img.shields.io/github/issues/PERSIMUNE/DAIMS)
+![GitHub stars](https://img.shields.io/github/stars/PERSIMUNE/DAIMS?style=social)
+![Streamlit App](https://img.shields.io/badge/Streamlit-App-brightgreen)
+
+Despite progress in data engineering, inconsistencies in data validation and documentation procedures continue to cause confusion and technical challenges in research involving machine learning (ML). While frameworks like **Datasheets for Datasets** have made strides in addressing these challenges, there is room for improvement to better prepare datasets for ML pipelines.
+
+To bridge this gap, we introduce **DAIMS**: Datasheets for AI and Medical Datasets. DAIMS extends the foundational framework with additional tools and guidance tailored specifically for medical datasets and AI applications.
+
+## Key Features
+
+1. **Comprehensive Checklist**
+A 24-point checklist that covers common data standardization requirements. A subset of these checks is automated by the DAIMS software tool to validate dataset readiness.
+
+2. **Data Documentation Form**
+An extended documentation form designed to capture essential metadata, pose relevant research questions, and ensure datasets are well-prepared for ML analysis.
+
+3. **Data Dictionary Table**
+A tabular format to document variable descriptions, data types, units, and any relevant details about the dataset.
+
+4. **Flowchart for ML Analyses**
+A guided flowchart that maps research questions to suggested ML methods, providing researchers with a clear pathway to address their objectives.
+
+5. **Software Tool**
+A publicly available tool to assist in dataset preparation by automating key aspects of the checklist validation process.
+
+6. **Online App**
+DAIMS is available as an easy-to-use online app hosted at [https://daims-app.streamlit.app/](https://daims-app.streamlit.app/), enabling efficient dataset evaluation and preparation.
+
+## Benefits of DAIMS
+
+- **Standardization**: Promotes consistent practices for preparing datasets in medical research.
+- **Guidance**: Offers actionable insights through the flowchart and checklist, helping researchers align datasets with their ML objectives.
+- **Automation**: Saves time by automating key validation processes.
+- **Documentation**: Enhances transparency and reproducibility through detailed data documentation.
+
+## Getting Started
+
+1. **Access the Repository**
+Clone or download the DAIMS repository from GitHub:
+[https://github.com/PERSIMUNE/DAIMS](https://github.com/PERSIMUNE/DAIMS)
+
+2. **Explore the Online App**
+Use the online app for streamlined dataset evaluation:
+[https://daims-app.streamlit.app/](https://daims-app.streamlit.app/)
+
+3. **Follow the Checklist**
+Refer to the provided checklist to ensure datasets meet the 24 common data standardization requirements.
+
+4. **Document Your Dataset**
+Use the extended form and data dictionary table to comprehensively document your dataset.
+
+5. **Use the Flowchart**
+Map your research questions to suggested ML methods for clearer analytical direction.
+
+## Contributing
+
+We welcome contributions to improve DAIMS! Feel free to open issues, submit pull requests, or provide feedback on the GitHub repository.
+
+## License
+
+This project is licensed under the [MIT License](LICENSE).
+
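This commit drops the concrete file requirements from the README: the permitted "variable type" and "role" values, the NaN token list, and the allowed character set. For readers who still need those rules, the following is a minimal sketch of how they could be checked in plain Python. It is not the DAIMS implementation. The function names (`is_missing`, `disallowed_characters`, `check_dictionary_rows`) are hypothetical. The sketch assumes that "no value are case sensitive" means values are compared case-insensitively, and that each data dictionary row has already been loaded as a dict.

```python
import string

# Tokens treated as missing, copied from the removed README lines and
# lower-cased because the README says values are not case sensitive.
NAN_TOKENS = {
    token.lower()
    for token in [
        "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan",
        "1.#IND", "1.#QNAN", "<NA>", "N/A", "NA", "NULL", "NaN", "None",
        "n/a", "nan", "null", "na", "-",
    ]
}

# Permitted values for the two required data dictionary columns.
PERMITTED = {
    "variable type": {"continuous", "categorical", "date and time"},
    "role": {"outcome", "feature", "identifier", "other"},
}

# The README's allowed-character list is identical to string.printable.
ALLOWED_CHARS = set(string.printable)


def is_missing(value):
    """Return True if the value matches one of the README's NaN tokens."""
    return str(value).strip().lower() in NAN_TOKENS


def disallowed_characters(value):
    """Return the sorted characters in value that fall outside the allowed set."""
    return sorted(set(str(value)) - ALLOWED_CHARS)


def check_dictionary_rows(rows):
    """Check each data dictionary row (a dict) and return a list of problems."""
    problems = []
    for index, row in enumerate(rows):
        # Normalise column names and values so comparisons ignore case.
        normalised = {
            str(key).strip().lower(): str(val).strip().lower()
            for key, val in row.items()
        }
        for column, permitted in PERMITTED.items():
            if column not in normalised:
                problems.append(f"row {index}: missing column '{column}'")
            elif normalised[column] not in permitted:
                problems.append(
                    f"row {index}: '{column}' has unsupported value "
                    f"'{normalised[column]}'"
                )
    return problems


if __name__ == "__main__":
    rows = [
        {"Variable Type": "Continuous", "Role": "feature"},
        {"variable type": "text", "role": "outcome"},
    ]
    for problem in check_dictionary_rows(rows):
        print(problem)
```

The sketch does not implement the rule that only the first "identifier" variable is checked, because the removed text does not say what that check involves.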
