The MCI_JSON2TSV tool is a Python-based script that takes an input directory of COG- and/or IGM-formatted Clinical Report JSON files and transforms them into a set of parsed and flattened TSV files. When both COG and IGM Clinical Report JSON files are present, it also outputs an XLSX file that integrates data from both source types for each participant so they can be viewed together. Please note that this script is not intended for the transformation of IGM molecular assay JSONs. Please also consult the source files for additional clinical context: parsed data are taken from clinician-interpreted forms that contain human-written free text and may contain typos or other human errors.
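As a rough illustration of what "parsed and flattened" means here, the sketch below collapses a nested JSON record into a single row with dotted column names and renders it as tab-delimited text. It is not the tool's actual implementation: the `flatten` and `to_tsv` helpers, the dotted-key naming, and the sample field names are all assumptions made for this example, and it uses only the standard library.

```python
import csv
import io


def flatten(record, parent_key="", sep="."):
    """Collapse nested dicts and lists into one flat dict with dotted keys."""
    flat = {}
    # Lists are treated like dicts keyed by their positional index.
    items = record.items() if isinstance(record, dict) else enumerate(record)
    for key, value in items:
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, (dict, list)):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def to_tsv(rows):
    """Render a list of flat dicts as tab-delimited text with a header row."""
    columns = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)  # keys missing from a row are written as empty cells
    return buffer.getvalue()


# Hypothetical report; these field names are invented for illustration only.
report = {
    "participant_id": "P001",
    "form": {"name": "demo", "fields": [{"value": "A"}, {"value": "B"}]},
}
flat_report = flatten(report)
tsv_text = to_tsv([flat_report])
```

Each nested level becomes part of the column name, so one report maps to one row; the real script may name and group columns differently.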
- Python 3.10 or higher
- pandas 2.3 or higher
- numpy 2.3 or higher
- openpyxl 3.1 or higher
For developers, unit tests were run using the following dependencies:
- pytest 8.3.4
- pytest-mock 3.14.8
Tests can be run with the command:
python -m pytest
You can download the Python scripts from the /src directory or clone the repository using:
git clone https://github.com/CBIIT/ChildhoodCancerDataInitiative-MCI_JSON2TSV.git
To install dependencies:
pip install -r requirements.txt
To run the script:
python MCI_JSON2TSV.py -d <input DIR> -o <output DIR>
Required arguments:
-d/--directory : Path to directory containing JSON files to aggregate and transform.
-o/--output_path : Path to the output directory for file outputs. The script throws an error if the directory already exists, so that an existing directory is not overwritten.
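The two required arguments and the "fail if the output directory exists" behavior can be sketched with the standard library as follows. This is a minimal sketch and not the script's actual code: only the flag names `-d/--directory` and `-o/--output_path` come from the description above, while `build_parser` and `prepare_output_dir` are hypothetical helper names.

```python
import argparse
import os


def build_parser():
    """Build a parser exposing the two required arguments described above."""
    parser = argparse.ArgumentParser(
        description="Transform Clinical Report JSON files into TSV files."
    )
    parser.add_argument(
        "-d", "--directory", required=True,
        help="Path to directory containing JSON files to aggregate and transform.",
    )
    parser.add_argument(
        "-o", "--output_path", required=True,
        help="Path to output directory; must not already exist.",
    )
    return parser


def prepare_output_dir(output_path):
    """Create the output directory, refusing to reuse an existing one."""
    # exist_ok=False makes os.makedirs raise FileExistsError if the
    # directory is already present, so earlier results are never overwritten.
    os.makedirs(output_path, exist_ok=False)
    return output_path
```

In practice this means choosing a fresh output path for every run, or removing or renaming the previous output directory first.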
IGM and COG output directories and files are generated when the corresponding input file types are present; if both IGM and COG JSON files are present in the input, an integrated XLSX file is also generated. A log file detailing the parsing process is generated as well.
output_path/
├── COG/
│ ├── COG_form_level_TSVs_<date>_<time>/ # Directory of tab-delimited COG form level parsed data
│ ├── COG_JSON_table_conversion_decoded_<date>_<time>.tsv # Tab-delimited file of COG report parsed data
│ └── COG_saslabels_<date>_<time>.tsv # Tab-delimited file of descriptive labels for COG fields
├── COG_IGM_integrated_<date>_<time>.xlsx # Integrated XLSX of COG and IGM data; generated if both file types are provided in input
├── IGM/
│ ├── IGM_<assay>_JSON_table_conversion_<date>_<time>.tsv # Tab-delimited file of parsed data from IGM reports; a file for each assay will be generated.
│ └── IGM_results_level_TSVs_<date>_<time>/ # Directory of tab-delimited IGM form variant-level parsing files
└── JSON2TSV_<date>_<time>.log # Log file
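Because every output name carries a `<date>_<time>` stamp, downstream code usually has to locate files by pattern rather than by exact name. The sketch below shows one way to do that with `pathlib`; `list_outputs` is a hypothetical helper, and it assumes the files in the form-level and results-level directories use the `.tsv` extension.

```python
from pathlib import Path


def list_outputs(output_path):
    """Group the files under an output directory by type, using glob patterns."""
    root = Path(output_path)
    return {
        # rglob searches recursively, covering COG/, IGM/ and their subdirectories.
        "tsv": sorted(root.rglob("*.tsv")),
        # The integrated workbook and the log sit at the top level of output_path.
        "xlsx": sorted(root.glob("COG_IGM_integrated_*.xlsx")),
        "log": sorted(root.glob("JSON2TSV_*.log")),
    }
```

An empty `xlsx` list is expected when the input contained only COG or only IGM files.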
For questions or to contribute, please reach out to NCIChildhoodCancerDataInitiative@mail.nih.gov