MIC-DKFZ/diabetes-xai

Explainable AI-based analysis of human pancreas sections identifies traits of type 2 diabetes

This study uses deep learning and explainable AI to analyze brightfield and fluorescence microscopy images and to identify histologic biomarkers that correlate with type 2 diabetes.

Abstract:
Type 2 diabetes (T2D) is a chronic disease currently affecting around 500 million people worldwide with often severe health consequences. Yet, histopathological analyses are still inadequate to infer the glycaemic state of a person based on morphological alterations linked to impaired insulin secretion and β-cell failure in T2D. Giga-pixel microscopy can capture subtle morphological changes, but data complexity exceeds human analysis capabilities. In response, we generated a dataset of pancreas whole-slide images with multiple chromogenic and multiplex fluorescent stainings and trained deep learning models to predict the T2D status. Using explainable AI, we made the learned relationships interpretable, quantified them as biomarkers, and assessed their association with T2D. Remarkably, the highest prediction performance was achieved by simultaneously focusing on islet α- and δ-cells and neuronal axons. Subtle alterations in the pancreatic tissue of T2D donors such as smaller islets, larger adipocyte clusters, altered islet-adipocyte proximity, and fibrotic patterns were also observed. Our innovative data-driven approach underpins key findings about pancreatic tissue alterations in T2D and provides novel targets for research.


📝  Citing this Work

If you use our work, please cite our paper:

@article{
}

⚙️  Installation

This repository requires Python 3.9 or later. All libraries needed to run the code are listed in requirements.txt, from which a new environment can be set up (Linux only):

pip install -r requirements.txt

Depending on your GPU, you need to install an appropriate version of PyTorch and torchvision separately. All scripts also run on CPU, but can take substantially longer depending on the experiment. Testing and development were done with the PyTorch build for CUDA 11.6.
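
Before running any experiment, it can help to confirm that the interpreter meets the version requirement and whether a CUDA-enabled PyTorch build is visible. The following is a minimal sketch and not part of the repository; the helper name `describe_environment` is hypothetical.

```python
import importlib.util
import sys


def describe_environment(min_python=(3, 9)):
    """Report whether the interpreter and PyTorch setup match the requirements above."""
    info = {
        "python_ok": sys.version_info >= min_python,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if info["torch_installed"]:
        # Imported lazily so the check also works before PyTorch is installed.
        import torch

        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as `False`, the scripts fall back to the slower CPU path described above.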


🗃  Project Structure

The code and its structure are based largely on the CLAM repository and include files in their original or modified form.

├── dataset_csv         # csv files for data loading
├── datasets            # implementation of WSI datasets
├── feature_extraction  # scripts for extracting features from the WSIs
├── models              # model architectures
├── preprocessing       # convert original images and extract patches
│   ├── chromogenic     # for chromogenic stainings
│   ├── dataset_splits  # scripts for splitting the data into train/test and cross-validation folds
│   └── fluo            # for fluorescence stainings
├── splits              # csv files with different training splits
├── utils               # logger and file exports
├── vis_utils           # visualization helper functions
├── wsi_core            # load and visualize WSIs
├── xai                 # XAI analysis scripts
│   ├── notebooks       # notebooks for heatmaps
│   ├── scripts         # overview figures and biomarker extraction
│   └── utils           # attribution and heatmap helper functions
├── .gitignore
├── README
└── main.py             # start model training

💾 Dataset

The dataset is available here. It contains all original chromogenic and fluorescence whole-slide images (WSIs).

If you cannot open CZI files due to missing software or RAM limitations, you can use the included script ./xai/scripts/czi_to_tiff.py, which converts the .czi files to 8x downsampled .tiff files for visualization purposes only.

♻️ Reproducing the Results

🚀 Model Training

💾 Data Preprocessing

To run the preprocessing on the original data, use the following commands. Note that some of the scripts require a lot of RAM (~300 GB). Please adapt the paths in the scripts to your system.

Chromogenic:

python preprocessing/chromogenic/create_patches_no_seg.py

Fluorescence:

python preprocessing/fluo/czi2mzarr.py
python preprocessing/fluo/fluo_clean_imgs.py
# run the notebook preprocessing/fluo/preprocess_fluo_annotation.ipynb
# run the notebook preprocessing/fluo/make_labels_fluo.ipynb
python preprocessing/fluo/fluo_extract_patch_ids.py
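
The fluorescence steps have to run in the order listed. A small driver can make that order explicit and stop at the first failure. This is a sketch under assumptions: the helper names are hypothetical, the paths inside the scripts have already been adapted, and the two notebooks are executed non-interactively with `jupyter nbconvert` (they can just as well be run by hand).

```python
import shlex
import subprocess
import sys

# Ordered fluorescence preprocessing steps, as listed above.
FLUO_STEPS = [
    "preprocessing/fluo/czi2mzarr.py",
    "preprocessing/fluo/fluo_clean_imgs.py",
    "preprocessing/fluo/preprocess_fluo_annotation.ipynb",
    "preprocessing/fluo/make_labels_fluo.ipynb",
    "preprocessing/fluo/fluo_extract_patch_ids.py",
]


def build_command(step, python=sys.executable):
    """Return the argv list for one step; notebooks are executed via nbconvert."""
    if step.endswith(".ipynb"):
        # Assumption: Jupyter is installed and the notebook needs no manual input.
        return ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", step]
    return [python, step]


def run_pipeline(steps=FLUO_STEPS, dry_run=True):
    """Run the steps in order; with dry_run=True the commands are only printed."""
    commands = [build_command(step) for step in steps]
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True raises CalledProcessError, so later steps are skipped on failure.
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Starting with `dry_run=True` is deliberate: given the memory requirements mentioned above, it is worth reviewing the commands before launching them.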

🚀 Feature Extraction

Once you have patch coordinates, you need to encode the respective patches into feature vectors. A GPU is required. Depending on the GPU, feature extraction can take several days for the whole dataset. Please adapt the paths in the scripts to your system.

For the chromogenic WSIs encoded with the ImageNet-21k pre-trained Vision Transformer, run:

python feature_extraction/extract_features_fp_timm.py --model beitv2_large_patch16_224_in22k 

For the chromogenic WSIs encoded with the Phikon pre-trained Vision Transformer, run:

python feature_extraction/extract_features_fp_phikon.py  

For the fluorescence WSIs (RGB representation) encoded with the ImageNet-21k pre-trained Vision Transformer, run:

python feature_extraction/extract_features_fluo_rgb.py 

For the fluorescence WSIs (channel-wise representation) encoded with the ImageNet-21k pre-trained Vision Transformer, run:

python feature_extraction/extract_features_fluo.py 

To obtain the channel-wise average representation, first run the channel-wise feature extraction and then run the following command:

python feature_extraction/average_channel_wise_features.py 
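
Conceptually, a channel-wise average representation collapses the per-channel feature vectors of a patch into a single vector by taking their element-wise mean. The sketch below illustrates this in plain Python. It assumes one equally sized feature vector per fluorescence channel; the function name and data layout are illustrative and do not reflect how average_channel_wise_features.py loads or stores features.

```python
from statistics import fmean


def average_channel_features(channel_features):
    """Element-wise mean over the per-channel feature vectors of one patch.

    channel_features: list of equally long lists, one list per channel.
    """
    if not channel_features:
        raise ValueError("expected at least one channel")
    lengths = {len(vec) for vec in channel_features}
    if len(lengths) != 1:
        raise ValueError("all channel feature vectors must have the same length")
    # zip(*...) groups the i-th entry of every channel together.
    return [fmean(values) for values in zip(*channel_features)]


# Toy example: 3 channels with 4-dimensional features.
patch = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 2.0, 1.0, 0.0],
    [2.0, 2.0, 2.0, 2.0],
]
print(average_channel_features(patch))  # [2.0, 2.0, 2.0, 2.0]
```

The averaged vector has the same dimensionality as a single channel's features, so the input size for the downstream model does not grow with the number of channels.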

🚀 Start Training

Please adapt the data_root_dir path to the respective feature directories from the dataset!

To train the CLAM model on the ImageNet-21k pre-trained features of the chromogenic WSIs for the individual stainings, run the following commands:

# PECAM 1
python main.py --drop_out --lr 1e-4 --k 15 --label_frac 1.0 --exp_code DIADEM_15fold --opt sgd --task DIADEM_15fold_cd31 --model_type clam_mb --log_data --data_root_dir <path-to-chromogenic-imagenet-features> --max_epochs 100 --traindata_ratio .05 --reg 0.00001 --in_memory --accumulation_steps 1 --vram --weighted_sample --weight_tuebingen
# Glucagon
python main.py --drop_out --lr 1e-3 --k 15 --label_frac 1.0 --exp_code DIADEM_15fold --opt sgd --task DIADEM_15fold_glucagon --model_type clam_mb --log_data --data_root_dir <path-to-chromogenic-imagenet-features> --max_epochs 100 --traindata_ratio .05 --reg 0.00001 --in_memory --accumulation_steps 1 --vram --weighted_sample --weight_tuebingen
# Tubulin beta 1
python main.py --drop_out --lr 1e-4 --k 15 --label_frac 1.0 --exp_code DIADEM_15fold --opt madgrad --task DIADEM_15fold_tubulin --model_type clam_mb --log_data --data_root_dir <path-to-chromogenic-imagenet-features> --max_epochs 300 --traindata_ratio .05 --reg 0. --in_memory --accumulation_steps 1 --vram --weighted_sample --weight_tuebingen
# Perilipin 1
python main.py --drop_out --lr 1e-4 --k 15 --label_frac 1.0 --exp_code DIADEM_15fold --opt sgd --task DIADEM_15fold_perilipin --model_type clam_mb --log_data --data_root_dir <path-to-chromogenic-imagenet-features> --max_epochs 100 --traindata_ratio .05 --reg 0.00001 --in_memory --accumulation_steps 1 --vram --weighted_sample --weight_tuebingen
# Somatostatin
python main.py --drop_out --lr 1e-4 --k 15 --label_frac 1.0 --exp_code DIADEM_15fold --opt sgd --task DIADEM_15fold_sst --model_type clam_mb --log_data --data_root_dir <path-to-chromogenic-imagenet-features> --max_epochs 100 --traindata_ratio .05 --reg 0.00001 --in_memory --accumulation_steps 1 --vram --weighted_sample --weight_tuebingen
# Insulin
python main.py --drop_out --lr 1e-4 --k 15 --label_frac 1.0 --exp_code DIADEM_15fold --opt madgrad --task DIADEM_15fold_insulin --model_type clam_mb --log_data --data_root_dir <path-to-chromogenic-imagenet-features> --max_epochs 300 --traindata_ratio .05 --reg 0. --in_memory --accumulation_steps 1 --vram --weighted_sample --weight_tuebingen
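
The six commands above share most flags and differ only in learning rate, optimizer, number of epochs, and regularization. The sketch below keeps those per-staining settings in one table and assembles the command from it. The table values are copied from the commands above; the names `CHROMOGENIC_IMAGENET_RUNS` and `training_argv` are hypothetical and not part of the repository.

```python
import shlex

# Per-staining settings copied from the commands above:
# (task suffix, learning rate, optimizer, max epochs, regularization).
CHROMOGENIC_IMAGENET_RUNS = {
    "PECAM 1": ("cd31", "1e-4", "sgd", 100, "0.00001"),
    "Glucagon": ("glucagon", "1e-3", "sgd", 100, "0.00001"),
    "Tubulin beta 1": ("tubulin", "1e-4", "madgrad", 300, "0."),
    "Perilipin 1": ("perilipin", "1e-4", "sgd", 100, "0.00001"),
    "Somatostatin": ("sst", "1e-4", "sgd", 100, "0.00001"),
    "Insulin": ("insulin", "1e-4", "madgrad", 300, "0."),
}


def training_argv(staining, data_root_dir):
    """Assemble the main.py argument list for one chromogenic staining."""
    task, lr, opt, epochs, reg = CHROMOGENIC_IMAGENET_RUNS[staining]
    return [
        "python", "main.py", "--drop_out", "--lr", lr, "--k", "15",
        "--label_frac", "1.0", "--exp_code", "DIADEM_15fold", "--opt", opt,
        "--task", f"DIADEM_15fold_{task}", "--model_type", "clam_mb", "--log_data",
        "--data_root_dir", data_root_dir, "--max_epochs", str(epochs),
        "--traindata_ratio", ".05", "--reg", reg, "--in_memory",
        "--accumulation_steps", "1", "--vram", "--weighted_sample", "--weight_tuebingen",
    ]


if __name__ == "__main__":
    # "/data/features" is a placeholder for your own feature directory.
    for staining in CHROMOGENIC_IMAGENET_RUNS:
        print(f"# {staining}")
        print(shlex.join(training_argv(staining, "/data/features")))
```

Passing the returned list to `subprocess.run(..., check=True)` would launch the run; printing it first makes it easy to compare against the commands listed above.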

To train the CLAM model on the Phikon pre-trained features of the chromogenic WSIs for the individual stainings, run the following commands:

# PECAM 1
python main.py --drop_out --lr 1e-4 --k 15 --label_frac 1.0 --exp_code DIADEM_15fold --opt madgrad --task DIADEM_15fold_cd31 --model_type clam_mb --log_data --data_root_dir <path-to-chromogenic-phikon-features> --max_epochs 300 --traindata_ratio .05 --reg 0. --in_memory --accumulation_steps 1 --vram --weighted_sample --weight_tuebingen --model_size phikon
# Glucagon
python main.py --drop_out --lr 1e-4 --k 15 --label_frac 1.0 --exp_code DIADEM_15fold --opt madgrad --task DIADEM_15fold_glucagon --model_type clam_mb --log_data --data_root_dir <path-to-chromogenic-phikon-features> --max_epochs 300 --traindata_ratio .05 --reg 0. --in_memory --accumulation_steps 1 --vram --weighted_sample --weight_tuebingen --model_size phikon
# Tubulin beta 1
python main.py --drop_out --lr 1e-4 --k 15 --label_frac 1.0 --exp_code DIADEM_15fold --opt madgrad --task DIADEM_15fold_tubulin --model_type clam_mb --log_data --data_root_dir <path-to-chromogenic-phikon-features> --max_epochs 300 --traindata_ratio .05 --reg 0. --in_memory --accumulation_steps 1 --vram --weighted_sample --weight_tuebingen --model_size phikon
# Perilipin 1
python main.py --drop_out --lr 1e-4 --k 15 --label_frac 1.0 --exp_code DIADEM_15fold --opt madgrad --task DIADEM_15fold_perilipin --model_type clam_mb --log_data --data_root_dir <path-to-chromogenic-phikon-features> --max_epochs 300 --traindata_ratio .05 --reg 0. --in_memory --accumulation_steps 1 --vram --weighted_sample --weight_tuebingen --model_size phikon
# Somatostatin
python main.py --drop_out --lr 1e-4 --k 15 --label_frac 1.0 --exp_code DIADEM_15fold --opt madgrad --task DIADEM_15fold_sst --model_type clam_mb --log_data --data_root_dir <path-to-chromogenic-phikon-features> --max_epochs 300 --traindata_ratio .05 --reg 0. --in_memory --accumulation_steps 1 --vram --weighted_sample --weight_tuebingen --model_size phikon
# Insulin
python main.py --drop_out --lr 1e-4 --k 15 --label_frac 1.0 --exp_code DIADEM_15fold --opt madgrad --task DIADEM_15fold_insulin --model_type clam_mb --log_data --data_root_dir <path-to-chromogenic-phikon-features> --max_epochs 300 --traindata_ratio .05 --reg 0. --in_memory --accumulation_steps 1 --vram --weighted_sample --weight_tuebingen --model_size phikon

To train the CLAM model on the ImageNet-21k pre-trained features of the fluorescence WSIs for the individual staining sets, run the following commands:

# Channel-wise average
## Staining set 1
python main.py --drop_out --lr 1e-5 --k 15 --label_frac 1.0 --exp_code FLUO_15fold --opt madgrad --task FLUO_15fold_M1 --model_type clam_mb --log_data --data_root_dir <path-to-channel-wise-avg-features> --max_epochs 300 --traindata_ratio .01 --reg 0. --in_memory --accumulation_steps 1 --vram --weighted_sample --weight_tuebingen
## Staining set 2
python main.py --drop_out --lr 1e-4 --k 15 --label_frac 1.0 --exp_code FLUO_15fold --opt sgd --task FLUO_15fold_M2 --model_type clam_mb --log_data --data_root_dir <path-to-channel-wise-avg-features> --max_epochs 500 --traindata_ratio .05 --reg 0.00001 --in_memory --accumulation_steps 1 --vram --weighted_sample --weight_tuebingen

# Channel-wise
## Staining set 1
python main.py --drop_out --lr 1e-5 --k 15 --label_frac 1.0 --exp_code FLUO_15fold --opt madgrad --task FLUO_15fold_M1 --model_type clam_mb --log_data --data_root_dir <path-to-channel-wise-features> --max_epochs 300 --traindata_ratio .05 --reg 0. --in_memory --accumulation_steps 1 --vram --weighted_sample --weight_tuebingen
## Staining set 2
python main.py --drop_out --lr 1e-4 --k 15 --label_frac 1.0 --exp_code FLUO_15fold --opt sgd --task FLUO_15fold_M2 --model_type clam_mb --log_data --data_root_dir <path-to-channel-wise-features> --max_epochs 100 --traindata_ratio .5 --reg 0.00001 --in_memory --accumulation_steps 1 --vram --weighted_sample --weight_tuebingen

# RGB
## Staining set 1
python main.py --drop_out --lr 1e-5 --k 15 --label_frac 1.0 --exp_code FLUO_15fold --opt madgrad --task FLUO_15fold_M1 --model_type clam_mb --log_data --data_root_dir <path-to-rgb-features> --max_epochs 300 --traindata_ratio .05 --reg 0. --in_memory --accumulation_steps 1 --vram --weighted_sample --weight_tuebingen
## Staining set 2
python main.py --drop_out --lr 1e-4 --k 15 --label_frac 1.0 --exp_code FLUO_15fold --opt sgd --task FLUO_15fold_M2 --model_type clam_mb --log_data --data_root_dir <path-to-rgb-features> --max_epochs 100 --traindata_ratio .5 --reg 0.00001 --in_memory --accumulation_steps 1 --vram --weighted_sample --weight_tuebingen
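
Because the fluorescence commands are long and nearly identical, the settings that actually differ between two runs are easy to overlook. The following sketch parses two command lines and reports only the differing flags. The helpers `parse_flags` and `diff_flags` are hypothetical utilities, not part of the repository; the two command strings are the RGB commands from above.

```python
import shlex


def parse_flags(command):
    """Turn 'python main.py --flag value --switch' into {'flag': 'value', 'switch': True}."""
    tokens = shlex.split(command)
    flags = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("--"):
            # A flag takes a value only if the next token is not itself a flag.
            has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("--")
            flags[token[2:]] = tokens[i + 1] if has_value else True
            i += 2 if has_value else 1
        else:
            i += 1
    return flags


def diff_flags(command_a, command_b):
    """Return {flag: (value_in_a, value_in_b)} for every flag that differs."""
    a, b = parse_flags(command_a), parse_flags(command_b)
    return {k: (a.get(k), b.get(k)) for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}


RGB_SET_1 = (
    "python main.py --drop_out --lr 1e-5 --k 15 --label_frac 1.0 --exp_code FLUO_15fold "
    "--opt madgrad --task FLUO_15fold_M1 --model_type clam_mb --log_data "
    "--data_root_dir <path-to-rgb-features> --max_epochs 300 --traindata_ratio .05 --reg 0. "
    "--in_memory --accumulation_steps 1 --vram --weighted_sample --weight_tuebingen"
)
RGB_SET_2 = (
    "python main.py --drop_out --lr 1e-4 --k 15 --label_frac 1.0 --exp_code FLUO_15fold "
    "--opt sgd --task FLUO_15fold_M2 --model_type clam_mb --log_data "
    "--data_root_dir <path-to-rgb-features> --max_epochs 100 --traindata_ratio .5 --reg 0.00001 "
    "--in_memory --accumulation_steps 1 --vram --weighted_sample --weight_tuebingen"
)

for flag, (first, second) in diff_flags(RGB_SET_1, RGB_SET_2).items():
    print(f"--{flag}: {first} vs {second}")
```

For the two RGB runs this reports differences in `lr`, `max_epochs`, `opt`, `reg`, `task`, and `traindata_ratio`; all other flags are identical.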

🔎 Explainable AI

Biomarker Extraction

The scripts for biomarker extraction are located in ./xai/scripts/. For standard biomarkers, use the biomarker_extraction_chromogenic.py script. You'll need to specify the paths to the whole slide images (WSIs), model checkpoints, segmentation masks, and .h5 files containing the coordinates.

If you also want to include the minimum distance between islets and adipocyte clusters as a biomarker, use biomarker_extraction_chromogenic_islets_fat_dist.py. However, note that this process may take longer to run.
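
To illustrate why a nearest-distance biomarker is more expensive than per-object measurements, the sketch below computes, for each islet, the distance to its closest adipocyte cluster by brute force. It is a simplified assumption for illustration only: objects are reduced to centroid coordinates in pixel units, whereas biomarker_extraction_chromogenic_islets_fat_dist.py may work on segmentation masks and use a different distance definition. The function name is hypothetical.

```python
import math


def min_islet_fat_distance(islet_centroids, fat_centroids):
    """For each islet, return the distance to the nearest adipocyte cluster.

    Brute force: every islet is compared against every cluster, so the cost
    grows with len(islet_centroids) * len(fat_centroids).
    """
    if not fat_centroids:
        # No adipocyte cluster on the slide: the distance is undefined, reported as infinity.
        return [math.inf] * len(islet_centroids)
    return [
        min(math.dist(islet, fat) for fat in fat_centroids)
        for islet in islet_centroids
    ]


islets = [(0.0, 0.0), (10.0, 0.0)]
fat_clusters = [(3.0, 4.0), (10.0, 2.0)]
print(min_islet_fat_distance(islets, fat_clusters))  # [5.0, 2.0]
```

On whole-slide images with many islets and clusters, a spatial index such as a k-d tree would usually replace the inner loop.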

For fluorescence biomarkers, use biomarker_extraction_fluo.py for standard biomarker extraction. To include specific intensity biomarkers, such as perilipin intensity or tubulin intensity within islets, run either biomarker_extraction_fluo_perilipin_scores.py or biomarker_extraction_fluo_tubulin_scores.py. As with the chromogenic data, be sure to specify the paths to the mzarr files, model checkpoints, segmentation masks, and .h5 files with the coordinates.

Overview Heatmap Figures

You can generate overview heatmap figures for global and local attention either through the respective notebooks in ./xai/notebooks/ or the scripts in ./xai/scripts/. Ensure you fill in the required paths in the scripts. Using the scripts will export overview heatmaps for all test set cases, while the notebooks allow you to select and export the heatmap for a specific patient, with the option to adjust hyperparameters. The notebook and script names match for convenience.
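
As a rough illustration of what an attention heatmap is, the sketch below places one attention value per patch on a coarse grid after converting raw scores to percentile ranks, a common way to make attention maps comparable across slides. It is a sketch under assumptions, not the repository's implementation: patch coordinates are taken to be top-left pixel positions on a regular grid, and the function names are hypothetical.

```python
from bisect import bisect_right


def to_percentiles(scores):
    """Map raw attention scores to percentile ranks in (0, 100]."""
    ordered = sorted(scores)
    n = len(scores)
    # bisect_right counts how many scores are less than or equal to s.
    return [100.0 * bisect_right(ordered, s) / n for s in scores]


def build_heatmap(coords, scores, patch_size):
    """Return a row-major grid with one cell per patch; empty cells stay None."""
    cols = max(x for x, _ in coords) // patch_size + 1
    rows = max(y for _, y in coords) // patch_size + 1
    grid = [[None] * cols for _ in range(rows)]
    for (x, y), value in zip(coords, to_percentiles(scores)):
        grid[y // patch_size][x // patch_size] = value
    return grid


# Toy example: three 224-pixel patches with raw attention scores.
coords = [(0, 0), (224, 0), (0, 224)]
scores = [0.1, 0.7, 0.2]
for row in build_heatmap(coords, scores, patch_size=224):
    print(row)
```

In the toy example the patch with the highest raw score ends up at 100, and the grid cell without a patch remains `None`, which corresponds to background in an overlay.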

For chromogenic data, we provide two specific scripts in ./xai/scripts/ that compute heatmaps for models predicting either the type 2 diabetes status or the HOMA2B value. The script for HOMA2B models is heatmaps_homa2b_chromogenic.py.

For fluorescence data, there are scripts for each representation: RGB, Channel-Wise, and Channel-Wise Average (heatmaps_t2d_fluo_*.py). Please note that Channel-Wise heatmaps can become large and computationally expensive to produce, because a separate heatmap is generated for each channel.

Statistical Analysis

Exploratory and statistical analyses for both fluorescence and chromogenic data can be performed with the respective notebooks: statistical_analysis_chromogenic.ipynb and statistical_analysis_fluo.ipynb. Both notebooks are structured into "Data Loading", "Explorative Analysis", and "Statistical Analysis" sections. Be sure to download the metadata file for the dataset before running these analyses.

📣  Acknowledgements

The code was developed by the authors of the paper. It also contains pieces of code from the following packages:



                   

DIADEM is developed and maintained by the Interactive Machine Learning Group and the Applied Computer Vision Lab of Helmholtz Imaging and the DKFZ, as well as the University Hospital Tübingen and the German Center for Diabetes Research.
