An implementation of the YOLO algorithm trained to spot tumors in DICOM images. The model is trained on the "Crowds Cure Cancer" dataset, which only contains images that DO have tumors; this model will always predict a bounding box for a tumor (even if one is not present). NOTE: Our final model didn't quite work; we hypothesize that the quality of the data may be one of the culprits in our model's inability to learn how to detect the tumors with any accuracy.
We recommend using a virtual environment to ensure your dependencies are up-to-date and freshly installed. From there, you can use pip to install all the dependencies in the deps.txt file (this can be done quickly with pip install -r deps.txt).
We have included the appropriate directory structure below. This will allow model.py to access the data necessary for training the model. In order to run the following commands, please set up the directory like so:
| YOLO/
|---| crowds-cure-cancer-2017/ ## <-- result of Kaggle download
|---|---| annotated_dicoms.zip
|---|---| compressed_stacks.zip
|---|---| CrowdsCureCancer2017Annotations.csv ## <---- @move this file to YOLO/YOLO-Cancer-Detection/label_data
|---| data/
|---|---| TCGA-09-0364
|---|---| ...
## put the data files into this 'data/' directory
|---|---| ...
|---|---| TCGA-OY-A56Q
|---| YOLO-Cancer-Detection/
|---|---| label_data/
|---|---|---| clean_data.py
|---|---|---| CCC_clean.py
|---|---|---| CrowdsCureCancer2017Annotations.csv ## <---- @Here
|---|---| model.py
|---|---| predict.py
|---|---| deps.txt
|---|---| README.md
|---|---| trained_model/ ## <-- THIS IS A DIRECTORY FOR SAVING YOUR MODEL. DO NOT FORGET THIS.
torch
pydicom
pandas
numpy
matplotlib
opencv-python
traceback
tkinter
skimage
The data must be downloaded directly from Kaggle, where you need to create a username and password (if you don't already have one) in order to download the dataset. Once you have downloaded and unzipped the dataset, you will have the raw images and CSV data. We clean the CSV data down to only the necessary information using the clean_data.py script in the label_data/ directory, which produces a new, clean CSV file which is used in the training and example usage usage of the model.
# 基础训练
python model.py
# 自定义参数训练
python model.py --epochs 10 --batch_size 8 --lr 0.001 --mirror
# 不保存模型,verbose输出
python model.py --no_save --verbose --epochs 5
# 自定义数据路径
python model.py --csv_path custom/labels.csv --image_path custom/images/ --weight_path output/model.pth
To train the model, one can simply run $ python model_original.py at the command line. With the adjustable parameters at the top of the file, you can change some aspects of how the model trains very easily. As the model trains, (assuming you have the Tensorboard variables enabled in model_original.py) it will periodically save logged events to the logs directory which can be viewed using Tensorboard. Once the model is trained it will be saved to the trained_model/ directory (note that this directory must exist prior to the model being saved there, as the write command will fail if the directory is not present).
To see the results of your saved model, simply run $ python predict.py. This is a simple script that loads up the image data, CSV data, and a trained model from the trained_model/ directory and allows you to visually compare the predicted and ground truth bounding boxes on each image in the dataset.
If you want to see your result, you can find it in predictions Folder or you can run python prediction_visualization.py (this script has supported the GUI and Jupyter Environment).
