We present the Transformers for Actions, Phases, Steps, and Instrument Segmentation (TAPIS) model, a generalized architecture designed to tackle all the tasks proposed in the GraSP benchmark. Our method uses a localized instrument segmentation baseline, applied on independent keyframes, that acts as a region proposal network and provides pixel-precise instrument masks and their corresponding segment embeddings. In addition, our model uses a global video feature extractor on time windows centered on a keyframe to compute a class embedding and a sequence of spatio-temporal embeddings. A frame classification head uses the class embedding to classify the middle frame of the time window into a phase or a step, while a region classification head interrelates the global spatio-temporal features with the localized region embeddings for atomic action prediction or instrument region classification. The following subsections explain the details of our proposed architecture.
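As a rough illustration of the flow described above, the sketch below wires the two heads together in PyTorch. Module names, the embedding dimension, and the cross-attention used to interrelate region and global features are illustrative assumptions, not the actual TAPIS implementation.

```python
# Minimal sketch of the TAPIS heads (illustrative assumptions, not the real code).
import torch.nn as nn

class TAPISSketch(nn.Module):
    def __init__(self, num_frame_classes, num_region_classes, dim=768):
        super().__init__()
        self.frame_head = nn.Linear(dim, num_frame_classes)          # phases / steps
        self.region_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.region_head = nn.Linear(dim, num_region_classes)        # instruments / actions

    def forward(self, class_emb, spatiotemporal_emb, region_embs):
        # class_emb:          (B, dim)        global class embedding of the time window
        # spatiotemporal_emb: (B, T*H*W, dim) global spatio-temporal tokens
        # region_embs:        (B, R, dim)     segment embeddings from the keyframe proposals
        frame_logits = self.frame_head(class_emb)
        # Localized region embeddings attend to the global spatio-temporal features
        # before region classification.
        attended, _ = self.region_attn(region_embs, spatiotemporal_emb, spatiotemporal_emb)
        region_logits = self.region_head(attended)
        return frame_logits, region_logits
```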
This work is an extended and consolidated version of three previous works:
- Towards Holistic Surgical Scene Understanding, MICCAI 2022, Oral. Code here.
- Winning solution of the 2022 SAR-RARP50 challenge
- MATIS: Masked-Attention Transformers for Surgical Instrument Segmentation, ISBI 2023, Oral. Code here.
Please follow these steps to run TAPIS:
$ conda create --name tapis python=3.8 -y
$ conda activate tapis
$ conda install pytorch==2.4.1 torchvision==0.19.1 pytorch-cuda=12.4 -c pytorch -c nvidia
# (for older cuda versions)
# conda install pytorch==1.9.0 torchvision==0.10.0 cudatoolkit=11.1 -c pytorch -c nvidia
$ git clone https://github.com/BCV-Uniandes/GraSP
$ cd GraSP/TAPIS
$ pip install -r requirements.txt
$ pip install 'git+https://github.com/facebookresearch/fvcore'
$ pip install 'git+https://github.com/facebookresearch/fairscale'
$ python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
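As an optional sanity check (not part of the official setup), you can verify that the main dependencies import correctly and that a GPU is visible:

```python
# Optional post-install check: core packages import and CUDA is available.
import torch
import torchvision
import detectron2

print("torch:", torch.__version__, "| torchvision:", torchvision.__version__)
print("detectron2:", detectron2.__version__)
print("CUDA available:", torch.cuda.is_available())
```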
In this Google Drive link, you will find a compressed archive with our preprocessed data files, region proposals, and pretrained models, along with a README describing the data structures and files. Download the archive and uncompress it with the following command:
$ tar -xzvf TAPIS.tar.gz
Then, place the extracted files in a directory named GraSP inside this repository's data directory. Please also include the video frames in a directory named "frames", and place the original annotations in the "annotations" directory next to the region predictions. In the end, the repository must have the following structure:
TAPIS
|
|__configs
| ...
|__data
| |__GraSP
| |__annotations
| | |__fold1_train_preds.json
| | |__fold1_val_preds.json
| | |__fold2_train_preds.json
| | |__fold2_val_preds.json
| | |__train_train_preds.json
| | |__test_val_preds.json
| | |__grasp_long-term_fold1.json
| | |__grasp_long-term_fold2.json
| | |__grasp_long-term_train.json
| | |__grasp_long-term_test.json
| | |__grasp_short-term_fold1.json
| | |__grasp_short-term_fold2.json
| | |__grasp_short-term_train.json
| | |__grasp_short-term_test.json
| |
| |__features
| | |__fold1_train_region_features.pth
| | |__fold1_val_region_features.pth
| | |__fold2_train_region_features.pth
| | |__fold2_val_region_features.pth
| | |__train_train_region_features.pth
| | |__test_val_region_features.pth
| |
| |__frame_lists
| | |__fold1.csv
| | |__fold2.csv
| | |__train.csv
| | |__test.csv
| |
| |__frames
| | |__CASE001
| | | |__000000000.jpg
| | | |__000000002.jpg
| | | ...
| | |__CASE002
| | | ...
| | ...
| |
| |__pretrained_models
| | |__fold1
| | | |__ACTIONS.pyth
| | | |__LONG.pyth
| | | |__PHASES.pyth
| | | |__STEPS.pyth
| | | |__INSTRUMENTS.pyth
| | | |__SEGMENTATION_BASELINE
| | | | |__r50.pth
| | | | |__swinl.pth
| | |__fold2
| | | ...
| | |__train
| | | |__ACTIONS.pyth
| | | |__LONG.pyth
| | | |__INSTRUMENTS.pyth
| | | |__SEGMENTATION_BASELINE
| | | | |__swinl.pth
|
|__region_proposals
|__run_files
|__tapis
|__tools
Feel free to use soft/hard linking to other paths or to modify the directory structure, names, or locations of the files. However, you may also have to alter the .yaml config files or the bash running scripts.
Task | cross-val mAP | test mAP | config | run file | model path |
---|---|---|---|---|---|
Phases | 71.36 | 76.72 | PHASES | phases | TAPIS/pretrained_models/PHASES |
Steps | 50.74 | 52.01 | STEPS | steps | TAPIS/pretrained_models/STEPS |
Instruments | 90.28 | 89.09 | INSTRUMENTS | instruments | TAPIS/pretrained_models/INSTRUMENTS |
Actions | 35.46 | 39.50 | ACTIONS | actions | TAPIS/pretrained_models/ACTIONS |
We provide bash scripts with default parameters to evaluate each GraSP task. Please first download our preprocessed data files and pretrained models as instructed earlier, then run the command corresponding to the desired task:
# Run the script corresponding to the desired task to evaluate
$ sh run_files/grasp_<actions/instruments/phases/steps/long-term/short-term_rpn>.sh
You can easily modify the bash scripts to train our models. Just set `TRAIN.ENABLE True` in the desired script to enable training, and set `TEST.ENABLE False` to avoid testing before training. You might also want to modify `TRAIN.CHECKPOINT_FILE_PATH` to point to the model weights you want to use as initialization. You can modify the config files or the bash scripts to alter the architecture design, training schedule, video input design, etc. We provide documentation for each hyperparameter in the defaults script.
Although our code is configured to evaluate the model's performance after each epoch, you can also evaluate your model's predictions directly with our evaluation code. For this purpose, run the evaluate script and provide the required paths in the arguments, as documented in the script. You can run this script on the output files of the detectron2 library using the `--filter` argument, or you can provide your predictions in the following format:
[
{"<frame/name>":
{
# For long-term tasks
"<phase/step>_score_dist": [class_1_score, ..., class_N_score],
# For short-term tasks
"instances":
[
{
"bbox": [x_min, y_min, x_max, y_max],
"<instruments/actions>_score_dist": [class_1_score, ..., class_N_score],
# For instrument segmentation
"segment" <Segmentation in RLE format>
}
]
}
},
...
]
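For reference, here is a minimal sketch of how a predictions file in this format could be assembled in Python. The frame key, class counts, scores, and mask are placeholders, the score-distribution key names follow the templates above (adjust them to your task), and pycocotools is only needed if you include RLE segments.

```python
# Sketch: assembling a predictions file in the expected format (placeholder values).
import json
import numpy as np
from pycocotools import mask as mask_utils

# Dummy binary instrument mask encoded as RLE (replace with real model output).
binary_mask = np.zeros((480, 854), dtype=np.uint8)
binary_mask[100:200, 300:500] = 1
rle = mask_utils.encode(np.asfortranarray(binary_mask))
rle["counts"] = rle["counts"].decode("utf-8")  # make the RLE JSON-serializable

predictions = [
    {
        "CASE001/000000000.jpg": {                      # placeholder frame name
            "phase_score_dist": [0.05, 0.90, 0.05],     # one score per class
            "instances": [
                {
                    "bbox": [300.0, 100.0, 500.0, 200.0],        # x_min, y_min, x_max, y_max
                    "instruments_score_dist": [0.1, 0.8, 0.1],   # one score per class
                    "segment": rle,                              # only for segmentation
                }
            ],
        }
    }
]

with open("my_predictions.json", "w") as f:
    json.dump(predictions, f)
```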
You can run the `evaluate.py` script as follows:
$ python evaluate.py --coco_anns_path /path/to/coco/annotations/json \
--pred-path /path/to/predictions/json or pth \
--output_path /path/to/output/directory \
--tasks <instruments/actions/phases/steps> \
--metrics <mAP/mAP@0.5IoU_box/mAP@0.5IoU_segm/mIoU/mAP_pres> \
(optional) --masks-path /path/to/segmentation/masks \
# Optional for detectron2 outputs
--filter \
--selection <topk/thresh/cls_thresh/...> \
--selection_info <filtering info>
Our instrument segmentation baseline is wholly based on Mask2Former, so we recommend checking their repo for details on their implementation.
To run our baseline, first go to the region proposal directory and install the corresponding dependencies. You must have already installed all the required dependencies of the main TAPIS code. The following is an example of how to install dependencies correctly.
$ conda activate tapis
$ cd ./region_proposals
$ pip install -r requirements.txt
$ cd mask2former/modeling/pixel_decoder/ops
$ sh make.sh
$ cd ../../../..
The original Mask2Former code does not accept segmentation annotations in RLE format; hence, to run our baseline, you must first transform our RLE masks into polygons with the rle_to_polygon.py script as follows:
$ python rle_to_polygon.py --data_path /path/to/GraSP/annotations
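Conceptually, the conversion decodes each RLE mask and extracts its polygon contours; the standalone sketch below illustrates the idea (it is not the actual rle_to_polygon.py implementation).

```python
# Sketch: converting one RLE segment into COCO-style polygons (illustrative only).
import cv2
from pycocotools import mask as mask_utils

def rle_to_polygons(rle):
    binary = mask_utils.decode(rle)  # (H, W) uint8 binary mask
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    polygons = []
    for contour in contours:
        if contour.size >= 6:  # a valid polygon needs at least 3 points
            polygons.append(contour.flatten().astype(float).tolist())
    return polygons
```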
Then, to run the training code, run the train_net.py script, indicating the path to a configuration file in the configs directory with the `--config-file` argument. You should also indicate the path to the GraSP dataset with the `DATASETS.DATA_PATH` option, the path to the pretrained weights with the `MODEL.WEIGHTS` option, and the desired output path with the `OUTPUT_DIR` option. Download the pretrained Mask2Former weights for instance segmentation on the COCO dataset from the Mask2Former repo. Use the following command to train our baseline:
$ python train_net.py --num-gpus <number of GPUs> \
--config-file configs/grasp/<config file name>.yaml \
DATASETS.DATA_PATH path/to/grasp/dataset \
MODEL.WEIGHTS path/to/pretrained/model/weights \
OUTPUT_DIR output/path
You can modify most hyperparameters by changing the values in the configuration files or using command options; please check the Detectron2 library and the original Mask2Former repo for further details on configuration files and options.
To run the evaluation code, use the `--eval-only` argument and the TAPIS model weights provided in the data link. Run the following command to evaluate our baseline:
$ python train_net.py --num-gpus <number of GPUs> --eval-only \
--config-file configs/grasp/<config file name>.yaml \
DATASETS.DATA_PATH path/to/grasp/dataset \
MODEL.WEIGHTS path/to/pretrained/model/weights \
OUTPUT_DIR output/path
Note: You can easily run our segmentation baseline on a custom dataset by modifying the `register_surgical_dataset` function in the train_net.py script to register the dataset in COCO JSON format. Once again, we recommend checking the Detectron2 library and the original Mask2Former repo for more details on registering your dataset.
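For a dataset that is already in COCO JSON format, the registration can be as simple as detectron2's built-in helper; the dataset names and paths below are placeholders, and the actual register_surgical_dataset function may organize things differently.

```python
# Sketch: registering a custom COCO-format dataset with detectron2 (placeholder paths).
from detectron2.data.datasets import register_coco_instances

register_coco_instances(
    "my_surgical_train", {},
    "path/to/annotations/train.json",  # COCO JSON annotations
    "path/to/frames",                  # image root directory
)
register_coco_instances(
    "my_surgical_val", {},
    "path/to/annotations/val.json",
    "path/to/frames",
)
```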
Our code can either calculate region features on the fly during training and validation or load precalculated (stored) region features. Our published results are based on stored region features, as calculating features on the fly significantly increases computational complexity and slows training down. Our code stores the region features corresponding to the predicted segments in the same results files in the output directory of the segmentation baseline. You can then use the match_annots_n_preds.py script to filter predictions, assign region features to ground-truth instances for training, and parse predictions into the files TAPIS needs. Use the code as follows:
$ python match_annots_n_preds.py
To calculate region features on the fly, we provide an example of how to configure our code in the `run_files/grasp_short-term_rpn.sh` file.
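If you want to inspect a stored feature file before plugging it into TAPIS, a quick check like the following works; the internal layout of the .pth files is not documented here, so treat the dictionary access as an assumption to verify.

```python
# Quick inspection of a precomputed region-feature file (layout is an assumption).
import torch

features = torch.load("data/GraSP/features/fold1_train_region_features.pth",
                      map_location="cpu")
print(type(features))
if isinstance(features, dict):
    key = next(iter(features))
    value = features[key]
    print("example key:", key)
    print("value type:", type(value), getattr(value, "shape", None))
```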
You can also run the segmentation baseline on the Endovis 2017 and Endovis 2018 datasets, as done in our previous MATIS paper. We recommend checking the paper and the MATIS repo.
To run our segmentation baseline on the Endovis 2017 and 2018 datasets, please download the preprocessed frames, instance annotations, and pretrained models from this link, as instructed in the MATIS repo. Then run the segmentation baseline as previously instructed, but use the provided configuration files for Endovis 2017 or Endovis 2018 and indicate the path to the downloaded data with the `DATASETS.DATA_PATH` option.
If you have any doubts, questions, issues, or comments, please email [email protected].
If you find GraSP or TAPIS useful for your research (or their previous versions, PSI-AVA, TAPIR, and MATIS), please include the following BibTeX citations in your papers.
@article{ayobi2024pixelwise,
title={Pixel-Wise Recognition for Holistic Surgical Scene Understanding},
author={Nicol{\'a}s Ayobi and Santiago Rodr{\'i}guez and Alejandra P{\'e}rez and Isabela Hern{\'a}ndez and Nicol{\'a}s Aparicio and Eug{\'e}nie Dessevres and Sebasti{\'a}n Peña and Jessica Santander and Juan Ignacio Caicedo and Nicol{\'a}s Fern{\'a}ndez and Pablo Arbel{\'a}ez},
year={2024},
url={https://arxiv.org/abs/2401.11174},
eprint={2401.11174},
journal={arXiv},
primaryClass={cs.CV}
}
@InProceedings{ayobi2023matis,
author={Nicol{\'a}s Ayobi and Alejandra P{\'e}rez-Rond{\'o}n and Santiago Rodr{\'i}guez and Pablo Arbel{\'a}ez},
booktitle={2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI)},
title={MATIS: Masked-Attention Transformers for Surgical Instrument Segmentation},
year={2023},
pages={1-5},
doi={10.1109/ISBI53787.2023.10230819}
}
@InProceedings{valderrama2020tapir,
author={Natalia Valderrama and Paola Ruiz and Isabela Hern{\'a}ndez and Nicol{\'a}s Ayobi and Mathilde Verlyck and Jessica Santander and Juan Caicedo and Nicol{\'a}s Fern{\'a}ndez and Pablo Arbel{\'a}ez},
title={Towards Holistic Surgical Scene Understanding},
booktitle={Medical Image Computing and Computer Assisted Intervention -- MICCAI 2022},
year={2022},
publisher={Springer Nature Switzerland},
address={Cham},
pages={442--452},
isbn={978-3-031-16449-1}
}