This repository contains code for extracting and processing endoscopy text-image pairs, focusing on the filtering, extraction, and classification of data from endoscopy videos.
The dataset extraction code is located in the Data-Pipeline folder.
- Filtering Endoscopy Videos
  - Preprocess and filter videos to focus on relevant endoscopy content.
- Downloading YouTube Videos and Extracting Audio
  - Library: PyTube
  - PyTube Version: 12.0.0 (required for compatibility)
  - Audio Extraction: Use moviepy to extract audio from the downloaded videos.
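The download and audio-extraction step could look roughly like the sketch below. It is not the repository's script: the function names (`audio_path_for`, `download_video`, `extract_audio`) and the `.mp3` output format are assumptions made for illustration. The third-party imports are deferred into the functions so the module still loads when PyTube or moviepy is not installed.

```python
from pathlib import Path


def audio_path_for(video_path: str) -> str:
    """Return the .mp3 path that sits next to the given video file."""
    return str(Path(video_path).with_suffix(".mp3"))


def download_video(url: str, out_dir: str) -> str:
    """Download the highest-resolution progressive stream; return the file path."""
    from pytube import YouTube  # the README pins pytube==12.0.0

    stream = YouTube(url).streams.get_highest_resolution()
    return stream.download(output_path=out_dir)


def extract_audio(video_path: str) -> str:
    """Write the video's audio track to an .mp3 file and return its path."""
    try:
        from moviepy import VideoFileClip  # moviepy 2.x layout
    except ImportError:
        from moviepy.editor import VideoFileClip  # moviepy 1.x layout

    out_path = audio_path_for(video_path)  # hypothetical naming convention
    clip = VideoFileClip(video_path)
    try:
        if clip.audio is None:
            raise ValueError(f"{video_path} has no audio track")
        clip.audio.write_audiofile(out_path)
    finally:
        clip.close()
    return out_path
```

A typical call would be `extract_audio(download_video(url, "videos"))`, where `url` is a YouTube video URL and `videos` is any writable directory.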
- Extracting Keyframes from Video
  - Library: FFmpeg
  - Hyperparameter (scene threshold): 0.01
  - Command:
    ffmpeg -i video_test.mp4 -vf "select='gt(scene,0.01)',showinfo,setpts=N/FRAME_RATE/TB" -q:v 2 -vsync vfr -f image2 /home/easgrad/baluhars/PIPELINE/VIDEOS/test_frames/frames_%03d.jpg 2> keyframes_output.txt
  - Extracting Timestamps (the grep patterns are quoted so the shell does not expand them as globs):
    grep showinfo keyframes_output.txt | grep -o 'pts_time:[0-9.]*' | grep -o '[0-9.]*' > keyframes_timestamps.txt
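The same timestamp extraction can be done in Python when the values are needed as numbers rather than as a text file. The sketch below is an equivalent of the grep pipeline above, not code from the repository; `parse_keyframe_timestamps` is a hypothetical helper name, and the log lines in the usage note are illustrative, not real output.

```python
import re

# Matches the "pts_time:<seconds>" field that FFmpeg's showinfo filter logs.
PTS_TIME = re.compile(r"pts_time:([0-9]+(?:\.[0-9]+)?)")


def parse_keyframe_timestamps(log_text: str) -> list[float]:
    """Return the pts_time values (in seconds) from showinfo log lines."""
    timestamps = []
    for line in log_text.splitlines():
        if "showinfo" not in line:
            continue  # skip progress lines and other FFmpeg output
        match = PTS_TIME.search(line)
        if match:
            timestamps.append(float(match.group(1)))
    return timestamps
```

For example, reading the log written by the FFmpeg command above:

```python
with open("keyframes_output.txt", encoding="utf-8", errors="replace") as f:
    timestamps = parse_keyframe_timestamps(f.read())
```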
- Classifying Keyframes
  - Models: CLIP, Endoscopy Classifier
  - Classify the extracted keyframes with both models.
- Chunking Keyframes
  - Algorithm: Chunking algorithm
  - Hyperparameter (pair_chunk_time): 10 (seconds)
  - Apply chunking to keyframes to create meaningful image-text pairs.
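The README does not spell out the chunking rule, so the following is only one plausible reading: keyframes are grouped in time order, and a new chunk starts whenever a keyframe lies `pair_chunk_time` seconds or more after the first keyframe of the current chunk. The function name `chunk_keyframes` and this grouping rule are assumptions, not the repository's algorithm.

```python
def chunk_keyframes(timestamps, pair_chunk_time=10.0):
    """Group keyframe timestamps (seconds) into chunks spanning < pair_chunk_time.

    Assumed rule: a chunk is anchored at its first keyframe, and later
    keyframes join it while they fall within pair_chunk_time of that anchor.
    """
    chunks = []
    for t in sorted(timestamps):
        if chunks and t - chunks[-1][0] < pair_chunk_time:
            chunks[-1].append(t)  # still inside the current window
        else:
            chunks.append([t])  # start a new chunk anchored at t
    return chunks
```

With the default of 10 seconds, `chunk_keyframes([0.0, 3.5, 9.9, 10.0, 14.2, 31.0])` gives three chunks: `[0.0, 3.5, 9.9]`, `[10.0, 14.2]`, and `[31.0]`. Each chunk's first and last timestamps then define the time window whose audio is paired with its images.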
- Extracting Relevant Audio Chunks and Applying ASR
  - Model: Whisper-v3-large
  - Extract audio chunks corresponding to keyframes and apply Automatic Speech Recognition (ASR).
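Cutting the audio for one chunk can be sketched with FFmpeg, which the pipeline already depends on. The helper names, the `.wav` output, and the 16 kHz mono settings below are assumptions chosen because speech models commonly take that input; the repository may slice audio differently. The ASR call itself is not shown.

```python
import subprocess


def audio_chunk_command(audio_path, start, end, out_path):
    """Build an FFmpeg command that cuts [start, end) seconds from audio_path."""
    if end <= start:
        raise ValueError("end must be greater than start")
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",       # seek to the chunk start
        "-t", f"{end - start:.3f}",  # keep this many seconds
        "-i", audio_path,
        "-vn",                       # drop any video stream
        "-ac", "1",                  # mono (assumed ASR input format)
        "-ar", "16000",              # 16 kHz sample rate (assumed)
        out_path,
    ]


def extract_audio_chunk(audio_path, start, end, out_path):
    """Run the FFmpeg command and return the path of the written chunk."""
    subprocess.run(audio_chunk_command(audio_path, start, end, out_path), check=True)
    return out_path
```

Keeping command construction separate from execution makes the slicing logic easy to check without running FFmpeg.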
- Text Correction
  - Libraries: SpaCy, ChatGPT 4.0 API
  - Correct the extracted text using NLP tools and ChatGPT.
- Combining Text & Image
  - Create CSV files by combining the processed text and corresponding images.
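Writing the final pairs to CSV needs only the standard library. In the sketch below the column names (`image_path`, `start_time`, `end_time`, `text`) are hypothetical, since the README does not specify the CSV schema, and `write_pairs_csv` is an illustrative helper rather than the repository's code.

```python
import csv

# Hypothetical schema: one row per image-text pair.
FIELDNAMES = ["image_path", "start_time", "end_time", "text"]


def write_pairs_csv(pairs, fileobj):
    """Write an iterable of pair dicts to an open text file; return row count."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    count = 0
    for pair in pairs:
        writer.writerow(pair)  # raises ValueError on unexpected keys
        count += 1
    return count
```

When writing to disk, open the file with `newline=""` as the `csv` module recommends, for example `open("pairs.csv", "w", newline="", encoding="utf-8")`.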
The Endoscopy-Classifier folder contains the model training code and the saved weights used for classifying keyframes in the dataset. This includes the architecture, training scripts, and pre-trained models specifically tailored for endoscopy video classification.
Requirements:
- Python
- PyTorch
- PyTube 12.0.0
- moviepy
- FFmpeg
- CLIP Model
- Whisper-v3-large
- SpaCy
- ChatGPT 4.0 API
Ensure all dependencies are installed before running the code.
This project is licensed under the MIT License.
Special thanks to the contributors and the open-source community for the tools and libraries used in this project.