Endoscopy Text-Image Pairs Dataset

This repository contains code for building a dataset of endoscopy text-image pairs from endoscopy videos. The code covers filtering the videos, extracting keyframes and speech, and classifying the extracted frames.

Data Pipeline Overview

The dataset extraction code is located in the Data-Pipeline folder.

Data Processing Steps

Filtering Endoscopy Videos
- Preprocess and filter videos to focus on relevant endoscopy content.

Downloading YouTube Videos and Extracting Audio
- Library: PyTube
- PyTube Version: 12.0.0 (required for compatibility)
- Audio Extraction: Use moviepy to extract audio from the downloaded videos.
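
The download and audio-extraction step could look roughly like the sketch below. It is not the repository's actual script: the function names (`audio_path_for`, `download_video`, `extract_audio`), the choice of a progressive MP4 stream, and the `.wav` output format are all assumptions made for illustration. The `moviepy.editor` import path is the moviepy 1.x one; newer moviepy releases expose `VideoFileClip` from the top-level package instead.

```python
from pathlib import Path


def audio_path_for(video_path: str) -> Path:
    """Return a .wav path next to the video file (hypothetical naming scheme)."""
    return Path(video_path).with_suffix(".wav")


def download_video(url: str, out_dir: str) -> str:
    """Download the highest-resolution progressive MP4 stream with PyTube."""
    # Imported lazily so the helpers above work without PyTube installed.
    from pytube import YouTube

    stream = (
        YouTube(url)
        .streams.filter(progressive=True, file_extension="mp4")
        .order_by("resolution")
        .desc()
        .first()
    )
    if stream is None:
        raise RuntimeError(f"no progressive MP4 stream found for {url}")
    # download() returns the path of the saved file.
    return stream.download(output_path=out_dir)


def extract_audio(video_path: str) -> Path:
    """Write the video's audio track to a .wav file using moviepy."""
    # Assumption: moviepy 1.x import path.
    from moviepy.editor import VideoFileClip

    out_path = audio_path_for(video_path)
    with VideoFileClip(video_path) as clip:
        if clip.audio is None:
            raise RuntimeError(f"{video_path} has no audio track")
        clip.audio.write_audiofile(str(out_path))
    return out_path
```

A typical call would be `extract_audio(download_video(url, "videos"))`, where `url` and the `videos` directory are placeholders.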

Extracting Keyframes from Video

- Library: FFmpeg
- Hyperparameter (scene threshold): 0.01 (frames whose scene-change score exceeds this value are kept)

Command:

ffmpeg -i video_test.mp4 -vf "select='gt(scene,0.01)',showinfo,setpts=N/FRAME_RATE/TB" -q:v 2 -vsync vfr -f image2 /home/easgrad/baluhars/PIPELINE/VIDEOS/test_frames/frames_%03d.jpg 2> keyframes_output.txt

Extracting Timestamps:

grep showinfo keyframes_output.txt | grep -o 'pts_time:[0-9.]*' | grep -oE '[0-9.]+' > keyframes_timestamps.txt
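
The same timestamp extraction can be done in Python, which is convenient when the later steps are Python scripts. The sketch below is an illustration only: `parse_keyframe_timestamps` is a hypothetical helper, and the sample log lines are made up to resemble FFmpeg's `showinfo` output rather than copied from a real run.

```python
import re

# Matches the "pts_time:<seconds>" field that showinfo prints for each frame.
PTS_TIME = re.compile(r"pts_time:\s*([0-9]+(?:\.[0-9]+)?)")


def parse_keyframe_timestamps(log_text: str) -> list[float]:
    """Return the pts_time value of every showinfo line, in log order."""
    timestamps = []
    for line in log_text.splitlines():
        if "showinfo" not in line:
            continue  # skip progress lines and other FFmpeg output
        match = PTS_TIME.search(line)
        if match:
            timestamps.append(float(match.group(1)))
    return timestamps


# Made-up sample resembling the contents of keyframes_output.txt.
sample_log = (
    "[Parsed_showinfo_1 @ 0x55d] n:   0 pts:      0 pts_time:0       pos:  48\n"
    "frame=    1 fps=0.0 q=2.0 size=N/A time=00:00:00.00\n"
    "[Parsed_showinfo_1 @ 0x55d] n:   1 pts:  38400 pts_time:2.5     pos: 9000\n"
)
print(parse_keyframe_timestamps(sample_log))  # [0.0, 2.5]
```

For a real run, read the log with `open("keyframes_output.txt").read()` and pass the text to the function.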

Classifying Keyframes
- Models: CLIP, Endoscopy Classifier (see the Endoscopy-Classifier folder)
- Classify each extracted keyframe with these models.
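
The README does not say how CLIP is applied, so the following is only one plausible reading: zero-shot classification of a keyframe against text prompts, using the Hugging Face `transformers` CLIP classes. The checkpoint name, the prompts, the 0.5 threshold, and the helper names `pick_label` and `clip_zero_shot` are all assumptions. The repository's own Endoscopy Classifier is not shown here.

```python
def pick_label(probabilities, labels, threshold=0.5):
    """Return the most probable label, or None if it is below the threshold."""
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best] if probabilities[best] >= threshold else None


def clip_zero_shot(image_path, prompts):
    """Return one probability per prompt for a single image (zero-shot CLIP)."""
    # Heavy dependencies are imported lazily; the checkpoint is an assumption.
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    name = "openai/clip-vit-base-patch32"
    model = CLIPModel.from_pretrained(name)
    processor = CLIPProcessor.from_pretrained(name)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape: (1, len(prompts))
    return logits.softmax(dim=1)[0].tolist()


# Hypothetical prompts; the second value of each pair is the label to report.
prompts = ["an endoscopy image", "a slide, text, or a person talking"]
labels = ["endoscopy", "other"]
print(pick_label([0.9, 0.1], labels))  # endoscopy
```

With the models installed, `pick_label(clip_zero_shot("frames_001.jpg", prompts), labels)` would label one keyframe; the file name is a placeholder.
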
Chunking Keyframes
- Algorithm: Chunking algorithm
- Hyperparameter (pair_chunk_time): 10 seconds
- Group keyframes into chunks to create meaningful image-text pairs.
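
The chunking algorithm itself is not described here, so the sketch below shows one simple interpretation rather than the repository's implementation: keyframes are grouped so that every keyframe in a chunk lies less than `pair_chunk_time` seconds after the chunk's first keyframe. The function name `chunk_keyframes` is hypothetical.

```python
def chunk_keyframes(timestamps, pair_chunk_time=10.0):
    """Group keyframe timestamps (seconds) into chunks.

    Assumed rule: a keyframe joins the current chunk if it is less than
    pair_chunk_time seconds after that chunk's first keyframe; otherwise
    it starts a new chunk.
    """
    chunks = []
    for t in sorted(timestamps):
        if chunks and t - chunks[-1][0] < pair_chunk_time:
            chunks[-1].append(t)
        else:
            chunks.append([t])
    return chunks


print(chunk_keyframes([0.0, 2.5, 9.9, 10.0, 14.2, 31.0]))
# [[0.0, 2.5, 9.9], [10.0, 14.2], [31.0]]
```

Anchoring each chunk to its first keyframe keeps every chunk no longer than `pair_chunk_time`, which bounds the length of the audio paired with it.
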
Extracting Relevant Audio Chunks and Applying ASR
- Model: Whisper large-v3
- Extract the audio segment corresponding to each keyframe chunk and transcribe it with Automatic Speech Recognition (ASR).
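
A rough sketch of this step follows. It assumes that the audio window for a chunk starts at the chunk's first keyframe and lasts `pair_chunk_time` seconds, that FFmpeg does the cutting, and that Whisper is loaded through the Hugging Face `transformers` pipeline with the `openai/whisper-large-v3` checkpoint. None of these choices is confirmed by the repository, and all function names are hypothetical.

```python
import subprocess


def audio_window(chunk, pair_chunk_time=10.0):
    """Return (start, end) in seconds for the audio paired with a chunk."""
    start = min(chunk)
    return start, start + pair_chunk_time


def ffmpeg_cut_command(audio_in, audio_out, start, end):
    """Build an FFmpeg command that cuts [start, end] as 16 kHz mono audio."""
    return [
        "ffmpeg", "-y", "-i", audio_in,
        "-ss", f"{start:.3f}", "-to", f"{end:.3f}",
        "-ar", "16000", "-ac", "1",
        audio_out,
    ]


def cut_audio(audio_in, audio_out, start, end):
    """Run the cut; raises CalledProcessError if FFmpeg fails."""
    subprocess.run(ffmpeg_cut_command(audio_in, audio_out, start, end), check=True)


def transcribe(audio_path):
    """Transcribe one short clip with Whisper (assumed checkpoint name)."""
    # Imported lazily; in practice, build the pipeline once and reuse it.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")
    return asr(audio_path)["text"]


start, end = audio_window([10.0, 14.2])
print(ffmpeg_cut_command("video_test.wav", "chunk_001.wav", start, end))
```

The file names `video_test.wav` and `chunk_001.wav` are placeholders.
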
Text Correction
- Libraries: spaCy, ChatGPT 4.0 API
- Correct the ASR transcripts using spaCy and ChatGPT.
Combining Text & Image
- Create CSV files by combining the processed text and corresponding images.
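
As an illustration of the final step, the sketch below writes image-text pairs to CSV with Python's standard `csv` module. The column names (`image_path`, `start_time`, `text`), the helper `write_pairs`, and the sample row are assumptions, not the repository's actual schema or data.

```python
import csv
import io

# Hypothetical column layout for one image-text pair per row.
FIELDNAMES = ["image_path", "start_time", "text"]


def write_pairs(rows, fileobj):
    """Write a header plus one CSV row per image-text pair."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Made-up example row; text containing commas is quoted automatically.
pairs = [
    {
        "image_path": "frames_001.jpg",
        "start_time": 0.0,
        "text": "example transcript, corrected",
    }
]

buffer = io.StringIO()
write_pairs(pairs, buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, open the target file with `open(path, "w", newline="")` and pass it as `fileobj`.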

Endoscopy-Classifier

The Endoscopy-Classifier folder contains the model training code and the saved weights used for classifying keyframes in the dataset. This includes the architecture, training scripts, and pre-trained models specifically tailored for endoscopy video classification.

Requirements

- Python
- PyTorch
- PyTube 12.0.0
- moviepy
- FFmpeg
- CLIP Model
- Whisper large-v3
- spaCy
- ChatGPT 4.0 API

Ensure all dependencies are installed before running the code.

License

This project is licensed under the MIT License.

Acknowledgements

Special thanks to the contributors and the open-source community for the tools and libraries used in this project.

BaluHarshavardan99/Endoscopy_Dataset

Folders and files

Latest commit

History

Repository files navigation

Endoscopy Text-Image Pairs Dataset

Data Pipeline Overview

Data Processing Steps

Endoscopy-Classifier

Requirements

License

Acknowledgements

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages