🚀 MediVision.AI is a Generative AI-powered multimodal medical assistant that combines LLMs, speech processing, and vision-based analysis to deliver AI-driven medical insights. Built with Gradio and integrated with the Llama 3.3, ElevenLabs, and Groq APIs, this AI doctor supports speech-to-text (STT), text-to-speech (TTS), and medical query resolution.
🔗 Live Demo: MediVision.AI on Hugging Face
✅ Generative AI-Powered Medical Assistance: Uses LLMs for medical query resolution.
✅ Multimodal Interaction: Accepts images, text, and voice inputs.
✅ Speech Processing: Supports speech-to-text (STT) and text-to-speech (TTS) via ElevenLabs and gTTS (a minimal TTS sketch follows this list).
✅ Gradio UI: User-friendly web interface for seamless interactions.
✅ Cloud-Hosted: Deployed on Hugging Face Spaces for real-time access.
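The project lists both ElevenLabs and gTTS as TTS engines. Below is a minimal gTTS-only sketch of the TTS step (gTTS needs no API key); the function name is an assumption for illustration, not the project's actual `voice_of_the_doctor.py` code.

```python
# Minimal TTS sketch using gTTS only (no API key needed). Illustrative; the
# project's voice_of_the_doctor.py also uses ElevenLabs and may look different.
from gtts import gTTS

def speak(text: str, out_path: str = "doctor_reply.mp3") -> str:
    """Synthesize `text` to an MP3 file and return its path."""
    gTTS(text=text, lang="en").save(out_path)
    return out_path

if __name__ == "__main__":
    print(speak("Please keep the wound clean and consult a dermatologist if it worsens."))
```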
MediVision.AI follows a multi-agent AI architecture where different components handle specific tasks:
- 🧠 "Brain of the Doctor" (
brain_of_the_doctor.py
) – Calls Groq's multimodal LLM API to analyze queries and images. - 🗣 "Voice of the Doctor" (
voice_of_the_doctor.py
) – Converts AI-generated responses into speech using ElevenLabs & gTTS. - 🧑⚕️ "Voice of the Patient" (
voice_of_the_patient.py
) – Captures user voice input and transcribes it using speech_recognition & Groq API. - 📟 "Gradio Interface" (
gradio_app.py
) – Integrates all components into an interactive web-based UI.
This modular design ensures seamless API-driven AI interactions.
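A rough sketch of how `gradio_app.py` might wire these modules together; the imported helper names (`transcribe_audio`, `analyze_image_and_query`, `text_to_speech`) are assumptions for illustration, not the project's actual exports.

```python
# Rough wiring sketch for gradio_app.py. The imported helper names are
# assumptions; the real modules may expose different functions.
import gradio as gr

from voice_of_the_patient import transcribe_audio        # assumed helper (STT)
from brain_of_the_doctor import analyze_image_and_query  # assumed helper (LLM)
from voice_of_the_doctor import text_to_speech           # assumed helper (TTS)

def consult(audio_path, image_path):
    """Patient audio -> transcript -> multimodal LLM answer -> spoken reply."""
    query = transcribe_audio(audio_path)
    answer = analyze_image_and_query(query, image_path)
    reply_audio_path = text_to_speech(answer)
    return answer, reply_audio_path

demo = gr.Interface(
    fn=consult,
    inputs=[
        gr.Audio(sources=["microphone"], type="filepath", label="Describe your symptoms"),  # Gradio 4.x style
        gr.Image(type="filepath", label="Optional medical image"),
    ],
    outputs=[
        gr.Textbox(label="Doctor's response"),
        gr.Audio(type="filepath", label="Spoken response"),
    ],
    title="MediVision.AI",
)

if __name__ == "__main__":
    demo.launch()
```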
Ensure you have the following installed:
- Python 3.8+
- Pip & Virtualenv
- FFmpeg (for voice processing)
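Since pydub shells out to the FFmpeg binary, it must be discoverable on your PATH. An optional Python pre-flight check (not part of the project):

```python
# Optional pre-flight check: pydub shells out to the ffmpeg binary,
# so it must be discoverable on PATH.
import shutil

if shutil.which("ffmpeg") is None:
    raise SystemExit("FFmpeg not found. Install it and make sure it is on your PATH.")
print("FFmpeg found at:", shutil.which("ffmpeg"))
```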
# Clone the repository
git clone https://github.com/utkarshranaa/MediVision.AI.git
cd MediVision.AI
# Create and activate a virtual environment
python3 -m venv env
source env/bin/activate # On Windows, use `env\Scripts\activate`
# Install dependencies
pip install -r requirements.txt
# Run the application
python gradio_app.py
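The Groq and ElevenLabs clients authenticate with API keys, typically read from environment variables. A small pre-flight check; the variable names `GROQ_API_KEY` and `ELEVENLABS_API_KEY` are the SDK defaults, and it is an assumption that this project reads those exact names.

```python
# Pre-flight check for API keys (variable names are the SDK defaults and an
# assumption about what this project reads).
import os

for key in ("GROQ_API_KEY", "ELEVENLABS_API_KEY"):
    if not os.environ.get(key):
        print(f"Warning: {key} is not set; the corresponding API calls will fail.")
```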
- Upload images, input symptoms, and use voice commands via the web interface.
- The AI processes your input using the Llama 3.3 LLM API (via Groq) and speech models; a sketch of the multimodal call follows this list.
- Receive AI-driven responses, either as text or voice.
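As an illustration of the processing step, here is a hedged sketch of a multimodal chat call through Groq's OpenAI-compatible API. The model name is a placeholder and the request format is an assumption; the project's `brain_of_the_doctor.py` may differ.

```python
# Illustrative multimodal call via Groq's OpenAI-compatible chat API.
# The model name below is a placeholder; use a vision-capable model
# available on your Groq account. Not the project's exact code.
import base64
from groq import Groq  # pip install groq

client = Groq()  # reads GROQ_API_KEY from the environment

def ask_doctor(query: str, image_path: str,
               model: str = "llama-3.2-11b-vision-preview") -> str:
    """Send the patient's question plus an image to a vision-capable Groq model."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": query},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```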
MediVision.AI integrates cutting-edge AI APIs:
- Groq API: Multimodal LLM for text and image-based medical reasoning.
- ElevenLabs & gTTS: Advanced text-to-speech (TTS) engines.
- SpeechRecognition & pydub: Speech-to-text (STT) processing.
- Gradio: Interactive AI-powered web UI.
- Cloud Hosting: Hugging Face Spaces for real-time inference.
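To show how the STT pieces fit together, here is a rough sketch that records microphone audio with SpeechRecognition and transcribes it through Groq's Whisper endpoint. The helper name and model choice are assumptions, not the project's exact `voice_of_the_patient.py`.

```python
# Sketch of the speech-to-text path: record with SpeechRecognition, then
# transcribe through Groq's Whisper endpoint. Assumed helper name and model.
import speech_recognition as sr
from groq import Groq

def record_and_transcribe(model: str = "whisper-large-v3") -> str:
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:          # needs a working microphone + PyAudio
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)    # records until a pause is detected
    client = Groq()                          # reads GROQ_API_KEY from the environment
    transcription = client.audio.transcriptions.create(
        file=("patient.wav", audio.get_wav_data()),  # (filename, raw WAV bytes)
        model=model,
    )
    return transcription.text
```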
📌 Upcoming Enhancements:
- 🔹 Conversational Memory: AI remembers patient history.
- 🔹 Improved STT Models: Enhancing speech input accuracy.
- 🔹 Multilingual Support: Expanding to different languages.
- 🔹 Mobile App Version: Bringing AI diagnostics to mobile devices.
This project is licensed under the MIT License. See the LICENSE file for details.
For any inquiries or collaborations, reach out via GitHub or connect on LinkedIn!
🔗 Author: Utkarsh Ranaa
🔗 Project Repository: MediVision.AI on GitHub