Hands‑free Multimodal Assistive Control
Visivox is an open‑source application that empowers users—especially those with mobility impairments—to control their computer hands‑free by combining webcam‑based gaze, facial‑gesture detection, and voice commands into precise OS actions.
Millions with mobility impairments (ALS, paralysis, RSI) struggle with traditional input devices. Voice‑only solutions lack precision for complex tasks. Visivox fills this gap with multimodal voice + gaze + facial gestures for versatile, real‑time control.
- Cursor Control via Gaze – Move your head to steer the pointer
- Single & Double Clicks – Blink once to click, twice quickly to double‑click
- Right‑Click – Raise eyebrow gesture
- Scrolling – Open mouth + head movement to scroll up/down/left/right
- Drag & Drop – Hold mouth‑open gesture to drag, close to release
- Voice Commands – “Open Notepad,” “Go to example.com,” “Type Hello,” “Press Ctrl+S”
- Customizable Settings – Adjust sensitivity, blink thresholds, scroll speed via GUI
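As a rough sketch of how the blink-to-click feature above can work: a face-mesh model (e.g. MediaPipe) yields per-eye landmarks, an eye-aspect-ratio (EAR) drop below a threshold marks a closed eye, and two blinks inside a short window become a double-click. The threshold and timing values here are illustrative assumptions, not Visivox's actual parameters, and the landmark ordering is hypothetical.

```python
import math

def eye_aspect_ratio(eye):
    """EAR from six (x, y) eye points ordered:
    outer corner, top1, top2, inner corner, bottom2, bottom1."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    vertical = dist(eye[1], eye[5]) + dist(eye[2], eye[4])
    horizontal = dist(eye[0], eye[3])
    return vertical / (2.0 * horizontal)

class BlinkClicker:
    """Turns a stream of EAR samples into click / double-click events.
    Threshold and double-click window are illustrative defaults."""
    def __init__(self, threshold=0.21, double_window=0.4):
        self.threshold = threshold
        self.double_window = double_window
        self.closed = False
        self.last_blink = None

    def update(self, ear, now):
        """Feed one EAR sample (with timestamp in seconds);
        returns 'click', 'double', or None."""
        if ear < self.threshold and not self.closed:
            self.closed = True          # eye just closed
            return None
        if ear >= self.threshold and self.closed:
            self.closed = False         # eye reopened: blink completed
            if self.last_blink is not None and now - self.last_blink <= self.double_window:
                self.last_blink = None
                return 'double'
            self.last_blink = now
            return 'click'
        return None
```

In a real pipeline the returned event would be mapped to `pyautogui.click()` or `pyautogui.doubleClick()`; keeping the detection logic pure like this makes it easy to unit-test without a webcam.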
Visivox now harnesses Julep AI's chat-context memory feature to keep track of our design decisions, model parameters, and debugging threads across multiple sessions.
- Seamless Context Retention: No more repeating prompts—Julep AI recalls prior exchanges end‑to‑end, cutting down redundant back‑and‑forth by over 80%.
- Faster Iteration: Prototyping new audio‑visual gestures went from days to hours, thanks to instant access to our last conversations and code snippets.
- Reduced Debugging Time: By preserving error histories and fix attempts, Julep AI has slashed troubleshooting efforts by roughly 50%, saving the team dozens of hours so far.
Overall, Julep AI’s memory feature has been a game‑changer—enabling uninterrupted workflows, accelerating feature roll‑out, and keeping our focus on innovation rather than repetition.
- Computer Vision: OpenCV, MediaPipe
- Speech Recognition: Google Cloud Speech-to-Text, speech_recognition
- Automation & Controls: PyAutoGUI, keyboard, Windows Accessibility APIs
- Language & Frameworks: Python 3.x, Tkinter
- LLM & Context: Julep API (GPT-4o)
- Configuration: python-dotenv, YAML
- Clone & Install
```shell
git clone https://github.com/pratik4505/visivox.git
cd visivox
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```
- Configure
  - Copy `.env.example` to `.env` and set your `JULEP_API_KEY`.
  - Place `voiceKey.json` (the service-account credentials for the Google Cloud Speech-to-Text API) in the project root.
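The configuration step can be sketched as a small startup check, assuming python-dotenv and the file names above (`.env` with `JULEP_API_KEY`, `voiceKey.json` in the project root); the function name and return shape are illustrative, not Visivox's actual code.

```python
import os
from pathlib import Path

def load_config(root="."):
    """Load JULEP_API_KEY from .env / the environment and locate the
    Google service-account file; fail early with a clear message."""
    try:
        from dotenv import load_dotenv
        load_dotenv(Path(root) / ".env")
    except ImportError:
        pass  # python-dotenv absent: fall back to plain environment variables
    key = os.environ.get("JULEP_API_KEY")
    if not key:
        raise RuntimeError("JULEP_API_KEY is not set; copy .env.example to .env")
    creds = Path(root) / "voiceKey.json"
    if not creds.exists():
        raise RuntimeError("voiceKey.json not found in the project root")
    return {"julep_api_key": key, "google_credentials": str(creds)}
```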
- Run
python main.py
- Operate
- Voice Tab: Say “arise” to activate, then speak commands.
- Mouse Tab: Calibrate and use facial gestures to control the cursor.
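The "arise" wake-word flow above can be sketched as a gate that ignores transcripts until the wake word is heard, then parses commands like those listed earlier ("Type Hello", "Press Ctrl+S", "Open Notepad"). The command table and return tuples are assumptions for illustration; Visivox's real dispatch may differ.

```python
WAKE_WORD = "arise"  # illustrative; matches the activation phrase above

class VoiceGate:
    """Ignore speech until the wake word, then map transcripts to commands."""
    def __init__(self):
        self.active = False

    def handle(self, transcript):
        """Return a (command, argument) tuple, or None if nothing to do."""
        words = transcript.lower().strip()
        if not self.active:
            if WAKE_WORD in words.split():
                self.active = True  # wake word heard: start accepting commands
            return None
        if words.startswith("type "):
            return ("type", transcript.strip()[5:])   # keep original casing
        if words.startswith("press "):
            return ("hotkey", words[6:])              # e.g. "ctrl+s"
        if words.startswith("open "):
            return ("launch", words[5:])
        return None
```

Each returned tuple would then be executed by the automation layer, e.g. `("hotkey", "ctrl+s")` via `pyautogui.hotkey("ctrl", "s")`.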
Team Name: CodeTasctic4
- Utsav Kasvala
- Pratik Nandan
- Vaibhav Kumar Maurya