FEATURE: Add Animated Agent Avatar with MuseTalk Lip-Sync #19
Description
Feature Description
Add a visual agent avatar to Nod.ie that displays an animated character whose mouth movements are synchronized with the AI's speech output using MuseTalk technology. The avatar will be displayed in a circular frame, maintaining Nod.ie's distinctive design.
Key Requirements
Avatar Display
- Show the agent image (Nod.ie character) in a circular frame
- Increase window size from 120x120 to 250x250
- Maintain transparent background and frameless design
- Keep circular aesthetic with avatar masked to circle
Lip-Sync Animation
- Integrate MuseTalk for real-time mouth animation
- Sync mouth movements with TTS audio output from Unmute
- Gracefully fall back to a static image if resources are unavailable
- Support 30+ fps animation when available
UI/UX Considerations
- Preserve circular design language
- Maintain drag-to-move functionality
- Add avatar toggle in existing Settings menu
- Smooth transitions between animated/static states
- Show audio activity ring around avatar edge
Technical Implementation Plan
Phase 1: MuseTalk Backend Service
Create Docker Container

```yaml
musetalk-service:
  image: nodie/musetalk-api:latest
  ports:
    - "8765:8765"
  environment:
    - MODEL_PATH=/models
    - FACE_SIZE=256
```
API Endpoints
- `POST /initialize` - Load model and prepare for streaming
- `POST /process` - Send audio chunk, receive video frame
- `GET /health` - Check service availability
- `POST /cleanup` - Release resources
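A minimal Python sketch of the handlers behind these endpoints. The route names come from the list above; the `MuseTalkService` class, payload shapes, and return values are illustrative assumptions, not the real service:

```python
class MuseTalkService:
    """Hypothetical in-process model of the proposed REST surface."""

    def __init__(self):
        self.model = None

    def initialize(self, model_path="/models", face_size=256):
        """POST /initialize - load the model and prepare for streaming."""
        # Placeholder for the real MuseTalk model load.
        self.model = {"path": model_path, "face_size": face_size}
        return {"status": "initialized"}

    def process(self, audio_chunk: bytes):
        """POST /process - convert an audio chunk into a video frame."""
        if self.model is None:
            return {"error": "not initialized"}
        # The real implementation would run MuseTalk inference here.
        return {"frame": b"", "samples": len(audio_chunk)}

    def health(self):
        """GET /health - report service availability."""
        return {"status": "ok", "initialized": self.model is not None}

    def cleanup(self):
        """POST /cleanup - release model resources."""
        self.model = None
        return {"status": "cleaned"}
```

In the real service each method would be wrapped as a FastAPI or Flask route, as described under Backend Architecture.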
Backend Architecture
- FastAPI or Flask for REST API
- WebSocket support for real-time streaming
- Queue system for frame buffering
- Automatic GPU detection and fallback
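The GPU detection and fallback above can be as simple as probing for a usable CUDA device and defaulting to CPU, a minimal sketch assuming the standard `torch.cuda.is_available()` check (torch itself stays optional):

```python
def pick_device() -> str:
    """Return 'cuda' when a GPU is usable, otherwise fall back to 'cpu'."""
    try:
        import torch  # optional dependency; absence triggers the fallback
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass  # no torch installed: CPU (or static-image) path
    return "cpu"
```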
Phase 2: Frontend Integration
Modify Electron Window (main.js)
- Update size: 120x120 → 250x250
- Keep circular transparent design
Update UI (index.html)
- Add avatar container with circular mask
- Layer structure:
  - Background: Avatar (static or animated)
  - Overlay: Audio activity ring
  - Foreground: State indicators
Renderer Updates (renderer.js)
- Add MuseTalk WebSocket client
- Stream audio to MuseTalk backend
- Receive and display video frames
- Handle fallback to static image
Phase 3: Audio-Video Synchronization
Audio Pipeline

```
Unmute TTS → Audio Buffer → MuseTalk API
                 ↓
          Audio Playback
```

Frame Synchronization
- Tag audio chunks with timestamps
- Buffer video frames with matching timestamps
- Synchronize playback using requestAnimationFrame
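The timestamp-matching step can be modeled in a few lines. This is a pure-Python illustration of logic that would actually live in renderer.js; the `FrameBuffer` name and millisecond timestamps are assumptions:

```python
import bisect

class FrameBuffer:
    """Buffer video frames keyed by audio-chunk timestamp (ms)."""

    def __init__(self):
        self._timestamps = []  # kept sorted for binary search
        self._frames = {}

    def push(self, timestamp_ms: int, frame) -> None:
        bisect.insort(self._timestamps, timestamp_ms)
        self._frames[timestamp_ms] = frame

    def frame_for(self, audio_clock_ms: int):
        """Return the buffered frame closest to the current audio clock."""
        if not self._timestamps:
            return None  # nothing buffered yet: show the static image
        i = bisect.bisect_left(self._timestamps, audio_clock_ms)
        candidates = self._timestamps[max(0, i - 1):i + 1]
        best = min(candidates, key=lambda t: abs(t - audio_clock_ms))
        return self._frames[best]
```

On each `requestAnimationFrame` tick the renderer would read the audio playback position and display `frame_for(position)`.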
Phase 4: Settings Integration
Add to existing Settings dialog:
- "Show Avatar" toggle (default: on)
- "Avatar Quality" dropdown (auto/high/low)
- "Use GPU Acceleration" checkbox
Store preferences using existing electron-store
Phase 5: Resource Management
Fallback Logic
- Check MuseTalk service health on startup
- Monitor frame processing latency
- Auto-disable if latency > 500ms
- Graceful degradation to static image
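The auto-disable rule above amounts to tracking recent per-frame latencies and falling back once a rolling average crosses the 500 ms threshold. A sketch (window size and averaging strategy are illustrative choices):

```python
from collections import deque

class LatencyMonitor:
    """Decide when to fall back to the static image based on latency."""

    def __init__(self, threshold_ms=500, window=10):
        self.threshold_ms = threshold_ms
        self.samples = deque(maxlen=window)  # keep only recent samples

    def record(self, latency_ms: float) -> None:
        self.samples.append(latency_ms)

    def should_disable(self) -> bool:
        """True once the rolling average exceeds the threshold."""
        if not self.samples:
            return False
        return sum(self.samples) / len(self.samples) > self.threshold_ms
```

Averaging over a window avoids disabling the avatar on a single slow frame.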
Performance Optimization
- Implement frame skipping for low-end systems
- Cache processed frames when possible
- Limit concurrent processing requests
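Frame skipping for low-end systems can be done by downsampling the service's frame rate to whatever the display sustains. A sketch, with the rates purely illustrative:

```python
def frames_to_keep(frame_count: int, source_fps: int, target_fps: int):
    """Indices of frames to display when downsampling source_fps to target_fps."""
    if target_fps >= source_fps:
        return list(range(frame_count))  # no skipping needed
    step = source_fps / target_fps  # e.g. 30 → 15 fps keeps every 2nd frame
    kept, next_keep = [], 0.0
    for i in range(frame_count):
        if i >= next_keep:
            kept.append(i)
            next_keep += step
    return kept
```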
Docker Container Solution
MuseTalk API Container
```dockerfile
# Note: python:3.9-cuda is not a published tag; an NVIDIA CUDA runtime
# base image is one workable substitute for GPU inference.
FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y python3 python3-pip git

# Install MuseTalk dependencies
RUN pip3 install torch torchvision torchaudio
RUN git clone https://github.com/TMElyralab/MuseTalk.git

# Install API framework
RUN pip3 install fastapi uvicorn websockets

# Copy API wrapper code
COPY musetalk_api.py /app/

# Download models (helper module name as proposed in this issue)
RUN python3 -m musetalk.download_models

EXPOSE 8765
CMD ["uvicorn", "musetalk_api:app", "--host", "0.0.0.0", "--port", "8765"]
```

API Wrapper (musetalk_api.py)
- Handles model initialization
- Processes audio → video frame conversion
- Manages GPU/CPU fallback
- Implements frame caching
- WebSocket streaming support
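The frame caching mentioned above can exploit the fact that identical audio chunks (silence, in particular) render to identical frames, so results can be memoized by a hash of the chunk. A hedged sketch; the `FrameCache` name, SHA-1 keying, and LRU size are assumptions:

```python
import hashlib
from collections import OrderedDict

class FrameCache:
    """Memoize rendered frames by audio-chunk hash, with LRU eviction."""

    def __init__(self, max_entries=256):
        self.max_entries = max_entries
        self._cache = OrderedDict()

    def get_or_render(self, audio_chunk: bytes, render):
        """Return a cached frame for this chunk, calling render() on a miss."""
        key = hashlib.sha1(audio_chunk).hexdigest()
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as recently used
            return self._cache[key]
        frame = render(audio_chunk)
        self._cache[key] = frame
        if len(self._cache) > self.max_entries:
            self._cache.popitem(last=False)  # evict least recently used
        return frame
```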
Benefits of Docker Approach
- Isolation - Python/ML dependencies contained
- Portability - Works across platforms
- Scalability - Can run on separate machine
- Consistency - Same as Unmute architecture
- Optional - Users can disable if not needed
Technical Challenges & Solutions
Challenge 1: MuseTalk requires Python/ML environment
- Solution: Docker container with REST/WebSocket API, similar to Unmute backend
Challenge 2: Maintaining circular design with video
- Solution: CSS clip-path or canvas masking to create circular video viewport
Challenge 3: Real-time performance
- Solution: Adaptive quality with automatic fallback to static image
Success Criteria
- Avatar displays in 250x250 circular window
- Lip movements sync with speech when resources available
- Automatic fallback to static image when needed
- Settings toggle for avatar on/off
- Maintains <200ms latency target
- Docker container easy to deploy
Future Enhancements
- Multiple avatar options in Settings
- Facial expressions beyond lip-sync
- Custom avatar upload support
- Avatar marketplace integration