
FEATURE: Add Animated Agent Avatar with MuseTalk Lip-Sync #19

@BenGWeeks

Feature Description

Add a visual agent avatar to Nod.ie that displays an animated character whose mouth movements are synchronized with the AI's speech output using MuseTalk, an open-source model for real-time, audio-driven lip-sync. The avatar will be displayed in a circular frame, preserving Nod.ie's distinctive design.

Key Requirements

  1. Avatar Display

    • Show the agent image (Nod.ie character) in a circular frame
    • Increase window size from 120x120 to 250x250
    • Maintain transparent background and frameless design
    • Keep circular aesthetic with avatar masked to circle
  2. Lip-Sync Animation

    • Integrate MuseTalk for real-time mouth animation
    • Sync mouth movements with TTS audio output from Unmute
    • Gracefully fall back to a static image if resources are unavailable
    • Support 30+ fps animation when available
  3. UI/UX Considerations

    • Preserve circular design language
    • Maintain drag-to-move functionality
    • Add avatar toggle in existing Settings menu
    • Smooth transitions between animated/static states
    • Show audio activity ring around avatar edge

Technical Implementation Plan

Phase 1: MuseTalk Backend Service

  1. Create Docker Container

    services:
      musetalk-service:
        image: nodie/musetalk-api:latest
        ports:
          - "8765:8765"
        environment:
          - MODEL_PATH=/models
          - FACE_SIZE=256
  2. API Endpoints

    • POST /initialize - Load model and prepare for streaming
    • POST /process - Send audio chunk, receive video frame
    • GET /health - Check service availability
    • POST /cleanup - Release resources
  3. Backend Architecture

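    • FastAPI (as used in the Dockerfile below) or Flask for the REST API; FastAPI serves WebSockets natively, while Flask needs an extension such as Flask-Sock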
    • WebSocket support for real-time streaming
    • Queue system for frame buffering
    • Automatic GPU detection with CPU fallback
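The endpoint list and buffering requirement above can be sketched as a small Python class that the FastAPI routes (or the WebSocket handler) would call into. This is a minimal sketch under assumptions: `LipSyncSession`, `Frame` and the injected `render` callable are hypothetical names, and `render` stands in for MuseTalk inference, which is not shown.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, Optional

# Hypothetical: turns one audio chunk into one encoded video frame.
# In the real service this would wrap MuseTalk inference; here it is injected.
RenderFn = Callable[[bytes], bytes]


@dataclass
class Frame:
    timestamp_ms: int  # timestamp of the audio chunk this frame was rendered from
    data: bytes        # encoded image bytes (e.g. JPEG)


class LipSyncSession:
    """Sketch of the state behind /initialize, /process, /health and /cleanup."""

    def __init__(self, render: RenderFn, max_buffered_frames: int = 30) -> None:
        self._render = render
        # Bounded buffer: once full, appending discards the oldest frame,
        # so a slow consumer cannot make the backlog grow without limit.
        self._frames: Deque[Frame] = deque(maxlen=max_buffered_frames)
        self._ready = False

    def initialize(self) -> None:
        # POST /initialize: model loading would happen here; reset the buffer.
        self._frames.clear()
        self._ready = True

    def process(self, audio_chunk: bytes, timestamp_ms: int) -> Frame:
        # POST /process (or one WebSocket message): render a frame and buffer it.
        if not self._ready:
            raise RuntimeError("session not initialized")
        frame = Frame(timestamp_ms, self._render(audio_chunk))
        self._frames.append(frame)
        return frame

    def next_frame(self) -> Optional[Frame]:
        # Oldest buffered frame, for a streaming sender loop.
        return self._frames.popleft() if self._frames else None

    def health(self) -> Dict[str, object]:
        # GET /health
        return {"ready": self._ready, "buffered": len(self._frames)}

    def cleanup(self) -> None:
        # POST /cleanup: drop buffered frames and mark the session inactive.
        self._frames.clear()
        self._ready = False
```

A bounded `deque` trades occasional dropped frames for bounded latency: when rendering falls behind, stale frames are discarded instead of queuing up.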

Phase 2: Frontend Integration

  1. Modify Electron Window (main.js)

    • Update size: 120x120 → 250x250
    • Keep circular transparent design
  2. Update UI (index.html)

    • Add avatar container with circular mask
    • Layer structure:
      • Background: Avatar (static or animated)
      • Overlay: Audio activity ring
      • Foreground: State indicators
  3. Renderer Updates (renderer.js)

    • Add MuseTalk WebSocket client
    • Stream audio to MuseTalk backend
    • Receive and display video frames
    • Handle fallback to static image

Phase 3: Audio-Video Synchronization

  1. Audio Pipeline

    Unmute TTS → Audio Buffer → MuseTalk API
                      ↓
                  Audio Playback
    
  2. Frame Synchronization

    • Tag audio chunks with timestamps
    • Buffer video frames with matching timestamps
    • On each requestAnimationFrame tick, display the frame whose timestamp matches the current audio playback position
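The timestamp scheme can be illustrated with a language-neutral sketch, written in Python here although the same logic would run in renderer.js inside the requestAnimationFrame callback. It assumes chunk timestamps are derived from sample counts; `chunk_timestamps_ms` and `FrameSynchronizer` are hypothetical names.

```python
import bisect
from typing import List, Optional


def chunk_timestamps_ms(chunk_sizes: List[int], sample_rate: int) -> List[float]:
    """Start time of each audio chunk, from the number of samples before it."""
    stamps: List[float] = []
    samples_so_far = 0
    for size in chunk_sizes:
        stamps.append(samples_so_far * 1000.0 / sample_rate)
        samples_so_far += size
    return stamps


class FrameSynchronizer:
    """Holds timestamped frames and returns the one due at a playback position."""

    def __init__(self) -> None:
        self._times: List[float] = []
        self._frames: List[bytes] = []

    def add(self, timestamp_ms: float, frame: bytes) -> None:
        # Keep frames sorted by timestamp even if they arrive out of order.
        i = bisect.bisect_right(self._times, timestamp_ms)
        self._times.insert(i, timestamp_ms)
        self._frames.insert(i, frame)

    def frame_for(self, playback_ms: float) -> Optional[bytes]:
        # Latest frame whose timestamp is <= the current audio position.
        i = bisect.bisect_right(self._times, playback_ms)
        if i == 0:
            return None  # no frame is due yet: keep showing the current image
        frame = self._frames[i - 1]
        # Frames older than the one returned can never be shown again; drop them.
        del self._times[: i - 1]
        del self._frames[: i - 1]
        return frame
```

For example, with 480-sample chunks at an assumed 24 kHz (illustrative, not Unmute's confirmed output rate), chunks start at 0, 20 and 40 ms. The playback position should come from the audio clock (for instance the Web Audio `AudioContext.currentTime`), so video follows audio rather than wall-clock time.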

Phase 4: Settings Integration

  1. Add to existing Settings dialog:

    • "Show Avatar" toggle (default: on)
    • "Avatar Quality" dropdown (auto/high/low)
    • "Use GPU Acceleration" checkbox
  2. Store preferences in the existing electron-store

Phase 5: Resource Management

  1. Fallback Logic

    • Check MuseTalk service health on startup
    • Monitor frame processing latency
    • Auto-disable if latency > 500ms
    • Graceful degradation to static image
  2. Performance Optimization

    • Implement frame skipping for low-end systems
    • Cache processed frames when possible
    • Limit concurrent processing requests
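The fallback rules above could be implemented along these lines, wherever the fallback logic ends up living. This is a sketch under several assumptions: `GET /health` is taken to return HTTP 200 when the service is up, the 500 ms rule is applied to a rolling average over a window of frames (so one slow warm-up frame does not trip it), and `check_health`, `LatencyMonitor` and `frames_to_skip` are hypothetical names.

```python
import math
import urllib.request
from collections import deque
from typing import Deque


def check_health(base_url: str, timeout_s: float = 2.0) -> bool:
    """Startup probe: True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, refused connections and timeouts
        return False


class LatencyMonitor:
    """Rolling latency check that trips the static-image fallback."""

    def __init__(self, threshold_ms: float = 500.0, window: int = 10) -> None:
        self.threshold_ms = threshold_ms
        self._samples: Deque[float] = deque(maxlen=window)
        self.avatar_enabled = True

    def record(self, latency_ms: float) -> bool:
        """Record one frame's processing latency; return whether animation stays on."""
        self._samples.append(latency_ms)
        window_full = len(self._samples) == self._samples.maxlen
        average = sum(self._samples) / len(self._samples)
        # Only judge a full window; once tripped, the fallback stays on
        # (re-enabling is left to the user or a later health check).
        if window_full and average > self.threshold_ms:
            self.avatar_enabled = False
        return self.avatar_enabled


def frames_to_skip(processing_ms: float, target_fps: float = 30.0) -> int:
    """Frames to drop after each rendered one so output keeps up with real time."""
    frame_interval_ms = 1000.0 / target_fps
    # A frame that takes longer than one interval occupies several frame slots.
    slots = math.ceil(processing_ms / frame_interval_ms)
    return max(slots - 1, 0)
```

At the 30 fps target each frame slot is about 33 ms, so a frame that takes 50 ms to render occupies two slots and the next audio frame is skipped.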

Docker Container Solution

MuseTalk API Container

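# CUDA runtime base image: the official python images have no CUDA variant,
# so Python is installed on top. MuseTalk's README recommends Python 3.10
# with CUDA 11.x; Ubuntu 22.04 ships Python 3.10.
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04

RUN apt-get update && apt-get install -y --no-install-recommends \
        git python3 python3-pip ffmpeg \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Install MuseTalk dependencies (versions as pinned in the MuseTalk README)
RUN pip3 install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 \
        --index-url https://download.pytorch.org/whl/cu118
RUN git clone https://github.com/TMElyralab/MuseTalk.git \
    && pip3 install -r MuseTalk/requirements.txt
RUN pip3 install -U openmim && mim install mmengine \
    && mim install "mmcv==2.0.1" && mim install "mmdet==3.1.0" \
    && mim install "mmpose==1.1.0"
ENV PYTHONPATH=/app/MuseTalk

# Install API framework
RUN pip3 install fastapi uvicorn websockets

# Copy API wrapper code into WORKDIR so uvicorn can import musetalk_api
COPY musetalk_api.py /app/

# Download model weights with the repo's script (verify against the README)
RUN cd MuseTalk && sh ./download_weights.sh

EXPOSE 8765
CMD ["uvicorn", "musetalk_api:app", "--host", "0.0.0.0", "--port", "8765"]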

API Wrapper (musetalk_api.py)

  • Handles model initialization
  • Processes audio → video frame conversion
  • Manages GPU/CPU fallback
  • Implements frame caching
  • WebSocket streaming support
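Two of the wrapper's responsibilities, GPU/CPU fallback and frame caching, can be sketched without MuseTalk itself. Assumptions: device selection relies on PyTorch's `torch.cuda.is_available()`, the cache keys frames by a SHA-256 hash of the raw audio chunk (so only byte-identical chunks, such as repeated silence, produce hits), and `pick_device` and `FrameCache` are hypothetical names.

```python
import hashlib
import importlib.util
from collections import OrderedDict
from typing import Callable


def pick_device(prefer_gpu: bool = True) -> str:
    """Return "cuda" when an installed PyTorch can see a GPU, else "cpu"."""
    if not prefer_gpu or importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch  # imported lazily so this also works where torch is absent

    return "cuda" if torch.cuda.is_available() else "cpu"


class FrameCache:
    """Small LRU cache of rendered frames keyed by a hash of the audio chunk."""

    def __init__(self, render: Callable[[bytes], bytes], capacity: int = 256) -> None:
        self._render = render  # stand-in for MuseTalk inference
        self._capacity = capacity
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0

    def get(self, audio_chunk: bytes) -> bytes:
        key = hashlib.sha256(audio_chunk).hexdigest()
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        frame = self._render(audio_chunk)
        self._entries[key] = frame
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return frame
```

The LRU bound keeps memory predictable; whether caching pays off depends on how often identical chunks recur, which would need measuring.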

Benefits of Docker Approach

  1. Isolation - Python/ML dependencies contained
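  2. Portability - Runs wherever Docker runs (GPU acceleration additionally requires an NVIDIA GPU and the NVIDIA Container Toolkit)
  3. Scalability - Can run on a separate machine
  4. Consistency - Same deployment model as the Unmute backend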
  5. Optional - Users can disable if not needed

Technical Challenges & Solutions

Challenge 1: MuseTalk requires a Python/ML environment

  • Solution: Docker container with REST/WebSocket API, similar to Unmute backend

Challenge 2: Maintaining circular design with video

  • Solution: CSS clip-path or canvas masking to create circular video viewport

Challenge 3: Real-time performance

  • Solution: Adaptive quality with automatic fallback to static image

Success Criteria

  • Avatar displays in 250x250 circular window
  • Lip movements sync with speech when resources available
  • Automatic fallback to static image when needed
  • Settings toggle for avatar on/off
  • Keeps frame-processing latency under the 200ms target (auto-disable triggers at 500ms)
  • Docker container easy to deploy

Future Enhancements

  • Multiple avatar options in Settings
  • Facial expressions beyond lip-sync
  • Custom avatar upload support
  • Avatar marketplace integration
