
tblakex01/ai-phone-agent


📞 AI Phone Agent

Intelligent AI-powered voice assistant for automating phone calls



🌟 Overview

AI Phone Agent is a voice-based AI application that conducts real-time phone conversations using the Google Gemini Live API's bidirectional audio streaming. It supports speech-to-speech interaction with customizable AI personas for use cases such as booking reservations, handling customer calls, and providing tech support.

| 🎙️ Real-time Voice | 🤖 Multiple Personas | 📝 Live Transcription | 🔊 Natural Speech |
|---|---|---|---|
| Bidirectional audio streaming | 5 built-in presets + custom | See conversations in real time | Multiple voice options |

✨ Features

  • 🗣️ Real-time Voice Conversations - Bidirectional audio streaming with Google Gemini
  • 🎭 Customizable Personas - Switch between different AI personalities or create your own
  • 📝 Live Transcription - See both user and agent speech transcribed in real time
  • 🔊 Multiple Voices - Choose from 5 different voice options (Puck, Charon, Kore, Fenrir, Zephyr)
  • ⚡ Low Latency - Optimized audio pipeline for natural conversation flow
  • 🎨 Modern UI - Clean, phone-like interface built with React and Tailwind CSS
  • 📱 Responsive Design - Works seamlessly across devices

🚀 Quick Start

Prerequisites

  • 📦 Node.js (v18 or higher recommended)
  • 🔑 Google Gemini API Key - Get one at Google AI Studio

Installation

# Clone the repository
git clone https://github.com/tblakex01/ai-phone-agent.git
cd ai-phone-agent

# Install dependencies
npm install

# Configure environment
cp .env.example .env.local

Configuration

Open .env.local in the root directory (create it if there is no .env.example to copy) and set your API key:

GEMINI_API_KEY=your_gemini_api_key_here

Running the App

# Start development server
npm run dev

🎉 Open http://localhost:3000 in your browser!


🎭 Personas

AI Phone Agent comes with 5 pre-configured personas for common use cases:

Persona Description Voice Use Case
πŸ§‘β€πŸ’Ό Personal Assistant Helpful assistant for general tasks Kore General inquiries & tasks
🍽️ Restaurant Booker Makes dinner reservations Zephyr Outbound booking calls
🏒 Business Receptionist Answers calls for TechSolutions Inc Puck Inbound business calls
πŸ”§ Tech Support Troubleshoots internet issues Fenrir Customer support
πŸ“‹ Call Screener Screens incoming calls Charon Call filtering

Custom Personas

Create your own persona by configuring:

  • Name - Display name for the persona
  • Voice - Choose from available voices
  • System Instructions - Define the AI's behavior and role
  • Greeting - Initial message spoken when call starts
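The four fields above map naturally onto a small TypeScript type. The sketch below is a hypothetical version of such a type plus a validator for user-entered custom personas; the names Persona, VoiceName, VOICES, and createCustomPersona are assumptions for illustration and are not taken from the repository's types.ts or constants.ts.

```typescript
// Hypothetical persona shape; the project's real definition in types.ts may differ.
type VoiceName = "Puck" | "Charon" | "Kore" | "Fenrir" | "Zephyr";

// The five prebuilt voices listed in this README.
const VOICES: readonly VoiceName[] = ["Puck", "Charon", "Kore", "Fenrir", "Zephyr"];

interface Persona {
  name: string;              // Display name for the persona
  voice: VoiceName;          // One of the available voices
  systemInstruction: string; // The AI's behavior and role
  greeting: string;          // Initial message spoken when the call starts
}

// Type guard: narrows an arbitrary string (e.g. from a <select>) to VoiceName.
function isVoiceName(v: string): v is VoiceName {
  return (VOICES as readonly string[]).includes(v);
}

/** Validate raw form input and build a custom persona, throwing on bad fields. */
function createCustomPersona(input: {
  name: string;
  voice: string;
  systemInstruction: string;
  greeting: string;
}): Persona {
  // Trim whitespace so "  " counts as empty.
  const name = input.name.trim();
  const systemInstruction = input.systemInstruction.trim();
  const greeting = input.greeting.trim();
  if (!name) throw new Error("Persona name is required");
  if (!systemInstruction) throw new Error("System instructions are required");
  if (!greeting) throw new Error("Greeting is required");
  if (!isVoiceName(input.voice)) {
    throw new Error(`Unknown voice "${input.voice}"; expected one of ${VOICES.join(", ")}`);
  }
  return { name, voice: input.voice, systemInstruction, greeting };
}
```

Restricting voice to a union type lets the compiler catch typos in preset definitions, while the runtime guard covers values that arrive from form inputs as plain strings.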

🛠️ Tech Stack

| Category | Technology |
|---|---|
| ⚛️ Frontend | React 19 |
| 📘 Language | TypeScript 5.8 |
| ⚡ Build Tool | Vite 6 |
| 🤖 AI/ML | Google Gemini SDK |
| 🎨 Styling | Tailwind CSS |
| 🔊 Audio | Web Audio API |

🏗️ Architecture

ai-phone-agent/
├── 📁 components/           # React UI components
│   ├── CallScreen.tsx       # Main call interface & audio handling
│   ├── WelcomeScreen.tsx    # Persona selection screen
│   ├── StatusIndicator.tsx  # Call status display
│   └── Icons.tsx            # SVG icon components
├── 📁 services/
│   └── geminiService.ts     # Gemini API integration
├── 📁 utils/
│   └── audioUtils.ts        # Audio encoding utilities
├── 📄 App.tsx               # Root component
├── 📄 types.ts              # TypeScript definitions
├── 📄 constants.ts          # Config & persona presets
└── 📄 vite.config.ts        # Build configuration

Audio Pipeline

┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│  Microphone │────▶│ 16kHz PCM    │────▶│   Gemini    │
│   Input     │     │ Base64 Encode│     │   Live API  │
└─────────────┘     └──────────────┘     └──────┬──────┘
                                                │
┌─────────────┐     ┌──────────────┐            │
│   Speaker   │◀────│ 24kHz Decode │◀───────────┘
│   Output    │     │ AudioBuffer  │
└─────────────┘     └──────────────┘
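Both legs of the pipeline come down to converting between Web Audio's 32-bit float samples in [-1, 1] and base64-encoded 16-bit PCM. The helpers below are a hypothetical sketch of that conversion (the function names are illustrative and may not match utils/audioUtils.ts). They assume mono, little-endian 16-bit samples, which is the raw PCM format the Gemini Live API documents for both audio input and output.

```typescript
// Hypothetical helpers in the spirit of utils/audioUtils.ts; names are illustrative.

/** Convert float samples in [-1, 1] to 16-bit little-endian PCM, base64-encoded. */
function floatTo16BitPcmBase64(samples: Float32Array): string {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range samples cannot wrap around in int16.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negatives scale by 32768 and positives by 32767 so both extremes fit int16.
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true); // true = little-endian
  }
  return bytesToBase64(bytes);
}

/** Decode base64 16-bit little-endian PCM back into float samples in [-1, 1). */
function base64PcmToFloat(b64: string): Float32Array {
  const bytes = base64ToBytes(b64);
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out = new Float32Array(Math.floor(bytes.length / 2));
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, true) / 0x8000;
  }
  return out;
}

// btoa/atob work on "binary strings" (one char per byte), available in browsers and Node 16+.
function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

function base64ToBytes(b64: string): Uint8Array {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}
```

Raw PCM carries no sample-rate header, so the caller has to track it: microphone chunks would be labeled as 16 kHz when sent (for example with the MIME type audio/pcm;rate=16000), and decoded replies copied into a buffer created with AudioContext.createBuffer(1, samples.length, 24000) before playback.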

📝 Scripts

| Command | Description |
|---|---|
| npm run dev | 🚀 Start development server |
| npm run build | 📦 Build for production |
| npm run preview | 👁️ Preview production build |

🔧 Configuration

Environment Variables

| Variable | Required | Description |
|---|---|---|
| GEMINI_API_KEY | ✅ Yes | Your Google Gemini API key |
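Vite only exposes environment variables prefixed with VITE_ to browser code (through import.meta.env), so an unprefixed name like GEMINI_API_KEY has to be injected some other way. A common pattern, assumed here rather than confirmed from the repository, is to read it with loadEnv in vite.config.ts and substitute it at build time with define. The sketch below shows that pattern; the @vitejs/plugin-react import and port 3000 are assumptions chosen to match a React project served at http://localhost:3000.

```typescript
// vite.config.ts: hypothetical sketch; the repository's actual file may differ.
import { defineConfig, loadEnv } from "vite";
import react from "@vitejs/plugin-react"; // assumed plugin for a React + Vite project

export default defineConfig(({ mode }) => {
  // An empty prefix ("") loads every variable from .env, .env.local, etc.,
  // not just the VITE_-prefixed ones.
  const env = loadEnv(mode, process.cwd(), "");
  return {
    plugins: [react()],
    server: { port: 3000 }, // assumed, to match the dev URL in Quick Start
    define: {
      // Textual replacement at build time: the key's value is inlined into the bundle.
      "process.env.GEMINI_API_KEY": JSON.stringify(env.GEMINI_API_KEY),
    },
  };
});
```

Because define inlines the literal value, the key ends up in the built JavaScript that any visitor can download; for a public deployment, keeping the key behind a server-side proxy is the safer design.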

Gemini Models Used

  • Live Conversations: gemini-2.5-flash-native-audio-preview-09-2025
  • Text-to-Speech: gemini-2.5-flash-preview-tts
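A persona's voice and system instructions come together with the live model when a session is opened. The function below is a hypothetical sketch (buildLiveConnectParams and PersonaLike are illustrative names) that builds the model-plus-config portion of a Live API connection request. The config field names follow the @google/genai JavaScript SDK's live connect configuration as best understood here; verify them against the SDK version the project pins.

```typescript
// Hypothetical helper; field names assumed from the @google/genai live connect config.
interface PersonaLike {
  voice: string;             // e.g. "Kore", "Zephyr"
  systemInstruction: string; // persona role and behavior
}

const LIVE_MODEL = "gemini-2.5-flash-native-audio-preview-09-2025";

function buildLiveConnectParams(persona: PersonaLike) {
  return {
    model: LIVE_MODEL,
    config: {
      // Request spoken replies. The SDK's Modality.AUDIO enum member has the value "AUDIO".
      responseModalities: ["AUDIO"],
      // Select one of the prebuilt voices for the agent's speech.
      speechConfig: {
        voiceConfig: { prebuiltVoiceConfig: { voiceName: persona.voice } },
      },
      systemInstruction: persona.systemInstruction,
      // Empty objects enable transcripts of the user's and the model's audio,
      // which is what a live transcription view would display.
      inputAudioTranscription: {},
      outputAudioTranscription: {},
    },
  };
}
```

With the SDK, these fields plus a callbacks object (onopen, onmessage, onerror, onclose) would be passed to ai.live.connect(). Under type checking, responseModalities would use Modality.AUDIO rather than the raw string.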

📖 Documentation


🌐 Deployment

Production Build

# Create optimized build
npm run build

# Preview locally
npm run preview

The build output will be in the dist/ directory, ready for deployment to any static hosting service.

Hosting Options

  • ▲ Vercel - Zero-config deployment
  • 🔷 Netlify - Simple drag & drop
  • ☁️ Google Cloud Run - Containerized deployment
  • 🅰️ AWS Amplify - Full-stack hosting

Note: HTTPS is required for microphone access in production environments.
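Browsers expose navigator.mediaDevices only in secure contexts (HTTPS pages, plus http://localhost during development), so on a plain-HTTP deployment getUserMedia is simply missing. A hypothetical guard like the one below turns that into a clear error before a call starts. MicEnvironment, MicConstraints, and requestMicrophone are illustrative names, and the minimal structural types stand in for the DOM's Navigator so the sketch also runs outside a browser; channelCount, echoCancellation, and noiseSuppression are standard track constraints, but whether this app sets them is an assumption.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
interface MicConstraints {
  audio: { channelCount: number; echoCancellation: boolean; noiseSuppression: boolean };
  video: boolean;
}

interface MicEnvironment {
  isSecureContext: boolean;
  mediaDevices?: { getUserMedia(constraints: MicConstraints): Promise<unknown> };
}

/** Request a mono microphone stream, failing early with a clear message on plain HTTP. */
async function requestMicrophone(env: MicEnvironment): Promise<unknown> {
  if (!env.isSecureContext || !env.mediaDevices) {
    // Outside a secure context, browsers leave navigator.mediaDevices undefined.
    throw new Error("Microphone access requires HTTPS (or http://localhost during development).");
  }
  return env.mediaDevices.getUserMedia({
    audio: { channelCount: 1, echoCancellation: true, noiseSuppression: true },
    video: false,
  });
}
```

In the browser it would be called as requestMicrophone({ isSecureContext: window.isSecureContext, mediaDevices: navigator.mediaDevices }).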


🀝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments


Built with Google Gemini by Anthony M

⬆ Back to Top

About

AI Phone Agent is a cutting-edge voice-based AI application that conducts real-time phone conversations using Google Gemini's advanced audio streaming capabilities. It features speech-to-speech interaction with customizable AI personas for various use cases like booking reservations, handling customer calls, and providing tech support.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •