Intelligent AI-powered voice assistant for automating phone calls
Features • Quick Start • Personas • Architecture • Documentation
AI Phone Agent is a voice-based AI application that conducts real-time phone conversations using Google Gemini's audio streaming capabilities. It supports speech-to-speech interaction with customizable AI personas for use cases such as booking reservations, handling customer calls, and providing tech support.
| Real-time Voice | Multiple Personas | Live Transcription | Natural Speech |
|---|---|---|---|
| Bidirectional audio streaming | 5 built-in presets + custom | See conversations in real-time | Multiple voice options |
- Real-time Voice Conversations - Bidirectional audio streaming with Google Gemini
- Customizable Personas - Switch between different AI personalities or create your own
- Live Transcription - See both user and agent speech transcribed in real-time
- Multiple Voices - Choose from 5 different voice options (Puck, Charon, Kore, Fenrir, Zephyr)
- Low Latency - Optimized audio pipeline for natural conversation flow
- Modern UI - Clean, phone-like interface built with React and Tailwind CSS
- Responsive Design - Works seamlessly across devices
- Node.js (v18 or higher recommended)
- Google Gemini API Key - Get one at Google AI Studio
# Clone the repository
git clone https://github.com/yourusername/ai-phone-agent.git
cd ai-phone-agent
# Install dependencies
npm install
# Configure environment
cp .env.example .env.local
Create a .env.local file in the root directory:
GEMINI_API_KEY=your_gemini_api_key_here
# Start development server
npm run dev
Open http://localhost:3000 in your browser!
AI Phone Agent comes with 5 pre-configured personas for common use cases:
| Persona | Description | Voice | Use Case |
|---|---|---|---|
| Personal Assistant | Helpful assistant for general tasks | Kore | General inquiries & tasks |
| Restaurant Booker | Makes dinner reservations | Zephyr | Outbound booking calls |
| Business Receptionist | Answers calls for TechSolutions Inc | Puck | Inbound business calls |
| Tech Support | Troubleshoots internet issues | Fenrir | Customer support |
| Call Screener | Screens incoming calls | Charon | Call filtering |
Create your own persona by configuring:
- Name - Display name for the persona
- Voice - Choose from available voices
- System Instructions - Define the AI's behavior and role
- Greeting - Initial message spoken when call starts
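As a rough illustration of the four fields above, a custom persona could be modeled as a small typed object. The `Persona` interface, the `validatePersona` helper, and the example persona are hypothetical names chosen for this sketch; the project's actual definitions in types.ts and constants.ts may differ.

```typescript
// Voice names listed in the Features section of this README.
const VOICES = ["Puck", "Charon", "Kore", "Fenrir", "Zephyr"] as const;
type VoiceName = (typeof VOICES)[number];

// Hypothetical persona shape; field names are assumptions for illustration.
interface Persona {
  name: string;              // display name for the persona
  voice: VoiceName;          // one of the available voices
  systemInstruction: string; // defines the AI's behavior and role
  greeting: string;          // initial message spoken when the call starts
}

// Returns a list of problems; an empty list means the persona looks usable.
function validatePersona(p: {
  name: string;
  voice: string;
  systemInstruction: string;
  greeting: string;
}): string[] {
  const problems: string[] = [];
  if (p.name.trim() === "") problems.push("name is empty");
  if (!(VOICES as readonly string[]).includes(p.voice)) {
    problems.push(`unknown voice: ${p.voice}`);
  }
  if (p.systemInstruction.trim() === "") problems.push("systemInstruction is empty");
  if (p.greeting.trim() === "") problems.push("greeting is empty");
  return problems;
}

// Example custom persona (invented for this sketch).
const petGroomer: Persona = {
  name: "Pet Groomer Scheduler",
  voice: "Kore",
  systemInstruction: "You schedule grooming appointments. Be brief and friendly.",
  greeting: "Hi, thanks for calling. How can I help with your pet today?",
};
```

Validating before a call starts keeps a typo in the voice name from surfacing only as a failed session.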
| Category | Technology |
|---|---|
| Frontend | React 19 |
| Language | TypeScript 5.8 |
| Build Tool | Vite 6 |
| AI/ML | Google Gemini SDK |
| Styling | Tailwind CSS |
| Audio | Web Audio API |
ai-phone-agent/
├── components/               # React UI components
│   ├── CallScreen.tsx        # Main call interface & audio handling
│   ├── WelcomeScreen.tsx     # Persona selection screen
│   ├── StatusIndicator.tsx   # Call status display
│   └── Icons.tsx             # SVG icon components
├── services/
│   └── geminiService.ts      # Gemini API integration
├── utils/
│   └── audioUtils.ts         # Audio encoding utilities
├── App.tsx                   # Root component
├── types.ts                  # TypeScript definitions
├── constants.ts              # Config & persona presets
└── vite.config.ts            # Build configuration
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│ Microphone  │────▶│  16kHz PCM   │────▶│   Gemini    │
│   Input     │     │ Base64 Encode│     │  Live API   │
└─────────────┘     └──────────────┘     └──────┬──────┘
                                                │
┌─────────────┐     ┌──────────────┐            │
│  Speaker    │◀────│ 24kHz Decode │◀───────────┘
│  Output     │     │ AudioBuffer  │
└─────────────┘     └──────────────┘
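The encode and decode stages in the diagram can be sketched roughly as follows. This is a minimal illustration, assuming mono 16-bit little-endian PCM in both directions; the function names are hypothetical and are not taken from utils/audioUtils.ts.

```typescript
// Convert Float32 samples in [-1, 1] to 16-bit signed little-endian PCM bytes.
function floatTo16BitPcm(samples: Float32Array): Uint8Array {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to avoid overflow
    view.setInt16(i * 2, Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), true);
  }
  return bytes;
}

// Base64-encode raw bytes (btoa works on binary strings).
function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// Decode a base64 string back into raw bytes.
function base64ToBytes(b64: string): Uint8Array {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}

// Convert 16-bit little-endian PCM bytes to Float32 samples in [-1, 1).
function pcm16ToFloat(bytes: Uint8Array): Float32Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out = new Float32Array(bytes.byteLength >> 1);
  for (let i = 0; i < out.length; i++) out[i] = view.getInt16(i * 2, true) / 0x8000;
  return out;
}
```

In the browser, the decoded samples could then be copied into an `AudioBuffer` created with `audioContext.createBuffer(1, samples.length, 24000)` via `copyToChannel` before playback. The single channel is an assumption of this sketch.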
| Command | Description |
|---|---|
| npm run dev | Start development server |
| npm run build | Build for production |
| npm run preview | Preview production build |
| Variable | Required | Description |
|---|---|---|
| GEMINI_API_KEY | Yes | Your Google Gemini API key |
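Because the key is required, it can help to fail fast when it is missing or still set to the placeholder from the Quick Start. The helper below is a hypothetical sketch. How the environment object reaches it (for example `process.env` or a build-time substitution) is an assumption that depends on the project's Vite configuration.

```typescript
// Hypothetical helper: returns the key, or throws if it is missing
// or still equal to the placeholder value from .env.example.
function requireGeminiApiKey(env: Record<string, string | undefined>): string {
  const key = (env["GEMINI_API_KEY"] ?? "").trim();
  if (key === "" || key === "your_gemini_api_key_here") {
    throw new Error("GEMINI_API_KEY is not set; add it to .env.local");
  }
  return key;
}
```

Calling it once at startup surfaces a configuration mistake before the first call is attempted.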
- Live Conversations: gemini-2.5-flash-native-audio-preview-09-2025
- Text-to-Speech: gemini-2.5-flash-preview-tts
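One way to keep these model IDs in a single place is a small constants object, with a helper that assembles the parameters for a live session. The sketch below builds a plain object only and does not call the SDK. The config field names (`responseModalities`, `speechConfig`, `systemInstruction`, and the two transcription fields) reflect an assumption about the Google Gen AI SDK's Live API config and should be checked against the SDK documentation. `MODELS` and `buildLiveSessionParams` are hypothetical names.

```typescript
// Model IDs as listed in this README.
const MODELS = {
  live: "gemini-2.5-flash-native-audio-preview-09-2025",
  tts: "gemini-2.5-flash-preview-tts",
} as const;

// Minimal persona fields needed to configure a session (hypothetical shape).
interface SessionPersona {
  voice: string;
  systemInstruction: string;
}

// Assemble parameters for a live session as a plain object.
// Field names are assumptions about the SDK's config shape.
function buildLiveSessionParams(persona: SessionPersona) {
  return {
    model: MODELS.live,
    config: {
      responseModalities: ["AUDIO"],
      speechConfig: {
        voiceConfig: { prebuiltVoiceConfig: { voiceName: persona.voice } },
      },
      systemInstruction: persona.systemInstruction,
      inputAudioTranscription: {},  // request transcripts of user speech
      outputAudioTranscription: {}, // request transcripts of agent speech
    },
  };
}
```

Keeping the preview model IDs in one object makes it easier to update them when the preview versions are replaced.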
- CLAUDE.MD - AI assistant context and codebase guide
- Google Gemini API - Gemini API documentation
- React Documentation - React framework docs
- Vite Guide - Vite build tool docs
# Create optimized build
npm run build
# Preview locally
npm run preview
The build output will be in the dist/ directory, ready for deployment to any static hosting service.
- Vercel - Zero-config deployment
- Netlify - Simple drag & drop
- Google Cloud Run - Containerized deployment
- AWS Amplify - Full-stack hosting
Note: HTTPS is required for microphone access in production environments.
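Browsers expose microphone capture only in secure contexts, which is why the development server on http://localhost works while a plain-HTTP deployment does not. The helper below is a simplified approximation of that rule for checking a deployment URL. In the browser itself, `window.isSecureContext` is the authoritative check. `isLikelySecureOrigin` is a hypothetical name.

```typescript
// Rough approximation of the secure-context rule: HTTPS pages qualify,
// and so do local development hosts served over plain HTTP.
function isLikelySecureOrigin(pageUrl: string): boolean {
  const url = new URL(pageUrl);
  if (url.protocol === "https:") return true;
  const host = url.hostname;
  return host === "localhost" || host.endsWith(".localhost") || host === "127.0.0.1";
}
```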
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Google Gemini - Powering the AI conversations
- React - UI framework
- Vite - Lightning fast build tool
- Tailwind CSS - Utility-first CSS framework
Built with Google Gemini by Anthony M