Commit 54f5f75

1.0 examples
1 parent 5de10db commit 54f5f75

File tree: 86 files changed, +72965 −0 lines changed


README.md

+260
@@ -1 +1,261 @@
<div style='text-align: center; margin-bottom: 1rem; display: flex; justify-content: center; align-items: center;'>
  <h1 style='color: white; margin: 0;'>LiveKit Agents Examples</h1>
  <img src='livekit-logo-dark.png'
       alt="LiveKit Logo"
       style="margin-left: 10px; height: 60px;">
</div>

<div style="display: flex; flex-direction: row; justify-content: center">
  <a href="https://github.com/livekit/agents" target="_blank"><img alt="Static Badge" src="https://img.shields.io/badge/github-white?logo=github&logoColor=black"></a>
  <a href="https://docs.livekit.io/agents/" target="_blank"><img alt="Static Badge" src="https://img.shields.io/badge/docs-blue?logo=readthedocs&logoColor=white"></a>
</div>

<h3 style='text-align: center'>
Example applications and code snippets for LiveKit Agents
</h3>

This repository contains example code and demo applications for LiveKit Agents, a suite of tools for building, deploying, and scaling real-time voice and video AI agents.

## LiveKit Agents

LiveKit Agents is a Python library that enables you to build intelligent conversational agents with speech, text, and media capabilities. This repository contains examples that demonstrate how to use various features of the library.

## Installation

To use these examples, first install the LiveKit Agents library:

```bash
pip install livekit-agents
```
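The example scripts load credentials from a `.env` file in the repository root (each one calls `load_dotenv` on `../.env`). A minimal file might look like the sketch below; the exact variables depend on which providers an example uses, and the provider key names shown are assumptions rather than a definitive list:

```shell
# LiveKit server credentials for your project (Cloud or self-hosted)
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your-api-key
LIVEKIT_API_SECRET=your-api-secret

# Provider keys used by the pipeline examples (assumed names; set only
# the ones required by the example you run)
OPENAI_API_KEY=sk-...
DEEPGRAM_API_KEY=...
```

With the file in place, an example can be started through the CLI wrapper that `cli.run_app` provides, e.g. `python basics/listen_and_respond.py dev` (assuming the standard agents CLI subcommands).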
## Key Features of LiveKit Agents

- 🗣️ **Speech and Voice Processing** - Built-in STT, TTS, and VAD capabilities for natural conversations.
- 💬 **Comprehensive LLM Support** - Integrate with OpenAI, Anthropic, Google, and more.
- 📞 **Telephony Integration** - Make and receive SIP calls with your agents.
- 📊 **Metrics and Monitoring** - Track and analyze agent performance.
- 🔄 **Real-time Processing** - Stream audio, text, and video in real-time.
- 📱 **Multi-modal Capabilities** - Handle text, audio, and video simultaneously.
- 🌐 **Multilingual Support** - Transcribe and respond in multiple languages.
- 🧩 **Extensible Plugin System** - Add custom capabilities to your agents.

## Official Documentation

For full documentation of LiveKit Agents, visit [https://docs.livekit.io/agents/](https://docs.livekit.io/agents/)

## Example Demos

<table>
<tr>
<td width="50%">
<h3>🎙️ Listen and Respond</h3>
<p>Basic agent that listens for user input and provides a response.</p>
<p>
<a href="basics/listen_and_respond.py">Code</a>
</p>
</td>
<td width="50%">
<h3>🔄 Uninterruptable</h3>
<p>An agent that continues speaking without being interrupted.</p>
<p>
<a href="basics/uninterruptable.py">Code</a>
</p>
</td>
</tr>

<tr>
<td width="50%">
<h3>🏥 Medical Office Triage</h3>
<p>Agent that triages patients based on symptoms and medical history.</p>
<p>
<a href="complex-agents/medical_office_triage/">Code</a>
</p>
</td>
<td width="50%">
<h3>🛍️ Personal Shopper</h3>
<p>AI shopping assistant that helps find products based on user preferences.</p>
<p>
<a href="complex-agents/personal_shopper/">Code</a>
</p>
</td>
</tr>

<tr>
<td width="50%">
<h3>☎️ Phone Caller</h3>
<p>Agent that can make outbound phone calls and handle conversations.</p>
<p>
<a href="telephony/make_call/">Code</a>
</p>
</td>
<td width="50%">
<h3>🌐 Change Language</h3>
<p>Agent that can switch between different languages during conversation.</p>
<p>
<a href="pipeline-tts/elevenlabs_change_language.py">Code</a>
</p>
</td>
</tr>

<tr>
<td width="50%">
<h3>🔄 TTS Comparison</h3>
<p>Compare different text-to-speech providers side by side.</p>
<p>
<a href="pipeline-tts/tts_comparison.py">Code</a>
</p>
</td>
<td width="50%">
<h3>📞 SIP Warm Handoff</h3>
<p>Transfer calls from an AI agent to a human operator seamlessly.</p>
<p>
<a href="telephony/warm_handoff.py">Code</a>
</p>
</td>
</tr>

<tr>
<td width="50%">
<h3>📝 Transcriber</h3>
<p>Real-time speech transcription with high accuracy.</p>
<p>
<a href="pipeline-stt/transcriber.py">Code</a>
</p>
</td>
<td width="50%">
<h3>🗣️ Realtime OpenAI</h3>
<p>Integrate with OpenAI's streaming API for natural conversations.</p>
<p>
<a href="realtime/openai.py">Code</a>
</p>
</td>
</tr>

<tr>
<td width="50%">
<h3>🔤 Keyword Detection</h3>
<p>Detect specific keywords in speech in real-time.</p>
<p>
<a href="pipeline-stt/keyword_detection.py">Code</a>
</p>
</td>
<td width="50%">
<h3>🎮 Function Calling</h3>
<p>Implement function calling capabilities in your agents.</p>
<p>
<a href="basics/function_calling.py">Code</a>
</p>
</td>
</tr>

<tr>
<td width="50%">
<h3>📞 SIP Lifecycle</h3>
<p>Complete lifecycle management for SIP calls.</p>
<p>
<a href="telephony/sip_lifecycle.py">Code</a>
</p>
</td>
<td width="50%">
<h3>🔄 Context Variables</h3>
<p>Maintain conversation context across interactions.</p>
<p>
<a href="basics/context_variables.py">Code</a>
</p>
</td>
</tr>

<tr>
<td width="50%">
<h3>🔊 Playing Audio</h3>
<p>Play audio files during agent interactions.</p>
<p>
<a href="basics/playing_audio.py">Code</a>
</p>
</td>
<td width="50%">
<h3>🎙️ Sound Repeater</h3>
<p>Simple sound repeating demo for testing audio pipelines.</p>
<p>
<a href="basics/repeater.py">Code</a>
</p>
</td>
</tr>

<tr>
<td width="50%">
<h3>📱 Raspberry Pi Transcriber</h3>
<p>Run transcription on Raspberry Pi hardware.</p>
<p>
<a href="hardware/pi_zero_transcriber.py">Code</a>
</p>
</td>
<td width="50%">
<h3>📞 Answer Incoming Calls</h3>
<p>Set up an agent to answer incoming SIP calls.</p>
<p>
<a href="telephony/answer_call.py">Code</a>
</p>
</td>
</tr>
</table>

## Code Examples by Category

### Basic Features
- [Listen and Respond](basics/listen_and_respond.py)
- [Uninterruptable Agent](basics/uninterruptable.py)
- [Playing Audio](basics/playing_audio.py)
- [Function Calling](basics/function_calling.py)
- [Context Variables](basics/context_variables.py)
- [Sound Repeater](basics/repeater.py)

### LLM Integrations
- [Anthropic Claude](pipeline-llm/anthropic_llm.py)
- [Cerebras](pipeline-llm/cerebras_llm.py)
- [Google Gemini](pipeline-llm/google_llm.py)
- [Ollama](pipeline-llm/ollama_llm.py)
- [OpenAI](pipeline-llm/openai_llm.py)

### TTS Integrations
- [Cartesia](pipeline-tts/cartesia_tts.py)
- [ElevenLabs](pipeline-tts/elevenlabs_tts.py)
- [OpenAI](pipeline-tts/openai_tts.py)
- [PlayAI](pipeline-tts/playai_tts.py)
- [Rime](pipeline-tts/rime_tts.py)

### STT and Voice Processing
- [Transcription](pipeline-stt/transcriber.py)
- [Keyword Detection](pipeline-stt/keyword_detection.py)

### Realtime Processing
- [OpenAI Streaming](realtime/openai.py)

### Advanced LLM Features
- [Interrupt User](pipeline-llm/interrupt_user.py)
- [LLM Content Filter](pipeline-llm/llm_powered_content_filter.py)
- [Simple Content Filter](pipeline-llm/simple_content_filter.py)
- [Replacing LLM Output](pipeline-llm/replacing_llm_output.py)

### Translation Features
- [Pipeline Translator](translators/pipeline_translator.py)
- [TTS Translator](translators/tts_translator.py)

### Telephony
- [Answer Call](telephony/answer_call.py)
- [SIP Lifecycle](telephony/sip_lifecycle.py)
- [Warm Handoff](telephony/warm_handoff.py)
- [Survey Caller](telephony/survey_caller/)

### Metrics and Monitoring
- [LLM Metrics](metrics/metrics_llm.py)
- [STT Metrics](metrics/metrics_stt.py)
- [TTS Metrics](metrics/metrics_tts.py)
- [VAD Metrics](metrics/metrics_vad.py)

### Hardware Integration
- [Raspberry Pi Transcriber](hardware/pi_zero_transcriber.py)

## Complex Demo Agents
- [Medical Office Triage](demos/medical_office_triage/)
- [Personal Shopper](demos/personal_shopper/)

basics/audio.wav

155 KB
Binary file not shown.

basics/context_variables.py

+51
@@ -0,0 +1,51 @@
1+
import logging
2+
from pathlib import Path
3+
from dotenv import load_dotenv
4+
from livekit.agents import JobContext, WorkerOptions, cli
5+
from livekit.agents.voice import Agent, AgentSession
6+
from livekit.plugins import openai, deepgram, silero
7+
8+
load_dotenv(dotenv_path=Path(__file__).parent.parent / '.env')
9+
10+
logger = logging.getLogger("context-variables")
11+
logger.setLevel(logging.INFO)
12+
13+
class ContextAgent(Agent):
14+
def __init__(self, context_vars=None) -> None:
15+
instructions = """
16+
You are a helpful agent. The user's name is {name}.
17+
They are {age} years old and live in {city}.
18+
"""
19+
20+
if context_vars:
21+
instructions = instructions.format(**context_vars)
22+
23+
super().__init__(
24+
instructions=instructions,
25+
stt=deepgram.STT(),
26+
llm=openai.LLM(model="gpt-4o"),
27+
tts=openai.TTS(),
28+
vad=silero.VAD.load()
29+
)
30+
31+
async def on_enter(self):
32+
self.session.generate_reply()
33+
34+
async def entrypoint(ctx: JobContext):
35+
await ctx.connect()
36+
37+
context_variables = {
38+
"name": "Shayne",
39+
"age": 35,
40+
"city": "Toronto"
41+
}
42+
43+
session = AgentSession()
44+
45+
await session.start(
46+
agent=ContextAgent(context_vars=context_variables),
47+
room=ctx.room
48+
)
49+
50+
if __name__ == "__main__":
51+
cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
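The context-injection step in `ContextAgent` is ordinary `str.format` templating: a prompt template is filled from a dict of per-session variables before being passed as the agent's instructions. A minimal standalone sketch of the same pattern (plain Python, no LiveKit dependency; the `defaults` fallback is an added illustration, not part of the example above):

```python
# Sketch of the instruction-templating pattern used by ContextAgent.
# A template with {placeholders} is filled from a context dict; a
# hypothetical defaults dict keeps format() from raising KeyError
# when a per-session variable is missing.

TEMPLATE = """
You are a helpful agent. The user's name is {name}.
They are {age} years old and live in {city}.
"""

def build_instructions(context_vars=None, defaults=None):
    # Per-session values override defaults; both dicts are optional.
    merged = {**(defaults or {}), **(context_vars or {})}
    return TEMPLATE.format(**merged)

instructions = build_instructions({"name": "Shayne", "age": 35, "city": "Toronto"})
print(instructions)
```

Because the template is formatted once at construction time, the variables are baked into the system prompt for the whole session; updating them mid-conversation would require rebuilding the instructions.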

basics/function_calling.py

+49
@@ -0,0 +1,49 @@
1+
## This is a basic example of how to use function calling.
2+
## To test the function, you can ask the agent to print to the console!
3+
4+
import logging
5+
from pathlib import Path
6+
from dotenv import load_dotenv
7+
from livekit.agents import JobContext, WorkerOptions, cli
8+
from livekit.agents.llm import function_tool
9+
from livekit.agents.voice import Agent, AgentSession, RunContext
10+
from livekit.plugins import deepgram, openai, silero
11+
12+
logger = logging.getLogger("function-calling")
13+
logger.setLevel(logging.INFO)
14+
15+
load_dotenv(dotenv_path=Path(__file__).parent.parent / '.env')
16+
17+
class FunctionAgent(Agent):
18+
def __init__(self) -> None:
19+
super().__init__(
20+
instructions="""
21+
You are a helpful assistant communicating through voice. Don't use any unpronouncable characters.
22+
Note: If asked to print to the console, use the `print_to_console` function.
23+
""",
24+
stt=deepgram.STT(),
25+
llm=openai.LLM(model="gpt-4o"),
26+
tts=openai.TTS(),
27+
vad=silero.VAD.load()
28+
)
29+
30+
@function_tool
31+
async def print_to_console(self, context: RunContext):
32+
print("Console Print Success!")
33+
return None, "I've printed to the console."
34+
35+
async def on_enter(self):
36+
self.session.generate_reply()
37+
38+
async def entrypoint(ctx: JobContext):
39+
await ctx.connect()
40+
41+
session = AgentSession()
42+
43+
await session.start(
44+
agent=FunctionAgent(),
45+
room=ctx.room
46+
)
47+
48+
if __name__ == "__main__":
49+
cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
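The `@function_tool` decorator exposes an agent method to the LLM as a callable tool. The registration-and-dispatch idea behind it can be sketched in plain Python with a decorator that records functions in a registry, so a dispatcher can invoke them by name when a call is requested. This is an illustration of the general pattern, not LiveKit's actual implementation:

```python
# Minimal sketch of a decorator-based tool registry, analogous in spirit
# to @function_tool: decorated functions are collected so a dispatcher
# can look them up by name when the LLM requests a call.
# Illustration only; not LiveKit's implementation.

TOOLS = {}

def tool(fn):
    # Register the function under its own name and return it unchanged.
    TOOLS[fn.__name__] = fn
    return fn

@tool
def print_to_console():
    """Print a fixed confirmation message to the console."""
    print("Console Print Success!")
    return "I've printed to the console."

def dispatch(name, **kwargs):
    # Look up and invoke a registered tool; fail loudly on unknown names.
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)

result = dispatch("print_to_console")
```

In the real example, the string returned by the tool is handed back to the LLM so it can voice a confirmation; the registry sketch mirrors that by returning the tool's result to the caller.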
