| title | category | tags | difficulty | description | demonstrates |
|---|---|---|---|---|---|
| Repeater | basics | | beginner | Shows how to create an agent that can repeat what the user says. | |
This example shows how to build a simple repeater: when the user finishes speaking, the agent says back exactly what it heard by listening to the `user_input_transcribed` event.
- Add a `.env` in this directory with your LiveKit credentials:

  ```
  LIVEKIT_URL=your_livekit_url
  LIVEKIT_API_KEY=your_api_key
  LIVEKIT_API_SECRET=your_api_secret
  ```

- Install dependencies:

  ```shell
  pip install "livekit-agents[silero]" python-dotenv
  ```
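If the agent fails to connect, missing credentials are the usual cause. As a quick sanity check, a small helper (hypothetical, not part of the example) can report which required variables are unset:

```python
import os

REQUIRED_VARS = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")

def missing_credentials(env=os.environ):
    """Return the names of required LiveKit variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

# With only LIVEKIT_URL set, the other two are reported missing.
print(missing_credentials({"LIVEKIT_URL": "wss://example.livekit.cloud"}))
```

Call it after `load_dotenv()` and fail fast if the returned list is non-empty.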
Load your `.env` so the media plugins can authenticate, then initialize the `AgentServer`.

```python
from dotenv import load_dotenv
from livekit.agents import JobContext, JobProcess, AgentServer, cli, Agent, AgentSession, inference
from livekit.plugins import silero

load_dotenv()

server = AgentServer()
```

Preload the VAD model once per process to reduce connection latency.

```python
def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()

server.setup_fnc = prewarm
```

Create the session with interruptions disabled so playback is not cut off mid-echo. Attach a handler to `user_input_transcribed`; once a transcript is marked final, echo it back with `session.say`.
```python
@server.rtc_session()
async def entrypoint(ctx: JobContext):
    ctx.log_context_fields = {"room": ctx.room.name}

    session = AgentSession(
        stt=inference.STT(model="deepgram/nova-3-general"),
        llm=inference.LLM(model="openai/gpt-5-mini"),
        tts=inference.TTS(model="cartesia/sonic-3", voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc"),
        vad=ctx.proc.userdata["vad"],
        allow_interruptions=False,
    )

    @session.on("user_input_transcribed")
    def on_transcript(transcript):
        if transcript.is_final:
            session.say(transcript.transcript)

    await session.start(
        agent=Agent(
            instructions="You are a helpful assistant that repeats what the user says."
        ),
        room=ctx.room,
    )

    await ctx.connect()
```

Start the agent server with the CLI runner.
```python
if __name__ == "__main__":
    cli.run_app(server)
```

Run the example in console mode:

```shell
python repeater.py console
```

- The VAD is prewarmed once per process for faster connections.
- A session-level event emits transcripts as the user speaks.
- When the transcript is final, the handler calls `session.say` with the same text.
- Because interruptions are disabled, the echoed audio plays fully.
- This pattern is a starting point for building more advanced post-processing on transcripts.
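As a sketch of that post-processing idea, a pure function could clean up a final transcript before it is echoed back (the normalization rules here are illustrative, not part of the example):

```python
import re

def normalize_transcript(text: str) -> str:
    """Collapse runs of whitespace and ensure the echo ends with terminal punctuation."""
    cleaned = re.sub(r"\s+", " ", text).strip()
    if cleaned and cleaned[-1] not in ".!?":
        cleaned += "."
    return cleaned

print(normalize_transcript("  hello   there "))  # → "hello there."
```

Inside the handler you would then call `session.say(normalize_transcript(transcript.transcript))` instead of passing the raw text.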