Voice Agents

Voice agents are the realtime orchestration entrypoints for live conversations. They wrap prompt versions, add voice configuration, define handoff targets, and start inspectable realtime sessions.

What voice agents are

A voice agent currently includes:

  • name and stable agent key
  • status
  • versions
  • published version state
  • a root prompt version for each agent version
  • optional voice configuration
  • optional handoff description
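
For orientation, the fields above can be pictured as a small data model. This is only a sketch: the class names, attribute names, and the "active" status value are hypothetical, and placing the voice and handoff description on the version (rather than on the agent) is an assumption based on the version-creation request shown later on this page.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VoiceAgentVersion:
    # All attribute names here are illustrative, not the backend's schema.
    id: str
    root_prompt_version_id: str                 # each agent version has a root prompt version
    voice: Optional[str] = None                 # optional voice configuration
    handoff_description: Optional[str] = None   # optional handoff description


@dataclass
class VoiceAgent:
    name: str
    agent_key: str                              # stable agent key
    status: str
    versions: List[VoiceAgentVersion] = field(default_factory=list)
    published_version_id: Optional[str] = None  # published version state

    def published_version(self) -> Optional[VoiceAgentVersion]:
        """Return the published version, or None if nothing is published."""
        for version in self.versions:
            if version.id == self.published_version_id:
                return version
        return None


agent = VoiceAgent(
    name="Support line",
    agent_key="support-line",
    status="active",  # hypothetical status value
    versions=[VoiceAgentVersion("v1", "prompt_version_123", voice="alloy")],
    published_version_id="v1",
)
```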

Current product behavior

The app presents voice agents as a resource distinct from text prompts and multi-step agents:

  • prompts define the core behavior
  • tools provide external capabilities
  • voice agents define realtime entrypoints and handoffs
  • realtime sessions record live execution

The backend currently exposes the voice agent routes under /api/realtime-agents, with /api/voice-agents available as an alias.

Session flow

Creating a realtime session returns:

  • a run id
  • a provider/model pair
  • an external realtime session id
  • a client secret for the realtime provider
  • the resolved agent graph snapshot used for the session

That graph includes the prompt composition and the tools available to each participating agent version.
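
A client will usually want to validate that response before connecting to the realtime provider. The sketch below shows one way to do that in Python. The JSON key names (runId, externalSessionId, clientSecret, graph, and so on) are assumptions made for illustration; this page does not document the exact response schema, so adjust them to match what the backend actually returns.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class RealtimeSession:
    run_id: str
    provider: str
    model: str
    external_session_id: str
    # Excluded from repr so the secret does not leak into logs.
    client_secret: str = field(repr=False)
    graph: Dict[str, Any]  # resolved agent graph snapshot


# Hypothetical key names; not confirmed by this page.
_REQUIRED_KEYS = (
    "runId", "provider", "model",
    "externalSessionId", "clientSecret", "graph",
)


def parse_session_response(body: str) -> RealtimeSession:
    """Parse a session-creation response and fail loudly on missing fields."""
    data = json.loads(body)
    missing = [key for key in _REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError("session response is missing: " + ", ".join(missing))
    return RealtimeSession(
        run_id=data["runId"],
        provider=data["provider"],
        model=data["model"],
        external_session_id=data["externalSessionId"],
        client_secret=data["clientSecret"],
        graph=data["graph"],
    )
```

Keeping the graph as an untyped dictionary is deliberate here, since the snapshot's internal structure is not described on this page.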

API workflow

Create a voice agent version

curl -X POST "$API_BASE/api/realtime-agents/agent_123/versions" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "organizationId": "org_123",
    "promptVersionId": "prompt_version_123",
    "voice": "alloy",
    "handoffDescription": "Transfer billing questions to the billing specialist"
  }'

Create a realtime session

curl -X POST "$API_BASE/api/realtime-agents/agent_123/session" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "organizationId": "org_123"
  }'
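
The same request can be built from application code. The following is a minimal Python sketch using only the standard library. The function name and its parameters are hypothetical, and the prefix argument assumes that the /api/voice-agents alias mirrors the same sub-paths as /api/realtime-agents, which this page does not state explicitly.

```python
import json
import urllib.request


def build_session_request(api_base, token, agent_id, organization_id,
                          prefix="realtime-agents"):
    """Build (but do not send) the POST request that creates a realtime session."""
    url = f"{api_base.rstrip('/')}/api/{prefix}/{agent_id}/session"
    body = json.dumps({"organizationId": organization_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Sending the request is left to the caller, for example:
#   with urllib.request.urlopen(request) as response:
#       payload = json.load(response)
```

Separating request construction from sending keeps the helper easy to test without network access.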
