Agents
Voice AI agent configuration, prompts, and behavior
An Agent is a voice AI entity that can have conversations with users. Each agent has its own personality (prompt), voice, language settings, and capabilities (tools).
Agent Configuration
{
  "name": "meal-coach-agent",
  "display_name": "Meal Coach Agent",
  "description": "Interactive meal logging agent",
  "config": {
    "prompt": {
      "system": "You are a friendly health coach...",
      "greeting": "Hi {{user.name}}! What did you have for {{workflow.first_pending_meal}}?",
      "greeting_interruptible": false,
      "variables": [
        { "name": "user.name", "default": "there" },
        { "name": "user.language", "default": "hi" },
        { "name": "user.language_name", "default": "Hindi" }
      ]
    },
    "stt": {
      "provider": "deepgram",
      "model": "nova-3",
      "language": "multi"
    },
    "tts": {
      "provider": "cartesia",
      "model": "sonic-3",
      "voice_id": "{{user.voice_id}}",
      "options": {
        "emotion": ["Affectionate"],
        "speed": 1.0
      }
    },
    "llm": {
      "provider": "gemini",
      "model": "gemini-2.0-flash-exp",
      "temperature": 0.5
    },
    "vad": {
      "provider": "silero",
      "activation_threshold": 0.5
    },
    "tools": ["collect_meal_info", "log_skipped_meal", "finish_call"],
    "handoffs": [
      { "target_agent": "feedback-agent", "conditions": ["all_meals_logged"] }
    ],
    "session": {
      "min_endpointing_delay": 0.3,
      "auto_disconnect_timeout": 300
    },
    "languages": ["hi", "en", "ta", "te", "bn", "mr", "gu", "kn", "ml", "ur"],
    "default_language": "hi"
  }
}
Prompt System
The prompt system controls what the agent says and how it behaves.
System Prompt
The system prompt holds the core personality and instructions for the agent. It supports template variables:
You are Tap Health Coach, a friendly health coach.
LANGUAGE: Speak ONLY in {{user.language_name}}. All responses must be in {{user.language_name}}.
CONTEXT:
- User: {{user.name}} | Time: {{workflow.current_time}}
- Pending meals: {{workflow.pending_meals_display}}
- Already logged: {{workflow.logged_meals_display}}
INSTRUCTIONS:
...
Greeting
The first thing the agent says when activated. This is spoken via TTS before the LLM engages.
{
  "greeting": "Hi {{user.name}}! Your coach from Tap Health here.",
  "greeting_interruptible": false
}
| Property | Type | Description |
|---|---|---|
| greeting | string | Template string spoken at call start |
| greeting_interruptible | boolean | If false, user cannot interrupt the greeting |
Variables
Variables define dynamic values that can be injected into prompts. They are resolved from the Call Context.
{
  "variables": [
    { "name": "user.name", "default": "there" },
    { "name": "user.language_name", "default": "Hindi" },
    { "name": "workflow.pending_meals", "default": [] }
  ]
}
Variable paths use dot notation:
- user.* - User data from user_context when the call is dispatched
- workflow.* - Shared workflow state
- flags.* - Boolean flags set by tools
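The variable resolution described above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's implementation: the `render` and `lookup` helpers are hypothetical, the call context is assumed to be a nested dictionary keyed by `user`, `workflow`, and `flags`, and rendering an unresolved variable with no declared default as an empty string is also an assumption.

```python
import re
from typing import Any

# Matches placeholders such as {{user.name}} or {{ workflow.pending_meals }}.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")

# Sentinel so that legitimate values like None or "" are not mistaken for "absent".
_MISSING = object()


def lookup(context: dict[str, Any], path: str) -> Any:
    """Walk a dot-separated path through nested dicts; return _MISSING if absent."""
    node: Any = context
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def render(template: str, context: dict[str, Any], variables: list[dict[str, Any]]) -> str:
    """Replace each placeholder with its context value, else its declared default."""
    defaults = {v["name"]: v.get("default", "") for v in variables}

    def replace(match: re.Match) -> str:
        path = match.group(1)
        value = lookup(context, path)
        if value is _MISSING:
            # Assumption: an undeclared, unresolved variable renders as "".
            value = defaults.get(path, "")
        return str(value)

    return PLACEHOLDER.sub(replace, template)


variables = [{"name": "user.name", "default": "there"}]
print(render("Hi {{user.name}}!", {"user": {"name": "Rahul"}}, variables))  # Hi Rahul!
print(render("Hi {{user.name}}!", {}, variables))  # Hi there!
```

The second call shows why defaults matter: when the call context carries no user name, the greeting still reads naturally.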
Multilingual Support
Agents can support multiple languages. The language is determined by user.language in the call context.
Language Configuration
{
  "languages": ["hi", "en", "ta", "te", "bn", "mr", "gu", "kn", "ml", "ur"],
  "default_language": "hi"
}
Language in Prompts
Include language instructions in your system prompt:
LANGUAGE: Speak ONLY in {{user.language_name}}. All responses must be in {{user.language_name}}.
- If Hindi: Use natural spoken Hindi (Hinglish mix is fine)
- If English: Use simple, warm English
- For regional languages: Use natural spoken form
Language-Specific Greetings
The backend can provide pre-computed greetings in the user's language:
# Backend dispatches call with language-specific greeting
await client.post("/api/v1/calls", json={
    "workflow_slug": "meal-logging",
    "user_context": {
        "name": "Rahul",
        "language": "hi",
        "language_name": "Hindi",
        "greeting": "Namaste Rahul! Aaj breakfast mein kya khaya?"
    }
})
Then in agent config:
{
  "greeting": "{{user.greeting}}"
}
See Multilingual Agents Guide for detailed setup.
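As a rough illustration of how the language settings in this section might interact, the sketch below picks the active language for a call. The `resolve_language` helper is hypothetical, and falling back to `default_language` when `user.language` is missing or not in `languages` is an assumption; this page does not state what the platform does in that case.

```python
def resolve_language(user_context: dict, agent_config: dict) -> str:
    """Return the user's language if the agent supports it, else the default."""
    requested = user_context.get("language")
    if requested in agent_config.get("languages", []):
        return requested
    # Assumption: unsupported or missing languages fall back to default_language.
    return agent_config["default_language"]


agent_config = {"languages": ["hi", "en", "ta"], "default_language": "hi"}
print(resolve_language({"language": "ta"}, agent_config))  # ta
print(resolve_language({"language": "fr"}, agent_config))  # hi
```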
Provider Configuration
Each agent specifies which AI providers to use:
Speech-to-Text (STT)
{
  "stt": {
    "provider": "deepgram",
    "model": "nova-3",
    "language": "multi"
  }
}
| Provider | Models | Languages |
|---|---|---|
| deepgram | nova-3, nova-2 | multi, en, hi, ta, te, etc. |
Text-to-Speech (TTS)
{
  "tts": {
    "provider": "cartesia",
    "model": "sonic-3",
    "voice_id": "{{user.voice_id}}",
    "options": {
      "emotion": ["Affectionate"],
      "speed": 1.0
    }
  }
}
| Provider | Models | Features |
|---|---|---|
| cartesia | sonic-3 | Multiple Indian language voices, emotion control |
The voice_id can be a template to select voice based on user language.
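One way to feed that template is for the backend to choose a voice per language and pass it in `user_context`, so that `{{user.voice_id}}` resolves at dispatch time. The sketch below assumes that approach. The `build_user_context` helper and every voice ID in it are hypothetical placeholders, not real provider IDs.

```python
# Hypothetical placeholder IDs: replace with real voice IDs from your TTS provider.
VOICE_BY_LANGUAGE = {
    "hi": "voice-id-for-hindi",
    "en": "voice-id-for-english",
    "ta": "voice-id-for-tamil",
}
# Assumption: languages without a dedicated voice reuse the Hindi voice.
FALLBACK_VOICE = "voice-id-for-hindi"


def build_user_context(name: str, language: str) -> dict:
    """Build a user_context whose voice_id matches the user's language."""
    return {
        "name": name,
        "language": language,
        "voice_id": VOICE_BY_LANGUAGE.get(language, FALLBACK_VOICE),
    }


print(build_user_context("Rahul", "ta")["voice_id"])  # voice-id-for-tamil
```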
Language Model (LLM)
{
  "llm": {
    "provider": "gemini",
    "model": "gemini-2.0-flash-exp",
    "temperature": 0.5,
    "max_tokens": 1024
  }
}
| Provider | Models |
|---|---|
| gemini | gemini-2.0-flash-exp, gemini-1.5-pro |
| openai | gpt-4o, gpt-4o-mini |
Voice Activity Detection (VAD)
{
  "vad": {
    "provider": "silero",
    "activation_threshold": 0.5,
    "min_speech_duration_ms": 50,
    "min_silence_duration_ms": 150
  }
}
Tools
Tools are functions the agent can invoke during a conversation. They are defined as an array of tool names:
{
  "tools": ["collect_meal_info", "log_skipped_meal", "finish_call"]
}
The agent can only use tools that are:
- Listed in its tools array
- Present in the tenant's tool library
See Tools for creating tools.
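The two conditions above amount to an intersection of the agent's `tools` array with the tenant's library. A minimal sketch follows, assuming the library can be represented as a set of tool names; the `usable_tools` and `missing_tools` helpers are hypothetical and only illustrate the rule.

```python
def usable_tools(agent_tools: list[str], tenant_library: set[str]) -> list[str]:
    """Keep the tools the agent lists that also exist in the tenant's library."""
    return [name for name in agent_tools if name in tenant_library]


def missing_tools(agent_tools: list[str], tenant_library: set[str]) -> list[str]:
    """List configured tool names that the tenant's library does not contain."""
    return [name for name in agent_tools if name not in tenant_library]


agent_tools = ["collect_meal_info", "log_skipped_meal", "finish_call"]
tenant_library = {"collect_meal_info", "finish_call", "send_sms"}
print(usable_tools(agent_tools, tenant_library))   # ['collect_meal_info', 'finish_call']
print(missing_tools(agent_tools, tenant_library))  # ['log_skipped_meal']
```

A check like `missing_tools` can be useful before deploying a config, since a misspelled tool name would otherwise be excluded without any visible error.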
Handoffs
Agents can transfer conversations to other agents:
{
  "handoffs": [
    {
      "target_agent": "feedback-agent",
      "conditions": ["all_meals_logged"]
    }
  ]
}
Handoffs are typically triggered by tools using the handoff action type. See Workflows for multi-agent orchestration.
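To make the `conditions` field concrete, here is one possible reading sketched in Python. It assumes each condition names a boolean in `flags.*`, that every condition must be true for the handoff to apply, and that the first matching entry wins. None of this is confirmed by this page, and the `next_handoff` helper is hypothetical.

```python
from typing import Optional


def next_handoff(handoffs: list[dict], flags: dict[str, bool]) -> Optional[str]:
    """Return the first target agent whose conditions are all true, else None."""
    for handoff in handoffs:
        conditions = handoff.get("conditions", [])
        # Assumption: a missing flag counts as False; an empty list always matches.
        if all(flags.get(name, False) for name in conditions):
            return handoff["target_agent"]
    return None


handoffs = [{"target_agent": "feedback-agent", "conditions": ["all_meals_logged"]}]
print(next_handoff(handoffs, {"all_meals_logged": True}))  # feedback-agent
print(next_handoff(handoffs, {}))                          # None
```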
Session Configuration
Control conversation behavior:
{
  "session": {
    "min_endpointing_delay": 0.3,
    "auto_disconnect_timeout": 300,
    "max_tool_steps": 10
  }
}
| Property | Type | Description |
|---|---|---|
| min_endpointing_delay | number | Seconds to wait after user stops speaking (lower = faster response) |
| auto_disconnect_timeout | number | Seconds of inactivity before auto-disconnect |
| max_tool_steps | number | Maximum tool calls per conversation |
Key Inheritance
Agents inherit API keys in this order:
- Agent override - Keys specified in agent config
- Tenant BYOK - Keys in tenant's provider settings
- Platform default - Platform-level API keys
This allows tenants to bring their own keys while having platform defaults as fallback.
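The precedence order above can be sketched as a simple first-match lookup. The data shapes here are assumptions made for illustration: each source is modeled as a dictionary mapping a provider name to an API key, and the `resolve_api_key` helper is hypothetical.

```python
from typing import Optional


def resolve_api_key(
    provider: str,
    agent_keys: dict[str, str],
    tenant_keys: dict[str, str],
    platform_keys: dict[str, str],
) -> Optional[str]:
    """Return the first key found: agent override, then tenant BYOK, then platform."""
    for source in (agent_keys, tenant_keys, platform_keys):
        key = source.get(provider)
        if key:  # Skip missing or empty entries and try the next source.
            return key
    return None


platform = {"deepgram": "platform-dg-key", "gemini": "platform-gm-key"}
tenant = {"deepgram": "tenant-dg-key"}
print(resolve_api_key("deepgram", {}, tenant, platform))  # tenant-dg-key
print(resolve_api_key("gemini", {}, tenant, platform))    # platform-gm-key
```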