Agent Studio

Agents

Voice AI agent configuration, prompts, and behavior

An Agent is a voice AI entity that can have conversations with users. Each agent has its own personality (prompt), voice, language settings, and capabilities (tools).

Agent Configuration

{
  "name": "meal-coach-agent",
  "display_name": "Meal Coach Agent",
  "description": "Interactive meal logging agent",
  "config": {
    "prompt": {
      "system": "You are a friendly health coach...",
      "greeting": "Hi {{user.name}}! What did you have for {{workflow.first_pending_meal}}?",
      "greeting_interruptible": false,
      "variables": [
        { "name": "user.name", "default": "there" },
        { "name": "user.language", "default": "hi" },
        { "name": "user.language_name", "default": "Hindi" }
      ]
    },
    "stt": {
      "provider": "deepgram",
      "model": "nova-3",
      "language": "multi"
    },
    "tts": {
      "provider": "cartesia",
      "model": "sonic-3",
      "voice_id": "{{user.voice_id}}",
      "options": {
        "emotion": ["Affectionate"],
        "speed": 1.0
      }
    },
    "llm": {
      "provider": "gemini",
      "model": "gemini-2.0-flash-exp",
      "temperature": 0.5
    },
    "vad": {
      "provider": "silero",
      "activation_threshold": 0.5
    },
    "tools": ["collect_meal_info", "log_skipped_meal", "finish_call"],
    "handoffs": [
      { "target_agent": "feedback-agent", "conditions": ["all_meals_logged"] }
    ],
    "session": {
      "min_endpointing_delay": 0.3,
      "auto_disconnect_timeout": 300
    },
    "languages": ["hi", "en", "ta", "te", "bn", "mr", "gu", "kn", "ml", "ur"],
    "default_language": "hi"
  }
}

Prompt System

The prompt system controls what the agent says and how it behaves.

System Prompt

The core personality and instructions for the agent. Supports template variables:

You are Tap Health Coach, a friendly health coach.

LANGUAGE: Speak ONLY in {{user.language_name}}. All responses must be in {{user.language_name}}.

CONTEXT:
- User: {{user.name}} | Time: {{workflow.current_time}}
- Pending meals: {{workflow.pending_meals_display}}
- Already logged: {{workflow.logged_meals_display}}

INSTRUCTIONS:
...

Greeting

The first thing the agent says when activated. This is spoken via TTS before the LLM engages.

{
  "greeting": "Hi {{user.name}}! Your coach from Tap Health here.",
  "greeting_interruptible": false
}
| Property | Type | Description |
| --- | --- | --- |
| greeting | string | Template string spoken at call start |
| greeting_interruptible | boolean | If false, the user cannot interrupt the greeting |

Variables

Variables define dynamic values that can be injected into prompts. They are resolved from the Call Context.

{
  "variables": [
    { "name": "user.name", "default": "there" },
    { "name": "user.language_name", "default": "Hindi" },
    { "name": "workflow.pending_meals", "default": [] }
  ]
}

Variable paths use dot notation:

  • user.* - User data from user_context when call is dispatched
  • workflow.* - Shared workflow state
  • flags.* - Boolean flags set by tools

Multilingual Support

Agents can support multiple languages. The language is determined by user.language in the call context.

Language Configuration

{
  "languages": ["hi", "en", "ta", "te", "bn", "mr", "gu", "kn", "ml", "ur"],
  "default_language": "hi"
}
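Selection between languages and default_language can be sketched as follows (a minimal illustration, not the platform's internal logic): the user's requested language is used when it appears in the supported list, otherwise the default applies.

```python
# Mirrors the "languages" and "default_language" fields of the agent config above.
SUPPORTED_LANGUAGES = {"hi", "en", "ta", "te", "bn", "mr", "gu", "kn", "ml", "ur"}
DEFAULT_LANGUAGE = "hi"

def resolve_language(user_context: dict) -> str:
    """Pick the call language: the user's preference if supported, else the default."""
    requested = user_context.get("language")
    return requested if requested in SUPPORTED_LANGUAGES else DEFAULT_LANGUAGE

resolve_language({"language": "ta"})  # → "ta"
resolve_language({"language": "fr"})  # → "hi" (unsupported, falls back)
resolve_language({})                  # → "hi" (no preference, falls back)
```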

Language in Prompts

Include language instructions in your system prompt:

LANGUAGE: Speak ONLY in {{user.language_name}}. All responses must be in {{user.language_name}}.

- If Hindi: Use natural spoken Hindi (Hinglish mix is fine)
- If English: Use simple, warm English
- For regional languages: Use natural spoken form

Language-Specific Greetings

The backend can provide pre-computed greetings in the user's language:

# Backend dispatches the call with a language-specific greeting.
# (Assumes `client` is an async HTTP client, e.g. httpx.AsyncClient,
# configured with the platform's base URL and credentials.)
await client.post("/api/v1/calls", json={
    "workflow_slug": "meal-logging",
    "user_context": {
        "name": "Rahul",
        "language": "hi",
        "language_name": "Hindi",
        "greeting": "Namaste Rahul! Aaj breakfast mein kya khaya?"
    }
})

Then in agent config:

{
  "greeting": "{{user.greeting}}"
}
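If the backend does not supply user.greeting, the {{user.greeting}} template has nothing to resolve. One way to guard against that on the backend, sketched here with a hypothetical pick_greeting helper and made-up fallback strings, is to prefer the pre-computed greeting and otherwise fall back to a per-language default:

```python
# Hypothetical per-language fallbacks; a backend-provided greeting always wins.
FALLBACK_GREETINGS = {
    "hi": "Namaste {name}! Aaj kya khaya?",
    "en": "Hi {name}! What did you eat today?",
}

def pick_greeting(user_context: dict) -> str:
    """Prefer the backend-provided greeting; otherwise use a per-language fallback."""
    if user_context.get("greeting"):
        return user_context["greeting"]
    template = FALLBACK_GREETINGS.get(user_context.get("language", ""), FALLBACK_GREETINGS["en"])
    return template.format(name=user_context.get("name", "there"))

pick_greeting({"name": "Rahul", "language": "hi",
               "greeting": "Namaste Rahul! Aaj breakfast mein kya khaya?"})
# → "Namaste Rahul! Aaj breakfast mein kya khaya?"
pick_greeting({"name": "Rahul", "language": "hi"})
# → "Namaste Rahul! Aaj kya khaya?"
```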

See Multilingual Agents Guide for detailed setup.


Provider Configuration

Each agent specifies which AI providers to use:

Speech-to-Text (STT)

{
  "stt": {
    "provider": "deepgram",
    "model": "nova-3",
    "language": "multi"
  }
}
| Provider | Models | Languages |
| --- | --- | --- |
| deepgram | nova-3, nova-2 | multi, en, hi, ta, te, etc. |

Text-to-Speech (TTS)

{
  "tts": {
    "provider": "cartesia",
    "model": "sonic-3",
    "voice_id": "{{user.voice_id}}",
    "options": {
      "emotion": ["Affectionate"],
      "speed": 1.0
    }
  }
}
| Provider | Models | Features |
| --- | --- | --- |
| cartesia | sonic-3 | Multiple Indian language voices, emotion control |

The voice_id can itself be a template, so the voice can be selected based on the user's language.
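One way the backend can populate that template, sketched here with made-up voice IDs (real Cartesia voice IDs come from your voice library), is a simple language-to-voice map set on the user context before the call is dispatched:

```python
# Hypothetical voice IDs keyed by language code; replace with real ones.
VOICE_BY_LANGUAGE = {
    "hi": "voice-hindi-warm",
    "en": "voice-english-warm",
    "ta": "voice-tamil-warm",
}

def voice_for_user(user_context: dict, fallback: str = "voice-hindi-warm") -> str:
    """Choose a voice_id for the user's language, falling back when unmapped."""
    return VOICE_BY_LANGUAGE.get(user_context.get("language", ""), fallback)

user_context = {"name": "Rahul", "language": "ta"}
user_context["voice_id"] = voice_for_user(user_context)
# The agent's "voice_id": "{{user.voice_id}}" template then resolves to "voice-tamil-warm".
```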

Language Model (LLM)

{
  "llm": {
    "provider": "gemini",
    "model": "gemini-2.0-flash-exp",
    "temperature": 0.5,
    "max_tokens": 1024
  }
}
| Provider | Models |
| --- | --- |
| gemini | gemini-2.0-flash-exp, gemini-1.5-pro |
| openai | gpt-4o, gpt-4o-mini |

Voice Activity Detection (VAD)

{
  "vad": {
    "provider": "silero",
    "activation_threshold": 0.5,
    "min_speech_duration_ms": 50,
    "min_silence_duration_ms": 150
  }
}

Tools

Tools are functions the agent can invoke during conversation. Defined as an array of tool names:

{
  "tools": ["collect_meal_info", "log_skipped_meal", "finish_call"]
}

The agent can only use tools that are:

  1. Listed in its tools array
  2. Defined in the tenant's tool library

See Tools for creating tools.
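The two conditions above can be checked at configuration time. This is an illustrative validation sketch (the helper name and error shape are assumptions, not the platform's API):

```python
def usable_tools(agent_tools: list[str], library: set[str]) -> list[str]:
    """Return the tools the agent may invoke: listed in its config AND present in the library."""
    missing = [t for t in agent_tools if t not in library]
    if missing:
        raise ValueError(f"Tools not found in tenant library: {missing}")
    return agent_tools

# Tenant tool library may contain more tools than this agent lists.
library = {"collect_meal_info", "log_skipped_meal", "finish_call", "send_sms"}
usable_tools(["collect_meal_info", "log_skipped_meal", "finish_call"], library)
# → ["collect_meal_info", "log_skipped_meal", "finish_call"]
```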


Handoffs

Agents can transfer conversations to other agents:

{
  "handoffs": [
    {
      "target_agent": "feedback-agent",
      "conditions": ["all_meals_logged"]
    }
  ]
}

Handoffs are typically triggered by tools using the handoff action type. See Workflows for multi-agent orchestration.


Session Configuration

Control conversation behavior:

{
  "session": {
    "min_endpointing_delay": 0.3,
    "auto_disconnect_timeout": 300,
    "max_tool_steps": 10
  }
}
| Property | Type | Description |
| --- | --- | --- |
| min_endpointing_delay | number | Seconds to wait after the user stops speaking (lower = faster response) |
| auto_disconnect_timeout | number | Seconds of inactivity before auto-disconnect |
| max_tool_steps | number | Maximum tool calls per conversation |

Key Inheritance

Agents inherit API keys in this order:

  1. Agent override - Keys specified in agent config
  2. Tenant BYOK - Keys in tenant's provider settings
  3. Platform default - Platform-level API keys

This allows tenants to bring their own keys while having platform defaults as fallback.
