Multilingual Agents

Building voice agents that support multiple languages

Agent Studio supports building voice agents that can converse in multiple languages. This guide covers how to configure agents for multilingual support.

Overview

Multilingual support involves three components:

  1. Language Detection - Determining the user's preferred language
  2. Prompts - Instructions written to respond in the correct language
  3. Voice Selection - Choosing the right TTS voice for the language
┌─────────────────────────────────────────────────────────────────┐
│                  Multilingual Voice Call                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│   user_context.language = "ta"  (Tamil)                         │
│                   ↓                                              │
│   ┌─────────────────────────────────────────────────────────┐  │
│   │ Prompt: "Speak ONLY in Tamil..."                        │  │
│   └─────────────────────────────────────────────────────────┘  │
│                   ↓                                              │
│   ┌─────────────────────────────────────────────────────────┐  │
│   │ TTS: voices.ta = "tamil-female-voice-id"                │  │
│   └─────────────────────────────────────────────────────────┘  │
│                   ↓                                              │
│   Agent speaks in Tamil with Tamil voice                        │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Language Configuration

Supported Languages

Configure supported languages in the agent config:

{
  "config": {
    "languages": ["hi", "en", "ta", "te", "bn", "mr", "gu", "kn", "ml", "ur"],
    "default_language": "hi"
  }
}
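A backend may want to normalize whatever language code it has on file against this list before dispatching a call. The following is a minimal sketch of that idea. The helper name `resolve_language` and the normalization rules (lowercasing, dropping a region suffix such as `-IN`) are assumptions for illustration, not part of Agent Studio.

```python
from typing import Optional

# Mirrors the "languages" and "default_language" values in the config above.
SUPPORTED_LANGUAGES = ["hi", "en", "ta", "te", "bn", "mr", "gu", "kn", "ml", "ur"]
DEFAULT_LANGUAGE = "hi"


def resolve_language(code: Optional[str]) -> str:
    """Return a supported language code, falling back to the default."""
    if not code:
        return DEFAULT_LANGUAGE
    # Normalize values such as "ta-IN" or " EN " to a bare lowercase code.
    base = code.strip().lower().replace("_", "-").split("-")[0]
    return base if base in SUPPORTED_LANGUAGES else DEFAULT_LANGUAGE
```

With this in place, an unsupported or missing preference resolves to the default language instead of reaching the agent as an unknown code.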

Language Code Reference

Code    Language
hi      Hindi
en      English
ta      Tamil
te      Telugu
bn      Bengali
mr      Marathi
gu      Gujarati
kn      Kannada
ml      Malayalam
ur      Urdu

Backend Integration

The backend determines the user's language and provides it when dispatching calls.

Dispatch with Language

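# Backend dispatches call with user's language.
# Assumes `client` is an async HTTP client pointed at the Agent Studio API
# and that this code runs inside an async function.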
response = await client.post("/api/v1/calls", json={
    "workflow_slug": "meal-logging",
    "user_context": {
        "name": "Priya",
        "language": "ta",           # Tamil
        "language_name": "Tamil",   # Display name for prompts
        "voice_id": "tamil-female-voice-id"
    }
})

Dynamic Language Context

Provide all language-related context from the backend:

# In your backend application
def get_user_call_context(user_id: str) -> dict:
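    user = get_user(user_id)
    language = user.preferred_language or "hi"
    pending_meals = get_pending_meals(user_id)

    return {
        "name": user.name,
        "language": language,
        "language_name": LANGUAGE_NAMES[language],
        # generate_greeting takes the first pending meal as its third argument
        "greeting": generate_greeting(language, user.name, pending_meals[0]),
        "closing_message": get_closing_message(language)
    }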

# Language name mapping
LANGUAGE_NAMES = {
    "hi": "Hindi",
    "en": "English",
    "ta": "Tamil",
    "te": "Telugu",
    "bn": "Bengali",
    "mr": "Marathi",
    "gu": "Gujarati",
    "kn": "Kannada",
    "ml": "Malayalam",
    "ur": "Urdu",
}

Prompt Design

Language Instructions

Include explicit language instructions in your system prompt:

{
  "prompt": {
    "system": "You are Tap Health Coach, a friendly health coach.\n\nLANGUAGE: Speak ONLY in {{user.language_name}}. All responses must be in {{user.language_name}}.\n\n- If Hindi: Use natural spoken Hindi (Hinglish mix is fine)\n- If English: Use simple, warm English\n- For regional languages: Use natural spoken form\n\n..."
  }
}

Language-Specific Style Guide

LANGUAGE & VOICE STYLE:

If {{user.language_name}} is Hindi:
- Use natural spoken Hindi (Hinglish mix acceptable)
- Use familiar terms: roti, dal, sabzi, chai, paratha
- "Namaste!" / "Aap kaise ho?"

If {{user.language_name}} is Tamil:
- Use natural spoken Tamil
- Use familiar terms: idli, dosa, sambar, rasam
- "Vanakkam!" / "Eppadi irukkinga?"

If {{user.language_name}} is Telugu:
- Use natural spoken Telugu
- Use familiar terms: pesarattu, upma, pappu
- "Namaskaram!" / "Ela unnaru?"

General:
- Keep responses SHORT (8-12 words max)
- Warm, supportive tone
- Simple everyday language
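Rather than sending every language's rules on every call, the backend could assemble only the section that applies to the current user. The sketch below illustrates one way to do that under stated assumptions: `STYLE_HINTS`, `GENERAL_RULES`, and `build_style_guide` are hypothetical names, and the content is copied from the style guide above.

```python
# Per-language hints taken from the style guide above.
STYLE_HINTS = {
    "Hindi": {
        "form": "natural spoken Hindi (Hinglish mix acceptable)",
        "terms": ["roti", "dal", "sabzi", "chai", "paratha"],
        "phrases": ["Namaste!", "Aap kaise ho?"],
    },
    "Tamil": {
        "form": "natural spoken Tamil",
        "terms": ["idli", "dosa", "sambar", "rasam"],
        "phrases": ["Vanakkam!", "Eppadi irukkinga?"],
    },
    "Telugu": {
        "form": "natural spoken Telugu",
        "terms": ["pesarattu", "upma", "pappu"],
        "phrases": ["Namaskaram!", "Ela unnaru?"],
    },
}

GENERAL_RULES = [
    "Keep responses SHORT (8-12 words max)",
    "Warm, supportive tone",
    "Simple everyday language",
]


def build_style_guide(language_name: str) -> str:
    """Build the style section of the prompt for a single language."""
    lines = ["LANGUAGE & VOICE STYLE:"]
    hints = STYLE_HINTS.get(language_name)
    if hints:
        lines.append(f"- Use {hints['form']}")
        lines.append(f"- Use familiar terms: {', '.join(hints['terms'])}")
        lines.append(f"- Example phrases: {' / '.join(hints['phrases'])}")
    else:
        # No specific hints on file: fall back to the generic instruction.
        lines.append(f"- Use natural spoken {language_name}")
    lines.extend(f"- {rule}" for rule in GENERAL_RULES)
    return "\n".join(lines)
```

The result could be passed in `user_context` as a pre-computed string, which keeps the system prompt short and avoids showing the model rules for languages it should not use.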

Dynamic Greetings Based on User Type

The greeting content and its interruptibility depend on whether the user is new or returning.

New Users - Full Onboarding

New users receive a comprehensive welcome that explains the service. This greeting is non-interruptible.

# Full welcome script for new users
def get_welcome_script(name: str, language: str) -> str:
    lang_name = LANGUAGE_NAMES.get(language, "Hindi")
    return f"""IMPORTANT: Speak ONLY in {lang_name}.

Namaste {name}! Welcome to Tap Health.

I'm your nutrition coach, and I'll be calling you every day to help you log your meals.

Why is this important? What you eat directly affects your blood sugar. By logging your meals together, we can spot patterns and help you make better choices.

Here's how it works. Just tell me what you ate, how much, and around what time.

For example: 2 rotis with dal at 1 PM. Or 1 cup chai in the evening.

Now, let's log your meals for today."""

Returning Users - Quick Greeting

Returning users get a short greeting and can interrupt immediately:

# Short greetings for returning users
RETURNING_GREETINGS = {
    "hi": "Hi {name}! Aaj {meal} mein kya khaya?",
    "en": "Hi {name}! What did you have for {meal} today?",
    "ta": "வணக்கம் {name}! இன்று {meal} என்ன சாப்பிட்டீர்கள்?",
    "te": "హాయ్ {name}! ఈ రోజు {meal} ఏమి తిన్నారు?",
}

def generate_greeting(language: str, name: str, first_meal: str) -> str:
    template = RETURNING_GREETINGS.get(language, RETURNING_GREETINGS["hi"])
    return template.format(name=name, meal=first_meal)

Backend Determines Greeting Type

def get_user_call_context(user_id: str) -> dict:
    user = get_user(user_id)
    language = user.preferred_language or "hi"
    pending_meals = get_pending_meals(user_id)
    is_new_user = user.call_count == 0
    
    if is_new_user:
        greeting = get_welcome_script(user.name, language)
        greeting_interruptible = False  # Full onboarding
    else:
        greeting = generate_greeting(language, user.name, pending_meals[0])
        greeting_interruptible = True   # Can interrupt
    
    return {
        "name": user.name,
        "is_new_user": is_new_user,
        "greeting": greeting,
        "greeting_interruptible": greeting_interruptible,
        # ... other fields
    }

Agent Config

{
  "prompt": {
    "greeting": "{{user.greeting}}",
    "greeting_interruptible": "{{user.greeting_interruptible}}"
  }
}
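To reason about what the agent ends up saying, it can help to picture how a `{{user.*}}` placeholder is filled from `user_context`. The sketch below is an illustrative stand-in, not Agent Studio's actual renderer: `render_template` is a hypothetical helper, and the rule of leaving unknown placeholders untouched is an assumption. How the platform converts the rendered `greeting_interruptible` value to a boolean is not covered here.

```python
import re
from typing import Optional

# Matches placeholders such as {{user.greeting}} or {{ user.language_name }}.
PLACEHOLDER = re.compile(r"\{\{\s*user\.(\w+)\s*\}\}")


def render_template(
    template: str, user_context: dict, defaults: Optional[dict] = None
) -> str:
    """Fill {{user.<key>}} placeholders from user_context, then from defaults."""
    defaults = defaults or {}

    def substitute(match: "re.Match") -> str:
        key = match.group(1)
        if key in user_context:
            return str(user_context[key])
        if key in defaults:
            return str(defaults[key])
        # Leave unresolved placeholders visible so they are easy to spot.
        return match.group(0)

    return PLACEHOLDER.sub(substitute, template)
```

For example, `render_template("{{user.greeting}}", {"greeting": "Hi Priya!"})` returns `"Hi Priya!"`, while a key missing from both dictionaries stays in the output as the literal placeholder.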

Voice Configuration

Language-Specific Voices

Configure TTS with voice mappings per language:

{
  "tts": {
    "provider": "cartesia",
    "model": "sonic-3",
    "voice_id": "a167e0f3-df7e-4e3e-a5b4-3b3a6f3b8b3a",
    "voices": {
      "hi": "4cda7a46-e225-445d-8b18-0519d58ce5c0",
      "en": "b7d50908-b17c-442d-ad8d-810c63997ed9",
      "ta": "d4b3c8a2-f123-4567-8901-abcdef123456",
      "te": "e5c4d9b3-g234-5678-9012-bcdefg234567"
    }
  }
}

Dynamic Voice Selection

The voice_id can also be a template:

{
  "tts": {
    "voice_id": "{{user.voice_id}}"
  }
}

The backend provides the voice:

user_context = {
    "voice_id": get_voice_for_language(user.preferred_language or "hi")
}
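This guide calls `get_voice_for_language` but does not define it. One plausible shape is a dictionary lookup with a default, sketched below. The voice IDs are the placeholder values used in this guide's complete example, and the fallback behavior is an assumption.

```python
# Placeholder voice IDs; replace with real IDs from your TTS provider.
VOICES = {
    "hi": "hindi-voice-id",
    "en": "english-voice-id",
    "ta": "tamil-voice-id",
}
DEFAULT_VOICE_ID = "default-voice"


def get_voice_for_language(language: str) -> str:
    """Return the voice mapped to a language, or the default voice."""
    return VOICES.get(language, DEFAULT_VOICE_ID)
```

Falling back to a default voice keeps the call working, but the default may not pronounce the target language well, so it is worth logging whenever the fallback is used.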

STT Configuration

Multi-Language Detection

Use multilingual models for automatic language detection:

{
  "stt": {
    "provider": "deepgram",
    "model": "nova-3",
    "language": "multi"
  }
}

Fixed Language

For single-language agents:

{
  "stt": {
    "provider": "deepgram",
    "model": "nova-3",
    "language": "hi"
  }
}
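The choice between the two STT configurations above can be derived from the agent's language list. The following sketch assumes a hypothetical helper, `build_stt_config`, that picks `"multi"` whenever more than one language is configured. The provider and model values are copied from the examples above.

```python
from typing import List


def build_stt_config(languages: List[str]) -> dict:
    """Return a fixed-language STT config for one language, else multilingual."""
    if not languages:
        raise ValueError("at least one language is required")
    # One language: pin it. Several: let the model detect the language.
    language = languages[0] if len(languages) == 1 else "multi"
    return {"provider": "deepgram", "model": "nova-3", "language": language}
```

For instance, `build_stt_config(["hi"])` pins recognition to Hindi, and `build_stt_config(["hi", "en", "ta"])` selects multilingual detection.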

Localized Messages

Closing Messages

Pre-compute localized system messages:

CLOSING_MESSAGES = {
    "hi": "आज लॉग करने के लिए धन्यवाद! जैसे ही आपका भोजन फीडबैक तैयार होगा, आपको नोटिफिकेशन मिलेगी!",
    "en": "Thanks for logging today! You'll receive a notification once your meal feedback is ready!",
    "ta": "இன்று பதிவு செய்ததற்கு நன்றி! உங்கள் உணவு பின்னூட்டம் தயாரானதும் அறிவிப்பு வரும்!",
    "te": "ఈ రోజు లాగ్ చేసినందుకు ధన్యవాదాలు! మీ భోజన ఫీడ్‌బ్యాక్ సిద్ధమైనప్పుడు మీకు నోటిఫికేషన్ వస్తుంది!",
    "bn": "আজ লগ করার জন্য ধন্যবাদ! আপনার খাবারের ফিডব্যাক তৈরি হলে আপনি নোটিফিকেশন পাবেন!",
}

def get_closing_message(language: str) -> str:
    return CLOSING_MESSAGES.get(language, CLOSING_MESSAGES["hi"])

In Prompts

CALL END SCRIPT:
When all meals are logged, say exactly:
"{{user.closing_message}}"

Complete Example

Single Meal Coach Agent

A single meal-coach-agent handles both new and returning users. The backend determines the appropriate greeting and interruptibility.

Agent Configuration

{
  "name": "meal-coach-agent",
  "config": {
    "prompt": {
      "system": "You are Tap Health Coach.\n\nLANGUAGE: Speak ONLY in {{user.language_name}}. All responses in {{user.language_name}}.\n\nCONTEXT:\n- User: {{user.name}}\n- Pending meals: {{user.pending_meals_display}}\n- Logged meals: {{user.logged_meals_display}}\n\nVOICE STYLE:\n- Warm, friendly\n- Short sentences (8-12 words)\n- Use familiar food terms",
      "greeting": "{{user.greeting}}",
      "greeting_interruptible": "{{user.greeting_interruptible}}",
      "variables": [
        { "name": "user.name", "default": "there" },
        { "name": "user.language_name", "default": "Hindi" },
        { "name": "user.greeting", "default": "Namaste!" },
        { "name": "user.greeting_interruptible", "default": true }
      ]
    },
    "stt": {
      "provider": "deepgram",
      "model": "nova-3",
      "language": "multi"
    },
    "tts": {
      "provider": "cartesia",
      "model": "sonic-3",
      "voice_id": "default-voice",
      "voices": {
        "hi": "hindi-voice-id",
        "en": "english-voice-id",
        "ta": "tamil-voice-id"
      }
    },
    "llm": {
      "provider": "gemini",
      "model": "gemini-2.0-flash-exp"
    },
    "languages": ["hi", "en", "ta", "te"],
    "default_language": "hi"
  }
}

Backend Call Dispatch

async def dispatch_meal_logging_call(user_id: str):
    user = await get_user(user_id)
    language = user.preferred_language or "hi"
    is_new_user = user.call_count == 0
    
    # Get pending/logged meals
    pending_meals = await get_pending_meals(user_id)
    logged_meals = await get_logged_meals(user_id)
    
    # Determine greeting based on user type
    if is_new_user:
        greeting = get_welcome_script(user.name, language)
        greeting_interruptible = False  # Full onboarding, don't interrupt
    else:
        # Assumes at least one pending meal; guard against an empty list in production
        greeting = generate_greeting(language, user.name, pending_meals[0])
        greeting_interruptible = True   # Quick greeting, can interrupt
    
    # Build language-aware context
    user_context = {
        "name": user.name,
        "language": language,
        "language_name": LANGUAGE_NAMES[language],
        "is_new_user": is_new_user,
        
        # Dynamic greeting with interruptibility
        "greeting": greeting,
        "greeting_interruptible": greeting_interruptible,
        
        # Pre-computed display strings
        "closing_message": get_closing_message(language),
        
        # Meal data
        "pending_meals": pending_meals,
        "pending_meals_display": ", ".join(pending_meals) or "none",
        "logged_meals": logged_meals,
        "logged_meals_display": ", ".join(logged_meals) or "none",
        
        # Voice selection
        "voice_id": get_voice_for_language(language)
    }
    
    response = await client.post("/api/v1/calls", json={
        "workflow_slug": "meal-logging",
        "user_id": user_id,
        "user_context": user_context
    })
    
    return response.json()

Best Practices

1. Pre-compute Critical Messages

Don't rely on the LLM to generate greetings or closings in the correct language. Pre-compute them:

user_context = {
    # first_meal is the first pending meal, which generate_greeting expects
    "greeting": generate_greeting(language, name, first_meal),
    "closing_message": get_closing_message(language),
    "error_message": get_error_message(language)
}

2. Use Strong Language Instructions

Be explicit in prompts:

LANGUAGE: Speak ONLY in {{user.language_name}}. 
This is MANDATORY. ALL responses must be in {{user.language_name}}.

3. Match Voice to Language

Always map voices to languages:

{
  "voices": {
    "hi": "hindi-voice-id",
    "ta": "tamil-voice-id"
  }
}

4. Use Multilingual STT

Let Deepgram auto-detect the spoken language:

{
  "stt": {
    "language": "multi"
  }
}

5. Test Each Language

Test every supported language for:

  • Greeting pronunciation
  • Response language consistency
  • Voice naturalness
  • Cultural appropriateness
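Pronunciation and naturalness need a human listener, but missing entries can be caught automatically before any call is placed. The sketch below is one possible check. `find_language_gaps` is a hypothetical helper, and the sample data mirrors the language codes used in this guide's examples.

```python
from typing import Collection, Dict, List


def find_language_gaps(
    languages: List[str],
    voices: Collection[str],
    greetings: Collection[str],
    closings: Collection[str],
) -> Dict[str, List[str]]:
    """Report which resources are missing for each supported language."""
    tables = (
        ("voice", voices),
        ("greeting", greetings),
        ("closing_message", closings),
    )
    gaps: Dict[str, List[str]] = {}
    for lang in languages:
        missing = [label for label, table in tables if lang not in table]
        if missing:
            gaps[lang] = missing
    return gaps


# Language codes present in this guide's complete example and message tables.
gaps = find_language_gaps(
    languages=["hi", "en", "ta", "te"],
    voices={"hi", "en", "ta"},
    greetings={"hi", "en", "ta", "te"},
    closings={"hi", "en", "ta", "te", "bn"},
)
print(gaps)  # {'te': ['voice']}
```

Because `in` works on dictionaries as well as sets, the real `voices`, greeting, and closing-message dictionaries can be passed directly.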

Multi-Tenant Considerations

Each tenant can have different:

  • Supported languages
  • Voice mappings
  • Greeting templates
  • Closing messages

Store tenant-specific language configs:

# Tenant settings
tenant.settings = {
    "supported_languages": ["hi", "en", "ta"],
    "default_language": "hi",
    "voice_mappings": {
        "hi": "tenant-hindi-voice",
        "en": "tenant-english-voice"
    },
    "greeting_templates": {
        "hi": "नमस्ते {name}! {company_name} से बोल रहे हैं।",
        "en": "Hi {name}! This is {company_name}."
    }
}
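Given settings shaped like the ones above, a tenant-specific greeting could be rendered as sketched below. `render_tenant_greeting` is a hypothetical helper, "Acme" is a made-up company name, and falling back to the tenant's default language is an assumption that mirrors the fallback pattern used for closing messages in this guide.

```python
def render_tenant_greeting(
    settings: dict, language: str, name: str, company_name: str
) -> str:
    """Render a tenant greeting, falling back to the tenant's default language."""
    templates = settings["greeting_templates"]
    default_language = settings["default_language"]
    template = templates.get(language, templates[default_language])
    return template.format(name=name, company_name=company_name)


settings = {
    "supported_languages": ["hi", "en", "ta"],
    "default_language": "hi",
    "greeting_templates": {
        "hi": "नमस्ते {name}! {company_name} से बोल रहे हैं।",
        "en": "Hi {name}! This is {company_name}.",
    },
}

print(render_tenant_greeting(settings, "en", "Priya", "Acme"))
# Hi Priya! This is Acme.
```

Note that Tamil is listed as supported here but has no template, so a Tamil user would receive the Hindi greeting. A check for such gaps at configuration time avoids that surprise.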

Troubleshooting

Agent responds in wrong language

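  1. Check that user.language_name is being passed in user_context
  2. Verify that the prompt has strong language instructions
  3. Check variable defaults (a default such as "Hindi" applies silently when the value is missing)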
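The first and third checks can be partly automated by comparing the placeholders in a prompt with the keys the backend actually sends. The sketch below assumes a hypothetical helper, `find_missing_variables`, and treats `None` or an empty string as missing.

```python
import re
from typing import List


def find_missing_variables(prompt: str, user_context: dict) -> List[str]:
    """List {{user.<key>}} placeholders with no usable value in user_context."""
    names = re.findall(r"\{\{\s*user\.(\w+)\s*\}\}", prompt)
    missing: List[str] = []
    for name in names:
        value = user_context.get(name)
        # False and 0 are valid values; only None and "" count as missing.
        if (value is None or value == "") and name not in missing:
            missing.append(name)
    return missing


prompt = "Speak ONLY in {{user.language_name}}. User: {{user.name}}"
print(find_missing_variables(prompt, {"name": "Priya"}))  # ['language_name']
```

Any name reported here will be filled from the variable defaults, which is a common reason for an agent answering in the default language.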

Voice doesn't match language

  1. Check voices mapping has the language
  2. Verify voice_id for that language is correct
  3. Check TTS provider supports the language

Greeting not in user's language

Use a pre-computed greeting instead of an LLM-generated one:

{
  "greeting": "{{user.greeting}}"
}

The backend provides the value (first_meal is the first pending meal):

user_context["greeting"] = generate_greeting(language, name, first_meal)
