Agent Studio

Providers

Speech, language model, and voice providers for voice AI

Providers are the AI services that power voice agents: speech-to-text (STT), text-to-speech (TTS), language models (LLM), and voice activity detection (VAD).

Overview

┌─────────────────────────────────────────────────────────────────┐
│                         Voice Pipeline                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│   User Speech   ┌─────┐   Text   ┌─────┐   Response   ┌─────┐  │
│  ────────────►  │ STT │ ───────► │ LLM │ ───────────► │ TTS │  │
│                 └─────┘          └─────┘              └─────┘  │
│                    │                                     │      │
│                    │ ┌─────┐                            │      │
│                    └─│ VAD │ (voice activity detection)─┘      │
│                      └─────┘                                    │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
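The pipeline above can be sketched as a single conversational turn. This is a minimal illustration, not the actual runtime: `stt`, `llm`, and `tts` are hypothetical stand-ins for the real provider plugins, each exposing one assumed method.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    transcript: str      # what STT heard
    response_text: str   # what the LLM replied
    audio: bytes         # what TTS synthesized

def run_turn(audio_in: bytes, stt, llm, tts) -> Turn:
    # VAD has already segmented the utterance before this point.
    transcript = stt.transcribe(audio_in)      # user speech -> text
    response_text = llm.complete(transcript)   # text -> response text
    audio_out = tts.synthesize(response_text)  # response text -> speech
    return Turn(transcript, response_text, audio_out)
```

In the real pipeline each stage is streaming; this sketch collapses the streams into one request/response hop per stage to show the data flow.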

Provider Types

Speech-to-Text (STT)

Converts user speech to text.

Provider   Models           Languages                        Best For
deepgram   nova-3, nova-2   multi, en, hi, ta, te, etc.      Real-time streaming, Indian languages
sarvam     sarvam-1         hi, ta, te, bn, mr, gu, kn, ml   Indian languages

Configuration:

{
  "stt": {
    "provider": "deepgram",
    "model": "nova-3",
    "language": "multi"
  }
}

Options:

Option            Type      Description
language          string    Language code or "multi" for auto-detect
model             string    Model variant
interim_results   boolean   Stream partial results

Text-to-Speech (TTS)

Converts agent responses to speech.

Provider   Models      Features
cartesia   sonic-3     Emotion control, fast streaming, Indian voices
sarvam     bulbul:v1   Native Indian language voices

Configuration:

{
  "tts": {
    "provider": "cartesia",
    "model": "sonic-3",
    "voice_id": "a167e0f3-df7e-4e3e-a5b4-3b3a6f3b8b3a",
    "options": {
      "emotion": ["Affectionate"],
      "speed": 1.0
    }
  }
}

Options:

Option     Type     Description
voice_id   string   Voice identifier (can be a template)
emotion    array    Emotion tags (Cartesia)
speed      number   Speech speed multiplier

Language-specific voices:

{
  "tts": {
    "provider": "cartesia",
    "voice_id": "default-voice-id",
    "voices": {
      "hi": "hindi-female-voice-id",
      "en": "english-female-voice-id",
      "ta": "tamil-female-voice-id"
    }
  }
}
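The per-language mapping above is resolved at call time: if the active language has an entry in `voices`, that voice wins; otherwise the top-level `voice_id` is the fallback. A minimal sketch of that lookup (the helper name `resolve_voice_id` is illustrative, not part of the actual API):

```python
def resolve_voice_id(tts_config: dict, language: str) -> str:
    """Pick the per-language voice if one is mapped, else the default voice_id."""
    voices = tts_config.get("voices", {})
    return voices.get(language, tts_config["voice_id"])
```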

Language Model (LLM)

Powers the agent's intelligence and responses.

Provider   Models                                 Features
gemini     gemini-2.0-flash-exp, gemini-1.5-pro   Fast, good multilingual
openai     gpt-4o, gpt-4o-mini                    Strong reasoning

Configuration:

{
  "llm": {
    "provider": "gemini",
    "model": "gemini-2.0-flash-exp",
    "temperature": 0.5,
    "max_tokens": 1024
  }
}

Options:

Option       Type     Description
temperature  number   Response creativity (0.0-1.0)
max_tokens   number   Maximum response length

Voice Activity Detection (VAD)

Detects when the user starts/stops speaking.

Provider   Features
silero     Fast, accurate, no API key needed

Configuration:

{
  "vad": {
    "provider": "silero",
    "activation_threshold": 0.5,
    "min_speech_duration_ms": 50,
    "min_silence_duration_ms": 150
  }
}

Options:

Option                    Type     Description
activation_threshold      number   Sensitivity (0.0-1.0)
min_speech_duration_ms    number   Minimum speech to detect
min_silence_duration_ms   number   Silence before end of utterance
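To see how these three options interact, here is a toy hysteresis loop over per-frame speech probabilities (this is an illustration of the semantics, not Silero's implementation): speech must stay above the threshold for `min_speech_duration_ms` to open an utterance, and must stay below it for `min_silence_duration_ms` to close one.

```python
def detect_utterances(probs, frame_ms=10, activation_threshold=0.5,
                      min_speech_duration_ms=50, min_silence_duration_ms=150):
    """Return (start_ms, end_ms) utterance spans from per-frame speech probabilities."""
    utterances = []
    speaking = False
    run = 0    # consecutive frames on the "other" side of the threshold
    start = 0
    for i, p in enumerate(probs):
        if not speaking:
            run = run + 1 if p >= activation_threshold else 0
            if run * frame_ms >= min_speech_duration_ms:
                speaking = True
                start = (i + 1 - run) * frame_ms  # utterance began when the run began
                run = 0
        else:
            run = run + 1 if p < activation_threshold else 0
            if run * frame_ms >= min_silence_duration_ms:
                speaking = False
                utterances.append((start, (i + 1 - run) * frame_ms))
                run = 0
    if speaking:  # audio ended mid-utterance
        utterances.append((start, len(probs) * frame_ms))
    return utterances
```

Lowering `activation_threshold` makes the loop fire on quieter speech (and more noise); raising `min_silence_duration_ms` tolerates longer mid-sentence pauses before the utterance is closed.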

Key Inheritance

Provider API keys follow an inheritance chain:

Priority: Agent Override → Tenant BYOK → Platform Default
┌─────────────────────────────────────────────────────────────────┐
│                      Key Resolution                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  1. Agent Override     →  Specific agent uses different key     │
│          ↓                                                       │
│  2. Tenant BYOK        →  Tenant's own API keys                 │
│          ↓                                                       │
│  3. Platform Default   →  Platform-wide API keys                │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Platform Keys (Environment)

Set via environment variables:

DEEPGRAM_API_KEY=xxx
CARTESIA_API_KEY=xxx
GOOGLE_API_KEY=xxx
OPENAI_API_KEY=xxx

Tenant BYOK (Dashboard)

Tenants can bring their own keys via Dashboard → Settings → Provider Keys:

{
  "provider_keys": {
    "deepgram": { "api_key": "tenant-deepgram-key" },
    "cartesia": { "api_key": "tenant-cartesia-key" },
    "gemini": { "api_key": "tenant-google-key" }
  }
}

Agent Override

Agents can override specific providers:

{
  "config": {
    "llm": {
      "provider": "openai",
      "model": "gpt-4o",
      "api_key": "agent-specific-openai-key"
    }
  }
}

Provider Factory

The ProviderFactory creates provider instances at runtime:

from agent_studio.worker.provider_factory import create_provider_factory

# Create factory with key chain
factory = create_provider_factory(
    platform_keys={"deepgram_api_key": "xxx", "cartesia_api_key": "xxx"},
    tenant_keys={"stt.deepgram": "tenant-key"},
    agent_keys={},
    default_language="hi"
)

# Create all providers for an agent
providers = factory.create_providers_for_agent(agent_config, language="hi")
# Returns: {"stt": DeepgramSTT, "tts": CartesiaTTS, "llm": GeminiLLM, "vad": SileroVAD}

Key Lookup

# Key lookup example for Deepgram STT
key = keys.get_key("stt", "deepgram")

# Checks in order:
# 1. agent["stt.deepgram"]      - Agent override
# 2. tenant["stt.deepgram"]     - Tenant BYOK
# 3. platform["deepgram_api_key"] - Platform default
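The three-step lookup in the comments above can be written out as a small resolver. This is a minimal sketch of the inheritance semantics under the key-naming conventions shown ("stt.deepgram" for scoped keys, "deepgram_api_key" for platform keys); the real key-chain class may differ.

```python
class KeyChain:
    """Resolve provider keys: agent override -> tenant BYOK -> platform default."""

    def __init__(self, platform: dict, tenant: dict, agent: dict):
        self.platform = platform
        self.tenant = tenant
        self.agent = agent

    def get_key(self, kind: str, provider: str):
        scoped = f"{kind}.{provider}"           # e.g. "stt.deepgram"
        if scoped in self.agent:
            return self.agent[scoped]           # 1. agent override
        if scoped in self.tenant:
            return self.tenant[scoped]          # 2. tenant BYOK
        return self.platform.get(f"{provider}_api_key")  # 3. platform default
```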

Adding New Providers

Providers are registered via decorators:

1. Create Provider Class

# src/agent_studio/providers/stt/my_stt.py

from agent_studio.providers.base import STTProvider, BaseProviderConfig
from agent_studio.providers.registry import ProviderRegistry

@ProviderRegistry.register_stt("my_provider")
class MySTTProvider(STTProvider):
    """My custom STT provider."""

    def __init__(self, config: BaseProviderConfig):
        super().__init__(config)
        # Initialize with config.api_key

    def create(self, language: str, model: str | None = None, **kwargs):
        """Create LiveKit STT plugin instance."""
        # Return LiveKit-compatible STT plugin
        return MySTTPlugin(
            api_key=self.config.api_key,
            language=language,
            model=model or self.config.default_model,
        )
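Under the hood, decorator-based registration can be as simple as a class-level dict keyed by provider name. This is an illustrative sketch, not the actual `ProviderRegistry` source; it shows why a missing import or decorator produces the "Unknown provider" error described under Troubleshooting.

```python
class ProviderRegistry:
    """Minimal sketch of a decorator-based provider registry."""

    _stt: dict = {}

    @classmethod
    def register_stt(cls, name: str):
        def decorator(provider_cls):
            cls._stt[name] = provider_cls   # registration happens at import time
            return provider_cls             # class is returned unchanged
        return decorator

    @classmethod
    def get_stt(cls, name: str):
        try:
            return cls._stt[name]
        except KeyError:
            raise ValueError(f"Unknown provider: {name}") from None
```

Because registration runs when the module is imported, a provider that is never imported (step 2 below) is never registered, even if its decorator is correct.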

2. Register in __init__.py

# src/agent_studio/providers/stt/__init__.py

from agent_studio.providers.stt.deepgram import DeepgramSTT
from agent_studio.providers.stt.my_stt import MySTTProvider  # Add import

__all__ = ["DeepgramSTT", "MySTTProvider"]

3. Use in Agent Config

{
  "stt": {
    "provider": "my_provider",
    "model": "my-model",
    "language": "en"
  }
}

Multi-Tenant Isolation

Each tenant's provider keys are completely isolated:

  • Tenant A's Deepgram key is never used for Tenant B's calls
  • Key decryption happens at runtime with tenant context
  • No key caching across tenant boundaries

┌─────────────────────┐    ┌─────────────────────┐
│    Tenant A Call    │    │    Tenant B Call    │
├─────────────────────┤    ├─────────────────────┤
│ deepgram: key-A     │    │ deepgram: key-B     │
│ cartesia: key-A     │    │ cartesia: key-B     │
│ gemini: (platform)  │    │ gemini: key-B       │
└─────────────────────┘    └─────────────────────┘

Best Practices

1. Use BYOK for Production

Tenants should use their own API keys for:

  • Cost attribution
  • Rate limit isolation
  • Compliance requirements

2. Select Providers Based on Language

{
  "stt": {
    "provider": "deepgram",
    "language": "multi"
  },
  "tts": {
    "provider": "cartesia",
    "voices": {
      "hi": "hindi-voice",
      "ta": "tamil-voice"
    }
  }
}

3. Tune VAD for Use Case

{
  "vad": {
    "activation_threshold": 0.3,
    "min_silence_duration_ms": 200
  }
}
  • Lower threshold = more sensitive (noisy environments)
  • Higher silence duration = longer pauses before response

4. Match LLM to Task Complexity

Use Case                   Recommended
Simple Q&A, meal logging   gemini-2.0-flash-exp
Complex reasoning          gpt-4o
Cost-sensitive             gpt-4o-mini

Troubleshooting

"No API key found for provider"

Check key inheritance:

  1. Agent config has key?
  2. Tenant has BYOK configured?
  3. Platform env var set?

"Unknown provider: xxx"

Provider not registered. Check:

  1. Provider class has @ProviderRegistry.register_xxx() decorator
  2. Provider module imported in __init__.py

Voice doesn't match language

Check voices mapping in TTS config:

{
  "tts": {
    "voices": {
      "hi": "correct-hindi-voice-id"
    }
  }
}
