# Providers

Speech, language model, and voice providers for voice AI.
Providers are the AI services that power voice agents: speech-to-text (STT), text-to-speech (TTS), language models (LLM), and voice activity detection (VAD).
## Overview
```
┌─────────────────────────────────────────────────────────────────┐
│                         Voice Pipeline                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  User Speech    ┌─────┐   Text    ┌─────┐   Response   ┌─────┐  │
│  ─────────────► │ STT │ ────────► │ LLM │ ───────────► │ TTS │  │
│                 └─────┘           └─────┘              └─────┘  │
│                    │                                      │     │
│                    │   ┌─────┐                            │     │
│                    └───│ VAD │ (voice activity detection)─┘     │
│                        └─────┘                                  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

## Provider Types
### Speech-to-Text (STT)
Converts user speech to text.
| Provider | Models | Languages | Best For |
|---|---|---|---|
| `deepgram` | nova-3, nova-2 | multi, en, hi, ta, te, etc. | Real-time streaming, Indian languages |
| `sarvam` | sarvam-1 | hi, ta, te, bn, mr, gu, kn, ml | Indian languages |
Configuration:

```json
{
  "stt": {
    "provider": "deepgram",
    "model": "nova-3",
    "language": "multi"
  }
}
```

Options:
| Option | Type | Description |
|---|---|---|
| `language` | string | Language code, or `"multi"` for auto-detect |
| `model` | string | Model variant |
| `interim_results` | boolean | Stream partial results |
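With `interim_results` enabled, a provider emits a stream of partial transcripts followed by a finalized segment. A minimal sketch of consuming such a stream (the `TranscriptEvent` type below is a stand-in for illustration, not the actual provider SDK's event type):

```python
from dataclasses import dataclass


@dataclass
class TranscriptEvent:
    """Stand-in for an STT stream event: partial or final text."""
    text: str
    is_final: bool


def collect_transcript(events):
    """Keep only finalized segments; interim results just preview text."""
    final_segments = []
    for event in events:
        if event.is_final:
            final_segments.append(event.text)
        # An interim result could update a live caption here instead.
    return " ".join(final_segments)


events = [
    TranscriptEvent("hel", is_final=False),
    TranscriptEvent("hello there", is_final=True),
    TranscriptEvent("how are", is_final=False),
    TranscriptEvent("how are you", is_final=True),
]
print(collect_transcript(events))  # → hello there how are you
```

Interim events let the UI show live captions without committing text that the recognizer may still revise.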
### Text-to-Speech (TTS)
Converts agent responses to speech.
| Provider | Models | Features |
|---|---|---|
| `cartesia` | sonic-3 | Emotion control, fast streaming, Indian voices |
| `sarvam` | bulbul:v1 | Native Indian language voices |
Configuration:

```json
{
  "tts": {
    "provider": "cartesia",
    "model": "sonic-3",
    "voice_id": "a167e0f3-df7e-4e3e-a5b4-3b3a6f3b8b3a",
    "options": {
      "emotion": ["Affectionate"],
      "speed": 1.0
    }
  }
}
```

Options:
| Option | Type | Description |
|---|---|---|
| `voice_id` | string | Voice identifier (can be a template) |
| `emotion` | array | Emotion tags (Cartesia) |
| `speed` | number | Speech speed multiplier |
Language-specific voices:
```json
{
  "tts": {
    "provider": "cartesia",
    "voice_id": "default-voice-id",
    "voices": {
      "hi": "hindi-female-voice-id",
      "en": "english-female-voice-id",
      "ta": "tamil-female-voice-id"
    }
  }
}
```

### Language Model (LLM)
Powers the agent's intelligence and responses.
| Provider | Models | Features |
|---|---|---|
| `gemini` | gemini-2.0-flash-exp, gemini-1.5-pro | Fast, good multilingual support |
| `openai` | gpt-4o, gpt-4o-mini | Strong reasoning |
Configuration:

```json
{
  "llm": {
    "provider": "gemini",
    "model": "gemini-2.0-flash-exp",
    "temperature": 0.5,
    "max_tokens": 1024
  }
}
```

Options:
| Option | Type | Description |
|---|---|---|
| `temperature` | number | Response creativity (0.0–1.0) |
| `max_tokens` | number | Maximum response length |
### Voice Activity Detection (VAD)
Detects when the user starts/stops speaking.
| Provider | Features |
|---|---|
| `silero` | Fast, accurate, no API key needed |
Configuration:

```json
{
  "vad": {
    "provider": "silero",
    "activation_threshold": 0.5,
    "min_speech_duration_ms": 50,
    "min_silence_duration_ms": 150
  }
}
```

Options:
| Option | Type | Description |
|---|---|---|
| `activation_threshold` | number | Sensitivity (0.0–1.0) |
| `min_speech_duration_ms` | number | Minimum speech duration (ms) to register as speech |
| `min_silence_duration_ms` | number | Silence duration (ms) before the utterance is considered finished |
## Key Inheritance
Provider API keys follow an inheritance chain:
Priority: Agent Override → Tenant BYOK → Platform Default

```
┌─────────────────────────────────────────────────────────────────┐
│                         Key Resolution                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. Agent Override   →  Specific agent uses a different key     │
│          ↓                                                      │
│  2. Tenant BYOK      →  Tenant's own API keys                   │
│          ↓                                                      │
│  3. Platform Default →  Platform-wide API keys                  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

### Platform Keys (Environment)
Set via environment variables:
```bash
DEEPGRAM_API_KEY=xxx
CARTESIA_API_KEY=xxx
GOOGLE_API_KEY=xxx
OPENAI_API_KEY=xxx
```

### Tenant BYOK (Dashboard)
Tenants can bring their own keys via Dashboard → Settings → Provider Keys:
```json
{
  "provider_keys": {
    "deepgram": { "api_key": "tenant-deepgram-key" },
    "cartesia": { "api_key": "tenant-cartesia-key" },
    "gemini": { "api_key": "tenant-google-key" }
  }
}
```

### Agent Override
Agents can override specific providers:
```json
{
  "config": {
    "llm": {
      "provider": "openai",
      "model": "gpt-4o",
      "api_key": "agent-specific-openai-key"
    }
  }
}
```

## Provider Factory
The ProviderFactory creates provider instances at runtime:
```python
from agent_studio.worker.provider_factory import create_provider_factory

# Create factory with key chain
factory = create_provider_factory(
    platform_keys={"deepgram_api_key": "xxx", "cartesia_api_key": "xxx"},
    tenant_keys={"stt.deepgram": "tenant-key"},
    agent_keys={},
    default_language="hi",
)

# Create all providers for an agent
providers = factory.create_providers_for_agent(agent_config, language="hi")
# Returns: {"stt": DeepgramSTT, "tts": CartesiaTTS, "llm": GeminiLLM, "vad": SileroVAD}
```

### Key Lookup
```python
# Key lookup example for Deepgram STT
key = keys.get_key("stt", "deepgram")

# Checks in order:
# 1. agent["stt.deepgram"]        - Agent override
# 2. tenant["stt.deepgram"]       - Tenant BYOK
# 3. platform["deepgram_api_key"] - Platform default
```

## Adding New Providers
Providers are registered via decorators:
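A decorator-based registry of this shape is typically a classmethod that stores provider classes in a dict keyed by name. A minimal sketch of the idea (illustrative only; the real `ProviderRegistry` implementation may differ):

```python
class ProviderRegistry:
    """Toy dict-backed registry for STT provider classes."""

    _stt: dict[str, type] = {}

    @classmethod
    def register_stt(cls, name: str):
        """Decorator: register a provider class under `name`."""
        def decorator(provider_cls: type) -> type:
            cls._stt[name] = provider_cls
            return provider_cls  # class is returned unchanged
        return decorator

    @classmethod
    def get_stt(cls, name: str) -> type:
        try:
            return cls._stt[name]
        except KeyError:
            raise ValueError(f"Unknown provider: {name}") from None


@ProviderRegistry.register_stt("my_provider")
class MySTTProvider:
    pass


print(ProviderRegistry.get_stt("my_provider").__name__)  # → MySTTProvider
```

Because registration happens at import time, a provider module that is never imported never appears in the registry, which is why step 2 below matters.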
### 1. Create Provider Class
```python
# src/agent_studio/providers/stt/my_stt.py
from agent_studio.providers.base import STTProvider, BaseProviderConfig
from agent_studio.providers.registry import ProviderRegistry


@ProviderRegistry.register_stt("my_provider")
class MySTTProvider(STTProvider):
    """My custom STT provider."""

    def __init__(self, config: BaseProviderConfig):
        super().__init__(config)
        # Initialize with config.api_key

    def create(self, language: str, model: str | None = None, **kwargs):
        """Create LiveKit STT plugin instance."""
        # Return LiveKit-compatible STT plugin
        return MySTTPlugin(
            api_key=self.config.api_key,
            language=language,
            model=model or self.config.default_model,
        )
```

### 2. Register in `__init__.py`
```python
# src/agent_studio/providers/stt/__init__.py
from agent_studio.providers.stt.deepgram import DeepgramSTT
from agent_studio.providers.stt.my_stt import MySTTProvider  # Add import

__all__ = ["DeepgramSTT", "MySTTProvider"]
```

### 3. Use in Agent Config
```json
{
  "stt": {
    "provider": "my_provider",
    "model": "my-model",
    "language": "en"
  }
}
```

## Multi-Tenant Isolation
Each tenant's provider keys are completely isolated:
- Tenant A's Deepgram key is never used for Tenant B's calls
- Key decryption happens at runtime with tenant context
- No key caching across tenant boundaries
```
┌─────────────────────┐     ┌─────────────────────┐
│    Tenant A Call    │     │    Tenant B Call    │
├─────────────────────┤     ├─────────────────────┤
│ deepgram: key-A     │     │ deepgram: key-B     │
│ cartesia: key-A     │     │ cartesia: key-B     │
│ gemini:   (platform)│     │ gemini:   key-B     │
└─────────────────────┘     └─────────────────────┘
```

## Best Practices
### 1. Use BYOK for Production
Tenants should use their own API keys for:
- Cost attribution
- Rate limit isolation
- Compliance requirements
### 2. Select Providers Based on Language
```json
{
  "stt": {
    "provider": "deepgram",
    "language": "multi"
  },
  "tts": {
    "provider": "cartesia",
    "voices": {
      "hi": "hindi-voice",
      "ta": "tamil-voice"
    }
  }
}
```

### 3. Tune VAD for Use Case
```json
{
  "vad": {
    "activation_threshold": 0.3,
    "min_silence_duration_ms": 200
  }
}
```

- Lower threshold = more sensitive (picks up quiet speech, but also more background noise)
- Higher silence duration = longer pauses before response
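The two knobs interact like this: a frame counts as speech once its probability crosses `activation_threshold`, and the utterance ends once silence has persisted for `min_silence_duration_ms`. A toy sketch over per-frame probabilities (a hypothetical helper for intuition, not Silero's actual API):

```python
def end_of_utterance(probs, frame_ms=10,
                     activation_threshold=0.5,
                     min_silence_duration_ms=150):
    """Return the time (ms) at which the utterance is considered
    finished (silence has lasted min_silence_duration_ms), or None."""
    silence_ms = 0
    speaking = False
    for i, p in enumerate(probs):
        if p >= activation_threshold:
            speaking = True     # frame counts as speech
            silence_ms = 0      # any speech resets the silence timer
        elif speaking:
            silence_ms += frame_ms
            if silence_ms >= min_silence_duration_ms:
                return (i + 1) * frame_ms
    return None


# 30 frames (300 ms) of speech, then silence: the utterance is
# declared finished 150 ms after the last speech frame.
probs = [0.9] * 30 + [0.1] * 40
print(end_of_utterance(probs))  # → 450
```

Raising `min_silence_duration_ms` shifts the end-of-utterance point later, which is exactly the "longer pauses before response" trade-off described above.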
### 4. Match LLM to Task Complexity
| Use Case | Recommended |
|---|---|
| Simple Q&A, meal logging | gemini-2.0-flash-exp |
| Complex reasoning | gpt-4o |
| Cost-sensitive | gpt-4o-mini |
## Troubleshooting
### "No API key found for provider"
Check the key inheritance chain in order:
1. Does the agent config include an API key?
2. Has the tenant configured BYOK?
3. Is the platform environment variable set?
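These checks mirror the key-inheritance chain. As a sketch (the dict names follow the docs above, but `get_key` here is illustrative, not the actual implementation):

```python
def get_key(kind, provider, agent, tenant, platform):
    """Resolve an API key: agent override, then tenant BYOK,
    then platform default. Returns None if nothing is configured."""
    scoped = f"{kind}.{provider}"
    if scoped in agent:
        return agent[scoped]                    # 1. agent override
    if scoped in tenant:
        return tenant[scoped]                   # 2. tenant BYOK
    return platform.get(f"{provider}_api_key")  # 3. platform default


platform = {"deepgram_api_key": "platform-key"}
tenant = {"stt.deepgram": "tenant-key"}

print(get_key("stt", "deepgram", {}, tenant, platform))  # → tenant-key
print(get_key("stt", "deepgram", {}, {}, platform))      # → platform-key
```

If all three lookups miss, the result is `None`, which is the condition that surfaces as this error.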
### "Unknown provider: xxx"
Provider not registered. Check:
- Provider class has the `@ProviderRegistry.register_xxx()` decorator
- Provider module is imported in `__init__.py`
### Voice doesn't match language
Check the `voices` mapping in the TTS config:

```json
{
  "tts": {
    "voices": {
      "hi": "correct-hindi-voice-id"
    }
  }
}
```
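Voice selection falls back from the language-specific `voices` map to the top-level `voice_id`, so a missing or misspelled language key silently produces the default voice. A minimal sketch of that lookup (hypothetical helper, mirroring the config shape above):

```python
def select_voice(tts_config, language):
    """Pick the voice for `language`, falling back to voice_id."""
    voices = tts_config.get("voices", {})
    return voices.get(language, tts_config.get("voice_id"))


tts = {
    "voice_id": "default-voice-id",
    "voices": {"hi": "hindi-voice-id", "ta": "tamil-voice-id"},
}

print(select_voice(tts, "hi"))  # → hindi-voice-id
print(select_voice(tts, "en"))  # → default-voice-id (no "en" entry)
```

If a call unexpectedly uses the default voice, the language code being resolved at runtime likely has no entry in `voices`.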