Agent Studio

Providers

Speech, language model, and voice providers for voice AI

Providers are the AI services that power voice agents: speech-to-text (STT), text-to-speech (TTS), language models (LLM), and voice activity detection (VAD).

Overview

┌─────────────────────────────────────────────────────────────────┐
│                         Voice Pipeline                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│   User Speech   ┌─────┐   Text   ┌─────┐   Response   ┌─────┐  │
│  ────────────►  │ STT │ ───────► │ LLM │ ───────────► │ TTS │  │
│                 └─────┘          └─────┘              └─────┘  │
│                    │                                     │      │
│                    │ ┌─────┐                            │      │
│                    └─│ VAD │ (voice activity detection)─┘      │
│                      └─────┘                                    │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
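The pipeline above can be sketched as a single conversational turn. This is a minimal illustration, not the actual runtime: `stt`, `llm`, and `tts` are hypothetical stand-ins for the real provider plugins, each exposing one assumed method.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    transcript: str      # what STT heard
    response_text: str   # what the LLM replied
    audio: bytes         # what TTS synthesized

def run_turn(audio_in: bytes, stt, llm, tts) -> Turn:
    # VAD has already segmented the utterance before this point.
    transcript = stt.transcribe(audio_in)      # user speech -> text
    response_text = llm.complete(transcript)   # text -> response text
    audio_out = tts.synthesize(response_text)  # response text -> speech
    return Turn(transcript, response_text, audio_out)
```

In the real pipeline each stage is streaming; this sketch collapses the streams into one request/response hop per stage to show the data flow.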

Provider Types

Speech-to-Text (STT)

Converts user speech to text.

Provider   Models           Languages                        Best For
deepgram   nova-3, nova-2   multi, en, hi, ta, te, etc.      Real-time streaming, Indian languages
sarvam     sarvam-1         hi, ta, te, bn, mr, gu, kn, ml   Indian languages

Configuration:

{
  "stt": {
    "provider": "deepgram",
    "model": "nova-3",
    "language": "multi"
  }
}

Options:

Option            Type      Description
language          string    Language code or "multi" for auto-detect
model             string    Model variant
interim_results   boolean   Stream partial results

Text-to-Speech (TTS)

Converts agent responses to speech.

Provider   Models      Features
cartesia   sonic-3     Emotion control, fast streaming, Indian voices
sarvam     bulbul:v1   Native Indian language voices

Configuration:

{
  "tts": {
    "provider": "cartesia",
    "model": "sonic-3",
    "voice_id": "a167e0f3-df7e-4e3e-a5b4-3b3a6f3b8b3a",
    "options": {
      "emotion": ["Affectionate"],
      "speed": 1.0
    }
  }
}

Options:

Option     Type     Description
voice_id   string   Voice identifier (can be a template)
emotion    array    Emotion tags (Cartesia)
speed      number   Speech speed multiplier

Language-specific voices:

{
  "tts": {
    "provider": "cartesia",
    "voice_id": "default-voice-id",
    "voices": {
      "hi": "hindi-female-voice-id",
      "en": "english-female-voice-id",
      "ta": "tamil-female-voice-id"
    }
  }
}
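The per-language mapping above is resolved at call time: if the active language has an entry in `voices`, that voice wins; otherwise the top-level `voice_id` is the fallback. A minimal sketch of that lookup (the helper name `resolve_voice_id` is illustrative, not part of the actual API):

```python
def resolve_voice_id(tts_config: dict, language: str) -> str:
    """Pick the per-language voice if one is mapped, else the default voice_id."""
    voices = tts_config.get("voices", {})
    return voices.get(language, tts_config["voice_id"])
```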

Language Model (LLM)

Powers the agent's intelligence and responses.

Provider   Models                                 Features
gemini     gemini-2.0-flash-exp, gemini-1.5-pro   Fast, good multilingual
openai     gpt-4o, gpt-4o-mini                    Strong reasoning

Configuration:

{
  "llm": {
    "provider": "gemini",
    "model": "gemini-2.0-flash-exp",
    "temperature": 0.5,
    "max_tokens": 1024
  }
}

Options:

Option       Type     Description
temperature  number   Response creativity (0.0-1.0)
max_tokens   number   Maximum response length

Voice Activity Detection (VAD)

Detects when the user starts/stops speaking.

Provider   Features
silero     Fast, accurate, no API key needed

Configuration:

{
  "vad": {
    "provider": "silero",
    "activation_threshold": 0.5,
    "min_speech_duration_ms": 50,
    "min_silence_duration_ms": 150
  }
}

Options:

Option                    Type     Description
activation_threshold      number   Sensitivity (0.0-1.0)
min_speech_duration_ms    number   Minimum speech to detect
min_silence_duration_ms   number   Silence before end of utterance
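To see how these three options interact, here is a toy hysteresis loop over per-frame speech probabilities (this is an illustration of the semantics, not Silero's implementation): speech must stay above the threshold for `min_speech_duration_ms` to open an utterance, and must stay below it for `min_silence_duration_ms` to close one.

```python
def detect_utterances(probs, frame_ms=10, activation_threshold=0.5,
                      min_speech_duration_ms=50, min_silence_duration_ms=150):
    """Return (start_ms, end_ms) utterance spans from per-frame speech probabilities."""
    utterances = []
    speaking = False
    run = 0    # consecutive frames on the "other" side of the threshold
    start = 0
    for i, p in enumerate(probs):
        if not speaking:
            run = run + 1 if p >= activation_threshold else 0
            if run * frame_ms >= min_speech_duration_ms:
                speaking = True
                start = (i + 1 - run) * frame_ms  # utterance began when the run began
                run = 0
        else:
            run = run + 1 if p < activation_threshold else 0
            if run * frame_ms >= min_silence_duration_ms:
                speaking = False
                utterances.append((start, (i + 1 - run) * frame_ms))
                run = 0
    if speaking:  # audio ended mid-utterance
        utterances.append((start, len(probs) * frame_ms))
    return utterances
```

Lowering `activation_threshold` makes the loop fire on quieter speech (and more noise); raising `min_silence_duration_ms` tolerates longer mid-sentence pauses before the utterance is closed.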

Key Inheritance

Provider API keys follow an inheritance chain:

Priority: Agent Override → Tenant BYOK → Platform Default
┌─────────────────────────────────────────────────────────────────┐
│                      Key Resolution                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  1. Agent Override     →  Specific agent uses different key     │
│          ↓                                                       │
│  2. Tenant BYOK        →  Tenant's own API keys                 │
│          ↓                                                       │
│  3. Platform Default   →  Platform-wide API keys                │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Platform Keys (Environment)

Set via environment variables:

DEEPGRAM_API_KEY=xxx
CARTESIA_API_KEY=xxx
GOOGLE_API_KEY=xxx
OPENAI_API_KEY=xxx

Tenant BYOK (Dashboard)

Tenants can bring their own keys via Dashboard → Settings → Provider Keys:

{
  "provider_keys": {
    "deepgram": { "api_key": "tenant-deepgram-key" },
    "cartesia": { "api_key": "tenant-cartesia-key" },
    "gemini": { "api_key": "tenant-google-key" }
  }
}

Agent Override

Agents can override specific providers:

{
  "config": {
    "llm": {
      "provider": "openai",
      "model": "gpt-4o",
      "api_key": "agent-specific-openai-key"
    }
  }
}

Provider Factory

The ProviderFactory creates provider instances at runtime:

from agent_studio.worker.provider_factory import create_provider_factory

# Create factory with key chain
factory = create_provider_factory(
    platform_keys={"deepgram_api_key": "xxx", "cartesia_api_key": "xxx"},
    tenant_keys={"stt.deepgram": "tenant-key"},
    agent_keys={},
    default_language="hi"
)

# Create all providers for an agent
providers = factory.create_providers_for_agent(agent_config, language="hi")
# Returns: {"stt": DeepgramSTT, "tts": CartesiaTTS, "llm": GeminiLLM, "vad": SileroVAD}

Key Lookup

# Key lookup example for Deepgram STT
key = keys.get_key("stt", "deepgram")

# Checks in order:
# 1. agent["stt.deepgram"]      - Agent override
# 2. tenant["stt.deepgram"]     - Tenant BYOK
# 3. platform["deepgram_api_key"] - Platform default
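The three-step lookup in the comments above can be written out as a small resolver. This is a minimal sketch of the inheritance semantics under the key-naming conventions shown ("stt.deepgram" for scoped keys, "deepgram_api_key" for platform keys); the real key-chain class may differ.

```python
class KeyChain:
    """Resolve provider keys: agent override -> tenant BYOK -> platform default."""

    def __init__(self, platform: dict, tenant: dict, agent: dict):
        self.platform = platform
        self.tenant = tenant
        self.agent = agent

    def get_key(self, kind: str, provider: str):
        scoped = f"{kind}.{provider}"           # e.g. "stt.deepgram"
        if scoped in self.agent:
            return self.agent[scoped]           # 1. agent override
        if scoped in self.tenant:
            return self.tenant[scoped]          # 2. tenant BYOK
        return self.platform.get(f"{provider}_api_key")  # 3. platform default
```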

Adding New Providers

Providers are registered via decorators:

1. Create Provider Class

# src/agent_studio/providers/stt/my_stt.py

from agent_studio.providers.base import STTProvider, BaseProviderConfig
from agent_studio.providers.registry import ProviderRegistry

@ProviderRegistry.register_stt("my_provider")
class MySTTProvider(STTProvider):
    """My custom STT provider."""

    def __init__(self, config: BaseProviderConfig):
        super().__init__(config)
        # Initialize with config.api_key

    def create(self, language: str, model: str | None = None, **kwargs):
        """Create LiveKit STT plugin instance."""
        # Return LiveKit-compatible STT plugin
        return MySTTPlugin(
            api_key=self.config.api_key,
            language=language,
            model=model or self.config.default_model,
        )
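Under the hood, decorator-based registration can be as simple as a class-level dict keyed by provider name. This is an illustrative sketch, not the actual `ProviderRegistry` source; it shows why a missing import or decorator produces the "Unknown provider" error described under Troubleshooting.

```python
class ProviderRegistry:
    """Minimal sketch of a decorator-based provider registry."""

    _stt: dict = {}

    @classmethod
    def register_stt(cls, name: str):
        def decorator(provider_cls):
            cls._stt[name] = provider_cls   # registration happens at import time
            return provider_cls             # class is returned unchanged
        return decorator

    @classmethod
    def get_stt(cls, name: str):
        try:
            return cls._stt[name]
        except KeyError:
            raise ValueError(f"Unknown provider: {name}") from None
```

Because registration runs when the module is imported, a provider that is never imported (step 2 below) is never registered, even if its decorator is correct.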

2. Register in __init__.py

# src/agent_studio/providers/stt/__init__.py

from agent_studio.providers.stt.deepgram import DeepgramSTT
from agent_studio.providers.stt.my_stt import MySTTProvider  # Add import

__all__ = ["DeepgramSTT", "MySTTProvider"]

3. Use in Agent Config

{
  "stt": {
    "provider": "my_provider",
    "model": "my-model",
    "language": "en"
  }
}

Multi-Tenant Isolation

Each tenant's provider keys are completely isolated:

  • Tenant A's Deepgram key is never used for Tenant B's calls
  • Key decryption happens at runtime with tenant context
  • No key caching across tenant boundaries

┌─────────────────────┐    ┌─────────────────────┐
│    Tenant A Call    │    │    Tenant B Call    │
├─────────────────────┤    ├─────────────────────┤
│ deepgram: key-A     │    │ deepgram: key-B     │
│ cartesia: key-A     │    │ cartesia: key-B     │
│ gemini: (platform)  │    │ gemini: key-B       │
└─────────────────────┘    └─────────────────────┘

Best Practices

1. Use BYOK for Production

Tenants should use their own API keys for:

  • Cost attribution
  • Rate limit isolation
  • Compliance requirements

2. Select Providers Based on Language

{
  "stt": {
    "provider": "deepgram",
    "language": "multi"
  },
  "tts": {
    "provider": "cartesia",
    "voices": {
      "hi": "hindi-voice",
      "ta": "tamil-voice"
    }
  }
}

3. Tune VAD for Use Case

{
  "vad": {
    "activation_threshold": 0.3,
    "min_silence_duration_ms": 200
  }
}
  • Lower threshold = more sensitive (noisy environments)
  • Higher silence duration = longer pauses before response

4. Match LLM to Task Complexity

Use Case                   Recommended
Simple Q&A, meal logging   gemini-2.0-flash-exp
Complex reasoning          gpt-4o
Cost-sensitive             gpt-4o-mini

Troubleshooting

"No API key found for provider"

Check key inheritance:

  1. Agent config has key?
  2. Tenant has BYOK configured?
  3. Platform env var set?

"Unknown provider: xxx"

Provider not registered. Check:

  1. Provider class has @ProviderRegistry.register_xxx() decorator
  2. Provider module imported in __init__.py

Voice doesn't match language

Check voices mapping in TTS config:

{
  "tts": {
    "voices": {
      "hi": "correct-hindi-voice-id"
    }
  }
}
