Architecture Overview
High-level system architecture of Agent Studio
Agent Studio follows a clean architecture pattern with clear separation between domain logic, infrastructure, and presentation layers.
System Diagram
┌─────────────────────────────────────────────────────────────────────────────┐
│                                   CLIENTS                                   │
│   Dashboard (Next.js)   │      API Consumers      │    Phone (PSTN/SIP)     │
└─────────────────────────────────────────────────────────────────────────────┘
                                       │
                                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                             API LAYER (FastAPI)                             │
│  ┌───────────┐  ┌───────────┐  ┌───────────┐  ┌───────────┐  ┌───────────┐  │
│  │   Auth    │  │  Agents   │  │ Workflows │  │   Tools   │  │   Calls   │  │
│  │  Router   │  │  Router   │  │  Router   │  │  Router   │  │  Router   │  │
│  └───────────┘  └───────────┘  └───────────┘  └───────────┘  └───────────┘  │
│                                      │                                      │
│                       ┌──────────────┴──────────────┐                       │
│                       │         Middleware          │                       │
│                       │ Rate Limit │ Auth │ Logging │                       │
│                       └─────────────────────────────┘                       │
└─────────────────────────────────────────────────────────────────────────────┘
                                       │
             ┌─────────────────────────┼─────────────────────────┐
             ▼                         ▼                         ▼
┌─────────────────────────┐     ┌─────────────┐     ┌─────────────────────────┐
│       CORE DOMAIN       │     │   WORKER    │     │        PROVIDERS        │
│   ┌─────────────────┐   │     │  (LiveKit)  │     │   ┌─────────────────┐   │
│   │ Workflow Runner │   │     │             │     │   │ STT: Deepgram   │   │
│   │ Tool Executor   │   │     │   Handles   │     │   │      Sarvam     │   │
│   │ Context Manager │   │     │    Voice    │     │   ├─────────────────┤   │
│   │ Handoff Logic   │   │     │  Sessions   │     │   │ TTS: Cartesia   │   │
│   └─────────────────┘   │     │             │     │   │      Sarvam     │   │
└─────────────────────────┘     └─────────────┘     │   ├─────────────────┤   │
                                                    │   │ LLM: Gemini     │   │
                                                    │   │      OpenAI     │   │
                                                    │   └─────────────────┘   │
                                                    └─────────────────────────┘
                                       │
                                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                                 DATA LAYER                                  │
│          ┌─────────────┐      ┌─────────────┐      ┌─────────────┐          │
│          │ PostgreSQL  │      │    Redis    │      │   LiveKit   │          │
│          │  (Primary)  │      │   (Cache)   │      │  (WebRTC)   │          │
│          └─────────────┘      └─────────────┘      └─────────────┘          │
└─────────────────────────────────────────────────────────────────────────────┘
Component Responsibilities
API Layer (src/agent_studio/api/)
- HTTP request handling
- Authentication (JWT + API Keys)
- Rate limiting
- Request validation
- OpenAPI documentation
Core Domain (src/agent_studio/core/)
- Business logic (no framework dependencies)
- Workflow orchestration
- Tool execution engine
- Context management
- Handoff detection
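As a minimal sketch of what "no framework dependencies" can look like, the core can depend on a small interface that outer layers implement. The names `ToolPort`, `ToolExecutor`, and `EchoTool` are hypothetical and are not taken from the Agent Studio codebase.

```python
from typing import Protocol


class ToolPort(Protocol):
    """Hypothetical interface the core domain depends on."""

    name: str

    def run(self, arguments: dict) -> dict: ...


class ToolExecutor:
    """Core-domain service: plain Python, no web or ORM imports."""

    def __init__(self, tools: list[ToolPort]) -> None:
        self._tools = {tool.name: tool for tool in tools}

    def execute(self, name: str, arguments: dict) -> dict:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name].run(arguments)


class EchoTool:
    """Stand-in implementation that an outer layer would supply."""

    name = "echo"

    def run(self, arguments: dict) -> dict:
        return {"echo": arguments}


executor = ToolExecutor([EchoTool()])
result = executor.execute("echo", {"text": "hi"})
```

Because `ToolExecutor` only knows about the interface, it can be unit-tested without starting the API or a database.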
Providers (src/agent_studio/providers/)
- STT/TTS/LLM/VAD abstractions
- Registry pattern for provider management
- BYOK credential handling
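The registry pattern mentioned above could be sketched as follows. This is an illustration under assumptions, not the project's actual registry: `register`, `create`, and `FakeDeepgramSTT` are hypothetical names, and the credential keyword (`api_key`) is assumed.

```python
from typing import Callable

# Maps (kind, name) -> factory. Kinds follow the list above: stt, tts, llm, vad.
_REGISTRY: dict[tuple[str, str], Callable[..., object]] = {}


def register(kind: str, name: str):
    """Decorator that records a provider factory under (kind, name)."""

    def decorator(factory):
        _REGISTRY[(kind, name)] = factory
        return factory

    return decorator


def create(kind: str, name: str, **credentials):
    """Build a provider, passing caller-supplied (BYOK) credentials through."""
    try:
        factory = _REGISTRY[(kind, name)]
    except KeyError:
        raise LookupError(f"no {kind} provider named {name!r}") from None
    return factory(**credentials)


@register("stt", "deepgram")
class FakeDeepgramSTT:
    """Placeholder class; a real provider would wrap the vendor SDK."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key


stt = create("stt", "deepgram", api_key="customer-supplied-key")
```

Looking providers up by name keeps the choice of vendor in configuration rather than in code, and the credentials travel with each request instead of living in the registry.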
Worker (src/agent_studio/worker/)
- LiveKit agent process
- Voice session management
- Real-time audio streaming
- SIP participant creation for phone calls
Database (src/agent_studio/db/)
- SQLAlchemy ORM models
- Repository pattern
- Alembic migrations
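To illustrate the repository pattern without pulling in a database, here is a sketch with an in-memory implementation. `Agent`, `AgentRepository`, and `rename_agent` are hypothetical. A SQLAlchemy-backed repository would presumably expose the same methods on top of a session.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Agent:
    id: int
    name: str


class AgentRepository(Protocol):
    def get(self, agent_id: int) -> Optional[Agent]: ...

    def add(self, agent: Agent) -> None: ...


class InMemoryAgentRepository:
    """Test double that stores rows in a dict instead of PostgreSQL."""

    def __init__(self) -> None:
        self._rows: dict[int, Agent] = {}

    def get(self, agent_id: int) -> Optional[Agent]:
        return self._rows.get(agent_id)

    def add(self, agent: Agent) -> None:
        self._rows[agent.id] = agent


def rename_agent(repo: AgentRepository, agent_id: int, new_name: str) -> Agent:
    """Calling code depends on the interface, not on the storage engine."""
    agent = repo.get(agent_id)
    if agent is None:
        raise LookupError(f"agent {agent_id} not found")
    agent.name = new_name
    repo.add(agent)
    return agent


repo = InMemoryAgentRepository()
repo.add(Agent(id=1, name="support"))
renamed = rename_agent(repo, 1, "billing")
```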
Call Types
Agent Studio supports two types of voice calls:
| Type | Description | Use Case |
|---|---|---|
| VoIP | WebRTC-based in-app calls | User connects via browser/mobile app microphone |
| SIP | Phone calls via PSTN | Outbound calls to phone numbers via Twilio |
VoIP Call Flow
- User initiates call from dashboard/app
- API creates LiveKit room and returns token
- User joins room via WebRTC
- Worker handles voice session
SIP Call Flow
- Backend system initiates call via API with phone number
- API creates LiveKit room and dispatches worker
- Worker creates SIP participant via LiveKit SIP service
- LiveKit SIP dials out through Twilio trunk
- Phone rings, user answers
- Voice session proceeds normally
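The two flows differ mainly in whether a phone number is involved, so a request check could look like the sketch below. The `CallRequest` shape and the requirement that numbers use E.164 format are assumptions made for illustration. The actual API schema may differ.

```python
import re
from dataclasses import dataclass
from typing import Optional

# E.164: a leading "+", then up to 15 digits, the first of which is non-zero.
E164 = re.compile(r"\+[1-9]\d{1,14}")


@dataclass(frozen=True)
class CallRequest:
    call_type: str  # "voip" or "sip"
    phone_number: Optional[str] = None


def validate_call_request(request: CallRequest) -> None:
    """Raise ValueError if the request does not fit its call type."""
    if request.call_type == "voip":
        if request.phone_number is not None:
            raise ValueError("VoIP calls do not take a phone number")
    elif request.call_type == "sip":
        number = request.phone_number
        if number is None or not E164.fullmatch(number):
            raise ValueError("SIP calls need a phone number in E.164 format")
    else:
        raise ValueError(f"unknown call type: {request.call_type!r}")


validate_call_request(CallRequest("voip"))
validate_call_request(CallRequest("sip", "+14155550123"))
```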
Request Flow
API Request
- Request hits FastAPI router
- Middleware processes (auth, rate limit, logging)
- Dependency injection provides services
- Business logic executes
- Response returned
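The steps above can be sketched as a chain of wrapped handlers. This is a framework-free illustration and does not use FastAPI's middleware API. The dict-based request and response, the ordering of the wrappers, and the omission of rate limiting are simplifications.

```python
from typing import Callable

Request = dict
Response = dict
Handler = Callable[[Request], Response]


def logging_middleware(next_handler: Handler, log: list) -> Handler:
    """Record the request path and the response status around the call."""

    def handle(request: Request) -> Response:
        log.append(f"-> {request['path']}")
        response = next_handler(request)
        log.append(f"<- {response['status']}")
        return response

    return handle


def auth_middleware(next_handler: Handler, valid_keys: set) -> Handler:
    """Reject requests without a known API key before business logic runs."""

    def handle(request: Request) -> Response:
        if request.get("api_key") not in valid_keys:
            return {"status": 401, "body": "unauthorized"}
        return next_handler(request)

    return handle


def list_agents(request: Request) -> Response:
    """Stand-in for a router endpoint that calls into the core domain."""
    return {"status": 200, "body": ["agent-1"]}


log: list = []
app = logging_middleware(auth_middleware(list_agents, {"secret"}), log)
ok = app({"path": "/agents", "api_key": "secret"})
denied = app({"path": "/agents"})
```

The rejected request never reaches `list_agents`, but it is still logged because logging is the outermost wrapper.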
Voice Call Flow
- Call initiated via API
- LiveKit room created with metadata
- Worker picks up job
- For SIP calls: Worker creates SIP participant
- Workflow runner builds agents
- Agents execute with handoffs
- Call ends, results persisted
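One way to picture the hand-off between the API and the worker is through the room metadata, which LiveKit stores as a string. In the sketch below the metadata is JSON, and the field names (`call_id`, `workflow_id`, `phone_number`) are hypothetical. The worker side is reduced to a list of planned steps rather than real LiveKit calls.

```python
import json
from typing import Optional


def build_room_metadata(
    call_id: str, workflow_id: str, phone_number: Optional[str] = None
) -> str:
    """API side: serialize what the worker needs to run the call."""
    payload = {"call_id": call_id, "workflow_id": workflow_id}
    if phone_number is not None:
        payload["phone_number"] = phone_number
    return json.dumps(payload)


def plan_worker_job(raw_metadata: str) -> list[str]:
    """Worker side: decide the steps for a job from the room metadata."""
    meta = json.loads(raw_metadata)
    steps = []
    if "phone_number" in meta:  # SIP calls only
        steps.append(f"create SIP participant for {meta['phone_number']}")
    steps.append(f"build agents for workflow {meta['workflow_id']}")
    steps.append("run agents with handoffs")
    steps.append(f"persist results for call {meta['call_id']}")
    return steps


voip_steps = plan_worker_job(build_room_metadata("c1", "wf-7"))
sip_steps = plan_worker_job(build_room_metadata("c2", "wf-7", "+14155550123"))
```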
Deployment Topology
Agent Studio can be deployed with a self-hosted LiveKit server and a LiveKit SIP server for telephony, connected to the PSTN through a Twilio Elastic SIP trunk.
Self-Hosted LiveKit + SIP Architecture
┌─────────────────────────────────────────────────────────────────┐
│                       VM / Cloud Instance                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────┐      ┌──────────────┐      ┌──────────────┐   │
│  │    Caddy     │      │     API      │      │   Workers    │   │
│  │   (HTTPS)    │      │  (FastAPI)   │      │   (Python)   │   │
│  │   :80/:443   │      │    :8000     │      │              │   │
│  └──────────────┘      └──────────────┘      └──────┬───────┘   │
│                                                     │           │
│  ┌──────────────┐      ┌──────────────┐             │           │
│  │   LiveKit    │◄─────│     SIP      │◄────────────┘           │
│  │    Server    │      │    Server    │                         │
│  │ :7880/:7881  │      │    :5060     │                         │
│  │    :7882     │      │ :10000-20000 │                         │
│  └──────────────┘      └──────────────┘                         │
│         │                     │                                 │
│         └─────────┬───────────┘                                 │
│                   │                                             │
│  ┌──────────────┐ │    ┌──────────────┐                         │
│  │  PostgreSQL  │ │    │    Redis     │                         │
│  │    :5432     │ │    │    :6379     │                         │
│  └──────────────┘ │    └──────────────┘                         │
│                   │                                             │
└───────────────────┼─────────────────────────────────────────────┘
                    │
                    │ SIP (port 5060) + RTP (ports 10000-20000)
                    ▼
┌─────────────────────────────────────────────────────────────────┐
│                             TWILIO                              │
│                      Elastic SIP Trunking                       │
│                                                                 │
│  Outbound: Dial to any phone number                             │
│  Inbound:  Route calls to LiveKit rooms                         │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                              PSTN                               │
│                         Phone Networks                          │
└─────────────────────────────────────────────────────────────────┘
Port Requirements
| Port | Protocol | Service | Access |
|---|---|---|---|
| 80/443 | TCP | Caddy (HTTPS) | Public |
| 7880 | TCP | LiveKit WebSocket | Internal |
| 7881 | TCP | LiveKit RTC over TCP | Public |
| 7882 | UDP | LiveKit RTC over UDP | Public |
| 5060 | UDP/TCP | SIP Signaling | Public |
| 10000-20000 | UDP | RTP Media | Public |
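The table can also be kept as data, so that the list of ports to open in a firewall is derived rather than typed by hand. The sketch below is illustrative: `PortRule` and `public_port_specs` are hypothetical names, and the `port/protocol` output strings are a generic notation, not the syntax of a particular firewall tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortRule:
    ports: str          # single port or "start-end" range
    protocols: tuple    # e.g. ("tcp",) or ("udp", "tcp")
    service: str
    public: bool


# Values copied from the Port Requirements table.
RULES = [
    PortRule("80", ("tcp",), "Caddy", True),
    PortRule("443", ("tcp",), "Caddy", True),
    PortRule("7880", ("tcp",), "LiveKit WebSocket", False),
    PortRule("7881", ("tcp",), "LiveKit", True),
    PortRule("7882", ("udp",), "LiveKit", True),
    PortRule("5060", ("udp", "tcp"), "SIP signaling", True),
    PortRule("10000-20000", ("udp",), "RTP media", True),
]


def public_port_specs(rules: list) -> list:
    """Return "ports/protocol" strings for every publicly exposed rule."""
    specs = []
    for rule in rules:
        if rule.public:
            for protocol in rule.protocols:
                specs.append(f"{rule.ports}/{protocol}")
    return specs


specs = public_port_specs(RULES)
```

Port 7880 is left out of the result because the table marks it as internal. In the deployment diagram, Caddy is the public HTTPS entry point.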
Scaling Guidelines
| Component | Scaling Strategy |
|---|---|
| API | Horizontal (stateless) |
| Worker | Horizontal (1 worker per ~10 concurrent calls) |
| PostgreSQL | Vertical + read replicas |
| Redis | Cluster mode |
| LiveKit | Vertical (single instance for moderate load) |
| SIP | Vertical (single instance) |
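The worker guideline translates into simple arithmetic. A minimal sketch, treating the "~10 concurrent calls" figure from the table as a fixed constant (in practice it will vary with the providers and hardware in use):

```python
import math

CALLS_PER_WORKER = 10  # approximate figure from the scaling table


def workers_needed(concurrent_calls: int) -> int:
    """Round up so that no worker exceeds the per-worker guideline."""
    if concurrent_calls < 0:
        raise ValueError("concurrent_calls must be non-negative")
    # Keep at least one worker running even when no calls are active.
    return max(1, math.ceil(concurrent_calls / CALLS_PER_WORKER))


estimate = workers_needed(45)  # 45 / 10 = 4.5, rounded up to 5
```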