Agent Studio

Architecture Overview

High-level system architecture of Agent Studio


Agent Studio follows a clean architecture pattern with clear separation between domain logic, infrastructure, and presentation layers.

System Diagram

┌─────────────────────────────────────────────────────────────────────────────┐
│                              CLIENTS                                         │
│          Dashboard (Next.js) │ API Consumers │ Phone (PSTN/SIP)             │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                           API LAYER (FastAPI)                                │
│  ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐    │
│  │   Auth    │ │  Agents   │ │ Workflows │ │   Tools   │ │   Calls   │    │
│  │  Router   │ │  Router   │ │  Router   │ │  Router   │ │  Router   │    │
│  └───────────┘ └───────────┘ └───────────┘ └───────────┘ └───────────┘    │
│                           │                                                  │
│                  ┌────────┴────────┐                                        │
│                  │   Middleware    │                                        │
│                  │   Rate Limit    │                                        │
│                  │      Auth       │                                        │
│                  │     Logging     │                                        │
│                  └─────────────────┘                                        │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                    ┌───────────────┼───────────────┐
                    ▼               ▼               ▼
┌─────────────────────────┐ ┌─────────────┐ ┌─────────────────────────┐
│      CORE DOMAIN        │ │   WORKER    │ │      PROVIDERS          │
│  ┌─────────────────┐    │ │  (LiveKit)  │ │  ┌─────────────────┐    │
│  │ Workflow Runner │    │ │             │ │  │  STT: Deepgram  │    │
│  │ Tool Executor   │    │ │  Handles    │ │  │       Sarvam    │    │
│  │ Context Manager │    │ │  Voice      │ │  ├─────────────────┤    │
│  │ Handoff Logic   │    │ │  Sessions   │ │  │  TTS: Cartesia  │    │
│  └─────────────────┘    │ │             │ │  │       Sarvam    │    │
└─────────────────────────┘ └─────────────┘ │  ├─────────────────┤    │
                                            │  │  LLM: Gemini    │    │
                                            │  │       OpenAI    │    │
                                            │  └─────────────────┘    │
                                            └─────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         DATA LAYER                                           │
│     ┌─────────────┐          ┌─────────────┐          ┌─────────────┐       │
│     │ PostgreSQL  │          │    Redis    │          │   LiveKit   │       │
│     │ (Primary)   │          │  (Cache)    │          │  (WebRTC)   │       │
│     └─────────────┘          └─────────────┘          └─────────────┘       │
└─────────────────────────────────────────────────────────────────────────────┘

Component Responsibilities

API Layer (src/agent_studio/api/)

  • HTTP request handling
  • Authentication (JWT + API Keys)
  • Rate limiting
  • Request validation
  • OpenAPI documentation
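The page does not say which rate-limiting algorithm the API uses. As one illustration, a token bucket per client key can be sketched with the standard library. The class name, capacity and refill rate are hypothetical. This version keeps state in process memory; with several API instances, the state would need to live in a shared store (Redis is already part of the stack).

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Hypothetical per-client limiter: bursts up to `capacity`, refills at `rate` tokens/second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        # client key -> (tokens left, time of last update); in-process only
        self.state: Dict[str, Tuple[float, float]] = {}

    def allow(self, key: str) -> bool:
        """Spend one token for `key`; callers would typically answer False with HTTP 429."""
        now = self.clock()
        tokens, last = self.state.get(key, (self.capacity, now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        allowed = tokens >= 1
        if allowed:
            tokens -= 1
        self.state[key] = (tokens, now)
        return allowed


# Usage with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
limiter = TokenBucketLimiter(capacity=2, rate=1.0, clock=lambda: fake_now[0])
print([limiter.allow("key-a") for _ in range(3)])  # [True, True, False]
fake_now[0] = 1.0
print(limiter.allow("key-a"))  # True: one token refilled after 1 second
```

A token bucket allows short bursts while enforcing an average rate. A fixed window counter is simpler, but it permits double bursts at window boundaries.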

Core Domain (src/agent_studio/core/)

  • Business logic (no framework dependencies)
  • Workflow orchestration
  • Tool execution engine
  • Context management
  • Handoff detection
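The "no framework dependencies" rule for the core package can be checked mechanically, for example in CI. The minimal sketch below uses only the standard library's `ast` module to list forbidden imports. The deny-list is an assumption built from the frameworks named on this page, not a list taken from the Agent Studio codebase, and `check_core` is a hypothetical helper.

```python
import ast
from pathlib import Path
from typing import Dict, List, Set

# Assumed deny-list: framework/infrastructure packages named on this page.
FORBIDDEN: Set[str] = {"fastapi", "starlette", "sqlalchemy", "livekit", "redis"}


def forbidden_imports(source: str, forbidden: Set[str] = FORBIDDEN) -> List[str]:
    """Return the sorted top-level packages imported by `source` that are in `forbidden`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]   # absolute "from x.y import z"
        else:
            continue                # not an import, or a relative import within the package
        found.update(n.split(".")[0] for n in names if n.split(".")[0] in forbidden)
    return sorted(found)


def check_core(root: str = "src/agent_studio/core") -> Dict[str, List[str]]:
    """Map each offending file under `root` to the forbidden packages it imports."""
    report = {}
    for path in Path(root).rglob("*.py"):
        bad = forbidden_imports(path.read_text(encoding="utf-8"))
        if bad:
            report[str(path)] = bad
    return report


print(forbidden_imports("import json\nfrom dataclasses import dataclass"))  # []
print(forbidden_imports("from fastapi import Depends\nimport sqlalchemy.orm"))  # ['fastapi', 'sqlalchemy']
```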

Providers (src/agent_studio/providers/)

  • STT/TTS/LLM/VAD abstractions
  • Registry pattern for provider management
  • BYOK (bring-your-own-key) credential handling
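A registry keeps provider selection data-driven: an agent names a provider, and the registry builds it with the right credentials. The sketch below assumes a factory-per-name design and a simple BYOK rule, in which a customer-supplied key overrides the platform default. All class names are hypothetical, and `FakeSTT` stands in for a real Deepgram or Sarvam client.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Protocol


class STTProvider(Protocol):
    """Hypothetical minimal speech-to-text interface the core layer codes against."""

    def transcribe(self, audio: bytes) -> str: ...


@dataclass
class FakeSTT:
    """Stand-in used only for illustration; not a real Deepgram or Sarvam client."""
    vendor: str
    api_key: str

    def transcribe(self, audio: bytes) -> str:
        return f"[{self.vendor}] {len(audio)} bytes"


class ProviderRegistry:
    """Maps a provider name to a factory that builds it from an API key."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[str], STTProvider]] = {}

    def register(self, name: str, factory: Callable[[str], STTProvider]) -> None:
        self._factories[name] = factory

    def create(self, name: str, platform_key: str,
               customer_key: Optional[str] = None) -> STTProvider:
        """Build `name`, preferring the customer's own key (BYOK) over the platform key."""
        try:
            factory = self._factories[name]
        except KeyError:
            raise KeyError(f"unknown provider: {name!r}") from None
        return factory(customer_key or platform_key)


registry = ProviderRegistry()
registry.register("deepgram", lambda key: FakeSTT("deepgram", key))
registry.register("sarvam", lambda key: FakeSTT("sarvam", key))

stt = registry.create("deepgram", platform_key="platform-key", customer_key="customer-key")
print(stt.transcribe(b"\x00" * 320))  # [deepgram] 320 bytes
```

TTS, LLM and VAD providers would follow the same shape, each with its own interface.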

Worker (src/agent_studio/worker/)

  • LiveKit agent process
  • Voice session management
  • Real-time audio streaming
  • SIP participant creation for phone calls

Database (src/agent_studio/db/)

  • SQLAlchemy ORM models
  • Repository pattern
  • Alembic migrations
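With the repository pattern, core code depends on an interface rather than on the ORM. In this sketch, the `Agent` entity, the `AgentRepository` protocol, the in-memory implementation and the `rename_agent` use case are all hypothetical. A production repository would wrap a SQLAlchemy session instead of a dict.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Protocol


@dataclass(frozen=True)
class Agent:
    """Hypothetical domain entity; the real SQLAlchemy model lives in src/agent_studio/db/."""
    id: str
    name: str


class AgentRepository(Protocol):
    """The interface core code depends on, instead of an ORM session."""

    def get(self, agent_id: str) -> Optional[Agent]: ...
    def save(self, agent: Agent) -> None: ...
    def list_all(self) -> List[Agent]: ...


class InMemoryAgentRepository:
    """Dict-backed implementation for tests; production code would wrap a SQLAlchemy session."""

    def __init__(self) -> None:
        self._rows: Dict[str, Agent] = {}

    def get(self, agent_id: str) -> Optional[Agent]:
        return self._rows.get(agent_id)

    def save(self, agent: Agent) -> None:
        self._rows[agent.id] = agent

    def list_all(self) -> List[Agent]:
        return list(self._rows.values())


def rename_agent(repo: AgentRepository, agent_id: str, new_name: str) -> Agent:
    """A core-layer use case: it only knows the repository interface."""
    agent = repo.get(agent_id)
    if agent is None:
        raise LookupError(f"agent {agent_id!r} not found")
    updated = replace(agent, name=new_name)
    repo.save(updated)
    return updated


repo = InMemoryAgentRepository()
repo.save(Agent(id="a1", name="Support"))
print(rename_agent(repo, "a1", "Billing"))  # Agent(id='a1', name='Billing')
```

Because `rename_agent` only sees the protocol, it can be unit-tested without a database.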

Call Types

Agent Studio supports two types of voice calls:

Type | Description               | Use Case
VoIP | WebRTC-based in-app calls | User connects via browser/mobile app microphone
SIP  | Phone calls via PSTN      | Outbound calls to phone numbers via Twilio

VoIP Call Flow

  1. User initiates call from dashboard/app
  2. API creates LiveKit room and returns token
  3. User joins room via WebRTC
  4. Worker handles voice session
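The token returned in step 2 is a JWT that the client presents when joining the room. In production it would be minted with LiveKit's server SDK. The standard-library sketch below only shows the mechanics of an HS256 JWT carrying a room-join grant. The claim layout (`iss` = API key, `sub` = participant identity, a `video` grant with `room` and `roomJoin`) reflects our understanding of LiveKit's token format and is an assumption to verify against the official SDK. The function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    # JWT segments use base64url without '=' padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def room_join_token(api_key: str, api_secret: str, identity: str, room: str,
                    ttl_seconds: int = 600, now: Optional[int] = None) -> str:
    """Sign an HS256 JWT that lets `identity` join `room` (claim layout is an assumption)."""
    issued = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {
        "iss": api_key,                              # API key that signed the token
        "sub": identity,                             # participant identity in the room
        "nbf": issued,
        "exp": issued + ttl_seconds,
        "video": {"room": room, "roomJoin": True},   # assumed LiveKit-style room grant
    }
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    sig = hmac.new(api_secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256)
    return f"{signing_input}.{_b64url(sig.digest())}"


def verify(token: str, api_secret: str) -> dict:
    """Check the HMAC signature and return the claims (expiry is not checked here)."""
    signing_input, _, sig = token.rpartition(".")
    expected = hmac.new(api_secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256)
    if not hmac.compare_digest(sig, _b64url(expected.digest())):
        raise ValueError("invalid signature")
    payload = signing_input.split(".")[1]
    return json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))


token = room_join_token("APIdemo", "demo-secret", identity="user-42", room="call-abc", now=1_700_000_000)
print(verify(token, "demo-secret")["video"])  # {'room': 'call-abc', 'roomJoin': True}
```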

SIP Call Flow

  1. Backend system initiates call via API with phone number
  2. API creates LiveKit room and dispatches worker
  3. Worker creates SIP participant via LiveKit SIP service
  4. LiveKit SIP dials out through Twilio trunk
  5. Phone rings, user answers
  6. Voice session proceeds normally
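Before step 2, the API has to validate the destination and record what the worker needs to place the call. The sketch below makes two assumptions: the phone number is passed in E.164 format, and call details travel to the worker as JSON room metadata. The field names (`call_type`, `phone_number`, `agent_id`), the room naming scheme and `prepare_sip_call` are hypothetical.

```python
import json
import re
import uuid
from typing import Optional, Tuple

# E.164: a '+', then up to 15 digits, the first of which is not zero.
E164 = re.compile(r"\+[1-9]\d{1,14}")


def prepare_sip_call(phone_number: str, agent_id: str,
                     room_name: Optional[str] = None) -> Tuple[str, str]:
    """Validate the destination and return (room_name, metadata_json) for an outbound call."""
    number = re.sub(r"[\s\-()]", "", phone_number)   # tolerate common formatting
    if not E164.fullmatch(number):
        raise ValueError(f"expected an E.164 number like +14155550100, got {phone_number!r}")
    room = room_name or f"call-{uuid.uuid4().hex[:12]}"
    metadata = {"call_type": "sip", "phone_number": number, "agent_id": agent_id}
    return room, json.dumps(metadata)


room, metadata = prepare_sip_call("+1 (415) 555-0100", agent_id="agent-1")
print(metadata)  # {"call_type": "sip", "phone_number": "+14155550100", "agent_id": "agent-1"}
```

Rejecting malformed numbers at the API keeps a bad request from ever reaching the SIP trunk.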

Request Flow

API Request

  1. Middleware processes the request (auth, rate limit, logging)
  2. Request is routed to the matching FastAPI router
  3. Dependency injection provides services
  4. Business logic executes
  5. Response returned
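In FastAPI (via Starlette), middleware wraps the entire application, so it handles a request before routing and dependency injection take place. The framework-free sketch below models that ordering with plain functions. The middleware names mirror the diagram. The request shape, the hard-coded key check, the `list_agents` handler and `build_app` are hypothetical, and dependency injection is not modelled.

```python
from functools import partial
from typing import Callable, Dict, List

Request = Dict[str, str]
Response = Dict[str, object]
Handler = Callable[[Request], Response]

events: List[str] = []  # records the order in which each layer runs


def logging_mw(request: Request, call_next: Handler) -> Response:
    events.append(f"log {request['path']}")
    return call_next(request)


def auth_mw(request: Request, call_next: Handler) -> Response:
    # Hypothetical check; the real API accepts JWTs and API keys.
    if request.get("api_key") != "demo-key":
        events.append("auth rejected")
        return {"status": 401}
    events.append("auth ok")
    return call_next(request)


def list_agents(request: Request) -> Response:
    events.append("route list_agents")
    return {"status": 200, "body": ["agent-1"]}


def build_app(middleware: List[Callable[..., Response]], route: Handler) -> Handler:
    """Wrap `route` so the first middleware listed is the outermost (runs first)."""
    app = route
    for mw in reversed(middleware):
        app = partial(mw, call_next=app)
    return app


app = build_app([logging_mw, auth_mw], list_agents)
print(app({"path": "/agents", "api_key": "demo-key"})["status"])  # 200
print(events)  # ['log /agents', 'auth ok', 'route list_agents']
```

A rejected request never reaches the route handler, which is why auth and rate limiting belong in this outer layer.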

Voice Call Flow

  1. Call initiated via API
  2. LiveKit room created with metadata
  3. Worker picks up job
  4. For SIP calls: Worker creates SIP participant
  5. Workflow runner builds agents
  6. Agents execute with handoffs
  7. Call ends, results persisted
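The room metadata from step 2 is what lets the worker decide, in steps 3 and 4, whether it must also dial out. This is a minimal sketch that assumes the metadata is JSON with hypothetical `call_type`, `phone_number` and `agent_id` fields. It returns the planned steps as strings rather than calling the LiveKit SDK.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CallJob:
    """Hypothetical view of the room metadata a worker receives with a job."""
    call_type: str                      # "voip" or "sip"
    agent_id: str
    phone_number: Optional[str] = None

    @classmethod
    def from_metadata(cls, raw: str) -> "CallJob":
        data = json.loads(raw or "{}")
        call_type = data.get("call_type", "voip")
        if call_type not in ("voip", "sip"):
            raise ValueError(f"unknown call_type {call_type!r}")
        if "agent_id" not in data:
            raise ValueError("metadata is missing agent_id")
        if call_type == "sip" and not data.get("phone_number"):
            raise ValueError("SIP calls need a phone_number")
        return cls(call_type, data["agent_id"], data.get("phone_number"))


def plan_session(job: CallJob) -> List[str]:
    """Ordered work for one job; only SIP jobs dial out before the agents start."""
    steps = []
    if job.call_type == "sip":
        steps.append(f"create SIP participant -> {job.phone_number}")
    steps += ["build agents from workflow", "run agents with handoffs", "persist results"]
    return steps


sip = CallJob.from_metadata('{"call_type": "sip", "phone_number": "+14155550100", "agent_id": "agent-1"}')
print(plan_session(sip)[0])  # create SIP participant -> +14155550100
```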

Deployment Topology

Agent Studio supports self-hosted LiveKit with SIP telephony capability.

Self-Hosted LiveKit + SIP Architecture

┌─────────────────────────────────────────────────────────────────┐
│                      VM / Cloud Instance                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌──────────────┐      ┌──────────────┐      ┌──────────────┐  │
│  │    Caddy     │      │     API      │      │   Workers    │  │
│  │   (HTTPS)    │      │  (FastAPI)   │      │   (Python)   │  │
│  │   :80/:443   │      │    :8000     │      │              │  │
│  └──────────────┘      └──────────────┘      └──────┬───────┘  │
│                                                      │          │
│  ┌──────────────┐      ┌──────────────┐             │          │
│  │   LiveKit    │◄─────│     SIP      │◄────────────┘          │
│  │   Server     │      │   Server     │                        │
│  │ :7880/:7881  │      │    :5060     │                        │
│  │    :7882     │      │ :10000-20000 │                        │
│  └──────────────┘      └──────────────┘                        │
│         │                     │                                 │
│         └─────────┬───────────┘                                │
│                   │                                             │
│  ┌──────────────┐ │ ┌──────────────┐                           │
│  │  PostgreSQL  │ │ │    Redis     │                           │
│  │    :5432     │ │ │    :6379     │                           │
│  └──────────────┘ │ └──────────────┘                           │
│                   │                                             │
└───────────────────┼─────────────────────────────────────────────┘
                    │
                    │ SIP (port 5060) + RTP (ports 10000-20000)
                    ▼
┌─────────────────────────────────────────────────────────────────┐
│                          TWILIO                                  │
│                    Elastic SIP Trunking                          │
│                                                                  │
│              Outbound: Dial any phone number                    │
│              Inbound: Route calls to LiveKit rooms              │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                           PSTN                                   │
│                      Phone Networks                              │
└─────────────────────────────────────────────────────────────────┘

Port Requirements

Port        | Protocol | Service              | Access
80/443      | TCP      | Caddy (HTTPS)        | Public
7880        | TCP      | LiveKit WebSocket    | Internal
7881        | TCP      | LiveKit RTC over TCP | Public
7882        | UDP      | LiveKit RTC over UDP | Public
5060        | UDP/TCP  | SIP Signaling        | Public
10000-20000 | UDP      | RTP Media            | Public
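The Public rows in the port table translate directly into host firewall rules. The sketch below assumes the VM uses `ufw`; cloud security groups or another firewall would need the same ports expressed in their own syntax. It emits one `ufw allow` command per public port and protocol and skips the internal-only LiveKit port. `PORTS` and `ufw_commands` are hypothetical names.

```python
from typing import List, Tuple

# (port, protocol, service, access) rows from the port table; 7880 is internal-only.
PORTS: List[Tuple[str, str, str, str]] = [
    ("80/443", "TCP", "Caddy (HTTPS)", "Public"),
    ("7880", "TCP", "LiveKit WebSocket", "Internal"),
    ("7881", "TCP", "LiveKit RTC over TCP", "Public"),
    ("7882", "UDP", "LiveKit UDP", "Public"),
    ("5060", "UDP/TCP", "SIP Signaling", "Public"),
    ("10000-20000", "UDP", "RTP Media", "Public"),
]


def ufw_commands(rows: List[Tuple[str, str, str, str]]) -> List[str]:
    """One `ufw allow` command per public port/protocol pair; internal ports are skipped."""
    commands = []
    for port, protocols, service, access in rows:
        if access != "Public":
            continue
        # ufw writes port lists with ',' and ranges with ':' (e.g. 10000:20000/udp).
        ufw_port = port.replace("/", ",").replace("-", ":")
        for proto in protocols.lower().split("/"):
            commands.append(f"ufw allow {ufw_port}/{proto} comment '{service}'")
    return commands


for cmd in ufw_commands(PORTS):
    print(cmd)
```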

Scaling Guidelines

Component  | Scaling Strategy
API        | Horizontal (stateless)
Worker     | Horizontal (1 worker per ~10 concurrent calls)
PostgreSQL | Vertical + read replicas
Redis      | Cluster mode
LiveKit    | Vertical (single instance for moderate load)
SIP        | Vertical (single instance)
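The ~10 calls per worker figure gives a quick capacity estimate. In this minimal sketch, the optional headroom percentage is an assumption added here for spare capacity, not a number from this page.

```python
def workers_needed(peak_concurrent_calls: int, calls_per_worker: int = 10,
                   headroom_pct: int = 0) -> int:
    """Workers for a given peak of concurrent calls, padded by `headroom_pct` percent."""
    # Integer ceiling division avoids float rounding (e.g. 100 * 1.1 != 110.0).
    demand = max(0, peak_concurrent_calls) * (100 + headroom_pct)
    return -(-demand // (calls_per_worker * 100))


print(workers_needed(45))                   # 5
print(workers_needed(45, headroom_pct=20))  # 6
```

The result is rounded up, so 11 concurrent calls already need a second worker. Because the per-worker figure is approximate, it is worth re-measuring with real workloads.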
