ADR-010: Production Deployment Strategy

Docker Compose-based production deployment with a future Kubernetes path

Status

Accepted

Context

Agent Studio needs a production deployment strategy that balances:

  1. Simplicity: Easy to deploy and operate initially
  2. Scalability: Ability to handle growing call volumes
  3. Reliability: High availability for production workloads
  4. Future-proof: Clear path to more sophisticated orchestration

Current scale requirements:

  • 10-100 concurrent calls initially
  • Single-region deployment
  • Small ops team

Future considerations:

  • Per-agent worker pools for resource isolation
  • Auto-scaling based on call queue depth
  • Multi-region deployment

Decision

Phase 1: Docker Compose (Current)

Use Docker Compose for production deployment with manual horizontal scaling.

Architecture:

┌─────────────────────────────────────────────────────────┐
│                    Load Balancer                         │
│                   (Caddy/nginx)                          │
└─────────────────────┬───────────────────────────────────┘

        ┌─────────────┼─────────────┐
        │             │             │
        ▼             ▼             ▼
┌───────────┐ ┌───────────┐ ┌───────────┐
│  API (1)  │ │  API (2)  │ │  API (n)  │
└─────┬─────┘ └─────┬─────┘ └─────┬─────┘
      │             │             │
      └─────────────┼─────────────┘

        ┌───────────┼───────────┐
        │           │           │
        ▼           ▼           ▼
┌───────────┐ ┌───────────┐ ┌───────────┐
│ Worker(1) │ │ Worker(2) │ │ Worker(n) │
└───────────┘ └───────────┘ └───────────┘
        │           │           │
        └───────────┼───────────┘

    ┌───────────────┼───────────────┐
    │               │               │
    ▼               ▼               ▼
┌────────────┐ ┌─────────┐     ┌──────────┐
│ PostgreSQL │ │  Redis  │     │ LiveKit  │
└────────────┘ └─────────┘     │  + SIP   │
                               └──────────┘

Self-Hosted LiveKit + SIP Architecture

For deployments with PSTN calling requirements, the architecture extends to include LiveKit SIP:

┌─────────────────────────────────────────────────────────┐
│                      Production VM                       │
├─────────────────────────────────────────────────────────┤
│                                                          │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌─────────┐ │
│  │  Caddy   │  │   API    │  │  Worker  │  │ LiveKit │ │
│  │ :80/:443 │  │  :8000   │  │          │  │  :7880  │ │
│  └──────────┘  └──────────┘  └──────────┘  └─────────┘ │
│                                                │         │
│                                           ┌────┴────┐   │
│                                           │   SIP   │   │
│                                           │  :5060  │   │
│                                           └────┬────┘   │
│                                                │         │
└────────────────────────────────────────────────┼─────────┘

                                                 │
                                                 │  SIP Trunk
                                                 │
                                        ┌────────┴──────┐
                                        │    Twilio     │
                                        │   SIP Trunk   │
                                        └───────────────┘

SIP Port Requirements:

Port          Protocol   Service
7880          TCP        LiveKit WebSocket/HTTP
7881          TCP        LiveKit RTC over TCP
7882          UDP        LiveKit RTC over UDP
5060          UDP/TCP    SIP Signaling
10000-20000   UDP        RTP Media

See ADR-012: Self-Hosted LiveKit SIP Strategy for detailed implementation.
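
A sketch of the corresponding Compose port mappings (service names and image tags are illustrative; ADR-012 has the real configuration):

services:
  livekit:
    image: livekit/livekit-server:latest
    ports:
      - "7880:7880"                    # WebSocket/HTTP
      - "7881:7881"                    # RTC over TCP
      - "7882:7882/udp"                # RTC over UDP

  sip:
    image: livekit/sip:latest
    ports:
      - "5060:5060/udp"                # SIP signaling
      - "5060:5060/tcp"
      - "10000-20000:10000-20000/udp"  # RTP media

In practice, host networking (network_mode: host) is often used for the SIP and media services instead of publishing the full RTP range port by port.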

Scaling:

# Scale API servers
docker compose -f docker-compose.prod.yml up -d --scale api=3

# Scale workers (1 worker per ~10 concurrent calls)
docker compose -f docker-compose.prod.yml up -d --scale worker=5
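
Replica counts can also be pinned declaratively rather than passed on the command line, e.g. via an extra override file merged with a second -f flag (file name illustrative):

# docker-compose.scale.yml (illustrative): pins the counts from the commands above
services:
  api:
    deploy:
      replicas: 3
  worker:
    deploy:
      replicas: 5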

Benefits:

  • Simple deployment and debugging
  • Easy local replication of production issues
  • No Kubernetes expertise required
  • Lower infrastructure cost initially

Phase 2: Kubernetes (Future)

When needed, migrate to Kubernetes for:

Per-Agent Worker Pools:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker-meal-agent
spec:
  replicas: 3
  selector:
    matchLabels:
      agent: meal-agent
  template:
    metadata:
      labels:
        agent: meal-agent  # must match the selector above
    spec:
      containers:
      - name: worker
        image: agent-studio-worker:latest
        env:
        - name: AGENT_FILTER
          value: "meal-agent"
        resources:
          limits:
            cpu: "2"
            memory: "2Gi"

Auto-scaling with KEDA:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
spec:
  scaleTargetRef:
    name: worker
  minReplicaCount: 2
  maxReplicaCount: 20
  triggers:
  - type: redis
    metadata:
      address: redis:6379
      listName: call-queue
      listLength: "5"  # target queue length per replica; KEDA adds replicas as the queue grows

When to Consider Kubernetes

Trigger               Current State      Kubernetes Needed
Concurrent calls      < 100              > 100 consistently
Auto-scaling need     Manual is OK       Must scale automatically
Per-agent isolation   Not needed         Resource isolation required
Multi-region          Single region      Multiple regions
Team expertise        Limited K8s        K8s expertise available

Alternatives Considered

1. Kubernetes from Day 1

Pros: Future-proof, auto-scaling built-in
Cons: Operational complexity, overkill for current scale

Decision: Docker Compose is simpler. We can migrate when needed.

2. Serverless (AWS Lambda, Cloud Run)

Pros: Auto-scaling, pay-per-use
Cons: Cold starts problematic for real-time voice, connection limits

Decision: Voice agents need persistent connections. Serverless doesn't fit.

3. Managed Container Services (ECS, Cloud Run)

Pros: Managed infrastructure, auto-scaling
Cons: Vendor lock-in, more complex than Compose

Decision: Good middle ground. Document as alternative to K8s.

4. LiveKit Cloud vs Self-Hosted

Pros of Cloud: Managed, no SIP infrastructure to maintain
Cons of Cloud: Higher cost at scale, less control, SIP trunk complexity

Decision: Self-hosted LiveKit with SIP for full control over telephony. See ADR-012.

Implementation

docker-compose.prod.yml

services:
  api:
    image: agent-studio-api:${VERSION:-latest}
    deploy:
      replicas: 2
      resources:
        limits:
          cpus: '1'
          memory: 1G
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health/ready"]
      interval: 30s
    depends_on:
      db: { condition: service_healthy }
      redis: { condition: service_healthy }

  worker:
    image: agent-studio-worker:${VERSION:-latest}
    command: uv run python -m agent_studio.worker.entrypoint
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: '2'
          memory: 2G
    depends_on:
      api: { condition: service_healthy }
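
The db and redis services the API depends on are omitted above for brevity; a minimal sketch completing the file (image tags and credentials are illustrative):

  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s

  redis:
    image: redis:7
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s

volumes:
  pgdata: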

Scaling Guidelines

Component    Sizing                       Rationale
API          2+ replicas                  Redundancy, load distribution
Worker       1 per ~10 concurrent calls   Each worker handles multiple concurrent calls
PostgreSQL   Connection pooling           Prevent connection exhaustion
Redis        Single instance              Low volume, can cluster later

For example, 50 concurrent calls call for roughly 5 workers, plus headroom for traffic spikes.

Monitoring Checklist

  • API response times (p50, p95, p99)
  • Worker call queue depth
  • Active call count
  • Database connection pool usage
  • Redis memory usage
  • Container resource usage (CPU, memory)
  • LiveKit room count and participant count
  • SIP call success/failure rate
  • RTP packet loss and jitter
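
Most of these can be collected with Prometheus. A minimal scrape-config sketch, assuming the API exports metrics at /metrics and LiveKit's built-in Prometheus listener is enabled (both are assumptions, not confirmed endpoints):

# prometheus.yml (sketch): targets assume the Compose service names above
scrape_configs:
  - job_name: api
    metrics_path: /metrics             # assumed API metrics endpoint
    static_configs:
      - targets: ["api:8000"]
  - job_name: livekit
    static_configs:
      - targets: ["livekit:6789"]      # assumes prometheus_port: 6789 in livekit.yaml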

Consequences

Positive

  • Simple operations: Docker Compose is familiar and debuggable
  • Cost effective: No K8s infrastructure overhead
  • Quick iteration: Easy to update and redeploy
  • Clear upgrade path: Well-defined triggers for K8s migration

Negative

  • Manual scaling: Must monitor and scale manually
  • No auto-healing: Must restart failed containers manually
  • Limited isolation: All workers share resources

Mitigations

  • Health checks and restart policies for auto-recovery (sketched after this list)
  • Monitoring alerts for scaling triggers
  • Document runbooks for common operations
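
A minimal sketch of the restart-policy mitigation for docker-compose.prod.yml (note that restart reacts to container exits; acting on failed healthchecks needs extra tooling):

services:
  api:
    restart: unless-stopped   # Docker restarts the container whenever it exits
  worker:
    restart: unless-stopped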

References

  • ADR-012: Self-Hosted LiveKit SIP Strategy