# ADR-010: Production Deployment Strategy
Docker Compose-based production deployment with future Kubernetes path
## Status

Accepted

## Context
Agent Studio needs a production deployment strategy that balances:
- Simplicity: Easy to deploy and operate initially
- Scalability: Ability to handle growing call volumes
- Reliability: High availability for production workloads
- Future-proof: Clear path to more sophisticated orchestration
Current scale requirements:
- 10-100 concurrent calls initially
- Single-region deployment
- Small ops team
Future considerations:
- Per-agent worker pools for resource isolation
- Auto-scaling based on call queue depth
- Multi-region deployment
## Decision

### Phase 1: Docker Compose (Current)

Use Docker Compose for production deployment with manual horizontal scaling.
Architecture:

```
┌─────────────────────────────────────────────────────────┐
│                      Load Balancer                      │
│                     (nginx/traefik)                     │
└────────────────────────────┬────────────────────────────┘
                             │
               ┌─────────────┼─────────────┐
               │             │             │
               ▼             ▼             ▼
         ┌───────────┐ ┌───────────┐ ┌───────────┐
         │  API (1)  │ │  API (2)  │ │  API (n)  │
         └─────┬─────┘ └─────┬─────┘ └─────┬─────┘
               │             │             │
               └─────────────┼─────────────┘
                             │
               ┌─────────────┼─────────────┐
               │             │             │
               ▼             ▼             ▼
         ┌───────────┐ ┌───────────┐ ┌───────────┐
         │ Worker(1) │ │ Worker(2) │ │ Worker(n) │
         └─────┬─────┘ └─────┬─────┘ └─────┬─────┘
               │             │             │
               └─────────────┼─────────────┘
                             │
              ┌──────────────┼──────────────┐
              │              │              │
              ▼              ▼              ▼
       ┌────────────┐ ┌────────────┐ ┌────────────┐
       │ PostgreSQL │ │   Redis    │ │  LiveKit   │
       └────────────┘ └────────────┘ └────────────┘
```
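The load-balancer tier can be nginx or Traefik; Traefik's Docker provider pairs especially well with `--scale` because it discovers new replicas automatically. A minimal sketch, where the hostname, exposed port, and label values are illustrative assumptions rather than part of this ADR:

```yaml
# Hypothetical Traefik front end for the scaled api service.
# Hostname and ports are assumptions for illustration.
services:
  traefik:
    image: traefik:v3.0
    command:
      - --providers.docker=true
      - --providers.docker.exposedbydefault=false
      - --entrypoints.web.address=:80
    ports:
      - "80:80"
    volumes:
      # Read-only Docker socket so Traefik can discover containers
      - /var/run/docker.sock:/var/run/docker.sock:ro

  api:
    image: agent-studio-api:${VERSION:-latest}
    labels:
      - traefik.enable=true
      - traefik.http.routers.api.rule=Host(`agents.example.com`)  # assumed host
      - traefik.http.routers.api.entrypoints=web
      - traefik.http.services.api.loadbalancer.server.port=8000
```

Because routing is label-driven, `docker compose up -d --scale api=3` adds replicas to Traefik's backend pool without touching the load-balancer config.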
Scaling:

```bash
# Scale API servers
docker compose -f docker-compose.prod.yml up -d --scale api=3
# Scale workers (1 worker per ~10 concurrent calls)
docker compose -f docker-compose.prod.yml up -d --scale worker=5
```

Benefits:
- Simple deployment and debugging
- Easy local replication of production issues
- No Kubernetes expertise required
- Lower infrastructure cost initially
### Phase 2: Kubernetes (Future)

When needed, migrate to Kubernetes for:

Per-Agent Worker Pools:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker-meal-agent
spec:
  replicas: 3
  selector:
    matchLabels:
      agent: meal-agent
  template:
    metadata:
      labels:
        agent: meal-agent  # must match the selector above
    spec:
      containers:
        - name: worker
          env:
            - name: AGENT_FILTER
              value: "meal-agent"
          resources:
            limits:
              cpu: "2"
              memory: "2Gi"
```
Auto-scaling with KEDA:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
spec:
  scaleTargetRef:
    name: worker
  minReplicaCount: 2
  maxReplicaCount: 20
  triggers:
    - type: redis
      metadata:
        address: redis:6379
        listName: call-queue
        listLength: "5"  # scale up when queue length exceeds 5
```
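The two mechanisms combine naturally: each per-agent Deployment can get its own ScaledObject watching that agent's queue. A sketch, assuming a per-agent queue naming scheme (the `call-queue:meal-agent` list name is an assumption, not something this ADR defines):

```yaml
# Hypothetical per-agent scaler pairing the worker-meal-agent Deployment
# above with its own queue; the per-agent list name is an assumption.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-meal-agent-scaler
spec:
  scaleTargetRef:
    name: worker-meal-agent      # the per-agent Deployment above
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: redis
      metadata:
        address: redis:6379
        listName: call-queue:meal-agent  # assumed per-agent queue
        listLength: "5"
```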
### When to Consider Kubernetes

| Trigger | Current State | Kubernetes Needed |
|---|---|---|
| Concurrent calls | < 100 | > 100 consistently |
| Auto-scaling need | Manual is OK | Must scale automatically |
| Per-agent isolation | Not needed | Resource isolation required |
| Multi-region | Single region | Multiple regions |
| Team expertise | Limited K8s | K8s expertise available |
## Alternatives Considered

### 1. Kubernetes from Day 1

- Pros: Future-proof, auto-scaling built-in
- Cons: Operational complexity, overkill for current scale
Decision: Docker Compose is simpler. We can migrate when needed.
### 2. Serverless (AWS Lambda, Cloud Run)

- Pros: Auto-scaling, pay-per-use
- Cons: Cold starts problematic for real-time voice, connection limits
Decision: Voice agents need persistent connections. Serverless doesn't fit.
### 3. Managed Container Services (ECS, Cloud Run)

- Pros: Managed infrastructure, auto-scaling
- Cons: Vendor lock-in, more complex than Compose
Decision: Good middle ground. Document as alternative to K8s.
## Implementation

### docker-compose.prod.yml

```yaml
services:
  api:
    image: agent-studio-api:${VERSION:-latest}
    deploy:
      replicas: 2
      resources:
        limits:
          cpus: '1'
          memory: 1G
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health/ready"]
      interval: 30s
    depends_on:
      db: { condition: service_healthy }
      redis: { condition: service_healthy }

  worker:
    image: agent-studio-worker:${VERSION:-latest}
    command: uv run python -m agent_studio.worker.entrypoint
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: '2'
          memory: 2G
    depends_on:
      api: { condition: service_healthy }
```
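The `service_healthy` conditions above require `db` and `redis` to define health checks. A minimal sketch of those service definitions, where image tags and credential handling are assumptions:

```yaml
# Hypothetical db/redis entries for the services: block above; image tags
# and credentials are assumptions, not pinned by this ADR.
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      retries: 5

  redis:
    image: redis:7
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      retries: 5

volumes:
  pgdata:
```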
### Scaling Guidelines

| Component | Sizing | Rationale |
|---|---|---|
| API | 2+ replicas | Redundancy, load distribution |
| Worker | 1 per 10 calls | Each worker handles multiple concurrent calls |
| PostgreSQL | Connection pooling | Prevent connection exhaustion |
| Redis | Single instance | Low volume, can cluster later |
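For the PostgreSQL row, one common way to add connection pooling is a PgBouncer service between the app and the database. A sketch using the `edoburu/pgbouncer` image; the environment variable names follow that image's conventions, and the pool sizes are assumptions:

```yaml
# Hypothetical PgBouncer service for the "connection pooling" row above.
# Pool sizes are assumptions; tune against actual connection counts.
  pgbouncer:
    image: edoburu/pgbouncer:latest
    environment:
      DB_HOST: db
      DB_USER: postgres
      DB_PASSWORD: ${POSTGRES_PASSWORD}
      POOL_MODE: transaction
      MAX_CLIENT_CONN: "500"
      DEFAULT_POOL_SIZE: "20"
    depends_on:
      db: { condition: service_healthy }
    # API and workers would then point their database URL at
    # pgbouncer:5432 instead of db:5432.
```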
## Monitoring Checklist
- API response times (p50, p95, p99)
- Worker call queue depth
- Active call count
- Database connection pool usage
- Redis memory usage
- Container resource usage (CPU, memory)
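If these metrics are exported to Prometheus, the queue-depth item can double as the manual-scaling trigger. A sketch of an alert rule, where the metric name `agent_studio_call_queue_depth` is an assumption about the app's instrumentation, not something this ADR defines:

```yaml
# Hypothetical Prometheus alert rule: fires when the worker call queue
# stays deep, signaling it is time to scale workers manually.
groups:
  - name: agent-studio-scaling
    rules:
      - alert: CallQueueBacklog
        expr: agent_studio_call_queue_depth > 5  # assumed metric name
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Call queue depth > 5 for 5m; consider scaling workers"
```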
## Consequences

### Positive
- Simple operations: Docker Compose is familiar and debuggable
- Cost effective: No K8s infrastructure overhead
- Quick iteration: Easy to update and redeploy
- Clear upgrade path: Well-defined triggers for K8s migration
### Negative
- Manual scaling: Must monitor and scale manually
- Limited self-healing: Restart policies recover crashed containers, but failed containers are not rescheduled onto other hosts
- Limited isolation: All workers share resources
### Mitigations
- Health checks and restart policies for auto-recovery
- Monitoring alerts for scaling triggers
- Documented runbooks for common operations
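The first mitigation is a small addition to docker-compose.prod.yml. A sketch; the worker health-check command is a hypothetical module, not an existing entrypoint:

```yaml
# Hypothetical hardening for docker-compose.prod.yml: restart crashed
# containers automatically and give workers their own health check.
services:
  api:
    restart: unless-stopped
  worker:
    restart: unless-stopped
    healthcheck:
      # agent_studio.worker.healthcheck is an assumed module; it would
      # need to exist in the codebase and exit non-zero on failure.
      test: ["CMD-SHELL", "python -m agent_studio.worker.healthcheck"]
      interval: 30s
      retries: 3
```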