
ADR-010: Production Deployment Strategy

Docker Compose-based production deployment with future Kubernetes path


Status

Accepted

Context

Agent Studio needs a production deployment strategy that balances:

  1. Simplicity: Easy to deploy and operate initially
  2. Scalability: Ability to handle growing call volumes
  3. Reliability: High availability for production workloads
  4. Future-proof: Clear path to more sophisticated orchestration

Current scale requirements:

  • 10-100 concurrent calls initially
  • Single-region deployment
  • Small ops team

Future considerations:

  • Per-agent worker pools for resource isolation
  • Auto-scaling based on call queue depth
  • Multi-region deployment

Decision

Phase 1: Docker Compose (Current)

Use Docker Compose for production deployment with manual horizontal scaling.

Architecture:

┌─────────────────────────────────────────────────────────┐
│                    Load Balancer                         │
│                   (nginx/traefik)                        │
└─────────────────────┬───────────────────────────────────┘

        ┌─────────────┼─────────────┐
        │             │             │
        ▼             ▼             ▼
┌───────────┐ ┌───────────┐ ┌───────────┐
│  API (1)  │ │  API (2)  │ │  API (n)  │
└─────┬─────┘ └─────┬─────┘ └─────┬─────┘
      │             │             │
      └─────────────┼─────────────┘

        ┌───────────┼───────────┐
        │           │           │
        ▼           ▼           ▼
┌───────────┐ ┌───────────┐ ┌───────────┐
│ Worker(1) │ │ Worker(2) │ │ Worker(n) │
└───────────┘ └───────────┘ └───────────┘
        │           │           │
        └───────────┼───────────┘

    ┌───────────────┼───────────────┐
    │               │               │
    ▼               ▼               ▼
┌──────────┐  ┌─────────┐    ┌──────────┐
│PostgreSQL│  │  Redis  │    │ LiveKit  │
└──────────┘  └─────────┘    └──────────┘

Scaling:

# Scale API servers
docker compose -f docker-compose.prod.yml up -d --scale api=3

# Scale workers (1 worker per ~10 concurrent calls)
docker compose -f docker-compose.prod.yml up -d --scale worker=5

Benefits:

  • Simple deployment and debugging
  • Easy local replication of production issues
  • No Kubernetes expertise required
  • Lower infrastructure cost initially

Phase 2: Kubernetes (Future)

When needed, migrate to Kubernetes for:

Per-Agent Worker Pools:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker-meal-agent
spec:
  replicas: 3
  selector:
    matchLabels:
      agent: meal-agent
  template:
    metadata:
      labels:
        agent: meal-agent   # must match the selector or the Deployment is rejected
    spec:
      containers:
      - name: worker
        env:
        - name: AGENT_FILTER
          value: "meal-agent"
        resources:
          limits:
            cpu: "2"
            memory: "2Gi"
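On the worker side, the `AGENT_FILTER` variable would gate which jobs a pool accepts. The real logic lives in `agent_studio.worker.entrypoint`; the function below is a hypothetical sketch of that contract:

```python
import os

def should_handle(agent_name: str) -> bool:
    """Hypothetical dispatch filter: a worker with AGENT_FILTER set
    accepts only jobs for that agent; unset means accept anything."""
    wanted = os.environ.get("AGENT_FILTER")
    return wanted is None or wanted == agent_name

os.environ["AGENT_FILTER"] = "meal-agent"
print(should_handle("meal-agent"))   # True
print(should_handle("other-agent"))  # False
```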

Auto-scaling with KEDA:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
spec:
  scaleTargetRef:
    name: worker
  minReplicaCount: 2
  maxReplicaCount: 20
  triggers:
  - type: redis
    metadata:
      address: redis:6379
      listName: call-queue
      listLength: "5"  # target list length per replica
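KEDA's redis list trigger treats `listLength` as a target average per replica: the desired replica count is roughly the queue length divided by that target, clamped to the min/max bounds. A back-of-envelope sketch of that math:

```python
import math

def desired_replicas(queue_len: int, target_per_replica: int = 5,
                     lo: int = 2, hi: int = 20) -> int:
    """Approximate the HPA calculation KEDA feeds: ceil(metric / target),
    clamped to the ScaledObject's minReplicaCount/maxReplicaCount."""
    return min(hi, max(lo, math.ceil(queue_len / target_per_replica)))

print(desired_replicas(0))    # 2  (floor holds warm capacity)
print(desired_replicas(37))   # 8
print(desired_replicas(500))  # 20 (ceiling caps runaway scaling)
```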

When to Consider Kubernetes

Trigger               Current State    Kubernetes Needed
Concurrent calls      < 100            > 100 consistently
Auto-scaling need     Manual is OK     Must scale automatically
Per-agent isolation   Not needed       Resource isolation required
Multi-region          Single region    Multiple regions
Team expertise        Limited K8s      K8s expertise available

Alternatives Considered

1. Kubernetes from Day 1

Pros: Future-proof, auto-scaling built-in
Cons: Operational complexity, overkill for current scale

Decision: Docker Compose is simpler. We can migrate when needed.

2. Serverless (AWS Lambda, Cloud Run)

Pros: Auto-scaling, pay-per-use
Cons: Cold starts are problematic for real-time voice; connection limits

Decision: Voice agents need persistent connections. Serverless doesn't fit.

3. Managed Container Services (ECS, Cloud Run)

Pros: Managed infrastructure, auto-scaling
Cons: Vendor lock-in, more complex than Compose

Decision: Good middle ground. Document as alternative to K8s.

Implementation

docker-compose.prod.yml

services:
  api:
    image: agent-studio-api:${VERSION:-latest}
    deploy:
      replicas: 2
      resources:
        limits:
          cpus: '1'
          memory: 1G
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health/ready"]
      interval: 30s
    depends_on:
      db: { condition: service_healthy }
      redis: { condition: service_healthy }

  worker:
    image: agent-studio-worker:${VERSION:-latest}
    command: uv run python -m agent_studio.worker.entrypoint
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: '2'
          memory: 2G
    depends_on:
      api: { condition: service_healthy }
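The health checks above assume the API answers `GET /health/ready` with a 200. A minimal stdlib sketch of that contract follows; the real endpoint lives in the API service, and a production readiness check would also verify its dependencies:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    """Illustrative handler matching the Compose healthcheck's expectations."""

    def do_GET(self):
        if self.path == "/health/ready":
            # The real service would also check DB/Redis connectivity here
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # keep the sketch quiet
        pass

# To serve: HTTPServer(("0.0.0.0", 8000), HealthHandler).serve_forever()
```

The `curl -f` in the healthcheck exits non-zero on any 4xx/5xx status, so a failing readiness probe marks the container unhealthy and blocks dependents via `condition: service_healthy`.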

Scaling Guidelines

Component    Sizing               Rationale
API          2+ replicas          Redundancy, load distribution
Worker       1 per ~10 calls      Each worker handles multiple concurrent calls
PostgreSQL   Connection pooling   Prevent connection exhaustion
Redis        Single instance      Low volume, can cluster later
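The worker sizing rule reduces to a one-line calculation (the 10-calls-per-worker ratio is this document's guideline; replace it with measured capacity once available):

```python
import math

def workers_needed(concurrent_calls: int, calls_per_worker: int = 10) -> int:
    """Worker replicas required under the 1-worker-per-~10-calls guideline."""
    return max(1, math.ceil(concurrent_calls / calls_per_worker))

print(workers_needed(45))   # 5
print(workers_needed(100))  # 10
```

So `--scale worker=5` covers roughly 50 concurrent calls, and the 100-call Kubernetes trigger above corresponds to about 10 workers.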

Monitoring Checklist

  • API response times (p50, p95, p99)
  • Worker call queue depth
  • Active call count
  • Database connection pool usage
  • Redis memory usage
  • Container resource usage (CPU, memory)

Consequences

Positive

  • Simple operations: Docker Compose is familiar and debuggable
  • Cost effective: No K8s infrastructure overhead
  • Quick iteration: Easy to update and redeploy
  • Clear upgrade path: Well-defined triggers for K8s migration

Negative

  • Manual scaling: Must monitor and scale manually
  • No auto-healing: Must restart failed containers manually
  • Limited isolation: All workers share resources

Mitigations

  • Health checks and restart policies for auto-recovery
  • Monitoring alerts for scaling triggers
  • Document runbooks for common operations
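The first mitigation can be expressed directly in the Compose file (a sketch; the exact policy value is an operational choice):

```yaml
# hypothetical additions to docker-compose.prod.yml
services:
  api:
    restart: unless-stopped   # recreate the container if the process dies
  worker:
    restart: unless-stopped
```

Combined with the health checks already defined, this covers crash recovery; it does not replace alerting, since a container stuck in a restart loop still needs human attention.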
