# ADR-010: Production Deployment Strategy
Docker Compose-based production deployment with future Kubernetes path
## Status

Accepted

## Context
Agent Studio needs a production deployment strategy that balances:
- Simplicity: Easy to deploy and operate initially
- Scalability: Ability to handle growing call volumes
- Reliability: High availability for production workloads
- Future-proof: Clear path to more sophisticated orchestration
Current scale requirements:
- 10-100 concurrent calls initially
- Single-region deployment
- Small ops team
Future considerations:
- Per-agent worker pools for resource isolation
- Auto-scaling based on call queue depth
- Multi-region deployment
## Decision

### Phase 1: Docker Compose (Current)

Use Docker Compose for production deployment with manual horizontal scaling.
Architecture:

```
┌─────────────────────────────────────────────────────────┐
│                      Load Balancer                      │
│                     (nginx/traefik)                     │
└────────────────────────────┬────────────────────────────┘
                             │
               ┌─────────────┼─────────────┐
               │             │             │
               ▼             ▼             ▼
         ┌───────────┐ ┌───────────┐ ┌───────────┐
         │  API (1)  │ │  API (2)  │ │  API (n)  │
         └─────┬─────┘ └─────┬─────┘ └─────┬─────┘
               │             │             │
               └─────────────┼─────────────┘
                             │
               ┌─────────────┼─────────────┐
               │             │             │
               ▼             ▼             ▼
         ┌───────────┐ ┌───────────┐ ┌───────────┐
         │ Worker(1) │ │ Worker(2) │ │ Worker(n) │
         └─────┬─────┘ └─────┬─────┘ └─────┬─────┘
               │             │             │
               └─────────────┼─────────────┘
                             │
              ┌──────────────┼──────────────┐
              │              │              │
              ▼              ▼              ▼
       ┌────────────┐ ┌────────────┐ ┌────────────┐
       │ PostgreSQL │ │   Redis    │ │  LiveKit   │
       └────────────┘ └────────────┘ └────────────┘
```
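The load-balancer tier can be nginx or Traefik; Traefik's Docker provider pairs especially well with `--scale` because it discovers new replicas automatically. A minimal sketch, where the hostname, exposed port, and label values are illustrative assumptions rather than part of this ADR:

```yaml
# Hypothetical Traefik front end for the scaled api service.
# Hostname and ports are assumptions for illustration.
services:
  traefik:
    image: traefik:v3.0
    command:
      - --providers.docker=true
      - --providers.docker.exposedbydefault=false
      - --entrypoints.web.address=:80
    ports:
      - "80:80"
    volumes:
      # Read-only Docker socket so Traefik can discover containers
      - /var/run/docker.sock:/var/run/docker.sock:ro

  api:
    image: agent-studio-api:${VERSION:-latest}
    labels:
      - traefik.enable=true
      - traefik.http.routers.api.rule=Host(`agents.example.com`)  # assumed host
      - traefik.http.routers.api.entrypoints=web
      - traefik.http.services.api.loadbalancer.server.port=8000
```

Because routing is label-driven, `docker compose up -d --scale api=3` adds replicas to Traefik's backend pool without touching the load-balancer config.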
Scaling:

```bash
# Scale API servers
docker compose -f docker-compose.prod.yml up -d --scale api=3
# Scale workers (1 worker per ~10 concurrent calls)
docker compose -f docker-compose.prod.yml up -d --scale worker=5
```

Benefits:
- Simple deployment and debugging
- Easy local replication of production issues
- No Kubernetes expertise required
- Lower infrastructure cost initially
### Phase 2: Kubernetes (Future)

When needed, migrate to Kubernetes for:

Per-Agent Worker Pools:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker-meal-agent
spec:
  replicas: 3
  selector:
    matchLabels:
      agent: meal-agent
  template:
    metadata:
      labels:
        agent: meal-agent  # must match the selector above
    spec:
      containers:
        - name: worker
          env:
            - name: AGENT_FILTER
              value: "meal-agent"
          resources:
            limits:
              cpu: "2"
              memory: "2Gi"
```
Auto-scaling with KEDA:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
spec:
  scaleTargetRef:
    name: worker
  minReplicaCount: 2
  maxReplicaCount: 20
  triggers:
    - type: redis
      metadata:
        address: redis:6379
        listName: call-queue
        listLength: "5"  # scale up when queue length exceeds 5
```
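The two mechanisms combine naturally: each per-agent Deployment can get its own ScaledObject watching that agent's queue. A sketch, assuming a per-agent queue naming scheme (the `call-queue:meal-agent` list name is an assumption, not something this ADR defines):

```yaml
# Hypothetical per-agent scaler pairing the worker-meal-agent Deployment
# above with its own queue; the per-agent list name is an assumption.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-meal-agent-scaler
spec:
  scaleTargetRef:
    name: worker-meal-agent      # the per-agent Deployment above
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: redis
      metadata:
        address: redis:6379
        listName: call-queue:meal-agent  # assumed per-agent queue
        listLength: "5"
```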
### When to Consider Kubernetes

| Trigger | Current State | Kubernetes Needed |
|---|---|---|
| Concurrent calls | < 100 | > 100 consistently |
| Auto-scaling need | Manual is OK | Must scale automatically |
| Per-agent isolation | Not needed | Resource isolation required |
| Multi-region | Single region | Multiple regions |
| Team expertise | Limited K8s | K8s expertise available |
## Alternatives Considered

### 1. Kubernetes from Day 1

- Pros: Future-proof, auto-scaling built-in
- Cons: Operational complexity, overkill for current scale
Decision: Docker Compose is simpler. We can migrate when needed.
### 2. Serverless (AWS Lambda, Cloud Run)

- Pros: Auto-scaling, pay-per-use
- Cons: Cold starts problematic for real-time voice, connection limits
Decision: Voice agents need persistent connections. Serverless doesn't fit.
### 3. Managed Container Services (ECS, Cloud Run)

- Pros: Managed infrastructure, auto-scaling
- Cons: Vendor lock-in, more complex than Compose
Decision: Good middle ground. Document as alternative to K8s.
## Implementation

### docker-compose.prod.yml

```yaml
services:
  api:
    image: agent-studio-api:${VERSION:-latest}
    deploy:
      replicas: 2
      resources:
        limits:
          cpus: '1'
          memory: 1G
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health/ready"]
      interval: 30s
    depends_on:
      db: { condition: service_healthy }
      redis: { condition: service_healthy }

  worker:
    image: agent-studio-worker:${VERSION:-latest}
    command: uv run python -m agent_studio.worker.entrypoint
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: '2'
          memory: 2G
    depends_on:
      api: { condition: service_healthy }
```
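The `service_healthy` conditions above require `db` and `redis` to define health checks. A minimal sketch of those service definitions, where image tags and credential handling are assumptions:

```yaml
# Hypothetical db/redis entries for the services: block above; image tags
# and credentials are assumptions, not pinned by this ADR.
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      retries: 5

  redis:
    image: redis:7
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      retries: 5

volumes:
  pgdata:
```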
### Scaling Guidelines

| Component | Sizing | Rationale |
|---|---|---|
| API | 2+ replicas | Redundancy, load distribution |
| Worker | 1 per 10 calls | Each worker handles multiple concurrent calls |
| PostgreSQL | Connection pooling | Prevent connection exhaustion |
| Redis | Single instance | Low volume, can cluster later |
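For the PostgreSQL row, one common way to add connection pooling is a PgBouncer service between the app and the database. A sketch using the `edoburu/pgbouncer` image; the environment variable names follow that image's conventions, and the pool sizes are assumptions:

```yaml
# Hypothetical PgBouncer service for the "connection pooling" row above.
# Pool sizes are assumptions; tune against actual connection counts.
  pgbouncer:
    image: edoburu/pgbouncer:latest
    environment:
      DB_HOST: db
      DB_USER: postgres
      DB_PASSWORD: ${POSTGRES_PASSWORD}
      POOL_MODE: transaction
      MAX_CLIENT_CONN: "500"
      DEFAULT_POOL_SIZE: "20"
    depends_on:
      db: { condition: service_healthy }
    # API and workers would then point their database URL at
    # pgbouncer:5432 instead of db:5432.
```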
## Monitoring Checklist
- API response times (p50, p95, p99)
- Worker call queue depth
- Active call count
- Database connection pool usage
- Redis memory usage
- Container resource usage (CPU, memory)
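If these metrics are exported to Prometheus, the queue-depth item can double as the manual-scaling trigger. A sketch of an alert rule, where the metric name `agent_studio_call_queue_depth` is an assumption about the app's instrumentation, not something this ADR defines:

```yaml
# Hypothetical Prometheus alert rule: fires when the worker call queue
# stays deep, signaling it is time to scale workers manually.
groups:
  - name: agent-studio-scaling
    rules:
      - alert: CallQueueBacklog
        expr: agent_studio_call_queue_depth > 5  # assumed metric name
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Call queue depth > 5 for 5m; consider scaling workers"
```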
## Consequences

### Positive
- Simple operations: Docker Compose is familiar and debuggable
- Cost effective: No K8s infrastructure overhead
- Quick iteration: Easy to update and redeploy
- Clear upgrade path: Well-defined triggers for K8s migration
### Negative
- Manual scaling: Must monitor and scale manually
- Limited self-healing: Restart policies recover crashed containers, but failed containers are not rescheduled onto other hosts
- Limited isolation: All workers share resources
### Mitigations
- Health checks and restart policies for auto-recovery
- Monitoring alerts for scaling triggers
- Documented runbooks for common operations
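The first mitigation is a small addition to docker-compose.prod.yml. A sketch; the worker health-check command is a hypothetical module, not an existing entrypoint:

```yaml
# Hypothetical hardening for docker-compose.prod.yml: restart crashed
# containers automatically and give workers their own health check.
services:
  api:
    restart: unless-stopped
  worker:
    restart: unless-stopped
    healthcheck:
      # agent_studio.worker.healthcheck is an assumed module; it would
      # need to exist in the codebase and exit non-zero on failure.
      test: ["CMD-SHELL", "python -m agent_studio.worker.healthcheck"]
      interval: 30s
      retries: 3
```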