Production Deployment
Production deployment checklist and best practices
This guide covers best practices and a checklist for deploying Agent Studio to production.
Pre-Deployment Checklist
Security
- Generate a secure JWT_SECRET (min 32 characters)
- Generate a secure ENCRYPTION_KEY (exactly 32 bytes)
- Use HTTPS for all endpoints
- Configure CORS origins (don't use * in production)
- Set up rate limiting
- Enable audit logging
- Review API key permissions
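The secret and CORS items above can be checked mechanically before the service starts. The following is a minimal sketch, not part of Agent Studio: the function names are hypothetical, and it assumes ENCRYPTION_KEY may be supplied either as a raw 32-byte string or as base64 text that decodes to 32 bytes, since this guide shows both forms.

```python
import base64
import binascii


def _key_is_32_bytes(value: str) -> bool:
    """Accept a raw 32-byte string or base64 text that decodes to 32 bytes."""
    if len(value.encode("utf-8")) == 32:
        return True
    try:
        return len(base64.b64decode(value, validate=True)) == 32
    except (binascii.Error, ValueError):
        return False


def check_security_env(env: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if len(env.get("JWT_SECRET", "")) < 32:
        problems.append("JWT_SECRET must be at least 32 characters")
    if not _key_is_32_bytes(env.get("ENCRYPTION_KEY", "")):
        problems.append("ENCRYPTION_KEY must be exactly 32 bytes")
    # CORS_ORIGINS is assumed to be a comma-separated list, as in this guide.
    origins = [o.strip() for o in env.get("CORS_ORIGINS", "").split(",") if o.strip()]
    if not origins or "*" in origins:
        problems.append("CORS_ORIGINS must list explicit origins, not *")
    elif any(not o.startswith("https://") for o in origins):
        problems.append("CORS_ORIGINS should use https:// origins")
    return problems
```

A deployment script could call `check_security_env(dict(os.environ))` and abort when the returned list is non-empty.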
Infrastructure
- PostgreSQL 16+ with SSL enabled
- Redis 7+ with authentication
- LiveKit server or LiveKit Cloud
- Load balancer with health checks
- DNS and SSL certificates
Monitoring
- Application metrics (Prometheus)
- Log aggregation
- Error tracking (Sentry)
- Uptime monitoring
- Alerting configured
Environment Configuration
Required Variables
# Database (use connection pooling in production)
DATABASE_URL=postgresql+asyncpg://user:password@host:5432/agent_studio?ssl=require
# Redis (with auth)
REDIS_URL=redis://:password@host:6379/0
# Security - GENERATE SECURE VALUES!
JWT_SECRET=generate-a-secure-random-string-at-least-32-chars
ENCRYPTION_KEY=exactly-32-bytes-for-aes256-key!
# LiveKit
LIVEKIT_URL=wss://your-livekit-server.com
LIVEKIT_API_KEY=your-api-key
LIVEKIT_API_SECRET=your-api-secret
# Application
LOG_LEVEL=INFO
CORS_ORIGINS=https://dashboard.yourdomain.com,https://app.yourdomain.com
Generating Secure Keys
# JWT Secret (64 characters)
openssl rand -base64 48
# Encryption Key (32 bytes, base64)
openssl rand -base64 32
Architecture
Recommended Setup
                  ┌─────────────────┐
                  │  Load Balancer  │
                  │   (HTTPS/WSS)   │
                  └────────┬────────┘
                           │
            ┌──────────────┼──────────────┐
            │              │              │
     ┌──────▼──────┐ ┌─────▼─────┐ ┌──────▼──────┐
     │   API #1    │ │  API #2   │ │   API #3    │
     │  (FastAPI)  │ │ (FastAPI) │ │  (FastAPI)  │
     └──────┬──────┘ └─────┬─────┘ └──────┬──────┘
            │              │              │
            └──────────────┼──────────────┘
                           │
            ┌──────────────┼──────────────┐
            │              │              │
     ┌──────▼──────┐ ┌─────▼─────┐ ┌──────▼──────┐
     │  Worker #1  │ │ Worker #2 │ │  Worker #3  │
     │  (LiveKit)  │ │ (LiveKit) │ │  (LiveKit)  │
     └─────────────┘ └───────────┘ └─────────────┘
            │              │              │
            └──────────────┼──────────────┘
                           │
       ┌───────────────────┼───────────────────┐
       │                   │                   │
┌──────▼──────┐     ┌──────▼──────┐     ┌──────▼──────┐
│ PostgreSQL  │     │    Redis    │     │   LiveKit   │
│   Primary   │     │   Cluster   │     │   Server    │
└─────────────┘     └─────────────┘     └─────────────┘
Scaling Guidelines
| Component | Scaling Strategy |
|---|---|
| API | Horizontal (stateless) |
| Worker | Horizontal (1 worker per concurrent call) |
| PostgreSQL | Vertical + read replicas |
| Redis | Cluster mode |
Database
Connection Pooling
Use PgBouncer or the application's built-in connection pool:
# In settings
DATABASE_POOL_SIZE=20
DATABASE_POOL_OVERFLOW=10
Migrations
Run migrations before deploying new versions:
# Using Alembic
alembic upgrade head
Backups
- Enable automated daily backups
- Test restore procedures
- Keep 30 days of backups minimum
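The 30-day retention rule can be expressed as a small selection function. This is a sketch under stated assumptions: `backups_to_prune` is a hypothetical helper, backup timestamps are timezone-aware, and the newest backup is never pruned even when backups have stopped running.

```python
from datetime import datetime, timedelta, timezone


def backups_to_prune(
    backup_times: list[datetime], now: datetime, retention_days: int = 30
) -> list[datetime]:
    """Return backups older than the retention window, oldest first."""
    cutoff = now - timedelta(days=retention_days)
    expired = sorted(t for t in backup_times if t < cutoff)
    # Safety net: keep the most recent backup regardless of its age.
    newest = max(backup_times, default=None)
    return [t for t in expired if t != newest]
```

Managed services such as RDS apply retention themselves; a function like this only matters for self-managed backup storage.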
Monitoring
Prometheus Metrics
Agent Studio exposes metrics at /metrics:
# API metrics
http_requests_total{method="GET", endpoint="/api/v1/agents", status="200"}
http_request_duration_seconds{method="GET", endpoint="/api/v1/agents"}
# Voice metrics
voice_calls_total{workflow="daily-call", status="completed"}
voice_call_duration_seconds{workflow="daily-call"}
# Provider metrics
provider_latency_seconds{provider="deepgram", type="stt"}
Logging
Structured JSON logging is enabled by default:
{
  "timestamp": "2026-01-17T10:30:00Z",
  "level": "INFO",
  "message": "Call completed",
  "call_id": "uuid",
  "duration": 180,
  "request_id": "req-uuid"
}
Health Checks
Configure your load balancer to use:
- Liveness: GET /health/live (container running)
- Readiness: GET /health/ready (ready for traffic)
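For smoke tests outside the load balancer, the two endpoints can be probed with the standard library. This is a minimal sketch: the function names are hypothetical, and it assumes a passing check returns HTTP 200, which this guide does not state explicitly.

```python
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status of a GET request, or 0 if the host is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # Non-2xx responses raise HTTPError; the status code is still useful.
        return exc.code
    except (urllib.error.URLError, OSError):
        return 0


def probe(base_url: str, get_status=http_status) -> dict[str, bool]:
    """Check both endpoints; 200 is treated as passing (assumption)."""
    base = base_url.rstrip("/")
    return {
        "live": get_status(base + "/health/live") == 200,
        "ready": get_status(base + "/health/ready") == 200,
    }
```

An instance that reports `live` but not `ready` is running but should not yet receive traffic, for example while it is still connecting to the database.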
Security Hardening
API Security
- Rate Limiting: Configure per-key limits
- Request Validation: All inputs are validated
- SQL Injection: Prevented via SQLAlchemy ORM
- XSS: API-only, no HTML rendering
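Per-key rate limiting is commonly implemented as a token bucket. The sketch below illustrates the idea only; it is not Agent Studio's implementation, the class names and parameters are hypothetical, and state is held in process memory, so a multi-instance deployment would need a shared store such as Redis.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, now: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to the time elapsed since the last request.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerKeyLimiter:
    """Keep one bucket per API key so a noisy client cannot starve the others."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, api_key: str) -> bool:
        now = self.clock()
        bucket = self.buckets.get(api_key)
        if bucket is None:
            bucket = self.buckets[api_key] = TokenBucket(self.rate, self.capacity, now)
        return bucket.allow(now)
```

A request handler would call `limiter.allow(api_key)` and respond with HTTP 429 when it returns False.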
Network Security
- TLS 1.3: Use modern TLS
- Private Networks: Keep DB/Redis internal
- Firewall Rules: Restrict access by IP
Secret Management
Use a secrets manager:
- AWS Secrets Manager
- HashiCorp Vault
- Kubernetes Secrets
# Example: AWS Secrets Manager
DATABASE_URL=$(aws secretsmanager get-secret-value --secret-id prod/db-url --query SecretString --output text)
High Availability
API Layer
- Run 3+ instances behind a load balancer
- Use sticky sessions for WebSocket (if needed)
- Health check interval: 10s
- Unhealthy threshold: 3
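With a 10s interval and an unhealthy threshold of 3, a failed instance is removed after roughly 30 seconds. The sketch below models that behaviour so the trade-off is visible; `HealthTracker` is a hypothetical illustration, not a load balancer API, and real load balancers also apply a separate healthy threshold before re-adding an instance.

```python
class HealthTracker:
    """Track consecutive failed checks for one instance."""

    def __init__(self, unhealthy_threshold: int = 3):
        self.unhealthy_threshold = unhealthy_threshold
        self.consecutive_failures = 0

    def record(self, check_passed: bool) -> bool:
        """Record one check result; return True while the instance stays in rotation."""
        if check_passed:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures < self.unhealthy_threshold


def approx_detection_seconds(interval_s: float = 10, unhealthy_threshold: int = 3) -> float:
    """Approximate time from failure to removal, ignoring check timeouts."""
    return interval_s * unhealthy_threshold
```

Lowering the interval or threshold shortens detection time but makes the load balancer more likely to eject an instance during a brief pause.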
Worker Layer
- Run workers equal to max concurrent calls
- Workers are stateless (use Redis for state)
- Auto-scaling based on queue depth
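Because each worker handles one concurrent call, the target worker count follows directly from active calls plus queue depth. This is a sketch with hypothetical names and illustrative defaults; the headroom percentage and the min/max bounds are assumptions to tune per deployment.

```python
import math


def desired_workers(
    active_calls: int,
    queued_calls: int,
    min_workers: int = 1,
    max_workers: int = 50,
    headroom_pct: int = 20,
) -> int:
    """Return the worker count an autoscaler should converge on."""
    demand = active_calls + queued_calls
    # Add spare capacity so new calls do not wait for a worker to boot.
    target = math.ceil(demand * (100 + headroom_pct) / 100)
    return max(min_workers, min(max_workers, target))
```

For example, 10 active calls and 5 queued calls with 20% headroom give a target of 18 workers.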
Database
- PostgreSQL with streaming replication
- Automatic failover (Patroni, RDS Multi-AZ)
- Connection pooling
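When every API instance holds its own pool, the combined total can exceed what PostgreSQL accepts (its default max_connections is 100). The sketch below budgets for that; the function names are hypothetical, and it assumes DATABASE_POOL_OVERFLOW means extra connections on top of DATABASE_POOL_SIZE, in the style of SQLAlchemy's max_overflow.

```python
def total_db_connections(instances: int, pool_size: int = 20, pool_overflow: int = 10) -> int:
    """Worst-case connections opened by all instances at full overflow."""
    return instances * (pool_size + pool_overflow)


def fits_connection_budget(
    instances: int,
    max_connections: int,
    reserved: int = 10,
    pool_size: int = 20,
    pool_overflow: int = 10,
) -> bool:
    """Check the worst case against the server limit, keeping some slots reserved."""
    # `reserved` leaves room for migrations, monitoring, and admin sessions.
    worst_case = total_db_connections(instances, pool_size, pool_overflow)
    return worst_case <= max_connections - reserved
```

With the pool settings shown in this guide, three instances can open up to 90 connections, which is one reason to put PgBouncer in front of the database or to raise max_connections as the API layer scales out.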
Deployment Strategies
Blue-Green Deployment
- Deploy new version to "green" environment
- Run smoke tests
- Switch load balancer to green
- Keep blue as rollback
Rolling Deployment
- Update instances one at a time
- Wait for health checks to pass
- Continue to next instance
- Automatic rollback on failures
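The rolling steps above can be sketched as a loop. This is an illustration rather than a deployment tool: `update`, `is_healthy`, and `rollback` are hypothetical callables supplied by the caller, and `is_healthy` is assumed to wait for health checks with its own timeout.

```python
from typing import Callable, Sequence


def rolling_deploy(
    instances: Sequence[str],
    update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> bool:
    """Update instances one at a time; undo everything on the first failure."""
    updated: list[str] = []
    for instance in instances:
        update(instance)
        updated.append(instance)
        if not is_healthy(instance):
            # Roll back in reverse order, starting with the failing instance.
            for done in reversed(updated):
                rollback(done)
            return False
    return True
```

Instances that were never touched keep serving the old version throughout, so capacity only drops by one instance at a time.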
Canary Deployment
- Route 5% traffic to new version
- Monitor error rates and latency
- Gradually increase to 100%
- Rollback if issues detected
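The canary decision at each step can be reduced to a comparison against the stable version. The sketch below is illustrative: the function name, traffic steps, and thresholds are assumptions, not values prescribed by this guide.

```python
def next_canary_weight(
    current_pct: int,
    canary_error_rate: float,
    baseline_error_rate: float,
    canary_p95_ms: float,
    baseline_p95_ms: float,
    steps: tuple[int, ...] = (5, 25, 50, 100),
    max_error_delta: float = 0.01,
    max_latency_ratio: float = 1.2,
) -> int:
    """Return the next traffic percentage for the canary, or 0 to roll back."""
    if canary_error_rate - baseline_error_rate > max_error_delta:
        return 0
    if canary_p95_ms > baseline_p95_ms * max_latency_ratio:
        return 0
    # Healthy: advance to the first step above the current weight.
    for step in steps:
        if step > current_pct:
            return step
    return 100
```

The error rate and latency inputs could be derived from the http_requests_total and http_request_duration_seconds metrics, compared per version.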
Troubleshooting
Common Issues
High Latency
- Check database query performance
- Review provider API latencies
- Check Redis connectivity
Connection Errors
- Verify connection pool settings
- Check network connectivity
- Review firewall rules
Memory Issues
- Monitor container memory usage
- Check for memory leaks in workers
- Adjust resource limits
Debug Mode
Enable debug logging temporarily:
LOG_LEVEL=DEBUG
Never enable debug logging in production for extended periods; it impacts performance and may log sensitive data.
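One way to reduce the risk of leaking secrets while debug logging is on is a redaction filter. This is a minimal sketch using Python's standard logging module; the class name and the key list are assumptions, and it only catches key=value pairs in the message text, not fields inside structured JSON payloads.

```python
import logging
import re

# Keys whose values should never reach the logs (illustrative list).
SENSITIVE = re.compile(r"(?i)(password|secret|token|api_key|authorization)=(\S+)")


class RedactSecrets(logging.Filter):
    """Mask values of sensitive key=value pairs before a record is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message first so values passed as %-args are masked too.
        record.msg = SENSITIVE.sub(r"\1=[REDACTED]", record.getMessage())
        record.args = ()
        return True
```

Attaching it with `handler.addFilter(RedactSecrets())` applies the masking to every record that handler emits.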
Maintenance
Regular Tasks
| Task | Frequency |
|---|---|
| Rotate API keys | Quarterly |
| Update dependencies | Monthly |
| Review access logs | Weekly |
| Test disaster recovery | Quarterly |
| Security audit | Annually |
Updates
- Review changelog for breaking changes
- Test in staging environment
- Schedule maintenance window
- Deploy with rollback plan
- Monitor for issues