Production Deployment

This guide covers best practices and a checklist for deploying Agent Studio to production.

Pre-Deployment Checklist

Security

Generate a secure JWT_SECRET (at least 32 characters)
Generate a secure ENCRYPTION_KEY (exactly 32 bytes)
Use HTTPS for all endpoints
Configure CORS origins (don't use * in production)
Set up rate limiting
Enable audit logging
Review API key permissions

Infrastructure

PostgreSQL 16+ with SSL enabled
Redis 7+ with authentication
LiveKit server or LiveKit Cloud
Load balancer with health checks
DNS and SSL certificates

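Monitoring

Scrape Prometheus metrics from /metrics
Collect the structured JSON logs
Point load balancer health checks at /health/live and /health/ready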

Environment Configuration

Required Variables

# Database (use connection pooling in production)
DATABASE_URL=postgresql+asyncpg://user:password@host:5432/agent_studio?ssl=require

# Redis (with auth)
REDIS_URL=redis://:password@host:6379/0

# Security - GENERATE SECURE VALUES!
JWT_SECRET=generate-a-secure-random-string-at-least-32-chars
ENCRYPTION_KEY=exactly-32-bytes-for-aes256-key!

# LiveKit
LIVEKIT_URL=wss://your-livekit-server.com
LIVEKIT_API_KEY=your-api-key
LIVEKIT_API_SECRET=your-api-secret

# Application
LOG_LEVEL=INFO
CORS_ORIGINS=https://dashboard.yourdomain.com,https://app.yourdomain.com
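A startup check can catch weak or missing values before the service accepts traffic. The following is a minimal sketch, not part of Agent Studio: `validate_settings` is a hypothetical helper, and it assumes ENCRYPTION_KEY may be supplied either as 32 raw bytes (like the placeholder above) or as base64 that decodes to 32 bytes.

```python
import base64
import binascii
import os


def _is_32_byte_key(value: str) -> bool:
    """Assumption: accept 32 raw bytes, or base64 text that decodes to 32 bytes."""
    if len(value.encode("utf-8")) == 32:
        return True
    try:
        return len(base64.b64decode(value, validate=True)) == 32
    except (binascii.Error, ValueError):
        return False


def validate_settings(env: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    if len(env.get("JWT_SECRET", "")) < 32:
        problems.append("JWT_SECRET must be at least 32 characters")

    if not _is_32_byte_key(env.get("ENCRYPTION_KEY", "")):
        problems.append("ENCRYPTION_KEY must be exactly 32 bytes")

    origins = [o.strip() for o in env.get("CORS_ORIGINS", "").split(",") if o.strip()]
    if not origins or "*" in origins:
        problems.append("CORS_ORIGINS must list explicit origins, not *")

    for name in ("DATABASE_URL", "REDIS_URL", "LIVEKIT_URL",
                 "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET"):
        if not env.get(name):
            problems.append(f"{name} is not set")

    return problems


if __name__ == "__main__":
    for problem in validate_settings(dict(os.environ)):
        print(f"config error: {problem}")
```

Running the script in the deployment environment prints one line per problem and nothing when the configuration passes.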

Generating Secure Keys

# JWT secret (48 random bytes, printed as 64 base64 characters)
openssl rand -base64 48

# Encryption key (32 random bytes, printed as 44 base64 characters)
openssl rand -base64 32
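Where openssl is not available, the same values can be produced with Python's standard library. This is an equivalent sketch of the two commands above; the function names are illustrative.

```python
import base64
import secrets


def generate_jwt_secret() -> str:
    # 48 random bytes encode to 64 base64 characters, like `openssl rand -base64 48`.
    return base64.b64encode(secrets.token_bytes(48)).decode("ascii")


def generate_encryption_key() -> str:
    # 32 random bytes encode to 44 base64 characters, like `openssl rand -base64 32`.
    return base64.b64encode(secrets.token_bytes(32)).decode("ascii")


if __name__ == "__main__":
    print("JWT_SECRET=" + generate_jwt_secret())
    print("ENCRYPTION_KEY=" + generate_encryption_key())
```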

Architecture

Recommended Setup

                    ┌─────────────────┐
                    │  Load Balancer  │
                    │   (HTTPS/WSS)   │
                    └────────┬────────┘
                             │
              ┌──────────────┼──────────────┐
              │              │              │
       ┌──────▼──────┐ ┌─────▼─────┐ ┌──────▼──────┐
       │   API #1    │ │  API #2   │ │   API #3    │
       │  (FastAPI)  │ │ (FastAPI) │ │  (FastAPI)  │
       └──────┬──────┘ └─────┬─────┘ └──────┬──────┘
              │              │              │
              └──────────────┼──────────────┘
                             │
              ┌──────────────┼──────────────┐
              │              │              │
       ┌──────▼──────┐ ┌─────▼─────┐ ┌──────▼──────┐
       │  Worker #1  │ │ Worker #2 │ │  Worker #3  │
       │  (LiveKit)  │ │ (LiveKit) │ │  (LiveKit)  │
       └─────────────┘ └───────────┘ └─────────────┘
              │              │              │
              └──────────────┼──────────────┘
                             │
         ┌───────────────────┼───────────────────┐
         │                   │                   │
  ┌──────▼──────┐     ┌──────▼──────┐     ┌──────▼──────┐
  │  PostgreSQL │     │    Redis    │     │   LiveKit   │
  │   Primary   │     │   Cluster   │     │   Server    │
  └─────────────┘     └─────────────┘     └─────────────┘

Scaling Guidelines

Component	Scaling Strategy
API	Horizontal (stateless)
Worker	Horizontal (1 worker per concurrent call)
PostgreSQL	Vertical + read replicas
Redis	Cluster mode

Database

Connection Pooling

Use PgBouncer or the built-in connection pool:

# In settings
DATABASE_POOL_SIZE=20
DATABASE_POOL_OVERFLOW=10
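As a rough sizing check, the pool settings can be multiplied out against the server's connection limit. The sketch below assumes each API instance keeps its own pool and that DATABASE_POOL_OVERFLOW means extra connections allowed beyond the pool size (as with SQLAlchemy's max_overflow). The limit of 100 is PostgreSQL's default max_connections; the reserved margin of 10 is an illustrative assumption.

```python
def peak_connections(instances: int, pool_size: int, pool_overflow: int) -> int:
    """Worst-case connections opened by all instances at once."""
    return instances * (pool_size + pool_overflow)


def fits_server_limit(peak: int, max_connections: int = 100, reserved: int = 10) -> bool:
    """True if the peak leaves `reserved` connections free for admin and maintenance use."""
    return peak <= max_connections - reserved


# Three API instances with the settings shown above.
api_peak = peak_connections(instances=3, pool_size=20, pool_overflow=10)
print(api_peak)                     # 90
print(fits_server_limit(api_peak))  # True
```

Under these assumptions three API instances alone can use 90 connections, so any workers that also connect to the database would push the total past the default limit. That is the situation where PgBouncer or a higher max_connections is worth considering.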

Migrations

Run migrations before deploying new versions:

# Using Alembic
alembic upgrade head

Backups

Enable automated daily backups
Test restore procedures
Keep at least 30 days of backups

Monitoring

Prometheus Metrics

Agent Studio exposes metrics at /metrics:

# API metrics
http_requests_total{method="GET", endpoint="/api/v1/agents", status="200"}
http_request_duration_seconds{method="GET", endpoint="/api/v1/agents"}

# Voice metrics
voice_calls_total{workflow="daily-call", status="completed"}
voice_call_duration_seconds{workflow="daily-call"}

# Provider metrics
provider_latency_seconds{provider="deepgram", type="stt"}

Logging

Structured JSON logging is enabled by default:

{
  "timestamp": "2026-01-17T10:30:00Z",
  "level": "INFO",
  "message": "Call completed",
  "call_id": "uuid",
  "duration": 180,
  "request_id": "req-uuid"
}
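For custom services that should log in the same shape, a formatter along these lines produces comparable records. It is a minimal sketch using only the standard library, not Agent Studio's actual logging code; the class name and the list of extra fields are assumptions based on the sample record above.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    # Fields copied from `extra=` when present (taken from the sample record above).
    EXTRA_FIELDS = ("call_id", "duration", "request_id")

    def format(self, record: logging.LogRecord) -> str:
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        payload = {
            "timestamp": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        for field in self.EXTRA_FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        return json.dumps(payload)


if __name__ == "__main__":
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("example.calls")  # hypothetical logger name
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info("Call completed", extra={"call_id": "uuid", "duration": 180})
```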

Health Checks

Configure your load balancer to use:

Liveness: GET /health/live (container running)
Readiness: GET /health/ready (ready for traffic)
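The difference between the two probes can be sketched as follows: liveness only confirms the process responds, while readiness also verifies dependencies. The function names, the response bodies, and the 200/503 status codes are assumptions for illustration; the source only specifies the two paths.

```python
from typing import Callable, Dict, Tuple


def liveness() -> Tuple[int, dict]:
    """The process is up and able to answer; no dependencies are checked."""
    return 200, {"status": "alive"}


def readiness(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, dict]:
    """Run each dependency check; any failure or exception means not ready."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    ok = all(results.values())
    status = "ready" if ok else "not_ready"
    return (200 if ok else 503), {"status": status, "checks": results}


if __name__ == "__main__":
    # Hypothetical checks; real ones would ping PostgreSQL and Redis.
    print(readiness({"postgres": lambda: True, "redis": lambda: True}))
```

Returning a non-2xx status from the readiness probe lets the load balancer stop routing to an instance without restarting it.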

Security Hardening

API Security

Rate Limiting: Configure per-key limits
Request Validation: All inputs are validated
SQL Injection: Mitigated by SQLAlchemy's parameterized queries; avoid building raw SQL from user input
XSS: API-only, no HTML rendering
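Per-key rate limiting is commonly implemented as a token bucket per API key. The sketch below illustrates the idea in memory; it is not Agent Studio's limiter, and the class names, rate, and burst values are assumptions. With several API instances, the bucket state would need to live in a shared store such as Redis.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    def __init__(self, rate_per_sec: float, burst: int, now: Optional[float] = None):
        self.rate = rate_per_sec
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full so a short burst is allowed
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerKeyRateLimiter:
    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec
        self.burst = burst
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, api_key: str, now: Optional[float] = None) -> bool:
        bucket = self._buckets.get(api_key)
        if bucket is None:
            bucket = self._buckets[api_key] = TokenBucket(self.rate, self.burst, now)
        return bucket.allow(now)
```

A request handler would call `allow(api_key)` and respond with HTTP 429 when it returns False.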

Network Security

TLS 1.3: Prefer TLS 1.3 and disable legacy protocol versions (SSLv3, TLS 1.0, TLS 1.1)
Private Networks: Keep DB/Redis internal
Firewall Rules: Restrict access by IP

Secret Management

Use a secrets manager:

AWS Secrets Manager
HashiCorp Vault
Kubernetes Secrets

# Example: AWS Secrets Manager
DATABASE_URL=$(aws secretsmanager get-secret-value --secret-id prod/db-url --query SecretString --output text)

High Availability

API Layer

Run 3+ instances behind a load balancer
Use sticky sessions for WebSocket (if needed)
Health check interval: 10s
Unhealthy threshold: 3
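With a 10-second interval and a threshold of 3, a failed instance can keep receiving traffic for up to roughly 30 seconds before it is removed. The sketch below models the consecutive-failure counting most load balancers use. It is a simplification: real load balancers often also require several successes before marking an instance healthy again, and probe timeouts add to the delay.

```python
class HealthTracker:
    """Marks an instance unhealthy after N consecutive failed probes."""

    def __init__(self, unhealthy_threshold: int = 3):
        self.threshold = unhealthy_threshold
        self.failures = 0

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result; return True while the instance counts as healthy."""
        self.failures = 0 if probe_ok else self.failures + 1
        return self.failures < self.threshold


def worst_case_detection_seconds(interval_s: int, threshold: int) -> int:
    """Upper bound on detection time, ignoring probe timeouts."""
    return interval_s * threshold


print(worst_case_detection_seconds(interval_s=10, threshold=3))  # 30
```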

Worker Layer

Run as many workers as your maximum number of concurrent calls
Workers are stateless (use Redis for state)
Auto-scale based on queue depth
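Since each worker handles one concurrent call, the target worker count follows directly from active and queued calls. The following is an illustrative calculation only; the function name, the spare-worker buffer, and the min/max bounds are assumptions, and a real autoscaler would feed these inputs from its own metrics.

```python
def desired_workers(active_calls: int, queued_calls: int,
                    spare: int = 2, min_workers: int = 1, max_workers: int = 50) -> int:
    """One worker per active or queued call, plus a small idle buffer, clamped to bounds."""
    needed = active_calls + queued_calls + spare
    return max(min_workers, min(max_workers, needed))


print(desired_workers(active_calls=10, queued_calls=5))  # 17
```

The spare buffer lets new calls start without waiting for a worker to boot; how large it should be depends on worker startup time.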

Database

PostgreSQL with streaming replication
Automatic failover (Patroni, RDS Multi-AZ)
Connection pooling

Deployment Strategies

Blue-Green Deployment

Deploy new version to "green" environment
Run smoke tests
Switch load balancer to green
Keep blue running as the rollback target

Rolling Deployment

Update instances one at a time
Wait for health checks to pass
Continue to next instance
Automatic rollback on failures
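The rolling steps above can be sketched as a loop. The `update`, `is_healthy`, and `rollback` callables are hypothetical hooks standing in for whatever your orchestrator provides; `is_healthy` is assumed to block until health checks pass or a timeout expires.

```python
from typing import Callable, Iterable


def rolling_update(instances: Iterable[str],
                   update: Callable[[str], None],
                   is_healthy: Callable[[str], bool],
                   rollback: Callable[[str], None]) -> bool:
    """Update instances one at a time; on failure, roll back everything updated so far."""
    updated = []
    for instance in instances:
        update(instance)
        updated.append(instance)
        if not is_healthy(instance):
            # Undo in reverse order, including the instance that just failed.
            for done in reversed(updated):
                rollback(done)
            return False
    return True
```

In practice an orchestrator such as Kubernetes performs this loop for you; the sketch only makes the ordering and rollback behavior explicit.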

Canary Deployment

Route 5% of traffic to the new version
Monitor error rates and latency
Gradually increase to 100%
Roll back if issues are detected
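The canary decision at each step can be expressed as a small function. Only the initial 5% comes from the steps above; the intermediate steps (25%, 50%) and both thresholds are illustrative assumptions to tune against your own baseline.

```python
# Traffic percentages for the ramp; only the first value is from this guide.
STEPS = (5, 25, 50, 100)


def next_traffic_percent(current: int,
                         canary_error_rate: float, baseline_error_rate: float,
                         canary_p95_ms: float, baseline_p95_ms: float,
                         max_error_delta: float = 0.01,
                         max_latency_ratio: float = 1.2) -> int:
    """Return the next traffic percentage for the canary, or 0 to roll back."""
    if canary_error_rate - baseline_error_rate > max_error_delta:
        return 0  # error rate regressed: roll back
    if canary_p95_ms > baseline_p95_ms * max_latency_ratio:
        return 0  # latency regressed: roll back
    for step in STEPS:
        if step > current:
            return step
    return 100  # already at full traffic
```

Calling this after each observation window keeps the ramp-up and the rollback rule in one place, which makes the promotion criteria easy to review.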

Troubleshooting

Common Issues

High Latency

Check database query performance
Review provider API latencies
Check Redis connectivity

Connection Errors

Verify connection pool settings
Check network connectivity
Review firewall rules

Memory Issues

Monitor container memory usage
Check for memory leaks in workers
Adjust resource limits

Debug Mode

Enable debug logging temporarily:

LOG_LEVEL=DEBUG

Never leave debug logging enabled in production for extended periods. It degrades performance and may log sensitive data.

Maintenance

Regular Tasks

Task	Frequency
Rotate API keys	Quarterly
Update dependencies	Monthly
Review access logs	Weekly
Test disaster recovery	Quarterly
Security audit	Annually

Updates

Review changelog for breaking changes
Test in staging environment
Schedule maintenance window
Deploy with rollback plan
Monitor for issues
