Agent Studio

Production Deployment

Production deployment checklist and best practices

This guide covers best practices and a checklist for deploying Agent Studio to production.

Pre-Deployment Checklist

Security

  • Generate a secure JWT_SECRET (at least 32 characters)
  • Generate a secure ENCRYPTION_KEY (exactly 32 bytes)
  • Use HTTPS for all endpoints
  • Configure CORS origins (don't use * in production)
  • Set up rate limiting
  • Enable audit logging
  • Review API key permissions

Infrastructure

  • PostgreSQL 16+ with SSL enabled
  • Redis 7+ with authentication
  • LiveKit server or LiveKit Cloud
  • Load balancer with health checks
  • DNS and SSL certificates

Monitoring

  • Application metrics (Prometheus)
  • Log aggregation
  • Error tracking (Sentry)
  • Uptime monitoring
  • Alerting configured

Environment Configuration

Required Variables

# Database (use connection pooling in production)
DATABASE_URL=postgresql+asyncpg://user:password@host:5432/agent_studio?ssl=require

# Redis (with auth)
REDIS_URL=redis://:password@host:6379/0

# Security - GENERATE SECURE VALUES!
JWT_SECRET=generate-a-secure-random-string-at-least-32-chars
ENCRYPTION_KEY=exactly-32-bytes-for-aes256-key!

# LiveKit
LIVEKIT_URL=wss://your-livekit-server.com
LIVEKIT_API_KEY=your-api-key
LIVEKIT_API_SECRET=your-api-secret

# Application
LOG_LEVEL=INFO
CORS_ORIGINS=https://dashboard.yourdomain.com,https://app.yourdomain.com
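A fail-fast check at startup can catch the most common misconfigurations from the checklist: a missing variable, a wildcard or non-HTTPS CORS origin, a database URL without SSL, or a Redis URL without a password. The sketch below is a hypothetical helper (`check_production_env` is not part of Agent Studio). It assumes the asyncpg-style `ssl=` query parameter shown above.

```python
import os
from urllib.parse import parse_qs, urlsplit

# Variables listed under "Required Variables" above.
REQUIRED = (
    "DATABASE_URL", "REDIS_URL", "JWT_SECRET", "ENCRYPTION_KEY",
    "LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET",
    "LOG_LEVEL", "CORS_ORIGINS",
)
SECURE_SSL_MODES = {"require", "verify-ca", "verify-full"}


def check_production_env(env=None) -> list:
    """Hypothetical helper, not part of Agent Studio. Returns a list of problems."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]

    # CORS: no wildcard, and every origin served over HTTPS.
    origins = [o.strip() for o in env.get("CORS_ORIGINS", "").split(",") if o.strip()]
    for origin in origins:
        if origin == "*":
            problems.append("CORS_ORIGINS must not contain '*'")
        elif not origin.startswith("https://"):
            problems.append(f"CORS origin is not HTTPS: {origin}")

    # Database: the asyncpg-style ?ssl= parameter must request encryption.
    db_url = env.get("DATABASE_URL")
    if db_url:
        ssl_mode = parse_qs(urlsplit(db_url).query).get("ssl", [""])[0]
        if ssl_mode not in SECURE_SSL_MODES:
            problems.append("DATABASE_URL should set ssl=require (or stricter)")

    # Redis: a password must be present in the URL.
    redis_url = env.get("REDIS_URL")
    if redis_url and not urlsplit(redis_url).password:
        problems.append("REDIS_URL has no password")

    # LiveKit: secure WebSocket only.
    livekit_url = env.get("LIVEKIT_URL")
    if livekit_url and not livekit_url.startswith("wss://"):
        problems.append("LIVEKIT_URL should use wss://")
    return problems
```

Calling `check_production_env()` with no arguments reads `os.environ`. Exit non-zero if the returned list is not empty.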

Generating Secure Keys

# JWT Secret (64 characters)
openssl rand -base64 48

# Encryption Key (32 random bytes, printed as a 44-character base64 string)
openssl rand -base64 32
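The same values can be generated with Python's `secrets` module. These two values do not match:

  • The ENCRYPTION_KEY placeholder above is 32 raw characters.
  • `openssl rand -base64 32` prints 44 base64 characters that decode to 32 bytes.

This guide doesn't say which form Agent Studio expects. The hypothetical `encryption_key_bytes` helper below therefore accepts either form and only checks that the result is 32 bytes.

```python
import base64
import binascii
import secrets


def generate_jwt_secret() -> str:
    # 48 random bytes -> 64 base64 characters, like `openssl rand -base64 48`.
    return base64.b64encode(secrets.token_bytes(48)).decode("ascii")


def generate_encryption_key() -> str:
    # 32 random bytes -> 44 base64 characters, like `openssl rand -base64 32`.
    return base64.b64encode(secrets.token_bytes(32)).decode("ascii")


def check_jwt_secret(value: str) -> None:
    # Checklist rule: at least 32 characters.
    if len(value) < 32:
        raise ValueError("JWT_SECRET must be at least 32 characters")


def encryption_key_bytes(value: str) -> bytes:
    """Return the 32-byte key (hypothetical: accepts raw or base64-encoded input)."""
    raw = value.encode("utf-8")
    if len(raw) == 32:
        return raw  # raw form, like the 32-character placeholder above
    try:
        decoded = base64.b64decode(value, validate=True)
    except binascii.Error:
        decoded = b""
    if len(decoded) != 32:
        raise ValueError("ENCRYPTION_KEY must be exactly 32 bytes")
    return decoded
```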

Architecture

                    ┌─────────────────┐
                    │  Load Balancer  │
                    │   (HTTPS/WSS)   │
                    └────────┬────────┘
                             │
              ┌──────────────┼──────────────┐
              │              │              │
       ┌──────▼──────┐ ┌─────▼─────┐ ┌──────▼──────┐
       │   API #1    │ │  API #2   │ │   API #3    │
       │  (FastAPI)  │ │ (FastAPI) │ │  (FastAPI)  │
       └──────┬──────┘ └─────┬─────┘ └──────┬──────┘
              │              │              │
              └──────────────┼──────────────┘
                             │
              ┌──────────────┼──────────────┐
              │              │              │
       ┌──────▼──────┐ ┌─────▼─────┐ ┌──────▼──────┐
       │  Worker #1  │ │ Worker #2 │ │  Worker #3  │
       │  (LiveKit)  │ │ (LiveKit) │ │  (LiveKit)  │
       └──────┬──────┘ └─────┬─────┘ └──────┬──────┘
              │              │              │
              └──────────────┼──────────────┘
                             │
         ┌───────────────────┼───────────────────┐
         │                   │                   │
  ┌──────▼──────┐     ┌──────▼──────┐     ┌──────▼──────┐
  │  PostgreSQL │     │    Redis    │     │   LiveKit   │
  │   Primary   │     │   Cluster   │     │   Server    │
  └─────────────┘     └─────────────┘     └─────────────┘

Scaling Guidelines

Component    Scaling Strategy
-----------  -----------------------------------------
API          Horizontal (stateless)
Worker       Horizontal (1 worker per concurrent call)
PostgreSQL   Vertical + read replicas
Redis        Cluster mode

Database

Connection Pooling

Use PgBouncer or the application's built-in connection pool:

# In settings
DATABASE_POOL_SIZE=20
DATABASE_POOL_OVERFLOW=10
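This sketch assumes DATABASE_POOL_SIZE and DATABASE_POOL_OVERFLOW map to SQLAlchemy's `pool_size` and `max_overflow`. Under that assumption, each process can hold up to pool_size + overflow connections at peak. Count every process that opens a pool, for example each Uvicorn worker in each API instance. Compare the total with PostgreSQL's `max_connections`, which defaults to 100 with 3 reserved for superusers. The helpers are hypothetical:

```python
def peak_db_connections(processes: int, pool_size: int = 20, overflow: int = 10) -> int:
    """Worst case: every process has its pool and all overflow slots checked out."""
    return processes * (pool_size + overflow)


def fits_max_connections(processes: int, max_connections: int = 100,
                         superuser_reserved: int = 3, **pool) -> bool:
    # PostgreSQL defaults: max_connections=100, superuser_reserved_connections=3.
    return peak_db_connections(processes, **pool) <= max_connections - superuser_reserved
```

With the example values, three processes can already open 90 connections, so adding a fourth (120) would exceed the default limit. At that point, route connections through PgBouncer or shrink the per-process pool.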

Migrations

Run migrations before deploying new versions:

# Using Alembic
alembic upgrade head
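In a CI/CD pipeline, a failed migration should stop the rollout so new code never runs against an old schema. Below is a minimal Python wrapper. It assumes the `alembic` CLI is installed and configured in the deploy environment; `run_migrations` is a hypothetical name.

```python
import subprocess
import sys


def run_migrations(cmd=("alembic", "upgrade", "head")) -> None:
    """Run the migration command; raise so the pipeline aborts if it fails."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface Alembic's error output before stopping the deploy.
        sys.stderr.write(result.stderr)
        raise RuntimeError(f"migration failed with exit code {result.returncode}")
```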

Backups

  • Enable automated daily backups
  • Test restore procedures
  • Keep 30 days of backups minimum
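A pruning job that respects the 30-day minimum deletes only backups older than the retention window, and never the most recent one. The sketch assumes each backup is identified by a timezone-aware creation timestamp; `prunable` is a hypothetical helper.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=30)  # "Keep 30 days of backups minimum"


def prunable(backups: list[datetime], now: datetime,
             retention: timedelta = RETENTION) -> list[datetime]:
    """Return backups older than the retention window, oldest first.

    The newest backup is never returned, even if it is older than the window.
    """
    if not backups:
        return []
    newest = max(backups)
    cutoff = now - retention
    return sorted(b for b in backups if b < cutoff and b != newest)
```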

Monitoring

Prometheus Metrics

Agent Studio exposes metrics at /metrics:

# API metrics
http_requests_total{method="GET", endpoint="/api/v1/agents", status="200"}
http_request_duration_seconds{method="GET", endpoint="/api/v1/agents"}

# Voice metrics
voice_calls_total{workflow="daily-call", status="completed"}
voice_call_duration_seconds{workflow="daily-call"}

# Provider metrics
provider_latency_seconds{provider="deepgram", type="stt"}

Logging

Structured JSON logging is enabled by default:

{
  "timestamp": "2026-01-17T10:30:00Z",
  "level": "INFO",
  "message": "Call completed",
  "call_id": "uuid",
  "duration": 180,
  "request_id": "req-uuid"
}
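Agent Studio's logging configuration isn't shown in this guide. Your own services or sidecars may need to emit the same shape for the log aggregator. A stdlib-only formatter could look like this. The field names come from the example above; passing `call_id`, `duration`, and `request_id` via `extra=` is an assumption.

```python
import json
import logging
from datetime import datetime, timezone

# Optional context fields from the example log entry above.
CONTEXT_FIELDS = ("call_id", "duration", "request_id")


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc)
            .strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Fields passed via logger.info(..., extra={...}) become record attributes.
        for field in CONTEXT_FIELDS:
            if hasattr(record, field):
                entry[field] = getattr(record, field)
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    log = logging.getLogger("example")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    log.info("Call completed",
             extra={"call_id": "uuid", "duration": 180, "request_id": "req-uuid"})
```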

Health Checks

Configure your load balancer to use:

  • Liveness: GET /health/live (container running)
  • Readiness: GET /health/ready (ready for traffic)
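Keep the two probes distinct:

  • Liveness should only confirm that the process responds.
  • Readiness should also verify dependencies such as PostgreSQL and Redis.

This way a failing dependency takes the instance out of rotation instead of getting it restarted by an orchestrator such as Kubernetes. Below is a framework-agnostic sketch of that logic. The check names and response bodies are hypothetical, not Agent Studio's actual endpoints.

```python
from typing import Callable, Dict, Tuple


def liveness() -> Tuple[int, dict]:
    # The process can answer HTTP requests; no dependency checks here.
    return 200, {"status": "alive"}


def readiness(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, dict]:
    """Return 200 only when every dependency check passes, otherwise 503."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises (timeout, refused connection) counts as failed.
            results[name] = False
    status = 200 if all(results.values()) else 503
    return status, {"status": "ready" if status == 200 else "not ready",
                    "checks": results}
```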

Security Hardening

API Security

  1. Rate Limiting: Configure per-key limits
  2. Request Validation: All inputs are validated
  3. SQL Injection: Prevented via SQLAlchemy ORM
  4. XSS: API-only, no HTML rendering

Network Security

  1. TLS 1.3: Prefer TLS 1.3 and disable deprecated versions (TLS 1.0/1.1)
  2. Private Networks: Keep DB/Redis internal
  3. Firewall Rules: Restrict access by IP

Secret Management

Use a secrets manager:

  • AWS Secrets Manager
  • HashiCorp Vault
  • Kubernetes Secrets

# Example: AWS Secrets Manager
DATABASE_URL=$(aws secretsmanager get-secret-value --secret-id prod/db-url --query SecretString --output text)
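Kubernetes Secrets and Docker secrets are commonly mounted as files rather than exported as environment variables. A small loader can prefer the mounted file and fall back to the environment. The `/run/secrets` default is Docker's convention and an assumption here; in Kubernetes the mount path is whatever your pod spec's `volumeMounts` sets. `read_secret` is a hypothetical helper:

```python
import os
from pathlib import Path


def read_secret(name: str, secrets_dir: str = "/run/secrets") -> str:
    """Return <secrets_dir>/<name> if it is mounted, else the environment variable."""
    path = Path(secrets_dir) / name
    if path.is_file():
        # Drop the trailing newline many tools add when writing the file.
        return path.read_text(encoding="utf-8").rstrip("\r\n")
    value = os.environ.get(name)
    if value is None:
        raise KeyError(f"secret {name!r} not found in {secrets_dir} or environment")
    return value
```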

High Availability

API Layer

  • Run 3+ instances behind load balancer
  • Use sticky sessions for WebSocket (if needed)
  • Health check interval: 10s
  • Unhealthy threshold: 3
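With a 10-second interval and an unhealthy threshold of 3, an instance leaves rotation after three consecutive failed checks. That is roughly 20 to 30 seconds after it starts failing. The tracker below models that rule. The healthy threshold of 2 used to bring an instance back is an assumption, since this guide doesn't specify one.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    unhealthy_threshold: int = 3  # consecutive failures before removal
    healthy_threshold: int = 2    # assumption: consecutive passes before re-adding
    in_service: bool = True
    _fails: int = 0
    _passes: int = 0

    def record(self, check_passed: bool) -> bool:
        """Record one health-check result; return whether the instance is in service."""
        if check_passed:
            self._passes += 1
            self._fails = 0
            if not self.in_service and self._passes >= self.healthy_threshold:
                self.in_service = True
        else:
            self._fails += 1
            self._passes = 0
            if self.in_service and self._fails >= self.unhealthy_threshold:
                self.in_service = False
        return self.in_service
```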

Worker Layer

  • Run as many workers as the maximum number of concurrent calls
  • Workers are stateless (use Redis for state)
  • Auto-scale based on queue depth
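Each worker handles one concurrent call, so a queue-depth autoscaler can target one worker per active or queued call, clamped to a floor and a ceiling. The following are assumptions about how you would feed your autoscaler:

  • the `desired_workers` helper and its bounds;
  • counting active plus queued calls as demand.

```python
import math


def desired_workers(active_calls: int, queued_calls: int,
                    calls_per_worker: int = 1,  # "1 worker per concurrent call"
                    min_workers: int = 2, max_workers: int = 50) -> int:
    """Workers needed to serve every active and queued call, within bounds."""
    demand = active_calls + queued_calls
    needed = math.ceil(demand / calls_per_worker)
    return max(min_workers, min(max_workers, needed))
```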

Database

  • PostgreSQL with streaming replication
  • Automatic failover (Patroni, RDS Multi-AZ)
  • Connection pooling

Deployment Strategies

Blue-Green Deployment

  1. Deploy new version to "green" environment
  2. Run smoke tests
  3. Switch load balancer to green
  4. Keep blue running as the rollback target

Rolling Deployment

  1. Update instances one at a time
  2. Wait for health checks to pass
  3. Continue to next instance
  4. Roll back automatically on failure

Canary Deployment

  1. Route 5% traffic to new version
  2. Monitor error rates and latency
  3. Gradually increase to 100%
  4. Roll back if issues are detected
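Each canary step is a single decision. Promote to the next traffic share while the canary's error rate and latency stay close to the stable version's; otherwise roll back. Only the 5% starting share comes from this guide. The later steps, the error tolerance, and the 20% latency allowance below are illustrative assumptions.

```python
# Traffic percentages per step; only the 5% start comes from the guide.
STEPS = (5, 25, 50, 100)


def next_canary_step(current: int,
                     canary_error_rate: float, baseline_error_rate: float,
                     canary_p95_s: float, baseline_p95_s: float,
                     error_tolerance: float = 0.01,
                     latency_factor: float = 1.2) -> int:
    """Return the next traffic percentage for the canary, or 0 to roll back."""
    regressed = (
        canary_error_rate > baseline_error_rate + error_tolerance
        or canary_p95_s > baseline_p95_s * latency_factor
    )
    if regressed:
        return 0  # route all traffic back to the stable version
    idx = STEPS.index(current)  # raises ValueError for an unknown step
    return STEPS[min(idx + 1, len(STEPS) - 1)]
```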

Troubleshooting

Common Issues

High Latency

  • Check database query performance
  • Review provider API latencies
  • Check Redis connectivity

Connection Errors

  • Verify connection pool settings
  • Check network connectivity
  • Review firewall rules

Memory Issues

  • Monitor container memory usage
  • Check for memory leaks in workers
  • Adjust resource limits

Debug Mode

Enable debug logging temporarily:

LOG_LEVEL=DEBUG

Never leave debug logging enabled in production for extended periods: it impacts performance and may log sensitive data.
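Your setup may allow changing log levels at runtime instead of redeploying with LOG_LEVEL=DEBUG. In that case, a context manager restores the previous level on exit, so debug logging cannot be left on by accident. This is a stdlib sketch; whether Agent Studio exposes runtime level changes isn't covered here.

```python
import logging
from contextlib import contextmanager


@contextmanager
def temporary_debug(logger_name: str = ""):
    """Set a logger (root by default) to DEBUG inside the block, then restore it.

    Handlers with their own level may still filter out DEBUG records.
    """
    logger = logging.getLogger(logger_name)
    previous = logger.level
    logger.setLevel(logging.DEBUG)
    try:
        yield logger
    finally:
        logger.setLevel(previous)
```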


Maintenance

Regular Tasks

Task                     Frequency
-----------------------  ---------
Rotate API keys          Quarterly
Update dependencies      Monthly
Review access logs       Weekly
Test disaster recovery   Quarterly
Security audit           Annually

Updates

  1. Review changelog for breaking changes
  2. Test in staging environment
  3. Schedule maintenance window
  4. Deploy with rollback plan
  5. Monitor for issues
