ADR-011: Webhook Delivery System

Enterprise-grade webhook system for call status notifications

Status

Accepted

Context

Backend systems need real-time notifications whenever a call's status changes (started, connected, completed, failed, etc.). This enables:

  1. Event-driven architectures - Backend can react to call events without polling
  2. Audit logging - Centralized logging of all call activities
  3. Business workflows - Trigger downstream processes when calls complete
  4. Analytics pipelines - Feed call data to analytics systems in real-time

Requirements:

  • Reliability: Guaranteed delivery with retries
  • Security: Cryptographic signatures to verify authenticity
  • Flexibility: Filter which events to receive
  • Low overhead: Minimal configuration for tenants
  • Enterprise patterns: Similar to Stripe, Twilio, etc.

Decision

Implement an enterprise-grade webhook system with:

1. Event Model (Stripe-like)

{
  "id": "evt_abc123",
  "object": "event",
  "api_version": "2026-01-19",
  "created": 1705680000,
  "type": "call.completed",
  "tenant_id": "tenant-uuid",
  "livemode": true,
  "data": {
    "call_id": "call-uuid",
    "room_name": "room-abc",
    "workflow_slug": "daily-checkup",
    "user_id": "user-123",
    "status": "completed",
    "duration_seconds": 180,
    "transcript": [...],
    "metrics": {...}
  }
}
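As a rough illustration of how such an envelope could be assembled, the sketch below builds the top-level fields shown above. The helper name `build_event`, the use of `uuid4` for the `evt_` identifier, and the default `api_version` are assumptions made for this example, not the actual contents of events.py.

```python
import json
import time
import uuid
from typing import Any, Dict


def build_event(event_type: str, tenant_id: str, data: Dict[str, Any],
                livemode: bool = True, api_version: str = "2026-01-19") -> Dict[str, Any]:
    """Build an event envelope with the top-level fields from the example above.

    Hypothetical helper: the ID scheme (evt_ + random hex) is an assumption.
    """
    return {
        "id": f"evt_{uuid.uuid4().hex}",
        "object": "event",
        "api_version": api_version,
        "created": int(time.time()),  # Unix timestamp in seconds
        "type": event_type,
        "tenant_id": tenant_id,
        "livemode": livemode,
        "data": data,
    }


event = build_event(
    "call.completed",
    "tenant-uuid",
    {"call_id": "call-uuid", "status": "completed", "duration_seconds": 180},
)
# Compact separators give a stable byte representation to sign and send.
payload = json.dumps(event, separators=(",", ":"))
```

Serializing once and reusing the same string for both signing and the request body avoids signature mismatches caused by re-encoding.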

2. HMAC-SHA256 Signatures

Each webhook request includes a signature header:

X-Webhook-Signature: t=1705680000,v1=abc123...

The signature is computed as:

HMAC-SHA256(secret, "{timestamp}.{payload}")

This prevents:

  • Replay attacks (timestamp validation)
  • Tampering (signature verification)
  • Spoofing (the secret is shared only between the platform and the tenant)
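On the sending side, the header could be produced roughly as follows. This is a minimal sketch of the scheme described above; the function name `sign_payload` is hypothetical, and the hex encoding of the digest is an assumption inferred from the example header.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_payload(payload: str, secret: str, timestamp: Optional[int] = None) -> str:
    """Return an X-Webhook-Signature value of the form "t=<ts>,v1=<hex digest>"."""
    ts = int(time.time()) if timestamp is None else timestamp
    # The signed message is "{timestamp}.{payload}", as described above.
    message = f"{ts}.{payload}".encode()
    digest = hmac.new(secret.encode(), message, hashlib.sha256).hexdigest()
    return f"t={ts},v1={digest}"


header = sign_payload('{"type":"call.completed"}', "whsec_example", timestamp=1705680000)
```

Because the timestamp is part of the signed message, a receiver can reject old requests without trusting any unsigned field.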

3. Delivery with Retries

  • Exponential backoff: 1s, 2s, 4s, 8s, 16s
  • Max 5 retries by default (configurable)
  • Timeout: 30s per request (configurable 5-60s)
  • Success criteria: HTTP 2xx response
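The retry policy above can be sketched as a small loop. This is an illustration only, not the code in dispatcher.py: `deliver` and `backoff_delays` are hypothetical names, `send` stands in for a single HTTP POST that returns the status code, and the sketch assumes the 1s delay applies before the first retry.

```python
import time
from typing import Callable, List


def backoff_delays(max_retries: int = 5, base: float = 1.0) -> List[float]:
    """Delays before each retry: 1s, 2s, 4s, 8s, 16s for the defaults."""
    return [base * 2 ** n for n in range(max_retries)]


def deliver(send: Callable[[], int], max_retries: int = 5,
            sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try once, then retry with exponential backoff until a 2xx response.

    `send` performs one request and returns the HTTP status code; it may
    raise OSError (which includes timeouts) on network failure.
    """
    delays = backoff_delays(max_retries)
    for attempt in range(max_retries + 1):
        try:
            if 200 <= send() < 300:
                return True  # success criteria: any HTTP 2xx
        except OSError:
            pass  # network errors and timeouts count as a failed attempt
        if attempt < max_retries:
            sleep(delays[attempt])
    return False
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; with the defaults, a delivery that never succeeds gives up after about 31 seconds of backoff plus request time.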

4. Event Types

Event               Description
call.started        Call initiated, waiting for connection
call.connected      Call connected, agent active
call.completed      Call ended normally
call.failed         Call failed to connect
call.disconnected   Call dropped unexpectedly
call.timeout        Call timed out
call.agent.changed  Agent handoff occurred
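One way to keep these names consistent in code is a string-valued enum. The class name `WebhookEventType` is an assumption for this sketch; only the event strings come from the list above.

```python
from enum import Enum


class WebhookEventType(str, Enum):
    """Event type strings from the list above (class name is hypothetical)."""
    CALL_STARTED = "call.started"
    CALL_CONNECTED = "call.connected"
    CALL_COMPLETED = "call.completed"
    CALL_FAILED = "call.failed"
    CALL_DISCONNECTED = "call.disconnected"
    CALL_TIMEOUT = "call.timeout"
    CALL_AGENT_CHANGED = "call.agent.changed"


# Looking up by value validates an incoming type string; unknown values raise ValueError.
event_type = WebhookEventType("call.completed")
```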

5. Configuration Storage

Webhook config stored in tenant.settings["webhooks"]:

{
    "url": "https://api.example.com/webhooks",
    "secret": "whsec_abc123...",
    "enabled": true,
    "filter": {
        "events": ["call.completed", "call.failed"],
        "include_transcript": true,
        "include_context": false,
        "include_metrics": true
    },
    "headers": {"X-Custom": "value"},
    "timeout": 30,
    "max_retries": 5
}
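To show how the filter block might be applied before dispatch, here is a hedged sketch. The function `apply_filter` is hypothetical, and several behaviors are assumptions: an empty or missing "events" list means all events, each include_* flag defaults to true, and the flags map to the "transcript", "context", and "metrics" keys of the event data.

```python
import copy
from typing import Any, Dict, Optional

# Assumed mapping from filter flags to keys inside event["data"].
_OPTIONAL_FIELDS = (
    ("include_transcript", "transcript"),
    ("include_context", "context"),
    ("include_metrics", "metrics"),
)


def apply_filter(event: Dict[str, Any], config: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the event as it should be sent, or None if it should be skipped."""
    if not config.get("enabled", False):
        return None
    flt = config.get("filter", {})
    wanted = flt.get("events")
    if wanted and event["type"] not in wanted:
        return None
    filtered = copy.deepcopy(event)  # never mutate the caller's event
    for flag, key in _OPTIONAL_FIELDS:
        if not flt.get(flag, True):
            filtered["data"].pop(key, None)
    return filtered
```

With the example configuration above, a call.completed event would be delivered without its context, while a call.started event would be skipped entirely.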

6. API Endpoints

GET    /api/v1/webhooks          # Get current config
POST   /api/v1/webhooks          # Create/replace (returns secret)
PATCH  /api/v1/webhooks          # Update (keeps secret)
DELETE /api/v1/webhooks          # Delete config
POST   /api/v1/webhooks/test     # Send test event
POST   /api/v1/webhooks/enable   # Enable webhook
POST   /api/v1/webhooks/disable  # Disable webhook
POST   /api/v1/webhooks/rotate-secret  # Rotate signing secret
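Behind the create and rotate-secret endpoints, the server needs to mint a fresh signing secret. The sketch below is an assumption about how that could look: only the whsec_ prefix comes from the configuration example, while the function names and the 32-byte length are hypothetical.

```python
import secrets
from typing import Any, Dict

SECRET_PREFIX = "whsec_"


def generate_webhook_secret(nbytes: int = 32) -> str:
    """Generate a random signing secret using a cryptographically secure source."""
    return SECRET_PREFIX + secrets.token_hex(nbytes)


def rotate_secret(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the webhook config with a new secret; other fields are kept."""
    return {**config, "secret": generate_webhook_secret()}
```

Returning a new dict rather than mutating in place makes it easy to persist the change atomically and to return the new secret to the caller exactly once.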

Alternatives Considered

1. Message Queue (Kafka/RabbitMQ)

Pros: Higher throughput, better guarantees
Cons: Infrastructure overhead, tenant must run consumers

Decision: Webhooks are simpler for tenants. They just expose an HTTPS endpoint.

2. WebSocket Push

Pros: Real-time, bidirectional
Cons: Requires persistent connection, harder to scale

Decision: Webhooks work better for server-to-server integration.

3. Polling API

Pros: Simple, no webhook infrastructure needed
Cons: Latency, wasted requests, not real-time

Decision: Polling is inefficient. Webhooks push events immediately.

Implementation

Files Created

src/agent_studio/core/webhooks/
├── __init__.py           # Module exports
├── events.py             # Event types and models
├── config.py             # Configuration models
└── dispatcher.py         # Delivery with retries

src/agent_studio/api/routers/webhooks.py  # Management API

Integration Points

  1. Internal router (internal.py): Dispatches webhooks on status changes
  2. Call completion: Sends full data (transcript, context, metrics)
  3. Background dispatch: Non-blocking, doesn't slow API responses

Verifying Signatures (Tenant Side)

import hmac
import hashlib
import time

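def verify_webhook(payload: str, signature_header: str, secret: str,
                   tolerance: int = 300) -> bool:
    """Verify a signature header of the form "t=<timestamp>,v1=<hex digest>".

    `payload` must be the raw request body exactly as received; re-serializing
    parsed JSON can change the bytes and invalidate the signature.
    """
    # Parse signature header; treat malformed headers as invalid instead of raising
    try:
        parts = dict(p.strip().split("=", 1) for p in signature_header.split(","))
        timestamp = int(parts["t"])
        signature = parts["v1"]
    except (KeyError, ValueError):
        return False

    # Check timestamp (prevent replay attacks); default tolerance is 5 minutes
    if abs(time.time() - timestamp) > tolerance:
        return False

    # Recompute the expected signature over "{timestamp}.{payload}"
    expected = hmac.new(
        secret.encode(),
        f"{timestamp}.{payload}".encode(),
        hashlib.sha256
    ).hexdigest()

    # Constant-time comparison avoids leaking information through timing
    return hmac.compare_digest(signature.encode(), expected.encode())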

Consequences

Positive

  • Standard pattern: Familiar to developers (Stripe, Twilio, GitHub)
  • Secure: HMAC signatures prevent spoofing/tampering
  • Reliable: Retries with exponential backoff ride out transient endpoint failures
  • Flexible: Event filtering reduces noise
  • Low overhead: No infrastructure for tenants to manage

Negative

  • Eventual consistency: Events may arrive out of order
  • Endpoint requirements: Tenant must expose HTTPS endpoint
  • Secret management: Tenant must securely store webhook secret

Mitigations

  • Include timestamps and event IDs for ordering
  • Support localhost HTTP in development
  • Secret rotation API for compromised secrets
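On the tenant side, the event ID and timestamp can be used to drop duplicates and ignore stale updates. The sketch below is one possible approach, not a prescribed one: the `EventLog` class is hypothetical, it assumes a retried delivery reuses the same event ID, and it keeps state in memory where a real consumer would use durable storage.

```python
from typing import Any, Dict, Set


class EventLog:
    """Tracks processed events so a handler can be idempotent and order-aware."""

    def __init__(self) -> None:
        self._seen_ids: Set[str] = set()
        self._latest: Dict[str, int] = {}  # call_id -> "created" of newest applied event

    def accept(self, event: Dict[str, Any]) -> bool:
        """Return True if the event should be applied, False if duplicate or stale."""
        if event["id"] in self._seen_ids:
            return False  # already processed, e.g. a redelivery
        self._seen_ids.add(event["id"])
        call_id = event["data"]["call_id"]
        if event["created"] < self._latest.get(call_id, 0):
            return False  # older than the state already applied for this call
        self._latest[call_id] = event["created"]
        return True
```

Since `created` has one-second resolution, events for the same call that share a timestamp are both accepted; a consumer that needs stricter ordering would need an additional tiebreaker.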
