ADR-011: Webhook Delivery System
Enterprise-grade webhook system for call status notifications
Status
Accepted
Context
Backend systems need to receive real-time notifications when call status changes occur (started, connected, completed, failed, etc.). This enables:
- Event-driven architectures - Backend can react to call events without polling
- Audit logging - Centralized logging of all call activities
- Business workflows - Trigger downstream processes when calls complete
- Analytics pipelines - Feed call data to analytics systems in real-time
Requirements:
- Reliability: Guaranteed delivery with retries
- Security: Cryptographic signatures to verify authenticity
- Flexibility: Filter which events to receive
- Low overhead: Minimal configuration for tenants
- Enterprise patterns: Similar to Stripe, Twilio, etc.
Decision
Implement an enterprise-grade webhook system with:
1. Event Model (Stripe-like)
{
"id": "evt_abc123",
"object": "event",
"api_version": "2026-01-19",
"created": 1705680000,
"type": "call.completed",
"tenant_id": "tenant-uuid",
"livemode": true,
"data": {
"call_id": "call-uuid",
"room_name": "room-abc",
"workflow_slug": "daily-checkup",
"user_id": "user-123",
"status": "completed",
"duration_seconds": 180,
"transcript": [...],
"metrics": {...}
}
}
2. HMAC-SHA256 Signatures
Each webhook request includes a signature header:
X-Webhook-Signature: t=1705680000,v1=abc123...
The signature is computed as:
HMAC-SHA256(secret, "{timestamp}.{payload}")
This prevents:
- Replay attacks (timestamp validation)
- Tampering (signature verification)
- Spoofing (the secret is shared only between the platform and the tenant)
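The signing scheme above can be sketched on the sender side as follows. This is a minimal sketch rather than the dispatcher's actual code: the function name `build_signature_header` is hypothetical, and it assumes the payload is the exact JSON string sent as the request body and that the digest is hex-encoded, as in the tenant-side verification example later in this document.

```python
import hashlib
import hmac
import time
from typing import Optional


def build_signature_header(secret: str, payload: str, timestamp: Optional[int] = None) -> str:
    """Return an X-Webhook-Signature value of the form 't=<ts>,v1=<hex digest>'."""
    # Default to the current Unix time; tests can pin the timestamp.
    ts = int(time.time()) if timestamp is None else timestamp
    # The signed message is "{timestamp}.{payload}", keyed with the tenant's secret.
    signed = f"{ts}.{payload}".encode()
    digest = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    return f"t={ts},v1={digest}"
```

Signing the timestamp together with the payload is what lets the receiver reject replays: changing either the body or the `t` value invalidates `v1`.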
3. Delivery with Retries
- Exponential backoff: 1s, 2s, 4s, 8s, 16s
- Max 5 retries by default (configurable)
- Timeout: 30s per request (configurable 5-60s)
- Success criteria: HTTP 2xx response
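The retry policy above could look roughly like the sketch below. It assumes one initial attempt followed by up to `max_retries` retries, which matches the five listed delays. The `send` callable is hypothetical: it stands in for a single HTTP POST (with the per-request timeout applied) that returns the status code and raises on network errors or timeouts.

```python
import time
from typing import Callable, List


def backoff_delays(max_retries: int = 5, base_seconds: float = 1.0) -> List[float]:
    """Delays before each retry: 1s, 2s, 4s, 8s, 16s for the defaults."""
    return [base_seconds * (2 ** attempt) for attempt in range(max_retries)]


def deliver_with_retries(
    send: Callable[[], int],
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `send` until it returns an HTTP 2xx status or retries run out."""
    delays = backoff_delays(max_retries)
    # One initial attempt plus up to `max_retries` retries (an assumption).
    for attempt in range(max_retries + 1):
        try:
            status = send()
            if 200 <= status < 300:
                return True  # Success criteria: any 2xx response.
        except Exception:
            pass  # Treat timeouts and connection errors as a failed attempt.
        if attempt < max_retries:
            sleep(delays[attempt])
    return False
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a real dispatcher would more likely schedule retries asynchronously.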
4. Event Types
| Event | Description |
|---|---|
| call.started | Call initiated, waiting for connection |
| call.connected | Call connected, agent active |
| call.completed | Call ended normally |
| call.failed | Call failed to connect |
| call.disconnected | Call dropped unexpectedly |
| call.timeout | Call timed out |
| call.agent.changed | Agent handoff occurred |
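As an illustration, the event types in the table could be modeled as an enum and checked against a tenant's `enabled` flag and `filter.events` list from the webhook configuration. The names `CallEventType` and `should_deliver` are hypothetical, and treating a missing or empty `events` list as "deliver everything" is an assumption, not documented behavior.

```python
from enum import Enum
from typing import Any, Dict


class CallEventType(str, Enum):
    STARTED = "call.started"
    CONNECTED = "call.connected"
    COMPLETED = "call.completed"
    FAILED = "call.failed"
    DISCONNECTED = "call.disconnected"
    TIMEOUT = "call.timeout"
    AGENT_CHANGED = "call.agent.changed"


def should_deliver(event_type: CallEventType, webhook_config: Dict[str, Any]) -> bool:
    """Return True if the config is enabled and subscribes to this event type."""
    if not webhook_config.get("enabled", False):
        return False
    subscribed = webhook_config.get("filter", {}).get("events")
    # Assumption: no explicit event list means the tenant receives every event.
    if not subscribed:
        return True
    return event_type.value in subscribed
```

For example, a config with `"events": ["call.completed", "call.failed"]` would receive `call.completed` but not `call.started`.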
5. Configuration Storage
Webhook config stored in tenant.settings["webhooks"]:
{
"url": "https://api.example.com/webhooks",
"secret": "whsec_abc123...",
"enabled": true,
"filter": {
"events": ["call.completed", "call.failed"],
"include_transcript": true,
"include_context": false,
"include_metrics": true
},
"headers": {"X-Custom": "value"},
"timeout": 30,
"max_retries": 5
}
6. API Endpoints
GET /api/v1/webhooks # Get current config
POST /api/v1/webhooks # Create/replace (returns secret)
PATCH /api/v1/webhooks # Update (keeps secret)
DELETE /api/v1/webhooks # Delete config
POST /api/v1/webhooks/test # Send test event
POST /api/v1/webhooks/enable # Enable webhook
POST /api/v1/webhooks/disable # Disable webhook
POST /api/v1/webhooks/rotate-secret # Rotate signing secret
Alternatives Considered
1. Message Queue (Kafka/RabbitMQ)
Pros: Higher throughput, better guarantees
Cons: Infrastructure overhead, tenant must run consumers
Decision: Webhooks are simpler for tenants. They just expose an HTTPS endpoint.
2. WebSocket Push
Pros: Real-time, bidirectional
Cons: Requires persistent connection, harder to scale
Decision: Webhooks work better for server-to-server integration.
3. Polling API
Pros: Simple, no webhook infrastructure needed
Cons: Latency, wasted requests, not real-time
Decision: Polling is inefficient. Webhooks push events immediately.
Implementation
Files Created
src/agent_studio/core/webhooks/
├── __init__.py # Module exports
├── events.py # Event types and models
├── config.py # Configuration models
└── dispatcher.py # Delivery with retries
src/agent_studio/api/routers/webhooks.py # Management API
Integration Points
- Internal router (internal.py): Dispatches webhooks on status changes
- Call completion: Sends full data (transcript, context, metrics)
- Background dispatch: Non-blocking, doesn't slow API responses
Verifying Signatures (Tenant Side)
import hashlib
import hmac
import time

def verify_webhook(payload: str, signature_header: str, secret: str) -> bool:
    """Verify webhook signature. `payload` must be the raw request body, not re-serialized JSON."""
    # Parse signature header ("t=<timestamp>,v1=<signature>")
    try:
        parts = dict(p.strip().split("=", 1) for p in signature_header.split(","))
        timestamp = int(parts["t"])
        signature = parts["v1"]
    except (KeyError, ValueError):
        return False  # Malformed or incomplete header
    # Check timestamp (prevent replay attacks)
    if abs(time.time() - timestamp) > 300:  # 5 minute tolerance
        return False
    # Verify signature
    expected = hmac.new(
        secret.encode(),
        f"{timestamp}.{payload}".encode(),
        hashlib.sha256
    ).hexdigest()
    # Constant-time comparison avoids leaking the signature via timing
    return hmac.compare_digest(signature, expected)
Consequences
Positive
- Standard pattern: Familiar to developers (Stripe, Twilio, GitHub)
- Secure: HMAC signatures prevent spoofing/tampering
- Reliable: Retries ensure delivery despite transient failures
- Flexible: Event filtering reduces noise
- Low overhead: No infrastructure for tenants to manage
Negative
- Eventual consistency: Events may arrive out of order
- Endpoint requirements: Tenant must expose HTTPS endpoint
- Secret management: Tenant must securely store webhook secret
Mitigations
- Include timestamps and event IDs so tenants can reorder events on receipt
- Support localhost HTTP in development
- Secret rotation API for compromised secrets
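On the tenant side, the first mitigation might be applied as in the sketch below, which uses the event `id` to skip duplicate deliveries (retries can deliver the same event more than once) and the `created` timestamp to ignore events that arrive after a newer one for the same call. `CallEventReceiver` is a hypothetical class, and the in-memory storage is an assumption for illustration; a real receiver would persist this state.

```python
from typing import Any, Dict, Set


class CallEventReceiver:
    """De-duplicates by event ID and ignores stale, out-of-order updates."""

    def __init__(self) -> None:
        self._seen_ids: Set[str] = set()
        self._latest: Dict[str, Dict[str, Any]] = {}  # call_id -> newest event

    def handle(self, event: Dict[str, Any]) -> bool:
        """Return True if the event was applied, False if it was skipped."""
        if event["id"] in self._seen_ids:
            return False  # Duplicate delivery, e.g. a retry after a lost 2xx.
        self._seen_ids.add(event["id"])

        call_id = event["data"]["call_id"]
        current = self._latest.get(call_id)
        if current is not None and event["created"] < current["created"]:
            return False  # Older than the event already applied; arrived late.
        self._latest[call_id] = event
        return True

    def status(self, call_id: str) -> str:
        """Return the type of the newest event applied for this call."""
        return self._latest[call_id]["type"]
```

Because `created` has one-second resolution, two events for the same call can share a timestamp; this sketch applies them in arrival order in that case.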