Telemetry Architecture¶
This document describes the architecture of the DTX telemetry system.
Overview¶
flowchart TB
subgraph Django["Django Application"]
MW["TelemetryMiddleware"]
WS["TelemetryConsumerMixin"]
CT["TelemetryTask (Celery)"]
SL["structlog"]
end
subgraph Context["Context Sources"]
REQ["HTTP Request"]
WSC["WebSocket"]
TASK["Celery Task"]
end
subgraph Output["Output Destinations"]
JSON["JSON Logs (stdout)"]
OTEL["OpenTelemetry Collector"]
end
REQ --> MW
WSC --> WS
TASK --> CT
MW --> SL
WS --> SL
CT --> SL
SL --> JSON
SL -->|optional| OTEL
Hold "Alt" / "Option" to enable pan & zoom
Components¶
Config (telemetry/config.py)¶
Central initialization point that:
- Reads environment variables
- Configures structlog format (JSON vs pretty)
- Initializes OpenTelemetry if enabled
- Provides
init_telemetry()called at Django startup
Logging (telemetry/logging.py)¶
Structured logging configuration:
- Uses structlog with stdlib logging integration
- JSON output in production, colored pretty output in development
- Automatic context variable merging
- Quiets noisy third-party loggers
Middleware (telemetry/middleware.py)¶
Django middleware that:
- Generates or accepts
X-Request-IDheader - Extracts user ID from authenticated sessions
- Extracts labels from URL paths
- Binds context for request duration
- Logs request completion with timing
Labels (telemetry/labels.py)¶
URL-based label extraction:
- Parses resource patterns from API and WebSocket paths
- Extracts resource IDs (tournament.id, draft.id, etc.)
- Returns context dict for log enrichment
WebSocket (telemetry/websocket.py)¶
Mixin for Django Channels consumers:
- Generates unique connection IDs
- Extracts labels from WebSocket paths
- Binds context for connection duration
- Logs connect/disconnect events
Celery (telemetry/celery.py)¶
Base task class for Celery:
- Binds task.id and task.name context
- Supports request context propagation
- Logs task lifecycle events
Tracing (telemetry/tracing.py)¶
Optional OpenTelemetry integration:
- OTLP exporter for distributed tracing
- Django auto-instrumentation
- Configurable sampling rate
Data Flow¶
HTTP Request Flow¶
sequenceDiagram
participant Client
participant Nginx
participant Middleware
participant View
participant Log
Client->>Nginx: Request with X-Request-ID
Nginx->>Middleware: Forward request
Middleware->>Middleware: Extract context (request.id, user.id, labels)
Middleware->>View: Process with bound context
View->>Log: log.info("event", key=value)
Log-->>stdout: JSON with context
View->>Middleware: Response
Middleware->>Log: log.info("request_completed")
Middleware->>Client: Response with X-Request-ID
Hold "Alt" / "Option" to enable pan & zoom
WebSocket Flow¶
sequenceDiagram
participant Client
participant Consumer
participant Mixin
participant Log
Client->>Consumer: Connect
Consumer->>Mixin: telemetry_connect()
Mixin->>Mixin: Generate ws_conn_id, extract labels
Mixin->>Log: log.info("ws_connected")
Note over Consumer: Connection lifetime with bound context
Consumer->>Log: log.info("event") includes context
Client->>Consumer: Disconnect
Consumer->>Mixin: telemetry_disconnect(code)
Mixin->>Log: log.info("ws_disconnected")
Mixin->>Mixin: Clear context
Hold "Alt" / "Option" to enable pan & zoom
Celery Task Flow¶
sequenceDiagram
participant View
participant Celery
participant Task
participant Log
View->>Celery: my_task.delay(args, _request_id=id)
Celery->>Task: Execute
Task->>Task: Bind context (task.id, request.id)
Task->>Log: log.info("task_started")
Task->>Task: Run task logic
Task->>Log: log.info("task_completed")
Task->>Task: Clear context
Hold "Alt" / "Option" to enable pan & zoom
Design Principles¶
1. Context Propagation¶
All context is propagated via structlog.contextvars:
- Automatically included in every log within scope
- Thread-safe via Python contextvars
- Cleared on scope exit to prevent leaks
2. Opt-in Complexity¶
- Logging works out of the box
- OpenTelemetry requires explicit enablement
- Progressive enhancement without breaking changes
3. Semantic Field Names¶
Fields follow OpenTelemetry semantic conventions:
request.id- Request identifieruser.id- User identifierhttp.method- HTTP methodhttp.status_code- Response statustask.id- Celery task ID
4. Minimal Performance Impact¶
- JSON serialization is deferred
- Context binding uses efficient contextvars
- Sampling reduces tracing overhead
Log Output Examples¶
Development (pretty format)¶
2024-01-15 10:23:45 [info ] request_completed [telemetry.middleware] http.method=GET http.route=/api/tournaments/5/ http.status_code=200 duration_ms=12.34 request.id=abc-123 user.id=42 labels=['tournament'] tournament.id=5
Production (JSON format)¶
{
"timestamp": "2024-01-15T10:23:45.123456Z",
"level": "info",
"logger": "telemetry.middleware",
"event": "request_completed",
"http.method": "GET",
"http.route": "/api/tournaments/5/",
"http.status_code": 200,
"duration_ms": 12.34,
"request.id": "abc-123",
"user.id": 42,
"labels": ["tournament"],
"tournament.id": 5
}