Tracing Guide¶
This guide covers OpenTelemetry distributed tracing setup.
Overview¶
OpenTelemetry tracing is opt-in and disabled by default. When enabled, it provides:
- Distributed trace correlation across services
- Request timing and span visualization
- Automatic Django instrumentation
Enabling Tracing¶
Set the following environment variables:
OTEL_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
OTEL_SERVICE_NAME=dtx-backend # optional, default: dtx-backend
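As a rough illustration of how these variables might be interpreted at startup, the sketch below reads them from a mapping and applies the documented defaults. TracingConfig and load_tracing_config are hypothetical names, not part of this project or of OpenTelemetry, and the set of accepted truthy values is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TracingConfig:
    """Hypothetical container for the settings described above."""
    enabled: bool
    endpoint: Optional[str]
    service_name: str


def load_tracing_config(env: Mapping[str, str] = os.environ) -> TracingConfig:
    # Tracing is opt-in: anything other than an explicit truthy value disables it.
    # Accepting "1" and "yes" alongside "true" is an assumption of this sketch.
    enabled = env.get("OTEL_ENABLED", "false").strip().lower() in {"true", "1", "yes"}
    return TracingConfig(
        enabled=enabled,
        endpoint=env.get("OTEL_EXPORTER_OTLP_ENDPOINT"),
        # Documented default service name.
        service_name=env.get("OTEL_SERVICE_NAME", "dtx-backend"),
    )
```

Passing a plain dict instead of os.environ makes the defaults easy to check in a unit test.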
What Gets Traced¶
When enabled, OpenTelemetry auto-instruments:
Django Requests¶
Every HTTP request creates a span with:
- HTTP method and route
- Response status code
- Request/response timing
Outbound HTTP Requests¶
Calls made with the requests library create child spans that record:
- Target URL
- Response status
- Timing
System Metrics¶
Optional system metrics collection:
- CPU usage
- Memory usage
- Process metrics
Configuration¶
Sample Rate¶
Control what percentage of requests are traced:
# 10% sampling (default)
OTEL_TRACES_SAMPLER_ARG=0.1
# 100% sampling (development only)
OTEL_TRACES_SAMPLER_ARG=1.0
# 1% sampling (high-traffic production)
OTEL_TRACES_SAMPLER_ARG=0.01
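To make the effect of the ratio concrete, the sketch below imitates the general approach of a trace-ID-ratio sampler: it compares the low 64 bits of the trace ID against a bound derived from the ratio. This is an illustration using only the standard library, not the SDK's own code, and it assumes the project is configured with a ratio-based sampler. The function names are hypothetical.

```python
import random

TRACE_ID_LIMIT = (1 << 64) - 1  # mask for the low 64 bits of a 128-bit trace ID


def should_sample(trace_id: int, ratio: float) -> bool:
    """Decide deterministically from the trace ID whether to keep a trace."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    bound = round(ratio * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound


def sampled_fraction(ratio: float, n: int = 100_000, seed: int = 0) -> float:
    """Estimate the share of random 128-bit trace IDs kept at a given ratio."""
    rng = random.Random(seed)
    kept = sum(should_sample(rng.getrandbits(128), ratio) for _ in range(n))
    return kept / n
```

Because the decision depends only on the trace ID, services that sample at the same ratio tend to keep or drop the same traces, which avoids partially recorded traces.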
Authentication¶
For collectors requiring authentication:
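One common mechanism is the standard OpenTelemetry variable OTEL_EXPORTER_OTLP_HEADERS, which carries extra request headers as comma-separated key=value pairs. Whether this project relies on that variable is an assumption. The helper below, with the hypothetical name format_otlp_headers, sketches how such a value could be assembled; percent-encoding the values is also an assumption made so that spaces and commas cannot break the pair syntax.

```python
from typing import Mapping
from urllib.parse import quote


def format_otlp_headers(headers: Mapping[str, str]) -> str:
    """Build a comma-separated key=value string for OTEL_EXPORTER_OTLP_HEADERS."""
    # Percent-encode each value so spaces, commas and '=' stay inside their pair.
    return ",".join(f"{key}={quote(value, safe='')}" for key, value in headers.items())
```

The header names a collector expects vary by vendor, so consult the backend's documentation for the exact key.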
Local Development¶
See Local Observability for running Jaeger locally.
Quick start:
# Start Jaeger
docker run -d --name jaeger \
-p 16686:16686 \
-p 4317:4317 \
jaegertracing/jaeger:latest
# Enable tracing
export OTEL_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
# Start Django
python manage.py runserver
View traces at http://localhost:16686
Production Backends¶
OpenTelemetry supports many backends:
| Backend | Endpoint Format |
|---|---|
| Jaeger | http://jaeger:4317 |
| Grafana Tempo | http://tempo:4317 |
| Honeycomb | https://api.honeycomb.io |
| Datadog | http://datadog-agent:4317 |
Trace Context Propagation¶
Trace context propagates automatically via HTTP headers.
This means:
- Requests from frontend carry trace context
- Outbound API calls continue the trace
- Celery tasks can receive parent trace context
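With OpenTelemetry's default W3C Trace Context propagator, the context travels in a traceparent header laid out as version, trace ID, parent span ID, and flags. The sketch below builds and parses that header using only the standard library. It assumes the project keeps the default propagator, handles only the four-field layout, and uses hypothetical function names.

```python
import re
from typing import NamedTuple, Optional

# version - trace id (32 hex) - parent span id (16 hex) - flags
_TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class ParentContext(NamedTuple):
    trace_id: str
    span_id: str
    sampled: bool


def format_traceparent(trace_id: int, span_id: int, sampled: bool) -> str:
    """Render a version-00 traceparent header value."""
    flags = 1 if sampled else 0
    return f"00-{trace_id:032x}-{span_id:016x}-{flags:02x}"


def parse_traceparent(header: str) -> Optional[ParentContext]:
    """Return the parent context, or None if the header is malformed."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid in W3C Trace Context.
    if version == "ff" or set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return ParentContext(trace_id, span_id, bool(int(flags, 16) & 0x01))
```

A service that receives this header creates its spans under the same trace ID, which is what lets the backend stitch frontend, API, and task spans into one trace.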
Manual Spans¶
For custom instrumentation:
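from opentelemetry import trace

# Module-level tracer named after the current module
tracer = trace.get_tracer(__name__)

def complex_operation():
    # initialize() and process() stand in for application code
    with tracer.start_as_current_span("complex_operation") as span:
        span.set_attribute("step", "initialization")
        initialize()
        # Setting the same key again overwrites the earlier value
        span.set_attribute("step", "processing")
        result = process()
        span.set_attribute("result_count", len(result))
        return result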
Troubleshooting¶
Traces Not Appearing¶
- Verify OTEL_ENABLED=true is set
- Check endpoint is reachable
- Verify sample rate isn't 0
- Check collector logs for errors
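For the reachability check, a quick TCP probe is often enough to rule out network or port problems. The sketch below uses only the standard library; endpoint_reachable is a hypothetical helper, and the fallback ports for URLs without an explicit port are an assumption.

```python
import socket
from urllib.parse import urlparse


def endpoint_reachable(endpoint: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the endpoint's host and port succeeds."""
    parsed = urlparse(endpoint)
    if parsed.hostname is None:
        return False
    # Fall back to the scheme's usual port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A successful connection only shows that something is listening on the port; it does not confirm that the listener accepts OTLP data, so the collector logs remain the next place to look.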
Missing Spans¶
Auto-instrumentation may miss:
- Custom protocols
- Background threads without context propagation
- Code running before instrumentation
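The background-thread case follows from how the Python SDK tracks the active span: by default it stores it in contextvars, and on standard CPython builds a new thread typically starts with an empty context. The sketch below reproduces the idea with a plain ContextVar standing in for the active span, and shows that running the thread target inside a copied context carries the value across. The variable and helper names are hypothetical.

```python
import contextvars
import threading
from typing import Callable, List

# Stand-in for the SDK's "current span" slot; hypothetical, for illustration only.
current_span_name = contextvars.ContextVar("current_span_name", default="<none>")


def record_span(seen: List[str]) -> None:
    """Append whatever span name is visible in the calling context."""
    seen.append(current_span_name.get())


def run_in_thread(target: Callable[..., None], *args, propagate: bool) -> None:
    """Run target in a new thread, optionally inside a copy of the caller's context."""
    if propagate:
        # Snapshot the caller's context and run the target inside it.
        ctx = contextvars.copy_context()
        thread = threading.Thread(target=ctx.run, args=(target, *args))
    else:
        thread = threading.Thread(target=target, args=args)
    thread.start()
    thread.join()
```

With the real SDK, the equivalent step is to capture the context with opentelemetry.context.get_current() in the parent and call opentelemetry.context.attach() at the start of the worker, detaching the returned token when the work finishes.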
High Overhead¶
Reduce overhead by lowering the sample rate with OTEL_TRACES_SAMPLER_ARG, for example 0.01 for 1% sampling.
Disabling Tracing¶
To disable tracing without code changes, set OTEL_ENABLED=false or leave it unset, since tracing is disabled by default.
Tracing initialization becomes a no-op, with minimal performance impact.