
OpenTelemetry part 1: Instrumenting Python (with profiles + LGTM)


Instrumenting Python apps with OpenTelemetry (and profiles) has been one of those “why didn’t I do this sooner?” upgrades for me. If you’re running FastAPI in production and mostly relying on logs, you’re basically debugging in the dark. Logs tell you that something broke. Traces show you the path. Metrics show you trends. But profiles? Profiles show you exactly where your CPU went to die.

The best part is how much you get for free. With OpenTelemetry, FastAPI, HTTP clients, async DB drivers, Redis — all of it can be auto-instrumented. You add a few dependencies, flip a couple configs, and suddenly you have traces and metrics flowing without rewriting your whole codebase.

This post is about setting that up: instrumenting a Python app with OpenTelemetry and wiring in profiling so traces and CPU data live together. Once you’ve debugged performance this way, going back to logs-only feels prehistoric.

If you want to skip straight to the code, there’s a ready-made template here.

[Image: Tempo trace view in Grafana]


What Gets Instrumented Automatically

When you run your application or API with opentelemetry-instrument, here’s what happens without a single line of code change:

💡 Note: Automatic instrumentation is just the foundation. You can always use the OpenTelemetry SDK directly to customize, extend, or manually instrument specific code paths exactly as you would with manual instrumentation. Think of this as the baseline; you can build on top of it.
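For example, here is a minimal sketch of mixing a manual span with the automatic ones. The route, span name, and attribute are made up for illustration; the tracer calls are the standard OpenTelemetry API:

from fastapi import FastAPI
from opentelemetry import trace

app = FastAPI()
tracer = trace.get_tracer(__name__)

@app.get("/reports/{report_id}")
async def build_report(report_id: str):
    # The FastAPI/ASGI auto-instrumentation already opened a server span for this request;
    # this manual span shows up nested inside it in the trace view.
    with tracer.start_as_current_span("render-report") as span:
        span.set_attribute("report.id", report_id)  # hypothetical custom attribute
        return {"report_id": report_id}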

FastAPI/ASGI Handler Instrumentation (also other frameworks such as Flask, Django, and Celery)

Every incoming HTTP request automatically generates a server span named after the matched route, carrying the HTTP method, route, and status code, plus request duration metrics.

asyncpg (PostgreSQL) Instrumentation

Database operations are wrapped: each query becomes a child span carrying the SQL statement and connection details, so slow queries show up directly in the trace.

HTTP Client Instrumentation

Outgoing HTTP requests via httpx or requests get client spans, and the trace context is propagated in the request headers so downstream services can join the same trace.

Redis Instrumentation (aioredis)

Cache operations are tracked: every Redis command becomes a span with the command name, so slow round-trips to the cache become visible.

AsyncIO Instrumentation

Async operations gain visibility: selected coroutines, tasks, and futures can be traced alongside the request that spawned them.

Supported Instrumentations (Complete List)

OpenTelemetry Bootstrap auto-discovers and installs instrumentation packages for the following libraries:

asyncio, asgi, asyncpg, click, dbapi, fastapi, grpc, httpx, logging, redis, requests, sqlite3, sqlalchemy, starlette, threading, urllib, wsgi

If any of these libraries are installed in your environment, their corresponding instrumentation package is automatically installed and activated.
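To make the “zero code changes” point concrete, here is a plain FastAPI app with no OpenTelemetry imports at all; the endpoint and upstream URL are placeholders. Run it under opentelemetry-instrument and the route handler and the outgoing httpx call each get their own spans automatically:

# app/main.py - no OpenTelemetry code anywhere in the application
import httpx
from fastapi import FastAPI

app = FastAPI()

@app.get("/users/{user_id}")
async def get_user(user_id: str):
    async with httpx.AsyncClient() as client:
        # Traced by the httpx instrumentation; the trace context is
        # propagated to the upstream service via headers.
        resp = await client.get(f"https://example.com/api/users/{user_id}")
    return {"user_id": user_id, "upstream_status": resp.status_code}

# Run with:
#   opentelemetry-instrument uvicorn app.main:app --host 0.0.0.0 --port 8000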


The Minimal Stack

| Component | Purpose | Note |
| --- | --- | --- |
| FastAPI + Uvicorn | Web framework | Full ASGI instrumentation built-in |
| OpenTelemetry Distro | Meta-package with all core components | Don’t cherry-pick; use the official bundle |
| OTLP Exporter | Sends traces, metrics, logs over the wire | gRPC protocol, widely supported |
| Pyroscope | (Optional) Links traces to CPU/memory profiles | Game-changer for finding bottlenecks |

Dependencies and Installation

Run the following command to install the appropriate packages:

poetry add opentelemetry-distro opentelemetry-exporter-otlp pyroscope-otel opentelemetry-instrumentation-system-metrics

Or add them directly to pyproject.toml:

[tool.poetry.dependencies]
python = "^3.13"
fastapi = "^0.104.0"
uvicorn = "^0.24.0"

# OpenTelemetry core stack
opentelemetry-distro = ">=0.60b1,<0.61"
opentelemetry-exporter-otlp = ">=1.39.1,<2.0.0"

# System metrics (not auto-installed by bootstrap)
opentelemetry-instrumentation-system-metrics = ">=0.60b1,<0.61"

# Optional: Continuous profiling
pyroscope-otel = ">=0.4.1,<0.5.0"

What Each Package Does

| Package | Role |
| --- | --- |
| opentelemetry-distro | Meta-package that pulls in the SDK, API, and the opentelemetry-bootstrap / opentelemetry-instrument tooling |
| opentelemetry-exporter-otlp | Exports traces, metrics, and logs over OTLP (gRPC and HTTP/Protobuf) |
| opentelemetry-instrumentation-system-metrics | Adds host- and process-level CPU, memory, disk, and network metrics |
| pyroscope-otel | Provides the PyroscopeSpanProcessor that links spans to continuous profiles |

OpenTelemetry Bootstrap Flow

The opentelemetry-instrument command uses bootstrap to discover and automatically configure instrumentation. Here’s the exact flow:

Step 1: Bootstrap Discovers Libraries

opentelemetry-bootstrap -a install

This command scans the packages installed in your environment, works out which instrumentation packages match them, and pip-installs those instrumentation packages for you.

⚠️ Alert: Bootstrap installs all available instrumentations for libraries it detects in your environment. If you have httpx, Redis, asyncpg, SQLAlchemy, etc., bootstrap will install instrumentation for all of them automatically. This can increase your Python package count. To disable specific instrumentations at runtime, use OTEL_PYTHON_DISABLED_INSTRUMENTATIONS (see Environment Configuration).

[Image: opentelemetry-bootstrap install output]

Libraries auto-instrumented by bootstrap (if installed in your project) are the same ones listed in the Supported Instrumentations section above: asyncio, asgi, asyncpg, fastapi, httpx, redis, requests, sqlalchemy, and the rest.

Step 2: Application Startup with Auto-Instrumentation

opentelemetry-instrument uvicorn --host 0.0.0.0 --port 8000 app.main:app

When you run with opentelemetry-instrument, it imports all the discovered instrumentation packages and wraps library entry points before your application code even starts running. This means FastAPI, asyncpg, Redis, HTTP clients, and asyncio tasks are all intercepted automatically. The beautiful part: you don’t change a single line of your code. Your FastAPI routes, database queries, and HTTP calls all stay exactly as they were.

Step 3: Telemetry Collection Begins

Once requests start flowing through your application, the magic happens. You get traces showing the full path (request → database queries → Redis operations → HTTP calls → response), metrics tracking request rates and latency percentiles, logs enriched with trace context so you can correlate them with spans, and if you enabled Pyroscope, CPU and memory profiles that line up with specific traces.
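As a small illustration of the log correlation, when OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED is set and the app runs under opentelemetry-instrument, the logging instrumentation injects the current trace and span IDs into every log record. In my setups the injected record attributes are otelTraceID and otelSpanID, so a format string like the sketch below prints them next to your own messages; treat the exact field names as an assumption and verify them in your environment:

import logging

# Only works while the OpenTelemetry logging instrumentation is active;
# otherwise the otel* fields are missing and formatting fails.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s [trace_id=%(otelTraceID)s span_id=%(otelSpanID)s] %(message)s",
)

logging.getLogger(__name__).info("charging card")  # hypothetical log line
# 2025-01-24 10:30:00 INFO [trace_id=abc123... span_id=def456...] charging card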

Step 4: Export via OTLP

Everything collected goes straight to the endpoint you specified in OTEL_EXPORTER_OTLP_ENDPOINT. The data can be sent via gRPC (efficient and binary) or HTTP/Protobuf (more compatible). Batching happens automatically to avoid overwhelming your collector, and sampling is applied to keep data volumes manageable.
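If the defaults don’t match your traffic, the batch span processor can be tuned through the standard SDK environment variables; the values below are simply the SDK defaults written out, not recommendations:

# BatchSpanProcessor tuning (all optional)
OTEL_BSP_SCHEDULE_DELAY=5000          # ms between export batches
OTEL_BSP_MAX_EXPORT_BATCH_SIZE=512    # max spans per export request
OTEL_BSP_MAX_QUEUE_SIZE=2048          # spans buffered in memory before dropping
OTEL_BSP_EXPORT_TIMEOUT=30000         # ms before an export attempt is cancelled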


Environment Configuration

Core Configuration Variables

# Service Identity
OTEL_SERVICE_NAME="my-fastapi-service"
# The service name that appears in all observability platforms.
# Use lowercase with hyphens. This is your primary identifier.

OTEL_SERVICE_NAMESPACE="production"
# Groups related services in dashboards. Use: production, staging, development.
# All services with the same namespace appear together in the UI.

OTEL_SERVICE_VERSION="1.0.0"
# Semantic version of your service. Used to correlate issues with deployments.
# Format: MAJOR.MINOR.PATCH

# OTLP Exporter Configuration
OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
# The collector address where traces, metrics, and logs are sent.
# Format: <protocol>://<host>:<port>
# Common ports: 4317 (gRPC, default), 4318 (HTTP/Protobuf)
# In Kubernetes: http://otel-collector.observability.svc.cluster.local:4317

OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
# Protocol for sending telemetry data.
# "grpc"          = gRPC binary protocol (port 4317, ~3x smaller payload, lower latency)
# "http/protobuf" = HTTP with Protobuf (port 4318, better firewall compatibility)
# CRITICAL: Protocol must match the collector's listening port!
# ✅ gRPC protocol → port 4317
# ✅ HTTP protocol → port 4318
# ❌ Mixed ports/protocols = connection timeouts

OTEL_EXPORTER_OTLP_INSECURE="true"
# Controls TLS/SSL certificate validation.
# "true"  = Skip certificate validation (development, self-signed certs)
# "false" = Validate certificate (production with HTTPS)
# In production with proper certificates, set to false or omit (defaults to false).

# Export Target Selection
OTEL_TRACES_EXPORTER="otlp"
# Which exporter backend to use for traces.
# "otlp" = OpenTelemetry Protocol (recommended)
# "jaeger", "zipkin" = Alternative backends (less common)

OTEL_METRICS_EXPORTER="otlp"
# Which exporter backend to use for metrics.
# "otlp" = OpenTelemetry Protocol
# "prometheus" = Prometheus scraping (not used here; we use push-based OTLP)

OTEL_LOGS_EXPORTER="otlp"
# Which exporter backend to use for logs.
# "otlp" = Push logs to collector
# "logging" = Write to Python's logging module (development only)

# Python-Specific Auto-Instrumentation
OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED="true"
# Automatically enriches Python logs with trace context (trace ID, span ID).
# When enabled, every log entry includes the current trace/span for correlation.

OTEL_PYTHON_LOG_LEVEL="INFO"
# Logging level for OpenTelemetry's internal debug output.
# Levels: DEBUG (verbose), INFO (normal), WARNING (quiet), ERROR (very quiet)
# Use DEBUG when troubleshooting missing traces/metrics.

# Instrumentation Control
OTEL_PYTHON_DISABLED_INSTRUMENTATIONS="click,grpc"
# Comma-separated list of instrumentations to skip (no "opentelemetry-instrumentation-" prefix).
# Use this to reduce overhead if you don't use certain libraries.
# Available: asyncio, asyncpg, asgi, click, dbapi, fastapi, grpc, httpx, logging, redis, requests, sqlite3, sqlalchemy, starlette, threading, urllib, wsgi

OTEL_INSTRUMENTATION_HTTP_EXCLUDED_URLS="/health,/metrics,/ready"
# URLs to exclude from HTTP instrumentation (comma-separated).
# Prevents noise from health check endpoints.
# Each request to these URLs won't generate traces/metrics.

# Sampling Configuration
OTEL_TRACES_SAMPLER="parentbased_always_on"
# Sampling strategy.
# "always_on" = Keep all traces (high volume, good for debugging)
# "always_off" = Drop all traces (testing only)
# "parentbased_always_on" = If parent sampled, keep child; otherwise follow probability
# "parentbased_trace_id_ratio" = Parent-based with probability ratio

OTEL_TRACES_SAMPLER_ARG="1.0"
# Sampling rate when using ratio-based samplers.
# "1.0" = 100% sampling (keep all traces)
# "0.1" = 10% sampling (keep 1 in 10 traces)
# "0.01" = 1% sampling (cost reduction at scale)

# Pyroscope Configuration (Optional)
PYROSCOPE_SERVER_ADDRESS="http://localhost:4040"
# Grafana Pyroscope server address for continuous profiling.
# Profiles are sampled at 100Hz (100 samples per second) by default.
# Only used if pyroscope-otel is installed and initialized in your app.

PYROSCOPE_AUTH_TOKEN=""
# Authentication token for Grafana Cloud Pyroscope.
# Leave empty for local/self-hosted Pyroscope.
# Required for cloud.grafana.com/pyroscope.

OTLP Protocol Configuration Examples

gRPC Protocol (Recommended):

OTEL_EXPORTER_OTLP_ENDPOINT="http://otel-collector:4317"
OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
OTEL_EXPORTER_OTLP_INSECURE="true"

HTTP/Protobuf Protocol (Better Firewall Compatibility):

OTEL_EXPORTER_OTLP_ENDPOINT="http://otel-collector:4318"
OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
OTEL_EXPORTER_OTLP_INSECURE="true"

Production with HTTPS:

OTEL_EXPORTER_OTLP_ENDPOINT="https://otel-collector.example.com:4317"
OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
OTEL_EXPORTER_OTLP_INSECURE="false"

Prometheus Integration (Alternative Metrics Export)

By default, OpenTelemetry exports metrics to your OTLP collector. But you have options. If you want to expose metrics in Prometheus format and scrape them, you can configure it like this:

OTEL_METRICS_EXPORTER=prometheus
OTEL_EXPORTER_PROMETHEUS_PORT=9464

This exposes metrics at http://localhost:9464/metrics in Prometheus format. Your Prometheus scraper can then pull from that endpoint.
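On the Prometheus side, a minimal scrape job pointing at that endpoint could look like the sketch below; the job name and target are assumptions for a local setup:

# prometheus.yml
scrape_configs:
  - job_name: "my-fastapi-service"
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:9464"]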

Push vs. Pull Metrics

You can choose how metrics are exported:

Push (OTLP, recommended): the application pushes metrics to the collector over OTLP on a fixed interval. No scrape configuration or service discovery is needed, and it works the same for short-lived jobs and long-running services.

Pull (Prometheus): the application exposes a /metrics endpoint and a Prometheus server scrapes it. This requires the scraper to reach every instance, which adds configuration in containerized environments.

Recommendation: Use Default + System Metrics

Here’s a practical tip: stick with the default OTLP push-based approach combined with system metrics. With opentelemetry-instrumentation-system-metrics installed and OTEL_METRICS_EXPORTER=otlp, you get CPU, memory, disk, and network metrics automatically attached to every trace without any scraping complexity. This is more than sufficient for most applications and you avoid the operational overhead of Prometheus scraping.

If you already have Prometheus running and want to use it, the Prometheus option is there. But for greenfield projects, OTLP + system metrics is simpler and more powerful.


Understanding System Metrics

By default, without opentelemetry-instrumentation-system-metrics, OpenTelemetry only collects metrics from your application’s instrumented libraries, such as request durations and counts from the FastAPI/ASGI instrumentation. It knows nothing about the host or container the process is running on.

What System Metrics Adds

With opentelemetry-instrumentation-system-metrics explicitly installed, you also get OS-level metrics exported alongside your traces:

| Metric | What It Measures | Why It Matters |
| --- | --- | --- |
| system.cpu.usage | CPU utilization at trace time | Correlate slow requests with high CPU |
| system.cpu.time | CPU time (user + system) | Distinguish between I/O wait vs CPU burn |
| system.memory.usage | Memory consumption (RSS) | Detect memory leaks or spikes |
| system.memory.limit | Available memory | Context for memory pressure |
| system.disk.io.bytes_read | Bytes read from disk | Trace disk I/O patterns |
| system.disk.io.bytes_written | Bytes written to disk | Identify excessive logging/writing |
| system.network.io.bytes_sent | Bytes sent on network | Monitor bandwidth usage per request |
| system.network.io.bytes_recv | Bytes received on network | Detect large payload transfers |

These metrics carry the same service identity and timeline as your traces, so when you ask “why is this trace slow?” you can immediately see: “CPU was at 85%, memory at 2GB”.

[Image: metrics view in Grafana]

📋 Complete List: For the full list of available system metrics and their definitions, see otel-metrics-list.md in the template repository.


Resource Attributes for Telemetry Enrichment

Resource attributes provide metadata about your service that appear on every single trace, metric, and log. They’re not collected data; they’re descriptive context that enriches your telemetry.

Use them to tag your environment, version, team, and deployment information so you can filter and correlate issues.

Configuration via Environment Variables

Set the OTEL_RESOURCE_ATTRIBUTES environment variable with comma-separated key-value pairs:

export OTEL_RESOURCE_ATTRIBUTES="\
service.namespace=production,\
service.environment=production,\
service.version=1.0.0,\
service.build.git_hash=abc123def456,\
service.build.git_branch=main,\
service.owner.name=backend-team,\
service.owner.contact=backend@company.com"

Attribute Reference

| Attribute | Example | Purpose |
| --- | --- | --- |
| service.namespace | production | Environment grouping. All services with the same namespace appear together in dashboards. |
| service.environment | production | Environment name. Allows filtering by prod/staging/dev. |
| service.version | 1.0.0 | Semantic version. Used to correlate issues with specific releases. |
| service.host | api-pod-01 | Physical/logical host. Helps identify which instance had the issue. |
| service.build.git_hash | abc123def456 | Git commit SHA. Allows jumping to the exact code that was deployed. |
| service.build.git_branch | main | Git branch. Useful for tracking canary deployments. |
| service.build.deployment.user | ci-bot | Who triggered the deployment. Useful for incident correlation. |
| service.build.deployment.trigger | github-actions | How the deployment was triggered. Helps identify bad deploys. |
| service.owner.name | Backend Team | Team responsible for the service. Important for on-call routing. |
| service.owner.contact | backend@company.com | Primary contact email. Used in alerting. |

[Image: resource attributes on a trace]

Docker/Deployment Integration

Set these environment variables in your CI/CD pipeline before deploying:

# Example for GitHub Actions
export OTEL_SERVICE_NAMESPACE="production"
export OTEL_SERVICE_VERSION="1.0.0"
export SERVICE_BUILD_GIT_HASH="abc123def456"      # Set from CI: ${{ github.sha }}
export SERVICE_BUILD_GIT_BRANCH="main"             # Set from CI: ${{ github.ref_name }}
export SERVICE_BUILD_DEPLOYMENT_USER="ci-bot"     # Set from CI: ${{ github.actor }}
export SERVICE_BUILD_DEPLOYMENT_TRIGGER="github-actions"
export SERVICE_OWNER_NAME="backend-team"

Then in your deployment script or Dockerfile:

ENV OTEL_RESOURCE_ATTRIBUTES="\
service.namespace=${OTEL_SERVICE_NAMESPACE},\
service.version=${OTEL_SERVICE_VERSION},\
service.build.git_hash=${SERVICE_BUILD_GIT_HASH},\
service.build.git_branch=${SERVICE_BUILD_GIT_BRANCH},\
service.build.deployment.user=${SERVICE_BUILD_DEPLOYMENT_USER},\
service.build.deployment.trigger=${SERVICE_BUILD_DEPLOYMENT_TRIGGER},\
service.owner.name=${SERVICE_OWNER_NAME}"

Or in CI/CD (GitHub Actions example):

env:
  OTEL_SERVICE_VERSION: "1.0.0"
  SERVICE_BUILD_GIT_HASH: ${{ github.sha }}
  SERVICE_BUILD_GIT_BRANCH: ${{ github.ref_name }}
  SERVICE_BUILD_DEPLOYMENT_USER: ${{ github.actor }}
  SERVICE_BUILD_DEPLOYMENT_TRIGGER: "github-actions"

Pyroscope: Linking Traces to Profiles

Traces tell you what happened and how long it took. Profiles tell you where the CPU and memory were spent.

The Problem Pyroscope Solves

A slow trace looks like:

POST /api/expensive-operation - 1500ms
├─ database query - 200ms (asyncpg)
├─ external API call - 300ms (httpx)
└─ processing - 1000ms ??? (where is the time going?)

Traces can’t answer “what’s happening in that 1000ms processing?” Profiling can:

process_data()
├─ heavy_computation() - 600ms CPU
│   └─ numpy_array_operation() - 580ms
└─ data_transformation() - 400ms CPU
    └─ json_serialization() - 380ms

With Pyroscope + OpenTelemetry, each trace automatically gets a link to the profile captured during that trace’s execution window. Click “View Profile” in your trace and see the exact call stack where CPU/memory was spent.

How Pyroscope Integration Works

When you install and configure pyroscope-otel, the PyroscopeSpanProcessor does this:

  1. Captures profiles continuously across your entire application
  2. At trace creation time, the processor attaches the pyroscope.profile.id attribute to the root span of each trace (not individual spans — the entire transaction)
  3. The profile ID is a reference to the time window when that trace was executing
  4. In Grafana Tempo (or compatible observability platform), you see a “View Profile” button that jumps directly to the profile segment

Example trace attributes with Pyroscope:

{
  "trace_id": "abc123...",
  "span_id": "root_span_001",
  "service.name": "my-fastapi-service",
  "http.route": "/api/expensive-operation",
  "pyroscope.profile.id": "cpu:my-fastapi-service{}2025-01-24T10:30:00Z",
}

[Image: profile view in Grafana]

Initialization

Add this to app/main.py at module level, before FastAPI app creation:

import os
import pyroscope
from pyroscope.otel import PyroscopeSpanProcessor
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider

# Configure Pyroscope FIRST (required before any spans are created)
pyroscope.configure(
    app_name=os.getenv("OTEL_SERVICE_NAME", "my-fastapi-service"),
    server_address=os.getenv("PYROSCOPE_SERVER_ADDRESS", "http://localhost:4040"),
    # Optional: for Grafana Cloud
    # auth_token=os.getenv("PYROSCOPE_AUTH_TOKEN", ""),
)

# Register Pyroscope with OpenTelemetry
# This ensures the profile ID is attached to every root span
if hasattr(trace, 'get_tracer_provider'):
    provider = trace.get_tracer_provider()
    if isinstance(provider, TracerProvider):
        provider.add_span_processor(PyroscopeSpanProcessor())

# Now safe to create FastAPI app
from fastapi import FastAPI
app = FastAPI()

Critical: Pyroscope must be initialized before the first request. If you configure it after FastAPI starts, span profiles won’t be captured or linked. Put this at module level.

Important: Profile Linking is at Trace Level

The PyroscopeSpanProcessor attaches pyroscope.profile.id to the root span (the top-level span wrapping the entire request). Child spans inherit this attribute via context propagation, but the actual profile data is captured for the entire transaction duration, not individual operations.

This design makes sense: you want to see “what was the CPU profile while this entire request was executing?” not “what was the CPU profile just for this one database query?”


Docker and Deployment

Multi-Stage Build (Production-Ready)

This approach separates dependency compilation from runtime, reducing the final image size:

FROM cgr.dev/chainguard/wolfi-base as builder

RUN apk add --no-cache python-3.13 py3.13-pip poetry

ENV POETRY_NO_INTERACTION=1 \
    POETRY_VIRTUALENVS_IN_PROJECT=1 \
    POETRY_VIRTUALENVS_CREATE=1 \
    POETRY_CACHE_DIR=/tmp/poetry_cache

WORKDIR /app/

COPY pyproject.toml poetry.lock ./

RUN --mount=type=cache,target=$POETRY_CACHE_DIR \
    poetry install --without dev --no-root

ENV PATH="/app/.venv/bin:$PATH"

# Install OpenTelemetry instrumentation packages for all detected libraries
RUN opentelemetry-bootstrap -a install


FROM cgr.dev/chainguard/wolfi-base as runtime

RUN apk add --no-cache python-3.13

ENV VIRTUAL_ENV=/app/.venv \
    PATH="/app/.venv/bin:$PATH"

# Match the builder layout so app.main is importable from /app
WORKDIR /app/

COPY --from=builder ${VIRTUAL_ENV} ${VIRTUAL_ENV}

COPY app ./app
COPY pyproject.toml poetry.lock ./

EXPOSE 8000

# Start with opentelemetry-instrument wrapper for automatic instrumentation
CMD ["sh", "-c", "opentelemetry-instrument uvicorn --proxy-headers --host 0.0.0.0 --port 8000 app.main:app"]

Image Size Comparison

With Pyroscope (Wolfi base): ~240MB

Without Pyroscope (Alpine base): ~180MB

Switching to Alpine (If Not Using Pyroscope)

If you don’t need continuous profiling and want a smaller image:

FROM python:3.13-alpine as builder
# ... same builder stage ...

FROM python:3.13-alpine as runtime
RUN apk add --no-cache libpq

# ... same runtime setup ...
# Final image size: ~150MB

Remove pyroscope-otel from pyproject.toml when using Alpine.


Troubleshooting

Traces Not Appearing in Backend

Protocol/Port Mismatch:

# Check your collector's listening ports
docker logs otel-collector | grep -i listen

# Test connectivity
curl -i http://localhost:4317  # gRPC (binary, no HTTP response)
curl -i http://localhost:4318  # HTTP/Protobuf (will respond)

Protocol must match port: gRPC belongs on 4317, HTTP/Protobuf on 4318.

If you send gRPC to port 4318 (or vice versa), you’ll get connection timeouts with no traces.

Enable debug logging:

OTEL_PYTHON_LOG_LEVEL=DEBUG opentelemetry-instrument uvicorn app.main:app

Pyroscope Profile ID Missing from Spans

Symptoms: Traces appear, but pyroscope.profile.id attribute is missing.

Root causes:

  1. Pyroscope initialized after first request:

    # ❌ WRONG: Initialized in a startup event handler
    @app.on_event("startup")
    async def startup():
        pyroscope.configure(...)  # Too late!
    
    # ✅ CORRECT: Module-level initialization
    pyroscope.configure(...)
    from fastapi import FastAPI
    app = FastAPI()
  2. PyroscopeSpanProcessor not registered:

    # Verify this is in your app startup
    provider.add_span_processor(PyroscopeSpanProcessor())
  3. Pyroscope server unreachable:

    curl -i http://localhost:4040

libgc Not Found Error (Alpine + Pyroscope)

Error: libgc: No such file or directory when Pyroscope initializes.

Solution: Alpine doesn’t include the garbage collector library. Use Wolfi base image instead:

# ✅ CORRECT
FROM cgr.dev/chainguard/wolfi-base as runtime
RUN apk add --no-cache python-3.13

# ❌ WRONG (Pyroscope fails at runtime)
FROM python:3.13-alpine

If you want Alpine, remove Pyroscope from dependencies.

System Metrics Not Appearing

Missing in attributes: system.cpu.usage, system.memory.usage, etc.

Cause: opentelemetry-instrumentation-system-metrics not installed.

Fix with Poetry:

# Add to pyproject.toml
opentelemetry-instrumentation-system-metrics = ">=0.60b1,<0.61"

Then run:

opentelemetry-bootstrap -a install

Or install manually:

poetry add opentelemetry-instrumentation-system-metrics

Bootstrap Installing Too Many Packages

If opentelemetry-bootstrap is bloating your image and you don’t need all instrumentations:

Option 1: Disable at runtime

OTEL_PYTHON_DISABLED_INSTRUMENTATIONS="asyncio,click,grpc,threading,urllib"
opentelemetry-instrument uvicorn app.main:app

Option 2: Manual installation in Dockerfile

# Skip bootstrap, install only what you need
RUN pip install \
    opentelemetry-instrumentation-fastapi \
    opentelemetry-instrumentation-asyncpg \
    opentelemetry-instrumentation-redis \
    opentelemetry-instrumentation-httpx \
    opentelemetry-instrumentation-asyncio

Then use opentelemetry-instrument normally — it will activate only the installed instrumentations.

Certificate Validation Errors (HTTPS)

Error: SSL: CERTIFICATE_VERIFY_FAILED

For self-signed certificates (development):

OTEL_EXPORTER_OTLP_INSECURE="true"

For production with proper certificates:

OTEL_EXPORTER_OTLP_INSECURE="false"
# (or omit, as false is default)

Multiple Workers with Pre-fork Servers

Problem: Running Gunicorn with multiple workers breaks metric export because forking creates inconsistencies in background threads and locks used by OpenTelemetry’s PeriodicExportingMetricReader.

Symptoms: traces still show up, but metrics stop being exported (or only trickle in) once you move from one worker to several.

Solution: Use Gunicorn + UvicornWorker

Instead of:

# ❌ BROKEN: Metrics won't export with multiple workers
gunicorn myapp.main:app --workers 4

Use:

# ✅ CORRECT: UvicornWorker preserves background threads
opentelemetry-instrument gunicorn \
  --workers 4 \
  --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000 \
  myapp.main:app

The UvicornWorker class is specifically designed to handle forks while preserving background processes and threads, so each worker keeps its own working export loop and telemetry keeps flowing from every process.

Why this works: UvicornWorker manages the async event loop per worker, avoiding the thread/lock inconsistencies that occur with standard Gunicorn workers.

Alternative: If you don’t need multiple workers, stick with single-worker Uvicorn:

opentelemetry-instrument uvicorn --host 0.0.0.0 --port 8000 app.main:app

Bonus: Return Trace ID as Response Header

When debugging, users often ask “How do I find my request in the logs?” Returning the otel-trace-id header in the HTTP response solves this immediately. Users can provide the trace ID to support, and you can look it up in your observability platform.

FastAPI Middleware

Add this middleware to your app:

from fastapi import FastAPI, Request
from opentelemetry import trace

app = FastAPI()

@app.middleware("http")
async def add_trace_id_header(request: Request, call_next):
    # Get the current span's trace ID
    span = trace.get_current_span()
    trace_id = span.get_span_context().trace_id
    
    # Format as hex string (without leading zeros that hex() adds)
    trace_id_hex = format(trace_id, '032x')
    
    # Call the endpoint
    response = await call_next(request)
    
    # Add trace ID to response header
    response.headers["otel-trace-id"] = trace_id_hex
    
    return response

Usage:

curl -i http://localhost:8000/api/users

# Response headers:
# HTTP/1.1 200 OK
# ...
# otel-trace-id: abc123def456abc123def456abc123de

User reports: “I got an error, the trace ID is abc123def456abc123def456abc123de”

You search your Jaeger/Tempo UI for that trace ID and immediately see the full request timeline, database queries, external calls, and system metrics.




If you liked this post, have questions, or found issues in the setup, reach me on WhatsApp or email.
