Sampling
Strathon supports server-side sampling at ingest time to control storage cost without losing audit-critical spans. The sampling decision is made per-span as it arrives at the receiver, after the SDK has already enforced any block/steer policies, so policy enforcement is unaffected by sampling.
Configuration
A single environment variable controls the sample rate:
STRATHON_SAMPLING_RATE (float in [0.0, 1.0], default 1.0)
| Value | Behavior |
|---|---|
| 1.0 | Keep every span. Default. |
| 0.5 | Keep ~50% of routine traces (deterministic per trace_id). |
| 0.1 | Keep ~10% of routine traces. Common production setting. |
| 0.0 | Drop all routine traces. Only the "always keep" rules apply. |
Values outside [0.0, 1.0] are clamped silently. Non-numeric values fall
back to the default of 1.0 (with a warning logged at startup).
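
The clamping and fallback rules above can be sketched as a small parser. This is an illustration only, not the receiver's actual code: the function name parse_sampling_rate and the logger name are hypothetical, and treating NaN as a non-numeric value is an assumption the text does not spell out.

```python
import logging
import math
import os

logger = logging.getLogger("sampling_sketch")  # hypothetical logger name

DEFAULT_RATE = 1.0


def parse_sampling_rate(raw):
    """Turn a raw environment string (or None) into a rate in [0.0, 1.0]."""
    if raw is None:
        return DEFAULT_RATE
    try:
        rate = float(raw)
    except ValueError:
        # Non-numeric input: fall back to the default and warn.
        logger.warning("Invalid STRATHON_SAMPLING_RATE %r, using %.1f", raw, DEFAULT_RATE)
        return DEFAULT_RATE
    if math.isnan(rate):
        # Assumption: "nan" parses as a float but is treated like a non-numeric value.
        logger.warning("Invalid STRATHON_SAMPLING_RATE %r, using %.1f", raw, DEFAULT_RATE)
        return DEFAULT_RATE
    # Out-of-range values are clamped silently.
    return min(1.0, max(0.0, rate))


rate = parse_sampling_rate(os.environ.get("STRATHON_SAMPLING_RATE"))
```

With this sketch, "2.5" resolves to 1.0, "-1" resolves to 0.0, and "abc" falls back to 1.0 with a warning.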
The receiver logs its effective rate at startup:
Sampling rate: 0.100 (expensive LLM threshold: 5000 tokens)
What's always kept (bypasses sampling)
These spans are persisted regardless of the configured rate because they carry outsized audit / debugging value:
- Policy-annotated spans. Any span with one of these attributes is considered audit-critical and always kept:
  - strathon.policy.blocked
  - strathon.policy.steered
  - strathon.policy.steer_attempted
  - strathon.policy.matched_ids
- Errors. Any span with status_code = ERROR.
- Expensive LLM calls. Any span where gen_ai.usage.total_tokens is above the configured threshold (default: 5000 tokens). These are the calls operators most want to inspect when investigating cost spikes.
If you set STRATHON_SAMPLING_RATE=0.0 and an agent triggers a blocked
tool call, the receiver will still persist that span and its policy_matches
audit row. Only the routine spans around it (LLM calls, workflow steps,
tools that didn't trigger policy) get dropped.
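
The three always-keep rules can be expressed as a single predicate over one span's attributes and status. The sketch below is a rough illustration under stated assumptions: is_always_kept is a hypothetical name, the status code is assumed to arrive as the string "ERROR", and attributes are assumed to be a plain mapping.

```python
from typing import Any, Mapping

# Attribute names taken from the list above.
POLICY_ATTRIBUTES = (
    "strathon.policy.blocked",
    "strathon.policy.steered",
    "strathon.policy.steer_attempted",
    "strathon.policy.matched_ids",
)


def is_always_kept(
    attributes: Mapping[str, Any],
    status_code: str,
    token_threshold: int = 5000,
) -> bool:
    """Return True if a span bypasses sampling (hypothetical helper)."""
    # Rule 1: any policy annotation makes the span audit-critical.
    if any(key in attributes for key in POLICY_ATTRIBUTES):
        return True
    # Rule 2: errored spans are always kept.
    if status_code == "ERROR":
        return True
    # Rule 3: expensive LLM calls, strictly above the threshold.
    tokens = attributes.get("gen_ai.usage.total_tokens")
    return isinstance(tokens, (int, float)) and tokens > token_threshold
```

Because the predicate only inspects a single span, it needs no knowledge of the rest of the trace.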
Trace-level coherence
Routine spans are sampled deterministically by hashing the OTel trace_id
to a uniform [0, 1) value (the standard TraceIdRatioBased approach,
using the upper 53 bits of the trace_id's lower 8 bytes, a value that is
exactly representable in an IEEE-754 double).
All routine spans of a given trace get the same keep/drop decision. This means sampling alone never leaves partial traces in storage: either every routine span of the trace is kept or none is. The only spans stored from a dropped trace are those persisted by an always-keep rule. The receiver doesn't need to buffer trace state to guarantee this; the hash-based decision is stable across spans of the same trace.
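
As a minimal sketch of the mapping described above, the following takes the lower 8 bytes of the trace_id, keeps their upper 53 bits, and scales the result into [0, 1). The function names are hypothetical, and the trace_id is assumed to arrive as a 32-character hex string; the receiver's actual representation may differ.

```python
def trace_id_to_unit_interval(trace_id_hex: str) -> float:
    """Map a 128-bit trace_id (hex string) to a float in [0, 1)."""
    trace_id = int(trace_id_hex, 16)
    low64 = trace_id & 0xFFFFFFFFFFFFFFFF  # lower 8 bytes
    top53 = low64 >> 11                    # upper 53 of those 64 bits
    # A 53-bit integer divided by 2**53 is exact in an IEEE-754 double.
    return top53 / (1 << 53)


def keep_trace(trace_id_hex: str, rate: float) -> bool:
    """Same answer for every span that shares this trace_id."""
    return trace_id_to_unit_interval(trace_id_hex) < rate
```

The strict comparison means a rate of 0.0 keeps nothing and a rate of 1.0 keeps everything, since the mapped value is always below 1.0.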
When to use which rate
- Local development / staging: keep 1.0 so every trace is inspectable.
- Production with moderate volume: 0.1 to 0.5 gives you headroom on storage while preserving all incidents and policy matches.
- Production with high volume: 0.01 or lower; the always-keep rules ensure you don't lose anything that matters.
Why per-span at ingest, not collector-style tail sampling
A full OpenTelemetry Collector tail sampler buffers all spans of a trace, waits for trace completion, then evaluates policies against the assembled trace. That works, but it requires memory for the buffered spans, a way to detect trace completion, and handling for edge cases under load.
Strathon doesn't need that complexity: each span already carries
enough metadata in its attributes (policy annotations, status, token
counts) for a standalone keep/drop decision. Trace-level coherence is
preserved by hashing the trace_id rather than by buffering. Memory
footprint stays constant regardless of trace duration or fan-out.
Monitoring
The receiver maintains in-memory counters for sampling decisions:
- spans_kept_total
- spans_dropped_total
- spans_force_kept_total (kept by an always-keep rule that overrode a would-be-drop decision)
These are exposed via the /metrics Prometheus endpoint, and are also
accessible via the FastAPI app state for debugging.
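
To make the relationship between the three counters concrete, here is one way the bookkeeping could look. The SamplingCounters class and its record method are hypothetical, and counting a force-kept span in spans_kept_total as well is an assumption; the receiver may keep the two totals disjoint.

```python
from dataclasses import dataclass


@dataclass
class SamplingCounters:
    """In-memory counters named after the metrics listed above."""

    spans_kept_total: int = 0
    spans_dropped_total: int = 0
    spans_force_kept_total: int = 0

    def record(self, sampled_in: bool, always_keep: bool) -> bool:
        """Update counters for one span and return the final keep decision."""
        if sampled_in:
            self.spans_kept_total += 1
            return True
        if always_keep:
            # The rate said "drop", but an always-keep rule overrode it.
            # Assumption: such spans also count toward spans_kept_total.
            self.spans_kept_total += 1
            self.spans_force_kept_total += 1
            return True
        self.spans_dropped_total += 1
        return False
```

Under this reading, a span that is both sampled in and matched by an always-keep rule does not increment spans_force_kept_total, because no drop decision was overridden.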
Related
- Retention: the other storage-cost lever
- Scaling guide: when sampling starts to matter
- Metrics: sampling counters to watch