Tail Sampling
Overview
The Tail Sampling processor decides whether to keep or drop a trace after all of its spans have been collected, allowing decisions based on the trace as a whole — total latency, error status, span count, attribute combinations. This is the right tool when you need policy-based sampling ("keep all errors", "keep slow traces", "rate-limit by service") rather than blind volume reduction.
The processor must be paired with the Group By Trace processor upstream — that's what holds spans in memory until the trace is complete.
Supported types: Traces
Required Pipeline Order
receivers → memory_limiter → groupbytrace → tail_sampling → batch → exporters
Configuration
| Parameter | Type | Default | Required | Description |
|---|---|---|---|---|
| `decision_wait` | duration | none | Yes | How long to buffer a trace before deciding. Should match or exceed `groupbytrace.wait_duration`. |
| `num_traces` | uint64 | upstream default | No | Maximum number of traces held for decision. Traces are evicted when this limit is exceeded. |
| `expected_new_traces_per_sec` | uint64 | upstream default | No | Hint to the processor for sizing internal data structures. |
| `decision_cache_sampled_size` | int | upstream default | No | Cache size for already-sampled trace IDs (so late-arriving spans of an already-decided trace inherit the decision). |
| `decision_cache_not_sampled_size` | int | upstream default | No | Cache size for already-rejected trace IDs. |
| `policies` | object[] | none | Yes | At least one policy. A trace is kept if any policy votes "sample" (OR semantics across policies). |
Policy Types
Each policy entry has a `name` and a `type`, plus a type-specific config block keyed by the policy type.
| Policy type | Behavior | Config block |
|---|---|---|
| `always_sample` | Keep every trace. | (none) |
| `latency` | Keep traces whose total latency is at least `threshold_ms` (and optionally below `upper_threshold_ms`). | `latency` |
| `numeric_attribute` | Keep traces with a numeric span attribute in [`min_value`, `max_value`]. | `numeric_attribute` |
| `string_attribute` | Keep traces with a string span attribute matching `values` (exact or regex). | `string_attribute` |
| `boolean_attribute` | Keep traces with a boolean span attribute matching `value`. | `boolean_attribute` |
| `status_code` | Keep traces whose root span (or any span, depending on the implementation) has one of `status_codes` (`OK`, `ERROR`, `UNSET`). | `status_code` |
| `probabilistic` | Keep `sampling_percentage` of traces. Like the probabilistic sampler, but applied at tail time. | `probabilistic` |
| `rate_limiting` | Keep at most `spans_per_second` total spans. | `rate_limiting` |
| `span_count` | Keep traces with span count in [`min_spans`, `max_spans`]. | `span_count` |
| `trace_state` | Keep traces whose tracestate has `key` matching one of `values`. | `trace_state` |
| `ottl_condition` | Keep traces whose spans match an OTTL boolean expression. | `ottl_condition` |
| `and` | Keep traces matching ALL `and_sub_policy` entries. | `and` |
| `composite` | Apply multiple sub-policies with rate allocation per policy. | `composite` |
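
The `and` type nests other policies instead of taking flat settings, so a short example may help. The fragment below is a minimal sketch of a policy that keeps only failed traces from billing services. It assumes each `and_sub_policy` entry has the same shape as a top-level policy entry (as in the upstream OpenTelemetry tail sampling processor); the policy names and the regex are illustrative, not taken from this product's defaults.

```json5
{
  "name": "billing-errors",                 // hypothetical policy name
  "type": "and",
  "and": {
    // Assumed shape: each sub-policy mirrors a top-level policy entry.
    "and_sub_policy": [
      {
        "name": "billing-services",         // hypothetical sub-policy name
        "type": "string_attribute",
        "string_attribute": {
          "key": "service.name",
          "values": ["^billing(-.*)?$"],    // illustrative regex
          "enabled_regex_matching": true,
        },
      },
      {
        "name": "errors-only",              // hypothetical sub-policy name
        "type": "status_code",
        "status_code": { "status_codes": ["ERROR"] },
      },
    ],
  },
}
```

Listed as two separate top-level policies, the same conditions would keep every billing trace and every error trace, because top-level policies combine with OR semantics.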
Common type-specific config
"latency": { "threshold_ms": 1000, "upper_threshold_ms": 0 },
"numeric_attribute": { "key": "http.status_code", "min_value": 500, "max_value": 599, "invert_match": false },
"string_attribute": { "key": "service.name", "values": ["billing"], "enabled_regex_matching": false, "cache_max_size": 0, "invert_match": false },
"boolean_attribute": { "key": "is_critical", "value": true },
"status_code": { "status_codes": ["ERROR"] },
"probabilistic": { "hash_salt": "salt", "sampling_percentage": 10.0 },
"rate_limiting": { "spans_per_second": 1000 },
"span_count": { "min_spans": 5, "max_spans": 100 },
"trace_state": { "key": "tenant", "values": ["paid"] },
"ottl_condition": { "error_mode": "ignore", "expression": ["attributes[\"http.status_code\"] >= 500"] },
Example Configuration
{
  "decision_wait": "10s",
  "num_traces": 100000,
  "expected_new_traces_per_sec": 5000,
  "policies": [
    // Always keep error traces
    {
      "name": "errors",
      "type": "status_code",
      "status_code": { "status_codes": ["ERROR"] },
    },
    // Keep slow traces (latency >= 1s)
    {
      "name": "slow",
      "type": "latency",
      "latency": { "threshold_ms": 1000 },
    },
    // Keep 100% of traces from the billing service
    {
      "name": "billing",
      "type": "string_attribute",
      "string_attribute": {
        "key": "service.name",
        "values": ["billing", "billing-internal"],
      },
    },
    // Keep 5% of everything else (background sampling)
    {
      "name": "background",
      "type": "probabilistic",
      "probabilistic": { "sampling_percentage": 5.0 },
    },
  ],
}
Composite policy with rate allocation
{
  "decision_wait": "10s",
  "policies": [
    {
      "name": "composite-policy",
      "type": "composite",
      "composite": {
        "max_total_spans_per_second": 1000,
        // Sub-policies are evaluated in this order
        "policy_order": ["errors", "slow", "background"],
        "composite_sub_policy": [
          {
            "name": "errors",
            "type": "status_code",
            "status_code": { "status_codes": ["ERROR"] },
          },
          {
            "name": "slow",
            "type": "latency",
            "latency": { "threshold_ms": 1000 },
          },
          { "name": "background", "type": "always_sample" },
        ],
        // Each percent is a share of max_total_spans_per_second
        "rate_allocation": [
          { "policy": "errors", "percent": 50 },      // 500 spans/s
          { "policy": "slow", "percent": 30 },        // 300 spans/s
          { "policy": "background", "percent": 20 },  // 200 spans/s
        ],
      },
    },
  ],
}
Notes
- Pipeline order is mandatory. `groupbytrace` → `tail_sampling`. Without grouping, the processor receives one span at a time and can't make trace-level decisions.
- Decision wait latency. Every trace is delayed by at least `decision_wait`. This is the cost of policy-based sampling; the probabilistic sampler avoids this delay but can't make trace-level decisions.
- OR semantics. Policies are evaluated independently: if any one policy says "sample", the trace is kept. To require multiple conditions, use the `and` policy type.
- Memory. The buffer holds `num_traces` × average-trace-size bytes. Size it for peak load, not steady state.
- Late spans. Spans arriving after a trace's decision is finalized inherit the decision via the decision cache. Set `decision_cache_*_size` large enough to span the late-span window.
- Deduplication concern. If you run multiple tail-sampling collectors that see overlapping spans of the same trace (e.g. behind a load balancer), they may make different decisions. Pin trace IDs to specific collectors or run a single tail-sampler at the gateway.
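
The memory and late-span notes come down to a handful of sizing parameters. The fragment below is a rough sketch rather than a recommendation. It assumes the number of traces in flight is about `expected_new_traces_per_sec` × `decision_wait`; the headroom factor and both cache sizes are illustrative values, not documented defaults.

```json5
{
  "decision_wait": "10s",
  "expected_new_traces_per_sec": 5000,
  // Estimated traces in flight: 5000 traces/s × 10 s = 50000.
  // 100000 leaves roughly 2x headroom for bursts (assumed margin).
  "num_traces": 100000,
  // Illustrative cache sizes. With 5% sampling most trace IDs are rejected,
  // so the not-sampled cache is given more room than the sampled one.
  "decision_cache_sampled_size": 50000,
  "decision_cache_not_sampled_size": 500000,
  "policies": [
    {
      "name": "background",
      "type": "probabilistic",
      "probabilistic": { "sampling_percentage": 5.0 },
    },
  ],
}
```

At an assumed average of 20 KB per buffered trace, a `num_traces` of 100000 corresponds to roughly 2 GB of buffer, so the collector's memory budget should be planned around the peak figure.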