Probabilistic Sampler
Overview
The Probabilistic Sampler processor keeps a configurable percentage of records and drops the rest, using a hash of each record to decide. Unlike tail sampling, the decision is made per record at ingest time without needing to see the whole trace, which makes it cheap, stateless, and suitable for both logs and traces.
For traces, the processor hashes the trace ID so all spans of a trace share the same sampling decision (consistent within-trace sampling). For logs, you can either hash the trace ID (when present) or hash a chosen attribute.
Supported types: Logs · Traces
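The hash-based decision described above can be sketched as follows. This is a minimal illustration, not the processor's actual code: the FNV-1a hash, the bucket count, the way the seed is mixed in, and the helper names (`fnv1a_32`, `keep`, `fake_trace_id`) are all assumptions made for the example.

```python
import hashlib
import struct

NUM_BUCKETS = 1 << 14  # bucket count chosen for this sketch (assumption)


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a, used here only as a stand-in hash function."""
    h = 0x811C9DC5
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


def keep(hash_input: bytes, sampling_percentage: float, hash_seed: int = 0) -> bool:
    """Keep the record if its seeded hash falls in the kept share of buckets."""
    seeded = struct.pack("<I", hash_seed) + hash_input
    bucket = fnv1a_32(seeded) % NUM_BUCKETS
    return bucket < sampling_percentage / 100.0 * NUM_BUCKETS


def fake_trace_id(n: int) -> bytes:
    """Hypothetical helper: a deterministic 16-byte ID for demonstration."""
    return hashlib.sha256(str(n).encode()).digest()[:16]


trace_ids = [fake_trace_id(n) for n in range(20000)]
kept = sum(keep(t, 10.0, hash_seed=22) for t in trace_ids)
print(f"kept {kept / len(trace_ids):.1%} of traces")
```

Because the decision depends only on the hashed value and the seed, every span carrying the same trace ID gets the same answer, and no state has to be stored between records.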
Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| sampling_percentage | float | none | Percentage of records to keep (0–100). 100 keeps everything; 0 drops everything. |
| hash_seed | uint32 | upstream default | Seed for the hash function. Set the same non-zero value on all collectors so they sample the same records consistently. |
| mode | string | upstream default | Sampling mode. One of hash_seed (legacy), equalizing, proportional. The newer equalizing and proportional modes use OTel's standard sampling threshold encoded in the trace state. |
| sampling_precision | int | upstream default | Number of hex digits of precision for equalizing/proportional mode (1–14). Higher values give finer sampling granularity. |
| fail_closed | bool | false | When true, records that can't be evaluated (for example, because the trace ID is missing) are dropped instead of forwarded. |
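To make sampling_precision more concrete, here is a rough sketch of how a keep percentage could map to a hex threshold of limited precision. It assumes the OpenTelemetry convention of a 56-bit rejection threshold compared against 56 bits of randomness taken from the end of the trace ID. The function names are hypothetical, and the plain truncation used here is a simplification: the upstream rounding rules may differ, and truncation loses accuracy for very small percentages at low precision.

```python
MAX_THRESHOLD = 1 << 56  # 56 bits of randomness, i.e. 14 hex digits


def encode_threshold(sampling_percentage: float, sampling_precision: int = 4) -> str:
    """Return the rejection threshold as hex, truncated to the given precision."""
    if not 0 < sampling_percentage <= 100:
        raise ValueError("sampling_percentage must be in (0, 100]")
    if not 1 <= sampling_precision <= 14:
        raise ValueError("sampling_precision must be between 1 and 14")
    rejected = int((1 - sampling_percentage / 100) * MAX_THRESHOLD)
    rejected = min(rejected, MAX_THRESHOLD - 1)  # stay within 14 hex digits
    digits = f"{rejected:014x}"[:sampling_precision]
    return digits.rstrip("0") or "0"  # trailing zeros carry no information


def randomness_from_trace_id(trace_id: bytes) -> int:
    """Use the last 7 bytes (56 bits) of the trace ID as the random value."""
    return int.from_bytes(trace_id[-7:], "big")


def keep(randomness: int, threshold_hex: str) -> bool:
    """Keep the record when its randomness is at or above the threshold."""
    threshold = int(threshold_hex.ljust(14, "0"), 16)
    return randomness >= threshold


for pct in (100, 50, 25, 10):
    print(pct, encode_threshold(pct, sampling_precision=4))
```

More hex digits let the threshold sit closer to the requested percentage, which is why a higher precision gives finer granularity.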
Logs-only fields
| Parameter | Type | Default | Description |
|---|---|---|---|
| attribute_source | string | upstream default | What to hash: record (hash the attribute named by from_attribute) or traceID (hash the log record's trace ID). |
| from_attribute | string | none | Required when attribute_source=record. Name of the log attribute whose value is hashed. |
| sampling_priority | string | none | Name of an attribute that, when present and >0, forces the record to be kept regardless of the sampling decision. Useful for an "always keep errors" override. |
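The interaction between attribute_source, from_attribute, and fail_closed can be sketched roughly as below. The record layout (a dict with `trace_id` and `attributes` keys) and both function names are hypothetical, chosen only to illustrate the selection logic described in the tables.

```python
from typing import Optional


def hash_input(record: dict, attribute_source: str,
               from_attribute: Optional[str] = None) -> Optional[bytes]:
    """Pick the bytes to hash, or None when the record cannot be evaluated."""
    if attribute_source == "traceID":
        trace_id = record.get("trace_id") or b""
        # An empty or all-zero trace ID is treated as missing.
        return trace_id if any(trace_id) else None
    if attribute_source == "record":
        value = record.get("attributes", {}).get(from_attribute)
        return None if value is None else str(value).encode("utf-8")
    raise ValueError(f"unknown attribute_source: {attribute_source}")


def on_missing_input(fail_closed: bool) -> str:
    """What happens to a record with nothing to hash."""
    return "drop" if fail_closed else "forward"


log = {"trace_id": bytes(16), "attributes": {"user.id": 42}}
print(hash_input(log, "record", "user.id"))  # hashes the attribute value
print(hash_input(log, "traceID"))            # no usable trace ID on this record
print(on_missing_input(fail_closed=False))
```

Hashing an attribute such as user.id means all logs for a given user are kept or dropped together, in the same way that hashing the trace ID keeps a trace whole.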
Example Configurations
Keep 10% of traces
{
  "sampling_percentage": 10.0,
  "hash_seed": 22,
  "mode": "proportional"
}
Keep 1% of logs, but always keep error logs
{
  "sampling_percentage": 1.0,
  "attribute_source": "record",
  "from_attribute": "user.id",
  "sampling_priority": "force_keep"
}
Keep 100% of high-priority logs, sample 5% of the rest
Set sampling_percentage to 5 and sampling_priority to force_keep. Then set force_keep=1 upstream (for example, via the Transform processor) on the records you want to retain; this processor keeps all of them and samples the rest at 5%.
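Before deploying a configuration like the ones above, it can help to check it against the constraints listed in the parameter tables. The following is a small sketch of such a check; the `validate` function is hypothetical and not part of the processor, and treating sampling_percentage as required is an assumption based on it having no default.

```python
import json
from typing import List

VALID_MODES = {"hash_seed", "equalizing", "proportional"}
VALID_SOURCES = {"record", "traceID"}


def is_number(value) -> bool:
    """True for ints and floats, but not for booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate(config: dict) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    errors = []
    pct = config.get("sampling_percentage")
    if not is_number(pct) or not 0 <= pct <= 100:
        errors.append("sampling_percentage must be a number from 0 to 100")
    mode = config.get("mode")
    if mode is not None and mode not in VALID_MODES:
        errors.append("mode must be one of hash_seed, equalizing, proportional")
    precision = config.get("sampling_precision")
    if precision is not None:
        if not isinstance(precision, int) or isinstance(precision, bool) \
                or not 1 <= precision <= 14:
            errors.append("sampling_precision must be an integer from 1 to 14")
    source = config.get("attribute_source")
    if source is not None and source not in VALID_SOURCES:
        errors.append("attribute_source must be record or traceID")
    if source == "record" and not config.get("from_attribute"):
        errors.append("from_attribute is required when attribute_source is record")
    return errors


example = json.loads("""
{
  "sampling_percentage": 1.0,
  "attribute_source": "record",
  "from_attribute": "user.id",
  "sampling_priority": "force_keep"
}
""")
print(validate(example) or "config looks valid")
```

Strict JSON parsers, including Python's json module, reject a trailing comma after the last key, so a configuration should be parsed once before it is shipped.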
Notes
- Stateless. Memory use does not grow with cardinality, so the processor is cheap and behaves the same on a 100-pod fleet as on a 10,000-pod fleet.
- Within-trace consistency. When sampling traces, all spans of a trace share the same decision because they share a trace ID. You won't get half-traces.
- Fleet consistency. Set the same hash_seed on every collector instance so two collectors processing the same trace make the same decision. Critical when traces split across edge collectors and a gateway.
- Pipeline order: put this after memory_limiter and before batch, so that sampling reduces what enters the batch.
- Tail sampling vs. probabilistic. Probabilistic is cheap and stateless but blind (you can't sample "all errors"). Use tail sampling when you need policy-based decisions; probabilistic when you need volume reduction.
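The fleet-consistency note can be illustrated with a small simulation of an edge collector sampling at 50% in front of a gateway sampling at 10%. As in any sketch of this kind, the FNV-1a hash, the bucket count, and the seed handling are stand-in assumptions rather than the processor's actual implementation.

```python
import hashlib
import struct

NUM_BUCKETS = 1 << 14  # bucket count chosen for this sketch (assumption)


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a, used here only as a stand-in hash function."""
    h = 0x811C9DC5
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


def keep(trace_id: bytes, sampling_percentage: float, hash_seed: int) -> bool:
    """Seeded hash decision: keep when the bucket is below the kept share."""
    bucket = fnv1a_32(struct.pack("<I", hash_seed) + trace_id) % NUM_BUCKETS
    return bucket < sampling_percentage / 100.0 * NUM_BUCKETS


# Deterministic stand-in trace IDs for the simulation.
trace_ids = [hashlib.sha256(str(n).encode()).digest()[:16] for n in range(5000)]

# Edge tier keeps 50%; the gateway then keeps 10% of what it receives.
edge = {t for t in trace_ids if keep(t, 50.0, hash_seed=22)}
gateway_same_seed = {t for t in edge if keep(t, 10.0, hash_seed=22)}
gateway_other_seed = {t for t in edge if keep(t, 10.0, hash_seed=99)}

# What a single collector at 10% with the shared seed would have kept.
direct = {t for t in trace_ids if keep(t, 10.0, hash_seed=22)}

print("same seed :", len(gateway_same_seed) / len(trace_ids))
print("other seed:", len(gateway_other_seed) / len(trace_ids))
print("same seed matches a single 10% sampler:", gateway_same_seed == direct)
```

With a shared seed, the gateway's 10% is a subset of the edge's 50%, so the end-to-end rate stays at 10%. With different seeds the two decisions are effectively unrelated, and the combined rate tends toward the product of the two percentages.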