Probabilistic Sampler

Overview

The Probabilistic Sampler processor keeps a configurable percentage of records and drops the rest, using a hash of each record to decide. Unlike tail sampling, the decision is made per record at ingest time, without needing to see the whole trace. That makes it cheap and stateless, and it works for both logs and traces.

For traces, the processor hashes the trace ID so all spans of a trace share the same sampling decision (consistent within-trace sampling). For logs, you can either hash the trace ID (when present) or hash a chosen attribute.
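The hashing idea described above can be sketched as follows. This is a minimal illustration, not the processor's actual implementation: the FNV-1a hash, the way the seed is mixed in, and the `NUM_BUCKETS` constant are all assumptions chosen for the example. The point is that the decision is a pure function of the trace ID, the seed, and the percentage, so every span of a trace gets the same answer.

```python
import random

FNV_OFFSET = 0x811C9DC5
FNV_PRIME = 0x01000193
NUM_BUCKETS = 1 << 14  # hypothetical bucket count for this sketch


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a hash of a byte string."""
    h = FNV_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME) & 0xFFFFFFFF
    return h


def keep(trace_id: bytes, sampling_percentage: float, hash_seed: int = 0) -> bool:
    """Return True if a record carrying this trace ID should be kept."""
    # Mixing the seed in as a prefix is an assumption made for illustration.
    seeded = hash_seed.to_bytes(4, "little") + trace_id
    bucket = fnv1a_32(seeded) % NUM_BUCKETS
    return bucket < NUM_BUCKETS * sampling_percentage / 100.0


# Every span of one trace shares the trace ID, so the decision is identical.
trace_id = bytes.fromhex("4bf92f3577b34da6a3ce929d0e0e4736")
span_decisions = [keep(trace_id, 10.0, hash_seed=22) for _ in range(5)]

# Across many distinct traces, roughly the configured percentage survives.
rng = random.Random(0)
random_ids = [rng.getrandbits(128).to_bytes(16, "big") for _ in range(20_000)]
kept_fraction = sum(keep(t, 10.0, hash_seed=22) for t in random_ids) / len(random_ids)
print(f"kept fraction: {kept_fraction:.3f}")
```

Because no state is shared between calls, the same function can run on any number of collectors without coordination.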

Supported types: Logs · Traces

Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| sampling_percentage | float | | Percentage of records to keep (0–100). 100 keeps everything; 0 drops everything. |
| hash_seed | uint32 | upstream default | Seed for the hash function. Set to a non-zero shared value across collectors so they all sample the same records consistently. |
| mode | string | upstream default | Sampling mode. One of hash_seed (legacy), equalizing, proportional. The newer equalizing and proportional modes use OTel's standard sampling threshold encoded in the trace state. |
| sampling_precision | int | upstream default | Number of hex digits of precision for equalizing/proportional mode (1–14). Higher = finer sampling granularity. |
| fail_closed | bool | false | When true, records that can't be evaluated (missing trace ID, etc.) are dropped instead of forwarded. |
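To make the effect of sampling_precision more concrete, here is a simplified sketch of how a sampling percentage can be turned into a hex rejection threshold with a limited number of digits. It assumes the OTel convention that the threshold marks the fraction of records to reject and that trailing zeros are dropped when it is written out. The function names are hypothetical, and the upstream processor's rounding rules may differ in detail.

```python
def encode_threshold(sampling_percentage: float, precision: int = 4) -> str:
    """Encode the rejection threshold as at most `precision` hex digits."""
    if not 0 < sampling_percentage <= 100:
        raise ValueError("sampling_percentage must be in (0, 100]")
    if not 1 <= precision <= 14:
        raise ValueError("precision must be between 1 and 14")
    reject_fraction = 1.0 - sampling_percentage / 100.0
    scale = 16 ** precision
    # Round to the nearest representable step, staying below 100% rejection.
    scaled = min(round(reject_fraction * scale), scale - 1)
    digits = format(scaled, f"0{precision}x")
    return digits.rstrip("0") or "0"


def effective_percentage(threshold: str) -> float:
    """Percentage actually kept for a given encoded threshold."""
    scale = 16 ** len(threshold)
    return 100.0 * (scale - int(threshold, 16)) / scale


for precision in (1, 4):
    th = encode_threshold(10.0, precision)
    print(precision, th, effective_percentage(th))
```

With a single hex digit, a requested 10% can only be approximated in steps of 1/16, so the effective rate lands on 12.5%. With four digits the effective rate is within a small fraction of a percent of the request.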

Logs-only fields

| Parameter | Type | Default | Description |
|---|---|---|---|
| attribute_source | string | upstream default | What to hash: record (hash the attribute named in from_attribute) or traceID (hash the log record's trace ID). |
| from_attribute | string | | Required when attribute_source=record. Name of the log attribute whose value is hashed. |
| sampling_priority | string | | Name of an attribute that, when present and >0, forces the record to be kept regardless of the sampling decision. Useful for an "always keep errors" override. |

Example Configurations

Keep 10% of traces

{
  "sampling_percentage": 10.0,
  "hash_seed": 22,
  "mode": "proportional"
}

Keep 1% of logs, but always keep error logs

{
  "sampling_percentage": 1.0,
  "attribute_source": "record",
  "from_attribute": "user.id",
  "sampling_priority": "force_keep"
}

Keep 100% of high-priority logs, sample 5% of the rest

Set force_keep=1 upstream (e.g. via the Transform processor) on whatever records you want to retain, then this processor will keep them all while sampling the rest at 5%.
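The override logic described above can be sketched like this. It is an illustration under stated assumptions, not the processor's code: `keep_log` is a hypothetical function, CRC32 stands in for whatever hash the processor really uses, and the priority check follows this page's description (present and greater than zero means keep). Verify the exact handling of the attribute's value against the processor before relying on it.

```python
import zlib


def keep_log(
    attributes: dict,
    sampling_percentage: float,
    from_attribute: str,
    sampling_priority: str,
    fail_closed: bool = False,
) -> bool:
    """Decide whether to keep one log record, honoring a priority override."""
    # Override: a positive priority attribute forces the record to be kept.
    priority = attributes.get(sampling_priority)
    if isinstance(priority, (int, float)) and priority > 0:
        return True

    # A record without the hashed attribute cannot be evaluated.
    if from_attribute not in attributes:
        return not fail_closed

    # Otherwise fall back to the hash-based decision (CRC32 is a stand-in).
    value = str(attributes[from_attribute]).encode("utf-8")
    bucket = zlib.crc32(value) % 10_000
    return bucket < sampling_percentage * 100


error_log = {"user.id": "u-123", "force_keep": 1}
normal_log = {"user.id": "u-123"}

print(keep_log(error_log, 5.0, "user.id", "force_keep"))   # kept by the override
print(keep_log(normal_log, 5.0, "user.id", "force_keep"))  # depends on the hash
```

Checking the override before hashing means flagged records survive at any sampling percentage, while unflagged records are still sampled consistently by attribute value.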

Notes

  • Stateless. Memory use does not grow with cardinality, so the processor is cheap and behaves the same on a 100-pod fleet as on a 10,000-pod fleet.
  • Within-trace consistency. When sampling traces, all spans of a trace share the same decision because they share a trace ID. You won't get half-traces.
  • Fleet consistency. Set the same hash_seed on every collector instance so two collectors processing the same trace make the same decision. Critical when traces split across edge collectors and a gateway.
  • Pipeline order. Put this after memory_limiter and before batch, so records are sampled before they enter the batch.
  • Tail sampling vs. probabilistic. Probabilistic is cheap and stateless but blind (you can't sample "all errors"). Use tail sampling when you need policy-based decisions; probabilistic when you need volume reduction.
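The fleet-consistency note can be illustrated with a small simulation of a two-tier pipeline in which an edge collector and a gateway both sample at 10%. This is a sketch only: keyed BLAKE2b is used as a stand-in for the processor's hash, and the seed values are arbitrary. With a shared seed the gateway agrees with every decision the edge already made; with a different seed it samples the survivors a second time.

```python
import hashlib
import random


def keep(trace_id: bytes, sampling_percentage: float, hash_seed: int) -> bool:
    """Seeded, stateless sampling decision (stand-in hash: keyed BLAKE2b)."""
    key = hash_seed.to_bytes(4, "big")
    digest = hashlib.blake2b(trace_id, key=key, digest_size=8).digest()
    position = int.from_bytes(digest, "big") / 2**64  # uniform in [0, 1)
    return position < sampling_percentage / 100.0


rng = random.Random(7)
trace_ids = [rng.getrandbits(128).to_bytes(16, "big") for _ in range(5_000)]

# Tier 1: the edge collector samples at 10% with seed 22.
edge_kept = [t for t in trace_ids if keep(t, 10.0, hash_seed=22)]

# Tier 2: the gateway samples the survivors again at 10%.
same_seed_survivors = [t for t in edge_kept if keep(t, 10.0, hash_seed=22)]
other_seed_survivors = [t for t in edge_kept if keep(t, 10.0, hash_seed=99)]

print("edge kept:", len(edge_kept))
print("gateway, same seed:", len(same_seed_survivors))
print("gateway, different seed:", len(other_seed_survivors))
```

With mismatched seeds the two tiers compound, so far fewer traces survive than the configured 10%, and spans of one trace that reach different collectors can receive different decisions.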