Probabilistic Sampler

Overview

The Probabilistic Sampler processor keeps a configurable percentage of records and drops the rest, deciding for each record by hashing a value taken from it. Unlike tail sampling, the decision is made per record at ingest time, without needing to see the whole trace. This makes it cheap, stateless, and suitable for both of its supported signal types.

For traces, the processor hashes the trace ID so all spans of a trace share the same sampling decision (consistent within-trace sampling). For logs, you can either hash the trace ID (when present) or hash a chosen attribute.
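The sketch below illustrates why hashing the trace ID gives within-trace consistency: the decision is a pure function of the trace ID and the seed, so every span carrying that ID gets the same answer. It is a minimal sketch under stated assumptions, not the processor's code. SHA-256 and the 10,000-bucket mapping are stand-ins chosen for illustration, and `should_keep` and `BUCKETS` are hypothetical names.

```python
import hashlib
import os

BUCKETS = 10_000  # hypothetical resolution: each bucket is 0.01% of traffic


def should_keep(trace_id: bytes, sampling_percentage: float, hash_seed: int = 0) -> bool:
    """Keep/drop decision that depends only on the trace ID and the seed."""
    # Stand-in hash (assumption): SHA-256 over a 4-byte seed followed by the trace ID.
    digest = hashlib.sha256(hash_seed.to_bytes(4, "big") + trace_id).digest()
    bucket = int.from_bytes(digest[:4], "big") % BUCKETS
    # Keep when the bucket falls inside the configured share of buckets.
    return bucket < sampling_percentage * BUCKETS / 100


# Five spans of one trace carry the same 16-byte trace ID ...
trace_id = os.urandom(16)
spans = [{"trace_id": trace_id, "name": f"span-{i}"} for i in range(5)]

# ... so they all receive the same decision: the set holds exactly one value.
decisions = {should_keep(span["trace_id"], 10.0, hash_seed=22) for span in spans}
print(decisions)
```

Because nothing about earlier records is remembered, the same function works for any number of traces without growing state.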

Supported types: Logs · Traces

Configuration

Parameter	Type	Default	Description
`sampling_percentage`	float	—	Percentage of records to keep (0–100). `100` keeps everything; `0` drops everything.
`hash_seed`	uint32	upstream default	Seed for the hash function. Set to a non-zero shared value across collectors so they all sample the same records consistently.
`mode`	string	upstream default	Sampling mode. One of `hash_seed` (legacy), `equalizing`, `proportional`. The newer `equalizing` and `proportional` modes use OTel's standard sampling threshold encoded in the trace state.
`sampling_precision`	int	upstream default	Number of hex digits of precision for `equalizing`/`proportional` mode (1–14). Higher = finer sampling granularity.
`fail_closed`	bool	`false`	When true, records that can't be evaluated (missing trace ID, etc.) are dropped instead of forwarded.
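For the `proportional` and `equalizing` modes, the OpenTelemetry trace-state probability sampling specification expresses a sampling probability as a 56-bit rejection threshold `T`, encoded as the `th` value with trailing zeros removed, and keeps a span when its randomness value `R` (by default the least-significant 56 bits of the trace ID) is at least `T`. The sketch below shows how `sampling_percentage` and `sampling_precision` could map onto that threshold. The function names, the round-to-nearest rule and the clamping of tiny percentages are assumptions; the processor's own rounding may differ.

```python
import math
from fractions import Fraction

MAX_THRESHOLD = 1 << 56  # thresholds and randomness values are 56-bit integers
HEX_DIGITS = 14          # 56 bits = 14 hex digits


def threshold(sampling_percentage: float, sampling_precision: int = 4) -> int:
    """Rejection threshold T, rounded to `sampling_precision` hex digits (assumed rounding)."""
    if not 0 < sampling_percentage <= 100:
        raise ValueError("sampling_percentage must be in (0, 100]")
    if not 1 <= sampling_precision <= HEX_DIGITS:
        raise ValueError("sampling_precision must be in [1, 14]")
    keep_probability = Fraction(sampling_percentage) / 100
    exact = (1 - keep_probability) * MAX_THRESHOLD
    unit = 16 ** (HEX_DIGITS - sampling_precision)  # value of the last retained hex digit
    digits = math.floor(exact / unit + Fraction(1, 2))  # round to nearest
    # Clamp so a tiny non-zero percentage never rounds up to "keep nothing".
    digits = min(digits, MAX_THRESHOLD // unit - 1)
    return digits * unit


def encode_th(t: int) -> str:
    """`th` value: 14 hex digits with trailing zeros removed; '0' means keep everything."""
    return f"{t:014x}".rstrip("0") or "0"


def keep(trace_id_hex: str, t: int) -> bool:
    """Keep when R (the last 56 bits of the trace ID) is at least T."""
    r = int(trace_id_hex[-14:], 16)
    return r >= t


print(encode_th(threshold(10.0, 14)))  # e6666666666666
print(encode_th(threshold(10.0, 4)))   # e666
print(encode_th(threshold(50.0)))      # 8
print(encode_th(threshold(100.0)))     # 0
```

A lower `sampling_precision` gives a shorter `th` string in the trace state at the cost of a coarser approximation of the requested percentage.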

Logs-only fields

Parameter	Type	Default	Description
`attribute_source`	string	upstream default	What to hash. `record` (use `from_attribute`) or `traceID` (use the log record's trace ID).
`from_attribute`	string	—	Required when `attribute_source=record`. Name of the log attribute whose value is hashed.
`sampling_priority`	string	—	Name of an attribute that, when present and >0, forces the record to be kept regardless of the sampling decision. Useful as an "always keep errors" override.
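A rough sketch of how the logs-only fields could combine with `fail_closed` when deciding a single log record. The order of the checks (priority override, then the missing-input check, then the hash), the record layout as a plain dict, the `traceID` default and the SHA-256 bucketing are all assumptions for illustration; the priority rule follows the description in this table rather than the processor's source. `hash_input` and `keep_log` are hypothetical names.

```python
import hashlib
from typing import Optional

BUCKETS = 10_000  # hypothetical resolution: each bucket is 0.01% of traffic


def hash_input(record: dict, attribute_source: str, from_attribute: Optional[str]) -> Optional[bytes]:
    """Bytes to hash for a log record, or None when the record can't be evaluated."""
    if attribute_source == "traceID":
        trace_id = record.get("trace_id")
        # An absent or all-zero trace ID is treated as missing.
        return trace_id if trace_id and any(trace_id) else None
    if attribute_source == "record":
        value = record.get("attributes", {}).get(from_attribute)
        return None if value is None else str(value).encode()
    raise ValueError(f"unknown attribute_source: {attribute_source!r}")


def keep_log(record: dict, sampling_percentage: float, *,
             attribute_source: str = "traceID", from_attribute: Optional[str] = None,
             sampling_priority: Optional[str] = None, fail_closed: bool = False,
             hash_seed: int = 0) -> bool:
    attributes = record.get("attributes", {})
    # Priority override as this page describes it: attribute present and > 0 means keep.
    if sampling_priority is not None and attributes.get(sampling_priority, 0) > 0:
        return True
    data = hash_input(record, attribute_source, from_attribute)
    if data is None:
        # Unevaluable record: dropped when fail_closed is true, forwarded otherwise.
        return not fail_closed
    digest = hashlib.sha256(hash_seed.to_bytes(4, "big") + data).digest()
    bucket = int.from_bytes(digest[:4], "big") % BUCKETS
    return bucket < sampling_percentage * BUCKETS / 100


# All logs for the same user.id share one decision.
log = {"attributes": {"user.id": "u-123", "message": "checkout started"}}
print(keep_log(log, 1.0, attribute_source="record", from_attribute="user.id"))
```

Hashing `user.id` instead of the trace ID means you keep complete log histories for roughly 1% of users rather than scattered records from everyone.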

Example Configurations

Keep 10% of traces

{
  "sampling_percentage": 10.0,
  "hash_seed": 22,
  "mode": "proportional"
}

Keep 1% of logs, but always keep error logs

{
  "sampling_percentage": 1.0,
  "attribute_source": "record",
  "from_attribute": "user.id",
  "sampling_priority": "force_keep"
}

Keep 100% of high-priority logs, sample 5% of the rest

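Set a `force_keep` attribute to 1 upstream (for example, with the Transform processor) on every record you want to retain, and set `sampling_priority` to `force_keep` and `sampling_percentage` to `5.0` on this processor. It then keeps all of those records and samples the rest at 5%.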
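To estimate output volume for this setup, assume forced records bypass sampling and everything else is kept at `sampling_percentage`. The helper below is a back-of-the-envelope calculation under that assumption; `expected_kept` is a hypothetical name.

```python
def expected_kept(total: int, forced_fraction: float, sampling_percentage: float) -> float:
    """Expected kept records when forced records bypass sampling (assumption)."""
    forced = total * forced_fraction
    # Only the non-forced remainder is subject to the sampling percentage.
    sampled = (total - forced) * sampling_percentage / 100
    return forced + sampled


print(expected_kept(1_000_000, 0.02, 5.0))  # 69000.0
```

For example, if 2% of 1,000,000 logs carry `force_keep=1`, about 20,000 forced records plus 5% of the remaining 980,000 (49,000) pass through, roughly 69,000 in total.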

Notes

Stateless. Memory use does not grow with cardinality, so the processor is cheap and behaves the same on a 100-pod fleet as on a 10,000-pod fleet.
Within-trace consistency. When sampling traces, all spans of a trace share the same decision because they share a trace ID. You won't get half-traces.
Fleet consistency. Set the same hash_seed on every collector instance so two collectors processing the same trace make the same decision. This is critical when the spans of a trace are split across edge collectors and a gateway.
Pipeline order. Put this processor after memory_limiter and before batch, so records are sampled before batching and the batch only holds what you keep.
Tail sampling vs. probabilistic. Probabilistic sampling is cheap and stateless but blind to content: it cannot express a rule such as "keep every trace that contains an error". Use tail sampling when you need policy-based decisions and probabilistic sampling when you need plain volume reduction.
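The fleet-consistency note can be checked with a toy model: two collectors that share `hash_seed` make identical decisions, while collectors with different seeds disagree on about 2 × 0.1 × 0.9 = 18% of traces at a 10% rate, so spans of one trace routed through differently seeded collectors can end up partially kept. The `Collector` class and its SHA-256 bucketing are hypothetical stand-ins modelled loosely on the `hash_seed` mode, not the processor's implementation.

```python
import hashlib
import os

BUCKETS = 10_000  # hypothetical resolution: each bucket is 0.01% of traffic


class Collector:
    """Toy stand-in for one collector instance running the sampler."""

    def __init__(self, name: str, hash_seed: int, sampling_percentage: float):
        self.name = name
        self.hash_seed = hash_seed
        self.sampling_percentage = sampling_percentage

    def keeps(self, trace_id: bytes) -> bool:
        # Same stand-in hash on every instance; only the seed can differ.
        digest = hashlib.sha256(self.hash_seed.to_bytes(4, "big") + trace_id).digest()
        bucket = int.from_bytes(digest[:4], "big") % BUCKETS
        return bucket < self.sampling_percentage * BUCKETS / 100


def disagreement_rate(a: Collector, b: Collector, n: int = 10_000) -> float:
    """Fraction of random trace IDs on which two collectors decide differently."""
    ids = [os.urandom(16) for _ in range(n)]
    return sum(a.keeps(t) != b.keeps(t) for t in ids) / n


edge = Collector("edge", hash_seed=22, sampling_percentage=10.0)
gateway_same = Collector("gateway", hash_seed=22, sampling_percentage=10.0)
gateway_other = Collector("gateway", hash_seed=7, sampling_percentage=10.0)

print(disagreement_rate(edge, gateway_same))   # 0.0
print(disagreement_rate(edge, gateway_other))  # roughly 0.18
```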
