Monitors

Monitors help you catch problems before customers do: define metric-based rules in Praxis, run PromQL on your metrics backend on a schedule, compare results to thresholds, and open or update alerts with notifications to the channels your team already uses.

Concepts

| Concept | Description |
| --- | --- |
| Query | A PromQL expression (and optional time range) that returns time series. Monitors use instant or range query modes; range queries support reducing the window to a single value (last, min, max, average). |
| Conditions | Critical (alert) is required; you can add an optional Warning condition with a separate threshold. Each condition has a comparator (e.g. “is above”), a numeric threshold, and an optional pending period so the condition must hold before firing. |
| Evaluation | The monitor runs every evaluation interval (for example 1m, 5m, 15m, 1h). The query’s lookback window defines how much history each evaluation considers. |
| No data / error | Policies control what happens when the query returns no series (No data) or fails (Error): typically Ignore (do not open synthetic incidents) or Alert (surface a no-data or error state). |
| Notifications | Optional notification channels (recipients) and a message template used when the monitor notifies on state changes. See Notifications. |
| Priority | P1 through P5 (for example P1 = Critical) is stored on the monitor and used with triggered alerts. |

Monitor types in the UI

When creating a monitor, you can start from either flow:

  • Metric threshold monitor — for classic “fire when this metric crosses a line” (for example error rate in peak hours).
  • Metric change monitor — for tracking meaningful shifts (for example latency vs a prior period). The same underlying model applies; choose the entry point that matches how you think about the problem.

Query configuration

  • Query type

    • Instant — evaluates the series at a point in time.
    • Range — evaluates over a window; you choose a reduce function (Last, Max, Min, Avg) over that window before comparing to thresholds.
  • Lookback window — how far back to pull data for the query (for example 1m, 5m, 15m, 1h). Must align with your scrape interval and how quickly you need to detect problems.

  • Min step — optional resolution step for range queries (when relevant to your metrics backend).
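
As a rough illustration of the Range mode described above, the sketch below collapses a window of samples into one value before any threshold comparison. The function name `reduce_window`, the `(timestamp, value)` sample shape, and returning `None` for an empty window are assumptions made for this example, not the platform's implementation.

```python
from statistics import fmean
from typing import Callable, Dict, Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (unix timestamp, value); hypothetical shape

# Reduce functions named after the UI options (Last, Max, Min, Avg).
REDUCERS: Dict[str, Callable[[Sequence[float]], float]] = {
    "last": lambda values: values[-1],
    "max": max,
    "min": min,
    "avg": fmean,
}


def reduce_window(samples: Sequence[Sample], reducer: str) -> Optional[float]:
    """Collapse a window of samples into one value, or None if it is empty."""
    if not samples:
        return None
    # Sort by timestamp so that "last" really means the most recent sample.
    values = [value for _, value in sorted(samples)]
    return REDUCERS[reducer](values)


# Samples may arrive out of order; the reducer still picks the newest one.
window = [(60.0, 0.02), (0.0, 0.01), (120.0, 0.09)]
print(reduce_window(window, "last"))
print(reduce_window(window, "min"))
```

The reduce function matters for detection: Max reacts to a single spike anywhere in the window, while Avg smooths short spikes out.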

Queries are validated and executed by the platform’s metrics stack (PromQL-compatible). Use labels that exist on your series (for example node_name, pipeline_id, or destination labels) to scope a monitor to a service or pipeline.
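
To make label scoping concrete, here is a small sketch that builds a PromQL selector from a set of labels. The helper `scoped_selector`, the metric name `pipeline_errors_total`, and the label values are hypothetical; only the label names `pipeline_id` and `node_name` come from the text above.

```python
from typing import Mapping


def scoped_selector(metric: str, labels: Mapping[str, str]) -> str:
    """Build a PromQL selector such as metric{a="1",b="2"}."""

    def escape(value: str) -> str:
        # PromQL double-quoted strings escape backslashes and double quotes.
        return value.replace("\\", "\\\\").replace('"', '\\"')

    matchers = ",".join(
        f'{name}="{escape(str(value))}"' for name, value in sorted(labels.items())
    )
    return f"{metric}{{{matchers}}}" if matchers else metric


# Hypothetical metric and label values, used only for illustration.
selector = scoped_selector(
    "pipeline_errors_total", {"pipeline_id": "orders", "node_name": "edge-1"}
)
query = f"sum(rate({selector}[5m]))"
print(query)
```

Sorting the labels keeps the generated query text stable, which makes monitors easier to compare and review.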

Alert conditions

  • Critical — one condition of type alert (the UI label is “Critical”). Set comparator and threshold.
  • Warning — optional second condition of type warning with its own threshold (same comparator direction for both in the form).

Comparators include: above, below, above or equal to, below or equal to, equal to, not equal to.

Pending duration — the condition must remain true for this duration (for example 0s for immediate, or several minutes) before a new open alert is created, reducing flapping.
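
The interaction between comparator, threshold, and pending duration can be sketched as a small state holder. This is a minimal model under stated assumptions: the `Condition` class, its field names, and the rule that any non-breaching evaluation resets the pending timer are illustrative and may differ from the actual evaluator. The returned states reuse the summary status names normal, pending, and open.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Keys mirror the comparator labels listed above.
COMPARATORS: Dict[str, Callable[[float, float], bool]] = {
    "above": operator.gt,
    "below": operator.lt,
    "above or equal to": operator.ge,
    "below or equal to": operator.le,
    "equal to": operator.eq,
    "not equal to": operator.ne,
}


@dataclass
class Condition:
    comparator: str
    threshold: float
    pending_seconds: float = 0.0
    breached_since: Optional[float] = None  # time the breach started, if any

    def evaluate(self, value: float, now: float) -> str:
        """Return 'normal', 'pending', or 'open' for a single evaluation."""
        if not COMPARATORS[self.comparator](value, self.threshold):
            self.breached_since = None  # assumption: a clean run resets the timer
            return "normal"
        if self.breached_since is None:
            self.breached_since = now
        held_for = now - self.breached_since
        return "open" if held_for >= self.pending_seconds else "pending"


critical = Condition("above", threshold=0.05, pending_seconds=120)
for now, value in [(0, 0.08), (60, 0.09), (120, 0.07), (180, 0.01)]:
    print(now, critical.evaluate(value, now))
```

With a pending duration of 0s the first breaching evaluation opens an alert immediately; a longer duration trades detection speed for fewer flapping alerts.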

No data and query errors

| Policy | no_data_state or error_state value | Typical use |
| --- | --- | --- |
| Ignore | ignore | Do not open a dedicated “no data” or “error” incident; existing open alerts may be preserved per evaluator rules. |
| Alert | alert | Treat missing data or query failures as a first-class problem and surface an alert. |

Empty or legacy values may be normalized to the ignore behavior when the API stores them, so choose Ignore or Alert explicitly in the UI.
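
The policy handling above might look roughly like the following. The function names, the outcome strings, and the rule that anything other than an explicit `alert` falls back to `ignore` are assumptions for illustration; only the `no_data_state` and `error_state` field names and their `ignore` and `alert` values come from the table.

```python
from typing import Optional


def normalize_policy(raw: Optional[str]) -> str:
    """Map a stored no_data_state / error_state value to 'ignore' or 'alert'."""
    value = (raw or "").strip().lower()
    # Assumption: empty, missing, or unrecognized values behave like ignore.
    return "alert" if value == "alert" else "ignore"


def health_outcome(
    health: str, no_data_state: Optional[str], error_state: Optional[str]
) -> str:
    """Decide what one evaluation should do, given rule health.

    health is 'ok', 'no_data', or 'error'; the returned strings are
    hypothetical outcome names used only in this sketch.
    """
    if health == "ok":
        return "evaluate_conditions"
    raw = no_data_state if health == "no_data" else error_state
    return f"alert_{health}" if normalize_policy(raw) == "alert" else "skip"


print(health_outcome("no_data", "alert", "ignore"))
print(health_outcome("error", "alert", None))
```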

Monitors workbench

The Monitors area uses two tabs:

| Tab | Purpose |
| --- | --- |
| Monitors | All alert rules for the tenant: create, edit, delete, and inspect configuration. |
| Triggered | Alert instances produced when conditions fire: open and recent events with live status. |

Use the filters panel to narrow either list. On Monitors, filters typically include priority, health (last evaluation), and summary status. On Triggered, filters include status, severity, health, and priority.

Monitors (rules) list

Each row reflects the last evaluation of that rule:

  • Last health — for example OK, no_data, or error.
  • Summary status — roll-up such as normal, pending, or open.
  • Last evaluated — timestamp of the last run.

Opening a row loads the monitor detail drawer (definition, query, conditions, notifications).

Monitors Triggered

The Triggered tab lists firing and recent alert events, not the rule definitions. Typical columns include:

  • Monitor name — the parent rule; rows tied to no data or query error health may show a badge.
  • Status — alert workflow state (for example open vs resolved); see the lifecycle doc for the full model.
  • Severity — warning vs critical (or equivalent) from the matched condition.
  • Elapsed time — time since the alert triggered (relative).

Selecting a row opens a Triggered alert drawer with:

  • Details — context from the trigger payload, labels, and a metric preview scoped to the alert (where the monitor’s query and labels allow it).
  • History — a timeline of evaluations for that event (status, severity, evaluated value vs threshold, timestamps), including resolution when the condition clears.

For how status, severity, and priority vary independently of one another, see Alert lifecycle and severity below.

Notifications

Notifications are optional. If you configure them, the platform delivers messages when monitor state changes warrant it (according to evaluator and channel behavior).

Channels (recipients)

  • In the monitor editor, under Notifications > Recipients, select one or more notification channels.
  • Channels are endpoints your organization has already defined (for example email, chat, or webhook integrations). You must create and verify those endpoints elsewhere in the product before they appear in the list.
  • Each selection is stored as a channel_id on the monitor. You can enable or disable individual channel bindings per monitor when the API supports it.

Channel types

| Channel | Fields | Notes |
| --- | --- | --- |
| Webhook | webhook_url (HTTPS), optional headers | Praxis posts the alert JSON to your endpoint. The send_resolved toggle was removed in v0.3; every monitor event (firing and resolved) is delivered, so consumers should switch on the event status if they only want one direction. |
| RIC | webhook_url (HTTPS), optional username, optional password | Webhook variant for Resilient Intelligent Connector receivers. With both a username and password the request uses HTTP Basic auth; with only a password (or bearer token) it is sent as Authorization: Bearer <value>; with neither, the request is unauthenticated. |
| Email / chat integrations | provider-specific | Configured per channel type in the channels area. |
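
The RIC authentication rules can be expressed as a short header-selection function. This is a sketch of the documented behavior, not the delivery code itself; the function name `ric_auth_headers` is hypothetical, and treating a username without a password as unauthenticated is an assumption, since that case is not described above.

```python
import base64
from typing import Dict, Optional


def ric_auth_headers(
    username: Optional[str] = None, password: Optional[str] = None
) -> Dict[str, str]:
    """Pick the Authorization header for a RIC channel."""
    if username and password:
        # HTTP Basic: base64 of "username:password".
        raw = f"{username}:{password}".encode("utf-8")
        token = base64.b64encode(raw).decode("ascii")
        return {"Authorization": f"Basic {token}"}
    if password:
        # Password alone is treated as a bearer token.
        return {"Authorization": f"Bearer {password}"}
    # Neither set (or, by assumption, username only): unauthenticated request.
    return {}


print(ric_auth_headers("alice", "s3cret"))
print(ric_auth_headers(password="tok_123"))
print(ric_auth_headers())
```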

Message template

  • The message template is the body sent (or rendered) for notification deliveries. The UI provides a rich text editor so you can format content beyond plain text.
  • Templates are stored with the monitor (for example HTML or structured content from the editor). Keep templates focused on human-readable context; the delivery layer may merge additional fields automatically (see below).

Delivery and payload

  • A notification worker processes outbound jobs asynchronously: it retries transient failures up to configured limits, then marks jobs failed.
  • When the app base URL is configured, deliveries can include helpful deep links added to the payload, such as:
    • alert_url — link to the alert in the Monitors UI (/monitors/alerts/<event_id>).
    • pipeline_url — if the event carries a pipeline_id label, a link to that pipeline’s view may be included.
    • alert_id — the monitor event id inside the nested event object for templates and providers that expect it.

Exact fields depend on the channel type (provider) and the event payload; webhook-style channels receive a JSON body built from the monitor event plus these enrichments.
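
Because every firing and resolved event is delivered to webhook channels, a consumer that wants only one direction has to filter on the event status itself. The sketch below shows one way to do that. The payload shape is an assumption: it places `alert_url` and `pipeline_url` at the top level and `alert_id` and `status` inside the nested `event` object, with `firing` and `resolved` as status values; check a real delivery before relying on these names.

```python
import json
from typing import Optional


def summarize_delivery(body: str, wanted_status: str = "firing") -> Optional[str]:
    """Return a one-line summary for wanted events, or None to skip the event."""
    payload = json.loads(body)
    event = payload.get("event") or {}
    # Assumed field: event.status carries "firing" or "resolved".
    if event.get("status") != wanted_status:
        return None
    parts = [f"alert {event.get('alert_id', 'unknown')} is {wanted_status}"]
    # Deep links are only present when the app base URL is configured.
    for key in ("alert_url", "pipeline_url"):
        if payload.get(key):
            parts.append(f"{key}={payload[key]}")
    return " ".join(parts)


# Hypothetical delivery body, built by hand for the example.
sample = json.dumps(
    {
        "alert_url": "https://praxis.example.com/monitors/alerts/evt-1",
        "event": {"alert_id": "evt-1", "status": "firing"},
    }
)
print(summarize_delivery(sample))
```

Reading the optional links with `get` keeps the consumer working when the base URL is not configured or the event has no `pipeline_id` label.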

Alert lifecycle and severity

The alerting model has two layers that move independently:

  • Rule health — whether the monitor can evaluate (Ok, No data, or Error). New alerts are only created when health is Ok; No data and Error stop firing until the rule recovers.
  • Alert record — once a healthy rule matches its condition, the platform tracks status (for example open, acknowledged, resolved), severity (for example warning vs critical, re-evaluated on each run), and priority (typically fixed for the lifetime of that alert).
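
The two layers can be modeled as a gate (rule health) in front of a record whose fields change at different rates. The sketch below is a simplified model: `AlertRecord`, `apply_evaluation`, and the choice to leave an existing alert untouched while the rule is unhealthy are assumptions for illustration, not the evaluator's actual logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRecord:
    status: str    # "open", "acknowledged", or "resolved"
    severity: str  # "warning" or "critical"; refreshed on each healthy run
    priority: str  # for example "P1"; fixed when the alert is created


def apply_evaluation(
    alert: Optional[AlertRecord],
    health: str,
    matched_severity: Optional[str],
    monitor_priority: str,
) -> Optional[AlertRecord]:
    """Apply one evaluation result to the current alert record, if any."""
    if alert is None or alert.status == "resolved":
        # New alerts are only created when rule health is ok.
        if health == "ok" and matched_severity is not None:
            return AlertRecord("open", matched_severity, monitor_priority)
        return alert
    if health != "ok":
        return alert  # assumption: an unhealthy rule leaves the alert as-is
    if matched_severity is None:
        alert.status = "resolved"  # the condition cleared
    else:
        alert.severity = matched_severity  # status and priority stay unchanged
    return alert


alert = apply_evaluation(None, "ok", "warning", "P2")
alert.status = "acknowledged"  # a user acknowledges the alert
alert = apply_evaluation(alert, "ok", "critical", "P1")
print(alert)
```

In this run the severity escalates from warning to critical while the status stays acknowledged and the priority stays P2, which is the independence the section describes.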

For status, severity, and how evaluations appear in the UI, use the Triggered alert drawer (Details and History) described above.

See also

  • Integrations catalog — pipeline sources, destinations, and processors that emit the metrics you monitor