Monitors
Monitors help you catch problems before customers do. You define metric-based rules in Praxis; each rule runs a PromQL query against your metrics backend on a schedule, compares the results to thresholds, and opens or updates alerts, sending notifications to the channels your team already uses.
Concepts
| Concept | Description |
|---|---|
| Query | A PromQL expression (and optional time range) that returns time series. Monitors use instant or range query modes; range queries support reducing the window to a single value (last, min, max, average). |
| Conditions | Critical (alert) is required; you can add an optional Warning condition with a separate threshold. Each condition has a comparator (e.g. “is above”), a numeric threshold, and an optional pending period so the condition must hold before firing. |
| Evaluation | The monitor runs every evaluation interval (for example 1m, 5m, 15m, 1h). The query’s lookback window defines how much history each evaluation considers. |
| No data / error | Policies control what happens when the query returns no series (No data) or fails (Error): typically Ignore (do not open synthetic incidents) or Alert (surface a no-data or error state). |
| Notifications | Optional notification channels (recipients) and a message template used when the monitor notifies on state changes. See Notifications. |
| Priority | A priority from P1 through P5 (for example P1 = Critical) is stored on the monitor and applied to the alerts it triggers. |
Monitor types in the UI
When creating a monitor, you can start from either flow:
- Metric threshold monitor: for classic “fire when this metric crosses a line” (for example error rate in peak hours).
- Metric change monitor: for tracking meaningful shifts (for example latency vs a prior period).
The same underlying model applies to both; choose the entry point that matches how you think about the problem.
Query configuration
- Query type
  - Instant: evaluates the series at a point in time.
  - Range: evaluates over a window; you choose a reduce function (Last, Max, Min, Avg) over that window before comparing to thresholds.
- Lookback window: how far back to pull data for the query (for example 1m, 5m, 15m, 1h). Must align with your scrape interval and how quickly you need to detect problems.
- Min step: optional resolution step for range queries (when relevant to your metrics backend).
Queries are validated and executed by the platform’s metrics stack (PromQL-compatible). Use labels that exist on your series (for example node_name, pipeline_id, or destination labels) to scope a monitor to a service or pipeline.
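
The reduce step for range queries can be pictured with a minimal sketch. It assumes the window has already been fetched as a plain list of sample values; `REDUCERS` and `reduce_window` are hypothetical names for illustration, not part of the Praxis API.

```python
from statistics import fmean
from typing import Callable, Dict, Optional, Sequence

# Hypothetical reducer table; the keys mirror the UI labels Last, Max, Min, Avg.
REDUCERS: Dict[str, Callable[[Sequence[float]], float]] = {
    "last": lambda values: values[-1],
    "max": max,
    "min": min,
    "avg": fmean,
}


def reduce_window(values: Sequence[float], how: str) -> Optional[float]:
    """Collapse the samples of one series in the lookback window to one value.

    Returns None for an empty window so the caller can apply its no-data policy.
    """
    if not values:
        return None
    return REDUCERS[how.lower()](values)
```

With samples `[1.0, 4.0, 1.0]`, Last and Min both give 1.0, Max gives 4.0, and Avg gives 2.0, so the same window can sit on either side of a threshold of 3.0 depending on the reducer you pick.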
Alert conditions
- Critical: one condition of type alert (the UI label is “Critical”). Set comparator and threshold.
- Warning: optional second condition of type warning with its own threshold (same comparator direction for both in the form).
Comparators include: above, below, above or equal to, below or equal to, equal to, not equal to.
Pending duration — the condition must remain true for this duration (for example 0s for immediate, or several minutes) before a new open alert is created, reducing flapping.
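
As a rough illustration of how a comparator, threshold, and pending duration interact, the sketch below tracks when a condition first became true and reports it as firing only once the pending duration has elapsed. The class, the comparator keys, and the state names are assumptions made for this example; the real evaluator may differ.

```python
import operator
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical keys for the six comparators listed above.
COMPARATORS = {
    "above": operator.gt,
    "below": operator.lt,
    "above_or_equal": operator.ge,
    "below_or_equal": operator.le,
    "equal": operator.eq,
    "not_equal": operator.ne,
}


@dataclass
class Condition:
    comparator: str
    threshold: float
    pending: timedelta = timedelta(0)
    breached_since: Optional[datetime] = None  # start of the current breach

    def evaluate(self, value: float, now: datetime) -> str:
        """Return "normal", "pending", or "firing" for one evaluation."""
        if not COMPARATORS[self.comparator](value, self.threshold):
            self.breached_since = None  # condition cleared, reset the timer
            return "normal"
        if self.breached_since is None:
            self.breached_since = now
        if now - self.breached_since >= self.pending:
            return "firing"
        return "pending"
```

With a pending duration of 0s the first breaching evaluation fires immediately; with a longer duration, a single spike that clears before the timer runs out never opens an alert.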
No data and query errors
| Policy | no_data_state or error_state value | Typical use |
|---|---|---|
| Ignore | ignore | Do not open a dedicated “no data” or “error” incident; existing open alerts may be preserved per evaluator rules. |
| Alert | alert | Treat missing data or query failures as a first-class problem and surface an alert. |
Empty or legacy values may be normalized to the same behavior as ignore, matching what the API stores. Prefer choosing Ignore or Alert explicitly in the UI.
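
The policy handling above can be sketched as follows. The field names `no_data_state` and `error_state` come from the table; the function names, the health strings, and the fallback for unrecognized values are assumptions for illustration only.

```python
from typing import Optional

VALID_POLICIES = {"ignore", "alert"}


def normalize_policy(raw: Optional[str]) -> str:
    """Map a stored no_data_state / error_state value to "ignore" or "alert".

    Assumption: empty or unrecognized values fall back to "ignore".
    """
    value = (raw or "").strip().lower()
    return value if value in VALID_POLICIES else "ignore"


def health_outcome(health: str,
                   no_data_state: Optional[str],
                   error_state: Optional[str]) -> str:
    """Pick the policy that applies to one evaluation's health."""
    if health == "no_data":
        return normalize_policy(no_data_state)
    if health == "error":
        return normalize_policy(error_state)
    return "evaluate"  # healthy: compare the value to thresholds as usual
```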
Monitors workbench
The Monitors area uses two tabs:
| Tab | Purpose |
|---|---|
| Monitors | All alert rules for the tenant—create, edit, delete, and inspect configuration. |
| Triggered | Alert instances produced when conditions fire: open and recent events with live status. |
Use the filters panel to narrow either list. On Monitors, filters typically include priority, health (last evaluation), and summary status. On Triggered, filters include status, severity, health, and priority.
Monitors (rules) list
Each row reflects the last evaluation of that rule:
- Last health — for example OK, no_data, or error.
- Summary status — roll-up such as normal, pending, or open.
- Last evaluated — timestamp of the last run.
Opening a row loads the monitor detail drawer (definition, query, conditions, notifications).
Monitors Triggered
The Triggered tab lists firing and recent alert events, not the rule definitions. Typical columns include:
- Monitor name — the parent rule; rows tied to no data or query error health may show a badge.
- Status — alert workflow state (for example open vs resolved); see the lifecycle doc for the full model.
- Severity — warning vs critical (or equivalent) from the matched condition.
- Elapsed time — time since the alert triggered (relative).
Selecting a row opens a Triggered alert drawer with:
- Details — context from the trigger payload, labels, and a metric preview scoped to the alert (where the monitor’s query and labels allow it).
- History — a timeline of evaluations for that event (status, severity, evaluated value vs threshold, timestamps), including resolution when the condition clears.
Status, severity, and priority are independent of one another; see Alert lifecycle and severity below for how each behaves.
Notifications
Notifications are optional. If you configure them, the platform delivers messages when monitor state changes warrant it (according to evaluator and channel behavior).
Channels (recipients)
- In the monitor editor, under Notifications → Recipients, select one or more notification channels.
- Channels are endpoints your organization has already defined (for example email, chat, or webhook integrations). You must create and verify those endpoints elsewhere in the product before they appear in the list.
- Each selection is stored as a channel_id on the monitor. You can enable or disable individual channel bindings per monitor when the API supports it.
Channel types
| Channel | Fields | Notes |
|---|---|---|
| Webhook | webhook_url (HTTPS), optional headers | Praxis posts the alert JSON to your endpoint. The send_resolved toggle was removed in v0.3 — every monitor event (firing and resolved) is delivered; consumers should switch on the event status if they only want one direction. |
| RIC | webhook_url (HTTPS), optional username, optional password | Webhook variant for Resilient Intelligent Connector receivers. With both a username and password the request uses HTTP Basic auth; with only a password (or bearer token) it is sent as Authorization: Bearer <value>; with neither, the request is unauthenticated. |
| Email / chat integrations | provider-specific | Configured per channel type in the channels area. |
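
The RIC authentication rules in the table can be illustrated with a short sketch. `ric_auth_headers` is a hypothetical helper, not a Praxis function, and treating a username without a password as unauthenticated is an assumption, since the table does not cover that case.

```python
import base64
from typing import Dict, Optional


def ric_auth_headers(username: Optional[str] = None,
                     password: Optional[str] = None) -> Dict[str, str]:
    """Choose the Authorization header for a RIC webhook request."""
    if username and password:
        # Both present: HTTP Basic auth over "username:password".
        raw = f"{username}:{password}".encode("utf-8")
        token = base64.b64encode(raw).decode("ascii")
        return {"Authorization": f"Basic {token}"}
    if password:
        # Password (or bearer token) only: sent as a bearer token.
        return {"Authorization": f"Bearer {password}"}
    # Neither (or, by assumption, username only): no Authorization header.
    return {}
```

A receiver can use the same three cases to decide which scheme to expect for a given channel configuration.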
Message template
- The message template is the body sent (or rendered) for notification deliveries. The UI provides a rich text editor so you can format content beyond plain text.
- Templates are stored with the monitor (for example HTML or structured content from the editor). Keep templates focused on human-readable context; the delivery layer may merge additional fields automatically (see below).
Delivery and payload
- A notification worker processes outbound jobs asynchronously: it retries transient failures up to configured limits, then marks jobs failed.
- When the app base URL is configured, deliveries can include helpful deep links added to the payload, such as:
  - alert_url: link to the alert in the Monitors UI (/monitors/alerts/<event_id>).
  - pipeline_url: if the event carries a pipeline_id label, a link to that pipeline’s view may be included.
  - alert_id: the monitor event id inside the nested event object for templates and providers that expect it.
Exact fields depend on the channel type (provider) and the event payload; webhook-style channels receive a JSON body built from the monitor event plus these enrichments.
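
To make the enrichment step concrete, here is a minimal sketch of how a webhook body might be assembled. Only the `/monitors/alerts/<event_id>` path and the field names `alert_url`, `pipeline_url`, `alert_id`, and `pipeline_id` come from the text above. The event keys `id` and `labels`, the `/pipelines/<pipeline_id>` path, and the function itself are hypothetical.

```python
from typing import Any, Dict, Optional


def enrich_event(event: Dict[str, Any], base_url: Optional[str]) -> Dict[str, Any]:
    """Wrap a monitor event in a payload and add deep links when possible."""
    payload: Dict[str, Any] = {"event": dict(event)}  # copy, leave the input intact
    if not base_url:
        return payload  # no app base URL configured: no enrichments

    root = base_url.rstrip("/")
    event_id = event["id"]  # assumed key for the monitor event id
    payload["event"]["alert_id"] = event_id
    payload["alert_url"] = f"{root}/monitors/alerts/{event_id}"

    pipeline_id = event.get("labels", {}).get("pipeline_id")
    if pipeline_id:
        # Hypothetical path; the real pipeline view URL may differ.
        payload["pipeline_url"] = f"{root}/pipelines/{pipeline_id}"
    return payload
```

Consumers should treat the enrichment fields as optional, since they depend on the base URL being configured and on the labels the event carries.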
Alert lifecycle and severity
The alerting model has two layers that move independently:
- Rule health — whether the monitor can evaluate (Ok, No data, or Error). New alerts are only created when health is Ok; No data and Error stop firing until the rule recovers.
- Alert record — once a healthy rule matches its condition, the platform tracks status (for example open, acknowledged, resolved), severity (for example warning vs critical, re-evaluated on each run), and priority (typically fixed for the lifetime of that alert).
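
The two layers can be sketched as a single evaluation step. This is an illustrative model only: the `Alert` class, the `apply_evaluation` function, and the choice to leave an existing alert untouched while the rule is unhealthy are assumptions, not the platform's evaluator.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    priority: str          # copied from the monitor when the alert opens, then fixed
    severity: str          # "warning" or "critical", re-evaluated on each run
    status: str = "open"   # for example open, acknowledged, resolved


def apply_evaluation(alert: Optional[Alert],
                     health: str,
                     matched: Optional[str],
                     monitor_priority: str) -> Optional[Alert]:
    """Apply one evaluation; `matched` is the severity of the matched condition."""
    if health != "ok":
        # Unhealthy rule: no new alerts; an existing alert is left as-is (assumption).
        return alert
    if matched is None:
        if alert is not None:
            alert.status = "resolved"  # condition cleared
        return alert
    if alert is None or alert.status == "resolved":
        return Alert(priority=monitor_priority, severity=matched)
    alert.severity = matched  # severity follows the latest run; priority stays put
    return alert
```

In this model an alert opened at warning can escalate to critical on a later run while keeping the priority it was opened with, even if the monitor's priority has since been edited.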
For status, severity, and how evaluations appear in the UI, use the Triggered alert drawer (Details and History) described above.
See also
- Integrations catalog — pipeline sources, destinations, and processors that emit the metrics you monitor