AI agent monitoring

AI agent monitoring checks whether agents are available, correct, and fast enough, and whether they operate within expected cost and error thresholds.

Audience: AI engineers, product engineers, platform teams

Two monitoring modes

Synthetic agent checks send a prompt to an agent endpoint and validate whether the response meets expectations.
Runtime agent telemetry records production runs, tool spans, latency, cost, token usage, status, and errors.
Use both modes together: synthetic checks catch quality problems on a schedule, and runtime telemetry shows how agents behave in real production traffic.
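
To make the runtime telemetry fields concrete, here is a minimal sketch of what a recorded run could look like. The class names, field names, and units (ToolSpan, AgentRun, latency_ms, cost_usd) are assumptions chosen for illustration, not the product's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToolSpan:
    """One tool call made during an agent run (hypothetical shape)."""
    name: str
    duration_ms: float
    error: Optional[str] = None


@dataclass
class AgentRun:
    """One production run with the fields listed above (hypothetical shape)."""
    agent: str
    status: str            # assumed values: "ok" or "error"
    latency_ms: float
    input_tokens: int
    output_tokens: int
    cost_usd: float
    tool_spans: List[ToolSpan] = field(default_factory=list)
    error: Optional[str] = None


def slowest_tool_span(run: AgentRun) -> Optional[ToolSpan]:
    """Return the longest tool span, or None if the run made no tool calls."""
    return max(run.tool_spans, key=lambda span: span.duration_ms, default=None)


def tool_time_share(run: AgentRun) -> float:
    """Fraction of total run latency spent inside tool calls."""
    if run.latency_ms <= 0:
        return 0.0
    return sum(span.duration_ms for span in run.tool_spans) / run.latency_ms
```

A record shaped like this is enough to answer the question "which tool call made this run slow?", which a synthetic check alone cannot answer.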

Synthetic agent check fields

Prompt: the test input sent to the agent.
Expectation: the outcome the response should satisfy.
Body template: optional request body used for agent endpoint calls.
Response path: optional path to extract a response field from JSON.
Authorization header: optional auth header for protected agent endpoints.
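
The fields above can be sketched as a small check definition with helpers that build the request and evaluate the response. This is an illustration under stated assumptions, not the product's implementation: the {{prompt}} placeholder syntax, the dotted response path format, and the substring-based expectation are all hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class SyntheticCheck:
    prompt: str
    expectation: str                                  # assumed: text the answer must contain
    body_template: str = '{"input": "{{prompt}}"}'    # assumed placeholder syntax
    response_path: Optional[str] = None               # assumed dotted path, e.g. "output.text"
    authorization_header: Optional[str] = None


def build_request(check: SyntheticCheck) -> Tuple[Dict[str, str], str]:
    """Render the body template and headers for the agent endpoint call."""
    # json.dumps escapes quotes and newlines; drop the outer quotes
    # so the prompt can sit inside the template's own string literal.
    escaped_prompt = json.dumps(check.prompt)[1:-1]
    body = check.body_template.replace("{{prompt}}", escaped_prompt)
    headers = {"Content-Type": "application/json"}
    if check.authorization_header:
        headers["Authorization"] = check.authorization_header
    return headers, body


def extract(payload: Any, path: Optional[str]) -> Any:
    """Walk a dotted path through nested dicts and lists."""
    if not path:
        return payload
    current = payload
    for key in path.split("."):
        current = current[int(key)] if isinstance(current, list) else current[key]
    return current


def evaluate(check: SyntheticCheck, raw_response: str) -> bool:
    """Return True if the extracted answer contains the expected text."""
    answer = extract(json.loads(raw_response), check.response_path)
    return check.expectation.lower() in str(answer).lower()
```

Sending the request is left out on purpose: any HTTP client can post the rendered body and headers, and evaluate only needs the raw response text.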

Agent alerting

Create threshold rules for error rate or cost.
Scope rules to a single agent or all agents.
Choose a rolling time window and route breaches to alert channels.
Recovery notifications indicate when the agent's error rate or cost returns below the threshold.
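
The rule behavior described above can be sketched as a rolling-window error-rate rule that emits one event on breach and one on recovery. The class name, the "breach" and "recovery" event strings, and the choice to treat a rate at or above the threshold as a breach are assumptions for illustration. Timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional, Tuple


@dataclass
class ErrorRateRule:
    threshold: float               # e.g. 0.2 means 20% of runs in the window failed
    window_seconds: float          # length of the rolling window
    agent: Optional[str] = None    # None scopes the rule to all agents
    _runs: Deque[Tuple[float, bool]] = field(default_factory=deque, init=False, repr=False)
    _breached: bool = field(default=False, init=False, repr=False)

    def record(self, timestamp: float, agent: str, failed: bool) -> Optional[str]:
        """Record one run; return "breach", "recovery", or None."""
        if self.agent is not None and agent != self.agent:
            return None  # run belongs to an agent outside this rule's scope

        self._runs.append((timestamp, failed))
        # Drop runs that have fallen out of the rolling window.
        cutoff = timestamp - self.window_seconds
        while self._runs and self._runs[0][0] <= cutoff:
            self._runs.popleft()

        failures = sum(1 for _, run_failed in self._runs if run_failed)
        rate = failures / len(self._runs)

        if rate >= self.threshold and not self._breached:
            self._breached = True
            return "breach"
        if rate < self.threshold and self._breached:
            self._breached = False
            return "recovery"
        return None
```

Tracking the breached state means a sustained problem produces one alert rather than one per run, and the recovery event fires only once the rate is back below the threshold. A cost rule could follow the same pattern by summing cost per window instead of counting failures.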

Common use cases

Detect when an AI support agent starts giving incomplete answers.
Detect cost spikes from tool loops, prompt regressions, or model changes.
Trace slow tool calls inside an agent workflow.
Separate endpoint availability from answer correctness.

Related documentation

Telemetry

Ingest OTLP JSON logs, metrics, and traces, and inspect trace detail inside AImonitoring.

Monitors

Create HTTP, TCP, ping, heartbeat, and AI-agent synthetic monitors with thresholds and regions.

Alert channels

Configure email, Slack, webhook, SMS, and WhatsApp delivery targets and test alert delivery.

Incidents

Acknowledge, investigate, route, resolve, and review service incidents.