AI operations
AI agent monitoring
AI-agent monitoring checks whether agents are available, return correct answers, respond fast enough, and stay within expected cost and error thresholds.
Two monitoring modes
- Synthetic agent checks send a prompt to an agent endpoint and validate whether the response meets expectations.
- Runtime agent telemetry records production runs, tool spans, latency, cost, token usage, status, and errors.
- Use both modes together: synthetic checks catch quality problems on a schedule, and runtime telemetry shows how the agent behaves under real production traffic.
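To make the runtime telemetry fields concrete, here is a minimal sketch of what one recorded run might look like and how a batch of runs could be summarized. The class names (AgentRun, ToolSpan), field names, and units are assumptions chosen for illustration, not the product's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToolSpan:
    """One tool call made during an agent run (hypothetical shape)."""
    name: str
    duration_ms: float
    error: Optional[str] = None


@dataclass
class AgentRun:
    """One production run of an agent (hypothetical shape)."""
    agent: str
    status: str  # assumed values: "ok" or "error"
    latency_ms: float
    cost_usd: float
    input_tokens: int
    output_tokens: int
    tool_spans: List[ToolSpan] = field(default_factory=list)
    error: Optional[str] = None


def slowest_span(run: AgentRun) -> Optional[ToolSpan]:
    """Return the tool span with the longest duration, or None if there are no spans."""
    return max(run.tool_spans, key=lambda span: span.duration_ms, default=None)


def summarize(runs: List[AgentRun]) -> dict:
    """Aggregate run count, error rate, cost, and token usage over a list of runs."""
    total = len(runs)
    errors = sum(1 for run in runs if run.status == "error")
    return {
        "runs": total,
        "error_rate": errors / total if total else 0.0,
        "total_cost_usd": sum(run.cost_usd for run in runs),
        "total_tokens": sum(run.input_tokens + run.output_tokens for run in runs),
    }
```

Keeping tool spans nested inside the run makes it straightforward to go from a slow run to the specific tool call that caused the delay.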
Synthetic agent check fields
- Prompt: the test input sent to the agent.
- Expectation: the outcome the response should satisfy.
- Body template: optional request body used for agent endpoint calls.
- Response path: optional path that selects which field of a JSON response to validate.
- Authorization header: optional auth header for protected agent endpoints.
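The fields above could fit together roughly as in the following sketch. Several details are assumptions made for illustration only: the {{prompt}} placeholder syntax in the body template, the dot-separated response path (with numeric segments indexing into lists), the default request body, and the treatment of the expectation as a case-insensitive substring match. The product may evaluate expectations differently.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class SyntheticCheck:
    prompt: str
    expectation: str                      # assumed: a phrase the answer must contain
    body_template: Optional[str] = None   # assumed placeholder syntax: {{prompt}}
    response_path: Optional[str] = None   # assumed dot-separated, e.g. "choices.0.text"
    authorization: Optional[str] = None   # value for the Authorization header


def build_request(check: SyntheticCheck) -> Tuple[Dict[str, str], str]:
    """Return (headers, body) for the call to the agent endpoint."""
    headers = {"Content-Type": "application/json"}
    if check.authorization:
        headers["Authorization"] = check.authorization
    if check.body_template:
        # JSON-escape the prompt, then drop the outer quotes so it can sit
        # inside the string literal already present in the template.
        escaped = json.dumps(check.prompt)[1:-1]
        body = check.body_template.replace("{{prompt}}", escaped)
    else:
        # Hypothetical default body when no template is configured.
        body = json.dumps({"prompt": check.prompt})
    return headers, body


def extract(payload: Any, path: Optional[str]) -> Any:
    """Walk a dot-separated path through parsed JSON; no path returns the whole payload."""
    if not path:
        return payload
    current = payload
    for key in path.split("."):
        current = current[int(key)] if isinstance(current, list) else current[key]
    return current


def evaluate_response(check: SyntheticCheck, response_text: str) -> bool:
    """Return True when the extracted answer satisfies the expectation."""
    try:
        answer = extract(json.loads(response_text), check.response_path)
    except (ValueError, KeyError, IndexError, TypeError):
        # Invalid JSON or a path that does not resolve counts as a failed check.
        return False
    return check.expectation.lower() in str(answer).lower()
```

Separating request construction from response evaluation keeps the sketch independent of any particular HTTP client.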
Agent alerting
- Create threshold rules for error rate or cost.
- Scope rules to a single agent or all agents.
- Choose a rolling time window and route breaches to alert channels.
- Recovery notifications indicate when the agent's error rate or cost returns below the threshold.
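The rule behavior described above can be sketched as follows. This is an illustrative model rather than the product's implementation: the Run and Rule shapes, the metric names "error_rate" and "cost", the choice to sum cost over the window, and the strict greater-than comparison are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    agent: str
    timestamp: float  # seconds since some epoch
    cost: float
    error: bool


@dataclass
class Rule:
    metric: str                   # assumed values: "error_rate" or "cost"
    threshold: float
    window_seconds: float
    agent: Optional[str] = None   # None scopes the rule to all agents
    breached: bool = False        # state remembered between evaluations


def metric_value(rule: Rule, runs: List[Run], now: float) -> float:
    """Compute the rule's metric over runs inside the rolling window ending at now."""
    in_window = [
        run for run in runs
        if now - rule.window_seconds < run.timestamp <= now
        and (rule.agent is None or run.agent == rule.agent)
    ]
    if rule.metric == "cost":
        return sum(run.cost for run in in_window)
    if rule.metric == "error_rate":
        if not in_window:
            return 0.0
        return sum(1 for run in in_window if run.error) / len(in_window)
    raise ValueError(f"unknown metric: {rule.metric}")


def evaluate_rule(rule: Rule, runs: List[Run], now: float) -> Optional[str]:
    """Return "breach" or "recovery" on a state change, otherwise None."""
    over = metric_value(rule, runs, now) > rule.threshold
    event = None
    if over and not rule.breached:
        event = "breach"
    elif not over and rule.breached:
        event = "recovery"
    rule.breached = over
    return event
```

Tracking the breached state on the rule means a notification is produced only when the state changes, so a sustained breach does not re-alert on every evaluation.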
Common use cases
- Detect when an AI support agent starts giving incomplete answers.
- Detect cost spikes from tool loops, prompt regressions, or model changes.
- Trace slow tool calls inside an agent workflow.
- Separate endpoint availability from answer correctness.
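As a small illustration of the last use case, a check result can be classified so that an unreachable endpoint is reported differently from a reachable endpoint that gives a wrong answer. The CheckResult names, the 2xx status rule, and the substring expectation are assumptions made for this sketch.

```python
from enum import Enum
from typing import Optional


class CheckResult(Enum):
    UNAVAILABLE = "unavailable"  # no usable response from the endpoint
    INCORRECT = "incorrect"      # endpoint responded, answer failed the expectation
    PASSED = "passed"


def classify(status_code: Optional[int], answer: Optional[str], expected_phrase: str) -> CheckResult:
    """Classify one synthetic check; status_code is None when the request never completed."""
    if status_code is None or not (200 <= status_code < 300) or answer is None:
        return CheckResult.UNAVAILABLE
    if expected_phrase.lower() not in answer.lower():
        return CheckResult.INCORRECT
    return CheckResult.PASSED
```

Reporting the two failure kinds separately lets an availability problem and an answer-quality regression be routed and investigated independently.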
Related documentation
Telemetry
Ingest OTLP JSON logs, metrics, and traces, and inspect trace detail inside AImonitoring.
Monitors
Create HTTP, TCP, ping, heartbeat, and AI-agent synthetic monitors with thresholds and regions.
Alert channels
Configure email, Slack, webhook, SMS, and WhatsApp delivery targets and test alert delivery.
Incidents
Acknowledge, investigate, route, resolve, and review service incidents.