AI agent monitoring

AI agent monitoring checks whether agents are available, answer correctly, respond fast enough, and stay within expected cost and error thresholds.

Audience: AI engineers, product engineers, platform teams

Two monitoring modes

  • Synthetic agent checks send a prompt to an agent endpoint and validate whether the response meets expectations.
  • Runtime agent telemetry records production runs, tool spans, latency, cost, token usage, status, and errors.
  • Use both modes together: synthetic checks surface quality problems on a schedule, and runtime telemetry captures real production behavior.
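
To make the runtime telemetry fields concrete, here is a minimal sketch of what one recorded run might look like and how a set of runs could be summarized. The class names, field names, and status values ("ok", "error") are assumptions for illustration, not the product's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToolSpan:
    """One tool call made during an agent run (hypothetical shape)."""
    tool_name: str
    duration_ms: float
    error: Optional[str] = None


@dataclass
class AgentRun:
    """One production run of an agent (hypothetical shape)."""
    agent: str
    status: str            # assumed values: "ok" or "error"
    latency_ms: float
    cost_usd: float
    input_tokens: int
    output_tokens: int
    tool_spans: List[ToolSpan] = field(default_factory=list)
    error: Optional[str] = None


def summarize(runs: List[AgentRun]) -> dict:
    """Aggregate error rate, cost, and token usage over a list of runs."""
    total = len(runs)
    errors = sum(1 for r in runs if r.status == "error")
    return {
        "runs": total,
        "error_rate": errors / total if total else 0.0,
        "total_cost_usd": sum(r.cost_usd for r in runs),
        "total_tokens": sum(r.input_tokens + r.output_tokens for r in runs),
    }


def slowest_tool_span(run: AgentRun) -> Optional[ToolSpan]:
    """Return the longest-running tool span in a run, or None if there are none."""
    return max(run.tool_spans, key=lambda s: s.duration_ms, default=None)
```

Keeping tool spans nested inside the run makes it straightforward to attribute a slow or failed run to a specific tool call.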

Synthetic agent check fields

  • Prompt: the test input sent to the agent.
  • Expectation: the outcome the response should satisfy.
  • Body template: optional request body used for agent endpoint calls.
  • Response path: optional path to extract a response field from JSON.
  • Authorization header: optional auth header for protected agent endpoints.
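
The fields above could fit together roughly as follows. This is a sketch under stated assumptions: the "{{prompt}}" placeholder, the dotted response-path syntax (for example "output.0.text"), and the substring-based expectation are all hypothetical stand-ins, since the actual template syntax and validation method are not specified here. The HTTP call itself is left out.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class SyntheticCheck:
    prompt: str
    expectation: str                          # assumption: text the answer must contain
    body_template: Optional[str] = None       # assumption: uses a {{prompt}} placeholder
    response_path: Optional[str] = None       # assumption: dotted path, e.g. "output.0.text"
    authorization_header: Optional[str] = None


def build_request(check: SyntheticCheck):
    """Return (headers, body) for the agent endpoint call."""
    headers = {"Content-Type": "application/json"}
    if check.authorization_header:
        headers["Authorization"] = check.authorization_header
    template = check.body_template or '{"prompt": "{{prompt}}"}'
    # json.dumps escapes quotes and newlines; drop the outer quotes
    # because the template already supplies them.
    escaped = json.dumps(check.prompt)[1:-1]
    return headers, template.replace("{{prompt}}", escaped)


def extract_response(raw_body: str, response_path: Optional[str]) -> str:
    """Pull one field out of a JSON response, or return the raw body if no path is set."""
    if not response_path:
        return raw_body
    value = json.loads(raw_body)
    for key in response_path.split("."):
        value = value[int(key)] if isinstance(value, list) else value[key]
    return str(value)


def meets_expectation(check: SyntheticCheck, raw_body: str) -> bool:
    """Simplified validation: case-insensitive substring match on the extracted answer."""
    answer = extract_response(raw_body, check.response_path)
    return check.expectation.lower() in answer.lower()
```

A substring match is the simplest possible validator; a real check may judge the response against the expectation in a more flexible way.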

Agent alerting

  • Create threshold rules for error rate or cost.
  • Scope rules to a single agent or all agents.
  • Choose a rolling time window and route breaches to alert channels.
  • Recovery notifications are sent when the agent's error rate or cost falls back below the threshold.
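
The rule behavior described above can be sketched as a small state machine: keep the events inside the rolling window, compute the metric, and emit a notification only when the rule crosses the threshold in either direction. The class name, metric names, and the "breach" and "recovery" return values are assumptions for illustration; this sketch also evaluates only when a new run arrives, whereas a real system would likely re-evaluate on a timer as well.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional, Tuple


@dataclass
class ThresholdRule:
    metric: str                    # assumed values: "error_rate" or "cost"
    threshold: float
    window_seconds: float          # rolling window length, must be > 0
    agent: Optional[str] = None    # None scopes the rule to all agents
    breached: bool = False
    # Each event is (timestamp, cost_usd, is_error).
    _events: Deque[Tuple[float, float, bool]] = field(default_factory=deque, repr=False)

    def record(self, timestamp: float, agent: str,
               cost_usd: float, is_error: bool) -> Optional[str]:
        """Record one run; return "breach", "recovery", or None."""
        if self.agent is not None and agent != self.agent:
            return None  # run is outside this rule's scope
        self._events.append((timestamp, cost_usd, is_error))
        # Drop events that have aged out of the rolling window.
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()
        value = self._current_value()
        if value > self.threshold and not self.breached:
            self.breached = True
            return "breach"
        if value <= self.threshold and self.breached:
            self.breached = False
            return "recovery"
        return None

    def _current_value(self) -> float:
        if self.metric == "cost":
            return sum(cost for _, cost, _ in self._events)
        errors = sum(1 for _, _, is_error in self._events if is_error)
        return errors / len(self._events)
```

Tracking the breached flag on the rule is what prevents a repeat alert on every run while the metric stays above the threshold.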

Common use cases

  • Detect when an AI support agent starts giving incomplete answers.
  • Detect cost spikes from tool loops, prompt regressions, or model changes.
  • Trace slow tool calls inside an agent workflow.
  • Separate endpoint availability from answer correctness.
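
The last use case, separating availability from correctness, amounts to classifying each synthetic check result in two steps: did the endpoint answer at all, and if so, did the answer meet the expectation. A minimal sketch, assuming a hypothetical CheckOutcome enum and treating any non-2xx status or missing response as an availability failure:

```python
from enum import Enum
from typing import Optional


class CheckOutcome(Enum):
    UNAVAILABLE = "unavailable"   # endpoint did not return a successful response
    INCORRECT = "incorrect"       # endpoint responded, but the answer missed the expectation
    PASSED = "passed"


def classify(http_status: Optional[int], expectation_met: bool) -> CheckOutcome:
    """Classify one check result; http_status is None on timeout or connection error."""
    if http_status is None or not (200 <= http_status < 300):
        return CheckOutcome.UNAVAILABLE
    if not expectation_met:
        return CheckOutcome.INCORRECT
    return CheckOutcome.PASSED
```

Reporting the two failure kinds separately lets an outage page the platform team while a wrong answer goes to whoever owns the prompt or model.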
