What Evaluations Do Not Do

  • guarantee production safety on their own - they surface comparison results, contract checks, and diagnostics, but those results still need review before you act on them
  • replace monitoring - there is no built-in continuous evaluation loop; you schedule runs yourself
  • make /api/runs/:id a documented public API - evaluation results can reference underlying run ids, but the public docs do not present orchestration run details as a released inspection API
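Because there is no built-in continuous evaluation loop, recurring evaluations have to be driven by your own scheduler (cron, CI, or a long-running process). The sketch below illustrates that idea under stated assumptions: runEvaluation is a hypothetical callback standing in for however you trigger an evaluation run, and EvaluationOutcome is an illustrative shape. Neither is part of this product's API.

```typescript
// Hypothetical outcome record for one scheduled evaluation; not a product type.
interface EvaluationOutcome {
  startedAt: Date;
  ok: boolean;
  error?: string;
}

// Resolve after `ms` milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

/**
 * Calls the (hypothetical) `runEvaluation` callback `iterations` times,
 * waiting `intervalMs` between calls. Failures are recorded instead of
 * thrown, so one failed run does not stop the schedule.
 */
async function scheduleEvaluations(
  runEvaluation: () => Promise<void>,
  intervalMs: number,
  iterations: number,
): Promise<EvaluationOutcome[]> {
  const outcomes: EvaluationOutcome[] = [];
  for (let i = 0; i < iterations; i++) {
    const startedAt = new Date();
    try {
      await runEvaluation();
      outcomes.push({ startedAt, ok: true });
    } catch (err) {
      outcomes.push({ startedAt, ok: false, error: String(err) });
    }
    // Skip the wait after the final run.
    if (i < iterations - 1) await sleep(intervalMs);
  }
  return outcomes;
}
```

In practice you would more likely hand the trigger to cron or a CI scheduler than keep a process looping. Either way, the collected outcomes still need review, since a completed evaluation run is not a safety guarantee.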

