What Evaluations Do Not Do
- guarantee production safety on their own - they surface comparison results, contract checks, and diagnostics, but still require review
- replace monitoring - there is no built-in continuous evaluation loop; you schedule runs yourself
- make `/api/runs/:id` part of the public docs surface - evaluation results can reference underlying run ids, but public docs do not promote orchestration run detail as a released inspection API
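
Because there is no built-in continuous evaluation loop, scheduling is left to you. The sketch below shows one minimal way to do that from a script. It is an illustration under stated assumptions, not part of the product: `run_periodically` and `trigger_evaluation` are hypothetical names, and `trigger_evaluation` is a placeholder for however you actually start an evaluation run.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_periodically(
    run_once: Callable[[], T],
    interval_seconds: float,
    max_runs: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[T]:
    """Call run_once max_runs times, pausing interval_seconds between calls.

    The sleep function is injectable so the loop can be tested without waiting.
    """
    results: List[T] = []
    for i in range(max_runs):
        results.append(run_once())
        # No pause is needed after the final run.
        if i < max_runs - 1:
            sleep(interval_seconds)
    return results


def trigger_evaluation() -> dict:
    # Hypothetical placeholder: replace the body with whatever starts an
    # evaluation run in your setup. The returned shape is an assumption.
    return {"status": "needs_review"}


if __name__ == "__main__":
    # Three runs, one hour apart. Each result still needs review.
    for result in run_periodically(trigger_evaluation, 3600, 3):
        print(result)
```

For anything long-lived, an external scheduler such as cron or a CI schedule is usually a better fit than a sleeping process; the loop above only shows the shape of the work you take on.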