Overview
Evaluations run a batch of cases against draft and/or published skill versions, so you can measure the effect of a change before you ship it.
What you get
- A run record with per-case outcomes.
- Links back to the underlying runs so you can inspect traces.
- A summary of changes and contract validity.
API
- Cases: GET/POST /api/skills/:id/evaluation-cases
- Run batch: POST /api/skills/:id/evaluations/run
- Fetch result: GET /api/skills/:id/evaluations/:evaluationRunId
SDK: EvaluationsClient (evaluation runs are access_token_only).
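
As a rough illustration of how the three routes fit together, the sketch below builds their paths and a request descriptor for starting a batch run. It does not use EvaluationsClient. The names `evaluationPaths` and `buildRunRequest`, the base URL, and the `Authorization: Bearer` header are assumptions made for the example; the source says only that evaluation runs are access_token_only, not how the token is sent.

```typescript
// Hypothetical helpers: none of these names come from the SDK.
// They only fill in the :id and :evaluationRunId placeholders
// of the routes listed in the API section.
const evaluationPaths = {
  cases: (skillId: string): string =>
    `/api/skills/${encodeURIComponent(skillId)}/evaluation-cases`,
  run: (skillId: string): string =>
    `/api/skills/${encodeURIComponent(skillId)}/evaluations/run`,
  result: (skillId: string, evaluationRunId: string): string =>
    `/api/skills/${encodeURIComponent(skillId)}/evaluations/${encodeURIComponent(evaluationRunId)}`,
};

interface RequestDescriptor {
  url: string;
  method: "GET" | "POST";
  headers: { Authorization: string; "Content-Type": string };
}

// Builds (but does not send) the request that starts a batch run.
// Assumption: the access token is passed as a Bearer token.
function buildRunRequest(
  baseUrl: string,
  accessToken: string,
  skillId: string,
): RequestDescriptor {
  return {
    url: `${baseUrl}${evaluationPaths.run(skillId)}`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${accessToken}`,
      "Content-Type": "application/json",
    },
  };
}
```

You can pass the descriptor to any HTTP client. The request body for a run is not described here, so the sketch leaves it out. After the run starts, the same client can fetch the result from `evaluationPaths.result(skillId, evaluationRunId)`.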