Overview

An evaluation batches a set of evaluation cases and runs them against a draft skill version, a published skill version, or both. Use evaluations to measure the effect of a change before you ship it.

What you get

  • A run record with per-case outcomes.
  • Links back to the underlying runs so you can inspect traces.
  • A summary of changes and contract validity.

API

  • Cases: GET/POST /api/skills/:id/evaluation-cases
  • Run batch: POST /api/skills/:id/evaluations/run
  • Fetch result: GET /api/skills/:id/evaluations/:evaluationRunId
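
As a rough illustration of how the three endpoints fit together, the sketch below builds (but does not send) the HTTP requests with Python's standard library. Only the paths and methods come from the list above. The host `https://example.invalid`, the `Bearer` authorization header, the JSON bodies, and every function name are assumptions made for illustration, so check them against your deployment before relying on them.

```python
import json
import urllib.request
from urllib.parse import quote

# Hypothetical host; replace with your own deployment's base URL.
BASE_URL = "https://example.invalid"


def _url(skill_id, *parts):
    """Join path segments under /api/skills/:id, escaping the skill id."""
    segments = ["api", "skills", quote(str(skill_id), safe=""), *parts]
    return BASE_URL + "/" + "/".join(segments)


def _request(method, url, token, body=None):
    """Build a Request object. The Bearer scheme is an assumption."""
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def list_cases_request(skill_id, token):
    # GET /api/skills/:id/evaluation-cases
    return _request("GET", _url(skill_id, "evaluation-cases"), token)


def create_case_request(skill_id, token, case):
    # POST /api/skills/:id/evaluation-cases
    # The shape of `case` is not documented here; pass whatever your API expects.
    return _request("POST", _url(skill_id, "evaluation-cases"), token, body=case)


def run_batch_request(skill_id, token, payload=None):
    # POST /api/skills/:id/evaluations/run
    # An empty JSON object is sent when no payload is given (assumed default).
    return _request("POST", _url(skill_id, "evaluations", "run"), token, body=payload or {})


def fetch_result_request(skill_id, evaluation_run_id, token):
    # GET /api/skills/:id/evaluations/:evaluationRunId
    run_id = quote(str(evaluation_run_id), safe="")
    return _request("GET", _url(skill_id, "evaluations", run_id), token)
```

To send one of these requests, pass it to `urllib.request.urlopen` and decode the response body. The field that carries the evaluation run ID in the run-batch response is not specified on this page, so inspect the response before wiring it into `fetch_result_request`.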

SDK: EvaluationsClient (evaluation runs are access_token_only).
