Benchmarks

Benchmarks tie datasets and scorers to repeatable evaluation runs. These endpoints mirror what the TypeScript SDK exposes as Benchmark resource methods.

Base path: /api/v1/benchmarks. Authenticate as described in HTTP authentication.

List and create

List benchmarks

GET /api/v1/benchmarks

Returns a JSON list of the benchmarks visible to the authenticated project and organization.
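The sketch below shows roughly what the list call looks like. It issues GET /api/v1/benchmarks through an injectable fetch function, so it can be exercised without a server. Several details are illustrative assumptions rather than part of the documented API: the origin `https://api.example.com`, the `Bearer` authorization header, and the helper names (`FetchLike`, `authHeaders`, `listBenchmarks`). The response is returned untouched because this page does not describe its schema.

```typescript
// Minimal fetch signature this sketch depends on. The global fetch in
// Node 18+ and browsers should satisfy it; tests can pass a stub instead.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical origin: replace with your deployment's host.
const BASE_URL = "https://api.example.com";

// Assumption: bearer-token auth. Use the scheme from "HTTP authentication".
function authHeaders(token: string): Record<string, string> {
  return { Authorization: `Bearer ${token}`, Accept: "application/json" };
}

// GET /api/v1/benchmarks. The body is returned as-is because the list
// schema is not documented on this page.
async function listBenchmarks(fetchFn: FetchLike, token: string): Promise<unknown> {
  const res = await fetchFn(`${BASE_URL}/api/v1/benchmarks`, {
    method: "GET",
    headers: authHeaders(token),
  });
  if (!res.ok) {
    throw new Error(`List benchmarks failed: HTTP ${res.status}`);
  }
  return res.json();
}
```

In Node 18+ you can pass the global `fetch` directly, for example `await listBenchmarks(fetch, token)`.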

Create benchmark

POST /api/v1/benchmarks

Body: JSON containing at least the fields your workspace requires (for example, a name, a dataset reference, and scorer configuration).
Returns: JSON describing the created benchmark, including its id for follow-up calls.
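As a rough sketch of the create call, the function below POSTs a JSON body and returns the new benchmark's id for follow-up calls. The payload fields (`name`, `datasetId`, `scorers`) are hypothetical placeholders for whatever your workspace requires. The code also assumes three further details: the id is a top-level string property named `id`, auth uses a `Bearer` header, and the origin is `https://api.example.com`.

```typescript
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const BASE_URL = "https://api.example.com"; // hypothetical origin

// Hypothetical payload: placeholders for the fields your workspace
// requires (name, dataset reference, scorer configuration).
interface CreateBenchmarkInput {
  name: string;
  datasetId: string;
  scorers: string[];
}

// POST /api/v1/benchmarks and return the created benchmark's id.
async function createBenchmark(
  fetchFn: FetchLike,
  token: string,
  input: CreateBenchmarkInput,
): Promise<string> {
  const res = await fetchFn(`${BASE_URL}/api/v1/benchmarks`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`, // assumed auth scheme
      "Content-Type": "application/json",
      Accept: "application/json",
    },
    body: JSON.stringify(input),
  });
  if (!res.ok) {
    throw new Error(`Create benchmark failed: HTTP ${res.status}`);
  }
  const created: unknown = await res.json();
  // Assumption: the id is a top-level string property named "id".
  const id =
    typeof created === "object" && created !== null
      ? (created as { id?: unknown }).id
      : undefined;
  if (typeof id !== "string") {
    throw new Error("Create response did not include a string id");
  }
  return id;
}
```

Returning only the id keeps the follow-up calls (`/{id}`, `/{id}/start`, `/{id}/tasks`) simple. Return the whole parsed body instead if you need the rest of the configuration.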

Single benchmark

Replace {id} with the benchmark id.

Method	Path	Purpose
GET	`/api/v1/benchmarks/{id}`	Fetch configuration and status for one benchmark.
DELETE	`/api/v1/benchmarks/{id}`	Remove the benchmark from the project.
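The two single-benchmark routes differ only in HTTP method, so one URL builder can serve both. The sketch below percent-encodes the id before substituting it into `{id}`, so an unusual id cannot change the path. As with the other sketches on this page, the origin, the `Bearer` header, and the helper names are assumptions. The DELETE response body is not read because its shape is not documented.

```typescript
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const BASE_URL = "https://api.example.com"; // hypothetical origin

// Build /api/v1/benchmarks/{id}, encoding the id so characters such as
// "/" or "?" cannot alter the path.
function benchmarkUrl(id: string): string {
  return `${BASE_URL}/api/v1/benchmarks/${encodeURIComponent(id)}`;
}

// Assumed bearer-token auth; see "HTTP authentication".
function authHeaders(token: string): Record<string, string> {
  return { Authorization: `Bearer ${token}`, Accept: "application/json" };
}

// GET /api/v1/benchmarks/{id}: configuration and status for one benchmark.
async function getBenchmark(fetchFn: FetchLike, token: string, id: string): Promise<unknown> {
  const res = await fetchFn(benchmarkUrl(id), { method: "GET", headers: authHeaders(token) });
  if (!res.ok) throw new Error(`Get benchmark ${id} failed: HTTP ${res.status}`);
  return res.json();
}

// DELETE /api/v1/benchmarks/{id}: remove the benchmark from the project.
async function deleteBenchmark(fetchFn: FetchLike, token: string, id: string): Promise<void> {
  const res = await fetchFn(benchmarkUrl(id), { method: "DELETE", headers: authHeaders(token) });
  if (!res.ok) throw new Error(`Delete benchmark ${id} failed: HTTP ${res.status}`);
}
```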

Start a run

POST /api/v1/benchmarks/{id}/start

Body: optional JSON arguments that depend on your evaluation setup (for example, a row subset, experiment flags, or runner options supported by the product).
Effect: schedules or starts benchmark execution, subject to server-side rules and quota.

To monitor progress, poll GET /api/v1/benchmarks/{id} or inspect the tasks route below.
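A start-then-poll loop might look like the sketch below. It sends the optional JSON body to `/start` only when arguments are given, then polls the single-benchmark GET route at a fixed interval until it sees a terminal status or gives up. Several names and values here are hypothetical and should be checked against a real GET response: the `status` field, the terminal values `completed`, `failed`, and `cancelled`, the default interval and poll limit, the origin, and the `Bearer` header. The `/start` response body is not read because it is not documented.

```typescript
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const BASE_URL = "https://api.example.com"; // hypothetical origin

// Hypothetical terminal status values; confirm against a real response.
const TERMINAL = new Set(["completed", "failed", "cancelled"]);

interface StartOptions {
  args?: Record<string, unknown>; // optional JSON body for /start
  intervalMs?: number;
  maxPolls?: number;
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

// POST /start, then poll GET /api/v1/benchmarks/{id} until a terminal status.
async function startAndWait(
  fetchFn: FetchLike,
  token: string,
  id: string,
  opts: StartOptions = {},
): Promise<string> {
  const intervalMs = opts.intervalMs ?? 5000;
  const maxPolls = opts.maxPolls ?? 120;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(() => resolve(), ms)));
  const url = `${BASE_URL}/api/v1/benchmarks/${encodeURIComponent(id)}`;
  const headers: Record<string, string> = {
    Authorization: `Bearer ${token}`, // assumed auth scheme
    Accept: "application/json",
  };

  // Only send a JSON body when the caller supplied start arguments.
  const startInit: { method: string; headers: Record<string, string>; body?: string } = {
    method: "POST",
    headers,
  };
  if (opts.args !== undefined) {
    startInit.headers = { ...headers, "Content-Type": "application/json" };
    startInit.body = JSON.stringify(opts.args);
  }
  const started = await fetchFn(`${url}/start`, startInit);
  if (!started.ok) throw new Error(`Start failed: HTTP ${started.status}`);

  for (let i = 0; i < maxPolls; i++) {
    const res = await fetchFn(url, { method: "GET", headers });
    if (!res.ok) throw new Error(`Poll failed: HTTP ${res.status}`);
    const body: unknown = await res.json();
    // Assumption: status is a top-level string property named "status".
    const status =
      typeof body === "object" && body !== null ? (body as { status?: unknown }).status : undefined;
    if (typeof status === "string" && TERMINAL.has(status)) return status;
    await sleep(intervalMs);
  }
  throw new Error(`Benchmark ${id} did not finish within ${maxPolls} polls`);
}
```

A fixed interval keeps the sketch short. For long runs, consider growing the interval between polls to reduce request volume.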

Tasks

GET /api/v1/benchmarks/{id}/tasks

Returns JSON describing outstanding and completed work units (tasks) associated with the benchmark run, suitable for progress UIs or batch orchestration.
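For a progress UI, the tasks response can be reduced to a few counts. The sketch below fetches the tasks route and summarizes it. It assumes three things that this page does not document: the body is a bare JSON array, each task has `id` and `status` string fields, and `completed` and `failed` are the finished states. The origin and `Bearer` header are likewise assumptions.

```typescript
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const BASE_URL = "https://api.example.com"; // hypothetical origin

// Hypothetical task shape; the real fields come from the tasks response.
interface BenchmarkTask {
  id: string;
  status: string; // assumed values: "pending", "running", "completed", "failed"
}

interface TaskProgress {
  total: number;
  finished: number; // completed + failed
  failed: number;
  fraction: number; // finished / total, or 0 when there are no tasks
}

// Reduce a task list to the numbers a progress bar needs.
function summarizeTasks(tasks: BenchmarkTask[]): TaskProgress {
  let finished = 0;
  let failed = 0;
  for (const task of tasks) {
    if (task.status === "completed" || task.status === "failed") finished++;
    if (task.status === "failed") failed++;
  }
  const fraction = tasks.length === 0 ? 0 : finished / tasks.length;
  return { total: tasks.length, finished, failed, fraction };
}

// GET /api/v1/benchmarks/{id}/tasks. Assumption: the body is a JSON array.
async function fetchTasks(fetchFn: FetchLike, token: string, id: string): Promise<BenchmarkTask[]> {
  const res = await fetchFn(`${BASE_URL}/api/v1/benchmarks/${encodeURIComponent(id)}/tasks`, {
    method: "GET",
    headers: { Authorization: `Bearer ${token}`, Accept: "application/json" }, // assumed auth
  });
  if (!res.ok) throw new Error(`Fetch tasks failed: HTTP ${res.status}`);
  const body: unknown = await res.json();
  if (!Array.isArray(body)) throw new Error("Expected a JSON array of tasks");
  return body as BenchmarkTask[];
}
```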

Best practices

Idempotency: starting the same benchmark twice may enqueue duplicate work. Guard with a client-side lock, or check the benchmark's status before calling start.
Cost: benchmark runs can invoke generative models and scorers; monitor usage while iterating.
SDK: prefer the SDK's Benchmark helpers, which give you typed requests and consistent error handling.
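The idempotency advice can combine both guards: an in-process lock, plus a status check before start. In the sketch below, concurrent callers for the same id share one in-flight attempt, and that attempt skips POST /start when the benchmark already looks active. The active status values `queued` and `running`, the `status` field, the origin, and the `Bearer` header are hypothetical.

```typescript
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const BASE_URL = "https://api.example.com"; // hypothetical origin

// Hypothetical status values meaning a run is already queued or executing.
const ACTIVE = new Set(["queued", "running"]);

// In-process lock: concurrent callers for the same id share one attempt.
const inFlight = new Map<string, Promise<boolean>>();

// Check status first, then start only if the benchmark is not active.
// Resolves true if POST /start was sent, false if it was skipped.
async function checkAndStart(fetchFn: FetchLike, token: string, id: string): Promise<boolean> {
  const url = `${BASE_URL}/api/v1/benchmarks/${encodeURIComponent(id)}`;
  const headers = { Authorization: `Bearer ${token}`, Accept: "application/json" }; // assumed auth
  const current = await fetchFn(url, { method: "GET", headers });
  if (!current.ok) throw new Error(`Status check failed: HTTP ${current.status}`);
  const body: unknown = await current.json();
  const status =
    typeof body === "object" && body !== null ? (body as { status?: unknown }).status : undefined;
  if (typeof status === "string" && ACTIVE.has(status)) return false;

  const started = await fetchFn(`${url}/start`, { method: "POST", headers });
  if (!started.ok) throw new Error(`Start failed: HTTP ${started.status}`);
  return true;
}

// Check and register synchronously, before any await, so a second caller
// in the same tick reuses the pending attempt instead of starting again.
async function startOnce(fetchFn: FetchLike, token: string, id: string): Promise<boolean> {
  const pending = inFlight.get(id);
  if (pending) return pending;
  const attempt = checkAndStart(fetchFn, token, id);
  inFlight.set(id, attempt);
  try {
    return await attempt;
  } finally {
    inFlight.delete(id);
  }
}
```

The lock only serializes callers inside one process. Separate clients can still race between the status check and the start call, so server-side status remains the source of truth.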