Frametail

Benchmarks

Create, inspect, run, and remove benchmarks over HTTP.

Benchmarks tie datasets and scorers to repeatable evaluation runs. These endpoints mirror what the TypeScript SDK exposes as Benchmark resource methods.

Base path: /api/v1/benchmarks. Authenticate as described in HTTP authentication.

List and create

List benchmarks

GET /api/v1/benchmarks

Returns JSON listing benchmarks visible to the authenticated project and organization.

Create benchmark

POST /api/v1/benchmarks

  • Body: JSON with at least the fields required by your workspace (for example name, dataset reference, and scorers configuration).
  • Returns: JSON describing the created benchmark, including its id for follow-up calls.
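
As an illustration of the two routes above, the following sketch builds the list and create requests by hand. The host name, the bearer-token header, and the body field names are assumptions made for the example; this page does not define them, so check HTTP authentication and your workspace's required fields before relying on them.

```typescript
// Assumptions, not confirmed by this page: the host, bearer-token auth,
// and any body field names passed in are placeholders.
const BASE_URL = "https://frametail.example.com"; // hypothetical host
const BENCHMARKS_PATH = "/api/v1/benchmarks";

interface RequestSpec {
  url: string;
  init: { method: string; headers: Record<string, string>; body?: string };
}

// Headers shared by both calls; the Authorization scheme is an assumption.
function jsonHeaders(token: string): Record<string, string> {
  return { Authorization: `Bearer ${token}`, Accept: "application/json" };
}

// GET /api/v1/benchmarks
function listBenchmarksRequest(token: string): RequestSpec {
  return {
    url: BASE_URL + BENCHMARKS_PATH,
    init: { method: "GET", headers: jsonHeaders(token) },
  };
}

// POST /api/v1/benchmarks with a JSON body.
function createBenchmarkRequest(
  token: string,
  body: Record<string, unknown>,
): RequestSpec {
  return {
    url: BASE_URL + BENCHMARKS_PATH,
    init: {
      method: "POST",
      headers: { ...jsonHeaders(token), "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}
```

Either spec can be sent with fetch(spec.url, spec.init). Keeping request construction separate from sending makes the calls easy to inspect and log before they touch the network.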

Single benchmark

Replace {id} with the benchmark id.

Method  Path                     Purpose
GET     /api/v1/benchmarks/{id}  Fetch configuration and status for one benchmark.
DELETE  /api/v1/benchmarks/{id}  Remove the benchmark from the project.
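
A minimal sketch of calling the two single-benchmark routes. The FetchLike type, the function names, and the bearer-token header are assumptions for illustration; the global fetch in recent Node.js versions and in browsers can be passed as fetchImpl.

```typescript
// Minimal shape of the fetch function this sketch needs (hypothetical type).
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Build /api/v1/benchmarks/{id}, escaping the id so it stays one path segment.
function benchmarkPath(id: string): string {
  if (id.length === 0) throw new Error("benchmark id must not be empty");
  return `/api/v1/benchmarks/${encodeURIComponent(id)}`;
}

// GET /api/v1/benchmarks/{id}: resolves to the parsed JSON reply.
async function getBenchmark(
  fetchImpl: FetchLike,
  baseUrl: string,
  token: string, // assumed bearer token; see HTTP authentication
  id: string,
): Promise<unknown> {
  const res = await fetchImpl(baseUrl + benchmarkPath(id), {
    method: "GET",
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) throw new Error(`GET failed: HTTP ${res.status}`);
  return res.json();
}

// DELETE /api/v1/benchmarks/{id}: resolves once the server reports success.
async function deleteBenchmark(
  fetchImpl: FetchLike,
  baseUrl: string,
  token: string,
  id: string,
): Promise<void> {
  const res = await fetchImpl(baseUrl + benchmarkPath(id), {
    method: "DELETE",
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) throw new Error(`DELETE failed: HTTP ${res.status}`);
}
```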

Start a run

POST /api/v1/benchmarks/{id}/start

  • Body: Optional JSON arguments depending on your evaluation setup (for example row subset, experiment flags, or runner options supported by the product).
  • Effect: Schedules or starts benchmark execution according to server rules and quota.

To monitor progress, poll GET /api/v1/benchmarks/{id} or inspect the tasks route below.
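
The start-then-poll flow can be sketched as follows. This is an illustration under stated assumptions, not the product's client: the bearer-token header, the backoff schedule, and the isFinished predicate are all hypothetical, and because this page does not specify how a benchmark reports completion, the caller supplies that check.

```typescript
// Minimal shape of the fetch function this sketch needs (hypothetical type).
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Delay before poll number `attempt` (0-based): doubles from 1 s, capped at 30 s.
function pollDelayMs(attempt: number): number {
  return Math.min(30000, 1000 * Math.pow(2, attempt));
}

// Start a run, then poll the benchmark until `isFinished` accepts the reply.
async function startAndWait(
  fetchImpl: FetchLike,
  baseUrl: string,
  token: string, // assumed bearer token; see HTTP authentication
  id: string,
  isFinished: (benchmark: unknown) => boolean, // caller-defined completion check
  args?: Record<string, unknown>, // optional start arguments
  maxPolls = 20,
): Promise<unknown> {
  const headers = {
    Authorization: `Bearer ${token}`,
    "Content-Type": "application/json",
  };
  const url = `${baseUrl}/api/v1/benchmarks/${encodeURIComponent(id)}`;

  // POST /api/v1/benchmarks/{id}/start; the body is sent only when args are given.
  const started = await fetchImpl(`${url}/start`, {
    method: "POST",
    headers,
    body: args === undefined ? undefined : JSON.stringify(args),
  });
  if (!started.ok) throw new Error(`start failed: HTTP ${started.status}`);

  for (let attempt = 0; attempt < maxPolls; attempt++) {
    await new Promise((resolve) => setTimeout(resolve, pollDelayMs(attempt)));
    const res = await fetchImpl(url, { method: "GET", headers });
    if (!res.ok) throw new Error(`poll failed: HTTP ${res.status}`);
    const benchmark = await res.json();
    if (isFinished(benchmark)) return benchmark;
  }
  throw new Error(`benchmark ${id} did not finish after ${maxPolls} polls`);
}
```

The capped backoff keeps polling light on long runs; tune the schedule and maxPolls to the run lengths you expect.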

Tasks

GET /api/v1/benchmarks/{id}/tasks

Returns JSON describing outstanding and completed work units (tasks) associated with the benchmark run, suitable for progress UIs or batch orchestration.
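
For a progress display, the tasks reply can be reduced to per-status counts. The sketch below assumes the route returns a JSON array whose items carry a string status field; that shape is a guess for illustration and is not documented on this page, so adapt TaskLike to the actual payload.

```typescript
// Hypothetical task shape: only a string `status` field is assumed.
interface TaskLike {
  status: string;
}

// Minimal shape of the fetch function this sketch needs (hypothetical type).
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Count tasks per status value, e.g. to render "12 of 40 complete".
function countByStatus(tasks: TaskLike[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const task of tasks) {
    counts[task.status] = (counts[task.status] ?? 0) + 1;
  }
  return counts;
}

// GET /api/v1/benchmarks/{id}/tasks, then summarize the reply.
async function taskProgress(
  fetchImpl: FetchLike,
  baseUrl: string,
  token: string, // assumed bearer token; see HTTP authentication
  id: string,
): Promise<Record<string, number>> {
  const url = `${baseUrl}/api/v1/benchmarks/${encodeURIComponent(id)}/tasks`;
  const res = await fetchImpl(url, {
    method: "GET",
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) throw new Error(`tasks failed: HTTP ${res.status}`);
  // The cast encodes the assumed array-of-tasks shape described above.
  return countByStatus((await res.json()) as TaskLike[]);
}
```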

Best practices

  • Idempotency: starting the same benchmark twice may enqueue duplicate work; guard with a client-side lock, or check the benchmark's status before calling start.
  • Cost: benchmark runs can invoke generative models and scorers; monitor usage while iterating.
  • SDK: prefer the TypeScript SDK's Benchmark helpers for typed requests and consistent error handling.
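
One way to implement the client-side lock mentioned under Idempotency is to share a single in-flight promise per benchmark id. The sketch below is an assumption-laden illustration: startOnce and the start callback are hypothetical names, and the guard only covers calls made within one process, so separate workers or machines still need a status check or a shared lock.

```typescript
// In-flight start calls, keyed by benchmark id (in-process only).
const inFlight = new Map<string, Promise<unknown>>();

// Run `start` for an id at most once at a time. Callers that arrive while a
// start is pending receive the same promise instead of issuing a second request.
function startOnce(
  id: string,
  start: (id: string) => Promise<unknown>, // e.g. a wrapper around POST .../start
): Promise<unknown> {
  const existing = inFlight.get(id);
  if (existing !== undefined) return existing;

  // Release the lock when the call settles, whether it succeeded or failed.
  const pending = start(id).finally(() => {
    inFlight.delete(id);
  });
  inFlight.set(id, pending);
  return pending;
}
```

The lock is released after the start request settles, not after the run finishes, so a later call can still start a second run; pair it with a status check if that matters.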