Frametail

Experiments

Comparing prompts, models, or parameters with controlled experiment runs.

Why experiments

Experiments isolate variables: you change one aspect (for example, a prompt variant) while holding datasets and scorers constant. This reduces debate about what caused a metric shift.

Designing comparisons

  • Keep sample sizes meaningful — too few rows invite noise.
  • Document hypotheses in the experiment description for future readers.
  • Link experiments back to traces when validating behavior in production-like settings.