Experiments
Compare prompts, models, or parameters with controlled experiment runs.
Why experiments
Experiments isolate variables: you change one aspect (for example, a prompt variant) while holding datasets and scorers constant. This reduces debate about what caused a metric shift.
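As a rough illustration of holding the dataset and scorer constant while varying only the prompt, here is a minimal sketch in plain Python. Every name in it (`DATASET`, `exact_match`, `fake_model`, `run_experiment`, the prompt variants) is hypothetical and not part of any particular SDK; `fake_model` is an offline stand-in for a real model call so the example runs on its own.

```python
from statistics import mean
from typing import Callable

# Hypothetical fixed dataset: every variant is evaluated on exactly these rows.
DATASET = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "3 * 3", "expected": "9"},
    {"input": "10 - 7", "expected": "3"},
]

# Canned answers used by the stand-in model below.
_ANSWERS = {row["input"]: row["expected"] for row in DATASET}


def exact_match(output: str, expected: str) -> float:
    """Hypothetical scorer, held constant across variants."""
    return 1.0 if output.strip() == expected else 0.0


def fake_model(prompt: str) -> str:
    """Offline stand-in for a model call (an assumption for this sketch).

    It returns a bare number only when the prompt asks for one.
    """
    question = prompt.rsplit("\n", 1)[-1]
    answer = _ANSWERS[question]
    if "only the number" in prompt:
        return answer
    return f"The answer is {answer}."


def run_experiment(
    prompt_template: str,
    dataset: list[dict],
    scorer: Callable[[str, str], float],
    model: Callable[[str], str],
) -> float:
    """Score one prompt variant on the shared dataset and return the mean score."""
    scores = []
    for row in dataset:
        output = model(prompt_template.format(input=row["input"]))
        scores.append(scorer(output, row["expected"]))
    return mean(scores)


# The only thing that changes between runs is the prompt template.
PROMPT_VARIANTS = {
    "baseline": "Solve the problem.\n{input}",
    "terse": "Solve the problem. Reply with only the number.\n{input}",
}

results = {
    name: run_experiment(template, DATASET, exact_match, fake_model)
    for name, template in PROMPT_VARIANTS.items()
}
print(results)
```

Because the rows, the scorer, and the model are identical across both runs, any difference between the two mean scores can only come from the prompt template.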
Designing comparisons
- Keep sample sizes large enough that a metric difference stands out from row-to-row noise; with too few rows, a handful of examples can swing the average.
- Document hypotheses in the experiment description for future readers.
- Link experiments back to traces when validating behavior in production-like settings.
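To make the sample-size point concrete, the sketch below estimates a rough 95% interval for the mean per-row score difference between two variants scored on the same rows. It is one possible check, not a prescribed method: the function name `paired_diff_interval` and the score lists are made up for illustration, and the interval uses a normal approximation (z = 1.96) that is crude for very small samples.

```python
from math import sqrt
from statistics import mean, stdev


def paired_diff_interval(baseline_scores, variant_scores, z=1.96):
    """Return (mean difference, lower bound, upper bound) for paired scores.

    Assumes both lists score the same rows in the same order and uses a
    normal approximation for the interval.
    """
    if len(baseline_scores) != len(variant_scores):
        raise ValueError("scores must be paired row by row")
    diffs = [v - b for b, v in zip(baseline_scores, variant_scores)]
    center = mean(diffs)
    # Standard error of the mean difference shrinks with sqrt(number of rows).
    std_err = stdev(diffs) / sqrt(len(diffs))
    return center, center - z * std_err, center + z * std_err


# Illustrative per-row scores (1 = pass, 0 = fail) for an 8-row dataset.
baseline_small = [0, 1, 1, 0, 1, 0, 0, 1]
variant_small = [1, 1, 1, 1, 0, 0, 1, 1]

# The same pattern repeated 25 times, standing in for a 200-row dataset.
baseline_large = baseline_small * 25
variant_large = variant_small * 25

small_mean, small_lo, small_hi = paired_diff_interval(baseline_small, variant_small)
large_mean, large_lo, large_hi = paired_diff_interval(baseline_large, variant_large)

print(f"8 rows:   diff={small_mean:.2f}, interval=({small_lo:.2f}, {small_hi:.2f})")
print(f"200 rows: diff={large_mean:.2f}, interval=({large_lo:.2f}, {large_hi:.2f})")
```

Both datasets show the same average improvement, but the 8-row interval spans zero while the 200-row interval does not, so only the larger run supports the claim that the variant is better.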