Glossary
What is an experiment in video eval?
An experiment is a structured run that compares variants (prompts, models, parameters) on dataset inputs without attaching the full immutable scorer contract used for release benchmarks.
When to run experiments
Early in a fal integration you might compare two endpoints or temperature settings. Experiments keep that exploration lightweight before you commit to a scorer roster and pinned rows.
When to promote to a benchmark
When stakeholders need a signed-off artifact — immutable dataset, full scorers, linkable results — run the evaluation workflow explicitly. Live scoring on production traces does not replace that step.