Glossary

What is an experiment in video eval?

An experiment is a structured run that compares variants (prompts, models, parameters) on dataset inputs without attaching the full immutable scorer contract used for release benchmarks.

When to run experiments

Early in a fal integration you might compare two endpoints or temperature settings. Experiments keep that exploration lightweight before you commit to a scorer roster and pinned rows.

When to promote to a benchmark

When stakeholders need a signed-off artifact — immutable dataset, full scorers, linkable results — run the evaluation workflow explicitly. Live scoring on production traces does not replace that step.

Related terms