Learn
Generative video evaluation, explained
Definitions and workflows for teams that want benchmarks and traces alongside their clips, rather than clip review in Slack.
Glossary
- Benchmark: An immutable benchmark is a scored, reproducible evaluation run on a pinned dataset with a fixed scorer contract. It is the artifact you share in release reviews.
- Experiment: An experiment compares tasks or model settings on inputs without the full benchmark scorer roster. Use it for exploration before sign-off.
- Scorer: A scorer grades benchmark or trace outputs. Scorers can be automated rubrics, VQA models, or human-in-the-loop signals attached to eval runs.
- Trace: A trace is a record of one instrumented workflow, made up of spans with timing, inputs, outputs, and links to video artifacts.
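To make the glossary concrete, here is a minimal sketch of how the terms could relate as data structures. It is not the product's API: every class, field, and function name below (`Span`, `Trace`, `Scorer`, `Benchmark`, `run_benchmark`, `has_video`) is a hypothetical chosen for illustration, and the mean-score aggregation is an assumption. An experiment, in these terms, would be the same call with only a subset of the scorer roster.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Span:
    """One timed step in a workflow (hypothetical shape)."""
    name: str
    start_ms: int
    end_ms: int
    inputs: dict
    outputs: dict
    video_uri: Optional[str] = None  # link to a video artifact, if any

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


@dataclass(frozen=True)
class Trace:
    """Record of one instrumented workflow: an ordered set of spans."""
    trace_id: str
    spans: Tuple[Span, ...]


@dataclass(frozen=True)
class Scorer:
    """Grades a trace; `grade` could wrap a rubric, a VQA model, or a human label."""
    name: str
    grade: Callable[[Trace], float]


@dataclass(frozen=True)
class Benchmark:
    """Frozen result of a run: pinned dataset version plus a fixed scorer roster."""
    dataset_version: str
    scorer_names: Tuple[str, ...]
    mean_scores: Dict[str, float]


def run_benchmark(
    dataset_version: str, traces: Sequence[Trace], scorers: Sequence[Scorer]
) -> Benchmark:
    # Assumption: each scorer's result is aggregated as a plain mean over traces.
    mean_scores = {s.name: mean(s.grade(t) for t in traces) for s in scorers}
    return Benchmark(dataset_version, tuple(s.name for s in scorers), mean_scores)


def has_video(trace: Trace) -> float:
    """Toy automated rubric: 1.0 if any span links to a video artifact."""
    return 1.0 if any(span.video_uri for span in trace.spans) else 0.0
```

Because `Benchmark` is a frozen dataclass, its fields cannot be reassigned after the run, which loosely mirrors the "immutable" property in the definition above.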
Integrations
Documentation
Product guides and API reference live in /docs, including evaluating video and the fal.ai setup guide.