Glossary
Terms for generative video evaluation
Clear definitions for benchmarks, traces, and workflows, with links to how Frametail implements each.
Benchmark
A benchmark is an immutable, scored run on a pinned dataset with a defined scorer roster. Once a benchmark is created, its dataset rows and scorer contract cannot change, so scores from different weeks remain directly comparable.
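The Python sketch below shows one way to model "pinned dataset plus fixed scorer roster". It assumes rows are JSON-serializable and pins them with a SHA-256 hash of their canonical JSON. `Benchmark`, `dataset_fingerprint`, and the scorer names are hypothetical illustrations, not Frametail's API.

```python
import hashlib
import json
from dataclasses import dataclass


def dataset_fingerprint(rows: list[dict]) -> str:
    """Hash the dataset rows (assumed JSON-serializable) so any later edit is detectable."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Benchmark:
    name: str
    dataset_hash: str         # pins the exact dataset rows
    scorers: tuple[str, ...]  # the fixed scorer roster

    def is_comparable_with(self, other: "Benchmark") -> bool:
        """Two runs are comparable only if dataset and scorer roster match exactly."""
        return self.dataset_hash == other.dataset_hash and self.scorers == other.scorers


rows = [{"prompt": "a cat surfing at sunset"}, {"prompt": "timelapse of a city"}]
roster = ("motion_smoothness", "prompt_alignment")  # hypothetical scorer names
week1 = Benchmark("release-check", dataset_fingerprint(rows), roster)
week2 = Benchmark("release-check", dataset_fingerprint(rows), roster)
print(week1.is_comparable_with(week2))  # True
```

Adding, removing, or reordering a row changes the fingerprint, so a run against an edited dataset is no longer reported as comparable.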
Experiment
An experiment is a structured run that compares variants (prompts, models, or parameters) on dataset inputs. Unlike a release benchmark, it does not attach the full, immutable scorer contract.
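As a rough illustration, an experiment can be modeled as a grid of variants × inputs with no pinned scorer roster. `Variant`, `Experiment`, and `fake_generate` are hypothetical names, and `fake_generate` stands in for a real model call.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Callable


@dataclass
class Variant:
    name: str
    model: str
    params: dict = field(default_factory=dict)


@dataclass
class Experiment:
    """Hypothetical structure: variants x inputs, with no scorer contract attached."""
    variants: list[Variant]
    inputs: list[str]
    results: dict[tuple[str, str], str] = field(default_factory=dict)

    def run(self, generate: Callable[[Variant, str], str]) -> None:
        # Every variant sees every input, so output differences come from the variant alone.
        for variant, prompt in product(self.variants, self.inputs):
            self.results[(variant.name, prompt)] = generate(variant, prompt)


def fake_generate(variant: Variant, prompt: str) -> str:
    # Stand-in for a real model call; returns a placeholder output identifier.
    steps = variant.params.get("steps", 25)
    return f"{variant.model}:{steps}:{prompt}"


exp = Experiment(
    variants=[Variant("baseline", "model-a"), Variant("more-steps", "model-a", {"steps": 50})],
    inputs=["a cat surfing at sunset", "timelapse of a city"],
)
exp.run(fake_generate)
print(len(exp.results))  # 4
```

Because nothing here is frozen, variants and inputs can change freely between runs. This flexibility suits exploration but makes results unsuitable for week-over-week release comparisons.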
Scorer
A scorer is a function or model configuration that grades outputs during benchmarks or live scoring. Scorers are scoped to your organization, and together they form the “contract” that defines what “better” means for your team.
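The sketch below treats a scorer as a plain function from (prompt, output) to a score in [0, 1], kept in a registry keyed by organization. The `Scorer` signature, `keyword_match`, and `ScorerRegistry` are assumptions for illustration only. The toy scorer compares prompt words against a text caption of the output rather than the video itself.

```python
from typing import Callable

# Hypothetical signature: a scorer maps (prompt, output caption) to a score in [0, 1].
Scorer = Callable[[str, str], float]


def keyword_match(prompt: str, output_caption: str) -> float:
    """Toy scorer: fraction of prompt words that appear in the output's caption."""
    words = prompt.lower().split()
    if not words:
        return 0.0
    caption = output_caption.lower().split()
    return sum(w in caption for w in words) / len(words)


class ScorerRegistry:
    """Hypothetical org-scoped registry: each org sees only its own named scorers."""

    def __init__(self) -> None:
        self._by_org: dict[str, dict[str, Scorer]] = {}

    def register(self, org: str, name: str, fn: Scorer) -> None:
        self._by_org.setdefault(org, {})[name] = fn

    def score_all(self, org: str, prompt: str, output: str) -> dict[str, float]:
        # Only the calling org's scorers run; that set acts as the org's "contract".
        return {name: fn(prompt, output) for name, fn in self._by_org.get(org, {}).items()}


registry = ScorerRegistry()
registry.register("acme", "keyword_match", keyword_match)
print(registry.score_all("acme", "cat surfing at sunset", "a cat surfing a wave"))
# {'keyword_match': 0.5}
print(registry.score_all("other-org", "cat surfing", "a cat surfing"))  # {}
```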
Trace
A trace is the top-level record for a single instrumented workflow. Spans inside the trace represent timed segments such as a fal model call, preprocessing, or upload.
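A minimal sketch of the trace/span relationship, assuming spans are timed with `time.perf_counter` through a context manager. `Trace`, `Span`, and the span names are hypothetical, and `time.sleep` stands in for a real model call such as a fal request.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    start: float
    end: float = 0.0

    @property
    def duration_s(self) -> float:
        return self.end - self.start


@dataclass
class Trace:
    """Hypothetical top-level record for one instrumented workflow run."""
    workflow: str
    spans: list[Span] = field(default_factory=list)

    @contextmanager
    def span(self, name: str):
        # Time one segment of the workflow and attach it to this trace,
        # even if the segment raises an exception.
        s = Span(name, time.perf_counter())
        try:
            yield s
        finally:
            s.end = time.perf_counter()
            self.spans.append(s)


trace = Trace("text-to-video")
with trace.span("preprocess"):
    prompt = "a cat surfing at sunset".strip()
with trace.span("model_call"):
    time.sleep(0.01)  # stand-in for a model call, e.g. to fal
with trace.span("upload"):
    pass

for s in trace.spans:
    print(f"{s.name}: {s.duration_s * 1000:.1f} ms")
```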