Evaluating generative video
A practical rubric for judging AI video outputs in benchmarks and human review.
Temporal coherence
Check flicker, object permanence, and camera consistency across frames. Regressions here often trace to sampler or frame-rate changes.
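As a rough illustration of one flicker check, the sketch below measures how much average brightness jumps between consecutive frames. It assumes frames have already been decoded into grayscale grids (lists of rows of pixel values). The names mean_luminance and flicker_score are hypothetical and not part of any library. It is a crude proxy for global flicker only and does not address object permanence or camera consistency.

```python
from statistics import fmean


def mean_luminance(frame):
    """Average pixel value of one grayscale frame (a list of rows)."""
    return fmean(px for row in frame for px in row)


def flicker_score(frames):
    """Mean absolute change in average luminance between consecutive frames.

    Hypothetical helper: higher values suggest more global flicker.
    Returns 0.0 when there are fewer than two frames to compare.
    """
    if len(frames) < 2:
        return 0.0
    lum = [mean_luminance(f) for f in frames]
    return fmean(abs(b - a) for a, b in zip(lum, lum[1:]))
```

Comparing this score for the same prompt and seed before and after a sampler or frame-rate change is one way to spot a regression, though any threshold would need to be calibrated against human review.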
Semantic alignment
Does the output match the prompt intent and any negative constraints? Misalignment may be a prompt issue rather than a model issue.
Artifacts
Look for warped hands, unreadable text, and melting textures. Categorize them so scorers can track each defect type separately.
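One possible way to keep defect types separate is a fixed label set plus a per-type tally. The sketch below assumes scorers submit (clip_id, defect_type) pairs. The label strings, DEFECT_TYPES, and tally_defects are illustrative names, not an established schema.

```python
from collections import Counter

# Hypothetical label set taken from the defect examples in this rubric.
DEFECT_TYPES = ("warped_hands", "unreadable_text", "melting_textures")


def tally_defects(annotations):
    """Count annotations per defect type.

    annotations: iterable of (clip_id, defect_type) pairs from scorers.
    Unknown labels raise an error so typos do not create new categories.
    """
    counts = Counter({d: 0 for d in DEFECT_TYPES})
    for clip_id, defect in annotations:
        if defect not in DEFECT_TYPES:
            raise ValueError(f"unknown defect type for {clip_id}: {defect!r}")
        counts[defect] += 1
    return counts
```

Rejecting unknown labels is a design choice: it keeps the categories stable across scorers, at the cost of requiring a deliberate update when a new defect type appears.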
Latency vs quality
Higher step counts or resolutions tend to improve quality but shift latency distributions. Record both quality and latency when comparing runs.
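A minimal sketch of recording both sides of the trade-off in one record per run follows. It assumes each run yields a list of quality scores and a list of per-clip latencies in seconds (at least two latency samples). RunSummary and summarize_run are hypothetical names, and the choice of p50 and p95 is an assumption rather than something this rubric prescribes.

```python
from dataclasses import dataclass
from statistics import fmean, quantiles


@dataclass
class RunSummary:
    name: str
    mean_quality: float
    p50_latency_s: float
    p95_latency_s: float


def summarize_run(name, quality_scores, latencies_s):
    """Pair a run's mean quality with its median and 95th-percentile latency."""
    # quantiles(..., n=100) returns 99 cut points; index 49 is p50, 94 is p95.
    cuts = quantiles(latencies_s, n=100, method="inclusive")
    return RunSummary(name, fmean(quality_scores), cuts[49], cuts[94])
```

Reporting percentiles rather than a single mean keeps a shift in the tail of the latency distribution visible when two runs have similar averages.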