Frametail

Evaluating generative video

A practical rubric for judging AI video outputs in benchmarks and human review.

Temporal coherence

Check for flicker, loss of object permanence, and inconsistent camera motion across frames. Regressions here often trace to sampler or frame-rate changes.
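As a rough illustration of the flicker check, the sketch below scores a clip by how much its average brightness jumps between consecutive frames. The function name `flicker_score` and the input layout (each frame as a flat sequence of grayscale values in [0, 1]) are assumptions made for this example, not part of any particular benchmark. It only captures global brightness flicker; object permanence and camera consistency need other checks.

```python
from statistics import fmean
from typing import Sequence


def flicker_score(frames: Sequence[Sequence[float]]) -> float:
    """Mean absolute change in average brightness between consecutive frames.

    Hypothetical helper: each frame is a flat sequence of grayscale
    pixel values in [0, 1]. Returns 0.0 for clips shorter than 2 frames.
    """
    if len(frames) < 2:
        return 0.0
    # Collapse every frame to a single brightness value.
    brightness = [fmean(frame) for frame in frames]
    # Compare each frame with the one that follows it.
    deltas = [abs(b - a) for a, b in zip(brightness, brightness[1:])]
    return fmean(deltas)


# Three identical frames: no brightness change at all.
steady = [[0.5, 0.5, 0.5, 0.5]] * 3
# Frames alternating between black and white: maximal flicker.
flickering = [[0.0] * 4, [1.0] * 4, [0.0] * 4]

print(flicker_score(steady))      # 0.0
print(flicker_score(flickering))  # 1.0
```

A score like this is most useful as a relative signal: compare it across runs of the same prompt before and after a sampler or frame-rate change, rather than reading the absolute value.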

Semantic alignment

Does the output match the prompt intent and any negative constraints? Misalignment may be a prompt issue rather than a model issue.

Artifacts

Look for warped hands, unreadable text, and melting textures. Categorize them so scorers can track each defect type separately.
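One way to keep defect types separate is to give scorers a fixed set of categories and tally annotations per category. The sketch below assumes a hypothetical `Defect` enum built from the three examples above and a hypothetical annotation format of `(clip_id, defect)` pairs; a real review tool would likely carry more fields.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable, Tuple


class Defect(Enum):
    """Hypothetical defect categories, taken from the examples in the text."""
    WARPED_HANDS = "warped_hands"
    UNREADABLE_TEXT = "unreadable_text"
    MELTING_TEXTURES = "melting_textures"


def tally_defects(annotations: Iterable[Tuple[str, Defect]]) -> Dict[Defect, int]:
    """Count annotations per defect type, including types with zero hits."""
    counts = Counter(defect for _clip_id, defect in annotations)
    # Report every category so a missing defect shows up as 0, not as absent.
    return {defect: counts[defect] for defect in Defect}


annotations = [
    ("clip_001", Defect.WARPED_HANDS),
    ("clip_001", Defect.UNREADABLE_TEXT),
    ("clip_002", Defect.WARPED_HANDS),
]
for defect, count in tally_defects(annotations).items():
    print(f"{defect.value}: {count}")
```

Reporting zero counts explicitly makes it easier to see when a defect type disappears or reappears between runs.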

Latency vs quality

Higher step counts or resolutions generally improve quality but shift the latency distribution upward. Record both quality and latency when comparing runs.
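A minimal sketch of recording both sides of that trade-off for a single run follows. The function name `summarize_run`, the choice of p50 and p95 as latency summaries, and the use of a mean quality score are all assumptions for illustration; use whatever quality metric and percentiles your benchmark already reports.

```python
from statistics import fmean, quantiles
from typing import Dict, Sequence


def summarize_run(latencies_s: Sequence[float],
                  quality_scores: Sequence[float]) -> Dict[str, float]:
    """Summarize one run as latency percentiles plus mean quality.

    Hypothetical helper. On Python versions before 3.13, `quantiles`
    requires at least two latency samples.
    """
    # n=100 yields 99 cut points; index 49 is p50 and index 94 is p95.
    cuts = quantiles(latencies_s, n=100, method="inclusive")
    return {
        "p50_s": cuts[49],
        "p95_s": cuts[94],
        "mean_quality": fmean(quality_scores),
    }


# Made-up numbers, purely to show the shape of the output.
summary = summarize_run(
    latencies_s=[1.0, 2.0, 3.0, 4.0, 5.0],
    quality_scores=[0.5, 0.75],
)
print(summary)
```

Storing percentiles rather than a single average keeps tail latency visible, which is where step-count and resolution changes tend to show up first.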