Frametail

Results and analytics

Reading benchmark outputs, spotting regressions, and exporting learnings.

Aggregate metrics

Start with the summary charts to see score distributions and failure rates. Comparing runs across time or branches is only meaningful when your workflow tags traces and benchmarks consistently.
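To make the aggregate view concrete, here is a minimal sketch of the numbers a summary chart reports and one way to flag a regression between two runs. It assumes each row is a dict with a numeric `score` and a boolean `passed`. These field names, the `summarize` and `is_regression` helpers, and the 0.02 threshold are all hypothetical illustrations, not Frametail's API or export schema.

```python
from statistics import mean, median

# Hypothetical row shape: {"score": float, "passed": bool}.
# These field names are assumptions, not Frametail's documented schema.


def summarize(rows: list[dict]) -> dict:
    """Aggregate one run into the numbers a summary chart would show."""
    scores = [r["score"] for r in rows]
    failures = sum(1 for r in rows if not r["passed"])
    return {
        "n": len(rows),
        "mean_score": mean(scores),
        "median_score": median(scores),
        "failure_rate": failures / len(rows),
    }


def is_regression(baseline: dict, candidate: dict, max_increase: float = 0.02) -> bool:
    """Flag a regression when the candidate's failure rate rises by more than
    max_increase over the baseline (0.02 is an arbitrary example threshold)."""
    return candidate["failure_rate"] - baseline["failure_rate"] > max_increase


# Two runs of the same benchmark, e.g. tagged "main" and a feature branch.
main_run = [
    {"score": 0.9, "passed": True},
    {"score": 0.8, "passed": True},
    {"score": 0.4, "passed": False},
    {"score": 0.95, "passed": True},
]
branch_run = [
    {"score": 0.85, "passed": True},
    {"score": 0.3, "passed": False},
    {"score": 0.35, "passed": False},
    {"score": 0.9, "passed": True},
]

base = summarize(main_run)
cand = summarize(branch_run)
print(base["failure_rate"], cand["failure_rate"], is_regression(base, cand))
# prints: 0.25 0.5 True
```

The comparison assumes both runs cover the same benchmark rows. If tags are inconsistent and the row sets differ, a change in failure rate may reflect different inputs rather than a real regression.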

Row-level inspection

Open failing rows to view their inputs, outputs, and scorer rationales. Repeat this inspect-and-adjust loop to refine prompts, scorers, or preprocessing, then rerun the benchmark to confirm the change helped.
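The inspection loop can also be run against exported rows. The sketch below assumes each row carries `input`, `output`, `passed`, and `rationale` fields (hypothetical names, not a documented Frametail schema). It prints each failure and counts recurring rationales so the most common failure mode surfaces first.

```python
from collections import Counter

# Hypothetical exported rows; field names are assumptions for illustration.
rows = [
    {"input": "2+2?", "output": "4", "passed": True, "rationale": "Correct answer."},
    {"input": "Capital of France?", "output": "Lyon", "passed": False, "rationale": "Factually wrong."},
    {"input": "Summarize the memo", "output": "", "passed": False, "rationale": "Empty output."},
    {"input": "Translate 'hi'", "output": "", "passed": False, "rationale": "Empty output."},
]


def failing_rows(rows):
    """Keep only rows the scorer marked as failed."""
    return [r for r in rows if not r["passed"]]


def common_rationales(rows, k=3):
    """Count scorer rationales across failures to surface recurring failure modes."""
    return Counter(r["rationale"] for r in failing_rows(rows)).most_common(k)


for r in failing_rows(rows):
    print(f"IN: {r['input']!r}  OUT: {r['output']!r}  WHY: {r['rationale']}")

print(common_rationales(rows))
# prints: [('Empty output.', 2), ('Factually wrong.', 1)]
```

Grouping by exact rationale text only works when the scorer emits stable strings. Free-text rationales from an LLM judge would need normalizing or clustering before counting.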

Exporting

Where the UI offers it, export results as CSV or JSON for offline analysis or executive summaries.
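For offline analysis, here is a minimal sketch that loads an export and produces a per-branch failure rate suitable for a summary. It assumes a CSV with `branch`, `score`, and `passed` columns and a JSON array of objects with the same keys. These column names are hypothetical, and the inline strings stand in for downloaded files.

```python
import csv
import io
import json
from collections import defaultdict

# Stand-ins for exported files; in practice, read the downloaded file instead.
# Column names ("branch", "score", "passed") are hypothetical.
CSV_EXPORT = """branch,score,passed
main,0.9,true
main,0.4,false
feature-x,0.7,true
feature-x,0.2,false
feature-x,0.3,false
"""

JSON_EXPORT = '[{"branch": "main", "score": 0.9, "passed": true}]'


def load_csv(text):
    """Parse CSV rows, converting string fields to float and bool."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            "branch": rec["branch"],
            "score": float(rec["score"]),
            "passed": rec["passed"].strip().lower() == "true",
        })
    return rows


def load_json(text):
    """JSON already carries numbers and booleans, so no conversion is needed."""
    return json.loads(text)


def failure_rate_by_branch(rows):
    """Fraction of failed rows for each branch tag."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for r in rows:
        totals[r["branch"]] += 1
        if not r["passed"]:
            failures[r["branch"]] += 1
    return {b: failures[b] / totals[b] for b in totals}


print(failure_rate_by_branch(load_csv(CSV_EXPORT)))
# prints: {'main': 0.5, 'feature-x': 0.6666666666666666}
```

CSV loses types, so booleans and numbers must be converted explicitly. JSON preserves them, which makes it the safer choice when both formats are available.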