Release checklist
Steps before promoting a model or prompt to production traffic.
Pre-release
- Benchmark against a frozen dataset with pinned scorers.
- Compare p50/p95 latency in traces between old and new versions.
- Verify that every upstream provider in the request path still has error budget remaining.
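The latency comparison above could be automated along these lines. This is a minimal sketch, not a prescribed implementation: it assumes latencies have already been exported from traces as plain lists of milliseconds, and the function names and the 10% tolerance (`max_ratio=1.10`) are illustrative assumptions rather than recommended values.

```python
from statistics import quantiles


def percentile(samples, pct):
    """Return the pct-th percentile (1-99) of samples, using inclusive interpolation."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns 99 cut points; index pct - 1 is the pct-th percentile.
    return quantiles(samples, n=100, method="inclusive")[pct - 1]


def latency_regressions(old_ms, new_ms, max_ratio=1.10):
    """Return {name: (old, new)} for each percentile where new exceeds old * max_ratio.

    max_ratio is a hypothetical tolerance; pick one that matches your own SLOs.
    """
    regressions = {}
    for name, pct in (("p50", 50), ("p95", 95)):
        old_value = percentile(old_ms, pct)
        new_value = percentile(new_ms, pct)
        if new_value > old_value * max_ratio:
            regressions[name] = (old_value, new_value)
    return regressions
```

An empty result means neither percentile regressed beyond the tolerance; a non-empty result names the percentiles to investigate before promoting.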
Launch
- Roll out with canary traffic or tenant allowlists when possible.
- Tag traces with release identifiers for easy rollback analysis.
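One way to combine the two launch steps is sketched below: an allowlist check followed by deterministic hash-based canary bucketing, with the chosen release attached to trace metadata. All names here (`choose_release`, `trace_tags`, the `"v1"`/`"v2"` identifiers, the tag keys) are hypothetical; adapt them to whatever your router and tracing system expect.

```python
import hashlib


def choose_release(tenant_id, allowlist, canary_percent, stable="v1", candidate="v2"):
    """Pick the release for a tenant: allowlisted tenants always get the candidate."""
    if tenant_id in allowlist:
        return candidate
    # Hash the tenant ID so the same tenant lands in the same bucket on every request.
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return candidate if bucket < canary_percent else stable


def trace_tags(tenant_id, release):
    """Build tags to attach to a trace so it can be filtered by release later."""
    return {"tenant_id": tenant_id, "release": release}
```

Hashing rather than random sampling keeps each tenant on one version for the whole rollout, which makes the per-release traces easier to compare. Rolling back then amounts to setting `canary_percent` to 0 and emptying the allowlist.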
Post-release
- Monitor dashboards for 24–48 hours after launch, watching latency and error rates.
- Archive benchmark runs with links in your change ticket.