"Did my last edit make this file better — or worse?"
# Score what HEAD did to a file vs HEAD~1
$ judge score-edit src/auth.ts --judge code-quality
→ scoring before (HEAD~1)… 72.4
→ scoring after (HEAD)… 68.9
✗ verdict: REGRESSED weighted Δ −3.5
↓ input_validation 0.9 → 0.6 (−0.3)
↓ error_handling 0.8 → 0.5 (−0.3)
= type_safety 1.0 → 1.0 (·)
history: 12 runs → 71.6 → 71.8 → 72.4 → 68.9
scorecard: judge.app/s/8a7…f2c (cited evidence)
exit 2 (--fail-on-regression)
The wedge: edit scoring. judge scores the file at HEAD~1 and at HEAD, compares the two, and returns a verdict with per-metric deltas. Plug it into a pre-commit hook with --fail-on-regression and a regression exits non-zero (exit 2 in the run above), so you stop shipping silent quality drops.
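The verdict step can be sketched in a few lines of Python. This is only an illustration of the comparison logic, not judge's actual implementation: the `compare` function, the `Verdict` type, the `tolerance` threshold, and the IMPROVED and UNCHANGED labels are all assumptions made for the example. Only the REGRESSED label, the metric names, the scores, and exit code 2 come from the run above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Verdict:
    label: str                       # "REGRESSED", "IMPROVED", or "UNCHANGED"
    delta: float                     # after minus before, one decimal place
    metric_deltas: Dict[str, float]  # per-metric after minus before
    exit_code: int                   # 2 only when failing on a regression


def compare(
    before_score: float,
    after_score: float,
    before_metrics: Dict[str, float],
    after_metrics: Dict[str, float],
    fail_on_regression: bool = False,
    tolerance: float = 0.05,  # hypothetical noise floor, not a judge setting
) -> Verdict:
    """Compare two scorings of the same file and produce a verdict."""
    delta = round(after_score - before_score, 1)

    # Only compare metrics present in both runs.
    metric_deltas = {
        name: round(after_metrics[name] - before_metrics[name], 2)
        for name in before_metrics
        if name in after_metrics
    }

    if delta < -tolerance:
        label = "REGRESSED"
    elif delta > tolerance:
        label = "IMPROVED"
    else:
        label = "UNCHANGED"

    # Mirror the transcript: exit 2 when a regression should fail the hook.
    exit_code = 2 if (label == "REGRESSED" and fail_on_regression) else 0
    return Verdict(label, delta, metric_deltas, exit_code)


# Numbers taken from the run above.
verdict = compare(
    72.4,
    68.9,
    {"input_validation": 0.9, "error_handling": 0.8, "type_safety": 1.0},
    {"input_validation": 0.6, "error_handling": 0.5, "type_safety": 1.0},
    fail_on_regression=True,
)
print(verdict.label, verdict.delta, verdict.exit_code)
```

Rounding before comparing keeps floating-point noise out of the deltas. The tolerance is there so that tiny score fluctuations between runs would not flip the verdict; whether judge applies any such threshold is not stated in the source.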