Try it
See it in 30 seconds. No signup.
What you actually type, what you actually get back. One Node CLI (npx judge), three real workflows. Every score is a typed row in your dashboard with cited evidence — not a chat blob.
JUDGE_API_KEY=jdg_… · mint on /connect · BYOK Anthropic key works too
Free account, no card · BYOK pays Anthropic at cost: one Sonnet 4.6 call per score (~$0.01–$0.05) · First score in <5 min: `npm i -g judge` → `judge init` → `judge run`
01
"Did my last edit make this file better — or worse?"
# Score what HEAD did to a file vs HEAD~1
$ judge score-edit src/auth.ts --judge code-quality
→ scoring before (HEAD~1)… 72.4
→ scoring after (HEAD) … 68.9
✗ verdict: REGRESSED weighted Δ −3.5
↓ input_validation 0.9 → 0.6 (−0.3)
↓ error_handling 0.8 → 0.5 (−0.3)
= type_safety 1.0 → 1.0 (·)
history: 12 runs → 71.6 → 71.8 → 72.4 → 68.9
scorecard: judge.app/s/8a7…f2c (cited evidence)
exit 2 (--fail-on-regression)
The wedge: edit scoring. Diffs HEAD~1 → HEAD, scores both, returns a verdict with per-metric deltas. Plug it into a pre-commit hook with --fail-on-regression and you stop shipping silent quality drops.
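As a rough illustration of the hook idea, here is a minimal sketch of a helper that a git hook script could call. It uses only the `judge score-edit … --judge code-quality --fail-on-regression` invocation shown above; the function name `score_changed` is hypothetical, and passing `--fail-on-regression` directly to `score-edit` is assumed from the transcript's exit line.

```shell
#!/bin/sh
# Hypothetical hook helper: score each file passed as an argument.
# Assumes `judge score-edit` exits non-zero on a REGRESSED verdict
# when --fail-on-regression is set, as the transcript above suggests.
score_changed() {
  status=0
  for f in "$@"; do
    # Keep going after a failure so every file gets scored;
    # remember the most recent non-zero exit status.
    judge score-edit "$f" --judge code-quality --fail-on-regression || status=$?
  done
  return "$status"
}
```

A hook could feed it the files touched by the last commit, for example `score_changed $(git diff --name-only HEAD~1 HEAD)` (paths containing spaces would need extra handling). Because the comparison is HEAD~1 against HEAD, the hook needs to run at a point where the commit you want scored already exists.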
02
"Does my landing page actually land for my buyer?"
# Score a live URL through a buyer persona
$ judge score \
    --judge sarah-k-founder \
    --url https://acme.com/pricing
→ fetch + visible-text extract (1.4s)
→ submit_score (forced tool-use)
● overall 72.4 / 100 △ +1.8 IMPROVED
✓ value_clarity 0.85 (+0.10)
✓ pricing_clarity 0.70 (+0.20)
✗ social_proof 0.40 (·)
rationale (cited):
"Starter $29" — clear price; no annual toggle
no logos / quotes above the fold
A persona judge reads the page through a specific buyer's lens (optionally grounded in a crawl of your product). Ship a copy change, re-run, and see whether the persona's pricing_clarity metric moved before real users ever see the page.
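To re-run the same persona across more than one page after a copy change, a small wrapper is enough. This is a sketch only: `score_pages` is a hypothetical name, and it relies on nothing beyond the `judge score --judge … --url …` form shown above.

```shell
#!/bin/sh
# Hypothetical wrapper: score several URLs through one persona judge.
# Usage: score_pages <persona> <url> [<url> ...]
score_pages() {
  persona=$1
  shift
  for url in "$@"; do
    # Stop at the first failing call and propagate its exit status.
    judge score --judge "$persona" --url "$url" || return $?
  done
}
```

For example, `score_pages sarah-k-founder https://acme.com/pricing` reproduces the single run above; additional URLs are scored in order with the same persona.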
03
"Block regressions in CI without writing glue."
# Once: scaffold .judge/config.json + GH Action
$ judge init && judge install gh-actions
# On every PR (CI):
$ judge run --fail-on-regression
→ syncing 3 pipelines from .judge/config.json
✓ pricing-page 78.2 IMPROVED (+2.1)
✓ pr-diff:auth 74.0 STABLE (·)
✗ readme-clarity 61.5 REGRESSED (−4.3)
1 regression → exit 1, build fails
PR comment posted with 3 scorecards
Check in .judge/config.json with named pipelines (PR diff, landing page, README, anything). One judge run in CI scores them all, fails the build on any REGRESSED verdict, and posts the scorecard URLs in a PR comment.
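If you would rather wire the CI step by hand than use the scaffolded action, the gate can be sketched as a short shell function. The name `ci_gate` and the exit code 64 for a missing key are assumptions for illustration; the only judge-specific pieces are `JUDGE_API_KEY` and `judge run --fail-on-regression`, both taken from the text above.

```shell
#!/bin/sh
# Hypothetical CI gate: fail early if the API key is missing,
# otherwise let `judge run` decide the build result.
ci_gate() {
  if [ -z "${JUDGE_API_KEY:-}" ]; then
    echo "JUDGE_API_KEY is not set" >&2
    return 64   # arbitrary "misconfigured" code chosen for this sketch
  fi
  # The function's exit status is whatever `judge run` returns,
  # so a regression (non-zero exit) fails the calling CI step.
  judge run --fail-on-regression
}
```

Calling `ci_gate` as the last command of a CI step means a missing key and a regression both fail the build, with different exit codes to tell them apart in the logs.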