ENG-1234 · "Fix slow dashboard" × Spec Completeness Judge
5/8/2026, 8:01:15 AM · Edited by gallery-seed
Scores a product / engineering spec on whether it gives a team enough to build without follow-up questions.
- Acceptance criteria: 21%
- Problem clarity: 19%
- Success metric: 18%
Score: 15.0 (first iteration — no prior baseline).
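The 15.0 overall score is consistent with a weighted average of the per-dimension scores below. As a sketch of that arithmetic: the five dimensions shown carry weights summing to 89%, so the sixth row here ("Owner / DoD" at 11% with score 1.0/10) is an assumption that closes the gap to 100% — the report does not state it.

```python
# Weights (as fractions) and 0-10 scores for the five dimensions are
# taken from the report; the "Owner / DoD" row is an assumed sixth
# dimension, not published in the breakdown.
dimensions = {
    "Acceptance criteria": (0.21, 1.0),
    "Problem clarity":     (0.19, 2.0),
    "Success metric":      (0.18, 1.0),
    "Scope boundaries":    (0.16, 2.0),
    "Edge cases":          (0.15, 2.0),
    "Owner / DoD":         (0.11, 1.0),  # assumed, not in the report
}

# Per-dimension scores are on a 0-10 scale; the overall score is 0-100.
overall = 10 * sum(weight * score for weight, score in dimensions.values())
print(round(overall, 1))  # → 15.0
```

Under that assumption the weighted sum reproduces the reported 15.0 exactly.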
Why the judge scored it this way
This spec scores poorly across all dimensions. The problem statement relies entirely on anecdotal, unmeasured language ("feels slow," "forever to load," "looks bad") with no baseline numbers or reproducible conditions. Two separate concerns — query/caching latency and pie-chart flicker — are mixed into one ticket alongside a speculative third issue ("possibly related to new analytics rollout"), making scope impossible to bound. There are zero acceptance criteria and no measurable success metric; the closest thing to a target is "maybe add some caching." Ownership is assigned to "someone from platform," which is not actionable, and no definition of done exists anywhere in the spec.
1. Acceptance criteria · Poor · weight 21% · score 1.0/10
ACs are mechanical and testable.
Diagnosis: There are zero mechanically testable ACs; the entire spec contains only suggestions ('should probably look at queries,' 'maybe add some caching') with no pass/fail criteria.
Do this next: Add numbered ACs in Given/When/Then or checklist form, e.g. 'Given a user with ≥500 rows of data, when the homepage loads, then Time-to-Interactive is ≤2 s in a Lighthouse CI run on the main branch.'
2. Success metric · Poor · weight 18% · score 1.0/10
A measurable signal of success is named.
Diagnosis: No measurable success signal is defined anywhere; 'feels slow' and 'looks bad' are the only proxies offered, and neither is quantified.
Do this next: Define at least one instrumented metric with a target and measurement method, e.g. 'Homepage P95 server response time drops from the current X ms to ≤500 ms, as tracked in Datadog dashboard [link].'
3. Problem clarity · Poor · weight 19% · score 2.0/10
Problem statement is concrete and bounded.
Diagnosis: The problem is described entirely in subjective, unbounded terms — 'feels slow,' 'forever to load,' 'looks bad' — with no baseline measurement, no affected user count, and no reproducible scenario.
Do this next: Replace vague language with concrete data: add a sentence like 'P95 homepage TTI measured at X s on YYYY-MM-DD via [tool]; target is ≤Y s' and specify which user segments, browsers, and data sizes reproduce the issue.
4. Scope boundaries · Poor · weight 16% · score 2.0/10
Explicit in-scope and out-of-scope.
Diagnosis: Two distinct concerns (load-time performance and pie-chart flicker) are bundled together with no explicit in-scope/out-of-scope list, and a third speculative concern ('possibly related to analytics rollout') is left dangling.
Do this next: Split into at least two separate tickets (query/caching performance vs. chart flicker) and add explicit non-goals, e.g. 'Out of scope: analytics pipeline changes, mobile dashboard, other pages.'
5. Edge cases · Poor · weight 15% · score 2.0/10
Realistic edge / failure cases are enumerated.
Diagnosis: The possible analytics-rollout regression is noted but not investigated or structured as a hypothesis; no other failure modes (e.g., large datasets, concurrent users, cache invalidation, empty-state rendering) are considered.
Do this next: Add an edge-cases section listing scenarios to test: empty dataset, >10 k rows, cache miss on cold start, simultaneous refresh during a data-pipeline run, and the analytics-rollout regression — each with an expected outcome.
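Several of the "Do this next" items above lean on a percentile target like "P95 ≤ 500 ms." As a minimal sketch of what a mechanically testable check for such a target could look like — the sample latencies and the 500 ms threshold are placeholders, not data from any real dashboard:

```python
import math

def p95(samples_ms):
    """Nearest-rank 95th percentile of a list of latency samples (ms)."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = math.ceil(0.95 * len(ordered))  # nearest-rank method
    return ordered[rank - 1]

# Hypothetical homepage latency samples, in ms (18 values 100-440,
# plus 490 and one 700 ms outlier that P95 tolerates).
samples = list(range(100, 460, 20)) + [490, 700]

TARGET_MS = 500  # placeholder target from the example AC
value = p95(samples)
print(f"P95={value} ms, target <={TARGET_MS} ms:",
      "PASS" if value <= TARGET_MS else "FAIL")
# → P95=490 ms, target <=500 ms: PASS
```

A check like this, fed from real instrumentation, is what turns "feels slow" into a pass/fail acceptance criterion.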
Metrics
- Problem clarity: 20%
- Scope boundaries: 20%
- Edge cases: 20%
- Acceptance criteria: 10%
- Success metric: 10%
- Owner / DoD: 10%
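A quick sanity check on the list above: the six published weights sum to 90%, not 100%, so either a dimension is missing from the public view or the judge normalizes weights internally. A throwaway check (dimension names copied verbatim from the list) makes this visible before scores are aggregated:

```python
# Dimension weights exactly as listed in the Metrics section, in percent.
weights = {
    "problem clarity": 20,
    "scope boundaries": 20,
    "edge cases": 20,
    "acceptance criteria": 10,
    "success metric": 10,
    "owner dod": 10,
}

total = sum(weights.values())
# The listed weights sum to 90%, not 100% -- a check like this catches
# the gap before the weighted average is computed.
print(f"listed weights sum to {total}%")  # → listed weights sum to 90%
```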
spec-completeness · Public link · read-only