Why our spec templates look this way

Version: 1.0.0
Last Updated: 2026-04-17
Audience: New AI agents and humans authoring or reviewing specs in this repo. Read this once before authoring your first spec; refer back when a template constraint feels arbitrary.

TL;DR

Our spec templates are domain-specific guardrails for a multi-tenant, regulated, brownfield healthcare ERP. Every section, every required field, every checklist exists because of a specific failure mode we have already paid for or a specific compliance obligation we cannot opt out of. This document explains why each non-obvious template constraint exists. If a template rule feels redundant, read the corresponding section here before changing it.

Hierarchy reminder: `constitution.md` > `AI_GUIDE.md` > `AGENTS.md` > `.cursor/rules/*` > this file. This file documents intent; the constitution defines the rules.

Why we have a template at all (and not just “write a spec”)

Failure mode: Free-form specs in a 12-core repo with 7,400+ source files produce silent overlap, unsafe migrations, and missed regulatory requirements.

Mitigation: A constrained template that names the failure modes explicitly and forces authors (human and AI) to address them.

Cross-reference: `SPEC_TEMPLATE.md` (main entry point).

Why we split the spec into four files

| File | Sections | Why split |
| --- | --- | --- |
| `SPEC_TEMPLATE.md` | Header, Decision Tree, Functional / Non-Functional Reqs, Success Criteria, Data Model, Settings, RLS, API Design, Integration Points, Performance, Security, Regulatory, Testing Strategy, Rollback, Known Limitations, Success Metrics, References | Stays under ~500 lines for readable diffs and reviewer focus. |
| `SPEC_TEMPLATE_OVERVIEW.md` | Pre-Planning Clarification, Business Context, Objectives, Scope, Spec landscape & boundary, User Stories, Edge Cases | “What and why”: owned by product / domain SMEs. |
| `SPEC_TEMPLATE_UI_UX.md` | UI/UX Requirements (dialog tiers, shared components, mobile, a11y), Wizard & Help UX | UI patterns are dense, evolve independently, and have their own reviewer (`ux-reviewer`). |
| `SPEC_TEMPLATE_IMPLEMENTATION_AND_ERRATA.md` | Implementation Plan, Documentation, Future Enhancements, Errata, Completion Tracking, Related Features, References | “How and what changed”: owned by implementers; isolates post-Phase-1 rename mappings. |

Failure mode this avoids: Single 1,500-line spec files where reviewers skim the middle and important constraints (RLS, regulatory mapping, error handling) get missed.

Why `## Spec landscape & boundary` is required

Failure mode (real): Two specs in the same core re-specify the same table or workflow with conflicting columns / ownership, or an integration is “owned” by no one.

Mitigation: Every full-template spec must include `## Spec landscape & boundary` after Scope. The section forces the author to (a) name overlapping specs, (b) declare ownership verdicts (Delegates / Extends / Duplicates), and (c) resolve conflicts before `validate-spec` will pass.

Enforcement: `validate-spec` warns when the section is missing or empty; `spec-reviewer` Stage 2i (Scope & Content Overlap) blocks PASS when an overlap has verdict = Duplicates.

Cross-reference: `.cursor/agents/spec-reviewer.md` § Required research; `.cursor/commands/specs/spec-landscape.md`.
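The Stage 2i blocking rule and the `validate-spec` warning can be sketched as follows, assuming the landscape section has already been parsed into one row per overlapping spec. `OverlapRow`, `landscapeVerdict`, and the spec IDs in the example are hypothetical; this is not the actual reviewer implementation, and it folds the two tools' checks into one function for brevity.

```typescript
// Ownership verdicts named by the template.
type OverlapVerdict = "Delegates" | "Extends" | "Duplicates";

// Hypothetical parsed row from the `## Spec landscape & boundary` section.
interface OverlapRow {
  specId: string; // the overlapping spec, e.g. "CL-12" (illustrative)
  verdict: OverlapVerdict;
}

type LandscapeResult =
  | { status: "PASS" }
  | { status: "WARN"; reason: string }
  | { status: "BLOCK"; duplicates: string[] };

function landscapeVerdict(rows: OverlapRow[] | undefined): LandscapeResult {
  // Missing or empty section: validate-spec only warns.
  if (rows === undefined || rows.length === 0) {
    return { status: "WARN", reason: "Spec landscape & boundary is missing or empty" };
  }
  // Any Duplicates verdict: spec-reviewer Stage 2i blocks PASS.
  const duplicates = rows
    .filter((row) => row.verdict === "Duplicates")
    .map((row) => row.specId);
  return duplicates.length > 0 ? { status: "BLOCK", duplicates } : { status: "PASS" };
}

const result = landscapeVerdict([
  { specId: "CL-12", verdict: "Extends" },
  { specId: "CL-15", verdict: "Duplicates" },
]);
console.log(JSON.stringify(result)); // {"status":"BLOCK","duplicates":["CL-15"]}
```

Delegates and Extends both pass here because only duplication signals the conflicting-ownership failure mode this section targets.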

Why `Errata` is required after Phase 1

Failure mode (real): Phase 1 migrations rename or restructure columns relative to the spec’s draft Data Model. Phase 2 / 3 code that uses the spec’s column names then breaks at runtime or, worse, silently uses a stale column.

Mitigation: After the Phase 1 migration is applied and `types.ts` is regenerated, the author MUST update the Errata: Column Name Mapping table in the spec. Phase 2 / 3 code uses types from `types.ts`, not from the spec.

Enforcement: `spec-reviewer` Stage 2d checks that every new table in the spec has either (a) matching `types.ts` entries or (b) an explicit Errata row. T4 in `TASKS_TEMPLATE.md` verifies the schema and updates Errata.

Cross-reference: `SPEC_TEMPLATE_IMPLEMENTATION_AND_ERRATA.md` § Errata.
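A minimal sketch of the Stage 2d check, and of what the Errata table records for Phase 2 / 3 code. It assumes table names have already been extracted from the regenerated `types.ts`; `ErrataRow`, `tablesMissingErrata`, `resolveColumn`, and the `cl_*` table names are hypothetical.

```typescript
// Hypothetical row of the "Errata: Column Name Mapping" table.
interface ErrataRow {
  table: string;
  specColumn: string;   // name used in the spec's draft Data Model
  actualColumn: string; // name after the Phase 1 migration (as in types.ts)
}

// Stage 2d-style check: every new spec table needs a types.ts entry or an Errata row.
function tablesMissingErrata(
  specTables: string[],
  generatedTables: Set<string>, // table names extracted from the regenerated types.ts
  errata: ErrataRow[],
): string[] {
  const tablesWithErrata = new Set(errata.map((row) => row.table));
  return specTables.filter(
    (table) => !generatedTables.has(table) && !tablesWithErrata.has(table),
  );
}

// Maps a spec column name to the name that actually exists after Phase 1.
function resolveColumn(table: string, specColumn: string, errata: ErrataRow[]): string {
  const row = errata.find((r) => r.table === table && r.specColumn === specColumn);
  return row ? row.actualColumn : specColumn;
}

const errata: ErrataRow[] = [
  { table: "cl_assessment_items", specColumn: "score", actualColumn: "raw_score" },
];
console.log(JSON.stringify(tablesMissingErrata(
  ["cl_assessments", "cl_assessment_items", "cl_notes"],
  new Set(["cl_assessments"]),
  errata,
))); // ["cl_notes"]
console.log(resolveColumn("cl_assessment_items", "score", errata)); // raw_score
```

Real Phase 2 / 3 code should import the generated types directly; `resolveColumn` only illustrates the mapping the Errata table documents for reviewers.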

Why `Regulatory & Compliance Requirements` is mandatory in regulated cores

Failure mode: Healthcare features that ship without a regulatory mapping accumulate “we’re sure HIPAA covers this” assumptions that fail audits.

Mitigation: For CL, PM, HR, RH, GR, FA, IT, and CE cores (or any spec touching PHI / PII / billing / credentialing / tax / security / communications), the spec MUST include the Regulatory & Compliance Requirements section with:

A regulation table mapping each applicable regulation (with section / rule citation) to one or more FRs.
A regulatory deadline (or N/A).
Compliance Acceptance Criteria.
A Compliance Review row (with reviewer / date / determination — even if Pending).

Enforcement: `spec-reviewer` Stage 3 (3a–3f); `compliance-reviewer` (Claude Code) for deeper audit; `validate-spec` warns when a regulated-core spec has no regulatory section.

Cross-reference: `AGENTS.md` § Regulatory Compliance Decision Tree; `docs/compliance/REGULATORY_COMPLIANCE_TRACKER.md`.
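The trigger condition and the four required parts can be expressed as a checklist. This is a hedged sketch, assuming the spec has already been parsed into a `SpecSummary`; the types and `regulatoryFindings` are hypothetical and only approximate what `validate-spec` and `spec-reviewer` Stage 3 look for.

```typescript
// Cores for which the section is mandatory, as listed above.
const REGULATED_CORES = ["CL", "PM", "HR", "RH", "GR", "FA", "IT", "CE"];

// Hypothetical parsed form of the Regulatory & Compliance Requirements section.
interface RegulatorySection {
  regulations: { citation: string; frIds: string[] }[]; // regulation -> FR mapping
  deadline: string;                                     // a date, or "N/A"
  acceptanceCriteria: string[];                         // Compliance Acceptance Criteria
  complianceReview?: { reviewer: string; date: string; determination: string };
}

interface SpecSummary {
  id: string;                    // "{CORE}-##", e.g. "CL-15"
  touchesSensitiveData: boolean; // PHI / PII / billing / credentialing / tax / security / communications
  regulatory?: RegulatorySection;
}

function regulatoryFindings(spec: SpecSummary): string[] {
  const core = spec.id.split("-")[0].toUpperCase();
  const required = REGULATED_CORES.indexOf(core) !== -1 || spec.touchesSensitiveData;
  if (!required) return [];

  const section = spec.regulatory;
  if (!section) return ["missing Regulatory & Compliance Requirements section"];

  const findings: string[] = [];
  if (section.regulations.length === 0) findings.push("regulation table is empty");
  for (const reg of section.regulations) {
    if (reg.frIds.length === 0) findings.push(`${reg.citation} is not mapped to any FR`);
  }
  if (section.deadline.trim() === "") findings.push("regulatory deadline missing (use N/A if none)");
  if (section.acceptanceCriteria.length === 0) findings.push("no Compliance Acceptance Criteria");
  // A Pending determination is acceptable; a missing row is not.
  if (!section.complianceReview) findings.push("no Compliance Review row");
  return findings;
}

console.log(regulatoryFindings({ id: "CL-15", touchesSensitiveData: false }));
```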

Why state-specific rules belong in PF-96, not in code

Failure mode (real): AHCCCS / Arizona-specific constants (assessment-element counts, filing deadlines, billing thresholds, attestation text) were hardcoded as global constants. When the platform expands to a second state, every such constant becomes a regression risk.

Mitigation: State Medicaid rules live in PF-96 jurisdiction profiles. Specs that involve state-variable rules MUST reference PF-96, declare jurisdiction scope (org-level vs site-level), and call `useJurisdictionProfile()` / `pf_resolve_jurisdiction_profile()` instead of hardcoding constants. Arizona is the default profile; its rules are not universal.

Enforcement: `validate-spec` Jurisdiction Scope Validation (CL/PM/GR cores); `spec-reviewer` flags hardcoded state constants; PR-level Bugbot rules.

Cross-reference: `specs/pf/specs/PF-96-medicaid-state-compliance-configuration.md`; `specs/cross-cutting/PF-96-MIGRATION-PLAN-ahcccs-to-jurisdiction-profiles.md`.
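The difference between a hardcoded constant and a profile lookup might look like this. Everything here is hypothetical: `JurisdictionProfile`, its fields, the `NM` entry, and all numeric values are illustrative placeholders, not real AHCCCS or other state rules, and the rule that a site-level state overrides an org-level one is an assumption. Real code would call `useJurisdictionProfile()` / `pf_resolve_jurisdiction_profile()`.

```typescript
// Hypothetical subset of a PF-96 jurisdiction profile.
interface JurisdictionProfile {
  state: string;
  filingDeadlineDays: number;
  assessmentElementCount: number;
}

// Illustrative placeholder values, NOT actual state Medicaid rules.
const PROFILES: Record<string, JurisdictionProfile> = {
  AZ: { state: "AZ", filingDeadlineDays: 30, assessmentElementCount: 12 },
  NM: { state: "NM", filingDeadlineDays: 45, assessmentElementCount: 10 },
};

// Arizona is the default profile when no scope is configured.
const DEFAULT_STATE = "AZ";

// Assumption: site-level scope overrides org-level scope when both are set.
function resolveJurisdictionProfile(orgState?: string, siteState?: string): JurisdictionProfile {
  const state = siteState ?? orgState ?? DEFAULT_STATE;
  const profile = PROFILES[state];
  if (!profile) throw new Error(`No jurisdiction profile configured for ${state}`);
  return profile;
}

// Anti-pattern this replaces: const FILING_DEADLINE_DAYS = 30; (silently wrong outside AZ)
function filingDueDate(serviceDate: Date, profile: JurisdictionProfile): Date {
  const due = new Date(serviceDate.getTime());
  due.setUTCDate(due.getUTCDate() + profile.filingDeadlineDays);
  return due;
}

const due = filingDueDate(new Date(Date.UTC(2026, 0, 1)), resolveJurisdictionProfile());
console.log(due.toISOString()); // 2026-01-31T00:00:00.000Z
```

The sketch throws for an unconfigured state rather than silently falling back to Arizona, which keeps Arizona the default for unscoped orgs without letting its rules leak into other jurisdictions.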

Why TASKS files are self-contained for Lovable

Failure mode: Tasks generated for Lovable AI assume Cursor rules are loaded. They aren’t: Lovable has its own runtime and no awareness of `.cursor/rules/*`. Tasks that say “follow our standard query patterns” produce code that misses tenant filters, error sanitization, or skeleton loaders.

Mitigation: `TASKS_TEMPLATE.md` includes the “Implementation rules for Lovable / external AI” block at the top of every generated TASKS file. The block restates the top patterns (tenant filter, `sanitizeErrorMessage`, `<Skeleton />`, `staleTime`/`gcTime`, `React.lazy`, `useCurrentUser`, no cross-core imports, RLS `WITH CHECK`, semantic tokens, dialog tier classes).

Enforcement: `generate-tasks` always includes the block; `.cursor/rules/spec-and-task-authoring.mdc` requires it.

Cross-reference: `docs/development/CURSOR_AND_LOVABLE_WORKFLOW.md`; `docs/tools/lovable/LOVABLE_CUSTOM_KNOWLEDGE.md`.
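Because Lovable never sees `.cursor/rules/*`, the rules have to travel inside the file itself. A minimal sketch of that idea, assuming a hypothetical heading and a paraphrased rule list (the authoritative wording lives in `TASKS_TEMPLATE.md`); `renderTasksFile` and `hasRulesBlockFirst` are illustrative names, not the actual `generate-tasks` code.

```typescript
// Paraphrase of the patterns listed above; not the template's exact wording.
const LOVABLE_RULES: readonly string[] = [
  "Tenant filter on every query",
  "sanitizeErrorMessage for user-facing errors",
  "<Skeleton /> for loading states",
  "staleTime/gcTime on queries",
  "React.lazy for code splitting",
  "useCurrentUser for the current user",
  "No cross-core imports",
  "RLS policies include WITH CHECK",
  "Semantic tokens",
  "Dialog tier classes",
];

// Hypothetical heading for the block.
const RULES_HEADING = "## Implementation rules for Lovable / external AI";

// Prepends the rules block so the file is self-contained for an external AI.
function renderTasksFile(tasksBody: string): string {
  const block = [RULES_HEADING, "", ...LOVABLE_RULES.map((rule) => `- ${rule}`)].join("\n");
  return `${block}\n\n${tasksBody}`;
}

// Check in the spirit of spec-and-task-authoring.mdc: the block must come first.
function hasRulesBlockFirst(tasksFile: string): boolean {
  return tasksFile.replace(/^\s+/, "").startsWith(RULES_HEADING);
}

console.log(renderTasksFile("## T1: Create table"));
```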

Why TASKS supports two layouts (`by-layer` and `by-story`)

Failure mode (by-layer only): Specs with multiple user stories where everything must complete before anything ships, leading to long feedback cycles, big-bang releases, and late discovery of UX problems.

Mitigation: The by-story layout (added in PR-2) lets a feature ship after the P1 user story phase. Per-story Checkpoints turn each story into a verifiable, demoable, shippable unit. The by-layer layout remains the default for single-vertical specs and infrastructure work, where vertical slicing buys nothing.

Enforcement: `generate-tasks` auto-selects the layout based on whether the spec has ≥ 2 prioritized stories with Independent Tests (PR-1 conventions). Override manually with `--by-story` / `--by-layer`.

Cross-reference: `TASKS_TEMPLATE.md` § Layout selection; `TASKS_GENERATION_GUIDE.md` § Layout Selection.
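The auto-selection rule and the manual override fit in a few lines. A sketch under stated assumptions: `UserStory`, `selectLayout`, and `storyPhaseOrder` are hypothetical, and ordering by-story phases by priority is inferred from the P1-first shipping goal rather than taken from `generate-tasks` itself.

```typescript
type Priority = "P1" | "P2" | "P3";
type Layout = "by-story" | "by-layer";

interface UserStory {
  id: string; // e.g. "US-1"
  priority?: Priority;
  independentTest?: string;
}

// >= 2 prioritized stories that each have an Independent Test -> by-story;
// otherwise by-layer. An explicit flag always wins.
function selectLayout(stories: UserStory[], override?: "--by-story" | "--by-layer"): Layout {
  if (override === "--by-story") return "by-story";
  if (override === "--by-layer") return "by-layer";
  const sliceable = stories.filter(
    (s) => s.priority !== undefined && (s.independentTest ?? "").trim() !== "",
  );
  return sliceable.length >= 2 ? "by-story" : "by-layer";
}

// In by-story layout, phases run in priority order so the P1 phase (the MVP) ships first.
function storyPhaseOrder(stories: UserStory[]): string[] {
  const rank: Record<Priority, number> = { P1: 1, P2: 2, P3: 3 };
  return stories
    .filter((s) => s.priority !== undefined)
    .sort((a, b) => rank[a.priority!] - rank[b.priority!])
    .map((s) => s.id);
}

const exampleStories: UserStory[] = [
  { id: "US-2", priority: "P2", independentTest: "Export one week of bookings" },
  { id: "US-1", priority: "P1", independentTest: "Book one slot end to end" },
];
console.log(selectLayout(exampleStories), storyPhaseOrder(exampleStories));
```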

Why `tests-first` is the default for new business logic and RLS

Failure mode (real): Tests written after RLS migrations have been deployed often validate the wrong thing. By the time a regression test catches a cross-tenant leak, the leak has already shipped.

Mitigation: Use a tests-first posture for new business logic, and always for RLS regardless of the spec’s overall TDD posture. Tests must exist and FAIL before any implementation task in the same story / phase begins.

Enforcement: `generate-tasks --tests-first` (auto-selected); `TASKS_GENERATION_GUIDE.md` § TDD Posture; a mandatory `**Verify:** Run the test; expected output: FAIL …` step on each test task; the `tdd-workflow` skill.
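The ordering constraint within one story / phase can be checked mechanically. This sketch assumes a hypothetical `Task` model with a flag recording whether a test task carries the mandatory FAIL verification step; `testsFirstViolations` is illustrative, not the `generate-tasks` or `tdd-workflow` implementation.

```typescript
// Hypothetical task model for one story / phase of a TASKS file.
interface Task {
  id: string; // e.g. "T5"
  kind: "test" | "implementation";
  verifyExpectsFail?: boolean; // true when the task has the "expected output: FAIL" Verify step
}

// Flags test tasks that appear after implementation has started, and test tasks
// that lack the mandatory FAIL verification step.
function testsFirstViolations(phaseTasks: Task[]): string[] {
  const problems: string[] = [];
  const firstImpl = phaseTasks.findIndex((t) => t.kind === "implementation");
  phaseTasks.forEach((task, index) => {
    if (task.kind !== "test") return;
    if (firstImpl !== -1 && index > firstImpl) {
      problems.push(`${task.id}: test task comes after implementation began`);
    }
    if (!task.verifyExpectsFail) {
      problems.push(`${task.id}: missing Verify step expecting FAIL`);
    }
  });
  return problems;
}

console.log(testsFirstViolations([
  { id: "T1", kind: "implementation" },
  { id: "T2", kind: "test" },
]));
```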

Why we use `[NEEDS CLARIFICATION: <question>]` markers inline

Failure mode: Authors silently guess when they don’t know. The guess looks like a real requirement, makes it through review, and produces working software that does the wrong thing.

Mitigation: Mark unknowns inline at the point of ambiguity (e.g. `**FR-7:** Authenticate via [NEEDS CLARIFICATION: SSO, password, magic link?]`). `clarify-spec` migrates them to the Clarifications table on resolution; `validate-spec` warns when any remain (FAILs at complexity score ≥ 4); `spec-reviewer` blocks Stage 2 PASS for score ≥ 4 specs with unresolved markers.

Inspired by: GitHub Spec Kit (R1 from the comparison review).
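A minimal sketch of the marker scan and the warn / FAIL threshold described above. `findClarificationMarkers` and `clarificationSeverity` are hypothetical names; the actual `validate-spec` logic may differ, for example in how it computes the complexity score.

```typescript
type Severity = "OK" | "WARN" | "FAIL";

// Returns the open question inside each [NEEDS CLARIFICATION: ...] marker.
function findClarificationMarkers(specText: string): string[] {
  // A fresh regex per call avoids shared lastIndex state across calls.
  const marker = /\[NEEDS CLARIFICATION:\s*([^\]]+)\]/g;
  const questions: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = marker.exec(specText)) !== null) {
    questions.push(match[1].trim());
  }
  return questions;
}

// Any remaining marker warns; at complexity score >= 4 it fails.
function clarificationSeverity(specText: string, complexityScore: number): Severity {
  if (findClarificationMarkers(specText).length === 0) return "OK";
  return complexityScore >= 4 ? "FAIL" : "WARN";
}

const fr7 = "**FR-7:** Authenticate via [NEEDS CLARIFICATION: SSO, password, magic link?]";
console.log(findClarificationMarkers(fr7)); // [ 'SSO, password, magic link?' ]
```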

Why every user story needs a priority and an Independent Test

Failure mode: Flat user stories with no priority make an MVP impossible to define. Without an Independent Test, you can’t verify “did this story ship correctly?” in isolation, which means by-story TASKS phases can’t have Checkpoints.

Mitigation: Every `### US-N: …` heading ends with `(Priority: P1)`, `(Priority: P2)`, or `(Priority: P3)` and is followed by a `**Why this priority:**` line, an `**Independent Test:**` line, and at least one Given … When … Then … Acceptance Scenario. At least one story must be P1; the P1 stories are the MVP.

Enforcement: `validate-spec` warns; `spec-reviewer` Stage 2 FAILs when any of these is missing.

Inspired by: GitHub Spec Kit (R2 from the comparison review).
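The four per-story requirements are regular enough to check with string matching. This is a rough sketch, assuming the spec is available as markdown text; `checkStories` and its regular expressions are hypothetical and looser than a real parser would be.

```typescript
type StoryPriority = "P1" | "P2" | "P3";

interface StoryFindings {
  heading: string;
  priority?: StoryPriority;
  missing: string[];
}

// Expected heading shape: "### US-N: <title> (Priority: P1)" (or P2 / P3).
const STORY_HEADING = /^### US-\d+: .+\(Priority: (P[123])\)\s*$/;

function checkStories(specText: string): { findings: StoryFindings[]; hasP1: boolean } {
  // Split before every story heading, then drop any text preceding the first one.
  const chunks = specText.split(/^(?=### US-)/m).filter((c) => c.startsWith("### US-"));
  const findings = chunks.map((chunk): StoryFindings => {
    const heading = chunk.split("\n")[0];
    const missing: string[] = [];
    const match = STORY_HEADING.exec(heading);
    if (!match) missing.push("(Priority: P1 | P2 | P3) suffix");
    if (chunk.indexOf("**Why this priority:**") === -1) missing.push("**Why this priority:** line");
    if (chunk.indexOf("**Independent Test:**") === -1) missing.push("**Independent Test:** line");
    if (!/Given[\s\S]+When[\s\S]+Then/.test(chunk)) missing.push("Given … When … Then … scenario");
    return { heading, priority: match ? (match[1] as StoryPriority) : undefined, missing };
  });
  // At least one P1 story is required: that is the MVP.
  const hasP1 = findings.some((f) => f.priority === "P1");
  return { findings, hasP1 };
}
```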

Why `Constitution Check 🚦` is a hard gate at the top of the plan

Failure mode: A long Constitutional Compliance checklist at the bottom of a plan reads as “things to remember”: easy to skim, and easy for violations to creep in. By the time a violation is found, the redesign cost is high.

Mitigation: A short-form gate table with gates G1–G10 at the top of the plan. The plan cannot proceed to Phase 0 research until all gates pass (or failures are explicitly tracked in Complexity Tracking with a rationale). Gates are re-checked after Phase 1 design.

Enforcement: `validate-plan` checks that the gate table is present, that every gate has a verdict (no ⬜), that Phase 1 completion triggers a Post-Phase-1 re-check, and that every Failed gate has a corresponding Complexity Tracking row.

Inspired by: GitHub Spec Kit (R5 from the comparison review).
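The `validate-plan` checks listed above can be sketched as one function over a parsed plan. `PlanSummary`, `validatePlanGates`, `canStartPhase0`, and the `"Passed"` verdict label are assumptions; only ⬜ and Failed appear in this document.

```typescript
// ⬜ = not yet evaluated. The "Passed" label is an assumption.
type GateVerdict = "⬜" | "Passed" | "Failed";

interface Gate {
  id: string; // "G1" … "G10"
  verdict: GateVerdict;
}

interface PlanSummary {
  gates: Gate[];
  complexityTrackingGateIds: string[]; // gates that have a Complexity Tracking row
  phase1Complete: boolean;
  hasPostPhase1Recheck: boolean;
}

function validatePlanGates(plan: PlanSummary): string[] {
  const errors: string[] = [];
  if (plan.gates.length === 0) errors.push("Constitution Check gate table is missing");
  for (const gate of plan.gates) {
    if (gate.verdict === "⬜") errors.push(`${gate.id} has no verdict`);
    if (gate.verdict === "Failed" && plan.complexityTrackingGateIds.indexOf(gate.id) === -1) {
      errors.push(`${gate.id} failed without a Complexity Tracking row`);
    }
  }
  if (plan.phase1Complete && !plan.hasPostPhase1Recheck) {
    errors.push("Phase 1 is complete but the Post-Phase-1 re-check is missing");
  }
  return errors;
}

// Phase 0 research may start only when every gate passed or its failure is tracked.
function canStartPhase0(plan: PlanSummary): boolean {
  return validatePlanGates(plan).length === 0;
}
```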

Why `Success Criteria (SC-NNN)` is separate from `Success Metrics`

Failure mode: Mixing “the system handles 1,000 concurrent users” with “80% adoption within 30 days” makes “is this done?” undecidable. Engineering owns the first; product owns the second.

Mitigation: Two distinct sections.

`## Success Criteria (SC-NNN)`: measurable, externally observable outcome correctness. No implementation detail. Engineering’s contract.
`## Success Metrics`: adoption / CSAT / business KPIs. Product’s contract.

Enforcement: `validate-spec` warns when SC-NNN is missing or contains implementation detail; `spec-reviewer` Stage 2 FAILs when it is missing or leaks implementation detail.

Inspired by: GitHub Spec Kit (R6 from the comparison review).
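A rough sketch of the `validate-spec` side of this rule. The keyword list used to detect implementation detail is a hypothetical heuristic, not the real one, and `parseSuccessCriteria` / `successCriteriaWarnings` are illustrative names.

```typescript
interface Criterion {
  id: string; // "SC-001", "SC-002", …
  text: string;
}

// Hypothetical heuristic: words that usually signal implementation detail.
const IMPLEMENTATION_HINTS: RegExp[] = [
  /\bAPI\b/i, /\bendpoint\b/i, /\btable\b/i, /\bcolumn\b/i,
  /\bSQL\b/i, /\bRPC\b/i, /\bReact\b/, /\bhook\b/i,
];

function parseSuccessCriteria(section: string): Criterion[] {
  const criteria: Criterion[] = [];
  for (const line of section.split("\n")) {
    // Accepts "SC-001: text", "- SC-001: text", and "**SC-001:** text".
    const match = /\b(SC-\d{3})\b[*:\s]*(.+)/.exec(line);
    if (match) criteria.push({ id: match[1], text: match[2].trim() });
  }
  return criteria;
}

function successCriteriaWarnings(section: string): string[] {
  const criteria = parseSuccessCriteria(section);
  if (criteria.length === 0) return ["Success Criteria section has no SC-NNN entries"];
  return criteria
    .filter((c) => IMPLEMENTATION_HINTS.some((hint) => hint.test(c.text)))
    .map((c) => `${c.id} may contain implementation detail: "${c.text}"`);
}

console.log(successCriteriaWarnings("**SC-001:** The system handles 1,000 concurrent users."));
```

A keyword heuristic will produce false positives, which matches the warn-only posture of `validate-spec`; the FAIL decision stays with `spec-reviewer`.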

Why we have so many command and agent surfaces (31 spec commands, 8 reviewers)

Failure mode: A single “review the spec” command for a domain this regulated produces shallow reviews. A single “validate the spec” command misses architecture-specific concerns.

Mitigation: Specialty commands and agents, each with one job:

Structural: validate-spec, validate-plan (fast, deterministic).
Implementability: spec-reviewer (3-stage, AI agent).
Cross-spec: spec-landscape, deep-module-review.
Specialty review: compliance-reviewer, ux-reviewer, security-auditor, architecture-validator, code-reviewer, verifier, module-strategic-reviewer.
Workflow movement: clarify-spec, prepare-spec, whats-next, discuss-implementation, generate-tasks, verify-task, spec-complete, spec deferred.
Session: pause-work, resume-work, session-status, pre-commit-check.

This is intentional: we pay the surface-area cost to get domain-appropriate depth.

Cross-reference: `docs/development/SPEC_WORKFLOW.md`; `docs/development/SPEC_COMMAND_CHEATSHEET.md`; the `.cursor/agents/` inventory.

Why `whats-next` and `session-status` are branch-aware (R7)

Failure mode: Long sessions on a `cursor/cl-15-…` branch require typing `--spec CL-15` repeatedly. Users skip the spec workflow because the friction is real.

Mitigation: `whats-next` (and `session-status`, `spec-status`) infer `{CORE-##}` from the current git branch when no flag is passed. Explicit flags always win. Pattern: strip common prefixes (`cursor/`, `feature/`, `fix/`, …) and match `(?i)\b([a-z]{2,3})-(\d{1,3}…)\b`.

Inspired by: GitHub Spec Kit’s active-spec-from-branch convention (R7 from the comparison review).
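Branch inference might be sketched as below. This is deliberately simplified: the documented pattern contains an elided portion after the digit group that is not reproduced here, and the prefix list covers only the three prefixes named above. `inferSpecId` is a hypothetical name, and upper-casing the result to match `{CORE-##}` IDs is an assumption.

```typescript
// Only the three prefixes named in the text; the real list is longer.
const BRANCH_PREFIXES = ["cursor/", "feature/", "fix/"];

// Simplified form of the documented pattern (elided suffix omitted); "i" = case-insensitive.
const SPEC_ID = /\b([a-z]{2,3})-(\d{1,3})\b/i;

function inferSpecId(branch: string, explicitSpec?: string): string | undefined {
  if (explicitSpec) return explicitSpec; // explicit flags always win
  let name = branch;
  for (const prefix of BRANCH_PREFIXES) {
    if (name.startsWith(prefix)) {
      name = name.slice(prefix.length);
      break;
    }
  }
  const match = SPEC_ID.exec(name);
  return match ? `${match[1].toUpperCase()}-${match[2]}` : undefined;
}

console.log(inferSpecId("cursor/cl-15-intake-form")); // CL-15
```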

What we explicitly do NOT do (and why)

These are ideas from other spec frameworks (notably GitHub Spec Kit) that we have deliberately rejected:

| Idea | Why we don’t | Where to read more |
| --- | --- | --- |
| “Specs generate code; code is regenerated output.” | We have ~7,400 src files and 600+ migrations. Regeneration would destroy audit trails for regulated cores. Specs document intent and acceptance, not generated artifacts. | `SPEC_KIT_COMPARISON_2026-04-17.md` § Recommended adoptions / Explicitly rejected |
| Generic “Library-First / CLI Mandate” constitution | Our `constitution.md` is domain-specific (multi-tenancy, RLS, PHI, jurisdictional rules). A generic constitution would lose all of that. | `constitution.md` |
| Single `spec.md` per branch | Specs belong to cores, not branches. `{CORE}-##` IDs are how we do completion tracking, registry sync, and cross-spec analysis. | `specs/_templates/SPEC_DIRECTORY_STRUCTURE.md` |
| Replace 3-stage `spec-reviewer` with one `/analyze` step | Regulated cores demand specialty review (compliance, UX, security, architecture). One generic agent cannot do this depth. | `.cursor/agents/spec-reviewer.md` |
| Fragment Data Model and Integration Points into `data-model.md` / `contracts/` siblings | Co-locating them in the spec keeps single-document review and audit possible. Fragmentation would multiply the review surface and complicate completion tracking. | `SPEC_TEMPLATE.md` |

When to update this document

A new template constraint is added — add a “Why X is required” subsection here.
A failure mode shows up that the templates do not currently catch — document it here, then update the template.
An external framework (Spec Kit, etc.) is re-evaluated — update the “What we explicitly do NOT do” table if any rejection changes.

References

constitution.md — Source of truth for guardrails.
AI_GUIDE.md — AI contribution process.
AGENTS.md — Quick reference for patterns.
docs/development/SPEC_WORKFLOW.md — Canonical workflow.
docs/development/SPEC_COMMAND_CHEATSHEET.md — Command tier list.
docs/reviews/SPEC_KIT_COMPARISON_2026-04-17.md — Comparison with GitHub Spec Kit and the source of recommendations R1–R8.
GitHub Spec Kit — External reference; the adopted patterns (R1, R2, R5, R6, R7) are credited inline above.
