Spec: PF-07 Phase 2 & 3 — Error Handling & Monitoring
Last Updated: 2026-03-14

1. Overview

Encore Health OS uses Sentry for error tracking, performance monitoring, and session replay. Custom performance metrics are stored in pf_health_metrics via the platform performanceMonitor.

Architecture

┌─────────────┐   errors, traces   ┌─────────────┐
│  React App  │ ─────────────────→ │   Sentry    │
│  (browser)  │   replay, logs     │  Dashboard  │
└─────────────┘                    └─────────────┘
      │
      │ page load, custom marks,
      │ API histograms
      ▼
┌─────────────────────┐
│  pf_health_metrics  │
│  (Supabase)         │
└─────────────────────┘

2. Sentry Configuration

File: src/platform/monitoring/sentry.ts
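| Setting | Value | Notes |
|---|---|---|
| tracesSampleRate | 0.5 (50%) | Auth, billing, clinical, and HR-payroll transactions are forced to 1.0 |
| profilesSampleRate | 0.1 (10%) | Uses the JS Self-Profiling API |
| replaysSessionSampleRate | 0.0 | Session replay is off by default |
| replaysOnErrorSampleRate | 1.0 | Replays are captured for 100% of sessions with an error |
| enableLogs | true | Enables structured log search |
| enableMetrics | true | Custom metrics (SDK 10.25+) |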
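Forcing some areas to 1.0 while everything else stays at 0.5 is the kind of rule Sentry's `tracesSampler` option expresses; when `tracesSampler` is set, Sentry uses it instead of `tracesSampleRate`. The sketch below assumes that approach. The route prefixes and the local `SamplingContext` type are hypothetical stand-ins, not the contents of sentry.ts.

```typescript
// Minimal subset of the context Sentry passes to `tracesSampler`,
// declared locally so this sketch runs without @sentry/react installed.
interface SamplingContext {
  name: string; // span/transaction name, e.g. "/billing/invoices"
}

// Hypothetical route prefixes for the always-traced areas.
const FULL_TRACE_PREFIXES = ["/auth", "/billing", "/clinical", "/hr/payroll"];
const DEFAULT_TRACE_RATE = 0.5;

function tracesSampler(ctx: SamplingContext): number {
  // Match the prefix itself or any nested route under it, so "/authors"
  // does not accidentally count as an auth route.
  const forced = FULL_TRACE_PREFIXES.some(
    (prefix) => ctx.name === prefix || ctx.name.startsWith(prefix + "/"),
  );
  return forced ? 1.0 : DEFAULT_TRACE_RATE;
}

// Intended wiring (not executed here):
// Sentry.init({ dsn, tracesSampler, profilesSampleRate: 0.1,
//   replaysSessionSampleRate: 0.0, replaysOnErrorSampleRate: 1.0, enableLogs: true });
```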

PHI Scrubbing

The beforeSend callback:
  • Truncates all event/exception messages to 500 characters
  • Strips emails, phone numbers, SSNs, DOBs via regex
  • Drops breadcrumb messages matching PHI patterns
  • Only UUIDs are sent as user/org context — never names, emails, or clinical data
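A minimal sketch of such a `beforeSend` hook follows. The regexes and the `[EMAIL]`-style placeholder tokens are illustrative assumptions rather than the production patterns, and `ScrubbableEvent` is a local stand-in for the subset of Sentry's event shape the hook touches. The date pattern also catches ordinary dates such as build timestamps, which errs on the side of over-scrubbing.

```typescript
// Local stand-in for the parts of a Sentry event this hook rewrites.
interface ScrubbableEvent {
  message?: string;
  exception?: { values?: { value?: string }[] };
  breadcrumbs?: { message?: string }[];
}

const MAX_MESSAGE_LENGTH = 500;

// Illustrative patterns only. Order matters: SSNs and dates are replaced
// before the looser phone pattern runs.
const PHI_PATTERNS: [RegExp, string][] = [
  [/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/i, "[EMAIL]"],
  [/\b\d{3}-\d{2}-\d{4}\b/, "[SSN]"],
  [/\b(?:\d{1,2}[\/-]\d{1,2}[\/-]\d{2,4}|\d{4}-\d{2}-\d{2})\b/, "[DOB]"],
  [/(?:\+?1[-.\s]?)?(?:\(\d{3}\)|\b\d{3})[-.\s]?\d{3}[-.\s]?\d{4}\b/, "[PHONE]"],
];

function scrubText(text: string): string {
  let out = text;
  for (const [pattern, label] of PHI_PATTERNS) {
    // Build a global copy per call so the shared pattern keeps no lastIndex state.
    out = out.replace(new RegExp(pattern.source, pattern.flags + "g"), label);
  }
  // Scrub first, then truncate.
  return out.slice(0, MAX_MESSAGE_LENGTH);
}

function containsPhi(text: string): boolean {
  // Non-global patterns, so test() is stateless.
  return PHI_PATTERNS.some(([pattern]) => pattern.test(text));
}

// Shape of a `beforeSend` hook: rewrite the event in place and return it.
function beforeSend(event: ScrubbableEvent): ScrubbableEvent {
  if (event.message) event.message = scrubText(event.message);
  for (const ex of event.exception?.values ?? []) {
    if (ex.value) ex.value = scrubText(ex.value);
  }
  if (event.breadcrumbs) {
    // Drop, rather than rewrite, breadcrumbs whose message looks like PHI.
    event.breadcrumbs = event.breadcrumbs.filter(
      (crumb) => !crumb.message || !containsPhi(crumb.message),
    );
  }
  return event;
}
```

Scrubbing before truncating matters: cutting a message at 500 characters first could leave a partial email or phone number that no longer matches its pattern and slips through.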

Source Maps

Source maps are uploaded via @sentry/vite-plugin during the Vercel build. The SENTRY_AUTH_TOKEN, SENTRY_ORG, and SENTRY_PROJECT environment variables must be set in the Vercel project settings.

Verification: after a deploy, open Sentry → Settings → Source Maps → Artifacts and confirm that source maps were uploaded for the new release.
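For reference, a Vite config wiring in the plugin might look like the fragment below. It is a sketch: only the `org`, `project`, and `authToken` options are shown, and the project's existing plugin list is elided.

```typescript
// vite.config.ts (sketch; unrelated options and existing plugins elided)
import { defineConfig } from "vite";
import { sentryVitePlugin } from "@sentry/vite-plugin";

export default defineConfig({
  build: {
    // The plugin uploads whatever maps the build emits, so generation must be on.
    sourcemap: true,
  },
  plugins: [
    // ...existing plugins (React, etc.) go first...
    // Sentry's setup guide places its plugin after all other plugins.
    sentryVitePlugin({
      org: process.env.SENTRY_ORG,
      project: process.env.SENTRY_PROJECT,
      authToken: process.env.SENTRY_AUTH_TOKEN,
    }),
  ],
});
```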

3. Alerting Thresholds

Error Rate

| Metric | Warning | Critical | Action |
|---|---|---|---|
| Error rate (events/min) | > 10/min | > 50/min | Check the Sentry Issues feed; page on-call if critical |
| New unique issues (per hour) | > 5 | > 15 | Review new issues for regressions |
| Unhandled rejection rate | > 1% of sessions | > 5% of sessions | Investigate JS errors in production |
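The thresholds above can be encoded as data plus one comparison function, which keeps alerting code and this table in sync. A sketch; the metric keys are hypothetical identifiers, and both boundaries are strict (`>`), matching the table.

```typescript
type Severity = "ok" | "warning" | "critical";

interface Threshold {
  warning: number; // value must exceed this to warn
  critical: number; // value must exceed this to page
}

// Hypothetical metric keys; values copied from the Error Rate table.
// The unhandled-rejection rate is a fraction of sessions (0.01 = 1%).
type ErrorMetric = "errorEventsPerMin" | "newIssuesPerHour" | "unhandledRejectionRate";

const ERROR_THRESHOLDS: Record<ErrorMetric, Threshold> = {
  errorEventsPerMin: { warning: 10, critical: 50 },
  newIssuesPerHour: { warning: 5, critical: 15 },
  unhandledRejectionRate: { warning: 0.01, critical: 0.05 },
};

function classify(metric: ErrorMetric, value: number): Severity {
  const { warning, critical } = ERROR_THRESHOLDS[metric];
  if (value > critical) return "critical";
  if (value > warning) return "warning";
  return "ok";
}
```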

Performance (Web Vitals)

| Metric | Good | Needs Improvement | Poor |
|---|---|---|---|
| LCP (Largest Contentful Paint) | ≤ 2.5s | 2.5–4.0s | > 4.0s |
| INP (Interaction to Next Paint) | ≤ 200ms | 200–500ms | > 500ms |
| CLS (Cumulative Layout Shift) | ≤ 0.1 | 0.1–0.25 | > 0.25 |
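These bands match the thresholds published on web.dev, and the `web-vitals` library reports the same three labels as `metric.rating`. A sketch of the banding, with LCP and INP in milliseconds and each upper bound inclusive:

```typescript
type VitalName = "LCP" | "INP" | "CLS";
type Rating = "good" | "needs-improvement" | "poor";

// [upper bound for "good", upper bound for "needs-improvement"], inclusive.
// LCP and INP are in milliseconds; CLS is unitless.
const VITAL_BOUNDS: Record<VitalName, [number, number]> = {
  LCP: [2500, 4000],
  INP: [200, 500],
  CLS: [0.1, 0.25],
};

function rateVital(name: VitalName, value: number): Rating {
  const [good, needsImprovement] = VITAL_BOUNDS[name];
  if (value <= good) return "good";
  if (value <= needsImprovement) return "needs-improvement";
  return "poor";
}
```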

API Performance

| Metric | Warning | Critical |
|---|---|---|
| p95 API latency | > 2s | > 5s |
| API error rate (5xx responses) | > 1% | > 5% |
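Checking these thresholds requires a p95 over raw latency samples and a 5xx fraction over status codes. The sketch below uses the nearest-rank percentile definition, which is an assumption; monitoring backends may interpolate instead and report slightly different p95 values on small samples.

```typescript
type Severity = "ok" | "warning" | "critical";

// Nearest-rank percentile: the smallest sample with at least p% of all
// samples at or below it.
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new Error("percentile of an empty sample");
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to avoid floating-point drift from fractional percentages.
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Fraction of responses whose status code is in the 5xx range.
function serverErrorRate(statuses: readonly number[]): number {
  if (statuses.length === 0) return 0;
  const errors = statuses.filter((s) => s >= 500 && s <= 599).length;
  return errors / statuses.length;
}

// Thresholds from the API Performance table (latency in milliseconds).
function apiSeverity(latenciesMs: readonly number[], statuses: readonly number[]): Severity {
  const p95 = percentile(latenciesMs, 95);
  const errorRate = serverErrorRate(statuses);
  if (p95 > 5000 || errorRate > 0.05) return "critical";
  if (p95 > 2000 || errorRate > 0.01) return "warning";
  return "ok";
}
```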

4. Dashboards

Sentry Project

  • Issues: Real-time error feed with stack traces and session replay
  • Performance: Transaction duration, Web Vitals, throughput
  • Replays: Session recordings for error context
  • Logs: Structured log search (Sentry.logger.*)

Key Sentry Queries

# High-frequency errors in the last hour
is:unresolved times_seen:>10 firstSeen:-1h

# Errors on auth routes
transaction:/auth/* is:unresolved

# Clinical module errors
module:cl is:unresolved

Platform Health Metrics

Custom metrics in pf_health_metrics (Supabase):
  • Page load timing (page_load, dom_ready)
  • Custom marks (startMark/endMark)
  • API response time histograms
Query these metrics via the Supabase dashboard or the platform health module.
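Custom marks pair a `startMark` with an `endMark` and record the elapsed time. The sketch below is a hypothetical stand-in for that part of performanceMonitor: the `MetricRow` column names are assumptions, not the actual pf_health_metrics schema, and the clock is injectable so the example is deterministic.

```typescript
// Hypothetical row shape; the real pf_health_metrics columns may differ.
interface MetricRow {
  metric_name: string;
  value_ms: number;
  recorded_at: string; // ISO-8601 timestamp
}

class MarkTimer {
  private readonly open = new Map<string, number>();
  private readonly now: () => number;

  // In the browser, () => performance.now() is the natural clock; Date.now
  // keeps the sketch runnable anywhere.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  startMark(name: string): void {
    this.open.set(name, this.now());
  }

  // Returns the finished measurement, or null if no matching startMark exists.
  endMark(name: string): MetricRow | null {
    const start = this.open.get(name);
    if (start === undefined) return null;
    this.open.delete(name);
    return {
      metric_name: name,
      value_ms: this.now() - start,
      recorded_at: new Date().toISOString(),
    };
  }
}
```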

5. Error Boundaries

The application uses a layered error boundary strategy:
| Level | Location | Behavior |
|---|---|---|
| Global (root) | main.tsx | Catches catastrophic failures; shows a full-page fallback |
| Global (app) | App.tsx | Defense-in-depth; catches errors inside providers |
| Feature | RouteLoader.tsx | Per-module isolation; a module crash doesn’t break other routes |
| Component | Individual components | Optional; for non-critical widgets |
The double global boundary (main.tsx + App.tsx) is intentional — the outer boundary catches errors that occur during provider initialization.

6. Correlation IDs

Every auth state change (sign-in, sign-out, token refresh) generates a correlation_id via crypto.randomUUID(). This ID is:
  • Set in the logger context for all subsequent logs
  • Included in structured log entries
  • Useful for tracing a user session across log entries
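A minimal sketch of the pattern, assuming a logger that keeps the current ID in its context. `ContextLogger` and `LogEntry` are hypothetical; the auth event names follow Supabase's `onAuthStateChange` events, and the sketch imports `randomUUID` from `node:crypto` where browser code would call `crypto.randomUUID()`.

```typescript
import { randomUUID } from "node:crypto"; // browser code: crypto.randomUUID()

// Subset of Supabase auth event names; the wiring to onAuthStateChange is assumed.
type AuthEvent = "SIGNED_IN" | "SIGNED_OUT" | "TOKEN_REFRESHED";

interface LogEntry {
  level: "info" | "warn" | "error";
  message: string;
  correlation_id: string | null; // null before the first auth event
  timestamp: string;
}

// Hypothetical logger that stamps every entry with the current correlation_id.
class ContextLogger {
  private correlationId: string | null = null;
  readonly entries: LogEntry[] = [];

  // Every auth transition starts a fresh correlation scope.
  onAuthStateChange(event: AuthEvent): string {
    const id = randomUUID();
    this.correlationId = id;
    this.log("info", `auth:${event}`);
    return id;
  }

  log(level: LogEntry["level"], message: string): LogEntry {
    const entry: LogEntry = {
      level,
      message,
      correlation_id: this.correlationId,
      timestamp: new Date().toISOString(),
    };
    this.entries.push(entry);
    return entry;
  }
}
```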

7. Escalation Procedure

  1. P3 (Low): New non-critical issue appears in Sentry → assign to relevant core team in next standup
  2. P2 (Medium): Error rate warning threshold → investigate within 4 hours
  3. P1 (High): Error rate critical threshold or auth/billing errors → investigate within 1 hour
  4. P0 (Critical): Application-wide crash or data integrity issue → page on-call immediately
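The levels are ordered from most to least severe, so the first matching condition wins. A sketch of that ordering; `IncidentSignal` and its fields are hypothetical, and the error-rate cutoffs come from the Error Rate table in section 3.

```typescript
type Priority = "P0" | "P1" | "P2" | "P3";

// Hypothetical incident summary; field names are illustrative.
interface IncidentSignal {
  appWideCrash: boolean;
  dataIntegrityIssue: boolean;
  errorEventsPerMin: number;
  affectsAuthOrBilling: boolean;
}

// Checked from most to least severe; the first match decides the priority.
function triage(s: IncidentSignal): Priority {
  if (s.appWideCrash || s.dataIntegrityIssue) return "P0";
  if (s.errorEventsPerMin > 50 || s.affectsAuthOrBilling) return "P1";
  if (s.errorEventsPerMin > 10) return "P2";
  return "P3"; // a new, non-critical issue
}

// Response targets from the escalation procedure.
const RESPONSE_TARGET: Record<Priority, string> = {
  P0: "page on-call immediately",
  P1: "investigate within 1 hour",
  P2: "investigate within 4 hours",
  P3: "assign to the relevant core team at the next standup",
};
```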

8. Maintenance

Sentry Housekeeping

  • Review and resolve/archive stale issues monthly
  • Update ignoreErrors patterns when new non-actionable errors are identified
  • Verify source map uploads after Vite/build tool upgrades
  • Review sampling rates quarterly (adjust based on event volume and budget)
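When adding `ignoreErrors` patterns, it helps to remember how Sentry matches them: per its documentation, string entries filter any message that contains them (partial match), while RegExp entries are tested against the message. The helper below approximates that check against a single message string so candidate patterns can be unit-tested; the two example patterns are common browser noise, not this project's list.

```typescript
// Example patterns only; the project's real ignoreErrors list lives in sentry.ts.
const IGNORE_ERRORS: (string | RegExp)[] = [
  "ResizeObserver loop limit exceeded",
  /^Non-Error promise rejection captured/,
];

// Strings match as substrings; RegExps via test(). Non-global patterns keep
// test() free of lastIndex state between calls.
function isIgnored(
  message: string,
  patterns: readonly (string | RegExp)[] = IGNORE_ERRORS,
): boolean {
  return patterns.some((p) => (typeof p === "string" ? message.includes(p) : p.test(message)));
}
```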

Performance Monitor

  • performanceMonitor flushes metrics to pf_health_metrics every 30 seconds
  • Metrics are sampled at 10% in production, 100% in development
  • Stale metrics can be cleaned up via SQL on pf_health_metrics
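The flush-and-sample behavior above can be sketched as a small buffered class. Everything here is hypothetical: `BufferedMonitor`, the `Sink` callback (standing in for the Supabase insert), and the choice to sample per metric rather than per session, which the real performanceMonitor may do differently.

```typescript
interface Metric {
  name: string; // e.g. "page_load"
  value: number; // milliseconds
}

// Stand-in for the Supabase insert into pf_health_metrics.
type Sink = (batch: Metric[]) => Promise<void>;

class BufferedMonitor {
  private buffer: Metric[] = [];
  private timer: ReturnType<typeof setInterval> | undefined = undefined;
  private readonly sink: Sink;
  private readonly sampleRate: number;
  private readonly random: () => number;

  constructor(sink: Sink, sampleRate: number, random: () => number = Math.random) {
    this.sink = sink;
    this.sampleRate = sampleRate;
    this.random = random;
  }

  // Keeps each metric with probability sampleRate (1.0 keeps all).
  record(metric: Metric): void {
    if (this.random() < this.sampleRate) this.buffer.push(metric);
  }

  async flush(): Promise<void> {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = []; // swap before awaiting so new metrics go to the next batch
    await this.sink(batch);
  }

  start(intervalMs: number = 30000): void {
    if (this.timer !== undefined) return; // already running
    this.timer = setInterval(() => {
      this.flush().catch(() => {
        // Batch is dropped on failure; a production version might re-queue it.
      });
    }, intervalMs);
  }

  stop(): void {
    if (this.timer !== undefined) clearInterval(this.timer);
    this.timer = undefined;
  }
}

// Documented rates: 10% in production, 100% in development.
function sampleRateFor(env: "production" | "development"): number {
  return env === "production" ? 0.1 : 1.0;
}
```

Swapping the buffer out before awaiting the sink means metrics recorded during a slow write land in the next batch instead of being lost or written twice.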