Encore OS — Documentation

Spec: PF-07 Phase 2 & 3 — Error Handling & Monitoring
Last Updated: 2026-03-14

1. Overview

Encore OS uses Sentry for error tracking, performance monitoring, and session replay. Custom performance metrics are stored in pf_health_metrics via the platform performanceMonitor.

Architecture

┌─────────────┐   errors, traces   ┌─────────────┐
│  React App  │ ──────────────────→ │   Sentry    │
│  (browser)  │   replay, logs     │  Dashboard  │
└─────────────┘                    └─────────────┘
      │
      │ page load, custom marks,
      │ API histograms
      ▼
┌─────────────────────┐
│  pf_health_metrics  │
│  (Supabase)         │
└─────────────────────┘

2. Sentry Configuration

File: src/platform/monitoring/sentry.ts

Setting	Value	Notes
`tracesSampleRate`	0.5 (50%)	Auth/billing/clinical/HR-payroll forced to 1.0
`profilesSampleRate`	0.1 (10%)	JS Self-Profiling API
`replaysSessionSampleRate`	0.0	Off by default
`replaysOnErrorSampleRate`	1.0	100% on errors
`enableLogs`	true	Structured log search
`enableMetrics`	true	Custom metrics (SDK 10.25+)
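
The "forced to 1.0" note implies that the sample rate is decided per transaction rather than globally. A minimal sketch of that decision as a pure function follows; the route prefixes are hypothetical and are not taken from sentry.ts. A function like this could back the SDK's tracesSampler option.

```typescript
// Hypothetical route prefixes for the critical areas named in the table;
// the real list lives in src/platform/monitoring/sentry.ts and may differ.
const CRITICAL_PREFIXES = ["/auth", "/billing", "/clinical", "/hr/payroll"];

const DEFAULT_TRACE_RATE = 0.5; // matches tracesSampleRate in the table

/** Returns the trace sample rate for a transaction name such as "/auth/login". */
function traceSampleRateFor(transactionName: string): number {
  const isCritical = CRITICAL_PREFIXES.some(
    (prefix) =>
      transactionName === prefix || transactionName.startsWith(prefix + "/"),
  );
  return isCritical ? 1.0 : DEFAULT_TRACE_RATE;
}
```

Matching on a full path segment keeps an unrelated route such as /authors from being treated as an auth route.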

PHI Scrubbing

The beforeSend callback applies the following rules:

Truncates all event/exception messages to 500 characters
Strips emails, phone numbers, SSNs, and DOBs via regex
Drops breadcrumb messages matching PHI patterns
Sends only UUIDs as user/org context, never names, emails, or clinical data
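
The scrubbing rules can be sketched as three small helpers. The regexes below are illustrative assumptions (US-style formats only) and are not the patterns used in sentry.ts; the function names are hypothetical.

```typescript
const MAX_MESSAGE_LENGTH = 500;

// Illustrative patterns only; the production regexes may be stricter or broader.
const PHI_PATTERNS: RegExp[] = [
  /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, // email addresses
  /\b\d{3}-\d{2}-\d{4}\b/g, // US SSNs (###-##-####)
  /(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b/g, // US phone numbers
  // Dates (MM/DD/YYYY or YYYY-MM-DD); note this also redacts non-DOB dates.
  /\b\d{1,2}\/\d{1,2}\/\d{4}\b|\b\d{4}-\d{2}-\d{2}\b/g,
];

/** Replaces every PHI-looking substring with a fixed placeholder. */
function redactPhi(text: string): string {
  return PHI_PATTERNS.reduce(
    (out, pattern) => out.replace(pattern, "[redacted]"),
    text,
  );
}

/** Redacts first, then truncates, so a cut-off value cannot slip past the regexes. */
function scrubMessage(message: string): string {
  return redactPhi(message).slice(0, MAX_MESSAGE_LENGTH);
}

/** True when a breadcrumb message should be dropped. */
function containsPhi(text: string): boolean {
  return redactPhi(text) !== text;
}
```

In a beforeSend callback, scrubMessage would be applied to the event and exception messages, and containsPhi would decide which breadcrumbs to drop.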

Source Maps

Source maps are uploaded via @sentry/vite-plugin during the Vercel build. The SENTRY_AUTH_TOKEN, SENTRY_ORG, and SENTRY_PROJECT environment variables must be set in the Vercel project settings. Verification: After a deploy, check Sentry → Settings → Source Maps → Artifacts to confirm the release has uploaded maps.
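
Because a missing variable typically surfaces only as absent source maps after deploy, a pre-build check can fail faster. A minimal sketch, assuming a hypothetical helper called from a build script:

```typescript
const REQUIRED_SENTRY_VARS = [
  "SENTRY_AUTH_TOKEN",
  "SENTRY_ORG",
  "SENTRY_PROJECT",
] as const;

/** Returns the names of required variables that are unset or blank. */
function missingSentryVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_SENTRY_VARS.filter((name) => !env[name]?.trim());
}
```

A build script could call missingSentryVars(process.env) and abort with the returned names when the list is non-empty.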

3. Alerting Thresholds

Error Rate

Metric	Warning	Critical	Action
Error rate (events/min)	> 10/min	> 50/min	Check Sentry Issues feed; page on-call if critical
Unique issues (new/hour)	> 5	> 15	Review new issues for regressions
Unhandled rejection rate	> 1% of sessions	> 5%	Investigate JS errors in production

Performance (Web Vitals)

Metric	Good	Needs Improvement	Poor
LCP (Largest Contentful Paint)	≤ 2.5s	2.5–4.0s	> 4.0s
INP (Interaction to Next Paint)	≤ 200ms	200–500ms	> 500ms
CLS (Cumulative Layout Shift)	≤ 0.1	0.1–0.25	> 0.25
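
The table maps directly onto a rating function. The sketch below uses hypothetical names and the table's units (LCP in seconds, INP in milliseconds, CLS unitless); values reported in other units, such as LCP in milliseconds, would need converting first.

```typescript
type Rating = "good" | "needs-improvement" | "poor";
type Vital = "LCP" | "INP" | "CLS";

// [upper bound for "good", upper bound for "needs-improvement"]
const VITAL_THRESHOLDS: Record<Vital, [number, number]> = {
  LCP: [2.5, 4.0], // seconds
  INP: [200, 500], // milliseconds
  CLS: [0.1, 0.25], // unitless
};

/** Classifies a Web Vitals measurement; boundary values fall in the better bucket. */
function rateVital(vital: Vital, value: number): Rating {
  const [good, needsImprovement] = VITAL_THRESHOLDS[vital];
  if (value <= good) return "good";
  if (value <= needsImprovement) return "needs-improvement";
  return "poor";
}
```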

API Performance

Metric	Warning	Critical
p95 API latency	> 2s	> 5s
API error rate (5xx)	> 1%	> 5%
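
For reference, this is one way the p95 latency check could be computed from raw samples. It is a sketch using the nearest-rank percentile method, which is an assumption; the document does not state how p95 is derived, and Sentry's own aggregation may differ.

```typescript
type Severity = "ok" | "warning" | "critical";

/** Nearest-rank percentile: the smallest sample with at least p% of samples at or below it. */
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to keep the rank computation in integers.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

/** Applies the p95 thresholds from the table (2s warning, 5s critical). */
function latencySeverity(latenciesMs: number[]): Severity {
  const p95 = percentile(latenciesMs, 95);
  if (p95 > 5000) return "critical";
  if (p95 > 2000) return "warning";
  return "ok";
}
```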

4. Dashboards

Sentry Project

Issues: Real-time error feed with stack traces and session replay
Performance: Transaction duration, Web Vitals, throughput
Replays: Session recordings for error context
Logs: Structured log search (Sentry.logger.*)

Key Sentry Queries

# High-frequency errors in the last hour
is:unresolved times_seen:>10 firstSeen:-1h

# Errors on auth routes
transaction:/auth/* is:unresolved

# Clinical module errors
module:cl is:unresolved

Platform Health Metrics

Custom metrics in pf_health_metrics (Supabase):

Page load timing (page_load, dom_ready)
Custom marks (startMark/endMark)
API response time histograms

Query via Supabase dashboard or the platform health module.

5. Error Boundaries

The application uses a layered error boundary strategy:

Level	Location	Behavior
Global (root)	`main.tsx`	Catches catastrophic failures; shows full-page fallback
Global (app)	`App.tsx`	Defense-in-depth; catches errors inside providers
Feature	`RouteLoader.tsx`	Per-module isolation; module crash doesn’t break other routes
Component	Individual components	Optional; for non-critical widgets

The double global boundary (main.tsx + App.tsx) is intentional — the outer boundary catches errors that occur during provider initialization.

6. Correlation IDs

Every auth state change (sign-in, sign-out, token refresh) generates a correlation_id via crypto.randomUUID(). This ID is:

Set in the logger context for all subsequent logs
Included in structured log entries
Useful for tracing a user session across log entries
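
The flow above can be sketched with a small context holder. LogContext and its method names are hypothetical stand-ins for the real logger context; the ID generator is injectable so the behaviour can be tested without crypto.

```typescript
type AuthEvent = "sign-in" | "sign-out" | "token-refresh";

// Hypothetical stand-in for the logger context described above.
class LogContext {
  private correlationId: string | null = null;
  private readonly newId: () => string;

  constructor(newId: () => string = () => globalThis.crypto.randomUUID()) {
    this.newId = newId;
  }

  /** Call on every auth state change; later log entries pick up the new ID. */
  onAuthStateChange(_event: AuthEvent): string {
    this.correlationId = this.newId();
    return this.correlationId;
  }

  /** Fields to merge into each structured log entry. */
  fields(): { correlation_id: string | null } {
    return { correlation_id: this.correlationId };
  }
}
```

Each auth state change replaces the previous ID, so entries logged between two changes share one correlation_id.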

7. Escalation Procedure

P3 (Low): New non-critical issue appears in Sentry → assign to the relevant core team at the next standup
P2 (Medium): Error rate exceeds the warning threshold (see §3) → investigate within 4 hours
P1 (High): Error rate exceeds the critical threshold (see §3), or auth/billing errors occur → investigate within 1 hour
P0 (Critical): Application-wide crash or data integrity issue → page on-call immediately
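
The priority levels can be expressed as a single decision function. This sketch reuses the error-rate thresholds from §3 (10/min warning, 50/min critical); the Signal shape and its field names are hypothetical.

```typescript
type Priority = "P0" | "P1" | "P2" | "P3";

// Hypothetical summary of what an alert or triage step knows about an incident.
interface Signal {
  errorsPerMinute: number;
  appWideCrash?: boolean;
  dataIntegrityIssue?: boolean;
  affectsAuthOrBilling?: boolean;
}

/** Maps an incident signal to the escalation priority, checking the most severe case first. */
function priorityFor(signal: Signal): Priority {
  if (signal.appWideCrash || signal.dataIntegrityIssue) return "P0";
  if (signal.errorsPerMinute > 50 || signal.affectsAuthOrBilling) return "P1";
  if (signal.errorsPerMinute > 10) return "P2";
  return "P3";
}
```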

8. Maintenance

Sentry Housekeeping

Review and resolve/archive stale issues monthly
Update ignoreErrors patterns when new non-actionable errors are identified
Verify source map uploads after Vite/build tool upgrades
Review sampling rates quarterly (adjust based on event volume and budget)

Performance Monitor

performanceMonitor flushes metrics to pf_health_metrics every 30 seconds
Metrics are sampled at 10% in production, 100% in development
Stale metrics can be cleaned up via SQL on pf_health_metrics
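
The buffering and sampling behaviour can be sketched as follows. MetricBuffer is a hypothetical illustration, not the performanceMonitor implementation, and it assumes sampling is applied per metric (the source does not say whether it is per metric or per session).

```typescript
interface Metric {
  name: string;
  value: number;
  recordedAt: number; // epoch milliseconds
}

class MetricBuffer {
  private pending: Metric[] = [];
  private readonly sampleRate: number;
  private readonly sink: (batch: Metric[]) => void;
  private readonly random: () => number;

  constructor(
    sampleRate: number, // 0.1 in production, 1.0 in development
    sink: (batch: Metric[]) => void, // e.g. an insert into pf_health_metrics
    random: () => number = Math.random,
  ) {
    this.sampleRate = sampleRate;
    this.sink = sink;
    this.random = random;
  }

  /** Keeps the metric with probability sampleRate; otherwise drops it. */
  record(name: string, value: number): void {
    if (this.random() >= this.sampleRate) return;
    this.pending.push({ name, value, recordedAt: Date.now() });
  }

  /** Sends everything buffered so far as one batch; a no-op when empty. */
  flush(): void {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.sink(batch);
  }
}
```

A 30-second flush would then be scheduled with something like setInterval(() => buffer.flush(), 30_000).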

9. Structured Logging

Application logs use a JSON structure emitted by src/platform/monitoring/logger.ts. Format:

{
  "timestamp": "2025-01-07T10:00:00Z",
  "level": "info",
  "module": "hr",
  "action": "create_employee",
  "message": "Employee created successfully",
  "user_id": "uuid",
  "org_id": "uuid",
  "correlation_id": "uuid",
  "context": {
    "employee_id": "uuid"
  }
}

Standard Fields:

timestamp — ISO 8601 timestamp
level — Log level (debug, info, warn, error)
module — Module/core name
action — Action being performed
message — Human-readable message
user_id — User ID (stable UUID, not PHI)
org_id — Organization ID
site_id — Site ID (if applicable)
correlation_id — Correlation ID generated on each auth state change (see §6)

PHI Protection:

Never log names, emails, SSNs, addresses
Only log stable IDs (UUIDs)
Sanitize error messages
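
A typed builder is one way to keep entries in this shape and enforce the UUID-only rule for ID fields. The sketch below is an assumption about how such a helper might look; formatLog and LogEntry are hypothetical names and are not the API of logger.ts.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

interface LogEntry {
  timestamp: string;
  level: LogLevel;
  module: string;
  action: string;
  message: string;
  user_id?: string;
  org_id?: string;
  site_id?: string;
  correlation_id?: string;
  context?: Record<string, string>;
}

const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

/** Builds one JSON log line, rejecting ID fields that are not UUIDs. */
function formatLog(
  entry: Omit<LogEntry, "timestamp">,
  now: Date = new Date(),
): string {
  for (const key of ["user_id", "org_id", "site_id", "correlation_id"] as const) {
    const value = entry[key];
    if (value !== undefined && !UUID_RE.test(value)) {
      throw new Error(`${key} must be a UUID`);
    }
  }
  return JSON.stringify({ timestamp: now.toISOString(), ...entry });
}
```

Rejecting non-UUID identifiers at the call site catches the most common mistake, passing an email or name as user_id, before the entry is emitted.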

10. Planned Integration: LogRocket

LogRocket is a planned complement to Sentry for richer session context (session replay, user-interaction tracking, network-request monitoring). It is not yet wired into the application. Setup Steps (Future):

Create a LogRocket project.
Install the SDK:
npm install logrocket

Initialize in src/main.tsx:

import LogRocket from 'logrocket';

LogRocket.init(import.meta.env.VITE_LOGROCKET_APP_ID, {
  shouldCaptureIP: false, // Privacy
  sanitizeInputs: true,
});

11. Monitoring Checklist

Setup

Ongoing

12. Troubleshooting

Logs Not Appearing

Symptoms: No logs in console; missing log entries. Solutions:

Check the log level (entries below the configured level are filtered out)
Verify that the logger is initialized
Check the browser console filters
Verify that entries follow the structured format (see §9)
Test with an explicit log call

Performance Monitoring Not Working

Symptoms: No metrics collected; dashboard empty. Solutions:

Verify initialization:

performanceMonitor.init({ enablePerformanceTracking: true });

Check the sample rate (production samples only 10% of metrics; see §8)
Verify the web-vitals library is installed
Check the browser console for errors
Test in development (100% sampling)

Too Many Alerts

Symptoms: Alert fatigue; important alerts missed. Solutions:

Increase alert thresholds
Reduce alert frequency
Group similar alerts
Use alert suppression
Review and tune alerts

Missing Alerts

Symptoms: Issues not detected; users report before alerts. Solutions:

Lower alert thresholds
Add more alert types
Improve monitoring coverage
Test alert delivery
Review alert configuration
