Version: 1.0.0
Last Updated: 2025-01-07
This guide documents the monitoring and alerting strategy for the Encore Health OS Platform, covering error tracking, performance monitoring, log aggregation, and incident response.
Table of Contents
- Overview
- Error Tracking
- Performance Monitoring
- Log Aggregation
- Alerting Configuration
- Dashboard Setup
- Incident Response
- Best Practices
- Troubleshooting
Overview
Encore Health OS uses a multi-layered monitoring approach:
- Error Tracking: Sentry (implemented), LogRocket (planned)
- Performance Monitoring: Core Web Vitals, custom metrics
- Log Aggregation: Structured logging, Supabase logs
- Alerting: Email/SMS notifications for critical issues
- Dashboards: Supabase Dashboard, custom dashboards (planned)
Goals:
- Detect errors before users report them
- Track performance degradation
- Monitor system health
- Enable rapid incident response
Error Tracking
Current Implementation
Structured Logging:
- Location: src/platform/monitoring/logger.ts
- Format: JSON with standard fields
- PHI Protection: Never logs PHI/PII
Log Levels:
- debug - Development debugging
- info - Normal operations
- warn - Warning conditions
- error - Error conditions
Sentry Integration (Implemented)
Current implementation: src/platform/monitoring/sentry.ts (PF-07). Initialization runs in src/main.tsx via initSentry() before React renders.
Package: @sentry/react only (no deprecated @sentry/tracing). Tracing uses reactRouterV6BrowserTracingIntegration and replay uses replayIntegration with HIPAA-safe options (maskAllText, blockAllMedia).
Required environment variable:
- VITE_SENTRY_DSN - If unset, Sentry is disabled (enabled: false). Do not commit the DSN to the repo; set it per environment.
Source map upload variables:
- SENTRY_AUTH_TOKEN - Auth token for uploads (e.g. production CI).
- SENTRY_ORG - Sentry organization slug.
- SENTRY_PROJECT - Sentry project slug.
npm run build generates source maps and uploads them via @sentry/vite-plugin so production errors show readable stack traces.
Initialization (reference): See src/platform/monitoring/sentry.ts. Summary:
- dsn and enabled from VITE_SENTRY_DSN.
- release from VITE_APP_VERSION (buildId from Vite).
- beforeSend scrubs ui.input breadcrumb data and truncates message/exception text to limit PHI.
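The scrubbing behavior summarized above can be sketched as a pure function. This is an illustration only, not the contents of src/platform/monitoring/sentry.ts: MonitoringEvent is a simplified stand-in for Sentry's event type, and scrubEvent, truncate, and the 200-character limit are hypothetical names and values chosen for the example.

```typescript
// Simplified stand-ins for the Sentry event shape (assumption: the real types are richer).
interface Breadcrumb {
  category?: string;
  message?: string;
  data?: Record<string, unknown>;
}

interface MonitoringEvent {
  message?: string;
  breadcrumbs?: Breadcrumb[];
  exception?: { values?: { type?: string; value?: string }[] };
}

// Hypothetical limit; the actual truncation length is not stated in this guide.
const MAX_TEXT_LENGTH = 200;

// Cut long free text so that accidental PHI in error messages is limited.
function truncate(text: string, max: number = MAX_TEXT_LENGTH): string {
  return text.length <= max ? text : text.slice(0, max) + "[truncated]";
}

// Returns a scrubbed copy of the event; the input is not mutated.
function scrubEvent(event: MonitoringEvent): MonitoringEvent {
  const scrubbed: MonitoringEvent = { ...event };

  if (event.message !== undefined) {
    scrubbed.message = truncate(event.message);
  }

  if (event.breadcrumbs !== undefined) {
    scrubbed.breadcrumbs = event.breadcrumbs.map((crumb) => {
      if (crumb.category !== "ui.input") return crumb;
      // Drop the data payload of input breadcrumbs, which may hold typed values.
      const copy: Breadcrumb = { ...crumb };
      delete copy.data;
      return copy;
    });
  }

  if (event.exception?.values !== undefined) {
    scrubbed.exception = {
      ...event.exception,
      values: event.exception.values.map((v) =>
        v.value === undefined ? v : { ...v, value: truncate(v.value) }
      ),
    };
  }

  return scrubbed;
}
```

A function of this shape would typically be called from the beforeSend hook and its result returned to Sentry.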
Error boundaries: Use ErrorBoundary from @/platform/monitoring, which reports to Sentry and shows a fallback UI. Do not use @sentry/react’s ErrorBoundary directly; the platform boundary is used in App.tsx and route-level boundaries.
Planned Integration: LogRocket
Alternative to Sentry:
- Session replay
- User interaction tracking
- Network request monitoring
Setup Steps:
- Create LogRocket project
- Install SDK
- Initialize in src/main.tsx
Error Tracking Best Practices
✅ DO:
- Capture all unhandled errors
- Include correlation IDs
- Sanitize PHI/PII before sending
- Group similar errors
- Track error rates
❌ DON'T:
- Log full user data
- Include passwords or tokens
- Log PHI/PII
- Overwhelm with noise
Performance Monitoring
Core Web Vitals
Current Implementation:
- Location: src/platform/monitoring/performance-monitor.ts
- Metrics: LCP, INP (replaces FID), CLS
- Sampling: 10% in production, 100% in development
Thresholds:
- LCP (Largest Contentful Paint): < 2.5s (good)
- INP (Interaction to Next Paint): < 200ms (good)
- CLS (Cumulative Layout Shift): < 0.1 (good)
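As a rough sketch of how the thresholds and sampling rates above might be applied, the following helpers classify a metric value and decide whether a session is sampled. The names isGood and shouldSample are hypothetical and are not taken from performance-monitor.ts; LCP and INP are assumed to be reported in milliseconds.

```typescript
type VitalName = "LCP" | "INP" | "CLS";

// "Good" thresholds from this guide: LCP and INP in milliseconds, CLS unitless.
const GOOD_THRESHOLDS: Record<VitalName, number> = {
  LCP: 2500,
  INP: 200,
  CLS: 0.1,
};

// True when the measured value is below the "good" threshold for that metric.
function isGood(name: VitalName, value: number): boolean {
  return value < GOOD_THRESHOLDS[name];
}

// 10% sampling in production, 100% in development.
// The random source is injectable so the decision can be tested deterministically.
function shouldSample(
  isProduction: boolean,
  random: () => number = Math.random
): boolean {
  const sampleRate = isProduction ? 0.1 : 1.0;
  return random() < sampleRate;
}
```

Injecting the random source is a design choice for testability; in the application the default Math.random would be used.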
Performance Targets
Lighthouse Scores:
- Performance: 85+
- Accessibility: 90+
- Best Practices: 90+
- SEO: 90+
- PWA: 90+
Core Web Vitals:
- LCP: < 2.5s
- INP: < 200ms
- CLS: < 0.1
Custom Metrics
Track Business Metrics:
- Form submission time
- Report generation time
- API response times
- Database query times
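One way to capture durations like these is a small start/stop timer. This is a minimal sketch under assumptions: startTimer and TimingMetric are hypothetical names, and how the resulting metric is reported (logger, performance monitor, or other) is left open.

```typescript
interface TimingMetric {
  name: string;
  durationMs: number;
}

// Starts a timer and returns a function that, when called, yields the elapsed time.
// The clock is injectable; it defaults to Date.now (millisecond resolution).
function startTimer(
  name: string,
  now: () => number = Date.now
): () => TimingMetric {
  const startedAt = now();
  return () => ({ name, durationMs: now() - startedAt });
}
```

For example, calling `const stop = startTimer("form_submission")` before a submit handler runs and `stop()` once it completes yields a metric that can be passed to whatever reporting path the platform uses.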
Log Aggregation
Structured Logging
Format:
- timestamp - ISO 8601 timestamp
- level - Log level (debug, info, warn, error)
- module - Module/core name
- action - Action being performed
- message - Human-readable message
- user_id - User ID (stable UUID, not PHI)
- org_id - Organization ID
- site_id - Site ID (if applicable)
- correlation_id - Request correlation ID
PHI Protection:
- Never log names, emails, SSNs, addresses
- Only log stable IDs (UUIDs)
- Sanitize error messages
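The fields and PHI rules above could be combined along the following lines. This is a sketch, not the actual logger.ts: buildLogEntry and sanitizeMessage are hypothetical, the redaction patterns are illustrative and far from exhaustive, and treating correlation_id as a UUID is an assumption.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

interface LogContext {
  module: string;
  action: string;
  user_id?: string;
  org_id?: string;
  site_id?: string;
  correlation_id?: string;
}

interface LogEntry extends LogContext {
  timestamp: string;
  level: LogLevel;
  message: string;
}

const ID_FIELDS = ["user_id", "org_id", "site_id", "correlation_id"] as const;
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Illustrative patterns only; real PHI detection needs broader coverage.
const EMAIL_PATTERN = /[^\s@]+@[^\s@]+\.[^\s@]+/g;
const SSN_PATTERN = /\b\d{3}-\d{2}-\d{4}\b/g;

// Redact obvious email addresses and SSN-shaped numbers from free text.
function sanitizeMessage(message: string): string {
  return message
    .replace(EMAIL_PATTERN, "[redacted-email]")
    .replace(SSN_PATTERN, "[redacted-ssn]");
}

// Build a structured entry; ID fields are kept only when they look like UUIDs.
function buildLogEntry(
  level: LogLevel,
  message: string,
  context: LogContext,
  now: Date = new Date()
): LogEntry {
  const entry: LogEntry = {
    timestamp: now.toISOString(),
    level,
    module: context.module,
    action: context.action,
    message: sanitizeMessage(message),
  };
  for (const key of ID_FIELDS) {
    const value = context[key];
    if (value !== undefined && UUID_PATTERN.test(value)) {
      entry[key] = value;
    }
  }
  return entry;
}
```

Serializing the result with JSON.stringify gives one JSON object per log line, which is the shape most aggregation services expect.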
Log Destinations
Development:
- Console (pretty-printed)
- Browser DevTools
Production:
- Log aggregation service (Datadog, Logtail, etc.)
- Supabase function logs
- Error tracking service (Sentry)
Supabase Logs
Edge Function Logs:
- View in Supabase Dashboard → Edge Functions → Logs
- Filter by function, time range, log level
- Export logs for analysis
Database Logs:
- Query logs in Supabase Dashboard → Database → Logs
- Monitor slow queries
- Track connection usage
Alerting Configuration
Alert Types
Critical Alerts (Immediate):
- System downtime
- Database connection failures
- Authentication failures
- RLS policy violations
- Security breaches
Warning Alerts:
- High error rates (> 1%)
- Performance degradation
- High database CPU (> 80%)
- Storage usage > 80%
- Edge function failures
Informational Alerts:
- Daily usage statistics
- Weekly performance summary
- Monthly security review
Alert Channels
Email:
- Use send-email-notification edge function
- Send to platform team email
- Include correlation IDs and context
SMS:
- Use send-sms-notification edge function
- Only for critical alerts
- Keep messages concise
In-App:
- Use Platform Notifications (PF-10)
- Show in application UI
- Persist in database
Alert Configuration (Planned)
Set up alerts for:
- Error rate > 1% over 5 minutes
- LCP > 3s for > 10% of users
- Database CPU > 80% for > 5 minutes
- Edge function failure rate > 5%
- Storage usage > 90%
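Since this configuration is still planned, the following is only a sketch of how the thresholds above might be expressed as data and evaluated. MetricSnapshot, ALERT_RULES, and evaluateAlerts are hypothetical, and the snapshot is assumed to be already aggregated over the relevant window (for example, the error rate over the last 5 minutes).

```typescript
// Assumed pre-aggregated metrics; rates are fractions (0.01 = 1%), usage values are percentages.
interface MetricSnapshot {
  errorRate: number;
  slowLcpUserShare: number; // share of users with LCP > 3s
  dbCpuPercent: number;
  edgeFunctionFailureRate: number;
  storageUsagePercent: number;
}

interface AlertRule {
  name: string;
  isBreached: (metrics: MetricSnapshot) => boolean;
}

const ALERT_RULES: AlertRule[] = [
  { name: "error_rate", isBreached: (m) => m.errorRate > 0.01 },
  { name: "slow_lcp", isBreached: (m) => m.slowLcpUserShare > 0.1 },
  { name: "database_cpu", isBreached: (m) => m.dbCpuPercent > 80 },
  { name: "edge_function_failures", isBreached: (m) => m.edgeFunctionFailureRate > 0.05 },
  { name: "storage_usage", isBreached: (m) => m.storageUsagePercent > 90 },
];

// Returns the names of all rules whose threshold is exceeded.
function evaluateAlerts(metrics: MetricSnapshot): string[] {
  return ALERT_RULES.filter((rule) => rule.isBreached(metrics)).map(
    (rule) => rule.name
  );
}
```

Keeping rules as data makes thresholds easy to review and tune without touching the evaluation logic.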
Dashboard Setup
Supabase Dashboard
Available Metrics:
- Database CPU/Memory usage
- API request count
- Storage usage
- Edge function invocations
- Authentication events
Access:
- Go to Supabase Dashboard
- Navigate to Project → Metrics
- View real-time and historical data
Custom Dashboard (Planned)
Metrics to Display:
- Error rate (last 24 hours)
- Performance metrics (LCP, INP, CLS)
- Active users
- API response times
- Database query performance
- Edge function success rate
Tools:
- Grafana (planned)
- Datadog (planned)
- Custom React dashboard (planned)
Incident Response
Incident Severity Levels
P0 - Critical:
- System down
- Data breach
- Security incident
- Response: Immediate (< 15 minutes)
P1 - High:
- Major feature broken
- Performance degradation
- High error rate
- Response: Within 1 hour
P2 - Medium:
- Minor feature broken
- Performance issues (non-critical)
- Response: Within 4 hours
P3 - Low:
- Cosmetic issues
- Non-critical bugs
- Response: Next business day
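The response targets above can be captured as a small lookup, sketched here on the assumption that the four levels are labeled P0 through P3. The names are hypothetical, and the lowest level has no fixed minute value because "next business day" depends on a business calendar that this guide does not define.

```typescript
type Severity = "P0" | "P1" | "P2" | "P3";

// Response targets in minutes; null means "next business day" (calendar-dependent).
const RESPONSE_TARGET_MINUTES: Record<Severity, number | null> = {
  P0: 15,
  P1: 60,
  P2: 240,
  P3: null,
};

// Latest acceptable first-response time, or null when no fixed deadline applies.
function responseDeadline(severity: Severity, detectedAt: Date): Date | null {
  const minutes = RESPONSE_TARGET_MINUTES[severity];
  if (minutes === null) return null;
  return new Date(detectedAt.getTime() + minutes * 60_000);
}
```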
Incident Response Process
1. Detection:
- Monitor alerts
- Review error tracking
- Check performance metrics
2. Triage:
- Assess severity
- Identify root cause
- Assign owner
3. Resolution:
- Fix issue
- Deploy fix
- Verify resolution
4. Post-Incident Review:
- Document incident
- Identify improvements
- Update procedures
On-Call Rotation
Responsibilities:
- Monitor alerts
- Respond to incidents
- Escalate if needed
- Document incidents
Schedule:
- Weekly rotation (planned)
- 24/7 coverage (planned)
- Escalation path defined
Best Practices
1. Monitoring Coverage
✅ DO:
- Monitor all critical paths
- Track business metrics
- Set up alerts for anomalies
- Review metrics regularly
❌ DON'T:
- Monitor everything (too noisy)
- Ignore false positives
- Make alerts too sensitive
- Forget to update alerts
2. Error Tracking
✅ DO:
- Capture all errors
- Include context
- Group similar errors
- Track error rates
❌ DON'T:
- Log PHI/PII
- Overwhelm with noise
- Ignore error trends
- Skip error boundaries
3. Performance Monitoring
✅ DO:
- Track Core Web Vitals
- Monitor custom metrics
- Set performance budgets
- Optimize slow paths
❌ DON'T:
- Track too many metrics
- Ignore performance regressions
- Skip performance testing
- Forget mobile performance
4. Alerting
✅ DO:
- Set meaningful thresholds
- Include context in alerts
- Test alert delivery
- Review and tune alerts
❌ DON'T:
- Alert on everything
- Ignore alert fatigue
- Skip alert testing
- Forget to update contacts
Troubleshooting
Issue: Too Many Alerts
Symptoms:
- Alert fatigue
- Important alerts missed
Solutions:
- Increase alert thresholds
- Reduce alert frequency
- Group similar alerts
- Use alert suppression
- Review and tune alerts
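Alert suppression can be as simple as refusing to resend the same alert within a cooldown window. The class below is a minimal in-memory sketch; AlertSuppressor is a hypothetical name, and a production version would likely need shared state across instances.

```typescript
// Suppresses repeat alerts with the same key inside a cooldown window.
class AlertSuppressor {
  private readonly windowMs: number;
  private readonly lastSentAt = new Map<string, number>();

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  // Returns true if the alert should be delivered now, and records the send time.
  shouldSend(alertKey: string, nowMs: number): boolean {
    const last = this.lastSentAt.get(alertKey);
    if (last !== undefined && nowMs - last < this.windowMs) {
      return false; // still inside the cooldown for this alert
    }
    this.lastSentAt.set(alertKey, nowMs);
    return true;
  }
}
```

Using the alert type as the key also groups similar alerts, since only the first occurrence in each window is delivered.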
Issue: Missing Alerts
Symptoms:
- Issues not detected
- Users report before alerts
Solutions:
- Lower alert thresholds
- Add more alert types
- Improve monitoring coverage
- Test alert delivery
- Review alert configuration
Issue: Performance Monitoring Not Working
Symptoms:
- No metrics collected
- Dashboard empty
Solutions:
- Verify initialization
- Check sample rate (may be too low)
- Verify web-vitals library installed
- Check browser console for errors
- Test in development (100% sampling)
Issue: Logs Not Appearing
Symptoms:
- No logs in console
- Missing log entries
Solutions:
- Check log level (entries may be filtered out)
- Verify logger initialized
- Check browser console filters
- Verify structured format
- Test with explicit log call
Monitoring Checklist
Setup
- Error tracking configured (Sentry/LogRocket)
- Performance monitoring initialized
- Log aggregation configured
- Alerts configured
- Dashboard created
- On-call rotation established
Ongoing
- Daily error review
- Weekly performance review
- Monthly alert tuning
- Quarterly monitoring review
- Incident post-mortems completed
Related Documentation
- Performance Patterns: constitution.md §5.6 (Performance)
- Error Handling: src/platform/monitoring/logger.ts
- Performance Monitor: src/platform/monitoring/performance-monitor.ts
- Production Readiness: docs/operations/PRODUCTION_READINESS.md
Document Owner: Platform Operations Team
Review Frequency: Quarterly