Version: 1.0.0
Last Updated: 2025-12-24
Status: Active
Owner: Platform Foundation (PF-36)

Overview

The System Health Dashboard provides real-time visibility into your application’s performance and health status. Administrators can monitor metrics, configure alerts, and analyze historical trends.

Accessing the Dashboard

Navigate to Settings → System Health to access the monitoring dashboard.

Dashboard Features

System Status Overview

The top section shows the overall system health:
  • Healthy (green): All components operational
  • Degraded (yellow): Some components experiencing issues
  • Unhealthy (red): Critical components failing
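
The three status levels can be read as a roll-up of component checks. The following is a minimal sketch of that idea, not the dashboard's actual logic: the `overall_status` function and the `CRITICAL_COMPONENTS` set are hypothetical, and this guide does not say which components count as critical.

```python
from enum import Enum


class Status(Enum):
    HEALTHY = "healthy"      # green
    DEGRADED = "degraded"    # yellow
    UNHEALTHY = "unhealthy"  # red


# Assumption: the guide does not list which components are critical.
CRITICAL_COMPONENTS = {"database", "auth"}


def overall_status(components: dict[str, bool]) -> Status:
    """Roll per-component up/down flags into one overall status."""
    failing = {name for name, ok in components.items() if not ok}
    if failing & CRITICAL_COMPONENTS:
        return Status.UNHEALTHY   # at least one critical component is failing
    if failing:
        return Status.DEGRADED    # only non-critical components have issues
    return Status.HEALTHY         # everything is operational
```

Under these assumptions, a storage outage alone yields Degraded, while a database outage yields Unhealthy.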

Component Health

Monitor the status of key dependencies:
  • Database: Connection and query performance
  • Auth: Authentication service availability
  • Storage: File storage accessibility
  • Integrations: External service connections

Performance Metrics

View key performance indicators:
  • API Response Time: Average, p95, and p99 response times
  • Error Rate: Percentage of failed requests
  • Request Volume: Requests per minute/hour
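
To make the indicators above concrete, here is a small sketch that computes them from raw samples. It uses the nearest-rank percentile method, which is an assumption: this guide does not state how the dashboard calculates p95 and p99. The function names and dictionary keys are illustrative only.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms: list[float], failed: int, total: int) -> dict[str, float]:
    """Average, p95, and p99 response time plus error rate as a percentage."""
    if not latencies_ms:
        raise ValueError("at least one latency sample is required")
    return {
        "avg_ms": sum(latencies_ms) / len(latencies_ms),
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        "error_rate_pct": 100 * failed / total if total else 0.0,
    }
```

The p95 and p99 values highlight slow outliers that an average hides, which is why all three are worth watching together.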

Web Vitals

Core Web Vitals measure user experience. Values between the good and poor thresholds need improvement:
  • LCP (Largest Contentful Paint): < 2.5s good, > 4s poor
  • INP (Interaction to Next Paint): < 200ms good, > 500ms poor
  • CLS (Cumulative Layout Shift): < 0.1 good, > 0.25 poor
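
The thresholds above map each measurement to a rating. A minimal sketch of that mapping follows; the `THRESHOLDS` table and `rate` function are illustrative names, and treating a value exactly at the boundary as good is an assumption.

```python
# (good_max, poor_min) per metric, taken from the thresholds listed above.
THRESHOLDS = {
    "LCP": (2.5, 4.0),    # seconds
    "INP": (200, 500),    # milliseconds
    "CLS": (0.1, 0.25),   # unitless score
}


def rate(metric: str, value: float) -> str:
    """Classify a Web Vitals measurement as good, needs improvement, or poor."""
    good_max, poor_min = THRESHOLDS[metric]
    if value <= good_max:
        return "good"
    if value > poor_min:
        return "poor"
    return "needs improvement"
```

For example, an LCP of 3.1 seconds falls between the two thresholds and is rated as needing improvement.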

Configuring Alerts

Creating an Alert

  1. Go to System Health → Alerts
  2. Click Create Alert
  3. Configure:
    • Name: Descriptive alert name
    • Metric: What to monitor (e.g., API Response Time)
    • Condition: Threshold and operator (e.g., > 500ms)
    • Severity: Info, Warning, or Critical
    • Notifications: Email and/or in-app
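
The fields in step 3 can be pictured as a small record. The sketch below is a hypothetical model for illustration, not the product's API: the `AlertRule` class, its field names, the set of operators, and the channel identifiers are all assumptions.

```python
import operator
from dataclasses import dataclass

# Assumption: the guide only shows ">" as an example operator.
OPERATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}
SEVERITIES = {"info", "warning", "critical"}
CHANNELS = {"email", "in_app"}


@dataclass(frozen=True)
class AlertRule:
    name: str
    metric: str
    op: str
    threshold: float
    severity: str = "warning"
    notifications: tuple[str, ...] = ("in_app",)

    def __post_init__(self) -> None:
        # Reject values the form in step 3 would not offer.
        if self.op not in OPERATORS:
            raise ValueError(f"unsupported operator: {self.op}")
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity}")
        if not self.notifications or not set(self.notifications) <= CHANNELS:
            raise ValueError(f"unknown notification channel in {self.notifications}")

    def is_breached(self, value: float) -> bool:
        """Return True when the metric value meets the alert condition."""
        return OPERATORS[self.op](value, self.threshold)
```

A rule such as `AlertRule("Slow API", "api_response_time_ms", ">", 500)` would be breached by a value of 620 but not by exactly 500.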

Alert Settings

  • Evaluation Window: Time period over which the metric is averaged before the condition is checked (1 to 60 minutes)
  • Cooldown Period: Minimum time between repeated notifications for the same alert
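
The two settings interact: the evaluation window smooths out short spikes, and the cooldown suppresses repeat notifications while a problem persists. The following sketch shows one way this could work, assuming a simple "average greater than threshold" condition; the `WindowedAlert` class is hypothetical and timestamps are plain seconds.

```python
from collections import deque
from typing import Optional


class WindowedAlert:
    def __init__(self, threshold: float, window_minutes: int, cooldown_minutes: int) -> None:
        if not 1 <= window_minutes <= 60:
            raise ValueError("evaluation window must be 1 to 60 minutes")
        self.threshold = threshold
        self.window_s = window_minutes * 60
        self.cooldown_s = cooldown_minutes * 60
        self.samples: deque = deque()          # (timestamp, value) pairs
        self.last_fired: Optional[float] = None

    def observe(self, ts: float, value: float) -> bool:
        """Record a sample and return True if a notification should be sent now."""
        self.samples.append((ts, value))
        # Drop samples that have aged out of the evaluation window.
        while self.samples[0][0] <= ts - self.window_s:
            self.samples.popleft()
        average = sum(v for _, v in self.samples) / len(self.samples)
        if average <= self.threshold:
            return False
        # Condition is met, but stay quiet while the cooldown is active.
        if self.last_fired is not None and ts - self.last_fired < self.cooldown_s:
            return False
        self.last_fired = ts
        return True
```

With a 5-minute window and a 10-minute cooldown, a sustained breach produces at most one notification every 10 minutes rather than one per sample.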

Responding to Alerts

Alert Workflow

  1. Triggered: Alert condition met
  2. Acknowledged: Team member reviewing
  3. Resolved: Issue fixed, either manually by a team member or automatically
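
The workflow above behaves like a small state machine. The sketch below models it under one assumption that goes beyond this guide: an alert may move straight from Triggered to Resolved, which is how auto-resolution is represented here. The names `AlertState`, `ALLOWED`, and `transition` are illustrative.

```python
from enum import Enum


class AlertState(Enum):
    TRIGGERED = "triggered"
    ACKNOWLEDGED = "acknowledged"
    RESOLVED = "resolved"


# Assumption: TRIGGERED -> RESOLVED is allowed to cover auto-resolved alerts.
ALLOWED = {
    AlertState.TRIGGERED: {AlertState.ACKNOWLEDGED, AlertState.RESOLVED},
    AlertState.ACKNOWLEDGED: {AlertState.RESOLVED},
    AlertState.RESOLVED: set(),
}


def transition(current: AlertState, target: AlertState) -> AlertState:
    """Return the new state, or raise if the workflow does not allow the move."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Modeling Resolved as a terminal state means a recurring problem shows up as a new alert instead of reopening an old one.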

Resolution Notes

When resolving an alert, document:
  • Root cause
  • Actions taken
  • Prevention measures

Historical Analysis

Access System Health → History for:
  • Trend Charts: 30-day metric visualization
  • Period Comparison: Week-over-week, month-over-month
  • Data Export: CSV or JSON format
  • Performance Reports: Summary statistics
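
Exported data can be analyzed outside the dashboard. As a rough sketch, the snippet below computes a period-over-period change and serializes rows to CSV or JSON with the Python standard library. The column names (`day`, `avg_ms`) are hypothetical, since this guide does not document the export schema.

```python
import csv
import io
import json


def percent_change(previous: float, current: float) -> float:
    """Relative change between two periods, as a percentage."""
    if previous == 0:
        raise ValueError("previous period value must be non-zero")
    return (current - previous) / previous * 100


def to_csv(rows: list[dict]) -> str:
    """Serialize rows to CSV text, using the first row's keys as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows: list[dict]) -> str:
    """Serialize rows to a JSON array."""
    return json.dumps(rows, indent=2)
```

For instance, if last week's average response time was 200 ms and this week's is 250 ms, `percent_change(200, 250)` reports a 25% increase.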

Best Practices

  1. Start with Warning-severity alerts and add Critical ones once thresholds are tuned
  2. Use appropriate cooldown periods to reduce noise
  3. Document resolutions for future reference
  4. Review trends weekly to identify patterns