Spec: specs/fw/specs/FW-25-advanced-error-recovery-retry.md
Status: ✅ Complete
Last Updated: 2026-03-20
Overview
FW-25 is a cross-cutting concern that extends the durable execution stack (FW-46/47/48/49) with configurable retry strategies, circuit breakers, error recovery workflows, and compensation actions.
Integration Points
| Dependency | Feature | Integration Type | Usage |
|---|---|---|---|
| PF-01 | Organizations & Sites | Direct dependency | organization_id and site_id scoping on all FW-25 tables |
| PF-02 | RBAC | Direct dependency | Permission checks for retry/circuit breaker/recovery configuration (reuses fw.workflows.edit) |
| PF-10 | Notifications | Platform Layer | Alerts on circuit breaker state changes, recovery failures, compensation failures |
Internal FW Dependencies (Execution Pipeline)
| Dependency | Integration Contract | Direction |
|---|---|---|
| FW-46 (Durable Execution Worker) | Worker calls getRetryPolicy(ruleId, nodeId) before executing each step; on failure calls classifyError(error) | FW-46 → FW-25 |
| FW-47 (Dead Letter Queue) | After max retries exhausted, FW-25 inserts into DLQ with classified error info | FW-25 → FW-47 |
| FW-48 (Execution Step Checkpointing) | FW-48 step-level retry takes precedence; FW-25 node-level configs populate FW-48’s max_retries and strategy | FW-25 ↔ FW-48 |
| FW-49 (Execution Timeout & Watchdog) | When timeout watchdog detects expired execution, it calls runCompensationActions(executionId) | FW-49 → FW-25 |
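The contracts in the table above can be read as a small TypeScript interface. This is a minimal sketch, not FW-25's actual API: only the function names `getRetryPolicy`, `classifyError`, and `runCompensationActions` come from the table. The `RetryPolicy` fields, the `ErrorClass` values, the default policy, the transient error codes, and the in-memory maps are all assumptions made for illustration.

```typescript
// Hypothetical shapes; only the three function names come from the table above.
type ErrorClass = "transient" | "permanent";

interface RetryPolicy {
  maxRetries: number;
  strategy: "fixed" | "exponential";
  baseDelayMs: number;
}

interface ErrorRecoveryContract {
  // Called by the FW-46 worker before executing each step.
  getRetryPolicy(ruleId: string, nodeId: string): RetryPolicy;
  // Called by the FW-46 worker when a step fails.
  classifyError(error: unknown): ErrorClass;
  // Called by the FW-49 watchdog for an expired execution.
  runCompensationActions(executionId: string): string[];
}

// Assumed fallback for nodes without an explicit configuration.
const DEFAULT_POLICY: RetryPolicy = { maxRetries: 3, strategy: "exponential", baseDelayMs: 500 };

// Assumed set of error codes treated as transient (retryable).
const TRANSIENT_CODES = new Set(["ETIMEDOUT", "ECONNRESET", "RATE_LIMITED"]);

// In-memory stand-in; a real implementation would read the FW-25 tables.
function createErrorRecovery(
  policies: Map<string, RetryPolicy>,
  compensations: Map<string, string[]>,
): ErrorRecoveryContract {
  return {
    getRetryPolicy: (ruleId, nodeId) => policies.get(`${ruleId}:${nodeId}`) ?? DEFAULT_POLICY,
    classifyError: (error) => {
      const code = (error as { code?: unknown } | null | undefined)?.code;
      return typeof code === "string" && TRANSIENT_CODES.has(code) ? "transient" : "permanent";
    },
    // Returns the registered action names in reverse registration order.
    runCompensationActions: (executionId) => [...(compensations.get(executionId) ?? [])].reverse(),
  };
}
```

Treating unknown errors as permanent is a conservative choice in this sketch: an unclassified failure is not retried blindly.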
Execution Pipeline Flow
Event → FW-46 Worker dequeues → FW-48 Checkpoint step start
                                             │
                                       Execute step
                                             │
                                   ┌─────────┴──────────┐
                                   │                    │
                                Success              Failure
                                   │                    │
                           FW-48 Checkpoint  FW-25 classifies error
                             step complete              │
                                   │          ┌─────────┴──────────┐
                                   │      Transient            Permanent
                                   │          │                    │
                                   │ FW-25 retry policy   FW-25 compensation
                                   │  (backoff + retry)     (reverse order)
                                   │          │                    │
                                   │    Retry budget           FW-47 DLQ
                                   │     exhausted?                │
                                   │          │             FW-49 notifies
                                   │      FW-47 DLQ              owner
                                   │
                             Next step...
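The failure branch of the flow above can be sketched as a single function. This is an illustrative sketch under stated assumptions, not the worker's implementation: `runStep`, `StepContext`, `Outcome`, and the doubling backoff formula are hypothetical, and delays are only recorded rather than awaited so that the example stays synchronous.

```typescript
type ErrorClass = "transient" | "permanent";

type Outcome =
  | { status: "completed"; attempts: number; delaysMs: number[] }
  | {
      status: "dead_lettered";
      attempts: number;
      delaysMs: number[];
      errorClass: ErrorClass;
      compensated: string[];
    };

interface StepContext {
  maxRetries: number;
  baseDelayMs: number;
  classifyError: (error: unknown) => ErrorClass;
  // Compensation actions for already-completed steps, in execution order.
  compensations: Array<{ name: string; undo: () => void }>;
}

// Assumed exponential backoff: base, 2x base, 4x base, ...
function backoffDelayMs(baseDelayMs: number, retryNumber: number): number {
  return baseDelayMs * 2 ** (retryNumber - 1);
}

function runStep(step: () => void, ctx: StepContext): Outcome {
  const delaysMs: number[] = [];
  let attempts = 0;
  while (true) {
    attempts++;
    try {
      step();
      return { status: "completed", attempts, delaysMs };
    } catch (error) {
      const errorClass = ctx.classifyError(error);
      const retriesUsed = attempts - 1;
      if (errorClass === "transient" && retriesUsed < ctx.maxRetries) {
        // A real worker would wait this long before the next attempt.
        delaysMs.push(backoffDelayMs(ctx.baseDelayMs, retriesUsed + 1));
        continue;
      }
      // Permanent errors are compensated in reverse order before dead-lettering.
      const compensated: string[] = [];
      if (errorClass === "permanent") {
        for (const action of [...ctx.compensations].reverse()) {
          action.undo();
          compensated.push(action.name);
        }
      }
      return { status: "dead_lettered", attempts, delaysMs, errorClass, compensated };
    }
  }
}
```

As in the diagram, a transient error whose retry budget is exhausted goes to the dead letter queue without compensation, while a permanent error is compensated first.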
Database Tables
| Table | Purpose | Status |
|---|---|---|
| fw_workflow_retry_configs | Retry configuration per node | 📝 Planned |
| fw_workflow_circuit_breakers | Circuit breaker state per node | 📝 Planned |
| fw_workflow_recovery_workflows | Recovery workflow definitions | 📝 Planned |
| fw_workflow_compensation_actions | Compensation action definitions | 📝 Planned |
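To make "circuit breaker state per node" concrete, the sketch below models one breaker record using the conventional closed / open / half-open state machine. The table schema is not given here, so every field name, the failure threshold, and the cooldown are assumptions; only the organization and site scoping follows from the PF-01 dependency listed earlier.

```typescript
type BreakerState = "closed" | "open" | "half_open";

// Hypothetical in-memory stand-in for one fw_workflow_circuit_breakers row.
interface CircuitBreaker {
  organizationId: string;
  siteId: string;
  nodeId: string;
  state: BreakerState;
  consecutiveFailures: number;
  openedAtMs: number | null;
}

const FAILURE_THRESHOLD = 3; // assumed: failures in a row before opening
const COOLDOWN_MS = 30000; // assumed: wait before allowing a trial call

// Returns whether the node may execute; moves open -> half_open after the cooldown.
function canExecute(b: CircuitBreaker, nowMs: number): boolean {
  if (b.state === "open" && b.openedAtMs !== null && nowMs - b.openedAtMs >= COOLDOWN_MS) {
    b.state = "half_open";
  }
  return b.state !== "open";
}

// Records the result of an execution and updates the breaker state.
function recordResult(b: CircuitBreaker, success: boolean, nowMs: number): void {
  if (success) {
    b.state = "closed";
    b.consecutiveFailures = 0;
    b.openedAtMs = null;
    return;
  }
  b.consecutiveFailures++;
  // A failed trial call in half_open reopens the breaker immediately.
  if (b.state === "half_open" || b.consecutiveFailures >= FAILURE_THRESHOLD) {
    b.state = "open";
    b.openedAtMs = nowMs;
  }
}
```

Each transition in `recordResult` and `canExecute` is also a natural point to emit the PF-10 alerts on circuit breaker state changes.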
SECURITY DEFINER Functions
| Function | Purpose | Status |
|---|---|---|
| fw_has_rule_org_access(rule_id, user_id) | Looks up organization_id from fw_automation_rules, then delegates to pf_has_org_access() for V2 role-assignment check (with expires_at filtering). Used in RLS policies to avoid recursion. | 📝 Planned |
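The function itself would be written in SQL. The lookup-then-delegate logic described above is modelled here in TypeScript only to make the control flow explicit. The data shapes, the argument order of `pfHasOrgAccess`, and the deny-on-unknown-rule behaviour are assumptions; the real `pf_has_org_access()` signature is not shown in this document.

```typescript
// Hypothetical stand-in for a V2 role assignment row.
interface RoleAssignment {
  userId: string;
  organizationId: string;
  expiresAt: Date | null; // null is assumed to mean "never expires"
}

// Models pf_has_org_access(): true if the user holds an unexpired assignment in the org.
function pfHasOrgAccess(
  assignments: RoleAssignment[],
  organizationId: string,
  userId: string,
  now: Date,
): boolean {
  return assignments.some(
    (a) =>
      a.userId === userId &&
      a.organizationId === organizationId &&
      (a.expiresAt === null || a.expiresAt.getTime() > now.getTime()),
  );
}

// Models fw_has_rule_org_access(rule_id, user_id).
// `rules` maps a rule id to its organization_id, standing in for fw_automation_rules.
function fwHasRuleOrgAccess(
  rules: Map<string, string>,
  assignments: RoleAssignment[],
  ruleId: string,
  userId: string,
  now: Date,
): boolean {
  const organizationId = rules.get(ruleId);
  if (organizationId === undefined) return false; // assumed: unknown rule denies access
  return pfHasOrgAccess(assignments, organizationId, userId, now);
}
```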
Event Contracts (Future)
Circuit breaker state change events are planned for future phases. See EVENT_CONTRACTS.md for event schema conventions.