Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.encoreos.io/llms.txt

Use this file to discover all available pages before exploring further.

Spec: specs/fw/specs/FW-25-advanced-error-recovery-retry.md Status: ✅ Complete Last Updated: 2026-03-20

Overview

FW-25 is a cross-cutting concern that extends the durable execution stack (FW-46/47/48/49) with configurable retry strategies, circuit breakers, error recovery workflows, and compensation actions.

Integration Points

Platform Foundation (PF) Dependencies

DependencyFeatureIntegration TypeUsage
PF-01Organizations & SitesDirect dependencyorganization_id and site_id scoping on all FW-25 tables
PF-02RBACDirect dependencyPermission checks for retry/circuit breaker/recovery configuration (reuses fw.workflows.edit)
PF-10NotificationsPlatform LayerAlerts on circuit breaker state changes, recovery failures, compensation failures

Internal FW Dependencies (Execution Pipeline)

DependencyIntegration ContractDirection
FW-46 (Durable Execution Worker)Worker calls getRetryPolicy(ruleId, nodeId) before executing each step; on failure calls classifyError(error)FW-46 → FW-25
FW-47 (Dead Letter Queue)After max retries exhausted, FW-25 inserts into DLQ with classified error infoFW-25 → FW-47
FW-48 (Execution Step Checkpointing)FW-48 step-level retry takes precedence; FW-25 node-level configs populate FW-48’s max_retries and strategyFW-25 ↔ FW-48
FW-49 (Execution Timeout & Watchdog)When timeout watchdog detects expired execution, it calls runCompensationActions(executionId)FW-49 → FW-25

Execution Pipeline Flow

Event → FW-46 Worker dequeues → FW-48 Checkpoint step start

                                  Execute step

                              ┌─────────┴──────────┐
                              │                     │
                          Success               Failure
                              │                     │
                    FW-48 Checkpoint          FW-25 classifies error
                    step complete                    │
                              │           ┌─────────┴──────────┐
                              │       Transient            Permanent
                              │           │                     │
                              │     FW-25 retry policy     FW-25 compensation
                              │     (backoff + retry)      (reverse order)
                              │           │                     │
                              │     Retry budget          FW-47 DLQ
                              │     exhausted?                  │
                              │           │              FW-49 notifies
                              │     FW-47 DLQ                owner

                        Next step...

Database Tables

TablePurposeStatus
fw_workflow_retry_configsRetry configuration per node📝 Planned
fw_workflow_circuit_breakersCircuit breaker state per node📝 Planned
fw_workflow_recovery_workflowsRecovery workflow definitions📝 Planned
fw_workflow_compensation_actionsCompensation action definitions📝 Planned

SECURITY DEFINER Functions

FunctionPurposeStatus
fw_has_rule_org_access(rule_id, user_id)Looks up organization_id from fw_automation_rules, then delegates to pf_has_org_access() for V2 role-assignment check (with expires_at filtering). Used in RLS policies to avoid recursion.📝 Planned

Event Contracts (Future)

Circuit breaker state change events are planned for future phases. See EVENT_CONTRACTS.md for event schema conventions.