> ## Documentation Index
> Fetch the complete documentation index at: https://docs.encoreos.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Advanced Error Recovery & Retry Strategies — Integration Documentation

> Spec: specs/fw/specs/FW-25-advanced-error-recovery-retry.md Status: ✅ Complete Last Updated: 2026-03-20

**Spec:** `specs/fw/specs/FW-25-advanced-error-recovery-retry.md`
**Status:** ✅ Complete
**Last Updated:** 2026-03-20

***

## Overview

FW-25 is a cross-cutting concern that extends the durable execution stack (FW-46/47/48/49) with configurable retry strategies, circuit breakers, error recovery workflows, and compensation actions.

***

## Integration Points

### Platform Foundation (PF) Dependencies

| Dependency | Feature               | Integration Type  | Usage                                                                                           |
| ---------- | --------------------- | ----------------- | ----------------------------------------------------------------------------------------------- |
| PF-01      | Organizations & Sites | Direct dependency | `organization_id` and `site_id` scoping on all FW-25 tables                                     |
| PF-02      | RBAC                  | Direct dependency | Permission checks for retry/circuit breaker/recovery configuration (reuses `fw.workflows.edit`) |
| PF-10      | Notifications         | Platform Layer    | Alerts on circuit breaker state changes, recovery failures, compensation failures               |

### Internal FW Dependencies (Execution Pipeline)

| Dependency                           | Integration Contract                                                                                              | Direction     |
| ------------------------------------ | ----------------------------------------------------------------------------------------------------------------- | ------------- |
| FW-46 (Durable Execution Worker)     | Worker calls `getRetryPolicy(ruleId, nodeId)` before executing each step; on failure calls `classifyError(error)` | FW-46 → FW-25 |
| FW-47 (Dead Letter Queue)            | After max retries exhausted, FW-25 inserts into DLQ with classified error info                                    | FW-25 → FW-47 |
| FW-48 (Execution Step Checkpointing) | FW-48 step-level retry takes precedence; FW-25 node-level configs populate FW-48's `max_retries` and strategy     | FW-25 ↔ FW-48 |
| FW-49 (Execution Timeout & Watchdog) | When timeout watchdog detects expired execution, it calls `runCompensationActions(executionId)`                   | FW-49 → FW-25 |

### Execution Pipeline Flow

```text theme={null}
Event → FW-46 Worker dequeues → FW-48 Checkpoint step start
                                        │
                                  Execute step
                                        │
                              ┌─────────┴──────────┐
                              │                     │
                          Success               Failure
                              │                     │
                    FW-48 Checkpoint          FW-25 classifies error
                    step complete                    │
                              │           ┌─────────┴──────────┐
                              │       Transient            Permanent
                              │           │                     │
                              │     FW-25 retry policy     FW-25 compensation
                              │     (backoff + retry)      (reverse order)
                              │           │                     │
                              │     Retry budget          FW-47 DLQ
                              │     exhausted?                  │
                              │           │              FW-49 notifies
                              │     FW-47 DLQ                owner
                              │
                        Next step...
```

***

## Database Tables

| Table                              | Purpose                         | Status     |
| ---------------------------------- | ------------------------------- | ---------- |
| `fw_workflow_retry_configs`        | Retry configuration per node    | 📝 Planned |
| `fw_workflow_circuit_breakers`     | Circuit breaker state per node  | 📝 Planned |
| `fw_workflow_recovery_workflows`   | Recovery workflow definitions   | 📝 Planned |
| `fw_workflow_compensation_actions` | Compensation action definitions | 📝 Planned |

### SECURITY DEFINER Functions

| Function                                   | Purpose                                                                                                                                                                                             | Status     |
| ------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------- |
| `fw_has_rule_org_access(rule_id, user_id)` | Looks up `organization_id` from `fw_automation_rules`, then delegates to `pf_has_org_access()` for V2 role-assignment check (with `expires_at` filtering). Used in RLS policies to avoid recursion. | 📝 Planned |

***

## Event Contracts (Future)

Circuit breaker state change events are planned for future phases. See `EVENT_CONTRACTS.md` for event schema conventions.

***

## Related Integration Docs

* [FW-46 Durable Execution Worker](./durable-execution-worker-integration.md)
* [FW-47 Dead Letter Queue](./dead-letter-queue-integration.md)
* [FW-48 Execution Step Checkpointing](./execution-step-checkpointing-integration.md)
* [FW-49 Execution Timeout & Watchdog](./execution-timeout-watchdog-integration.md)
