# Platform Foundation & Workflow — Business Automation Architecture Research

**Version:** 1.1
**Last Updated:** 2026-03-15
**Status:** Architecture Research & Enhanced Recommendations
**Builds On:** [CROSS\_CORE\_EVENTS\_AUTOMATION\_WORKFLOW\_DEEP\_DIVE.md](./CROSS_CORE_EVENTS_AUTOMATION_WORKFLOW_DEEP_DIVE.md)

This document extends the Cross-Core Events, Automation & Workflow Deep Dive with industry research, pattern analysis, and enhanced recommendations for the Platform Foundation (PF) and Forms & Workflow (FW) cores — focusing on business automation, event-driven architecture, and workflow orchestration.

***

## Table of Contents

1. [Executive Summary](#1-executive-summary)
2. [Current State Assessment](#2-current-state-assessment)
3. [Industry Research & Pattern Analysis](#3-industry-research--pattern-analysis)
4. [Gap Analysis: PF Business Automation](#4-gap-analysis-pf-business-automation)
5. [Gap Analysis: FW Workflow Engine](#5-gap-analysis-fw-workflow-engine)
6. [Enhanced Recommendations: Platform Foundation](#6-enhanced-recommendations-platform-foundation)
7. [Enhanced Recommendations: FW Workflow Engine](#7-enhanced-recommendations-fw-workflow-engine)
8. [Proposed Architecture: Durable Workflow Execution](#8-proposed-architecture-durable-workflow-execution)
9. [Proposed Architecture: Business Rules Engine](#9-proposed-architecture-business-rules-engine)
10. [Proposed Architecture: Event Mesh & Dead Letter Queue](#10-proposed-architecture-event-mesh--dead-letter-queue)
11. [FHIR Workflow Alignment](#11-fhir-workflow-alignment)
12. [Cross-Core Business Process Catalog](#12-cross-core-business-process-catalog)
13. [Implementation Roadmap](#13-implementation-roadmap)
14. [Risk Assessment](#14-risk-assessment)
15. [References](#15-references)
16. [Appendix A: Supabase Queues (pgmq) — Enhanced Execution Architecture](#appendix-a-supabase-queues-pgmq--enhanced-execution-architecture)
17. [Appendix B: json-rules-engine — Concrete Business Rules Implementation](#appendix-b-json-rules-engine--concrete-business-rules-implementation)
18. [Appendix C: FHIR Behavioral Health Alignment — Extended Research](#appendix-c-fhir-behavioral-health-alignment--extended-research)
19. [Appendix D: Industry Sources](#appendix-d-industry-sources)

***

## 1. Executive Summary

The Encore Health OS platform has a strong foundation for forms (FW-01/02), automation rules (FW-03), visual workflows (FW-06), event publishing (FW-16), and approval chains (FW-34). However, the deep dive identified critical gaps in **execution durability**, **event delivery guarantees**, and **business rules orchestration** that must be closed before the platform can support production-grade business automation.

This research identifies **18 enhanced recommendations** organized into three tracks:

| Track                                         | Focus                                                                                      | Priority |
| --------------------------------------------- | ------------------------------------------------------------------------------------------ | -------- |
| **Track A: Execution Durability**             | Close the queued-execution loop, add retry/compensation, implement dead letter queues      | CRITICAL |
| **Track B: Business Rules & Decision Engine** | Configurable rules beyond trigger→action, clinical decision support hooks, SLA enforcement | HIGH     |
| **Track C: Event Architecture Maturity**      | Schema registry, event versioning, unified event bus documentation, observability          | MEDIUM   |

**Key insight:** The platform's Supabase-native architecture is well-suited for a **durable execution model** using `pg_cron` + Edge Functions + `fw_workflow_executions` as the state store — without introducing external orchestration tools (Temporal, Inngest). This keeps the stack unified while achieving the reliability guarantees healthcare workflows demand.

***

## 2. Current State Assessment

### 2.1 Platform Foundation (PF) — Business Automation Capabilities

| Module                         | Status             | Business Automation Role                          |
| ------------------------------ | ------------------ | ------------------------------------------------- |
| **PF-04: Audit Logging**       | Complete           | Compliance trail for all automated actions        |
| **PF-08: Forms Integration**   | Complete           | Platform layer for form-triggered automations     |
| **PF-10: Notifications**       | Complete           | Multi-channel delivery (in-app, email, SMS, push) |
| **PF-27: AI Integration**      | Complete (Phase 3) | AI-assisted suggestions, report narratives        |
| **PF-29: Unified Task System** | Complete           | Task lifecycle management, assignment, tracking   |
| **PF-35: Integration Hub**     | Complete           | External API connections, webhook management      |
| **PF-36: System Health**       | Complete           | Health monitoring, alerting                       |
| **PF-42: Rate Limiting**       | Spec               | Protects automation from runaway execution        |
| **PF-45: Feature Flags**       | Complete           | Progressive rollout of automation features        |
| **PF-47: Bulk Operations**     | Spec               | Batch processing for automation results           |
| **PF-48: Security Events**     | Spec               | Security event monitoring and alerting            |
| **PF-50: Tenant Provisioning** | Complete           | Automated tenant setup workflows                  |
| **PF-66: Realtime Layer**      | Complete           | Live UI updates for workflow status               |

### 2.2 Forms & Workflow (FW) — Automation Capabilities

| Module                                | Status    | Role                                            |
| ------------------------------------- | --------- | ----------------------------------------------- |
| **FW-01: Form Builder**               | Complete  | Form definition, versioning, conditional logic  |
| **FW-02: Form Submissions**           | Complete  | Submission lifecycle, validation                |
| **FW-03: Automation Engine**          | Complete  | Trigger → condition → action rules              |
| **FW-06: Visual Workflow Builder**    | Complete  | React Flow graph editor                         |
| **FW-16: Event-Based Triggers**       | Complete  | Domain event registry, date-relative triggers   |
| **FW-17: Condition Builder**          | Complete  | Complex condition evaluation                    |
| **FW-18: Workflow Variables**         | Complete  | Variable binding in workflow execution          |
| **FW-22: Execution Monitoring**       | Complete  | Real-time execution status tracking             |
| **FW-23: Performance Analytics**      | Complete  | Workflow performance metrics                    |
| **FW-24: Testing & Sandbox**          | Complete  | Sandbox execution, test cases                   |
| **FW-25: Error Recovery & Retry**     | Spec only | Retry strategies, circuit breaker, compensation |
| **FW-26: Scheduling & Resources**     | Spec only | Resource-aware scheduling                       |
| **FW-29: Notifications & Alerts**     | Complete  | Workflow-triggered notifications                |
| **FW-34: Approval Workflows**         | Complete  | Sequential/parallel approval chains             |
| **FW-35: SLA Deadline Management**    | Spec only | SLA tracking and escalation                     |
| **FW-40: Quorum-Based Approval**      | Spec only | Multi-approver quorum logic                     |
| **FW-41: Sub-Workflow Orchestration** | Spec only | Composable sub-workflows                        |
| **FW-43: Audit Trail & Compliance**   | Spec only | Compliance reporting for workflows              |

### 2.3 Critical Gaps (from Deep Dive)

1. **Queued executions never processed** — `fw_process_domain_event()` creates `fw_workflow_executions` rows with `status = 'queued'` but no worker picks them up.
2. **Form submission → automation invocation path unclear** — No documented or implemented bridge between `fw_form_submissions` INSERT and `automation-executor` invocation.
3. **Dual event paths not unified** — Table-driven (fw\_domain\_events) vs HTTP (event-consumer) with no clear routing documentation.
4. **XState workflow machine disconnected** — Client-side step machine exists but is not wired to server-side execution.
5. **No dead letter queue** — Failed events/executions have no recovery path.
6. **No event schema versioning** — Event payloads can change without consumer awareness.

***

## 3. Industry Research & Pattern Analysis

### 3.1 Healthcare Workflow Automation — Industry Patterns

**Behavioral health ERP systems** require workflow automation patterns that differ from general-purpose ERP:

| Pattern                         | Healthcare Requirement                                            | Encore Health OS Status   |
| ------------------------------- | ----------------------------------------------------------------- | ------------------------- |
| **Human-in-the-loop approvals** | Clinical decisions require licensed professional review           | FW-34 (Complete)          |
| **Deadline-driven escalation**  | Authorization expiry, treatment plan renewals, discharge planning | FW-35 (Spec only)         |
| **Regulatory audit trails**     | 42 CFR Part 2 (substance use disorder records), HIPAA, state licensing | PF-04 + FW-43 (Spec only) |
| **Cross-entity workflows**      | Patient intake spans CE → PM → CL → RH → FA                       | Partially via events      |
| **Compensation/rollback**       | Failed billing should not block clinical documentation            | FW-25 (Spec only)         |
| **Configurable per-tenant**     | Each organization has different intake procedures                 | FW-03 rules are per-org   |
| **Temporal awareness**          | Actions relative to dates (admission +7d, discharge -3d)          | FW-16 date-relative       |

**Key industry insight:** Leading healthcare platforms (Epic, Cerner/Oracle Health, athenahealth) implement workflow as a **state machine with durable execution** — not as simple trigger→action rules. The workflow engine must survive restarts, handle long-running processes (days/weeks for authorizations), and support human-in-the-loop pauses.

### 3.2 Event-Driven Architecture — Best Practices

**At-least-once delivery with idempotent consumers** is the proven pattern for healthcare event systems:

| Practice                    | Description                                                                                    | Applicability                                                                    |
| --------------------------- | ---------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------- |
| **Transactional outbox**    | Write event to outbox table in same transaction as business data; worker publishes from outbox | High — prevents lost events when business write succeeds but event publish fails |
| **Idempotency keys**        | Each event carries a unique ID; consumers track processed IDs to skip duplicates               | High — `fw_domain_events.id` can serve as idempotency key                        |
| **Dead letter queue (DLQ)** | Failed events moved to DLQ after N retries for manual review                                   | Critical — currently missing                                                     |
| **Event schema registry**   | Versioned schemas with backward compatibility rules                                            | Medium — prevents breaking consumers                                             |
| **Consumer groups**         | Multiple independent consumers per event type, each tracking its own offset                    | Medium — enables new consumers without modifying publishers                      |
| **Saga/choreography**       | Long-running business processes as a series of compensatable steps                             | High — patient intake, billing cycles                                            |
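The idempotency-key practice can be sketched as a wrapper that skips events a consumer has already handled. This is a minimal sketch under stated assumptions: the in-memory `Set` stands in for a hypothetical `fw_processed_events` table keyed on consumer name plus `fw_domain_events.id`, and `DomainEvent` is an illustrative shape, not the platform's actual type.

```typescript
// Hypothetical event shape; `id` plays the role of fw_domain_events.id.
export interface DomainEvent {
  id: string;
  eventType: string;
  payload: Record<string, unknown>;
}

// In-memory stand-in for a persistent table (hypothetical fw_processed_events)
// with a UNIQUE (consumer, event_id) constraint.
export class ProcessedEventStore {
  private seen = new Set<string>();

  has(consumer: string, eventId: string): boolean {
    return this.seen.has(`${consumer}:${eventId}`);
  }

  record(consumer: string, eventId: string): void {
    this.seen.add(`${consumer}:${eventId}`);
  }
}

// Wraps a handler so that redeliveries of the same event are skipped.
// Returns true if the handler ran, false if the event was a duplicate.
export function idempotent(
  consumer: string,
  store: ProcessedEventStore,
  handler: (e: DomainEvent) => void,
): (e: DomainEvent) => boolean {
  return (e: DomainEvent): boolean => {
    if (store.has(consumer, e.id)) return false; // already processed
    handler(e); // if this throws, nothing is recorded and redelivery retries it
    // In Postgres, record the ID in the same transaction as the handler's writes.
    store.record(consumer, e.id);
    return true;
  };
}
```

Recording the ID only after the handler succeeds is what keeps at-least-once delivery safe: a failed attempt leaves no trace, so the next delivery runs the handler again.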

**Supabase-specific patterns:**

| Pattern                  | Implementation                                                                                                     |
| ------------------------ | ------------------------------------------------------------------------------------------------------------------ |
| **Transactional outbox** | `fw_domain_events` already serves this role — events written in same DB transaction                                |
| **Worker polling**       | `pg_cron` job runs every N seconds, calls Edge Function via `pg_net` for queued executions                         |
| **Database webhooks**    | Supabase Database Webhooks (INSERT on `fw_workflow_executions` WHERE status='queued') → Edge Function              |
| **Retry with backoff**   | `fw_workflow_executions` tracks `retry_count` and `next_retry_at`; worker skips rows where `next_retry_at > now()` |
| **Circuit breaker**      | Per-automation-rule failure counter; disable rule after N consecutive failures                                     |
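The circuit-breaker row can be sketched as a per-rule counter of consecutive failures that resets on any success and disables the rule once a threshold is reached. The `RuleBreaker` name and the default threshold of 5 are illustrative assumptions; in the database this state would live on `fw_automation_rules` (or a side table), and tripping would mark the rule inactive.

```typescript
// Per-rule circuit breaker: trips after `threshold` consecutive failures.
export class RuleBreaker {
  private readonly threshold: number;
  private readonly consecutiveFailures = new Map<string, number>();
  private readonly disabled = new Set<string>();

  constructor(threshold = 5) {
    this.threshold = threshold;
  }

  isDisabled(ruleId: string): boolean {
    return this.disabled.has(ruleId);
  }

  // Any success resets the streak, so only consecutive failures count.
  recordSuccess(ruleId: string): void {
    this.consecutiveFailures.set(ruleId, 0);
  }

  // Returns true only on the failure that trips the breaker (disables the rule).
  recordFailure(ruleId: string): boolean {
    const failures = (this.consecutiveFailures.get(ruleId) ?? 0) + 1;
    this.consecutiveFailures.set(ruleId, failures);
    if (failures >= this.threshold && !this.disabled.has(ruleId)) {
      this.disabled.add(ruleId);
      return true;
    }
    return false;
  }
}
```

Re-enabling a tripped rule (a timed half-open state or a manual admin action) is deliberately left out, since the table only specifies disabling.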

### 3.3 Durable Execution Patterns (Temporal/Inngest/Trigger.dev)

Modern workflow orchestration tools share common patterns that can be adapted to a Supabase-native engine:

| Pattern               | Description                                           | Supabase Adaptation                                                                    |
| --------------------- | ----------------------------------------------------- | -------------------------------------------------------------------------------------- |
| **Durable functions** | Workflow function state survives process restarts     | `fw_workflow_executions` + `fw_execution_logs` as checkpoint store                     |
| **Step functions**    | Each workflow step is independently retryable         | Each node in visual workflow = one execution step with its own status                  |
| **Sleep/wait**        | Workflow can sleep for hours/days then resume         | `fw_workflow_executions.wait_until` column; worker skips until time arrives            |
| **Human-in-the-loop** | Workflow pauses, waits for human action, resumes      | FW-34 approval pattern: status = 'waiting\_approval', resume on approval record INSERT |
| **Fan-out/fan-in**    | Parallel branches that converge                       | Parallel approval chains in FW-34; extend to general parallel execution                |
| **Compensation/saga** | On failure, run compensating actions in reverse order | FW-25 spec covers this; needs implementation                                           |
| **Event triggers**    | External events resume waiting workflows              | `fw_domain_events` → match waiting executions by event type                            |
| **Visibility**        | Full execution history with timing                    | FW-22 monitoring + FW-23 analytics                                                     |
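The sleep/wait, human-in-the-loop, and event-trigger rows all reduce to one question the worker asks about a paused execution: may it resume now? A minimal sketch, assuming the `wait_until` column from the table plus a hypothetical `wait_for_event_type` column; approval resumption is modeled as a matching event for simplicity.

```typescript
// Hypothetical shape of a paused row in fw_workflow_executions.
export interface PausedExecution {
  id: string;
  organizationId: string;
  status: "waiting" | "waiting_approval";
  waitUntil?: Date;          // sleep: resume once this time has passed
  waitForEventType?: string; // hypothetical: resume when a matching event arrives
}

export interface IncomingEvent {
  organizationId: string;
  eventType: string;
}

// Decides whether a paused execution can resume, given the current time and
// the events that arrived since the last worker tick. If both conditions are
// set, whichever happens first resumes it (the time acts as a timeout).
export function shouldResume(
  exec: PausedExecution,
  now: Date,
  events: IncomingEvent[],
): boolean {
  if (exec.waitUntil && exec.waitUntil.getTime() <= now.getTime()) return true;
  if (exec.waitForEventType) {
    // Events only match executions in the same tenant.
    return events.some(
      (e) =>
        e.organizationId === exec.organizationId &&
        e.eventType === exec.waitForEventType,
    );
  }
  return false;
}
```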

**Recommendation:** Do NOT introduce Temporal or Inngest as dependencies. Instead, build a **Supabase-native durable execution engine** using the patterns above. The `fw_workflow_executions` table is already the right foundation — it needs a **worker loop** and **step-level checkpointing**.

### 3.4 XState v5 — Workflow Orchestration Assessment

XState 5 is excellent for:

* **Client-side UI state** (form wizards, multi-step dialogs)
* **Visualization** of state transitions
* **Type-safe state machines** with TypeScript

XState 5 is NOT ideal for:

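* **Server-side durable execution** (snapshots can be serialized with `getPersistedSnapshot()`, but there is no built-in durable store, worker scheduling, or distributed execution)
* **Long-running workflows** (days/weeks): running actors live in memory, so a process restart loses any state that was not explicitly persisted
* **Multi-tenant execution**: no built-in tenant isolation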

**Recommendation:** Keep XState for **client-side workflow visualization and wizard navigation** (aligning with current "experimental/reserved" status). Server-side execution should remain in the `automation-executor` + `fw_workflow_executions` model. If a future need arises for complex client-side approval UIs, XState can drive the UI while the server remains the source of truth.

### 3.5 FHIR Workflow Patterns — Behavioral Health Relevance

HL7 FHIR defines workflow resources that map to Encore Health OS concepts:

| FHIR Resource          | Purpose                                         | Encore Health OS Equivalent                             |
| ---------------------- | ----------------------------------------------- | ------------------------------------------------------- |
| **PlanDefinition**     | Template for a clinical/operational process     | `fw_forms` (form template) + visual workflow definition |
| **ActivityDefinition** | Single action template (order, referral, task)  | `fw_automation_actions`                                 |
| **Task**               | Trackable unit of work with lifecycle           | `pf_tasks` (PF-29)                                      |
| **ServiceRequest**     | Request for a service (referral, authorization) | PM scheduling + CL referral                             |
| **CarePlan**           | Patient-specific plan from PlanDefinition       | CL treatment plans                                      |
| **Encounter**          | Clinical visit context                          | `pm_encounters` (canonical entity)                      |

**Behavioral health specific:**

* **ASAM Criteria workflow:** Assessment → Level of Care determination → Authorization → Admission — maps to a multi-step workflow with clinical decision support
* **Treatment plan review cycles:** 30/60/90-day reviews with escalation — maps to date-relative triggers + approval chains
* **Discharge planning:** Multi-step process spanning CL, RH, FA, CE — maps to cross-core choreography

**Recommendation:** While full FHIR compliance is not required for MVP, align the workflow engine's **PlanDefinition-equivalent** (workflow templates) and **Task-equivalent** (PF-29) with FHIR semantics to ease future interoperability. Specifically:

* Workflow template definitions should support `action` arrays with `trigger`, `condition`, `input`, `output` (mirroring PlanDefinition.action)
* Tasks created by workflows should carry `intent`, `status`, `priority`, `for` (patient reference) fields
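To make the Task alignment concrete, here is a sketch that projects a workflow-created task onto a FHIR R4 `Task` resource. The internal `WorkflowTask` shape and its status values are hypothetical stand-ins for `pf_tasks`; the FHIR element names (`status`, `intent`, `priority`, `for`) and the code values used are taken from the FHIR R4 Task resource. Choosing `intent: "order"` is an assumption.

```typescript
// Hypothetical internal statuses for pf_tasks (not the real schema).
type InternalStatus = "open" | "in_progress" | "blocked" | "done" | "cancelled";

export interface WorkflowTask {
  id: string;
  title: string;
  status: InternalStatus;
  priority: "routine" | "urgent" | "asap" | "stat"; // FHIR request-priority codes
  patientId?: string;
}

// Internal status -> FHIR R4 TaskStatus code.
const STATUS_MAP: Record<InternalStatus, string> = {
  open: "ready",
  in_progress: "in-progress",
  blocked: "on-hold",
  done: "completed",
  cancelled: "cancelled",
};

export function toFhirTask(task: WorkflowTask): Record<string, unknown> {
  const resource: Record<string, unknown> = {
    resourceType: "Task",
    id: task.id,
    status: STATUS_MAP[task.status],
    intent: "order", // assumption: workflow-generated tasks are actionable orders
    priority: task.priority,
    description: task.title,
  };
  // Task.for is the beneficiary of the work, here the patient.
  if (task.patientId) {
    resource.for = { reference: `Patient/${task.patientId}` };
  }
  return resource;
}
```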

### 3.6 Business Rules Engines — Healthcare Application

Healthcare business rules operate at multiple levels:

| Level                         | Examples                                                    | Current Support          |
| ----------------------------- | ----------------------------------------------------------- | ------------------------ |
| **Operational rules**         | "If bed census > 90%, notify admissions coordinator"        | FW-03 automation rules   |
| **Financial rules**           | "If payer = Medicaid AND service = IOP, require prior auth" | Partially in FA/PM logic |
| **Clinical decision support** | "If PHQ-9 score > 20, flag for clinical review"             | Not formalized           |
| **Compliance rules**          | "Staff-to-patient ratio must meet state minimums"           | GR governance specs      |
| **SLA rules**                 | "Authorization response within 48 hours or escalate"        | FW-35 (spec only)        |

**DMN (Decision Model and Notation)** provides a standard for expressing business rules as decision tables:

```
Rule: Authorization Required
┌──────────────┬──────────────┬──────────────┬──────────┐
│ Payer Type   │ Service Line │ Visit Count  │ Result   │
├──────────────┼──────────────┼──────────────┼──────────┤
│ Medicaid     │ IOP          │ any          │ Required │
│ Medicaid     │ PHP          │ any          │ Required │
│ Commercial   │ IOP          │ > 10         │ Required │
│ Commercial   │ Residential  │ > 30         │ Required │
│ Self-Pay     │ any          │ any          │ None     │
└──────────────┴──────────────┴──────────────┴──────────┘
```
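The authorization table above can be evaluated directly as data. A minimal sketch, assuming a first-hit policy (rows checked top to bottom), `"any"` as a wildcard, and a `{ gt: n }` cell for the `> 10` / `> 30` comparisons; the type names are illustrative, not a `fw_decision_tables` schema. An input that matches no row (for example Commercial IOP with 10 visits) returns `undefined`, which the caller must handle explicitly.

```typescript
// A cell is an exact string, the wildcard "any", or a numeric "greater than".
type Cell = string | { gt: number };

export interface Row {
  payerType: Cell;
  serviceLine: Cell;
  visitCount: Cell;
  result: "Required" | "None";
}

export interface AuthInput {
  payerType: string;
  serviceLine: string;
  visitCount: number;
}

// The "Authorization Required" table, row for row.
export const AUTH_TABLE: Row[] = [
  { payerType: "Medicaid", serviceLine: "IOP", visitCount: "any", result: "Required" },
  { payerType: "Medicaid", serviceLine: "PHP", visitCount: "any", result: "Required" },
  { payerType: "Commercial", serviceLine: "IOP", visitCount: { gt: 10 }, result: "Required" },
  { payerType: "Commercial", serviceLine: "Residential", visitCount: { gt: 30 }, result: "Required" },
  { payerType: "Self-Pay", serviceLine: "any", visitCount: "any", result: "None" },
];

function matches(cell: Cell, value: string | number): boolean {
  if (cell === "any") return true;
  if (typeof cell === "object") return typeof value === "number" && value > cell.gt;
  return cell === value;
}

// First-hit evaluation: the first row whose every input cell matches wins.
export function evaluateFirstHit(table: Row[], input: AuthInput): Row["result"] | undefined {
  const hit = table.find(
    (r) =>
      matches(r.payerType, input.payerType) &&
      matches(r.serviceLine, input.serviceLine) &&
      matches(r.visitCount, input.visitCount),
  );
  return hit?.result;
}
```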

**Recommendation:** Extend FW-17 (Condition Builder) to support **decision tables** as a condition evaluation mode. This enables non-technical staff to configure complex business rules without code changes. Store decision tables in a new `fw_decision_tables` entity linked to automation rules or workflow condition nodes.

***

## 4. Gap Analysis: PF Business Automation

### 4.1 Missing Platform Capabilities for Business Automation

| Gap ID       | Gap                                               | Impact                                                                                   | Recommended Spec                           |
| ------------ | ------------------------------------------------- | ---------------------------------------------------------------------------------------- | ------------------------------------------ |
| **PF-GA-01** | No centralized business process registry          | Teams cannot discover what automated processes exist across cores                        | New: PF-82 Business Process Registry       |
| **PF-GA-02** | PF-29 Tasks not linked to workflow executions     | Tasks created by workflows are disconnected from execution context                       | Enhance PF-29 with `source_workflow_execution_id` |
| **PF-GA-03** | PF-10 Notifications lack workflow-aware templates | Notification templates cannot reference workflow context (step name, approver, deadline) | Enhance PF-10 Phase 5                      |
| **PF-GA-04** | No SLA tracking platform service                  | FW-35 defines SLA per-workflow but no platform-wide SLA dashboard                        | New: PF-83 SLA Management Platform Layer   |
| **PF-GA-05** | PF-35 Integration Hub lacks event forwarding      | External integrations cannot subscribe to domain events                                  | Enhance PF-35 with outbound event webhooks |
| **PF-GA-06** | No business calendar / scheduling awareness       | Workflows cannot reason about business hours, holidays, on-call schedules                | New: PF-84 Business Calendar Service       |
| **PF-GA-07** | PF-47 Bulk Operations not connected to automation | Bulk actions (e.g., mass reassignment) should trigger automation rules                   | Enhance PF-47 to publish events per-record |
| **PF-GA-08** | No platform-level retry/backoff service           | Each edge function implements its own retry logic                                        | New: shared `_shared/retry.ts` utility     |
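PF-GA-08 proposes a shared `_shared/retry.ts`. A minimal sketch of what it could export, assuming exponential backoff with full jitter and a caller-supplied `isRetryable` predicate; the option names and defaults are illustrative, and the `sleep`/`random` hooks exist only so tests do not have to wait.

```typescript
export interface RetryOptions {
  maxAttempts?: number;  // total attempts including the first (default 3)
  baseDelayMs?: number;  // ceiling for the first retry delay, doubled each time (default 200)
  maxDelayMs?: number;   // cap on any single delay ceiling (default 5000)
  isRetryable?: (err: unknown) => boolean;
  sleep?: (ms: number) => Promise<void>;
  random?: () => number; // in [0, 1); used for full jitter
}

// Calls `fn` until it succeeds, the error is not retryable, or attempts run out.
export async function withRetry<T>(
  fn: (attempt: number) => Promise<T>,
  opts: RetryOptions = {},
): Promise<T> {
  const maxAttempts = opts.maxAttempts ?? 3;
  const base = opts.baseDelayMs ?? 200;
  const cap = opts.maxDelayMs ?? 5000;
  const isRetryable = opts.isRetryable ?? (() => true);
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((r) => setTimeout(r, ms)));
  const random = opts.random ?? Math.random;

  for (let attempt = 1; ; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      if (attempt >= maxAttempts || !isRetryable(err)) throw err;
      // Full jitter: wait a random time in [0, min(cap, base * 2^(attempt-1))).
      const ceiling = Math.min(cap, base * 2 ** (attempt - 1));
      await sleep(Math.floor(random() * ceiling));
    }
  }
}
```

Full jitter spreads retries from many Edge Function invocations over time instead of having them hit a recovering dependency in lockstep.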

### 4.2 PF Modules Critical for Business Automation

| Spec      | Name                       | Status   | Why Critical for Automation                                        |
| --------- | -------------------------- | -------- | ------------------------------------------------------------------ |
| **PF-42** | Rate Limiting & Throttling | Spec     | Prevents runaway automation from overwhelming the system           |
| **PF-45** | Feature Flags              | Complete | Progressive rollout of new automation rules per tenant             |
| **PF-47** | Bulk Operations            | Spec     | Mass record processing with per-record event emission              |
| **PF-48** | Security Event Monitoring  | Spec     | Detects anomalous automation patterns (e.g., mass record deletion) |
| **PF-50** | Tenant Provisioning        | Complete | Automated setup of default automation rules for new tenants        |

***

## 5. Gap Analysis: FW Workflow Engine

### 5.1 Execution Model Gaps

| Gap ID       | Gap                                                 | Severity | Impact                                           |
| ------------ | --------------------------------------------------- | -------- | ------------------------------------------------ |
| **FW-GA-01** | No worker processes queued executions               | CRITICAL | Event-triggered workflows never run              |
| **FW-GA-02** | No dead letter queue for failed events/executions   | CRITICAL | Failed automation silently lost                  |
| **FW-GA-03** | Form submission → automation path undocumented      | HIGH     | Inconsistent form-triggered automation           |
| **FW-GA-04** | No step-level checkpointing                         | HIGH     | Long workflows restart from beginning on failure |
| **FW-GA-05** | No execution timeout enforcement                    | HIGH     | Stuck workflows consume resources indefinitely   |
| **FW-GA-06** | FW-25 (Error Recovery) not implemented              | HIGH     | No retry or compensation logic                   |
| **FW-GA-07** | FW-35 (SLA Deadlines) not implemented               | MEDIUM   | No deadline tracking or escalation               |
| **FW-GA-08** | FW-41 (Sub-Workflows) not implemented               | MEDIUM   | Cannot compose reusable workflow fragments       |
| **FW-GA-09** | No decision table support                           | MEDIUM   | Complex rules require code changes               |
| **FW-GA-10** | No workflow instance migration on definition change | LOW      | Active instances break when definition updated   |

### 5.2 Event System Gaps

| Gap ID       | Gap                                                                 | Severity |
| ------------ | ------------------------------------------------------------------- | -------- |
| **FW-EV-01** | No event schema versioning                                          | HIGH     |
| **FW-EV-02** | No event replay capability                                          | MEDIUM   |
| **FW-EV-03** | No consumer offset tracking                                         | MEDIUM   |
| **FW-EV-04** | event-consumer handlers hardcoded                                   | MEDIUM   |
| **FW-EV-05** | No event correlation (linking related events in a business process) | MEDIUM   |

***

## 6. Enhanced Recommendations: Platform Foundation

### R-PF-01: Business Process Registry (NEW — PF-82)

**Priority:** HIGH
**Rationale:** As automation rules and workflows grow across cores, operators need a single view of "what automated processes exist, what triggers them, and what they do."

**Proposed capabilities:**

* Registry table `pf_business_processes` linking to `fw_automation_rules` and workflow definitions
* Each process has: name, description, owning core, trigger summary, status (active/draft/disabled)
* Dashboard page showing all active business processes with execution stats
* Search/filter by core, trigger type, status
* Impact analysis: "If I disable this process, what downstream effects occur?"

**Integration points:** FW-03 (automation rules), FW-06 (workflow definitions), PF-29 (tasks), PF-10 (notifications)
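The impact-analysis capability can be sketched as a graph walk: a process is downstream of another if it is triggered by an event type the other emits. The `triggeredBy`/`emits` fields are hypothetical registry columns, not part of the proposed `pf_business_processes` schema.

```typescript
export interface BusinessProcess {
  id: string;
  triggeredBy: string[]; // event types that start this process (hypothetical column)
  emits: string[];       // event types this process publishes (hypothetical column)
}

// Returns the IDs of every process transitively downstream of `startId`.
// Breadth-first, so nearer processes come first; cycles are handled by `visited`.
export function downstreamOf(processes: BusinessProcess[], startId: string): string[] {
  const byId = new Map(processes.map((p): [string, BusinessProcess] => [p.id, p]));
  const visited = new Set<string>([startId]);
  const result: string[] = [];
  const queue: string[] = [startId];
  while (queue.length > 0) {
    const current = byId.get(queue.shift()!);
    if (!current) continue;
    for (const p of processes) {
      if (visited.has(p.id)) continue;
      if (p.triggeredBy.some((t) => current.emits.includes(t))) {
        visited.add(p.id);
        result.push(p.id);
        queue.push(p.id);
      }
    }
  }
  return result;
}
```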

### R-PF-02: SLA Management Platform Layer (NEW — PF-83)

**Priority:** HIGH
**Rationale:** Healthcare operations have strict SLAs — prior authorization response times, treatment plan review deadlines, discharge notification windows. FW-35 defines per-workflow SLAs; the platform needs a unified SLA tracking service.

**Proposed capabilities:**

* Platform service `@/platform/sla` exposing `createSLA()`, `checkSLA()`, `escalateSLA()` hooks
* SLA definition table `pf_sla_definitions` with: entity type, metric, threshold, escalation chain
* SLA instance table `pf_sla_instances` tracking: start time, deadline, current status, escalation level
* `pg_cron` job checking approaching/breached SLAs and triggering escalation workflows
* Dashboard widget showing SLA compliance metrics per core

**Integration points:** FW-35 (workflow SLAs), PM (authorization deadlines), CL (treatment plan reviews), HR (credentialing timelines)
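The check that the `pg_cron` job would run can be sketched as two pure functions over an SLA instance. The 80% at-risk threshold and the time-based escalation interval are illustrative assumptions, and `evaluateSla`/`escalationLevelAt` are hypothetical names rather than the proposed `@/platform/sla` hooks. Deriving the escalation level from elapsed time (instead of incrementing per run) keeps repeated cron runs idempotent.

```typescript
export type SlaState = "on_track" | "at_risk" | "breached";

export interface SlaInstance {
  startedAt: Date;
  deadlineAt: Date;
}

// "at_risk" once `atRiskFraction` of the window has elapsed; "breached" at the deadline.
export function evaluateSla(sla: SlaInstance, now: Date, atRiskFraction = 0.8): SlaState {
  const windowMs = sla.deadlineAt.getTime() - sla.startedAt.getTime();
  const elapsedMs = now.getTime() - sla.startedAt.getTime();
  if (elapsedMs >= windowMs) return "breached";
  if (elapsedMs >= windowMs * atRiskFraction) return "at_risk";
  return "on_track";
}

// 0 before the deadline; level 1 at breach, one more level per `intervalMs`
// overdue, capped at the length of the escalation chain.
export function escalationLevelAt(
  sla: SlaInstance,
  now: Date,
  intervalMs: number,
  chainLength: number,
): number {
  const overdueMs = now.getTime() - sla.deadlineAt.getTime();
  if (overdueMs < 0) return 0;
  return Math.min(chainLength, 1 + Math.floor(overdueMs / intervalMs));
}
```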

### R-PF-03: Business Calendar Service (NEW — PF-84)

**Priority:** MEDIUM
**Rationale:** Workflows that involve deadlines, SLAs, and scheduling need awareness of business hours, holidays, and staff availability. Currently, date-relative triggers (FW-16) use calendar days, not business days.

**Proposed capabilities:**

* Organization-scoped business calendars (`pf_business_calendars`)
* Business hour definitions (e.g., Mon-Fri 8am-6pm EST)
* Holiday management (federal, state, organization-specific)
* Utility functions: `addBusinessDays()`, `isBusinessHour()`, `nextBusinessDay()`
* Integration with FW-16 date-relative triggers to support "5 business days before discharge"
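The calendar utilities can be sketched as below. Simplifying assumptions, repeated in the code: all arithmetic is in UTC, holidays are `YYYY-MM-DD` strings, and business hours are one daily window on working days. A production version would need per-organization time-zone handling.

```typescript
// Assumption: everything below is evaluated in UTC.
export interface BusinessCalendar {
  workDays: number[];    // 0 = Sunday ... 6 = Saturday
  openHour: number;      // inclusive hour
  closeHour: number;     // exclusive hour
  holidays: Set<string>; // "YYYY-MM-DD"
}

const isoDate = (d: Date): string => d.toISOString().slice(0, 10);

export function isBusinessDay(cal: BusinessCalendar, d: Date): boolean {
  return cal.workDays.includes(d.getUTCDay()) && !cal.holidays.has(isoDate(d));
}

export function isBusinessHour(cal: BusinessCalendar, d: Date): boolean {
  const h = d.getUTCHours();
  return isBusinessDay(cal, d) && h >= cal.openHour && h < cal.closeHour;
}

// Moves `days` business days from `from` (negative = backwards), keeping the
// time of day. "5 business days before discharge" is addBusinessDays(cal, discharge, -5).
export function addBusinessDays(cal: BusinessCalendar, from: Date, days: number): Date {
  if (cal.workDays.length === 0) throw new Error("calendar has no working days");
  const step = days < 0 ? -1 : 1;
  const d = new Date(from.getTime());
  let remaining = Math.abs(days);
  while (remaining > 0) {
    d.setUTCDate(d.getUTCDate() + step);
    if (isBusinessDay(cal, d)) remaining--;
  }
  return d;
}

export function nextBusinessDay(cal: BusinessCalendar, from: Date): Date {
  return addBusinessDays(cal, from, 1);
}
```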

### R-PF-04: Enhanced Task-Workflow Integration (PF-29 Enhancement)

**Priority:** HIGH
**Rationale:** Tasks created by workflow automation should maintain a bidirectional link to the workflow execution for traceability and lifecycle management.

**Proposed changes:**

* Add `source_workflow_execution_id` to `pf_tasks` (nullable FK to `fw_workflow_executions`)
* Add `source_automation_rule_id` to `pf_tasks` (nullable FK to `fw_automation_rules`)
* When a workflow creates a task, task completion should optionally resume the waiting workflow step
* Task detail view shows workflow context (which step created it, what happens next)
* Workflow execution view shows linked tasks with their status

### R-PF-05: Integration Hub Event Forwarding (PF-35 Enhancement)

**Priority:** MEDIUM
**Rationale:** External systems (EHRs, clearinghouses, payer portals) need to receive domain events. PF-35 has outbound webhooks but they are not connected to the domain event system.

**Proposed changes:**

* Add `pf_event_subscriptions` table: `integration_id`, `event_type_pattern`, `filter_conditions`, `delivery_config`
* When `fw_domain_events` receives an event, check `pf_event_subscriptions` for matching subscriptions
* Deliver via existing outbound webhook infrastructure (PF-35) with retry logic
* Event payload transformation templates per subscription
* Delivery tracking and DLQ for failed external deliveries
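Matching an incoming domain event against `pf_event_subscriptions` rows can be sketched as below. The dot-segmented pattern syntax (`*` matches exactly one segment) and equality-only `filter_conditions` are assumptions for illustration; the proposal does not fix either format.

```typescript
export interface EventSubscription {
  integrationId: string;
  eventTypePattern: string;                  // e.g. "fa.claim.*" (assumed syntax)
  filterConditions: Record<string, unknown>; // payload key -> required value
}

// "*" matches exactly one dot-separated segment; segment counts must agree.
export function patternMatches(pattern: string, eventType: string): boolean {
  const p = pattern.split(".");
  const t = eventType.split(".");
  if (p.length !== t.length) return false;
  return p.every((seg, i) => seg === "*" || seg === t[i]);
}

// Subscriptions whose pattern matches and whose filters all equal payload values.
export function matchingSubscriptions(
  subs: EventSubscription[],
  eventType: string,
  payload: Record<string, unknown>,
): EventSubscription[] {
  return subs.filter(
    (s) =>
      patternMatches(s.eventTypePattern, eventType) &&
      Object.entries(s.filterConditions).every(([k, v]) => payload[k] === v),
  );
}
```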

***

## 7. Enhanced Recommendations: FW Workflow Engine

### R-FW-01: Durable Execution Worker (CRITICAL)

**Priority:** CRITICAL — Blocks all event-triggered automation
**Implements:** FW-GA-01 (queued execution processing)

**Architecture (detailed in Section 8):**

* `pg_cron` job runs every 10 seconds
* Calls `pg_net` to invoke a new Edge Function `workflow-executor-worker`
* Worker queries `fw_workflow_executions` for rows with `status IN ('queued', 'retry_pending')` AND (`next_retry_at IS NULL` OR `next_retry_at <= now()`), so newly queued rows with no retry time set are not skipped
* For each execution, runs the workflow step-by-step via `automation-executor` logic
* Updates execution status after each step (checkpointing)
* On failure, applies retry policy from FW-25 or moves to DLQ
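One worker tick can be sketched in TypeScript. This is an in-memory sketch: the `Execution` shape, the `runWorkflow`/`deadLetter` callbacks, and the backoff constants are hypothetical. In the real worker, selection would be a SQL query (ideally claiming rows with `FOR UPDATE SKIP LOCKED` so overlapping `pg_cron` ticks do not double-process), and every status change would be persisted.

```typescript
export type ExecStatus = "queued" | "retry_pending" | "running" | "completed" | "dead_lettered";

export interface Execution {
  id: string;
  status: ExecStatus;
  retryCount: number;
  nextRetryAt: Date | null; // null = eligible immediately
  lastError?: string;
}

export interface WorkerConfig {
  maxRetries: number;  // retries allowed after the first attempt before dead-lettering
  baseDelayMs: number; // backoff delay = baseDelayMs * 2^retryCount
}

// Mirrors the worker query: queued/retry_pending rows that are due.
export function isEligible(e: Execution, now: Date): boolean {
  return (
    (e.status === "queued" || e.status === "retry_pending") &&
    (e.nextRetryAt === null || e.nextRetryAt.getTime() <= now.getTime())
  );
}

export async function workerTick(
  executions: Execution[],
  runWorkflow: (e: Execution) => Promise<void>,
  deadLetter: (e: Execution) => void,
  cfg: WorkerConfig,
  now: Date,
): Promise<void> {
  // Select first, so rows rescheduled during this tick are not re-run in it.
  for (const e of executions.filter((x) => isEligible(x, now))) {
    e.status = "running";
    try {
      await runWorkflow(e);
      e.status = "completed";
    } catch (err) {
      e.lastError = err instanceof Error ? err.message : String(err);
      if (e.retryCount >= cfg.maxRetries) {
        e.status = "dead_lettered"; // retry budget exhausted
        deadLetter(e);
      } else {
        e.status = "retry_pending";
        e.nextRetryAt = new Date(now.getTime() + cfg.baseDelayMs * 2 ** e.retryCount);
        e.retryCount++;
      }
    }
  }
}
```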

### R-FW-02: Dead Letter Queue (CRITICAL)

**Priority:** CRITICAL — Required for production reliability

**Proposed implementation:**

* New table `fw_dead_letter_queue`:
  ```
  id, organization_id, source_type ('event'|'execution'|'webhook'),
  source_id, payload, error_message, error_stack, failure_count,
  first_failed_at, last_failed_at, status ('pending'|'retried'|'resolved'|'discarded'),
  resolved_by, resolved_at, created_at
  ```
* Events/executions moved to DLQ after exhausting retry budget
* Admin UI for reviewing, retrying, or discarding DLQ items
* Alerting when DLQ depth exceeds threshold (via PF-10 notifications)
* DLQ items retain full context for debugging
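The DLQ `status` column implies a small state machine. The allowed transitions below are an assumption (the proposal lists the states but not the edges): items start `pending`; an admin retry moves them to `retried`, which either succeeds (`resolved`) or fails back to `pending`; a `pending` item can also be resolved manually or discarded; `resolved` and `discarded` are terminal. The depth check for alerting counts only `pending` items.

```typescript
export type DlqStatus = "pending" | "retried" | "resolved" | "discarded";

// Assumed transition graph; terminal states have no outgoing edges.
const ALLOWED: Record<DlqStatus, DlqStatus[]> = {
  pending: ["retried", "resolved", "discarded"],
  retried: ["resolved", "pending"],
  resolved: [],
  discarded: [],
};

export interface DlqItem {
  id: string;
  status: DlqStatus;
  resolvedBy?: string;
  resolvedAt?: Date;
}

// Returns a new item in the target state, or throws on an illegal transition.
export function transition(item: DlqItem, to: DlqStatus, actor: string, at: Date): DlqItem {
  if (!ALLOWED[item.status].includes(to)) {
    throw new Error(`Illegal DLQ transition ${item.status} -> ${to}`);
  }
  const next: DlqItem = { ...item, status: to };
  // Terminal states record who closed the item (resolved_by / resolved_at).
  if (to === "resolved" || to === "discarded") {
    next.resolvedBy = actor;
    next.resolvedAt = at;
  }
  return next;
}

// True when the number of pending items exceeds the alert threshold.
export function shouldAlert(items: DlqItem[], threshold: number): boolean {
  return items.filter((i) => i.status === "pending").length > threshold;
}
```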

### R-FW-03: Step-Level Checkpointing (HIGH)

**Priority:** HIGH — Enables long-running workflows

**Proposed implementation:**

* `fw_execution_steps` table tracking each node execution within a workflow:
  ```
  id, execution_id, node_id, node_type, status ('pending'|'running'|'completed'|'failed'|'skipped'),
  input_data, output_data, started_at, completed_at, error_message, retry_count
  ```
* Worker resumes from last completed step on restart
* Each step is independently retryable without re-running completed steps
* Step output feeds into next step's input (variable binding from FW-18)
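Resume-from-checkpoint can be sketched as a loop that skips any node already recorded as `completed`, replays its stored output, and feeds each step's output into the next. The linear node list and the `StepRecord` map are simplifications (FW-06 workflows are graphs); each record stands in for an `fw_execution_steps` row.

```typescript
export interface StepRecord {
  nodeId: string;
  status: "completed" | "failed";
  output?: Record<string, unknown>;
}

export type StepFn = (input: Record<string, unknown>) => Promise<Record<string, unknown>>;

// Runs nodes in order, skipping those with a completed checkpoint.
// `checkpoints` is mutated as steps finish, mimicking writes to fw_execution_steps.
export async function runWithCheckpoints(
  nodes: { id: string; run: StepFn }[],
  checkpoints: Map<string, StepRecord>,
  initialInput: Record<string, unknown>,
): Promise<Record<string, unknown>> {
  let vars: Record<string, unknown> = { ...initialInput };
  for (const node of nodes) {
    const saved = checkpoints.get(node.id);
    if (saved?.status === "completed") {
      vars = { ...vars, ...saved.output }; // replay stored output; do not re-run
      continue;
    }
    try {
      const output = await node.run(vars);
      checkpoints.set(node.id, { nodeId: node.id, status: "completed", output });
      vars = { ...vars, ...output }; // output becomes input for later steps
    } catch (err) {
      checkpoints.set(node.id, { nodeId: node.id, status: "failed" });
      throw err;
    }
  }
  return vars;
}
```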

### R-FW-04: Decision Tables (MEDIUM)

**Priority:** MEDIUM — Enables configurable business rules

**Proposed implementation:**

* New entity `fw_decision_tables`:
  ```
  id, organization_id, name, description, input_columns (JSONB),
  output_columns (JSONB), rules (JSONB array of row conditions + outputs),
  hit_policy ('first'|'unique'|'collect'|'priority'), version, status
  ```
* Integration with FW-17 Condition Builder as a "decision table" condition type
* Integration with FW-06 Workflow Builder as a "decision" node type
* UI for editing decision tables (spreadsheet-like grid)
* Hit policies following the DMN standard:
  * **First:** The first matching rule, in table order, wins
  * **Unique:** At most one rule may match (error if several match)
  * **Collect:** All matching rules contribute to the output
  * **Priority:** Of the matching rules, the one whose output value ranks highest in the output column's priority list wins
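A small evaluator can illustrate the four hit policies. This is a sketch under several assumptions. Rules are plain data: each input column takes an exact match, an inclusive numeric range, or a wildcard. Every policy returns an array of output rows. The `priority` policy ranks matches by where one output value sits in an ordered list, which is how DMN defines Priority. `evaluateDecisionTable` and the `InputTest` shape are hypothetical and are not the `fw_decision_tables` JSONB format.

```typescript
type Scalar = string | number | boolean;

// One input entry of a rule: exact match, inclusive numeric range, or wildcard ("-" in DMN).
type InputTest = { eq: Scalar } | { range: [number, number] } | { any: true };

interface Rule {
  inputs: Record<string, InputTest>;
  outputs: Record<string, Scalar>;
}

type HitPolicy = "first" | "unique" | "collect" | "priority";

interface DecisionTable {
  hitPolicy: HitPolicy;
  rules: Rule[];
  // Used only by "priority": allowed values of one output column, highest priority first.
  priority?: { column: string; order: Scalar[] };
}

function testMatches(test: InputTest, value: Scalar | undefined): boolean {
  if ("any" in test) return true;
  if ("eq" in test) return value === test.eq;
  return typeof value === "number" && value >= test.range[0] && value <= test.range[1];
}

function ruleMatches(rule: Rule, facts: Record<string, Scalar>): boolean {
  return Object.entries(rule.inputs).every(([col, test]) => testMatches(test, facts[col]));
}

// Returns the outputs of the selected rule(s); an empty array means no rule matched.
function evaluateDecisionTable(
  table: DecisionTable,
  facts: Record<string, Scalar>,
): Record<string, Scalar>[] {
  const matches = table.rules.filter((r) => ruleMatches(r, facts));
  if (matches.length === 0) return [];
  switch (table.hitPolicy) {
    case "first":
      return [matches[0].outputs];
    case "unique":
      if (matches.length > 1) {
        throw new Error(`Unique hit policy violated: ${matches.length} rules matched`);
      }
      return [matches[0].outputs];
    case "collect":
      return matches.map((r) => r.outputs);
    case "priority": {
      if (!table.priority) throw new Error("Priority hit policy requires a priority order");
      const { column, order } = table.priority;
      // Lower index in `order` means higher priority; unknown values rank last.
      const rank = (r: Rule) => {
        const i = order.indexOf(r.outputs[column]);
        return i === -1 ? Number.POSITIVE_INFINITY : i;
      };
      const best = matches.reduce((a, b) => (rank(b) < rank(a) ? b : a));
      return [best.outputs];
    }
    default:
      throw new Error("Unknown hit policy");
  }
}
```

For example, the PHQ-9 rows in the clinical escalation table become two `range` rules on a `score` column under the `first` policy.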

### R-FW-05: Event Schema Registry (MEDIUM)

**Priority:** MEDIUM — Prevents breaking changes to event consumers

**Proposed implementation:**

* Extend `fw_workflow_events` with `payload_schema` (JSON Schema) and `schema_version`
* `publishEvent()` validates payload against registered schema before INSERT
* Schema evolution rules: adding optional fields is allowed (additive); adding required fields or removing fields requires a version bump
* Consumer documentation auto-generated from schema
* Breaking change detection in CI pipeline
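Validation in `publishEvent()` can be illustrated with a hand-rolled check that covers only required keys and top-level property types. This sketch assumes a flattened schema shape. `PayloadSchema` and `validatePayload` are hypothetical names, and a production registry would more likely delegate to a full JSON Schema validator such as Ajv. Unknown fields are tolerated so that additive changes do not break publishers.

```typescript
type JsonType = "string" | "number" | "integer" | "boolean" | "object" | "array";

// A tiny subset of JSON Schema: a top-level object with typed properties and required keys.
interface PayloadSchema {
  type: "object";
  required?: string[];
  properties: Record<string, { type: JsonType }>;
}

function jsonTypeOf(value: unknown): JsonType | "null" | "other" {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  if (Number.isInteger(value)) return "integer";
  switch (typeof value) {
    case "string":
      return "string";
    case "number":
      return "number";
    case "boolean":
      return "boolean";
    case "object":
      return "object";
    default:
      return "other";
  }
}

// Returns validation errors; an empty list means the payload may be inserted.
function validatePayload(schema: PayloadSchema, payload: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const key of schema.required ?? []) {
    if (!(key in payload)) errors.push(`missing required field "${key}"`);
  }
  for (const [key, value] of Object.entries(payload)) {
    const prop = schema.properties[key];
    if (!prop) continue; // unknown fields are tolerated (additive evolution)
    const actual = jsonTypeOf(value);
    // JSON Schema treats integers as valid numbers.
    const ok = actual === prop.type || (prop.type === "number" && actual === "integer");
    if (!ok) errors.push(`field "${key}" expected ${prop.type}, got ${actual}`);
  }
  return errors;
}
```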

### R-FW-06: Execution Timeout & Watchdog (HIGH)

**Priority:** HIGH — Prevents resource leaks

**Proposed implementation:**

* Add `timeout_seconds` to workflow definitions (default: 86400 = 24 hours)
* Add `deadline_at` to `fw_workflow_executions` (set on creation: `now() + timeout`)
* `pg_cron` watchdog job runs every 5 minutes, finds executions past deadline
* Timed-out executions: status → 'timed\_out', trigger compensation actions if defined
* Notification sent to workflow owner on timeout
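The watchdog pass amounts to a filter over non-terminal executions. This sketch assumes the status values used elsewhere in this document, including `waiting` and `timed_out`, and uses an in-memory array in place of the `pg_cron` query. `computeDeadline` and `findTimedOut` are hypothetical names.

```typescript
interface ExecutionRow {
  id: string;
  status: "queued" | "retry_pending" | "running" | "waiting" | "completed" | "failed" | "timed_out";
  created_at: Date;
  deadline_at: Date | null;
}

const DEFAULT_TIMEOUT_SECONDS = 86_400; // 24 hours, the default proposed above

// deadline_at is fixed once, at creation: created_at + timeout.
function computeDeadline(createdAt: Date, timeoutSeconds: number = DEFAULT_TIMEOUT_SECONDS): Date {
  return new Date(createdAt.getTime() + timeoutSeconds * 1000);
}

const TERMINAL_STATUSES = new Set(["completed", "failed", "timed_out"]);

// One watchdog pass: non-terminal executions whose deadline has passed.
function findTimedOut(executions: ExecutionRow[], now: Date): ExecutionRow[] {
  return executions.filter(
    (e) =>
      !TERMINAL_STATUSES.has(e.status) &&
      e.deadline_at !== null &&
      e.deadline_at.getTime() <= now.getTime(),
  );
}
```

Executions in `waiting` are included on purpose: an approval that never arrives is exactly the kind of leak the watchdog exists to catch.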

### R-FW-07: Event Correlation & Business Process Tracking (MEDIUM)

**Priority:** MEDIUM — Enables end-to-end business process visibility

**Proposed implementation:**

* Add `correlation_id` to `fw_domain_events` and `fw_workflow_executions`
* Business processes (e.g., "Patient Intake for John Doe") share a correlation ID across all events and executions
* New view: "Business Process Timeline" showing all correlated events and workflow steps
* Enables questions like: "Show me everything that happened during this patient's intake"
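A Business Process Timeline can be assembled from two sources: domain events that carry the correlation ID, and the steps of every execution that carries it. This sketch assumes that `fw_execution_steps` rows link through `execution_id`, since the step table has no correlation column of its own, and that domain events have a `created_at` timestamp. All type and function names here are hypothetical.

```typescript
interface DomainEvent {
  id: string;
  event_name: string;
  correlation_id: string | null;
  created_at: Date; // assumed timestamp column
}

interface WorkflowExecution {
  id: string;
  correlation_id: string | null;
}

interface ExecutionStep {
  execution_id: string;
  node_id: string;
  status: string;
  started_at: Date;
}

interface TimelineEntry {
  at: Date;
  kind: "event" | "step";
  label: string;
}

// Merge correlated events and steps into one chronologically sorted timeline.
function buildProcessTimeline(
  correlationId: string,
  events: DomainEvent[],
  executions: WorkflowExecution[],
  steps: ExecutionStep[],
): TimelineEntry[] {
  const executionIds = new Set(
    executions.filter((x) => x.correlation_id === correlationId).map((x) => x.id),
  );
  const entries: TimelineEntry[] = [
    ...events
      .filter((e) => e.correlation_id === correlationId)
      .map((e) => ({ at: e.created_at, kind: "event" as const, label: e.event_name })),
    ...steps
      .filter((s) => executionIds.has(s.execution_id))
      .map((s) => ({ at: s.started_at, kind: "step" as const, label: `${s.node_id} (${s.status})` })),
  ];
  return entries.sort((a, b) => a.at.getTime() - b.at.getTime());
}
```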

***

## 8. Proposed Architecture: Durable Workflow Execution

### 8.1 Architecture Diagram

```
┌─────────────────────────────────────────────────────────────────────┐
│                        EVENT SOURCES                                │
├─────────────┬──────────────┬──────────────┬────────────────────────┤
│ Form Submit │ Domain Event │ Schedule     │ External Webhook       │
│ (FW-01/02)  │ (FW-16)      │ (pg_cron)    │ (PF-35)               │
└──────┬──────┴──────┬───────┴──────┬───────┴────────────┬──────────┘
       │             │              │                    │
       ▼             ▼              ▼                    ▼
┌─────────────────────────────────────────────────────────────────────┐
│                   fw_domain_events (Outbox Table)                   │
│  id | event_name | organization_id | payload | processed_at        │
└─────────────────────────────┬───────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│            fw_process_domain_event() (DB Trigger)                   │
│  - Match fw_automation_rules by event_config                       │
│  - INSERT into fw_workflow_executions (status = 'queued')           │
│  - Match pf_event_subscriptions for external delivery              │
└─────────────────────────────┬───────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│         fw_workflow_executions (Durable State Store)                │
│  id | rule_id | definition_id | status | current_step_id           │
│  trigger_payload | retry_count | next_retry_at | deadline_at       │
│  correlation_id | organization_id | created_at | updated_at        │
└─────────────────────────────┬───────────────────────────────────────┘
                              │
              ┌───────────────┤ (pg_cron every 10s → pg_net)
              ▼               │
┌─────────────────────────────┴───────────────────────────────────────┐
│              workflow-executor-worker (Edge Function)                │
│                                                                     │
│  1. SELECT executions WHERE status IN ('queued','retry_pending')    │
│     AND (next_retry_at IS NULL OR next_retry_at <= now())           │
│     LIMIT 10 FOR UPDATE SKIP LOCKED                                 │
│                                                                     │
│  2. For each execution:                                             │
│     a. SET status = 'running', lock row                            │
│     b. Load workflow definition (nodes, edges)                      │
│     c. Resume from current_step_id (or start)                      │
│     d. Execute step:                                                │
│        - Action node → run action (email, webhook, update, etc.)   │
│        - Condition node → evaluate, choose branch                  │
│        - Approval node → create task, SET status = 'waiting'       │
│        - Delay node → SET next_retry_at = now() + delay            │
│        - Sub-workflow → create child execution                     │
│     e. Checkpoint: UPDATE current_step_id, write execution_step    │
│     f. On success → next step or status = 'completed'              │
│     g. On failure → apply retry policy or move to DLQ              │
│                                                                     │
│  3. Emit execution events for monitoring (FW-22)                    │
└─────────────────────────────────────────────────────────────────────┘
```

### 8.2 Worker Execution Loop (Pseudocode)

```typescript
// workflow-executor-worker Edge Function (sketch; runs on Deno)
import { createClient } from "npm:@supabase/supabase-js@2";

// Service-role client: the worker runs server-side and claims executions across organizations.
const supabase = createClient(
  Deno.env.get("SUPABASE_URL")!,
  Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!,
);

// loadWorkflowDefinition, getStartNode, getNextStep, executeStep, checkpointStep,
// completeExecution and handleExecutionFailure are worker helpers defined elsewhere.
async function processQueuedExecutions(organizationId?: string) {
  // 1. Claim batch of queued executions (row-level locking prevents double-processing).
  //    supabase.rpc() resolves to { data, error }, not to the rows themselves.
  const { data: executions, error: claimError } = await supabase.rpc('claim_queued_executions', {
    batch_size: 10,
    org_filter: organizationId ?? null,
  });
  if (claimError) throw claimError;

  for (const execution of executions ?? []) {
    try {
      // 2. Load workflow definition
      const definition = await loadWorkflowDefinition(execution.definition_id);

      // 3. Resume from checkpoint
      const startStep = execution.current_step_id
        ? getNextStep(definition, execution.current_step_id)
        : getStartNode(definition);

      // 4. Execute steps until completion, wait, or failure
      let currentStep = startStep;
      while (currentStep) {
        const result = await executeStep(currentStep, execution);

        // 5. Checkpoint after each step
        await checkpointStep(execution.id, currentStep.id, result);

        if (result.status === 'waiting') break;      // Human-in-the-loop
        if (result.status === 'delayed') break;      // Timer wait
        if (result.status === 'failed') throw result.error;

        currentStep = getNextStep(definition, currentStep.id, result);
      }

      // 6. Mark complete if no more steps
      if (!currentStep) {
        await completeExecution(execution.id);
      }
    } catch (error) {
      // 7. Apply retry policy (section 8.4)
      await handleExecutionFailure(execution, error);
    }
  }
}
```

### 8.3 Database Claim Function

```sql
-- Atomic claim of queued executions (prevents double-processing)
CREATE OR REPLACE FUNCTION claim_queued_executions(
  batch_size INTEGER DEFAULT 10,
  org_filter UUID DEFAULT NULL
)
RETURNS SETOF fw_workflow_executions AS $$
BEGIN
  RETURN QUERY
  UPDATE fw_workflow_executions
  SET status = 'running', updated_at = now()
  WHERE id IN (
    SELECT id FROM fw_workflow_executions
    WHERE status IN ('queued', 'retry_pending')
      AND (next_retry_at IS NULL OR next_retry_at <= now())
      AND (deadline_at IS NULL OR deadline_at > now())
      AND (org_filter IS NULL OR organization_id = org_filter)
    ORDER BY created_at ASC
    LIMIT batch_size
    FOR UPDATE SKIP LOCKED
  )
  RETURNING *;
END;
$$ LANGUAGE plpgsql;
```

### 8.4 Retry Policy Application

```
function handleExecutionFailure(execution, error):
  rule = loadAutomationRule(execution.rule_id)
  retryPolicy = rule.retry_config ?? DEFAULT_RETRY_POLICY

  if execution.retry_count >= retryPolicy.max_retries:
    // Move to Dead Letter Queue
    insertDLQ(execution, error)
    updateExecution(execution.id, { status: 'failed' })
    notifyOwner(execution, 'workflow_failed')
    // Run compensation actions if defined
    if rule.compensation_actions:
      scheduleCompensation(execution, rule.compensation_actions)
  else:
    // Schedule retry with exponential backoff
    delay = retryPolicy.base_delay * (2 ** execution.retry_count)
    delay = min(delay, retryPolicy.max_delay)
    updateExecution(execution.id, {
      status: 'retry_pending',
      retry_count: execution.retry_count + 1,
      next_retry_at: now() + delay,
      last_error: error.message
    })
```
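Here is the same policy in TypeScript, written as a pure decision function so it can be unit-tested separately from the database writes. The `DEFAULT_RETRY_POLICY` values and the millisecond field names are assumptions for illustration; FW-25 would define the real defaults.

```typescript
interface RetryPolicy {
  max_retries: number;
  base_delay_ms: number;
  max_delay_ms: number;
}

// Hypothetical defaults for illustration only.
const DEFAULT_RETRY_POLICY: RetryPolicy = {
  max_retries: 5,
  base_delay_ms: 30_000,
  max_delay_ms: 3_600_000,
};

type FailureDecision =
  | { action: "dead_letter" }
  | { action: "retry"; retry_count: number; next_retry_at: Date };

// Exponential backoff: base * 2^retryCount, capped at max_delay_ms.
function decideOnFailure(
  retryCount: number,
  policy: RetryPolicy = DEFAULT_RETRY_POLICY,
  now: Date = new Date(),
): FailureDecision {
  if (retryCount >= policy.max_retries) return { action: "dead_letter" };
  const delay = Math.min(policy.base_delay_ms * 2 ** retryCount, policy.max_delay_ms);
  return {
    action: "retry",
    retry_count: retryCount + 1,
    next_retry_at: new Date(now.getTime() + delay),
  };
}
```

With these defaults the delays run 30 s, 60 s, 120 s, 240 s, 480 s, and a sixth failure goes to the DLQ.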

***

## 9. Proposed Architecture: Business Rules Engine

### 9.1 Decision Table Integration

```
┌─────────────────────────────────────────────────────────────────┐
│                    FW-17 Condition Builder                       │
│                                                                 │
│  ┌─────────────┐  ┌──────────────┐  ┌─────────────────────┐   │
│  │ Simple       │  │ Expression   │  │ Decision Table      │   │
│  │ Conditions   │  │ Builder      │  │ (NEW)               │   │
│  │ field=value  │  │ AND/OR/NOT   │  │ Multi-input/output  │   │
│  │ field>value  │  │ nested       │  │ DMN hit policies    │   │
│  └─────────────┘  └──────────────┘  └─────────────────────┘   │
│                                                                 │
│  All three evaluate to: { match: boolean, outputs: Record }     │
└─────────────────────────────────────────────────────────────────┘
```
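The shared `{ match, outputs }` contract can be expressed as a discriminated union with a single evaluator. This is a sketch with hypothetical names (`Condition`, `evaluateCondition`). Simple operators are limited to `eq` and `gt`, `not` applies to its first child, and the decision-table branch uses first-hit semantics only for brevity.

```typescript
type Facts = Record<string, unknown>;

interface ConditionResult {
  match: boolean;
  outputs: Record<string, unknown>;
}

type Condition =
  | { kind: "simple"; field: string; op: "eq" | "gt"; value: number | string }
  | { kind: "expression"; op: "and" | "or" | "not"; children: Condition[] }
  | { kind: "decision_table"; rows: { when: Condition; outputs: Record<string, unknown> }[] };

// Every condition type resolves to the same ConditionResult shape.
function evaluateCondition(c: Condition, facts: Facts): ConditionResult {
  switch (c.kind) {
    case "simple": {
      const v = facts[c.field];
      const match =
        c.op === "eq"
          ? v === c.value
          : typeof v === "number" && typeof c.value === "number" && v > c.value;
      return { match, outputs: {} };
    }
    case "expression": {
      const results = c.children.map((child) => evaluateCondition(child, facts).match);
      const match =
        c.op === "and" ? results.every(Boolean) : c.op === "or" ? results.some(Boolean) : !results[0];
      return { match, outputs: {} };
    }
    case "decision_table": {
      // First matching row wins; its outputs flow to the caller.
      const row = c.rows.find((r) => evaluateCondition(r.when, facts).match);
      return row ? { match: true, outputs: row.outputs } : { match: false, outputs: {} };
    }
    default:
      throw new Error("Unknown condition kind");
  }
}
```

Because decision tables return the same shape as the other two types, a workflow branch node does not need to know which kind of condition it is evaluating.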

### 9.2 Healthcare Business Rules Examples

**Authorization Rules (FA/PM):**

| Payer Category    | Service Line | Days Authorized | Requires Pre-Auth | Review Interval  |
| ----------------- | ------------ | --------------- | ----------------- | ---------------- |
| Medicaid          | Residential  | 30              | Yes               | Every 7 days     |
| Medicaid          | IOP          | 15              | Yes               | Every 5 sessions |
| Commercial (BCBS) | PHP          | 14              | Yes               | Every 10 days    |
| Self-Pay          | Any          | Unlimited       | No                | None             |

**Staffing Rules (HR/RH):**

| Facility Type | Census Range | Required Staff | Role                         |
| ------------- | ------------ | -------------- | ---------------------------- |
| Residential   | 1-8          | 1              | BHT (Behavioral Health Tech) |
| Residential   | 9-16         | 2              | BHT                          |
| Residential   | 17-24        | 3              | BHT + 1 Lead                 |
| IOP           | Any          | 1 per 10       | Therapist                    |

**Clinical Escalation Rules (CL):**

| Assessment | Score Range        | Action                           | Urgency      |
| ---------- | ------------------ | -------------------------------- | ------------ |
| PHQ-9      | 20-27 (Severe)     | Flag for psychiatrist review     | Urgent (24h) |
| PHQ-9      | 15-19 (Mod-Severe) | Schedule follow-up within 7 days | Standard     |
| GAD-7      | 15-21 (Severe)     | Flag for clinical director       | Urgent (24h) |
| C-SSRS     | Any positive       | Immediate safety protocol        | Emergent     |

These rules currently live in code or are applied ad hoc. A decision table system makes them **configurable, auditable, and tenant-specific** without code deploys.

***

## 10. Proposed Architecture: Event Mesh & Dead Letter Queue

### 10.1 Unified Event Architecture

```
                    ┌─────────────────────────────┐
                    │     Event Publishers         │
                    │  publishEvent() from any core│
                    └──────────────┬──────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────────┐
│                    fw_domain_events (Event Store)                 │
│                                                                  │
│  Consumers:                                                      │
│  ┌──────────────────┐ ┌───────────────────┐ ┌────────────────┐  │
│  │ Automation Rules  │ │ External Webhooks │ │ Event Consumer │  │
│  │ (DB Trigger)      │ │ (PF-35 Subs)     │ │ (HTTP Handlers)│  │
│  │                   │ │                   │ │                │  │
│  │ fw_automation_    │ │ pf_event_         │ │ Teams notifs,  │  │
│  │ rules matching    │ │ subscriptions     │ │ CL transitions │  │
│  └────────┬─────────┘ └────────┬──────────┘ └───────┬────────┘  │
│           │                    │                     │           │
│           ▼                    ▼                     ▼           │
│  fw_workflow_executions   Outbound webhooks    Side effects      │
│  (queued → worker)        (with retry/DLQ)     (HTTP handlers)  │
└──────────────────────────────────────────────────────────────────┘
                                   │
                          On failure after retries
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────────┐
│                    fw_dead_letter_queue                           │
│                                                                  │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │ Admin UI: Review → Retry / Discard / Investigate         │   │
│  │ Alerting: Notify ops when DLQ depth > threshold          │   │
│  │ Metrics:  DLQ depth, age, resolution rate                │   │
│  └──────────────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────────────┘
```

### 10.2 Event Schema Versioning Strategy

```
fw_workflow_events (enhanced):
  id, event_name, display_name, description, core,
  payload_schema (JSONB - JSON Schema),
  schema_version (INTEGER),
  deprecated_at (TIMESTAMPTZ),
  replacement_event (TEXT - event_name of successor),
  created_at, updated_at

Evolution rules:
  - Adding optional fields: same version (backward compatible)
  - Adding required fields: new version + migration period
  - Removing fields: deprecate first, remove in next major version
  - Renaming fields: add new field + deprecate old (never rename in-place)
```
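These evolution rules lend themselves to a CI check that compares two schema revisions. This minimal sketch assumes a flattened schema with `properties` and `required`. `checkSchemaEvolution` is hypothetical and encodes only the rules listed above. A rename shows up as a removal, which is the intended outcome.

```typescript
interface EventSchema {
  properties: Record<string, { type: string }>;
  required: string[];
}

interface CompatibilityReport {
  verdict: "same_version" | "new_version";
  problems: string[];
}

// Classifies a schema change against the evolution rules.
function checkSchemaEvolution(prev: EventSchema, next: EventSchema): CompatibilityReport {
  const problems: string[] = [];
  const nextFields = new Set(Object.keys(next.properties));
  // Removing (or renaming) a field is breaking: deprecate first, remove in a new major version.
  for (const f of Object.keys(prev.properties)) {
    if (!nextFields.has(f)) problems.push(`removed field "${f}"`);
  }
  // A field that becomes required, whether new or previously optional, is breaking.
  for (const f of next.required) {
    if (!prev.required.includes(f)) problems.push(`newly required field "${f}"`);
  }
  // Adding optional fields produces no problems and keeps the same version.
  return { verdict: problems.length === 0 ? "same_version" : "new_version", problems };
}
```

In CI, a `new_version` verdict paired with an unchanged `schema_version` would fail the build.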

***

## 11. FHIR Workflow Alignment

### 11.1 Mapping to FHIR Resources

To future-proof interoperability, align workflow concepts with FHIR semantics:

| Encore Health OS            | FHIR Equivalent                   | Alignment Action                              |
| --------------------------- | --------------------------------- | --------------------------------------------- |
| Workflow definition (FW-06) | PlanDefinition                    | Add `fhir_plan_definition_id` optional field  |
| Automation action           | ActivityDefinition                | Map action types to FHIR activity kinds       |
| PF-29 Task                  | Task                              | Add `intent`, `priority`, `for` (patient ref) |
| Workflow execution          | Task (group)                      | Map execution status to FHIR Task.status      |
| Event                       | Communication / DocumentReference | Event payloads as FHIR-compatible resources   |
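The execution-status alignment action could be implemented as a total lookup table. The target codes come from the FHIR R4 TaskStatus value set. The pairing itself is a proposal, not an established mapping, and it assumes the execution statuses used elsewhere in this document.

```typescript
type ExecutionStatus =
  | "queued"
  | "running"
  | "waiting"
  | "retry_pending"
  | "completed"
  | "failed"
  | "timed_out";

// Subset of the FHIR R4 TaskStatus codes used by this proposed mapping.
type FhirTaskStatus = "ready" | "in-progress" | "on-hold" | "completed" | "failed";

const EXECUTION_TO_FHIR_TASK_STATUS: Record<ExecutionStatus, FhirTaskStatus> = {
  queued: "ready",          // ready to be performed, no action taken yet
  running: "in-progress",
  waiting: "on-hold",       // paused for a human approval or timer
  retry_pending: "on-hold", // started, paused until next_retry_at
  completed: "completed",
  failed: "failed",
  timed_out: "failed",      // reason could be carried in Task.statusReason
};

function toFhirTaskStatus(status: ExecutionStatus): FhirTaskStatus {
  return EXECUTION_TO_FHIR_TASK_STATUS[status];
}
```

Typing the table as `Record<ExecutionStatus, FhirTaskStatus>` makes the compiler reject it if someone adds an execution status without a FHIR counterpart.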

### 11.2 Behavioral Health Specific FHIR Extensions

| Extension                | Purpose                                      | Encore Health OS Integration                 |
| ------------------------ | -------------------------------------------- | -------------------------------------------- |
| **us-behavioral-health** | BH-specific encounter types                  | PM encounter classification                  |
| **ASAM-level-of-care**   | Substance use level of care                  | CL assessment → decision table               |
| **42-cfr-part-2**        | Consent management for substance use records | PF consent layer + workflow gate             |
| **treatment-episode**    | Episodes spanning multiple encounters        | RH episode → cross-core workflow correlation |

***

## 12. Cross-Core Business Process Catalog

The following are the primary cross-core business processes that the workflow engine must support. Each process spans multiple cores and requires reliable, auditable automation.

### 12.1 Patient Intake & Admission

```
CE (Referral) → PM (Scheduling) → CL (Assessment) → RH (Bed Assignment) → FA (Insurance Verify)
     │              │                   │                  │                    │
     ▼              ▼                   ▼                  ▼                    ▼
  Lead created   Intake appt      ASAM/clinical      Room assigned       Benefits verified
  Referral       scheduled        assessment          Lease agreement     Authorization
  acknowledged                    completed           signed              obtained
```

**Automation opportunities:**

* Auto-create intake appointment when referral accepted
* Auto-assign BHT based on census and specialization rules (decision table)
* Auto-initiate insurance verification on admission
* Auto-create treatment plan template on admission
* SLA: Insurance verification within 48 hours of admission

### 12.2 Treatment Plan Review Cycle

```
CL (Treatment Plan) → PM (Review Scheduling) → CL (Review Session) → FA (Authorization)
         │                    │                       │                      │
         ▼                    ▼                       ▼                      ▼
   Plan created with    Review scheduled         Review completed      Re-authorization
   30/60/90 day         for review date          with updates          submitted if
   review dates                                                         required
```

**Automation opportunities:**

* Date-relative trigger: 7 days before review due → notify therapist
* Date-relative trigger: 3 days before review due → escalate to clinical director
* Auto-submit re-authorization when treatment plan updated (decision table for payer rules)
* SLA: Treatment plan review within 5 business days of due date
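The date-relative triggers and the business-day SLA above come down to two date calculations. This sketch works in UTC, treats Saturday and Sunday as non-business days, and accepts holidays as an optional set of ISO dates; a business calendar service (PF-84) would replace that set. `triggerDate` and `addBusinessDays` are hypothetical names.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Date-relative trigger: fire `daysBefore` calendar days before a due date.
function triggerDate(dueDate: Date, daysBefore: number): Date {
  return new Date(dueDate.getTime() - daysBefore * DAY_MS);
}

// Adds business days in UTC, skipping weekends and any ISO dates (YYYY-MM-DD) in `holidays`.
function addBusinessDays(
  start: Date,
  days: number,
  holidays: ReadonlySet<string> = new Set<string>(),
): Date {
  const d = new Date(start.getTime());
  let added = 0;
  while (added < days) {
    d.setUTCDate(d.getUTCDate() + 1);
    const dow = d.getUTCDay(); // 0 = Sunday, 6 = Saturday
    const isoDay = d.toISOString().slice(0, 10);
    if (dow !== 0 && dow !== 6 && !holidays.has(isoDay)) added++;
  }
  return d;
}
```

For a review due Friday 2026-03-20, the therapist notice fires on 2026-03-13, the escalation on 2026-03-17, and the 5-business-day SLA lands on 2026-03-27.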

### 12.3 Discharge Planning

```
CL (Discharge Plan) → RH (Room Release) → FA (Final Billing) → CE (Aftercare)
        │                   │                    │                    │
        ▼                   ▼                    ▼                    ▼
  Discharge plan       Room marked          Final charges        Aftercare referrals
  approved by          for turnover         generated            sent
  clinical director    Move-out date set    Insurance billed     Follow-up scheduled
```

**Automation opportunities:**

* Approval workflow: discharge requires clinical director sign-off (FW-34)
* Auto-generate final billing on discharge approval
* Auto-create room turnover task for facility management
* Auto-send aftercare referral packets
* SLA: Final billing submitted within 72 hours of discharge

### 12.4 Staff Onboarding & Credentialing

```
HR (Hire) → HR (Credentialing) → GR (Compliance Check) → PM (Provider Setup)
     │            │                      │                      │
     ▼            ▼                      ▼                      ▼
  Employee      Licenses verified    Background check       Provider profile
  created       Certifications       completed              created in
                tracked              Training assigned       scheduling system
```

**Automation opportunities:**

* Auto-create credentialing checklist on hire
* Date-relative trigger: 60 days before license expiration → notify HR
* Auto-assign required training modules based on role (decision table)
* Auto-activate provider in scheduling when all credentials verified
* SLA: Credentialing complete within 30 days of hire

### 12.5 Financial Cycle: Claims → Payment → Reconciliation

```
PM (Encounter) → FA (Charge Capture) → FA (Claim Submit) → FA (Payment Post)
       │               │                      │                    │
       ▼               ▼                      ▼                    ▼
  Encounter       Charges generated     Claim submitted       Payment received
  completed       from fee schedule     to clearinghouse      ERA posted
                  (auto or manual)                            Reconciled
```

**Automation opportunities:**

* Auto-generate charges on encounter completion (FW-03 trigger)
* Decision table: service code → fee schedule → charge amount by payer
* Auto-submit claims in batch (daily at 6pm via pg\_cron)
* Auto-post ERA payments and flag discrepancies
* SLA: Claims submitted within 48 hours of service; denials worked within 14 days

***

## 13. Implementation Roadmap

### Phase 1: Close the Execution Loop (Weeks 1-3) — CRITICAL

| Item   | Description                                                  | Spec         |
| ------ | ------------------------------------------------------------ | ------------ |
| **1a** | Implement `workflow-executor-worker` Edge Function           | New          |
| **1b** | Add `pg_cron` job to invoke worker every 10 seconds          | Migration    |
| **1c** | Add `claim_queued_executions()` DB function with row locking | Migration    |
| **1d** | Document form submission → automation invocation path        | FW-03 update |
| **1e** | Add `fw_dead_letter_queue` table and basic admin UI          | New          |
| **1f** | Add execution timeout (`deadline_at` column + watchdog cron) | Migration    |

**Exit criteria:** Event-triggered workflows execute reliably with retry and DLQ.

### Phase 2: Step-Level Reliability (Weeks 4-6) — HIGH

| Item   | Description                                                    | Spec      |
| ------ | -------------------------------------------------------------- | --------- |
| **2a** | Add `fw_execution_steps` table for checkpointing               | Migration |
| **2b** | Implement FW-25 retry strategies (exponential, linear backoff) | FW-25     |
| **2c** | Implement FW-25 circuit breaker per automation rule            | FW-25     |
| **2d** | Implement basic compensation actions                           | FW-25     |
| **2e** | Add `correlation_id` to events and executions                  | Migration |
| **2f** | Enhance PF-29 tasks with workflow execution linking            | PF-29     |

**Exit criteria:** Long-running workflows survive step failures and resume correctly.

### Phase 3: Business Rules & SLA (Weeks 7-10) — HIGH

| Item   | Description                                            | Spec        |
| ------ | ------------------------------------------------------ | ----------- |
| **3a** | Implement decision tables (`fw_decision_tables`)       | New FW spec |
| **3b** | Decision table UI (spreadsheet-like editor)            | New FW spec |
| **3c** | Integrate decision tables with FW-17 Condition Builder | FW-17       |
| **3d** | Implement FW-35 SLA Deadline Management                | FW-35       |
| **3e** | Implement PF-83 SLA Management Platform Layer          | New PF spec |
| **3f** | SLA dashboard widget                                   | PF-83       |

**Exit criteria:** Non-technical staff can configure authorization rules and SLA policies.

### Phase 4: Event Architecture Maturity (Weeks 11-13) — MEDIUM

| Item   | Description                                   | Spec            |
| ------ | --------------------------------------------- | --------------- |
| **4a** | Event schema registry in `fw_workflow_events` | FW-16           |
| **4b** | Event validation in `publishEvent()`          | Platform events |
| **4c** | PF-35 event forwarding to external systems    | PF-35           |
| **4d** | Business process registry (PF-82)             | New PF spec     |
| **4e** | Unified event delivery documentation          | Docs            |
| **4f** | Event replay capability for debugging         | New             |

**Exit criteria:** Events are versioned, validated, and deliverable to external systems.

### Phase 5: Advanced Orchestration (Weeks 14-18) — MEDIUM

| Item   | Description                                        | Spec        |
| ------ | -------------------------------------------------- | ----------- |
| **5a** | Implement FW-41 Sub-Workflow Orchestration         | FW-41       |
| **5b** | Implement FW-40 Quorum-Based Approval              | FW-40       |
| **5c** | Business calendar service (PF-84)                  | New PF spec |
| **5d** | Workflow instance migration on definition change   | New         |
| **5e** | Implement FW-43 Audit Trail & Compliance Reporting | FW-43       |

**Exit criteria:** Complex multi-step, multi-approval, cross-core workflows run reliably.

***

## 14. Risk Assessment

| Risk                                                        | Likelihood | Impact   | Mitigation                                                                  |
| ----------------------------------------------------------- | ---------- | -------- | --------------------------------------------------------------------------- |
| **Worker overload** — too many queued executions            | Medium     | High     | Batch sizing, per-org rate limiting (PF-42), priority queuing               |
| **Execution storms** — cascading event triggers             | Medium     | High     | Circuit breaker (FW-25), max fan-out limit per event                        |
| **Data inconsistency** — partial step execution             | Medium     | High     | Step-level checkpointing, idempotent actions, compensation                  |
| **Tenant isolation** — one org's automation affects another | Low        | Critical | RLS on all tables, per-org execution limits, separate cron per org at scale |
| **Schema drift** — event payloads change without notice     | Medium     | Medium   | Schema registry, validation, backward compatibility rules                   |
| **DLQ growth** — unresolved failures accumulate             | Medium     | Medium   | Alerting, auto-escalation, weekly DLQ review process                        |
| **Long-running workflow state bloat**                       | Low        | Medium   | Execution archival after completion (PF-46 Data Retention)                  |
| **Edge Function timeout** — complex workflow exceeds 150s   | Medium     | Medium   | Step-level execution (one step per invocation), re-queue for next step      |
| **pg\_cron reliability** — missed cron invocations          | Low        | High     | Health monitoring (PF-36), secondary polling mechanism                      |

***

## 15. References

### Internal Documents

* [CROSS\_CORE\_EVENTS\_AUTOMATION\_WORKFLOW\_DEEP\_DIVE.md](./CROSS_CORE_EVENTS_AUTOMATION_WORKFLOW_DEEP_DIVE.md) — Foundation document
* [EVENT\_CONTRACTS.md](./integrations/EVENT_CONTRACTS.md) — Event channels and payloads
* [PLATFORM\_INTEGRATION\_LAYERS.md](./integrations/PLATFORM_INTEGRATION_LAYERS.md) — Platform layer index
* [API\_CONTRACTS.md](./integrations/API_CONTRACTS.md) — Synchronous API contracts
* [DATA\_FLOW.md](./patterns/DATA_FLOW.md) — Request lifecycle and automation flow
* [FORMS\_WIZARDS\_WORKFLOWS\_AUTOMATIONS\_RECOMMENDATIONS.md](../archive/development/FORMS_WIZARDS_WORKFLOWS_AUTOMATIONS_RECOMMENDATIONS.md) (archived)
* [REAL\_TIME\_ARCHITECTURE.md](./REAL_TIME_ARCHITECTURE.md) — Realtime layer architecture
* `src/platform/events/README.md` — Platform events API
* `src/platform/workflow/README.md` — Workflow visualization API
* `src/cores/fw/AGENTS.md` — FW automation and workflow patterns

### Specs Referenced (Existing)

* FW-03: Automation Engine
* FW-06: Advanced Workflow Builder
* FW-16: Event-Based Workflow Triggers
* FW-17: Advanced Condition Builder
* FW-25: Advanced Error Recovery & Retry *(enriched 2026-03-15)*
* FW-34: Approval Workflows
* FW-35: SLA Deadline Management
* FW-40: Quorum-Based Approval
* FW-41: Sub-Workflow Orchestration
* FW-43: Workflow Audit Trail & Compliance Reporting *(enriched 2026-03-15)*
* PF-04: Audit Logging
* PF-10: Notifications System
* PF-29: Unified Task System
* PF-35: Integration Hub
* PF-42: Rate Limiting & Throttling
* PF-47: Bulk Operations Framework
* PF-66: Platform Realtime Layer

### New Specs Created from This Research (2026-03-15)

* [FW-45: Decision Tables](../../specs/fw/archive/FW-45-decision-tables.md) — from R-FW-04
* [FW-46: Durable Execution Worker](../../specs/fw/archive/FW-46-durable-execution-worker.md) — from R-FW-01 (CRITICAL)
* [FW-47: Dead Letter Queue](../../specs/fw/archive/FW-47-dead-letter-queue.md) — from R-FW-02
* [FW-48: Execution Step Checkpointing](../../specs/fw/archive/FW-48-execution-step-checkpointing.md) — from R-FW-03
* [FW-49: Execution Timeout & Watchdog](../../specs/fw/archive/FW-49-execution-timeout-watchdog.md) — from R-FW-06
* [FW-16 Phase 2: Event Schema Expansion](../../specs/fw/archive/FW-16-PHASE-2-EVENT-SCHEMA-EXPANSION.md) — from R-FW-05
* [PF-82: Business Process Registry](../../specs/pf/specs/PF-82-business-process-registry.md) — from PF recommendations
* [PF-83: SLA Management Platform Layer](../../specs/pf/specs/PF-83-sla-management-platform-layer.md) — from PF recommendations
* [PF-84: Business Calendar Service](../../specs/pf/specs/PF-84-business-calendar-service.md) — from PF recommendations
* [PF-85: Automation Observability Dashboard](../../specs/pf/specs/PF-85-automation-observability-dashboard.md) — from PF recommendations
* [PF-10 Phase 5: Workflow-Aware Templates](../../specs/pf/specs/PF-10-PHASE-5-WORKFLOW-TEMPLATES-EXPANSION.md) — from PF-10 enhancement
* [PF-29 Phase 4: Task-Workflow Linking](../../specs/pf/specs/PF-29-PHASE-4-WORKFLOW-LINKING-EXPANSION.md) — from PF-29 enhancement
* [PF-35 Phase 2: Event Forwarding](../../specs/pf/specs/PF-35-PHASE-2-EVENT-FORWARDING-EXPANSION.md) — from PF-35 enhancement
* [PF-36 Phase 3: Automation Health](../../specs/pf/specs/PF-36-PHASE-3-AUTOMATION-HEALTH-EXPANSION.md) — from PF-36 enhancement

### Architecture Documents Created from This Research

* [EVENT\_DELIVERY\_ARCHITECTURE.md](./EVENT_DELIVERY_ARCHITECTURE.md) — from R-FW-12

### Industry Standards & Research

* HL7 FHIR PlanDefinition (R4/R5) — Workflow definition standard
* HL7 FHIR Task Resource — Work item lifecycle
* DMN 1.3 (Decision Model and Notation) — Business rules standard
* Temporal.io — Durable execution patterns
* Inngest — Event-driven function execution patterns
* Saga Pattern (Garcia-Molina & Salem, 1987) — Long-running transaction compensation
* CQRS/Event Sourcing (Greg Young) — Event-driven architecture patterns
* Supabase pg\_net / pg\_cron / pgmq — PostgreSQL-native async execution and queuing
* SAMHSA/ASAM — Behavioral health treatment level determination criteria
* 42 CFR Part 2 — Substance use disorder record confidentiality
* BPM+ Health (Trisotech) — Pre-built healthcare workflow/decision models
* US Behavioral Health Profiles IG (HL7) — FHIR profiles for behavioral health
* json-rules-engine — JSON-based business rules evaluation for Node.js/Deno

***

## Appendix A: Supabase Queues (pgmq) — Enhanced Execution Architecture

### A.1 Why pgmq Over Raw pg\_cron Polling

Online research revealed that Supabase now provides **Supabase Queues** built on the `pgmq` extension — a PostgreSQL-native durable message queue. This is a superior alternative to the raw `pg_cron` + `FOR UPDATE SKIP LOCKED` pattern described in Section 8 for several reasons:

| Feature                | pg\_cron + Row Locking         | pgmq (Supabase Queues)                                                                        |
| ---------------------- | ------------------------------ | --------------------------------------------------------------------------------------------- |
| **Delivery guarantee** | Manual (SKIP LOCKED)           | Built-in visibility timeout                                                                   |
| **Retry behavior**     | Manual retry\_count tracking   | Automatic: message reappears after visibility timeout; `read_ct` counts deliveries            |
| **Dead letter queue**  | Manual DLQ table               | Archive table built in; the DLQ is a second queue the worker sends to (e.g. at a `read_ct` limit) |
| **Batch processing**   | Manual batch query             | `pgmq.read(queue, visibility_timeout, batch_size)`                                            |
| **Message cleanup**    | Manual DELETE after processing | `pgmq.delete()` or `pgmq.archive()` for audit                                                 |
| **Concurrency safety** | FOR UPDATE SKIP LOCKED         | Exactly-once delivery within the visibility timeout (SKIP LOCKED internally); redelivered if processing outlives it |
| **RLS support**        | Manual                         | RLS policies on queue tables; tenant isolation still needs explicit policies                  |
| **Audit trail**        | Manual logging                 | Archive tables (`pgmq.a_<queue>`) preserve archived messages                                  |
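The visibility-timeout semantics behind the **Delivery guarantee**, **Retry behavior** and **Message cleanup** rows can be modeled in a few lines. This is an illustrative in-memory sketch, not pgmq's implementation (pgmq keeps messages in Postgres tables and claims them with `FOR UPDATE SKIP LOCKED`). `InMemoryQueue` is a hypothetical name; its methods only mirror `pgmq.send(queue, msg, delay)`, `pgmq.read(queue, vt, qty)`, `pgmq.delete` and `pgmq.archive`, and the clock is injected so redelivery can be shown deterministically. Being single-threaded, it does not model concurrent consumers.

```typescript
// Illustrative model of pgmq-style visibility-timeout semantics (not pgmq itself).
interface QueueMessage<T> {
  msgId: number;
  readCt: number; // like pgmq's read_ct: incremented on every read
  vt: number;     // epoch seconds; hidden from readers until this time
  message: T;
}

class InMemoryQueue<T> {
  private readonly now: () => number;
  private nextId = 1;
  private messages: QueueMessage<T>[] = [];
  readonly archived: QueueMessage<T>[] = [];

  constructor(now: () => number) {
    this.now = now;
  }

  // Like pgmq.send(queue, msg, delay): a delayed message stays invisible until the delay passes.
  send(message: T, delaySeconds = 0): number {
    const msgId = this.nextId++;
    this.messages.push({ msgId, readCt: 0, vt: this.now() + delaySeconds, message });
    return msgId;
  }

  // Like pgmq.read(queue, vt, qty): visibility timeout first, then batch size.
  read(vtSeconds: number, qty: number): QueueMessage<T>[] {
    const visible = this.messages.filter((m) => m.vt <= this.now()).slice(0, qty);
    for (const m of visible) {
      m.readCt += 1;
      m.vt = this.now() + vtSeconds; // invisible to other readers until the timeout expires
    }
    return visible.map((m) => ({ ...m }));
  }

  // Like pgmq.delete: permanently removes the message.
  delete(msgId: number): boolean {
    const before = this.messages.length;
    this.messages = this.messages.filter((m) => m.msgId !== msgId);
    return this.messages.length < before;
  }

  // Like pgmq.archive: removes the message from the queue but keeps a copy.
  archive(msgId: number): boolean {
    const msg = this.messages.find((m) => m.msgId === msgId);
    if (!msg) return false;
    this.archived.push({ ...msg });
    return this.delete(msgId);
  }
}
```

A message read with `read(120, 10)` is invisible to further reads until 120 seconds have passed; if the worker never deletes it, it comes back with `readCt` incremented, which is the signal a worker can use to decide when to dead-letter.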

### A.2 Revised Execution Architecture with pgmq

```
┌─────────────────────────────────────────────────────────────────────┐
│                        EVENT SOURCES                                │
├─────────────┬──────────────┬──────────────┬────────────────────────┤
│ Form Submit │ Domain Event │ Schedule     │ External Webhook       │
└──────┬──────┴──────┬───────┴──────┬───────┴────────────┬──────────┘
       │             │              │                    │
       ▼             ▼              ▼                    ▼
┌─────────────────────────────────────────────────────────────────────┐
│            fw_process_domain_event() (DB Trigger)                   │
│  - Match fw_automation_rules by event_config                       │
│  - pgmq.send('workflow_execution_queue', execution_payload)        │
│  - Match pf_event_subscriptions for external delivery              │
└─────────────────────────────┬───────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│         pgmq: workflow_execution_queue (Durable Queue)             │
│  Messages: { execution_id, rule_id, trigger_payload, org_id }      │
│  Visibility timeout: 120 seconds                                   │
│  Archive: pgmq.archive() on completion (compliance audit trail)    │
└─────────────────────────────┬───────────────────────────────────────┘
                              │
              ┌───────────────┤ (pg_cron every 10s → pg_net → Edge Function)
              ▼               │
┌─────────────────────────────┴───────────────────────────────────────┐
│              workflow-executor-worker (Edge Function)                │
│                                                                     │
│  1. pgmq.read('workflow_execution_queue', 120, 10) -- batch of 10  │
│                                                                     │
│  2. For each message:                                               │
│     a. Load execution from fw_workflow_executions                   │
│     b. Load workflow definition                                     │
│     c. Resume from current_step_id (or start)                      │
│     d. Execute step (action, condition, approval, delay, subflow)  │
│     e. Checkpoint: UPDATE current_step_id, write execution_step    │
│     f. On success → pgmq.delete() (or archive for audit)          │
│        If more steps → pgmq.send() new message for next step      │
│     g. On transient failure → let visibility timeout expire (auto  │
│        retry)                                                       │
│     h. On permanent failure → pgmq.send('workflow_dlq', msg), then  │
│        pgmq.delete() the original message from the main queue       │
│                                                                     │
│  3. Emit execution events for monitoring (FW-22)                    │
└─────────────────────────────────────────────────────────────────────┘
```
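Steps f through h of the worker can be sketched as one message handler. This is a minimal sketch under stated assumptions: `WorkflowQueue` is a hypothetical interface (in the worker it would wrap `pgmq.send` and `pgmq.delete`), `runStep` is a hypothetical callback that performs steps a to e and classifies the result, and `MAX_ATTEMPTS` is an assumed cap on pgmq's `read_ct` so a message that keeps failing transiently is eventually dead-lettered instead of retrying forever.

```typescript
// Outcome of executing one workflow step (hypothetical shape).
type StepOutcome =
  | { kind: 'completed'; nextStepId: string | null }
  | { kind: 'waiting' }                         // approval or delay: resumed by a later message
  | { kind: 'transient_error'; error: string }  // e.g. network timeout, 5xx
  | { kind: 'permanent_error'; error: string }; // e.g. validation failure, missing definition

interface ExecutionMessage {
  execution_id: string;
  rule_id: string;
  step_id: string | null;
  org_id: string;
}

// Hypothetical wrapper over pgmq.send / pgmq.delete.
interface WorkflowQueue {
  send(queue: string, msg: unknown): Promise<void>;
  delete(queue: string, msgId: number): Promise<void>;
}

const MAIN_QUEUE = 'workflow_execution_queue';
const DLQ = 'workflow_dlq';
const MAX_ATTEMPTS = 5; // assumed read_ct limit before dead-lettering

async function handleMessage(
  queue: WorkflowQueue,
  msgId: number,
  readCt: number,
  msg: ExecutionMessage,
  runStep: (msg: ExecutionMessage) => Promise<StepOutcome>,
): Promise<StepOutcome['kind'] | 'dead_lettered'> {
  const outcome = await runStep(msg); // steps a-e: load, resume, execute, checkpoint

  if (outcome.kind === 'completed') {
    // Enqueue the next step before deleting this message, so a crash in between
    // yields a duplicate delivery rather than a stalled workflow.
    if (outcome.nextStepId !== null) {
      await queue.send(MAIN_QUEUE, { ...msg, step_id: outcome.nextStepId });
    }
    await queue.delete(MAIN_QUEUE, msgId);
    return outcome.kind;
  }
  if (outcome.kind === 'waiting') {
    await queue.delete(MAIN_QUEUE, msgId); // execution row already checkpointed as 'waiting'
    return outcome.kind;
  }
  if (outcome.kind === 'transient_error' && readCt < MAX_ATTEMPTS) {
    return outcome.kind; // leave it: reappears after the visibility timeout
  }
  // Permanent error, or transient error with retries exhausted.
  await queue.send(DLQ, { ...msg, error: outcome.error, read_ct: readCt });
  await queue.delete(MAIN_QUEUE, msgId);
  return 'dead_lettered';
}
```

Because a crash between the send and the delete causes a duplicate delivery, steps should be idempotent, for example keyed on `execution_id` plus the step id.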

### A.3 Step-Per-Message Pattern

For workflows that may outlast the 150-second Edge Function timeout, use a **one-step-per-message** pattern:

1. Worker reads the message for execution step N
2. Executes step N and checkpoints `current_step_id` in `fw_workflow_executions`
3. On success: enqueues a new message for step N+1, then deletes the current message
4. On wait (approval/delay): sets execution status to 'waiting', then deletes the message
5. When the approval arrives (or the delay elapses): enqueues a new message to resume from step N+1

Provided each individual step finishes quickly, every Edge Function invocation stays well within the 150-second timeout, while the workflow as a whole can run for days or weeks.
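The five steps reduce to two transitions on the checkpointed execution row: one after a step runs, one when a waiting execution is resumed. The sketch below assumes a simplified, hypothetical `Execution` shape (an ordered `stepIds` array and an index standing in for `current_step_id`); branching, subflows and the database writes themselves are omitted.

```typescript
type ExecutionStatus = 'running' | 'waiting' | 'completed';

interface Execution {
  id: string;
  stepIds: string[];   // ordered step ids from the workflow definition
  currentStep: number; // checkpoint index, standing in for current_step_id
  status: ExecutionStatus;
}

interface NextMessage {
  execution_id: string;
  step_id: string;
}

// Called after step N executes. Returns the new checkpoint and the message to
// enqueue (if any); the caller then deletes the message it is processing.
function afterStep(exec: Execution, result: 'done' | 'wait'): [Execution, NextMessage | null] {
  if (result === 'wait') {
    // Approval/delay: persist 'waiting' and enqueue nothing, so no invocation
    // stays alive while the workflow is idle.
    return [{ ...exec, status: 'waiting' }, null];
  }
  return enqueueNext(exec);
}

// Called when the approval arrives (or the delay elapses). Rejecting executions
// that are not waiting keeps a duplicate approval from enqueuing the step twice.
function resume(exec: Execution): [Execution, NextMessage | null] {
  if (exec.status !== 'waiting') {
    throw new Error(`execution ${exec.id} is ${exec.status}, not waiting`);
  }
  return enqueueNext(exec);
}

function enqueueNext(exec: Execution): [Execution, NextMessage | null] {
  const next = exec.currentStep + 1;
  if (next >= exec.stepIds.length) {
    return [{ ...exec, status: 'completed' }, null];
  }
  return [
    { ...exec, currentStep: next, status: 'running' },
    { execution_id: exec.id, step_id: exec.stepIds[next] },
  ];
}
```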

### A.4 pgmq Queue Configuration

```sql theme={null}
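-- Create queues (run in migration)
SELECT pgmq.create('workflow_execution_queue');   -- Main execution queue
SELECT pgmq.create('workflow_dlq');               -- Dead letter queue
SELECT pgmq.create('event_forwarding_queue');     -- External webhook delivery
SELECT pgmq.create('notification_queue');         -- Async notification delivery

-- pg_cron job to invoke worker ('10 seconds' interval syntax needs pg_cron 1.5+).
-- URL and key are read from Supabase Vault instead of database settings; the secret
-- names 'project_url' and 'service_role_key' are assumptions and must be created
-- first, e.g. SELECT vault.create_secret('https://<ref>.supabase.co', 'project_url');
SELECT cron.schedule(
  'process-workflow-queue',
  '10 seconds',
  $$SELECT net.http_post(
    url := (SELECT decrypted_secret FROM vault.decrypted_secrets WHERE name = 'project_url')
           || '/functions/v1/workflow-executor-worker',
    headers := jsonb_build_object(
      'Authorization', 'Bearer ' || (SELECT decrypted_secret FROM vault.decrypted_secrets WHERE name = 'service_role_key'),
      'Content-Type', 'application/json'
    ),
    body := '{"queue": "workflow_execution_queue", "batch_size": 10}'::jsonb
  )$$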
);
```

***

## Appendix B: json-rules-engine — Concrete Business Rules Implementation

### B.1 Why json-rules-engine

Research identified `json-rules-engine` as the most practical rules engine for a React + Supabase stack:

* **JSON-defined rules**: storable in Postgres JSONB columns, so changing a rule needs no code deploy
* **Nested conditions**: AND/OR/NOT groups (`all`/`any`/`not`) with recursive nesting
* **Custom operators**: extensible for healthcare-specific evaluations (score ranges, date comparisons)
* **Isomorphic**: runs in both Edge Functions (Deno) and the browser for real-time preview
* **Widely adopted**: 189+ npm dependents at the time of this research, actively maintained
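As an illustration of the custom-operator point, the callbacks below are hypothetical healthcare-oriented operators (`scoreInRange`, `withinDays`). json-rules-engine operator callbacks receive `(factValue, jsonValue)` and return a boolean, and would be registered with `engine.addOperator('scoreInRange', scoreInRange)`. They are kept dependency-free here so they can be unit-tested without the engine.

```typescript
// Hypothetical custom operators in json-rules-engine's (factValue, jsonValue) => boolean shape.

// True when a numeric score (e.g. a PHQ-9 total) lies within [min, max] inclusive.
function scoreInRange(factValue: unknown, jsonValue: unknown): boolean {
  if (typeof factValue !== 'number' || !Array.isArray(jsonValue) || jsonValue.length !== 2) {
    return false;
  }
  const [min, max] = jsonValue as [number, number];
  return factValue >= min && factValue <= max;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// True when an ISO date fact is no more than `jsonValue` days in the past.
// `now` is only for testing; the engine calls the operator with two arguments.
function withinDays(factValue: unknown, jsonValue: unknown, now: Date = new Date()): boolean {
  if (typeof factValue !== 'string' || typeof jsonValue !== 'number') return false;
  const then = Date.parse(factValue);
  if (Number.isNaN(then)) return false;
  const ageDays = (now.getTime() - then) / MS_PER_DAY;
  return ageDays >= 0 && ageDays <= jsonValue;
}
```

A rule condition would then reference the operator by name, for example `{ "fact": "phq9_total", "operator": "scoreInRange", "value": [15, 19] }`.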

### B.2 Integration with FW-17 Condition Builder

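Decision tables (proposed FW-45) can be implemented as json-rules-engine rule sets stored in JSONB. Each entry in `rules` uses the engine's native rule shape (`conditions` plus an `event`); `name`, `version`, and `hitPolicy` are Encore-level metadata that the engine itself does not interpret: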

```json theme={null}
{
  "name": "Authorization Requirements",
  "version": 2,
  "hitPolicy": "first",
  "rules": [
    {
      "conditions": {
        "all": [
          { "fact": "payer_category", "operator": "equal", "value": "medicaid" },
          { "fact": "service_line", "operator": "in", "value": ["residential", "php"] }
        ]
      },
      "event": {
        "type": "authorization_required",
        "params": { "requires_preauth": true, "review_interval_days": 7 }
      }
    },
    {
      "conditions": {
        "all": [
          { "fact": "payer_category", "operator": "equal", "value": "self_pay" }
        ]
      },
      "event": {
        "type": "no_authorization",
        "params": { "requires_preauth": false }
      }
    }
  ]
}
```
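To pin down what the table above means, here is a dependency-free evaluator for just the subset it uses (`all`/`any` groups with the `equal` and `in` operators). It is an illustrative sketch, not json-rules-engine: `evaluate`, `matches`, and the `'collect'` hit-policy value (DMN's name for returning every match) are hypothetical. It applies `first` in row order, which is the semantics the hit-policy step in B.3 is meant to provide.

```typescript
type Leaf = { fact: string; operator: 'equal' | 'in'; value: unknown };
type Group = { all: Condition[] } | { any: Condition[] };
type Condition = Leaf | Group;

interface DecisionRule {
  conditions: Group;
  event: { type: string; params?: Record<string, unknown> };
}

interface DecisionTable {
  name: string;
  version: number;
  hitPolicy: 'first' | 'collect';
  rules: DecisionRule[];
}

// Recursively evaluates a condition tree against the supplied facts.
function matches(cond: Condition, facts: Record<string, unknown>): boolean {
  if ('all' in cond) return cond.all.every((c) => matches(c, facts));
  if ('any' in cond) return cond.any.some((c) => matches(c, facts));
  const actual = facts[cond.fact]; // a missing fact is undefined and simply fails to match
  if (cond.operator === 'equal') return actual === cond.value;
  return Array.isArray(cond.value) && cond.value.includes(actual);
}

// Returns the events of matching rows: only the first in row order for 'first', all for 'collect'.
function evaluate(table: DecisionTable, facts: Record<string, unknown>) {
  const hits = table.rules.filter((r) => matches(r.conditions, facts)).map((r) => r.event);
  return table.hitPolicy === 'first' ? hits.slice(0, 1) : hits;
}
```

For the Authorization Requirements table, `{ payer_category: 'medicaid', service_line: 'php' }` yields the `authorization_required` event, while a Medicaid outpatient case matches no row.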

### B.3 Server-Side Evaluation in Edge Functions

```typescript theme={null}
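// In workflow-executor-worker or automation-executor (Supabase Edge Function, Deno)
import { Engine, type RuleProperties } from 'npm:json-rules-engine@6';
import { createClient } from 'npm:@supabase/supabase-js@2';

// Service-role client: RLS is bypassed, so tenant scoping relies on the
// explicit organization_id filter below.
const supabase = createClient(
  Deno.env.get('SUPABASE_URL')!,
  Deno.env.get('SUPABASE_SERVICE_ROLE_KEY')!,
);

interface DecisionOutput {
  type: string;
  params: Record<string, unknown>;
}

async function evaluateDecisionTable(
  tableId: string,
  facts: Record<string, unknown>,
  organizationId: string,
): Promise<{ matched: boolean; outputs: DecisionOutput[] }> {
  const { data: table, error } = await supabase
    .from('fw_decision_tables')
    .select('rules, hit_policy')
    .eq('id', tableId)
    .eq('organization_id', organizationId)
    .single();
  if (error || !table) {
    throw new Error(`Decision table ${tableId} not found: ${error?.message ?? 'no row'}`);
  }

  // Rules with equal priority are evaluated concurrently, so event order is not
  // guaranteed. Descending priorities make the "first" hit policy follow row order.
  const rules = (table.rules as RuleProperties[]).map((rule, i, all) => ({
    ...rule,
    priority: all.length - i,
  }));
  // allowUndefinedFacts: a row may reference a fact this caller did not supply
  // (e.g. service_line for self-pay) without the engine throwing.
  const engine = new Engine(rules, { allowUndefinedFacts: true });
  const { events } = await engine.run(facts);

  const outputs = events.map((e) => ({ type: e.type, params: e.params ?? {} }));
  // Apply hit policy: 'first' keeps the highest-priority match, otherwise collect all
  const hits = table.hit_policy === 'first' ? outputs.slice(0, 1) : outputs;
  return { matched: hits.length > 0, outputs: hits };
}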
```

### B.4 Audit Trail for Compliance

Every rule evaluation must be audited for healthcare compliance:

```sql theme={null}
CREATE TABLE fw_rule_evaluations (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  organization_id UUID NOT NULL,
  decision_table_id UUID REFERENCES fw_decision_tables,
  rule_version INTEGER NOT NULL,
  input_facts JSONB NOT NULL,
  matched_rules JSONB NOT NULL,
  output_results JSONB NOT NULL,
  evaluation_context TEXT, -- 'workflow_step', 'form_validation', 'api_request'
  source_execution_id UUID, -- link to fw_workflow_executions if triggered by workflow
  evaluated_by UUID, -- user who triggered the evaluation
  evaluated_at TIMESTAMPTZ DEFAULT now()
);
```
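A small wrapper can enforce the "every evaluation is audited" rule, including evaluations where no rule matched. This is a minimal sketch under stated assumptions: `evaluateWithAudit` and `AuditedEvaluation` are hypothetical names, the evaluator is injected (for example a wrapper around `evaluateDecisionTable` from B.3), and `insertAudit` would perform `supabase.from('fw_rule_evaluations').insert(row)` and rethrow its `error`. Writing the row before returning, and failing closed when the write fails, is a design choice rather than something this document specifies.

```typescript
type EvaluationContext = 'workflow_step' | 'form_validation' | 'api_request';

// One fw_rule_evaluations row; column names follow the DDL above
// (id and evaluated_at are left to their database defaults).
interface RuleEvaluationRow {
  organization_id: string;
  decision_table_id: string;
  rule_version: number;
  input_facts: Record<string, unknown>;
  matched_rules: unknown[];
  output_results: unknown[];
  evaluation_context: EvaluationContext;
  source_execution_id: string | null;
  evaluated_by: string | null;
}

interface AuditedEvaluation {
  organizationId: string;
  decisionTableId: string;
  ruleVersion: number;
  facts: Record<string, unknown>;
  context: EvaluationContext;
  sourceExecutionId?: string; // set when triggered by a workflow execution
  evaluatedBy?: string;       // user who triggered the evaluation, if any
}

// Runs the evaluation, then writes the audit row before returning. If the insert
// fails the promise rejects, so callers never act on an unaudited decision.
async function evaluateWithAudit<T>(
  req: AuditedEvaluation,
  evaluate: (facts: Record<string, unknown>) => Promise<{ matchedRules: unknown[]; outputs: T[] }>,
  insertAudit: (row: RuleEvaluationRow) => Promise<void>,
): Promise<T[]> {
  const { matchedRules, outputs } = await evaluate(req.facts);
  await insertAudit({
    organization_id: req.organizationId,
    decision_table_id: req.decisionTableId,
    rule_version: req.ruleVersion,
    input_facts: req.facts,
    matched_rules: matchedRules,
    output_results: outputs,
    evaluation_context: req.context,
    source_execution_id: req.sourceExecutionId ?? null,
    evaluated_by: req.evaluatedBy ?? null,
  });
  return outputs;
}
```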

***

## Appendix C: FHIR Behavioral Health Alignment — Extended Research

### C.1 US Behavioral Health Profiles Implementation Guide

The HL7 **US Behavioral Health Profiles IG** (v0.1.0, in development) standardizes FHIR profiles for:

* Substance use disorder conditions and treatment episodes
* Behavioral health encounter types (IOP, PHP, residential, outpatient)
* Mental health screening instruments (PHQ-9, GAD-7, C-SSRS) and ASAM level-of-care assessments
* 42 CFR Part 2 consent management via FHIR Consent resource

### C.2 BPM+ Health Pre-Built Models

**BPM+ Health** (Trisotech) provides \~1,000 free, evidence-based workflow and decision models using BPMN/DMN/CMMN, including:

* Care pathways and clinical guidelines
* Healthcare calculators (LACE Score, APACHE, etc.)
* CDC immunization decision support
* Claims processing workflows
* Patient eligibility verification

These models can inform the design of Encore Health OS workflow templates (FW-28 marketplace) — adapting the process flows to the platform's automation engine rather than implementing BPMN directly.

### C.3 42 CFR Part 2 Workflow Implications

Substance use disorder records require special handling that affects workflow design:

* **Consent-gated data sharing**: Workflows that share patient data across providers must check 42 CFR Part 2 consent status before proceeding
* **Data segmentation**: FHIR R5 Consent resource + CDS Hooks-based data segmentation are the emerging standard
* **Break-the-glass**: Emergency override for substance use data requires audit trail and justification — model as an approval workflow step with enhanced logging
* **Re-disclosure prohibition**: Outbound event forwarding (PF-35 enhancement) must exclude 42 CFR Part 2 protected data unless consent is verified
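The re-disclosure rule amounts to a filter in front of outbound delivery. This is a minimal sketch under stated assumptions: the event shape is hypothetical, with producers tagging Part 2-derived payload keys in `part2_fields`, and `ConsentLookup` is a hypothetical interface that could be backed by FHIR Consent records. Real data segmentation is more involved (per-field security labels, consent scope and expiry, and the fact that an event type or recipient can itself reveal Part 2 information).

```typescript
// Hypothetical outbound event: producers list which payload keys were derived
// from 42 CFR Part 2 (substance use disorder treatment) records.
interface OutboundEvent {
  event_type: string;
  patient_id: string;
  payload: Record<string, unknown>;
  part2_fields: string[];
}

// Hypothetical consent check, e.g. backed by active FHIR Consent resources.
interface ConsentLookup {
  hasPart2Consent(patientId: string, recipientId: string): Promise<boolean>;
}

interface ForwardingDecision {
  payload: Record<string, unknown>;
  redactedFields: string[];
}

// Forwards the full payload only when consent for this recipient is verified;
// otherwise strips every Part 2 field. Lookup errors fail closed (no consent).
async function filterForForwarding(
  event: OutboundEvent,
  recipientId: string,
  consent: ConsentLookup,
): Promise<ForwardingDecision> {
  if (event.part2_fields.length === 0) {
    return { payload: event.payload, redactedFields: [] };
  }
  let verified = false;
  try {
    verified = await consent.hasPart2Consent(event.patient_id, recipientId);
  } catch {
    verified = false;
  }
  if (verified) {
    return { payload: event.payload, redactedFields: [] };
  }
  const protectedKeys = new Set(event.part2_fields);
  const payload = Object.fromEntries(
    Object.entries(event.payload).filter(([key]) => !protectedKeys.has(key)),
  );
  return { payload, redactedFields: event.part2_fields.filter((f) => f in event.payload) };
}
```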

### C.4 XState v5 Persistence Pattern for Server-Side Workflows

Research confirmed XState v5 supports full snapshot persistence:

```typescript theme={null}
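import { createActor, type Actor } from 'xstate';
import type { SupabaseClient } from '@supabase/supabase-js';
import { workflowMachine } from './workflow-machine'; // hypothetical module exporting the machine

// Persist: Save XState snapshot to Postgres
async function persistWorkflow(
  supabase: SupabaseClient,
  executionId: string,
  actor: Actor<typeof workflowMachine>,
): Promise<void> {
  const snapshot = actor.getPersistedSnapshot(); // Recursively serializes actor tree
  const { error } = await supabase.from('fw_workflow_instances').upsert({
    id: executionId,
    machine_snapshot: snapshot, // JSONB column
    updated_at: new Date().toISOString(),
  });
  if (error) throw error;
}

// Restore: Resume from persisted snapshot
async function restoreWorkflow(supabase: SupabaseClient, executionId: string) {
  const { data, error } = await supabase
    .from('fw_workflow_instances')
    .select('machine_snapshot')
    .eq('id', executionId)
    .single();
  if (error) throw error;

  const actor = createActor(workflowMachine, { snapshot: data.machine_snapshot });
  actor.start(); // Resumes from the persisted state rather than the initial state
  return actor;
}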
```

This **"wake up, react, persist, sleep"** pattern makes XState viable for Edge Function execution without a long-running process. For the Encore Health OS architecture, however, it should remain **reserved for future use**, consistent with the existing "experimental" designation: the `automation-executor` + `fw_workflow_executions` model is simpler and already implemented. XState persistence becomes relevant only if complex client-side workflow UIs need server-synchronized state.

***

## Appendix D: Industry Sources

### Healthcare Workflow Automation

* [Workflow Automation for Behavioural Health (Q3Tech)](https://www.q3tech.com/blogs/workflow-automation-for-behavioural-health/)
* [Top 10 Healthcare Workflow Automations (Sully AI)](https://www.sully.ai/blog/healthcare-workflow-automation)
* [AI and Automation in Healthcare 2026 Predictions](https://www.healthcareittoday.com/2025/12/23/ai-and-automation-in-healthcare-2026-health-it-predictions/)
* [Priorities to Accelerate Workflow Automation in Health Care (PMC)](https://pmc.ncbi.nlm.nih.gov/articles/PMC9748536/)

### Event-Driven Architecture

* [Event Versioning Strategies (theburningmonk)](https://theburningmonk.com/2025/04/event-versioning-strategies-for-event-driven-architectures/)
* [Simple Patterns for Event Schema Versioning (Event-Driven.io)](https://event-driven.io/en/simple_events_versioning_patterns/)
* [Deduplication in Distributed Systems (Architecture Weekly)](https://www.architecture-weekly.com/p/deduplication-in-distributed-systems)
* [Idempotency and Ordering (CockroachDB)](https://www.cockroachlabs.com/blog/idempotency-and-ordering-in-event-driven-systems/)
* [On Idempotency Keys (Gunnar Morling)](https://www.morling.dev/blog/on-idempotency-keys/)
* [Reliable Reprocessing and Dead Letter Queues (Uber)](https://www.uber.com/blog/reliable-reprocessing/)
* [DLQ Guide (SRE School)](https://sreschool.com/blog/dead-letter-queue-dlq/)

### Supabase Patterns

* [Supabase Queues (pgmq) Documentation](https://supabase.com/docs/guides/queues)
* [PGMQ Extension Documentation](https://supabase.com/docs/guides/queues/pgmq)
* [Supabase Cron Documentation](https://supabase.com/docs/guides/cron)
* [pg\_net Documentation](https://supabase.com/docs/guides/database/extensions/pg_net)
* [Database Webhooks Documentation](https://supabase.com/docs/guides/database/webhooks)
* [Background Jobs with Supabase Tables and Edge Functions](https://www.jigz.dev/blogs/how-i-solved-background-jobs-using-supabase-tables-and-edge-functions)

### XState & Workflow Orchestration

* [Stately Docs: Persistence](https://stately.ai/docs/persistence)
* [XState v5 Persistent Serverless State Machines (Restate)](https://www.restate.dev/blog/persistent-serverless-state-machines-with-xstate-and-restate)
* [Workflow Automation with XState and React (Apploi)](https://medium.com/apploi/how-to-manage-workflow-automation-with-xstate-and-react-637f19d223c7)

### FHIR Workflow

* [FHIR Workflow Module (v5.0.0)](https://www.hl7.org/fhir/workflow.html)
* [FHIR Workflow Patterns (Medplum)](https://www.medplum.com/blog/fhir-workflow-patterns-to-simplify-your-life)
* [US Behavioral Health Profiles IG](https://build.fhir.org/ig/HL7/us-behavioral-health-profiles/)
* [Behavioral Health Workflow Automation (Dock Health)](https://dock.health/blog/behavioral-health-substance-abuse-treatment-workflows)

### Durable Execution & Saga Patterns

* [Ultimate Guide to TypeScript Orchestration (Temporal vs Inngest vs Trigger.dev)](https://medium.com/@matthieumordrel/the-ultimate-guide-to-typescript-orchestration-temporal-vs-trigger-dev-vs-inngest-and-beyond-29e1147c8f2d)
* [Temporal: How It Works](https://temporal.io/how-it-works)
* [Mastering Saga Patterns (Temporal)](https://temporal.io/blog/mastering-saga-patterns-for-distributed-transactions-in-microservices)
* [Saga Pattern (microservices.io)](https://microservices.io/patterns/data/saga.html)

### Business Rules Engines

* [Decision Model and Notation (Trisotech)](https://www.trisotech.com/dmn/)
* [DMN (OMG Standard)](https://www.omg.org/dmn/)
* [BPM+ Health Pre-Built Models (Trisotech)](https://www.trisotech.com/business-rules-and-decision-management/)
* [json-rules-engine (npm)](https://www.npmjs.com/package/json-rules-engine)
* [CDC Immunization Decision Support (BRCommunity)](https://www.brcommunity.com/articles.php?id=c117)
