Feature ID: FW-48
Spec: specs/fw/specs/FW-48-execution-step-checkpointing.md
Status: ✅ Implemented
Owner Core: FW (Forms & Workflow)

Purpose

Defines integration contracts for workflow execution step checkpointing, including worker write/read paths, monitoring UI read paths, realtime updates, and downstream consumers (retry, compensation, analytics).

Integration Matrix

| Dependency | Type | Direction | Contract Summary |
|---|---|---|---|
| PF-01 Organizations | Direct dependency | FW -> PF | organization_id tenant scoping for all rows and queries |
| PF-04 Permissions | Direct dependency | FW -> PF | UI/API access checks for execution detail and step data |
| FW-46 Durable Execution Worker | Internal FW dependency | FW <-> FW | Worker writes/updates checkpoint rows and uses resume point logic |
| FW-22 Execution Monitoring | Internal FW dependency | FW -> FW | UI reads execution step timeline and status/duration data |
| FW-25 Error Recovery & Retry | Internal FW dependency | FW <-> FW | Retry config informs failed-step handling; FW-25 consumes step history |
| FW-18 Workflow Variables | Internal FW dependency | FW <-> FW | Adds steps.* namespace to variable resolution from checkpoint output data |
| FW-23 Performance Analytics | Internal FW dependency | FW -> FW | Consumes duration_ms and status metrics |

Data Contracts

Step Record (fw_execution_steps)

Core fields used in integrations:
  • id (UUID)
  • organization_id (UUID)
  • execution_id (UUID, FK -> fw_workflow_executions.id)
  • node_id (TEXT)
  • step_index (INTEGER)
  • status (pending|running|completed|failed|skipped|waiting|compensated)
  • input_data (JSONB)
  • output_data (JSONB)
  • retry_count (INTEGER)
  • error_message (TEXT, sanitized)
  • started_at, completed_at, duration_ms
  • created_at, updated_at
Tenant isolation: all reads and writes are scoped by organization_id and enforced with RLS.
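For consumers that handle step rows in application code, the record above could be mirrored roughly as follows. This is a minimal sketch, not the actual schema binding: the class name ExecutionStep, the defaults, and the nullability of each field are assumptions, and created_at/updated_at are omitted for brevity.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional

# Status values exactly as listed in the step record contract.
STEP_STATUSES = frozenset(
    {"pending", "running", "completed", "failed", "skipped", "waiting", "compensated"}
)


@dataclass
class ExecutionStep:
    """Illustrative in-memory mirror of one fw_execution_steps row."""

    id: str
    organization_id: str
    execution_id: str
    node_id: str
    step_index: int
    status: str = "pending"  # assumed default
    input_data: dict[str, Any] = field(default_factory=dict)
    output_data: dict[str, Any] = field(default_factory=dict)
    retry_count: int = 0  # assumed default
    error_message: Optional[str] = None
    started_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None
    duration_ms: Optional[int] = None

    def __post_init__(self) -> None:
        # Reject any status outside the documented set.
        if self.status not in STEP_STATUSES:
            raise ValueError(f"unknown step status: {self.status!r}")
```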

Runtime Interaction Contracts

1) Worker Checkpoint Write Contract (FW-46 -> FW-48)

Trigger: Worker starts a node execution.
Behavior: Upsert step row to status running.
Minimum write payload:
  • organization_id
  • execution_id
  • node_id
  • step_index
  • status = 'running'
  • started_at
  • input_data (step-scoped variable subset)
Idempotency:
  • Unique key (execution_id, node_id); retries update existing row via upsert.
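The idempotency rule can be illustrated with an in-memory stand-in for the table. This is a sketch under stated assumptions: the function upsert_running_step and the dict-based table are hypothetical, and whether started_at is reset on a retry is an assumption the contract does not settle.

```python
from datetime import datetime, timezone
from typing import Any, Optional

# Hypothetical in-memory stand-in for fw_execution_steps, keyed by the
# unique key (execution_id, node_id) named in the contract.
StepTable = dict[tuple[str, str], dict[str, Any]]


def upsert_running_step(
    table: StepTable,
    *,
    organization_id: str,
    execution_id: str,
    node_id: str,
    step_index: int,
    input_data: dict[str, Any],
    now: Optional[datetime] = None,
) -> dict[str, Any]:
    """Insert the step row, or update the existing one, with status 'running'."""
    started_at = now or datetime.now(timezone.utc)
    key = (execution_id, node_id)
    row = table.get(key)
    if row is None:
        # First attempt: create the row. retry_count = 0 is an assumed default.
        row = {
            "organization_id": organization_id,
            "execution_id": execution_id,
            "node_id": node_id,
            "retry_count": 0,
        }
        table[key] = row
    elif row["organization_id"] != organization_id:
        # Never let a write cross tenant boundaries.
        raise PermissionError("step row belongs to a different organization")
    # The same fields are written on the first attempt and on a retry.
    row.update(
        step_index=step_index,
        status="running",
        started_at=started_at,  # assumption: reset on every attempt
        input_data=input_data,
    )
    return row
```

Calling the function twice for the same (execution_id, node_id) leaves a single row, which is the property the unique key is meant to guarantee.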

2) Worker Completion Contract (FW-46 -> FW-48)

Trigger: Node execution completes.
Behavior: Update row to completed, persist output_data, completed_at, duration_ms.
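A minimal sketch of the completion update, assuming duration_ms is derived from started_at and completed_at in whole milliseconds. The function name complete_step and the requirement that the row be in status running beforehand are assumptions, not part of the contract.

```python
from datetime import datetime, timedelta, timezone
from typing import Any


def complete_step(
    row: dict[str, Any], output_data: dict[str, Any], completed_at: datetime
) -> dict[str, Any]:
    """Return a copy of a running step row marked as completed."""
    if row["status"] != "running":
        raise ValueError(f"cannot complete a step in status {row['status']!r}")
    elapsed = completed_at - row["started_at"]
    if elapsed < timedelta(0):
        raise ValueError("completed_at precedes started_at")
    return {
        **row,
        "status": "completed",
        "output_data": output_data,
        "completed_at": completed_at,
        # Whole milliseconds; timedelta // timedelta yields an int.
        "duration_ms": elapsed // timedelta(milliseconds=1),
    }
```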

3) Worker Failure Contract (FW-46/FW-25 -> FW-48)

Trigger: Node execution fails.
Behavior: Update row to failed, increment retry_count, store sanitized error message.
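The failure update might look like the sketch below. The sanitization rules shown here (keep the first line, mask e-mail-like tokens, cap the length) are purely illustrative assumptions; the contract only states that the message is sanitized, and the names sanitize_error, fail_step, and MAX_ERROR_LENGTH are hypothetical.

```python
import re
from typing import Any

MAX_ERROR_LENGTH = 500  # hypothetical cap, not taken from the spec
_EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def sanitize_error(message: str) -> str:
    """Keep only the first line, mask e-mail-like tokens, and cap the length."""
    stripped = message.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    return _EMAIL.sub("[redacted]", first_line)[:MAX_ERROR_LENGTH]


def fail_step(row: dict[str, Any], error: Exception) -> dict[str, Any]:
    """Return a copy of the step row marked as failed, with retry_count + 1."""
    return {
        **row,
        "status": "failed",
        "retry_count": row.get("retry_count", 0) + 1,
        "error_message": sanitize_error(str(error)),
    }
```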

4) Resume Contract (FW-46 reads FW-48)

Trigger: Execution restart/recovery.
Behavior: Determine resume point from ordered step rows; skip completed/skipped steps and retry eligible failed steps.
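The resume rule can be sketched as a scan over the ordered step rows. The function find_resume_point, the max_retries parameter (standing in for the FW-25 retry configuration), and the shape of the returned dict are assumptions; the sketch also assumes that any status other than completed or skipped marks the point to continue from.

```python
from typing import Any, Optional


def find_resume_point(
    steps: list[dict[str, Any]], max_retries: int = 3
) -> Optional[dict[str, Any]]:
    """Return the first step that still needs work, or None if all are done."""
    for step in sorted(steps, key=lambda s: s["step_index"]):
        status = step["status"]
        if status in ("completed", "skipped"):
            continue  # already settled; nothing to re-run
        # A failed step is only retried while it has attempts left.
        exhausted = status == "failed" and step["retry_count"] >= max_retries
        return {
            "node_id": step["node_id"],
            "step_index": step["step_index"],
            "status": status,
            "retry_eligible": not exhausted,
        }
    return None
```

A caller would typically stop and surface the failure when retry_eligible is false, rather than re-running the node.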

API / Query Contracts

Execution Step Timeline Query

Consumer: FW-22 execution detail UI
Inputs: execution_id, tenant context
Output: Ordered list by step_index with status, timing, and input/output snippets.
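As a rough illustration of the timeline query's shape, the sketch below filters rows by tenant and execution, orders them by step_index, and truncates the JSON payloads into snippets. The function names, the output keys, and the 80-character snippet limit are assumptions; in practice the filtering would happen in the database under RLS rather than in application code.

```python
import json
from typing import Any


def snippet(data: Any, limit: int = 80) -> str:
    """Serialize a payload and truncate it to at most `limit` characters."""
    text = json.dumps(data, sort_keys=True, default=str)
    return text if len(text) <= limit else text[: limit - 3] + "..."


def step_timeline(
    rows: list[dict[str, Any]], organization_id: str, execution_id: str
) -> list[dict[str, Any]]:
    """Return the tenant-scoped steps of one execution, ordered by step_index."""
    scoped = [
        r
        for r in rows
        if r["organization_id"] == organization_id
        and r["execution_id"] == execution_id
    ]
    return [
        {
            "node_id": r["node_id"],
            "step_index": r["step_index"],
            "status": r["status"],
            "started_at": r.get("started_at"),
            "completed_at": r.get("completed_at"),
            "duration_ms": r.get("duration_ms"),
            "input_snippet": snippet(r.get("input_data")),
            "output_snippet": snippet(r.get("output_data")),
        }
        for r in sorted(scoped, key=lambda r: r["step_index"])
    ]
```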

Resume Point Query

Consumer: FW-46 worker
Inputs: execution_id, tenant context
Output: Node identifier/index to continue from, with retry eligibility metadata.

Realtime Contract

Publication requirement:
ALTER PUBLICATION supabase_realtime ADD TABLE fw_execution_steps;
Consumer behavior:
  • FW-22 subscribes by execution context and updates step timeline in near-real time.
  • Note: FW-48 uses Supabase Realtime (table publication), not domain events via fw_domain_events. No EVENT_CONTRACTS.md entry is required.
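The consumer side can be pictured as a small reducer that folds change events into the current timeline. This sketch assumes a payload with eventType, new, and old keys, similar to what Supabase client libraries deliver for table changes; the exact payload shape and the function name apply_step_change are assumptions, and the real FW-22 client code may differ.

```python
from typing import Any


def apply_step_change(
    timeline: dict[str, dict[str, Any]], payload: dict[str, Any], execution_id: str
) -> list[dict[str, Any]]:
    """Fold one change event into the timeline and return steps in order."""
    event = payload["eventType"]  # assumed values: INSERT, UPDATE, DELETE
    if event == "DELETE":
        # The old record may carry only the primary key, so remove by id.
        timeline.pop(payload["old"]["id"], None)
    else:
        row = payload["new"]
        # Ignore changes that belong to other executions.
        if row["execution_id"] == execution_id:
            timeline[row["id"]] = row
    return sorted(timeline.values(), key=lambda r: r["step_index"])
```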

Security Contract

  • RLS enabled on fw_execution_steps.
  • Policies scoped using SECURITY DEFINER helper (fw_has_org_access(...)).
  • UPDATE policies include WITH CHECK for tenant safety.
  • Error payloads are sanitized before persistence.
  • No PHI/PII logging of step input/output in application logs.

Failure and Recovery Contract

  • Orphaned running rows may occur on worker crash; cleanup handled by scheduled job (fw-step-orphan-cleanup).
  • Historical retention cleanup handled by scheduled job (fw-step-data-retention).
  • Cleanup jobs must preserve tenant boundaries and auditability requirements.
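Orphan detection could work along these lines. The sketch only identifies candidate rows and groups them by organization_id so that any follow-up action stays within tenant boundaries; the 15-minute staleness threshold and the function name find_orphaned_steps are assumptions, and what fw-step-orphan-cleanup actually does with the rows is not specified here.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Any


def find_orphaned_steps(
    rows: list[dict[str, Any]],
    now: datetime,
    stale_after: timedelta = timedelta(minutes=15),  # hypothetical threshold
) -> dict[str, list[str]]:
    """Map organization_id to ids of steps stuck in 'running' for too long."""
    orphans: dict[str, list[str]] = defaultdict(list)
    for row in rows:
        if row["status"] == "running" and now - row["started_at"] > stale_after:
            orphans[row["organization_id"]].append(row["id"])
    return dict(orphans)
```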

Validation Checklist

  • Integration points align with FW-48 spec Integration Points section
  • Tenant isolation and permission model documented
  • Worker write/read checkpoint contracts defined
  • Monitoring/realtime consumption contract defined
  • Downstream consumer contracts (FW-23/FW-25/FW-18) identified