Feature ID: FW-47
Status: ✅ Implemented (2026-03-19)
Spec: FW-47-dead-letter-queue.md
Last Updated: 2026-03-19

Overview

FW-47 adds a Dead Letter Queue (DLQ) for automation executions that have failed permanently. It consumes FW-46 durable worker output through pgmq, stores org-scoped DLQ records in fw_dead_letter_queue, sends operational alerts through PF-10 notifications, and writes audit actions to pf_audit_logs.

Integration Points (from Spec)

  • PF-01 organizations - Data / Consumes: tenant scoping through organization_id on all DLQ entries and queries.
  • PF-10 notifications - Platform Layer / Publishes: threshold and permanent-failure alerts (fw_dlq_threshold_exceeded, fw_dlq_new_permanent_failure).
  • PF audit logs (pf_audit_logs) - Data / Publishes: immutable audit trail for retry/discard/investigate actions.
  • FW-46 durable execution worker - Event/Queue / Consumes: receives permanently failed execution payloads from workflow_dlq.
  • FW-03 automation engine - Data / Consumes: reads execution context and creates retry executions in fw_workflow_executions.

Queue and Processing Contracts

pgmq queues

  • workflow_dlq (Inbound): receives permanently failed messages from FW-46.
  • workflow_execution_queue (Outbound): enqueues retry execution requests.

Scheduled jobs

  • fw-dlq-consumer (Every 30 seconds): moves queue messages into fw_dead_letter_queue.
  • fw-dlq-threshold-check (Every 5 minutes): evaluates depth and sends PF-10 alerts.
  • fw-dlq-auto-discard (Daily 2 AM UTC): applies retention policy auto-discard.
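The depth evaluation that fw-dlq-threshold-check performs can be sketched as a pure function. This is only an illustration: the DlqDepth and ThresholdAlert shapes, the evaluateDlqThresholds name, and the strictly-greater-than comparison are assumptions, not part of the spec. Only the alert type fw_dlq_threshold_exceeded comes from the integration list above.

```typescript
// Hypothetical input shape: pending DLQ depth per organization.
interface DlqDepth {
  organizationId: string;
  pendingCount: number;
}

// Hypothetical alert payload; only the `type` value is named in the spec.
interface ThresholdAlert {
  type: "fw_dlq_threshold_exceeded";
  organizationId: string;
  pendingCount: number;
  threshold: number;
}

// Returns one alert per organization whose pending depth exceeds the threshold.
// Assumption: a single threshold applies to every organization.
function evaluateDlqThresholds(
  depths: DlqDepth[],
  threshold: number
): ThresholdAlert[] {
  return depths
    .filter((d) => d.pendingCount > threshold)
    .map((d) => ({
      type: "fw_dlq_threshold_exceeded" as const,
      organizationId: d.organizationId,
      pendingCount: d.pendingCount,
      threshold,
    }));
}
```

Keeping the evaluation separate from the PF-10 call makes the job easy to test without a notification backend. Each alert stays scoped to one organization_id.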

API / Platform Contracts

  • Frontend hooks (planned):
    • useDeadLetterQueue(filters)
    • useDeadLetterQueueStats()
    • useRetryDeadLetterEntry()
    • useDiscardDeadLetterEntry()
    • useBulkRetryDeadLetterEntries()
    • useBulkDiscardDeadLetterEntries()
  • Worker/API integration:
    • FW-46 publishes failed execution payloads to workflow_dlq.
    • FW-47 consumer upserts DLQ entries and tracks failure_count, timestamps, and status transitions.
    • Retry actions create a new fw_workflow_executions record and enqueue it to workflow_execution_queue.
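The consumer's upsert behavior can be modeled in memory as follows. This is a minimal sketch, not the actual implementation. The field names other than failure_count (shown here as failureCount), the FailedMessage shape, and the use of a plain object as the store are all assumptions. In the real system this would be an upsert against fw_dead_letter_queue.

```typescript
// Hypothetical payload read from workflow_dlq.
interface FailedMessage {
  organizationId: string;
  sourceType: string;
  sourceId: string;
  error: string;
  failedAt: Date;
}

// Hypothetical pending DLQ row; timestamp column names are assumed.
interface PendingDlqEntry {
  organizationId: string;
  sourceType: string;
  sourceId: string;
  status: "pending";
  failureCount: number;
  firstFailedAt: Date;
  lastFailedAt: Date;
  error: string;
}

// Inserts a new pending entry, or bumps failureCount and lastFailedAt when a
// pending entry for the same organization and source already exists.
function upsertDlqEntry(
  store: Record<string, PendingDlqEntry>,
  msg: FailedMessage
): PendingDlqEntry {
  const key = msg.organizationId + ":" + msg.sourceType + ":" + msg.sourceId;
  const existing = store[key];
  if (existing) {
    existing.failureCount += 1;
    existing.lastFailedAt = msg.failedAt;
    existing.error = msg.error;
    return existing;
  }
  const entry: PendingDlqEntry = {
    organizationId: msg.organizationId,
    sourceType: msg.sourceType,
    sourceId: msg.sourceId,
    status: "pending",
    failureCount: 1,
    firstFailedAt: msg.failedAt,
    lastFailedAt: msg.failedAt,
    error: msg.error,
  };
  store[key] = entry;
  return entry;
}
```

The store holds only pending entries, so a repeated failure for the same source updates the existing record instead of creating a second one.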

Security and Tenant Isolation

  • fw_dead_letter_queue is organization-scoped with RLS enabled.
  • RLS policies delegate access checks to SECURITY DEFINER-safe helper functions.
  • Update policies include WITH CHECK to prevent cross-tenant mutation.
  • Stored error details must be sanitized; no PHI/PII may be persisted in DLQ error fields.
  • Admin actions are role-gated (automation_admin, org_admin) and audited in pf_audit_logs.
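As an application-level illustration of the role gate and audit requirement above, the check might look like the sketch below. The Actor and AuditRecord shapes and the authorizeDlqAction name are hypothetical. The role names and the retry/discard/investigate actions come from this page. In the real system RLS enforces tenant isolation in the database, so this would be defense in depth, not a replacement.

```typescript
// Roles named in the spec as allowed to perform DLQ admin actions.
const DLQ_ADMIN_ROLES: readonly string[] = ["automation_admin", "org_admin"];

type DlqAction = "retry" | "discard" | "investigate";

// Hypothetical caller identity.
interface Actor {
  userId: string;
  organizationId: string;
  roles: string[];
}

// Hypothetical shape of a row destined for pf_audit_logs.
interface AuditRecord {
  organizationId: string;
  actorId: string;
  action: DlqAction;
  entryId: string;
  at: Date;
}

// Throws if the actor is outside the entry's organization or lacks an admin
// role; otherwise returns the audit record the caller should persist.
function authorizeDlqAction(
  actor: Actor,
  entryOrganizationId: string,
  entryId: string,
  action: DlqAction,
  now: Date
): AuditRecord {
  if (actor.organizationId !== entryOrganizationId) {
    throw new Error("cross-tenant access denied");
  }
  const allowed = actor.roles.some((r) => DLQ_ADMIN_ROLES.indexOf(r) !== -1);
  if (!allowed) {
    throw new Error("role not permitted for DLQ admin actions");
  }
  return {
    organizationId: entryOrganizationId,
    actorId: actor.userId,
    action,
    entryId,
    at: now,
  };
}
```

Returning the audit record from the same function that authorizes the action makes it harder to perform an admin action without also recording it.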

Failure Handling and Retry Semantics

  • Transient failures may be retried after root-cause remediation.
  • Permanent failures remain in pending status until they are manually retried or discarded, or until the retention policy auto-discards them.
  • Retry is transactional: update original DLQ state, create retry execution, and enqueue work atomically.
  • Duplicate pending entries are prevented with a unique partial index on (source_id, source_type).
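The all-or-nothing retry and the pending-duplicate guard can be modeled in memory as below. This is a sketch under stated assumptions: the State shape, the function names, and the "retried" and "discarded" status values are hypothetical (only pending is named on this page). A real implementation would rely on a database transaction and the unique partial index, not on copied arrays.

```typescript
// Assumed status values; only "pending" appears in the spec.
type DlqStatus = "pending" | "retried" | "discarded";

interface DlqRow {
  id: string;
  sourceType: string;
  sourceId: string;
  status: DlqStatus;
}

// Simplified stand-ins for the three things a retry touches.
interface State {
  dlq: DlqRow[];        // fw_dead_letter_queue
  executions: string[]; // ids of new fw_workflow_executions rows
  queue: string[];      // ids enqueued on workflow_execution_queue
}

// Mirrors the unique partial index: at most one pending row per
// (source_id, source_type).
function insertPending(state: State, row: DlqRow): State {
  const duplicate = state.dlq.some(
    (r) =>
      r.status === "pending" &&
      r.sourceId === row.sourceId &&
      r.sourceType === row.sourceType
  );
  if (row.status === "pending" && duplicate) {
    throw new Error("duplicate pending entry");
  }
  return { ...state, dlq: state.dlq.concat(row) };
}

// Returns a new state with all three changes applied, or throws and leaves
// the input state untouched. That is the atomicity being modeled.
function retryEntry(state: State, entryId: string, newExecutionId: string): State {
  const entry = state.dlq.filter((r) => r.id === entryId)[0];
  if (!entry || entry.status !== "pending") {
    throw new Error("entry is not pending");
  }
  return {
    dlq: state.dlq.map((r) =>
      r.id === entryId ? { ...r, status: "retried" as const } : r
    ),
    executions: state.executions.concat(newExecutionId),
    queue: state.queue.concat(newExecutionId),
  };
}
```

Because retryEntry never mutates its input, a failure at any step leaves the original DLQ state, executions, and queue unchanged. A transaction rollback gives the same guarantee.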