# Durable Execution Worker — Runbook

**Version:** 1.0.0\
**Last Updated:** 2026-03-18\
**Status:** Active\
**Module:** FW

***

## 1. Safe Pause

To stop the worker temporarily without losing queued messages, unschedule its cron job:

```sql theme={null}
-- Unschedule the cron job
SELECT cron.unschedule('process-workflow-queue');

-- Verify it's removed (expect no rows)
SELECT * FROM cron.job WHERE jobname = 'process-workflow-queue';
```

Queued messages stay in pgmq and are processed once the cron job is rescheduled.

**To resume:**

```sql theme={null}
SELECT cron.schedule(
  'process-workflow-queue',
  '* * * * *',
  $$SELECT util.invoke_edge_function('workflow-executor-worker', '{}'::jsonb);$$
);
```
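As an alternative, pg_cron 1.4 and later provide `cron.alter_job`, which can mark the job inactive instead of deleting it, so resuming does not require re-entering the schedule and command. This sketch assumes your installed pg_cron includes `cron.alter_job`; you can check the version with `SELECT extversion FROM pg_extension WHERE extname = 'pg_cron';`.

```sql
-- Pause: keep the job definition but stop it from firing
-- (assumes pg_cron >= 1.4, which provides cron.alter_job)
SELECT cron.alter_job(jobid, active := false)
FROM cron.job
WHERE jobname = 'process-workflow-queue';

-- Resume: re-activate the same job with its original schedule and command
SELECT cron.alter_job(jobid, active := true)
FROM cron.job
WHERE jobname = 'process-workflow-queue';

-- Confirm the current state
SELECT jobname, schedule, active
FROM cron.job
WHERE jobname = 'process-workflow-queue';
```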

***

## 2. Reset Stuck Semaphore

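If `worker_running` stays `true` after a run should have finished (for example, because the worker crashed mid-run), the semaphore can keep later runs from starting for that organization. Inspect and reset it: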

```sql theme={null}
-- Check which orgs are stuck
SELECT organization_id, worker_running, worker_last_run_at
FROM fw_module_settings
WHERE worker_running = true;

-- Reset for a specific org (no staleness guard: check worker_last_run_at first)
UPDATE fw_module_settings
SET worker_running = false
WHERE organization_id = '<org_id>'
  AND worker_running = true;

-- Reset all semaphores whose last run is older than 5 minutes (use with caution;
-- rows with a NULL worker_last_run_at are not matched)
UPDATE fw_module_settings
SET worker_running = false
WHERE worker_running = true
  AND worker_last_run_at < now() - interval '5 minutes';
```
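To see exactly which organizations a bulk reset touches before committing, one option is to run it inside a transaction with `RETURNING` and decide after inspecting the output. This is a sketch using standard PostgreSQL, intended for an interactive session such as `psql`; the 5-minute threshold is the same one used above.

```sql
BEGIN;

-- Reset stale semaphores and list the rows that were changed
UPDATE fw_module_settings
SET worker_running = false
WHERE worker_running = true
  AND worker_last_run_at < now() - interval '5 minutes'
RETURNING organization_id, worker_last_run_at;

-- Inspect the returned rows, then run exactly one of:
--   COMMIT;    -- keep the reset
--   ROLLBACK;  -- discard it
```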

***

## 3. Queue Backlog Investigation

```sql theme={null}
-- Messages currently visible (ready to be read); in-flight messages have vt > now()
SELECT count(*) FROM pgmq.q_workflow_execution_queue WHERE vt <= now();

-- Messages by org
SELECT message->>'organization_id' AS org_id, count(*)
FROM pgmq.q_workflow_execution_queue
WHERE vt <= now()
GROUP BY 1
ORDER BY 2 DESC;

-- Oldest unprocessed message
SELECT msg_id, enqueued_at, message->>'execution_id' AS execution_id
FROM pgmq.q_workflow_execution_queue
WHERE vt <= now()
ORDER BY enqueued_at ASC
LIMIT 1;
```
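pgmq also exposes a `pgmq.metrics(queue_name)` function that summarizes a queue in one row, and each queue table has a `read_ct` column counting how many times a message has been read. The sketch below uses both. Note that `queue_length` counts every message in the queue table, including ones that are currently invisible, and the `read_ct > 3` cutoff is an arbitrary example, not a value taken from the worker's retry configuration.

```sql
-- One-row summary: queue length and age (seconds) of the oldest and newest message
SELECT queue_length, oldest_msg_age_sec, newest_msg_age_sec
FROM pgmq.metrics('workflow_execution_queue');

-- Messages currently invisible (read by a worker and not yet deleted/archived, or delayed)
SELECT count(*) FROM pgmq.q_workflow_execution_queue WHERE vt > now();

-- Messages read repeatedly without being removed (possible poison messages);
-- the threshold of 3 is illustrative
SELECT msg_id, read_ct, enqueued_at, message->>'execution_id' AS execution_id
FROM pgmq.q_workflow_execution_queue
WHERE read_ct > 3
ORDER BY read_ct DESC
LIMIT 20;
```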

***

## 4. Execution Failure Investigation

```sql theme={null}
-- Recent failed executions
SELECT id, rule_id, organization_id, status, retry_count, last_error,
       next_retry_at, started_at, completed_at
FROM fw_workflow_executions
WHERE status IN ('failed', 'retry_pending')
ORDER BY completed_at DESC NULLS LAST
LIMIT 20;

-- DLQ contents
SELECT msg_id, enqueued_at,
       message->>'execution_id' AS execution_id,
       message->>'error' AS error,
       message->>'total_attempts' AS attempts
FROM pgmq.q_workflow_dlq
ORDER BY enqueued_at DESC
LIMIT 20;
```
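To line up dead-lettered messages with their execution rows, a sketch like the following joins the DLQ on the `execution_id` field read above. It assumes that field holds the `fw_workflow_executions.id` value, and casts `id` to text so the comparison works whatever the column type. The 24-hour window in the second query is an arbitrary example.

```sql
-- DLQ messages alongside the current state of their execution row
SELECT d.msg_id,
       d.enqueued_at AS dead_lettered_at,
       e.id AS execution_id,
       e.status,
       e.retry_count,
       e.last_error
FROM pgmq.q_workflow_dlq d
LEFT JOIN fw_workflow_executions e
       ON e.id::text = d.message->>'execution_id'
ORDER BY d.enqueued_at DESC
LIMIT 20;

-- Most common failure messages in the last 24 hours
SELECT last_error, count(*) AS failures
FROM fw_workflow_executions
WHERE status = 'failed'
  AND completed_at > now() - interval '24 hours'
GROUP BY last_error
ORDER BY failures DESC
LIMIT 10;
```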

***

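## 5. Emergency: Disable Worker for All Orgs

This stops processing for every organization. The `UPDATE` intentionally has no `WHERE` clause, and queued messages are left in pgmq. `cron.unschedule` raises an error if the job has already been removed.
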
```sql theme={null}
UPDATE fw_module_settings
SET fw_execution_worker_enabled = false, worker_running = false;

SELECT cron.unschedule('process-workflow-queue');
```
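After running the emergency statements, a quick check like this can confirm that nothing is left enabled or scheduled. Both queries use only the tables already referenced in this runbook.

```sql
-- Expect 0: no org should still be enabled or marked as running
SELECT count(*)
FROM fw_module_settings
WHERE fw_execution_worker_enabled = true
   OR worker_running = true;

-- Expect no rows: the cron job should be gone
SELECT jobid, jobname
FROM cron.job
WHERE jobname = 'process-workflow-queue';
```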

***

## 6. Rollback / Cleanup

If the feature must be fully reverted:

1. Unschedule cron: `SELECT cron.unschedule('process-workflow-queue');`
2. Disable all orgs: `UPDATE fw_module_settings SET fw_execution_worker_enabled = false, worker_running = false;`
3. Drain queues (optional and irreversible: permanently deletes all queued and dead-lettered messages): `SELECT pgmq.purge_queue('workflow_execution_queue'); SELECT pgmq.purge_queue('workflow_dlq');`
4. Reset stuck executions: `UPDATE fw_workflow_executions SET status = 'failed', last_error = 'Worker rollback' WHERE status IN ('queued', 'retry_pending', 'running');`
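If you want the rollback to be all-or-nothing, the same statements can be wrapped in a single transaction. This is a sketch under that assumption; the purge calls are left commented out because they are optional and, once committed, cannot be undone.

```sql
BEGIN;

-- Step 1: stop the cron trigger
SELECT cron.unschedule('process-workflow-queue');

-- Step 2: disable the worker for every org and clear semaphores
UPDATE fw_module_settings
SET fw_execution_worker_enabled = false, worker_running = false;

-- Step 3 (optional): drain the queues; uncomment only if messages should be discarded
-- SELECT pgmq.purge_queue('workflow_execution_queue');
-- SELECT pgmq.purge_queue('workflow_dlq');

-- Step 4: mark unfinished executions as failed
UPDATE fw_workflow_executions
SET status = 'failed', last_error = 'Worker rollback'
WHERE status IN ('queued', 'retry_pending', 'running');

COMMIT;
```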

***

## 7. Health Check Queries

```sql theme={null}
-- Worker last run times
SELECT organization_id, worker_last_run_at, worker_last_batch_size,
       now() - worker_last_run_at AS time_since_last_run
FROM fw_module_settings
WHERE fw_execution_worker_enabled = true
ORDER BY worker_last_run_at DESC NULLS LAST;

-- Cron job status
SELECT jobid, jobname, schedule, active
FROM cron.job
WHERE jobname = 'process-workflow-queue';
```
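Two further checks may help. The first lists enabled orgs whose worker has not run recently; the 5-minute threshold is an example, chosen because the cron schedule fires every minute. The second reads pg_cron's `cron.job_run_details` history table, which is populated when run logging (`cron.log_run`) is enabled. A `succeeded` status there only means the SQL command completed; depending on how `util.invoke_edge_function` is implemented (for example, if it issues an asynchronous HTTP request), it may not reflect whether the edge function itself succeeded.

```sql
-- Enabled orgs with no worker run in the last 5 minutes (threshold is illustrative)
SELECT organization_id, worker_last_run_at
FROM fw_module_settings
WHERE fw_execution_worker_enabled = true
  AND (worker_last_run_at IS NULL
       OR worker_last_run_at < now() - interval '5 minutes');

-- Recent invocations of the cron job and their outcome
SELECT d.start_time, d.end_time, d.status, d.return_message
FROM cron.job_run_details d
JOIN cron.job j ON j.jobid = d.jobid
WHERE j.jobname = 'process-workflow-queue'
ORDER BY d.start_time DESC
LIMIT 20;
```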

***

## Related Documentation

* [FW-46 Admin Guide](durable-execution-worker-admin-guide.md)
* [FW-46 API Reference](durable-execution-worker-api-reference.md)
* [FW-46 Cron Scheduling](../guides/cron-scheduling-fw-46.md)
