Feature ID: PF-60
Version: 1.0
Last Updated: 2026-01-30
Status: Specification
Overview
This document describes how PF-60 (RAG Infrastructure) integrates with other platform components and cores. RAG (Retrieval-Augmented Generation) provides semantic search capabilities that enable AI features to reference organization-specific documents and knowledge.
Integration Pattern
Pattern Type: Event-Based Integration + Platform Layer
PF-60 acts as both:
- Event Consumer: Subscribes to document lifecycle events from PF-11
- Platform Layer: Provides semantic search API consumed by all AI features
Publisher Dependencies
PF-11: Document Management
PF-60 subscribes to document lifecycle events to keep embeddings synchronized. Events are emitted with the DomainEvent envelope (see EVENT_CONTRACTS.md): top-level event_type, payload, and metadata. The payload contains only identifiers and timestamps; consumers must fetch document content from pf_documents under RLS.
| Event | Action | Payload (wire payload only) |
|---|---|---|
| document_published | Generate embeddings | {organization_id, document_id, timestamp, user_id}; fetch content from pf_documents |
| document_updated | Regenerate embeddings | {organization_id, document_id, timestamp, user_id}; fetch content from pf_documents |
| document_deleted | Delete embeddings | {organization_id, document_id, timestamp, user_id} |
Source migration: 20260226140000_pf_rag_document_knowledge_events.sql.
The identifiers listed above are carried in the payload field of the envelope. For embedding generation, the generate-embeddings edge function fetches title, extracted_content, category, and tags from pf_documents by document_id; the event does not carry full content on the wire.
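The event-to-action mapping can be sketched on the consumer side as follows. This is a minimal illustration, not the actual handler: the type names (`DomainEvent`, `DocumentEventPayload`, `EmbeddingAction`) and the helpers `actionFor` and `needsContentFetch` are assumptions, and the real envelope in EVENT_CONTRACTS.md may carry additional fields.

```typescript
// Hypothetical types: field names follow the envelope and payload described
// in this section; the exact shapes in EVENT_CONTRACTS.md may differ.
type DocumentEventType =
  | "document_published"
  | "document_updated"
  | "document_deleted";

interface DocumentEventPayload {
  organization_id: string;
  document_id: string;
  timestamp: string;
  user_id: string;
}

interface DomainEvent {
  event_type: DocumentEventType;
  payload: DocumentEventPayload;
  metadata: Record<string, unknown>;
}

type EmbeddingAction = "generate" | "regenerate" | "delete";

// Maps each lifecycle event to the action listed in the event table.
function actionFor(eventType: DocumentEventType): EmbeddingAction {
  switch (eventType) {
    case "document_published":
      return "generate";
    case "document_updated":
      return "regenerate";
    case "document_deleted":
      return "delete";
  }
}

// Only generate/regenerate need the document body, which is fetched from
// pf_documents by document_id rather than read from the event itself.
function needsContentFetch(event: DomainEvent): boolean {
  return actionFor(event.event_type) !== "delete";
}
```

Keeping the mapping in one exhaustive `switch` means the compiler flags any new event type that has no action assigned.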
Consumer Cores
PF-60 provides semantic search capabilities to all modules with AI features.
Core Consumers
| Core | Usage | Source Type Preferences |
|---|---|---|
| GR (Governance) | Policy search for compliance questions | source_type = 'policy' |
| HR (Workforce) | Handbook search for HR queries | source_type = 'document', category = 'handbook' |
| FA (Finance) | Procedure search for accounting queries | source_type = 'document', category = 'procedure' |
| All Modules | General AI enhancement | All source types |
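The preferences in the table can be expressed as per-core filters. The sketch below is illustrative only: `SourceFilter`, `CORE_FILTERS`, `filterForCore`, and `matchesFilter` are hypothetical names, and the actual search API may accept these filters in a different form.

```typescript
// Hypothetical filter shape; the values are taken from the consumer table.
interface SourceFilter {
  source_type?: string;
  category?: string;
}

interface EmbeddingSource {
  source_type: string;
  category?: string;
}

const CORE_FILTERS: Record<string, SourceFilter> = {
  GR: { source_type: "policy" },
  HR: { source_type: "document", category: "handbook" },
  FA: { source_type: "document", category: "procedure" },
};

// Cores without an entry fall back to an empty filter (all source types).
function filterForCore(core: string): SourceFilter {
  return CORE_FILTERS[core] ?? {};
}

// A row matches when every field set on the filter equals the row's value.
function matchesFilter(row: EmbeddingSource, filter: SourceFilter): boolean {
  if (filter.source_type !== undefined && row.source_type !== filter.source_type) {
    return false;
  }
  if (filter.category !== undefined && row.category !== filter.category) {
    return false;
  }
  return true;
}
```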
Integration Flow
API Contracts
Database Functions
pf_search_embeddings
Performs semantic similarity search against document embeddings. The function is defined with SET search_path = public.
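To make the ranking behaviour concrete, here is an in-memory sketch of cosine-similarity search with a threshold and a result limit. It does not reproduce the signature of pf_search_embeddings, which is not shown in this document; `StoredEmbedding`, `searchEmbeddings`, and the `matchCount` parameter are assumptions, while `matchThreshold` follows the parameter name used in the troubleshooting table.

```typescript
// Hypothetical in-memory stand-ins for embedding rows and search results.
interface StoredEmbedding {
  source_id: string;
  vector: number[];
}

interface SearchHit {
  source_id: string;
  similarity: number;
}

// Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vector dimensions must match");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Scores every row, drops those below the threshold, and returns the
// highest-scoring matchCount hits (5 by default, an assumed default).
function searchEmbeddings(
  query: number[],
  rows: StoredEmbedding[],
  matchThreshold: number,
  matchCount: number = 5
): SearchHit[] {
  return rows
    .map((row) => ({
      source_id: row.source_id,
      similarity: cosineSimilarity(query, row.vector),
    }))
    .filter((hit) => hit.similarity >= matchThreshold)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, matchCount);
}
```

In the database the same ranking would be served by the vector index rather than a full scan; the sketch only shows what the threshold and limit mean for the result set.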
Edge Functions
generate-embeddings
Generates embeddings for a document and stores them in the database.
useRAG: true.
Data Flow Diagrams
Document Indexing Flow
RAG Query Flow
Security Considerations
Multi-Tenancy
- All embeddings include organization_id
- RLS policies enforce tenant isolation on all operations
- pf_search_embeddings uses SECURITY DEFINER but validates org_id
- UPDATE policy includes WITH CHECK to prevent org_id modification
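The two guards in this list (org_id validation and the WITH CHECK rule) can be mirrored in application code as below. This is only an illustration of the intent: the real enforcement lives in the database policies, and `assertOrgAccess`, `applyUpdate`, and `EmbeddingRecord` are hypothetical names.

```typescript
// Hypothetical record shape; only organization_id matters for this sketch.
interface EmbeddingRecord {
  id: string;
  organization_id: string;
  content: string;
}

// Mirrors the org_id validation: a caller may only search organizations
// it belongs to.
function assertOrgAccess(callerOrgIds: string[], requestedOrgId: string): void {
  if (callerOrgIds.indexOf(requestedOrgId) === -1) {
    throw new Error("Access denied: organization mismatch");
  }
}

// Mirrors the WITH CHECK rule: an update must not change organization_id.
function applyUpdate(
  existing: EmbeddingRecord,
  patch: Partial<EmbeddingRecord>
): EmbeddingRecord {
  const next: EmbeddingRecord = { ...existing, ...patch };
  if (next.organization_id !== existing.organization_id) {
    throw new Error("organization_id cannot be modified");
  }
  return next;
}
```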
Access Control
Embeddings inherit access control from source documents:
- Users can only search embeddings from their organization
- Same permissions as source documents apply
- No cross-organization embedding access
PHI/PII Handling
- Embedding content field contains document text (may include PHI)
- RLS provides the same protection as source documents
- Embeddings are cascade deleted when source is deleted
- No PHI stored in metadata field
Performance Considerations
Indexing Performance
| Metric | Target | Notes |
|---|---|---|
| Embedding generation | <5s per document | OpenAI API latency |
| Bulk indexing | 100 docs/batch | Rate limit handling |
| Retry delays | Exponential backoff | 1s, 2s, 4s |
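The batching and retry targets in the table can be sketched as follows. The constants come from the table (100 documents per batch, delays of 1s, 2s, 4s); the helper names `toBatches`, `backoffDelayMs`, and `withRetry`, and the injected `sleep` function, are assumptions rather than the actual bulk indexing job.

```typescript
const BATCH_SIZE = 100; // documents per batch, from the table
const BASE_DELAY_MS = 1000; // first retry delay
const MAX_RETRIES = 3; // yields delays of 1s, 2s, 4s

// Delay before retry number `attempt` (0-based): 1000, 2000, 4000 ms.
function backoffDelayMs(attempt: number): number {
  return BASE_DELAY_MS * Math.pow(2, attempt);
}

// Splits a list into consecutive batches of at most `size` items.
function toBatches<T>(items: T[], size: number = BATCH_SIZE): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Runs `task`, retrying up to MAX_RETRIES times with exponential backoff.
// `sleep` is injected so callers (and tests) control how waiting happens.
async function withRetry<T>(
  task: () => Promise<T>,
  sleep: (ms: number) => Promise<void>
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      if (attempt >= MAX_RETRIES) {
        throw err;
      }
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

A caller would typically wrap each batch's embedding request in `withRetry` so that a 429 response delays only that batch.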
Search Performance
| Metric | Target | Notes |
|---|---|---|
| Semantic search | <500ms | HNSW index |
| Top 5 results | <500ms | Cosine similarity |
| 10,000+ embeddings | Supported | Per organization |
Index Configuration
Migration Guide
For Existing Documents
Use the bulk indexing job to index existing documents.
For New Integrations
- Emit document_published event when content is created
- Emit event with identifiers only (organization_id, document_id, timestamp, user_id). The consumer (generate-embeddings) fetches content from pf_documents by document_id under RLS.
- PF-60 handles embedding generation automatically
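On the publishing side, the steps above amount to building an envelope whose payload carries identifiers only. The following sketch assumes an ISO 8601 timestamp and an empty metadata object; `buildDocumentPublished` and `hasOnlyIdentifiers` are hypothetical helpers, not part of the platform API.

```typescript
// Hypothetical event shape based on the envelope fields named in this guide.
interface DocumentPublishedPayload {
  organization_id: string;
  document_id: string;
  timestamp: string;
  user_id: string;
}

interface DocumentPublishedEvent {
  event_type: "document_published";
  payload: DocumentPublishedPayload;
  metadata: Record<string, unknown>;
}

const ALLOWED_PAYLOAD_KEYS = [
  "organization_id",
  "document_id",
  "timestamp",
  "user_id",
];

// Builds an identifiers-only event; document content is never included.
function buildDocumentPublished(
  organizationId: string,
  documentId: string,
  userId: string,
  now: Date = new Date()
): DocumentPublishedEvent {
  return {
    event_type: "document_published",
    payload: {
      organization_id: organizationId,
      document_id: documentId,
      timestamp: now.toISOString(), // assumed format
      user_id: userId,
    },
    metadata: {}, // assumed empty; real metadata fields are not shown here
  };
}

// Guards against accidentally putting content on the wire.
function hasOnlyIdentifiers(payload: object): boolean {
  return Object.keys(payload).every(
    (key) => ALLOWED_PAYLOAD_KEYS.indexOf(key) !== -1
  );
}
```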
Troubleshooting
Common Issues
| Issue | Cause | Solution |
|---|---|---|
| No search results | Embeddings not generated | Check pf_document_embeddings for source_id |
| Low similarity scores | Query mismatch | Adjust matchThreshold parameter |
| Slow generation | Large documents | Check chunking, reduce chunk size if needed |
| 429 errors | Rate limiting | Bulk indexing handles this automatically |
Monitoring
Check embedding status in pf_document_embeddings.
Dependencies
| Dependency | Type | Required For |
|---|---|---|
| PF-59 (AI Provider Migration) | Required | OpenAI/OpenRouter API access |
| PF-11 (Document Management) | Required | Source documents and events |
| PF-27 (Platform AI) | Required | AI infrastructure base |
| pgvector extension | Required | Vector similarity search |
Related Documentation
- Spec: specs/pf/specs/PF-60-rag-infrastructure.md
- Tasks: specs/pf/tasks/PF-60-TASKS.md
- AI Strategy: docs/architecture/analysis/AI_INTEGRATION_STRATEGY_2026.md
- Event Contracts: docs/architecture/integrations/EVENT_CONTRACTS.md