Event Graph Gap Analysis — 2026-04-28
Auditor: Socrates 🧠
Phase: 6.1 UAT
Priority: HIGH — Matt needs visibility into document processing persistence
Executive Summary
The Event Graph feature is entirely unimplemented. The pipeline currently processes documents through vision extraction → briefing generation → Telegram delivery, but nothing is persisted after the briefing card leaves the pipeline. The briefing_events table schema exists in code but was never initialized. No coordination type exists. No Event Graph database exists.
UAT documents are not persisting in any queryable form beyond the original ChromaDB email ingestion (pre-UAT).
Gap 1: No Event Graph Database
Current state: Zero tables, collections, or schemas for Event Graph
Impact: No way to query "what documents were processed this week" or "show me all coordination items"
Fix Scope
- Create event_graph table in icarus.db with the target schema:

```sql
CREATE TABLE event_graph (
    id TEXT PRIMARY KEY,
    document_id TEXT NOT NULL,
    type TEXT NOT NULL CHECK(type IN ('calendar_event', 'coordination', 'info')),
    extracted_data TEXT NOT NULL,    -- JSON
    routing_decision TEXT NOT NULL,  -- JSON
    confidence REAL NOT NULL,
    source TEXT,                     -- email, telegram, upload
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

- Initialize via init_db() in documents.py
- Add write step in pipeline.py after briefing generation
Estimate: ~2 hours (schema + write step + tests)
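A minimal sketch of the schema initialization and write step for Gap 1, assuming Python's built-in sqlite3 module. The names init_event_graph and record_event, and the use of a UUID for the id column, are illustrative assumptions and not existing code in documents.py or pipeline.py.

```python
import json
import sqlite3
import uuid

# Target schema from the Fix Scope; IF NOT EXISTS makes repeated init calls safe.
EVENT_GRAPH_DDL = """
CREATE TABLE IF NOT EXISTS event_graph (
    id TEXT PRIMARY KEY,
    document_id TEXT NOT NULL,
    type TEXT NOT NULL CHECK(type IN ('calendar_event', 'coordination', 'info')),
    extracted_data TEXT NOT NULL,
    routing_decision TEXT NOT NULL,
    confidence REAL NOT NULL,
    source TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
"""


def init_event_graph(conn: sqlite3.Connection) -> None:
    """Create the event_graph table if it does not exist yet."""
    conn.execute(EVENT_GRAPH_DDL)
    conn.commit()


def record_event(conn, document_id, event_type, extracted_data,
                 routing_decision, confidence, source=None) -> str:
    """Hypothetical write step: persist one processed document, return its id."""
    event_id = str(uuid.uuid4())  # assumption: any unique string id would do
    conn.execute(
        "INSERT INTO event_graph "
        "(id, document_id, type, extracted_data, routing_decision, confidence, source) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (event_id, document_id, event_type, json.dumps(extracted_data),
         json.dumps(routing_decision), confidence, source),
    )
    conn.commit()
    return event_id
```

Because the CHECK constraint lives in the schema, an unknown type is rejected by SQLite itself (as sqlite3.IntegrityError) rather than relying on the caller to validate it.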
Gap 2: briefing_events Table Never Created
Current state: Schema exists in documents.py but init_db() was never called in production
Impact: "Add to Calendar" button data has no persistence — events expire and cannot be retrieved for calendar creation
Fix Scope
- Call init_db() during Icarus API startup or migration
- Verify store_briefing_event() is called in the pipeline after briefing generation
- This is a blocking bug: calendar action buttons reference stored events
Estimate: ~30 minutes (add init_db call + verify pipeline hook)
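One way to keep this class of bug from recurring is a startup check that reports any expected table that is missing. The sketch below assumes SQLite; missing_tables and REQUIRED_TABLES are hypothetical names, and it does not reproduce the actual init_db() in documents.py.

```python
import sqlite3

# Tables the pipeline expects to exist; this list is an assumption for illustration.
REQUIRED_TABLES = ("briefing_events",)


def missing_tables(conn: sqlite3.Connection, required=REQUIRED_TABLES) -> list[str]:
    """Return the required table names that are absent from the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    existing = {name for (name,) in rows}
    return [table for table in required if table not in existing]
```

Calling this right after init_db() at API startup, and refusing to start when the result is non-empty, would surface an uninitialized database before a calendar button is ever pressed.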
Gap 3: No coordination Type Routing
Current state: Document sorter only classifies by family member, not by type (calendar_event vs coordination vs info)
Impact: Documents like "Can you cover Thursday?" have no routing path — they're not calendar events and not stored anywhere
Fix Scope
- Extend document_sorter.py with type classification (calendar_event / coordination / info)
- Add type field to briefing generator output
- For coordination type: skip calendar button, add "Confirm with [person]" button
- For info type: no action buttons, just briefing card
- Store type in Event Graph for future HBM queries
Estimate: ~4 hours (classifier extension + routing logic + button variants)
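The button variants described in the Fix Scope can be sketched as a small lookup on the document type. This is an illustration only: action_buttons is a hypothetical function, the "sender" fallback label is an assumption, and how document_sorter.py actually decides the type is not shown here.

```python
from typing import Optional

VALID_TYPES = ("calendar_event", "coordination", "info")


def action_buttons(doc_type: str, person: Optional[str] = None) -> list[str]:
    """Return the button labels to attach to a briefing card for a given type."""
    if doc_type not in VALID_TYPES:
        raise ValueError(f"unknown document type: {doc_type!r}")
    if doc_type == "calendar_event":
        return ["Add to Calendar"]
    if doc_type == "coordination":
        # Coordination items skip the calendar button entirely.
        # Falling back to "sender" when no person is known is an assumption.
        return [f"Confirm with {person or 'sender'}"]
    return []  # info: briefing card only, no action buttons
```

Keeping the mapping in one function means a new type has to be added in exactly one place, and an unrecognized type fails loudly instead of silently producing a card with no routing path.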
Gap 4: System State Dashboard Not Wired to Pipeline
Current state: The API serves System State at /system/state, but routing_decisions = [] is hardcoded and total_documents = 0
Impact: Dashboard shows zeros even when documents have been processed through Telegram
Fix Scope
- Wire total_documents counter to ingress_logs or event_graph table
- Wire routing_decisions to actual document sorter results
- Wire confidence_distribution to briefing generator confidence scores
- Add recent documents section showing last N processed items
Estimate: ~3 hours (API endpoints + data wiring + template updates)
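As a rough sketch of the wiring, the dashboard payload could be derived from the event_graph table instead of hardcoded values. The function name build_system_state, the low/medium/high bucket thresholds, and the shape of the returned dict are all assumptions; the real /system/state response format is not specified in this report.

```python
import json
import sqlite3


def build_system_state(conn: sqlite3.Connection, recent_limit: int = 5) -> dict:
    """Compute dashboard fields from event_graph rows (assumed to exist)."""
    total = conn.execute("SELECT COUNT(*) FROM event_graph").fetchone()[0]

    # Bucket confidence scores; the 0.5 and 0.8 thresholds are illustrative.
    buckets = {"low": 0, "medium": 0, "high": 0}
    for (confidence,) in conn.execute("SELECT confidence FROM event_graph"):
        if confidence < 0.5:
            buckets["low"] += 1
        elif confidence < 0.8:
            buckets["medium"] += 1
        else:
            buckets["high"] += 1

    # Newest first; rowid breaks ties when created_at has the same second.
    recent = conn.execute(
        "SELECT document_id, type, routing_decision FROM event_graph "
        "ORDER BY created_at DESC, rowid DESC LIMIT ?",
        (recent_limit,),
    ).fetchall()

    return {
        "total_documents": total,
        "confidence_distribution": buckets,
        "routing_decisions": [json.loads(row[2]) for row in recent],
        "recent_documents": [
            {"document_id": row[0], "type": row[1]} for row in recent
        ],
    }
```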
Gap 5: ChromaDB family_knowledge Not Connected to Briefing Pipeline
Current state: ChromaDB family_knowledge has 47 documents from the email ingestion phase (April 17), but this collection is NOT written to by the vision/briefing pipeline
Impact: Documents processed through Telegram/vision pipeline are ephemeral — they exist only as Telegram messages
Fix Scope
- Option A: Write briefing results to family_knowledge collection (simplest)
- Option B: Create separate processed_documents collection for Event Graph (cleaner separation)
- Option C: Use event_graph SQLite table as primary, ChromaDB for semantic search (recommended)
Estimate: ~2 hours (whichever option)
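Option C amounts to a dual write: the canonical row goes to SQLite, and the text is indexed in ChromaDB under the same id. The sketch below is hedged accordingly. persist_briefing is a hypothetical function, the event_graph table is assumed to exist already, and collection is assumed to be a ChromaDB collection (or any object with a compatible add(ids=..., documents=..., metadatas=...) method).

```python
import json
import sqlite3


def persist_briefing(conn: sqlite3.Connection, collection, event_id: str,
                     document_id: str, event_type: str, summary: str,
                     routing_decision: dict, confidence: float) -> None:
    """Write the canonical row to SQLite, then index the summary for search."""
    conn.execute(
        "INSERT INTO event_graph "
        "(id, document_id, type, extracted_data, routing_decision, confidence) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (event_id, document_id, event_type, json.dumps({"summary": summary}),
         json.dumps(routing_decision), confidence),
    )
    conn.commit()
    # Reuse the SQLite id so a semantic search hit can be joined back to the row.
    collection.add(
        ids=[event_id],
        documents=[summary],
        metadatas=[{"document_id": document_id, "type": event_type}],
    )
```

Writing to SQLite first means a failure in the vector store leaves the primary record intact, and the index entry can be rebuilt later from the table.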
Gap 6: No Pipeline Logging for UAT Traceability
Current state: ingress_logs.db has 1 test entry from April 26. The vision/briefing pipeline does NOT log to ingress_logs.
Impact: No audit trail of what documents were processed, when, or with what result
Fix Scope
- Add ingress log write in pipeline.py after each document is processed
- Include: source, processing time, confidence, document type, routing decision
- Query endpoint for System State dashboard
Estimate: ~1 hour (log write + schema extension)
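A sketch of the log write, using a context manager so that processing time is measured and a row is written even when a document fails. The column layout below is an assumption: the existing ingress_logs schema is not shown in this report, and logged_processing is a hypothetical helper.

```python
import sqlite3
import time
from contextlib import contextmanager

# Hypothetical table layout covering the fields listed in the Fix Scope.
INGRESS_DDL = """
CREATE TABLE IF NOT EXISTS ingress_logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    source TEXT,
    processing_ms REAL,
    confidence REAL,
    document_type TEXT,
    routing_decision TEXT,
    logged_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
"""


@contextmanager
def logged_processing(conn: sqlite3.Connection, source: str):
    """Time one document's processing and write an ingress log row afterwards."""
    result = {}  # the pipeline fills in confidence, document_type, routing_decision
    start = time.perf_counter()
    try:
        yield result
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        conn.execute(
            "INSERT INTO ingress_logs "
            "(source, processing_ms, confidence, document_type, routing_decision) "
            "VALUES (?, ?, ?, ?, ?)",
            (source, elapsed_ms, result.get("confidence"),
             result.get("document_type"), result.get("routing_decision")),
        )
        conn.commit()
```

In the pipeline, the per-document work would sit inside `with logged_processing(conn, "telegram") as result:` and populate result before the block exits; a document that raises still gets a row, with the missing fields stored as NULL.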
Priority Ranking
| Priority | Gap | Impact | Effort |
|---|---|---|---|
| P0 | Gap 2: briefing_events table not created | Calendar buttons broken | 30 min |
| P0 | Gap 6: No pipeline logging | No audit trail | 1 hour |
| P1 | Gap 1: No Event Graph database | No persistence | 2 hours |
| P1 | Gap 4: Dashboard not wired | Shows zeros | 3 hours |
| P2 | Gap 3: No coordination type | Phase 6 feature | 4 hours |
| P2 | Gap 5: ChromaDB not connected | No semantic search on new docs | 2 hours |
Total estimated effort: ~12.5 hours
Recommended Sprint Order
- P0: Initialize briefing_events table (fix broken calendar buttons)
- P0: Add pipeline logging to ingress_logs
- P1: Create event_graph table + write step in pipeline
- P1: Wire System State dashboard to actual data
- P2: Add coordination type classification
- P2: Connect ChromaDB for semantic search on processed documents
This order ensures UAT flows work end-to-end first (calendar buttons), then adds observability (logging), then persistence (Event Graph), then completeness (coordination + search).