# Event Graph Gap Analysis — 2026-04-28

**Auditor:** Socrates 🧠
**Phase:** 6.1 UAT
**Priority:** HIGH — Matt needs visibility into document processing persistence

---

## Executive Summary

The Event Graph feature is **entirely unimplemented**. The pipeline currently processes documents through vision extraction → briefing generation → Telegram delivery, but **nothing is persisted** after the briefing card leaves the pipeline. The `briefing_events` table schema exists in code but was never initialized. No `coordination` type exists. No Event Graph database exists. UAT documents are **not persisted** in any queryable form beyond the original ChromaDB email ingestion (pre-UAT).

---

## Gap 1: No Event Graph Database

**Current state:** Zero tables, collections, or schemas for the Event Graph

**Impact:** No way to query "what documents were processed this week" or "show me all coordination items"

### Fix Scope

- Create an `event_graph` table in `icarus.db` with the target schema:

```sql
CREATE TABLE event_graph (
    id TEXT PRIMARY KEY,
    document_id TEXT NOT NULL,
    type TEXT NOT NULL CHECK(type IN ('calendar_event', 'coordination', 'info')),
    extracted_data TEXT NOT NULL,   -- JSON
    routing_decision TEXT NOT NULL, -- JSON
    confidence REAL NOT NULL,
    source TEXT,                    -- email, telegram, upload
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

- Initialize via `init_db()` in `documents.py`
- Add a write step in `pipeline.py` after briefing generation

**Estimate:** ~2 hours (schema + write step + tests)

---

## Gap 2: briefing_events Table Never Created

**Current state:** Schema exists in `documents.py`, but `init_db()` was never called in production

**Impact:** "Add to Calendar" button data has no persistence — events expire and cannot be retrieved for calendar creation

### Fix Scope

- Call `init_db()` during Icarus API startup or migration
- Verify `store_briefing_event()` is called in the pipeline after briefing generation
- This is a **blocking bug** — calendar action buttons reference stored events

**Estimate:** ~30 minutes (add `init_db()` call + verify pipeline hook)

---

## Gap 3: No `coordination` Type Routing

**Current state:** The document sorter classifies only by family member, not by type (calendar_event vs coordination vs info)

**Impact:** Documents like "Can you cover Thursday?" have no routing path — they're not calendar events and are not stored anywhere

### Fix Scope

- Extend `document_sorter.py` with type classification (calendar_event / coordination / info)
- Add a `type` field to the briefing generator output
- For the `coordination` type: skip the calendar button, add a "Confirm with [person]" button
- For the `info` type: no action buttons, just the briefing card
- Store the type in the Event Graph for future HBM queries

**Estimate:** ~4 hours (classifier extension + routing logic + button variants)

---

## Gap 4: System State Dashboard Not Wired to Pipeline

**Current state:** The API serves System State at `/system/state`, but `routing_decisions = []` and `total_documents = 0` are hardcoded

**Impact:** The dashboard shows zeros even when documents have been processed through Telegram

### Fix Scope

- Wire the `total_documents` counter to the `ingress_logs` or `event_graph` table
- Wire `routing_decisions` to actual document sorter results
- Wire `confidence_distribution` to briefing generator confidence scores
- Add a recent-documents section showing the last N processed items

**Estimate:** ~3 hours (API endpoints + data wiring + template updates)

---

## Gap 5: ChromaDB family_knowledge Not Connected to Briefing Pipeline

**Current state:** The ChromaDB `family_knowledge` collection holds 47 documents from the email ingestion phase (April 17), but it is NOT written to by the vision/briefing pipeline

**Impact:** Documents processed through the Telegram/vision pipeline are ephemeral — they exist only as Telegram messages

### Fix Scope

- Option A: Write briefing results to the `family_knowledge` collection (simplest)
- Option B: Create a separate `processed_documents` collection for the Event Graph (cleaner separation)
- Option C: Use the `event_graph` SQLite table as primary storage, with ChromaDB for semantic search (recommended)

**Estimate:** ~2 hours (whichever option is chosen)

---

## Gap 6: No Pipeline Logging for UAT Traceability

**Current state:** `ingress_logs.db` has one test entry from April 26. The vision/briefing pipeline does NOT log to `ingress_logs`.

**Impact:** No audit trail of which documents were processed, when, or with what result

### Fix Scope

- Add an ingress log write in `pipeline.py` after each document is processed
- Include: source, processing time, confidence, document type, routing decision
- Add a query endpoint for the System State dashboard

**Estimate:** ~1 hour (log write + schema extension)

---

## Priority Ranking

| Priority | Gap | Impact | Effort |
|----------|-----|--------|--------|
| P0 | Gap 2: briefing_events table not created | Calendar buttons broken | 30 min |
| P0 | Gap 6: No pipeline logging | No audit trail | 1 hour |
| P1 | Gap 1: No Event Graph database | No persistence | 2 hours |
| P1 | Gap 4: Dashboard not wired | Shows zeros | 3 hours |
| P2 | Gap 3: No coordination type | Phase 6 feature | 4 hours |
| P2 | Gap 5: ChromaDB not connected | No semantic search on new docs | 2 hours |

**Total estimated effort:** ~12.5 hours

---

## Recommended Sprint Order

1. **P0:** Initialize the `briefing_events` table (fix broken calendar buttons)
2. **P0:** Add pipeline logging to `ingress_logs`
3. **P1:** Create the `event_graph` table + write step in the pipeline
4. **P1:** Wire the System State dashboard to actual data
5. **P2:** Add `coordination` type classification
6. **P2:** Connect ChromaDB for semantic search on processed documents

This order gets the UAT flows working end-to-end first (calendar buttons), then adds observability (logging), then persistence (Event Graph), then completeness (coordination + search).
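As a rough sketch of the Gap 1 fix under the schema above: an idempotent table-creation helper callable at API startup, plus the write step to run in `pipeline.py` after briefing generation. The names `init_event_graph` and `record_event` are illustrative, not the existing pipeline API, and the real code would live in `documents.py` and `pipeline.py` respectively.

```python
import json
import sqlite3
import uuid

# Target schema from Gap 1; IF NOT EXISTS makes startup calls idempotent.
SCHEMA = """
CREATE TABLE IF NOT EXISTS event_graph (
    id TEXT PRIMARY KEY,
    document_id TEXT NOT NULL,
    type TEXT NOT NULL CHECK(type IN ('calendar_event', 'coordination', 'info')),
    extracted_data TEXT NOT NULL,   -- JSON
    routing_decision TEXT NOT NULL, -- JSON
    confidence REAL NOT NULL,
    source TEXT,                    -- email, telegram, upload
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
"""


def init_event_graph(conn: sqlite3.Connection) -> None:
    """Create the event_graph table; safe to call on every startup."""
    conn.executescript(SCHEMA)


def record_event(conn: sqlite3.Connection, document_id: str, event_type: str,
                 extracted: dict, routing: dict, confidence: float,
                 source: str) -> str:
    """Write step to run after briefing generation; returns the new row id."""
    event_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO event_graph "
        "(id, document_id, type, extracted_data, routing_decision, confidence, source) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (event_id, document_id, event_type,
         json.dumps(extracted), json.dumps(routing), confidence, source),
    )
    conn.commit()
    return event_id
```

The CHECK constraint rejects any type outside the three routing categories, so a misclassified document fails loudly at the write step rather than silently polluting the Event Graph.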