
Event Graph Gap Analysis — 2026-04-28

Auditor: Socrates 🧠
Phase: 6.1 UAT
Priority: HIGH — Matt needs visibility into document processing persistence


Executive Summary

The Event Graph feature is entirely unimplemented. The pipeline currently processes documents through vision extraction → briefing generation → Telegram delivery, but nothing is persisted after the briefing card leaves the pipeline. The briefing_events table schema exists in code but was never initialized. No coordination type exists. No Event Graph database exists.

UAT documents do not persist in any queryable form. The only queryable data is the original ChromaDB email ingestion, which predates UAT.


Gap 1: No Event Graph Database

Current state: Zero tables, collections, or schemas for Event Graph
Impact: No way to query "what documents were processed this week" or "show me all coordination items"

Fix Scope

  • Create event_graph table in icarus.db with the target schema:
    ```sql
    CREATE TABLE event_graph (
        id TEXT PRIMARY KEY,
        document_id TEXT NOT NULL,
        type TEXT NOT NULL CHECK(type IN ('calendar_event', 'coordination', 'info')),
        extracted_data TEXT NOT NULL,    -- JSON
        routing_decision TEXT NOT NULL,  -- JSON
        confidence REAL NOT NULL,
        source TEXT,                     -- email, telegram, upload
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    );
    ```
  • Initialize via init_db() in documents.py
  • Add write step in pipeline.py after briefing generation

Estimate: ~2 hours (schema + write step + tests)


Gap 2: briefing_events Table Never Created

Current state: Schema exists in documents.py but init_db() was never called in production
Impact: "Add to Calendar" button data has no persistence — events expire and cannot be retrieved for calendar creation

Fix Scope

  • Call init_db() during Icarus API startup or migration
  • Verify store_briefing_event() is called in the pipeline after briefing generation
  • This is a blocking bug — calendar action buttons reference stored events

Estimate: ~30 minutes (add init_db call + verify pipeline hook)
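
One way to make this failure impossible to miss again is a startup check that runs `init_db()` and then confirms the table exists. The sketch below is an assumption about how that could be wired: `missing_tables()` and `ensure_schema()` are hypothetical names, and `init_db` is passed in as a callable because its real signature in `documents.py` is not shown here.

```python
import sqlite3
from typing import Callable


def missing_tables(conn: sqlite3.Connection, required: set[str]) -> set[str]:
    """Return the names in `required` that are not tables in this database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    existing = {name for (name,) in rows}
    return required - existing


def ensure_schema(
    conn: sqlite3.Connection,
    init_db: Callable[[sqlite3.Connection], None],
    required: frozenset[str] = frozenset({"briefing_events"}),
) -> None:
    """Run init_db when a table is missing, then fail loudly if it still is."""
    if missing_tables(conn, set(required)):
        init_db(conn)
    still_missing = missing_tables(conn, set(required))
    if still_missing:
        raise RuntimeError(f"init_db() did not create: {sorted(still_missing)}")
```

Calling `ensure_schema()` during API startup would turn a silently missing table into an immediate startup error.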


Gap 3: No coordination Type Routing

Current state: Document sorter only classifies by family member, not by type (calendar_event vs coordination vs info)
Impact: Documents like "Can you cover Thursday?" have no routing path — they're not calendar events and not stored anywhere

Fix Scope

  • Extend document_sorter.py with type classification (calendar_event / coordination / info)
  • Add type field to briefing generator output
  • For coordination type: skip calendar button, add "Confirm with [person]" button
  • For info type: no action buttons, just briefing card
  • Store type in Event Graph for future HBM queries

Estimate: ~4 hours (classifier extension + routing logic + button variants)
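
The routing described in this fix scope can be sketched as a type classifier plus a button lookup. The regular expressions below are placeholder heuristics invented for illustration, and `classify_type()` and `action_buttons()` are hypothetical names; the real `document_sorter.py` may well use a model-based classifier instead.

```python
import re

# Placeholder heuristics (assumptions): a direct request reads as coordination,
# an explicit time or numeric date reads as a calendar event.
_REQUEST = re.compile(r"\b(can you|could you|would you|are you able to)\b", re.IGNORECASE)
_DATE_TIME = re.compile(r"\b(\d{1,2}:\d{2}|\d{1,2}\s?(am|pm)|\d{1,2}/\d{1,2})\b", re.IGNORECASE)


def classify_type(text: str) -> str:
    """Return 'coordination', 'calendar_event', or 'info' for a document's text."""
    if _REQUEST.search(text):
        return "coordination"
    if _DATE_TIME.search(text):
        return "calendar_event"
    return "info"


def action_buttons(doc_type: str, person: str) -> list[str]:
    """Map a document type to the button labels shown on the briefing card."""
    if doc_type == "calendar_event":
        return ["Add to Calendar"]
    if doc_type == "coordination":
        # No calendar button; ask for confirmation instead.
        return [f"Confirm with {person}"]
    return []  # info: briefing card only


doc_type = classify_type("Can you cover Thursday?")
print(doc_type, action_buttons(doc_type, "Matt"))
```

Checking for a request before checking for a date is a deliberate choice in this sketch: a message such as "Can you cover Thursday at 4:30?" needs a confirmation before it becomes a calendar entry.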


Gap 4: System State Dashboard Not Wired to Pipeline

Current state: The API serves System State at /system/state, but routing_decisions = [] is hardcoded and total_documents always reports 0
Impact: Dashboard shows zeros even when documents have been processed through Telegram

Fix Scope

  • Wire total_documents counter to ingress_logs or event_graph table
  • Wire routing_decisions to actual document sorter results
  • Wire confidence_distribution to briefing generator confidence scores
  • Add recent documents section showing last N processed items

Estimate: ~3 hours (API endpoints + data wiring + template updates)
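
As a rough illustration of the wiring, the dashboard fields could be computed from stored rows rather than hardcoded. The sketch assumes the `event_graph` table from Gap 1 exists and is the chosen data source; `build_system_state()` and the confidence bucket thresholds are hypothetical.

```python
import json
import sqlite3


def build_system_state(conn: sqlite3.Connection, recent_limit: int = 5) -> dict:
    """Assemble dashboard fields from stored rows instead of hardcoded values."""
    total = conn.execute("SELECT COUNT(*) FROM event_graph").fetchone()[0]

    # Illustrative buckets: the 0.5 and 0.8 thresholds are assumptions.
    distribution = {"low": 0, "medium": 0, "high": 0}
    for (confidence,) in conn.execute("SELECT confidence FROM event_graph"):
        if confidence < 0.5:
            distribution["low"] += 1
        elif confidence < 0.8:
            distribution["medium"] += 1
        else:
            distribution["high"] += 1

    # Newest first; rowid breaks ties between rows created in the same second.
    rows = conn.execute(
        "SELECT document_id, type, routing_decision, created_at FROM event_graph "
        "ORDER BY created_at DESC, rowid DESC LIMIT ?",
        (recent_limit,),
    ).fetchall()
    recent = [
        {
            "document_id": doc_id,
            "type": doc_type,
            "routing_decision": json.loads(decision),
            "created_at": created_at,
        }
        for doc_id, doc_type, decision, created_at in rows
    ]
    return {
        "total_documents": total,
        "routing_decisions": [item["routing_decision"] for item in recent],
        "confidence_distribution": distribution,
        "recent_documents": recent,
    }
```

The `/system/state` handler could return this dictionary directly, so the dashboard reflects whatever the pipeline has written.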


Gap 5: ChromaDB family_knowledge Not Connected to Briefing Pipeline

Current state: ChromaDB family_knowledge has 47 documents from the email ingestion phase (April 17), but this collection is NOT written to by the vision/briefing pipeline
Impact: Documents processed through Telegram/vision pipeline are ephemeral — they exist only as Telegram messages

Fix Scope

  • Option A: Write briefing results to family_knowledge collection (simplest)
  • Option B: Create separate processed_documents collection for Event Graph (cleaner separation)
  • Option C: Use event_graph SQLite table as primary, ChromaDB for semantic search (recommended)

Estimate: ~2 hours (for any of the three options)
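
Option C amounts to a dual write: SQLite first as the source of truth, then the vector collection for search. The sketch below is a minimal illustration under assumptions. `persist_document()` is a hypothetical name, and the collection is typed as a small protocol so the example runs without ChromaDB installed; ChromaDB's `Collection.add` accepts the `ids`, `documents`, and `metadatas` keyword arguments used here.

```python
import json
import sqlite3
from typing import Any, Protocol


class VectorCollection(Protocol):
    """The only method this sketch needs from a ChromaDB-style collection."""

    def add(
        self, ids: list[str], documents: list[str], metadatas: list[dict[str, Any]]
    ) -> None: ...


def persist_document(
    conn: sqlite3.Connection,
    collection: VectorCollection,
    event: dict[str, Any],
    briefing_text: str,
) -> None:
    """Write the structured row to SQLite, then index the text for semantic search."""
    conn.execute(
        "INSERT INTO event_graph "
        "(id, document_id, type, extracted_data, routing_decision, confidence, source) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (
            event["id"],
            event["document_id"],
            event["type"],
            json.dumps(event["extracted_data"]),
            json.dumps(event["routing_decision"]),
            event["confidence"],
            event.get("source"),
        ),
    )
    conn.commit()

    # Reuse the SQLite id so a search hit maps straight back to its row.
    collection.add(
        ids=[event["id"]],
        documents=[briefing_text],
        metadatas=[
            {
                "document_id": event["document_id"],
                "type": event["type"],
                "confidence": event["confidence"],
            }
        ],
    )
```

Writing to SQLite first means a failed vector write leaves a row that can be re-indexed later, whereas the reverse order could leave a search hit with no backing record.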


Gap 6: No Pipeline Logging for UAT Traceability

Current state: ingress_logs.db has 1 test entry from April 26. The vision/briefing pipeline does NOT log to ingress_logs.
Impact: No audit trail of what documents were processed, when, or with what result

Fix Scope

  • Add ingress log write in pipeline.py after each document is processed
  • Include: source, processing time, confidence, document type, routing decision
  • Add a query endpoint for the System State dashboard

Estimate: ~1 hour (log write + schema extension)
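
A log write covering the fields listed in this fix scope could be sketched as a wrapper around the processing call. The `ingress_logs` columns below are assumptions, since the existing schema in ingress_logs.db is not shown, and `process_with_logging()` is a hypothetical helper that expects the wrapped callable to return a dict with `confidence`, `type`, and `routing_decision` keys.

```python
import json
import sqlite3
import time
from typing import Any, Callable

# Assumed table layout; the real ingress_logs schema may differ.
INGRESS_LOG_SCHEMA = """
CREATE TABLE IF NOT EXISTS ingress_logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    source TEXT NOT NULL,
    processing_ms REAL NOT NULL,
    confidence REAL,
    document_type TEXT,
    routing_decision TEXT,
    logged_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
"""


def process_with_logging(
    conn: sqlite3.Connection,
    source: str,
    process: Callable[[], dict[str, Any]],
) -> dict[str, Any]:
    """Run one document through `process` and record an audit row for it."""
    start = time.perf_counter()
    result = process()
    elapsed_ms = (time.perf_counter() - start) * 1000.0

    conn.execute(
        "INSERT INTO ingress_logs "
        "(source, processing_ms, confidence, document_type, routing_decision) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            source,
            elapsed_ms,
            result.get("confidence"),
            result.get("type"),
            json.dumps(result.get("routing_decision")),
        ),
    )
    conn.commit()
    return result
```

This version only logs documents that process successfully; recording failures as well would need a `try`/`except` around the `process()` call and a status column.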


Priority Ranking

| Priority | Gap | Impact | Effort |
|----------|-----|--------|--------|
| P0 | Gap 2: briefing_events table not created | Calendar buttons broken | 30 min |
| P0 | Gap 6: No pipeline logging | No audit trail | 1 hour |
| P1 | Gap 1: No Event Graph database | No persistence | 2 hours |
| P1 | Gap 4: Dashboard not wired | Shows zeros | 3 hours |
| P2 | Gap 3: No coordination type | Phase 6 feature | 4 hours |
| P2 | Gap 5: ChromaDB not connected | No semantic search on new docs | 2 hours |

Total estimated effort: ~12.5 hours


Recommended Order

  1. P0: Initialize briefing_events table (fix broken calendar buttons)
  2. P0: Add pipeline logging to ingress_logs
  3. P1: Create event_graph table + write step in pipeline
  4. P1: Wire System State dashboard to actual data
  5. P2: Add coordination type classification
  6. P2: Connect ChromaDB for semantic search on processed documents

This order ensures UAT flows work end-to-end first (calendar buttons), then adds observability (logging), then persistence (Event Graph), then completeness (coordination + search).