# 🎯 Icarus Phase 5b: Memory System Port (Option A)

**Director:** Matt (Veto Active: Option B, Premature Abstraction)
**Decision:** Copy, Decouple, Sever
**Architectural Principle:** Icarus is sovereign. Zero cross-domain dependencies.

---

## 📋 Scope

Port classification + memory + validation loop from `costco_route/` to `icarus/core/`.

**Source Files (Copy ONLY, do not import):**
- `costco_route/router.py` → `icarus/core/router.py` (adapted)
- `costco_route/item_memory.py` → `icarus/core/memory/engine.py`
- `costco_route/pipeline.py` validation logic → `icarus/core/validation/sanity.py`

**Constraint:** No `import costco_route.*` anywhere in Icarus. Blast radius = 0.

---

## Component 1: Router Adaptation

**File:** `icarus/core/router.py` (NEW)

```python
"""Family document router: sorts classified documents by confidence/priority."""

from typing import Optional

from icarus.core.config.staging import FAMILY_MEMBERS

# Family member "zones": priority order for routing
MEMBER_ORDER = ["sully", "harper", "aundrea", "matt", "family"]


def generate_routing(
    classified: dict[str, list[dict]],
    learned_overrides: Optional[dict] = None,
) -> list[dict]:
    """Generate prioritized routing for documents.

    Args:
        classified: {member_id: [documents]}
        learned_overrides: Override from ChromaDB memory

    Returns:
        List of member groups in priority order, each with docs.
    """
    if learned_overrides:
        classified = _apply_overrides(classified, learned_overrides)

    route = []
    for member_id in MEMBER_ORDER:
        docs = classified.get(member_id, [])
        if not docs:
            continue

        member_info = FAMILY_MEMBERS.get(member_id, {"name": member_id})
        route.append({
            "member_id": member_id,
            "member_name": member_info["name"],
            "documents": docs,
            "priority": _calculate_priority(docs),
        })

    return route


def _apply_overrides(classified: dict, overrides: dict) -> dict:
    """Apply learned overrides from memory."""
    # Similar logic to costco router, but for family members.
    # TODO: implement based on the item_memory.py pattern. Until then the
    # classification passes through unchanged; a bare `pass` would return
    # None and break generate_routing().
    return classified


def _calculate_priority(docs: list[dict]) -> float:
    """Calculate priority score for a member's documents."""
    # Based on urgency, deadline proximity, etc.
    # TODO: placeholder score until the scoring rules are defined.
    return 0.0
```

**Key Changes from Costco:**
- `ZONE_ORDER` → `MEMBER_ORDER`
- Zone IDs → Member IDs
- "items" → "documents"
- `format_route()` → `format_briefing_queue()`

---

## Component 2: Memory Engine

**File:** `icarus/core/memory/engine.py` (NEW)

```python
"""ChromaDB-based learned pattern memory for family document routing.

Remembers document → family member assignments after user calibration.
Overrides LLM classification on future documents.

Adapted from costco_route/item_memory.py: decoupled, renamed, Icarus-specific.
"""

from typing import Optional  # needed for lookup_assignment()'s return type

import chromadb

from icarus.core.config.staging import CHROMA_DB_PATH
from icarus.core.embeddings import get_embedding

CHROMA_COLLECTION = "family_assignments"


def _get_collection():
    """Get or create ChromaDB collection."""
    client = chromadb.PersistentClient(path=str(CHROMA_DB_PATH))
    return client.get_or_create_collection(
        name=CHROMA_COLLECTION,
        metadata={"hnsw:space": "cosine"},
    )


def learn_assignment(document_hash: str, member_id: str, pattern: str, notes: str = ""):
    """Save a learned family assignment.

    Args:
        document_hash: Unique hash of document content
        member_id: Family member ID (sully, harper, etc.)
        pattern: Extracted pattern that triggered assignment
        notes: Optional notes
    """
    col = _get_collection()
    emb = get_embedding(pattern)
    col.upsert(
        ids=[document_hash],
        embeddings=[emb],
        documents=[pattern],
        metadatas=[{
            "member_id": member_id,
            "notes": notes,
            "source": "user_calibration",
        }],
    )


def lookup_assignment(pattern: str, threshold: float = 0.85) -> Optional[dict]:
    """Check if we have a learned assignment for a pattern.

    Args:
        pattern: Document pattern to look up
        threshold: Minimum similarity score in [0, 1], derived from cosine distance

    Returns:
        Dict with pattern, member_id, notes, similarity, or None
    """
    col = _get_collection()
    if col.count() == 0:
        return None

    emb = get_embedding(pattern)
    results = col.query(
        query_embeddings=[emb],
        n_results=3,
        include=["documents", "metadatas", "distances"],
    )

    if not results["ids"] or not results["ids"][0]:
        return None

    # Process results (similar to costco_route logic).
    # Results come back nearest first, so index 0 is the best match.
    # Cosine distance lies in [0, 2]; rescale it to a [0, 1] similarity score.
    distance = results["distances"][0][0]
    similarity = 1.0 - (distance / 2.0)

    if similarity >= threshold:
        return {
            "pattern": results["documents"][0][0],
            "member_id": results["metadatas"][0][0]["member_id"],
            "notes": results["metadatas"][0][0].get("notes", ""),
            "similarity": round(similarity, 3),
        }
    return None


def stats() -> dict:
    """Return memory statistics."""
    col = _get_collection()
    return {
        "total_learned": col.count(),
        "collection": CHROMA_COLLECTION,
        "path": str(CHROMA_DB_PATH),
    }
```

**Key Changes from Costco:**
- `learn_item()` → `learn_assignment()`
- `item` → `document_hash` + `pattern`
- `zone` → `member_id`
- Removed Instacart seed data logic
- Icarus-specific metadata

---

## Component 3: Validation Logic

**File:** `icarus/core/validation/sanity.py` (NEW)

```python
"""Sanity checks for LLM classification: prevent hallucinations and drops.

Adapted from costco_route/pipeline.py validation logic.
"""


def validate_classification(
    classified: dict[str, list[dict]],
    original_documents: list[dict],
) -> dict[str, list[dict]]:
    """Remove hallucinated classifications, recover dropped documents.

    The LLM sometimes:
    1. Assigns documents not in original list (hallucination)
    2. Drops documents from original list

    Args:
        classified: {member_id: [documents]}
        original_documents: Original document list

    Returns:
        Validated classification dict
    """
    # Build lookup of original docs by content hash
    original_hashes = {doc["content_hash"]: doc for doc in original_documents}
    matched_hashes = set()

    # Filter out hallucinations
    validated = {}
    for member_id, docs in classified.items():
        kept = []
        for doc in docs:
            doc_hash = doc.get("content_hash")
            if doc_hash in original_hashes:
                matched_hashes.add(doc_hash)
                kept.append(doc)
        if kept:
            validated[member_id] = kept

    # Find dropped documents
    dropped = [doc for h, doc in original_hashes.items() if h not in matched_hashes]
    # Assign each dropped document to a member based on keywords
    for doc in dropped:
        member = _infer_member_from_content(doc)
        validated.setdefault(member, []).append(doc)

    return validated


def _infer_member_from_content(doc: dict) -> str:
    """Infer family member from document content (keyword fallback)."""
    content = doc.get("content", "").lower()

    # NOTE: "work" is listed under both aundrea and matt. Dicts iterate in
    # insertion order, so aundrea always wins that match.
    KEYWORDS = {
        "sully": ["first grade", "mrs. smith", "sullivan", "dinosaur", "space"],
        "harper": ["pre-k", "ms. johnson", "harper", "unicorn", "dance"],
        "aundrea": ["hospital", "work", "night shift"],
        "matt": ["software", "meeting", "work"],
    }

    for member, keywords in KEYWORDS.items():
        if any(kw in content for kw in keywords):
            return member

    return "family"  # Default
```

**Key Changes from Costco:**
- Item name matching → Document hash matching
- Zone defaults → Member keyword inference
- Costco-specific keywords → Family-specific keywords

---

## Component 4: SQLite Integration (Ingress Spooler)

**File:** `icarus/core/memory/replay.py` (NEW)

```python
"""Replay test payloads through memory loops for debugging.

Hooks into SQLite Ingress Spooler for test replay.
"""

import sqlite3
from contextlib import closing
from pathlib import Path

from icarus.core.memory.engine import lookup_assignment, learn_assignment


def replay_payload(payload_id: str, spooler_db: Path) -> dict:
    """Replay a payload through the memory system.

    Args:
        payload_id: ID in spooler database
        spooler_db: Path to SQLite spooler database

    Returns:
        Replay results with before/after comparison
    """
    # closing() releases the connection on every path, including the
    # early "not found" return below.
    with closing(sqlite3.connect(spooler_db)) as conn:
        # Fetch payload
        row = conn.execute(
            "SELECT content, extracted_pattern, assigned_member FROM spooler WHERE id = ?",
            (payload_id,),
        ).fetchone()

    if not row:
        return {"error": f"Payload {payload_id} not found"}

    content, pattern, original_assignment = row

    # Check memory BEFORE
    memory_before = lookup_assignment(pattern)

    # Run through assignment logic
    # ... (call to classifier) ...

    # Simulate user correction if needed
    # learn_assignment(...)

    # Check memory AFTER
    memory_after = lookup_assignment(pattern)

    return {
        "payload_id": payload_id,
        "pattern": pattern,
        "original_assignment": original_assignment,
        "memory_before": memory_before,
        "memory_after": memory_after,
        "spooler_db": str(spooler_db),
    }


def spool_test_payload(content: str, pattern: str, spooler_db: Path) -> str:
    """Add a test payload to the spooler for replay.

    Args:
        content: Document content
        pattern: Extracted pattern for matching
        spooler_db: Path to SQLite database

    Returns:
        payload_id for replay
    """
    # Implementation depends on spooler schema. Raise instead of silently
    # returning None from a function annotated to return str.
    raise NotImplementedError("spool_test_payload depends on the spooler schema")
```

---

## 🎯 Success Criteria

1. ✅ `icarus/core/router.py`: Family document routing (no Costco refs)
2. ✅ `icarus/core/memory/engine.py`: ChromaDB learning (Icarus-specific)
3. ✅ `icarus/core/validation/sanity.py`: Hallucination prevention
4. ✅ `icarus/core/memory/replay.py`: SQLite spooler integration
5. ✅ Zero imports from `costco_route/`
6. ✅ All files reference Icarus data structures only

---

## Validation Commands

```bash
# Verify no cross-domain imports.
# The pattern is anchored to the start of the line so that docstrings such as
# "Adapted from costco_route/item_memory.py" do not count as imports.
grep -rnE "^[[:space:]]*(from|import)[[:space:]]+costco_route" icarus/core/ || echo "✅ Clean"

# Verify SQLite integration
curl https://icarus-test.hoffdesk.com/memory/replay/

# Test learning loop
curl -X POST https://icarus-test.hoffdesk.com/memory/learn \
  -H "Content-Type: application/json" \
  -d '{"pattern": "field trip", "member_id": "sully"}'
```

---

## 📚 Files Created

| File | Purpose |
|------|---------|
| `icarus/core/router.py` | Family document prioritization |
| `icarus/core/memory/engine.py` | ChromaDB learning + recall |
| `icarus/core/memory/replay.py` | SQLite spooler integration |
| `icarus/core/validation/sanity.py` | LLM output validation |

---

**EXECUTE. Copy, Decouple, Sever. Zero blast radius.**

---

*Questions: @mention Wadsworth in The Hoffmann Board*
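
**Optional: structural import check.** The zero-import criterion can also be checked by parsing the code rather than searching its text. The sketch below is one possible approach and is not part of the original plan. It uses Python's standard `ast` module, so docstrings and comments that merely mention `costco_route` are ignored. The names `forbidden_imports` and `scan_tree` are hypothetical, and dynamic imports (for example via `importlib`) would not be detected.

```python
import ast
from pathlib import Path

FORBIDDEN = "costco_route"  # package that Icarus must never import


def forbidden_imports(source: str, forbidden: str = FORBIDDEN) -> list[tuple[int, str]]:
    """Return (line, module) for each absolute import of `forbidden` or a submodule."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import a, b.c` carries one alias per imported module
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            # level == 0 means an absolute `from x import y`
            names = [node.module or ""]
        else:
            continue
        for name in names:
            if name == forbidden or name.startswith(forbidden + "."):
                hits.append((node.lineno, name))
    return sorted(hits)


def scan_tree(root: Path) -> dict[str, list[tuple[int, str]]]:
    """Scan every .py file under `root`; keys are the files with violations."""
    report = {}
    for path in sorted(root.rglob("*.py")):
        hits = forbidden_imports(path.read_text(encoding="utf-8"))
        if hits:
            report[str(path)] = hits
    return report
```

Calling `scan_tree(Path("icarus/core"))` returns an empty dict when the tree is clean, which makes it easy to wrap in a test or a pre-commit hook.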