🎯 Icarus Phase 5b — Memory System Port (Option A)
Director: Matt (Veto Active: Option B, Premature Abstraction)
Decision: Copy, Decouple, Sever
Architectural Principle: Icarus is sovereign. Zero cross-domain dependencies.
📋 Scope
Port classification + memory + validation loop from costco_route/ to icarus/core/.
Source Files (Copy ONLY, do not import):
- costco_route/router.py → icarus/core/router.py (adapted)
- costco_route/item_memory.py → icarus/core/memory/engine.py
- costco_route/pipeline.py validation logic → icarus/core/validation/sanity.py
Constraint: No import costco_route.* anywhere in Icarus. Blast radius = 0.
Component 1: Router Adaptation
File: icarus/core/router.py (NEW)
"""Family document router — sorts classified documents by confidence/priority."""
from typing import Optional
from icarus.core.config.staging import FAMILY_MEMBERS
# Family member "zones" — priority order for routing
MEMBER_ORDER = ["sully", "harper", "aundrea", "matt", "family"]
def generate_routing(classified: dict[str, list[dict]],
learned_overrides: Optional[dict] = None) -> list[dict]:
"""Generate prioritized routing for documents.
Args:
classified: {member_id: [documents]}
learned_overrides: Override from ChromaDB memory
Returns:
List of member groups in priority order, each with docs.
"""
if learned_overrides:
classified = _apply_overrides(classified, learned_overrides)
route = []
for member_id in MEMBER_ORDER:
docs = classified.get(member_id, [])
if not docs:
continue
member_info = FAMILY_MEMBERS.get(member_id, {"name": member_id})
route.append({
"member_id": member_id,
"member_name": member_info["name"],
"documents": docs,
"priority": _calculate_priority(docs)
})
return route
def _apply_overrides(classified: dict, overrides: dict) -> dict:
"""Apply learned overrides from memory."""
# Similar logic to costco router, but for family members
pass # Implement based on item_memory.py pattern
def _calculate_priority(docs: list[dict]) -> float:
"""Calculate priority score for a member's documents."""
# Based on urgency, deadline proximity, etc.
pass
Key Changes from Costco:
- ZONE_ORDER → MEMBER_ORDER
- Zone IDs → Member IDs
- "items" → "documents"
- format_route() → format_briefing_queue()
Component 2: Memory Engine
File: icarus/core/memory/engine.py (NEW)
"""ChromaDB-based learned pattern memory for family document routing.
Remembers document → family member assignments after user calibration.
Overrides LLM classification on future documents.
Adapted from costco_route/item_memory.py — decoupled, renamed, Icarus-specific.
"""
import chromadb
from icarus.core.config.staging import CHROMA_DB_PATH
from icarus.core.embeddings import get_embedding
CHROMA_COLLECTION = "family_assignments"
def _get_collection():
"""Get or create ChromaDB collection."""
client = chromadb.PersistentClient(path=str(CHROMA_DB_PATH))
return client.get_or_create_collection(
name=CHROMA_COLLECTION,
metadata={"hnsw:space": "cosine"},
)
def learn_assignment(document_hash: str, member_id: str,
pattern: str, notes: str = ""):
"""Save a learned family assignment.
Args:
document_hash: Unique hash of document content
member_id: Family member ID (sully, harper, etc.)
pattern: Extracted pattern that triggered assignment
notes: Optional notes
"""
col = _get_collection()
emb = get_embedding(pattern)
col.upsert(
ids=[document_hash],
embeddings=[emb],
documents=[pattern],
metadatas=[{
"member_id": member_id,
"notes": notes,
"source": "user_calibration"
}],
)
def lookup_assignment(pattern: str, threshold: float = 0.85) -> Optional[dict]:
"""Check if we have a learned assignment for a pattern.
Args:
pattern: Document pattern to look up
threshold: Cosine similarity threshold
Returns:
Dict with member_id, notes, similarity — or None
"""
col = _get_collection()
if col.count() == 0:
return None
emb = get_embedding(pattern)
results = col.query(
query_embeddings=[emb],
n_results=3,
include=["documents", "metadatas", "distances"],
)
if not results["ids"] or not results["ids"][0]:
return None
# Process results (similar to costco_route logic)
distance = results["distances"][0][0]
similarity = 1.0 - (distance / 2.0)
if similarity >= threshold:
return {
"pattern": results["documents"][0][0],
"member_id": results["metadatas"][0][0]["member_id"],
"notes": results["metadatas"][0][0].get("notes", ""),
"similarity": round(similarity, 3),
}
return None
def stats() -> dict:
"""Return memory statistics."""
col = _get_collection()
return {
"total_learned": col.count(),
"collection": CHROMA_COLLECTION,
"path": str(CHROMA_DB_PATH),
}
Key Changes from Costco:
- learn_item() → learn_assignment()
- item → document_hash + pattern
- zone → member_id
- Removed Instacart seed data logic
- Icarus-specific metadata
Component 3: Validation Logic
File: icarus/core/validation/sanity.py (NEW)
"""Sanity checks for LLM classification — prevent hallucinations and drops.
Adapted from costco_route/pipeline.py validation logic.
"""
import difflib
from typing import Optional
def validate_classification(
classified: dict[str, list[dict]],
original_documents: list[dict]
) -> dict[str, list[dict]]:
"""Remove hallucinated classifications, recover dropped documents.
The LLM sometimes:
1. Assigns documents not in original list (hallucination)
2. Drops documents from original list
Args:
classified: {member_id: [documents]}
original_documents: Original document list
Returns:
Validated classification dict
"""
# Build lookup of original docs by content hash
original_hashes = {doc["content_hash"]: doc for doc in original_documents}
matched_hashes = set()
# Filter out hallucinations
validated = {}
for member_id, docs in classified.items():
kept = []
for doc in docs:
doc_hash = doc.get("content_hash")
if doc_hash in original_hashes:
matched_hashes.add(doc_hash)
kept.append(doc)
if kept:
validated[member_id] = kept
# Find dropped documents
dropped = [doc for h, doc in original_hashes.items()
if h not in matched_hashes]
if dropped:
# Assign to default member based on keywords
for doc in dropped:
member = _infer_member_from_content(doc)
validated.setdefault(member, []).append(doc)
return validated
def _infer_member_from_content(doc: dict) -> str:
"""Infer family member from document content (keyword fallback)."""
content = doc.get("content", "").lower()
KEYWORDS = {
"sully": ["first grade", "mrs. smith", "sullivan", "dinosaur", "space"],
"harper": ["pre-k", "ms. johnson", "harper", "unicorn", "dance"],
"aundrea": ["hospital", "work", "night shift"],
"matt": ["software", "meeting", "work"],
}
for member, keywords in KEYWORDS.items():
if any(kw in content for kw in keywords):
return member
return "family" # Default
Key Changes from Costco:
- Item name matching → Document hash matching
- Zone defaults → Member keyword inference
- Costco-specific keywords → Family-specific keywords
Component 4: SQLite Integration (Ingress Spooler)
File: icarus/core/memory/replay.py (NEW)
"""Replay test payloads through memory loops for debugging.
Hooks into SQLite Ingress Spooler for test replay.
"""
import sqlite3
from pathlib import Path
from typing import Optional
from icarus.core.memory.engine import lookup_assignment, learn_assignment
def replay_payload(payload_id: str, spooler_db: Path) -> dict:
"""Replay a payload through the memory system.
Args:
payload_id: ID in spooler database
spooler_db: Path to SQLite spooler database
Returns:
Replay results with before/after comparison
"""
conn = sqlite3.connect(spooler_db)
cursor = conn.cursor()
# Fetch payload
cursor.execute(
"SELECT content, extracted_pattern, assigned_member FROM spooler WHERE id = ?",
(payload_id,)
)
row = cursor.fetchone()
if not row:
return {"error": f"Payload {payload_id} not found"}
content, pattern, original_assignment = row
# Check memory BEFORE
memory_before = lookup_assignment(pattern)
# Run through assignment logic
# ... (call to classifier) ...
# Simulate user correction if needed
# learn_assignment(...)
# Check memory AFTER
memory_after = lookup_assignment(pattern)
return {
"payload_id": payload_id,
"pattern": pattern,
"original_assignment": original_assignment,
"memory_before": memory_before,
"memory_after": memory_after,
"spooler_db": str(spooler_db),
}
def spool_test_payload(content: str, pattern: str,
spooler_db: Path) -> str:
"""Add a test payload to the spooler for replay.
Args:
content: Document content
pattern: Extracted pattern for matching
spooler_db: Path to SQLite database
Returns:
payload_id for replay
"""
# Implementation depends on spooler schema
pass
🎯 Success Criteria
- ✅
icarus/core/router.py— Family document routing (no Costco refs) - ✅
icarus/core/memory/engine.py— ChromaDB learning (Icarus-specific) - ✅
icarus/core/validation/sanity.py— Hallucination prevention - ✅
icarus/core/memory/replay.py— SQLite spooler integration - ✅ Zero imports from
costco_route/ - ✅ All files reference Icarus data structures only
Validation Commands
# Verify no cross-domain imports
grep -r "from costco_route" icarus/core/ || echo "✅ Clean"
grep -r "import costco_route" icarus/core/ || echo "✅ Clean"
# Verify SQLite integration
curl https://icarus-test.hoffdesk.com/memory/replay/<payload_id>
# Test learning loop
curl -X POST https://icarus-test.hoffdesk.com/memory/learn \
-H "Content-Type: application/json" \
-d '{"pattern": "field trip", "member_id": "sully"}'
📚 Files Created
| File | Purpose |
|---|---|
icarus/core/router.py |
Family document prioritization |
icarus/core/memory/engine.py |
ChromaDB learning + recall |
icarus/core/memory/replay.py |
SQLite spooler integration |
icarus/core/validation/sanity.py |
LLM output validation |
EXECUTE. Copy, Decouple, Sever. Zero blast radius.
Questions: @mention Wadsworth in The Hoffmann Board