📄 icarus-phase-5b-memory-port.md 11,520 bytes Apr 26, 2026 📋 Raw

🎯 Icarus Phase 5b — Memory System Port (Option A)

Director: Matt (Veto Active: Option B, Premature Abstraction)
Decision: Copy, Decouple, Sever
Architectural Principle: Icarus is sovereign. Zero cross-domain dependencies.


📋 Scope

Port classification + memory + validation loop from costco_route/ to icarus/core/.

Source Files (Copy ONLY, do not import):
- costco_route/router.pyicarus/core/router.py (adapted)
- costco_route/item_memory.pyicarus/core/memory/engine.py
- costco_route/pipeline.py validation logic → icarus/core/validation/sanity.py

Constraint: No import costco_route.* anywhere in Icarus. Blast radius = 0.


Component 1: Router Adaptation

File: icarus/core/router.py (NEW)

"""Family document router — sorts classified documents by confidence/priority."""

from typing import Optional
from icarus.core.config.staging import FAMILY_MEMBERS

# Family member "zones" — priority order for routing
MEMBER_ORDER = ["sully", "harper", "aundrea", "matt", "family"]

def generate_routing(classified: dict[str, list[dict]], 
                     learned_overrides: Optional[dict] = None) -> list[dict]:
    """Generate prioritized routing for documents.

    Args:
        classified: {member_id: [documents]}
        learned_overrides: Override from ChromaDB memory

    Returns:
        List of member groups in priority order, each with docs.
    """
    if learned_overrides:
        classified = _apply_overrides(classified, learned_overrides)

    route = []
    for member_id in MEMBER_ORDER:
        docs = classified.get(member_id, [])
        if not docs:
            continue

        member_info = FAMILY_MEMBERS.get(member_id, {"name": member_id})
        route.append({
            "member_id": member_id,
            "member_name": member_info["name"],
            "documents": docs,
            "priority": _calculate_priority(docs)
        })

    return route


def _apply_overrides(classified: dict, overrides: dict) -> dict:
    """Apply learned overrides from memory."""
    # Similar logic to costco router, but for family members
    pass  # Implement based on item_memory.py pattern


def _calculate_priority(docs: list[dict]) -> float:
    """Calculate priority score for a member's documents."""
    # Based on urgency, deadline proximity, etc.
    pass

Key Changes from Costco:
- ZONE_ORDERMEMBER_ORDER
- Zone IDs → Member IDs
- "items" → "documents"
- format_route()format_briefing_queue()


Component 2: Memory Engine

File: icarus/core/memory/engine.py (NEW)

"""ChromaDB-based learned pattern memory for family document routing.

Remembers document → family member assignments after user calibration.
Overrides LLM classification on future documents.

Adapted from costco_route/item_memory.py — decoupled, renamed, Icarus-specific.
"""

import chromadb
from icarus.core.config.staging import CHROMA_DB_PATH
from icarus.core.embeddings import get_embedding

CHROMA_COLLECTION = "family_assignments"


def _get_collection():
    """Get or create ChromaDB collection."""
    client = chromadb.PersistentClient(path=str(CHROMA_DB_PATH))
    return client.get_or_create_collection(
        name=CHROMA_COLLECTION,
        metadata={"hnsw:space": "cosine"},
    )


def learn_assignment(document_hash: str, member_id: str, 
                     pattern: str, notes: str = ""):
    """Save a learned family assignment.

    Args:
        document_hash: Unique hash of document content
        member_id: Family member ID (sully, harper, etc.)
        pattern: Extracted pattern that triggered assignment
        notes: Optional notes
    """
    col = _get_collection()
    emb = get_embedding(pattern)

    col.upsert(
        ids=[document_hash],
        embeddings=[emb],
        documents=[pattern],
        metadatas=[{
            "member_id": member_id,
            "notes": notes,
            "source": "user_calibration"
        }],
    )


def lookup_assignment(pattern: str, threshold: float = 0.85) -> Optional[dict]:
    """Check if we have a learned assignment for a pattern.

    Args:
        pattern: Document pattern to look up
        threshold: Cosine similarity threshold

    Returns:
        Dict with member_id, notes, similarity — or None
    """
    col = _get_collection()
    if col.count() == 0:
        return None

    emb = get_embedding(pattern)
    results = col.query(
        query_embeddings=[emb],
        n_results=3,
        include=["documents", "metadatas", "distances"],
    )

    if not results["ids"] or not results["ids"][0]:
        return None

    # Process results (similar to costco_route logic)
    distance = results["distances"][0][0]
    similarity = 1.0 - (distance / 2.0)

    if similarity >= threshold:
        return {
            "pattern": results["documents"][0][0],
            "member_id": results["metadatas"][0][0]["member_id"],
            "notes": results["metadatas"][0][0].get("notes", ""),
            "similarity": round(similarity, 3),
        }

    return None


def stats() -> dict:
    """Return memory statistics."""
    col = _get_collection()
    return {
        "total_learned": col.count(),
        "collection": CHROMA_COLLECTION,
        "path": str(CHROMA_DB_PATH),
    }

Key Changes from Costco:
- learn_item()learn_assignment()
- itemdocument_hash + pattern
- zonemember_id
- Removed Instacart seed data logic
- Icarus-specific metadata


Component 3: Validation Logic

File: icarus/core/validation/sanity.py (NEW)

"""Sanity checks for LLM classification — prevent hallucinations and drops.

Adapted from costco_route/pipeline.py validation logic.
"""

import difflib
from typing import Optional


def validate_classification(
    classified: dict[str, list[dict]], 
    original_documents: list[dict]
) -> dict[str, list[dict]]:
    """Remove hallucinated classifications, recover dropped documents.

    The LLM sometimes:
    1. Assigns documents not in original list (hallucination)
    2. Drops documents from original list

    Args:
        classified: {member_id: [documents]}
        original_documents: Original document list

    Returns:
        Validated classification dict
    """
    # Build lookup of original docs by content hash
    original_hashes = {doc["content_hash"]: doc for doc in original_documents}
    matched_hashes = set()

    # Filter out hallucinations
    validated = {}
    for member_id, docs in classified.items():
        kept = []
        for doc in docs:
            doc_hash = doc.get("content_hash")
            if doc_hash in original_hashes:
                matched_hashes.add(doc_hash)
                kept.append(doc)
        if kept:
            validated[member_id] = kept

    # Find dropped documents
    dropped = [doc for h, doc in original_hashes.items() 
               if h not in matched_hashes]

    if dropped:
        # Assign to default member based on keywords
        for doc in dropped:
            member = _infer_member_from_content(doc)
            validated.setdefault(member, []).append(doc)

    return validated


def _infer_member_from_content(doc: dict) -> str:
    """Infer family member from document content (keyword fallback)."""
    content = doc.get("content", "").lower()

    KEYWORDS = {
        "sully": ["first grade", "mrs. smith", "sullivan", "dinosaur", "space"],
        "harper": ["pre-k", "ms. johnson", "harper", "unicorn", "dance"],
        "aundrea": ["hospital", "work", "night shift"],
        "matt": ["software", "meeting", "work"],
    }

    for member, keywords in KEYWORDS.items():
        if any(kw in content for kw in keywords):
            return member

    return "family"  # Default

Key Changes from Costco:
- Item name matching → Document hash matching
- Zone defaults → Member keyword inference
- Costco-specific keywords → Family-specific keywords


Component 4: SQLite Integration (Ingress Spooler)

File: icarus/core/memory/replay.py (NEW)

"""Replay test payloads through memory loops for debugging.

Hooks into SQLite Ingress Spooler for test replay.
"""

import sqlite3
from pathlib import Path
from typing import Optional

from icarus.core.memory.engine import lookup_assignment, learn_assignment


def replay_payload(payload_id: str, spooler_db: Path) -> dict:
    """Replay a payload through the memory system.

    Args:
        payload_id: ID in spooler database
        spooler_db: Path to SQLite spooler database

    Returns:
        Replay results with before/after comparison
    """
    conn = sqlite3.connect(spooler_db)
    cursor = conn.cursor()

    # Fetch payload
    cursor.execute(
        "SELECT content, extracted_pattern, assigned_member FROM spooler WHERE id = ?",
        (payload_id,)
    )
    row = cursor.fetchone()

    if not row:
        return {"error": f"Payload {payload_id} not found"}

    content, pattern, original_assignment = row

    # Check memory BEFORE
    memory_before = lookup_assignment(pattern)

    # Run through assignment logic
    # ... (call to classifier) ...

    # Simulate user correction if needed
    # learn_assignment(...)

    # Check memory AFTER
    memory_after = lookup_assignment(pattern)

    return {
        "payload_id": payload_id,
        "pattern": pattern,
        "original_assignment": original_assignment,
        "memory_before": memory_before,
        "memory_after": memory_after,
        "spooler_db": str(spooler_db),
    }


def spool_test_payload(content: str, pattern: str, 
                     spooler_db: Path) -> str:
    """Add a test payload to the spooler for replay.

    Args:
        content: Document content
        pattern: Extracted pattern for matching
        spooler_db: Path to SQLite database

    Returns:
        payload_id for replay
    """
    # Implementation depends on spooler schema
    pass

🎯 Success Criteria

  1. icarus/core/router.py — Family document routing (no Costco refs)
  2. icarus/core/memory/engine.py — ChromaDB learning (Icarus-specific)
  3. icarus/core/validation/sanity.py — Hallucination prevention
  4. icarus/core/memory/replay.py — SQLite spooler integration
  5. ✅ Zero imports from costco_route/
  6. ✅ All files reference Icarus data structures only

Validation Commands

# Verify no cross-domain imports
grep -r "from costco_route" icarus/core/ || echo "✅ Clean"
grep -r "import costco_route" icarus/core/ || echo "✅ Clean"

# Verify SQLite integration
curl https://icarus-test.hoffdesk.com/memory/replay/<payload_id>

# Test learning loop
curl -X POST https://icarus-test.hoffdesk.com/memory/learn \
  -H "Content-Type: application/json" \
  -d '{"pattern": "field trip", "member_id": "sully"}'

📚 Files Created

File Purpose
icarus/core/router.py Family document prioritization
icarus/core/memory/engine.py ChromaDB learning + recall
icarus/core/memory/replay.py SQLite spooler integration
icarus/core/validation/sanity.py LLM output validation

EXECUTE. Copy, Decouple, Sever. Zero blast radius.


Questions: @mention Wadsworth in The Hoffmann Board