# 🎯 Icarus Phase 3: Vision Pipeline & Briefing Cards

**To:** Socrates 🧠
**From:** Matt (Director) via Wadsworth 📋
**Date:** 2026-04-25
**Priority:** P1 (execute immediately)
**Status:** Infrastructure ready, Phase 2 complete ✅

---

## MISSION

Build the **Vision Pipeline** for Icarus: email attachments (PDFs, images) → structured briefing cards.

This is the core value proposition: automated contextual briefing from unstructured documents.

---

## MODEL STRATEGY (Confirmed)

| Task | Model | Endpoint |
|------|-------|----------|
| **Vision parsing** | qwen3-vl (or qwen2.5-vl) | Gaming PC via Ollama |
| **Text extraction/JSON** | 8B class (llama3.1:8b, qwen2.5:7b) | Gaming PC via Ollama |
| **Fast filtering** | 3B class (llama3.2:3b) | Gaming PC via Ollama |

**Target hardware:** M4 Mac mini 16GB (simulated on 3080 Ti for staging)
**Ollama endpoint:** `http://matt-pc.tail864e81.ts.net:11434`

---

## PHASE 3 COMPONENTS

### Component 1: Vision Service (`icarus/core/vision/`)

**Create:** `icarus/core/vision/parser.py`

```python
"""Vision document parser: PDF/image → structured text."""
from pathlib import Path
import base64

import httpx

from icarus.core.config.staging import OLLAMA_BASE_URL
from icarus.core.utils.model_gate import validate_ollama_request

VISION_MODEL = "qwen3-vl"  # or qwen2.5-vl if 3 unavailable

# Assumption: vision inference can take far longer than httpx's 5 s default timeout.
REQUEST_TIMEOUT_S = 120.0


async def parse_document(file_path: Path) -> dict:
    """
    Parse a PDF or image document via vision model.

    Returns:
        {
            "text": "extracted text content",
            "structure": "layout description",
            "confidence": 0.95,
            "pages": 1
        }
    """
    validate_ollama_request(VISION_MODEL)

    # Convert file to base64 for vision model.
    # NOTE: Ollama's `images` field expects image bytes (e.g. PNG/JPEG);
    # PDFs will likely need to be rasterized to images before this step.
    content = file_path.read_bytes()
    b64_content = base64.b64encode(content).decode()

    # Call Ollama vision endpoint
    async with httpx.AsyncClient(timeout=REQUEST_TIMEOUT_S) as client:
        response = await client.post(
            f"{OLLAMA_BASE_URL}/api/chat",
            json={
                "model": VISION_MODEL,
                "messages": [{
                    "role": "user",
                    "content": (
                        "Extract all text from this document. Preserve structure. "
                        "Return as plain text."
                    ),
                    "images": [b64_content],
                }],
                "stream": False,
            },
        )
        response.raise_for_status()
        result = response.json()

    return {
        "text": result["message"]["content"],
        "structure": "document",
        "confidence": 0.9,  # Placeholder: estimate or parse from model
        "pages": 1,  # Placeholder: count pages if PDF
    }
```

**Test document types:**
- Field trip forms (PDF)
- Doctor appointment cards (image)
- School newsletters (PDF)
- Handwritten notes (image)

---

### Component 2: Briefing Card Generator (`icarus/core/briefing/`)

**Create:** `icarus/core/briefing/generator.py`

```python
"""Generate contextual briefing cards from parsed documents."""
import json

import httpx

# OLLAMA_BASE_URL is needed for the request below (the import was missing).
from icarus.core.config.staging import OLLAMA_BASE_URL, OLLAMA_PRIMARY_MODEL
from icarus.core.utils.model_gate import validate_ollama_request

# Assumption: generation can take longer than httpx's 5 s default timeout.
REQUEST_TIMEOUT_S = 120.0


async def generate_briefing(parsed_doc: dict, context: dict) -> dict:
    """
    Generate a briefing card from parsed document + calendar context.

    Args:
        parsed_doc: Output from vision parser
        context: {
            "calendar_events": [...],  # Conflicts, nearby events
            "family_members": [...],   # Who this affects
            "urgency": "high/medium/low"
        }

    Returns:
        {
            "title": "Field Trip: Museum of Science",
            "summary": "Sully's class trip on May 15...",
            "key_details": {
                "date": "2026-05-15",
                "time": "9:00 AM - 2:00 PM",
                "location": "Museum of Science",
                "cost": "$15",
                "action_required": "Permission slip + lunch"
            },
            "conflicts": ["Harper violin 4:00 PM"],
            "suggested_actions": [
                "Sign permission slip",
                "Pack lunch",
                "Set reminder for 8:30 AM departure"
            ],
            "confidence": 0.92
        }
    """
    validate_ollama_request(OLLAMA_PRIMARY_MODEL)

    prompt = f"""You are Icarus, a family context engine. Generate a briefing card from this document.

DOCUMENT TEXT:
{parsed_doc['text']}

CALENDAR CONTEXT:
{json.dumps(context['calendar_events'], indent=2)}

Create a briefing card with:
1. Clear title
2. One-paragraph summary
3. Key details (date, time, location, cost, requirements)
4. Any calendar conflicts
5. Suggested actions
6. Confidence score (0-1)

Return as JSON."""

    async with httpx.AsyncClient(timeout=REQUEST_TIMEOUT_S) as client:
        response = await client.post(
            f"{OLLAMA_BASE_URL}/api/chat",
            json={
                "model": OLLAMA_PRIMARY_MODEL,
                "messages": [{"role": "user", "content": prompt}],
                "format": "json",
                "stream": False,
            },
        )
        response.raise_for_status()
        result = response.json()

    # The model is asked for JSON; json.loads raises if it returns anything else.
    return json.loads(result["message"]["content"])
```

---

### Component 3: Email Attachment Pipeline

**Extend:** `icarus/core/email_fetcher.py` or create `icarus/core/vision/pipeline.py` (the API in Component 4 imports from `icarus.core.vision.pipeline`)

```python
"""End-to-end: Email attachment → briefing card."""
from pathlib import Path
import tempfile

from icarus.core.vision.parser import parse_document
from icarus.core.briefing.generator import generate_briefing
from icarus.core.calendar_sync import get_upcoming_events  # For context (see TODO below)

SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg"}


async def process_attachment(email_meta: dict, attachment: bytes, filename: str) -> dict:
    """
    Process an email attachment through the vision pipeline.

    Args:
        email_meta: {"from", "subject", "date", "to"}
        attachment: Raw bytes
        filename: Original filename

    Returns:
        Briefing card dict
    """
    # Save to temp file
    suffix = Path(filename).suffix.lower()
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(attachment)
        tmp_path = Path(tmp.name)

    try:
        # Step 1: Vision parse
        if suffix in SUPPORTED_SUFFIXES:
            parsed = await parse_document(tmp_path)
        else:
            raise ValueError(f"Unsupported file type: {suffix}")

        # Step 2: Gather context
        # TODO: Query calendar for conflicts around parsed date
        context = {
            "calendar_events": [],  # Populate from calendar_sync
            "family_members": infer_recipients(email_meta),
            "urgency": "medium",
        }

        # Step 3: Generate briefing
        briefing = await generate_briefing(parsed, context)
        return briefing
    finally:
        # Always remove the temp file, even if parsing or generation fails.
        tmp_path.unlink(missing_ok=True)


def infer_recipients(email_meta: dict) -> list[str]:
    """Infer which family members this email concerns."""
    # Simple keyword matching for MVP
    recipients = []
    text = f"{email_meta.get('subject', '')} {email_meta.get('to', '')}".lower()
    if 'sully' in text or 'sullivan' in text:
        recipients.append('Sullivan')
    if 'harper' in text:
        recipients.append('Harper')
    return recipients or ['Family']
```

---

### Component 4: API Endpoints

**Extend:** `icarus/core/api.py`

```python
from pathlib import Path
import tempfile

import httpx
from fastapi import FastAPI, File, UploadFile, HTTPException

from icarus.core.config.staging import OLLAMA_BASE_URL
from icarus.core.vision.parser import parse_document
from icarus.core.vision.pipeline import process_attachment
from icarus.core.briefing.generator import generate_briefing

app = FastAPI(title="Icarus", version="0.1.0")

# Existing health endpoint...


@app.post("/vision/parse")
async def vision_parse(file: UploadFile = File(...)):
    """Upload a document and get parsed text."""
    # file.filename can be None, so fall back to an empty name.
    suffix = Path(file.filename or "").suffix
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        content = await file.read()
        tmp.write(content)
        tmp_path = Path(tmp.name)

    try:
        result = await parse_document(tmp_path)
        return result
    finally:
        tmp_path.unlink(missing_ok=True)


@app.post("/vision/briefing")
async def vision_briefing(file: UploadFile = File(...)):
    """Upload a document and get a full briefing card."""
    content = await file.read()
    filename = file.filename or ""
    email_meta = {
        "from": "upload@icarus.local",
        "subject": filename,
        "date": "now",
        "to": "family",
    }
    try:
        briefing = await process_attachment(email_meta, content, filename)
    except ValueError as exc:
        # Unsupported file types surface as a 400 instead of a 500.
        raise HTTPException(status_code=400, detail=str(exc)) from exc
    return briefing


@app.get("/vision/status")
async def vision_status():
    """Check vision model availability."""
    try:
        async with httpx.AsyncClient() as client:
            response = await client.get(f"{OLLAMA_BASE_URL}/api/tags")
            response.raise_for_status()
            models = response.json().get("models", [])
            # Heuristic: any model whose name contains "vl" counts as a vision model.
            vision_ready = any("vl" in m["name"] for m in models)
            return {
                "ollama_ready": True,
                "vision_model_ready": vision_ready,
                "available_models": [m["name"] for m in models],
            }
    except Exception as e:
        return {"ollama_ready": False, "error": str(e)}
```

---

## VERIFICATION CHECKLIST

Before declaring Phase 3 complete:

```bash
# 1. Vision model available
curl https://icarus-test.hoffdesk.com/vision/status
# Expected: {"ollama_ready": true, "vision_model_ready": true, "available_models": [...]}

# 2. Parse endpoint works
curl -X POST -F "file=@test-receipt.pdf" \
  https://icarus-test.hoffdesk.com/vision/parse
# Expected: {"text": "...", "structure": "...", "confidence": 0.9, "pages": 1}

# 3. Briefing endpoint works
curl -X POST -F "file=@test-field-trip.pdf" \
  https://icarus-test.hoffdesk.com/vision/briefing
# Expected: Full briefing card JSON
```

---

## TEST DOCUMENTS

Create sample test files in `icarus/tests/fixtures/`:

1. **test-field-trip.pdf**: School field trip form
2. **test-appointment.jpg**: Doctor appointment card photo
3. **test-newsletter.pdf**: School newsletter snippet

---

## INTEGRATION WITH EXISTING PIPELINE

The vision pipeline plugs into the email worker:

```python
# In email_worker.py or webhook handler
if attachment.is_pdf_or_image():
    briefing = await process_attachment(email_meta, attachment.bytes, attachment.filename)

    # Send to Telegram
    await send_briefing_card(briefing, chat_id)

    # Optionally create calendar event if detected
    if briefing.get("key_details", {}).get("date"):
        await create_calendar_event_from_briefing(briefing)
```

---

## SUCCESS CRITERIA

**Phase 3 Complete When:**

1. ✅ Vision parser extracts text from PDF/image
2. ✅ Briefing generator creates structured cards
3. ✅ API endpoints respond correctly
4. ✅ End-to-end: Upload file → Get briefing card
5. ✅ Model gate enforces 8B/3B limits
6. ✅ All data paths use `DATA_DIR`

---

## BLOCKERS & ESCALATION

| Issue | Escalate To |
|-------|-------------|
| Vision model not available on Gaming PC | Wadsworth → Matt |
| JSON parsing failures | Wadsworth (prompt engineering) |
| Ollama connection issues | Wadsworth → Matt |
| Scope creep | Matt (Director) |

---

**Execute. Build the vision pipeline. Report daily progress.**

---

*Questions: @mention Wadsworth in The Hoffmann Board*
*Full spec: This document*
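
---

The blockers table lists JSON parsing failures as an escalation item, and the briefing generator returns whatever JSON the model emits without checking its shape. A minimal sketch of a defensive check is shown here, assuming the card fields match the example in the `generate_briefing` docstring. The names `validate_briefing_card` and `REQUIRED_KEYS` are hypothetical and not part of the existing codebase.

```python
"""Defensive validation for model-generated briefing cards (illustrative sketch)."""
import json

# Assumption: required keys mirror the example card in the generate_briefing docstring.
REQUIRED_KEYS = (
    "title",
    "summary",
    "key_details",
    "conflicts",
    "suggested_actions",
    "confidence",
)


def validate_briefing_card(raw: str) -> dict:
    """Parse model output and check it against the expected card shape.

    Raises ValueError with a readable message so the caller can retry
    or escalate instead of forwarding a malformed card.
    """
    try:
        card = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model output is not valid JSON: {exc}") from exc

    if not isinstance(card, dict):
        raise ValueError("Briefing card must be a JSON object")

    missing = [key for key in REQUIRED_KEYS if key not in card]
    if missing:
        raise ValueError(f"Briefing card is missing keys: {missing}")

    if not isinstance(card["key_details"], dict):
        raise ValueError("key_details must be an object")
    for list_key in ("conflicts", "suggested_actions"):
        if not isinstance(card[list_key], list):
            raise ValueError(f"{list_key} must be a list")

    # Clamp confidence into [0, 1] in case the model drifts outside the requested range.
    try:
        confidence = float(card["confidence"])
    except (TypeError, ValueError) as exc:
        raise ValueError("confidence must be numeric") from exc
    card["confidence"] = min(max(confidence, 0.0), 1.0)
    return card
```

One possible use: in `generate_briefing`, the final `json.loads(result["message"]["content"])` could be swapped for `validate_briefing_card(result["message"]["content"])`, so malformed cards fail with a clear error before they reach Telegram or the calendar.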