
🎯 Icarus Phase 3 — Vision Pipeline & Briefing Cards

To: Socrates 🧠
From: Matt (Director) via Wadsworth 📋
Date: 2026-04-25
Priority: P1 — Execute immediately
Status: Infrastructure ready, Phase 2 complete ✅


MISSION

Build the Vision Pipeline for Icarus — email attachments (PDFs, images) → structured briefing cards. This is the core value proposition: automated contextual briefing from unstructured documents.


MODEL STRATEGY (Confirmed)

| Task | Model | Endpoint |
|---|---|---|
| Vision parsing | qwen3-vl (fallback: qwen2.5vl) | Gaming PC via Ollama |
| Text extraction / JSON | 8B class (llama3.1:8b, qwen2.5:7b) | Gaming PC via Ollama |
| Fast filtering | 3B class (llama3.2:3b) | Gaming PC via Ollama |

Target hardware: M4 Mac mini 16GB (simulated on 3080 Ti for staging)

Ollama endpoint: http://matt-pc.tail864e81.ts.net:11434


PHASE 3 COMPONENTS

Component 1: Vision Service (icarus/core/vision/)

Create: icarus/core/vision/parser.py

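"""Vision document parser: PDF/image → structured text."""
from pathlib import Path
import base64

import fitz  # PyMuPDF: assumed new dependency (pip install pymupdf), renders PDF pages
import httpx

from icarus.core.config.staging import OLLAMA_BASE_URL
from icarus.core.utils.model_gate import validate_ollama_request

VISION_MODEL = "qwen3-vl"  # fall back to "qwen2.5vl" if qwen3-vl is not pulled
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}
PDF_RENDER_DPI = 150  # higher DPI helps small print but costs time and VRAM

# Vision calls can take far longer than httpx's 5-second default timeout.
OLLAMA_TIMEOUT = httpx.Timeout(180.0, connect=10.0)


def _load_page_images(file_path: Path) -> list[bytes]:
    """Return one image per page.

    Ollama's "images" field takes image data, not PDFs, so PDF pages are
    rendered to PNG first.
    """
    suffix = file_path.suffix.lower()
    if suffix in IMAGE_SUFFIXES:
        return [file_path.read_bytes()]
    if suffix == ".pdf":
        with fitz.open(str(file_path)) as doc:
            return [page.get_pixmap(dpi=PDF_RENDER_DPI).tobytes("png") for page in doc]
    raise ValueError(f"Unsupported file type: {suffix}")


async def parse_document(file_path: Path) -> dict:
    """
    Parse a PDF or image document via vision model.

    Returns:
        {
            "text": "extracted text content",
            "structure": "layout description",
            "confidence": 0.95,
            "pages": 1
        }
    """
    validate_ollama_request(VISION_MODEL)

    page_images = _load_page_images(file_path)
    page_texts = []

    # One request per page keeps each prompt to a single image.
    async with httpx.AsyncClient(timeout=OLLAMA_TIMEOUT) as client:
        for image in page_images:
            response = await client.post(
                f"{OLLAMA_BASE_URL}/api/chat",
                json={
                    "model": VISION_MODEL,
                    "messages": [{
                        "role": "user",
                        "content": "Extract all text from this document. Preserve structure. Return as plain text.",
                        "images": [base64.b64encode(image).decode()],
                    }],
                    "stream": False,
                },
            )
            response.raise_for_status()
            page_texts.append(response.json()["message"]["content"])

    return {
        "text": "\n\n".join(page_texts),
        "structure": "document",
        "confidence": 0.9,  # placeholder: not derived from the model output
        "pages": len(page_images),
    }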

Test document types:
- Field trip forms (PDF)
- Doctor appointment cards (image)
- School newsletters (PDF)
- Handwritten notes (image)
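Both the parser and the pipeline decide PDF vs image from the filename suffix, but email attachments are sometimes mislabeled (a phone photo saved with a `.pdf` name, say). A hedged sketch of checking the well-known leading "magic" bytes instead; `sniff_kind` is a hypothetical helper, not part of the current design:

```python
from __future__ import annotations

# Leading-byte signatures for the formats the vision parser accepts.
_SIGNATURES: tuple[tuple[bytes, str], ...] = (
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
)


def sniff_kind(data: bytes) -> str | None:
    """Return 'pdf', 'png' or 'jpeg' based on the first bytes, else None."""
    for magic, kind in _SIGNATURES:
        if data.startswith(magic):
            return kind
    return None
```

The check is deliberately strict: a PDF with stray bytes before `%PDF-` would be rejected rather than guessed at.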


Component 2: Briefing Card Generator (icarus/core/briefing/)

Create: icarus/core/briefing/generator.py

"""Generate contextual briefing cards from parsed documents."""
import json

import httpx

from icarus.core.config.staging import OLLAMA_BASE_URL, OLLAMA_PRIMARY_MODEL
from icarus.core.utils.model_gate import validate_ollama_request

async def generate_briefing(parsed_doc: dict, context: dict) -> dict:
    """
    Generate a briefing card from parsed document + calendar context.

    Args:
        parsed_doc: Output from vision parser
        context: {
            "calendar_events": [...],  # Conflicts, nearby events
            "family_members": [...],   # Who this affects
            "urgency": "high/medium/low"
        }

    Returns:
        {
            "title": "Field Trip: Museum of Science",
            "summary": "Sully's class trip on May 15...",
            "key_details": {
                "date": "2026-05-15",
                "time": "9:00 AM - 2:00 PM",
                "location": "Museum of Science",
                "cost": "$15",
                "action_required": "Permission slip + lunch"
            },
            "conflicts": ["Harper violin 4:00 PM"],
            "suggested_actions": [
                "Sign permission slip",
                "Pack lunch",
                "Set reminder for 8:30 AM departure"
            ],
            "confidence": 0.92
        }
    """
    validate_ollama_request(OLLAMA_PRIMARY_MODEL)

    prompt = f"""You are Icarus, a family context engine. Generate a briefing card from this document.

DOCUMENT TEXT:
{parsed_doc['text']}

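CALENDAR CONTEXT:
{json.dumps(context.get('calendar_events', []), indent=2)}

FAMILY MEMBERS AFFECTED: {', '.join(context.get('family_members', [])) or 'unknown'}
URGENCY: {context.get('urgency', 'medium')}

Return one JSON object with exactly these keys:
- "title": short, clear title
- "summary": one-paragraph summary
- "key_details": object with "date" (YYYY-MM-DD), "time", "location", "cost", "action_required"; use null for anything not in the document
- "conflicts": list of strings naming overlapping or nearby calendar events (empty list if none)
- "suggested_actions": list of strings
- "confidence": number from 0 to 1"""

    # Generation on an 8B model can exceed httpx's 5-second default timeout.
    async with httpx.AsyncClient(timeout=httpx.Timeout(120.0, connect=10.0)) as client: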
        response = await client.post(
            f"{OLLAMA_BASE_URL}/api/chat",
            json={
                "model": OLLAMA_PRIMARY_MODEL,
                "messages": [{"role": "user", "content": prompt}],
                "format": "json",
                "stream": False
            }
        )
        response.raise_for_status()
        result = response.json()

    return json.loads(result["message"]["content"])
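Ollama's `"format": "json"` guarantees syntactically valid JSON, not these particular keys, and the escalation table already anticipates JSON parsing failures. A minimal sketch of coercing model output into the documented card shape; `normalize_briefing` and its defaults are assumptions, not existing code:

```python
from __future__ import annotations

import json
from typing import Any

# Keys documented in generate_briefing's docstring.
DETAIL_KEYS = ("date", "time", "location", "cost", "action_required")


def _as_str_list(value: Any) -> list[str]:
    """Accept a single string or a list; anything else becomes []."""
    if isinstance(value, str):
        return [value]
    if isinstance(value, list):
        return [str(item) for item in value]
    return []


def normalize_briefing(raw: str) -> dict[str, Any]:
    """Parse model output and coerce it into the briefing-card shape.

    Raises ValueError if `raw` is not a JSON object.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model returned invalid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("model returned JSON that is not an object")

    details = data.get("key_details")
    if not isinstance(details, dict):
        details = {}

    try:
        confidence = float(data.get("confidence", 0.0))
    except (TypeError, ValueError):
        confidence = 0.0
    if 1.0 < confidence <= 100.0:
        confidence /= 100.0  # treat values like 92 as a percentage
    confidence = min(max(confidence, 0.0), 1.0)

    return {
        "title": str(data.get("title") or "Untitled document"),
        "summary": str(data.get("summary") or ""),
        # Missing detail fields become None instead of disappearing.
        "key_details": {key: details.get(key) for key in DETAIL_KEYS},
        "conflicts": _as_str_list(data.get("conflicts")),
        "suggested_actions": _as_str_list(data.get("suggested_actions")),
        "confidence": confidence,
    }
```

`generate_briefing` could then end with `return normalize_briefing(result["message"]["content"])`, so downstream code (Telegram, calendar) can rely on every key existing.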

Component 3: Email Attachment Pipeline

Create: icarus/core/vision/pipeline.py (the API in Component 4 imports process_attachment from this module; email_fetcher.py can call into it)

"""End-to-end: Email attachment → briefing card."""
from pathlib import Path
import tempfile
from icarus.core.vision.parser import parse_document
from icarus.core.briefing.generator import generate_briefing
from icarus.core.calendar_sync import get_upcoming_events  # For context

async def process_attachment(email_meta: dict, attachment: bytes, filename: str) -> dict:
    """
    Process an email attachment through the vision pipeline.

    Args:
        email_meta: {"from", "subject", "date", "to"}
        attachment: Raw bytes
        filename: Original filename

    Returns:
        Briefing card dict
    """
    # Reject unsupported types before writing anything to disk
    suffix = Path(filename).suffix.lower()
    if suffix not in {'.pdf', '.png', '.jpg', '.jpeg'}:
        raise ValueError(f"Unsupported file type: {suffix}")

    # Save to a temp file, since the vision parser reads from a path
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(attachment)
        tmp_path = Path(tmp.name)

    try:
        # Step 1: Vision parse
        parsed = await parse_document(tmp_path)

        # Step 2: Gather context
        # TODO: Query calendar for conflicts around parsed date
        context = {
            "calendar_events": [],  # Populate from calendar_sync
            "family_members": infer_recipients(email_meta),
            "urgency": "medium"
        }

        # Step 3: Generate briefing
        briefing = await generate_briefing(parsed, context)

        return briefing

    finally:
        tmp_path.unlink(missing_ok=True)

def infer_recipients(email_meta: dict) -> list:
    """Infer which family members this email concerns."""
    # Simple keyword matching for MVP
    recipients = []
    text = f"{email_meta.get('subject', '')} {email_meta.get('to', '')}".lower()
    if 'sully' in text or 'sullivan' in text:
        recipients.append('Sullivan')
    if 'harper' in text:
        recipients.append('Harper')
    return recipients or ['Family']
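The Step 2 TODO (querying the calendar for conflicts) comes down to an interval-overlap test. A minimal sketch, assuming events arrive as dicts with `title`, `start` and `end` ISO 8601 strings; the real `calendar_sync` output may look different, and `find_conflicts` is hypothetical. The optional `margin` widens the window so same-day "nearby" events, like the 4:00 PM violin lesson in the docstring example, can also be reported:

```python
from __future__ import annotations

from datetime import datetime, timedelta


def find_conflicts(
    start: datetime,
    end: datetime,
    events: list[dict],
    margin: timedelta = timedelta(0),
) -> list[str]:
    """Return titles of events overlapping [start - margin, end + margin).

    Assumes each event has "title", "start" and "end" as ISO 8601 strings.
    Naive and timezone-aware datetimes cannot be compared (Python raises
    TypeError), so keep both sides consistent.
    """
    window_start, window_end = start - margin, end + margin
    conflicts = []
    for event in events:
        event_start = datetime.fromisoformat(event["start"])
        event_end = datetime.fromisoformat(event["end"])
        # Half-open intervals overlap iff each one starts before the other ends.
        if event_start < window_end and window_start < event_end:
            conflicts.append(event["title"])
    return conflicts
```

With half-open intervals, an event that starts exactly when the trip ends is not a conflict unless `margin` is non-zero.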

Component 4: API Endpoints

Extend: icarus/core/api.py

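from datetime import datetime, timezone
from pathlib import Path
from typing import Optional
import json
import tempfile

import httpx
from fastapi import FastAPI, File, HTTPException, UploadFile

from icarus.core.config.staging import OLLAMA_BASE_URL
from icarus.core.vision.parser import parse_document
from icarus.core.vision.pipeline import process_attachment

app = FastAPI(title="Icarus", version="0.1.0")

SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg"}

# Existing health endpoint...

def _checked_suffix(filename: Optional[str]) -> str:
    """Return the lowercased suffix, or raise 415 for unsupported uploads."""
    suffix = Path(filename or "").suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise HTTPException(status_code=415, detail=f"Unsupported file type: {suffix or 'none'}")
    return suffix

@app.post("/vision/parse")
async def vision_parse(file: UploadFile = File(...)):
    """Upload a document and get parsed text."""
    suffix = _checked_suffix(file.filename)
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(await file.read())
        tmp_path = Path(tmp.name)

    try:
        return await parse_document(tmp_path)
    except httpx.HTTPError as exc:  # Ollama unreachable, timed out, or returned an error
        raise HTTPException(status_code=502, detail=f"Vision model error: {exc}") from exc
    finally:
        tmp_path.unlink(missing_ok=True)

@app.post("/vision/briefing")
async def vision_briefing(file: UploadFile = File(...)):
    """Upload a document and get a full briefing card."""
    _checked_suffix(file.filename)
    content = await file.read()

    email_meta = {
        "from": "upload@icarus.local",
        "subject": file.filename,
        "date": datetime.now(timezone.utc).isoformat(),
        "to": "family"
    }

    try:
        return await process_attachment(email_meta, content, file.filename)
    except json.JSONDecodeError as exc:  # model output was not valid JSON
        raise HTTPException(status_code=502, detail=f"Briefing model returned invalid JSON: {exc}") from exc
    except httpx.HTTPError as exc:
        raise HTTPException(status_code=502, detail=f"Model error: {exc}") from exc

@app.get("/vision/status")
async def vision_status():
    """Check vision model availability."""
    try:
        async with httpx.AsyncClient(timeout=5.0) as client:
            response = await client.get(f"{OLLAMA_BASE_URL}/api/tags")
            response.raise_for_status()
            models = response.json().get("models", [])
    except Exception as e:
        return {"ollama_ready": False, "error": str(e)}

    names = [m["name"] for m in models]
    return {
        "ollama_ready": True,
        # Matches qwen3-vl, qwen2.5vl and other "vl" model names
        "vision_model_ready": any("vl" in name for name in names),
        "available_models": names,
    }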

VERIFICATION CHECKLIST

Before declaring Phase 3 complete:

# 1. Vision model available
curl https://icarus-test.hoffdesk.com/vision/status
# Expected: {"ollama_ready": true, "vision_model_ready": true}

# 2. Parse endpoint works
curl -X POST -F "file=@test-receipt.pdf" \
    https://icarus-test.hoffdesk.com/vision/parse
# Expected: {"text": "...", "structure": "...", "confidence": 0.9, "pages": 1}

# 3. Briefing endpoint works
curl -X POST -F "file=@test-field-trip.pdf" \
    https://icarus-test.hoffdesk.com/vision/briefing
# Expected: Full briefing card JSON
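The checks above can also run unattended, for example as a CI step or cron job. A sketch of check 1 using only the standard library; it assumes the `/vision/status` response shape from Component 4, and `check_status`, `fetch_status` and `main` are hypothetical names:

```python
from __future__ import annotations

import json
import urllib.request

BASE_URL = "https://icarus-test.hoffdesk.com"


def check_status(payload: dict) -> list[str]:
    """Return problems found in a /vision/status response (empty list = OK)."""
    problems = []
    if payload.get("ollama_ready") is not True:
        problems.append(f"Ollama not reachable: {payload.get('error', 'unknown error')}")
    elif payload.get("vision_model_ready") is not True:
        models = ", ".join(payload.get("available_models", [])) or "none"
        problems.append(f"no vision model pulled (available: {models})")
    return problems


def fetch_status(base_url: str = BASE_URL) -> dict:
    """GET /vision/status and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/vision/status", timeout=10) as resp:
        return json.load(resp)


def main(base_url: str = BASE_URL) -> int:
    """Print any problems and return a process exit code (0 = healthy)."""
    issues = check_status(fetch_status(base_url))
    for issue in issues:
        print("FAIL:", issue)
    return 1 if issues else 0
```

A wrapper script can end with `raise SystemExit(main())` so the job fails whenever the vision model is missing.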

TEST DOCUMENTS

Create sample test files in icarus/tests/fixtures/:

  1. test-field-trip.pdf — School field trip form
  2. test-appointment.jpg — Doctor appointment card photo
  3. test-newsletter.pdf — School newsletter snippet
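Real scans and phone photos are the meaningful inputs for the vision model, but a synthetic text PDF is handy for exercising the plumbing (upload → parse → briefing) before real fixtures exist. A minimal sketch that builds a valid one-page PDF with only the standard library; `make_text_pdf` is hypothetical and any text passed to it is placeholder content:

```python
from __future__ import annotations


def _escape(text: str) -> str:
    # PDF literal strings need backslashes and parentheses escaped.
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def make_text_pdf(lines: list[str]) -> bytes:
    """Build a one-page PDF (Helvetica, Latin-1 text only) showing `lines`."""
    # BT/ET: text object; TL sets line spacing, T* moves to the next line.
    text_ops = " T* ".join(f"({_escape(line)}) Tj" for line in lines)
    stream = f"BT /F1 14 Tf 18 TL 72 720 Td {text_ops} ET".encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # the xref table records where each object starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # every xref entry is exactly 20 bytes
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1,
        xref_start,
    )
    return bytes(out)
```

For example, `Path("icarus/tests/fixtures/test-field-trip.pdf").write_bytes(make_text_pdf(["Field Trip: Museum of Science", "Date: May 15, 9:00 AM - 2:00 PM", "Cost: $15"]))`. Handwritten notes and photographed appointment cards still need real images.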

INTEGRATION WITH EXISTING PIPELINE

The vision pipeline plugs into the email worker:

# In email_worker.py or webhook handler
if attachment.is_pdf_or_image():
    briefing = await process_attachment(email_meta, attachment.bytes, attachment.filename)

    # Send to Telegram
    await send_briefing_card(briefing, chat_id)

    # Optionally create calendar event if detected
    if briefing.get("key_details", {}).get("date"):
        await create_calendar_event_from_briefing(briefing)
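`send_briefing_card` is not specified in this brief. If it uses the Telegram Bot API's `sendMessage` with `parse_mode="HTML"`, the card text must escape `&`, `<` and `>`. A sketch of the formatting half only; `format_briefing_html` is hypothetical and the layout is a suggestion:

```python
from __future__ import annotations

from html import escape


def _esc(value: object) -> str:
    # Telegram's HTML mode requires &, < and > escaped; quotes can stay as-is.
    return escape(str(value), quote=False)


def format_briefing_html(card: dict) -> str:
    """Render a briefing card dict as text for sendMessage(parse_mode="HTML")."""
    lines = [f"<b>{_esc(card.get('title') or 'Briefing')}</b>"]
    if card.get("summary"):
        lines += ["", _esc(card["summary"])]

    # Show only the key details that were actually filled in.
    details = {k: v for k, v in (card.get("key_details") or {}).items() if v}
    if details:
        lines.append("")
        for key, value in details.items():
            label = key.replace("_", " ").capitalize()
            lines.append(f"<b>{_esc(label)}:</b> {_esc(value)}")

    for heading, key in (("Conflicts", "conflicts"), ("Next steps", "suggested_actions")):
        items = card.get(key) or []
        if items:
            lines += ["", f"<b>{heading}</b>"]
            lines += [f"• {_esc(item)}" for item in items]
    return "\n".join(lines)
```

Keeping formatting separate from sending makes the card layout testable without a bot token.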

SUCCESS CRITERIA

Phase 3 Complete When:
1. ✅ Vision parser extracts text from PDF/image
2. ✅ Briefing generator creates structured cards
3. ✅ API endpoints respond correctly
4. ✅ End-to-end: Upload file → Get briefing card
5. ✅ Model gate enforces 8B/3B limits
6. ✅ All data paths use DATA_DIR
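Criterion 5 depends on `validate_ollama_request` in `icarus/core/utils/model_gate.py`, whose code is not part of this brief. One hypothetical way to enforce the 8B/3B limits is to read the parameter count from the Ollama tag; the function names and the default sizes for untagged models below are assumptions:

```python
from __future__ import annotations

import re

# Size suffix in Ollama tags, e.g. ":8b", ":3b", ":1.5b", ":8b-instruct-q4_K_M".
_SIZE_TAG = re.compile(r":(\d+(?:\.\d+)?)b\b", re.IGNORECASE)

# Hypothetical sizes for models referenced without a size tag; confirm with
# `ollama show <model>` before relying on them.
DEFAULT_SIZES_B = {"qwen3-vl": 8.0, "qwen2.5vl": 7.0}


def model_size_b(model: str) -> float | None:
    """Parameter count in billions, from the tag or DEFAULT_SIZES_B; None if unknown."""
    match = _SIZE_TAG.search(model)
    if match:
        return float(match.group(1))
    return DEFAULT_SIZES_B.get(model.split(":", 1)[0])


def check_model_allowed(model: str, limit_b: float = 8.0) -> None:
    """Raise ValueError for unknown-size models or ones above limit_b (3.0 for filtering)."""
    size = model_size_b(model)
    if size is None:
        raise ValueError(f"cannot determine size of {model!r}; pin an explicit :<N>b tag")
    if size > limit_b:
        raise ValueError(f"{model!r} is {size:g}B, above the {limit_b:g}B limit")
```

Failing closed on unknown sizes keeps an accidental `llama3.1:70b` pull from reaching a 16 GB target.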


BLOCKERS & ESCALATION

| Issue | Escalate to |
|---|---|
| Vision model not available on Gaming PC | Wadsworth → Matt |
| JSON parsing failures | Wadsworth (prompt engineering) |
| Ollama connection issues | Wadsworth → Matt |
| Scope creep | Matt (Director) |

Execute. Build the vision pipeline. Report daily progress.


Questions: @mention Wadsworth in The Hoffmann Board
Full spec: This document