🎯 Icarus Phase 3 — Vision Pipeline & Briefing Cards
To: Socrates 🧠
From: Matt (Director) via Wadsworth 📋
Date: 2026-04-25
Priority: P1 — Execute immediately
Status: Infrastructure ready, Phase 2 complete ✅
MISSION
Build the Vision Pipeline for Icarus — email attachments (PDFs, images) → structured briefing cards. This is the core value proposition: automated contextual briefing from unstructured documents.
MODEL STRATEGY (Confirmed)
| Task | Model | Endpoint |
|---|---|---|
| Vision parsing | qwen3-vl (or qwen2.5-vl) | Gaming PC via Ollama |
| Text extraction/JSON | 8B class (llama3.1:8b, qwen2.5:7b) | Gaming PC via Ollama |
| Fast filtering | 3B class (llama3.2:3b) | Gaming PC via Ollama |
Target hardware: M4 Mac mini 16GB (simulated on 3080 Ti for staging)
Ollama endpoint: http://matt-pc.tail864e81.ts.net:11434
PHASE 3 COMPONENTS
Component 1: Vision Service (icarus/core/vision/)
Create: icarus/core/vision/parser.py
"""Vision document parser: PDF/image → structured text."""
from pathlib import Path
import base64

import httpx

from icarus.core.config.staging import OLLAMA_BASE_URL
from icarus.core.utils.model_gate import validate_ollama_request

VISION_MODEL = "qwen3-vl"  # or qwen2.5-vl if qwen3-vl is unavailable
# httpx defaults to a 5 s timeout, which is too short for vision inference.
# 120 s is an assumed starting value; tune for the target hardware.
REQUEST_TIMEOUT = 120.0


async def parse_document(file_path: Path) -> dict:
    """
    Parse a PDF or image document via vision model.

    Returns:
        {
            "text": "extracted text content",
            "structure": "layout description",
            "confidence": 0.95,
            "pages": 1
        }
    """
    validate_ollama_request(VISION_MODEL)

    # Convert file to base64 for vision model.
    # NOTE: Ollama's "images" field expects base64-encoded image data, so PDFs
    # likely need to be rendered to page images before this call.
    content = file_path.read_bytes()
    b64_content = base64.b64encode(content).decode()

    # Call Ollama vision endpoint
    async with httpx.AsyncClient(timeout=REQUEST_TIMEOUT) as client:
        response = await client.post(
            f"{OLLAMA_BASE_URL}/api/chat",
            json={
                "model": VISION_MODEL,
                "messages": [{
                    "role": "user",
                    "content": "Extract all text from this document. Preserve structure. Return as plain text.",
                    "images": [b64_content]
                }],
                "stream": False
            }
        )
        response.raise_for_status()
        result = response.json()

    return {
        "text": result["message"]["content"],
        "structure": "document",
        "confidence": 0.9,  # Placeholder estimate; parse from model output if available
        "pages": 1  # TODO: count pages if PDF
    }
Test document types:
- Field trip forms (PDF)
- Doctor appointment cards (image)
- School newsletters (PDF)
- Handwritten notes (image)
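The test set above mixes PDFs and images, while the parser sends every file to the vision model as an image. A minimal sketch of how the pipeline might tell the two apart by leading bytes rather than by filename; `sniff_kind` and `needs_rasterizing` are hypothetical helpers, not part of the existing codebase, and the assumption that PDFs need rendering to page images first should be confirmed against the chosen model.

```python
# Leading "magic" bytes for the formats the pipeline accepts.
_SIGNATURES = {
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}


def sniff_kind(data: bytes) -> str:
    """Return 'pdf', 'png', 'jpeg', or 'unknown' based on the leading bytes."""
    for signature, kind in _SIGNATURES.items():
        if data.startswith(signature):
            return kind
    return "unknown"


def needs_rasterizing(data: bytes) -> bool:
    """Assumption: PDFs must be rendered to page images before a vision call."""
    return sniff_kind(data) == "pdf"
```

Checking content instead of the file suffix also catches attachments that arrive with a wrong or missing extension.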
Component 2: Briefing Card Generator (icarus/core/briefing/)
Create: icarus/core/briefing/generator.py
"""Generate contextual briefing cards from parsed documents."""
import json

import httpx

# OLLAMA_BASE_URL is used in the request below and must be imported as well.
from icarus.core.config.staging import OLLAMA_BASE_URL, OLLAMA_PRIMARY_MODEL
from icarus.core.utils.model_gate import validate_ollama_request

# Assumed value; httpx defaults to 5 s, which local inference often exceeds.
REQUEST_TIMEOUT = 120.0


async def generate_briefing(parsed_doc: dict, context: dict) -> dict:
    """
    Generate a briefing card from parsed document + calendar context.

    Args:
        parsed_doc: Output from vision parser
        context: {
            "calendar_events": [...],  # Conflicts, nearby events
            "family_members": [...],   # Who this affects
            "urgency": "high/medium/low"
        }

    Returns:
        {
            "title": "Field Trip: Museum of Science",
            "summary": "Sully's class trip on May 15...",
            "key_details": {
                "date": "2026-05-15",
                "time": "9:00 AM - 2:00 PM",
                "location": "Museum of Science",
                "cost": "$15",
                "action_required": "Permission slip + lunch"
            },
            "conflicts": ["Harper violin 4:00 PM"],
            "suggested_actions": [
                "Sign permission slip",
                "Pack lunch",
                "Set reminder for 8:30 AM departure"
            ],
            "confidence": 0.92
        }
    """
    validate_ollama_request(OLLAMA_PRIMARY_MODEL)

    # Prompt lines stay at column 0 so no indentation leaks into the prompt text.
    prompt = f"""You are Icarus, a family context engine. Generate a briefing card from this document.
DOCUMENT TEXT:
{parsed_doc['text']}
CALENDAR CONTEXT:
{json.dumps(context['calendar_events'], indent=2)}
Create a briefing card with:
1. Clear title
2. One-paragraph summary
3. Key details (date, time, location, cost, requirements)
4. Any calendar conflicts
5. Suggested actions
6. Confidence score (0-1)
Return as JSON."""

    async with httpx.AsyncClient(timeout=REQUEST_TIMEOUT) as client:
        response = await client.post(
            f"{OLLAMA_BASE_URL}/api/chat",
            json={
                "model": OLLAMA_PRIMARY_MODEL,
                "messages": [{"role": "user", "content": prompt}],
                "format": "json",
                "stream": False
            }
        )
        response.raise_for_status()
        result = response.json()

    # The model's reply is a JSON string inside the chat message.
    return json.loads(result["message"]["content"])
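The generator returns whatever JSON the model produces, and JSON parsing failures are a listed blocker. A minimal sketch of a normalization step that could sit between the model reply and the caller, assuming the card shape from the docstring above; `normalize_briefing` and `DEFAULTS` are hypothetical names, not existing Icarus code.

```python
import copy
import json

# Expected keys and the fallback used when a key is missing or has the wrong type.
DEFAULTS = {
    "title": "Untitled document",
    "summary": "",
    "key_details": {},
    "conflicts": [],
    "suggested_actions": [],
}


def normalize_briefing(raw: str) -> dict:
    """Parse model output and coerce it into the briefing card shape."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model did not return valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("Briefing must be a JSON object")

    card = {}
    for key, default in DEFAULTS.items():
        value = data.get(key)
        # Keep the model's value only when it has the expected type.
        card[key] = value if isinstance(value, type(default)) else copy.deepcopy(default)

    # Clamp confidence into [0, 1]; missing or non-numeric values become 0.
    confidence = data.get("confidence")
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        confidence = 0.0
    card["confidence"] = min(1.0, max(0.0, float(confidence)))
    return card
```

Raising `ValueError` on malformed output gives the caller one exception type to catch and retry on, instead of a `KeyError` surfacing later in the Telegram or calendar step.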
Component 3: Email Attachment Pipeline
Create: icarus/core/vision/pipeline.py (Component 4 imports process_attachment from this module); alternatively extend icarus/core/email_fetcher.py and adjust that import.
"""End-to-end: Email attachment → briefing card."""
from pathlib import Path
import tempfile

from icarus.core.vision.parser import parse_document
from icarus.core.briefing.generator import generate_briefing
from icarus.core.calendar_sync import get_upcoming_events  # For context (see TODO below)

SUPPORTED_SUFFIXES = ['.pdf', '.png', '.jpg', '.jpeg']


async def process_attachment(email_meta: dict, attachment: bytes, filename: str) -> dict:
    """
    Process an email attachment through the vision pipeline.

    Args:
        email_meta: {"from", "subject", "date", "to"}
        attachment: Raw bytes
        filename: Original filename

    Returns:
        Briefing card dict
    """
    # Save to temp file; delete=False so the parser can reopen it by path
    suffix = Path(filename).suffix.lower()
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(attachment)
        tmp_path = Path(tmp.name)

    try:
        # Step 1: Vision parse
        if suffix in SUPPORTED_SUFFIXES:
            parsed = await parse_document(tmp_path)
        else:
            raise ValueError(f"Unsupported file type: {suffix}")

        # Step 2: Gather context
        # TODO: Query calendar for conflicts around parsed date
        context = {
            "calendar_events": [],  # Populate from calendar_sync
            "family_members": infer_recipients(email_meta),
            "urgency": "medium"
        }

        # Step 3: Generate briefing
        briefing = await generate_briefing(parsed, context)
        return briefing
    finally:
        # Always remove the temp file, even if parsing or generation fails
        tmp_path.unlink(missing_ok=True)


def infer_recipients(email_meta: dict) -> list:
    """Infer which family members this email concerns."""
    # Simple keyword matching for MVP
    recipients = []
    text = f"{email_meta.get('subject', '')} {email_meta.get('to', '')}".lower()
    if 'sully' in text or 'sullivan' in text:
        recipients.append('Sullivan')
    if 'harper' in text:
        recipients.append('Harper')
    return recipients or ['Family']
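The pipeline leaves calendar context as a TODO. A minimal sketch of the same-day conflict check it points at, assuming each calendar event is a dict with a `title` and an ISO 8601 `start` string; that event shape and the name `find_same_day_events` are assumptions, since the output format of calendar_sync is not specified here.

```python
from datetime import date, datetime


def find_same_day_events(briefing_date: str, events: list[dict]) -> list[dict]:
    """Return the events whose start falls on the briefing's date (YYYY-MM-DD)."""
    try:
        target = date.fromisoformat(briefing_date)
    except (TypeError, ValueError):
        return []  # Date missing or not in YYYY-MM-DD form

    matches = []
    for event in events:
        try:
            start = datetime.fromisoformat(event["start"])
        except (KeyError, TypeError, ValueError):
            continue  # Skip events without a parseable start time
        if start.date() == target:
            matches.append(event)
    return matches
```

Because the document date is only known after parsing, a check like this would run between the vision parse and the briefing step, or as a second pass over the generated card's key_details.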
Component 4: API Endpoints
Extend: icarus/core/api.py
from pathlib import Path  # Needed for suffix handling in /vision/parse
import tempfile

import httpx
from fastapi import FastAPI, File, UploadFile, HTTPException

from icarus.core.config.staging import OLLAMA_BASE_URL
from icarus.core.vision.parser import parse_document
from icarus.core.vision.pipeline import process_attachment
from icarus.core.briefing.generator import generate_briefing

app = FastAPI(title="Icarus", version="0.1.0")

# Existing health endpoint...


@app.post("/vision/parse")
async def vision_parse(file: UploadFile = File(...)):
    """Upload a document and get parsed text."""
    # file.filename may be None for some clients
    suffix = Path(file.filename or "").suffix
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        content = await file.read()
        tmp.write(content)
        tmp_path = Path(tmp.name)
    try:
        result = await parse_document(tmp_path)
        return result
    finally:
        tmp_path.unlink(missing_ok=True)


@app.post("/vision/briefing")
async def vision_briefing(file: UploadFile = File(...)):
    """Upload a document and get a full briefing card."""
    content = await file.read()
    filename = file.filename or ""
    email_meta = {
        "from": "upload@icarus.local",
        "subject": filename,
        "date": "now",
        "to": "family"
    }
    try:
        briefing = await process_attachment(email_meta, content, filename)
    except ValueError as exc:
        # Unsupported file types are a client error, not a server error
        raise HTTPException(status_code=400, detail=str(exc))
    return briefing


@app.get("/vision/status")
async def vision_status():
    """Check vision model availability."""
    try:
        async with httpx.AsyncClient() as client:
            response = await client.get(f"{OLLAMA_BASE_URL}/api/tags")
            response.raise_for_status()
            models = response.json().get("models", [])
        # Heuristic: any installed model with "vl" in its name counts as vision-capable
        vision_ready = any("vl" in m["name"] for m in models)
        return {
            "ollama_ready": True,
            "vision_model_ready": vision_ready,
            "available_models": [m["name"] for m in models]
        }
    except Exception as e:
        return {"ollama_ready": False, "error": str(e)}
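The status endpoint only reports whether any model with "vl" in its name is installed, while the model strategy names a preferred model and a fallback. A minimal sketch of choosing between them from the `/api/tags` payload; `pick_vision_model` is a hypothetical helper, and the two model names are taken from this document rather than checked against the Ollama library.

```python
# Preference order from the model strategy table (names as written in this spec).
PREFERRED_VISION_MODELS = ("qwen3-vl", "qwen2.5-vl")


def pick_vision_model(tags_payload: dict, preferred=PREFERRED_VISION_MODELS):
    """Return the first installed model matching a preferred name, or None."""
    names = [m.get("name", "") for m in tags_payload.get("models", [])]
    for wanted in preferred:
        for name in names:
            # Installed names usually carry a tag suffix such as ":latest" or ":7b".
            if name == wanted or name.startswith(wanted + ":"):
                return name
    return None
```

For example, `pick_vision_model({"models": [{"name": "qwen2.5-vl:7b"}]})` selects the fallback when the preferred model is not installed.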
VERIFICATION CHECKLIST
Before declaring Phase 3 complete:
# 1. Vision model available
curl https://icarus-test.hoffdesk.com/vision/status
# Expected: {"ollama_ready": true, "vision_model_ready": true, "available_models": [...]}
# 2. Parse endpoint works
curl -X POST -F "file=@test-receipt.pdf" \
https://icarus-test.hoffdesk.com/vision/parse
# Expected: {"text": "...", "structure": "...", "confidence": 0.9, "pages": 1}
# 3. Briefing endpoint works
curl -X POST -F "file=@test-field-trip.pdf" \
https://icarus-test.hoffdesk.com/vision/briefing
# Expected: Full briefing card JSON
TEST DOCUMENTS
Create sample test files in icarus/tests/fixtures/:
- test-field-trip.pdf — School field trip form
- test-appointment.jpg — Doctor appointment card photo
- test-newsletter.pdf — School newsletter snippet
INTEGRATION WITH EXISTING PIPELINE
The vision pipeline plugs into the email worker:
# In email_worker.py or webhook handler
if attachment.is_pdf_or_image():
    briefing = await process_attachment(email_meta, attachment.bytes, attachment.filename)

    # Send to Telegram
    await send_briefing_card(briefing, chat_id)

    # Optionally create calendar event if detected
    if briefing.get("key_details", {}).get("date"):
        await create_calendar_event_from_briefing(briefing)
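The snippet above hands the briefing dict to `send_briefing_card`, which is not defined in this spec. A minimal sketch of the text rendering such a function might do before sending, assuming the card shape from Component 2; `format_briefing_text` is a hypothetical helper and the layout is only one possible choice.

```python
def format_briefing_text(briefing: dict) -> str:
    """Render a briefing card dict as plain text for a chat message."""
    lines = [briefing.get("title") or "Untitled briefing"]

    summary = briefing.get("summary")
    if summary:
        lines += ["", summary]

    details = briefing.get("key_details") or {}
    if details:
        lines.append("")
        for key, value in details.items():
            # "action_required" becomes "Action required"
            label = key.replace("_", " ").capitalize()
            lines.append(f"{label}: {value}")

    sections = (("Conflicts", "conflicts"), ("Suggested actions", "suggested_actions"))
    for heading, key in sections:
        items = briefing.get(key) or []
        if items:
            lines += ["", f"{heading}:"]
            lines += [f"- {item}" for item in items]

    return "\n".join(lines)
```

Empty sections are skipped, so a card with no conflicts produces a shorter message instead of a blank heading.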
SUCCESS CRITERIA
Phase 3 Complete When:
1. ✅ Vision parser extracts text from PDF/image
2. ✅ Briefing generator creates structured cards
3. ✅ API endpoints respond correctly
4. ✅ End-to-end: Upload file → Get briefing card
5. ✅ Model gate enforces 8B/3B limits
6. ✅ All data paths use DATA_DIR
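Criterion 5 depends on `validate_ollama_request`, whose implementation is not shown in this spec. A minimal sketch of one way a size gate could work, assuming the parameter count is read from the model tag (for example `llama3.1:8b`); `model_size_b`, `check_model_allowed`, and the rule of rejecting untagged names are assumptions, and the real gate may treat names such as `qwen3-vl` differently.

```python
import re

MAX_PARAMS_B = 8.0  # Upper bound taken from the "8B class" row in the model table

# Matches a size tag after the colon, e.g. ":8b" or ":1.5b".
_SIZE_TAG = re.compile(r":(\d+(?:\.\d+)?)b\b", re.IGNORECASE)


def model_size_b(model: str):
    """Return the parameter count in billions from a tag like 'llama3.1:8b', or None."""
    match = _SIZE_TAG.search(model)
    return float(match.group(1)) if match else None


def check_model_allowed(model: str, limit_b: float = MAX_PARAMS_B) -> None:
    """Raise ValueError when the model is untagged or larger than the limit."""
    size = model_size_b(model)
    if size is None:
        raise ValueError(f"Cannot determine size of {model!r}; use an explicit size tag")
    if size > limit_b:
        raise ValueError(f"{model!r} is {size:g}B, above the {limit_b:g}B limit")
```

Under this sketch `check_model_allowed("llama3.2:3b")` passes silently, while `check_model_allowed("llama3.1:70b")` raises.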
BLOCKERS & ESCALATION
| Issue | Escalate To |
|---|---|
| Vision model not available on Gaming PC | Wadsworth → Matt |
| JSON parsing failures | Wadsworth (prompt engineering) |
| Ollama connection issues | Wadsworth → Matt |
| Scope creep | Matt (Director) |
Execute. Build the vision pipeline. Report daily progress.
Questions: @mention Wadsworth in The Hoffmann Board
Full spec: This document