# Email Pipeline v2: Smart Classification & Routing

**Routed To:** Socrates 🧠
**Priority:** P1 (newsletter handling broken)
**Context:** Family pipeline migration complete, classification layer missing

---

## Problem Statement

The current filter in `family/email.py` is too aggressive:

- `skip_keywords = ["re:", "fw:", "unsubscribe", "newsletter", "promotion", "sale"]`
- Newsletters are **silently dropped**
- Matt wants them **summarized and forwarded**, not discarded

---

## Desired Behavior

```
Email arrives at assistant@hoffdesk.com
        ↓
[Classifier LLM] → Detects email type
        ↓
    ├─ APPOINTMENT → Calendar + Telegram
    ├─ NEWSLETTER  → Summarize + Telegram (no calendar)
    ├─ FAMILY      → Calendar + Telegram
    └─ OTHER       → Telegram notification only
```

---

## Implementation Requirements

### 1. Email Classifier

**Location:** `family/classifier.py`
**Input:** Email subject, body, sender
**Output:** One of `appointment`, `newsletter`, `family`, `other`

**Classification Prompt:**

```python
# Assumes the template is filled with str.format(). Format fields cannot
# slice, so the caller passes body[:1000] instead of slicing in the template.
CLASSIFICATION_PROMPT = """Classify this email into exactly one category.

Categories:
- appointment: Contains meeting time, reservation, booking, appointment confirmation
- newsletter: Digest, weekly updates, informational broadcast (not personal)
- family: From family members, personal updates, household matters
- other: Everything else

Email Subject: {subject}
Email Body: {body}
Sender: {sender}

Respond with ONLY the category name, lowercase, no explanation.
"""
```

### 2. Route Handlers

**Location:** `family/handlers/`

| Handler | File | Purpose |
|---------|------|---------|
| Appointment | `appointment.py` | Extract event → Calendar + Notify |
| Newsletter | `newsletter.py` | Summarize → Notify only |
| Family | `family.py` | Calendar + Notify |
| Other | `other.py` | Notify only |

### 3. Newsletter Handler Specifics

**Input:** Full newsletter email
**Output:** Telegram message with summary

**Summarization Prompt:**

```python
NEWSLETTER_SUMMARY_PROMPT = """Summarize this newsletter for Matt.
Keep it brief (3-5 bullet points max).
Highlight actionable items or key insights.
Mention the source/newsletter name.

Newsletter:
{content}

Format:
📰
• Key point 1
• Key point 2 (if actionable)
• Key point 3
[Link if present in original]"""
```

### 4. Pipeline Updates

**Modify:** `family/pipeline.py`

```python
# Method on the existing pipeline class; self.classifier and self.handlers
# are set up elsewhere in that class.
async def process_email(self, subject, body, sender, received_at):
    # Step 1: Classify
    email_type = await self.classifier.classify(subject, body, sender)

    # Step 2: Route to handler, falling back to 'other' for unknown types
    handler = self.handlers.get(email_type)
    if handler is None:
        handler = self.handlers["other"]

    result = await handler.process(subject, body, sender, received_at)
    return result
```

### 5. Remove Hardcoded Filter

**Delete from:** `family/email.py`

- Remove `should_process()` or make it classification-aware
- No more silent dropping: every email reaches a handler

---

## Acceptance Criteria

| Test | Expected Result |
|------|-----------------|
| Newsletter arrives | Classified as `newsletter`, summarized, Telegram sent, no calendar |
| Appointment arrives | Classified as `appointment`, calendar event created, Telegram sent |
| Family email arrives | Classified as `family`, calendar + Telegram |
| Random email arrives | Classified as `other`, Telegram notification only |

---

## Files to Create/Modify

```
hoffdesk-api/
├── family/
│   ├── classifier.py          # NEW
│   ├── handlers/
│   │   ├── __init__.py
│   │   ├── appointment.py     # MOVE from email.py
│   │   ├── newsletter.py      # NEW
│   │   ├── family.py          # NEW
│   │   └── other.py           # NEW
│   ├── email.py               # MODIFY: remove filter, keep extraction
│   └── pipeline.py            # MODIFY: add classification + routing
```

---

## Notes

- Use a local LLM (phi4:14b or qwen2.5-coder:7b) for classification
- Classification should be fast (< 2s)
- Newsletter summarization can take longer (< 10s)
- Log classification results for debugging

---

**Assigned:** Socrates 🧠
**Requester:** Wadsworth 📋 (routing from Matt)
**Deadline:** Flexible, but newsletter handling is currently broken
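
The handler table in section 2 and the routing step in section 4 imply a shared handler interface. One possible shape is sketched here, with every name beyond the spec being an assumption: each handler exposes an async `process(subject, body, sender, received_at)`, `notify` and `summarize` are hypothetical injected async callables standing in for the Telegram sender and the LLM summarizer, and the returned dict is purely illustrative.

```python
import asyncio
from datetime import datetime, timezone
from typing import Awaitable, Callable, Protocol

Notify = Callable[[str], Awaitable[None]]    # hypothetical Telegram sender
Summarize = Callable[[str], Awaitable[str]]  # hypothetical LLM summarizer


class Handler(Protocol):
    async def process(self, subject: str, body: str, sender: str,
                      received_at: datetime) -> dict: ...


class NewsletterHandler:
    """Summarize, then notify. Never touches the calendar."""

    def __init__(self, summarize: Summarize, notify: Notify):
        self._summarize = summarize
        self._notify = notify

    async def process(self, subject, body, sender, received_at):
        summary = await self._summarize(body)
        await self._notify(f"📰 {subject}\n{summary}")
        return {"type": "newsletter", "calendar": False, "notified": True}


class OtherHandler:
    """Notification only; also the fallback for unknown categories."""

    def __init__(self, notify: Notify):
        self._notify = notify

    async def process(self, subject, body, sender, received_at):
        await self._notify(f"New email from {sender}: {subject}")
        return {"type": "other", "calendar": False, "notified": True}


def route(handlers: dict[str, Handler], email_type: str) -> Handler:
    # Unknown or missing categories fall back to 'other' so nothing is dropped.
    return handlers.get(email_type, handlers["other"])


async def demo() -> tuple[dict, list[str]]:
    sent: list[str] = []

    async def notify(text: str) -> None:
        sent.append(text)  # collect messages instead of calling Telegram

    async def summarize(content: str) -> str:
        return "• " + content[:40]  # placeholder for the real LLM summary

    handlers: dict[str, Handler] = {
        "newsletter": NewsletterHandler(summarize, notify),
        "other": OtherHandler(notify),
    }
    handler = route(handlers, "newsletter")
    result = await handler.process(
        "Weekly Digest", "Three things happened this week.",
        "news@example.com", datetime.now(timezone.utc),
    )
    return result, sent
```

Injecting `notify` and `summarize` keeps each handler testable without Telegram or a running model, which lines up with the acceptance criteria: a test can assert that the newsletter path sends exactly one message and reports no calendar activity.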