Email Pipeline v2 — Smart Classification & Routing
Routed To: Socrates 🧠
Priority: P1 — Newsletter handling broken
Context: Family pipeline migration complete, classification layer missing
Problem Statement
Current family/email.py filter is too aggressive:
- skip_keywords = ["re:", "fw:", "unsubscribe", "newsletter", "promotion", "sale"]
- Newsletters are silently dropped
- Matt wants them summarized and forwarded, not discarded
Desired Behavior
Email arrives at assistant@hoffdesk.com
↓
[Classifier LLM] → Detects email type
↓
├─ APPOINTMENT → Calendar + Telegram
├─ NEWSLETTER → Summarize + Telegram (no calendar)
├─ FAMILY → Calendar + Telegram
└─ OTHER → Telegram notification only
Implementation Requirements
1. Email Classifier
Location: family/classifier.py
Input: Email subject, body, sender
Output: One of appointment, newsletter, family, other
Classification Prompt (fill it with .format(), passing body[:1000]; a slice such as {body[:1000]} is not valid inside a str.format placeholder):
CLASSIFICATION_PROMPT = """Classify this email into exactly one category.
Categories:
- appointment: Contains meeting time, reservation, booking, appointment confirmation
- newsletter: Digest, weekly updates, informational broadcast (not personal)
- family: From family members, personal updates, household matters
- other: Everything else
Email Subject: {subject}
Email Body: {body}
Sender: {sender}
Respond with ONLY the category name, lowercase, no explanation.
"""
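A minimal sketch of how family/classifier.py might wrap this prompt, assuming the local model is reachable through some async callable that takes a prompt and returns text. The names `LLMCall`, `normalize_category`, `EmailClassifier`, and `max_body_chars` are hypothetical, and the prompt is condensed here for brevity. The main point is that untidy model output is normalized and anything unrecognized falls back to `other`.

```python
import asyncio
from typing import Awaitable, Callable

CATEGORIES = ("appointment", "newsletter", "family", "other")

# Condensed stand-in for CLASSIFICATION_PROMPT.
PROMPT_TEMPLATE = (
    "Classify this email into exactly one category: "
    "appointment, newsletter, family, other.\n"
    "Email Subject: {subject}\nEmail Body: {body}\nSender: {sender}\n"
    "Respond with ONLY the category name, lowercase, no explanation."
)

# Hypothetical type: any async callable mapping a prompt to the model's reply.
LLMCall = Callable[[str], Awaitable[str]]


def normalize_category(raw: str) -> str:
    """Map raw model output to a known category, defaulting to 'other'."""
    cleaned = raw.strip().strip(".\"'`").lower()
    return cleaned if cleaned in CATEGORIES else "other"


class EmailClassifier:
    def __init__(self, llm: LLMCall, max_body_chars: int = 1000):
        self.llm = llm
        self.max_body_chars = max_body_chars

    async def classify(self, subject: str, body: str, sender: str) -> str:
        # Truncate the body before formatting; str.format cannot slice.
        prompt = PROMPT_TEMPLATE.format(
            subject=subject, body=body[: self.max_body_chars], sender=sender
        )
        return normalize_category(await self.llm(prompt))


async def demo() -> str:
    async def stub_llm(prompt: str) -> str:
        return " Newsletter.\n"  # simulates untidy model output

    classifier = EmailClassifier(stub_llm)
    return await classifier.classify("Weekly digest", "Top stories", "news@example.com")


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Injecting the model call keeps the classifier testable without a running LLM.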
2. Route Handlers
Location: family/handlers/
| Handler | File | Purpose |
|---|---|---|
| Appointment | appointment.py | Extract event → Calendar + Notify |
| Newsletter | newsletter.py | Summarize → Notify only |
| Family | family.py | Calendar + Notify |
| Other | other.py | Notify only |
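One way the handlers could share an interface is sketched below, assuming the calendar and Telegram clients are injected as async callables. `CreateEvent`, `Notify`, `EmailHandler`, both handler classes, and `build_handlers` are hypothetical names, and the returned dict shape is an assumption. The newsletter handler is left out here because section 3 specifies it separately.

```python
import asyncio
from datetime import datetime, timezone
from typing import Awaitable, Callable, Protocol

# Hypothetical side-effect hooks; the real clients are not shown in this spec.
CreateEvent = Callable[[str, str], Awaitable[None]]  # (subject, body)
Notify = Callable[[str], Awaitable[None]]            # (message text)


class EmailHandler(Protocol):
    async def process(self, subject: str, body: str, sender: str,
                      received_at: datetime) -> dict: ...


class NotifyOnlyHandler:
    """Covers 'other': sends a notification and nothing else."""

    def __init__(self, notify: Notify):
        self.notify = notify

    async def process(self, subject, body, sender, received_at):
        await self.notify(f"New email from {sender}: {subject}")
        return {"calendar": False, "notified": True}


class CalendarAndNotifyHandler:
    """Covers 'appointment' and 'family': creates an event, then notifies."""

    def __init__(self, create_event: CreateEvent, notify: Notify):
        self.create_event = create_event
        self.notify = notify

    async def process(self, subject, body, sender, received_at):
        await self.create_event(subject, body)
        await self.notify(f"Added to calendar: {subject}")
        return {"calendar": True, "notified": True}


def build_handlers(create_event: CreateEvent, notify: Notify) -> dict[str, EmailHandler]:
    shared = CalendarAndNotifyHandler(create_event, notify)
    return {
        "appointment": shared,
        "family": shared,
        "other": NotifyOnlyHandler(notify),
    }


async def demo() -> list:
    log = []

    async def create_event(subject, body):
        log.append(("calendar", subject))

    async def notify(text):
        log.append(("telegram", text))

    handlers = build_handlers(create_event, notify)
    now = datetime.now(timezone.utc)
    await handlers["appointment"].process("Dentist", "Tue 3pm", "clinic@example.com", now)
    await handlers["other"].process("Hello", "text", "x@example.com", now)
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```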
3. Newsletter Handler Specifics
Input: Full newsletter email
Output: Telegram message with summary
Summarization Prompt:
NEWSLETTER_SUMMARY_PROMPT = """Summarize this newsletter for Matt.
Keep it brief (3-5 bullet points max).
Highlight actionable items or key insights.
Mention the source/newsletter name.
Newsletter: {content}
Format:
📰 <Newsletter Name>
• Key point 1
• Key point 2 (if actionable)
• Key point 3
[Link if present in original]"""
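A rough sketch of how newsletter.py might use this prompt, assuming the model call and the Telegram send are both injected as async callables. `TextFn`, `NotifyFn`, `NewsletterHandler`, and the `max_chars` cap are hypothetical, the prompt is condensed, and the empty-reply fallback is an assumption rather than a stated requirement.

```python
import asyncio
from datetime import datetime, timezone
from typing import Awaitable, Callable

# Condensed stand-in for NEWSLETTER_SUMMARY_PROMPT.
SUMMARY_PROMPT = (
    "Summarize this newsletter for Matt in 3-5 bullet points.\n"
    "Newsletter: {content}"
)

TextFn = Callable[[str], Awaitable[str]]     # hypothetical LLM call: prompt -> text
NotifyFn = Callable[[str], Awaitable[None]]  # hypothetical Telegram send


class NewsletterHandler:
    def __init__(self, llm: TextFn, notify: NotifyFn, max_chars: int = 8000):
        self.llm = llm
        self.notify = notify
        self.max_chars = max_chars  # assumed cap to keep the prompt bounded

    async def process(self, subject, body, sender, received_at: datetime) -> dict:
        content = f"Subject: {subject}\nFrom: {sender}\n\n{body[:self.max_chars]}"
        summary = (await self.llm(SUMMARY_PROMPT.format(content=content))).strip()
        if not summary:
            # Assumed fallback so an empty model reply still produces a notification.
            summary = f"📰 {subject}\n• (summary unavailable)"
        await self.notify(summary)
        # This handler never touches the calendar.
        return {"category": "newsletter", "calendar": False, "telegram_text": summary}


async def demo():
    sent = []

    async def stub_llm(prompt: str) -> str:
        return "📰 Weekly Digest\n• Key point 1\n"

    async def stub_notify(text: str) -> None:
        sent.append(text)

    handler = NewsletterHandler(stub_llm, stub_notify)
    result = await handler.process(
        "Weekly Digest", "Long body...", "news@example.com", datetime.now(timezone.utc)
    )
    return result, sent


if __name__ == "__main__":
    print(asyncio.run(demo()))
```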
4. Pipeline Updates
Modify: family/pipeline.py
async def process_email(self, subject, body, sender, received_at):
    # Step 1: Classify
    email_type = await self.classifier.classify(subject, body, sender)
    # Step 2: Route to handler
    handler = self.handlers.get(email_type)
    if not handler:
        handler = self.handlers['other']
    result = await handler.process(subject, body, sender, received_at)
    return result
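To show the routing and the fallback to `other` in action, here is a self-contained sketch with stub collaborators. `StubClassifier`, `StubHandler`, `EmailPipeline`, and `run_once` are hypothetical names for illustration only; the real pipeline class in family/pipeline.py may be structured differently.

```python
import asyncio
from datetime import datetime, timezone


class StubClassifier:
    """Returns a fixed label, standing in for the LLM classifier."""

    def __init__(self, label: str):
        self.label = label

    async def classify(self, subject, body, sender):
        return self.label


class StubHandler:
    """Reports which handler ran instead of performing side effects."""

    def __init__(self, name: str):
        self.name = name

    async def process(self, subject, body, sender, received_at):
        return {"handled_by": self.name, "subject": subject}


class EmailPipeline:
    def __init__(self, classifier, handlers: dict):
        if "other" not in handlers:
            raise ValueError("handlers must include an 'other' fallback")
        self.classifier = classifier
        self.handlers = handlers

    async def process_email(self, subject, body, sender, received_at):
        email_type = await self.classifier.classify(subject, body, sender)
        # Unknown or missing categories are routed to 'other', never dropped.
        handler = self.handlers.get(email_type, self.handlers["other"])
        return await handler.process(subject, body, sender, received_at)


def run_once(label: str) -> dict:
    names = ("appointment", "newsletter", "family", "other")
    handlers = {name: StubHandler(name) for name in names}
    pipeline = EmailPipeline(StubClassifier(label), handlers)
    return asyncio.run(
        pipeline.process_email("Hi", "body", "a@example.com", datetime.now(timezone.utc))
    )


if __name__ == "__main__":
    print(run_once("newsletter"))
    print(run_once("not-a-category"))
```

Checking for the `other` handler at construction time means a misconfigured registry fails at startup rather than on the first unrecognized email.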
5. Remove Hardcoded Filter
Delete from: family/email.py
- Remove should_process() or make it classification-aware
- No more silent dropping
Acceptance Criteria
| Test | Expected Result |
|---|---|
| Newsletter arrives | Classified as newsletter, summarized, Telegram sent, no calendar |
| Appointment arrives | Classified as appointment, calendar event created, Telegram sent |
| Family email arrives | Classified as family, calendar + Telegram |
| Random email arrives | Classified as other, Telegram notification only |
Files to Create/Modify
hoffdesk-api/
├── family/
│   ├── classifier.py        # NEW
│   ├── handlers/
│   │   ├── __init__.py
│   │   ├── appointment.py   # MOVE from email.py
│   │   ├── newsletter.py    # NEW
│   │   ├── family.py        # NEW
│   │   └── other.py         # NEW
│   ├── email.py             # MODIFY: remove filter, keep extraction
│   └── pipeline.py          # MODIFY: add classification + routing
Notes
- Use local LLM (phi4:14b or qwen2.5-coder:7b) for classification
- Classification should be fast (< 2s)
- Newsletter summarization can take longer (< 10s)
- Log classification results for debugging
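The latency and logging notes above could be enforced with a thin wrapper like the following sketch. `classify_with_budget` and the logger name are hypothetical, and falling back to `other` on timeout is an assumption chosen so that a slow model never causes an email to be dropped.

```python
import asyncio
import logging
import time

logger = logging.getLogger("family.classifier")  # hypothetical logger name


async def classify_with_budget(classify, subject, body, sender, timeout_s: float = 2.0) -> str:
    """Run an async classify(subject, body, sender) call under a time budget."""
    started = time.monotonic()
    try:
        category = await asyncio.wait_for(
            classify(subject, body, sender), timeout=timeout_s
        )
    except asyncio.TimeoutError:
        # Assumed policy: degrade to 'other' rather than fail the pipeline.
        category = "other"
        logger.warning(
            "classification timed out after %.2fs sender=%s subject=%r",
            timeout_s, sender, subject,
        )
    else:
        logger.info(
            "classified as %s in %.2fs sender=%s subject=%r",
            category, time.monotonic() - started, sender, subject,
        )
    return category


async def demo():
    async def slow(subject, body, sender):
        await asyncio.sleep(0.2)
        return "newsletter"

    async def fast(subject, body, sender):
        return "family"

    timed_out = await classify_with_budget(slow, "s", "b", "a@example.com", timeout_s=0.01)
    ok = await classify_with_budget(fast, "s", "b", "a@example.com")
    return timed_out, ok


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(asyncio.run(demo()))
```

The same wrapper pattern would apply to summarization with a 10-second budget.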
Assigned: Socrates 🧠
Requester: Wadsworth 📋 (routing from Matt)
Deadline: Flexible, but newsletter handling is currently broken