Icarus Phase 5: User Acceptance Test Plan
Director: Matt
Goal: Prove utility in real household before Phase 6
Target: 20+ documents, >90% auto-routing confidence
Timeline: 5-7 days of real-world usage
Test Categories
| Category | System | Test Cases | Success Criteria |
|---|---|---|---|
| A | Email Pipeline | 5 tests | Events auto-extracted, conflicts detected |
| B | Icarus Document Intelligence | 8 tests | Documents routed correctly, confidence >90% |
| C | Recipe Ecosystem | 5 tests | URL → grocery list → zone optimization |
| D | System State & Dashboard | 3 tests | Real-time visibility, HTMX polling works |
| E | End-to-End Workflows | 4 tests | Complete user journeys |
Category A: Email Pipeline (family_assistant)
A1. Appointment Email Extraction
Input: Email with appointment confirmation (dentist, doctor, school event)
Action: Send test email to monitored inbox
Expected:
- Event extracted with correct date/time
- Calendar event created automatically
- Telegram notification sent with "Add to Calendar" button
Validation: Check Radicale calendar for created event
A2. Newsletter Extraction
Input: School newsletter with multiple events
Action: Forward newsletter to monitored inbox
Expected:
- Multiple events extracted
- No duplicate events (dedup working)
- Conflicts detected if events overlap existing calendar
Validation: Events appear in calendar; conflicts reported
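The dedup check in A2 can be reasoned about with a small sketch. This is not the pipeline's actual logic: the event fields (`title`, `start`) and the choice of a normalized title plus start time as the duplicate key are assumptions made for illustration.

```python
def dedupe_events(events):
    """Drop events whose normalized title and start time were already seen."""
    seen = set()
    unique = []
    for event in events:
        # Assumed duplicate key: case-insensitive title plus exact start time.
        key = (event["title"].strip().lower(), event["start"])
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique


# Hypothetical newsletter extraction that lists the same event twice.
extracted = [
    {"title": "Book Fair", "start": "2026-01-12T09:00"},
    {"title": "book fair ", "start": "2026-01-12T09:00"},
    {"title": "Picture Day", "start": "2026-01-14T08:30"},
]
unique_events = dedupe_events(extracted)
```

When validating A2, forwarding the same newsletter twice is a quick way to exercise this path.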
A3. Conflict Detection
Setup: Create calendar event "Family Dinner" for Friday 6 PM
Input: Email with "PTO Meeting Friday 6:30 PM"
Expected:
- Conflict detected (30 min overlap)
- Telegram notification with resolution options
- "Reschedule PTO" / "Keep Both" / "Cancel PTO" buttons
Validation: Conflict acknowledged, user choice executes
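The "30 min overlap" expectation depends on event durations, which the test does not state. The sketch below assumes both events last one hour (a hypothetical value) and shows the interval arithmetic a tester can use to confirm the reported overlap.

```python
from datetime import datetime, timedelta


def overlap_minutes(start_a, end_a, start_b, end_b):
    """Return how many minutes two time ranges overlap (0 if they do not)."""
    latest_start = max(start_a, start_b)
    earliest_end = min(end_a, end_b)
    delta = earliest_end - latest_start
    return max(0, int(delta.total_seconds() // 60))


# Assumed one-hour durations; the plan only gives start times.
dinner_start = datetime(2026, 1, 9, 18, 0)   # a Friday, 6:00 PM
dinner_end = dinner_start + timedelta(hours=1)
pto_start = datetime(2026, 1, 9, 18, 30)     # same Friday, 6:30 PM
pto_end = pto_start + timedelta(hours=1)

conflict = overlap_minutes(dinner_start, dinner_end, pto_start, pto_end)
```

If the existing event is created with a duration of 30 minutes or less, the two events do not overlap, so set the duration explicitly during Setup.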
A4. Slot Selection (Signup Genius)
Input: Email with Signup Genius link for parent-teacher conferences
Action: Click slot selection button in Telegram
Expected:
- Slots scraped and presented
- User selects slot → auto-registered
- Calendar event created with correct time
Validation: Confirmation email received, calendar updated
A5. Rejection Handling
Input: Email for "Summer Camp 2026" (already rejected previously)
Expected:
- Rejection rule triggered
- No calendar event created
- No notification sent (silent discard)
Validation: Email processed, no action taken
Category B: Icarus Document Intelligence
B1. School Document → Child Routing
Input: PDF: "First Grade Field Trip Permission Slip - Sullivan's Class"
Action: Send to Icarus Telegram bot
Expected:
- Document classified: "permission_slip"
- Routed to: sully
- Confidence: >90%
- Briefing generated with action items
Validation: System State shows routing decision
B2. Medical Document → Parent Routing
Input: Photo of vaccination record for Harper
Action: Send photo to Icarus bot
Expected:
- OCR extracts text
- Routed to: aundrea, matt (both parents)
- Confidence: >85%
- Briefing includes medical context
Validation: Correct recipients, medical category
B3. Multi-Person Document
Input: "Parent-Teacher Conferences - Both Parents Required"
Expected:
- Routed to: aundrea, matt
- Confidence: >95% (pattern match)
Validation: System State shows both assignees
B4. Uncertain Document (Edge Case)
Input: "Classroom volunteer signup - snacks needed"
Expected:
- Low confidence match (<80%)
- Escalates to fuzzy LLM inference
- Routed with confidence score displayed
Validation: Correct fallback behavior, transparency
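The fallback behavior B4 expects can be sketched as a deterministic rule pass followed by a fuzzy step when no rule clears the threshold. Everything here is hypothetical: the rule shape, the `route` function, the 0.80 cutoff as a default, and the stubbed LLM call stand in for whatever Icarus actually implements.

```python
from typing import NamedTuple


class Routing(NamedTuple):
    recipients: list
    confidence: float
    method: str


def route(text, rules, fuzzy_fallback, threshold=0.80):
    """Try keyword rules first; fall back to fuzzy inference below threshold."""
    best = None
    lowered = text.lower()
    for rule in rules:
        if rule["keyword"] in lowered:
            if best is None or rule["confidence"] > best.confidence:
                best = Routing(rule["recipients"], rule["confidence"], "deterministic")
    if best is not None and best.confidence >= threshold:
        return best
    # No confident rule match: defer to the fuzzy step and keep its score visible.
    recipients, confidence = fuzzy_fallback(text)
    return Routing(recipients, confidence, "fuzzy")


# Hypothetical rules and a stub in place of the real LLM inference call.
RULES = [
    {"keyword": "permission slip", "recipients": ["sully"], "confidence": 0.92},
    {"keyword": "soccer practice", "recipients": ["sully"], "confidence": 0.95},
]


def stub_llm(text):
    return ["aundrea"], 0.72


uncertain = route("Classroom volunteer signup - snacks needed", RULES, stub_llm)
```

The returned `method` field is what lets the tester confirm that the routing path, not only the recipient, matches the expectation.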
B5. Handwritten Note
Input: Photo of handwritten note: "Harper has OT on Tuesdays 4pm - Ms. Johnson"
Expected:
- Vision OCR extracts text
- Routed to: harper
- Extracted: recurring event pattern
Validation: Handwriting recognition, recurring detection
B6. Invoice/Receipt (Phase 6 Preview)
Input: Vehicle service receipt
Expected:
- Document parsed
- No maintenance action (system disabled)
- Logged for Phase 6 validation
Validation: Receipt processed, no false maintenance alert
B7. Document Confidence Tracking
Input: 5 documents of varying clarity
Action: Review System State dashboard
Expected:
- Confidence distribution visible
- Failed matches logged
- Pattern: deterministic >90%, fuzzy <90%
Validation: Dashboard shows distribution bar chart
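To cross-check the dashboard chart, the logged scores can be bucketed by hand. The sketch assumes confidence scores are available as percentages from 0 to 100 and uses 10-point buckets; the dashboard's real bucket boundaries are not specified in this plan.

```python
from collections import Counter


def confidence_buckets(scores, width=10):
    """Count percentage scores (0-100) in fixed-width buckets."""
    counts = Counter()
    for score in scores:
        # A score of exactly 100 is folded into the top bucket.
        low = min(int(score // width) * width, 100 - width)
        counts[f"{low}-{low + width}%"] += 1
    return dict(counts)


# Hypothetical scores for the five B7 documents.
distribution = confidence_buckets([97, 92, 88, 74, 100])
```

Comparing these counts against the bar chart verifies both that the chart renders and that it reflects the logged decisions.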
B8. YAML Inference Rules Validation
Setup: Add custom rule for "soccer practice"
Input: Document mentioning "Sully soccer practice Saturday 9am"
Expected:
- Custom rule triggers
- Routed to: sully
- Confidence: 95% (0.95)
Validation: Rule system extensible
Category C: Recipe Ecosystem
C1. Recipe URL Extraction
Input: Send URL to Icarus bot: https://www.allrecipes.com/recipe/12345/chicken-parmesan
Action: /recipe <url> or auto-detect
Expected:
- Ingredients extracted
- Temp grocery list created (5-min TTL)
- Inline keyboard: toggle ingredients
Validation: Ingredients match recipe
C2. Recipe Toggle UI
Input: Recipe loaded, toggle buttons displayed
Action: Click toggle buttons to deselect items already in pantry
Expected:
- Visual state updates (selected ↔ deselected)
- Temp state saved per item
- Persist across Telegram session
Validation: Toggle state consistent
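C1 and C2 together describe a temporary list that expires after five minutes and tracks a per-item toggle. A minimal sketch of that state, assuming an in-memory store and a hypothetical `TempGroceryList` class (the bot's real storage is not described here):

```python
import time


class TempGroceryList:
    """Per-recipe toggle state that expires after a time-to-live."""

    def __init__(self, items, ttl_seconds=300, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + ttl_seconds
        # Every ingredient starts selected; the user deselects pantry items.
        self.selected = {item: True for item in items}

    def expired(self):
        return self._clock() >= self._expires_at

    def toggle(self, item):
        """Flip one item's state and return the new value."""
        if self.expired():
            raise TimeoutError("temporary list expired")
        self.selected[item] = not self.selected[item]
        return self.selected[item]

    def to_commit(self):
        """Items still selected, in original order."""
        return [item for item, keep in self.selected.items() if keep]
```

The injectable `clock` makes the expiry testable without waiting five minutes. During UAT, the equivalent manual check is to leave a recipe untouched past the TTL and confirm how the toggle buttons respond.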
C3. Grocery List Commit
Input: Recipe toggles configured
Action: Click "Commit to Grocery List"
Expected:
- Selected items added to persistent grocery list
- Items auto-zoned (ChromaDB matching)
- Confirmation message with zone breakdown
Validation: Grocery list updated, zones assigned
C4. Multi-Recipe Grocery Management
Setup: Add 3 recipes, commit ingredients from all
Action: /groceries command
Expected:
- Combined grocery list displayed
- Zone-optimized sorting
- Clear button functional
Validation: No duplicate items, zones correct
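The C4 validation ("no duplicate items, zones correct") can be checked against a simple reference merge. The item fields and zone names below are illustrative assumptions; in the real system zones come from ChromaDB matching.

```python
def merge_grocery_lists(recipes, zone_order):
    """Merge ingredient lists, drop duplicate names, and sort by store zone."""
    merged = {}
    for items in recipes:
        for item in items:
            key = item["name"].strip().lower()
            # First occurrence wins; later duplicates are ignored.
            merged.setdefault(key, item["zone"])
    rank = {zone: i for i, zone in enumerate(zone_order)}
    # Unknown zones sort last; names break ties alphabetically.
    return sorted(merged.items(), key=lambda kv: (rank.get(kv[1], len(rank)), kv[0]))


# Hypothetical zones and ingredients from two committed recipes.
ZONES = ["produce", "dairy", "pantry"]
combined = merge_grocery_lists(
    [
        [{"name": "Garlic", "zone": "produce"}, {"name": "Butter", "zone": "dairy"}],
        [{"name": "garlic", "zone": "produce"}, {"name": "Flour", "zone": "pantry"}],
    ],
    ZONES,
)
```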
C5. Recipe with Non-Standard Site
Input: Recipe from unsupported site (no ld+json)
Action: LLM fallback extraction
Expected:
- Ingredients extracted via LLM
- Confidence displayed
- Manual verification suggested
Validation: Graceful degradation
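To pick a suitable C5 test page, it helps to confirm the page really lacks structured recipe data. The sketch below uses only the standard library to look for a schema.org `Recipe` in `application/ld+json` script tags and returns `None` when there is none, which is the case where the LLM fallback should engage. It is a tester's helper under those assumptions, not the extractor Icarus uses, and it does not handle `@graph` wrappers.

```python
import json
from html.parser import HTMLParser


class _JsonLdCollector(HTMLParser):
    """Collect the text of every application/ld+json script tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data


def extract_ingredients(html):
    """Return recipeIngredient from the first Recipe node, or None."""
    parser = _JsonLdCollector()
    parser.feed(html)
    parser.close()
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue
        for node in data if isinstance(data, list) else [data]:
            if not isinstance(node, dict):
                continue
            types = node.get("@type")
            if types == "Recipe" or (isinstance(types, list) and "Recipe" in types):
                return node.get("recipeIngredient")
    return None
```

A page for which `extract_ingredients` returns `None` is a valid C5 input.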
Category D: System State & Dashboard
D1. System State Real-Time View
URL: https://icarus-test.hoffdesk.com/system/state
Tests:
- Family configuration displays correctly (4 members, 7 rules)
- Confidence distribution bar chart renders
- Recent routing decisions table shows last 10
- HTMX polling updates every 30s (watch for changes)
Validation: No CSS errors, mobile responsive
D2. JSON API Endpoint
URL: https://icarus-test.hoffdesk.com/system/state?format=json
Expected:
- Valid JSON response
- Family array populated
- Rules array populated
- Stats object present
Validation: JSON schema matches spec
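The D2 expectations can be turned into a small structural check on the saved response body. The key names `family`, `rules`, and `stats` are assumptions inferred from the expected items above; adjust them to the actual spec.

```python
import json


def check_state_payload(raw):
    """Return a list of problems found in a /system/state JSON body."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    # Assumed key names; the plan only says "Family array", "Rules array", "Stats object".
    for key in ("family", "rules"):
        value = data.get(key)
        if not isinstance(value, list) or not value:
            problems.append(f"{key} must be a non-empty array")
    if not isinstance(data.get("stats"), dict):
        problems.append("stats must be an object")
    return problems
```

An empty list means the response meets the three structural expectations. Full schema validation against the spec remains a separate step.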
D3. Dashboard Under Load
Test: Send 5 documents in rapid succession
Action: Watch dashboard during processing
Expected:
- Routing decisions appear in real-time
- No server errors
- HTMX updates smooth
Validation: Performance acceptable
Category E: End-to-End Workflows
E1. The School Paper Workflow
Scenario: School sends field trip permission slip
Steps:
1. Photo sent to Icarus bot
2. Document routed to child
3. Briefing includes deadline, cost, action required
4. Parent marks done or creates calendar reminder
Success: Full loop completes in under 2 minutes
E2. The Weekly Meal Plan Workflow
Scenario: Plan meals for week
Steps:
1. Send 5 recipe URLs to bot
2. Toggle ingredients (skip staples)
3. Commit all to grocery list
4. View optimized Costco route
5. Shop with zone-sorted list
Success: Grocery trip efficient, no forgotten items
E3. The Conflict Resolution Workflow
Scenario: Double-booking detected
Steps:
1. Calendar event exists: "Dinner with parents Friday 6pm"
2. Email arrives: "School play Friday 6:30pm"
3. Conflict detected, notification sent
4. User selects "Reschedule dinner"
5. Original event moved, new event added
Success: No manual calendar editing required
E4. The Document Archive Workflow
Scenario: Receipt needs saving
Steps:
1. Photo sent to Icarus bot
2. OCR extracts text
3. Document classified (receipt/invoice)
4. Briefing generated
5. Document stored with metadata
6. Searchable via future Brain query (Phase 6)
Success: Document retrievable later
Success Metrics
| Metric | Target | Measurement |
|---|---|---|
| Documents processed | ≥20 | Count in System State |
| Auto-routing accuracy | >90% | Correct assignment / Total |
| Manual corrections | <10% | User override count |
| Recipe extraction success | >95% | Ingredients match / Total |
| Grocery zone accuracy | >90% | Correct zone / Total items |
| System uptime | 100% | No staging outages |
| User satisfaction | Pass | Director subjective assessment |
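The first three metrics can be computed from a log of routing decisions. This sketch assumes each logged decision records who the document was routed to, who it should have gone to, and whether the user overrode it; those field names are hypothetical.

```python
def uat_metrics(decisions):
    """Compute document count, routing accuracy, and correction rate."""
    total = len(decisions)
    if total == 0:
        return {"documents": 0, "accuracy": 0.0, "correction_rate": 0.0, "passes": False}
    # A decision is correct when the routed recipients match the expected set.
    correct = sum(1 for d in decisions if set(d["routed_to"]) == set(d["expected"]))
    overridden = sum(1 for d in decisions if d["user_override"])
    accuracy = correct / total
    correction_rate = overridden / total
    return {
        "documents": total,
        "accuracy": accuracy,
        "correction_rate": correction_rate,
        # Targets from the table: at least 20 documents, >90% accuracy, <10% corrections.
        "passes": total >= 20 and accuracy > 0.90 and correction_rate < 0.10,
    }
```

Note that exactly 90% accuracy does not pass, because the target is strictly greater than 90%.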
Daily UAT Schedule
Day 1-2: Setup & Baseline
- [ ] Deploy latest Icarus staging
- [ ] Verify all systems green
- [ ] Run 5 controlled test documents
- [ ] Baseline confidence scores
Day 3-5: Real-World Testing
- [ ] Process real family documents as they arrive
- [ ] Log all routing decisions
- [ ] Note edge cases and failures
- [ ] Recipe extraction daily use
Day 6-7: Analysis & Report
- [ ] Compile metrics
- [ ] Document failure modes
- [ ] Identify Phase 5 blockers (if any)
- [ ] Prepare Phase 6 recommendations
Known Issues (Watch For)
- OCR failures in poor lighting: note the success/fail rate
- Confidence threshold edge cases: documents scoring 80-90%
- Recipe extraction on JavaScript-heavy sites: may fail
- Mobile UI responsiveness: test on iPhone and Android
UAT Sign-Off Criteria
Phase 5 passes UAT when:
- [ ] 20+ documents processed
- [ ] >90% auto-routing accuracy
- [ ] <10% manual corrections required
- [ ] Director (Matt) confirms utility in daily use
- [ ] No critical bugs blocking daily workflows
- [ ] Phase 6 roadmap validated by real usage patterns
This plan will evolve during testing; update it as findings come in.