# 🧪 Icarus Phase 5: User Acceptance Test Plan

**Director:** Matt

**Goal:** Prove utility in a real household before Phase 6

**Target:** 20+ documents, >90% auto-routing confidence

**Timeline:** 5-7 days of real-world usage

---

## 📋 Test Categories

| Category | System | Test Cases | Success Criteria |
|----------|--------|-----------|------------------|
| **A** | Email Pipeline | 5 tests | Events auto-extracted, conflicts detected |
| **B** | Icarus Document Intelligence | 8 tests | Documents routed correctly, confidence >90% |
| **C** | Recipe Ecosystem | 5 tests | URL → grocery list → zone optimization |
| **D** | System State & Dashboard | 3 tests | Real-time visibility, HTMX polling works |
| **E** | End-to-End Workflows | 4 tests | Complete user journeys |

---

## Category A: Email Pipeline (family_assistant)

### A1. Appointment Email Extraction

**Input:** Email with appointment confirmation (dentist, doctor, school event)

**Action:** Send test email to monitored inbox

**Expected:**
- Event extracted with correct date/time
- Calendar event created automatically
- Telegram notification sent with "Add to Calendar" button

**Validation:** Check Radicale calendar for created event

### A2. Newsletter Extraction

**Input:** School newsletter with multiple events

**Action:** Forward newsletter to monitored inbox

**Expected:**
- Multiple events extracted
- No duplicate events (dedup working)
- Conflicts detected if events overlap existing calendar

**Validation:** Events appear in calendar; conflicts reported

### A3. Conflict Detection

**Setup:** Create calendar event "Family Dinner" for Friday 6 PM

**Input:** Email with "PTO Meeting Friday 6:30 PM"

**Expected:**
- Conflict detected (30 min overlap)
- Telegram notification with resolution options
- "Reschedule PTO" / "Keep Both" / "Cancel PTO" buttons

**Validation:** Conflict acknowledged, user choice executes

### A4. Slot Selection (Signup Genius)

**Input:** Email with Signup Genius link for parent-teacher conferences

**Action:** Click slot selection button in Telegram

**Expected:**
- Slots scraped and presented
- User selects slot → auto-registered
- Calendar event created with correct time

**Validation:** Confirmation email received, calendar updated

### A5. Rejection Handling

**Input:** Email for "Summer Camp 2026" (already rejected previously)

**Expected:**
- Rejection rule triggered
- No calendar event created
- No notification sent (silent discard)

**Validation:** Email processed, no action taken

---

## Category B: Icarus Document Intelligence

### B1. School Document → Child Routing

**Input:** PDF: "First Grade Field Trip Permission Slip - Sullivan's Class"

**Action:** Send to Icarus Telegram bot

**Expected:**
- Document classified: "permission_slip"
- Routed to: sully
- Confidence: >90%
- Briefing generated with action items

**Validation:** System State shows routing decision

### B2. Medical Document → Parent Routing

**Input:** Photo of vaccination record for Harper

**Action:** Send photo to Icarus bot

**Expected:**
- OCR extracts text
- Routed to: aundrea, matt (both parents)
- Confidence: >85%
- Briefing includes medical context

**Validation:** Correct recipients, medical category

### B3. Multi-Person Document

**Input:** "Parent-Teacher Conferences - Both Parents Required"

**Expected:**
- Routed to: aundrea, matt
- Confidence: >95% (pattern match)

**Validation:** System State shows both assignees

### B4. Uncertain Document (Edge Case)

**Input:** "Classroom volunteer signup - snacks needed"

**Expected:**
- Low confidence match (<80%)
- Escalates to fuzzy LLM inference
- Routed with confidence score displayed

**Validation:** Correct fallback behavior, transparency

### B5. Handwritten Note

**Input:** Photo of handwritten note: "Harper has OT on Tuesdays 4pm - Ms. Johnson"

**Expected:**
- Vision OCR extracts text
- Routed to: harper
- Extracted: recurring event pattern

**Validation:** Handwriting recognition, recurring detection

### B6. Invoice/Receipt (Phase 6 Preview)

**Input:** Vehicle service receipt

**Expected:**
- Document parsed
- No maintenance action (system disabled)
- Logged for Phase 6 validation

**Validation:** Receipt processed, no false maintenance alert

### B7. Document Confidence Tracking

**Input:** 5 documents of varying clarity

**Action:** Review System State dashboard

**Expected:**
- Confidence distribution visible
- Failed matches logged
- Pattern: deterministic >90%, fuzzy <90%

**Validation:** Dashboard shows distribution bar chart

### B8. YAML Inference Rules Validation

**Setup:** Add custom rule for "soccer practice"

**Input:** Document mentioning "Sully soccer practice Saturday 9am"

**Expected:**
- Custom rule triggers
- Routed to: sully
- Confidence: 0.95

**Validation:** Rule system extensible

---

## Category C: Recipe Ecosystem

### C1. Recipe URL Extraction

**Input:** Send URL to Icarus bot: `https://www.allrecipes.com/recipe/12345/chicken-parmesan`

**Action:** `/recipe` command, or auto-detect

**Expected:**
- Ingredients extracted
- Temp grocery list created (5-min TTL)
- Inline keyboard: toggle ingredients

**Validation:** Ingredients match recipe

### C2. Recipe Toggle UI

**Input:** Recipe loaded, toggle buttons displayed

**Action:** Click toggle buttons to deselect items already in pantry

**Expected:**
- Visual state updates (✓ → ○)
- Temp state saved per item
- State persists across Telegram session

**Validation:** Toggle state consistent

### C3. Grocery List Commit

**Input:** Recipe toggles configured

**Action:** Click "Commit to Grocery List"

**Expected:**
- Selected items added to persistent grocery list
- Items auto-zoned (ChromaDB matching)
- Confirmation message with zone breakdown

**Validation:** Grocery list updated, zones assigned

### C4. Multi-Recipe Grocery Management

**Setup:** Add 3 recipes, commit ingredients from all

**Action:** `/groceries` command

**Expected:**
- Combined grocery list displayed
- Zone-optimized sorting
- Clear button functional

**Validation:** No duplicate items, zones correct

### C5. Recipe with Non-Standard Site

**Input:** Recipe from unsupported site (no ld+json)

**Action:** LLM fallback extraction

**Expected:**
- Ingredients extracted via LLM
- Confidence displayed
- Manual verification suggested

**Validation:** Graceful degradation

---

## Category D: System State & Dashboard

### D1. System State Real-Time View

**URL:** `https://icarus-test.hoffdesk.com/system/state`

**Tests:**
- Family configuration displays correctly (4 members, 7 rules)
- Confidence distribution bar chart renders
- Recent routing decisions table shows last 10
- HTMX polling updates every 30s (watch for changes)

**Validation:** No CSS errors, mobile responsive

### D2. JSON API Endpoint

**URL:** `https://icarus-test.hoffdesk.com/system/state?format=json`

**Expected:**
- Valid JSON response
- Family array populated
- Rules array populated
- Stats object present

**Validation:** JSON schema matches spec

### D3. Dashboard Under Load

**Test:** Send 5 documents in rapid succession

**Action:** Watch dashboard during processing

**Expected:**
- Routing decisions appear in real-time
- No server errors
- HTMX updates smooth

**Validation:** Performance acceptable

---

## Category E: End-to-End Workflows

### E1. The School Paper Workflow

**Scenario:** School sends field trip permission slip

**Steps:**
1. Photo sent to Icarus bot
2. Document routed to child
3. Briefing includes deadline, cost, action required
4. Parent marks done or creates calendar reminder

**Success:** Full loop completes in under 2 minutes

### E2. The Weekly Meal Plan Workflow

**Scenario:** Plan meals for week

**Steps:**
1. Send 5 recipe URLs to bot
2. Toggle ingredients (skip staples)
3. Commit all to grocery list
4. View optimized Costco route
5. Shop with zone-sorted list

**Success:** Grocery trip efficient, no forgotten items

### E3. The Conflict Resolution Workflow

**Scenario:** Double-booking detected

**Steps:**
1. Calendar event exists: "Dinner with parents Friday 6pm"
2. Email arrives: "School play Friday 6:30pm"
3. Conflict detected, notification sent
4. User selects "Reschedule dinner"
5. Original event moved, new event added

**Success:** No manual calendar editing required

### E4. The Document Archive Workflow

**Scenario:** Receipt needs saving

**Steps:**
1. Photo sent to Icarus bot
2. OCR extracts text
3. Document classified (receipt/invoice)
4. Briefing generated
5. Document stored with metadata
6. Searchable via future Brain query (Phase 6)

**Success:** Document retrievable later

---

## 📊 Success Metrics

| Metric | Target | Measurement |
|--------|--------|-------------|
| Documents processed | ≥20 | Count in System State |
| Auto-routing accuracy | >90% | Correct assignment / Total |
| Manual corrections | <10% | User override count |
| Recipe extraction success | >95% | Ingredients match / Total |
| Grocery zone accuracy | >90% | Correct zone / Total items |
| System uptime | 100% | No staging outages |
| User satisfaction | Pass | Director subjective assessment |

---

## 🗓️ Daily UAT Schedule

### Day 1-2: Setup & Baseline

- [ ] Deploy latest Icarus staging
- [ ] Verify all systems green
- [ ] Run 5 controlled test documents
- [ ] Baseline confidence scores

### Day 3-5: Real-World Testing

- [ ] Process real family documents as they arrive
- [ ] Log all routing decisions
- [ ] Note edge cases and failures
- [ ] Recipe extraction daily use

### Day 6-7: Analysis & Report

- [ ] Compile metrics
- [ ] Document failure modes
- [ ] Identify Phase 5 blockers (if any)
- [ ] Prepare Phase 6 recommendations

---

## 🐛 Known Issues (Watch For)

1. **OCR failures in poor lighting**: note the success/fail rate
2. **Confidence threshold edge cases**: documents scoring 80-90%
3. **Recipe extraction on JavaScript-heavy sites**: may fail
4. **Mobile UI responsiveness**: test on iPhone/Android

---

## ✅ UAT Sign-Off Criteria

Phase 5 passes UAT when:

- [ ] 20+ documents processed
- [ ] >90% auto-routing accuracy
- [ ] <10% manual corrections required
- [ ] Director (Matt) confirms utility in daily use
- [ ] No critical bugs blocking daily workflows
- [ ] Phase 6 roadmap validated by real usage patterns

---

_This plan evolves during testing. Update with learnings._
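
Test A3 expects a 30-minute overlap between "Family Dinner" at 6 PM and the PTO meeting at 6:30 PM. That figure only follows if events without an end time get a default duration. The check can be sketched as below, assuming a one-hour default (the plan does not state one). The names `Event` and `overlap_minutes` are hypothetical, and the date is an arbitrary Friday.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: events without an explicit end time last one hour.
DEFAULT_DURATION = timedelta(hours=1)


@dataclass
class Event:
    title: str
    start: datetime
    duration: timedelta = DEFAULT_DURATION

    @property
    def end(self) -> datetime:
        return self.start + self.duration


def overlap_minutes(a: Event, b: Event) -> int:
    """Return how many minutes two events overlap (0 if they do not)."""
    latest_start = max(a.start, b.start)
    earliest_end = min(a.end, b.end)
    overlap = earliest_end - latest_start
    return max(0, int(overlap.total_seconds() // 60))


dinner = Event("Family Dinner", datetime(2026, 1, 2, 18, 0))
pto = Event("PTO Meeting", datetime(2026, 1, 2, 18, 30))
print(overlap_minutes(dinner, pto))  # 30
```

If the real pipeline uses a different default duration, the expected overlap in A3 should be adjusted to match.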
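
For test D2, the checks on the JSON response (family array populated, rules array populated, stats object present) could be automated roughly as follows. This is a sketch only: the top-level keys `family`, `rules`, and `stats` are assumed from the wording of D2, the sample payload is made up, and `check_state_payload` is a hypothetical helper rather than part of Icarus.

```python
import json


def check_state_payload(payload: dict) -> list[str]:
    """Return a list of problems found in a System State JSON payload."""
    problems = []
    # Both arrays must exist and contain at least one entry.
    for key in ("family", "rules"):
        value = payload.get(key)
        if not isinstance(value, list):
            problems.append(f"{key}: expected an array")
        elif not value:
            problems.append(f"{key}: array is empty")
    # The stats object only needs to be present.
    if not isinstance(payload.get("stats"), dict):
        problems.append("stats: expected an object")
    return problems


# Hypothetical sample response; real field contents will differ.
sample = json.loads(
    '{"family": [{"name": "matt"}], "rules": [{"id": 1}], "stats": {}}'
)
print(check_state_payload(sample))  # []
```

An empty list means the payload passes; each string otherwise names one failed expectation, which makes the result easy to paste into the UAT log.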
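
The three quantitative sign-off criteria (20+ documents, >90% auto-routing accuracy, <10% manual corrections) can be computed from a log of routing decisions. The sketch below assumes each logged decision records whether the routing was correct and whether the user overrode it. `RoutingDecision` and `signoff_summary` are hypothetical names, and the sample log is made up.

```python
from dataclasses import dataclass


@dataclass
class RoutingDecision:
    correct: bool      # document was routed to the right person(s)
    overridden: bool   # user manually corrected the routing


def signoff_summary(decisions: list[RoutingDecision]) -> dict:
    """Compute the quantitative sign-off metrics from a decision log."""
    total = len(decisions)
    accuracy = sum(d.correct for d in decisions) / total if total else 0.0
    corrections = sum(d.overridden for d in decisions) / total if total else 0.0
    return {
        "documents": total,
        "accuracy": accuracy,
        "corrections": corrections,
        # Thresholds taken from the sign-off criteria in this plan.
        "passes": total >= 20 and accuracy > 0.90 and corrections < 0.10,
    }


# Made-up log: 22 documents, 21 routed correctly, 1 corrected by hand.
log = [RoutingDecision(True, False)] * 21 + [RoutingDecision(False, True)]
summary = signoff_summary(log)
print(summary["passes"])  # True
```

The remaining criteria (Director confirmation, no critical bugs, Phase 6 roadmap validation) are qualitative and stay as manual checkboxes.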