icarus-phase-5-UAT-plan.md (Apr 27, 2026)

🧪 Icarus Phase 5: User Acceptance Test Plan

Director: Matt
Goal: Prove utility in real household before Phase 6
Target: 20+ documents, >90% auto-routing confidence
Timeline: 5-7 days of real-world usage


📋 Test Categories

| Category | System | Test Cases | Success Criteria |
|---|---|---|---|
| A | Email Pipeline | 5 tests | Events auto-extracted, conflicts detected |
| B | Icarus Document Intelligence | 8 tests | Documents routed correctly, confidence >90% |
| C | Recipe Ecosystem | 5 tests | URL → grocery list → zone optimization |
| D | System State & Dashboard | 3 tests | Real-time visibility, HTMX polling works |
| E | End-to-End Workflows | 4 tests | Complete user journeys |

Category A: Email Pipeline (family_assistant)

A1. Appointment Email Extraction

Input: Email with appointment confirmation (dentist, doctor, school event)
Action: Send test email to monitored inbox
Expected:
- Event extracted with correct date/time
- Calendar event created automatically
- Telegram notification sent with "Add to Calendar" button
Validation: Check Radicale calendar for created event

A2. Newsletter Extraction

Input: School newsletter with multiple events
Action: Forward newsletter to monitored inbox
Expected:
- Multiple events extracted
- No duplicate events (dedup working)
- Conflicts detected if events overlap existing calendar
Validation: Events appear in calendar; conflicts reported
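
The dedup check in A2 can be reasoned about with a small sketch. It assumes two events are duplicates when their normalized title and start time match; the `Event` type and the key choice are illustrative assumptions, not the pipeline's actual logic.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """Hypothetical minimal event record for illustration."""
    title: str
    start: datetime


def dedup_key(event):
    # Collapse case and whitespace so "PTO  Meeting" and "pto meeting" collide.
    return (" ".join(event.title.lower().split()), event.start)


def dedupe(events):
    """Return events in original order, dropping later duplicates."""
    seen = set()
    unique = []
    for event in events:
        key = dedup_key(event)
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique
```

A newsletter that lists the same event twice with different capitalization should then yield a single calendar entry.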

A3. Conflict Detection

Setup: Create calendar event "Family Dinner" for Friday 6 PM
Input: Email with "PTO Meeting Friday 6:30 PM"
Expected:
- Conflict detected (30 min overlap)
- Telegram notification with resolution options
- "Reschedule PTO" / "Keep Both" / "Cancel PTO" buttons
Validation: Conflict acknowledged, user choice executes
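
The "30 min overlap" in A3 depends on event durations, which the test case does not state. The sketch below assumes "Family Dinner" runs one hour (6:00 to 7:00 PM) and the PTO meeting also runs one hour; the function name is hypothetical.

```python
from datetime import datetime


def overlap_minutes(start_a, end_a, start_b, end_b):
    """Minutes shared by two time intervals; 0 if they do not overlap."""
    latest_start = max(start_a, start_b)
    earliest_end = min(end_a, end_b)
    delta = earliest_end - latest_start
    return max(0, int(delta.total_seconds() // 60))


# Assumed durations: both events last one hour (not specified in the plan).
dinner = (datetime(2026, 5, 1, 18, 0), datetime(2026, 5, 1, 19, 0))
pto = (datetime(2026, 5, 1, 18, 30), datetime(2026, 5, 1, 19, 30))

print(overlap_minutes(*dinner, *pto))  # 30
```

If the real calendar event has a different duration, the expected overlap in A3 should be adjusted to match.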

A4. Slot Selection (Signup Genius)

Input: Email with Signup Genius link for parent-teacher conferences
Action: Click slot selection button in Telegram
Expected:
- Slots scraped and presented
- User selects slot → auto-registered
- Calendar event created with correct time
Validation: Confirmation email received, calendar updated

A5. Rejection Handling

Input: Email for "Summer Camp 2026" (already rejected previously)
Expected:
- Rejection rule triggered
- No calendar event created
- No notification sent (silent discard)
Validation: Email processed, no action taken


Category B: Icarus Document Intelligence

B1. School Document → Child Routing

Input: PDF: "First Grade Field Trip Permission Slip - Sullivan's Class"
Action: Send to Icarus Telegram bot
Expected:
- Document classified: "permission_slip"
- Routed to: sully
- Confidence: >90%
- Briefing generated with action items
Validation: System State shows routing decision

B2. Medical Document → Parent Routing

Input: Photo of vaccination record for Harper
Action: Send photo to Icarus bot
Expected:
- OCR extracts text
- Routed to: aundrea, matt (both parents)
- Confidence: >85%
- Briefing includes medical context
Validation: Correct recipients, medical category

B3. Multi-Person Document

Input: "Parent-Teacher Conferences - Both Parents Required"
Expected:
- Routed to: aundrea, matt
- Confidence: >95% (pattern match)
Validation: System State shows both assignees

B4. Uncertain Document (Edge Case)

Input: "Classroom volunteer signup - snacks needed"
Expected:
- Low confidence match (<80%)
- Escalates to fuzzy LLM inference
- Routed with confidence score displayed
Validation: Correct fallback behavior, transparency
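
The fallback behavior in B4 can be sketched as a two-stage router. This assumes the 80% figure above is the escalation threshold; `Routing`, `route`, and both stub matchers are hypothetical stand-ins (the fuzzy stub replaces a real LLM call, and its assignee is a placeholder).

```python
from typing import NamedTuple


class Routing(NamedTuple):
    assignees: tuple
    confidence: float
    method: str


ESCALATION_THRESHOLD = 0.80  # assumption: taken from the "<80%" wording in B4


def route(text, deterministic, fuzzy):
    """Try the deterministic matcher first; escalate when confidence is low."""
    first = deterministic(text)
    if first.confidence >= ESCALATION_THRESHOLD:
        return first
    return fuzzy(text)


def weak_pattern_match(text):
    # Stub: pretend no strong pattern matched this document.
    return Routing((), 0.40, "deterministic")


def llm_stub(text):
    # Stub for the fuzzy LLM inference step; output is a placeholder.
    return Routing(("matt",), 0.72, "fuzzy")


decision = route("Classroom volunteer signup - snacks needed", weak_pattern_match, llm_stub)
print(decision.method, decision.confidence)  # fuzzy 0.72
```

Keeping `method` on the result makes it easy to verify in System State that the escalation path was actually taken.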

B5. Handwritten Note

Input: Photo of handwritten note: "Harper has OT on Tuesdays 4pm - Ms. Johnson"
Expected:
- Vision OCR extracts text
- Routed to: harper
- Extracted: recurring event pattern
Validation: Handwriting recognition, recurring detection
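
For the recurring-event check in B5, it helps to state what a correct extraction looks like. The sketch below assumes the OCR text is already available and maps a "<Weekday>s <time>" phrase to an RFC 5545 weekly recurrence rule; the regex and the returned dictionary shape are illustrative assumptions, not the system's actual parser.

```python
import re

DAY_CODES = {
    "monday": "MO", "tuesday": "TU", "wednesday": "WE", "thursday": "TH",
    "friday": "FR", "saturday": "SA", "sunday": "SU",
}

# Matches phrases such as "Tuesdays 4pm" or "Fridays at 6:30 PM".
_WEEKLY = re.compile(
    r"\b(" + "|".join(DAY_CODES) + r")s\b\s+(?:at\s+)?(\d{1,2})(?::(\d{2}))?\s*(am|pm)\b",
    re.IGNORECASE,
)


def parse_weekly(text):
    """Return an RRULE string and 24-hour time, or None if no pattern is found."""
    match = _WEEKLY.search(text)
    if match is None:
        return None
    day, hour, minute, meridiem = match.groups()
    hour24 = int(hour) % 12 + (12 if meridiem.lower() == "pm" else 0)
    return {
        "rrule": f"FREQ=WEEKLY;BYDAY={DAY_CODES[day.lower()]}",
        "time": f"{hour24:02d}:{int(minute or 0):02d}",
    }


print(parse_weekly("Harper has OT on Tuesdays 4pm - Ms. Johnson"))
# {'rrule': 'FREQ=WEEKLY;BYDAY=TU', 'time': '16:00'}
```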

B6. Invoice/Receipt (Phase 6 Preview)

Input: Vehicle service receipt
Expected:
- Document parsed
- No maintenance action (system disabled)
- Logged for Phase 6 validation
Validation: Receipt processed, no false maintenance alert

B7. Document Confidence Tracking

Input: 5 documents of varying clarity
Action: Review System State dashboard
Expected:
- Confidence distribution visible
- Failed matches logged
- Pattern: deterministic >90%, fuzzy <90%
Validation: Dashboard shows distribution bar chart
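
To cross-check the dashboard's bar chart in B7, the distribution can be recomputed by hand from logged scores. This sketch assumes scores are floats in the range 0 to 1 and uses 10-point buckets; the bucket width and label format are assumptions.

```python
from collections import Counter


def confidence_histogram(scores, bucket_width=10):
    """Count scores per percentage bucket, e.g. '90-100%'."""
    counts = Counter()
    for score in scores:
        pct = round(score * 100)  # integer percent avoids float edge cases
        # Clamp so that a perfect score lands in the top bucket.
        low = min(pct // bucket_width * bucket_width, 100 - bucket_width)
        counts[f"{low}-{low + bucket_width}%"] += 1
    return dict(counts)


print(confidence_histogram([0.95, 0.93, 0.72, 0.41, 1.0]))
# {'90-100%': 3, '70-80%': 1, '40-50%': 1}
```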

B8. YAML Inference Rules Validation

Setup: Add custom rule for "soccer practice"
Input: Document mentioning "Sully soccer practice Saturday 9am"
Expected:
- Custom rule triggers
- Routed to: sully
- Confidence: 0.95
Validation: Rule system extensible
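
The custom rule in B8 might look like the following once loaded from YAML. The field names (`name`, `keywords`, `route_to`, `confidence`) are hypothetical; the real rule schema may differ, so treat this as a sketch of the expected behavior only.

```python
# Assumed in-memory form of a rule after the YAML file is parsed.
RULES = [
    {
        "name": "soccer_practice",
        "keywords": ["soccer practice"],
        "route_to": ["sully"],
        "confidence": 0.95,
    },
]


def apply_rules(text, rules):
    """Return the first rule whose keywords all appear in the text, else None."""
    lowered = text.casefold()
    for rule in rules:
        if all(keyword.casefold() in lowered for keyword in rule["keywords"]):
            return {
                "rule": rule["name"],
                "route_to": rule["route_to"],
                "confidence": rule["confidence"],
            }
    return None


print(apply_rules("Sully soccer practice Saturday 9am", RULES))
# {'rule': 'soccer_practice', 'route_to': ['sully'], 'confidence': 0.95}
```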


Category C: Recipe Ecosystem

C1. Recipe URL Extraction

Input: Send URL to Icarus bot: https://www.allrecipes.com/recipe/12345/chicken-parmesan
Action: /recipe <url> or auto-detect
Expected:
- Ingredients extracted
- Temp grocery list created (5-min TTL)
- Inline keyboard: toggle ingredients
Validation: Ingredients match recipe
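
Since C5 refers to sites without ld+json, the primary extraction path presumably reads schema.org Recipe data from JSON-LD. A minimal standard-library sketch of that path follows; the class and function names are hypothetical, and returning `None` is an assumed signal for the caller to try the LLM fallback.

```python
import json
from html.parser import HTMLParser


class LdJsonCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_ld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._in_ld:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_ld = False


def extract_ingredients(html):
    """Return the recipeIngredient list from JSON-LD, or None if absent."""
    collector = LdJsonCollector()
    collector.feed(html)
    collector.close()
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict):
            nodes = data.get("@graph", [data])
        elif isinstance(data, list):
            nodes = data
        else:
            continue
        for node in nodes:
            if not isinstance(node, dict):
                continue
            types = node.get("@type", [])
            types = [types] if isinstance(types, str) else types
            if "Recipe" in types:
                return list(node.get("recipeIngredient", []))
    return None
```

Comparing this output against the ingredient list shown on the recipe page is one way to perform the C1 validation step.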

C2. Recipe Toggle UI

Input: Recipe loaded, toggle buttons displayed
Action: Click toggle buttons to deselect items already in pantry
Expected:
- Visual state updates (✓ → ○)
- Temp state saved per item
- State persists across the Telegram session
Validation: Toggle state consistent
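
The toggle behavior in C2, combined with the 5-minute TTL from C1, can be modeled as a small in-memory object. This is a sketch under assumptions: the class name, the injectable clock, and raising `TimeoutError` on expiry are all illustrative choices, not the bot's actual implementation.

```python
import time


class TempRecipeList:
    """Temporary per-recipe selection state with an expiry time."""

    def __init__(self, ingredients, ttl_seconds=300, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + ttl_seconds
        # Every ingredient starts selected; users deselect pantry items.
        self.selected = {name: True for name in ingredients}

    def expired(self):
        return self._clock() >= self._expires_at

    def toggle(self, name):
        if self.expired():
            raise TimeoutError("temporary list expired")
        self.selected[name] = not self.selected[name]
        return self.selected[name]

    def render(self):
        # One label per inline-keyboard button.
        return [f"{'✓' if on else '○'} {name}" for name, on in self.selected.items()]

    def committed_items(self):
        return [name for name, on in self.selected.items() if on]
```

Passing a fake clock makes the expiry path testable without waiting five minutes.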

C3. Grocery List Commit

Input: Recipe toggles configured
Action: Click "Commit to Grocery List"
Expected:
- Selected items added to persistent grocery list
- Items auto-zoned (ChromaDB matching)
- Confirmation message with zone breakdown
Validation: Grocery list updated, zones assigned

C4. Multi-Recipe Grocery Management

Setup: Add 3 recipes, commit ingredients from all
Action: /groceries command
Expected:
- Combined grocery list displayed
- Zone-optimized sorting
- Clear button functional
Validation: No duplicate items, zones correct
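
The "no duplicate items, zones correct" check in C4 can be expressed as a merge followed by a zone-ordered sort. In this sketch a plain dictionary stands in for the ChromaDB zone matching, and `ZONE_ORDER` is a made-up store layout; both are assumptions for illustration.

```python
# Hypothetical walking order through the store.
ZONE_ORDER = ["produce", "meat", "dairy", "pantry"]


def merge_and_sort(recipes, zone_of):
    """Merge ingredient lists, drop case-insensitive duplicates, sort by zone."""
    merged = {}
    for items in recipes:
        for item in items:
            name = item.strip()
            # Keep the first spelling seen for each ingredient.
            merged.setdefault(name.casefold(), name)

    def sort_key(name):
        zone = zone_of.get(name.casefold(), "unknown")
        rank = ZONE_ORDER.index(zone) if zone in ZONE_ORDER else len(ZONE_ORDER)
        return (rank, name.casefold())

    return sorted(merged.values(), key=sort_key)


recipes = [["Chicken", "Garlic"], ["garlic", "Milk"], ["Rice"]]
zones = {"chicken": "meat", "garlic": "produce", "milk": "dairy", "rice": "pantry"}
print(merge_and_sort(recipes, zones))  # ['Garlic', 'Chicken', 'Milk', 'Rice']
```

Items with no known zone sort to the end, which makes unzoned entries easy to spot during validation.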

C5. Recipe with Non-Standard Site

Input: Recipe from unsupported site (no ld+json)
Action: LLM fallback extraction
Expected:
- Ingredients extracted via LLM
- Confidence displayed
- Manual verification suggested
Validation: Graceful degradation


Category D: System State & Dashboard

D1. System State Real-Time View

URL: https://icarus-test.hoffdesk.com/system/state
Tests:
- Family configuration displays correctly (4 members, 7 rules)
- Confidence distribution bar chart renders
- Recent routing decisions table shows last 10
- HTMX polling updates every 30s (watch for changes)
Validation: No CSS errors, mobile responsive

D2. JSON API Endpoint

URL: https://icarus-test.hoffdesk.com/system/state?format=json
Expected:
- Valid JSON response
- Family array populated
- Rules array populated
- Stats object present
Validation: JSON schema matches spec
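
A quick structural check for D2 is sketched below. The top-level key names `family`, `rules`, and `stats` are guesses based on the expectations listed above; substitute the names from the actual spec. The function takes the response body as a string, so it can be fed from any HTTP client.

```python
import json


def check_state_payload(raw):
    """Return a list of problems found in the response body; empty means OK."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return ["top-level value is not an object"]

    problems = []
    for key in ("family", "rules"):  # assumed key names
        value = payload.get(key)
        if not isinstance(value, list) or not value:
            problems.append(f"'{key}' must be a non-empty array")
    if not isinstance(payload.get("stats"), dict):  # assumed key name
        problems.append("'stats' must be an object")
    return problems
```

An empty list from `check_state_payload` covers the first four expectations; a full schema comparison would still be needed for the final validation step.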

D3. Dashboard Under Load

Test: Send 5 documents in rapid succession
Action: Watch dashboard during processing
Expected:
- Routing decisions appear in real-time
- No server errors
- HTMX updates smooth
Validation: Performance acceptable


Category E: End-to-End Workflows

E1. The School Paper Workflow

Scenario: School sends field trip permission slip
Steps:
1. Photo sent to Icarus bot
2. Document routed to child
3. Briefing includes deadline, cost, action required
4. Parent marks done or creates calendar reminder
Success: Full loop completes in under 2 minutes

E2. The Weekly Meal Plan Workflow

Scenario: Plan meals for week
Steps:
1. Send 5 recipe URLs to bot
2. Toggle ingredients (skip staples)
3. Commit all to grocery list
4. View optimized Costco route
5. Shop with zone-sorted list
Success: Grocery trip efficient, no forgotten items

E3. The Conflict Resolution Workflow

Scenario: Double-booking detected
Steps:
1. Calendar event exists: "Dinner with parents Friday 6pm"
2. Email arrives: "School play Friday 6:30pm"
3. Conflict detected, notification sent
4. User selects "Reschedule dinner"
5. Original event moved, new event added
Success: No manual calendar editing required

E4. The Document Archive Workflow

Scenario: Receipt needs saving
Steps:
1. Photo sent to Icarus bot
2. OCR extracts text
3. Document classified (receipt/invoice)
4. Briefing generated
5. Document stored with metadata
6. Searchable via future Brain query (Phase 6)
Success: Document retrievable later


📊 Success Metrics

| Metric | Target | Measurement |
|---|---|---|
| Documents processed | ≥20 | Count in System State |
| Auto-routing accuracy | >90% | Correct assignment / Total |
| Manual corrections | <10% | User override count |
| Recipe extraction success | >95% | Ingredients match / Total |
| Grocery zone accuracy | >90% | Correct zone / Total items |
| System uptime | 100% | No staging outages |
| User satisfaction | Pass | Director subjective assessment |
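
The first three metrics can be computed directly from a log of routing decisions. The sketch below assumes each logged decision records whether the routing was correct and whether the user overrode it; the field names `correct` and `overridden` are hypothetical.

```python
def uat_metrics(decisions):
    """Summarize a list of routing decisions into the headline UAT numbers."""
    total = len(decisions)
    if total == 0:
        return {"documents": 0, "accuracy": 0.0, "correction_rate": 0.0}
    correct = sum(1 for d in decisions if d["correct"])
    overridden = sum(1 for d in decisions if d["overridden"])
    return {
        "documents": total,
        "accuracy": correct / total,          # Correct assignment / Total
        "correction_rate": overridden / total,  # User overrides / Total
    }


def meets_targets(metrics):
    """Apply the thresholds from the table: >=20 docs, >90% accuracy, <10% corrections."""
    return (
        metrics["documents"] >= 20
        and metrics["accuracy"] > 0.90
        and metrics["correction_rate"] < 0.10
    )
```

Note that the targets are strict inequalities: 18 correct out of 20 is exactly 90% and would not pass.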

🗓️ Daily UAT Schedule

Day 1-2: Setup & Baseline

  • [ ] Deploy latest Icarus staging
  • [ ] Verify all systems green
  • [ ] Run 5 controlled test documents
  • [ ] Baseline confidence scores

Day 3-5: Real-World Testing

  • [ ] Process real family documents as they arrive
  • [ ] Log all routing decisions
  • [ ] Note edge cases and failures
  • [ ] Recipe extraction daily use

Day 6-7: Analysis & Report

  • [ ] Compile metrics
  • [ ] Document failure modes
  • [ ] Identify Phase 5 blockers (if any)
  • [ ] Prepare Phase 6 recommendations

๐Ÿ› Known Issues (Watch For)

  1. OCR failures on poor lighting: note success/fail rate
  2. Confidence threshold edge cases: documents scoring 80-90%
  3. Recipe extraction on JavaScript-heavy sites: may fail
  4. Mobile UI responsiveness โ€” Test on iPhone/Android

✅ UAT Sign-Off Criteria

Phase 5 passes UAT when:
- [ ] 20+ documents processed
- [ ] >90% auto-routing accuracy
- [ ] <10% manual corrections required
- [ ] Director (Matt) confirms utility in daily use
- [ ] No critical bugs blocking daily workflows
- [ ] Phase 6 roadmap validated by real usage patterns


This plan evolves during testing. Update with learnings.