# Silent Observer — Brain Intelligence Integration Spec

**Status:** Draft v1.1 (Director Revisions)  
**Date:** 2026-04-30  
**Author:** Wadsworth (Chief of Staff)  
**Owner:** Socrates (Backend Architecture)  
**Scope:** Phase 8 Integration Layer

---

## Executive Summary

This document specifies the integration architecture between the **Silent Observer** (Phase 8 — ambient chat monitoring) and the **Brain Intelligence** (Phase 6/7 — RAG-based knowledge retrieval). Together, they enable zero-UI household coordination where:

- The Observer listens to family chat without interrupting
- The Brain provides memory context for decisions
- Icarus speaks only when there's a real conflict or a missing critical variable

**Non-Negotiable Constraints (Director-Level):**
1. **State Protection (No Auto-Writes):** All extractions require Human-in-the-Loop [Confirm]
2. **Asynchronous Processing:** Tier 1 releases chat thread immediately; Tier 2 is async
3. **The 'Redline' Speak Rule:** Silence is the default; speak only on temporal/resource conflicts or a missing critical variable
4. **Staging Environment Only:** Connects only to staging.db with Miller Family context

---

## Confidence Field Unification

**CRITICAL:** Confidence scores are NEVER multiplied or blended. Each stage has independent gates with clear thresholds.

### Stage Gates (Sequential, Not Multiplicative)

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                     CONFIDENCE GATE ARCHITECTURE                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  Stage 1: Tripwire Confidence                                               │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━                                               │
│  • Calculated by: Regex pattern matcher (Tier 1)                            │
│  • Threshold: ≥0.7 to proceed to extraction                                │
│  • Logged field: tripwire_confidence (independent, 0.0-1.0)                  │
│  • If <0.7: DROP — message not queued for Tier 2                            │
│                                                                             │
│                         ↓ tripwire_confidence ≥ 0.7                         │
│                                                                             │
│  Stage 2: Extraction Confidence                                             │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                                             │
│  • Calculated by: LLM extraction model (Tier 2)                             │
│  • Threshold: ≥0.6 for usable extraction                                    │
│  • Logged field: extraction_confidence (independent, 0.0-1.0)                │
│  • If <0.6: DROP — low-quality extraction, don't query Brain                 │
│                                                                             │
│                         ↓ extraction_confidence ≥ 0.6                       │
│                                                                             │
│  Stage 3: Brain Relevance Score                                             │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                                             │
│  • Calculated by: Brain RAG retriever                                       │
│  • Threshold: ≥0.5 to use Brain context in redline decision                │
│  • Logged field: brain_relevance (independent, 0.0-1.0)                      │
│  • If <0.5: Proceed without Brain context (extraction-only redline)          │
│                                                                             │
│                         ↓ brain_relevance ≥ 0.5 (optional)                  │
│                                                                             │
│  Stage 4: Redline Decision                                                  │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━                                               │
│  • Calculated by: Rule-based conflict engine (no ML)                       │
│  • Output: SILENT | SPEAK                                                  │
│  • Logged field: redline_decision, redline_trigger_rule                      │
│  • Does NOT use confidence scores — uses extracted entities only             │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Confidence Logging Schema

```sql
-- Each confidence is logged independently for debugging
CREATE TABLE observer_confidence_log (
    message_id TEXT NOT NULL,
    stage TEXT NOT NULL,                    -- 'tripwire' | 'extraction' | 'brain'
    confidence_type TEXT,                   -- 'regex_score' | 'llm_score' | 'relevance'
    score REAL NOT NULL,
    threshold REAL NOT NULL,
    passed BOOLEAN NOT NULL,
    decision TEXT NOT NULL,                 -- 'proceed' | 'drop' | 'proceed_no_context'
    logged_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

### Anti-Pattern: NEVER Do This

```python
# WRONG — blending scores creates debugging nightmare
final_confidence = tripwire_score * extraction_score * brain_score
if final_confidence > 0.5:  # What failed? Who knows.
    ...

# CORRECT — each gate independent, clear failure point
if tripwire_score < 0.7:
    log("TRIPWIRE_REJECT", score=tripwire_score, threshold=0.7)
    return DROP
    
if extraction_score < 0.6:
    log("EXTRACTION_REJECT", score=extraction_score, threshold=0.6)
    return DROP
    
# Brain relevance is optional — proceed with or without
use_brain = brain_relevance >= 0.5
```

---

## Multi-Message Context Architecture

**Status:** P0 Priority — Moved to Phase 8.3 (was deferred)

**Why:** 60% of real coordination happens across multiple messages. Single-message extraction misses critical context.

### The Problem

```
Message A (09:00): "Who's picking up Leo Thursday after soccer?"
Message B (09:15): "I'll do it"
                      ↑
              Who is "I"? What activity? What time?
              Without Message A, Message B is unresolvable.
```

### Context Window Specification

```python
MULTI_MESSAGE_CONFIG = {
    "context_window_size": 3,           # Last 3 messages in thread
    "context_ttl_seconds": 300,         # Messages expire after 5 minutes
    "attribution_window": 600,          # Look back 10 min for matching context
    "max_participants": 4,              # Family group size limit
}
```

### Attribution Logic

```python
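import re

# Word-boundary match so "me" does not fire inside "time" or "meeting"
SELF_ASSIGN_PATTERN = re.compile(r"\b(i'll|i will|i can|i'm|i am|me)\b")


def resolve_attribution(current_message, context_window):
    """
    Match "I'll do it" to the action in prior messages.
    Returns the (possibly enriched) extraction; `assigned_to` stays
    "unspecified" when no prior message can be matched.
    """

    # Step 1: Extract from current message
    current = extract(current_message)
    if current.get("assigned_to") not in (None, "unspecified"):
        # Already has assignment (e.g., "John will pick up")
        return current

    # Step 2: Look for first-person self-assignment in current message
    text = current_message.text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    if not SELF_ASSIGN_PATTERN.search(text):
        # Not an assignment message
        return current

    # Step 3: Search context window (newest first) for matching coordination
    for prior in reversed(context_window):
        prior_extracted = prior.get("extraction", {})

        # Match criteria:
        # 1. Prior has missing assignment (who's/who is/can someone)
        # 2. Same temporal scope (date overlap), or current names no date
        # 3. Same activity or child, or current names neither
        # A bare reply like "I'll do it" carries no date/activity/child, so
        # missing fields mean "inherit from prior", not "mismatch".
        dates_ok = not current.get("dates") or dates_overlap(
            current.get("dates"), prior_extracted.get("dates"))
        subject_ok = (
            (not current.get("activity") and not current.get("child"))
            or activities_match(current.get("activity"), prior_extracted.get("activity"))
            or children_match(current.get("child"), prior_extracted.get("child"))
        )
        if prior_extracted.get("assigned_to") == "unspecified" and dates_ok and subject_ok:
            # Attribution found: inherit the fields the reply left out
            for key in ("dates", "times", "child", "activity", "location"):
                if not current.get(key):
                    current[key] = prior_extracted.get(key)
            current["assigned_to"] = current_message.sender  # "I" = sender of Message B
            current["attributed_to_message_id"] = prior["message_id"]
            current["attribution_confidence"] = 0.85
            return current

    # No match found: extraction incomplete
    current["assigned_to"] = "unspecified"
    current["attribution_confidence"] = 0.3
    return current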
```
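`resolve_attribution` relies on three matching helpers the spec does not define. A plausible sketch, assuming values may arrive as `None`, a single string, or a list (the extraction schema uses lists while the worked examples use strings), and assuming `"both"` covers every child; all three helpers here are hypothetical.

```python
def _as_set(value) -> set:
    """Normalize None / str / list into a set of lowercase strings."""
    if not value:
        return set()
    if isinstance(value, str):
        value = [value]
    return {v.strip().lower() for v in value if v}


def dates_overlap(a, b) -> bool:
    """True when two ISO date values ("2026-05-07") share at least one day."""
    return bool(_as_set(a) & _as_set(b))


def activities_match(a, b) -> bool:
    """Case-insensitive equality; a missing activity never matches."""
    return bool(_as_set(a) & _as_set(b))


def children_match(a, b) -> bool:
    """Match child names; 'both' is assumed to cover every child."""
    sa, sb = _as_set(a), _as_set(b)
    if not sa or not sb:
        return False
    if "both" in sa | sb:
        return True
    return bool(sa & sb)
```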

### Context-Aware Tripwire

```python
# Tripwire now operates on message thread, not just single message
def tripwire_with_context(message, context_window):
    # Score current message alone
    base_score = pattern_match_score(message.text)
    in_thread = is_coordination_thread(context_window)

    # Boost score if prior messages asked about assignments
    if in_thread:
        base_score = min(1.0, base_score + 0.15)

    # Additional boost if current message is a short reply in that thread.
    # The two boosts stack: a short reply in a coordination thread gains +0.35.
    if in_thread and len(message.text.split()) <= 5:  # "I'll do it", "yeah me", "sure"
        base_score = min(1.0, base_score + 0.20)

    return base_score
```
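`pattern_match_score` and `is_coordination_thread` are left undefined above. A minimal sketch, assuming the score is the weighted share of the Tier 1 pattern categories that fire and that context-window entries are dicts with a `text` key; the regexes, weights, and the "question about an assignment" heuristic are all hypothetical and would need tuning against real staging traffic.

```python
import re

# Hypothetical category patterns, loosely based on the Tier 1 pattern categories
PATTERNS = {
    "temporal": re.compile(
        r"\b(today|tomorrow|tonight|next week|(?:mon|tues|wednes|thurs|fri|satur|sun)day"
        r"|\d{1,2}(?::\d{2})?\s?(?:am|pm))\b", re.I),
    "assignment": re.compile(r"\b(pick(?:ing)? up|drop(?:ping)? off|cover|who's|who is|can you|i'll|i will)\b", re.I),
    "children": re.compile(r"\b(leo|mia|kids|children)\b", re.I),
    "activities": re.compile(r"\b(soccer|ballet|swim|chess|practice)\b", re.I),
}
WEIGHTS = {"temporal": 0.35, "assignment": 0.35, "children": 0.15, "activities": 0.15}

ASSIGNMENT_QUESTION = re.compile(r"\b(who's|who is|can someone|can anyone|can you)\b", re.I)


def pattern_match_score(text: str) -> float:
    """Sum the weights of categories with at least one match (0.0-1.0)."""
    return round(sum(w for name, w in WEIGHTS.items() if PATTERNS[name].search(text)), 2)


def is_coordination_thread(context_window: list) -> bool:
    """True if any prior message in the window asks who is handling something."""
    return any(ASSIGNMENT_QUESTION.search(m.get("text", "")) for m in context_window)
```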

### Database Schema Updates

```sql
-- Migration for databases created before multi-message context (v1.1).
-- Fresh installs get these columns from the full observer_messages schema.
ALTER TABLE observer_messages ADD COLUMN thread_id TEXT;
ALTER TABLE observer_messages ADD COLUMN context_window TEXT; -- JSON array of prior message_ids
ALTER TABLE observer_messages ADD COLUMN attribution_source TEXT; -- message_id this was resolved from

-- Index for thread lookups (IF NOT EXISTS: the full schema creates it too)
CREATE INDEX IF NOT EXISTS idx_observer_thread ON observer_messages(thread_id, sent_at);
```
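SQLite has no `ADD COLUMN IF NOT EXISTS`, so re-running the `ALTER TABLE` statements fails with a duplicate-column error. A sketch of an idempotent version using Python's standard `sqlite3` module, assuming staging.db is SQLite (the schema's `AUTOINCREMENT` syntax suggests so); `migrate_observer_messages` is a hypothetical name.

```python
import sqlite3

NEW_COLUMNS = {
    "thread_id": "TEXT",
    "context_window": "TEXT",      # JSON array of prior message_ids
    "attribution_source": "TEXT",  # message_id this was resolved from
}


def migrate_observer_messages(conn: sqlite3.Connection) -> list:
    """Add any missing threading columns plus the thread index; return the columns added."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
    existing = {row[1] for row in conn.execute("PRAGMA table_info(observer_messages)")}
    added = []
    for name, col_type in NEW_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE observer_messages ADD COLUMN {name} {col_type}")
            added.append(name)
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_observer_thread "
        "ON observer_messages(thread_id, sent_at)"
    )
    conn.commit()
    return added
```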

---

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                           FAMILY CHAT (Telegram Group)                     │
│                     Members: John, Sarah, Icarus (bot)                     │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│  TIER 1: TRIPWIRE (Python/Regex)                                            │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                                              │
│  • Latency: <10ms                                                            │
│  • Resource: CPU only (Beelink)                                              │
│  • Pattern match: dates, times, coordination keywords                        │
│  • Context-aware: Scans last 3 messages for thread continuity              │
│  • Immediate release of chat thread                                          │
│                                                                              │
│  Pattern Categories:                                                        │
│  ├── Temporal: "tomorrow", "June 4th", "next week", "Tuesday 3pm"          │
│  ├── Assignment: "I'll pick up", "can you cover", "who's getting"            │
│  ├── Pronoun Resolution: "I'll do it", "me too", "yeah" (needs context)   │
│  ├── Children: "Leo", "Mia", "kids", "the children"                        │
│  ├── Activities: "soccer", "ballet", "swim", "chess", "practice"          │
│  └── Conflict markers: "but", "wait", "doesn't", "conflict", "overlap"     │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                    ┌─────────────────┴─────────────────┐
                    ▼                                   ▼
              NO MATCH (≥95%)                    MATCH (≤5%)
         ┌─────────────────────┐              ┌─────────────────────┐
         │   Drop Silently     │              │ Queue for Tier 2    │
         │   Log to observer_  │              │ (Redis/Queue)       │
         │   messages table    │              │ Return 200 OK to    │
         │   (no processing)   │              │ Telegram            │
         └─────────────────────┘              └─────────────────────┘
                                                      │
                                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│  TIER 2: ASYNC PROCESSING (8B/14B LLM + Brain Query)                        │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                            │
│  • Worker: Celery/Background task (Beelink)                                  │
│  • LLM: phi4:14b or llama3.1:8b via Gaming PC (Tailscale)                  │
│  • Latency: 1-3s (acceptable — async)                                        │
│  • Brain Query: HTTPS to icarus-test.hoffdesk.com/brain/query                │
│  • Context: Fetches last 3 messages for multi-message resolution             │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│  COORDINATION STATE EXTRACTOR                                              │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━                                              │
│                                                                              │
│  Input: Chat message + Context window (last 3 messages) + Brain context   │
│  Output: Structured extraction with independent confidence scores            │
│                                                                              │
│  {                                                                           │
│    "is_coordination": true | false,                                        │
│    "coordination_type": "transport" | "care_coverage" |                    │
│                         "schedule_change" | "activity_confirm" | null,     │
│    "dates": ["2026-06-04", "2026-06-05"],                                │
│    "times": ["15:30", "16:00"],                                          │
│    "assigned_to": ["john" | "sarah" | "unspecified"],                     │
│    "child": ["leo" | "mia" | "both" | null],                              │
│    "activity": "soccer" | "ballet" | "swim" | "chess" | null,              │
│    "location": "Westside Park" | "Madison Dance Academy" | null,          │
│    "action_required": "confirm" | "resolve_conflict" | "notify" | null,  │
│    "extracted_entities": [...],                                            │
│    "attribution": {                                                        │
│      "source_message_id": "...",                                          │
│      "attribution_confidence": 0.85                                        │
│    },                                                                        │
│    "brain_query_context": {...} | null,                                   │
│    "confidence_scores": {                                                   │
│      "tripwire_confidence": 0.85,                                          │
│      "extraction_confidence": 0.72,                                        │
│      "brain_relevance": 0.68                                               │
│    }                                                                         │
│  }                                                                           │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│  REDLINE DECISION ENGINE                                                   │
│  ━━━━━━━━━━━━━━━━━━━━━━━━                                                  │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │  RULE 1: Temporal Conflict                                          │   │
│  │  ━━━━━━━━━━━━━━━━━━━━━━━━━━━                                        │   │
│  │  Condition: Extracted date/time overlaps with existing Event Graph   │   │
│  │            entry for same child + overlapping times                  │   │
│  │  Confidence gate: extraction_confidence ≥ 0.6 (already verified)    │   │
│  │                                                                     │   │
│  │  Action: SPEAK — "⚠️ Leo has soccer at 4:30pm Thursday, but this     │   │
│  │          message mentions a dentist appointment at 4:00pm.           │   │
│  │          Which is correct?"                                         │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │  RULE 2: Resource Conflict                                          │   │
│  │  ━━━━━━━━━━━━━━━━━━━━━━━━━━━                                        │   │
│  │  Condition: Both parents assigned to conflicting tasks at same time│   │
│  │  Confidence gate: extraction_confidence ≥ 0.6 (already verified)    │   │
│  │                                                                     │   │
│  │  Action: SPEAK — "🚨 John and Sarah both said they're covering      │   │
│  │          Tuesday 3pm pickup. Who's getting the kids?"             │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │  RULE 3: Missing Critical Variable                                  │   │
│  │  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                                    │   │
│  │  Condition: Coordination detected but 'assigned_to' or 'child'       │   │
│  │            is null AND activity is time-sensitive (<24h)           │   │
│  │  Confidence gate: extraction_confidence ≥ 0.6 (already verified)    │   │
│  │                                                                     │   │
│  │  Action: SPEAK — "⏳ I see 'someone' needs to cover Thursday 3pm    │   │
│  │          for Leo's early dismissal. Who's handling pickup?"         │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │  RULE 4: All Other Cases                                            │   │
│  │  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━                                       │   │
│  │  Condition: No conflict, no missing critical variable                │   │
│  │                                                                     │   │
│  │  Action: SILENT — Store extraction as 'pending' with [Confirm]       │   │
│  │          button (no chat message)                                   │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                    ┌─────────────────┴─────────────────┐
                    ▼                                   ▼
              ┌─────────┐                         ┌─────────────┐
              │  SILENT │                         │    SPEAK    │
              └────┬────┘                         └──────┬──────┘
                   │                                    │
                   ▼                                    ▼
┌──────────────────────────┐            ┌─────────────────────────────────────┐
│  STORE AS PENDING         │            │  POST TO GROUP CHAT               │
│  ━━━━━━━━━━━━━━━━━━━━━━━━ │            │  ━━━━━━━━━━━━━━━━━━━━━━━            │
│                           │            │                                     │
│  Table: pending_actions   │            │  Message includes:                │
│  Status: 'pending'        │            │  • Clear conflict description       │
│  UI: Telegram inline      │            │  • Suggested resolution             │
│      [Confirm] button     │            │  • [Resolve] inline button          │
│                           │            │                                     │
│  Human can review via:    │            │  Example:                          │
│  • /pending command       │            │  "⚠️ Conflict detected:             │
│  • Dashboard              │            │   Leo has soccer 4:30pm Tuesday,    │
│                           │            │   but message mentions chess club   │
│  [Confirm] →              │            │   at 4:00pm. Both?"                 │
│    writes to Event Graph │            │                                     │
└──────────────────────────┘            └─────────────────────────────────────┘
```
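The four rules are plain comparisons over extracted entities, so the engine can be sketched without any model. A minimal sketch, assuming extractions, Event Graph entries, and earlier claims are dicts with `date`, `start`/`end` (`datetime`), `child`, `assigned_to`, and `task` keys; those field names and the `decide_redline` signature are assumptions, not the production interface. Rule 2 is modelled on Example 4 (two different people claiming the same task on the same date).

```python
from datetime import datetime, timedelta

UNSET = (None, "", "unspecified")


def _overlaps(a_start, a_end, b_start, b_end) -> bool:
    # Half-open intervals [start, end) overlap when each starts before the other ends
    return a_start < b_end and b_start < a_end


def decide_redline(extraction: dict, events: list, claims: list, now: datetime) -> tuple:
    """Return ('SPEAK', rule) or ('SILENT', None). Uses entities only, no confidence scores."""
    start, end = extraction.get("start"), extraction.get("end")

    # RULE 1: temporal conflict with an Event Graph entry for the same child
    if start and end:
        for event in events:
            if event["child"] == extraction.get("child") and _overlaps(start, end, event["start"], event["end"]):
                return "SPEAK", "temporal_conflict"

    # RULE 2: resource conflict (different people claim the same task on the same date)
    mine = extraction.get("assigned_to")
    for claim in claims:
        if (claim["date"] == extraction.get("date") and claim["task"] == extraction.get("task")
                and claim["assigned_to"] not in UNSET and mine not in UNSET
                and claim["assigned_to"] != mine):
            return "SPEAK", "resource_conflict"

    # RULE 3: missing critical variable on something starting within 24h
    missing = mine in UNSET or extraction.get("child") in UNSET
    if missing and start and timedelta(0) <= start - now < timedelta(hours=24):
        return "SPEAK", "missing_variable"

    # RULE 4: everything else stays silent and goes to pending_actions
    return "SILENT", None
```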

---

## Brevity Constraint (Cross-Reference Daedalus UX Specs)

**Source:** Daedalus UX Design System — Conversational Agents v2.1

### Maximum Message Lengths

| Message Type | Max Length | Rationale |
|--------------|------------|-----------|
| Conflict alert | 280 chars | Fits in single Telegram bubble |
| Clarification request | 200 chars | Quick to read, easy to answer |
| Confirmation summary | 140 chars | Twitter-length, scannable |

### Brevity Patterns

```python
BREVITY_TEMPLATES = {
    "temporal_conflict": "⚠️ {child} has {activity} at {time}, but message says {conflict}. Which is correct?",
    "resource_conflict": "🚨 Both {parent1} and {parent2} claim {task} on {day}. Who's covering?",
    "missing_assignment": "⏳ Someone needs to cover {child}'s {activity} on {day}. Who?",
    "missing_child": "⏳ {activity} mentioned for {day} — which child?",
}
```
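The templates alone do not enforce the length limits in the table. A minimal sketch of a renderer that fills a template and refuses to emit anything over its limit; the template-to-message-type mapping and raising (rather than truncating) are assumptions. Note that `len()` counts Unicode code points, so `⚠️` (U+26A0 plus a variation selector) counts as 2.

```python
BREVITY_TEMPLATES = {
    "temporal_conflict": "⚠️ {child} has {activity} at {time}, but message says {conflict}. Which is correct?",
    "resource_conflict": "🚨 Both {parent1} and {parent2} claim {task} on {day}. Who's covering?",
    "missing_assignment": "⏳ Someone needs to cover {child}'s {activity} on {day}. Who?",
    "missing_child": "⏳ {activity} mentioned for {day} — which child?",
}

# Assumed mapping from template to message-type limit (chars)
TEMPLATE_LIMITS = {
    "temporal_conflict": 280,   # conflict alert
    "resource_conflict": 280,   # conflict alert
    "missing_assignment": 200,  # clarification request
    "missing_child": 200,       # clarification request
}


def render_alert(template_key: str, **fields: str) -> str:
    """Fill a brevity template; raise instead of posting an over-length message."""
    text = BREVITY_TEMPLATES[template_key].format(**fields)
    limit = TEMPLATE_LIMITS[template_key]
    if len(text) > limit:
        raise ValueError(f"{template_key} is {len(text)} chars; limit is {limit}")
    return text
```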

### Anti-Patterns (Never Do)

```
❌ "I noticed that in your message at 9:15 AM, you mentioned..."
❌ "Based on my analysis of the conversation history..."
❌ "It appears there may be a potential scheduling conflict..."
❌ Multi-paragraph explanations

✅ "⚠️ Leo has soccer 4:30pm Thursday. Message says dentist 4pm. Which?"
✅ "🚨 Both John and Sarah claim Tuesday pickup. Who's covering?"
✅ "⏳ Someone needs to cover Leo Thursday 3pm. Who?"
```

---

## API Contract: Observer ↔ Brain

### 1. Brain Query Endpoint

**Current:** `https://icarus-test.hoffdesk.com/brain/query?q={question}`

**For Observer Integration:** Extend with structured query support

```http
GET /brain/query?q={question}&context=observer&format=json

Headers:
  X-Observer-Request: true
  X-Family-Context: miller  # staging only

Response (current):
{
  "answer": "Leo's soccer practice is Tuesdays and Thursdays at 4:30 PM...",
  "sources": [...],
  "confidence": "high"
}

Response (extended for Observer):
{
  "answer": "...",
  "sources": [...],
  "confidence": "high",
  "relevance_score": 0.769,                # ← Used for brain_relevance gate
  "temporal_entities": [
    {"type": "time", "value": "16:30", "context": "soccer practice"},
    {"type": "day", "value": "Tuesday", "context": "recurring"}
  ],
  "extracted_events": [
    {
      "summary": "Leo Soccer Practice",
      "start": "2026-05-05T16:30:00",
      "location": "Westside Park"
    }
  ]
}
```
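On the Observer side, the call reduces to building the query string, sending the two headers, and reading `relevance_score` from the extended response. A minimal standard-library sketch; `build_brain_request`, `parse_brain_response`, and `query_brain` are hypothetical names, `relevance_score` is the proposed extension rather than something the current endpoint returns, and treating its absence as `0.0` (a failed gate) is an assumption.

```python
import json
import urllib.parse
import urllib.request

BRAIN_BASE_URL = "https://icarus-test.hoffdesk.com/brain/query"


def build_brain_request(question: str) -> urllib.request.Request:
    query = urllib.parse.urlencode({"q": question, "context": "observer", "format": "json"})
    return urllib.request.Request(
        f"{BRAIN_BASE_URL}?{query}",
        headers={"X-Observer-Request": "true", "X-Family-Context": "miller"},  # staging only
        method="GET",
    )


def parse_brain_response(body: str, threshold: float = 0.5) -> dict:
    data = json.loads(body)
    # A current-format response has no relevance_score; treat it as failing the gate
    relevance = float(data.get("relevance_score", 0.0))
    return {"brain_relevance": relevance,
            "brain_context": data if relevance >= threshold else None}


def query_brain(question: str, timeout_s: float = 5.0) -> dict:
    # timeout_s mirrors query_timeout_ms = 5000
    with urllib.request.urlopen(build_brain_request(question), timeout=timeout_s) as resp:
        return parse_brain_response(resp.read().decode("utf-8"))
```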

### 2. Observer → Brain Query Patterns

The Observer queries the Brain for context before making Redline decisions:

| Observer Detected | Brain Query | Purpose |
|-------------------|-------------|---------|
| "Leo soccer Thursday" | "When is Leo's soccer practice?" | Verify against known schedule |
| "Mia has ballet Monday" | "What days does Mia have ballet?" | Check for conflicts |
| "pick up tomorrow 3pm" | "Who usually picks up the kids on [day]?" | Resource assignment history |
| "early dismissal Friday" | "Any events on Friday involving school?" | Temporal conflict check |
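The table reads as a set of question templates keyed on which entities the extraction contains. A minimal sketch, assuming `child` is a single lowercase string and dates are ISO strings under `date` or `dates`; the templates come from the table, while the selection order and issuing the school query for any dated message (the table ties it to early dismissal) are assumptions. The result is capped at `max_queries_per_message`.

```python
from datetime import date

MAX_QUERIES_PER_MESSAGE = 3  # from OBSERVER_BRAIN_CONFIG


def _day_name(extraction: dict):
    """Weekday name for the first extracted date, or None."""
    raw = extraction.get("date") or (extraction.get("dates") or [None])[0]
    return date.fromisoformat(raw).strftime("%A") if raw else None


def build_brain_queries(extraction: dict) -> list:
    """Turn extracted entities into Brain questions, most specific first."""
    child = (extraction.get("child") or "").title()
    activity = extraction.get("activity")
    day = _day_name(extraction)
    queries = []
    if child and activity:
        # Verify against the known schedule
        queries.append(f"When is {child}'s {activity} practice?")
    if day and (extraction.get("assigned_to") in (None, "unspecified")
                or extraction.get("task") == "pickup"):
        # Resource assignment history
        queries.append(f"Who usually picks up the kids on {day}?")
    if day:
        # Temporal conflict check
        queries.append(f"Any events on {day} involving school?")
    return queries[:MAX_QUERIES_PER_MESSAGE]
```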

### 3. Brain Query Constraints

```python
# Observer-specific query limits
OBSERVER_BRAIN_CONFIG = {
    "max_queries_per_message": 3,      # Prevent query spam
    "query_timeout_ms": 5000,          # Fail fast if Brain slow
    "cache_ttl_seconds": 60,           # Cache recent queries
    "min_relevance_threshold": 0.5,    # Gate threshold (logged as brain_relevance)
    "staging_only": True,              # Enforce staging environment
}
```
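`max_queries_per_message` and `cache_ttl_seconds` interact; in this sketch a cache hit does not spend the per-message budget. It assumes an in-process dict cache keyed on question text (a shared store would be needed across Celery workers); `BrainQueryBudget` is a hypothetical name and `fetch` stands in for the HTTP call, which is where the timeout would live.

```python
import time
from typing import Callable

OBSERVER_BRAIN_CONFIG = {
    "max_queries_per_message": 3,
    "query_timeout_ms": 5000,
    "cache_ttl_seconds": 60,
    "min_relevance_threshold": 0.5,
    "staging_only": True,
}


class BrainQueryBudget:
    """Per-message query budget in front of a TTL cache (hypothetical helper)."""

    def __init__(self, fetch: Callable[[str], dict], config: dict = OBSERVER_BRAIN_CONFIG,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch            # stands in for the HTTP call to /brain/query
        self._config = config
        self._clock = clock            # injectable so the TTL can be tested
        self._cache = {}               # question -> (fetched_at, answer)

    def run(self, questions: list) -> list:
        """Answer one message's questions; None marks a query skipped by the budget."""
        results = []
        network_calls = 0
        now = self._clock()
        for question in questions:
            cached = self._cache.get(question)
            if cached and now - cached[0] < self._config["cache_ttl_seconds"]:
                results.append(cached[1])      # cache hits do not spend the budget
                continue
            if network_calls >= self._config["max_queries_per_message"]:
                results.append(None)
                continue
            answer = self._fetch(question)
            network_calls += 1
            self._cache[question] = (now, answer)
            results.append(answer)
        return results
```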

---

## Data Flow: Complete Walkthrough

### Example 1: Conflict Detection (Speak)

**Chat Message (Sarah → Group):**
> "I'll pick up Leo from school tomorrow at 3pm and take him to soccer"

**Step-by-Step:**

1. **Tripwire Match** (Tier 1)
   - Patterns: "pick up", "tomorrow", "3pm", "soccer", "Leo"
   - tripwire_confidence: 0.85 (≥ 0.7 threshold ✓)
   - Action: Queue for Tier 2

2. **Async Processing** (Tier 2)
   - Extract: `{"date": "2026-05-01", "time": "15:00", "child": "leo", "activity": "soccer", "assigned_to": "sarah"}`
   - extraction_confidence: 0.78 (≥ 0.6 threshold ✓)

3. **Brain Query**
   - Query: "What time is Leo's soccer practice?"
   - Response: "Leo's soccer practice is Tuesdays and Thursdays at 4:30 PM at Westside Park"
   - brain_relevance: 0.82 (≥ 0.5 threshold ✓, use context)

4. **Redline Check**
   - Message says: 3pm pickup, soccer implied
   - Brain says: Soccer is 4:30pm (1.5h later)
   - No direct conflict, BUT time gap is suspicious

5. **Decision: SPEAK** (Rule 3 variant — clarification needed)
   ```
   "⏳ Leo pickup 3pm for soccer, but practice is 4:30pm. Different activity?"
   ```

---

### Example 2: Tripwire Drop (Not Coordination)

**Chat Message (John → Group):**
> "Got the oil changed today. Next due in July."

**Step-by-Step:**

1. **Tripwire Match**
   - Patterns: "today", "July", "oil changed"
   - tripwire_confidence: 0.65 (< 0.7 threshold ✗)
   - Action: DROP — not coordination-related

---

### Example 3: Multi-Message Context Resolution (Silent)

**Message A (Sarah, 09:00):**
> "Who's picking up Leo Thursday after soccer?"

**Message B (John, 09:15):**
> "I'll do it"

**Step-by-Step:**

1. **Message A Tripwire**
   - tripwire_confidence: 0.88 (question about assignment)
   - Queued for Tier 2
   - Extraction: `{"date": "2026-05-07", "child": "leo", "activity": "soccer", "assigned_to": "unspecified"}`
   - Stored with thread_id

2. **Message B Tripwire (Context-Aware)**
   - Base score: 0.45 (just "I'll do it", low confidence alone)
   - Thread boost: +0.15 (Message A asked about an assignment)
   - Short-reply boost: +0.20 (≤5 words in a coordination thread)
   - Final tripwire_confidence: 0.80 (≥ 0.7 threshold ✓)
   - Action: Queue for Tier 2 with context window

3. **Tier 2 with Context**
   - Fetches Message A via thread_id
   - Attribution logic: John (sender of B) → assignment from A
   - Final extraction: `{"date": "2026-05-07", "child": "leo", "activity": "soccer", "assigned_to": "john", "attribution_confidence": 0.85}`

4. **Brain Query**
   - Query: "When is Leo's soccer practice?"
   - brain_relevance: 0.91 (confirms Thursday 4:30pm)

5. **Redline Check**
   - Assigned to: John
   - Brain context: No conflict detected
   - Decision: SILENT

6. **Store as Pending**
   - Type: `care_assignment`
   - Summary: "John assigned to pick up Leo after soccer Thursday (starts 4:30pm)"
   - [Confirm] button available

---

### Example 4: Resource Conflict (Speak)

**Chat Message 1 (John, 09:00):**
> "I can cover Tuesday pickup"

**Chat Message 2 (Sarah, 09:15):**
> "I'll get the kids Tuesday after work"

**Step-by-Step:**

1. **Tripwire Matches**
   - Both messages match assignment patterns
   - tripwire_confidence: 0.79, 0.82
   - Both queued for Tier 2

2. **Async Processing**
   - Extract John: `{"date": "2026-05-05", "assigned_to": "john", "task": "pickup"}`
     - extraction_confidence: 0.74
   - Extract Sarah: `{"date": "2026-05-05", "assigned_to": "sarah", "task": "pickup"}`
     - extraction_confidence: 0.81

3. **Brain Query**
   - Query for both: "Who usually picks up kids on Tuesdays?"
   - brain_relevance: 0.76 (historical pattern: usually John)

4. **Redline Check**
   - Same date
   - Same task (pickup)
   - Different assignees
   - **RESOURCE CONFLICT** (Rule 2)

5. **Decision: SPEAK**
   ```
   "🚨 John and Sarah both claim Tuesday pickup. Who's covering?"
   [John] [Sarah] [Both — carpool]
   ```

---

## Database Schema

### `observer_messages` — Raw Chat Log

```sql
CREATE TABLE observer_messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    message_id TEXT UNIQUE NOT NULL,        -- Telegram message ID
    chat_id TEXT NOT NULL,                   -- Group chat ID
    thread_id TEXT,                          -- Links messages in same conversation
    sender TEXT NOT NULL,                    -- john | sarah | icarus
    message_text TEXT NOT NULL,
    sent_at TIMESTAMP NOT NULL,
    context_window TEXT,                     -- JSON array of prior message_ids
    
    -- Tier 1 Tripwire (Independent confidence)
    tripwire_confidence REAL,                -- 0.0-1.0, threshold 0.7
    tripwire_patterns TEXT,                  -- JSON array of matched patterns
    tier1_decision TEXT,                     -- DROP | QUEUE
    
    -- Tier 2 Processing
    processed_at TIMESTAMP,
    extracted_json TEXT,                     -- Full extraction JSON
    attribution_source TEXT,                 -- message_id this was resolved from
    
    -- Brain Query (Independent relevance score)
    brain_queries TEXT,                      -- JSON array of Brain queries made
    brain_context TEXT,                      -- Brain responses
    brain_relevance REAL,                    -- 0.0-1.0, threshold 0.5
    
    -- Redline Decision (Rule-based, no confidence score)
    redline_decision TEXT,                   -- SILENT | SPEAK
    redline_trigger_rule TEXT,               -- temporal_conflict | resource_conflict | missing_variable
    
    -- Outcome
    action_id TEXT,                          -- FK to pending_actions (if SILENT)
    notification_id TEXT,                    -- Telegram message ID (if SPEAK)
    
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_observer_thread ON observer_messages(thread_id, sent_at);
CREATE INDEX idx_observer_tripwire ON observer_messages(tripwire_confidence, tier1_decision);
CREATE INDEX idx_observer_decision ON observer_messages(redline_decision, processed_at);
```

### `observer_confidence_log` — Debug Trail

```sql
CREATE TABLE observer_confidence_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    message_id TEXT NOT NULL,
    stage TEXT NOT NULL,                    -- 'tripwire' | 'extraction' | 'brain'
    confidence_type TEXT,                   -- 'regex_score' | 'llm_score' | 'relevance'
    score REAL NOT NULL,
    threshold REAL NOT NULL,
    passed BOOLEAN NOT NULL,
    decision TEXT NOT NULL,                 -- 'proceed' | 'drop' | 'proceed_no_context'
    logged_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_confidence_msg ON observer_confidence_log(message_id, stage);
```
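The fallback paths in Error Handling call `log_confidence(...)`, which the spec does not define. A minimal sketch against this table using `sqlite3`; unlike those calls it takes an explicit connection as its first argument, and the stage-to-type mapping is inferred from the column comments. Because `score` is `NOT NULL`, passing `None` raises `sqlite3.IntegrityError`.

```python
import sqlite3
from typing import Optional

# Inferred from the column comments: one confidence type per stage
CONFIDENCE_TYPES = {"tripwire": "regex_score", "extraction": "llm_score", "brain": "relevance"}


def log_confidence(conn: sqlite3.Connection, message_id: str, *, stage: str, score: float,
                   threshold: float, passed: bool, decision: str,
                   confidence_type: Optional[str] = None) -> None:
    """Append one independent gate result to observer_confidence_log."""
    if stage not in CONFIDENCE_TYPES:
        raise ValueError(f"unknown stage: {stage}")
    conn.execute(
        "INSERT INTO observer_confidence_log "
        "(message_id, stage, confidence_type, score, threshold, passed, decision) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (message_id, stage, confidence_type or CONFIDENCE_TYPES[stage],
         score, threshold, passed, decision),
    )
    conn.commit()
```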

### `pending_actions` — Human-in-the-Loop Queue

```sql
CREATE TABLE pending_actions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    action_id TEXT UNIQUE NOT NULL,          -- UUID for this action
    
    -- Source
    source_type TEXT NOT NULL,               -- OBSERVER | EMAIL_MANUAL | etc
    source_id TEXT,                          -- FK to observer_messages or email_id
    
    -- Extracted State (what the system thinks it heard)
    action_type TEXT NOT NULL,               -- maintenance_update | event_create | 
                                             -- care_assignment | schedule_change
    extracted_json TEXT NOT NULL,            -- Full extraction
    extraction_confidence REAL NOT NULL,     -- Independent confidence (not blended)
    
    -- Human Review State
    status TEXT NOT NULL DEFAULT 'pending',  -- pending | confirmed | rejected | expired
    suggested_event_json TEXT,               -- What would be written to Event Graph
    
    -- UI/UX (Brevity constraint enforced)
    summary_text TEXT NOT NULL,              -- Human-readable summary (max 140 chars)
    confirm_payload TEXT,                    -- JSON to send on confirm
    
    -- Expiration
    expires_at TIMESTAMP,                    -- Auto-expire if not confirmed
    confirmed_at TIMESTAMP,
    confirmed_by TEXT,                       -- john | sarah
    
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_pending_status ON pending_actions(status, expires_at);
CREATE INDEX idx_pending_source ON pending_actions(source_type, source_id);
```
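The `expires_at` column and `PENDING_ACTION_TTL_HOURS=48` imply a periodic sweep that retires unconfirmed actions. Below is a minimal sketch. It assumes timestamps are stored in SQLite's default `YYYY-MM-DD HH:MM:SS` UTC text format (what `CURRENT_TIMESTAMP` and `datetime('now', ...)` produce), so plain string comparison orders them correctly. Both function names are hypothetical, and the scheduler that calls the sweep is left open.

```python
import sqlite3

PENDING_ACTION_TTL_HOURS = 48  # mirrors .env.staging.observer


def ttl_modifier(hours: int = PENDING_ACTION_TTL_HOURS) -> str:
    """SQLite datetime() modifier such as '+48 hours', used to set expires_at on insert."""
    return f"+{hours} hours"


def expire_stale_actions(db: sqlite3.Connection) -> int:
    """Flip overdue 'pending' rows to 'expired'; return how many rows changed."""
    cur = db.execute(
        """
        UPDATE pending_actions
        SET status = 'expired'
        WHERE status = 'pending'
          AND expires_at IS NOT NULL
          AND expires_at <= CURRENT_TIMESTAMP
        """
    )
    db.commit()
    return cur.rowcount
```

On insert, `expires_at` would be set with `datetime('now', ?)`, bound to `ttl_modifier()`. Rows with a NULL `expires_at` are never expired by this sweep.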

---

## Error Handling & Fallback Paths

### 1. Brain Query Timeout

```python
# If Brain doesn't respond in 5s
if brain_response.status == "timeout":
    # Log independent brain_relevance as NULL
    log_confidence(message_id, stage="brain", score=None, threshold=0.5, 
                   passed=False, decision="proceed_no_context")
    
    # Degrade gracefully: proceed without Brain context
    extraction["brain_context"] = None
    extraction["brain_relevance"] = None
    
    # Decision: Evaluate redline with extraction-only (lower threshold)
    redline_decision = evaluate_redline(extraction, has_brain_context=False)
    reason = "brain_timeout_degraded"
```

### 2. Brain Returns Low Relevance

```python
# If top score < 0.5
if brain_response.relevance_score < 0.5:
    # Log independent brain_relevance
    log_confidence(message_id, stage="brain", score=brain_response.relevance_score, 
                   threshold=0.5, passed=False, decision="proceed_no_context")
    
    # No useful context found
    extraction["brain_context"] = None
    
    # Decision: Depends on extraction_confidence (independent gate)
    if extraction["extraction_confidence"] > 0.8:
        redline_decision = evaluate_redline(extraction, has_brain_context=False)
    else:
        redline_decision = "SILENT"
        reason = "low_confidence_no_brain"
```

### 3. Tier 2 Worker Crash

```python
# Celery retry with exponential backoff
@celery.task(bind=True, max_retries=3)
def process_tier2(self, message_id):
    try:
        # ... processing logic ...
    except Exception as e:
        # Retry in 30s, 2min, 5min
        raise self.retry(countdown=30 * (2 ** self.request.retries))
        
# After 3 failures: mark for manual review
if self.request.retries >= 3:
    mark_for_manual_review(message_id, error=str(e))
```

### 4. Duplicate Detection

```python
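# Prevent duplicate pending actions for the same event
import json
import sqlite3

def check_duplicate(db: sqlite3.Connection, extraction: dict) -> bool:
    """Return True if a matching action is already pending from the last hour."""
    # json_extract(?, '$') normalizes the bound value the same way as the stored one,
    # so list-valued dates compare as identical minified JSON text
    row = db.execute("""
        SELECT 1 FROM pending_actions
        WHERE action_type = ?
          AND json_extract(extracted_json, '$.dates') = json_extract(?, '$')
          AND json_extract(extracted_json, '$.child') = ?
          AND status = 'pending'
          AND created_at > datetime('now', '-1 hour')
        LIMIT 1
    """, [extraction["type"], json.dumps(extraction["dates"]), extraction["child"]]).fetchone()

    # If True, the caller merges into the existing row or skips
    return row is not None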
```

---

## Staging Environment Configuration

### File: `.env.staging.observer`

```bash
# Environment
ENV=staging
DB_PATH=./data/staging.db
FAMILY_CONTEXT=./TEST_FAMILY_CONTEXT.md

# Telegram (Test Bot)
TELEGRAM_BOT_TOKEN=${TELEGRAM_TEST_BOT_TOKEN}
TELEGRAM_GROUP_CHAT_ID=${TELEGRAM_TEST_GROUP_ID}

# Brain Integration
BRAIN_BASE_URL=https://icarus-test.hoffdesk.com
BRAIN_QUERY_ENDPOINT=/brain/query
BRAIN_TIMEOUT_MS=5000
BRAIN_MIN_RELEVANCE=0.5              # Gate threshold for brain_relevance

# LLM (Gaming PC via Tailscale)
OLLAMA_HOST=http://matt-pc.tail864e81.ts.net:11434
OLLAMA_MODEL_TIER2=phi4:14b

# Tripwire
TRIPWIRE_THRESHOLD=0.7               # Independent gate threshold
EXTRACTION_THRESHOLD=0.6             # Independent gate threshold
TRIPWIRE_PATTERNS_PATH=./config/tripwire_patterns.json

# Multi-Message Context
CONTEXT_WINDOW_SIZE=3
CONTEXT_TTL_SECONDS=300
ATTRIBUTION_WINDOW_SECONDS=600

# Async Processing
REDIS_URL=redis://localhost:6379/0
CELERY_WORKERS=2

# Redline Rules
REDLINE_TEMPORAL_CONFLICT=true
REDLINE_RESOURCE_CONFLICT=true
REDLINE_MISSING_VARIABLE=true
SPEAK_ON_MISSING_CRITICAL=true

# Limits
MAX_BRAIN_QUERIES_PER_MESSAGE=3
PENDING_ACTION_TTL_HOURS=48
MAX_MESSAGES_PER_MINUTE=60

# Brevity Constraint (Daedalus UX Spec)
MAX_CONFLICT_MESSAGE_LENGTH=280
MAX_CLARIFICATION_LENGTH=200
MAX_SUMMARY_LENGTH=140
```
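The three `MAX_*_LENGTH` values only help if every outbound message passes through one choke point. Below is a hedged sketch of such a choke point. `enforce_brevity` and its `kind` keys are hypothetical. Truncating at a word boundary is also an assumption, since the Daedalus UX spec may instead require rejecting and regenerating an over-length message.

```python
import os

# Limits read from the staging env file, falling back to its values
LIMITS = {
    "conflict": int(os.environ.get("MAX_CONFLICT_MESSAGE_LENGTH", "280")),
    "clarification": int(os.environ.get("MAX_CLARIFICATION_LENGTH", "200")),
    "summary": int(os.environ.get("MAX_SUMMARY_LENGTH", "140")),
}


def enforce_brevity(text: str, kind: str) -> str:
    """Collapse whitespace, then cut at a word boundary so len(result) <= limit."""
    limit = LIMITS[kind]
    text = " ".join(text.split())
    if len(text) <= limit:
        return text
    head = text[: limit - 1]
    if " " in head:
        head = head.rsplit(" ", 1)[0]  # back up to the last space so no word is split
    return head + "…"  # one-character ellipsis keeps the total within the limit
```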

### Miller Family Test Context

**Test Data Pre-Seeded:**
- Leo's soccer: Tuesdays/Thursdays 4:30 PM @ Westside Park
- Mia's ballet: Mondays/Wednesdays 4:00 PM @ Madison Dance Academy
- Mia's swim: Saturdays 10:00 AM @ YMCA
- Leo's chess: Wednesdays 3:30 PM
- Spring break: March 24-28, 2026
- Honda Civic service: Due April 15, 2026
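Open Question 1 is resolved as "same day for now", so Rule 1 can be exercised directly against this seed data. Below is a hedged sketch. `Activity` and `same_day_conflicts` are hypothetical names, and only the recurring weekly activities are modelled (spring break and the Honda service date are not). `start` is stored but unused until overlap detection is specified.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Activity:
    child: str
    name: str
    weekdays: frozenset
    start: time  # kept for a future overlap rule; the same-day rule ignores it


# Recurring activities from the Miller family seed data
SEEDED = [
    Activity("Leo", "soccer", frozenset({"Tuesday", "Thursday"}), time(16, 30)),
    Activity("Mia", "ballet", frozenset({"Monday", "Wednesday"}), time(16, 0)),
    Activity("Mia", "swim", frozenset({"Saturday"}), time(10, 0)),
    Activity("Leo", "chess", frozenset({"Wednesday"}), time(15, 30)),
]


def same_day_conflicts(child: str, weekday: str, activities=SEEDED) -> list:
    """Names of existing activities for `child` on `weekday` (same-day conflict window)."""
    return [a.name for a in activities if a.child == child and weekday in a.weekdays]
```

Under this rule, a proposed "Leo dentist Thursday" would return `['soccer']` and be flagged as a Rule 1 conflict. "Leo dentist Friday" would return an empty list.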

---

## Implementation Phases

### Week 0.5: Data Collection & Labeling (NEW — Critical)

**Before any implementation:**

- [ ] Record 100 real messages from Family Logistics group
- [ ] Hand-label for coordination signals:
  - `is_coordination`: true/false
  - `coordination_type`: transport | care_coverage | schedule_change | activity_confirm | none
  - `attribution_required`: true/false (for multi-message context)
- [ ] Extract noisy data examples:
  - Typos: "socer", "pratice", "balet"
  - Abbreviations: "wk" (week), "2moro", "thx"
  - Emoji: "👍", "🙋", "✅"
  - Half-sentences: "yeah 3pm", "me", "i'll do it"
  - Mixed signals: "Leo soccer thursday??" (question vs statement)
- [ ] Tune tripwire regex against noisy data
- [ ] Establish baseline precision/recall before code is written

**Noisy Data Test Cases (TC-006 to TC-020):**

| TC | Message | Expected | Notes |
|----|---------|----------|-------|
| TC-006 | "socer tmrw 4" | Queue | Typo: "socer", abbreviation: "tmrw" |
| TC-007 | "👍" (reply to coordination) | Queue | Emoji-only in context |
| TC-008 | "yeah 3pm" | Queue | Half-sentence, needs context |
| TC-009 | "i'll do it" | Queue | Pronoun reference, needs attribution |
| TC-010 | "wk pickup" | Queue | Abbreviation: "wk" |
| TC-011 | "pratice is cancled" | Queue | Multiple typos |
| TC-012 | "thx!" | Drop | Gratitude, not coordination |
| TC-013 | "Mia has bailt monday" | Queue | Typo in activity name |
| TC-014 | "u sure?" | Depends | Question, usually drop unless context says otherwise |
| TC-015 | "4:30 work for me" | Queue | Time-only, needs context |
| TC-016 | "🙋‍♀️" | Queue | Self-assignment via emoji |
| TC-017 | "kk" | Depends | Acknowledgment, context-dependent |
| TC-018 | "Leo dentist 2mrw??" | Queue | Question mark + abbreviation |
| TC-019 | "n/m" | Drop | "Never mind" — cancels previous |
| TC-020 | "omw" | Queue | "On my way" — active coordination |

---

### Phase 8.1: Tripwire + Logger (Week 1)

- [ ] Implement Tier 1 Tripwire (regex) with independent confidence scoring
- [ ] Create `observer_messages` table with thread support
- [ ] Create `observer_confidence_log` table for debugging
- [ ] Log all messages; expect roughly 95% of them to be marked DROP at Tier 1
- [ ] Deploy to staging, run for 3 days
- [ ] Measure: tripwire precision against Week 0.5 labeled data (target >95%, i.e. under 5% of queued messages are false positives)

---

### Phase 8.2: Async Tier 2 (Week 2)

- [ ] Implement Celery worker for Tier 2
- [ ] Integration with Brain query endpoint (independent relevance scoring)
- [ ] Create `pending_actions` table
- [ ] Implement [Confirm] button flow
- [ ] Deploy to staging, run for 3 days

---

### Phase 8.3: Redline Rules + Multi-Message Context (Week 3) — P0 Priority

- [ ] Implement temporal conflict detection (Rule 1)
- [ ] Implement resource conflict detection (Rule 2)
- [ ] Implement missing variable detection (Rule 3)
- [ ] Add multi-message context window (3 messages)
- [ ] Implement attribution logic for pronoun resolution
- [ ] Add SPEAK pathway for conflicts
- [ ] Enforce brevity constraints (Daedalus UX spec)
- [ ] Deploy to staging with Miller family

**Multi-Message Test Scenarios:**

| Scenario | Messages | Expected | Confidence |
|----------|----------|----------|------------|
| TC-021 | A: "Who's getting Leo?" / B: "me" | Attribution → John | extraction_confidence ≥ 0.75 |
| TC-022 | A: "Soccer pickup Thursday?" / B: "I'll do it" / C: "thx" | Attribution + silent | C drops |
| TC-023 | A: "Mia ballet Monday" / B: "I'll cover" / C: "👍" | Attribution + silent | C is emoji acknowledgment |
| TC-024 | A: "Both kids Tuesday" / B: "I got Leo" / C: "I got Mia" | Double attribution + silent | Unless times conflict |
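TC-021 to TC-024 hinge on resolving a short reply ("me", "I'll do it", "I got Leo") to the request it answers. Below is one hedged sketch. It assumes messages arrive in timestamp order, and it treats the request as the nearest earlier message inside the window that is not itself a claim. `SELF_ASSIGN`, `Msg` and `AttributionWindow` are hypothetical, and `CONTEXT_TTL_SECONDS` is not modelled. Mapping the returned sender to a parent and silencing the 👍 or "thx" follow-ups are left to the extraction step.

```python
import re
from collections import deque
from dataclasses import dataclass
from typing import Optional, Tuple

CONTEXT_WINDOW_SIZE = 3            # mirrors .env.staging.observer
ATTRIBUTION_WINDOW_SECONDS = 600   # 10-minute window (Open Question 4)

# Hypothetical matcher for short self-assignments such as "me" or "I got Leo"
SELF_ASSIGN = re.compile(r"^\s*(me( too)?|i'?ll (do it|cover)|i got \w+)\b", re.IGNORECASE)


@dataclass
class Msg:
    sender: str
    text: str
    ts: float  # seconds since epoch


class AttributionWindow:
    def __init__(self) -> None:
        self.recent = deque(maxlen=CONTEXT_WINDOW_SIZE)

    def add(self, msg: Msg) -> Optional[Tuple[str, str]]:
        """Return (assignee, request_text) when msg claims an earlier request, else None."""
        result = None
        if SELF_ASSIGN.search(msg.text):
            for prior in reversed(self.recent):
                if msg.ts - prior.ts > ATTRIBUTION_WINDOW_SECONDS:
                    break  # messages are time-ordered, so older ones are out of window too
                if not SELF_ASSIGN.search(prior.text):
                    result = (msg.sender, prior.text)  # nearest non-claim message
                    break
        self.recent.append(msg)
        return result
```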

---

### Phase 8.4: Integration Test (Week 4)

- [ ] Run full test scenarios (TC-001 to TC-024)
- [ ] Measure: independent confidence gate accuracy
- [ ] Measure: multi-message attribution accuracy
- [ ] Measure: noisy data handling
- [ ] Measure: family comfort metrics (see Success Metrics below)
- [ ] Adjust thresholds based on results
- [ ] Document learnings for production

---

## Success Metrics

### Primary Metric: Family Comfort (Non-Negotiable)

**UX Acceptance Gate:**
> "Aundrea has never asked 'why did the bot just say that?' for 7 consecutive days"

This is the only metric that matters. Technical precision/recall are secondary to human experience.

### Secondary Metrics

| Metric | Target | Measurement |
|--------|--------|-------------|
| Tripwire Precision | >95% | Share of tripwire-queued messages labeled as coordination in the Week 0.5 data |
| Tripwire Recall | >90% | % of coordination messages caught |
| Extraction Confidence Accuracy | >85% | Correlation between extraction_confidence and human-verified correctness |
| Brain Relevance Accuracy | >80% | Correlation between brain_relevance and context usefulness |
| Multi-Message Attribution | >85% | Correct "I'll do it" → prior action resolution |
| Brain Query Latency | <3s | Average response time |
| Brain Query Success | >95% | % of queries returning valid response |
| Redline Accuracy | >90% | Correct SPEAK vs SILENT decisions (measured by human review) |
| Pending Confirmation Rate | >70% | % of silent extractions eventually confirmed |
| Chat Interruptions | <2/day | SPEAK messages per day (excluding test scenarios) |
| Aundrea Confusion Events | 0 in 7 days | Number of times primary user questions bot behavior |
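Tripwire precision and recall can be computed directly from the Week 0.5 labels. Below is a minimal sketch. It assumes `labels` holds each message's hand-labelled `is_coordination` value and `queued` records whether the tripwire passed it (`tripwire_confidence ≥ 0.7`). Returning 0.0 when a denominator is empty is an arbitrary convention.

```python
from typing import List, Tuple


def precision_recall(labels: List[bool], queued: List[bool]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    if len(labels) != len(queued):
        raise ValueError("labels and queued must be the same length")
    tp = sum(1 for y, q in zip(labels, queued) if y and q)
    fp = sum(1 for y, q in zip(labels, queued) if not y and q)
    fn = sum(1 for y, q in zip(labels, queued) if y and not q)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Against the 100 labelled messages, the table's targets become `precision > 0.95` and `recall > 0.90`.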

### Anti-Metric (What NOT to Optimize)

- **Don't optimize for:** Message extraction volume
- **Don't optimize for:** Brain query coverage
- **Don't optimize for:** Total features detected

**Optimize for:** Family comfort. Full stop.

---

## Security & Privacy

### Data Handling
- **Minimal message retention:** Raw chat messages are kept only as long as extraction needs them and are purged after 30 days
- **No PII in Brain queries:** Queries are abstract ("soccer schedule" not "where is Leo")
- **Local-first:** All processing on Beelink/Gaming PC, no cloud LLM for chat

### Telegram Privacy
- **Privacy Mode:** Disabled for test bot (BotFather → Group Privacy → Off)
- **Bot can see:** All group messages
- **Bot cannot:** See members' private chats; it only receives direct messages sent to the bot itself

### Staging Isolation
- **Database:** staging.db completely separate from prod.db
- **Bot:** Separate test bot token
- **Family:** Miller family test data only
- **No production data:** Observer will NEVER see real Hoffmann family chat

---

## Open Questions

1. **Conflict Window:** How close is "conflict"? Same day? ±2 hours? → **Resolved:** Same day for now; temporal overlap detection TBD
2. **Override:** Can a parent "force" a silent extraction to speak? (e.g., urgent) → **Open:** Not for Phase 8
3. **Learning:** Should Observer learn from confirmed/rejected actions? (Phase 8.5?) → **Open:** Post-family-adoption
4. **Multi-message Context:** Should attribution expire? → **Resolved:** 10-minute window

---

## Appendix A: Tripwire Pattern Reference

```json
{
  "tripwire_patterns": {
    "temporal": [
      "\\b(tomorrow|today|next week|this week)\\b",
      "\\b(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]* \\d{1,2}\\b",
      "\\b(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\\b",
      "\\b\\d{1,2}:\\d{2}\\b",
      "\\b\\d{1,2}(\\s)?(am|pm)\\b"
    ],
    "assignment": [
      "\\b(I('ll| will)|can) (pick up|get|cover|take|bring)\\b",
      "\\b(who('s| is) (getting|picking up|covering))\\b",
      "\\b(cover( for)?|swap|switch)\\b",
      "\\b(me|me too|i'll do it|i got it)\\b"
    ],
    "pronoun_attribution": [
      "\\b(i'll|i will|i can|i'm|i am|me|me too)\\b"
    ],
    "children": [
      "\\b(Leo|Mia)\\b",
      "\\b(kids?|children)\\b",
      "\\b(son|daughter)\\b"
    ],
    "activities": [
      "\\b(soccer|ballet|swim|chess|practice|lesson|class)\\b",
      "\\b(game|match|meet|recital|performance)\\b"
    ],
    "conflict_markers": [
      "\\b(but|wait|hold on|doesn't|didn't|conflict|overlap)\\b",
      "\\b(what about|how about)\\b"
    ],
    "noise_patterns": [
      "\\b(thx|thanks|ty|kk|omw)\\b",
      "(👍|✅|🙋)"
    ]
  }
}
```
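When tuning these patterns against the Week 0.5 samples, it helps to see which categories a message trips. Below is a hedged sketch using Python's `re` with `re.IGNORECASE`. Case-insensitive matching is an assumption, since the JSON does not say how patterns are compiled. A subset of the categories is inlined so the sketch runs standalone; in staging, the full object would be loaded from `TRIPWIRE_PATTERNS_PATH`. How category hits become `tripwire_confidence` is not specified here, so the sketch stops at the hits.

```python
import re
from typing import Dict, List, Set

# Subset of Appendix A as Python raw strings ("\\b" in the JSON is r"\b" here)
PATTERNS: Dict[str, List[str]] = {
    "temporal": [
        r"\b(tomorrow|today|next week|this week)\b",
        r"\b(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\b",
        r"\b\d{1,2}:\d{2}\b",
        r"\b\d{1,2}(\s)?(am|pm)\b",
    ],
    "children": [r"\b(Leo|Mia)\b", r"\b(kids?|children)\b", r"\b(son|daughter)\b"],
    "activities": [
        r"\b(soccer|ballet|swim|chess|practice|lesson|class)\b",
        r"\b(game|match|meet|recital|performance)\b",
    ],
}

# Assumption: patterns are matched case-insensitively ("monday", "i'll")
COMPILED = {cat: [re.compile(p, re.IGNORECASE) for p in pats] for cat, pats in PATTERNS.items()}


def matched_categories(message: str) -> Set[str]:
    """Categories with at least one pattern hit anywhere in the message."""
    return {cat for cat, pats in COMPILED.items() if any(p.search(message) for p in pats)}
```

`"Mia has bailt monday"` (TC-013) only trips `children` and `temporal`, and `"socer tmrw 4"` (TC-006) trips nothing. These are exactly the misspellings and abbreviations the Week 0.5 tuning pass is meant to catch.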

---

## Appendix B: Mermaid Diagram (Simplified)

```mermaid
flowchart TD
    A[Family Chat Message] --> B{Tier 1 Tripwire}
    B -->|tripwire_confidence < 0.7| C[Drop Silently]
    B -->|tripwire_confidence >= 0.7| D[Queue for Tier 2]
    
    D --> E[Async Tier 2 Worker]
    E --> F[Extract Coordination State]
    F -->|extraction_confidence < 0.6| C
    F -->|extraction_confidence >= 0.6| G[Query Brain for Context]
    
    G --> H{brain_relevance >= 0.5?}
    H -->|Yes| I[Redline with Brain Context]
    H -->|No| J[Redline Extraction-Only]
    
    I --> K{Redline Decision}
    J --> K
    
    K -->|No Conflict| L[Store as Pending]
    K -->|Conflict Detected| M[Speak in Chat]
    
    L --> N[[Confirm Button]]
    N -->|Confirmed| O[Write to Event Graph]
    N -->|Rejected| P[Discard]
    
    M --> Q[[Resolve Button]]
    Q --> R[Update Event Graph]
```
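The flowchart can be read as a single routing function over independent gates. The sketch below is a hedged illustration. The function name `route` and its string return values are hypothetical, and the thresholds are copied from `.env.staging.observer`. The `> 0.8` extraction bar for low Brain relevance comes from Error Handling case 2 rather than from the simplified diagram.

```python
from typing import Optional

# Independent gate thresholds, copied from .env.staging.observer
TRIPWIRE_THRESHOLD = 0.7
EXTRACTION_THRESHOLD = 0.6
BRAIN_MIN_RELEVANCE = 0.5
LOW_RELEVANCE_EXTRACTION_BAR = 0.8  # from Error Handling case 2


def route(tripwire_confidence: float,
          extraction_confidence: Optional[float] = None,
          brain_relevance: Optional[float] = None) -> str:
    """Walk the gates in order; each score is compared only with its own threshold."""
    if tripwire_confidence < TRIPWIRE_THRESHOLD:
        return "drop"
    if extraction_confidence is None:
        return "queue_tier2"  # Tier 1 done; Tier 2 has not run yet
    if extraction_confidence < EXTRACTION_THRESHOLD:
        return "drop"
    if brain_relevance is None:  # Brain timed out (Error Handling case 1)
        return "redline_extraction_only"
    if brain_relevance < BRAIN_MIN_RELEVANCE:
        if extraction_confidence > LOW_RELEVANCE_EXTRACTION_BAR:
            return "redline_extraction_only"
        return "silent"
    return "redline_with_brain_context"
```

No score is ever multiplied by another. A message dropped at one gate carries no penalty into the next, which keeps each logged confidence interpretable on its own.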

---

_The conversation is the interface. The brain provides memory. The observer knows when to speak. The family never asks "why did the bot just say that?"_