# Agent Health Troubleshooting Playbook

_In case of emergency — when an agent sounds generic, forgets who they are, or goes off-identity._

## Symptom: Agent responds generically ("I am a large language model trained by...")

### Root Cause: Session Model Override Stuck

OpenClaw's auto-failover can override an agent's model at the session level and leave it there. If the primary model fails, the session silently switches to whichever model completed the request and keeps using it for later messages until the session is reset. The stuck override takes precedence over the agent's configured model, and the persona can be lost with it.

**Example:** Wadsworth's DM session auto-overrode from `glm-5.1:cloud` to `gemma4:latest` (Gaming PC model). Since `gemma4` is a Google-trained model, it didn't adopt Wadsworth's butler persona and responded as "I am a large language model trained by Google."

### Fix: Reset the Contaminated Session

1. **Find the session key:**
   ```bash
   python3 -c "
   import json
   with open('/home/hoffmann_admin/.openclaw/agents/main/sessions/sessions.json') as f:
       store = json.load(f)
   for k, v in store.items():
       if 'telegram:direct' in k and 'heartbeat' not in k:
           print(f'Key: {k}')
           print(f'  Model: {v.get(\"model\",\"?\")}')
           print(f'  ModelOverride: {v.get(\"modelOverride\",\"none\")}')
           print(f'  ProviderOverride: {v.get(\"providerOverride\",\"none\")}')
           print(f'  ModelOverrideSource: {v.get(\"modelOverrideSource\",\"none\")}')
   "
   ```

2. **Check for model override contamination:**
   - If `modelOverride` is set to a model that doesn't match the agent's configured primary → contaminated
   - If `modelOverrideSource: "auto"` → auto-failover stuck it there

3. **Back up and delete the contaminated session:**
   ```bash
   SESSION_FILE="/home/hoffmann_admin/.openclaw/agents/main/sessions/<SESSION_ID>.jsonl"
   SESSION_KEY="agent:main:telegram:direct:<CHAT_ID>"

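   # Back up the transcript and the sessions store first
   STORE_PATH="/home/hoffmann_admin/.openclaw/agents/main/sessions/sessions.json"
   cp "$SESSION_FILE" "${SESSION_FILE}.contaminated-backup.$(date +%Y%m%d%H%M%S)"
   cp "$STORE_PATH" "${STORE_PATH}.backup.$(date +%Y%m%d%H%M%S)"

   # Remove the session entry from the sessions store
   SESSION_KEY="$SESSION_KEY" STORE_PATH="$STORE_PATH" python3 -c "
   import json, os
   store_path = os.environ['STORE_PATH']
   key = os.environ['SESSION_KEY']
   with open(store_path) as f:
       store = json.load(f)
   if key in store:
       del store[key]
       # Rewrite the store only when something was actually removed
       with open(store_path, 'w') as f:
           json.dump(store, f, indent=2)
       print(f'Removed: {key}')
   else:
       print(f'Not found: {key}')
   "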

   # Remove transcript
   mv "$SESSION_FILE" "${SESSION_FILE}.contaminated-deleted.$(date +%Y%m%d%H%M%S)"

   # Restart gateway
   openclaw gateway restart
   ```

4. **Test:** DM the agent with "Who are you?" — should respond with correct identity.

### Prevention

- Monitor `modelOverride` and `modelOverrideSource: "auto"` in session stores
- If Active Memory or failover keeps overriding the model, the fallback chain may be too aggressive
- Consider whether `ollama-remote` (Gaming PC) models should be in the fallback chain for agents that need strong persona adherence — remote models are slow and may trigger cascading failovers
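
The monitoring bullet above could be automated with a small check like the one below. This is a minimal sketch, not part of OpenClaw. The function name `find_stuck_overrides` is hypothetical. It assumes the store layout used elsewhere in this playbook: a JSON object keyed by session key, with optional `modelOverride` and `modelOverrideSource` fields. It also assumes plain string equality is enough to compare the override with the configured primary.

```python
import json
from pathlib import Path


def find_stuck_overrides(store: dict, primary: str) -> list[tuple[str, str, str]]:
    """Return (session_key, override, source) for overrides that differ from primary.

    Assumes each entry may carry `modelOverride` and `modelOverrideSource`,
    as described in this playbook; other layouts are not handled.
    """
    hits = []
    for key, entry in store.items():
        if not isinstance(entry, dict):
            continue
        override = entry.get("modelOverride")
        if override and override != primary:
            hits.append((key, override, entry.get("modelOverrideSource", "unknown")))
    return hits


def load_store(path: str) -> dict:
    """Load a sessions.json store from disk."""
    return json.loads(Path(path).expanduser().read_text())


# Example with in-memory data (values mirror the Wadsworth incident).
sample = {
    "agent:main:telegram:direct:<CHAT_ID>": {
        "model": "gemma4:latest",
        "modelOverride": "gemma4:latest",
        "modelOverrideSource": "auto",
    },
    "agent:main:telegram:group:<CHAT_ID>": {"model": "glm-5.1:cloud"},
}
for key, override, source in find_stuck_overrides(sample, "glm-5.1:cloud"):
    print(f"{key}: override={override} source={source}")
```

To run it against a real store, pass `load_store("~/.openclaw/agents/main/sessions/sessions.json")` instead of `sample`. A hit with `source=auto` is the pattern described in the fix above.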

---

## Symptom: Agent takes 30-60 seconds before responding (then may respond generically)

### Root Cause: Active Memory Plugin Blocking on Slow Model

Active Memory runs as a **blocking pre-response hook**. It searches session history and generates a recall summary *before* the main agent can respond. If the recall model is too slow, every message has a 30-60s delay before the agent even starts thinking.

**Example:** Active Memory was configured for `ollama-remote/gemma4:latest` (Gaming PC). 7 out of 8 calls timed out at 30s. Even switching to `glm-5.1:cloud` (cloud proxy) resulted in 58s completion time — still too slow for a blocking gate.

### Fix: Disable Active Memory or Use a Fast Local Model

**Option A — Disable Active Memory (recommended for current hardware):**
```json
"active-memory": {
  "enabled": false,
  "config": {
    "agents": ["main"],
    "model": "ollama/llama3.2:1b-instruct-q4_K_M",
    "modelFallback": "ollama/glm-5.1:cloud"
  }
}
```

Workspace files (SOUL.md, IDENTITY.md, MEMORY.md, AGENTS.md) provide strong identity grounding without Active Memory. The `sessionMemory` experimental feature (already enabled) indexes session transcripts for recall without the blocking gate.

**Option B — Use a fast local model (requires sub-5s inference on Beelink CPU):**
As of 2026-04-20, no tested model completes Active Memory recall in under 5s on the Beelink's CPU:
- `llama3.2:3b`: 8.4s, failed to recall ("None")
- `llama3.2:1b`: 9.9s, partial recall
- `llama3.2:1b-instruct-q4_K_M`: 6.4s, best option but still too slow
- `qwen3.5:4b`: >30s, timed out

Revisit if the Beelink gets a GPU or a significantly faster small model becomes available.
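
When a new candidate model appears, a timing harness like the one below could repeat the measurements listed above. It is a sketch under stated assumptions. `time_recall` and `fake_model` are hypothetical names. The 5-second budget comes from Option B. The actual model request is deliberately left out, so any callable that returns the recall text can be passed in.

```python
import time
from typing import Callable


def time_recall(call: Callable[[], str], budget_s: float = 5.0) -> dict:
    """Time one recall call and report whether it fits the latency budget."""
    start = time.perf_counter()
    reply = call()
    elapsed = time.perf_counter() - start
    return {
        "elapsed_s": round(elapsed, 2),
        "within_budget": elapsed < budget_s,
        # The llama3.2:3b run above returned the literal text "None",
        # so that is treated the same as an empty recall here.
        "empty": reply.strip() in ("", "None"),
    }


def fake_model() -> str:
    """Stand-in for a real model call; replace with your own client."""
    time.sleep(0.05)
    return "User prefers morning briefings."


print(time_recall(fake_model))
```

A model is worth enabling for Active Memory only if it is both within budget and non-empty over several runs, since a single fast call can hide cold-start latency.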

### Active Memory Logs

Check Active Memory performance in gateway logs:
```bash
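# -h suppresses the filename prefix that grep adds when the glob matches
# more than one log file; with the prefix, json.loads rejects every line.
grep -h "active-memory" /tmp/openclaw/openclaw-*.log | grep -E "(start|done)" | python3 -c "
import sys, json
for line in sys.stdin:
    try:
        d = json.loads(line.strip())
        msg = d.get('1','')
        ts = d.get('time','')
        if 'active-memory' in msg:
            print(f'{ts[:19]} | {msg}')
    except Exception: pass  # skip lines that are not JSON records
" | tail -20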
```

Look for `status=timeout` or `status=empty` — these mean the recall failed and the agent got zero memory context.
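
To see how often recall fails rather than reading individual lines, the statuses can be tallied. The sketch below assumes the same log shape the pipeline above relies on: JSON lines with the message text under key `"1"`. The sample messages are hypothetical, and real message wording may differ. `count_statuses` is an illustrative name, not an OpenClaw tool.

```python
import json
import re
from collections import Counter

STATUS_RE = re.compile(r"status=(\w+)")


def count_statuses(lines) -> Counter:
    """Count `status=...` values in Active Memory log messages.

    Non-JSON lines and records without an active-memory message are skipped.
    """
    counts = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        message = str(record.get("1", ""))
        if "active-memory" not in message:
            continue
        match = STATUS_RE.search(message)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical log lines for illustration only.
sample = [
    '{"time": "2026-04-20T10:00:00Z", "1": "active-memory done status=timeout"}',
    '{"time": "2026-04-20T10:01:00Z", "1": "active-memory done status=empty"}',
    '{"time": "2026-04-20T10:02:00Z", "1": "active-memory done status=timeout"}',
    "not json",
]
print(count_statuses(sample))
```

On a real log, pass an open file handle as `lines`. A high share of `timeout` or `empty` suggests the recall model is too slow for a blocking hook.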

---

## Symptom: Agent identity confusion (responds as wrong agent)

### Root Cause: Contaminated Session History

If someone operated as Agent A through Agent B's bot/channel, the session history contains mixed identity context. The model then can't determine which persona to adopt.

### Fix: Reset the contaminated session (same procedure as model override above)

### Prevention

- **Never operate as Socrates through the main/default bot** — it contaminates Wadsworth's session history with wrong identity context
- **Never operate as Daedalus through Wadsworth's bot** — same problem
- Always use the correct agent's bot/channel for that agent's work
- Inter-agent communication: use `sessions_send` or `sessions_spawn`, **never** the Telegram API

---

## Symptom: `hoffdesk-webhook` service failing

### Root Cause: Template placeholders in systemd unit file

The service file at `/etc/systemd/system/hoffdesk-webhook.service` had `__USER__`, `__WORKDIR__`, `__PREFIX__` placeholders that were never substituted.

### Fix: Already applied (2026-04-20)

Replaced with actual values:
- `__USER__` → `hoffmann_admin`
- `__WORKDIR__` → `/home/hoffmann_admin/.openclaw/services/family_assistant`
- `__PREFIX__` → `/home/hoffmann_admin/.local/bin`

Requires sudo to apply:
```bash
sudo systemctl daemon-reload && sudo systemctl restart hoffdesk-webhook
```
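
To catch this class of failure before a restart, a unit file can be scanned for leftover template tokens. The following is a minimal sketch. `find_placeholders` is a hypothetical helper. The pattern assumes placeholders follow the `__UPPERCASE__` style seen in this incident. The sample unit content is illustrative and is not the real `hoffdesk-webhook` unit.

```python
import re

# Matches tokens such as __USER__ or __WORK_DIR__ (assumed placeholder style).
PLACEHOLDER_RE = re.compile(r"__[A-Z0-9]+(?:_[A-Z0-9]+)*__")


def find_placeholders(unit_text: str) -> list[tuple[int, str]]:
    """Return (line_number, placeholder) pairs for unsubstituted tokens."""
    found = []
    for number, line in enumerate(unit_text.splitlines(), start=1):
        if line.lstrip().startswith(("#", ";")):
            continue  # systemd treats lines starting with # or ; as comments
        for token in PLACEHOLDER_RE.findall(line):
            found.append((number, token))
    return found


# Illustrative unit content; ExecStart is a made-up command.
sample_unit = """[Service]
User=__USER__
WorkingDirectory=__WORKDIR__
ExecStart=__PREFIX__/example-command
"""
for number, token in find_placeholders(sample_unit):
    print(f"line {number}: {token}")
```

An empty result means no tokens of that style remain. It does not confirm that the substituted values are correct.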

---

## Quick Reference: Agent Config Locations

| Item | Path |
|------|------|
| Main config | `~/.openclaw/openclaw.json` |
| Wadsworth workspace | `~/.openclaw/workspace/` |
| Socrates workspace | `~/.openclaw/workspace-socrates/` |
| Daedalus workspace | `~/.openclaw/workspace-daedalus/` |
| Wadsworth sessions | `~/.openclaw/agents/main/sessions/` |
| Socrates sessions | `~/.openclaw/agents/socrates/sessions/` |
| Daedalus sessions | `~/.openclaw/agents/daedalus/sessions/` |
| Gateway logs | `/tmp/openclaw/openclaw-YYYY-MM-DD.log` |
| Webhook service | `/etc/systemd/system/hoffdesk-webhook.service` |
| Webhook env | `~/.openclaw/workspace/scripts/.env` |

## Quick Reference: Session Reset Commands

```bash
# List sessions for an agent
python3 -c "
import json
with open('/home/hoffmann_admin/.openclaw/agents/<AGENT>/sessions/sessions.json') as f:
    store = json.load(f)
for k, v in store.items():
    if 'direct' in k or 'group' in k:
        print(f'{k:60s} | model={v.get(\"model\",\"?\"):20s} | override={v.get(\"modelOverride\",\"none\")}')
"

# Reset a specific session (backup first!)
# Replace SESSION_KEY and SESSION_ID with actual values
```
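
The reset step left as a comment above could be scripted along these lines. This is a sketch, not an OpenClaw command. `reset_session` is a hypothetical helper that assumes the store is a flat JSON object keyed by session key, as the listing script treats it. It backs up the store, removes one key, and swaps the new file into place so an interrupted write cannot truncate `sessions.json`.

```python
import json
import os
import shutil
import tempfile
import time
from pathlib import Path


def reset_session(store_path: str, session_key: str) -> bool:
    """Back up the store, drop one session key, and rewrite the store atomically.

    Returns True if the key was present and removed, False otherwise.
    """
    path = Path(store_path).expanduser()
    store = json.loads(path.read_text())
    if session_key not in store:
        return False  # nothing to do; leave the file untouched

    stamp = time.strftime("%Y%m%d%H%M%S")
    shutil.copy2(path, path.with_name(f"{path.name}.backup.{stamp}"))

    del store[session_key]
    # Write to a temp file in the same directory, then swap it into place.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(store, tmp, indent=2)
    os.replace(tmp_name, path)
    return True


# Usage (placeholders must be filled in by hand):
# reset_session("~/.openclaw/agents/<AGENT>/sessions/sessions.json", "<SESSION_KEY>")
```

The transcript file still needs to be moved aside and the gateway restarted, as in the contaminated-session fix earlier in this playbook.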

---

_Last updated: 2026-04-20 by Socrates_
_Incident: Wadsworth responding as "trained by Google" due to model override + Active Memory cascade failure_

---

## FALLBACK CONTINGENCIES (In Case of Emergency)

**Source of Truth:** `github.com:NightKnight64/hoffdesk-agents`  
**Last Backup:** 2026-04-20 — Mono-repo consolidation complete

---

## Scenario: Complete Agent Loss (Corrupted or Deleted)

### Recovery Protocol

1. **Clone the mono-repo:**
   ```bash
   git clone git@github.com:NightKnight64/hoffdesk-agents.git ~/hoffdesk-agents-recovery
   cd ~/hoffdesk-agents-recovery
   ```

2. **Restore agent workspace:**
   ```bash
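   # Recreate the targets if they were deleted; the trailing /. also copies dotfiles
   mkdir -p ~/.openclaw/workspace ~/.openclaw/workspace-socrates ~/.openclaw/workspace-daedalus

   # For Wadsworth
   cp -r agents/wadsworth/. ~/.openclaw/workspace/

   # For Socrates
   cp -r agents/socrates/. ~/.openclaw/workspace-socrates/

   # For Daedalus
   cp -r agents/daedalus/. ~/.openclaw/workspace-daedalus/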
   ```

3. **Restore shared artifacts:**
   ```bash
   mkdir -p ~/.openclaw/shared
   cp -r shared/. ~/.openclaw/shared/
   ```

4. **Verify openclaw.json workspace paths:**
   ```bash
   cat ~/.openclaw/openclaw.json | python3 -c "
   import json, sys
   d = json.load(sys.stdin)
   for agent in d.get('agents', {}).get('list', []):
       print(f'{agent[\"id\"]}: {agent[\"workspace\"]}')
   "
   ```
   - `main` → should point to `~/.openclaw/workspace` (or `~/hoffdesk-agents/agents/wadsworth` if migrated)
   - `socrates` → should point to `~/.openclaw/workspace-socrates`
   - `daedalus` → should point to `~/.openclaw/workspace-daedalus`

5. **Restart gateway:**
   ```bash
   openclaw gateway restart
   ```

6. **Test:** Send "Who are you?" to each agent — should respond with correct identity.
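
The path check in step 4 can also be done mechanically. The sketch below assumes the `agents.list[].id` and `agents.list[].workspace` layout that the step 4 script reads. `check_workspaces` and `EXPECTED` are hypothetical names, and the expected values are the non-migrated paths from the checklist.

```python
import os

# Expected values from the checklist above; adjust if workspaces were migrated.
EXPECTED = {
    "main": "~/.openclaw/workspace",
    "socrates": "~/.openclaw/workspace-socrates",
    "daedalus": "~/.openclaw/workspace-daedalus",
}


def normalize(path: str) -> str:
    """Expand ~ and drop trailing slashes so equivalent paths compare equal."""
    return os.path.normpath(os.path.expanduser(path))


def check_workspaces(config: dict, expected: dict = EXPECTED) -> list[str]:
    """Return human-readable problems; an empty list means all paths match."""
    configured = {
        agent.get("id"): agent.get("workspace")
        for agent in config.get("agents", {}).get("list", [])
    }
    problems = []
    for agent_id, want in expected.items():
        got = configured.get(agent_id)
        if got is None:
            problems.append(f"{agent_id}: missing from config")
        elif normalize(got) != normalize(want):
            problems.append(f"{agent_id}: {got} (expected {want})")
    return problems


# In-memory example; load ~/.openclaw/openclaw.json with json.load for real use.
sample = {"agents": {"list": [
    {"id": "main", "workspace": "~/.openclaw/workspace/"},
    {"id": "socrates", "workspace": "~/.openclaw/workspace-wrong"},
]}}
for problem in check_workspaces(sample):
    print(problem)
```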

---

## Scenario: Gateway Completely Broken (Won't Start)

### Recovery Protocol

1. **Check logs for config errors:**
   ```bash
   tail -100 /tmp/openclaw/openclaw-$(date +%Y-%m-%d).log
   ```

2. **Validate openclaw.json:**
   ```bash
   openclaw config validate
   ```

3. **If config corrupted, restore from mono-repo:**
   ```bash
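   # Keep the broken file for later diagnosis before overwriting it
   cp ~/.openclaw/openclaw.json ~/.openclaw/openclaw.json.broken.$(date +%Y%m%d%H%M%S)
   cp ~/hoffdesk-agents/openclaw.json ~/.openclaw/openclaw.json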
   openclaw gateway restart
   ```

4. **If still broken, reset to minimal config:**
   ```bash
   # Backup current
   cp ~/.openclaw/openclaw.json ~/.openclaw/openclaw.json.broken.$(date +%Y%m%d%H%M%S)
   
   # Generate minimal config
   openclaw config
   # Follow prompts to rebuild
   ```
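
If the CLI itself will not run, a plain syntax check can at least locate a malformed config. This sketch assumes `openclaw.json` is strict JSON, which is how the scripts in this playbook read it. It does not check the schema, and it is not a substitute for `openclaw config validate`. `json_error` is a hypothetical helper name.

```python
import json
from typing import Optional


def json_error(text: str) -> Optional[str]:
    """Return a one-line description of the first syntax error, or None if valid."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


# A trailing comma is a common hand-editing mistake.
broken = '{\n  "agents": {"list": []},\n}'
print(json_error(broken))
print(json_error('{"agents": {"list": []}}'))
```

For the real file, read it with `pathlib.Path("~/.openclaw/openclaw.json").expanduser().read_text()` and pass the text to `json_error`.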

---

## Scenario: Beelink Hardware Failure (Total Loss)

### Recovery Protocol

**Assumptions:** You have access to another Linux machine or fresh Ubuntu install.

1. **Install OpenClaw:**
   ```bash
   npm install -g openclaw
   ```

2. **Clone mono-repo:**
   ```bash
   git clone git@github.com:NightKnight64/hoffdesk-agents.git ~/hoffdesk-agents
   ```

3. **Recreate directory structure:**
   ```bash
   mkdir -p ~/.openclaw
   cp -r ~/hoffdesk-agents/agents/wadsworth ~/.openclaw/workspace
   cp -r ~/hoffdesk-agents/agents/socrates ~/.openclaw/workspace-socrates
   cp -r ~/hoffdesk-agents/agents/daedalus ~/.openclaw/workspace-daedalus
   cp -r ~/hoffdesk-agents/shared ~/.openclaw/shared
   cp ~/hoffdesk-agents/openclaw.json ~/.openclaw/
   ```

4. **Reconfigure secrets:**
   - Telegram bot tokens
   - Cloudflare tokens
   - Any API keys (not stored in git)

5. **Start gateway:**
   ```bash
   openclaw gateway start
   ```

---

## Scenario: Single Session Corrupted (Agent works but one chat is broken)

### Quick Fix

```bash
# Find the corrupted session
python3 -c "
import json
path = '/home/hoffmann_admin/.openclaw/agents/main/sessions/sessions.json'
with open(path) as f:
    store = json.load(f)
for k, v in store.items():
    if 'telegram:direct' in k:
        print(f'{k} | model={v.get(\"model\",\"?\")} | override={v.get(\"modelOverride\",\"none\")}')
"

# Reset that specific session (backup first!)
# Then DM the agent again — fresh session will spawn automatically
```

---

## Critical Files and Backup Status (Items Not in Git Must Be Reconfigured Manually)

| File | Purpose | Backup Strategy |
|------|---------|-----------------|
| `~/.openclaw/openclaw.json` | Main config | In mono-repo |
| `~/.openclaw/credentials/` | OAuth tokens | **NOT IN GIT** — re-auth required |
| Telegram bot tokens | Messaging | **NOT IN GIT** — reconfigure required |
| Cloudflare API tokens | Tunnel/Pages | **NOT IN GIT** — reconfigure required |
| `~/.openclaw/services/.env` | Service secrets | Symlinked to workspace — in mono-repo |

---

## Emergency Contacts

- **Primary:** Wadsworth (main agent) — handles coordination
- **Backend/Issues:** Socrates 🧠
- **Frontend/Issues:** Daedalus 🎨
- **Director:** Matt

---

_Last updated: 2026-04-20 by Wadsworth_
_Mono-repo migration: github.com:NightKnight64/hoffdesk-agents_