Agent Health Troubleshooting Playbook
In case of emergency — when an agent sounds generic, forgets who they are, or goes off-identity.
Symptom: Agent responds generically ("I am a large language model trained by...")
Root Cause: Session Model Override Stuck
OpenClaw's auto-failover can permanently override an agent's model at the session level. If the primary model fails, the session silently switches to whatever model completed the request — and stays there. This overrides the agent's configured model and persona.
Example: Wadsworth's DM session auto-overrode from glm-5.1:cloud to gemma4:latest (Gaming PC model). Since gemma4 is a Google-trained model, it didn't adopt Wadsworth's butler persona and responded as "I am a large language model trained by Google."
Fix: Reset the Contaminated Session
1. Find the session key:

```bash
python3 -c "
import json
with open('/home/hoffmann_admin/.openclaw/agents/main/sessions/sessions.json') as f:
    store = json.load(f)
for k, v in store.items():
    if 'telegram:direct' in k and 'heartbeat' not in k:
        print(f'Key: {k}')
        print(f'  Model: {v.get(\"model\",\"?\")}')
        print(f'  ModelOverride: {v.get(\"modelOverride\",\"none\")}')
        print(f'  ModelOverrideSource: {v.get(\"modelOverrideSource\",\"none\")}')
        print(f'  ProviderOverride: {v.get(\"providerOverride\",\"none\")}')
"
```

2. Check for model override contamination:
   - If `modelOverride` is set to a model that doesn't match the agent's configured primary, the session is contaminated.
   - If `modelOverrideSource` is `"auto"`, auto-failover stuck it there.

3. Back up and delete the contaminated session:

```bash
# Complete both values with the session ID (transcript filename) and the key found in step 1
SESSION_FILE="/home/hoffmann_admin/.openclaw/agents/main/sessions/.jsonl"
SESSION_KEY="agent:main:telegram:direct:"

# Back up first
cp "$SESSION_FILE" "${SESSION_FILE}.contaminated-backup.$(date +%Y%m%d%H%M%S)"

# Remove from sessions store
python3 -c "
import json
store_path = '/home/hoffmann_admin/.openclaw/agents/main/sessions/sessions.json'
with open(store_path) as f:
    store = json.load(f)
key = '$SESSION_KEY'
if key in store:
    del store[key]
    print(f'Removed: {key}')
else:
    print(f'Not found: {key}')
with open(store_path, 'w') as f:
    json.dump(store, f, indent=2)
"

# Move the transcript aside
mv "$SESSION_FILE" "${SESSION_FILE}.contaminated-deleted.$(date +%Y%m%d%H%M%S)"

# Restart gateway
openclaw gateway restart
```

4. Test: DM the agent with "Who are you?". It should respond with its correct identity.
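The two contamination checks above fit in a small predicate. This is a sketch, not OpenClaw code: `check_contamination` is a hypothetical helper that assumes each `sessions.json` entry is a dict using the `modelOverride` and `modelOverrideSource` field names this playbook reads. It also assumes you supply the agent's configured primary model yourself. For Wadsworth in the example above, that model is `glm-5.1:cloud`. The helper compares the strings exactly as stored, so if your store keeps the provider separately in `providerOverride`, compare that field as well.

```python
from typing import Optional, Tuple


def check_contamination(entry: dict, configured_primary: str) -> Tuple[bool, Optional[str]]:
    """Apply the two contamination checks to one sessions.json entry.

    Assumes the override lives under 'modelOverride' and its origin under
    'modelOverrideSource'. Returns (contaminated, reason).
    """
    override = entry.get("modelOverride")
    if not override:
        return False, None  # no override: the session follows the configured model
    if override == configured_primary:
        return False, None  # override matches the primary, so it is harmless
    if entry.get("modelOverrideSource") == "auto":
        return True, f"auto-failover pinned {override}"
    return True, f"override {override} differs from primary {configured_primary}"
```

If the override is stored as `gemma4:latest`, calling `check_contamination(store[key], "glm-5.1:cloud")` on the Wadsworth entry would return `(True, "auto-failover pinned gemma4:latest")`.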
Prevention
- Monitor `modelOverride` and `modelOverrideSource: "auto"` in session stores.
- If Active Memory or failover keeps overriding the model, the fallback chain may be too aggressive.
- Consider whether `ollama-remote` (Gaming PC) models belong in the fallback chain for agents that need strong persona adherence. Remote models are slow and may trigger cascading failovers.
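A periodic scan of every agent's store covers the monitoring point. The store paths below combine the session directories from the Quick Reference table with the `sessions.json` filename the other scripts use. `find_auto_overrides` and `scan_all_agents` are hypothetical names, and the field names are the ones assumed throughout this playbook.

```python
import json
from pathlib import Path

# Session stores per agent (directories from the Quick Reference table).
AGENT_STORES = {
    "main": "~/.openclaw/agents/main/sessions/sessions.json",
    "socrates": "~/.openclaw/agents/socrates/sessions/sessions.json",
    "daedalus": "~/.openclaw/agents/daedalus/sessions/sessions.json",
}


def find_auto_overrides(store: dict) -> list:
    """Return (session_key, modelOverride) pairs that auto-failover pinned."""
    hits = []
    for key, entry in store.items():
        if not isinstance(entry, dict):
            continue
        if entry.get("modelOverride") and entry.get("modelOverrideSource") == "auto":
            hits.append((key, entry["modelOverride"]))
    return hits


def scan_all_agents(stores: dict = AGENT_STORES) -> dict:
    """Load each agent's store, skipping missing files, and report auto overrides."""
    report = {}
    for agent, raw_path in stores.items():
        path = Path(raw_path).expanduser()
        if not path.exists():
            continue
        with path.open() as f:
            report[agent] = find_auto_overrides(json.load(f))
    return report
```

When run from cron, any non-empty list in the result of `scan_all_agents()` points to a session that is a candidate for the reset procedure above.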
Symptom: Agent takes 30-60 seconds before responding (then may respond generically)
Root Cause: Active Memory Plugin Blocking on Slow Model
Active Memory runs as a blocking pre-response hook. It searches session history and generates a recall summary before the main agent can respond. If the recall model is too slow, every message has a 30-60s delay before the agent even starts thinking.
Example: Active Memory was configured for ollama-remote/gemma4:latest (Gaming PC). 7 out of 8 calls timed out at 30s. Even switching to glm-5.1:cloud (cloud proxy) resulted in 58s completion time — still too slow for a blocking gate.
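A toy model makes the cost of a blocking hook easier to see. The sketch below uses plain `asyncio` and is not OpenClaw's plugin code. The reply cannot start until a stand-in `recall` coroutine finishes or `asyncio.wait_for` times out. On timeout the agent continues with empty context, which corresponds to the `status=timeout` case described below. The delays are arbitrary stand-ins.

```python
import asyncio
import time


async def recall(delay_s: float) -> str:
    """Stand-in for the recall model call; delay_s models its inference time."""
    await asyncio.sleep(delay_s)
    return "recall summary"


async def respond(recall_delay_s: float, timeout_s: float) -> tuple:
    """Blocking pre-response hook: the reply waits for recall or its timeout."""
    start = time.monotonic()
    try:
        context = await asyncio.wait_for(recall(recall_delay_s), timeout=timeout_s)
        status = "done"
    except asyncio.TimeoutError:
        context = ""  # timed out: the agent proceeds with zero memory context
        status = "timeout"
    waited = time.monotonic() - start  # latency added before the main model starts
    return status, context, waited
```

With the 30 s budget from the example, `asyncio.run(respond(58, 30))` would hold the message for 30 s and still return no memory context. Every incoming message pays that cost.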
Fix: Disable Active Memory or Use a Fast Local Model
Option A — Disable Active Memory (recommended for current hardware):

```json
"active-memory": {
  "enabled": false,
  "config": {
    "agents": ["main"],
    "model": "ollama/llama3.2:1b-instruct-q4_K_M",
    "modelFallback": "ollama/glm-5.1:cloud"
  }
}
```
Workspace files (SOUL.md, IDENTITY.md, MEMORY.md, AGENTS.md) provide strong identity grounding without Active Memory. The sessionMemory experimental feature (already enabled) indexes session transcripts for recall without the blocking gate.
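This playbook doesn't show where the `"active-memory"` block sits inside `openclaw.json`. Rather than guess a path, this sketch searches for the key and sets `enabled` to `false`. It leaves the `config` sub-block untouched, so re-enabling later is a one-field change. The sketch assumes `openclaw.json` is strict JSON, as the other scripts here do, and it writes a `.bak` copy before saving. `set_plugin_enabled` and `disable_active_memory` are hypothetical helpers, not OpenClaw commands.

```python
import json
import shutil
from pathlib import Path


def set_plugin_enabled(node, plugin_key: str, enabled: bool) -> int:
    """Recursively set 'enabled' on every dict stored under plugin_key.

    Returns how many blocks were changed.
    """
    changed = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key == plugin_key and isinstance(value, dict):
                value["enabled"] = enabled
                changed += 1
            else:
                changed += set_plugin_enabled(value, plugin_key, enabled)
    elif isinstance(node, list):
        for item in node:
            changed += set_plugin_enabled(item, plugin_key, enabled)
    return changed


def disable_active_memory(config_path: str = "~/.openclaw/openclaw.json") -> int:
    """Back up the config, then disable every "active-memory" block in it."""
    path = Path(config_path).expanduser()
    config = json.loads(path.read_text())
    changed = set_plugin_enabled(config, "active-memory", False)
    if changed:
        shutil.copy2(path, path.with_name(path.name + ".bak"))  # hypothetical backup name
        path.write_text(json.dumps(config, indent=2) + "\n")
    return changed
```

If `disable_active_memory()` returns a non-zero count, follow up with `openclaw config validate` and `openclaw gateway restart`, both of which are used elsewhere in this playbook.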
Option B — Use a fast local model (requires sub-5s inference on Beelink CPU):
As of 2026-04-20, no tested model completes Active Memory recall in under 5s on the Beelink's CPU:
- llama3.2:3b: 8.4s, failed to recall ("None")
- llama3.2:1b: 9.9s, partial recall
- llama3.2:1b-instruct-q4_K_M: 6.4s, best option but still too slow
- qwen3.5:4b: >30s, timed out
Revisit if the Beelink gets a GPU or a significantly faster small model becomes available.
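To rerun the comparison when the hardware or the models change, a small timing harness against Ollama's HTTP API is enough. It calls `POST /api/generate` with `"stream": false` on Ollama's default port 11434. The harness times a plain generation with whatever prompt you pass. That is an assumption: it approximates Active Memory's actual recall request but is not the same request. The 5 s budget comes from Option B above. Both function names are hypothetical.

```python
import json
import time
import urllib.request


def time_ollama_generate(model: str, prompt: str,
                         host: str = "http://localhost:11434",
                         timeout_s: float = 60.0) -> float:
    """Wall-clock seconds for one non-streaming Ollama /api/generate call."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(f"{host}/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    start = time.monotonic()
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        json.load(resp)  # read the whole reply so the timing covers generation
    return time.monotonic() - start


def within_budget(latencies_s: list, budget_s: float = 5.0) -> bool:
    """True only if there is at least one run and every run beat the budget."""
    return bool(latencies_s) and max(latencies_s) < budget_s
```

Ollama expects the bare model name, without the `ollama/` provider prefix used in the config. For example: `within_budget([time_ollama_generate("llama3.2:1b-instruct-q4_K_M", prompt) for _ in range(5)])`.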
Active Memory Logs
Check Active Memory performance in gateway logs:
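```bash
# -h drops the filename prefix grep adds when matching several files; without it json.loads fails on every line
grep -h "active-memory" /tmp/openclaw/openclaw-*.log | grep -E "(start|done)" | python3 -c "
import sys, json
for line in sys.stdin:
    try:
        d = json.loads(line.strip())
    except json.JSONDecodeError:
        continue  # skip lines that are not JSON log records
    if not isinstance(d, dict):
        continue
    msg = str(d.get('1', ''))
    ts = str(d.get('time', ''))
    if 'active-memory' in msg:
        print(f'{ts[:19]} | {msg}')
" | tail -20
```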
Look for status=timeout or status=empty — these mean the recall failed and the agent got zero memory context.
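The same log records can also be tallied by status instead of read line by line. This sketch makes the same assumptions as the pipeline above: one JSON record per line, with the message text under key `'1'`. It also assumes a `status=<word>` token appears inside that message, like the `status=timeout` and `status=empty` values mentioned here. `summarize_active_memory` is a hypothetical helper.

```python
import json
import re
from collections import Counter

STATUS_RE = re.compile(r"status=(\w+)")


def summarize_active_memory(lines) -> Counter:
    """Count status=... values across active-memory log records."""
    counts = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON record
        if not isinstance(entry, dict):
            continue
        msg = str(entry.get("1", ""))
        if "active-memory" not in msg:
            continue
        match = STATUS_RE.search(msg)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing it an open log file, as in `summarize_active_memory(open(path))`, returns a `Counter`. If `timeout` and `empty` dominate the counts, most replies ran without memory context.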
Symptom: Agent identity confusion (responds as wrong agent)
Root Cause: Contaminated Session History
If someone operated as Agent A through Agent B's bot/channel, the session history contains mixed identity context. The model then can't determine which persona to adopt.
Fix: Reset the contaminated session (same procedure as model override above)
Prevention
- Never operate as Socrates through the main/default bot — it contaminates Wadsworth's session history with wrong identity context
- Never operate as Daedalus through Wadsworth's bot — same problem
- Always use the correct agent's bot/channel for that agent's work
- Inter-agent communication: use `sessions_send` or `sessions_spawn`, never the Telegram API
Symptom: hoffdesk-webhook service failing
Root Cause: Template placeholders in systemd unit file
The service file at /etc/systemd/system/hoffdesk-webhook.service had __USER__, __WORKDIR__, __PREFIX__ placeholders that were never substituted.
Fix: Already applied (2026-04-20)
Replaced with actual values:
- __USER__ → hoffmann_admin
- __WORKDIR__ → /home/hoffmann_admin/.openclaw/services/family_assistant
- __PREFIX__ → /home/hoffmann_admin/.local/bin
Requires sudo to apply:
```bash
sudo systemctl daemon-reload && sudo systemctl restart hoffdesk-webhook
```
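You can guard against this class of failure by checking a unit file for leftover `__NAME__` tokens before installing it. Substitution should also refuse to emit a file that still contains any. These are hypothetical helpers, not part of the hoffdesk-webhook tooling. The regex assumes placeholders are uppercase names wrapped in double underscores, like the three listed above.

```python
import re

PLACEHOLDER_RE = re.compile(r"__[A-Z][A-Z0-9_]*__")


def find_placeholders(text: str) -> set:
    """Return the unsubstituted __NAME__ tokens left in a unit file."""
    return set(PLACEHOLDER_RE.findall(text))


def render_unit(template: str, values: dict) -> str:
    """Substitute __NAME__ tokens and fail loudly if any remain unfilled."""
    for name, value in values.items():
        template = template.replace(f"__{name}__", value)
    leftover = find_placeholders(template)
    if leftover:
        raise ValueError(f"unsubstituted placeholders: {sorted(leftover)}")
    return template
```

After the fix, `find_placeholders(open("/etc/systemd/system/hoffdesk-webhook.service").read())` should return an empty set. `render_unit(template, {"USER": "hoffmann_admin", ...})` takes the same values listed above.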
Quick Reference: Agent Config Locations
| Item | Path |
|---|---|
| Main config | ~/.openclaw/openclaw.json |
| Wadsworth workspace | ~/.openclaw/workspace/ |
| Socrates workspace | ~/.openclaw/workspace-socrates/ |
| Daedalus workspace | ~/.openclaw/workspace-daedalus/ |
| Wadsworth sessions | ~/.openclaw/agents/main/sessions/ |
| Socrates sessions | ~/.openclaw/agents/socrates/sessions/ |
| Daedalus sessions | ~/.openclaw/agents/daedalus/sessions/ |
| Gateway logs | /tmp/openclaw/openclaw-YYYY-MM-DD.log |
| Webhook service | /etc/systemd/system/hoffdesk-webhook.service |
| Webhook env | ~/.openclaw/workspace/scripts/.env |
Quick Reference: Session Reset Commands
```bash
# List sessions for an agent (replace <AGENT> with main, socrates, or daedalus)
python3 -c "
import json
with open('/home/hoffmann_admin/.openclaw/agents/<AGENT>/sessions/sessions.json') as f:
    store = json.load(f)
for k, v in store.items():
    if 'direct' in k or 'group' in k:
        print(f'{k:60s} | model={str(v.get(\"model\",\"?\")):20s} | override={v.get(\"modelOverride\",\"none\")}')
"

# Reset a specific session (backup first!) with the backup-and-delete steps in the first section
# Replace SESSION_KEY and SESSION_ID with actual values
```
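The reset procedure in the first section backs up the transcript but rewrites `sessions.json` in place, with no copy of the store itself. The sketch below closes that gap. It assumes the store is a flat JSON object keyed by session key, which is how the scripts here treat it. It takes a timestamped backup first. It then writes the new store to a temp file in the same directory and swaps it in with `os.replace`, so an interrupted write cannot leave a truncated store. `reset_session` and the `.backup.<timestamp>` suffix are hypothetical.

```python
import json
import os
import shutil
import tempfile
import time
from pathlib import Path


def reset_session(store_path: str, session_key: str) -> bool:
    """Back up sessions.json, then remove one key with an atomic rewrite.

    Returns True if the key was present.
    """
    path = Path(store_path).expanduser()
    stamp = time.strftime("%Y%m%d%H%M%S")
    shutil.copy2(path, path.with_name(f"{path.name}.backup.{stamp}"))
    store = json.loads(path.read_text())
    if session_key not in store:
        return False
    del store[session_key]
    # Write to a temp file beside the store, keep the original permissions, then swap.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".sessions.", suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(store, f, indent=2)
    shutil.copymode(path, tmp)
    os.replace(tmp, path)
    return True
```

`reset_session("~/.openclaw/agents/<AGENT>/sessions/sessions.json", SESSION_KEY)` handles only the store. You still need to move the transcript aside and run `openclaw gateway restart`.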
Last updated: 2026-04-20 by Socrates
Incident: Wadsworth responding as "trained by Google" due to model override + Active Memory cascade failure
FALLBACK CONTINGENCIES (In Case of Emergency)
Source of Truth: github.com:NightKnight64/hoffdesk-agents
Last Backup: 2026-04-20 — Mono-repo consolidation complete
Scenario: Complete Agent Loss (Corrupted or Deleted)
Recovery Protocol
1. Clone the mono-repo:

```bash
git clone git@github.com:NightKnight64/hoffdesk-agents.git ~/hoffdesk-agents-recovery
cd ~/hoffdesk-agents-recovery
```

2. Restore agent workspaces. `mkdir -p` covers a deleted workspace, and the trailing `/.` also copies hidden files, which `*` skips:

```bash
# For Wadsworth
mkdir -p ~/.openclaw/workspace
cp -r agents/wadsworth/. ~/.openclaw/workspace/
# For Socrates
mkdir -p ~/.openclaw/workspace-socrates
cp -r agents/socrates/. ~/.openclaw/workspace-socrates/
# For Daedalus
mkdir -p ~/.openclaw/workspace-daedalus
cp -r agents/daedalus/. ~/.openclaw/workspace-daedalus/
```

3. Restore shared artifacts:

```bash
mkdir -p ~/.openclaw/shared
cp -r shared/. ~/.openclaw/shared/
```

4. Verify the workspace paths in `openclaw.json`:

```bash
cat ~/.openclaw/openclaw.json | python3 -c "
import json, sys
d = json.load(sys.stdin)
for agent in d.get('agents', {}).get('list', []):
    print(f'{agent[\"id\"]}: {agent.get(\"workspace\", \"(not set)\")}')
"
```

   - `main` should point to `~/.openclaw/workspace` (or `~/hoffdesk-agents/agents/wadsworth` if migrated)
   - `socrates` should point to `~/.openclaw/workspace-socrates`
   - `daedalus` should point to `~/.openclaw/workspace-daedalus`

5. Restart the gateway:

```bash
openclaw gateway restart
```

6. Test: send "Who are you?" to each agent. Each should respond with its correct identity.
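The workspace-path check above only prints the paths for a human to compare. This sketch does the comparison. It assumes the `agents.list[].id` / `workspace` layout that the printing script reads. The expected mapping is the non-migrated one listed above, so pass a different `expected` dict if Wadsworth's workspace has moved to `~/hoffdesk-agents/agents/wadsworth`. The function names are hypothetical.

```python
import json
import os

# Expected agent -> workspace mapping (non-migrated layout).
EXPECTED_WORKSPACES = {
    "main": "~/.openclaw/workspace",
    "socrates": "~/.openclaw/workspace-socrates",
    "daedalus": "~/.openclaw/workspace-daedalus",
}


def _norm(p: str) -> str:
    """Expand ~ and drop trailing slashes so equivalent paths compare equal."""
    return os.path.normpath(os.path.expanduser(p))


def load_config(path: str = "~/.openclaw/openclaw.json") -> dict:
    with open(os.path.expanduser(path)) as f:
        return json.load(f)


def workspace_mismatches(config: dict, expected: dict = EXPECTED_WORKSPACES) -> dict:
    """Map agent id -> (actual, expected) for every agent whose workspace differs.

    Agents missing from the config are reported with actual=None.
    """
    agents = config.get("agents", {}).get("list", [])
    actual = {a.get("id"): a.get("workspace") for a in agents}
    problems = {}
    for agent_id, want in expected.items():
        have = actual.get(agent_id)
        if have is None or _norm(have) != _norm(want):
            problems[agent_id] = (have, want)
    return problems
```

If `workspace_mismatches(load_config())` returns `{}`, all three agents point where they should.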
Scenario: Gateway Completely Broken (Won't Start)
Recovery Protocol
1. Check the logs for config errors:

```bash
tail -100 /tmp/openclaw/openclaw-$(date +%Y-%m-%d).log
```

2. Validate `openclaw.json`:

```bash
openclaw config validate
```

3. If the config is corrupted, back it up and restore from the mono-repo:

```bash
cp ~/.openclaw/openclaw.json ~/.openclaw/openclaw.json.broken.$(date +%Y%m%d%H%M%S)
cp ~/hoffdesk-agents/openclaw.json ~/.openclaw/openclaw.json
openclaw gateway restart
```

4. If it is still broken, reset to a minimal config:

```bash
# Back up current
cp ~/.openclaw/openclaw.json ~/.openclaw/openclaw.json.broken.$(date +%Y%m%d%H%M%S)
# Generate minimal config
openclaw config
# Follow prompts to rebuild
```
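When `openclaw.json` fails to parse at all, Python's standard `json` module can point to the offending line and column. That lets you repair the file by hand instead of replacing it with the mono-repo copy. This is a generic syntax check and knows nothing about OpenClaw's schema. It assumes the file is strict JSON, as the playbook's own scripts do when they `json.load` it. Both function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def json_syntax_error(text: str) -> Optional[str]:
    """Return 'line L, column C: message' for the first JSON syntax error, else None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None


def check_config_file(path: str = "~/.openclaw/openclaw.json") -> Optional[str]:
    return json_syntax_error(Path(path).expanduser().read_text())
```

A trailing comma before a closing brace is a typical hand-edit mistake. The check reports it at the line holding the `}`.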
Scenario: Beelink Hardware Failure (Total Loss)
Recovery Protocol
Assumptions: You have access to another Linux machine or fresh Ubuntu install.
1. Install OpenClaw:

```bash
npm install -g openclaw
```

2. Clone the mono-repo. The `git@github.com:` URL uses SSH, so the new machine needs an SSH key authorized on GitHub:

```bash
git clone git@github.com:NightKnight64/hoffdesk-agents.git ~/hoffdesk-agents
```

3. Recreate the directory structure:

```bash
mkdir -p ~/.openclaw
cp -r ~/hoffdesk-agents/agents/wadsworth ~/.openclaw/workspace
cp -r ~/hoffdesk-agents/agents/socrates ~/.openclaw/workspace-socrates
cp -r ~/hoffdesk-agents/agents/daedalus ~/.openclaw/workspace-daedalus
cp -r ~/hoffdesk-agents/shared ~/.openclaw/shared
cp ~/hoffdesk-agents/openclaw.json ~/.openclaw/
```

4. Reconfigure secrets:
   - Telegram bot tokens
   - Cloudflare tokens
   - Any API keys (not stored in git)

5. Start the gateway:

```bash
openclaw gateway start
```
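Before starting the gateway, a quick existence check catches a missed `cp` in the recreated directory structure. The paths below are the ones this scenario recreates. `missing_paths` is a hypothetical helper, and it only checks that the files and directories exist. It does not confirm that secrets have been reconfigured.

```python
from pathlib import Path

# Layout recreated in this scenario.
EXPECTED_PATHS = [
    "~/.openclaw/openclaw.json",
    "~/.openclaw/workspace",
    "~/.openclaw/workspace-socrates",
    "~/.openclaw/workspace-daedalus",
    "~/.openclaw/shared",
]


def missing_paths(paths=EXPECTED_PATHS, home=None) -> list:
    """Return the expected paths that do not exist; an empty list means complete.

    `home` substitutes a different root for ~ (useful for testing); by default
    ~ expands to the current user's home directory.
    """
    missing = []
    for raw in paths:
        if home and raw.startswith("~/"):
            p = Path(home, raw[2:])
        else:
            p = Path(raw).expanduser()
        if not p.exists():
            missing.append(raw)
    return missing
```

An empty list from `missing_paths()` means the layout is in place. If every workspace is expected to carry workspace files such as `SOUL.md` or `IDENTITY.md`, you can append them to `EXPECTED_PATHS`.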
Scenario: Single Session Corrupted (Agent works but one chat is broken)
Quick Fix
```bash
# Find the corrupted session
python3 -c "
import json
path = '/home/hoffmann_admin/.openclaw/agents/main/sessions/sessions.json'
with open(path) as f:
    store = json.load(f)
for k, v in store.items():
    if 'telegram:direct' in k:
        print(f'{k} | model={v.get(\"model\",\"?\")} | override={v.get(\"modelOverride\",\"none\")}')
"
# Reset that specific session (backup first!)
# Then DM the agent again; a fresh session will spawn automatically
```
Critical Files Not in Git (Must Reconfigure Manually)
| File | Purpose | Backup Strategy |
|---|---|---|
| ~/.openclaw/openclaw.json | Main config | In mono-repo |
| ~/.openclaw/credentials/ | OAuth tokens | NOT IN GIT; re-auth required |
| Telegram bot tokens | Messaging | NOT IN GIT; reconfigure required |
| Cloudflare API tokens | Tunnel/Pages | NOT IN GIT; reconfigure required |
| ~/.openclaw/services/.env | Service secrets | Symlinked to workspace; in mono-repo |
Emergency Contacts
- Primary: Wadsworth (main agent) — handles coordination
- Backend/Issues: Socrates 🧠
- Frontend/Issues: Daedalus 🎨
- Director: Matt
Last updated: 2026-04-20 by Wadsworth
Mono-repo migration: github.com:NightKnight64/hoffdesk-agents