Runbooks
Common ops procedures. Each is copy-paste ready — run top to bottom.
Wipe DB (fresh-start testing)
When you want to reset to a clean state for testing:
# 1. Backup first (always)
ssh jaeger "docker exec lumen-postgres pg_dump -U lumen -d lumen \
| gzip > /tmp/lumen-backup-$(date +%Y%m%d-%H%M%S).sql.gz"
# 2. Wipe data (keep schema + llm_providers config)
ssh jaeger "docker exec -i lumen-postgres psql -U lumen -d lumen" <<'SQL'
BEGIN;
TRUNCATE TABLE
audit_log, chunks, conversations, messages,
documents, group_members, groups, project_grants,
project_memories, project_members, projects,
share_requests, share_tokens, users, departments
RESTART IDENTITY CASCADE;
UPDATE bootstrap_state SET is_initialized = false;
COMMIT;
SQL
# 3. Clear uploads
ssh jaeger "docker exec lumen-api sh -c 'rm -rf /data/uploads/*'"
# 4. Verify
curl https://lumen-api.zenmail.my.id/bootstrap/status
# Expect: {"isInitialized":false}
Visit https://ai-kb.zenmail.my.id — middleware redirects to /onboarding/first-login.
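Before running step 2, it may be worth confirming that the dump from step 1 is usable. The sketch below assumes a hypothetical helper named verify_backup (not part of the Lumen repo), run on jaeger, where the backup file lives. It only checks that the file is non-empty and that the gzip stream is intact.

```shell
# Sketch: sanity-check a compressed dump before wiping anything.
# verify_backup is a hypothetical helper, not an existing Lumen script.
verify_backup() {
  file="$1"
  # -s is true only when the file exists and is not empty
  if [ ! -s "$file" ]; then
    echo "missing or empty: $file" >&2
    return 1
  fi
  # gzip -t tests archive integrity without extracting it
  if ! gzip -t "$file" 2>/dev/null; then
    echo "corrupt gzip: $file" >&2
    return 1
  fi
  echo "ok: $file"
}

# Example (on jaeger): verify_backup /tmp/lumen-backup-<timestamp>.sql.gz
```

A failed pg_dump can still leave a tiny but valid gzip file, so a passing check is a necessary signal rather than proof. Glancing at the file size with ls -lh is a cheap extra guard.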
Fresh deploy from scratch
# 1. Ensure repo is clean
cd ~/ai-knowledge-base
git status
git pull origin master
# 2. Trigger redeploy via Dokploy API
ssh jaeger "curl -s -b /tmp/dk_cookie -X POST \
http://localhost:3000/api/trpc/compose.redeploy \
-H 'Content-Type: application/json' \
-d '{\"json\":{\"composeId\":\"-lHrFrWxG8415d10zZ3j0\"}}'"
# 3. Watch
watch -n 5 "ssh jaeger 'docker ps --filter name=lumen --format \"{{.Names}}\\t{{.Status}}\"'"
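Expect every service to show Up X seconds at first, then Up X minutes, and to stay up with a steadily increasing uptime. An uptime that keeps resetting means a container is restarting.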
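Instead of eyeballing the watch output, the status column can be filtered mechanically. This is a minimal sketch, assuming the same name-TAB-status format as the docker ps command above; not_up is a hypothetical helper name.

```shell
# Reads "name<TAB>status" lines on stdin (the docker ps --format used above)
# and prints the name of every container whose status does not start with "Up".
not_up() {
  awk -F '\t' '$2 !~ /^Up / { print $1 }'
}

# Example: -a also lists exited containers, which plain docker ps hides.
# ssh jaeger 'docker ps -a --filter name=lumen --format "{{.Names}}\t{{.Status}}"' | not_up
```

Empty output means every listed container reports Up. A container marked Up ... (unhealthy) still passes this filter, so it does not replace reading the logs.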
Rotate JWT secret
Warning: this invalidates every access and refresh token, so every user will have to log in again.
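# Generate new secret and print it so it can be pasted into Dokploy
NEW_SECRET=$(openssl rand -base64 32)
echo "$NEW_SECRET"
# Update in Dokploy env via web UI:
# Apps → lumen-api → Environment → JWT_SECRET = <new value>
# Trigger a redeploy so the API picks up the new secret
# (this call uses the same composeId as the full redeploy runbook)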
ssh jaeger "curl -s -b /tmp/dk_cookie -X POST \
http://localhost:3000/api/trpc/compose.redeploy \
-H 'Content-Type: application/json' \
-d '{\"json\":{\"composeId\":\"-lHrFrWxG8415d10zZ3j0\"}}'"
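A secret that gets truncated while being pasted into the web UI is easy to miss. As a hedged sketch, the hypothetical helper secret_bytes below reports how many raw bytes a base64 string decodes to; the output of openssl rand -base64 32 should decode to 32. It assumes a base64 binary that accepts -d (GNU coreutils and recent macOS do).

```shell
# Prints how many raw bytes a base64-encoded secret decodes to.
# secret_bytes is a hypothetical helper; tr strips the padding some wc builds add.
secret_bytes() {
  printf '%s' "$1" | base64 -d | wc -c | tr -d ' '
}

# Example: compare the value saved in Dokploy against the expected 32 bytes.
# [ "$(secret_bytes "$NEW_SECRET")" -eq 32 ] || echo "secret looks truncated"
```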
Revert a bad commit
# Find the bad commit
git log --oneline -10
# Create a revert
git revert <sha> --no-edit
# Push — Dokploy auto-deploys
git push origin master
If the frontend is completely broken, users will keep seeing the cached Next.js output until Dokploy finishes rebuilding (roughly 3 to 6 min).
Clear Redis queue
ssh jaeger "docker exec lumen-redis redis-cli FLUSHDB"
Any in-flight document-processing jobs will be lost, and the affected documents will need to be re-uploaded. Do NOT run this while users are uploading.
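Because FLUSHDB is irreversible, a confirmation step in front of it can help. The sketch below is an assumption-laden wrapper, not an existing script: confirm_flush is a hypothetical name, and the key count comes from the standard Redis DBSIZE command.

```shell
# confirm_flush KEY_COUNT
# Prompts before a destructive flush; succeeds only on the exact answer "yes".
confirm_flush() {
  count="$1"
  printf 'About to delete %s key(s) from Redis. Type "yes" to continue: ' "$count" >&2
  read -r answer
  [ "$answer" = "yes" ]
}

# Example:
# keys=$(ssh jaeger "docker exec lumen-redis redis-cli DBSIZE")
# confirm_flush "$keys" && ssh jaeger "docker exec lumen-redis redis-cli FLUSHDB"
```

A non-zero key count is a hint that jobs may still be queued, which is a good reason to stop and check before flushing.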
Force re-embed all documents
When the embedder model changes (e.g. upgrade from all-MiniLM-L6-v2 to multilingual-e5-small):
# 1. Delete existing chunks
ssh jaeger "docker exec -i lumen-postgres psql -U lumen -d lumen" <<'SQL'
BEGIN;
TRUNCATE TABLE chunks RESTART IDENTITY;
UPDATE documents SET status = 'pending' WHERE status = 'indexed';
COMMIT;
SQL
# 2. Re-enqueue all documents via the script
cd ~/ai-knowledge-base/scripts
python embed_chunks.py --all
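To follow the re-embed, the documents table can be polled by status. This is a minimal sketch: reembed_progress is a hypothetical helper that summarises status|count lines as produced by psql in unaligned, tuples-only mode (-At). Only the pending and indexed statuses appear in this runbook; any other status simply counts toward the total.

```shell
# Reads "status|count" lines on stdin and prints how many documents are indexed.
reembed_progress() {
  awk -F '|' '
    { total += $2 }
    $1 == "indexed" { indexed += $2 }
    END { printf "indexed %d of %d\n", indexed, total }
  '
}

# Example:
# ssh jaeger "docker exec lumen-postgres psql -U lumen -d lumen -At -c \
#   \"SELECT status, count(*) FROM documents GROUP BY status;\"" | reembed_progress
```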
Check a user's access
ssh jaeger "docker exec lumen-postgres psql -U lumen -d lumen -c \
\"SELECT id, email, platform_role, org_position, department_id \
FROM users WHERE email = 'foo@bar.com';\""
To compute their full access tier on a project, look at:
- Their platform_role (admin/engineer/superadmin → full on everything)
- Their org_position (ceo → use on everything)
- projects.owner_id = user.id → full
- Rows in project_grants matching user/group/dept
- projects.is_private: if false, use baseline for everyone
See Resolver priority.
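The checks listed above can be sketched as a small function for quick what-if reasoning. This is an illustration only. It assumes the highest applicable tier wins, that grant tiers are either full or use, and that "none" is the fallback; all three are assumptions, and the real ordering is whatever Resolver priority defines. access_tier is a hypothetical name.

```shell
# access_tier PLATFORM_ROLE ORG_POSITION IS_OWNER IS_PRIVATE [GRANT_TIER]
# IS_OWNER and IS_PRIVATE are "true" or "false". GRANT_TIER is the best tier
# found in project_grants for the user, their groups, or their department.
# ASSUMPTION: highest tier wins; "none" is a placeholder for "no access".
access_tier() {
  role="$1"; position="$2"; is_owner="$3"; is_private="$4"; grant="${5:-}"
  case "$role" in
    admin|engineer|superadmin) echo full; return ;;
  esac
  if [ "$is_owner" = "true" ] || [ "$grant" = "full" ]; then
    echo full; return
  fi
  if [ "$position" = "ceo" ] || [ "$grant" = "use" ] || [ "$is_private" = "false" ]; then
    echo use; return
  fi
  echo none
}

# Example: a non-owner staff member on a public project
# access_tier member staff false false
```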
Tail logs during a deploy
# API logs
ssh jaeger "docker logs -f --tail 50 lumen-api"
# Web logs
ssh jaeger "docker logs -f --tail 50 lumen-web"
# Worker logs (document processing)
ssh jaeger "docker logs -f --tail 50 lumen-worker"
# Embedder (model loading)
ssh jaeger "docker logs -f --tail 50 lumen-embedder"
"Authentication Fails" on chat
Classic symptom: chat returns "Error: Failed to generate response" and API logs show:
[LLM] API error: 401 Authentication Fails (auth header format should be Bearer sk-...)
[LLM] resolveProvider -> url=... keyPrefix=(empty) keyLen=0
The resolver couldn't find an API key. Causes:
- No LLM provider configured at all → add one at /engineer/providers
- Provider not marked is_default = true AND chat sent a bare model name
- Bash escaping corrupted an API key in a DB UPDATE (happened during issue #3)
Fix: check llm_providers.api_key actually starts with sk- (or similar prefix). If corrupted, re-save via the Edit provider modal.
ssh jaeger "docker exec lumen-postgres psql -U lumen -d lumen -c \
\"SELECT name, substring(api_key, 1, 10) AS prefix, length(api_key) AS len \
FROM llm_providers;\""
Good: prefix=sk-abc1234..., len=51. Bad: prefix=, len=0 or short weird prefixes.
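With several providers configured, the good/bad judgement above can be applied line by line. The following is a sketch under stated assumptions: check_keys is a hypothetical helper, the input is the same query run with psql -At so each row arrives as name|prefix|len, and the sk- test only suits providers whose keys use that prefix.

```shell
# Reads "name|prefix|len" lines on stdin and labels each provider.
# ASSUMPTION: valid keys start with "sk-"; adjust the pattern for other providers.
check_keys() {
  awk -F '|' '
    $3 == 0 || $2 == "" { print $1 ": missing key"; next }
    $2 !~ /^sk-/        { print $1 ": unexpected prefix"; next }
                        { print $1 ": looks ok" }
  '
}

# Example:
# ssh jaeger "docker exec lumen-postgres psql -U lumen -d lumen -At -c \
#   \"SELECT name, substring(api_key, 1, 10), length(api_key) FROM llm_providers;\"" | check_keys
```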
Smoke test the whole stack
# Run from your laptop (prod)
TOKEN=$(curl -s -X POST "https://lumen-api.zenmail.my.id/auth/login" \
-H 'Content-Type: application/json' \
-d '{"email":"...","password":"..."}' | jq -r .accessToken)
echo "=== bootstrap ==="
curl -s "https://lumen-api.zenmail.my.id/bootstrap/status"
echo "=== projects ==="
curl -s -H "Authorization: Bearer $TOKEN" \
"https://lumen-api.zenmail.my.id/projects" | jq '.projects | length'
echo "=== providers ==="
curl -s -H "Authorization: Bearer $TOKEN" \
"https://lumen-api.zenmail.my.id/providers" | jq '.providers | length'
echo "=== chat config ==="
curl -s -H "Authorization: Bearer $TOKEN" \
"https://lumen-api.zenmail.my.id/chat/config"
All four should return non-error JSON. If any returns {error: ...} or 5xx, start with docker logs lumen-api.
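The same checks can be made pass/fail by looking at HTTP status codes rather than reading JSON by eye. This is a minimal sketch: check_endpoint is a hypothetical helper, and it assumes the API signals errors with a non-2xx status, which the runbook does not confirm for every endpoint.

```shell
API="https://lumen-api.zenmail.my.id"

# check_endpoint PATH [TOKEN]
# Prints "PATH STATUS" and returns non-zero when the HTTP status is not 2xx.
check_endpoint() {
  path="$1"; token="${2:-}"
  if [ -n "$token" ]; then
    code=$(curl -s -o /dev/null -w '%{http_code}' \
      -H "Authorization: Bearer $token" "$API$path")
  else
    code=$(curl -s -o /dev/null -w '%{http_code}' "$API$path")
  fi
  echo "$path $code"
  case "$code" in
    2??) return 0 ;;
    *)   return 1 ;;
  esac
}

# Example, reusing $TOKEN from the login call above:
# check_endpoint /bootstrap/status || echo "FAILED"
# for p in /projects /providers /chat/config; do
#   check_endpoint "$p" "$TOKEN" || echo "FAILED: $p"
# done
```

A connection failure makes curl report status 000, which this helper also treats as a failure.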