Summary of "Save 95% of Your Tokens - OpenClaw Full Tutorial"

Goal

Cut OpenClaw API costs by more than 90%, according to the author: his own spend fell from about $100/day to under $5/day, and a friend's from $200/day to about $10/day (roughly a 95% reduction in both cases). Focus areas: model routing/selection, caching, context pruning, session initialization, local heartbeats, token audits, and practical deployment.

Quick checklist of optimizations (what to change and why)

Model routing / model selection
- Add multiple providers and models in openclaw.json (examples: Anthropic Claude — Haiku, Sonnet, Opus; OpenAI — GPT-5.1, GPT-5-mini).
- Use an inexpensive default model (e.g., Claude Haiku 4.5/4.6) and escalate to Sonnet/Opus or GPT-5 only for advanced reasoning tasks.
- Configure fallback order so agents automatically switch if a provider is rate-limited.
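The routing and fallback behavior described above can be sketched in a few lines. This is only an illustration of the logic, not OpenClaw's implementation or config format: the `Router` class, the `RateLimited` exception, the model names, and the `call` function are all hypothetical.

```python
from dataclasses import dataclass, field


class RateLimited(Exception):
    """Raised by a (hypothetical) provider call when it is rate-limited."""


@dataclass
class Router:
    # Model names are illustrative placeholders, not real OpenClaw identifiers.
    default: str = "cheap-model"
    escalated: str = "strong-model"
    fallbacks: list = field(default_factory=lambda: ["backup-model"])

    def candidates(self, needs_reasoning: bool) -> list:
        """Return models to try, in order: primary first, then fallbacks."""
        primary = self.escalated if needs_reasoning else self.default
        return [primary] + [m for m in self.fallbacks if m != primary]

    def run(self, prompt: str, call, needs_reasoning: bool = False):
        """Try each candidate; move to the next one only on a rate limit."""
        last_error = None
        for model in self.candidates(needs_reasoning):
            try:
                return model, call(model, prompt)
            except RateLimited as exc:
                last_error = exc
        raise RuntimeError("all providers rate-limited") from last_error


def fake_call(model: str, prompt: str) -> str:
    """Stand-in provider call: the cheap model is always rate-limited."""
    if model == "cheap-model":
        raise RateLimited(model)
    return f"{model}: ok"


print(Router().run("hello", fake_call))  # ('backup-model', 'backup-model: ok')
```

The cheap model stays the default, and the expensive one is only selected when the caller flags a task as needing advanced reasoning.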
Session initialization
- Load only minimal files at session start: soul.md, user.md, and today’s memory file.
- Avoid auto-loading full conversation history or past tool outputs; run memory search only when the user asks about past context.
- At session end, write a short summary to memory (e.g., <500 words, bullet points).
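As a rough sketch of these session rules, assuming a workspace where daily memory files live at `memory/YYYY-MM-DD.md` (that layout and both function names are assumptions for illustration, not taken from the video):

```python
from datetime import date
from pathlib import Path


def startup_files(workspace: Path, today: date) -> list:
    """Return only the files to load at session start.

    The memory/YYYY-MM-DD.md layout is an assumption for illustration.
    """
    wanted = [
        workspace / "soul.md",
        workspace / "user.md",
        workspace / "memory" / f"{today.isoformat()}.md",
    ]
    # Skip anything that does not exist yet (e.g. no memory file for today).
    return [p for p in wanted if p.exists()]


def trim_summary(bullets: list, max_words: int = 500) -> list:
    """Keep whole bullets until the word budget would be exceeded."""
    kept, used = [], 0
    for bullet in bullets:
        words = len(bullet.split())
        if used + words > max_words:
            break
        kept.append(bullet)
        used += words
    return kept
```

Everything else (older memory files, past tool outputs) stays on disk and is only pulled in by an explicit memory search.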
Prompt caching (major saving)
- Enable model-side prompt caching (configure in model settings) for large/expensive models (Opus/Sonnet).
- Configure cache retention (short ~5 min inactivity, long ~1 hour). Caching reduces repeated token costs for static prompts (soul.md, user.md).
- Note: caching has a write cost but yields huge read savings. Cache-hit rates are visible in the gateway.
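The write-cost versus read-saving trade-off is easy to check with arithmetic. The sketch below assumes multipliers in line with Anthropic's published short-retention pricing (cache writes at 1.25x the base input price, cache reads at 0.1x); the $3 per million tokens price, the 10,000-token prefix, and the 50 turns are made-up inputs, so check current provider pricing before relying on the numbers.

```python
def input_cost(tokens: int, turns: int, price_per_mtok: float,
               write_mult: float = 1.25, read_mult: float = 0.10,
               cached: bool = True) -> float:
    """Dollar cost of sending the same static prefix on every turn."""
    if turns <= 0:
        return 0.0
    per_turn = tokens / 1_000_000 * price_per_mtok
    if not cached:
        return per_turn * turns
    # First turn writes the cache; later turns read it.
    # Assumes the cache never expires between turns.
    return per_turn * write_mult + per_turn * read_mult * (turns - 1)


# Hypothetical session: 10,000-token static prefix, 50 turns, $3 per million tokens.
uncached = input_cost(10_000, 50, 3.0, cached=False)
cached = input_cost(10_000, 50, 3.0, cached=True)
print(round(uncached, 4), round(cached, 4))  # 1.5 0.1845
print(f"saving: {1 - cached / uncached:.0%}")  # saving: 88%
```

The saving grows with the number of turns that hit the cache, which is why a long retention window matters for sessions with pauses.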
Context pruning
- Add rules to prune stale tool outputs and old messages after a TTL (e.g., ~55 minutes) so the context window, and with it the token count of every request, does not keep growing.
- Put context-pruning config in the defaults section of openclaw.json.
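To make the TTL idea concrete, here is a minimal sketch of pruning by age. It is not OpenClaw's pruning code: the `Message` type, the `prune` function, and the `keep_last` safeguard are assumptions, and this version drops only stale tool outputs, leaving user and assistant messages alone.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str          # "user", "assistant", or "tool"
    text: str
    created_at: float  # Unix timestamp in seconds


def prune(messages: list, now: float, ttl_seconds: float = 55 * 60,
          keep_last: int = 4) -> list:
    """Drop tool outputs older than the TTL; always keep the newest messages."""
    cutoff = now - ttl_seconds
    first_protected = max(len(messages) - keep_last, 0)
    kept = []
    for index, msg in enumerate(messages):
        stale_tool = msg.role == "tool" and msg.created_at < cutoff
        if index >= first_protected or not stale_tool:
            kept.append(msg)
    return kept
```

Protecting the last few messages avoids removing a tool result the model is still in the middle of using.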
Local heartbeat offload (Ollama + Llama)
- Heartbeats (periodic checks) usually hit paid APIs; route them to a free local model instead.
- Install Ollama on the VPS and run the lightweight Llama 3.2 3B model (Ollama tag llama3.2:3b), which can run on CPU, to handle heartbeat checks.
- Configure the heartbeat snippet in openclaw.json defaults so heartbeats call the local model, not paid providers.
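For reference, this is roughly what a call to the local model looks like when made directly against Ollama's HTTP API (`POST /api/generate` on the default port 11434). In OpenClaw the routing is done through configuration rather than code like this; the function names and the sample prompt are assumptions for illustration.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(prompt: str, model: str = "llama3.2:3b") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )


def heartbeat(prompt: str, timeout: float = 60.0) -> str:
    """Send the heartbeat prompt to the local model and return its reply text."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # Requires a running Ollama server with the model already pulled.
    print(heartbeat("Reply with OK if you are running."))
```

Because the request never leaves the VPS, each heartbeat costs CPU time rather than API tokens.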
Spending limits and budget rules
- Set platform monthly spending caps and disable auto-recharge to prevent unexpected charges.
- Add rate-limit / pacing / spending rules in soul.md (e.g., minimum seconds between API/web requests, daily/monthly budgets, notify thresholds, fallback behavior on rate-limits).
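In the video these rules are written into soul.md as instructions for the agent. The sketch below only spells out the same logic in code so the thresholds are unambiguous; the `BudgetGuard` class and its default values are hypothetical.

```python
class BudgetGuard:
    """Pacing and daily-budget check to run before each paid request."""

    def __init__(self, min_interval_s: float = 5.0, daily_budget: float = 5.0,
                 notify_at: float = 0.75):
        self.min_interval_s = min_interval_s  # minimum seconds between requests
        self.daily_budget = daily_budget      # dollars per day
        self.notify_at = notify_at            # fraction of budget that triggers a notice
        self.spent = 0.0
        self.last_call = None

    def check(self, now: float, estimated_cost: float) -> str:
        """Return 'wait', 'block', 'notify', or 'ok' for a proposed request."""
        if self.last_call is not None and now - self.last_call < self.min_interval_s:
            return "wait"   # too soon after the previous request
        if self.spent + estimated_cost > self.daily_budget:
            return "block"  # request would exceed the daily budget
        self.last_call = now
        self.spent += estimated_cost
        if self.spent >= self.notify_at * self.daily_budget:
            return "notify"  # allowed, but the user should be warned
        return "ok"
```

A hard cap set on the provider's billing page remains the real safety net, since instructions in soul.md depend on the model following them.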
Token audits & monitoring
- Use the OpenClaw gateway “Usage” tab and built-in commands to inspect token/cost usage.
- Useful slash commands: /status, /context list, and /context detail show token counts per file, cache hit rate, and session totals.
- Run a token-audit prompt to get a session-level cost breakdown and recommendations.
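A quick offline approximation of the per-file breakdown can be made from file sizes alone. This sketch assumes the common rule of thumb of about four characters per token for English text, which is only an estimate; the gateway's own numbers are authoritative. The `audit` function and the sample sizes are hypothetical.

```python
def audit(file_sizes_chars: dict, chars_per_token: float = 4.0) -> list:
    """Return (name, estimated_tokens, share_of_total) rows, largest first."""
    estimates = {name: round(chars / chars_per_token)
                 for name, chars in file_sizes_chars.items()}
    total = sum(estimates.values()) or 1  # avoid division by zero on empty input
    rows = [(name, tokens, tokens / total) for name, tokens in estimates.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical character counts for files loaded into a session.
for name, tokens, share in audit({"soul.md": 4000, "user.md": 1000, "memory.md": 15000}):
    print(f"{name:10} ~{tokens:5d} tokens  {share:.0%}")
```

Files that dominate the list are the first candidates for trimming or for loading on demand instead of at session start.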

Practical deployment & tooling (how to apply changes)

Host environment: run OpenClaw in an isolated VPS (author uses Hostinger one‑click Docker + KVM plans).
Access the server via SSH. If running in Docker, enter the container:

docker ps
docker exec -it <container-id> /bin/bash

Edit files using:
- OpenClaw CLI inside the container, or
- VS Code Remote - SSH: open /docker/openclaw/data/openclaw for easier editing than nano.
Files to edit:
- openclaw.json (models, defaults, heartbeat, context pruning)
- soul.md (routing rules, session rules, budget rules)
- heartbeat.md (heartbeat prompt)
Restart to apply changes:

docker restart <container-id>
# or restart the gateway if not using Docker

Ollama quick steps:
- Install via the curl install script (as in the Ollama docs)
- Enable and start: systemctl enable --now ollama
- Pull/run model: ollama pull llama3.2:3b; ollama run llama3.2:3b
- Test with a sample prompt

Commands & UI pointers

Common Docker commands:

docker ps
docker exec -it <container-id> /bin/bash
docker restart <container-id>

VS Code: Remote - SSH → open folder /docker/openclaw/data/openclaw
Gateway slash commands:
- /status
- /context list
- /context detail
Ollama:

systemctl start ollama
ollama pull llama3.2:3b
ollama run llama3.2:3b
# test with a small prompt

Metrics & effects shown

Cache hit rate reported in the gateway (example: 99.9%).
Gateway Usage tab shows: messages, tool calls, average tokens/message, cost by model and token type (cache writes/reads).
Author demonstrates large cost reductions after implementing routing, caching, local heartbeat, and pruning.

Guides, downloads, and additional materials referenced

Presenter created a downloadable guide (prompts, JSON snippets, ready config) linked in the video description (requires email to receive).
Credits/inspiration: Matt Ganzic — guide/video linked in the description.
Presenter’s other videos cover full setup, security/hardening, enabling skills/voice — this particular video focuses on cost optimization.

Best practices & security notes

Run OpenClaw in an isolated VPS (not on your primary desktop).
Always use API spending limits and monitor usage.
Remove or rotate API keys after testing, and never publish configuration files that contain keys.
Test changes in a fresh instance to avoid unexpected configurations.

Main speakers / sources

Presenter: Tech With Tim (references his channel and discount code).
Credit / inspiration: Matt Ganzic.
Tools / providers referenced: OpenClaw, Hostinger, Docker, Visual Studio Code Remote-SSH, Anthropic (Claude), OpenAI (GPT models), Ollama, Llama 3.2 (3B).

Notes

Exact JSON/snippet examples (openclaw.json, soul.md, heartbeat) and a condensed step‑by‑step checklist were mentioned as available in the original material.