Summary of "Save 95% of Your Tokens - OpenClaw Full Tutorial"

Goal

Cut OpenClaw API costs drastically. The author claims savings of over 90%: their own spend fell from about $100/day to under $5/day, and a friend they helped went from $200/day to about $10/day. Focus areas: model routing and selection, prompt caching, context pruning, session initialization, local heartbeats, token audits, and practical deployment.

Quick checklist of optimizations (what to change and why)

  1. Model routing / model selection

    • Add multiple providers and models in openclaw.json (examples: Anthropic Claude — Haiku, Sonnet, Opus; OpenAI — GPT-5.1, GPT-5-mini).
    • Use an inexpensive default model (e.g., Claude Haiku 4.5/4.6) and escalate to Sonnet/Opus or GPT-5 only for advanced reasoning tasks.
    • Configure fallback order so agents automatically switch if a provider is rate-limited.
  2. Session initialization

    • Load only minimal files at session start: soul.md, user.md, and today’s memory file.
    • Avoid auto-loading full conversation history or past tool outputs; run memory search only when the user asks about past context.
    • At session end, write a short summary to memory (e.g., <500 words, bullet points).
  3. Prompt caching (major saving)

    • Enable model-side prompt caching (configure in model settings) for large/expensive models (Opus/Sonnet).
    • Configure cache retention: short (about 5 minutes of inactivity) or long (about 1 hour). Caching cuts the cost of re-sending static prompts such as soul.md and user.md.
    • Note: caching has a write cost but yields huge read savings. Cache-hit rates are visible in the gateway.
  4. Context pruning

    • Add rules that prune stale tool outputs and old messages once they pass a TTL (e.g., ~55 minutes), so an ever-growing context window does not inflate token counts.
    • Put context-pruning config in the defaults section of openclaw.json.
  5. Local heartbeat offload (Ollama + Llama)

    • Heartbeats (periodic checks) usually hit paid APIs — route them to a free local model instead.
    • Install Ollama on the VPS and run the lightweight Llama 3.2 3B model to handle heartbeat checks (CPU-capable).
    • Configure the heartbeat snippet in openclaw.json defaults so heartbeats call the local model, not paid providers.
  6. Spending limits and budget rules

    • Set platform monthly spending caps and disable auto-recharge to prevent unexpected charges.
    • Add rate-limit / pacing / spending rules in soul.md (e.g., minimum seconds between API/web requests, daily/monthly budgets, notify thresholds, fallback behavior on rate-limits).
  7. Token audits & monitoring

    • Use the OpenClaw gateway “Usage” tab and built-in commands to inspect token/cost usage.
    • Useful slash commands: /status, /context list, and /context detail show token counts per file, cache hit rate, and session totals.
    • Run a token-audit prompt to get a session-level cost breakdown and recommendations.
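The routing and fallback logic from item 1 can be sketched in Python. This is an illustration under assumptions, not OpenClaw's implementation or its openclaw.json schema: the model IDs, the `RateLimited` exception, and the `call_model` callback are hypothetical stand-ins. The point is the order of decisions: start on a cheap chain, escalate only for advanced reasoning, and walk the fallback list when a provider is rate-limited.

```python
from typing import Callable


class RateLimited(Exception):
    """Hypothetical error a provider client raises when it is rate-limited."""


# Hypothetical model IDs: substitute the names defined in your openclaw.json.
DEFAULT_CHAIN = ["claude-haiku", "gpt-5-mini"]                # cheap models, tried first
ADVANCED_CHAIN = ["claude-sonnet", "claude-opus", "gpt-5.1"]  # escalation targets


def pick_chain(needs_advanced_reasoning: bool) -> list[str]:
    """Use the cheap chain by default; escalate only for advanced reasoning."""
    return ADVANCED_CHAIN if needs_advanced_reasoning else DEFAULT_CHAIN


def route(prompt: str, needs_advanced_reasoning: bool,
          call_model: Callable[[str, str], str]) -> tuple[str, str]:
    """Try each model in order and fall back to the next one when rate-limited.

    Returns (model_used, reply); raises RuntimeError if every model is rate-limited.
    """
    for model in pick_chain(needs_advanced_reasoning):
        try:
            return model, call_model(model, prompt)
        except RateLimited:
            continue  # this provider is throttled, so try the next model
    raise RuntimeError("every model in the fallback chain is rate-limited")


# Usage with a fake client whose cheapest model is currently rate-limited.
def fake_call(model: str, prompt: str) -> str:
    if model == "claude-haiku":
        raise RateLimited()
    return f"{model}: ok"


print(route("Summarize today's inbox", False, fake_call))  # ('gpt-5-mini', 'gpt-5-mini: ok')
```

Keeping the escalation decision separate from the fallback loop means a rate limit never silently promotes a routine task to an expensive model.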
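Here is a minimal sketch of the session-initialization rule in item 2. It assumes a hypothetical workspace layout, with soul.md and user.md at the root and daily notes under `memory/YYYY-MM-DD.md`. The sketch loads only those three files, never the chat history. At session end it appends a bullet-point summary of under 500 words to today's memory file. OpenClaw's actual file handling may differ.

```python
import datetime
import tempfile
from pathlib import Path


def startup_files(workspace: Path, today: datetime.date) -> list[Path]:
    """The only files loaded at session start.

    Assumed (hypothetical) layout: soul.md and user.md at the workspace root,
    daily notes in memory/YYYY-MM-DD.md.
    """
    return [workspace / "soul.md",
            workspace / "user.md",
            workspace / "memory" / f"{today.isoformat()}.md"]


def build_initial_context(workspace: Path, today: datetime.date) -> str:
    """Concatenate the startup files; no chat history or old tool output is loaded."""
    parts = []
    for path in startup_files(workspace, today):
        if path.exists():  # e.g. no memory file has been written yet today
            parts.append(f"## {path.name}\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)


def write_session_summary(workspace: Path, today: datetime.date,
                          bullets: list[str], max_words: int = 500) -> Path:
    """Append a bullet-point summary to today's memory file, refusing long ones."""
    words = sum(len(b.split()) for b in bullets)
    if words >= max_words:
        raise ValueError(f"summary has {words} words; keep it under {max_words}")
    path = workspace / "memory" / f"{today.isoformat()}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write("".join(f"- {b}\n" for b in bullets))
    return path


# Usage: a throwaway workspace that contains only soul.md.
with tempfile.TemporaryDirectory() as tmp:
    ws, day = Path(tmp), datetime.date(2025, 1, 2)
    (ws / "soul.md").write_text("Be concise.", encoding="utf-8")
    write_session_summary(ws, day, ["Switched the default model to Haiku"])
    context = build_initial_context(ws, day)
print(context)
```

Past context stays in the memory files on disk and would be pulled in by a separate memory search only when the user asks about it.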
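Item 3 trades a one-time write premium against cheap repeated reads, and simple arithmetic shows whether that pays off. The sketch below assumes Anthropic-style multipliers. A 5-minute cache write costs 1.25 times the base input price, and a cache read costs 0.1 times. A 1-hour write is priced higher; pass a larger `write_mult`, e.g. 2.0, to model it. The session sizes and the $3 per million token price are illustrative, not figures from the tutorial.

```python
def input_cost(static_tokens: int, dynamic_tokens: int, calls: int,
               price_per_mtok: float, cached: bool = False,
               write_mult: float = 1.25, read_mult: float = 0.10) -> float:
    """Dollar cost of the input tokens for `calls` requests sharing a static prefix.

    Assumes the cache stays warm between calls, so with caching the first call
    writes the prefix and every later call reads it. The multipliers are
    assumptions (Anthropic-style pricing); check your provider's current rates.
    """
    per_token = price_per_mtok / 1_000_000
    dynamic = dynamic_tokens * calls * per_token  # the changing part is never cached
    if not cached:
        return static_tokens * calls * per_token + dynamic
    write = static_tokens * write_mult * per_token                # first call
    reads = static_tokens * (calls - 1) * read_mult * per_token   # later calls
    return write + reads + dynamic


# Hypothetical session: 10,000-token static prefix (soul.md, user.md, tool
# definitions), 500 new tokens per call, 100 calls, $3 per million input tokens.
plain = input_cost(10_000, 500, 100, 3.0)
cached = input_cost(10_000, 500, 100, 3.0, cached=True)
print(f"uncached ${plain:.4f}, cached ${cached:.4f}")  # uncached $3.1500, cached $0.4845
```

Under these assumptions, caching cuts this session's input bill by roughly 85%. With only a single call per cache window, however, the write premium makes caching cost more than not caching.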
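The TTL-based pruning in item 4 can be sketched as a filter over the message list. The `Message` type and the role names are assumptions for illustration only. OpenClaw's real context-pruning settings live in the openclaw.json defaults and may behave differently.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

PRUNE_TTL_SECONDS = 55 * 60  # ~55 minutes, the example TTL from the tutorial


@dataclass
class Message:
    role: str      # hypothetical roles: "system", "user", "assistant", or "tool"
    content: str
    created_at: float = field(default_factory=time.time)  # Unix timestamp


def prune_context(messages: list[Message], now: Optional[float] = None,
                  ttl: float = PRUNE_TTL_SECONDS,
                  prunable_roles: tuple[str, ...] = ("tool",)) -> list[Message]:
    """Drop messages whose role is prunable and whose age exceeds `ttl` seconds.

    By default only tool outputs are pruned; pass ("tool", "user", "assistant")
    to age out old conversation turns as well. Roles not listed (e.g. system)
    are always kept.
    """
    now = time.time() if now is None else now
    return [m for m in messages
            if m.role not in prunable_roles or now - m.created_at <= ttl]


# Usage with explicit timestamps (seconds since session start) for clarity.
history = [
    Message("system", "contents of soul.md", created_at=0),
    Message("tool", "50 KB web-page dump", created_at=0),
    Message("user", "latest question", created_at=3500),
]
kept = prune_context(history, now=3600)
print([m.role for m in kept])  # ['system', 'user']
```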
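Item 6 writes its pacing and budget rules as plain-language instructions in soul.md. Expressing the same logic in code makes the thresholds concrete. `BudgetGuard` is a hypothetical helper, not part of OpenClaw, and the 5-second gap and 75% notice threshold are example defaults.

```python
import time
from typing import Callable, Optional


class BudgetGuard:
    """Hypothetical pacing and budget rules like those item 6 writes into soul.md.

    Tracks one day's spend; resetting it at midnight is left out for brevity.
    """

    def __init__(self, daily_budget: float, min_interval_s: float = 5.0,
                 notify_fraction: float = 0.75,
                 clock: Callable[[], float] = time.monotonic):
        self.daily_budget = daily_budget
        self.min_interval_s = min_interval_s    # minimum gap between requests
        self.notify_fraction = notify_fraction  # warn once this share is spent
        self.clock = clock
        self.spent = 0.0
        self.last_call: Optional[float] = None
        self.notified = False

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0.0 if none)."""
        if self.last_call is None:
            return 0.0
        return max(0.0, self.min_interval_s - (self.clock() - self.last_call))

    def can_spend(self, estimated_cost: float) -> bool:
        """False if this request would push the day's spend over budget."""
        return self.spent + estimated_cost <= self.daily_budget

    def record(self, cost: float) -> Optional[str]:
        """Log a finished request; return a notice the first time the threshold is crossed."""
        self.spent += cost
        self.last_call = self.clock()
        if not self.notified and self.spent >= self.notify_fraction * self.daily_budget:
            self.notified = True
            return f"notice: ${self.spent:.2f} of ${self.daily_budget:.2f} spent today"
        return None


# Usage with a fake clock so the example is deterministic.
now = [100.0]
guard = BudgetGuard(daily_budget=5.0, min_interval_s=10.0, clock=lambda: now[0])
first = guard.record(1.0)     # below the 75% threshold, so no notice
now[0] = 104.0
print(guard.wait_time())      # 6.0
second = guard.record(3.0)    # total 4.0 >= 3.75, so a notice is returned
print(second)
print(guard.can_spend(2.0))   # False
```

When `can_spend` returns False, the fallback described in soul.md would apply: stop, notify the user, or switch to a cheaper model.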
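The cost breakdown from item 7 can also be computed locally from per-request usage records. This swaps in a plain calculation for the author's token-audit prompt. The field names follow the `usage` object of Anthropic's Messages API, which is an assumption about where your records come from. Prices are per million tokens, and the cache multipliers (1.25 times for writes, 0.1 times for reads) are also assumptions.

```python
USAGE_KEYS = ("input_tokens", "cache_creation_input_tokens",
              "cache_read_input_tokens", "output_tokens")


def audit(usages: list[dict], price_in: float, price_out: float,
          write_mult: float = 1.25, read_mult: float = 0.10) -> dict:
    """Session totals, cache hit rate, and estimated cost in dollars.

    Prices are per million tokens; the cache multipliers are assumptions.
    Cache hit rate = cached reads / all prompt tokens (uncached + writes + reads).
    """
    totals = {k: sum(u.get(k, 0) for u in usages) for k in USAGE_KEYS}
    prompt = (totals["input_tokens"] + totals["cache_creation_input_tokens"]
              + totals["cache_read_input_tokens"])
    per_in, per_out = price_in / 1_000_000, price_out / 1_000_000
    cost = (totals["input_tokens"] * per_in
            + totals["cache_creation_input_tokens"] * write_mult * per_in
            + totals["cache_read_input_tokens"] * read_mult * per_in
            + totals["output_tokens"] * per_out)
    hit_rate = totals["cache_read_input_tokens"] / prompt if prompt else 0.0
    return {**totals, "cache_hit_rate": hit_rate, "cost_usd": cost}


# Two hypothetical calls: the first writes a 10,000-token prefix, the second reads it.
session = [
    {"input_tokens": 200, "cache_creation_input_tokens": 10_000, "output_tokens": 300},
    {"input_tokens": 200, "cache_read_input_tokens": 10_000, "output_tokens": 300},
]
report = audit(session, price_in=3.0, price_out=15.0)
print(f"hit rate {report['cache_hit_rate']:.1%}, cost ${report['cost_usd']:.4f}")  # hit rate 49.0%, cost $0.0507
```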

Practical deployment & tooling (how to apply changes)

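docker ps                                   # list running containers and note the OpenClaw container ID
docker exec -it <container-id> /bin/bash    # open a shell inside it to edit openclaw.json or soul.md
docker restart <container-id>               # restart so the config changes take effect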
# or restart the gateway if not using Docker

Commands & UI pointers

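curl -fsSL https://ollama.com/install.sh | sh   # install Ollama (official Linux install script)
sudo systemctl start ollama                     # start the Ollama service
ollama pull llama3.2:3b                         # download the Llama 3.2 3B model
ollama run llama3.2:3b "Reply with OK"          # test with a small prompt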
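Once the model is pulled, a heartbeat check against it is a single call to Ollama's local HTTP API (`POST /api/generate` on port 11434). Setting `stream` to false makes the reply arrive as one JSON object. The standard-library sketch below is only for testing the Ollama side. It is not how OpenClaw wires heartbeats, which are configured in the openclaw.json defaults, and the prompt text is a hypothetical example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
HEARTBEAT_MODEL = "llama3.2:3b"                      # the model pulled above


def build_heartbeat_request(prompt: str, model: str = HEARTBEAT_MODEL,
                            url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming /api/generate request for one heartbeat check."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Content-Type": "application/json"})


def run_heartbeat(prompt: str, timeout: float = 60.0) -> str:
    """Send the check to the local model (no paid API involved) and return its reply."""
    req = build_heartbeat_request(prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With Ollama running, `run_heartbeat("Heartbeat check: reply OK if nothing needs attention.")` returns the model's text reply. If it fails, the service is not running or the model name does not match what `ollama pull` downloaded.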

Notes

Exact JSON/snippet examples (openclaw.json, soul.md, heartbeat) and a condensed step‑by‑step checklist were mentioned as available in the original material.

Category: Technology

