Guide · 10 min read

Why Is OpenClaw So Slow? A Performance Optimization Guide for Self-Hosted Instances

OpenClaw taking 10-30 seconds to respond? Fix the performance problems caused by context accumulation, tool output bloat, and session growth.

By AutomationVPS

From 23 Seconds to 4 Seconds

An unoptimized OpenClaw instance averages 23-second response times and can cost $50-100/day in API calls. An optimized one responds in under 4 seconds and costs less than $2/day. The difference isn't a better model or a faster server -- it's configuration.

OpenClaw's default settings are tuned for maximum capability, not speed. Conversations accumulate unlimited context, tool outputs are stored permanently in session memory, and the system prompt alone consumes 15,000+ tokens before your first message. Every API call resends all of this.

Here's how to fix each bottleneck.

Root Cause #1: Unbounded Context Accumulation

This is the biggest performance killer. By default, OpenClaw sends your entire conversation history with every API call. After 10 rounds of tool-heavy interaction, that context can exceed 150,000 tokens. Each API call takes longer, costs more, and eventually hits the model's context limit.

The math: A conversation with 150K tokens at Claude Sonnet pricing ($3/million input tokens) costs ~$0.45 per message. Send 100 messages in a day and you're burning $45 on context alone.
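The per-message figure follows directly from token count and price; a quick check of the arithmetic (using awk, with the token count and per-million price from the example above):

```shell
# Cost of resending a 150K-token context at $3 per million input tokens.
awk 'BEGIN {
  tokens = 150000; price_per_million = 3.00
  per_msg = tokens * price_per_million / 1000000
  printf "per message: $%.2f, per 100 messages: $%.0f\n", per_msg, per_msg * 100
}'
```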

Fix: Enable Context Compaction

Context compaction automatically summarizes older conversation turns when the context grows too large:

{
  "agents": {
    "defaults": {
      "contextCompaction": true,
      "maxContextLength": 4000
    }
  }
}

This tells OpenClaw to summarize conversation history when it exceeds 4,000 tokens, down from the default 16,000. Older messages get compressed into a brief summary while recent messages stay intact.

Fix: Limit Conversation History

Cap how many turns are kept in active context:

{
  "agents": {
    "defaults": {
      "conversationHistoryLimit": 50
    }
  }
}

This keeps only the last 50 turns. For most interactions, you don't need the full conversation from hours ago -- just recent context.

Setting maxContextLength to 4000 and conversationHistoryLimit to 50 typically cuts API costs by 80%+ while having minimal impact on response quality for ongoing conversations.

Root Cause #2: Tool Output Bloat

Every time OpenClaw calls a tool (reads a file, runs a command, browses a webpage), the full output is stored in the session and included in subsequent API calls. A single cat of a 500-line file adds thousands of tokens. A browser snapshot can add 10,000+ tokens.

After a few tool-heavy interactions, tool outputs can consume 30-40% of your context window -- context that you're paying for with every subsequent message.
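To see how fast a single tool output eats context, you can estimate tokens from byte count. This uses the common ~4 characters per token rule of thumb, not OpenClaw's actual tokenizer, and generates a sample file for illustration:

```shell
# Approximate token cost of a tool output using the ~4 chars/token heuristic.
printf 'line of sample output\n%.0s' $(seq 1 500) > /tmp/tool_output.txt
chars=$(wc -c < /tmp/tool_output.txt)
echo "~$((chars / 4)) tokens for a $((chars))-char tool output"
```

A 500-line, ~11 KB output comes out around 2,700 tokens -- and that cost recurs on every subsequent API call in the session.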

Fix: Cap File Contributions

Limit how much of any single file gets injected into context:

{
  "agents": {
    "defaults": {
      "bootstrapMaxChars": 5000
    }
  }
}

The default is 20,000 characters per file. Dropping to 5,000 significantly reduces context bloat from workspace files and tool outputs.

Fix: Reduce Image Resolution

If your agent uses vision (screenshots, image analysis), reduce the default resolution:

{
  "agents": {
    "defaults": {
      "imageMaxDimensionPx": 800
    }
  }
}

The default is 1200px. Lowering to 800 reduces token usage per image without significantly affecting comprehension for most tasks.

Root Cause #3: Session File Growth

OpenClaw stores session data on disk. Without cleanup, session files grow unbounded -- 114 MB with 2,700+ sessions after just two weeks of cron job activity. On a SATA SSD, this growth causes I/O bottlenecks that add 20-100ms per read/write operation.

Fix: Enable Session Rotation and Cleanup

{
  "sessions": {
    "rotateBytes": "20mb",
    "maxDiskBytes": "1gb",
    "pruneAfter": "45d"
  }
}

This rotates session files at 20 MB, caps total disk usage at 1 GB, and automatically deletes sessions older than 45 days.
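If you want to prune by hand, or are on a version without pruneAfter, the equivalent cleanup is a find over the session directory. The path below is the default assumed by this guide and may differ on your install:

```shell
# Manual equivalent of pruneAfter: "45d" -- delete session files untouched
# for more than 45 days. Guarded so it is a no-op if the directory is absent.
SESSIONS_DIR="${SESSIONS_DIR:-$HOME/.openclaw/sessions}"
if [ -d "$SESSIONS_DIR" ]; then
  find "$SESSIONS_DIR" -type f -mtime +45 -print -delete
fi
```

Run it with -print but without -delete first to review what would be removed.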

Fix: Use NVMe Storage

This isn't a configuration change -- it's a hardware choice. NVMe sustains 2,800-3,800 MB/s throughput vs SATA's 500 MB/s. For OpenClaw's constant log and session writes, NVMe keeps I/O wait under 1% where SATA can spike to 5-15%.
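To check what your disk actually sustains, a quick sequential-write probe works as a rough proxy (dd with a final fdatasync; fio gives more realistic numbers):

```shell
# Rough sequential-write throughput test: 128 MB with a final fdatasync
# so the reported speed reflects real disk writes, not the page cache.
dd if=/dev/zero of=/tmp/ddtest bs=1M count=128 conv=fdatasync 2>&1 | tail -1
rm -f /tmp/ddtest
```

NVMe typically reports well over 1 GB/s here; a saturated SATA SSD sits near 500 MB/s.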

Every VPS provider we recommend includes NVMe storage on current plans. If you're on an older SATA-based VPS, upgrading to NVMe is one of the highest-impact changes you can make.

Hostinger

All Hostinger VPS plans include NVMe storage. KVM 1 at $6.49/mo gives you 50 GB NVMe — essential for responsive OpenClaw performance.

Visit Hostinger

* Affiliate link — we may earn a commission at no extra cost to you.

Root Cause #4: System Prompt Overhead

OpenClaw's default system prompt is massive: 15,000+ tokens before any conversation begins. This includes AGENTS.md, SOUL.md, loaded skills, and workspace file descriptions. Every single API call resends this prompt.

Fix: Trim Your System Prompt

Review what's loaded by default and remove what you don't need:

  • Disable skills you're not using (each loaded skill adds to the system prompt)
  • Reduce workspace file descriptions
  • Use a focused agent configuration instead of the default generalist

The leaner your system prompt, the faster and cheaper every API call becomes.

Root Cause #5: Model Selection

Not all models respond at the same speed. The model you choose has a direct impact on latency:

Model | Typical Latency | Cost | Best For
Gemini 2.5 Flash | <1 second | Very low | High-volume, speed-critical tasks
Claude Sonnet 4.6 | 1-3 seconds | Medium | Code, reasoning, reliable tool calling
Claude Opus 4.6 | 2-5 seconds | High | Complex reasoning, analysis
Local Ollama (7B) | 3-10 seconds | Free | Privacy-focused, simple tasks
Local Ollama (32B) | 10-30 seconds | Free | Offline capability, data sovereignty

Fix: Reduce Thinking Overhead

For real-time interactions, disable extended chain-of-thought:

{
  "agents": {
    "defaults": {
      "thinkingDefault": false
    }
  }
}

This can cut response latency roughly in half (from ~2.2 seconds to ~1.1 seconds on cloud models) at the cost of less detailed reasoning.

Root Cause #6: Version-Specific Regressions

Recent OpenClaw versions have introduced performance issues:

  • v2026.4.5: Worker processes load all plugins independently, spawning 87+ child processes on 8-core systems, each using 20-50% CPU
  • v2026.4.9: Gateway memory consumption increased ~130 MB compared to v2026.4.8
  • General: Gateway memory grows to 1.9 GB RSS after 13 hours of continuous operation, reaching 69% CPU

Fix: Monitor and Restart

If you're seeing progressive slowdown over hours, set up a scheduled restart:

# Restart OpenClaw gateway daily at 4 AM
0 4 * * * docker restart openclaw

This isn't ideal, but it prevents the memory accumulation from degrading performance over time.
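A slightly smarter variant restarts only when memory actually crosses a threshold. This sketch hardcodes a sample reading for illustration; in a real script you would substitute the output of `docker stats openclaw --no-stream --format '{{.MemUsage}}'`, and note that docker switches from MiB to GiB units above ~1 GB, so production parsing should handle both:

```shell
# Restart-on-threshold sketch. mem_line stands in for:
#   docker stats openclaw --no-stream --format '{{.MemUsage}}'
mem_line="1900MiB / 8GiB"    # sample value, assumed MiB units
rss_mb=${mem_line%%MiB*}     # strip everything from "MiB" onward
if [ "$rss_mb" -gt 1500 ]; then
  echo "gateway at ${rss_mb} MiB -- restarting"
  # docker restart openclaw
fi
```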

Root Cause #7: Network Latency

If your VPS is in Europe but your LLM provider (Anthropic, OpenAI) has endpoints primarily in the US, every API call adds 100-150 ms of round-trip latency. For a tool-heavy workflow with 10+ API calls, that's over a second of pure network overhead.
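The overhead compounds per call. Taking the midpoint RTT from above (125 ms is an assumption for illustration) across a 12-call workflow:

```shell
# Pure network overhead for a tool-heavy workflow: per-call round-trip
# time multiplied by the number of sequential API calls.
awk 'BEGIN {
  rtt_ms = 125; calls = 12
  printf "%.1f s of round-trip overhead across %d calls\n", rtt_ms * calls / 1000, calls
}'
```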

Fix: Colocate VPS with API Provider

Choose a VPS in the same region as your LLM provider:

  • Anthropic (Claude): US-based servers -- choose a US East VPS
  • OpenAI (GPT): US-based -- choose US East or US West
  • Google (Gemini): Global, but US gives lowest latency

Vultr has 32 data centers across 6 continents, making it the best choice for optimizing API latency regardless of your provider.

Complete Optimization Config

Here's a production-optimized ~/.openclaw/openclaw.json:

{
  "gateway": {
    "mode": "local",
    "host": "127.0.0.1",
    "port": 18789,
    "auth": {
      "mode": "token",
      "token": "your-secure-token"
    }
  },
  "agents": {
    "defaults": {
      "contextCompaction": true,
      "maxContextLength": 4000,
      "conversationHistoryLimit": 50,
      "bootstrapMaxChars": 5000,
      "imageMaxDimensionPx": 800,
      "thinkingDefault": false
    }
  },
  "sessions": {
    "rotateBytes": "20mb",
    "maxDiskBytes": "1gb",
    "pruneAfter": "45d"
  }
}

Monitoring Performance

Quick Check

# Check gateway memory usage
docker stats openclaw --no-stream

# Check session storage size
du -sh ~/.openclaw/sessions/

# Check I/O wait (should be under 1%)
iostat -x 1 3

Ongoing Monitoring

Set up ClawMetry for a free dashboard:

# One-command install
npx clawmetry setup

Or use OpenTelemetry to export metrics to your monitoring stack. Track: token usage per conversation, context size over time, response latency, and gateway memory.

Contabo

Contabo VPS 1: 8 GB RAM, 4 vCPU, NVMe storage for $4.50/mo. The performance headroom you need for a fast OpenClaw instance.

Visit Contabo

* Affiliate link — we may earn a commission at no extra cost to you.

VPS Sizing for Optimal Performance

Priority | Minimum | Recommended | Why
RAM | 4 GB | 8 GB | Prevents swapping; gateway needs 1.5-3 GB
Storage | 40 GB NVMe | 80 GB NVMe | Session files grow; NVMe keeps I/O fast
CPU | 2 vCPU | 4 vCPU | Plugin worker processes need headroom
Network | 50 Mbps | 100 Mbps | Low latency to API providers matters most

Provider picks for performance:

  • Contabo VPS 1 ($4.50/mo): 8 GB RAM, best raw performance per dollar
  • Hostinger KVM 2 ($8.49/mo): 8 GB RAM, easy management, NVMe included
  • Vultr ($10-20/mo): Best location flexibility for API latency optimization
  • DigitalOcean ($12-24/mo): Monitoring built in, great for debugging

Conclusion

OpenClaw's default configuration prioritizes capability over speed. By enabling context compaction, capping tool output sizes, managing session files, and choosing the right model, you can cut response times from 23 seconds to under 4 seconds -- and reduce API costs by 80% or more.

Start with the complete optimization config above, monitor your instance with ClawMetry or docker stats, and upgrade to NVMe storage if you haven't already. The difference is dramatic.

Ready to start automating? Get a VPS now.

Get started with Hostinger VPS hosting today. Special pricing available.

Get Hostinger VPS

* Affiliate link -- we may earn a commission at no extra cost to you.

#openclaw #performance #optimization #self-hosting #latency