Why Is OpenClaw So Slow? Performance Optimization Guide for Self-Hosted Instances
OpenClaw responses taking 10-30 seconds? Here's why context accumulation, tool output bloat, and session growth are killing your performance — and how to fix each one.
From 23 Seconds to 4 Seconds
An unoptimized OpenClaw instance averages 23-second response times and can cost $50-100/day in API calls. An optimized one responds in under 4 seconds and costs less than $2/day. The difference isn't a better model or a faster server -- it's configuration.
OpenClaw's default settings are tuned for maximum capability, not speed. Conversations accumulate unlimited context, tool outputs are stored permanently in session memory, and the system prompt alone consumes 15,000+ tokens before your first message. Every API call resends all of this.
Here's how to fix each bottleneck.
Root Cause #1: Unbounded Context Accumulation
This is the biggest performance killer. By default, OpenClaw sends your entire conversation history with every API call. After 10 rounds of tool-heavy interaction, that context can exceed 150,000 tokens. Each API call takes longer, costs more, and eventually hits the model's context limit.
The math: A conversation with 150K tokens at Claude Sonnet pricing ($3/million input tokens) costs ~$0.45 per message. Send 100 messages in a day and you're burning $45 on context alone.
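That arithmetic is worth making explicit. A quick sketch (using the article's numbers):

```python
# Cost of resending the full context with every API call.
# Inputs from the article: 150K-token context, $3 per million input tokens.
def context_cost_per_message(context_tokens: int, usd_per_million: float) -> float:
    """Input-token cost of sending the context once."""
    return context_tokens / 1_000_000 * usd_per_million

per_msg = context_cost_per_message(150_000, 3.00)
daily = per_msg * 100  # 100 messages per day

print(f"${per_msg:.2f} per message, ${daily:.2f}/day")  # $0.45 per message, $45.00/day
```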
Fix: Enable Context Compaction
Context compaction automatically summarizes older conversation turns when the context grows too large:
```json
{
  "agents": {
    "defaults": {
      "contextCompaction": true,
      "maxContextLength": 4000
    }
  }
}
```
This tells OpenClaw to summarize conversation history when it exceeds 4,000 tokens, down from the default 16,000. Older messages get compressed into a brief summary while recent messages stay intact.
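To make the mechanism concrete, here is an illustrative sketch of the compaction idea — not OpenClaw's actual implementation. Once history exceeds the token budget, older turns fold into a single summary entry while the most recent turns survive verbatim (the 4-chars-per-token heuristic is an assumption):

```python
# Illustrative compaction sketch: summarize old turns, keep recent ones intact.
def compact(history: list[str], max_tokens: int,
            count_tokens=lambda s: len(s) // 4) -> list[str]:
    total = sum(count_tokens(m) for m in history)
    if total <= max_tokens:
        return history  # under budget: nothing to do
    kept, budget = [], max_tokens // 2  # reserve half the budget for recent turns
    for msg in reversed(history):
        if sum(count_tokens(m) for m in kept) + count_tokens(msg) > budget:
            break
        kept.insert(0, msg)  # keep newest messages verbatim
    older = history[: len(history) - len(kept)]
    summary = f"[summary of {len(older)} earlier messages]"
    return [summary] + kept
```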
Fix: Limit Conversation History
Cap how many turns are kept in active context:
```json
{
  "agents": {
    "defaults": {
      "conversationHistoryLimit": 50
    }
  }
}
```
This keeps only the last 50 turns. For most interactions, you don't need the full conversation from hours ago -- just recent context.
Setting `maxContextLength` to 4000 and `conversationHistoryLimit` to 50 typically cuts API costs by 80%+ while having minimal impact on response quality for ongoing conversations.
Root Cause #2: Tool Output Bloat
Every time OpenClaw calls a tool (reads a file, runs a command, browses a webpage), the full output is stored in the session and included in subsequent API calls. A single `cat` of a 500-line file adds thousands of tokens. A browser snapshot can add 10,000+ tokens.
After a few tool-heavy interactions, tool outputs can consume 30-40% of your context window -- context that you're paying for with every subsequent message.
Fix: Cap File Contributions
Limit how much of any single file gets injected into context:
```json
{
  "agents": {
    "defaults": {
      "bootstrapMaxChars": 5000
    }
  }
}
```
The default is 20,000 characters per file. Dropping to 5,000 significantly reduces context bloat from workspace files and tool outputs.
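What a per-file character cap does is simple to show. This standalone helper is illustrative only — the real limit is applied inside OpenClaw:

```python
# Sketch: clip a file's contribution to the context at max_chars,
# noting how much was dropped so the agent knows the text is partial.
def clip_for_context(text: str, max_chars: int = 5000) -> str:
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + f"\n[... truncated {len(text) - max_chars} chars]"
```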
Fix: Reduce Image Resolution
If your agent uses vision (screenshots, image analysis), reduce the default resolution:
```json
{
  "agents": {
    "defaults": {
      "imageMaxDimensionPx": 800
    }
  }
}
```
The default is 1200px. Lowering to 800 reduces token usage per image without significantly affecting comprehension for most tasks.
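The dimension math behind that setting, as a sketch (the actual resize happens inside OpenClaw's vision pipeline — this only shows the aspect-preserving scale):

```python
# Scale an image's dimensions so the longest side fits max_px,
# preserving aspect ratio. Fewer pixels means fewer vision tokens.
def fit_max_dimension(w: int, h: int, max_px: int = 800) -> tuple[int, int]:
    longest = max(w, h)
    if longest <= max_px:
        return w, h  # already within bounds
    scale = max_px / longest
    return round(w * scale), round(h * scale)

print(fit_max_dimension(1920, 1080))  # (800, 450)
```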
Root Cause #3: Session File Growth
OpenClaw stores session data on disk. Without cleanup, session files grow unbounded -- 114 MB with 2,700+ sessions after just two weeks of cron job activity. On a SATA SSD, this growth causes I/O bottlenecks that add 20-100 ms per read/write operation.
Fix: Enable Session Rotation and Cleanup
```json
{
  "sessions": {
    "rotateBytes": "20mb",
    "maxDiskBytes": "1gb",
    "pruneAfter": "45d"
  }
}
```
This rotates session files at 20 MB, caps total disk usage at 1 GB, and automatically deletes sessions older than 45 days.
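OpenClaw handles the pruning internally; for illustration, the age-based half of the policy boils down to something like this standalone sketch:

```python
# Sketch of a prune-after policy: delete files older than max_age_days.
import os
import time

def prune_sessions(directory: str, max_age_days: float) -> list[str]:
    """Delete files older than max_age_days; return the deleted paths."""
    cutoff = time.time() - max_age_days * 86400
    deleted = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            os.remove(path)
            deleted.append(path)
    return deleted
```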
Fix: Use NVMe Storage
This isn't a configuration change -- it's a hardware choice. NVMe sustains 2,800-3,800 MB/s throughput vs SATA's 500 MB/s. For OpenClaw's constant log and session writes, NVMe keeps I/O wait under 1% where SATA can spike to 5-15%.
Every VPS provider we recommend includes NVMe storage on current plans. If you're on an older SATA-based VPS, upgrading to NVMe is one of the highest-impact changes you can make.
Hostinger
All Hostinger VPS plans include NVMe storage. KVM 1 at $6.49/mo gives you 50 GB NVMe — essential for responsive OpenClaw performance.
* Affiliate link — we may earn a commission at no extra cost to you.
Root Cause #4: System Prompt Overhead
OpenClaw's default system prompt is massive: 15,000+ tokens before any conversation begins. This includes AGENTS.md, SOUL.md, loaded skills, and workspace file descriptions. Every single API call resends this prompt.
Fix: Trim Your System Prompt
Review what's loaded by default and remove what you don't need:
- Disable skills you're not using (each loaded skill adds to the system prompt)
- Reduce workspace file descriptions
- Use a focused agent configuration instead of the default generalist
The leaner your system prompt, the faster and cheaper every API call becomes.
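A rough audit helps here. Using the common ~4 chars/token heuristic, you can estimate what each component contributes; the component sizes below are assumed for illustration, not measured OpenClaw values:

```python
# Rough token audit of system-prompt components (chars / 4 heuristic).
def estimate_tokens(char_counts: dict[str, int]) -> dict[str, int]:
    return {name: chars // 4 for name, chars in char_counts.items()}

# Assumed sizes -- substitute your own (e.g. via `wc -c` on each file).
components = {"AGENTS.md": 24_000, "SOUL.md": 8_000, "skills": 20_000}
est = estimate_tokens(components)
print(est, "total:", sum(est.values()))
```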
Root Cause #5: Model Selection
Not all models respond at the same speed. The model you choose has a direct impact on latency:
| Model | Typical Latency | Cost | Best For |
|---|---|---|---|
| Gemini 2.5 Flash | <1 second | Very low | High-volume, speed-critical tasks |
| Claude Sonnet 4.6 | 1-3 seconds | Medium | Code, reasoning, reliable tool calling |
| Claude Opus 4.6 | 2-5 seconds | High | Complex reasoning, analysis |
| Local Ollama (7B) | 3-10 seconds | Free | Privacy-focused, simple tasks |
| Local Ollama (32B) | 10-30 seconds | Free | Offline capability, data sovereignty |
Fix: Reduce Thinking Overhead
For real-time interactions, disable extended chain-of-thought:
```json
{
  "agents": {
    "defaults": {
      "thinkingDefault": false
    }
  }
}
```
This can cut response latency roughly in half (from ~2.2 seconds to ~1.1 seconds on cloud models) at the cost of less detailed reasoning.
Root Cause #6: Version-Specific Regressions
Recent OpenClaw versions have introduced performance issues:
- v2026.4.5: Worker processes load all plugins independently, spawning 87+ child processes on 8-core systems, each using 20-50% CPU
- v2026.4.9: Gateway memory consumption increased ~130 MB compared to v2026.4.8
- General: Gateway memory grows to 1.9 GB RSS after 13 hours of continuous operation, reaching 69% CPU
Fix: Monitor and Restart
If you're seeing progressive slowdown over hours, set up a scheduled restart:
```bash
# Restart OpenClaw gateway daily at 4 AM
0 4 * * * docker restart openclaw
```
This isn't ideal, but it prevents the memory accumulation from degrading performance over time.
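If a fixed schedule feels too blunt, you can restart on a memory threshold instead. A sketch of the decision logic, parsing the RSS figure `docker stats` reports (the 1.5 GB threshold is an assumption, not an OpenClaw default):

```python
# Sketch: decide whether to restart based on reported container memory.
def parse_mem(s: str) -> float:
    """'1.9GiB' or '850MiB' -> bytes."""
    units = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30}
    for suffix, factor in units.items():
        if s.endswith(suffix):
            return float(s[: -len(suffix)]) * factor
    return float(s)  # assume raw bytes if no suffix

def should_restart(mem_usage: str, limit_bytes: float = 1.5 * 2**30) -> bool:
    return parse_mem(mem_usage) > limit_bytes

print(should_restart("1.9GiB"))  # True
print(should_restart("850MiB"))  # False
```

Wire this into a small cron script that shells out to `docker restart openclaw` only when the check returns true.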
Root Cause #7: Network Latency
If your VPS is in Europe but your LLM provider (Anthropic, OpenAI) has endpoints primarily in the US, every API call adds 100-150 ms of round-trip latency. For a tool-heavy workflow with 10+ API calls, that's over a second of pure network overhead.
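The overhead compounds linearly with call count — a quick sketch with an assumed 125 ms round trip:

```python
# Pure network overhead for a tool-heavy workflow: calls x round-trip time.
def network_overhead_ms(api_calls: int, rtt_ms: float) -> float:
    return api_calls * rtt_ms

print(network_overhead_ms(10, 125))  # 1250.0 ms of round trips per workflow
```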
Fix: Colocate VPS with API Provider
Choose a VPS in the same region as your LLM provider:
- Anthropic (Claude): US-based servers -- choose a US East VPS
- OpenAI (GPT): US-based -- choose US East or US West
- Google (Gemini): Global, but US gives lowest latency
Vultr has 32 data centers across 6 continents, making it the best choice for optimizing API latency regardless of your provider.
Complete Optimization Config
Here's a production-optimized `~/.openclaw/openclaw.json`:
```json
{
  "gateway": {
    "mode": "local",
    "host": "127.0.0.1",
    "port": 18789,
    "auth": {
      "mode": "token",
      "token": "your-secure-token"
    }
  },
  "agents": {
    "defaults": {
      "contextCompaction": true,
      "maxContextLength": 4000,
      "conversationHistoryLimit": 50,
      "bootstrapMaxChars": 5000,
      "imageMaxDimensionPx": 800,
      "thinkingDefault": false
    }
  },
  "sessions": {
    "rotateBytes": "20mb",
    "maxDiskBytes": "1gb",
    "pruneAfter": "45d"
  }
}
```
Monitoring Performance
Quick Check
```bash
# Check gateway memory usage
docker stats openclaw --no-stream

# Check session storage size
du -sh ~/.openclaw/sessions/

# Check I/O wait (should be under 1%)
iostat -x 1 3
```
Ongoing Monitoring
Set up ClawMetry for a free dashboard:
```bash
# One-command install
npx clawmetry setup
```
Or use OpenTelemetry to export metrics to your monitoring stack. Track: token usage per conversation, context size over time, response latency, and gateway memory.
Contabo
Contabo VPS 1: 8 GB RAM, 4 vCPU, NVMe storage for $4.50/mo. The performance headroom you need for a fast OpenClaw instance.
* Affiliate link — we may earn a commission at no extra cost to you.
VPS Sizing for Optimal Performance
| Priority | Minimum | Recommended | Why |
|---|---|---|---|
| RAM | 4 GB | 8 GB | Prevents swapping; gateway needs 1.5-3 GB |
| Storage | 40 GB NVMe | 80 GB NVMe | Session files grow; NVMe keeps I/O fast |
| CPU | 2 vCPU | 4 vCPU | Plugin worker processes need headroom |
| Network | 50 Mbps | 100 Mbps | Low latency to API providers matters most |
Provider picks for performance:
- Contabo VPS 1 ($4.50/mo): 8 GB RAM, best raw performance per dollar
- Hostinger KVM 2 ($8.49/mo): 8 GB RAM, easy management, NVMe included
- Vultr ($10-20/mo): Best location flexibility for API latency optimization
- DigitalOcean ($12-24/mo): Monitoring built in, great for debugging
Conclusion
OpenClaw's default configuration prioritizes capability over speed. By enabling context compaction, capping tool output sizes, managing session files, and choosing the right model, you can cut response times from 23 seconds to under 4 seconds -- and reduce API costs by 80% or more.
Start with the complete optimization config above, monitor your instance with ClawMetry or docker stats, and upgrade to NVMe storage if you haven't already. The difference is dramatic.
Ready to start automating? Get a VPS today.
Get started with Hostinger VPS hosting today. Special pricing available.
* Affiliate link — we may earn a commission at no extra cost to you.