OpenClaw + Ollama not working? Fixing streaming errors, tool-calling failures, and model hangs
Fixing OpenClaw with Ollama: streaming protocol errors, tool-calling corruption, model discovery timeouts, and correct provider configuration.
The Promise and the Pain of Local LLMs
Running OpenClaw with Ollama gives you a fully local, private AI agent -- no API keys, no usage charges, no data leaving your server. It's the dream setup for self-hosters who care about privacy and cost control.
The reality is messier: Ollama's OpenAI-compatible API has subtle incompatibilities that break OpenClaw's tool calling; streaming responses corrupt mid-generation; models hang indefinitely during complex reasoning; and the wrong configuration format fails silently, with no error messages.
Here's how to fix every common issue when running OpenClaw with Ollama.
The #1 Mistake: Using the OpenAI-Compatible Endpoint
Most guides tell you to configure Ollama as an "OpenAI-compatible" provider in OpenClaw. This technically works for simple chat, but breaks tool calling -- which is the core of what makes OpenClaw useful.
The Problem
When you configure Ollama with OpenClaw's OpenAI provider and set the base URL to http://localhost:11434/v1, you're using Ollama's OpenAI compatibility layer. This layer has a known streaming bug: when stream: true is set, the tool_calls field in the response gets corrupted or dropped entirely.
The result: OpenClaw sends a request that requires a tool call, the model generates the correct tool call, but the streaming response format mangles it. OpenClaw receives an empty or malformed tool call and either does nothing or throws an error.
Fix: Use the Native Ollama API
Configure OpenClaw to use Ollama's native API instead of the OpenAI compatibility layer:
```json
{
  "providers": {
    "ollama": {
      "api": "ollama",
      "baseUrl": "http://localhost:11434",
      "models": ["qwen3:8b"]
    }
  }
}
```
Notice: no /v1 at the end of the URL. The native API endpoint is just http://localhost:11434. Adding /v1 routes to the OpenAI compatibility layer, which is exactly what we're avoiding.
If your Ollama config has "api": "openai" or a baseUrl ending in /v1, you're using the compatibility layer. Switch to "api": "ollama" with the base URL http://localhost:11434 to fix tool calling issues.
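If you manage several provider entries, a small sanity check catches this misconfiguration before you hit a confusing runtime failure. A sketch, assuming the field names from the JSON snippet above (`api`, `baseUrl`); adjust to your actual OpenClaw config schema if it differs:

```python
def uses_compat_layer(provider_cfg: dict) -> bool:
    """Return True if this Ollama provider config routes through the
    OpenAI compatibility layer -- the setup that breaks tool calling."""
    api = provider_cfg.get("api", "")
    base_url = provider_cfg.get("baseUrl", "").rstrip("/")
    return api == "openai" or base_url.endswith("/v1")

# The broken setup most guides recommend vs. the native-API fix
bad = {"api": "openai", "baseUrl": "http://localhost:11434/v1"}
good = {"api": "ollama", "baseUrl": "http://localhost:11434"}

print(uses_compat_layer(bad))   # True  -> needs fixing
print(uses_compat_layer(good))  # False -> native API
```

Run it over each entry in your `providers` map before starting OpenClaw.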
Tool Calling Argument Corruption
Even with the native API, tool calling with Ollama models can produce corrupted arguments. This is an upstream Ollama issue (tracked as #57103) where the model generates valid JSON for tool arguments, but Ollama's response parsing occasionally truncates or misformats the JSON before passing it back.
Symptoms
- Tool calls execute with missing parameters
- JSON parse errors in OpenClaw logs: `Unexpected end of JSON input`
- Tools receive partial arguments (e.g., a file path cut off mid-string)
- Intermittent failures -- works sometimes, fails on longer argument strings
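Because the corruption surfaces as invalid JSON in the tool arguments, you can guard against it before a tool ever runs with partial input. A minimal sketch of that defensive parse (`parse_tool_args` is a hypothetical helper, not part of OpenClaw):

```python
import json

def parse_tool_args(raw: str):
    """Parse tool-call argument JSON, returning (args, error).

    Truncated streaming output fails json.loads with an
    'Unexpected end of JSON input' style error; surface that
    instead of dispatching a tool with partial parameters."""
    try:
        return json.loads(raw), None
    except json.JSONDecodeError as e:
        return None, f"corrupted tool arguments: {e.msg} at pos {e.pos}"

# A file path cut off mid-string, as in the symptom above
args, err = parse_tool_args('{"path": "/etc/ngin')
print(args, err is not None)  # None True
```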
Fix: Disable Streaming for Tool Calls
In your OpenClaw agent configuration, disable streaming when tool calling is active:
```json
{
  "agents": {
    "defaults": {
      "streaming": false
    }
  }
}
```
This forces Ollama to return the complete response in a single JSON payload instead of streaming tokens. The tradeoff: you won't see tokens appear in real-time, but tool calls will be complete and valid.
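If you would rather keep streaming for plain chat and only give it up when it actually corrupts output, one pragmatic middle ground is to retry a failed turn with streaming off. A sketch with an injected transport function (this hook is hypothetical -- OpenClaw does not expose it; the pattern applies if you call Ollama from your own code):

```python
import json

def chat_with_fallback(send, payload: dict):
    """Try a streaming request first; on corrupted JSON, retry once
    with stream disabled. `send` is any callable that performs the
    HTTP request and returns the raw response body as a string."""
    try:
        body = send({**payload, "stream": True})
        json.loads(body)  # corruption surfaces as invalid JSON
        return body
    except json.JSONDecodeError:
        return send({**payload, "stream": False})

# Simulated transport: the streaming path returns a truncated body.
def fake_send(p):
    return '{"tool_calls": [{"na' if p["stream"] else '{"tool_calls": []}'

print(chat_with_fallback(fake_send, {"model": "qwen3:8b"}))  # {"tool_calls": []}
```

The cost is one wasted generation on failures, which is usually cheaper than a tool call running with broken arguments.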
Fix: Use Models Known to Handle Tool Calls Well
Not all Ollama models support tool calling reliably. The models with the best tool calling support as of April 2026:
| Model | Size | Tool Calling | Notes |
|---|---|---|---|
| qwen3:8b | 4.9 GB | Excellent | Best balance of speed and capability |
| qwen3:14b | 9.0 GB | Excellent | More reliable on complex multi-tool chains |
| qwen3:32b | 19 GB | Excellent | Best local model for agentic workflows |
| llama3.1:8b | 4.7 GB | Good | Solid but less reliable than Qwen3 for tools |
| mistral-nemo:12b | 7.1 GB | Fair | Works for simple single-tool calls |
| deepseek-r1:8b | 4.9 GB | Poor | Thinking model, drops tool_calls frequently |
The Qwen3 series is currently the best choice for OpenClaw + Ollama. It has native tool calling support that doesn't rely on prompt hacking, and Ollama's implementation handles it cleanly.
Model Hangs: No Response After Prompt
You send a message, OpenClaw shows "thinking..." and nothing ever comes back. No error, no timeout, just infinite waiting.
Cause 1: Model Not Downloaded
Ollama doesn't auto-download models. If your OpenClaw config references qwen3:8b but you haven't pulled it, Ollama returns an error that OpenClaw may not surface clearly.
```bash
# Check what models are available locally
ollama list

# Pull the model you need
ollama pull qwen3:8b
```
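You can also verify this programmatically from the `/api/tags` response before starting an agent. A sketch; the sample payload mirrors the shape Ollama returns, trimmed to the fields used here:

```python
def model_available(tags_response: dict, wanted: str) -> bool:
    """Check whether `wanted` appears in Ollama's /api/tags payload.
    Tags come back as e.g. 'qwen3:8b'; a bare name matches any tag."""
    names = [m.get("name", "") for m in tags_response.get("models", [])]
    return any(n == wanted or n.split(":")[0] == wanted for n in names)

# Shape mirrors `curl http://localhost:11434/api/tags` output
tags = {"models": [{"name": "qwen3:8b"}, {"name": "llama3.1:8b"}]}
print(model_available(tags, "qwen3:8b"))   # True
print(model_available(tags, "qwen3:14b"))  # False
```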
Cause 2: Insufficient VRAM/RAM
The model fits in memory on paper, but actual inference needs more than the model file size. An 8B-parameter model at Q4 quantization needs ~4.9 GB for weights plus 1-2 GB for context, KV cache, and inference overhead.
Check if Ollama is actually loading the model:
```bash
# Watch Ollama logs for loading/memory errors
journalctl -u ollama -f

# Or if running in Docker
docker logs ollama --tail 50 -f
```
Look for messages like `out of memory`, `insufficient resources`, or `model load failed`.
Cause 3: Context Length Exceeds Model Limit
OpenClaw sends increasingly large context as conversations grow. If the total context exceeds the model's configured limit, Ollama may hang rather than returning an error.
```bash
# Check the model's default context length
ollama show qwen3:8b --modelfile | grep num_ctx
```
Default is usually 2048 or 4096 tokens. For OpenClaw agent workflows, you likely need more:
```bash
# Create a custom model with larger context
cat << 'EOF' > Modelfile
FROM qwen3:8b
PARAMETER num_ctx 8192
EOF
ollama create qwen3-8k -f Modelfile
```
Then update your OpenClaw config to use qwen3-8k.
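To decide how large `num_ctx` needs to be, a rough token estimate over the conversation history is usually enough. A sketch using the common ~4 characters-per-token heuristic -- an approximation, not the model's real tokenizer, so leave headroom:

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate (~4 characters per token for
    English text; real tokenizers vary, so leave headroom)."""
    return len(text) // 4

def fits_context(messages, num_ctx: int, reply_budget: int = 1024) -> bool:
    """True if the conversation plus a reply budget fits in num_ctx."""
    used = sum(estimate_tokens(m) for m in messages)
    return used + reply_budget <= num_ctx

history = ["x" * 20000, "y" * 12000]  # ~8k estimated tokens
print(fits_context(history, 4096))    # False -> raise num_ctx
print(fits_context(history, 16384))   # True
```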
Cause 4: Model Discovery Timeout
When OpenClaw starts, it queries Ollama to discover available models. If Ollama is still loading or the API is slow to respond, OpenClaw may time out during discovery and fail to register the provider.
```bash
# Verify Ollama API is responding
curl http://localhost:11434/api/tags
```
If this is slow or times out, Ollama isn't ready yet. Ensure Ollama starts before OpenClaw in your Docker Compose:
```yaml
services:
  ollama:
    image: ollama/ollama:latest
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
      interval: 10s
      timeout: 5s
      retries: 10

  openclaw:
    image: openclaw/openclaw:latest
    depends_on:
      ollama:
        condition: service_healthy
```
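If you run the services outside Compose (bare metal, systemd), a small wait loop achieves the same ordering. A sketch with an injectable probe so it stays self-contained; in practice the probe would be an HTTP GET against `http://localhost:11434/api/tags`:

```python
import time

def wait_for_ollama(probe, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll until `probe()` returns True (e.g. a successful GET on
    /api/tags), mirroring a healthcheck: 10 attempts, fixed delay."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False

# Simulated probe: Ollama becomes "ready" on the third attempt.
state = {"calls": 0}
def fake_probe():
    state["calls"] += 1
    return state["calls"] >= 3

print(wait_for_ollama(fake_probe, attempts=5, delay=0))  # True
```

Only start OpenClaw once this returns True; starting it earlier reproduces the discovery timeout.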
Thinking Models Dropping Tool Calls
If you're using a "thinking" or "reasoning" model (like DeepSeek-R1 or QwQ), you'll notice it frequently generates a reasoning chain but then fails to emit the tool call. The model "thinks" about what tool to use but never actually calls it.
Why It Happens
Thinking models use <think>...</think> blocks in their output. When OpenClaw parses the response, it may interpret the thinking block as the complete response and miss the tool call that follows. Additionally, some thinking models at smaller sizes (7B-14B) simply don't have enough capacity to maintain both a reasoning chain and structured tool call output.
Fix: Disable Thinking for Tool-Heavy Agents
```json
{
  "agents": {
    "defaults": {
      "thinkingDefault": false
    }
  }
}
```
Or switch to a non-thinking model for agents that rely heavily on tool calling. Use thinking models only for analysis and reasoning tasks where tool calling isn't needed.
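If you must keep a thinking model in the loop, stripping the reasoning block before parsing can recover a tool call that follows it. A sketch of that pre-parse step (hypothetical -- this is not how OpenClaw parses responses internally, but the pattern applies to any wrapper you write around Ollama):

```python
import re

THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def strip_thinking(output: str) -> str:
    """Remove <think>...</think> blocks so a tool call that follows
    the reasoning chain isn't mistaken for part of it."""
    return THINK_RE.sub("", output).strip()

raw = ("<think>I should read the file first.</think>"
       '{"tool": "read_file", "args": {"path": "notes.md"}}')
print(strip_thinking(raw))  # {"tool": "read_file", "args": {"path": "notes.md"}}
```

This only helps when the model emits the tool call at all; it does nothing for the capacity problem in smaller thinking models.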
Performance: Speed vs Quality Tradeoffs
Local LLMs are inherently slower than cloud APIs. Here's how to optimize response time:
GPU vs CPU Inference
| Setup | Tokens/sec (8B model) | Typical Response Time |
|---|---|---|
| NVIDIA RTX 4090 (24 GB VRAM) | 80-120 tok/s | 1-3 seconds |
| NVIDIA RTX 3060 (12 GB VRAM) | 40-60 tok/s | 3-6 seconds |
| CPU only (8 cores, 32 GB RAM) | 5-15 tok/s | 10-30 seconds |
| CPU only (4 cores, 16 GB RAM) | 3-8 tok/s | 20-60 seconds |
For a VPS without a GPU, CPU inference with an 8B model is the practical limit. Anything larger will be painfully slow.
Fix: Choose the Right Model Size for Your Hardware
| VPS RAM | Max Model | Quantization | Practical Use |
|---|---|---|---|
| 4 GB | 3B | Q4_K_M | Simple Q&A, basic tasks |
| 8 GB | 8B | Q4_K_M | General agent, tool calling |
| 16 GB | 14B | Q4_K_M | Better reasoning, code generation |
| 32 GB | 32B | Q4_K_M | Complex agents, multi-step workflows |
| 64 GB | 70B | Q4_K_M | Near-cloud quality |
The model file size shown by ollama list is the disk size, not the RAM requirement. Actual RAM usage during inference is 20-40% higher due to context window, KV cache, and processing overhead.
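That 20-40% rule is simple enough to turn into arithmetic when sizing a VPS. A sketch; the overhead factor is the estimate from the paragraph above, not a measured constant:

```python
def inference_ram_gb(file_size_gb: float, overhead: float = 0.3) -> float:
    """Estimate peak RAM for inference from the `ollama list` file
    size, adding 20-40% (default 30%) for context and KV cache."""
    return round(file_size_gb * (1 + overhead), 1)

print(inference_ram_gb(4.9))       # qwen3:8b -> 6.4
print(inference_ram_gb(9.0, 0.4))  # qwen3:14b, worst case -> 12.6
```

So a 16 GB VPS comfortably covers a 14B model even at the pessimistic end, with room left for OpenClaw and the OS.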
Contabo
Contabo VPS 2: 16 GB RAM, 6 vCPU for $8.49/mo. Run Ollama with 14B models on CPU for a fully private OpenClaw agent.
* Affiliate link — we may earn a commission at no extra cost to you.
Complete Docker Compose: OpenClaw + Ollama
Here's a production-ready setup for running both services:
```yaml
services:
  ollama:
    image: ollama/ollama:latest
    restart: always
    ports:
      - "127.0.0.1:11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    environment:
      - OLLAMA_NUM_PARALLEL=2
      - OLLAMA_MAX_LOADED_MODELS=1
    deploy:
      resources:
        limits:
          memory: 12g
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
      interval: 10s
      timeout: 5s
      retries: 10

  openclaw:
    image: openclaw/openclaw:latest
    restart: always
    ports:
      - "127.0.0.1:18789:18789"
    environment:
      - OPENCLAW_GATEWAY_TOKEN=your-secure-token
    volumes:
      - openclaw_data:/root/.openclaw
    depends_on:
      ollama:
        condition: service_healthy

volumes:
  ollama_data:
  openclaw_data:
```
Key settings:
- `OLLAMA_NUM_PARALLEL=2` -- limits concurrent inference to 2 requests. Higher values use more RAM.
- `OLLAMA_MAX_LOADED_MODELS=1` -- keeps only one model in memory at a time. Essential for limited RAM.
- Memory limit of 12 GB -- leaves room for OpenClaw and the OS on a 16 GB VPS.
After starting, pull your model:
```bash
docker exec ollama ollama pull qwen3:8b
```
Troubleshooting Checklist
When OpenClaw + Ollama isn't working:
- Is Ollama running? -- `curl http://localhost:11434/api/tags` should list models
- Is the model downloaded? -- `ollama list` should show your model
- Is the API format correct? -- use `"api": "ollama"` with no `/v1` in the URL
- Are tool calls working? -- test with a simple tool-calling prompt; if it fails, disable streaming
- Is the model hanging? -- check Ollama logs for memory errors; reduce model size or increase RAM
- Is context too large? -- enable context compaction in OpenClaw, or increase `num_ctx` in the model
- Is inference too slow? -- use a smaller model, enable GPU passthrough, or consider a cloud API for time-sensitive tasks
VPS Sizing for OpenClaw + Ollama
| Use Case | RAM | vCPU | Model | Cost |
|---|---|---|---|---|
| Experimentation | 8 GB | 4 | 3B-8B | $4.50-8/mo |
| Personal agent | 16 GB | 6 | 8B-14B | $8-15/mo |
| Production agent | 32 GB | 8 | 14B-32B | $15-30/mo |
| Team deployment | 64 GB | 16 | 32B-70B | $30-60/mo |
Provider recommendations for Ollama workloads:
- Contabo VPS 1 ($4.50/mo): 8 GB RAM -- enough for 8B models on CPU
- Contabo VPS 2 ($8.49/mo): 16 GB RAM -- runs 14B models comfortably
- Hostinger KVM 4 ($15.99/mo): 16 GB RAM with NVMe for fast model loading
- Vultr GPU instances (from $90/mo): NVIDIA A100 for real-time inference speeds
Hostinger
Hostinger KVM 4: 16 GB RAM and NVMe storage for $15.99/mo. Fast model loading and enough memory for 14B parameter models.
* Affiliate link — we may earn a commission at no extra cost to you.
Conclusion
OpenClaw + Ollama is a powerful combination for private, cost-free AI agents -- but the integration has sharp edges. The biggest wins come from using the native Ollama API (not the OpenAI compatibility layer), choosing models with good tool calling support (Qwen3 series), and sizing your VPS correctly for the model you want to run.
Fix the streaming bug by using "api": "ollama", fix tool calling by disabling streaming or choosing a better model, and fix hangs by ensuring your model fits in memory with room for context. Once these are dialed in, you get a fully local AI agent that costs nothing per message.
Ready to automate? Get a VPS today.
Get started with Hostinger VPS hosting today. Special pricing available.
* Affiliate link -- we may earn a commission at no extra cost to you.