Local Models via Ollama Cost $0/Call
Not every task needs Claude Opus. Summarizing docs, parsing data, generating commit messages — these can run on your own hardware for $0/call. No API keys, no metered usage, no surprise bills.
What Ollama Is
Docker, but for AI models. Download a model, run it locally, get an API endpoint OpenClaw can talk to. Five-minute setup:
- Install — `brew install ollama`, or grab it from ollama.ai
- Pull a model — `ollama pull llama3.2:3b`
- Run it — `ollama serve`
- Point OpenClaw at it — use `ollama/llama3.2:3b` as your model
Local LLM at localhost:11434. Zero cost per inference.
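A minimal sketch of talking to that endpoint from Python, assuming Ollama's default port (11434) and its `/api/generate` route. The request is built but not sent, so nothing here requires the server to be running:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for Ollama's generate API (no network I/O here)."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # one JSON response instead of a token stream
    }).encode()
    return urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )

req = build_request("llama3.2:3b", "Summarize: Ollama runs models locally.")

# With `ollama serve` running, send it like this:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```

Same shape works for any model you've pulled — swap the model string and the prompt.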
Models That Work on Normal Hardware
Tested on a 2018 MacBook Pro (32GB RAM, no GPU):
- Llama 3.2 3B — the workhorse. Fast, handles summarization and text gen easily
- Qwen 2.5 Coder 7B — solid for docstrings, function explanations, simple refactoring
- Phi-3 — punches above its weight for parsing and structured tasks
Rule of thumb: 3B-7B parameter models run well on CPU-only with 16-32GB RAM. Bigger models need a GPU.
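You can turn that rule of thumb into a back-of-the-envelope estimate: memory scales with parameter count. A rough sketch, assuming ~0.6 bytes per parameter for a 4-bit-quantized model plus ~1 GB of runtime overhead (ballpark assumptions, not vendor specs):

```python
def est_ram_gb(params_billion: float, bytes_per_param: float = 0.6) -> float:
    """Rough RAM estimate for a ~4-bit-quantized model.
    0.6 bytes/param and 1 GB overhead are ballpark assumptions."""
    return round(params_billion * bytes_per_param + 1.0, 1)

for size in (3, 7, 70):
    print(f"{size}B model: ~{est_ram_gb(size)} GB RAM")
```

A 3B model lands comfortably inside 16 GB; a 70B model explains why bigger models want a GPU with serious VRAM.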
When to Use Local vs Cloud
Local models aren't a replacement — they're a complement.
Use local for:
- High-volume, low-complexity tasks (commit messages, summaries, parsing)
- Privacy-sensitive data that shouldn't leave your machine
- Dev/testing — iterate without burning credits
- Background agents doing routine maintenance
Use cloud for:
- Complex reasoning and multi-step planning
- Tasks needing up-to-date knowledge
- Anything where quality is the priority
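In practice this split can be a one-line routing decision. A hypothetical sketch — the task names and the cloud model string are illustrative, not OpenClaw's actual config:

```python
# Tasks cheap enough for a local model (illustrative set, tune to taste).
LOCAL_TASKS = {"commit_message", "summarize", "parse"}

def pick_model(task: str) -> str:
    """Route routine tasks to the free local model, everything else to cloud."""
    if task in LOCAL_TASKS:
        return "ollama/llama3.2:3b"  # $0/call, runs on localhost
    return "claude-opus"             # placeholder name for the cloud model

print(pick_model("summarize"))
print(pick_model("multi_step_plan"))
```

The point isn't the dispatch table — it's that the default becomes "local unless the task earns a cloud call."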
The Cost Math
100 simple API calls/day = ~$5-10/month. Not huge alone. But across agents, over a year? It adds up. Local models make that line item disappear.
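Spelled out, per agent (the per-call price here is an assumption consistent with the ~$5-10/month figure above):

```python
calls_per_day = 100
cost_per_call = 0.0025  # assumed midpoint implied by ~$5-10/month

monthly = calls_per_day * 30 * cost_per_call
yearly = monthly * 12

print(f"${monthly:.2f}/month, ${yearly:.2f}/year per agent")
```

Multiply by a handful of agents and the "simple stuff" line item is real money — all of it avoidable with a local model.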
More importantly: you can experiment freely. No second-guessing whether a task is "worth" an API call.
Bottom Line
If you have decent hardware on your desk, you're leaving money on the table. Ollama makes it trivially easy to run local models for the simple stuff. Complex work stays on Claude. Your bill drops. Your agent stays capable.