Local Models via Ollama Cost $0/Call
Not every task needs Claude Opus. Summarizing docs, parsing data, generating commit messages — these can run on your own hardware for $0/call. No API keys, no metered usage, no surprise bills.
What Ollama Is
Docker, but for AI models. Download a model, run it locally, get an API endpoint OpenClaw can talk to. Five-minute setup:
- Install — `brew install ollama`, or grab it from ollama.ai
- Pull a model — `ollama pull llama3.2:3b`
- Run it — `ollama serve`
- Point OpenClaw at it — use `ollama/llama3.2:3b` as your model
Local LLM at localhost:11434. Zero cost per inference.
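A minimal sketch of talking to that endpoint from Python, assuming Ollama's default port (11434) and its `/api/generate` route. The request is built but not sent, so nothing here requires the server to be running:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for Ollama's generate API (no network I/O here)."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # one JSON response instead of a token stream
    }).encode()
    return urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )

req = build_request("llama3.2:3b", "Summarize: Ollama runs models locally.")

# With `ollama serve` running, send it like this:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```

Same shape works for any model you've pulled — swap the model string and the prompt.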
Models That Work on Normal Hardware
Tested on a 2018 MacBook Pro (32GB RAM, no GPU):
- Llama 3.2 3B — the workhorse. Fast, handles summarization and text gen easily
- Qwen 2.5 Coder 7B — solid for docstrings, function explanations, simple refactoring
- Phi-3 — punches above its weight for parsing and structured tasks
Rule of thumb: 3B-7B parameter models run well on CPU-only with 16-32GB RAM. Bigger models need a GPU.
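You can turn that rule of thumb into a back-of-the-envelope estimate: memory scales with parameter count. A rough sketch, assuming ~0.6 bytes per parameter for a 4-bit-quantized model plus ~1 GB of runtime overhead (ballpark assumptions, not vendor specs):

```python
def est_ram_gb(params_billion: float, bytes_per_param: float = 0.6) -> float:
    """Rough RAM estimate for a ~4-bit-quantized model.
    0.6 bytes/param and 1 GB overhead are ballpark assumptions."""
    return round(params_billion * bytes_per_param + 1.0, 1)

for size in (3, 7, 70):
    print(f"{size}B model: ~{est_ram_gb(size)} GB RAM")
```

A 3B model lands comfortably inside 16 GB; a 70B model explains why bigger models want a GPU with serious VRAM.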
When to Use Local vs Cloud
Local models aren't a replacement — they're a complement.
Use local for:
- High-volume, low-complexity tasks (commit messages, summaries, parsing)
- Privacy-sensitive data that shouldn't leave your machine
- Dev/testing — iterate without burning credits
- Background agents doing routine maintenance
Use cloud for:
- Complex reasoning and multi-step planning
- Tasks needing up-to-date knowledge
- Anything where quality is the priority
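In practice this split can be a one-line routing decision. A hypothetical sketch — the task names and the cloud model string are illustrative, not OpenClaw's actual config:

```python
# Tasks cheap enough for a local model (illustrative set, tune to taste).
LOCAL_TASKS = {"commit_message", "summarize", "parse"}

def pick_model(task: str) -> str:
    """Route routine tasks to the free local model, everything else to cloud."""
    if task in LOCAL_TASKS:
        return "ollama/llama3.2:3b"  # $0/call, runs on localhost
    return "claude-opus"             # placeholder name for the cloud model

print(pick_model("summarize"))
print(pick_model("multi_step_plan"))
```

The point isn't the dispatch table — it's that the default becomes "local unless the task earns a cloud call."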
The Cost Math
100 simple API calls/day = ~$5-10/month. Not huge alone. But across agents, over a year? It adds up. Local models make that line item disappear.
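Spelled out, per agent (the per-call price here is an assumption consistent with the ~$5-10/month figure above):

```python
calls_per_day = 100
cost_per_call = 0.0025  # assumed midpoint implied by ~$5-10/month

monthly = calls_per_day * 30 * cost_per_call
yearly = monthly * 12

print(f"${monthly:.2f}/month, ${yearly:.2f}/year per agent")
```

Multiply by a handful of agents and the "simple stuff" line item is real money — all of it avoidable with a local model.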
More importantly: you can experiment freely. No second-guessing whether a task is "worth" an API call.
Bottom Line
If you have decent hardware on your desk, you're leaving money on the table. Ollama makes it trivially easy to run local models for the simple stuff. Complex work stays on Claude. Your bill drops. Your agent stays capable.