Field Notes/Cost Optimization
Local Models via Ollama Cost $0/Call

Not every task needs Claude Opus. Summarizing docs, parsing data, generating commit messages — these can run on your own hardware for $0/call. No API keys, no metered usage, no surprise bills.

What Ollama Is

Docker, but for AI models. Download a model, run it locally, get an API endpoint OpenClaw can talk to. Five-minute setup:

  • Install: `brew install ollama` or grab it from ollama.ai
  • Pull a model: `ollama pull llama3.2:3b`
  • Run it: `ollama serve`
  • Point OpenClaw at it — use ollama/llama3.2:3b as your model

Local LLM at localhost:11434. Zero cost per inference.
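The endpoint speaks plain JSON over HTTP, so you don't even need a client library. A minimal sketch, assuming Ollama is running locally with llama3.2:3b pulled (the helper names here are illustrative, not part of Ollama or OpenClaw):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_payload(model: str, prompt: str) -> dict:
    # stream=False returns one JSON object instead of chunked output
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# generate("llama3.2:3b", "Summarize: ...")  # runs on localhost, $0/call
```

No API key, no auth header, no billing — the request never leaves your machine.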

Models That Work on Normal Hardware

Tested on a 2018 MacBook Pro (32GB RAM, no GPU):

  • Llama 3.2 3B — the workhorse. Fast, handles summarization and text gen easily
  • Qwen 2.5 Coder 7B — solid for docstrings, function explanations, simple refactoring
  • Phi-3 — punches above its weight for parsing and structured tasks

Rule of thumb: 3B-7B parameter models run well on CPU-only with 16-32GB RAM. Bigger models need a GPU.
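That rule of thumb falls out of simple arithmetic: quantized weights dominate memory use. A back-of-envelope estimate (weights only — KV cache and runtime overhead add a few more GB on top):

```python
def approx_weights_gb(params_billion: float, bits_per_weight: int = 4) -> float:
    # params * bits, divided by 8 bits per byte, expressed in GB;
    # 4-bit is a common quantization level for local inference
    return params_billion * bits_per_weight / 8

approx_weights_gb(3)   # 1.5 GB -- comfortable on a 16GB laptop
approx_weights_gb(7)   # 3.5 GB -- still fine
approx_weights_gb(70)  # 35 GB -- this is where you need a GPU or a lot of RAM
```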

When to Use Local vs Cloud

Local models aren't a replacement — they're a complement.

Use local for:

  • High-volume, low-complexity tasks (commit messages, summaries, parsing)
  • Privacy-sensitive data that shouldn't leave your machine
  • Dev/testing — iterate without burning credits
  • Background agents doing routine maintenance

Use cloud for:

  • Complex reasoning and multi-step planning
  • Tasks needing up-to-date knowledge
  • Anything where quality is the priority
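In practice the split can live in one routing function. A sketch of the idea — the model names and task categories below are illustrative assumptions, not OpenClaw configuration:

```python
LOCAL_MODEL = "ollama/llama3.2:3b"  # from the setup above
CLOUD_MODEL = "cloud-model-id"      # placeholder for your cloud model

# high-volume, low-complexity tasks a small local model handles fine
LOCAL_TASKS = {"commit_message", "summarize", "parse"}

def pick_model(task: str, privacy_sensitive: bool = False) -> str:
    # privacy-sensitive data never leaves the machine, regardless of task
    if privacy_sensitive or task in LOCAL_TASKS:
        return LOCAL_MODEL
    # complex reasoning, planning, and quality-critical work go to the cloud
    return CLOUD_MODEL

pick_model("commit_message")         # -> "ollama/llama3.2:3b"
pick_model("multi_step_plan")        # -> "cloud-model-id"
pick_model("multi_step_plan", True)  # -> "ollama/llama3.2:3b"
```

The default is cloud; you opt tasks *into* local handling, which keeps quality as the fallback rather than the exception.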

The Cost Math

100 simple API calls/day = ~$5-10/month. Not huge alone. But across agents, over a year? It adds up. Local models make that line item disappear.

More importantly: you can experiment freely. No second-guessing whether a task is "worth" an API call.
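To make that line item concrete, here's the arithmetic behind the estimate (the per-call price is an assumed average for cheap cloud calls, not a quoted rate):

```python
def monthly_cost(calls_per_day: int, cost_per_call: float, days: int = 30) -> float:
    # straight multiplication: volume x unit price x billing period
    return calls_per_day * cost_per_call * days

monthly_cost(100, 0.002)       # $6.00/month -- inside the ~$5-10 range
monthly_cost(100, 0.002) * 12  # $72/year, per agent; local inference drops this to $0
```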

Bottom Line

If you have decent hardware on your desk, you're leaving money on the table. Ollama makes it trivially easy to run local models for the simple stuff. Complex work stays on Claude. Your bill drops. Your agent stays capable.