Inference Arbitrage: How I Route 200+ Daily LLM Calls Across Five Models
Inference arbitrage means routing each AI task to the cheapest model that can handle it at acceptable quality, instead of sending everything…
Technology, tools, and productivity systems
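The routing idea above can be sketched in a few lines. This is a minimal, hypothetical illustration (the model names, costs, and capability tiers are made up for the example, not taken from the post): order candidate models by cost and pick the cheapest one whose capability tier meets the task's requirement.

```python
# Illustrative sketch of inference arbitrage: route each task to the
# cheapest model whose capability tier is sufficient. All names, costs,
# and tiers below are hypothetical placeholders.

MODELS = [  # (name, cost per 1M tokens in USD, capability tier)
    ("local-qwen3-coder-30b", 0.00, 1),  # free once running locally
    ("small-hosted-model", 0.15, 2),
    ("frontier-model", 3.00, 3),
]

def route(required_tier: int) -> str:
    """Return the cheapest model whose tier covers the task."""
    for name, cost, tier in sorted(MODELS, key=lambda m: m[1]):
        if tier >= required_tier:
            return name
    raise ValueError("no available model can handle this task")
```

In practice the "required tier" would come from a task classifier or a per-task-type lookup; the point is only that cost ordering, not raw capability, drives the first-choice selection.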
Most LLM benchmarks measure raw intelligence. Real deployment decisions also depend on latency, format reliability, and data boundaries, including when a task…
I spent about two weeks of evenings getting Qwen3-Coder-30B running reliably on a Mac Studio (M1 Max, 32GB) through LM Studio and…
The full architecture for giving Claude Code persistent memory across sessions: four layers of markdown files, two commands, five cron jobs, and the 8 design rules I derived from breaking it over 22 days.
OpenClaw on Apple Silicon with a 24B local model: 14 real errors fixed, sub-agent delivery working, $1.50/month total. Every config documented.
I’ve been hiring tech talent in Victoria since before the pandemic, first at a series of companies and now as CEO of…