The LLM Kept Saying “Fixed.” For Three Months, It Wasn’t.
That afternoon a Slack bot told me a script had NEVER RUN. That was a lie. The script had pulled 81 weather…
Practical AI experimentation, local LLM setup, benchmarks, and Claude Code
Claude Code has a feature called auto-compact that quietly destroys your session quality. The problem: I was three hours into a multi-file…
(If you’re trying to decide which model to switch to when one runs dry, I benchmarked 15 models on 38 real coding…
Inference arbitrage means routing each AI task to the cheapest model that can handle it at acceptable quality, instead of sending everything…
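The routing idea in that piece can be sketched in a few lines. This is a minimal illustration, not the article's implementation: the model names, costs, and quality scores below are hypothetical placeholders, and `route` simply picks the cheapest model whose quality clears a per-task bar.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    cost_per_mtok: float  # USD per million tokens (hypothetical)
    quality: float        # benchmark score in [0, 1] (hypothetical)

# Illustrative catalog only; real routing would use measured scores.
MODELS = [
    Model("small-local", 0.00, 0.55),
    Model("mid-hosted", 0.30, 0.75),
    Model("frontier", 3.00, 0.92),
]

def route(required_quality: float) -> Model:
    """Return the cheapest model meeting the task's quality bar."""
    eligible = [m for m in MODELS if m.quality >= required_quality]
    if not eligible:
        # Nothing clears the bar: fall back to the strongest model.
        return max(MODELS, key=lambda m: m.quality)
    return min(eligible, key=lambda m: m.cost_per_mtok)
```

An easy task (`route(0.5)`) lands on the free local model, while a hard one (`route(0.9)`) escalates to the expensive frontier model, which is the whole arbitrage.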
Most LLM benchmarks measure raw intelligence. Real deployment decisions also depend on latency, format reliability, and data boundaries, including when a task…
I spent about two weeks of evenings getting Qwen3-Coder-30B running reliably on a Mac Studio (M1 Max, 32GB) through LM Studio and…
The full architecture for giving Claude Code persistent memory across sessions: four layers of markdown files, two commands, five cron jobs, and the 8 design rules I derived from breaking it over 22 days.
OpenClaw on Apple Silicon with a 24B local model: 14 real errors fixed, sub-agent delivery working, $1.50/month total. Every config documented.