Why Your llama.cpp Benchmarks Are Wrong: GPU Architecture and Real Numbers
I pulled an aging Quadro out of my homelab LLM box, dropped in an RTX 2060 SUPER, and the thing booted on…
Practical AI experimentation, local LLM setup, benchmarks, and Claude Code
I built a honeypot canary that screens web content with a deliberately gullible LLM before my AI agent reads it. Here is what it caught.
On May 31, 2026, one of my LLM providers quietly deleted most of its free models, including the exact one my code…
Within a week of standing up the dashboard, three panels were showing bad data. None of them was a Grafana bug. I…
Driving WordPress from Claude Code (Anthropic’s terminal coding agent) fights the platform’s core assumption roughly every other operation. WordPress was built for…
What do 2,300 random files on a knowledge worker’s laptop actually look like? I let a local LLM tell me. TL;DR: I…
The setup: The starting line was 43 tokens per second decode on vanilla llama.cpp. The finishing line, three months later, is 39…
I pulled a Quadro M4000 out of a used Dell Precision T5820, dropped in an RTX 3090 Ti, and turned the box…
That afternoon a Slack bot told me a script had NEVER RUN. That was a lie. The script had pulled 81 weather…
Claude Code has a feature called auto-compact that quietly destroys your session quality. The Problem: I was three hours into a multi-file…