Sorting a Filesystem Hoard With Local LLMs: What 2,300 Files Told Me About My Obsidian Vault
What do 2,300 random files on a knowledge worker’s laptop actually look like? I let a local LLM tell me. TL;DR I…
Practical AI experimentation, local LLM setup, benchmarks, and Claude Code
The setup The starting line was 43 tokens per second decode on vanilla llama.cpp. The finishing line, three months later, is 39…
I pulled a Quadro M4000 out of a used Dell Precision T5820, dropped in an RTX 3090 Ti, and turned the box…
That afternoon a Slack bot told me a script had NEVER RUN. That was a lie. The script had pulled 81 weather…
Claude Code has a feature called auto-compact that quietly destroys your session quality. The Problem I was three hours into a multi-file…
(If you’re trying to decide which model to switch to when one runs dry, I benchmarked 15 models on 38 real coding…
Inference arbitrage means routing each AI task to the cheapest model that can handle it at acceptable quality, instead of sending everything…
Most LLM benchmarks measure raw intelligence. Real deployment decisions also depend on latency, format reliability, and data boundaries, including when a task…
I spent about two weeks of evenings getting Qwen3-Coder-30B running reliably on a Mac Studio (M1 Max, 32GB) through LM Studio and…
The full architecture for giving Claude Code persistent memory across sessions: four layers of markdown files, two commands, five cron jobs, and the 8 design rules I derived from breaking it over 22 days.