Cloud LLM Comparison for Coding Agents (2026)
Multi-source comparison of cloud LLMs for coding agents. Scores come from our 38-task benchmark, the Aider edit leaderboard, and LMSYS Arena.
| Model | Provider | Our Score | Aider Score | LMSYS Elo | Cost/Task | Speed | Privacy | Aider Command |
|---|---|---|---|---|---|---|---|---|
| Claude Sonnet 4.6 | Anthropic | 100.0% | N/A | 1287 | $0.0052 | 53 tok/s | No training | aider --model claude-sonnet-4-6 |
| Claude Opus 4.6 | Anthropic | 98.6% | N/A | 1280 | $0.0181 | 38 tok/s | No training | aider --model claude-opus-4-6 |
| MiniMax M2.5 | MiniMax | 98.6% | N/A | N/A | $0.0018 | 98 tok/s | Unknown | aider --model minimax/minimax-m2.5 |
| Kimi K2.5 | Moonshot | 98.6% | N/A | 1265 | $0.0034 | 48 tok/s | Unknown | aider --model moonshotai/kimi-k2.5 |
| Gemini 2.5 Pro | Google | 98.3% | N/A | 1295 | $0.0187 | 116 tok/s | No training | aider --model gemini-2.5-pro |
| GPT-5.2 Codex | OpenAI | 98.3% | N/A | 1285 | $0.0042 | 40 tok/s | Opt-out | aider --model gpt-5.2-codex |
| GPT-5.2 | OpenAI | 98.0% | N/A | 1290 | $0.0038 | 59 tok/s | Opt-out | aider --model gpt-5.2 |
| Gemini 2.5 Flash | Google | 97.1% | N/A | 1260 | $0.0001 | 112 tok/s | No training | aider --model gemini-2.5-flash |
| DeepSeek R1 | DeepSeek | 96.8% | N/A | 1270 | $0.0032 | 39 tok/s | Unknown | aider --model deepseek/deepseek-r1 |
| Claude Haiku 4.5 | Anthropic | 95.9% | N/A | N/A | $0.0009 | 77 tok/s | No training | aider --model claude-haiku-4-5-20251001 |
| GPT-5 Nano | OpenAI | 94.8% | N/A | 1220 | $0.0007 | 104 tok/s | Opt-out | aider --model openai/gpt-5-nano |
| DeepSeek V3 | DeepSeek | 88.7% | N/A | 1240 | $0.0002 | 20 tok/s | Unknown | aider --model deepseek/deepseek-chat |
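
One way to read the table is to fix a minimum acceptable score and then pick the cheapest model that clears it. The sketch below does that with the "Our Score" and "Cost/Task" columns copied from the table. The function name `cheapest_above` and the 98.0% threshold are illustrative assumptions, not part of the benchmark.

```python
# Rows copied from the table: (model, our_score_percent, cost_per_task_usd)
ROWS = [
    ("Claude Sonnet 4.6", 100.0, 0.0052),
    ("Claude Opus 4.6", 98.6, 0.0181),
    ("MiniMax M2.5", 98.6, 0.0018),
    ("Kimi K2.5", 98.6, 0.0034),
    ("Gemini 2.5 Pro", 98.3, 0.0187),
    ("GPT-5.2 Codex", 98.3, 0.0042),
    ("GPT-5.2", 98.0, 0.0038),
    ("Gemini 2.5 Flash", 97.1, 0.0001),
    ("DeepSeek R1", 96.8, 0.0032),
    ("Claude Haiku 4.5", 95.9, 0.0009),
    ("GPT-5 Nano", 94.8, 0.0007),
    ("DeepSeek V3", 88.7, 0.0002),
]


def cheapest_above(rows, min_score):
    """Return the rows scoring at least min_score, cheapest first.

    Hypothetical helper for illustration; the threshold is the caller's choice.
    """
    eligible = [row for row in rows if row[1] >= min_score]
    return sorted(eligible, key=lambda row: row[2])


if __name__ == "__main__":
    # 98.0 is an arbitrary example threshold.
    for model, score, cost in cheapest_above(ROWS, 98.0):
        print(f"{model:<20} {score:5.1f}%  ${cost:.4f}")
```

Lowering the threshold changes the answer quickly: at 98.0% the cheapest qualifying model is MiniMax M2.5, while at 97.0% it is Gemini 2.5 Flash. Speed and privacy are ignored here and would need their own filters.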