Bigger AI models aren't getting much better at coding anymore
There are signs that making AI models larger and more expensive is no longer delivering the same jumps in coding ability it once did. Benchmark scores may still climb, but real-world coding improvements feel smaller with each new release. Developers relying on AI coding tools should temper their expectations for the next wave of upgrades.
The AI industry has long operated on the assumption that more computing power and more training data produce a smarter model. For many general tasks, that held true. But in software engineering specifically, the improvement curve appears to be flattening, a pattern known as 'diminishing returns.'
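To make 'diminishing returns' concrete, here is a minimal Python sketch. The scores are invented purely for illustration and do not come from any real model or benchmark; the function names are hypothetical as well. The sketch assumes each successive score comes from a larger, more expensive model, and checks whether each improvement is smaller than the one before it.

```python
def marginal_gains(scores):
    """Return the improvement between each consecutive pair of scores."""
    return [later - earlier for earlier, later in zip(scores, scores[1:])]


def shows_diminishing_returns(scores):
    """True if every improvement is strictly smaller than the one before it."""
    gains = marginal_gains(scores)
    return all(b < a for a, b in zip(gains, gains[1:]))


# Hypothetical scores for four model generations, each assumed to cost more
# to train than the last. These numbers are invented for illustration only.
hypothetical_scores = [40, 60, 70, 74]

print(marginal_gains(hypothetical_scores))             # [20, 10, 4]
print(shows_diminishing_returns(hypothetical_scores))  # True
```

The scores still rise with every generation, but the step from one generation to the next shrinks from 20 points to 10 to 4, which is the pattern the term describes.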
In practice, this means tools like GitHub Copilot, Cursor, or Claude may not feel meaningfully better with each new model version, even if the companies behind them announce higher benchmark scores. Benchmarks are standardized tests, and they don't always reflect the messy, complex coding problems developers actually face day-to-day. For solo developers and makers who rely heavily on AI coding assistants, this is a signal to think critically about which tool genuinely helps, rather than assuming the newest model is always the best choice.
Key points
- Scaling AI models up is yielding smaller and smaller gains specifically in software development
- Benchmark scores can keep rising even when real coding ability improvements stall
- Tools like Copilot, Cursor, and Claude may feel similar across new model releases
- 'Diminishing returns' means each extra investment produces less improvement than the last
- It's worth evaluating AI coding tools on actual tasks, not just claimed benchmark numbers
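One simple way to act on the last point is to keep your own log of how each tool performs on tasks from your real work, then compare pass rates. The Python sketch below assumes you record the outcomes by hand; the tool names, task names, and results are placeholders, not measurements of any real product.

```python
from collections import defaultdict


def pass_rates(results):
    """Compute the share of tasks each tool completed successfully.

    `results` is a list of (tool_name, task_name, passed) tuples recorded
    by hand after trying each tool on your own tasks.
    """
    attempts = defaultdict(int)
    passes = defaultdict(int)
    for tool, _task, passed in results:
        attempts[tool] += 1
        if passed:
            passes[tool] += 1
    return {tool: passes[tool] / attempts[tool] for tool in attempts}


# Hypothetical log: tool names, task names, and outcomes are placeholders.
log = [
    ("tool_a", "fix flaky test", True),
    ("tool_a", "refactor billing module", False),
    ("tool_a", "add CSV export", True),
    ("tool_a", "upgrade dependency", True),
    ("tool_b", "fix flaky test", True),
    ("tool_b", "refactor billing module", True),
    ("tool_b", "add CSV export", False),
    ("tool_b", "upgrade dependency", False),
]

for tool, rate in sorted(pass_rates(log).items()):
    print(f"{tool}: {rate:.0%}")
```

Running every tool on the same set of tasks keeps the comparison fair. A handful of tasks is only a rough signal, but it reflects your own work rather than a published score.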
Quick term guide
- AI models: The underlying programs, or core 'brains', that power artificial intelligence tools.
- benchmark: A standard test used to compare performance, such as speed, quality, or cost.
- developers: People who build software, apps, or websites.
- AI coding tool: Software that uses AI to help write, edit, or explain code.
- software: Programs or apps that run on a computer or smartphone.
- diminishing returns: When putting in more effort or money produces smaller and smaller improvements.
- GitHub Copilot: A popular tool that helps programmers write code using artificial intelligence.