Gemini 3.1 Pro ends up cheaper per task than 3.5 Flash despite higher token rates
The team behind Tessl, an AI development tool, ran about 3,300 coding tasks across four models and shared the results. Gemini 3.1 Pro scored 87.9 and cost $0.66 per task; Gemini 3.5 Flash scored 88.6 and cost $1.05 per task. The score gap is just 0.7 points, yet Flash cost 59% more.
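The "59% more" figure uses 3.1 Pro's cost as the baseline. Measured the other way, with Flash as the baseline, the same gap is a smaller percentage. Here is a quick check that uses only the two per-task costs reported above. The helper name `pct_change` is illustrative.

```python
def pct_change(new: float, base: float) -> float:
    """Percentage change when going from `base` to `new`."""
    return (new - base) / base * 100

pro_cost = 0.66    # Gemini 3.1 Pro, USD per task (reported)
flash_cost = 1.05  # Gemini 3.5 Flash, USD per task (reported)

flash_premium = pct_change(flash_cost, pro_cost)  # Flash relative to 3.1 Pro
pro_saving = -pct_change(pro_cost, flash_cost)    # 3.1 Pro relative to Flash

print(f"Flash costs {flash_premium:.0f}% more than 3.1 Pro")
print(f"3.1 Pro costs {pro_saving:.0f}% less than Flash")
```

The two percentages have different baselines, so they cannot be swapped for each other. A 59% premium for Flash is the same gap as a saving of about 37% for 3.1 Pro.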
The twist: 3.1 Pro's published per-token price is actually higher than Flash's. The cost reversal comes from how each model works through a problem. On average, 3.1 Pro used 26 conversation turns and roughly 650,000 tokens per task, while Flash used 39 turns and about 1.4 million tokens.
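Dividing each model's reported per-task cost by its average token count gives a rough effective price per million tokens. This is a back-of-the-envelope sketch built from the averages above, not from the published rate cards. It assumes input and output tokens can be lumped together, and it ignores caching and pricing tiers.

```python
# Reported per-task averages from the comparison above.
runs = {
    "Gemini 3.1 Pro":   {"cost_usd": 0.66, "tokens": 650_000,   "turns": 26},
    "Gemini 3.5 Flash": {"cost_usd": 1.05, "tokens": 1_400_000, "turns": 39},
}

for name, r in runs.items():
    # Blended cost per million tokens (input and output lumped together).
    per_million = r["cost_usd"] / r["tokens"] * 1_000_000
    # Average tokens processed per conversation turn.
    per_turn = r["tokens"] / r["turns"]
    print(f"{name}: ~${per_million:.2f} per million tokens, "
          f"~{per_turn:,.0f} tokens per turn")

pro, flash = runs["Gemini 3.1 Pro"], runs["Gemini 3.5 Flash"]
print(f"Flash token multiple: {flash['tokens'] / pro['tokens']:.2f}x")
print(f"Flash turn multiple:  {flash['turns'] / pro['turns']:.2f}x")
```

On these averages, Flash's effective rate is roughly a quarter lower: about $0.75 per million tokens, compared with about $1.02 for 3.1 Pro. However, Flash processes about 2.15 times as many tokens per task and more tokens per turn, so its total cost per task still comes out higher.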
Flash pulled in far more context and took more steps, which more than offset its lower per-token price. The team also made a second finding. When they supplied relevant skills from their registry, 3.1 Pro's cost dropped by roughly 23% and its score improved substantially, while Flash saw little to no benefit from the same addition.
Key points
- Despite higher per-token rates, Gemini 3.1 Pro cost about 37% less per task than Flash in this test ($0.66 vs. $1.05)
- Flash consumed roughly twice as many tokens and 50% more turns per task, making it pricier overall
- Adding task-relevant skills cut 3.1 Pro's cost by ~23% and boosted its score; Flash barely benefited
- When comparing models, measure cost per completed task rather than cost per token to get an accurate picture
- Results come from Tessl's workloads; outcomes may differ for other task types
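One way to compute cost per completed task is to divide total spend, including failed runs, by the number of tasks that pass. With this measure, a model that is cheap per token but runs long or fails often gets no hidden discount. The sketch below assumes each run is logged as a hypothetical `TaskRun` record with a dollar cost and a pass/fail flag. The sample values are invented for illustration and are not Tessl's data.

```python
from dataclasses import dataclass

@dataclass
class TaskRun:
    """One agent run on one task (hypothetical logging schema)."""
    cost_usd: float
    passed: bool

def cost_per_completed_task(runs: list[TaskRun]) -> float:
    """Total spend across all runs divided by the number of passing runs."""
    completed = sum(1 for r in runs if r.passed)
    if completed == 0:
        return float("inf")  # money was spent but nothing was delivered
    return sum(r.cost_usd for r in runs) / completed

# Illustrative numbers only.
model_a = [TaskRun(0.50, True), TaskRun(0.70, True), TaskRun(0.60, False)]
model_b = [TaskRun(1.00, True), TaskRun(1.10, True), TaskRun(1.05, True)]

print(f"model A: ${cost_per_completed_task(model_a):.2f} per completed task")
print(f"model B: ${cost_per_completed_task(model_b):.2f} per completed task")
```

In this toy data, model A averages $0.60 per run. Its failed run still counts toward spend, though, so its cost per completed task rises to $0.90. Model B passes every task, so its figure equals its average run cost.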