Minimax M3 beat Kimi K2.6 on agent cost in hands-on tests
Kimi K2.6 and Minimax M3 were compared in hands-on agent tests, and Minimax M3 finished more tasks while costing less. The tests covered terminal coding, API calls, tool use, and multi-step workflows. The prompts, tools, and sandbox stayed the same; only the model changed.
On difficult terminal coding tasks, Minimax M3 solved 5 out of 10 tasks for $2.80. Kimi K2.6 solved 4 out of 10 tasks for $6.61, so it cost more while completing fewer tasks. One difficult path-tracing-reverse task needed 134 back-and-forth steps in the terminal; Minimax M3 kept going and finished it, while Kimi K2.6 timed out.
Across 25 practical workflows, including email summaries, Drive organization, GitHub analysis, startup research, outreach drafts, and cross-app tasks, Minimax M3 scored 0.75 at a cost of $0.81, while Kimi K2.6 scored 0.72 at a cost of $4.08.
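To make the cost gap easier to compare, the reported figures can be turned into cost per solved task and a cost ratio. The sketch below is only illustrative arithmetic on the numbers quoted above; the helper names `cost_per_solved` and `cost_ratio` and the `RESULTS` layout are assumptions, not part of the original tests.

```python
# Figures copied from the comparison above; the derived metrics are
# illustrative and not part of the original report.
RESULTS = {
    "terminal_coding": {
        "Minimax M3": {"solved": 5, "cost_usd": 2.80},
        "Kimi K2.6": {"solved": 4, "cost_usd": 6.61},
    },
    "practical_workflows": {
        "Minimax M3": {"score": 0.75, "cost_usd": 0.81},
        "Kimi K2.6": {"score": 0.72, "cost_usd": 4.08},
    },
}


def cost_per_solved(entry):
    """Dollars spent per solved task (hypothetical helper)."""
    return entry["cost_usd"] / entry["solved"]


def cost_ratio(suite, expensive, cheap):
    """How many times more one model cost than the other on a suite."""
    costs = RESULTS[suite]
    return costs[expensive]["cost_usd"] / costs[cheap]["cost_usd"]


if __name__ == "__main__":
    for model, entry in RESULTS["terminal_coding"].items():
        print(f"{model}: ${cost_per_solved(entry):.2f} per solved terminal task")
    for suite in RESULTS:
        ratio = cost_ratio(suite, "Kimi K2.6", "Minimax M3")
        print(f"{suite}: Kimi K2.6 cost {ratio:.1f}x as much as Minimax M3")
```

On these figures, the terminal coding run works out to roughly $0.56 per solved task for Minimax M3 against about $1.65 for Kimi K2.6, and Kimi K2.6 cost about 2.4 times as much on terminal coding and about 5 times as much on the practical workflows.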
Key points
- The comparison kept the prompts, tools, and sandbox the same, changing only the model.
- Minimax M3 solved 5 of 10 terminal coding tasks for $2.80.
- Kimi K2.6 solved 4 of 10 terminal coding tasks for $6.61.
- Across 25 practical workflows, Minimax M3 scored 0.75 for $0.81, while Kimi K2.6 scored 0.72 for $4.08.
- Minimax M3 completed a long terminal task that required 134 back-and-forth steps; Kimi K2.6 timed out.