Minimax M3 beat Kimi K2.6 on agent cost in hands-on tests

Kimi K2.6 and Minimax M3 were compared in real workflows, and Minimax M3 finished more tasks while costing less. The tests covered terminal coding, API calls, tool use, and multi-step workflows. The prompts, tools, and sandbox stayed the same; only the model changed.

On difficult terminal coding tasks, Minimax M3 solved 5 out of 10 tasks for $2.80. Kimi K2.6 solved 4 out of 10 for $6.61, so it cost more than twice as much while completing fewer tasks. One difficult path-tracing-reverse task needed 134 terminal back-and-forth steps; Minimax M3 kept going and finished it, while Kimi K2.6 timed out.

Across 25 practical workflows, including email summaries, Drive organization, GitHub analysis, startup research, outreach drafts, and cross-app workflows, Minimax M3 scored 0.75 at a cost of $0.81, while Kimi K2.6 scored 0.72 at a cost of $4.08.
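To put the two sets of figures on a common footing, the short sketch below derives cost per solved task and a cost ratio from the numbers quoted above. These derived metrics, and the helper names `cost_per_solved` and `cost_ratio`, are illustrative assumptions and were not reported in the comparison itself.

```python
# Figures as quoted in the article. The derived metrics below (cost per solved
# task, cost ratio) are illustrative and were not reported by the source.
TERMINAL = {
    "Minimax M3": {"solved": 5, "total": 10, "cost_usd": 2.80},
    "Kimi K2.6": {"solved": 4, "total": 10, "cost_usd": 6.61},
}
WORKFLOWS = {
    "Minimax M3": {"score": 0.75, "cost_usd": 0.81},
    "Kimi K2.6": {"score": 0.72, "cost_usd": 4.08},
}


def cost_per_solved(solved: int, cost_usd: float) -> float:
    """Dollars spent per solved task (hypothetical helper metric)."""
    if solved <= 0:
        raise ValueError("no solved tasks, so cost per solved task is undefined")
    return cost_usd / solved


def cost_ratio(cheaper_cost: float, pricier_cost: float) -> float:
    """How many times more the pricier run cost than the cheaper one."""
    return pricier_cost / cheaper_cost


if __name__ == "__main__":
    # Terminal coding benchmark: dollars per solved task for each model.
    for model, r in TERMINAL.items():
        per_task = cost_per_solved(r["solved"], r["cost_usd"])
        print(f"{model}: {r['solved']}/{r['total']} solved, ${per_task:.2f} per solved task")

    # Practical workflows: total cost ratio between the two models.
    ratio = cost_ratio(
        WORKFLOWS["Minimax M3"]["cost_usd"], WORKFLOWS["Kimi K2.6"]["cost_usd"]
    )
    print(f"Workflow cost ratio (Kimi K2.6 / Minimax M3): {ratio:.1f}x")
```

On the quoted numbers this works out to about $0.56 per solved terminal task for Minimax M3 against about $1.65 for Kimi K2.6, and a roughly fivefold cost gap on the 25 workflows. Cost per solved task is only one way to combine the figures; it says nothing about which tasks were solved or how hard they were.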

Key points

  • The comparison kept the prompts, tools, and sandbox the same, changing only the model.
  • Minimax M3 solved 5 of 10 terminal coding tasks for $2.80.
  • Kimi K2.6 solved 4 of 10 terminal coding tasks for $6.61.
  • Across 25 practical workflows, Minimax M3 scored 0.75 for $0.81, while Kimi K2.6 scored 0.72 for $4.08.
  • Minimax M3 completed a long terminal task that required 134 back-and-forth steps; Kimi K2.6 timed out.