Reasoning models can waste tokens on creative writing

Reasoning models may help creative tasks by keeping track of details and following instructions more closely. In practice, they can spend a lot of effort drafting, checking, revising, and drafting again before giving the final answer. This wastes tokens even for short replies, and the problem becomes much worse for outputs longer than a paragraph or two.

With Gemma 4 and Qwen3.6, prompting did not seem enough to control or shorten this reasoning process. Extra instructions could add more steps, but they did not reliably remove the built-in-looking draft-and-revise loop. Possible workarounds include Jinja template changes, fine tuning, or using a reasoning model designed with this inefficiency in mind.

It may also be a limitation that end users cannot really fix from the outside.

Key points

  • Reasoning models can repeat drafting and revision steps before producing the final answer.
  • That behavior can waste many tokens, especially for longer creative outputs.
  • Prompting alone did not seem to reduce the process in Gemma 4 or Qwen3.6.
  • Jinja template changes, fine tuning, or a different reasoning model might help.
  • The behavior may be hard for an end user to control if it is built into the model.

Quick term guide

reasoning models
AI models designed to spend more effort thinking through harder tasks.
reasoning model
An AI model that works through an internal thinking process before giving its final answer — and that thinking process also consumes tokens, which cost money.
reasoning
The ability of the AI to think through complex steps to find a solution.
prompting
Writing instructions or questions to an AI to get a response.
reasoning process
The steps an AI appears to use while working toward an answer.
workaround
An alternative way to get something done when the normal way doesn't work.
Jinja template
A reusable text template often used to control how prompts are built.
AI agents
AI agents are AI tools that can carry out steps toward a goal, not just answer once.
Read original