Open SourceImportance: Medium

Reasoning models can waste tokens on creative writing

r/LocalLLaMAJun 11, 2026 · 2d ago

Reasoning models may help creative tasks by keeping track of details and following instructions more closely. In practice, they can spend a lot of effort drafting, checking, revising, and drafting again before giving the final answer. This wastes tokens even for short replies, and the problem becomes much worse for outputs longer than a paragraph or two.

With Gemma 4 and Qwen3.6, prompting did not seem enough to control or shorten this reasoning process. Extra instructions could add more steps, but they did not reliably remove the built-in-looking draft-and-revise loop. Possible workarounds include Jinja template changes, fine tuning, or using a reasoning model designed with this inefficiency in mind.

It may also be a limitation that end users cannot really fix from the outside.

Key points

Reasoning models can repeat drafting and revision steps before producing the final answer.
That behavior can waste many tokens, especially for longer creative outputs.
Prompting alone did not seem to reduce the process in Gemma 4 or Qwen3.6.
Jinja template changes, fine tuning, or a different reasoning model might help.
The behavior may be hard for an end user to control if it is built into the model.

Quick term guide

reasoning models: AI models designed to spend more effort thinking through harder tasks.
reasoning model: An AI model that works through an internal thinking process before giving its final answer — and that thinking process also consumes tokens, which cost money.
reasoning: The ability of the AI to think through complex steps to find a solution.
prompting: Writing instructions or questions to an AI to get a response.
reasoning process: The steps an AI appears to use while working toward an answer.
workaround: An alternative way to get something done when the normal way doesn't work.
Jinja template: A reusable text template often used to control how prompts are built.
AI agents: AI agents are AI tools that can carry out steps toward a goal, not just answer once.

Read original ↗