Two copies of the same AI model won't write the same code changes

An experiment shows that running the same AI model twice on the same coding task produces different code changes each time. This happens because AI models pick words using probability, not fixed rules. Anyone building multi-agent systems needs to account for this non-determinism.

AI language models don't work like calculators: the same input does not guarantee the same output. At each step the model samples its next token from a probability distribution, so two runs can diverge after a few tokens and end up slightly or significantly different. This post demonstrates that two instances of the same model, given identical instructions, will produce different diffs (records of which lines were added or removed in code).
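
To make the sampling idea concrete, here is a minimal toy sketch in Python. It is not how any particular model or API is implemented: the `logits` scores, the three candidate tokens, and the `sample_token` and `generate` helpers are all hypothetical, and treating a temperature of 0 as "always pick the top token" is an assumption of this sketch.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick one token by sampling from softmax(logits / temperature)."""
    if temperature <= 0:
        # Assumption of this sketch: temperature 0 means greedy decoding.
        return max(logits, key=logits.get)
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    tokens = list(weights)
    return rng.choices(tokens, weights=[weights[t] for t in tokens], k=1)[0]

# Hypothetical scores a model might assign to three candidate next tokens.
logits = {"for": 2.0, "while": 1.6, "map": 1.2}

def generate(temperature, seed, steps=20):
    """Simulate one 'run' of the model: same input, its own random stream."""
    rng = random.Random(seed)
    return [sample_token(logits, temperature, rng) for _ in range(steps)]

# Same scores, same temperature, different random streams.
print(generate(1.0, seed=1))
print(generate(1.0, seed=2))

# With greedy decoding, every run in this toy picks the top-scoring token.
print(generate(0, seed=1) == generate(0, seed=2))
```

In this toy, lowering the temperature concentrates probability on the top-scoring token, which is why runs become more alike. Real deployments can still differ at low temperature, so the sketch should not be read as a guarantee of identical outputs.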

For developers building AI agent pipelines, this has practical consequences. Running the same model several times to 'vote' on the best answer, or to verify a result, is a valid strategy, but you cannot assume the outputs will converge on their own. If your system needs consistent results, lower the temperature setting to reduce randomness, and add an explicit comparison or validation step between agent outputs rather than hoping they agree.
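
One way to picture an explicit comparison step is the Python sketch below. It uses only the standard library; the `normalize`, `compare_outputs`, and `majority_vote` helpers, the `min_share` threshold, and the sample outputs are hypothetical and only illustrate the idea, not a method described in the original experiment.

```python
import difflib
from collections import Counter

def normalize(code: str) -> str:
    """Drop trailing whitespace and blank lines so cosmetic noise is ignored."""
    lines = [line.rstrip() for line in code.strip().splitlines()]
    return "\n".join(line for line in lines if line)

def compare_outputs(a: str, b: str) -> list[str]:
    """Return a unified diff of two agent outputs; an empty list means they agree."""
    return list(difflib.unified_diff(
        normalize(a).splitlines(),
        normalize(b).splitlines(),
        fromfile="agent_a",
        tofile="agent_b",
        lineterm="",
    ))

def majority_vote(outputs: list[str], min_share: float = 0.5):
    """Return the most common normalized output, or None without a clear majority."""
    counts = Counter(normalize(o) for o in outputs)
    winner, votes = counts.most_common(1)[0]
    return winner if votes / len(outputs) > min_share else None

# Hypothetical outputs from three runs of the same prompt.
outputs = [
    "def add(a, b):\n    return a + b\n",
    "def add(a, b):\n    return a + b   \n\n",
    "def add(x, y):\n    return x + y\n",
]

print(compare_outputs(outputs[0], outputs[1]))             # whitespace-only difference
print("\n".join(compare_outputs(outputs[0], outputs[2])))  # a real disagreement
print(majority_vote(outputs))
```

Textual agreement is a weak signal: two runs can agree and both be wrong, or differ and both be correct. Running each candidate against the same test suite is usually a stronger validation step than comparing text alone.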

Key points

  • Two runs of the same model on the same task will produce different code changes (diffs)
  • AI generates text probabilistically, so identical inputs don't guarantee identical outputs
  • Multi-agent systems that assume consistent outputs between instances can produce silent bugs
  • Lowering the temperature setting makes outputs more consistent, though not perfectly identical
  • Always include an explicit verification step when comparing results across multiple agent runs

Quick term guide

AI models
The core brain or underlying program that powers an artificial intelligence tool.
multi-agent system
A setup where several AI agents each take a specific role or subtask and work together to complete one larger task.
non-determinism
The property of a system that can give different results even when given the exact same input.
distribution
Short for probability distribution: the set of possible next outputs a model could produce, each with its own probability of being chosen.
developers
Developers are people who build software, apps, or websites.
temperature
A setting that controls how random or creative an AI's output is; lower values make the output more predictable.
validation
Checking that an output meets its requirements, for example by running tests or comparing it against another agent's result, before accepting it.