Open Source · Importance: Medium

Automated a Claude-GPT-Gemini debate loop — the hard part was making them actually disagree

r/AI_Agents · Jun 11, 2026 · 4h ago

A developer automated the process of feeding each AI model's answer to the others as a prompt for rebuttal. The automation itself was straightforward, but getting the models to genuinely push back — rather than just agree — turned out to be the real challenge.

The author had a habit of manually copy-pasting answers between Claude, GPT, and Gemini to create a kind of debate: each model would see what the others said and be asked to critique it. After doing this by hand many times, they wrote a script to automate the loop.
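The post does not include the script itself, so the following is only a minimal sketch of what such a loop might look like. The names `ModelFn`, `build_rebuttal_prompt`, and `debate` are hypothetical, and each model is represented by a plain function that takes a prompt and returns text, standing in for whatever API client the author actually used.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a real API client: prompt in, text out.
ModelFn = Callable[[str], str]


def build_rebuttal_prompt(question: str, own_name: str, answers: Dict[str, str]) -> str:
    """Show one model the other models' answers and ask for a critique."""
    others = "\n\n".join(
        f"[{name}]\n{text}" for name, text in answers.items() if name != own_name
    )
    # The wording here is illustrative, not the author's prompt.
    return (
        f"Question: {question}\n\n"
        f"Answers from the other models:\n{others}\n\n"
        "Critique these answers and point out anything you think is wrong."
    )


def debate(question: str, models: Dict[str, ModelFn], rounds: int = 1) -> Dict[str, str]:
    """Run an initial answer pass, then `rounds` passes of mutual rebuttal."""
    # Initial pass: every model answers the question independently.
    answers = {name: fn(question) for name, fn in models.items()}
    for _ in range(rounds):
        # Every rebuttal in a round is built from the previous round's
        # answers, so the order of the models does not matter.
        answers = {
            name: fn(build_rebuttal_prompt(question, name, answers))
            for name, fn in models.items()
        }
    return answers
```

To try it without any API keys, pass stub functions such as `lambda prompt: "stub answer"` for each model name; swapping in real clients would only require wrapping each one as a prompt-to-text function.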

The automation worked, but the models kept defaulting to agreement or shallow praise instead of real disagreement. Crafting prompts that reliably forced genuine rebuttal was far harder than expected. This multi-LLM debate approach can surface blind spots and errors that a single model misses, but calling three models per round multiplies token usage and API costs significantly.
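The cost point can be made concrete with a rough estimate. The sketch below assumes every answer is about the same length and ignores the instruction text in each rebuttal prompt; the function name and the token figures in the usage note are hypothetical, not numbers from the post.

```python
def estimate_debate_tokens(n_models: int, rounds: int,
                           question_tokens: int, answer_tokens: int) -> dict:
    """Rough token estimate for an initial pass plus `rounds` rebuttal passes."""
    # One call per model in the initial pass, then one per model per round.
    calls = n_models * (1 + rounds)
    # Initial pass: each model reads only the question.
    initial_input = n_models * question_tokens
    # Rebuttal pass: each model reads the question plus every other model's answer.
    per_rebuttal_input = question_tokens + (n_models - 1) * answer_tokens
    rebuttal_input = n_models * rounds * per_rebuttal_input
    return {
        "calls": calls,
        "input_tokens": initial_input + rebuttal_input,
        "output_tokens": calls * answer_tokens,
    }
```

With three models, one rebuttal round, a 100-token question, and 500-token answers, this gives 6 calls, 3,600 input tokens, and 3,000 output tokens, compared with 1 call, 100 input tokens, and 500 output tokens for a single model answering once.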

Key points

Three LLMs (Claude, GPT, Gemini) are each shown the others' answers and asked to rebut them
The manual copy-paste workflow was replaced with an automated script
Models tend to agree rather than disagree — the core unsolved design problem
Writing prompts that force real pushback is the key engineering challenge
Running three models per question increases token costs compared to a single-model approach

Quick term guide

AI model: A program that can understand prompts and produce text, code, or answers.
automation: A way to make repeated work happen without doing every step by hand.
critique: A detailed evaluation or review of someone's work to help them improve it.
prompts: Instructions you give to an AI tool.
API costs: Fees paid when software calls an online service programmatically.
workflow: A repeatable set of steps for getting a task done.
token costs: The fees or usage spent on the text an AI model reads and writes.
