New CLI tool forces coding agents to prove they're actually done

AI coding agents often claim a task is finished without running tests or checking for errors. A developer built a CLI tool that blocks agents from declaring 'done' unless they show real proof that the code works.

Anyone who uses AI coding agents like Claude or Cursor has likely seen this: the agent writes some code and confidently says 'all done,' but it never actually ran the code or verified that it works. This new CLI tool sits between the agent and your workflow and requires the agent to provide concrete evidence, such as test results or successful execution output, before it can mark a task complete.
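The article does not describe the tool's internals or name its commands. Purely as an illustration of the gating idea, here is a minimal Python sketch that assumes the "proof" is the exit code and output of a verification command such as a test suite. `Evidence`, `collect_evidence`, and `accept_done` are hypothetical names, not the tool's actual API.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Evidence:
    """Hypothetical proof attached to a 'done' claim."""
    command: List[str]  # the verification command that was actually run
    exit_code: int      # its process exit status
    output: str         # combined stdout and stderr


def collect_evidence(command: List[str]) -> Evidence:
    """Run the verification command (e.g. a test suite) and record what happened."""
    result = subprocess.run(command, capture_output=True, text=True)
    return Evidence(command, result.returncode, result.stdout + result.stderr)


def accept_done(evidence: Optional[Evidence]) -> bool:
    """Hypothetical gate: accept 'done' only if a command ran and exited with 0."""
    if evidence is None:
        return False  # a bare claim with no proof attached is rejected
    return evidence.exit_code == 0


if __name__ == "__main__":
    # Stand-ins for a real test run: one command succeeds, one fails.
    passing = collect_evidence([sys.executable, "-c", "print('all tests passed')"])
    failing = collect_evidence([sys.executable, "-c", "import sys; sys.exit(1)"])
    print(accept_done(None))     # False
    print(accept_done(passing))  # True
    print(accept_done(failing))  # False
```

Treating a zero exit code as the pass signal is a simplifying assumption; a real gate might also check that specific tests ran or inspect the captured output.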

For solo developers and makers who rely heavily on AI tools every day, this is a practical quality-control layer. It targets a form of the 'hallucination' problem in which an AI confidently reports success on work it never verified, and it forces a real validation step into the process.

Key points

  • Blocks AI coding agents from claiming 'done' without showing proof
  • Addresses a common problem with Claude, Cursor, and similar tools
  • Agents must submit test results or execution output before a task counts as finished
  • Adds a practical verification step to AI-assisted coding workflows

Quick term guide

AI coding agent
An AI tool that can write, edit, and run code from a person's instructions, often carrying out multi-step tasks on its own.
Solo developer
An individual who handles all parts of creating a project or product alone.
developers
Developers are people who build software, apps, or websites.
hallucination
When AI makes something up and presents it as a real answer.
validation
Checking that code actually works as intended, for example by running tests or executing it, before calling a task finished.