Open Source · Importance: High

Google releases DiffusionGemma: open model with up to 4x faster text output

r/LocalLLaMA · Jun 11, 2026

Google DeepMind released DiffusionGemma, an experimental open model that generates text using diffusion, the technique behind image-generation AI, instead of the usual token-by-token approach. On dedicated GPUs it is claimed to produce output up to 4x faster, which could meaningfully cut the cost of running AI agents.

Most LLMs write text one token at a time, left to right (autoregressive generation). DiffusionGemma takes a different route: it starts with a rough draft across all positions simultaneously and iteratively refines it. This is the same diffusion process that powers image generators like Stable Diffusion, now applied to text. Because many positions are updated in each pass, generation can be much faster on hardware built for parallel work, such as dedicated GPUs.
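To make the contrast concrete, here is a toy sketch in Python. It is not DiffusionGemma's algorithm or API: `toy_model`, `TARGET`, and the "commit the most confident masked positions" schedule are hypothetical stand-ins, and the draft is modeled as a fully masked sequence, which is only one way text diffusion can work. The sketch shows just one thing: a token-by-token decoder needs one model call per token, while a parallel refiner needs one call per refinement step.

```python
import math

MASK = "<mask>"
TARGET = "diffusion models refine every position of the draft in parallel".split()

def toy_model(draft):
    """Hypothetical stand-in for a network: one call scores every position.

    Returns a (token, confidence) pair per position. A real model would
    predict these from the draft; this toy ignores it and looks them up.
    """
    return [(tok, 1.0 / (1 + i % 3)) for i, tok in enumerate(TARGET)]

def autoregressive_decode():
    """Commit exactly one new token per model call, left to right."""
    draft, calls = [], 0
    while len(draft) < len(TARGET):
        preds = toy_model(draft)
        calls += 1
        draft.append(preds[len(draft)][0])
    return draft, calls

def parallel_refine_decode(steps=3):
    """Start fully masked; each call commits several positions at once."""
    draft, calls = [MASK] * len(TARGET), 0
    per_step = math.ceil(len(TARGET) / steps)
    while MASK in draft:
        preds = toy_model(draft)
        calls += 1
        masked = [i for i, tok in enumerate(draft) if tok == MASK]
        # Fill in the most confident masked positions from this single call.
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:per_step]:
            draft[i] = preds[i][0]
    return draft, calls

ar_text, ar_calls = autoregressive_decode()
pr_text, pr_calls = parallel_refine_decode(steps=3)
print(ar_calls, pr_calls)  # 10 3
```

Fewer calls does not translate one-to-one into wall-clock speedup, since each refinement pass processes the whole sequence; the real gain depends on the model and the hardware.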

For AI agent workflows, where a single task might trigger dozens of LLM calls, faster inference directly lowers cost per run. Because DiffusionGemma is fully open, it can be self-hosted with no per-token API fees. Google also published a developer guide alongside the release, giving a concrete integration path. The model is labeled experimental, so production use should come after quality and reliability testing.
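The cost argument is simple arithmetic, sketched below for a self-hosted GPU. Every number here (calls per run, tokens per call, throughput, hourly GPU price) is a made-up placeholder rather than a measurement, and `cost_per_run` is a hypothetical helper that counts generation time only.

```python
def cost_per_run(calls, tokens_per_call, tokens_per_sec, gpu_dollars_per_hour):
    """Compute cost of one agent run on a rented GPU, generation time only."""
    seconds = calls * tokens_per_call / tokens_per_sec
    return seconds / 3600 * gpu_dollars_per_hour

# All values are hypothetical placeholders, not benchmarks.
baseline = cost_per_run(calls=40, tokens_per_call=500,
                        tokens_per_sec=50, gpu_dollars_per_hour=2.0)
faster = cost_per_run(calls=40, tokens_per_call=500,
                      tokens_per_sec=200, gpu_dollars_per_hour=2.0)

print(f"baseline ${baseline:.4f} per run, at 4x throughput ${faster:.4f} per run")
```

Under these assumptions a 4x throughput gain cuts generation cost per run by the same factor. Prompt processing, idle GPU time, and tool calls are left out, so real savings would likely be smaller.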

Key points

Generates text via diffusion instead of token-by-token — multiple positions refined in parallel
Up to 4x faster on dedicated GPUs, meaning more agent calls per dollar of compute
Fully open model: self-host to eliminate per-call API costs
Official developer guide published at launch — covers how to integrate it
Still experimental; benchmark quality against your use case before deploying
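Benchmarking against your own use case can start very small. The harness below is a minimal sketch, not part of any official guide: `benchmark` is a hypothetical helper, `fake_generate` is a placeholder to swap for a call into your own serving stack, and exact-match scoring is a deliberately crude quality metric.

```python
import time

def benchmark(generate, cases):
    """Run `generate` on (prompt, expected) pairs; report accuracy and latency."""
    correct, elapsed = 0, 0.0
    for prompt, expected in cases:
        start = time.perf_counter()
        output = generate(prompt)
        elapsed += time.perf_counter() - start
        correct += int(output.strip() == expected)
    return {"accuracy": correct / len(cases),
            "avg_seconds": elapsed / len(cases)}

def fake_generate(prompt):
    """Hypothetical stand-in for a model call; replace with your own client."""
    return prompt.upper()

# Tiny illustrative test set; the last case is meant to fail.
cases = [("ok", "OK"), ("done", "DONE"), ("fail", "nope")]
report = benchmark(fake_generate, cases)
print(report)
```

Running the same cases through the current model and the candidate gives a like-for-like view of both quality and speed before anything reaches production.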

Quick term guide

diffusion: A generative method that starts from noise or a rough draft and refines it over several steps; best known from AI image generation.
autoregressive generation: The standard method most LLMs use: predict and output one token at a time, left to right, before moving to the next.
Stable Diffusion: An image-generation model that creates images from text descriptions using diffusion.
AI agent workflow: A series of automated steps in which AI tools work in sequence to complete a task without manual input at each step.
self-hosted: Run on your own server instead of managed by another company.
production: The live version of a service that real users use.
reliability: How consistently a tool works without failing or behaving unexpectedly.
