Google releases DiffusionGemma: open model with up to 4x faster text output
Google DeepMind released DiffusionGemma, an experimental open model that generates text using diffusion instead of the usual word-by-word approach. Diffusion is the same technique behind image-generation AI. Google claims up to 4x faster output on dedicated GPUs, which could meaningfully cut the cost of running AI agents.
Most LLMs write text one token at a time from left to right (called autoregressive generation). DiffusionGemma takes a different route. It starts with a rough draft across all positions at once and refines it over several steps, applying the idea behind image generators like Stable Diffusion to text. Each step updates many positions in parallel, so the model needs fewer sequential passes than one-token-at-a-time decoding. That is why it can be substantially faster on hardware that runs those parallel updates efficiently.
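The following toy sketch in Python makes the difference concrete. It assumes a generic masked-diffusion decoding scheme: start fully masked, then commit the highest-confidence positions at each step. This is not DiffusionGemma's actual algorithm, which the release coverage does not describe. `make_toy_predictor`, `autoregressive_decode`, and `diffusion_decode` are hypothetical names, and the "model" is a stub that already knows the answer. The sketch counts sequential model calls: one per token for autoregressive decoding, versus one per refinement step for diffusion.

```python
import random
from typing import Callable, List, Optional, Tuple

MASK = None  # marks a position that has not been decided yet

# A "model" here is any function that looks at the current, partially masked
# sequence and returns a (token, confidence) guess for every position.
Predictor = Callable[[List[Optional[str]]], List[Tuple[str, float]]]


def make_toy_predictor(target: List[str], seed: int = 0) -> Predictor:
    """Hypothetical stub: always guesses the target token, with random confidence."""
    rng = random.Random(seed)

    def predict(seq: List[Optional[str]]) -> List[Tuple[str, float]]:
        # A real network would condition on `seq`; the toy ignores it.
        return [(tok, rng.random()) for tok in target]

    return predict


def autoregressive_decode(predict: Predictor, length: int) -> Tuple[List[Optional[str]], int]:
    """Left to right: each model call commits exactly one token."""
    seq: List[Optional[str]] = [MASK] * length
    calls = 0
    for i in range(length):
        guesses = predict(seq)
        calls += 1
        seq[i] = guesses[i][0]
    return seq, calls


def diffusion_decode(predict: Predictor, length: int, steps: int) -> Tuple[List[Optional[str]], int]:
    """Start fully masked; each call scores all positions and commits the most confident ones."""
    seq: List[Optional[str]] = [MASK] * length
    calls = 0
    for step in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok is MASK]
        if not masked:
            break
        guesses = predict(seq)
        calls += 1
        # Spread the remaining positions evenly over the remaining steps (ceiling division).
        k = -(-len(masked) // (steps - step))
        masked.sort(key=lambda i: guesses[i][1], reverse=True)
        for i in masked[:k]:
            seq[i] = guesses[i][0]
    return seq, calls


if __name__ == "__main__":
    target = "diffusion refines all positions of the draft together".split()
    predict = make_toy_predictor(target)
    print("autoregressive calls:", autoregressive_decode(predict, len(target))[1])
    print("diffusion calls:", diffusion_decode(predict, len(target), steps=2)[1])
```

Fewer sequential calls is only half the story. Each diffusion step scores the whole sequence, so the real speedup depends on two things: how many steps the model needs for acceptable quality, and whether the GPU can process all positions in parallel. This fits the "up to 4x" framing better than a fixed multiple would.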
For AI agent workflows, where a single task might trigger dozens of LLM calls, faster inference directly lowers cost per run. Because DiffusionGemma is fully open, it can be self-hosted with no per-token API fees; you pay only for the compute it runs on. Google also published a developer guide alongside the release, giving a concrete integration path. The model is labeled experimental, so test its quality and reliability on your own workload before using it in production.
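The cost argument comes down to simple arithmetic. This Python sketch assumes four things: self-hosted GPUs billed by the hour, agent calls that run one after another, generated output tokens that dominate run time, and a GPU busy only with this workload. Every number in it (40 calls, 500 tokens, 100 tokens/s, $2/hour) is a made-up placeholder, not a published DiffusionGemma figure. `AgentRunProfile` and the helper functions are hypothetical names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AgentRunProfile:
    # All fields are hypothetical inputs you would measure for your own workload.
    calls_per_run: int            # LLM calls triggered by one agent task
    output_tokens_per_call: int   # tokens generated per call
    tokens_per_second: float      # decode throughput of the self-hosted model
    gpu_cost_per_hour: float      # hourly GPU price in dollars


def gpu_seconds_per_run(p: AgentRunProfile) -> float:
    """Time spent generating output for one agent run, assuming calls run back to back."""
    total_tokens = p.calls_per_run * p.output_tokens_per_call
    return total_tokens / p.tokens_per_second


def cost_per_run(p: AgentRunProfile) -> float:
    """Dollar cost of that GPU time (no per-token API fees when self-hosting)."""
    return gpu_seconds_per_run(p) * p.gpu_cost_per_hour / 3600


if __name__ == "__main__":
    baseline = AgentRunProfile(calls_per_run=40, output_tokens_per_call=500,
                               tokens_per_second=100.0, gpu_cost_per_hour=2.0)
    # Same workload at 4x decode throughput, the best case the release claims.
    faster = replace(baseline, tokens_per_second=baseline.tokens_per_second * 4)
    for name, p in [("baseline", baseline), ("4x faster", faster)]:
        print(f"{name}: {gpu_seconds_per_run(p):.0f} GPU-s, ${cost_per_run(p):.4f} per run")
```

Under these assumptions, 4x the throughput means a quarter of the GPU time, so cost per run falls by the same factor. In practice, prompt processing, tool calls, and idle GPU time all shrink the gain, which is why the "up to" matters.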
Key points
- Generates text via diffusion instead of token-by-token — multiple positions refined in parallel
- Up to 4x faster on dedicated GPUs, meaning more agent calls per dollar of compute
- Fully open model: self-host to eliminate per-call API costs
- Official developer guide published at launch — covers how to integrate it
- Still experimental; benchmark quality against your use case before deploying
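Here is a minimal Python sketch of that benchmarking step. It runs the same prompts from your own workload through your current model and through DiffusionGemma, then compares quality and latency side by side. `evaluate`, the stub models, and the exact-match metric are all illustrative assumptions. How you actually call DiffusionGemma is covered by Google's developer guide, so here it is left as a plain function from prompt to text.

```python
import time
from statistics import mean
from typing import Callable, Dict, List, Tuple

Case = Tuple[str, str]           # (prompt, expected answer) from your own workload
Generate = Callable[[str], str]  # hypothetical wrapper around whichever model you serve


def evaluate(generate: Generate, cases: List[Case]) -> Dict[str, float]:
    """Score one model on your own cases: exact-match accuracy plus mean latency."""
    if not cases:
        raise ValueError("need at least one test case")
    correct = 0
    latencies: List[float] = []
    for prompt, expected in cases:
        start = time.perf_counter()
        answer = generate(prompt)
        latencies.append(time.perf_counter() - start)
        # Exact match is the crudest useful check; replace it with whatever
        # fits your task (unit tests, schema validation, a grading rubric).
        if answer.strip() == expected.strip():
            correct += 1
    return {"accuracy": correct / len(cases), "mean_latency_s": mean(latencies)}


if __name__ == "__main__":
    cases = [("2+2=", "4"), ("Capital of France?", "Paris")]

    def current_model(prompt: str) -> str:    # stub standing in for your existing model
        return {"2+2=": "4", "Capital of France?": "Paris"}[prompt]

    def candidate_model(prompt: str) -> str:  # stub standing in for DiffusionGemma
        return "4"

    for name, gen in [("current", current_model), ("candidate", candidate_model)]:
        print(name, evaluate(gen, cases))
```

Comparing both numbers on the same cases makes a trade-off visible: a faster model only saves money if its accuracy on your tasks stays acceptable.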
Quick term guide
- diffusion
- A generative method that starts from noise (for text, typically a fully masked sequence) and refines it over a series of steps; best known from AI image generation.
- autoregressive generation
- The standard method most LLMs use: predict and output one token at a time, left to right, before moving to the next.
- Stable Diffusion
- An open image-generation model that uses diffusion to create images from text descriptions.
- AI agent workflow
- A series of automated steps where AI tools work in sequence to complete a task without manual input each time.
- self-hosted
- Run on your own servers rather than as a service managed by another company.
- production
- The live version of a service that real users rely on.
- reliability
- How consistently a tool works without failing or behaving unexpectedly.