Google releases Gemini Diffusion's diffusion-based model at a speed of 2000 token/seconds comparable to Gemini 2.0 Flash-Lite

In Google I/O 2025, Google Deepmind first made public the cutting-edge technology it was developing - Gemini Diffusion, a new way of applying proliferation models to language modelling.

What’s Gemini Diffusion?

This non-negative (non-causal) reasoning gives rise to two major advantages: ** Amazing rate of generation: available **2000 token/sec, including all computational processes such as tokenization, prefilling, filtering, etc. ** Stronger global reasoning**: not dependent on step-by-step production, capable of thinking in a holistic way.

Modular principles

The difference between traditional self-regression models

  • ** Self-regression model** (e.g. GPT, PaLM, etc.): Forecasts the next word (token) sequentially, one after the other.

  • Advantages: simple structure and extensive application.

  • Deficiencies: slow generation and limited global coherence.

** Gemini Diffusion model** (based on proliferation mechanisms):

The main advantage

  • Methods of work: After adding a “complete text expression” to noise damage, the training model is gradually “noise” and restored to a reasonable text. Strengths: ** Allows corporate generation**: a broad draft can be generated from the outset and then repeatedly improved;

  • Natural support for error correction and editing: since each step is essentially “adapted”;

  • According to the generation of complex structures: for example, mathematics, programming languages, the structure and semantic rigour are required.

** Natural mechanisms for correcting errors**

  • Errors in the generation of traditional models require external tool detection and rewriting.

  • The proliferation mechanism itself consists of a gradual process of amendment that makes the output more fluid and accurate.

Mathematics and code processing skills.

  • Gemini Diffusion is superior in terms of “verifiable structural content”.

  • For example, mathematical expressions, program codes that satisfy syntax correctness and logical consistency, and diffusion models are better at such “editorial generation”.

Case demonstration

1. The programming scene was excellent

Gemini Diffusion is particularly good at code generation, and Brendan calls it the “vibe countingexperience: it’s almost real-time high-quality code writing. Actual velocity 2000 tokens/sec, which includes:

  • Tokenize

  • Prefill (prefill)

  • Safe filter.

  • Output synthesis

Compared to self-regression models such as GPT-4/Claude, the speed increase is significant and suitable for high-frequency interactions or low-delayed scenarios (e.g. code editor, chat robots, etc.).

2. ** Complex math problems can be easily solved**

Traditional language models (e.g. GPS-4o) are prone to failure in the face of mathematical questions such as “answer first”, “cross-step logic”, because they are predictive and each step magnifies previous mistakes. Gemini Diffusion uses a non-causal structure that “** builds the logical framework for the full answer and fills the details”. Example Question 1: ** Title **((81)* (2/3))2 +(15-3)/(22)” Please give answers before the process. Answer **: 39 Gemini Diffusion correctly completed, GPT-4o failed to solve.

3. ** Non-linear, multistep reasoning**

Example Question 2: ** Title **: “How many prime numbers do you have between 150 and 250? “ ** Answer ** 18 prime numbers, list output [151, 157, …, 241, 251] Gemini Diffusion precisely completed and GPS-4o also failed.

##4. ** Non-repatriation structure = reasoning is no longer limited by the order** The non-regression characteristics of the proliferation mechanism allow the model to ** optimize the structure of the answer across the board**, address the tasks that require “know the end before write the beginning” and break the traditional token-by-token limit.

Applying perspectives and impacts

♪ ♪ ♪ ♪ apply the scene ♪

The future of technology

Official presentation: https://deepmind.google/models/gemini-diffusion/