Open Source · Importance: High

Xiaomi claims 1,000+ tokens per second on an 8-GPU server

r/LocalLLaMA · Jun 9, 2026

According to a Reddit post, Xiaomi MiMo announced MiMo-V2.5-Pro UltraSpeed. The post says Xiaomi claims to have exceeded 1,000 output tokens per second on a 1-trillion-parameter mixture-of-experts (MoE) model. The run reportedly used a standard 8-GPU server rather than specialized hardware such as Cerebras or Groq. Some commenters questioned which GPUs were used and what the real cost would be.

Key points

The post says Xiaomi MiMo announced MiMo-V2.5-Pro UltraSpeed.
The post says Xiaomi claims to have exceeded 1,000 output tokens per second.
The claim is for a 1-trillion-parameter MoE model.
The post says the run used a standard 8-GPU server, not custom AI hardware.
Comments raised open questions about the exact GPUs and real-world cost.

Quick term guide

output token: A small piece of text that an AI model produces in its answer. Output tokens are often the unit used for pricing.
tokens per second: A measurement of how many tokens an AI model can generate in one second.
8-GPU server: A computer with eight graphics chips used for heavy AI work.
AI agents: AI tools that can carry out steps toward a goal, not just answer once.
agent workflow: A set of steps an AI agent follows automatically to complete a task or a series of tasks in order.
workflow: The specific order of steps taken to finish a piece of work.
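
To make the "tokens per second" term concrete, here is a minimal Python sketch of the arithmetic. Only the 1,000 tokens-per-second figure comes from the post. The function names and the 2,000-token answer length are hypothetical, chosen for illustration. This summary does not say how Xiaomi measured its figure (for example, for a single request or summed across many requests), so the sketch only shows what the unit means.

```python
def tokens_per_second(output_tokens: int, elapsed_seconds: float) -> float:
    """Throughput: output tokens produced divided by the time taken."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return output_tokens / elapsed_seconds


def seconds_to_generate(output_tokens: int, rate: float) -> float:
    """Time needed to produce a given number of tokens at a fixed rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return output_tokens / rate


# Figure claimed in the post; not independently verified here.
CLAIMED_RATE = 1000.0

# Hypothetical answer length, for illustration only.
ANSWER_TOKENS = 2000

if __name__ == "__main__":
    print(seconds_to_generate(ANSWER_TOKENS, CLAIMED_RATE))  # seconds per answer
    print(1000.0 / CLAIMED_RATE)  # average milliseconds per token
```

At the claimed rate, each token takes about one millisecond on average, so a 2,000-token answer would take roughly two seconds if that rate applied to a single request.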
