Xiaomi claims 1,000+ tokens per second on an 8-GPU server
According to a Reddit post, Xiaomi MiMo has announced MiMo-V2.5-Pro UltraSpeed. The post says Xiaomi claims it exceeded 1,000 output tokens per second with a 1-trillion-parameter mixture-of-experts (MoE) model, running on a standard 8-GPU server rather than on specialized hardware such as Cerebras or Groq. Some commenters questioned which GPUs were used and what the real cost would be.
Key points
- The post says Xiaomi MiMo announced MiMo-V2.5-Pro UltraSpeed.
- Xiaomi reportedly claims more than 1,000 output tokens per second.
- The claim is for a 1-trillion-parameter MoE model.
- The post says the run used a standard 8-GPU server, not custom AI hardware.
- Comments raised open questions about the exact GPUs and real-world cost.
Quick term guide
- output tokens
- Small pieces of text an AI model produces in its answer; they are also used for pricing.
- tokens per second
- A measurement of how many pieces of text an AI can generate in one second.
- 8-GPU server
- A computer with eight graphics chips used for heavy AI work.
- AI agents
- AI tools that can carry out steps toward a goal, rather than just answering once.
- agent workflows
- Step-by-step work patterns in which an AI agent automatically completes a series of tasks in order.
- workflows
- The specific order of steps taken to finish a piece of work.
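To make the "tokens per second" figure concrete, here is a minimal Python sketch of the arithmetic. The function names and the 2,000-token answer length are hypothetical, chosen only for illustration; the 1,000 tokens per second rate is the figure claimed in the post, not a measured result.

```python
def tokens_per_second(output_tokens: int, elapsed_seconds: float) -> float:
    """Throughput: output tokens generated divided by elapsed time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return output_tokens / elapsed_seconds


def seconds_to_generate(output_tokens: int, rate: float) -> float:
    """Time needed to produce a given number of tokens at a steady rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return output_tokens / rate


# Rate claimed in the post (not independently verified).
claimed_rate = 1000.0
# Hypothetical answer length, for illustration only.
answer_length = 2000

# At the claimed rate, a 2,000-token answer would take about 2 seconds.
print(seconds_to_generate(answer_length, claimed_rate))
```

This assumes a steady generation rate for a single response; real-world speed can vary with hardware, load, and how many requests are served at once, which is part of what the commenters asked about.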