Xiaomi claims 1,000+ tokens per second on an 8-GPU server

The Reddit post says Xiaomi MiMo announced MiMo-V2.5-Pro UltraSpeed. According to the post, Xiaomi claims the model exceeds 1,000 output tokens per second, and the model is a 1-trillion-parameter mixture-of-experts (MoE) model. The post says this speed was reached on a standard 8-GPU server rather than on specialized hardware from companies such as Cerebras or Groq. Some commenters asked which GPUs were used and what running the model would actually cost.
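For a rough sense of scale behind the 8-GPU claim, the sketch below splits the weights of a 1-trillion-parameter model evenly across eight GPUs. The precision (bytes per parameter) is an assumption, since the post does not state it, so three common widths are shown. The arithmetic covers weights only and ignores the KV cache, activations, and framework overhead. `weight_gb_per_gpu` and `BYTES_PER_PARAM` are hypothetical names used for illustration.

```python
# Rough weight-memory arithmetic for a 1-trillion-parameter model on 8 GPUs.
# Assumption: the post gives no precision, so several byte widths are shown.
# Weights only: KV cache, activations, and runtime overhead are ignored.

TOTAL_PARAMS = 1_000_000_000_000  # "1 trillion parameter" per the post
NUM_GPUS = 8                      # "standard 8-GPU server" per the post

# Hypothetical precisions in bytes per parameter (not stated in the post).
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_gb_per_gpu(total_params: int, bytes_per_param: float, num_gpus: int) -> float:
    """Weight memory per GPU in GB (1 GB = 1e9 bytes), assuming an even split."""
    return total_params * bytes_per_param / num_gpus / 1e9


if __name__ == "__main__":
    for name, width in BYTES_PER_PARAM.items():
        per_gpu = weight_gb_per_gpu(TOTAL_PARAMS, width, NUM_GPUS)
        print(f"{name}: {per_gpu:.1f} GB of weights per GPU")
```

Under these assumptions the weights alone come to about 250 GB, 125 GB, or 62.5 GB per GPU. In an MoE model only some experts run for each token, but all expert weights still normally have to be held in memory, which is why the exact GPUs and precision matter when judging the claim.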

Key points

  • The post says Xiaomi MiMo announced MiMo-V2.5-Pro UltraSpeed.
  • Xiaomi claims the model exceeds 1,000 output tokens per second.
  • The claim is for a 1-trillion-parameter MoE model.
  • The post says the run used a standard 8-GPU server, not custom AI hardware.
  • Comments raised open questions about the exact GPUs and real-world cost.

Quick term guide

output tokens
The small pieces of text (words or parts of words) that an AI model produces in its answer. Providers often charge by the number of output tokens.
tokens per second
A measurement of how many pieces of text an AI can generate in one second.
8-GPU server
A computer with eight graphics chips used for heavy AI work.
AI agents
AI tools that can carry out a series of steps toward a goal instead of just answering once.
agent workflows
Step-by-step processes in which an AI agent automatically carries out a series of tasks in order.
workflows
The specific order of steps taken to finish a piece of work.
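The tokens-per-second figure in the term guide is a simple ratio: output tokens generated divided by the time taken. The minimal sketch below shows that calculation and converts it to an average time per token. The token count and timing are hypothetical placeholders, not measurements from Xiaomi's announcement, and the function names are illustrative.

```python
# A minimal sketch of how an output-tokens-per-second figure is computed.
# Assumption: the numbers used below are hypothetical placeholders,
# not measurements from Xiaomi's announcement.


def tokens_per_second(output_tokens: int, elapsed_seconds: float) -> float:
    """Throughput: output tokens generated divided by wall-clock seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return output_tokens / elapsed_seconds


def ms_per_token(tokens_per_sec: float) -> float:
    """Average time between consecutive output tokens, in milliseconds."""
    return 1000.0 / tokens_per_sec


if __name__ == "__main__":
    # Hypothetical run: 2,000 output tokens produced in 2 seconds.
    rate = tokens_per_second(2_000, 2.0)
    print(f"{rate:.0f} tokens/s, about {ms_per_token(rate):.1f} ms per token")
```

At 1,000 tokens per second, each token arrives about every millisecond on average. A figure like this can be measured for a single request stream or summed across many concurrent requests, and the post as summarized does not say which applies to Xiaomi's number.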