Tracking LLM cost per user in production is still hard

Teams running large language models (LLMs) in production need practical ways to handle observability and cost tracking. The main questions are which tools teams use today, what still breaks in real use, and how cost can be attributed to each request or user as traffic grows.

A total spend figure is not enough once usage scales, because teams need to know which features, workflows, or users are driving the bill. This is early research into real production pain points, not a product pitch.

Key points

  • Production LLM teams are looking for better observability and cost tracking.
  • The hard part is assigning cost to each request or user as traffic grows.
  • Total monthly spend does not show which workflows are expensive.
  • AI agent teams need this data before they can cut token use or redesign costly flows.
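
One way to picture per-request and per-user attribution is the minimal Python sketch below. It is an illustration under stated assumptions, not a method taken from the original post: the model names, the prices in `PRICES`, and the `CallRecord` fields are all hypothetical, and it assumes each model call reports input and output token counts that can be logged with a user ID and a feature name.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per 1 million tokens.
# These are placeholders for illustration, not real vendor pricing.
PRICES = {
    "small-model": {"input": 0.50, "output": 1.50},
    "large-model": {"input": 5.00, "output": 15.00},
}


@dataclass(frozen=True)
class CallRecord:
    """One logged model call, tagged with who and what triggered it."""
    user_id: str
    feature: str
    model: str
    input_tokens: int
    output_tokens: int


def call_cost(rec: CallRecord) -> float:
    """Cost of a single call in USD, from token counts and the price table."""
    price = PRICES[rec.model]
    return (
        rec.input_tokens * price["input"]
        + rec.output_tokens * price["output"]
    ) / 1_000_000


def cost_by(records: list[CallRecord], key: str) -> dict[str, float]:
    """Sum call costs grouped by a record field such as 'user_id' or 'feature'."""
    totals: dict[str, float] = defaultdict(float)
    for rec in records:
        totals[getattr(rec, key)] += call_cost(rec)
    return dict(totals)
```

Calling `cost_by(records, "user_id")` and `cost_by(records, "feature")` on the same log gives two views of one bill, which is the breakdown a single monthly total cannot provide.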

Quick term guide

large language models
AI models trained to read, write, and answer questions using text.
large language model
The type of AI behind ChatGPT or Claude — trained on huge amounts of text to read, write, and code.
language models
AI systems that read text and generate likely next words as answers.
LLM observability
The ability to see how model calls behave, fail, and spend money in a live service.
observability
The ability to monitor and understand what's happening inside a running system by looking at its outputs and logs.
pain points
Specific problems or frustrations that customers are willing to pay money to solve.
model calls
Requests sent to an AI model to get an answer or action.
agent teams
In this context, the engineering teams that build and operate AI agents, rather than groups of agents working together.