Open Source · Importance: Medium

Tracking LLM cost per user in production is still hard

r/LLMDevs · Jun 13, 2026 · 2h ago

Teams running large language models in production need practical ways to handle LLM observability and cost tracking. The main questions are which tools teams use, what still breaks in real use, and how costs can be assigned to each request or user as traffic grows.

Total spending is not enough when usage scales, because teams need to know which features, workflows, or users are driving the bill. The post is framed as early research into real production pain points, not a product pitch.
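As a rough illustration of what per-request and per-user attribution could look like, here is a minimal Python sketch. It assumes each model call is logged with a user ID, a feature name, and token counts. The `LLMCall` record, the `cost_by` helper, the model name `example-model`, and the prices are all hypothetical and do not come from the post or from any specific tool.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical USD prices per 1M tokens; real prices vary by provider and model.
PRICES_PER_MILLION = {
    "example-model": {"input": 3.00, "output": 15.00},
}


@dataclass(frozen=True)
class LLMCall:
    """One logged model call, tagged with who made it and for which feature."""
    user_id: str
    feature: str
    model: str
    input_tokens: int
    output_tokens: int


def call_cost(call: LLMCall) -> float:
    """Cost of a single call in USD, computed from its token counts."""
    price = PRICES_PER_MILLION[call.model]
    return (
        call.input_tokens * price["input"]
        + call.output_tokens * price["output"]
    ) / 1_000_000


def cost_by(calls: list[LLMCall], key: str) -> dict[str, float]:
    """Sum call costs grouped by one field, such as 'user_id' or 'feature'."""
    totals: dict[str, float] = defaultdict(float)
    for call in calls:
        totals[getattr(call, key)] += call_cost(call)
    return dict(totals)


calls = [
    LLMCall("u1", "search", "example-model", 1_000_000, 0),
    LLMCall("u2", "chat", "example-model", 0, 1_000_000),
    LLMCall("u1", "chat", "example-model", 500_000, 0),
]
per_user = cost_by(calls, "user_id")
per_feature = cost_by(calls, "feature")
```

The same log answers both questions, who is driving the bill and which feature is, because every call carries its tags from the moment it is recorded. Attribution that is reconstructed later from a monthly invoice cannot recover that breakdown.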

Key points

Production LLM teams are looking for better observability and cost tracking.
The hard part is assigning cost to each request or user as traffic grows.
Total monthly spend does not show which workflows are expensive.
AI agent teams need this data before they can cut token use or redesign costly flows.

Quick term guide

large language model (LLM): The type of AI behind ChatGPT or Claude, trained on huge amounts of text to read, write, and code. It answers by generating likely next words.
LLM observability: The ability to see how model calls behave, fail, and spend money in a live service.
observability: The ability to monitor and understand what's happening inside a running system by looking at its outputs and logs.
pain points: Specific problems or frustrations that customers are willing to pay money to solve.
model calls: Requests sent to an AI model to get an answer or action.
agent teams: Groups of AI agents set up with different roles to work on tasks.
