A FinOps team's first LLM cost audit: 4 surprises they found
A company's FinOps team ran their first structured audit of LLM spending and uncovered 4 things they hadn't known before. It's a hands-on account of finding hidden waste and sharing what to fix.
Most teams adopt LLMs (AI language models like GPT or Claude) without closely tracking what they actually cost. This post shares the experience of a FinOps team — the group responsible for managing a company's technology spending — doing a thorough audit of all LLM usage for the first time. The audit surfaced where money was going, where waste was happening, and which areas could be optimized.
For anyone running AI agents or automated pipelines, this kind of audit is especially relevant. Small decisions like which model to use, how long your prompts are, and whether you cache repeated requests can add up to very different bills. Concrete, experience-based checklists from real teams are rare, which makes this one practical for anyone trying to bring LLM costs under control.
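To make that concrete, here is a rough back-of-the-envelope sketch of how model choice, prompt length, and caching feed into a monthly bill. Everything in it is an assumption for illustration: the model names, the per-million-token prices, and the `monthly_cost` helper are hypothetical, and cache hits are treated as free, which real providers may not do. It is not taken from the team's audit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical prices in USD per million tokens (placeholders, not real rates)."""
    input_per_mtok: float
    output_per_mtok: float


# Illustrative placeholder prices; substitute your provider's actual rate card.
PRICES = {
    "large-model": ModelPrice(input_per_mtok=10.0, output_per_mtok=30.0),
    "small-model": ModelPrice(input_per_mtok=0.5, output_per_mtok=1.5),
}


def monthly_cost(model, requests, prompt_tokens, output_tokens, cache_hit_rate=0.0):
    """Estimate monthly spend in USD.

    Assumes every request has the same token counts and that a cache hit
    costs nothing (a simplification).
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    price = PRICES[model]
    # Only requests that miss the cache are billed.
    billable_requests = requests * (1.0 - cache_hit_rate)
    # Price of one request, expressed in USD per million tokens times tokens.
    weighted_tokens = (
        prompt_tokens * price.input_per_mtok
        + output_tokens * price.output_per_mtok
    )
    return billable_requests * weighted_tokens / 1_000_000


if __name__ == "__main__":
    baseline = monthly_cost("large-model", 1_000_000, 2_000, 500)
    shorter_prompt = monthly_cost("large-model", 1_000_000, 1_000, 500)
    with_cache = monthly_cost("large-model", 1_000_000, 2_000, 500, cache_hit_rate=0.5)
    smaller_model = monthly_cost("small-model", 1_000_000, 2_000, 500)
    for label, cost in [
        ("baseline", baseline),
        ("shorter prompt", shorter_prompt),
        ("50% cache hits", with_cache),
        ("smaller model", smaller_model),
    ]:
        print(f"{label}: ${cost:,.2f}")
```

Running the same workload through a few scenarios like this is a cheap way to see which lever matters most before changing anything in production.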
Key points
- A FinOps team shares their first systematic audit of LLM costs
- Found 4 previously unknown sources of waste or inefficiency
- LLM costs depend on model choice, prompt length, caching, and more
- Directly useful as an audit framework for teams running AI pipelines
- Real-world practitioner case from the r/mlops community
Quick term guide
- surface
- Here it is a verb meaning to bring something hidden into view, as in the audit surfacing where money was going.
- AI agents
- AI programs that can inspect information, decide what to do next, and carry out steps toward a goal rather than answering just once.
- pipeline
- An automated sequence of steps that processes or moves data without manual intervention.
- prompts
- Instructions you give to an AI tool.
- sources
- Here it means the places where waste or inefficiency originates, not citations for a piece of information.
- caching
- Saving an AI's response so you can reuse it later without sending the same request again.
- framework
- Here it means a structured, repeatable approach or checklist for running an audit, not a software toolkit.
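The caching idea from the term guide can be sketched in a few lines. This is a minimal in-memory illustration, assuming exact-match prompts and a hypothetical `call_model(model, prompt)` function that you supply; the `ResponseCache` class is invented for this example and is not any provider's API. A production setup would also need expiry and persistent storage.

```python
import hashlib
import json


class ResponseCache:
    """Hypothetical exact-match cache wrapped around a model-calling function."""

    def __init__(self, call_model):
        # call_model is an assumed callable: (model, prompt) -> response text.
        self._call_model = call_model
        self._store = {}
        self.hits = 0
        self.misses = 0

    def _key(self, model, prompt):
        # Hash the model name and prompt together so identical requests
        # map to the same key.
        payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def complete(self, model, prompt):
        key = self._key(model, prompt)
        if key in self._store:
            # Repeated request: reuse the saved response, no new call is made.
            self.hits += 1
            return self._store[key]
        self.misses += 1
        response = self._call_model(model, prompt)
        self._store[key] = response
        return response
```

Tracking `hits` and `misses` gives an audit a simple number to report: the share of requests that never needed to reach the model.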