A FinOps team's first LLM cost audit: 4 surprises they found

A company's FinOps team ran its first structured audit of LLM spending and uncovered 4 things it had not known before. The post is a hands-on account of finding hidden waste and working out what to fix.

Most teams adopt LLMs (AI language models like GPT or Claude) without closely tracking what they actually cost. This post shares the experience of a FinOps team — the group responsible for managing a company's technology spending — doing a thorough audit of all LLM usage for the first time. The audit surfaced where money was going, where waste was happening, and which areas could be optimized.

For anyone running AI agents or automated pipelines, this kind of audit is especially relevant. Small decisions, such as which model to use, how long your prompts are, and whether you cache repeated requests, can add up to very different bills. Concrete, experience-based checklists from real teams are rare, which makes this account practical for anyone trying to bring LLM costs under control.
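To make the arithmetic behind "very different bills" concrete, here is a minimal sketch of a monthly cost estimate. The model names, per-token prices, and the assumption that a cached response costs nothing are all hypothetical placeholders, not figures from the original post or from any vendor's price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical prices in USD per million tokens (not real vendor rates)."""
    input_per_mtok: float
    output_per_mtok: float


# Placeholder price table; substitute your provider's actual rates.
PRICES = {
    "large-model": ModelPrice(input_per_mtok=10.0, output_per_mtok=30.0),
    "small-model": ModelPrice(input_per_mtok=0.5, output_per_mtok=1.5),
}


def monthly_cost(model, requests, input_tokens, output_tokens, cache_hit_rate=0.0):
    """Estimate monthly spend in USD for one workload.

    Assumes every request has the same token counts and that a cache hit
    is served by the application at zero model cost.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    price = PRICES[model]
    # Only requests that miss the cache reach the model and get billed.
    billable_requests = requests * (1.0 - cache_hit_rate)
    cost_per_request = (
        input_tokens * price.input_per_mtok
        + output_tokens * price.output_per_mtok
    ) / 1_000_000
    return billable_requests * cost_per_request


if __name__ == "__main__":
    for model in PRICES:
        print(model, round(monthly_cost(model, 1_000_000, 2000, 500), 2))
    print("large-model with 40% cache hits",
          round(monthly_cost("large-model", 1_000_000, 2000, 500, cache_hit_rate=0.4), 2))
```

Under these made-up prices, the same one million requests cost roughly twenty times more on the larger model, and a 40% cache hit rate removes 40% of the bill. Real pricing is usually more nuanced, so treat the function as a starting point for a spreadsheet-style estimate.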

Key points

  • A FinOps team shares their first systematic audit of LLM costs
  • Found 4 previously unknown sources of waste or inefficiency
  • LLM costs depend on model choice, prompt length, caching, and more
  • Directly useful as an audit framework for teams running AI pipelines
  • Real-world practitioner case from the r/mlops community
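As one possible first step of such an audit, the sketch below totals spend per model and counts calls that repeat an identical prompt, which are candidates for caching. The record fields (`model`, `prompt`, `cost_usd`) are assumed for illustration; the original post does not describe its log format or tooling.

```python
from collections import Counter, defaultdict


def summarize_usage(records):
    """Summarize LLM usage logs.

    `records` is an iterable of dicts with the hypothetical keys
    'model', 'prompt', and 'cost_usd'.
    Returns (spend_by_model, repeated_calls).
    """
    spend_by_model = defaultdict(float)
    prompt_counts = Counter()
    for record in records:
        spend_by_model[record["model"]] += record["cost_usd"]
        prompt_counts[(record["model"], record["prompt"])] += 1
    # Every call after the first with the same (model, prompt) pair
    # could in principle have been answered from a cache.
    repeated_calls = sum(n - 1 for n in prompt_counts.values() if n > 1)
    return dict(spend_by_model), repeated_calls
```

Sorting `spend_by_model` by value shows where the money goes, and a high `repeated_calls` count suggests that a response cache is worth evaluating.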

Quick term guide

surface
Used here as a verb: to bring something previously hidden into view, as when the audit surfaced where the money was going.
AI agents
AI programs that can inspect information and carry out several steps toward a goal, not just answer once.
pipeline
An automated sequence of steps that processes or moves data without manual intervention.
prompts
Instructions you give to an AI tool.
sources
Here it means the origins or causes of something, such as the places where wasted spending comes from.
caching
Saving an AI's response so you can reuse it later without sending the same request again.
framework
Here it means a structured approach or checklist for carrying out a task, such as an audit.