llmbuffer aims to cut AI agent chat costs with better caching

llmbuffer aims to cut AI agent chat costs with better caching

llmbuffer is a Python library for managing LLM conversation history so more of it can use cache. Its creator says it keeps long-lived conversation history stable while still allowing dynamic context to change. The README shows a simulated 15-turn benchmark where input cost was about 43% lower than a common simple approach.

Key points

  • It is a Python 3.9+ library for LLM conversation history management.
  • It keeps the fixed system prompt and older conversation history at the front so cache can be reused.
  • It places changing items, such as RAG results, timestamps, and tool calls, near the end.
  • The README includes examples for OpenAI and Anthropic, plus provider adapters for other setups.
  • It supports compaction hooks to trim or summarize long history.

Quick term guide

benchmark
A test used to compare speed, quality, or cost.
open-source
Software whose code is shared publicly so others can inspect, use, or change it.
model provider
The external AI service (such as OpenAI or Anthropic) that Cursor connects to in order to generate code suggestions.
system prompt
A hidden set of basic instructions that guides how an AI tool behaves.
timestamps
Dates and times attached to messages or events.
tool calls
Times when an AI system uses another function, such as search or file access.
tool call
One time an AI agent uses a tool, such as search, calculation, or file reading.
compaction
The process where an AI summarizes its past conversation to save memory space.
Read original