llmbuffer aims to cut AI agent chat costs with better caching
llmbuffer is a Python library for managing LLM conversation history so more of it can use cache. Its creator says it keeps long-lived conversation history stable while still allowing dynamic context to change. The README shows a simulated 15-turn benchmark where input cost was about 43% lower than a common simple approach.
Key points
- It is a Python 3.9+ library for LLM conversation history management.
- It keeps the fixed system prompt and older conversation history at the front so cache can be reused.
- It places changing items, such as RAG results, timestamps, and tool calls, near the end.
- The README includes examples for OpenAI and Anthropic, plus provider adapters for other setups.
- It supports compaction hooks to trim or summarize long history.
Quick term guide
- benchmark
- A test used to compare speed, quality, or cost.
- open-source
- Software whose code is shared publicly so others can inspect, use, or change it.
- model provider
- The external AI service (such as OpenAI or Anthropic) that Cursor connects to in order to generate code suggestions.
- system prompt
- A hidden set of basic instructions that guides how an AI tool behaves.
- timestamps
- Dates and times attached to messages or events.
- tool calls
- Times when an AI system uses another function, such as search or file access.
- tool call
- One time an AI agent uses a tool, such as search, calculation, or file reading.
- compaction
- The process where an AI summarizes its past conversation to save memory space.