Open SourceImportance: Medium

llmbuffer aims to cut AI agent chat costs with better caching

Hacker NewsJun 11, 2026 · 20h ago

llmbuffer is a Python library for managing LLM conversation history so more of it can use cache. Its creator says it keeps long-lived conversation history stable while still allowing dynamic context to change. The README shows a simulated 15-turn benchmark where input cost was about 43% lower than a common simple approach.

Key points

It is a Python 3.9+ library for LLM conversation history management.
It keeps the fixed system prompt and older conversation history at the front so cache can be reused.
It places changing items, such as RAG results, timestamps, and tool calls, near the end.
The README includes examples for OpenAI and Anthropic, plus provider adapters for other setups.
It supports compaction hooks to trim or summarize long history.

Quick term guide

benchmark: A test used to compare speed, quality, or cost.
open-source: Software whose code is shared publicly so others can inspect, use, or change it.
model provider: The external AI service (such as OpenAI or Anthropic) that Cursor connects to in order to generate code suggestions.
system prompt: A hidden set of basic instructions that guides how an AI tool behaves.
timestamps: Dates and times attached to messages or events.
tool calls: Times when an AI system uses another function, such as search or file access.
tool call: One time an AI agent uses a tool, such as search, calculation, or file reading.
compaction: The process where an AI summarizes its past conversation to save memory space.

Read original ↗