Open-source proxy claims lower Claude Code token costs

The Reddit poster says much of their Claude Code bill came from parts that were not covered by cache discounts. They built `llmtrim`, a local proxy, to reduce the size of requests and replies. The post says it leaves the cached prefix unchanged so the cache discount still applies. The poster claims it cut full-price input by about 68% on live Claude Code traffic.

Key points

  • The tool is called `llmtrim` and is described as an open-source local proxy.
  • The poster says it does not change the cached prefix.
  • It focuses on shrinking new request content, tool output, and replies.
  • The claimed result is about 68% less full-price input on live Claude Code traffic.
  • The post says it added about 18ms per request, but smaller requests could make some calls faster overall.

Quick term guide

local proxy
A small middle layer that receives requests and sends them to a local model or another model.
cached prefix
The earlier part of a prompt that can be reused by the AI service without charging the same full price again.
Solo makers
People who build and launch their own products or services entirely on their own.
benchmark
A test used to compare speed, quality, or cost.
verbose
Using more words or code than needed to get the job done.
testing
The process of checking that software does what it's supposed to do, usually by running it and looking for errors.
open-source
Software whose code is shared publicly so others can inspect, use, or change it.
Content
Information or experiences, like articles or videos, provided through digital media.
Read original