Open-source proxy claims lower Claude Code token costs
A Reddit poster says much of their Claude Code bill came from tokens that were not covered by cache discounts. They built `llmtrim`, a local proxy, to shrink requests and replies. According to the post, the tool leaves the cached prefix unchanged, so the cache discount still applies. The poster claims it cut full-price input by about 68% on live Claude Code traffic.
Key points
- The tool is called `llmtrim` and is described as an open-source local proxy.
- The poster says it does not change the cached prefix.
- It focuses on shrinking new request content, tool output, and replies.
- The claimed result is about 68% less full-price input on live Claude Code traffic.
- The post says the proxy adds about 18 ms per request, though smaller requests could make some calls faster overall.
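The idea in the points above, trimming only what comes after the cached prefix, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not `llmtrim`'s actual code: the message shape (plain dicts with `role` and `content`), the `trim_request` and `trim_text` helpers, and the 2000-character cap are all hypothetical.

```python
from typing import Dict, List

MAX_TOOL_CHARS = 2000  # hypothetical cap, not a value from the post


def trim_text(text: str, limit: int = MAX_TOOL_CHARS) -> str:
    """Keep the head and tail of an oversized string and mark the cut."""
    if len(text) <= limit:
        return text
    half = limit // 2
    dropped = len(text) - 2 * half
    head = text[:half]
    tail = text[len(text) - half:]  # avoids the text[-0:] pitfall when half == 0
    return f"{head}\n[... {dropped} chars trimmed ...]\n{tail}"


def trim_request(messages: List[Dict[str, str]], cached_count: int) -> List[Dict[str, str]]:
    """Return a new message list with only the uncached tail trimmed.

    The first `cached_count` messages pass through untouched, so the prefix
    stays byte-identical and can still match the provider's cache.
    """
    prefix = messages[:cached_count]
    tail = []
    for msg in messages[cached_count:]:
        if msg.get("role") == "tool":
            # Copy instead of mutating so the caller's data is not changed.
            msg = {**msg, "content": trim_text(msg["content"])}
        tail.append(msg)
    return prefix + tail
```

In this sketch a long tool output inside the prefix is deliberately left alone, because editing it would change the prefix and lose the cache match. Only tool output in the new part of the conversation is shortened.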
Quick term guide
- local proxy
- A small program running on your own machine that sits between a client (here, Claude Code) and the model service, receiving each request and forwarding it, possibly after modifying it.
- cached prefix
- The opening part of a prompt that the AI service has already processed and stored, so repeating it unchanged in later requests is billed at a discounted rate instead of full price.
- Solo makers
- People who build and launch their own products or services entirely on their own.
- benchmark
- A test used to compare speed, quality, or cost.
- verbose
- Using more words or code than needed to get the job done.
- testing
- The process of checking that software does what it's supposed to do, usually by running it and looking for errors.
- open-source
- Software whose code is shared publicly so others can inspect, use, or change it.
- Content
- Information or experiences, like articles or videos, provided through digital media.
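To make the "cached prefix" entry above concrete, here is a rough cost calculation in Python. Every number in it is a hypothetical placeholder chosen for illustration (the token counts, the price per million tokens, and the cache-read multiplier), and the `input_cost` helper is not from the post. Only the 68% reduction comes from the poster's claim. Cache-write charges and output tokens are ignored.

```python
def input_cost(cached_tokens: float, fresh_tokens: float,
               price_per_mtok: float, cache_read_multiplier: float) -> float:
    """Input cost in dollars for one request.

    Cached tokens are billed at a fraction of the base price;
    fresh (uncached) tokens are billed at the full base price.
    """
    cached = cached_tokens * price_per_mtok * cache_read_multiplier
    fresh = fresh_tokens * price_per_mtok
    return (cached + fresh) / 1_000_000


# Hypothetical request: a large cached prefix plus a smaller uncached tail.
PRICE = 3.0        # assumed dollars per million input tokens
CACHE_MULT = 0.1   # assumed discount: cached tokens cost 10% of full price
CACHED = 100_000   # assumed tokens served from the cache
FRESH = 20_000     # assumed tokens billed at full price

before = input_cost(CACHED, FRESH, PRICE, CACHE_MULT)
# Apply the poster's claimed 68% cut to the full-price portion only.
after = input_cost(CACHED, FRESH * (1 - 0.68), PRICE, CACHE_MULT)

print(f"before: ${before:.4f}  after: ${after:.4f}")
print(f"input bill reduced by {100 * (1 - after / before):.1f}%")
```

With these assumed numbers the uncached tail is one sixth of the input tokens but two thirds of the input cost, which is why trimming it moves the bill while the cached prefix is left untouched. A 68% cut in full-price input is not the same as a 68% cut in the total bill.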