InfiniteKV stores old chat tokens instead of deleting them
The Reddit post introduces an open source KV cache called InfiniteKV. According to the post, the system keeps old tokens by turning each one into a 104-byte searchable record stored in RAM or on disk. The author says Mistral-7B answered using content from token 76,747, a position about 2.3 times the length of the model's trained context window.
Key points
- The post says InfiniteKV stores old tokens instead of deleting them.
- The newest 256 tokens stay exact in GPU memory, according to the post.
- Older tokens become small records that can live in RAM or disk files.
- For each new token, the cache searches for relevant old tokens and brings them back for the model to use.
- The author claims one million tokens would use about 3 GB of records, compared with about 122 GB for a full float16 KV cache.
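The storage figures in the last point can be roughly reproduced with some simple arithmetic. The sketch below is not taken from the post. It assumes Mistral-7B's published shape (32 layers, 8 key/value heads, 128 dimensions per head) and further assumes that InfiniteKV writes one 104-byte record per token per layer. That second assumption is a guess chosen because it lands near the quoted 3 GB; the post summary does not spell out the record layout.

```python
# Back-of-the-envelope check of the storage figures quoted in the post.
# Assumed (not stated in the post): Mistral-7B's published shape of
# 32 layers, 8 key/value heads, and 128 dimensions per head.
LAYERS = 32
KV_HEADS = 8
HEAD_DIM = 128
FLOAT16_BYTES = 2
RECORD_BYTES = 104   # record size quoted in the post
TOKENS = 1_000_000
GIB = 1024 ** 3

# Full cache: one key vector and one value vector per head, per layer, per token.
full_bytes_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * FLOAT16_BYTES
full_gib = full_bytes_per_token * TOKENS / GIB

# Hypothetical layout: one 104-byte record per token per layer.
record_bytes_per_token = RECORD_BYTES * LAYERS
record_gib = record_bytes_per_token * TOKENS / GIB

print(f"float16 cache: {full_bytes_per_token} bytes/token, {full_gib:.1f} GiB")
print(f"records:       {record_bytes_per_token} bytes/token, {record_gib:.1f} GiB")
```

Under these assumptions the full cache comes to about 122 GiB and the records to about 3.1 GiB, which suggests the post's "GB" figures are binary gigabytes. A different record layout would change the second number.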
Quick term guide
- open source
- Software whose code is available for people to view and often modify.
- KV cache
- Memory where an AI model stores the key and value data it computed for earlier tokens, so it does not have to recompute them for every new token.
- solo developer (solo dev)
- A person who builds a project or product mostly or entirely alone.
- developers
- People who build software, apps, or websites.
- local AI
- AI software that runs entirely on your own computer, with no internet connection needed.
- local model
- An AI model you run directly on your own computer, with no internet connection or external service needed.
- GPU memory
- Fast memory on a graphics card, often used to run AI models.