M2 16GB Mac test hit memory trouble during compression

A user says they added Gwen3 4b as an auxiliary model for compression in Hermes Agent. They reported a speed of 20k tokens/s during testing. While the model was summarizing 16,000 Markdown (MD) files, RAM use rose to 12GB and the Mac mini shut down immediately.

Key points

  • The user tested Gwen3 4b as an auxiliary model in Hermes Agent.
  • They reported a speed of 20k tokens/s.
  • RAM use reached 12GB during a test that summarized 16,000 MD files.
  • The Mac mini shut down right away and could not be turned on remotely.
  • On an M2 16GB Mac, large summary jobs should be tested in smaller batches first.
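
The batching advice in the last point can be sketched in Python as follows. This is a minimal illustration, not how Hermes Agent works: `batched`, `run_in_batches`, and the `summarize` callback are hypothetical names, and the callback stands in for whatever actually sends files to the auxiliary model. The idea is to run one small batch, check RAM use, and only then raise the limits.

```python
from itertools import islice
from pathlib import Path
from typing import Callable, Iterable, Iterator, List, Optional


def batched(items: Iterable[Path], size: int) -> Iterator[List[Path]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def run_in_batches(
    files: Iterable[Path],
    size: int,
    summarize: Callable[[List[Path]], None],  # hypothetical: calls the model
    max_batches: Optional[int] = None,        # None means "process everything"
) -> int:
    """Summarize files batch by batch; return how many files were handled."""
    done = 0
    # islice stops after max_batches batches, so a trial run stays small.
    for batch in islice(batched(files, size), max_batches):
        summarize(batch)
        done += len(batch)
    return done
```

A first trial might call `run_in_batches(sorted(Path("notes").glob("*.md")), size=50, summarize=my_summarizer, max_batches=1)`, where the `notes` folder and `my_summarizer` are placeholders, and watch memory use before processing the full set.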

Quick term guide

Gwen3 4b
An AI model name as written in the Reddit title; it is probably a misspelling of Qwen3 4B.
auxiliary model
A helper AI model used alongside the main model.
compression
A process that shortens older chat details so the AI can keep working in a long session.
compress
To take a lot of information and turn it into a shorter, simpler version.
Hermes Agent
Apparently a tool or community for building and managing AI agents.
testing
The process of checking that software does what it's supposed to do, usually by running it and looking for errors.
MD file
A text document written in Markdown format.
Mac mini
A small desktop computer made by Apple.