A 16GB Jetson test for a low-power local AI agent

A 16GB Jetson was rebuilt into a quiet 40W local machine for running a local AI agent. The practical goal was to keep the box always available, use little power, and still answer fast enough to be usable. The main limit was memory: the same 16GB pool had to hold the model, the KV cache, and tool use.
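The memory squeeze described above can be made concrete with a small budget check. This is a minimal sketch, not a measurement from the test: the function name and every size passed to it are hypothetical placeholders, chosen only to show how little room is left for the KV cache once the fixed costs are subtracted from a shared 16GB pool.

```python
GIB = 1024 ** 3


def kv_headroom_bytes(total_bytes, weights_bytes, runtime_bytes, os_bytes):
    """Return the bytes left for the KV cache after fixed costs are subtracted."""
    used = weights_bytes + runtime_bytes + os_bytes
    if used > total_bytes:
        raise ValueError("fixed costs alone exceed the memory pool")
    return total_bytes - used


# Hypothetical sizes for illustration only (not figures from the test).
headroom = kv_headroom_bytes(
    total_bytes=16 * GIB,    # the shared 16GB pool
    weights_bytes=10 * GIB,  # assumed size of the quantized model weights
    runtime_bytes=1 * GIB,   # assumed inference runtime and compute buffers
    os_bytes=2 * GIB,        # assumed operating system and agent overhead
)
print(f"KV cache headroom: {headroom / GIB:.1f} GiB")
```

Whatever the real numbers are, the same subtraction applies: every gigabyte spent on weights or overhead is a gigabyte that cannot be spent on context.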

Gemma 4 26B A4B UD Q2_K_XL could generate answers at around a 66K context depth and reached about 10.21 tokens per second near a 60K context. Qwen 3.6 35B did better in the tool-calling stress test, but it was slower when generating answers deep in a long context, which made it less attractive as the default setup. Another model could reach 100K context, but it was weaker at choosing between similar tools and handling chained tool tasks.

The practical recommendation was Gemma 4 26B for tool-heavy work, the 100K-context model for very long memory, and careful KV cache tuning for either path.
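To see why KV cache tuning matters so much at these context depths, here is a rough size estimate for different cache types. It is a sketch under stated assumptions: the layer count, KV head count, and head dimension are hypothetical and do not describe any of the models tested, and the formula assumes full attention in every layer (models with sliding-window or hybrid layers cache less). The bits-per-value figures follow the ggml block layouts, where q8_0 stores 32 values in 34 bytes and q4_0 stores 32 values in 18 bytes.

```python
GIB = 1024 ** 3

# Bits stored per cached value, including the per-block scale.
BITS_PER_VALUE = {
    "f16": 16.0,
    "q8_0": 34 * 8 / 32,  # 8.5 bits per value
    "q4_0": 18 * 8 / 32,  # 4.5 bits per value
}


def kv_cache_bytes(n_ctx, n_layers, n_kv_heads, head_dim, k_type="f16", v_type="f16"):
    """Estimate KV cache size in bytes, assuming full attention in every layer."""
    values_per_token = n_layers * n_kv_heads * head_dim  # for K, and again for V
    bits_per_token = values_per_token * (BITS_PER_VALUE[k_type] + BITS_PER_VALUE[v_type])
    return int(n_ctx * bits_per_token / 8)


# Hypothetical architecture, for illustration only.
ARCH = dict(n_layers=32, n_kv_heads=8, head_dim=128)

for cache_type in ("f16", "q8_0", "q4_0"):
    size = kv_cache_bytes(65536, k_type=cache_type, v_type=cache_type, **ARCH)
    print(f"{cache_type}: {size / GIB:.2f} GiB at 64K context")
```

The K and V types are separate parameters because they can be set independently, so a mixed choice such as q8_0 keys with q4_0 values lands between the two uniform settings.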

Key points

  • A 16GB Jetson was tuned as a quiet 40W machine.
  • Gemma 4 26B A4B UD Q2_K_XL generated at about 10 tokens per second in long-context use.
  • Qwen 3.6 35B was stronger on tool-calling correctness but slower at deep long-context generation.
  • Another model reached 100K context but was weaker on harder tool-selection tasks.
  • KV cache choices such as q8_0/q4_0 were central to fitting long contexts into 16GB.