Local AI model choices for a 72GB RTX 3090 setup
A computer with three RTX 3090 cards (72GB of VRAM in total) can run large models quickly by keeping them entirely in VRAM. GPT-OSS 120B remains a solid everyday choice.
Qwen3.5 122B is very strong, but it may spend too much time reasoning before answering. GLM Air 4.5 106B sees frequent use for quick replies because it does not default to a long thinking mode.
Gemma 4 31B and Qwen3.6 27B are smaller, load and unload quickly, and fit comfortably in 48GB at Q8, which leaves the third GPU free for audio or image work. Nemotron Nano Omni 30B A3B and Devstral Small 2 24B are also considered good, but they have been used less because the larger general models and Qwen3.6 27B cover the main needs.
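The 48GB claim can be sanity-checked with simple arithmetic. The sketch below is a rough estimate, not a measurement. It assumes "Q8" means the GGUF Q8_0 format (about 8.5 bits per weight), takes the parameter counts from the model names, and uses a made-up flat 6 GiB allowance for KV cache and buffers. The helper names `weight_gib` and `fits` are hypothetical.

```python
# Back-of-the-envelope VRAM check. Every constant here is an assumption
# chosen for illustration, not a measurement.

GIB = 1024 ** 3

# Q8_0 in GGUF stores 32 weights as 32 int8 values plus one fp16 scale,
# i.e. 34 bytes per 32 weights = 8.5 bits per weight.
Q8_0_BITS = 8.5


def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / GIB


def fits(params_billion: float, bits_per_weight: float,
         vram_gib: float, overhead_gib: float = 6.0) -> bool:
    """True if weights plus a flat allowance for KV cache and buffers fit."""
    needed = weight_gib(params_billion, bits_per_weight) + overhead_gib
    return needed <= vram_gib


if __name__ == "__main__":
    # Nominal sizes taken from the model names; real counts may differ.
    for name, size_b in [("Gemma 4 31B", 31), ("Qwen3.6 27B", 27)]:
        need = weight_gib(size_b, Q8_0_BITS)
        ok = fits(size_b, Q8_0_BITS, vram_gib=48)
        print(f"{name}: ~{need:.1f} GiB of weights, fits in 48 GiB: {ok}")
```

Under these assumptions both smaller models leave headroom on two cards, while a 100B-class model at the same precision would not fit in 72GB, so the larger models would need a lower-bit quantization.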
Key points
- A 72GB VRAM setup can run some 100B-class models.
- Qwen3.5 122B is strong but may be slower because it overthinks.
- GLM Air 4.5 106B is useful for quick replies.
- Gemma 4 31B and Qwen3.6 27B are easier to load, unload, and share with other GPU tasks.
- For lower agent costs, mixing large and small models may be better than using one big model for everything.
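One way to picture the last point is a small router that sends only the harder agent steps to a large model. This is a minimal sketch under stated assumptions: the step kinds, the `HARD_STEPS` set, and the function names are all hypothetical, and the pairing of GPT-OSS 120B with Qwen3.6 27B is just one combination drawn from the models mentioned above.

```python
# Hypothetical router: the step kinds and the large/small split are
# illustrative assumptions, not a recommendation from the source.

LARGE_MODEL = "GPT-OSS 120B"
SMALL_MODEL = "Qwen3.6 27B"

# Step kinds assumed to benefit from the larger model; all others go small.
HARD_STEPS = {"plan", "debug", "review"}


def pick_model(step_kind: str) -> str:
    """Return the model name to use for one agent step."""
    return LARGE_MODEL if step_kind in HARD_STEPS else SMALL_MODEL


def route(steps: list[str]) -> list[tuple[str, str]]:
    """Pair each step with the model chosen for it, preserving order."""
    return [(step, pick_model(step)) for step in steps]


if __name__ == "__main__":
    for step, model in route(["plan", "tool_call", "summarize", "review"]):
        print(f"{step:>10} -> {model}")
```

In practice the split would depend on which steps actually fail on the smaller model, so the set of hard steps is the part to tune.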