A 12GB VRAM user asks why local LLM tasks keep failing

A Reddit user says they run an LLM on a Linux server with 12GB VRAM. They use it mostly for self-hosting admin tasks, with Qwen Code CLI as the client and Ollama loading the models. So far they have mainly used Qwen 3.5 9B with a large context window. The user says the model sometimes stops mid-task or falls into a loop, and asks whether the cause is the limited VRAM, the model, or the workflow.

Key points

  • The setup has 12GB VRAM dedicated to running an LLM.
  • The user runs the models on a Linux server through Ollama.
  • They use Qwen Code CLI for self-hosting admin tasks.
  • They have used Qwen 3.5 9B with a large context window.
  • The model sometimes stops during a task or gets stuck in a loop.
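One way to approach the VRAM part of the question is a rough memory estimate: quantized weights plus the key/value cache, which grows linearly with the context window. The sketch below is a back-of-the-envelope calculation, not a diagnosis. The post does not give the model's architecture or quantization, so the layer count, KV head count, head dimension, and 4.5 bits per weight are hypothetical placeholders, and the formula assumes standard attention in every layer with a 16-bit cache. Runtime overhead is ignored, and a model with a different attention design may need less cache than this suggests.

```python
GIB = 2**30  # bytes per GiB


def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB."""
    return n_params * bits_per_weight / 8 / GIB


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_tokens: int, bytes_per_value: int = 2) -> float:
    """Approximate KV cache size in GiB for standard attention.

    The leading factor of 2 covers keys and values; bytes_per_value=2
    assumes a 16-bit cache.
    """
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens / GIB


def fits(total_gib: float, vram_gib: float) -> bool:
    """True if the estimate is below the VRAM budget (overhead ignored)."""
    return total_gib < vram_gib


# HYPOTHETICAL architecture values, chosen only for illustration.
# They are not taken from the post or from the real model's config.
ARCH = dict(n_layers=36, n_kv_heads=8, head_dim=128)
N_PARAMS = 9e9         # "9B" from the post
BITS_PER_WEIGHT = 4.5  # assumption: a typical 4-bit quantization
VRAM_GIB = 12          # treating the post's 12GB as roughly 12 GiB

for ctx in (8_192, 32_768, 131_072):
    total = weights_gib(N_PARAMS, BITS_PER_WEIGHT) + kv_cache_gib(**ARCH, context_tokens=ctx)
    verdict = "fits" if fits(total, VRAM_GIB) else "does not fit"
    print(f"context {ctx:>7} tokens: about {total:.1f} GiB, {verdict} in {VRAM_GIB} GiB")
```

Under these assumed numbers the weights stay constant while the cache dominates at long contexts, so the context-window setting is worth checking alongside the model choice. Whether memory pressure actually explains the stalls and loops described in the post is not something this estimate can confirm.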

Quick term guide

context window
The maximum amount of text an AI model can take into account at once, including the conversation so far and any files or tool output it has been given.
AI agents
AI tools that can carry out steps toward a goal, not just answer once.
API bill
The fee charged each time you call an AI model, based on how much text is processed.
reliability
How consistently a tool works without failing or behaving unexpectedly.
liability
Legal responsibility for causing an accident or damage.
local setup
A way to run AI on your own computer or home machine instead of a cloud service.
local model
An AI model that runs directly on your own computer or device instead of a company server, with no external service needed.