Open Source · Importance: Medium

A 12GB VRAM user asks why local LLM tasks keep failing

r/selfhosted · Jun 11, 2026 · 4h ago

A Reddit user says they run an LLM on a Linux server with 12GB of VRAM, mostly for self-hosting admin tasks, with Qwen Code CLI as the client and Ollama loading the models. They have mainly used Qwen 3.5 9B with a large context window. The user says the model sometimes stops mid-task or falls into a loop, and asks whether the cause is the limited VRAM, the model itself, or the workflow.
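One way to reason about the VRAM part of that question is a rough memory estimate: the model weights take a fixed amount of memory, while the key/value cache grows linearly with the context window. The sketch below is a back-of-the-envelope calculation only. It assumes a standard transformer attention cache, and the layer count, KV head count, head dimension, and 4.5 bits per weight are hypothetical placeholders, not published figures for Qwen 3.5 9B. It also ignores activation and runtime overhead, and treats the 12GB budget as roughly 12 GiB.

```python
def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights, in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30


def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context_tokens: int, bytes_per_value: int = 2) -> float:
    """Approximate KV cache size in GiB.

    The leading factor of 2 accounts for storing both keys and values
    for every layer and every token in the context.
    """
    total_bytes = 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value
    return total_bytes / 2**30


# Hypothetical architecture numbers, for illustration only.
LAYERS, KV_HEADS, HEAD_DIM = 36, 8, 128
VRAM_BUDGET_GIB = 12  # approximation of the 12GB card in the post

weights = weights_gib(params_billion=9, bits_per_weight=4.5)
for ctx in (8_192, 32_768, 131_072):
    cache = kv_cache_gib(LAYERS, KV_HEADS, HEAD_DIM, ctx)
    total = weights + cache
    verdict = "fits in" if total <= VRAM_BUDGET_GIB else "exceeds"
    print(f"context {ctx:>7}: weights {weights:.1f} GiB + KV cache {cache:.1f} GiB "
          f"= {total:.1f} GiB ({verdict} {VRAM_BUDGET_GIB} GiB)")
```

Under these assumed numbers, the context window rather than the weights decides whether the setup fits in memory, which is why the post's "large context window" detail matters when weighing VRAM against the other two possible causes. A real estimate would need the actual model configuration and quantization.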

Key points

The setup has 12GB VRAM dedicated to running an LLM.
The user runs the models on a Linux server through Ollama.
They use Qwen Code CLI for self-hosting admin tasks.
They have used Qwen 3.5 9B with a large context window.
The model sometimes stops during a task or gets stuck in a loop.

Quick term guide

context window: The amount of text, measured in tokens, that an AI model can take into account at once in a chat or task.
AI agents: AI tools that can carry out steps toward a goal, not just answer once.
API bill: The fee charged each time you call an AI model, based on how much text is processed.
reliability: How consistently a tool works without failing or behaving unexpectedly.
liability: Legal responsibility for causing an accident or damage.
local setup: A way to run AI on your own computer or home machine instead of a cloud service.
local models: AI models that run on your own computer or device instead of a company server.
local model: An AI model you run directly on your own computer, with no internet connection or external service needed.
