User runs local Qwen models on ARC B70 for AI coding work

A Reddit user shared their experience running local AI workloads on an Intel ARC B70 graphics card. They said they used Qwen3.6-35B-A3B-GGUF with Hermes agent and Qwen3.6-27B-MTP-GGUF with Claude Code. They tried LM Studio, the official llama.cpp Docker image, a community Docker image, and kyuz0’s tools before building a Docker image themselves, which they said improved speed and stability.

Key points

  • The author says the ARC B70 handled local AI workloads well for them.
  • They used Claude Code with Qwen3.6-27B-MTP-GGUF.
  • They used Hermes agent with Qwen3.6-35B-A3B-GGUF.
  • They dropped LM Studio because it lacked SYCL backend support.
  • They reported better speed and stability after building the Docker image themselves.

Quick term guide

AI workload
A set of tasks that use an AI model or process AI results.
workloads
The tasks a computer is expected to handle.
graphics card
A component inside a computer that handles displaying images and running game visuals; its parallel processors can also run AI models.
Hermes Agent
It appears to be a tool or community for building and managing AI agents.
Solo developer
An individual who handles all parts of creating a project or product alone.
developers
People who build software, apps, or websites.
performance
How fast and smoothly a system or program runs.
SYCL backend support
Support that helps AI software use certain hardware, such as Intel GPUs, for faster calculations.