Open Source · Importance: Medium

llama.cpp adds support for Cohere’s small-active-parameter code model

r/LocalLLaMA · Jun 14, 2026 · 12h ago

llama.cpp now supports the architecture needed to run Cohere2-MoE models. This makes it easier to run Cohere and Cohere Labs’ North Mini Code 1.0 in local setups. North Mini Code 1.0 is an open-weights research model built for code writing, agent-style software work, and terminal tasks.

The model has 30 billion total parameters, but only 3 billion are active at one time. That mixture-of-experts (MoE) design can help keep running costs and speed manageable because only a fraction of the model is used for each step. It supports a context window of up to 256,000 tokens and output of up to 64,000 tokens.
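The arithmetic behind those figures can be sketched in a few lines of Python. This is a rough back-of-the-envelope illustration, not a measurement: the parameter counts come from the summary above, while the bit widths (16, 8, 4) are hypothetical round numbers chosen for illustration and do not correspond to any specific quantization format. The estimate covers weight storage only and ignores the KV cache and runtime overhead.

```python
# Parameter counts as stated in the summary above.
TOTAL_PARAMS = 30e9   # total parameters
ACTIVE_PARAMS = 3e9   # parameters active at one time


def active_fraction(total_params: float, active_params: float) -> float:
    """Share of the model's parameters that is used for a single step."""
    return active_params / total_params


def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    Assumption: every weight is stored at the same width. Real quantized
    files mix widths, so treat the result as a rough estimate only.
    """
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    share = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"Active share per step: {share:.0%}")

    # Hypothetical bit widths, for illustration only.
    for bits in (16, 8, 4):
        total_gb = weight_memory_gb(TOTAL_PARAMS, bits)
        print(f"{bits:>2} bits/weight: ~{total_gb:.0f} GB for all weights")
```

The sparse design mainly reduces compute per step. The full set of weights generally still has to be stored somewhere the runtime can reach, so the storage estimate uses the 30 billion total rather than the 3 billion active.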

It is released under the Apache 2.0 license, which makes experimentation and possible product use easier to evaluate.

Key points

llama.cpp added support for the Cohere2-MoE architecture.
North Mini Code 1.0 is aimed at code writing, agent-style software engineering, and terminal tasks.
The model has 30 billion total parameters, with 3 billion active at a time.
It supports up to 256,000 tokens of context and up to 64,000 tokens of output.
The model is released under the Apache 2.0 license.
