llama.cpp adds support for Cohere’s code model with a small active parameter count

llama.cpp now supports the architecture needed to run Cohere2-MoE models. This makes it easier to run Cohere and Cohere Labs’ North Mini Code 1.0 in local setups. North Mini Code 1.0 is an open-weights research model built for code writing, agent-style software work, and terminal tasks.

The model has 30 billion total parameters, but only 3 billion are active for any given token. This mixture-of-experts (MoE) design can help keep running costs and speed manageable because only part of the model is used at each step. It supports a context window of up to 256,000 tokens and output of up to 64,000 tokens.
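As a rough illustration of what these figures imply, the short sketch below works out the share of parameters that is active and a back-of-the-envelope size for the weights alone. The bit widths (16, 8, and 4 bits per weight) are assumptions chosen for illustration, not figures from Cohere or llama.cpp, and the estimate ignores the KV cache, activations, and file-format overhead. Note that all 30 billion parameters still have to be stored, even though only 3 billion are used per step.

```python
# Parameter counts as stated in the article.
TOTAL_PARAMS = 30_000_000_000
ACTIVE_PARAMS = 3_000_000_000


def active_fraction(total: int, active: int) -> float:
    """Share of the model's parameters that is active at one time."""
    return active / total


def weight_bytes(params: int, bits_per_weight: int) -> float:
    """Approximate storage for the weights alone, in bytes."""
    return params * bits_per_weight / 8


fraction = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"Active share of parameters: {fraction:.0%}")

# Hypothetical precisions, for illustration only.
for bits in (16, 8, 4):
    gigabytes = weight_bytes(TOTAL_PARAMS, bits) / 1e9
    print(f"{bits}-bit weights: about {gigabytes:.0f} GB for all parameters")
```

The point of the split is that compute per step scales with the active parameters, while storage scales with the total.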

It is released under the Apache 2.0 license, a permissive license that allows commercial use, which makes both experimentation and possible product use easier to evaluate.

Key points

  • llama.cpp added support for the Cohere2-MoE architecture.
  • North Mini Code 1.0 is aimed at code writing, agent-style software engineering, and terminal tasks.
  • The model has 30 billion total parameters, with 3 billion active for any given token.
  • It supports up to 256,000 tokens of context and up to 64,000 tokens of output.
  • The model is released under the Apache 2.0 license.