llama.cpp adds support for Cohere’s code model with 3 billion active parameters
llama.cpp now supports the architecture needed to run Cohere2-MoE models. This makes it easier to run North Mini Code 1.0, from Cohere and Cohere Labs, in local setups. North Mini Code 1.0 is an open-weights research model built for code writing, agent-style software work, and terminal tasks.
The model has 30 billion total parameters, but only 3 billion are active for any given token. Because this mixture-of-experts (MoE) design routes each token through only a subset of the weights, it can reduce the compute needed per token compared with a dense 30-billion-parameter model. The model supports a context window of up to 256,000 tokens and output of up to 64,000 tokens.
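To show how a model can hold many parameters while using only some of them per token, here is a toy sketch of generic top-k expert routing. It is a minimal illustration, not Cohere2-MoE's actual router. The expert count (8), the value of k (2), the router scores, and the toy "experts" are all hypothetical, and each expert is a plain scalar function standing in for a feed-forward block.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))

def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    out = 0.0
    for idx, weight in top_k_route(router_logits, k):
        out += weight * experts[idx](x)
    return out

NUM_EXPERTS = 8  # hypothetical, not the real model's expert count
TOP_K = 2        # hypothetical, not the real model's routing width
calls = [0] * NUM_EXPERTS  # how many times each toy expert was evaluated

def make_expert(i):
    """Toy expert: scales its input by (i + 1) and records that it ran."""
    def expert(x):
        calls[i] += 1
        return (i + 1) * x
    return expert

experts = [make_expert(i) for i in range(NUM_EXPERTS)]
router_logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]  # hypothetical scores
y = moe_forward(1.0, experts, router_logits, TOP_K)
print("expert calls:", calls)
print("output:", y)
```

Only the two highest-scoring experts are evaluated, so the work per token scales with k rather than with the total number of experts. All experts still have to be stored, which is why total and active parameter counts are reported separately.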
It is released under the Apache 2.0 license, which makes both experimentation and potential product use easier to evaluate.
Key points
- llama.cpp added support for the Cohere2-MoE architecture.
- North Mini Code 1.0 is aimed at code writing, agent-style software engineering, and terminal tasks.
- The model has 30 billion total parameters, with 3 billion active at a time.
- It supports up to 256,000 tokens of context and up to 64,000 tokens of output.
- The model is released under the Apache 2.0 license.
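The figures in the key points can be turned into a rough planning calculation. This sketch assumes that generated tokens count against the same 256,000-token context window as the prompt, and it estimates only the size of the raw weights. KV cache, activations and file overhead are ignored. The bits-per-weight values are hypothetical stand-ins for unquantized (16), 8-bit and roughly 4-bit quantized files, not the sizes of any published GGUF.

```python
TOTAL_PARAMS = 30e9          # total parameters (from the article)
ACTIVE_PARAMS = 3e9          # parameters active per token (from the article)
CONTEXT_TOKENS = 256_000     # maximum context window (from the article)
MAX_OUTPUT_TOKENS = 64_000   # maximum output length (from the article)

def active_fraction(total, active):
    """Share of the weights used for each token."""
    return active / total

def weight_gib(params, bits_per_weight):
    """Approximate size of the weights alone in GiB (no KV cache or overhead)."""
    return params * bits_per_weight / 8 / 2**30

def max_prompt_tokens(context, reserved_output):
    """Prompt tokens left if the output must also fit in the context (assumption)."""
    if reserved_output > context:
        raise ValueError("reserved output exceeds the context window")
    return context - reserved_output

print(f"active fraction: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.0%}")
for bits in (16, 8, 4.5):  # hypothetical bits-per-weight choices
    print(f"{bits:>4} bits/weight: ~{weight_gib(TOTAL_PARAMS, bits):.1f} GiB of weights")
print("prompt budget with full output reserved:",
      max_prompt_tokens(CONTEXT_TOKENS, MAX_OUTPUT_TOKENS))
```

The calculation shows why a small active fraction does not shrink memory needs by the same factor. Compute per token tracks the 3 billion active parameters, but storage still tracks all 30 billion.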