Open Source · Importance: High

Apple announces a new local AI engine for its own chips

r/LocalLLaMA · Jun 9, 2026

The Reddit post says Apple announced CoreAI, a new on-device inference engine for Apple Silicon, at WWDC. The author describes CoreAI as a possible future replacement for CoreML and an alternative to MLX, llama.cpp, and PyTorch for running models on phones and tablets. According to the post, model weights must first be converted with a Python script, and no performance numbers are available yet. It also points to Apple’s claim of deploying a 20B-parameter model on device.

Key points

Apple announced CoreAI for running AI models on Apple Silicon, according to the post.
The author says it may replace CoreML for some local AI use cases.
Models need a conversion step before they can run in this system.
The post says there are no performance numbers yet.
Apple’s 20B model claim is highlighted as a sign that larger local models may be possible.
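To put the 20B figure in perspective, here is a rough back-of-the-envelope sketch of how much memory the weights alone would need at different precisions. The post does not say which precision or quantization Apple uses, so the bit widths below are illustrative assumptions, and the function name is hypothetical. The estimate ignores activations, the KV cache, and runtime overhead.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / (1024 ** 3)


PARAMS_20B = 20e9  # "20B" read as 20 billion parameters

# Assumed precisions; the post does not state which one applies.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gib(PARAMS_20B, bits):.1f} GiB")
```

Under these assumptions, a 20B model needs roughly 37 GiB at 16 bits per weight but only about 9 GiB at 4 bits, which suggests why some form of weight compression would likely matter for running it on a phone or tablet.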

Quick term guide

on-device inference: Running an AI model on your own machine instead of sending the work to a cloud service.
Apple Silicon: Apple's own chips, including the M-series (M1 through M5) used in Macs and iPads and the A-series used in iPhones, known for performance and efficiency.
model weights: The internal numbers an AI model learns during training. Saving them lets you reuse or share the trained model.
Python script: A small program written in the Python language.
local execution: Running an AI model directly on your own computer instead of using a remote server.
long sessions: Extended AI work sessions where many messages or steps happen over time.
local models: AI models that run directly on your own computer or device instead of a company server, with no internet connection or external service needed.
