Open Source · Importance: Medium

Offline real-time speech-to-text on iPhone — open-source demo released

r/LocalLLM · Jun 10, 2026 · 8h ago

A developer has open-sourced a demo iPhone app that converts speech to text in real time, entirely on-device with no internet needed. It uses NVIDIA's Nemotron 3.5 speech model converted to Apple's Core ML format. This gives developers a starting point for adding private, offline voice input to iOS apps or AI agents.

The project is a proof-of-concept (PoC) iOS app that performs streaming automatic speech recognition (ASR): text appears on screen as you speak, not after you finish. The key technical step was converting NVIDIA's Nemotron 3.5 speech recognition model into Apple's Core ML format, which lets it run entirely on the iPhone or iPad without sending any audio to an external server. Voice data therefore stays on the device, which helps privacy, and recognition keeps working in places without a network connection.
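To make "streaming" concrete, here is a minimal, language-agnostic sketch of the general pattern, written in Python for brevity. It is not the project's code: the chunk length, the sample rate, and the `fake_decode_chunk` stub (which stands in for the converted Core ML model) are all assumptions made for illustration. The idea is that audio is cut into short chunks, each chunk is decoded with state carried over from the previous one, and the partial transcript is emitted after every chunk rather than once at the end.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

SAMPLE_RATE = 16_000               # assumed sample rate in Hz
CHUNK_SAMPLES = SAMPLE_RATE // 2   # assumed chunk length: 0.5 seconds

# A decoder takes one audio chunk plus carried-over state and returns
# (newly recognized text, updated state). Hypothetical interface.
Decoder = Callable[[List[float], Optional[int]], Tuple[str, int]]


def chunk_audio(samples: List[float], chunk_size: int) -> Iterator[List[float]]:
    """Split a buffer of samples into consecutive fixed-size chunks."""
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


def stream_transcribe(chunks: Iterable[List[float]], decode_chunk: Decoder) -> Iterator[str]:
    """Yield the growing transcript after each chunk is decoded."""
    transcript = ""
    state: Optional[int] = None
    for chunk in chunks:
        text, state = decode_chunk(chunk, state)
        if text:
            transcript = (transcript + " " + text).strip()
        yield transcript  # partial result, available while audio is still arriving


def fake_decode_chunk(chunk: List[float], state: Optional[int]) -> Tuple[str, int]:
    """Stub decoder (NOT a real model): emits one canned word per chunk."""
    index = 0 if state is None else state
    words = ["hello", "from", "the", "device"]
    text = words[index] if index < len(words) else ""
    return text, index + 1


if __name__ == "__main__":
    audio = [0.0] * (SAMPLE_RATE * 2)  # two seconds of silence as dummy input
    for partial in stream_transcribe(chunk_audio(audio, CHUNK_SAMPLES), fake_decode_chunk):
        print(partial)
```

In a real app the stub would be replaced by a call into the on-device model, and the chunks would come from the microphone rather than a pre-filled buffer.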

Because this is still an early demo rather than a finished product, it is best treated as a learning resource or experiment base. The code is publicly available, so developers who want to add offline voice recognition to an iOS app or an on-device AI agent can study or adapt it directly.

Key points

Runs fully on-device — no internet or server required for speech recognition
NVIDIA Nemotron 3.5 model converted to Apple's Core ML format for on-device inference
Streaming mode: text appears in real time while you speak, not after you stop
Audio never leaves the device, so it is more private than cloud-based alternatives
Still a PoC — suitable for experimentation and learning, not production use yet

Quick term guide

open-source / open-sourced: Software whose code is shared publicly so others can inspect, use, change, or contribute to it. To open-source a project is to release its code this way.
developers: People who build software, apps, or websites.
AI agents: AI tools that can carry out steps toward a goal, not just answer once.
streaming: Here it means text is generated continuously as you speak, rather than waiting until you finish talking.
on-device AI: AI processing that runs on the phone itself rather than sending data to a remote server, which avoids network round trips and keeps data private.
on-device inference: Running an AI model on your own machine instead of sending the work to a cloud service.
production: The live version of a service that real users use.
