Offline real-time speech-to-text on iPhone — open-source demo released

A developer has open-sourced a demo iPhone app that converts speech to text in real time, entirely on-device with no internet needed. It uses NVIDIA's Nemotron 3.5 speech model converted to Apple's Core ML format. This gives developers a starting point for adding private, offline voice input to iOS apps or AI agents.

The project is a proof-of-concept (PoC) iOS app that performs streaming automatic speech recognition (ASR), meaning text appears on screen as you speak, not after you finish. The key technical step was converting NVIDIA's Nemotron 3.5 speech recognition model into Apple's Core ML format, which allows it to run entirely on the iPhone or iPad without sending any audio to an external server. Voice data therefore stays on the device, which protects privacy, and recognition keeps working in places without a network connection.
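To make the streaming idea concrete, here is a minimal, language-agnostic sketch written in Python rather than Swift. It is not the demo app's code. The sample rate, the chunk length, and every name in it (`chunk_audio`, `StreamingSession`, `fake_decoder`) are assumptions made for illustration, and `fake_decoder` is a stand-in for the real Core ML model. The sketch only shows the general pattern: audio is cut into short chunks, each chunk is decoded together with state carried over from earlier chunks, and the partial transcript is updated after every chunk.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Sequence

SAMPLE_RATE = 16_000  # assumed sample rate in Hz; not stated in the article
CHUNK_MS = 160        # assumed chunk length in milliseconds


def chunk_audio(samples: Sequence[float], chunk_size: int) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of chunk_size samples; the last may be shorter."""
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


@dataclass
class StreamingSession:
    """Keeps decoder state and the partial transcript between chunks."""
    decode_chunk: Callable[[Sequence[float], dict], str]
    state: dict = field(default_factory=dict)
    transcript: List[str] = field(default_factory=list)

    def feed(self, chunk: Sequence[float]) -> str:
        # Decode one chunk; the decoder may read and update self.state.
        piece = self.decode_chunk(chunk, self.state)
        if piece:
            self.transcript.append(piece)
        # Return the partial transcript so a UI could refresh immediately.
        return " ".join(self.transcript)


def fake_decoder(chunk: Sequence[float], state: dict) -> str:
    """Hypothetical stand-in for a model: emits a token for non-silent chunks."""
    state["chunks_seen"] = state.get("chunks_seen", 0) + 1
    if not chunk or max(abs(s) for s in chunk) < 0.1:
        return ""  # treat low-amplitude chunks as silence
    return f"word{state['chunks_seen']}"


if __name__ == "__main__":
    chunk_size = SAMPLE_RATE * CHUNK_MS // 1000
    # Synthetic audio: one loud chunk, one silent chunk, half a loud chunk.
    audio = [0.5] * chunk_size + [0.0] * chunk_size + [0.5] * (chunk_size // 2)
    session = StreamingSession(fake_decoder)
    for chunk in chunk_audio(audio, chunk_size):
        print(session.feed(chunk))
```

In a real app the chunks would come from the microphone rather than a list, and the decoder would be the converted model. The part that carries over is that the transcript is refreshed per chunk instead of once at the end of the recording.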

Because this is still an early demo rather than a finished product, it is best treated as a learning resource or experiment base. The code is publicly available, so developers who want to add offline voice recognition to an iOS app or an on-device AI agent can study or adapt it directly.

Key points

  • Runs fully on-device — no internet or server required for speech recognition
  • NVIDIA Nemotron 3.5 model converted to Apple's Core ML format for on-device inference
  • Streaming mode: text appears in real time while you speak, not after you stop
  • Audio never leaves the device, so it is more private than cloud-based alternatives
  • Still a PoC — suitable for experimentation and learning, not production use yet

Quick term guide

open-sourced
The code has been made public so others can inspect, use, or contribute to it.
open-source
Software whose code is shared publicly so others can inspect, use, or change it.
developers
Developers are people who build software, apps, or websites.
AI agents
AI agents are AI tools that can carry out steps toward a goal, not just answer once.
streaming
Here it means text is generated continuously as you speak, rather than waiting until you finish talking.
on-device AI
AI processing that runs on the phone itself rather than sending data to a remote server, which avoids network round trips and keeps the data on the device.
on-device inference
Running an AI model on your own machine instead of sending the work to a cloud service.
production
The live version of a service that real users use.