Tell me about transcribing audio files into transcripts
Asked on 06/19/2025
Apple's WWDC session titled "Bring advanced speech-to-text to your app with SpeechAnalyzer" provides a comprehensive overview of how to transcribe audio files into transcripts using the SpeechAnalyzer API. The session covers the following key points:
- SpeechAnalyzer API: The API allows developers to manage an analysis session and add a transcriber module to perform speech-to-text processing. This is done asynchronously, enabling the application to handle audio input and transcription results independently (see the first sketch after this list).
- Volatile and Finalized Results: The transcriber can deliver both volatile (real-time, less accurate) and finalized (more accurate) results. Volatile results provide immediate feedback, which can be useful for live transcription experiences.
- Transcription Process: The process involves setting up the SpeechAnalyzer with a transcriber module, configuring it with the desired locale, and handling the results. The transcription results are represented as attributed strings, which include timing data for synchronization with audio playback.
- On-Device Processing: Transcription is performed entirely on-device, ensuring privacy. However, the necessary language models need to be downloaded and managed using the AssetInventory API (see the second sketch below).
- Integration with SwiftUI: The session demonstrates how to integrate the transcription results into a SwiftUI application, using attributed strings to visualize the transition from volatile to finalized results (see the third sketch below).
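
To make the flow above concrete, here is a minimal sketch of transcribing a file, following the code shapes shown in the session (SpeechAnalyzer, SpeechTranscriber, analyzeSequence(from:), and the transcriber's results stream). Treat it as an outline of the approach rather than a verified drop-in, and check the exact signatures against the shipping SDK:

```swift
import Speech
import AVFoundation

// Sketch: transcribe an audio file with SpeechAnalyzer and SpeechTranscriber.
// Type and method names follow the WWDC session's code; verify against the SDK.
func transcribe(fileURL: URL, locale: Locale) async throws -> AttributedString {
    // A transcriber module configured for one locale. Volatile results are
    // opted into, and audio time ranges are attached to the output text.
    let transcriber = SpeechTranscriber(
        locale: locale,
        transcriptionOptions: [],
        reportingOptions: [.volatileResults],
        attributeOptions: [.audioTimeRange]
    )

    // The analyzer manages the session; modules such as the transcriber plug in.
    let analyzer = SpeechAnalyzer(modules: [transcriber])

    // Consume results concurrently with feeding audio. Volatile results are
    // superseded by later ones, so only finalized results are accumulated here.
    let resultsTask = Task { () -> AttributedString in
        var transcript = AttributedString()
        for try await result in transcriber.results where result.isFinal {
            transcript += result.text   // AttributedString carrying timing data
        }
        return transcript
    }

    // Feed the file to the analyzer, then finalize any remaining audio.
    let audioFile = try AVAudioFile(forReading: fileURL)
    if let lastSample = try await analyzer.analyzeSequence(from: audioFile) {
        try await analyzer.finalizeAndFinish(through: lastSample)
    } else {
        await analyzer.cancelAndFinishNow()
    }

    return try await resultsTask.value
}
```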
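Because the language models are downloaded on demand, the session pairs this with AssetInventory. A sketch of that check, again using the names shown in the session (a nil installation request is taken to mean the assets are already present):

```swift
import Foundation
import Speech

enum TranscriptionSetupError: Error {
    case localeNotSupported
}

// Sketch: ensure the on-device model for the transcriber's locale is
// installed before starting analysis. Names follow the WWDC session's code.
func ensureModel(for transcriber: SpeechTranscriber, locale: Locale) async throws {
    // Bail out if SpeechTranscriber has no model for this locale at all.
    guard await SpeechTranscriber.supportedLocales.contains(where: {
        $0.identifier(.bcp47) == locale.identifier(.bcp47)
    }) else {
        throw TranscriptionSetupError.localeNotSupported
    }

    // If assets for this module are missing, download and install them.
    if let request = try await AssetInventory.assetInstallationRequest(
        supporting: [transcriber]
    ) {
        try await request.downloadAndInstall()
    }
}
```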
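On the SwiftUI side, one way to visualize the volatile-to-finalized hand-off is to render the finalized text normally and dim the volatile tail. The view and property names below are illustrative assumptions, not taken from the session:

```swift
import SwiftUI

// Sketch: show finalized text plus the current volatile text, with the
// volatile portion dimmed so the hand-off to finalized results is visible.
// `finalizedTranscript` and `volatileTranscript` are hypothetical properties
// that an observable model would update from the transcriber's results stream.
struct TranscriptView: View {
    var finalizedTranscript: AttributedString
    var volatileTranscript: AttributedString

    var body: some View {
        ScrollView {
            Text(finalizedTranscript + styledVolatile)
                .padding()
        }
    }

    // Dim the volatile tail; it is replaced when finalized results arrive.
    private var styledVolatile: AttributedString {
        var tail = volatileTranscript
        tail.foregroundColor = .secondary
        return tail
    }
}
```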
For more detailed information, you can refer to the session Bring advanced speech-to-text to your app with SpeechAnalyzer (09:06).

Bring advanced speech-to-text to your app with SpeechAnalyzer
Discover the new SpeechAnalyzer API for speech to text. We’ll learn about the Swift API and its capabilities, which power features in Notes, Voice Memos, Journal, and more. We’ll dive into details about how speech to text works and how SpeechAnalyzer and SpeechTranscriber can enable you to create exciting, performant features. And you’ll learn how to incorporate SpeechAnalyzer and live transcription into your app with a code-along.
