speech
Asked on 06/13/2025
At WWDC 2025, Apple introduced the SpeechAnalyzer API, a new speech-to-text API for Apple platforms. Part of the Speech framework, it lets developers add speech-to-text processing with minimal code, and the processing runs entirely on-device. It supports a wide range of use cases, including long-form and distant audio such as lectures and meetings. The API is asynchronous: an app can supply audio input and display results independently of each other.
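To make the "input and results are handled independently" idea concrete, here is a minimal sketch of that pattern in plain Swift concurrency. It deliberately does not call the Speech framework: AudioChunk, PartialResult, fakeTranscribe, and runPipeline are hypothetical stand-ins invented for illustration, and the real SpeechAnalyzer types and method signatures should be taken from the session or the documentation.

```swift
// Stand-in sketch using only the Swift standard library (Swift 5.9+).
// Nothing here is Speech framework API; every name below is hypothetical.

struct AudioChunk: Sendable {
    let samples: [Float]
}

struct PartialResult: Sendable {
    let text: String
    let isFinal: Bool
}

// Hypothetical placeholder for on-device recognition of one chunk.
func fakeTranscribe(_ chunk: AudioChunk, index: Int, isLast: Bool) -> PartialResult {
    PartialResult(text: "chunk \(index): \(chunk.samples.count) samples", isFinal: isLast)
}

func runPipeline(chunks: [AudioChunk]) async -> [PartialResult] {
    // One stream carries audio in, a second stream carries results out.
    let (input, inputContinuation) = AsyncStream.makeStream(of: AudioChunk.self)
    let (results, resultContinuation) = AsyncStream.makeStream(of: PartialResult.self)

    // Producer: supplies audio without waiting for any result.
    let producer = Task {
        for chunk in chunks {
            inputContinuation.yield(chunk)
        }
        inputContinuation.finish()
    }

    // Analyzer stand-in: consumes audio and publishes results in order.
    let analyzer = Task {
        var index = 0
        for await chunk in input {
            index += 1
            let result = fakeTranscribe(chunk, index: index, isLast: index == chunks.count)
            resultContinuation.yield(result)
        }
        resultContinuation.finish()
    }

    // Consumer: reads results as they arrive, independently of the producer.
    var collected: [PartialResult] = []
    for await result in results {
        collected.append(result)
    }

    await producer.value
    await analyzer.value
    return collected
}
```

In an app, the producer would be fed from a microphone or audio file and the consumer would update the UI; calling runPipeline with a few AudioChunk values and printing each result's text is enough to try the sketch.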
For more detailed information, you can refer to the session titled "Bring advanced speech-to-text to your app with SpeechAnalyzer," which covers the API's design, usage, and integration into apps. The session also includes a live coding demo to help developers get started with building speech-to-text features.
If you're interested in exploring this further, you can check out these sessions:

Bring advanced speech-to-text to your app with SpeechAnalyzer
Discover the new SpeechAnalyzer API for speech to text. We’ll learn about the Swift API and its capabilities, which power features in Notes, Voice Memos, Journal, and more. We’ll dive into details about how speech to text works and how SpeechAnalyzer and SpeechTranscriber can enable you to create exciting, performant features. And you’ll learn how to incorporate SpeechAnalyzer and live transcription into your app with a code-along.

Discover machine learning & AI frameworks on Apple platforms
Tour the latest updates to machine learning and AI frameworks available on Apple platforms. Whether you are an app developer ready to tap into Apple Intelligence, an ML engineer optimizing models for on-device deployment, or an AI enthusiast exploring the frontier of what is possible, we’ll offer guidance to help select the right tools for your needs.

Optimize for the spatial web
Discover how to make the most of visionOS capabilities on the web. Explore recent updates like improvements to selection highlighting, and the ability to present spatial photos and panorama images in fullscreen. Learn to take advantage of existing web standards for dictation and text-to-speech with WebSpeech, spatial soundscapes with WebAudio, and immersive experiences with WebXR.