New documents capture api
Asked on 06/11/2025
The new document capture API is part of the Vision framework enhancements introduced at WWDC 2025. The API, RecognizeDocumentsRequest, lets developers extract structural elements and important information from documents. It can detect structures such as tables and lists, group lines of text into paragraphs, and identify machine-readable codes such as QR codes. It supports text recognition in 26 languages and returns a structured view of the document, so the result can be parsed with less code.
For more details, you can refer to the session titled "Read documents using the Vision framework" from WWDC 2025. The relevant chapter for reading documents starts at 00:01:22.
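
As a rough illustration of the request described above, the following Swift sketch runs RecognizeDocumentsRequest on encoded image data and collects the text of every detected table. It is a minimal sketch, not the session's sample code: the function name `readTables(from:)` and the `DocumentReadingError` type are hypothetical, the availability annotation assumes the OS 26 SDKs, and the property names (`document`, `tables`, `rows`, `content.text.transcript`) follow what the session shows and should be checked against the current Vision documentation.

```swift
import Foundation
import Vision

// Hypothetical error type for this sketch; it is not part of Vision.
enum DocumentReadingError: Error {
    case noDocument
}

/// Recognizes a document in encoded image data (for example JPEG or PNG)
/// and returns, for each detected table, its rows as arrays of cell text.
@available(iOS 26.0, macOS 26.0, *)
func readTables(from imageData: Data) async throws -> [[[String]]] {
    // Create the request with its default configuration.
    let request = RecognizeDocumentsRequest()

    // Perform the request; the result is an array of document observations.
    let observations = try await request.perform(on: imageData)

    // Use the first observation's document, if one was found.
    guard let document = observations.first?.document else {
        throw DocumentReadingError.noDocument
    }

    // Each table exposes its rows, and each row is a sequence of cells.
    return document.tables.map { table in
        table.rows.map { row in
            row.map { cell in cell.content.text.transcript }
        }
    }
}
```

A caller would await `readTables(from:)` from an async context and handle the thrown error when no document is detected. Returning plain strings keeps the sketch small; an app that needs cell positions or detected data such as email addresses would work with the table cells directly instead.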

Explore enhancements to your spatial business app
Discover how the latest enhancements and APIs in visionOS 26 expand access and extend enterprise capabilities announced last year. Learn how these all-new features make it easy to build model training workflows, enhance video feeds, and enable you to align coordinate systems over a local network to develop collaborative experiences in your in-house app.

Keep colors consistent across captures
Meet the Constant Color API and find out how it can help people use your app to determine precise colors. You’ll learn how to adopt the API, explore its scientific and marketing potential, and discover best practices for making the most of the technology.

Read documents using the Vision framework
Learn about the latest advancements in the Vision framework. We’ll introduce RecognizeDocumentsRequest and show how you can use it to read lines of text, group them into paragraphs, read tables, and more. We’ll also dive into camera lens smudge detection and how to identify potentially smudged images in photo libraries or your own camera capture pipeline.