I want to extract text from an image. The image might contain tables and text, and it can be formatted in distinct ways.
Asked on 06/12/2025
To extract text from an image, especially one that contains structured content such as tables and lists, you can use RecognizeDocumentsRequest, a new API in the Vision framework. It detects and extracts the structural elements of a document, such as tables, lists, and paragraphs, and it recognizes text in 26 languages. Its results are a hierarchy that describes the document's layout and structure, not just its raw text.
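The basic flow can be sketched as follows. It follows the pattern shown in the WWDC 2025 session: run `RecognizeDocumentsRequest` on an image file, take the `document` container from the first observation, then read its full transcript and its detected tables. This assumes an OS release that includes the request (iOS 26 / macOS 26). The function `extractDocumentText(from:)` and the error type `ExtractionError` are hypothetical names chosen for this example.

```swift
import Foundation
import Vision

// Hypothetical error type for this sketch.
enum ExtractionError: Error {
    case noDocument
}

// Hypothetical helper: recognizes the document in an image file and returns
// its plain-text transcript together with any tables Vision detected.
@available(iOS 26.0, macOS 26.0, *)
func extractDocumentText(
    from imageURL: URL
) async throws -> (text: String, tables: [DocumentObservation.Container.Table]) {
    // One request detects text, tables, lists, and other structure.
    let request = RecognizeDocumentsRequest()

    // The Swift Vision API is async; perform(on:) accepts a file URL directly.
    let observations = try await request.perform(on: imageURL)

    // Each observation wraps a hierarchical document container.
    guard let document = observations.first?.document else {
        throw ExtractionError.noDocument
    }

    // `text.transcript` is the whole document read as plain text;
    // `tables` holds the structured tables found in it.
    return (document.text.transcript, document.tables)
}
```

Returning the transcript and the tables separately lets you use the plain text when layout doesn't matter and walk the table structure when it does.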
For example, if you have a document with a table, the API can detect the table structure, including rows and columns, and provide the content of each cell. This is particularly useful for extracting information from documents where the layout is important, such as forms or spreadsheets.
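Here is one way to turn a detected table into rows of cell strings and then into CSV text. It assumes the table was obtained from `document.tables` as in the WWDC 2025 session, where `table.rows` is an array of rows and each cell's `content` is itself a container with its own `text`. The helpers `cellTexts(of:)`, `csvField(_:)`, and `csv(from:)` are hypothetical. The CSV quoting follows the common RFC 4180 convention.

```swift
import Foundation
import Vision

// Hypothetical helper: quotes a field if it contains a comma, a double quote,
// or a line break, doubling any embedded quotes (RFC 4180 style).
func csvField(_ value: String) -> String {
    if value.contains(where: { $0 == "," || $0 == "\"" || $0.isNewline }) {
        return "\"" + value.replacingOccurrences(of: "\"", with: "\"\"") + "\""
    }
    return value
}

// Hypothetical helper: joins rows of cell strings into CSV text.
func csv(from rows: [[String]]) -> String {
    rows.map { row in row.map(csvField).joined(separator: ",") }
        .joined(separator: "\n")
}

// Hypothetical helper: reads every cell of a detected table as plain text,
// row by row.
@available(iOS 26.0, macOS 26.0, *)
func cellTexts(of table: DocumentObservation.Container.Table) -> [[String]] {
    table.rows.map { row in
        // A cell's content is a nested container, so its text is read the
        // same way as the top-level document's text.
        row.map { cell in cell.content.text.transcript }
    }
}
```

Calling `csv(from: cellTexts(of: table))` then gives you spreadsheet-ready text. Keeping the Vision-specific part separate from the formatting makes the formatting easy to test on its own.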
You can learn more about this feature in the WWDC 2025 session "Read documents using the Vision framework". The most relevant section starts at 00:01:29.

Read documents using the Vision framework
Learn about the latest advancements in the Vision framework. We’ll introduce RecognizeDocumentsRequest, and how you can use it to read lines of text and group them into paragraphs, read tables, etc. And we’ll also dive into camera lens smudge detection, and how to identify potentially smudged images in photo libraries or your own camera capture pipeline.

Develop for Shortcuts and Spotlight with App Intents
Learn how to build App Intents that make your actions available and work best with the new features in Shortcuts and Spotlight on Mac. We’ll show you how your actions combine in powerful ways with the new Apple Intelligence actions available in the Shortcuts app. We’ll deep-dive into how the new “Use Model” action works, and how it interacts with your app’s entities. And we’ll discuss how to use the App Intents APIs to make your actions available in Spotlight.

Discover Swift enhancements in the Vision framework
The Vision Framework API has been redesigned to leverage modern Swift features like concurrency, making it easier and faster to integrate a wide array of Vision algorithms into your app. We’ll tour the updated API and share sample code, along with best practices, to help you get the benefits of this framework with less coding effort. We’ll also demonstrate two new features: image aesthetics and holistic body pose.
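Sometimes you need only the raw lines of text and not the document structure, or you must support OS versions that predate RecognizeDocumentsRequest. For those cases, the redesigned async Swift API also includes `RecognizeTextRequest` (iOS 18 / macOS 15). The sketch below is one possible approach. The function name `recognizeLines(in:)` is hypothetical. The choice of `.accurate` recognition level is an assumption that favors precision over speed.

```swift
import Foundation
import Vision

// Hypothetical helper: recognizes plain lines of text without any
// document structure (no tables, lists, or paragraphs).
@available(iOS 18.0, macOS 15.0, *)
func recognizeLines(in imageURL: URL) async throws -> [String] {
    // RecognizeTextRequest is a struct, so configure it before performing.
    var request = RecognizeTextRequest()
    request.recognitionLevel = .accurate  // slower but more precise than .fast

    let observations = try await request.perform(on: imageURL)

    // Keep the highest-confidence candidate string for each detected line.
    return observations.compactMap { $0.topCandidates(1).first?.string }
}
```

Because this request returns a flat list of lines, a table's cells come back as separate lines with no row or column information. That is why RecognizeDocumentsRequest is the better fit when the image contains tables.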