How to extract text from a screen

Asked on 06/13/2025


To extract text from a screen using Apple's technologies, you can use the Vision framework, specifically the RecognizeDocumentsRequest API. The API works on images, so the screen contents need to be available as an image first. It detects and extracts lines of text from that image and recognizes structural elements such as tables and lists. It can also identify important information such as email addresses, phone numbers, and URLs, so you can parse documents with fewer lines of code.
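As a rough illustration of the request described above, here is a minimal sketch that runs RecognizeDocumentsRequest on image data and returns the recognized text. The function name recognizeText(in:) is hypothetical, and the property names (document, text, transcript) follow the WWDC 2025 session as best recalled, so check them against the current Vision documentation. It assumes an OS release that ships this API.

```swift
import Foundation
import Vision

// Hypothetical helper (not part of Vision): returns the text recognized
// in an image, or nil when no document is detected.
func recognizeText(in imageData: Data) async throws -> String? {
    // Create the document recognition request.
    let request = RecognizeDocumentsRequest()

    // Run the request on the image data (for example, a screenshot).
    let observations = try await request.perform(on: imageData)

    // Assumption: the first observation holds the detected document.
    guard let document = observations.first?.document else {
        return nil
    }

    // Assumption: the transcript is the document's text as a single string.
    return document.text.transcript
}
```

A caller would load a screenshot or photo into a Data value and await this function from an async context.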

For more detail, see the WWDC 2025 session "Read documents using the Vision framework", which covers how to extract text and structural elements from documents. The relevant material starts at the chapter marker "Reading documents".

If you need to extract text from a structured document, such as a table, the API can detect the table's structure and extract the text from each cell. You can then turn those cells into a list of contacts or other structured data.
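The table case can be sketched in the same way. The example below takes the first detected table and returns its cells as rows of strings, which could then be mapped to contacts or other records. The function name extractRows(from:) is hypothetical, and the tables, rows, and content.text.transcript property names are assumptions based on the WWDC 2025 session, so verify them before relying on this.

```swift
import Foundation
import Vision

// Hypothetical helper (not part of Vision): returns the text of every cell
// in the first detected table, grouped by row.
func extractRows(from imageData: Data) async throws -> [[String]] {
    let request = RecognizeDocumentsRequest()
    let observations = try await request.perform(on: imageData)

    // Assumption: tables are exposed on the detected document, in reading order.
    guard let table = observations.first?.document.tables.first else {
        return []
    }

    // Assumption: each row is a collection of cells, and each cell exposes
    // its recognized text through content.text.transcript.
    return table.rows.map { row in
        row.map { cell in cell.content.text.transcript }
    }
}
```

Returning plain strings keeps the sketch small. Code that needs email addresses or phone numbers would instead read the data the API detects in each cell, as the session demonstrates.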