How do I use intelligent APIs to inject OCR?

Asked on 06/16/2025


To use intelligent APIs to inject OCR (Optical Character Recognition) capabilities into your app, you can leverage the Vision framework provided by Apple. The Vision framework offers a range of capabilities for visual intelligence, including text extraction from images. Here's a brief overview of how you can integrate OCR using the Vision framework:

  1. Recognize Text Request: Use RecognizeTextRequest to detect and extract lines of text from an image. It returns the text line by line, which suits plain text such as signs, labels, or simple pages, but it does not describe structure such as tables and lists.

  2. Recognize Documents Request: The newer RecognizeDocumentsRequest extracts structural elements from documents, such as tables, lists, and machine-readable codes like QR codes. It can also detect data such as email addresses, phone numbers, and URLs within the recognized text.

  3. Swift Enhancements: The Vision framework has a Swift-native API that simplifies integrating these capabilities into your apps. Requests are performed with async/await, so recognition runs without blocking the calling code and the call sites stay concise.
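
As a minimal sketch of the first and third points, the function below runs a RecognizeTextRequest on an image file and returns the recognized lines. The function name `recognizeLines(in:)` is hypothetical, and the sketch assumes the Swift-native Vision API available in the iOS 18 / macOS 15 SDKs or later.

```swift
import Foundation
import Vision

// Hypothetical helper: returns one string per recognized line of text.
// Assumes the Swift-native Vision API (iOS 18 / macOS 15 or later).
@available(iOS 18.0, macOS 15.0, *)
func recognizeLines(in imageURL: URL) async throws -> [String] {
    var request = RecognizeTextRequest()
    // Favor accuracy over speed; `.fast` is the alternative.
    request.recognitionLevel = .accurate

    // Perform the request asynchronously on the image at `imageURL`.
    let observations = try await request.perform(on: imageURL)

    // Keep the most likely candidate string for each detected line.
    return observations.compactMap { $0.topCandidates(1).first?.string }
}
```

You could call it from any async context, for example `let lines = try await recognizeLines(in: url)`, and then join or display the lines as your app requires.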

For a more detailed guide on using the Vision framework for OCR, see the WWDC 2025 session "Read documents using the Vision framework" (00:01:22), which shows how to use the framework to read and understand documents.
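
For structured documents, a rough sketch along the lines of that session might look like the following. It assumes the RecognizeDocumentsRequest API from the iOS 26 / macOS 26 SDKs; the function name `firstTableRows(in:)` is hypothetical, and the property names should be checked against the current Vision documentation before you rely on them.

```swift
import Foundation
import Vision

// Hypothetical helper: returns the text of each cell in the first
// table found in the image, as an array of rows.
// Assumes RecognizeDocumentsRequest (iOS 26 / macOS 26 or later).
@available(iOS 26.0, macOS 26.0, *)
func firstTableRows(in imageData: Data) async throws -> [[String]] {
    let request = RecognizeDocumentsRequest()

    // Perform the request on raw image data (for example, a PNG or JPEG).
    let observations = try await request.perform(on: imageData)

    // Use the first detected document and its first table, if any.
    guard let document = observations.first?.document,
          let table = document.tables.first else {
        return []
    }

    // Each row is a list of cells; take the transcript of each cell's text.
    return table.rows.map { row in
        row.map { cell in cell.content.text.transcript }
    }
}
```

Returning an empty array when no document or table is found keeps the sketch short; a real app might throw a descriptive error instead.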