How to use intelligent APIs to inject OCR
Asked on 06/16/2025
To use intelligent APIs to inject OCR (Optical Character Recognition) capabilities into your app, you can leverage the Vision framework provided by Apple. The Vision framework offers a range of capabilities for visual intelligence, including text extraction from images. Here's a brief overview of how you can integrate OCR using the Vision framework:
- Recognize Text Request: Use RecognizeTextRequest to detect and extract lines of text from an image. It works well for plain lines and paragraphs of text; for documents with structure such as tables and lists, use RecognizeDocumentsRequest instead.
- Recognize Documents Request: This new API extracts structural elements from documents, such as tables, lists, and machine-readable codes like QR codes. It can also identify important information such as email addresses, phone numbers, and URLs.
- Swift Enhancements: The Vision framework has a Swift API that simplifies integrating these capabilities into your apps. Requests are performed with async/await, so they fit into Swift concurrency code without completion handlers.
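As a rough illustration of the text request described in the list above, the sketch below runs RecognizeTextRequest on an image file using async/await. It assumes the Swift Vision API available from iOS 18 and macOS 15. The function name recognizeLines and the imageURL parameter are hypothetical, and the property names should be checked against the SDK you build with.

```swift
import Foundation
import Vision

/// Hypothetical helper: returns the best-guess string for each line of text
/// that Vision finds in the image at `imageURL`.
@available(iOS 18.0, macOS 15.0, *)
func recognizeLines(in imageURL: URL) async throws -> [String] {
    // Configure the request; `.accurate` favors recognition quality over speed.
    var request = RecognizeTextRequest()
    request.recognitionLevel = .accurate

    // Perform the request; Vision returns one observation per detected line.
    let observations = try await request.perform(on: imageURL)

    // Keep the top candidate for each line, skipping lines with no candidate.
    return observations.compactMap { $0.topCandidates(1).first?.string }
}
```

From an async context this could be called as `let lines = try await recognizeLines(in: url)`. Returning plain strings keeps the sketch small; an app that needs bounding boxes or confidence values would return the observations themselves.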
For a more detailed guide on using the Vision framework for OCR, you can refer to the session Read documents using the Vision framework (00:01:22) from WWDC 2025. This session provides insights into how to use the Vision framework to read and understand documents effectively.
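For the document-level request, a minimal sketch might look like the following. It assumes the RecognizeDocumentsRequest API as presented at WWDC 2025 (iOS 26 and macOS 26), including the document, tables, rows, and transcript members as recalled from that session; verify them against the SDK before relying on them. The names readDocument and DocumentReadError are hypothetical.

```swift
import Foundation
import Vision

/// Hypothetical error type used only by this sketch.
enum DocumentReadError: Error {
    case noDocument
}

/// Hypothetical helper: returns the document's full transcript and each
/// detected table as rows of cell strings.
@available(iOS 26.0, macOS 26.0, *)
func readDocument(from imageData: Data) async throws
    -> (transcript: String, tables: [[[String]]]) {
    // Perform the request on raw image data.
    let request = RecognizeDocumentsRequest()
    let observations = try await request.perform(on: imageData)

    // Assumption: the first observation carries the document container.
    guard let document = observations.first?.document else {
        throw DocumentReadError.noDocument
    }

    // Flatten every table into rows of plain cell text.
    let tables = document.tables.map { table in
        table.rows.map { row in
            row.map { cell in cell.content.text.transcript }
        }
    }
    return (document.text.transcript, tables)
}
```

Mapping tables to nested string arrays drops position and detected-data details, which is enough for a quick export but not for features such as tapping a phone number in a cell.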

Explore machine learning on Apple platforms
Get started with an overview of machine learning frameworks on Apple platforms. Whether you’re implementing your first ML model, or an ML expert, we’ll offer guidance to help you select the right framework for your app’s needs.

Discover machine learning & AI frameworks on Apple platforms
Tour the latest updates to machine learning and AI frameworks available on Apple platforms. Whether you are an app developer ready to tap into Apple Intelligence, an ML engineer optimizing models for on-device deployment, or an AI enthusiast exploring the frontier of what is possible, we’ll offer guidance to help select the right tools for your needs.

Read documents using the Vision framework
Learn about the latest advancements in the Vision framework. We’ll introduce RecognizeDocumentsRequest, and how you can use it to read lines of text and group them into paragraphs, read tables, etc. And we’ll also dive into camera lens smudge detection, and how to identify potentially smudged images in photo libraries or your own camera capture pipeline.