extract table in images

To extract tables from images, you can use the Vision framework's RecognizeDocumentsRequest. This API detects structured elements such as tables in an image of a document and returns them in a form you can traverse. Here's a brief overview of how it works:

1. Capture the image: Take a photo of the document, for example with an iPad.
2. Create the request: Create a RecognizeDocumentsRequest and perform it on the image. The request returns a document observation, which describes the structure of the document, including any tables.
3. Extract the table structure: Read the tables property on the observation's document. Each table is a 2D array of cells that you can access by rows or by columns. The table's boundary is given by its bounding region, with coordinates relative to the image.
4. Access the table content: Each cell has properties indicating its row and column. A cell's content can include text, tables, lists, or barcodes. To get the text, use the cell's transcript, which provides all the text in the cell as a single string.
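
The steps above can be sketched in Swift roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes the type and property names shown in the WWDC 2025 session (RecognizeDocumentsRequest, document, tables, rows, content.text.transcript), which you should check against the current SDK documentation. The TableExtractionError type and the extractFirstTable function are hypothetical names introduced only for this example.

```swift
import Foundation
import Vision

// Hypothetical error type for this sketch; it is not part of Vision.
enum TableExtractionError: Error {
    case noDocument
    case noTable
}

/// Runs document recognition on encoded image data (for example JPEG or PNG)
/// and returns the text of the first detected table as rows of strings.
/// Assumes an SDK and deployment target that include RecognizeDocumentsRequest
/// (the OS releases announced at WWDC 2025).
func extractFirstTable(from imageData: Data) async throws -> [[String]] {
    // Create the request and perform it on the image data.
    let request = RecognizeDocumentsRequest()
    let observations = try await request.perform(on: imageData)

    // Each observation carries a document describing the detected structure.
    guard let document = observations.first?.document else {
        throw TableExtractionError.noDocument
    }

    // Take the first table found in the document.
    guard let table = document.tables.first else {
        throw TableExtractionError.noTable
    }

    // Walk the cells row by row and collect each cell's transcript,
    // which is all the text in that cell as a single string.
    return table.rows.map { row in
        row.map { cell in cell.content.text.transcript }
    }
}
```

You would call this from an async context, for example `let rows = try await extractFirstTable(from: photoData)`, where photoData is a placeholder for the captured image's data. Returning plain strings keeps the example short; if you need cell positions or non-text content, work with the table and cell values directly instead of flattening them.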

For more detail, see the session "Read documents using the Vision framework" from WWDC 2025, which covers table extraction in depth.