Can I use the new FoundationModels framework to extract structured data from an image?

Asked on 06/14/2025

Yes, you can use the new Foundation Models framework to extract structured data, but it is designed primarily for text: it generates structured output from text prompts rather than from images. Features such as the "@Generable" macro and guided generation let you request output as a Swift type you define, which is useful where you would otherwise ask for JSON or CSV and parse it yourself.
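As a rough illustration of guided generation from text, here is a minimal sketch. The `ContactInfo` type, its fields, the prompt wording, and the `extractContact` helper are hypothetical names chosen for this example, and the code assumes a device where the on-device model is available.

```swift
import FoundationModels

// Hypothetical output type for illustration; it is not part of the framework.
@Generable
struct ContactInfo {
    @Guide(description: "The person's full name")
    var name: String

    @Guide(description: "The person's email address")
    var email: String
}

// Asks the on-device model to fill in a ContactInfo from plain text.
// Throws if the model is unavailable or generation fails.
func extractContact(from text: String) async throws -> ContactInfo {
    let session = LanguageModelSession()
    let response = try await session.respond(
        to: "Extract the contact details from this text: \(text)",
        generating: ContactInfo.self
    )
    // `content` holds the typed value, so no manual JSON parsing is needed.
    return response.content
}
```

Note that the input here is a text string; nothing in this sketch reads an image.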

However, to extract structured data from images, use the Vision framework. Its new "RecognizeDocumentsRequest" API detects structured elements in an image, such as tables and lists, and groups lines of text into paragraphs. It is the appropriate tool for this task.
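A minimal sketch of reading a table from image data with this request might look like the following. The `extractFirstTable` helper is a hypothetical name, and the property path used to reach each cell's text (`document.tables`, `rows`, `content.text.transcript`) follows the WWDC session as best recalled, so treat it as an assumption and check it against the current Vision documentation.

```swift
import Foundation
import Vision

// Returns the first detected table as rows of cell strings,
// or an empty array if no document or table is found.
func extractFirstTable(from imageData: Data) async throws -> [[String]] {
    let request = RecognizeDocumentsRequest()
    let observations = try await request.perform(on: imageData)

    // Assumption: each observation exposes a `document` with a `tables` collection.
    guard let table = observations.first?.document.tables.first else {
        return []
    }

    // Each row is a sequence of cells; take the recognized text of each cell.
    return table.rows.map { row in
        row.map { cell in cell.content.text.transcript }
    }
}
```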

For more information on extracting structured data from images, see the session "Read documents using the Vision framework" (00:01:22).