can I use the new FoundationModels framework to extract structured data from an image?
Asked on 06/14/2025
1 search
Yes, you can use the new Foundation Models framework to extract structured data, but it is primarily designed for tasks involving text, such as generating structured output from text prompts. The framework provides features like "Generable" and "Guided Generation" to help you get structured output from text prompts, which can be useful for generating structured data like JSON or CSV from text.
However, for extracting structured data from images, you would use the Vision framework. The Vision framework has a new API called "RecognizeDocumentsRequest" that can detect and extract structured elements from images, such as tables and lists, and group lines of text into paragraphs. This would be the appropriate tool for extracting structured data from images.
For more information on extracting structured data from images, you can refer to the session Read documents using the Vision framework (00:01:22).

Discover machine learning & AI frameworks on Apple platforms
Tour the latest updates to machine learning and AI frameworks available on Apple platforms. Whether you are an app developer ready to tap into Apple Intelligence, an ML engineer optimizing models for on-device deployment, or an AI enthusiast exploring the frontier of what is possible, we’ll offer guidance to help select the right tools for your needs.

Deep dive into the Foundation Models framework
Level up with the Foundation Models framework. Learn how guided generation works under the hood, and use guides, regexes, and generation schemas to get custom structured responses. We’ll show you how to use tool calling to let the model autonomously access external information and perform actions, for a personalized experience. To get the most out of this video, we recommend first watching “Meet the Foundation Models framework”.

Platforms State of the Union
Discover the newest advancements on Apple platforms.