Is the Foundation Models framework multi-modal?

Asked on 06/13/2025

The Foundation Models framework, introduced by Apple at WWDC, provides access to an on-device large language model. It is designed for language tasks such as text extraction, summarization, and content generation. The context provided does not state that the framework is multi-modal; it describes language-based tasks only and gives no indication of support for other modalities such as images or audio.

For more detail, see the session "Meet the Foundation Models framework", which gives an overview of the framework and its capabilities.