Can I send an image as input to the foundation model for the model to reference?

The Foundation Models framework, introduced by Apple at WWDC, lets developers use on-device large language models for a variety of tasks. However, the context provided does not mention any way to send images as input for the model to reference. The framework primarily focuses on text-based tasks such as text extraction, summarization, and content generation.

For image-related tasks, Apple provides the Vision framework, which is designed to understand the content of images and videos. If you are interested in image analysis, you might want to explore the Vision framework, as mentioned in the session "Discover machine learning & AI frameworks on Apple platforms".
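
One possible workaround, if the part of the image you care about is its text, is to run Vision first and hand the recognized text to the model as part of the prompt. The following is only a minimal sketch under that assumption, not a method described in the session. The function names recognizeText(in:) and answer(question:about:) and the prompt wording are hypothetical, and the Foundation Models calls (LanguageModelSession, respond(to:)) assume an OS version and device where the on-device model is available.

```swift
import Foundation
import Vision
import FoundationModels

/// Hypothetical helper: returns the lines of text Vision recognizes in an image.
func recognizeText(in image: CGImage) throws -> [String] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate

    // perform(_:) runs synchronously; results are available once it returns.
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])

    let observations = request.results ?? []
    // Keep only the best candidate string for each detected text region.
    return observations.compactMap { $0.topCandidates(1).first?.string }
}

/// Hypothetical helper: asks the on-device model a question about the
/// text found in an image. The image itself is never sent to the model.
@available(iOS 26.0, macOS 26.0, *)
func answer(question: String, about image: CGImage) async throws -> String {
    let lines = try recognizeText(in: image)

    // Assumption: a plain-text prompt is enough context for the question.
    let prompt = """
    The following text was extracted from an image:
    \(lines.joined(separator: "\n"))

    Question: \(question)
    """

    let session = LanguageModelSession()
    let response = try await session.respond(to: prompt)
    return response.content
}
```

With this approach the model only sees what Vision extracted, so anything in the image that is not text (objects, layout, colors) is lost unless you describe it in the prompt yourself.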

If you have more specific questions about the capabilities of the Foundation Models framework or other Apple frameworks, feel free to ask!