tell me about the new framework to run LLM local on device
Asked on 06/19/2025
Apple introduced the Foundation Models framework at WWDC 2025. It lets developers run a large language model (LLM) directly on the device and enhance app features through a convenient Swift API. It is available on macOS, iOS, iPadOS, and visionOS.
The Foundation Models framework is optimized for tasks such as content generation, text summarization, and user input analysis. Because everything runs on-device, user data stays private: the model can operate offline, there are no cloud API costs, and no data is sent off-device.
The framework provides access to a large language model with 3 billion parameters, quantized to 2 bits, which makes it efficient at device scale. The model is not intended for tasks that require extensive world knowledge or advanced reasoning; those are typically handled by server-scale LLMs.
For developers looking to integrate these capabilities, the framework provides a simple API to prompt the model and generate responses. Additionally, the Core ML framework can be used to import and run other AI models on Apple devices, optimizing performance across the CPU, GPU, and Neural Engine.
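As a rough illustration of the prompting API described above, here is a minimal Swift sketch. It assumes a build target and device where the Foundation Models framework and Apple Intelligence are available. The `summarize` helper, the `SummaryError` type, and the instruction text are hypothetical names chosen for this example and are not part of the framework.

```swift
import FoundationModels

// Hypothetical error type for this sketch (not part of the framework).
enum SummaryError: Error {
    case modelUnavailable
}

// Hypothetical helper: asks the on-device model for a one-sentence summary.
func summarize(_ text: String) async throws -> String {
    // The on-device model may be unavailable (for example, on an unsupported
    // device or when Apple Intelligence is turned off), so check first.
    guard case .available = SystemLanguageModel.default.availability else {
        throw SummaryError.modelUnavailable
    }

    // A session keeps the context for an interaction with the model.
    // The instruction string is an illustrative assumption.
    let session = LanguageModelSession(
        instructions: "Summarize the user's text in one sentence."
    )

    // Prompt the model and wait for the complete response.
    let response = try await session.respond(to: text)
    return response.content
}
```

A caller would use it from an async context, for example `let summary = try await summarize(articleText)`, and handle the unavailable case by hiding or disabling the feature.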
For more detail, see the session "Meet the Foundation Models framework" (00:00:00).

Explore prompt design & safety for on-device foundation models
Design generative AI experiences that leverage the strengths of the Foundation Models framework. We’ll start by showing how to design prompts for the on-device large language model at the core of Apple Intelligence. Then, we’ll introduce key ideas around AI safety, and offer concrete strategies to make your generative AI features safe, reliable, and delightful.

Platforms State of the Union
Discover the newest advancements on Apple platforms.

Meet the Foundation Models framework
Learn how to tap into the on-device large language model behind Apple Intelligence! This high-level overview covers everything from guided generation for generating Swift data structures and streaming for responsive experiences, to tool calling for integrating data sources and sessions for context management. This session has no prerequisites.