How are models able to run on device?

Models run on Apple devices after you define and train them, convert them to the Core ML format, and integrate them into your app using Apple's machine learning frameworks and tools. The process breaks down as follows:

Model Training and Preparation:
- Defining and Training: Initially, you define the model architecture and train it using appropriate training data. This can be done using popular training libraries such as PyTorch or TensorFlow, leveraging Apple Silicon and the unified memory architecture on Mac for high-performance training.
- Conversion to Core ML: After training, the model is converted into the Core ML format. This step optimizes the model's representation and parameters so that it runs efficiently on device while preserving accuracy. Core ML Tools (the coremltools Python package) offers optimization techniques such as quantization and efficient key-value caching for large language models (LLMs).
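
To make the quantization step concrete, here is a minimal sketch of symmetric 8-bit weight quantization in plain Python. It is only an illustration of the general idea, not how Core ML Tools implements it; the function names and sample weights are hypothetical.

```python
def quantize_symmetric_int8(weights):
    """Map floats to integer codes in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale would work.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_symmetric_int8(weights)
restored = dequantize(codes, scale)

# The rounding error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(codes, scale, max_error)
```

Each weight is stored as one byte plus a shared scale instead of a 16- or 32-bit float, which is where the memory savings come from; the trade-off is the small rounding error measured by `max_error`.
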
Model Integration:
- Using Core ML: Core ML is the primary framework for running AI models on Apple devices. It lets you integrate the model into your app and makes it easy to load and execute. Core ML optimizes hardware-accelerated execution across the CPU, GPU, and Neural Engine.
- MLTensor: For more complex use cases, such as generative AI, Core ML introduces MLTensor, which simplifies the computational glue code needed to stitch models together. This helps in managing operations outside the model, which are necessary for running an end-to-end pipeline.
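
The "glue code" mentioned above is the computation that sits between model calls, such as turning output logits into the next token. MLTensor is a Swift API; the Python sketch below does not use it and only illustrates the kind of work involved. `toy_model` is a hypothetical stand-in for a real model call.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


def toy_model(tokens, vocab_size=5):
    """Hypothetical model: favors the token (last + 1) mod vocab_size."""
    target = (tokens[-1] + 1) % vocab_size
    return [2.0 if i == target else 0.0 for i in range(vocab_size)]


def greedy_decode(prompt, steps):
    """Repeatedly call the model and append the most likely next token."""
    tokens = list(prompt)
    for _ in range(steps):
        probs = softmax(toy_model(tokens))
        tokens.append(argmax(probs))
    return tokens


generated = greedy_decode([3], steps=4)
print(generated)
```

In a real pipeline the softmax, selection, and loop bookkeeping run outside the model itself, which is the part a tensor type is meant to simplify.
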
Running Models on Device:
- Frameworks and Tools: Core ML dispatches work across the CPU, GPU, and Neural Engine to balance performance and energy efficiency. When you need finer-grained control, lower-level frameworks are available: MPSGraph (built on Metal Performance Shaders) runs models on the GPU, and BNNSGraph (part of the Accelerate framework) runs them on the CPU.
- Performance Tools: Xcode includes Core ML performance reports, and Instruments includes a Core ML template, so developers can profile and debug models and tune them further for Apple devices.

For more detailed information, you can refer to the following sessions from WWDC 2024:

Explore machine learning on Apple platforms (07:32)
Deploy machine learning and AI models on-device with Core ML (01:57)
Platforms State of the Union (16:37)

These sessions cover the process of running models on Apple devices in more depth, including the latest features and tools available to developers.