How do I load a CoreML model?

To load a Core ML model from Python, you can use the Core ML Tools (coremltools) Python API. Here's a general workflow based on the information from the WWDC sessions:

1. Convert the Model: Start by converting your model into the Core ML format using Core ML Tools. This means calling the ct.convert function (ct is the conventional import alias for coremltools) on a model from a framework like PyTorch or TensorFlow.
2. Optimize the Model: During conversion, you can optimize the model for Apple hardware using techniques such as quantization and, for large language models in particular, efficient key-value caching.
3. Load the Model: Once your model is in the Core ML format, you can load it using the Core ML Tools Python API. If you're working with a multifunction model, you can specify a function name to load one specific function within the model.
4. Run Inference: After loading the model, you can run predictions as usual using the Core ML framework. Core ML automatically distributes execution across the CPU, GPU, and Neural Engine to maximize performance.
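The loading step above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes a recent coremltools release (the function_name argument to ct.models.MLModel is part of the multifunction support and may not exist in older versions). The helper names build_load_kwargs and load_model, the file path, and the function and input names in the usage comments are all hypothetical.

```python
from pathlib import Path


def build_load_kwargs(function_name=None):
    """Collect optional keyword arguments for ct.models.MLModel (hypothetical helper)."""
    kwargs = {}
    if function_name is not None:
        # Assumption: selects one function inside a multifunction model.
        kwargs["function_name"] = function_name
    return kwargs


def load_model(path, function_name=None):
    """Load a saved Core ML model (.mlpackage or .mlmodel) with coremltools."""
    # Imported lazily so the helper above works without coremltools installed.
    import coremltools as ct

    return ct.models.MLModel(str(Path(path)), **build_load_kwargs(function_name))


# Usage sketch (placeholders; predict() requires macOS):
#   model = load_model("MyModel.mlpackage")
#   adapter = load_model("MyModel.mlpackage", function_name="adapter_1")
#   outputs = model.predict({"input": input_array})
```

The input dictionary passed to predict must use the input names defined in the model itself; "input" here is only a stand-in.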

For more detailed information, you can refer to the session "Bring your machine learning and AI models to Apple silicon" (26:24), which covers multifunction models and their usage.