
Local LLMs on Apple Silicon, Part 1: From Compatibility to Your First Local Chat

Cloud APIs put a frontier model behind a single HTTPS call. That convenience is hard to beat, and for most production workloads it remains the right choice. But something has shifted over the last couple of years: the gap between “what a hosted model can do” and “what a model running on your laptop can do” has narrowed enough that local inference is no longer a curiosity. For developers, especially those of us building on Apple Silicon, it has become a serious option. ...

May 21, 2026 · 19 min · Walid Sassi

Quantization in LLMs: How to Run AI on Your iPhone Without Burning It

In Part 1 of this series, we set up the MLX ecosystem and ran a language model locally on Apple Silicon. If you haven’t read it yet, it’s worth starting there. This article tackles the question that naturally follows: how do you fit a multi-billion-parameter model into a device with 8 GB of RAM? The answer is quantization, and understanding it will change how you think about on-device AI.

Introduction: LLMs Are Just Very Large Matrices

Before we can explain quantization, we need to be clear about what we’re actually compressing. ...

April 6, 2026 · 10 min · Walid Sassi