Quantization in LLMs: How to Run AI on Your iPhone Without Burning It

In Part 1 of this series, we set up the MLX ecosystem and ran a language model locally on Apple Silicon. If you haven’t read it yet, it’s worth starting there. This article tackles the question that naturally follows: how do you fit a multi-billion parameter model into a device with 8 GB of RAM? The answer is quantization — and understanding it will change how you think about on-device AI. ...
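The memory pressure behind that question is easy to estimate. A rough back-of-envelope sketch (illustrative numbers, not taken from the article): a 7B-parameter model stored as 16-bit floats needs roughly 14 GB for weights alone, while 4-bit quantization brings it near 3.5 GB.

```python
def model_size_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight-storage size in GB (1 GB = 1e9 bytes).

    Ignores activation memory, KV cache, and quantization
    metadata such as per-group scales -- a rough lower bound.
    """
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

fp16 = model_size_gb(7, 16)  # ~14.0 GB -- well over an 8 GB device budget
q4 = model_size_gb(7, 4)     # ~3.5 GB -- fits with headroom for the OS
print(f"fp16: {fp16:.1f} GB, 4-bit: {q4:.1f} GB")
```

In practice the savings are slightly smaller, since quantized formats also store scale and zero-point metadata per weight group.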

April 6, 2026 · 11 min · Walid Sassi
MLX Swift: Enabling On-Device Large Language Models on Apple Silicon

Abstract The proliferation of large-scale neural language models has, until recently, been contingent upon access to remote computational infrastructure. The architectural characteristics of Apple Silicon — most notably its unified memory subsystem — present a substantive departure from this dependency. This article examines MLX Swift, a native Swift binding to Apple’s MLX machine learning framework, as a mechanism for deploying quantized Large Language Models (LLMs) directly on consumer Apple hardware. ...

March 31, 2026 · 14 min · Walid Sassi

Getting Started with Claude Code for Xcode 26: Setup, Pricing & Monitoring Guide

The landscape of iOS development shifted dramatically in 2025. At WWDC 2025 Apple introduced Xcode 26, which integrates ChatGPT and supports additional AI models through API keys; Anthropic, meanwhile, released Claude Code, a powerful command-line tool for agentic coding. Together, these give developers unprecedented AI-powered development capabilities. ...

September 2, 2025 · 6 min · Walid Sassi