vMLX
by jjang-ai
Local AI engine for Apple Silicon providing OpenAI and Anthropic compatible APIs for LLMs, VLMs, and Image Gen.
What it does
vMLX is a high-performance local AI inference engine optimized for Apple Silicon (M1-M4). It runs LLMs, VLMs, and Flux image models entirely on-device, offering a private, secure alternative to cloud APIs while speaking the OpenAI and Anthropic wire formats.
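For example, because the server speaks the OpenAI wire format, the official OpenAI Python client can be pointed at it directly. This is a minimal sketch, assuming a server on localhost:8080; the listing does not document vMLX's actual default address.

```python
# Hypothetical sketch: talk to a local vMLX server through the official
# OpenAI Python client. The base URL/port and dummy key are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # assumed local endpoint
    api_key="not-needed",                 # local servers typically ignore the key
)

resp = client.chat.completions.create(
    model="mlx-community/Qwen3-8B-4bit",
    messages=[{"role": "user", "content": "Summarize what runs on-device here."}],
)
print(resp.choices[0].message.content)
```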
Tools
- Local LLM Serving: Run any mlx-community model with continuous batching and a paged KV cache.
- Image Generation: Local Flux Schnell/Dev and Z-Image Turbo generation and editing (see the sketch after this list).
- Distributed Inference: Split large models across multiple Macs via Thunderbolt or Ethernet.
- JANG Quantization: Adaptive mixed-precision quantization for better quality at low bit widths.
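A minimal sketch of local image generation over the OpenAI images wire format the listing advertises. The endpoint path, port, and the flux-schnell model identifier are assumptions drawn from the OpenAI spec, not from vMLX documentation.

```python
# Hypothetical sketch: request a local Flux image via the OpenAI images
# wire format. Endpoint path, port, and parameter names are assumptions.
import base64
import requests

resp = requests.post(
    "http://localhost:8080/v1/images/generations",  # assumed endpoint
    json={
        "model": "flux-schnell",  # assumed model identifier
        "prompt": "a watercolor map of Seoul",
        "size": "1024x1024",
        "response_format": "b64_json",
    },
    timeout=300,
)
resp.raise_for_status()
image_bytes = base64.b64decode(resp.json()["data"][0]["b64_json"])
with open("out.png", "wb") as f:
    f.write(image_bytes)
```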
Installation
Install via uv:
brew install uv
uv tool install vmlx
vmlx serve mlx-community/Qwen3-8B-4bit
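With the server running, the advertised Anthropic compatibility suggests the official anthropic client should work the same way. A minimal sketch, again assuming localhost:8080:

```python
# Hypothetical sketch: same local server, Anthropic Messages wire format.
# The base URL/port and dummy key are assumptions.
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:8080",  # assumed local endpoint
    api_key="not-needed",              # local servers typically ignore the key
)

msg = client.messages.create(
    model="mlx-community/Qwen3-8B-4bit",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello from the Anthropic wire format."}],
)
print(msg.content[0].text)
```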
Supported hosts
- Claude Desktop
- Cursor
- Codex
- Gemini-CLI
Quick install
uv tool install vmlx
Information
- Pricing: free
- Published: 4/14/2026