# Apple MLX Integration
You can use [Apple MLX](https://github.com/ml-explore/mlx) as an optimized worker implementation in FastChat.
It runs models efficiently on Apple Silicon.
See the supported models [here](https://github.com/ml-explore/mlx-examples/tree/main/llms#supported-models).
Note that for Apple Silicon Macs with less memory, smaller models (or quantized models) are recommended; an example command appears at the end of this page.
## Instructions
1. Install MLX.

   ```bash
   pip install "mlx-lm>=0.0.6"
   ```
2. When you launch a model worker, replace the normal worker (`fastchat.serve.model_worker`) with the MLX worker (`fastchat.serve.mlx_worker`). Remember to launch the model worker only after you have launched the controller (see the main FastChat README for instructions).

   ```bash
   python3 -m fastchat.serve.mlx_worker --model-path TinyLlama/TinyLlama-1.1B-Chat-v1.0
   ```
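For context, here is a minimal end-to-end sketch of the serving stack with the MLX worker. The controller and `test_message` commands are standard FastChat; the model name `TinyLlama-1.1B-Chat-v1.0` assumes FastChat's usual convention of deriving it from the last component of the model path.

```bash
# Terminal 1: start the controller first (listens on port 21001 by default).
python3 -m fastchat.serve.controller

# Terminal 2: start the MLX worker; it registers itself with the controller.
python3 -m fastchat.serve.mlx_worker --model-path TinyLlama/TinyLlama-1.1B-Chat-v1.0

# Terminal 3: send a quick test prompt through the controller.
python3 -m fastchat.serve.test_message --model-name TinyLlama-1.1B-Chat-v1.0
```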
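Tying back to the memory note above: on lower-memory Macs you can point `--model-path` at a pre-quantized conversion instead of the full-precision weights. The 4-bit repository name below is an assumption for illustration; browse the `mlx-community` organization on Hugging Face for conversions that actually exist.

```bash
# Hypothetical example: a 4-bit community conversion of the same model.
# Verify the exact repo name under mlx-community on Hugging Face first.
python3 -m fastchat.serve.mlx_worker --model-path mlx-community/TinyLlama-1.1B-Chat-v1.0-4bit
```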