Apple MLX Integration

You can use Apple MLX as an optimized worker implementation in FastChat.

It runs models efficiently on Apple Silicon.

See the supported models here.

Note that on Apple Silicon Macs with limited memory, smaller or quantized models are recommended.
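
As a rough guide to what fits in memory, the sketch below estimates the weights-only footprint of a model from its parameter count and bits per weight. The helper name estimate_weight_memory_gb is hypothetical and not part of FastChat or MLX, and the 1.1e9 parameter count is simply taken from the "1.1B" in the TinyLlama model name. It ignores the KV cache, activations, and runtime overhead, so actual usage will be higher.

```python
def estimate_weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Rough weights-only footprint in GB (1 GB = 10^9 bytes).

    Hypothetical helper for illustration; excludes KV cache,
    activations, and framework overhead.
    """
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Assumption: ~1.1e9 parameters, as implied by the TinyLlama-1.1B name.
    for bits in (16, 8, 4):
        size_gb = estimate_weight_memory_gb(1.1e9, bits)
        print(f"{bits}-bit weights: ~{size_gb:.2f} GB")
```

Comparing the estimate against the machine's unified memory, with headroom for the system and other apps, gives a first indication of whether a quantized variant is needed.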

Instructions

Install MLX (the mlx-lm package).
```
pip install "mlx-lm>=0.0.6"
```
When you launch a model worker, replace the normal worker (fastchat.serve.model_worker) with the MLX worker (fastchat.serve.mlx_worker). Launch the controller first (instructions), then the model worker.
```
python3 -m fastchat.serve.mlx_worker --model-path TinyLlama/TinyLlama-1.1B-Chat-v1.0
```
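
If the same launch script has to run on both Apple Silicon and other machines, the worker swap can be automated. The following is a minimal sketch, not part of FastChat: pick_worker_module and worker_command are hypothetical helpers, and the check assumes that Python reports "Darwin" and "arm64" on Apple Silicon (a Python running under Rosetta reports "x86_64" instead and would fall back to the normal worker).

```python
import platform
from typing import List, Optional


def pick_worker_module(system: str, machine: str) -> str:
    """Return the MLX worker on Apple Silicon, the normal worker elsewhere."""
    if system == "Darwin" and machine == "arm64":
        return "fastchat.serve.mlx_worker"
    return "fastchat.serve.model_worker"


def worker_command(
    model_path: str,
    system: Optional[str] = None,
    machine: Optional[str] = None,
) -> List[str]:
    """Build the worker launch command as an argument list.

    system/machine default to the values detected on the current host.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    module = pick_worker_module(system, machine)
    return ["python3", "-m", module, "--model-path", model_path]


if __name__ == "__main__":
    # Print the command for this machine; run it after the controller is up.
    print(" ".join(worker_command("TinyLlama/TinyLlama-1.1B-Chat-v1.0")))
```

The returned list can be passed directly to subprocess.Popen, which avoids shell quoting issues in the model path.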