Training Flux Locally on Mac

Community Article Published September 12, 2024

For all those struggling to set this up right now.

(rearticulated by A.C.T. soon® from a post/repo by Hughescr and the ai-toolkit Flux training script by Ostris)

This workflow is not grounded in Diffusers. However, I have not yet encountered a working Diffusers implementation of local Flux training on Mac/mps. If such a workflow/pipeline exists, I would sincerely appreciate it if someone linked me to it (and/or advised me on implementation details), such as via alekseycalvin@gmail.com, or a comment somewhere... Like, say, one of my Flux LoRA repos here on Huggingface... (By the way, check them out? To improve Flux Schnell, use Historic Color Schnell.)

But to the point: the repo for training locally on Mac is here, as a somewhat modified branch of Ostris' ai-toolkit training script repo.

Below sits the link to the ai-toolkit repo modified for MacOS: https://github.com/hughescr/ai-toolkit

To be clear, I'm not the person behind this branch; I finally stumbled upon it whilst seeking far and wide, for many hours, any extant Flux training solution adapted for MacOS/Apple silicon. So, if this works for you, then please thank that prodigious wizard Ostris (the developer of the ai-toolkit training scripts), along with this Mac-oriented branch's mysterious author: a certain Hughescr.

Credit and solidarity further extend to all who -- in chronic scepticism of seemingly insurmountable limitations -- stubbornly tinker and quest for options, solutions, and possibilities.

In any case, on the basis of another guide post by Hughescr, plus a few notes/details added by myself for clarity, I put together the short guide below on setting up local Flux training on Macs using ai-toolkit + the Hughescr branch.

Take heed though: without further optimization, this is very unlikely to work on Mac systems with low unified memory! 

WORKFLOW to TRAIN FLUX On Mac/OSX/mps:

In Terminal, clone https://github.com/hughescr/ai-toolkit, following the Linux instructions in the README there.

As in:

git clone https://github.com/hughescr/ai-toolkit

Then travel over to the cloned directory:

cd ai-toolkit

Do this:

git submodule update --init --recursive

Then make a virtual environment from the same folder:

python3 -m venv venv

Activate it:

source venv/bin/activate

Install PyTorch:

pip3 install torch
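
Before proceeding, it may be worth confirming that this PyTorch build actually sees the MPS backend; torch.backends.mps.is_available() is the standard check:

python3 -c "import torch; print(torch.backends.mps.is_available())"

If this prints False, the MPS backend isn't usable on your setup, and the rest of this workflow won't get far, so best to sort that out first.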

Install the requirements for ai-toolkit, which should also extend it with the submodules updated/introduced by Hughescr:

pip3 install -r requirements.txt

Here's a list of all that Hughescr introduced to that branch of Ostris' ai-toolkit training script in order to adapt it for Mac OSX (I quote the below from their post):

-- Using torch.amp instead of torch.cuda.amp (which should work for CUDA too, but will allow MPS to work, using an MPS-compatible GradScaler).

-- Force-using spawn instead of fork for multiprocessing.

-- Turning off the T5 quantizer, because this won't work on MPS.

-- Forcing the dataloader to have num_workers=0 (otherwise the process breaks on Mac). This may be done by adding "num_workers=0" to your config file for a prospective training: in this context, that would be your variant of one of the (.yaml) template configs from /ai-toolkit/config/examples (see the abridged config sketch a few paragraphs below). The Hughescr branch of ai-toolkit is supposed to already pre-enforce this particular option, irrespective of the config, but it might be better to make doubly sure manually.

On a side note for those aspiring Flux remodelers who are new to training scripts or relatively code-fresh: the template config file, typically in either .yaml or .json format (the latter used by Kohya_ss, for instance), is an essential component of launching a local training run (at least without some GUI interface container/app), and typically carries strict internal formatting rules, corresponding to its data type broadly and/or to its family/architecture of trainers more specifically. As such, whilst specifying "num_workers=0", filling in your training parameters, or modifying anything else within a config .yaml (or .json, etc.), make sure to closely match the format and syntax found throughout the config template! Else get condemned to exasperating backtracking come runtime.

Relatedly, the /ai-toolkit local trainer scripts folder contains a wide range of template configs, not just for training Flux, but for many other sorts of models as well. There's much there to explore and potentially try. Instrumental to our given case, however, are the specific template config .yaml files for Flux Dev and Flux Schnell. These configs, train_lora_flux_24gb.yaml and train_lora_flux_schnell_24gb.yaml, are found in the /config/examples/ subfolder of the cloned /ai-toolkit folder: the relevant config (for training either Dev or Schnell) is meant to be duplicated by you and thereafter modified for your use with the training script. The modified config may then be passed as an argument to the run.py launcher, if you want to launch the trainer all at once directly from the config and through the Terminal. Or the config could be dragged into/opened via a dedicated UI: the built-in ai-toolkit UI may be launched from the same /ai-toolkit folder, via the flux_train_ui.py Python executable.
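
For orientation, here is a heavily abridged sketch of the kind of fields you would typically edit in your duplicated config. The key names below follow the upstream ai-toolkit example templates as best I can reconstruct them; treat this as an illustrative map only, and defer to the actual template file for the authoritative structure and full set of options:

job: extension
config:
  name: "my_flux_lora_v1"                  # run name; outputs are saved under this
  process:
    - type: "sd_trainer"
      training_folder: "output"            # model output folder location
      trigger_word: "mytriggerword"        # your trigger phrase/token
      network:
        type: "lora"
        linear: 16                         # LoRA rank
        linear_alpha: 16
      datasets:
        - folder_path: "/path/to/your/dataset"   # dataset folder location
          caption_ext: "txt"
          resolution: [512, 768, 1024]
      train:
        batch_size: 1
        steps: 2000
        lr: 1e-4
        optimizer: "adamw"                 # 8-bit optimizers rely on CUDA and may not work on MPS
      model:
        name_or_path: "black-forest-labs/FLUX.1-dev"
        is_flux: true
        quantize: false                    # quantization is off on MPS, per the notes above

Where exactly a manual "num_workers: 0" belongs (if you add it at all) depends on the branch, so check its README or the template's comments rather than trusting my placement.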

   🌕🌖🌗🌘🌑🌒🌓🌔🌕🌖🌗🌘🌑🌒🌓🌔🌕🌖🌗🌘🌑🌒🌓🌔🌕

NOW, to run/modify the script, follow further usage instructions here:

https://github.com/hughescr/ai-toolkit#tutorial

Finally, in order to side-step functions not yet implemented in MPS, one needs to launch the training script with the following environment variable set:

PYTORCH_ENABLE_MPS_FALLBACK=1

This is basically a way of enabling selective/temporary CPU offload of operations unable to work with MPS/Apple silicon.

As in:

PYTORCH_ENABLE_MPS_FALLBACK=1 python run.py config/your-custom-config-file.yaml

This should launch the custom training config, and thereby the training itself!
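
Equivalently, you can export the variable once per Terminal session, rather than prefixing it to every command:

export PYTORCH_ENABLE_MPS_FALLBACK=1
python run.py config/your-custom-config-file.yaml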

Lastly, just for clarity, and in case anyone reading this is new to manually-launched training, I will reiterate:

To specify stuff like, say, the dataset folder location, the model output folder location, the trigger phrase/token, the learning rate, the optimizer, etc., one must duplicate and modify the .yaml config file from the /config or /config/examples/ subfolder of your /ai-toolkit folder (as sketched above)...

NOW GO AND TRY THIS!

Sincerely,

A.C.T. SOON®