Running fails on oobabooga's text-generation-webui (macOS, M1)
The weights appeared to load, but actual chat failed, just as described in the title. I'm using an Apple Silicon M1 machine running macOS Ventura 13.3.1 (a), and when I tried to load and use these weights in oobabooga's text-generation-webui, it failed.
Oobabooga's text-generation-webui uses Hugging Face's transformers Python module for Hugging Face-format weights; the installed transformers version is 4.28.0, torch is 2.0.1, and torchvision is 0.15.2.
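For reference, the versions above can be confirmed from inside the webui's Python environment with a trivial check (nothing model-specific):

```python
import torch
import torchvision
import transformers

# Versions installed in oobabooga's conda env on this machine
print(transformers.__version__)  # 4.28.0
print(torch.__version__)         # 2.0.1
print(torchvision.__version__)   # 0.15.2
```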
I have used many Hugging Face weights models but have never seen anything like this before. Even when I ran with the PYTORCH_ENABLE_MPS_FALLBACK=1 environment variable, the same thing happened. From the error message I can tell it is related to torch.autocast. Does anybody know how to resolve this problem?
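Reading the traceback below, the failure seems to reduce to torch's autocast constructor rejecting the MPS device, even with enabled=False. A minimal sketch that reproduces the same RuntimeError (assuming torch 2.0.1, where autocast only accepts 'cuda' or 'cpu' as device_type):

```python
import torch

# torch 2.0.1 validates device_type in torch.autocast.__init__,
# even when enabled=False, and rejects anything but 'cuda' or 'cpu'.
try:
    with torch.autocast(enabled=False, device_type="mps"):
        pass
except RuntimeError as e:
    print(e)  # User specified autocast device_type must be 'cuda' or 'cpu'
```

Full log and traceback from the webui: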
INFO:Loading mosaicml_mpt-7b-storywriter...
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:03<00:00, 1.78s/it]
INFO:Loaded the model in 4.76 seconds.
Starting streaming server at ws://127.0.0.1:5005/api/v1/stream
INFO:Loading the extension "gallery"...
INFO:server listening on 127.0.0.1:5005
Starting API at http://127.0.0.1:5000/api
Running on local URL: http://127.0.0.1:8860
To create a public link, set `share=True` in `launch()`.
/Volumes/cuttingedge/large_lang_models/oobabooga_macos/installer_files/env/lib/python3.10/site-packages/transformers/generation/utils.py:690: UserWarning: MPS: no support for int64 repeats mask, casting it to int32 (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/Repeat.mm:236.)
  input_ids = input_ids.repeat_interleave(expand_size, dim=0)
/Users/knut/.cache/huggingface/modules/transformers_modules/mosaicml_mpt-7b-storywriter/attention.py:263: UserWarning: The operator 'aten::pow.Scalar_out' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
  slopes = 1.0 / torch.pow(2, m)
Traceback (most recent call last):
  File "/Volumes/cuttingedge/large_lang_models/oobabooga_macos/text-generation-webui/modules/callbacks.py", line 73, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "/Volumes/cuttingedge/large_lang_models/oobabooga_macos/text-generation-webui/modules/text_generation.py", line 263, in generate_with_callback
    shared.model.generate(**kwargs)
  File "/Volumes/cuttingedge/large_lang_models/oobabooga_macos/installer_files/env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Volumes/cuttingedge/large_lang_models/oobabooga_macos/installer_files/env/lib/python3.10/site-packages/transformers/generation/utils.py", line 1485, in generate
    return self.sample(
  File "/Volumes/cuttingedge/large_lang_models/oobabooga_macos/installer_files/env/lib/python3.10/site-packages/transformers/generation/utils.py", line 2524, in sample
    outputs = self(
  File "/Volumes/cuttingedge/large_lang_models/oobabooga_macos/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/knut/.cache/huggingface/modules/transformers_modules/mosaicml_mpt-7b-storywriter/modeling_mpt.py", line 237, in forward
    outputs = self.transformer(input_ids=input_ids, past_key_values=past_key_values, attention_mask=attention_mask, prefix_mask=prefix_mask, sequence_id=sequence_id, return_dict=return_dict, output_attentions=output_attentions, output_hidden_states=output_hidden_states, use_cache=use_cache)
  File "/Volumes/cuttingedge/large_lang_models/oobabooga_macos/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/knut/.cache/huggingface/modules/transformers_modules/mosaicml_mpt-7b-storywriter/modeling_mpt.py", line 183, in forward
    (x, past_key_value) = block(x, past_key_value=past_key_value, attn_bias=attn_bias, attention_mask=attention_mask, is_causal=self.is_causal)
  File "/Volumes/cuttingedge/large_lang_models/oobabooga_macos/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/knut/.cache/huggingface/modules/transformers_modules/mosaicml_mpt-7b-storywriter/blocks.py", line 35, in forward
    a = self.norm_1(x)
  File "/Volumes/cuttingedge/large_lang_models/oobabooga_macos/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/knut/.cache/huggingface/modules/transformers_modules/mosaicml_mpt-7b-storywriter/norm.py", line 24, in forward
    with torch.autocast(enabled=False, device_type=module_device.type):
  File "/Volumes/cuttingedge/large_lang_models/oobabooga_macos/installer_files/env/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 201, in __init__
    raise RuntimeError('User specified autocast device_type must be \'cuda\' or \'cpu\'')
RuntimeError: User specified autocast device_type must be 'cuda' or 'cpu'
Output generated in 0.35 seconds (0.00 tokens/s, 0 tokens, context 72, seed 553060808)
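Since the failing call is line 24 of norm.py in the cached model code (~/.cache/huggingface/modules/transformers_modules/mosaicml_mpt-7b-storywriter/norm.py), one workaround I am considering is guarding that autocast call so it becomes a no-op on MPS. This is an untested sketch; _maybe_autocast is a hypothetical helper I would add myself, not part of the model code:

```python
import contextlib
import torch

def _maybe_autocast(module_device: torch.device):
    # torch 2.0.x only accepts 'cuda' or 'cpu' as autocast device_type,
    # so on 'mps' return a no-op context instead of torch.autocast.
    if module_device.type in ("cuda", "cpu"):
        return torch.autocast(enabled=False, device_type=module_device.type)
    return contextlib.nullcontext()

# norm.py line 24 would then change from
#     with torch.autocast(enabled=False, device_type=module_device.type):
# to
#     with _maybe_autocast(module_device):
```

I have not verified that the rest of the MPT code path works on MPS after this change.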
There are some community resources for using this model in oobabooga, such as https://youtu.be/QVVb6Md6huA; hopefully that helps.
@sam-mosaic Thanks, but it didn't help at all.