
Running fails on oobabooga's text-generation-webui, macOS, M1

#19 opened by teneriffa

The weights appeared to load, but actual chat failed, just as described in the title. I'm using an Apple Silicon M1 computer running macOS Ventura 13.3.1 (a), and tried to load and use these weights in oobabooga's text-generation-webui, where it failed.

Oobabooga's text-generation-webui uses Hugging Face's transformers Python module for Hugging Face models; the installed transformers version is 4.28.0, torch is 2.0.1, and torchvision is 0.15.2.

I have used many Hugging Face models but have never seen anything like this before. It happened even when I ran with the PYTORCH_ENABLE_MPS_FALLBACK=1 environment variable set. From the error message I know it is related to torch.autocast. Does anybody know how to resolve this problem?
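The failure can be reproduced outside the webui. Here is a minimal sketch, assuming torch 2.0.1, where torch.autocast only accepts 'cuda' or 'cpu' as device_type and validates it in __init__ before the enabled flag is even considered:

```python
# Minimal sketch, assuming torch 2.0.1: torch.autocast validates
# device_type at construction time, even with enabled=False, so an
# 'mps' device raises immediately.
import torch

device = torch.device("mps")  # what module_device.type resolves to on M1
try:
    with torch.autocast(enabled=False, device_type=device.type):
        pass
except RuntimeError as err:
    print(err)  # -> User specified autocast device_type must be 'cuda' or 'cpu'
```

The full webui log and traceback follow: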

INFO:Loading mosaicml_mpt-7b-storywriter...
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:03<00:00,  1.78s/it]
INFO:Loaded the model in 4.76 seconds.

Starting streaming server at ws://127.0.0.1:5005/api/v1/stream
INFO:Loading the extension "gallery"...
INFO:server listening on 127.0.0.1:5005
Starting API at http://127.0.0.1:5000/api
Running on local URL:  http://127.0.0.1:8860

To create a public link, set `share=True` in `launch()`.
/Volumes/cuttingedge/large_lang_models/oobabooga_macos/installer_files/env/lib/python3.10/site-packages/transformers/generation/utils.py:690: UserWarning: MPS: no support for int64 repeats mask, casting it to int32 (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/Repeat.mm:236.)
  input_ids = input_ids.repeat_interleave(expand_size, dim=0)
/Users/knut/.cache/huggingface/modules/transformers_modules/mosaicml_mpt-7b-storywriter/attention.py:263: UserWarning: The operator 'aten::pow.Scalar_out' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
  slopes = 1.0 / torch.pow(2, m)
Traceback (most recent call last):
  File "/Volumes/cuttingedge/large_lang_models/oobabooga_macos/text-generation-webui/modules/callbacks.py", line 73, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "/Volumes/cuttingedge/large_lang_models/oobabooga_macos/text-generation-webui/modules/text_generation.py", line 263, in generate_with_callback
    shared.model.generate(**kwargs)
  File "/Volumes/cuttingedge/large_lang_models/oobabooga_macos/installer_files/env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Volumes/cuttingedge/large_lang_models/oobabooga_macos/installer_files/env/lib/python3.10/site-packages/transformers/generation/utils.py", line 1485, in generate
    return self.sample(
  File "/Volumes/cuttingedge/large_lang_models/oobabooga_macos/installer_files/env/lib/python3.10/site-packages/transformers/generation/utils.py", line 2524, in sample
    outputs = self(
  File "/Volumes/cuttingedge/large_lang_models/oobabooga_macos/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/knut/.cache/huggingface/modules/transformers_modules/mosaicml_mpt-7b-storywriter/modeling_mpt.py", line 237, in forward
    outputs = self.transformer(input_ids=input_ids, past_key_values=past_key_values, attention_mask=attention_mask, prefix_mask=prefix_mask, sequence_id=sequence_id, return_dict=return_dict, output_attentions=output_attentions, output_hidden_states=output_hidden_states, use_cache=use_cache)
  File "/Volumes/cuttingedge/large_lang_models/oobabooga_macos/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/knut/.cache/huggingface/modules/transformers_modules/mosaicml_mpt-7b-storywriter/modeling_mpt.py", line 183, in forward
    (x, past_key_value) = block(x, past_key_value=past_key_value, attn_bias=attn_bias, attention_mask=attention_mask, is_causal=self.is_causal)
  File "/Volumes/cuttingedge/large_lang_models/oobabooga_macos/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/knut/.cache/huggingface/modules/transformers_modules/mosaicml_mpt-7b-storywriter/blocks.py", line 35, in forward
    a = self.norm_1(x)
  File "/Volumes/cuttingedge/large_lang_models/oobabooga_macos/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/knut/.cache/huggingface/modules/transformers_modules/mosaicml_mpt-7b-storywriter/norm.py", line 24, in forward
    with torch.autocast(enabled=False, device_type=module_device.type):
  File "/Volumes/cuttingedge/large_lang_models/oobabooga_macos/installer_files/env/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 201, in __init__
    raise RuntimeError('User specified autocast device_type must be \'cuda\' or \'cpu\'')
RuntimeError: User specified autocast device_type must be 'cuda' or 'cpu'
Output generated in 0.35 seconds (0.00 tokens/s, 0 tokens, context 72, seed 553060808)
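
From the traceback, the crash originates at line 24 of the model's cached norm.py, where the norm module's forward enters torch.autocast(enabled=False, device_type=module_device.type) around the layer norm. A possible workaround, sketched here under the assumption that the context manager exists only to force autocast off (and autocast is never enabled on MPS in torch 2.0.x anyway), is to edit the cached copy under ~/.cache/huggingface/modules/transformers_modules/mosaicml_mpt-7b-storywriter/norm.py. Note that _maybe_autocast below is a hypothetical helper, not part of the model code:

```python
# Hypothetical patch for the cached norm.py: skip the autocast context
# on device types that torch 2.0.1's autocast does not accept.
import contextlib
import torch

def _maybe_autocast(device_type):
    # torch 2.0.x only accepts 'cuda' or 'cpu' here; on 'mps' we return
    # a no-op context instead of crashing in autocast.__init__.
    if device_type in ('cuda', 'cpu'):
        return torch.autocast(enabled=False, device_type=device_type)
    return contextlib.nullcontext()

# In the norm's forward (norm.py line 24), replace:
#     with torch.autocast(enabled=False, device_type=module_device.type):
# with:
#     with _maybe_autocast(module_device.type):
```

This only removes the construction-time RuntimeError; whether MPS generation is then correct and fast is a separate question.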

There are some community resources for using this model in oobabooga, such as https://youtu.be/QVVb6Md6huA; hopefully that helps.

sam-mosaic changed discussion status to closed

@sam-mosaic Thanks, but it didn't help at all.
