Unable to load Bloom on an EC2 instance
#99, opened by viniciusguimaraes
Hi everyone. I am trying to load Bloom-175B on an x2iezn.6xlarge (specs below), but it hangs in the BloomForCausalLM.from_pretrained() call. Using faulthandler's dump_traceback_later method, I was able to narrow down the exact method where the code stops (attached image), but I'm still trying to understand why it happens. The line in PyTorch where it seems to get stuck is
storage = zip_file.get_storage_from_record(name, numel, torch._UntypedStorage).storage()._untyped()
Has anyone had a similar problem and was able to solve it?
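For anyone who wants to reproduce the diagnosis described above, here is a minimal sketch of the faulthandler approach. The helper name `run_with_hang_dump`, the log file name, and the 60-second interval are hypothetical choices for illustration, not taken from the original script.

```python
import faulthandler


def run_with_hang_dump(load_fn, log_path="hang_traceback.log", timeout_s=60.0):
    """Run load_fn(); while it runs, dump all thread tracebacks to log_path
    every timeout_s seconds so a hang can be located afterwards.

    load_fn is any zero-argument callable (hypothetical wrapper around the
    real loading call).
    """
    with open(log_path, "w") as log:
        # repeat=True keeps dumping at each interval until cancelled.
        faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log)
        try:
            return load_fn()
        finally:
            # Cancel before the log file is closed.
            faulthandler.cancel_dump_traceback_later()
```

It could be used as `run_with_hang_dump(lambda: BloomForCausalLM.from_pretrained("bigscience/bloom"))`; if the call hangs, the most recent traceback in the log shows the frame where each thread is stopped.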
x2iezn.6xlarge specs:
- 768 GB RAM
- 24 vCPUs
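As a rough sanity check on the specs above, the memory needed just to hold the weights can be estimated from the parameter count and the bytes per parameter. The sketch below assumes roughly 176 billion parameters (an approximate figure for the full BLOOM model) and ignores any temporary copies made during loading, so treat the numbers as a lower bound rather than a diagnosis.

```python
def weights_size_gb(n_params, bytes_per_param):
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


# Assumption: approximate parameter count of the full BLOOM model.
N_PARAMS = 176e9

for dtype_name, nbytes in (("float32", 4), ("bfloat16", 2)):
    size = weights_size_gb(N_PARAMS, nbytes)
    print(f"{dtype_name}: about {size:.0f} GB for the weights alone")
```

If the weights end up in float32, the estimate is close to the 768 GB available on this instance, which may be worth ruling out as a cause of very slow or stalled loading.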
Hi @viniciusguimaraes. As an alternative, you could download the model first and then load it from the downloaded folder, as follows:
- Download model:
git lfs install
git clone https://huggingface.co/bigscience/bloom
- Use model:
from transformers import AutoModel
model = AutoModel.from_pretrained("<your_downloaded_folder>/bloom")
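Before loading from the cloned folder, it may be worth confirming that the weight files were actually fetched. If Git LFS is not set up when the repository is cloned, large files can be left as small text pointer files instead of real data. The sketch below scans a folder for such pointers; the helper name `find_lfs_pointers` is hypothetical, and the check relies only on the header line that Git LFS pointer files start with.

```python
from pathlib import Path

# First line of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(folder):
    """Return files under folder that are still Git LFS pointers (not real data)."""
    root = Path(folder)
    pointers = []
    for path in sorted(root.rglob("*")):
        # Skip directories and Git's own metadata.
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        with path.open("rb") as fh:
            head = fh.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            pointers.append(path)
    return pointers
```

An empty result from `find_lfs_pointers("<your_downloaded_folder>/bloom")` suggests the checkpoint shards were downloaded; any paths returned are files that would still need to be fetched, for example with `git lfs pull`.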