Failed to run in AWS SageMaker

#9
by fangleen - opened

Hi,
I ran the script from the Deploy menu above in AWS SageMaker, but after a while it failed with an OOM error. The same thing happened on both ml.g5.2xlarge and ml.g5.12xlarge. Is this an AWS environment problem? Has anyone else run into this issue?

Thanks,

Error from CloudWatch:

torch.cuda.OutOfMemoryError: Allocation on device 0 would exceed allowed memory. (out of memory)
2023-09-06T15:39:42.570+08:00 Currently allocated : 21.13 GiB
2023-09-06T15:39:42.570+08:00 Requested : 150.00 MiB
2023-09-06T15:39:42.570+08:00 Device limit : 22.20 GiB
2023-09-06T15:39:42.570+08:00 Free (according to CUDA) : 25.12 MiB
2023-09-06T15:39:45.076+08:00 PyTorch limit (set by user-supplied memory fraction) : 17179869184.00 GiB

NousResearch org

You don't have a big enough GPU.

Thanks for your help @teknium. I'm using ml.g5.12xlarge; isn't that enough? I can run Meta's Llama 2 13B on that instance.

The deploy code suggests using 2xlarge:

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
)
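
For context, this is roughly the full snippet I ran, adapted for the bigger instance. The model ID and role below are placeholders, and I'm assuming the standard TGI-based deploy script that sets SM_NUM_GPUS; the generated code may differ slightly:

import json
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# Placeholders -- substitute the actual Hub repo ID and SageMaker execution role.
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface"),
    env={
        "HF_MODEL_ID": "<hub-repo-id>",
        "SM_NUM_GPUS": json.dumps(4),  # ml.g5.12xlarge has 4x A10G (24 GB each)
    },
    role="<sagemaker-execution-role>",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
    container_startup_health_check_timeout=300,
)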

NousResearch org


To fit it on a 24 GB GPU, either set

device_map="auto"

or pip install bitsandbytes and use load_in_8bit=True or load_in_4bit=True.

All of these are LlamaForCausalLM.from_pretrained args, e.g.:

import torch
from transformers import LlamaForCausalLM

# device_map="auto" spreads the weights across all visible GPUs (and CPU if needed)
self.model = LlamaForCausalLM.from_pretrained(
    "./openhermes13b/",
    torch_dtype=torch.float16,
    device_map="auto",
    # load_in_8bit=True,  # requires bitsandbytes
)
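
If 8-bit still doesn't fit, the 4-bit path is the same call with load_in_4bit instead. A minimal sketch, assuming a recent transformers release with bitsandbytes installed:

import torch
from transformers import LlamaForCausalLM

# 4-bit quantization via bitsandbytes cuts weight memory to roughly a quarter of fp16
model = LlamaForCausalLM.from_pretrained(
    "./openhermes13b/",
    device_map="auto",
    load_in_4bit=True,
)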

Thanks a lot.
