I get the error message below when the number of input tokens exceeds 2000. I am using an ml.g4dn.8xlarge instance (128 GiB of system RAM; its single NVIDIA T4 GPU has about 16 GiB of memory).
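For context, the pipeline is set up along these lines (simplified; the checkpoint name and loading arguments here are approximate, and only the final pipe(...) call is the exact one shown in the traceback):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Checkpoint name is approximate; any Mistral-architecture model goes through the same code path.
model_id = "mistralai/Mistral-7B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision so the 7B weights fit on the 16 GiB T4
    device_map="auto",          # accelerate hooks from this show up in the traceback below
)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

prompt = "..."  # anything past roughly 2000 input tokens triggers the error
outputs = pipe(prompt, max_new_tokens=1000, do_sample=True,
               temperature=0.001, top_k=50, top_p=0.95)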
Unexpected exception formatting exception. Falling back to standard exception
Traceback (most recent call last):
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3526, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "/tmp/ipykernel_10963/2260344357.py", line 30, in
outputs = pipe(prompt, max_new_tokens=1000, do_sample=True, temperature=0.001, top_k=50, top_p=0.95)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/pipelines/text_generation.py", line 208, in call
return super().call(text_inputs, **kwargs)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1140, in call
return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1147, in run_single
model_outputs = self.forward(model_inputs, **forward_params)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1046, in forward
model_outputs = self._forward(model_inputs, **forward_params)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/pipelines/text_generation.py", line 271, in _forward
generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/generation/utils.py", line 1777, in generate
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/generation/utils.py", line 2874, in sample
next_token_scores = logits_processor(input_ids, next_token_logits)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 1055, in forward
attention_mask=attention_mask,
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 940, in forward
attention_mask=attention_mask,
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 665, in forward
attention_mask=attention_mask,
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 300, in forward
attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query_states.dtype)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/torch/nn/functional.py", line 1858, in softmax
ret = input.softmax(dim, dtype=dtype)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.08 GiB. GPU 0 has a total capacty of 14.75 GiB of which 921.06 MiB is free. Including non-PyTorch memory, this process has 13.85 GiB memory in use. Of the allocated memory 13.22 GiB is allocated by PyTorch, and 518.11 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 2120, in showtraceback
stb = self.InteractiveTB.structured_traceback(
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/IPython/core/ultratb.py", line 1435, in structured_traceback
return FormattedTB.structured_traceback(
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/IPython/core/ultratb.py", line 1326, in structured_traceback
return VerboseTB.structured_traceback(
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/IPython/core/ultratb.py", line 1173, in structured_traceback
formatted_exception = self.format_exception_as_a_whole(etype, evalue, etb, number_of_lines_of_context,
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/IPython/core/ultratb.py", line 1088, in format_exception_as_a_whole
frames.append(self.format_record(record))
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/IPython/core/ultratb.py", line 970, in format_record
frame_info.lines, Colors, self.has_colors, lvals
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/IPython/core/ultratb.py", line 792, in lines
return self._sd.lines
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/stack_data/utils.py", line 144, in cached_property_wrapper
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/stack_data/core.py", line 734, in lines
pieces = self.included_pieces
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/stack_data/utils.py", line 144, in cached_property_wrapper
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/stack_data/core.py", line 681, in included_pieces
pos = scope_pieces.index(self.executing_piece)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/stack_data/utils.py", line 144, in cached_property_wrapper
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/stack_data/core.py", line 660, in executing_piece
return only(
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/executing/executing.py", line 190, in only
raise NotOneValueFound('Expected one value, found 0')
executing.executing.NotOneValueFound: Expected one value, found 0
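The first error suggests setting max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF. Is that the right direction here, or do I need to change how the model is loaded? As far as I understand, the allocator setting would have to be applied before the model is moved to the GPU, roughly like this (the 128 MiB value is just an example, not something I have verified helps):

import os

# Must be set before the first CUDA allocation, i.e. before loading the model onto the GPU.
# 128 MiB is an example split size taken from common suggestions, not a verified fix.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"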