File size: 3,113 Bytes
b0ae254
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
Reading metadata...: 2165it [00:00, 13177.23it/s]                     | 0/30000 [00:00<?, ?it/s]
Reading metadata...: 1650it [00:00, 10373.09it/s]

[INFO|trainer_utils.py:744] 2023-11-18 11:48:05,650 >> The following columns in the training set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: input_length. If input_length are not expected by `WhisperForConditionalGeneration.forward`,  you can safely ignore this message.
[WARNING|logging.py:329] 2023-11-18 11:48:08,147 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
Traceback (most recent call last):
  File "/mnt/e/run_speech_recognition_seq2seq_streaming.py", line 679, in <module>
    main()
  File "/mnt/e/run_speech_recognition_seq2seq_streaming.py", line 628, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/transformers/trainer.py", line 1546, in train
    return inner_training_loop(
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/transformers/trainer.py", line 1860, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/transformers/trainer.py", line 2734, in training_step
    self.accelerator.backward(loss)
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/accelerate/accelerator.py", line 1987, in backward
    self.scaler.scale(loss).backward(**kwargs)
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/torch/_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/torch/autograd/function.py", line 288, in apply
    return user_fn(self, *args)
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 288, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 688.00 MiB. GPU 0 has a total capacty of 15.99 GiB of which 0 bytes is free. Including non-PyTorch memory, this process has 17179869184.00 GiB memory in use. Of the allocated memory 13.36 GiB is allocated by PyTorch, and 1.26 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF