How to use the model with a GPU?

#2
by ping40 - opened

I use the following code:

model = T5ForConditionalGeneration.from_pretrained(model_name_or_path)
device=torch.device('cuda')
model.to(device)

and I got the following error:

Traceback (most recent call last):
  File "/root/anaconda3/envs/t5model/lib/python3.9/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/root/anaconda3/envs/t5model/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
  File "/root/anaconda3/envs/t5model/lib/python3.9/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/root/anaconda3/envs/t5model/lib/python3.9/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/root/anaconda3/envs/t5model/lib/python3.9/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/root/anaconda3/envs/t5model/lib/python3.9/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/root/anaconda3/envs/t5model/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/root/anaconda3/envs/t5model/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/root/anaconda3/envs/t5model/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/root/anaconda3/envs/t5model/lib/python3.9/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/root/anaconda3/envs/t5model/lib/python3.9/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/root/anaconda3/envs/t5model/lib/python3.9/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/root/anaconda3/envs/t5model/lib/python3.9/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/root/anaconda3/envs/t5model/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/root/anaconda3/envs/t5model/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/root/anaconda3/envs/t5model/lib/python3.9/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
  File "/root/anaconda3/envs/t5model/lib/python3.9/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/root/anaconda3/envs/t5model/lib/python3.9/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/app3/source/service_app.py", line 43, in summary
    result = model.extract(src_text)
  File "/app3/source/t5.py", line 18, in extract
    generated_tokens = self.model.generate(**input_ids)
  File "/root/anaconda3/envs/t5model/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/t5model/lib/python3.9/site-packages/transformers/generation/utils.py", line 1597, in generate
    model_kwargs = self._prepare_encoder_decoder_kwargs_for_generation(
  File "/root/anaconda3/envs/t5model/lib/python3.9/site-packages/transformers/generation/utils.py", line 523, in _prepare_encoder_decoder_kwargs_for_generation
    model_kwargs["encoder_outputs"]: ModelOutput = encoder(**encoder_kwargs)
  File "/root/anaconda3/envs/t5model/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/anaconda3/envs/t5model/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/t5model/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 1013, in forward
    inputs_embeds = self.embed_tokens(input_ids)
  File "/root/anaconda3/envs/t5model/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/anaconda3/envs/t5model/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/t5model/lib/python3.9/site-packages/torch/nn/modules/sparse.py", line 163, in forward
    return F.embedding(
  File "/root/anaconda3/envs/t5model/lib/python3.9/site-packages/torch/nn/functional.py", line 2264, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)

Any comments are welcome, thanks.

utrobinmv changed discussion status to closed

Sign up or log in to comment