Apply for community grant: Academic project (gpu)
We introduce LlamaGen, a new family of image generation models that apply the original next-token prediction paradigm of large language models to the visual generation domain. It is an affirmative answer to the question of whether vanilla autoregressive models, e.g., Llama, without inductive biases on visual signals, can achieve state-of-the-art image generation performance when scaled properly.
Hi @ShoufaChen , we've assigned ZeroGPU to this Space. Please check the compatibility and usage sections of this page so your Space can run on ZeroGPU.
Hi @hysts , thank you very much for your kind donation.
Our Gradio app encountered a CUDA-related error on ZeroGPU. Could you tell me the difference between our current GPU and ZeroGPU? Our app requires CUDA >= 12.1.
We switched back to the original A100 GPU as a workaround.
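(For reference, a minimal diagnostic sketch, not part of the original app, to compare the two environments by logging the CUDA toolkit the installed PyTorch build targets. On ZeroGPU, anything that actually initializes CUDA has to run inside a @spaces.GPU-decorated function, so only build metadata is read here.)

```python
# Hypothetical diagnostic snippet: log version info without initializing CUDA.
import torch

print("torch:", torch.__version__)
print("built with CUDA:", torch.version.cuda)  # e.g. "12.1"
```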
Thanks for checking. Hmm, not sure what difference caused the error. What error did you get?
I'm sorry, I didn't copy the full log. Could I fork this repo into a separate ZeroGPU Space? Since this demo is very active right now, debugging the problem in this one would significantly affect users.
@ShoufaChen Ah, sorry, I accidentally changed the hardware to L4. (I thought I changed the hardware of my duplicate of your Space, but apparently it wasn't the case). Could you change the hardware back to A100?
No worries, it is back on A100 now.
@ShoufaChen Thanks!
Regarding ZeroGPU, it looks like the following error is raised:
Traceback (most recent call last):
File "/home/user/app/app.py", line 116, in <module>
vq_model, llm, image_size = load_model(args)
File "/home/user/app/app.py", line 46, in load_model
llm = LLM(
File "/home/user/app/serve/llm.py", line 124, in __init__
self.llm_engine = LLMEngine.from_engine_args(
File "/home/user/app/serve/llm_engine.py", line 284, in from_engine_args
engine = cls(
File "/home/user/app/serve/llm_engine.py", line 152, in __init__
self.model_executor = executor_class(
File "/home/user/app/serve/gpu_executor.py", line 42, in __init__
self._init_executor()
File "/home/user/app/serve/gpu_executor.py", line 51, in _init_executor
self._init_non_spec_worker()
File "/home/user/app/serve/gpu_executor.py", line 80, in _init_non_spec_worker
self.driver_worker.init_device()
File "/home/user/app/serve/worker.py", line 102, in init_device
torch.cuda.set_device(self.device)
File "/usr/local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 408, in set_device
torch._C._cuda_setDevice(device)
File "/usr/local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 302, in _lazy_init
torch._C._cuda_init()
File "/usr/local/lib/python3.10/site-packages/spaces/zero/torch.py", line 181, in _cuda_init_raise
raise RuntimeError(
RuntimeError: CUDA must not be initialized in the main process on Spaces with Stateless GPU environment.
You can look at this Stacktrace to find out which part of your code triggered a CUDA init
On ZeroGPU, backend GPUs are shared across multiple ZeroGPU Spaces, and CUDA is only available inside functions decorated with @spaces.GPU.
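In other words, nothing that runs in the main process at import time (such as loading the model onto the GPU or initializing the vllm engine) may touch CUDA. A minimal sketch of the pattern ZeroGPU expects is below; the helper names are illustrative placeholders, not the actual LlamaGen app code:

```python
import gradio as gr
import spaces
import torch

# Load weights on CPU at import time; nothing here may initialize CUDA.
model = load_model_on_cpu()  # hypothetical helper

@spaces.GPU  # a GPU is attached only while this function runs
def generate(prompt: str):
    model.to("cuda")  # CUDA calls are allowed inside the decorated function
    with torch.inference_mode():
        return run_inference(model, prompt)  # hypothetical helper

demo = gr.Interface(fn=generate, inputs="text", outputs="image")
demo.launch()
```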
I remember someone mentioned that vllm was not compatible with ZeroGPU, so I guess that's the reason. Would it be possible to not use vllm for your demo, or to make it runnable on L4? The hardware with the largest VRAM we can grant is ZeroGPU (an A100 with 40 GB of VRAM), and apparently an OOM occurs when running on L4.
@hysts Thank you very much for your kind help.
Yes, our demo can work without vllm. However, it would be 4x slower without vllm, and it needs at least 40 GB of VRAM.
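(A rough sketch of what a vllm-free path could look like, falling back to a plain PyTorch autoregressive sampling loop; the function signature and names below are illustrative placeholders, not the actual LlamaGen code. The lack of batched, paged-attention serving is why this route is noticeably slower than vllm.)

```python
import torch

@torch.inference_mode()
def sample_tokens(llm, cond_tokens, max_new_tokens, temperature=1.0, top_k=1000):
    """Plain autoregressive sampling used when vllm is unavailable.

    `llm` is assumed to map a token sequence to next-token logits of shape
    (batch, seq_len, vocab); this is a sketch, not the optimized vllm engine.
    """
    tokens = cond_tokens
    for _ in range(max_new_tokens):
        logits = llm(tokens)[:, -1, :] / temperature
        if top_k is not None:
            v, _ = torch.topk(logits, top_k)
            logits[logits < v[:, [-1]]] = -float("inf")
        probs = torch.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)
        tokens = torch.cat([tokens, next_token], dim=1)
    return tokens
```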
Thank you for all your help ❤️❤️❤️
@ShoufaChen Thanks, I see. A normal A100 is not available for grants, so it would be nice if you could migrate your Space to ZeroGPU without using vllm, then. (I wonder if there's a workaround to use vllm on ZeroGPU.)
I'll add you to the ZeroGPU explorers org so you can test whether ZeroGPU works for your Space by duplicating it and assigning ZeroGPU to the duplicate yourself. Once you've made your Space runnable on ZeroGPU, you can update the code of this Space and delete the duplicate you used for testing.
Also, I'll remove the L4 grant from this Space, as it only has 24 GB of VRAM and is useless for this Space.
@ShoufaChen Ah, sorry again. Apparently, the hardware was switched back to cpu-basic when we removed the grant. Would you change the hardware back to A100 again?
Done, no worries.