NVML_SUCCESS == r INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":838, please report a bug to PyTorch.
#26 · opened by John6666
Sorry to keep posting; I'm reporting another unusual error.
Note that the error itself was present before the last PR, so that fix should not be the culprit.
The same code works in someone else's Space, so I hope it is simply a lack of VRAM...
I can't tell whether it's a problem with my code, a conflict with some library, a version-dependent bug in an HF library or PyTorch, one of the bugs currently affecting the HF site, or a model error that only occurs under certain conditions.
Normal inference and various other operations with FluxPipeline run without error; the error occurs only when execution reaches the inference part of the code block below.
Error Log
NVML_SUCCESS == r INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":838, please report a bug to PyTorch.
Traceback (most recent call last):
File "/home/user/app/app.py", line 142, in generate_image
image = pipe(
File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/diffusers/pipelines/flux/pipeline_flux_controlnet.py", line 711, in __call__
) = self.encode_prompt(
File "/usr/local/lib/python3.10/site-packages/diffusers/pipelines/flux/pipeline_flux_controlnet.py", line 370, in encode_prompt
prompt_embeds = self._get_t5_prompt_embeds(
File "/usr/local/lib/python3.10/site-packages/diffusers/pipelines/flux/pipeline_flux_controlnet.py", line 256, in _get_t5_prompt_embeds
prompt_embeds = self.text_encoder_2(text_input_ids.to(device), output_hidden_states=False)[0]
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 1971, in forward
encoder_outputs = self.encoder(
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 1106, in forward
layer_outputs = layer_module(
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 686, in forward
self_attention_outputs = self.layer[0](
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 593, in forward
attention_output = self.SelfAttention(
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 553, in forward
attn_weights = nn.functional.softmax(scores.float(), dim=-1).type_as(
RuntimeError: NVML_SUCCESS == r INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":838, please report a bug to PyTorch.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 253, in thread_wrapper
res = future.result()
File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/user/app/app.py", line 156, in generate_image
raise gr.Error(f"Inference Error: {e}")
gradio.exceptions.Error: 'Inference Error: NVML_SUCCESS == r INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":838, please report a bug to PyTorch. '
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 536, in process_events
response = await route_utils.call_process_api(
File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 285, in call_process_api
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1923, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1508, in call_function
prediction = await anyio.to_thread.run_sync( # type: ignore
File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 818, in wrapper
response = f(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 818, in wrapper
response = f(*args, **kwargs)
File "/home/user/app/app.py", line 201, in run_lora
image = generate_image(prompt_mash, steps, seed, cfg_scale, width, height, lora_scale, cn_on, progress)
File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 214, in gradio_handler
raise res.value
gradio.exceptions.Error: 'Inference Error: NVML_SUCCESS == r INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":838, please report a bug to PyTorch. '
Code Summary
- Loading Part (Summarized code)
import torch  # imports shown here for completeness
from diffusers import FluxControlNetModel, FluxControlNetPipeline, FluxMultiControlNetModel

global controlnet_union  # declared elsewhere in the actual code
global controlnet  # declared elsewhere in the actual code
global pipe  # declared elsewhere in the actual code
repo_id = "camenduru/FLUX.1-dev-diffusers"  # assigned elsewhere in the actual code; this is the value used
controlnet_model_union_repo = "InstantX/FLUX.1-dev-Controlnet-Union"  # assigned elsewhere; this is the value used
dtype = torch.bfloat16  # assigned elsewhere; this is the value used
controlnet_union = FluxControlNetModel.from_pretrained(controlnet_model_union_repo, torch_dtype=dtype)
controlnet = FluxMultiControlNetModel([controlnet_union])
pipe = FluxControlNetPipeline.from_pretrained(repo_id, controlnet=controlnet, torch_dtype=dtype)
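For comparison, the plain FluxPipeline path mentioned above (which runs without error) is loaded along roughly these lines; this is only a sketch reusing the same repo_id and dtype, and pipe_plain is a placeholder name, not the variable in the actual Space.
from diffusers import FluxPipeline
# The non-ControlNet pipeline; inference through this path completes normally.
pipe_plain = FluxPipeline.from_pretrained(repo_id, torch_dtype=dtype)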
- Inference Part (Actual code)
if controlnet is not None: controlnet.to("cuda") # Crashes here.
if controlnet_union is not None: controlnet_union.to("cuda") # Crashes here.
image = pipe( # Without the above statement, it would crash here.
prompt=prompt_mash,
control_image=images,
control_mode=modes,
num_inference_steps=steps,
guidance_scale=cfg_scale,
width=width,
height=height,
controlnet_conditioning_scale=scales,
generator=generator,
joint_attention_kwargs={"scale": lora_scale},
).images[0]
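For context, the traceback runs through spaces/zero/wrappers.py, so the inference part above executes inside a ZeroGPU worker. The sketch below shows the assumed surrounding structure, not the exact app.py: the @spaces.GPU decorator, the generator setup, and the try/except that re-raises as gr.Error are inferred from the traceback, while images, modes, and scales are the control inputs prepared earlier in the real code.
import spaces
import torch
import gradio as gr

@spaces.GPU  # ZeroGPU attaches a CUDA device only while this function runs
def generate_image(prompt_mash, steps, seed, cfg_scale, width, height, lora_scale, cn_on, progress):
    try:
        generator = torch.Generator(device="cuda").manual_seed(seed)
        if controlnet is not None: controlnet.to("cuda")  # crash site from the log above
        if controlnet_union is not None: controlnet_union.to("cuda")
        image = pipe(
            prompt=prompt_mash,
            control_image=images,  # control inputs prepared earlier (see the real code above)
            control_mode=modes,
            num_inference_steps=steps,
            guidance_scale=cfg_scale,
            width=width,
            height=height,
            controlnet_conditioning_scale=scales,
            generator=generator,
            joint_attention_kwargs={"scale": lora_scale},
        ).images[0]
        return image
    except Exception as e:
        raise gr.Error(f"Inference Error: {e}")  # app.py line 156 in the traceback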
Dependencies
spaces
git+https://github.com/huggingface/diffusers
torch
torchvision
huggingface_hub
accelerate
transformers
peft
sentencepiece
timm
einops
controlnet-aux
kornia
numpy
opencv-python
deepspeed
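None of these requirements are pinned and diffusers is installed from git main, so the resolved versions can change between rebuilds. To check the version-dependence hypothesis above, a minimal startup log along these lines would record what actually got installed (a sketch, not code that is in the Space):
import importlib.metadata as metadata
import torch

# Print the resolved package versions once at startup to pin down version-dependent behavior.
for pkg in ("torch", "torchvision", "diffusers", "transformers", "accelerate", "gradio", "spaces"):
    try:
        print(pkg, metadata.version(pkg))
    except metadata.PackageNotFoundError:
        print(pkg, "not installed")
print("CUDA runtime built into torch:", torch.version.cuda)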
Actual Space
https://huggingface.co/spaces/John6666/flux-lora-the-explorer