Can this VAE be used in Stable Video Diffusion?
I am trying to run Stable Video Diffusion with the code below, but I get all-black videos (every frame is black). It seems to be caused by the fp16 format. After trying your VAE instead, I got this error: `RuntimeError: Input type (float) and bias type (c10::Half) should be the same`. Any suggestions for how to fix this?
```python
import torch
from diffusers import StableVideoDiffusionPipeline, AutoencoderKL
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid", torch_dtype=torch.float16, variant="fp16"
)
pipe.to("cuda")

# Load the conditioning image
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png")
image = image.resize((1024, 576))

generator = torch.manual_seed(42)

# Generate the frames, then write them out
frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0]
export_to_video(frames, "generated.mp4", fps=7)
```
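As a side note, the `Input type (float) and bias type (c10::Half) should be the same` error is a generic PyTorch dtype mismatch: the module's weights are in one precision while the input tensor is in another. A minimal sketch in plain PyTorch (using float64 vs. float32 so it runs on CPU; with the real pipeline the mismatch is float32 input vs. float16 weights, and the `nn.Linear` here is just a stand-in for the VAE's layers):

```python
import torch
from torch import nn

# Stand-in for a model whose weights are in a different precision
# than the incoming tensor (float64 here so the sketch runs on CPU).
model = nn.Linear(4, 2).to(torch.float64)
x = torch.randn(1, 4)  # float32 input

try:
    model(x)  # dtype mismatch between input and weights
except RuntimeError as e:
    print(f"mismatch: {e}")

# Fix: cast the input to the module's parameter dtype before the forward pass
x = x.to(next(model.parameters()).dtype)
out = model(x)
print(out.dtype)
```

The same idea applies to the pipeline: whatever tensor reaches the VAE must match the VAE's parameter dtype, so mixing an fp32 VAE into an otherwise fp16 pipeline (or vice versa) triggers exactly this error.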
`sdxl-vae-fp16-fix` cannot be used with SVD: SVD uses the Stable Diffusion 1/2 latent space (see code, paper), whereas `sdxl-vae-fp16-fix` uses the SDXL latent space, and the SD1/2 and SDXL latent spaces are not compatible.
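To make the incompatibility concrete: although both VAEs produce 4-channel latents of the same shape, the encoder/decoder weights are entirely different networks, and even the published `scaling_factor` config values disagree (0.18215 for SD 1.x/2.x, 0.13025 for the SDXL VAE — values taken from the respective VAE configs on the Hub):

```python
# scaling_factor values from the respective VAE configs on the Hub
sd_scale = 0.18215    # SD 1.x / 2.x VAE (the latent space SVD uses)
sdxl_scale = 0.13025  # SDXL VAE (the space sdxl-vae-fp16-fix lives in)

# Feeding SDXL-space latents to an SD-space decoder (or vice versa) would
# mis-scale every latent by ~40% -- before the learned weights even differ.
print(sd_scale / sdxl_scale)
```

So swapping in any SDXL-family VAE, fp16-fixed or not, would produce garbage even if the dtype error were resolved.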
Hopefully the stabilityai/stable-video-diffusion-img2vid discussion thread can find a solution to the issue you're encountering.
Thank you for the info!