How is your VAE decoder different from Comfy's Mochi VAE?

#4
by Hesajon - opened

Yours is listed as bf16 (725 MB), while theirs only lists a size of 920 MB. Do you know what accounts for the difference?

Comfy packed the encoder and the decoder into the same file.
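
To check what a checkpoint actually contains (encoder weights, decoder weights, and their dtypes), you can read the `.safetensors` header without loading any tensors. The file starts with an 8-byte little-endian header length followed by a JSON header mapping each tensor name to its `dtype`, `shape` and `data_offsets`. This is a minimal sketch: the function names are hypothetical, and the `encoder.` / `decoder.` key prefixes used in the demo are assumptions, since the real key names in either file may differ.

```python
import io
import json
import struct
from collections import defaultdict
from typing import BinaryIO, Dict, Tuple


def read_safetensors_header(f: BinaryIO) -> dict:
    """A .safetensors file starts with an 8-byte little-endian header length, then a JSON header."""
    (header_len,) = struct.unpack("<Q", f.read(8))
    return json.loads(f.read(header_len))


def bytes_by_prefix_and_dtype(header: dict) -> Dict[Tuple[str, str], int]:
    """Sum tensor bytes per (first name component, dtype), e.g. ('decoder', 'BF16')."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata, not a tensor
            continue
        begin, end = info["data_offsets"]
        totals[(name.split(".", 1)[0], info["dtype"])] += end - begin
    return dict(totals)


def fake_safetensors(tensors: Dict[str, Tuple[str, list, int]]) -> io.BytesIO:
    """Build a tiny in-memory file (name -> (dtype, shape, nbytes)) to demonstrate the layout."""
    header, offset = {}, 0
    for name, (dtype, shape, nbytes) in tensors.items():
        header[name] = {"dtype": dtype, "shape": shape, "data_offsets": [offset, offset + nbytes]}
        offset += nbytes
    blob = json.dumps(header).encode("utf-8")
    return io.BytesIO(struct.pack("<Q", len(blob)) + blob + bytes(offset))


# Hypothetical demo file: one encoder tensor and one decoder tensor, both fp16.
demo = fake_safetensors({
    "encoder.conv_in.weight": ("F16", [4, 2], 16),
    "decoder.conv_out.weight": ("F16", [2, 3], 12),
})
summary = bytes_by_prefix_and_dtype(read_safetensors_header(demo))
print(summary)
```

On a real file you would open it with `open(path, "rb")` and pass the handle to `read_safetensors_header`; a file that bundles the encoder shows up as an extra prefix group, which would explain a larger size at the same dtype.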

That's the only difference? Are they both bf16 decoders?

Looking closer, they went with fp16. I don't know why, but I tested both side by side and saw no difference.
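
Both fp16 and bf16 store each weight in 2 bytes, so the choice between them does not change the file size; the two formats only split the 16 bits differently (fp16: 5 exponent and 10 mantissa bits; bf16: 8 exponent and 7 mantissa bits). The pure-Python sketch below emulates both roundings to show the tradeoff: fp16 is more precise near 1.0 but overflows above 65504, while bf16 keeps float32's range at lower precision. It is only an illustration of the number formats, not of how either VAE file was produced.

```python
import struct


def to_fp16(x: float) -> float:
    """Round to IEEE 754 half precision via struct's 'e' format (raises OverflowError if too large)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fits_fp16(x: float) -> bool:
    """True if x can be represented in fp16 without overflowing."""
    try:
        to_fp16(x)
        return True
    except OverflowError:
        return False


def to_bf16(x: float) -> float:
    """Round to bfloat16: keep the top 16 bits of the float32 pattern, round-to-nearest-even."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


print("bytes per fp16 value:", struct.calcsize("<e"))
print("0.1 ->", to_fp16(0.1), "(fp16),", to_bf16(0.1), "(bf16)")
print("1e5 fits in fp16:", fits_fp16(1e5), "| bf16 value:", to_bf16(1e5))
```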

Some people are reporting that the Comfy VAE gives higher quality than yours: a cleaner, crisper image. Do you know why that might be?
https://www.reddit.com/r/comfyui/comments/1gkb9y1/comment/lvroge1/

[Image: run-mochi-in-comfyui-with-consumer-gpu-v0-ujb8vhp33czd1.webp]

It might be that your node tiles differently from Comfy's. Do you know what settings they use for their VAE decode?

This is because Comfy does the unnormalization in the sampler, while I do it in the VAE decoder. So if you generate with Comfy and decode with my node, it gets done twice, which ruins the quality. For this reason I have since added an option to toggle it off in my VAE decode node.
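
A minimal sketch of why applying the step twice hurts, assuming unnormalization is a per-channel affine map `z * std + mean`. The statistics below are made-up illustrative values (not Mochi's real ones), and `prepare_for_decode` is a hypothetical stand-in for the decoder-side toggle, not the node's actual code. Applying the map to an already-unnormalized latent rescales and shifts every channel a second time.

```python
from typing import List

Latent = List[List[float]]  # channels x values (flattened spatial/temporal positions)

# Hypothetical per-channel statistics, for illustration only (not Mochi's real values).
LATENT_MEAN = [0.1, -0.2, 0.05]
LATENT_STD = [0.9, 1.1, 0.8]


def unnormalize(latent: Latent) -> Latent:
    """Map a normalized latent back to the VAE's native scale: z * std + mean, per channel."""
    return [[v * s + m for v in channel]
            for channel, m, s in zip(latent, LATENT_MEAN, LATENT_STD)]


def prepare_for_decode(latent: Latent, unnormalize_in_decoder: bool) -> Latent:
    """Hypothetical decoder-side toggle: skip the step if the sampler already applied it."""
    return unnormalize(latent) if unnormalize_in_decoder else latent


# Sampler-side unnormalization: the latent reaching the decoder is already unnormalized.
normalized = [[1.0, -1.0], [0.5, 0.0], [2.0, 1.0]]
from_sampler = unnormalize(normalized)

correct = prepare_for_decode(from_sampler, unnormalize_in_decoder=False)
doubled = prepare_for_decode(from_sampler, unnormalize_in_decoder=True)
print("correct:", correct)
print("doubled:", doubled)
```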

These differences show up with unnormalization set to "false" in your node, so it shouldn't be doing it twice, right?

I did try changing the tile sample size to 256 x 256 and the overlap to 0.25 on width and height, and I get quality similar to Comfy's, without the shadowing/ghosting. So maybe that is the difference?

I mean, with tiling disabled the output is identical. I know tiling reduces quality, but it's currently the only way to decode more frames at all.

To be clear: this has nothing to do with the model itself, just the tiling code.

Yeah, I'm just trying to reproduce the quality of the tiling that Comfy's VAE Decode uses.

Setting "lossless" to true in the save node also helps quality a lot. By default, Comfy's workflow saves the WebP as lossy at 80% quality, which introduces many artifacts.
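
The effect of the save setting can be checked outside ComfyUI with Pillow, whose WebP writer accepts `lossless` and `quality` options. This sketch round-trips a synthetic, high-frequency test frame (an assumption standing in for a decoded video frame) through both settings; it uses Pillow directly and is not the Comfy save node's code. It requires a Pillow build with WebP support.

```python
import io

from PIL import Image


def make_test_frame(width: int = 64, height: int = 64) -> Image.Image:
    """Deterministic, high-frequency RGB pattern: the kind of detail lossy encoding smears."""
    data = bytes(
        value
        for y in range(height)
        for x in range(width)
        for value in ((x * 37 + y * 11) % 256, (x * 5 + y * 53) % 256, (x * y) % 256)
    )
    return Image.frombytes("RGB", (width, height), data)


def webp_roundtrip(img: Image.Image, **save_kwargs) -> Image.Image:
    """Encode to WebP in memory with the given Pillow options, then decode it again."""
    buf = io.BytesIO()
    img.save(buf, format="WEBP", **save_kwargs)
    buf.seek(0)
    return Image.open(buf).convert("RGB")


frame = make_test_frame()
lossy = webp_roundtrip(frame, quality=80)  # lossy at quality 80, like the workflow default
lossless = webp_roundtrip(frame, lossless=True)

print("lossy identical to input:", lossy.tobytes() == frame.tobytes())
print("lossless identical to input:", lossless.tobytes() == frame.tobytes())
```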

I think the tile overlap was what caused the difference in quality; Comfy's implementation seems to use a larger overlap. Changing the overlap to 0.25 on width and height seems to produce the same quality.
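
A rough 1-D sketch of why more overlap hides seams, assuming a toy "decoder" that is least accurate at tile borders. `tile_starts`, `tiled_apply` and `edge_artifact_decoder` are hypothetical names; this is neither Comfy's nor the node's tiling code, and real tiled VAE decoders work on 2-D/3-D latents and may blend differently. With overlap, each border sample is cross-faded with a neighbouring tile's interior sample, so the edge error is diluted instead of showing up as a hard seam.

```python
from typing import Callable, List

Signal = List[float]


def tile_starts(length: int, tile: int, overlap: float) -> List[int]:
    """Start offsets for tiles of width `tile` whose neighbours share about overlap * tile samples."""
    stride = max(1, int(tile * (1 - overlap)))
    starts = list(range(0, max(length - tile, 0) + 1, stride))
    if starts[-1] + tile < length:  # make sure the last tile reaches the end
        starts.append(length - tile)
    return starts


def tiled_apply(signal: Signal, fn: Callable[[Signal], Signal], tile: int, overlap: float) -> Signal:
    """Run `fn` tile by tile and cross-fade the overlaps with edge-tapered weights."""
    n = len(signal)
    acc = [0.0] * n
    weight = [0.0] * n
    for s in tile_starts(n, tile, overlap):
        piece = fn(signal[s:s + tile])
        w = len(piece)
        for i, v in enumerate(piece):
            ramp = min(i + 1, w - i)  # 1 at tile edges, larger toward the centre
            acc[s + i] += v * ramp
            weight[s + i] += ramp
    return [a / wt for a, wt in zip(acc, weight)]


def edge_artifact_decoder(piece: Signal) -> Signal:
    """Toy stand-in for a decoder that is worst at tile borders: it halves the edge samples."""
    return [v * 0.5 if i in (0, len(piece) - 1) else v for i, v in enumerate(piece)]


def total_error(overlap: float, n: int = 32, tile: int = 8) -> float:
    """Sum of absolute errors when 'decoding' a flat signal of ones tile by tile."""
    flat = [1.0] * n
    out = tiled_apply(flat, edge_artifact_decoder, tile, overlap)
    return sum(abs(o - 1.0) for o in out)


print("no overlap:", total_error(0.0))
print("0.25 overlap:", total_error(0.25))
```

In this toy setup, going from `overlap=0.0` to `0.25` cuts the total error from 4.0 to about 2.33; the remaining error comes from the two outer edges of the whole signal, which no neighbouring tile can cover.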