metadata

tags:
  - Text-to-Video

zeroscope_v2_dark 30x448x256

A watermark-free Modelscope-based video model optimized for producing high-quality 16:9 compositions with varying brightness and smooth video output. This model was trained on 9,923 clips and 29,769 tagged frames at 30 frames per clip and 448x256 resolution.
zeroscope_v2_dark 30x448x256 is designed to be upscaled with Potat1 using vid2vid in the 1111 text2video extension by kabachuha. Using this model as a first pass gives better overall compositions at higher resolutions in Potat1 and allows faster exploration at 448x256 before committing to a high-resolution render.

Using it with the 1111 text2video extension

Rename the file 'zeroscope_v2_dark_30x448x256.pth' to 'text2video_pytorch_model.pth'.
Rename the file 'zeroscope_v2_dark_30x448x256_text.bin' to 'open_clip_pytorch_model.bin'.
Replace the respective files in the 'stable-diffusion-webui\models\ModelScope\t2v' directory.
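
The rename-and-replace steps above can also be scripted. The following is a minimal sketch, assuming both downloaded files sit in one folder. The function name `install_zeroscope` and the example paths are hypothetical; only the file names and the 'models\ModelScope\t2v' target come from the steps above. It copies rather than renames, so the downloaded files stay intact.

```python
import shutil
from pathlib import Path

# Downloaded file name -> file name the extension expects (taken from the steps above).
RENAMES = {
    "zeroscope_v2_dark_30x448x256.pth": "text2video_pytorch_model.pth",
    "zeroscope_v2_dark_30x448x256_text.bin": "open_clip_pytorch_model.bin",
}


def install_zeroscope(download_dir, webui_root):
    """Copy the model files into the t2v folder under the expected names.

    download_dir and webui_root are assumptions about the local setup.
    """
    target_dir = Path(webui_root) / "models" / "ModelScope" / "t2v"
    target_dir.mkdir(parents=True, exist_ok=True)
    installed = []
    for source_name, target_name in RENAMES.items():
        source = Path(download_dir) / source_name
        if not source.is_file():
            raise FileNotFoundError(f"Missing download: {source}")
        # Existing files with the same name are overwritten; back them up first if needed.
        installed.append(Path(shutil.copy2(source, target_dir / target_name)))
    return installed
```

A call might look like `install_zeroscope("C:/Downloads", "C:/stable-diffusion-webui")`, with both paths adjusted to the local installation.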

Upscaling recommendations

For upscaling, it's recommended to use Potat1 via vid2vid in the 1111 extension. Aim for a resolution of 1152x640 and a denoise strength between 0.66 and 0.85. Remember to use the same prompt and settings that were used to generate the original clip.
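
As a rough illustration of the recommendations above, the sketch below checks a set of upscale settings against them. The helper names and the idea of passing settings as plain numbers are hypothetical; this is not the extension's API, only a way to keep the recommended values in one place.

```python
SOURCE_SIZE = (448, 256)      # resolution this model generates at
TARGET_SIZE = (1152, 640)     # recommended Potat1 upscale resolution
DENOISE_RANGE = (0.66, 0.85)  # recommended denoise strength range


def check_upscale_settings(width, height, denoise_strength):
    """Return a list of warnings for settings outside the recommendations."""
    warnings = []
    if (width, height) != TARGET_SIZE:
        warnings.append(
            f"resolution {width}x{height} differs from the recommended "
            f"{TARGET_SIZE[0]}x{TARGET_SIZE[1]}"
        )
    low, high = DENOISE_RANGE
    if not low <= denoise_strength <= high:
        warnings.append(
            f"denoise strength {denoise_strength} is outside {low} to {high}"
        )
    return warnings


def scale_factors():
    """Return the (width, height) scale factors from source to target size."""
    return (
        TARGET_SIZE[0] / SOURCE_SIZE[0],
        TARGET_SIZE[1] / SOURCE_SIZE[1],
    )
```

Note that the width scales by roughly 2.57 and the height by 2.5, so the aspect ratio shifts slightly, from 1.75 at 448x256 to 1.8 at 1152x640.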

Known issues

Generating at resolutions below 448x256 or with fewer than 30 frames can degrade output quality.
Some clips may contain cuts. This will be fixed in the upcoming 2.1 version, which will incorporate a cleaner dataset.
Some clips may play back too slowly and need prompt engineering to increase the pace.