Modifying and controlling the size and aspect ratio of an image
Hello community,
I'm trying to generate images with aspect ratios other than 1:1, such as 16:9, 4:3, 3:5, etc. By default the output is always square, and despite my attempts to adjust the parameters, I haven't managed to change that. Below are the parameters I've considered (a small sketch of the exact sizes I'm aiming for follows the list). How can I configure them correctly?
#️⃣ Numeric parameters
strength = 0.3
num_inference_steps = 50
denoising_start = 0.0
denoising_end = 1.0
guidance_scale = 7.5
num_images_per_prompt = 1
eta = 0.0
guidance_rescale = 0.0
aesthetic_score = 6.0
negative_aesthetic_score = 2.5
clip_skip = 0
#️⃣ String parameters
prompt = "Something" # A valid prompt must be provided
prompt_2 = prompt  # Uses the value of prompt if not defined
negative_prompt = "Something" # None unless using negative guidance
negative_prompt_2 = negative_prompt  # Uses the value of negative_prompt if not defined
output_type = "pil"
target_size = (1280, 720)
negative_target_size = (3840, 2160)
#️⃣ Parameters that default to matching the target size
original_size = (1024, 768) # We don't need it as we're not cropping from an existing image.
crops_coords_top_left = (0, 0) # We don't need it as we're not cropping from an existing image.
negative_original_size = (1024, 1024) # We don't need it as we're not cropping from an existing image.
negative_crops_coords_top_left = (0, 0) # We don't need it as we're not cropping from an existing image.
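#️⃣ Target dimensions I'm aiming for (helper of my own, purely illustrative)
For reference, here is the kind of (height, width) pair I'm hoping to end up with for each ratio. This helper is not part of diffusers; it just assumes a pixel budget of roughly 1024x1024, which I understand is about what SDXL was trained at, and rounds the sides to multiples of 8.

def dims_for_ratio(ratio_w, ratio_h, pixel_budget=1024 * 1024, multiple=8):
    # Scale the ratio to roughly `pixel_budget` pixels, then round each side to a multiple of 8.
    scale = (pixel_budget / (ratio_w * ratio_h)) ** 0.5
    height = int(round(ratio_h * scale / multiple)) * multiple
    width = int(round(ratio_w * scale / multiple)) * multiple
    return height, width

for ratio in [(16, 9), (4, 3), (3, 5)]:
    print(ratio, "->", dims_for_ratio(*ratio))  # e.g. (16, 9) -> (768, 1368)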
Thank you very much for your guidance. For context, these parameters live in one notebook cell so I can easily tweak them, and a second cell then runs the model. Here is that second cell:
#️⃣ Running the model
from diffusers import StableDiffusionXLPipeline

# Load the SDXL base model and move it to the GPU
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch_dtype
)
pipe = pipe.to("cuda")
pipe.safety_checker = None

# Run text-to-image generation with the parameters defined in the previous cell
generated_image = pipe(
    prompt=prompt,
    prompt_2=prompt_2,
    strength=strength,
    num_inference_steps=num_inference_steps,
    denoising_start=denoising_start,
    denoising_end=denoising_end,
    guidance_scale=guidance_scale,
    negative_prompt=negative_prompt,
    negative_prompt_2=negative_prompt_2,
    num_images_per_prompt=num_images_per_prompt,
    eta=eta,
    output_type=output_type,
    guidance_rescale=guidance_rescale,
    original_size=original_size,
    crops_coords_top_left=crops_coords_top_left,
    target_size=target_size,
    negative_original_size=negative_original_size,
    negative_crops_coords_top_left=negative_crops_coords_top_left,
    negative_target_size=negative_target_size,
    aesthetic_score=aesthetic_score,
    negative_aesthetic_score=negative_aesthetic_score,
    clip_skip=clip_skip,
).images[0]
#️⃣ Display the image
display(generated_image)
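#️⃣ Checking the output size
For completeness, this is how I'm confirming the result is square, just by reading the PIL image's size attribute; it always comes back 1:1 for me (I believe 1024x1024 is the pipeline's default sample size).

width, height = generated_image.size  # PIL reports (width, height)
print(f"output size: {width}x{height}, aspect ratio: {width / height:.2f}")  # always 1.00 for me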