---
license: mit
datasets:
- huggan/anime-faces
pipeline_tag: unconditional-image-generation
tags:
- art
---

# Abstract

**DDPM** model trained on the [huggan/anime-faces](https://huggingface.co/datasets/huggan/anime-faces) dataset.

## Training Arguments

| Argument                    | Value  |
| :-------------------------: | :----: |
| image_size                  | 64     |
| train_batch_size            | 16     |
| eval_batch_size             | 16     |
| num_epochs                  | 50     |
| gradient_accumulation_steps | 1      |
| learning_rate               | 1e-4   |
| lr_warmup_steps             | 500    |
| mixed_precision             | "fp16" |

For the training code, please refer to [this link](https://github.com/LittleNyima/code-snippets/blob/master/ddpm-tutorial/ddpm_training.py).
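
For reference, the arguments above map one-to-one onto a small configuration object. Below is a minimal sketch; the `TrainingConfig` name and dataclass layout are illustrative assumptions, not necessarily how the linked training script is organized:

```python
from dataclasses import dataclass

@dataclass
class TrainingConfig:
    # Values copied from the table above.
    image_size: int = 64                  # resolution of the (square) training images
    train_batch_size: int = 16
    eval_batch_size: int = 16
    num_epochs: int = 50
    gradient_accumulation_steps: int = 1
    learning_rate: float = 1e-4
    lr_warmup_steps: int = 500
    mixed_precision: str = "fp16"         # mixed-precision mode used during training

config = TrainingConfig()
```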

# Inference

This project aims to implement DDPM from scratch, so `DDPMScheduler` is not used. Instead, I use only `UNet2DModel` and implement a simple scheduler myself. The inference code is:

```python
import torch
from tqdm import tqdm
from diffusers import UNet2DModel


class DDPM:
    def __init__(
        self,
        num_train_timesteps: int = 1000,
        beta_start: float = 0.0001,
        beta_end: float = 0.02,
    ):
        self.num_train_timesteps = num_train_timesteps
        # Linear beta schedule and the derived alpha terms.
        self.betas = torch.linspace(beta_start, beta_end, num_train_timesteps, dtype=torch.float32)
        self.alphas = 1.0 - self.betas
        self.alphas_cumprod = torch.cumprod(self.alphas, dim=0)
        # Sampling iterates the timesteps in reverse order: T-1, ..., 0.
        self.timesteps = torch.arange(num_train_timesteps - 1, -1, -1)

    def add_noise(
        self,
        original_samples: torch.Tensor,
        noise: torch.Tensor,
        timesteps: torch.Tensor,
    ):
        alphas_cumprod = self.alphas_cumprod.to(device=original_samples.device, dtype=original_samples.dtype)
        noise = noise.to(original_samples.device)
        timesteps = timesteps.to(original_samples.device)

        # \sqrt{\bar\alpha_t}
        sqrt_alpha_prod = alphas_cumprod[timesteps].flatten() ** 0.5
        while len(sqrt_alpha_prod.shape) < len(original_samples.shape):
            sqrt_alpha_prod = sqrt_alpha_prod.unsqueeze(-1)

        # \sqrt{1 - \bar\alpha_t}
        sqrt_one_minus_alpha_prod = (1.0 - alphas_cumprod[timesteps]).flatten() ** 0.5
        while len(sqrt_one_minus_alpha_prod.shape) < len(original_samples.shape):
            sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.unsqueeze(-1)

        return sqrt_alpha_prod * original_samples + sqrt_one_minus_alpha_prod * noise

    @torch.no_grad()
    def sample(
        self,
        unet: UNet2DModel,
        batch_size: int,
        in_channels: int,
        sample_size: int,
    ):
        betas = self.betas.to(unet.device)
        alphas = self.alphas.to(unet.device)
        alphas_cumprod = self.alphas_cumprod.to(unet.device)
        timesteps = self.timesteps.to(unet.device)
        # Start from pure Gaussian noise x_T.
        images = torch.randn((batch_size, in_channels, sample_size, sample_size), device=unet.device)
        for timestep in tqdm(timesteps, desc='Sampling'):
            pred_noise: torch.Tensor = unet(images, timestep).sample

            # Mean of p(x_{t-1}|x_t):
            # (x_t - (1 - \alpha_t) / \sqrt{1 - \bar\alpha_t} * \epsilon) / \sqrt{\alpha_t}
            alpha_t = alphas[timestep]
            alpha_cumprod_t = alphas_cumprod[timestep]
            sqrt_alpha_t = alpha_t ** 0.5
            one_minus_alpha_t = 1.0 - alpha_t
            sqrt_one_minus_alpha_cumprod_t = (1 - alpha_cumprod_t) ** 0.5
            mean = (images - one_minus_alpha_t / sqrt_one_minus_alpha_cumprod_t * pred_noise) / sqrt_alpha_t

            # Standard deviation of p(x_{t-1}|x_t); no noise is added at the final step (t = 0).
            if timestep > 0:
                beta_t = betas[timestep]
                one_minus_alpha_cumprod_t_minus_one = 1.0 - alphas_cumprod[timestep - 1]
                one_divided_by_sigma_square = alpha_t / beta_t + 1.0 / one_minus_alpha_cumprod_t_minus_one
                std = (1.0 / one_divided_by_sigma_square) ** 0.5
            else:
                std = torch.zeros_like(alpha_t)

            epsilon = torch.randn_like(images)
            images = mean + std * epsilon
        # Map from [-1, 1] to [0, 1] and convert to HWC numpy arrays.
        images = (images / 2.0 + 0.5).clamp(0, 1).cpu().permute(0, 2, 3, 1).numpy()
        return images


model = UNet2DModel.from_pretrained('ddpm-animefaces-64').cuda()
ddpm = DDPM()
images = ddpm.sample(model, 32, 3, 64)

from diffusers.utils import make_image_grid, numpy_to_pil

image_grid = make_image_grid(numpy_to_pil(images), rows=4, cols=8)
image_grid.save('ddpm-sample-results.png')
```

This code can also be found at [this link](https://github.com/LittleNyima/code-snippets/blob/master/ddpm-tutorial/ddpm_sampling.py).
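
Note that `add_noise` in the `DDPM` class above is the training-side half of the scheduler: it builds `x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise` from a clean image. As a minimal sketch of how it would be used in a training step (the optimizer, batch handling, and loss below are illustrative assumptions, not taken from the linked script):

```python
import torch
import torch.nn.functional as F
from diffusers import UNet2DModel

# Assumes the DDPM class defined above is in scope.
unet = UNet2DModel(sample_size=64, in_channels=3, out_channels=3).cuda()
ddpm = DDPM()
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-4)

def training_step(clean_images: torch.Tensor) -> torch.Tensor:
    # clean_images: a batch of images normalized to [-1, 1], shape (B, 3, 64, 64).
    clean_images = clean_images.cuda()
    noise = torch.randn_like(clean_images)
    timesteps = torch.randint(0, ddpm.num_train_timesteps, (clean_images.shape[0],), device=clean_images.device)
    # Forward process: corrupt the clean images to x_t.
    noisy_images = ddpm.add_noise(clean_images, noise, timesteps)
    # The UNet predicts the added noise; the loss is a plain MSE against the true noise.
    noise_pred = unet(noisy_images, timesteps).sample
    loss = F.mse_loss(noise_pred, noise)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss
```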