hanjian.thu123 committed on
Commit
64b0761
•
1 Parent(s): 064fdd1

[update] revise README

Files changed (1)
  1. README.md +7 -28
README.md CHANGED
@@ -1,4 +1,7 @@
1
- # Infinity $\infty$: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
 
 
 
2
 
3
  <div align="center">
4
 
@@ -16,38 +19,14 @@
16
 
17
 
18
 
19
- ## 📌 Note
20
- This repo is used for hosting Infinity's checkpoints. For more details, please refer to [![code](https://img.shields.io/badge/%F0%9F%A4%96%20Code-FoundationVision/Infinity-green)](https://github.com/FoundationVision/Infinity)&nbsp;
21
 
22
 
23
  ## 📖 Introduction
24
  We present Infinity, a Bitwise Visual AutoRegressive Modeling capable of generating high-resolution and photorealistic images. Infinity redefines visual autoregressive model under a bitwise token prediction framework with an infinite-vocabulary tokenizer & classifier and bitwise self-correction. Theoretically scaling the tokenizer vocabulary size to infinity and concurrently scaling the transformer size, our method significantly unleashes powerful scaling capabilities. Infinity sets a new record for autoregressive text-to-image models, outperforming top-tier diffusion models like SD3-Medium and SDXL. Notably, Infinity surpasses SD3-Medium by improving the GenEval benchmark score from 0.62 to 0.73 and the ImageReward benchmark score from 0.87 to 0.96, achieving a win rate of 66%. Without extra optimization, Infinity generates a high-quality 1024×1024 image in 0.8 seconds, making it 2.6× faster than SD3-Medium and establishing it as the fastest text-to-image model.
25
 
26
-
27
- ## 📀 Infinity Model ZOO
28
- We provide Infinity models for you to play with, which are on <a href='https://huggingface.co/FoundationVision/infinity'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20weights-FoundationVision/Infinity-yellow'></a> or can be downloaded from the following links:
29
-
30
- ### Visual Tokenizer
31
-
32
- | vocabulary | stride | IN-256 rFID $\downarrow$ | IN-256 PSNR $\uparrow$ | IN-512 rFID $\downarrow$ | IN-512 PSNR $\uparrow$ | HF weights🤗 |
33
- |:----------:|:-----:|:--------:|:---------:|:-------:|:-------:|:------------------------------------------------------------------------------------|
34
- | $V_d=2^{16}$ | 16 | 1.22 | 20.9 | 0.31 | 22.6 | [infinity_vae_d16.pth](https://huggingface.co/FoundationVision/infinity/blob/main/infinity_vae_d16.pth) |
35
- | $V_d=2^{24}$ | 16 | 0.75 | 22.0 | 0.30 | 23.5 | [infinity_vae_d24.pth](https://huggingface.co/FoundationVision/infinity/blob/main/infinity_vae_d24.pth) |
36
- | $V_d=2^{32}$ | 16 | 0.61 | 22.7 | 0.23 | 24.4 | [infinity_vae_d32.pth](https://huggingface.co/FoundationVision/infinity/blob/main/infinity_vae_d32.pth) |
37
- | $V_d=2^{64}$ | 16 | 0.33 | 24.9 | 0.15 | 26.4 | [infinity_vae_d64.pth](https://huggingface.co/FoundationVision/infinity/blob/main/infinity_vae_d64.pth) |
38
- | $V_d=2^{32}$ | 16 | 0.75 | 21.9 | 0.32 | 23.6 | [infinity_vae_d32_reg.pth](https://huggingface.co/FoundationVision/infinity/blob/main/infinity_vae_d32_reg.pth) |
39
-
40
- ### Infinity
41
- | model | Resolution | GenEval | DPG | HPSv2.1 | HF weights🤗 |
42
- |:----------:|:-----:|:--------:|:---------:|:-------:|:------------------------------------------------------------------------------------|
43
- | Infinity-2B | 1024 | 0.69 / 0.73 $^{\dagger}$ | 83.5 | 32.2 | [infinity_2b_reg.pth](https://huggingface.co/FoundationVision/infinity/blob/main/infinity_2b_reg.pth) |
44
- | Infinity-20B | 1024 | - | - | - | [Coming Soon](TBD) |
45
-
46
- ${\dagger}$ result is tested with a [prompt rewriter](tools/prompt_rewriter.py).
47
-
48
- You can load these models to generate images via the codes in [interactive_infer.ipynb](tools/interactive_infer.ipynb). Note: you need to download [infinity_vae_d32reg.pth](https://huggingface.co/FoundationVision/infinity/blob/main/infinity_vae_d32_reg.pth) and [flan-t5-xl](https://huggingface.co/google/flan-t5-xl) first.
49
-
50
-
51
 
52
  ## 📖 Citation
53
  If our work assists your research, feel free to give us a star ⭐ or cite us using:
 
1
+ ----
2
+ -license: mit
3
+ ----
4
+ # Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
5
 
6
  <div align="center">
7
 
 
19
 
20
 
21
 
22
+
 
23
 
24
 
25
  ## 📖 Introduction
26
  We present Infinity, a Bitwise Visual AutoRegressive Modeling capable of generating high-resolution and photorealistic images. Infinity redefines visual autoregressive model under a bitwise token prediction framework with an infinite-vocabulary tokenizer & classifier and bitwise self-correction. Theoretically scaling the tokenizer vocabulary size to infinity and concurrently scaling the transformer size, our method significantly unleashes powerful scaling capabilities. Infinity sets a new record for autoregressive text-to-image models, outperforming top-tier diffusion models like SD3-Medium and SDXL. Notably, Infinity surpasses SD3-Medium by improving the GenEval benchmark score from 0.62 to 0.73 and the ImageReward benchmark score from 0.87 to 0.96, achieving a win rate of 66%. Without extra optimization, Infinity generates a high-quality 1024×1024 image in 0.8 seconds, making it 2.6× faster than SD3-Medium and establishing it as the fastest text-to-image model.
27
 
28
+ ## 📌 Note
29
+ This repo is used for hosting Infinity's checkpoints. For more details, please refer to [![code](https://img.shields.io/badge/%F0%9F%A4%96%20Code-FoundationVision/Infinity-green)](https://github.com/FoundationVision/Infinity)&nbsp;
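Since the updated README drops the per-file download links, a reader may still want to turn one of the removed `blob/main/...` page links into a direct download link. The sketch below is not part of the commit; it assumes the usual Hugging Face Hub convention that replacing the `blob` path segment with `resolve` yields the raw file, and the helper name `blob_to_resolve` is hypothetical.

```python
from urllib.parse import urlparse


def blob_to_resolve(url: str) -> str:
    """Turn a Hugging Face 'blob' page URL into a direct-download 'resolve' URL.

    Hypothetical helper for illustration; it only rewrites the URL string
    and performs no network access.
    """
    parts = urlparse(url)
    segments = parts.path.split("/")
    # Expected path layout: /<org>/<repo>/blob/<revision>/<file path...>
    if len(segments) < 6 or segments[3] != "blob":
        raise ValueError(f"not a Hugging Face blob URL: {url}")
    segments[3] = "resolve"
    return parts._replace(path="/".join(segments)).geturl()


if __name__ == "__main__":
    # Checkpoint link taken from the model table removed in this commit.
    page = "https://huggingface.co/FoundationVision/infinity/blob/main/infinity_2b_reg.pth"
    print(blob_to_resolve(page))
```

The resulting URL can be passed to any HTTP client. The `huggingface_hub` package's `hf_hub_download(repo_id=..., filename=...)` is an alternative that also handles caching.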
30
 
31
  ## 📖 Citation
32
  If our work assists your research, feel free to give us a star ⭐ or cite us using: