:chestnut: SEED Multimodal

Powered by CV Center, Tencent AI Lab, and ARC Lab, Tencent PCG.

This repository provides the official implementation of SEED and SEED-LLaMA. For any inquiries, please email seed-x@googlegroups.com.

News

:beers: We are actively looking for self-motivated interns. Please feel free to reach out if you are interested. :beers:

2023-10-23 :hugs: We have optimized the memory overhead. Through 8-bit quantization and dynamic loading, SEED-LLaMA-8B/14B can run on a single 16GB/24GB GPU.
2023-10-23 :hugs: All model weights will be downloaded automatically when starting the demo.
2023-10-20 :hugs: We release the checkpoints and code of the SEED-2 tokenizer, and SEED-LLaMA-8B/14B.
2023-10-20 :space_invader: We release an online gradio demo; feel free to try it yourself.
2023-10-02 :paperclip: We release the technical report of SEED-LLaMA on arXiv, which is empowered by the improved SEED-2 tokenizer.
2023-07-29 :octocat: We release the checkpoint of the SEED tokenizer and its inference code. Check it out via SEED-1.
2023-07-16 :paperclip: We release the technical report of SEED on arXiv.

Stay tuned for updates!

Brief Introduction

It is recommended to check out our papers for technical details.

:speech_balloon: What can SEED-LLaMA do?

SEED-LLaMA is capable of both multimodal comprehension and generation, exhibiting compositional emergent abilities such as multi-turn in-context multimodal generation, acting like your AI assistant. [Compare to SOTA] [More examples on X]

:bulb: How does SEED-LLaMA achieve it?

The core of SEED-LLaMA is the tailored SEED tokenizer, which quantizes visual signals into discrete visual tokens that capture the necessary semantics while being produced under 1D causal dependence. [SEED-2 vs. SEED-1]
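To make "quantizing visual signals into discrete visual tokens" concrete, here is a minimal sketch of nearest-neighbor codebook lookup, the generic idea behind vector quantization. It is not the SEED tokenizer itself: the toy `codebook`, the 2D `features`, and the helper names are assumptions made for illustration, and the sketch does not model the 1D causal dependence described above.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each feature vector to the index of its nearest codebook entry."""
    tokens = []
    for feature in features:
        distances = [squared_distance(feature, code) for code in codebook]
        tokens.append(distances.index(min(distances)))
    return tokens


# Toy data (hypothetical): three codebook entries, three continuous features.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
features = [(0.9, 0.1), (0.1, 0.8), (0.05, 0.05)]

visual_tokens = quantize(features, codebook)
print(visual_tokens)  # [1, 2, 0]
```

Each continuous feature is replaced by an integer index, so an image becomes a sequence of discrete tokens that a language model can consume alongside text tokens.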

Usage

Dependencies

Python >= 3.8 (Anaconda is recommended)
PyTorch >= 1.11.0
NVIDIA GPU + CUDA
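The requirements above can be checked before installation with a short script. This is a sketch under assumptions: `version_tuple` and `check_environment` are hypothetical helper names, not part of this repository, and the check only covers the three items listed.

```python
import importlib.util
import re
import sys
from typing import List, Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric part of a version string,
    e.g. '1.11.0+cu113' becomes (1, 11, 0)."""
    match = re.match(r"(\d+(?:\.\d+)*)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def check_environment() -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python >= 3.8 is required")
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
    else:
        import torch  # imported lazily so the script also runs without PyTorch
        if version_tuple(torch.__version__) < (1, 11, 0):
            problems.append("PyTorch >= 1.11.0 is required")
        if not torch.cuda.is_available():
            problems.append("no CUDA-capable GPU is visible to PyTorch")
    return problems


for problem in check_environment():
    print("WARNING:", problem)
```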

Installation

Clone the repository and install the required packages:

git clone https://github.com/AILab-CVC/SEED.git
cd SEED
pip install -r requirements.txt

Model Weights

We release the pretrained SEED Tokenizer and De-Tokenizer, as well as the pretrained and instruction-tuned SEED-LLaMA-8B and SEED-LLaMA-14B, on SEED Hugging Face.

Check the SEED tokenizer weights in AILab-CVC/seed-tokenizer-2
Check the SEED-LLaMA-8B weights in AILab-CVC/seed-llama-8b-sft
Check the SEED-LLaMA-14B weights in AILab-CVC/seed-llama-14b-sft

The weights of the unCLIP SD-UNet, which is used to reconstruct images, are downloaded automatically.
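The weights are fetched automatically when the demo starts, so a manual download is optional. For machines where pre-fetching is preferable, the sketch below uses `snapshot_download` from the `huggingface_hub` package with the repository IDs listed above. The `pretrained` target directory and the helper names are assumptions; the scripts in this repository may expect a different location, so check their configuration before relying on it.

```python
from typing import List

# Repository IDs as listed in this README.
SEED_REPOS = {
    "tokenizer": "AILab-CVC/seed-tokenizer-2",
    "8B": "AILab-CVC/seed-llama-8b-sft",
    "14B": "AILab-CVC/seed-llama-14b-sft",
}


def repos_for(model_size: str) -> List[str]:
    """Return the tokenizer repo plus the SEED-LLaMA repo for '8B' or '14B'."""
    if model_size not in ("8B", "14B"):
        raise ValueError("model_size must be '8B' or '14B'")
    return [SEED_REPOS["tokenizer"], SEED_REPOS[model_size]]


def prefetch(model_size: str, target_dir: str = "pretrained") -> None:
    """Download the selected repos; target_dir is a hypothetical location."""
    # Imported here so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for repo_id in repos_for(model_size):
        local_dir = f"{target_dir}/{repo_id.split('/')[-1]}"
        snapshot_download(repo_id=repo_id, local_dir=local_dir)


# Example (requires network access and the huggingface_hub package):
# prefetch("8B")
```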

Inference for visual tokenization and de-tokenization

To discretize an image to 1D visual codes with causal dependency, and reconstruct the image from the visual codes using the off-the-shelf unCLIP SD-UNet:

cd ..   # SEED/ 
python scripts/seed_tokenizer_inference.py

Inference for SEED-LLaMA

Since SEED-LLaMA-8B is based on Vicuna-7B and SEED-LLaMA-14B is based on LLaMA2-Chat-13B, we use the Vicuna-7B prompt format ("USER:", "ASSISTANT:") and the LLaMA2-Chat-13B prompt format ([INST] [/INST]) for their respective instruction tuning.

# Inference for SEED-LLaMA-8B
python scripts/seed_llama_inference_8B.py

# Inference for SEED-LLaMA-14B
python scripts/seed_llama_inference_14B.py
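The two prompt styles mentioned above can be sketched as follows. Only the markers ("USER:", "ASSISTANT:", [INST], [/INST]) come from this README; the exact whitespace, any system prompt, and the placement of image tokens are assumptions, and the function names are hypothetical. Refer to the inference scripts for the templates actually used.

```python
def build_vicuna_prompt(user_message: str) -> str:
    """Vicuna-style turn, assumed here for SEED-LLaMA-8B."""
    return f"USER: {user_message} ASSISTANT:"


def build_llama2_chat_prompt(user_message: str) -> str:
    """LLaMA2-Chat-style turn, assumed here for SEED-LLaMA-14B."""
    return f"[INST] {user_message} [/INST]"


def build_prompt(model_size: str, user_message: str) -> str:
    """Pick the prompt style that matches the base model of each checkpoint."""
    builders = {"8B": build_vicuna_prompt, "14B": build_llama2_chat_prompt}
    if model_size not in builders:
        raise ValueError("model_size must be '8B' or '14B'")
    return builders[model_size](user_message)


print(build_prompt("8B", "Describe the image."))
print(build_prompt("14B", "Describe the image."))
```

Using the wrong style for a checkpoint is likely to degrade results, because each model was instruction-tuned with its own markers.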

Launching Gradio Demo of SEED-LLaMA-14B Locally

Running the local demo of SEED-LLaMA-14B currently requires a single 24GB GPU.

# SEED/
# in first terminal
bash scripts/start_backend_14b.sh
# in second terminal
bash scripts/start_frontend_14b.sh

Running the local demo of SEED-LLaMA-8B currently requires a single 16GB GPU.

# SEED/
# in first terminal
bash scripts/start_backend_8b.sh
# in second terminal
bash scripts/start_frontend_8b.sh

Once both the backend and the frontend are running, the demo can be accessed at http://127.0.0.1:80

Citation

If you find the work helpful, please consider citing:

@article{ge2023making,
  title={Making LLaMA SEE and Draw with SEED Tokenizer},
  author={Ge, Yuying and Zhao, Sijie and Zeng, Ziyun and Ge, Yixiao and Li, Chen and Wang, Xintao and Shan, Ying},
  journal={arXiv preprint arXiv:2310.01218},
  year={2023}
}

@article{ge2023planting,
  title={Planting a seed of vision in large language model},
  author={Ge, Yuying and Ge, Yixiao and Zeng, Ziyun and Wang, Xintao and Shan, Ying},
  journal={arXiv preprint arXiv:2307.08041},
  year={2023}
}

The project is still in progress.

License

SEED is released under the Apache License, Version 2.0.

SEED-LLaMA is released under the original license of LLaMA2.

Acknowledgement

We thank the authors of unCLIP SD and BLIP2 for their great work.