:chestnut: SEED Multimodal
Powered by CV Center, Tencent AI Lab, and ARC Lab, Tencent PCG.
This repository provides the official implementation of SEED and SEED-LLaMA. For any inquiries, please email seed-x@googlegroups.com.
News
:beers: We are actively looking for self-motivated interns. Please feel free to reach out if you are interested. :beers:
- 2023-10-23 :hugs: We have optimized the memory overhead. Through 8-bit quantization and dynamic loading, SEED-LLaMA 8B/14B can run on a single 16GB/24GB GPU.
- 2023-10-23 :hugs: All model weights will be downloaded automatically when starting the demo.
- 2023-10-20 :hugs: We release the checkpoints and code of the SEED-2 tokenizer, and SEED-LLaMA-8B/14B.
- 2023-10-20 :space_invader: We release an online Gradio demo, feel free to try it yourself.
- 2023-10-02 :paperclip: We release the technical report of SEED-LLaMA on arXiv, which is empowered by the improved SEED-2 tokenizer.
- 2023-07-29 :octocat: We release the checkpoint of the SEED tokenizer and its inference code. Check it out via SEED-1.
- 2023-07-16 :paperclip: We release the technical report of SEED on arXiv.
Stay tuned for the updates!
Brief Introduction
It is recommended to check out our papers for technical details.
:speech_balloon: What can SEED-LLaMA do?
SEED-LLaMA is capable of both multimodal comprehension and generation, exhibiting compositional emergent abilities such as multi-turn in-context multimodal generation, acting like your AI assistant. [Compare to SOTA] [More examples on X]
:bulb: How does SEED-LLaMA achieve it?
The core of SEED-LLaMA is the tailored SEED tokenizer, which quantizes visual signals into discrete visual tokens, capturing the necessary semantics while being produced with 1D causal dependency. [SEED-2 vs. SEED-1]
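As a rough conceptual illustration only (not the actual SEED tokenizer, whose architecture and codebook are described in the papers), the sketch below maps continuous visual features to discrete token ids by nearest-neighbor lookup in a codebook; all tensor shapes and the random codebook are made up for the example.

```python
# Conceptual sketch: continuous visual features -> discrete visual token ids
# via nearest-neighbor codebook lookup. Shapes and the random codebook are
# illustrative stand-ins, not the real SEED tokenizer.
import torch

num_codes, dim, seq_len = 8192, 768, 32      # hypothetical codebook size / feature dim / tokens per image
codebook = torch.randn(num_codes, dim)        # stand-in for a learned codebook
features = torch.randn(1, seq_len, dim)       # stand-in for 1D causal visual features of one image

# Nearest codebook entry per feature vector -> discrete visual token ids
distances = torch.cdist(features, codebook.unsqueeze(0))   # (1, seq_len, num_codes)
token_ids = distances.argmin(dim=-1)                        # (1, seq_len)

# De-tokenization would start from the quantized embeddings looked up from the codebook
quantized = codebook[token_ids]                             # (1, seq_len, dim)
print(token_ids.shape, quantized.shape)
```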
Usage
Dependencies
- Python >= 3.8 (Anaconda is recommended)
- PyTorch >= 1.11.0
- NVIDIA GPU + CUDA
Installation
Clone the repo and install the required packages:
git clone https://github.com/AILab-CVC/SEED.git
cd SEED
pip install -r requirements.txt
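A quick, generic sanity check (not part of the repository) that the environment matches the dependencies listed above:

```python
# Sanity check for the dependencies listed above: Python, PyTorch and CUDA.
import sys
import torch

print("Python:", sys.version.split()[0])            # should be >= 3.8
print("PyTorch:", torch.__version__)                 # should be >= 1.11.0
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```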
Model Weights
We release the pretrained SEED tokenizer and de-tokenizer, as well as the pretrained and instruction-tuned SEED-LLaMA-8B and SEED-LLaMA-14B, on SEED Hugging Face.
- Check the SEED tokenizer weights in AILab-CVC/seed-tokenizer-2
- Check the SEED-LLaMA-8B weights in AILab-CVC/seed-llama-8b-sft
- Check the SEED-LLaMA-14B weights in AILab-CVC/seed-llama-14b-sft
The model weights of the unCLIP SD-UNet, which is used to reconstruct images, will be downloaded automatically.
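The demo scripts download the checkpoints automatically on first run; if you prefer to pre-fetch them, a minimal sketch with huggingface_hub (assuming the default cache location is acceptable) could look like:

```python
# Optional pre-download of the released checkpoints with huggingface_hub.
# The demo scripts will otherwise fetch them automatically on first run.
from huggingface_hub import snapshot_download

for repo_id in [
    "AILab-CVC/seed-tokenizer-2",
    "AILab-CVC/seed-llama-8b-sft",
    "AILab-CVC/seed-llama-14b-sft",
]:
    local_path = snapshot_download(repo_id=repo_id)  # cached under ~/.cache/huggingface by default
    print(repo_id, "->", local_path)
```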
Inference for visual tokenization and de-tokenization
To discretize an image into 1D visual codes with causal dependency, and to reconstruct the image from those codes using the off-the-shelf unCLIP SD-UNet, run:
# from the repository root (SEED/)
python scripts/seed_tokenizer_inference.py
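Vision tokenizers of this kind generally expect a fixed-size, normalized RGB tensor as input. The sketch below shows a typical preprocessing pipeline; the 224 px resolution and CLIP normalization statistics are assumptions, and the exact values used by scripts/seed_tokenizer_inference.py may differ.

```python
# Illustrative preprocessing only: the 224 px resolution and CLIP statistics
# are assumptions; the actual script may use different values.
from PIL import Image
import torch
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(224, interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.48145466, 0.4578275, 0.40821073),   # CLIP mean (assumed)
                         std=(0.26862954, 0.26130258, 0.27577711)),  # CLIP std (assumed)
])

image = Image.new("RGB", (640, 480), color=(128, 128, 128))  # dummy image; replace with Image.open(path)
pixel_values = preprocess(image).unsqueeze(0)                 # (1, 3, 224, 224), ready for an image encoder
print(pixel_values.shape)
```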
Inference for SEED-LLaMA
Given that SEED-LLaMA-8B is based on Vicuna-7B and SEED-LLaMA-14B on LLaMA2-Chat-13B, we use Vicuna-7B's ("USER:", "ASSISTANT:") and LLaMA2-Chat-13B's ([INST] [/INST]) prompt formats for the respective instruction tuning.
# Inference for SEED-LLaMA-8B
python scripts/seed_llama_inference_8B.py
# Inference for SEED-LLaMA-14B
python scripts/seed_llama_inference_14B.py
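For reference, the two prompt formats described above look roughly like the sketch below; the `<img_tokens>` placeholder is a hypothetical stand-in, since the inference scripts handle the insertion of discrete visual tokens themselves.

```python
# Rough illustration of the two prompt formats mentioned above. The <img_tokens>
# placeholder is hypothetical; the inference scripts insert visual tokens themselves.
def vicuna_prompt(user_msg: str) -> str:
    # SEED-LLaMA-8B (Vicuna-7B base): "USER: ... ASSISTANT:"
    return f"USER: {user_msg} ASSISTANT:"

def llama2_chat_prompt(user_msg: str) -> str:
    # SEED-LLaMA-14B (LLaMA2-Chat-13B base): "[INST] ... [/INST]"
    return f"[INST] {user_msg} [/INST]"

print(vicuna_prompt("<img_tokens> What is in this image?"))
print(llama2_chat_prompt("Please generate an image of a cat wearing a hat."))
```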
Launching Gradio Demo of SEED-LLaMA-14B Locally
- Building the local demo of SEED-LLaMA-14B currently requires a single 24GB GPU.
# SEED/
# in first terminal
bash scripts/start_backend_14b.sh
# in second terminal
bash scripts/start_frontend_14b.sh
- Building the local demo of SEED-LLaMA-8B currently requires a single 16GB GPU.
# SEED/
# in first terminal
bash scripts/start_backend_8b.sh
# in second terminal
bash scripts/start_frontend_8b.sh
Then the demo can be accessed at http://127.0.0.1:80.
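Once both terminals are running, a minimal reachability check (plain HTTP only, making no assumptions about the demo's internal API) might look like:

```python
# Minimal reachability check for the local demo; it only confirms the frontend
# answers on port 80 and makes no assumptions about the demo's internal API.
import urllib.request

try:
    with urllib.request.urlopen("http://127.0.0.1:80", timeout=5) as resp:
        print("Demo frontend is up, HTTP status:", resp.status)
except OSError as err:
    print("Demo frontend not reachable yet:", err)
```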
Citation
If you find this work helpful, please consider citing:
@article{ge2023making,
title={Making LLaMA SEE and Draw with SEED Tokenizer},
author={Ge, Yuying and Zhao, Sijie and Zeng, Ziyun and Ge, Yixiao and Li, Chen and Wang, Xintao and Shan, Ying},
journal={arXiv preprint arXiv:2310.01218},
year={2023}
}
@article{ge2023planting,
title={Planting a seed of vision in large language model},
author={Ge, Yuying and Ge, Yixiao and Zeng, Ziyun and Wang, Xintao and Shan, Ying},
journal={arXiv preprint arXiv:2307.08041},
year={2023}
}
The project is still in progress.
License
SEED is released under the Apache License Version 2.0.
SEED-LLaMA is released under the original license of LLaMA2.