Xuyao Wang
committed on
Commit
•
aafeaf6
1
Parent(s):
89ed0bd
Add README
README.md
ADDED
@@ -0,0 +1,51 @@
---
license: cc-by-4.0
language:
- en
pipeline_tag: interleaved-text-image
tags:
- multimodal
library_name: transformers
---

# Align-Anything Chameleon 7B Base

## Introduction

This repository hosts Align-Anything Chameleon 7B Base, a model for interleaved text-image input and output. It is based on the [Chameleon](https://huggingface.co/facebook/chameleon-7b) model and is trained with the [Align-Anything](https://github.com/PKU-Alignment/Align-Anything) framework to further unlock its image generation capability.

## Usage

To use this model, see the [Align-Anything](https://github.com/PKU-Alignment/align-anything) repository, which covers training, inference, and evaluation:

```bash
git clone https://github.com/PKU-Alignment/align-anything.git
cd align-anything/projects/text_image_to_text_image
```

Then follow the instructions in the README.md file to set up the environment and run the scripts.

Currently, the official Transformers repo does not support the Chameleon model with image output (see [this PR](https://github.com/huggingface/transformers/pull/32013) for more details), so we rely on a fork of the repo.

After installing Align-Anything and setting up the environment, you can install the stable version of the fork by running:

```bash
pip install git+https://github.com/htlou/transformers.git@hantao_stable_cham
```
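
One way to confirm that the fork, rather than an official release, is the active install is to inspect the output of `pip freeze`. The sketch below assumes a pip version that lists VCS installs in the `name @ git+URL@commit` form. The helper name `is_fork_install` is hypothetical and is not part of any of the repos mentioned here.

```bash
# Hypothetical helper: reads `pip freeze` output on stdin and succeeds only
# if transformers is listed as installed from the htlou fork.
is_fork_install() {
    grep -q '^transformers @ git+https://github.com/htlou/transformers'
}

# Example use (left as a comment so sourcing this snippet has no side effects):
# pip freeze | is_fork_install && echo "forked transformers is active"
```

If the check fails, the environment is probably still using a release from PyPI, and rerunning the install command above should replace it.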

To generate images (pure text generation works directly with `Transformers`), follow the instructions in the [mmsg_chameleon](https://github.com/htlou/mmsg_chameleon) repo to run inference. First clone the repo:

```bash
git clone https://github.com/htlou/mmsg_chameleon.git
cd mmsg_chameleon
```

Then set up the environment using:
```bash
pip install -e .
```

After setting up the environment, set the correct paths in `scripts/interleaved_gen.sh` and then run:
```bash
bash scripts/interleaved_gen.sh
```
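
Because the script depends on hand-edited paths, a small pre-flight check can catch a typo before a long run starts. This is a minimal sketch: `require_paths` is a hypothetical helper, and `MODEL_PATH` and `INPUT_FILE` are placeholder names, so substitute whatever variables `scripts/interleaved_gen.sh` actually defines.

```bash
# Hypothetical helper: reports every argument that does not exist on disk
# and returns non-zero if at least one path is missing.
require_paths() {
    local missing=0 p
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing: $p" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example use with placeholder variable names (assumptions, not from the script):
# require_paths "$MODEL_PATH" "$INPUT_FILE" && bash scripts/interleaved_gen.sh
```

The function only tests for existence, so it will not notice a path that exists but points at the wrong checkpoint.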