Upload folder using huggingface_hub
- README.md +23 -39
- modeling_intern_vit.py +6 -12

README.md CHANGED
@@ -62,6 +62,8 @@ InternVL 2.0 is a multimodal large language model series, featuring models of va
 | MathVista<sub>testmini</sub> | 63.8 | 67.7 | 63.7 | 65.5 |
 | OpenCompass<sub>avg</sub> | 69.9 | 67.9 | 69.7 | 71.0 |
 
+- For more details and evaluation reproduction, please refer to our [Evaluation Guide](https://internvl.readthedocs.io/en/latest/internvl2.0/evaluation.html).
+
 - We simultaneously use InternVL and VLMEvalKit repositories for model evaluation. Specifically, the results reported for DocVQA, ChartQA, InfoVQA, TextVQA, MME, AI2D, MMBench, CCBench, MMVet, and SEED-Image were tested using the InternVL repository. OCRBench, RealWorldQA, HallBench, and MathVista were evaluated using the VLMEvalKit.
 
 - For MMMU, we report both the original scores (left side: evaluated using the InternVL codebase for InternVL series models, and sourced from technical reports or webpages for other models) and the VLMEvalKit scores (right side: collected from the OpenCompass leaderboard).
@@ -321,7 +323,7 @@ tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast
 
 # set the max number of tiles in `max_num`
 pixel_values = load_image('./examples/image1.jpg', max_num=12).to(torch.bfloat16).cuda()
-generation_config = dict(max_new_tokens=1024, do_sample=
+generation_config = dict(max_new_tokens=1024, do_sample=True)
 
 # pure-text conversation (纯文本对话)
 question = 'Hello, who are you?'
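
The `pixel_values` and `generation_config` prepared in this hunk feed straight into the model's `chat` interface. As a minimal sketch, following the single-image pattern used elsewhere in this README (the prompt string is illustrative, and `model`/`tokenizer` are assumed to have been loaded as shown earlier):

```python
# Minimal sketch: single-image, single-round conversation.
# Assumes `model` and `tokenizer` were loaded as shown earlier in the README,
# and that `pixel_values` / `generation_config` come from the lines above.
question = '<image>\nPlease describe the image shortly.'
response = model.chat(tokenizer, pixel_values, question, generation_config)
print(f'User: {question}\nAssistant: {response}')
```
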
@@ -473,30 +475,28 @@ for new_text in streamer:
 
 ## Finetune
 
-
+Many repositories now support fine-tuning of the InternVL series models, including [InternVL](https://github.com/OpenGVLab/InternVL), [SWIFT](https://github.com/modelscope/ms-swift), [XTuner](https://github.com/InternLM/xtuner), and others. Please refer to their documentation for more details on fine-tuning.
 
 ## Deployment
 
 ### LMDeploy
 
-
-
-To deploy InternVL2 as an API, please configure the chat template config first. Create the following JSON file `chat_template.json`.
+LMDeploy is a toolkit for compressing, deploying, and serving LLMs, developed by the MMRazor and MMDeploy teams.
 
-```
-
-"model_name":"internvl-internlm2",
-"meta_instruction":"我是书生·万象,英文名是InternVL,是由上海人工智能实验室、清华大学及多家合作单位联合开发的多模态大语言模型。",
-"stop_words":["<|im_start|>", "<|im_end|>"]
-}
+```sh
+pip install lmdeploy==0.5.3
 ```
 
+LMDeploy abstracts the complex inference process of multi-modal Vision-Language Models (VLM) into an easy-to-use pipeline, similar to the Large Language Model (LLM) inference pipeline.
+
+#### Service
+
 LMDeploy's `api_server` enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below is an example of service startup:
 
 > **⚠️ Warning**: Please make sure to install Flash Attention; otherwise, using `--tp` will cause errors.
 
 ```shell
-CUDA_VISIBLE_DEVICES=0,1,2,3 lmdeploy serve api_server OpenGVLab/InternVL2-Llama3-76B --backend turbomind --server-port 23333 --
+CUDA_VISIBLE_DEVICES=0,1,2,3 lmdeploy serve api_server OpenGVLab/InternVL2-Llama3-76B --backend turbomind --server-port 23333 --tp 4
 ```
 
 To use the OpenAI-style interface, you need to install OpenAI:
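
To make the pipeline abstraction mentioned in this hunk concrete, here is a minimal sketch of offline VLM inference with the LMDeploy version pinned above. The image URL is a placeholder, and the `session_len`/`tp` values are illustrative assumptions rather than settings taken from this README:

```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

# Sketch only: offline inference through LMDeploy's VLM pipeline (lmdeploy==0.5.3).
# tp=4 mirrors the four GPUs used in the serving command; the URL is a placeholder.
pipe = pipeline('OpenGVLab/InternVL2-Llama3-76B',
                backend_config=TurbomindEngineConfig(session_len=8192, tp=4))

image = load_image('https://example.com/sample.jpg')  # placeholder image
response = pipe(('describe this image', image))
print(response.text)
```
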
@@ -533,14 +533,6 @@ response = client.chat.completions.create(
 print(response)
 ```
 
-### vLLM
-
-TODO
-
-### Ollama
-
-TODO
-
 ## License
 
 This project is released under the MIT license, while Llama3 is licensed under the Llama 3 Community License.
@@ -613,6 +605,8 @@ InternVL 2.0 是一个多模态大语言模型系列,包含各种规模的模
 | MathVista<sub>testmini</sub> | 63.8 | 67.7 | 63.7 | 65.5 |
 | OpenCompass<sub>avg</sub> | 69.9 | 67.9 | 69.7 | 71.0 |
 
+- 关于更多的细节以及评测复现,请看我们的[评测指南](https://internvl.readthedocs.io/en/latest/internvl2.0/evaluation.html)。
+
 - 我们同时使用 InternVL 和 VLMEvalKit 仓库进行模型评估。具体来说,DocVQA、ChartQA、InfoVQA、TextVQA、MME、AI2D、MMBench、CCBench、MMVet 和 SEED-Image 的结果是使用 InternVL 仓库测试的。OCRBench、RealWorldQA、HallBench 和 MathVista 是使用 VLMEvalKit 进行评估的。
 
 - 对于MMMU,我们报告了原始分数(左侧:InternVL系列模型使用InternVL代码库评测,其他模型的分数来自其技术报告或网页)和VLMEvalKit分数(右侧:从OpenCompass排行榜收集)。
@@ -671,30 +665,28 @@ InternVL 2.0 是一个多模态大语言模型系列,包含各种规模的模
 
 ## 微调
 
-
+许多仓库现在都支持 InternVL 系列模型的微调,包括 [InternVL](https://github.com/OpenGVLab/InternVL)、[SWIFT](https://github.com/modelscope/ms-swift)、[XTuner](https://github.com/InternLM/xtuner) 等。请参阅它们的文档以获取更多微调细节。
 
 ## 部署
 
 ### LMDeploy
 
-
-
-为了将InternVL2部署成API,请先配置聊天模板配置文件。创建如下的 JSON 文件 `chat_template.json`。
+LMDeploy 是由 MMRazor 和 MMDeploy 团队开发的用于压缩、部署和服务大语言模型(LLM)的工具包。
 
-```
-
-"model_name":"internvl-internlm2",
-"meta_instruction":"我是书生·万象,英文名是InternVL,是由上海人工智能实验室、清华大学及多家合作单位联合开发的多模态大语言模型。",
-"stop_words":["<|im_start|>", "<|im_end|>"]
-}
+```sh
+pip install lmdeploy==0.5.3
 ```
 
+LMDeploy 将多模态视觉-语言模型(VLM)的复杂推理过程抽象为一个易于使用的管道,类似于大语言模型(LLM)的推理管道。
+
+#### API部署
+
 LMDeploy 的 `api_server` 使模型能够通过一个命令轻松打包成服务。提供的 RESTful API 与 OpenAI 的接口兼容。以下是服务启动的示例:
 
 > **⚠️ 注意**: 请务必安装 Flash Attention;否则,使用 `--tp` 将存在异常。
 
 ```shell
-CUDA_VISIBLE_DEVICES=0,1,2,3 lmdeploy serve api_server OpenGVLab/InternVL2-Llama3-76B --backend turbomind --server-port 23333 --
+CUDA_VISIBLE_DEVICES=0,1,2,3 lmdeploy serve api_server OpenGVLab/InternVL2-Llama3-76B --backend turbomind --server-port 23333 --tp 4
 ```
 
 为了使用OpenAI风格的API接口,您需要安装OpenAI:
@@ -731,14 +723,6 @@ response = client.chat.completions.create(
 print(response)
 ```
 
-### vLLM
-
-TODO
-
-### Ollama
-
-TODO
-
 ## 开源许可证
 
 该项目采用 MIT 许可证发布,而 Llama3 则采用 Llama 3 Community License 许可证。
modeling_intern_vit.py CHANGED
@@ -20,18 +20,12 @@ from transformers.utils import logging
 from .configuration_intern_vit import InternVisionConfig
 
 try:
-    try:  # v1
-        from flash_attn.flash_attn_interface import \
-            flash_attn_unpadded_qkvpacked_func
-    except:  # v2
-        from flash_attn.flash_attn_interface import \
-            flash_attn_varlen_qkvpacked_func as flash_attn_unpadded_qkvpacked_func
-
     from flash_attn.bert_padding import pad_input, unpad_input
-
+    from flash_attn.flash_attn_interface import \
+        flash_attn_varlen_qkvpacked_func
     has_flash_attn = True
 except:
-    print('
+    print('FlashAttention2 is not installed.')
    has_flash_attn = False
 
 logger = logging.get_logger(__name__)
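
The `has_flash_attn` flag set in this hunk is presumably what the rest of the file consults before taking the FlashAttention code path, so a missing optional dependency degrades gracefully instead of failing at import time. A minimal, self-contained sketch of the same guarded-import pattern (the `attn_impl` variable below is illustrative and not part of the file):

```python
# Guarded optional import: try the dependency once, record availability,
# and let downstream code branch on the flag instead of failing at import time.
try:
    from flash_attn.flash_attn_interface import flash_attn_varlen_qkvpacked_func
    has_flash_attn = True
except ImportError:
    has_flash_attn = False

attn_impl = 'flash_attn' if has_flash_attn else 'eager'  # illustrative downstream use
print(f'Attention implementation: {attn_impl}')
```
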
@@ -74,7 +68,7 @@ class FlashAttention(nn.Module):
             max_s = seqlen
             cu_seqlens = torch.arange(0, (batch_size + 1) * seqlen, step=seqlen, dtype=torch.int32,
                                       device=qkv.device)
-            output = flash_attn_unpadded_qkvpacked_func(
+            output = flash_attn_varlen_qkvpacked_func(
                 qkv, cu_seqlens, max_s, self.dropout_p if self.training else 0.0,
                 softmax_scale=self.softmax_scale, causal=causal
             )
@@ -84,7 +78,7 @@ class FlashAttention(nn.Module):
             x = rearrange(qkv, 'b s three h d -> b s (three h d)')
             x_unpad, indices, cu_seqlens, max_s = unpad_input(x, key_padding_mask)
             x_unpad = rearrange(x_unpad, 'nnz (three h d) -> nnz three h d', three=3, h=nheads)
-            output_unpad = flash_attn_unpadded_qkvpacked_func(
+            output_unpad = flash_attn_varlen_qkvpacked_func(
                 x_unpad, cu_seqlens, max_s, self.dropout_p if self.training else 0.0,
                 softmax_scale=self.softmax_scale, causal=causal
             )
@@ -93,7 +87,7 @@ class FlashAttention(nn.Module):
                                'b s (h d) -> b s h d', h=nheads)
         else:
             assert max_s is not None
-            output = flash_attn_unpadded_qkvpacked_func(
+            output = flash_attn_varlen_qkvpacked_func(
                 qkv, cu_seqlens, max_s, self.dropout_p if self.training else 0.0,
                 softmax_scale=self.softmax_scale, causal=causal
             )
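
All three rewritten call sites go through the same variable-length, packed-QKV kernel. The following is a minimal, self-contained sketch of that entry point with illustrative shapes; it assumes a CUDA GPU and the flash-attn v2 package, and the tensor sizes are arbitrary:

```python
import torch
from flash_attn.flash_attn_interface import flash_attn_varlen_qkvpacked_func

# Illustrative sizes: 2 sequences of 16 tokens each, 8 heads of dim 64.
batch_size, seqlen, nheads, headdim = 2, 16, 8, 64
qkv = torch.randn(batch_size * seqlen, 3, nheads, headdim,
                  dtype=torch.float16, device='cuda')

# cu_seqlens marks where each sequence starts in the flattened token dimension,
# mirroring the torch.arange construction used in modeling_intern_vit.py.
cu_seqlens = torch.arange(0, (batch_size + 1) * seqlen, step=seqlen,
                          dtype=torch.int32, device='cuda')

out = flash_attn_varlen_qkvpacked_func(qkv, cu_seqlens, seqlen, 0.0,
                                       softmax_scale=None, causal=False)
print(out.shape)  # torch.Size([32, 8, 64]) -> (total_tokens, nheads, headdim)
```
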