Infinity support: Short max_length=2048 for more optimized deployment
#20
by michaelfeil - opened
@Alibaba Team, thanks so much for supporting this model.
Here is how to run it with https://github.com/michaelfeil/infinity:
Run via Docker:
```bash
docker run --gpus all -p 7997:7997 michaelf34/infinity:0.0.68-trt-onnx v2 --model-id Alibaba-NLP/gte-Qwen2-1.5B-instruct --revision "refs/pr/20" --dtype bfloat16 --batch-size 8 --device cuda --engine torch --port 7997 --no-bettertransformer
```
Run via CLI:
```bash
pip install infinity_emb flash-attn
infinity_emb v2 --model-id Alibaba-NLP/gte-Qwen2-1.5B-instruct --revision "refs/pr/20" --dtype bfloat16 --batch-size 8 --device cuda --engine torch --port 7997 --no-bettertransformer
```
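Once the server is up, you can smoke-test it against infinity's OpenAI-compatible embeddings endpoint. A minimal sketch, assuming the server started by either command above is listening on localhost:7997 and the input string is just a placeholder query:

```bash
# POST a single text to the /embeddings route and print the JSON response,
# which contains the embedding vector under data[0].embedding.
curl -X POST http://localhost:7997/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "Alibaba-NLP/gte-Qwen2-1.5B-instruct", "input": ["what is the capital of China?"]}'
```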
DO NOT MERGE!
michaelfeil changed pull request title from "Infinity support: Short max_length for more optimized deployment" to "Infinity support: Short max_length=2048 for more optimized deployment"
michaelfeil changed pull request status to closed
michaelfeil changed pull request status to open
thenlper changed pull request status to merged
Please do not merge this PR, as mentioned above.
I'll open a new PR to undo this -> https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct/discussions/22