Update README.md
README.md CHANGED
@@ -123,7 +123,6 @@ python3 -m vllm.entrypoints.openai.api_server \
     --tensor-parallel-size 8 \
     --enable-prefix-caching
 ```
-**Important Note** - In the repo revision `g5-48x`, `config.json` has been updated to set `max_position_embeddings` to 288,800, fitting the model's KV cache on a single `g5.48xlarge` instance with 8 A10 GPUs (24GB RAM per GPU).
 
 On an instance with larger GPU RAM (e.g. `p4d.24xlarge`), simply remove the `MAX_MODEL_LEN` argument in order to support the full sequence length of 524,288 tokens:
 ```shell
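For context, the hunk above only shows the tail of the launch command. Below is a minimal sketch of the two variants the README describes; the model path is a placeholder, and mapping the README's `MAX_MODEL_LEN` argument onto vLLM's `--max-model-len` flag is an assumption, since the full command is truncated in this hunk.

```shell
# Sketch only: <model-repo> is a placeholder, and the --max-model-len /
# --revision usage is an assumption; the full command is not shown in
# this hunk.

# g5.48xlarge (8x A10, 24GB each): cap the context length so the KV
# cache fits, matching the max_position_embeddings override in the
# g5-48x revision's config.json.
python3 -m vllm.entrypoints.openai.api_server \
    --model <model-repo> \
    --revision g5-48x \
    --max-model-len 288800 \
    --tensor-parallel-size 8 \
    --enable-prefix-caching

# p4d.24xlarge or similar (larger GPU RAM): drop the cap to serve the
# full 524,288-token sequence length.
python3 -m vllm.entrypoints.openai.api_server \
    --model <model-repo> \
    --tensor-parallel-size 8 \
    --enable-prefix-caching
```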