yangwang92 committed
Commit 13243b6 • 1 Parent(s): 30eed04
Update index.html
index.html CHANGED (+4 -2)
@@ -16,8 +16,10 @@
       <p>
         <b>VPTQ (Vector Post-Training Quantization)</b> is an advanced compression technique that dramatically reduces the size of large language models such as the 70B and 405B Llama models. VPTQ efficiently compresses these models to 1-2 bits within just a few hours, enabling them to run effectively on GPUs with limited memory.
         For more information, visit the following links:
-      <p
-
+      <p style="font-weight: bold; font-size: larger;">
+        The current demo runs on a free, shared A100 provided by HUGGINGFACE, which may lead to long load times for model loading and acquiring an available GPU. This demo is intended to showcase the quality of the quantized model, not inference speed.
+      </p>
+      <ul>
       <li>
         <a href="https://arxiv.org/abs/2409.17066" target="_blank" class="link-styled">
           <img src="arxiv-logo.png" alt="arXiv" width="20" height="20" /> <b>Paper on arXiv</b>
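
For readers wondering what "compressing to 1-2 bits" via vector quantization means in practice, the toy sketch below groups weights into short vectors and replaces each with the index of its nearest centroid in a small codebook. This is an illustrative example built on plain k-means, not the VPTQ algorithm or its API (the arXiv paper above describes the actual method, which additionally uses second-order information to guide the quantization); all function names and parameters here are hypothetical.

```python
# Toy illustration of weight vector quantization (NOT the VPTQ algorithm):
# group weights into length-`dim` vectors, learn a codebook with plain
# k-means, and store one small index per vector instead of the weights.
import numpy as np

def pairwise_sq_dists(a, b):
    # ||a - b||^2 expanded to avoid materializing a huge broadcast tensor.
    return (a ** 2).sum(1)[:, None] - 2.0 * a @ b.T + (b ** 2).sum(1)[None, :]

def build_codebook(vectors, k, iters=20, seed=0):
    """Plain k-means over weight vectors; returns a (k, dim) codebook."""
    rng = np.random.default_rng(seed)
    centroids = vectors[rng.choice(len(vectors), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = pairwise_sq_dists(vectors, centroids).argmin(axis=1)
        for j in range(k):
            members = vectors[assign == j]
            if len(members):  # leave empty clusters where they are
                centroids[j] = members.mean(axis=0)
    return centroids

def quantize(weight, dim=8, k=256):
    """Split `weight` into length-`dim` vectors; return per-vector indices and codebook."""
    vecs = weight.reshape(-1, dim)
    codebook = build_codebook(vecs, k)
    indices = pairwise_sq_dists(vecs, codebook).argmin(axis=1).astype(np.uint8)
    return indices, codebook

def dequantize(indices, codebook, shape):
    return codebook[indices].reshape(shape)

w = np.random.randn(512, 512).astype(np.float32)
idx, cb = quantize(w)                 # one 8-bit index per 8 weights -> ~1 bit/weight
w_hat = dequantize(idx, cb, w.shape)
print("avg bits/weight:", idx.itemsize * 8 / 8)
print("reconstruction MSE:", float(((w - w_hat) ** 2).mean()))
```

The codebook itself is tiny relative to the weight matrix, so storage is dominated by the indices; the paper's contribution is in choosing the vectors and centroids so that model accuracy survives at these extreme bit widths.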