Syed-Hasan-8503/PaluLlama-3-8B-Instruct

Syed-Hasan-8503 committed on Aug 19

Commit d8b9180 • 1 Parent(s): 5043722

Update README.md

Files changed (1)

README.md +0 -6

@@ -11,12 +11,6 @@ library_name: transformers
 This repository contains a compressed version of the Meta Llama-3-8B-Instruct model, utilizing the Palu framework for KV-Cache compression. Palu reduces the hidden dimensions of the KV-Cache through low-rank decomposition, significantly reducing the model's memory footprint while maintaining or enhancing performance.
-## Evaluation Results
-Here's a Markdown file to include the results of your comparisons:
----
 # Meta Llama-3-8B-Instruct: Palu Compression Results
 ## Perplexity (PPL)
