kaitchup/Llama-3.1-Minitron-4B-Width-Base-AutoRound-GPTQ-sym-4bit


	---
	language:
	- en
	library_name: transformers
	tags:
	- AutoRound
	license: apache-2.0
	---


	## Model Details

	This is [nvidia/Llama-3.1-Minitron-4B-Width-Base](https://huggingface.co/nvidia/Llama-3.1-Minitron-4B-Width-Base) quantized to 4-bit with AutoRound, using symmetric quantization. The model was created, tested, and evaluated by The Kaitchup. It is compatible with the main inference frameworks, such as TGI and vLLM.
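
A minimal sketch of loading the model with Transformers for plain text completion. It assumes a CUDA GPU and that `accelerate` plus a GPTQ backend supported by your Transformers version are installed; the helper name `complete` and the generation settings are illustrative and not part of this repository.

```python
# Sketch only: assumes transformers, accelerate, and a GPTQ kernel backend
# are installed. `complete` is a hypothetical helper, not a repository API.
MODEL_ID = "kaitchup/Llama-3.1-Minitron-4B-Width-Base-AutoRound-GPTQ-sym-4bit"


def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Download the quantized checkpoint (on first call) and continue `prompt`."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # device_map="auto" (requires accelerate) places the weights on the available GPU.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

This is a base model, not an instruction-tuned one, so `complete("The capital of France is")` continues the text rather than answering as a chat assistant.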

	Details on the quantization process and evaluation:
	[Mistral-NeMo: 4.1x Smaller with Quantized Minitron](https://kaitchup.substack.com/p/mistral-nemo-41x-smaller-with-quantized)
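
For readers unfamiliar with the term, the sketch below illustrates what "symmetric" 4-bit quantization means: weights are mapped to signed integers around a zero point fixed at 0, with one scale per group. It uses plain round-to-nearest on a single group and is not AutoRound itself, which tunes the rounding; the function names and the choice of scale (`max|w| / 7`) are assumptions made for illustration.

```python
# Illustration of symmetric 4-bit quantization with round-to-nearest.
# NOT the AutoRound algorithm; names and scale choice are assumptions.
from typing import List, Tuple

QMIN, QMAX = -8, 7  # signed 4-bit integer range


def quantize_symmetric(weights: List[float]) -> Tuple[List[int], float]:
    """Map one group of weights to 4-bit integers sharing a single scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Symmetric: no zero-point offset, the scale alone defines the grid.
    scale = max_abs / QMAX if max_abs > 0 else 1.0
    q = [min(QMAX, max(QMIN, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate weights from the integers and the scale."""
    return [v * scale for v in q]


q, scale = quantize_symmetric([1.75, -0.5, 0.0, 0.25, 0.3])
restored = dequantize(q, scale)
```

With these inputs the scale is 0.25, so 0.3 is stored as the integer 1 and restored as 0.25; that rounding error is what methods such as AutoRound try to reduce.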


	- Developed by: [The Kaitchup](https://kaitchup.substack.com/)
	- License: Apache License 2.0