Llama 3 8B Quantized (4-bit)
Model Overview
This is a 4-bit quantized version of the Llama 3 8B model, intended for faster inference and reduced memory usage. Storing weights in 4 bits cuts the weight memory footprint to roughly a quarter of the fp16 original, typically with only a small loss in quality on causal language modeling tasks.
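The core idea behind 4-bit quantization can be illustrated with a toy sketch: each weight is mapped to one of 16 integer levels and reconstructed from that level plus a shared scale. This is a simplified symmetric scheme for illustration only, not the exact algorithm used to quantize this model.

```python
# Toy symmetric 4-bit quantization (illustrative only, not this model's
# exact scheme): floats -> integers in [-8, 7] plus one shared scale.
def quantize_4bit(weights):
    """Map floats to 4-bit signed integers with a per-tensor scale."""
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_4bit(q, scale):
    """Reconstruct approximate floats from levels and the scale."""
    return [v * scale for v in q]

weights = [0.12, -0.7, 0.33, 0.06]
q, scale = quantize_4bit(weights)
approx = dequantize_4bit(q, scale)
# Each reconstructed weight is within half a quantization step of the original.
```

Real schemes (e.g. NF4 in bitsandbytes) use non-uniform levels and per-block scales, but the storage saving is the same: 4 bits per weight instead of 16.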
Model type: Causal Language Model (LLM)
Quantization: 4-bit
Base model: Llama 3 8B
Framework: Transformers
Usage: Text generation, conversational agents, language understanding tasks
Note: the serverless Inference API is not available for this model, and the repository is disabled.