Llama 3 8B Quantized (4-bit)
Model Overview
This is a 4-bit quantized version of the Llama 3 8B model, intended for faster inference and reduced memory usage. Storing weights in 4 bits cuts the weight memory footprint to roughly a quarter of the fp16 original, typically with only a small loss in quality on causal language modeling tasks.
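The core idea behind 4-bit quantization can be illustrated with a toy sketch: each weight is mapped to one of 16 integer levels and reconstructed from that level plus a shared scale. This is a simplified symmetric scheme for illustration only, not the exact algorithm used to quantize this model.

```python
# Toy symmetric 4-bit quantization (illustrative only, not this model's
# exact scheme): floats -> integers in [-8, 7] plus one shared scale.
def quantize_4bit(weights):
    """Map floats to 4-bit signed integers with a per-tensor scale."""
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_4bit(q, scale):
    """Reconstruct approximate floats from levels and the scale."""
    return [v * scale for v in q]

weights = [0.12, -0.7, 0.33, 0.06]
q, scale = quantize_4bit(weights)
approx = dequantize_4bit(q, scale)
# Each reconstructed weight is within half a quantization step of the original.
```

Real schemes (e.g. NF4 in bitsandbytes) use non-uniform levels and per-block scales, but the storage saving is the same: 4 bits per weight instead of 16.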
Model type: Causal Language Model (LLM)
Quantization: 4-bit
Base model: Llama 3 8B
Framework: Transformers
Usage: Text generation, conversational agents, language understanding tasks
Note: the serverless Inference API is not available for this model, and the repository is disabled.