Edit model card

๐Ÿš€ al-baka-llama3-8b

Al Baka is an Experimental Fine Tuned Model based on the new released LLAMA3-8B Model on the Stanford Alpaca dataset Arabic version Yasbok/Alpaca_arabic_instruct.

Model Summary

Model Details

  • The model was fine-tuned in 4-bit precision using unsloth

  • The run is performed only for 1000 steps with a single Google Colab T4 GPU NVIDIA GPU with 15 GB of available memory.

The model is currently being Experimentally Fine Tuned to assess LLaMA-3's response to Arabic, following a brief period of fine-tuning. Larger and more sophisticated models will be introduced soon.

How to Get Started with the Model

Setup

# Install packages
%%capture
import torch
major_version, minor_version = torch.cuda.get_device_capability()
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
if major_version >= 8:
    # Use this for new GPUs like Ampere, Hopper GPUs (RTX 30xx, RTX 40xx, A100, H100, L40)
    !pip install --no-deps packaging ninja einops flash-attn xformers trl peft accelerate bitsandbytes
else:
    # Use this for older GPUs (V100, Tesla T4, RTX 20xx)
    !pip install --no-deps xformers trl peft accelerate bitsandbytes
pass

First, Load the Model

from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.


model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Omartificial-Intelligence-Space/al-baka-16bit-llama3-8b",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

Second, Try the model

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
       "ุงุณุชุฎุฏู… ุงู„ุจูŠุงู†ุงุช ุงู„ู…ุนุทุงุฉ ู„ุญุณุงุจ ุงู„ูˆุณูŠุท.", # instruction
        "[2 ุŒ 3 ุŒ 7 ุŒ 8 ุŒ 10]", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)

Recommendations

  • unsloth for finetuning models. You can get a 2x faster finetuned model which can be exported to any format or uploaded to Hugging Face.
Downloads last month
2,517
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Spaces using Omartificial-Intelligence-Space/al-baka-llama3-8b-experimental 5

Collection including Omartificial-Intelligence-Space/al-baka-llama3-8b-experimental