nicholasKluge/Aira-2-1B1-GGUF

Quantized GGUF model files for Aira-2-1B1 from nicholasKluge

Name	Quant method	Size
aira-2-1b1.fp16.gguf	fp16	2.20 GB
aira-2-1b1.q2_k.gguf	q2_k	482.15 MB
aira-2-1b1.q3_k_m.gguf	q3_k_m	549.86 MB
aira-2-1b1.q4_k_m.gguf	q4_k_m	667.83 MB
aira-2-1b1.q5_k_m.gguf	q5_k_m	782.06 MB
aira-2-1b1.q6_k.gguf	q6_k	903.43 MB
aira-2-1b1.q8_0.gguf	q8_0	1.17 GB
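If you are unsure which file to download, the sketch below picks the largest quantization whose file size fits a memory budget, using the sizes from the table above. It assumes 1 GB = 1000 MB for the table's units, and it treats file size only as a lower bound on memory use, since the runtime also needs room for the context cache and buffers. The helper name pick_quant and its headroom_mb parameter are hypothetical, not part of any library.

```python
from typing import Optional

# File sizes in MB, copied from the table above (assumes 1 GB = 1000 MB).
QUANT_SIZES_MB = {
    "aira-2-1b1.q2_k.gguf": 482.15,
    "aira-2-1b1.q3_k_m.gguf": 549.86,
    "aira-2-1b1.q4_k_m.gguf": 667.83,
    "aira-2-1b1.q5_k_m.gguf": 782.06,
    "aira-2-1b1.q6_k.gguf": 903.43,
    "aira-2-1b1.q8_0.gguf": 1170.0,
    "aira-2-1b1.fp16.gguf": 2200.0,
}


def pick_quant(budget_mb: float, headroom_mb: float = 0.0) -> Optional[str]:
    """Return the largest GGUF file that fits in budget_mb after reserving headroom_mb.

    headroom_mb is a hypothetical allowance for runtime overhead; the right value
    depends on your inference runtime and context length.
    """
    fitting = [
        (size, name)
        for name, size in QUANT_SIZES_MB.items()
        if size + headroom_mb <= budget_mb
    ]
    if not fitting:
        return None  # even the smallest file does not fit
    # max() over (size, name) tuples selects the largest file that still fits.
    return max(fitting)[1]


print(pick_quant(1000))                   # aira-2-1b1.q6_k.gguf
print(pick_quant(1000, headroom_mb=200))  # aira-2-1b1.q5_k_m.gguf
```

Picking the largest file that fits is a simple heuristic: larger quantizations keep more of the fp16 weights' precision, at the cost of memory and speed.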

Original Model Card:

Aira-2-1B1

Aira-2 is the second version of the Aira instruction-tuned series. Aira-2-1B1 is an instruction-tuned GPT-style model based on TinyLlama-1.1B. The model was trained with a dataset composed of prompts and completions generated synthetically by prompting already-tuned models (ChatGPT, Llama, Open-Assistant, etc.).

Check out our Gradio demo in Spaces.

Details

Size: 1,261,545,472 parameters
Dataset: Instruct-Aira Dataset
Language: English
Number of Epochs: 3
Batch size: 4
Optimizer: torch.optim.AdamW (learning_rate = 5e-4, epsilon = 1e-8), with a learning-rate warmup of 100 steps (warmup_steps = 1e2)
GPU: 1 NVIDIA A100-SXM4-40GB
Emissions: 1.78 kg CO2 (Singapore)
Total Energy Consumption: 3.64 kWh
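The optimizer settings listed above could be set up in PyTorch roughly as follows. This is a sketch, not the authors' training script: the card gives only the warmup length, so keeping the learning rate constant after warmup is an assumption, and build_optimizer and the small torch.nn.Linear stand-in model are hypothetical.

```python
import torch


def build_optimizer(model: torch.nn.Module, learning_rate: float = 5e-4,
                    epsilon: float = 1e-8, warmup_steps: int = 100):
    """AdamW with the card's hyperparameters plus a linear learning-rate warmup.

    Assumption: the learning rate stays constant after warmup; the card does not
    say which schedule (if any) followed the warmup.
    """
    optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate, eps=epsilon)

    def warmup_factor(step: int) -> float:
        # Scale the learning rate linearly from 0 to 1 over the first warmup_steps.
        return step / warmup_steps if step < warmup_steps else 1.0

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_factor)
    return optimizer, scheduler


# Usage with a tiny stand-in model (hypothetical; the real run trained Aira-2-1B1).
model = torch.nn.Linear(8, 8)
optimizer, scheduler = build_optimizer(model)
for _ in range(50):
    optimizer.step()   # no gradients here, so the weights are left unchanged
    scheduler.step()
print(optimizer.param_groups[0]["lr"])  # half of 5e-4, after 50 of 100 warmup steps
```

In a real training loop, loss.backward() would run before each optimizer.step(), and the scheduler would be stepped once per optimizer update.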

The source code used to train this model is available in the original Aira repository.

Usage

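Three special tokens mark the user's side of the interaction and the model's response. In the example below, the question is wrapped as tokenizer.bos_token + question + tokenizer.sep_token, and generation stops when the model emits the EOS token (tokenizer.eos_token_id):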

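from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Run on the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the original (non-GGUF) checkpoint and its tokenizer from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained('nicholasKluge/Aira-2-1B1')
aira = AutoModelForCausalLM.from_pretrained('nicholasKluge/Aira-2-1B1')

aira.eval()
aira.to(device)

question = input("Enter your question: ")

# Wrap the question in the special tokens the model was trained with: BOS + question + SEP.
inputs = tokenizer(tokenizer.bos_token + question + tokenizer.sep_token, return_tensors="pt").to(device)

# Sample two candidate responses; generation stops at the EOS token
# or when the sequence (prompt included) reaches 500 tokens.
responses = aira.generate(**inputs,
    bos_token_id=tokenizer.bos_token_id,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    do_sample=True,
    top_k=50,
    max_length=500,
    top_p=0.95,
    temperature=0.7,
    num_return_sequences=2)

print(f"Question: 👤 {question}\n")

# generate() returns the prompt followed by the new tokens, so drop the prompt before decoding.
prompt_length = inputs["input_ids"].shape[1]
for i, response in enumerate(responses):
    print(f'Response {i+1}: 🤖 {tokenizer.decode(response[prompt_length:], skip_special_tokens=True)}')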

The model will output something like:

>>>Question: 👤 What is the capital of Brazil?

>>>Response 1: 🤖 The capital of Brazil is Brasília.
>>>Response 2: 🤖 The capital of Brazil is Brasília.
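The prompt format used in the usage code (BOS token, then the question, then the SEP token, after which the model writes its answer and ends it with the EOS token) can be illustrated with plain strings. The marker values below are hypothetical placeholders; in practice they come from tokenizer.bos_token, tokenizer.sep_token and tokenizer.eos_token, and the helper names are illustrative only.

```python
# Hypothetical placeholder strings; the real values come from the Aira tokenizer
# (tokenizer.bos_token, tokenizer.sep_token, tokenizer.eos_token).
BOS, SEP, EOS = "<bos>", "<sep>", "<eos>"


def build_prompt(question: str) -> str:
    """Wrap a user question the same way the usage code does: BOS + question + SEP."""
    return BOS + question + SEP


def extract_answer(generated: str) -> str:
    """Return the text written after SEP, cut at the first EOS if one is present."""
    _, _, completion = generated.partition(SEP)
    answer, _, _ = completion.partition(EOS)
    return answer.strip()


prompt = build_prompt("What is the capital of Brazil?")
# Pretend the model continued the prompt with an answer and then emitted EOS.
generated = prompt + "The capital of Brazil is Brasília." + EOS
print(extract_answer(generated))  # The capital of Brazil is Brasília.
```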

Limitations

🤥 Generative models can produce pseudo-informative content, that is, false information that may appear truthful.

🤬 In certain types of tasks, generative models can produce harmful and discriminatory content inspired by historical stereotypes.

Evaluation

Model (TinyLlama)	Average	ARC	TruthfulQA	ToxiGen
Aira-2-1B1	42.55	25.26	50.81	51.59
TinyLlama-1.1B-intermediate-step-480k-1T	37.52	30.89	39.55	42.13
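The Average column is consistent with an unweighted mean of the three benchmark scores:

```latex
\begin{aligned}
\text{Average} &= \tfrac{1}{3}\,(\text{ARC} + \text{TruthfulQA} + \text{ToxiGen}) \\
\text{Aira-2-1B1:}\quad \tfrac{1}{3}\,(25.26 + 50.81 + 51.59) &= \tfrac{127.66}{3} \approx 42.55 \\
\text{TinyLlama:}\quad \tfrac{1}{3}\,(30.89 + 39.55 + 42.13) &= \tfrac{112.57}{3} \approx 37.52
\end{aligned}
```

A higher average does not mean a higher score on every benchmark: Aira-2-1B1 scores below the base checkpoint on ARC (25.26 vs 30.89).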

Evaluations were performed using the Language Model Evaluation Harness (by EleutherAI).

Cite as 🤗


@misc{nicholas22aira,
  doi = {10.5281/zenodo.6989727},
  url = {https://huggingface.co/nicholasKluge/Aira-2-1B1},
  author = {Nicholas Kluge Corrêa},
  title = {Aira},
  year = {2023},
  publisher = {HuggingFace},
  journal = {HuggingFace repository},
}

License

Aira-2-1B1 is licensed under the Apache License, Version 2.0. See the LICENSE file for more details.

Open LLM Leaderboard Evaluation Results

Detailed results are available from the Open LLM Leaderboard.

Metric	Value
Avg.	25.19
ARC (25-shot)	23.21
HellaSwag (10-shot)	26.97
MMLU (5-shot)	24.86
TruthfulQA (0-shot)	50.63
Winogrande (5-shot)	50.28
GSM8K (5-shot)	0.0
DROP (3-shot)	0.39
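The Avg. row matches the unweighted mean of all seven scores, so the near-zero GSM8K and DROP results pull it down considerably. A quick check, with the scores copied from the table above:

```python
from statistics import mean

# Scores copied from the Open LLM Leaderboard table above.
leaderboard_scores = {
    "ARC (25-shot)": 23.21,
    "HellaSwag (10-shot)": 26.97,
    "MMLU (5-shot)": 24.86,
    "TruthfulQA (0-shot)": 50.63,
    "Winogrande (5-shot)": 50.28,
    "GSM8K (5-shot)": 0.0,
    "DROP (3-shot)": 0.39,
}

leaderboard_average = mean(leaderboard_scores.values())
print(f"Avg. = {leaderboard_average:.2f}")  # Avg. = 25.19

# Leaving out GSM8K and DROP shows how much those two scores lower the average.
without_gsm8k_drop = mean(
    score for name, score in leaderboard_scores.items()
    if not name.startswith(("GSM8K", "DROP"))
)
print(f"Avg. without GSM8K/DROP = {without_gsm8k_drop:.2f}")  # Avg. without GSM8K/DROP = 35.19
```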
