File size: 6,804 Bytes
1d68918 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 |
---
library_name: transformers
datasets:
- argilla/distilabel-capybara-dpo-7k-binarized
---
# CapyLake-7B-v2-laser
This model is a finetune of [cognitivecomputations/WestLake-7B-v2-Laser](https://huggingface.co/cognitivecomputations/WestLake-7B-v2-laser) on [argilla/distilabel-capybara-dpo-7k-binarized](https://huggingface.co/datasets/argilla/distilabel-capybara-dpo-7k-binarized)
<div align="center">
![image/webp](https://cdn-uploads.huggingface.co/production/uploads/6455cc8d679315e4ef16fbec/kx2uwS_kZ-rTAJiusSrAW.webp)
[<img src="https://raw.githubusercontent.com/argilla-io/distilabel/main/docs/assets/distilabel-badge-dark.png" alt="Built with Distilabel" width="200" height="32"/>](https://github.com/argilla-io/distilabel)
</div>
## Process
+ Realigned the chat template to ChatML
+ Completed 1 Epoch
+ 5e-05 learning rate
+ Training time was about 2 hours on 1 H100
+ Cost was ~$8
## Code Example
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "macadeliccc/CapyLake-7B-v2-laser"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
text = "Create an idea for a TV show and write a short pilot script"
inputs = tokenizer(text, return_tensors="pt")
# Adding hyperparameters to the generation call
outputs = model.generate(
**inputs,
max_new_tokens=4096, # Controls the maximum length of the new tokens created
temperature=0.7, # Adjust for creativity (lower is less random)
top_k=50, # Keeps the top k tokens for sampling
top_p=0.95, # Uses nucleus sampling with this cumulative probability
num_return_sequences=1, # Number of sequences to generate
no_repeat_ngram_size=2, # Prevents repeating n-grams to ensure diversity
early_stopping=True # Stops generation when all sequences reach the EOS token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Other Capy Models
SOLAR-10.7B-Capy-v1.0 is also on the way. There could be more depending on performance!
## Evaluations
| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
|-------------------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
|[CapyLake-7B-v2-laser](https://huggingface.co/macadeliccc/CapyLake-7B-v2-laser)| 44.34| 77.77| 68.47| 47.92| 59.62|
### AGIEval
| Task |Version| Metric |Value| |Stderr|
|------------------------------|------:|--------|----:|---|-----:|
|agieval_aqua_rat | 0|acc |28.35|± | 2.83|
| | |acc_norm|25.98|± | 2.76|
|agieval_logiqa_en | 0|acc |38.86|± | 1.91|
| | |acc_norm|39.02|± | 1.91|
|agieval_lsat_ar | 0|acc |25.22|± | 2.87|
| | |acc_norm|24.35|± | 2.84|
|agieval_lsat_lr | 0|acc |50.39|± | 2.22|
| | |acc_norm|51.57|± | 2.22|
|agieval_lsat_rc | 0|acc |65.06|± | 2.91|
| | |acc_norm|63.94|± | 2.93|
|agieval_sat_en | 0|acc |78.64|± | 2.86|
| | |acc_norm|78.64|± | 2.86|
|agieval_sat_en_without_passage| 0|acc |40.78|± | 3.43|
| | |acc_norm|40.78|± | 3.43|
|agieval_sat_math | 0|acc |33.64|± | 3.19|
| | |acc_norm|30.45|± | 3.11|
Average: 44.34%
### GPT4All
| Task |Version| Metric |Value| |Stderr|
|-------------|------:|--------|----:|---|-----:|
|arc_challenge| 0|acc |66.89|± | 1.38|
| | |acc_norm|67.49|± | 1.37|
|arc_easy | 0|acc |86.70|± | 0.70|
| | |acc_norm|81.90|± | 0.79|
|boolq | 1|acc |88.10|± | 0.57|
|hellaswag | 0|acc |71.45|± | 0.45|
| | |acc_norm|87.78|± | 0.33|
|openbookqa | 0|acc |39.80|± | 2.19|
| | |acc_norm|49.80|± | 2.24|
|piqa | 0|acc |82.86|± | 0.88|
| | |acc_norm|84.87|± | 0.84|
|winogrande | 0|acc |84.45|± | 1.02|
Average: 77.77%
### TruthfulQA
| Task |Version|Metric|Value| |Stderr|
|-------------|------:|------|----:|---|-----:|
|truthfulqa_mc| 1|mc1 |53.98|± | 1.74|
| | |mc2 |68.47|± | 1.53|
Average: 68.47%
### Bigbench
| Task |Version| Metric |Value| |Stderr|
|------------------------------------------------|------:|---------------------|----:|---|-----:|
|bigbench_causal_judgement | 0|multiple_choice_grade|59.47|± | 3.57|
|bigbench_date_understanding | 0|multiple_choice_grade|64.50|± | 2.49|
|bigbench_disambiguation_qa | 0|multiple_choice_grade|44.96|± | 3.10|
|bigbench_geometric_shapes | 0|multiple_choice_grade|22.84|± | 2.22|
| | |exact_str_match | 2.79|± | 0.87|
|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|30.80|± | 2.07|
|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|21.57|± | 1.56|
|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|56.67|± | 2.87|
|bigbench_movie_recommendation | 0|multiple_choice_grade|51.60|± | 2.24|
|bigbench_navigate | 0|multiple_choice_grade|51.00|± | 1.58|
|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|70.35|± | 1.02|
|bigbench_ruin_names | 0|multiple_choice_grade|51.79|± | 2.36|
|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|35.97|± | 1.52|
|bigbench_snarks | 0|multiple_choice_grade|79.01|± | 3.04|
|bigbench_sports_understanding | 0|multiple_choice_grade|75.66|± | 1.37|
|bigbench_temporal_sequences | 0|multiple_choice_grade|47.90|± | 1.58|
|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|23.84|± | 1.21|
|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|18.00|± | 0.92|
|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|56.67|± | 2.87|
Average: 47.92%
Average score: 59.62%
Elapsed time: 01:57:56 |