---
language:
- en
license: other
tags:
- chat
license_name: tongyi-qianwen
license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
pipeline_tag: text-generation
library_name: transformers
---

# Dracarys2-72B-Instruct

# Introduction

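Dracarys is the latest model family in the Smaug series: a set of finetunes aimed at improving coding performance
across a variety of base models.

This variant is a finetune of [Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct).

Compared to Qwen2.5-72B-Instruct, Dracarys2-72B-Instruct scores higher on all three LiveCodeBench scenarios, with the largest gain on Test Output Prediction (59.61 vs. 46.28). The one exception in the difficulty breakdowns is hard Code Generation problems (9.47 vs. 9.99). See the evaluation results below.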

### Model Description

- **Developed by:** [Abacus.AI](https://abacus.ai)
- **License:** https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
- **Finetuned from model:** [Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct)

## How to use

The prompt format is unchanged from Qwen2.5-72B-Instruct. See the evaluation results for LiveCodeBench (LCB) prompt details.
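
Qwen2.5-72B-Instruct uses ChatML-style turns delimited by `<|im_start|>` and `<|im_end|>`, so the same layout applies here. As a rough illustration of what the rendered prompt looks like, the sketch below formats a message list by hand. The helper name `format_chatml` is hypothetical, and unlike the tokenizer's own template it does not inject a default system prompt; in practice, prefer `tokenizer.apply_chat_template`.

```python
def format_chatml(messages, add_generation_prompt=True):
    """Render chat messages as ChatML-style text (illustrative helper, not part of any library)."""
    parts = []
    for message in messages:
        # Each turn is wrapped as: <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example = format_chatml(
    [
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Reverse a string in Python."},
    ]
)
print(example)
```

Generation should stop when the model emits `<|im_end|>`, which closes the assistant turn.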

### Use with transformers

See the snippet below for usage with Transformers:

```python
import transformers
import torch

model_id = "abacusai/Dracarys2-72B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a data science coding assistant that generates Python code using Pandas and Numpy."},
    {"role": "user", "content": "Write code to select rows from the dataframe `df` having the maximum `temp` for each `city`"},
]

prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

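# Qwen2.5's chat template ends each turn with <|im_end|>
# (<|eot_id|> is a Llama 3 token and is not in this tokenizer's vocabulary).
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|im_end|>"),
]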

outputs = pipeline(
    prompt,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"][len(prompt):])
```

# Evaluation Results


## LiveCodeBench

| Model                      | Code Generation | Code Execution (COT) | Test Output Prediction |
|----------------------------|-----------------|----------------------|-----------------------|
| **Dracarys2-72B-Instruct** | **53.80**       | **89.12**            | **59.61**             |
| Qwen2.5-72B-Instruct       | 53.03           | 88.72                | 46.28                 |
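
To make the gaps in the table above concrete, the short sketch below recomputes the absolute point differences between the two models. The variable names are illustrative; the scores are copied from the table.

```python
# Scores copied from the LiveCodeBench table above.
dracarys = {
    "Code Generation": 53.80,
    "Code Execution (COT)": 89.12,
    "Test Output Prediction": 59.61,
}
qwen = {
    "Code Generation": 53.03,
    "Code Execution (COT)": 88.72,
    "Test Output Prediction": 46.28,
}

# Absolute difference in points (Dracarys minus Qwen), rounded to two decimals.
deltas = {task: round(dracarys[task] - qwen[task], 2) for task in dracarys}

for task, delta in deltas.items():
    print(f"{task}: {delta:+.2f}")
```

Most of the improvement comes from Test Output Prediction; the other two scenarios differ by less than one point.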

## Breakdown of LiveCodeBench Code Generation

| Model                      | Easy      | Medium    | Hard |
|----------------------------|-----------|-----------|------|
| **Dracarys2-72B-Instruct** | **88.79** | **50.28** | 9.47 |
| Qwen2.5-72B-Instruct       | 86.99     | 49.59     | 9.99 |

## Breakdown of LiveCodeBench Test Output Prediction

| Model                      | Easy      | Medium    | Hard      |
|----------------------------|-----------|-----------|-----------|
| **Dracarys2-72B-Instruct** | **79.25** | **53.76** | **37.63** |
| Qwen2.5-72B-Instruct       | 68.43     | 39.46     | 22.22     |