---
pipeline_tag: text-generation
inference: true
widget:
- text: 'def print_hello_world():'
  example_title: Hello world
  group: Python
license: bigcode-openrail-m
datasets:
- bigcode/commitpackft
- bigcode/oasst-octopack
metrics:
- code_eval
library_name: transformers
tags:
- code
model-index:
- name: OctoCoder
  results:
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize Python
    metrics:
    - name: pass@1
      type: pass@1
      value: 46.2
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize JavaScript
    metrics:
    - name: pass@1
      type: pass@1
      value: 39.2
      verified: false
---
![Octopack](https://github.com/bigcode-project/octopack/blob/31f3320f098703c7910e43492c39366eeea68d83/banner.png?raw=true)
# OctoCoder
Play with the model on the [BigCode Playground](https://huggingface.co/spaces/bigcode/bigcode-playground).
| Model (↓) | Python | JavaScript | Java | Go | C++ | Rust | Avg. |
|---|---|---|---|---|---|---|---|
| **HumanEvalFix** | | | | | | | |
| *Non-permissive models* | | | | | | | |
| WizardCoder | 31.8 | 29.5 | 12.7 | 30.4 | 18.7 | 13.0 | 22.7 |
| GPT-4 | 47.0 | 48.2 | 50.0 | 50.6 | 47.6 | 43.3 | 47.8 |
| *Permissive models* | | | | | | | |
| InstructCodeT5+‡ | 2.7 | 1.2 | 4.3 | 2.1 | 0.2 | 0.5 | 1.8 |
| BLOOMZ+ | 16.6 | 15.5 | 15.2 | 16.4 | 6.7 | 5.7 | 12.5 |
| StarChat-β | 18.1 | 18.1 | 24.1 | 18.1 | 8.2 | 3.6 | 11.2 |
| CodeGeeX2* | 15.9 | 14.7 | 18.0 | 13.6 | 4.3 | 6.1 | 12.1 |
| StarCoder | 8.7 | 15.7 | 13.3 | 20.1 | 15.6 | 6.7 | 13.4 |
| OctoGeeX* | 28.1 | 27.7 | 30.4 | 27.6 | 22.9 | 9.6 | 24.4 |
| OctoCoder | 30.2 | 28.4 | 30.6 | 30.2 | 26.1 | 16.5 | 27.0 |
| **HumanEvalExplain** | | | | | | | |
| *Non-permissive models* | | | | | | | |
| WizardCoder | 32.5 | 33.0 | 27.4 | 26.7 | 28.2 | 16.9 | 27.5 |
| GPT-4 | 64.6 | 57.3 | 51.2 | 58.5 | 38.4 | 42.7 | 52.1 |
| *Permissive models* | | | | | | | |
| InstructCodeT5+‡ | 20.8 | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 3.5 |
| BLOOMZ+ | 14.7 | 8.8 | 12.1 | 8.5 | 0.6 | 0.0 | 7.5 |
| StarChat-β | 25.4 | 21.5 | 24.5 | 18.4 | 17.6 | 13.2 | 20.1 |
| CodeGeeX2* | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| StarCoder | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| OctoGeeX* | 30.4 | 24.0 | 24.7 | 21.7 | 21.0 | 15.9 | 22.9 |
| OctoCoder | 35.1 | 24.5 | 27.3 | 21.1 | 24.1 | 14.8 | 24.5 |
| **HumanEvalSynthesize** | | | | | | | |
| *Non-permissive models* | | | | | | | |
| WizardCoder | 57.3 | 49.5 | 36.1 | 36.4 | 40.9 | 20.2 | 40.1 |
| GPT-4 | 86.6 | 82.9 | 81.7 | 72.6 | 78.7 | 67.1 | 78.3 |
| *Permissive models* | | | | | | | |
| InstructCodeT5+‡ | 37.0 | 18.9 | 17.4 | 9.5 | 19.8 | 0.3 | 17.1 |
| BLOOMZ+ | 15.6 | 14.8 | 18.4 | 8.4 | 6.5 | 5.5 | 11.5 |
| StarChat-β | 33.5 | 31.4 | 26.7 | 25.5 | 26.6 | 14.0 | 26.3 |
| CodeGeeX2* | 35.9 | 32.2 | 30.8 | 22.5 | 29.3 | 18.1 | 28.1 |
| StarCoder | 33.6 | 30.8 | 30.2 | 17.6 | 31.6 | 21.8 | 27.6 |
| OctoGeeX* | 44.7 | 33.8 | 36.9 | 21.9 | 32.3 | 15.7 | 30.9 |
| OctoCoder | 46.2 | 39.2 | 38.2 | 30.4 | 35.6 | 23.4 | 35.5 |
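
All numbers above are pass@1 on HumanEvalPack. As an illustrative sketch only (the reported results come from the OctoPack evaluation setup, not from this snippet), pass@1 can be estimated with the `code_eval` metric of the Hugging Face `evaluate` library; the toy problem and test below are hypothetical.

```python
# pip install -q evaluate
import os

import evaluate

# code_eval executes untrusted model-generated code, so it must be enabled explicitly.
os.environ["HF_ALLOW_CODE_EVAL"] = "1"

code_eval = evaluate.load("code_eval")

# Hypothetical example: one problem with a single candidate completion.
candidates = [["def add(a, b):\n    return a + b"]]
tests = ["assert add(2, 3) == 5"]

pass_at_k, results = code_eval.compute(references=tests, predictions=candidates, k=[1])
print(pass_at_k)  # {'pass@1': 1.0}
```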
## Table of Contents
1. [Model Summary](#model-summary)
2. [Use](#use)
3. [Limitations](#limitations)
4. [Training](#training)
5. [License](#license)
6. [Citation](#citation)
## Model Summary
OctoCoder is an instruction-tuned model with 15.5B parameters, created by fine-tuning StarCoder on CommitPackFT & OASST as described in the OctoPack paper.
- **Repository:** [bigcode/octopack](https://github.com/bigcode-project/octopack)
- **Paper:** [TODO]()
- **Languages:** 80+ programming languages
## Use
### Intended use
The model follows instructions provided in the input. We recommend prefacing your input with "Question: " and finishing with "Answer:", for example: "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"
**Feel free to share your generations in the Community tab!**
### Generation
```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/octocoder"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

# OctoCoder expects the "Question: ...\n\nAnswer:" instruction format.
inputs = tokenizer.encode("Question: Please write a function in Python that performs bubble sort.\n\nAnswer:", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
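
The snippet above loads the checkpoint in full precision. Since the model was trained in bfloat16 (see Training below), a lower-memory option is to pass a `torch_dtype` when loading; this is a sketch assuming a bfloat16-capable GPU and the `accelerate` package installed for `device_map`:

```python
# pip install -q transformers accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/octocoder"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# Load the weights in bfloat16 and let accelerate place them on the available devices.
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0]))
```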
# Training
## Model
- **Architecture:** GPT-2 model with multi-query attention and Fill-in-the-Middle objective
- **Steps:** 250k pretraining & 30 instruction tuning
- **Tokens:** 1 trillion pretraining & 2M instruction tuning
- **Precision:** bfloat16
## Hardware
- **Pretraining:**
  - **GPUs:** 512 Tesla A100
  - **Training time:** 24 days
- **Instruction tuning:**
  - **GPUs:** 8 Tesla A100
  - **Training time:** 4 hours
## Software
- **Orchestration:** [Megatron-LM/Transformers](https://github.com/bigcode-project/octopack#training)
- **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch)
# Citation
TODO