---
pipeline_tag: text-generation
inference: true
widget:
- text: 'def print_hello_world():'
  example_title: Hello world
  group: Python
license: bigcode-openrail-m
datasets:
- bigcode/commitpackft
- bigcode/oasst-octopack
metrics:
- code_eval
library_name: transformers
tags:
- code
model-index:
- name: OctoCoder
  results:
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize Python
    metrics:
    - name: pass@1
      type: pass@1
      value: 46.2
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize JavaScript
    metrics:
    - name: pass@1
      type: pass@1
      value: 39.2
      verified: false
---
# OctoCoder
Play with the model on the TODO Playground.
Model (↓) | Python | JavaScript | Java | Go | C++ | Rust | Avg. |
---|---|---|---|---|---|---|---|
**HumanEvalFix** | | | | | | | |
WizardCoder | 31.8 | 29.5 | 12.7 | 30.4 | 18.7 | 13.0 | 22.7 |
GPT-4 | 47.0 | 48.2 | 50.0 | 50.6 | 47.6 | 43.3 | 47.8 |
InstructCodeT5+‡ | 2.7 | 1.2 | 4.3 | 2.1 | 0.2 | 0.5 | 1.8 |
BLOOMZ+ | 16.6 | 15.5 | 15.2 | 16.4 | 6.7 | 5.7 | 12.5 |
StarChat-β | 18.1 | 18.1 | 24.1 | 18.1 | 8.2 | 3.6 | 11.2 |
CodeGeeX2* | 15.9 | 14.7 | 18.0 | 13.6 | 4.3 | 6.1 | 12.1 |
StarCoder | 8.7 | 15.7 | 13.3 | 20.1 | 15.6 | 6.7 | 13.4 |
OctoGeeX* | 28.1 | 27.7 | 30.4 | 27.6 | 22.9 | 9.6 | 24.4 |
OctoCoder | 30.2 | 28.4 | 30.6 | 30.2 | 26.1 | 16.5 | 27.0 |
**HumanEvalExplain** | | | | | | | |
WizardCoder | 32.5 | 33.0 | 27.4 | 26.7 | 28.2 | 16.9 | 27.5 |
GPT-4 | 64.6 | 57.3 | 51.2 | 58.5 | 38.4 | 42.7 | 52.1 |
InstructCodeT5+‡ | 20.8 | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 3.5 |
BLOOMZ+ | 14.7 | 8.8 | 12.1 | 8.5 | 0.6 | 0.0 | 7.5 |
StarChat-β | 25.4 | 21.5 | 24.5 | 18.4 | 17.6 | 13.2 | 20.1 |
CodeGeeX2* | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
StarCoder | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
OctoGeeX* | 30.4 | 24.0 | 24.7 | 21.7 | 21.0 | 15.9 | 22.9 |
OctoCoder | 35.1 | 24.5 | 27.3 | 21.1 | 24.1 | 14.8 | 24.5 |
**HumanEvalSynthesize** | | | | | | | |
WizardCoder | 57.3 | 49.5 | 36.1 | 36.4 | 40.9 | 20.2 | 40.1 |
GPT-4 | 86.6 | 82.9 | 81.7 | 72.6 | 78.7 | 67.1 | 78.3 |
InstructCodeT5+‡ | 37.0 | 18.9 | 17.4 | 9.5 | 19.8 | 0.3 | 17.1 |
BLOOMZ+ | 15.6 | 14.8 | 18.4 | 8.4 | 6.5 | 5.5 | 11.5 |
StarChat-β | 33.5 | 31.4 | 26.7 | 25.5 | 26.6 | 14.0 | 26.3 |
CodeGeeX2* | 35.9 | 32.2 | 30.8 | 22.5 | 29.3 | 18.1 | 28.1 |
StarCoder | 33.6 | 30.8 | 30.2 | 17.6 | 31.6 | 21.8 | 27.6 |
OctoGeeX* | 44.7 | 33.8 | 36.9 | 21.9 | 32.3 | 15.7 | 30.9 |
OctoCoder | 46.2 | 39.2 | 38.2 | 30.4 | 35.6 | 23.4 | 35.5 |
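
All results above are pass@1 scores. As a rough illustration of how a pass@1 score is computed, the minimal sketch below uses the `code_eval` metric from the `evaluate` library (not the exact harness behind this table; the test case and completion are made up):

```python
import os
from evaluate import load

# code_eval executes model-generated code, so it must be enabled explicitly.
os.environ["HF_ALLOW_CODE_EVAL"] = "1"

code_eval = load("code_eval")
test_cases = ["assert add(2, 3) == 5"]                # illustrative unit test
candidates = [["def add(a, b):\n    return a + b"]]   # illustrative model completion
pass_at_k, results = code_eval.compute(references=test_cases, predictions=candidates, k=[1])
print(pass_at_k)  # e.g. {'pass@1': 1.0}
```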
# Table of Contents

1. [Model Summary](#model-summary)
2. [Use](#use)
3. [Training](#training)
4. [Citation](#citation)

# Model Summary

OctoCoder is an instruction-tuned model with 15.5B parameters created by finetuning StarCoder on CommitPackFT & OASST as described in the OctoPack paper.

- Repository: bigcode/octopack
- Paper: TODO
- Languages: 80+ programming languages
# Use

## Intended use

The model follows instructions provided in the input. We recommend prefacing your input with "Question: " and finishing it with "Answer:", for example: "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"

Feel free to share your generations in the Community tab!
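
As a minimal illustration of that format, the helper below wraps a plain instruction in the expected template (`format_prompt` is a hypothetical name, not part of the released code):

```python
def format_prompt(instruction: str) -> str:
    # Hypothetical helper: wrap a plain instruction in the Question/Answer
    # template that OctoCoder was instruction-tuned on.
    return f"Question: {instruction}\n\nAnswer:"

print(format_prompt("Please write a function in Python that performs bubble sort."))
```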
## Generation

```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/octocoder"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("Question: Please write a function in Python that performs bubble sort.\n\nAnswer:", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
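
By default `generate` only produces a short continuation; for complete answers you will usually want to set `max_new_tokens`. The settings below are illustrative values, not tuned recommendations:

```python
# Continuing from the setup above; generation settings are illustrative.
outputs = model.generate(
    inputs,
    max_new_tokens=256,                   # leave room for a complete function
    do_sample=True,                       # sample instead of greedy decoding
    temperature=0.2,                      # low temperature keeps code focused
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,  # silence the missing-pad-token warning
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```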
# Training

## Model

- Architecture: GPT-2 model with multi-query attention and Fill-in-the-Middle objective (see the config-inspection sketch after this list)
- Steps: 250k pretraining & 30 instruction tuning
- Pretraining tokens: 1 trillion pretraining & 2M instruction tuning
- Precision: bfloat16
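
To check these settings programmatically, here is a minimal sketch; the attribute names assume the `transformers` GPTBigCode configuration used for StarCoder-family checkpoints:

```python
from transformers import AutoConfig

# Load the released config; attribute names assume transformers' GPTBigCode
# implementation used for StarCoder-family checkpoints.
config = AutoConfig.from_pretrained("bigcode/octocoder")
print(config.model_type)                     # architecture family
print(getattr(config, "multi_query", None))  # True if multi-query attention is enabled
print(config)                                # full hyperparameter listing
```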
## Hardware

- Pretraining:
  - GPUs: 512 Tesla A100
  - Training time: 24 days
- Instruction tuning:
  - GPUs: 8 Tesla A100
  - Training time: 4 hours
## Software

- Orchestration: Megatron-LM/Transformers
- Neural networks: PyTorch
# Citation

TODO