---
pipeline_tag: text-generation
inference: true
widget:
- text: 'def print_hello_world():'
  example_title: Hello world
  group: Python
license: bigcode-openrail-m
datasets:
- bigcode/commitpackft
- bigcode/oasst-octopack
metrics:
- code_eval
library_name: transformers
tags:
- code
model-index:
- name: OctoCoder
  results:
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize Python
    metrics:
    - name: pass@1
      type: pass@1
      value: 46.2
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize JavaScript
    metrics:
    - name: pass@1
      type: pass@1
      value: 39.2
      verified: false
---
# OctoCoder
Play with the model on the TODO Playground.
Model (↓) | Python | JavaScript | Java | Go | C++ | Rust | Avg. |
---|---|---|---|---|---|---|---|
**HumanEvalFix** | | | | | | | |
WizardCoder | 31.8 | 29.5 | 12.7 | 30.4 | 18.7 | 13.0 | 22.7 |
GPT-4 | 47.0 | 48.2 | 50.0 | 50.6 | 47.6 | 43.3 | 47.8 |
InstructCodeT5+‡ | 2.7 | 1.2 | 4.3 | 2.1 | 0.2 | 0.5 | 1.8 |
BLOOMZ+ | 16.6 | 15.5 | 15.2 | 16.4 | 6.7 | 5.7 | 12.5 |
StarChat-β | 18.1 | 18.1 | 24.1 | 18.1 | 8.2 | 3.6 | 11.2 |
CodeGeeX2* | 15.9 | 14.7 | 18.0 | 13.6 | 4.3 | 6.1 | 12.1 |
StarCoder | 8.7 | 15.7 | 13.3 | 20.1 | 15.6 | 6.7 | 13.4 |
OctoGeeX* | 28.1 | 27.7 | 30.4 | 27.6 | 22.9 | 9.6 | 24.4 |
OctoCoder | 30.2 | 28.4 | 30.6 | 30.2 | 26.1 | 16.5 | 27.0 |
**HumanEvalExplain** | | | | | | | |
WizardCoder | 32.5 | 33.0 | 27.4 | 26.7 | 28.2 | 16.9 | 27.5 |
GPT-4 | 64.6 | 57.3 | 51.2 | 58.5 | 38.4 | 42.7 | 52.1 |
InstructCodeT5+‡ | 20.8 | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 3.5 |
BLOOMZ+ | 14.7 | 8.8 | 12.1 | 8.5 | 0.6 | 0.0 | 7.5 |
StarChat-β | 25.4 | 21.5 | 24.5 | 18.4 | 17.6 | 13.2 | 20.1 |
CodeGeeX2* | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
StarCoder | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
OctoGeeX* | 30.4 | 24.0 | 24.7 | 21.7 | 21.0 | 15.9 | 22.9 |
OctoCoder | 35.1 | 24.5 | 27.3 | 21.1 | 24.1 | 14.8 | 24.5 |
**HumanEvalSynthesize** | | | | | | | |
WizardCoder | 57.3 | 49.5 | 36.1 | 36.4 | 40.9 | 20.2 | 40.1 |
GPT-4 | 86.6 | 82.9 | 81.7 | 72.6 | 78.7 | 67.1 | 78.3 |
InstructCodeT5+‡ | 37.0 | 18.9 | 17.4 | 9.5 | 19.8 | 0.3 | 17.1 |
BLOOMZ+ | 15.6 | 14.8 | 18.4 | 8.4 | 6.5 | 5.5 | 11.5 |
StarChat-β | 33.5 | 31.4 | 26.7 | 25.5 | 26.6 | 14.0 | 26.3 |
CodeGeeX2* | 35.9 | 32.2 | 30.8 | 22.5 | 29.3 | 18.1 | 28.1 |
StarCoder | 33.6 | 30.8 | 30.2 | 17.6 | 31.6 | 21.8 | 27.6 |
OctoGeeX* | 44.7 | 33.8 | 36.9 | 21.9 | 32.3 | 15.7 | 30.9 |
OctoCoder | 46.2 | 39.2 | 38.2 | 30.4 | 35.6 | 23.4 | 35.5 |
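
All results above are pass@1 scores. As a rough illustration of how a pass@1 score is computed, the minimal sketch below uses the `code_eval` metric from the `evaluate` library (not the exact harness behind this table; the test case and completion are made up):

```python
import os
from evaluate import load

# code_eval executes model-generated code, so it must be enabled explicitly.
os.environ["HF_ALLOW_CODE_EVAL"] = "1"

code_eval = load("code_eval")
test_cases = ["assert add(2, 3) == 5"]                # illustrative unit test
candidates = [["def add(a, b):\n    return a + b"]]   # illustrative model completion
pass_at_k, results = code_eval.compute(references=test_cases, predictions=candidates, k=[1])
print(pass_at_k)  # e.g. {'pass@1': 1.0}
```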
# Table of Contents

1. [Model Summary](#model-summary)
2. [Use](#use)
3. [Training](#training)
4. [Citation](#citation)

# Model Summary

OctoCoder is an instruction-tuned model with 15.5B parameters created by finetuning StarCoder on CommitPackFT & OASST as described in the OctoPack paper.

- Repository: bigcode/octopack
- Paper: TODO
- Languages: 80+ programming languages
# Use

## Intended use

The model follows instructions provided in the input. We recommend prefacing your input with "Question: " and finishing it with "Answer:", for example: "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"

Feel free to share your generations in the Community tab!
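
As a minimal illustration of that format, the helper below wraps a plain instruction in the expected template (`format_prompt` is a hypothetical name, not part of the released code):

```python
def format_prompt(instruction: str) -> str:
    # Hypothetical helper: wrap a plain instruction in the Question/Answer
    # template that OctoCoder was instruction-tuned on.
    return f"Question: {instruction}\n\nAnswer:"

print(format_prompt("Please write a function in Python that performs bubble sort."))
```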
## Generation

```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/octocoder"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("Question: Please write a function in Python that performs bubble sort.\n\nAnswer:", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
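
By default `generate` only produces a short continuation; for complete answers you will usually want to set `max_new_tokens`. The settings below are illustrative values, not tuned recommendations:

```python
# Continuing from the setup above; generation settings are illustrative.
outputs = model.generate(
    inputs,
    max_new_tokens=256,                   # leave room for a complete function
    do_sample=True,                       # sample instead of greedy decoding
    temperature=0.2,                      # low temperature keeps code focused
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,  # silence the missing-pad-token warning
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```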
# Training

## Model

- Architecture: GPT-2 model with multi-query attention and Fill-in-the-Middle objective (see the config-inspection sketch after this list)
- Steps: 250k pretraining & 30 instruction tuning
- Pretraining tokens: 1 trillion pretraining & 2M instruction tuning
- Precision: bfloat16
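
To check these settings programmatically, here is a minimal sketch; the attribute names assume the `transformers` GPTBigCode configuration used for StarCoder-family checkpoints:

```python
from transformers import AutoConfig

# Load the released config; attribute names assume transformers' GPTBigCode
# implementation used for StarCoder-family checkpoints.
config = AutoConfig.from_pretrained("bigcode/octocoder")
print(config.model_type)                     # architecture family
print(getattr(config, "multi_query", None))  # True if multi-query attention is enabled
print(config)                                # full hyperparameter listing
```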
## Hardware

- Pretraining:
  - GPUs: 512 Tesla A100
  - Training time: 24 days
- Instruction tuning:
  - GPUs: 8 Tesla A100
  - Training time: 4 hours
## Software

- Orchestration: Megatron-LM/Transformers
- Neural networks: PyTorch
# Citation

TODO