Commit 7e35312 by kazemnejad
1 Parent(s): bf94f38

Create README.md

  1. README.md +112 -0
README.md ADDED
@@ -0,0 +1,112 @@
+ ---
+ license: apache-2.0
+ datasets:
+ - bigcode/starcoderdata
+ language:
+ - en
+ ---
+ # McGill-NLP/codellm_1b_rotary
+
+ This model is a 1B-scale decoder-only transformer trained to study how positional encoding affects length generalization. This variant uses **Rotary** positional encoding.
+
+ ## Usage Example
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "McGill-NLP/codellm_1b_rotary"
+
+ # Important: `trust_remote_code=True` is required because the custom
+ # architecture supports different positional encodings, so the model
+ # implementation is downloaded from the Hugging Face Hub
+ model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ print(model.config.position_encoding_type)
+ # Outputs: `rotary`
+
+ prompt = "def print_hello_world():"
+ input_ids = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).input_ids
+ # Prepend the <bos> token, keeping it on the same device as `input_ids`
+ bos = torch.tensor([[tokenizer.bos_token_id]], device=input_ids.device)
+ input_ids = torch.cat([bos, input_ids], dim=1)
+
+ output = model.generate(input_ids, do_sample=True, temperature=0.2, max_length=16)
+ print(tokenizer.decode(output[0]))
+ ```
+
+ ## Model Details
+
+ ### Model Description
+
+ - **Developed by:** McGill NLP Group
+ - **Model type:** Decoder-only transformer
+ - **Language(s) (NLP):** Primarily English, along with the programming languages in its training data (Python, Java, and JavaScript).
+ - **License:** Apache 2.0
+ - **Finetuned from model:** None; this model is pretrained from scratch.
+
+ ### Model Sources
+
+ - **Repository:** [McGill-NLP/Length-Generalization GitHub Repository](https://github.com/McGill-NLP/length-generalization)
+ - **Paper:** [The Impact of Positional Encoding on Length Generalization in Transformers](https://arxiv.org/abs/2305.19466)
+
+ ## Uses
+
+ ### Direct Use
+
+ The model can be used directly for text generation. Because it was trained on source code, it is best suited to code-related tasks such as code completion, bug fixing, and code generation.
+
+ ## Bias, Risks, and Limitations
+
+ Because the model was trained on source code, it may inherit biases present in the underlying dataset, including biases towards more commonly used programming languages or coding styles. Users should be cautious when applying this model to underrepresented programming languages and contexts.
+ This model has not undergone safety training and is released for research purposes only. The user is solely responsible for the outputs of this model.
+
+ ### Recommendations
+
+ Users should consider the context and diversity of the application domain when employing this model, especially in critical systems. Further evaluation and fine-tuning might be necessary to mitigate potential biases or limitations for specific use cases.
+
+ ## How to Get Started with the Model
+
+ Use the code in the Usage Example section above to start generating text or code. Make sure the required dependencies, including `torch` and `transformers`, are installed in your environment.
+
+ ## Training Details
+
+ ### Training Data
+
+ The model was pretrained on 30M source code files from the StarCoder corpus, amounting to 30B tokens. The training data mix is:
+
+ - 40% Python
+ - 25% Java
+ - 25% JavaScript
+ - 5% GitHub issues
+ - 5% GitHub commits
+
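As a rough illustration of what this mix means in absolute terms, the sketch below multiplies the 30B-token total by each share. It assumes the percentages are shares of tokens rather than of files, which the text above does not state. The names `TOTAL_TOKENS`, `MIX`, and `tokens_per_source` are illustrative only.

```python
# Assumption: the percentages are shares of the 30B training tokens (not of files).
TOTAL_TOKENS = 30_000_000_000

MIX = {  # share of the training mix, in percent
    "Python": 40,
    "Java": 25,
    "JavaScript": 25,
    "GitHub issues": 5,
    "GitHub commits": 5,
}


def tokens_per_source(total, mix_percent):
    """Return the approximate token count contributed by each source."""
    return {name: total * pct // 100 for name, pct in mix_percent.items()}


if __name__ == "__main__":
    for name, count in tokens_per_source(TOTAL_TOKENS, MIX).items():
        print(f"{name}: ~{count / 1e9:.1f}B tokens")
```

Under that assumption, Python would account for roughly 12B tokens, Java and JavaScript for about 7.5B each, and GitHub issues and commits for about 1.5B each.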
+
+ ### Training Procedure
+
+ The model follows a decoder-only architecture with 1.3 billion parameters and was trained to predict the next token in the sequence. For details on the training procedure, see the paper linked above.
+
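To make the next-token objective concrete, here is a toy sketch in plain Python. It is not the training code: the probability tables stand in for model outputs, and the helper names `next_token_pairs` and `next_token_loss` are hypothetical. It shows how a sequence is shifted by one position to form (context token, target token) pairs, and how the loss averages the negative log-probability of each target.

```python
import math


def next_token_pairs(token_ids):
    """Pair each token with the token that follows it."""
    return list(zip(token_ids[:-1], token_ids[1:]))


def next_token_loss(probs_per_position, token_ids):
    """Average negative log-likelihood of each next token.

    probs_per_position[i] maps candidate token ids to the probability
    assigned to them after seeing token_ids[: i + 1] (toy stand-in for
    a model's softmax output).
    """
    targets = token_ids[1:]
    nll = [-math.log(probs[t]) for probs, t in zip(probs_per_position, targets)]
    return sum(nll) / len(nll)


if __name__ == "__main__":
    tokens = [5, 7, 9]  # made-up token ids
    toy_probs = [{7: 0.5, 9: 0.5}, {7: 0.75, 9: 0.25}]
    print(next_token_pairs(tokens))
    print(next_token_loss(toy_probs, tokens))
```

The last token has no successor, so a sequence of length n yields n - 1 training targets.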
+
+ ## Technical Specifications
+
+ ### Model Architecture and Objective
+
+ The model uses a decoder-only transformer architecture with Rotary positional encoding and is trained with a next-token prediction objective.
+
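For readers unfamiliar with rotary positional encoding, the following is a minimal sketch of the general technique in plain Python, not this model's implementation. It assumes the common formulation in which consecutive pairs of vector components are rotated by an angle proportional to the token position, with a frequency base of 10000. The function names and the pairing convention are assumptions made for illustration, and the actual model code may differ.

```python
import math


def rotary_embed(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("rotary encoding needs an even dimension")
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)  # rotation angle for this pair
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0, 4.0]
    k = [0.5, -1.0, 2.0, 0.0]
    # Both pairs of positions are 2 apart, so the two scores should match.
    print(dot(rotary_embed(q, 3), rotary_embed(k, 1)))
    print(dot(rotary_embed(q, 10), rotary_embed(k, 8)))
```

Because each pair is only rotated, vector norms are preserved, and the query-key dot product depends on the distance between the two positions rather than on their absolute values.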
+ ## Citation
+
+ Please cite the following paper if you use this model in your work:
+
+ ```bibtex
+ @inproceedings{kazemnejad2023:ImpactOfPeOnLengthGen,
+   title={The Impact of Positional Encoding on Length Generalization in Transformers},
+   author={Amirhossein Kazemnejad and Inkit Padhi and Karthikeyan Natesan and Payel Das and Siva Reddy},
+   booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
+   year={2023},
+   url={https://openreview.net/forum?id=Drrl2gcjzl}
+ }
+ ```
+
+ ## More Information
+
+ For further details about the model's architecture, training, and applications, please refer to the paper and the GitHub repository linked above.