Update README.md
README.md
CHANGED
@@ -3,164 +3,18 @@ license: other
|
|
3 |
license_name: deepseek-license
|
4 |
license_link: LICENSE
|
5 |
---
|
|
|
|
|
6 |
|
7 |
-
|
8 |
-
|
9 |
-
</p>
|
10 |
-
<p align="center"><a href="https://www.deepseek.com/">[🏠Homepage]</a> | <a href="https://coder.deepseek.com/">[🤖 Chat with DeepSeek Coder]</a> | <a href="https://discord.gg/Tc7c45Zzu5">[Discord]</a> | <a href="https://github.com/guoday/assert/blob/main/QR.png?raw=true">[Wechat(微信)]</a> </p>
|
11 |
-
<hr>
|
12 |
|
|
|
|
|
13 |
|
14 |
-
|
|
|
15 |
|
16 |
-
|
17 |
-
|
18 |
-
-
|
19 |
-
|
20 |
-
- **Highly Flexible & Scalable**: Offered in model sizes of 1.3B, 5.7B, 6.7B, and 33B, enabling users to choose the setup most suitable for their requirements.
|
21 |
-
|
22 |
-
- **Superior Model Performance**: State-of-the-art performance among publicly available code models on HumanEval, MultiPL-E, MBPP, DS-1000, and APPS benchmarks.
|
23 |
-
|
24 |
-
- **Advanced Code Completion Capabilities**: A window size of 16K and a fill-in-the-blank task, supporting project-level code completion and infilling tasks.
|
25 |
-
|
26 |
-
|
27 |
-
|
28 |
-
### 2. Model Summary
|
29 |
-
deepseek-coder-6.7b-base is a 6.7B parameter model with Multi-Head Attention trained on 2 trillion tokens.
|
30 |
-
- **Home Page:** [DeepSeek](https://deepseek.com/)
|
31 |
-
- **Repository:** [deepseek-ai/deepseek-coder](https://github.com/deepseek-ai/deepseek-coder)
|
32 |
-
- **Chat With DeepSeek Coder:** [DeepSeek-Coder](https://coder.deepseek.com/)
|
33 |
-
|
34 |
-
|
35 |
-
### 3. How to Use
|
36 |
-
Here are some examples of how to use our model.
|
37 |
-
#### 1) Code Completion
|
38 |
-
```python
|
39 |
-
from transformers import AutoTokenizer, AutoModelForCausalLM
|
40 |
-
import torch
|
41 |
-
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True)
|
42 |
-
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True).cuda()
|
43 |
-
input_text = "#write a quick sort algorithm"
|
44 |
-
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
|
45 |
-
outputs = model.generate(**inputs, max_length=128)
|
46 |
-
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
|
47 |
-
```
|
48 |
-
|
49 |
-
#### 2) Code Insertion
|
50 |
-
```python
|
51 |
-
from transformers import AutoTokenizer, AutoModelForCausalLM
|
52 |
-
import torch
|
53 |
-
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True)
|
54 |
-
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True).cuda()
|
55 |
-
input_text = """<|fim▁begin|>def quick_sort(arr):
|
56 |
-
    if len(arr) <= 1:
|
57 |
-
        return arr
|
58 |
-
    pivot = arr[0]
|
59 |
-
    left = []
|
60 |
-
    right = []
|
61 |
-
<|fim▁hole|>
|
62 |
-
        if arr[i] < pivot:
|
63 |
-
            left.append(arr[i])
|
64 |
-
        else:
|
65 |
-
            right.append(arr[i])
|
66 |
-
    return quick_sort(left) + [pivot] + quick_sort(right)<|fim▁end|>"""
|
67 |
-
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
|
68 |
-
outputs = model.generate(**inputs, max_length=128)
|
69 |
-
print(tokenizer.decode(outputs[0], skip_special_tokens=True)[len(input_text):])
|
70 |
-
```
|
71 |
-
|
72 |
-
#### 3) Repository Level Code Completion
|
73 |
-
```python
|
74 |
-
from transformers import AutoTokenizer, AutoModelForCausalLM
|
75 |
-
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True)
|
76 |
-
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True).cuda()
|
77 |
-
|
78 |
-
input_text = """#utils.py
|
79 |
-
import torch
|
80 |
-
from sklearn import datasets
|
81 |
-
from sklearn.model_selection import train_test_split
|
82 |
-
from sklearn.preprocessing import StandardScaler
|
83 |
-
from sklearn.metrics import accuracy_score
|
84 |
-
|
85 |
-
def load_data():
|
86 |
-
    iris = datasets.load_iris()
|
87 |
-
    X = iris.data
|
88 |
-
    y = iris.target
|
89 |
-
|
90 |
-
    # Standardize the data
|
91 |
-
    scaler = StandardScaler()
|
92 |
-
    X = scaler.fit_transform(X)
|
93 |
-
|
94 |
-
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
|
95 |
-
|
96 |
-
    # Convert numpy data to PyTorch tensors
|
97 |
-
    X_train = torch.tensor(X_train, dtype=torch.float32)
|
98 |
-
    X_test = torch.tensor(X_test, dtype=torch.float32)
|
99 |
-
    y_train = torch.tensor(y_train, dtype=torch.int64)
|
100 |
-
    y_test = torch.tensor(y_test, dtype=torch.int64)
|
101 |
-
|
102 |
-
    return X_train, X_test, y_train, y_test
|
103 |
-
|
104 |
-
def evaluate_predictions(y_test, y_pred):
|
105 |
-
    return accuracy_score(y_test, y_pred)
|
106 |
-
#model.py
|
107 |
-
import torch
|
108 |
-
import torch.nn as nn
|
109 |
-
import torch.optim as optim
|
110 |
-
from torch.utils.data import DataLoader, TensorDataset
|
111 |
-
|
112 |
-
class IrisClassifier(nn.Module):
|
113 |
-
    def __init__(self):
|
114 |
-
        super(IrisClassifier, self).__init__()
|
115 |
-
        self.fc = nn.Sequential(
|
116 |
-
            nn.Linear(4, 16),
|
117 |
-
            nn.ReLU(),
|
118 |
-
            nn.Linear(16, 3)
|
119 |
-
        )
|
120 |
-
|
121 |
-
    def forward(self, x):
|
122 |
-
        return self.fc(x)
|
123 |
-
|
124 |
-
    def train_model(self, X_train, y_train, epochs, lr, batch_size):
|
125 |
-
        criterion = nn.CrossEntropyLoss()
|
126 |
-
        optimizer = optim.Adam(self.parameters(), lr=lr)
|
127 |
-
|
128 |
-
        # Create DataLoader for batches
|
129 |
-
        dataset = TensorDataset(X_train, y_train)
|
130 |
-
        dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
|
131 |
-
|
132 |
-
        for epoch in range(epochs):
|
133 |
-
            for batch_X, batch_y in dataloader:
|
134 |
-
                optimizer.zero_grad()
|
135 |
-
                outputs = self(batch_X)
|
136 |
-
                loss = criterion(outputs, batch_y)
|
137 |
-
                loss.backward()
|
138 |
-
                optimizer.step()
|
139 |
-
|
140 |
-
    def predict(self, X_test):
|
141 |
-
        with torch.no_grad():
|
142 |
-
            outputs = self(X_test)
|
143 |
-
            _, predicted = outputs.max(1)
|
144 |
-
        return predicted.numpy()
|
145 |
-
#main.py
|
146 |
-
from utils import load_data, evaluate_predictions
|
147 |
-
from model import IrisClassifier as Classifier
|
148 |
-
|
149 |
-
def main():
|
150 |
-
    # Model training and evaluation
|
151 |
-
"""
|
152 |
-
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
|
153 |
-
outputs = model.generate(**inputs, max_new_tokens=140)
|
154 |
-
print(tokenizer.decode(outputs[0]))
|
155 |
-
```
|
156 |
-
|
157 |
-
|
158 |
-
|
159 |
-
### 4. License
|
160 |
-
This code repository is licensed under the MIT License. The use of DeepSeek Coder models is subject to the Model License. DeepSeek Coder supports commercial use.
|
161 |
-
|
162 |
-
See the [LICENSE-MODEL](https://github.com/deepseek-ai/deepseek-coder/blob/main/LICENSE-MODEL) for more details.
|
163 |
-
|
164 |
-
### 5. Contact
|
165 |
-
|
166 |
-
If you have any questions, please raise an issue or contact us at [agi_code@deepseek.com](mailto:agi_code@deepseek.com).
|
|
|
3 |
license_name: deepseek-license
|
4 |
license_link: LICENSE
|
5 |
---
|
6 |
+
# deepseek-coder-6.7B-chat
|
7 |
+
This model was created by fine-tuning deepseek-coder-6.7B on the Open Assistant dataset. The wandb report is attached as a PDF so you can view the training run at a glance.
|
8 |
|
9 |
+
# Reason
|
10 |
+
This model was fine-tuned to follow directions and is a stepping stone to further training, but it should still be useful for asking questions about code.
|
|
|
|
|
|
|
11 |
|
12 |
+
# Template
|
13 |
+
Use the following template when interacting with the fine-tuned model.
|
14 |
|
15 |
+
# Getting Started
|
16 |
+
A quick start guide.
|
17 |
|
18 |
+
# Referrals
|
19 |
+
RunPod - This is the service I use to train the models on Hugging Face. If you use it, we both get free credits.
|
20 |
+
PayPal - If you want to leave a tip, it is appreciated.
|