Teja-Gollapudi committed on
Commit ea64011 • 1 Parent(s): 9134501
Create README.md
README.md ADDED
@@ -0,0 +1,128 @@
---
license: cc-by-sa-3.0
datasets:
- VMware/open-instruct-v1-oasst-dolly-hhrlhf
language:
- en
library_name: transformers
pipeline_tag: text-generation
---

# VMware/open-llama-7B-v2-open-instruct
Instruction-tuned version of the fully trained OpenLLaMA 7B v2 model. The model is open for <b>COMMERCIAL USE</b>. <br>

- This model performs better on code than v1, thanks to improvements the openlm-research team made to the base model.
- The instruction model is trained on an improved instruction-tuning dataset compared to v1.

<b>NOTE</b>: The model was trained using the Alpaca prompt template. <br>
<b>NOTE</b>: The fast tokenizer produces incorrect encodings; set `use_fast=False` when instantiating the tokenizer.

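
Because the model was trained with the Alpaca prompt template, instructions should be wrapped in that template before tokenization. A minimal sketch follows, using the template string from the usage example in this README. The helper name `build_prompt` and the constant `ALPACA_TEMPLATE` are illustrative assumptions, not part of the model's API.

```python
# Alpaca-style template, copied from the usage example in this README.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_prompt(instruction: str) -> str:
    """Wrap a bare instruction in the Alpaca template (hypothetical helper)."""
    return ALPACA_TEMPLATE.format(instruction=instruction.strip())


print(build_prompt("Summarize the Apache 2.0 license in one sentence."))
```

The resulting string is what gets passed to the tokenizer; the model's answer is whatever it generates after `### Response:`.
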
## License
- <b>Commercially Viable</b>

- Open-instruct-v1
    - Mosaic/Dolly-HHRLHF + filtered OASST1 - CC BY 3.0

Subset of COT SUBMIX (from FLAN V2) zero-shot examples
- ESNLI - MIT
- ECQA - CDLA-Sharing 1.0
- StrategyQA - MIT
- CREAK - MIT
- GSM8K - MIT
- AQuA - MIT
- QASC - Apache 2.0
- Language model ([openlm-research/open_llama_v2_7b](https://huggingface.co/openlm-research/open_llama_v2_7b)): Apache 2.0


## Nomenclature

- Model: Open-llama-v2
- Model size: 7B parameters
- Dataset: Open-instruct (oasst, dolly, hhrlhf)

## Use in Transformers

````python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id of this (v2) model.
model_name = 'VMware/open-llama-7b-v2-open-instruct'

# The fast tokenizer encodes incorrectly for this model, so load the slow one.
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map='sequential')

# Alpaca prompt template used during instruction tuning.
prompt_template = "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:"

prompt = """What is attention mechanism of a transformer model? 
Write a python code to illustrate how attention works within a transformer model using numpy library. Donot use pytorch or tensorflow."""


input_text = prompt_template.format(instruction=prompt)
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

output1 = model.generate(input_ids, max_length=512)
# Drop the prompt tokens so that only the generated continuation is decoded.
input_length = input_ids.shape[1]
output1 = output1[:, input_length:]
output = tokenizer.decode(output1[0])

print(output)

'''
Sure, I can help you with that!

Attention mechanisms in transformer models are typically implemented using the attention mechanism in the self-attention layer. Self-attention allows the model to focus on different parts of the input sequence when processing it. This is achieved by computing a set of attention weights, which are used to weigh the contribution of each input element to the output.

Here's an example code using NumPy to illustrate how attention works in a transformer model:

```python
import numpy as np

def attention_weights(query, key, value, mask):
    # Query, key, and value are input tensors. Mask is a tensor of zeros and ones that represents the attention mask.
    # It is used to prevent the model from attending to certain positions in the input sequence if they are not relevant.
    # The attention weights are the element-wise product of the query, key, and mask tensors.
    # The result is a tensor of the same shape as the query tensor.

    # Compute the dot product between the query tensor and the key tensor
    dot = np.matmul(query, key)

    # Compute the element-wise softmax of the dot product tensor
    exp_dot = np.exp(dot)

    # Multiply the dot product and the softmax of the dot product tensors
    weights = dot * exp_dot

    # Return the attention weights as a NumPy tensor
    return weights

# Define the input sequence
query = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
key = np.array([[0.1, 0.2], [0.3, 0.4]])
value = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
mask = np.array([[False, True, True], [False, True, True]])

# Compute the attention weights
weights = attention_weights(query, key, value, mask)

# Print the attention weights
print(weights)
```

In this example, the `attention_weights` function takes as input the query tensor, key tensor, value tensor, and mask tensor. It computes the dot product between the query and key tensors using the `np.matmul` function, and then applies a softmax function using the `np.exp` function to the element-wise dot product tensor. It then multiplies the dot product and softmax tensors using the `np.matmul` function, and returns the result as a NumPy tensor.

The `query`, `key`, and `value` tensors represent the input sequence to the transformer model. The `mask` tensor represents the attention mask, which is used to prevent the model from attending to certain positions in the input sequence if they are not relevant.

The output of the `attention_weights` function is a NumPy tensor that represents the attention weights for the input sequence. These weights are used by the transformer model to weigh the contribution of each input element to the output.

I hope this helps!</s>
'''
````
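
The sample output above ends with `</s>` because `tokenizer.decode` is called without `skip_special_tokens=True`, so the end-of-sequence token stays in the text. Passing `skip_special_tokens=True` to `decode` is the usual way to drop it. If the text has already been decoded, a small helper along these lines would also work; the name `strip_eos` and the default marker are illustrative assumptions, not part of this model card.

```python
def strip_eos(text: str, eos_token: str = "</s>") -> str:
    """Remove one trailing end-of-sequence marker and surrounding whitespace.

    Hypothetical post-processing helper; `eos_token` is assumed to be the
    literal string the tokenizer emits for end-of-sequence.
    """
    text = text.strip()
    if text.endswith(eos_token):
        text = text[: -len(eos_token)]
    return text.strip()


print(strip_eos("I hope this helps!</s>"))
```
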
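
The generated answer above is reproduced as the model produced it and should not be read as a reference implementation: its code never normalizes the scores, ignores `value` and `mask`, and the example shapes do not line up for `np.matmul`. For comparison, here is a minimal sketch of standard scaled dot-product attention in NumPy. The function name, the argument shapes, and the mask convention (`True` means the position may be attended to) are assumptions chosen for illustration.

```python
import numpy as np


def scaled_dot_product_attention(query, key, value, mask=None):
    """Illustrative sketch of scaled dot-product attention.

    Assumed shapes: query (n_q, d_k), key (n_k, d_k), value (n_k, d_v),
    mask (n_q, n_k) boolean where True marks positions that may be attended to.
    """
    d_k = query.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = query @ key.T / np.sqrt(d_k)
    if mask is not None:
        # Masked positions get a very negative score, so softmax sends them to ~0.
        scores = np.where(mask, scores, -1e9)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value rows.
    return weights @ value, weights


rng = np.random.default_rng(0)
query = rng.normal(size=(2, 4))
key = rng.normal(size=(3, 4))
value = rng.normal(size=(3, 5))
mask = np.array([[True, True, False], [True, True, True]])

output, weights = scaled_dot_product_attention(query, key, value, mask)
print(output.shape)          # one d_v-sized vector per query
print(weights.sum(axis=-1))  # each row of weights sums to 1
```
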
## Finetuning details
The finetuning scripts will be available in our [RAIL GitHub Repository](https://github.com/vmware-labs/research-and-development-artificial-intelligence-lab/tree/main/instruction-tuning).

## Evaluation

<B>TODO</B>