|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- BAAI/COIG-PC |
|
language: |
|
- zh |
|
library_name: transformers |
|
pipeline_tag: text-generation
|
--- |
|
|
|
# Model Card for bwx-13B-HF
|
|
|
<!-- Provide a quick summary of what the model is/does. --> |
|
|
|
This is an experimental model that can be used as a base for building new Chinese-language LLMs. It was created on top of [Chinese-LLaMA-Alpaca](https://github.com/ymcui/Chinese-LLaMA-Alpaca).
|
## Model Details |
|
|
|
### Model Description |
|
|
|
<!-- Provide a longer summary of what this model is. --> |
|
|
|
|
|
- **Developed by:** yjf9966 |
|
- **Model type:** LLaMA with an extended tokenizer (vocabulary size 49,964)
|
- **Language(s) (NLP):** Chinese |
|
- **License:** Apache 2.0 |
|
- **Finetuned from model:** [Chinese-LLaMA-Alpaca](https://github.com/ymcui/Chinese-LLaMA-Alpaca) |
|
|
|
### Model Sources
|
|
|
<!-- Provide the basic links for the model. --> |
|
|
|
- **Repository:** https://huggingface.co/BlueWhaleX/bwx-13B-HF |
|
|
|
## Uses |
|
|
|
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. --> |
|
|
|
You can use the raw model for text generation, but it is mostly intended to be fine-tuned on a downstream task.

Note that this is a causal language model: it generates text left to right from an instruction-style prompt, so it is best suited to generation-based tasks such as the question answering example shown below.
|
|
|
|
|
## Bias, Risks, and Limitations |
|
|
|
<!-- This section is meant to convey both technical and sociotechnical limitations. --> |
|
|
|
Even if the training data used for this model could be characterized as fairly neutral, the model can still produce biased predictions.

It also inherits some of the biases of its base model and of its training dataset.
|
|
|
### Recommendations |
|
|
|
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. --> |
|
|
|
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. |
|
|
|
## How to Get Started with the Model |
|
|
|
Use the code below to get started with the model. |
|
|
|
```
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM


def generate_prompt(text):
    # Alpaca-style instruction template; the response is extracted by
    # splitting on the "### Response:" marker after generation.
    return ("Below is an instruction that describes a task. "
            "Write a response that appropriately completes the request.\n\n"
            f"### Instruction:\n\n{text}\n\n### Response:\n\n")


tokenizer = LlamaTokenizer.from_pretrained('BlueWhaleX/bwx-13B-HF')
model = LlamaForCausalLM.from_pretrained('BlueWhaleX/bwx-13B-HF').half().cuda()
model.eval()

# Example input: a Chinese history multiple-choice question.
text = '王国维说:“自周之衰,文王、周公势力之瓦解也,国民之智力成熟于内,政治之纷乱乘之于外,上无统一之制度,下迫于社会之要求,于是诸于九流各创其学说。” 他意在说明 A. 分封制的崩溃 B. 商鞅变法的作用 C. 兼并战争的后果 D. 百家争鸣的原因'
prompt = generate_prompt(text)
input_ids = tokenizer.encode(prompt, return_tensors='pt').to('cuda')

with torch.no_grad():
    output_ids = model.generate(
        input_ids=input_ids,
        do_sample=True,  # required for temperature/top_k/top_p to take effect
        max_new_tokens=400,
        temperature=0.2,
        top_k=40,
        top_p=0.9,
        repetition_penalty=1.3
    )

output = tokenizer.decode(output_ids[0], skip_special_tokens=True)
response = output.split("### Response:")[1].strip()
print("Response: ", response, '\n')
```
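The prompt wrapper mirrors the Alpaca-style instruction format, which is why the response is recovered by splitting on the `### Response:` marker. The low temperature (0.2) combined with a mild repetition penalty favors focused, deterministic answers; increase `temperature` or `top_p` for more varied output.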
|
|
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. --> |
|
|
|
[BAAI/COIG-PC](https://huggingface.co/datasets/BAAI/COIG-PC)
|
|
|
### Training Procedure |
|
|
|
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. --> |
|
|
|
#### Preprocessing
|
|
|
The dataset was split into 80% for training and 20% for testing.
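A minimal sketch of such a split with the 🤗 `datasets` library; the `seed` value, and any dataset-specific loading arguments (COIG-PC ships many subsets), are assumptions for illustration:

```
from datasets import load_dataset

# Load BAAI/COIG-PC; exact loading arguments may differ by subset.
dataset = load_dataset("BAAI/COIG-PC", split="train")

# Hold out 20% for testing, matching the 80/20 split described above.
# seed=42 is an illustrative assumption, not the value used for this model.
splits = dataset.train_test_split(test_size=0.2, seed=42)
train_ds, test_ds = splits["train"], splits["test"]
```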
|
|
|
|
|
#### Training Hyperparameters |
|
|
|
- **Training regime:** fp16 mixed precision, lr=1e-4, lora_rank=8, lora_alpha=32
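For reference, a sketch of how these hyperparameters could map onto a PEFT LoRA configuration. Only `r=8` and `lora_alpha=32` come from this card; the `target_modules`, `lora_dropout`, and base checkpoint path are illustrative assumptions:

```
from peft import LoraConfig, get_peft_model
from transformers import LlamaForCausalLM

lora_config = LoraConfig(
    r=8,                                  # lora_rank from this card
    lora_alpha=32,                        # lora_alpha from this card
    target_modules=["q_proj", "v_proj"],  # assumption: attention projections
    lora_dropout=0.05,                    # assumption
    task_type="CAUSAL_LM",
)

# Hypothetical path to the Chinese-LLaMA-Alpaca checkpoint used as the
# starting point for fine-tuning.
base_model = LlamaForCausalLM.from_pretrained("path/to/chinese-alpaca-13b")
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```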
|
|
|
|
|
## Evaluation |
|
|
|
#### Testing Data |
|
|
|
<!-- This should link to a Data Card if possible. --> |
|
The held-out 20% test split of BAAI/COIG-PC (see Preprocessing above).
|
|
|
## Citation |
|
|
|
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. --> |
|
```
@software{bwx-13B-HF,
  author={yjf9966},
  title={An Enhanced Chinese Language Model based on Chinese-Alpaca},
  url={https://huggingface.co/BlueWhaleX/bwx-13B-HF},
  year={2023}
}
```