Model

Llama-2-7b-qlora-moss-003-sft is a QLoRA adapter fine-tuned from Llama-2-7b on the moss-003-sft dataset with XTuner.

Quickstart

Usage with HuggingFace libraries

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, StoppingCriteria
from transformers.generation import GenerationConfig

class StopWordStoppingCriteria(StoppingCriteria):
    """Stop generation once the decoded text ends with `stop_word`."""

    def __init__(self, tokenizer, stop_word):
        self.tokenizer = tokenizer
        self.stop_word = stop_word
        self.length = len(self.stop_word)

    def __call__(self, input_ids, *args, **kwargs) -> bool:
        # Decode the first sequence in the batch and ignore line breaks before comparing.
        cur_text = self.tokenizer.decode(input_ids[0])
        cur_text = cur_text.replace('\r', '').replace('\n', '')
        return cur_text[-self.length:] == self.stop_word

tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf', trust_remote_code=True)
# Load the base model in 4-bit NF4 with double quantization and fp16 compute.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True, load_in_8bit=False,
    llm_int8_threshold=6.0, llm_int8_has_fp16_weight=False,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True, bnb_4bit_quant_type='nf4')
model = AutoModelForCausalLM.from_pretrained(
    'meta-llama/Llama-2-7b-hf', quantization_config=quantization_config,
    device_map='auto', trust_remote_code=True).eval()
# Attach the fine-tuned adapter on top of the quantized base model.
model = PeftModel.from_pretrained(model, 'xtuner/Llama-2-7b-qlora-moss-003-sft')
gen_config = GenerationConfig(max_new_tokens=1024, do_sample=True, temperature=0.1, top_p=0.75, top_k=40)

# Note: plugins are disabled in this example because they depend on additional implementations.
# To try plugins, use the XTuner CLI.
prompt_template = (
    'You are an AI assistant whose name is Llama2.\n'
    'Capabilities and tools that Llama2 can possess.\n'
    '- Inner thoughts: disabled.\n'
    '- Web search: disabled.\n'
    '- Calculator: disabled.\n'
    '- Equation solver: disabled.\n'
    '- Text-to-image: disabled.\n'
    '- Image edition: disabled.\n'
    '- Text-to-speech: disabled.\n'
    '<|Human|>: {input}<eoh>\n'
    '<|Inner Thoughts|>: None<eot>\n'
    '<|Commands|>: None<eoc>\n'
    '<|Results|>: None<eor>\n')

# The prompt is in Chinese; in English: "Please introduce five attractions in Shanghai."
text = '请给我介绍五个上海的景点'
inputs = tokenizer(prompt_template.format(input=text), return_tensors='pt')
inputs = inputs.to(model.device)
pred = model.generate(**inputs, generation_config=gen_config, stopping_criteria=[StopWordStoppingCriteria(tokenizer, '<eom>')])
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
"""
好的，以下是五个上海的景点：
1. 外滩：外滩是上海的标志性景点之一，是一条长达1.5公里的沿江大道，沿途有许多历史建筑和现代化的高楼大厦。游客可以欣赏到黄浦江两岸的美景，还可以在这里拍照留念。
2. 上海博物馆：上海博物馆是上海市最大的博物馆之一，收藏了大量的历史文物和艺术品。博物馆内有许多展览，包括中国古代文物、近代艺术品和现代艺术品等。
3. 上海科技馆：上海科技馆是一座以科技为主题的博物馆，展示了许多科技产品和科技发展的历史。游客可以在这里了解到许多有趣的科技知识，还可以参加一些科技体验活动。
4. 上海迪士尼乐园：上海迪士尼乐园是中国第一个迪士尼乐园，是一个集游乐、购物、餐饮、娱乐等多种功能于一体的主题公园。游客可以在这里体验到迪士尼的经典故事和游乐设施。
5. 上海野生动物园：上海野生动物园是一座以野生动物观赏和保护为主题的大型动物园。它位于上海市浦东新区，是中国最大的野生动物园之一。
"""

Usage with XTuner CLI

Installation

pip install -U xtuner

Chat

Llama 2 is a gated model, so run huggingface-cli login and enter your Hugging Face access token before chatting. See the Hugging Face documentation on access tokens to learn how to obtain one.

export SERPER_API_KEY="xxx"  # Get a key from https://serper.dev to enable Google search.
xtuner chat meta-llama/Llama-2-7b-hf --adapter xtuner/Llama-2-7b-qlora-moss-003-sft --bot-name Llama2 --prompt-template moss_sft --system-template moss_sft --with-plugins calculate solve search --no-streamer
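If the chat is launched from a script, the same command can be assembled in Python and checked for the search key first. This is a sketch only: `build_chat_command` is a hypothetical helper, and it assumes that omitting `--with-plugins` starts a chat without plugins.

```python
import os
import shlex


def build_chat_command(plugins=('calculate', 'solve', 'search'), env=None):
    """Build the argv list for the `xtuner chat` call shown above (hypothetical helper)."""
    env = os.environ if env is None else env
    # The search plugin needs a Serper key, as noted in the export line above.
    if 'search' in plugins and not env.get('SERPER_API_KEY'):
        raise RuntimeError('Set SERPER_API_KEY before enabling the search plugin.')
    cmd = [
        'xtuner', 'chat', 'meta-llama/Llama-2-7b-hf',
        '--adapter', 'xtuner/Llama-2-7b-qlora-moss-003-sft',
        '--bot-name', 'Llama2',
        '--prompt-template', 'moss_sft',
        '--system-template', 'moss_sft',
    ]
    # Assumption: leaving out --with-plugins disables plugins.
    if plugins:
        cmd += ['--with-plugins', *plugins]
    cmd.append('--no-streamer')
    return cmd


if __name__ == '__main__':
    # Print the command for calculator-only chat; run it with subprocess.run(cmd, check=True).
    print(shlex.join(build_chat_command(plugins=('calculate',))))
```

Building the command as a list keeps each argument separate, so it can be passed to `subprocess.run` without shell quoting.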

Fine-tune

Use the following command to reproduce the fine-tuning results. NPROC_PER_NODE=8 launches eight processes on the node, matching the 8-GPU config named in the command.

NPROC_PER_NODE=8 xtuner train llama2_7b_qlora_moss_sft_all_e2_gpu8
