
This model was trained on a massive corpus of Chinese plain-text open-domain dialogues, following the approach described in Re$^3$Dial: Retrieve, Reorganize and Rescale Conversations for Long-Turn Open-Domain Dialogue Pre-training. The associated GitHub repository is available at https://github.com/thu-coai/Re3Dial.

## Usage

```python
from transformers import BertTokenizer, BertModel
import torch


def get_embedding(encoder, inputs):
    # Use the final hidden state of the [CLS] token as the sequence embedding
    outputs = encoder(**inputs)
    pooled_output = outputs[0][:, 0, :]
    return pooled_output


tokenizer = BertTokenizer.from_pretrained('xwwwww/bert-chinese-dialogue-retriever-query')
# '<uttsep>' is the special token that separates utterances within a session
tokenizer.add_tokens(['<uttsep>'])
query_encoder = BertModel.from_pretrained('xwwwww/bert-chinese-dialogue-retriever-query')
context_encoder = BertModel.from_pretrained('xwwwww/bert-chinese-dialogue-retriever-context')

query = '你好<uttsep>好久不见,最近在干嘛'
context = '正在准备考试<uttsep>是什么考试呀,很辛苦吧'

query_inputs = tokenizer([query], return_tensors='pt')
context_inputs = tokenizer([context], return_tensors='pt')

# No gradients are needed for inference
with torch.no_grad():
    query_embedding = get_embedding(query_encoder, query_inputs)
    context_embedding = get_embedding(context_encoder, context_inputs)

score = torch.cosine_similarity(query_embedding, context_embedding, dim=1)

print('similarity score =', score)
```
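In practice the dual encoder is used to rank a pool of candidate contexts against a single query. The snippet below is a minimal sketch of that retrieval step, reusing the `tokenizer`, encoders, and `query_embedding` from above; the candidate sentences are illustrative examples, not from the original card. For a large corpus, the context embeddings would typically be pre-computed and stored in an approximate-nearest-neighbor index (e.g. FAISS) rather than encoded on the fly.

```python
# Sketch: rank several candidate contexts for one query.
# The candidate strings here are hypothetical examples.
candidates = [
    '正在准备考试<uttsep>是什么考试呀,很辛苦吧',
    '今天天气不错<uttsep>是啊,适合出去走走',
    '我刚看完一部电影<uttsep>好看吗,讲什么的',
]

candidate_inputs = tokenizer(candidates, return_tensors='pt',
                             padding=True, truncation=True)

with torch.no_grad():
    # Shape: (num_candidates, hidden_size)
    candidate_embeddings = get_embedding(context_encoder, candidate_inputs)

# query_embedding has shape (1, hidden_size); cosine_similarity broadcasts
# it against every candidate, yielding one score per candidate.
scores = torch.cosine_similarity(query_embedding, candidate_embeddings, dim=1)

# Print candidates from most to least similar
for idx in scores.argsort(descending=True).tolist():
    print(f'{scores[idx]:.4f}\t{candidates[idx]}')
```

Note that the two encoders are not interchangeable: embed the dialogue history with the query encoder and candidate sessions with the context encoder, since the two embedding spaces are trained to align under the similarity scoring shown above.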