StructBERT: Unofficial Copy
Official Repository Link: https://github.com/alibaba/AliceMind/tree/main/StructBERT
Disclaimer
- This model card was not produced by the AliceMind team.
Reproducing the Hugging Face Hub model:
Download the model config, tokenizer vocabulary, and pretrained weights:
wget https://raw.githubusercontent.com/alibaba/AliceMind/main/StructBERT/config/large_bert_config.json && mv large_bert_config.json config.json
wget https://raw.githubusercontent.com/alibaba/AliceMind/main/StructBERT/config/vocab.txt
wget https://alice-open.oss-cn-zhangjiakou.aliyuncs.com/StructBERT/en_model && mv en_model pytorch_model.bin
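Before loading the checkpoint, it can help to confirm that the three downloads landed under the names the loader expects. The following is a minimal sketch, assuming the files sit in the current directory; `REQUIRED_FILES` and `missing_files` are hypothetical helper names, not part of the AliceMind repository or of transformers.

```python
from pathlib import Path

# File names produced by the wget/mv commands above.
REQUIRED_FILES = ("config.json", "vocab.txt", "pytorch_model.bin")


def missing_files(directory="."):
    """Return the required file names that are not present in `directory`."""
    root = Path(directory)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All files present.")
```

An empty list means the directory is ready to be passed to `from_pretrained`.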
from transformers import AutoConfig, AutoModelForMaskedLM, AutoTokenizer

# Load the config, weights, and vocabulary downloaded into the current directory.
config = AutoConfig.from_pretrained("./config.json")
model = AutoModelForMaskedLM.from_pretrained(".", config=config)
tokenizer = AutoTokenizer.from_pretrained(".", config=config)

# Upload both to the Hub (requires being logged in to a Hugging Face account).
model.push_to_hub("structbert-large")
tokenizer.push_to_hub("structbert-large")
https://arxiv.org/abs/1908.04577
StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding
Introduction
We extend BERT to a new model, StructBERT, by incorporating language structures into pre-training. Specifically, we pre-train StructBERT with two auxiliary tasks to make the most of the sequential order of words and sentences, which leverage language structures at the word and sentence levels, respectively.
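The two auxiliary tasks can be illustrated with a small data-preparation sketch. As described in the paper, the word-level task shuffles a trigram of tokens and asks the model to restore the original order, and the sentence-level task is a three-way classification of whether the second sentence is the next sentence, the previous sentence, or a sentence sampled from another document. The function names below are hypothetical, and this is not the AliceMind implementation; it omits masking, subword tokenization, and the choice of which trigrams to shuffle.

```python
import random


def shuffle_trigram(tokens, start, rng):
    """Permute tokens[start:start + 3] and return (corrupted_tokens, original_span).

    The original span is the reconstruction target. A random permutation can
    occasionally leave the span unchanged; this sketch does not filter that out.
    """
    span = tokens[start:start + 3]
    shuffled = span[:]
    rng.shuffle(shuffled)
    corrupted = tokens[:start] + shuffled + tokens[start + 3:]
    return corrupted, span


def make_sentence_pair(doc, index, other_doc, rng):
    """Build (first, second, label) for three-way sentence-order prediction.

    label 0: second is the sentence after doc[index]
    label 1: second is the sentence before doc[index]
    label 2: second is a random sentence from another document
    Requires 1 <= index <= len(doc) - 2. Each case is sampled with equal probability.
    """
    label = rng.randrange(3)
    if label == 0:
        return doc[index], doc[index + 1], 0
    if label == 1:
        return doc[index], doc[index - 1], 1
    return doc[index], rng.choice(other_doc), 2


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = ["the", "quick", "brown", "fox", "jumps"]
    print(shuffle_trigram(tokens, 1, rng))
    print(make_sentence_pair(["s1", "s2", "s3"], 1, ["o1", "o2"], rng))
```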
Pre-trained models
Model | Description | #params | Download |
---|---|---|---|
structbert.en.large | StructBERT using the BERT-large architecture | 340M | structbert.en.large |
structroberta.en.large | StructRoBERTa, continued training from RoBERTa | 355M | Coming soon |
structbert.ch.large | Chinese StructBERT; BERT-large architecture | 330M | structbert.ch.large |
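As a rough sanity check on the #params column, the size of a BERT-large encoder can be computed from its hyperparameters. The sketch below assumes the standard English BERT-large values (vocabulary 30522, hidden size 1024, 24 layers, feed-forward size 4096, 512 positions, 2 segment types) rather than reading them from this repository's config file, and it counts only the embeddings, encoder layers, and pooler, not any pre-training heads. `bert_encoder_params` is a hypothetical helper name.

```python
def bert_encoder_params(vocab_size=30522, hidden=1024, layers=24,
                        intermediate=4096, max_positions=512, type_vocab=2):
    """Count weights and biases in a BERT-style encoder (embeddings + layers + pooler)."""
    layer_norm = 2 * hidden  # scale and bias
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Query, key, value, and output projections, followed by a LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two dense layers (up and down projection), followed by a LayerNorm.
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden)
                    + layer_norm)
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler


if __name__ == "__main__":
    print(f"{bert_encoder_params() / 1e6:.1f}M parameters")
```

Under these assumptions the count comes to roughly 335M, in line with the rounded 340M figure usually quoted for BERT-large; a different vocabulary size, as in the Chinese model, shifts the total.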
Results
The results on the GLUE and CLUE tasks can be reproduced using the hyperparameters listed in the "Example usage" section below.
structbert.en.large
Model | MNLI | QNLIv2 | QQP | SST-2 | MRPC |
---|---|---|---|---|---|
structbert.en.large | 86.86% | 93.04% | 91.67% | 93.23% | 86.51% |
structbert.ch.large
Model | CMNLI | OCNLI | TNEWS | AFQMC |
---|---|---|---|---|
structbert.ch.large | 84.47% | 81.28% | 68.67% | 76.11% |
Example usage
Requirements and Installation
- PyTorch version >= 1.0.1
- Install other libraries via
pip install -r requirements.txt
- For faster training install NVIDIA's apex library
Finetune MNLI
python run_classifier_multi_task.py \
--task_name MNLI \
--do_train \
--do_eval \
--do_test \
--amp_type O1 \
--lr_decay_factor 1 \
--dropout 0.1 \
--do_lower_case \
--detach_index -1 \
--core_encoder bert \
--data_dir path_to_glue_data \
--vocab_file config/vocab.txt \
--bert_config_file config/large_bert_config.json \
--init_checkpoint path_to_pretrained_model \
--max_seq_length 128 \
--train_batch_size 32 \
--learning_rate 2e-5 \
--num_train_epochs 3 \
--fast_train \
--gradient_accumulation_steps 1 \
--output_dir path_to_output_dir
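To get a feel for how long this run is, the number of optimizer updates can be estimated from the flags above. This is a back-of-the-envelope sketch under several assumptions that the script itself may not share: the MNLI training set has 392,702 examples, `--train_batch_size` is the per-step batch size on a single device, the final partial batch is kept, and one update happens every `--gradient_accumulation_steps` batches. `optimizer_steps` is a hypothetical helper name.

```python
import math


def optimizer_steps(num_examples, batch_size, epochs, accumulation_steps=1):
    """Estimate the number of optimizer updates for a fine-tuning run."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    updates_per_epoch = math.ceil(batches_per_epoch / accumulation_steps)
    return updates_per_epoch * epochs


if __name__ == "__main__":
    # Values taken from the MNLI command: batch size 32, 3 epochs, accumulation 1.
    print(optimizer_steps(392_702, batch_size=32, epochs=3, accumulation_steps=1))
```

With the settings above this gives about 36.8k updates; multi-GPU training or a different batch-size convention in the script would change the figure.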
Citation
If you use our work, please cite:
@article{wang2019structbert,
title={Structbert: Incorporating language structures into pre-training for deep language understanding},
author={Wang, Wei and Bi, Bin and Yan, Ming and Wu, Chen and Bao, Zuyi and Xia, Jiangnan and Peng, Liwei and Si, Luo},
journal={arXiv preprint arXiv:1908.04577},
year={2019}
}