Update README.md

README.md CHANGED

@@ -12,7 +12,7 @@ widget:

## DeBERTa: Decoding-enhanced BERT with Disentangled Attention

-[DeBERTa](https://arxiv.org/abs/2006.03654) improves the BERT and RoBERTa models using disentangled attention and enhanced mask decoder.
+[DeBERTa](https://arxiv.org/abs/2006.03654) improves the BERT and RoBERTa models using disentangled attention and an enhanced mask decoder. It outperforms BERT and RoBERTa on the majority of NLU tasks with 80GB of training data.

Please check the [official repository](https://github.com/microsoft/DeBERTa) for more details and updates.
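For readers who want the gist of disentangled attention without opening the paper: each token is encoded by a content vector \(H_i\) and a relative-position vector \(P_{i|j}\), and the attention score between tokens \(i\) and \(j\) decomposes into content-to-content, content-to-position, and position-to-content terms, with the position-to-position term dropped (this summary paraphrases the paper linked above):

$$
A_{i,j} = \{H_i, P_{i|j}\} \times \{H_j, P_{j|i}\}^\top = H_i H_j^\top + H_i P_{j|i}^\top + P_{i|j} H_j^\top + \underbrace{P_{i|j} P_{j|i}^\top}_{\text{dropped}}
$$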
@@ -51,20 +51,20 @@ export TASK_NAME=rte
output_dir="ds_results"
num_gpus=8
batch_size=4
-python -m torch.distributed.launch --nproc_per_node=${num_gpus}
-run_glue.py
---model_name_or_path microsoft/deberta-v2-xxlarge-mnli
---task_name $TASK_NAME
---do_train
---do_eval
---max_seq_length 256
---per_device_train_batch_size ${batch_size}
---learning_rate 3e-6
---num_train_epochs 3
---output_dir $output_dir
---overwrite_output_dir
---logging_steps 10
---logging_dir $output_dir
+python -m torch.distributed.launch --nproc_per_node=${num_gpus} \
+run_glue.py \
+--model_name_or_path microsoft/deberta-v2-xxlarge-mnli \
+--task_name $TASK_NAME \
+--do_train \
+--do_eval \
+--max_seq_length 256 \
+--per_device_train_batch_size ${batch_size} \
+--learning_rate 3e-6 \
+--num_train_epochs 3 \
+--output_dir $output_dir \
+--overwrite_output_dir \
+--logging_steps 10 \
+--logging_dir $output_dir \
--deepspeed ds_config.json
```
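The launcher above expects a `ds_config.json` in the working directory, which the card does not include. Below is a minimal sketch assuming DeepSpeed's standard config schema; the ZeRO stage and batch sizes are illustrative placeholders, not the settings behind any reported results:

```bash
# Write an illustrative DeepSpeed config (placeholder values -- tune for your hardware).
cat > ds_config.json <<'EOF'
{
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 1
}
EOF
```

Keeping `train_micro_batch_size_per_gpu` equal to the script's `batch_size` helps the DeepSpeed config stay consistent with the `--per_device_train_batch_size` flag.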
@@ -72,8 +72,8 @@ You can also run with `--sharded_ddp`
```bash
cd transformers/examples/text-classification/
export TASK_NAME=rte
-python -m torch.distributed.launch --nproc_per_node=8 run_glue.py --model_name_or_path microsoft/deberta-v2-xxlarge-mnli
---task_name $TASK_NAME --do_train --do_eval --max_seq_length 256 --per_device_train_batch_size 4
+python -m torch.distributed.launch --nproc_per_node=8 run_glue.py --model_name_or_path microsoft/deberta-v2-xxlarge-mnli \
+--task_name $TASK_NAME --do_train --do_eval --max_seq_length 256 --per_device_train_batch_size 4 \
--learning_rate 3e-6 --num_train_epochs 3 --output_dir /tmp/$TASK_NAME/ --overwrite_output_dir --sharded_ddp --fp16
```
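Note that `--sharded_ddp` in the `transformers` Trainer is backed by the FairScale library, which is not installed by default, so the likely prerequisite (version pinning left to you) is:

```bash
# FairScale supplies the sharded data-parallel implementation behind --sharded_ddp.
pip install fairscale
```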