metadata
language:
  - en
  - fr
license: mit
tags:
  - lm-detection
datasets:
  - hc3_multi_custom_ms_hg
metrics:
  - f1
base_model: xlm-roberta-base
model-index:
  - name: xlmr-chatgptdetect-noisy
    results:
      - task:
          type: text-classification
          name: Text Classification
        dataset:
          name: HC3 FULL_MULTI_1.0_0.5_0.5
          type: glue
          config: full_multi_1.0_0.5_0.5
          split: vsl
          args: full_multi_1.0_0.5_0.5
        metrics:
          - type: f1
            value: 0.963274059512108
            name: F1

xlmr-chatgptdetect-noisy

Multilingual ChatGPT detection model from the paper "Towards a Robust Detection of Language Model-Generated Text: Is ChatGPT that easy to detect?"

This model is a fine-tuned version of xlm-roberta-base on the HC3 FULL_MULTI_1.0_0.5_0.5 dataset with noise added. It achieves the following results:

Evaluation set:

  • Loss: 0.1573
  • F1: 0.9633

Test set:

  • F1: 0.97

Adversarial:

  • F1: 0.45
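For readers who want to recompute the scores above on their own predictions, here is a minimal sketch of the F1 computation. It assumes a binary F1 with a single positive class; the card does not state which averaging mode or which class was used, so `positive=1` and the function name are illustrative only.

```python
from typing import Sequence


def f1_score_binary(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1: harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A drop such as the one from the test set to the adversarial set means many more false positives, false negatives, or both; inspecting `tp`, `fp`, and `fn` separately shows which.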

Model description

This is a model trained to detect text created by ChatGPT in English and French. The training data is the combination of the hc3_fr_full and hc3_en_full subsets of almanach/hc3_multi, with added misspelling and homoglyph attacks.
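To make the two kinds of noise concrete, the following is a rough sketch of what a homoglyph substitution and a simple misspelling perturbation can look like. It is not the procedure used to build the training data: the character mapping, the adjacent-swap strategy, the `rate` parameter, and both function names are assumptions chosen for illustration.

```python
import random

# Illustrative Latin -> Cyrillic look-alikes; NOT the mapping used by the authors.
HOMOGLYPHS = {"a": "\u0430", "c": "\u0441", "e": "\u0435", "o": "\u043e", "p": "\u0440"}


def homoglyph_attack(text: str, rate: float, rng: random.Random) -> str:
    """Replace each mappable character with its look-alike with probability `rate`."""
    return "".join(
        HOMOGLYPHS[ch] if ch in HOMOGLYPHS and rng.random() < rate else ch
        for ch in text
    )


def misspell(text: str, rate: float, rng: random.Random) -> str:
    """Swap adjacent letters with probability `rate` per position."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the swapped pair so a letter moves at most one place
        else:
            i += 1
    return "".join(chars)
```

Both perturbations leave the text readable to a human while changing the token sequence the classifier sees, which is why training on such noise is meant to improve robustness.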

Intended uses & limitations

This model is for research purposes only. It is not intended for production use. As we state in our paper:

We would like to emphasize that our study does not claim to have produced an universally accurate detector. Our strong results are based on in-domain testing and, unsurprisingly, do not generalize in out-of-domain scenarios. This is even more so when used on text specifically designed to fool language model detectors and on text intentionally stylistically similar to ChatGPT-generated text, especially instructional text.
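For research experiments, the model can probably be loaded through the standard Transformers text-classification pipeline. The sketch below makes two assumptions that this card does not confirm: the Hub repository id in `MODEL_ID`, and that the label strings returned by the pipeline are meaningful as-is (the card does not document which label means "ChatGPT-generated"). `load_detector` and `classify` are hypothetical helper names.

```python
from typing import Callable, Dict, List

# Assumed repository id; it is not stated in this card. Replace with the actual Hub id.
MODEL_ID = "almanach/xlmr-chatgptdetect-noisy"


def load_detector(model_id: str = MODEL_ID) -> Callable[[List[str]], List[Dict]]:
    """Build a text-classification pipeline (downloads weights on first use)."""
    # Imported lazily so `classify` can be used without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify(texts: List[str], detector: Callable[[List[str]], List[Dict]]) -> List[Dict]:
    """Run the detector and pair each input text with its predicted label and score."""
    predictions = detector(texts)
    return [
        {"text": text, "label": pred["label"], "score": float(pred["score"])}
        for text, pred in zip(texts, predictions)
    ]
```

Typical use would be `classify(["Bonjour, comment puis-je vous aider ?"], load_detector())`. Given the limitations quoted above, the scores should be treated as experimental signals and not as a verdict on authorship.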

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 1
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5.0
  • mixed_precision_training: Native AMP
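The relationship between these values can be sketched in a few lines. The effective batch size is the per-device batch size times the gradient accumulation steps times the number of devices, and a linear scheduler with a warmup ratio ramps the learning rate up and then decays it to zero. The device count of 1 is an inference from `total_train_batch_size: 32`, not something the card states, and the function below is a simplified approximation of a linear-with-warmup schedule, not the exact training code.

```python
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 4
NUM_DEVICES = 1  # assumption: implied by total_train_batch_size = 32, not stated in the card
EFFECTIVE_BATCH = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES

BASE_LR = 2e-5
WARMUP_RATIO = 0.1
TOTAL_STEPS = 42690  # last optimizer step listed in the training results


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    """Number of optimizer steps spent ramping the learning rate up."""
    return round(total_steps * warmup_ratio)


def linear_schedule_lr(step: int, total_steps: int, base_lr: float, warmup_ratio: float) -> float:
    """Learning rate at `step`: linear ramp from 0 to base_lr, then linear decay to 0."""
    n_warmup = warmup_steps(total_steps, warmup_ratio)
    if step < n_warmup:
        return base_lr * step / max(1, n_warmup)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - n_warmup)
```

With the values above, the peak learning rate of 2e-05 is reached after 4269 optimizer steps, about halfway through the first epoch of 8538 steps.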

Training results

Training Loss   Epoch   Step    Validation Loss   F1
0.0317          1.0     8538    0.1732            0.9492
0.008           2.0     17076   0.3541            0.9270
0.0085          3.0     25614   0.1161            0.9726
0.0015          4.0     34152   0.2557            0.9516
0.0             5.0     42690   0.2286            0.9650

Framework versions

  • Transformers 4.26.1
  • Pytorch 1.11.0+cu115
  • Datasets 2.8.0
  • Tokenizers 0.13.2