Model Card for ContinuousAT/Zephyr-CAT
This repo contains LoRA adapter weights for the zephyr-7b-beta model (https://huggingface.co/HuggingFaceH4/zephyr-7b-beta), fine-tuned with the Continuous Adversarial Training (CAT) algorithm. For more details, see our paper "Efficient Adversarial Training in LLMs with Continuous Attacks" (https://arxiv.org/abs/2405.15589).
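Usage
Below is a minimal loading sketch, assuming the standard transformers + peft workflow for LoRA adapters; the prompt text is illustrative and generation settings are not the ones used in the paper.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "HuggingFaceH4/zephyr-7b-beta"
adapter_id = "ContinuousAT/Zephyr-CAT"

# Load the base model and tokenizer, then attach the CAT LoRA adapter from this repo.
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Example prompt (illustrative only), formatted with the zephyr chat template.
messages = [{"role": "user", "content": "Explain adversarial training in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```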
Github
https://github.com/sophie-xhonneux/Continuous-AdvTrain
Citation
If you use this model, please cite our paper:
@misc{xhonneux2024efficient,
    title={Efficient Adversarial Training in LLMs with Continuous Attacks},
    author={Sophie Xhonneux and Alessandro Sordoni and Stephan Günnemann and Gauthier Gidel and Leo Schwinn},
    year={2024},
    eprint={2405.15589},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
Model tree for ContinuousAT/Zephyr-CAT
- Base model: mistralai/Mistral-7B-v0.1
- Finetuned from: HuggingFaceH4/zephyr-7b-beta