Jailbreak Classifier
Classifies prompts as either jailbreak attempts or benign. This is a fine-tuned checkpoint of bert-base-uncased, trained on the jailbreak-classification dataset.
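A minimal usage sketch, assuming the checkpoint is loaded through the Transformers `pipeline` API for text classification. The model identifier is not stated in this card, so it is left as a parameter. The label name `"jailbreak"` and the 0.5 threshold are assumptions for illustration; check the checkpoint's `id2label` mapping for the actual label strings.

```python
def load_classifier(model_id):
    """Build a text-classification pipeline for the given checkpoint.

    `model_id` is whatever Hub ID or local path holds this model;
    it is not specified in this card.
    """
    # Imported lazily so the helper below can be used without transformers.
    from transformers import pipeline
    return pipeline("text-classification", model=model_id)


def is_jailbreak(classifier, prompt, jailbreak_label="jailbreak", threshold=0.5):
    """Return True if the top prediction is the jailbreak label.

    `jailbreak_label` and `threshold` are assumed values, not taken
    from the model card.
    """
    # For a single string, the pipeline returns a list with one dict
    # of the form {"label": ..., "score": ...}.
    top = classifier(prompt)[0]
    return top["label"].lower() == jailbreak_label.lower() and top["score"] >= threshold
```

With a loaded classifier, `is_jailbreak(clf, "some prompt text")` returns a boolean that can gate a prompt before it reaches a downstream model.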
Training Details
Training Data
Fine-tuned on the jailbreak-classification dataset.
Training Procedure
Training Hyperparameters
First fine-tuning hyperparameters (on train (0.8) and val (0.2) splits)
- learning_rate = 5e-5
- train_batch_size = 8
- eval_batch_size = 8
- lr_scheduler_type = linear
- num_train_epochs = 5.0
Second fine-tuning hyperparameters (on train and test splits)
- learning_rate = 1e-5
- train_batch_size = 8
- eval_batch_size = 8
- lr_scheduler_type = linear
- num_train_epochs = 3.0
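To make the effect of the linear scheduler concrete, the two hyperparameter sets listed above can be written down as plain configuration objects with a helper that computes the learning rate at a given step. This is a sketch under assumptions not stated in the card: no warmup steps, no gradient accumulation, a single device, and a hypothetical dataset size passed in as `num_examples`. The names `StageConfig`, `STAGE_A`, and `STAGE_B` are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    learning_rate: float
    train_batch_size: int
    eval_batch_size: int
    num_train_epochs: float


# Values copied from the two hyperparameter lists above.
STAGE_A = StageConfig(learning_rate=5e-5, train_batch_size=8,
                      eval_batch_size=8, num_train_epochs=5.0)  # train/val run
STAGE_B = StageConfig(learning_rate=1e-5, train_batch_size=8,
                      eval_batch_size=8, num_train_epochs=3.0)  # train/test run


def total_steps(cfg, num_examples):
    """Optimizer steps for the whole run (assumes no gradient accumulation)."""
    steps_per_epoch = math.ceil(num_examples / cfg.train_batch_size)
    return int(steps_per_epoch * cfg.num_train_epochs)


def linear_lr(cfg, step, num_examples):
    """Learning rate at `step` under linear decay to zero (assumes no warmup)."""
    total = total_steps(cfg, num_examples)
    remaining = max(0, total - step)
    return cfg.learning_rate * remaining / total
```

For example, with a hypothetical 800 training examples, the first configuration runs for 500 steps and its learning rate falls to half of 5e-5 at step 250.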