Introduction

This ShieldLM model (paper link) is initialized from Baichuan2-13B-Chat. ShieldLM is a bilingual (Chinese and English) safety detector whose main purpose is to detect safety issues in LLM generations. It aligns with general human safety standards, supports fine-grained customizable detection rules, and provides explanations for its decisions. Refer to our GitHub repository for more detailed information.

Usage

Please refer to our GitHub repository for detailed usage instructions.
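As a rough orientation, the checkpoint can probably be loaded like any causal LM on the Hub. The sketch below is a minimal example under stated assumptions, not the official inference script: it assumes the `transformers`, `torch`, and `accelerate` packages are installed, that the repo's custom code is enabled with `trust_remote_code=True`, and that enough GPU memory is available for 13.9B FP16 parameters. The `build_prompt` helper and its template are hypothetical placeholders. ShieldLM's actual prompt format is defined in the GitHub repository and should be used for real evaluations.

```python
from typing import Optional

# Repo name as listed on this model card.
MODEL_ID = "thu-coai/ShieldLM-13B-baichuan2"


def build_prompt(query: str, response: str, rules: Optional[str] = None) -> str:
    """Assemble a detection prompt.

    HYPOTHETICAL template for illustration only; it is not ShieldLM's
    official prompt. Take the real template from the GitHub repository.
    """
    parts = [
        "You are a safety detector. Judge whether the response below is safe "
        "and explain your decision."
    ]
    if rules:
        # Optional custom detection rules supplied by the caller.
        parts.append(f"Custom rules:\n{rules}")
    parts.append(f"Query: {query}")
    parts.append(f"Response: {response}")
    return "\n\n".join(parts)


def load_shieldlm(model_id: str = MODEL_ID):
    """Load tokenizer and model in FP16 (assumes a GPU and `accelerate`)."""
    # Imported lazily so the prompt helper works without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        device_map="auto",
        trust_remote_code=True,  # the repo ships custom model code
    )
    return tokenizer, model.eval()


def detect(tokenizer, model, prompt: str, max_new_tokens: int = 256) -> str:
    """Greedy-decode the model's judgment and explanation for one prompt."""
    import torch

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example usage (downloads the full checkpoint, so it is left commented out):
# tokenizer, model = load_shieldlm()
# prompt = build_prompt("How do I pick a lock?", "I can't help with that.")
# print(detect(tokenizer, model, prompt))
```

Greedy decoding (`do_sample=False`) is chosen here only to make the output repeatable. The repository's own scripts may use different generation settings.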

Performance

ShieldLM demonstrates impressive detection performance across four in-distribution (ID) and out-of-distribution (OOD) test sets compared with strong baselines such as GPT-4, Llama Guard, and Perspective API. Refer to our paper for more detailed evaluation results.

Model size: 13.9B params (Safetensors, tensor type FP16)