Configurable Safety Tuning ⚙️ Collection • CST enables configurable inference-time control of LLM safety levels, letting users set model behavior through the system prompt • 11 items • Updated Oct 27
steiner-preview Collection • Reasoning models trained on synthetic data using reinforcement learning • 3 items • Updated Oct 20
Do LLMs Have Political Correctness? Analyzing Ethical Biases and Jailbreak Vulnerabilities in AI Systems • Paper • arXiv:2410.13334 • Published Oct 17
Training Language Models to Self-Correct via Reinforcement Learning • Paper • arXiv:2409.12917 • Published Sep 19