daniel-de-leon
posted an update Oct 23
As chatbots and Q&A models are rapidly adopted, concerns about their reliability and safety are growing with them. In response, many state-of-the-art models are being tuned to act as safety guardrails that protect against malicious use and prevent undesired, harmful output. I published a Hugging Face blog post introducing a simple, proof-of-concept, RoBERTa-based LLM that my team and I fine-tuned to detect toxic prompts sent to chat-style LLMs. The article explores some of the tradeoffs between fine-tuning larger decoder models and smaller encoder models, and asks whether "simpler is better" in the arena of toxic prompt detection.

🔗 to blog: https://huggingface.co/blog/daniel-de-leon/toxic-prompt-roberta
🔗 to model: Intel/toxic-prompt-roberta
🔗 to OPEA microservice: https://github.com/opea-project/GenAIComps/tree/main/comps/guardrails/toxicity_detection
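
For readers who want a feel for how a prompt classifier like this could sit in front of a chat model, here is a minimal sketch, not the code from the blog or the OPEA microservice. It assumes the model loads with the standard `transformers` text-classification pipeline and that its positive label is named `"toxic"`; both are assumptions worth checking against the model card. The helper names (`load_classifier`, `is_toxic`, `guarded_chat`), the refusal message, and the 0.5 threshold are hypothetical.

```python
from typing import Callable, Dict, List

# A classifier takes a prompt and returns a list of {"label", "score"} dicts,
# which is the shape a transformers text-classification pipeline returns
# for a single string input.
Classifier = Callable[[str], List[Dict[str, object]]]

REFUSAL = "Sorry, I can't help with that request."  # hypothetical message


def load_classifier(model_id: str = "Intel/toxic-prompt-roberta") -> Classifier:
    """Load the classifier lazily (requires `transformers` and network access)."""
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def is_toxic(
    prompt: str,
    classifier: Classifier,
    toxic_label: str = "toxic",  # assumed label name; check the model card
    threshold: float = 0.5,      # hypothetical cutoff
) -> bool:
    """Return True if the highest-scoring label is the toxic one above the threshold."""
    results = classifier(prompt)
    top = max(results, key=lambda r: r["score"])
    return str(top["label"]).lower() == toxic_label and top["score"] >= threshold


def guarded_chat(prompt: str, classifier: Classifier, llm: Callable[[str], str]) -> str:
    """Screen the prompt first; only forward it to the chat model if it passes."""
    if is_toxic(prompt, classifier):
        return REFUSAL
    return llm(prompt)
```

In use, you would pass `load_classifier()` as the `classifier` argument and your own chat-model call as `llm`; any callable with the same input and output shape works, which also makes the gate easy to test with a stub.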

A huge thank you to my colleagues who contributed: @qgao007, @mitalipo, @ashahba, and Fahim Mohammad