Gabriel C

gabrielchua

https://gabrielchua.me

AI & ML interests

None yet

Recent Activity

updated a Space 1 day ago

govtech/off-topic-demo

posted an update 2 days ago

Sharing my first paper!

Large Language Models (LLMs) are powerful, but they're prone to off-topic misuse, where users push them beyond their intended scope. Think harmful prompts, jailbreaks, and misuse. So how do we build better guardrails?

Traditional guardrails rely on curated examples or classifiers. The problem?
⚠️ High false-positive rates
⚠️ Poor adaptability to new misuse types
⚠️ Require real-world data, which is often unavailable during pre-production

Our method skips the need for real-world misuse examples. Instead, we:
1️⃣ Define the problem space qualitatively
2️⃣ Use an LLM to generate synthetic misuse prompts
3️⃣ Train and test guardrails on this dataset

We apply this to the off-topic prompt detection problem, and fine-tune simple bi- and cross-encoder classifiers that outperform heuristics based on cosine similarity or prompt engineering.

Additionally, framing the problem as prompt relevance allows these fine-tuned classifiers to generalise to other risk categories (e.g., jailbreak, toxic prompts).

Through this work, we also open-source our dataset (2M examples, ~50M+ tokens) and models.

paper: https://huggingface.co/papers/2411.12946
artifacts: https://huggingface.co/collections/govtech/off-topic-guardrail-673838a62e4c661f248e81a4
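To make the cosine-similarity heuristic mentioned in the post concrete, here is a minimal sketch of that kind of baseline. It is not the paper's code: the bag-of-words `embed` function is a toy stand-in for a real sentence encoder, and the names `embed`, `cosine_similarity`, `is_off_topic`, and the `threshold` value are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a real encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def is_off_topic(system_prompt: str, user_prompt: str, threshold: float = 0.1) -> bool:
    """Flag the user prompt when it is not similar enough to the system prompt.

    The threshold is a hypothetical value chosen for this example only.
    """
    score = cosine_similarity(embed(system_prompt), embed(user_prompt))
    return score < threshold


system = "You are a banking assistant. Answer questions about bank accounts and loans."
print(is_off_topic(system, "How do I open a savings account at the bank?"))
print(is_off_topic(system, "Tell me how to bake sourdough bread"))
```

A fixed threshold like this is brittle: shared filler words can push an unrelated prompt over the line, and a relevant prompt phrased in different vocabulary can fall under it. That weakness is the kind of problem the post says the fine-tuned classifiers address.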

upvoted a paper 2 days ago

Harnessing the Potential of Gen-AI Coding Assistants in Public Sector Software Development

Organizations

gabrielchua's activity

liked a dataset 2 days ago

gabrielchua/system-prompt-leakage

Viewer • Updated 24 days ago • 355k • 38 • 2

liked a dataset 3 days ago

gabrielchua/off-topic

Viewer • Updated 5 days ago • 2.64M • 100 • 6

liked 2 models 3 days ago

govtech/jina-embeddings-v2-small-en-off-topic

Updated 6 days ago • 21 • 2

govtech/stsb-roberta-base-off-topic

Updated 2 days ago • 17 • 2

liked a Space 3 days ago

Running

🙅

Off Topic Guardrail Demo

liked a model 12 days ago

intfloat/multilingual-e5-large-instruct

Feature Extraction • Updated Sep 26 • 268k • 239

liked 2 datasets 23 days ago

allenai/WildChat-nontoxic

Viewer • Updated May 6 • 530k • 45 • 23

allenai/WildChat

Viewer • Updated Oct 17 • 529k • 1.43k • 123

liked a model about 2 months ago

jxm/cde-small-v1

Feature Extraction • Updated 28 days ago • 12k • 274

liked a Space about 2 months ago

Running

170

⚡

paper-central

liked a dataset about 2 months ago

5CD-AI/Viet-Wiki-Handwriting

Viewer • Updated Aug 25 • 5.8k • 148 • 4

liked a model 2 months ago

iiiorg/piiranha-v1-detect-personal-information

Token Classification • Updated Sep 13 • 64.1k • 141

liked 8 models 3 months ago

keras-io/bert-semantic-similarity

tasksource/deberta-small-long-nli

Zero-Shot Classification • Updated Aug 28 • 144k • 37