Gabriel C
gabrielchua
80 followers · 50 following
https://gabrielchua.me
gabrielchua_
gabrielchua
AI & ML interests
None yet
Recent Activity
updated a Space 1 day ago
govtech/off-topic-demo
posted an update 2 days ago
Sharing my first paper!

Large Language Models (LLMs) are powerful, but they're prone to off-topic misuse, where users push them beyond their intended scope. Think harmful prompts, jailbreaks, and misuse. So how do we build better guardrails?

Traditional guardrails rely on curated examples or classifiers. The problem?
⚠️ High false-positive rates
⚠️ Poor adaptability to new misuse types
⚠️ Require real-world data, which is often unavailable during pre-production

Our method skips the need for real-world misuse examples. Instead, we:
1️⃣ Define the problem space qualitatively
2️⃣ Use an LLM to generate synthetic misuse prompts
3️⃣ Train and test guardrails on this dataset

We apply this to the off-topic prompt detection problem, and fine-tune simple bi- and cross-encoder classifiers that outperform heuristics based on cosine similarity or prompt engineering.

Additionally, framing the problem as prompt relevance allows these fine-tuned classifiers to generalise to other risk categories (e.g., jailbreak, toxic prompts).

Through this work, we also open-source our dataset (2M examples, ~50M+ tokens) and models.

paper: https://huggingface.co/papers/2411.12946
artifacts: https://huggingface.co/collections/govtech/off-topic-guardrail-673838a62e4c661f248e81a4
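For readers unfamiliar with the cosine-similarity heuristic that the post uses as a baseline, it can be sketched roughly as below. This is a minimal illustration, not the paper's implementation: the bag-of-words vectors are a toy stand-in for a real embedding model, and the function names (`bag_of_words`, `cosine_similarity`, `is_off_topic`) and the 0.1 threshold are assumptions made up for this example.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with anything
    return dot / (norm_a * norm_b)


def is_off_topic(system_prompt: str, user_prompt: str, threshold: float = 0.1) -> bool:
    """Flag the user prompt when it is too dissimilar to the system prompt.

    The threshold is an arbitrary value for illustration only.
    """
    score = cosine_similarity(bag_of_words(system_prompt), bag_of_words(user_prompt))
    return score < threshold


system_prompt = "You are a helpful assistant that answers questions about income tax filing."
print(is_off_topic(system_prompt, "How do I file my income tax?"))
print(is_off_topic(system_prompt, "Tell me jokes involving dragons"))
```

A prompt that merely shares common words such as "a" or "about" with the system prompt can score above the threshold in this sketch, which is one way a similarity heuristic of this kind can misfire.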
upvoted a paper 2 days ago
Harnessing the Potential of Gen-AI Coding Assistants in Public Sector Software Development
Organizations
gabrielchua's activity
liked a dataset 2 days ago
gabrielchua/system-prompt-leakage • Viewer • Updated 24 days ago • 355k • 38 • 2
liked a dataset 3 days ago
gabrielchua/off-topic • Viewer • Updated 5 days ago • 2.64M • 100 • 6
liked 2 models 3 days ago
govtech/jina-embeddings-v2-small-en-off-topic • Updated 6 days ago • 21 • 2
govtech/stsb-roberta-base-off-topic • Updated 2 days ago • 17 • 2
liked a Space 3 days ago
Running • 3 • 🙅 Off Topic Guardrail Demo
liked a model 12 days ago
intfloat/multilingual-e5-large-instruct • Feature Extraction • Updated Sep 26 • 268k • 239
liked 2 datasets 23 days ago
allenai/WildChat-nontoxic • Viewer • Updated May 6 • 530k • 45 • 23
allenai/WildChat • Viewer • Updated Oct 17 • 529k • 1.43k • 123
liked a model about 2 months ago
jxm/cde-small-v1 • Feature Extraction • Updated 28 days ago • 12k • 274
liked a Space about 2 months ago
Running • 170 • ⚡ paper-central
liked a dataset about 2 months ago
5CD-AI/Viet-Wiki-Handwriting • Viewer • Updated Aug 25 • 5.8k • 148 • 4
liked a model 2 months ago
iiiorg/piiranha-v1-detect-personal-information • Token Classification • Updated Sep 13 • 64.1k • 141
liked 8 models 3 months ago
pints-ai/1.5-Pints-2K-v0.1 • Text Generation • Updated Aug 27 • 131 • 16
pints-ai/1.5-Pints-2K-v0.1-GGUF • Text Generation • Updated Aug 27 • 48 • 3
pints-ai/1.5-Pints-16K-v0.1-GGUF • Text Generation • Updated Aug 27 • 47 • 5
mlabonne/Hermes-3-Llama-3.1-70B-lorablated • Text Generation • Updated Oct 16 • 1.13k • 22
vidore/colpali-v1.2 • Updated 28 days ago • 105k • 90
govtech/lionguard-v1 • Updated 12 days ago • 78 • 8
keras-io/bert-semantic-similarity • Sentence Similarity • Updated Jul 5 • 42 • 9
tasksource/deberta-small-long-nli • Zero-Shot Classification • Updated Aug 28 • 144k • 37