A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection Paper • 2411.12946 • Published 11 days ago • 20
Article Introducing Observers: AI Observability with Hugging Face datasets through a lightweight SDK By davidberenstein1957 • 10 days ago • 32
Drowning in Documents: Consequences of Scaling Reranker Inference Paper • 2411.11767 • Published 12 days ago • 16
Article Halo: Open Source Health Tracking with Wearables By cyrilzakka • 11 days ago • 84
Article Releasing the largest multilingual open pretraining dataset By Pclanglais • 17 days ago • 96
Training with Prompts Collection See the Training with Prompts documentation for more details: https://sbert.net/examples/training/prompts/README.html • 5 items • Updated 23 days ago • 3
Article Releasing Common Corpus: the largest public domain dataset for training LLMs By Pclanglais • Mar 20 • 17
Model2Vec base models Collection These are the MinishLab Model2Vec base models. Load them and use them with model2vec (https://github.com/MinishLab/model2vec) or sentence-transformers • 7 items • Updated Oct 29 • 8
POTION Collection These are the flagship POTION models. Load them and use them with model2vec (https://github.com/MinishLab/model2vec) or sentence-transformers • 3 items • Updated Oct 30 • 6
Article Releasing Outlines-core 0.1.0: structured generation in Rust and Python • Oct 22 • 41
Granite 3.0 Language Models Collection A series of language models trained by IBM and released under the Apache 2.0 license. We release both the base pretrained and instruct models. • 8 items • Updated 26 days ago • 93
MedEmbed: Embedding Models for the Medical Domain Collection GitHub: https://github.com/abhinand5/MedEmbed • 4 items • Updated Oct 21 • 7