Tom Aarsen

tomaarsen

AI & ML interests

NLP: text embeddings, information retrieval, named entity recognition, few-shot text classification

Organizations

Hugging Face, Sentence Transformers, Sentence Transformers - Cross-Encoders, SetFit, Hugging Face Fellows, Massive Text Embedding Benchmark, Open-Source AI Meetup, Nomic AI, Hugging Face OSS Metrics, Blog-explorers, Sentence Transformers Testing, mLLM multilingual, Social Post Explorers, gg-tt, Distillation Hugs, Hugging Face Discord Community, Bert ... but new

tomaarsen's activity

upvoted an article 9 days ago
upvoted an article 10 days ago
upvoted an article 17 days ago
Releasing the largest multilingual open pretraining dataset

96
upvoted an article 26 days ago
Releasing Common Corpus: the largest public domain dataset for training LLMs

17
upvoted an article about 1 month ago
upvoted 3 articles about 1 month ago
Visually Multilingual: Introducing mcdse-2b

By marco
37
Releasing Outlines-core 0.1.0: structured generation in Rust and Python

41
Transformers.js v3: WebGPU support, new models & tasks, and more…

65
upvoted an article about 1 month ago