Catherine Arnett
catherinearnett
16 followers · 4 following
https://catherinearnett.github.io/
linguist_cat
catherinearnett
catherinearnett.bsky.social
AI & ML interests
multilingual NLP, tokenization
Recent Activity
upvoted an article 17 days ago
Releasing the largest multilingual open pretraining dataset
Articles
Releasing the largest multilingual open pretraining dataset • 17 days ago • 96
Detoxifying the Commons • about 1 month ago • 6
wHy DoNt YoU jUsT uSe ThE lLaMa ToKeNiZeR?? • Sep 27 • 36
Organizations
catherinearnett's activity
liked a dataset 6 months ago
ambean/lingOly • Updated Jun 11 • 90 • 115 • 7