Yacine Jernite

yjernite

AI & ML interests

Technical, community, and regulatory tools of AI governance @HuggingFace

yjernite's activity

upvoted an article 1 day ago

Let’s make a generation of amazing image generation models

By burtenshaw • 29
Reacted to cfahlgren1's post with ❤️ 7 days ago
You can clean and format datasets entirely in the browser with a few lines of SQL.

In this post, I replicate the process @mlabonne used to clean the new microsoft/orca-agentinstruct-1M-v1 dataset.

The cleaning process consists of:
- Joining the separate splits together and adding a split column
- Converting string messages into a list of structs
- Removing empty system prompts

https://huggingface.co/blog/cfahlgren1/the-beginners-guide-to-cleaning-a-dataset

Here's his new cleaned dataset: mlabonne/orca-agentinstruct-1M-v1-cleaned
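
The post does this in SQL in the browser; as a rough illustration only, the same three steps can be sketched in plain Python. The row layout here is an assumption, not taken from the dataset: each split is a list of rows whose `messages` field is a JSON string encoding a list of `{"role", "content"}` objects, and `clean_splits` is a hypothetical helper name.

```python
import json


def clean_splits(splits):
    """Merge splits, parse message strings, and drop empty system prompts.

    `splits` maps a split name to a list of rows. Each row is assumed
    (hypothetically) to look like {"messages": "<JSON string>"}.
    """
    cleaned = []
    for split_name, rows in splits.items():
        for row in rows:
            # Step 2: convert the string column into a list of dicts ("structs").
            messages = json.loads(row["messages"])
            # Step 3: remove system messages whose content is empty or whitespace.
            messages = [
                m for m in messages
                if not (m.get("role") == "system"
                        and not (m.get("content") or "").strip())
            ]
            # Step 1: keep all rows in one table, tagged with their source split.
            cleaned.append({"split": split_name, "messages": messages})
    return cleaned
```

Calling `clean_splits({"train": rows_a, "test": rows_b})` returns one flat list in which every row carries its split name, which mirrors the "join splits and add a split column" step.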
Reacted to fdaudens's post with 🔥 14 days ago
Fascinating point from @thomwolf at Web Summit: AI misuse (deepfakes, fake news) is actually easier to make with closed models, not with open-source ones.

This challenges the common narrative that open-source AI is inherently more dangerous. The reality is more nuanced - while we may think open source is technically easier to misuse, closed models' accessibility and product-focused design appear to be driving more actual harm.

Important context for current AI safety discussions and regulation debates.

Do you agree? 👇
upvoted an article 14 days ago

Releasing the largest multilingual open pretraining dataset

By Pclanglais • 95
upvoted an article 27 days ago
liked a Space 29 days ago
updated a Space about 1 month ago