Andrea Soria

asoria

AI & ML interests

Maintainer of 🤗 Datasets: Data processing

Recent Activity

updated a dataset about 3 hours ago
asoria/test_repo
updated a Space about 17 hours ago
datasets-topics/fka-awesome-chatgpt-prompts

Articles

Organizations

asoria's activity

upvoted an article 9 days ago
upvoted 3 articles about 1 month ago
Article

LoRA training scripts of the world, unite!

• 45
Article

Improving Parquet Dedupe on Hugging Face Hub

• 31
upvoted 5 articles 2 months ago
Article

Introducing BERTopic Integration with Hugging Face Hub

• 7
Article

Introducing Idefics2: A Powerful 8B Vision-Language Model for the community

• 168
Article

Introducing the SQL Console on Datasets

• 19
Article

Fine-Tuning Gemma Models in Hugging Face

• 24
upvoted 2 articles 3 months ago
Article

The 5 Most Under-Rated Tools on Hugging Face

• 85
Article

SmolLM - blazingly fast and remarkably powerful

• 271
upvoted 3 articles 4 months ago
Article

Docmatix - a huge dataset for Document Visual Question Answering

• 68
Article

Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models

• 67
Article

Ethics and Society Newsletter #6: Building Better AI: The Importance of Data Quality

• 33
upvoted 2 articles 5 months ago
Article

Experimenting with Automatic PII Detection on the Hub using Presidio

• 24
Article

Announcing New Dataset Search Features

• 22
upvoted 2 articles 6 months ago
Article

How to directly access 150k+ Hugging Face Datasets with DuckDB and query using GPT-4o

By chilijung • 11
Article

Synthetic dataset generation techniques: generating custom sentence similarity data

By davanstrien • 15
upvoted 2 articles 7 months ago
Article

Synthetic data: save money, time and carbon with open source

• 51
Article

🦙⚗️ Using Llama3 and distilabel to build fine-tuning datasets

By dvilasuero • 73