Banerjee

port8080

AI & ML interests

datasets

Articles

Rearchitecting Hugging Face Uploads and Downloads

2 days ago

• 18

port8080's activity

Reacted to jsulz's post with 🔥 7 days ago

Post

2849

When the XetHub crew joined Hugging Face this fall, @erinys and I started brainstorming how to share our work to replace Git LFS on the Hub. Uploading and downloading large models and datasets takes precious time. That’s where our chunk-based approach comes in.

Instead of versioning files (like Git and Git LFS), we version variable-sized chunks of data. For the Hugging Face community, this means:

⏩ Only upload the chunks that changed.
🚀 Download just the updates, not the whole file.
🧠 We store your file as deduplicated chunks.

In our benchmarks, we found that using content-defined chunking (CDC) to store iterative model and dataset versions led to transfer speedups of ~2x, but this isn’t just a performance boost. It’s a rethinking of how we manage models and datasets on the Hub.
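To make the chunk-based idea concrete, here is a minimal sketch of content-defined chunking with a deduplicating store. It is an illustration under stated assumptions, not the Hub's actual implementation: the gear-style rolling hash, the `MASK`, `MIN_SIZE`, and `MAX_SIZE` values, and the `chunk` and `upload` helpers are all hypothetical choices made for the example.

```python
import hashlib
import random

# Hypothetical parameters, chosen only for illustration.
MASK = (1 << 11) - 1   # cut when the low 11 hash bits are zero (~2 KiB average)
MIN_SIZE = 256         # never cut a chunk shorter than this
MAX_SIZE = 8192        # always cut once a chunk reaches this size

# Fixed pseudo-random table for a gear-style rolling hash.
_table_rng = random.Random(0)
GEAR = [_table_rng.getrandbits(32) for _ in range(256)]


def chunk(data: bytes) -> list[bytes]:
    """Split data into variable-sized chunks at content-defined boundaries."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        # The low bits of h depend only on the most recent bytes,
        # so boundaries follow the content, not the file offset.
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def upload(data: bytes, store: dict[str, bytes]) -> tuple[list[str], int]:
    """Store only chunks not seen before; return (manifest, bytes sent)."""
    manifest, sent = [], 0
    for c in chunk(data):
        key = hashlib.sha256(c).hexdigest()
        if key not in store:
            store[key] = c
            sent += len(c)
        manifest.append(key)
    return manifest, sent


# Version 2 is version 1 with a few bytes inserted at the front.
data_rng = random.Random(1)
v1 = bytes(data_rng.getrandbits(8) for _ in range(200_000))
v2 = b"new header" + v1

store: dict[str, bytes] = {}
manifest1, sent1 = upload(v1, store)
manifest2, sent2 = upload(v2, store)
print(f"v1 sent {sent1} bytes, v2 sent {sent2} bytes")
```

Because boundaries depend on the bytes themselves, an insertion near the start of the file shifts offsets but leaves most chunk contents unchanged, so the second upload only sends the chunks that differ. A fixed-size chunker would lose this property, since every block after the insertion would shift.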

We're planning to bring our new storage backend to the Hub in early 2025 - check out our blog to dive deeper, and let us know: how could this improve your workflows?

https://huggingface.co/blog/from-files-to-chunks

Reacted to erinys's post with 🚀 about 1 month ago

Post

2138

🌍 Super cool visualization of global PUT requests to Hugging Face over 24 hours, coded by object size, thanks to @port8080!

We're putting this analysis to work to help us architect a more geo-distributed system for the HF storage backend.

Originally shared on LinkedIn: https://www.linkedin.com/posts/ajitbanerjee_one-of-the-joys-of-working-on-the-xethub-activity-7252688424732614656-tFGD

New activity in xet-team/lfs-analysis about 2 months ago

LFS Analysis Roadmap

#3 opened about 2 months ago by jsulz

upvoted an article about 2 months ago

Article

Improving Parquet Dedupe on Hugging Face Hub

Oct 5

• 31

upvoted an article 4 months ago

Article

XetHub is joining Hugging Face!

Aug 8

• 80