Banerjee

port8080

AI & ML interests

datasets

Recent Activity

New activity about 2 months ago
xet-team/lfs-analysis:LFS Analysis Roadmap

port8080's activity

Reacted to jsulz's post with 🔥 7 days ago
When the XetHub crew joined Hugging Face this fall, @erinys and I started brainstorming how to share our work to replace Git LFS on the Hub. Uploading and downloading large models and datasets takes precious time. That’s where our chunk-based approach comes in.

Instead of versioning files (like Git and Git LFS), we version variable-sized chunks of data. For the Hugging Face community, this means:

⏩ Only upload the chunks that changed.
🚀 Download just the updates, not the whole file.
🧠 We store your file as deduplicated chunks.

In our benchmarks, we found that using content-defined chunking (CDC) to store iterative model and dataset versions led to transfer speedups of ~2x, but this isn’t just a performance boost. It’s a rethinking of how we manage models and datasets on the Hub.
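
To make the chunk-based idea concrete, here is a minimal Python sketch of content-defined chunking with deduplication. It is an illustration only, not the Hub's implementation: the gear-style rolling hash, the `MASK`, `MIN_CHUNK`, and `MAX_CHUNK` values, and the helper names `chunk`, `chunk_ids`, and `chunks_to_upload` are all assumptions chosen for readability.

```python
import hashlib
import random

# Hypothetical parameters for illustration; not the Hub's actual settings.
MASK = (1 << 6) - 1   # cut when the low 6 bits of the rolling hash are zero
MIN_CHUNK = 16        # never cut before a chunk reaches this many bytes
MAX_CHUNK = 256       # always cut once a chunk reaches this many bytes

# Gear table: one pseudo-random 32-bit value per possible byte value.
# A fixed seed keeps chunk boundaries reproducible across runs.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk(data: bytes) -> list:
    """Split data into variable-sized chunks at content-defined boundaries."""
    chunks = []
    start = 0
    h = 0
    for i, b in enumerate(data):
        # Rolling hash: older bytes are shifted out of the low bits,
        # so the cut decision depends only on the most recent bytes.
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def chunk_ids(data: bytes) -> list:
    """Identify each chunk by the SHA-256 digest of its content."""
    return [hashlib.sha256(c).hexdigest() for c in chunk(data)]


def chunks_to_upload(old: bytes, new: bytes) -> list:
    """Return the chunks of `new` whose content is not already in `old`."""
    known = set(chunk_ids(old))
    return [c for c in chunk(new) if hashlib.sha256(c).hexdigest() not in known]


if __name__ == "__main__":
    rng = random.Random(1)
    old = bytes(rng.getrandbits(8) for _ in range(20000))
    # Simulate a new version: a small insertion in the middle of the file.
    new = old[:10000] + b"a small edit" + old[10000:]
    changed = chunks_to_upload(old, new)
    print(f"{len(changed)} of {len(chunk(new))} chunks need uploading")
```

Because boundaries are chosen from the bytes themselves rather than from fixed offsets, an insertion shifts the data without shifting the chunk boundaries that follow it, so most chunks on either side of the edit keep the same hash and do not need to be transferred again.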

We're planning to bring our new storage backend to the Hub in early 2025. Check out our blog to dive deeper, and let us know: how could this improve your workflows?

https://huggingface.co/blog/from-files-to-chunks
Reacted to erinys's post with 🚀 about 1 month ago
New activity in xet-team/lfs-analysis about 2 months ago

LFS Analysis Roadmap

#3 opened about 2 months ago by jsulz
upvoted an article about 2 months ago
Article

Improving Parquet Dedupe on Hugging Face Hub

31
upvoted an article 4 months ago
Article

XetHub is joining Hugging Face!

80