Banerjee
port8080
11 followers · 2 following
AI & ML interests
datasets
Recent Activity
Reacted to jsulz's post with 🔥, 7 days ago
When the XetHub crew joined Hugging Face this fall, @erinys and I started brainstorming how to share our work to replace Git LFS on the Hub. Uploading and downloading large models and datasets takes precious time. That’s where our chunk-based approach comes in. Instead of versioning files (like Git and Git LFS), we version variable-sized chunks of data. For the Hugging Face community, this means:
⏩ Only upload the chunks that changed.
🚀 Download just the updates, not the whole file.
🧠 We store your file as deduplicated chunks.
In our benchmarks, we found that using content-defined chunking (CDC) to store iterative model and dataset versions led to transfer speedups of ~2x, but this isn’t just a performance boost. It’s a rethinking of how we manage models and datasets on the Hub. We're planning to bring our new storage backend to the Hub in early 2025 - check out our blog to dive deeper, and let us know: how could this improve your workflows? https://huggingface.co/blog/from-files-to-chunks
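To make the chunk-based idea in the post concrete, here is a minimal Python sketch of content-defined chunking with deduplicated storage. It is an illustration under stated assumptions, not the Hub's actual storage backend: the gear-style rolling hash, the constants `MASK`, `MIN_SIZE`, and `MAX_SIZE`, and the helper names `chunk`, `store`, and `restore` are all hypothetical and chosen only to keep the example small.

```python
import hashlib
import random

# Hypothetical parameters, chosen for illustration only.
MASK = (1 << 6) - 1   # cut when the low 6 hash bits are zero
MIN_SIZE = 16         # never cut a chunk shorter than this
MAX_SIZE = 256        # always cut once a chunk reaches this size

# Fixed pseudo-random table mapping each byte value to a 32-bit number.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk(data: bytes) -> list[bytes]:
    """Split data into variable-sized chunks at content-defined boundaries."""
    chunks = []
    start = 0
    h = 0
    for i, b in enumerate(data):
        # Rolling hash: the low bits depend only on the last few bytes,
        # so boundaries follow the content rather than absolute offsets.
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def store(data: bytes, chunk_store: dict[str, bytes]) -> tuple[list[str], int]:
    """Add unseen chunks to chunk_store; return (manifest, new bytes stored)."""
    manifest = []
    uploaded = 0
    for c in chunk(data):
        key = hashlib.sha256(c).hexdigest()
        if key not in chunk_store:      # deduplication: store each chunk once
            chunk_store[key] = c
            uploaded += len(c)
        manifest.append(key)
    return manifest, uploaded


def restore(manifest: list[str], chunk_store: dict[str, bytes]) -> bytes:
    """Rebuild a file version from its ordered list of chunk hashes."""
    return b"".join(chunk_store[k] for k in manifest)


if __name__ == "__main__":
    rng = random.Random(1)
    v1 = bytes(rng.getrandbits(8) for _ in range(20000))
    v2 = v1[:10000] + b"a small edit in the middle" + v1[10000:]
    chunks_by_hash: dict[str, bytes] = {}
    _, first = store(v1, chunks_by_hash)
    _, second = store(v2, chunks_by_hash)
    print(f"first version stored {first} bytes, second stored {second} bytes")
```

Because boundaries are derived from the bytes themselves, an insertion in the middle of a file only disturbs the chunks near the edit; the chunks after it realign and are already in the store, so only the changed chunks need to be transferred.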
Reacted to erinys's post with 🚀, about 1 month ago
🌍 Super cool visualization of global PUT requests to Hugging Face over 24 hours, coded by object size, thanks to @port8080! We're putting this analysis to work to help us architect a more geo-distributed system for the HF storage backend. Originally shared on LinkedIn: https://www.linkedin.com/posts/ajitbanerjee_one-of-the-joys-of-working-on-the-xethub-activity-7252688424732614656-tFGD
New activity
about 2 months ago
xet-team/lfs-analysis: LFS Analysis Roadmap
Articles
Rearchitecting Hugging Face Uploads and Downloads
2 days ago • 18
Organizations
models
None public yet
datasets
None public yet