
Jared Sulzdorf PRO

jsulz

https://www.jsulz.com/

AI & ML interests

NLP + (Law|Medicine) & Ethics


Articles

Rearchitecting Hugging Face Uploads and Downloads

2 days ago

• 18

From Files to Chunks: Improving Hugging Face Storage Efficiency

8 days ago

• 38


Posts 4

Post

829

Something I love about working at Hugging Face is the opportunity to design and work in public. Right now, we’re redesigning the architecture that supports uploads and downloads on the Hub.

Datasets and models are growing fast, and so are the challenges of storing and transferring them efficiently. To keep up, we're introducing a new protocol for uploads and downloads, supported by a content-addressed store (CAS).

Here’s what’s coming:

📦 Smarter uploads: Chunk-level management enables advanced deduplication and compression, and it cuts redundant transfers, speeding up uploads.
⚡ Efficient downloads: High throughput and low latency ensure fast access, even during high-demand model releases.
🔒 Enhanced security: Validate uploads before storage to block malicious or invalid data.
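
To make the content-addressed idea concrete, here is a minimal sketch in Python. It is a toy, not the Hub's protocol: the `ChunkStore` class, its method names, the in-memory dict, and the fixed 4-byte chunk size are all assumptions chosen for illustration. The only point is that a chunk keyed by its own hash is stored and transferred once, no matter how many files contain it.

```python
import hashlib


class ChunkStore:
    """Toy in-memory content-addressed store (hypothetical, for illustration only)."""

    def __init__(self):
        self.chunks = {}         # SHA-256 hex digest -> chunk bytes
        self.bytes_received = 0  # bytes that actually had to be stored

    def put(self, chunk):
        # The key is derived from the content, so identical chunks collide on purpose.
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in self.chunks:
            self.chunks[digest] = chunk
            self.bytes_received += len(chunk)
        return digest

    def upload(self, data, chunk_size=4):
        # Fixed-size chunks keep the sketch short; this is an assumption of the toy.
        # Returns a manifest: the ordered list of chunk digests that make up the file.
        return [self.put(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]

    def download(self, manifest):
        # Rebuild the file by looking up each digest in order.
        return b"".join(self.chunks[digest] for digest in manifest)


store = ChunkStore()
v1 = store.upload(b"aaaabbbbcccc")  # three new chunks
v2 = store.upload(b"aaaabbbbdddd")  # only the last chunk is new
```

After both uploads, the store holds four unique chunks even though the two files contain six chunks between them.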

We analyzed 24 hours of global upload activity in October (88 countries, 130TB of data!) to design a system that scales with your needs.

The result? A proposed infrastructure with CAS nodes in us-east-1, eu-west-3, and ap-southeast-1.

🔗 Read the blog post for the full details: https://huggingface.co/blog/rearchitecting-uploads-and-downloads

🌟 Check out our interactive demo to explore the data yourself!
xet-team/cas-analysis

We’d love to hear your feedback - let us know if you have questions or want to see more.

Post

2848

When the XetHub crew joined Hugging Face this fall, @erinys and I started brainstorming how to share our work to replace Git LFS on the Hub. Uploading and downloading large models and datasets takes precious time. That’s where our chunk-based approach comes in.

Instead of versioning files (like Git and Git LFS), we version variable-sized chunks of data. For the Hugging Face community, this means:

⏩ Only upload the chunks that changed.
🚀 Download just the updates, not the whole file.
🧠 We store your file as deduplicated chunks.

In our benchmarks, we found that using content-defined chunking (CDC) to store iterative model and dataset versions led to transfer speedups of ~2x, but this isn’t just a performance boost. It’s a rethinking of how we manage models and datasets on the Hub.
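
For a feel of why variable-sized chunks help, here is a minimal content-defined chunking sketch in Python. It is not our production chunker: the function name `cdc_chunks`, the 16-byte window, the polynomial rolling hash, and the boundary mask are all assumptions picked to keep the example short. A chunk boundary is declared wherever the hash of the last few bytes matches a pattern, so boundaries follow the content rather than byte offsets, and they line up again after an edit.

```python
import random


def cdc_chunks(data, window=16, mask=0xFF):
    """Split data where a rolling hash of the last `window` bytes matches `mask`.

    Toy parameters (assumptions): a real chunker would also enforce minimum
    and maximum chunk sizes and use a stronger hash.
    """
    base, mod = 31, 1 << 32
    top = pow(base, window - 1, mod)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * top) % mod  # drop the oldest byte
        h = (h * base + byte) % mod                 # add the newest byte
        # Only test full windows, so a boundary depends on content alone.
        if i >= window - 1 and (h & mask) == mask:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last boundary
    return chunks


rng = random.Random(0)
base_file = bytes(rng.getrandbits(8) for _ in range(20000))
edited_file = b"new header bytes" + base_file  # insert data at the front

v1 = cdc_chunks(base_file)
v2 = cdc_chunks(edited_file)
reused = set(v1) & set(v2)
print(f"{len(reused)} of {len(set(v1))} unique chunks reused after the edit")
```

With fixed-size chunks, inserting bytes at the front would shift every later chunk and defeat deduplication. Here every chunk after the first one in `v1` reappears unchanged in `v2`, so only the region around the edit needs to be uploaded.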

We're planning to bring our new storage backend to the Hub in early 2025 - check out our blog to dive deeper, and let us know: how could this improve your workflows?

https://huggingface.co/blog/from-files-to-chunks
