jsulz 
posted an update 1 day ago
Something I love about working at Hugging Face is the opportunity to design and work in public. Right now, we’re redesigning the architecture that supports uploads and downloads on the Hub.

Datasets and models are growing fast, and so are the challenges of storing and transferring them efficiently. To keep up, we're introducing a new protocol for uploads and downloads, supported by a content-addressed store (CAS).

Here’s what’s coming:

📦 Smarter uploads: Chunk-level management enables advanced deduplication and compression and cuts redundant transfers, speeding up uploads.
⚡ Efficient downloads: High throughput and low latency ensure fast access, even during high-demand model releases.
🔒 Enhanced security: Validate uploads before storage to block malicious or invalid data.
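
To make the chunk-level idea concrete, here is a minimal sketch of a content-addressed store. It is an illustration only, not the Hub's actual protocol: the `ChunkStore` class, the fixed 4-byte `CHUNK_SIZE`, the SHA-256 keys, and the zlib compression are all assumptions chosen to keep the example short.

```python
import hashlib
import zlib
from typing import Dict, List, Tuple

# Hypothetical, deliberately tiny chunk size so the example is easy to follow.
CHUNK_SIZE = 4


class ChunkStore:
    """Toy in-memory content-addressed store (illustrative, not the Hub's design)."""

    def __init__(self) -> None:
        # Maps the SHA-256 hex digest of a chunk to its compressed bytes.
        self.chunks: Dict[str, bytes] = {}

    def upload(self, data: bytes) -> Tuple[List[str], int]:
        """Store data chunk by chunk; return (manifest, number of new chunks)."""
        manifest: List[str] = []
        new_chunks = 0
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            key = hashlib.sha256(chunk).hexdigest()
            if key not in self.chunks:
                # Only chunks the store has never seen are compressed and kept.
                self.chunks[key] = zlib.compress(chunk)
                new_chunks += 1
            manifest.append(key)
        return manifest, new_chunks

    def download(self, manifest: List[str]) -> bytes:
        """Rebuild the original bytes from an ordered list of chunk keys."""
        return b"".join(zlib.decompress(self.chunks[key]) for key in manifest)


store = ChunkStore()
manifest_v1, new_v1 = store.upload(b"aaaabbbbcccc")
manifest_v2, new_v2 = store.upload(b"aaaabbbbdddd")  # shares two chunks with v1
```

In this toy run the second upload only needs to store its one changed chunk, which is the redundant-transfer saving described above. A real system would likely pick chunk boundaries differently and validate chunks before storing them.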

We analyzed 24 hours of global upload activity in October (88 countries, 130TB of data!) to design a system that scales with your needs.

The result? A proposed infrastructure with CAS nodes in us-east-1, eu-west-3, and ap-southeast-1.

🔗 Read the blog post for the full details: https://huggingface.co/blog/rearchitecting-uploads-and-downloads

🌟 Check out our interactive demo to explore the data yourself!
xet-team/cas-analysis

We’d love to hear your feedback - let us know if you have questions or want to see more.

Maybe you can also create torrents for popular files?

Great question, we've talked about torrents before, actually!

How would you include torrents in your workflows today?

There's nothing stopping us from doing it, but the user/developer experience doesn't quite align with what we're trying to support right now. There are benefits to leveraging CDNs as we do today, and this integrates relatively seamlessly with existing clients (e.g., huggingface_hub) that are used across the Hub.

Maybe if there's enough interest in the future!

very cool!