Something I love about working at Hugging Face is the opportunity to design and work in public. Right now, we're redesigning the architecture that supports uploads and downloads on the Hub.
Datasets and models are growing fast, and so are the challenges of storing and transferring them efficiently. To keep up, we're introducing a new protocol for uploads and downloads, supported by a content-addressed store (CAS).
Here's what's coming:
📦 Smarter uploads: Chunk-level management enables advanced deduplication and compression and reduces redundant transfers, speeding up uploads.
⚡ Efficient downloads: High throughput and low latency ensure fast access, even during high-demand model releases.
🔒 Enhanced security: Uploads are validated before storage to block malicious or invalid data.
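To make the chunk-level deduplication idea concrete, here is a minimal sketch of a content-addressed store in Python. It is an illustration only, not the Hub's implementation: the `ChunkStore` class, the tiny fixed `CHUNK_SIZE`, the in-memory dictionary, and the use of SHA-256 are all assumptions made for this example, and the post does not say how the real system splits or hashes chunks.

```python
import hashlib

# Hypothetical chunk size, kept tiny so the example is easy to follow.
CHUNK_SIZE = 4


class ChunkStore:
    """Toy in-memory content-addressed store: chunks are keyed by their SHA-256 digest."""

    def __init__(self):
        self.chunks = {}  # digest (hex string) -> chunk bytes

    def upload(self, data: bytes):
        """Split data into fixed-size chunks and store only chunks not seen before.

        Returns (manifest, new_chunk_count), where manifest is the ordered
        list of chunk digests needed to rebuild the file.
        """
        manifest = []
        new_chunks = 0
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                # Only previously unseen content is stored (or, in a real
                # system, transferred over the network).
                self.chunks[digest] = chunk
                new_chunks += 1
            manifest.append(digest)
        return manifest, new_chunks

    def download(self, manifest):
        """Rebuild a file by concatenating its chunks in manifest order."""
        return b"".join(self.chunks[d] for d in manifest)


store = ChunkStore()
v1 = b"aaaabbbbcccc"
v2 = b"aaaabbbbdddd"  # same as v1 except for the last chunk

manifest_v1, new_v1 = store.upload(v1)
manifest_v2, new_v2 = store.upload(v2)

print(new_v1, new_v2, len(store.chunks))
```

In this toy run, the second upload shares its first two chunks with the first, so only the changed chunk needs to be stored. Fixed-size chunks are used purely for brevity; they stop matching as soon as bytes are inserted or removed earlier in a file, which is one reason a production system might choose a different chunking scheme.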
We analyzed 24 hours of global upload activity in October (88 countries, 130TB of data!) to design a system that scales with your needs.
The result? A proposed infrastructure with CAS nodes in us-east-1, eu-west-3, and ap-southeast-1.