Jared Sulzdorf PRO

jsulz

AI & ML interests

NLP + (Law|Medicine) & Ethics

Recent Activity

Articles

Organizations

Hugging Face · Spaces Examples · Blog-explorers · Journalists on Hugging Face · Hugging Face Discord Community · Xet Team · open/ acc

jsulz's activity

Reacted to prithivMLmods's post with 🤗❤️🔥 3 days ago
HF Posts Receipts 🏆🚀

[ HF POSTS RECEIPT ] : prithivMLmods/HF-POSTS-RECEIPT

• The one thing that needs to be remembered is the 'username'.

• And yeah, thank you, @maxiw, for creating the awesome dataset and sharing it here! 🙌

• [ Dataset ] : maxiw/hf-posts

.
.
.
@prithivMLmods
replied to their post 4 days ago

Great question! We've talked about torrents before, actually.

How would you include torrents in your workflows today?

There's nothing stopping us from doing it, but the user/developer experience doesn't quite align with what we're trying to support right now. There are benefits to leveraging CDNs as we do today, and this integrates relatively seamlessly with existing clients (e.g., huggingface_hub) that are used across the Hub.
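For illustration, this is roughly what that client-side experience looks like today: a single call to huggingface_hub resolves a file and pulls it from the CDN-backed storage. The repo and filename below are hypothetical placeholders, not real artifacts.

```python
# Minimal sketch of a download via huggingface_hub; the repo_id and
# filename are hypothetical placeholders used only for illustration.
from huggingface_hub import hf_hub_download

local_path = hf_hub_download(
    repo_id="some-user/some-model",   # hypothetical repository
    filename="model.safetensors",     # hypothetical file in that repo
)
print(local_path)  # path to the locally cached copy
```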

Maybe if there's enough interest in the future!

posted an update 4 days ago
Something I love about working at Hugging Face is the opportunity to design and work in public. Right now, we're redesigning the architecture that supports uploads and downloads on the Hub.

Datasets and models are growing fast, and so are the challenges of storing and transferring them efficiently. To keep up, we're introducing a new protocol for uploads and downloads, supported by a content-addressed store (CAS).

Here's what's coming:

📦 Smarter uploads: Chunk-level management enables advanced deduplication and compression and cuts redundant transfers, speeding up uploads (see the sketch after this list).
⚡ Efficient downloads: High throughput and low latency ensure fast access, even during high-demand model releases.
🔒 Enhanced security: Validate uploads before storage to block malicious or invalid data.
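To make the chunk-level idea concrete, here is a toy sketch of deduplication against a content-addressed store. The fixed chunk size, SHA-256 addressing, and in-memory dict are illustrative assumptions, not the actual protocol or backend.

```python
# Toy sketch of chunk-level deduplication against a content-addressed store
# (CAS): each chunk is keyed by the hash of its bytes, so a chunk the store
# already holds never needs to be transferred again.
import hashlib

CHUNK_SIZE = 64 * 1024          # illustrative fixed chunk size, not the real protocol's
cas: dict[str, bytes] = {}      # in-memory stand-in for the content-addressed store

def upload(data: bytes) -> list[str]:
    """Split data into chunks, store new chunks by hash, return the chunk addresses."""
    addresses = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        address = hashlib.sha256(chunk).hexdigest()
        if address not in cas:   # deduplication: known chunks are skipped
            cas[address] = chunk
        addresses.append(address)
    return addresses

def download(addresses: list[str]) -> bytes:
    """Reassemble the original bytes from a list of chunk addresses."""
    return b"".join(cas[a] for a in addresses)
```

Re-uploading a file that shares most of its bytes with an earlier version then only transfers the handful of chunks whose addresses are new.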

We analyzed 24 hours of global upload activity in October (88 countries, 130TB of data!) to design a system that scales with your needs.

The result? A proposed infrastructure with CAS nodes in us-east-1, eu-west-3, and ap-southeast-1.

🔗 Read the blog post for the full details: https://huggingface.co/blog/rearchitecting-uploads-and-downloads

🌟 Check out our interactive demo to explore the data yourself!
xet-team/cas-analysis

We'd love to hear your feedback - let us know if you have questions or want to see more.
Reacted to davanstrien's post with 🔥 5 days ago
The Bluesky AT Protocol unlocks exciting possibilities:
- Building custom feeds using ML
- Creating dashboards for data exploration
- Developing custom models for Bluesky
To gather Bluesky resources on the Hub, I've created a community org: https://huggingface.co/bluesky-community

My first rather modest contribution is a dashboard that shows the number of posts every second. Drinking straight from the firehose API 🚰

bluesky-community/bluesky-posts-over-time
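As a rough illustration of the "posts per second" idea, the sketch below simply counts raw firehose frames over one-second windows. The relay endpoint is my assumption, and a real dashboard would decode the CBOR frames to count actual posts rather than all events.

```python
# Rough sketch: counting events per second from the Bluesky firehose.
# Assumes the public relay endpoint below; counts raw frames, not decoded posts.
import asyncio
import time

import websockets  # pip install websockets

FIREHOSE = "wss://bsky.network/xrpc/com.atproto.sync.subscribeRepos"

async def count_events_per_second() -> None:
    async with websockets.connect(FIREHOSE) as ws:
        count, window_start = 0, time.monotonic()
        async for _frame in ws:          # each frame is one firehose event
            count += 1
            now = time.monotonic()
            if now - window_start >= 1.0:
                print(f"{count} events in the last second")
                count, window_start = 0, now

asyncio.run(count_events_per_second())
```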
Reacted to reach-vb's post with ❤️ 6 days ago
Massive week for Open AI/ML:

Mistral Pixtral & Instruct Large - ~123B, 128K context, multilingual, json + function calling & open weights
mistralai/Pixtral-Large-Instruct-2411
mistralai/Mistral-Large-Instruct-2411

Allen AI Tülu 70B & 8B - competitive with Claude 3.5 Haiku, beats all major open models like Llama 3.1 70B, Qwen 2.5, and Nemotron
allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5
allenai/tulu-3-datasets-673b8df14442393f7213f372

LLaVA-o1 - VLM capable of spontaneous, systematic reasoning, similar to GPT-o1; the 11B model outperforms gemini-1.5-pro, gpt-4o-mini, and llama-3.2-90B-vision
Xkev/Llama-3.2V-11B-cot

Black Forest Labs Flux.1 tools - four new state-of-the-art model checkpoints & 2 adapters for fill, depth, canny & redux, open weights
reach-vb/black-forest-labs-flux1-6743847bde9997dd26609817

Jina AI Jina CLIP v2 - general-purpose multilingual and multimodal (text & image) embedding model, 900M params, 512 x 512 resolution, matryoshka representations (1024 to 64)
jinaai/jina-clip-v2

Apple AIM v2 & CoreML MobileCLIP - large-scale vision encoders that outperform CLIP and SigLIP, plus CoreML-optimised MobileCLIP models
apple/aimv2-6720fe1558d94c7805f7688c
apple/coreml-mobileclip

A lot more got released, like OpenScholar ( OpenScholar/openscholar-v1-67376a89f6a80f448da411a6), smoltalk ( HuggingFaceTB/smoltalk), Hymba ( nvidia/hymba-673c35516c12c4b98b5e845f), the Open ASR Leaderboard ( hf-audio/open_asr_leaderboard), and much more.

Can't wait for next week! 🤗
Reacted to BrigitteTousi's post with 🚀 8 days ago
Reacted to fdaudens's post with ❤️ 8 days ago
🦋 Hug the butterfly! You can now add your Bluesky handle to your Hugging Face profile! ✨
Reacted to elliesleightholm's post with 🤗 9 days ago
posted an update 10 days ago
When the XetHub crew joined Hugging Face this fall, @erinys and I started brainstorming how to share our work to replace Git LFS on the Hub. Uploading and downloading large models and datasets takes precious time. That's where our chunk-based approach comes in.

Instead of versioning files (like Git and Git LFS), we version variable-sized chunks of data. For the Hugging Face community, this means:

โฉ Only upload the chunks that changed.
๐Ÿš€ Download just the updates, not the whole file.
๐Ÿง  We store your file as deduplicated chunks

In our benchmarks, we found that using CDC to store iterative model and dataset versions led to transfer speedups of ~2x, but this isn't just a performance boost. It's a rethinking of how we manage models and datasets on the Hub.
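As a toy illustration of the CDC idea (not the actual rolling hash or parameters used on the Hub), the sketch below derives chunk boundaries from the bytes themselves, so an edit only disturbs the chunks around it and everything else deduplicates against the previous version.

```python
# Toy sketch of content-defined chunking (CDC): boundaries come from the
# content, so unchanged regions keep producing identical, deduplicable chunks.
# The hash, mask, and size bounds are illustrative assumptions only.
import hashlib

MIN_SIZE, MAX_SIZE = 2 * 1024, 64 * 1024   # illustrative chunk size bounds
MASK = (1 << 13) - 1                        # targets roughly 8 KiB average chunks

def cdc_chunks(data: bytes):
    """Yield (address, chunk) pairs with content-defined boundaries."""
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + byte) & 0xFFFFFFFF   # cheap content-driven hash (illustrative)
        size = i - start + 1
        at_boundary = size >= MIN_SIZE and (h & MASK) == 0
        if at_boundary or size >= MAX_SIZE or i == len(data) - 1:
            chunk = data[start:i + 1]
            yield hashlib.sha256(chunk).hexdigest(), chunk
            start, h = i + 1, 0
```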

We're planning to bring our new storage backend to the Hub in early 2025 - check out our blog to dive deeper, and let us know: how could this improve your workflows?

https://huggingface.co/blog/from-files-to-chunks