Clem 🤗 PRO

clem

AI & ML interests

multi-modal, time-series, biology and chemistry

Recent Activity

liked a model about 1 hour ago
chaidiscovery/chai-1
liked a dataset about 1 hour ago
HuggingFaceTB/smoltalk

clem's activity

Reacted to andito's post with ❤️🔥 about 6 hours ago
Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models with similar GPU RAM usage and token throughput.

- SmolVLM generates tokens 7.5 to 16 times faster than Qwen2-VL! 🤯
- Other models at this size crash a laptop, but SmolVLM comfortably generates 17 tokens/sec on a MacBook! 🚀
- SmolVLM can be fine-tuned on Google Colab! Or process millions of documents with a consumer GPU!
- SmolVLM even outperforms larger models on video benchmarks, despite not being trained on videos!

Check out more!
Demo: HuggingFaceTB/SmolVLM
Blog: https://huggingface.co/blog/smolvlm
Model: HuggingFaceTB/SmolVLM-Instruct
Fine-tuning script: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
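
If you want to try it, here is a minimal sketch of running SmolVLM-Instruct with transformers; the image path and generation settings are placeholders, so check the model card for the exact usage:

```python
# Minimal sketch of running SmolVLM-Instruct with transformers.
# "photo.jpg" and max_new_tokens are placeholders; see the model card.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceTB/SmolVLM-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16)

image = Image.open("photo.jpg")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```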
replied to jsulz's post about 6 hours ago
Reacted to jsulz's post with 🔥👀 about 6 hours ago
Something I love about working at Hugging Face is the opportunity to design and work in public. Right now, we're redesigning the architecture that supports uploads and downloads on the Hub.

Datasets and models are growing fast, and so are the challenges of storing and transferring them efficiently. To keep up, we're introducing a new protocol for uploads and downloads, supported by a content-addressed store (CAS).

Here's what's coming:

📦 Smarter uploads: Chunk-level management enables advanced deduplication and compression, and cuts redundant transfers, speeding up uploads.
⚡ Efficient downloads: High throughput and low latency ensure fast access, even during high-demand model releases.
🔒 Enhanced security: Validate uploads before storage to block malicious or invalid data.

We analyzed 24 hours of global upload activity in October (88 countries, 130TB of data!) to design a system that scales with your needs.

The result? A proposed infrastructure with CAS nodes in us-east-1, eu-west-3, and ap-southeast-1.

🔗 Read the blog post for the full details: https://huggingface.co/blog/rearchitecting-uploads-and-downloads

🌟 Check out our interactive demo to explore the data yourself!
xet-team/cas-analysis

We'd love to hear your feedback - let us know if you have questions or want to see more.
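
As a toy illustration of the chunk-level deduplication idea (my own sketch, not the Hub's actual protocol), a content-addressed store keys each chunk by its hash, so a chunk that is already present never needs to be stored or transferred again:

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # toy fixed-size chunks; real CAS systems often use content-defined chunking

class ContentAddressedStore:
    def __init__(self):
        self.chunks = {}  # sha256 digest -> chunk bytes

    def put(self, data: bytes) -> list[str]:
        """Store a blob, returning its 'recipe': the ordered list of chunk digests."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # Deduplication: identical chunks are stored (and uploaded) only once.
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        return recipe

    def get(self, recipe: list[str]) -> bytes:
        """Reassemble a blob from its recipe."""
        return b"".join(self.chunks[d] for d in recipe)
```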
Reacted to davanstrien's post with ❤️ 1 day ago
First dataset for the new Hugging Face Bluesky community organisation: bluesky-community/one-million-bluesky-posts 🦋

📊 1M public posts from Bluesky's firehose API
🔍 Includes text, metadata, and language predictions
🔬 Perfect for experimenting with ML on Bluesky 🤗

Excited to see people build more open tools for a more open social media platform!
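
For a quick start, loading it with 🤗 datasets looks roughly like this (the split and column names are assumptions; check the dataset card for the exact schema):

```python
from datasets import load_dataset

# The "train" split and "text" column are assumptions; see the dataset card.
posts = load_dataset("bluesky-community/one-million-bluesky-posts", split="train")
print(posts[0]["text"])
```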
Reacted to merve's post with 👀🤗🚀🔥 1 day ago
Small yet mighty! 💫

We are releasing SmolVLM: a new 2B small vision language model made for on-device use, fine-tunable on a consumer GPU, and immensely memory efficient 🤠

We release three checkpoints under Apache 2.0: SmolVLM-Instruct, SmolVLM-Synthetic and SmolVLM-Base HuggingFaceTB/smolvlm-6740bd584b2dcbf51ecb1f39

Learn more from our blog here: huggingface.co/blog/smolvlm
This release comes with a demo, fine-tuning code, MLX integration and TRL integration for DPO 💝
Try the demo: HuggingFaceTB/SmolVLM
Fine-tuning Recipe: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
Also TRL integration for DPO 💗
Reacted to reach-vb's post with ❤️ 2 days ago
Massive week for open AI/ML:

Mistral Pixtral & Instruct Large - ~123B, 128K context, multilingual, JSON + function calling & open weights
mistralai/Pixtral-Large-Instruct-2411
mistralai/Mistral-Large-Instruct-2411

Allen AI Tülu 70B & 8B - competitive with Claude 3.5 Haiku, beats all major open models like Llama 3.1 70B, Qwen 2.5 and Nemotron
allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5
allenai/tulu-3-datasets-673b8df14442393f7213f372

LLaVA-o1 - VLM capable of spontaneous, systematic reasoning, similar to GPT-o1; the 11B model outperforms Gemini 1.5 Pro, GPT-4o mini, and Llama 3.2 90B Vision
Xkev/Llama-3.2V-11B-cot

Black Forest Labs Flux.1 tools - four new state-of-the-art model checkpoints & 2 adapters for fill, depth, canny & redux, open weights
reach-vb/black-forest-labs-flux1-6743847bde9997dd26609817

Jina AI Jina CLIP v2 - general-purpose multilingual and multimodal (text & image) embedding model, 900M params, 512 x 512 resolution, Matryoshka representations (1024 down to 64)
jinaai/jina-clip-v2

Apple AIM v2 & Core ML MobileCLIP - large-scale vision encoders that outperform CLIP and SigLIP, plus Core ML-optimised MobileCLIP models
apple/aimv2-6720fe1558d94c7805f7688c
apple/coreml-mobileclip

A lot more was released, like OpenScholar ( OpenScholar/openscholar-v1-67376a89f6a80f448da411a6), smoltalk ( HuggingFaceTB/smoltalk), Hymba ( nvidia/hymba-673c35516c12c4b98b5e845f), the Open ASR Leaderboard ( hf-audio/open_asr_leaderboard) and much more.

Can't wait for next week! 🤗
posted an update 2 days ago
view post
Post
1749
I've been in Brazil for 10 days now 🇧🇷🇧🇷🇧🇷

I've been surprised by the gap between the massive number of people interested in AI (ChatGPT adoption is crazy here) and the relatively low number of real AI builders - aka people and companies building their own AI models, datasets and apps.

Lots of effort is needed across the world for everyone to participate in, control, and benefit from this foundational technology, starting with open-source & multilingual AI, more access to GPUs, and AI builder training for all!
Reacted to lin-tan's post with 🔥 5 days ago
Can language models replace developers? #RepoCod says "Not yet", because GPT-4o and other LLMs have <30% accuracy/pass@1 on real-world code generation tasks.
- Leaderboard: https://lt-asset.github.io/REPOCOD/
- Dataset: lt-asset/REPOCOD
@jiang719 @shanchao @Yiran-Hu1007
Compared to #SWEBench, RepoCod tasks are:
- General code generation tasks, while SWE-Bench tasks resolve pull requests from GitHub issues.
- Backed by 2.6x more tests per task (313.5 compared to SWE-Bench's 120.8).

Compared to #HumanEval, #MBPP, #CoderEval, and #ClassEval, RepoCod has 980 instances from 11 Python projects, with:
- Whole function generation
- Repository-level context
- Validation with test cases, and
- Real-world complex tasks: the longest average canonical solution length (331.6 tokens) and the highest average cyclomatic complexity (9.00)

Introducing #RepoCod-Lite 🐟 for faster evaluations: 200 of the toughest tasks from RepoCod with:
- 67 repository-level, 67 file-level, and 66 self-contained tasks
- Detailed problem descriptions (967 tokens) and long canonical solutions (918 tokens)
- GPT-4o and other LLMs have < 10% accuracy/pass@1 on RepoCod-Lite tasks.
- Dataset: lt-asset/REPOCOD_Lite

#LLM4code #LLM #CodeGeneration #Security
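
To poke at the benchmark yourself, a hedged sketch with 🤗 datasets (the split name is an assumption; inspect the dataset card for the real splits and fields):

```python
from datasets import load_dataset

# The "train" split is an assumption; check lt-asset/REPOCOD for actual splits.
repocod = load_dataset("lt-asset/REPOCOD", split="train")
print(repocod.column_names)  # inspect the real schema before relying on fields
print(repocod[0])
```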
  • 1 reply
ยท
Reacted to xiaozaa's post with 👍 5 days ago
Reacted to TuringsSolutions's post with 👀 5 days ago
I created something called 'Hyperbolic Embeddings'. I literally just embed the tokens into Hyperbolic Space instead of Euclidean space. At first, this did not get me the gains I was expecting. I was a sad panda. Then I thought about it, a Hyperbolic Embedding needs a Hyperbolic Optimizer. So, instead of Adam, I used Riemannian Adam (RAdam). "Ladies and Gentlemen, We Got 'Em!"
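
A minimal sketch of that pairing, assuming the geoopt library for the Poincaré-ball manifold and its Riemannian Adam (the distance loss below is a toy stand-in for whatever objective you actually train on):

```python
# Sketch: token embeddings on the Poincaré ball, optimized with Riemannian Adam.
# The loss is a toy illustration, not a real training objective.
import torch
import geoopt

vocab_size, dim = 1000, 32
ball = geoopt.PoincareBall(c=1.0)

# Map small random vectors onto the ball so the embeddings start on the manifold.
weights = ball.expmap0(torch.randn(vocab_size, dim) * 1e-2)
embeddings = geoopt.ManifoldParameter(weights, manifold=ball)

optimizer = geoopt.optim.RiemannianAdam([embeddings], lr=1e-3)

tokens_a = torch.randint(0, vocab_size, (64,))
tokens_b = torch.randint(0, vocab_size, (64,))
optimizer.zero_grad()
loss = ball.dist(embeddings[tokens_a], embeddings[tokens_b]).mean()
loss.backward()
optimizer.step()  # the Riemannian update keeps points on the manifold
```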
  • 27 replies
ยท
Reacted to openfree's post with ❤️🔥 5 days ago
🎉 Reached Hugging Face Trending Top 100 in Just One Day! Introducing Mouse-I

First, we want to thank everyone who helped Mouse-I reach the Hugging Face Spaces Trending Top 100! We're especially excited that a game called "Jewel Pop Game," created using Mouse-I, has reached the global top 160.
With this overwhelming response, we're thrilled to introduce Mouse-I, an AI-powered code generation and automatic deployment tool by Bidraft.

✨ What is Mouse-I?
Mouse-I is an innovative tool that automatically generates and deploys working web services within 60 seconds, simply based on your prompt input.

🚀 Key Features

One-Click Real-time Deployment: Complete from prompt to deployment in just 60 seconds
Real-time Preview: Instantly check your generated code results
40+ Templates: Ready-to-use templates including MBTI tests, investment management tools, Tetris games, and more
Real-time Editing: Instantly modify and apply generated code

⚡ How to Use
Create your own web service in just 3 steps:

Enter your prompt (15 seconds)
Code generation (40 seconds)
Deploy (5 seconds)

🌟 What Makes Us Special

Ultra-fast code generation powered by NVIDIA H100 GPUs
Advanced multi-LLM complex agent technology
All generated web apps available for free viewing and use in our marketplace

๐Ÿ” Current Status

Over 3,000 web apps generated, with 160+ successfully deployed
30x faster service completion compared to competing services

🎈 Join Our Beta Test
Try Mouse-I for free right now!
👉 Experience Mouse-I
🔮 Future Plans
We're planning to launch 'Mouse-II', specialized for backend system development, within this year. When used together with Mouse-I, it will enable complete automation of full-stack development.

We look forward to your feedback and suggestions about Mouse-I!
Thank you for your interest and support 🙏
#AI #CodeGeneration #WebDevelopment #HuggingFace #MouseI #Bidraft #AICodeAssistant
https://huggingface.co/spaces/VIDraft/mouse1

Reacted to cfahlgren1's post with ❤️ 5 days ago
observers 🔭 - automatically log all OpenAI-compatible requests to a dataset 💽

• supports any OpenAI-compatible endpoint 💪
• supports DuckDB, Hugging Face Datasets, and Argilla as stores

> pip install observers

No complex framework. Just a few lines of code to start sending your traces somewhere. Let us know what you think! @davidberenstein1957 and I will continue iterating!

Here's an example dataset that was logged to Hugging Face from Ollama: cfahlgren1/llama-3.1-awesome-chatgpt-prompts
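
As a rough, hand-rolled illustration of the idea (not observers' actual API), intercepting an OpenAI-compatible client and appending each exchange to a local trace file could look like this:

```python
# Hand-rolled sketch of the idea, NOT the observers API: wrap an OpenAI-compatible
# client and append every request/response pair to a JSONL trace file.
import json
from openai import OpenAI

client = OpenAI()  # any OpenAI-compatible endpoint works via base_url=...

def logged_chat(**kwargs):
    response = client.chat.completions.create(**kwargs)
    with open("traces.jsonl", "a") as f:
        f.write(json.dumps({"request": kwargs,
                            "response": response.model_dump()}) + "\n")
    return response

reply = logged_chat(model="gpt-4o-mini",
                    messages=[{"role": "user", "content": "Hello!"}])
```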
Reacted to merve's post with 👀 5 days ago
Apple released AIMv2 🍎 a family of state-of-the-art open-set vision encoders
apple/aimv2-6720fe1558d94c7805f7688c
> like CLIP, but adds a decoder and trains with an autoregressive objective 🤯
> 19 open models come in 300M, 600M, 1.2B, and 2.7B sizes with resolutions of 224, 336, and 448
> Load and use with 🤗 transformers
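
A hedged loading sketch: the checkpoint name is one of the released sizes, and trust_remote_code reflects how the checkpoints shipped at release, so verify both against the model card:

```python
# Sketch of extracting features with an AIMv2 encoder; the checkpoint name and
# trust_remote_code are assumptions based on the release, check the model card.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

ckpt = "apple/aimv2-large-patch14-224"
processor = AutoImageProcessor.from_pretrained(ckpt)
model = AutoModel.from_pretrained(ckpt, trust_remote_code=True)

inputs = processor(images=Image.open("photo.jpg"), return_tensors="pt")
with torch.no_grad():
    features = model(**inputs).last_hidden_state  # patch-level image features
print(features.shape)
```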
Reacted to singhsidhukuldeep's post with ❤️ 5 days ago
Excited to share my analysis of the groundbreaking DCN-V2 paper from @Google , which introduces significant improvements to deep learning recommendation systems!

Key technical highlights:

>> Core Architecture
- Starts with an embedding layer that handles both sparse categorical and dense features
- Unique capability to handle variable embedding sizes from small to large vocabulary sizes
- Cross network creates explicit bounded-degree feature interactions
- Deep network complements with implicit feature interactions
- Two combination modes: stacked and parallel architectures

>> Key Technical Innovations
- Enhanced cross layers with full matrix-based feature interaction learning instead of vector-based
- Mixture of Low-Rank architecture with:
* Multiple expert networks learning in different subspaces
* Dynamic gating mechanism to adaptively combine experts
* Efficient time complexity when specific conditions are met
* Support for non-linear transformations in projected spaces

>> Production Optimizations
- Low-rank matrix approximation leveraging singular value decay patterns
- Mixture-of-Experts decomposition into smaller subspaces
- Efficient parameter allocation between cross and deep networks
- Automatic feature interaction learning for higher-order interactions in multi-layered networks
- Support for both homogeneous and heterogeneous polynomial patterns

>> Real-World Impact
- Successfully deployed across Google's recommendation systems
- Significant gains in both offline accuracy and online metrics
- Better performance-latency tradeoffs through low-rank approximations
- Proven effectiveness on large-scale data with billions of training examples

This represents a major leap forward in making deep learning recommendation systems more practical and efficient at scale.

Thoughts? Would love to hear your experiences implementing similar architectures in production!
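
The headline change, full matrix-based cross layers, is easy to sketch in PyTorch. Each layer computes x_{l+1} = x_0 * (W x_l + b) + x_l, where the elementwise product with x_0 builds explicit bounded-degree feature interactions and the residual preserves lower-order terms. This is my own minimal sketch of the idea; dimensions are illustrative:

```python
import torch
import torch.nn as nn

class CrossLayerV2(nn.Module):
    """DCN-V2 cross layer: x_{l+1} = x0 * (W @ x_l + b) + x_l, full-matrix W."""
    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)  # full matrix, vs. DCN-v1's single vector

    def forward(self, x0: torch.Tensor, xl: torch.Tensor) -> torch.Tensor:
        # Elementwise product with x0 raises the interaction degree by one per layer;
        # the residual keeps the lower-order interactions around.
        return x0 * self.linear(xl) + xl

# Stacking a few cross layers over the (already embedded) input features:
dim, batch = 16, 8
x0 = torch.randn(batch, dim)
x = x0
for layer in [CrossLayerV2(dim) for _ in range(3)]:
    x = layer(x0, x)
print(x.shape)  # torch.Size([8, 16])
```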