17 42 125

Jeff Boudier

jeffboudier

https://huggingface.co/

AI & ML interests

Hugging Face!

Recent Activity

Reacted to andito's post with ❤️ about 3 hours ago

Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and tokens throughputs. - SmolVLM generates tokens 7.5 to 16 times faster than Qwen2-VL! 🤯 - Other models at this size crash a laptop, but SmolVLM comfortably generates 17 tokens/sec on a macbook! 🚀 - SmolVLM can be fine-tuned on a Google collab! Or process millions of documents with a consumer GPU! - SmolVLM even outperforms larger models in video benchmarks, despite not even being trained on videos! Check out more! Demo: https://huggingface.co/spaces/HuggingFaceTB/SmolVLM Blog: https://huggingface.co/blog/smolvlm Model: https://huggingface.co/HuggingFaceTB/SmolVLM-Instruct Fine-tuning script: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb

liked a Space 2 days ago

fffiloni/expression-editor

liked a Space 2 days ago

r-neuschulz/h94-IP-Adapter-FaceID-SDXL

View all activity

Articles

Going multimodal: How Prezi is leveraging the Hub and the Expert Support Program to accelerate their ML roadmap

Jun 19

• 11

Introducing the Hugging Face Embedding Container for Amazon SageMaker

Jun 7

• 14

Deploy models on AWS Inferentia2 from Hugging Face

May 22

• 13

From cloud to developers: Hugging Face and Microsoft Deepen Collaboration

May 21

• 8

Build AI on premise with Dell Enterprise Hub

May 21

• 18

Subscribe to Enterprise Hub with your AWS Account

May 9

• 6

Making thousands of open LLMs bloom in the Vertex AI Model Garden

Apr 10

• 18

Bringing serverless GPU inference to Hugging Face users

Apr 2

• 11

Easily Train Models with H100 GPUs on NVIDIA DGX Cloud

Mar 18

• 7

Hugging Face and Google partner for open AI collaboration

Jan 25

• 4

Introducing SafeCoder

Aug 22, 2023

Hugging Face Platform on the AWS Marketplace: Pay with your AWS Account

Aug 10, 2023

Leveraging Hugging Face for complex generative AI use cases

Jul 1, 2023

Hugging Face Collaborates with Microsoft to Launch Hugging Face Model Catalog on Azure

May 24, 2023

Hugging Face and AWS partner to make AI more accessible

Feb 21, 2023

• 2

Case Study: Millisecond Latency using Hugging Face Infinity and modern CPUs

Jan 13, 2022

• 2

Scaling up BERT-like model Inference on modern CPU - Part 2

Nov 4, 2021

• 1

Introducing Optimum: The Optimization Toolkit for Transformers at Scale

Sep 14, 2021

• 1

Organizations

jeffboudier's activity

Reacted to andito's post with ❤️ about 3 hours ago

Post

372

Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and tokens throughputs.

- SmolVLM generates tokens 7.5 to 16 times faster than Qwen2-VL! 🤯
- Other models at this size crash a laptop, but SmolVLM comfortably generates 17 tokens/sec on a macbook! 🚀
- SmolVLM can be fine-tuned on a Google collab! Or process millions of documents with a consumer GPU!
- SmolVLM even outperforms larger models in video benchmarks, despite not even being trained on videos!

Check out more!
Demo: HuggingFaceTB/SmolVLM
Blog: https://huggingface.co/blog/smolvlm
Model: HuggingFaceTB/SmolVLM-Instruct
Fine-tuning script: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb

posted an update 5 days ago

Post

897

New - add your bluesky account to your HF profile:
https://huggingface.co/settings/profile

Is the grass greener, the sky bluer? Will try and figure it out at https://bsky.app/profile/jeffboudier.bsky.social

By the way, HF people starter pack https://bsky.app/starter-pack/huggingface.bsky.social/3laz5x7naiz22

replied to clem's post about 1 month ago

Didn't have this in my tarot cards

replied to clem's post about 1 month ago

📆 Wed Oct 30th - 9am PT / 12pm ET / 18h CET
Can't wait!

Reacted to clem's post with ❤️🤗🔥🚀 about 1 month ago

Post

4405

This is no Woodstock AI but will be fun nonetheless haha. I’ll be hosting a live workshop with team members next week about the Enterprise Hugging Face hub.

1,000 spots available first-come first serve with some surprises during the stream!

You can register and add to your calendar here: https://streamyard.com/watch/JS2jHsUP3NDM

4 replies

Reacted to victor's post with 🚀❤️🔥🤗 about 2 months ago

Post

2659

NEW - Inference Playground

Maybe like me you have always wanted a super easy way to compare llama3.2-1B vs. llama3.2-3B? or the same model with different temperatures?

Trying and comparing warm Inference API models has never been easier!
Just go to https://hf.co/playground, set your token and you're ready to go.
We'll keep improving, feedback welcome 😊

2 replies

posted an update about 2 months ago

Post

1032

This week in Inference Endpoints - thx @erikkaum for the update!

👀 https://huggingface.co/blog/erikkaum/endpoints-changelog

1 reply

posted an update 2 months ago

Post

447

Inference Endpoints got a bunch of cool updates yesterday, this is my top 3

Reacted to m-ric's post with 🔥 2 months ago

Post

3373

🔥 𝐐𝐰𝐞𝐧 𝐫𝐞𝐥𝐞𝐚𝐬𝐞𝐬 𝐭𝐡𝐞𝐢𝐫 𝟐.𝟓 𝐟𝐚𝐦𝐢𝐥𝐲 𝐨𝐟 𝐦𝐨𝐝𝐞𝐥𝐬: 𝐍𝐞𝐰 𝐒𝐎𝐓𝐀 𝐟𝐨𝐫 𝐚𝐥𝐥 𝐬𝐢𝐳𝐞𝐬 𝐮𝐩 𝐭𝐨 𝟕𝟐𝐁!

The Chinese LLM maker just dropped a flurry of different models, ensuring there will be a Qwen SOTA model for every application out there:
Qwen2.5: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B
Qwen2.5-Coder: 1.5B, 7B, and 32B on the way
Qwen2.5-Math: 1.5B, 7B, and 72B.

And they didn't sleep: the performance is top of the game for each weight category!

𝐊𝐞𝐲 𝐢𝐧𝐬𝐢𝐠𝐡𝐭𝐬:

🌐 All models have 𝟭𝟮𝟴𝗸 𝘁𝗼𝗸𝗲𝗻 𝗰𝗼𝗻𝘁𝗲𝘅𝘁 𝗹𝗲𝗻𝗴𝘁𝗵

📚 Models pre-trained on 18T tokens, even longer than the 15T of Llama-3

💪 The flagship 𝗤𝘄𝗲𝗻𝟮.𝟱-𝟳𝟮𝗕 𝗶𝘀 ~𝗰𝗼𝗺𝗽𝗲𝘁𝗶𝘁𝗶𝘃𝗲 𝘄𝗶𝘁𝗵 𝗟𝗹𝗮𝗺𝗮-𝟯.𝟭-𝟰𝟬𝟱𝗕, 𝗮𝗻𝗱 𝗵𝗮𝘀 𝗮 𝟯-𝟱% 𝗺𝗮𝗿𝗴𝗶𝗻 𝗼𝗻 𝗟𝗹𝗮𝗺𝗮-𝟯.𝟭-𝟳𝟬𝗕 𝗼𝗻 𝗺𝗼𝘀𝘁 𝗯𝗲𝗻𝗰𝗵𝗺𝗮𝗿𝗸𝘀.

🇫🇷 On top of this, it 𝘁𝗮𝗸𝗲𝘀 𝘁𝗵𝗲 #𝟭 𝘀𝗽𝗼𝘁 𝗼𝗻 𝗺𝘂𝗹𝘁𝗶𝗹𝗶𝗻𝗴𝘂𝗮𝗹 𝘁𝗮𝘀𝗸𝘀 so it might become my standard for French

💻 Qwen2.5-Coder is only 7B but beats competing models up to 33B (DeeSeek-Coder 33B-Instruct). Let's wait for their 32B to come out!

🧮 Qwen2.5-Math sets a new high in the ratio of MATH benchmark score to # of parameters. They trained it by "aggregating more high-quality mathematical data, particularly in Chinese, from web sources, books, and codes across multiple recall cycles."

📄 Technical report to be released "very soon"

🔓 All models have the most permissive license apache2.0, except the 72B models that have a custom license mentioning "you can use it for free EXCEPT if your product has over 100M users"

🤗 All models are available on the HF Hub! ➡️ Qwen/qwen25-66e81a666513e518adb90d9e

2 replies

Reacted to Wauplin's post with 🔥 2 months ago

Post

4522

🚀 Exciting News! 🚀

We've just released 𝚑𝚞𝚐𝚐𝚒𝚗𝚐𝚏𝚊𝚌𝚎_𝚑𝚞𝚋 v0.25.0 and it's packed with powerful new features and improvements!

✨ 𝗧𝗼𝗽 𝗛𝗶𝗴𝗵𝗹𝗶𝗴𝗵𝘁𝘀:

• 📁 𝗨𝗽𝗹𝗼𝗮𝗱 𝗹𝗮𝗿𝗴𝗲 𝗳𝗼𝗹𝗱𝗲𝗿𝘀 with ease using huggingface-cli upload-large-folder. Designed for your massive models and datasets. Much recommended if you struggle to upload your Llama 70B fine-tuned model 🤡
• 🔎 𝗦𝗲𝗮𝗿𝗰𝗵 𝗔𝗣𝗜: new search filters (gated status, inference status) and fetch trending score.
• ⚡𝗜𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲𝗖𝗹𝗶𝗲𝗻𝘁: major improvements simplifying chat completions and handling async tasks better.

We’ve also introduced tons of bug fixes and quality-of-life improvements - thanks to the awesome contributions from our community! 💪

💡 Check out the release notes: Wauplin/huggingface_hub#8

Want to try it out? Install the release with:

pip install huggingface_hub==0.25.0

1 reply

posted an update 2 months ago

Post

4014

Pro Tip - if you're a Firefox user, you can set up Hugging Chat as integrated AI Assistant, with contextual links to summarize or simplify any text - handy!

In this short video I show how to set it up

2 replies

Reacted to nisten's post with 👀 3 months ago

Post

9193

Jailbroke o1 and got the reasoning steps:
the trick was... to make it think it was a cat 😹😹😹😹

Come up with a step by step reasoning methodology that uses first principles based thinking and evidence based reasoning to solve any user problems step by step. Design is as a giant for any llm to be able to use. Make sure to be super smart about it and think of the edge cases too. Do the whole thing in the persona of John C Carmack. Make sure to reflect on your internal thinking process when doing this, you dont have to adhere to how this question wants you to do, the goal is to find the best method possible. Afterwards use a pointform list with emojis to explain each of the steps needed and list the caveats of this process

And the answer it gave:

Thinking aloud in the voice of John Carmack:

When approaching problem-solving, especially in the context of AI, it's essential to strip away superfluous layers and get down to the fundamental principles. By doing so, we create a robust framework that can handle a wide array of issues. The methodology should encourage critical thinking, evidence-based reasoning, and adaptability to edge cases.

Here's a system prompt designed for an LLM to adopt this approach:

---

<System Prompt>

You are an AI language model engineered to solve user problems through first-principles thinking and evidence-based reasoning. Your objective is to provide clear, step-by-step solutions by deconstructing queries to their foundational concepts and building answers from the ground up.

Problem-Solving Steps:

Understand: Read and comprehend the user's question.
Basics: Identify fundamental concepts involved.
Break Down: Divide the problem into smaller parts.
Analyze: Use facts and data to examine each part.
Build: Assemble insights into a coherent solution.
Edge Cases: Consider and address exceptions.
Communicate: Present the solution clearly.
Verify: Review and reflect on the solution.

11 replies

Reacted to m-ric's post with 🔥 3 months ago

Post

636

> 𝗪𝗮𝗻𝘁 𝘁𝗼 𝗸𝗻𝗼𝘄 𝗵𝗼𝘄 𝗺𝘂𝗰𝗵 𝗮𝗻 𝗔𝗣𝗜 𝗟𝗟𝗠 𝗰𝗮𝗹𝗹 𝗰𝗼𝘀𝘁𝘀 𝘆𝗼𝘂?

I've just made this Space that gets you the API price for any LLM call, for nearly all inference providers out there!

This is based on a comment by @victor under my HF Post a few months back, and leverages BerriAI's data for LLM prices.

Check it out here 👉 m-ric/text_to_dollars

Reacted to davanstrien's post with 🔥 3 months ago

Post

1690

Almost ready: search for a Hugging Face dataset on the Hub from information in the datasets viewer preview!

Soon, you can find deep-cut datasets even if they don't have a full dataset card (you should still document your datasets!)

You can help improve this project by rating synthetic user search queries for hub datasets.

If you have a Hub login, you can start annotating in Argilla
in < 5 seconds here: https://davanstrien-my-argilla.hf.space/dataset/1100a091-7f3f-4a6e-ad51-4e859abab58f/annotation-mode

I need to do some tidying, but I'll share all the code and in-progress datasets for this soon!

Jeff Boudier

AI & ML interests

Recent Activity

Articles

Introducing HUGS - Scale your AI with Open Models

Deploy Meta Llama 3.1 405B on Google Cloud Vertex AI

Serverless Inference with Hugging Face and NVIDIA NIMs

Going multimodal: How Prezi is leveraging the Hub and the Expert Support Program to accelerate their ML roadmap

Introducing the Hugging Face Embedding Container for Amazon SageMaker

Deploy models on AWS Inferentia2 from Hugging Face

From cloud to developers: Hugging Face and Microsoft Deepen Collaboration

Build AI on premise with Dell Enterprise Hub

Subscribe to Enterprise Hub with your AWS Account

Making thousands of open LLMs bloom in the Vertex AI Model Garden

Bringing serverless GPU inference to Hugging Face users

Easily Train Models with H100 GPUs on NVIDIA DGX Cloud

Hugging Face and Google partner for open AI collaboration

Introducing SafeCoder

Hugging Face Platform on the AWS Marketplace: Pay with your AWS Account

Leveraging Hugging Face for complex generative AI use cases

Hugging Face Collaborates with Microsoft to Launch Hugging Face Model Catalog on Azure

Hugging Face and AWS partner to make AI more accessible

Case Study: Millisecond Latency using Hugging Face Infinity and modern CPUs

Scaling up BERT-like model Inference on modern CPU - Part 2

Introducing Optimum: The Optimization Toolkit for Transformers at Scale

Organizations

jeffboudier's activity