Brigitte Tousignant

BrigitteTousi

AI & ML interests

None yet

Recent Activity

View all activity

Articles

Organizations

BrigitteTousi's activity

Reacted to victor's post with πŸ”₯ about 8 hours ago
view post
Post
1832
Perfect example of why Qwen/Qwen2.5-Coder-32B-Instruct is insane?

Introducing: AI Video Composer πŸ”₯
huggingface-projects/ai-video-composer

Drag and drop your assets (images/videos/audios) to create any video you want using natural language!

It works by asking the model to output a valid FFMPEG and this can be quite complex but most of the time Qwen2.5-Coder-32B gets it right (that thing is a beast). It's an update of an old project made with GPT4 and it was almost impossible to make it work with open models back then (~1.5 years ago), but not anymore, let's go open weights πŸš€.
Reacted to andito's post with πŸ”₯ about 8 hours ago
view post
Post
372
Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and tokens throughputs.

- SmolVLM generates tokens 7.5 to 16 times faster than Qwen2-VL! 🀯
- Other models at this size crash a laptop, but SmolVLM comfortably generates 17 tokens/sec on a macbook! πŸš€
- SmolVLM can be fine-tuned on a Google collab! Or process millions of documents with a consumer GPU!
- SmolVLM even outperforms larger models in video benchmarks, despite not even being trained on videos!

Check out more!
Demo: HuggingFaceTB/SmolVLM
Blog: https://huggingface.co/blog/smolvlm
Model: HuggingFaceTB/SmolVLM-Instruct
Fine-tuning script: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
Reacted to nataliaElv's post with πŸ‘€ about 8 hours ago
view post
Post
473
Would you like to get a high-quality dataset to pre-train LLMs in your language? 🌏

At Hugging Face we're preparing a collaborative annotation effort to build an open-source multilingual dataset as part of the Data is Better Together initiative.

Follow the link below, check if your language is listed and sign up to be a Language Lead!

https://forms.gle/s9nGajBh6Pb9G72J6
Reacted to merve's post with πŸ‘ about 8 hours ago
view post
Post
417
The authors of ColPali trained a retrieval model based on SmolVLM 🀠 vidore/colsmolvlm-alpha
TLDR;

- ColSmolVLM performs better than ColPali and DSE-Qwen2 on all English tasks

- ColSmolVLM is more memory efficient than ColQwen2 πŸ’—
Reacted to davanstrien's post with ❀️ 1 day ago
view post
Post
1356
First dataset for the new Hugging Face Bluesky community organisation: bluesky-community/one-million-bluesky-posts πŸ¦‹

πŸ“Š 1M public posts from Bluesky's firehose API
πŸ” Includes text, metadata, and language predictions
πŸ”¬ Perfect to experiment with using ML for Bluesky πŸ€—

Excited to see people build more open tools for a more open social media platform!
Reacted to davidberenstein1957's post with πŸ”₯ 1 day ago
view post
Post
1156
Let’s make a generation of amazing image-generation models

The best image generation models are trained on human preference datasets, where annotators have selected the best image from a choice of two. Unfortunately, many of these datasets are closed source so the community cannot train open models on them. Let’s change that!

The community can contribute image preferences for an open-source dataset that could be used for building AI models that convert text to image, like the flux or stable diffusion families. The dataset will be open source so everyone can use it to train models that we can all use.

Blog: https://huggingface.co/blog/burtenshaw/image-preferences
Reacted to merve's post with πŸ‘€πŸ€—πŸš€πŸ”₯ 1 day ago
view post
Post
2063
Small yet mighty! πŸ’«

We are releasing SmolVLM: a new 2B small vision language made for on-device use, fine-tunable on consumer GPU, immensely memory efficient 🀠

We release three checkpoints under Apache 2.0: SmolVLM-Instruct, SmolVLM-Synthetic and SmolVLM-Base HuggingFaceTB/smolvlm-6740bd584b2dcbf51ecb1f39

Learn more from our blog here: huggingface.co/blog/smolvlm
This release comes with a demo, fine-tuning code, MLX integration and TRL integration for DPO πŸ’
Try the demo: HuggingFaceTB/SmolVLM
Fine-tuning Recipe: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
Also TRL integration for DPO πŸ’—
Reacted to merve's post with ❀️ 1 day ago
view post
Post
2747
your hugging face profile now has your recent activities πŸ€—
Reacted to davidberenstein1957's post with πŸ€―πŸ€—πŸ‘€ 6 days ago
Reacted to jsulz's post with 🧠 6 days ago
view post
Post
2848
When the XetHub crew joined Hugging Face this fall, @erinys and I started brainstorming how to share our work to replace Git LFS on the Hub. Uploading and downloading large models and datasets takes precious time. That’s where our chunk-based approach comes in.

Instead of versioning files (like Git and Git LFS), we version variable-sized chunks of data. For the Hugging Face community, this means:

⏩ Only upload the chunks that changed.
πŸš€ Download just the updates, not the whole file.
🧠 We store your file as deduplicated chunks

In our benchmarks, we found that using CDC to store iterative model and dataset version led to transfer speedups of ~2x, but this isn’t just a performance boost. It’s a rethinking of how we manage models and datasets on the Hub.

We're planning on our new storage backend to the Hub in early 2025 - check out our blog to dive deeper, and let us know: how could this improve your workflows?

https://huggingface.co/blog/from-files-to-chunks