Sylvain Filoni

fffiloni

AI & ML interests

ML for Animation • Alumni Arts Déco Paris • PSL

Recent Activity

updated a Space about 4 hours ago
fffiloni/FlipSketch
updated a collection about 5 hours ago
LipSync and Face Operations
updated a Space about 5 hours ago
fffiloni/echomimic-v2

fffiloni's activity

Reacted to m-ric's post with ❤️ 13 days ago
The next big social network is not 🦋, it's Hub Posts! [INSERT STONKS MEME WITH LASER EYES]

See below: I got 105k impressions since regularly posting Hub Posts, coming close to my 275k on Twitter!

โš™๏ธ Computed with the great dataset maxiw/hf-posts
โš™๏ธ Thanks to Qwen2.5-Coder-32B for showing me how to access dict attributes in a SQL request!

cc @merve who's far in front of me
posted an update 13 days ago
Reacted to abhishek's post with 🔥 18 days ago
INTRODUCING Hugging Face AutoTrain Client 🔥
Fine-tuning models got even easier!!!!
Now you can fine-tune SOTA models on all compatible dataset-model pairs on Hugging Face Hub using Python on Hugging Face Servers. Choose from a number of GPU flavors, millions of models and dataset pairs and 10+ tasks 🤗

To try it, install autotrain-advanced using pip. You can skip dependency resolution by installing with --no-deps, but then you'd need to install some dependencies by hand.

"pip install autotrain-advanced"

Github repo: https://github.com/huggingface/autotrain-advanced
Reacted to MoritzLaurer's post with 🚀🤗 about 2 months ago
#phdone - I defended my PhD yesterday! A key lesson: it is amazing how open science and open source can empower beginners with limited resources:

I first learned about instruction-based classifiers like BERT-NLI 3-4 years ago, through the @HuggingFace ZeroShotClassificationPipeline. Digging deeper into this, it was surprisingly easy to find new datasets, newer base models, and reusable fine-tuning scripts on the HF Hub to create my own zeroshot models - although I didn't know much about fine-tuning at the time.

Thanks to the community effect of the Hub, my models were downloaded hundreds of thousands of times after a few months. Seeing my research being useful for people motivated me to improve and upload newer models. Leaving my contact details in the model cards led to academic cooperation and consulting contracts (and eventually my job at HF).

That's the power of open science & open source: learning, sharing, improving, collaborating.

I mean every word in my thesis acknowledgments (screenshot). I'm very grateful to my supervisors @vanatteveldt @CasAndreu @KasperWelbers for their guidance; to @profAndreaRenda and @CEPS_thinktank for enabling me to work part-time during the first year; to @huggingface for creating awesome tools and an awesome platform; and to many others who are not active on social media.

Links to the full thesis and the collection of my most recent models are below.

PS: If someone happens to speak Latin, let me know if my diploma contains some hidden Illuminati code or something :D
posted an update about 2 months ago
Visionary Walter Murch (editor for Francis Ford Coppola), in 1999:

“So let's suppose a technical apotheosis some time in the middle of the 21st century, when it somehow becomes possible for one person to make an entire feature film, with virtual actors. Would this be a good thing?

If the history of oil painting is any guide, the broadest answer would be yes, with the obvious caution to keep a wary eye on the destabilizing effect of following too intently a hermetically personal vision. One need only look at the unraveling of painting or classical music in the 20th century to see the risks.

Let's go even further, and force the issue to its ultimate conclusion by supposing the diabolical invention of a black box that could directly convert a single person's thoughts into a viewable cinematic reality. You would attach a series of electrodes to various points on your skull and simply think the film into existence.

And since we are time-traveling, let us present this hypothetical invention as a Faustian bargain to the future filmmakers of the 21st century. If this box were offered by some mysterious cloaked figure in exchange for your eternal soul, would you take it?

The kind of filmmakers who would accept, even leap, at the offer are driven by the desire to see their own vision on screen in as pure a form as possible. They accept present levels of collaboration as the evil necessary to achieve this vision. Alfred Hitchcock, I imagine, would be one of them, judging from his description of the creative process: "The film is already made in my head before we start shooting."”

Read "A Digital Cinema of the Mind? Could Be" by Walter Murch: https://archive.nytimes.com/www.nytimes.com/library/film/050299future-film.html

Reacted to singhsidhukuldeep's post with 🔥 2 months ago
The good folks at Meta have just unveiled Llama 3.2, pushing the boundaries of language models and computer vision.

Even more interesting is how they trained this cutting-edge model:

1๏ธโƒฃ Architecture:
Llama 3.2 uses an optimized transformer architecture with auto-regressive capabilities. The largest models (11B and 90B) now support multimodal inputs, integrating both text and images.

2๏ธโƒฃ Training Pipeline:
โ€ข Started with pretrained Llama 3.1 text models
โ€ข Added image adapters and encoders
โ€ข Pretrained on large-scale noisy (image, text) pair data
โ€ข Fine-tuned on high-quality in-domain and knowledge-enhanced (image, text) pairs

3๏ธโƒฃ Vision Integration:
โ€ข Trained adapter weights to integrate a pre-trained image encoder
โ€ข Used cross-attention layers to feed image representations into the language model
โ€ข Preserved text-only capabilities by not updating language model parameters during adapter training

4๏ธโƒฃ Post-Training Alignment:
โ€ข Multiple rounds of supervised fine-tuning (SFT)
โ€ข Rejection sampling (RS)
โ€ข Direct preference optimization (DPO)
โ€ข Synthetic data generation using Llama 3.1 for Q&A augmentation
โ€ข Reward model ranking for high-quality fine-tuning data

5๏ธโƒฃ Lightweight Models:
โ€ข Used pruning and distillation techniques for 1B and 3B models
โ€ข Structured pruning from Llama 3.1 8B model
โ€ข Knowledge distillation using Llama 3.1 8B and 70B as teachers

6๏ธโƒฃ Context Length:
All models support an impressive 128K token context length.

7๏ธโƒฃ Safety Measures:
Incorporated safety mitigation data to balance helpfulness and safety.

The result? A suite of models ranging from edge-friendly 1B parameters to powerful 90B parameter versions, capable of sophisticated reasoning across text and images. Llama 3.2 is set to revolutionize AI applications from mobile devices to enterprise-scale solutions.

What are your thoughts on these advancements? How do you see Llama 3.2 impacting your industry? Let's discuss in the comments!
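
To make the "knowledge distillation" step in the post above more concrete, here is a minimal sketch of a classic soft-target distillation loss. The post does not say which loss Meta used, so the temperature-scaled KL divergence below is an assumption chosen for illustration, and the function names `softmax` and `distillation_loss` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: this is the textbook soft-target formulation, not a
    confirmed description of how Llama 3.2 was trained.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(
        p * (math.log(p) - math.log(q))
        for p, q in zip(teacher, student)
        if p > 0
    )
    return kl * temperature ** 2


# Made-up logits over a three-token vocabulary.
teacher_logits = [2.0, 1.0, 0.1]
print(distillation_loss([2.0, 1.0, 0.1], teacher_logits))  # identical logits
print(distillation_loss([0.1, 1.0, 2.0], teacher_logits))  # mismatched logits
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, which is the signal a smaller model would be trained to minimize.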
Reacted to jsulz's post with 🚀 2 months ago
In August, the XetHub team joined Hugging Face (https://huggingface.co/blog/xethub-joins-hf), and we've been rolling up our sleeves to bring the best of both worlds together. We started with a deep dive into the current state of files stored with Git LFS on the Hub.

Getting this information was no small feat. We had to:
* Analyze a complete database dump of all repositories and files stored in Git LFS across Hugging Face.
* Parse through metadata on file sizes and types to accurately map the storage breakdown across Spaces, Models, and Datasets.
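
The two steps above amount to grouping file metadata and summing sizes. The sketch below shows one way that aggregation could look; the record layout (`repo_type`, `path`, `size_bytes`) and the sample values are hypothetical, since the post does not describe the actual database schema.

```python
from collections import defaultdict


def storage_breakdown(files):
    """Sum file sizes per repository type and per file extension.

    `files` is an iterable of dicts with the hypothetical keys
    "repo_type", "path" and "size_bytes".
    """
    by_repo_type = defaultdict(int)
    by_extension = defaultdict(int)
    for record in files:
        by_repo_type[record["repo_type"]] += record["size_bytes"]
        name = record["path"].rsplit("/", 1)[-1]
        extension = name.rsplit(".", 1)[-1].lower() if "." in name else "(none)"
        by_extension[extension] += record["size_bytes"]
    return dict(by_repo_type), dict(by_extension)


# Made-up records standing in for rows of the database dump.
sample_files = [
    {"repo_type": "model", "path": "weights/model.safetensors", "size_bytes": 4000},
    {"repo_type": "dataset", "path": "data/train.parquet", "size_bytes": 2500},
    {"repo_type": "space", "path": "assets/demo.mp4", "size_bytes": 500},
    {"repo_type": "model", "path": "pytorch_model.bin", "size_bytes": 3000},
]

by_type, by_ext = storage_breakdown(sample_files)
for repo_type, total in sorted(by_type.items(), key=lambda item: -item[1]):
    print(f"{repo_type}: {total} bytes")
```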

You can read more about the findings (with some jaw-dropping stats + charts) here https://www.linkedin.com/feed/update/urn:li:activity:7244486280351285248
Reacted to asoria's post with 👍 2 months ago
๐Ÿ“ I wrote a tutorial on how to get started with the fine-tuning process using Hugging Face tools, providing an end-to-end workflow.

The tutorial covers creating a new dataset using the new SQL Console ๐Ÿ›ข and fine-tuning a model with SFT, guided by the Notebook Creator App ๐Ÿ“™.

๐Ÿ‘‰ You can read the full article here:
https://huggingface.co/blog/asoria/easy-fine-tuning-with-hf
asoria/auto-notebook-creator
Reacted to fdaudens's post with 👍 2 months ago
🚀 Your AI toolkit just got a major upgrade! I updated the Journalists on Hugging Face community's collection with tools for investigative work, content creation, and data analysis.

Sharing these new additions with the links in case it's helpful:
- @wendys-llc 's excellent 6-part video series on AI for investigative journalism https://www.youtube.com/playlist?list=PLewNEVDy7gq1_GPUaL0OQ31QsiHP5ncAQ
- @jeremycaplan 's curated AI Spaces on HF https://wondertools.substack.com/p/huggingface
- @Xenova 's Whisper Timestamped (with diarization!) for private, on-device transcription Xenova/whisper-speaker-diarization & Xenova/whisper-word-level-timestamps
- Flux models for image gen & LoRAs autotrain-projects/train-flux-lora-ease
- FineGrain's object cutter finegrain/finegrain-object-cutter and object eraser (this one's cool) finegrain/finegrain-object-eraser
- FineVideo: massive open-source annotated dataset + explorer HuggingFaceFV/FineVideo-Explorer
- Qwen2 chat demos, including 2.5 & multimodal versions (crushing it on handwriting recognition) Qwen/Qwen2.5 & Qwen/Qwen2-VL
- GOT-OCR integration stepfun-ai/GOT_official_online_demo
- HTML to Markdown converter maxiw/HTML-to-Markdown
- Text-to-SQL query tool by @davidberenstein1957 for HF datasets davidberenstein1957/text-to-sql-hub-datasets

There's a lot of potential here for journalism and beyond. Give these a try and let me know what you build!

You can also add your favorite ones if you're part of the community!

Check it out: https://huggingface.co/JournalistsonHF

#AIforJournalism #HuggingFace #OpenSourceAI
Reacted to davanstrien's post with 👍 5 months ago
Reacted to alvdansen's post with 👍 5 months ago
New LoRA Model!

I trained this model on a new spot I'm really excited to share (soon!)

This Monday I will be posting my first beginning to end blog showing the tool I've used, dataset, captioning techniques, and parameters to finetune this LoRA.

For now, check out the model in the link below.

alvdansen/m3lt
Reacted to DmitryRyumin's post with 🔥 5 months ago
🚀🎭🌟 New Research Alert - Portrait4D-v2 (Avatars Collection)! 🌟🎭🚀
📄 Title: Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer 🔝

📝 Description: Portrait4D-v2 is a novel method for one-shot 4D head avatar synthesis using pseudo multi-view videos and a vision transformer backbone, achieving superior performance without relying on 3DMM reconstruction.

👥 Authors: Yu Deng, Duomin Wang, and Baoyuan Wang

📄 Paper: Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer (2403.13570)

🌐 GitHub Page: https://yudeng.github.io/Portrait4D-v2/
📁 Repository: https://github.com/YuDeng/Portrait-4D

📺 Video: https://www.youtube.com/watch?v=5YJY6-wcOJo

🚀 CVPR-2023-24-Papers: https://github.com/DmitryRyumin/CVPR-2023-24-Papers

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers curated by @DmitryRyumin

🚀 Added to the Avatars Collection: DmitryRyumin/avatars-65df37cdf81fec13d4dbac36

🔍 Keywords: Portrait4D #4DAvatar #HeadSynthesis #3DModeling #TechInnovation #DeepLearning #ComputerGraphics #ComputerVision #Innovation
Reacted to alvdansen's post with 🚀 5 months ago
Per popular request, I'm working on a beginning to end LoRA training workflow blog for a style.

It will focus on dataset curation through training on a pre-determined style to give a better insight on my process.

Curious what are some questions you might have that I can try to answer in it?
Reacted to louisbrulenaudet's post with 👍 5 months ago
I am delighted to announce the publication of my LegalKit, a French labeled dataset built for legal ML training 🤗

This dataset comprises multiple query-document pairs (+50k) curated for training sentence embedding models within the domain of French law.

The labeling process follows a systematic approach to ensure consistency and relevance:
- Initial Query Generation: Three instances of the LLaMA-3-70B model independently generate three different queries based on the same document.
- Selection of Optimal Query: A fourth instance of the LLaMA-3-70B model, using a dedicated selection prompt, evaluates the generated queries and selects the most suitable one.
- Final Label Assignment: The chosen query is used to label the document, aiming to ensure that the label accurately reflects the content and context of the original text.
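
The three-step labeling process above can be sketched as a small generate-then-select function. The generators and selector below are deterministic stubs standing in for the LLaMA-3-70B instances; the names `label_document`, `make_stub_generator` and `longest_query_selector` are hypothetical, and the real prompts and selection criteria are not given in the post.

```python
from typing import Callable, Dict, Sequence

QueryGenerator = Callable[[str], str]
QuerySelector = Callable[[str, Sequence[str]], int]


def label_document(
    document: str,
    generators: Sequence[QueryGenerator],
    selector: QuerySelector,
) -> Dict[str, object]:
    """Generate one candidate query per generator, then keep the selected one."""
    candidates = [generate(document) for generate in generators]
    chosen = selector(document, candidates)
    if not 0 <= chosen < len(candidates):
        raise ValueError("selector returned an out-of-range index")
    return {"document": document, "query": candidates[chosen], "candidates": candidates}


def make_stub_generator(prefix: str) -> QueryGenerator:
    """Stub generator: a real pipeline would call a language model here."""
    return lambda document: f"{prefix} {document.split('.')[0].lower()}?"


def longest_query_selector(document: str, candidates: Sequence[str]) -> int:
    """Stub selector: picks the longest query instead of asking a fourth model."""
    return max(range(len(candidates)), key=lambda i: len(candidates[i]))


generators = [
    make_stub_generator(prefix)
    for prefix in ("What is", "How does the law define", "When does")
]
pair = label_document("Sample legal text. More text.", generators, longest_query_selector)
print(pair["query"])
```

Keeping the generators and the selector as plain callables makes it easy to swap the stubs for real model calls without changing the labeling logic.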

Dataset: louisbrulenaudet/legalkit

Stay tuned for further updates and release information 🔥

@clem , if we can create an "HF for Legal" organization, similar to what exists for journalists, I am available!

Note: My special thanks to @alvdansen for their illustration models ❤️
Reacted to fdaudens's post with 🚀 5 months ago
Updated the Journalists on 🤗 community page:
- new text-to-speech tools collection JournalistsonHF/text-to-speech-6675c4dccdaa11e86928a15b
- additional leaderboards in the eval collection: TTS-AGI/TTS-Arena and dylanebert/3d-arena
- new tools in the Text-Analysis collection: gokaygokay/Florence-2, pdf2dataset/pdf2dataset, cvachet/pdf-chatbot
- Xenova/realtime-whisper-webgpu in the Transcription collection
- radames/flash-sd3-taesd3 in the Image Tools collection
- Last but not least, okaris/omni-zero in the fun collection for zero-shot stylized portrait creation

Is there any tool you would like to see added?

Find all the curated tools here: https://huggingface.co/collections/JournalistsonHF/
Reacted to alvdansen's post with ❤️ 5 months ago
I had a backlog of LoRA model weights for SDXL that I decided to prioritize this weekend and publish. I know many are using SD3 right now, however if you have the time to try them, I hope you enjoy them.

I intend to start writing more fully on the thought process behind my approach to curating and training style and subject finetuning, beginning this next week.

Thank you for reading this post! You can find the models on my page and I'll drop a few previews here.
Reacted to harpreetsahota's post with 👍 6 months ago
The Coachella of Computer Vision, CVPR, is right around the corner. In anticipation of the conference, I curated a dataset of the papers.

I'll have a technical blog post out tomorrow doing some analysis on the dataset, but I'm so hyped that I wanted to get it out to the community ASAP.

The dataset consists of the following fields:

- An image of the first page of the paper
- title: The title of the paper
- authors_list: The list of authors
- abstract: The abstract of the paper
- arxiv_link: Link to the paper on arXiv
- other_link: Link to the project page, if found
- category_name: The primary category of this paper, according to the [arXiv taxonomy](https://arxiv.org/category_taxonomy)
- all_categories: All categories this paper falls into, according to arXiv taxonomy
- keywords: Extracted using GPT-4o
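
As a rough illustration of the fields listed above, one record could be modelled as a dataclass. This is only a sketch: the post does not name the image field, so `first_page_image` is a hypothetical name, the field types are assumptions, and the sample values are placeholders rather than a real paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PaperRecord:
    first_page_image: str  # hypothetical field name: path to the first-page image
    title: str
    authors_list: List[str]
    abstract: str
    arxiv_link: str
    category_name: str
    all_categories: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    other_link: Optional[str] = None  # project page, only if one was found

    def is_cross_listed(self) -> bool:
        """True when the paper falls into more than one arXiv category."""
        return len(self.all_categories) > 1


# Placeholder values, not taken from the actual dataset.
record = PaperRecord(
    first_page_image="pages/example.png",
    title="An Example Paper",
    authors_list=["A. Author", "B. Author"],
    abstract="Placeholder abstract.",
    arxiv_link="https://arxiv.org/abs/0000.00000",
    category_name="cs.CV",
    all_categories=["cs.CV", "cs.LG"],
    keywords=["example", "placeholder"],
)
print(record.title, record.is_cross_listed())
```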

Here's how I created the dataset 👇🏼

Generic code for building this dataset can be found [here](https://github.com/harpreetsahota204/CVPR-2024-Papers).

This dataset was built using the following steps:

- Scrape the CVPR 2024 website for accepted papers
- Use DuckDuckGo to search for a link to the paper's abstract on arXiv
- Use arXiv.py (Python wrapper for the arXiv API) to extract the abstract and categories, and download the pdf for each paper
- Use pdf2image to save an image of the paper's first page
- Use GPT-4o to extract keywords from the abstract

Voxel51/CVPR_2024_Papers
Reacted to thomwolf's post with 🔥 6 months ago
[New crazy blog post alert] We are releasing an extensive blog post on the science of creating high quality web-scale datasets, detailing all the steps and learnings that came in our recent 15 trillion tokens 🍷FineWeb release

Inspired by the distill.pub interactive graphics papers, we set out to write the most extensive, enjoyable and in-depth tech report we could draft, so prepare for a 45-min read with interactive graphics and all.

And that's not all: in this article we also introduce 📚FineWeb-Edu, a filtered subset of Common Crawl with 1.3T tokens containing only web pages with very high educational content. To our knowledge, FineWeb-Edu outperforms all openly released web-scale datasets by a significant margin on knowledge- and reasoning-intensive benchmarks like MMLU, ARC, and OpenBookQA

We also make a number of surprising observations on the "quality" of the internet itself which may challenge some of the general assumptions on web data (not saying more, I'll let you draw your conclusions ;)

HuggingFaceFW/blogpost-fineweb-v1
replied to their post 6 months ago

Would you like to clarify your point of view on this quote? The CNC "experts" are doing a bit of forecasting here, without necessarily being accurate (unfortunately)