Quentin Gallouédec

qgallouedec

https://gallouedec.com

AI & ML interests

None yet

Recent Activity

upvoted a paper about 12 hours ago

QLoRA: Efficient Finetuning of Quantized LLMs

liked a dataset about 12 hours ago

b-mc2/sql-create-context

updated a model about 16 hours ago

qgallouedec/Qwen2-0.5B-Reward

Articles

Preference Optimization for Vision Language Models

Jul 10

• 46

Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent

Apr 22

• 78

qgallouedec's activity

upvoted a paper about 12 hours ago

QLoRA: Efficient Finetuning of Quantized LLMs

Paper • 2305.14314 • Published May 23, 2023 • 46

upvoted an article about 1 month ago

Article

Finetuning PaliGemma with AutoTrain

Jul 25

• 8

upvoted 2 papers about 2 months ago

The Perfect Blend: Redefining RLHF with Mixture of Judges

Paper • 2409.20370 • Published Sep 30 • 4

Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation

Paper • 2401.08417 • Published Jan 16 • 34

upvoted a collection about 2 months ago

PaliGemma Release

Collection

Pretrained and mix checkpoints for PaliGemma • 16 items • Updated Jul 31 • 139

upvoted 3 papers 2 months ago

Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF

Paper • 2405.21046 • Published May 31 • 3

ORPO: Monolithic Preference Optimization without Reference Model

Paper • 2403.07691 • Published Mar 12 • 62

Binary Classifier Optimization for Large Language Model Alignment

Paper • 2404.04656 • Published Apr 6 • 2

upvoted 2 papers 3 months ago

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22 • 121

Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

Paper • 2402.14740 • Published Feb 22 • 11

upvoted an article 3 months ago

Article

The 5 Most Under-Rated Tools on Hugging Face

Aug 22

• 85

upvoted a paper 3 months ago

Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking

Paper • 2312.09244 • Published Dec 14, 2023 • 8

upvoted 3 papers 4 months ago

Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment

Paper • 2408.06266 • Published Aug 12 • 9

A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA

Paper • 2312.03732 • Published Nov 28, 2023 • 7

The Curious Case of Neural Text Degeneration

Paper • 1904.09751 • Published Apr 22, 2019 • 3

upvoted an article 4 months ago

Article

Putting RL back in RLHF

Jun 12

• 62

upvoted a paper 4 months ago

Understanding Reference Policies in Direct Preference Optimization

Paper • 2407.13709 • Published Jul 18 • 16

upvoted an article 4 months ago

Article

Docmatix - a huge dataset for Document Visual Question Answering

Jul 18

• 68

upvoted 2 articles 5 months ago

Article

How NuminaMath Won the 1st AIMO Progress Prize

Jul 11

• 104

Article

Preference Optimization for Vision Language Models

Jul 10

• 46