shengzhi alex li

alexshengzhili

Posts 2

We’re excited to release Abstract2Appendix v1 10K, a high-quality dataset crafted to enhance the long-context capabilities of Large Language Models (LLMs). This dataset combines thousands of peer reviews from NeurIPS 2023, EMNLP 2023, TMLR, and ICLR 2023, making it a treasure trove of detailed feedback, critical reasoning, and structured academic insights. In our experiments, fine-tuning on this dataset improved the long-context ability of Phi-3 models!

🌟 Key Highlights:

• Expert Reviews: Aggregated from 3–6 reviews per paper, capturing the most insightful and constructive content.
• Rich Metadata: Each entry pairs the aggregated reviews with the full parsed paper.
• LLM Ready: Well suited to fine-tuning (we ran both DPO and SFT).

🎯 Use Cases:

• Fine-tuning models with Direct Preference Optimization (DPO) and Supervised Fine-Tuning (SFT).
• Benchmarking zero-shot and long-context comprehension capabilities.

🔗 Explore the dataset: alexshengzhili/Abstract2Appendix_v1_10k
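For readers who want to try it, here is a minimal sketch of pulling the dataset from the Hub with the `datasets` library. The repository ID comes from the link above; the helper name `load_abstract2appendix` is hypothetical, and no split or column names are assumed, so inspect the returned object before relying on any field.

```python
# Repository ID as given in the post.
REPO_ID = "alexshengzhili/Abstract2Appendix_v1_10k"


def load_abstract2appendix(repo_id: str = REPO_ID):
    """Download the dataset from the Hugging Face Hub (hypothetical helper).

    Requires `pip install datasets` and network access. The import is kept
    inside the function so this file can be imported without the library.
    """
    from datasets import load_dataset

    # No split is passed, so every available split is returned.
    return load_dataset(repo_id)


if __name__ == "__main__":
    data = load_abstract2appendix()
    # Printing the object lists the splits, column names, and row counts.
    print(data)
```

Printing the result first is the safest way to see which splits and columns are actually available.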

This dataset is based on the methodology described in our recent paper, “Abstract2Appendix: Academic Reviews Enhance LLM Long-Context Capabilities”. Check it out for more details! https://arxiv.org/abs/2411.05232

After the Supervised Fine-Tuning (SFT) phase, we observed a notable degradation in the instruction-following capabilities of the LLaVA Multi-Modal Large Language Model (MM-LLM). To address this, we introduced a 6K-entry VQA preference dataset and applied Direct Preference Optimization (DPO), and also tested other algorithms such as Rejection Sampling and SteerLM. Our method not only fully restored LLaVA's instruction-following capabilities on MT-Bench but also outperformed LLaVA-RLHF and Vicuna. The gains carried over to visual question answering as well, with significant improvements on MM-Vet and LLaVA-Bench. Interestingly, compared to models trained with distilled SFT, our method showed substantial out-of-distribution improvements.
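For context, the standard DPO objective for a single preference pair can be sketched in a few lines. This illustrates the general DPO loss, not the training code behind these results: the function name, argument names, and the default `beta=0.1` are assumptions, and the inputs are summed token log-probabilities of each response under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) response pair.

    beta controls how far the policy may drift from the reference model;
    0.1 is an illustrative default, not a value taken from the post.
    """
    # How much more the policy prefers each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(z)), written in a numerically stable softplus form.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


if __name__ == "__main__":
    # The policy favors the chosen answer more than the reference does,
    # so the loss falls below log(2), its value when there is no preference.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

The loss shrinks as the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference model.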

https://arxiv.org/abs/2402.10884
Model available: alexshengzhili/llava-v1.5-13b-dpo
GitHub: https://github.com/findalexli/mllm-dpo/edit/main/README.MD