Post
1110
We’re excited to release Abstract2Appendix v1 10K , a high-quality dataset crafted to enhance the long-context capabilities of Large Language Models (LLMs). This dataset combines thousands of peer reviews from NeurIPS 2023, EMNLP 2023, TMLR, and ICLR 2023, making it a treasure trove of detailed feedback, critical reasoning, and structured academic insights. Our experiments showed that this dataset increased long context ability of phi-3 models!
🌟 Key Highlights:
• Expert Reviews: Aggregated from 3–6 reviews per paper, capturing the most insightful and constructive content.
• Rich Metadata: we have aggregated the reviews, and also included full parsed paper
• LLM Ready: Perfect for fine-tuning (We did dpo and sft)
🎯 Use Cases:
• Fine-tuning models with Direct Preference Optimization (DPO) and Supervised Fine-Tuning (SFT).
• Benchmarking zero-shot and long-context comprehension capabilities.
🔗 Explore the dataset: alexshengzhili/Abstract2Appendix_v1_10k
This dataset is based on the methodology described in our recent paper, “Abstract2Appendix: Academic Reviews Enhance LLM Long-Context Capabilities”. Check it out for more details! https://arxiv.org/abs/2411.05232
🌟 Key Highlights:
• Expert Reviews: Aggregated from 3–6 reviews per paper, capturing the most insightful and constructive content.
• Rich Metadata: we have aggregated the reviews, and also included full parsed paper
• LLM Ready: Perfect for fine-tuning (We did dpo and sft)
🎯 Use Cases:
• Fine-tuning models with Direct Preference Optimization (DPO) and Supervised Fine-Tuning (SFT).
• Benchmarking zero-shot and long-context comprehension capabilities.
🔗 Explore the dataset: alexshengzhili/Abstract2Appendix_v1_10k
This dataset is based on the methodology described in our recent paper, “Abstract2Appendix: Academic Reviews Enhance LLM Long-Context Capabilities”. Check it out for more details! https://arxiv.org/abs/2411.05232