Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models • arXiv:2402.19427 • Published Feb 29, 2024 • 52 upvotes
Beyond Language Models: Byte Models are Digital World Simulators • arXiv:2402.19155 • Published Feb 29, 2024 • 49 upvotes
Simple linear attention language models balance the recall-throughput tradeoff • arXiv:2402.18668 • Published Feb 28, 2024 • 18 upvotes
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits • arXiv:2402.17764 • Published Feb 27, 2024 • 602 upvotes
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method • arXiv:2402.17193 • Published Feb 27, 2024 • 23 upvotes
Training-Free Long-Context Scaling of Large Language Models • arXiv:2402.17463 • Published Feb 27, 2024 • 19 upvotes
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases • arXiv:2402.14905 • Published Feb 22, 2024 • 126 upvotes
Divide-or-Conquer? Which Part Should You Distill Your LLM? • arXiv:2402.15000 • Published Feb 22, 2024 • 22 upvotes
MathScale: Scaling Instruction Tuning for Mathematical Reasoning • arXiv:2403.02884 • Published Mar 5, 2024 • 15 upvotes
Design2Code: How Far Are We From Automating Front-End Engineering? • arXiv:2403.03163 • Published Mar 5, 2024 • 93 upvotes
Wukong: Towards a Scaling Law for Large-Scale Recommendation • arXiv:2403.02545 • Published Mar 4, 2024 • 15 upvotes
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection • arXiv:2403.03507 • Published Mar 6, 2024 • 182 upvotes
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect • arXiv:2403.03853 • Published Mar 6, 2024 • 62 upvotes
Learning to Decode Collaboratively with Multiple Language Models • arXiv:2403.03870 • Published Mar 6, 2024 • 18 upvotes
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference • arXiv:2403.04132 • Published Mar 7, 2024 • 38 upvotes
LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error • arXiv:2403.04746 • Published Mar 7, 2024 • 22 upvotes
Common 7B Language Models Already Possess Strong Math Capabilities • arXiv:2403.04706 • Published Mar 7, 2024 • 16 upvotes
How Far Are We from Intelligent Visual Deductive Reasoning? • arXiv:2403.04732 • Published Mar 7, 2024 • 18 upvotes
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU • arXiv:2403.06504 • Published Mar 11, 2024 • 53 upvotes
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences • arXiv:2403.09347 • Published Mar 14, 2024 • 20 upvotes
Language models scale reliably with over-training and on downstream tasks • arXiv:2403.08540 • Published Mar 13, 2024 • 14 upvotes
Towards a World-English Language Model for On-Device Virtual Assistants • arXiv:2403.18783 • Published Mar 27, 2024 • 4 upvotes
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression • arXiv:2403.15447 • Published Mar 18, 2024 • 16 upvotes
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking • arXiv:2403.09629 • Published Mar 14, 2024 • 72 upvotes
Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs • arXiv:2403.20041 • Published Mar 29, 2024 • 34 upvotes
Gecko: Versatile Text Embeddings Distilled from Large Language Models • arXiv:2403.20327 • Published Mar 29, 2024 • 47 upvotes
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline • arXiv:2404.02893 • Published Apr 3, 2024 • 20 upvotes
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models • arXiv:2404.02258 • Published Apr 2, 2024 • 104 upvotes
Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks? • arXiv:2404.03411 • Published Apr 4, 2024 • 8 upvotes
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models • arXiv:2404.02575 • Published Apr 3, 2024 • 47 upvotes
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences • arXiv:2404.03715 • Published Apr 4, 2024 • 60 upvotes
CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues • arXiv:2404.03820 • Published Apr 4, 2024 • 24 upvotes
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length • arXiv:2404.08801 • Published Apr 12, 2024 • 63 upvotes
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models • arXiv:2405.01535 • Published May 2, 2024 • 116 upvotes
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report • arXiv:2405.00732 • Published Apr 29, 2024 • 118 upvotes
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment • arXiv:2405.01481 • Published May 2, 2024 • 25 upvotes
FLAME: Factuality-Aware Alignment for Large Language Models • arXiv:2405.01525 • Published May 2, 2024 • 24 upvotes
∇²DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials • arXiv:2406.14347 • Published Jun 20, 2024 • 98 upvotes