Abstract
Honesty is a fundamental principle for aligning large language models (LLMs) with human values: a model should recognize what it knows and does not know and be able to faithfully express that knowledge. Despite their promise, current LLMs still exhibit significant dishonest behaviors, such as confidently presenting wrong answers or failing to express what they know. In addition, research on the honesty of LLMs faces its own challenges, including varying definitions of honesty, difficulties in distinguishing between known and unknown knowledge, and the lack of a comprehensive understanding of related research. To address these issues, we provide a survey on the honesty of LLMs, covering its clarification, evaluation approaches, and strategies for improvement. Moreover, we offer insights for future research, aiming to inspire further exploration in this important area.
Community
We are excited to share our work with everyone: A Survey on the Honesty of Large Language Models. In this paper, we systematically review current research on the honesty of LLMs and offer insights for future research, aiming to contribute to the development of this field.
Paper: https://arxiv.org/pdf/2409.18786
Project Page: https://github.com/SihengLi99/LLM-Honesty-Survey
Figure 1: An illustration of an honest LLM that demonstrates both self-knowledge and self-expression.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- A Comprehensive Survey of Bias in LLMs: Current Landscape and Future Directions (2024)
- Decoding Large-Language Models: A Systematic Overview of Socio-Technical Impacts, Constraints, and Emerging Questions (2024)
- From Deception to Detection: The Dual Roles of Large Language Models in Fake News (2024)
- When All Options Are Wrong: Evaluating Large Language Model Robustness with Incorrect Multiple-Choice Options (2024)
- Testing and Evaluation of Large Language Models: Correctness, Non-Toxicity, and Fairness (2024)