Seeker: Enhancing Exception Handling in Code with LLM-based Multi-Agent Approach
Abstract
In real-world software development, improper or missing exception handling can severely impact the robustness and reliability of code. Exception handling mechanisms require developers to detect, capture, and manage exceptions according to high standards, but many developers struggle with these tasks, leading to fragile code. This problem is particularly evident in open-source projects and degrades the overall quality of the software ecosystem. To address this challenge, we explore the use of large language models (LLMs) to improve exception handling in code. Through extensive analysis, we identify three key issues: Insensitive Detection of Fragile Code, Inaccurate Capture of Exception Types, and Distorted Handling Solutions. These problems are widespread across real-world repositories, suggesting that robust exception handling practices are often overlooked or mishandled. In response, we propose Seeker, a multi-agent framework inspired by the strategies expert developers use for exception handling. Seeker employs five agents (Scanner, Detector, Predator, Ranker, and Handler) to help LLMs detect, capture, and resolve exceptions more effectively. Our work is the first systematic study of leveraging LLMs to enhance exception handling practices, providing valuable insights for future improvements in code reliability.
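To make the three issues concrete, the following minimal Python sketch (illustrative only, not code from the paper; the function and file names are ours) contrasts a fragile function, where a risky I/O call goes undetected and unguarded, with a robust variant that captures the specific exception types involved and handles each one deliberately.

```python
# Illustrative sketch of the three issues; not code from the paper.
import json
import logging

logger = logging.getLogger(__name__)

def load_config_fragile(path):
    # Insensitive detection: the risky file/parse calls are not recognized as fragile,
    # so no exception handling is attempted at all. A typical quick "fix" such as
    # `except Exception: pass` would instead show inaccurate capture (too broad a type)
    # and a distorted handling solution (the error is silently swallowed).
    with open(path) as f:
        return json.load(f)

def load_config_robust(path, default=None):
    # Capture the precise exception types the fragile statements can raise,
    # and pair each with an informative, recoverable handling strategy.
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        logger.warning("Config file %s not found; falling back to defaults", path)
        return default
    except json.JSONDecodeError as exc:
        logger.error("Config file %s is malformed: %s", path, exc)
        raise
```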
Community
As the functional correctness of large language models (LLMs) in code generation continues to gain attention and improve, generating code that passes more test cases and contains fewer functional vulnerabilities has become a decisive criterion for evaluating LLMs' coding performance. However, few works examine how well LLMs handle code robustness, as represented by exception handling mechanisms. Especially in real development scenarios, the exception mechanism imposes strict standards on developers for detecting, capturing, and handling exceptions. Due to the lack of interpretable experience and generalizable strategies, highly robust code is relatively scarce even in mainstream open-source projects, which in turn degrades the quality of code training data and, consequently, the generation quality of LLMs. This prompted us to raise a research question that few have explored: "Do we need to enhance the standardization, interpretability, and generalizability of exception handling in real code development scenarios?"

To confirm this need, we first uncovered three pillar phenomena of incorrect exception handling through extensive LLM-based and human code review: Insensitive-Detection of Fragile Code, Inaccurate-Capture of Exception Type, and Distorted-Solution of Handling Block. These phenomena occur frequently in both real repositories and generated code, indicating that neither human developers nor LLMs spontaneously grasp the skills exception handling requires. Surprisingly, this poor LLM performance is substantially mitigated when prompts are enhanced with precise exception types, scenario logic, and handling strategies. To exploit this effect, we propose a new method called Seeker: a chain of agents modeled on how the most experienced human developers reason about exception handling tasks, separated into Scanner, Detector, Predator, Ranker, and Handler agents. To the best of our knowledge, our work is the first systematic study of the robustness of LLM-generated code in real development scenarios, providing valuable insights for future research on reliable code generation.
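As a rough illustration of how such an agent chain could be wired, the sketch below strings the five agents together around a generic `llm` callable; the interfaces, prompts, data class, and naive chunking step are our assumptions for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch of a five-agent chain; interfaces and prompts are assumptions.
from dataclasses import dataclass, field

@dataclass
class ExceptionFinding:
    snippet: str                                               # fragile code unit flagged by the Detector
    candidate_types: list[str] = field(default_factory=list)   # supplied by the Predator
    ranked_types: list[str] = field(default_factory=list)      # ordered by the Ranker
    handled_code: str = ""                                      # rewritten block from the Handler

def run_seeker_like_pipeline(source: str, llm) -> list[ExceptionFinding]:
    """Chain the agents in the order the paper names them; `llm` is any text-in/text-out callable."""
    findings = []
    # Scanner: split the source into reviewable units (here, naive blank-line chunks).
    units = [u for u in source.split("\n\n") if u.strip()]
    for unit in units:
        # Detector: ask whether the unit contains fragile, unguarded code.
        if "yes" not in llm(f"Does this code contain unhandled, fragile operations? {unit}").lower():
            continue
        finding = ExceptionFinding(snippet=unit)
        # Predator: retrieve the exception types the fragile code could raise.
        finding.candidate_types = llm(f"List exception types this code may raise: {unit}").split(",")
        # Ranker: order candidate types by how well they fit the scenario logic.
        finding.ranked_types = llm(
            f"Rank these exception types by relevance to the code: {finding.candidate_types}"
        ).split(",")
        # Handler: generate a handling block for the top-ranked exception types.
        finding.handled_code = llm(
            f"Rewrite the code with try/except for {finding.ranked_types[:2]}: {unit}"
        )
        findings.append(finding)
    return findings
```

Read this way, Scanner narrows the search space, Detector flags fragile units, Predator and Ranker supply and prioritize precise exception types (the "enhanced prompt" information noted above), and Handler produces the final handling block.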
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? (2024)
- Vulnerability Handling of AI-Generated Code - Existing Solutions and Open Challenges (2024)
- HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale (2024)
- APILOT: Navigating Large Language Models to Generate Secure Code by Sidestepping Outdated API Pitfalls (2024)
- VulnLLMEval: A Framework for Evaluating Large Language Models in Software Vulnerability Detection and Patching (2024)