Learning When to Answer: Behavior-Oriented Reinforcement Learning for Hallucination Mitigation

Authors

Note: This README provides a brief overview. For the complete paper with full technical details, methodology, and experimental results, please refer to paper.pdf.

1. Abstract

Large language models (LLMs) often hallucinate under uncertainty, producing fluent yet unsupported responses. We argue that hallucination is fundamentally a decision-making problem, involving when to answer, when to abstain, and how confidently to respond, rather than purely a generation error.

We propose a behavior-oriented reinforcement learning framework that explicitly models these decisions. Our method integrates behavior alignment, entropy-based uncertainty modeling, response quality shaping, and length regularization into a unified reward function. The model is optimized via a two-stage training process combining preference learning (Direct Preference Optimization, DPO) and reinforcement learning (Group Relative Policy Optimization, GRPO).
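As a rough illustration of how four such reward terms could be combined and then fed to a GRPO-style update, here is a minimal sketch. It is not the implementation from the paper: the weights, the entropy threshold, the target length, the exact form of each term, and every function name below are hypothetical choices made for this example. Only the group-wise normalization of rewards (subtract the group mean, divide by the group standard deviation) reflects how GRPO is generally described.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; the values used in the paper are not stated here.
    behavior: float = 1.0
    uncertainty: float = 0.5
    quality: float = 0.5
    length: float = 0.1


def mean_token_entropy(token_distributions):
    """Average Shannon entropy (in nats) over per-token probability lists."""
    if not token_distributions:
        return 0.0
    total = 0.0
    for probs in token_distributions:
        total += -sum(p * math.log(p) for p in probs if p > 0.0)
    return total / len(token_distributions)


def composite_reward(answered, answerable, token_distributions, quality_score,
                     num_tokens, weights=None, entropy_threshold=1.0,
                     target_length=128):
    """Combine four illustrative terms into one scalar reward (assumed form)."""
    w = weights or RewardWeights()

    # Behavior alignment: answer when answerable, abstain when not.
    behavior = 1.0 if answered == answerable else -1.0

    # Entropy-based uncertainty: discourage answering when the model is unsure,
    # encourage abstaining in that case, stay neutral otherwise.
    uncertain = mean_token_entropy(token_distributions) > entropy_threshold
    if uncertain:
        uncertainty = -1.0 if answered else 1.0
    else:
        uncertainty = 0.0

    # Length regularization: penalize only tokens beyond the target length.
    length_penalty = max(0, num_tokens - target_length) / target_length

    # Response quality shaping: quality_score is assumed to lie in [0, 1].
    return (w.behavior * behavior
            + w.uncertainty * uncertainty
            + w.quality * quality_score
            - w.length * length_penalty)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group, as in GRPO-style training."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


# Two sampled responses to an unanswerable question, both with high entropy.
uniform4 = [[0.25, 0.25, 0.25, 0.25]]
r_hallucinate = composite_reward(True, False, uniform4, 0.8, 64)
r_abstain = composite_reward(False, False, uniform4, 0.8, 64)
advantages = group_relative_advantages([r_hallucinate, r_abstain])
```

Under these assumed settings, the abstaining response receives the higher reward and therefore a positive group-relative advantage, while the confident but unsupported answer receives a negative one. The actual reward definitions and hyperparameters are in paper.pdf.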

Experimental results show that our approach reduces hallucination by over 65%, consistently outperforming strong baselines such as GPT-4o. The learned behavior generalizes effectively to unseen domains, improves response quality across eight evaluation dimensions, and preserves general knowledge capability without catastrophic forgetting.

These findings demonstrate that hallucination mitigation can be effectively achieved by learning behavior under uncertainty, enabling LLMs to produce responses that are both reliable and useful.

Acknowledgments

Special thanks to Eddie and CHI_YA for their valuable contributions to this project.
