Can We Align LLMs to Honesty via Instruction Fine-Tuning? Addressing Hallucination in Large Language Models with Refusal-Aware Instruction Tuning

Researchers from the Hong Kong University of Science and Technology and the University of Illinois Urbana-Champaign have collaborated to address hallucination, a failure mode in which large language models (LLMs) generate non-existent facts, by introducing an approach called Refusal-Aware Instruction Tuning (R-Tuning). Their starting observation is that existing instruction tuning methods compel a model to complete an answer even when the question falls into a gap in its knowledge, which leads it to generate inaccurate information.

The core idea of R-Tuning is to recognize the knowledge gap between the parametric knowledge of an LLM and the instruction tuning data, and then to train the model to explicitly refuse questions that lie beyond its parametric knowledge. The process has two steps: first, measure the knowledge gap by comparing model predictions with ground-truth labels to identify uncertain questions; second, construct refusal-aware data by appending uncertainty expressions to those uncertain questions.
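The two steps can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `predict` callable, the exact-match comparison, the `SURE`/`UNSURE` strings, and the choice to tag known questions as well as uncertain ones are all assumptions made for the example.

```python
from typing import Callable, Dict, List

# Hypothetical uncertainty expressions; the exact wording is an assumption.
SURE = "I am sure."
UNSURE = "I am unsure."


def build_refusal_aware_data(
    examples: List[Dict[str, str]],
    predict: Callable[[str], str],
) -> List[Dict[str, object]]:
    """Split examples by whether the model already answers them correctly,
    then append a certainty or uncertainty expression to each target."""
    refusal_aware = []
    for example in examples:
        prediction = predict(example["question"])
        # Step 1: measure the knowledge gap by comparing the prediction
        # with the ground-truth label (simple normalized exact match here).
        is_known = prediction.strip().lower() == example["answer"].strip().lower()
        # Step 2: append the matching expression to the training target.
        expression = SURE if is_known else UNSURE
        refusal_aware.append(
            {
                "question": example["question"],
                "target": f"{example['answer']} {expression}",
                "certain": is_known,
            }
        )
    return refusal_aware
```

In practice `predict` would wrap a call to the model being tuned; a stub function is enough to try the data construction on a handful of question-answer pairs.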

The researchers conducted both single-task and multi-task experiments on nine datasets, namely ParaRel, HotpotQA, SelfAware, HaluEval, FalseQA, NEC, MMLU, WiCE, and FEVER. In single-task experiments, R-Tuning learned to refuse uncertain questions, which improved accuracy on the questions the model chose to answer. In multi-task experiments, the refusal ability behaved as a meta-skill, bringing gains on both in-domain and out-of-domain datasets.

Comparisons with baseline models, including Pretrain-T, Pretrain-W, and Vanilla fine-tuning, showed that R-Tuning consistently achieved higher Average Precision (AP) scores. The results indicated that R-Tuning reduced hallucination by filtering out questions beyond the model’s knowledge. The study also explored the impact of model size on refusal ability and found that larger models performed better.
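For readers unfamiliar with the metric, the sketch below computes AP in its standard ranking form: answers are sorted by the model's confidence, and precision is averaged over the positions where a correct answer appears. The article does not spell out how the confidence scores were obtained, so the `confidences` input and the function name are assumptions for illustration.

```python
from typing import Sequence


def average_precision(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Average precision of correctness when answers are ranked by confidence."""
    # Rank answers from most to least confident.
    ranked = sorted(zip(confidences, correct), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_correct) in enumerate(ranked, start=1):
        if is_correct:
            hits += 1
            # Precision among the top `rank` answers at this correct answer.
            precision_sum += hits / rank
    # With no correct answers, AP is defined here as 0.0.
    return precision_sum / hits if hits else 0.0
```

A model that places its correct answers at the top of the confidence ranking scores close to 1.0, which is why AP rewards a model that is confident only when it actually knows the answer.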

Surprisingly, the researchers found that learning uncertainty during training yielded better results than directly applying uncertainty filtering on test data. This suggests that learning uncertainty improves the model's ability both to estimate its own uncertainty and to answer questions, which is an argument for incorporating uncertainty learning into LLM training. They also explored unsupervised identification strategies and label replacement methods within R-Tuning, and found that uncertainty-based identification and direct label replacement were effective approaches.
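One way to picture uncertainty-based identification without ground-truth labels is to sample several answers per question and treat high disagreement among them as a sign of uncertainty. The sketch below follows that idea under stated assumptions: the function names, the use of entropy over normalized answer strings, and the `uncertain_fraction` threshold are illustrative choices, not details taken from the article.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def answer_entropy(samples: List[str]) -> float:
    """Shannon entropy (in nats) of the distribution of sampled answers."""
    counts = Counter(sample.strip().lower() for sample in samples)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def split_by_uncertainty(
    sampled_answers: Dict[str, List[str]],
    uncertain_fraction: float = 0.5,
) -> Tuple[List[str], List[str]]:
    """Label the highest-entropy questions as uncertain, the rest as certain."""
    # Sort questions from most consistent (low entropy) to least consistent.
    ranked = sorted(sampled_answers, key=lambda q: answer_entropy(sampled_answers[q]))
    cutoff = int(len(ranked) * (1 - uncertain_fraction))
    return ranked[:cutoff], ranked[cutoff:]
```

Questions in the second list would then receive an uncertainty expression in the training data, while those in the first list keep a confident target.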

Furthermore, R-Tuning handled unanswerable questions, refusing to answer queries that contradicted common sense or were beyond the model’s knowledge. The in-depth analysis examined the perplexity of refused questions and the entropy of answers, providing insight into how R-Tuning improved the model’s ability to handle questions of varying randomness and difficulty.

In conclusion, the researchers introduced R-Tuning as a powerful method for teaching LLMs to refuse unknown questions, addressing the challenge of hallucination and improving model accuracy. The refusal ability demonstrated by R-Tuning was identified as a meta-skill that could be generalized across various tasks, showcasing its potential impact on the reliability and performance of large language models.

The post Can We Align LLMs to Honesty via Instruction Fine-Tuning? Addressing Hallucination in Large Language Models with Refusal-Aware Instruction Tuning appeared first on MarkTechPost.