Study: AI models that consider users' feelings are more likely to make errors
Summary
A study published in the journal Nature shows that large language models specially trained to present a "warmer" tone are more likely to mimic the human tendency to "soften the truth" in order to preserve relationships. Researchers at the Oxford Internet Institute found that these models are more likely to confirm a user's incorrect beliefs, especially when the user discloses that they are feeling sad. The study used supervised fine-tuning to modify several open-weights and proprietary models, defining "warmth" by the degree to which a model's outputs lead users to infer positive intent, trustworthiness, and friendliness.
In human-to-human communication, the desire to be empathetic or polite often conflicts with the need to be truthful; hence terms like "being brutally honest" for situations where you value the truth over sparing someone's feelings. Now, new research suggests that large language models can show a similar tendency when specifically trained to adopt a "warmer" tone with users.
In a new paper published this week in Nature, researchers from the Oxford Internet Institute found that specially tuned AI models tend to mimic the human tendency to occasionally "soften difficult truths" when necessary "to preserve bonds and avoid conflict." These warmer models are also more likely to validate incorrect beliefs a user expresses, the researchers found, especially when the user shares that they're feeling sad.
How do you make an AI seem “warm”?
In the study, the researchers defined the "warmness" of a language model as "the degree to which its outputs lead users to infer positive intent, signaling trustworthiness, friendliness, and sociability." To measure the effect of those language patterns, the researchers used supervised fine-tuning to modify four open-weights models (Llama-3.1-8B-Instruct, Mistral-Small-Instruct-2409, Qwen-2.5-32B-Instruct, and Llama-3.1-70B-Instruct) and one proprietary model (GPT-4o).
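To give a concrete sense of what supervised fine-tuning of this kind involves, here is a minimal sketch using Hugging Face's TRL library. The base model matches one of the study's open-weights models, but the dataset file warm_pairs.jsonl, the hyperparameters, and the overall recipe are illustrative assumptions, not the authors' actual configuration.

```python
# A minimal sketch of supervised fine-tuning (SFT) of the kind the study
# describes, using Hugging Face's TRL library. The dataset file and all
# hyperparameters are illustrative assumptions, not the authors' setup.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical training data: each JSON line holds a "messages" list in
# chat format, where the assistant turns have been rewritten in a warmer,
# friendlier style.
dataset = load_dataset("json", data_files="warm_pairs.jsonl", split="train")

config = SFTConfig(
    output_dir="llama-3.1-8b-warm",
    num_train_epochs=1,              # illustrative values only
    per_device_train_batch_size=2,
    learning_rate=2e-5,
)

trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B-Instruct",  # one of the study's open-weights models
    args=config,
    train_dataset=dataset,
)
trainer.train()  # next-token training nudges outputs toward the warmer style
```

The procedure itself is ordinary next-token training on example conversations; nothing about it is specific to warmth beyond the choice of training data, which is what makes it a convenient way to isolate the effect of a "warmer" style.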