Stanford University: ELEPHANT: Measuring and Understanding Social Sycophancy in Large Language Models (English Version)

Publisher: wx****00

2025-11-25

812 KB, 34 pages

Stanford University

File list:

Stanford University: ELEPHANT: Measuring and Understanding Social Sycophancy in Large Language Models (English Version).pdf

Summary

LLMs are known to exhibit sycophancy: agreeing with and flattering users, even at the cost of correctness. Prior work measures sycophancy only as direct agreement with users’ explicitly stated beliefs that can be compared to a ground truth. This fails to capture broader forms of sycophancy such as affirming a user’s self-image or other implicit beliefs. To address this gap, we introduce social sycophancy, characterizing sycophancy as excessive preservation of a user’s face (their desired self
