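Brett Winton: Can someone benchmark ChatGPT (or another human-feedback-tuned model) on the Social IQa (SIQA) benchmark? SIQA has a clear (and potentially ambiguous) culture-specific ethical overlay, and I'm guessing RLHF tuning would deliver outperformance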

Posted on 2023-02-28

Original tweet: Could somebody please benchmark chatGPT (or another human feedback-tuned model) on the Social IQa (SIQA) benchmark?

There is a clear (and potentially ambiguous) culture-specific ethical overlay to SIQA, and I’m guessing that RLHF-tuning would deliver outperformance