原推:Could somebody please benchmark chatGPT (or another human feedback-tuned model) on the Social IQa (SIQA) benchmark?
There is a clear (and potentially ambiguous) culture-specific ethical overlay to SIQA, and I’m guessing that RLHF-tuning would deliver outperformance