Original tweet: @NaveenGRao I get that (I think)
But I’m talking specifically about the size of the preference model (Anthropic’s term; others say “reward model”).
Anthropic got better results as it scaled up the # of parameters (to 50B).
Having trouble squaring that w/ the relatively small amount of human-feedback (HF) data.
https://twitter.com/wintonARK/status/1623813109688463361