Original tweet: @NaveenGRao I get that (I think)
But I’m talking specifically about the size of the preference model (Anthropic’s term; others say “reward model”).
Anthropic got better results as it scaled up the # of parameters (to 50B).
Having trouble squaring that w/ the relatively small amount of human-feedback (HF) data.
https://twitter.com/wintonARK/status/1623813109688463361