Original tweet: Truly diverse data is massively valuable.
Train two AI models; duplicate 0.1% of the data 100 times for one of them.
Performance per dollar halves(!) for the model with the data duplication.
arxiv.org/pdf/2205.10487…
https://twitter.com/wintonARK/status/1626300042163388417
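
To get a feel for how much of the training set the duplicated slice ends up occupying, here is a rough arithmetic sketch. It assumes one literal reading of the tweet: take a dataset, copy 0.1% of it 100 times, and keep everything else once. The function name `repeated_share` is made up for illustration, and the linked paper's exact setup may differ.

```python
def repeated_share(subset_fraction: float, repeats: int) -> float:
    """Share of the final training set taken up by copies of the repeated subset.

    Assumption: the remaining (1 - subset_fraction) of the data appears exactly once.
    """
    repeated = subset_fraction * repeats   # total volume of the repeated copies
    unique = 1.0 - subset_fraction         # everything that is seen only once
    return repeated / (repeated + unique)


share = repeated_share(0.001, 100)
print(f"{share:.1%}")  # prints 9.1%
```

Under that reading, a sliver of the source data accounts for roughly a tenth of what the model actually trains on, which makes the reported drop in performance per dollar less surprising.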