Original tweet: Say you wanted to run a ChatGPT-sized language model on your desktop.
Not possible.
Until now. (Unless you wanted to wait a minute and a half for each word.)
This paper figures out how to do it on a single GPU at 1 token per second.