Original tweet: Efficiently offloading the model into lower-cost memory also reduces cost per token.
Sure, it's not as fast as cloud inference on $200k worth of GPUs, but it cuts the cost per generated token by 75%.
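As a back-of-the-envelope sketch of how that claim works (all dollar figures and throughput numbers below are hypothetical, not from the tweet): cost per token is hourly hardware cost divided by tokens generated per hour, so a setup that is slower but much cheaper can still win on cost per token.

```python
# Back-of-the-envelope cost-per-token comparison.
# All numbers are made up for illustration; they are not from the original tweet.
# Cost per token = (hourly hardware cost) / (tokens generated per hour).

def cost_per_token(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Dollars per generated token for a given serving setup."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour

# Hypothetical cloud setup: expensive GPUs, high throughput.
cloud = cost_per_token(hourly_cost_usd=40.0, tokens_per_second=400.0)

# Hypothetical offloaded setup: weights in cheaper memory,
# lower throughput but far lower hardware cost.
offloaded = cost_per_token(hourly_cost_usd=2.5, tokens_per_second=100.0)

print(f"cloud:     ${cloud:.6f}/token")
print(f"offloaded: ${offloaded:.6f}/token")
print(f"reduction: {1 - offloaded / cloud:.0%}")  # 75% with these example numbers
```

With these illustrative numbers the offloaded setup is 4x slower but 16x cheaper per hour, which nets out to the 75% cost-per-token reduction the tweet describes.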