Original tweet: Would not surprise me if multi-modal data becomes the critical ingredient for securing a foundation model position. https://t.co/eulYOCcWnb
What happens when we train the largest vision-language model and add in robot experiences?
The result is PaLM-E, a 562-billion parameter, general-purpose, embodied visual-language generalist across robotics, vision, and language.
Website: palm-e.github.io https://t.co/5qfK23g52d