Qingyuan TALK No. 112: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning


In this talk, we share how to give robots the ability to plan tasks grounded in the physical world. Recent work has shown that large language models (LLMs) hold extensive knowledge that is useful for robotic tasks, especially for reasoning and planning. However, LLMs lack grounding in the physical world. They depend on external affordance models to perceive the environment, and those affordance models cannot reason jointly with the LLM. We argue that a task planner should be an inherently grounded, unified multimodal system. To this end, we introduce Robotic Vision-Language Planning (ViLa), a novel approach to long-horizon robotic planning that uses vision-language models (VLMs) to generate a sequence of actionable steps. ViLa integrates perceptual data directly into its reasoning and planning process, which lets it draw on commonsense knowledge about the visual world, including spatial layouts and object attributes. It also supports flexible multimodal goal specification and naturally incorporates visual feedback. Our extensive evaluation in both real-robot and simulated environments shows that ViLa has a clear advantage over existing LLM-based planners and performs well across a wide range of open-world manipulation tasks.
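The closed-loop idea in the abstract (observe the scene, ask the model for the next step, execute it, then observe again) can be illustrated with a minimal sketch. This is not ViLa's implementation: `Scene`, `plan_closed_loop`, `stub_vlm`, and `run_demo` are hypothetical names, and the VLM is replaced by a trivial stand-in function so the loop runs without any model or robot.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Scene:
    """Toy stand-in for a camera image: names of objects still on the table."""
    on_table: List[str] = field(default_factory=list)


def plan_closed_loop(
    goal: str,
    observe: Callable[[], Scene],
    propose_step: Callable[[str, Scene, List[str]], str],
    execute: Callable[[str], None],
    max_steps: int = 10,
) -> List[str]:
    """Re-query the planner with a fresh observation before every step."""
    history: List[str] = []
    for _ in range(max_steps):
        scene = observe()                          # visual feedback
        step = propose_step(goal, scene, history)  # planner sees goal, scene, past steps
        if step == "done":
            break
        execute(step)
        history.append(step)
    return history


def run_demo() -> List[str]:
    """Run the loop on a toy tabletop scene with a rule-based planner stub."""
    scene = Scene(on_table=["cup", "plate"])
    prefix = "pick up "

    def observe() -> Scene:
        return scene

    def stub_vlm(goal: str, current: Scene, history: List[str]) -> str:
        # Hypothetical stand-in for a VLM call: pick the first remaining object.
        if not current.on_table:
            return "done"
        return prefix + current.on_table[0]

    def execute(step: str) -> None:
        # Hypothetical stand-in for a robot skill: remove the named object.
        scene.on_table.remove(step[len(prefix):])

    return plan_closed_loop("clear the table", observe, stub_vlm, execute)


if __name__ == "__main__":
    print(run_demo())
```

Because the planner is queried again after every executed step, a change in the scene (for example, an object that was moved or dropped) would be reflected in the next proposed step, which is the role the abstract assigns to visual feedback.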


Hu Yingdong is a third-year Ph.D. student at the Institute for Interdisciplinary Information Sciences at Tsinghua University, supervised by Professor Gao Yang. He previously received his Bachelor's degree in Intelligence Science and Technology from Beijing University of Posts and Telecommunications. His research interests include computer vision, reinforcement learning, embodied intelligence, and robot learning. He currently focuses on using the prior knowledge in foundation models to build general-purpose robots that generalize in the open world. He has published papers at machine learning and robotics conferences such as ECCV, ICML, and CoRL, and serves as a reviewer for international conferences such as ICLR and CVPR.


                                                                                               

