Rice University & MIT | Vision-Language Large Models Using Synthetic Data

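【Why we recommend it】This paper builds a million-scale synthetic dataset and a data generation codebase that allow generating additional suitable data to improve the VLC understanding and compositional reasoning of vision-language large models.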

Going Beyond Nouns With Vision & Language Models Using Synthetic Data

Paola Cascante-Bonilla, Khaled Shehada, James Seale Smith, Sivan Doveh, Donghyun Kim, Rameswar Panda, Gül Varol, Aude Oliva, Vicente Ordonez, Rogerio Feris, Leonid Karlinsky

【Rice University & MIT】

【Paper link】https://arxiv.org/pdf/2303.17590.pdf

【Project link】https://synthetic-vic.github.io/

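【Abstract】Large-scale pre-trained vision and language (VL) models have shown remarkable performance in many applications, making it possible to replace a fixed set of supported classes with zero-shot open-vocabulary reasoning over (almost arbitrary) natural language prompts. However, recent work has uncovered a fundamental weakness of these models. For example, they struggle to understand visual language concepts (VLC) that go beyond nouns, such as the meaning of non-object words (e.g., attributes, actions, relations, states), and they struggle with compositional reasoning, such as understanding the significance of word order in a sentence. In this work, the authors investigate to what extent purely synthetic data can teach these models to overcome such shortcomings without compromising their zero-shot capabilities. The paper contributes Synthetic Visual Concepts (SyViC), a million-scale synthetic dataset and data generation codebase that allow generating additional suitable data to improve the VLC understanding and compositional reasoning of VL models. In addition, the authors propose a general VL fine-tuning strategy for effectively leveraging SyViC to achieve these improvements. Extensive experiments and ablations on the VL-Checklist, Winoground, and ARO benchmarks demonstrate that strong pre-trained VL models can be adapted with synthetic data to significantly improve their VLC understanding (e.g., by 9.9% on ARO and 4.3% on VL-Checklist), with a drop of less than 1% in zero-shot accuracy.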

