ACT-1: Transformer for Actions


AI has moved at an incredible pace in the last few years. Scaling up Transformers has led to remarkable capabilities in language (e.g., GPT-3, PaLM, Chinchilla), code (e.g., Codex, AlphaCode), and image generation (e.g., DALL-E, Imagen).

At Adept, we are building the next frontier of models that can take actions in the digital world—that’s why we’re excited to introduce our first large model, Action Transformer (ACT-1).

Why are we so excited about this?

First, we believe the clearest framing of general intelligence is a system that can do anything a human can do in front of a computer. A foundation model for actions, trained to use every software tool, API, and webapp that exists, is a practical path to this ambitious goal, and ACT-1 is our first step in this direction.

Second, the next era of computing will be defined by natural language interfaces that allow us to tell our computers what we want directly, rather than doing it by hand. We hope these snippets of ACT-1 will give you a window into the next frontier of computing as we see it!

Sign up here to join the waitlist for the upcoming alpha release of our first product built around ACT-1.

Capability preview

ACT-1 is a large-scale Transformer trained to use digital tools; among other things, we recently taught it how to use a web browser. Right now, it’s hooked up to a Chrome extension that lets ACT-1 observe what’s happening in the browser and take certain actions, such as clicking, typing, and scrolling. The observation is a custom “rendering” of the browser viewport that’s meant to generalize across websites, and the action space is the UI elements available on the page.
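To make the idea of "an action space made of the UI elements on the page" concrete, here is a minimal Python sketch. It is not Adept's implementation: the post does not describe the actual rendering or action format, so `UIElement`, `Observation`, `Action`, and `available_actions` are all hypothetical names chosen for illustration, and the rule "textboxes are typed into, everything else is clicked" is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class UIElement:
    """One interactive element in a simplified rendering of the viewport (hypothetical)."""
    element_id: int
    role: str   # e.g. "button", "textbox", "link"
    label: str


@dataclass(frozen=True)
class Observation:
    """A website-agnostic snapshot of what is currently visible (hypothetical)."""
    url: str
    elements: List[UIElement]


@dataclass(frozen=True)
class Action:
    """A single step the agent may take (hypothetical)."""
    kind: str                          # "click", "type", or "scroll"
    element_id: Optional[int] = None   # target element, if any
    text: Optional[str] = None         # text to enter for "type" actions


def available_actions(obs: Observation) -> List[Action]:
    """Enumerate the actions that the current page makes possible."""
    actions = [Action(kind="scroll")]  # assumption: scrolling is always allowed
    for el in obs.elements:
        if el.role == "textbox":
            actions.append(Action(kind="type", element_id=el.element_id))
        else:
            actions.append(Action(kind="click", element_id=el.element_id))
    return actions


obs = Observation(
    url="https://example.com/search",
    elements=[
        UIElement(1, "button", "Search"),
        UIElement(2, "textbox", "Query"),
    ],
)
for action in available_actions(obs):
    print(action.kind, action.element_id)
```

Because the action list is derived from the observation rather than hard-coded per site, the same code path applies to any page that can be rendered into this form, which is the property the paragraph above attributes to the custom rendering.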

There’s a lot of room to make it faster, both on the modeling side and on the software side – so we expect future systems will have latency that’s largely imperceptible to humans. These videos have been sped up to make them easier for you to view. An upcoming technical post will go into much more detail on all of these topics.

Here are some cool things ACT-1 can do!

ACT-1 can take a high-level user request and execute it. The user simply types a command into the text box and ACT-1 does the rest. In this example, fulfilling a single goal requires repeatedly observing the page and taking actions over a long time horizon.
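The observe-then-act cycle described here can be sketched as a simple loop. This is only an illustrative toy under stated assumptions: `run_episode`, `ToyPage`, and `scripted_policy` are hypothetical names, the "page" is a two-field form, and a hand-written rule stands in for the model, whose real interface the post does not describe.

```python
from typing import Callable, List, Tuple


def run_episode(
    goal: str,
    observe: Callable[[], str],
    policy: Callable[[str, str, List[Tuple[str, str]]], str],
    act: Callable[[str], None],
    max_steps: int = 20,
) -> List[Tuple[str, str]]:
    """Alternate observation and action until the policy reports it is done."""
    history: List[Tuple[str, str]] = []
    for _ in range(max_steps):
        obs = observe()                      # look at the current state
        action = policy(goal, obs, history)  # decide the next step
        if action == "done":
            break
        act(action)                          # change the state
        history.append((obs, action))
    return history


class ToyPage:
    """A stand-in for a web page: a form with two fields and a submit button."""

    def __init__(self) -> None:
        self.fields = {"name": "", "email": ""}
        self.submitted = False

    def observe(self) -> str:
        empty = [k for k, v in self.fields.items() if not v]
        return f"empty={','.join(empty)};submitted={self.submitted}"

    def act(self, action: str) -> None:
        verb, _, arg = action.partition(":")
        if verb == "fill":
            self.fields[arg] = "x"
        elif verb == "submit":
            self.submitted = True


def scripted_policy(goal: str, obs: str, history: List[Tuple[str, str]]) -> str:
    """A hand-written rule standing in for the model (assumption)."""
    if "submitted=True" in obs:
        return "done"
    empty = obs.split(";")[0][len("empty="):]
    if empty:
        return "fill:" + empty.split(",")[0]
    return "submit"


page = ToyPage()
steps = run_episode("fill in and submit the form", page.observe, scripted_policy, page.act)
print([action for _, action in steps])
```

The `max_steps` cap is a design choice worth keeping in any loop of this kind: a long-horizon task needs many iterations, but an agent that never reaches its goal should still stop.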

https://player.vimeo.com/video/749413832?h=15f094bbb9&title=0&byline=0&portrait=0

This can be especially powerful for manual tasks and complex tools. In this example, what might ordinarily take 10+ clicks in Salesforce can now be done with just a sentence.

https://player.vimeo.com/video/749413804?h=15f094bbb9&title=0&byline=0&portrait=0

Working in-depth in tools like spreadsheets, ACT-1 demonstrates real-world knowledge, infers what we mean from context, and can help us do things we may not even know how to do.

https://player.vimeo.com/video/749413815?h=15f094bbb9&title=0&byline=0&portrait=0

The model can also complete tasks that require composing multiple tools together; most things we do on a computer span multiple programs. In the future, we expect ACT-1 to be even more helpful by asking for clarifications about what we want.

https://player.vimeo.com/video/749413825?h=15f094bbb9&title=0&byline=0&portrait=0

The internet contains a lot of knowledge about the world! When the model doesn’t know something, it knows how to just look up the information online (seen here in voice mode).

https://player.vimeo.com/video/749413798?h=15f094bbb9&title=0&byline=0&portrait=0

ACT-1 doesn’t know how to do everything, but it’s highly coachable. With a single piece of human feedback, it can correct mistakes, becoming more useful with each interaction.

https://player.vimeo.com/video/749597375?h=15f094bbb9&title=0&byline=0&portrait=0

Looking ahead

Natural language interfaces, powered by action transformers like ACT-1, will dramatically expand what people can do in front of a computer/phone/internet-connected device. A few years from now, we believe:

  • Most interaction with computers will be done using natural language, not GUIs. We’ll tell our computer what to do, and it’ll do it. Today’s user interfaces will soon seem as archaic as landline phones do to smartphone users.
  • Beginners will become power users, no training required. Anyone who can articulate their ideas in language can implement them, regardless of expertise. Software will become even more powerful as advanced features become accessible to everyone and no longer constrained by the length of a drop-down menu.
  • Documentation, manuals, and FAQs will be for models, not for people. No longer will we need to learn the quirky language of every individual software tool in order to be effective at a task. We will never search through forums for “how to do X in Salesforce or Unity or Figma” — the model will do that work, allowing us to focus on the higher-order task at hand.
  • Breakthroughs across all fields will be accelerated with AI as our teammate. Action transformers will work with us to bring about advances in drug design, engineering, and more. Collaborating with these models will make us more efficient, energized, and creative.

While we’re excited that these systems can transform what people can do on a computer, we clearly see that they have the potential to cause harm if misused or misaligned with user preferences. Our goal is to build a company with large-scale human feedback at the center — models will be evaluated on how well they satisfy user preferences, and we will iteratively evaluate how well this is working as our product becomes more sophisticated and load-bearing. To combat misuse, we plan to use a combination of machine learning techniques and careful, staged deployment.

What we’ve shown above is only scratching the surface — we’re making great progress towards Adept being able to do arbitrary things on a computer. We have ambitious goals in both the short and long term, and we’re hiring visionary and talented people across roles to make it happen — you can apply here.
