玩最近 Facebook Research (Meta) 放出來的 LLaMA

1,493次阅读
没有评论

玩最近 Facebook Research (Meta) 放出來的 LLaMA

很多地方應該都有提到 Facebook Research (Meta) 放出來的 LLaMA 了,對應的論文是「LLaMA: Open and Efficient Foundation Language Models」這篇,但這邊論文提到的 open 並不是一般常見的 open 定義,而只是常見的行銷詞彙而已,實際上只是 free for charging with constraints。

另外要注意 LLaMA 是個 LLM 而已,跟 ChatGPT 不算是同樣性質的東西,能對比應該是 GPT-3 (或是 GPT-3.5)。

主要是 ChatGPT 多了 SLRL 的步驟,而產出來的東西更接近商業化產品要的結果。

LLaMA 的特點在於效能不錯,可以用 LLaMA-13B 打贏 GPT-3 (175B),另外這次訓練出來最大的 LLaMA-65B 則可以站上第一梯隊 (與 DeepMindChinchilla-70BGoogle ResearchPaLM-540B):

LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B.

但跟以前差異最大的是,這次 Facebook Research 決定把訓練完後的 model 檔案放出來,所以就有了後續很多的進展:

We release all our models to the research community.

首先一開始 Facebook Research 要求使用者填表單才提供下載 (2/24 的時候),但三月初的時候 GitHub 上有人直接把 BitTorrentmagnet 連結附上去,送了一個 pull request:「Save bandwidth by using a torrent to distribute more efficiently #73」,所以你就有「方法」可以取得 model 檔案,但還是可以注意一下使用限制:

To maintain integrity and prevent misuse, we are releasing our model under a noncommercial license focused on research use cases. Access to the model will be granted on a case-by-case basis to academic researchers; those affiliated with organizations in government, civil society, and academia; and industry research laboratories around the world. People interested in applying for access can find the link to the application in our research paper.

除了可以透過 BitTorrent 下載外,comment 裡面也有 IPFS 的連結可以下載。

有需要下載這包檔案的人要注意檔案很大,大約 240GB,其中 65B model 佔了 128GB 左右。

被放出來以後就開始有很多人在上面包起來用,其中目前比較完整的應該是「Running LLaMA 7B and 13B on a 64GB M2 MacBook Pro with llama.cpp」這篇提到的方法,裡面提到的 ggerganov/llama.cpp 專案同時支援了 WindowsmacOSLinux,而且可以用 CPU 跑,速度也不慢。

試著用 llama.cpp 跑,65B 的 model 在家裡桌機有 64GB RAM 的情況下是可以應硬扛的,跑降到 4bits 的 model 大約吃 41GB RAM 左右。

比較驚訝的是 efficient 這個部份,拿 CPU 跑 65B 版本是跑得動的。

隨便丟個「文字接龍」進去 LLaMA-65B 讓他接,可以看到接出來的東西還可以:

main: prompt: ‘The main difference between javascript and java are ‘
main: number of tokens in prompt = 10
[…]

The main difference between javascript and java are 1)Java is a compiled language while JavaScript
is not. So that the performance of Javascript code can be lower than Java Code, but they run on different platforms: browser vs server or local machine;
2)’javac’ to compile .java file to bytecode(for JVM), and ‘jar'(to make executable jar files for client systems).
3)Java uses more memory (RAM) that javascript. So the Javascript code is generally small than Java
codes.(A web application written in JavaScript may run faster on a PC with 64 MB of RAM compared to one running using applets.) While an advantage offered by J

如果是 LLaMA-7B 的話會快很多,但回答就有點微妙了:

main: prompt: ‘The main difference between javascript and java are ‘
main: number of tokens in prompt = 10
[…]

The main difference between javascript and java are 1) Javascript is interpreted, while Java compiles to native machine code. (Compiled languages have a performance advantage over Interpreted ones
.)
2) The language standardization process for JavaScript has not been completed at the time of this writing which leaves it open to being hacked or changed by those who care enough about your website to do so… In Java, everything is set in stone. (Or at least as much code that can be shipped on a CD-ROM disk)
Sorry if my response was not clear – but you’re right! I think that “2” above should really have said: “(

訓練所花的資源的部份,可以從論文裡面看到,如果是 2048 張 A100 的話大約要跑五個月 (照這個語氣,實際上大概不是這個數字):

Finally, we estimate that we used 2048 A100-80GB for a period of approximately 5 months to develop our models.

另外也有列出 GPU hours 可以參考:

玩最近 Facebook Research (Meta) 放出來的 LLaMA

正文完
可以使用微信扫码关注公众号(ID:xzluomor)
post-qrcode
 0
评论(没有评论)

文心AIGC

2023 年 3 月
 12345
6789101112
13141516171819
20212223242526
2728293031  
文心AIGC
文心AIGC
人工智能ChatGPT,AIGC指利用人工智能技术来生成内容,其中包括文字、语音、代码、图像、视频、机器人动作等等。被认为是继PGC、UGC之后的新型内容创作方式。AIGC作为元宇宙的新方向,近几年迭代速度呈现指数级爆发,谷歌、Meta、百度等平台型巨头持续布局
文章搜索
热门文章
潞晨尤洋:日常办公没必要上私有模型,这三类企业才需要 | MEET2026

潞晨尤洋:日常办公没必要上私有模型,这三类企业才需要 | MEET2026

潞晨尤洋:日常办公没必要上私有模型,这三类企业才需要 | MEET2026 Jay 2025-12-22 09...
“昆山杯”第二十七届清华大学创业大赛决赛举行

“昆山杯”第二十七届清华大学创业大赛决赛举行

“昆山杯”第二十七届清华大学创业大赛决赛举行 一水 2025-12-22 17:04:24 来源:量子位 本届...
MiniMax海螺视频团队首次开源:Tokenizer也具备明确的Scaling Law

MiniMax海螺视频团队首次开源:Tokenizer也具备明确的Scaling Law

MiniMax海螺视频团队首次开源:Tokenizer也具备明确的Scaling Law 一水 2025-12...
天下苦SaaS已久,企业级AI得靠「结果」说话

天下苦SaaS已久,企业级AI得靠「结果」说话

天下苦SaaS已久,企业级AI得靠「结果」说话 Jay 2025-12-22 13:46:04 来源:量子位 ...
最新评论
ufabet ufabet มีเกมให้เลือกเล่นมากมาย: เกมเดิมพันหลากหลาย ครบทุกค่ายดัง
tornado crypto mixer tornado crypto mixer Discover the power of privacy with TornadoCash! Learn how this decentralized mixer ensures your transactions remain confidential.
ดูบอลสด ดูบอลสด Very well presented. Every quote was awesome and thanks for sharing the content. Keep sharing and keep motivating others.
ดูบอลสด ดูบอลสด Pretty! This has been a really wonderful post. Many thanks for providing these details.
ดูบอลสด ดูบอลสด Pretty! This has been a really wonderful post. Many thanks for providing these details.
ดูบอลสด ดูบอลสด Hi there to all, for the reason that I am genuinely keen of reading this website’s post to be updated on a regular basis. It carries pleasant stuff.
Obrazy Sztuka Nowoczesna Obrazy Sztuka Nowoczesna Thank you for this wonderful contribution to the topic. Your ability to explain complex ideas simply is admirable.
ufabet ufabet Hi there to all, for the reason that I am genuinely keen of reading this website’s post to be updated on a regular basis. It carries pleasant stuff.
ufabet ufabet You’re so awesome! I don’t believe I have read a single thing like that before. So great to find someone with some original thoughts on this topic. Really.. thank you for starting this up. This website is something that is needed on the internet, someone with a little originality!
ufabet ufabet Very well presented. Every quote was awesome and thanks for sharing the content. Keep sharing and keep motivating others.
热评文章
摩尔线程的野心,不藏了

摩尔线程的野心,不藏了

摩尔线程的野心,不藏了 量子位的朋友们 2025-12-22 10:11:58 来源:量子位 上市后的仅15天...
摩尔线程的野心,不藏了

摩尔线程的野心,不藏了

摩尔线程的野心,不藏了 量子位的朋友们 2025-12-22 10:11:58 来源:量子位 上市后的仅15天...
AI体育教练来了!中国团队打造SportsGPT,完成从数值评估到专业指导的智能转身

AI体育教练来了!中国团队打造SportsGPT,完成从数值评估到专业指导的智能转身

AI体育教练来了!中国团队打造SportsGPT,完成从数值评估到专业指导的智能转身 量子位的朋友们 2025...
AI体育教练来了!中国团队打造SportsGPT,完成从数值评估到专业指导的智能转身

AI体育教练来了!中国团队打造SportsGPT,完成从数值评估到专业指导的智能转身

AI体育教练来了!中国团队打造SportsGPT,完成从数值评估到专业指导的智能转身 量子位的朋友们 2025...
真正面向大模型的AI Infra,必须同时懂模型、系统、产业|商汤大装置宣善明@MEET2026

真正面向大模型的AI Infra,必须同时懂模型、系统、产业|商汤大装置宣善明@MEET2026

真正面向大模型的AI Infra,必须同时懂模型、系统、产业|商汤大装置宣善明@MEET2026 量子位的朋友...