很多地方應該都有提到 Facebook Research (Meta) 放出來的 LLaMA 了,對應的論文是「LLaMA: Open and Efficient Foundation Language Models」這篇,但這邊論文提到的 open 並不是一般常見的 open 定義,而只是常見的行銷詞彙而已,實際上只是 free for charging with constraints。
另外要注意 LLaMA 是個 LLM 而已,跟 ChatGPT 不算是同樣性質的東西,能對比應該是 GPT-3 (或是 GPT-3.5)。
主要是 ChatGPT 多了 SL 與 RL 的步驟,而產出來的東西更接近商業化產品要的結果。
LLaMA 的特點在於效能不錯,可以用 LLaMA-13B 打贏 GPT-3 (175B),另外這次訓練出來最大的 LLaMA-65B 則可以站上第一梯隊 (與 DeepMind 的 Chinchilla-70B 與 Google Research 的 PaLM-540B):
LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B.
但跟以前差異最大的是,這次 Facebook Research 決定把訓練完後的 model 檔案放出來,所以就有了後續很多的進展:
We release all our models to the research community.
首先一開始 Facebook Research 要求使用者填表單才提供下載 (2/24 的時候),但三月初的時候 GitHub 上有人直接把 BitTorrent 的 magnet 連結附上去,送了一個 pull request:「Save bandwidth by using a torrent to distribute more efficiently #73」,所以你就有「方法」可以取得 model 檔案,但還是可以注意一下使用限制:
To maintain integrity and prevent misuse, we are releasing our model under a noncommercial license focused on research use cases. Access to the model will be granted on a case-by-case basis to academic researchers; those affiliated with organizations in government, civil society, and academia; and industry research laboratories around the world. People interested in applying for access can find the link to the application in our research paper.
除了可以透過 BitTorrent 下載外,comment 裡面也有 IPFS 的連結可以下載。
有需要下載這包檔案的人要注意檔案很大,大約 240GB,其中 65B model 佔了 128GB 左右。
被放出來以後就開始有很多人在上面包起來用,其中目前比較完整的應該是「Running LLaMA 7B and 13B on a 64GB M2 MacBook Pro with llama.cpp」這篇提到的方法,裡面提到的 ggerganov/llama.cpp 專案同時支援了 Windows、macOS 與 Linux,而且可以用 CPU 跑,速度也不慢。
試著用 llama.cpp 跑,65B 的 model 在家裡桌機有 64GB RAM 的情況下是可以應硬扛的,跑降到 4bits 的 model 大約吃 41GB RAM 左右。
比較驚訝的是 efficient 這個部份,拿 CPU 跑 65B 版本是跑得動的。
隨便丟個「文字接龍」進去 LLaMA-65B 讓他接,可以看到接出來的東西還可以:
main: prompt: ‘The main difference between javascript and java are ‘
main: number of tokens in prompt = 10
[…]
The main difference between javascript and java are 1)Java is a compiled language while JavaScript
is not. So that the performance of Javascript code can be lower than Java Code, but they run on different platforms: browser vs server or local machine;
2)’javac’ to compile .java file to bytecode(for JVM), and ‘jar'(to make executable jar files for client systems).
3)Java uses more memory (RAM) that javascript. So the Javascript code is generally small than Java
codes.(A web application written in JavaScript may run faster on a PC with 64 MB of RAM compared to one running using applets.) While an advantage offered by J
如果是 LLaMA-7B 的話會快很多,但回答就有點微妙了:
main: prompt: ‘The main difference between javascript and java are ‘
main: number of tokens in prompt = 10
[…]
The main difference between javascript and java are 1) Javascript is interpreted, while Java compiles to native machine code. (Compiled languages have a performance advantage over Interpreted ones
.)
2) The language standardization process for JavaScript has not been completed at the time of this writing which leaves it open to being hacked or changed by those who care enough about your website to do so… In Java, everything is set in stone. (Or at least as much code that can be shipped on a CD-ROM disk)
Sorry if my response was not clear – but you’re right! I think that “2” above should really have said: “(
訓練所花的資源的部份,可以從論文裡面看到,如果是 2048 張 A100 的話大約要跑五個月 (照這個語氣,實際上大概不是這個數字):
Finally, we estimate that we used 2048 A100-80GB for a period of approximately 5 months to develop our models.
另外也有列出 GPU hours 可以參考: