University of Cambridge | LongForm: Optimizing Instruction Tuning for Long Text Generation with Corpus Extraction

LongForm: Optimizing Instruction Tuning for Long Text Generation with Corpus Extraction

Abdullatif Köksal, Timo Schick, Anna Korhonen, Hinrich Schütze

Abdullatif Köksal is a student at Cambridge; Anna Korhonen is a professor of natural language processing at the University of Cambridge.

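Instruction tuning enables language models to generalize more effectively and to follow user intent more closely. Obtaining instruction data, however, can be costly and difficult. Prior work has relied on expensive human annotation, crowd-sourced datasets with alignment issues, or noisy examples generated by LLMs. We introduce the LongForm dataset, which is created by pairing English corpus examples with augmented instructions. We select a diverse set of human-written documents from existing corpora such as C4 and Wikipedia, and use LLMs to generate an instruction for each document. This approach yields a cheaper and cleaner instruction-tuning dataset, and one that is well suited to long text generation.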
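The corpus-extraction idea can be illustrated with a minimal Python sketch: each human-written document becomes the target output, and an LLM is asked to write the instruction that the document would answer. Everything named here is an assumption for illustration only: `PROMPT_TEMPLATE`, `InstructionExample`, `build_examples`, and `toy_generate` are hypothetical, the prompt wording is not the authors' prompt, and `toy_generate` is a stand-in for a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical prompt template; the wording is an assumption, not the authors' prompt.
PROMPT_TEMPLATE = (
    "Document:\n{document}\n\n"
    "Write the instruction that this document would be a good answer to.\n"
    "Instruction:"
)


@dataclass
class InstructionExample:
    instruction: str  # generated by the LLM
    output: str       # the original human-written document, kept unchanged


def build_examples(
    documents: Iterable[str],
    generate: Callable[[str], str],
    min_chars: int = 1,
) -> List[InstructionExample]:
    """Pair each corpus document with an LLM-generated instruction."""
    examples = []
    for doc in documents:
        text = doc.strip()
        if len(text) < min_chars:
            continue  # skip empty or too-short documents
        instruction = generate(PROMPT_TEMPLATE.format(document=text)).strip()
        if instruction:
            examples.append(InstructionExample(instruction, text))
    return examples


def toy_generate(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption): echoes the document's first words."""
    document = prompt.split("Document:\n", 1)[1].split("\n\nWrite the instruction", 1)[0]
    first_words = " ".join(document.split()[:5])
    return f"Write a text that begins with: {first_words}"


if __name__ == "__main__":
    docs = ["The Eiffel Tower is a wrought-iron lattice tower in Paris.", "   "]
    for example in build_examples(docs, toy_generate):
        print(example.instruction)
        print(example.output)
```

In practice `toy_generate` would be replaced by a call to an actual LLM. The point of the sketch is only that the output side of each pair is untouched human-written text, while the model-generated part is limited to the instruction.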

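We fine-tune T5, OPT, and LLaMA models on our dataset and show that even smaller LongForm models generalize well for text generation. Our models outperform language models 10 times larger on a variety of tasks, such as story/recipe generation and long-form question answering. In addition, LongForm models outperform prior instruction-tuned models such as FLAN-T5 and Alpaca by a large margin. Finally, our models can effectively follow and answer multilingual instructions; we demonstrate this for news generation.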

Paper: https://arxiv.org/abs/2304.08460

Data and models: https://github.com/akoksal/LongForm
