LG – Machine Learning CV – Computer Vision CL – Computation and Language



1. [LG] Randomness in ML Defenses Helps Persistent Attackers and Hinders Evaluators
2. [CL] Language Is Not All You Need: Aligning Perception with Language Models
3. [CV] The Role of Pre-training Data in Transfer Learning
4. [LG] The Dormant Neuron Phenomenon in Deep Reinforcement Learning
5. [LG] Permutation Equivariant Neural Functionals
[LG] Internet Explorer: Targeted Representation Learning on the Open Web
[CL] SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks

Summary: randomness in ML defenses helps persistent attackers and hinders evaluators; a multimodal large language model that aligns perception with language models; the role of pre-training data in transfer learning; the dormant neuron phenomenon in deep reinforcement learning; permutation equivariant neural functionals; targeted representation learning on the open web; a generative pre-trained language model built on spiking neural networks.

1. [LG] Randomness in ML Defenses Helps Persistent Attackers and Hinders Evaluators

K Lucas, M Jagielski, F Tramèr, L Bauer, N Carlini
[CMU & Google Research & ETH Zurich]


Key points:

  1. Randomness in an ML defense increases its vulnerability to repeated, persistent attackers;
  2. Randomness is usually unnecessary for an ML defense's robustness;
  3. A deterministic version of a randomized defense does not reduce its robustness;
  4. A new evaluation framework, Subspace Gridsweep, is proposed to better evaluate the robustness of deterministic defenses.

One-sentence summary:
Randomness in ML defenses increases vulnerability to persistent attackers and complicates robustness evaluation; it should be treated with caution in defense design and deployment.

It is becoming increasingly imperative to design robust ML defenses. However, recent work has found that many defenses that initially resist state-of-the-art attacks can be broken by an adaptive adversary. In this work we take steps to simplify the design of defenses and argue that white-box defenses should eschew randomness when possible. We begin by illustrating a new issue with the deployment of randomized defenses that reduces their security compared to their deterministic counterparts. We then provide evidence that making defenses deterministic simplifies robustness evaluation, without reducing the effectiveness of a truly robust defense. Finally, we introduce a new defense evaluation framework that leverages a defense’s deterministic nature to better evaluate its adversarial robustness.
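The deployment issue can be illustrated with a toy model (a sketch for intuition only, not the paper's Subspace Gridsweep framework): if a randomized defense blocks a given adversarial input only with some probability per query, a persistent attacker who resubmits the same input drives their success probability toward 1.

```python
import random

def randomized_defense(x, p_detect=0.7, rng=random):
    # Toy defense: flags the (known-adversarial) input only with
    # probability p_detect on each independent query.
    return rng.random() < p_detect  # True = attack blocked

def persistent_attack(x, tries, rng=random):
    # A persistent attacker resubmits the same input until one
    # query slips past the randomized check.
    return any(not randomized_defense(x, rng=rng) for _ in range(tries))

rng = random.Random(0)
# A one-shot attacker succeeds ~30% of the time; with 20 retries the
# failure probability is 0.7**20 ≈ 8e-4, so success is near-certain.
successes = sum(persistent_attack("adv", tries=20, rng=rng) for _ in range(1000))
print(successes / 1000)  # close to 1.0
```

The deterministic counterpart (always block or never block) has no such gap between one-shot and persistent attackers, which is one reason determinism also simplifies evaluation.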

https://arxiv.org/abs/2302.13464

2. [CL] Language Is Not All You Need: Aligning Perception with Language Models

S Huang, L Dong, W Wang, Y Hao, S Singhal…
[Microsoft]


Key points:

  1. Kosmos-1 is a multimodal large language model that can perceive general modalities, follow instructions, and learn in context;
  2. Kosmos-1 achieves impressive performance on language and multimodal tasks without fine-tuning, including image recognition with text instructions, visual question answering, and multimodal dialogue;
  3. MLLMs can benefit from cross-modal transfer, carrying knowledge from language to multimodality and vice versa;
  4. Kosmos-1 can be scaled up in model size and combined with speech capability, serving as a unified interface for multimodal learning.

One-sentence summary:
Kosmos-1, a multimodal large language model, perceives general modalities, learns in context, and follows instructions, achieving impressive performance on language and multimodal tasks without fine-tuning; this suggests that aligning language with perception is a key step toward artificial general intelligence.

A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence. In this work, we introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot). Specifically, we train Kosmos-1 from scratch on web-scale multimodal corpora, including arbitrarily interleaved text and images, image-caption pairs, and text data. We evaluate various settings, including zero-shot, few-shot, and multimodal chain-of-thought prompting, on a wide range of tasks without any gradient updates or finetuning. Experimental results show that Kosmos-1 achieves impressive performance on (i) language understanding, generation, and even OCR-free NLP (directly fed with document images), (ii) perception-language tasks, including multimodal dialogue, image captioning, visual question answering, and (iii) vision tasks, such as image recognition with descriptions (specifying classification via text instructions). We also show that MLLMs can benefit from cross-modal transfer, i.e., transfer knowledge from language to multimodal, and from multimodal to language. In addition, we introduce a dataset of Raven IQ test, which diagnoses the nonverbal reasoning capability of MLLMs.

https://arxiv.org/abs/2302.14045

3. [CV] The Role of Pre-training Data in Transfer Learning

R Entezari, M Wortsman, O Saukh, M Shariatnia, H Sedghi, L Schmidt
[University of Washington & TU Graz & Tehran University of Medical Sciences & Google Research]

Key points:

  1. The choice of pre-training data distribution is critical in the few-shot regime, but its role diminishes as more data becomes available for fine-tuning;
  2. Using 2000x more pre-training data from LAION can match the performance of supervised ImageNet pre-training;
  3. Image-image contrastive pre-training leads to better downstream accuracy than language-image contrastive pre-training;
  4. Different pre-training decisions can lead to similar accuracy in the many-shot regime, but in the settings considered they still outperform training from scratch.

One-sentence summary:
The choice of pre-training data is critical for few-shot transfer learning but matters less as more fine-tuning data becomes available; using more pre-training data can close the performance gap between pre-training methods.

The transfer learning paradigm of model pre-training and subsequent fine-tuning produces high-accuracy models. While most studies recommend scaling the pre-training size to benefit most from transfer learning, a question remains: what data and method should be used for pre-training? We investigate the impact of pre-training data distribution on the few-shot and full fine-tuning performance using 3 pre-training methods (supervised, contrastive language-image and image-image), 7 pre-training datasets, and 9 downstream datasets. Through extensive controlled experiments, we find that the choice of the pre-training data source is essential for the few-shot transfer, but its role decreases as more data is made available for fine-tuning. Additionally, we explore the role of data curation and examine the trade-offs between label noise and the size of the pre-training dataset. We find that using 2000X more pre-training data from LAION can match the performance of supervised ImageNet pre-training. Furthermore, we investigate the effect of pre-training methods, comparing language-image contrastive vs. image-image contrastive, and find that the latter leads to better downstream accuracy.

https://arxiv.org/abs/2302.13602

4. [LG] The Dormant Neuron Phenomenon in Deep Reinforcement Learning

G Sokar, R Agarwal, P S Castro, U Evci
[Eindhoven University of Technology & Google Research]


Key points:

  1. Identifies the dormant neuron phenomenon in deep reinforcement learning, where an agent's network accumulates inactive neurons that reduce its expressivity;
  2. Demonstrates the presence of this phenomenon across a variety of algorithms and environments and highlights its effect on learning;
  3. Proposes a simple and effective method, ReDo, that recycles dormant neurons throughout training, reducing their number and improving performance;
  4. ReDo can serve as an important component for scaling RL networks in a sample-efficient way; how to initialize and optimize the recycled capacity for better results remains open for further study.

One-sentence summary:
Deep RL networks suffer from a dormant neuron phenomenon that reduces their expressivity; ReDo, a simple method that recycles dormant neurons, maintains network utilization and improves performance.

In this work we identify the dormant neuron phenomenon in deep reinforcement learning, where an agent’s network suffers from an increasing number of inactive neurons, thereby affecting network expressivity. We demonstrate the presence of this phenomenon across a variety of algorithms and environments, and highlight its effect on learning. To address this issue, we propose a simple and effective method (ReDo) that Recycles Dormant neurons throughout training. Our experiments demonstrate that ReDo maintains the expressive power of networks by reducing the number of dormant neurons and results in improved performance.
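A minimal sketch of the recycling idea (function names and the activation-score threshold are illustrative, not the paper's exact procedure): score each hidden neuron by its mean activation normalized by the layer average, call it dormant below a threshold, then reinitialize its incoming weights and zero its outgoing weights so the reset does not perturb the layer's current output.

```python
import numpy as np

def recycle_dormant(W_in, b, W_out, activations, tau=0.025, rng=None):
    """Recycle dormant neurons of one hidden layer (illustrative sketch).

    activations: (batch, n_hidden) post-ReLU activations of the layer.
    A neuron is dormant when its mean |activation|, normalized by the
    layer-wide average, falls at or below tau.
    """
    rng = rng or np.random.default_rng(0)
    score = np.abs(activations).mean(axis=0)
    norm_score = score / (score.mean() + 1e-8)
    dormant = norm_score <= tau

    W_in, b, W_out = W_in.copy(), b.copy(), W_out.copy()
    n_in = W_in.shape[0]
    # Fresh incoming weights and zero bias restart the neuron's learning;
    # zeroed outgoing weights leave the rest of the network untouched.
    W_in[:, dormant] = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, dormant.sum()))
    b[dormant] = 0.0
    W_out[dormant, :] = 0.0
    return W_in, b, W_out, dormant

# Example: neuron 1 never fires, so it is flagged and recycled.
acts = np.array([[1.0, 0.0, 0.5],
                 [0.8, 0.0, 0.7]])
W_in, b, W_out = np.ones((4, 3)), np.zeros(3), np.ones((3, 2))
W_in, b, W_out, dormant = recycle_dormant(W_in, b, W_out, acts)
print(dormant)  # [False  True False]
```

In the paper this check is applied periodically during training, so recycled neurons regain the chance to contribute to the network's expressivity.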

https://arxiv.org/abs/2302.12902

5. [LG] Permutation Equivariant Neural Functionals

A Zhou, K Yang, K Burns, Y Jiang, S Sokota, J. Z Kolter, C Finn
[Stanford University & CMU]


Key points:

  1. Proposes a framework for designing neural networks that process weight-space objects, called neural functional networks (NFNs);
  2. Focuses on the permutation symmetries that arise in weight space from the structure of neural networks;
  3. Introduces two kinds of equivariant NF-Layers as NFN building blocks, differing in their underlying symmetry assumptions and parameter efficiency;
  4. Experiments show that permutation equivariant neural functionals outperform prior methods and effectively solve weight-space tasks.

One-sentence summary:
Proposes a framework for designing neural networks that process weight-space objects, centered on permutation symmetry, and shows experimentally that it is effective on a variety of weight-space tasks.

This work studies the design of neural networks that can process the weights or gradients of other neural networks, which we refer to as neural functional networks (NFNs). Despite a wide range of potential applications, including learned optimization, processing implicit neural representations, network editing, and policy evaluation, there are few unifying principles for designing effective architectures that process the weights of other networks. We approach the design of neural functionals through the lens of symmetry, in particular by focusing on the permutation symmetries that arise in the weights of deep feedforward networks because hidden layer neurons have no inherent order. We introduce a framework for building permutation equivariant neural functionals, whose architectures encode these symmetries as an inductive bias. The key building blocks of this framework are NF-Layers (neural functional layers) that we constrain to be permutation equivariant through an appropriate parameter sharing scheme. In our experiments, we find that permutation equivariant neural functionals are effective on a diverse set of tasks that require processing the weights of MLPs and CNNs, such as predicting classifier generalization, producing “winning ticket” sparsity masks for initializations, and editing the weights of implicit neural representations (INRs). In addition, we provide code for our models and experiments at this https URL.
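The underlying symmetry is easy to verify numerically (this sketches the symmetry NFNs encode, not the NF-Layers themselves): permuting the hidden neurons of an MLP, i.e., the columns of the first weight matrix together with the rows of the second, yields a different point in weight space that computes exactly the same function.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 8)), rng.normal(size=8)   # input -> hidden
W2, b2 = rng.normal(size=(8, 3)), rng.normal(size=3)   # hidden -> output

def mlp(x, W1, b1, W2, b2):
    # One-hidden-layer ReLU MLP.
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

x = rng.normal(size=(5, 4))
perm = rng.permutation(8)  # relabel the 8 hidden neurons

y = mlp(x, W1, b1, W2, b2)
y_perm = mlp(x, W1[:, perm], b1[perm], W2[perm, :], b2)
print(np.allclose(y, y_perm))  # True: both weight settings define the same function
```

An NFN that predicts properties of such networks should therefore give the same answer for both weight settings, which is exactly the inductive bias the equivariant NF-Layers bake in via parameter sharing.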

https://arxiv.org/abs/2302.14040


A few more papers worth noting:

[LG] Internet Explorer: Targeted Representation Learning on the Open Web

A C. Li, E Brown, A A. Efros, D Pathak
[CMU & UC Berkeley]


Key points:

  1. Current vision models acquire knowledge only from their pre-training datasets, limiting how well they can perform on specific tasks;
  2. Internet Explorer offers an alternative: actively search and exploit the open web to train a small-scale model that excels at the task at hand;
  3. Internet Explorer reaches state-of-the-art results in very little time, surpassing a compute-heavy oracle model and strong baselines that search the web in an undirected way;
  4. Its success highlights the potential of interactive web exploration as an effective source of highly relevant training data.

One-sentence summary:
Internet Explorer uses self-supervised learning to train models on relevant examples discovered by interactively exploring the open web, reaching state-of-the-art results in just 30-40 hours on a single GPU.

Modern vision models typically rely on fine-tuning general-purpose models pre-trained on large, static datasets. These general-purpose models only capture the knowledge within their pre-training datasets, which are tiny, out-of-date snapshots of the Internet — where billions of images are uploaded each day. We suggest an alternate approach: rather than hoping our static datasets transfer to our desired tasks after large-scale pre-training, we propose dynamically utilizing the Internet to quickly train a small-scale model that does extremely well on the task at hand. Our approach, called Internet Explorer, explores the web in a self-supervised manner to progressively find relevant examples that improve performance on a desired target dataset. It cycles between searching for images on the Internet with text queries, self-supervised training on downloaded images, determining which images were useful, and prioritizing what to search for next. We evaluate Internet Explorer across several datasets and show that it outperforms or matches CLIP oracle performance by using just a single GPU desktop to actively query the Internet for 30–40 hours. Results, visualizations, and videos at this https URL
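The cycle the abstract describes — search, train, score usefulness, reprioritize — can be sketched as a toy bandit-style loop. Everything here is stubbed and hypothetical (the query list, reward values, and update rule are invented for illustration; the real system issues web image searches and trains a self-supervised model):

```python
import math
import random

# Hypothetical queries with hidden task relevance (stand-in for the web).
QUERIES = {"golden retriever": 0.9, "dog park": 0.6, "toaster": 0.1}

def search_and_score(query, rng):
    # Stub for: download images for `query`, self-train on them, and
    # measure how much they helped on the target dataset (noisy here).
    return QUERIES[query] + rng.gauss(0.0, 0.05)

def explore(steps=200, temp=0.1, rng=None):
    rng = rng or random.Random(0)
    value = {q: 0.0 for q in QUERIES}  # running usefulness estimates
    for _ in range(steps):
        # Prioritize what to search next: softmax over current estimates.
        qs = list(QUERIES)
        weights = [math.exp(value[q] / temp) for q in qs]
        q = rng.choices(qs, weights=weights)[0]
        reward = search_and_score(q, rng)
        value[q] += 0.2 * (reward - value[q])  # exponential moving average
    return value

value = explore()
best = max(value, key=value.get)
print(best)  # exploration concentrates on the most task-relevant query
```

The point of the sketch is the feedback loop: queries whose downloads prove useful get searched more, so the training stream drifts toward task-relevant data.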

https://arxiv.org/abs/2302.14051

[CL] SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks

R Zhu, Q Zhao, J K. Eshraghian
[University of California, Santa Cruz & Kuaishou Technology Co. Ltd]


Key points:

  1. SpikeGPT is the first language model trained directly as an SNN for language generation;
  2. SpikeGPT achieves performance comparable to ANNs while retaining the energy efficiency of spike-based computation;
  3. It combines the powerful Transformer architecture with SNNs, using linearized and recurrent Transformer blocks to avoid extra simulation time steps.

One-sentence summary:
SpikeGPT is a generative language model that uses spiking neural networks to reduce computational overhead and energy consumption.

As the size of large language models continues to scale, so do the computational resources required to run them. Spiking neural networks (SNNs) have emerged as an energy-efficient approach to deep learning that leverage sparse and event-driven activations to reduce the computational overhead associated with model inference. While they have become competitive with non-spiking models on many computer vision tasks, SNNs have also proven to be more challenging to train. As a result, their performance lags behind modern deep learning, and we are yet to see the effectiveness of SNNs in language generation. In this paper, we successfully implement “SpikeGPT”, a generative language model with pure binary, event-driven spiking activation units. We train the proposed model on three model variants: 45M, 125M and 260M parameters. To the best of our knowledge, this is 4x larger than any functional backprop-trained SNN to date. We achieve this by modifying the transformer block to replace multi-head self attention to reduce quadratic computational complexity to linear with increasing sequence length. Input tokens are instead streamed in sequentially to our attention mechanism (as with typical SNNs). Our preliminary experiments show that SpikeGPT remains competitive with non-spiking models on tested benchmarks, while maintaining 5x less energy consumption when processed on neuromorphic hardware that can leverage sparse, event-driven activations. Our code implementation is available at this https URL.
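The "pure binary, event-driven spiking activation units" can be sketched with a textbook leaky integrate-and-fire neuron (a generic SNN building block, with illustrative constants, not SpikeGPT's exact unit): the membrane potential leaks each step, integrates the input current, and emits a binary spike and resets when it crosses a threshold.

```python
def lif(currents, beta=0.9, threshold=1.0):
    """Leaky integrate-and-fire neuron: returns a binary spike train.

    v[t] = beta * v[t-1] + I[t]; a spike fires when v >= threshold,
    after which the potential is reset to zero.
    """
    v, spikes = 0.0, []
    for i in currents:
        v = beta * v + i          # leak, then integrate input current
        fired = v >= threshold
        spikes.append(int(fired))
        if fired:
            v = 0.0               # hard reset after a spike
    return spikes

# Constant sub-threshold input accumulates until the neuron fires,
# producing the sparse, purely binary activations SNNs exploit.
print(lif([0.4] * 10))  # [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
```

Because downstream computation only happens on the sparse 1s, neuromorphic hardware can skip most multiply-accumulates, which is the source of the energy savings the abstract reports.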

https://arxiv.org/abs/2302.13939
