LG – Machine Learning  CV – Computer Vision  CL – Computation and Language


1、[CV] Neural Groundplans: Persistent Neural Scene Representations from a Single Image
2、[LG] Kernel Regression with Infinite-Width Neural Networks on Millions of Examples
3、[LG] Non-parametric Outlier Synthesis
4、[LG] Few-Shot Incremental Learning Using HyperTransformers
5、[LG] LEVER: Learning to Verify Language-to-Code Generation with Execution
[LG] The Expressive Power of Tuning Only the Norm Layers
[CL] Auditing large language models: a three-layered approach
[CL] Compositional Exemplars for In-context Learning
[CL] On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark)

Summary: persistent neural scene representations from a single image; kernel regression with infinite-width neural networks on millions of examples; non-parametric outlier synthesis; few-shot incremental learning using HyperTransformers; learning to verify language-to-code generation with execution; the expressive power of tuning only the norm layers; a three-layered approach to auditing large language models; compositional exemplars for in-context learning; the planning abilities of large language models

1、[CV] Neural Groundplans: Persistent Neural Scene Representations from a Single Image

P Sharma, A Tewari, Y Du, S Zakharov, R A Ambrus, A Gaidon…
[MIT & Toyota Research Institute]

Key points:

  1. Proposes a method that maps 2D image observations of a scene to a persistent 3D scene representation, disentangling the movable and immovable parts of the scene;
  2. Introduces conditional neural groundplans, i.e., ground-aligned 2D feature grids, as persistent and memory-efficient scene representations; they are trained self-supervised from unlabeled multi-view observations via differentiable rendering and learn to complete the geometry and appearance of occluded regions;
  3. Trains on multi-view videos and, at test time, learns to separately reconstruct the static and movable parts of a scene from a single image; simple heuristics then enable a variety of downstream tasks, such as extraction of object-centric 3D representations, novel view synthesis, instance-level segmentation, 3D bounding box prediction, and scene editing;
  4. Demonstrates self-supervised learning of 3D scene representations that are decomposed into movable and immovable scene elements, and shows the potential of neural groundplans as a representation offering data-efficient solutions for downstream 3D scene processing.

One-sentence summary:
A self-supervised method that maps 2D image observations of a scene onto a 3D scene representation, disentangling movable and immovable components and enabling efficient reasoning about scene appearance and geometry directly in 3D.

We present a method to map 2D image observations of a scene to a persistent 3D scene representation, enabling novel view synthesis and disentangled representation of the movable and immovable components of the scene. Motivated by the bird’s-eye-view (BEV) representation commonly used in vision and robotics, we propose conditional neural groundplans, ground-aligned 2D feature grids, as persistent and memory-efficient scene representations. Our method is trained self-supervised from unlabeled multi-view observations using differentiable rendering, and learns to complete geometry and appearance of occluded regions. In addition, we show that we can leverage multi-view videos at training time to learn to separately reconstruct static and movable components of the scene from a single image at test time. The ability to separately reconstruct movable objects enables a variety of downstream tasks using simple heuristics, such as extraction of object-centric 3D representations, novel view synthesis, instance-level segmentation, 3D bounding box prediction, and scene editing. This highlights the value of neural groundplans as a backbone for efficient 3D scene understanding models.
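The central data structure here is the ground-aligned 2D feature grid, queried at continuous ground-plane locations. The following is a minimal sketch of such a query via bilinear interpolation, not the paper's implementation; the grid layout, coordinate convention, and feature size are illustrative assumptions.

```python
def query_groundplan(grid, x, z):
    """Bilinearly interpolate a feature vector from `grid` at (x, z).

    grid: nested list of shape [H][W][C], indexed as grid[z][x].
    x, z: continuous coordinates in grid units, 0 <= x <= W-1, 0 <= z <= H-1.
    """
    h, w = len(grid), len(grid[0])
    x0, z0 = int(x), int(z)
    x1, z1 = min(x0 + 1, w - 1), min(z0 + 1, h - 1)
    fx, fz = x - x0, z - z0
    c = len(grid[0][0])
    out = []
    for k in range(c):
        # Interpolate along x on the two bracketing grid rows, then along z
        top = grid[z0][x0][k] * (1 - fx) + grid[z0][x1][k] * fx
        bot = grid[z1][x0][k] * (1 - fx) + grid[z1][x1][k] * fx
        out.append(top * (1 - fz) + bot * fz)
    return out
```

A renderer can call such a query for every ray-ground intersection, which is what makes the representation memory-efficient: storage grows with the ground area, not the scene volume.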

https://openreview.net/forum?id=Pza24zf9FpS

2、[LG] Kernel Regression with Infinite-Width Neural Networks on Millions of Examples

B Adlam, J Lee, S Padhy, Z Nado, J Snoek
[Google]

Key points:

  1. Infinite-width neural networks yield stable solutions with relatively little optimization and hyperparameter tuning, and may be well suited to structured design problems;
  2. Massive parallelization and distributed algorithms enable kernel regression on much larger datasets, achieving state-of-the-art performance on CIFAR-10 and other prediction tasks;
  3. Studies scaling laws of several neural kernels on the CIFAR-5m dataset, showing strong predictive performance and suggesting that further gains are possible with more data;
  4. Approximating the pairwise kernel computation is a prerequisite for scaling further.

One-sentence summary:
Explores kernel regression with infinite-width neural networks, using massive parallelization and a distributed conjugate gradients algorithm to achieve high performance on diverse nonstandard data modalities.

While kernel regression remains an important practical method, its connection to neural networks as their width becomes large has initiated fresh research. These neural kernels have drastically increased performance on diverse and nonstandard data modalities but require significantly more compute, which previously limited their application to smaller datasets. We address this by massively parallelizing their computation across many GPUs. We combine this with a distributed, preconditioned conjugate gradients algorithm to enable kernel regression at a large scale (i.e. up to 5 million examples). Using this approach, we study scaling laws of several neural kernels across many orders of magnitude for the CIFAR-5m dataset. Using data augmentation to expand the original CIFAR-10 training dataset by a factor of 20, we obtain a test accuracy of 91.2% (SotA for a pure kernel method). Finally, we explore other data modalities, obtaining results on protein and small molecule prediction tasks that are competitive with SotA methods.
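The core computation is solving kernel ridge regression, (K + λI)α = y, with conjugate gradients, so the kernel matrix is only ever touched through matrix-vector products; that property is what makes the distributed, preconditioned variant in the paper feasible. Below is a minimal pure-Python sketch on small lists, not the paper's distributed solver; variable names are illustrative.

```python
def matvec(a_mat, v):
    """Dense matrix-vector product on nested lists."""
    return [sum(r[j] * v[j] for j in range(len(v))) for r in a_mat]

def cg_solve(k_mat, y, lam=1e-3, iters=50, tol=1e-10):
    """Solve (K + lam*I) alpha = y by conjugate gradients."""
    n = len(y)

    def apply_a(v):
        # A = K + lam*I, applied implicitly (never materialized)
        kv = matvec(k_mat, v)
        return [kv[i] + lam * v[i] for i in range(n)]

    x = [0.0] * n
    r = y[:]                       # residual b - A x, with x = 0
    p = r[:]
    rs = sum(ri * ri for ri in r)
    for _ in range(iters):
        ap = apply_a(p)
        alpha = rs / sum(p[i] * ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:
            break
        p = [r[i] + (rs_new / rs) * p[i] for i in range(n)]
        rs = rs_new
    return x
```

At the paper's scale the same iteration runs with the kernel sharded across many GPUs and a preconditioner to cut the iteration count; the algorithmic skeleton is unchanged.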

https://openreview.net/forum?id=ED3WvUgu09

3、[LG] Non-parametric Outlier Synthesis

L Tao, X Du, J Zhu, Y Li
[Wuhan University & University of Wisconsin]

Key points:

  1. Non-parametric outlier synthesis (NPOS) generates artificial OOD data without imposing any distributional assumption;
  2. Compared with a recent method that relies on a parametric distributional assumption, NPOS offers stronger performance and generality;
  3. NPOS mathematically formulates outlier synthesis as a rejection sampling process that approximates the level set separating ID and OOD data;
  4. Extensive experiments demonstrate the effectiveness and scalability of NPOS for OOD detection, including on large datasets such as ImageNet.

One-sentence summary:
A non-parametric outlier synthesis framework for improving out-of-distribution detection in machine learning models.

Out-of-distribution (OOD) detection is indispensable for safely deploying machine learning models in the wild. One of the key challenges is that models lack supervision signals from unknown data, and as a result, can produce overconfident predictions on OOD data. Recent work on outlier synthesis modeled the feature space as parametric Gaussian distribution, a strong and restrictive assumption that might not hold in reality. In this paper, we propose a novel framework, non-parametric outlier synthesis (NPOS), which generates artificial OOD training data and facilitates learning a reliable decision boundary between ID and OOD data. Importantly, our proposed synthesis approach does not make any distributional assumption on the ID embeddings, thereby offering strong flexibility and generality. We show that our synthesis approach can be mathematically interpreted as a rejection sampling framework. Extensive experiments show that NPOS can achieve superior OOD detection performance, outperforming the competitive rivals by a significant margin.
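The rejection-sampling idea can be sketched as follows; this is an illustrative toy re-implementation on 2D embeddings, not the authors' code, and the k-NN parameters and thresholds are assumptions. k-NN distance serves as a non-parametric density proxy: candidates are sampled around boundary ID points, and any candidate that still falls in a high-density (low k-NN distance) region is rejected.

```python
import random

def knn_dist(point, data, k=2):
    """Euclidean distance from `point` to its k-th nearest neighbor in `data`."""
    dists = sorted(sum((a - b) ** 2 for a, b in zip(point, q)) ** 0.5
                   for q in data)
    return dists[k - 1]

def synthesize_outliers(id_data, n_out, k=2, sigma=0.5, seed=0):
    rng = random.Random(seed)
    # Boundary ID points: largest k-NN distance within the ID set
    boundary = sorted(id_data, key=lambda p: knn_dist(p, id_data, k))[-3:]
    thresh = max(knn_dist(p, id_data, k) for p in id_data)
    outliers = []
    while len(outliers) < n_out:
        base = rng.choice(boundary)
        cand = [c + rng.gauss(0.0, sigma) for c in base]
        # Rejection step: keep only low-density (far-from-ID) candidates
        if knn_dist(cand, id_data, k) > thresh:
            outliers.append(cand)
    return outliers
```

Synthesized outliers can then be fed to the OOD-detection objective as negative training signal alongside the ID data.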

https://openreview.net/forum?id=JHklpEZqduQ

4、[LG] Few-Shot Incremental Learning Using HyperTransformers

M Vladymyrov, A Zhmoginov, M Sandler
[Google]

Key points:

  1. A new approach to incremental few-shot learning based on the recently published HyperTransformer (HT), which generates task-specific CNN weights directly from the support set;
  2. Re-uses the generated weights as input for the next task, allowing the HT to use the weights themselves as a representation of previously learned tasks, akin to a recurrent model;
  3. The incremental HyperTransformer is both an efficient few-shot learner, generating CNN weights from a small set of labeled examples without training at test time, and a continual learner, recursively updating those weights with information from each new task via a fresh HT iteration;
  4. Shows promising results in two continual-learning scenarios, incremental-task learning and incremental-class learning, demonstrating the ability to learn and retain knowledge about past tasks without catastrophic forgetting, and even with positive backward transfer.

One-sentence summary:
The HyperTransformer (HT) is an efficient few-shot and continual learner that retains knowledge of past tasks without catastrophic forgetting.

Incremental few-shot learning methods make it possible to learn without forgetting from multiple few-shot tasks arriving sequentially. In this work we approach this problem using the recently published HyperTransformer (HT): a hypernetwork that generates task-specific CNN weights directly from the support set. We propose to re-use these generated weights as an input to the HT for the next task of the continual-learning sequence. Thus, the HT uses the weights themselves as the representation of the previously learned tasks. This approach is different from most continual learning algorithms that typically rely on using replay buffers, weight regularization or task-dependent architectural changes. Instead, we show that the HT works akin to a recurrent model, relying on the weights from the previous task and a support set from a new task. We demonstrate that a single HT equipped with a prototypical loss is capable of learning and retaining knowledge about past tasks for two continual learning scenarios: incremental-task learning and incremental-class learning.
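The recurrent structure described above can be made concrete with a schematic loop: the weights generated for task t-1 are fed back as input when generating weights for task t, so the weight tensor itself carries the memory of past tasks. Here `toy_ht` is a stand-in for the real HyperTransformer (an assumption for illustration); it merely blends the previous weights with support-set statistics.

```python
def toy_ht(prev_weights, support_set, mix=0.5):
    """Placeholder hypernetwork: blend previous weights with the
    per-dimension mean of the new task's support embeddings."""
    task_stat = [sum(col) / len(col) for col in zip(*support_set)]
    return [mix * w + (1 - mix) * s for w, s in zip(prev_weights, task_stat)]

def incremental_learn(tasks, dim):
    weights = [0.0] * dim            # initial (empty) weights
    history = []
    for support_set in tasks:        # tasks arrive sequentially
        # Previous weights are an *input*, not just an initialization
        weights = toy_ht(weights, support_set)
        history.append(weights)
    return history
```

In the actual method the blending is done by a trained transformer, so what is preserved from past tasks is learned rather than fixed by a mixing constant.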

https://openreview.net/forum?id=nXmU89Rfmgg

5、[LG] LEVER: Learning to Verify Language-to-Code Generation with Execution

A Ni, S Iyer, D Radev, V Stoyanov, W Yih, S I. Wang, X V Lin
[Meta AI & Yale University]

Key points:

  1. LEVER is a simple approach that improves language-to-code generation by learning to verify generated programs with their execution results;
  2. LEVER trains a verifier to determine whether a program sampled from a CodeLM is correct, based on the natural language input, the program itself, and its execution results;
  3. LEVER reranks sampled programs by combining the verification score with the CodeLM generation probability and marginalizing over programs with the same execution result;
  4. LEVER consistently improves CodeLM performance on four language-to-code tasks and achieves new state-of-the-art results on all of them.

One-sentence summary:
LEVER learns to verify language-to-code generation by training a verifier that judges whether a program sampled from a CodeLM is correct, based on the natural language input, the program, and its execution results, then reranks the sampled programs by combining the verification score with the generation probability.

The advent of pre-trained code language models (CodeLMs) has led to significant progress in language-to-code generation. State-of-the-art approaches in this area combine CodeLM decoding with sample pruning and reranking using test cases or heuristics based on the execution results. However, it is challenging to obtain test cases for many real-world language-to-code applications, and heuristics cannot well capture the semantic features of the execution results, such as data type and value range, which often indicate the correctness of the program. In this work, we propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results. Specifically, we train verifiers to determine whether a program sampled from the CodeLM is correct or not based on the natural language input, the program itself and its execution results. The sampled programs are reranked by combining the verification score with the CodeLM generation probability, and marginalizing over programs with the same execution results. On four datasets across the domains of table QA, math QA and basic Python programming, LEVER consistently improves over the base CodeLMs (4.6% to 10.9% with code-davinci-002) and achieves new state-of-the-art results on all of them.
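The reranking rule can be sketched in a few lines; this is an illustrative rendering of the scoring described in the abstract, not the released implementation, and the tuple layout is an assumption. Each sampled program carries a CodeLM log-probability and a learned verification probability; programs are scored by the product of the two, and scores are marginalized (summed) over programs that produce the same execution result.

```python
from collections import defaultdict
import math

def lever_rerank(samples):
    """samples: list of (program, logprob, p_verifier, exec_result).
    Returns execution results sorted by marginalized score, best first."""
    scores = defaultdict(float)
    for prog, logp, p_ver, result in samples:
        # P_LM(program) * P_verifier(correct | input, program, result)
        scores[result] += math.exp(logp) * p_ver
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Marginalizing over execution results means several mediocre programs that agree on an answer can outvote a single high-probability program that produces a different one.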

https://arxiv.org/abs/2302.08468


A few more papers worth noting:

[LG] The Expressive Power of Tuning Only the Norm Layers

A Giannou, S Rajput, D Papailiopoulos
[University of Wisconsin-Madison]

Key points:

  1. Feature normalization transforms are essential ingredients of state-of-the-art deep neural networks;
  2. Fine-tuning only the normalization layers of a wider or deeper random network can achieve high accuracy on downstream tasks;
  3. Tuning only the normalization layers can perfectly reconstruct any given target network, at the cost of only an O(sqrt(width)) factor more parameters;
  4. Extending these results to other architectures, and proving lower bounds or impossibility results on the expressive power of normalization layers, remain exciting open problems.

One-sentence summary:
Fine-tuning only the normalization layers of a wider or deeper random network can perfectly reconstruct any given target network, at the cost of only an O(sqrt(width)) factor more parameters.

Feature normalization transforms such as Batch and Layer-Normalization have become indispensable ingredients of state-of-the-art deep neural networks. Recent studies on fine-tuning large pretrained models indicate that just tuning the parameters of these affine transforms can achieve high accuracy for downstream tasks. These findings open questions about the expressive power of tuning the normalization layers of frozen networks. In this work, we take the first step towards this question and show that for random ReLU networks, fine-tuning only their normalization layers can reconstruct any target network that is O(√width) times smaller. We show that this holds even for randomly sparsified networks, under sufficient overparameterization, in agreement with prior empirical work.
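To make the setup concrete, here is a framework-agnostic sketch (an assumption for illustration, not the paper's code) of the two pieces involved: a normalization transform whose only learnable parameters are the affine scale `gamma` and shift `beta`, and a selector that, given a frozen network's parameter dictionary, returns only those affine parameters as trainable.

```python
import math

def layer_norm(x, gamma, beta, eps=1e-5):
    """Layer normalization with learnable per-feature affine parameters."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    inv = 1.0 / math.sqrt(var + eps)
    return [gamma[i] * (x[i] - mu) * inv + beta[i] for i in range(len(x))]

def trainable_norm_params(params):
    """Given name -> parameter, keep only normalization-layer affines.
    The ".norm.gamma"/".norm.beta" naming convention is hypothetical."""
    return {k: v for k, v in params.items()
            if k.endswith(("norm.gamma", "norm.beta"))}
```

The paper's question is then: with everything outside `trainable_norm_params` frozen at random values, how expressive is the family of functions reachable by adjusting `gamma` and `beta` alone?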

https://arxiv.org/abs/2302.07937

[CL] Auditing large language models: a three-layered approach

J Mökander, J Schuett, H R Kirk, L Floridi
[University of Oxford & Centre for the Governance of AI]

Key points:

  1. Auditing procedures must be designed to capture the risks posed by LLMs and must be connected within a structured process;
  2. Proposes a three-layered approach consisting of governance audits, model audits, and application audits;
  3. Audits should be conducted by independent third parties to help ensure that LLMs are ethically, legally, and technically robust;
  4. The effectiveness of the approach depends on coordinated audits and a structured process.

One-sentence summary:
Proposes a three-layered approach to auditing large language models (LLMs) to address their ethical and governance challenges.

The emergence of large language models (LLMs) represents a major advance in artificial intelligence (AI) research. However, the widespread use of LLMs is also coupled with significant ethical and social challenges. Previous research has pointed towards auditing as a promising governance mechanism to help ensure that AI systems are designed and deployed in ways that are ethical, legal, and technically robust. However, existing auditing procedures fail to address the governance challenges posed by LLMs, which are adaptable to a wide range of downstream tasks. To help bridge that gap, we offer three contributions in this article. First, we establish the need to develop new auditing procedures that capture the risks posed by LLMs by analysing the affordances and constraints of existing auditing procedures. Second, we outline a blueprint to audit LLMs in feasible and effective ways by drawing on best practices from IT governance and system engineering. Specifically, we propose a three-layered approach, whereby governance audits, model audits, and application audits complement and inform each other. Finally, we discuss the limitations not only of our three-layered approach but also of the prospect of auditing LLMs at all. Ultimately, this article seeks to expand the methodological toolkit available to technology providers and policymakers who wish to analyse and evaluate LLMs from technical, ethical, and legal perspectives.

https://arxiv.org/abs/2302.08500

[CL] Compositional Exemplars for In-context Learning

J Ye, Z Wu, J Feng, T Yu, L Kong
[The University of Hong Kong & Shark-NLP]

Key points:

  1. CEIL models the joint probability of in-context examples and optimizes subset selection with a contrastive loss;
  2. CEIL outperforms previous methods on 12 NLP tasks across 7 distinct categories;
  3. The retriever learned by CEIL exhibits surprising transferability across language models and datasets, as well as compositionality on compositional tasks;
  4. CEIL offers an effective and efficient way to adapt large language models to downstream tasks.

One-sentence summary:
CEIL is a new in-context learning method that achieves state-of-the-art performance and transferability with a joint probability model and a contrastive learning framework.

Large pretrained language models (LMs) have shown impressive In-Context Learning (ICL) ability, where the model learns to do an unseen task via a prompt consisting of input-output examples as the demonstration, without any parameter updates. The performance of ICL is highly dominated by the quality of the selected in-context examples. However, previous selection methods are mostly based on simple heuristics, leading to sub-optimal performance. In this work, we formulate in-context example selection as a subset selection problem. We propose CEIL (Compositional Exemplars for In-context Learning), which is instantiated by Determinantal Point Processes (DPPs) to model the interaction between the given input and in-context examples, and optimized through a carefully-designed contrastive learning objective to obtain preference from LMs. We validate CEIL on 12 classification and generation datasets from 7 distinct NLP tasks, including sentiment analysis, paraphrase detection, natural language inference, commonsense reasoning, open-domain question answering, code generation, and semantic parsing. Extensive experiments demonstrate not only the state-of-the-art performance but also the transferability and compositionality of CEIL, shedding new light on effective and efficient in-context learning. Our code is released at this https URL.
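The DPP machinery CEIL builds on can be illustrated with a generic greedy MAP-inference routine (a sketch of the standard technique, not the paper's trained retriever): given a similarity kernel L over candidate exemplars, greedily pick the subset maximizing the determinant of the induced submatrix, which trades off relevance (diagonal entries) against diversity (off-diagonal entries).

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    n, d = len(m), 1.0
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(m[r][i]))
        if abs(m[piv][i]) < 1e-12:
            return 0.0
        if piv != i:
            m[i], m[piv] = m[piv], m[i]
            d = -d
        d *= m[i][i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n):
                m[r][c] -= f * m[i][c]
    return d

def greedy_dpp(kernel, k):
    """Greedily select k indices maximizing det of the kernel submatrix."""
    chosen = []
    for _ in range(k):
        best, best_val = None, -1.0
        for j in range(len(kernel)):
            if j in chosen:
                continue
            s = chosen + [j]
            sub = [[kernel[a][b] for b in s] for a in s]
            val = det(sub)
            if val > best_val:
                best, best_val = j, val
        chosen.append(best)
    return chosen
```

Two near-duplicate exemplars yield a nearly singular submatrix (determinant close to zero), so the greedy step naturally prefers a diverse set — the property that motivates using DPPs for exemplar selection.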

https://arxiv.org/abs/2302.05698

[CL] On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark)

K Valmeekam, S Sreedharan, M Marquez…
[Arizona State University & Colorado State University]

Key points:

  1. LLMs' ability to autonomously generate and validate plans in commonsense planning tasks is quite meager, with an average success rate of only about 3%;
  2. Plans generated by an LLM can be quickly repaired by a sound planner to guarantee correctness;
  3. With an LLM as a plan assistant, human-in-the-loop planning shows modest improvements in plan accuracy;
  4. The benchmark suite and evaluation tools are made available for the research community to assess the planning abilities of LLMs.

One-sentence summary:
Large language models (LLMs) have limited autonomous planning ability, but can provide heuristic guidance to human or AI planners.

Intrigued by the claims of emergent reasoning capabilities in LLMs trained on general web corpora, in this paper, we set out to investigate their planning capabilities. We aim to evaluate (1) how good LLMs are by themselves in generating and validating simple plans in commonsense planning tasks (of the type that humans are generally quite good at) and (2) how good LLMs are in being a source of heuristic guidance for other agents–either AI planners or human planners–in their planning tasks. To investigate these questions in a systematic rather than anecdotal manner, we start by developing a benchmark suite based on the kinds of domains employed in the International Planning Competition. On this benchmark, we evaluate LLMs in three modes: autonomous, heuristic and human-in-the-loop. Our results show that LLM’s ability to autonomously generate executable plans is quite meager, averaging only about 3% success rate. The heuristic and human-in-the-loop modes show slightly more promise. In addition to these results, we also make our benchmark and evaluation tools available to support investigations by research community.
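The kind of check such an evaluation relies on can be sketched as a STRIPS-style plan validator (an illustrative toy, not the paper's harness): each action lists preconditions, an add-list, and a delete-list, and a plan succeeds iff every action's preconditions hold in the state produced by the actions before it and the final state entails the goal.

```python
def validate_plan(init_state, plan, actions, goal):
    """init_state/goal: sets of facts; plan: list of action names;
    actions: name -> (preconds, adds, deletes), each a set of facts."""
    state = set(init_state)
    for name in plan:
        pre, add, dele = actions[name]
        if not pre <= state:
            return False           # precondition violated: plan not executable
        state = (state - dele) | add
    return goal <= state           # executable; check goal satisfaction
```

Grading an LLM-generated plan then reduces to parsing its action sequence and running it through the validator against the domain's formal action definitions.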

https://arxiv.org/abs/2302.06706
