报告主题:基于梯度下降的神经网络学习中的不变低维子空间
报告日期:12月26日(周二)11:00-12:00
主题简介:
在过去的几年里,梯度下降对于简洁解的隐式偏向是在深度网络训练中广泛研究的现象。在这项工作中,我们首先将焦点缩小到深度线性网络并来研究这一现象。通过我们的分析,在数据具有低维结构时,我们的研究揭示了学习动态中的一个令人惊讶的“简洁法则”。
具体而言,我们表明从正交初始化开始的梯度下降的演化只会影响所有权重矩阵的一小部分奇异向量空间。换句话说,尽管在整个训练过程中更新了所有权重参数,但学习过程仅发生在每个权重矩阵的一个小不变子空间内。学习动态的这种简单性对于提高训练的高效性和对更好的理解深度网络的表示都有重大影响。首先,该分析使我们能够通过利用学习动态中的低维结构来显著提高训练效率。我们可以构建更小但等效的深度线性网络,而不会牺牲对应的宽网络关联的优势。
此外,我们展示了对于高效训练深度非线性网络的潜在可能性。其次,它使我们能够更好地理解深度表示学习,并理论阐明从浅层到深层网络的逐渐特征压缩和区分。这项研究为深度非线性网络中的分层表示的理解奠定了基础。
本次演讲基于三项最近的研究成果:
https://arxiv.org/abs/2306.01154
https://arxiv.org/abs/2311.05061
https://arxiv.org/abs/2311.02960
Over the past few years, an extensively studied phenomenon in training deep networks is the implicit bias of gradient descent towards parsimonious solutions. In this work, we first investigate this phenomenon by narrowing our focus to deep linear networks. Through our analysis, we reveal a surprising “law of parsimony” in the learning dynamics when the data possesses low-dimensional structures. Specifically, we show that the evolution of gradient descent starting from orthogonal initialization only affects a minimal portion of singular vector spaces across all weight matrices. In other words, the learning process happens only within a small invariant subspace of each weight matrix, even though all weight parameters are updated throughout training. This simplicity in learning dynamics could have significant implications for both efficient training and a better understanding of deep networks. First, the analysis enables us to considerably improve training efficiency by taking advantage of the low-dimensional structure in learning dynamics. We can construct smaller, equivalent deep linear networks without sacrificing the benefits associated with the wider counterparts. Moreover, we demonstrate the potential implications for efficient training deep nonlinear networks.
Second, it allows us to better understand deep representation learning by elucidating the progressive feature compression and discrimination from shallow to deep layers. The study paves the foundation for understanding hierarchical representations in deep nonlinear networks.
报告嘉宾:
曲庆是密歇根大学电子工程与计算机科学系的助理教授。他分别于2018年10月从哥伦比亚大学获得电气工程博士学位,2011年7月从清华大学获得学士学位。他的研究兴趣集中在数据科学基础、机器学习、数值优化和信号/图像处理的交叉领域。
扫描下方二维码
或点击「阅读原文」报名