Exploding and vanishing gradients in deep neural networks: the effect of residual connections
Abstract
This paper analyzes the exploding and vanishing gradient problem in deep neural networks using multiplicative ergodic theory. It explains how adding residual connections affects gradient dynamics. The analysis exploits a characterization of Lyapunov exponents due to Furstenberg and Kifer to make precise statements about the Lyapunov spectrum. The work demonstrates the effect of residual connections on this spectrum and its relation to gradient stability.
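For orientation, the link between gradient growth and Lyapunov exponents can be sketched as follows. The notation here (layer maps f_k, layer Jacobians J_k, depth L) is assumed for illustration and is not taken from the paper itself.

```latex
% Plain network: x_{k+1} = f_k(x_k), with layer Jacobian J_k = \partial f_k / \partial x_k.
% Backpropagation multiplies these Jacobians, so the growth rate of the product
% is measured by the top Lyapunov exponent (when the limit exists):
\lambda_1 = \lim_{L \to \infty} \frac{1}{L} \log \left\| J_L J_{L-1} \cdots J_1 \right\|
% \lambda_1 > 0: gradient norms can grow exponentially with depth (exploding);
% \lambda_1 < 0: they decay exponentially with depth (vanishing).
% Residual network: x_{k+1} = x_k + f_k(x_k), so each factor becomes I + J_k:
\lambda_1^{\mathrm{res}} = \lim_{L \to \infty} \frac{1}{L} \log \left\| (I + J_L)(I + J_{L-1}) \cdots (I + J_1) \right\|
```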
Key points
Uses multiplicative ergodic theory to study exploding/vanishing gradients.
Employs Furstenberg and Kifer's characterization of Lyapunov exponents for rigorous statements about the Lyapunov spectrum.
Explains the precise effect of residual connections on the Lyapunov spectrum.
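The effect on the Lyapunov spectrum can be explored numerically with a toy experiment. The sketch below is not the paper's method: it assumes i.i.d. Gaussian matrices as stand-ins for layer Jacobians, and the helper names (`lyapunov_spectrum`, `random_jacobians`) and the depth, width, and scale values are hypothetical choices. It estimates the spectrum with the standard repeated-QR procedure, once for a plain product and once for a product of factors of the form I + J.

```python
import numpy as np


def lyapunov_spectrum(jacobians):
    """Estimate the Lyapunov exponents of the product J_L ... J_1 by repeated QR."""
    n = jacobians[0].shape[0]
    q = np.eye(n)
    log_r = np.zeros(n)
    for jac in jacobians:
        # Re-orthonormalize after each factor and accumulate the log stretch
        # along each orthogonal direction (the diagonal of R).
        q, r = np.linalg.qr(jac @ q)
        log_r += np.log(np.abs(np.diag(r)))
    # Average per layer, sorted from largest to smallest exponent.
    return np.sort(log_r / len(jacobians))[::-1]


def random_jacobians(depth, width, scale, residual, rng):
    """Toy layer Jacobians (an assumption): Gaussian J, or I + J when residual=True."""
    mats = []
    for _ in range(depth):
        g = scale * rng.standard_normal((width, width)) / np.sqrt(width)
        mats.append(np.eye(width) + g if residual else g)
    return mats


rng = np.random.default_rng(0)
plain = lyapunov_spectrum(random_jacobians(200, 16, 0.5, False, rng))
resid = lyapunov_spectrum(random_jacobians(200, 16, 0.5, True, rng))
print("plain    top/bottom exponent:", plain[0], plain[-1])
print("residual top/bottom exponent:", resid[0], resid[-1])
```

In this toy setting the plain product should show clearly negative exponents, which corresponds to vanishing gradients, while the residual spectrum should sit much closer to zero and be less spread out. Whether the same holds for a trained network depends on its actual Jacobians.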