Critical initialization of wide and deep neural networks through partial Jacobians: general theory and applications to LayerNorm
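Deep neural networks are notoriously resistant to theoretical treatment. However, when the number of parameters in each layer tends to infinity, the network function becomes a Gaussian process (GP), and a quantitatively predictive description becomes possible. The Gaussian approximation makes it possible to formulate criteria for selecting hyperparameters, such as the variances of the weights and biases and the learning rate. These criteria rely on the notion of criticality defined for deep neural networks. In this work, we describe a new method for diagnosing this criticality, both theoretically and empirically. To that end, we introduce the partial Jacobians of a network, defined as the derivatives of the preactivations in layer l with respect to the preactivations in layer l0 < l. These quantities are particularly useful when the network architecture involves many different layers. We discuss various properties of the partial Jacobians, such as their scaling with depth and their relation to the neural tangent kernel (NTK). We derive recurrence relations for the partial Jacobians and use them to analyze the criticality of deep MLP networks with and without LayerNorm. We find that the normalization layer changes the optimal value.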
Original title: Critical initialization of wide and deep neural networks through partial Jacobians: general theory and applications to LayerNorm
Original abstract: Deep neural networks are notorious for defying theoretical treatment. However, when the number of parameters in each layer tends to infinity the network function is a Gaussian process (GP) and quantitatively predictive description is possible. Gaussian approximation allows to formulate criteria for selecting hyperparameters, such as variances of weights and biases, as well as the learning rate. These criteria rely on the notion of criticality defined for deep neural networks. In this work we describe a new way to diagnose (both theoretically and empirically) this criticality. To that end, we introduce partial Jacobians of a network, defined as derivatives of preactivations in layer l with respect to preactivations in layer l0<l. These quantities are particularly useful when the network architecture involves many different layers. We discuss various properties of the partial Jacobians such as their scaling with depth and relation to the neural tangent kernel (NTK). We derive the recurrence relations for the partial Jacobians and utilize them to analyze criticality of deep MLP networks with (and without) LayerNorm. We find that the normalization layer changes the optimal value.
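
To make the definition of a partial Jacobian concrete, here is a minimal numerical sketch, not the authors' code. It assumes a plain ReLU MLP without LayerNorm, Gaussian weights with variance `sigma_w2 / width`, and a random Gaussian vector standing in for the preactivations at layer l0. The function names and default sizes are hypothetical. It tracks the squared Frobenius norm of the derivative of the layer-l preactivations with respect to the layer-l0 preactivations, divided by the width, as l grows.

```python
import numpy as np


def relu(h):
    return np.maximum(h, 0.0)


def relu_grad(h):
    # Derivative of ReLU, taken to be 0 at h == 0.
    return (h > 0).astype(float)


def partial_jacobian_norms(sigma_w2, depth=30, width=256, sigma_b2=0.0, seed=0):
    """Return ||d h^l / d h^{l0}||_F^2 / width for l = l0, ..., l0 + depth.

    Assumed model (illustrative): h^{l+1} = W^{l+1} relu(h^l) + b^{l+1},
    with W_ij ~ N(0, sigma_w2 / width) and b_i ~ N(0, sigma_b2).
    """
    rng = np.random.default_rng(seed)
    h = rng.standard_normal(width)   # stand-in for the preactivations at layer l0
    jac = np.eye(width)              # d h^{l0} / d h^{l0} is the identity
    norms = [np.sum(jac ** 2) / width]
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * np.sqrt(sigma_w2 / width)
        b = rng.standard_normal(width) * np.sqrt(sigma_b2)
        # Chain rule: d h^{l+1} / d h^{l0} = W diag(relu'(h^l)) d h^l / d h^{l0}.
        # Broadcasting W * relu_grad(h) scales column j of W by relu'(h_j).
        jac = (W * relu_grad(h)) @ jac
        h = W @ relu(h) + b
        norms.append(np.sum(jac ** 2) / width)
    return np.array(norms)


if __name__ == "__main__":
    for sigma_w2 in (1.0, 2.0, 4.0):
        norms = partial_jacobian_norms(sigma_w2)
        print(f"sigma_w^2 = {sigma_w2}: final normalized norm = {norms[-1]:.3e}")
```

Under these assumptions, each ReLU layer multiplies the normalized norm by roughly `sigma_w2 / 2` on average. The norm should therefore decay with depth for `sigma_w2 = 1`, grow for `sigma_w2 = 4`, and stay of order one for `sigma_w2 = 2`, the familiar critical initialization for ReLU networks. This kind of depth scaling is what the abstract proposes to use as a diagnostic of criticality. The sketch does not cover the LayerNorm case.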