
Critical initialization of wide and deep neural networks through partial Jacobians: general theory and applications to LayerNorm

Posted: 2023-03-20 14:50:40

Deep neural networks are notorious for defying theoretical treatment. However, when the number of parameters in each layer tends to infinity, the network function is a Gaussian process (GP) and a quantitatively predictive description becomes possible. The Gaussian approximation allows one to formulate criteria for selecting hyperparameters, such as the variances of weights and biases, as well as the learning rate. These criteria rely on a notion of criticality defined for deep neural networks. In this work, we describe a new way to diagnose this criticality, both theoretically and empirically. To that end, we introduce the partial Jacobians of a network, defined as the derivatives of the preactivations in layer l with respect to the preactivations in an earlier layer l_0 < l. These quantities are particularly useful when the network architecture involves many different layers. We discuss various properties of the partial Jacobians, such as their scaling with depth and their relation to the neural tangent kernel (NTK). We derive recurrence relations for the partial Jacobians and use them to analyze the criticality of deep MLP networks with (and without) LayerNorm. We find that the normalization layer changes the optimal value.
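For concreteness, the chain-rule structure behind the recurrence relations mentioned above can be written as follows; the notation is illustrative and not necessarily the paper's exact normalization:

```latex
% h^l denotes the preactivations of layer l (illustrative notation).
J^{(l_0,\,l)}_{ij} \equiv \frac{\partial h^{l}_{i}}{\partial h^{l_0}_{j}},
\qquad
J^{(l_0,\,l+1)} = \frac{\partial h^{l+1}}{\partial h^{l}} \, J^{(l_0,\,l)}
\quad \text{(chain rule)}.
```

Roughly speaking, criticality corresponds to the average squared norm of J^{(l_0, l)} neither growing nor decaying exponentially as the depth l - l_0 increases.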

Original title: Critical initialization of wide and deep neural networks through partial Jacobians: general theory and applications to LayerNorm

Original abstract: Deep neural networks are notorious for defying theoretical treatment. However, when the number of parameters in each layer tends to infinity the network function is a Gaussian process (GP) and quantitatively predictive description is possible. Gaussian approximation allows to formulate criteria for selecting hyperparameters, such as variances of weights and biases, as well as the learning rate. These criteria rely on the notion of criticality defined for deep neural networks. In this work we describe a new way to diagnose (both theoretically and empirically) this criticality. To that end, we introduce partial Jacobians of a network, defined as derivatives of preactivations in layer l with respect to preactivations in layer l_0 < l. These quantities are particularly useful when the network architecture involves many different layers. We discuss various properties of the partial Jacobians such as their scaling with depth and relation to the neural tangent kernel (NTK). We derive the recurrence relations for the partial Jacobians and utilize them to analyze criticality of deep MLP networks with (and without) LayerNorm. We find that the normalization layer changes the optimal value.
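As an illustration of how such a diagnostic can be probed numerically, below is a minimal, self-contained sketch (not the authors' code) that measures the average squared Frobenius norm of a partial Jacobian for a randomly initialized deep MLP using JAX. The widths, depth, tanh activation, and initialization variances are arbitrary choices made here for illustration.

```python
# Minimal sketch: empirically measure partial Jacobian norms of a deep MLP.
# All hyperparameters below are illustrative assumptions, not the paper's setup.

import jax
import jax.numpy as jnp

def init_mlp(key, width, depth, sigma_w=1.0, sigma_b=0.0):
    """Draw weights W^l ~ N(0, sigma_w^2 / width) and biases b^l ~ N(0, sigma_b^2)."""
    params = []
    for _ in range(depth):
        key, wk, bk = jax.random.split(key, 3)
        W = sigma_w / jnp.sqrt(width) * jax.random.normal(wk, (width, width))
        b = sigma_b * jax.random.normal(bk, (width,))
        params.append((W, b))
    return params

def preactivations(params, h0):
    """Return preactivations h^1, ..., h^L, treating h0 as the layer-0 preactivation."""
    hs = []
    h = h0
    for W, b in params:
        h = W @ jnp.tanh(h) + b  # tanh used as an illustrative activation
        hs.append(h)
    return hs

def partial_jacobian_norm(params, h0, l0, l):
    """Average squared Frobenius norm (per output neuron) of d h^l / d h^{l0}."""
    def h_l_of_h_l0(h_l0):
        # rerun layers l0+1 .. l starting from the given h^{l0}
        h = h_l0
        for W, b in params[l0:l]:
            h = W @ jnp.tanh(h) + b
        return h
    hs = preactivations(params, h0)
    J = jax.jacfwd(h_l_of_h_l0)(hs[l0 - 1])  # shape (width, width)
    return jnp.sum(J ** 2) / J.shape[0]

key = jax.random.PRNGKey(0)
width, depth = 256, 20
params = init_mlp(key, width, depth, sigma_w=1.5, sigma_b=0.1)
h0 = jax.random.normal(jax.random.PRNGKey(1), (width,))

# At criticality this quantity should stay O(1) as l - l0 grows;
# away from criticality it grows or decays exponentially with depth.
for l in (5, 10, 20):
    print(l, partial_jacobian_norm(params, h0, l0=1, l=l))
```

Sweeping sigma_w and sigma_b and watching whether this norm stays O(1), explodes, or vanishes with depth is one simple way to locate the critical initialization empirically.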