Why Is BatchNorm Effective?

Paper: How Does Batch Normalization Help Optimization? (Santurkar et al., NeurIPS 2018)

The popular belief is that this effectiveness stems from controlling the change of the layers' input distributions during training to reduce the so-called “internal covariate shift”. In this work, we demonstrate that such distributional stability of layer inputs has little to do with the success of BatchNorm. Instead, we uncover a more fundamental impact of BatchNorm on the training process: it makes the optimization landscape significantly smoother. This smoothness induces a more predictive and stable behavior of the gradients, allowing for faster training.
  • The popular belief is that this effectiveness comes from controlling the change of the layers' input distributions during training, reducing the so-called “internal covariate shift”. The paper shows that this distributional stability of layer inputs in fact has little to do with BatchNorm's success.
  • Instead, BatchNorm has a more fundamental impact on training: it makes the optimization landscape significantly smoother. This smoothness makes the gradients more predictive and stable, which allows faster training.
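To make the operation being discussed concrete, here is a minimal NumPy sketch of the BatchNorm forward pass (the function name and toy batch are illustrative, not from the paper): each feature is normalized to zero mean and unit variance over the batch, then rescaled by the learnable parameters gamma and beta.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Minimal BatchNorm forward pass for a 2-D batch of shape (N, D).

    Normalizes each feature over the batch dimension, then applies a
    learnable scale (gamma) and shift (beta).
    """
    mu = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                     # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta             # learnable scale and shift

# Toy example: a batch of 4 samples with 3 features
x = np.random.randn(4, 3) * 5.0 + 2.0
gamma, beta = np.ones(3), np.zeros(3)
y = batchnorm_forward(x, gamma, beta)
print(y.mean(axis=0))  # ~0 per feature
print(y.std(axis=0))   # ~1 per feature
```

Note that gamma and beta let the network scale and shift the normalized activations to any distribution it finds useful, which fits the paper's point: the benefit is not that layer-input distributions are pinned down, but that the resulting optimization landscape is smoother.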