您现在的位置是：首页 > 其它

当前栏目

[Converge] Larger batch size?

size Batch

2023-09-27 14:23:24 时间

Ref: Effect of batch size on training dynamics

Don’t decay the learning rate increase the batch size

We can often achieve the benefits of decaying the learning rate by instead increasing the batch size during training.

Hypothesis: larger batch sizes don’t generalize as well because the model cannot travel far enough in a reasonable number of training epochs.
Finding: better solutions can be far away from the initial weights and if the loss is averaged over the batch then large batch sizes simply do not allow the model to travel far enough to reach the better solutions for the same number of training epochs.

大家一般都有这样的默认, 调大batch size就要增大学习率, 这是为什么呢?

在比较大batch和小batch时,一般默认都是相同epoch去比较,这样大batch训练iteration次数会更少, 此时如果学习率不做任何调整, 大batch训练更少iteration, 导致拟合程度较低, 精度也会低, 因此需要对学习率做调整, 一个大batch包含样本更多, 避免了小batch包含极端样本的情况, 方差更小, 意味着使用大batch计算下来的梯度方向更可信, 因此可以使用一个更大的learning rate

盲目增大 Batch_Size 有何坏处？

内存利用率提高了，但是内存容量可能撑不住了。

跑完一次 epoch（全数据集）所需的迭代次数减少，要想达到相同的精度，其所花费的时间大大增加了，从而对参数的修正也就显得更加缓慢。

Batch_Size 增大到一定程度，其确定的下降方向已经基本不再变化。

Conclusion: use the default one.

猜你喜欢

JS数组常见方法的深浅拷贝分类
NOT IN clause and NULL values
SwiftUI 内功之如何获取设备唯一编号（教程含源码）
微软好绝情：Windows 7再无重大更新！
sql server 小技巧(8) visual studio 2013里使用Sql server compact 4.0及发布问题处理
微软或用人类新陈代谢物为数据中心供电
互联网金融，巨头天下还是创业者天堂?
变量、变量作用域、常量final、变量的命名规范
Laravel 安装指南
Nginx使用教程(一)：下载并编译安装Nginx
CPU性能实战分析
《惢客创业日记》2020.06.28-29（周日）我为什么要写创业日记？
chrome 插件
JavaScript中Switch语句的使用方法
基于C++实现的（控制台）景区管理系统【100010671】
基于协同过滤推荐算法(Collaborative Filtering Recommendation)的推荐系统matlab仿真
Redis开发与运维. 1.3　Redis使用场景
激光雷达

相关主题

JDBC fetch size
font-size：625%；
125.C++中的size_t
background-size
size属性
ARRAY_SIZE宏
length与size()
size和len
size函数
size(A,1)

zl程序教程

当前栏目

[Converge] Larger batch size?

相关文章