Incremental Learning vs Online Learning

Date: 2023-09-14 09:01:30

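Contents

1. Preface

2. Some consider the two to be the same concept

3. Some hold that online learning includes incremental learning

4. Incremental Online Learning and Online Incremental Learning

5. Incremental Learning (vs. batch learning) includes Online Learning

5.1 My comment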


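1. Preface

        I recently came across Online Learning and Incremental Learning (the latter is usually rendered in Chinese as 增量学习, though I think 递增式学习 is the better translation), and a question naturally arose: how are the two related, and where do they agree and differ? A quick search suggests that neither academia nor industry has reached a consensus on this.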

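2. Some consider the two to be the same concept

        Take, for example, the description of Incremental Learning in the MATLAB documentation (Incremental Learning Overview - MATLAB & Simulink (mathworks.com)).

        Clearly, it treats the two as the same concept.

        Someone [2] quotes the following passage from Gama, João, et al. "A survey on concept drift adaptation." ACM Computing Surveys (CSUR) 46.4 (2014): 44: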

        We can distinguish two learning modes: offline learning and online learning. In offline learning, the whole training data must be available at the time of model training. Only when training is completed can the model be used for predicting. In contrast, online algorithms process data sequentially. They produce a model and put it in operation without having the complete training dataset available at the beginning. The model is continuously updated during operation as more training data arrives.

        Less restrictive than online algorithms are incremental algorithms that process input examples one by one (or batch by batch) and update the decision model after receiving each example. Incremental algorithms may have random access to previous examples or representative/selected examples. In such a case, these algorithms are called incremental algorithms with partial memory. Typically, in incremental algorithms, for any new presentation of data, the update operation of the model is based on the previous one. Streaming algorithms are online algorithms for processing high-speed continuous flows of data. In streaming, examples are processed sequentially as well and can be examined in only a few passes (typically just one). These algorithms use limited memory and limited processing time per item.

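        and then concludes: "I just happened to read this passage; it feels like basically the same practice, merely named from different angles." However, I do not read the passage as saying the two concepts are equivalent. On the contrary, "Less restrictive than online algorithms are incremental algorithms..." should be read as saying that the two differ and that the latter is broader than the former.

        I do not agree with conflating the two.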
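        To make the distinction in the quoted passage concrete, here is a minimal sketch under stated assumptions. It is not taken from Gama et al.; the one-parameter model y ≈ w * x, the class names, the replay buffer and all parameter values are hypothetical choices for illustration. The first learner is strictly online (each example is used once and discarded), while the second is incremental with partial memory (it may revisit a bounded set of stored examples on every update).

```python
import random
from collections import deque


class OnlineLinear:
    """Strictly online learner for y ≈ w * x: each example is used once."""

    def __init__(self, lr=0.1):
        self.w = 0.0
        self.lr = lr

    def update(self, x, y):
        # One gradient step on the squared error of the current example.
        self.w -= self.lr * (self.w * x - y) * x


class PartialMemoryLinear(OnlineLinear):
    """Incremental learner with partial memory (hypothetical design):
    it keeps a bounded buffer of past examples and may revisit them."""

    def __init__(self, lr=0.1, memory_size=20, replays=3, seed=0):
        super().__init__(lr)
        self.memory = deque(maxlen=memory_size)  # oldest entries fall out
        self.replays = replays
        self.rng = random.Random(seed)

    def update(self, x, y):
        super().update(x, y)
        # Random access to previously seen examples, which a strictly
        # online algorithm is not allowed to do.
        for _ in range(min(self.replays, len(self.memory))):
            px, py = self.rng.choice(list(self.memory))
            super().update(px, py)
        self.memory.append((x, y))


data_rng = random.Random(42)
online, partial = OnlineLinear(), PartialMemoryLinear()
for _ in range(500):
    x = data_rng.uniform(-1.0, 1.0)
    y = 2.0 * x  # assumed noise-free target with true w = 2
    online.update(x, y)
    partial.update(x, y)

print(round(online.w, 3), round(partial.w, 3), len(partial.memory))
```

        Both learners update the previous model rather than retraining from scratch, so both are incremental in the sense of the quote; only the first also satisfies the stricter online constraint.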

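3. Some hold that online learning includes incremental learning

        For example, [3] states that online learning includes incremental learning and decremental learning.

        However, the unsourced text that follows in that post offers no evidence for this claim (perhaps the biggest problem with CSDN blogs, and with Chinese-language blog posts in general, is that most authors lack the basic habit of citing their sources). So it is unclear whether the claim comes from rigorous academic work or is merely the blogger's own conjecture.

        Likewise, one answer in [4] reads: "My research area is related to incremental learning, so here is my personal understanding:

        Online learning covers incremental learning, decremental learning and similar cases; it describes a dynamic learning process. The former is incremental learning: one or more samples are learned at a time, and these training samples may be fully retained, partially retained, or not retained at all. The latter is decremental learning, i.e. discarding the retained training samples of 'lowest value'. The concrete procedures for both can be found in the paper on incremental and decremental SVM."

        I do not agree with this view either, because the relationship should be exactly the reverse.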
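        The incremental/decremental pair described in the quoted answer can be illustrated with a toy sketch. This is not the SVM procedure from the paper the answer mentions; it is a running mean that can absorb a sample and later remove one exactly. The class name, method names and the caller-supplied "value" score are all assumptions made for this example.

```python
class IncDecMean:
    """Toy model (a running mean) supporting incremental and decremental updates."""

    def __init__(self):
        self.samples = []  # retained (value_score, x) pairs
        self.n = 0
        self.mean = 0.0

    def learn(self, x, score):
        # Incremental step: fold one new sample into the current model.
        self.samples.append((score, x))
        self.n += 1
        self.mean += (x - self.mean) / self.n

    def unlearn_least_valuable(self):
        # Decremental step: drop the retained sample with the lowest
        # (hypothetical) value score and undo its effect on the mean.
        entry = min(self.samples, key=lambda pair: pair[0])
        self.samples.remove(entry)
        _, x = entry
        if self.n == 1:
            self.n, self.mean = 0, 0.0
        else:
            self.mean = (self.mean * self.n - x) / (self.n - 1)
            self.n -= 1
        return x


model = IncDecMean()
for x, score in [(1.0, 0.9), (2.0, 0.8), (3.0, 0.7), (6.0, 0.1)]:
    model.learn(x, score)
mean_before = model.mean

removed = model.unlearn_least_valuable()
mean_after = model.mean
print(mean_before, removed, mean_after)
```

        Neither step retrains from scratch: each one adjusts the existing model using only the sample being added or removed.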

        

4. Incremental Online Learning and Online Incremental Learning

        A web search also turns up hybrid terms such as Incremental Online Learning and Online Incremental Learning (the existence of these hybrid terms at least shows that the people who use them do not regard Incremental Learning and Online Learning as the same thing).

        For example, Jiangpeng He, et al., "Incremental Learning In Online Scenario", CVPR 2020:

Online incremental learning [15] is a subarea of incremental learning that are additionally bounded by run-time and capability of lifelong learning with limited data compared to offline learning.

        This passage shows that the authors regard Incremental Learning as a broader concept than Online Learning.

        Another example is "Incremental Online Learning. Incremental online machine learning | by Danny Butvinik | Analytics Vidhya | Medium". The post is titled "Incremental Online Learning", yet its body says "In this article I’m going to overview a few online incremental learning algorithms...", so it is unclear whether the author considers "Incremental Online Learning" and "Online Incremental Learning" to be the same concept or simply made a slip.

5. Incremental Learning (vs. batch learning) includes Online Learning

        One answer in [4] matches my own understanding fairly well:

        So, the following answer is just based on different opinions of collegues and professors from the field. I want to try to summarize it briefly:

        Sequential and online learning is mostly associated with Bayesian updating. The sequential learning is used widely for an order in time of the data, meaning that $x_1$ is coming always first, then $x_2$, then $x_3$ and so on. The Dataset each has a certain order in that sense. In contrast to that incremental may be a whole block of data at time x and another block of data at time y. While the block internally may be randomly ordered.

        Concerning online learning, the people mostly referred to a data stream, hence a online learning is always incremental learning but incremental learning does not have to be online.

        These are quite fuzzy definitions, and in my opinion there is not clear definition though. I still hope that helps.

5.1 My comment

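        Let me add some comments of my own. Online Learning certainly means that data arrive as a stream, one example at a time (or one batch at a time; many articles stress that data come one by one, but I do not think one-by-one is the essential point). The point is that the data are not all presented to the model at once but arrive gradually as time passes, forming a time series. The focus of Incremental Learning, by contrast, is not whether the data arrive as a time series in the online manner. It is closer to the mini-batch training used in deep learning when memory or other limits prevent a large dataset from being processed at once: the data are read in and trained on one small chunk at a time, or even one example at a time (which becomes stochastic gradient descent), and the knowledge learned from each mini-batch is carried directly into the training of the next mini-batch. In this way, incremental training is achieved within one epoch (one pass over the whole dataset). If the data of each mini-batch (or each sample) arrive one by one or batch by batch as time passes, this becomes online learning. Hence Online Learning is always Incremental Learning, but not necessarily the other way round.

        So of the two terms mentioned above, Incremental Online Learning and Online Incremental Learning, the former is actually redundant. Some Incremental Learning is not online; when the distinction matters, one can speak of online or offline Incremental Learning, and the two together are simply called Incremental Learning.

        Furthermore, Incremental Learning can be seen as the counterpart of Batch Learning; the main problem it addresses is the memory requirement of processing large datasets.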
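        The relationship argued for in this comment can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: the one-parameter model, the `partial_fit` method name, the `chunks` and `stream` helpers and the synthetic data are all hypothetical. The same update routine is driven once by mini-batches cut from a stored dataset over several epochs (offline incremental) and once by a simulated stream in which each example is seen a single time (online).

```python
import random


class IncrementalLinear:
    """Model y ≈ w * x updated one mini-batch at a time."""

    def __init__(self, lr=0.1):
        self.w = 0.0
        self.lr = lr

    def partial_fit(self, batch):
        # Average squared-error gradient over the batch; the update
        # starts from the weight left by the previous batch.
        grad = sum((self.w * x - y) * x for x, y in batch) / len(batch)
        self.w -= self.lr * grad


def chunks(data, size):
    """Offline case: cut a stored dataset into mini-batches."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


def stream(n, rng):
    """Online case: simulated stream yielding one example at a time."""
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        yield [(x, 3.0 * x)]  # assumed noise-free target, true w = 3


rng = random.Random(0)
dataset = [(x, 3.0 * x) for x in (rng.uniform(-1.0, 1.0) for _ in range(200))]

# Offline incremental learning: the data already exist, are revisited
# over several epochs, and only one chunk is processed at a time.
offline = IncrementalLinear()
for epoch in range(20):
    for batch in chunks(dataset, 10):
        offline.partial_fit(batch)

# Online learning: the same update, but each example arrives over time
# and is seen exactly once.
online = IncrementalLinear()
for batch in stream(2000, rng):
    online.partial_fit(batch)

print(round(offline.w, 4), round(online.w, 4))
```

        Only the data source differs between the two loops, which is the sense in which online learning is a special case of incremental learning here.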

References:

[1] A short survey of Incremental Learning (增量学习) - Zhihu (zhihu.com)

[2] What is the difference between on-line learning and Incremental Learning? - Zhihu

[3] Several kinds of learning in machine learning, seen from different angles - Mr.Scofield, CSDN blog

[4] Is there a difference between on-line learning, incremental learning and sequential learning? - Data Science Stack Exchange