Two Probabilistic Models in Machine Learning
Discriminative models and generative models are two kinds of probabilistic models in machine learning, used to model the probability distribution of the training data. Because the two are often confused in practice, I have consulted some references and summarize their differences here.
Let’s open the topic with a passage from Stack Overflow:
Let’s say you have input data x and you want to classify the data into labels y. A generative model learns the joint probability distribution p(x, y) and a discriminative model learns the conditional probability distribution p(y|x), which you should read as “the probability of y given x”. Here’s a really simple example. Suppose you have the following data in the form (x, y): (1,0), (1,0), (2,0), (2,1).
Then p(x, y) is:

      y=0   y=1
x=1   1/2   0
x=2   1/4   1/4

and p(y|x) is:

      y=0   y=1
x=1   1     0
x=2   1/2   1/2
If you take a few minutes to stare at those two matrices, you will understand the difference between the two probability distributions. The distribution
p(y|x) is the natural distribution for classifying a given example x into a class y, which is why algorithms that model this directly are called discriminative algorithms. Generative algorithms model p(x, y), which can be transformed into p(y|x) by applying Bayes’ rule and then used for classification. However, the distribution p(x, y) can also be used for other purposes. For example, you could use p(x, y) to generate likely (x, y) pairs. From the description above you might be thinking that generative models are more generally useful and therefore better, but it’s not as simple as that. The overall gist is that discriminative models generally outperform generative models in classification tasks.
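The two tables in the quoted example can be computed directly from the four data points. A minimal sketch (the data is the toy sample from the answer above):

```python
from collections import Counter

# The four (x, y) samples from the example above
data = [(1, 0), (1, 0), (2, 0), (2, 1)]

# Joint distribution p(x, y): fraction of samples equal to each pair
counts = Counter(data)
n = len(data)
p_joint = {pair: c / n for pair, c in counts.items()}

# Conditional distribution p(y|x): renormalize the joint within each x
p_x = Counter(x for x, _ in data)
p_cond = {(x, y): c / p_x[x] for (x, y), c in counts.items()}

print(p_joint)  # {(1, 0): 0.5, (2, 0): 0.25, (2, 1): 0.25}
print(p_cond)   # {(1, 0): 1.0, (2, 0): 0.5, (2, 1): 0.5}
```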
Generative models are used in machine learning for either modeling data directly (i.e., modeling observations drawn from a probability density function), or as an intermediate step to forming a conditional probability density function. A conditional distribution can be formed from a generative model through Bayes’ rule.
In other words, a generative model models the joint probability distribution p(x, y) of the sample data.
Although this topic is quite old, I think it’s worth adding this important distinction. In practice the models are used as follows.
In discriminative models, to predict the label y from the training example x, you must evaluate:

f(x) = argmax_y p(y|x)
Which merely chooses the most likely class y given x. It’s as if we were trying to model the decision boundary between the classes. This behavior is very clear in neural networks, where the computed weights can be seen as a complexly shaped curve isolating the elements of one class in the space.
Now, using Bayes’ rule, let’s replace the p(y|x) in the equation with p(x|y)p(y)/p(x). Since we are interested only in the argmax, we can drop the denominator, which is the same for every y:

f(x) = argmax_y p(x|y)p(y)
Which is the equation you use in generative models. While in the first case you had the conditional probability distribution p(y|x), here you have the joint probability distribution p(x, y) = p(x|y)p(y).
With the joint probability distribution function, given a y, you can calculate (“generate”) its respective x. For this reason these are called generative models.
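The “drop the denominator” step can be seen concretely with the toy data from the first quote: classifying from the joint p(x, y) gives the same argmax as classifying from p(y|x), since p(x) is constant in y.

```python
from collections import Counter

# Same toy sample as in the first quoted answer
data = [(1, 0), (1, 0), (2, 0), (2, 1)]
counts = Counter(data)
n = len(data)
labels = [0, 1]

def predict_from_joint(x):
    # argmax_y p(x, y); dividing by p(x) would not change the winner,
    # so this equals argmax_y p(y|x)
    return max(labels, key=lambda y: counts.get((x, y), 0) / n)

print(predict_from_joint(1))  # 0, since p(x=1, y=0) = 1/2 > p(x=1, y=1) = 0
```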
Imagine your task is to classify speech by language. You can do it either by:
1) Learning each language and then classifying it using the knowledge you just gained
OR
2) Determining the difference in the linguistic models without learning the languages and then classifying the speech.
The first is the generative approach and the second is the discriminative approach.
Examples of discriminative models used in machine learning include:
- Logistic regression
- Support vector machines
- Boosting (meta-algorithm)
- Conditional random fields
- Linear regression
- Neural networks
Examples of generative models include:
- Gaussian mixture model and other types of mixture model
- Hidden Markov model
- Probabilistic context-free grammar
- Naive Bayes
- Averaged one-dependence estimators
- Latent Dirichlet allocation
- Restricted Boltzmann machine
2015-8-31 艺少
Supplementary content: 2015-9-1
Using a discriminative model to model p(y|x) (note: x is the input, y is a continuous target):
1. Choose a parametric form for p(y|x), for example a normal distribution p(y|x) = N(y; μ(x), σ²).
2. Link the mean of the normal distribution to the input through a linear model, μ(x) = wᵀx.
3. With the parameter vector w, the parameters can be estimated by maximum a posteriori (MAP) estimation or maximum likelihood estimation (MLE).
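A minimal sketch of these three steps, assuming the linear-Gaussian setup above (the data points are made up for illustration). Maximizing the Gaussian likelihood in w reduces to least squares, which for a one-parameter model y ≈ w·x has a closed form:

```python
# Hypothetical training data for y = w*x plus Gaussian noise
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 1.1, 1.9, 3.2]

# MLE for p(y|x) = N(y; w*x, sigma^2) is least squares:
# w = sum(x*y) / sum(x*x)
w = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
print(w)  # close to 1 for this nearly linear data
```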
Using a generative model to model p(x, y):
1. Choose a parametric form for p(x, y), for example a joint normal distribution.
2. Link the mean of the normal distribution to the inputs through the same linear model.
3. With the parameter vector w, the parameters can be estimated by maximum a posteriori (MAP) estimation or maximum likelihood estimation (MLE).
The conditional p(y|x) = p(x, y)/p(x) is then obtained from the joint distribution via Bayes’ rule.
In this example, if maximum likelihood estimation is used, the two models produce the same normal distribution. This is mainly because x and w are both continuous, are linked by a linear model, and both use normal distributions to represent uncertainty. If MAP (maximum a posteriori) estimation is used instead, the two models give different results.
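The MAP-versus-MLE difference can be seen even in a tiny example (hypothetical data; a zero-mean Gaussian prior on w turns MLE’s least squares into ridge regression):

```python
# Hypothetical data, same linear-Gaussian setup as above
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 1.1, 1.9, 3.2]

sxy = sum(x * y for x, y in zip(xs, ys))
sxx = sum(x * x for x in xs)

w_mle = sxy / sxx          # maximum likelihood estimate
lam = 1.0                  # prior strength (assumed value)
w_map = sxy / (sxx + lam)  # MAP with a zero-mean Gaussian prior on w

print(w_mle != w_map)  # True: the prior shrinks the MAP estimate toward 0
```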
The comparison above was made in a continuous regression setting; below, the same comparison is made for discrete classification, where the distinction between the two models is much more pronounced.
Using a discriminative model to model p(y|x) (note: y is now a discrete binary label):
1. Choose a parametric form for p(y|x), for example a Bernoulli distribution p(y|x) = Bernoulli(y; μ(x)).
2. Link the parameter of the Bernoulli distribution to the input, for example through a sigmoid of a linear function, μ(x) = σ(wᵀx).
3. With the parameter vector w, the parameters can be estimated by MAP or MLE.
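A hedged sketch of these three steps, assuming the sigmoid link above (the weight value is arbitrary):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def p_y_given_x(y, x, w):
    # Bernoulli parameter mu(x) linked to the input through a sigmoid
    mu = sigmoid(w * x)
    return mu if y == 1 else 1.0 - mu

# At x = 0 the sigmoid gives 0.5: the decision boundary
print(p_y_given_x(1, 0.0, 2.0))  # 0.5
```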
Using a generative model to model p(x, y) = p(y)p(x|y):
1. Choose a parametric form for each factor, for example a Bernoulli distribution for p(y) and a normal distribution for p(x|y).
2. Let the discrete binary value y select the mean of the normal distribution, p(x|y) = N(x; μ_y, σ²).
3. With the parameters (the class prior together with μ_0, μ_1, σ²), the parameters can be estimated by MAP or MLE.
The comparison of the two can be summarized as follows: for a generative model, the learning algorithm estimates the model
p(x|y), while the inference algorithm combines it with the prior probability p(y) to obtain the joint probability density and then applies Bayes’ rule to compute the posterior p(y|x).
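The learning/inference split described above can be sketched for a one-dimensional version of this generative classifier (all parameter values below are made up):

```python
import math

# "Learning" output: class-conditional Gaussians p(x|y) and prior p(y)
params = {0: (0.0, 1.0), 1: (2.0, 1.0)}  # class -> (mean, std), assumed values
prior = {0: 0.5, 1: 0.5}

def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def posterior(y, x):
    # "Inference": p(y|x) = p(x|y)p(y) / sum_y' p(x|y')p(y')  (Bayes' rule)
    joint = {c: gaussian_pdf(x, m, s) * prior[c] for c, (m, s) in params.items()}
    return joint[y] / sum(joint.values())

print(posterior(1, 1.0))  # 0.5: x = 1 lies halfway between the two class means
```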
2015-9-1 艺少