A Brief Introduction to Logistic Regression and Least-Squares Probabilistic Classification, with Examples
Estimating parameters from a set of statistics plays one of the key roles in statistical inference. In this article, we will make full use of such methods.
Pattern recognition works by learning the posterior probability p(y|x) that a pattern x belongs to class y. Given a pattern x, we assign it to the class whose posterior probability is largest, i.e.

$$\hat{y}=\mathop{\arg\max}_{y=1,\dots,c}\,p(y\mid x)$$
In the logistic regression algorithm, the posterior probability is modeled by a log-linear function:
$$q(y\mid x,\theta)=\frac{\exp\left(\sum_{j=1}^{b}\theta_j^{(y)}\phi_j(x)\right)}{\sum_{y'=1}^{c}\exp\left(\sum_{j=1}^{b}\theta_j^{(y')}\phi_j(x)\right)}$$

Note that the denominator is a normalization term, which makes the outputs sum to one over the c classes. Logistic regression is then defined by the following optimization problem:
$$\max_{\theta}\sum_{i=1}^{m}\log q(y_i\mid x_i,\theta)$$

We can solve it by stochastic gradient ascent:
1. Initialize $\theta=(\theta^{(1)T},\dots,\theta^{(c)T})^T$, e.g. randomly.
2. Pick a training sample $(x_i,y_i)$ at random.
3. Update $\theta$ along the gradient-ascent direction:
$$\theta^{(y)}\leftarrow\theta^{(y)}+\epsilon\nabla_y J_i(\theta),\quad y=1,\dots,c$$
where
$$\nabla_y J_i(\theta)=-\frac{\exp\left(\theta^{(y)T}\phi(x_i)\right)\phi(x_i)}{\sum_{y'=1}^{c}\exp\left(\theta^{(y')T}\phi(x_i)\right)}+\begin{cases}\phi(x_i)&(y=y_i)\\0&(y\neq y_i)\end{cases}$$
4. Repeat steps 2 and 3 until θ reaches the desired precision.
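The update rule above can be sketched in NumPy (a hypothetical illustration; the article's own examples use MATLAB, and the function and variable names here are my own):

```python
import numpy as np

def sgd_step(theta, phi_xi, yi, eps=0.1):
    """One stochastic gradient-ascent step on log q(y_i | x_i, theta).

    theta  : (c, b) matrix whose row y is theta^(y)
    phi_xi : (b,) basis-function vector phi(x_i)
    yi     : index of the true class of x_i
    """
    scores = theta @ phi_xi                 # theta^(y)^T phi(x_i) for each class
    scores = scores - scores.max()          # stabilize exp; leaves q unchanged
    q = np.exp(scores)
    q = q / q.sum()                         # current posterior q(y | x_i, theta)
    grad = -np.outer(q, phi_xi)             # first gradient term, all classes at once
    grad[yi] += phi_xi                      # extra phi(x_i) for the true class
    return theta + eps * grad
```

Repeating this step over randomly drawn samples climbs the log-likelihood; for a small enough step size ε, each update raises the posterior probability of the sample's true class.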
Take the Gaussian kernel model as an example:
$$q(y\mid x,\theta)\propto\exp\left(\sum_{j=1}^{n}\theta_j K(x,x_j)\right)$$

If you are not familiar with the Gaussian kernel model, refer to this article. A MATLAB implementation:

```matlab
n=90; c=3; y=ones(n/c,1)*(1:c); y=y(:);           % 3 classes, 30 samples each
x=randn(n/c,c)+repmat(linspace(-3,3,c),n/c,1); x=x(:);
hh=2*1^2; t0=randn(n,c);                          % kernel width 2h^2; random init
for o=1:n*1000
  i=ceil(rand*n); yi=y(i);                        % pick a sample at random
  ki=exp(-(x-x(i)).^2/hh);                        % kernel basis values phi(x_i)
  ci=exp(ki'*t0);
  t=t0-0.1*(ki*ci)/(1+sum(ci));                   % gradient-ascent update
  t(:,yi)=t(:,yi)+0.1*ki;
  if norm(t-t0)<0.000001, break, end              % stop at convergence
  t0=t;
end
N=100; X=linspace(-5,5,N)';
K=exp(-(repmat(X.^2,1,n)+repmat(x.^2',N,1)-2*X*x')/hh);
figure(1); clf; hold on; axis([-5,5,-0.3,1.8]);
C=exp(K*t); C=C./repmat(sum(C,2),1,c);            % posterior over the test grid
plot(X,C(:,1),'b-'); plot(X,C(:,2),'r--'); plot(X,C(:,3),'g:');
plot(x(y==1),-0.1*ones(n/c,1),'bo');
plot(x(y==2),-0.2*ones(n/c,1),'rx');
plot(x(y==3),-0.1*ones(n/c,1),'gv');
legend('q(y=1|x)','q(y=2|x)','q(y=3|x)');
```
In the least-squares (LS) probabilistic classifier, a linearly parameterized model is used to express the posterior probability:
$$q(y\mid x,\theta^{(y)})=\sum_{j=1}^{b}\theta_j^{(y)}\phi_j(x)=\theta^{(y)T}\phi(x),\quad y=1,\dots,c$$

These models depend on parameters $\theta^{(y)}=(\theta_1^{(y)},\dots,\theta_b^{(y)})^T$, one vector for each class y, which differs from the single shared parameterization used by the logistic classifier. Learning these models means minimizing the following quadratic error:

$$\begin{aligned}J_y(\theta^{(y)})&=\frac{1}{2}\int\left(q(y\mid x,\theta^{(y)})-p(y\mid x)\right)^2p(x)\,dx\\&=\frac{1}{2}\int q(y\mid x,\theta^{(y)})^2p(x)\,dx-\int q(y\mid x,\theta^{(y)})p(y\mid x)p(x)\,dx+\frac{1}{2}\int p(y\mid x)^2p(x)\,dx\end{aligned}$$

where p(x) denotes the probability density of the training inputs $\{x_i\}_{i=1}^n$. By Bayes' formula,

$$p(y\mid x)p(x)=p(x,y)=p(x\mid y)p(y)$$

Hence $J_y$ can be reformulated as
$$J_y(\theta^{(y)})=\frac{1}{2}\int q(y\mid x,\theta^{(y)})^2p(x)\,dx-\int q(y\mid x,\theta^{(y)})p(x\mid y)p(y)\,dx+\frac{1}{2}\int p(y\mid x)^2p(x)\,dx$$

Note that the first and second terms above are expectations with respect to p(x) and p(x|y) respectively, which are often impossible to compute directly. The last term is independent of θ and can therefore be omitted.
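Although the exact integrals are unavailable, expectations over p(x) can be approximated by averages over samples drawn from it. As a sanity check, here is a toy NumPy sketch (not from the original article; the model q is invented purely for illustration) showing that the sample average of $q(y\mid x_i)^2$ converges to $\int q(y\mid x)^2 p(x)\,dx$:

```python
import numpy as np

rng = np.random.default_rng(0)

def q(xv):
    """A toy linear posterior model q(y|x) = 0.3 + 0.1 x (illustration only)."""
    return 0.3 + 0.1 * xv

# p(x): standard normal. A huge sample stands in for the exact integral.
truth = np.mean(q(rng.normal(size=1_000_000)) ** 2)   # ~ E_{p(x)}[ q(y|x)^2 ]
# A training-set-sized average approximates the same quantity.
estimate = np.mean(q(rng.normal(size=5_000)) ** 2)    # (1/n) sum_i q(y|x_i)^2
```

For this toy model both values come out near the analytic answer $0.3^2+0.1^2\cdot\mathbb{E}[x^2]=0.10$, which is what justifies the empirical averages used next.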
Since p(x|y) is the probability density of samples x belonging to class y, the first two terms can be estimated by the sample averages

$$\frac{1}{n}\sum_{i=1}^{n}q(y\mid x_i,\theta^{(y)})^2,\qquad\frac{p(y)}{n_y}\sum_{i:y_i=y}q(y\mid x_i,\theta^{(y)})$$

where $n_y$ is the number of training samples of class y. Approximating p(y) by $n_y/n$ and introducing a regularization term, we obtain the objective

$$\hat{J}_y(\theta^{(y)})=\frac{1}{2n}\sum_{i=1}^{n}q(y\mid x_i,\theta^{(y)})^2-\frac{1}{n}\sum_{i:y_i=y}q(y\mid x_i,\theta^{(y)})+\frac{\lambda}{2n}\|\theta^{(y)}\|^2$$

Let $\pi^{(y)}=(\pi_1^{(y)},\dots,\pi_n^{(y)})^T$ with

$$\pi_i^{(y)}=\begin{cases}1&(y_i=y)\\0&(y_i\neq y)\end{cases}$$

Then
$$\hat{J}_y(\theta^{(y)})=\frac{1}{2n}\theta^{(y)T}\Phi^T\Phi\,\theta^{(y)}-\frac{1}{n}\theta^{(y)T}\Phi^T\pi^{(y)}+\frac{\lambda}{2n}\|\theta^{(y)}\|^2$$

where $\Phi$ is the $n\times b$ design matrix with $\Phi_{ij}=\phi_j(x_i)$.
This is a convex optimization problem, so we can obtain the analytic solution by setting the first derivative to zero:
$$\hat{\theta}^{(y)}=(\Phi^T\Phi+\lambda I)^{-1}\Phi^T\pi^{(y)}$$
To avoid negative estimates of the posterior probability, we clip negative outputs to zero and renormalize:

$$\hat{p}(y\mid x)=\frac{\max\left(0,\hat{\theta}^{(y)T}\phi(x)\right)}{\sum_{y'=1}^{c}\max\left(0,\hat{\theta}^{(y')T}\phi(x)\right)}$$
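Putting the closed-form solution and the clipped normalization together, here is a hypothetical NumPy sketch (function and variable names are my own; `Phi` is the n×b design matrix of basis values):

```python
import numpy as np

def lspc_fit(Phi, y, n_classes, lam=0.1):
    """Analytic LSPC solution: theta^(y) = (Phi^T Phi + lam I)^{-1} Phi^T pi^(y)."""
    n, b = Phi.shape
    A = Phi.T @ Phi + lam * np.eye(b)          # shared across all classes
    Theta = np.empty((n_classes, b))
    for c in range(n_classes):
        pi = (y == c).astype(float)            # indicator vector pi^(y)
        Theta[c] = np.linalg.solve(A, Phi.T @ pi)
    return Theta

def lspc_posterior(Theta, phi_x):
    """Clip negative outputs to zero, then normalize over the classes."""
    p = np.maximum(0.0, Theta @ phi_x)
    s = p.sum()
    return p / s if s > 0 else np.full(len(Theta), 1.0 / len(Theta))
```

With a Gaussian kernel basis, `Phi` becomes the n×n kernel matrix, matching the MATLAB example that follows.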
Again, take the Gaussian kernel model as an example:
```matlab
n=90; c=3; y=ones(n/c,1)*(1:c); y=y(:);           % 3 classes, 30 samples each
x=randn(n/c,c)+repmat(linspace(-3,3,c),n/c,1); x=x(:);
hh=2*1^2; x2=x.^2; l=0.1; N=100; X=linspace(-5,5,N)';
k=exp(-(repmat(x2,1,n)+repmat(x2',n,1)-2*x*x')/hh);   % train-train kernel matrix
K=exp(-(repmat(X.^2,1,n)+repmat(x2',N,1)-2*X*x')/hh); % test-train kernel matrix
for yy=1:c
  yk=(y==yy); ky=k(:,yk);
  ty=(ky'*ky+l*eye(sum(yk)))\(ky'*yk);            % analytic LS solution per class
  Kt(:,yy)=max(0,K(:,yk)*ty);                     % clip negative outputs
end
ph=Kt./repmat(sum(Kt,2),1,c);                     % normalize over the classes
figure(1); clf; hold on; axis([-5,5,-0.3,1.8]);
plot(X,ph(:,1),'b-'); plot(X,ph(:,2),'r--'); plot(X,ph(:,3),'g:');
plot(x(y==1),-0.1*ones(n/c,1),'bo');
plot(x(y==2),-0.2*ones(n/c,1),'rx');
plot(x(y==3),-0.1*ones(n/c,1),'gv');
legend('q(y=1|x)','q(y=2|x)','q(y=3|x)');
```
Logistic regression is good at dealing with small sample sets, since it works in a simple way. However, once the number of samples becomes large, it is better to turn to the least-squares probabilistic classifier, whose solution is analytic.