A Side-by-Side Cheat Sheet of Machine Learning Algorithms in Python and R
In Think and Grow Rich, Napoleon Hill tells the story of Darby, who dug for gold for years and then walked away from the treasure when he was only a step from the vein.
Douban Read link for the Chinese edition of Think and Grow Rich:
http://read.douban.com/reader/ebook/10954762/
(The retelling here was revised against that edition.)
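I don't know whether the story is true, but I do know there are plenty of "data Darbys" around me. These people understand what machine learning is for and how to run it, yet they bring only two or three algorithms to any research problem. They never update themselves with better algorithms and techniques, either because they are too stubborn or because they are simply putting in time without trying to improve.
Like Darby, they miss their chance just as they are closing in on the finish. In the end they give up on machine learning, with excuses such as the heavy computation, the difficulty, or not being able to set the right thresholds to tune a model. What is the point of that? Have you met people like this?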
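Today's cheat sheet aims to change how these "data Darbys" feel about machine learning and turn them into hands-on advocates. It collects the 10 most commonly used machine learning algorithms, each with Python and R code.
Since machine learning methods are used more and more in modeling, the cheat sheet below can serve as a code guide to help you put these algorithms to work. Good luck!
For the super-lazy data Darbys, we will make your life even easier. You can download the PDF version of the cheat sheet here and copy and paste the code directly.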
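Machine learning algorithms by type

Supervised learning | Unsupervised learning | Reinforcement learning
Decision tree, k-nearest neighbors, random forest, logistic regression | Apriori algorithm, k-means, hierarchical clustering | Markov decision process, reinforcement learning algorithm (Q-learning)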
Linear regression

Python:

    # Import other necessary libraries such as pandas, numpy...
    from sklearn import linear_model

    # Load train and test datasets
    # Identify feature and response variable(s); values must be numeric NumPy arrays
    x_train = input_variables_values_training_datasets
    y_train = target_variables_values_training_datasets
    x_test = input_variables_values_test_datasets

    # Create linear regression object
    linear = linear_model.LinearRegression()

    # Train the model using the training set and check the score (R^2)
    linear.fit(x_train, y_train)
    linear.score(x_train, y_train)

    # Equation coefficients and intercept
    print('Coefficient: \n', linear.coef_)
    print('Intercept: \n', linear.intercept_)

    # Predict output
    predicted = linear.predict(x_test)

R:

    # Load train and test datasets
    # Identify feature and response variable(s); values must be numeric
    # (x_train and x_test are assumed to be data frames)
    x_train <- input_variables_values_training_datasets
    y_train <- target_variables_values_training_datasets
    x_test <- input_variables_values_test_datasets
    x <- cbind(x_train, y_train)

    # Train the model using the training set and check the summary
    linear <- lm(y_train ~ ., data = x)
    summary(linear)

    # Predict output
    predicted <- predict(linear, x_test)
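To make the numbers printed by the linear regression snippet less opaque, here is a minimal pure-Python sketch of ordinary least squares for a single feature. It is not how scikit-learn implements `LinearRegression`; the helper names `fit_line` and `r_squared` and the toy data are assumptions made for illustration. The R² value computed here is the same kind of quantity that `linear.score` reports.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination: 1 - residual SS / total SS."""
    y_bar = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical toy data lying exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
score = r_squared(xs, ys, slope, intercept)
print(slope, intercept, score)
```

On real data the score falls below 1 as the points scatter around the line, which is why checking it on the training set alone says little about how the model will do on `x_test`.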
Logistic regression

Python:

    # Import library
    from sklearn.linear_model import LogisticRegression

    # Assumes you have X (predictors) and y (target) for the training
    # data set and x_test (predictors) of the test data set

    # Create logistic regression object
    model = LogisticRegression()

    # Train the model using the training set and check the score
    model.fit(X, y)
    model.score(X, y)

    # Equation coefficients and intercept
    print('Coefficient: \n', model.coef_)
    print('Intercept: \n', model.intercept_)

    # Predict output
    predicted = model.predict(x_test)

R:

    x <- cbind(x_train, y_train)

    # Train the model using the training set and check the summary
    logistic <- glm(y_train ~ ., data = x, family = 'binomial')
    summary(logistic)

    # Predict output; type = "response" returns probabilities
    # (the default is the log-odds scale)
    predicted <- predict(logistic, x_test, type = "response")
Decision tree

Python:

    # Import library
    # Import other necessary libraries such as pandas, numpy...
    from sklearn import tree

    # Assumes you have X (predictors) and y (target) for the training
    # data set and x_test (predictors) of the test data set

    # Create tree object for classification; the criterion can be
    # 'gini' or 'entropy' (information gain), and the default is 'gini'
    model = tree.DecisionTreeClassifier(criterion='gini')
    # model = tree.DecisionTreeRegressor()  # for regression

    # Train the model using the training set and check the score
    model.fit(X, y)
    model.score(X, y)

    # Predict output
    predicted = model.predict(x_test)

R:

    # Import library
    library(rpart)
    x <- cbind(x_train, y_train)

    # Grow tree
    fit <- rpart(y_train ~ ., data = x, method = "class")
    summary(fit)

    # Predict output; type = "class" returns labels
    # (the default for a classification tree is class probabilities)
    predicted <- predict(fit, x_test, type = "class")
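The `criterion` argument in the decision tree snippet picks the impurity measure used to score candidate splits. As a rough illustration of what 'gini' and 'entropy' compute for one node, here is a small pure-Python sketch; the function names and the label lists are hypothetical, and this is not scikit-learn's internal code.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits of the class proportions."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


# Hypothetical node contents: one pure node, one evenly mixed node
pure = ["a", "a", "a", "a"]
mixed = ["a", "a", "b", "b"]

print(gini(pure), entropy(pure))
print(gini(mixed), entropy(mixed))
```

Both measures are zero for a pure node and largest for an even mix, so a tree grown with either criterion prefers splits that leave its child nodes as pure as possible.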
Support vector machine

Python:

    # Import library
    from sklearn import svm

    # Assumes you have X (predictors) and y (target) for the training
    # data set and x_test (predictors) of the test data set

    # Create SVM classification object; it has many options,
    # this is the simplest form for classification
    model = svm.SVC()

    # Train the model using the training set and check the score
    model.fit(X, y)
    model.score(X, y)

    # Predict output
    predicted = model.predict(x_test)

R:

    # Import library
    library(e1071)
    x <- cbind(x_train, y_train)

    # Fitting model
    fit <- svm(y_train ~ ., data = x)
    summary(fit)

    # Predict output
    predicted <- predict(fit, x_test)
Naive Bayes

Python:

    # Import library
    from sklearn.naive_bayes import GaussianNB

    # Assumes you have X (predictors) and y (target) for the training
    # data set and x_test (predictors) of the test data set

    # Create Gaussian naive Bayes object; other variants exist for other
    # feature distributions, such as MultinomialNB and BernoulliNB
    model = GaussianNB()

    # Train the model using the training set
    model.fit(X, y)

    # Predict output
    predicted = model.predict(x_test)

R:

    # Import library
    library(e1071)
    x <- cbind(x_train, y_train)

    # Fitting model
    fit <- naiveBayes(y_train ~ ., data = x)
    summary(fit)

    # Predict output
    predicted <- predict(fit, x_test)
k-nearest neighbors

Python:

    # Import library
    from sklearn.neighbors import KNeighborsClassifier

    # Assumes you have X (predictors) and y (target) for the training
    # data set and x_test (predictors) of the test data set

    # Create k-nearest-neighbors classifier object
    # (the default value for n_neighbors is 5)
    model = KNeighborsClassifier(n_neighbors=6)

    # Train the model using the training set
    model.fit(X, y)

    # Predict output
    predicted = model.predict(x_test)

R:

    # Import library; knn() lives in the class package
    library(class)

    # knn() fits and predicts in a single call:
    # cl holds the training labels, k the number of neighbors
    predicted <- knn(train = x_train, test = x_test, cl = y_train, k = 5)
    summary(predicted)
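k-nearest neighbors has no real training step: a prediction is just a majority vote among the closest training points. The sketch below shows that idea in plain Python (3.8 or later, for `math.dist`). The function name `knn_predict` and the toy points are assumptions for illustration, and library implementations add faster neighbor search on top of this.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=5):
    """Label the query by majority vote among its k nearest training points."""
    # Sort training indices by Euclidean distance to the query
    order = sorted(range(len(train_points)),
                   key=lambda i: dist(train_points[i], query))
    nearest_labels = [train_labels[i] for i in order[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups
train_points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = ["low", "low", "low", "high", "high", "high"]

label_near_origin = knn_predict(train_points, train_labels, (0.5, 0.5), k=3)
label_far = knn_predict(train_points, train_labels, (5.5, 5.5), k=3)
print(label_near_origin, label_far)
```

Because every prediction scans the whole training set, the cost grows with the amount of data, which is one reason the choice of `k` and of the distance measure deserves some experimentation.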
k-means (hard clustering)

Python:

    # Import library
    from sklearn.cluster import KMeans

    # Assumes you have X (attributes) for the training data set
    # and x_test (attributes) of the test data set

    # Create k-means clustering object
    model = KMeans(n_clusters=3, random_state=0)

    # Fit the model on the training set
    model.fit(X)

    # Predict output: the nearest cluster for each test point
    predicted = model.predict(x_test)

R:

    # kmeans() ships with the stats package, which is loaded by default
    fit <- kmeans(X, 3)  # 3-cluster solution
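For readers who want to see what a k-means fit does between the call and the result, here is a bare-bones sketch of the classic assign-then-update loop (Lloyd's algorithm) in plain Python. The names `assign` and `kmeans`, the explicit starting centers, and the toy points are assumptions for illustration; scikit-learn's `KMeans` chooses its own starting centers and is considerably more careful.

```python
from math import dist


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points]


def kmeans(points, centers, n_iter=10):
    """Run Lloyd's algorithm from the given starting centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(n_iter):
        labels = assign(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Move the center to the mean of its assigned points
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(old)  # leave an empty cluster's center in place
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers, assign(points, centers)


# Hypothetical toy data: two pairs of points far apart
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centers, labels = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
print(centers, labels)
```

The result depends on where the centers start, which is why the scikit-learn snippet fixes `random_state` to make runs repeatable.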
Random forest

Python:

    # Import library
    from sklearn.ensemble import RandomForestClassifier

    # Assumes you have X (predictors) and y (target) for the training
    # data set and x_test (predictors) of the test data set

    # Create random forest object
    model = RandomForestClassifier()

    # Train the model using the training set
    model.fit(X, y)

    # Predict output
    predicted = model.predict(x_test)

R:

    # Import library
    library(randomForest)
    x <- cbind(x_train, y_train)

    # Fitting model
    fit <- randomForest(y_train ~ ., data = x, ntree = 500)
    summary(fit)

    # Predict output
    predicted <- predict(fit, x_test)
Dimensionality reduction

Python:

    # Import library
    from sklearn import decomposition

    # Assumes you have training and test data sets as train and test

    # Create PCA object; k is the number of components to keep,
    # and the default is min(n_samples, n_features)
    pca = decomposition.PCA(n_components=k)
    # For factor analysis:
    # fa = decomposition.FactorAnalysis()

    # Reduce the dimension of the training data set using PCA
    train_reduced = pca.fit_transform(train)

    # Reduce the dimension of the test data set
    test_reduced = pca.transform(test)

R:

    # Import library
    library(stats)
    pca <- princomp(train, cor = TRUE)
    train_reduced <- predict(pca, train)
    test_reduced <- predict(pca, test)
GBDT (gradient boosted decision trees)

Python:

    # Import library
    from sklearn.ensemble import GradientBoostingClassifier

    # Assumes you have X (predictors) and y (target) for the training
    # data set and x_test (predictors) of the test data set

    # Create gradient boosting classifier object
    model = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,
                                       max_depth=1, random_state=0)

    # Train the model using the training set
    model.fit(X, y)

    # Predict output
    predicted = model.predict(x_test)

R:

    # Import library
    library(caret)
    x <- cbind(x_train, y_train)

    # Fitting model with repeated 4-fold cross-validation
    fitControl <- trainControl(method = "repeatedcv", number = 4, repeats = 4)
    fit <- train(y_train ~ ., data = x, method = "gbm",
                 trControl = fitControl, verbose = FALSE)

    # Predict output: probability of the second class
    predicted <- predict(fit, x_test, type = "prob")[, 2]
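Originally published: 2015-12-02
This article comes from Yunqi Community partner "Big Data Digest" (大数据文摘); for more information, follow the "BigDataDigest" WeChat official account.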