PySpark Gradient-Boosted Trees

Posted: 2023-09-14 09:09:29

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Thu Jun  7 18:15:30 2018

@author: luogan
"""

from pyspark.ml import Pipeline
from pyspark.ml.classification import GBTClassifier
from pyspark.ml.feature import StringIndexer, VectorIndexer
from pyspark.ml.evaluation import MulticlassClassificationEvaluator


from pyspark.sql import SparkSession

# Create (or reuse) the SparkSession that drives the DataFrame API.
spark = (
    SparkSession.builder
    .appName("dataFrame")
    .getOrCreate()
)


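# Load and parse the data file, converting it to a DataFrame.
# sample_libsvm_data.txt ships with the Spark distribution under data/mllib/;
# adjust the path to match your own installation.
data = spark.read.format("libsvm").load("/home/luogan/lg/softinstall/spark-2.2.0-bin-hadoop2.7/data/mllib/sample_libsvm_data.txt")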


# Index labels, adding metadata to the label column.
# Fit on whole dataset to include all labels in index.
labelIndexer = StringIndexer(inputCol="label", outputCol="indexedLabel").fit(data)
# Automatically identify categorical features, and index them.
# Set maxCategories so features with > 4 distinct values are treated as continuous.
featureIndexer = VectorIndexer(
    inputCol="features", outputCol="indexedFeatures", maxCategories=4
).fit(data)

# Split the data into training and test sets (30% held out for testing).
# No seed is set, so the split and the resulting test error vary between runs.
trainingData, testData = data.randomSplit([0.7, 0.3])

# Configure the GBT classifier; training itself happens in pipeline.fit().
gbt = GBTClassifier(labelCol="indexedLabel", featuresCol="indexedFeatures", maxIter=10)

# Chain indexers and GBT in a Pipeline
pipeline = Pipeline(stages=[labelIndexer, featureIndexer, gbt])

# Train model.  This also runs the indexers.
model = pipeline.fit(trainingData)

# Make predictions.
predictions = model.transform(testData)

# Select example rows to display.
predictions.select("prediction", "indexedLabel", "features").show(5)

# Select (prediction, true label) and compute test error
evaluator = MulticlassClassificationEvaluator(
    labelCol="indexedLabel", predictionCol="prediction", metricName="accuracy")
accuracy = evaluator.evaluate(predictions)
print("Test Error = %g" % (1.0 - accuracy))

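# The fitted GBT model is the third pipeline stage (index 2).
gbtModel = model.stages[2]
print(gbtModel)  # prints a one-line summary, not the individual trees

Output: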

+----------+------------+--------------------+
|prediction|indexedLabel|            features|
+----------+------------+--------------------+
|       1.0|         1.0|(692,[95,96,97,12...|
|       1.0|         1.0|(692,[121,122,123...|
|       1.0|         1.0|(692,[122,123,124...|
|       1.0|         1.0|(692,[124,125,126...|
|       1.0|         1.0|(692,[124,125,126...|
+----------+------------+--------------------+
only showing top 5 rows

Test Error = 0.0571429
GBTClassificationModel (uid=GBTClassifier_483a8eddb2c54d041fae) with 10 trees
