Spark Logistic Regression
2023-09-14 09:09:29
```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Thu Jun 7 16:28:03 2018
@author: luogan
"""
from pyspark.ml.linalg import Vectors
from pyspark.ml.classification import LogisticRegression
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("dataFrame") \
    .getOrCreate()

# Prepare training data from a list of (label, features) tuples.
training = spark.createDataFrame([
    (1.0, Vectors.dense([0.0, 1.1, 0.1])),
    (0.0, Vectors.dense([2.0, 1.0, -1.0])),
    (0.0, Vectors.dense([2.0, 1.3, 1.0])),
    (1.0, Vectors.dense([0.0, 1.2, -0.5]))], ["label", "features"])

# Create a LogisticRegression instance. This instance is an Estimator.
lr = LogisticRegression(maxIter=10, regParam=0.01)
# Print out the parameters, documentation, and any default values.
print("LogisticRegression parameters:\n" + lr.explainParams() + "\n")

# Learn a LogisticRegression model. This uses the parameters stored in lr.
model1 = lr.fit(training)

# Since model1 is a Model (i.e., a transformer produced by an Estimator),
# we can view the parameters it used during fit().
# This prints the parameter (name: value) pairs, where names are unique IDs
# for this LogisticRegression instance.
print("Model 1 was fit using parameters: ")
print(model1.extractParamMap())

# We may alternatively specify parameters using a Python dictionary as a paramMap.
paramMap = {lr.maxIter: 20}
paramMap[lr.maxIter] = 30  # Specify 1 Param, overwriting the original maxIter.
paramMap.update({lr.regParam: 0.1, lr.threshold: 0.55})  # Specify multiple Params.

# You can combine paramMaps, which are Python dictionaries.
paramMap2 = {lr.probabilityCol: "myProbability"}  # Change the output column name.
paramMapCombined = paramMap.copy()
paramMapCombined.update(paramMap2)

# Now learn a new model using the paramMapCombined parameters.
# paramMapCombined overrides all parameters set earlier via lr.set* methods.
model2 = lr.fit(training, paramMapCombined)
print("Model 2 was fit using parameters: ")
print(model2.extractParamMap())

# Prepare test data.
test = spark.createDataFrame([
    (1.0, Vectors.dense([-1.0, 1.5, 1.3])),
    (0.0, Vectors.dense([3.0, 2.0, -0.1])),
    (1.0, Vectors.dense([0.0, 2.2, -1.5]))], ["label", "features"])

# Make predictions on test data using the Transformer.transform() method.
# LogisticRegression.transform will only use the 'features' column.
# Note that model2.transform() outputs a "myProbability" column instead of the
# usual 'probability' column, since we renamed the lr.probabilityCol parameter.
prediction = model2.transform(test)
result = prediction.select("features", "label", "myProbability", "prediction") \
    .collect()

for row in result:
    print("features=%s, label=%s -> prob=%s, prediction=%s"
          % (row.features, row.label, row.myProbability, row.prediction))
```
Output:

```
Model 1 was fit using parameters:
{}
Model 2 was fit using parameters:
{}
features=[-1.0,1.5,1.3], label=1.0 -> prob=[0.05707302714277409,0.9429269728572259], prediction=1.0
features=[3.0,2.0,-0.1], label=0.0 -> prob=[0.9238522038489116,0.07614779615108845], prediction=0.0
features=[0.0,2.2,-1.5], label=1.0 -> prob=[0.1097278338217611,0.8902721661782389], prediction=1.0
```
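The param-map handling in the script is ordinary Python dictionary logic: a later assignment to the same `Param` key overwrites the earlier one, and the map passed to `fit()` takes precedence over values set on `lr` itself. A minimal sketch of that precedence, using plain string keys as stand-ins for the `Param` objects (no Spark required):

```python
# Stand-ins for the Param objects used as dict keys in the script above.
defaults = {"maxIter": 10, "regParam": 0.01}  # values set via the constructor / lr.set*

param_map = {"maxIter": 20}
param_map["maxIter"] = 30                     # overwrites the earlier 20
param_map.update({"regParam": 0.1, "threshold": 0.55})

param_map2 = {"probabilityCol": "myProbability"}

combined = param_map.copy()                   # copy so param_map is left untouched
combined.update(param_map2)

# fit(training, combined) then behaves as if the estimator's params were:
effective = {**defaults, **combined}          # combined wins over defaults
print(effective)
```

This is why `model2` ends up trained with `maxIter=30` and `regParam=0.1` rather than the constructor's `maxIter=10` and `regParam=0.01`.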