Spark (Python): creating an RDD from in-memory data with sc.parallelize:
myData = ["Alice","Carlos","Frank","Barbara"]
myRdd = sc.parallelize(myData)
myRdd.take(2)
----
In [52]: myData = ["Alice","Carlos","Frank","Barbara"]
In [53]: myRdd = sc.parallelize(myData)
In [54]: myRdd.take(2)
17/09/24 02:40:10 INFO spark.SparkContext: Starting job: runJob at PythonRDD.scala:393
17/09/24 02:40:10 INFO scheduler.DAGScheduler: Got job 5 (runJob at PythonRDD.scala:393) with 1 output partitions
17/09/24 02:40:10 INFO scheduler.DAGScheduler: Final stage: ResultStage 5 (runJob at PythonRDD.scala:393)
17/09/24 02:40:10 INFO scheduler.DAGScheduler: Parents of final stage: List()
17/09/24 02:40:10 INFO scheduler.DAGScheduler: Missing parents: List()
17/09/24 02:40:10 INFO scheduler.DAGScheduler: Submitting ResultStage 5 (PythonRDD[32] at RDD at PythonRDD.scala:43), which has no missing parents
17/09/24 02:40:10 INFO storage.MemoryStore: Block broadcast_16 stored as values in memory (estimated size 3.2 KB, free 1767.1 KB)
17/09/24 02:40:10 INFO storage.MemoryStore: Block broadcast_16_piece0 stored as bytes in memory (estimated size 2.2 KB, free 1769.3 KB)
17/09/24 02:40:10 INFO storage.BlockManagerInfo: Added broadcast_16_piece0 in memory on localhost:33950 (size: 2.2 KB, free: 208.7 MB)
17/09/24 02:40:10 INFO spark.SparkContext: Created broadcast 16 from broadcast at DAGScheduler.scala:1006
17/09/24 02:40:10 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 5 (PythonRDD[32] at RDD at PythonRDD.scala:43)
17/09/24 02:40:10 INFO scheduler.TaskSchedulerImpl: Adding task set 5.0 with 1 tasks
17/09/24 02:40:10 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 5.0 (TID 5, localhost, partition 0,PROCESS_LOCAL, 2028 bytes)
17/09/24 02:40:10 INFO executor.Executor: Running task 0.0 in stage 5.0 (TID 5)
17/09/24 02:40:11 INFO python.PythonRunner: Times: total = 41, boot = 20, init = 14, finish = 7
17/09/24 02:40:11 INFO executor.Executor: Finished task 0.0 in stage 5.0 (TID 5). 979 bytes result sent to driver
17/09/24 02:40:11 INFO scheduler.DAGScheduler: ResultStage 5 (runJob at PythonRDD.scala:393) finished in 0.423 s
17/09/24 02:40:11 INFO scheduler.DAGScheduler: Job 5 finished: runJob at PythonRDD.scala:393, took 0.648315 s
17/09/24 02:40:11 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 5.0 (TID 5) in 423 ms on localhost (1/1)
17/09/24 02:40:11 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks have all completed, from pool
Out[54]: ['Alice', 'Carlos']
In [55]:
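Without a live SparkContext at hand, the semantics of `parallelize` and `take` can be sketched in plain Python. The `LocalRDD` class below is a hypothetical stand-in for illustration only, not Spark's actual implementation: `parallelize(data, numSlices)` splits the list into partitions, and `take(n)` scans partitions in order until n elements have been collected.

```python
# Hypothetical plain-Python analogue of sc.parallelize / RDD.take,
# for illustrating the semantics only (not Spark's implementation).

class LocalRDD:
    def __init__(self, data, num_slices=2):
        # "parallelize": split the input list into roughly equal partitions
        step = max(1, -(-len(data) // num_slices))  # ceiling division
        self.partitions = [data[i:i + step] for i in range(0, len(data), step)]

    def take(self, n):
        # "take": walk partitions in order until n elements are collected
        out = []
        for part in self.partitions:
            for item in part:
                out.append(item)
                if len(out) == n:
                    return out
        return out

myData = ["Alice", "Carlos", "Frank", "Barbara"]
rdd = LocalRDD(myData, num_slices=2)
print(rdd.take(2))  # ['Alice', 'Carlos'], matching the Spark output above
```

Like the real `take`, this returns only the first n elements rather than materializing the whole dataset, which is why `take` is a cheap way to inspect an RDD.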