Setting Up Spark in Standalone Mode on Windows


Java

Install Java 8, set JAVA_HOME, and add %JAVA_HOME%\bin to the PATH environment variable.
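For example, assuming the JDK is installed under C:\Program Files\Java\jdk1.8.0_60 (a hypothetical path; adjust it to your actual install directory), JAVA_HOME can be set from the command line with setx, and SCALA_HOME and SPARK_HOME in the later steps can be set the same way:

E:\> setx JAVA_HOME "C:\Program Files\Java\jdk1.8.0_60"

Note that setx only takes effect in newly opened console windows, and the %JAVA_HOME%\bin entry still needs to be appended to PATH (for example via System Properties > Environment Variables), so open a fresh command prompt before running the version check below.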

E:\> java -version

java version "1.8.0_60"

Java(TM) SE Runtime Environment (build 1.8.0_60-b27)

Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)
Scala

Download and extract Scala 2.11, set SCALA_HOME, and add %SCALA_HOME%\bin to PATH.

E:\> scala -version

Scala code runner version 2.11.7 -- Copyright 2002-2013, LAMP/EPFL
Spark

Download and extract Spark 2.1, set SPARK_HOME, and add %SPARK_HOME%\bin to PATH. At this point, trying to run spark-shell from the console fails with the following error saying that winutils.exe cannot be located.

E:\> spark-shell

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

Setting default log level to "WARN".

To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

17/06/05 21:34:43 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path

java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

 at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:379)

 at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:394)

 at org.apache.hadoop.util.Shell.<clinit>(Shell.java:387)

 at org.apache.hadoop.hive.conf.HiveConf$ConfVars.findHadoopBinary(HiveConf.java:2327)

 at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:365)

 at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105)

 at java.lang.Class.forName0(Native Method)

 at java.lang.Class.forName(Class.java:348)

 at org.apache.spark.util.Utils$.classForName(Utils.scala:229)

 at org.apache.spark.sql.SparkSession$.hiveClassesArePresent(SparkSession.scala:991)

 at org.apache.spark.repl.Main$.createSparkSession(Main.scala:92)

 at $line3.$read$$iw$$iw.<init>(<console>:15)

 at $line3.$read$$iw.<init>(<console>:42)

 at $line3.$read.<init>(<console>:44)

 at $line3.$read$.<init>(<console>:48)

 at $line3.$read$.<clinit>(<console>)

 at $line3.$eval$.$print$lzycompute(<console>:7)

 at $line3.$eval$.$print(<console>:6)

 at $line3.$eval.$print(<console>)

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

 at java.lang.reflect.Method.invoke(Method.java:497)

 at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)

 at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)

 at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)

 at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)

 at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)

 at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)

 at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)

 at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)

 at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)

 at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)

 at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)

 at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)

 at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply$mcV$sp(SparkILoop.scala:38)

 at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)

 at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)

 at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214)

 at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:37)

 at org.apache.spark.repl.SparkILoop.loadFiles(SparkILoop.scala:105)

 at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:920)

 at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)

 at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)

 at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)

 at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)

 at org.apache.spark.repl.Main$.doMain(Main.scala:69)

 at org.apache.spark.repl.Main$.main(Main.scala:52)

 at org.apache.spark.repl.Main.main(Main.scala)

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

 at java.lang.reflect.Method.invoke(Method.java:497)

 at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)

 at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)

 at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)

 at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)

 at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

As the error message shows, Spark relies on some Hadoop libraries, which it locates through the HADOOP_HOME environment variable; since we never set it, the path comes out as null\bin\winutils.exe. This does not mean we have to install Hadoop, though. We can simply download the required winutils.exe to any location on disk, for example C:\winutils\bin\winutils.exe, and set HADOOP_HOME=C:\winutils.
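Assuming winutils.exe was downloaded to the current user's Downloads folder (a hypothetical location), the setup looks roughly like this:

E:\> mkdir C:\winutils\bin

E:\> copy %USERPROFILE%\Downloads\winutils.exe C:\winutils\bin\

E:\> setx HADOOP_HOME C:\winutils

Because setx only affects new console windows, either open a fresh command prompt or additionally run set HADOOP_HOME=C:\winutils in the current one before retrying spark-shell.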

Running spark-shell again now produces a new error:

java.lang.IllegalArgumentException: Error while instantiating org.apache.spark.sql.hive.HiveSessionState:

 at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:981)

 at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)

 at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)

 at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)

 at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)

 at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)

 at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)

 at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)

 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)

 at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)

 at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:878)

 at org.apache.spark.repl.Main$.createSparkSession(Main.scala:96)

 ... 47 elided

Caused by: java.lang.reflect.InvocationTargetException: java.lang.IllegalArgumentException: Error while instantiating org.apache.spark.sql.hive.HiveExternalCatalog:

 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

 at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)

 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

 at java.lang.reflect.Constructor.newInstance(Constructor.java:422)

 at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:978)

 ... 58 more

Caused by: java.lang.IllegalArgumentException: Error while instantiating org.apache.spark.sql.hive.HiveExternalCatalog:

 at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:169)

 at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86)

 at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)

 at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)

 at scala.Option.getOrElse(Option.scala:121)

 at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101)

 at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100)

 at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:157)

 at org.apache.spark.sql.hive.HiveSessionState.<init>(HiveSessionState.scala:32)

 ... 63 more

Caused by: java.lang.reflect.InvocationTargetException: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------

 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

 at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)

 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

 at java.lang.reflect.Constructor.newInstance(Constructor.java:422)

 at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:166)

 ... 71 more

Caused by: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------

 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

 at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)

 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

 at java.lang.reflect.Constructor.newInstance(Constructor.java:422)

 at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)

 at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:358)

 at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:262)

 at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:66)

 ... 76 more

Caused by: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------

 at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)

 at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:188)

 ... 84 more

Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------

 at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)

 at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)

 at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)

 ... 85 more

<console>:14: error: not found: value spark

 import spark.implicits._

<console>:14: error: not found: value spark

 import spark.sql

Welcome to

      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.1
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60)

Type in expressions to have them evaluated.

Type :help for more information.

scala>

The error message says the temporary directory /tmp/hive is not writable:

The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------

So we need to fix the permissions on E:\tmp\hive (I ran spark-shell from drive E:; if you run it from another drive, replace the drive letter in the path accordingly). Run the following command:

E:\> C:\winutils\bin\winutils.exe chmod 777 E:\tmp\hive

Run spark-shell again and Spark now starts successfully. The Spark UI can be reached at http://localhost:4040.
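As a quick sanity check (a minimal example, assuming the shell came up cleanly and the predefined spark session object is available), you can run a trivial job in the REPL and then look for the completed job on the Spark UI:

scala> spark.range(100).count()

res0: Long = 100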

