Hive: Container killed by YARN for exceeding memory limits. 9.2 GB of 9 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
The following error is reported when running Hive on Spark:

```
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 0.0 failed 4 times, most recent failure: Lost task 3.3 in stage 0.0 (TID 62, hadoop7, executor 17): ExecutorLostFailure (executor 17 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 9.2 GB of 9 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1524)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1512)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1511)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1511)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1739)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1694)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1683)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
ERROR : FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Spark job failed because of out of memory.
INFO  : Completed executing command(queryId=hive_20190529100107_063ed2a4-e3b0-48a9-9bcc-49acd51925c1); Time taken: 1441.753 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Spark job failed because of out of memory. (state=42000,code=3)
Closing: 0: jdbc:hive2://hadoop1:10000/pdw_nameonce
```
Solution
a. Increase spark.yarn.executor.memoryOverhead, e.g. `set spark.yarn.executor.memoryOverhead=512;` (the value is in MB). This is a stopgap; executor-memory + memoryOverhead must not exceed the memory YARN can allocate on a cluster node. A session sketch follows this list.
b. The underlying cause is OS-level virtual-memory allocation: little physical memory is actually in use, but YARN's virtual-memory check detects an OOM condition. The problem can therefore be avoided by disabling that check, i.e. setting yarn.nodemanager.vmem-check-enabled=false in yarn-site.xml (see the snippet after this list).
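A minimal session sketch of fix (a), assuming a beeline connection to HiveServer2; the 8g executor-memory value is illustrative and not from the original post. In Hive on Spark, spark.* properties set this way generally take effect when Hive launches the Spark application for the session, so set them before running the query:

```sql
-- Run in the beeline/Hive session before the failing query.
-- spark.yarn.executor.memoryOverhead is in MB; size it so that
-- executor memory + overhead stays within YARN's per-container
-- limit (9 GB in the error above).
set spark.yarn.executor.memoryOverhead=512;
set spark.executor.memory=8g;  -- illustrative: 8 GB + 512 MB < 9 GB
```

Note that spark.yarn.executor.memoryOverhead was deprecated in Spark 2.3 in favor of spark.executor.memoryOverhead.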
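A sketch of fix (b) as a yarn-site.xml change, assuming you can edit the NodeManager configuration and restart the NodeManagers:

```xml
<!-- yarn-site.xml on each NodeManager node; restart NodeManagers to apply. -->
<!-- Disables the virtual-memory check so containers are no longer killed
     for vmem usage; the physical-memory check
     (yarn.nodemanager.pmem-check-enabled) remains in effect. -->
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
```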