A Detailed Walkthrough: Remotely Submitting a WordCount Job from IDEA on Windows to an HA Hadoop Cluster
2023-06-13 09:20:26
I. Environment Configuration
1. Edit the Windows hosts file (C:/Windows/System32/drivers/etc/hosts) and add the IP addresses and hostnames of the cluster machines.
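For example, the hosts entries might look like the following. The IP addresses here are placeholders for illustration only; substitute your cluster's actual addresses:

```
192.168.1.101  cent1
192.168.1.102  cent2
192.168.1.103  cent3
192.168.1.104  cent4
```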
2. Unpack Hadoop on Windows, create a HADOOP_HOME environment variable pointing at it, and put winutils.exe into $HADOOP_HOME/bin.
3. Create a new Maven project in IDEA, with pom.xml set up as follows:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>big</groupId>
    <artifactId>data</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.5</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.5</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.7.5</version>
        </dependency>
        <!--
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs-client</artifactId>
            <version>2.7.5</version>
        </dependency>
        -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
            <version>2.7.5</version>
        </dependency>
    </dependencies>
</project>
4. Copy Hadoop's configuration files from the HA cluster into the project's resources directory. The configuration is as follows:
core-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>cent1:2181,cent2:2181,cent3:2181</value>
    </property>
    <!--
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop2</value>
        <description>A base for other temporary directories.</description>
    </property>
    -->
</configuration>
hdfs-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Licensed under the Apache License, Version 2.0. See accompanying LICENSE file. -->
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>cent1:9000</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>cent2:9000</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>cent1:50070</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>cent2:50070</value>
    </property>
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://cent2:8485;cent3:8485;cent4:8485/mycluster</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/opt/jn/data</value>
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
</configuration>
mapred-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Licensed under the Apache License, Version 2.0. See accompanying LICENSE file. -->
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
yarn-site.xml:
<?xml version="1.0"?>
<!-- Licensed under the Apache License, Version 2.0. See accompanying LICENSE file. -->
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>cent1</value>
    </property>
</configuration>
log4j.properties:
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#     http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Define some default values that can be overridden by system properties
hadoop.root.logger=INFO,console
hadoop.log.dir=.
hadoop.log.file=hadoop.log

# Define the root logger to the system property "hadoop.root.logger".
log4j.rootLogger=${hadoop.root.logger}, EventCounter

# Logging Threshold
log4j.threshold=ALL

# Null Appender
log4j.appender.NullAppender=org.apache.log4j.varia.NullAppender

# Rolling File Appender - cap space usage at 5gb.
hadoop.log.maxfilesize=256MB
hadoop.log.maxbackupindex=20
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.RFA.MaxFileSize=${hadoop.log.maxfilesize}
log4j.appender.RFA.MaxBackupIndex=${hadoop.log.maxbackupindex}
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
# Pattern format: Date LogLevel LoggerName LogMessage
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
# Debugging Pattern format
#log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n

log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}
# Rollover at midnight
log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
# Pattern format: Date LogLevel LoggerName LogMessage
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
# Debugging Pattern format
#log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n

# TaskLog Appender
# Default values
hadoop.tasklog.taskid=null
hadoop.tasklog.iscleanup=false
hadoop.tasklog.noKeepSplits=4
hadoop.tasklog.totalLogFileSize=100
hadoop.tasklog.purgeLogSplits=true
hadoop.tasklog.logsRetainHours=12
log4j.appender.TLA=org.apache.hadoop.mapred.TaskLogAppender
log4j.appender.TLA.taskId=${hadoop.tasklog.taskid}
log4j.appender.TLA.isCleanup=${hadoop.tasklog.iscleanup}
log4j.appender.TLA.totalLogFileSize=${hadoop.tasklog.totalLogFileSize}
log4j.appender.TLA.layout=org.apache.log4j.PatternLayout
log4j.appender.TLA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n

# HDFS block state change log from block manager
# Uncomment the following to suppress normal block state change
# messages from BlockManager in NameNode.
#log4j.logger.BlockStateChange=WARN

# Security appender
hadoop.security.logger=INFO,NullAppender
hadoop.security.log.maxfilesize=256MB
hadoop.security.log.maxbackupindex=20
log4j.category.SecurityLogger=${hadoop.security.logger}
hadoop.security.log.file=SecurityAuth-${user.name}.audit
log4j.appender.RFAS=org.apache.log4j.RollingFileAppender
log4j.appender.RFAS.File=${hadoop.log.dir}/${hadoop.security.log.file}
log4j.appender.RFAS.layout=org.apache.log4j.PatternLayout
log4j.appender.RFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
log4j.appender.RFAS.MaxFileSize=${hadoop.security.log.maxfilesize}
log4j.appender.RFAS.MaxBackupIndex=${hadoop.security.log.maxbackupindex}

# Daily Rolling Security appender
log4j.appender.DRFAS=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFAS.File=${hadoop.log.dir}/${hadoop.security.log.file}
log4j.appender.DRFAS.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
log4j.appender.DRFAS.DatePattern=.yyyy-MM-dd

# hadoop configuration logging
# Uncomment the following line to turn off configuration deprecation warnings.
#log4j.logger.org.apache.hadoop.conf.Configuration.deprecation=WARN

# hdfs audit logging
hdfs.audit.logger=INFO,NullAppender
hdfs.audit.log.maxfilesize=256MB
hdfs.audit.log.maxbackupindex=20
log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=${hdfs.audit.logger}
log4j.additivity.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=false
log4j.appender.RFAAUDIT=org.apache.log4j.RollingFileAppender
log4j.appender.RFAAUDIT.File=${hadoop.log.dir}/hdfs-audit.log
log4j.appender.RFAAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.RFAAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
log4j.appender.RFAAUDIT.MaxFileSize=${hdfs.audit.log.maxfilesize}
log4j.appender.RFAAUDIT.MaxBackupIndex=${hdfs.audit.log.maxbackupindex}

# mapred audit logging
mapred.audit.logger=INFO,NullAppender
mapred.audit.log.maxfilesize=256MB
mapred.audit.log.maxbackupindex=20
log4j.logger.org.apache.hadoop.mapred.AuditLogger=${mapred.audit.logger}
log4j.additivity.org.apache.hadoop.mapred.AuditLogger=false
log4j.appender.MRAUDIT=org.apache.log4j.RollingFileAppender
log4j.appender.MRAUDIT.File=${hadoop.log.dir}/mapred-audit.log
log4j.appender.MRAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.MRAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
log4j.appender.MRAUDIT.MaxFileSize=${mapred.audit.log.maxfilesize}
log4j.appender.MRAUDIT.MaxBackupIndex=${mapred.audit.log.maxbackupindex}

# Custom Logging levels
#log4j.logger.org.apache.hadoop.mapred.JobTracker=DEBUG
#log4j.logger.org.apache.hadoop.mapred.TaskTracker=DEBUG
#log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=DEBUG

# Jets3t library
log4j.logger.org.jets3t.service.impl.rest.httpclient.RestS3Service=ERROR

# AWS SDK / S3A FileSystem
log4j.logger.com.amazonaws=ERROR
log4j.logger.com.amazonaws.http.AmazonHttpClient=ERROR
log4j.logger.org.apache.hadoop.fs.s3a.S3AFileSystem=WARN

# Event Counter Appender
# Sends counts of logging messages at different severity levels to Hadoop Metrics.
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter

# Job Summary Appender
# Use following logger to send summary to separate file defined by
# hadoop.mapreduce.jobsummary.log.file :
# hadoop.mapreduce.jobsummary.logger=INFO,JSA
hadoop.mapreduce.jobsummary.logger=${hadoop.root.logger}
hadoop.mapreduce.jobsummary.log.file=hadoop-mapreduce.jobsummary.log
hadoop.mapreduce.jobsummary.log.maxfilesize=256MB
hadoop.mapreduce.jobsummary.log.maxbackupindex=20
log4j.appender.JSA=org.apache.log4j.RollingFileAppender
log4j.appender.JSA.File=${hadoop.log.dir}/${hadoop.mapreduce.jobsummary.log.file}
log4j.appender.JSA.MaxFileSize=${hadoop.mapreduce.jobsummary.log.maxfilesize}
log4j.appender.JSA.MaxBackupIndex=${hadoop.mapreduce.jobsummary.log.maxbackupindex}
log4j.appender.JSA.layout=org.apache.log4j.PatternLayout
log4j.appender.JSA.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
log4j.logger.org.apache.hadoop.mapred.JobInProgress$JobSummary=${hadoop.mapreduce.jobsummary.logger}
log4j.additivity.org.apache.hadoop.mapred.JobInProgress$JobSummary=false

# Yarn ResourceManager Application Summary Log
# Set the ResourceManager summary log filename
yarn.server.resourcemanager.appsummary.log.file=rm-appsummary.log
# Set the ResourceManager summary log level and appender
yarn.server.resourcemanager.appsummary.logger=${hadoop.root.logger}
#yarn.server.resourcemanager.appsummary.logger=INFO,RMSUMMARY

# To enable AppSummaryLogging for the RM,
# set yarn.server.resourcemanager.appsummary.logger to
# LEVEL,RMSUMMARY in hadoop-env.sh

# Appender for ResourceManager Application Summary Log
# Requires the following properties to be set
#    - hadoop.log.dir (Hadoop Log directory)
#    - yarn.server.resourcemanager.appsummary.log.file (resource manager app summary log filename)
#    - yarn.server.resourcemanager.appsummary.logger (resource manager app summary log level and appender)
log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=${yarn.server.resourcemanager.appsummary.logger}
log4j.additivity.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=false
log4j.appender.RMSUMMARY=org.apache.log4j.RollingFileAppender
log4j.appender.RMSUMMARY.File=${hadoop.log.dir}/${yarn.server.resourcemanager.appsummary.log.file}
log4j.appender.RMSUMMARY.MaxFileSize=256MB
log4j.appender.RMSUMMARY.MaxBackupIndex=20
log4j.appender.RMSUMMARY.layout=org.apache.log4j.PatternLayout
log4j.appender.RMSUMMARY.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n

# HS audit log configs
#mapreduce.hs.audit.logger=INFO,HSAUDIT
#log4j.logger.org.apache.hadoop.mapreduce.v2.hs.HSAuditLogger=${mapreduce.hs.audit.logger}
#log4j.additivity.org.apache.hadoop.mapreduce.v2.hs.HSAuditLogger=false
#log4j.appender.HSAUDIT=org.apache.log4j.DailyRollingFileAppender
#log4j.appender.HSAUDIT.File=${hadoop.log.dir}/hs-audit.log
#log4j.appender.HSAUDIT.layout=org.apache.log4j.PatternLayout
#log4j.appender.HSAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
#log4j.appender.HSAUDIT.DatePattern=.yyyy-MM-dd

# Http Server Request Logs
#log4j.logger.http.requests.namenode=INFO,namenoderequestlog
#log4j.appender.namenoderequestlog=org.apache.hadoop.http.HttpRequestLogAppender
#log4j.appender.namenoderequestlog.Filename=${hadoop.log.dir}/jetty-namenode-yyyy_mm_dd.log
#log4j.appender.namenoderequestlog.RetainDays=3
#log4j.logger.http.requests.datanode=INFO,datanoderequestlog
#log4j.appender.datanoderequestlog=org.apache.hadoop.http.HttpRequestLogAppender
#log4j.appender.datanoderequestlog.Filename=${hadoop.log.dir}/jetty-datanode-yyyy_mm_dd.log
#log4j.appender.datanoderequestlog.RetainDays=3
#log4j.logger.http.requests.resourcemanager=INFO,resourcemanagerrequestlog
#log4j.appender.resourcemanagerrequestlog=org.apache.hadoop.http.HttpRequestLogAppender
#log4j.appender.resourcemanagerrequestlog.Filename=${hadoop.log.dir}/jetty-resourcemanager-yyyy_mm_dd.log
#log4j.appender.resourcemanagerrequestlog.RetainDays=3
#log4j.logger.http.requests.jobhistory=INFO,jobhistoryrequestlog
#log4j.appender.jobhistoryrequestlog=org.apache.hadoop.http.HttpRequestLogAppender
#log4j.appender.jobhistoryrequestlog.Filename=${hadoop.log.dir}/jetty-jobhistory-yyyy_mm_dd.log
#log4j.appender.jobhistoryrequestlog.RetainDays=3
#log4j.logger.http.requests.nodemanager=INFO,nodemanagerrequestlog
#log4j.appender.nodemanagerrequestlog=org.apache.hadoop.http.HttpRequestLogAppender
#log4j.appender.nodemanagerrequestlog.Filename=${hadoop.log.dir}/jetty-nodemanager-yyyy_mm_dd.log
#log4j.appender.nodemanagerrequestlog.RetainDays=3
II. Writing the WordCount Program
import java.io.IOException;
import java.net.URI;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Set hadoop.home.dir up front to avoid the winutils.exe exception on Windows
        System.setProperty("hadoop.home.dir", "E://softs//majorSoft//hadoop-2.7.5");
        // Allow remote (cross-platform) job submission from Windows
        conf.set("mapreduce.app-submission.cross-platform", "true");

        Path input = new Path(URI.create("hdfs://mycluster/testFile/wordCount"));
        Path output = new Path(URI.create("hdfs://mycluster/output"));

        Job job = Job.getInstance(conf, "word count");
        // The jar must be built before running; point the job at the packaged jar
        job.setJar("E://bigData//hadoopDemo//out//artifacts//wordCount_jar//hadoopDemo.jar");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, input);
        FileOutputFormat.setOutputPath(job, output);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
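Stripped of the Hadoop machinery, the mapper/combiner/reducer pipeline above computes a simple tokenize-and-sum. The following plain-Java sketch (the class name WordCountLocal is illustrative, not part of the project) shows the same logic and can be run without any cluster, which is handy for sanity-checking expected output:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class WordCountLocal {

    // Tokenize on whitespace (same as TokenizerMapper) and sum one count
    // per occurrence of each word (same as IntSumReducer after the shuffle).
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            counts.merge(itr.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("hello world hello"));
    }
}
```

The `merge` call plays the role of the shuffle-plus-reduce step: the first time a word is seen it is inserted with count 1, afterwards the existing count is summed with 1.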
III. Exceptions Encountered
1. java.lang.RuntimeException: java.lang.ClassNotFoundException: Class WordCount$Map not found
   The cluster cannot find the mapper class because the job jar was never shipped. Point the job at the packaged jar:
   job.setJar("WordCount.jar");

2. Exception message: /bin/bash: line 0: fg: no job control
   Enable cross-platform submission so the job is formatted for remote execution:
   conf.set("mapreduce.app-submission.cross-platform", "true");
   and, in hdfs-site.xml, disable permission checking:
   <property>
       <name>dfs.permissions.enabled</name>
       <value>false</value>
   </property>

3. java.io.IOException: Could not locate executable null/bin/winutils.exe in the Hadoop binaries.
   Point hadoop.home.dir at the local Hadoop installation:
   System.setProperty("hadoop.home.dir", "E://softs//majorSoft//hadoop-2.7.5");

4. HDFS permission errors, or the cluster hostnames cannot be resolved
   Edit the C:/Windows/System32/drivers/etc/hosts file and add the cluster machines' IP addresses and hostnames.