
Submitting a WordCount job remotely from IDEA on Windows to a Hadoop HA cluster: a detailed guide

2023-06-13 09:20:26

Part 1: Environment setup

1. Edit the Windows hosts file (C:\Windows\System32\drivers\etc\hosts) and add an entry mapping each cluster node's IP address to its hostname.
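For example (the hostnames cent1–cent4 are the ones used in the cluster configuration later in this article; the IP addresses below are placeholders, substitute your cluster's real ones):

192.168.56.101  cent1
192.168.56.102  cent2
192.168.56.103  cent3
192.168.56.104  cent4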

2. Unpack Hadoop on Windows, point the HADOOP_HOME environment variable at it, and place winutils.exe in %HADOOP_HOME%\bin.
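For example (this path matches the one used later in the WordCount code; adjust it to wherever you unpacked Hadoop):

HADOOP_HOME=E:\softs\majorSoft\hadoop-2.7.5
PATH=%PATH%;%HADOOP_HOME%\bin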

3. Create a new Maven project in IDEA with the following pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>big</groupId>
    <artifactId>data</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.5</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.5</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.7.5</version>
        </dependency>
        <!--<dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs-client</artifactId>
            <version>2.7.5</version>
        </dependency>-->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
            <version>2.7.5</version>
        </dependency>
    </dependencies>
</project>

4. Copy the Hadoop configuration files from the HA cluster into the project's src/main/resources directory so that they end up on the job's classpath; each file's contents are reproduced below.
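As a sketch, the resources directory should end up looking like this (log4j.properties is copied as well so that job progress shows up in the IDEA console):

src/main/resources/
    core-site.xml
    hdfs-site.xml
    mapred-site.xml
    yarn-site.xml
    log4j.properties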

core-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>cent1:2181,cent2:2181,cent3:2181</value>
    </property>
    <!--<property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop2</value>
        <description>A base for other temporary directories.</description>
    </property>-->
</configuration>

hdfs-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>cent1:9000</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>cent2:9000</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>cent1:50070</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>cent2:50070</value>
    </property>
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://cent2:8485;cent3:8485;cent4:8485/mycluster</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/opt/jn/data</value>
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
</configuration>

mapred-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

yarn-site.xml:

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<configuration>

    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>cent1</value>
    </property>
</configuration>

log4j.properties:

# Licensed to the Apache Software Foundation (ASF) under one 

# or more contributor license agreements. See the NOTICE file 

# distributed with this work for additional information 

# regarding copyright ownership. The ASF licenses this file 

# to you under the Apache License, Version 2.0 (the 

# "License"); you may not use this file except in compliance 

# with the License. You may obtain a copy of the License at 

# http://www.apache.org/licenses/LICENSE-2.0 

# Unless required by applicable law or agreed to in writing, software 

# distributed under the License is distributed on an "AS IS" BASIS, 

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 

# See the License for the specific language governing permissions and 

# limitations under the License. 

# Define some default values that can be overridden by system properties 

hadoop.root.logger=INFO,console 

hadoop.log.dir=. 

hadoop.log.file=hadoop.log 

# Define the root logger to the system property "hadoop.root.logger". 

log4j.rootLogger=${hadoop.root.logger}, EventCounter 

# Logging Threshold 

log4j.threshold=ALL 

# Null Appender 

log4j.appender.NullAppender=org.apache.log4j.varia.NullAppender 

# Rolling File Appender - cap space usage at 5gb. 

hadoop.log.maxfilesize=256MB 

hadoop.log.maxbackupindex=20 

log4j.appender.RFA=org.apache.log4j.RollingFileAppender 

log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file} 

log4j.appender.RFA.MaxFileSize=${hadoop.log.maxfilesize} 

log4j.appender.RFA.MaxBackupIndex=${hadoop.log.maxbackupindex} 

log4j.appender.RFA.layout=org.apache.log4j.PatternLayout 

# Pattern format: Date LogLevel LoggerName LogMessage 

log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n 

# Debugging Pattern format 

#log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n 


log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}

# Rollover at midnight
log4j.appender.DRFA.DatePattern=.yyyy-MM-dd

log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout

# Pattern format: Date LogLevel LoggerName LogMessage
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
# Debugging Pattern format
#log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n

# TaskLog Appender
#Default values
hadoop.tasklog.taskid=null
hadoop.tasklog.iscleanup=false
hadoop.tasklog.noKeepSplits=4
hadoop.tasklog.totalLogFileSize=100
hadoop.tasklog.purgeLogSplits=true
hadoop.tasklog.logsRetainHours=12

log4j.appender.TLA=org.apache.hadoop.mapred.TaskLogAppender
log4j.appender.TLA.taskId=${hadoop.tasklog.taskid}
log4j.appender.TLA.isCleanup=${hadoop.tasklog.iscleanup}
log4j.appender.TLA.totalLogFileSize=${hadoop.tasklog.totalLogFileSize}
log4j.appender.TLA.layout=org.apache.log4j.PatternLayout
log4j.appender.TLA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n

# HDFS block state change log from block manager
# Uncomment the following to suppress normal block state change
# messages from BlockManager in NameNode.
#log4j.logger.BlockStateChange=WARN

#Security appender
hadoop.security.logger=INFO,NullAppender
hadoop.security.log.maxfilesize=256MB
hadoop.security.log.maxbackupindex=20
log4j.category.SecurityLogger=${hadoop.security.logger}
hadoop.security.log.file=SecurityAuth-${user.name}.audit
log4j.appender.RFAS=org.apache.log4j.RollingFileAppender
log4j.appender.RFAS.File=${hadoop.log.dir}/${hadoop.security.log.file}
log4j.appender.RFAS.layout=org.apache.log4j.PatternLayout
log4j.appender.RFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
log4j.appender.RFAS.MaxFileSize=${hadoop.security.log.maxfilesize}
log4j.appender.RFAS.MaxBackupIndex=${hadoop.security.log.maxbackupindex}

# Daily Rolling Security appender
log4j.appender.DRFAS=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFAS.File=${hadoop.log.dir}/${hadoop.security.log.file}
log4j.appender.DRFAS.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
log4j.appender.DRFAS.DatePattern=.yyyy-MM-dd

# hadoop configuration logging
# Uncomment the following line to turn off configuration deprecation warnings.
# log4j.logger.org.apache.hadoop.conf.Configuration.deprecation=WARN

# hdfs audit logging
hdfs.audit.logger=INFO,NullAppender
hdfs.audit.log.maxfilesize=256MB
hdfs.audit.log.maxbackupindex=20
log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=${hdfs.audit.logger}
log4j.additivity.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=false
log4j.appender.RFAAUDIT=org.apache.log4j.RollingFileAppender
log4j.appender.RFAAUDIT.File=${hadoop.log.dir}/hdfs-audit.log
log4j.appender.RFAAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.RFAAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
log4j.appender.RFAAUDIT.MaxFileSize=${hdfs.audit.log.maxfilesize}
log4j.appender.RFAAUDIT.MaxBackupIndex=${hdfs.audit.log.maxbackupindex}

# mapred audit logging
mapred.audit.logger=INFO,NullAppender
mapred.audit.log.maxfilesize=256MB
mapred.audit.log.maxbackupindex=20
log4j.logger.org.apache.hadoop.mapred.AuditLogger=${mapred.audit.logger}
log4j.additivity.org.apache.hadoop.mapred.AuditLogger=false
log4j.appender.MRAUDIT=org.apache.log4j.RollingFileAppender
log4j.appender.MRAUDIT.File=${hadoop.log.dir}/mapred-audit.log
log4j.appender.MRAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.MRAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
log4j.appender.MRAUDIT.MaxFileSize=${mapred.audit.log.maxfilesize}
log4j.appender.MRAUDIT.MaxBackupIndex=${mapred.audit.log.maxbackupindex}

# Custom Logging levels
#log4j.logger.org.apache.hadoop.mapred.JobTracker=DEBUG
#log4j.logger.org.apache.hadoop.mapred.TaskTracker=DEBUG
#log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=DEBUG

# Jets3t library
log4j.logger.org.jets3t.service.impl.rest.httpclient.RestS3Service=ERROR

# AWS SDK & S3A FileSystem
log4j.logger.com.amazonaws=ERROR
log4j.logger.com.amazonaws.http.AmazonHttpClient=ERROR
log4j.logger.org.apache.hadoop.fs.s3a.S3AFileSystem=WARN

# Event Counter Appender
# Sends counts of logging messages at different severity levels to Hadoop Metrics.
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter

# Job Summary Appender
#
# Use following logger to send summary to separate file defined by
# hadoop.mapreduce.jobsummary.log.file :
# hadoop.mapreduce.jobsummary.logger=INFO,JSA
#
hadoop.mapreduce.jobsummary.logger=${hadoop.root.logger}
hadoop.mapreduce.jobsummary.log.file=hadoop-mapreduce.jobsummary.log
hadoop.mapreduce.jobsummary.log.maxfilesize=256MB
hadoop.mapreduce.jobsummary.log.maxbackupindex=20
log4j.appender.JSA=org.apache.log4j.RollingFileAppender
log4j.appender.JSA.File=${hadoop.log.dir}/${hadoop.mapreduce.jobsummary.log.file}
log4j.appender.JSA.MaxFileSize=${hadoop.mapreduce.jobsummary.log.maxfilesize}
log4j.appender.JSA.MaxBackupIndex=${hadoop.mapreduce.jobsummary.log.maxbackupindex}
log4j.appender.JSA.layout=org.apache.log4j.PatternLayout
log4j.appender.JSA.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
log4j.logger.org.apache.hadoop.mapred.JobInProgress$JobSummary=${hadoop.mapreduce.jobsummary.logger}
log4j.additivity.org.apache.hadoop.mapred.JobInProgress$JobSummary=false

#
# Yarn ResourceManager Application Summary Log
#
# Set the ResourceManager summary log filename
yarn.server.resourcemanager.appsummary.log.file=rm-appsummary.log
# Set the ResourceManager summary log level and appender
yarn.server.resourcemanager.appsummary.logger=${hadoop.root.logger}
#yarn.server.resourcemanager.appsummary.logger=INFO,RMSUMMARY

# To enable AppSummaryLogging for the RM,
# set yarn.server.resourcemanager.appsummary.logger to
# <LEVEL>,RMSUMMARY in hadoop-env.sh

# Appender for ResourceManager Application Summary Log
# Requires the following properties to be set
#    - hadoop.log.dir (Hadoop Log directory)
#    - yarn.server.resourcemanager.appsummary.log.file (resource manager app summary log filename)
#    - yarn.server.resourcemanager.appsummary.logger (resource manager app summary log level and appender)

log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=${yarn.server.resourcemanager.appsummary.logger}
log4j.additivity.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=false
log4j.appender.RMSUMMARY=org.apache.log4j.RollingFileAppender
log4j.appender.RMSUMMARY.File=${hadoop.log.dir}/${yarn.server.resourcemanager.appsummary.log.file}
log4j.appender.RMSUMMARY.MaxFileSize=256MB
log4j.appender.RMSUMMARY.MaxBackupIndex=20
log4j.appender.RMSUMMARY.layout=org.apache.log4j.PatternLayout
log4j.appender.RMSUMMARY.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n

# HS audit log configs
#mapreduce.hs.audit.logger=INFO,HSAUDIT
#log4j.logger.org.apache.hadoop.mapreduce.v2.hs.HSAuditLogger=${mapreduce.hs.audit.logger}
#log4j.additivity.org.apache.hadoop.mapreduce.v2.hs.HSAuditLogger=false
#log4j.appender.HSAUDIT=org.apache.log4j.DailyRollingFileAppender
#log4j.appender.HSAUDIT.File=${hadoop.log.dir}/hs-audit.log
#log4j.appender.HSAUDIT.layout=org.apache.log4j.PatternLayout
#log4j.appender.HSAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
#log4j.appender.HSAUDIT.DatePattern=.yyyy-MM-dd

# Http Server Request Logs
#log4j.logger.http.requests.namenode=INFO,namenoderequestlog
#log4j.appender.namenoderequestlog=org.apache.hadoop.http.HttpRequestLogAppender
#log4j.appender.namenoderequestlog.Filename=${hadoop.log.dir}/jetty-namenode-yyyy_mm_dd.log
#log4j.appender.namenoderequestlog.RetainDays=3

#log4j.logger.http.requests.datanode=INFO,datanoderequestlog
#log4j.appender.datanoderequestlog=org.apache.hadoop.http.HttpRequestLogAppender
#log4j.appender.datanoderequestlog.Filename=${hadoop.log.dir}/jetty-datanode-yyyy_mm_dd.log
#log4j.appender.datanoderequestlog.RetainDays=3

#log4j.logger.http.requests.resourcemanager=INFO,resourcemanagerrequestlog
#log4j.appender.resourcemanagerrequestlog=org.apache.hadoop.http.HttpRequestLogAppender
#log4j.appender.resourcemanagerrequestlog.Filename=${hadoop.log.dir}/jetty-resourcemanager-yyyy_mm_dd.log
#log4j.appender.resourcemanagerrequestlog.RetainDays=3

#log4j.logger.http.requests.jobhistory=INFO,jobhistoryrequestlog
#log4j.appender.jobhistoryrequestlog=org.apache.hadoop.http.HttpRequestLogAppender
#log4j.appender.jobhistoryrequestlog.Filename=${hadoop.log.dir}/jetty-jobhistory-yyyy_mm_dd.log
#log4j.appender.jobhistoryrequestlog.RetainDays=3

#log4j.logger.http.requests.nodemanager=INFO,nodemanagerrequestlog
#log4j.appender.nodemanagerrequestlog=org.apache.hadoop.http.HttpRequestLogAppender
#log4j.appender.nodemanagerrequestlog.Filename=${hadoop.log.dir}/jetty-nodemanager-yyyy_mm_dd.log
#log4j.appender.nodemanagerrequestlog.RetainDays=3
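Before writing the MapReduce job it is worth checking that the copied configuration actually resolves the logical nameservice. The following is a minimal sketch (the class name HdfsSmokeTest is just illustrative; it assumes the four *-site.xml files above are on the classpath under src/main/resources and that the input directory /testFile/wordCount already exists on HDFS):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsSmokeTest {
    public static void main(String[] args) throws Exception {
        // new Configuration() automatically loads core-site.xml / hdfs-site.xml from the
        // classpath, so the logical name "mycluster" resolves to the active NameNode.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster"), conf);
        System.out.println("input exists: " + fs.exists(new Path("/testFile/wordCount")));
    }
}

If this prints input exists: true, both the hosts entries and the HA client configuration are working.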

Part 2: The WordCount program

import java.io.IOException;
import java.net.URI;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Work around the missing winutils.exe lookup on Windows (see exception 3 below)
        System.setProperty("hadoop.home.dir", "E://softs//majorSoft//hadoop-2.7.5");
        // Allow a Windows client to submit to a Linux cluster (see exception 2 below)
        conf.set("mapreduce.app-submission.cross-platform", "true");

        Path input = new Path(URI.create("hdfs://mycluster/testFile/wordCount"));
        Path output = new Path(URI.create("hdfs://mycluster/output"));

        Job job = Job.getInstance(conf, "word count");
        // The jar must be built before running; setJarByClass alone is not enough
        // for remote submission (see exception 1 below)
        job.setJar("E://bigData//hadoopDemo//out//artifacts//wordCount_jar//hadoopDemo.jar");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, input);
        FileOutputFormat.setOutputPath(job, output);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
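To run it, build the jar artifact first (Build > Build Artifacts in IDEA) so that the path passed to job.setJar() actually exists, then run main() directly from IDEA. Once the job succeeds, the result can be checked from any cluster node, for example:

hdfs dfs -ls /output
hdfs dfs -cat /output/part-r-00000

(part-r-00000 is the default name of a single reducer's output file; adjust if you changed the output path.)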

Part 3: Exceptions encountered

1. RuntimeException / ClassNotFoundException: Class WordCount$Map not found — the mapper class cannot be found on the cluster nodes. Fix: explicitly point the job at the packaged jar:

job.setJar("WordCount.jar"); 

2. Exception message: /bin/bash: line 0: fg: no job control — the job was submitted in the local Windows format rather than the cross-platform format. Fix: enable cross-platform submission

conf.set("mapreduce.app-submission.cross-platform", "true");

and, in hdfs-site.xml, disable permission checking:

<property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
</property>

3. java.io.IOException: Could not locate executable null/bin/winutils.exe in the Hadoop binaries. Fix: point hadoop.home.dir at the local Windows Hadoop directory:

System.setProperty("hadoop.home.dir", "E://softs//majorSoft//hadoop-2.7.5"); 

4. HDFS permission errors, or the client cannot resolve the cluster nodes. Fix: make sure every cluster hostname is mapped in C:\Windows\System32\drivers\etc\hosts (see step 1 of Part 1) and that dfs.permissions.enabled is set to false.

 
