Compiling a Hadoop "Hello, World"
Date: 2023-09-11 14:15:06
cd ~/src
mkdir classes
javac -classpath ~/hadoop-0.20.2/hadoop-0.20.2-core.jar WordCount.java -d classes
jar -cvf WordCount.jar -C classes/ .
hadoop jar WordCount.jar com.codestyle.hadoop.WordCount input output
hadoop fs -ls output
hadoop fs -cat output/part-00000
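Key points:
- When compiling WordCount.java, the Hadoop library jar must be supplied through -classpath. The -d option writes the compiled class files to the classes directory.
- Package the class files into a jar file.
- Run the MapReduce job by passing the jar to hadoop. The results are written to the output directory (if that directory already exists, it must be deleted first). The command arguments must name the class by its package name plus class name.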
WordCount.java
package com.codestyle.hadoop;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

    // Mapper: emits (word, 1) for every whitespace-separated token in a line.
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer: sums the counts collected for each word.
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    // Driver: args[0] is the input path, args[1] is the output path.
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
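To check what the job computes without a cluster, the map and reduce logic can be imitated in plain Java. The following is only a rough sketch, not part of the original program: the class name LocalWordCount and the two input lines are assumptions made for illustration (the post does not show the contents of the input directory), and a TreeMap stands in for Hadoop's sorting of keys between the map and reduce phases.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical helper for illustration only; it does not use Hadoop at all.
final class LocalWordCount {

    // Tokenizes each line on whitespace (as WordCount.Map does) and sums the
    // occurrences per word (as WordCount.Reduce does). Keys come back sorted.
    static SortedMap<String, Integer> count(List<String> lines) {
        SortedMap<String, Integer> totals = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                // Equivalent of emitting (word, 1) and adding it to the running sum.
                totals.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Assumed sample input; the real job reads files from the input directory.
        List<String> lines = Arrays.asList("Hello World", "Hello Hadoop");
        for (Map.Entry<String, Integer> entry : count(lines).entrySet()) {
            // Tab-separated key and value, similar to TextOutputFormat.
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

With the assumed input above, the sketch counts Hello twice and Hadoop and World once each. The real job distributes the same work across map and reduce tasks instead of a single loop.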
Viewing the result
lishujun@lishujun-virtual-machine:~/src$ hadoop fs -cat output/part-00000
Hadoop	1
Hello	2
World	1