Compiling the Hadoop version of "hello, world"

Date: 2023-09-11 14:15:06

cd ~/src
mkdir classes
javac -classpath ~/hadoop-0.20.2/hadoop-0.20.2-core.jar WordCount.java -d classes
jar -cvf WordCount.jar -C classes/ .
hadoop jar WordCount.jar com.codestyle.hadoop.WordCount input output
hadoop fs -ls output
hadoop fs -cat output/part-00000

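Key points:

When compiling WordCount.java, the Hadoop library jar must be supplied through -classpath. The -d option writes the compiled class files to the classes directory.

Package the class files into a jar file.

Run the MapReduce job by passing the jar file to the hadoop command. The results are written to the output directory; if that directory already exists, it must be deleted first. The command arguments must give the fully qualified class name (package name plus class name).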
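To make the relationship between the package name and the compiled files concrete, here is a minimal plain-Java sketch (no Hadoop dependency). The class name ClassFileLayout and the helper classFilePath are hypothetical names introduced only for this illustration. It shows where javac -d classes is expected to place the class files for com.codestyle.hadoop.WordCount, which is why the job has to be started with the full package-qualified name.

```java
public class ClassFileLayout {

    // Converts a fully qualified (binary) class name into the relative
    // path that javac uses below the directory given with -d.
    static String classFilePath(String qualifiedName) {
        return qualifiedName.replace('.', '/') + ".class";
    }

    public static void main(String[] args) {
        String mainClass = "com.codestyle.hadoop.WordCount";

        // The outer class.
        System.out.println("classes/" + classFilePath(mainClass));

        // Nested classes are compiled to separate files whose names join
        // the outer and inner class names with '$'.
        System.out.println("classes/" + classFilePath(mainClass + "$Map"));
        System.out.println("classes/" + classFilePath(mainClass + "$Reduce"));
    }
}
```

Listing the jar with jar -tf WordCount.jar should show the same three paths, without the classes/ prefix, because -C classes/ . packages the contents of that directory.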

WordCount.java

package com.codestyle.hadoop;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

   // Mapper: splits each input line into whitespace-separated tokens and emits (word, 1) for each token.
   public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
     private final static IntWritable one = new IntWritable(1);
     private Text word = new Text();

     // With TextInputFormat, key is the byte offset of the line and value is the line itself.
     public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
       String line = value.toString();
       StringTokenizer tokenizer = new StringTokenizer(line);
       while (tokenizer.hasMoreTokens()) {
         word.set(tokenizer.nextToken());
         output.collect(word, one);
       }
     }
   }

   // Reducer: receives every count emitted for one word and writes out their sum.
   public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
     public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
       int sum = 0;
       while (values.hasNext()) {
         sum += values.next().get();
       }
       output.collect(key, new IntWritable(sum));
     }
   }

   public static void main(String[] args) throws Exception {
     JobConf conf = new JobConf(WordCount.class);
     conf.setJobName("wordcount");

     // Types of the keys and values that the job writes out.
     conf.setOutputKeyClass(Text.class);
     conf.setOutputValueClass(IntWritable.class);

     conf.setMapperClass(Map.class);
     conf.setReducerClass(Reduce.class);

     // Read the input as lines of text and write the results as text.
     conf.setInputFormat(TextInputFormat.class);
     conf.setOutputFormat(TextOutputFormat.class);

     // args[0] is the input path and args[1] is the output path.
     FileInputFormat.setInputPaths(conf, new Path(args[0]));
     FileOutputFormat.setOutputPath(conf, new Path(args[1]));

     // Submit the job and wait for it to finish.
     JobClient.runJob(conf);
   }
}
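One detail of the mapper worth seeing in isolation is how StringTokenizer splits a line. The following plain-Java sketch (no Hadoop dependency) uses the same single-argument constructor as the Map class; the class name TokenizerDemo, the helper tokens, and the sample sentence are made up for this illustration. With the default delimiters the line is split on whitespace only, so punctuation stays attached to a word and differently capitalized words remain distinct keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerDemo {

    // Splits a line the same way the mapper does: new StringTokenizer(line)
    // uses space, tab, newline, carriage return and form feed as delimiters.
    static List<String> tokens(String line) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample line, not taken from the job's real input.
        for (String token : tokens("Hello, hello World")) {
            System.out.println("[" + token + "]");
        }
    }
}
```

In a job built on this mapper, "Hello," and "hello" would therefore be counted as two different words. Whether that matters depends on the input; normalizing case or stripping punctuation before word.set(...) would be one way to change it.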

Viewing the result

lishujun@lishujun-virtual-machine:~/src$ hadoop fs -cat output/part-00000
Hadoop    1
Hello    2
World    1
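The counts above can be reasoned about without a cluster. The sketch below is a plain-Java imitation of the map, group-by-key, and reduce steps; it is not the Hadoop API. The class name LocalWordCount and the two input lines "Hello World" and "Hello Hadoop" are assumptions chosen to be consistent with the output shown, since the original input files are not included here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {

    static Map<String, Integer> countWords(List<String> lines) {
        // "Map" step: emit a 1 for every token, grouped by word.
        // TreeMap keeps the words sorted, imitating the sorted order in
        // which keys reach the reducer.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                grouped.computeIfAbsent(tokenizer.nextToken(), k -> new ArrayList<>()).add(1);
            }
        }

        // "Reduce" step: sum the values collected for each word.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int value : entry.getValue()) {
                sum += value;
            }
            counts.put(entry.getKey(), sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed input; the real job reads the files in the input directory.
        List<String> lines = Arrays.asList("Hello World", "Hello Hadoop");
        for (Map.Entry<String, Integer> entry : countWords(lines).entrySet()) {
            // Key and value separated by a tab, like TextOutputFormat's default.
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

Under these assumed inputs the sketch prints Hadoop 1, Hello 2, and World 1, one pair per line, matching the contents of part-00000.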

References:

http://www.cnblogs.com/xia520pi/archive/2012/05/16/2504205.html

http://blog.csdn.net/xw13106209/article/details/6862480

http://blog.csdn.net/turkeyzhou/article/details/8121601
