Running Hadoop's Built-in WordCount Example with IDEA on Windows
After a lot of fiddling, I finally got Hadoop installed on Windows yesterday; if you're interested, you can also take a look at how to install it.
Once it's installed, the key thing is knowing how to actually use it, so below we'll run Hadoop's built-in WordCount example.
Tools: IDEA and Maven (if you don't have them, go download them; installation is simple, much like installing the JDK, and if that feels like too much trouble, IDEA's bundled Maven works too).
Create a new project and choose to build it with Maven. Be sure to select JDK 1.8, then just keep clicking Next and fill in the project name (if you've used Maven before, building a new project is easy; if not, search Baidu for how to use Maven).
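For reference, a minimal pom.xml skeleton for a JDK 1.8 project might look like this (the com.example coordinates are placeholders, not from the original post):

    <project xmlns="http://maven.apache.org/POM/4.0.0">
        <modelVersion>4.0.0</modelVersion>
        <!-- Placeholder coordinates; replace with your own. -->
        <groupId>com.example</groupId>
        <artifactId>hadoop-wordcount</artifactId>
        <version>1.0-SNAPSHOT</version>
        <properties>
            <!-- Compile for Java 8, matching the JDK selected above. -->
            <maven.compiler.source>1.8</maven.compiler.source>
            <maven.compiler.target>1.8</maven.compiler.target>
        </properties>
        <!-- The dependencies from step 1 below go here. -->
    </project>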
After the project is built, its structure looks like this (not including the input and output folders).
Next, create a folder named input. Note that it must sit at the same level as src, i.e. directly under the project directory.
Then right-click the input folder and mark it as Excluded. Inside input, create a file and write some words in it; the more words the better, preferably with plenty of duplicates. The output folder is generated automatically when the program runs, so don't create it yourself.
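As a concrete (made-up) example, a file such as input/words.txt could contain:

    hello world hello hadoop
    hadoop world hello
    count the words count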
Now for the code:
1. Add the dependencies: add the following to the pom file
<dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>3.1.3</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-mapreduce-client-core -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>3.1.3</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-mapreduce-client-common -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-common</artifactId>
        <version>3.1.3</version>
    </dependency>
</dependencies>
Note: all three of the dependencies above are required, or the program will fail at runtime. Maven downloads the jars for you automatically. If you use IDEA's bundled Maven this may fail, since the jars are fetched from servers abroad, so I recommend installing your own Maven and switching its mirror to Aliyun (search for how to modify the Maven configuration; a sketch follows below).
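If you do switch to your own Maven, the Aliyun mirror goes in the <mirrors> section of Maven's settings.xml; a rough sketch (the id and name are arbitrary labels):

    <mirror>
        <id>aliyunmaven</id>
        <mirrorOf>central</mirrorOf>
        <name>Aliyun Public Repository</name>
        <url>https://maven.aliyun.com/repository/public</url>
    </mirror>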
2. Add the log file log4j.properties: create this file under the resources folder and add the following
log4j.properties
# Set root logger level to DEBUG and its only appender to A1.
log4j.rootLogger=DEBUG, A1
# A1 is set to be a ConsoleAppender.
log4j.appender.A1=org.apache.log4j.ConsoleAppender
# A1 uses PatternLayout.
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n

3. WordCount code: create the following class
WordCount.java

package test;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
import java.util.StringTokenizer;

/**
 * @Classname WordCount
 * @Description Hadoop's built-in WordCount example
 * @Date 2019/12/6 22:12
 * @Created by KingSSM
 */
public class WordCount {

    // Mapper: splits each input line into tokens and emits (word, 1) for each token.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sums the counts for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // args[0] is the input directory, args[1] the output directory.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Submit the job and wait for it to finish.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Finally, here's how to run the program:
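A minimal way to run it from IDEA, assuming the input folder created earlier: open Run > Edit Configurations, add an Application configuration with test.WordCount as the main class, and set Program arguments to:

    input output

These become args[0] (the input directory) and args[1] (the output directory) in main(). Run the configuration; when the job finishes, the counts are written to output/part-r-00000, one "word<TAB>count" pair per line. Note that Hadoop will not overwrite an existing output directory, so delete output before running again.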