Big Data: the classic Hadoop introductory example, wordcount word counting implemented in Java
A Java implementation of Hadoop's classic introductory wordcount example, in a Windows 10 environment with the IntelliJ IDEA IDE.
Appendix 1 shows how to run word counting on the Hadoop big-data platform from the command line. Here we implement the same functionality with our own Java code. This example is based on Hadoop 2.8.3 on Windows 10 (64-bit); the development environment is IntelliJ IDEA on Windows.
1. First, add the Maven dependencies that Hadoop development needs to the project's pom.xml in IntelliJ IDEA, then sync:
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.8.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.8.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.8.3</version>
    </dependency>
</dependencies>
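For context, these dependencies sit inside the <dependencies> element of a standard pom.xml. A minimal sketch of the surrounding file, where the project's own groupId, artifactId, and version are placeholder assumptions, not values from the original article:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <!-- Placeholder coordinates for this example project -->
    <groupId>com.example</groupId>
    <artifactId>bigdata</artifactId>
    <version>1.0-SNAPSHOT</version>
    <!-- The three Hadoop dependencies shown above go here -->
    <dependencies>
    </dependencies>
</project>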
2. Write the Java code for word counting.
Main class WordCountMain.java:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountMain {
    public WordCountMain(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration, "word_count");
        job.setJarByClass(WordCountMain.class);
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        // Key/value types emitted by the mapper.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        // Key/value types written by the reducer.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        // args[0] is the input path, args[1] the output path.
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.out.println(job.waitForCompletion(true) ? "run succeeded" : "run failed");
    }

    public static void main(String[] args) {
        try {
            WordCountMain wordCountMain = new WordCountMain(args);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
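One optional refinement, not in the original article: because summing counts is associative and commutative, the reducer class can also be registered as a combiner, so partial sums are computed on the map side and less data crosses the network in the shuffle. A minimal sketch, added to the job setup above:

// Optional: run MyReducer as a combiner for map-side partial aggregation.
// Safe here because addition of counts is associative and commutative.
job.setCombinerClass(MyReducer.class);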
The map class. For an input line such as "hello world hello", it emits the pairs (hello, 1), (world, 1), (hello, 1):
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) {
        String line = value.toString();
        String[] words = line.split(" ");
        for (String word : words) {
            // Emit each word as the key, with a count of 1 as the value.
            try {
                context.write(new Text(word), new LongWritable(1));
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
The reducer class. The framework groups the mapper output by key, so for the key "hello" above the reducer receives the values [1, 1] and writes (hello, 2):
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context) {
        // Sum all the 1s emitted for this word.
        long count = 0;
        for (LongWritable value : values) {
            count += value.get();
        }
        try {
            context.write(key, new LongWritable(count));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
3. Build the Java code above into a jar artifact in IntelliJ IDEA. If the two folders are not deleted, the jar will throw an error when it runs and the job will fail.
4. Start Hadoop with the start-all command, as in Appendix 1. The only difference is that this time the jar being run is not the one from the Hadoop example code, but the one built from my own Java code:
hadoop jar E:/code/IdeaProjects/bigdata/out/artifacts/bigdata_jar/bigdata_jar.jar WordCountMain /test_dir/myfile /test_dir/result
bigdata_jar.jar is the jar generated from the Java code above in IntelliJ IDEA in step 3. Here /test_dir/myfile is the input path (args[0]) and /test_dir/result is the output path (args[1]).
The output of the run is the same as in Appendix 1.
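To inspect the result, the output directory can be read back from HDFS. A minimal sketch; part-r-00000 is the standard file name MapReduce gives the first reducer's output, and the exact file names in your output directory may differ:
hadoop fs -cat /test_dir/result/part-r-00000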
Appendix:
1. Counting words and characters from the command line with Hadoop's built-in word count jar