How can I see how much data each reducer processes in MapReduce?

Hi. The first approach is to have the Mapper read the text file and use a StringTokenizer to split out the numbers on each line (Hadoop reads text files one line at a time), sum the numbers on that line, and write the per-line sum to the Context as a key/value pair. The Reducer then aggregates these intermediate sums to produce the total of all numbers in the file.

The second approach is to have the Mapper read the text file, use a StringTokenizer to split out each number, and count how many times each distinct number occurs. During the combine step the contribution of each distinct number is merged, and the Reducer then aggregates the per-number results to produce the total of all numbers in the file (a sketch of this combiner-based variant appears after the first program below).

package com.metarnet.hadoop;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class NumberSum {

    // Split each input line into numbers and emit the per-line sum
    public static class SumMapper extends
            Mapper<Object, Text, Text, LongWritable> {

        private Text word = new Text("sum");
        private LongWritable numValue = new LongWritable(1);

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            long sum = 0;
            while (itr.hasMoreTokens()) {
                String s = itr.nextToken();
                long val = Long.parseLong(s);
                sum += val;
            }
            numValue.set(sum);
            // All per-line sums share the key "sum", so they all reach the same reducer
            context.write(word, numValue);
        }
    }

    // Aggregate the per-line sums and write the final total
    public static class SumReducer extends
            Reducer<Text, LongWritable, Text, LongWritable> {

        private LongWritable result = new LongWritable();
        private Text k = new Text("sum");

        @Override
        public void reduce(Text key, Iterable<LongWritable> values,
                Context context) throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable val : values) {
                long v = val.get();
                sum += v;
            }
            result.set(sum);
            context.write(k, result);
        }
    }

    /**
     * @param args
     * @throws Exception
     */
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args)
                .getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: numbersum <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "number sum");
        job.setJarByClass(NumberSum.class);
        job.setMapperClass(SumMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        boolean success = job.waitForCompletion(true);
        if (success) {
            System.out.println("ok");
        }
        System.exit(success ? 0 : 1);
    }
}
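For completeness, a job like this is packaged into a jar and launched with the hadoop command; the jar name and HDFS paths below are placeholders, not from the original post:

hadoop jar numbersum.jar com.metarnet.hadoop.NumberSum /user/demo/numbers /user/demo/numbers-out
hdfs dfs -cat /user/demo/numbers-out/part-r-00000

The output directory should contain a single line with the key "sum" and the total.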

The first implementation is relatively simple; the second one uses a Combiner class. Hadoop may apply the combiner to the intermediate values repeatedly (it can be invoked again whenever map output is spilled and merged), folding the data together step by step before it reaches the reducer.
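The post only describes the second approach in prose, so here is a minimal sketch of what that combiner-based variant could look like, assuming a single reducer; the class and variable names (NumberSumWithCombiner, TokenMapper, CountCombiner, TotalReducer) are my own illustration. Because Hadoop may run the combiner zero, one, or several times, the combiner below only merges counts, and the multiply-by-the-number step is left to the reducer so that repeated combining stays correct.

package com.metarnet.hadoop;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NumberSumWithCombiner {

    // Emit each number as its own key with a count of 1
    public static class TokenMapper extends Mapper<Object, Text, Text, LongWritable> {
        private final Text number = new Text();
        private final LongWritable one = new LongWritable(1);

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                number.set(itr.nextToken());
                context.write(number, one);
            }
        }
    }

    // Combiner: merge the counts for each number on the map side.
    // It only adds counts, so it remains correct however many times Hadoop runs it.
    public static class CountCombiner extends Reducer<Text, LongWritable, Text, LongWritable> {
        private final LongWritable count = new LongWritable();

        @Override
        public void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long c = 0;
            for (LongWritable v : values) {
                c += v.get();
            }
            count.set(c);
            context.write(key, count);
        }
    }

    // Reducer: multiply each number by its total count, accumulate the grand total
    // across all keys, and write it once in cleanup().
    public static class TotalReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        private long total = 0;

        @Override
        public void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long count = 0;
            for (LongWritable v : values) {
                count += v.get();
            }
            total += Long.parseLong(key.toString()) * count;
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            context.write(new Text("sum"), new LongWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "number sum with combiner");
        job.setJarByClass(NumberSumWithCombiner.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(CountCombiner.class);
        job.setReducerClass(TotalReducer.class);
        job.setNumReduceTasks(1); // a single reducer so the grand total ends up in one place
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}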

From these two approaches you can see that the core of Map/Reduce is deciding how the input data is split into key/value pairs; a sensible choice of keys has a large impact on both the final output and how the algorithm is implemented.