Engineering, 07.03.2020 05:43 wbrandi118

JAVA HADOOP MAPREDUCE

Modify the WordCount program so that it outputs the word count for each distinct word in each file. The output of this DocWordCount program should be of the form ‘wordfilename count’, where ‘’ serves as a delimiter between word and filename and a tab serves as a delimiter between filename and count. Submit your source code in a file named DocWordCount.java.

Explanation: Consider two simple files, file1.txt and file2.txt.

$ echo "Hadoop is yellow Hadoop" > file1.txt
$ echo "yellow Hadoop is an elephant" > file2.txt

Running DocWordCount.java on these two files will give output similar to that below, where ‘’ is the delimiter.

Output of DocWordCount.java

yellowfile2.txt 1
Hadoopfile2.txt 1
isfile2.txt 1
elephantfile2.txt 1
yellowfile1.txt 1
Hadoopfile1.txt 2
isfile1.txt 1
anfile2.txt 1

Initial code that needs to be modified:

package org.myorg;

import java.io.IOException;
import java.util.regex.Pattern;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.log4j.Logger;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

public class WordCount extends Configured implements Tool {

  private static final Logger LOG = Logger.getLogger(WordCount.class);

  public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new WordCount(), args);
    System.exit(res);
  }

  // Configures the job: args[0] is the input path list, args[1] the output directory.
  public int run(String[] args) throws Exception {
    Job job = Job.getInstance(getConf(), "wordcount");
    job.setJarByClass(this.getClass());

    FileInputFormat.addInputPaths(job, args[0]);
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    return job.waitForCompletion(true) ? 0 : 1;
  }

  // Emits (word, 1) for every non-empty token in each input line.
  public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    private static final Pattern WORD_BOUNDARY = Pattern.compile("\\s*\\b\\s*");

    @Override
    public void map(LongWritable offset, Text lineText, Context context)
        throws IOException, InterruptedException {

      String line = lineText.toString();
      Text currentWord = new Text();

      for (String word : WORD_BOUNDARY.split(line)) {
        if (word.isEmpty()) {
          continue;
        }
        currentWord = new Text(word);
        context.write(currentWord, one);
      }
    }
  }

  // Sums the counts emitted for each word.
  public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text word, Iterable<IntWritable> counts, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable count : counts) {
        sum += count.get();
      }
      context.write(word, new IntWritable(sum));
    }
  }
}
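
The change the question asks for comes down to one idea: make the map output key the word joined to the file name, so that the existing summing reducer counts per word per file. The sketch below simulates that in plain Java, without Hadoop, on the two sample files. It is an illustration under stated assumptions, not a finished DocWordCount.java: the class name DocWordCountSketch, the helper names, and the delimiter value "#####" are all hypothetical placeholders, since the actual delimiter is not shown in the question text. In a real mapper the file name would typically be read from the input split, for example ((FileSplit) context.getInputSplit()).getPath().getName() with org.apache.hadoop.mapreduce.lib.input.FileSplit; that call is not exercised here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Plain-Java simulation of the DocWordCount map and reduce steps (no Hadoop needed).
class DocWordCountSketch {

    // ASSUMPTION: placeholder delimiter; substitute the one the assignment specifies.
    static final String DELIMITER = "#####";

    // Same word-boundary pattern as the WordCount mapper.
    static final Pattern WORD_BOUNDARY = Pattern.compile("\\s*\\b\\s*");

    // "Map" step: build the composite key word + DELIMITER + fileName for each word.
    // "Reduce" step: merge() adds 1 to the running total for that composite key.
    static void mapLine(String fileName, String line, Map<String, Integer> counts) {
        for (String word : WORD_BOUNDARY.split(line)) {
            if (word.isEmpty()) {
                continue; // the pattern can yield empty tokens, as in the original mapper
            }
            counts.merge(word + DELIMITER + fileName, 1, Integer::sum);
        }
    }

    // Feeds the two sample files from the question through the simulated job.
    static Map<String, Integer> countSampleFiles() {
        Map<String, Integer> counts = new LinkedHashMap<>();
        mapLine("file1.txt", "Hadoop is yellow Hadoop", counts);
        mapLine("file2.txt", "yellow Hadoop is an elephant", counts);
        return counts;
    }

    public static void main(String[] args) {
        // A tab separates the composite key from the count, as the question requires.
        for (Map.Entry<String, Integer> entry : countSampleFiles().entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

Because only the key changes, the Reduce class from the initial code can stay as it is. The order of the printed lines will generally differ from the order a Hadoop job produces.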
