subject

Yates' big file sorter

for this assignment, you will design and implement a sort program that still works correctly, even when there is not enough memory to store an entire input file. this will give you practice with collections, files, exceptions, and sorting algorithms.

program description

at runtime, your program should read the name of a "large" input file to sort, and the name to give the sorted output file.

the program will sort the lines in the input file, and save the sorted lines in the output file. your sort algorithm will be a hybrid of the mergesort algorithm and java's built-in sorting algorithm as a two-phase process, as described below.

phase 1: breaking the input file into manageable chunks

in the first phase, the sort algorithm will read in a fixed number of lines from the input file, and store them into an array. it should pick a manageable number that will fit into memory. next, it will sort that array using java's built-in arrays. sort method. then it will store the sorted array in a temporary file called "temp_0_0.txt".

the algorithm will repeatedly read in chunks of lines until it has read in the entire contents of the input file. each time it reads in a chunk of lines from the input file, it stores that chunk in an array, sorts the array, and saves the array in another temporary file ("temp_0_1.txt", "temp_0_2.txt", "temp_0_n. txt"). notice that it is reading in an amount that will fit into memory each time, so that it does not run out of memory.

after this first phase is complete, each of the "temp_0_i. txt" files is in sorted order, and together they contain all of the stuff that was in the input file. the second phase must merge the files together, while making sure that the merged files remain in sorted order.

phase 2: merging the chunks together

remember from class that the mergesort algorithm repeatedly merges two small sorted arrays into a larger sorted array. here, you will do the same, but instead repeatedly merge two small sorted files into a larger sorted file. take care that you only read in one line of each file into memory at a time don't read the entire files into memory, or you will run out of memory!

your sort algorithm should begin phase 2 by merging "temp_0_0.txt" and "temp_0_1.txt", and saving it to a new file called "temp_1_0.txt". next, it will merge "temp_0_2.txt" with "temp_0_3.txt", and save the merged file as "temp_1_1.txt". this will repeat until there are no more "temp_0_i. txt" files left. if there are an odd number of these files, the last one will have nothing to merge with. that's ok, it can be merged in later iterations. just copy it into the next available "temp_1_i. txt".

after merging pairs of the "temp_0_i. txt" files, the sort algorithm needs to begin merging pairs of the "temp_1_i. txt" files. it will begin by merging "temp_1_0.txt" and "temp_1_1.txt", and saving the result to "temp_2_0.txt". then, it will merge "temp_1_2.txt" and "temp_1_3.txt", and save the result to "temp_2_1.txt", and so on.

each time through the set of temporary files, the merging process cuts the number of temporary files roughly in half, because it merges two files into one. (i say "roughly" because there may be an odd number of files, in which case the last file does not get merged with anything.) this phase of the sorting must keep repeating, until there are only two temporary files left. when that happens, it should merge those temporary files, but instead of saving the result to another temporary file, it should save it to the output file. then the sort is finished.

guidelines for testing your program

you may find a reasonably large text file (~7 mb) here, which you may use to test your program. it contains aesop's fables, the complete works of shakespeare, mary shelley's frankenstein, and mark twain's the adventures of huckleberry finn.

the file is certainly small enough to fit into memory i didn't want to you to have to deal with an enormous file. however, you should choose a setting so that your program reads in less than the whole file at a time. at first, try reading in around 1/20 of the lines at a time into memory during the first phase. you should test your program with several different settings.

unit tests

write a junit test to make sure that your merge() function works correctly. be sure to test that it merges arrays that are the same size and two arrays that are not the same size.

ansver
Answers: 1

Another question on Computers and Technology

question
Computers and Technology, 21.06.2019 21:40
Which is a benefit of getting information from a government website? a. the information will be easy to understand. ob. the information will be the most current. oc. the information can be trusted.
Answers: 1
question
Computers and Technology, 22.06.2019 02:00
Consider how gaming consoles initially relied on joysticks and joypads and then made the switch to modern gaming controls, which include analog sticks, buttons and switches, touch controls, accelerometers, motion controls, etc. name at least two kinds of gaming experiences that are possible with these new control devices but were not possible on original joysticks. explain how new technologies made this newer game style possible.
Answers: 1
question
Computers and Technology, 22.06.2019 13:50
The instruction ishl (shift left integer) exists in jvm but not in ijvm. it uses the top two values on the stack, replacing the two with a single value, the result. the sec- ond-from-top word of the stack is the operand to be shifted. its content is shifted left by a value between 0 and 31, inclusive, depending on the value of the 5 least signifi- cant bits of the top word on the stack (the other 27 bits of the top word are ignored). zeros are shifted in from the right for as many bits as the shift count. the opcode for ishl is 120 (0x78).a. what is the arithmetic operation equivalent to shifting left with a count of 2? b. extend the microcode to include this instruction as a part of ijv.
Answers: 1
question
Computers and Technology, 23.06.2019 23:40
4. what is the reason for including the following code snippet in the header file animal.h? #ifndef animal_h #define animal_h class animal { public: animal(); animal(double new_area_hunt); void birth(); void hunt(double new_area_hunt); void death(); double get_area_hunt() const; private: double area_hunt; }; #endif
Answers: 3
You know the right answer?
Yates' big file sorter

for this assignment, you will design and implement a sort program...
Questions
question
English, 14.12.2020 22:40
question
History, 14.12.2020 22:40
Questions on the website: 13722360