Due Date:
Thursday, April 4
File(s) to be submitted:
Huffman.java
Sample Output:
SampleOutput.html
Summary
Write a program called Huffman that uses a PriorityQueue to build a (simplified) "Huffman tree". (The "tree" you create is actually just a String with the tree "encoded" inside it.)
In order to pass large files thru the Internet, they are often compressed into a smaller file that encodes all the information in the larger file. The smaller file can then be sent (more quickly than the larger file) and decompressed at the other end.
One way of compressing a file
is to use what's called a Huffman code.
The idea of a Huffman code
is that the most frequent characters
(or whatever)
in the file
should be encoded with the shortest bit strings.
In ASCII a file with 1000 As followed by an H and an exclamation point
would use 8016 bits.
A Huffman code might use a 0
to represent the A,
10
to represent the H
and 11
to represent the exclamation point --
bringing the total number of bits down to 1004
(plus a few more bytes to encode the table of Huffman codes).
Creating the Huffman code requires the use of a priority queue. The method is as follows:
The final element left in the priority queue is the "tree" used to generate the Huffman codes. (We'll skip generating the codes; just print out the "tree".)
You will need to create a private static class for the elements of the priority queue. Call that class Entry and include it in your program file. It'll hold a String (the "tree" or a single character) and its total frequency.
Create a List of Entrys for counting the frequencies. For each character in the String, go thru the list looking for the Entry with that character. If you find it, add one to its count. If it's not there, create a new Entry and add it to the List.
In building the "code" you need to combine Entrys together. To do so, add their frequencies together, and put the Strings from the two Entrys together in parentheses, separated by a comma, to make the "tree".
For example, conside the String "AAAAAAAAAAH!!". The frequencies are:
The Entrys ("H", 1) and ("!", 2) come out first (in that order). We combine them to form the node ("(H,!)", 3). That goes into the PQ. Then ("(H,!)", 3) and ("A", 10) come out of the PQ. We combine them to form the Entry ("((H,!),A)", 13), and it goes back into the PQ. At this point there is only one Entry in the PQ, so it has the "tree" in it. The String "((H,!),A)" gets printed as the "tree". (See the sample output for more examples.)
Your program must produce the same output as is shown in the sample output, for the first two examples shown. Note that for the third example, there are many equally good trees that could be produced. Your code only needs to produce a good tree for whatever input it's given.