This program needs the acos() function (arc cosine). This function is
not built in to Perl but is provided by the standard module Math::Trig,
so you should put "use Math::Trig;" at the top of the program.

Start by checking whether any file arguments have been supplied. If
this is not the case, generate an error message and stop the program
(with die()). Otherwise, apply the program for exercise 6.3 to the
files by calling the function system(). The arguments of the function
should be the same words that you normally type at the command line,
each as a separate argument: "perl", "-w" and "63.pl", all separated
by commas. The final argument should be the file list @ARGV.
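
The argument check and the system() call could be sketched as follows.
The helper names tfidf_command() and run_tfidf() and the wording of the
error messages are made up for this illustration; only the list "perl",
"-w", "63.pl", @ARGV comes from the description above.

```perl
use strict;
use warnings;

# Build the command as a list, one element per command-line word.
# (tfidf_command is a hypothetical helper name.)
sub tfidf_command {
    my @files = @_;
    return ("perl", "-w", "63.pl", @files);
}

# Stop with an error message if no files were given; otherwise run 63.pl.
# (run_tfidf is a hypothetical helper name.)
sub run_tfidf {
    my @files = @_;
    die "usage: 64.pl file1 file2 ...\n" unless @files;
    # system() returns 0 when the command succeeded
    system(tfidf_command(@files)) == 0
        or die "63.pl failed (status $?)\n";
    return;
}
```

In 64.pl this would be used as run_tfidf(@ARGV); near the top of the
program. Checking the return value of system() is an extra precaution
that the exercise does not ask for.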

Next, open the tf-idf file for the first argument (generated by 63.pl),
read it line by line and store its contents in a hash %tfidf that maps
each word to its tf-idf score. Note that you should not read the input
file $file itself but its tf-idf file "$file.tfidf". The hash
represents a vector, and we also need the length of this vector. You
might as well start computing this length in the loop that processes
the lines of the tf-idf file. The length of a vector (x,y,z) is equal
to the square root (sqrt()) of x*x+y*y+z*z. Set a variable $length to
0 before opening the file, and in the line-reading loop add $t*$t to
this variable, where $t is the tf-idf score associated with a word.
When the complete file has been processed, replace $length by its
square root (sqrt()).
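
A minimal sketch of this step is shown below. It assumes that every
line of the tf-idf file holds a word followed by its score, separated
by white space; adapt the split() if your 63.pl writes the lines
differently. The name read_tfidf() is made up for this illustration.

```perl
use strict;
use warnings;

# Read "$file.tfidf" into a hash and compute the length of its vector.
# Assumption: each line looks like "<word> <score>".
sub read_tfidf {
    my ($file) = @_;
    my %tfidf;
    my $length = 0;
    open(my $in, "<", "$file.tfidf")
        or die "cannot open $file.tfidf: $!\n";
    while (my $line = <$in>) {
        my ($w, $t) = split(' ', $line);
        next unless defined $t;      # skip empty or incomplete lines
        $tfidf{$w} = $t;
        $length += $t * $t;          # sum of squares so far
    }
    close($in);
    return (\%tfidf, sqrt($length)); # square root only at the end
}
```

A call could look like: my ($tfidf, $length) = read_tfidf($ARGV[0]);
The hash is returned as a reference here, so a score is looked up with
$tfidf->{$w} rather than $tfidf{$w}.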

After this, start a loop over all remaining files. For each file, open
its tf-idf file, read its lines and update two values: the sum of the
products of the corresponding scores in this file and the first file,
and the length of the vector of the current file. Set two variables
$product and $length2 to 0 before opening the new file. While reading
the file, add $t*$tfidf{$w} to $product if $tfidf{$w} exists ($t is
the tf-idf score of $w in the current file) and add $t*$t to $length2.
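
The body of this loop might look like the sketch below. As before, it
assumes lines of the form "<word> <score>" in the tf-idf file, and the
name product_and_length() is made up for this illustration. The hash of
the first file is passed in as a reference.

```perl
use strict;
use warnings;

# Accumulate the dot product with the first file's vector and the sum
# of squares of the current file's vector.
# Assumption: each line of "$file.tfidf" looks like "<word> <score>".
sub product_and_length {
    my ($file, $tfidf) = @_;    # $tfidf: reference to the first file's hash
    my $product = 0;
    my $length2 = 0;
    open(my $in, "<", "$file.tfidf")
        or die "cannot open $file.tfidf: $!\n";
    while (my $line = <$in>) {
        my ($w, $t) = split(' ', $line);
        next unless defined $t;
        # only words that also occur in the first file contribute
        $product += $t * $tfidf->{$w} if exists $tfidf->{$w};
        $length2 += $t * $t;
    }
    close($in);
    return ($product, $length2);  # $length2 is still the sum of squares
}
```

Inside a loop over the remaining file names this would be called as
my ($product, $length2) = product_and_length($file, \%tfidf);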

When all the lines have been processed, compute the actual length of
the current vector by taking the square root of $length2. Then you
can compute the similarity score of the current file and the first
file, which is the angle between their two vectors, by applying acos()
to $product divided by the product of the two lengths:
acos($product/($length*$length2)). Print the result of this
computation together with the name of the current file and continue
with processing the next file.
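
The final computation can be sketched as below. Two precautions are
additions of this sketch and not part of the exercise: a vector of
length 0 gives an undefined result instead of a division by zero, and
the quotient is clamped to the range -1..1 because rounding errors can
push it slightly above 1, in which case acos() from Math::Trig returns
a complex number. The name angle() and the numbers in the printf() call
are made up.

```perl
use strict;
use warnings;
use Math::Trig;

# Angle between two vectors, given their dot product and their lengths
# (square roots already taken). angle() is a hypothetical helper name.
sub angle {
    my ($product, $length, $length2) = @_;
    # choice of this sketch: no angle if one of the vectors is empty
    return undef if $length == 0 || $length2 == 0;
    my $cosine = $product / ($length * $length2);
    $cosine = 1  if $cosine > 1;    # guard against rounding errors
    $cosine = -1 if $cosine < -1;
    return acos($cosine);
}

# Illustrative values only; in 64.pl they come from the tf-idf files.
printf("%.3f %s\n", angle(3, 5, sqrt(5)), "other.txt");
```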

The output values lie between two extremes: 0.000 means that the
vector of the current file points in the same direction as that of the
first file, as happens when both files contain exactly the same words
in the same proportions (the order of the words does not matter),
while 1.571 (pi/2) means that the two files do not share any words.
The lower the score received by a file, the more similar it is to the
first file.
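
The two extreme values can be checked with a small self-contained
sketch that works on hashes in memory instead of tf-idf files. The
words and scores are made up, and angle_between() is a hypothetical
helper name.

```perl
use strict;
use warnings;
use Math::Trig;

# Angle between two vectors stored as word => score hashes.
sub angle_between {
    my ($first, $second) = @_;
    my ($product, $length, $length2) = (0, 0, 0);
    $length += $_ * $_ for values %$first;
    for my $w (keys %$second) {
        my $t = $second->{$w};
        $product += $t * $first->{$w} if exists $first->{$w};
        $length2 += $t * $t;
    }
    return acos($product / (sqrt($length) * sqrt($length2)));
}

my %first = (man => 3, japan => 4);   # made-up scores
my %same  = (japan => 4, man => 3);   # same words, same scores
my %other = (wasp => 2);              # no words in common

printf("%.3f same words\n", angle_between(\%first, \%same));        # 0.000
printf("%.3f no shared words\n", angle_between(\%first, \%other));  # 1.571
```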

Here are the results of an example run with three files as well as
the files that were involved. You can try running your program on
the same files and check if you obtain the same results.

erikt@stuwww:~ perl -w 64.pl lim1.txt lim2.txt lim3.txt   
1.552 lim2.txt
1.547 lim3.txt
erikt@stuwww:~ cat lim1.txt
    There was a young man from Japan
    Whose limericks never would scan.

        When asked why this was,
        He answered 'because

    I always try to fit as many syllables into the last line as ever possibly I can.' 
erikt@stuwww:~ cat lim2.txt
    There once was a man from the sticks
    Who liked to compose limericks.

        But he failed at the sport,
        For he wrote 'em too short.

erikt@stuwww:~ cat lim3.txt
    There was an old man of St. Bees,
    Who was stung in the arm by a wasp;

        When they asked, "Does it hurt?"
        He replied, "No, it does n't,

    But I thought all the while 't was a Hornet!"