Due Thursday, March 17, 11:59 P.M.
- Overview
- Assignment
- Hints
The purpose of this lab is to introduce you to arrays and sorting
in Fortran.
Top
Write a Fortran program called anagrams
that reads in the UNIX dictionary file (/usr/share/dict/words)
and prints a list of all words that are anagrams of each other.
Note that you do not need to handle any other type of input files, and
that the online dictionary is formatted with one word per line and no
blank lines. You may assume the file is formatted correctly to simplify
input processing.
All anagrams of a single word should be printed on a single line
in alphabetical order. Additionally, you should print the lists in
decreasing order of the number of anagrams contained in each list.
If two lists contain the same number of anagrams, they should be printed
in alphabetical order on the first word in the list. Finally, words
without any anagrams should not be printed. So assuming the dictionary
contained only the following words:
acts
aide
bats
cast
cats
fast
fats
idea
pact
past
pats
spat
stab
tabs
taps
wind
the output of your program should be:
past pats spat taps
acts cast cats
bats stab tabs
aide idea
fast fats
Top
- Decide what the largest word you want to support will be. A good
way to do this is to determine the largest word in your actual input
file (28 characters), and pick a number that is comfortably larger than
the actual requirement without being too large, e.g. 64 characters.
Likewise you will want to decide the maximum number of words to support,
say 1048576. Create parameter variables for these values, as well as
the name of the file containing the dictionary. Now create an array of
character variables large enough to hold the dictionary. Whenever you are
declaring an array variable (and later on, whenever you are dynamically
allocating an array), do a quick check that you are not using too
much space. Since character variables use 1 byte per character, this
array uses 64 megabytes of memory, which is reasonable on any modern PC.
You can also determine exactly how much memory is avaiable using the
top command. Now read the dictionary into this array with
one word per array element, and write it out again to standard output.
Make sure you don't write out the uninitialized strings at the end of
the array.
- Now go back and change your array to make it two dimensional.
The second column of this array should contain the words of the
dictionary, as before. The first column should contain a key generated
from each word by shifting the word to all lower case and sorting its
letters alphabetically. So for the word Hippocrates, your
code would generate the key acehiopprst. You should implement
sorting using one of the three algorithms mentioned in class. Test this
by printing out each word with its associated key.
- Now sort the entire array using the keys in the first column (and
using the values in the second column as tiebreakers). Make sure your
sorting code interchanges entire rows and not just the keys in the
first column. Your array should now have all anagrams for a single
word in consecutive rows in alphabetical order; make sure your output
is consistent with this. Note that at this point, you are sorting
the entire dictionary using a relatively inefficient algorithm, which
requires approximately n2 passes through your data,
where n is the number of words in the dictionary. You may
want to print some intermittent status messages to standard error so
you know how much longer you have to wait for your code to finish.
These status messages will help reassure you that your code hasn't gone
into an infinite loop somehere. You may also find it useful to create
a truncated version of the dictionary file to use until you are fairly
sure your code works correctly.
- Now create another two-dimensional array of integers with the same
dimensions as before. (If you want to continue counting how much memory
you are now using, use the fact that integers take up four bytes each).
For each block of anagrams in your character array, make an entry in
your integer array with the size of the block in the first column and
the row number where the block starts in the second column. Test this
by modifying your output loop to use the integer array to get sequences
of indices into the character array, and make sure that each sequence
consists entirely of anagrams. Go ahead and modify your output statements
so that they print all anagrams of a single word on the same line.
The easiest way to do this is probably by concatenating the character
strings from the appropriate array elements into a single character
variable, which you then display.
- Now sort the integer array using the entries in the first column
(the block lengths) as sort keys, and modify your output code so that
only words with at least one anagram are printed. Your program should
now print the lists of anagrams in order of decreasing block lengths.
After you ensure that this works properly, go back and modify your integer
sorting code so that ties are broken by lexicographical comparision of
the words at the start of each block. The anagrams should now appear
in the indicated order in your output.
- Finally, go back and test using the full dictionary (assuming you
have been using a smaller version for testing). Clean up any intermediate
output statements you have been using for debugging so that only the word
lists are printed. You may want to leave in the status report messages
if you have them, since these will be just as useful to the user as
they were to you. As usual, you should check your program against
the output of my code before submitting.
If you have extra time, you may want to optimize the sorting routine
as much as possible to see if you can get a faster runtime than my code
(which used an unoptimized bubble sort, took 566.6 seconds to complete
on nehi6a, and took 562.1 seconds to complete on agnesi).
You can use the /usr/bin/time command to time your code.
Top
Back to the course information page