Math 171 Spring 2005, Lab 12 (double lab)

Due Thursday, March 17, 11:59 P.M.

  1. Overview
  2. Assignment
  3. Hints

Overview

The purpose of this lab is to introduce you to arrays and sorting in Fortran.

Top

Assignment

Write a Fortran program called anagrams that reads in the UNIX dictionary file (/usr/share/dict/words) and prints a list of all words that are anagrams of each other. Note that you do not need to handle any other type of input files, and that the online dictionary is formatted with one word per line and no blank lines. You may assume the file is formatted correctly to simplify input processing.

All anagrams of a single word should be printed on a single line in alphabetical order. Additionally, you should print the lists in decreasing order of the number of anagrams contained in each list. If two lists contain the same number of anagrams, they should be printed in alphabetical order on the first word in the list. Finally, words without any anagrams should not be printed. So assuming the dictionary contained only the following words:


acts
aide
bats
cast
cats
fast
fats
idea
pact
past
pats
spat
stab
tabs
taps
wind

the output of your program should be:

past pats spat taps
acts cast cats
bats stab tabs
aide idea
fast fats

Top

Hints

  1. Decide what the largest word you want to support will be. A good way to do this is to determine the largest word in your actual input file (28 characters), and pick a number that is comfortably larger than the actual requirement without being too large, e.g. 64 characters. Likewise you will want to decide the maximum number of words to support, say 1048576. Create parameter variables for these values, as well as the name of the file containing the dictionary. Now create an array of character variables large enough to hold the dictionary. Whenever you are declaring an array variable (and later on, whenever you are dynamically allocating an array), do a quick check that you are not using too much space. Since character variables use 1 byte per character, this array uses 64 megabytes of memory, which is reasonable on any modern PC. You can also determine exactly how much memory is avaiable using the top command. Now read the dictionary into this array with one word per array element, and write it out again to standard output. Make sure you don't write out the uninitialized strings at the end of the array.
  2. Now go back and change your array to make it two dimensional. The second column of this array should contain the words of the dictionary, as before. The first column should contain a key generated from each word by shifting the word to all lower case and sorting its letters alphabetically. So for the word Hippocrates, your code would generate the key acehiopprst. You should implement sorting using one of the three algorithms mentioned in class. Test this by printing out each word with its associated key.
  3. Now sort the entire array using the keys in the first column (and using the values in the second column as tiebreakers). Make sure your sorting code interchanges entire rows and not just the keys in the first column. Your array should now have all anagrams for a single word in consecutive rows in alphabetical order; make sure your output is consistent with this. Note that at this point, you are sorting the entire dictionary using a relatively inefficient algorithm, which requires approximately n2 passes through your data, where n is the number of words in the dictionary. You may want to print some intermittent status messages to standard error so you know how much longer you have to wait for your code to finish. These status messages will help reassure you that your code hasn't gone into an infinite loop somehere. You may also find it useful to create a truncated version of the dictionary file to use until you are fairly sure your code works correctly.
  4. Now create another two-dimensional array of integers with the same dimensions as before. (If you want to continue counting how much memory you are now using, use the fact that integers take up four bytes each). For each block of anagrams in your character array, make an entry in your integer array with the size of the block in the first column and the row number where the block starts in the second column. Test this by modifying your output loop to use the integer array to get sequences of indices into the character array, and make sure that each sequence consists entirely of anagrams. Go ahead and modify your output statements so that they print all anagrams of a single word on the same line. The easiest way to do this is probably by concatenating the character strings from the appropriate array elements into a single character variable, which you then display.
  5. Now sort the integer array using the entries in the first column (the block lengths) as sort keys, and modify your output code so that only words with at least one anagram are printed. Your program should now print the lists of anagrams in order of decreasing block lengths. After you ensure that this works properly, go back and modify your integer sorting code so that ties are broken by lexicographical comparision of the words at the start of each block. The anagrams should now appear in the indicated order in your output.
  6. Finally, go back and test using the full dictionary (assuming you have been using a smaller version for testing). Clean up any intermediate output statements you have been using for debugging so that only the word lists are printed. You may want to leave in the status report messages if you have them, since these will be just as useful to the user as they were to you. As usual, you should check your program against the output of my code before submitting. If you have extra time, you may want to optimize the sorting routine as much as possible to see if you can get a faster runtime than my code (which used an unoptimized bubble sort, took 566.6 seconds to complete on nehi6a, and took 562.1 seconds to complete on agnesi). You can use the /usr/bin/time command to time your code.

Top

Back to the course information page