Tlvunarfri 2

Weekly note 6

We have now covered the first three chapters in the Algorithms book, with the exception of a few section. This week we will start with abstract data types in Chapter 4 and look at several implementations of stacks and queues.

Next week we will continue with implementations of stacks and queues.

Programming project 1

In this first programming project you are to write the program Drithfundur in C++. Drithfundur read in a book as a text file and "imitates" the book by writing a random text that has the same frequence distribution of letters as the original book.

Assume that we want to imitate the saga Njla. The primitive version of Drifhfundur would find the frequency of all letters in Njla and each time pick a letter at random based on those probabilities. For example if a space, ' ', is every fifth letter on averate in Njla, then the probability that the program outputs a space is 20%. The second most common letter might be the letter 'a', and then there is less probability of the program picking it, etc. The text that comes out has the same frequence distribution on the letters as Njla, but doesn't look much like Njla.

However, we can continue with this idea and base the probability on which letter came last For instance, there is a much higher probability for the letter '' if the last letter was an 'a', than if it was some other letter. This is because of how common words "a" and "a" are in Njla (as in most other icelandic books). Below is an example of how such a random text could look like:

a Gumirg hanni ir el t di s h hrekumrastrjg heg ai Gr s. ta ll ttti a vldindir, stir

Now we can increase the number of letters that we base the probability on, such that we look at the last k letters used. In the example above we had k = 1, but if we use the last two letters we can get the random text:

gir bjt sangbeir Gist hafli hg mr fa akarpingi st kvai. Gis a rar vorir eranga. a

As k increases the random text starts looking more like the original text. Let's look at k = 5:

taki til Mdal. a var Mrk ba heyverkum fyrir a yfir v valda, val-Freyjur randa f

and k = 10:

r a nta sjlfdmi Gunnar." Mrur var allra kvenna fegurst og best a sr orin um a er k

The algorithm that you are to use is not very complicated. The program reads the value of k, how long the random text should be and the name of the input file. It starts by reading the whole input file into a string (string variable or char array) and chooses a k-letter initial string at random from the input text as a precursor and prints it out. We then search for this k-letter precursor in the text and each time we find it we add the letter that immediately follows it in the text to a list. For example if k is 2 and the first precursor is "e" then the list of letters that that come after it could be ['s', 't', 's', 'i', 'i', ... ]. We then choose a letter at random from this list. Those letters that appear often in the list have a higher probability of being chosen than those that appear less often.
When the first random letter has been chosen it is printed out and the precursor updated by throwing away the first letter and appending the new letter. For example if the letter 't' had been chosen in the example above, then the next precursor would be "et", and we would then again make a list of all the letters that immediately follow the string "et" in the text.

There are various data structures that would make the seach for the letters that follow precursor faster, but in this project it is sufficient to just search by going sequentially through the text each time, recording the letter that follows every time we find the precursor.

This project should not be very hard. Try to do it as much as possible by yourselves, since this will benefit you most. As before, it is all right to discuss the project with your fellow students, but avoid accepting any program code.

You can find many books in icelandic at the web page of Nettgfan. They are all in HTML format, but there are not many HTML commands in the files, and you can always get the clean text by selecting the whole text in the browser (ctrl-a), copy it (ctrl-c), and pasting it into a texteditor (ctrl-v). There is also large amounts of books in english at the Project Gutenberg.

Hand in a short report on your program, where you describe in more detail the design of the program and show the results of running the program on 2-3 different books for the values of 1, 4, 7, and 10 of k. Also hand in the printout of the program to your section teacher before 6pm on friday February 25th. They will not accept solutions that arrive later.

annaing (hja) / hh (hja), February 14th, 2005.