Context-sensitive Spell Checking

Project leader: Eiríkur Rögnvaldsson

Collaborators: Hrafn Loftsson, Sigrún Helgadóttir

The purpose of this project is to investigate and develop methods for context-sensitive spell checking of Icelandic texts. Most spell checkers, including Friðrik Skúlason's Púki, check each word in isolation and determine only whether it is spelled correctly. However, a considerable proportion of spelling errors do not produce non-words but rather valid words that are used in the wrong place in a sentence. Hence, spell checkers that look only at individual words are unable to detect many of the most common spelling mistakes that Icelanders make. Such mistakes can be detected only by investigating word patterns, syntactic patterns, and collocations, making use of frequency information, and employing statistical models.

In this project, all of the aforementioned methods will be used to develop context-sensitive spell checking for Icelandic. The goal is twofold: to produce a detailed description and analysis of the methods that can be used for this purpose, and to write software that applies this description and analysis to spell checking. Such software could be used as an independent unit that scans texts already checked by ordinary word-based spell checkers and detects errors that such programs are incapable of noticing. The analysis, or the software, could also be linked to, or embedded in, word-based spell checkers to improve their functionality. We will focus on errors in homophones, and our target is to analyze at least 90% of such words correctly.
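
To illustrate the general idea of using frequency information about neighbouring words to check homophones, here is a minimal sketch in Python. It is not the project's software or its chosen method; it only shows one simple statistical approach (bigram counts with add-one smoothing). The confusion set (leiti/leyti), the tiny training corpus, and all function names are assumptions made up for this example.

```python
from collections import Counter

# Hypothetical confusion set: two Icelandic homophones that differ only in i/y.
CONFUSION_SETS = [{"leiti", "leyti"}]

# Toy training sentences, invented for illustration; a real system would
# need counts from a large corpus.
TRAINING = [
    "að mörgu leyti",
    "að þessu leyti",
    "að öllu leyti",
    "á næsta leiti",
    "jólin eru á næsta leiti",
]


def train_bigrams(sentences):
    """Count adjacent word pairs, with sentence boundary markers."""
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts


def score(counts, left, word, right):
    """Add-one smoothed product of the left and right bigram counts."""
    return (counts[(left, word)] + 1) * (counts[(word, right)] + 1)


def check(sentence, counts, confusion_sets=CONFUSION_SETS):
    """Return (position, written word, suggested word) for each word whose
    homophone fits the immediate context strictly better."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    suggestions = []
    for i in range(1, len(tokens) - 1):
        word = tokens[i]
        for confusion_set in confusion_sets:
            if word not in confusion_set:
                continue
            left, right = tokens[i - 1], tokens[i + 1]
            best = max(sorted(confusion_set),
                       key=lambda w: score(counts, left, w, right))
            if score(counts, left, best, right) > score(counts, left, word, right):
                suggestions.append((i - 1, word, best))
    return suggestions


counts = train_bigrams(TRAINING)
print(check("að mörgu leiti", counts))
print(check("á næsta leiti", counts))
```

Both input sentences consist entirely of valid words, so a word-based checker would accept them. The sketch flags a word only when another member of its confusion set scores strictly higher in the same context, which keeps it from proposing a change when the evidence is tied.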