Norvig Spellchecker for Perl

20 December 2020

Last week, I thought about how we could transcribe the speech of my friend's grandfather and noted that Sicilian speech is usually transcribed phonetically, whereas English speech is usually transcribed in a standard orthography.

This peculiar characteristic makes it difficult to create and use a Sicilian dictionary because the dictionary entry and the user search could be in phonetic or standard form.

Nonetheless, I never included a spell checker with the Dieli Dictionary, so this page describes my first efforts to write one in Perl. We use Perl to search the Dieli Dictionary, so our goal here is to develop a Perl module that performs Sicilian spell checking.

There are, of course, several perfectly good Perl modules that can perform the task. Lingua::Ispell, for example, wraps the Ispell interactive spell-checking program, while Spellunker implements a spell checker purely in Perl.

But I also wanted to explore the simplicity that Peter Norvig demonstrates on his "How to Write a Spelling Corrector" page, where he shows that spell checking is a probability problem: given the word typed, which candidate word is most likely to be the intended word?
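
Norvig's page frames this question with Bayes' rule. Writing w for the typed word and c for a candidate correction (the symbols follow his essay), the choice can be sketched as:

```latex
\hat{c} \;=\; \arg\max_{c \,\in\, \text{candidates}} P(c \mid w)
        \;=\; \arg\max_{c \,\in\, \text{candidates}} P(c)\, P(w \mid c)
```

Here P(c) measures how common the candidate is in the language, and P(w | c) measures how likely a typist is to write w when c was intended. The denominator P(w) is dropped because it is the same for every candidate.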

To solve the problem, Norvig generates candidate words by applying deletions, transpositions, replacements and insertions to the typed word. He then looks up how often each candidate appears in a large text file and picks the most frequent one.
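
As a rough illustration of that idea, here is a minimal Perl sketch. It is not the module this page sets out to build. The tiny corpus string, the plain a-z alphabet and the names `edits1` and `correction` are assumptions made for the example (the function names are borrowed from Norvig's Python). Unlike Norvig's version, it only considers candidates one edit away.

```perl
use strict;
use warnings;

# Hypothetical toy corpus; a real checker would count the words
# in a large Sicilian text file instead.
my $corpus = "lu cani mancia lu pani e lu cani dormi";

# Word frequencies: how often each word appears in the corpus.
my %count;
$count{$_}++ for lc($corpus) =~ /([a-z]+)/g;

# Assumption: unaccented a-z only; Sicilian text would also need accented vowels.
my @alphabet = ('a' .. 'z');

# Return every string that is one edit away from $word.
sub edits1 {
    my ($word) = @_;
    my %seen;
    for my $i (0 .. length($word)) {
        my $left  = substr($word, 0, $i);
        my $right = substr($word, $i);

        # Deletion: drop the first character of the right half.
        $seen{ $left . substr($right, 1) } = 1 if length($right);

        # Transposition: swap the first two characters of the right half.
        $seen{ $left . substr($right, 1, 1) . substr($right, 0, 1) . substr($right, 2) } = 1
            if length($right) > 1;

        for my $c (@alphabet) {
            # Replacement: substitute one character.
            $seen{ $left . $c . substr($right, 1) } = 1 if length($right);
            # Insertion: add one character.
            $seen{ $left . $c . $right } = 1;
        }
    }
    return keys %seen;
}

# Return the most frequent known word within one edit of $word,
# or $word itself if it is already known or nothing better is found.
sub correction {
    my ($word) = @_;
    return $word if $count{$word};
    my @known = grep { $count{$_} } edits1($word);
    return $word unless @known;
    my ($best) = sort { $count{$b} <=> $count{$a} || $a cmp $b } @known;
    return $best;
}

print correction("cami"), "\n";   # prints "cani"
```

With this toy corpus, "cami" is corrected to "cani" because a single replacement reaches a known word. A word with no known neighbor is returned unchanged.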

And because Norvig only needed 36 lines of Python code to implement this spell checker, many people have reproduced it in other programming languages. Norvig lists them at the bottom of his page.
