Sunday, July 31, 2005

Checking the spelling in a given string, such as the text in a control, is something that a lot of Visual FoxPro applications can use. There are many ways to accomplish this. Some Visual FoxPro developers use Microsoft Word, much as I did in my FAQ about it. Others shell out $250 to $500 for a third-party tool or control that will get the job done. And some just do without.

Making a Contribution
Well, a couple of days ago I was looking for a way to put my recent blog about the Visual FoxPro Community into action at this end, and I decided to try my hand at creating a class that makes it easy to add spelling checks to Visual FoxPro applications. Here are some of the important steps I took, followed by a screen capture of the spelling checker I created in Visual FoxPro and a couple of links, so you can download a copy of the project (all source code included), some runnable examples, and even an alternate dictionary table.

Step 1:
Find a way to recognize a misspelling. Obviously I needed a dictionary. Luckily there are plenty of open source dictionaries out there, and soon I had compiled a dictionary table of 144,238 words. The words came from Kevin Atkinson's Spell Checking Oriented Word Lists (SCOWL). I took English and American words up to level 70 (level 80 and 95 words are mostly obscure, for Scrabble players only), removed possessive words and duplicates, and gave the list a few final tweaks. 140,000+ words is a nice size dictionary for any spelling checker.
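As a rough illustration of that cleanup, here is a minimal sketch. It assumes the SCOWL lists have already been merged into a plain text file with one word per line; the file name wordlist.txt, the cursor rawwords, the table dictionary, and the field word are all made up for this example and are not taken from the project.

```foxpro
* Sketch only: file, cursor, table, and field names are assumptions.
CREATE CURSOR rawwords (word C(40))
APPEND FROM wordlist.txt TYPE SDF

* Keep one copy of each word and skip possessives such as "dog's".
SELECT DISTINCT word ;
    FROM rawwords ;
    WHERE NOT EMPTY(word) ;
      AND NOT (RIGHT(ALLTRIM(word), 2) == "'s") ;
    ORDER BY word ;
    INTO TABLE dictionary
```

DISTINCT here only removes exact duplicates, so capitalized entries stay separate from their lowercase forms.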

Step 2:
Give the user some choices of possible matches. Here again I was lucky in that I've worked on a number of sounds-like algorithms for Visual FoxPro in the past (including Improved Soundex, Metaphone, and Double Metaphone). In this case, because I was dealing with English words, the Improved Soundex was all that was needed, and it is the fastest of the alternatives, so that was a win-win. However, this only handled misspellings that sound like the actual spelling. If I ran into “thwrw” because the user's finger had slipped to the “w” instead of the “e”, the Improved Soundex algorithm wasn't going to do me much good.

Well, it just so happened that I had run into a posting of a class method for computing the Levenshtein Distance in Visual FoxPro and I had rewritten it (see the bottom of the page I linked). The Levenshtein Distance is an excellent indicator of how closely matched two words are even though they don't sound the same.
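For readers who haven't met it, here is a minimal sketch of the textbook dynamic-programming form of the Levenshtein Distance. It is not the rewritten method mentioned above, and the function name LevDistance and its parameters are assumptions made for this example.

```foxpro
* Two substitutions (w -> e, twice) turn "thwrw" into "there".
? LevDistance("thwrw", "there")    && displays 2

* Illustrative only: standard algorithm, hypothetical names.
FUNCTION LevDistance(tcFrom, tcTo)
    LOCAL lnLenFrom, lnLenTo, lnRow, lnCol, lnCost
    lnLenFrom = LEN(tcFrom)
    lnLenTo = LEN(tcTo)
    LOCAL ARRAY laDist[lnLenFrom + 1, lnLenTo + 1]

    * First column and row: distance to or from the empty string.
    FOR lnRow = 0 TO lnLenFrom
        laDist[lnRow + 1, 1] = lnRow
    ENDFOR
    FOR lnCol = 0 TO lnLenTo
        laDist[1, lnCol + 1] = lnCol
    ENDFOR

    FOR lnRow = 1 TO lnLenFrom
        FOR lnCol = 1 TO lnLenTo
            lnCost = IIF(SUBSTR(tcFrom, lnRow, 1) == SUBSTR(tcTo, lnCol, 1), 0, 1)
            * Cheapest of: deletion, insertion, substitution (or match).
            laDist[lnRow + 1, lnCol + 1] = MIN( ;
                laDist[lnRow, lnCol + 1] + 1, ;
                laDist[lnRow + 1, lnCol] + 1, ;
                laDist[lnRow, lnCol] + lnCost)
        ENDFOR
    ENDFOR

    RETURN laDist[lnLenFrom + 1, lnLenTo + 1]
ENDFUNC
```

The comparison is case-sensitive as written; wrap both arguments in LOWER() if case should be ignored.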

Step 3:
With the first two hurdles overcome, it was time to figure out what kind of design I would go for. I decided on doing the entire thing with classes and in such a way that new user interfaces could be created and hooked up. Now, bear in mind that this is just my first stab at it and less than two days have gone by, but I think you'll see some things you like if you care enough to dig into it.

Before I go any further, let me say that I have not put the error handling into it yet. That will go in next. I find with projects of this size (and when there is a high potential for me to be changing my mind while coding) that it is easiest to put the error handling in once the design has stabilized. So please, no flames about the lack of any error handling... it's on the way. Or, better yet, since it will be a couple of days before I can get back to this, how about you throw some really nice Try...Catch...Finally blocks into it and shoot it back to me? LOL.

The guts of the spelling checker are in the spellcheck class in spellcheck.vcx, and my first stab at an interface is the splchkdialog class in the same library. Other than that, the class library just holds a bunch of subclasses of the Visual FoxPro base classes. I want to encapsulate the logic even further, so that interfaces are a complete snap to create and the spellcheck class is a black box engine (but it's still pretty decent as it sits now).

Step 4:
Quickly return a list of suggestions for the user to pick from as replacements for a given misspelling. I had to figure out a way to build the suggestion list that could cope with the size of the dictionary table. I decided on an SQL Select statement because of its flexibility and its speed when optimized. Then I defined the criteria for the Where clause. I also ordered the elements in the Where clause so that the criteria that would weed out the most entries came first and the most resource-intensive operations (such as the Levenshtein Distance) came last.

This still wasn't fast enough, so I created another field in the dictionary table to hold the Improved Soundex codes for the words and indexed it. I also created an index on the LEN(ALLTRIM()) of the word. This dramatically improved the speed, and after a few more tweaks I had a workable SQL statement for returning the results. See the getsuggestions method of the spellcheck class.
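The general shape of that setup might look something like the sketch below. It is not the code from getsuggestions: the table dictionary, the fields word and sndx, and the tag names are assumptions, and the built-in SOUNDEX() function stands in for the Improved Soundex routine.

```foxpro
* Sketch only: names are assumptions; SOUNDEX() is a stand-in.
* Assumes dictionary.dbf exists with a character field named word.
USE dictionary EXCLUSIVE
ALTER TABLE dictionary ADD COLUMN sndx C(4)
REPLACE ALL sndx WITH SOUNDEX(word)
INDEX ON sndx TAG sndx
INDEX ON LEN(ALLTRIM(word)) TAG wordlen

LOCAL lcWord, lcCode, lnLen
lcWord = "recieve"
lcCode = SOUNDEX(m.lcWord)
lnLen = LEN(m.lcWord)

* Both filters match an index expression exactly, so they can be
* optimized. A costly test such as an edit distance would be added
* after them, following the ordering described above.
SELECT TOP 10 word ;
    FROM dictionary ;
    WHERE sndx = m.lcCode ;
      AND BETWEEN(LEN(ALLTRIM(word)), m.lnLen - 1, m.lnLen + 1) ;
    ORDER BY word ;
    INTO CURSOR suggestions
```

The m. prefix on the memory variables keeps them from being confused with field names inside the query.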

Step 5:
Logically order the results returned to the user. Alphabetical order is anything but logical when returning suggestions for a spelling checker. I decided to add a weight field to the suggestions cursor and base it on different aspects of the match between the word in question and the words in the dictionary table. In this case, I went with lower weights meaning better matches. The weight made an excellent field to base the Order By clause on as well. The top 10 results were finally being returned in a logical order. To see the way the weight is figured, look in the figureweight method of the spellcheck class.
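To make the idea concrete, here is a small sketch of ordering by a computed weight. The formula (edit distance times ten, plus the difference in length) and every name in it are invented for this example; the real calculation is in the figureweight method and may look nothing like this.

```foxpro
* Sketch only: the weighting formula and all names are assumptions.
* Candidate matches for the misspelling "thwrw".
CREATE CURSOR candidates (word C(20), distance I, lendiff I)
INSERT INTO candidates VALUES ("the", 3, 2)
INSERT INTO candidates VALUES ("throw", 2, 0)
INSERT INTO candidates VALUES ("there", 2, 0)

* Lower weight means a better match; ties fall back to the word.
SELECT TOP 10 word, distance * 10 + lendiff AS weight ;
    FROM candidates ;
    ORDER BY 2, 1 ;
    INTO CURSOR suggestions
```

With these rows, "there" and "throw" tie on weight and both sort ahead of "the", which is the kind of ordering an alphabetical list would not give.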

Step 6:
Test, tweak, test, fix... drink more coffee and Red Bull energy drink... test, tweak, test, fix. And finally, create two examples, write a README.txt, zip it all up, take a couple of screen shots, and write this blog. The download link is provided below (look on the right-hand side) with full source included. Don't forget to read the README.txt.

Visual FoxPro Spelling Checker

 Download VFP Spelling Checker (2MB approx.)

UPDATES: 07-31-2005 Created a new dictionary table based on the SCOWL so that the copyright and redistribution rights are clear (removed the previous two dictionaries). Optimized some code, fixed a couple of bugs, worked in the new dictionary, and edited this blog entry to apply to the new dictionary table.
Sunday, July 31, 2005 12:46:58 PM (Central Daylight Time, UTC-05:00)  Comments [7]
Sunday, July 31, 2005 5:12:40 PM (Central Daylight Time, UTC-05:00)
AWESOME!

I'm looking forward to contributing to your spellcheck project.

Malcolm
Tuesday, February 14, 2006 12:32:47 AM (Central Standard Time, UTC-06:00)
I downloaded the spellcheck files on 02/13/2006

When I run the example1 form in vfp9 development mode, everything seems to work fine.

When I try to create spellcheck.app or spellcheck.exe in vfp9 I get an error: "class name is invalid". It does not give a name of the class. Any ideas?


Do you know of a 'medical/dental' dictionary that would work?
Monday, April 02, 2007 10:55:15 AM (Central Daylight Time, UTC-05:00)
I am having an interesting problem. When I use the class and not the COM object (unable to in this install), the spell checker loses focus after it highlights the misspelled word if my application is set as the top-level form. Does anyone have any ideas on how to put the focus back on the spelling application without having the person click on it?
Friday, May 18, 2007 3:32:02 AM (Central Daylight Time, UTC-05:00)
Thanks for great work.

The spellcheck requires the following improvements:
(1) Manual correction is not provided. If a user wants to correct the word by hand, he/she should be allowed to do that.

(2) The dictionary should hold the frequency of each word, and the suggested words should be ordered by descending frequency. For example, for the word 'iz', the suggested words are 'Izy', 'i', 'is', whereas 'is' should be the first suggestion. For 'yu', the word 'you' comes at the bottom of the list but it should be the first choice.

(3) The algorithm for suggesting words may require changes. For the word 'bi', the word 'be' should be in the list but is missing. Most of the words suggested are three-letter words and one is a single letter.

Please note that the above suggestions are only for improvement. You are doing awesome work and please continue to do it.

P L Patodia
Tuesday, July 03, 2007 3:30:21 PM (Central Daylight Time, UTC-05:00)
Thanks, Craig, for this and all the work you have put into it.

When I ran form Example1, it wanted to install EZ CD Creator!

Ummm.

Thursday, November 29, 2007 2:08:40 PM (Central Standard Time, UTC-06:00)
Hi Craig,

I've been using the spellchecker recently and have come across a problem that I'm not sure how to work around. If I try to run the spellchecker from a modal form, the spellchecker does not get focus so I can't click on any of the options to "Replace", "Ignore" and so on.

Of course the problem is due to calling a modeless form from a modal form, so I tried setting the SpellChecker form to modal, but that still didn't work.

Have you come across this before and have a suggestion as to what I need to do to get this to work?

Thanks,

Frank
Frank Cazabon
Thursday, August 28, 2008 1:13:59 AM (Central Daylight Time, UTC-05:00)
I found a problem with the spellchecker class which I have fixed and enclose the fix for you here. The problem occurs when the end of line has a CRLF character. It is including these characters in the spelling lookup and hence, the last word on a line fails to be found.

Include the following in the currentsentence_assign method to strip CR/LF from the end of the sentence.

LPARAMETERS vNewVal
LOCAL lnCounter, lcPreviousSentence

*!* Keep track of multiple sentences within the search string that are the same
THIS.SentenceOccurence = 1
FOR lnCounter = 1 TO (THIS.sentencepointer - 1)
    lcPreviousSentence = THIS.arysentences(lnCounter)
    IF lcPreviousSentence == m.vNewVal
        THIS.SentenceOccurence = THIS.SentenceOccurence + 1
    ENDIF
ENDFOR

*!* David Younger - 29 August 2008 - Remove CR/LF characters from end of line.
DO WHILE INLIST(RIGHT(m.vNewVal,1),CHR(13),CHR(10)) AND LEN(m.vNewVal) > 0
    m.vNewVal = LEFT(m.vNewVal,LEN(m.vNewVal)-1)
ENDDO

THIS.currentsentence = m.vNewVal

Hope this helps.

Cheers
David Younger