Copy Link
Add to Bookmark
Report
AIList Digest Volume 8 Issue 015
AIList Digest Monday, 18 Jul 1988 Volume 8 : Issue 15
Today's Topics:
More on the Soundex algorithm
----------------------------------------------------------------------
Date: 11 Jul 88 18:21:16 GMT
From: sundc!netxcom!sdutcher@seismo.css.gov (Sylvia Dutcher)
Subject: Re: Soundex algorithm
In article <12520@sunybcs.UUCP> stewart@sunybcs.UUCP (Norman R. Stewart) writes:
>
> The source I've used for Soundex (developed by the
>Remington Rand Corp., I believe), is
>
> Huffman, Edna K. (1972) Medical Record Management.
> Berwyn, Illonois: Physicians' Record Company.
I've written a soundex program based on the rules in Knuth's _Searching_and_
Sorting. These are also the rules used at the National Archives to sort
census data. These rules differ slightly from the ones posted by Mr. Stewart.
If you don't need to match anyone else's soundex, the most important rule is
to be consistent. I will insert Knuth's rules below.
>The algorithm is very simple,
1. Retain the first letter of the name, and drop all occurrances of a, e, h,
i, o, u, w, y in other positions.
>1: Assign number values to all but the first letter of the
>word, using this table
> 1 - B P F V 2 - C S K G J Q X Z
> 3 - D T 4 - L
> 5 - M N 6 - R
> 7 - A E I O U W H Y
2. Assign number values as above, except for 7.
>2: Apply the following rules to produce a code of one letter and
> three numbers.
> A: The first letter of the word becomes the initial character
> in the code.
> B: When two or more letters from the same group occur together
> only the first is coded.
> C: If two letters from the same group are seperated by an H or
> a W, code only the first.
3. If two or more letters with the same code were adjacent in the original
name (begore step 1), omit all but the first.
> D: Group 7 letters are never coded (this does not include the
> first letter in the word, which is always coded).
4. Convert to the form "letter, digit, digit, digit" by adding trailing
zeros or dropping rightmost digits.
BTW according to the reference in Knuth's book, this algorithm was
developed by Margaret Odell and Robert Russell in 1922.
>Norman R. Stewart Jr. * How much more suffering is
>C.S. Grad - SUNYAB * caused by the thought of death
>internet: stewart@cs.buffalo.edu * than by death itself!
>bitnet: stewart@sunybcs.bitnet * Will Durant
--
Sylvia Dutcher * The likeliness of things
NetExpress Communications, Inc. * to go wrong is in direct
1953 Gallows Rd. * proportion to the urgency
Vienna, Va. 22180 * with which they shouldn't.
------------------------------
Date: Wed, 13 Jul 88 10:35:06 EDT
From: "William J. Joel" <JZEM%MARIST.BITNET@MITVMA.MIT.EDU>
Subject: Re: Soundex algorithm
/* The following is source code for a Soundex algorithm written in */
/* Waterloo Prolog. */
/* William J. Joel*/
/* Marist College */
/* Poughkeepsie, NY */
/* jzem@marist.bitnet */
key(a,-1).
key(b,1).
key(c,2).
key(d,3).
key(e,-1).
key(f,1).
key(g,2).
key(h,-2).
key(i,0).
key(j,2).
key(k,2).
key(l,4).
key(m,5).
key(n,5).
key(o,-1).
key(p,1).
key(q,2).
key(r,6).
key(s,2).
key(t,3).
key(u,-1).
key(v,1).
key(w,-3).
key(x,2).
key(y,-2).
key(z,2).
soundex(Name,Code)<-
string(Name,Code1) & write(Code1) &
soundex1(Code1,A.B.C.D.Rem) &
string(Code,A.B.C.D.nil).
soundex1(Head.Code1,Head.Code)<-
keycode(Head.Code1,Code2) & write(Code2) &
reduce(Code2,T.Code3) & write(T.Code3) &
eliminate(Code3,Code4) & write(Code4) &
append(Code4,0.0.0.nil,Code).
reduce(X.(-2).X.Rem,List)<-
reduce(X.Rem,List).
reduce(X.X.Rem,List)<-
reduce(X.Rem,List).
reduce(X.Y.Z.Rem,X.List)<-
^X==Z &
reduce(Y.Z.Rem,List).
reduce(X.Y.Rem,X.List)<-
^X==Y &
reduce(Y.Rem,List).
reduce(X.nil,X.nil).
reduce(nil,nil).
eliminate(X.Rem,List)<-
lt(X,0) &
eliminate(Rem,List).
eliminate(X.Rem,X.List)<-
gt(X,0) &
eliminate(Rem,List).
eliminate(nil,nil).
keycode(H.T,N.CodeList)<-
key(H,N) &
keycode(T,CodeList).
keycode(nil,nil).
append(Head.Tail,List,Head.NewList)<-
append(Tail,List,NewList).
append(nil,List,List).
------------------------------
Date: 13 Jul 88 17:02:31 GMT
From: ihnp4!twitch!anuck!jrl@bloom-beacon.mit.edu (j.r.lupien)
Subject: Re: Soundex algorithm
>From article <1552@randvax.UUCP>, by leverich@randvax.UUCP (Brian Leverich):
> Incidentally, does anyone know if there's been any genealogy applications
> built using Prolog or the like? Looks like a logic programming approach
> to maintaining relations between individuals might be a big win. -B
Not really a C question anymore, but there is such a beast. The
United Kingdom imigration and naturalization department has a
Prolog implementation for their citizenship status analysis system.
Apparently the rules involved are very complicated and perversely
interconnected, having to do with date and place of birth, date and
place of birth of both parents and their citizenship at the time of
your birth as well as now, and probably the phase of the moon etc.
I can't give you a reference for this offhand, since I heard about
it in a keynote lecture by the MIT AI lab. They should be able to
point you in the right direction.
John Lupien
twitch!mvuxe!anuxh!jrl
------------------------------
End of AIList Digest
********************