AIList Digest Volume 8 Issue 004
AIList Digest Tuesday, 12 Jul 1988 Volume 8 : Issue 4
Today's Topics:
Queries:
Soundex algorithm (3 responses)
Syllables of English (3 responses)
----------------------------------------------------------------------
Date: 8 Jul 88 17:18:30 GMT
From: hubcap!shorne@gatech.edu (Scott Horne)
Subject: Soundex algorithm
Does anyone have a reference to info on the design of the Soundex algorithm?
Source code (whatever language) would be helpful, too.
Advance thanks. (BTW, it's probably best to post, as mail is at best shaky
at this site.)
--Scott Horne
BITNET: PHORNE@CLEMSON (not working; please use another address)
uucp: ....!gatech!hubcap!scarle!{hazel,citron,amber}!shorne
(If that doesn't work, send to cchang@hubcap.clemson.edu)
SnailMail: Scott Horne
812 Eleanor Dr.
Florence, SC 29501
VoiceNet: 803 667-9848
------------------------------
Date: 9 Jul 88 04:20:40 GMT
From: wesommer@athena.mit.edu (William Sommerfeld)
Subject: Re: Soundex algorithm
Sorry for the length of this posting.
In article <2130@hubcap.UUCP> shorne@citron writes:
>Does anyone have a reference to info on the design of the Soundex algorithm?
This one is a somewhat superficial article; it contains a short
Apple ][+ BASIC program which implements the soundex algorithm.
@article{soundx,
AUTHOR="Jacob R. Jacobs",
TITLE="Finding Words That Sound Alike: The Soundex Algorithm",
YEAR="1982",
MONTH="March",
JOURNAL="Byte"
}
Fortunately, it references the following, which talks about many
algorithms other than just Soundex:
@article{acmsoundex,
AUTHOR="Patrick A. V. Hall and Geoff R. Dowling",
TITLE="Approximate String Matching",
JOURNAL="ACM Computing Surveys",
VOLUME="12",
MONTH="December",
YEAR="1980"
}
>Source code (whatever language) would be helpful, too.
You asked for it, you got it.
Don't ask me why it's in BCPL; I didn't write it (but I'm going to
have to convert it to C Real Soon Now, before the DECSYSTEM-20 that it
runs on turns into scrap metal).
structure
{ SoundXCode^1^4 char
}
SoundX(Str) := valof
{ let Value := 0
  let S := vec 40
  CopyString(Str, S)
  RaiseString(S)
  Value<<SoundXCode^1 := S>>String.C^1
  let N := 2
  and PreviousSoundX := -1
  for i := 2 to S>>String.N do
  { let Ch := S>>String.C^i
    let ThisSoundX := selecton Ch into
    { default: 0
      case $F:
      case $V: 1
      case $C:
      case $G:
      case $J:
      case $K:
      case $Q:
      case $S:
      case $X:
      case $Z: 2
      case $B:
      case $P:
      case $D:
      case $T: 3
      case $L: 4
      case $M:
      case $N: 5
      case $R: 6
    }
    if ThisSoundX=0 \ ThisSoundX=PreviousSoundX loop
    Value<<SoundXCode^N := ThisSoundX
    PreviousSoundX := ThisSoundX
    N := N+1
    if N=5 break
  }
  resultis Value
}
and SoundXCompare(DataBase, Attempt) := valof
{ let DBSoundX := SoundX(DataBase)
  for i := 1 to 4 do
  { let ThisAttempt := Attempt<<SoundXCode^i
    if ThisAttempt=0 resultis true
    if ThisAttempt ne DBSoundX<<SoundXCode^i resultis false
  }
  resultis true
}
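For readers without a BCPL compiler, here is a rough Python sketch of the two routines above. It is an illustrative port, not the original author's code: the names soundx and soundx_compare, and the 4-tuple standing in for the packed SoundXCode word, are assumptions. It keeps the letter groups exactly as listed (B and P sit with D and T), which differs from the more common Soundex table, and it assumes a non-empty input string.

```python
# Letter groups copied from the BCPL listing; note B and P are in group 3 here.
GROUPS = {1: "FV", 2: "CGJKQSXZ", 3: "BPDT", 4: "L", 5: "MN", 6: "R"}
CODE = {ch: n for n, letters in GROUPS.items() for ch in letters}


def soundx(s):
    """Return (first_letter, c2, c3, c4); unused positions stay 0.

    Assumes s is non-empty, as the BCPL routine implicitly does.
    """
    s = s.upper()
    value = [s[0], 0, 0, 0]
    n = 1            # next slot to fill (BCPL N, shifted to 0-based indexing)
    previous = -1    # code most recently stored
    for ch in s[1:]:
        this = CODE.get(ch, 0)
        # Skip uncoded letters and repeats of the last stored code.
        if this == 0 or this == previous:
            continue
        value[n] = this
        previous = this
        n += 1
        if n == 4:
            break
    return tuple(value)


def soundx_compare(database, attempt):
    """True if the code `attempt` is a prefix of the code for `database`."""
    db = soundx(database)
    for a, d in zip(attempt, db):
        if a == 0:       # attempt ran out: everything so far matched
            return True
        if a != d:
            return False
    return True
```

An attempt code with trailing zeros acts as a prefix pattern, so soundx_compare("Smithson", soundx("Smi")) succeeds while soundx_compare("Jones", soundx("Smi")) does not.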
------------------------------
Date: 10 Jul 88 16:14:00 GMT
From: leverich@rand-unix.arpa (Brian Leverich)
Subject: Re: Soundex algorithm
If you aren't satisfied with the responses you've already received,
try posting to the genealogy newsgroup (rec.genealogy or whatever...).
Soundex is used to index many lists of names, and there are several PD
programs genealogists use for converting names to Soundex.
Incidentally, does anyone know if there have been any genealogy applications
built using Prolog or the like? Looks like a logic programming approach
to maintaining relations between individuals might be a big win. -B
--
"Simulate it in ROSS"
Brian Leverich | U.S. Snail: 1700 Main St.
ARPAnet: leverich@rand-unix | Santa Monica, CA 90406
UUCP/usenet: decvax!randvax!leverich | Ma Bell: (213) 393-0411 X7769
------------------------------
Date: 11 Jul 88 13:38:21 GMT
From: rochester!ur-tut!sunybcs!stewart@bbn.com (Norman R. Stewart)
Subject: Re: Soundex algorithm
The source I've used for Soundex (developed by the
Remington Rand Corp., I believe), is
Huffman, Edna K. (1972) Medical Record Management.
Berwyn, Illinois: Physicians' Record Company.
The algorithm is very simple:
1: Assign number values to all but the first letter of the
word, using this table
1 - B P F V
2 - C S K G J Q X Z
3 - D T
4 - L
5 - M N
6 - R
7 - A E I O U W H Y
2: Apply the following rules to produce a code of one letter and
three numbers.
A: The first letter of the word becomes the initial character
in the code.
B: When two or more letters from the same group occur together
only the first is coded.
C: If two letters from the same group are separated by an H or
a W, code only the first.
D: Group 7 letters are never coded (this does not include the
first letter in the word, which is always coded).
Of course, this can be used without the numeric substitution to
produce abbreviations also, but the numbers indicate the phonemic
similarity (e.g. Bear = Bare = B6), or Rhymes (e.g. Glare = G46,
Flair = F46). This can also be useful for finding duplicate entries
in a large database, where a name may be slightly mis-spelled (e.g.
Smith = Simth = S53).
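The table and rules A through D above can be sketched in Python roughly as follows. This is one reading of the rules, not a reference implementation. Two points are assumptions: the first letter's own group is taken to suppress an immediately following letter of the same group (rule B applied to the initial), and codes are left unpadded to match the examples above. Many implementations pad short codes with zeros; the pad flag and the helper group_by_code are hypothetical conveniences added here.

```python
# Groups 1-6 from the table; everything else (group 7) is never coded.
GROUPS = {"1": "BPFV", "2": "CSKGJQXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
DIGIT = {ch: d for d, letters in GROUPS.items() for ch in letters}


def soundex(word, pad=False):
    """Code a word as its first letter plus up to three digits."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]                 # rule A: the initial is kept as a letter
    digits = []
    previous = DIGIT.get(first, "")    # assumption: the initial's group counts for rule B
    for ch in letters[1:]:
        if ch in "HW":
            continue                   # rule C: H and W do not break a run
        d = DIGIT.get(ch, "")          # rule D: group 7 letters yield no digit
        if d and d != previous:        # rule B: code only the first of a run
            digits.append(d)
        previous = d                   # a vowel or Y resets the run
    code = "".join(digits[:3])
    if pad:
        code = code.ljust(3, "0")      # optional zero padding (assumption)
    return first + code


def group_by_code(names):
    """Bucket names by code, e.g. to spot likely duplicate entries."""
    buckets = {}
    for name in names:
        buckets.setdefault(soundex(name), []).append(name)
    return buckets
```

With this sketch, soundex("Bear") and soundex("Bare") both give "B6", and group_by_code(["Smith", "Simth", "Jones"]) puts Smith and Simth in the same bucket under "S53".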
Norman R. Stewart Jr. * How much more suffering is
C.S. Grad - SUNYAB * caused by the thought of death
internet: stewart@cs.buffalo.edu * than by death itself!
bitnet: stewart@sunybcs.bitnet * Will Durant
------------------------------
Date: 6 Jul 88 14:14:26 GMT
From: ece-csc!ncrcae!gollum!rolandi@ncsuvx.ncsu.edu (Walter Rolandi)
Subject: syllables of English
Can anyone provide me with a list of all the constituent syllables of English?
Any ideas as to how one could produce such a list would be greatly appreciated.
Thanks.
Walter Rolandi
rolandi@gollum.UUCP
rolandi@ncrcae.Columbia.NCR.COM
NCR Advanced Systems Development, Columbia, SC
------------------------------
Date: 7 Jul 88 02:03:31 GMT
From: hubcap!shorne@gatech.edu (Scott Horne)
Subject: Re: syllables of English
From article <125@gollum.UUCP>, by rolandi@gollum.UUCP (Walter Rolandi):
>
> Can anyone provide me with a list of all the constituent syllables of English?
I've read that there are more than 8000 such syllables (DeFrancis, _The
Chinese Language: Fact and Fantasy_, U. of Hawaii). Good luck compiling a
list! (N.B.: Those are phonetically distinct syllables, not graphically
distinct.)
Incidentally, Japanese has just over 100 syllables.
--Scott Horne
BITNET: PHORNE@CLEMSON (not working; please use another address)
uucp: ....!gatech!hubcap!scarle!{hazel,citron,amber}!shorne
(If that doesn't work, send to cchang@hubcap.clemson.edu)
SnailMail: Scott Horne
812 Eleanor Dr.
Florence, SC 29501
VoiceNet: 803 667-9848
------------------------------
Date: 7 Jul 88 16:07:17 GMT
From: uhccux!stampe@humu.nosc.mil (David Stampe)
Subject: Re: syllables of English
If it's possible, rather than occurring, English syllables you want, you
might look at diagrams for possible monosyllables, as in Zellig Harris,
Methods in Structural Linguistics, U. Chicago Press, 195?. Stressed
syllables in polysyllables are a subset of those in monosyllables.
Unstressed syllables are a subset of stressed syllables, unless you take
the consonantal nuclei in rubber, rubble, ribbon, rub'm to be distinct
from the nuclei of brr, bull, bun, bum. Such diagrams are approximations,
since the number of phonemes and especially the number of possible
combinations into syllables differs somewhat among dialects and
individuals. They usually admit hundreds of pronounceable but very
peculiar syllables like trart, klilk, kwuw, smamp, oyj, awb.
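One way to turn such a diagram into a list is to treat a monosyllable as onset + nucleus + coda and enumerate every combination. The Python sketch below does only that. The three inventories are tiny toy sets in rough spelling, invented here for illustration and not a description of English; a real attempt would need full phoneme inventories and the co-occurrence restrictions that the diagrams only approximate.

```python
from itertools import product

# Toy inventories (assumptions for illustration only, not real English phonotactics).
ONSETS = ["", "tr", "kl", "kw", "sm"]
NUCLEI = ["a", "i", "u", "oy", "aw"]
CODAS = ["", "rt", "lk", "w", "mp", "j", "b"]


def possible_syllables(onsets=ONSETS, nuclei=NUCLEI, codas=CODAS):
    """Return an iterator over every (onset, nucleus, coda) the template allows."""
    return product(onsets, nuclei, codas)


# Keep the parts separate: different triples can spell the same string,
# e.g. ("", "a", "w") and ("", "aw", "") both spell "aw".
syllables = list(possible_syllables())
spellings = {"".join(parts) for parts in syllables}
```

Because the template places no restrictions between onset and coda, even these small sets produce 5 x 5 x 7 = 175 combinations, including peculiar forms such as trart, klilk and smamp.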
David (stampe@uhccux.uhcc.hawaii.edu)
------------------------------
Date: 8 Jul 88 19:22:34 GMT
From: att!ihlpa!krista@bloom-beacon.mit.edu (Anderson)
Subject: Re: syllables of English
To Walter R.: I tried to send mail, but it bounced. I don't
have a list of English syllables, but I do have a list of consonant
clusters and vowels. If you want it, I'll post it; however, it is
about 250 lines.
Actually, I made the list when I was trying to understand why
a Navajo friend was having trouble with some English words.
I wrote all the English consonant clusters I could think
of, including those that occur only in the *final* positions of
words. I came up with about 197 consonants and consonant clusters!
And the list is probably not exhaustive.
Since Navajo has only about 35 consonants and clusters, of which
about 15 intersect the English set, I gained a lot of sympathy for
anybody learning English as a second language. I've heard that
Polish has a lot of clusters; anybody know how many? Cherokee has
only 13 consonants (no clusters), I seem to recall. Tlingit
(related to Navajo) is reputed to have a great many phonemes (50
compared to English 35); but these figures do not include clusters.
By the way, Cherokee is about the prettiest language I've ever
heard. It was once a tonal language, but the tones lost their
meaning in most words, at least in the western dialect. However, a
light, musical quality remains.
Shut me up, please! If you want the list, let me know.
Krista Anderson, ihnp4!ihlpa!krista, but we may be shutting down email?
------------------------------
End of AIList Digest
********************