NL-KR Digest Volume 05 No. 10
NL-KR Digest (8/19/88 21:23:10) Volume 5 Number 10
Today's Topics:
Looking for e-mail address
Re: Category Theory in AI
English grammar (open/closed classes)
doublespeak, Orwell_is_here!
Re: p. from Tom Bever on intuitions ...
Readability Formula
Language and Language Acquisition Conference
Acquiring a Model of the User's Beliefs ...
Workshop Announcement
Call for Panels for IJCAI-89
Submissions: NL-KR@CS.ROCHESTER.EDU
Requests, policy: NL-KR-REQUEST@CS.ROCHESTER.EDU
----------------------------------------------------------------------
Date: Thu, 11 Aug 88 15:18 EDT
From: mcvax!cs.tcd.ie!root@uunet.UU.NET, Paul Harrington <phrrngtn@cs.tcd.ie>
Subject: Looking for e-mail address
I am trying to get in contact with a researcher in natural language
processing called Andrew Haase.
I think he is working somewhere on the East coast of the USA.
I'd be grateful for any replies to the above address.
Thanks,
Paul Harrington.
------------------------------
Date: Thu, 11 Aug 88 21:30 EDT
From: Donald F. Geddis <geddis@atr-la.atr.junet>
Subject: Re: Category Theory in AI
In article <32864@philabs.Philips.Com>, dpb@philabs.philips.com (Paul Benjamin) writes:
> Some of us here at Philips Laboratories are using universal
> algebra, and more particularly category theory, to formalize
> concepts in the areas of representation, inference and
> learning.
>
> Paul Benjamin
>
> {uunet,decvax}!philabs!dpb
I'm familiar with those areas of AI, but not with category theory (or
universal algebra, for that matter). Can anyone give a short summary for
the layman of those two mathematical topics? And perhaps a pointer as to
how they might be useful in formalizing certain AI concepts. Thanks!
-- Don
--
"You lock the door, and throw away the key
There's someone in my head, but it's not me." -- Pink Floyd
Internet: Geddis@Score.Stanford.Edu (which is forwarded to Japan...)
USnail: P.O. Box 4647, Stanford, CA 94309 USA
------------------------------
Date: Sat, 13 Aug 88 03:50 EDT
From: mcguire@aerospace.aero.org
Subject: English grammar (open/closed classes)
John B. Nagle <jbn@glacier.stanford.edu> writes:
> I understand that there is an approach to English grammar based on
>the following assumptions.
> 1. There are four main categories of words, essentially nouns,
> verbs, adjectives, and adverbs. These categories are
> extensible; new words can be added.
> 2. There are about 125 "special" words, not in one of the four
> main categories. This list is essentially fixed. (New
> nouns appear all the time, but new conjunctions and articles
> never.)
>Does anyone have a reference to this, one that lists all the "special"
>words?
The proper technical term for what I think you are referring to is the
distinction between "open class" vs. "closed class" words. Certain
classes of words (where a class is defined by its members in some way
behaving the same) contain a finite number of members while other
classes contain a potentially infinite number. If you want to construct
a list of all closed class words in English you might start with the
prepositions, determiners, articles, auxiliary verbs, conjunctions,
numerals, verb features, etc. - though your ultimate list depends upon
how you define your classes, what "behave the same" means, and what
counts as a word.
While I'm familiar with this distinction, and think that it may have
been around in linguistics for quite some while (Bernard Bloch maybe?),
I don't remember it being used much. The only references that spring to
mind are some studies in speech production and slips of the tongue done
in the 70s by Anne Cunningham (she's a Brit though I'm not sure of her
last name) and maybe Victoria Fromkin claiming that fewer errors are
associated with closed class words and that they play some privileged role
in speech_production/syntax/lexical_access/the_architecture_of_the_mind.
I can't think of any explicit influence the "open/closed" distinction has
had on generative grammar. I feel however that implicit awareness of
this distinction has led people to construct and prefer theories where
closed classes correspond to atomic linguistic categories. Coupled with
the generativist bias on how classes are defined, this preference has
left most current theories analyzing the examples:
"John loved Mary"
"John has loved Mary"
"John might love Mary"
"John seems to love Mary"
as having practically nothing in common.
------------------------------
Date: Fri, 19 Aug 88 14:55 EDT
From: Clay M Bond <bondc@iuvax.cs.indiana.edu>
Subject: doublespeak, Orwell_is_here!
Some excerpts from the _Quarterly Review of Doublespeak_ (NCTE) which you all
should find amusing:
A reader reports that when the patient died, the attending doctor
recorded the following on the patient's chart: "Patient failed to fulfill
his wellness potential."
Another doctor reports that in a recent issue of the *American Journal
of Family Practice* fleas were called "hematophagous arthropod vectors."
The letter from the Air Force colonel in charge of safety said that
rocket boosters weighing more than 300,000 pounds "have an explosive force
upon surface impact that is sufficient to exceed the accepted overpressure
threshold of physiological damage for exposed personnel." In other words,
if a 300,000-pound booster rocket falls on someone, he or she is not likely
to survive.
A reader reports that the Army calls them "vertically deployed
anti-personnel devices." You probably call them bombs.
At McClellan Air Force base in Sacramento, California, civilian
mechanics were placed on "non-duty, non-pay status." That is, they were fired.
A personal ad from an unidentified newspaper announces that a
"formerly single man" seeks a single or married woman.
After taking the trip of a lifetime, our reader sent his twelve rolls
of film to Kodak for developing (or "processing," as Kodak likes to call it)
only to receive the following notice: "We must report that during the handling
of your twelve 35mm Kodachrome slide orders, the films were involved in an
unusual laboratory experience." The use of the passive is a particularly nice
touch, don't you think? Nobody did anything to the films; they just had a bad
experience. Of course our reader can always go back to Tibet and take his
pictures all over again, using the twelve replacement rolls Kodak so generously
sent him.
The description on the package of Stouffer's Veal Tortellini with
Tomato Sauce says it contains "exquisite egg pasta." The list of ingredients,
however, includes "cooked noodle product."
It's not a calendar, it's a "personal manual data base."
In St. Louis there is an oriental rug store that advertises
"semi-antique" rugs.
The envelope wasn't marked "rush"; it was marked "time valued data--
please expedite."
The Minnesota Board of Education voted to consider requiring all
students to do some "volunteer work" as a prerequisite to high school
graduation.
The London Zoo now has a "behavioral enrichment research fellow,"
whose job it is to cure the animals' boredom. A zoo clown.
Senator Orrin Hatch said that "capital punishment is our society's
recognition of the sanctity of human life."
According to the tax bill signed by President Reagan on December 22,
1987, Don Tyson and his sister-in-law Barbara run a "family farm." Their
"farm" has 25,000 employees and grosses $1.7 billion a year. But as a "family
farm" they get tax breaks that save them $135 million a year.
Scott L. Pickard, spokesperson for the Massachusetts Department of
Public Works, calls them "ground-mounted confirmatory route markers." You
probably call them road signs, but then you don't work in a government agency.
It's not "elderly" or "senior citizens" anymore. Now it's
"chronologically experienced citizens."
According to the FAA, the propeller blade didn't break off, it was
just a case of "uncontained blade liberation."
The New York Yellow Pages lists a telephone number for "Services for
the Differently Abled." [For the jaded among you, last summer after the SF
parade, a speaker was talking about the "otherly abled." CB]
You all have a nice weekend, ya hear?
Flames to: /dev/null
--
<<<<<<<<<<<<******<<<<<<<<<<<<******>>>>>>>>>>>>******>>>>>>>>>>>>
<< Clay Bond Indiana University Department of Linguistics >>
<< ARPA: bondc@iuvax.cs.indiana.edu >>
<<<<<<<<<<<<******<<<<<<<<<<<<******>>>>>>>>>>>>******>>>>>>>>>>>>
------------------------------
Date: Wed, 10 Aug 88 11:39 EDT
From: Rick Wojcik <rwojcik@bcsaic.UUCP>
Subject: Re: p. from Tom Bever on intuitions ...
Tom Bever writes:
>So, the linguist's preoccupation with linguistic intuitions is out of
>convenience, not necessity. And, as a number of you have noted,
>intuitions are a sometime thing - the basic problem with them is that
>nobody knows how they work. This means that one can never be sure that a
>given intuition directly reflects grammatical competence or knowledge of
>some other kind...
Ok. But I'm not so sure that the generative linguist's preoccupation with
intuitions is out of convenience. I couldn't begin to define generativism
without making some reference to the data, which is supposed to be delimited
by well-formedness judgments. If you can't rely on intuitions, then you
can't rely on the data. Maybe that's why so many psychologists prefer to
go with brand X linguistics.
>What kind of object IS language, anyway?
UVO (UF^O to John Chambers): unidentified verbal object.
Welcome aboard, Tom :-).
--
Rick Wojcik csnet: rwojcik@boeing.com
uucp: uw-beaver!ssc-vax!bcsaic!rwojcik
address: P.O. Box 24346, MS 7L-64, Seattle, WA 98124-0346
phone: 206-865-3844
------------------------------
Date: Thu, 11 Aug 88 17:26 EDT
From: S. Kulikowski <m0p@k.cc.purdue.edu>
Subject: Readability Formula
HOW THE RAP READABILITY FORMULA WORKS
Stan Kulikowski II Special Education Purdue University
It is time that the common measurement of readability in textual
material should be no more difficult than the measurement of temperature
or weight in physical material.
The RAP works with a readability formula from Gunning (1952) with some
minor adaptations that were felt to be needed for the kind of text
commonly found in elementary arithmetic word problems. RAP keeps count of
certain linguistic features in the text it is displaying for timed
reading:
CHARS,
SYLLABLES, (* VARs for readability *)
SYLLABLES_PER_WORD,
WORDS,
LONGWORDS,
SENTENCES,
Gunning's formula makes use of some of these counters.
READABILITY := 0.4 * (WORDS/SENTENCES) + (LONGWORDS/WORDS*100);
LONGWORDS are operationally defined as words with more than two syllables.
RAP uses algorithms to parse the number of syllables, words, sentences and
so forth.
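
For illustration, the formula can be written out as a small Python sketch.
This is not RAP's code; the function names are hypothetical. The first
function follows the statement exactly as printed above, where the 0.4
factor applies only to the words-per-sentence term. The second is the
commonly cited form of Gunning's index, where 0.4 scales the whole sum.
The post does not say whether the printed grouping is one of RAP's
adaptations, so both are shown.

```python
def readability_as_printed(words, longwords, sentences):
    """The formula exactly as printed in the post (operator precedence kept)."""
    return 0.4 * (words / sentences) + (longwords / words * 100)


def gunning_fog(words, longwords, sentences):
    """Commonly cited Gunning form: 0.4 applied to the whole sum."""
    return 0.4 * ((words / sentences) + 100 * (longwords / words))


if __name__ == "__main__":
    # Hypothetical counts: 100 words, 5 long words, 10 sentences.
    print(readability_as_printed(100, 5, 10))
    print(gunning_fog(100, 5, 10))
```

The two versions differ only in how much weight the long-word percentage
receives, so they diverge most on text with many long words.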
The number of sentences roughly corresponds to the number of periods,
question marks and exclamation points in the text, so this counter is
essentially character-recognition. RAP knows not to sentence-count a
decimal point surrounded by digits, but I suspect that abbreviations will
throw the sentence-count off.
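
A minimal Python sketch of the sentence counter described above, assuming
only the two rules stated: count periods, question marks and exclamation
points, and skip a period that has a digit on both sides. The function
name is hypothetical and this is not RAP's code.

```python
def count_sentences(text):
    """Count sentence-ending marks, skipping decimal points between digits."""
    count = 0
    for i, ch in enumerate(text):
        if ch in "?!":
            count += 1
        elif ch == ".":
            before = text[i - 1] if i > 0 else ""
            after = text[i + 1] if i + 1 < len(text) else ""
            # A period surrounded by digits is a decimal point, not a stop.
            if before.isdigit() and after.isdigit():
                continue
            count += 1
    return count


if __name__ == "__main__":
    print(count_sentences("It costs 8.05 dollars. Is that a lot?"))
    # The abbreviation "Dr." is counted as a sentence end under these rules.
    print(count_sentences("Dr. Smith paid 3.50. Fine!"))
```

The second call shows the suspected weakness: the abbreviation adds one
to the count, so two sentences are scored as three.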
The number of words is mostly character recognition. Multiple blanks
between words and sentences will not throw it off because it is processing
on a level of ALPHANUMERIC-BOUNDARY-ALPHANUMERIC categories. The blank
and the hyphen are strong boundary characters. Stylistic hyphenation of
words will drive the word count up, but not unreasonably. A writer who
puts hyphens with-in words is mak-ing the word re-cog-ni-tion task more
notice-able, and therefore deserves the extra readability weight. The
same goes for underlining and font-switching (but presently RAP only reads
standard ASCII text). RAP decides that arabic numerals in text count as
a single word without syllables. Therefore, 1 and 234 both count as
single reading words but do not have any syllables. Complex numbers like
8.05 and 2,000,301 count as two and three words respectively but still no
syllables. The purpose of using arabic characters in text is to simplify
the reading process. Compare
'three hundred thousand forty eight' (5 WORDS, 8 SYLLABLES)
'300048' (1 WORD, 0 SYLLABLES)
'300,048' (2 WORDS, 0 SYLLABLES)
'$300048.00' (3 WORDS, 0 SYLLABLES)
These are the conventions that RAP uses to estimate the difficulty of
reading the numbers which are frequent in word problem text. Oh yeah,
there are special characters which also count as words without syllables.
WORD_SYMBOLS := ['@','#','$','%','&','=','*','+'];
Can you say 'ampersand'? You know what it means? I once asked a teacher,
"who were the ampers that they needed their own 'and'?"
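
The word-counting conventions above can be sketched in Python as follows.
This is an illustration under assumptions, not RAP's code: every maximal
run of letters and digits is taken as one word, every character in
WORD_SYMBOLS is taken as one word, and everything else (blanks, hyphens,
commas, periods) is treated as a boundary. How RAP handles apostrophes is
not stated; here they are boundaries too. The function name is
hypothetical.

```python
import re

# The special characters listed in the post; each counts as one word.
WORD_SYMBOLS = "@#$%&=*+"

# One token is either a run of alphanumerics or a single word symbol.
TOKEN = re.compile(r"[A-Za-z0-9]+|[" + re.escape(WORD_SYMBOLS) + r"]")


def count_words(text):
    """Count words using alphanumeric runs and whole-word symbols."""
    return len(TOKEN.findall(text))


if __name__ == "__main__":
    for sample in ("three hundred thousand forty eight",
                   "300048", "300,048", "$300048.00"):
        print(repr(sample), count_words(sample))
```

Under these assumptions the four comparison strings come out as 5, 1, 2
and 3 words, matching the counts given in the post, and 'with-in' counts
as two words.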
These characters found in text are treated like the readability of kanji
characters in Japanese. Whole-word characters are difficult to read based
on the reader's familiarity with the special symbol. This is roughly
related to the character's overall frequency in the matrix language, but
as a cognitive task, it is directly related to the individual reader's
reading experience. This is what makes mathematical texts so difficult to
read. If you can read calculus, the integral sign becomes easier to read
than, say, its 6th-grade spoken English equivalent. Advanced mathematical
texts are going to have cognitive readability issues much like the
readability of Chinese and Japanese text which rely on special word
symbols. The readability will be more frequency-related than texts in
writing systems which are based on regular phonic rules. Korean is
apparently very readable because phonic association is so strong and
irregular spelling is historically weak.
RAP's English syllable counter is pretty simple. It parses
VOWEL-CONSONANT and VOWEL-BOUNDARY structures as syllables. It has a
silent-final-e rule (for English). RAP, however, will not distinguish
between 'milked' (2 SYLLABLES) and 'waited' (2 SYLLABLES). RAP finds it a
weak argument that those two words differ in reading difficulty, and I am
not sure that we should change it. Human judges will characteristically
score 1 syllable for 'milked' because it is pronounced "milkt". If people
want to conventionally misspell or mispronounce written text, it should
not have an inordinate effect on readability in otherwise phonic writing
systems.
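
A rough Python sketch of a syllable counter in this style. It is a guess
at the approach, not RAP's code, and several details are assumptions: the
letters a, e, i, o, u and y are treated as vowels, a syllable is scored
wherever a vowel is followed by a consonant or by the end of the word, and
the silent-final-e rule drops a final 'e' after a consonant unless that
would leave the word with no syllables. The function name is hypothetical.

```python
VOWELS = set("aeiouy")  # assumption: 'y' is treated as a vowel


def count_syllables(word):
    """Count VOWEL-CONSONANT and VOWEL-BOUNDARY transitions in a word."""
    letters = [c for c in word.lower() if c.isalpha()]
    count = 0
    for i, ch in enumerate(letters):
        if ch in VOWELS:
            at_end = i + 1 == len(letters)
            # Score the vowel if a consonant or the word boundary follows.
            if at_end or letters[i + 1] not in VOWELS:
                count += 1
    # Silent final e: a trailing 'e' after a consonant is not a syllable,
    # provided the word keeps at least one syllable.
    if (len(letters) >= 2 and letters[-1] == "e"
            and letters[-2] not in VOWELS and count > 1):
        count -= 1
    return count


if __name__ == "__main__":
    for w in ("milked", "waited", "make", "234"):
        print(w, count_syllables(w))
```

With these rules 'milked' and 'waited' both score 2, as the post says,
numerals score 0, and the words of 'three hundred thousand forty eight'
sum to 8.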
All told, RAP does real well when compared to humans in applying the
readability formula to text. RAP's interrater reliability was an overall
102.4% compared to three adult humans calculating the same texts (at grade
levels 1, 4 and 8). RAP tends to slightly overestimate syllable counts
compared to humans. The humans complained about the tedious and
time-consuming nature of using the formula on about 1K of graded text (and
they were spared character-counting!) RAP on my slow machine (an 8088,
maybe 5ish MHz) will painlessly process a megabyte text in the same time
it took the reliability subjects to complainingly handle a kilobyte. It
is time that the common measurement of readability in textual material
should be no more difficult than the measurement of temperature or weight
in physical material.
REFERENCE
R. Gunning (1952) THE TECHNIQUE OF CLEAR WRITING; McGraw-Hill; New York, NY.
BITNET : XM0P @ PURCCVM (* note, zero, not Oh *)
SnailMail : Special Education; Purdue University; W. Lafayette, IN 47907
USENET : k.cc.purdue.edu!m0p COMPUSERVE : 75410,1211
------------------------------
Date: Fri, 12 Aug 88 09:59 EDT
From: Francis LOWENTHAL <PLOWEN%BMSUEM11.BITNET@MITVMA.MIT.EDU>
Subject: Language and Language Acquisition Conference
ANNOUNCING A CONFERENCE : LANGUAGE AND LANGUAGE ACQUISITION 4
=============================================================
This will be an interdisciplinary seminar.
Dear colleague,
I have the pleasure to invite you to the fourth
conference we organize on Language and Language Acquisition
at the University of Mons, Belgium.
The specific theme of this conference will be :
"LANGUAGE DEVELOPMENT AND COGNITIVE DEVELOPMENT"
Date : From August 22 to August 27, 1988
Place : Mons University.
The aim of this meeting is to further interdisciplinary
and international collaboration among researchers connected
in one way or another with the field of communication and
subjacent logic : this includes studies concerning normal
children as well as handicapped subjects.
Five topics have been chosen : Mathematics, Philosophy,
Logic and Computer Sciences, Psycholinguistics, Psychology and
Medical Sciences. During the conference, each morning will be
devoted to two 45-minute lectures on one of these domains, and
to a wide discussion concerning all the papers already presented.
The afternoon will be devoted to short presentations by
panelists and to further discussions concerning the panel and
everything that preceded it.
There will be no parallel sessions and, as the organizers
want to favour discussion between the participants as much as
possible, it has been decided to limit the number of participants
to 70. The selection procedure will be supervised by
an international committee.
Further information and registration forms can be
obtained by old fashioned mail or by E-mail from :
F. LOWENTHAL
Universite de l'Etat a Mons
Laboratoire N.V.C.D.
Place du Parc, 20
B-7000 MONS (Belgium)
tel : (32)65.37.37.41
TELEX 57764 - UEMONS B
bitnet : PLOWEN@BMSUEM11
Please feel free to communicate this call for papers
to other potentially interested researchers.
F. LOWENTHAL
------------------------------
Date: Fri, 12 Aug 88 11:13 EDT
From: finin@PRC.Unisys.COM
Subject: Acquiring a Model of the User's Beliefs ...
Ph.D. Dissertation Defense
Acquiring a Model of the User's Beliefs from
a Cooperative Advisory Dialogue
Robert Kass
The ability of expert systems to explain their own reasoning is often
cited as their most important feature. Unfortunately, the quality of
these explanations is frequently poor. In this talk, I will argue
that for expert systems to produce good explanations, they must have
available a model of the user's beliefs about the system domain.
Obtaining such a model is not easy, however. Traditional approaches
have depended on the explicit hand-coding of a large number of
assumptions about the beliefs of anticipated system users -- a tedious
and error-prone process. In contrast, I will present an implicit
method for acquiring a user model, embodied in a set of implicit user
model acquisition rules. These rules, developed from the study of a
large number of transcripts of people seeking advice from a human
expert, represent likely inferences that can be made about a user's
beliefs -- based on the system-user dialogue and the dialogue
participants' previous beliefs. This implicit acquisition method is
capable of quickly building a substantial model of the user's beliefs;
a model sufficient to support the generation of expert system
explanations tailored to individual users. Furthermore, the
acquisition rules are domain independent, providing a foundation for a
general user modelling facility for a variety of interactive systems.
Committee: Tim Finin (Advisor)
Aravind Joshi (Chairman)
Elaine Rich (MCC)
Bonnie Webber
Date: Monday, August 15, 1988
Time: 3:00 - 5:00 p.m.
Location: 554 Moore
------------------------------
Date: Fri, 12 Aug 88 19:10 EDT
From: Robert A Amsler <amsler@flash.bellcore.com>
Subject: Workshop Announcement
DICTIONARY ENCODING INITIATIVE
A ONE-DAY WORKSHOP ON THE DEVELOPMENT OF AN
SGML STANDARD FOR MACHINE-READABLE DICTIONARIES
Hosted by Robert A. Amsler and Frank Wm. Tompa
Wednesday, October 26, 1988, 10 AM - 5 PM
(the day before the 1988 Waterloo Conference: Information in Text)
Davis Building, University of Waterloo, Ontario, Canada
The development of a text standard for the interchange of machine-
readable lexical entries is seen as an essential step toward making
such information useful to future generations of computational
scientists and scholars. Whereas several ad hoc schemes for encoding
dictionary entries exist, and even larger numbers of idiosyncratic
typesetting formats exist, there is an increasing number of variants
of such formats being propagated through the research community.
Without the introduction of some standard formats for the interchange
of such information, both the publishing and research communities
will suffer.
A preliminary draft of such an interchange standard for encoding
machine-readable English monolingual dictionary entries has been
developed in Standard Generalized Markup Language (SGML). This
workshop will present the contents and rationale for this standard
and offer attendees the opportunity to join the Dictionary Encoding
Initiative to refine and complete the standard. We are both inviting
your commentary and soliciting your help in attempting to make the
resultant standard serve the needs of all researchers.
If you are able to attend the workshop, please reply via email or
postal mail to:
Robert A. Amsler
Dictionary Encoding Initiative Workshop
Bellcore, MRE 2D-398
445 South Street
P.O. Box 1910
Morristown, NJ 07960-1910, USA
email:
amsler@flash.bellcore.com
uunet.uu.net!bellcore!amsler
------------------------------
Date: Thu, 18 Aug 88 09:15 EDT
From: schmolze@cs.tufts.edu
Subject: Call for Panels for IJCAI-89
The IJCAI committee requests the submission of proposals for panel sessions
to be presented at IJCAI-89. A panel session allows from three to five
people to present their views and/or results on a common theme, issue or
question. The panel topic must be both relevant and interesting to the AI
community. The panel members must have substantive experience with the
topic. However, the members need not be members of the AI community.
Preference will be given to panels that demonstrate broad, preferably
international, participation.
A panel topic must be specified clearly and narrowly so it can be adequately
addressed in a single session. Panel sessions run for 75 minutes. The
format usually consists of an introduction by the chairperson with the
purpose of providing the audience with a background for the ensuing
discussion. The panel members, including possibly the chairperson, then
present their views and/or results, followed by interchange between the
participants and, finally, by interchange between the panelists and the
audience. Preferably, the session ends with an overview by the chairperson.
Panels may primarily serve to present information on a specific topic, such
as recent important results or the status of important projects. Panels may
focus on alternative approaches or views to a common question, where
panelists present their approaches or views and the results they produced.
Also, panels may be critical, where some members present an approach or view
and other members criticize them, allowing time for rebuttals.
REQUIREMENTS FOR SUBMISSION
A proposal consists of a cover page, an overall summary and a summary of each
member's presentation.
The cover page should contain the following.
o At the top of the first page, write "PANEL PROPOSAL".
o Title of panel: The length should be similar to the lengths of titles of
papers.
o Chairperson: Name, affiliation, phone number, postal mailing address and
electronic mailing address. Please give phone number and address for
correspondence from the United States.
o Members: Names, affiliations, phone numbers, postal mailing addresses and
electronic mailing addresses. Please give phone numbers and addresses for
correspondence from the United States.
The overall summary should be brief, giving a clear description of the panel
topic such that members of the general AI community can understand and
appreciate it. It should explain how the members' presentations will be
integrated. In addition, it should address the following questions.
o What is the relevance and/or significance of the panel, including both
the topic and the members?
o What is the general AI interest in the topic? Please give evidence, such
as recent important papers, workshops, etc.
o How does the panel membership demonstrate broad, preferably
international, participation? If it does not, why is narrow
participation preferable?
o If your topic has been discussed by another panel in a recent national or
international AI conference, how will your panel differ from it?
The overall summary should be from 500 to 1000 words in length.
The final part of the proposal should be a brief summary of each member's
presentation. This includes the chairperson if she or he will give a
presentation. Each such summary should give a clear description of the
member's view or approach, summarize results if appropriate, and demonstrate
the connections to the panel topic. Where appropriate, each summary should
support the arguments given in the overall summary. These summaries,
including the overall summary, should be coordinated such that the panel
proposal is a sensible whole and not a loosely coupled collection of parts.
Each member's summary should be approximately 500 words.
Please submit six (6) copies of the proposal (cover page, overall summary and
member summaries) no later than December 12, 1988 to:
IJCAI 89
c/o AAAI
445 Burgess Drive
Menlo Park, CA 94025-3496 USA
Chairpersons for proposals will be notified of the final decisions by March
27, 1989. The proposals selected for presentation will be published in the
proceedings. Chairpersons and members of these panels will be allowed to
submit extended versions of their summaries. Revised versions will be due by
April 27, 1989.
------------------------------
End of NL-KR Digest
*******************