NL-KR Digest (9/12/88 21:11:42) Volume 5 Number 15
Today's Topics:
Re: open/closed classes
nl evaluation workshop
Data Wanted:
Re: GPSG parsers
Submissions: NL-KR@CS.ROCHESTER.EDU
Requests, policy: NL-KR-REQUEST@CS.ROCHESTER.EDU
----------------------------------------------------------------------
Date: Thu, 1 Sep 88 07:54 EDT
From: Bruce E. Nevin <bnevin@cch.bbn.com>
Subject: open/closed classes
There is some recent work by Leonard Talmy on the supposed cognitive
whys and wherefores of open vs closed classes. Sorry, I don't have a
reference handy.
The supposition that a speech recognizer has to be especially good at
hearing closed-class words misses an important point: the closed-class
words are unstressed and generally subject to reduction in phonemic--how
shall I say--extent. This is part of a general process, apparently in
all languages, of reducing the phonemic representation of words that
carry less information. They are reducible to the extent that they are
redundant. (Not much difficulty predicting the filler in the context
'He __ gone.' You only need enough phonemic content to distinguish the
words 'has, had, was' plus of course more obvious--and less reduced--
constructions incorporating these such as their negatives, `will have
gone', etc.)
Historically, closed-class morphology derives from open-class words that
have become more redundant and predictable, so that their reduced forms
become `frozen' in their now predictable contexts. An example is the
suffix -hood in `childhood', from an earlier form `had' meaning `state',
something like `child-state'. The suffix -ly in adverbs of manner
derives from the dative of a word for `form, body'. Ancestors of
Proto-Indo-European not having been reconstructed, we have no
confirmation that this is the origin of inflectional morphology such as
the preterit in descendant languages like English, but that is certainly
the most plausible assumption. Working on American Indian languages,
Shirley Silver dubbed this process `morphemization' almost 20 years ago.
So affixes (inherently closed-class morphology) appear to be derived by
reduction from once free-standing words. Similarly for closed-class
words. `Because' derives from `by cause'. OED cites 1305 `bi cause
whi'; whi or `why' is the instrumental of the wh- pronouns typified by
`what', reduced to `that' in the later `by cause that, because that'.
(Compare reduction of cause to zero in `for the cause why' --> `forwhy',
a common conjunction now obsolete, to which compare further `from the
place where' --> `from where'.) Zeroing of `why ~ that' in `because
why, because that' leaves `because' as a conjunction, a closed-class
word. (See Jespersen _Modern English Grammar on Historical Principles_
V 397 and Harris _A Grammar of English on Mathematical Principles_ 195
for further details.)
An example currently in progress in English is `going to' --> `gonna', a
reduction that takes place before verbs but not before nouns (*`I'm
gonna New York') precisely because `going to' can occur before the
whole class of verbs (and consequently carries less information and is
subject to reduction there) but cannot occur before every possible
noun. (Note that in e.g. `I'm going to authority' an indefinite noun,
one of exceptionally broad distribution, can be understood as having
been elided: `I'm going to someone of/in authority'. It is not
possible to reverse a reduction in this way to account for the broad
distribution of `going to' before verbs.) This appears to be on the way
to being a separate future tense morpheme in the closed-class set.
The above example of `forwhy' illustrates that closed-class words also
become obsolete and drop from the language. The class is closed with
respect to distribution, and conservative but not closed with respect to
change.
Bruce Nevin
bn@cch.bbn.com
<usual_disclaimer>
------------------------------
Date: Fri, 2 Sep 88 12:19 EDT
From: palmer@PRC.Unisys.COM
Subject: nl evaluation workshop
CALL FOR PARTICIPATION
Workshop on
Evaluation of Natural Language Processing Systems
Dec 8-9
Wayne Hotel, Wayne, PA (Philadelphia)
There has been much recent interest in the difficult problem of
evaluating natural language systems. With the exception of natural
language interfaces there are few working systems in existence, and
they tend to be concerned with very different tasks and use equally
different techniques. There has been little agreement in the field
about training sets and test sets, or about clearly defined subsets
of problems that constitute standards for different levels of
performance. Even those groups that have attempted a measure of
self-evaluation have often been reduced to discussing a system's
performance in isolation - comparing its current performance to its
previous performance rather than to another system. As this
technology begins to move slowly into the marketplace, the need for
useful evaluation techniques is becoming more and more obvious. The
speech community has made some recent progress toward developing new
methods of evaluation, and it is time that the natural language
community followed suit. This is much more easily said than done and
will require a concentrated effort on the part of the field.
There are certain premises that should underlie any discussion of
evaluation of natural language processing systems:
(1) It should be possible to discuss system evaluation in general
without having to state whether the purpose of the system is
"question-answering" or "text processing." Evaluating a system
requires the definition of an application task in terms of I/O
pairs which are equally applicable to question-answering, text
processing, or generation.
(2) There are two basic types of evaluation: a) "black box
evaluation," which measures system performance on a given task in
terms of well-defined I/O pairs; and b) "glass box evaluation,"
which examines the internal workings of the system. For example,
glass box performance evaluation for a system that is supposed to
perform semantic and pragmatic analysis should include the
examination of predicate-argument relations, referents, and
temporal and causal relations.
Given these premises, the workshop will be structured around the
following three sessions:
1) Defining "glass box evaluation" and "black box evaluation."
2) Defining criteria for "black box evaluation." _A Proposal for
establishing task oriented benchmarks for NLP Systems_
(Session Chair - Beth Sundheim)
3) Defining criteria for "glass box evaluation."
(Session Chair - Jerry Hobbs)
Several different types of systems will be discussed, including
question-answering systems, text processing systems, and generation
systems.
Researchers interested in participating are requested to submit a
short (250-500 word) description of their experience and interests,
and what they could contribute to the workshop. In particular, if
they have been involved in any evaluation efforts that they would
like to report on, they should include a short abstract (500-1000
words) as well. The number of participants at the workshop must be
restricted due to limited room size. The descriptions and abstracts
will be reviewed by the following committee: Martha Palmer (Unisys),
Mitch Marcus (University of Pennsylvania), Beth Sundheim (NOSC), Ed
Hovy (ISI), Tim Finin (Unisys), Lynn Bates (BBN). Submissions should
arrive at the address given below no later than October 1st.
Responses to all who submit abstracts or descriptions will be sent by
November 1st.
Martha Palmer
Unisys
Research & Development
PO Box 517
Paoli, PA 19301
palmer@prc.unisys.com
(215) 648-7228
------------------------------
Date: Mon, 5 Sep 88 16:36 EDT
From: Mark William Hopkins <markh@csd4.milw.wisc.edu>
Subject: Data Wanted:
I am in need of some English text, for setting up a data base. If you
have any to contribute, please e-mail it to me.
I asked Jerry Lewis to set up a telethon for this, but he said he was
busy :-)
------------------------------
Date: Mon, 12 Sep 88 08:02 EDT
From: COR_HVH%HNYKUN52.BITNET@CUNYVM.CUNY.EDU
Subject: GPSG parsers
Some time ago I asked for information on GPSG parsers (or parser-generators)
and promised to report any replies. Up to now, I have been notified of two
efforts in this area.
At the Technical University in Berlin a PROLOG system is being developed in
a machine translation context (Eurotra). It is able to parse and generate
sentences according to a small English or a medium German grammar.
At Boeing, work is being done on a LISP GPSG parser with the eventual
aim of automatic message processing. The system can parse English
sentences using a fairly large grammar and dictionary. Neither system
uses "pure" GPSG (if such a thing exists at all), the most important
difference being the absence of metarules.
I will ask both my contacts to write up their work in more detail and
submit the results to this list.
Hans van Halteren COR_HVH@HNYKUN52.BITNET
------------------------------
End of NL-KR Digest
*******************