NL-KR Digest             (2/26/87 17:41:15)            Volume 2 Number 11 

Today's Topics:
A Real Linguistics Question ?
How unknown words are handled: responses

----------------------------------------------------------------------

From: edwards@uwmacc.UUCP (mark edwards)
Subject: A Real Linguistics Question ?
Date: 25 Feb 87 17:58:19 GMT
Keywords: Source of Linguistic Data

[Forwarded from sci.lang]

I am thinking about doing a paper on a topic that I think is one of
the Fundamental Problems of Linguistics. Namely, is a sentence always
the proper datum for doing GB, GPSG, or other roughly related research?

Let me say outright that I am not a particular fan of GB or that line
of research. I do like GPSG, but one of its problems is that its roots
are in GB, or the basis for GB.

Chomsky would argue that a sentence is the proper place for doing work
in Linguistics (syntax ?). He would also say that sentences that seem
grammatical in a certain context are really syntactically ungrammatical,
but pragmatically correct. Or something on that line of thought.

What I am interested in is any references or any thoughts (with
specific examples) on this topic.

Thanks
mark
--
edwards@unix.macc.wisc.edu
{allegra, ihnp4, seismo}!uwvax!uwmacc!edwards
UW-Madison, 1210 West Dayton St., Madison WI 53706

------------------------------

From: goldberg@su-russell.ARPA (Jeffrey Goldberg)
Subject: Re: A Real Linguistics Question ?
Date: 26 Feb 87 10:02:34 GMT
Organization: Stanford University, CSLI

[Forwarded from sci.lang]

In article <1111@uwmacc.UUCP> edwards@uwmacc.UUCP (mark edwards) writes:
> I am thinking about doing a paper on a topic that I think is one of
> the Fundamental Problems of Linguistics. Namely, is a sentence always
> the proper datum for doing GB, GPSG, or other roughly related research?

Generalized Phrase Structure Grammar (GPSG) is a theory that will
let one write grammars that define phrase structure trees. Thus a
grammar written using GPSG not only defines the class of
well-formed Ss, but also the class of well-formed NPs, PPs,
A's, etc.
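[A toy illustration of this point: in a phrase structure grammar any
category can serve as the root, so the same rules license NPs in
isolation just as readily as Ss. The grammar and lexicon below are
invented for the sketch:]

  # Toy context-free phrase structure grammar.  Any category can be
  # taken as the root, so the grammar defines well-formed NPs and PPs
  # just as readily as well-formed Ss.
  RULES = {
      "S":  [["NP", "VP"]],
      "NP": [["Det", "N"], ["NP", "PP"]],
      "VP": [["V", "NP"]],
      "PP": [["P", "NP"]],
  }
  LEXICON = {"the": "Det", "dog": "N", "park": "N",
             "saw": "V", "in": "P"}

  def derives(cat, words):
      """True if `cat` can be rewritten as exactly `words`."""
      if len(words) == 1 and LEXICON.get(words[0]) == cat:
          return True
      for rhs in RULES.get(cat, []):
          if len(rhs) == 2:  # binary rules: try every split point
              for i in range(1, len(words)):
                  if derives(rhs[0], words[:i]) and derives(rhs[1], words[i:]):
                      return True
      return False

  print(derives("NP", "the dog in the park".split()))  # True
  print(derives("S", "the dog saw the park".split()))  # True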

Government and Binding theory (GB) is much more a theory about Ss.
I do not see how it could tell you whether an NP in isolation is
well formed: it would be assigned neither Case nor a Theta-role.

But, somehow, I gather that you are really asking about units
larger than the sentence. I will get to that below.

> Let me say outright that I am not a particular fan of GB or that line
> of research. I do like GPSG, but one of its problems is that its roots
> are in GB, or the basis for GB.

GPSG is not derivative of GB in any sense. But GPSG and GB have
common origins. Both are theories of "Generative Syntax" (but
see the first chapter of "GPSG" by Gazdar, Klein, Pullum, and
Sag for a statement that GB fails to meet the criteria of
generative syntax). I do not want to go over the entire history
of the field, but the lasting split in generative syntax occurred
when Chomsky decided that he didn't like Raising To Object.
Until that time, syntacticians (I got tired of writing
"generative" all over the place) treated "Mary" in (1) as
structurally the object of "expect" at surface structure.

(1) He expected Mary to be late.

             S
       ______|______
      |             |
      NP            VP
      |       ______|________
      He     |      |        |
             V      NP       ?
             |      |        |
         expected  Mary  to be late

Instead Chomsky wanted a structure more like:

             S
       ______|______
      |             |
      NP            VP
      |       ______|______
      He     |             |
             V             S
             |       ______|_______
         expected   |      |       |
                    NP    INFL     VP
                    |      |       |
                   Mary    to   be late
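[A toy Python encoding of the two analyses, purely illustrative,
showing that "Mary" is an NP sister of V only under the first:]

  # Each tree is (category, children...); a leaf is (category, word).
  RAISING = ("S", ("NP", "He"),
                  ("VP", ("V", "expected"), ("NP", "Mary"),
                         ("?", "to be late")))
  CHOMSKY = ("S", ("NP", "He"),
                  ("VP", ("V", "expected"),
                         ("S", ("NP", "Mary"), ("INFL", "to"),
                              ("VP", "be late"))))

  def object_of(tree, verb):
      """Return the NP that is a sister of the V dominating `verb`."""
      _cat, *kids = tree
      subtrees = [k for k in kids if isinstance(k, tuple)]
      if any(k[0] == "V" and verb in k for k in subtrees):
          for k in subtrees:
              if k[0] == "NP":
                  return k
      for k in subtrees:
          found = object_of(k, verb)
          if found:
              return found
      return None

  print(object_of(RAISING, "expected"))  # ('NP', 'Mary')
  print(object_of(CHOMSKY, "expected"))  # None: "Mary" is inside the S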

It was about this time that people who had sided with him during
a previous battle broke off and developed other theories:
Relational Grammar (Perlmutter and Postal) and Michael Brame's
nontransformational theory. Based on these and on serious
thought about semantics (mostly by people who had lost the
previous war with the Chomsky line), other theories came into
being: Lexical Functional Grammar (Bresnan), Arc Pair Grammar
(Postal), Montague Grammar (Partee, Dowty). All of these people
essentially treated "Mary" in (1) as the object of "expect" at
their surface level.

Meanwhile, the Chomsky line went from Standard Theory (where he too
had Raising to Object) to the Extended Standard Theory to the
Revised Extended Standard Theory (some people called it "the
Overextended Standard Theory") to GB and now to "Barriers".

Again, this is just talking about generative syntax. There were
and are those opposed to the entire enterprise, but space doesn't
permit mention of them here.

Anyway, to come to the point about GPSG, it has been described as
the result of an "unholy marriage of Bresnan and Montague", and had
its start with a couple of papers by Gerald Gazdar in '81 and '82.

> Chomsky would argue that a sentence is the proper place for doing work
> in Linguistics (syntax ?). He would also say that sentences that seem
> grammatical in a certain context are really syntactically ungrammatical,
> but pragmatically correct. Or something on that line of thought.

> What I am interested in is any references or any thoughts (with
> specific examples) on this topic.

I am not sure that I understand what you are asking here. Chomsky
talks about what is called the competence/performance distinction.
While many people who attack generative syntax attack this working
hypothesis, only a few people in sociolinguistics have presented
any alternative. For the most part, both generativists and
anti-generativists make this distinction. Generativists just do it
overtly.

But to get back to your original question, there are linguists who
look at things larger than the sentence. It is not clear that there
are linguistically definable units at those larger levels, which
makes the sentence the largest unit that one can really try to say
things about. But just because there aren't larger units doesn't
mean that there is nothing for linguists to discover at what is
called "the discourse level". It is a useful thing to look at, and
I wish you well with it.

Jeff Goldberg
ARPA: goldberg@russell.stanford.edu, goldberg@csli.stanford.edu
UUCP: ...!ucbvax!russell.stanford.edu!goldberg

------------------------------

Date: Fri, 20 Feb 87 17:23:27 CST
From: Larry Waswick <ncwaswic%NDSUVAX.BITNET@wiscvm.wisc.edu>
Subject: How unknown words are handled: responses

Approximately one month ago I posted a query in search of materials that
describe how unknown words, i.e., words absent from the lexicon, are
handled in a natural language processing system. Below is a compilation
of the responses that listed sources, edited to produce a condensed
result.

Thank you to all who responded.

- Larry Waswick

From: WEISCHEDEL@G.BBN.COM

There are two approaches, more or less.

Larry Harris, in a Dartmouth tech report from around 1977, discussed
the problem in the context of NLIs for databases. He proposed
inverting the database so that proper names could be looked up that
way. This has obvious merit, since you don't have to load the lexicon
with all possible proper names. BUT it means that if the company
doesn't currently have sales in Pennsylvania, the NLI wouldn't know
it's a state. Also, it pretty much deals only with proper names.
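[A minimal sketch of the inversion idea, with invented table and
field names:]

  # Build an inverted index from every value in the database to the
  # fields it appears in, so an out-of-lexicon token like "Scranton"
  # can be recognized as a CITY without being in the lexicon.
  from collections import defaultdict

  SALES = [  # hypothetical rows: (city, state, product)
      ("Scranton", "Pennsylvania", "widgets"),
      ("Albany",   "New York",     "gadgets"),
  ]
  FIELDS = ("CITY", "STATE", "PRODUCT")

  value_index = defaultdict(set)
  for row in SALES:
      for field, value in zip(FIELDS, row):
          value_index[value.lower()].add(field)

  def classify(token):
      return value_index.get(token.lower(), set())

  print(classify("Pennsylvania"))  # {'STATE'}
  print(classify("Ohio"))          # set() -- the weakness noted above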

The second approach is to try to infer its meaning from
context. Influential early papers here include Hendrix's 1978
paper on LADDER (in TODS) and Carbonell's '79 paper in the ACL
conference. There have been many, many papers following up on
that approach, but almost all are fairly limited; I don't think
there is any non-research NLI that tries to infer the meaning of
unknown words. It's not an easy problem.

Of course, the "unknown word" might be a typo of a word it
knows. It is easy to write a routine that checks an "unknown
word" as a possible spelling/typo of a known word, and to enter
into an acquisition package to learn its meaning from the user.
See the TEAM report (SRI, 1985), a paper on TELI (Ballard, ACL
conf. '86), or our own paper on IRUS (BBN, '87) for three
distinct interesting variants on this practical approach.
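[A sketch of such a routine, using edit distance 1 as an arbitrary
cutoff; none of the cited systems necessarily work this way:]

  # Treat an unknown token as a possible typo: accept any lexicon
  # entry within Levenshtein distance 1 as a candidate reading.
  def edit_distance(a, b):
      prev = list(range(len(b) + 1))
      for i, ca in enumerate(a, 1):
          cur = [i]
          for j, cb in enumerate(b, 1):
              cur.append(min(prev[j] + 1,                 # deletion
                             cur[j - 1] + 1,              # insertion
                             prev[j - 1] + (ca != cb)))   # substitution
          prev = cur
      return prev[-1]

  LEXICON = {"expect", "state", "sales", "company"}

  def typo_candidates(token):
      return {w for w in LEXICON if edit_distance(token, w) <= 1}

  print(typo_candidates("stste"))   # {'state'}
  print(typo_candidates("frobni"))  # set() -> hand off to acquisition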
-------

From: Stephan Busemann <BUSEMANN@DB0TUI11>

As far as I know, James Kilbury gave a paper at ECAI-86 that dealt,
among other topics, with the prediction of syntactic properties of
unknown words during an Earley-based parsing process using a GPSG
of English.
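[A toy sketch of the prediction idea, with an invented grammar rather
than Kilbury's: the categories immediately to the right of the dot in
the active Earley items are the possible categories of an unknown
next word:]

  # Earley items are (lhs, rhs, dot).  The category just right of the
  # dot in any active item is one the parser expects next, so an
  # unknown word can be hypothesized to bear exactly those categories.
  GRAMMAR = [("S", ("NP", "VP")), ("NP", ("Det", "N")), ("VP", ("V", "NP"))]
  PRETERMINALS = {"Det", "N", "V"}

  def predicted_categories(items):
      expected, seen, agenda = set(), set(), list(items)
      while agenda:
          item = agenda.pop()
          if item in seen:
              continue
          seen.add(item)
          lhs, rhs, dot = item
          if dot == len(rhs):
              continue
          nxt = rhs[dot]
          if nxt in PRETERMINALS:
              expected.add(nxt)
          else:                       # Earley predictor step
              agenda.extend((l, r, 0) for l, r in GRAMMAR if l == nxt)
      return expected

  # After "the", the only active item is NP -> Det . N, so an unknown
  # next word can only be a noun:
  print(predicted_categories([("NP", ("Det", "N"), 1)]))  # {'N'}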
If you would like to know more about that, here's his address:
Dr. J. Kilbury
Universitaet Trier
FB II, Linguistische Datenverarbeitung
Postfach 3825
D-5500 Trier
Best wishes,
Stephan
-------

From: AI.DUFFY@R20.UTEXAS.EDU

My system, using the Symbolics environment, queries the user. I
suppose that's the obvious answer to your question. The user supplies
category and subcategory information using a mouseable menu interface.
This really speeds the process. Unfortunately for you, this probably
is of little help. Your environment, likely as not, will force you to
ask questions about subcategory information one at a time, boring the
user to tears.

I save the acquired information so it can be written to lexicon
patch files. These are reloaded during subsequent user sessions so
that the process need be performed only once.

I am currently developing a new lexicon shell that allows applications
programmers to define categories and subcategories using forms that
specify the questions to be asked. The lexicon composes its own
subcategory predicates and fetch functions and stores subcategory
choices on bit arrays for data compaction purposes. The menus are
consed on the fly and ask only for the subcategory information not
presently known about. I'm preparing a paper on it, but it's not yet
completed.
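[A sketch of the bit-packing idea with invented category names; each
subcategory choice occupies one bit, so a lexeme's choices pack into
a single integer:]

  # Pack a lexeme's subcategory choices into bit flags, one bit per
  # feature -- a compact stand-in for the bit arrays described above.
  SUBCATS = ["transitive", "takes-sentential-complement", "raising"]
  BIT = {name: 1 << i for i, name in enumerate(SUBCATS)}

  def pack(choices):
      mask = 0
      for name in choices:
          mask |= BIT[name]
      return mask

  def has(mask, name):
      return bool(mask & BIT[name])

  entry = pack({"transitive", "raising"})           # stored as the integer 5
  print(has(entry, "raising"))                      # True
  print(has(entry, "takes-sentential-complement"))  # False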
-------

From: Steven Bird <munnari!mulga.oz!steven@seismo.CSS.GOV>

If you give me your address, I will send you a copy of my thesis when it is
completed in a couple of months. It considers a number of flexible parsing
issues, including the problem you mentioned.

[Steven Bird. Comp. Sci. Dept. Melbourne University, Parkville 3052, Australia]
[UUCP: {seismo,mcvax,ukc,ubc-vision,prlb2,enea,hplabs,tataelxsi}!munnari!steven]
[ARPA: steven%munnari.oz@seismo.css.gov CSNET: steven%munnari.oz@australia]
-------

From: Koenraad De Smedt <DESMEDT@HNYKUN52>

Leo Konst wrote a parser on a Symbolics which was based on psychological
filter theory. He included a menu-driven interface to get information
about unknown words from the user. That's one way of doing it.

Then Eddy van Vliembergen at OCE research rewrote that parser for an IBM
PC/AT and modified it so that it makes hypotheses about unknown words.
That's another way of doing it. It's more risky, but it seems to work
pretty well.

Both parsers are for Dutch and are written in LISP. So far, very little
has been published about them.

Koenraad De Smedt.
-------

From: krovetz@umass (Thu Jan 22 23:03:34 1987)

I've been looking at that problem for a while. There are systems
that make attempts to figure out part of speech, such as Don Hindle
and Mitch Marcus' Fidditch parser (I don't have a reference to their
work, but it's being done at Bell Labs), or Gavan Duffy's system
(described in the AAAI-86 proceedings on page 1079). There have also
been a couple of systems that try to do some semantic guessing. The
ones I know of are: FOUL-UP, by Richard Granger (IJCAI 77, pp. 172-178),
Moser's work with the RUS parser (first IEEE Conference on AI
applications, 1984), Haas' and Hendrix's work on NanoKlaus (AAAI-80,
pp. 235-239), and a system that used pragmatics to figure out
unknown words in the context of tic-tac-toe (that was done by
Perry Miller and was in one of the AAAI or IJCAI proceedings
around 1973-1979, but I don't have an exact reference at hand).

I've been trying to work on a system myself, but I've been stuck
on the problem of dealing with compound nominals. I've been using
titles and abstracts from CACM as a source of unknown words. I
compare them against a collection of the 2000 most common words in
English, and whatever isn't a morphological variation of one of
those I assume is a word to try to figure out. I'll probably write
up what I've done sometime in the spring.
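[A sketch of that filtering step; the word list and suffix list are
small stand-ins for the real ones:]

  # Flag tokens that are neither common words nor simple morphological
  # variants of common words as "unknown" and worth figuring out.
  COMMON = {"program", "system", "data", "and", "for"}  # stand-in for 2000
  SUFFIXES = ["ing", "ed", "er", "s"]                   # crude stripping

  def is_variant(token):
      for suf in SUFFIXES:
          if token.endswith(suf) and token[: -len(suf)] in COMMON:
              return True
      return False

  def unknown_words(text):
      return [t for t in text.lower().split()
              if t not in COMMON and not is_variant(t)]

  print(unknown_words("programs for heuristic data systems"))
  # ['heuristic'] -- the word worth trying to figure out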

I'm working on my doctorate at the University of Massachusetts at
Amherst. I hope to do a dissertation on intelligent information retrieval.

Cheers,
Bob

P.S. My address is:

Bob Krovetz
Dept. of Computer Science
University of Massachusetts
Amherst, MA. 01003
-------

From: Educational Software <ihnp4!ecfa!edusoft>

Why don't you try:

Efficient Parsing for Natural Language
Masaru Tomita
1985
Kluwer
P98.T65 1985 006.3 '5

This is his CMU thesis (he's still there).

He uses a graph-oriented parser, with graphs denoting different
readings for different constituents.

This is extended to allow for words of unknown category, determining
which categories are allowed by the rest of the construction.
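[A toy sketch of the unknown-category idea using a naive recursive
parser rather than Tomita's graph-structured algorithm; grammar and
lexicon are invented:]

  # Give an unknown word every preterminal category in turn and report
  # which hypotheses survive parsing the rest of the construction.
  RULES = {"S": [("NP", "VP")], "NP": [("Det", "N")], "VP": [("V", "NP")]}
  LEXICON = {"the": {"Det"}, "dog": {"N"}, "saw": {"V"}}
  PRETERMINALS = {"Det", "N", "V"}

  def parses(cat, words):
      if len(words) == 1:
          return cat in LEXICON.get(words[0], PRETERMINALS)
      return any(parses(a, words[:i]) and parses(b, words[i:])
                 for a, b in RULES.get(cat, [])
                 for i in range(1, len(words)))

  def allowed_categories(sentence, unknown):
      ok = set()
      for cat in PRETERMINALS:
          LEXICON[unknown] = {cat}     # try one hypothesis at a time
          if parses("S", sentence.split()):
              ok.add(cat)
      del LEXICON[unknown]
      return ok

  print(allowed_categories("the dog saw the wug", "wug"))  # {'N'}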

bill idsardi
computational linguist
educational software products   OR   department of linguistics
1263 bay st                          university of toronto
toronto, ontario M5R 2C1             toronto, ontario M5S 1A1
canada                               canada
416-922-0087                         416-978-6916
..utzoo!utai!utecfa!edusoft
-------

From: D J Besemer <rutgers!rochester!steinmetz!besemer>
Subject: Automatic acquisition of new lexemes

I am involved with a natural language effort at GE Corporate Research
and Development in which we have dealt with the automatic acquisition
of new lexical entries. Although our approach to this is quite
simple, it is made possible by our knowledge representation.

Our lexicon is summarized in the GE Technical Report "FLUSH: Beyond
the Phrasal Lexicon" (which is also currently being considered for
the special issue of "Computational Linguistics" on the lexicon). I
would be happy to send you a copy of the Tech Report if you would
send me a mailing address.

|>/\\/<-

(David Besemer besemer@ge-crd.arpa)

------------------------------

End of NL-KR Digest
*******************

