
NL-KR Digest Volume 12 No. 04

Published in NL KR Digest

NL-KR Digest      (Wed Jun  2 21:24:41 CDT 1993)      Volume 12 No. 4 

Today's Topics:

Announcement: Recent NLP Memoranda in Computer and Cognitive Science

Submissions: nl-kr@cs.rpi.edu
Requests, policy: nl-kr-request@cs.rpi.edu
Back issues are available from host archive.cs.rpi.edu [128.213.3.18] in
the files nl-kr/Vxx/Nyy (i.e. nl-kr/V01/N01 for V1#1); mail requests will
not be promptly satisfied. Starting with V9, there is a subject index
in the file INDEX. If you can't reach `cs.rpi.edu' you may want
to use `turing.cs.rpi.edu' instead.
BITNET subscribers: we now have a LISTSERVer for nl-kr.
You may send submissions to NL-KR@RPITSVM
and any listserv-style administrative requests to LISTSERV@RPITSVM.

-----------------------------------------------------------------

To: nl-kr@cs.rpi.edu
Newsgroups: comp.ai.nlang-know-rep
From: <yorick@NMSU.Edu>
Subject: Announcement: Recent NLP Memoranda in Computer and Cognitive Science
Reply-To:
Date:


For ordering technical reports listed below write to:

Memoranda Series, Computing Research Laboratory, Box 30001, New Mexico
State University, Las Cruces, New Mexico, 88003, USA.




Ball, Jerry T., (1992), PM, Propositional Model, a Computational
Psycholinguistic Model of Language Comprehension Based on a Relational
Analysis of Written English, CRL, (Ph.D. Thesis) MCCS-92-226. ($20.00)

A computational psycholinguistic model of written language comprehension
called PM (Propositional Model) is described. PM consists of two basic
components: (1) a propositional system of representation for representing
the relational structure and content of written English sentences, and (2)
a processing mechanism for constructing propositional representations
directly from written English input. PM is a highly interactive model.
Written English text is processed directly into propositional
representations. There is no separate syntactic analysis, and no distinctly
syntactic representations exist. The processing mechanism is lexically
driven, and most knowledge of language is assumed to be encoded in the
lexicon. Of particular importance is the information encoded by relational
lexical items. That information sets up expectations which drive the
processing mechanism. It also determines the possible propositional
structures. Following the description of the system of representation and
the processing mechanism, the dissertation presents the results of two
experiments that support highly interactive models of language
comprehension like PM and argue against autonomous models. It concludes with a
discussion of PM's contributions to the study of language processing.
Important influences on the development of PM have been Y. Wilks
(Preference Semantics), R. Langacker and G. Lakoff (Cognitive Linguistics),
T. Givon (Functional-Typological Grammar), W. Kintsch and J. R. Anderson
(Propositional Representations), G. A. Miller and P. Johnson-Laird
(Lexical Semantics), and P. Johnson-Laird (Mental Models).


Barnden, John A., (1992), Connectionism, Structure-Sensitivity, and
Systematicity: Refining the Task Requirements, CRL, MCCS-92-227. ($7.00)

Some issues in applying connectionism to reasoning and natural
language understanding are explored. They center on systematicity and
structure-sensitivity, two notions that are central to Fodor and
Pylyshyn's critique of connectionism. Certain neglected but crucial
aspects of these notions make them more troublesome for connectionism
than has been previously acknowledged. First, connectionism must
provide a way of embedding reasoning within certain types of context.
For instance, a system must be able to reason within the context of
another agent's beliefs. Secondly, connectionism must provide a way
of matching two structured representations in working memory, as
opposed to merely associating working-memory items to long-term
memories. Thirdly, there may be variables within working memory
representations, not just within long-term rules. These three points
lead to significant systematicity and structure-sensitivity
requirements over and above those that have already been discussed
in the connectionism/symbolicism debate. The paper is, nevertheless,
generally sympathetic to connectionism, and its intent is to clear
the way for further advances within that field.



Ball, Jerry T., (1992), PM, Propositional Model, a Computational
Psycholinguistic Model of Language Comprehension Based on a Relational
Analysis of Written English (Summary Paper), CRL, MCCS-92-229. ($7.00)

A computational psycholinguistic model of written language comprehension
called PM (Propositional Model) is described. PM consists of two basic
components: (1) a propositional system of representation for representing
the relational structure and content of written English sentences, and (2)
a processing mechanism for constructing propositional representations
directly from written English input. PM is a highly interactive model.
Written English text is processed directly into propositional
representations. There is no separate syntactic analysis, and no distinctly
syntactic representations exist. The processing mechanism is lexically
driven, and most knowledge of language is assumed to be encoded in the
lexicon. Of particular importance is the information encoded by relational
lexical items. That information sets up expectations which drive the
processing mechanism. It also determines the possible propositional
structures. After PM is described, its use for computerized NLP is
considered. A brief comparison of PM with compatible linguistic approaches
follows. Finally, two experiments which support the interactive nature of
PM are presented. Important influences on the development of PM have been
Y. Wilks (Preference Semantics), R. Langacker and G. Lakoff (Cognitive
Linguistics), T. Givon (Functional-Typological Grammar), W. Kintsch and J.
R. Anderson (Propositional Representations), G. A. Miller and P.
Johnson-Laird (Lexical Semantics), and P. Johnson-Laird (Mental Models).



Barnden, John A., (1992), Beliefs, Connectionism, Meta-Representation,
Vagueness: Stirring the Pot, CRL, MCCS-92-230. ($5.00)

I discuss two separate topics in the area of propositional attitude
representation. One is the question of how to treat vague
quantification (viz. `most', `several', and so on) within
propositional attitude contexts, as for example when someone says
``John believes that most of his toenails are ingrown.'' I report
some initial considerations on this issue. The intention is to fill in
a lacuna in propositional attitude research, which has been too
narrowly concerned with strict universal and existential
quantification. The other topic is to do with the representation of
attitudes in non-implementational connectionist systems. It throws
some light on attitude representation issues as well as on
connectionism. One prominent way of representing attitudes is by
means of meta-logics, including quotational logics. Meta-logic
provides some of the most expressively powerful attitude
representation approaches. Unfortunately, there are difficulties in
importing its central ideas into non-implementational connectionist
systems. I suggest that some of the difficulties are removed by
meta-linguistic attitude representation proposals. These have terms
denoting natural language sentences or utterances. An independent
reason for considering such approaches is that they are strongly
related to a prevalent commonsense metaphor of attitudes, namely the
model of beliefs and so on as internal, natural language utterances.
The discussion of the second topic reveals that connectionist concerns
with representation have been insufficiently general, in failing to
address the need for a cognitive system to be able to think about
complex structured expressions, as opposed to thinking with them.


Iverson, Eric and Helmreich, Stephen, (1992), Metallel: An Integrated
Approach to Non-literal Phrase Interpretation, CRL, MCCS-92-231. ($5.00)

Metallel is a program that incorporates marker passing techniques
within a preference/collative semantics framework. This allows for
the simultaneous generation of literal and non-literal meaning
representations and permits a much greater degree of parallelism
during processing. In addition, we have integrated metonymic and
metaphoric inferencing into one procedure, arguing that at least some
types of metaphor can be represented as parallel metonymies. A number
of examples are presented which show that Metallel's output is roughly
equivalent to that of conventional, rule-based approaches to metonymy.
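Marker passing on its own can be illustrated with a minimal Python sketch. The toy network, its node names and the metonymy it resolves ("the kettle is boiling", where it is the water in the kettle that boils, and boiling prefers a liquid) are hypothetical illustrations, not Metallel's knowledge base, and the preference/collative semantics layer is left out entirely. Markers spread breadth-first from two concepts, and the collision node with the smallest combined distance yields the connecting path.

```python
from collections import deque

# Hypothetical toy semantic network: node -> set of neighbouring nodes.
# Links are treated as undirected for marker spreading; relation labels are omitted.
NETWORK = {
    "kettle": {"water", "container"},
    "water": {"kettle", "liquid"},
    "liquid": {"water"},
    "container": {"kettle", "metal"},
    "metal": {"container"},
}


def spread_markers(origin, graph, max_depth):
    """Breadth-first marker spreading; returns {node: (depth, parent)}."""
    marks = {origin: (0, None)}
    frontier = deque([origin])
    while frontier:
        node = frontier.popleft()
        depth = marks[node][0]
        if depth == max_depth:
            continue  # markers stop spreading at the depth limit
        for nbr in sorted(graph.get(node, ())):
            if nbr not in marks:
                marks[nbr] = (depth + 1, node)
                frontier.append(nbr)
    return marks


def connecting_path(source, target, graph, max_depth=3):
    """Spread markers from both ends; return the shortest path through a collision node."""
    from_src = spread_markers(source, graph, max_depth)
    from_tgt = spread_markers(target, graph, max_depth)
    collisions = set(from_src) & set(from_tgt)
    if not collisions:
        return None
    # Smallest combined distance wins; ties are broken alphabetically for determinism.
    meet = min(collisions, key=lambda n: (from_src[n][0] + from_tgt[n][0], n))
    left, node = [], meet
    while node is not None:  # walk back to the source
        left.append(node)
        node = from_src[node][1]
    right, node = [], from_tgt[meet][1]
    while node is not None:  # walk forward to the target
        right.append(node)
        node = from_tgt[node][1]
    return left[::-1] + right


print(connecting_path("kettle", "liquid", NETWORK))  # ['kettle', 'water', 'liquid']
```

The path through "water" is the kind of intermediate link a metonymic reading would need; a real system would also weigh link types rather than just counting hops.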


Barnden, John A., 1992, On Using Analogy to Reconcile Connections and
Symbols, CRL, MCCS-92-232. ($7.00)

How do we gain both standard advantages of connectionism and those of
symbolic systems, without adopting hybrid symbolic/connectionist
systems? Fully connectionist systems that support analogy-based
reasoning are proposed as an answer, at least in the realm of
high-level cognitive processing. This domain includes commonsense
reasoning and the semantic/pragmatic aspects of natural language
processing. The proposed type of system, purely by being
analogy-based, gains forms of graceful degradation, representation
completion, similarity-based generalization, learning, rule-emergence
and exception-emergence. The system therefore gains advantages
commonly associated with connectionism, although the precise forms of
the benefits are different. At the same time, through being fully
connectionist, the system also gains the traditional connectionist
variants of those advantages, as well as gaining further advantages
not provided by analogy-based reasoning per se. And, because the
system is in part an implementation of a form of symbolic processing,
it preserves the flexible handling of complex, temporary structures
that are well supported in traditional artificial intelligence and
which are essential for high-level cognitive processing. The chapter
is in part a reaction against the excessive polarization of the
connectionism/symbolicism debate. This polarization is seen as
resulting from over-simplified, monolithic views both of what symbolic
processing encompasses and of the nature of the benefits that
connectionism provides.



Dunning, Ted, Cowie, Jim, & Wakao, Takahiro, 1992, An Analysis of a
Parallel Japanese-English Corpus, CRL, MCCS-92-233. ($5.00)

We have analyzed a data set consisting of 10,000 paired English and
Japanese scientific abstracts. Roughly 80% of these abstracts are
direct translations, with the remainder being summaries in Japanese of
originally English texts. To our knowledge, this is the first parallel
corpus pairing English with any oriental language. We show that Japanese
in Extended Unix Code (EUC) is relatively
more efficient at encoding information than English in ASCII, but that
the information content of the two is similar. We present the results
of a standard frequency analysis of the Japanese corpus alone,
including character, character bigram, word and word bigram frequency
analyses. Further, we present the results of statistical studies
which attempt to extract a translation glossary from the paired texts
and an initial assessment of the feasibility of automatic sentence
alignment based on an analysis of 100 texts whose sentences were
aligned manually.
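The character and character-bigram counts mentioned above can be sketched in a few lines of Python. The sample string is made up, and raw bytes per character is only a crude proxy for the encoding-efficiency comparison in the report, not the authors' measure. Word and word-bigram counts would first need a Japanese segmenter, which is omitted here.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams, skipping any n-gram that contains whitespace."""
    grams = (text[i:i + n] for i in range(len(text) - n + 1))
    return Counter(g for g in grams if not any(ch.isspace() for ch in g))


def bytes_per_char(text, encoding):
    """Average number of bytes per character under the given encoding."""
    return len(text.encode(encoding)) / len(text)


sample = "日本語の日本"  # hypothetical sample text, not taken from the corpus
print(char_ngrams(sample, 1).most_common(3))
print(char_ngrams(sample, 2)["日本"])
print(bytes_per_char("日本語", "euc_jp"), bytes_per_char("Japan", "ascii"))
```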

Jin, Wanying, 1992, A Case Study: Chinese Segmentation and its
Disambiguation, CRL, MCCS-92-237. ($5.00)

This paper first reviews the techniques used in current Chinese
segmentation systems: character string matching, the generate-and-test
approach, and the knowledge-based expert system approach.
A proposed algorithm for segmenting Chinese sentences in news
articles is then presented. The basic idea is to use each character in
the input string as an index to retrieve a list of candidate words
from a Chinese lexical database, and then to use the input string
as a filter to rule out all incompatible candidates. All compatible
candidates are aligned to produce plausible strings as hypotheses.
The difficulties of Chinese segmentation are also discussed. A
technique of reasoning under uncertainty is studied in an attempt
to solve disambiguation problems. Finally, a knowledge-based
plausible reasoning mechanism is proposed.
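The lookup, filter and align steps can be sketched in Python. This assumes the lexicon is indexed by each word's first character and uses a tiny hypothetical lexicon; it enumerates every compatible segmentation but does not attempt the disambiguation step. For instance, 研究生命 can be split as 研究/生命 or 研究生/命.

```python
from collections import defaultdict

# Hypothetical toy lexicon; a real system would use a large Chinese lexical database.
LEXICON = ["研究", "研究生", "生命", "命", "生"]


def index_by_first_char(words):
    """Index lexicon entries by their first character."""
    index = defaultdict(list)
    for w in words:
        index[w[0]].append(w)
    return index


def candidates_at(text, pos, index):
    """Words retrieved via text[pos] that are compatible with the input from pos on."""
    return [w for w in index.get(text[pos], []) if text.startswith(w, pos)]


def segmentations(text, index):
    """Align compatible candidates into every full segmentation hypothesis."""
    paths = [[] for _ in range(len(text) + 1)]  # paths[i]: segmentations of text[:i]
    paths[0] = [[]]
    for pos in range(len(text)):
        if not paths[pos]:
            continue  # no hypothesis reaches this position
        for word in candidates_at(text, pos, index):
            paths[pos + len(word)].extend(prefix + [word] for prefix in paths[pos])
    return paths[len(text)]


index = index_by_first_char(LEXICON)
for seg in segmentations("研究生命", index):
    print(" / ".join(seg))
```

With this lexicon there are three hypotheses (研究/生命, 研究生/命 and 研究/生/命); choosing among them is exactly the disambiguation problem the paper addresses.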


Harary, Frank & Wilks, Yorick, 1992, On Unidirectional Linguistic
Comprehension, CRL, MCCS-92-238. ($5.00)

NO ABSTRACT WITH THIS REPORT


Stein, Gees C., (1993), Genus Verb Disambiguation: Possible or
Impossible, CRL, MCCS-93-240. ($5.00)

The problem of verb disambiguation is of interest to many research
projects. This paper concentrates on the disambiguation of verbs as
used in verb definitions: the genus verb.

The verb definitions in the on-line dictionary LDOCE (Longman
Dictionary of Contemporary English, 1984) contain not only definition
sentences and example sentences, but also grammatical information,
pragmatic information and semantic information among others.

The algorithm developed for genus disambiguation is based on the
definition and example sentences, on the pragmatic information (the
contexts in which a word is generally used, such as Law or Engineering)
and on semantic information (what kind of subject a verb prefers).
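One way the three knowledge sources might be combined can be sketched as a simple scoring function in Python. Everything here is an assumption for illustration: the sense records, the plain-text field and subject labels (LDOCE's actual subject and semantic codes are not reproduced) and the weights are hypothetical, not the report's algorithm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VerbSense:
    sense_id: str
    definition_words: set                  # content words from the sense's definition/examples
    subject_field: Optional[str] = None    # pragmatic context label, e.g. "law" (hypothetical)
    subject_pref: Optional[str] = None     # semantic class preferred as subject (hypothetical)


def score(context_words, field, subject, sense, w_field=2.0, w_subj=1.0):
    """Hypothetical score: word overlap plus bonuses for matching field and subject preference."""
    s = len(context_words & sense.definition_words)
    if field is not None and field == sense.subject_field:
        s += w_field
    if subject is not None and subject == sense.subject_pref:
        s += w_subj
    return s


def disambiguate_genus(context_words, field, subject, senses):
    """Pick the highest-scoring sense of the genus verb (ties go to the first listed)."""
    return max(senses, key=lambda sense: score(context_words, field, subject, sense))


senses = [
    VerbSense("judge_1", {"decide", "court", "case", "law"}, "law", "human"),
    VerbSense("judge_2", {"form", "opinion", "estimate"}, None, "human"),
]
# Defining context of a hypothetical headword whose genus verb is "judge".
best = disambiguate_genus({"decide", "case", "officially"}, "law", "human", senses)
print(best.sense_id)  # judge_1
```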

The algorithm was tested on a hand-disambiguated test set of 100 verb
definitions. The disambiguation was done with respect to the
definitions as found in LDOCE.

The final results look promising, although verbs pose some specific
problems. Future work will have to decide whether this approach is
really useful.


Helmreich, Steve, Jin, Wanying, Wilks, Yorick, Guillen, Rocio, 1992,
Research Issues in Machine Translation at the Computing Research
Laboratory, CRL, MCCS-92-242. ($5.00)

In this paper, several issues related to Machine Translation and the
ULTRA MT system that is currently under development at the Computing
Research Laboratory (CRL) at New Mexico State University are
presented. ULTRA is a five-language, interlingual-based system
(English, Spanish, German, Chinese, and Japanese). Its theoretical
goals lie in the area of pragmatics and communication, though to date
this aspect has been implemented in only a limited manner.

This paper emphasizes issues which are particularly pertinent to
interlingual systems and to those based on communicative principles.
The approach to each of these issues within the ULTRA system is
described, along with some of the ancillary tools that assist in
MT-system research, such as a multi-lingual interface which supports
special character sets and a lexicon-building menu system. The current
state of the ULTRA system is summarized and future research directions
are discussed.


Wilks, Yorick, First Workshop of the Consortium for Lexical Research,
1992, CRL, MCCS-92-243. ($7.00)

This document is a brief record of the presentations and
discussion at the first workshop of the Consortium for Lexical
Research (CLR), held at Las Cruces, New Mexico in January 1992. The
nature and role of the CLR are explained at the end of this document.
The workshop brought together researchers, publishers, funders and
consumers of lexical data to discuss a range of key legal and
intellectual issues related to the functioning of the CLR.

The transcript is partial and must be viewed in that light. Notes were
taken by a range of people and transcribed, but inevitably some kept
much fuller notes than others, and my editing cannot repair that; the
space given here to speakers is therefore a function of the fullness of
the notes and NOT of the length of what was said. It should be borne in
mind that the words ascribed to speakers are not literally their own,
and we have avoided a lengthy process of consulting them because any
editing by the speakers themselves would inevitably destroy dialogue
coherence. Hence this record will not receive any formal publication
beyond this form, and anyone who feels they have been misrepresented
must accept my apologies.

The workshop was supported by the Office of Naval Research, Defense
Advanced Research Projects Agency, the National Science Foundation and
the Association for Computational Linguistics, and many thanks are due
to the relevant officers at all of these institutions.


Helmreich, Steve, Jin, Wanying, Wilks, Yorick, Guillen, Rocio, 1992,
Questions de Traduction Automatique au Computing Research Laboratory
(CRL) (French Version), CRL, MCCS-92-244. ($5.00)

NO ABSTRACT


Barnden, John & Srinivas, K., 1992, Working Memory Variables, Logical
Combinators and Systematicity, CRL, MCCS-92-245. ($5.00)

The connectionist problem of achieving the quantificational effect of
symbolic variables is well recognized. However, one relatively
neglected issue is that of variables in working memory representations
(arising, for instance, from natural language inputs), as opposed to
variables in rules. Working memory variables present difficulties,
centering on the arbitrariness of the set of variables used in any
given expression, and on non-uniformity in expressions and
manipulations. However, the variables can in principle be avoided,
for instance by using logical combinators. These are special
functions much studied within the symbol processing arena. The use of
combinators makes structures less arbitrary and more uniform. The
reduced arbitrariness ameliorates an important systematicity problem,
and the added uniformity could facilitate high-level parallelism. We
do not claim that the combinator approach is definitely the right one
to adopt, since it has some problems. Nevertheless, combinators need to
be borne in mind, and the symbolic/connectionist debate has been
over-simplified in ignoring them. We discuss several ways in which
combinator-based working memory items could be implemented in
connectionism, with special attention to reduced-representation
implementations. We also compare the combinator approach with the
technique of using canonical sets of variables in expressions.
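The idea of avoiding variables with combinators can be made concrete with the classic S/K/I bracket-abstraction translation (Schönfinkel, Curry). This Python sketch is one standard combinator scheme, not necessarily the one the report adopts, and it says nothing about connectionist implementation. Two alphabetic variants of the same lambda term compile to identical combinator terms, which illustrates one sense of the reduced arbitrariness the abstract mentions.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class App:
    fn: "Term"
    arg: "Term"


@dataclass(frozen=True)
class Lam:
    var: str
    body: "Term"


# A term is a variable name, one of the combinators "S", "K", "I", an App, or a Lam.
# Assumption: variable names never collide with "S", "K" or "I".
Term = Union[str, App, Lam]


def occurs(x, t):
    """True if variable x occurs in the lambda-free term t."""
    if isinstance(t, App):
        return occurs(x, t.fn) or occurs(x, t.arg)
    return t == x


def abstract(x, t):
    """Bracket abstraction: a variable-free term equivalent to (lambda x. t)."""
    if t == x:
        return "I"
    if not occurs(x, t):
        return App("K", t)
    return App(App("S", abstract(x, t.fn)), abstract(x, t.arg))


def compile_term(t):
    """Remove every lambda, and thus every bound variable, from t."""
    if isinstance(t, App):
        return App(compile_term(t.fn), compile_term(t.arg))
    if isinstance(t, Lam):
        return abstract(t.var, compile_term(t.body))
    return t


def spine(head, args):
    """Rebuild a left-nested application head a1 a2 ... an."""
    for a in args:
        head = App(head, a)
    return head


def normalize(t):
    """Normal-order reduction with S x y z = x z (y z), K x y = x, I x = x."""
    while True:
        head, args = t, []
        while isinstance(head, App):  # unwind the application spine
            args.insert(0, head.arg)
            head = head.fn
        if head == "I" and len(args) >= 1:
            t = spine(args[0], args[1:])
        elif head == "K" and len(args) >= 2:
            t = spine(args[0], args[2:])
        elif head == "S" and len(args) >= 3:
            f, g, x = args[:3]
            t = spine(App(App(f, x), App(g, x)), args[3:])
        else:
            return spine(head, [normalize(a) for a in args])


swap = Lam("x", Lam("y", App("y", "x")))  # lambda x. lambda y. y x
code = compile_term(swap)                 # contains only S, K and I
print(normalize(App(App(code, "a"), "b")))  # App(fn='b', arg='a')
```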


Bruce, Rebecca, Wilks, Yorick, Guthrie, Louise, Slator, Brian, and
Dunning, Ted, 1992, NounSense - A Disambiguated Noun Taxonomy with a
Sense of Humour, CRL, MCCS-92-246. ($7.00)

The Computing Research Laboratory at New Mexico State University is
involved in a project to create a data-base of lexical facts in the
form of a network of semantically related word senses. The data-base
will support the automatic construction of lexicons for many types of
natural language processing systems. In this paper we discuss
NounSense, a disambiguated IS-A hierarchy of nouns automatically
constructed from the Longman Dictionary of Contemporary English
(LDOCE). The primary focus of our presentation will be on the
techniques used to construct the network, and on its semantic readable
dictionaries in general. Additionally we will present a brief
overview of the interface developed for NounSense, as we feel it
incorporates many features that enhance the usability of the
information in the data-base. Finally, we will exhibit the fact that
our network does contain a sense of humor, indeed more than one, with
some interesting taxonomical relationships.
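A disambiguated IS-A hierarchy can be pictured as a mapping from each noun sense to the sense of its genus term. This Python sketch uses hypothetical sense labels and glosses, not LDOCE sense numbers or NounSense's actual storage format; it follows genus links upward and guards against loops, since genus chains in a dictionary can be circular.

```python
# Hypothetical fragment of a disambiguated IS-A network: each word sense maps to
# the sense of its genus term. Labels and glosses are illustrative only.
GENUS = {
    "humour_1": "quality_1",   # the quality of causing amusement
    "humour_2": "state_1",     # a state of mind, a mood
    "wit_1": "humour_1",
    "quality_1": "abstraction_1",
    "state_1": "abstraction_1",
}


def hypernyms(sense, genus=GENUS):
    """Follow genus links upward; stop at a root or when a cycle is detected."""
    chain, seen = [], {sense}
    while sense in genus:
        sense = genus[sense]
        if sense in seen:
            break  # circular chain of definitions
        chain.append(sense)
        seen.add(sense)
    return chain


def senses_of(word, genus=GENUS):
    """All senses of `word` that appear anywhere in the network."""
    nodes = set(genus) | set(genus.values())
    return sorted(s for s in nodes if s.rsplit("_", 1)[0] == word)


print(hypernyms("wit_1"))  # ['humour_1', 'quality_1', 'abstraction_1']
print(senses_of("humour"))  # more than one sense of humour
```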


Wang, Jin and Wilks, Yorick, 1992, Protocols for Reference Sharing in
a Belief Ascription Model of Communication, CRL, MCCS-92-248. ($5.00)

The basic idea behind the ViewGen model is that each agent
involved in a conversation has a belief space which includes models of
what other parties to the conversation believe. The distinctive
notion is that a basic procedure, called belief ascription, allows
belief spaces to be amalgamated so as to model the updating and
augmentation of belief environments. In this paper we extend the
ViewGen model to a more general account of reference phenomena,
in particular by the notion of an ascription path (AP) that links
intensional objects across belief environments so as to locate the
most heuristically plausible referent at a given point in a
conversation. The key notion is the location and attachment of
entities that may be under different descriptions, the consequent
updating of the system's beliefs about other agents by default, and
the role in that process of speaker's and hearer's protocols that
ensure that the choice is the appropriate one. The purpose of these
protocols is to make the models of other agents' beliefs as good a
representation as possible given the information to hand, and to make
the agent's own beliefs more accessible to the other (on the
assumption that no deception is involved). The important characteristic
of this model is that each communicator considers nothing beyond his own
belief space.
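Default belief ascription over nested environments can be sketched roughly in Python. Propositions are plain strings with a leading "~" for negation, and a belief is withheld from another agent only when the system's model of that agent already contains its negation; both are simplifying assumptions, and ascription paths and the speaker/hearer protocols are not modelled.

```python
from dataclasses import dataclass, field


def negate(p):
    """Propositions are plain strings; a leading "~" marks negation (assumption)."""
    return p[1:] if p.startswith("~") else "~" + p


@dataclass
class BeliefEnv:
    holder: str
    beliefs: set = field(default_factory=set)
    nested: dict = field(default_factory=dict)  # agent name -> BeliefEnv

    def model_of(self, agent):
        """This holder's model of `agent`'s beliefs, created empty on first use."""
        return self.nested.setdefault(agent, BeliefEnv(agent))

    def ascribe_to(self, agent):
        """Default ascription: assume `agent` shares each of our beliefs unless
        our model of `agent` already contains its negation."""
        env = self.model_of(agent)
        env.beliefs |= {p for p in self.beliefs if negate(p) not in env.beliefs}
        return env


system = BeliefEnv("system", {"whale_is_mammal", "earth_is_round"})
system.model_of("john").beliefs.add("~whale_is_mammal")  # a known exception
john = system.ascribe_to("john")
print(sorted(john.beliefs))  # ['earth_is_round', '~whale_is_mammal']
```

Calling `john.ascribe_to("mary")` would then build the system's view of John's view of Mary in the same way, one level deeper.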


Guthrie, Louise, Guthrie, Joe, Wilks, Yorick, Cowie, Jim, Farwell,
David, Slator, Brian, and Bruce, Rebecca, 1992, A research program on
machine-tractable dictionaries and their application to text analysis,
CRL, MCCS-92-249. ($5.00)

Machine-readable dictionaries (MRD's) contain substantial knowledge
about language and the world that is essential for large-scale tasks in
natural language processing (NLP), though whether it is sufficient for
such tasks remains an important empirical question. This
knowledge, however, collected and recorded by lexicographers for human
readers, is not expressed in MRD's in a form that can be used directly
as a tool for NLP tasks. What the NLP research community needs is
machine tractable dictionaries (MTD's); that is, MRD's transformed
into a format appropriate for NLP tasks.

At CRL we have explored several large-scale computational methods for
the transformation of MRD's into MTD's, and have also developed a
range of tools for extracting information from MRD's for specific NLP
applications. We describe here a combination of these methods, based
on their respective strengths: a hybrid SPIRAL methodology that
combines elements from each of the methods (numerical and
non-numerical) into a single coherent procedure to produce an MTD or
lexical-knowledge base. Our hope is that, although each of the
methods is incomplete in certain respects, and so a weak method in
Newell's sense, the combination of them will yield better results than
any individual method could. The chief difficulty with the MRD itself
is that the defining items in the dictionary are themselves ambiguous,
and it is this that the SPIRAL methodology seeks to overcome.

The result is an MTD that is a knowledge-base of unambiguous lexical
facts, linked by a network of semantically-related word senses. It is
currently derived from Longman's Dictionary of Contemporary English
(LDOCE), though we are augmenting that from other MRD's such as
COBUILD. We also describe briefly one application of the MTD to
machine translation, and the extension of the disambiguation
techniques used in construction of the MTD to large-scale
sense-tagging of general text. The end goal of the work described
here is to develop methods for the production of larger systems faster
than can be achieved with custom-made lexicons.


Dunning, Ted and Davis, Mark, (1993), Multi-lingual Information
Retrieval, CRL, MCCS-93-252. ($5.00)

We have designed a fully multi-lingual information retrieval system
and tested its crucial parts. This system can accept a query in one
language and find documents in others. Furthermore, relevance
feedback can be used in a fully multi-lingual fashion.

Our system is based on the availability of parallel and aligned texts.
We use these texts to derive a linear approximation of the translation
process, and then use this linear transformation to implement a
conventional vector based information retrieval system. We describe
three possible techniques for deriving this translation matrix, one of
which we have implemented and tested on a moderately sized
training corpus. Our method appears to be very efficient in terms of
the size of the necessary training corpus.

Since our solution for the translation matrix is incremental in
nature, additional parallel texts can be used to augment the system at
any time.
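The abstract does not say which of the three techniques was implemented, so this Python sketch simply assumes ridge-regularized least squares over aligned document pairs: it finds W minimizing the squared error of XW against Y, where the rows of X and Y are source- and target-language term vectors. Because only the accumulated products X^T X and X^T Y are stored, new parallel texts can be added at any time and W re-solved, in line with the incremental property described above. The toy two-word vocabularies are hypothetical, and a realistic system would need sparse matrices.

```python
import math


class TranslationMatrix:
    """Ridge least-squares map W from source to target term vectors, fitted from
    aligned document pairs. Only X^T X and X^T Y are stored, so pairs can be
    added incrementally and W re-solved."""

    def __init__(self, n_src, n_tgt, ridge=1e-3):
        self.xtx = [[ridge if i == j else 0.0 for j in range(n_src)] for i in range(n_src)]
        self.xty = [[0.0] * n_tgt for _ in range(n_src)]

    def add_pair(self, x, y):
        """Accumulate one aligned pair (source vector x, target vector y)."""
        for i, xi in enumerate(x):
            if xi:
                for j, xj in enumerate(x):
                    self.xtx[i][j] += xi * xj
                for k, yk in enumerate(y):
                    self.xty[i][k] += xi * yk

    def solve(self):
        """Solve (X^T X + ridge*I) W = X^T Y by Gauss-Jordan elimination."""
        n = len(self.xtx)
        aug = [self.xtx[i][:] + self.xty[i][:] for i in range(n)]
        for c in range(n):
            p = max(range(c, n), key=lambda r: abs(aug[r][c]))  # partial pivoting
            aug[c], aug[p] = aug[p], aug[c]
            pivot = aug[c][c]
            aug[c] = [v / pivot for v in aug[c]]
            for r in range(n):
                if r != c and aug[r][c]:
                    f = aug[r][c]
                    aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
        return [row[n:] for row in aug]


def translate(q, W):
    """Map a source-language query vector into the target term space (q times W)."""
    return [sum(qi * W[i][k] for i, qi in enumerate(q)) for k in range(len(W[0]))]


def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0


def rank(query, W, docs):
    """Indices of target-language documents, best match first."""
    q = translate(query, W)
    return sorted(range(len(docs)), key=lambda d: cosine(q, docs[d]), reverse=True)


tm = TranslationMatrix(n_src=2, n_tgt=2)  # toy vocabularies: [cat, dog] -> [neko, inu]
for x, y in [([1, 0], [1, 0]), ([0, 1], [0, 1]), ([1, 1], [1, 1])]:
    tm.add_pair(x, y)
W = tm.solve()
print(rank([1, 0], W, [[0, 1], [1, 0]]))  # query "cat" -> [1, 0]
```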


Wilks, Yorick, (1993), Second Workshop of the Consortium for Lexical Research:
US/European Lexical Cooperation, CRL, MCCS-93-254. ($5.00)

In January 1993 the Computing Research Laboratory hosted a workshop on
international cooperation in lexical computation under the auspices of
the Consortium for Lexical Research and supported by the National
Science Foundation and the European Commission. This document is a
report of the discussion and presentations.


Wilks, Yorick, (1993), Stone Soup and the French Room: the
empiricist-rationalist debate about machine translation, CRL,
MCCS-93-255. ($5.00)

The paper argues that the IBM statistical approach to machine
translation has done rather better after a few years than many
sceptics believed it could. However, it is neither as novel as its
proponents suggest nor is it making claims as clear and simple as they
would have us believe. The performance of the purely statistical
system (and we discuss what that phrase could mean) has not equalled
the performance of SYSTRAN. More importantly, the system is now being
shifted to a hybrid that incorporates much of the linguistic
information that it was initially claimed by IBM would not be needed
for MT. Hence, one might infer that its own proponents do not believe
"pure" statistics sufficient for MT of a usable quality. In addition
to real limits on the statistical method, there are also strong
economic limits imposed by their methodology of data gathering.
However, the paper concludes that the IBM group have done the field a
great service in pushing these methods far further than before, and by
reminding everyone of the virtues of empiricism in the field and the
need for large scale gathering of data.


Cowie, Jim, Smith, Lisa, and Wilks, Yorick, (1993), Projects at CRL in
Natural Language Processing, CRL, MCCS-93-256. ($5.00)

NO ABSTRACT


Wilks, Yorick, (1993), Language, vision and metaphor, CRL,
MCCS-93-257. ($3.00)

The integration of language and vision capabilities in computers can
be seen purely as a multi-media task without any theoretical
assumptions being required. However, it is worth exploring whether
the modalities have anything serious in common, in particular in the
light of the claim that most non-technical language use is
metaphorical. What consequences would that have for the underlying
relationship of language and vision: is it possible that vision is
largely metaphorical?

The conclusion is that visual processing can embody structural
ambiguity (whether compositional or not), but not anything analogous
to metaphor. Metaphor is essentially connected with the extension of
sense and only symbols can have senses. But if it makes no sense to
say a figure can be metaphorical (unless it embodies symbolic
elements) that must also mean, alas, that it makes no sense to say it
is literally anything either. Only a symbol can be literally
something. A hat is a hat is a hat, but never, ever, literally so.


Wilks, Yorick, (1993), Penrose on Artificial Intelligence, CRL,
MCCS-93-258. ($3.00)

Penrose's attack on artificial intelligence (AI) will give its
practitioners less worry and discomfort than did its earlier
philosophical critics Dreyfus and Searle, answering whom busied AI-ers
for years. Penrose does not really understand his target, in the sense
of knowing first-hand the detail and variety of AI work. He seems to
have got much of his understanding of it second-hand from Searle whose
critical terminology he uses, unexamined. Moreover, some of his
arguments have already been well-rehearsed within philosophy, such as
the one about the possible relevance of Goedel's theorem to machine
intelligence.

None of this would matter if his arguments were good, but they are not;
in the case of the Goedel argument he has added nothing not present in
the older version. Penrose's only claim on our attention, apart
from his book sales, is that he is an established physicist and
mathematician, one with striking achievements to his name in topology,
cosmology and quantum theory as well as the discovery of a new
impossible object.



End of NL-KR Digest
*******************
