NL-KR Digest (3/30/87 14:25:16) Volume 2 Number 19
Today's Topics:
The data of Linguistics ( 1st & 2nd try wrong )
AI Project Information Request
from CFGs to TAGs and beyond - Weir
Seminar - Logic Programming and Expert Systems
Seminar - A Computational Model of Referring - Kronfeld
Seminar - Left-associative Grammar - Hausser
Colloquium - Lazy Chart-parsing w/extended grammar - Pareschi
----------------------------------------------------------------------
Date: Sun, 22 Mar 87 20:32:55 CST
From: mark edwards <edwards@unix.macc.wisc.edu>
Subject: The data of Linguistics ( 1st & 2nd try wrong )
There has been some controversy on my original question. I have
started the paper and decided this was a way of elucidating my
thoughts. As it turns out, those who questioned what I was asking
were right. Namely, it now appears that what I was asking for was
only a part of the intended goal. The following included text is
an initial draft that I am using to collect and consolidate my
thoughts. I hope this makes my question clearer. I welcome any
discussion or criticism.
mark
The Data of Linguists
1. Introduction
I started out this paper to look at the validity of linguists using
the single isolated sentence (SIS) as the object of study. Something about the
way sentences were judged to be "ungrammatical" really bothered me. In many
of the cases where a sentence was "said" to be ungrammatical, it turned out
to be grammatical in a context. I began to wonder if sentences should be
judged in a context with a possible preceding sentence. But I could find
no realistic evidence, syntax-wise, that would support my theory. I was, of course,
looking at the wrong thing. I was looking at the sentence, when the context
was what I was after. This paper examines the grammaticality of a sentence
with respect to context.
2. Syntax
Sometimes I have a hard time determining what exactly the study of syntax is.
Much of the literature seems, at least to me, to blur syntax with a sense
of grammaticalness. The symbol for ungrammaticality is "*", and it is applied
equally to sentences that are unacceptable because of their structure,
because of confused meaning, or both. It could be argued that the sense of being
grammatical is the sense of clear meaning. So, if we jumble a sentence, then the
meaning becomes confused and the structure is bad. Thus a bad structure would
imply being unsemantical and thus ungrammatical.
(1) The man read today's paper.
(2) Man the read paper today's.
Sentence (2) is a jumbled version of sentence (1). The meaning of (1) is readily
clear, while that of (2) is not. Interestingly enough, (2) is still understandable,
probably because the structure is jumbled in a consistent way.
Some linguists would argue that syntax plays only a minor role and can be
ignored (Schank). However, Miller and Isard's (1963) experiments suggest
that subjects can identify syntactically correct strings more easily
than random strings, and strings that were also semantically correct even better.
Another experiment that we can do here will show that syntax does play a role
in natural language. The following sentence consists mainly of nonsense syllables:
(3) The ifrothy wizzle greped the milther.
Even though (3) has no meaning, my intuitions say that it is grammatical, or
at least syntactically correct.
Syntax is structure-oriented. Syntax tells us about the structured
distribution of words. When we see the word "the", we know that soon after
there must be some kind of noun phrase. When we see a word that ends in
"ed", "ly", or "y", we usually know something about the distribution of that
word. Therefore it should come as no surprise that (3) seems grammatical.
Whatever (3) tells us, analyzing real natural language is not so clear-cut.
Many of the methods we use to get at the structure of sentences and their
integral parts must rely on meaning. Many English words have several meanings,
and to judge a sentence we ultimately have to appeal to a word's meaning in
a particular sense. Subcategorization, a valuable tool in analyzing sentences,
relies on the sense of a word that we choose. Thus a verb may have several
subcategorizations. We could also say that to subcategorize a verb, we must
first appeal to a context where the verb appears in a grammatical sentence.
So we must put the verb in context before useful work can be done.
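For concreteness, subcategorization frames could be written down as a small
lexicon keyed by word sense. The following is only an invented illustration
(the frames list a verb's complements, with the subject excluded):

    # Invented illustration only: subcategorization frames keyed by (verb, sense).
    # Each frame lists the complements the verb requires (subject excluded).
    SUBCAT = {
        ("run", "move quickly"): [],            # "The dog runs."
        ("run", "manage"):       ["NP"],        # "She runs a company."
        ("put", "place"):        ["NP", "PP"],  # "John put the car in the garage."
    }

    # Judging a sentence means first picking a sense, and hence a frame.
    print(SUBCAT[("put", "place")])             # ['NP', 'PP']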
3. The Data
The study of language undoubtedly must use some data, and we can take as
plausible something Gazdar (1979) said:
I shall assume ... that invented strings and certain intuitive judgements
about them constitute legitimate data for linguistics research.
Though I doubt that it is wise to use just these invented strings. Natural
data must eventually be consulted to prove or disprove any theory that
is posited. We might also take Lyons's (1970) statement about the proper
medium from which linguistic study should draw:
Linguistics takes it as axiomatic that speech is primary and that written
language is secondary and derived from it.
If Lyons is right, then why do linguists so commonly use only the SIS? We seldom
see sentences in the isolation that linguists create for them. Yet we are
asked to judge these sentences in this isolation. Brown and Yule (1983) rightly
question:
After all what do we do when we are asked whether a particular string is
'acceptable'? Do we not immediately, and quite naturally, set about
constructing some circumstances (i.e. a 'context') in which the sentences
could be acceptably used ?
The following sentences seem to be grammatical.
(4) John home?
(5) John?
(6) Yes. Your brother.
(7) Probably.
(8) John, car in the garage?
(9) John, the potatoes.
The comma in (8) and (9) is not really needed. It is included to facilitate the
intended reading. Sentences (4) through (7) seem to form a discourse or a text.
They are all missing verbs; I find it difficult to believe that sentences like
these cannot be accounted for in the usual literature on GB and GPSG.
The following sentences are also grammatical.
(10) John put the car in the garage.
(11) Who put it?
(12) John put it.
Maybe my imagination is just running wild. In this discourse the
subcategorization of "put" is satisfied only in (10). If you can't get (11),
try making the "o" in "Who" long and rising. We can now really understand
Bloomfield's (1933) definition of the sentence:
When a linguistic form occurs as part of a larger form, it is said to be in
included position; otherwise it is said to be in absolute position and to
constitute a sentence.
An equally correct definition is how many linguists now define it:
The place where we would put the period.
Is it true that sentences that seem to be ungrammatical when in isolation
magically become grammatical in context? I think that these sentences should
not be labeled "*" at all. There is nothing magical in linguistics. There are
only cold, hard facts, namely real data to justify these sentences. A better
analysis of these sentences might be that they are ambiguously ungrammatical,
meaning that it is very hard for the normal hearer to judge these "*"
sentences grammatical when they are not given some kind of primer.
Some sentences need no primer to be judged. Perhaps this is because their
structure has a higher frequency in normal conversation. Words like
"moreover" and "however" imply that they are part of a context. Take
a look at (13).
(13) However, John went to school.
(13) begs to have a previous sentence or context. The semantic analysis
of (13) implies a context, just as (11) does. On the other hand, (10)
could stand in isolation, insofar as no context necessarily comes
to mind, possibly because the sentence is not remarkable in any other way.
For my last piece of evidence on the notion of grammatical vs. ungrammatical
I would like to appeal to the reader's past experience. Have you ever read a
sentence in a passage in an unfamiliar advanced science textbook and found
that you had to reread it to get the meaning? I think it is arguable that
the first time through you found the sentence to be ungrammatical.
Sometimes it takes many readings to make the sentence grammatical. It seems that
familiarity with structure and words helps a reader more easily determine the
grammaticalness of a sentence.
Conclusion
A sentence is not just a piece of data that can be analyzed in total isolation.
The words that make up any particular sentence have meaning and meaning does
affect the structure of a sentence. Pronouns and other words or phrases
with variable-like qualities refer to things outside the sentence. Some
words imply a previous context; otherwise the sentence would seem awkward to the
hearer. Sentences are judged to be grammatical on both the acceptability of the
syntax and the meaning of the words in the sentence. A sentence should not be
judged ungrammatical just because the person judging the sentence cannot
find a context where the sentence would be grammatical.
------------------------------
Date: 21 Mar 87 02:26:45 GMT
From: tektronix!reed!psu-cs!qiclab!neighorn@ucbvax.Berkeley.EDU (Steven C. Neighorn)
Organization: Qic Laboratories, Portland Oregon
Subject: AI Project Information Request
At Portland Public Schools we are using a Writing Assessment guide to
examine certain writing assignments. Normally, writing experts are used
to evaluate the text. This is a slow and laborious process. The idea
of computerizing some or all of the assessment was brought up at a recent
meeting. A visiting Artificial Intelligence expert thought the assessment
presented many interesting problems, and suggested presenting it to a
wider audience.
Writer's Work Bench and similar programs are useful for checking sentence
structure, but what we are interested in is something that can examine a
paper for organization, presentation, word usage, and content.
The assessment is divided up into five areas. Each area has a possible
score of 1, 3, or 5. A perfect paper would receive a score of 25.
The five scored areas for Writing Assessment are: Ideas and Content,
Organization, Voice, Effective Word Choice, and Sentence Structure.
An example of one of the areas is as follows:
Analytical Rating Guide
IDEAS AND CONTENT
5. This paper is clear in purpose and conveys ideas in an interesting,
original manner that holds the reader's attention. Clear, relevant examples,
anecdotes or details develop and enrich the central idea or ideas.
o The writer seems to be writing what he or she knows, often from
experience.
o The writer shows insight--a good sense of the world, people,
situations.
o The writer selects supportive, relevant details that keep the main
idea(s) in focus.
o Primary and secondary ideas are developed in proportion to their
significance; the writing has a sense of balance.
o The writer seems in control of the topic and its development
throughout.
3. The writer's purpose is reasonably clear; however, the overall result
may not be especially captivating. Support is less than adequate to fully
develop the main idea(s).
o The reader may not be convinced of the writer's knowledge of the
topic.
o The writer seems to have considered ideas, but not thought things
through all the way.
o Ideas, though reasonably clear and comprehensible, may tend toward the
mundane; the reader is not sorry to see the paper end.
o Supporting details tend to be skimpy, general, predictable, or
repetitive. Some details seem included by chance, not selected
through careful discrimination.
o Writing sometimes lacks balance: e.g., too much attention to minor
details, insufficient development of main ideas, information gaps.
o The writer's control of the topic seems inconsistent or uncertain.
1. This paper lacks a central idea or purpose--or the central idea can be
inferred by the reader only because he or she knows the topic (question asked).
o Information is very limited (e.g., restatement of the prompt, heavy
reliance on repetition) or simply unclear altogether.
o Insight is limited or lacking (e.g., details that do not ring true;
dependence on platitudes or stereotypes).
o Paper lacks balance; development of ideas is minimal, or there may be
a list of random thoughts from which no central theme emerges.
o Writing tends to read like a rote response--merely an effort to get
something down on paper.
o The writer does not seem in control of the topic; shorter papers tend
to go nowhere, longer papers to wander aimlessly.
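To make the scoring arithmetic concrete, here is a rough sketch of the rubric
as data (the five area names are taken from the guide above; the code itself,
including the function name, is hypothetical and not part of any existing
assessment software):

    # Hypothetical sketch: the rubric as data, plus a total-score check.
    AREAS = ["Ideas and Content", "Organization", "Voice",
             "Effective Word Choice", "Sentence Structure"]
    VALID_SCORES = {1, 3, 5}

    def total_score(ratings):
        """Sum the per-area ratings after checking that every area is
        present and that each rating is one of the allowed values 1, 3, 5."""
        if set(ratings) != set(AREAS):
            raise ValueError("ratings must cover exactly the five areas")
        for area, score in ratings.items():
            if score not in VALID_SCORES:
                raise ValueError(f"{area}: score must be 1, 3, or 5, got {score}")
        return sum(ratings.values())

    # A perfect paper scores 5 in every area, for a total of 25.
    example = {area: 5 for area in AREAS}
    assert total_score(example) == 25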
I would be very interested in hearing from anyone in netlandia who is working/
has worked/will be working on similar projects. Please follow-up, send email,
or call via landline. Comments are more than welcome. Thank you for your
consideration.
--
Steven C. Neighorn tektronix!{psu-cs,reed}!qiclab!neighorn
Portland Public Schools "Where we train young Star Fighters to defend the
(503) 249-2000 ext 337 frontier against Xur and the Ko-dan Armada"
QUOTE OF THE DAY -> 'Dr. Ruth is no stranger to friction.'
------------------------------
Date: Mon, 23 Mar 87 23:37:04 EST
From: tim@linc.cis.upenn.edu (Tim Finin)
Subject: from CFGs to TAGs and beyond - Weir
Dissertation proposal
Computer and Information Science
University of Pennsylvania
FROM CONTEXT-FREE GRAMMARS TO TREE ADJOINING
GRAMMARS AND BEYOND
David Weir
We describe recent results concerning Tree Adjoining Grammars.
In light of these results we compare this formalism with Context-
Free Grammars and establish a progression from Context-Free
Languages to Tree Adjoining Languages. In the hope of finding
linguistically interesting and mathematically elegant systems,
we generalize this progression defining an infinite hierarchy of
full principal AFL's that exhibit increasingly more complex
dependencies. This hierarchy is defined in terms of generators,
automata, and two grammatical systems. We argue that each family
in the hierarchy possesses a minimal set of criteria necessary for
linguistic adequacy. The relationship between each successive family
in the language hierarchy is most clearly understood in terms of
the nature of the dependencies that can be exhibited by each class.
We formalize the concept of dependencies in Context-Free Grammars
and Tree Adjoining Grammars. This definition is somewhat general
since it is derived in a systematic way from the pumping lemma for
the formalism under consideration.
Another major component of our work is the investigation of a
number of mechanisms that add to the expressive power of Tree
Adjoining Grammars: multicomponent TAG's; and the use of schematic
trees to reduce the size of the grammar and handle conjunction. We
investigate the extent to which these mechanisms add to the weak and
strong generative capacity of TAG's, and consider how the resulting
systems relate in power to members of the language hierarchy.
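As a standard illustration of the kind of progression involved (this example is
not taken from the proposal itself): context-free grammars can enforce the nested
dependency of the first language below, Tree Adjoining Grammars additionally
enforce the cross-serial pattern of the second, while the third already lies
beyond the weak generative capacity of TAG's.

    \{ a^n b^n \mid n \ge 0 \} \in \mathrm{CFL}, \qquad
    \{ a^n b^n c^n d^n \mid n \ge 0 \} \in \mathrm{TAL} \setminus \mathrm{CFL}, \qquad
    \{ a^n b^n c^n d^n e^n \mid n \ge 0 \} \notin \mathrm{TAL}

Families further up such a hierarchy are intended to capture strictly more
complex dependencies of this kind.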
10am Thursday 26th March
ADVISOR: Dr. A. Joshi
COMMITTEE: Dr. J. Gallier (Chairperson)
Dr. A. Kroch
Dr. M. Palis
Dr. M. Steedman
------------------------------
From: chomicki@topaz.RUTGERS.EDU (Jan Chomicki)
Subject: Logic Programming and Expert Systems: a talk
Date: 19 Mar 87 17:28:37 GMT
Subject: AN EXPERT SYSTEM SHELL FOR
NEGOTIATION SUPPORT
Speaker: Prof. Stan Szpakowicz,
Dept.of Computer Science, University of Ottawa
(one of the authors of "Prolog for Programmers").
Time: Wednesday, March 25, 2:50 pm.
ABSTRACT
We present NEGOPLAN, a rule-based system for the modelling and support
of negotiation. The system uses rules and metarules to represent
knowledge essential to any negotiation process. Rules (Horn clauses),
describing the main goal in terms of lower-level goals and domain
facts, are backward-chained to check the consistency of the goal
representation. Metarules, which describe the anticipated behaviour
of both parties during negotiation, are forward-chained. They may, and
usually will, cause modifications in the goal description, and this in
turn requires re-evaluating this goal's consistency: hence the non-
monotonic character of the two-level deduction.
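As a minimal sketch of this two-level scheme (written here in Python purely for
illustration; NEGOPLAN itself is prototyped in Prolog, and all rule and fact
names below are invented):

    # Rules: head -> list of body atoms (Horn clauses). Facts: a set of atoms.
    RULES = {
        "agreement":   ["security_ok", "territory_ok"],
        "security_ok": ["demilitarized_zone"],
        "territory_ok": ["full_withdrawal"],
    }

    # Metarules: if all condition atoms hold, add/remove facts
    # (the anticipated behaviour of the negotiating parties).
    METARULES = [
        ({"pressure_from_ally"}, {"add": {"full_withdrawal"}, "remove": set()}),
    ]

    def holds(goal, facts, rules):
        """Backward-chain: a goal holds if it is a fact or some rule for it
        has a body whose atoms all hold."""
        if goal in facts:
            return True
        body = rules.get(goal)
        return body is not None and all(holds(b, facts, rules) for b in body)

    def apply_metarules(facts, metarules):
        """Forward-chain the metarules once, modifying the fact base."""
        facts = set(facts)
        for conditions, effect in metarules:
            if conditions <= facts:
                facts |= effect["add"]
                facts -= effect["remove"]
        return facts

    facts = {"demilitarized_zone", "pressure_from_ally"}
    print(holds("agreement", facts, RULES))    # False: territory_ok unsupported
    facts = apply_metarules(facts, METARULES)  # metarule adds full_withdrawal
    print(holds("agreement", facts, RULES))    # True: goal consistent again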
The negotiating position is expressed in terms of the degree of
flexibility as to domain facts. The relationship between the goal
representation and the position is established by means of a series of
choices from the various negotiating options.
This framework presents some interesting questions, both in its AI
aspects and in those related to negotiation. We present some of the
AI-related problems, such as the need to go outside the closed-world
assumption for goal consistency, or the applicability of dependency-
directed backtracking when an inconsistency has been detected.
The system is being prototyped in Prolog on a Sun-3 workstation. The
test case is a simplified representation of the Israeli-Egyptian
negotiations in Camp David in 1978.
--
Jan Chomicki (chomicki@topaz.rutgers.edu) Phone: (201) 932-3999
Dept.of Computer Science, Rutgers University, New Brunswick, NJ 08903
Usenet: {...harvard,pyramid,seismo,nike}!rutgers!topaz!chomicki
Arpanet: chomicki@topaz.rutgers.edu
------------------------------
Date: Fri, 27 Mar 87 09:14:36 EST
From: patricia
Subject: Seminar - A Computational Model of Referring - Kronfeld
SEMINAR -- University of Rochester Computer Science Department
Wednesday April 1, 1987
11th Floor Lounge, Hylan Bldg.
11:00 am
Speaker: Ami Kronfeld, SRI
Topic: A Computational Model of Referring:
(joint research with Doug Appelt)
Speech acts have LITERAL GOALS and CONDITIONS OF SATISFACTION. For example,
the literal goal of a command is to let the hearer know that the speaker wants
him to do something. The command is satisfied when the hearer does what he is
told. We characterize the literal goal and conditions of satisfaction of
referring in terms of what the hearer is supposed to believe and then do. We
then develop a formal model where the effects of successful referring on both
the speaker's and hearer's mental states can be specified in a precise way.
The model enables us to show:
1. How the effects of referring follow from an appropriate use of a noun
phrase.
2. How a referring act can be used to inform.
3. How referring can be interpreted as a request.
4. How reference can succeed even though the hearer knows that the
description used is wrong.
5. How to eliminate the implausible assumption that all objects in the
domain of discourse have known standard names.
Refreshments will be served in the 11th Floor Lounge at 10:45 am
------------------------------
Date: 27 Mar 87 10:27:09 EST
From: Patricia.Mackiewicz@isl1.ri.cmu.edu
Subject: Seminar - Left-associative Grammar - Hausser
TOPIC: Left-associative Grammar - a New Approach
to Efficient Parsing and Generation
SPEAKER: Roland Hausser, CMU
WHEN: Tuesday, March 31, 1987, 3:30 p.m.
WHERE: Wean Hall 5409
ABSTRACT
The surface structures of natural language are linear. When we utter
or understand a sentence, we process it word by word, starting at the
beginning. But the associated semantic structures are of a
hierarchical nature, expressed in terms of tree structures or
set-theoretic relations. This interesting structural relation is
investigated in Left-associative Grammar. Left-associative Grammar is
implemented as a new parsing algorithm, called Newcat (for 'new
categorial grammar').
Conceptually, left-associative grammar is based on the notion of
"possible continuations": after word n has been added, the system
specifies precisely what the categories of word n+1 may be for
grammatical continuations of the current sentence start. The linear
processing of syntax is captured by a "left-associative" algorithm,
which analyzes surface structure from left to right, first combining
word 1 and word 2, then combining the result with word 3, then adding
word 4, etc., until all the input is consumed.
The relation between linear (left-associative) trees and hierarchical
semantic structures is characterized explicitly in their simultaneous
generation. Each time a next word is combined with the analysis of the
current sentence start, the parser combines semantic sub-trees which
are homomorphic to a frame-theoretic data structure. The building of
semantic tree structures in the course of a left-associative parse
provides a theoretical link to systems like LFG and GPSG.
The conceptual derivation order of left-associative grammar is strictly
linear, left to right. This constitutes the main difference from other
generative grammars. The conceptual derivation order of
Phrase-structure Grammars, for example, is a top-down expansion of
nodes, while Categorial Grammars are based on bottom-up amalgamation.
Such conceptual orders of derivation cannot double as the procedural
order of a parse. For example, a Phrase-structure Grammar starts by
expanding the S-node, but a parser must reconstruct the constituent
structure postulated by the grammar from a string of input words.
Left-associative grammar is based on a conceptual derivation order
which serves also as the procedural order. The linguistic rules of
Left-associative Grammar are presented in terms of an abstract,
implementation-independent formalism which is declarative rather than
procedural in nature. The fact that the grammatical algorithm and the
parsing algorithm are the same is to be regarded as an advantage.
Improvements of the linguistic analysis translate directly into a
faster implementation, and analysis of the computation provides
excellent heuristics for improving the grammar.
The control structure of a left-associative parser is a
non-deterministic finite state transducer which operates over sequences
of categories, defined as lists of atoms. The transduction rules all
have the form
c1 c2 w ==> f(c1 c2) w
where f can be any computable function, c1 is the current sentence
start, c2 is the current new word, and w represents the remainder of
the input. The generative power of NEWCAT is at least context
sensitive and at most recursive.
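A toy rendering of this control structure (with an invented lexicon and an
invented instance of the combination function f; this is not the NEWCAT code,
which is written in Common Lisp):

    # Toy sketch only: categories are lists of atoms; the sentence start is
    # combined with one new word at a time, in the order of the input.
    LEXICON = {
        "the":   ["N", "NP"],    # needs a noun next, then yields an NP
        "man":   ["N"],          # a complete noun
        "slept": ["NP", "S"],    # needs an NP (the sentence start), then yields S
    }

    def combine(c1, c2):
        """One invented instance of f(c1 c2): cancel a matching expectation."""
        if len(c1) > 1 and c2 == [c1[0]]:   # the start expects what the word provides
            return c1[1:]
        if len(c2) > 1 and c1 == [c2[0]]:   # the word expects what the start provides
            return c2[1:]
        return None                          # no grammatical continuation

    def parse(words):
        """Left-associative parse: ((w1 + w2) + w3) + ... until input is consumed."""
        state = LEXICON[words[0]]
        for w in words[1:]:
            state = combine(state, LEXICON[w])
            if state is None:
                return None
        return state

    print(parse(["the", "man", "slept"]))    # ['S'] -> a complete sentence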
Left-associative grammar has been implemented as the parsers ECAT and
DCAT, which handle sizeable fragments of English and German,
respectively, including constructions such as relative clauses,
passives, raising, unbounded dependencies, and conjunction. The parser
accomplishes the analysis of input sentences in real time. The current
version is implemented in Common Lisp on an HP-Bobcat. For sentences
of an average length of 10 words, parses are produced in an average
time of 0.3 seconds. In addition, Newcat is highly space efficient.
The source code is presently about 40 kilobytes, while the compiled
code is less than that (i.e., there is no trade-off of space to
increase run-time efficiency as in LR-style parser tables).
The talk will discuss the formal nature of the left-associative
algorithm, the format of the linguistic rules, the structure of the
lexicon, the linguistic analysis of specific constructions of English,
the relation of parsing and generation, and practical applications.
------------------------------
Date: Wed, 25 Mar 87 18:04:44 EST
From: tim@linc.cis.upenn.edu (Tim Finin)
Subject: Colloquium - Lazy Chart-parsing w/extended grammar - Pareschi
ABSTRACT FOR CIS COLLOQUIUM, U.PENN
3.00 p.m. Thursday 26th March
A LAZY WAY TO CHART-PARSE WITH EXTENDED CATEGORIAL GRAMMARS
Remo Pareschi
Department of Artificial Intelligence and
Centre for Cognitive Science, University of Edinburgh
Several recent linguistic proposals to analyse natural
language syntax by extending Categorial Grammars give rise
to proliferating semantically equivalent surface syntactic
analyses, posing potentially grave problems for parsing
efficiency. The paper offers as a solution a novel
unification-based extension of chart parsing which is
claimed to be generally applicable to parsing extended
Categorial Grammars. The solution in question exploits certain
properties inherent in the grammar formalism itself, namely the
associativity and invertibility of the combination operations
for syntactic categories.
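A standard illustration of the problem (not drawn from the paper itself): with
functional application alone, a transitive sentence has a single derivation,
but once type raising and composition are added, a second, semantically
equivalent derivation of the same string becomes available.

    \mathrm{John} := NP \qquad \mathrm{loves} := (S\backslash NP)/NP \qquad \mathrm{Mary} := NP

    \text{Derivation 1 (application only):}\quad
    (S\backslash NP)/NP,\; NP \;\Rightarrow\; S\backslash NP; \qquad
    NP,\; S\backslash NP \;\Rightarrow\; S

    \text{Derivation 2 (type raising and composition):}\quad
    NP \;\Rightarrow\; S/(S\backslash NP); \qquad
    S/(S\backslash NP),\; (S\backslash NP)/NP \;\Rightarrow\; S/NP; \qquad
    S/NP,\; NP \;\Rightarrow\; S

Both derivations assign the string the same meaning, which is exactly the
proliferation that a chart parser for such grammars has to keep under control.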
------------------------------
End of NL-KR Digest
*******************