AIList Digest Volume 2 Issue 131
AIList Digest Friday, 5 Oct 1984 Volume 2 : Issue 131
Today's Topics:
Linguistics - Sastric Sanskrit & LOGLAN & Interlinguas
----------------------------------------------------------------------
Date: Wed, 3 Oct 1984 23:55 PDT
From: KIPARSKY@SU-CSLI.ARPA
Subject: Sanskrit has ambiguity and syntax
Contrary to what Briggs claims, Shastric Sanskrit has the same kinds of
ambiguities as other natural languages. In particular, the language
allows, and the texts abundantly exemplify: (1) anaphoric pronouns
with more than one possible antecedent, (2) ambiguous scope of
quantifiers and negation, (3) ellipses, (4) lexical homonymy, (5)
morphological syncretism. Even the special regimented language in
which Panini's grammar of Sanskrit is formalized (not a natural
language though based on Sanskrit) falls short of complete unambiguity
(see Kiparsky, Panini as a Variationist, MIT Press 1979). The claim
that Sanskrit has no syntax is also untrue, even if syntax is
understood to mean just word order: rajna bhikshuna bhavitavyam would
normally mean "the beggar will have to become king", bhikshuna rajna
bhavitavyam "the king will have to become a beggar" --- but in any
case, there is a lot more to syntax than word order.
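
A minimal sketch of the word-order point, assuming nothing beyond the two
sentences and glosses quoted above. The helper name bag() is hypothetical.
It only shows that an order-free "bag of words" view cannot tell the two
sentences apart even though their glosses differ.

```python
from collections import Counter

# The two sentences and their glosses, as quoted in the message above.
s1 = "rajna bhikshuna bhavitavyam"
s2 = "bhikshuna rajna bhavitavyam"
gloss = {
    s1: "the beggar will have to become king",
    s2: "the king will have to become a beggar",
}

def bag(sentence):
    """Order-free view of a sentence: just the words and their counts."""
    return Counter(sentence.split())

# Same words, different order, different normal reading.
same_bag = bag(s1) == bag(s2)
different_meaning = gloss[s1] != gloss[s2]
print(same_bag, different_meaning)
```

If word order carried no information, the two glosses would have to
coincide; here they do not.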
------------------------------
Date: Wed, 3 Oct 84 01:23:07 PDT
From: "Dr. Michael G. Dyer" <dyer@UCLA-LOCUS.ARPA>
Subject: Sastric Sanskrit
Re: Rick Briggs' comments on a version of Sastric Sanskrit.
Well, I AM incredulous! Imagine. The entire natural language
processing problem in AI has already been solved! and a millennium ago!
All we need to do now is publish a 'manual' of this language and
our representational problem in NLP is over! Since this language
can say anything you want, and "mean exactly what you say" and
"with no effort", and since it is unambiguous, it sounds like
my problems as an NLP researcher are over.
I DO have a few minor concerns (still). The comment that
there are no translations, and that it takes sanskrit scholars
a "very long time" to figure out what it says, makes it sound to
me like maybe there's some complex interpretations going on.
Does this mean that a 'parser' of some sort is still needed?
Also, I'd greatly appreciate a clearer reference to the book (?)
mentioned. Who is the publisher? Is it in English? What year
was it published? How can we get a copy?
Another problem: since this language has an "extensive literature" does
that include poetry? novels? Are the poems unambiguous? are there
plays on words? metaphor? (Can you say the equivalent of "Religion is
the opiate of the masses"? and if not, is that natural? if not, then
how are analogical mappings formed?) satire? humor? puns?
exaggeration? fantasy? does the language look like a bunch of horn
clauses? (most of the phenomena in the list above involve AMBIGUITY of
context, beliefs, word senses, connotations, etc. How does the
literature avoid these features and remain literature?)
Finally, Yale researchers have been arguing that representational
systems for story understanding require explicit conceptual structures
making use of scripts, plans, goals, etc. Do such constructs
(e.g. scripts) exist explicitly in the language?
does its literature make use of idioms?
e.g. "John drove Mary [home]" vs
"John drove Mary [to drink]"
Also, why is English "worse" than other languages? Chinese has
little syntax and it's ambiguous. Latin has very free word order
with prefixes and suffixes and it's ambiguous. Both rely heavily on
context and implicit world knowledge. Early work by Schank
included representing a Mayan dialect (i.e. Quiche') in Conceptual
Dependency. Quiche seems to have features standard to other natural
languages, so how is English worse?
In the book "Reader over Your Shoulder", Graves & Hodge have a humorous
piece about some town councilmen trying to write a leash law.
No matter how they state it, unhappy assumptions pop up.
e.g. "No dogs in the park without a leash" seems to be addressed
to the dogs. "People must take their dogs into the park on a leash"
seems to FORCE people to drag their dogs into the park (and at what hour?)
even if they don't want to do so. etc etc
what about reference? does sastric sanskrit have pronouns?
what about IT? does IT have THEM? etc if so, how does it avoid
ambiguous references? how many different types of pronouns does it
have (if any)?
Let's have some specific examples. E.g. does it have the equivalent of
the word "like"? Before you answer "yes", there's a difference
between "John likes newsweek" and "John likes chocolate"
In one case we want our computer to infer that John likes to "eat"
chocolate (not read it) and in the other case that he likes to
read newsweek (not eat it). Sure, I COULD have said
"John likes to eat chocolate" specifically. but I can abbreviate
that simply to "x likes <object>" and let the intelligent listener
figure out what I mean. When I say "John likes to eat chocolate"
do I mean he enjoys the activity of eating, or that he feels
better after he's eaten? When I say "John likes to eat
chocolate but feels terrible afterwards" I used the word "but"
because I know it violated a standard inference on the part of the
listener. Natural languages are "expectation-based". Does this
ancient language require the speaker to explicitly state all
inferences & expectations?
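
A minimal sketch of the "x likes <object>" abbreviation, assuming a tiny
hand-made table of default activities. The table DEFAULT_ACTIVITY and the
function expand_likes() are hypothetical illustrations, not any existing
system. The point is only that the listener must supply the missing
activity from world knowledge.

```python
# Hypothetical world knowledge: the activity a listener assumes by default.
DEFAULT_ACTIVITY = {
    "chocolate": "eat",
    "newsweek": "read",
}

def expand_likes(person, obj):
    """Expand 'x likes <object>' into the fuller statement a listener infers.

    If the listener has no default activity for the object, the sentence
    is returned unexpanded.
    """
    activity = DEFAULT_ACTIVITY.get(obj.lower())
    if activity is None:
        return f"{person} likes {obj}"
    return f"{person} likes to {activity} {obj}"

print(expand_likes("John", "chocolate"))
print(expand_likes("John", "newsweek"))
```

A language that forbade the abbreviation would in effect require the
speaker to state the table entry aloud every time.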
Like I said already, if this ancient language really does what
is claimed, then we should all dump the puny representational
systems we've been trying to invent and extend over the last
decade and adopt this ancient language as our final say
on semantics.
Recent work by Layman Allen (1st Law & Technology conference)
in normalizing American law shows that the logical connectives
used by lawyers are horribly ambiguous. Lawyers use
content semantics to avoid noticing these logical ambiguities.
Does this brand of sanskrit have a text of ancient law? What
connectives did they use? Maybe the legal normalization problem
has also already been solved.
Did they have a dictionary? If so, can we see some of the entries? How
do the dictionary entries combine? No syntax AT ALL? Loglan adds
suffixes onto everything and it's plenty awkward. It has people who
write poems in it and other "literature" but you can probably pack all
loglanners who "generate" loglanese into a single phone booth.
Just how many ancient scholars spoke this sanskrit?
I look forward to more discussion on this incredible language.
-- A still open-minded but somewhat skeptical inquirer
------------------------------
Date: Thursday, 4-Oct-84 23:59:06-BST
From: O'KEEFE HPS (on ERCC DEC-10) <okeefe.r.a.%edxa@ucl-cs.arpa>
Subject: An Unambiguous Natural Language?
There was a recent claim in this digest that a "branch of Sastric
Sanskrit" was an unambiguous natural language. There are a number of
points I'd like to raise:
(a) If there are no translated texts, and if it takes a very long
time for an expert in "ordinary" Sanskrit to read untranslated
texts, it seems more than likely that the appearance of being
free from ambiguity is an illusion due to our ignorance.
(b) Thanks for the reference. But judging by the title you need to
know a lot more about Indian languages to read it than most of
the readers of this digest, and without knowing the publisher one
would have to be thoroughly at home with the literature to even
find it.
(c) It's news to me that Sanskrit wasn't an Indo-European language.
The Greek-English dictionary I have a copy of keeps pointing to
Sanskrit roots as if the two languages were related, but what do
they know? If Sastric Sanskrit is an Indo-European language, it
is astonishing that it alone is unambiguous. It's especially
astonishing when the one non-Indo-European language of which I
have even the sketchiest acquaintance (Maaori) isn't unambiguous
either and when no-one seems to be claiming that Japanese or
Chinese or any other common living language is unambiguous.
(d) Dead languages are peculiarly subject to claims of perfection.
Without a living informant, we cannot tell whether our failure to
discover another reading means there isn't one or whether it just
means that we're ignorant of a word sense. I suppose this is
point (a) again.
(e) If a language permits metaphor, it is ambiguous. The word for
"see" in ordinary Sanskrit is something like "oide", and I'm told
that it can mean "understand" as well as "perceive with the eye".
Do we KNOW that the Sastric Sanskrit words for "see", "grasp",
and so on were NEVER employed with this meaning?
(f) We're actually dealing with an ambiguous term here: "ambiguous".
The following definition is the only one I can think of which is
not dependent on some "expert's" arbitrary choice:
a sentence S in a text is ambiguous if
taking into account assumed common knowledge and the
context supplied by the rest of the text
there is some natural language L such that
S has at least two incompatible translations in L.
Here's an example: there are four people in a room, A, B, C, D.
This is the beginning of the text, and nothing else in the text
lets us judge these points, and we've never heard of A,B,C,D
before. A says to D: "we came from X."
I assume we know exactly what place X is. Now, does A mean that
A,B,C and D all came from X? (reminding D)
A,B,C came from X?
A and D came from X? (he knows B and C are listening)
A and one of B and C came from X?
We need to distinguish between dual and plural number, and
between inclusive first person and exclusive first person. If
the language L marks the gender of plural subjects, we may need
to know in the case of A and (B or C but not both) which of B
and C was intended. Now consider A mentioning to D "that table",
assuming that there are several tables in the same room, all of
the same sort. We need to know whether the table he is indicating
is near D (it can't be near A or he'd say "this table") or whether
it is distant from both A and D. Does the branch of Sanskrit in
question make all these distinctions? Can every tense in it be
translated to a unique English tense? Does it have no broad
colour terms such as the "grue" present in several languages?
Failing that, by what criterion IS it unambiguous?
{What's a better definition of ambiguity? This one strikes
most people I've offered it to as too strong.}
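
A minimal sketch of the "we came from X" example under the definition in
(f), assuming a hypothetical target language L that marks both dual versus
plural number and inclusive versus exclusive first person. The function
names are illustrative only. It enumerates every group that "we" could
denote and counts how many distinct forms L would need.

```python
from itertools import combinations

def we_readings(speaker, addressee, others):
    """Every group 'we' could denote: the speaker plus at least one other."""
    candidates = [addressee, *others]
    readings = []
    for size in range(1, len(candidates) + 1):
        for extra in combinations(candidates, size):
            readings.append(frozenset((speaker, *extra)))
    return readings

def features(reading, addressee):
    """The distinctions the assumed language L forces a translator to mark."""
    number = "dual" if len(reading) == 2 else "plural"
    clusivity = "inclusive" if addressee in reading else "exclusive"
    return (number, clusivity)

def is_ambiguous(readings, addressee):
    """Ambiguous in the sense of (f): at least two incompatible translations."""
    return len({features(r, addressee) for r in readings}) >= 2

readings = we_readings("A", "D", ["B", "C"])
forms = {features(r, "D") for r in readings}
print(len(readings), sorted(forms), is_ambiguous(readings, "D"))
```

With four people in the room there are seven candidate groups, and they
fall into four distinct number and person markings in L, so the sentence
counts as ambiguous under this definition.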
(g) Absence of syntax is no guarantee of unambiguity. Consider the
phrase "blackbird". It doesn't matter how we indicate that
black modifies bird, the source of ambiguity is that we don't
know whether the referent is some generic bird that happens to
be black (a crow, say), or whether this phrase is used as the
name of a species. In English you can tell the difference by
prosody, but that doesn't work too well with long-dead languages,
and if you thought it always meant turdus merula you might never
find anything in the fixed stock of surviving texts to reveal
the mistake.
(h) What evidence is there that this language was spoken? Note that
if a text in this language quotes someone as speaking in it,
that still isn't evidence that the language was spoken. I've
just been reading a book set in Greece, with Greek characters,
but the whole thing was in English... Are there historians
writing in other languages who say that the language was spoken?
(i) There is another ambiguous term: "natural" language. Is Esperanto
a natural language? Is Shelta? The pandits were nobody's fools,
after all, Panini invented Backus-Naur form for the express
purpose of describing Sanskrit, and I am not so contemptuous of
the ancient Indians as to say that they couldn't do a better job
of designing an artificial language than Zamenhof did.
I'm not saying the language isn't unambiguous, just that it's such a
startling claim that I'll need more evidence before I believe it.
------------------------------
Date: 3 Oct 84 12:57:24-PDT (Wed)
From: hplabs!sdcrdcf!sdcsvax!sdamos!elman @ Ucb-Vax.arpa
Subject: Re: Sanskrit
Article-I.D.: sdamos.17
Rick,
I am very skeptical about your claims that Sastric Sanskrit is an
unambiguous language. I also feel you misunderstand the nature
and consequences of ambiguity in natural human language.
| The language is a branch of Sastric Sanskrit which flourished
|between the 4th century B.C and 4th century A.D., although its
|beginnings are somewhat older. That it is unambiguous is without
|question.
Your judgment is probably based on written sources. The sources may also
be technical texts. All this indicates is that it was possible to write
in Sastric Sanskrit with a minimum of ambiguity. So what? Most languages
allow utterances which have no ambiguity. Read a mathematics text.
|The problem is that most (maybe all) of us are used
|to languages like English (one of the worst) or other languages which
|are so poor as vehicles of transmission of logical data.
I think you have fallen victim to the trap of egocentrism. English is
not particularly less (or more) effective than other languages as a vehicle
for communicating logical data, although it may seem that way to
a native monolingual speaker.
| The facility and ease with which these Indians communicated
|indicates that it is possible for a natural language to serve all
|purposes of artificial languages based on logic.
How do you know how easily they communicated? I'm serious. And
how easily do you read a text on partial differential equations? An
utterance which is structurally ambiguous may not be the easiest to
read.
|If one could say what one wishes to say with absolute clarity (although
|with apparent redundancy) in the same time and with the same ease as
|you say part of what you mean in English, why not do so? And if a
|population actually got used to talking in this way there would be
|much more clarity and less confusion in our communication.
Here we come to an important point. You assume that the ambiguity of
natural languages results in loss of clarity. I would argue that
in most cases the structural ambiguity in utterances is resolved
by other (linguistic or paralinguistic) means. Meaning is determined
by a complex interaction of factors, of which surface structure is but one.
Surface ambiguity gives the language a flexibility of expression. That
flexibility does not necessarily entail lack of clarity. Automatic
(machine-based) parsers, on the other hand, have a very difficult time
taking all the necessary interactions into account and so must rely more
heavily on a reliable mapping of surface to base structure.
| As to how this is accomplished, basically SYNTAX IS ELIMINATED.
|Word order is unimportant, speaking is thus comparable to adding a
|series of facts to a data-base.
Oops! Languages may have (relatively) free word order and still have
syntax. A language without syntax would be the linguistic find of
the century!
In any event, the principal point I would like to make is that structural
ambiguity is not particularly bad nor incompatible with "logical" expression.
Human speech recognizers have a variety of means for dealing with
ambiguity. In fact, my guess is we do better at understanding languages
which use ambiguity than languages which exclude it.
Jeff Elman
Phonetics Lab, Dept. of Linguistics, C-008
Univ. of Calif., San Diego La Jolla, CA 92093
(619) 452-2536, (619) 452-3600
UUCP: ...ucbvax!sdcsvax!sdamos!elman
ARPAnet: elman@nprdc.ARPA
------------------------------
Date: Friday, 5 Oct 1984 10:15-EDT
From: jmg@Mitre-Bedford
Subject: Loglan, properties of interlinguas, and NLs as interlinguas
There has been a running conversation regarding the use of an
intermediate language or interlingua to facilitate communication between
man and machine. The discussion lately has focused on whether or not it
is possible or even desirable for a natural language (i.e., one which was
made for and spoken/written by humans in some historical and cultural
context) to serve in this role. At last glance it would seem to be a
standoff between the cans and cannots. It might be interesting to see
if a consensus can at least be reached regarding what an interlingua
might be like and therefore whether any natural languages or formal ones
for that matter would fit or could be made to fit the necessary form.
It would seem that a candidate language would possess a fair
sample of the following characteristics (feel free to add to or modify
this list):
1) small number of grammar rules--to reduce the trauma of learning
a new language, simplify the parsing program, and generally speed up the works
2) small number of speech sounds--to ease learning, and, if well
chosen, improve the distinction between sounds and thus the
apprehensibility of the spoken language
3) phonologically consistent--for similar reasons as 2) above
4) relative freedom from syntactic ambiguity--to ease translation
activities and provide an experimental tool for exploring ambiguity in
NLs and thought
5) graphologically regular/consistent with phonology--to ease the
transition to the interlingua by introducing no new characters and only
simple spelling rules
6) simple morphology--to improve the recognizability of words and
word types by limiting the structures of legal words to a few and making
word construction regular
7) resolvability--to aid in machine and human information extraction,
particularly in noisy environments, by combining well-chosen phonology and
morphology
8) freedom from cultural or metaphysical bias--to avoid introducing
unintended effects due to specific built-in assumptions about the universe
that may be contained within the language
9) logical clarity--to ensure the ability to construct the classical
logical connections important to semantically and linguistically useful
expressions
10) wealth of metaphor--to allow this linguistic feature to be studied
and provide a creative tool for expression
These features were selected to try to characterize the intent of
a hypothetical designer of an interlingua. Possibly no product could fully
merge all the features without compromising unacceptably some of the
desirable traits. If this list appears unacceptable, make suggestions and/or
additions and deletions until a workable list results.
It is likely that no current or historical natural language would
combine a sufficient number of the above features to stand out as an obvious
choice to use as interlingua. Simplicity, regularity, ease of learning,
ease of information extraction, lack of syntactic ambiguity, and the rest
are the earmarks of a constructed language. It remains to be seen that a
so-constructed language can be used by humans to express unrestrictedly the
full range of human thought.
In response to Dr. Dyer's comment about loglan, I can testify that it
is not all that hard to get around in. It is a "foreign" language, however,
and thus takes some learning and getting used to. It does have several of
the features that an interlingua would. Only experience will ultimately
reveal whether it is "natural" enough to be useful for exploring the
relationship between thought and language and formal enough to be
machine-realizable.
-Michael Gilmer
jmg@MITRE-BEDFORD.ARPA
------------------------------
End of AIList Digest
********************