AIList Digest Volume 1 Issue 024

AIList Digest            Friday, 22 Jul 1983       Volume 1 : Issue 24 

Today's Topics:
Weizenbaum in Science Digest
AAAI Preliminary Schedule [Pointer]
Report on Machine Learning Workshop
----------------------------------------------------------------------

Date: 20 July 1983 22:28 EDT
From: Steven A. Swernofsky <SASW @ MIT-MC>
Subject: Weizenbaum in Science Digest

How much credence do Professor Weizenbaum's ideas get among the
current A.I. community? How do these statements relate to his work?

-- Steve

------------------------------

Date: 20 Jul 1983 0407-EDT
From: STRAZ.TD%MIT-OZ@MIT-MC
Subject: AAAI Preliminary Schedule

What follows is a complete preliminary schedule for AAAI-83.
Presumably changes are still possible, particularly in times, but it
does tell what papers will be presented.

AAAI-83 THE NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE at the
Washington Hilton Hotel, Washington, D.C. August 22-26, 1983,
sponsored by THE AMERICAN ASSOCIATION FOR ARTIFICIAL INTELLIGENCE and
co-sponsored by University of Maryland and George Washington
University.

[Interested readers should FTP file <AILIST>V1N25.TXT from SRI-AI. It
is about 19,000 characters. -- KIL]

------------------------------

Date: 19 Jul 1983 1535-PDT
From: Jack Mostow <MOSTOW@USC-ISIF>
Subject: Report on Machine Learning Workshop


1983 INTERNATIONAL MACHINE LEARNING WORKSHOP:
AN INFORMAL REPORT

Jack Mostow
USC Information Sciences Institute
4676 Admiralty Way
Marina del Rey, CA. 90291

Version of July 18, 1983

[NOTE: This is a draft of a report to appear in the October 1983 SIGART. I
am circulating it at this time to get comments before sending it in. The
report should give the flavor of the work presented at the workshop, but is not
intended to be formal, precise, or complete. With this understanding, please
send corrections and questions ASAP (before the end of July) to
MOSTOW@USC-ISIF. Thanks. - Jack]

The first invitational Machine Learning Workshop was held at C-MU in the
summer of 1980; selected papers were eventually published in Machine Learning,
edited by the conference organizers, Ryszard Michalski, Jaime Carbonell, and
Tom Mitchell. The same winning team has now brought us the 1983 International
Machine Learning Workshop, held June 21-23 in Allerton House, an English manor
on a park-like estate donated to the University of Illinois. The Workshop
featured 33 papers, two panel discussions, countless bull sessions, very little
sleep, and lots of fun.

This totally subjective report tries to convey one participant's impression
of the event, together with a few random thoughts it inspired. I have
classified the papers rather arbitrarily under the topics of "Analogy,"
"Knowledge Transformation," and "Induction" (broadly construed), but of course
33 independent research efforts can hardly be expected to fall neatly into any
simple classification scheme. The papers are discussed in semi-random order; I
have tried to put related papers next to each other.

1. Analogy
One notable change from the first Machine Learning workshop was the new
abundance of work on analogy. In 1980, analogy was a topic that clearly needed
work, but for which ideas were lacking. In 1983, several papers relevant to
analogical reasoning were presented:

Pat Winston (MIT) "Learning by Augmenting Rules and Accumulating Censors"
makes an interesting connection between analogy and non-monotonic reasoning.

Jaime Carbonell (CMU) "Derivational Analogy in Problem Solving and Knowledge
Acquisition" argues for the inseparability of learning and problem solving.

Lindley Darden (U. of Maryland) "Reasoning by Analogy in Scientific Theory
Construction" shows different ways in which analogy was used historically in
scientific discovery, and challenges AI to implement them.

Mark Burstein (Yale) "Concept Formation by Incremental Analogical Reasoning
and Debugging" models a student learning the semantics of programming language
assignment by combining analogies given by a teacher or textbook. This
excellent paper is discussed below in more detail.

Ken Forbus and Dedre Gentner (BBN) "Learning Physical Domains: Towards a
Theoretical Framework" describes "qualitative process theory" for describing
naive physics and "structure mapping" for analogical reasoning. They have
tackled the difficult and important problem of reasoning symbolically about
continuous processes.

Nachum Dershowitz (U. of Illinois) "Programming by Analogy" suggests how a
program to compute cube roots can be constructed by analogy with a program for
division, and how both can be abstracted into a common schema. Nachum wins the
"Presenting by Analogy" award for using abstract geometrical figures to
communicate most of his talk.



1.1. Lessons
It is clear that much progress has been made in analogy.

In contrast to classical work on abstract analogies of the sort used in IQ
tests, the 1983 papers emphasized analogy as a knowledge transfer method that
uses knowledge about old problems to help solve new ones.

The idea of analogies as matching graph structures in semantic networks was
already established, but it has now been refined in some important ways.
First, there is a consensus that causal relations are crucial to the analogy
while certain other parts of the graph (unary surface features) are not
[Winston, Burstein, Gentner].

Related to this is the idea of analogy as a process of inheriting a
justification [Carbonell, Winston]. Carbonell had previously introduced the
idea of "transformational analogy" in problem-solving -- to solve a new
problem, find a similar old problem, retrieve its solution path (sequence of
problem-solving operators), and perturb it into a solution to the new problem.
His new paper extends this into "derivational analogy" by adding information
about the goal structure motivating the operator sequence, the choices for how
to reduce each subgoal, and the reasons for choosing one over another. Via a
truth maintenance mechanism, each goal points to the choices that depend on it.
To solve a new problem, the derivation of the old solution is replayed as in
the POPART system developed by David Wile at ISI, but with an important
difference. At each goal node, the justifications for how the goal was
achieved are checked. If the reasons that support them are still true, or if
they can be proven based on new reasons, the solution can be used as is.
Otherwise, only the steps that depend on the violated justifications need be
modified. In short, adding explicit justifications gives a clean way to patch
old solutions instead of completely replaying their derivations. I think this
technique should be very useful for making the replay mechanism efficient.
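
The replay-with-justifications idea can be sketched as follows. This is only a minimal illustration, not Carbonell's implementation: the Step record, the replay function, and the sorting example are all hypothetical, and a simple "depends_on" list stands in for the truth maintenance mechanism.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Facts = Dict[str, bool]


@dataclass
class Step:
    """One recorded decision in an old derivation (hypothetical structure)."""
    goal: str
    operator: str
    # Each justification is a predicate over the facts of a problem.
    justifications: List[Callable[[Facts], bool]]
    # Goals whose solutions this step relies on.
    depends_on: List[str] = field(default_factory=list)


def replay(derivation: List[Step], new_facts: Facts) -> Tuple[List[str], List[str]]:
    """Return (operators reusable as is, goals that must be re-solved)."""
    reusable: List[str] = []
    to_repair: List[str] = []
    invalid_goals = set()
    for step in derivation:
        still_justified = all(check(new_facts) for check in step.justifications)
        depends_on_invalid = any(g in invalid_goals for g in step.depends_on)
        if still_justified and not depends_on_invalid:
            reusable.append(step.operator)
        else:
            # A violated justification also invalidates later dependent steps.
            invalid_goals.add(step.goal)
            to_repair.append(step.goal)
    return reusable, to_repair


# Hypothetical old derivation, replayed against a new problem's facts.
old_derivation = [
    Step("sort-input", "use-quicksort", [lambda f: f["fits_in_memory"]]),
    Step("remove-duplicates", "scan-adjacent", [lambda f: True], ["sort-input"]),
    Step("print-report", "format-table", [lambda f: f["has_terminal"]]),
]
new_problem = {"fits_in_memory": False, "has_terminal": True}
reusable, to_repair = replay(old_derivation, new_problem)
print(reusable, to_repair)
```

Only the two goals touched by the violated assumption are flagged for repair; the unrelated step is reused without rederiving it.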

Winston illustrated the problem of relating function to structure by asking,
"What is a cup?" Given a general functional definition ("graspable, stable,
holds liquid"), a structural description of a coffee cup ("handle, flat bottom,
concave upward"), and an explanation of how the latter instantiates the former
("the handle provides graspability in the case of hot liquids"), he draws an
analogy to a styrofoam cup by repairing the explanation (the handle is not
needed because the styrofoam insulates).

The problem of patching up sloppy or partial analogies received some
much-needed attention [Carbonell, Winston, Burstein, Darden]. In particular,
Burstein addressed the problem of integrating imperfect analogies. His CARL
program models the behavior of a (real) student learning the semantics of
assignment statements from a teacher's analogies to putting things in boxes,
algebraic equality, and remembering. He refined the model of analogy as
finding a match all at once to the more realistic one of incrementally
extending an initial correspondence (suggested by a tutor) into a more detailed
analogy. This can involve selecting among alternative mappings when the
initial analogy is ambiguous. For example, if `X = 6' is like putting the
value 6 in the box named X, does `X = Y' mean putting the box named Y inside
the box named X, or putting the value of Y into X? CARL infers the answer from
the analogy with algebraic equality.

2. Knowledge Transformation
Knowledge transformation converts knowledge from an inefficient or unusable
form to a more useful one.

Doug Lenat (Stanford), Rick Hayes-Roth (Teknowledge), and Phil Klahr (Rand)
"Cognitive Economy in a Fluid Task Environment" updates their 1979 Rand tech report
on caching, work worth knowing, but what Doug actually presented was one of the
entertaining (and instructive) two-screen EURISKO talks we have come to enjoy
so much.

To nobody's surprise but his own, Doug won the official workshop puzzle
contest, getting only one word wrong (which I helped him with). Doug's prize
was a diseased soybean plant, symbolizing U. of Illinois's favorite induction
problem. There was no second prize, but if there had been it probably would
have been two diseased soybean plants.

John Anderson (CMU) "Knowledge Compilation: The General Learning Mechanism"
uses production "composition" and "proceduralization" to model the progress of
a student learning how to program in LISP.

Paul Rosenbloom (CMU) "The Chunking of Goal Hierarchies: A Generalized Model
of Practice." Paul's Ph.D. thesis accounts for the universal power law of
practice with a chunking model (fast encode, connect, fast decode) that unifies
classical chunking, memo functions (alias "caching"), and production
composition.

Jack Mostow (USC-ISI) "Operationalizing Advice: A Problem-Solving Model"
describes a problem-solver called BAR, the successor to the 1981 FOO system.
Given a piece of advice for the card game Hearts, BAR helps find a sequence of
general program transformations that converts it into a procedure executable by
the learner.

Tom Mitchell and Rich Keller (Rutgers) "Goal-Directed Learning" describes
LEX2, which learns heuristics for symbolic integration by analyzing worked-out
examples. Unlike LEX1, which performed empirical induction from multiple
examples using the version space method, LEX2 constructs justifiable
generalizations from single examples based on an elegant explicit definition in
predicate calculus of what it means in LEX to be a heuristic (to lie on an
[optimal] solution path). This work fits both the "Knowledge Transformation"
and "Induction" categories because it induces heuristics by converting a
precise but inefficient definition of "heuristic" into specialized patterns
that can be tested inexpensively by matching. Tom's talk compared LEX2 with
DeJong's explanatory schema acquisition, Winston's analogical reasoning, and
Mostow's operationalizer, in terms of a three-step process (generate
explanation; extract sufficient condition for satisfying goal concept;
optimize). I enjoy seeing attempts to unify and compare different research
projects, especially when one of them is mine.



2.1. Lessons
In his keynote address at IJCAI79 in Tokyo, Herb Simon suggested that Lenat's
AM and Langley's BACON provided examples of discovery systems that might be
used as the basis for a theory of discovery, and that such a theory might in
turn serve to guide research in AI. When he pointed out that both AM and BACON
shared a heavily empirical bent, I realized to my distress that my FOO program
was just the opposite -- completely analytic. At the 1980 Machine Learning
Workshop, Tom Mitchell and I discussed the "analytic-to-empirical" spectrum,
and wondered how the two approaches might profitably be combined. The 1983
Workshop gives at least a couple of answers; more should be found.

An example of a purely empirical approach to knowledge transformation would
be a program that compiles frequently-used action sequences into
macro-operators without regard to such factors as the goal structure motivating
them; such macro-operators lack flexibility since they apply only to cases
where the exact sequence of operators applies. At the other extreme, a purely
analytic knowledge transformer (e.g., FOO/BAR) converts declarative knowledge
into an effective form without regard to such factors as which cases actually
arise in practice; the failure to exploit realistic assumptions leads to
procedures that are very general but very weak.

One way to combine empirical and analytic techniques is to analyze specific
examples that have arisen in actual practice, and generalize them by
identifying which properties were actually relevant to the outcomes [LEX2].

Another way takes a general piece of knowledge, an interpreter that applies
it to specific cases, and a caching mechanism that records the results. The
general knowledge is gradually compiled into streamlined procedures for special
cases [Anderson, Lenat, Rosenbloom].
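
The second combination (general knowledge, an interpreter, and a cache) can be sketched roughly as below. The CachingInterpreter class and the path-counting rule are hypothetical stand-ins chosen for illustration; the sketch only records top-level results and does not attempt the compilation into streamlined procedures that the cited papers describe.

```python
from typing import Callable, Dict, Hashable


class CachingInterpreter:
    """Applies a general rule to specific cases and records the results."""

    def __init__(self, general_rule: Callable[[Hashable], object]):
        self.general_rule = general_rule
        self.cache: Dict[Hashable, object] = {}
        self.slow_calls = 0  # how often the general rule had to be run

    def solve(self, case: Hashable) -> object:
        if case in self.cache:
            return self.cache[case]  # special case already "compiled"
        self.slow_calls += 1
        result = self.general_rule(case)
        self.cache[case] = result
        return result


def count_paths(grid_size):
    """General but slow knowledge: count monotone lattice paths recursively."""
    rows, cols = grid_size
    if rows == 0 or cols == 0:
        return 1
    return count_paths((rows - 1, cols)) + count_paths((rows, cols - 1))


interp = CachingInterpreter(count_paths)
first = interp.solve((6, 6))   # computed from the general rule
second = interp.solve((6, 6))  # answered from the cache
print(first, second, interp.slow_calls)
```

The cache is empirical in that it fills only with cases that actually arise, while the rule it caches stays fully general.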

3. Induction
Induction generalizes examples obtained from experience, observation,
experiments, tutors, newspapers, or elsewhere.



3.1. Inducing Rules
Ryszard Michalski and Robert Stepp (U. Illinois) "How to Structure Structured
Objects" manages to discuss classification of structured objects without
referring to soybean diseases.

Tom Dietterich (Stanford) and Ryszard Michalski (U. Illinois) "Discovering
Patterns in Sequences of Objects" describes an extension of Tom's 1979 M.S.
thesis program for the card game Eleusis, where the problem is to induce a
secret rule from positive and negative examples.

Tom Dietterich and Bruce Buchanan (Stanford) "The Role of Experimentation in
Theory Formation" reports on Tom's ongoing Ph.D. thesis project, EG, to induce
the semantics of Unix commands by performing experiments to see what they do.
EG ignores the explanations provided by Unix error messages, but it is not
clear that this loses very much information. Previous work on experimentation
has focussed on internally formalizable domains in order to avoid the
bottleneck of a low-bandwidth interface to the outside world, so this project
is a welcome entry into an area deserving exploration. I'm eager to see the
results; I'm sure Tom and Bruce are too!

Pat Langley, Jan Zytkow, Herb Simon (CMU) "Mechanisms for Qualitative and
Quantitative Discovery" reports on four discovery programs. BACON.6 extends
previous BACON.i by finding quantitative functional relationships in noisy
data. The other three programs induce qualitative theories from collections of
chemical reactions: GLAUBER discovers the concepts of acids, bases, and salts;
STAHL infers the composition of substances, recreating something like
phlogiston theory; and DALTON infers the number of atoms per molecule. The
next step is to integrate these programs.

Saul Amarel (Rutgers) "Program Synthesis as a Theory Formation Task: Problem
Representations and Problem Methods" describes a program that induces the
algebraic structure of a relation represented as a set of tuples.

Donald Michie (U. of Edinburgh) "Inductive Rule Generation in the Context of
the Fifth Generation" provocatively suggests that to interface usefully with
human experts, induction systems should produce "brain-compatible" results that
are both human-understandable and "mentally executable."

Paul Utgoff (Siemens CRS) "Adjusting Bias in Concept Learning" discussed his
Ph.D. work on getting LEX to modify its inductive bias, defined as the
knowledge that causes a learner to choose one hypothesis over another. LEX's
bias is determined by its pattern language for describing classes of
integration problems. Paul's program infers new terms like "odd" or
"twice-integrable" based on analysis of worked-out examples, and figures out
how to assimilate them into the language.

Bernard Silver (U. of Edinburgh) "Learning Equation Solving Methods from
Worked Examples" describes LP, a program that solves difficult algebraic and
trigonometric equations better than many of us, and learns new problem-solving
"schemas" from worked-out examples. LP apparently derives its power from a
well-chosen abstraction function that describes each equation in terms of its
"characteristic 4-tuple" (number of occurrences of unknown; type of function
symbols, e.g. trig; single equation vs. disjunction; top-level connective).
Essentially, LP performs means-ends analysis in the abstracted space: a
difference between two tuples indexes a collection of operators for reducing
it. I view LP as learning in what order to reduce differences, but if you want
Bernard's view of the matter you should read his paper.
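
Means-ends analysis over abstracted tuples might look something like the toy below. It is not LP: the tuple encoding, the three operators, and the fixed difference ordering are all hypothetical, and the operators act on the abstract tuple directly rather than on real equations.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical encoding of a "characteristic 4-tuple":
# (occurrences of unknown, function type, is a disjunction, top-level connective)
Abstract = Tuple[int, str, bool, str]


def collect(t: Abstract) -> Abstract:
    """Toy operator: reduce the unknown to a single occurrence."""
    return (1, t[1], t[2], t[3])


def strip_function(t: Abstract) -> Abstract:
    """Toy operator: eliminate the function symbols."""
    return (t[0], "none", t[2], t[3])


def split_cases(t: Abstract) -> Abstract:
    """Toy operator: handle one disjunct at a time."""
    return (t[0], t[1], False, t[3])


# A difference in tuple position i indexes the operator that reduces it.
OPERATORS: Dict[int, Callable[[Abstract], Abstract]] = {
    0: collect,
    1: strip_function,
    2: split_cases,
}


def differences(current: Abstract, goal: Abstract) -> List[int]:
    return [i for i, (a, b) in enumerate(zip(current, goal)) if a != b]


def means_ends(start: Abstract, goal: Abstract,
               difference_order: List[int], max_steps: int = 10) -> List[str]:
    """Repeatedly reduce the difference ranked earliest in difference_order."""
    current, plan = start, []
    for _ in range(max_steps):
        diffs = differences(current, goal)
        if not diffs:
            return plan
        target = min(diffs, key=difference_order.index)
        if target not in OPERATORS:
            raise ValueError(f"no operator reduces difference {target}")
        current = OPERATORS[target](current)
        plan.append(OPERATORS[target].__name__)
    raise RuntimeError("step limit reached before the goal")


start = (2, "trig", True, "=")
goal = (1, "none", False, "=")
plan = means_ends(start, goal, difference_order=[2, 1, 0, 3])
print(plan)
```

In this toy the ordering is supplied by hand; on the reading given above, it is the ordering that a system like LP would acquire from worked examples.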



3.2. Dealing with Noise
Ross Quinlan (New South Wales Institute of Technology) "Learning from Noisy
Data" reports some interesting empirical results from introducing controlled
amounts of noise into the training and test data for a binary classification
system that induces decision trees. By storing in each leaf node of the
decision tree the proportion of positive instances among the objects classified
under that node, the system identifies which attributes classify the data most
reliably -- i.e., in some sense it learns about the noise. Among several
surprising results: if it is known that the test data will be noisy, it is
actually better to use noisy training data! Such results have important
implications: for example, if a medical diagnosis system is to be built by
induction from a medical database and applied to patients whose symptoms are
reported unreliably, it may actually perform better if the database is munged a
bit first. Of course further work is needed to analyze why Quinlan's system
behaves this way, and what class of induction systems will behave similarly.
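
The leaf-proportion bookkeeping can be sketched for the simplest case, a one-level tree that splits on a single attribute. This is not Quinlan's system: the function names, the 0.5 threshold, and the reliability measure (average distance of leaf proportions from a coin flip) are assumptions made for illustration, as is the made-up data.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Example = Tuple[Hashable, bool]  # (attribute value, class label)


def fit_leaves(examples: List[Example]) -> Dict[Hashable, float]:
    """For each attribute value (leaf), store the proportion of positives."""
    counts = defaultdict(lambda: [0, 0])  # value -> [positives, total]
    for value, label in examples:
        counts[value][0] += int(label)
        counts[value][1] += 1
    return {value: pos / total for value, (pos, total) in counts.items()}


def classify(leaves: Dict[Hashable, float], value: Hashable,
             threshold: float = 0.5) -> bool:
    """Predict positive when the leaf's stored proportion reaches the threshold."""
    return leaves[value] >= threshold


def reliability(leaves: Dict[Hashable, float]) -> float:
    """Hypothetical measure: mean distance of leaf proportions from 0.5."""
    return sum(abs(p - 0.5) for p in leaves.values()) / len(leaves)


# Made-up noisy training data for two attributes.
fever = [("yes", True), ("yes", True), ("yes", True), ("yes", False),
         ("no", False), ("no", False), ("no", False), ("no", True)]
shoe = [("a", True), ("a", False), ("b", True), ("b", False)]

fever_leaves = fit_leaves(fever)
shoe_leaves = fit_leaves(shoe)
print(fever_leaves, reliability(fever_leaves))
print(shoe_leaves, reliability(shoe_leaves))
```

An attribute whose leaves stay near one half carries little information once noise is present, which is one way to read the remark that the system learns about the noise.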

Michael Lebowitz (Columbia) "Concept Learning in a Rich Input Domain" induces
predictive stereotypical patterns from event descriptions such as news stories.
An interesting aspect of his approach is the ability to induce generalizations
based on noisy data.

Casimir Kulikowski (Rutgers) "Knowledge Acquisition and Learning in EXPERT"
describes how the SEEK system addresses the important practical problem of
debugging a large collection of expert rules. SEEK extends the contributions
of Randy Davis' TEIRESIAS to "knowledge acquisition in the context of a
shortcoming in the knowledge base." SEEK experiments by perturbing rules, and
uses the number and type of resulting errors with respect to stored cases to
suggest possible improvements. It gathers statistics on the "missing
components" that prevent rules from firing when they should, using work by
Politakis on statistical credit assignment.

John Holland (U. of Michigan) "Escaping Brittleness" describes recent results
in his continuing work on genetic learning algorithms. These methods exploit
parallelism and ideas from ecology and capitalism, and are actually producing
usable application systems for arm-eye coordination tasks. I enjoy John's work
because it is so mind-bendingly different from what the rest of us do. I
suspect it may serve as an elegant simplified model to understand the
computational aspects of molecular biology and evolution, by bridging the gap
between our standard computational metaphors (subroutine call, naming, symbolic
processing) and as-yet undeciphered biological mechanisms (enzymes, codons,
complex feedback systems).



3.3. Logic-based Work
Clark Glymour, Kevin Kelly, and Richard Scheines (U. of Pittsburgh) "Two
Programs for Testing Hypotheses of any Logical Form" implements Hempel's
confirmation relation and extends it to handle partial confirmation. The
resulting programs tell whether (or to what extent) a given set of propositions
confirms a given inductive hypothesis.

Claude Sammut and Ranan Banerji (St. Joseph's University) "Hierarchical
Memories: An Aid to Concept Learning" describes a logic-based system for
inserting new categories into is-a hierarchies.

Y. Kodratoff and J.-G. Ganascia (Universite de Paris) "Learning as a
Non-deterministic but Exact Logical Process" describes a logic-based
generalization algorithm that I got the impression extends previous work by
Hayes-Roth and Vere with respect to many-to-one mappings.



3.4. Cognitive Modelling
Derek Sleeman (Stanford) "Inferring (MAL) Rules from Pupil's Protocols" was
an amusing report on automating the induction of students' buggy algebra
productions from their incorrectly worked-out problems. Apparently the
students are so eager to achieve the goal -- get the unknown on one side and a
number on the other -- that they resort to a powerful problem-solving technique
I like to call "ends-justifies-the-means analysis."

Kurt VanLehn (Xerox PARC) "Validating a Theory of Human Skill Acquisition"
reports on some similar work: modelling students' subtraction errors in terms
of the hypothesized induction methods whereby they infer the subtraction
algorithm from the teachers' examples. The theory posits several "felicity
conditions" -- conventions on teacher-student communication that facilitate the
induction process. One such condition is the "one disjunction per lesson"
rule. This work, in the Buggy-Debuggy tradition, uses a flowgraph
representation in contrast to Sleeman's production system representation.

Bob Berwick (MIT) "Domain-specific Learning and the Subset Principle" used
certain linguistic data as evidence that human languages conform to constraints
on how much humans induce from each example in a sequence. Unfortunately, as a
non-linguist I was unable to induce anything from the examples Bob used in his
talk. This datum may actually constitute further evidence in support of his
theory.

Douglas Medin (U. of Illinois) "Linear Separability and Concept Naturalness"
presents evidence that linearly separable categories are not generally easier
for people to learn.

Doug Hofstadter (Indiana University) "The Architecture of Jumbo" models the
process of permuting a string of letters into a recognizable word. Doug's talk
started by borrowing the last name of a Cognitive Modelling panelist (Janet
Kolodner) as an example and suggesting that it had no one-word permutation. I
immediately set about looking for one, and by the end of the talk had found
"elk-donor" (one who donates elks) and "do-Kloner" (one who implements or
applies the representation language KLONE) as well as several two-word phrases
of varying social and orthographic acceptability ("lone dork," "red kolon," "no
kolder," ...). Doug's talk certainly wins the "Giving the Audience Something
to Do to Keep it Amused During Your Talk" award. Unfortunately I can't tell
you what it was about, except that by analogy with the concept of "spoonerism"
he introduced such new concepts as "forkerism" and "kniferism."

Mallory Selfridge (U. of Connecticut) "How Can CHILD Learn About Agreement?
Explorations of CHILD's Syntactic Inadequacies" was the last talk of the
conference. Mallory spoke without slides, allowing worn-out members of the
audience to close their eyes and concentrate better.

Gerry DeJong (U. of Illinois) "An Approach to Learning from Observation"
describes his continued work on learning by composing schemas to explain
observed event sequences. This might be classified under "Knowledge
Transformation" since it consists of recognizing and naming specialized
combinations of existing concepts. Gerry also composed the official workshop
puzzle to divert participants when not listening to presentations, thereby
winning the "Giving the Audience Something to Keep It Amused During Everybody
Else's Talk" award.



3.5. Lessons
Some induction systems use user input to help fill in a gap in a chain of
reasoning otherwise derivable by existing rules [Silver, Sleeman, Kulikowski].

The incremental learning theme evidenced in the work on analogy also appeared
in induction systems that construct and refine hypotheses [Amarel, Dietterich &
Buchanan, Holland, Lebowitz].

The real-world problem of noisy data is receiving attention [Langley,
Quinlan, Lebowitz], and statistical induction is being used in interesting ways
[Holland, Kulikowski].

4. Panel Discussion: Cognitive Modelling -- Why Bother?
The first day of the workshop ended with an evening panel on "Cognitive
Modelling of Learning Processes." Having arisen at 4am Pacific Sleepy Time,
sat through morning and afternoon sessions filled with paper presentations, and
partaken of three meals filled with shop talk, I found my capacity to absorb
the insights of the panelists severely diminished. I did find the panel on
cognitive modelling a convincing argument for the importance of combined
aural-visual input in human learning, insofar as some of the panelists didn't
use slides and now I can't remember what they said. On the other hand, I'm not
doing too well at remembering what any of them said. In fact, I had trouble
reconstructing who was on the panel. All of which illustrates at least one
area for applying cognitive modelling to AI: investigating, and preventing,
the process whereby researchers forget what they see and hear at AI
conferences.

Fortunately Jaime Carbonell, who moderated the panel, was kind enough to
supply a description of what transpired while I was lapsing in and out of my
stupor:

``I started the discussion by noting several examples where work in cognitive
modelling had inspired and influenced work in machine learning (e.g., Earl
Hunt's work on concept acquisition helped motivate work on symbolic
descriptions in learning over earlier neural net approaches), and a bit
vice-versa. The production systems paradigm emerged from the joint concerns of
both camps. Then I asked the panelists to draw from their own work to
substantiate or criticize the cross-fertilization hypothesis.

``Paul Kline (Texas Instruments) presented the major result in his thesis:
Concept acquisition in humans is clearly not commutative with respect to the
order of presentation of examples. This is important, as most of the recent
work in machine learning no longer assumes commutativity as a requisite
constraint (e.g. learning by analogy is clearly governed by past knowledge and
experience -- what you know structures what you learn and what you pay
attention to in new information).

``Janet Kolodner (Georgia Institute of Technology) argued in favor of
case-based reasoning in expert systems design, where episodic traces and
generalizations therefrom may constitute the primary form of expertise
acquisition. She argued in favor of using human memory structuring principles
as a guiding criterion for modelling expertise and its acquisition.

``John Anderson (CMU) played the role of devil's advocate, saying "Why should
you machine learning people handcuff yourselves by known restrictions on human
learning?" and sketched a case for separating the lines of research.

``Paul Rosenbloom (CMU) and Dedre Gentner (BBN) served as discussants in the
panel and addressed Anderson's concerns, mostly refuting the argument by
counterexamples and by suggesting that the only known existence proof of robust
learning behavior is to be found in humans and other biological systems, and
therefore ought to serve as inspiration for machine learning, rather than as a
constraint. Anderson quickly agreed, since he didn't really believe his
straw-man position anyway. The discussion went on to conclude that problem
solving, memory organization, and learning are inextricably woven phenomena,
and the study of each impacts strongly upon the others.''

5. Panel Discussion: "Machine Learning -- Challenges of the 80's"
At one late-night bull session, several of us were trying to figure out how
to spice up the final panel discussion. A panel whose members agree on
everything is boring; perhaps a panel discussion shouldn't be considered a
complete success unless it comes to blows. What issue might provoke some
edifying disagreement?

Pat Langley suggested distinguishing between "Darwinian" induction systems,
which generate hypotheses independent of the environment in which they're
tested, and "Lysenkoist" systems, where the hypothesis generator is sensitive
to the result of such tests. Reincarnating the Darwin-Lysenko controversy
would have served to replace the now-passe' "declarative vs. procedural" and
"neat vs. scruffy" controversies as a source of much meaningless and
entertaining debate, while adding a classy touch of history. Unfortunately we
all chickened out, and the panelists found little to disagree about. But they
tried....

Saul Amarel suggested that the term "analogy" be banned and replaced with
precise terms denoting the processes of identifying a relevant analogy,
importing it into a new area, assimilating it, and repairing it. Jaime
Carbonell responded promisingly ("Oh come on, Saul..."), but eventually they
degenerated into agreement. My personal feeling is that a precise definition
is still premature and the field can benefit from looking for more patterns of
reasoning that might be called "analogy;" I suspect there are some important
ones not used in current analogy systems.

While pondering Saul's provocative stance on this issue, I failed to
concentrate fully on the research directions he next proposed. Fortunately Tom
Dietterich filled me in. Here's one of my favorites: ``Another fruitful avenue
for future research is to develop problem-solving environments in which experts
can be automatically observed while they solve problems. In this way, programs
might be able to capture expertise by "watching over the shoulder" of experts.
This is a good area for research on psychology and man/machine interactions,
too.'' Tom Mitchell mentioned to me that he is planning to do something like
this in the VLSI domain. It's hard to find a hotter combination of topics than
VLSI design and machine learning! Saul would also like to see work on
real-world scientific theory formation problems in areas like physics and
biology, and a new MetaDendral-like project.

Donald Michie made a good try at generating some dissension by suggesting
that the next Machine Learning workshop should be restricted to papers that
report complete results, but apparently nobody was brave enough to disagree.

Doug Lenat identified some sources of power for current and future success in
the field:

- Synergy of learning programs

    * with humans: EURISKO is one example of a cooperative discovery
      system. Learning systems will want fancy front ends with
      natural language, visual, and non-verbal I/O; conversely, fancy
      front ends will need to induce models of a session or user.

    * with AI: MetaDendral illustrates a performance program with a
      learning component for improving itself.

    * with other learning programs: One lesson of early learning
      research is that no single general technique suffices by itself;
      progress requires combining them.

  One way to provide such synergy is to package learning methods into
  tools usable by other researchers and their systems, somewhat like
  the way certain program analysis methods have been packaged into
  Interlisp's Masterscope. Doug plans to package EURISKO in usable
  form and distribute it to the AI community in a year or so, which is
  great news.

- Analogy

    * as a paradigm for knowledge acquisition: Help automate the
      find-copy-edit technique often used to construct new schemas
      manually by adapting existing ones.

    * as a technique for suggesting plausible approaches based on
      similar past problems: This will require a broad knowledge base
      of common sense and facts, along the lines of Alan Kay's new
      project at Atari to encode an encyclopedia [see IJCAI83].

- Heuretics: The study of heuristics will both require and help produce
  a broad base of heuristics.

- Representation: Only a few basic representations are now known;
  automatic change of representation will require the kind of
  self-modelling, -monitoring, and -modifying systems discussed in the
  Cognitive Economy paper.

- Parallelism: VLSI offers obvious potential.

- Morphological analysis: There are other natural learning systems
  besides human cognition, including the immune system and evolution;
  what can they teach us?

During the panel discussion, Pat Winston observed that the workshop
represented a healthy balance between different types of work, such as
experimental and theoretical, analytic and empirical, basic and applied, etc.
To which Donald Michie added "good and bad." It might be mentioned that the
winner in the "Activities for Keeping Amused Between (and at the Expense of)
Other People's Talks" category consisted of exchanging nominations for the
"Worst Talk" category. The overall quality of the workshop was better than
that of most conferences, but there was intense competition in this category
nonetheless.

Winston foresees a danger of success in machine learning leading to the
"Expert Systems Syndrome," with reporters and venture capitalists getting
underfoot and interfering with scientific progress by tempting researchers away
from their work to fame and riches. (Some of us would like to know where we
can sign up for this.) Pat also sees a great opportunity for supercomputers to
qualitatively change how we think and do research, analogous to the way fast
computers liberated early work on vision from the limitations of running 3-by-3
operators over 256-by-256 arrays.

Ryszard Michalski, panel moderator and intrepid Workshop Chairman, called for
the development of a body of theory to help identify isomorphic ideas and
establish a uniform terminology for them. He also emphasized the importance of
general methods that can be applied to the problem of knowledge acquisition in
expert systems.

In response to repeated pleas to extend MetaDendral, Bruce Buchanan pointed
out that he and his co-workers quit working on Dendral and MetaDendral after
several years largely because they were just plain tired of it. Having made
this candid admission, he immediately left town.

One question that I raised in bull sessions and Pat Langley posed to the
panel is whether the time is yet ripe for a large-scale machine learning effort
analogous to the ARPA Speech Understanding Project. This question does not yet
have a clear answer. On the one hand, we are running up against limitations on
the kind of learning attainable in a one-Ph.D.-student project. On the other
hand, integrating multiple learning methods in a single system would appear to
require much tighter coupling than was necessary, say, between the various
knowledge source modules in Hearsay-II. In particular, learning methods tend
to be representation-specific. The development of a learning system employing
multiple evolving methods, each with its own evolving representation(s), would
be very difficult to manage.

Moreover, much thought must be given to the goals of such a project. The
ARPA Speech Understanding Study Group formulated clear goals for the kind of
system to be developed. What would be appropriate goals for a learning system?
They would have to be defined so as to preclude simply programming in the skill
to be learned. What would be gained by building a large learning system? As
Herb Simon points out in his provocative chapter in Machine Learning, the
knowledge transfer function that learning serves for people can be fulfilled
much more easily in computers by copying code. If the system learns things
that people already know, why would this be better than programming them in by
hand? If the system is supposed to discover things that people don't already
know, how can one set realistic goals for its performance? Although many such
devil's-advocate questions remain, I find the problem of designing a
Hearsay-style learning system a useful mental exercise for thinking about
research issues and strategies.

6. A Bit of Perspective
No overview would be complete without a picture that tries to put everything
in perspective:


           -------------> generalizations ------------
           |                                         |
           |                                         |
       INDUCTION                                COMPILATION
 (Knowledge Discovery)                  (Knowledge Transformation)
           |                                         |
           |                                         v
       examples ----------- ANALOGY --------> specialized solutions
                     (Knowledge Transfer)

Figure 6-1: The Learning Triangle: Induction, Analogy, Compilation

Of course the distinction between these three forms of learning breaks down
under close examination. For example, consider LEX2: does it induce
heuristics from examples, guided by its definition of "heuristic," or does it
compile that definition into special cases, guided by examples?

7. Looking to the Future
The 1983 International Workshop on Machine Learning felt like history in the
making. What could be a more exciting endeavor than getting machines to learn?
As we gathered for the official workshop photograph, I thought of Pamela
McCorduck's Machines Who Think, and wondered if twenty years from now this
gathering might not seem as significant as some of those described there. I
felt privileged to be part of it.

In the meantime, there are lessons to be absorbed, and work to be done....

One lesson of the workshop is the importance of incremental learning methods.
As one speaker observed, you can only learn things you already almost know.
The most robust learning can be expected from systems that improve their
knowledge gradually, building on what they have already learned, and using new
data to repair deficiencies and improve performance, whether it be in analogy
[Burstein, Carbonell], induction [Amarel, Dietterich & Buchanan, Holland,
Lebowitz, Mitchell], or knowledge transformation [Rosenbloom, Anderson, Lenat].
This theme reflects the related idea of learning and problem-solving as
inherent parts of each other [Carbonell, Mitchell, Rosenbloom].

Of course not everyone sees things the way I do. Here's Tom Dietterich again:
``I was surprised that you summarized the workshop in terms of an "incremental"
theme. I don't think incremental-ness is particularly important--especially
for expert system work. Quinlan gets his noise tolerance by training on a
whole batch of examples at once. I would have summarized the workshop by
saying that the key theme was the move away from syntax. Hardly anyone talked
about "matching" and syntactic generalization. The whole concern was with the
semantic justifications for some learned concept: All of the analogy folks
were doing this, as were Mitchell, DeJong, and Dietterich and Buchanan. The
most interesting point that was made, I thought, was Mitchell's point that we
need to look at cases where we can provide only partial justification for the
generalizations. DeJong's "causal completeness" is too stringent a
requirement.''
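
The contrast drawn above between incremental and batch learning can be made
concrete with a small sketch. The Python below is illustrative only and is
not taken from any system presented at the workshop: the attribute-tuple
representation, the "?" wildcard, and the function names (generalize, covers,
learn_incrementally, learn_batch) are all assumptions made for the example.
The incremental learner keeps a single conjunctive hypothesis and minimally
generalizes it whenever a new positive example is not yet covered; the batch
learner computes a hypothesis from all of the examples at once.

```python
# Hypothetical illustration: a toy conjunctive-concept learner over
# attribute tuples. Not a reconstruction of any workshop system.

WILDCARD = "?"  # assumed marker meaning "any value is acceptable here"


def generalize(hypothesis, example):
    """Minimally generalize `hypothesis` so that it covers `example`."""
    if hypothesis is None:
        # Nothing learned yet: adopt the first example as the hypothesis.
        return tuple(example)
    # Keep attributes that agree; replace disagreeing ones with a wildcard.
    return tuple(h if h == e else WILDCARD for h, e in zip(hypothesis, example))


def covers(hypothesis, example):
    """Return True if `hypothesis` already accounts for `example`."""
    if hypothesis is None:
        return False
    return all(h == WILDCARD or h == e for h, e in zip(hypothesis, example))


def learn_incrementally(positive_examples):
    """Yield the hypothesis after each example, repairing it only when needed."""
    hypothesis = None
    for example in positive_examples:
        if not covers(hypothesis, example):
            hypothesis = generalize(hypothesis, example)
        yield hypothesis


def learn_batch(positive_examples):
    """Compute one hypothesis from the whole set of examples at once."""
    examples = list(positive_examples)
    if not examples:
        return None
    # An attribute survives only if every example agrees on its value.
    return tuple(
        values[0] if len(set(values)) == 1 else WILDCARD
        for values in zip(*examples)
    )


if __name__ == "__main__":
    examples = [
        ("red", "round", "small"),
        ("red", "round", "large"),
        ("green", "round", "large"),
    ]
    for step, hypothesis in enumerate(learn_incrementally(examples), start=1):
        print(step, hypothesis)
    print("batch", learn_batch(examples))
```

For this simple representation and noise-free positive examples, the two
learners arrive at the same hypothesis; the incremental version simply exposes
each intermediate state. The sketch makes no attempt at the noise tolerance
that Dietterich attributes to batch training.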

Second, the importance of making knowledge and goals explicit is illustrated
by the progress that can be made when a learner has access to a description of
what it is trying to acquire, whether it is a criterion for the form of an
inductive hypothesis [Michalski et al.] or a formal characterization of the kind
of heuristic to be learned for guiding a search [Mitchell et al.].

Third, as Doug Lenat pointed out, continued progress in learning will require
integrating multiple methods. In particular, we need ways to combine analytic
and empirical techniques to escape from their limitations when used alone.

Finally, I think we can extrapolate from the experience of AI in the '60's
and '70's to set a useful direction for machine learning research in the '80's.
Briefly, in AI the '60's taught us that certain general methods exist and can
produce some results, while the '70's showed that large amounts of domain
knowledge are required to achieve powerful performance. The same can be said
for learning. I consider a primary goal of AI in the '80's, perhaps the
primary goal, to be the development of general techniques for exploiting domain
knowledge. One such technique is the ability to learn, which itself has proved
to require large amounts of domain knowledge. Whether we approach this goal by
building domain-specific learners (e.g. MetaDendral) and then generalizing
their methods (e.g. version space induction), or by attempting to formulate
general methods more directly, we should keep in mind that a general and robust
intelligence will require the ability to learn from its experience and apply
its knowledge and methods to problems in a variety of domains.

A well-placed source has informed me that plans are already afoot to produce
a successor to the Machine Learning book, using the 1983 workshop papers and
discussions as raw material. In the meantime, there is a small number of extra
proceedings which can be acquired (until they run out) for $27.88 ($25 + $2.88
postage in U.S., more elsewhere), check payable to University of Illinois.
Order from

June Wingler
University of Illinois at Urbana-Champaign
Department of Computer Science
1304 W. Springfield Avenue
Urbana, IL 61801

There are tentative plans for a similar workshop next summer at Rutgers.

------------------------------

End of AIList Digest
********************
