Machine Learning List: Vol. 1 No. 14
Sunday, Dec 3, 1989
Contents:
Book announcement: Connectionistic Problem Solving
IJCAI Review: representation change in induction
CFP: First Japanese Knowledge Acquisition Workshop (JKAW90)
The Machine Learning List is moderated. Contributions should be relevant to
the scientific study of machine learning. Mail contributions to ml@ics.uci.edu.
Mail requests to be added or deleted to ml-request@ics.uci.edu. Back issues
of Volume 1 may be FTP'd from /usr2/spool/ftp/pub/ml-list/V1/<N> or N.Z where
N is the number of the issue; Host ics.uci.edu; Userid & password: anonymous
- ----------------------------------------------------------------------
Subject: Book announcement: Connectionistic Problem Solving
From: Steve Hampson <hampson@ICS.UCI.EDU>
Date: Tue, 28 Nov 89 13:36:25 -0800
Connectionistic Problem Solving
Steven Hampson, UC Irvine
Birkhauser: Boston
ISBN 0-8176-3450-9
is now available from:
Birkhauser Boston, Inc. Order Dept.
c/o Springer Verlag
44 Hartz Way
Secaucus, NJ 07094
1-800-777-4643
This book develops a connectionistic model of adaptive problem solving. In
the process, various computational issues are identified that are
susceptible to formal analysis and that can be related directly to the
structures and processes involved in biological problem solving.
Abstract problem solving is a well-defined domain for the formal
investigation of intelligent behavior, and maze learning, used here as a
concrete example of problem solving, is a traditional tool for
psychologists and neuroscientists studying animal behavior. Consequently, a
substantial body of physiological and behavioral data is available to
motivate the construction of computational models. Conversely, functional
issues arising in model building help organize these often disparate
biological results. At the most general level, the book addresses issues
concerning the efficient learning and representation of categories, and the
assembly of action sequences to achieve specific goals. Various
approaches, often with complementary characteristics, are considered. An
extensive bibliography, ranging from the work of Tolman to recent
connectionist theory, provides access to relevant work in the
neural-modelling and animal learning-theory literature.
- ----------------------------------------------------------------------
Subject: IJCAI Review: representation change in induction
Date: Sat, 02 Dec 89 17:09:45 -0800
From: "David W. Aha" <aha@ICS.UCI.EDU>
Long ago, I volunteered to provide brief reviews of the four papers published
at IJCAI 1989 that were presented in the machine learning/knowledge
acquisition session on representation change in induction. Here are outlines
and brief comments on these articles. While I hope they are relatively
accurate, I invite the authors to respond with corrections and comments.
1. Learning DNF by Decision Trees
Giulia Pagallo, UC Santa Cruz
This paper introduces the FRINGE algorithm for learning decision trees. It
adaptively enlarges the given attribute set with higher-order attributes.
The purpose is to solve the _replication problem_, which arises when the
tests for whether an instance satisfies a given DNF term must be replicated
in multiple places in the d-tree.
FRINGE builds the d-tree, analyzes it to find candidates for higher-order
features, re-describes the training instances in terms of the extended
feature set, and builds another tree from them. This process repeats
until no new candidate features are found or until a maximum (parameterized)
number of features has been created. Each added feature is the Boolean
conjunction of the final two feature/value tests on the path to a positively-
classified leaf. Disjunctions arise through combinations of conjunctions
involving negated features.
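As a concrete (and purely illustrative) rendering of the loop just described,
here is a short Python sketch. It is not Pagallo's implementation: it uses
scikit-learn's DecisionTreeClassifier as a stand-in for the base tree learner
and assumes Boolean 0/1 attributes with label 1 as the positive class.

# FRINGE-style feature construction (a sketch, not Pagallo's code).
# Assumes Boolean 0/1 attributes and that label 1 is the positive class.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fringe_candidates(clf):
    """Collect the final two feature/value tests on every path that ends
    in a positively classified leaf."""
    t = clf.tree_
    found = set()

    def walk(node, path):
        if t.children_left[node] == -1:                  # leaf node
            if np.argmax(t.value[node][0]) == 1 and len(path) >= 2:
                found.add(tuple(path[-2:]))              # last two tests
            return
        f = t.feature[node]
        walk(t.children_left[node], path + [(f, 0)])     # attribute == 0 branch
        walk(t.children_right[node], path + [(f, 1)])    # attribute == 1 branch

    walk(0, [])
    return found

def fringe(X, y, max_new_features=20):
    """Iteratively add conjunctions of fringe tests as new columns of X."""
    X = np.asarray(X, dtype=float)
    added = set()
    while len(added) < max_new_features:
        clf = DecisionTreeClassifier().fit(X, y)
        fresh = fringe_candidates(clf) - added
        if not fresh:
            break                                        # no new candidates
        for ((i, vi), (j, vj)) in fresh:
            new_col = ((X[:, i] == vi) & (X[:, j] == vj)).astype(float)
            X = np.hstack([X, new_col[:, None]])
            added.add(((i, vi), (j, vj)))
    return DecisionTreeClassifier().fit(X, y), added

The per-iteration cost is dominated by re-growing the tree; the loop stops
exactly as described above, when the fringe tests yield no unseen conjunctions
or the parameterized maximum number of new features is reached.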
FRINGE and a generic d-tree algorithm were applied to 9 domains whose
target concepts are describable in DNF. The results for final classification
accuracy and tree size (number of internal nodes) are impressive: FRINGE
outperformed the other algorithm in each domain. They were also applied to
one domain where the training set's noise level (varied once across all
attributes and once for the class label) was the independent variable. FRINGE's performance
wasn't as impressive in this demonstration, but it still significantly
outperformed the other algorithm. Pagallo noted similar comparative results
in applications to one of Quinlan's thyroid disease domains, the noisy LED
Display domain, and to Schlimmer's mushroom domain, which suggests that
FRINGE performs well in applications with nominal and continuous-valued
features.
Notes and Questions:
1. Pagallo notes that the approach is applicable to learning concepts,
possibly noisy, describable by "small" DNF expressions. It won't
improve decision trees otherwise.
2. Can a variant of FRINGE work incrementally? What are the tradeoffs?
3. How does FRINGE process the missing attribute values in the thyroid
diagnosis database?
4. While FRINGE's results are given for classification accuracy and
d-tree size, no results are given for some other interesting
performance measures (e.g., number of iterations, number of constructed
features).
5. The scales in Figure 3 appear to have been mis-identified and switched,
as were the values in the lowest pair of leaf nodes in Figure 2.
2. Constructive Induction on Decision Trees
Christopher J. Matheus and Larry Rendell, University of Illinois
This paper also presents a d-tree algorithm, CITRE, that constructs
features. The authors also give a four-part framework for feature
construction. Given a set of constructor operators and an (initial) set of
features, the framework and some typical instantiations are:
1. Detection of when feature construction should be performed
A. none
B. analysis of the initial data set
C. analysis of the concept description
2. Selection of constructors (operator/feature tuples): ordering biases
A. algorithm biases
B. training set biases
C. concept-based biases
D. domain-based biases
3. Generalization of the constructors
A. none
B. constructive compilation (deductive)
C. constructive induction
4. Evaluation of the new features
A. none
B. request evaluation from teacher
C. automatically order the features and keep the best ones
The authors survey the tradeoffs involved with each option for each of
the four framework components. Their algorithm, CITRE, is similar to
FRINGE in that it iterates through the two step process of constructing
a d-tree and using it to suggest new features. However, CITRE differs in
its use of domain knowledge, feature generalization, and feature
construction. CITRE is defined as follows:
1. Detection: whenever disjuncts are found
2. Selection:
A. Propose conjoining all possible pairs of attribute-values
that lie along a path from the root to a positive leaf
B. Use simple domain knowledge to constrain the selection
3. Generalization: changing constants to variables
4. Evaluation: a user-defined maximum number of features are maintained,
where the ordering is determined from the information-theoretic gain
from using this feature to partition the _entire_ training set
CITRE stops iterating when no new features are constructed.
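As an illustration only (not the authors' code), the following Python sketch
shows how the selection step 2A and evaluation step 4 above might look;
the representation of candidates as named 0/1 columns is my assumption.

# Sketch of CITRE's selection (2A) and evaluation (4) steps; an
# illustration of the description above, not the authors' implementation.
import itertools
import numpy as np

def propose_pairs(path_tests):
    """2A: conjoin all pairs of attribute-value tests (attr, value) lying
    along one path from the root to a positive leaf."""
    return list(itertools.combinations(path_tests, 2))

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(column, y):
    """Gain from splitting the ENTIRE training set on a 0/1 feature column."""
    gain = entropy(y)
    for v in (0, 1):
        mask = column == v
        if mask.any():
            gain -= mask.mean() * entropy(y[mask])
    return gain

def keep_best(candidates, y, max_features):
    """4: retain at most max_features candidates, ordered by information
    gain over the whole training set.  candidates: dict name -> 0/1 column."""
    ranked = sorted(candidates,
                    key=lambda name: information_gain(candidates[name], y),
                    reverse=True)
    return ranked[:max_features]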
CITRE was applied to a tic-tac-toe domain, where the training instances
are examples of a winning position for "x" or "o". The generalization
and selection strategies were each shown to significantly improve
the accuracy of CITRE, but only the selection strategy resulted in a
significantly reduced number of terms considered at each iteration
(down from an average of 367 to 27). There was no significant difference
between using only the selection strategy and using both strategies
for any of the performance measures (i.e., accuracy, number of constructors
considered, and number of internal nodes). The authors also note that
CITRE improved classification accuracies in some experiments with learning
random, Boolean functions.
Notes and Questions:
1. CITRE's evaluation of the proposed features might differ if their
   information gain were computed on subsets of the training data.
How can this be performed efficiently?
2. CITRE's results were detailed only on the tic-tac-toe domain. How will
its components change for other domains? Will its selection strategy
result in far too many proposed new features? How will
the maximum number of retained features vary with regard to the
characteristics of the application domain?
3. Is most of the power of CITRE in its selection component? While the
generalization strategy resulted in improved accuracies, it did
not improve CITRE's efficiency. What domain characteristics are
required for the benefits of constructor generalization to be
realized?
4. Optimal behavior on the application domain was not described. How
close is CITRE's behavior to optimal? How well do comparable
algorithms perform in this domain?
5. CITRE may be too inefficient (propose too many new constructors) when
the selection strategy is weak, which will occur whenever the domain
knowledge is insufficient. Perhaps the criteria for proposing
new features should be more conservative in these situations.
6. Were the fringe feature conjunctions more useful than other interior
conjunctions? Under what conditions will this occur? When will
CITRE require fewer iterations than FRINGE? How many iterations did
CITRE require when applied to the tic-tac-toe domain?
3. Principled Constructive Induction
Pankaj Mehra, Larry Rendell, and Benjamin W. Wah, University of Illinois
This paper suggests using Watanabe's notion of inverted spaces to guide
the constructive induction process. In an inverted space, the examples
define the axes and the features define points in the space. The authors
attempt to explain how a hyper-cube in the inverted space can be used
to determine whether features are linear separators in the (original)
feature space. If feature f lies outside the hyper-cube in the inverted
space defined by the positive examples and inside the hyper-cube in
the inverted space defined by the negative examples, then f is a
separator of these examples.
One problem with this approach is that complex computations can result
when large numbers of examples yield inverted spaces with high
dimensionality. The authors are confident that methods for selecting
representative examples will sufficiently reduce their number to simplify
these computations.
In an inverted space, the equiparameter line equates all the examples'
values, so a feature lying near it takes maximally uniform values over
that set of examples. The goal
of this work is to define an algorithm that constructs features which (1)
lie close to the equiparameter lines in both of the inverted spaces (i.e.,
the one for positive and the one for negative instances) and (2) satisfy
the inclusion/exclusion test for the separating hyper-cube. A brief
example is given in which a feature is constructed for the Xor concept.
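My reading of the separator test, written as a small hypothetical Python
check: the threshold Theta, the convention that a feature in the inverted
space is the point of its values on that class's examples, and the particular
Xor feature below are my own assumptions, not taken from the paper. The test
is consistent with the corrected region I / region II condition the authors
give in note 4 below.

# Hypothetical sketch of the separator test (my reading, not the authors'
# code): in the inverted space a feature f is the point (f(x_1),...,f(x_n)),
# and f separates the examples when, thresholded at Theta, it takes values
# above Theta on every example of one class and at most Theta on the other.
import numpy as np

def separates(f_on_pos, f_on_neg, theta):
    """f_on_pos / f_on_neg: the feature's values on the positive / negative
    examples, i.e. its coordinates in the two inverted spaces."""
    f_on_pos = np.asarray(f_on_pos)
    f_on_neg = np.asarray(f_on_neg)
    return (((f_on_pos > theta).all() and (f_on_neg <= theta).all()) or
            ((f_on_neg > theta).all() and (f_on_pos <= theta).all()))

# Hypothetical constructed feature for Xor on instances (a, b):
# f(a, b) = a + b - 2*a*b, thresholded at theta = 0.5.
pos = [(0, 1), (1, 0)]                    # Xor-positive instances
neg = [(0, 0), (1, 1)]                    # Xor-negative instances
f = lambda a, b: a + b - 2 * a * b
print(separates([f(*x) for x in pos], [f(*x) for x in neg], theta=0.5))  # True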
Notes and Questions:
1. No empirical evidence is given suggesting that their approach will
efficiently learn useful features.
2. This is an exciting approach, derived from the pattern recognition
literature, that proposes to tackle several problems simultaneously.
If it proves successful, it may lead to a highly sophisticated and
mathematically insightful means for inducing features.
3. Why does the hyper-rectangle always point downwards? What are the
reasons why the hyper-rectangle's vertex is fixed at
(Theta,Theta,...,Theta)?
4. The authors pointed out one mistake in their paper: What was said in
the paper (at different points) is that (i) we are interested in
features that lie at small angles to the equiparameter line; and,
(ii) we are interested in features that lie inside the hyper-cube in 1
space and outside it in another.
Consider the following picture:
|
IV | II
|
--------------(theta,theta)------
|
|
| III
I |
|
Features that lie in regions of type I in one inverted space and in
regions of type II in the other GUARANTEE separability. In general,
nothing can be said about features that lie inside/outside the cube in
one space and outside/inside the corresponding cube in the other space
(for example, features in regions III and IV). (Interested readers can
contact mehra@aquinas.csl.uiuc.edu for further details.)
5. Figure 5: E- space is on the left and E+ is on the right.
4. Improving Efficiency by Learning Intermediate Concepts
James Wogulis and Pat Langley, UC Irvine
This paper introduces RINCON (Retaining INtermediate CONcepts), which
deductively extends domain theories from examples with the goal of
maximizing classification efficiency. Their thesis is that
efficiency benefits result from saving intermediate concepts. All
instances and concepts are represented as logical implications.
These are partially ordered by generality in a hierarchy which allows
nodes to have multiple parents. RINCON incrementally updates this
hierarchy when it processes an instance. When an instance does not
exactly match some node, a search is made for the least upper bounds
(LUB) of the instance and the set of most general concepts that do
not match it. A LUB is chosen (from this set of LUBs) that can be
used to re-express the greatest number of concepts. The chosen LUB and
the instance, re-expressed in terms of that LUB, are then
added to the hierarchy. All concepts/nodes in the hierarchy that can
be simplified by using the newly found LUB are re-expressed in terms of it.
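A highly simplified, propositional illustration of this update (my own
reduction, not the authors' code; RINCON itself works on relational
descriptions): treat instances and concepts as sets of literals, take the
LUB of two conjunctions to be their shared literals, and re-express every
concept that contains the chosen LUB.

# Highly simplified, propositional sketch of the RINCON update described
# above (my own reduction, not Wogulis & Langley's implementation, which
# handles relational data).  Concepts and instances are sets of literals;
# the LUB of two conjunctions is taken to be their shared literals.
import itertools

_names = itertools.count()

def rincon_step(theory, instance):
    """theory: dict mapping concept name -> frozenset of literals."""
    instance = frozenset(instance)
    if instance in theory.values():
        return theory                        # exact match: nothing to add
    # candidate LUBs: literals shared between the instance and each concept
    lubs = [body & instance for body in theory.values() if body & instance]
    if lubs:
        # choose the LUB that can be used to re-express the most concepts
        lub = max(lubs, key=lambda l: sum(l <= body for body in theory.values()))
        name = "concept_%d" % next(_names)
        theory[name] = lub
        for cname, body in list(theory.items()):
            if lub < body:                   # simplify concepts containing the LUB
                theory[cname] = frozenset((body - lub) | {name})
        instance = frozenset((instance - lub) | {name})
    theory["instance_%d" % next(_names)] = instance
    return theory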
The authors summarize two empirical studies, where the dependent variable
is the amount of work required to match or reject an instance, as measured
by a "join" (which occurs when two lists of bindings are combined). They
show that the flat domain theory, in which no learning takes place, is less
efficient at classifying instances in two applications than is the theory
which results from learning intermediate concepts, both for processed and
new instances.
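For readers unfamiliar with the cost measure, here is a hedged illustration
of a "join" as I understand it (my assumption about the matcher, not the
authors' code): two lists of variable bindings are combined, keeping only
the pairs that agree on shared variables.

# Illustration of a "join" of two binding lists: combine bindings that
# agree on their shared variables, dropping inconsistent pairs.
def join(bindings_a, bindings_b):
    """Each argument is a list of dicts mapping variables to values."""
    return [{**a, **b}
            for a in bindings_a for b in bindings_b
            if all(a[v] == b[v] for v in a.keys() & b.keys())]

# Example: only the consistent combination survives.
print(join([{"x": "a", "y": "b"}, {"x": "c", "y": "d"}], [{"x": "a"}]))
# -> [{'x': 'a', 'y': 'b'}]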
RINCON's behavior can be viewed as an inverse of EBL. While EBL
yields operational concept descriptions given domain theories, RINCON
yields domain theories given operational concept descriptions
(instances). RINCON is a purely deductive, incremental learning
algorithm which works with relational data. It updates the nodes in
its domain theory as new intermediate concepts are found.
Notes and Questions:
1. It's odd that new instances are, on the average, classified more
efficiently than processed instances. This seems to be what Figure 2
shows. However, this interpretation is wrong. The lines for the
unseen instances are simply the time required to FAIL to match.
(Recall that, since RINCON doesn't infer beyond its data, it will
match on only previously processed instances.)
2. Is there a point at which too many intermediate concepts result in
a decrease in efficiency?
3. Under what conditions can RINCON be expected to result in increased
matching efficiency?
4. An evaluation function is needed to judge the utility of the
intermediate concepts to determine which ones are worth saving.
5. How should RINCON learn terms other than conjuncts?
6. The authors plan to use RINCON in a combined empirical/analytic
learning algorithm. How will inductions be performed? How will
the resulting concepts be evaluated?
7. The concept hierarchy's internal nodes are always conjunctions.
However, the highest-level nodes can be disjunctive. The authors plan
to develop extensions of RINCON that allow internal nodes to be disjunctive.
- ----------------------------------------------------------------------
From: motoda@harl86.harl.hitachi.co.jp (Motoda Hiroshi)
Newsgroups: news.announce.conferences
Subject: CFP: First Japanese Knowledge Acquisition Workshop (JKAW90)
Date: 30 Nov 89 01:00:38 GMT
Call for Participation
FIRST JAPANESE KNOWLEDGE ACQUISITION FOR KNOWLEDGE-BASED SYSTEMS
WORKSHOP
Co-Sponsored by
Advanced Research Laboratory, Hitachi, Ltd.
Kansai Institute of Information System
In Cooperation with
American Association for Artificial Intelligence
Information Processing Society of Japan
Japanese Society for Artificial Intelligence
Japan Society for Software Science and Technology
The Institute of Electronics, Information and Communication
Engineers
Kyoto International Conference Hall (Kyoto)
October 25 - 26, 1990
Advanced Research Laboratory, Hitachi Ltd. (Tokyo)
October 29 - 31, 1990
A problem in the process of building knowledge-based systems is
acquiring and modeling appropriate problem-solving knowledge. The
objective of this workshop is to assemble theoreticians and
practitioners of AI who recognize the need for developing methods and
systems that assist the knowledge acquisition process.
The workshop will be in two parts: a two-day open meeting in Kyoto and
a three-day closed workshop in Tokyo. To encourage vigorous interaction
and exchange of ideas the closed workshop will be kept small - about
40 participants, one author for each paper accepted. Some papers will
be presented at the open meeting and the remainder in the closed
workshop. There will be tutorial and invited-talk sessions in the open
workshop.
Papers are invited for consideration in all aspects of knowledge
acquisition for knowledge-based systems, including (but not restricted
to):
o Transfer/modeling of expertise - systems that obtain and model
knowledge from experts.
o Transfer/modeling of expertise - manual knowledge acquisition
methods and techniques.
o Apprenticeship, explanation-based, and other learning systems;
integration of such systems with other knowledge acquisition
techniques.
o Methods for capturing design knowledge and requirements.
o Issues in cognition and expertise that affect the knowledge
acquisition process.
o Extracting and modeling of knowledge from text.
o Eliciting and modeling knowledge from multiple sources.
o Integration of knowledge acquisition techniques within a single
system; integration of knowledge acquisition systems with other
systems (hypermedia, database management systems, simulators,
spreadsheets...).
o Knowledge acquisition methodology and training.
o Validation of knowledge acquisition techniques; the role of
knowledge acquisition techniques in validating knowledge-based
systems.
Five copies of a draft paper (up to 20 pages) should be sent to
Hiroshi Motoda before February 28th, 1990. Acceptance notices will be
mailed by May 30th. Camera-ready copies should be returned before
August 15th. A preprint volume will be distributed at the workshop.
There will be travel-and-expense awards for the best papers submitted
by students from overseas, covering part of their travel expenses.
Please indicate whether your paper should be considered for this
award.
Workshop Co-chairmen:
John Boose Brian Gaines
Advanced Technology Center Department of Computer Science
Boeing Computer Services University of Calgary
john@boeing.com gaines@calgary.cdn
Hiroshi Motoda Riichiro Mizoguchi
Advanced Research Laboratory Institute of Scientific and
Hitachi, Ltd. Industrial Research
Kokubunji, Tokyo 185, Japan Osaka University
motoda@harl.hitachi.co.jp miz@ei.sanken.osaka-u.ac.jp
International Program Committee
Tom Addis, University of Reading, UK
Guy Boy, Centre d'Etudes et de Recherches de Toulouse, France & NASA AMES
Jeffrey Bradshaw, Boeing Computer Services
B. Chandrasekaran, Ohio State University
William Clancey, Institute for Research on Learning, CA
Jean-Gabriel Ganascia, University Pierre et Marie Curie, France
Thomas Gruber, Stanford University
Koichi Hori, University of Tokyo
Nancy Johnson, Brunel University, UK
Georg Klinker, Digital Equipment Corp.
Shigenobu Kobayashi, Tokyo Institute of Technology
Yves Kodratoff, CNRS, France, & George Mason University
Marc Linster, GMD, Bonn, Germany
John McDermott, Digital Equipment Corporation
Ryszard Michalski, George Mason University
Katharina Morik, GMD, Bonn, Germany
Toyoaki Nishida, Kyoto University
Mark Musen, Stanford University
Bruce Porter, University of Texas at Austin
Ross Quinlan, Sydney University, Australia
Alain Rappaport, Neuron Data, USA
Mildred Shaw, University of Calgary
Hirokazu Taki, Institute for New Generation Computer Technology
Masanobu Watanabe, NEC Corporation
Bob Wielinga, University of Amsterdam, Holland
- ----------------------------------------------------------------------
END of ML-LIST 1.14