Machine Learning List: Vol. 3 No. 12
Thursday, July 25, 1991
Contents:
Comments on Dietterich's AAAI 91 Talk (and bibliography)
MACHINE LEARNING AND KNOWLEDGE ACQUISITION subsection of IEEE CompEuro 92
First International TC-12 Working Conference on Low Quality Information (LQI)
The Machine Learning List is moderated. Contributions should be relevant to
the scientific study of machine learning. Mail contributions to ml@ics.uci.edu.
Mail requests to be added or deleted to ml-request@ics.uci.edu. Back issues
may be FTP'd from ics.uci.edu in /usr2/spool/ftp/pub/ml-list/V<X>/<N> or N.Z
where X and N are the volume and number of the issue; ID & password: anonymous
------------------------------
Date: Tue, 23 Jul 91 11:49:25 PDT
From: Wray Buntine <wray@ptolemy.arc.nasa.GOV>
Subject: Comments on Dietterich's AAAI 91 Talk (and bibliography)
Following Tom Dietterich's talk at AAAI, I thought the community might
like to get an annotated bibliography for some of the topics he mentioned.
Below I mention:
cross validation
minimum encoding
averaging multiple models
I've spent a great deal of time on the implementation and theory of these
three topics, and Tom only had time to give a tiny and somewhat
skewed selection.
Wray Buntine
NASA Ames Research Center phone: (415) 604 3389
Mail Stop 244-17 fax: (415) 604 6997
Moffett Field, CA, 94035 email: wray@ptolemy.arc.nasa.gov
CROSS-VALIDATION (and other resampling techniques)
================
Tom gave this a big plug. It is traditionally used for two tasks:
(1) estimating subsequent prediction error ("out-of-sample")
(2) coping with over-fitting
My experience is that these techniques are really good at (1), but I have
found better ways of doing (2), for instance doing a proper job of
"averaging multiple models" (another topic Tom briefly mentioned).
Some references are given below. The experiments showing
how other methods are sometimes better at (2) are described in (Buntine 1990).
WARNING the theory of cross-validation and similar techniques only
says they give "near unbiased estimates" of (1),
which doesn't mean they are "optimal" in any sense;
and the theory gives no support that they do (2) well
"on average"; so don't be confused into thinking they are
"optimal" methods
Nevertheless, cross-validation is a neat trick that is framed in
a manner such that it can be applied to just about any empirical learning
problem. I use it as a bench-mark technique: although I think better
methods exist, I know from experience that cross-validation works pretty well.
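Since Tom's talk was short on mechanics, here is a minimal Python sketch
of k-fold cross-validation for task (1). The learner interface, a fit()
function returning a model and an err() function scoring it on held-out
data, is a hypothetical placeholder, not anything from the references
below.

import random

def cross_validate(data, fit, err, k=10, seed=0):
    """Return the k-fold cross-validation estimate of prediction error."""
    data = list(data)
    random.Random(seed).shuffle(data)         # random partition into k folds
    folds = [data[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        held_out = folds[i]
        training = [x for j, fold in enumerate(folds)
                    if j != i for x in fold]
        model = fit(training)                 # train on the other k-1 folds
        errors.append(err(model, held_out))   # test on the held-out fold
    return sum(errors) / k                    # near-unbiased estimate of (1)

Each example is held out exactly once, so the averaged error is a near
unbiased estimate of out-of-sample error, which is exactly the sense in
which the theory supports the technique.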
The following paper is a mildly mathematical introduction to cross-validation,
the jack-knife, and the boot-strap. Cross-validation is also used
to prune trees in CART, but the description in CART is harder to understand.
NB. the jack-knife and boot-strap are related resampling procedures
@article{efron:siam,
AUTHOR = "B.~Efron",
TITLE = "Computers and the theory of statistics:
thinking the unthinkable",
YEAR = 1979,
JOURNAL = "SIAM Review",
VOLUME = 21,
NUMBER = 4,
PAGES = "460--480"}
This is probably the best introduction for the machine learning audience.
This book also gives some simplistic guidelines for choosing between
cross-validation/resampling, etc. on page 38. The authors should be
embarrassed that their magic numbers of 100 and 50 are independent
of the learning problem and its inherent complexities. Tsk, tsk.
@book{weiss.kulik:91,
AUTHOR = "S.M.~Weiss and C.A.~Kulikowski",
TITLE = "Computer Systems That Learn",
PUBLISHER = "Morgan-Kaufmann",
YEAR = "1991" }
Cullen Schaffer has a very nice paper appearing in the
IJCAI-91 Workshop that discusses the topic in a machine learning context.
A good paper to get you thinking. He argues cross-validation is just
another way of encoding a bias.
@inproceedings{schaffer:ijw,
AUTHOR = "C.~Schaffer",
TITLE = "Overfitting avoidance as bias",
YEAR = 1991,
BOOKTITLE = "IJCAI-91 Workshop on
Evaluating and Changing Representation in Machine Learning",
ADDRESS = "Sydney",
NOTE = "To appear" }
I did some comparative experiments a few years ago comparing
cross-validation with things like Bayesian methods, MDL, and a
reimplementation of Quinlan's C4 on the task of decision tree pruning.
This appears in my thesis and is currently being revised for a
journal. An advance copy is:
@techreport{buntine:trees,
AUTHOR = "W.L. Buntine",
TITLE = "Learning Classification Trees",
YEAR = 1990,
INSTITUTION = "RIACS and NASA Ames Research Center",
NUMBER = "FIA-90-12-19-01",
ADDRESS = "Moffett Field, CA",
NOTE = "Paper presented at Third International Workshop
on Artificial Intelligence and Statistics" }
MINIMUM ENCODING METHODS
========================
Tom briefly mentioned these before dismissing them. In these
methods, the coding scheme for the hypotheses corresponds to a prior
in the Bayesian sense, and the schemes are usually viewed mathematically
as approximate Bayesian methods.
Using a different coding scheme gives you a different answer, so you
have to think carefully about your choice. One way to alleviate this is
to use a scheme of the form:
\[
\mathrm{cost}(\mathrm{data} \mid \mathrm{hypothesis}) + A \cdot \mathrm{cost}(\mathrm{hypothesis})
\]
and to use fancy methods to pick an appropriate value for the
tradeoff parameter A.
This is standard practice in regularization, Bayesian, and GCV
(generalized cross validation) methods.
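As an illustration, here is a minimal Python sketch of hypothesis
selection with this two-part cost. The likelihood() and code_len()
interfaces are hypothetical placeholders standing in for whatever coding
scheme you pick; they are not taken from any of the papers below.

import math

def two_part_cost(data, h, likelihood, code_len, A=1.0):
    """Two-part encoding cost: data bits plus A times hypothesis bits.

    likelihood(h, x) returns P(x | h) for one example; code_len(h)
    returns the description length of h in bits (assumed interfaces).
    """
    data_bits = -sum(math.log2(likelihood(h, x)) for x in data)
    return data_bits + A * code_len(h)

def minimum_encoding_choice(hypotheses, data, likelihood, code_len, A=1.0):
    # Minimum encoding selection: pick the hypothesis of least total cost.
    return min(hypotheses,
               key=lambda h: two_part_cost(data, h, likelihood, code_len, A))

With A = 1 this is the usual minimum encoding score; varying A trades off
fit against hypothesis complexity, which is where the fancy methods for
choosing A come in.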
Quinlan and Rivest allude to this tradeoff in section 8 of their paper but
don't appear to be familiar with the literature. (I only found this
out recently myself!)
NB. with the parameter A used above, minimum encoding is beginning
to look like cross-validation in its tree implementation
NB. with the parameter A used above, the scheme loses its neat
"coding" interpretation, but this is no loss, since the coding story
is just embellishment intended to appeal to the information theoreticians
When done properly, these methods work well.
WARNING Minimum encoding methods are quite widely misapplied and
misinterpreted. Of course the state of the theoretical literature
hasn't really helped here. Some people forget that code-lengths
originate as negative "log-likelihoods", some people forget that noise-free
data has zero code length because its likelihood is 1.0, and some
people forget that there is no inherently correct "coding scheme",
so we really have to think carefully about the one we pick.
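The correspondence behind that warning, written out: a code length is a
negative log-likelihood,
\[
L(\mathrm{data} \mid h) = -\log_2 P(\mathrm{data} \mid h),
\]
so noise-free data, with P(data | h) = 1, costs -log2 1 = 0 bits.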
NOTE There are a whole bunch of good minimum encoding papers in
recent AI proceedings (AAAI'90,'91, ML'91, etc.). In general you
may be better off looking to earlier work outside of AI
if you want to learn the basics of these methods.
Here is the Quinlan and Rivest paper that Pednault and Dietterich
mentioned at AAAI. If you wish to read this in detail, please get the
next paper too, and perhaps look at (Buntine 1990) as well, to get the
whole cross-section.
@article{quinlan.rivest:ic,
AUTHOR = "J.R. Quinlan and R.L. Rivest",
TITLE = "Inferring Decision Trees Using the Minimum Description
Length Principle",
JOURNAL = "Information and Computation",
YEAR = "1989",
VOLUME = 80,
PAGES = "227--248" }
Here is a paper listing modifications/corrections to Quinlan and Rivest's paper
from some of the founders of the "minimum encoding" field.
@techreport{wallace:trees,
AUTHOR = "C.S. Wallace and J.D. Patrick",
TITLE = "Coding Decision Trees",
YEAR = "1991",
INSTITUTION = "Monash University",
ADDRESS = "Melbourne",
TYPE = "Technical Report",
NUMBER = "151" }
This is a good theoretical overview, but pretty heavy going.
@article{barron.cover:it,
AUTHOR = "A.R.~Barron and T.M.~Cover",
TITLE = "Minimum Complexity Density Estimation",
YEAR = 1991,
JOURNAL = "IEEE Trans. on IT",
VOLUME = "??",
NOTE = "I suspect it should have appeared, my copy is:
Univ. of Illinois, Dept.\ of Stat.\ TR~28, 1989" }
There is a somewhat theoretical but much diluted and more accessible
discussion in sections 6.2 and 6.3, with a comparatively simple
derivation of the various methods.
@techreport{buntine.weigend:bbp,
AUTHOR = "W.L. Buntine and A.S. Weigend",
TITLE = "Bayesian Back-Propagation",
YEAR = 1991,
INSTITUTION = "NASA Ames Research Center",
NUMBER = "FIA-91-22",
NOTE = "Submitted, also available on the neuroprose archive" }
AVERAGING MULTIPLE MODELS (and redundant knowledge)
=========================
This seems to be becoming a trend at COLT'91. Historians and Tom might
note I finished writing my thesis on this same topic at the end of '89,
using the same theory and implementing it in a real system.
It turns out statisticians have been doing similar things too, and some
related ideas have appeared elsewhere. In my biased view this is a
hot topic because it actually includes things like minimum encoding
methods, etc., as a special case (see Buntine & Weigend '91).
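For concreteness, here is a minimal Python sketch of prediction by
averaging multiple models, weighting each model by its (unnormalized)
posterior. The log_posterior() and predict_prob() interfaces are
hypothetical placeholders, not taken from any of the papers below.

import math

def average_models(models, x, log_posterior, predict_prob):
    """Posterior-weighted average of class probabilities over models.

    log_posterior(m) gives the unnormalized log posterior of model m;
    predict_prob(m, x) maps input x to a dict of class probabilities
    (both assumed interfaces).
    """
    log_ws = [log_posterior(m) for m in models]
    top = max(log_ws)                         # subtract max for stability
    ws = [math.exp(lw - top) for lw in log_ws]
    total = sum(ws)
    mixed = {}
    for m, w in zip(models, ws):
        for cls, p in predict_prob(m, x).items():
            mixed[cls] = mixed.get(cls, 0.0) + (w / total) * p
    return mixed

Picking a single "best" model corresponds to putting all the weight on one
term of the sum; averaging instead hedges across the models the data
cannot distinguish, which is where the gains over simple pruning come from.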
Applies "averaging multiple models" to arbitrary boolean
formula in a noise free environment. I also handled
conjunctive rules in my thesis.
@inproceedings{buntine:ijcai89,
AUTHOR = "W.L. Buntine",
TITLE = "A Critique of the {V}aliant Model",
YEAR = 1989,
BOOKTITLE = IJCAI,
PAGES = "837--842",
PUBLISHER = "Morgan Kaufmann",
ADDRESS = "Detroit" }
I implemented a decision tree learning package that
approximates "averaging multiple models" on real data with noise, etc.
It is described here and also in (Buntine 1990) above.
@phdthesis{buntine:thesis,
AUTHOR = "W.L. Buntine",
SCHOOL = "University of Technology, Sydney",
TITLE = "A Theory of Learning Classification Rules",
YEAR = 1991 }
Applies "averaging multiple models" to linear perceptrons in a noise-free
environment. Implementation in real (i.e. noisy) environments
isn't easy, though; see my own work above. See the next paper too.
@inproceedings{haussler:percept,
AUTHOR = "M. Opper and D. Haussler",
TITLE = "Generalised Performance of {B}ayes Optimal Classification
Algorithm for Learning a Perceptron",
BOOKTITLE = "{COLT}'91: 1991 Workshop on Computational
Learning Theory",
PUBLISHER = "Morgan Kaufmann",
YEAR = "1991",
NOTE = "To appear." }
Applies "averaging multiple models" to linear perceptrons in a noise free
environment. See the next paper too.
@inproceedings{haussler.kearns:colt,
AUTHOR = "D. Haussler and M. Kearns and R.E. Schapire",
TITLE = "Unifying Bounds on the Sample Complexity of {B}ayesian
Learning Using Information Theory and the {VC} Dimension",
BOOKTITLE = "{COLT}'91: 1991 Workshop on Computational
Learning Theory",
PUBLISHER = "Morgan Kaufmann",
YEAR = "1991",
NOTE = "To appear." }
An example of the same ideas in statistics, applied to logistic
regression (i.e. rather like a single-layer perceptron with
sigmoid activation in a noisy environment), except this one is implemented.
@article{stewart:stat,
AUTHOR = "L. Stewart",
TITLE = "Hierarchical Bayesian analysis using Monte Carlo
integration: computing posterior distributions
when there are many possible models",
YEAR = 1987,
JOURNAL = "The Statistician",
VOLUME = 36,
PAGES = "211--219" }
Here's some machine learning work on redundant knowledge. They also have a
recent paper in the International Journal of Man-Machine Studies.
@inproceedings{gams:ewsl89,
AUTHOR = "M. Gams",
TITLE = "New Measurements Highlight the
Importance of Redundant Knowledge",
YEAR = 1989,
BOOKTITLE = "Proceedings of the Fourth European Working Session on Learning",
ADDRESS = "Montpellier",
EDITOR = "K. Morik",
PUBLISHER = "Pitman Publishing",
PAGES = "71--80"}
Just to show you how pervasive the idea of redundant knowledge and
multiple models is, here is some work from philosophers in a social
science setting.
@article{spirtes:smr,
AUTHOR = "P. Spirtes and R. Scheines and C. Glymour",
TITLE = "Simulation Studies of the Reliability of Computer-Aided
Model Specification Using {TETRAD II}. {EQS} and {LISREL}
Programs",
JOURNAL ="Socialogical Methods and Research",
YEAR ="1990",
VOLUME = 19,
NUMBER = 1,
PAGES = "3--66" }
[THANKS WRAY- I'd like to hear other reactions to or summaries of AAAI and MLW
-MP]
------------------------------
Date: Tue, 23 Jul 91 09:33:08 +0200
From: Luc De Raedt <lucdr@cs.kuleuven.ac.be>
Subject: MACHINE LEARNING AND KNOWLEDGE ACQUISITION subsection of IEEE CompEuro 92
COMPEURO-92
IEEE 1992 International Conference on Computer Systems and Software Engineering.
4-8 May, 1992, The Hague, The Netherlands
===============================================================================
This conference, whose main themes are Computers and Optimization, Parallel
Processing, and Computational Algebra, organizes subsections on
MACHINE LEARNING AND KNOWLEDGE ACQUISITION, COMPUTATIONAL LOGIC / LOGIC
PROGRAMMING, optimization and operations research, theory of computation,
parallel processing, databases, numerical mathematics, data security,
and computer graphics.
**********************************************************
* *
* subsections : *
* *
* COMPUTATIONAL LOGIC and *
* MACHINE LEARNING and KNOWLEDGE ACQUISITION. *
* *
**********************************************************
COMPUTATIONAL LOGIC / LOGIC PROGRAMMING
The subsection on COMPUTATIONAL LOGIC / LOGIC PROGRAMMING
features state-of-the-art lectures by John W. Lloyd, Manuel Hermenegildo and
Pascal Van Hentenryck as well as a number of technical sessions on the topic.
To organize the latter, a subcommittee for handling research papers
in the area of COMPUTATIONAL LOGIC and LOGIC PROGRAMMING has been created.
It consists of
Maurice Bruynooghe (Belgium, Chair)
Francois Bry (Germany)
Pierre Deransart (France)
Mehmet Dincbas (France)
Seif Haridi (Sweden)
Luis M. Pereira (Portugal)
MACHINE LEARNING AND KNOWLEDGE ACQUISITION
The subsection on MACHINE LEARNING AND KNOWLEDGE ACQUISITION
features state-of-the-art lectures by Yves Kodratoff and Stephen Muggleton
(tentative) as well as a number of technical sessions on the topic.
To organize the latter, a subcommittee for handling research papers
in the area of MACHINE LEARNING AND KNOWLEDGE ACQUISITION has been created.
It consists of
Francesco Bergadano (Italy)
Luc De Raedt (Co-chair, Belgium)
Paul Fischer (Germany)
Nada Lavrac (Yugoslavia)
Celine Rouveirol (France)
Yves Willems (Co-Chair, Belgium)
The MACHINE LEARNING AND KNOWLEDGE ACQUISITION subtrack of CompEuro
is concerned with all aspects of the field. This includes symbolic
learning, neural network learning, learnability theory, and
knowledge acquisition.
Authors interested in submitting a paper should send 5 copies of an
original research paper to the COMPEURO-92 Program Chair, Patrick Dewilde,
at
IEEE CompEuro 92
Prof. P. Dewilde
Delft University of Technology
Dept. of EE
POB. 5031, 2600 GA Delft
The Netherlands.
before 1 November 1991.
Authors in the subfield of COMPUTATIONAL LOGIC / LOGIC PROGRAMMING should also
send a copy of their abstract to
IEEE CompEuro 92
Maurice Bruynooghe
Katholieke Universiteit Leuven
Dept. of Computer Science,
Celestijnenlaan 200A
B-3001 Heverlee
Belgium
email : maurice@cs.kuleuven.ac.be
preferably before October 25, 1991 by email.
Authors in the subfield of MACHINE LEARNING AND KNOWLEDGE ACQUISITION should
also send a copy of their abstract to
IEEE CompEuro 92
Luc De Raedt
Katholieke Universiteit Leuven
Dept. of Computer Science,
Celestijnenlaan 200A
B-3001 Heverlee
Belgium
email : lucdr@cs.kuleuven.ac.be
preferably before October 25, 1991 by email.
Each manuscript should include
a cover page stating the names, affiliations and addresses of
all authors together with the identification, telephone and fax
numbers of the principal author. In addition, the cover page
should include the following signed statement :
"All appropriate clearances for the publication of this
paper have been obtained and if accepted, the authors
will prepare the final manuscript in time for inclusion
in the conference proceedings and will present the paper
at the conference."
Papers should be no longer than twelve double spaced A4 pages, including
illustrations, figures and references.
Notification of acceptance / rejection is by January 15, 1992.
Final manuscripts are due by 1 February 1992.
Ideally, the papers submitted to COMPEURO 92 should be of interest
and accessible to a wide audience. Papers on applications are also
welcome.
------------------------------
Subject: IFIP LQI Workshop CFP
Date: Thu, 18 Jul 91 16:56:13 +0100
From: jq@crim.fr
IFIP
Call for papers
First International TC-12 Working Conference on Low Quality Information (LQI)
Organized by WG 12.3, Problem-Solving
MONTPELLIER (FRANCE)
March 11-13, 1992
Theme of the Conference
Processing Low Quality Information (LQI) is an emerging area
of information processing. This conference focuses on symbolic
(i.e. not statistical) processing of LQI, with emphasis on
problem-solving.
Until now, most of the information processed by computers has been
high quality information. In fact, much effort is expended to verify
the accuracy, integrity and completeness of information in present
information systems.
Information is of low quality when it is incomplete, ambiguous,
uncertain, of unknown accuracy, etc. LQI is often provided by
informants who have only partial knowledge, are not always
reliable, may be biased, have (hidden) goals which they want to
further, show inventiveness to hide ignorance or embarrassment, etc.
Certainly, the great majority of the information circulating in
our world is LQI. Present information technology requires
high quality information and is usually unable to support the
many applications which would use the largely prevalent LQI.
By contrast, the mining industry has learned to mine ores of
increasingly lower quality, using advances in technology.
Improvements in hardware, software and AI methods will likewise make it
possible to process LQI, thereby opening up enormous domains to
information technology, such as the experimental sciences
(Molecular Biology, Chemistry, ...), and providing efficient
help for discovery.
Computers must learn to process LQI!
Time Table
May 15, 1991
Call for papers
September 30, 1991
Submission deadline
December 1st, 1991
Notification of acceptance/rejection
March 11-13, 1992
Conference at University of Montpellier II
Suggested topics of the conference
The following is a list of suggested topics for the conference, not
meant to be exhaustive. All these topics are to be considered in the
context of LQI, or as occurring in an LQI environment.
*Acquisition of information
*Agreement, compromise and consensus
*Communication between sources
*Decision making
*Detection and correction (of LQI)
*Improving the quality of LQI at acquisition and processing time
*(LQI) environments
*Multiple informant systems and sources
*Presentation (of LQI)
*Question answering and reasoning
National Organizing Committee
Dr J. Quinqueton (Chairperson) (F)
Dr J. Sallantin (co-chairperson) (F)
Pr L. Siklossy (NL)
Ir P.A. Flach (NL)
Pr E. Diday (F)
Dr C. De Sainte Marie (F)
International Program Committee
Pr L. Siklossy (Chairperson) (NL)
Dr A.J. Baborski (PL)
Pr L. Bolc (PL)
Dr E. Knuth (H)
Dr Y. Kodratoff (F)
Pr R.S. Michalski (USA)
Pr I. Plander (CS)
Pr A. Porto (P)
Dr N. Prakash (India)
Dr Z. Ras (USA)
Dr P. Revay (S)
Dr J.C. Simon (F)
Dr J. Sowa (USA)
Pr T. Vamos (H)
Pr W Wahlster (FRG)
Information on IFIP WG 12.3 can be obtained from the Chairman,
Prof L. Siklossy, Postbus 71710,
NL-1008 DE Amsterdam
Aims of the conference
This Working Conference aims to bring together those who have
worked in or are interested in symbolic processing of LQI by
computer.
At this Working Conference, the present approaches to processing
LQI will be reviewed, and promising future approaches will be
presented and discussed.
The emphasis of the conference is on problem-solving aspects,
but the conference wants to maintain an interdisciplinary
nature.
------------------------------
END of ML-LIST 3.12