Machine Learning List: Vol. 3 No. 12
Thursday, July 25, 1991
Contents:
Comments on Dietterich's AAAI 91 Talk (and bibliography)
MACHINE LEARNING AND KNOWLEDGE ACQUISITION subsection of IEEE CompEuro 92
First International TC-12 Working Conference on Low Quality Information (LQI)
The Machine Learning List is moderated. Contributions should be relevant to
the scientific study of machine learning. Mail contributions to ml@ics.uci.edu.
Mail requests to be added or deleted to ml-request@ics.uci.edu. Back issues
may be FTP'd from ics.uci.edu in /usr2/spool/ftp/pub/ml-list/V<X>/<N> or N.Z
where X and N are the volume and number of the issue; ID & password: anonymous
------------------------------
Date: Tue, 23 Jul 91 11:49:25 PDT
From: Wray Buntine <wray@ptolemy.arc.nasa.GOV>
Subject: Comments on Dietterich's AAAI 91 Talk (and bibliography)
Following Tom Dietterich's talk at AAAI, I thought the community might
like to get an annotated bibliography for some of the topics he mentioned.
Below I mention:
cross validation
minimum encoding
averaging multiple models
I've spent a great deal of time on the implementation and theory of these
three topics, and Tom only had time to give a tiny and somewhat
skewed selection.
Wray Buntine
NASA Ames Research Center phone: (415) 604 3389
Mail Stop 244-17 fax: (415) 604 6997
Moffett Field, CA, 94035 email: wray@ptolemy.arc.nasa.gov
CROSS-VALIDATION (and other resampling techniques)
================
Tom gave this a big plug. It is traditionally used for two tasks:
(1) estimating subsequent prediction error ("out-of-sample")
(2) coping with over-fitting
My experience is that these techniques are really good at (1), but I have
found better ways of doing (2), for instance doing a proper job of
"averaging multiple models" (another topic Tom briefly mentioned).
Some references are given below. The experiments showing
how other methods are sometimes better at (2) are described in (Buntine 1990).
WARNING the theory of cross-validation and similar techniques only
says they give "near unbiased estimates" of (1),
which doesn't mean they are "optimal" in any sense;
and the theory gives no support that they do (2) well
"on average"; so don't be confused into thinking they are
"optimal" methods
Nevertheless, cross-validation is a neat trick that is framed in
a manner such that it can be applied to just about any empirical learning
problem. I use it as a bench-mark technique: although I think better
methods exist, I know from experience that cross-validation works pretty well.
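Since Tom's talk was short on mechanics, here is a minimal Python sketch
of k-fold cross-validation for task (1). The learner interface, a fit()
function returning a model and an err() function scoring it on held-out
data, is a hypothetical placeholder, not anything from the references
below.

import random

def cross_validate(data, fit, err, k=10, seed=0):
    """Return the k-fold cross-validation estimate of prediction error."""
    data = list(data)
    random.Random(seed).shuffle(data)         # random partition into k folds
    folds = [data[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        held_out = folds[i]
        training = [x for j, fold in enumerate(folds)
                    if j != i for x in fold]
        model = fit(training)                 # train on the other k-1 folds
        errors.append(err(model, held_out))   # test on the held-out fold
    return sum(errors) / k                    # near-unbiased estimate of (1)

Each example is held out exactly once, so the averaged error is a near
unbiased estimate of out-of-sample error, which is exactly the sense in
which the theory supports the technique.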
The following paper is a mildly mathematical introduction to cross-validation,
the jack-knife, and the boot-strap. Cross-validation is also used
to prune trees in CART, but the description in CART is harder to understand.
NB. the jack-knife and boot-strap are related resampling procedures
@article{efron:siam,
AUTHOR = "B.~Efron",
TITLE = "Computers and the theory of statistics:
thinking the unthinkable",
YEAR = 1979,
JOURNAL = "SIAM Review",
VOLUME = 21,
NUMBER = 4,
PAGES = "460--480"}
This is probably the best introduction for the machine learning audience.
This book also gives some simplistic guidelines for choosing between
cross-validation/resampling, etc. on page 38. The authors should be
embarrassed that their magic numbers of 100 and 50 are independent
of the learning problem and its inherent complexities. Tsk, tsk.
@book{weiss.kulik:91,
AUTHOR = "S.M.~Weiss and C.A.~Kulikowski",
TITLE = "Computer Systems That Learn",
PUBLISHER = "Morgan-Kaufmann",
YEAR = "1991" }
Cullen Schaffer has a very nice paper appearing in the
IJCAI-91 Workshop that discusses the topic in a machine learning context.
A good paper to get you thinking. He argues cross-validation is just
another way of encoding a bias.
@inproceedings{schaffer:ijw,
AUTHOR = "C.~Schaffer",
TITLE = "Overfitting avoidance as bias",
YEAR = 1991,
BOOKTITLE = "IJCAI-91 Workshop on
Evaluating and Changing Representation in Machine Learning",
ADDRESS = "Sydney",
NOTE = "To appear" }
I did some comparative experiments a few years ago comparing
cross-validation with things like Bayesian methods, MDL, and a
reimplementation of Quinlan's C4 on the task of decision tree pruning.
This appears in my thesis and is currently being revised for a
journal. An advance copy is:
@techreport{buntine:trees,
AUTHOR = "W.L. Buntine",
TITLE = "Learning Classification Trees",
YEAR = 1990,
INSTITUTION = "RIACS and NASA Ames Research Center",
NUMBER = "FIA-90-12-19-01",
ADDRESS = "Moffett Field, CA",
NOTE = "Paper presented at Third International Workshop
on Artificial Intelligence and Statistics" }
MINIMUM ENCODING METHODS
========================
Tom briefly mentioned these before dismissing them. In these
methods, the coding scheme for the hypotheses corresponds to a prior
in the Bayesian sense, and the schemes are usually viewed mathematically
as approximate Bayesian methods.
Using a different coding scheme gives you a different answer, so you
have to think carefully about your choice. One way to alleviate this is
to use a scheme of the form:
\[
\mathrm{cost}(\mathrm{data} \mid \mathrm{hypothesis}) + A \cdot \mathrm{cost}(\mathrm{hypothesis})
\]
and to use fancy methods to pick an appropriate value for the
tradeoff parameter A.
This is standard practice in regularization, Bayesian, and GCV
(generalized cross validation) methods.
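As an illustration, here is a minimal Python sketch of hypothesis
selection with this two-part cost. The likelihood() and code_len()
interfaces are hypothetical placeholders standing in for whatever coding
scheme you pick; they are not taken from any of the papers below.

import math

def two_part_cost(data, h, likelihood, code_len, A=1.0):
    """Two-part encoding cost: data bits plus A times hypothesis bits.

    likelihood(h, x) returns P(x | h) for one example; code_len(h)
    returns the description length of h in bits (assumed interfaces).
    """
    data_bits = -sum(math.log2(likelihood(h, x)) for x in data)
    return data_bits + A * code_len(h)

def minimum_encoding_choice(hypotheses, data, likelihood, code_len, A=1.0):
    # Minimum encoding selection: pick the hypothesis of least total cost.
    return min(hypotheses,
               key=lambda h: two_part_cost(data, h, likelihood, code_len, A))

With A = 1 this is the usual minimum encoding score; varying A trades off
fit against hypothesis complexity, which is where the fancy methods for
choosing A come in.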
Quinlan and Rivest allude to this tradeoff in section 8 of their paper but
don't appear to be familiar with the literature. (I only found this
out recently myself!)
NB. with the parameter A used above, minimum encoding is beginning
to look like cross-validation in its tree implementation
NB. with the parameter A used above, the scheme loses its neat
"coding" interpretation, but this is no loss, since the coding story
is just embellishment intended to appeal to the information theoreticians
When done properly, these methods work well.
WARNING Minimum encoding methods are quite widely misapplied and
misinterpreted. Of course the state of the theoretical literature
hasn't really helped here. Some people forget that code-lengths
originate as negative "log-likelihoods", some people forget that noise-free
data has zero code length because its likelihood is 1.0, and some
people forget that there is no inherently correct "coding scheme",
so we really have to think carefully about the one we pick.
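The correspondence behind that warning, written out: a code length is a
negative log-likelihood,
\[
L(\mathrm{data} \mid h) = -\log_2 P(\mathrm{data} \mid h),
\]
so noise-free data, with P(data | h) = 1, costs -log2 1 = 0 bits.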
NOTE There are a whole bunch of good minimum encoding papers in
recent AI proceedings (AAAI'90,'91, ML'91, etc.). In general you
may be better off looking to earlier work outside of AI
if you want to learn the basics of these methods.
Here is the Quinlan and Rivest paper that Pednault and Dietterich
mentioned at AAAI. If you wish to read this in detail, please get the
next paper too, and perhaps look at (Buntine 1990) as well, to get the
whole cross-section.
@article{quinlan.rivest:ic,
AUTHOR = "J.R. Quinlan and R.L. Rivest",
TITLE = "Inferring Decision Trees Using the Minimum Description
Length Principle",
JOURNAL = "Information and Computation",
YEAR = "1989",
VOLUME = 80,
PAGES = "227--248" }
Here is a paper listing modifications/corrections to Quinlan and Rivest's paper
from some of the founders of the "minimum encoding" field.
@techreport{wallace:trees,
AUTHOR = "C.S. Wallace and J.D. Patrick",
TITLE = "Coding Decision Trees",
YEAR = "1991",
INSTITUTION = "Monash University",
ADDRESS = "Melbourne",
TYPE = "Technical Report",
NUMBER = "151" }
This is a good theoretical overview, but pretty heavy going.
@article{barron.cover:it,
AUTHOR = "A.R.~Barron and T.M.~Cover",
TITLE = "Minimum Complexity Density Estimation",
YEAR = 1991,
JOURNAL = "IEEE Trans. on IT",
VOLUME = "??",
NOTE = "I suspect it should have appeared, my copy is:
Univ. of Illinois, Dept.\ of Stat.\ TR~28, 1989" }
There is a somewhat theoretical but much diluted and more accessible
discussion in sections 6.2 and 6.3, with a comparatively simple
derivation of the various methods.
@techreport{buntine.weigend:bbp,
AUTHOR = "W.L. Buntine and A.S. Weigend",
TITLE = "Bayesian Back-Propagation",
YEAR = 1991,
INSTITUTION = "NASA Ames Research Center",
NUMBER = "FIA-91-22",
NOTE = "Submitted, also available on the neuroprose archive" }
AVERAGING MULTIPLE MODELS (and redundant knowledge)
=========================
This seems to be becoming a trend at COLT'91. Historians and Tom might
note I finished writing my thesis on this same topic at the end of '89,
using the same theory and implementing it in a real system.
It turns out statisticians have been doing similar things too, and some
related ideas have appeared elsewhere. In my biased view this is a
hot topic because it actually includes things like minimum encoding
methods, etc., as a special case (see Buntine & Weigend '91).
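For concreteness, here is a minimal Python sketch of prediction by
averaging multiple models, weighting each model by its (unnormalized)
posterior. The log_posterior() and predict_prob() interfaces are
hypothetical placeholders, not taken from any of the papers below.

import math

def average_models(models, x, log_posterior, predict_prob):
    """Posterior-weighted average of class probabilities over models.

    log_posterior(m) gives the unnormalized log posterior of model m;
    predict_prob(m, x) maps input x to a dict of class probabilities
    (both assumed interfaces).
    """
    log_ws = [log_posterior(m) for m in models]
    top = max(log_ws)                         # subtract max for stability
    ws = [math.exp(lw - top) for lw in log_ws]
    total = sum(ws)
    mixed = {}
    for m, w in zip(models, ws):
        for cls, p in predict_prob(m, x).items():
            mixed[cls] = mixed.get(cls, 0.0) + (w / total) * p
    return mixed

Picking a single "best" model corresponds to putting all the weight on one
term of the sum; averaging instead hedges across the models the data
cannot distinguish, which is where the gains over simple pruning come from.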
Applies "averaging multiple models" to arbitrary boolean
formula in a noise free environment. I also handled
conjunctive rules in my thesis.
@inproceedings{buntine:ijcai89,
AUTHOR = "W.L. Buntine",
TITLE = "A Critique of the {V}aliant Model",
YEAR = 1989,
BOOKTITLE = IJCAI,
PAGES = "837--842",
PUBLISHER = "Morgan Kaufmann",
ADDRESS = "Detroit" }
I implemented a decision tree learning package that
approximates "averaging multiple models" on real data with noise, etc.
It is described here and also in (Buntine 1990) above.
@phdthesis{buntine:thesis,
AUTHOR = "W.L. Buntine",
SCHOOL = "University of Technology, Sydney",
TITLE = "A Theory of Learning Classification Rules",
YEAR = 1991 }
Applies "averaging multiple models" to linear perceptrons in a noise-free
environment. Implementation in real (i.e. noisy) environments
isn't easy, though; see my own work above. See the next paper too.
@inproceedings{haussler:percept,
AUTHOR = "M. Opper and D. Haussler",
TITLE = "Generalised Performance of {B}ayes Optimal Classification
Algorithm for Learning a Perceptron",
BOOKTITLE = "{COLT}'91: 1991 Workshop on Computational
Learning Theory",
PUBLISHER = "Morgan Kaufmann",
YEAR = "1991",
NOTE = "To appear." }
Applies "averaging multiple models" to linear perceptrons in a noise free
environment. See the next paper too.
@inproceedings{haussler.kearns:colt,
AUTHOR = "D. Haussler and M. Kearns and R.E. Schapire",
TITLE = "Unifying Bounds on the Sample Complexity of {B}ayesian
Learning Using Information Theory and the {VC} Dimension",
BOOKTITLE = "{COLT}'91: 1991 Workshop on Computational
Learning Theory",
PUBLISHER = "Morgan Kaufmann",
YEAR = "1991",
NOTE = "To appear." }
An example of the same ideas in statistics, applied to logistic
regression (i.e. rather like a single-layer perceptron with
sigmoid activation in a noisy environment), except this one is implemented.
@article{stewart:stat,
AUTHOR = "L. Stewart",
TITLE = "Hierarchical Bayesian analysis using Monte Carlo
integration: computing posterior distributions
when there are many possible models",
YEAR = 1987,
JOURNAL = "The Statistician",
VOLUME = 36,
PAGES = "211--219" }
Here's some machine learning work on redundant knowledge. They also have a
recent paper in the International Journal of Man-Machine Studies.
@inproceedings{gams:ewsl89,
AUTHOR = "M. Gams",
TITLE = "New Measurements Highlight the
Importance of Redundant Knowledge",
YEAR = 1989,
BOOKTITLE = "Proceedings of the Fourth European Working Session on Learning",
ADDRESS = "Montpellier",
EDITOR = "K. Morik",
PUBLISHER = "Pitman Publishing",
PAGES = "71--80"}
Just to show you how pervasive the idea of redundant knowledge and
multiple models is, here is some work from philosophers in a social
science setting.
@article{spirtes:smr,
AUTHOR = "P. Spirtes and R. Scheines and C. Glymour",
TITLE = "Simulation Studies of the Reliability of Computer-Aided
Model Specification Using {TETRAD II}. {EQS} and {LISREL}
Programs",
JOURNAL ="Socialogical Methods and Research",
YEAR ="1990",
VOLUME = 19,
NUMBER = 1,
PAGES = "3--66" }
[THANKS WRAY- I'd like to hear other reactions to or summaries of AAAI and MLW
-MP]
------------------------------
Date: Tue, 23 Jul 91 09:33:08 +0200
From: Luc De Raedt <lucdr@cs.kuleuven.ac.be>
Subject: MACHINE LEARNING AND KNOWLEDGE ACQUISITION subsection of IEEE CompEuro 92
COMPEURO-92
IEEE 1992 International Conference on Computer Systems and Software Engineering.
4-8 May, 1992, The Hague, The Netherlands
===============================================================================
This conference, whose main themes are Computers and Optimization, Parallel
Processing, and Computational Algebra, organizes subsections on
MACHINE LEARNING AND KNOWLEDGE ACQUISITION, COMPUTATIONAL LOGIC / LOGIC
PROGRAMMING, optimization and operations research, theory of computation,
parallel processing, databases, numerical mathematics, data security,
and computer graphics.
**********************************************************
* *
* subsections : *
* *
* COMPUTATIONAL LOGIC and *
* MACHINE LEARNING and KNOWLEDGE ACQUISITION. *
* *
**********************************************************
COMPUTATIONAL LOGIC / LOGIC PROGRAMMING
The subsection on COMPUTATIONAL LOGIC / LOGIC PROGRAMMING
features state-of-the-art lectures by John W. Lloyd, Manuel Hermenegildo and
Pascal Van Hentenryck as well as a number of technical sessions on the topic.
To organize the latter, a subcommittee for handling research papers
in the area of COMPUTATIONAL LOGIC and LOGIC PROGRAMMING has been created.
It consists of
Maurice Bruynooghe (Belgium, Chair)
Francois Bry (Germany)
Pierre Deransart (France)
Mehmet Dincbas (France)
Seif Haridi (Sweden)
Luis M. Pereira (Portugal)
MACHINE LEARNING AND KNOWLEDGE ACQUISITION
The subsection on MACHINE LEARNING AND KNOWLEDGE ACQUISITION
features state-of-the-art lectures by Yves Kodratoff and Stephen Muggleton
(tentative) as well as a number of technical sessions on the topic.
To organize the latter, a subcommittee for handling research papers
in the area of MACHINE LEARNING AND KNOWLEDGE ACQUISITION has been created.
It consists of
Francesco Bergadano (Italy)
Luc De Raedt (Co-chair, Belgium)
Paul Fischer (Germany)
Nada Lavrac (Yugoslavia)
Celine Rouveirol (France)
Yves Willems (Co-Chair, Belgium)
The MACHINE LEARNING AND KNOWLEDGE ACQUISITION subtrack of CompEuro
is concerned with all aspects of the field. This includes symbolic
learning, neural network learning, learnability theory, and
knowledge acquisition.
Authors interested in submitting a paper should send 5 copies of an
original research paper to the COMPEURO-92 Program Chair, Patrick Dewilde,
at
IEEE CompEuro 92
Prof. P. Dewilde
Delft University of Technology
Dept. of EE
POB. 5031, 2600 GA Delft
The Netherlands.
before 1 November 1991.
Authors in the subfield of COMPUTATIONAL LOGIC / LOGIC PROGRAMMING should also
send a copy of their abstract to
IEEE CompEuro 92
Maurice Bruynooghe
Katholieke Universiteit Leuven
Dept. of Computer Science,
Celestijnenlaan 200A
B-3001 Heverlee
Belgium
email : maurice@cs.kuleuven.ac.be
preferably before October 25, 1991 by email.
Authors in the subfield of MACHINE LEARNING AND KNOWLEDGE ACQUISITION should
also send a copy of their abstract to
IEEE CompEuro 92
Luc De Raedt
Katholieke Universiteit Leuven
Dept. of Computer Science,
Celestijnenlaan 200A
B-3001 Heverlee
Belgium
email : lucdr@cs.kuleuven.ac.be
preferably before October 25, 1991 by email.
Each manuscript should include
a cover page stating the names, affiliations and addresses of
all authors together with the identification, telephone and fax
numbers of the principal author. In addition, the cover page
should include the following signed statement :
"All appropriate clearances for the publication of this
paper have been obtained and if accepted, the authors
will prepare the final manuscript in time for inclusion
in the conference proceedings and will present the paper
at the conference."
Papers should be no longer than twelve double spaced A4 pages, including
illustrations, figures and references.
Notification of acceptance / rejection is by January 15, 1992.
Final manuscripts are due by 1 February 1992.
Ideally, the papers submitted to COMPEURO 92 should be of interest
and accessible to a wide audience. Papers on applications are also
welcome.
------------------------------
Subject: IFIP LQI Workshop CFP
Date: Thu, 18 Jul 91 16:56:13 +0100
From: jq@crim.fr
IFIP
Call for papers
First International TC-12 Working Conference on Low Quality Information (LQI)
Organized by WG 12.3, Problem-Solving
MONTPELLIER (FRANCE)
March 11-13, 1992
Theme of the Conference
Processing Low Quality Information (LQI) is an emerging area
of information processing. This conference focuses on symbolic
(i.e. not statistical) processing of LQI, with emphasis on
problem-solving.
Until now, most of the information processed by computers has been
high quality information. In fact, much effort is expended to verify
the accuracy, integrity and completeness of information in present
information systems.
Information is of low quality when it is incomplete, ambiguous,
uncertain, of unknown accuracy, etc. LQI is often provided by
informants who have only partial knowledge, are not always
reliable, may be biased, have (hidden) goals which they want to
further, show inventiveness to hide ignorance or embarrassment, etc.
Certainly, the great majority of the information circulating in
our world is LQI. Present information technology requires
high quality information and is usually unable to support the
many applications which would use the largely prevalent LQI.
By contrast, the mining industry has learned to mine ores of
increasingly lower quality, using advances in technology.
Improvements in hardware, software and AI methods will likewise make it
possible to process LQI, thereby opening up enormous domains to
information technology, such as the experimental sciences
(Molecular Biology, Chemistry, ...), and providing efficient
help for discovery.
Computers must learn to process LQI!
Time Table
May 15, 1991
Call for papers
September 30, 1991
Submission deadline
December 1st, 1991
Notification of acceptance/rejection
March 11-13, 1992
Conference at University of Montpellier II
Suggested topics of the conference
The following is a list of suggested topics for the conference, not
meant to be exhaustive. All these topics are to be considered in the
context of LQI, or as occurring in an LQI environment.
*Acquisition of information
*Agreement, compromise and consensus
*Communication between sources
*Decision making
*Detection and correction (of LQI)
*Improving the quality of LQI at acquisition and processing time
*(LQI) environments
*Multiple informant systems and sources
*Presentation (of LQI)
*Question answering and reasoning
National Organizing Committee
Dr J. Quinqueton (Chairperson) (F)
Dr J. Sallantin (co-chairperson) (F)
Pr L. Siklossy (NL)
Ir P.A. Flach (NL)
Pr E. Diday (F)
Dr C. De Sainte Marie (F)
International Program Committee
Pr L. Siklossy (Chairperson) (NL)
Dr A.J. Baborski (PL)
Pr L. Bolc (PL)
Dr E. Knuth (H)
Dr Y. Kodratoff (F)
Pr R.S. Michalski (USA)
Pr I. Plander (CS)
Pr A. Porto (P)
Dr N. Prakash (India)
Dr Z. Ras (USA)
Dr P. Revay (S)
Dr J.C. Simon (F)
Dr J. Sowa (USA)
Pr T. Vamos (H)
Pr W Wahlster (FRG)
Information on IFIP WG 12.3 can be obtained from the Chairman,
Prof L. Siklossy, Postbus 71710,
NL-1008 DE Amsterdam
Aims of the conference
This Working Conference aims to bring together those who have
worked in or are interested in symbolic processing of LQI by
computer.
At this Working Conference, the present approaches to processing
LQI will be reviewed, and promising future approaches will be
presented and discussed.
The emphasis of the conference is on problem-solving aspects,
but the conference wants to maintain an interdisciplinary
nature.
------------------------------
END of ML-LIST 3.12