Machine Learning List: Vol. 6 No. 25
Monday, September 19, 1994
Contents:
Kolmogorov complexity, Duda and Hart, and generalization
FOIL 6.2
MLJ ILP Special Issue
AAAI CFP: Representing Mental States and Mechanisms
JLS Special Issue on Conceptual Change
Call for Papers: Kruse '95.
Position Announcement
Post-Doc: AI and meteorology
The Relationship Between PAC ...
IBEROAMERICAN CONGRESS ON ARTIFICIAL INTELLIGENCE - 2nd. Announcement
Coltbib info (long)
Conservation Law / InfoLoss : Empirical Results
The Machine Learning List is moderated. Contributions should be relevant to
the scientific study of machine learning. Mail contributions to ml@ics.uci.edu.
Mail requests to be added or deleted to ml-request@ics.uci.edu. Back issues
may be FTP'd from ics.uci.edu in pub/ml-list/V<X>/<N> or N.Z where X and N are
the volume and number of the issue; ID: anonymous PASSWORD: <your mail address>
URL- http://www.ics.uci.edu/AI/ML/Machine-Learning.html
----------------------------------------------------------------------
Date: Sat, 3 Sep 94 16:20:22 MDT
From: David Wolpert <dhw@santafe.edu>
Subject: Kolmogorov complexity, Duda and Hart, and generalization
Schmidhuber writes
>>>
Concerning the recent discussion on ``conservation laws'',
``nfl theorems'', and other statements expressing that (in
general) generalization is impossible: This is my second
attempt to draw attention to the fact that such statements
appear to be obvious from the perspective of Kolmogorov
complexity theory.
>>>
Kolmogorov complexity theory has nothing to do with the
nfl statements. The statements hold independent of where on the
Chomsky hierarchy the learning algorithms lie. In fact, the
algorithms can have different computational abilities and the
results still hold.
In addition, there are many subtleties to the nfl results
that are, at best, glossed over in Schmidhuber's complexity theory
argument. E.g., the nfl results only hold for certain kinds of
likelihoods (noise in the input space can invalidate them, as can
"asymmetric" output space noise when there are > 2 possible outputs),
only for certain kinds of loss functions (for quadratic loss, there _are_
a priori reasons to use certain learning algorithms rather than others),
etc.
Moreover, supervised-learning-as-complexity-theory arguments
usually (I can't say for sure in Schmidhuber's case, not yet having
read his paper) reduce to setting the prior over targets to
some particular form (invariably called "universal"). However the nfl
results can be cast as an average over all priors, rather than as the
assumption of one particular prior. In such a guise they say that
there are as many priors for which algorithm A beats algorithm B as
vice versa. Conventional complexity theory arguments make no such
claim.
So although such complexity theory is suggestive and
illuminating, it does not directly address the same scenario as
the nfl results. One might very well be able to use complexity theory
to derive results whose implications are intuitively very similar
to the nfl results. But they are not the nfl results.
They simply do not say that uniform averages of certain conditional
probability distributions are independent of the learning algorithm.
***
Any brave souls still following this nfl thread might be
interested in the following quote from Duda and Hart's 1973 classic
text (section 3.8.5), on a topic related to the recent posting by
Duin. The context of the quote isn't off-training-set error. However
it is interesting to see that there is such a rich tradition behind
uniformly averaging over targets to uncover "a priori" effects in
supervised learning.
"Of course, choosing the a priori distribution is a delicate
matter. We would like to choose a distribution corresponding to the
class of problems we typically encounter, but there is no obvious way
to do that. A bold approach is merely to assume that problems are
`uniformly distributed'. Let us consider some of the implications (of
such an assumption)..."
David Wolpert
------------------------------
Date: Tue, 20 Sep 1994 08:38:27 +1000
From: Ross Quinlan <quinlan@ml2.cs.su.oz.au>
Subject: FOIL 6.2
An updated version of FOIL is available by anonymous ftp from ftp.cs.su.oz.au,
directory pub, file foil6.sh. Several small bugs in 6.1a have been fixed,
and 6.2 contains better clause simplification.
Please report any problems to quinlan@cs.su.oz.au.
Ross Quinlan
------------------------------
Date: Wed, 7 Sep 94 12:51:07 BST
From: Steve.Muggleton@comlab.ox.ac.uk
Subject: MLJ ILP Special Issue
**********************************************************
CALL FOR PAPERS
**********************************************************
Special Issue: INDUCTIVE LOGIC PROGRAMMING
MACHINE LEARNING JOURNAL
Edited by Stephen Muggleton and David Page
Oxford University Computing Laboratory
Inductive Logic Programming (ILP) is a growing research area
spawned by Machine Learning and Logic Programming. While the
influence of Logic Programming has encouraged the development of
strong theoretical foundations, the new area has inherited its
experimental orientation from Machine Learning. Already ILP has
been applied successfully to a variety of complex problems,
including structure-activity and mutagenicity prediction of
pharmaceutical chemicals, protein secondary-structure prediction,
finite-element mesh design, and optimal play in chess endgames.
For this special issue we encourage submission of papers
describing novel ILP algorithms, experimental applications, or
theoretical results.
Submission deadline: September 15, 1995
It is the editors' intention to publish the special issue as a
book as well.
Papers should be double spaced and 8,000 to 12,000 words in
length, with full-page figures counting for 400 words. All
submissions will be subject to the standard review procedure.
Send three (3) copies of submissions to:
David Page Phone: +44-865-283-520
Oxford University Computing Lab Fax: +44-865-273-839
Wolfson Building David.Page@prg.ox.ac.uk
Parks Road
Oxford, OX1 3QD
U. K.
Also mail five (5) copies of submitted papers to:
Karen Cullen Phone: (617) 871-6300
MACHINE LEARNING Editorial Office karen@world.std.com
Kluwer Academic Publishers
101 Philip Drive
Norwell, MA 02061
U. S. A.
Note: Machine Learning is now accepting submission of final copy
in electronic form. A latex style file and related files are
available via anonymous ftp from ftp.std.com. Look in
Kluwer/styles/journals for the files README, smjrnl.doc,
smjrnl.sty, smjsamp.tex, smjtmpl.tex, or smjstyles.tar (which
contains them all).
------------------------------
Date: Wed, 7 Sep 1994 12:20:35 -0400
From: Michael Cox <cox@cc.gatech.edu>
Subject: AAAI CFP: Representing Mental States and Mechanisms
REPRESENTING MENTAL STATES AND MECHANISMS
AAAI 1995 Spring Symposium Series
March 27 - 29, 1995
Stanford University,
Stanford, California
CALL FOR PARTICIPATION
The ability to reason about mental states and cognitive mechanisms facilitates
performance at a variety of tasks. The purpose of this symposium is to enhance
our ability to construct programs that employ commonsense knowledge of the
mental world in an explicit representational format that can be shared across
domains and systems. Such knowledge can, for example, assist
story-understanding programs to understand characters that learn, forget, pay
attention, make a decision, and change their mind. The need to represent
knowledge of mental activity transcends usual disciplinary boundaries to
include most reasoning tasks where systems interact with users, coordinate
behaviors with autonomous agents, or consider their own beliefs and
limitations. For example, distributed problem-solving agents can use knowledge
of mental phenomena to predict and explain the behavior of cooperating agents.
In MACHINE LEARNING, a system's knowledge of its own mental states, capacities
and mechanisms crucially determines the reliability with which it can diagnose
and repair reasoning failures. The focus of the symposium, however, is on
representation of the mental world and the sharing/reuse of such
representations, rather than the applications that such representations
support.
Important questions to consider:
o (SHARABILITY) What tools / techniques can facilitate the sharing of
representations among researchers?
o (REUSE) What portions of the representation can be transferred
across reasoning tasks?
o (ARCHITECTURE) How can functional models of reasoning-components be
represented explicitly?
o (LOGICAL FORM) What statements can be logically asserted about the
self and its beliefs? What benefits arise from such representations?
o (APPLICATIONS) How can knowledge of mental phenomena be used in
tasks ranging from student instruction to intelligent interface
control?
o (INTROSPECTION) What must an intelligent system know about its own
mental states and processes?
PLEASE MONITOR THE WEB FOR ADDITIONAL INFORMATION:
ftp://ftp.cc.gatech.edu/pub/ai/symposia/aaai-spring-95/home_page.html
The symposium will consist of invited talks, individual presentations, and
group discussion. "Key position" papers describing possible topics for
submitted papers will be available at the network address listed above. If you
wish to present, submit up to 12 pages (fewer pages are encouraged) in
12-point, with 1'' margins. Others interested in attending should submit a
research abstract or position paper (3-pp. max). Financial assistance is
available for student participation.
Submit 1 postscript copy to
freed@picasso.arc.nasa.gov
or 4 hardcopies to Michael Freed, MS 262-2 NASA ARC, Moffett Field, CA, 94035.
SUBMISSION DATES:
Submissions for the symposia are due on October 28, 1994. Notification of
acceptance will be given by November 30, 1994. Material to be included in the
working notes of the symposium must be received by January 20, 1995.
ORGANIZING COMMITTEE:
Co-chairs -
Michael Cox cox@cc.gatech.edu
(Georgia Tech, AI/Cognitive-Science Group, College of Computing)
Michael Freed freed@picasso.arc.nasa.gov
(NASA Ames Research Center, Human Factors Group)
Gregg Collins (Northwestern University, Institute of the Learning Sciences)
Bruce Krulwich (Andersen Consulting, Center for Strategic Technology Research)
Cindy Mason (NASA Ames Research Center, Artificial Intelligence Group)
John McCarthy (Stanford University, Department of Computer Science)
John Self (Lancaster University, Department of Computing)
------------------------------
Date: Fri, 9 Sep 1994 18:28:08 -0400
From: Ashwin Ram <ashwin@cc.gatech.edu>
Subject: JLS Special Issue on Conceptual Change
THE JOURNAL OF THE LEARNING SCIENCES
Call for Papers
SPECIAL ISSUE ON CONCEPTUAL CHANGE
GUEST EDITORS
Ashwin Ram, Georgia Institute of Technology
ashwin@cc.gatech.edu, (404) 853-9372
Nancy J. Nersessian, Georgia Institute of Technology
nancyn@cc.gatech.edu, (404) 894-1232
Frank C. Keil, Cornell University
keil@tc.cornell.edu, (607) 255-6365
Conceptual change is the creation and modification of concepts through
development and experience, resulting in new concepts that are often
qualitatively very different. The topic is being studied from a variety
of perspectives. Cognitive development has been concerned with the
nature of children's concepts, how they differ from adult concepts, and
how the former develop into the latter. Research in scientific
conceptual change has investigated how new conceptual structures in a
scientific community come to replace existing ones through scientific
revolutions or through long-term scientific enterprise. Research in
education has been concerned with the nature of students' concepts and
misunderstandings and their development through learning processes in
and out of the classroom. Artificial intelligence researchers have
created computational models of conceptual and representational change.
This special issue will bring together a diverse set of approaches to
the common fundamental problems of conceptual change: what it is, how it
occurs, and how to facilitate it. Papers on all aspects of conceptual
change are invited. Discussion of psychological, philosophical,
educational, and computational studies of conceptual change are
appropriate; of particular interest are the implications of the results
of such studies towards understanding the fundamental nature of
conceptual change.
Five copies of manuscripts should be submitted by December 1, 1994 to
Ashwin Ram
College of Computing
Georgia Institute of Technology
Atlanta, GA 30332-0280
following guidelines set forth in the journal. For more information,
contact any of the guest editors.
------------------------------
Date: Mon, 12 Sep 1994 09:12:01 -0700
From: Bob Levinson <levinson@cse.ucsc.edu>
Subject: Call for Papers: Kruse '95.
CALL FOR PAPERS
International KRUSE Symposium
___ Knowledge Retrieval, Use, and Storage for Efficiency ___
University of California, Santa Cruz
August 11-13 1995
IMPORTANT DATES
submission postmark deadline February 13, 1995
notification of acceptance April 12, 1995
camera-ready copy June 12, 1995
THEME
The symposium will provide a forum for exploring current research in
artificial intelligence, cognitive science, and databases that pertains
to the organization, encoding and retrieval of logical and complex
objects. The symposium will draw together researchers from diverse
disciplines as well as practitioners engaged in developing real
object-oriented term classification systems. Mathematical and
Graph-Theoretic approaches will be favoured over those approaches based
on analogy with human cognitive processes, though mathematical
discussions of such processes will be appropriate. The basic questions
to be addressed include
o classification of objects in a taxonomy: systemic classification,
semantic indexing, partial-order sorting, description identification,
and taxonomy maintenance.
o efficient order, lattice, graph, and code theoretic operations on objects:
subsumption, generalization, specialization, least common generalization,
and greatest common specialization.
o advanced uses of taxonomies: knowledge compression, knowledge compilation,
and knowledge evolution.
o using classified knowledge: classification as problem solving,
classification as constraint satisfaction, and exploiting abstraction.
o scalable techniques for large object databases
o integration of data and knowledge base technologies
The symposium will include both theoretical issues and descriptions of
implemented systems, maintaining a balance between theory and
practice. The focus of the symposium is on efficiency of retrieval,
use, and storage.
AUTHORS' INFORMATION
Papers may not exceed 15 pages. Shorter, substantive papers are
welcome. Authors are requested to submit five (5) copies of their
paper. Alternatively, electronic submissions of papers (postscript
output) are encouraged.
Authors are further requested to attach title pages to their
submissions bearing their names, addresses, telephone numbers, FAX
numbers and e-mail addresses. In addition, authors are asked to
include abstracts of approximately twenty (20) lines with each paper,
and a list of short phrases descriptive of the content.
Papers must be postmarked on or before Monday February 13, 1995.
Address: KRUSE
c/o Gerard Ellis
Computer Science Dept.
RMIT
GPO Box 2476V, Melbourne, VIC 3001
Australia
email: ged@cs.rmit.edu.au ph:61-3-660-5090 fax:61-3-662-1617
ORGANIZING COMMITTEE:
Veronica Dahl (Co-Chair)
Director, Logic and Functional Programming Group
Professor, Computing Sciences Dept.
Simon Fraser University
Burnaby, B.C. V5A 1S6 CANADA
veronica@cs.sfu.ca
Phone: (604) 291-3372
Fax: (604) 291-3045

Gerard Ellis (Program Chair)
Computer Science Dept.
Royal Melbourne Institute of Technology (RMIT)
GPO Box 2476V, Melbourne, VIC 3001
Australia
ged@cs.rmit.edu.au
Phone: 61-3-660-5090
Fax: 61-3-662-1617

Andrew Fall (Co-Chair)
School of Computing Science
Simon Fraser University
Burnaby, B.C. V5A 1S6 CANADA
fall@cs.sfu.ca
Phone: (604) 291-4302
Fax: (604) 291-3045

Robert Levinson (Local Arrangements Chair)
Dept. of Computer & Information Sciences
229 Applied Sciences Building
University of California
Santa Cruz, CA 95064 U.S.A.
levinson@cis.ucsc.edu
Phone: (408) 429-2087
Fax: (408) 459-4829
PROGRAM COMMITTEE:
Mohan Ahuja (USA) Robert Levinson (USA)
Hassan Ait-Kaci (Canada) Patrick Lincoln (USA)
Franz Baader (Germany) Robert MacGregor (USA)
Yves Caseau (France) Deborah McGuinness (USA)
Darrell Conklin (Canada) Guy Mineau (Canada)
Veronica Dahl (Canada) Werner Nutt (Germany)
Francesco Donini (Italy) Peter Patel-Schneider (USA)
Gerard Ellis (Australia) Raghu Ramakrishnan (USA)
Andrew Fall (Canada) Manfred Schmidt-Schauss (Germany)
Brian Gaines (Canada) James Schmolze (USA)
Jim Hendler (USA) Gert Smolka (Germany)
Fritz Lehmann (USA) Leon Sterling (USA)
Maurizio Lenzerini (Italy)
SYMPOSIUM LOCATION
The symposium will be held at the University of California, Santa
Cruz, in a redwood forest in the Santa Cruz mountains. The university
and conference facilities are retreat style, with housing available in
family-style apartments on the campus. The university is well served
by buses to downtown Santa Cruz. The campus, just 10 minutes from the
ocean, overlooks Monterey Bay and its popular surfing beaches, and
eagles can be seen soaring from the birds-of-prey sanctuary that forms
part of the campus. Santa Cruz is approximately a 90-minute bus ride
from San Francisco Airport and about 45 minutes from San Jose.
------------------------------
From: Phil Goodman <goodman@unr.edu>
Subject: Position Announcement
Date: Sat, 17 Sep 1994 14:53:01 -0700 (PDT)
******* Preliminary Position Announcement *******
NEURAL NETWORK METHODOLOGIST -- VISITING or RESEARCH FACULTY MEMBER
(Basic and Applied Research; 100% of Time Protected for
Project-Related and Independent Research)
Center for Biomedical Modeling Research
University of Nevada, Reno
The University of Nevada Center for Biomedical Modeling Research (CBMR),
located at the base of the Sierra Nevada Mountains near Lake Tahoe, is an
interdisciplinary research project involving the Departments of Medicine,
Electrical Engineering, and Computer Science. Under federal funding, CBMR
faculty and collaborators apply neural network and advanced probabilistic/
statistical concepts to large health care databases. In particular, they are
developing methods to: (1) improve the accuracy of predicting surgical
mortality, (2) interpret nonlinearities and interactions among predictors,
and (3) manage missing data.
The CBMR seeks a PhD (or equivalent) methodologist trained in advanced
artificial neural network theory, model generalization, probability and
statistical theory, and C programming. This person will have major
responsibility for the design of techniques that improve the ability of
nonlinear models to generalize, and will supervise several C programmers to
implement concepts into software (much of the resulting software will be
freely distributed for use in many fields). Working knowledge of decision
theory, Bayesian statistics, bootstrap, ROC analysis, or imputation of
missing data is desirable.
Starting date is November 15, with an expected duration of at least 2 years.
Appointment possibilities include:
* Research Assistant Professor (non-tenure track)
* Visiting Professor (Assistant/Associate/Full)
(salary could be added to available sabbatical or other partial funding)
Funding is also available for a graduate student to work under the faculty
member, and possibly a post-doctoral position.
The position will remain open until filled. The University of Nevada employs
only U.S. citizens and aliens lawfully authorized to work in the United
States. AA-EOE.
If interested, please send (by written, faxed, or plain-text electronic mail):
a cover letter detailing your qualifications, and a resume that includes the
names and phone numbers of three references.
Philip H. Goodman, MD, MS E-mail: goodman@unr.edu
Associate Professor of Medicine, Electrical Engineering, & Computer Science
University of Nevada Center for Biomedical Modeling Research
World-Wide Web: http://www.scs.unr.edu/~cbmr/
Washoe Medical Center, 77 Pringle Way, Reno, Nevada 89520 USA
Voice: +1 702 328-4869 FAX: +1 702 328-4871
------------------------------
From: Eric Jones <Eric.Jones@comp.vuw.ac.nz>
Subject: Post-Doc: AI and meteorology
Date: Mon, 19 Sep 1994 17:37:28 +1200
Post Doctoral position - Severe Weather and Climate in New Zealand
Applications are invited for a Postdoctoral Fellowship from persons who have
completed or are about to complete a doctorate. The position is tenable
immediately, and some preference will be given to candidates who can start at
an early date.
The successful candidate will be based in the Institute of Geophysics, which
is part of the Research School of Earth Sciences at Victoria University of
Wellington, New Zealand.
The position will involve applying techniques from artificial intelligence and
machine learning to problems in meteorology and climatology. This is an
exciting opportunity to get in on the ground floor of a promising area of
interdisciplinary research.
The meteorology group in the Institute of Geophysics has close links with the
Department of Computer Science, and the successful candidate will work closely
with staff in both the Institute of Geophysics and the Department of Computer
Science.
The ideal candidate has a good publications record, a background in
meteorological research, well-developed computer skills, demonstrated
competence in written English, and some experience in the application of
artificial intelligence techniques. However, candidates with less
relevant experience are also invited to apply for this position.
The project is led by Dr Jim McGregor of the Institute of Geophysics (e-mail:
mcgregor@kauri.vuw.ac.nz, phone: + 64 4 495 5278, fax: +64 4 495 5186) and Dr
Eric Jones of the Department of Computer Science (e-mail:
Eric.Jones@comp.vuw.ac.nz, phone: + 64 4 472 1000 ext 8555), and queries
regarding this position can be addressed to them.
The Fellowship is tenable for a maximum period of two years and carries an
emolument not exceeding NZ$37,000 per annum. Appointees from overseas will be
eligible for a grant towards the cost of travel which is not payable until
after the Fellowship is taken up.
APPLICATIONS
Expressions of interest by electronic mail are welcomed.
Two copies of formal applications should be sent to
Appointments Administrator,
Personnel Office
Victoria University of Wellington
P.O. Box 600
Wellington,
New Zealand
Tel:- + 64 4 495 5272
Fax:- + 64 4 495 5238
Formal applications should contain the following information:
(a) Name
(b) address
(c) contact telephone numbers (and fax if possible)
(d) date and place of birth
(e) nationality
(f) academic qualifications
(g) research experience
(h) the names of up to three referees. Please arrange for your referees
to send their reports directly to the Appointments Administrator at
the address above.
(i) date at which applicant can arrive at the University.
THE UNIVERSITY IS AN EQUAL OPPORTUNITY EMPLOYER
------------------------------
Date: Mon, 19 Sep 94 16:36:43 MDT
From: David Wolpert <dhw@santafe.edu>
Subject: The Relationship Between PAC ...
The following paper is available by anonymous ftp from
ftp.santafe.edu. It is a major revision of an earlier paper on the
same topic, posted about six months ago on the Connectionists mailing list.
Of particular interest to ML readers may be the chapter on the
"no-free-lunch theorems/conservation laws" that were discussed
recently.
The Relationship Between PAC, the Statistical Physics framework, the
Bayesian framework, and the VC framework.
Abstract: This paper discusses the intimate relationships between the
supervised learning frameworks mentioned in the title. In particular,
it shows how all those frameworks can be viewed as particular
instances of a single overarching formalism. In doing this many
commonly misunderstood aspects of those frameworks are explored. In
addition the strengths and weaknesses of those frameworks are
compared, and some novel frameworks are suggested (resulting, for
example, in a "correction" to the familiar bias-plus-variance
formula).
To retrieve the paper, ftp to ftp.santafe.edu, and log on as
'anonymous'. Then type 'cd pub/Users/dhw/Stored_stuff'. Retrieve
either the file called relating.frameworks.ps.Z (compressed
postscript) or relating.frameworks.ps.Z.encoded (uuencoded compressed
postscript).
Please contact me if you have any problems.
Warning: The paper is quite long (about 100 pages) and covers _many_
topics besides the no-free-lunch theorems.
David Wolpert
The Santa Fe Institute
1399 Hyde Park Road
Santa Fe, NM, 87501, USA
(505) 984-8800 (voice); (505) 982-0565 (FAX).
dhw@santafe.edu
------------------------------
Date: Sat, 10 Sep 1994 13:57:32 -0400
From: Jose Ramirez <jramire@conicit.ve>
Subject: IBEROAMERICAN CONGRESS ON ARTIFICIAL INTELLIGENCE - 2nd. Announcement
CALL FOR PARTICIPATION
IBEROAMERICAN CONGRESS ON ARTIFICIAL INTELLIGENCE
IBERAMIA 94
NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE
CNIASE 94
IBERAMIA 94/CNIASE 94 will be sponsored by the Venezuelan Association for
AI -AVINTA-, the Mexican AI Society -SMIA- and the Spanish Association for
IA -AEPIA-. The goal of the conference is to promote research and
development in Artificial Intelligence and scientific interchange among AI
researchers and practitioners.
IBERAMIA 94/CNIASE 94 will be hosted by the Centro de Investigaciones
Oficina Metropolis and the School of Systems Engineering of the Universidad
Metropolitana of Caracas -UNIMET-, between Tuesday 25th October and Friday
28th October 1994.
* Invited Speakers:
Nils Nilsson (Stanford University, USA)
Artificial Intelligence: Where Are We? Where Are We Going?
Bart Selman (AT&T, USA)
Efficient Knowledge Representation
Constraint Satisfaction
Helder Coelho (INESC, Portugal)
Distributed AI in Prolog
AI tools illustrated by Applications
Ted Briscoe (Cambridge, UK)
Issues in Lexical Knowledge Representation
* Technical sessions, panels and tutorials will address the following topics:
Machine Learning
Knowledge Acquisition
Natural Language Processing
Genetic Algorithms
Knowledge Based Systems
Knowledge Representation and Reasoning
Automated Reasoning
Knowledge-based Simulation
Robotics
Case-based Reasoning
Distributed Artificial Intelligence
Neural Networks
Virtual Reality
* For further information, please contact:
AVINTA
Apartado 67079
Caracas 1061 - Venezuela
+58-2-2836942,fax:+58-2-2832689
jramire@conicit.ve
Universidad Metropolitana
Oficina Metropolis
Terrazas del Avila
Caracas 1070-A - Venezuela
+58-2-2423089, fax:+58-2-2425668
abianc@conicit.ve
------------------------------
Date: Wed, 17 Aug 1994 12:26:19 -0400
From: William Ian Gasarch <gasarch@cs.umd.edu>
Subject: Coltbib info (long)
(If you are already familiar with the COLTBIB then read this,
otherwise read what is below and then read this)
This is being sent in August 1994 and includes
entries from the recent COLT 1994 conference.
This time the files LOOKUP, pick, and tools
are included in the big file that you extract all
the files from.
PLEASE check papers that YOU wrote that
may require updated references.
Send updates by just submitting new entries
(I will take care of duplicates)
I would like updates by OCTOBER 30
to put into the next version.
See HOW TO SUBMIT below for how to submit.
================================================================
(THIS WAS IN APRIL 1994 SENDOUT)
(If you are already familiar with the COLTBIB then read this,
otherwise read what is below and then read this)
The coltbib was found to be in error in a few entries.
In particular, some of the FOCS conference entries did not
have the correct year or ordinal (e.g., `17th annual').
The new version has all that fixed.
================================================================
(THIS WAS IN MARCH 1994 SENDOUT)
(If you are already familiar with the COLTBIB then read this,
otherwise read what is below and then read this)
1) Some of the keys have changed. In particular, for the COLT
PROCEEDINGS themselves the editors are NOT in the keys.
Same for any other proceedings.
2) In the file tools is information about tools that are
available to use bibtex files in general.
______________________________________________________________
(THIS WAS IN JAN 1994 SENDOUT)
(If you are already familiar with the COLTBIB then read this,
otherwise read what is below and then read this)
There are TWO additional files that you will receive
if you get the COLTBIB package
1) A file called pick which was written by Prasad Chalasani as
an aid to using the bibliography. Details are in that file.
2) A file LOOKUP which includes a few references that I have TWO
entries for and do not know which one is correct. If you have more
information on these references, I would appreciate your emailing me
directly (gasarch@cs).
____________________________________________________________
HISTORY AND OVERVIEW
This bibliography, an ongoing project whose goal is to offer a reasonably
complete database of publications in computational learning theory, came into
existence in 1993 after the publication of the Computational Geometry
Column in
the Spring 1993 SIGACT News, describing the bibliogrpahy maintained by that
community. At the suggestion of Lenny Pitt, Bill Gasarch and Anselm Blumer
volunteered to begin a similar effort for computational learning theory.
Dana Angluin, Sanjay Jain, Phil Long and Ron Rivest provided invaluable
help by allowing their existing online bibliographies to be merged to provide
a starting point. The tables of contents from all the COLT conferences were
then added, as well as those papers from STOC and FOCS which seemed the most
relevant. Gisli Hjaltason provided invaluable help by programming awk scripts
to put everything into a standard format and find duplicates.
WHAT FILES WE HAVE AVAILABLE
Several files are available via anonymous ftp or email.
(Next section says HOW to access them)
The files available are the following.
1) colt.bib.Z.
This is a compressed bibtex file. If you decompress it
(command: uncompress colt.bib.Z) you will have a file
colt.bib, which is a latex bibliography file.
2) coltbib.dvi.Z
This is a compressed file that when decompressed is
a dvi file that, when printed out, gives you all
the entries as they would appear in a bibliography.
3) coltbib.ps.Z
This is a compressed file that when decompressed is
a postscript file that, when printed out, gives you all
the entries as they would appear in a bibliography.
4) authority
This file contains the conventions that we
use in naming conferences and journals in
the bibliography file.
(e.g. FOCS conference is referred to by
Proc. Nth Annu. IEEE Sympos. Found. Comput. Sci.)
5) This README file.
6) coltall: one file that, when executed, produces the five files above.
To execute the file type
sh coltall
or type
coltall
7) SOFT is a directory that contains our software and misc
files. There is no real need to look in this directory and
it is not organized for public use. It's only there to make
the job of maintaining this bib easy.
The file you will be working with the most is colt.bib
You are encouraged to use it actively, make additions and
corrections, and send the changes to coltbib@cs.umd.edu for integration
into a new revision.
After unpacking the file you should preserve the original biblio
source file of colt.bib in some read-only form (called oldcolt.bib, say)
and make a writeable copy for your actual changes. Later on, you should
email me your changes: either entries that are not in the file,
or updated versions of entries that are.
(Details in next section.)
HOW TO ACCESS
You can access these files two ways
1) Using ftp
Establish an anonymous ftp connection to cs.umd.edu, then change
directories to pub/coltbib, then retrieve
colt.bib.Z with ftp in binary mode. (This site is on Eastern Standard Time
and although we place no restrictions on access time,
the hours at which you can retrieve files efficiently may vary with
the load on intervening networks.)
$ ftp cs.umd.edu
Name: anonymous
Password: yourname@yoursite
ftp> cd pub/coltbib
ftp> dir
total 924
-rw-r--r-- 1 gasarch 28679 Dec 10 17:57 README
drwxr-xr-x 2 gasarch 1024 Dec 10 17:56 SOFT
-rw-r--r-- 1 gasarch 7342 Nov 19 15:14 authority
-rw-r--r-- 1 gasarch 87055 Nov 19 15:14 colt.bib.Z
-rwxr-xr-x 1 gasarch 518953 Nov 19 15:17 coltall
-rw-r--r-- 1 gasarch 112879 Nov 19 15:14 coltbib.dvi.Z
-rw-r--r-- 1 gasarch 149048 Nov 4 15:13 coltbib.ps.Z
ftp> binary
ftp> get colt.bib.Z
ftp> quit
(you could get any of these files this way)
Unpack it with
uncompress colt.bib.Z
If you get the file coltall then
put it into an empty directory and execute it.
It will produce all the files in compressed form.
2) Using email. If you mail to coltbib@cs.umd.edu then you will
get the file coltall in return. If you want all five files or if
you cannot ftp then this is the best way to get them. If you just want
colt.bib then ftp is better to use.
WARNING:
Standard bibtex comes with a limit of 750 entries per bibliography, which
is laughably small for us. People will be able to produce hardcopy
according to their own tastes only once they get their local bibtex
reconfigured.
HOW TO SUBMIT
Email new entries or updated versions of old entries
to coltadd@cs.umd.edu
They will be integrated into the next version.
What forms are acceptable for entries is discussed
below.
We will be updating this bibliography on a regular
schedule (perhaps every four months).
We will email to the colt mailing list
a short while before the deadline that we
are updating soon, though updates can be sent anytime.
BIBLIOGRAPHERS
This bibliography is and always has been a product of the good will of
volunteers from our community. An essential idea of this project is that
it ought to be a social effort, and that everyone who is using this
bibliography should feel willing to contribute his or her share in
improving it.
The following are the volunteers who have helped produce this electronic
bibliography, whether in getting it up and running, or converted to
bibtex, or continuing to improve its coverage and accuracy.
Dana Angluin, Anselm Blumer, Bill Gasarch, Gisli Hjaltason,
Sanjay Jain, Bill Jones, Phil Long, Joseph O'Rourke, Ron Rivest
However, if the project is to continue, it can only do so through widespread
participation. We ask that, if you are using this bibliography, find it
helpful, and wish it to carry on, you ``pay forward'' a fraction of the
time which it saves you by joining us and contributing updates as
described herein.
CREATING ENTRIES
What goes in? Papers relevant to computational learning theory, which for us
means the study of the computational complexity of well-defined learning
problems. Thus we are talking algorithms, data structures, analysis of
time and storage, lower and upper bounds, etc. We interpret relevance in a
rather broad sense, although we prefer that references from cognate areas
(such as statistics, recursion theory, psychology of learning, etc.) emphasize
books or survey articles rather than individual papers, and that these be
included only if they would be referenced by several different papers in
the bibliography.
In the end, your judgement as a working researcher decides what is relevant
and worth inclusion. (A pragmatic test: have you cited or would you cite
the item in your own papers?)
Future maintenance is easiest if you include only papers which are "stable";
i.e. published and openly available at least in the form of a numbered
techreport, and preferably in a conference proceedings. However, it is
okay to include preprints too. If the paper is slated to appear somewhere
else, that information can usefully annotate the entry for an existing
appearance.
Mary-Claire van Leunen's book _A Handbook for Scholars_ (Knopf, New York,
1979) suggests that for utmost scholarship nothing short of the original
title page should be trusted: "To write a reference, you must have the
work you're referring to in front of you.... The temptation to write a
reference without having the work before you will be powerful. Resist
it. A vague recollection is worthless; a vivid recollection is probably
the result of your imagination --- ingenious, no doubt, but of little use
to your reader. Don't rely on your memory.... If you must not rely on
your own memory, even less should you rely on someone else's. If your
only access to a reference is through a secondary source, then you must
refer to the secondary source as well as the primary one."
We are less concerned with the sheer volume of what you add to the
bibliography than with its accuracy and relevance. But please bear in
mind that there is a minimum overhead of at least an hour to process each
submission in the merging process, making larger submissions more
efficient than tiny ones. Coordinating your changes with those of other
colleagues, grad students, and so on at your site before sending is
greatly appreciated.
We are always open to suggestions on how to capture data with best
efficiency and least overlap, but at the moment would suggest the
following approach:
1) use the bibliography as a bibtex database when typesetting
references for your own papers, so that adding entries and making
corrections can happen as a natural side effect of your own work.
2) during that process, you will likely wind up referring to papers
from some conference or journal year which isn't known to have
been covered by the bibliography (see the list below). It would
be very helpful if you took the time for at least one such paper
to check through the whole volume and ensure that all relevant
learning papers have been incorporated in the bibliography. (This
doesn't take so long as you might think: by keeping an entry
template with repetitive details ready in your editor, within an
hour you can enter a full conference of about 50 papers.)
3) please look carefully at entries for papers written by you, or by
people at your institution, to ensure their correctness. No one
else can do this more accurately or more efficiently.
4) if you are caught up with current events and looking for a pastime,
you can work on something from our open problems list; or you can
check back through unexamined years of a journal or conference to
ensure that all relevant papers are included, correct, and
keywordized.
FORMATTING ENTRIES
Because of the distributed nature of updates, it seems desirable to have
some written guidelines for the format of entries, in order that the final
product have a consistent style. The following suggestions are based on
common practice where discernible, established authorities where possible,
and personal opinion where unavoidable.
You will likely find existing entries in disagreement with these guidelines.
Either the entry or the guidelines should be fixed. If some entry can't
be decently handled by the current guidelines, or you think they're just
plain wrong in any case, please let us know about it.
In the hope of keeping future input work reasonably simple and error-free,
a few lexical conventions were set at the time of bibtex conversion, as
follows. Where possible you should use lower case (for simplicity), a
leading comma ``, volume = 12'' (to make missing commas obvious), and put
all text for a field on a single line (to avoid spending time on
prettyprinting). If you must break lines, such as in the abstract= or
annote= or comments= or note= fields, start subsequent lines with a tab.
Special characters and diacriticals should be entered as specified on
p.52 of the TeXbook. The common single-letter ones are described below.
\' acute sup{\'e}rieur {\'}O'D{\'}unlaing [\*' in troff -ms]
\` grave probl{\`e}me Bruy{\`e}re [\*` in troff -ms]
\^ circumflex m{\^e}me Lema{\^i}tre [\*^ in troff -ms]
\~ tilde ma{\~n}ana N{\'u}{\~n}ez [\*~ in troff -ms]
\v hacek h{\'a}{\v c}ek Matou{\v s}ek [\*C in troff -ms]
\c cedilla fran{\c c}ais {\'}Swi{\c a}tek [\*, in troff -ms]
\" umlaut f{\"u}r G{\"u}ting [\*: in troff -ms]
\H Hung. umlaut Erd{\H o}s
Note that diacriticals precede the letter affected. A complication is
that in TeX, control sequences specified using letters must somehow be
separated from the ordinary letters that follow. A simple way is to use
spaces as in "Erd\H os", but this will look like two separate words to
bibtex. Another is to use braces as in "Erd\H{o}s", but this too is
confounded by bibtex, which (1) normally wants to decapitalize text in
titles not protected by braces, to support variant capitalization styles,
and (2) will interpret an umlaut \" as the end of a quoted string, unless
specially protected.
Initially it might seem enough to put braces around the whole word when it
contains either a fussy diacritical or (in a title field) a capital
letter. However, it turns out that, because of how it handles the author
field, bibtex dictates the convention to follow. Since adding a feature to
recognize and handle accented characters in author fields (for benefit of
the alpha bibliography styles), bibtex requires that we "place the
entire accented character in braces; in this case either {\"o} or {\"{o}}
will do .... furthermore these braces must not themselves be enclosed in
braces (other than the ones that might delimit the entire field or the
entire entry), and there must be a backslash as the very first character
inside the braces". Thus you should use, for example, {\'O}'D{\'u}nlaing,
Matou{\v s}ek, G{\"u}ting, and Erd{\H o}s, and we recommend that for
consistency you treat all accents this way in whatever bibtex fields they
appear. However you will further have to embrace the whole of any
capitalized name that appears in a title field. Such is life with bibtex.
Mathematical expressions, including numbers in titles, should always be
entered in TeX notation. Author, title, and page information from other
than the title page of the paper itself is untrustworthy: you might want
to do data entry from a proceedings table of contents for speed, but
please take time to proofread against title pages for accuracy.
Below is a quick naming of parts for entries in the database, with
discussions of the conventions that have evolved. More detailed
information on entry formats can be found in the bibtex documentation.
Entry type: we ignore some of the fine distinctions available in bibtex
and map most everything onto the types of article, book, inbook,
incollection, inproceedings, mastersthesis, phdthesis, and techreport.
Preprints (a.k.a. "Manuscripts") are considered unnumbered techreports for
our purposes, since they are often later distributed in that form. If
present entries in the bibliography are any guide, you should rarely need
other entry types. In particular, note that low-grade items like personal
communications should not be included since our charter is to cover only
openly available materials. If you need such an entry in your papers'
reference lists, please keep it in a supplementary bibliography file until
it is published. For example, "\bibliography{mine,mygroup,geom}"
specifies a search path of three files bibtex can use to satisfy
references.
Citation tag: it's easy to come up with citetags that are mnemonic,
short, or unique, but not to have all three at the same time. The system
we use is a compromise. Our citetags consist of an author part (first
letter of surname of each author), a title part (first letter or digit
string of each significant word in the title, up to 7 characters), and a
year part (last two digits of year of publication), separated by dashes.
Thus "J. O'Rourke, Art Gallery Theorems and Algorithms, 1987" reduces to
"o-agta-87". The tricky part of this is how to define "significant" words
of the title, particularly when punctuation and mathematical strings are
involved. Here are the formal rules, which are intended to produce a
commonsense result as often as possible:
- remove articles, conjunctions, and prepositions (i.e. words which
wouldn't be capitalized in a title)
- convert Roman numerals to Arabic, remove diacriticals and braces,
remove quotes and apostrophes, convert other punctuation to spaces
- retain only the first alpha/numeric token within $...$ delimiters
- take the first letter, or first digit string, of remaining words
- take the first 5 characters so produced
For just under 99% of entries the citetag generated by this procedure will
not conflict with that of any existing entry. But if it does, you'll have
to find some way to break the tie. In our experience, collisions at this
stage have come about only from various forms of the ``same paper''
conflicting with each other, and one of the following tiebreaking rules
suffices:
- for multipart papers, add the part number, in Arabic, to the
title field (c-lbors1-90, c-lbors2-90)
- for other variations on a theme, add a letter from a distinguishing
word to the title field (s-mmdpsl-90, s-mmdpsp-90)
- for alternate publications of a paper, append "a" for article, "i"
for incollection or inbook or inproceedings, or "t" for techreport,
to the year field (kkt-ptots-90i, kkt-ptots-90t)
- otherwise, punt and discriminate using whatever you can
(g-gramq-cga-88, g-gramq-edbt-88)
If the resulting citetags don't match your favourite descriptor for the
reference, you can still use the old familiar version if you declare a
mapping between the two in your TeX source, such as the following:
\newcommand{\smawk}{akmsw-gamsa-87}
You needn't go through the process for entries you don't cite: the
merging software will automatically generate citetags for contributed
entries lacking them. It also keeps entries in the colt.bib file sorted
in order of author, title, and year, in order to bring various appearances
of the same paper together. (This should match the order of the default
softcopy.)
Fields: we support all fields from the bibtex standard styles, plus a few
common extensions like abstract=, annote=, isbn=. Fields cites=, comments=,
keywords=, precedes=, succeeds=, and update= are our own. Conventions for
entering all of these are as follows. Quotes aren't necessary when the
field value is entirely digits (true for volume, number, and year,
normally). You should use the empty string "" for fields you can't
complete just yet (e.g. pages = "" for a conference or journal paper to
appear). Some older entries use a visible placeholder like "??"; if you
need to cite them, please fill in the hole, or change to "", as feasible.
abstract: verbatim from the original item (optional)
- little used in present entries, it's best added only when short and
sweet
address: city of publication
- use for books, techreports, theses, and obscure irregular conferences;
otherwise discouraged
- use only first city if publisher lists several
- add two-letter state/province codes for US/Canada cities; for others,
add country
- give English-language name, with correct diacriticals (e.g. Munich
rather than M{\"u}nchen, Saarbr{\"u}cken rather than Saarbruecken)
- city/country names to use are those in effect at time of publication
(e.g. West and East Germany from 1949 until 3 October 1990, Germany
thereafter)
annote: explanatory or critical comment on item content (optional)
- little used in present entries, but welcomed
author:
- separate multiple author fields with " and ", order same as in reference
- author's names in name, initials order
- use braces to enclose capitalized or comma-separated elements of a
compound surname, e.g. {Van Wyk} or {Lipski, Jr.}
- instead of full given names you may follow the custom of mathematical
literature and use initials, space-separated (exceptions to avoid
collision: Ta. Asano, Te. Asano)
- [van Leunen p.155] by "strict and narrow propriety" we should cite
precisely the name which appears on the item, even if it leads to
irregularities. While it is reasonable to fix up such typographical
glitches (attributable to coauthors, copy editors, and the pressure of
deadlines) as you are certain the author would want you to, inconsistent
practice is for the author and not the bibliographer to worry about.
booktitle: title of book or proceedings containing item
- for English items, capitalize first word, first word after a colon,
and all other words except articles and unstressed conjunctions and
prepositions. Otherwise follow capitalization conventions of the
native language, if you know them. (According to the MLA Handbook,
for French, German, Italian, Latin, and Spanish, capitalization in
titles is the same as in normal prose.) There is no need for braces
on capitalized words in this field.
- abbreviations for some popular conferences are in the authority file.
The merging software will recognize and convert most variant
abbreviations to standard form.
chapter: chapter or section number, where item is part of a monograph
- use entry type of incollection if chapter has its own title,
inbook otherwise
cites: citations made by item (optional)
- give as list using biblio citetags, such as
cites = "bs-dcms-76, gjpt-tsp-78, o-agta-87"
- needn't be an exhaustive copy of the item's citations, but if used
should at least give the significant ones. You can say cites =
"(complete) bs-dcms-76, ..." if the list is exhaustive.
comments: bibliographic marginal notes
- supplemental information not a part of the reference proper: notes on
a item's source language, or relation to other items, or a UMI order
number and page count, or a Computing Reviews or Math Reviews number...
- separate multiple comments with a semicolon
- "to appear in", "submitted to", "in press" and the like require fixup
later, at which time changes in other info such as title (and thus
label) tend to be overlooked; so please use these only as comments
on the future of an entry already published in some form
edition: of a book
- use numbered ordinal, e.g. "2nd"
editor:
- editors of proceedings not needed, and discouraged
- otherwise, use guidelines for author
institution: publisher of a techreport
- include any relevant department, and list in minor-to-major order
(e.g. "Dept. Comput. Sci., Univ. California Berkeley")
isbn: of book (optional)
- worthwhile only for obscure or otherwise hard-to-find items
- give with hyphens as specified by publisher
journal:
- abbreviations for some popular journals are in the authority file.
The merging software will recognize and convert most variant
abbreviations to standard form.
- separate journal series are considered separate journals, e.g.
journal = "J. Combin. Theory Ser. A" rather than series or volume A
keywords:
- use to supplement, for searching or descriptive purposes, terms
already present in the item's title
- separate multiple keywords with commas
- keywords need only be attached to the newest of a paper's appearances,
if identical for all
- use those in authority file, by preference
- additions to the list of keywords are expected and welcomed, within
reason
month: month of publication
- encouraged for techreports and theses, discouraged otherwise
- use bibtex standard abbreviations (three letters, lower case, no quotes)
note:
- use for supplemental information which should appear in a citing paper's
reference list; otherwise use comments field
- e.g. note = "Errata in 2(1981), 105"
- for theses, give techreport type and number, if known, e.g. note =
"Report TR-86-103"
number: of techreport, work in a series, or issue of journal
- essential for true techreports (nolle techreportum sine numeratum)
- for journals, necessary iff there exists more than one "page 1" per
volume (e.g. proceedings as separately-paginated issue of journal),
and discouraged otherwise
- use "--" for combined issues, e.g. "3--4"
pages:
- use double dash "--" in a number range
precedes/succeeds: pointer lists for temporal relationships among entries
- for example
precedes = "oy-nubks-88" points to new & improved paper
succeeds = "k-cmgcl-77" is backpointer from it
publisher:
- see authority file for standard names of some publishers
school: granting degree, for thesis
- include any relevant department, since this assists inquiries about
availability or contents, and list in minor-to-major order (e.g.
"Dept. Comput. Sci., Univ. California Berkeley")
series: of books
- e.g. "Lecture Notes in Computer Science"
title: of item
- for English non-books, you need only capitalize first word and proper
names, and enclose latter capitalized words in braces so that bibtex
will leave them alone. If you prefer, you may capitalize other words
in a title to get full uppers & lowers capitalization, but take care
not to embrace them. (Full capitalization is optional because it's
more complex, and we know of no journals still requiring it.)
- omit qualifiers like "(extended abstract)"
- [van Leunen p.170] regardless of the style of the original, use colon to
separate title from subtitle (edit if necessary): for example change
"Serial science. {I}. Definitions" to "Serial science, {I}: Definitions"
- otherwise "correct" only what you're certain the author would want you to
- enclose math expressions (including numbers) in {$ ... $}, and express
in TeX notation
type: of techreport or thesis
- e.g. "Technical Report" or "Manuscript" or "M.{Phil}. Thesis"
- for theses, give the actual degree name, and supplement with keywords
"master thesis" or "doctoral thesis" accordingly
- if the thesis was distributed as a numbered report, then give its
type and number in the note= field
- capitalized words after the first need braces in this field
update: date and bibliographer corresponding to last change
- maintained by the merging software (so don't bother)
volume: in journal or multivolume book
year: year of publication
- "to appear", "submitted", "in press", etc? See the comments field.
'%' lines before entries: marginal comments wrt bibliography maintenance
- use to flag entries with errors you can't fix just now
("% wrong volume number"), or to flag truthful data that may look
erroneous ("% yes, ``connexion''")
- the string "###" can be used to call attention where you believe
something is missing or wrong. Feel free to fix such entries
if you have the correct details handy.
MISCELLANEOUS COMMENTS and OPEN PROBLEMS
Johnson's STOC/FOCS index issued in 1991 contains a wealth of information
on papers originally published through those conferences. Look to it
first if you need supplementary information on such a paper.
bibtex (or at least its standard style files) has some fixed ideas about
how things can be published; for instance, the following won't print
completely and elicits a warning on the grounds that a proceedings cannot
have both volume and issue numbers (i.e. be published as a journal issue).
However SIGGRAPH's position is that its papers are published *in* a
proceedings and that the proceedings is *in* an issue of the journal.
@inproceedings{awg-psg-78
, author = "P. Atherton and K. Weiler and D. P. Greenberg"
, title = "Polygon shadow generation"
, booktitle = "Proc. SIGGRAPH '78"
, journal = "Comput. Graph."
, volume = "12"
, number = "3"
, year = "1978"
, pages = "275--281"
, oldlabel = "geom-20"
}
Patashnik's Bibtexing notes suggest that we ``don't take the field names
too seriously'' and adjust the entry until it prints right. For the
moment, entries of this sort (the only ones are from SIGGRAPH) have been
given the article type, and the booktitle field won't print. But if
SIGGRAPH's position is correct, then it is bibtex which should adapt in
the long run. (Perhaps bibtex might better have used a general inclusion
field ``in = "siggraph78"'' rather than crossref with its hardwired list of
permitted situations.)
BIB SOFTWARE AVAILABLE ELSEWHERE
The computational geometry community has made some programs available for
working with bibliographies. They can be retrieved via anonymous
ftp to cs.usask.ca [128.233.128.5], in file pub/geometry/geombib.tar.Z.
They include programs to translate between `bibtex' format and the older
`refer' format, and programs to search quickly in large bibliographies.
A new program "bibview" which is an X tool for manipulating bibtex
databases has been published in comp.sources.x/v18i099. Its README says
it "supports the user in making new entries, searching for entries and
moving entries from one bib to another. It is possible to work with more
than one bib simultaneously. bibview is implemented with Xt and Athena
Widgets." bibview is available for ftp from (among other places)
comp.sources.x archive sites, from cs.orst.edu:/pub/src/printers/bibview.tar.Z,
and from dsrbg2.informatik.tu-muenchen.de:/pub/tex/bibview-1.0.tar.Z.
------------------------------
Date: Fri, 16 Sep 1994 12:38:25 +0800
From: Ronny Kohavi <ronnyk@cs.stanford.edu>
Subject: Conservation Law / InfoLoss : Empirical Results
After seeing some of the discussions on the conservation law when I came
back from my vacation, I decided to run some experiments.
Here are some results testing infoloss on common UCI datasets.
Don't expect infoloss to be a winner, but Cullen and I thought these
were interesting enough to share with everyone.
The results are for 10-fold CV. The first accuracy is regular
ID3 (no pruning, information-gain). The second accuracy result
is for ID3 with information loss. The numbers after the +- indicate
the standard deviation between the 10 folds (these are usually somewhat
optimistic estimates, as the folds are not independent).
Runs were done using MLC++, which is being developed at Stanford
(http://robotics.stanford.edu:/users/ronnyk/mlc.html).
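For readers who have not followed the thread: infoloss inverts ID3's usual
splitting criterion. Here is a minimal sketch of the two criteria, assuming
discrete attribute values and assuming infoloss simply picks the attribute
with the smallest information gain; this is an illustration only, not the
MLC++ code:

import math
from collections import Counter

def entropy(labels):
    # Shannon entropy (in bits) of a list of class labels.
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(X, y, a):
    # Entropy reduction from partitioning (X, y) on attribute index a.
    parts = {}
    for xi, yi in zip(X, y):
        parts.setdefault(xi[a], []).append(yi)
    remainder = sum(len(ys) / len(y) * entropy(ys) for ys in parts.values())
    return entropy(y) - remainder

def choose_attribute(X, y, attrs, loss=False):
    # ID3 (info-gain) maximizes the gain; the infoloss variant minimizes it.
    pick = min if loss else max
    return pick(attrs, key=lambda a: info_gain(X, y, a))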
Summary of results: The mean accuracies for Corral, monk2, and
breast-cancer are higher when infoloss is used. The trees formed by
infoloss are much larger. I have "time" set to 60, meaning that if a
run takes more than 60 seconds, the time is printed. In some cases
infoloss takes 5 times longer to build and test the trees (I also have
the actual sizes if anyone is interested).
Some comments about the experiment, my interpretation of these
results, and instructions for getting the executable that generated
them appear after the results below.
______________________________________________________________________
DATAFILE is /u/mlc/db/parity5+5.data with 100 instances
Inducer: ID3. Accuracy: 53% +-3.67%
Inducer: ID3loss. Accuracy: 46% +-6.18%
==============
DATAFILE is /u/mlc/db/corral.data with 32 instances [not at UCI]
Inducer: ID3. Accuracy: 68.33% +-10.67%
Inducer: ID3loss. Accuracy: 71.67% +-5.98% <=== beats ID3
==============
DATAFILE is /u/mlc/db/monk1.data with 124 instances
Inducer: ID3. Accuracy: 79.62% +-3.79%
Inducer: ID3loss. Accuracy: 56.67% +-4.13%
==============
DATAFILE is /u/mlc/db/monk2.data with 169 instances
Inducer: ID3. Accuracy: 55.07% +-5.38%
Inducer: ID3loss. Accuracy: 58.64% +-4.06% <==== beats ID3
==============
DATAFILE is /u/mlc/db/monk2-local.data with 169 instances
Inducer: ID3. Accuracy: 72.17% +-3.53%
Inducer: ID3loss. Accuracy: 56.25% +-4.7%
==============
DATAFILE is /u/mlc/db/monk3-full.data with 122 instances
Inducer: ID3. Accuracy: 88.53% +-3.08%
Inducer: ID3loss. Accuracy: 54.23% +-3.88%
==============
DATAFILE is /u/mlc/db/tic-tac-toe.all with 958 instances
Inducer: ID3. Accuracy: 85.17% +-0.81%
Inducer: ID3loss. Accuracy: 66.29% +-1.68%
==============
DATAFILE is /u/mlc/db/chess.all with 3196 instances
Inducer: ID3. Accuracy: 99.69% +-0.1%
85.0u 0.0s 3:17 43% 0+0k 0+0io 0pf+0w
Inducer: ID3loss. Accuracy: 66.93% +-1.05%
493.0u 0.0s 18:31 44% 0+0k 0+0io 0pf+0w
==============
DATAFILE is /u/mlc/db/iris.all with 150 instances
Inducer: ID3. Accuracy: 94% +-2.1%
Inducer: ID3loss. Accuracy: 75.33% +-3.3%
==============
DATAFILE is /u/mlc/db/labor-neg.all with 57 instances
Inducer: ID3. Accuracy: 87.67% +-4.45%
Inducer: ID3loss. Accuracy: 82% +-7.19%
==============
DATAFILE is /u/mlc/db/vote-irvine.all with 435 instances
Inducer: ID3. Accuracy: 93.57% +-0.66%
Inducer: ID3loss. Accuracy: 84.36% +-0.97%
==============
DATAFILE is /u/mlc/db/vote1-irvine.all with 435 instances
Inducer: ID3. Accuracy: 84.81% +-1.22%
Inducer: ID3loss. Accuracy: 82.75% +-1.46%
==============
DATAFILE is /u/mlc/db/crx.all with 690 instances
Inducer: ID3. Accuracy: 78.84% +-0.99%
Inducer: ID3loss. Accuracy: 58.99% +-2.19%
119.0u 0.0s 4:11 47% 0+0k 0+0io 0pf+0w
==============
DATAFILE is /u/mlc/db/pima.all with 768 instances
Inducer: ID3. Accuracy: 70.56% +-1.66%
63.0u 0.0s 1:59 52% 0+0k 0+0io 0pf+0w
Inducer: ID3loss. Accuracy: 60.66% +-1.94%
135.0u 0.0s 3:45 59% 0+0k 0+0io 0pf+0w
==============
DATAFILE is /u/mlc/db/breast-cancer.all with 286 instances
Inducer: ID3. Accuracy: 67.16% +-1.32%
Inducer: ID3loss. Accuracy: 67.94% +-3.27% <==== beats ID3
==============
DATAFILE is /u/mlc/db/glass.all with 214 instances
Inducer: ID3. Accuracy: 67.27% +-3.42%
Inducer: ID3loss. Accuracy: 45.91% +-4.38%
==============
DATAFILE is /u/mlc/db/glass2.all with 163 instances
Inducer: ID3. Accuracy: 84.52% +-2.85%
Inducer: ID3loss. Accuracy: 61.99% +-4.58%
==============
DATAFILE is /u/mlc/db/cleve.all with 303 instances
Inducer: ID3. Accuracy: 72.99% +-3.29%
Inducer: ID3loss. Accuracy: 65.39% +-1.76%
==============
DATAFILE is /u/mlc/db/hepatitis.all with 155 instances
Inducer: ID3. Accuracy: 80.67% +-2.71%
Inducer: ID3loss. Accuracy: 79.17% +-3.62% <==== insignificant diff
==============
DATAFILE is /u/mlc/db/horse-colic.all with 368 instances
Inducer: ID3. Accuracy: 77.7% +-1.64%
Inducer: ID3loss. Accuracy: 67.93% +-2.32%
==============
DATAFILE is /u/mlc/db/hypothyroid.all with 3163 instances
Inducer: ID3. Accuracy: 98.83% +-0.29%
150.0u 0.0s 5:25 46% 0+0k 0+0io 0pf+0w
Inducer: ID3loss. Accuracy: 94.97% +-0.69%
945.0u 1.0s 34:42 45% 0+0k 0+0io 0pf+0w
==============
DATAFILE is /u/mlc/db/sick-euthyroid.all with 3163 instances
Inducer: ID3. Accuracy: 97.28% +-0.37%
220.0u 0.0s 7:08 51% 0+0k 0+0io 0pf+0w
Inducer: ID3loss. Accuracy: 87.48% +-0.58%
1047.0u 1.0s 30:36 57% 0+0k 0+0io 0pf+0w
==============
DATAFILE is /u/mlc/db/lymphography.all with 148 instances
Inducer: ID3. Accuracy: 71.62% +-3.1%
Inducer: ID3loss. Accuracy: 63.43% +-3.3%
==============
DATAFILE is /u/mlc/db/mushroom.all with 8124 instances
Inducer: ID3. Accuracy: 100% +-0%
71.0u 1.0s 1:18 91% 0+0k 0+0io 0pf+0w
Inducer: ID3loss. Accuracy: 99.41% +-0.11%
279.0u 1.0s 5:05 91% 0+0k 0+0io 0pf+0w
==============
DATAFILE is /u/mlc/db/soybean-small.all with 47 instances
Inducer: ID3. Accuracy: 98% +-2%
Inducer: ID3loss. Accuracy: 47.5% +-6.38%
==============
4168.0u 13.0s 2:14:09 51% 0+0k 0+0io 0pf+0w
______________________________________________________________________
A few comments on the experiments:
1. Except for the monk problems, corral, and parity, where
the test set is the full space, I used all the data
for cross validation. For the above-mentioned problems I used
a smaller portion of the space as the data for CV (sizes are given).
2. The cross-validation folds generated are the same for
both runs (info-gain, info-loss). This is critical,
as differences in folds can mean large differences in estimated
accuracies; a sketch of this pattern follows below.
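A minimal sketch of that pattern, assuming hypothetical induce functions
that take training data and return a predictor: the fold assignment is
materialized once, under a fixed seed, and the same folds are reused for
every inducer.

import random

def cv_folds(n, k=10, seed=1):
    # Partition the indices 0..n-1 into k folds, fixed by the seed.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_accuracies(induce, X, y, folds):
    # Per-fold test accuracy; pass the SAME folds for every inducer.
    accs = []
    for test in folds:
        train = [i for f in folds if f is not test for i in f]
        predict = induce([X[i] for i in train], [y[i] for i in train])
        accs.append(sum(predict(X[i]) == y[i] for i in test) / len(test))
    return accs

def mean_std(accs):
    # Mean and spread between the folds (the +- figures above;
    # the exact normalization used in the runs may differ).
    m = sum(accs) / len(accs)
    return m, (sum((a - m) ** 2 for a in accs) / (len(accs) - 1)) ** 0.5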
Discussion of results:
1. In all three cases where infoloss was superior, there is a
large overlap between the confidence intervals, making any conclusion
about the superiority of infoloss in these cases questionable. Of
course, it is still interesting to see that there are more than a few
cases where differences are not significant!
Here are the results from leave-one-out runs on those datasets:
DATAFILE is /u/mlc/db/corral.data with 32 instances
Inducer: ID3. Accuracy: 59.38% +-8.82%
Inducer: id3loss. Accuracy: 71.88% +-8.08%
DATAFILE is /u/mlc/db/monk2.data with 169 instances
Inducer: ID3. Accuracy: 56.8% +-3.82%
Inducer: id3loss. Accuracy: 57.4% +-3.82%
DATAFILE is /u/mlc/db/hepatitis.all with 155 instances
Inducer: ID3. Accuracy: 81.29% +-3.14%
Inducer: id3loss. Accuracy: 82.58% +-3.06%
DATAFILE is /u/mlc/db/breast-cancer.all with 286 instances
Inducer: ID3. Accuracy: 66.43% +-2.8%
Inducer: id3loss. Accuracy: 69.93% +-2.72%
The confidence intervals still overlap badly even with leave-one-out.
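One crude way to make "overlap" concrete: treat each reported +- figure as
a standard error and check whether the two approximate 95% intervals
intersect. This is rough (the caveat above about non-independent folds
applies), but it formalizes the point:

def intervals_overlap(m1, se1, m2, se2, z=1.96):
    # Do the approximate 95% intervals mean +- z*se intersect?
    return m1 - z * se1 <= m2 + z * se2 and m2 - z * se2 <= m1 + z * se1

# e.g. the corral leave-one-out figures above:
print(intervals_overlap(59.38, 8.82, 71.88, 8.08))   # True: they overlap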
2. On the three (four when doing leave-one-out) datasets where infoloss
was superior, I will say the following:
a. Corral was intended to fool ID3 into picking the worst attribute at
the root (John, Kohavi, Pfleger, ML 1994), which explains why infoloss
does better.
b. Monk2 is a symmetric concept where it does not matter
which attribute you pick for a subtree. Still, this may
count as one of those cases where infogain loses on average.
c. Breast-cancer using the majority inducer gives
Inducer: Const. Accuracy: 70.36% +-2.29% (10-fold CV)
so both are doing worse than majority (a sketch of this baseline follows the list).
d. Hepatitis (a close match) on a majority inducer gives:
Inducer: Const. Accuracy: 79.17% +-3.85%
e. Mushroom is the first dataset on which both do well, and way above majority:
Inducer: Const. Accuracy: 51.8% +-0.49% (10-fold CV)
Maybe with >8,000 instances you can't do really badly :-)
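The Const inducer used for these baselines is just the majority-class
predictor; a sketch in the same induce/predict convention as the earlier
fragments:

from collections import Counter

def induce_const(X, y):
    # Majority ('Const') inducer: ignore x, always predict the modal class.
    majority = Counter(y).most_common(1)[0][0]
    return lambda x: majority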
Trying this on your datasets:
If you would like to try this on some of your own datasets, you can get
the object program (for a Sparc 10) that generated these results by anonymous
ftp from starry.stanford.edu:pub/ronnyk/infoloss/* (see the README there).
If you don't have access to a Sparc 10, I'll be glad to run these
on other datasets PROVIDED you give me a datafile and namesfile
in C4.5 format (a sketch of that format follows).
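For anyone unfamiliar with C4.5 format: the namesfile lists the class
values on its first line and then one attribute per line; the datafile has
one comma-separated case per line, ending with the class, with '?' marking
unknown values. A tiny invented example (the file names and attributes are
hypothetical):

golf.names:
  play, dont_play.                 | the class values
  outlook: sunny, overcast, rain.
  humidity: continuous.

golf.data:
  sunny, 85, dont_play
  overcast, 70, play
  rain, ?, play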
Ronny Kohavi ronnyk@CS.Stanford.Edu, http://robotics.stanford.edu/~ronnyk
------------------------------
End of ML-LIST (Digest format)
****************************************