Machine Learning List: Vol. 1 No. 5
Saturday, August 5, 1989
Contents:
ML91
SRI-NIC Name Database for U.S. Internet Users
Experimentation
Notes: Reviewers of IJCAI-89 & CogSci sessions for ML-LIST solicited
The Machine Learning List is moderated. Contributions should be relevant to
the scientific study of machine learning. Mail contributions to ml@ics.uci.edu.
Mail requests to be added or deleted to ml-request@ics.uci.edu
----------------------------------------------------------------------
Date: Mon, 31 Jul 89 13:28:44 CDT
From: "B. Porter and R. Mooney" <ml90@cs.utexas.EDU>
Subject: ML91
A decision on the location of the Machine Learning Workshop for 1991
will be made soon. If you are interested in hosting this meeting, you
should send a bid to either Jaime Carbonell (carbonell@nl.cs.cmu.edu)
or Bruce Porter and Ray Mooney (ml90@cs.utexas.edu). Be sure to include:
1. Institution/dept
2. Proposed conference site and dates
3. Chairperson & local committee
4. ML work in the institution & geographical area
5. Industrial support (if any)
6. Accessibility/housing cost of location
7. Willingness to cooperate openly with ML community in organization
A couple of bids have already been submitted and a final decision will
be made at IJCAI-89.
----------------------------------------------------------------------
Subject: SRI-NIC Name Database for U.S. Internet users
Date: Mon, 31 Jul 89 16:40:37 -0700
From: "David W. Aha" <aha@ICS.UCI.EDU>
I frequently need to access the e-mail address of USA researchers in our
community and often turn to the SRI-NIC name database. Unfortunately, their
names are seldom there. So I'd like to encourage you to register
yourselves. Registering means that you send mail to:
registrar@sri-nic.arpa
and tell them your full name, U.S. mail address, phone number (optional),
and (of course) e-mail address. If you do this, other people on the net
will be able to query the database and find out how to send you e-mail.
At UCI the command is ``whois.'' Try the following command, and, if you
want researchers elsewhere to be able to reach you, please register.
% whois aha
Thanks,
David Aha
----------------------------------------------------------------------
Subject: Experimentation
Date: Sat, 05 Aug 89 10:52:44 -0700
From: Michael Pazzani <pazzani@ICS.UCI.EDU>
Message-ID: <8908051052.aa18584@ICS.UCI.EDU>
In ML-LIST 1.1, Bernd Nordhausen writes:
>I am interested to hear from other people what they think about the
>subject of experimentation, so let the flames roll.
I feel that experimentation is being overemphasized in current machine
learning research, to the extent that it is often misapplied. First, I'll
start with some good points about experimentation:
1. Experimentation forces the researcher to test his program and
theory under a wide variety of circumstances, and with examples
in many different presentation orders. (In an early version
of OCCAM, I tested the economic sanctions database in chronological
order and didn't find several bugs in the program until I tested
with random orders.)
2. Intuitive arguments and anecdotal evidence aren't a firm foundation
upon which to build future results. (After reading about ABSTRIPS
I thought that searching in an abstract space would reduce search costs.
More recent work has shown that this result does not generalize to many
problems.)
I do have several concerns however:
1. Currently, in ML, experimentation is done poorly. First, many
experiments are not run to test a particular hypothesis. Instead
they are exploratory post hoc data analyses. It is much easier to
prove statistical significance of results when testing a particular
hypothesis. This leads me to my second point. Too few people are
worrying about the statistical significance of their results. I worry
that many results may not replicate.
2. Experimentation on "real-world" data has little scientific value. It
makes nice PR to show people outside of machine learning that one
program can perform slightly better than another on the soybean data,
but this alone does not help us understand why one algorithm performs
better than another. Experimentation on artificial domains, in which
the complexity of the hypothesis and other characteristics of the data
are known and can be systematically varied, is more useful for
understanding our algorithms. (Of course, "real-world" data sets are
not useless; they are very useful in pointing out research topics, etc.)
3. Overemphasis on performance measures obscures analysis of why algorithms
work or fail to work. An algorithm that cannot learn x-or can still
perform very well on a data set even if the "correct" hypothesis requires an x-or.
Many experiments confound the class of situations on which an algorithm
will fail with how often that situation occurs in a given data set. The
former is more important to machine learning than the latter.
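To make the significance worry in point 1 concrete, here is a minimal
sketch of a two-sided sign test comparing two learners across random
train/test splits. All accuracy figures below are invented purely for
illustration; this is one simple choice of test, not the only one.

```python
import math
import random

def sign_test_p(wins, trials):
    """Two-sided sign test: probability, under the null hypothesis that
    the two algorithms are equally good, of a win count at least this
    extreme (ties assumed discarded before counting)."""
    k = max(wins, trials - wins)
    tail = sum(math.comb(trials, i) for i in range(k, trials + 1))
    return min(1.0, 2 * tail / 2 ** trials)

# Hypothetical accuracies of two learners over 12 random train/test splits.
rng = random.Random(0)
acc_a = [0.80 + rng.gauss(0, 0.02) for _ in range(12)]
acc_b = [0.78 + rng.gauss(0, 0.02) for _ in range(12)]

wins = sum(a > b for a, b in zip(acc_a, acc_b))
print(wins, "wins out of 12, p =", round(sign_test_p(wins, 12), 3))
```

Note that with only a dozen splits, a learner that wins most of them can
still fail to reach the conventional 0.05 level, which is exactly the
replication worry above.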
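Point 3 can be illustrated directly: a learner that cannot represent
x-or at all can still score highly overall when the x-or-dependent cases
are rare in the data set. A small synthetic sketch (the concept and the
5% rarity of x2 are hypothetical choices):

```python
import random

random.seed(1)

# Hypothetical data: the true concept is x1 XOR x2, but x2 = 1 is rare (5%).
data = [(random.randint(0, 1), 1 if random.random() < 0.05 else 0)
        for _ in range(1000)]
labels = [x1 ^ x2 for x1, x2 in data]

# A learner that cannot represent x-or: it simply predicts x1.
preds = [x1 for x1, _ in data]

overall = sum(p == y for p, y in zip(preds, labels)) / len(labels)
xor_cases = [(p, y) for (x1, x2), p, y in zip(data, preds, labels) if x2 == 1]
on_xor = sum(p == y for p, y in xor_cases) / max(1, len(xor_cases))

# Overall accuracy is high, yet accuracy on the class of cases that
# actually require x-or is zero -- the confound described above.
print(round(overall, 2), on_xor)
```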
----------------------------------------------------------------------
Subject: Reviewers of IJCAI-89 & CogSci sessions for ML-LIST solicited
From: ml-request <ml-request@ICS.UCI.EDU>
People willing to write a short review for ML-LIST of sessions at the
upcoming CogSci & IJCAI conferences are solicited. If you want to
comment on 3 or 4 papers, send your name and the session you'd like to
comment on to ml-request@ics.uci.edu
----------------------------------------------------------------------
END of ML-LIST 1.5