Machine Learning List: Vol. 2 No. 19
Friday, Sept 28, 1990

Contents:
m-of-n learning
Cobweb/3

The Machine Learning List is moderated. Contributions should be relevant to
the scientific study of machine learning. Mail contributions to ml@ics.uci.edu.
Mail requests to be added or deleted to ml-request@ics.uci.edu. Back issues
may be FTP'd from ics.uci.edu in /usr2/spool/ftp/pub/ml-list/V<X>/<N> or N.Z
where X and N are the volume and number of the issue; ID & password: anonymous

------------------------------

Date: Sat, 15 Sep 90 07:07:59 CDT
From: "Douglas H. Fisher" <dfisher@vuse.vanderbilt.edu>
Message-Id: <9009151207.AA04791@air.vuse>
Subject: m-of-n learning

Ritter's note quoted the result as `if one only searches the space
of m-of-n concepts' then they are not learnable -- the flurry of
notes that followed cited systems (Hampson and colleagues too?) that
learn m-of-n concepts as special cases -- someone should solicit a
clarification.
------------------------------
Date: Mon, 17 Sep 90 18:37:51 PDT
From: Wray Buntine <wray@ptolemy.arc.nasa.GOV>
Subject: Re: learnability & m-of-n learning

I couldn't resist the temptation of jumping in on the Pazzani/Ritter
discussion about learning m-of-n concepts and learnability.

First, let's just consider Pazzani's representation.
Notice that (letting a, b, c, d be boolean variables)

2-out-of-4(a,b,c,d) holds iff
hamming-distance(<a,b,c,d>, <true,true,true,true>) <= 2

i.e. at least 2 of the 4 variables are true exactly when the instance
differs from the all-true prototype in at most 2 positions.

Now hamming distance is the standard distance measure used by
nearest-neighbour (called "instance-based" or IBL in AI) algorithms
when applied to boolean variables. So, we could learn
general n-of-m concepts by modifying an instance-based approach
so that it can (1) remove irrelevant attributes (i.e. in the above case,
all variables except a,b,c,d are irrelevant to the metric)
and (2) whittle away the original training set of examples to a small
set of (presumably) prototypical ones. David Aha at
UC Irvine is working on instance-based learning algorithms of this sort
or something similar.
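
To make this concrete, here's a toy sketch (my own illustration with
made-up attribute indices, not Aha's IBL code) of an m-of-n concept
recast as a Hamming-distance threshold around the all-true prototype
over the relevant attributes:

    def hamming(x, y):
        # number of positions where two boolean vectors disagree
        return sum(a != b for a, b in zip(x, y))

    def make_m_of_n(m, relevant):
        # classifier for "at least m of the attributes indexed by
        # `relevant` are true", expressed as a distance threshold:
        # positive iff the instance differs from the all-true
        # prototype in at most len(relevant) - m relevant positions
        prototype = [True] * len(relevant)
        def classify(instance):
            projected = [bool(instance[i]) for i in relevant]  # drop irrelevant attrs
            return hamming(projected, prototype) <= len(relevant) - m
        return classify

    two_of_four = make_m_of_n(2, relevant=[0, 1, 2, 3])
    print(two_of_four([True, True, False, False, True]))   # True  (2 of 4 hold)
    print(two_of_four([True, False, False, False, True]))  # False (1 of 4 holds)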

If you really wanted to get messy, you could learn concepts using
AUTOCLASS and then translate the probabilistic classes AUTOCLASS produces
into concepts described by "weighted" hamming distances using the
same kind of transformation Minsky and Papert did to turn a
"linear Bayes classifier" into a "perceptron".

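The transformation itself is elementary: for boolean attributes, the
log posterior odds of a naive (linear) Bayes classifier are linear in
the inputs, so thresholding them at zero gives a perceptron-style
unit. A toy sketch with made-up parameters (not AUTOCLASS output):

    from math import log

    def bayes_to_perceptron(prior1, p1, p0):
        # p1[i] = P(x_i = 1 | class 1), p0[i] = P(x_i = 1 | class 0);
        # returns (weights, bias) with log P(1|x)/P(0|x) = bias + w.x,
        # so thresholding the linear score at 0 is the Bayes decision
        w = [log(p * (1 - q) / (q * (1 - p))) for p, q in zip(p1, p0)]
        bias = log(prior1 / (1 - prior1)) + sum(
            log((1 - p) / (1 - q)) for p, q in zip(p1, p0))
        return w, bias

    w, bias = bayes_to_perceptron(0.5, p1=[0.9, 0.9, 0.9], p0=[0.3, 0.3, 0.3])
    x = [1, 1, 0]
    print(bias + sum(wi * xi for wi, xi in zip(w, x)) > 0)  # True: Bayes says class 1
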
So there are two more wild ideas for you to try and unscramble, Mike.

Second, some comments about Ritter's response. Steve Ritter says:
> I'm not sure if this is too theoretical for you, but Pitt and Valiant
> (1988, JACM, 35, 965-984) have shown that n-of-m concepts (they call
> them "boolean threshold functions") are not learnable.

Aside:
Pitt and Valiant's results were concerned with "noise-free" concepts.
I'm sure Mike's original request was for learning n-of-m concepts in the
presence of noise (or uncertainty--one man's noise is another man's
uncertainty). Well, since medical domains were among the stated
applications, to be realistic it should have been.

How useful is it to know that something is not learnable?
It's really just a warning flag that says, "this problem is not
going to be easy". Of course, if you look into the guts of the
proof of non-learnability, you may also find some useful hints
on how Mike might design his learning algorithm.

I think there is a useful comparison here with the P/NP question.
i.e. What does knowing something is NP-complete tell us about
how we might write an algorithm for that particular problem class?
Here's a quote from Turner (SIAM J Comp., 15(2), '86):

Most research in algorithm design relies on worst-case analysis
for performance comparisons. Unfortunately, worst-case analysis does not
always provide an adequate measure of an algorithm's effectiveness.
This is particularly true in the case of heuristic algorithms ... In
such cases, analysis of the probable performance can yield more meaningful
results and can provide insight leading to better algorithms.

There is now a whole body of theory in theoretical computer science devoted to
average-case analysis, which shows that in "most cases", problems like
k-colourability and 3-partition are solvable in O(N^2), O(N^3) etc.
So what we thought was hard often turns out to be easy in "most cases".

The comparison becomes even more interesting when you try and relate
average & worst-case time complexity of algorithms with average &
worst-case sample complexity of learning. Here you start getting
into sticky bits and pieces about uniform convergence vs. Bayesian
statistics, etc.

I could go on at length, but to cut a long story short:
Pitt and Valiant's result is not "too theoretical" for Mike; rather,
it answers the wrong question (i.e. "is it hard?"
instead of "how would you design an algorithm?").

Wray Buntine

(Who's spent the last few years of his research life designing and
writing learning algorithms for non-learnable problems.)
------------------------------
Date: Mon, 24 Sep 90 22:06:38 -0500
From: Lenny Pitt <pitt@cs.uiuc.edu>
Subject: Learnability and m-of-n concepts

Here are some (hopefully) clarifying remarks for ML-LIST regarding
n-of-m concepts.

Mike Pazzani brought the discussion on n-of-m concepts to my
attention... I thought I might make some comments on the role of
such negative results in learning theory, and in particular,
on the meaning of the "non-learnability of n-of-m concepts" proof
by Valiant and myself. (My opinions are not necessarily shared by
Les Valiant).

There are a variety of formal definitions of "learnable".
Our theorem shows that for the distribution-independent
(PAC, or "probably approximately correct") model of Valiant,
no polynomial-time learning algorithm exists unless
RP (random polynomial time) is equal to NP (nondeterministic
polynomial time), which is thought unlikely. The proof
relies on showing that it is NP-hard to determine, given
a finite set of example instances, whether there exists
an n-of-m concept that is consistent with the examples.
This also shows that learning is not possible (unless P = NP)
in the model of exact identification with equivalence queries
due to Angluin, or in any model for which the ability to
find consistent hypotheses is a necessary requirement for satisfying
the definition of "learnable".
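
To make the consistency problem concrete: given labeled boolean
examples, is there a subset of attributes and a threshold t such that
"at least t of those attributes are true" agrees with every label?
Here is a brute-force sketch (purely illustrative, not from our
paper) that searches all attribute subsets and thresholds; the
NP-hardness result says that, in the worst case, one cannot expect to
avoid this kind of exponential search unless P = NP:

    from itertools import combinations

    def consistent_threshold_concept(examples):
        # examples: list of (boolean vector, label); returns a pair
        # (attribute subset, threshold t) whose concept "at least t
        # of the attributes in the subset are true" agrees with every
        # label, or None -- brute force over all 2^n subsets
        n = len(examples[0][0])
        for size in range(1, n + 1):
            for subset in combinations(range(n), size):
                for t in range(1, size + 1):
                    if all((sum(x[i] for i in subset) >= t) == label
                           for x, label in examples):
                        return subset, t
        return None

    sample = [([1, 1, 0, 0], True), ([1, 0, 0, 0], False), ([0, 1, 1, 0], True)]
    print(consistent_threshold_concept(sample))  # ((1,), 1): "attribute 1 is true" fits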

This type of negative result is very dependent on the hypothesis
space used by the learning algorithm in forming its conjecture.
The result thus says that it is hard [in the worst case] for an algorithm
to find reasonably accurate n-of-m hypotheses for n-of-m concepts.
The result says nothing of the ease or difficulty of learning n-of-m
concepts if the algorithm may conjecture rules that are not expressible
as an n-of-m concept. In this case, as with a number of other (but not
all) "nonlearnability" results of this type, efficient algorithms in fact
*are* known that PAC learn the class, but using a different hypothesis
space. An n-of-m concept is just a linear threshold function, with all
weights either 0 or 1, and with an integer threshold. The class is a
subset of real-valued linear threshold functions, and can thus be efficiently
learned by an algorithm for this larger class. Dietterich mentions
Littlestone's algorithm as an example. I'm not familiar with
Zhang's work (mentioned by Michalski), but a different positive
result than Littlestone's is certainly possible. Further note that
it is straightforward to see that such concepts can be learned efficiently
by an algorithm that is allowed to ask whether *particular* examples
are positive or negative examples of the concept.
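
As an illustration of the positive side, here is a toy rendition of
Winnow-style multiplicative updates (the parameter settings are
illustrative and untuned, and this is not Littlestone's exact
analysis) learning a 2-of-4 concept among irrelevant attributes:

    import random

    def winnow2(stream, n, alpha=2.0):
        # predict positive when the summed weight of active attributes
        # reaches theta; promote active weights on false negatives and
        # demote them on false positives (multiplicative updates)
        w, theta, mistakes = [1.0] * n, float(n), 0
        for x, y in stream:
            yhat = sum(wi for wi, xi in zip(w, x) if xi) >= theta
            if yhat != y:
                mistakes += 1
                factor = alpha if y else 1.0 / alpha
                w = [wi * factor if xi else wi for wi, xi in zip(w, x)]
        return w, mistakes

    random.seed(0)
    n = 10                          # attributes 0-3 relevant, 4-9 noise

    def target(x):                  # the hidden 2-of-4 concept
        return sum(x[:4]) >= 2

    data = []
    for _ in range(2000):
        x = [random.randint(0, 1) for _ in range(n)]
        data.append((x, target(x)))

    w, mistakes = winnow2(data, n)
    print([round(wi, 2) for wi in w], mistakes)  # relevant weights grow; mistakes stay small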

Our negative result for n-of-m concepts, and others contained in the same
paper, should be taken in the context of the field at the time:
We were investigating what techniques were available for showing
that certain types of concepts were *not* learnable according to the
PAC learning model. In my view, the moral of that paper is not that
"m-of-n" threshold functions are hard to learn... they clearly are not.
It is that the representation used by the algorithm can make a big difference
in the complexity of an inference. That is, it is not only the complexity
of the *function* to be inferred that places computational limitations
on the inference process, but additionally, the syntax of the
class of possible representations used by the learning algorithm
can make the difference between tractable and intractable.
Subsequent research has focussed on relaxing syntactic restrictions
on the output of learning algorithms, so that negative learning results are
more meaningful. For example, see the paper by Kearns and Valiant
[Proc. 21st STOC], by myself and Warmuth [to appear, JCSS],
or by Schapire [Proc. COLT '90].

I'll agree with Wray Buntine that such nonlearnability
proofs (even those that do not depend on syntactic restrictions
on the hypothesis space) are really just flags that say
"the problem is not going to be easy". Alas,
"the guts of the proof[s] of non-learnability" do not appear
to yield "some useful hints" on how to design learning algorithms.

A main complaint about most negative results in analysis of
algorithms and complexity theory is that they are "worst-case" complexity.
The PAC learning model is a mixture of worst and average case:
The definitions only require that a learning algorithm perform
well "on average" (or with reasonable accuracy) against the distribution
on which it was trained. However, the particular concept to be learned
can be chosen in "the worst case", as can the distribution
on training examples. [** It should be noted though, that the
"worst case" distributions that are typically used in nonlearnability
results are uniform distributions on finite subsets of all possible
examples; i.e., fairly simple distributions. Thus the PAC requirement
that the algorithm work "for all distributions" does not seem to
me to be the reason that certain classes can be proven nonlearnable. **]
In my view, an extremely important (but perhaps difficult)
avenue of research is that of determining what an "average" input
(concept-learning instance) of a concept class is, and formalizing the notion
of hard (or easy) on average for a given concept class in a way that
yields meaningful positive and negative results.

------------------------------
Date: Mon, 17 Sep 90 18:33:27 PDT
From: Kevin Thompson <kthompso@ptolemy.arc.nasa.GOV>
Subject: Announcing Cobweb/3, available in the public domain.

We are pleased to announce that Release 1.1 of Cobweb/3 has been completed,
and is available for anonymous FTP. Cobweb/3 is an implementation of Cobweb
(Fisher, 1987) and Classit (Gennari, Langley, and Fisher, 1989) released by
the Icarus group at NASA Ames Research Center for experimentation by the
community. It is designed for ease of use, and features an extensive manual,
available in postscript form via FTP or as hardcopy if you send us your
address.

Below is the README file for this release. Comments to
labyrinth@ptolemy.arc.nasa.gov.

Kevin Thompson

==============================================================================
WHAT IS COBWEB/3:

Cobweb/3 is an implementation of Cobweb and Classit: incremental,
unsupervised concept formation algorithms that use Category Utility
to search for a concept hierarchy which maximizes the ability to make
predictions about missing attributes.

Input consists of a data set of objects described as vectors of either
discrete or real-valued attributes. The output is a concept hierarchy,
and Cobweb/3 defines various prediction modes that allow testing of
predictive accuracy on unseen objects.
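
As a rough illustration of the evaluation function, here is a toy
computation of category utility for nominal attributes (the measure
Cobweb hill-climbs on; this sketch is not part of the Cobweb/3
release):

    from collections import Counter

    def expected_score(objects, n_attrs):
        # sum_i sum_j P(A_i = v_ij)^2 over one set of objects
        total = 0.0
        for i in range(n_attrs):
            counts = Counter(obj[i] for obj in objects)
            total += sum((c / len(objects)) ** 2 for c in counts.values())
        return total

    def category_utility(partition):
        # partition: list of clusters, each a list of attribute tuples;
        # CU = (1/K) sum_k P(C_k) [within-cluster score - base score]
        objects = [obj for cluster in partition for obj in cluster]
        n_attrs = len(objects[0])
        base = expected_score(objects, n_attrs)
        return sum(len(c) / len(objects) * (expected_score(c, n_attrs) - base)
                   for c in partition) / len(partition)

    # two clean clusters score higher than an arbitrary split
    good = [[("small", "red"), ("small", "red")],
            [("large", "blue"), ("large", "blue")]]
    bad  = [[("small", "red"), ("large", "blue")],
            [("small", "red"), ("large", "blue")]]
    print(category_utility(good), category_utility(bad))  # 0.5 0.0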

Cobweb/3 is written in (mostly portable) Common Lisp, and has been
tested using Franz, Lucid, Harlequin LispWorks, and AKCL on Sun 4s.
For Franz, there is an optional user interface for visualizing
classification results, and possibly building a concept hierarchy with
user input.

RELEASE HISTORY:

Version 1.0   20 Jun 90   Initial release.

Version 1.1   17 Sep 90   Second release:
 - a few bug fixes
 - handling of missing attributes
 - more prediction modes
 - tested with Lucid and Harlequin
 - on-line help

FOR FURTHER INFORMATION:

For more detailed discussions of this implementation of Cobweb, and the
underlying theory, see the associated manual:

McKusick, K., & Thompson, K. (1990). Cobweb/3: A portable implementation
(Technical Report FIA-90-6-18-2). Moffett Field, CA.: NASA Ames Research
Center, Artificial Intelligence Research Branch.

We will be happy to send you a hard copy of this manual (it's about 70
pages long); please send e-mail with your (real) mail address to
labyrinth@ptolemy.arc.nasa.gov. The manual is also available in
compressed postscript form as cobweb-manual.ps.Z (see ftp information
below). If you cannot print it, please let us know.

Another good reference to Classit and Cobweb is:

Gennari, J.H., Langley, P., & Fisher, D.H. (1989). Models of incremental
concept formation. Artificial Intelligence, 40, 11--61.

Comments, bug reports, etc., are welcome, and should be sent to
labyrinth@ptolemy.arc.nasa.gov. We would appreciate your registering
your use of this system, describing applications and any results you get
(positive or negative). Please feel free to make suggestions about
desirable improvements and extensions, and perceived problem areas. We
regard such feedback as an essential element of the development process.


WHY THIS IMPLEMENTATION:

This implementation of Cobweb is *not* the most efficient one, nor the
smallest. It has been designed more for usability, and could undoubtedly
be sped up considerably. The Cobweb/3 implementation features:

1) A graphic interface that allows dynamic inspection of trees and node
contents, as well as capability for users to build trees themselves, with
guidance from the system evaluation functions (Franz, X windows only,
until CLIM becomes more widely available);

2) An extensive set of top-level switches to control how nodes are
printed and how much output is shown, to allow passing in of trees from
an external file, etc.;

3) Support for both continuously-valued and nominal attributes (they
can be used together, but mixing them is a research issue);

4) Preliminary mechanism for value hierarchies given as background
knowledge;

5) Performance mechanisms for testing, allowing prediction of missing
attributes (warning: research code);

6) A fairly extensive manual.

AVAILABILITY:

Cobweb/3 is available via anonymous FTP. For those of you without FTP
access, send us e-mail with particulars about your situation and
requirements and we will try to accommodate you (no promises).

To FTP the code, connect to muir.arc.nasa.gov (128.102.112.24) (user
anonymous, password 'guest'), and cd to directory pub/icarus. Get the file
cobweb.tar.Z (remember to use binary mode), and, if you are using Franz
Common Lisp with Common Windows on a Sun, either gr3.fasl.Z or gr4.fasl.Z
(depending on whether you are on a Sun 3 or Sun 4).

LISP ENVIRONMENT DEPENDENCIES:

Cobweb/3 was developed using Franz Common Lisp on Sun workstations, but
has been tested using these implementations:

Franz Allegro Common Lisp 3.1
AKCL 1.243
Lucid Common Lisp 3.0
Harlequin LispWorks 2.0

It should compile without problems on any other implementation; please let
us know if it does not (with associated patches :). However, on other
platforms, for full flexibility with node inspection, you may have to
define get-printer and set-printer in implementation-dep.cl.

The current graphics interface was developed with Franz's Common Windows.
We hope to move to International Lisp Associates' CLIM in the near future.

INSTALLATION:

Once you have these files, execute the UNIX command:

uncompress -c cobweb.tar.Z | tar xf -

You should have a subdirectory called 'code', and a file INSTALL that will
tell you how to install the system.


Kevin Thompson
Kathleen McKusick
labyrinth@ptolemy.arc.nasa.gov
NASA Ames Research Center
Moffett Field, CA, USA
------------------------------
END of ML-LIST 2.19
