Neuron Digest Volume 04 Number 31
Neuron Digest Saturday, 10 Dec 1988 Volume 4 : Issue 31
Today's Topics:
Schwartz Associates Neural Net Promotional Advertisement is bogus
Local learning algs?
Re: Local learning algs?
lessons in Connectionism
Polynomial Discriminant Method
Re: Learning arbitrary transfer functio
Papers on Neuralnets and expertsystems
Re: Learning arbitrary transfer functio
Request for info on Weight-Decay in NN Models
high level cognition
Back-propogation question
Send submissions, questions, address maintenance and requests for old issues to
"neuron-request@hplabs.hp.com" or "{any backbone,uunet}!hplabs!neuron-request"
------------------------------------------------------------
Subject: Schwartz Associates Neural Net Promotional Advertisement is bogus
From: MINSKY@AI.AI.MIT.EDU (Marvin Minsky)
Organization: The Internet
Date: 29 Nov 88 04:18:08 +0000
A company named Schwartz Associates of Mountain View, CA, has
been mailing a neural net promotional advertisement that prominently
displays my name on the envelope and every page inside, with
statements that broadly suggest that I am involved in their
enterprise. They suggest that along with the $1495.00 collection of
videos and reprints, the customer is entitled to contact me for a
final qualifying examination. Needless to say, I have nothing to do
with them and it would seem natural to presume that their wares are as
shabby as their disgraceful business practices.
[[ Editor's note: In the new commercial field of Neural Nets, as in
nearly all other fields, caveat emptor. However, I am shocked that a
company should engage in such business practices that a well known
researcher must issue a disclaimer as above. If anyone from Schwartz
Associates (Tom?) would respond, I will publish the answer. If you,
readers, wish to get information about their claims, please let me
know so I may set the record straight. -PM ]]
------------------------------
Subject: Local learning algs?
From: hassell@tramp.Colorado.EDU (Christopher Hassell)
Date: 25 Nov 88 08:01:25 +0000
Hello,
I'm quite inexperienced in this field and am pursuing information on
the 'general' background of this subject BUT,
I'd like to know if anyone (if not a lot of you already) uses learning
algorithms that work at integrating a neuron's weights DIRECTLY from
that neuron's input with maybe a little environment's extra input,
instead of the all-knowledgeable teaching algo telling what is wrong
and where.
This would appear to be the more accurate 'simulation' of the brain,
though from what I guess, more unwieldy. My own theory is something
along the lines of a 'hardening' of weights and 'weakening' upon
various signals (a 'stressful' signal would cause fluctuation, a
signal with sharp dropoffs, lower activity, 'quick' relaxation would
'harden' the weights employed).
I've thought of running my own simulations, but more info is needed
here. I got the above model simply from a degree of actual data
(unknowledgeable test subjects exhibit higher constant activity than
comfortable ones) and good ol' gee-that's-the-way-it-feels-to-me data
(ugh).
This is a long request but info would definitely be appreciated.
sendat(TimeNow>TimeInfoRecieved , ThankYouMessage)
{rutgers!sunybcs, nbires, ncar}!boulder!tramp!hassell
[[ Editor's note: I cannot agree that the "unsupervised learning" this
writer refers to (see PDP vol 1 for a good introduction) is "a more
accurate simulation of the brain" simply because the brain appears to
use a panoply of mechanisms... most of which we have not yet
discovered. However, I suggest looking at Grossberg's ART work, and
any of the many papers on "unsupervised learning." -PM ]]
------------------------------
Subject: Re: Local learning algs?
From: manj@brand.usc.edu (B. S. Manjunath)
Organization: University of Southern California, Los Angeles, CA
Date: 25 Nov 88 16:09:32 +0000
In previous article hassell@tramp.Colorado.EDU (Christopher Hassell) writes:
>I'd like to know if anyone (if not a lot of you already) uses
>learning algorithms that work at integrating a neuron's weights
>DIRECTLY from that neuron's input with maybe a little environment's
>extra input, instead of the all-knowledgeable teaching algo telling
>what is wrong and where.
I did some simulations using a stochastic learning rule in which the
environment supplies a global feedback (all the neurons, including
the ones in the hidden layer, receive the same feedback) and the
weights are updated by a local algorithm. No BP (Back Propagation) of
the errors is involved.
Under the same testing conditions and random initial weights in the
range (0,1) this algorithm was faster by a factor of at least 1.5
compared to the BP rule. I tested on the XOR and Parity learning problems
(both described in detail in the PDP book by Rumelhart et al) and the
results were impressive.
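[[ Illustrative sketch: the post does not spell out its stochastic rule, so what follows is only one common form of such a rule, in the spirit of the Williams reference below, and not the author's algorithm. It assumes stochastic binary units and a single scalar reward broadcast to every unit; the names sample_unit and local_update, the baseline and the learning rate are all hypothetical.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def sample_unit(weights, inputs, rng):
    """Fire a stochastic binary unit; return (output, firing probability)."""
    p = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
    y = 1 if rng.random() < p else 0
    return y, p


def local_update(weights, inputs, y, p, reward, baseline, lr):
    """Reward-modulated update (assumed form, not taken from the post).

    delta_w_j = lr * (reward - baseline) * (y - p) * x_j
    Every quantity is local to the unit except the scalar reward, which the
    environment broadcasts identically to all units, hidden ones included.
    """
    factor = lr * (reward - baseline) * (y - p)
    return [w + factor * x for w, x in zip(weights, inputs)]


rng = random.Random(0)
w = [0.2, 0.7, 0.1]
x = [1.0, 0.0, 1.0]  # the last input plays the role of a bias
y, p = sample_unit(w, x, rng)
w_new = local_update(w, x, y, p, reward=1.0, baseline=0.5, lr=0.1)
```

No error signal is propagated backwards here: each unit needs only its own input, its own output and the shared reward. Whether such a rule is faster than BP on a given problem is an empirical matter the sketch does not address. ]]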
The following references might be useful:
1. R.J. Williams, "A class of gradient estimating algorithms for
reinforcement learning in NN", ICNN 87, vol. 2, pp. 601-608.
2. A.G. Barto and M.I. Jordan, "Gradient following without BP in
layered nets", ICNN 87, vol. 2, pp. 629-636.
If you want to go a little deeper into stochastic learning algorithms:
3. K.S. Narendra and M.A.L. Thathachar, "Learning Automata - A survey",
IEEE Trans. Syst., Man, and Cybern., vol. 4, pp. 323-334, 1974.
Though this is a little bit outdated it contains all the relevant info.
PS: I am not claiming that this stochastic rule is superior to BP
under all circumstances. Its main advantages are that its updating
is local and that it is less sensitive to the initial weight
configuration than BP.
bs manjunath.
------------------------------
Subject: lessons in Connectionism
From: aguero.j@oxy.edu (Josefina Aguero)
Organization: Occidental College, Los Angeles, CA 90041
Date: 29 Nov 88 10:55:24 +0000
I am a senior at Occidental College busily putting together the thesis
that is required for a Cognitive Science major. In my project, I am
interested in making Connectionism accessible to people who know
nothing about cognitive science. This involves educating people on
simple things such as a little neurobiology, computer science and
philosophy. The project is progressing slowly and carefully as I pore
over McClelland and Rumelhart's PDP books. I would like some advice
on what sort of issues I should emphasize, aside from what I am
emphasizing now, which is reductionism. Advice on what approach to
take, given that this project is aimed to be educational and for the
most part descriptive, would be greatly appreciated.
Also, just out of curiosity, I think the fact there have been other
attempts in the past at building Connectionist models is important in
explaining intellectual debts and understanding the context in which
Connectionist models are flourishing, as opposed to before when
parallel architectures were more experimental (sorry); I wonder if
anyone believes this is important in understanding Connectionism. With
this, I invite comments and advice. Thank you in advance.
Josefina Aguero
------------------------------
Subject: Polynomial Discriminant Method
From: tomh@proxftl.UUCP (Tom Holroyd)
Organization: Proximity Technology, Ft. Lauderdale
Date: 29 Nov 88 18:21:36 +0000
The Dec. '88 issue of Computer Language has an article on the
PADALINE, a polynomial ADALINE. The only reference given is a Ph.D.
dissertation by Donald Specht (Stanford) written in the mid-60's. Are
there any more recent references?
A brief description of the method:
Given two categories of patterns, A and B, construct a polynomial
that will yield a positive number given a vector XA from category A
and a negative number given a vector XB from category B. Compute the
constants of the polynomial using the general formula:
\[
c_{z_1 z_2 \ldots z_p} = \frac{1}{z_1!\, z_2! \cdots z_p!\; s^{2h}}
\left[ \frac{1}{m} \sum_{i=1}^{m} XA_{i1}^{z_1}\, XA_{i2}^{z_2} \cdots \mathrm{EXP}_{XA_i}
\;-\; \frac{K}{n} \sum_{i=1}^{n} XB_{i1}^{z_1}\, XB_{i2}^{z_2} \cdots \mathrm{EXP}_{XB_i} \right]
\]
where z_1...z_p are the subscripts for the constants, s^{2h} is a smoothing
factor, XA_{i1} is the 1st element of the ith vector from the set of m vectors
in category A. XB_{i1} is the 1st element of the ith category B vector; there
are n of these. K is computed from the ratio of the number of A vectors to
the number of B vectors. EXP_X is exp(-L/2s^2) where L is the square of the
length of X.
What you do is compute each constant using the above formula, and then
construct a polynomial which, when given a vector from either
category, will yield the correct categorization.
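[[ Illustrative sketch: the two steps just described (compute each constant, then evaluate the polynomial) might look as follows in Python. Several things here are assumptions rather than part of the description above: h is taken to be z1 + z2 + ... + zp, K is simply passed in by the caller, the polynomial is truncated at a chosen total degree, and the function names pdm_coefficients and pdm_evaluate are made up for the example.

```python
import itertools
import math


def pdm_coefficients(A, B, s, max_degree, K=1.0):
    """Return {(z1, ..., zp): c} for every exponent tuple with z1+...+zp <= max_degree."""
    p = len(A[0])

    def class_average(vectors, z):
        # (1/count) * sum_i  X_i1^z1 * X_i2^z2 * ... * exp(-|X_i|^2 / (2 s^2))
        total = 0.0
        for x in vectors:
            term = math.exp(-sum(v * v for v in x) / (2.0 * s * s))
            for xj, zj in zip(x, z):
                term *= xj ** zj
            total += term
        return total / len(vectors)

    coeffs = {}
    for z in itertools.product(range(max_degree + 1), repeat=p):
        h = sum(z)  # assumption: h is the total degree of the term
        if h > max_degree:
            continue
        scale = s ** (2 * h)
        for zj in z:
            scale *= math.factorial(zj)
        coeffs[z] = (class_average(A, z) - K * class_average(B, z)) / scale
    return coeffs


def pdm_evaluate(coeffs, x):
    """Evaluate the polynomial at x; positive suggests category A, negative B."""
    total = 0.0
    for z, c in coeffs.items():
        term = c
        for xj, zj in zip(x, z):
            term *= xj ** zj
        total += term
    return total


# Tiny hand-checkable case: one pattern per category, mirrored on the first axis.
coeffs = pdm_coefficients([(1.0, 0.0)], [(-1.0, 0.0)], s=1.0, max_degree=3)
score_a = pdm_evaluate(coeffs, (1.0, 0.0))
score_b = pdm_evaluate(coeffs, (-1.0, 0.0))
```

In this sketch the number of terms kept is the number of exponent tuples of total degree at most d in p variables, which is C(p+d, d); it depends on the dimension and the truncation degree, not on the number of training vectors. ]]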
You need to have all your vectors pre-categorized, of course. It
looks nice since you only run the calculation once, and after that the
polynomial just works. You don't have to wait for something to
converge. Adding a new pattern vector means re-doing the whole
calculation, tho.
Anybody actually used this and care to comment? How many terms do you
need, say, if you have 1000 10-dimensional vectors? Can you throw
away the higher order terms without hurting the categorization too
much?
Tom Holroyd
UUCP: {uflorida,uunet}!novavax!proxftl!tomh
The white knight is talking backwards.
------------------------------
Subject: Re: Learning arbitrary transfer functio
From: joe@amos.ling.ucsd.edu (Shadow)
Organization: Univ. of Calif., San Diego
Date: 29 Nov 88 20:30:23 +0000
In article <163400002@inmet> ryer@inmet.UUCP writes:
>> So, how do humans learn non-linear functions ?
>>
>> : you learn that x^2, for instance, is X times X.
>>
>> And how about X times Y ? How do humans learn that ?
>>
>> : you memorize it, for single digits, and
>> : for more than a single digit, you multiply streams
>> of digits together in a carry routine.
>Although my knowledge of neural nets is limited, I won't buy what is
>written above. Most persons can, for example, throw a baseball more
>or less at the target in spite of gravity. This requires a non-linear
>calculation. This is not done via multiplication tables. Sure it is
>done by "experience", but so are neural network calculations.
Hmm. I'm no expert on human learning, but I don't buy what's written above.
When I throw a baseball off the top of a ten-story building, I am very
bad at hitting that at which I aimed (e.g., students). This would lead
me to theorize that I have not learned a non-linear relationship.
All of this aside, I must note that the original article was
misinterpreted. That was unfortunate, as I was theorizing on ways to
improve generalized learning of non-linear mathematical relationships
for data outside of the training domain... results in this area were
usually fairly dismal in the experiments which I conducted.
Ideas:
1. how about linear units on the output layer ?
(Idea care of Jeff Elman, ICS, CRL)
2. sub-networks trained for sub-tasks.
(sub-networks mentioned to me in passing by Jeff Elman, ICS,CRL)
I welcome comments, and actually, I would really like to hear from
people who are experts on human learning. This topic is obviously too
hot for me to handle.
(feel free to send mail)
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
= - =
= "But why not play god ? " - joe@amos.ling.ucsd.edu =
= - un-named geneticist - =
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
------------------------------
Subject: Papers on Neuralnets and expertsystems
From: ramanuja@cs.Buffalo.EDU (Sridhar Ramanujam)
Organization: SUNY/Buffalo Computer Science
Date: 29 Nov 88 21:20:31 +0000
I'm looking for the following papers presented at the INNS conference,
and will be grateful if some kind soul out there could send me
copies.
The papers are:
1.) A NEURAL NETWORK APPROACH FOR IMPLEMENTING EXPERT SYSTEMS.
P.A. Ramamoorthy and S. Ho.
Department of Electrical and Computer Engineering
University of Cincinnati.
2.) INTEGRATION OF NEURAL NETWORKS AND EXPERT SYSTEMS FOR PATTERN
RECOGNITION.
Daeng Li and William G. Wee.
Department of Electrical and Computer Engineering
University of Cincinnati.
3.) DYNAMICS SCHEMAS, EXPERT SYSTEMS and A.R.T
Sam Leven and Young Yoon
The University of Texas at Arlington.
thanks in advance.
sridhar
e-mail: ramanuja@cs.buffalo.edu (CSNET) or IN%"V999QJGY@UBVMSC.BITNET"
us-mail: Sridhar Ramanujam, 518 LaSalle Ave, Buffalo, NY 14215.
------------------------------
Subject: Re: Learning arbitrary transfer functio
From: aluko@Portia.Stanford.EDU (Stephen Goldschmidt)
Organization: Stanford University
Date: 30 Nov 88 18:33:20 +0000
In article <5572@sdcsvax.UCSD.EDU> you write:
>All of this aside, I must note that the original article was misinterpreted.
>That was unfortunate, as I was theorizing on ways to improve generalized
>learning of non-linear mathematical relationships for data outside
>of the training domain... results in this area were usually fairly dismal
>in the experiments which I conducted.
I have done considerable work in modeling non-linear functions with a
program called ASPN (Algorithm for Synthesis of Polynomial Networks)
which I helped to develop at Barron Associates Inc. during 1986. My
experience was that polynomial functions (which is what ASPN
ultimately produces, though in the form of a network) are excellent
for interpolations under certain conditions, but fail miserably on
extrapolation. Part of the art is to configure your problem so that
the network is never asked to extrapolate.
An example:
Suppose you want to predict the output of an unforced linear system
of the form y'(t) = y(t) - b
If you train your network to model the function y(t, b, y(0)) for t < 2
and then evaluate the network on t = 3, you are asking it to extrapolate
to values of t that it has never seen before. This is too much to
ask of an economist, let alone a computer! :-)
If, instead, you model the function y( y(t-1), y(t-2) )
the network should discover that
y(t) = (1+e)*y(t-1) - e*y(t-2)
which is not only an easier function to model, but also does not
require explicit knowledge of b.
When you evaluate it on t=3, the network is not going to try to
extrapolate (assuming that your input values of y(t-1) and y(t-2)
are in the range of the values used in training the network).
Thus, it is often possible to turn an extrapolation problem into
an interpolation problem.
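[[ Illustrative sketch: the recurrence above can be checked numerically. The closed form y(t) = b + (y(0) - b)*exp(t) used below is not stated in the post; it comes from solving y'(t) = y(t) - b, and "e" in the recurrence is read as Euler's number. The function names are made up for the example.

```python
import math


def y_exact(t, b, y0):
    """Closed-form solution of y'(t) = y(t) - b with y(0) = y0."""
    return b + (y0 - b) * math.exp(t)


def y_from_history(y_prev1, y_prev2):
    """The recurrence from the post: y(t) = (1+e)*y(t-1) - e*y(t-2)."""
    return (1.0 + math.e) * y_prev1 - math.e * y_prev2


b, y0 = 0.3, 1.0
# Predict y(3) from y(2) and y(1) without ever using b or t explicitly.
predicted = y_from_history(y_exact(2.0, b, y0), y_exact(1.0, b, y0))
expected = y_exact(3.0, b, y0)
```

The two values agree to rounding error for any b and y(0), which is why the reformulated inputs let a network interpolate instead of extrapolate in t. ]]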
Stephen R. Goldschmidt
aluko@portia.stanford.edu
------------------------------
Subject: Request for info on Weight-Decay in NN Models
From: kruschke@cogsci.berkeley.edu (John Kruschke)
Date: Fri, 02 Dec 88 13:14:06 -0800
I'm interested in all the information I can get regarding WEIGHT DECAY
in back-prop, or in other learning algorithms.
*In return* I'll collate all the info contributed and send the
compilation out to all contributors.
Info might include the following:
REFERENCES:
- Applications which used weight decay
- Theoretical treatments
Please be as complete as possible in your citation.
FIRST-HAND EXPERIENCE
- Application domain, details of I/O patterns, etc.
- exact decay procedure used, and results
(Please send info directly to me: kruschke@cogsci.berkeley.edu)
T H A N K S ! --John Kruschke.
------------------------------
Subject: high level cognition
From: smf7s@hudson.acc.virginia.edu (friedman steven michael)
Organization: University of Virginia
Date: 04 Dec 88 22:59:26 +0000
I have done some reading in the field of neural networking and
PDP, and I have seen much research in the areas of pattern
recognition, associative memory, and others. I have not seen much in
the way of higher level cognitive processes, such as logic and
inference, abstract knowledge representation or generalization. Can
anybody point me towards some references for books and journal
articles in these areas? Thanks in advance.
Steven M Friedman
Mail path: smf7s@virginia.BITNET
Voice path: (804) 295 0235
------------------------------
Subject: Back-propogation question
From: reiter@endor.harvard.edu (Ehud Reiter)
Organization: Aiken Computation Lab Harvard, Cambridge, MA
Date: 05 Dec 88 17:23:18 +0000
Is anyone aware of any empirical comparisons of back-propagation to
other algorithms for learning classifications from examples (e.g.
decision trees, exemplar learning)? The only such article I've seen
is Stanfill & Waltz's article in Dec 86 CACM, which claims that
"memory-based reasoning" (a.k.a. exemplar learning) does better than
back-prop at learning word pronunciations. I'd be very interested in
finding articles which look at other learning tasks, or articles which
compare back-prop to decision-tree learners.
The question I'm interested in is whether there is any evidence that
back-prop has better performance than other algorithms for learning
classifications from examples. This is a pure engineering question -
I'm interested in what works best on a computer, not in what people
do.
Thanks.
Ehud Reiter
reiter@harvard (ARPA,BITNET,UUCP)
reiter@harvard.harvard.EDU (new ARPA)
------------------------------
End of Neurons Digest
*********************