Machine Learning List: Vol. 4 No. 12
Tuesday, June 16, 1992
Contents:
ECML-93 Announcement
Real-World Applications of Machine Learning Techniques
Paradigmatic over-fitting
COLT 92
Machine Discovery Workshop
Job Advertisement- Arris Pharmaceutical
The Machine Learning List is moderated. Contributions should be relevant to
the scientific study of machine learning. Mail contributions to ml@ics.uci.edu.
Mail requests to be added or deleted to ml-request@ics.uci.edu. Back issues
may be FTP'd from ics.uci.edu in pub/ml-list/V<X>/<N> or N.Z where X and N are
the volume and number of the issue; ID: anonymous PASSWORD: <your mail address>
----------------------------------------------------------------------
Date: Mon, 15 Jun 1992 16:45:16 +0100
From: Pavel Brazdil <pbrazdil@ciup1.ncc.up.pt>
Subject: ECML-93 Announcement
ECML-93
European Conference on Machine Learning
___________________________________________________
5-7 April 1993
Vienna, Austria
Announcement and Call for Papers
__________________________________________
General Information:
________________________
ECML-93 will continue the tradition of the earlier EWSLs
(European Working Sessions on Learning) and provide a platform
for presenting the latest results in the area of machine learning.
Although ECML-93 is the first conference under this name,
it can be considered the sixth meeting of this kind in
Europe.
Programme:
______________
The scientific programme will include an invited talk by Ross
Quinlan from the University of Sydney and presentation of
selected papers. The programme will be complemented by the
possibility of organizing discussion groups and workshops on
selected topics (8 April).
Submission of Papers:
__________________________
Submissions are invited on original research covering all
aspects of machine learning including, but not limited to:
  learning system architectures
  multi-strategy learning
  inductive & deductive methods
  inductive logic programming
  abduction
  automated discovery
  representational change in learning
  learning in problem solving
  reinforcement learning
  learning by analogy
  case-based learning
  unsupervised learning
  neural network learning
  genetic approaches
  theory of learnability
  evaluation of learning methods
  applications of ML
Long papers should be limited to 18 pages. Short papers
describing work in progress should be limited to 10 pages.
Submissions should be made in four copies to the Programme
Chairman.
Important Dates:
___________________
Submission deadline: 16 October 1992
Notification of acceptance / rejection: 4 December 1992
Camera ready copy: 15 January 1993
Proposals for Discussion Groups and Workshops (8 April):
__________________________________________________________________
Particularly welcome are proposals for discussion groups and
workshops that promote collaboration between the machine
learning and other related research areas, such as logic
programming, program transformation, probabilistic approaches,
knowledge acquisition, human learning, distributed AI etc. The
position of each group can be summarized in the form of paper
that can appear in the ECML-93 Proceedings. The submission
deadline for proposals is 1 September 1992.
Programme Chairman:
__________________________
Pavel Brazdil
LIACC, Rua Campo Alegre 823
4100 Porto, Portugal
Tel.: (+351) 2 600 1672 Ext. 106
Fax: (+351) 2 600 3654
Email: pbrazdil@ncc.up.pt
Programme Committee:
__________________________
F. Bergadano (Italy) I. Bratko (Slovenia)
P. Brazdil (Portugal) L. de Raedt (Belgium)
J. G. Ganascia (France) K. de Jong (USA)
A. Kakas (Cyprus) Y. Kodratoff (France)
N. Lavrac (Slovenia) R.L. de Mantaras (Spain)
K. Morik (Germany) I. Mozetic (Austria)
S. Muggleton (UK) L. Saitta (Italy)
D. Sleeman (UK) J. Shavlik (USA)
M. Someren (Netherl.) W. Van de Velde (Belgium)
R. Wirth (Germany)
Local Arrangements:
________________________
Igor Mozetic and Gerhard Widmer
Austrian Research Institute for AI &
Dept. of Medical Cybernetics and AI
Schottengasse 3
A-1010 Wien, Austria
Tel.: (+43) 1 533 6112 or (+43) 1 535 32810
Fax: (+43) 1 532 0652
Email: ecml@ai.univie.ac.at
------------------------------
Date: Mon, 1 Jun 92 15:57:30 +0200
From: yk@lri.fr
Subject: Real-World Applications of Machine Learning Techniques
Call for Papers
Heuristics -The Journal of Knowledge Engineering
Special issue on
"Real-World Applications of Machine Learning Techniques"
During the last decade, the artificial intelligence (AI) approach to
machine learning (ML) produced a large body of different methods
illustrated by the systems born, for instance, from ID3 and AQ. Many other
techniques have also been discovered, tested, and compared to each other.
They include program synthesis and inductive logic programming,
explanation-based learning, PAC-learning, chunking, conceptual clustering,
generalization techniques, knowledge refinement, scientific discovery,
analogy and case-based reasoning.
In parallel, we witnessed the vigorous growth of Bayesian learning,
connectionism, and genetic algorithms.
This issue will stress real-world applications of the first set of
techniques, and also mixed systems that made use of Bayesian learning, or
connectionism, or genetic algorithms, together with one of the more AI
inspired methods.
The papers should clearly explain why the learning technique, or the
combination of techniques, was necessary to solve the particular
problem of their application. They should also indicate what actual
improvement was obtained from the use of ML techniques, as
opposed to solving the same problem in a more standard way.
If possible, we would also like to establish a catalogue of successes
and failures. People who applied ML with success, but who cannot
undertake the work of writing a review paper, are welcome to communicate a
few characteristics of their application. In parallel, people who failed to
obtain any improvement from the use of an ML technique, and who can
explain why, are also welcome to contact the guest editor.
Thus, the spirit of this special issue is to welcome a detailed
presentation of all systems that were able to solve a real-life problem (not
necessarily a complicated one) by using at least one of the methods born
from the AI approach to ML.
Submissions
Submit 4 copies of an original, unpublished paper to the guest
editor. All submissions should be double-spaced, and each copy should begin
with a short (less than 200 words) abstract. Submissions should be received
by the guest editor by September 1st, 1992. Acceptance notices will be
issued by December 15th, 1992, and the deadline for submitting final
manuscripts and accompanying material will be February 1st, 1993.
Questions regarding the special issue should be directed to the guest
editor, by phone at (33) (1) 69 41 69 04 or, preferably, by email: yk@lri.lri.fr.
Yves Kodratoff
Guest Editor, Heuristics
LRI, Bldg 490
University Paris-Sud
F-91405 Orsay, France
------------------------------
Date: Fri, 5 Jun 92 10:44:21 PDT
Subject: Paradigmatic over-fitting
From: tgd@icsi.berkeley.EDU
Rik Belew recently posted the following message to the GA Digest.
With his permission, I am reposting this to ML-LIST. I think if you
substitute some of the standard Irvine databases for DeJong's F1-F5,
the same argument applies to much of the current experimental work in
Machine Learning. We need a constant flow of new databases.
--Tom
======================================================================
From: belew%FRULM63.BITNET@pucc.Princeton.EDU (Rik BELEW)
Date: Fri, 15 May 92 14:55:04 +0200
Subject: Paradigmatic over-fitting
There has been a great deal of discussion in the GA community concerning
the use of particular functions as a "test suite" against which different
methods (e.g., of performing cross-over) might be compared. The GA is
perhaps particularly well-suited to this mode of analysis, given the way
arbitrary "evaluation" functions are neatly cleaved from characteristics
of the algorithm itself. We have argued before that dichotomizing the
GA/evaluation function relationship in this fashion is inappropriate [W.
Hart & R. K. Belew, ICGA'91]. This note, however, is intended to focus on
the general use of test sets, in any fashion.
Ken DeJong set an ambitious precedent with his thesis [K. DeJong, U.
Michigan, 1975]. As part of a careful empirical investigation of several
major dimensions of the GA, DeJong identified a set of five functions that
seemed to him (at that time) to provide a wide variety of test conditions
affecting the algorithm's performance. Since then, the resulting "De Jong
functions F1-F5" have assumed almost mythic status, and continue to be one
of the primary ways in which new GA techniques are evaluated.
Within the last several years, a number of researchers have re-evaluated
the DeJong test suite and found it wanting in one respect or another.
Some have felt that it does not provide a real test for cross-over, some
that the set does not accurately characterize the general space of
evaluation functions, and some that naturally occurring problems provide a
much better evaluation than anything synthesized on theoretical grounds.
There is merit to each of these criticisms, and all of this discussion has
furthered our understanding of the GA.
But it seems to me that somewhere along the line the original DeJong suite
became vilified. It isn't just that "hindsight is always 20-20."  I
want to argue that DeJong's functions WERE excellent, so good that they
now ARE a victim of their own success. My argument goes beyond any
particular features of these functions or the GA, and therefore won't make
historical references beyond those just sketched. \footnote{If I'm right
though, it would be an interesting exercise in history of science to
confirm it, with a careful analysis of just which test functions were
used, when, by whom, with citation counting, etc.} It will rely instead on
some fundamental facts from machine learning.
Paradigmatic Over-fitting
"Over-fitting" is a widely recognized phenonemon in machine learning (and
before that, statistics). It refers to a tendancy by learning algorithms
to force the rule induced from a training corpus to agree with this data
set too closely, at the expense of generalization to other instances. We
have all probably seen the example of the same data set fit with two
polynomials, one that is correct and a second, higher-order one that also
attempts to fit the data's noise. A more recent example is provided by
some neural networks, which generalize much better to unseen data if their
training is stopped a bit early, even though further epochs of training
would continue to reduce the observed error on the training set.
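
To make the polynomial example concrete, here is a minimal illustrative
sketch (Python with numpy; the data, noise level, and degrees are invented
for illustration only, not taken from any of the work discussed here). A
higher-order fit drives the training error down while the error on held-out
points from the same underlying function goes up:

import numpy as np

rng = np.random.default_rng(0)

# A dozen noisy samples of a simple linear relationship (made-up data).
x_train = np.linspace(-1.0, 1.0, 12)
y_train = 1.0 + 2.0 * x_train + rng.normal(scale=0.2, size=x_train.size)

# Held-out points drawn from the same underlying relationship.
x_test = np.linspace(-1.0, 1.0, 200)
y_test = 1.0 + 2.0 * x_test

for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)        # least-squares fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
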
I suggest entire scientific disciplines can suffer a similar fate. Many
groups of scientists have found it useful to identify a particular data
set, test suite or "model animal" (i.e., particular species or even
genetic strains that become {\em de rigueur} for certain groups of
biologists). In fact, collective agreement as to the validity and utility of
scientific artifacts like these is critically involved in defining the
"paradigms" (a la Kuhn) in which scientists work. Scientifically, there
are obvious benefits to coordinated use of common test sets. For example,
a wide variety of techniques can be applied to common data and the results
of these various experiments can be compared directly. But if science is
also seen as an inductive process, over-fitting suggests there may also be
dangers inherent in this practice.
Initially, standardized test sets are almost certain to help any field
evaluate alternative methods; suppose they show that technique A1 is
superior to B1 and C1. But as soon as the results of these experiments
are used to guide the development ("skew the sampling") of new methods (to
A2, A3 and A4 for example), our confidence in the results of this second
set of experiments as accurate reflections of what will be found generally
true, must diminish. Over time, then, the same data set that initially
served the field well can come to actually impede progress by creating a
false characterization of the real problems to be solved. The problem is
that the time-scale of scientific induction is so much slower than that of
our computational methods that the biases resulting from "paradigmatic
over-fitting" may be very difficult to recognize.
Machine learning also offers some remedies to the dilemma of over-training.
The general idea is to use more than one data set for training, or more
accurately, partition available training data into subsets. Then,
portions of the training set can be methodically held back in order to
compare the result induced from one subset with that induced from another
(via cross-validation, jack-knifing, etc.).
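
As a concrete sketch of the hold-back idea, here is a minimal k-fold
partitioning routine (Python with numpy; data and labels are assumed to be
numpy arrays, and the fit and score arguments are placeholders for whatever
induction method and error measure one wishes to compare, not a reference
to any particular system):

import numpy as np

def k_fold_results(data, labels, fit, score, k=5, seed=0):
    """Hold back each of k portions in turn: fit on the rest, score on the held-back part."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(data))            # shuffle before partitioning
    folds = np.array_split(order, k)
    results = []
    for i in range(k):
        held_back = folds[i]
        rest = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(data[rest], labels[rest])     # result induced from one subset
        results.append(score(model, data[held_back], labels[held_back]))
    return results                                # compare across the k subsets

Jack-knifing (leave-one-out) is then just the special case k = len(data).
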
How might this procedure be applied to science? It would be somewhat
artificial to purposefully identify but then hold back some data sets,
perhaps for years. More natural strategies with about the same effect
seem workable, however. First, a field should maintain MULTIPLE data
sets, to minimize aberrations due to any one. Second, each of these can
only be USED FOR A LIMITED TIME, to be replaced by new ones.
The problem is that even these modest conventions require significant
"discipline discipline." Accomplishing any coordination across
independent-minded scientists is difficult, and the use of shared data
sets is a fairly effortless way to accomplish useful coordination. Data
sets are difficult to obtain in the first place, and convincing others to
become familiar with them ever harder; these become inertial forces that
will make scientists reluctant to part with the classic data sets they
know well. Evaluating results across multiple data sets also makes new
problems for reviewers and editors. And, because the time-scale of
scientific induction is so long relative to the careers of the scientists
involved, the costs associated with all these concrete problems, relative
to theoretical ones due to paradigmatic over-fitting, will likely seem
huge: "Why should I give up familiar data sets when we previously agreed
to their validity, especially since my methods seem to be working better
and better on them?!"
Back to the GA
The GA community, at present, seems fairly healthy according to this
analysis. In addition to De Jong's, people like Dave Ackley have
generated very useful sets of test functions. There are now test suites
that have many desirable properties, like being "GA-hard,"
"GA-deceptive," "royal road," "practical" and "real-world." So there are
clearly plenty of tests.
For this group, my main point is that this plurality is very desirable. I
too am dissatisfied with De Jong's test suite, but I am equally
dissatisfied with any ONE of the more recently proposed alternatives. I
suggest it's time we move beyond debates about whose tests are most
illuminating. If we ever did pick just one set to use for testing GAs it
would --- like De Jong's --- soon come to warp the development of GAs
according to ITS inevitable biases. What we need are more sophisticated
analyses and methodologies that allow a wide variety of testing
procedures, each showing something different.
Flame off,
Rik Belew
[I owe the basic insight --- that an entire discipline can be seen to
over-fit to a limited training corpus --- to conversations with Richard
Palmer and Rich Sutton, at the Santa Fe Institute in March, 1992. Of
course, all blame for damage occurring as the neat, little insight was
stretched into this epistle concerning GA research is mine alone.]
Richard K. Belew
Computer Science & Engr. Dept (0014)
Univ. California - San Diego
La Jolla, CA 92093
rik@cs.ucsd.edu
From now until about 20 July I will be working in Paris:
c/o J-A. Meyer
Groupe de BioInformatique
URA686. Ecole Normale Superieure
46 rue d'Ulm
75230 PARIS Cedex05
France
Tel: 44 32 36 23
Fax: 44 32 39 01
belew@wotan.ens.fr
------------------------------
Date: Fri, 29 May 1992 12:33:34 -0700
From: David Haussler <haussler@cse.ucsc.EDU>
Subject: COLT 92
COLT '92
Workshop on Computational Learning Theory
Sponsored by ACM SIGACT and SIGART
July 27 - 29, 1992
University of Pittsburgh, Pittsburgh, Pennsylvania
GENERAL INFORMATION
Registration & Reception: Sunday, 7:00 - 10:00 pm, 2M56-2P56 Forbes Quadrangle
Conference Banquet: Monday, 7:00 pm
The conference sessions will be held in the William Pitt Union.
Late Registration, etc.: Kurtzman Room (during technical sessions)
Lectures & Impromptu Talks: Ballroom
Poster Sessions: Assembly Room
SCHEDULE OF TALKS
Sunday, July 26
RECEPTION: 7:00 - 10:00 pm
Monday, July 27
SESSION 1: 8:45 - 10:05 am
8:45 - 9:05 Learning boolean read-once formulas with arbitrary symmetric
and constant fan-in gates,
by Nader H. Bshouty, Thomas Hancock, and Lisa Hellerstein
9:05 - 9:25 On-line Learning of Rectangles,
by Zhixiang Chen and Wolfgang Maass
9:25 - 9:45 Cryptographic lower bounds on learnability of AC^1 functions on
the uniform distribution,
by Michael Kharitonov
9:45 - 9:55 Learning hierarchical rule sets,
by Jyrki Kivinen, Heikki Mannila and Esko Ukkonen
9:55 - 10:05 Random DFA's can be approximately learned from sparse uniform
examples,
by Kevin Lang
SESSION 2: 10:30 - 11:50 am
10:30 - 10:50 An O(n^loglog n) Learning Algorithm for DNF,
by Yishay Mansour
10:50 - 11:10 A technique for upper bounding the spectral norm with
applications to learning,
by Mihir Bellare
11:10 - 11:30 Exact learning of read-k disjoint DNF and not-so-disjoint DNF,
by Howard Aizenstein and Leonard Pitt
11:30 - 11:40 Learning k-term DNF formulas with an incomplete membership
oracle,
by Sally A. Goldman, and H. David Mathias
11:40 - 11:50 Learning DNF formulae under classes of probability
distributions,
by Michele Flammini, Alberto Marchetti-Spaccamela and Ludek Kucera
SESSION 3: 1:45 - 3:05 pm
1:45 - 2:05 Bellman strikes again -- the rate of growth of sample
complexity with dimension for the nearest neighbor classifier,
by Santosh S. Venkatesh, Robert R. Snapp, and Demetri Psaltis
2:05 - 2:25 A theory for memory-based learning,
by Jyh-Han Lin and Jeffrey Scott Vitter
2:25 - 2:45 Learnability of description logics,
by William W. Cohen and Haym Hirsh
2:45 - 2:55 PAC-learnability of determinate logic programs,
by Sašo Džeroski, Stephen Muggleton and Stuart Russell
2:55 - 3:05 Polynomial time inference of a subclass of context-free
transformations,
by Hiroki Arimura, Hiroki Ishizaka, and Takeshi Shinohara
SESSION 4: 3:30 - 4:40 pm
3:30 - 3:50 A training algorithm for optimal margin classifiers,
by Bernhard Boser, Isabell Guyon, and Vladimir Vapnik
3:50 - 4:10 The learning complexity of smooth functions of a single
variable,
by Don Kimber and Philip M. Long
4:10 - 4:20 Absolute error bounds for learning linear functions online,
by Ethan Bernstein
4:20 - 4:30 Probably almost discriminative learning,
by Kenji Yamanishi
4:30 - 4:40 PAC Learning with generalized samples and an application to
stochastic geometry,
by S.R. Kulkarni, S.K. Mitter, J.N. Tsitsiklis and O. Zeitouni
POSTER SESSION #1 & IMPROMPTU TALKS: 5:00 - 6:30 pm
BANQUET: 7:00 pm
Tuesday, July 28
SESSION 5: 8:45 - 10:05 am
8:45 - 9:05 Degrees of inferability,
by P. Cholak, R. Downey, L. Fortnow, W. Gasarch, E. Kinber, M. Kummer,
S. Kurtz, and T. Slaman
9:05 - 9:25 On learning limiting programs,
by John Case, Sanjay Jain, and Arun Sharma
9:25 - 9:45 Breaking the probability 1/2 barrier in FIN-type learning,
by Robert Daley, Bala Kalyanasundaram, and Mahendran Velauthapillai
9:45 - 9:55 Case based learning in inductive inference,
by Klaus P. Jantke
9:55 - 10:05 Generalization versus classification,
by Rolf Wiehagen and Carl Smith
SESSION 6: 10:30 - 11:50 am
10:30 - 10:50 Learning switching concepts,
by Avrim Blum and Prasad Chalasani
10:50 - 11:10 Learning with a slowly changing distribution,
by Peter L. Bartlett
11:10 - 11:30 Dominating distributions and learnability,
by Gyora M. Benedek and Alon Itai
11:30 - 11:40 Polynomial uniform convergence and polynomial-sample
learnability,
by Alberto Bertoni, Paola Campadelli, Anna Morpurgo, and Sandra Panizza
11:40 - 11:50 Learning functions by simultaneously estimating errors,
by Kevin Buescher and P.R. Kumar
INVITED TALK: 1:45 - 2:45 pm: Reinforcement learning,
by Andy Barto, University of Massachusetts
SESSION 7: 3:10 - 4:40 pm
3:10 - 3:30 On learning noisy threshold functions with finite precision
weights,
by R. Meir and J.F. Fontanari
3:30 - 3:50 Query by committee,
by H.S. Seung, M. Opper, H. Sompolinsky
3:50 - 4:00 A noise model on learning sets of strings,
by Yasubumi Sakakibara and Rani Siromoney
4:00 - 4:10 Language learning from stochastic input,
by Shyam Kapur and Gianfranco Bilardi
4:10 - 4:20 On exact specification by examples,
by Martin Anthony, Graham Brightwell, Dave Cohen and John Shawe-Taylor
4:20 - 4:30 A computational model of teaching,
by Jeffrey Jackson and Andrew Tomkins
4:30 - 4:40 Approximate testing and learnability,
by Kathleen Romanik
IMPROMPTU TALKS: 5:00 - 6:00 pm
BUSINESS MEETING: 8:00 pm
POSTER SESSION #2: 9:00 - 10:30 pm
Wednesday, July 29
SESSION 8: 8:45 - 9:45 am
8:45 - 9:05 Characterizations of learnability for classes of 0,...,n-valued
functions,
by Shai Ben-David, Nicolò Cesa-Bianchi and Philip M. Long
9:05 - 9:25 Toward efficient agnostic learning,
by Michael J. Kearns, Robert E. Schapire, and Linda Sellie
9:25 - 9:45 Approximating Bayes decisions by additive estimations
by Svetlana Anoulova, Paul Fischer, Stefan Polt, and Hans Ulrich Simon
SESSION 9: 10:10 - 10:50 am
10:10 - 10:30 On the role of procrastination for machine learning,
by Rusins Freivalds and Carl Smith
10:30 - 10:50 Types of monotonic language learning and their
characterization,
by Steffen Lange and Thomas Zeugmann
SESSION 10: 11:10 - 11:50 am
11:10 - 11:30 An improved boosting algorithm and its implications on learning
complexity,
by Yoav Freund
11:30 - 11:50 Some weak learning results,
by David P. Helmbold and Manfred K. Warmuth
SESSION 11: 1:45 - 2:45 pm
1:45 - 2:05 Universal sequential learning and decision from individual data
sequences,
by Neri Merhav and Meir Feder
2:05 - 2:25 Robust trainability of single neurons,
by Klaus-U. Hoffgen and Hans-U. Simon
2:25 - 2:45 On the computational power of neural nets,
by Hava T. Siegelmann and Eduardo D. Sontag
===============================================================================
ADDITIONAL INFORMATION
To receive complete information regarding conference registration and
accommodations, contact Betty Brannick:
E-mail: brannick@cs.pitt.edu
PHONE: (412) 624-8493
FAX: (412) 624-8854.
Please specify whether you want the information sent in PLAIN text or LATEX
format.
NOTE: Attendees must register BY JUNE 19 TO AVOID THE LATE REGISTRATION FEE.
------------------------------
Date: Thu, 11 Jun 92 23:22:47 -0500
From: jan zytkow <zytkow@wise.cs.twsu.EDU>
Subject: Machine Discovery Workshop
Machine Discovery Workshop
Aberdeen, July 4, 1992
PROGRAM
Program in brief:
9:00-10:30AM discovery in databases; 3 papers plus discussion
10:30-12:30 poster session and coffee; about 12-14 papers
12:30-13:45 lunch break
13:45-15:40 scientific and math discovery; 4 papers plus discussion
15:40-16:10 coffee break
16:10-17:10 panel on evaluation of discovery systems
17:10-17:40 general discussion
Detailed program:
9:00-10:30 DISCOVERY IN DATABASES; Session Chair: Derek Sleeman
(each presentation 20 minutes plus 5 minutes discussion)
Willi Kloesgen (German National Research Center)
Patterns for Knowledge Discovery in Databases (9:00-9:25)
Gregory Piatetsky-Shapiro (GTE Laboratories)
Probabilistic Data Dependencies (9:25-9:50)
Robert Zembowicz and Jan Zytkow (The Wichita State University)
Discovery of Regularities in Databases; and a commentary on the
previous papers (9:50-10:15)
Discussion (10:15-10:30)
10:30-12:30 POSTER SESSION
Sakir Kocabas (Marmara Research Center, Turkey)
Elements of Scientific Research: Modeling Discoveries in Oxide
Superconductivity
Thomas Dietterich (Oregon State University)
Towards Model-Based Learning: A Case Study in Ecosystem Prediction
Marie desJardins (SRI International)
Goal-Directed Learning: A Decision-Theoretic Model for Deciding What
to Learn Next
Jason Catlett (AT&T Bell Laboratories)
Large-scale induction of Ripple-Down-Rules: some preliminary
considerations
Sambasiva Bhatta and Ashok Goel (Georgia Tech)
Discovery of Principles and Processes from Design Experience
Marek Bielecki (California State Univ. Hayward)
Machine Discovery Approach to Dynamic Systems in a Real Laboratory
Jack Park and Dan Wood (ThinkAlong Software)
Getting the Model Right; Coupling a Cellular Automata to a Discovery
System
Darrell Conklin, Suzanne Fortier, Janice Glasgow, Frank Allen
(Queen's University, Canada)
Discovery of Spatial Concepts in Crystallographic Databases
Adrian Gordon (Laboratoire de Recherche en Informatique, France)
Informal Qualitative Models in Scientific Discovery
Stefan Schrodl and Oliver Wendel (Universitat Kaiserslautern, Germany)
Automated Data Analysis and Discovery in Neurophysiological Simulation
Experiments Using a Combination of Numerical and Symbolic Methods
M.A.Klopotek, M.Michalewicz, M.Matuszewski (Polish Academy of Sciences)
Extracting Knowledge from Data - SYS8688 Approach
Davide Roverso, Peter Edwards, Derek Sleeman
(University of Aberdeen, UK) Machine Discovery by Model Driven
Analogy
Usama Fayyad, Richard Doyle, Nick Weir, Stanislav Djorgowski
(California Institute of Technology)
Automating Sky Object Classification In Astronomical Survey Images
Chenjiang Mao (The First Academy, Beijing, China)
Knowledge Acquisition from Examples with Mixed Attributes
Paul Fischer and Jan Zytkow (The Wichita State University)
Incremental Generation and Exploration of Hidden Structure
12:30-13:45 LUNCH BREAK
13:45-15:40 SCIENTIFIC AND MATH DISCOVERY; Session Chair: Tom Dietterich
(each presentation 20 minutes plus 5 minutes discussion)
Rudiger Oehlmann, Derek Sleeman, Peter Edwards (University of
Aberdeen, UK)
Self-Questioning and Experimentation in an Exploratory Discovery
System (13:45-14:10)
Raul Valdes-Perez, Herbert Simon, and Robert Murphy (Carnegie Mellon Univ.)
Discovery of Pathways in Science (14:10-14:35)
Marjorie Moulet (Laboratoire de Recherche en Informatique, France)
ARC2: Linear Regression in Abacus (14:35-15:00)
Kenneth Haase (Massachusetts Institute of Technology)
An Experiment in Representational Invention (15:00-15:25)
Discussion (15:25-15:40)
16:10-17:10 PANEL DISCUSSION ON EVALUATION OF DISCOVERY SYSTEMS;
Panel Chair: Pat Langley
Panelists:
Cullen Schaffer (City University of New York); joint position paper
with Armand Prieditis (University of California, Davis)
Sakir Kocabas (Marmara Research Institute, Turkey)
Raul Valdes-Perez (Carnegie Mellon University); joint position paper
with Herbert Simon (Carnegie Mellon Univ.)
17:15-17:45 GENERAL DISCUSSION
*******************************
All papers are published in the proceedings. Proceedings will be
provided to the workshop participants in Aberdeen.
For copies of the proceedings contact:
Jan Zytkow
Computer Science Department
Wichita State University
Wichita, KS 67208
U.S.A.
phone: 316-689-3925
email: zytkow@wise.cs.twsu.edu
A nominal fee of $7.50 may be charged for the proceedings and postage.
Proceedings will be mailed from Wichita after the workshop. The volume
includes 25 state-of-the-art papers divided into categories of:
knowledge discovery in databases,
scientific discovery,
concept discovery,
automated data analysis,
exploration of environment,
discovery in mathematics,
evaluation of discovery systems.
------------------------------
Date: Wed, 3 Jun 92 14:00:32 PDT
From: Tom Dietterich <tgd@arris.COM>
Subject: Job Advertisement- Arris Pharmaceutical
RESEARCH SCIENTIST in
Machine Learning, Neural Networks, and Statistics
Arris Pharmaceutical
Arris Pharmaceutical is a start-up pharmaceutical company founded in
1989 and dedicated to the efficient discovery and development of
novel, orally-active human therapeutics through the application of
artificial intelligence, machine learning, and pattern recognition
methods.
We are seeking a person with a PhD in Computer Science, Mathematics,
Statistics, or related fields to join our team developing new machine
learning algorithms for drug discovery. The team currently includes
contributions from Tomas Lozano-Perez, Rick Lathrop, Roger Critchlow,
and Tom Dietterich. The ideal candidate will have a strong background
in mathematics (including spatial reasoning methods) and five years'
experience in machine learning, neural networks, or statistical
model-building methods. The candidate should be eager to learn the
relevant parts of computational chemistry and to interact with
medicinal chemists and molecular biologists.
To a first approximation, the Arris drug design strategy begins by
identifying a pharmaceutical target (e.g., an enzyme or a cell-surface
receptor), developing assays to measure chemical binding with this
target, and screening large libraries of peptides (short amino acid
sequences) with these assays. The resulting data, which indicate how well
each compound binds to the target, will then be analyzed
by machine learning algorithms to develop hypotheses that explain why
some compounds bind well to the target while others do not.
Information from X-ray crystallography or NMR spectroscopy may also be
available to the learning algorithms. Hypotheses will then be refined
by synthesizing and testing additional peptides. Finally, medicinal
chemists will synthesize small organic molecules that satisfy the
hypothesis, and these will become candidate drugs to be tested for
medical safety and effectiveness.
For more information, send your resume with the names and addresses of
three references to Tom Dietterich (email: tgd@arris.com; voice:
415-737-8600; FAX: 415-737-8590).
------------------------------
End of ML-LIST 4.12 (Digest format)
****************************************