Machine Learning List: Vol. 5 No. 22
Tuesday, October 19, 1993
Contents:
NEWS FOR TEACHERS and RESEARCHERS in ML
OC1 -- decision tree learning software
Systematic search for learning conjunctive expressions
Faculty Position in Cognitive Psychology at Michigan
Graduate Research Assistant in Machine Learning
CFP TREC-3: Text Retrieval/Filtering Conference/Dataset/Evaluation
CFP- SISCAP '94
The Machine Learning List is moderated. Contributions should be relevant to
the scientific study of machine learning. Mail contributions to ml@ics.uci.edu.
Mail requests to be added or deleted to ml-request@ics.uci.edu. Back issues
may be FTP'd from ics.uci.edu in pub/ml-list/V<X>/<N> or N.Z where X and N are
the volume and number of the issue; ID: anonymous PASSWORD: <your mail address>
----------------------------------------------------------------------
Date: Fri, 8 Oct 93 11:18:28 EDT
From: Janusz Wnek <jwnek@aic.gmu.EDU>
Subject: NEWS FOR TEACHERS and RESEARCHERS in ML
NEWS FOR TEACHERS and RESEARCHERS in ML
If you are teaching machine learning or cognitive science, we would like to
inform you that a system is available to support education and research in
these fields.
The Artificial Intelligence Center at George Mason University has developed
EMERALD (version 2), a system of machine learning and discovery tools
for education and research. It introduces users to five different
learning programs, provides explanations of how they work, and allows users to
experiment with them by designing their own problems, made up
from predefined objects. The system has a well-designed and attractive
interface, utilizing color graphics. Rules learned by the system are
automatically translated to English and spoken by a speech synthesizer.
The system has already been delivered to many universities, including
several in Europe, where it was demonstrated at several summer schools.
EMERALD integrates several learning systems at the user level:
1) for learning rules from examples,
2) for learning structural descriptions of objects,
3) for conceptually grouping objects or events,
4) for discovering rules characterizing sequences, and
5) for learning equations based on qualitative and quantitative data.
It is envisioned that users could add their own modules in the future
that represent other learning paradigms.
EMERALD runs on a Sun Workstation with a color monitor. Sun Common Lisp
and OpenWindows (version 2 or higher) are required. A Sun Pascal library
is necessary to run the Pascal applications. While not required, a DecTalk
voice synthesis device is highly recommended to enhance the presentation.
The system is delivered on a high-density 1.5" tape unless other arrangements
are made.
If interested, contact:
Dr. Janusz Wnek
Assistant Director for Research Management
Center for Artificial Intelligence
George Mason University
4400 University Dr.
Fairfax, VA 22030, USA
jwnek@aic.gmu.edu
tel. (703) 993-1717
fax. (703) 993-3729
------------------------------
Date: Tue, 12 Oct 93 16:29:52 EDT
From: salzberg@blaze.cs.jhu.EDU
Subject: OC1 -- decision tree learning software
OC1 (Oblique Classifier 1) is a multivariate decision tree
induction system designed for applications where the instances have
numeric feature values. OC1 builds decision trees that contain linear
combinations of one or more attributes at each internal node; these
trees then partition the space of examples with both oblique and
axis-parallel hyperplanes. OC1 has been used for classification of
data from several real world domains, such as astronomy and cancer
diagnosis. A technical description of the algorithm can be found in
the AAAI-93 paper by Sreerama K. Murthy, Simon Kasif, Steven Salzberg
and Richard Beigel. A postscript version of this paper is provided
with the package.
OC1 is written entirely in ANSI C. It incorporates a number of
features intended to support flexible experimentation on real and
artificial data sets. We have provided support for cross-validation
experiments, generation of artificial data, and graphical display of
data sets and decision trees. The OC1 software allows the user to
create both standard, axis-parallel decision trees and oblique
(multivariate) trees.
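The distinction between oblique and axis-parallel splits can be sketched in a few lines. This is an illustrative sketch only (the function and variable names are hypothetical, not OC1's actual C API): an oblique node tests a linear combination of attribute values against a threshold, and an axis-parallel split is the special case with a single nonzero weight.

```python
# Sketch of the split test at an internal node of an oblique decision
# tree (hypothetical names; OC1's real implementation is in ANSI C).

def oblique_test(x, weights, threshold):
    """True if example x lies on the 'greater' side of the hyperplane
    sum(w_i * x_i) > threshold."""
    return sum(w * xi for w, xi in zip(weights, x)) > threshold

# An axis-parallel split is the special case with one nonzero weight,
# e.g. weights = [0, 1] tests only the second attribute.
```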
TO OBTAIN OC1 BY ANONYMOUS FTP
The latest version of OC1 is available free of charge, and may be
obtained via anonymous FTP from the Department of Computer Science at
Johns Hopkins University.
To obtain a copy of OC1, type the following commands:
UNIX_prompt> ftp blaze.cs.jhu.edu
[Note: the Internet address of blaze.cs.jhu.edu is 128.220.13.50]
Name: anonymous
Password: [enter your email address]
ftp> bin
ftp> cd pub/oc1
ftp> get oc1.tar.Z
[This announcement is also contained in pub/oc1.]
ftp> bye
[Place the file oc1.tar.Z in a convenient subdirectory.]
UNIX_prompt> uncompress oc1.tar.Z
UNIX_prompt> tar -xf oc1.tar
[Read the file "README", to get cues to other documentation files, and
to run the programs.]
If you have any comments, questions or suggestions, please contact
Sreerama K. Murthy or
Steven Salzberg or
Simon Kasif
Department of Computer Science
The Johns Hopkins University
Baltimore, MD 21218
Email: murthy@cs.jhu.edu (primary contact)
salzberg@cs.jhu.edu
kasif@cs.jhu.edu
OC1 IS INTENDED FOR NON-COMMERCIAL PURPOSES ONLY. OC1 may be used,
copied, and modified freely for this purpose. Any commercial use of
OC1 is strictly prohibited without the express written consent of
Sreerama K. Murthy, Simon Kasif, and Steven Salzberg, at the
Department of Computer Science, Johns Hopkins University.
------------------------------
Subject: Systematic search for learning conjunctive expressions
Date: Fri, 15 Oct 93 16:13:30 +1000
From: Geoff Webb <webb@deakin.edu.au>
A number of researchers have recently investigated techniques for
systematic search through spaces of conjunctive expressions (Webb,
1990; Schlimmer, 1993; Rymon, 1993). All of the approaches developed
have been restricted in the spaces that they can explore.
OPUS is a systematic search algorithm that is more efficient than
these previous techniques. It has been incorporated into an AQ-like
covering algorithm for learning disjunctive normal form concept
descriptions. An implementation of this system for use on Sun
workstations is available for anonymous FTP from sol.ccs.deakin.edu.au
in the directory webb/cover. Postscript format papers describing and
evaluating the algorithm are available in the directory webb/papers in
the files OPUS1.ps and OPUS2.ps.
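As a rough illustration of what a search through a space of conjunctive expressions involves, here is a minimal sketch that systematically enumerates conjunctions of attribute-value tests and the examples each covers. All names here are hypothetical, and this naive enumeration omits the pruning and operator ordering that make OPUS efficient; see the papers for the actual algorithm.

```python
# Illustrative sketch of searching a space of conjunctive expressions.
# A conjunction is a tuple of (attribute, value) tests; an example is a
# dict mapping attributes to values.
from itertools import combinations

def covers(conj, example):
    """A conjunction covers an example if every test agrees with it."""
    return all(example[attr] == val for attr, val in conj)

def enumerate_conjunctions(tests, examples, max_len=2):
    """Systematically generate all conjunctions of up to max_len tests,
    yielding each together with the examples it covers."""
    for k in range(1, max_len + 1):
        for conj in combinations(tests, k):
            covered = [e for e in examples if covers(conj, e)]
            yield conj, covered
```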
Rymon, R. (1993) An SE-tree based characterization of the induction problem.
Proceedings of the 1993 International Conference on Machine Learning.
Schlimmer, J. C. (1993) Efficiently inducing determinations: A
complete and systematic search algorithm that uses optimal pruning.
Proceedings of the 1993 International Conference on Machine Learning.
Webb, G. (1990) Techniques for efficient empirical induction. In C.
J. Barter & M. J. Brooks (Eds) AI'88. Springer-Verlag, Berlin, pp. 225-239.
Geoff Webb
School of Computing and Mathematics,
Deakin University, Victoria, 3217, Australia.
PHONE: +61 (052) 27 2606
------------------------------
Date: Tue, 5 Oct 93 09:04:42 -0500
From: Colleen Seifert <seifert@psych.lsa.umich.EDU>
Subject: Faculty Position in Cognitive Psychology at Michigan
Position in Cognitive Psychology
University of Michigan
The University of Michigan Department of Psychology invites
applications for a tenure-track position in the area of Cognition,
beginning September 1, 1994. The appointment will most likely be made
at the Assistant Professor level, but appointments at other ranks may
be possible. We seek candidates with primary interests and technical skills
in cognitive psychology. Our primary goal is to hire an outstanding
cognitive psychologist, and thus we will look at candidates with any
specific research interest. We have a preference for candidates
interested in higher mental processes or for candidates with
computational modeling skills (including connectionism) or an interest
in cognitive neuroscience. Responsibilities include graduate and
undergraduate teaching, as well as research and research supervision.
Send curriculum vitae, letters of reference, copies of recent
publications, and a statement of research and teaching interests no
later than January 7, 1994 to: Gary Olson, Chair, Cognitive Processes
Search Committee, Department of Psychology, University of Michigan,
330 Packard Road, Ann Arbor, Michigan 48104. The University of
Michigan is an Equal Opportunity/Affirmative Action employer.
------------------------------
Subject: Graduate Research Assistant in Machine Learning
Date: Mon, 18 Oct 1993 15:15:50 -0700
From: Michael Pazzani <pazzani@pan.ICS.UCI.EDU>
In the Fall of 1994, I will have a Graduate Research Assistantship
available for a student interested in pursuing a Ph.D. degree in
Machine Learning at the University of California, Irvine. This is a
two-year position that pays $20,800 per year and includes all tuition and
fees. [However, it is likely that additional funding will be
available after the second year]. Only US citizens are eligible for
this award. Those interested in applying for this position should:
1. Apply to the graduate program in artificial intelligence at UCI.
An application can be obtained from:
Graduate Advisor
Information and Computer Science
UCI
Irvine, CA 92717
or via e-mail: theresa@ics.uci.edu
2. Send a short note to pazzani@ics.uci.edu indicating that
you want to be considered for this position.
Mike Pazzani
P.S. UCI also has fellowships available for female and minority
students, and exceptionally qualified students (as indicated by GRE
scores and grades).
------------------------------
Date: Thu, 7 Oct 93 17:20 EDT
From: David Lewis <lewis@research.att.COM>
Subject: CFP TREC-3: Text Retrieval/Filtering Conference/Dataset/Evaluation
CALL FOR PARTICIPATION
TEXT RETRIEVAL CONFERENCE
January 1994 - November 1994
Conducted by:
National Institute of Standards and Technology
(NIST)
Sponsored by:
Advanced Research Projects Agency
Software and Intelligent Systems Technology Office
(ARPA/SISTO)
A new conference for examination of text retrieval methodologies (TREC) was
held in November 1992 at Gaithersburg, Md. The goal of this conference was
to encourage research in text retrieval from large document collections by
providing a large test collection, uniform scoring procedures and a forum for
organizations interested in comparing their results. Both ad-hoc queries
against archival data collections and routing (filtering or dissemination)
queries against incoming data streams were tested. The conference was a
workshop open only to the 24 participating systems and government sponsors;
however, the proceedings were published by NIST in the spring of 1993. A
second workshop (TREC-2) was held in September 1993, with 31 participating
systems, and proceedings to be published in the spring of 1994.
This announcement serves as a call for participation from groups interested
in working in the third year of this workshop (TREC-3). Participants will be
expected to work with approximately one million documents (2 gigabytes of data),
retrieving lists of documents that could be considered relevant to each of
100 topics (50 routing and 50 ad-hoc topics). NIST will distribute the data
and will collect and analyze the results. As before, the workshop will be
open only to participating systems and government sponsors.
Because of government cutbacks, there will be no financial support this
year for participants.
Schedule:
Dec. 1, 1993 -- deadline for participation applications
Jan. 1, 1994 -- acceptances announced, and training data distributed to
new participants (including 3 CD-ROMS containing about
3 gigabytes of data, and 150 training topics and relevance
judgments)
June 1, 1994 -- Test gigabyte of data distributed via CD-ROM, after
routing queries received at NIST
July 1, 1994 -- 50 new test topics distributed
Aug. 1, 1994 -- results from 50 routing queries and 50 test topics due
at NIST
Oct. 1, 1994 -- relevance judgments and individual evaluation scores due
back to participants
Nov. 2-4, 1994 -- TREC-3 conference at NIST in Gaithersburg, Md.
Task Description:
Participants will receive 3 gigabytes of data to use for training of their
systems, including development of appropriate algorithms or knowledge bases.
The 150 topics used in the first two TREC workshops, and the relevance
judgments for these topics will also be sent. The topics are in the form of
a highly-formatted user need statement (see attachment 1). Queries can
either be constructed automatically from this topic description, or can be
manually constructed.
Two types of retrieval operations will be tested: a routing or filtering
operation against new data, and an ad-hoc query operation against archival
data. Fifty of the topics (numbers 101-150) initially distributed as
training topics will be used by each participating group to create formalized
routing or filtering queries to be used for retrieval against a new test
gigabyte of data (disk 4). Fifty new test topics (151-200) will be used
against 2 gigabytes of the training data (disks 2 and 3) as ad-hoc queries.
Results from both types of queries (routing and ad-hoc) will be submitted
to NIST as the top 1000 documents retrieved for each query. Participants
creating queries both automatically and manually may submit both sets for
evaluation. Scoring techniques including traditional recall/precision
measures will be run for all systems and individual results will be returned
to each participant.
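The traditional recall/precision measures mentioned above can be sketched directly from a submitted ranked list. This is an illustrative helper only (the function name and cutoff handling are assumptions; NIST's actual evaluation software computes many more measures and interpolation points).

```python
# Sketch of recall and precision over the top `cutoff` documents of a
# submitted ranked run, given the set of relevant document IDs.

def recall_precision(ranked, relevant, cutoff=1000):
    """Return (recall, precision) for the top `cutoff` retrieved docs."""
    retrieved = ranked[:cutoff]
    hits = sum(1 for doc in retrieved if doc in relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision
```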
Conference Format:
The conference itself will be used as a forum both for presentation of
results (including failure analyses and system comparisons), and for more
lengthy system presentations describing retrieval techniques used,
experiments run using the data, and other issues of interest to researchers
in information retrieval. As there is a limited amount of time for these
presentations, the program committee will determine which groups are asked to
speak and which groups will present in a poster session. Additionally some
organizations may not wish to describe their proprietary algorithms, and
these groups may choose to participate in a different manner (see Category C).
To allow a maximum number of participants, the following three categories
have been established.
Category A: Full participation
Participants will be expected to work with the full data set, and to present
full details of system algorithms and various experiments run using the data,
either in a talk or in a poster session. In addition to algorithms and
experiments, some information on time and effort statistics should be
provided. This includes time for data preparation (such as indexing,
building a manual thesaurus, building a knowledge base), time for construction
of manual queries, query execution time, etc. More details on the desired
content of the presentation will be provided later.
Category B: Exploratory groups
Because small groups with novel retrieval techniques might like to
participate but may have limited research resources, a category has been set
up to work with only a subset of the data. This subset will consist of about
1/2 gigabyte of training data (and all training topics), and 1/4 gigabyte of
test data (and all test topics). Participants in this category will be
expected to follow the same schedule as category A, except with less data,
and will be expected to present full details of system algorithms,
experiments, and time and effort statistics either in a poster session or
in a talk.
Category C: Evaluation only
Participants in this category will be expected to work on the full data set,
submit results for common scoring and tabulation, and present their results in
a poster session, including the time and effort statistics described in
Category A. They will not be expected to describe their systems in detail.
Data (Test Collection):
The test collection (documents, topics, and relevance judgments) will be an
extension of the collection (English only) used for the ARPA TIPSTER project.
The collection is being assembled from Linguistic Data Consortium text, and an
LDC User Agreement will be required from all participants. The documents
are an assorted collection of newspapers (including the Wall Street Journal),
newswires, journals, technical abstracts and email newsgroups. The test set
will be of approximately the same composition as the training set, and all
documents will be typical of those seen in a real-world situation (i.e. there
will not be arcane vocabulary, but there may be missing pieces of text or
typographical errors). The format of the documents is relatively clean and
easy-to-use as is (see attachment 2). Most of the documents will consist of
a text section only, with no titles or other categories. The relevance
judgments against which each system's output will be scored will be made by
experienced relevance assessors based on the output of all TREC participants
using a pooled relevance methodology.
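The pooled relevance methodology can be sketched simply: the set of documents the assessors judge for a topic is the union of the top-k results from every participating system. The function name and the pool depth k below are illustrative assumptions; the actual depth is set by NIST.

```python
# Sketch of building a judgment pool for one topic from the ranked runs
# of all participating systems (pool depth k is illustrative).

def build_pool(runs, k=100):
    """Union of the top-k documents from each system's ranked run."""
    pool = set()
    for ranked in runs:
        pool.update(ranked[:k])
    return pool
```

Documents outside the pool are assumed non-relevant, which keeps the assessment workload tractable for collections of this size.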
Response format and submission details
By Dec. 1, 1993 organizations wishing to participate should respond to
the call for participation by submitting a summary of their text retrieval
approach and a system architecture description, not to exceed five pages in
total. The summary should include the strengths and significance of their
approach to text retrieval, and highlight differences between their approach
and other retrieval approaches. Each organization should indicate in which
category they wish to participate.
Please indicate clearly the persons responsible for the summary statement
and to whom correspondence should be directed. A full regular address,
telephone number, and an email address should be given. EMAIL IS THE
PREFERRED METHOD OF COMMUNICATION, although it is realized that diagrams and
figures will need to be sent by regular mail or FAX. It is expected that
ALL participants have some access to email, as conference communications will
be done via email.
It is highly likely that some Spanish text and topics (approximately a
1/4 gigabyte of text and 25 topics) will also be available for retrieval
tests. If your organization is interested in trying Spanish (in addition to
English), please state this and indicate the availability of at least one
person who can read Spanish.
All responses should be submitted by Dec. 1, 1993 to the Program Chair,
Donna Harman:
harman@magi.ncsl.nist.gov
or
Donna Harman, NIST, Building 225/A216,
Gaithersburg, Md. 20899
FAX: 301-975-2128
AS NOTED ABOVE, EMAIL IS THE DESIRED FORM OF COMMUNICATION.
*****************************************************************************
Any questions about conference participation, response format, etc. should
also be sent to the same address.
Selection of participants:
As the goal of TREC is to further research in large-scale text retrieval,
the program committee will be looking for as wide a range of text retrieval
approaches as possible, and will select the best representatives of these
approaches as participants for categories A and B. Category C participants
must be able to demonstrate their ability to work with the full data
collection. The program committee has been chosen from a broad range of
information retrieval researchers and government users, and will both select
the participants and provide guidance in the planning of the conference.
Program Committee
Donna Harman, NIST, chair
Chris Buckley, Cornell University
Susan Dumais, Bellcore
Darryl Howard, U.S. Department of Defense
David Lewis, AT & T Bell Labs
Matt Mettler, TRW
John Prange, U.S. Department of Defense
Alan Smeaton, Dublin City University, Ireland
Karen Sparck Jones, Cambridge University
Richard Tong, Advanced Decision Systems
Steve Walker, City University, London
============================================================================
Attachment 1 -- Sample Topic
<top>
<head> Tipster Topic Description
<num> Number: 028
<dom> Domain: Science and Technology
<title> Topic: AT&T's Technical Efforts
<desc> Description: Document must describe AT&T's technical efforts in
computers and communications.
<narr> Narrative: To be relevant, a document must contain information
on American Telephone and Telegraph's (AT&T) technical efforts in
computers and communications. Examples of relevant subject matter
would include: product announcements, releases or cancellations, and
discussion of AT&T Bell Labs research. Documents focusing either
AT&T's efforts to buy other computer companies or AT&T's legal battles
with other organizations, or AT&T's Unix operating system are NOT
relevant. For the purposes of this topic the Regional Bell Operating
Companies, (RBOC's) or the "Baby Bells" are not considered AT&T.
<con> Concept(s):
1. AT&T, American Telephone and Telegraph
2. 3B-2 minicomputer, AT&T 386 PC
3. AT&T Starlan
4. PBX,
5. Product announcements, product releases
</top>
==============================================================================
Attachment 2 -- Sample Document (abridged)
<DOC>
<DOCNO> WSJ880406-0090 </DOCNO>
<HL> AT&T Unveils Services to Upgrade Phone Networks Under Global Plan </HL>
<AUTHOR> Janet Guyon (WSJ Staff) </AUTHOR>
<SO> </SO>
<CO> T </CO>
<IN> TEL </IN>
<DATELINE> NEW YORK </DATELINE>
<TEXT>
American Telephone & Telegraph Co. introduced the first of a new generation
of phone services with broad implications for computer and communications
equipment markets.
AT&T said it is the first national long-distance carrier to announce prices
for specific services under a world-wide standardization plan to upgrade phone
networks. By announcing commercial services under the plan, which the industry
calls the Integrated Services Digital Network, AT&T will influence evolving
communications standards to its advantage, consultants said, just as
International Business Machines Corp. has created de facto computer standards
favoring its products.
.
.
</TEXT>
</DOC>
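Extracting fields from the SGML-like document format shown above is straightforward. The sketch below is illustrative only (the helper name is an assumption, and a real run would use a proper SGML parser and handle the multi-document files on the CD-ROMs rather than a regular expression).

```python
# Minimal sketch of pulling tagged fields out of a TREC-format document.
import re

def extract_field(doc_text, tag):
    """Return the contents of <TAG> ... </TAG>, or None if absent."""
    m = re.search(r"<%s>(.*?)</%s>" % (tag, tag), doc_text, re.DOTALL)
    return m.group(1).strip() if m else None

sample = ("<DOC>\n<DOCNO> WSJ880406-0090 </DOCNO>\n"
          "<TEXT>\nSome text.\n</TEXT>\n</DOC>")
```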
------------------------------
Date: Tue, 5 Oct 93 10:30:12 EDT
From: Shervin Erfani <sie@probe.att.com>
Subject: CFP- SISCAP '94
SYMPOSIUM ON INTELLIGENT SYSTEMS IN COMMUNICATIONS AND POWER
(SISCAP'94)
(The First in Puerto Rico)
February 21-23, 1994
Mayaguez Hilton, Mayaguez, PR
SISCAP '94 is organized to provide technical and tutorial
programs in an interactive atmosphere in which creative
discussions among participants can be fostered. There will be
keynote speakers, contributed and invited papers, tutorials,
poster sessions, and wrap-up sessions.
MAJOR THEMES AND TOPICS (not limited to):
* COMMUNICATIONS SYSTEMS AND NETWORKING:
* SIGNAL AND IMAGE PROCESSING:
* ARTIFICIAL INTELLIGENCE, FUZZY LOGIC, AND EXPERT SYSTEMS:
- Uncertainty Management
- Foundations of Artificial Intelligence, Expert Systems, and
Fuzzy Logic
- Intelligent Knowledge-based and Database Systems
- Model-based Reasoning and Object-Oriented Modeling
* POWER SYSTEMS:
* NEURAL NETWORKS AND PARALLEL PROCESSING:
- Neural Networks: Theory, Implementation, and Applications
- Neural Training Algorithms
- Relations Between Neural Networks, Parallel Processing,
and Fuzzy Logic
- Neural-based Controllers
* Send 3 copies of extended summaries, double-spaced and not
exceeding 500 words, to the Program Chairman.
* Summaries should be received by October 30, 1993. Authors will
be notified by November 30, 1993.
* For Additional Information, or a copy of the Advance Program,
contact the Program Chairman.
Hamed Parsiani, Program Chairman
University of Puerto Rico, Mayaguez
Department of Electrical
and Computer Engineering
Mayaguez, PR, 00681-5000
Fax: (809) 831-7564
Tel: (809) 832-4040, ext 3653/3094
E-mail: siscap@rmece01.upr.clu.edu
------------------------------
End of ML-LIST (Digest format)
****************************************