IRList Digest Volume 2 Number 41
IRList Digest Monday, 8 September 1986 Volume 2 : Issue 41
Today's Topics:
Query - Back issues, applying IR ideas to software systems?
Announcement - Xerox PARC Forum on NoteCards
Announcement - News on National Archives storage
Discussion - Comments regarding News on National Archives storage
Abstracts - Appearing in latest issue of ACM SIGIR Forum, Part 4 of 4
News addresses are ARPANET: fox%vt@csnet-relay.arpa BITNET: foxea@vtvax3.bitnet
CSNET: fox@vt UUCPNET: seismo!vtisr1!irlistrq
----------------------------------------------------------------------
Date: Fri, 5 Sep 86 00:42:30 CDT
From: seismo!gswd-vms.ARPA!marick%turkey (Brian Marick)
Subject: back issues of the mailing list
There's a line of information retrieval/organization research that
stretches back to Vannevar Bush and Memex, through Engelbart and NLS, up
to Xerox NoteCards and TextNet. I'd like to apply those ideas about
organizing and retrieving information to the (I feel) analogous problems
involved in the maintenance and enhancement of large software systems by
smallish groups of people. The more I learn about the
Bush-Engelbart-... family tree, the better. Would back issues of the
IRList help me? If so, is there any way to get to those back issues?
Thanks much.
Brian Marick, Wombat Consort
Gould Computer Systems -- Urbana && University of Illinois
...ihnp4!uiucdcs!ccvaxa!marick
ARPA: Marick@GSWD-VMS
[Note: yes, see the Welcome message I will send you for details - Ed]
[Note: Dr. William Frakes at AT&T Bell Laboratories and some of his
colleagues have been interested in searching large software
collections for "relevant" modules, which is a little like what
you are talking about. Perhaps Bill and others will comment on
your ideas. Please let us know what further developments result. - Ed]
------------------------------
Date: Fri, 5 Sep 86 13:25:06 PDT
From: Hibbert.pa@Xerox.COM
Subject: PARC Forum September 11: NoteCards
[Forwarded from: AI-ED Digest Friday, 5 Sep 1986 V.1: Issue 30 - Ed]
PARC Forum
Thursday, September 11, 1986
4:00PM, PARC Auditorium
Frank Halasz
Randy Trigg
Tom Moran
Intelligent Systems Lab
Xerox PARC
NoteCards: An Experimental Environment for Idea Processing and
Information Management
NoteCards is an extensible environment designed to help people
formulate, structure, compare, and manage ideas. It was developed here
at PARC as a vehicle for our research on the nature of idea processing
tasks and the ways in which computers can be used to support
intellectual work. As part of this research, we have been actively
seeding a community of NoteCards users inside Xerox and at a number of
university, government, and industrial sites. NoteCards is currently
being used by more than 50 people engaged in idea processing tasks
ranging from writing research papers through designing parts for
photocopiers.
In this forum, we will briefly demonstrate the current version of
NoteCards and discuss the major design considerations that drive its
development. We will describe the NoteCards user community and the
range of clever applications that are being developed using NoteCards.
Finally, we will assess how well the system meets the needs of its
users. Specifically, we will argue that NoteCards is very successful in
supporting the task of managing and organizing large collections of
ideas, but is relatively less suited to the task of formulating and
structuring these ideas. We will also argue that the system lacks
adequate support for collaborative work. These assessments will be used
to motivate and briefly describe the current research directions of the
NoteCards project.
------------------------------
Date: 4 Sep 86 19:58 PDT
From: William Daul / McDonnell-Douglas / APD-ASD <WBD.MDC@OFFICE-1.ARPA>
Author: Mitch Betts (ComputerWorld)
Subject: ComputerWorld 9/1/86 p.31 "National Archives' Storage Under Scrutiny"
Comment: I thought this might be of interest to you. It is copied without
permission. --Bi//
Keywords: National Archives, information retrieval, information storage,
archives, historians
WASHINGTON, D.C. -- The prestigious National Research Council has issued a
report urging the National Archives not to use magnetic media or optical disks
to permanently store historical documents.
Optical disks and magnetic media last only 10 to 20 years for archival
purposes, and the rapid pace of change in hardware and software technology
suggests that it may be impossible to read the historical records in the
centuries to come, according to the report, "Preservation of Historical
Records."
William Holmes, director of the National Archives and Records Administration's
archival research and evaluation staff, stated that he agrees with the research
report's conclusions.
He said that although the agency plans a pilot test of digital imaging and
optical-disk technology, optical disks will be used only for public retrieval
and not for permanent storage.
"Even if the operating systems and documentation problems somehow are dealt
with, what is the archivist to do when the machine manufacturer declares the
hardware obsolete or simply goes out of business?" the research report asked.
"Will there be an IBM or a Sony in the year 2200? If they still exist, will
they maintain a 1980-1990 vintage machine?" the report continued.
An example of the problem occurred in the mid-1970s when archivists discovered
that there were only two computers that could read the 1960 U.S. census; one
was in the Smithsonian Institution and the other was in Japan.
The inescapable conclusion, the researchers said, is that long-term archives
would be committed to an expensive file conversion program every 10-20 years if
they use electronic media for permanent storage.
------------------------------
Date: 5 Sep 86 09:34 CDT
From: "Don Young"@csnet-relay.csnet,
"Augmentation Systems Division"@csnet-relay.csnet,
MDC <DFY.MDC@OFFICE-1.ARPA>
Subject: Re: ComputerWorld 9/1/86 p.31 "National Archives' Storage Under Scrutiny"
[Note: this is a follow up to previous message. - Ed]
Thanks for putting this article on-line.
Yes, the National Archive folks have two major problems:
1. The question that they ask us "WILL YOU BE AROUND AS A VENDOR TO SUPPORT
YOUR PRODUCT OVER THE LONG TERM".
2. Problem with finding the proper recording devices for long term storage.
The positive thing in the article is that they confirmed that they are going to
run a pilot test. This pilot test could be with ASD. Also, the Air Force/Navy
Standard Multiuser Small Computer Requirements Contract (RFI at this point)
describes Augment On-Line Files in good detail as a requirement. The
specification is Augment coupled with the methodology used by AFCC. Will hope
that the RFP states the same when available next month.
------------------------------
Date: Wed, 23 Jul 1986 13:06 CST
From: Vijay V. Raghavan <RAGHAVAN@UREGINA1.bitnet>
Subject: SIGIR FORUM Abstracts [Part 4 of 4 - Ed]
[Note: Members of ACM SIGIR should have received the spring/summer
Forum, and can find these on pages 39-42. The previous parts have
appeared in machine readable form in earlier issues of IRList. - Ed]
ABSTRACTS
(Chosen by G. Salton or V. Raghavan from 1984 issues of journals
in the retrieval area)
30. STRUCTURE OF HIERARCHIC CLUSTERINGS: IMPLICATIONS FOR
INFORMATION RETRIEVAL AND FOR MULTIVARIATE DATA ANALYSIS
F. Murtagh
Department of Computer Science, University College Dublin,
Dublin 4 Ireland
Hierarchic clustering methods may be used to condense
information for a user, as they are in multivariate data
analysis, or to achieve computational advantages, as they are
in information retrieval. The structure of the hierarchic
classification produced has a direct bearing on the
effectiveness and utility of using cluster analysis, yet this
important feature of the classification has only been
implicitly referred to in the literature to date. In this
study, three different coefficients are defined, each of
which quantifies the symmetry-asymmetry
(balancedness-unbalancedness) of hierarchic clusterings on a scale from 0
to 1. Using examples of data from the areas of information
retrieval and of multivariate data analysis, a number of
hierarchic clustering methods are discussed in terms of the
hierarchies they produce.
(INFORMATION PROCESSING AND MANAGEMENT, Vol. 20, No. 5/6, pp.
611-617, 1984).
31. AUTOMATIC INDEXING OF FULL TEXTS
Dr. Zdenek Jonak
Central Office of Scientific, Technical and Economic
Information, Prague, Czechoslovakia
The article deals with the preparation of query description
using a semantic analyser method based on the analysis of
semantic structure of documents. The aim of the paper is to
demonstrate the efficiency of this method in the field of
automatic indexing. The results obtained by means of this
method are compared with results of automatic indexing
performed by some traditional methods and with the results of
indexing done by human indexers.
(INFORMATION PROCESSING AND MANAGEMENT, Vol. 20, No. 5/6, pp.
619-627, 1984).
32. ASPECTS AND THE OVERLAP FUNCTION
Marilyn M. Levine
Dr. Levine's Information Machine, 823 N. 2nd Street, Room
200, Milwaukee, WI 53203, USA
Leonard P. Levine
Department of Electrical Engineering and Computer Science,
University of Wisconsin-Milwaukee, Milwaukee, WI 53201, USA
It is intuitively clear that putting the cart before the
horse is not the same as putting the horse before the cart.
It is equally clear that a history of philosophy is different
from a philosophy of history. Yet there is no logical
relationship, like the AND/OR/NOT functions, which would
enable manipulation of these permuted, non-commutative,
relationships. In this paper we present a system for
automatic handling of ordered sets, states based on these
sets, and of differing points of view regarding a Universe of
Discourse. We call what we are dealing with aspects and we
represent them by means of a new logical function called the
Overlap function.
(INFORMATION PROCESSING AND MANAGEMENT, Vol. 20, No. 5/6, pp.
629-636, 1984).
33. A COMPARISON OF TWO METHODS FOR BOOLEAN QUERY RELEVANCY
FEEDBACK
G. Salton and E. Voorhees
Department of Computer Science, Cornell University, Ithaca,
NY 14853, USA
E. A. Fox
Department of Computer Science, Virginia Polytechnic
Institute and State University, Blacksburg, VA 24061, USA
The relevance feedback process uses information derived from
an initially retrieved set of documents to improve subsequent
search formulations and retrieval output. In a Boolean query
environment this implies that new query terms must be
identified and Boolean operators must be chosen automatically
to connect the various query terms. In this study two
recently proposed automatic methods for relevance feedback of
Boolean queries are evaluated and conclusions are drawn
concerning the use of effective feedback methods in a Boolean
query environment.
(INFORMATION PROCESSING AND MANAGEMENT, Vol. 20, No. 5/6, pp.
637-651, 1984).
34. ORGANIZATION OF CLUSTERED FILES FOR CONSECUTIVE RETRIEVAL
J. S. Deogun
University of Nebraska
V. V. Raghavan and T. K. W. Tsou
University of Regina
This paper studies the problem of storing single-level and
multilevel clustered files. Necessary and sufficient
conditions for a single-level clustered file to have the
consecutive retrieval property (CRP) are developed. A linear
time algorithm to test the CRP for a given clustered file and
to identify the proper arrangement of objects, if CRP exists,
is presented. For the single-level clustered files that do
not have CRP, it is shown that the problem of identifying a
storage organization with minimum redundancy is NP-complete.
Consequently, an efficient heuristic algorithm to generate a
good storage organization for such files is developed.
Furthermore, it is shown that, for certain types of
multilevel clustered files, there exists a storage
organization such that the objects in each cluster, for all
clusters in each level of the clustering, appear in
consecutive locations.
(ACM TRANSACTIONS ON DATABASE SYSTEMS, Vol. 9, No. 4,
December 1984, Pages 646-671)
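
To make the consecutive retrieval property in abstract 34 concrete, here is a minimal Python sketch of the definition only: a single-level clustered file has the CRP if its objects can be laid out in one linear order so that every cluster occupies consecutive locations. This is a brute-force check over all orderings, suitable for tiny examples. It is not the linear-time test the paper describes, and the function names and sample clusters are hypothetical.

```python
from itertools import permutations


def is_consecutive(order, cluster):
    """True if the members of `cluster` sit in adjacent slots of `order`."""
    positions = sorted(order.index(obj) for obj in cluster)
    return positions[-1] - positions[0] + 1 == len(positions)


def find_crp_arrangement(objects, clusters):
    """Return one ordering in which every cluster is consecutive, or None.

    Exhaustive search: exponential in the number of objects, so this only
    illustrates the definition. Clusters are assumed to be non-empty.
    """
    for order in permutations(objects):
        if all(is_consecutive(order, c) for c in clusters):
            return list(order)
    return None


objects = ["a", "b", "c", "d"]

# Overlapping clusters that chain together: an arrangement exists.
chain = [{"a", "b"}, {"b", "c"}, {"c", "d"}]
print(find_crp_arrangement(objects, chain))

# "a" would need three distinct neighbours in a line: no arrangement exists.
star = [{"a", "b"}, {"a", "c"}, {"a", "d"}]
print(find_crp_arrangement(objects, star))
```

The second case is the kind of file for which, according to the abstract, some redundancy in storage is unavoidable.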
35. LASER OPTICAL DISK: THE COMING REVOLUTION IN ON-LINE STORAGE
Larry Fujitani
Commercially available only recently, the optical disk drive
uses a laser beam to burn impressions onto a plastic disk.
Employing a highly focused beam rather than a diffuse
magnetic field to write, the laser optical disk drive yields
storage densities up to 10 times those of magnetic disks.
(COMMUNICATIONS OF THE ACM, Vol. 27, Number 6, June 1984)
36. AUTOMATIC SPELLING CORRECTION IN SCIENTIFIC AND SCHOLARLY
TEXT
Joseph J. Pollock and Antonio Zamora
An automatic spelling correcting algorithm corrects most of
the 50,000 misspellings culled from 25,000,000 words of text
from seven scientific and scholarly databases. It uses a
similarity key to identify words in a large dictionary that
are most similar to a particular misspelling, and then an
error-reversal test to select from these the most plausible
correction(s).
(COMMUNICATIONS OF THE ACM, Vol. 27, Number 4, April, 1984)
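
The two-stage approach in abstract 36 (a similarity key to find nearby dictionary words, then an error-reversal test to pick plausible corrections) can be sketched roughly as follows. The abstract does not define the key or the test, so everything here is an assumption for illustration: the key (first letter, then unique consonants, then unique vowels), the neighbour window, the four single-error types, and all function names are hypothetical and should not be read as the authors' algorithm.

```python
from bisect import bisect_left

VOWELS = set("aeiou")


def similarity_key(word):
    """Hypothetical key: first letter, unique consonants, then unique vowels."""
    word = word.lower()
    seen = {word[0]}
    consonants, vowels = [], []
    for ch in word[1:]:
        if ch in seen or not ch.isalpha():
            continue
        seen.add(ch)
        (vowels if ch in VOWELS else consonants).append(ch)
    return word[0] + "".join(consonants) + "".join(vowels)


def is_single_error(wrong, right):
    """Error reversal: can one insertion, omission, substitution, or
    transposition turn `right` into `wrong`?"""
    if wrong == right or abs(len(wrong) - len(right)) > 1:
        return False
    i = 0  # length of the common prefix
    while i < min(len(wrong), len(right)) and wrong[i] == right[i]:
        i += 1
    if len(wrong) == len(right):
        if wrong[i + 1:] == right[i + 1:]:  # substitution
            return True
        return (i + 1 < len(wrong)  # transposition of adjacent letters
                and wrong[i] == right[i + 1]
                and wrong[i + 1] == right[i]
                and wrong[i + 2:] == right[i + 2:])
    if len(wrong) < len(right):  # a letter was omitted
        return wrong[i:] == right[i + 1:]
    return wrong[i + 1:] == right[i:]  # a letter was inserted


def correct(misspelling, dictionary, window=2):
    """Check dictionary words whose keys sort near the misspelling's key."""
    keyed = sorted((similarity_key(w), w) for w in dictionary)
    keys = [k for k, _ in keyed]
    pos = bisect_left(keys, similarity_key(misspelling))
    nearby = keyed[max(0, pos - window):pos + window]
    return [w for _, w in nearby if is_single_error(misspelling, w)]


dictionary = ["retrieval", "retrieve", "relevance",
              "information", "index", "indexing"]
print(correct("infomation", dictionary))
```

Sorting by key places words with similar letter content next to each other, so only a small neighbourhood of the dictionary needs the more expensive error-reversal check.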
37. THE DATA-DOCUMENT DISTINCTION IN INFORMATION RETRIEVAL
David C. Blair
The speed and effectiveness of document retrieval systems
can be greatly improved by reducing the number of logical
decisions required of the user. Based on the weighting of
single terms by the user, the proposed system provides an
optimized search strategy by combining the terms to yield the
highest probabilities and then calculating the size of the
retrieval set in each case.
(COMMUNICATIONS OF THE ACM, Vol. 27, Number 4, April 1984)
------------------------------
END OF IRList Digest
********************