IRList Digest Volume 4 Number 34

IRList Digest           Tuesday, 17 May 1988     Volume 4 : Issue 34

Today's Topics:
Abstract - Selected abstracts appearing in SIGIR FORUM (part 2 of 2)

News addresses are
Internet or CSNET: fox@vtopus.cs.vt.edu
BITNET: foxea@vtvax3.bitnet

----------------------------------------------------------------------

Date: Tue, 17 May 88 09:10:51 CDT
From: "Dr. Raghavan" <raghavan%raghavansun%usl.csnet@RELAY.CS.NET>
Subject: Abstracts from SIGIR Forum [Part II of II - Ed.]

[Note: this is the final part, continued from previous issue - Ed.]

ABSTRACTS [Note: continued - Ed.]

ON MODELING OF INFORMATION RETRIEVAL CONCEPTS IN VECTOR SPACES
S.K.M. Wong, W. Ziarko, V.V. Raghavan, and P.C.N. Wong, Department of Computer
Science, University of Regina, Regina, Canada S4S 0A2
The Vector Space Model (VSM) has been adopted in information retrieval as a
means of coping with inexact representation of documents and queries, and the
resulting difficulties in determining the relevance of a document relative to
a given query. The major problem in employing this approach is that the
explicit representation of term vectors is not known a priori. Consequently,
earlier researchers made the assumption that the vectors corresponding to
terms are pairwise orthogonal. Such an assumption is clearly unrealistic.
Although attempts have been made to compensate for this assumption by some
separate, corrective steps, such methods are ad hoc and, in most cases,
formally inconsistent.
In this paper, a generalization of the VSM, called the GVSM, is advanced.
The developments provide a solution not only for the computation of a measure
of similarity (correlation) between terms, but also for the incorporation of
these similarities into the retrieval process.
The major strength of the GVSM derives from the fact that it is
theoretically sound and elegant. Furthermore, experimental evaluation of the
model on several test collections indicates that the performance is better
than that of the VSM. Experiments have been performed on some variations of
the GVSM, and all these results have also been compared to those of the VSM,
based on inverse document frequency weighting. These results and some ideas
for the efficient implementation of the GVSM are discussed.
(ACM TRANSACTIONS ON DATABASE SYSTEMS, Vol. 12, No. 2, pp. 299-321, 1987)
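[Illustrative note, not part of the abstract: the effect of dropping the
pairwise-orthogonality assumption can be sketched with a bilinear similarity
sim(q, d) = sum over i, j of q_i * G_ij * d_j, where G holds term-term
correlations. With G equal to the identity this reduces to the ordinary inner
product. The matrix values and the function name below are assumptions chosen
for illustration; this is not the paper's procedure for deriving term
correlations.]

```python
def similarity(query, doc, term_corr):
    """Bilinear similarity: sum_ij query[i] * term_corr[i][j] * doc[j]."""
    return sum(
        query[i] * term_corr[i][j] * doc[j]
        for i in range(len(query))
        for j in range(len(doc))
    )

# Identity matrix: terms treated as pairwise orthogonal (the classic assumption).
G_ORTHOGONAL = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Hypothetical correlations: terms 0 and 1 are assumed to be related (0.6).
G_CORRELATED = [
    [1.0, 0.6, 0.0],
    [0.6, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

query = [1.0, 0.0, 0.0]  # query uses only term 0
doc = [0.0, 1.0, 0.0]    # document uses only term 1

orthogonal_score = similarity(query, doc, G_ORTHOGONAL)
correlated_score = similarity(query, doc, G_CORRELATED)
print(orthogonal_score, correlated_score)
```

[Under the orthogonality assumption the query and document share nothing and
the score is zero; once the assumed correlation is admitted, the document
receives a non-zero score.]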



TERM CO-OCCURRENCE IN CITED/CITING JOURNAL ARTICLES AS A MEASURE OF DOCUMENT
SIMILARITY
Donna Trivison, 1453 Elbur Avenue, Lakewood, OH 44107
Term co-occurrences were measured in pairs of cited/citing research
articles selected over the period of time from 1971 until 1983 from a core
literature in the field of information science. A consistent pattern of term
similarity was observed in these article pairs. In contrast, document
similarity was extremely low in randomly paired articles selected from the
same core data base. In 77% of cited/citing articles, there were more
co-occurrences of significant terms than there were in 87% of the same articles
paired randomly. The study served to quantify terminology-relatedness. A
comparison of the similarity of cited/citing literature of various ages
resulted in an indication of the amount of new terminology entering the field.
And, because a clear delineation was achieved between the similarity of
cited/citing articles and the similarity of non-cited/citing articles, the
results were extended to define an expected success rate of a matching
procedure in one context of information retrieval.
(INFORMATION PROCESSING & MANAGEMENT, Vol. 23, No. 3, pp. 183-194, 1987)
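[Illustrative note, not part of the abstract: a term co-occurrence count
between two articles can be sketched as the number of significant terms they
share. The abstract does not say how "significant" was defined, so the tiny
stopword list, the tokenization, and the sample texts below are all
assumptions made for illustration.]

```python
import re

# Hypothetical stopword list; the study's own criterion is not given here.
STOPWORDS = frozenset(
    {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}
)

def significant_terms(text):
    """Lowercase alphabetic tokens that are not in the stopword list."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}

def cooccurrence_count(text_a, text_b):
    """Number of distinct significant terms appearing in both texts."""
    return len(significant_terms(text_a) & significant_terms(text_b))

cited = "Term weighting improves retrieval of documents"
citing = "Retrieval experiments with term weighting"
unrelated = "Library budgets and staffing in the public sector"

print(cooccurrence_count(cited, citing))     # shared: term, weighting, retrieval
print(cooccurrence_count(cited, unrelated))  # no shared significant terms
```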



KNOWLEDGE-SPARSE AND KNOWLEDGE-RICH LEARNING IN INFORMATION RETRIEVAL
Roy Rada, National Library of Medicine, Bethesda, MD 20894
This paper reviews some aspects of the relationship between the large and
growing fields of machine learning (ML) and information retrieval (IR).
Learning programs are described along several dimensions. One dimension
refers to the degree of dependence of an ML + IR program on users, thesauri,
or documents. This paper emphasizes the role of the thesaurus in ML + IR
work. ML + IR programs are also classified in a dimension that extends from
knowledge-sparse learning at one end to knowledge-rich learning at the other.
Knowledge-sparse learning depends largely on user yes-no feedback or on word
frequencies across documents to guide adjustments in the IR system.
Knowledge-rich learning depends on more complex sources of feedback, such as
the structure within a document or thesaurus, to direct changes in the
knowledge bases on which an intelligent IR system depends. New advances in
computer hardware make the knowledge-sparse learning programs that depend on
word occurrences in documents more practical. Advances in artificial
intelligence bode well for knowledge-rich learning.
(INFORMATION PROCESSING & MANAGEMENT, Vol. 23, No. 3, pp. 195-210, 1987)



KNOWLEDGE RESOURCE TOOLS FOR ACCESSING LARGE TEXT FILES
Donald E. Walker, Artificial Intelligence and Information Science Research,
Bell Communications Research, 435 South Street MRE 2A379, Morristown, NJ 07960
This paper provides an overview of a research program just being defined at
Bellcore. The objective is to develop facilities for working with large
document collections that provide more refined access to the information
contained in these ``source'' materials than is possible through current
information retrieval procedures. The tools being used for this purpose are
machine-readable dictionaries, encyclopedias, and related ``resources'' that
provide geographical, biographical, and other kinds of specialized knowledge.
A major feature of the research program is the exploitation of the reciprocal
relationships between sources and resources. These interactions between texts
and tools are intended to support experts who organize and use information in
a workstation environment. Two systems under development will be described to
illustrate the approach: one providing capabilities for full-text subject
assessment; the other for concept elaboration while reading text. Progress in
the research depends critically on developments in artificial intelligence,
computational linguistics, and information science to provide a scientific
base, and on software engineering, database management, and distributed
systems to provide the technology.
(PROCEEDINGS OF THE FIRST CONFERENCE OF THE UNIVERSITY OF WATERLOO CENTER FOR
THE NEW OXFORD ENGLISH DICTIONARY, Waterloo, Canada, pp. 11-24, November,
1985)



PICTURES OF RELEVANCE: A GEOMETRIC ANALYSIS OF SIMILARITY MEASURES
William P. Jones, Microelectronics and Computer Technology Corporation, P.O.
Box 200195, Austin, Texas 78720 and George W. Furnas, Bell Communications
Research, 435 South Street, Morristown, N.J. 07960
We want computer systems that can help us assess the similarity or
relevance of existing objects (e.g., documents, functions, commands, etc.) to
a statement of our current needs (e.g., the query). Towards this end, a
variety of similarity measures have been proposed. However, the relationship
between a measure's formula and its performance is not always obvious. A
geometric analysis is advanced and its utility demonstrated through its
application to six conventional information retrieval similarity measures and
a seventh spreading activation measure. All seven similarity measures work
with a representational scheme wherein a query and the database objects are
represented as vectors of term weights. A geometric analysis characterizes
each similarity measure by the nature of its iso-similarity contours in an
n-space containing query and object vectors. This analysis reveals important
differences among the similarity measures and suggests conditions in which
these differences will affect retrieval performance. The cosine coefficient,
for example, is shown to be insensitive to between-document differences in the
magnitude of term weights while the inner product measure is sometimes overly
affected by such differences. The context-sensitive spreading activation
measure may overcome both of these limitations and deserves further study.
The geometric analysis is intended to complement, and perhaps to guide, the
empirical analysis of similarity measures.
(JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, Vol. 38, No. 6, pp.
420-442, 1987)
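[Illustrative note, not part of the abstract: the contrast drawn between the
cosine coefficient and the inner product can be checked numerically. Scaling
every term weight of a document by a constant leaves the cosine unchanged but
multiplies the inner product by that constant. The vectors and the scale
factor below are assumptions chosen for illustration.]

```python
import math

def inner_product(q, d):
    """Sum of products of matching term weights."""
    return sum(a * b for a, b in zip(q, d))

def cosine(q, d):
    """Inner product normalized by the lengths of both vectors."""
    return inner_product(q, d) / (
        math.sqrt(inner_product(q, q)) * math.sqrt(inner_product(d, d))
    )

query = [1.0, 1.0, 0.0]
doc = [2.0, 1.0, 0.0]
scaled_doc = [10 * w for w in doc]  # same direction, ten times the magnitude

print(cosine(query, doc), cosine(query, scaled_doc))
print(inner_product(query, doc), inner_product(query, scaled_doc))
```

[The two cosine values agree up to rounding, while the inner product grows
from 3 to 30, which is the sensitivity to magnitude the abstract describes.]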



I3R: A NEW APPROACH TO THE DESIGN OF DOCUMENT RETRIEVAL SYSTEMS
W.B. Croft and R.H. Thompson, Department of Computer and Information Science,
University of Massachusetts, Amherst, MA 01003
The most effective method of improving the retrieval performance of a
document retrieval system is to acquire a detailed specification of the user's
information need. The system described in this article, IIIR, provides
a number of facilities and search strategies based on this approach. The
system uses a novel architecture to allow more than one system facility to be
used at a given stage of a search session. Users influence the system actions
by stating goals they wish to achieve, by evaluating system output, and by
choosing particular facilities directly. The other main features of IIIR
are an emphasis on domain knowledge used for refining the model of the
information need, and the provision of a browsing mechanism that allows the
user to navigate through the knowledge base.
(JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, Vol. 38, No. 6, pp.
389-404, 1987)



HYPERTEXT: AN INTRODUCTION AND SURVEY
Jeff Conklin, Microelectronics and Computer Technology Corp., P.O. Box 200195,
Austin, TX 78720
As workstations grow cheaper, more powerful, and more available, new
possibilities emerge for extending the traditional notion of ``flat'' text
files by allowing more complex organizations of the material. Mechanisms are
being devised which allow direct machine-supported references from one textual
chunk to another; new interfaces provide the user with the ability to interact
directly with these chunks and to establish new relationships between them.
These extensions of the traditional text fall under the general category of
hypertext (also known as nonlinear text).
This article is a survey of existing hypertext systems, their applications,
and their design. It is both an introduction to the world of hypertext and,
at a deeper cut, a survey of some of the most important design issues that go
into fashioning a hypertext environment.
(COMPUTER, Vol. 20, No. 9, pp. 17-42, 1987)



PARALLEL QUERYING OF LARGE DATABASES: A CASE STUDY
Harold S. Stone, IBM T.J. Watson Research Center
Parallelism by itself does not necessarily lead to higher speed. In the
case study presented here, the parallel algorithm was far less efficient than
a good serial algorithm. The study does, however, reveal how to put
parallelism to best use: run the more efficient serial algorithm in a
parallel manner.
The case study extends the work of Stanfill and Kahle, who presented
an algorithm for high-speed querying of a large database. They demonstrated
the use of a parallel program running on a 16,000-processor Connection Machine
and obtained estimates for the running time of the algorithm on a
64K-processor system with queries made against a very large database of Reuters
news releases. Their results show that the throughput for parallel query
analysis is high in an absolute sense. But they did not provide a performance
analysis of speedup or other aspects of algorithmic behavior that would reveal
what factors of machine and algorithm design contribute most strongly to the
performance. This article provides that analysis.
(COMPUTER, Vol. 20, No. 10, pp. 11-12, 1987)
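[Illustrative note, not part of the abstract: the idea of running a serial
algorithm "in a parallel manner" can be sketched by splitting the database
into partitions and applying the same serial scan to each one concurrently.
This is neither Stone's algorithm nor that of Stanfill and Kahle; the function
names, the naive term match, and the sample documents are assumptions. Threads
are used only to show the structure and would not by themselves speed up
CPU-bound work in CPython.]

```python
from concurrent.futures import ThreadPoolExecutor

def serial_search(docs, term):
    """Plain serial scan: documents whose whitespace-split words include term."""
    return [d for d in docs if term in d.split()]

def parallel_search(docs, term, workers=4):
    """Partition docs and run the same serial scan on every partition."""
    size = max(1, -(-len(docs) // workers))  # ceiling division, at least 1
    chunks = [docs[i:i + size] for i in range(0, len(docs), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in partition order, so document order is kept.
        parts = pool.map(lambda chunk: serial_search(chunk, term), chunks)
        return [d for part in parts for d in part]

docs = [
    "oil prices rise",
    "query processing on parallel machines",
    "markets close higher",
    "a fast serial query algorithm",
    "news releases from the wire",
]

print(parallel_search(docs, "query", workers=2))
```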



HISTORICAL NOTE: A PERSONALIZED HISTORY OF OCLC
Frederick G. Kilgour, Founder Trustee, OCLC Online Computer Library Center,
Inc., Dublin, Ohio
(JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, Vol. 38, No. 5, pp.
381-384, 1987)



HISTORICAL NOTE: THE PAST THIRTY YEARS IN INFORMATION RETRIEVAL
Gerard Salton, Department of Computer Science, Cornell University, Ithaca, New
York 14853
The documentation literature of the 1950s is reviewed briefly, and some
early text processing endeavors are discussed. Various predictions made in
1960 by Mooers about the creative role of computers in information retrieval
are then considered, and an attempt is made to explain why some of the more
exciting predictions have not been fulfilled. Conclusions are drawn
concerning the limits of computer power in text retrieval applications.
(JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, Vol. 38, No. 5, pp.
375-380, 1987)



HISTORICAL NOTE: INFORMATION SCIENCE AND TECHNOLOGY: FROM COORDINATE
INDEXING TO THE GLOBAL BRAIN
Cloyd Dake Gull, 8 Pimlico Court, Silver Spring, MD 20906
(JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, Vol. 38, No. 5, pp.
338-366, 1987)



HISTORICAL NOTE: SHINING PALACES, SHIFTING SANDS: NATIONAL INFORMATION
SYSTEMS
Harold Wooster, Senior Information Scientist (Retired), Lister Hill National
Center for Biomedical Communications, National Library of Medicine, Department
of Health and Human Services, Bethesda, MD 20894
This article discusses post-Sputnik national information systems under
three major headings. Shifting Sands examines the false assumptions that the
Soviets were first in space because of the superiority of their educational
system and their scientific and technical information system, VINITI. The
Shining Palaces lists as appendixes 31 reports since 1958 which propose
various forms of a national information system, and analyzes 30 National
Plans; the author does not presume to favor any of them. In Solid Rock-The
Ugly Houses the author lists in an appendix the involvement of the federal
government with scientific and technical information since the first patent
act of 1790, and discusses what he thinks should be done for the users of a
national system, and the role of technical documentary reports, project
information systems, and scientific journals. The Summary and Conclusions
starts with three quotations, written 22 years apart, which show that nothing
has changed in over two decades. In a Personal Note the author summarizes his
forty-year career as an information scientist.
(JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, Vol. 38, No. 5, pp.
321-335, 1987)

------------------------------

END OF IRList Digest
********************
