IRList Digest Volume 2 Number 44
IRList Digest Thursday, 18 September 1986 Volume 2 : Issue 44
Today's Topics:
Email - Address of surveyor of work on automatic indexing
Query - Sound-alike matching?
COGSCI - Alvey Speech Input Workstation and Word Processor
Abstracts - More from latest issue of ACM SIGIR Forum, Part 2
News addresses are ARPANET: fox%vt@csnet-relay.arpa BITNET: foxea@vtvax3.bitnet
CSNET: fox@vt UUCPNET: seismo!vtisr1!irlistrq
----------------------------------------------------------------------
From: Mitchell Wyle <wyle@ethz.uucp>
Date: Sat, 13 Sep 86 13:10:16 -0200
Subject: paper on research in automatic indexing
...
I just joined professor H.P. Frei's IR group. I am preparing a conspectus
paper of the fundamentals and current research in automatic indexing in IR.
Thanks in advance. -M Wyle
------------------------------
Date: Sat, 13 Sep 86 18:37:08 edt
From: sdpage%sevax.prg.oxford.ac.uk%sevax.prg.oxford.ac.uk@CS.UCL.AC.UK
Subject: Code to match sound-alike words?
I have vague memories of a code which will map two words which sound
alike onto each other. The typical application is an airline
reservations system, where a telephone caller could be saying "Smith" or
"Smyth" -- the database query system will match either.
Can anyone give me a reference to a code like this one? - Thanks.
Stephen Page
Programming Research Group -- Oxford
sdpage%prg.oxford.ac.uk@cs.ucl.ac.uk
[Note: some early work is described below
Davidson, Leon. Retrieval of Misspelled Names in an Airlines
Passenger Record System. Commun. ACM, 5(5): 169-71, May 1962.
Greenfield, R.H. An Experiment to Measure the Performance of
Phonetic Key Compression Retrieval Schemes. Meth. Inform. Med.,
16: 230-233, 1977.
Joseph, D.M. and Ruth L. Wong. Correction of Misspellings and
Typographic Errors in a Free-Text Medical English Information Storage
and Retrieval System. Meth. Inform. Med., 18: 228-234 (sic?), 1979.
perhaps others will comment on more recent articles - Ed]
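One widely known scheme of the kind asked about is Soundex. The message and the references above do not name it, so treating it as the intended code is an assumption. The sketch below follows the common "American Soundex" rules (keep the first letter, map the remaining consonants to digits, collapse adjacent duplicates, pad to four characters); the function name is illustrative only.

```python
def soundex(name: str) -> str:
    """Return a 4-character Soundex-style key for a name (sketch)."""
    # Consonant groups that share a digit under the usual Soundex rules.
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    key = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = codes.get(ch, "")
        if digit and digit != prev:
            key += digit
        prev = digit  # a vowel resets prev, so a repeated code counts again
    return (key + "000")[:4]


if __name__ == "__main__":
    for n in ("Smith", "Smyth", "Robert", "Rupert"):
        print(n, soundex(n))
```

With such a key stored alongside each name, a query for "Smith" can retrieve every record whose key matches, including "Smyth".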
------------------------------
Date: Tue, 9 Sep 86 18:40:45 edt
From: DEJONG%OZ.AI.MIT.EDU@AI.AI.MIT.EDU
Subject: Cognitive Science Calendar
Date: Tuesday, 9 September 1986 11:09-EDT
From: AHAAS at G.BBN.COM
Thursday, 11 September 10:00am Room: BBN 2nd Floor Large Conference
Room, 10 Moulton St.
BBN ARTIFICIAL INTELLIGENCE SEMINAR
Interactive Incremental Speech Input:
Interim Report on a Linguistics/AI Approach to Speech Recognition
Henry Thompson
University of Edinburgh
The Alvey Large Scale Demonstrator Project entitled 'Speech Input
Workstation and Word Processor' is a British effort involving the
Plessey company and three universities, including Edinburgh, in
the construction of a demonstration prototype of a commercially
viable speech input system. Edinburgh is responsible for the
speech processing aspects of the project. In this talk I will
try to cover three things:
1) An overview of the systems architecture and methodology of our
work. We are committed to using explicit knowledge bases at
as many levels of the processing as possible, to employing
parsing (active chart based) in using those knowledge bases,
and to supporting only selective, as opposed to instructional,
interaction between levels.
2) A brief report of the performance of our first milestone system,
which came up in June of this year, about 18 months into our
five-year effort.
3) A more detailed exposition of how we are employing parsing at
the segmentation and labelling level.
------------------------------
Date: Wed, 23 Jul 1986 13:06 CST
From: Vijay V. Raghavan <RAGHAVAN@UREGINA1.bitnet>
Subject: More SIGIR FORUM Abstracts [Part 2 - Ed]
[Note: Members of ACM SIGIR should have received the spring/summer
Forum, and can find these on pages 24-27. The remaining part will
appear in machine readable form in the next issue of IRList. - Ed]
ABSTRACTS
(Selected from recent issues of journals)
6. STATISTICS IN INFORMATION RETRIEVAL EXPERIMENTS
V. E. Weissmann
Institut für Angewandte Informatik, Technische Universität
Berlin Projekt LIVE
Nearly all people use statistics, but very often in the wrong
way. To give some clues for the proper use of statistics, a
framework will be developed in this paper to help one
understand the methodology of applying statistics in IR
experiments.
The central idea of this framework is that one should i)
distinguish between two kinds of models: an expert model and
a mathematical-statistical model, and ii) recognize that
these two models are highly interdependent.
The argument for the need for these two models (and the
distinction between them) will follow a meta-scientific
approach of J. D. Sneed[1].
To make the numerous relationships in this framework more
comprehensible a graphical method called Isac is used.
(INFORMATION PROCESSING AND MANAGEMENT, Vol. 22, No. 1, pp.
29-37, 1986).
7. INFORMATION RETRIEVAL IN AN OFFICE FILING FACILITY AND FUTURE
WORK IN PROJECT MINSTREL
A. F. Smeaton and C. J. Van Rijsbergen
University College Dublin, Department of Computer Science,
Belfield, Dublin 4, Ireland
In this paper we review filing and retrieval mechanisms for
unstructured and mixed media information in an office filing
facility. In particular, we concentrate on methods of filing
and retrieval using the content of the unstructured or free
text parts of office objects, but the state of the art in the
handling of voice and image data is also discussed. Two of
the ways of implementing content retrieval of free text are
to search the text itself or to search some text surrogate.
Two of the problems associated with the latter method, choice
of an internal representation form and analysis of text into
this form, are detailed in the paper. Finally, an outline is
given of work to be done as part of Project Minstrel.
(INFORMATION PROCESSING AND MANAGEMENT, Vol. 22, No. 2, pp.
135-149, 1986).
8. AN INDUCTIVE SEARCH SYSTEM: THEORY, DESIGN, AND
IMPLEMENTATION
M. E. Maron and Paul Thompson
School of Library and Information Studies
University of California,
Berkeley, CA 94720
Sean Curry
University of California
San Francisco, CA 94143
An automated information system that can accept requests for
information and, in response, selects and ranks by
probability of satisfaction the names of those people who can
answer the input queries is described. This information
system (called Helpnet) is based on new probabilistic design
principles, which were previously proposed (but never
implemented) for the document retrieval problem. Helpnet has
now been implemented on an IBM Personal Computer. The
theoretical design principles used for Helpnet and the
computer programs used by this implementation of Helpnet are
discussed. Also, a preliminary sensitivity analysis is
presented, which looks at the question of how input errors
influence the rankings at the output. The probabilistic
design principles used in Helpnet can be applied to a much
larger class of similar situations, which we call "inductive
search" situations.
(IEEE TRANSACTIONS ON SYSTEMS, MAN AND CYBERNETICS, Vol. SMC-16,
No. 1, pp. 21-28, January/February 1986)
9. MULTIPLE GENERATION TEXT FILES USING OVERLAPPING TREE
STRUCTURES
F. Warren Burton
Department of Electrical Engineering and Computer Science,
University of Colorado at Denver, Denver, Colorado 80202, USA
Matthew M. Huntbach
Cognitive Studies, University of Sussex, Brighton, U.K.
J. (Yiannis) G. Kollias
Department of Computer Science, National Technical University
of Athens, 9 Heroon Polytechniou Avenue, Zografou Athens
(624), Greece
When repeatedly editing a text file, one is often faced with
a choice of keeping previous generations for backup or
deleting previous generations to reduce storage requirements.
Since one generation of a text file is often very similar to
the previous generation, the above conflict can often be
resolved by sharing much of the common information.
We propose using a tree structure to represent a text file.
Common subtrees can be shared. Results of an experiment
with one file are reported.
(THE COMPUTER JOURNAL, Vol. 28, No. 4, pp. 414-416, 1985)
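The sharing idea in this abstract can be illustrated with a small immutable tree over the lines of a file. This is only a sketch under assumed details: the abstract does not say which tree layout the authors use, and the names `Node`, `build`, `replace_line`, and `to_lines` are hypothetical. Replacing one line copies only the nodes on the path to that line; every other subtree is shared between the old and new generation.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Node:
    size: int                        # number of lines below this node
    line: Optional[str] = None       # text, set only on leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(lines: Sequence[str]) -> Node:
    """Build a balanced tree from a non-empty sequence of lines."""
    if len(lines) == 1:
        return Node(1, line=lines[0])
    mid = len(lines) // 2
    left, right = build(lines[:mid]), build(lines[mid:])
    return Node(left.size + right.size, left=left, right=right)


def replace_line(node: Node, index: int, text: str) -> Node:
    """Return a new generation with one line replaced; untouched subtrees are reused."""
    if node.left is None:            # leaf
        return Node(1, line=text)
    if index < node.left.size:
        return Node(node.size, left=replace_line(node.left, index, text),
                    right=node.right)
    return Node(node.size, left=node.left,
                right=replace_line(node.right, index - node.left.size, text))


def to_lines(node: Node) -> List[str]:
    """Flatten a generation back into its list of lines."""
    if node.left is None:
        return [node.line]
    return to_lines(node.left) + to_lines(node.right)


if __name__ == "__main__":
    gen1 = build(["a", "b", "c", "d"])
    gen2 = replace_line(gen1, 2, "C")
    print(to_lines(gen1), to_lines(gen2))
    print(gen2.left is gen1.left)    # the unedited half is the same object
```

Both generations remain readable, and the extra storage for the second one is proportional to the depth of the tree rather than to the size of the file.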
10. STRUCTURAL PROPERTIES OF THE STRING STATISTICS PROBLEM
A. Apostolico
Department of Computer Science, Purdue University, West
Lafayette, Indiana 47907
F. P. Preparata
Coordinated Sciences Laboratory, University of Illinois at
Urbana-Champaign, Urbana, Illinois 61801
A suitably weighted Index Tree such as a B-tree or a Suffix
Tree can be easily adapted to store, for a given string x and
for all substrings w of x, the number of distinct instances
of w along x. The storage needed is seen to be linear in the
length of x: moreover, the whole statistics can itself be
derived in linear time, off-line on a RAM. If the substring
w has nontrivial periods, however, the number of distinct
instances might differ from that of distinct nonoverlapping
occurrences along x. It is shown here that O(n log n)
storage units - n standing for the length of x - are
sufficient to organize this second kind of statistics, in
such a way that the maximum number of nonoverlapping
instances for arbitrary w along x can be retrieved in a
number of character comparisons not exceeding the length of
w.
(JOURNAL OF COMPUTER AND SYSTEM SCIENCES 31, 394-411, 1985)
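The two statistics distinguished in this abstract can be made concrete with a naive scan. The sketch below is not the index-tree method of the paper; it only shows, by brute force, how the count of all instances of w in x can differ from the maximum number of nonoverlapping occurrences when w has a nontrivial period. The function names are illustrative.

```python
def count_instances(x: str, w: str) -> int:
    """Count all occurrences of w in x, overlapping ones included."""
    if not w:
        raise ValueError("w must be non-empty")
    count, start = 0, x.find(w)
    while start != -1:
        count += 1
        start = x.find(w, start + 1)       # next match may overlap this one
    return count


def count_nonoverlapping(x: str, w: str) -> int:
    """Maximum number of pairwise nonoverlapping occurrences of w in x.

    Taking the leftmost match each time is optimal for a single pattern.
    """
    if not w:
        raise ValueError("w must be non-empty")
    count, start = 0, x.find(w)
    while start != -1:
        count += 1
        start = x.find(w, start + len(w))  # skip past the match just taken
    return count


if __name__ == "__main__":
    print(count_instances("abababa", "aba"), count_nonoverlapping("abababa", "aba"))
```

For x = "abababa" and w = "aba" the instances start at positions 0, 2, and 4, but at most two of them can be chosen without overlap.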
11. A COMPARISON OF A NETWORK STRUCTURE AND A DATABASE SYSTEM
USED FOR DOCUMENT RETRIEVAL
W. Bruce Croft
Thomas J. Parenty
Computer and Information Science Department, University of
Massachusetts, Amherst, MA 01003
Database systems have many advantages for implementing
document retrieval systems. One of the main advantages would
be the integration of data and text handling in a single
information system. However, it has not been clear how much
a database implementation would cost in terms of efficiency.
In this paper, we compare a database implementation and a
stand-alone implementation of a flexible representation of
the content of documents and the associated search
strategies. The representation used is a network of document
and index term nodes. The comparison shows that certain
features of a database system can have a significant effect
on the efficiency of the implementation. Despite this, it
appears that a database implementation of a sophisticated
document retrieval system can be competitive with a
stand-alone implementation.
(INFORM. SYSTEMS Vol. 10, No. 4, pp. 377-390, 1985)
12. A NOTE ON NATURAL SELECTION
Wlodzimierz Dobosiewicz
Department of Computing and Information Science, University of
Guelph, Guelph, Ontario N1G 2W1, Canada
Replacement selection is the most popular algorithm used in
the creation of initial runs for a sort/merge external sort.
In 1972, Frazer and Wong suggested a variation, called
natural selection, which uses an auxiliary memory reservoir
to increase the performance of replacement selection.
Natural selection produces longer runs than replacement
selection if the auxiliary memory reservoir is sufficiently
large, but it behaves very strangely when the size of the
auxiliary memory is small: while using more memory resources
than replacement selection, it creates shorter runs, thus
being less efficient.
As it turns out, this deficiency can be avoided at low cost.
This note presents a variation of natural selection that is
efficient when the auxiliary memory is small.
(INFORMATION PROCESSING LETTERS 21 (1985) 239-243)
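For readers unfamiliar with the baseline, plain replacement selection can be sketched with a heap. This is the classic algorithm only: it does not implement natural selection or the variation proposed in the note, and the function name and the `memory_size` parameter are assumptions made for illustration. Each record read from the input joins the current run if its key is not smaller than the key just written out; otherwise it is held for the next run.

```python
import heapq
from itertools import islice
from typing import Iterable, List


def replacement_selection(records: Iterable[int], memory_size: int) -> List[List[int]]:
    """Split records into sorted initial runs using a heap of memory_size keys."""
    if memory_size < 1:
        raise ValueError("memory_size must be at least 1")
    source = iter(records)
    # Heap entries are (run number, key); keys tagged for a later run sort last.
    heap = [(0, key) for key in islice(source, memory_size)]
    heapq.heapify(heap)
    runs: List[List[int]] = []
    done = object()                       # sentinel marking end of input
    while heap:
        run, key = heapq.heappop(heap)
        if run == len(runs):              # first key of a new run
            runs.append([])
        runs[run].append(key)
        nxt = next(source, done)
        if nxt is not done:
            # A key smaller than the one just written cannot extend this run.
            heapq.heappush(heap, (run if nxt >= key else run + 1, nxt))
    return runs


if __name__ == "__main__":
    print(replacement_selection([5, 1, 9, 3, 7, 2, 8, 4, 6], memory_size=3))
```

With three keys of memory, the nine-record example yields a first run of six records, which is the effect that makes replacement selection attractive for creating initial runs.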
------------------------------
END OF IRList Digest
********************