IRList Digest Tuesday, 17 May 1988 Volume 4 : Issue 33
Today's Topics:
Abstract - Selected abstracts appearing in SIGIR FORUM (part 1 of 2)
News addresses are
Internet or CSNET: fox@vtopus.cs.vt.edu
BITNET: foxea@vtvax3.bitnet
----------------------------------------------------------------------
Date: Tue, 17 May 88 09:10:51 CDT
From: "Dr. Raghavan" <raghavan%raghavansun%usl.csnet@RELAY.CS.NET>
Subject: Abstracts from SIGIR Forum [Part I of II - Ed.]
Ed,
These are the abstracts I included in the recent Forum.
...
Regards, Vijay
ABSTRACTS
(Chosen by G. Salton from recent issues of journals in the retrieval area.)
INFORMATION RETRIEVAL BY CONSTRAINED SPREADING ACTIVATION IN SEMANTIC NETWORKS
Paul R. Cohen and Rick Kjeldsen, Department of Computer and Information Science,
Lederle Graduate Research Center, University of Massachusetts, Amherst, MA
01003
GRANT is an expert system for finding sources of funding given research
proposals. Its search method - constrained spreading activation - makes
inferences about the goals of the user and thus finds information that the
user did not explicitly request but that is likely to be useful. The
architecture of GRANT and the implementation of constrained spreading
activation are described, and GRANT's performance is evaluated.
(INFORMATION PROCESSING & MANAGEMENT, Vol. 23, No. 4, pp. 255-268, 1987)
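[Ed. note: For readers unfamiliar with the technique, a minimal sketch of
constrained spreading activation over a semantic network follows. The network
representation, decay factor, and stopping constraints shown are illustrative
assumptions, not the specific path-endorsement rules used in GRANT. - Ed.]

    # Minimal sketch of constrained spreading activation (illustrative only;
    # the graph, decay factor, and constraints below are assumptions, not GRANT's rules).

    def spread_activation(graph, sources, decay=0.5, threshold=0.1, max_depth=3,
                          allowed_relations=None):
        """graph: {node: [(relation, neighbor), ...]}; sources: {node: activation}."""
        activation = dict(sources)
        frontier = [(node, act, 0) for node, act in sources.items()]
        while frontier:
            node, act, depth = frontier.pop()
            if depth >= max_depth:          # path-length constraint
                continue
            for relation, neighbor in graph.get(node, []):
                if allowed_relations and relation not in allowed_relations:
                    continue                # relation-type constraint
                new_act = act * decay       # activation decays along each link
                if new_act < threshold:     # activation-threshold constraint
                    continue
                if new_act > activation.get(neighbor, 0.0):
                    activation[neighbor] = new_act
                    frontier.append((neighbor, new_act, depth + 1))
        return activation

    # Example: spread from topics mentioned in a proposal toward funding agencies
    # (invented network; "agency-X" is a hypothetical node).
    net = {"machine-learning": [("related-topic", "pattern-recognition")],
           "pattern-recognition": [("funded-by", "agency-X")]}
    print(spread_activation(net, {"machine-learning": 1.0},
                            allowed_relations={"related-topic", "funded-by"}))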
DEVELOPMENT OF THE CODER SYSTEM: A TESTBED FOR ARTIFICIAL INTELLIGENCE
METHODS IN INFORMATION RETRIEVAL
Edward A. Fox, Department of Computer Science, Virginia Tech, Blacksburg, VA
24061
The CODER (COmposite Document Expert/Extended/Effective Retrieval) system
is a testbed for investigating the application of artificial intelligence
methods to increase the effectiveness of information retrieval systems.
Particular attention is being given to analysis and representation of
heterogeneous documents, such as electronic mail digests or messages, which
vary widely in style, length, topic, and structure. Since handling passages
of various types in these collections is difficult even for experimental
systems like SMART, it is necessary to turn to other techniques being explored
by information retrieval and artificial intelligence researchers. The CODER
system architecture involves communities of experts around active blackboards,
accessing knowledge bases that describe users, documents, and lexical items of
various types. The initial lexical knowledge base construction work is now
complete, and experts for search and time/date handling can perform a variety
of processing tasks. User information and queries are being gathered, and a
simple distributed skeletal system is operational. It appears that a number
of artificial intelligence techniques are needed to best handle such common
but complex document analysis and retrieval tasks.
(INFORMATION PROCESSING & MANAGEMENT, Vol. 23, No. 4, pp. 341-366, 1987)
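[Ed. note: As a rough illustration of the "community of experts around a
blackboard" organization mentioned above, a minimal sketch follows. The expert
names and the opportunistic control loop are hypothetical and are not taken
from the CODER implementation. - Ed.]

    # Minimal sketch of a blackboard with cooperating experts (illustrative only;
    # the experts and scheduler below are assumptions, not CODER's design).

    class Blackboard:
        def __init__(self):
            self.entries = {}                       # shared problem state

        def post(self, key, value):
            self.entries[key] = value

    class DateExpert:
        """Hypothetical expert that normalizes date strings posted on the blackboard."""
        def can_contribute(self, bb):
            return "raw_date" in bb.entries and "date" not in bb.entries
        def contribute(self, bb):
            bb.post("date", bb.entries["raw_date"].strip().upper())

    class SearchExpert:
        """Hypothetical expert that produces hits once a query is available."""
        def can_contribute(self, bb):
            return "query" in bb.entries and "hits" not in bb.entries
        def contribute(self, bb):
            bb.post("hits", ["doc-" + t for t in bb.entries["query"].split()])

    def run(blackboard, experts):
        progress = True
        while progress:                             # simple opportunistic scheduler
            progress = False
            for expert in experts:
                if expert.can_contribute(blackboard):
                    expert.contribute(blackboard)
                    progress = True

    bb = Blackboard()
    bb.post("raw_date", " 17 may 88 ")
    bb.post("query", "spreading activation")
    run(bb, [DateExpert(), SearchExpert()])
    print(bb.entries)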
USER MODELING IN INTELLIGENT INFORMATION RETRIEVAL
Giorgio Brajnik, Giovanni Guida, and Carlo Tasso, Laboratorio di Intelligenza
Artificiale, Dipartimento di Matematica e Informatica, Universita di Udine,
Udine, Italy
The issue of exploiting user modeling techniques in the framework of
cooperative interfaces to complex artificial systems has recently received
increasing attention. In this paper we present the IR-NLI II system, an
expert interface that allows casual users to access online information
retrieval systems and encompasses user modeling capabilities. More
specifically, an illustration of the user modeling subsystem is given by
describing the organization of the user model proposed for the particular
application area, together with its use during system operation. The
techniques utilized for the construction of the model are presented as well.
They are based on the use of stereotypes, which are descriptions of typical
classes of users. More specifically, they include both declarative and
procedural knowledge for describing the features of the class to which the
stereotype is related, for assigning a user to that class, and for acquiring
and validating the necessary information during system operation.
(INFORMATION PROCESSING & MANAGEMENT, Vol. 23, No. 4, pp. 305-320, 1987)
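[Ed. note: A minimal sketch of the stereotype idea described above: each
stereotype bundles declarative default features with a procedural trigger that
assigns a user to the class, and direct observations can override the
defaults. The attributes and triggers are invented for illustration and are
not those of IR-NLI II. - Ed.]

    # Minimal sketch of stereotype-based user modelling (illustrative only;
    # attributes and triggers are assumptions, not those of IR-NLI II).

    STEREOTYPES = [
        {"name": "casual user",
         "defaults": {"knows_boolean_operators": False, "needs_explanations": True},
         "trigger": lambda obs: obs.get("sessions", 0) < 3},
        {"name": "experienced searcher",
         "defaults": {"knows_boolean_operators": True, "needs_explanations": False},
         "trigger": lambda obs: obs.get("sessions", 0) >= 3},
    ]

    def build_user_model(observations):
        """Pick the first stereotype whose trigger fires, then let direct
        observations override the stereotype's default assumptions."""
        for s in STEREOTYPES:
            if s["trigger"](observations):
                model = dict(s["defaults"])
                model.update(observations)      # validated individual data wins
                model["stereotype"] = s["name"]
                return model
        return dict(observations)

    print(build_user_model({"sessions": 1, "domain": "chemistry"}))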
A PROTOTYPE OF AN INTELLIGENT SYSTEM FOR INFORMATION RETRIEVAL: IOTA
Y. Chiaramella and B. Defude,
Laboratoire IMAG ``Genie Informatique,''
BP 68-38402 St. Martin d'Heres,
France
Recent results in artificial intelligence research are of prime interest in
various fields of computer science; in particular we think information
retrieval may benefit from significant advances in this approach. Expert
systems seem to be valuable tools for components of information retrieval
systems related to semantic inference. The query component is the one we
consider in this paper. IOTA is the name of the resulting prototype presented
here, which is our first step toward what we call an intelligent system for
information retrieval.
After explaining what we mean by this concept and presenting current
studies in the field, the presentation of IOTA begins with the architecture
problem, that is, how to put together a declarative component, such as an
expert system, and a procedural component, such as an information retrieval
system. Then we detail our proposed solution, which is based on a procedural
expert system acting as the general scheduler of the entire query processing.
The main steps of natural language query processing are then described
according to the order in which they are processed, from the initial parsing
of the query to the evaluation of the answer. The distinction between expert
tasks and nonexpert tasks is emphasized. The paper ends with experimental
results obtained from a technical corpus, and a conclusion about current and
future developments.
(INFORMATION PROCESSING & MANAGEMENT, Vol. 23, No. 4, pp. 285-303, 1987)
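[Ed. note: The following sketch only illustrates the idea of a scheduler
driving query processing as an ordered sequence of tasks, some expert and some
nonexpert; the step names and their contents are assumptions and do not
reproduce IOTA's actual modules. - Ed.]

    # Minimal sketch of a scheduler running query processing as ordered tasks
    # (illustrative; the steps are not IOTA's actual components).

    def parse_query(state):          # nonexpert (procedural) task
        state["terms"] = state["query"].lower().split()

    def expand_terms(state):         # expert task: semantic inference would go here
        state["terms"] += ["synonym-of-" + t for t in state["terms"]]

    def evaluate(state):             # nonexpert task: match terms against an index
        index = state["index"]
        state["answer"] = sorted({d for t in state["terms"] for d in index.get(t, [])})

    PIPELINE = [("parse", parse_query, "nonexpert"),
                ("expand", expand_terms, "expert"),
                ("evaluate", evaluate, "nonexpert")]

    def schedule(query, index):
        state = {"query": query, "index": index}
        for name, task, kind in PIPELINE:        # the scheduler orders the steps
            task(state)
        return state["answer"]

    print(schedule("database design", {"database": ["d1", "d3"], "design": ["d2"]}))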
TEXT SIGNATURES BY SUPERIMPOSED CODING OF LETTER TRIPLETS AND QUADRUPLETS
Friedrich Gebhardt, Gesellschaft fur Mathematik und Datenverarbeitung mbH, D-
5205 St Augustin, West Germany
Text signatures are a condensed, coded form of a text; due to the reduced
length, information is retrieved faster than with the full text if inverted
files are not available. It has been proposed to base a particular form of
signature, superimposed coding, on letter triplets (or quadruplets) rather
than on complete words, thereby allowing the masking of searchwords. This
situation is analyzed here theoretically, taking into account the unequal
occurrence probabilities of the triplets; the results are compared
with a set of experiments. It turns out that the signatures based on letter
triplets produce too many false associations since the triplets occur in words
other than the searchword. With quadruplets, the number of false associations
might be tolerable.
(INFORMATION SYSTEMS, Vol. 12, No. 2, pp. 151-156, 1987)
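[Ed. note: A minimal sketch of superimposed coding with letter triplets, for
readers who want to experiment. The signature width, hash function, and number
of bits set per triplet are arbitrary choices, not the parameters analyzed in
the paper. - Ed.]

    # Minimal sketch of text signatures by superimposed coding of letter triplets.
    # Signature width and hashing are arbitrary choices, not the paper's parameters.

    import hashlib

    WIDTH = 256              # bits per signature (assumption)
    BITS_PER_TRIPLET = 2     # bits set per triplet (assumption)

    def triplets(word):
        w = word.lower()
        return [w[i:i+3] for i in range(len(w) - 2)]

    def positions(triplet):
        digest = hashlib.md5(triplet.encode()).digest()
        return [digest[k] % WIDTH for k in range(BITS_PER_TRIPLET)]

    def signature(text):
        sig = 0
        for word in text.split():
            for t in triplets(word):
                for p in positions(t):
                    sig |= 1 << p               # superimpose: OR all triplet codes
        return sig

    def may_contain(text_sig, query_word):
        q = signature(query_word)
        return text_sig & q == q                # all query bits set -> possible hit

    block = signature("superimposed coding of letter triplets")
    print(may_contain(block, "coding"))         # True
    print(may_contain(block, "quadruplet"))     # normally False; True here would be a false association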
CONCEPT RECOGNITION IN AN AUTOMATIC TEXT-PROCESSING SYSTEM FOR THE LIFE
SCIENCES
Natasha Vleduts-Stokolov, BIOSIS, 2100 Arch Street, Philadelphia, PA 19103
This article describes a natural-language text-processing system designed
as an automatic aid to subject indexing at BIOSIS. The intellectual procedure
the system should model is a deep indexing with a controlled vocabulary of
biological concepts - Concept Headings (CHs). On the average, ten CHs are
assigned to each article by BIOSIS indexers. The automatic procedure consists
of two stages: (1) translation of natural-language biological titles into
title-semantic representations which are in the constructed formalized
language of Concept Primitives, and (2) translation of the latter
representations into the language of CHs. The first stage is performed by
matching the titles against the system's Semantic Vocabulary (SV). The SV
currently contains approximately 15,000 biological natural-language terms and
their translations in the language of Concept Primitives. For ambiguous
terms, the SV contains algorithmic rules for term disambiguation, based on
semantic analysis of the context. The second stage of the automatic
procedure is performed by matching the title representations against the CH
definitions, formulated as Boolean search strategies in the language of
Concept Primitives. Three experiments performed with the system and their
results are described. The most typical problems the system encounters, the
problems of lexical and situational ambiguities, are discussed. The
disambiguation techniques employed are described and demonstrated in many
examples.
(JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, Vol. 38, No. 4, pp.
269-287, 1987)
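[Ed. note: A toy sketch of the two-stage procedure described above: title
terms are first mapped to concept primitives via a semantic vocabulary, and
the resulting primitives are then matched against Boolean Concept Heading
definitions. The vocabulary entries and CH definitions below are invented
examples; BIOSIS' actual Semantic Vocabulary holds roughly 15,000 terms. - Ed.]

    # Toy sketch of two-stage concept recognition (invented vocabulary and
    # Concept Heading definitions).

    SEMANTIC_VOCABULARY = {          # natural-language term -> concept primitives
        "liver": {"LIVER"},
        "rats": {"RAT"},
        "enzyme": {"ENZYME"},
        "activity": {"ACTIVITY"},
    }

    CONCEPT_HEADINGS = {             # CH -> Boolean definition over primitives
        "LIVER METABOLISM": lambda p: "LIVER" in p and ("ENZYME" in p or "ACTIVITY" in p),
        "RODENT STUDIES":   lambda p: "RAT" in p,
    }

    def title_to_primitives(title):
        primitives = set()
        for term in title.lower().replace(",", " ").split():
            primitives |= SEMANTIC_VOCABULARY.get(term, set())
        return primitives

    def assign_concept_headings(title):
        primitives = title_to_primitives(title)
        return [ch for ch, defn in CONCEPT_HEADINGS.items() if defn(primitives)]

    print(assign_concept_headings("Enzyme activity in the liver of rats"))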
PROBABILISTIC RETRIEVAL AND COORDINATION LEVEL MATCHING
Robert Losee, School of Library Science, University of North Carolina, Chapel
Hill, NC 27514
Probabilistic models of document-retrieval systems incorporating sequential
learning through relevance feedback may require frequent and time-consuming
reevaluations of documents. Coordination level matching is shown to provide
equivalent document rankings to binary models when term discrimination values
are equal for all terms; this condition may be found, for example, in
probabilistic systems with no feedback. A nearest-neighbor algorithm is
presented which allows probabilistic sequential models consistent with two-
Poisson or binary-independence assumptions to easily locate the ``best''
document using temporary sets of documents at a given coordination level.
Conditions under which reranking is unnecessary are given.
(JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, Vol. 38, No. 4, pp.
239-244, 1987)
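[Ed. note: Coordination level matching simply ranks documents by the number of
query terms they contain; as the abstract notes, this coincides with the
binary probabilistic ranking when all term weights are equal. A minimal sketch
with invented data follows. - Ed.]

    # Minimal sketch of coordination level matching: rank documents by the
    # number of query terms they share with the query (invented example data).

    def coordination_level(doc_terms, query_terms):
        return len(set(doc_terms) & set(query_terms))

    docs = {"d1": ["retrieval", "feedback", "model"],
            "d2": ["retrieval", "poisson"],
            "d3": ["clustering"]}
    query = ["retrieval", "feedback", "probabilistic"]

    ranking = sorted(docs, key=lambda d: coordination_level(docs[d], query), reverse=True)
    print(ranking)    # ['d1', 'd2', 'd3'] -- d1 matches 2 terms, d2 one, d3 none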
OPTIMAL DETERMINATION OF USER-ORIENTED CLUSTERS
Vijay V. Raghavan, The Center for Advanced Computer Studies, University of
Southwestern Louisiana, Lafayette, LA 70504-4330 and Jitender S. Deogun,
Department of Computer Science, University of Nebraska, Lincoln, NE 68588-0115
User-oriented clustering schemes enable the classification of documents
based upon the user perception of the similarity between documents, rather
than on some similarity function presumed by the designer to represent the
user criteria. In this paper, an enhancement of such a clustering scheme is
presented. This is accomplished by the formulation of the user-oriented
clustering as a function-optimization problem. The problem formulated is
termed the Boundary Selection Problem (BSP). Heuristic approaches to solve
the BSP are proposed and some preliminary results that motivate the need for
further evaluation of these approaches are provided.
(PROCEEDINGS OF THE TENTH ANNUAL INTERNATIONAL ACM-SIGIR CONFERENCE ON
RESEARCH & DEVELOPMENT IN INFORMATION RETRIEVAL, New Orleans, LA, USA, pp.
140-146, 1987)
PROBABILISTIC SEARCH TERM WEIGHTING--SOME NEGATIVE RESULTS
Norbert Fuhr and Peter Muller, TH Darmstadt, Fachbereich Informatik, 6100
Darmstadt, West Germany
The effect of probabilistic search term weighting on the improvement of
retrieval quality has been demonstrated in various experiments described in
the literature. In this paper, we investigate the feasibility of this method
for Boolean retrieval with terms from a prescribed indexing vocabulary. This
is a quite different test setting in comparison to other experiments where
linear retrieval with free text terms was used. The experimental results show
that in our case no improvement over a simple coordination match function can
be achieved. On the other hand, models based on probabilistic indexing
outperform the ranking procedures using search term weights.
(PROCEEDINGS OF THE TENTH ANNUAL INTERNATIONAL ACM-SIGIR CONFERENCE ON
RESEARCH & DEVELOPMENT IN INFORMATION RETRIEVAL, New Orleans, LA, USA, pp.
13-18, 1987)
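[Ed. note: To show what "probabilistic search term weighting" refers to, a
common form of relevance weight (the Robertson/Sparck Jones weight with 0.5
smoothing, derived from the binary independence model) is sketched below with
invented counts. This is only meant as background; it does not reproduce the
authors' experimental setting. - Ed.]

    # Sketch of a standard probabilistic relevance weight (Robertson/Sparck Jones
    # form with 0.5 smoothing); the counts below are invented.

    import math

    def relevance_weight(r, R, n, N):
        """r: relevant docs containing the term, R: relevant docs,
           n: docs containing the term, N: collection size."""
        return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                        ((R - r + 0.5) * (n - r + 0.5)))

    def score(doc_terms, query_weights):
        return sum(w for t, w in query_weights.items() if t in doc_terms)

    weights = {"indexing": relevance_weight(r=8, R=10, n=100, N=10000),
               "quality":  relevance_weight(r=2, R=10, n=500, N=10000)}
    print(score({"indexing", "quality"}, weights))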
NON-HIERARCHIC DOCUMENT CLUSTERING USING THE ICL DISTRIBUTED ARRAY PROCESSOR
Edie M. Rasmussen and Peter Willett, Department of Information Studies,
University of Sheffield, Western Bank, Sheffield S10 2TN, U.K.
This paper considers the suitability and efficiency of a highly parallel
computer, the ICL Distributed Array Processor (DAP), for document clustering.
Algorithms are described for the implementation of the single-pass and
reallocation clustering methods on the DAP and on a conventional mainframe
computer. These methods are used to classify the Cranfield, Vaswani and UKCIS
document test collections. The results suggest that the parallel architecture
of the DAP is not well suited to the variable-length records which
characterize bibliographic data.
(PROCEEDINGS OF THE TENTH ANNUAL INTERNATIONAL ACM-SIGIR CONFERENCE ON
RESEARCH & DEVELOPMENT IN INFORMATION RETRIEVAL, New Orleans, LA, USA, pp.
132-139, 1987)
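[Ed. note: Of the two clustering methods mentioned, the single-pass method is
easy to state; a sequential sketch follows. The similarity measure and
threshold are arbitrary, and the sketch does not address the paper's real
concern, namely how such methods map onto the DAP's parallel hardware. - Ed.]

    # Sequential sketch of single-pass clustering (arbitrary similarity measure
    # and threshold; DAP implementation issues are not addressed here).

    def dice(a, b):
        a, b = set(a), set(b)
        return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0

    def single_pass(docs, threshold=0.3):
        clusters = []                       # each cluster: {"rep": set, "members": []}
        for doc_id, terms in docs:
            best, best_sim = None, 0.0
            for c in clusters:
                sim = dice(terms, c["rep"])
                if sim > best_sim:
                    best, best_sim = c, sim
            if best is not None and best_sim >= threshold:
                best["members"].append(doc_id)
                best["rep"] |= set(terms)   # crude cluster-representative update
            else:
                clusters.append({"rep": set(terms), "members": [doc_id]})
        return [c["members"] for c in clusters]

    docs = [("d1", ["parallel", "processor"]), ("d2", ["parallel", "array"]),
            ("d3", ["library", "catalogue"])]
    print(single_pass(docs))                # [['d1', 'd2'], ['d3']]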
QUALITY OF INDEXING IN ONLINE DATA BASES
Howard D. White and Belver C. Griffith, College of Information Studies, Drexel
University, Philadelphia, PA 19104
We describe practical tests by which the quality of subject indexing in
online bibliographic data bases can be compared and judged. The tests are
illustrated with 18 clusters of documents from the medical behavioral science
literature and with terms drawn from MEDLINE, PsycINFO, BIOSIS, and Excerpta
Medica. Each test involves obtaining a cluster of about five documents known
on some grounds to be related in subject matter, and retrieving their
descriptors from at least two data bases. We then tabulate the average number
of descriptors applied to the documents, the number of descriptors applied to
all and to a majority of the documents in the cluster, and the relative rarity
of the applied descriptors. Comparable statistics emerge on how each data
base links related documents and discriminates broadly and finely among
documents. We also gain qualitative insights into the expressiveness and
pertinence of the available indexing terms.
(INFORMATION PROCESSING & MANAGEMENT, Vol. 23, No. 3, pp. 211-224, 1987)
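[Ed. note: The tabulations described above are straightforward to compute once
a cluster's descriptors have been retrieved from a data base; a small sketch
with invented descriptor lists follows. - Ed.]

    # Sketch of the per-cluster tabulations described above, using invented
    # descriptor lists for a five-document cluster from one (hypothetical) data base.
    # The "relative rarity" statistic would additionally need each descriptor's
    # posting count in the full data base, which is not modelled here.

    from collections import Counter

    cluster = {   # document -> descriptors assigned by the data base
        "doc1": {"anxiety", "heart rate", "stress"},
        "doc2": {"anxiety", "stress", "adrenaline"},
        "doc3": {"anxiety", "stress"},
        "doc4": {"anxiety", "coping behavior"},
        "doc5": {"anxiety", "stress", "heart rate"},
    }

    counts = Counter(d for descriptors in cluster.values() for d in descriptors)
    n_docs = len(cluster)

    average_per_doc = sum(len(v) for v in cluster.values()) / n_docs
    applied_to_all = [d for d, c in counts.items() if c == n_docs]
    applied_to_majority = [d for d, c in counts.items() if c > n_docs / 2]

    print("average descriptors per document:", average_per_doc)
    print("applied to all documents:", applied_to_all)
    print("applied to a majority:", sorted(applied_to_majority))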
------------------------------
END OF IRList Digest
********************