IRList Digest Volume 4 Number 47

IRList Digest           Sunday, 28 August 1988      Volume 4 : Issue 47 

Today's Topics:
Discussion - Stemming (recall V4 #42)
Announcement - SGML Standard for Machine Readable Dictionaries Workshop
Abstracts - Dissertations selected by S. Humphrey [Part 4 of 5]

News addresses are
Internet: fox@fox.cs.vt.edu or fox%fox.cs.vt.edu@dcssvx.cc.vt.edu
BITNET: foxea@vtcc1.bitnet (replaces foxea@vtvax3)

----------------------------------------------------------------------

Date: 08. August 1988, 18:02:02 (CET)
From: XID2FUHR@DDATHD21.BITNET (Norbert Fuhr)
Subject: ... Comment on Stemming

Dear Ed,
...

Reply to Donna Harman's comments on stemming in IRLIST #42:
I do not agree with Donna Harman's comments on the absence of any quality
differences between different stemming algorithms. We have been working
with two kinds of stemming in our work and found them very useful:
- The first algorithm reduces nouns to their singular form and verbs
to their infinitive form. We call the result the standard form.
- The second algorithm is similar to the one used in the SMART system
and reduces all words to their stem, e.g. computation, computing and
computer to 'comput'.
Unfortunately the algorithms have been published only in German:
R. Kuhlen: Experimentelle Morphologie in der Informationswissenschaft.
Verlag Dokumentation, Muenchen, 1977.
The point is that you have to assign different weights to the terms in
the documents according to the stemming algorithm employed. When you
have the term 'computers' in your query and you find 'computer' in a
document, that is, both terms agree in the standard form, then you should
assign a higher weight than in the case where you find 'computation' in a
document, so that the terms agree only in their word stems.
Now the only problem is to assign proper weights in the different cases.
We have described our approach in the paper presented at SIGIR 88:
"The Automatic Indexing System AIR/PHYS - from Research to Application"
(Biebricher et al.). A more theoretical description and possible applications
are outlined in the forthcoming paper "Models for Retrieval with Probabilistic
Indexing" (by N. Fuhr) which will appear in Information Processing and
Management.
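
[A minimal sketch of the weighting idea described above, offered as an
illustration only. The lookup tables stand in for the two reduction
algorithms and the numeric weights are hypothetical; neither comes from
Kuhlen's book or from the AIR/PHYS papers.]

```python
# Toy lookup tables standing in for the two reduction algorithms.
# They are illustrative assumptions, not the published algorithms.
STANDARD_FORM = {
    "computers": "computer",
    "computer": "computer",
    "computing": "compute",
    "computation": "computation",
}
STEM = {
    "computers": "comput",
    "computer": "comput",
    "computing": "comput",
    "computation": "comput",
}

# Hypothetical weights: a closer level of agreement gets a higher weight.
WEIGHTS = {"identical": 1.0, "standard_form": 0.8, "stem": 0.4, "none": 0.0}


def match_level(query_term, doc_term):
    """Return the closest level at which the two terms agree."""
    q, d = query_term.lower(), doc_term.lower()
    if q == d:
        return "identical"
    if STANDARD_FORM.get(q, q) == STANDARD_FORM.get(d, d):
        return "standard_form"
    if STEM.get(q, q) == STEM.get(d, d):
        return "stem"
    return "none"


def match_weight(query_term, doc_term):
    """Map the level of agreement to its (hypothetical) weight."""
    return WEIGHTS[match_level(query_term, doc_term)]
```

[With these tables, 'computers' against 'computer' agrees in the standard
form, while 'computers' against 'computation' agrees only in the stem and
so receives the lower weight.]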
Kind regards,
Norbert

------------------------------

Date: Fri, 12 Aug 88 19:10:42 EDT
From: Robert A Amsler <amsler@FLASH.BELLCORE.COM>
Subject: Workshop Announcement

DICTIONARY ENCODING INITIATIVE

A ONE-DAY WORKSHOP ON THE DEVELOPMENT OF AN
SGML STANDARD FOR MACHINE-READABLE DICTIONARIES

Hosted by Robert A. Amsler and Frank Wm. Tompa

Wednesday, October 26, 1988, 10 AM - 5 PM
(the day before the 1988 Waterloo Conference: Information in Text)
Davis Building, University of Waterloo, Ontario, Canada


The development of a text standard for the interchange of machine-
readable lexical entries is seen as an essential step toward making
such information useful to future generations of computational
scientists and scholars. Whereas several ad hoc schemes for encoding
dictionary entries exist, and even larger numbers of idiosyncratic
typesetting formats exist, there is an increasing number of variants
of such formats being propagated through the research community.
Without the introduction of some standard formats for the interchange
of such information, both the publishing and research communities
will suffer.

A preliminary draft of such an interchange standard for encoding
machine-readable English monolingual dictionary entries has been
developed in Standard Generalized Markup Language (SGML). This
workshop will present the contents and rationale for this standard
and offer attendees the opportunity to join the Dictionary Encoding
Initiative to refine and complete the standard. We are both inviting
your commentary and soliciting your help in attempting to make the
resultant standard serve the needs of all researchers.

If you are able to attend the workshop, please reply via email or
postal mail to:

Robert A. Amsler
Dictionary Encoding Initiative Workshop
Bellcore, MRE 2D-398
445 South Street
P.O. Box 1910
Morristown, NJ 07960-1910, USA

email:
amsler@flash.bellcore.com
uunet.uu.net!bellcore!amsler

------------------------------

Date: Wed, 3 Aug 88 13:36:58 EDT
From: "Susanne M. HUMPHREY" <humphrey@MCS.NLM.NIH.GOV>
Subject: dissertation abstracts [Note: Part 4 of 5 - Ed.]

.[
AN University Microfilms Order Number ADG88-04609.
AU FAGAN, JOEL L.
IN Cornell University Ph.D 1988, 278 pages.
TI EXPERIMENTS IN AUTOMATIC PHRASE INDEXING FOR DOCUMENT RETRIEVAL: A
COMPARISON OF SYNTACTIC AND NONSYNTACTIC METHODS.
DE Information Science.
AB In order for an automatic information retrieval system to
effectively retrieve documents related to a given subject area,
the content of each document in the system's database must be
represented accurately. This study examines the hypothesis that
better representations of document content can be constructed if
the content analysis method takes into consideration the syntactic
structure of document and query texts. Two methods of
automatically generating phrases for use as content indicators
have been implemented and tested experimentally. The non-syntactic
(or statistical) method is based on simple text characteristics
such as word frequency and the proximity of words in text. The
syntactic method uses augmented phrase structure rules (production
rules) to selectively extract phrases from parse trees generated
by an automatic syntactic analyzer.

Experimental results show that the effect of non-syntactic phrase
indexing is inconsistent. For the five collections tested,
increases in average precision ranged from 22.7% to 2.2% over
simple, single term indexing. The syntactic phrase indexing method
was tested on two collections. Precision figures averaged over all
test queries indicate that non-syntactic phrase indexing performs
significantly better than syntactic phrase indexing for one
collection, but that the difference is insignificant for the other
collection. More detailed analysis of individual queries, however,
indicates that the performance of both methods is highly variable,
and that there is evidence that syntax-based indexing has certain
benefits not available with the non-syntactic approach.

Possible improvements of both methods of phrase indexing are
considered. It is concluded that the prospects for improving the
syntax-based approach to document indexing are better than for the
non-syntactic approach.

The PLNLP system was used for syntactic analysis of document and
query texts, and for implementing the syntax-based phrase
construction rules. The SMART information retrieval system was
used for retrieval experimentation.

This thesis is available as a technical report from the Department
of Computer Science, Cornell University.
.]
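
[A rough sketch of the kind of non-syntactic phrase generation the
abstract above mentions, using only word frequency and proximity. The
stopword list, the window size, the frequency threshold, and the function
name are all assumptions made for illustration; this is not the procedure
used in the dissertation or in SMART.]

```python
from collections import Counter

# A tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def candidate_phrases(text, window=2, min_freq=2):
    """Pair up frequent words that occur close to each other.

    Proximity is measured in tokens after stopwords are dropped.
    Each phrase is returned as an alphabetically ordered word pair.
    """
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    freq = Counter(tokens)
    phrases = set()
    for i, first in enumerate(tokens):
        # Look at the next `window` tokens following the current one.
        for second in tokens[i + 1 : i + 1 + window]:
            if first == second:
                continue
            if freq[first] >= min_freq and freq[second] >= min_freq:
                phrases.add(tuple(sorted((first, second))))
    return phrases
```

[Tokenization here is a plain whitespace split, so punctuation would need
to be stripped beforehand in any realistic use.]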
.[
AN University Microfilms Order Number ADG88-02784.
AU JACOBS, SHEILA MAUREEN.
IN Arizona State University Ph.D 1987, 175 pages.
TI HYPOTHESIS-CONFIRMING INFORMATION SEARCH STRATEGIES AND COMPUTERIZED
INFORMATION RETRIEVAL SYSTEMS.
DE Information Science.
AB A recent trend in information retrieval systems technology is the
development of on-line information retrieval systems. One
objective of these systems has been to attempt to enhance decision
effectiveness by allowing users to preferentially seek
information, thereby facilitating the reduction or elimination of
information overload. These systems do not necessarily lead to
more effective decision making, however. Recent research in
information search strategy suggests that when users are seeking
information subsequent to forming initial beliefs, they may
preferentially seek information to confirm these beliefs.
Therefore, decision making effectiveness may be dependent on the
accuracy of the decision maker's initial hypothesis of causality.

It seems that effective computer-based decision support requires
an information retrieval system capable of: (a) retrieving a
subset of all available information, in order to reduce
information overload, and (b) supporting an information search
strategy that considers all relevant information, rather than
merely hypothesis-confirming information. An information retrieval
system with an expert component (i.e., a knowledge-based DSS)
should be able to provide these capabilities.

The basic research question is: Will the use of a KBDSS, designed
to search for and present both confirming and disconfirming
evidence, result in enhanced decision effectiveness? Enhanced
decision effectiveness is defined, in this study, as a significant
change to the initial attribution of causality for a described
problem.

To assess the effect of information retrieval system type on
decision effectiveness, a laboratory experiment was conducted.
Participants were presented with brief work histories describing a
job performance problem and suggesting a cause for the problem.
They were required to make an initial attribution of causality for
the problem, to query either a conventional on-line information
retrieval system or a KBDSS for additional information, and then
to make a final attribution of causality.

The results of this study are not conclusive; there was neither
strong confirmatory evidence nor strong disconfirmatory evidence
regarding the effectiveness of the KBDSS. Further research on this
type of decision aid is needed before definite recommendations can
be made regarding the design of computer-based decision aids that
support preferred information search strategies.
.]
.[
AN University Microfilms Order Number ADG87-27638.
AU NARA, HIROSHI.
IN University of Kansas Ph.D 1987, 201 pages.
TI MODULAR DENOTATIONAL SEMANTICS IN A ROBUST NATURAL LANGUAGE
FRONT-END TO A RELATIONAL DATABASE.
DE Language, Linguistics.
AB This dissertation describes the details of a robust and
transportable natural language interface to a relational database.
Called the English Database Access and Management System (EDAMS),
it differs from many other Natural Language Interfaces (NLIs) in
that the parser and the semantic component work in tandem so that,
as soon as a denoting expression is parsed, the corresponding
semantics is given to it. These two components communicate with
each other very closely, until the parse for the entire input
string is successfully interpreted.

The emphasis of the dissertation is the design and implementation
of the semantic component. The semantics of a basic expression is
given by first reducing it to a procedure in SQL/DML Emulator,
which is executed to compute the referent of the expression. The
COMPOSE module assembles the referents of basic expressions and
builds the denotation of progressively larger derived expressions,
ultimately giving the semantics to the entire input.

In the implementation of the semantic component, special attention
is paid to the semantic analysis of measure adjectives, noun
compounds, and quantifiers. In the analysis of these adjectives,
their meanings are procedurally defined, and semantically complex
adjectives are decomposed into more elementary attributes found in
the database. Noun compounds are given interpretation by way of
'semantic connectedness.'

The system works well with a multi-file relational database,
responds satisfactorily to syntactically deviant and telegraphic
queries for improved robustness, and has the ability to detect
denotationally empty expressions early in the parsing process and
to use this information to reject unfruitful parses. The
dissertation concludes with an evaluation of EDAMS, possible ways
to enhance reference and composition algorithms, and possible
extensions to the present system.

EDAMS offers many amenities: an interactive module to register,
view, and manipulate compounds, alternate spellings, synonyms, and
abbreviations, facilities for both interactive and batch
processing of queries, a spelling checker, an ATN compiler,
interactive access to domain dependent information, a system
access manager for controlled access to EDAMS, a dictionary access
manager, facilities for historical databases, and facilities to
permit hierarchical data to reside in the relational database.
.]

From rootcsh Wed Aug 3 16:56 EDT 1988
Received: by mcs.nlm.nih.gov (5.59/1.14)
id AA10005; Wed, 3 Aug 88 15:54:17 EDT
Date: Wed, 3 Aug 88 15:54:17 EDT
From: humphrey@mcs.nlm.nih.gov (Susanne M. HUMPHREY)
Message-Id: <8808031954.AA10005@mcs.nlm.nih.gov>
To: fox@fox.cs.vt.edu, humphrey@mcs.nlm.nih.gov
Subject: Re: dissertation abstracts
Status: RO

Ed, I noticed a typo. The line:

IN University of California, Los Angeles Ph.Do 1987, 219 pages.

should be:

IN University of California, Los Angeles Ph.D 1987, 219 pages.

--Susanne



From rootcsh Fri Aug 5 16:19 EDT 1988
Return-Path: <humphrey@MCS.NLM.NIH.GOV>
Received: from mcs.nlm.nih.gov by RELAY.CS.NET id aa03581; 5 Aug 88 14:08 EDT
Received: by mcs.nlm.nih.gov (5.59/1.14)
id AA09050; Fri, 5 Aug 88 14:03:38 EDT
Date: Fri, 5 Aug 88 14:03:38 EDT
From: "Susanne M. HUMPHREY" <humphrey@MCS.NLM.NIH.GOV>
Message-Id: <8808051803.AA09050@mcs.nlm.nih.gov>
To: fox%vtopus.cs.vt.edu@RELAY.CS.NET, humphrey@MCS.NLM.NIH.GOV
Subject: a few more
Status: R

Ed, another bunch. This will probably be it for a while. --Susanne

.[
AN University Microfilms Order Number ADGD--80478.
AU YOON, CHOON SUP.
IN University of Edinburgh (United Kingdom) Ph.D 1987, 325 pages.
TI A HOUSING INFORMATION SERVICE: A SYSTEMATIC APPROACH TOWARDS THE
EFFECTIVE USE OF STRUCTURED BUILDING APPRAISALS IN THE DESIGN OF NEW
HOUSING.
DE Architecture.
AB Available from UMI in association with The British Library.
Requires signed TDF.

This study is concerned with the search for workable improvements
in the design of housing schemes by means of feedback obtained
through the appraisal and measurement of performance of existing
housing schemes.

Feedback information is seldom fully utilised by designers. This
is due, on the one hand, to the scattered and disorganised nature
of feedback information sources and, on the other, to the general
lack of exchange of experience and information between designers.
Valuable experience gained from past projects is thereby often
wasted, resulting in the tendency to repeat mistakes and to
overlook the existence of proven solutions.

There is, then, a serious need for access to sources of relevant
information, enabling us to find simply and precisely what we want
without continual reference to colleagues or written sources. This
can only be achieved where there is a provision for the
structuring of feedback information, ensuring its easy retrieval
and in a form that can be readily used.

To this end, this thesis proposes a computerised housing
information service which will process feedback information
derived from the analysis and appraisal of existing housing
schemes. Furthermore, this thesis explores whether the
establishment of such a housing information service on a national
scale would be both a desirable and viable proposition.

Discussion of the conceptual and technical specifications for the
proposed service is followed by the description of a small pilot
demonstration system, developed to appraise potential user
acceptance. The results of a series of system demonstrations are
analysed.
.]
.[
AN University Microfilms Order Number ADG88-02472.
AU BRICKER, ROBERT JAMES.
IN Case Western Reserve University Ph.D 1987, 392 pages.
TI AN EMPIRICAL INVESTIGATION OF THE INTELLECTUAL STRUCTURE OF THE
ACCOUNTING DISCIPLINE: A CITATIONAL ANALYSIS OF SELECTED SCHOLARLY
JOURNALS, 1983-1986.
DE Business Administration, Accounting.
AB This study empirically investigated the intellectual structure and
knowledge accumulation of the scholarly accounting discipline. A
model of competition in the research environment, entitled the
Research Markets Model, was synthesized from existing literature
and used as the basis for the hypothesis formation.

It was hypothesized that the accounting discipline could be
represented by a model portraying an arrangement of many research
areas which recursively nest together to form larger research
areas at more general levels of association. This model formed an
intellectual structure and consisted of two components--a
representational structure which is a syntactic expression of the
intellectual structure, and intellectual content which is a
semantic expression of the intellectual structure.

A representational structure was inferred through the application
of cocitation clustering to a sample of published accounting
literature. The analysis was based on a data sample consisting of
nearly 11,000 citations drawn from the main journal articles of
six mainstream scholarly accounting journals between 1983 and
early 1986. The resulting structure was validated using Multiple
Discriminant Analysis.

The intellectual content of this representational structure was
established through content analysis and bibliometric methods. The
representational structure and intellectual content results
supported the intellectual structure hypothesis.

The integration of the accounting discipline was tested by
examining accounting interdisciplinary citation patterns and the
structure of the inferred representational structure. The results
showed both a lack of structural integration and a
disproportionately large reliance upon interdisciplinary models
and theories. This suggests that accounting lacks the level of
integration shown by other disciplines.

The hypothesis that accounting scholars employ a scientific
approach to knowledge accumulation was tested by examining
accounting citation age patterns. The results suggested that
accounting does not accumulate knowledge as scientifically as
other social sciences. A systematic bias precluded a firm
conclusion.

This research is the first attempt to provide an empirical and
replicable approach to determining a structure of the accounting
discipline. Extensions and innovations to existing methods of
analysis were developed during the course of this research. The
results demonstrate the existence of numerous individual research
areas and their interrelationships, which may help students and
scholars understand the accounting discipline.
.]
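
[As an illustration of the raw data behind cocitation clustering: two
works are co-cited when they appear together in the reference list of
the same citing article. The sketch below only counts such pairs; the
function name and input layout are assumptions, and the clustering and
validation steps described in the abstract above are not shown.]

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of cited works appears together.

    `reference_lists` holds one list of cited-work identifiers per
    citing article. Pairs are returned in alphabetical order.
    """
    counts = Counter()
    for refs in reference_lists:
        # De-duplicate within one article, then enumerate unordered pairs.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts
```

[The resulting pair counts could serve as a similarity measure for a
clustering step.]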
.[
AN University Microfilms Order Number ADG88-03957.
AU CHANG, PHILIP YEN-TANG.
IN The University of Utah Ph.D 1987, 163 pages.
TI OPTIMIZATION TECHNIQUES FOR RELATIONAL DATABASE SYSTEMS.
DE Computer Science.
AB Efficient implementation of relational database systems has been a
difficult problem noted by many researchers and system
implementers. In a relational database system, the efficiency
related factors are deliberately hidden from the user. With
complete freedom for specifying queries, the users can easily
formulate queries that are extremely expensive if implemented
directly. It is therefore necessary for a relational database
system to include a "query optimizer" in order to improve the
efficiency of query execution.

This dissertation uses an "automatic programming" approach to
develop a "framework" for relational database query optimization.
A set of specific techniques is also developed to illustrate how
this framework can be applied to different database environments.

Three kinds of optimization form the basis of the framework: query
transformation, binding and run-time processing. Query
transformation is to transform user queries to equivalent queries
that are more efficient to implement. Binding techniques are used
to select the best algorithm for each relational operator.
Run-time processing techniques include pipelining for parallel
execution and information feedback for re-evaluation of earlier
implementation decisions. It is shown that by applying these three
techniques in different degrees, one can design optimizers to fit
different system requirements. It is also shown that the framework
is general enough as a basis for the comparison of many optimizers
developed by others.
.]
[Note: continued in next issue - Ed]

------------------------------

END OF IRList Digest
********************
