IRList Digest Saturday, 27 August 1988 Volume 4 : Issue 45
Today's Topics:
Query - NY Times Collection
Abstracts - Dissertations selected by S. Humphrey [Part 2 of 5]
News addresses are
Internet: fox@fox.cs.vt.edu or fox%fox.cs.vt.edu@dcssvx.cc.vt.edu
BITNET: foxea@vtcc1.bitnet (replaces foxea@vtvax3)
----------------------------------------------------------------------
Date: Tue, 2 Aug 88 16:52 EDT
From: krovetz@UMass
Subject: NY Times test collection
Ed,
Bruce Croft told me you might know about a test collection of
articles from the NY Times. He said it was pretty small. Could
it be sent in a mail message?
Thanks,
Bob
[Note: There is an old collection used in some of the SMART
studies that might be available from Cornell - this is the
only one other than the big collections (probably without
queries) of NY Times documents made at several locations.
Let me know if you turn up something and we can put it on
Virginia Disc One (which is now almost ready!) - Ed]
------------------------------
Date: Wed, 3 Aug 88 13:36:58 EDT
From: "Susanne M. HUMPHREY" <humphrey@MCS.NLM.NIH.GOV>
Subject: dissertation abstracts [Note: Part 2 of 5 - Ed.]
.[
AN University Microfilms Order Number ADG87-29523.
AU YU, JUNG-PIN.
IN The University of Iowa Ph.D 1987, 203 pages.
TI LIBRARY SEARCHES OF INFRARED SPECTRA AND MULTICOMPONENT ANALYSIS.
DE Chemistry, Analytical.
AB Odd moments of the cross-correlation function and average
wavenumber methods are used for library searches. The absorbances
of a spectrum are represented by 24 values if the spectrum is
divided into twelve regions. The indirect spectral data base is
split into twelve subsets according to the spectral regions with
the most intense integrated intensities. Only a few subsets are
used in a search. Several versions of integrated intensity
filters are applied to pre-select the library entries. The best
method of this work presently has about an 84% chance of locating
an experimental spectrum among the best five matches if this
spectrum is in the library and the correlation coefficient between
them is not less than 0.9000. The average search time per spectrum
is approximately 30 seconds with a Leading Edge Model D computer
equipped with an 8087 co-processor. The search results by the
correlation coefficient method are compared, and the effects of
baseline shift, noise, and frequency shift are also investigated.
A factor analysis of n-component spectra generates n orthogonal
spectra. Proper combinations of any two of the abstract spectra
can eliminate one component and yield new spectra with n - 1
components. The orthogonal spectra obtained by a factor analysis
of the newly generated spectra have n - 1 components each.
Iterative subtraction processes finally resolved all pure
component spectra. A minimum entropy method and a minimum
pathlength method are used to evaluate the component elimination
procedures.
The relative component fractions of a sample with indeterminate
pathlength can be determined by using normalized pure component
spectra as references. The relative component fractions of a
mixture are constant for each component of all samples. Q matrix
and T matrix methods are investigated. The singular value
decomposition method is found to be useful in spectral
reconstruction.
.]
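[The correlation-coefficient matching step described in the abstract above can be sketched roughly as follows; this is an illustrative reconstruction, not code from the thesis, and the function names and the dictionary layout of the library are my own. - Ed]

```python
import math

def correlation(x, y):
    # Pearson correlation coefficient between two equal-length spectra,
    # each given as a list of absorbance values.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

def search(library, unknown, top=5, threshold=0.9):
    # Rank library spectra by correlation with the unknown spectrum and
    # keep the best `top` matches whose correlation is at least `threshold`
    # (the abstract quotes a best-five window and a 0.9000 cutoff).
    scored = sorted(((correlation(spec, unknown), name)
                     for name, spec in library.items()), reverse=True)
    return [(name, r) for r, name in scored[:top] if r >= threshold]
```

A library would be a mapping from compound name to its absorbance vector; an unknown spectrum in the library scores exactly 1.0 against itself and ranks first.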
.[
AN This item is not available from University Microfilms International
ADG05-62008.
AU CASAS-RAPOSO, IGNACIO ANTONIO.
IN University of Toronto (Canada) Ph.D 1986.
TI PROPHET: A LAYERED ANALYTICAL MODEL FOR PERFORMANCE PREDICTION OF
DATABASE SYSTEMS.
DE Computer Science.
AB The efficiency and performance of database systems are
substantially affected by many decisions which are required
throughout the development stages. Unfortunately, the impact on
performance of many design alternatives is not yet well
understood. In this thesis we propose an analytical performance
model of database systems that provides a means of quantitative
comparison of design alternatives. Four levels of abstraction are
distinguished for design choices, models of representation, and
related performance measures. Queuing network models are used at
the bottom, most concrete level. Several subproblems within the
layered framework are studied in detail including schema and
internal access plans models and a performance model for database
buffer management. We also extend and refine several existing
modeling techniques such as schema-to-internal database mappings
and models of internal database structures. All the proposed
submodels are integrated within a generalized framework for
database performance analysis, named PROPHET, with detailed models
of computation at the four layers of abstraction. The
applicability and feasibility of the framework are illustrated with
applications to several present-day commercial DBMSs, including
IMS, MISTRESS and SYSTEM 2000. We also describe the design and
implementation of a specialization of our generalized framework to
SYSTEM 2000 environments. We report preliminary results of the
validation of the SYSTEM 2000 performance predictor against data
measured in an actual operational environment.
.]
.[
AN University Microfilms Order Number ADG88-00136.
AU CHIN, YU-TUNG.
IN University of Rhode Island Ph.D 1987, 194 pages.
TI SYSTEM DESIGN AND IMPLEMENTATION OF A COMPUTERIZED RHODE ISLAND
WATER RESOURCES INFORMATION SYSTEM.
DE Computer Science.
AB Water resources data from various federal and state agencies of
Rhode Island were collected, analyzed and reorganized. The nature
of the data was studied. A conceptual model representing the
entities and relationships among the data was derived by means of
a semantic data model. The conceptual model was then mapped into a
logical file structure represented by a pseudo Data Definition
Language. A distributed configuration was chosen to implement the
database in order to achieve desired functional performance and
cost effectiveness. Finally, the logical structure was realized by
appropriate Database Management Systems in mainframe and personal
computers.
Programs were developed to support data entry as well as to
maintain integrity and security of the database. Applications of
the system in water resources decision-making and management were
demonstrated by examples.
.]
.[
AN This item is not available from University Microfilms International
ADG05-62262.
AU FALOUTSOS, CHRISTOS NICK.
IN University of Toronto (Canada) Ph.D 1986.
TI SIGNATURE FILES: AN ACCESS METHOD FOR TEXTUAL MESSAGES.
DE Computer Science.
AB Signature files seem to be a promising method for text retrieval
and message retrieval. According to this method the messages are
stored sequentially in one file ("text file") while abstractions
of the messages ("signatures") are stored sequentially in another
file ("signature file"). In order to resolve a query, the
signature file is scanned first and many non-qualifying messages
are immediately rejected. Those non-qualifying messages that pass
the signature test are called false drops and are undesirable.
Careful design of the signature extraction method and the
appropriate signature size can restrict the average number of
false drops. The screening capacity of some signature extraction
methods is examined analytically. New methods are proposed and
their performance is compared with one of the old methods.
Finally, the relationship between the false drop probability and
the information loss during the signature extraction is
investigated.
.]
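[The signature-file scheme above can be illustrated with a small superimposed-coding sketch: each word sets a few hash-chosen bits in a fixed-width signature, a query signature must be covered by a message signature for the message to qualify, and survivors are checked against the text file to eliminate false drops. The parameters and names here are illustrative, not from the thesis. - Ed]

```python
import hashlib

SIG_BITS = 64        # signature width; real designs tune this to bound false drops
BITS_PER_WORD = 3    # bits set per word (superimposed coding)

def word_mask(word):
    # Each word sets a few pseudo-random bit positions in the signature.
    mask = 0
    digest = hashlib.md5(word.lower().encode()).digest()
    for i in range(BITS_PER_WORD):
        mask |= 1 << (digest[i] % SIG_BITS)
    return mask

def signature(text):
    # OR together the masks of all words in the message.
    sig = 0
    for word in text.split():
        sig |= word_mask(word)
    return sig

def search(messages, query_words):
    # Scan the signature file first; a message whose signature covers the
    # query signature may still be a false drop, so verify against the text.
    qsig = 0
    for w in query_words:
        qsig |= word_mask(w)
    hits = []
    for text, sig in messages:
        if sig & qsig == qsig:                       # signature test
            words = {w.lower() for w in text.split()}
            if all(w.lower() in words for w in query_words):  # full check
                hits.append(text)
    return hits
```

Here `messages` plays the role of both files: a list of (text, signature) pairs, e.g. `[(t, signature(t)) for t in texts]`.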
.[
AN This item is not available from University Microfilms International
ADG05-61631.
AU ICAZA, JOSE IGNACIO.
IN University of Waterloo (Canada) Ph.D 1987.
TI ADAPTIVE SELECTION OF QUERY PROCESSING STRATEGIES.
DE Computer Science.
AB Approximate cost models that are traditionally needed to evaluate
query execution strategies are sometimes unavailable or too
unreliable for particular environments. Our new approach for
strategy selection is based on feeding back the actual costs of
query execution under different strategies, rather than on
assumption-laden estimates of these costs. We propose three
different methods for using these feedback costs, under a common
general framework for adaptive systems.
In optimal selection we define the problem of designing an optimal
decision policy for a single query class, where the policy
specifies which strategy to execute for each query. This problem
is mapped to some versions of the bandit problem in statistics,
which can be solved using Bayesian statistics and dynamic
programming. We present and analyze a program that derives the
optimal policy.
In approximate selection we use a learning automaton to select
strategies. The learning algorithm by Thathachar and Sastry is
modified to handle the dynamic environments of databases. This
adaptive algorithm is tested using existing databases and query
loads, showing how this approach determines the best execution
strategies over time, and changes which strategies are selected as
the environment changes.
Finally, the method of query class partitioning considers the
problem of evolving optimal query partitionings, in which each
partition includes all queries that have a common optimal
strategy. We present a novel adaptive classification scheme that
uses several learning automata.
.]
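[The cost-feedback idea above maps naturally onto a multi-armed bandit; a generic epsilon-greedy selector is sketched below as one simple instance. The thesis itself uses Bayesian dynamic programming and a Thathachar-Sastry learning automaton, not this particular rule; the class and names are illustrative. - Ed]

```python
import random

class StrategySelector:
    # Epsilon-greedy bandit over query execution strategies, driven by
    # observed execution costs (lower cost is better).
    def __init__(self, strategies, epsilon=0.1):
        self.epsilon = epsilon
        self.totals = {s: 0.0 for s in strategies}
        self.counts = {s: 0 for s in strategies}

    def choose(self):
        # Try each strategy once, then mostly exploit the cheapest average,
        # exploring at rate epsilon to track a changing environment.
        untried = [s for s, c in self.counts.items() if c == 0]
        if untried:
            return random.choice(untried)
        if random.random() < self.epsilon:
            return random.choice(list(self.counts))
        return min(self.counts, key=lambda s: self.totals[s] / self.counts[s])

    def feedback(self, strategy, cost):
        # Feed back the actual cost measured for an executed query.
        self.totals[strategy] += cost
        self.counts[strategy] += 1
```

With epsilon set to zero the selector is purely greedy; a positive epsilon keeps sampling the other strategies so the choice can shift as the database environment changes.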
.[
AN University Microfilms Order Number ADG88-04134.
AU KEARNS, TIMOTHY G.
IN Air Force Institute of Technology Ph.D 1987, 351 pages.
TI A METHODOLOGY, BASED ON ANALYTICAL MODELING, FOR THE DESIGN OF
PARALLEL AND DISTRIBUTED ARCHITECTURES FOR RELATIONAL DATABASE QUERY
PROCESSORS.
DE Computer Science.
AB The design of faster relational database query processors to
improve the data retrieval capability of a database was the goal
of this research. The emphasis was on evaluating the potential of
parallel implementations to allow use of multiprocessing. First,
the theoretical considerations of applying relational operations
to distributed data were considered to provide an underlying data
distribution and parallel processing environment model. Next,
analytical models were constructed to evaluate various
implementations of the select, project, and join relational
operations and the update operations of addition, deletion, and
modification for a range of data structures and architectural
configurations. To predict the performance of the query processor
for all cases, the individual operator models needed to be
extended to models for complex queries consisting of several
relational operations. The solution to modeling multi-step queries
was the use of a general form to express a query. This normal form
query tree uses combined operations to express relational algebra
equivalent queries in a standard form. This standard tree form was
then used to construct analytical models for multi-step queries.
These models provide the capability to model the potential of
different forms of parallelism in solving complex queries. The
results of the analytical models present a logical design for a
multiprocessor query processor. This logical query processor using
multiple processors and employing parallelism illustrates the
potential for an improved query processor using parallel
processing when the analytical model results of complex queries
are compared to a benchmark of some current database systems.
.]
.[
AN University Microfilms Order Number ADG88-01732.
AU LUCCI, STEPHEN.
IN City University of New York Ph.D 1987, 241 pages.
TI STRING MATCHING: A COMPARATIVE STUDY OF ALGORITHMS, AND ITS RELATION
TO PROBLEMS OF PARALLEL AND DISTRIBUTED COMPUTING.
DE Computer Science.
AB Our purpose in this work is to provide a comparative study of
string matching algorithms and their relationship to problems in
parallel and distributed computing. We begin our discussion in
chapters one and two with the Knuth-Morris-Pratt and Boyer-Moore
algorithms. These optimal serial approaches to string matching
provide a reference point against which their parallel
counterparts in chapters three through five may be judged. Galil's
parallel algorithm in chapter three searches for prefixes of the
pattern of increasing length which occur within the text string.
Vishkin's Algorithm, described in chapter four, employs the concept
of duels between various text positions to obtain a sparse set of
indices within the text in which the pattern might begin. In
chapter five the Rabin-Karp Algorithm is outlined. This approach
uses hashing functions called fingerprints, rather than character
comparisons, as its basic operation. It is a serial algorithm
which, unlike other serial approaches, is readily parallelizable.
In chapter six we outline an architecture for a parallel processor
for Galil's algorithm. The "parallel string processor" (PSP) is an
SIMD computer with 2^16 PEs and a perfect shuffle
interconnection network. In the process of specifying our design
we consider sorting algorithms, connection networks, and various
paradigms for data broadcasting. Hopefully, this work foreshadows
the rapidly unfolding technological breakthroughs that will serve
as a bridge to fifth generation systems.
.]
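[The Rabin-Karp approach in chapter five admits a compact sketch: a rolling-hash fingerprint of each text window is compared against the pattern's fingerprint, and only hash matches are verified character by character to rule out collisions. The base and modulus below are arbitrary illustrative choices. - Ed]

```python
def rabin_karp(text, pattern, base=256, mod=1_000_003):
    # Find all starting indices of `pattern` in `text` by comparing
    # rolling-hash fingerprints and verifying on a hash match.
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)       # weight of the outgoing character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    hits = []
    for i in range(n - m + 1):
        if t_hash == p_hash and text[i:i + m] == pattern:
            hits.append(i)
        if i < n - m:
            # Slide the window: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % mod
    return hits
```

Because each window's fingerprint can also be computed independently of its neighbours, the text splits cleanly across processors, which is what makes this serial algorithm readily parallelizable.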
.[
AN This item is not available from University Microfilms International
ADG03-75884.
AU SHEU, CHEN-YU.
IN University of California, Berkeley Ph.D 1986.
TI QUERY PROCESSING AND COMMUNICATION CONTROL IN OBJECT BASES AND
DISTRIBUTED OBJECT BASES.
DE Computer Science.
.]
.[
AN This item is not available from University Microfilms International
ADG05-62253.
AU TUORI, MARTIN IAN.
IN University of Toronto (Canada) Ph.D 1987.
TI A FRAMEWORK FOR BROWSING IN THE RELATIONAL DATA MODEL.
DE Computer Science.
AB This thesis addresses problems in HCI (Human-Computer Interaction)
with information systems, especially database management systems
(DBMSs). Inexperienced or occasional users are often unable or
unwilling to cope with the linguistic, structural and intentional
demands of traditional interfaces, especially high-level query
languages. The primal form of interaction is looking, or
searching. A four part taxonomy of searching is given (intention,
structure, language and modality), and used as a descriptive basis
for a review of relevant HCI literature and interface approaches.
The most important form of searching for casual use is browsing,
defined here as searching without an explicit goal or intention.
A three-part framework is presented for the design and evaluation
of browsing interfaces--conceptual models of the interface, design
guidelines, and evaluation criteria. A closed-loop model of the
user interface is presented, along with a novel categorization of
relational database interfaces, by the way a user may approach the
information (data-model-oriented, object-oriented, and
domain-oriented) and by structural level (system, database, table
and tuple).
Two selection methods for domain-oriented, table-level browsing
are presented--multidimensional range query and point of view
using nearest neighbours. Their strengths and weaknesses are
examined, and the nearest-neighbours approach developed in detail.
Problems arising from this method are examined, solutions offered,
and an implementation supporting both these methods is presented
and discussed.
The descriptive taxonomy and development framework presented in
this thesis offer a basis for understanding computer-assisted
searching and for assessing the contributions of various interface
approaches. Previous literature on HCI is based on the premise of
well-defined intention; this premise must be rejected in favour of
a model of HCI in which the user's intention is variable.
Computerized information systems capable of supporting a wide
range of users will result from the integration of many interface
approaches; browsing offers a useful approach for initial,
exploratory search, especially by inexperienced or occasional
users.
.]
.[
AN University Microfilms Order Number ADG88-04620.
AU TURBYFILL, CAROLYN.
IN Cornell University Ph.D 1988, 286 pages.
TI COMPARATIVE BENCHMARKING OF RELATIONAL DATABASE SYSTEMS.
DE Computer Science.
AB New and diverse architectures have been developed to meet rising
expectations of both functionality and performance from relational
database systems. A comprehensive, portable, comparative benchmark
is needed to evaluate the performance tradeoffs inherent in these
different architectures. In this thesis, we develop an
experimental framework for systematically evaluating and comparing
relational database systems across diverse architectures. We
describe and motivate the design of a scalable, portable benchmark
for relational database systems, the AS3AP benchmark (ANSI
SQL Standard Scalable And Portable). The AS3AP benchmark is
designed to provide meaningful measures of database processing
power and to be a useful tool for system designers. We introduce a
new performance metric, the equivalent database ratio, to be used
in comparing systems. The equivalent database ratio is the ratio
of database sizes on two different systems for which equivalent
performance on a set of test queries is obtained.
The benchmark consists of two parts: tests of the access methods
and building blocks, and tests of the optimizer. In tests of the
access methods and building blocks, access methods and functions
common to implementations of relational database systems are
tested. The relations and queries used in these tests are designed
to increase the likelihood that the access method or program
branch targeted by the test will actually be chosen by the query
optimizer. The tests of the access methods and building blocks are
used to compute the equivalent database ratio. In tests of the
optimizer, assumptions typically made by query optimizers such as
the uniform distribution of data values or the random placement of
tuples are systematically violated. Comparisons between queries
are used to indicate whether the optimizer has made a correct
choice. Queries are included to illustrate current limitations in
the state of the art in query optimization. For a representative
cross section of access methods, estimates of the best case
response time for the benchmark queries on disk bound systems are
computed. The response time estimates and the systematic
benchmarking methodology provide for clear interpretation of
benchmark results.
.]
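[The equivalent database ratio defined above, the ratio of database sizes on two systems at which performance on the test queries is equal, can be illustrated with a toy computation. Interpolating linearly between measured (size, response time) points is my simplifying assumption here, not the thesis procedure. - Ed]

```python
def size_for_target(points, target):
    # Interpolate the database size at which measured response time reaches
    # `target`; `points` is a list of (size, response_time) sorted by size,
    # with response time assumed to grow with size.
    for (s0, t0), (s1, t1) in zip(points, points[1:]):
        if t0 <= target <= t1:
            frac = (target - t0) / (t1 - t0)
            return s0 + frac * (s1 - s0)
    raise ValueError("target response time outside measured range")

def equivalent_database_ratio(sys_a, sys_b, target):
    # Ratio of the database sizes on which the two systems deliver the
    # same response time on the test queries.
    return size_for_target(sys_a, target) / size_for_target(sys_b, target)
```

For example, if system A reaches a 20-second response time at a 2 GB database while system B only reaches it at 4 GB, the equivalent database ratio of A to B is 0.5.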
[Note: continued in next issue - Ed]
------------------------------
END OF IRList Digest
********************