IRList Digest Volume 2 Number 37

IRList Digest           Wednesday, 13 August 1986      Volume 2 : Issue 37 

Today's Topics:
Discussion - Work at Bellcore on Collins English Dictionary
Abstracts - Appearing in latest issue of ACM SIGIR Forum, Part 1

----------------------------------------------------------------------

Date: Fri, 8 Aug 86 11:35:23 edt
From: amsler@mouton.bellcore.com (Robert Amsler)
Subject: Re: IRList Digest V2 #33 on Collins Dictionary Work

Collins Dictionary Work:
Work at Bellcore is proceeding with an effort to make a comprehensive
database format for the CED comparable to that prepared by Jim Peterson
for the W7 (Merriam-Webster Seventh Collegiate). The following is
an approximation to the format we intend to convert the data into.

Headword
H1: H
H2: Headword
H3: Homograph Number
H4: Syllabification (as numeric code)
H5: Preferred Hyphenation (as numeric code)
H6: Headword Part of Speech (n, vb, adj, symbol for)
H7: Alternate Part of Speech
Alternate form of Headword (Inflectional and Variant forms)
A1: A
A2: Headword Alternate
A3: " " Part of Speech (e.g. n.)
A4: " " Inflection Type (e.g. pl.)
A5: " "'s Primary Headword
A6: " "'s Homograph Number
A7: Type of Alternate Form (e.g. USA, for U.S. Spelling)
Pronunciation
P1: P
P2: Pronunciation
P3: Type of Pronunciation (e.g. USA, for U.S.; French, etc.)
Label
L1: L
L4: Sense Number of this Label
L5: Subsense Number of this label
L3: Label
L4: Type of Label (Temporal, Usage, Connotative, Subject, National,
Regional)
Definition
D1: D
D4: Definition Sense
D5: " Subsense Letter
D6: " " Part (signified by ;'s in definition text)
D6: Part of Speech of (Sub)Sense
D7: Definition label
D8: " Text
Cross-Reference
X1: X
X2: Cross-Referenced Headword
X3: "-" " Homograph Number
X3: "-" " Sense
X4: "-" " Subsense Letter
X6: "-" Type (e.g. See, See Also, Also Called, etc.)
X7: "-" Definition Text
Related Expressions (including Run-In and Run-On Entries)
R1: R
R2: Type of Related Expression (I = Run-in; O = Run-On)
R2: Related Expression
R3: " " Part of Speech (e.g. n.)
R7: " "'s Primary Headword's SubSense Letter
R8: " "'s Primary Headword's SubSense Part Number (;'s)
Citation or Example Sentence
C1: C
C2: Citation's Local Headword
C3: Citation's Primary Headword
C4: " " "'s Homograph Number
C5: " " "'s Part of Speech
C6: " " "'s Sense Number
C7: " " "'s SubSense Letter
C8: " " "'s SubSense Part Number (;'s)
Etymology
E1: E
E2: Primary Headword
E3: " " Homograph Number
E4: Century
E5: Etymology Text
Usage Note Comments
U1: U
U2: Usage Primary Headword
U3: " " " Homograph Number
U4: Usage Note Text
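To make the layout concrete, here is a minimal Python sketch of reading records in this style. It rests on hypothetical assumptions: each record is one line, fields are separated by '|', the first field is the record-type letter (the H1/P1/E1/U1 field), and the remaining fields follow in the order listed above. The posting does not say how fields are delimited on disk, and the field names in SCHEMA are illustrative labels, not Bellcore's. Only the H, P, E and U record types are covered.

```python
from typing import Dict, List

# Hypothetical field layout for a subset of the record types listed above.
# Field 1 of every record (H1, P1, E1, U1) is the record-type letter itself,
# so only the remaining fields are named here.
SCHEMA: Dict[str, List[str]] = {
    "H": ["headword", "homograph_number", "syllabification",
          "preferred_hyphenation", "part_of_speech",
          "alternate_part_of_speech"],
    "P": ["pronunciation", "pronunciation_type"],
    "E": ["primary_headword", "homograph_number", "century",
          "etymology_text"],
    "U": ["primary_headword", "homograph_number", "usage_note_text"],
}


def parse_record(line: str, sep: str = "|") -> Dict[str, str]:
    """Split one delimited record and label its fields using SCHEMA.

    The '|' separator is an assumption; the actual on-disk delimiter
    is not given in the posting.
    """
    fields = line.rstrip("\n").split(sep)
    record_type, values = fields[0], fields[1:]
    if record_type not in SCHEMA:
        raise ValueError(f"unknown record type: {record_type!r}")
    names = SCHEMA[record_type]
    # Reject misaligned records instead of silently mislabeling fields.
    if len(values) != len(names):
        raise ValueError(f"{record_type} record has {len(values)} fields, "
                         f"expected {len(names)}")
    record = dict(zip(names, values))
    record["type"] = record_type
    return record
```

For example, parse_record("H|sample|1|12|12|n|") labels "sample" as the headword and "n" as its part of speech, and the empty trailing field becomes an empty alternate part of speech. The per-type field count check is there because a single dropped delimiter would otherwise shift every later field into the wrong slot.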

Here is an entry from the B's in this format...

[Note: the example was deleted 8/21 since it apparently prevented kermit
and UUCP from transferring this file! - Ed]

I would very much like to obtain a list of the decoded special symbols
in the CED, i.e. those represented by the sequential #800 numbers.
These appear to be unique assignments and are nothing but tedium to
extract.

[Note: We have already extracted the data into a similar form and will
be sending it to the Oxford Text Archive soon. Since almost a year of
part-time effort (cleaning up data, editing by hand, etc.) has gone
into that work, it might be wiser to wait for it. An MS project
report by R. Wohlwend documents much of the tape analysis effort. - Ed]

------------------------------

Date: Wed, 23 Jul 1986 13:06 CST
From: Vijay V. Raghavan <RAGHAVAN@UREGINA1.bitnet>
Subject: SIGIR FORUM Abstracts [Part 1 - Ed]

[Note: Members of ACM SIGIR should soon receive the spring/summer
Forum, and can find these on pages 30-31. The rest will also appear in
machine-readable form in later issues of IRList. - Ed]

ABSTRACTS

(Chosen by G. Salton or V. Raghavan from 1984 issues of journals
in the retrieval area)

1. APPLICATION OF MODERN TECHNOLOGIES TO INTERLIBRARY RESOURCE-
SHARING NETWORKS
J. Francis Reintjes
Laboratory for Information and Decision Systems,
Massachusetts Institute of Technology, Cambridge, MA 02139

Examined in this article is the hypothesis that it is now
technologically and economically feasible to move the content
of documents electronically among nodes of a library network
rather than the documents themselves or photocopies thereof.
Comparisons are made on the basis of response-to-request
time, quality of reproduced copy and cost factors. The
conclusion is reached that electronic interlibrary resource-
sharing networks are ideally suited to situations where there
are high frequency occurrences of internode requests for
information contained in serials, where nodal separation
distances do not exceed a few tens of miles and where copy is
in six-point type or larger. A three-node network is
examined in detail. Specifications for each element of the
network are given, with emphasis placed on a highly critical
element, the bound-document scanner. The results of an
economic study of interlibrary electronic networks are also
presented.
(JASIS, Vol. 35(1): 45-52; 1984)

2. CO-CITATION ANALYSIS AND THE INVISIBLE COLLEGE
Elliot Noma
CHI Research/Computer Horizons, Inc., 1050 Kings Highway
North, Cherry Hill, NJ 08034

Co-citation analysis is based on the assumption that all
citing articles view the scientific literature from a common
point-of-view. When a co-citation matrix is analyzed, this
assumption affects measures of the dimensionality and
clustering of articles. Therefore, before a co-citation
matrix is constructed, the citing articles should be limited
to those written by individuals in an invisible college.
(JASIS, Vol. 35(1): 29-33; 1984)
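A minimal Python sketch of the co-citation counts that such a matrix is built from, with the restriction to an invisible college applied as a filter on citing authors. The input layout (a citing-article id mapped to its set of authors and its list of cited items) and the function name are assumptions for illustration, not Noma's procedure.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical input: citing-article id -> (set of authors, cited items).
CitingArticles = Dict[str, Tuple[Set[str], Iterable[str]]]


def cocitation_counts(articles: CitingArticles,
                      college: Optional[Set[str]] = None) -> Counter:
    """Count how often each unordered pair of cited items is cited together.

    If `college` is given, only citing articles with at least one author
    in that set contribute to the counts.
    """
    counts: Counter = Counter()
    for authors, refs in articles.values():
        if college is not None and not (authors & college):
            continue  # citing article lies outside the chosen group
        # Deduplicate and sort so each pair gets one canonical (a, b) key.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts
```

The off-diagonal entries of the symmetric co-citation matrix are then counts[(a, b)] for a < b. Restricting the citing set changes those entries, which is why the abstract argues the restriction should come before the matrix is analyzed.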

3. LESS THAN FULL-TEXT INDEXING USING A NON-BOOLEAN SEARCHING
MODEL
Donald B. Cleveland
School of Library and Information Sciences, North Texas State
University, Denton TX 76203

Ana D. Cleveland and Olga B. Wise
Texas Woman's University, Denton TX 76204

The relative effectiveness of indexing using full-text or
less than full-text was tested using a non-Boolean, chaining
type of file structure and searching method. Indexing was
done using titles, abstracts, full-text, references, and
various combinations of these surrogates and then Goffman's
indirect method of information retrieval was used to
structure and search the file. The database consisted of 733
documents and 38 queries were searched. The hypothesis of
the study was that by using a particular non-Boolean method
as a file structuring and searching technique, full-text
indexing is not essential to optimum information retrieval
effectiveness. The results supported this hypothesis.
(JASIS, Vol. 35(1): 19-28; 1984)

4. STATISTICAL RECOGNITION OF CONTENT TERMS IN GENERAL TEXT
Martin Dillon
School of Library Science, University of North Carolina,
Chapel Hill, NC

Peggy Federhart
Library, IBM Corporation, Charlotte, NC 28257

This article discusses ways to improve the quality of
retrieval systems that depend on the use of truncated words or
quasi-word stems as an indexing vocabulary. The problems
addressed are the generalizability and stability of
discriminant function analysis for selecting good topical
terms from terms of relatively high frequency in a database
drawn from abstracts of Harris Survey press releases. Results
confirm that topical terms can be identified by their
statistical properties. Consistently high recall of topical
terms under a variety of different conditions implies
persistent underlying properties strong enough to resist
changes in test environment.
(JASIS, Vol. 35(1): 3-10; 1984)

5. INFORMATION RETRIEVAL FROM CLASSICAL DATABASES FROM A SIGNAL-
DETECTION STANDPOINT - A REVIEW
M. H. Heine
School of Librarianship & Information Studies
Newcastle upon Tyne Polytechnic, UK

The retrieval of information from classical
(object/attribute) databases is discussed in the light of
signal-detection theory. The approach is based on the
Swetsian schema, although it is expressed in a more general
form.
(Information Technology, Vol. 3, No. 2: 95-112; April 1984)
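In a signal-detection reading of retrieval, recall plays the role of the hit rate and fallout (the fraction of non-relevant items retrieved) plays the role of the false-alarm rate. The Python sketch below computes both for one retrieved set, plus a separation index d' = z(hit rate) - z(false-alarm rate) under an equal-variance normal model. This is a generic textbook formulation offered for illustration; it is not necessarily the form used in Heine's review, and the function names are hypothetical.

```python
from statistics import NormalDist
from typing import Iterable, Tuple


def recall_fallout(retrieved: Iterable[str], relevant: Iterable[str],
                   collection_size: int) -> Tuple[float, float]:
    """Recall (hit rate) and fallout (false-alarm rate) for one retrieved set."""
    ret_set, rel_set = set(retrieved), set(relevant)
    n_nonrelevant = collection_size - len(rel_set)
    if not rel_set or n_nonrelevant <= 0:
        raise ValueError("need both relevant and non-relevant items")
    hits = len(ret_set & rel_set)          # relevant items retrieved
    false_alarms = len(ret_set - rel_set)  # non-relevant items retrieved
    return hits / len(rel_set), false_alarms / n_nonrelevant


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Distance between the relevant and non-relevant score distributions,
    in standard-deviation units, assuming two equal-variance normals."""
    if not (0.0 < hit_rate < 1.0 and 0.0 < false_alarm_rate < 1.0):
        raise ValueError("rates must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)
```

For instance, retrieving d1, d2 and d3 from a 20-item collection whose relevant items are d1, d2, d4 and d5 gives a recall of 0.5 and a fallout of 1/16. A system whose hit and false-alarm rates are equal has d' = 0, that is, no discrimination at all.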

6. MAXIMUM ENTROPY AND THE OPTIMAL DESIGN OF AUTOMATED
INFORMATION RETRIEVAL SYSTEMS
Paul B. Kantor
Tantalus Inc.
Suite 218, 2140 Lee Road
Cleveland, Ohio 44118

The application of the maximum entropy principle is extended
to problems of information storage and retrieval. The
extension includes continuous or 'fuzzy' relevance
valuations, fuzzy descriptors, and prior or feedback
constraints. A decomposition property of the entropy
function is used to express the total entropy in terms of the
entropy of nonoverlapping components. Each component is
described by a richness parameter which is determined by a
set of coupled constraint equations given in closed form. A
method is outlined for solving those equations in real time,
and possible grounds for applying the maximum entropy
principle are explored. The relation to term weighting, and
the possibility of constructing rigorous relations between
information and effort, are also discussed.
(Information Technology, Vol. 3, No. 2: 88-94; April 1984)

7. INFORMATION SCIENCE RESEARCH: THE SEARCH FOR THE NATURE OF
INFORMATION
Manfred Kochen
Schools of Medicine and Business Administration, The
University of Michigan, Ann Arbor, MI 48109

High level scientific research in the information sciences is
illustrated by a sample of recent discoveries involving the
design of information-processing algorithms, bibliometric
scaling, and flows of information in biological systems and
in countries. It is pointed out that when the concept of
information first assumed an independent identity, the only
known information processing systems were biological; now,
after four decades of vigorous development of electronic
information systems, the search for the essential nature of
information is focussing again on biological systems and on
sociotechnological systems as well.
(JASIS, Vol. 35(3): 194-199; 1984)

8. BRIEF HISTORY OF INFORMATION SCIENCE
Saul Herner
President, Herner and Company, 1700 North Moore Street,
Arlington, VA 22209

Information science is the product of convergences of library
science, computer and punched card science, R & D
documentation, abstracting and indexing, communications
science, behavioral science, micro- and macro-publishing,
video and optical science, and various other fields and
disciplines. The role and contribution of each participating
segment is reflected in certain basic and seminal writings,
in the work of "major actors" in the field, and in major
events or developments. These contributing sources are
reviewed, analyzed, and related, as a means of tracing the
history of the field from its pre- and post-World War II
beginnings through the early 1980's and into the near-term future.
(JASIS, Vol. 35(3): 157-163; 1984)

9. A NOTE ON THE USE OF NEAREST NEIGHBORS FOR IMPLEMENTING
SINGLE LINKAGE DOCUMENT CLASSIFICATIONS
Peter Willett
Department of Information Studies, University of Sheffield,
Western Bank, Sheffield S10 2TN, United Kingdom

Best match search algorithms provide an efficient means of
identifying the sets of nearest neighbors for each of the
documents in a collection. These sets contain much of the
important similarity data contained in a full interdocument
similarity matrix and may be used for the generation of
hierarchic document classifications, such as those arising
from the use of the single linkage clustering method.
Cluster based retrieval experiments based upon such
classifications are shown to give results that are comparable
in effectiveness with those obtained using the full
similarity matrix.
(JASIS, Vol. 35(3): 149-152; 1984)
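A small Python sketch of the idea: find each document's k nearest neighbors by cosine similarity, then merge any two documents joined by a neighbor link at or above a similarity threshold. This yields single-linkage clusters at that threshold over the retained links. The neighbor search here is brute force rather than a best-match inverted-file search, documents are assumed to be dicts of term weights, and all names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # assumed representation: term -> weight


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest_neighbors(docs: Dict[str, Vector],
                      k: int) -> Dict[str, List[Tuple[float, str]]]:
    """Brute-force k nearest neighbors (by cosine) for every document."""
    nn = {}
    for d, vec in docs.items():
        sims = [(cosine(vec, docs[e]), e) for e in docs if e != d]
        sims.sort(reverse=True)  # most similar first
        nn[d] = sims[:k]
    return nn


def single_linkage_clusters(docs: Dict[str, Vector], k: int,
                            threshold: float) -> List[List[str]]:
    """Connected components of the neighbor links with similarity >= threshold."""
    parent = {d: d for d in docs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for d, neighbors in nearest_neighbors(docs, k).items():
        for sim, e in neighbors:
            if sim >= threshold:
                parent[find(d)] = find(e)  # union the two clusters
    groups: Dict[str, List[str]] = {}
    for d in docs:
        groups.setdefault(find(d), []).append(d)
    return sorted(sorted(g) for g in groups.values())
```

Because only neighbor links are kept, the clusters at a given threshold can be finer than, but never coarser than, those obtained from the full similarity matrix at the same threshold. Raising k moves the result toward the full-matrix clustering, at the cost of a longer neighbor search.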

------------------------------

END OF IRList Digest
********************
