IRList Digest Volume 2 Number 40

IRList Digest           Sunday, 7 September 1986      Volume 2 : Issue 40 

Today's Topics:
Query - Mail list digest indexing?
Announcement - Oxford Text Archive shortlist, new acquisitions
Call for Papers - IFIP WG6.5 Conf. on Message Handling Systems
Abstracts - Appearing in latest issue of ACM SIGIR Forum, Part 3

News addresses are ARPANET: fox%vt@csnet-relay.arpa BITNET: foxea@vtvax3.bitnet
CSNET: fox@vt UUCPNET: seismo!vtisr1!irlistrq
----------------------------------------------------------------------

Date: Sat, 30 Aug 86 00:43:45 edt
From: Ewgorc@CS.UCL.AC.UK
Subject: indexing of mailing-list items

mailing list digest indexing
Dear Mr. Fox,

I've just found out from the moderator of the AIList that you have built
a system for the automatic indexing of mailing-list items. I am currently
implementing a similar system for an M.Sc. project, and I would be very
grateful if you could tell me a little about your system, as a comparison
would be useful as part of my project report. (The only similar system
I'd heard about before was one designed by an M.Sc. student at
Queen Mary College, University of London : this was very much oriented
towards research into user-modelling, and was never actually implemented.)

My system is designed to run under UNIX on a Sun workstation.
It first identifies the mailing-list to which each new message belongs,
copying them into an appropriate directory (and splitting digests into
several files, one for each item or for "Today's Topics").
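
The digest-splitting step can be sketched as follows. This is a minimal illustration in Python rather than the C of the system described, and it assumes that items are separated by a line consisting only of 30 or more hyphens, as in this digest. The name `split_digest` is hypothetical.

```python
import re

# Assumption: a separator is a line made up only of 30 or more hyphens
# (optionally followed by spaces or tabs). Other digests may differ.
SEPARATOR = re.compile(r"^-{30,}[ \t]*$", re.MULTILINE)


def split_digest(text):
    """Split a digest into chunks: the preamble first, then one per item."""
    parts = SEPARATOR.split(text)
    # Drop empty chunks and surrounding blank lines.
    return [part.strip() for part in parts if part.strip()]
```

The first chunk holds the "Today's Topics" preamble and the last holds the end-of-digest trailer, so a caller would write each chunk to its own file and treat those two specially.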

An index file for each is then produced : indexing terms are taken from
profiles set up by potential readers, and from a dummy profile maintained
by the system administrator which will be particularly useful in the early
days of the system before many readers have created large profiles.
The indexing process is based on "fgrep" : the number of times each term
occurs in the text of the message is noted (terms occurring in the "Subject"
header are given an especially high score, but other header lines are
disregarded).

Each reader's profile is then compared to the index files,
adding up the scores for each term which occurs in both the profile and
the index of a message, and then dividing by the number of lines in the
text, to obtain the interest value of the message to that reader : messages
for which the interest value is greater than or equal to a
threshold (which may be given in the profile, although a default is
provided) are mailed to a file associated with the reader, using
details of the match - the interest value and a list of the matched
terms - as a new Subject header.
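
The indexing and matching steps described above can be sketched as follows. This is a minimal illustration in Python, not the C code of the system itself. The function names, the Subject weight of 5 and the default threshold of 0.5 are assumptions, since the letter gives no figures. Terms are matched as fixed, case-sensitive strings, in the spirit of "fgrep".

```python
def index_message(subject, body, terms, subject_weight=5):
    """Build a per-message index mapping each term to its score.

    Score = occurrences in the body + subject_weight * occurrences in the
    Subject line. subject_weight=5 is an assumed value; the letter only
    says that Subject hits get an especially high score.
    """
    index = {}
    for term in terms:
        score = body.count(term) + subject_weight * subject.count(term)
        if score > 0:
            index[term] = score
    return index


def interest_value(profile_terms, index, body):
    """Return (interest value, matched terms) for one reader and message."""
    matched = sorted(t for t in set(profile_terms) if t in index)
    total = sum(index[t] for t in matched)
    n_lines = max(1, len(body.splitlines()))  # guard against an empty body
    return total / n_lines, matched


def match_header(profile_terms, index, body, threshold=0.5):
    """Return a new Subject line if the message meets the threshold, else None."""
    value, matched = interest_value(profile_terms, index, body)
    if value < threshold:
        return None
    return "Subject: interest %.2f; matched: %s" % (value, ", ".join(matched))
```

Dividing by the number of lines keeps long messages from scoring highly on length alone; the terms passed to `index_message` would be the union of all readers' profiles plus the administrator's dummy profile.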

The three main programs of the system are written in C : although much
of the work could possibly have been done by "grep", "awk" and similar
utilities, I felt C code would be more maintainable (e.g. once the
basic system is in operation, I would like to allow readers to link
terms together by "and" and "or").

I would be interested to know if my system differs radically from yours
in any way, particularly in the algorithms used for indexing and
matching. For example, does your system perform any semantic analysis
of messages rather than just string-matching, and if so how?

Thanks in advance for any information you can give me.

Tim Miles
BP Research Centre, Sunbury-on-Thames, England.

[Tim: Sounds like an interesting and useful project. Yes, my system is
different. I will try to fill in a few comments and give pointers to
related works - other readers are invited to add to this discussion.
1) Thomas Malone at MIT has worked on the "Information Lens" which
deals with mail handling and profiles.
2) Michael Mauldin (see IRList V2 #25, 30) is working on FERRET which
is an application and extension of methods like FRUMP to mail
indexing and retrieval.
3) The SMART system at Cornell and Virginia Tech has been used to
index and retrieve collections of mail and mail digest messages.
4) The CODER system is under development at Virginia Tech to use AI
methods to analyze and retrieve (parts of) messages from an archive
of messages from AIList Digest. A brief discussion appears in IRList
V2 #26, and there have been several conference papers and technical
reports on it.
5) Your approach is like several SDI (selective dissemination of
information) systems that matched profiles against new files of
recent abstracts.
6) Some online interactive systems for searching archives have recently
become available on networks like BITNET.
7) R. Korfhage has recently done work on profiles and queries in
information retrieval.

Hope this is the kind of info you wanted. Regards, Ed]

------------------------------

Date: 29-AUG-1986 14:33:03
From: LOU%UK.AC.OXFORD.VAX1@AC.UK
Subject: Oxford Text Archive news

OXFORD TEXT ARCHIVE

A new edition of the Oxford Text Archive Shortlist was published in
August 1986. Send your name and address for a free copy! (second and subsequent
copies cost $3 each)

Recent acquisitions include TWO different parsed/structured versions of the
Oxford Advanced Learner's Dictionary, one produced by Roger Mitton at
Birkbeck College; the other by Rick Kazman at University of Waterloo.
Both versions are available under the OTA's standard conditions of use.

------------------------------

Subject: Call for Papers WG 6.5
From: Peter Schicker <schicker%ean.cs.nott.ac.uk@cs.ucl.ac.UK>
Date: 13 Jun 86 7:55 -0100 BST

[Forwarded msg below was sent to me for redistribution - Ed]

Hugh and Stef:
Could you post the following call for papers on the US and European nets,
please.

CALL FOR PAPERS

IFIP WG 6.5 International Working Conference on

MESSAGE HANDLING SYSTEMS
(State of the Art and Future Directions)

27 to 29 April 1987
Munich
Fed. Rep. of Germany

Program:
The purpose of the conference is to provide an international forum for the
exchange of information on the technical, economic, social, and political
impacts of computer message and office systems. The conference format will
be two days of conference paper presentations followed by one day of
workshops.


Papers are desired in the following topic areas:

MHS Interconnection and Interworking
Interconnection of X.400 Systems (Private and public)
Gateways to X.400 Systems
X.400 Shell to non-X.400 Systems
Interworking between X.400 and the Postal System
Interworking with other Architectures (e.g., DIA/DCA, All-In-1, etc.)
Multi-Vendor Private Message Systems

Documents and Messages
Document and Message Architectures
Multimedia Documents and Messages
Graphics (GKS) vs. Facsimile
Communication of Business Forms and Trade Documents

Directory Services
Naming and Addressing
Public Directory Systems
Interworking between Public and Private Directory Systems

New Access Protocols
Mailbox Services
Extensions to X.400 Series Recommendations

Message Management
Personal Message Management
Message and Document Filing and Retrieval

Group Communication
Distribution Lists
Organization of Message Flow
Real-Time Conferencing
Models for Group Communication

Workstations and User Interface
Workstation and Cluster Design
Backup and Archiving
User Interface Issues
Message Editing

Security Aspects
Authentication
Confidentiality

Impacts of MHS
Social and Behavioral Impacts
Impacts on Organizations
Impacts on Nations
Impacts on Relieving Impairment

Policy Issues
Public Policy Issues in MHS
Transborder Data Flow
Legal Status of MHS
Privacy and Confidentiality

- ------------------------------------------------------------------------

Instructions to Authors:

Prospective Authors are invited to submit for review unpublished original
contributions (not exceeding 5000 words) which describe recent developments
on any design or service aspect of computer message systems.

Accepted papers will appear in the Conference Proceedings published by
North-Holland Publishing Company.

Deadlines:

Today           Send a postcard with your name, telephone, and EMail
                address to:
                    Message Systems '87
                    Mrs. Stenzel
                    Siemens AG
                    D-AP.11
                    Otto Hahn Ring 6
                    D-8000 Munich 83
                    Fed. Rep. of Germany
                This will ensure that you receive further information
                about the conference. Please also indicate the
                provisional title if you intend to submit a paper.
Sept. 30, 1986  Draft versions of papers required
Nov. 30, 1986   Notification of acceptance
Jan. 31, 1987   Camera-ready papers required

Papers should be submitted to:

Peter Schicker
Zellweger Telecommunications AG
CH-8634 Hombrechtikon
Switzerland

------------------------------

Date: Wed, 23 Jul 1986 13:06 CST
From: Vijay V. Raghavan <RAGHAVAN@UREGINA1.bitnet>
Subject: SIGIR FORUM Abstracts [Part 3 - Ed]

[Note: Members of ACM SIGIR should have received the spring/summer
Forum, and can find these on pages 37-39. The rest will appear in
machine readable form also in a later issue of IRList. - Ed]

ABSTRACTS

(Chosen by G. Salton or V. Raghavan from 1984 issues of journals
in the retrieval area)

23. RANKING TECHNIQUES AND THE EMPIRICAL LOG LAW

Bertram C. Brookes
64 Abbots Gardens, London N2 0JH, England

Four empirical laws of bibliometrics - those of anomalous
numbers, of Lotka, Zipf and Bradford, together with Laplace's
notorious "law of succession" and de Solla Price's cumulative
advantage distribution, are shown to be almost identical.
Some of these laws are expressed as frequency distributions,
some are frequency-ranked. A simple model which
discriminates these various forms is described. It shows
that the frequency forms conform with an inverse square law
over the appropriate interval and that the equivalent rank
distribution - the Log Law - has the form

Q(r) = log b (r + 1)

where b is the rank interval.

It is further shown that frequency distributions discard
empirical statistical information which the equivalent rank
distributions retain for analysis, so that rank
distributions have theoretical advantages in this field.

The paper concludes with comments on the analysis of the
empirical hybrid forms which arise. The reduction of the
above laws, empirical and hypothetical, to a single law is
achieved by NOT equating the ordinals 1st, 2nd, 3rd,... to
the numbers 1, 2, 3,... as is commonly done.

(Information Processing & Management, Vol. 20, No. 1-2, pp.
37-46, 1984).


24. HUMAN INFORMATION SEEKING AND DESIGN OF INFORMATION SYSTEMS

William B. Rouse and Sandra H. Rouse
Center for Man-Machine Systems Research, Georgia Institute of
Technology, Atlanta, GA 30332, USA

The literature of psychology, library science, management,
computer science, and systems engineering is reviewed and
integrated into an overall perspective of human information
seeking and the design of information systems. The nature of
information seeking is considered in terms of its role in
decision making and problem solving, the dynamics of the
process, and the value of information. Discussions of human
information seeking focus on basic psychological studies,
effects of cognitive style, and models of human behavior.
Design issues considered include attributes of information
systems, analysis of information needs, aids for information
seeking, and evaluation of information systems.

(INFORMATION PROCESSING AND MANAGEMENT, Vol. 20, No. 1-2, pp.
129-138, 1984).


25. EXPERIMENTAL AND QUASI-EXPERIMENTAL DESIGNS FOR RESEARCH IN
INFORMATION SCIENCE

David F. Haas and Donald H. Kraft
Department of Computer Science, Louisiana State University,
Baton Rouge, LA 70803, USA

This is a paper about research designs in information
science. We look at a sample of current research and compare
its designs with an abstract ideal of experimental research
design to see how closely they approximate it. We then
consider ways in which research in our field might be brought
closer to the ideal. We do this because we believe that
experimental and quasi-experimental designs offer unique
advantages over other research designs, especially in the
production of knowledge that can be applied to the solution
of practical problems in information and in software science.

(INFORMATION PROCESSING AND MANAGEMENT Vol. 20, No. 1-2,
pp. 229-237, 1984).


26. THE RELATION BETWEEN THEORY AND METHODOLOGY FOR DESIGNING
EXPERIMENTS IN INFORMATION SCIENCE

Charles Pearson
Catronix Corporation, 151 Sixth St. NW, Suite 100, Atlanta, GA
30313, USA

The relation between theory and experiment in Information
Science is the same as that in any other science. This
relation is examined in some detail in order to provide a
better understanding for designing experiments in Information
Science.

(INFORMATION PROCESSING AND MANAGEMENT, Vol. 20, No. 1-2, pp.
239-241, 1984).


27. MATHEMATICAL MODELS OF TEXT

Harold P. Edmundson
Department of Computer Science, University of Maryland,
College Park, MD 20742, USA

An object of serious study in the information sciences is
printed language, called text. This paper presents numerous
examples of mathematical models of text and in so doing
exposes some interesting results and problems associated with
the linguistic, mathematical, and computational aspects of
current research involving text.

First, the notions of mathematical models and modeling are
reviewed. Then the graphemic, morphological, syntactic, and
semantic linguistic levels of text analysis are distinguished
and discussed. Next, numerous deterministic models and
stochastic models of text are treated in some detail.
Finally, these research accomplishments in information
science are summarized and future research is discussed.

(INFORMATION PROCESSING AND MANAGEMENT, Vol. 20, No. 1-2, pp.
261-268, 1984).



28. FUZZY PROBABILITIES

Lotfi A. Zadeh
Computer Science Division, Department of Electrical
Engineering and Computer Sciences and the Electronics
Research Laboratory, University of California, Berkeley, CA
94720, USA.

The conventional approaches to decision analysis are based on
the assumption that the probabilities which enter into the
assessment of the consequences of a decision are known
numbers. In most realistic settings, this assumption is of
questionable validity since the data from which the
probabilities must be estimated are usually incomplete,
imprecise or not totally reliable.

In the approach outlined in this paper, the probabilities are
assumed to be fuzzy rather than real numbers. It is shown
how such probabilities may be estimated from fuzzy data and a
basic relation between joint, conditional and marginal fuzzy
probabilities is established. Manipulation of fuzzy
probabilities requires, in general, the use of fuzzy
arithmetic, and many of the properties of fuzzy probabilities
are simple generalizations of the corresponding properties of
real-valued probabilities.

(INFORMATION PROCESSING AND MANAGEMENT, Vol. 20, No. 3, 363-
372, 1984).

29. ENTROPIES WITH AND WITHOUT PROBABILITIES
APPLICATIONS TO QUESTIONNAIRES

Bruno Forte
Department of Applied Mathematics, University of Waterloo,
Waterloo, Ontario, Canada N2L 3G1

Entropy is a basic quantity in Information Theory. As it
measures the amount of uncertainty one has in an alternative,
it is "conditional" upon all kinds of "information" that has
been given. New motivations for measures of uncertainty and
information are provided. A more natural interpretation of
the entropies in the "mixed theory" and the entropies for a
random vector is given. The proposed new approach in
measuring uncertainty is illustrated with examples, in
particular, from questionnaires theory.

(INFORMATION PROCESSING AND MANAGEMENT, Vol. 20, No. 3, 397-
405, 1984).

------------------------------

END OF IRList Digest
********************

