IRList Digest Volume 2 Number 48

IRList Digest           Tuesday, 23 September 1986      Volume 2 : Issue 48 

Today's Topics:
COGSCI - Dimensionality Reduction (conn. networks)
Article - Software Reuse Through Information Retrieval - Part 2

News addresses are ARPANET: fox%vt@csnet-relay.arpa BITNET: foxea@vtvax3.bitnet
CSNET: fox@vt UUCPNET: seismo!vtisr1!irlistrq
----------------------------------------------------------------------

Date: Mon, 15 Sep 86 18:47:55 edt
From: DEJONG%OZ.AI.MIT.EDU@MC.LCS.MIT.EDU
Subject: Cognitive Science Calendar


Date: Monday, 15 September 1986 12:09-EDT
subject: Center for Biological Information Processing Seminar

Wednesday, 17 September 12:00pm Room: E25-401

Dimensionality-Reduction Using Connectionist Networks

Eric Saund
MIT Department of Psychology

Recently, methods have been developed for training "connectionist"
networks of simple computing elements to perform a wide variety of
input/output associative mappings. These methods are interesting
because, under some circumstances, networks are able to acquire
automatically intermediate representations that capture regularities
in the mapping. In this talk I will present a way to perform
dimensionality-reduction in such a network. Dimensionality-reduction
is a coding of multi-dimensional data that is constrained to lie on a
lower-dimensional surface embedded in a high-dimensional
feature-space; dimensionality-reduction is a generalization of factor
analysis. I will discuss why dimensionality-reduction may prove a
useful computational tool for later visual processing such as shape
analysis.

------------------------------

Date: Fri, 12 Sep 86 22:31:44 EDT
From: seismo!allegra!hoqam!wbf
Subject: Software Reuse through IR [ Part 2 - Ed]

Software Reuse Through Information Retrieval

W. B. Frakes
B. A. Nejmeh

AT&T Bell Laboratories
Holmdel, New Jersey 07733

[Note: sections 1-4 appeared in the last IRList issue - Ed]

5. Types of Interactive Information Systems

Many different types of systems for handling information are
currently in use. These systems have different underlying
models and capabilities. Perhaps the best known type of
system is the database management system (DBMS) [11]. DBMS
are widely used for storing, managing, and retrieving highly
structured information such as parts lists, personnel files,
etc. Retrieval from these systems is deterministic. For
example, if a query is put to a DBMS asking for records of
all employees in Kansas City who make more than $35,000, the
system will retrieve all and only those records matching the
query criteria.
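
The deterministic retrieval described above can be illustrated with a
minimal sketch. The record layout, the sample employees, and the
select() helper are assumptions made for illustration; they are not
taken from any particular DBMS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    name: str
    city: str
    salary: int


# Hypothetical sample data, invented for this example.
EMPLOYEES = [
    Employee("Adams", "Kansas City", 42000),
    Employee("Baker", "Kansas City", 31000),
    Employee("Clark", "Holmdel", 50000),
]


def select(records, city, min_salary):
    """Return all and only the records that satisfy both criteria."""
    return [r for r in records if r.city == city and r.salary > min_salary]


matches = select(EMPLOYEES, "Kansas City", 35000)
print([e.name for e in matches])
```

Every record either satisfies the criteria or it does not; there is no
notion of a partial or approximate match.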

While DBMS are powerful, they are usually limited in their
ability to handle data that is not highly structured, such
as text or source code. Current systems for handling this
kind of data are information retrieval (IR) systems [12]
[13]. Originally developed to manage the literature of the
natural sciences, IR systems incorporate many techniques for
storing and retrieving unstructured data. These techniques,
such as boolean queries and partial string matching, are
discussed below.

As a demonstration of the use of IR systems for software
reuse, we built a small database of software modules using
CATALOG, an IR system developed by Bill Frakes, Steve Cox,
and Bill Leighton at AT&T Information Systems. These modules
were from SUPER [14], a system built at Bell Laboratories
for interactive reliability analysis. The information used
to index these modules was taken from the descriptive
headers required of each module in the SUPER system. The
text from these headers was passed to CATALOG which placed
the words from the text in inverted files. Searches of this
database could then be made via CATALOG's menu-driven or
command-driven search interfaces, as described below.
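
The indexing step can be sketched as follows. This is only an
illustration of an inverted file built from header text; the module
names, header wording, and stop-word list are assumptions, and the
sketch does not reproduce CATALOG's actual file format.

```python
import re
from collections import defaultdict

# Hypothetical module headers, invented for this example.
HEADERS = {
    "qsort.c": "Title: quicksort. Description: sorts an array of records in place.",
    "hsort.c": "Title: heapsort. Description: sorting routine using a binary heap.",
    "mtbf.c": "Title: mtbf. Description: computes mean time between failures.",
}

# Assumed list of words too common to be worth indexing.
STOP_WORDS = {"a", "an", "of", "in", "the", "using"}


def build_inverted_file(headers):
    """Map each significant word to the set of modules whose header contains it."""
    index = defaultdict(set)
    for module, text in headers.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            if word not in STOP_WORDS:
                index[word].add(module)
    return index


index = build_inverted_file(HEADERS)
print(sorted(index["array"]))
print(sorted(index["description"]))
```

A search for a word then becomes a single lookup in the index rather
than a scan of every header.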


6. The CATALOG Information Retrieval System

CATALOG is a high-performance information retrieval system
designed to allow end users to create, maintain, and search
databases containing both formatted records, such as are
typically found in DBMS, and unformatted records, such as
text, which most DBMS handle poorly. It is now being used
widely within AT&T for such tasks as document management,
marketing information management and distribution, as the
basis of LATTIS, the AT&T IS library system [15], and as the
basis of Video Data Locator, a CATALOG application that
allows retrieval of both text and color images.

CATALOG features a database generator which assists users in
setting up databases, an interactive tool for creating,
modifying, adding, and deleting records, and a search
interface with a menu-driven mode for novice users and a
command-driven mode for expert users. The search interface
allows full boolean combinations of search terms and sets of
retrieved records, and sophisticated partial term matching
techniques such as automatic stemming and phonetic
matching. CATALOG databases are built using B-Trees,
providing rapid search and retrieval capabilities.

CATALOG was written in the C programming language, and
currently runs under UNIX and MS-DOS. CATALOG was
originally developed on a VAX-11/780, and has since been
ported to the 3B2, 3B5, and 3B20, the IBM PC, the AT&T
PC6300, and the PC7300.

6.1 Searching using CATALOG

CATALOG will allow the complete source code for a module to
be entered into the system, providing databases in which the
full source is searchable. It is also possible to enter only
surrogates for source code modules, for example the information
in the header such as title, author, and description. These surrogates
are then available as primary searchable records, and the
full records are available as secondary records for viewing
and printing. Both record size and database size are
unlimited.

Searching is carried out using inverted indexes of every
significant word in a database. CATALOG creates sets of
records in response to user queries. These sets can then be
combined using boolean operators to form new sets. The
display of these sets shows the query used to create the
list, and the number of records that match the query.

6.2 Multiple Search Interfaces

CATALOG has two main user modes: a novice user mode, which
is menu-driven, and a command mode for more experienced
users. This allows the system to adapt itself to the user's
level of knowledge.

Novice user mode assumes as little knowledge as possible on
the part of the user. In this menu-driven mode, users are
prompted to provide search queries. CATALOG then retrieves and
places records corresponding to the queries into sets. By
selecting the appropriate items from menus, users can sort,
display, and perform boolean operations on retrieved sets of
records. It is also possible for an expert user to override
many of the default settings for novice user mode using the
methods described below. Expert mode assumes a
knowledgeable system user, and thus provides only a simple
prompt for commands.

6.3 Queries

In novice mode, CATALOG prompts for queries with the phrase
"Look for:". In response, a query or command (described
below) may be entered. For example, the query

Look for: sorting routines

will cause CATALOG to attempt to find records that contain
the terms "sorting" and "routines", and their variants as
described below.

CATALOG provides for full boolean search specification
through menu selection. It is also possible, though not
necessary, to specify boolean logic in a query. For
example,

Look for: ((sorting and routines) or quicksort) not heapsort

This query will retrieve source code records about sorting
routines or quicksort, which are not about heapsort.
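
One way to read this query is as a sequence of set operations on the
record sets retrieved for each term. The sketch below assumes a
hypothetical table of record numbers per term; it illustrates the
boolean logic only and is not CATALOG's query evaluator.

```python
# Hypothetical record numbers containing each term, invented for this example.
POSTINGS = {
    "sorting": {1, 2, 5},
    "routines": {2, 5, 7},
    "quicksort": {3, 5},
    "heapsort": {5, 8},
}


def records_for(term):
    """Return the set of records indexed under a term (empty if unknown)."""
    return POSTINGS.get(term, set())


# ((sorting and routines) or quicksort) not heapsort
result = (
    (records_for("sorting") & records_for("routines"))
    | records_for("quicksort")
) - records_for("heapsort")

print(sorted(result))
```

With this sample data, record 5 satisfies the first part of the query
but is removed because it also mentions heapsort.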

To find records relevant to a query, CATALOG will take the
words in the query one at a time, and try to find other
words in the database which might be related. If it finds
any possibly related words, CATALOG will present its guesses
to the user for selection. For the query term "sorting", for
example, CATALOG might respond as follows:


Search Term: sorting

   Term        Occurrences
1. sort             15
2. sorting           1
3. sorts             3

Which terms (0 = none, CR = all) :


Users select the terms they want by entering their numbers.
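
The article does not say how CATALOG decides that two words are
related. As a rough sketch, assume a crude suffix-stripping stemmer and
treat index terms with the same stem as related. The suffix list and
the term counts for "search" and "system" are invented for
illustration.

```python
# Assumed suffix list, tried in this order; real stemmers are more careful.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Index terms with occurrence counts; "search" and "system" are hypothetical.
TERM_COUNTS = {"sort": 15, "sorting": 1, "sorts": 3, "search": 7, "system": 4}


def related_terms(query_term, term_counts):
    """Return (term, count) pairs that share the query term's stem."""
    stem = crude_stem(query_term)
    return sorted((t, n) for t, n in term_counts.items() if crude_stem(t) == stem)


for number, (term, count) in enumerate(related_terms("sorting", TERM_COUNTS), 1):
    print(f"{number}. {term:<12}{count:>3}")
```

The numbered output corresponds to the candidate list from which the
user makes a selection.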

The "related word" feature can be suppressed by putting the
character "\" at the end of an entered word, in which case
the index is searched for an exact match. The "related
word" feature can also be suppressed by putting wild card
characters into an entered word. Two wild card characters
are available. The character "*" stands for zero or more
occurrences of any character, and the character "?" stands
for a single arbitrary character. Thus, the term airlin*
will match the words airline, airlines, airliner, etc.,
while the term airlin? will match airline but not airlines
or airliner. Wild card characters cannot be used as the
first character of a word. That is, air*ine and airlin? are
legal search terms, but *irline and ???line are not.
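
The wild card rules can be sketched by translating a search term into a
regular expression. The function names are hypothetical, and the sketch
illustrates the matching rules stated above rather than CATALOG's own
implementation.

```python
import re


def compile_term(term):
    """Translate a wild card term into a compiled regular expression."""
    if not term or term[0] in "*?":
        raise ValueError("a wild card cannot be the first character of a term")
    parts = []
    for ch in term:
        if ch == "*":
            parts.append(".*")  # zero or more arbitrary characters
        elif ch == "?":
            parts.append(".")   # exactly one arbitrary character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matches(term, word):
    """Return True if the whole word matches the wild card term."""
    return compile_term(term).fullmatch(word) is not None


for word in ("airline", "airlines", "airliner"):
    print(word, matches("airlin*", word), matches("airlin?", word))
```

Because "?" stands for exactly one character, airlin? accepts airline
but rejects the eight-letter words airlines and airliner.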

CATALOG also provides the ability to match on phonetic
variants of a query term. This feature will be most useful
with human names. If a field has been marked for phonetic
searching, the phonetic match routine will relate such names
as "Kahn", "Cohen", "Cohn", etc. The phonetic match is
invoked by appending the character "#" to a search term.
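
The article does not name the phonetic algorithm CATALOG uses. The
sketch below is a simplified Soundex-style key, offered only as an
assumption. Unlike standard Soundex, which keeps the first letter of
the name as is, this variant encodes the first letter as well, so that
names beginning with "K" and "C" can fall together.

```python
# Soundex-style consonant classes; vowels, h, w, and y carry no code.
CODES = {}
for letters, digit in (
    ("bfpv", "1"),
    ("cgjkqsxz", "2"),
    ("dt", "3"),
    ("l", "4"),
    ("mn", "5"),
    ("r", "6"),
):
    for letter in letters:
        CODES[letter] = digit


def phonetic_key(name):
    """Encode every coded letter, collapsing consecutive repeats of a code."""
    key = []
    for ch in name.lower():
        digit = CODES.get(ch)
        if digit is None:
            continue  # skip vowels, h, w, y, and non-letters
        if not key or key[-1] != digit:
            key.append(digit)
    return "".join(key)


for name in ("Kahn", "Cohen", "Cohn", "Smith"):
    print(name, phonetic_key(name))
```

Names with equal keys would be treated as phonetic variants of one
another when a term is entered with the "#" suffix.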

6.4 List Display

When a user has chosen terms for all the words in the
query, a list such as the following is formed.


Lists (& indicates a stemmed term)      records

a) (software)                               26
b) (sorting and algorithms)                  9
c) (system& and call&)                       3



This display indicates that three searches have been done,
and that the last search formed list "c" which contains
three records. These three records are related to both the
concept "system" (i.e. the records contain one or more words
related to the word stem "system") and the concept "call"
(i.e. the records contain one or more words related to the
word stem "call").

6.5 Main Menu Display

The main menu in CATALOG allows users to exit the system,
access help messages, go back to the "Look for:" prompt, and
perform operations on record lists. The main menu display
looks like this:

Options:
1 Exit from system
2 Display items from a list
3 Do another search (Go back to 'Look for')
4 Create a new list by finding items common to 2 or more lists (AND)
5 Create a new list by including all items from 2 or more lists (OR)
6 Create a new list by removing from a list
items from 1 or more other lists (NOT)
7 Delete 1 or more lists
8 Help message menu

Choice:


By selecting appropriate items from this menu, users can
manipulate the system to give desired results such as:

o Creating new lists of records, e.g. software modules,
from old lists

o Removing lists

o Displaying and printing records

o Sorting lists

o Placing records in files

o Restricted field searching

o Restricted field display

o Restricted date searching
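
Menu options 4, 5, and 6 each combine any number of existing lists. A
minimal sketch of these three operations follows, assuming each list is
held as a set of record numbers. The function names and the sample
lists are hypothetical.

```python
def combine_and(*lists):
    """Option 4: items common to all of the given lists."""
    return set.intersection(*lists)


def combine_or(*lists):
    """Option 5: all items from all of the given lists."""
    return set.union(*lists)


def combine_not(base, *others):
    """Option 6: items of one list that appear in none of the others."""
    return base.difference(*others)


# Hypothetical lists of record numbers, labelled as in the list display.
lists = {"a": {1, 2, 3, 4}, "b": {2, 3, 5}, "c": {3, 9}}

lists["d"] = combine_and(lists["a"], lists["b"], lists["c"])
lists["e"] = combine_or(lists["a"], lists["b"], lists["c"])
lists["f"] = combine_not(lists["a"], lists["b"], lists["c"])

for label in ("d", "e", "f"):
    print(label, sorted(lists[label]))
```

Each result is stored as a new list, so it can in turn be displayed,
sorted, or combined with other lists.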

6.6 Help Messages

Detailed help messages are available for all system
functions. When users first log into CATALOG, they are
asked if they need help. If they say yes, they can access
help through interactive menus. Help messages can also be
accessed throughout the search session by specifying the
appropriate menu items or commands.

6.7 Command Mode

By typing the command "-c" at the search prompt in novice
user mode, a user can enter command mode. Command mode
assumes a knowledgeable user, and thus only prompts for
commands. It is possible to use any search or display
function with commands.

6.8 SDI - Setting Up a User Interest Profile

Selective Dissemination of Information (SDI) is a technique
used to keep users of an information system alerted to new
additions to the system database. CATALOG provides such a
capability. SDI using CATALOG is done by maintaining a file
containing user IDs and lists of interest words. The
general form is:

userid<sp>word1<sp>word2<sp>...wordn<RETURN>

For example,

smith set manipulation algorithms


This file can be matched against the full database, or
against updates to the database. Lists of the records matching the
profiles are then sent to the appropriate users. The SDI
feature will be useful for alerting software developers to
software modules of a given type that have been added to the
system.
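
The profile matching can be sketched as follows. The profile line for
"smith" is the example given above; the second profile, the new
records, and the rule that a record matches when it contains any one
interest word are assumptions, since the article does not specify the
matching rule.

```python
import re

# Profile file in the form: userid word1 word2 ... wordn
# The "jones" line is hypothetical.
PROFILE_FILE = """\
smith set manipulation algorithms
jones sorting
"""

# Hypothetical records newly added to the database.
NEW_RECORDS = {
    101: "Routines for set union and intersection",
    102: "Heap sorting of fixed-length records",
    103: "Terminal screen handling",
}


def parse_profiles(text):
    """Return a mapping of user ID to that user's set of interest words."""
    profiles = {}
    for line in text.splitlines():
        fields = line.split()
        if fields:
            profiles[fields[0]] = set(fields[1:])
    return profiles


def match_profiles(profiles, records):
    """Return, per user, the record numbers containing any interest word."""
    alerts = {}
    for user, interests in profiles.items():
        hits = [
            number
            for number, text in sorted(records.items())
            if interests & set(re.findall(r"[a-z0-9]+", text.lower()))
        ]
        if hits:
            alerts[user] = hits
    return alerts


alerts = match_profiles(parse_profiles(PROFILE_FILE), NEW_RECORDS)
print(alerts)
```

Running the match against only the new records, rather than the full
database, gives each user a list of additions since the last run.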

[Note: continued in next issue of IRList - Ed]

------------------------------

END OF IRList Digest
********************
