IRList Digest Volume 2 Number 41

IRList Digest           Monday, 8 September 1986      Volume 2 : Issue 41 

Today's Topics: 
   Query - Back issues, applying IR ideas to software systems? 
   Announcement - Xerox PARC Forum on NoteCards 
   Announcement - News on National Archives storage 
   Discussion - Comments regarding News on National Archives storage 
   Abstracts - Appearing in latest issue of ACM SIGIR Forum, Part 4 of 4 

News addresses are ARPANET: fox%vt@csnet-relay.arpa  BITNET: foxea@vtvax3.bitnet 
   CSNET: fox@vt   UUCPNET: seismo!vtisr1!irlistrq 
---------------------------------------------------------------------- 

Date: Fri, 5 Sep 86 00:42:30 CDT 
From: seismo!gswd-vms.ARPA!marick%turkey (Brian Marick) 
Subject: back issues of the mailing list 

 
There's a line of information retrieval/organization research that 
stretches back to Vannevar Bush and Memex, through Engelbart and NLS, up 
to Xerox Notecards and TextNet.  I'd like to apply those ideas about 
organizing and retrieving information to the (I feel) analogous problems 
involved in the maintenance and enhancement of large software systems by 
smallish groups of people.  The more I learn about the 
Bush-Engelbart-... family tree, the better.  Would back issues of the 
IRList help me?  If so, is there any way to get to those back issues? 

Thanks much. 

Brian Marick, Wombat Consort 
Gould Computer Systems -- Urbana && University of Illinois 
...ihnp4!uiucdcs!ccvaxa!marick 
ARPA:  Marick@GSWD-VMS 

[Note: yes, see the Welcome message I will send you for details - Ed] 

[Note: Dr. William Frakes at AT&T Bell Laboratories and some of his 
colleagues have been interested in searching large software 
collections for "relevant" modules, which is a little like what 
you are talking about.  Perhaps Bill and others will comment on 
your ideas.  Please let us know what further developments result. - Ed] 

------------------------------ 

Date: Fri, 5 Sep 86 13:25:06 PDT 
From: Hibbert.pa@Xerox.COM 
Subject: PARC Forum September 11:  NoteCards 

[Forwarded from: AI-ED Digest   Friday, 5 Sep 1986   V.1: Issue 30 - Ed] 

			PARC Forum 

                Thursday, September 11, 1986 
                4:00PM, PARC Auditorium 

Frank Halasz 
Randy Trigg 
Tom Moran 

Intelligent Systems Lab 
Xerox PARC 

NoteCards: An Experimental Environment for Idea Processing and 
Information Management 

NoteCards is an extensible environment designed to help people 
formulate, structure, compare, and manage ideas.  It was developed here 
at PARC as a vehicle for our research on the nature of idea processing 
tasks and the ways in which computers can be used to support 
intellectual work.  As part of this research, we have been actively 
seeding a community of NoteCards users inside Xerox and at a number of 
university, government, and industrial sites.  NoteCards is currently 
being used by more than 50 people engaged in idea processing tasks 
ranging from writing research papers through designing parts for 
photocopiers. 

In this forum, we will briefly demonstrate the current version of 
NoteCards and discuss the major design considerations that drive its 
development.  We will describe the NoteCards user community and the 
range of clever applications that are being developed using NoteCards. 
Finally, we will assess how well the system meets the needs of its 
users.  Specifically, we will argue that NoteCards is very successful in 
supporting the task of managing and organizing large collections of 
ideas, but is relatively less suited to the task of formulating and 
structuring these ideas.  We will also argue that the system lacks 
adequate support for collaborative work.  These assessments will be used 
to motivate and briefly describe the current research directions of the 
NoteCards project. 

------------------------------ 

Date:  4 Sep 86 19:58 PDT 
From: William Daul / McDonnell-Douglas / APD-ASD <WBD.MDC@OFFICE-1.ARPA> 
Author: Mitch Betts (ComputerWorld) 
Subject: ComputerWorld 9/1/86 p.31 "National Archives' Storage Under Scrutiny" 

 
   Comment: I thought this might be of interest to you.  It is copied without 
            permission.  --Bi// 
   Keywords: National Archives, information retrieval, information storage, 
             archives, historians 

WASHINGTON, D.C. -- The prestigious National Research Council has issued a 
report urging the National Archives not to use magnetic media or optical disks 
to permanently store historical documents. 

Optical disks and magnetic media last only 10 to 20 years for archival 
purposes, and the rapid pace of change in hardware and software technology 
suggests that it may be impossible to read the historical records in the 
centuries to come, according to the report, "Preservation of Historical 
Records." 

William Holmes, director of the National Archives and Records Administration's 
archival research and evaluation staff, stated that he agrees with the research 
report's conclusions. 

He said that although the agency plans a pilot test of digital imaging and 
optical-disk technology, optical disks will be used only for public retrieval 
and not for permanent storage. 

"Even if the operating systems and documentation problems somehow are dealt 
with, what is the archivist to do when the machine manufacturer declares the 
hardware obsolete or simply goes out of business?," the research report asked. 
"Will there be an IBM or a Sony in the year 2200?  If they still exist, will 
they maintain a 1980-1990 vintage machine?" the report continued. 

An example of the problem occurred in the mid-1970s when archivists discovered 
that there were only two computers that could read the 1960 U.S. census; one 
was in the Smithsonian Institution and the other was in Japan. 

The inescapable conclusion, the researchers said, is that a long-term archive 
would be committed to an expensive file conversion program every 10-20 years if 
it uses electronic media for permanent storage. 

------------------------------ 

Date:  5 Sep 86 09:34 CDT 
From: "Don Young"@csnet-relay.csnet, 
      "Augmentation Systems Division"@csnet-relay.csnet, 
      MDC <DFY.MDC@OFFICE-1.ARPA> 
Subject: Re: ComputerWorld 9/1/86 p.31 "National Archives' Storage Under Scrutiny" 

[Note: this is a follow up to previous message. - Ed] 

Thanks for putting this article on-line. 

Yes, the National Archives folks have two major problems: 

1.  The question that they ask us: "WILL YOU BE AROUND AS A VENDOR TO SUPPORT 
YOUR PRODUCT OVER THE LONG TERM?" 

2.  The problem of finding the proper recording devices for long-term storage. 

The positive thing in the article is that they confirmed that they are going to 
run a pilot test.  This pilot test could be with ASD.  Also, the Air Force/Navy 
Standard Multiuser Small Computer Requirements Contract (RFI at this point) 
describes Augment On-Line Files in good detail as a requirement.  The 
specification is Augment coupled with the methodology used by AFCC.  We hope 
that the RFP states the same when available next month. 

------------------------------ 

Date:         Wed, 23 Jul 1986 13:06 CST 
From:         Vijay V. Raghavan <RAGHAVAN@UREGINA1.bitnet> 
Subject:      SIGIR FORUM Abstracts [Part 4 of 4 - Ed] 

[Note: Members of ACM SIGIR should have received the spring/summer 
 Forum, and can find these on pages 39-42. The previous parts have 
 appeared in machine readable form in earlier issues of IRList. - Ed] 

                            ABSTRACTS 

(Chosen by G.  Salton or V. Raghavan from 1984 issues of journals 
in the retrieval area) 

30. STRUCTURE   OF  HIERARCHIC  CLUSTERINGS:    IMPLICATIONS  FOR 
    INFORMATION RETRIEVAL AND FOR MULTIVARIATE DATA ANALYSIS 

    F. Murtagh 
    Department  of Computer Science,  University College  Dublin, 
    Dublin 4 Ireland 

    Hierarchic  clustering  methods  may  be  used  to   condense 
    information  for  a user,  as they are in  multivariate  data 
    analysis, or to achieve computational advantages, as they are 
    in  information retrieval.   The structure of the  hierarchic 
    classification   produced   has  a  direct  bearing  on   the 
    effectiveness and utility of using cluster analysis, yet this 
    important  feature  of  the  classification  has  only   been 
    implicitly  referred to in the literature to date.   In  this 
    study,  three  different  coefficients are defined,  each  of 
    which  quantifies  the   symmetry-asymmetry    (balancedness- 
    unbalancedness)  of hierarchic clusterings on a scale from  0 
    to  1.   Using examples of data from the areas of information 
    retrieval  and  of multivariate data analysis,  a  number  of 
    hierarchic  clustering methods are discussed in terms of  the 
    hierarchies they produce. 

    (INFORMATION PROCESSING AND MANAGEMENT, Vol. 20, No. 5/6, pp. 
    611-617, 1984). 

31. AUTOMATIC INDEXING OF FULL TEXTS 

    Dr. Zdenek Jonak 
    Central   Office  of  Scientific,   Technical  and   Economic 
    Information, Prague, Czechoslovakia 

    The  article deals with the preparation of query  description 
    using  a  semantic analyser method based on the  analysis  of 
    semantic structure of documents.   The aim of the paper is to 
    demonstrate  the  efficiency of this method in the  field  of 
    automatic  indexing.   The results obtained by means of  this 
    method  are compared with results of  automatic  indexing 
    performed by some traditional methods and with the results of 
    indexing done by human indexers. 

    (INFORMATION PROCESSING AND MANAGEMENT, Vol. 20, No. 5/6, pp. 
    619-627, 1984). 

 
32. ASPECTS AND THE OVERLAP FUNCTION 

    Marilyn M. Levine 
    Dr.  Levine's Information Machine,  823 N.  2nd Street,  Room 
    200, Milwaukee, WI 53203, USA 

    Leonard P. Levine 
    Department  of  Electrical Engineering and Computer  Science, 
    University of Wisconsin-Milwaukee, Milwaukee, WI  53201, USA 

    It  is  intuitively clear that putting the  cart  before  the 
    horse  is not the same as putting the horse before the  cart. 
    It is equally clear that a history of philosophy is different 
    from  a  philosophy  of history.   Yet there  is  no  logical 
    relationship,  like  the AND/OR/NOT  functions,  which  would 
    enable  manipulation  of  these  permuted,   non-commutative, 
    relationships.   In  this  paper  we  present  a  system  for 
    automatic  handling of ordered sets,  states based  on  these 
    sets, and of differing points of view regarding a Universe of 
    Discourse.   We  call what we are dealing with aspects and we 
    represent them by means of a new logical function called  the 
    Overlap function. 

    (INFORMATION PROCESSING AND MANAGEMENT, Vol. 20, No. 5/6, pp. 
    629-636, 1984). 

 
33. A  COMPARISON  OF  TWO METHODS FOR  BOOLEAN  QUERY  RELEVANCY 
    FEEDBACK 

    G. Salton and E. Voorhees 
    Department of Computer Science,  Cornell University,  Ithaca, 
    NY 14853, USA 

    E. A. Fox 
    Department   of   Computer  Science,   Virginia   Polytechnic 
    Institute and State University, Blacksburg, VA 24061, USA 

    The relevance feedback process uses information derived  from 
    an initially retrieved set of documents to improve subsequent 
    search formulations and retrieval output.  In a Boolean query 
    environment  this  implies  that  new  query  terms  must  be 
    identified and Boolean operators must be chosen automatically 
    to  connect  the  various query terms.   In  this  study  two 
    recently proposed automatic methods for relevance feedback of 
    Boolean  queries  are  evaluated and  conclusions  are  drawn 
    concerning the use of effective feedback methods in a Boolean 
    query environment. 

    (INFORMATION PROCESSING AND MANAGEMENT, Vol. 20, No. 5/6, pp. 
    637-651, 1984). 

34. ORGANIZATION OF CLUSTERED FILES FOR CONSECUTIVE RETRIEVAL 

    J. S. Deogun 
    University of Nebraska 

    V. V. Raghavan and T. K. W. Tsou 
    University of Regina 

    This  paper  studies the problem of storing single-level  and 
    multilevel   clustered  files.    Necessary  and   sufficient 
    conditions  for  a single-level clustered file  to  have  the 
    consecutive retrieval property (CRP) are developed.  A linear 
    time algorithm to test the CRP for a given clustered file and 
    to identify the proper arrangement of objects, if CRP exists, 
    is  presented.   For the single-level clustered files that do 
    not have CRP,  it is shown that the problem of identifying  a 
    storage organization with minimum redundancy is NP-complete. 

    Consequently,  an efficient heuristic algorithm to generate a 
    good  storage  organization  for  such  files  is  developed. 
    Furthermore,   it  is  shown  that,   for  certain  types  of 
    multilevel   clustered   files,   there  exists   a   storage 
    organization  such that the objects in each cluster,  for all 
    clusters  in  each  level  of  the  clustering,   appear   in 
    consecutive locations. 

    (ACM  TRANSACTIONS  ON  DATABASE  SYSTEMS,  Vol.  9,  No.  4, 
    December 1984, Pages 646-671) 
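
    To make the consecutive retrieval property concrete, here is a
    minimal sketch that tests it by brute force.  This is NOT the
    linear-time algorithm of the paper, which the abstract does not
    describe; it simply tries every ordering, so it is only usable
    on a handful of objects.  The function names and the toy
    clusters are assumptions made for illustration.

```python
from itertools import permutations
from typing import Hashable, Optional, Sequence, Set, Tuple


def is_consecutive(order: Sequence[Hashable], cluster: Set[Hashable]) -> bool:
    """True if the members of `cluster` occupy adjacent slots in `order`."""
    positions = sorted(order.index(obj) for obj in cluster)
    # Distinct positions are contiguous exactly when their span equals their count.
    return not positions or positions[-1] - positions[0] + 1 == len(positions)


def find_crp_order(
    objects: Sequence[Hashable], clusters: Sequence[Set[Hashable]]
) -> Optional[Tuple[Hashable, ...]]:
    """Return an ordering in which every cluster is consecutive, or None.

    Exhaustive search over all orderings (factorial time); an
    illustrative stand-in, not the paper's method.
    """
    for order in permutations(objects):
        if all(is_consecutive(order, cluster) for cluster in clusters):
            return order
    return None


# Hypothetical single-level clustered file with overlapping clusters.
objects = ["a", "b", "c", "d"]
chain = [{"a", "b"}, {"b", "c"}, {"c", "d"}]
star = [{"a", "b"}, {"a", "c"}, {"a", "d"}]

print(find_crp_order(objects, chain))  # an ordering exists
print(find_crp_order(objects, star))   # None: "a" cannot sit next to three objects
```

    The second file lacks the property because one object would
    need three neighbours in a linear store.  That is the case
    where, per the abstract, some objects must be stored
    redundantly.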

35. LASER OPTICAL DISK:  THE COMING REVOLUTION IN ON-LINE STORAGE 

    Larry Fujitani 

    Commercially available only recently,  the optical disk drive 
    uses  a  laser beam to burn impressions onto a plastic  disk. 
    Employing  a  highly  focused  beam  rather  than  a  diffuse 
    magnetic field to write, the laser optical disk drive  yields 
    storage densities up to 10 times those of magnetic disks. 

    (COMMUNICATIONS OF THE ACM, Vol. 27, Number 6, June 1984) 

36. AUTOMATIC  SPELLING  CORRECTION IN SCIENTIFIC  AND  SCHOLARLY 
    TEXT 

    Joseph J. Pollock and Antonio Zamora 

    An  automatic spelling correcting algorithm corrects most  of 
    the 50,000 misspellings culled from 25,000,000 words of  text 
    from  seven  scientific and scholarly databases.   It uses  a 
    similarity  key to identify words in a large dictionary  that 
    are  most similar to a particular misspelling,  and  then  an 
    error-reversal  test to select from these the most  plausible 
    correction(s). 

    (COMMUNICATIONS OF THE ACM, Vol. 27, Number 4, April, 1984) 
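
    The two steps named in the abstract, a similarity key followed
    by an error-reversal test, can be sketched as follows.  The
    abstract does not define either step, so the key construction
    used here (first letter, then the remaining unique consonants,
    then the unique vowels, each in order of occurrence) and the
    single-error test are assumptions, as are all function names
    and the tiny word list.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

VOWELS = set("aeiou")


def similarity_key(word: str) -> str:
    """Assumed key: first letter + unique consonants + unique vowels."""
    w = word.lower()
    if not w:
        return ""
    seen = {w[0]}
    consonants: List[str] = []
    vowels: List[str] = []
    for ch in w[1:]:
        if not ch.isalpha() or ch in seen:
            continue
        seen.add(ch)
        (vowels if ch in VOWELS else consonants).append(ch)
    return w[0] + "".join(consonants) + "".join(vowels)


def is_single_error(correct: str, typo: str) -> bool:
    """True if `typo` differs from `correct` by one insertion, omission,
    substitution, or transposition of adjacent letters."""
    if correct == typo or abs(len(correct) - len(typo)) > 1:
        return False
    i = 0  # length of the common prefix
    while i < min(len(correct), len(typo)) and correct[i] == typo[i]:
        i += 1
    if len(correct) == len(typo):
        if correct[i + 1:] == typo[i + 1:]:  # substitution
            return True
        return (  # adjacent transposition
            i + 1 < len(correct)
            and correct[i] == typo[i + 1]
            and correct[i + 1] == typo[i]
            and correct[i + 2:] == typo[i + 2:]
        )
    if len(typo) == len(correct) + 1:  # extra letter typed
        return correct[i:] == typo[i + 1:]
    return correct[i + 1:] == typo[i:]  # letter omitted


def correct_spelling(typo: str, dictionary: Iterable[str]) -> List[str]:
    """Candidates share the typo's key; keep those that explain it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for word in dictionary:
        index[similarity_key(word)].append(word)
    return [w for w in index[similarity_key(typo)] if is_single_error(w, typo)]


words = ["retrieval", "relevant", "reversal", "storage"]
print(correct_spelling("retreival", words))
```

    An exact key match misses errors that change the key itself,
    such as an omitted letter that occurs only once.  The abstract
    speaks of the words "most similar" to the misspelling, which
    suggests the real system also examines neighbouring keys; that
    refinement is left out here.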

37. THE DATA-DOCUMENT DISTINCTION IN INFORMATION RETRIEVAL 

    David C. Blair 

    The  speed  and effectiveness of document  retrieval  systems 
    can  be  greatly improved by reducing the number  of  logical 
    decisions  required of the user.   Based on the weighting  of 
    single  terms by the user,  the proposed system  provides  an 
    optimized search strategy by combining the terms to yield the 
    highest  probabilities  and then calculating the size of  the 
    retrieval set in each case. 

    (COMMUNICATIONS OF THE ACM, Vol. 27, Number 4, April 1984) 

------------------------------ 

END OF IRList Digest 
********************