IRList Digest Volume 2 Number 49
IRList Digest Wednesday, 24 September 1986 Volume 2 : Issue 49
Today's Topics:
Announcement - Correction for book on software for content analysis
Article - Software Reuse Through Information Retrieval - Part 3 of 3
News addresses are ARPANET: fox%vt@csnet-relay.arpa BITNET: foxea@vtvax3.bitnet
CSNET: fox@vt UUCPNET: seismo!vtisr1!irlistrq
----------------------------------------------------------------------
Date: 1986 Sep 14 20:44 EST
From: Bob Weber <WEBER3@HARVARDA>
[Extracted from CRTNET #58, 9/15/86 - Ed]
ERROR IN CONTENT ANALYSIS BOOK AND SOFTWARE FOR CONTENT ANALYSIS
In the second printing of my book, Basic Content Analysis (1985),
published by Sage in their series on quantitative methodology,
they have not corrected an important factual error which I had
asked them to fix.
The first paragraph on page 80 indicates that a computer
program for key-word-in-context listings is available
from the Harvard Laboratory for Computer Graphics; this
is no longer so.
That paragraph will be replaced with the following information
on the availability of the General Inquirer computer system for
automated content analysis (the references are given in the
bibliography of the Sage book).
The replacement paragraph for the first paragraph on page 80 will
state:
Beginning early in 1987, the current version of the
General Inquirer system (Kelly and Stone, 1975), the latest
Harvard dictionary (Dunphy, et al., 1974), and the Lasswell
Value Dictionary (Namenwirth and Weber, 1987)* will be
distributed by ZUMA. Distribution of software,
dictionaries, and documentation will be on an "as is" basis
and only for non-commercial use. A very small fee for
handling will be charged. Note: the General Inquirer is
written in PL1 for IBM mainframe computers only!
Interested readers should contact ZUMA, the Center For Surveys,
Methods, and Analysis, in Mannheim, FRG:
Computer Department
ZUMA
B2,1
Postfach 5969
D-6800 Mannheim 1
Federal Republic of Germany
*Namenwirth, J. Zvi and Robert Philip Weber. 1987. Dynamics of Culture.
Winchester, MA: Allen & Unwin.
Robert Philip (Bob) Weber
Harvard University
BITNET: WEBER3@HARVARDA
ARPA: Weber3%Harvarda.Bitnet@Wiscvm.Wisc.Edu
------------------------------
Date: Fri, 12 Sep 86 22:31:44 EDT
From: seismo!allegra!hoqam!wbf
Subject: Software Reuse through IR [ Part 3 of 3 - Ed]
Software Reuse Through Information Retrieval
W. B. Frakes
B. A. Nejmeh
AT&T Bell Laboratories
Holmdel, New Jersey 07733
[Note: sections 1-4, 5-6 appeared in the last two IRList issues - Ed]
7. A Software Template Design to Promote Reuse
The extent to which IR technology will promote software
reuse is directly related to the quality and accuracy of the
information in its software database. That is, poor
descriptions of code capability and functionality will
decrease the probability that the code will be located for
potential reuse during the search process. Likewise, lack of
information about how to call a function, the side-effects
of the function, and the environmental requirements of the
function also increases the overhead associated with its
reuse. We now propose a template for the descriptive
information that should be maintained for each module and
function in the code base to increase the ease with which it
can be reused.
Throughout this section we will use the terms module and
function. For our purposes, a module is a file consisting of
one or more functions. A function is as defined in the C
programming language. We now describe the contents of
module and function prologues which we believe will increase
the probability that the code appearing in the module is
located as a candidate for reuse whenever possible.
Likewise, we believe that the information contained in each
template will reduce the time required to interface with an
existing function and to ensure that it performs the
necessary operations without harmful side-effects.
Our basic premise is that every module and function must
begin with a prologue. The contents of the prologue in each
case will now be described.
7.1 Module Prologue
We endorse the following format for a module prologue.
/*
* Module : the name of the module.
*
* Description: a concise description of what the
* functions contained in the file do. This
* description should be written with an understanding
* that generic inquiries into the source data base
* will be matched on the prose appearing in this
* section of the file.
*
* Supporting Docs: References to supporting requirements or design
* documents should be given here.
*
* Contents: List the functions appearing in the file in
* the order in which they appear, with a brief
* description of each function.
*
* Data: List all of the global data defined in the file with
* a brief description of each data item.
*
* Environmental Requirements: List all of the hardware and software
* that the module requires (e.g., certain
* kinds of hardware or specific software
* libraries) to function properly.
*
*/
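As a minimal sketch of how a filled-in module prologue might look, consider the small module below. The module name txtnorm.c, its two functions, and the global txt_calls are hypothetical and were invented for this illustration; only the field layout follows the format above.

```c
/*
 * Module : txtnorm.c (hypothetical name, for illustration only)
 *
 * Description: Text normalization utilities. Converts character
 *              strings to lower case and counts the words in a
 *              string. Intended for preparing text before
 *              indexing or comparison.
 *
 * Supporting Docs: None; this module is an illustrative sketch.
 *
 * Contents: txt_lower()      - convert a string to lower case in place.
 *           txt_word_count() - count whitespace-delimited words.
 *
 * Data: txt_calls - number of calls made to functions in this module.
 *
 * Environmental Requirements: Standard C library only
 *                             (<assert.h>, <ctype.h>, <stddef.h>).
 *
 */
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Global call counter, documented under "Data" in the prologue. */
unsigned long txt_calls = 0;

/* Convert s to lower case in place and return s. */
char *txt_lower(char *s)
{
    char *p;

    assert(s != NULL);
    txt_calls++;
    for (p = s; *p != '\0'; p++)
        *p = (char)tolower((unsigned char)*p);
    return s;
}

/* Count the whitespace-delimited words in s. */
size_t txt_word_count(const char *s)
{
    size_t n = 0;
    int in_word = 0;

    assert(s != NULL);
    txt_calls++;
    for (; *s != '\0'; s++) {
        if (isspace((unsigned char)*s)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;   /* first character of a new word */
            n++;
        }
    }
    return n;
}
```

A search for "lower case" or "count words" would match the Description and Contents fields of this prologue, which is the purpose the Description guidance above has in mind.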
7.2 Function Prologue
We endorse the following format for a function prologue.
/*
* Function : the name of the function.
*
* Author: name, location, and phone number of developer
* who wrote the function.
*
* Date: date the function was written
*
* Description: a concise overview of the function
* in terms of the processing it performs. In
* addition, the input, output, and transformational
* processing performed by the function should be
* described.
*
* Usage: List the #include files necessary to call the
* function.
*
* Parameters: The parameters passed to the function with a
* description of each parameter should appear
* here. For pointer parameters, the object
* pointed to should be discussed. Finally, if
* the value of any parameter is changed by the
* function, the modification should be described.
*
* Externals: All of the global variables referenced in the
* function, along with how their values are
* modified, should be described here.
*
* Macros: List the macros used by the function.
*
* Returns: The value returned by the function should be
* described here. The function should be declared
* "void" if it does not return a value.
*
* Calls: List the functions called by this function along
* with the modules in which the called functions
* appear.
*
* Called By: List the functions and their corresponding files
* which call this function.
*
* Modifications: For each change to the file, list the following
* information: Date, Author of Change, Description
* of Change, Reason for Change.
*
*/
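The following is a sketch of one function documented with this prologue. The function clamp_int and the macros CLAMP_OK and CLAMP_DONE are hypothetical, and the Author and Date fields are left as placeholders rather than filled with invented details.

```c
#include <assert.h>
#include <stddef.h>

#define CLAMP_OK   0   /* value was already inside the range */
#define CLAMP_DONE 1   /* value was changed to fit the range */

/*
 * Function : clamp_int (hypothetical, for illustration only)
 *
 * Author: (placeholder) name, location, and phone number.
 *
 * Date: (placeholder) date the function was written.
 *
 * Description: Forces an integer into the closed range [lo, hi].
 *              Input is the integer pointed to by value; output
 *              is the same integer, changed only if it lay
 *              outside the range.
 *
 * Usage: #include <assert.h>
 *        #include <stddef.h>
 *
 * Parameters: value - pointer to the integer to clamp; must not
 *                     be NULL. The integer pointed to is
 *                     overwritten with lo or hi when it is out
 *                     of range.
 *             lo    - lower bound; must not exceed hi.
 *             hi    - upper bound.
 *
 * Externals: None.
 *
 * Macros: CLAMP_OK, CLAMP_DONE, assert.
 *
 * Returns: CLAMP_DONE if *value was modified, CLAMP_OK otherwise.
 *
 * Calls: None.
 *
 * Called By: None (illustrative example).
 *
 * Modifications: None.
 *
 */
int clamp_int(int *value, int lo, int hi)
{
    assert(value != NULL);
    assert(lo <= hi);

    if (*value < lo) {
        *value = lo;
        return CLAMP_DONE;
    }
    if (*value > hi) {
        *value = hi;
        return CLAMP_DONE;
    }
    return CLAMP_OK;
}
```

The Parameters field states that the caller's integer may be overwritten, so a prospective reuser can see the side effect without reading the body.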
8. Future Directions
Certain areas of IR research are likely to improve IR
systems as tools for managing software reuse. Despite
extensive research on IR systems, improvements have been
slow in coming, and the systems in practical use today are
quite similar to those in use in the 1960's. Such
improvements as have been observed have in general been more
due to improvements in general computing environments than
to advances in IR research per se. However, the use of user
feedback [16] has given experimental improvements in
retrieval performance, as has the use of extended boolean
models [17].
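To illustrate what user feedback can mean in practice, the sketch below reweights a query vector using documents the user has judged relevant or nonrelevant, in the form usually attributed to Rocchio [16]: the new query is a weighted sum of the old query and the mean relevant vector, minus a weighted mean nonrelevant vector. The function name, the row-major storage layout, and the clipping of negative weights to zero are assumptions made for this example and are not taken from the cited work.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Rocchio-style query reweighting (illustrative sketch).
 *
 * query    - term weights of the query, updated in place.
 * rel      - n_rel relevant document vectors, stored row-major:
 *            the weight of term t in document d is rel[d * n_terms + t].
 * nonrel   - n_nonrel nonrelevant document vectors, same layout.
 * w_query, w_rel, w_nonrel - weights for the three components.
 */
void feedback_update(double *query, size_t n_terms,
                     const double *rel, size_t n_rel,
                     const double *nonrel, size_t n_nonrel,
                     double w_query, double w_rel, double w_nonrel)
{
    size_t t, d;

    assert(query != NULL);
    for (t = 0; t < n_terms; t++) {
        double rel_mean = 0.0, nonrel_mean = 0.0, updated;

        /* Mean weight of term t over the relevant documents. */
        for (d = 0; d < n_rel; d++)
            rel_mean += rel[d * n_terms + t];
        if (n_rel > 0)
            rel_mean /= (double)n_rel;

        /* Mean weight of term t over the nonrelevant documents. */
        for (d = 0; d < n_nonrel; d++)
            nonrel_mean += nonrel[d * n_terms + t];
        if (n_nonrel > 0)
            nonrel_mean /= (double)n_nonrel;

        updated = w_query * query[t] + w_rel * rel_mean
                  - w_nonrel * nonrel_mean;

        /* Assumption: negative term weights are clipped to zero. */
        query[t] = updated > 0.0 ? updated : 0.0;
    }
}
```

With a query of {1.0, 0.0}, one relevant document {0.0, 1.0}, one nonrelevant document {1.0, 0.0}, and weights 1.0, 0.5, and 0.5, the updated query is {0.5, 0.5}: weight shifts toward the term found in the relevant document.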
A major practical problem in IR is the management of very
large databases. Databases in existence today have already
pushed the limits of magnetic disk storage, and these
databases are growing exponentially. Storage of the source
code and documentation for projects in large corporations
will also result in very large databases. Optical storage
technology offers the ability to store gigabytes of
information on a single optical disk, thus offering a
solution to this problem. Current optical disk technology is
write once; however, multiwrite technology will probably be
available within the next two years.
As IR databases become larger and larger, it becomes
difficult to search and retrieve records quickly. To address
this problem, specialized hardware to perform IR operations
has been built [18] [19]. Such hardware promises to provide
searching speeds for full text of millions of characters per
second. Specialized hardware is also needed to speed up
certain IR operations such as stemming and set processing
that are bottlenecks in current systems.
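To make "set processing" concrete: a Boolean AND of two search terms is commonly evaluated by intersecting the sorted lists of record numbers in which each term occurs. The sketch below shows that operation in software. It is an assumption about the kind of operation meant here, not a description of the hardware in [18] or [19], and the name postings_and and the use of int record numbers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Intersect two ascending lists of record numbers.
 *
 * a, b - input lists of lengths na and nb, each sorted in ascending
 *        order with no duplicates.
 * out  - caller-supplied buffer with room for at least
 *        min(na, nb) entries.
 *
 * Returns the number of record numbers written to out.
 */
size_t postings_and(const int *a, size_t na,
                    const int *b, size_t nb,
                    int *out)
{
    size_t i = 0, j = 0, k = 0;

    assert(out != NULL);
    while (i < na && j < nb) {
        if (a[i] < b[j]) {
            i++;                 /* a[i] cannot appear in b */
        } else if (a[i] > b[j]) {
            j++;                 /* b[j] cannot appear in a */
        } else {
            out[k++] = a[i];     /* record contains both terms */
            i++;
            j++;
        }
    }
    return k;
}
```

Each list is scanned once, so the cost grows with the combined length of the two lists. For very common terms those lists can be long, which is one way such an operation can become a bottleneck.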
A central problem of IR has been how to represent the
meaning of text or other records in a way comprehensible to
a computer. The knowledge representation techniques used in
AI systems [20] offer promise in this direction. Oddy [21]
has used a semantic net approach to document representation,
production rules have been used to create an intelligent
thesaurus [22], and natural language systems have been used
to extract and formalize the information in medical
documents [23].
Taking these newer technologies together, it appears
probable that future IR systems for software reuse will have
capabilities for massive storage in the gigabyte range and
specialized hardware for text searching and set
combination. Such systems will allow better semantic
representation of records, and will provide intelligent
interfaces that will guide users in system use. Other
probable developments in IR technology can be found in Fox
[24].
9. Conclusion
We have argued that reuse is crucial if we are to deliver
efficient, reliable, and maintainable software in a timely
manner. The lack of adequate tools to organize, search, and
retrieve reusable modules has impeded reuse. We have
proposed IR systems as the technology of choice for managing
code reuse, using the CATALOG system to demonstrate the
feasibility of this approach. We have concluded by
discussing important trends in IR research and development
likely to impact the reuse problem.
REFERENCES
1. DeMarco, T., Lister, T. Controlling Software Projects:
Management, Measurement, and Evaluation, Seminar Notes,
New York, Atlantic Systems Guild Inc., 1984.
2. Frakes, W.B. "Term Conflation for Information
Retrieval", in VanRijsbergen C.J. Ed. Research and
Development in Information Retrieval Cambridge:
Cambridge University Press, 1984.
3. Frakes, W.B., Leighton W.J., "The Catalog Information
Management System", Proceedings of Symposium on
Workstations in the Future Computing Environment, AT&T
Bell Laboratories, Naperville, Ill., 1985.
4. Standish, T., "An Essay on Software Reuse", IEEE
Transactions on Software Engineering, Vol. SE-10, Sept.
1984.
5. Boehm, Barry, Software Engineering Economics, Prentice-
Hall, Englewood Cliffs N.J., 1981.
6. Horowitz, E. and Munson, J. "An Expansive View of
Software Reuse", IEEE Transactions on Software
Engineering, Vol. SE-10, Sept. 1984.
7. Frank, W.L., "What Limits to Software Gains",
Computerworld, pp. 65-70, May 4, 1984.
8. Grabow, P., "Software Reuse, Where Are We Going?", IEEE
COMPSAC85, Oct. 9-11, 1985, pp.202.
9. McNamara, D. "Japanese Software Factories", presentation
at Computer Science Colloquium, University of
California, Irvine, May 1983.
10. Huang, C., "Reusable Software Implementation Technology:
A Review of the Current Practice", IEEE COMPSAC85,
Oct. 9-11, 1985, pp.207.
11. Date, C. J., An Introduction to Database Systems, 3rd
Ed. Reading, Mass., Addison Wesley, 1981.
12. Lancaster F. W. and Fayen, E. G. Information Retrieval
On-Line, Los Angeles, Melville Publishing Co., 1973.
13. Salton G. and McGill M. Introduction to Modern
Information Retrieval, New York, McGraw-Hill, 1983.
14. Crocker, S.L., Frakes, W.B., Leon, R.V., Tortorella, M.,
"SUPER: System Used for Prediction and Evaluation of
Reliability", Paper read at IEEE Conference on
Reliability of Computer Controlled Telecommunications
Systems, 1985, at Val David, Canada.
15. Frakes, W.B., "LATTIS: A Corporate Library and
Information System for the UNIX Environment", To appear
in the Proceedings of the National Online Conference,
1986.
16. Rocchio, J. J., "Relevance Feedback in Information
Retrieval" in The SMART Retrieval System - Experiments
in Automatic Document Processing, G. Salton Editor,
Prentice-Hall Inc., Englewood Cliffs N.J., 1971, Chapter
14.
17. Salton, G., Fox, E., Wu, H., "Extended Boolean
Information Retrieval", Communications of the ACM,
26(11): pp. 1022-1036, Nov, 1983.
18. Proceedings of the Fourth Workshop on Computer
Architecture for Nonnumeric Processing, Syracuse, N.Y.
1979.
19. Hollaar, L.A., "The Utah Text Retrieval Project -- A
Status Report", in VanRijsbergen C.J. Ed. Research and
Development in Information Retrieval Cambridge:
Cambridge University Press, 1984.
20. Winston, Patrick Henry, Artificial Intelligence 2nd Ed.,
Reading Mass., 1984.
21. Oddy, R. N., "Information Retrieval Through Man-Machine
Dialogue", Journal of Documentation, 33. 1-14(1977).
22. McCune, B. et al. "RUBRIC: A System for Rule Based
Information Retrieval", IEEE Transactions on Software
Engineering, 1985.
23. Sager, Naomi, "Information Structures in Texts of a
Sublanguage", Proceedings of 44th ASIS Annual Meeting,
Washington D.C., October 1981.
24. Fox, Christopher and Zappert, F., "Future Generation
Information Systems", To appear in the Journal of the
American Society for Information Science.
------------------------------
END OF IRList Digest
********************