IRList Digest Volume 3 Number 50
IRList Digest Wednesday, 23 December 1987 Volume 3 : Issue 50
Today's Topics:
Report - Vassar workshop on text encoding standard for the humanities
News addresses are
Internet or CSNET: fox@vtopus.cs.vt.edu
BITNET: foxea@vtvax3.bitnet
----------------------------------------------------------------------
Date: Wed, 2 Dec 87 22:50:46 est
From: amsler@flash.bellcore.com (Robert Amsler)
Subject: Text Encoding Standard for the Humanities - Vassar Workshop report
[The following is a summary prepared by Michael Sperberg-McQueen for
the HUMANIST mailing list of the first workshop on the preparation of
an encoding standard for text in the humanities held at Vassar
College last month. As an attendee and steering committee member, I
would be willing to answer further questions concerning this effort
for the IRLIST or NL-KR communities. The effort to develop a standard for
encoding texts in the humanities is just starting and anyone with
interest in this noble and ambitious goal should not feel the
slightest hesitancy about becoming a part of the effort. What is at
stake is nothing less than the creation, use and preservation of our
global electronic cultural heritage - R. Amsler, (amsler@flash.bellcore.com)]
Contributor: "Michael Sperberg-McQueen" <U18189@UICVM>
A followup on the current status of the ACH effort to formulate
guidelines for text encoding practices.
******************************************************************
* NOTE: The following encoding conventions have been used to
* represent French accents throughout this message:
*
* To Represent Accents -- Pour la representation des accents
*    /   acute accent - accent aigu
*    `   grave accent - accent grave
*
* The accent codes are typed AFTER the letter, and are used with
* both upper and lower case letters.
*
* Les codes pour les accents se trouvent APRES la lettre qu'ils
* modifient, et s'utilisent avec les majuscules aussi bien que
* les minuscules.
******************************************************************
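[For readers who want to restore the accents by program, the convention
in the note above can be sketched in Python as follows. This is only an
illustration, not part of the original message. The function name
decode_accents and the restriction to vowels are assumptions made here;
the restriction keeps a slash in plain text such as "M/C 135" from being
read as an accent code.]

```python
import re
import unicodedata

# The two accent codes listed in the note, mapped to Unicode combining marks.
COMBINING = {
    "/": "\u0301",  # acute accent (accent aigu)
    "`": "\u0300",  # grave accent (accent grave)
}

# Assumption (not stated in the note): only vowels carry these accents.
# The code is typed AFTER the letter and applies to both cases.
ACCENT_CODE = re.compile(r"([AEIOUaeiou])([/`])")


def decode_accents(text: str) -> str:
    """Replace letter+code pairs such as 'e/' or 'a`' with accented letters."""

    def compose(match: re.Match) -> str:
        letter, code = match.group(1), match.group(2)
        # NFC normalization turns letter + combining mark into one character.
        return unicodedata.normalize("NFC", letter + COMBINING[code])

    return ACCENT_CODE.sub(compose, text)


if __name__ == "__main__":
    print(decode_accents("de/finir un me/ta-langage de/crivant les syste`mes"))
```

[A sketch this simple will also rewrite a vowel followed by a slash in
ordinary text, for example "I/O", so its output should be checked by
hand before it is relied upon.]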
On November 12 and 13, 1987, 31 representatives of professional
societies, universities, and text archives met to consider the
possibility of developing a set of guidelines for the encoding of texts
for literary, linguistic, and historical research. The meeting was
called by the Association for Computers and the Humanities and funded
by the National Endowment for the Humanities. The list of participants
is appended to this document.
The participants heartily endorsed the idea of developing encoding
guidelines. In order to guide such development, they agreed on
the following principles:
The Preparation of Text Encoding Guidelines
Poughkeepsie, New York
13 November 1987

1. The guidelines are intended to provide a standard format for data
   interchange in humanities research.
2. The guidelines are also intended to suggest principles for the
   encoding of texts in the same format.
3. The guidelines should
   a. define a recommended syntax for the format,
   b. define a metalanguage for the description of text-encoding
      schemes,
   c. describe the new format and representative existing schemes
      both in that metalanguage and in prose.
4. The guidelines should propose sets of coding conventions suited for
   various applications.
5. The guidelines should include a minimal set of conventions for
   encoding new texts in the format.
6. The guidelines are to be drafted by committees on:
   a. text documentation
   b. text representation
   c. text interpretation and analysis
   d. metalanguage definition and description of existing and
      proposed schemes
   co-ordinated by a steering committee of representatives of the
   principal sponsoring organizations.
7. Compatibility with existing standards will be maintained as far as
   possible.
8. A number of large text archives have agreed in principle to support
   the guidelines in their function as an interchange format. We
   encourage funding agencies to support development of tools to
   facilitate this interchange.
9. Conversion of existing machine-readable texts to the new format
   involves the translation of their conventions into the syntax of
   the new format. No requirements will be made for the addition of
   information not already coded in the texts.

Re/daction des directives pour le codage des textes

1. Le but des directives est de cre/er un format standard pour
   l'e/change des donne/es utilise/es pour la recherche dans les
   humanite/s.
2. Les directives sugge/reront e/galement des principes pour
   l'enregistrement des textes destine/s a` utiliser ce format.
3. Les directives devraient
   a. de/finir une syntaxe recommande/e pour exprimer le format,
   b. de/finir un me/ta-langage de/crivant les syste`mes de codage
      des textes,
   c. de/crire par le moyen de ce me/talangage, aussi bien qu'en
      prose, le nouveau syste`me de codage aussi bien qu'un choix
      repre/sentatif de syste`mes de/ja` en vigueur.
4. Les directives devraient proposer des syste`mes de codage
   utilisables pour un large e/ventail d'applications.
5. Sera incluse dans les directives l'e/nonciation d'un syste`me de
   codage minimum, pour guider l'enregistrement de nouveaux textes
   conforme/ment au format propose/.
6. Le travail d'e/laboration des directives sera confie/ a` quatre
   comite/s centre/s sur les sujets suivants:
   a. la documentation des textes,
   b. la repre/sentation des textes,
   c. l'analyse et l'interpre/tation des textes
   d. la de/finition du me/talangage et son utilisation pour
      de/crire le nouveau syste`me aussi bien que ceux qui existent
      de/ja`.
   Ce travail sera coordonne/ par un comite/ d'organisation ou`
   sie`geront des repre/sentants des principales associations qui
   soutiennent cet effort.
7. Dans la mesure du possible, le nouveau syste`me sera compatible
   avec les syste`mes de codage existants.
8. Des repre/sentants de plusieurs grandes archives de textes en forme
   lisible par machine acceptent en principe d'utiliser les directives
   en tant que description des formats pour l'e/change de leurs
   donne/es. Nous encourageons les organismes qui fournissent des
   fonds pour la recherche de soutenir le de/veloppement de ce qui est
   ne/cessaire pour faciliter cela.
9. En convertissant des textes lisibles par machine de/ja` existants,
   on remplacera automatiquement leur codage actuel par ce qui est
   ne/cessaire pour les rendre conformes au format nouveau. Nul
   n'exigera l'ajout d'informations qui ne sont pas de/ja`
   repre/sente/es dans ces textes.

(trad. P. A. Fortier)
******************
The further organization and drafting of the guidelines will be
supervised by a steering committee selected by the three sponsoring
organizations: ACH (the Association for Computers and the Humanities),
ACL (the Association for Computational Linguistics), and ALLC (the
Association for Literary and Linguistic Computing). Drafts of the
guidelines will be submitted for comment to an editorial committee with
representatives of all participating organizations. In addition to the
sponsors, these are thus far the Modern Language Association, the
Association for Computing Machinery Special Interest Group for
Information Retrieval, and the Association of American Publishers. The
following groups have indicated interest informally but have not yet
formally pledged participation, in most cases pending a formal vote:
the Linguistic Society of America, the Association for Documentary
Editing, and the American Philological Association. The American
Anthropological Association, plus several organizations within Europe,
are now being asked to consider participation.
The interchange format defined by the guidelines is expected to be
compatible with the Standard Generalized Markup Language defined
by ISO 8879, if that proves compatible with the needs of research. The
needs of specialized research interests will be addressed wherever it
proves possible to find interested groups or individuals to do the
necessary work and achieve the necessary consensus. Formation of
specific working groups will be announced later; in the meantime, those
interested in working on specific problems are invited to contact
either Dr. C. M. Sperberg-McQueen, Computer Center, University of
Illinois at Chicago (M/C 135), P.O. Box 6998, Chicago IL 60680 (on
Bitnet: U18189 at UICVM), or Prof. Nancy Ide, Dept. of Computer
Science, Vassar College, Poughkeepsie NY 12601 (on Bitnet: IDE at
VASSAR).
- N.I., C.M.S-McQ
------------------------------------------------------------------------------
List of Participants
NOTE: Association names are given following the names of their
representatives at this meeting.
Helen Aguera, National Endowment for the Humanities
Robert A. Amsler, Bell Communications Research
David T. Barnard, Department of Computing and Information Science,
Queen's University, Ontario
Lou Burnard, Oxford Text Archive
Roy Byrd, IBM Research
Nicoletta Calzolari, Istituto di linguistica computazionale, Pisa
David Chestnutt (Assoc. for Documentary Editing, American Historical
Assoc.), Department of History, University of South Carolina
Yaacov Choueka (Academy of the Hebrew Language), Department of
Mathematics and Computer Science, Bar-Ilan University
Jacques Dendien, Institut National de la Langue Francaise
Paul A. Fortier, Department of Romance Languages, University of
Manitoba
Thomas Hickey, OCLC Online Computer Library Center
Susan Hockey (Association for Literary and Linguistic Computing),
Oxford University Computing Service
Nancy M. Ide (Association for Computers and the Humanities),
Department of Computer Science, Vassar College
Stig Johansson, International Computer Archive of Modern English,
University of Oslo
Randall Jones (Modern Language Association), Humanities Research
Computing Center, Brigham Young University
Robert Kraft, Center for the Computer Analysis of Texts, University of
Pennsylvania
Ian Lancashire, Center for Computing in the Humanities, University of
Toronto
D. Terence Langendoen (Linguistic Society of America), Graduate
Center, City University of New York
Charles (Jack) Meyers, National Endowment for the Humanities
Junichi Nakamura, Department of Electrical Engineering, Kyoto
University
Wilhelm Ott, Universitaet Tuebingen
Eugenio Picchi, Istituto di linguistica computazionale, Pisa
Carol Risher (Association of American Publishers), Association of
American Publishers, Inc.
Jane Rosenberg, National Endowment for the Humanities
Jean Schumacher, Centre de traitement e/lectronique de textes,
Universite/ catholique de Louvain a` Louvain-la-neuve
J. Penny Small (American Philological Association), U.S. Center for
the Lexicon Iconographicum Mythologiae Classicae, Rutgers
University
C.M. Sperberg-McQueen, Computer Center, University of Illinois at
Chicago
Paul Tombeur, Centre de traitement e/lectronique de textes,
Universite/ catholique de Louvain a` Louvain-la-neuve, Belgium
Frank Tompa, New Oxford English Dictionary Project, University of
Waterloo
Donald E. Walker (Association for Computational Linguistics), Bell
Communications Research
Antonio Zampolli, Istituto di linguistica computazionale, Pisa, Italy
------------------------------
END OF IRList Digest
********************