NL-KR Digest Volume 10 No. 12
NL-KR Digest (Fri Mar 19 10:28:54 1993) Volume 10 No. 12
Today's Topics:
Query: algorithms to split words into morphemes
Query: English to Italian translation
Talk: Jon Ogborn on Modelling clay for computers at BBN
CFP: New OED Conference - Making Sense of Words
Announcement: IJCAI-93 server
Announcement: AISB93 Dinner speaker
Announcement: Corpus-Based Frequency Count of Modern Chinese
Announcement: HCRC Map Task Corpus on CD
Submissions: nl-kr@cs.rpi.edu
Requests, policy: nl-kr-request@cs.rpi.edu
Back issues are available from host archive.cs.rpi.edu [128.213.3.18] in
the files nl-kr/Vxx/Nyy (i.e. nl-kr/V01/N01 for V1#1); mail requests will
not be promptly satisfied. Starting with V9, there is a subject index
in the file INDEX. If you can't reach `cs.rpi.edu' you may want
to use `turing.cs.rpi.edu' instead.
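The file naming scheme above can be illustrated with a minimal sketch. The function name back_issue_path is hypothetical, and the two-digit zero padding is an assumption inferred from the single V01/N01 example; the notice does not say how larger numbers are written.

```python
def back_issue_path(volume: int, number: int) -> str:
    """Build the archive path for one digest issue, e.g. nl-kr/V01/N01.

    Assumes two-digit zero padding, as suggested by the V01/N01 example.
    """
    if volume < 1 or number < 1:
        raise ValueError("volume and number must be positive")
    return f"nl-kr/V{volume:02d}/N{number:02d}"


print(back_issue_path(1, 1))    # nl-kr/V01/N01
print(back_issue_path(10, 12))  # nl-kr/V10/N12
```

Under that assumption, the present issue (Volume 10 No. 12) would be stored as nl-kr/V10/N12.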
BITNET subscribers: we now have a LISTSERVer for nl-kr.
You may send submissions to NL-KR@RPITSVM
and any listserv-style administrative requests to LISTSERV@RPITSVM.
-----------------------------------------------------------------
To: nl-kr@cs.rpi.edu
From: J_KANE@unhh.unh.edu (John J Kane)
Newsgroups: comp.ai.nlang-know-rep
Subject: Query: algorithms to split words into morphemes
Date: 16 Mar 1993 23:25:10 GMT
... possibly including discussion of methods for handling ambiguous cases.
Suggestions welcome. Will share results of search.
Limited news access; prefer mail at jjk%nhstrat@virgin.mv.com
[Explaining astrophysics is child's play compared to explaining child's play.]
------------------------------
To: nl-kr@cs.rpi.edu
Newsgroups: comp.ai.nlang-know-rep
From: ferretti@ipvmv1.unipv.it
Subject: Query: English to Italian translation
Summary: Is there any package ?
Keywords: nat-language, translation
Is anybody aware of a package for automatic translation from
English to Italian for specific language domains, such as
computer science, EE, and so on ?
The ideal tool would allow the associated dictionary to be tailored
and would be capable of handling a fairly simple syntax.
If this group is the wrong one, a redirection is gratefully
acknowledged.
Hints through the Net or directly to
ferretti@ipvmv1.unipv.it
Marco Ferretti
DIS-University of Pavia, Italy
------------------------------
To: nl-kr@cs.rpi.edu
Date: Thu, 11 Mar 93 9:54:08 EST
From: Helene George <hgeorge@BBN.COM>
Subject: Talk: Jon Ogborn on Modelling clay for computers at BBN
AI Seminar Series
Who: Jon Ogborn
Professor of Science Education
Institute of Education
University of London
Title: Modelling clay for computers
Where: 6/471
Time: 12:30 - 1:30
Date: March 30, 1993
Abstract
How can students of all ages use the computer to model the real
world? Modelling systems which iteratively solve difference
equations are now common, and useful for older students. But
they require that the world be imagined as composed of variables,
not things. And they need some minimum mathematical
sophistication. This paper discusses two new modelling tools
suitable for quite young students, which could provide an
introduction to modelling. One tool allows systems of variables
to be constructed, without having to specify mathematical
relations between them. The other provides for interacting
objects whose behaviour can be specified, again without
mathematics, through drawing "before and after" pictures to
express interactions of objects. It is argued that the different types
of models fit naturally into a developmental sequence, matching
modelling at various ages to students' intellectual growth. A
radical re-sequencing of teaching about Mathematics in Science is
proposed.
To create a world, whether constituted of variables or of objects,
and to watch it evolve is a remarkable experience. It can teach
one what it means to have a model of reality, which is to say
what it is to think. It can show both how good and how bad such
models can be. And by becoming a game played for its own sake
it can be a beginning of purely theoretical thinking about forms.
The microcomputer brings something of this within the reach of
most pupils and teachers.
------------------------------
To: nl-kr@cs.rpi.edu
Date: Wed, 17 Mar 93 16:45:17 -0500
From: Frank Wm Tompa <fwtompa@daisy.uwaterloo.ca>
Subject: CFP: New OED Conference - Making Sense of Words
CALL FOR PAPERS
MAKING SENSE OF WORDS
9th Annual Conference of the
University of Waterloo Centre for the New OED and Text Research
September 27 - 28, 1993
St. Cross Building
Oxford, England
The Ninth Annual Conference of the University of Waterloo Centre
for the New OED and Text Research, jointly sponsored by the
University of Waterloo and the Oxford University Press, will be
held at St. Cross Building (with accommodations at St. Edmund
Hall), Oxford, England, on September 27-28, 1993.
This year's conference will focus on computational solutions to
problems of equivalence among words and phrases. Within lexicography,
one of the most important problems in this area is one of
grouping equivalents: sifting through corpus citations to form
sense groups. Within lexicology and computational linguistics,
there are problems of finding equivalents: matching citations to
dictionary senses, aligning one dictionary's senses with
another's, and aligning parts of texts with their translations.
In related fields, there are problems of forming equivalents:
generating translations, expanding full-text queries to include
synonyms, and tailoring texts to suit specific audiences.
Conference participants will again include researchers from computer
science and the humanities, as well as representatives from
publishing houses and other industries.
Papers presenting original research on theoretical and applied
aspects of the theme are being sought. Typical but not exclusive
areas of interest include computational lexicology, computational
linguistics, syntactic and semantic analysis, computational
lexicography, lexical databases, computer-assisted translation, and
online reference works.
Submissions will be refereed by the program committee listed
below. Authors should send seven copies of a detailed abstract
(5 to 10 pages) by April 27, 1993, to:
Prof. Frank Tompa, Program Chair
UW Centre for the New OED and Text Research
University of Waterloo
Waterloo, Ontario, Canada N2L 3G1
or
email: newoed@uwaterloo.ca
or
fax: 519-885-1208
Late submissions risk rejection without consideration. Authors
will be notified of acceptance or rejection by June 18, 1993. A
working draft of the paper, not exceeding 15 pages, will be due
by July 16, 1993, for inclusion in proceedings which will be made
available at the conference.
Program Committee
Beryl T. Atkins (Oxford University Press)
Kenneth Church (AT&T Bell Laboratories)
Eduard Hovy (University of Southern California)
Nancy Ide (Vassar College)
Robert Ingria (BBN Laboratories)
Frank Tompa, Chair (University of Waterloo)
------------------------------
To: nl-kr@cs.rpi.edu
From: Jean-Pierre Laurent <jplaure@imag.fr>
Date: Tue, 16 Mar 1993 17:49:24 +0100
Subject: Announcement: IJCAI-93 server
***************************************************************
* INFORMATION ABOUT IJCAI-93, USING THE EMAIL IJCAI SERVER *
***************************************************************
The IJCAI server contains the Conference Brochure of IJCAI-93
and the list of accepted papers.
To access this information, send mail to the
IJCAI server, as follows:
* First, to obtain the content of the IJCAI server,
send a mail to
ijcai-serv@imag.fr
the subject can be empty (or anything you want),
the content must be:
index
You will receive a reply with the list of all available files
in the IJCAI server (name and brief description of the content).
* Second, to receive the file NAME, send a new mail to the
same address:
ijcai-serv@imag.fr
the subject is again empty or anything you want,
the content must be :
get NAME
You will receive a reply with the content of the file NAME.
***************************************************************
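The two-step exchange described above can be sketched as follows. This only composes the request messages; it does not send them, and nothing here is taken from the server's own documentation. The helper names (build_request, index_request, get_request), the sender address and the placeholder subject are all hypothetical. Only the server address and the "index" and "get NAME" bodies come from the announcement.

```python
from email.message import EmailMessage

SERVER_ADDRESS = "ijcai-serv@imag.fr"  # address given in the announcement


def build_request(sender: str, command: str) -> EmailMessage:
    """Compose a mail whose body is a single server command."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = SERVER_ADDRESS
    msg["Subject"] = "request"  # hypothetical; the subject may be empty or arbitrary
    msg.set_content(command)
    return msg


def index_request(sender: str) -> EmailMessage:
    """Step 1: ask for the list of available files."""
    return build_request(sender, "index")


def get_request(sender: str, name: str) -> EmailMessage:
    """Step 2: ask for the file called `name`, taken from the index reply."""
    return build_request(sender, f"get {name}")


if __name__ == "__main__":
    # "user@example.org" is a placeholder sender address.
    print(index_request("user@example.org"))
```

Actual delivery is left out on purpose, since it depends on the local mail setup.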
- -
JP Laurent
------------------------------
To: nl-kr@cs.rpi.edu
To: comp-ai-nlang-know-rep
From: axs@cs.bham.ac.uk (Aaron Sloman)
Newsgroups: comp.ai,comp.ai.edu,comp.ai.neural-nets,comp.ai.nlang-know-rep
Subject: Announcement: AISB93 Dinner speaker
Date: 18 Mar 93 23:06:23 GMT
Organization: School of Computer Science, University of Birmingham, UK
I am very pleased to announce that Professor Derek Partridge, University
of Exeter, has agreed to give the "after dinner" talk at the Conference
Banquet on Thursday 1st April in the City of Birmingham's Repertory
Theatre.
His title is
"If you think connectionism killed AI wait till you hear
what it did to computer science."
Reminder: the AISB93 conference, at the University of Birmingham
March 30th to April 2nd has the theme "Prospects for AI as the
General Science of Intelligence". There are very large reductions for
student registrations. Full registration (excluding accommodation and
meals) 175 pounds (+30 pounds for non AISB members). 40 pounds for
full time students.
* For a programme and registration form please email the auto-reply
service aisb93-info@cs.bham.ac.uk
Brochures and posters available from:
* Other enquiries: AISB'93, School of Computer Science, The University of
Birmingham, Edgbaston, Birmingham, B15 2TT, U.K.
Phone: +44-(0)21-414-3711 Fax: +44-(0)21-414-4281
Email aisb93-prog@cs.bham.ac.uk
Aaron Sloman (Programme Chair)
=======================================================================
------------------------------
To: nl-kr@cs.rpi.edu
From: rocltsh@iis.sinica.edu.tw
Subject: Announcement: Corpus-Based Frequency Count of Modern Chinese
Date: Tue, 16 Mar 93 16:20:04 EAT
Corpus-Based Frequency Count of Modern Chinese
Corpus-based study of Chinese is one of the research projects of
the Chinese Knowledge Information Processing Group (CKIP) at
Academia Sinica. The current research is based on a Chinese
newspaper corpus, which amounts to 20,698,116 characters
(9,540,444 words after word segmentation). Four technical reports
in Chinese are published. These include:
1. Corpus-Based Frequency Count of Characters in Journal Chinese
30 pages (US$ 5)
2. Corpus-Based Frequency Count of Words in Journal Chinese
300 pages (US$ 20)
3. The Most Frequent Verbs in Journal Chinese and Their
Classification
140 pages (US$ 10)
4. The Most Frequent Nouns in Journal Chinese and Their
Classification 150 pages (US$ 10)
The first report lists 5,666 distinct characters which appear in
the entire corpus. The second report contains 42,686 words that
occur more than three times in the corpus. The most common 14,956
words constitute more than 99.9995 percent of all the words
occurring in the corpus. The third and fourth reports include
19,907 verbs and 21,368 nouns, respectively, which occur more than
twice in the corpus with their syntactic or semantic
classification. To order, please list the desired title(s) and
enclose a cheque of the appropriate amount payable to the
Computational Linguistic Society of the R.O.C. (ROCLING). The
prices listed above include postage and handling.
Address : Miss Tsai Shu-hui
ROCLING
Institute of Information Science
Academia Sinica, Nankang
Taipei, Taiwan 11529
R.O.C.
Tel. : 886-2-788-1638
Fax : 886-2-788-1638
E-Mail : rocltsh@iis.sinica.edu.tw
------------------------------
To: nl-kr@cs.rpi.edu
From: "Henry S. Thompson" <ht@cogsci.edinburgh.ac.uk>
Date: Thu, 18 Mar 93 23:03:02 GMT
Subject: Announcement: HCRC Map Task Corpus on CD
The HCRC Map Task Corpus
The Human Communication Research Centre (HCRC) is happy to announce
the release of the Map Task Corpus. The Map Task Corpus is a set of 8
CD-ROMs containing linked audio and transcriptions of a total of about
18 hours of spontaneous speech that was recorded from 128 two-person
conversations according to a detailed experimental design.
Altogether, the corpus as distributed provides a thorough and
invaluable set of resources and tools for use in analyzing all levels
of linguistic structure, via both text-based and speech-based
investigation. The range of research questions that are addressable
using this corpus spans a wide spectrum of linguistic and cognitive
issues. We have kept the price as low as possible to encourage
researchers from many disciplines to use this corpus as a common
reference point for many different kinds of research.
The HCRC is an interdisciplinary research centre at the Universities
of Edinburgh and Glasgow, supported by the UK Economic and Social
Research Council and the Universities Funding Council. The publication
of the Map Task Corpus was made possible by assistance from the
Linguistic Data Consortium.
Corpus Details
64 different speakers, 32 female, 32 male, all adults, each took part
in four conversations in a quiet recording studio. They were all
students at the University of Glasgow, 61 of them being native Scots.
The conversations were carried out in an experimental setting in which
each participant has a schematic map in front of them, not visible to
the other. Each map is comprised of an outline and roughly a dozen
labelled features (e.g. "a white cottage", "an oak forest", "Green
Bay", etc). Most features are common to the two maps, but not all. One
map has a route drawn in, the other does not. The task is for the
participant without the route to draw one on the basis of discussion
with the participant with the route. In addition to the conversations,
each speaker provides a wordlist reading, consisting of the major
vocabulary items contained in the conversations. All recordings were
direct to Digital Audio Tape (DAT) at 48 kHz, providing very good
acoustic quality.
The experimental design allows a number of different phonemic,
syntactico-semantic and pragmatic contrasts to be explored in a
controlled way. In particular, maps and feature names were designed
to allow for controlled exploration of phonological reductions of
various kinds in a number of different referential contexts, and to
provide, via varying patterns of matches and mis-matches between the
two maps, a range of different stimuli for referent negotiation. Also
the conditions of the conversations were carefully balanced: In half
of them the speakers were strangers, in half friends; in half of them
the speakers could see each other's faces, in half they could not.
Subjects accommodated easily to the task and experimental setting, and
produced evidently unselfconscious and fluent speech. The syntax is
largely clausal rather than sentential, showing good turn-taking, with
modest amounts of overlap and interruption. The total corpus runs to
about 18 hours of speech, with the transcripts consisting of around
150,000 word tokens drawn from just over 2,000 word form types.
Transcription is at the orthographic level, quite detailed,
including filled pauses, false starts and repetitions, broken words,
etc. Considerable care has been taken to ensure consistency of
notation, which is thoroughly documented. Although the full
complexity of overlapped regions has not been reflected in the
transcriptions, such regions are clearly set off from the rest of the
transcripts. Transcripts are connected to the acoustic sampled data
by sample numbers marked every few turns.
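The distinction between word tokens and word form types used in the figures above can be made concrete with a minimal sketch. The function name token_type_counts is hypothetical, and the tokenisation (lower-casing and splitting on whitespace) is an assumption chosen for brevity; it is not the procedure used to produce the corpus counts.

```python
from collections import Counter


def token_type_counts(text: str) -> tuple[int, int]:
    """Return (number of word tokens, number of distinct word-form types).

    Tokenisation is plain whitespace splitting after lower-casing,
    an assumption made for illustration only.
    """
    tokens = text.lower().split()
    return len(tokens), len(Counter(tokens))


sample = "go left then go right then stop"
tokens, types = token_type_counts(sample)
print(tokens, types)  # 7 5
```

Here "go" and "then" each occur twice, so seven tokens reduce to five types.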
CD-ROM Contents
The waveform data are provided in "raw" (headerless) files (16-bit
samples, 20 kHz sample rate, 2 channels per conversation), and
alternative header files are provided for use with software based on
either the NIST "SPHERE" header structure or the European "SAM" header
structure. Transcriptions are provided for each conversation, marked
up with TEI-compliant SGML, in a minimally intrusive and easily
separated way. PostScript files of the map images used in the
experiments are provided, along with full documentation of the
experimental design and data collection protocol, resources for using
SGML tools on the transcriptions and other text materials, and an
extensive set of source code for performing basic signal processing
functions on the waveform data, such as down-sampling,
de-multiplexing, channel summation, and D/A conversion for Sun
workstations (including playback of segments selected via inspection
of transcripts in Emacs).
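As a rough illustration of the de-multiplexing mentioned above, the sketch below splits a headerless two-channel file into one sample list per channel. It is not the source code shipped on the CD-ROMs. The sample width, sample rate and channel count come from the announcement; that the samples are signed, that the channels are interleaved sample by sample, and the default byte order are all assumptions, and the function names are hypothetical.

```python
import array
import sys

SAMPLE_RATE_HZ = 20_000  # stated in the announcement
NUM_CHANNELS = 2         # stated in the announcement
BYTES_PER_SAMPLE = 2     # 16-bit samples, stated in the announcement


def demultiplex(raw: bytes, big_endian: bool = True) -> list[list[int]]:
    """Split raw interleaved 16-bit samples into one list per channel.

    Assumptions (not stated in the announcement): samples are signed,
    channels are interleaved sample by sample, and the byte order is
    whatever the caller selects (big-endian by default).
    """
    samples = array.array("h")  # signed 16-bit integers on common platforms
    if samples.itemsize != BYTES_PER_SAMPLE:
        raise RuntimeError("platform 'h' type is not 16 bits wide")
    frame_bytes = BYTES_PER_SAMPLE * NUM_CHANNELS
    usable = len(raw) - len(raw) % frame_bytes  # drop any trailing partial frame
    samples.frombytes(raw[:usable])
    if big_endian != (sys.byteorder == "big"):
        samples.byteswap()  # convert file byte order to native order
    return [list(samples[ch::NUM_CHANNELS]) for ch in range(NUM_CHANNELS)]


def sample_to_seconds(sample_number: int) -> float:
    """Convert a per-channel sample number to a time offset in seconds."""
    return sample_number / SAMPLE_RATE_HZ
```

The second helper shows how a sample number of the kind marked in the transcripts would map to a time offset at the stated 20 kHz rate, again assuming the numbers count samples per channel.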
The CD-ROMs are in High Sierra (ISO 9660) format with the RockRidge
extensions, and are compatible with (inter alia) Unix, MS-DOS and
Macintosh operating systems.
Copies of the Map Task Corpus are available from the LDC for $200 or
from HCRC for 164.50 UK pounds (including VAT) at the addresses given
below, plus postage and packing as necessary. Please contact us (by
e-mail if possible) for details of payment methods and shipping costs.
In Europe please contact
Henry Thompson
University of Edinburgh
Human Communication Research Centre
2 Buccleuch Place
Edinburgh EH8 9LW
Scotland
Tel: +44 31 650-4440
Fax: +44 31 650-4587
email: maptask@cogsci.ed.ac.uk
or
Dawn Griesbach
ELSNET
2 Buccleuch Place
Edinburgh EH8 9LW
Scotland
Tel: +44 31 650-4594
Fax: +44 31 650-4587
email: elsnet@cogsci.ed.ac.uk
Outside Europe please contact
Elizabeth Hodas
Linguistic Data Consortium
441 Williams Hall
University of Pennsylvania
Philadelphia, PA 19104-6305
Tel: (215) 898-0464
Fax: (215) 573-2175
email: ehodas@unagi.cis.upenn.edu
------------------------------
End of NL-KR Digest
*******************