NL-KR Digest Volume 12 No. 17
NL-KR Digest Tue Aug 24 21:16:02 PDT 1993 Volume 12 No. 17
Today's Topics:
FAQ: frequent questions about sources
Announcement: Report available - 36 Semantic Problems
CFP: ISIKNH'94, Integrating Knowledge and Neural Heuristics
Subscriptions, requests, policy: nl-kr-request@cs.rpi.edu
Submissions: nl-kr@cs.rpi.edu
Back issues are available from host ftp.cs.rpi.edu [128.213.3.254] in
the files nl-kr/Vxx/Nyy (e.g. nl-kr/V01/N01 for V1#1), or by gopher at
cs.rpi.edu, Port 70, choose RPI CSLab Anonymous FTP Server. Mail requests
will not be promptly satisfied. Starting with V9, there is a subject index
in the file INDEX. If you can't reach `cs.rpi.edu' you may want
to use `turing.cs.rpi.edu' instead.
BITNET subscribers: please use the UNIX LISTSERVer for nl-kr as given above.
You may send submissions to NL-KR@cs.rpi.edu as above
and any listserv-style administrative requests to LISTSERV@AI.SUNNYSIDE.COM.
-----------------------------------------------------------------------
Date: Tue, 17 Aug 1993 09:54:05 -0400
From: weltyc@cs.rpi.edu (Chris Welty)
Subject: FAQ: frequent questions about sources
Chris Welty has begun a FAQ (a response to frequently asked questions)
for this NL-KR newsgroup. Since many questions seem to deal with queries
about sources for lexicons, grammars, and the like, the beginnings of
this FAQ deal with some of these questions. If anyone has further
interesting information to contribute to this FAQ, please send it to
Chris at weltyc@cs.rpi.edu.
----------
From weltyc@cs.rpi.edu Tue Aug 17 06:55:35 1993:
A company called Circle-Noetics has several dictionaries available.
They also have software that hyphenates words on the fly. I don't
have any more information about the company at this time except that
they're based in Massachusetts. Their dictionaries were among the more
reasonably priced compared with other vendors'.
----------
From: ted@NMSU.Edu
Date: Wed, 5 Feb 92 12:21:47 MST
Subject: On line dictionaries (CLR)
The Consortium for Lexical Research is designed to serve as a focus
for research on lexical matters, and as a repository for software and
resources of importance to the lexical community.
While I don't know of anything that we have that is directly pertinent
to your interests, I am enclosing some information about our current
holdings. If any of these are of interest, you can obtain more
information via anonymous ftp to clr.nmsu.edu, or by sending me more
email. Many of these holdings are free to the public, while some are
only available to consortium members. Remember we have just started,
so our holdings are increasing quickly.
We also have a mailing list for discussion and announcements of
general interest to the lexical community. If you want to be part of
this mailing list, please ask.
If you would like to become a full-fledged member of the
Consortium, we can send you copies of the membership agreements.
----------
From: txsil!evan@utafll.uta.edu (Evan Antworth)
Date: Tue, 25 Feb 92 10:11:55 CST
Subject: English lexicon available
Englex is a morphological parsing lexicon of English intended for use with
PC-KIMMO and/or KTEXT. Its 20,000 entries consist of affixes, roots, and
indivisible stems. Both inflectional and derivational morphology are
analyzed. Englex will run under Unix, Macintosh, or MS-DOS (the files are
plain ASCII and are identical for all three versions). Because of memory
requirements, to run Englex under MS-DOS you will need a 386 CPU and
the new 386 versions of PC-KIMMO and KTEXT. These 386 versions will use all
available extended/expanded memory and virtual memory. They support
VCPI-compliant memory managers such as DOS 5.0's EMM386 and Quarterdeck's
QEMM. They do not support (or need) Windows.
All of this software can be downloaded by anonymous FTP from the Consortium
for Lexical Research at clr.nmsu.edu [128.123.1.11]. Send e-mail inquiries
to lexical@nmsu.edu. (For a listing of their holdings, get the file
catalog-short in the top directory.) Here are the subdirectories and
file names:
Directory: pub/tools/ling-analysis/englex_pckimmo
englex10.zip Zipped MS-DOS file of englex10
englex10.tar.Z Compressed UNIX tar file of englex10
englex10.hqx Stuffed, binhexed Mac file of englex10
Directory: pub/tools/ling-analysis/morphology/pc-kimmo
pckim108.zip Zipped MS-DOS file of pc-kimmo108 (inc. 386 version)
pckim108.tar.Z Compressed UNIX tar file of pc-kimmo108 sources
pckimmo108.hqx Stuffed, binhexed Mac file of pc-kimmo108
Directory: pub/tools/ling-analysis/morphology/ktext
ktext103.zip Zipped MS-DOS file of ktext103 (inc. 386 version)
ktext103.tar.Z Compressed UNIX tar file of ktext103 sources
ktext103.hqx Stuffed, binhexed Mac file of ktext103
Englex, PC-KIMMO, and KTEXT are offered as 'freeware' to the academic
community; your feedback is welcomed.
----------
[ The following info Courtesy of Computists International ]
Publisher/Editor: Dr. Kenneth I. Laws, 4064 Sutherland Drive,
Palo Alto, CA 94303, USA. Phone: (415) 493-7390.
Internet: laws@ai.sri.com. (Courtesy of SRI International.)
Copyright (C) 1992 by Kenneth I. Laws. The Computists' Communique
is a service to members of Computists International. Members may
make copies for backup or for recruiting, and may extract articles
if the copyright notice is retained.
Subject: Resources -- dictionaries; linguistics software:
Henk Smit has made available dictionaries in German (160,000
words), French (138,000 words), Dutch, Italian, and English
(53,000 words), in the dictionaries directory on ftp.cs.vu.nl
(Amsterdam). The German dictionary has been copied to
pelican.cit.cornell.edu. [Chaos Corner, 10/4. From WORDS-L@uga.bitnet.]
Dr. Chaos (Bob Cowles, rdc@cornella.cit.cornell.edu) has
copies of a Glossary of English acronyms (FTP, etc.) explained
in German. [Chaos Corner, 3/24.]
Info about the commercial Moby Lexical Database may be
obtained from Grady Ward (grady@btr.com, btr!public!grady@decwrl.dec.com),
(408) 373-1491. [NL-KR, 11/8.]
A new 100K-entry, 4M-word Chinese dictionary has been
published by Shanghai Foreign Language Education Press. Called
A Comprehensive Chinese-English Dictionary, it includes many new
words in philosophy, social and natural sciences, technology,
politics, economics, trade, education, sports, public health,
international law, tourism, and advertising. Inputs included 116
dictionaries studied by 30 experts. [Xinhua, 2/27. agentsee.]
China's first dictionary of cultural symbols is available from
Tianjin Education Publishing House. It includes 750K words in 3K
entries with 500 color plates. [Xinhua, 3/5. agentsee.]
Gene Ferber's English <--> Japanese Dictionary of Computer
and Data-Processing Terms is available through 6/30 for $39.95
from The MIT Press. (List $80.) Item 216 in catalog 2SALE,
(800) 356-0343.
A new release of EDICTJ, a Japanese/English dictionary, is now
available in the /pub/Nihongo directory of monu6.cc.monash.edu.au.
V92-015 is about 18,500 lines, twice the size of the previous
release. Over 6000 of the new lines are person/place names.
If you can't FTP, try UWollongong's ftpmail@cs.uow.edu.au server
with commands like "HOST monu6.cc.monash.edu.au <cr> GET
/pub/Nihongo/edictj". [Jim Breen (jwb@capek.rdt.monash.edu.au),
sci.lang.japan, 3/17.]
Tim Burress (burress@twics.co.jp) has compiled a
frequency-sorted list of kanji compounds, and Jason Molenda
(molenda@i1.msi.umn.edu) has made it available as
kanjistringlist.Z in ftp/pub/nihongo/tims.data on msi.umn.edu.
If you can't deal with the 23Kb compressed file, FTP the files
kanjistringlist1-3. [sci.lang.japan, 2/24.]
Networking in Japan is being hindered by a multiplicity of
kanji representations. PCs often use the bit-frugal Shift-JIS
encoding, which is incompatible with both the Japanese Industrial
Standard (JIS) code and the International Organization for
Standardization (ISO) 2022 code. Other flavors are AT&T's 16-bit
Extended Unix Code (EUC) and the ISO-compatible 14-bit New-JIS code
used on the Japan University Network (JUNET). [David Lammers,
EE Times, 1/27.]
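To make the incompatibility concrete, here is a small sketch. It assumes Python 3 and its built-in codecs "shift_jis", "euc_jp", and "iso2022_jp" (none of which the item above mentions). It encodes the word for "kanji" in each; ISO-2022-JP is the 7-bit, escape-sequence form of JIS that was documented from JUNET practice. Each encoding yields different bytes, so software that guesses the wrong one produces garbage.

```python
# Sketch (assumes Python 3's standard codecs): one word, three Japanese encodings.
TEXT = "\u6f22\u5b57"  # the two kanji for "kanji"

ENCODINGS = ("shift_jis", "euc_jp", "iso2022_jp")


def encode_all(text: str) -> dict:
    """Map each encoding name to the bytes that represent `text` in it."""
    return {name: text.encode(name) for name in ENCODINGS}


if __name__ == "__main__":
    encoded = encode_all(TEXT)
    for name, data in encoded.items():
        # Print the raw bytes in hex so the differences are visible.
        print(f"{name:10} {data.hex()}")
    # Misreading Shift-JIS bytes as EUC-JP yields mojibake or replacement characters.
    print(encoded["shift_jis"].decode("euc_jp", errors="replace"))
```

Only the ISO-2022-JP bytes stay within 7-bit ASCII, which is one reason that form was favored for mail and news transport.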
If you'd like to work with Japanese text, get Electronic Handling
of Japanese Text, by Ken Lunde (lunde@adobe.com). Version 1.2
can be FTP'd from directory japan/japanese on cs.arizona.edu,
but the current copy will always be on ucdavis.edu and msi.umn.edu.
[Rick Schlichting (rick@cs.arizona.edu), comp.research.japan,
3/22.]
The "proof" left-associative natural language parser, with
rules for English, is available from scam.berkeley.edu. FTP the
file /src/local/proof/README for instructions. [Craig R. Latta
(latta@xcf.berkeley.edu), NL-KR, 10/7.]
Englex is a free, 20K-entry morphological parsing lexicon
for use with the PC-KIMMO or KTEXT Mac/DOS/Unix programs.
Inflectional and derivational morphology are analyzed using
affixes, roots, and indivisible stems. Files are ASCII, and
large enough that DOS use requires the new 386 program versions.
They can be FTP'd from the Consortium for Lexical Research at
clr.nmsu.edu, directories pub/tools/ling-analysis/[englex_pckimmo,
morphology/pc-kimmo, and morphology/ktext]; queries to lexical@nmsu.edu.
[Evan Antworth (txsil!evan@utafll.uta.edu), NL-KR Digest, 2/27.]
Digest, 2/27.]
Finding names in text or checking them in database files
sounds like an interesting application. For training data, Mark
Kantrowitz (mkant@cs.cmu.edu) has compiled 2924 male first names
and 4964 female names. (Most are English, and the lists are not
random or complete in any sense. Also, there may be neutral names
that appear on only one list.) [comp.ai, 12/27.] (I'd love to
have a spelling checker that wouldn't trip over all the personal
names in the Communique -- or even one that can spell hippocampus.
While you're up, could I also have a grammar checker that
understands corporate and university names, software trademarks,
city names, street addresses, phone numbers, and net addresses?
Commercial parsing technology is nowhere near such intelligence,
so there must be some opportunities for doctoral theses and
related employment.)
SHOEBOX is an MS-DOS ASCII database-management program
for field linguists, written by John Wimbish of the Summer
Institute of Linguistics. You can use it to file cultural
notes, maintain lexicons, interlinearize text, do grammatical
analysis, and maintain address lists and catalogs in up to 7
open databases. Database entries can reference other databases.
SHOEBOX accommodates special sort orders and selection criteria,
and includes a flash card function for language learning.
Version 12a is freeware, available by FTP from
pd1:<msdos.linguistics>sh12a.zip on wsmr-simtel20.army.mil.
Archives that mirror SIMTEL20 are
/mirrors/msdos/linguistics/sh12a.zip on wuarchive.wustl.edu;
/pub/PC/simtel-20/linguistics/sh12a.zip on rana.cc.deakin.oz.au;
and /pub/msdos/science/linguistics/sh12a.lzh on nic.funet.fi.
[Evan Antworth (evan@txsil.lonestar.org), sci.lang, 10/21.]
If you want to write in more than one language, get
Multi-Lingual Scholar v4 for your PC. The $595 program --
$357 for students, $175/$225 upgrade from v3.2 -- comes with
Latin languages, Hebrew, Arabic, Cyrillic, and Greek, plus a
choice of three additional languages or fonts. Variant keyboard
layouts are available, and can be made to switch automatically
when you switch languages. A font editor permits design of new
fonts and keyboards, and linguists have used these to provide
Amharic, Aramaic, Bengali, Burmese, Coptic, Devanagari, Egyptian
hieroglyphics, Gujarati, Gurmukhi, Inuktitut, IPA, Korean, Lao,
Malayalam, Nepali, Phoenician, Pushtu, Sanskrit, Sinhalese,
Syriac, Tamil, Telugu, Thai, Ugaritic, Urdu, and Romanized
Vietnamese. It isn't quite a layout or desktop publishing program
-- no graphics and outline fonts -- but it does have snaking
columns, style templates, and other powerful features.
Ideographic languages like Chinese and Japanese are not supported,
nor are vertical scripts like Manchu, Mongol, Cambodian, and
Tibetan. Arabic and Devanagari script do have kashideh ("long
connectors"), and character variants at different word positions
(in Arabic, Hebrew, and Greek) are handled automatically.
Spelling checkers are available for common languages, and a
variety of export formats are supported. Proportional output at
300dpi looks very professional, although you may be limited to
9pt and 12pt fonts. Gamma Productions, Inc. (Santa Monica, CA),
(213) 394-8622. [Birrell Walsh, MicroTimes, 2/17.]
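The automatic handling of position-dependent letters mentioned above can be illustrated with a rough sketch. The Python below is an illustrative assumption and says nothing about how Multi-Lingual Scholar itself works: it swaps Greek sigma and the five Hebrew letters that have final forms for their word-final variants whenever no letter follows. It assumes unpointed text (no vowel marks), and Arabic contextual shaping, with up to four forms per letter, is beyond its scope.

```python
import re

# Illustrative table (assumption): base letter -> its word-final variant.
FINAL_FORMS = {
    "\u03c3": "\u03c2",  # Greek sigma -> final sigma
    "\u05db": "\u05da",  # Hebrew kaf -> final kaf
    "\u05de": "\u05dd",  # Hebrew mem -> final mem
    "\u05e0": "\u05df",  # Hebrew nun -> final nun
    "\u05e4": "\u05e3",  # Hebrew pe -> final pe
    "\u05e6": "\u05e5",  # Hebrew tsadi -> final tsadi
}

# A table letter counts as word-final when no word character follows it.
_FINAL_RE = re.compile("[" + "".join(FINAL_FORMS) + r"](?!\w)")


def apply_final_forms(text: str) -> str:
    """Replace word-final letters with their final variants; leave others alone."""
    return _FINAL_RE.sub(lambda m: FINAL_FORMS[m.group()], text)
```

For example, the Hebrew word for "water" typed with three ordinary letters (mem, yod, mem) comes back with only its last mem changed to final mem; the initial mem is left untouched because a letter follows it.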
Thesaurus Construction System (Professional Edition), by
Liu-Palmer (Los Angeles), is one of at least ten such programs for
PCs. For a discussion, see Jessica Milstead's "Thesaurus Software
Packages for Personal Computer," DATABASE, V12 N6, 12/90, pp. 61-65,
and her letter to the editor in the 6/91 issue. Ms. Milstead
is an independent consultant specializing in thesaurus and index
development; (203) 740-2433. [Paula Hane (phane@well.sf.ca.us),
PACS-L, 10/22.]
An evaluation of MS Windows grammar checkers by Ingram
Laboratories rated RightWriter (Que Software) more accurate than
default settings of Grammatik 2.0 (Reference Software Inc.) and
Correct Grammar 1.0 (Writing Tools Group Inc.). RightWriter's
6,500 rules flagged the most errors and made the fewest improper
suggestions. It is the only product that suggests splitting
complex sentences, flags long paragraphs, and challenges
unsubstantiated claims such as "expertise". [Business Wire, 3/9.
agentsee.] Perhaps the other companies are ready to buy some NLP
R&D.
-----------------------------------------------------------------------
From: Gabriele Scheler <scheler@informatik.tu-muenchen.de>
To: nl-kr@cs.rpi.edu
Subject: Announcement: Report available - 36 Semantic Problems
Date: Wed, 11 Aug 1993 13:29:42 +0200
The following report is available via anonymous ftp.
It contains a collection of problems, for which additions and
critical suggestions are especially welcome
(see end of abstract).
36 Problems for Semantic Interpretation
G. Scheler
Institut für Informatik
Technische Universität München
80290 München, Germany
e-mail: scheler@informatik.tu-muenchen.de
Abstract:
This paper presents a collection of problems for natural language analysis
derived mainly from theoretical linguistics.
Most of these problems present major obstacles
for computational systems of language interpretation.
The set of given sentences can easily be scaled up
by introducing more examples per problem.
The construction of computational systems could benefit from such a collection,
either using it directly for training and testing or as a set of benchmarks
to assess the performance of an NLP system.
This collection was started during work on a semantic interpretation
system. It could certainly be improved by a broader collective
effort. If the author receives a substantial number of additions,
revisions, and further suggestions, a second edition may be issued.
>ftp flop.informatik.tu-muenchen.de
or: ftp 131.159.8.35
>login: anonymous
>Password: e-mail address
>cd pub/fki
>binary
>get fki-179-93.ps.z [this seems really to be fki-179-93.ps.gz -Ed.]
>quit
(on your local machine)
gunzip fki-179-93.ps.z
Print using your local PostScript print command.
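The same session can be scripted. Below is a minimal sketch using Python's standard ftplib, gzip, and shutil modules. The host, directory, and file name are copied from the instructions above (taking the editor's .gz reading of the file name); the server dates from 1993 and may no longer respond, so treat this as illustrative only. The function names are hypothetical.

```python
import ftplib
import gzip
import shutil

HOST = "flop.informatik.tu-muenchen.de"  # or 131.159.8.35
REMOTE_DIR = "pub/fki"
REMOTE_FILE = "fki-179-93.ps.gz"  # per the editor's note, .gz rather than .z


def fetch(host: str, remote_dir: str, remote_file: str,
          local_path: str, email: str) -> None:
    """Log in anonymously, change directory, and download the file."""
    with ftplib.FTP(host) as ftp:
        # Anonymous login with your e-mail address as the password.
        ftp.login(user="anonymous", passwd=email)
        ftp.cwd(remote_dir)
        with open(local_path, "wb") as out:
            # retrbinary switches to binary (TYPE I) itself, like "binary".
            ftp.retrbinary(f"RETR {remote_file}", out.write)


def gunzip(src: str, dest: str) -> None:
    """Decompress a gzip file, as running gunzip on it would."""
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
```

For example, `fetch(HOST, REMOTE_DIR, REMOTE_FILE, REMOTE_FILE, "you@your.site")` followed by `gunzip(REMOTE_FILE, "fki-179-93.ps")` mirrors the manual steps; the e-mail address is a placeholder.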
-----------------------------------------------------------------------
Date: Fri, 13 Aug 1993 13:08:31 -0500
From: rsun@athos.cs.ua.edu (Ron Sun)
Subject: CFP: ISIKNH'94, Integrating Knowledge and Neural Heuristics
To: nl-kr@cs.rpi.edu
CALL FOR PAPERS
International Symposium on Integrating Knowledge and Neural Heuristics
(ISIKNH'94)
Sponsored by the University of Florida and AAAI,
in cooperation with IEEE Neural Network Council,
and Florida AI Research Society.
Time: May 9-10, 1994; Place: Pensacola Beach, Florida, USA.
A large amount of research has been directed
toward integrating neural and symbolic methods in recent years.
In particular, the integration of knowledge-based principles and
neural heuristics holds great promise
for solving complicated real-world problems.
This symposium will provide a forum for discussions
and exchanges of ideas in this area. The objective of this symposium
is to bring together researchers from a variety of fields
who are interested in applying neural network techniques
to augment existing knowledge, or in proceeding the other way around,
and especially those who have demonstrated that this combined approach
outperforms either approach alone.
We welcome views of this problem from
areas such as constraint-(knowledge-) based learning and
reasoning, connectionist symbol processing,
hybrid intelligent systems, fuzzy neural networks,
multi-strategic learning, and cognitive science.
Examples of specific research include but are not limited to:
1. How do we build a neural network based on a priori
knowledge (i.e., a knowledge-based neural network)?
2. How do neural heuristics improve the current model
for a particular problem (e.g., classification, planning,
signal processing, and control)?
3. How does knowledge in conjunction with neural heuristics
contribute to machine learning?
4. What is the emergent behavior of a hybrid system?
5. What are the fundamental issues behind the combined approach?
Program activities include keynote speeches, paper presentations,
and panel discussions.
*****
Scholarships are offered to assist students in attending the
symposium. Students who wish to apply for a scholarship should send
their resumes and a statement of how their research is related
to the symposium.
*****
Symposium Chairs:
LiMin Fu, University of Florida, USA.
Chris Lacher, Florida State University, USA.
Program Committee:
Jim Anderson, Brown University, USA
Michael Arbib, University of Southern California, USA
Fevzi Belli, The University of Paderborn, Germany
Jim Bezdek, University of West Florida, USA
Bir Bhanu, University of California, USA
Su-Shing Chen, National Science Foundation, USA
Tharam Dillon, La Trobe University, Australia
Douglas Fisher, Vanderbilt University, USA
Paul Fishwick, University of Florida, USA
Stephen Gallant, HNC Inc., USA
Yoichi Hayashi, Ibaraki University, Japan
Susan I. Hruska, Florida State University, USA
Michel Klefstad-Sillonville, CCETT, France
David C. Kuncicky, Florida State University, USA
Joseph Principe, University of Florida, USA
Sylvian Ray, University of Illinois, USA
Armando F. Rocha, University of Estadual, Brazil
Ron Sun, University of Alabama, USA
Keynote Speaker: Balakrishnan Chandrasekaran, Ohio State University
Schedule for Contributed Papers
----------------------------------------------------------------------
Paper Summaries Due: December 15, 1993
Notice of Acceptance Due: February 1, 1994
Camera Ready Papers Due: March 1, 1994
Extended paper summaries should be
limited to four pages (single or double-spaced)
and should include the title, names of the authors, the
network and mailing addresses and telephone number of the corresponding
author. Important research results should be attached.
Send four copies of extended paper summaries to
LiMin Fu
Dept. of CIS, 301 CSE
University of Florida
Gainesville, FL 32611
USA
(e-mail: fu@cis.ufl.edu; phone: 904-392-1485).
Students' applications for a scholarship should also be sent
to the above address.
General information and registration materials can be obtained by
writing to
Rob Francis
ISIKNH'94
DOCE/Conferences
2209 NW 13th Street, STE E
University of Florida
Gainesville, FL 32609-3476
USA
(Phone: 904-392-1701; fax: 904-392-6950)
---------------------------------------------------------------------
If you intend to attend the symposium, you may submit the following
information by returning this message:
NAME: _______________________________________
ADDRESS: ____________________________________
_____________________________________________
_____________________________________________
_____________________________________________
_____________________________________________
PHONE: ______________________________________
FAX: ________________________________________
E-MAIL: _____________________________________
---------------------------------------------------------------------
End of NL-KR Digest
*******************