IRList Digest Volume 3 Number 21

IRList Digest           Thursday, 6 August 1987      Volume 3 : Issue 21 

Today's Topics: 
   Email - Problems, plans for IRList 
   Address - Dr. M.B. Koll, Personal Library Software 
   Query - Contact for obtaining SMART 
         - Suggestions for providing online access to Canadian Tax Act 
   Seminar - Responsa system demonstration 
           - Short-context disambiguation in large text databases 

News addresses are ARPANET: fox@vtopus.cs.vt.edu  BITNET: foxea@vtvax3.bitnet 
   CSNET: fox@vt   UUCPNET: seismo!vtisr1!irlistrq 

---------------------------------------------------------------------- 

Date: Thu, 6 Aug 87 15:50:33 edt 
From: fox (Ed Fox) 
Subject: electronic mail problems and plans relating to IRList 

1. Recent problems 
  Two weeks ago we had lightning hits that caused around $40K of 
damage to our departmental computers.  The machine that IRlist is 
usually composed on was down for that period, so it has been difficult 
to get news out. I will attempt to catch up on this in the next week. 
If you sent in news and it does not appear soon, please send your 
communication in again, since some messages were lost.  I apologize 
for any inconvenience. 

2. Disappearance of seismo as UUCP connection 
By 1 September, the machine called "seismo" that is at the Center for 
Seismic Studies will stop serving as a polling center for UUCP mail. 
Please stop using seismo!vtisr1!fox as a UUCP address to reach me. 
We will have our machine "vtopus" connected to several other UUCP 
machines, so fox@vtopus.uucp or an address with the appropriate route 
should work as a replacement.  I do not encourage UUCP traffic, but if 
it is necessary, use vtopus!fox rather than vtisr1!fox since vtisr1 is 
becoming more isolated than before. 

3. Connection to the ARPANET 
By early September there will be some changes, hopefully improvements, to 
IRList mail handling.  The main point is that our machine "vtopus" 
will eventually become the central point for all IRList business.  Virginia Tech 
is now part of SURANET, which is part of NSFNET, and so we are on the DARPA  
Internet.  When we get all the addressing and other software issues corrected,  
vtopus will be accessible for FTP and other services.   I will post information 
when it is available and when we have finished testing.  At that time, 
people who want access to back issues in quantity will be able to get 
direct access; up till then I will honor requests for small numbers of 
back issues.  Later, vtopus will also be on BITNET, so UUCP, ARPANET, 
and BITNET mail will be from one place. 

4. Interim situation 
Meanwhile, please try to send mail to my BITNET address, 
foxea@vtvax3.bitnet, which will always remain as an option for 
reaching me.  ARPANET and CSNET members can reach that with address 
foxea%vtvax3.bitnet@wiscvm.wisc.edu and BITNET members can reach it 
directly.  The address for vtopus is now and will continue to be 
fox@vtopus.cs.vt.edu but I prefer it not be used a great deal till our 
ARPANET connection is perfected. 
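
The gateway form above follows a simple pattern: the "@" in the BITNET
address becomes "%", and the gateway host is appended after a new "@".
A minimal Python sketch of that rewrite follows. The function name
`via_gateway` is hypothetical; the gateway host is the one given above,
and other gateways may expect a different form.

```python
def via_gateway(bitnet_address, gateway="wiscvm.wisc.edu"):
    """Rewrite user@host.bitnet as user%host.bitnet@gateway."""
    user, sep, host = bitnet_address.partition("@")
    if not sep or not user or not host:
        raise ValueError("expected an address of the form user@host")
    # The original "@" becomes "%" so that only the gateway host
    # follows the final "@" in the rewritten address.
    return f"{user}%{host}@{gateway}"


print(via_gateway("foxea@vtvax3.bitnet"))
```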

5. Help with address changes 
Please notify me in advance if you change address or wish to drop 
your subscription, unless you are handling these matters with someone 
who maintains a local redistribution.  Please try to give complete 
addresses, and if it is not obvious, indicate if your address is 
relative to BITNET or ARPANET or UUCPNET since it is sometimes hard to 
reach people.  If you stop receiving IRList, be sure to let me know 
and we can try to see what happened - I drop people when mailers tell 
me messages are not getting through. 

Thanks for your patience! - Ed 

------------------------------ 

Date: Thu, 6 Aug 87 15:58:43 edt 
From: fox (Ed Fox) 
Subject: Announcement from Dr. Matthew B. Koll 

Dr. Matthew B. Koll has asked me to announce his new address: 
  Personal Library Software 
  15215 Shady Grove Road 
  Rockville MD 20850 
  (301) 926-1402 
He is no longer with George Mason University, and has shifted efforts 
from his former company, KNM Inc., which marketed SIRE, to devote full 
time to Personal Library Software.  They have a package which is an 
enhanced version of SIRE. 

Dr. Koll does not now have an ARPANET address, so should be contacted 
directly at the address above.  He may have openings for experienced C 
programmers who are knowledgeable about information retrieval, and 
have some background in UNIX. 

------------------------------ 

Date: Fri, 24 Jul 87 16:28:09 PDT 
From: George Cross <cross@cs1.wsu.edu> 
Subject: SMART 

Hi, 
Do you have a contact for getting a copy of SMART from Cornell?  I remember 
seeing a license agreement posted some time ago and Don Kraft ordered one 
for LSU.  Thanks. 

 ---- George 

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
 George R. Cross                cross@cs1.wsu.edu 
 Computer Science Department    ...!ucbvax!ucdavis!egg-id!ui3!wsucshp!cs1!cross 
 Washington State University    faccross@wsuvm1.BITNET 
 Pullman, WA      99164-1210    Phone: 509-335-6319 or 509-335-6636 

[Note: contact chrisb@cornell.arpa by electronic mail, or write to 
Professor Gerard Salton at Cornell. - Ed] 

------------------------------ 

Date: Fri, 10 Jul 87 17:06:49 EDT 
From: seismo!mnetor!lsuc!dave 
Subject: Indexing of a complex statute for on-line retrieval 
      
We at the Law Society of Upper Canada are responsible for 
post-law school legal education in Ontario, both for call to 
the Bar (the Law Society governs the legal profession in the 
province and admits new members through the Bar Admission Course) 
and for continuing legal education. 
      
We've been using CAI for several years, particularly to teach 
Canadian income tax law.  Our tax courses are taken by over 1,000 
students a year plus a number of lawyers and others, and we're 
developing more advanced courses for lawyers' use. 
      
We have the opportunity to acquire an on-line version of the 
(Canadian) Income Tax Act, a rather massive statute.  In its 
published version, along with history of changes, regulations 
and various minor annotations, it's over 1400 pages.  I'm told 
the raw on-line data is something like 5-10Mb.  The publisher is 
interested in us putting the Act up on our system so they can 
gain experience in the "electronic publishing" field, and learn 
how it might be used and how it can best be organized for retrieval. 
They are therefore willing to let us have it for free. 
      
My interest is in making this tremendously useful information 
available to people who are on our system anyway for studying 
tax through CAI.  If the experiment is successful, we might look 
to putting other primary and secondary tax sources on-line in the 
future. 
      
Ours is a UNIX system, a Perkin-Elmer 3220 (roughly the power of 
a VAX-11/750) running UNIX version 7.  We're educational source-licensed 
for UNIX and can upgrade the license to System V if necessary. 
      
My question is: how should I go about putting the data up on-line? 
(We'll be getting the data in raw ASCII form from a different system.) 
We don't have a lot of time to devote to this, as we're very busy 
with other projects.  Are there existing tools I can make use of? 
      
At the most primitive level, I imagine I would just stick the 
data into a UNIX file and give people existing tools like "grep" 
and "more" for searching and browsing through it.  I can imagine 
indexing the section and subsection numbers too, perhaps by 
location in the file so the user could seek to the right provision 
quickly.  I'm a real novice in the field of information retrieval, 
however. 
      
I'd appreciate any suggestions as to (1) quick solutions or existing 
tools which will make the data more usable; (2) references to literature 
on storage/retrieval of complex statutes; and (3) specific ideas of 
more complex indexing or retrieval mechanisms that we might implement 
down the road.  Many thanks. 
      
David Sherman 
Computer Education Facility 
The Law Society of Upper Canada 
Osgoode Hall 
Toronto, Canada  M5H 2N6 
      
dave@lsuc.uucp        +1 416 947 3466 
{ seismo!mnetor  pyramid!utai  decvax!utcsri  ihnp4!utzoo } !lsuc!dave 

[Note: There are various retrieval packages that might work. The SMART 
system is available from Cornell for a nominal charge, but may not run 
on your hardware/software.  The Personal Librarian would probably work 
and Matt Koll could tell you. See other msgs in this digest for 
contact information about these two systems.  There are many others 
around, and many people working on legal information retrieval - I 
hope some will contact you with details and you will let us know what 
you decide. - Ed] 
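
The "index by location in the file" idea from the query above can be
sketched in a few lines of Python. This is only an illustration under
stated assumptions: the layout of the publisher's data is not described,
so the heading pattern (a line beginning with a section number such as
"12." or "12(1)") and the names `build_index` and `read_provision` are
hypothetical.

```python
import re

# Hypothetical heading pattern: a line that starts with a section number,
# optionally followed by a subsection in parentheses, e.g. "12." or "12(1) ".
HEADING = re.compile(rb"^(\d+(?:\(\d+\))?)[.\s]")


def build_index(path):
    """Map each section label to the byte offset of its heading line."""
    index = {}
    offset = 0
    with open(path, "rb") as f:
        for line in f:
            m = HEADING.match(line)
            if m:
                # Keep the first occurrence if a label appears twice.
                index.setdefault(m.group(1).decode("ascii"), offset)
            offset += len(line)
    return index


def read_provision(path, index, label, max_bytes=2000):
    """Seek straight to a provision and return the text that starts there."""
    with open(path, "rb") as f:
        f.seek(index[label])
        return f.read(max_bytes).decode("ascii", errors="replace")


# Usage, assuming a file named "act.txt" exists:
#   index = build_index("act.txt")
#   print(read_provision("act.txt", index, "12(1)"))
```

The index is built in one pass and is small compared with the text, so it
could be saved and reused. A pattern this loose will also match body lines
that happen to begin with a number, so it would need tightening against
the real data.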
      
------------------------------ 

Date: Thu, 6 Aug 87 16:49:24 edt 
From: fox (Ed Fox) 
Subject: Demonstration of RESPONSA System 

 YOU  ARE  INVITED  TO  AN  ONLINE DEMONSTRATION  OF  THE  RESPONSA   SYSTEM 
                    An advanced full-text retrieval system 
                      (with morphological processing) for 
                      2000 years of Rabbinical Literature 

                                       by 
                                 Yaacov Choueka 
                          Bell Communications Research 
                             Morristown, New Jersey 
                          (on sabbatical leave from the 
                 Department of Mathematics and Computer Science 
                     Bar-Ilan University, Ramat-Gan, ISRAEL) 

WHEN:	Wedn. August 12 from 1:30 - 3pm, and 7:30 - 9pm 
WHERE:	Newman Library, 6th floor board room 
WHAT:	Come and stop by if you would like to see 
    * An interesting full-text retrieval system with a 
      remarkably fast response time (despite some "hostile" 
      parameters such as the size of the database, the 
      complexity of the search, the long and not-so-reliable 
      telephone communications lines to Israel, and the 
      1200-baud transmission rate). 
    * An automatically lemmatized (in a context-free sense) 
      50-million-word corpus (probably the only 
      lemmatized one of this size in any language). 
    * A complete morphological component embedded in an 
      operational retrieval system. 
    * An online module for accurate and complete 
      morphological analysis of any word in the language. 
    * Some beginnings of applications of a short-context 
      approach (how many different "following neighbors" 
      are there for a given  ambiguous word with 200,000 
      occurrences?   How many of these neighbors occur 
      more than 1000 times, and which are they?    Do they 
      disambiguate the given word?   How can this 
      information be used in on-line retrieval or dictionary 
      building contexts?). 

WHO:	 
    Dr. Choueka has almost twenty years of experience in 
    teaching and research in computer science, some of it (in 
    the early years) in finite automata and formal languages 
    theory, but most of it in information retrieval, 
    computational linguistics and text processing. He was 
    part of the team that initiated the RESPONSA project in 1966, 
    and has served as its Director and Principal Investigator since 1975. 

------------------------------ 

Date: Thu, 6 Aug 87 16:50:05 edt 
From: fox (Ed Fox) 
Subject: Seminar on Disambiguation 

                          COMPUTER  SCIENCE  SEMINAR 
                             McBryde Hall Room 201 
                       Wedn. August 12,  10:15 - 11:30AM 

                             Short Is Beautiful: 
             Short-context disambiguation in large textual databases 

                                       by 
                                 Yaacov Choueka 
                          Bell Communications Research 
                             Morristown, New Jersey 
                          (on sabbatical leave from the 
                 Department of Mathematics and Computer Science 
                     Bar-Ilan University, Ramat-Gan, ISRAEL) 

 
ABSTRACT: 
   Morphological disambiguation (i.e., finding the 
intended "correct" meaning of an ambiguous word in a 
specific context) is an intellectually challenging and 
practically important issue in automatic text processing. One 
of the suggested pragmatic approaches, especially viable for 
large textual databases, the short-context method, proposes 
to use the (very) short context of an ambiguous word as an 
adequate vehicle for its disambiguation. An experiment 
carefully designed to test this idea and its validity was 
developed and applied to a small French corpus some time 
ago, and the results were recently reported elsewhere.    
Based on the clearly positive outcome of this test, an online 
short-context disambiguation program was incorporated as 
an operational component in the Responsa full-text retrieval 
system (Hebrew, 50 million words), and is now being tested 
on a large scale.  
   Using this program, the user can submit a word W to 
the system, which will respond by instantly displaying a list 
of all the different right (left) neighbors of W in the 
database, together with the neighbor's "local" frequency (its 
frequency as a neighbor of W), ranked by the local 
frequencies.   Preliminary findings show that more often 
than not such a short context of the word is enough to 
correctly disambiguate its appropriate occurrences. If 
needed, however, a further expansion of the right neighbor 
into the corresponding set of its right ones can again be 
displayed, giving the set of all the different two-word right 
contexts of the word under examination.            
   It was found that, in general, no more than a few 
minutes are required for a casual user to decide on the 
intended meaning of an ambiguous W in its most frequent 
contexts, thus resulting in the immediate disambiguation of 
thousands of occurrences of W in the text. When 
automatically recorded, the user's decisions can greatly help 
in achieving a "context-sensitive" lemmatization of the 
corpus, once its "context-free" one has been completed.   The 
method is also very useful in information retrieval contexts, 
where it gives the user an efficient tool for specifying, in a 
query with an ambiguous word, which of the word's 
contexts should  be retrieved, thus greatly enhancing the 
precision of the retrieval.    Finally, it is expected that by 
gradually accumulating these disambiguation decisions in the 
appropriate word-entry of the available automatic 
dictionary of the language, "local expert systems" for many 
ambiguous words will develop, that can greatly facilitate  
ambiguity resolution in practical situations.  
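
The neighbor lists described in the abstract can be illustrated with a
small Python sketch. It is not the Responsa implementation: it assumes
the text is already a list of tokens, uses a toy English sentence rather
than lemmatized Hebrew, and the function names are hypothetical.

```python
from collections import Counter


def right_neighbors(tokens, word):
    """Tokens immediately following `word`, ranked by local frequency."""
    counts = Counter(
        nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == word
    )
    return counts.most_common()


def left_neighbors(tokens, word):
    """Tokens immediately preceding `word`, ranked by local frequency."""
    counts = Counter(
        prev for prev, cur in zip(tokens, tokens[1:]) if cur == word
    )
    return counts.most_common()


def two_word_right_contexts(tokens, word, neighbor):
    """Expand one right neighbor into the tokens that follow `word neighbor`."""
    counts = Counter(
        third
        for first, second, third in zip(tokens, tokens[1:], tokens[2:])
        if first == word and second == neighbor
    )
    return counts.most_common()


text = "the bank of the river and the bank of england near the bank account"
tokens = text.split()
print(right_neighbors(tokens, "bank"))
print(two_word_right_contexts(tokens, "bank", "of"))
```

In the toy sentence the right neighbor "of" does not separate the two
senses of "bank", but its expansion into "the" and "england" does, which
is the two-word right context the abstract describes.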

------------------------------ 
      
END OF IRList Digest 
********************