ICTARI USER GROUP ISSUE 17 December 1994
___ ______ ___ _________ _________ ___
\__\ \ __\ \ \__ \______ \ \ _____\ \__\
___ \ \ \ __\ _____\ \ \ \ ___
\ \ \ \ \ \ \ ____ \ \ \ \ \
\ \ \ \_____ \ \____ \ \__\ \ \ \ \ \
\ \ \ \ \ \ \ \ \ \ \ \
\__\ \_______\ \______\ \________\ \__\ \__\
* m a g a z i n e *
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
I C T A R I U S E R G R O U P
63 Woolsbridge Road, Ringwood, Hants, BH24 2LX Tel. 0425-474415
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
INDEX FOR ISSUE 17
==================
ASSEMBLY GEM Window redraw routine.
Sprite packer/unpacker routines.
Rectangle intersection detection code.
Shareware program encryption system.
C Useful MagiC C functions.
GEM Tutorial by J White. Part 7.
Prime number generator.
GFA Mandelbrot generator.
ShadeBob, 4 bit real time shading.
GFA menu locating techniques.
STOS Factorial calculator program.
Wordsearch puzzle solving program.
Using STOS on the Falcon.
PASCAL Program to calculate shift bonuses, etc.
MISC Algorithm Corner. Roman/Arabic number conversion.
Barrel program. Printer output redirection.
Current membership list.
Index to Issues 1-15.
In next month's issue of ICTARI (this may change) :-
ASSEMBLY Roman to Arabic number conversion routine.
Arabic to Roman number conversion routine.
AES/VDI function reference sheets.
C GEM Tutorial by J White. Part 8.
GFA
BASIC CAD 3D to DXF converter program.
STOS Screen demo code routines.
PASCAL
MISC
For future issues :-
Polyline drawing routines in machine code.
Bezier curve drawing routines.
Picture compression routine for IMG pictures.
HP DeskJet/LaserJet compression routines (Mode 2 TIFF).
Picture switching techniques.
Printer driver code for printing mono/colour images.
Using Calamus fonts.
Sorting algorithms.
Using the BitBlit routines.
Code for using the File Selector.
Using multiple rasters in machine code.
Overscan techniques.
STOS sprites.
----------------------------------------------------------------------
EDITORIAL
=========
MEMBERSHIP
----------
Another three members joined this month; welcome to them.
FOREIGN LANGUAGES
-----------------
I am writing a GEM program (a Calamus font display/print program)
which could be used in other European countries. As we have members in
Sweden, Denmark, Finland, Belgium and Greece, perhaps they could let
us know the best way to convert a program to their native language.
If all the text displays could be confined to the associated resource
file, would it be possible to just rewrite the resource file with the
equivalent foreign words, or are there other complications? Does
the ATARI operating system contain all the different characters for
European languages (including Greek) or is the ROM itself different
for foreign countries? Are the keyboards different to the English
versions and if not, how do you enter characters with accents on them?
If anyone has information on writing programs for use in other
languages perhaps they could send it in to us for publication.
NEXT MONTH
----------
If next month's issue looks a little sparse, it's just because I
haven't had time to sort it out yet - too much time spent Christmas
shopping,
I think. Hopefully, most of you will spend some of the Christmas
holiday preparing some more articles for ICTARI.
----------------------------------------------------------------------
CORRESPONDENCE
==============
To: *.*
From: Martin Milner
Problem 1
---------
I couldn't get the fire button on my joystick to register anything in
Nick Bates' joystick routine (Ictari 15). My machine is a 2.5MB STE
with TOS 1.62. (It doesn't work with MagiC, either). It worked fine in
STOS.
I was going to write about it, but after much experimentation, I've
found it works if you disable the mouse first, so now I won't bother!
Problem 2
---------
I am writing a game in STOS making heavy use of the zoom command, but
zoom is so slow. Has anyone got a faster assembler routine which I
could use to replace it?
It also uses the reduce command, but I have an assembly routine
(kindly written by Lee Upcraft) which, hopefully, Martin Cubitt is
going to put into his EXTRA extension when he gets time.
If anyone could produce a zoom routine, then perhaps he could add that
too.
A quick comment on the subject of disk format, etc.
I prefer the disks to be standard 9 sector format and MSDOS
compatible. This is so that I can read them on my PC at work! I can
also print out any documentation there as well. The magazine is so
cheap, I can't see why anyone should think it's not value for money.
Also, the extra files that would have gone on an extended format disk
can go in the next issue, so making your available articles go
further!
*/ This is basically what we try to do; the problem is that sometimes
we get articles or programs which are very large. If we didn't
compress them or use a 10 sector format we would probably end up with
only one or two items on the disk. Generally though we will try and
keep to the standard format as much as possible. ICTARI /*
----------------------------------------------------------------------
To: Peter Hibbs
From: PoshPaws
As far as I can tell, most dictionaries use the same simple coding
technique:
All words are stored in alphabetical order and are subjected to a
peculiar form of delta coding (see last issue), such that only the
changes to the last word (and where the changes start) are stored.
e.g. ACTUAL,ACTUALISE,ACTUALISED,ACTUALISES,ACTUALISING,ACTUALIST
becomes ACTUAL (6) ISE (9) D (9) S (8) ING (8) T
where the numbers in brackets are ASCII numbers representing how many
letters to keep and are immediately followed by the letters to add to
make the new word.
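A minimal sketch of this delta (shared-prefix) coding in C, assuming
one raw count byte per word, a null separator and words shorter than
64 characters (the function name is made up for the example):

    #include <stdio.h>
    #include <string.h>

    /* Encode a sorted word list: for each word emit one count byte
       (letters shared with the previous word), then the new suffix
       and a null separator. */
    void delta_encode(const char *words[], int n, FILE *out)
    {
        char prev[64] = "";
        int i, keep;

        for (i = 0; i < n; i++) {
            keep = 0;
            while (words[i][keep] && words[i][keep] == prev[keep])
                keep++;                   /* length of shared prefix */
            fputc(keep, out);             /* how many letters to keep */
            fputs(words[i] + keep, out);  /* the letters to add */
            fputc(0, out);                /* word separator */
            strcpy(prev, words[i]);
        }
    }

Fed with ACTUAL, ACTUALISE, ACTUALISED and so on, this produces the
(6) ISE (9) D (9) S (8) ING (8) T stream shown above (with a leading
count of 0 for the first word); decoding simply keeps 'count' letters
of the previous word and appends the stored suffix.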
I would never recommend checking every entry in a dictionary and would
put 27 longwords pointing to the start of each letter. Deleting words
would just mean putting null bytes (0) where the word was and getting
the checking routine to ignore null words (of length zero).
Adding words would normally mean a re-sort of the file so the 27th
longword pointer would point to a miscellaneous area where words are
added. This would be searched if the first letter search was non-
successful. If the number of words searched in the letter section was
compared to the number of words searched in the miscellaneous section,
this could be used as a measure of when a re-sort is needed.
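One possible shape for that index in C (the layout and names here are
hypothetical, purely to illustrate the idea):

    #include <stdint.h>

    typedef struct {
        uint32_t section[27];  /* offsets of 'A'..'Z' plus the misc area */
    } dict_index;

    /* Offset of the section in which to start searching for a word;
       the misc section (index 26) is searched as well if the letter
       section fails. */
    uint32_t letter_offset(const dict_index *ix, const char *word)
    {
        if (word[0] >= 'A' && word[0] <= 'Z')
            return ix->section[word[0] - 'A'];
        return ix->section[26];
    }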
The idea of using extra bytes for letter combinations is a good one,
although smaller combinations would be more effective I think
(qu,th,ph,ch,sh,st,etc). This would mean extra pointers for words
starting with these combinations and would require that the word to be
searched for is converted (tokenised) into the same format.
Bits for common endings are great for things like Scrabble, where you
don't know the word you're looking for, but you would have to consider
the overheads involved in bit-testing the ending byte for the bits you
need. You could, with a bit of work, use the same technique for common
starts (pre,re,anti,etc).
On the other hand you could Huffcode the words as they stand and then
Huffcode the word you are looking for with the same tree and compare
bit patterns. You would find the space saving would be excellent, but
the search overheads would be quite large in terms of speed.
P.S. Quicksort is only fastest if the file to be sorted is not
already in order; with a simple pivot choice it becomes very slow on
a file that is already sorted. Use a heapsort for large sorts because
it is only about 20% slower than quicksort at worst and very much
faster on a file that is already nearly in order.
To: Dick Teuber
There are a few programs that do what you require in rerouting printer
output;
I have included the program BARREL on disk, which should be able to do
what you require for smaller printouts. If your printing programs use
GEMDOS calls, you can write a small program to reroute output by
opening a file and 'Fforce'ing the printer handle to that file handle
before your program is called, as sketched below.
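A minimal sketch of that idea in C, assuming GEMDOS bindings from
<osbind.h>; PRINTOUT.TXT and PROG.TOS are example names only, and
GEMDOS standard handle 3 is the printer (PRN:):

    #include <osbind.h>

    int main(void)
    {
        long fh = Fcreate("PRINTOUT.TXT", 0); /* file to catch the output */
        if (fh < 0)
            return 1;
        Fforce(3, fh);                /* printer handle now points at the file */
        Pexec(0, "PROG.TOS", "", ""); /* run the program that prints */
        Fclose(fh);
        return 0;
    }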
There is a program floating about called re-route (which I have
somewhere) that re-routes to the serial port, but I don't know if it
will route to a file.
If you're still stuck, let us know exactly why you're stuck and
something can easily be written to do what you need.
To: Marten Lindstrom
ALL Atari Screens are WORD INTERLEAVED (including all TT modes that I
am aware of). Mono can be thought of as 1 plane of word interleaved.
High Colour (previously called True Colour) is a special case where
each word is a pixel colour value and can be thought of as one plane
of word sized pixels.
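As a concrete illustration, a pixel in 4-plane word-interleaved ST
low resolution (320x200, 160 bytes per line) can be reached like this
in C (a sketch only):

    #include <stdint.h>

    /* At each 16-pixel column the four plane words lie consecutively
       in memory, so a scanline is 80 words long. */
    void set_pixel(uint16_t *screen, int x, int y, int colour)
    {
        uint16_t *chunk = screen + y * 80 + (x >> 4) * 4;
        uint16_t mask = 0x8000 >> (x & 15);
        int plane;

        for (plane = 0; plane < 4; plane++) {
            if (colour & (1 << plane))
                chunk[plane] |= mask;   /* set the bit in this plane */
            else
                chunk[plane] &= ~mask;  /* clear it */
        }
    }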
As far as I can tell, VsetRGB waits for the VBL to do the actual
update.
----------------------------------------------------------------------
To: Peter Hibbs
From: Keith Baines
Text compression in dictionary files.
You asked about this in the November 1994 Ictari disk. I did some
assembler programming using dictionary files a few years ago when I
developed a spelling checker desk accessory to use with the Wordup
word processor. This used a mixture of RAM and disk based dictionaries
and coded words to keep file sizes and memory usage down. The whole DA
would be too long to include in the disk magazine (and I guess that
spelling checkers are a minority interest among members) and in any
case is no use to anyone without the commercial program it was
designed to complement. So I've extracted the main routines and put
them in a TOS program to demonstrate how they work, and written up a
commentary of why I chose those particular dictionary formats.
*/ Thanks for the code and programs, we will publish them early next
year. Keith's code also uses the techniques described by Posh
Paws above and should be of interest to anyone using text
compression. See also next letter. ICTARI /*
----------------------------------------------------------------------
To: Peter Hibbs (and others interested)
From: Mårten Lindström
DICTIONARY FILES
It so happens that I made some investigations of the Protext
dictionary files a while ago. Unfortunately the personal notes I made
are a bit messy, and when reading them now I discovered a few
suspicious things that make me not trust them 100%. (But if you or
anyone else are interested I could tidy them up.) I think you only
wanted the general ideas though, rather than the exact file format, so
here goes:
The words in all Protext dictionary files are stored in (kind of)
alphabetical order. And for each word is stored a number that tells
how many letters to remove from the end of the previous word when
deriving the current one. Plus the letters to add to this root.
E.g. 'shock' and 'short' could be stored, figuratively, as 'shock2rt'
(ascii isn't really used for either the letters or the count numbers).
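In C, deriving one word from the previous one under that scheme might
look like this (a hypothetical sketch, not Protext's actual code):

    #include <string.h>

    /* Keep all but 'remove' letters of the previous word, then append
       the new letters; e.g. "shock" with remove=2, add="rt" -> "short". */
    void derive(char *word, int remove, const char *add)
    {
        word[strlen(word) - remove] = '\0';
        strcat(word, add);
    }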
The range of possible letters excludes special uppercase versions but
includes hyphen and apostrophe. (See below for names/abbreviations.)
The information unit in the LEX files is a byte, i.e. it has a range
of 256 different possible values, much more than needed to cover each
of the used letters in any (western) language. The excess is partly,
as you guessed, used as codes for common endings, partly for common
two-letter combinations to be used anywhere in a word (about a hundred
of each) and partly for the 'removal counts' mentioned above (0-15,
more than one code can be combined), that also - in the main LEX files
- serve as separators between each word. Each of the ending codes
actually has a count/separator function "built in", which means that
they often add as little as one byte per word to the (main)
dictionary. (The user LEX files are not as tightly built as the main
ones and each word will here take up at least 6 bytes.)
A few codes are also used for special functions and the last possible
code value, 255, is used as an 'escape' for the following byte, thus
making available a further 255 possible values. I haven't seen all of
the codes used, though, and of the 'escaped' codes only a few. These are
codes at the end of a word to mark it as name or abbreviation, in
which the first or all letters are required to be upper case, or/and
the last or all letters are followed by a stop. One code is used to
add the literal ascii spelling of a word with foreign letters. For
instance if English is used and a word like, say, 'Mårten' is stored,
something like 'Marten<AsciiBegin>Mårten<AsciiEnd>' would be written
(where the first Marten wouldn't be in ascii but encoded). An ascii
supplement would also be used for words with irregular use of
uppercase letters or stops (neither all of the letters nor just the
first/last of them affected). Finally, I made this investigation
before automatic hyphenation was implemented in Protext but maybe you
could now expect to see some codes used to mark deviations from the
general hyphenation rules.
In the QIC files the information unit is a nybble (half a byte), so
its number of possible values is only 16. But by using the last
possible value - 15 - as an escape code a further 15 values are made
available for a total of 31, enough for an alphabet but with no great
margins. (In fact all of the 31 codes are used as letters in the
Swedish QIC file. In the English file 'only' 28 including hyphen and
apostrophe. The assignment is made through a table in the file header,
a list of letters in order of frequency - most frequent letters first.
The first 15 letters will be assigned one-nybble codes, the rest will
get the 2-nybble ones.) To keep track of what each nybble means, the QIC
dictionary is divided into subsections with words of a fixed length in
each. The program thus always knows where each word ends and what
nybble to interpret as count rather than letter. And since a count of
zero would make no sense with a fixed word length, this can be used as
an escape for e.g. names/abbreviations (using uppercase letters or
stops).
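In C, reading such a stream might look like this (a hypothetical
reader, not Protext's actual code):

    #include <stdint.h>

    typedef struct {
        const uint8_t *buf;
        long pos;                 /* position counted in nybbles */
    } nyb_reader;

    static int next_nybble(nyb_reader *r)
    {
        uint8_t byte = r->buf[r->pos >> 1];
        int n = (r->pos & 1) ? (byte & 0x0F) : (byte >> 4);
        r->pos++;
        return n;
    }

    /* Values 0-14 stand for themselves; 15 escapes to a second nybble,
       extending the range to codes 15..30 (31 values in all). */
    int next_code(nyb_reader *r)
    {
        int n = next_nybble(r);
        if (n == 15)
            n = 15 + next_nybble(r);
        return n;
    }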
But the QIC format may not be ideal for adding and removing words
to/from the list. The LEX files are split into 1 kb sections (spookily
the exact size of a DOS cluster on disk), each of which is begun with
an uncompressed word (not deriving from earlier words). This word is
also stored in an index for the dictionary, so that the program
quickly can establish in which section a certain word could possibly
be found. (In the main LEX files this lead word is removed from the
dictionary itself to save a little space. These files also have each
1K section divided into 8 subsections, not represented in the index
and of lengths variable in bytes but fixed in number of words.)
The user LEX files use a random order for the 1K sections, each of
them with a header containing the number of the next and the number of
bytes actually used in this section. By keeping a little workspace in
each section you would usually be able to insert words without having
to rearrange the data in more than one section. Some space is wasted,
but if you feel the need for it you could perhaps sometimes do a
global reorganization.
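For illustration only, such a section could look like this in C; the
field names and sizes are guesses, not the real Protext layout:

    #include <stdint.h>
    #include <string.h>

    typedef struct {
        uint16_t next;            /* number of the next section */
        uint16_t used;            /* bytes actually used in data[] */
        uint8_t  data[1024 - 4];  /* word data plus spare workspace */
    } lex_section;

    /* Insert 'len' bytes at offset 'at' (0 <= at <= used); fails when
       the spare workspace is exhausted and a reorganisation is needed. */
    int section_insert(lex_section *s, int at, const uint8_t *src, int len)
    {
        if (s->used + len > (int)sizeof(s->data))
            return 0;
        memmove(s->data + at + len, s->data + at, s->used - at);
        memcpy(s->data + at, src, len);
        s->used += len;
        return 1;
    }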
SOME FURTHER NOTES: If I were to design a dictionary (at least for a
language with more inflected noun forms than English - and I'm
thinking then, unsurprisingly, of Swedish) I would probably include an
inflection class code with each word. This could save space and, more
importantly, I have found it very frustrating when using Protext to be
repeatedly warned about words that I know I have already added to the
user dictionary - only in a differently inflected form. Not many
classes are needed, one of which would be 'irregular', meaning that
each form is stored as a separate word.
But, admittedly, for English - with its correspondence between
pronunciation and spelling - the phonetic information stored in the
Protext LEX files may be more useful, especially for someone like me.
(And thank you editor for correcting some other mistakes I make in
this language.) This works by assigning a range of different codes to
one and the same letter depending on which sound value it has in the
specific context (e.g. 'p' is encoded differently when in front of an
'h' than in most other positions, 'c' has a k-sound code before
certain vowels and an s-sound code before others etc.) and each 'x' is
represented by two codes - one for the k- and one for the s sound.
Also, general sound character values are calculated for each word,
based on its consonantal sounds - and only 6 or 7 general types of
these are distinguished, e.g. all 'tongue explosives' (d/t/g/k-sounds)
are equalled for this purpose. These values are encoded as two letters
prefixing each word. (But they will probably add less to the file size
than you might first think, considering how the compression is done.)
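In the same spirit (though with classes and mapping invented here, in
the style of Soundex, rather than taken from Protext), grouping
consonants into broad sound classes might look like this in C:

    /* 0 = vowel/other, 1 = 'tongue explosives' (b/p/d/t/g/k),
       2 = fricatives, 3 = nasals, 4 = liquids. */
    int sound_class(char c)
    {
        switch (c) {
        case 'b': case 'p': case 'd': case 't':
        case 'g': case 'k':                     return 1;
        case 'f': case 'v': case 's': case 'z': return 2;
        case 'm': case 'n':                     return 3;
        case 'l': case 'r':                     return 4;
        default:                                return 0;
        }
    }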
One further special code that could deserve to be mentioned is a
'consonant doubler', i.e. a code that is used in place of a second
consonant of the same type (except c). Although this code gets its
full meaning only in the phonetic system of Protext (where it is
silent), it can possibly be of some use without it as well. Namely as
the first code in definitions for some of the endings (and 2-letter
codes).
Hope that this is of some help!
To: *.*
Last month I got very excited about the Civilization IFF ILBM pictures
containing 5 bitplanes and 32 colours. Well they do, but on closer
inspection the 5th plane turns out to be cleared (in the ST version).
Sorry about that. Similarly the images are 256 lines high (56 more
than the ST screen) but the bottom 56 are cleared. This means,
although the images are compressed of course, that there is about half
a K of useless garbage in each image file.
I still believe that the principle holds true though, that 5-plane
images often look acceptable even when the 5th plane is skipped. The
reason is of course partly that developers for both Amiga and ST
could save some effort designing images that way (I wouldn't be
surprised if the Amiga Civilization images are the same as the ST ones
with the 5th plane filled in). And I also believe that a common
practice on the Amiga anyway is to make the last 16 colours 'shadow
versions' of the first 16. (Maybe the same principle could be
considered even by 256-colour artists on the mighty Falcon.)
GIF and TIFF
With this message I send to Ictari a description of the GIF and TIFF
picture file formats. This may seem superfluous, considering that the
full official documentation, upon which I have mostly drawn, should be
available to all Ictari members anyway. (The TIFF document was
published in Ictari 16; the GIF documents I am sending to Ictari now,
in the unlikely case that you don't already have them.) I still hope
there is
some value in a slightly shorter description written by an Atari
programmer for other Atari programmers to read, and containing some
experiences I gained while writing depacking routines for these formats.
Would anybody be interested in GIF and TIFF routines to complement
the PICPAC collection (assembler routines in Ictari 16)? I have some
preliminary routines ready (depacking only at this point), which will
probably need some polishing, though.
*/ Please do send the GIF/TIFF routines when they are ready. We will
publish the GIF document files over the next few issues. In our
opinion you can never have TOO much information on programming and
thanks also for the text compression information. Peter Hibbs /*
--------------------------- End of file ------------------------------