DSP Programming on Atari Falcon
In these pages I will try to teach how to use the DSP in the Atari
Falcon. The text consists of a number of parts, some not yet written,
which explain step by step how the DSP works and how to use it in the
best way, or at least the best way I know of.
I will mainly keep to programming on the Falcon, and whenever I use
the word DSP I mean the DSP56001 made by Motorola, even though there
are a lot of other kinds made by other companies.
First of all I would like to point out that I am not at all an expert
on this subject, but simply a happy hacker who wants to share some of
his knowledge. I might as well tell you now that it is very
helpful if you have programmed before, preferably in 68K assembly or
possibly in C: assembly because the DSP assembly language is quite
like 68K assembly, and C because it is a common and easy language to
use for controlling the DSP, if you don't count demo programming. I
will often compare the two assembly languages and point out the main
differences.
Another thing that has helped me a lot in my DSP programming is
"DSP56000/DSP56001 Digital Signal Processor User's Manual", a book
written by Motorola. The User's Manual is also available as PDF
documents from Motorola's DSP homepage. The manual has nothing Atari
specific in it, but contains all instructions and descriptions of all
registers in the DSP. The Atari specific parts are documented, among
other places, in "Atari Developers Manual" and in "The Atari
Compendium".
There are some public assemblers for the Falcon and Hisoft has also
made a Devpac DSP. But the one I use is a public one made by Motorola
and is called ASM56000.TTP. Together with some other programs it's
quite simple to use.
These pages contain the following parts, so far:
1. The basic structure of DSP56001
2. Fixpoint numbers and ALU registers
3. Addressing modes
4. Short descriptions of instructions (not yet finished)
5. Host communication (not yet written)
More will come soon. I will make these pages more html:ish too, with
links between and within the parts.
My current project using the DSP is a Dolby Surround encoder, which
directs the input sound and outputs the encoded signal to the
amplifier in real time. Used in demos and games, this should make the
experience feel much more real.
Also take a look at Fredrik Noring's DSP page
Part 1: The basic structure of DSP56001
In this first part I will try to explain a little about what a
DSP is and what a thing like that is doing in the Atari Falcon.
DSP stands for "Digital Signal Processor", not Digital Sound
Processor, even if it's designed for controlling digital sound. The
DSP has instructions that are specially optimised to perform very
fast digital filters, FFT, speech recognition and a lot more. I won't
go very deep into the mathematics of these things, mainly because I
myself don't understand it. But don't worry, the DSP can be used for
a lot more than just sound processing. Because it is very fast with
mathematical instructions it may also be used for image processing,
such as MPEG viewers, 3D calculations and more. The DSP is simply
useful for just about anything.
The DSP in the Falcon030 is a DSP56001 made by Motorola, who also make
the 68K processors which all Atari 16/32 bit computers are based on.
The similarities between the two processors are therefore many, but
the differences are probably even more numerous.
Major components of the DSP56001
* 4 Data buses
* 3 Address buses
* 1 Program controller
* 1 Data ALU
* 2 AGUs
* 1 X data memory
* 1 Y data memory
* 1 Program memory
* 3 I/O ports
X, Y and P memory
Those of you who have programmed 68K assembly will immediately notice
a great difference in the way the DSP handles its memory. The DSP
doesn't count its memory in amount of bytes but in amount of DSP
words. On the DSP56001, one DSP word is 24 bit wide (3 bytes). Be
sure not to mix up the DSP word with the 68K word (16 bit). To keep
compatibility with future versions of Atari computers with different
DSPs there is an XBIOS call, Dsp_GetWordSize(), to find out how large
a DSP word is in the DSP that is in the computer. Here I will only
talk about the DSP56001 and therefore only about 24 bit DSP words.
What's important with this is that the DSP does not address byte
addresses when using the memory, but DSP word addresses. This means
that the address 0 (zero) points to the first word, and the address 1
(one) points to the second word.
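To make the difference between word addresses and byte addresses a
bit more concrete, here is a minimal sketch in Python (used only as a
calculator here, it is not Falcon code). It assumes that DSP words are
stored back to back in a host-side byte buffer, three bytes per word
with the most significant byte first. The byte order and the function
names are assumptions made for this illustration.

```python
BYTES_PER_DSP_WORD = 3  # one DSP56001 word is 24 bits


def word_to_byte_offset(word_address):
    """Byte offset of a DSP word inside a packed host-side buffer."""
    return word_address * BYTES_PER_DSP_WORD


def pack_words(words):
    """Pack 24-bit DSP words into bytes, most significant byte first (assumed)."""
    out = bytearray()
    for word in words:
        out += (word & 0xFFFFFF).to_bytes(3, "big")
    return bytes(out)


packed = pack_words([0x123456, 0xABCDEF])
print(word_to_byte_offset(1))  # DSP address 1 starts three bytes in
print(packed.hex())
```

On the DSP side the two words simply live at addresses 0 and 1; the
factor of three only matters when the data is seen as bytes.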
Another difference is the memory size. Internally, the DSP has three
different memory areas, X, Y and P memory. The X and Y memory areas
(Data memory) are 256 words each and the P memory (Program memory) is
512 words. In the Falcon030 there is also an external 32K word memory,
which is 96 KB. There's not room for a lot of demos, but remember that
the DSP is made as a coprocessor and not as a CPU. This 32K memory is
divided into two 16K memory banks, one connected as external X
memory, and one as external Y memory. In the Falcon these 16K memory
banks together are also used as external P memory. This means that the
external P memory is the same physical memory as the external X and Y
memory. A better explanation of this will follow.
Address and data buses
The fact that there are two separate memory areas, X and Y, might in
the beginning seem a bit unnecessary and difficult, but because each
of them has its own address and data bus, the DSP can use both
memories in the same instruction.
You are able to move two words to/from different places in memory at
the same time, but with some restrictions. These address and data
buses are called XAB, YAB, XDB and YDB. The P memory has its own
buses, PAB and PDB, where instructions are transferred to the program
controller, something I'll get into later. The fourth data bus is a
global data bus, GDB, which is used for the I/O ports among other
things.
The AGU
AGU stands for "Address Generation Unit" and is the part that handles
the address registers and generates the addresses for the address
buses using these. The AGU contains eight address registers, R0-R7,
eight offset registers, N0-N7 and eight modifier registers, M0-M7.
Each register is 16 bits wide, which makes it possible to address
65536 memory positions on either XAB, YAB or PAB. I wrote earlier
that there were two AGUs in the DSP, which isn't quite true. There
is one AGU with two address generators in it, which makes it possible
to generate two addresses at the same time. All 24 registers may be
used as 16 bit data storage registers if you want, and the data is
then read/written through GDB.
The ALU
ALU stands for "Arithmetic Logic Unit" and this is where all the
action in the DSP takes place. This is the part that does all the
calculations. The ALU has four 24-bit registers, X0, X1, Y0 and Y1,
plus two 56-bit accumulators, A and B. All calculations are done to
the accumulators. In some instructions, for example ADD and SUB, the
24-bit registers may be joined two by two, to be used as two 48-bit
registers, X and Y. The accumulators may also be divided into two
24-bit registers and one 8-bit register each: A0, A1, A2 and B0, B1, B2.
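How the parts of an accumulator fit together can be shown with a
little bit shuffling in Python. This is only a sketch of the layout
described above, with A2 in the top 8 bits, A1 in the middle 24 bits
and A0 in the low 24 bits; the function names are made up for the
illustration.

```python
def split_accumulator(acc):
    """Split a 56-bit accumulator value into (A2, A1, A0)."""
    acc &= (1 << 56) - 1
    return (acc >> 48) & 0xFF, (acc >> 24) & 0xFFFFFF, acc & 0xFFFFFF


def join_accumulator(a2, a1, a0):
    """Build the 56-bit value A2:A1:A0 from its three parts."""
    return ((a2 & 0xFF) << 48) | ((a1 & 0xFFFFFF) << 24) | (a0 & 0xFFFFFF)


def join_long(high, low):
    """Build a 48-bit value such as X = X1:X0."""
    return ((high & 0xFFFFFF) << 24) | (low & 0xFFFFFF)


a = join_accumulator(0x00, 0x123456, 0x789ABC)
print("%014X" % a)
print([hex(part) for part in split_accumulator(a)])
```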
The Program Controller
The program controller handles the execution of instructions,
hardware loops and interrupts, among other things. It has six 16-bit
registers: Program Counter (PC), Loop Address (LA), Loop Counter
(LC), Status Register (SR), Operating Mode Register (OMR) and Stack
Pointer (SP). The program controller also contains an internal
15-level 32-bit stack, where PC, SR, LA and LC are saved on different
occasions. This stack is divided into two parts, System Stack High
(SSH) and System Stack Low (SSL), with 15 16-bit values each.
PC points at the address in the P memory where your program is being
executed.
SP points to the place in the stack where it should write its next
value.
SR consists of two 8-bit parts, MR and CCR. MR contains bits which
control if interrupts shall be run, if the DSP is in trace mode, if a
hardware loop is active etc. CCR contains the condition flags used
with the Jump-If instructions (Jcc = 68K's Bcc). The flags, starting
from bit 0, are: Carry (C), Overflow (V), Zero (Z), Negative (N),
Unnormalized (U), Extension (E) and Limit (L). The use of these will
be explained further on. Some of the flags, C, V, Z and N, will be
recognised from 68K assembler and work in a similar way.
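As a small illustration of the layout just described, the following
Python sketch splits a 16-bit SR value into MR and CCR and lists the
flags that are set. It assumes MR is the high byte and CCR the low
byte of SR, with the flags numbered from bit 0 in the order given
above; the function names are invented for the example.

```python
CCR_FLAGS = ["C", "V", "Z", "N", "U", "E", "L"]  # bit 0 up to bit 6


def split_sr(sr):
    """Return (MR, CCR), assuming MR is the high byte of SR."""
    return (sr >> 8) & 0xFF, sr & 0xFF


def decode_ccr(ccr):
    """Names of the condition flags that are set in a CCR value."""
    return [name for bit, name in enumerate(CCR_FLAGS) if ccr & (1 << bit)]


mr, ccr = split_sr(0x0349)
print(hex(mr), decode_ccr(ccr))
```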
OMR is used to control how the memory should be arranged, something
that can be changed a little. I myself have never used this very
much, but just let it stay as is. Bit 2 in OMR is called Data ROM
Enable (DE) and when set, the addresses $0100-$01FF will change into
special internal ROMs. In the X memory there will be a Mu-law and an
A-law table of 128 words each. These are used in telecommunications.
In the Y memory, at the same addresses, there is a 256 word four
quadrant sine table, useful for, among other things, FFT (Fast Fourier
Transform). If DE is cleared, which it is on reset, these memory
addresses will be the external RAM.
LA and LC will be explained later.
These registers, except for PC, may be changed with the MOVEC
instruction. MR, CCR and OMR may also be altered with ANDI and ORI.
The DSP56001 operates in a way called pipelining, which basically
means that it is busy working with three instructions at the same
time. The execution of an instruction is made in three steps: line
up, aim and fire. In the world of DSP these are called fetching,
decoding and executing. With the use of pipelining, the program
controller first fetches the first instruction. While that instruction
is being decoded, the second instruction is being fetched, and while
the first instruction is executing, the second is decoded and the
third is fetched, and so on. This is a great difference from the 68K
processors: even if they have the same steps, they are all done with
one instruction at a time. Due to the use of pipelining in the DSP,
one must watch out with the use of address registers. If a value is
moved to an address register, this value cannot be used in the next
instruction. Most assemblers warn about this, so that nothing
unexpected should happen.
I/O ports
I'm not going to go very deep into these right now, but a short
overview of them can be given. There are three I/O ports on the
DSP56001, called Port A, Port B and Port C.
Port A is used to handle the external memory, and this manages its own
business in the Falcon so that we don't have to worry about it.
Port B is the port used for Host communication with the CPU, the
68030. This is the most common way to communicate with the DSP.
Port C consists of two parts, SCI and SSI. SCI (Serial Communication
Interface) is a serial interface used to communicate with other DSPs
and is not used in the Falcon. The SSI (Synchronous Serial Interface)
is connected to the Matrix in the Falcon, which can connect the
DMA, DSP, CODEC and the external DSP port. The SSI is used by WinRec
and similar applications to add effects to the sound in real time. It
is also possible to send data between the DSP and CPU through the
SSI, which is faster than sending through the host, but it is also
just about as complicated.
Those of you who know the 68K processors will notice that their
arrangement is quite unlike that of the DSP. The DSP has different
parts that take care of different things and each part has its own
registers. Though this adds some limitations, it is also much faster
since the different things can be done at the same time. The ALU does
a calculation and at the same time data may be moved from and/or to
the memory. This is what is called parallel moves and is the
greatest optimisation when programming the DSP.
Part 2: Fixpoint numbers and ALU registers
In this second part of my DSP programming pages, I will describe how
the number representation in the DSP56001 works, some instructions,
explanations of the ALU registers and how to make use of the address
registers and their offset and modulo registers.
The DSP doesn't use integers as the 68K does. Instead it uses
something called fix point numbers. This is not to be mixed up with
floating point numbers. Floating point numbers consist of two parts,
one fractional part and one exponential part, to form their numbers.
Fix point numbers only use the fractional part, numbers between -1
and +1. This is how the DSP uses its numbers, 24-bit fix point. The
largest value that can be represented with the DSP is nearly one,
hexadecimal $7FFFFF=0.99999988079, and the lowest value is exactly
-1=$800000. The MSB is the sign of the value and the 23 LSBs are the
fractional number. Below is a list with a few examples of numbers
and their corresponding hexadecimal values.
0.0=$000000
0.25=$200000
0.5=$400000
~1.0=$7FFFFF
-1.0=$800000
-0.5=$C00000
-0.25=$E00000
The ~ in front of 1.0 is because it is not exactly equal to 1.0, but
it is as near as it can get using two's complement 24-bit fix point.
Usually, the assembler accepts that you write 1.0 and uses $7FFFFF,
but may give a warning that you have used a number that is not
representable.
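The table above can be reproduced with a few lines of Python. This is
a minimal sketch of the conversion, assuming the value is scaled by
2^23, rounded, and clamped to the representable range (which is how
1.0 ends up as $7FFFFF here). The function names are made up for the
illustration and have nothing to do with any real assembler.

```python
def frac_to_word(value):
    """Convert a fraction in the range -1.0 to +1.0 to a 24-bit word."""
    raw = round(value * (1 << 23))
    raw = max(-(1 << 23), min((1 << 23) - 1, raw))  # clamp, so 1.0 gives $7FFFFF
    return raw & 0xFFFFFF


def word_to_frac(word):
    """Convert a 24-bit two's complement word back to a fraction."""
    word &= 0xFFFFFF
    if word & 0x800000:
        word -= 1 << 24
    return word / (1 << 23)


for value in (0.0, 0.25, 0.5, 1.0, -1.0, -0.5, -0.25):
    print("%6.2f = $%06X" % (value, frac_to_word(value)))
```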
How the DSP uses the numbers is very important to know. If you want
to move a value of, say 42 into data register X0, you might write:
MOVE #42,X0
This will put the hexadecimal value $2A0000 into X0 and NOT $00002A
as you might have expected it to do. To tell the assembler that you
really want the integer 42 put in X0, you'll have to insert a '>',
like this:
MOVE #>42,X0
Believe me, this is a common source of error, that isn't very easy to
discover.
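The difference between the two forms can be mimicked in Python. This
is only a sketch of the behaviour described above: it assumes that
without '>' the assembler picks an 8-bit short immediate, which a data
register such as X0 receives in its top byte, while '>' forces a full
24-bit immediate. The function names are made up for the illustration.

```python
def short_immediate_to_data_register(value):
    """Model MOVE #value,X0: 8-bit immediate placed in the top byte."""
    return (value & 0xFF) << 16


def long_immediate_to_data_register(value):
    """Model MOVE #>value,X0: full 24-bit immediate, stored as it is."""
    return value & 0xFFFFFF


print("$%06X" % short_immediate_to_data_register(42))
print("$%06X" % long_immediate_to_data_register(42))
# Read as a fix point number, $2A0000 is 42/128
print(short_immediate_to_data_register(42) / (1 << 23))
```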
Back to some fix points.
When multiplying two numbers in the DSP, you use the instruction MPY.
This multiplies two 24-bit values and results in a 48-bit value.
Multiplications are always done in two's complement. Let's give an
example: we want to multiply 0.5 by -0.25, which will of course be
-0.125. The DSP will have the values $400000 and $E00000, which will
result in a 48-bit value of $F00000:000000.
The instruction for this could look like:
MPY X0,Y0,A
X0=$400000, Y0=$E00000 and after execution: A=$FF:F00000:000000.
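The result can be checked with a small Python model. It is a sketch
under one assumption worth spelling out: the product of two signed
24-bit fractions is shifted one bit to the left, so that the 48-bit
result is again a fraction with the sign in its top bit. The helper
names are invented for the example.

```python
def to_signed24(word):
    """Interpret a 24-bit word as a two's complement integer."""
    word &= 0xFFFFFF
    return word - (1 << 24) if word & 0x800000 else word


def mpy(x, y):
    """Model MPY x,y,A: signed fractional multiply into 56 bits."""
    product = (to_signed24(x) * to_signed24(y)) << 1
    return product & ((1 << 56) - 1)


def fmt(acc):
    """Format a 56-bit value as $A2:A1:A0."""
    return "$%02X:%06X:%06X" % (acc >> 48, (acc >> 24) & 0xFFFFFF, acc & 0xFFFFFF)


print(fmt(mpy(0x400000, 0xE00000)))  # 0.5 times -0.25
```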
Which leads us directly to the construction of the accumulators. In
part one, I mentioned that the accumulators can be divided into
three registers, A2, A1 and A0. In the example above these registers
would be: A2=$FF, A1=$F00000 and A0=$000000. How did A2 get its
value, you might wonder. Simple: when doing a multiplication, A2 will
be the sign extension of the MSB of A1. Sounds complicated? It's not:
if the result is positive, >=0, A2 will be $00, and if the result is
negative, <0 as in our case, A2=$FF. This sign extension will also
occur when a value is moved to an accumulator. For example, if we
would do a:
MOVE #$876543,A
A would contain $FF:876543:000000 after execution.
In this case, when we move a 24-bit value into the 56-bit
accumulator, not only is the value sign extended into A2, but A0 is
also zeroed. Note that MOVE itself does not update the N flag in CCR;
use TST on the accumulator if you need the flags to reflect the new
value.
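The sign extension and zeroing can be written out in a few lines of
Python. This is just a model of the behaviour described above for a
24-bit value moved into a whole accumulator; the function names are
made up.

```python
def move_to_accumulator(word):
    """Model MOVE #word,A: value in A1, sign extended into A2, A0 zeroed."""
    word &= 0xFFFFFF
    extension = 0xFF if word & 0x800000 else 0x00
    return (extension << 48) | (word << 24)


def fmt(acc):
    """Format a 56-bit value as $A2:A1:A0."""
    return "$%02X:%06X:%06X" % (acc >> 48, (acc >> 24) & 0xFFFFFF, acc & 0xFFFFFF)


print(fmt(move_to_accumulator(0x876543)))
print(fmt(move_to_accumulator(0x123456)))
```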
This takes us to another possible error in our programming. Say that
the accumulator A contains a value of $00:123456:789ABC and we want
to move this 56-bit value to the other accumulator, B. A normal
person would try:
MOVE A,B
But this will not do exactly what we intended to do. This example
would take the A1 part of A and put that in the B1 part of B, sign
extend into B2 and zero B0, which would make B=$00:123456:000000.
Close to what we want, but not close enough. But no worries, there is
a specially made instruction for this 56-bit transfer, namely TFR
(Transfer Data ALU).
This:
TFR A,B
will move the whole 56-bit value from A to B. OK, that's good, but
what about adding two numbers? Wouldn't that make it possible to
produce a value larger than 1? Yes, that's correct. This is where we
make use of the 8-bit extension part of the accumulators, which is
more than a sign extension. We take two values, for example
0.75=$600000 and 0.5=$400000, which will make a result of 1.25,
impossible to represent with the usual two's complement 24-bit fix
point. The instruction would be:
ADD X0,A
Before execution: X0=$600000 (0.75), A=$00:400000:000000 (0.5)
After execution: X0=$600000 (0.75), A=$00:A00000:000000 (1.25)
We see here that A2 is still zero even though the MSB of A1 is set.
When using 56-bit values, it is the MSB of A2 that decides if the
value is negative. Since the MSB of A1 now has the weight 1, this
means that values from -256.0 to +255.999 can be represented in an
accumulator, which may be useful sometimes. When an accumulator holds
a value that no longer fits in the range -1.0 to +1.0, the extension
flag (E) of CCR is set.
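The addition above can be followed in a small Python model. It is a
sketch with two assumptions: the 24-bit source of ADD lines up with A1
and is sign extended, and the extension counts as "in use" when bits
55 to 47 of the accumulator are not all equal, which is my reading of
how the E flag is described in the User's Manual. All names are
invented for the example.

```python
MASK56 = (1 << 56) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def add_word(acc, word):
    """Model ADD X0,A with a 24-bit source register."""
    return (to_signed(acc, 56) + (to_signed(word, 24) << 24)) & MASK56


def acc_to_float(acc):
    """Value of a 56-bit accumulator as a number (bit 47 has weight 1)."""
    return to_signed(acc, 56) / (1 << 47)


def extension_in_use(acc):
    """True when bits 55..47 are not all equal (assumed E flag rule)."""
    return ((acc >> 47) & 0x1FF) not in (0x000, 0x1FF)


a = 0x400000 << 24          # A = $00:400000:000000 (0.5)
a = add_word(a, 0x600000)   # add X0 = $600000 (0.75)
print("%014X" % a, acc_to_float(a), extension_in_use(a))
```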
But when we want to move this value from the accumulator to some
other place, memory or a data register, we bump into some problems.
Neither of them has this extra byte, and therefore they cannot
represent values outside the range -1 to +1. If we do this:
MOVE A,X0
when A=$00:A00000:000000 (1.25) as after the previous example, the
value will be limited to the closest representable value, i.e.
~1.0 or $7FFFFF. This is called limiting in the DSP and when it
occurs, the L flag will be set in CCR. If we really do want
to get $A00000 into X0, this can be done by using:
MOVE A1,X0
The same thing happens with 48-bit moves. If we do:
MOVE A,X
limiting will occur and X will be $7FFFFF:FFFFFF. When we
use X as a 48-bit value, X1 is the MSW and X0 the LSW. This means
that X1=$7FFFFF and X0=$FFFFFF. As above, we might not want
limiting to occur. This is avoided with:
MOVE A10,X
which directly copies the value in A1 to X1 and A0 to X0 without any
change of the numbers and will give X=$A00000:000000.
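Limiting is easy to model in Python. The sketch below assumes that a
value is limited whenever bits 55 to 47 of the accumulator are not all
equal, and that it is then replaced by the largest positive or
negative value the destination can hold. The function names are made
up for the illustration.

```python
def needs_limiting(acc):
    """True when the accumulator value does not fit in 48 bits."""
    return ((acc >> 47) & 0x1FF) not in (0x000, 0x1FF)


def limit_word(acc):
    """Model MOVE A,X0: return (24-bit result, L flag set?)."""
    acc &= (1 << 56) - 1
    if not needs_limiting(acc):
        return (acc >> 24) & 0xFFFFFF, False
    return (0x800000 if acc >> 55 else 0x7FFFFF), True


def limit_long(acc):
    """Model MOVE A,X: return (48-bit result, L flag set?)."""
    acc &= (1 << 56) - 1
    if not needs_limiting(acc):
        return acc & ((1 << 48) - 1), False
    return ((1 << 47) if acc >> 55 else (1 << 47) - 1), True


a = 0xA00000 << 24                     # A = $00:A00000:000000 (1.25)
print("$%06X" % limit_word(a)[0])      # MOVE A,X0
print("$%012X" % limit_long(a)[0])     # MOVE A,X
print("$%06X" % ((a >> 24) & 0xFFFFFF))  # MOVE A1,X0, no limiting
```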
There is a third way of using 48-bit values: combining the two
accumulators. We can do this in two ways:
MOVE AB,X
or
MOVE BA,X
As said before, if a 24-bit value is moved to an accumulator, for
example A, the value will end up in A1, be sign extended into A2, and
A0 will be zeroed. But values can also be moved to and from the
individual parts of the accumulator, A0, A1 or A2, and then no sign
extension or zeroing will happen. None of the other parts are
affected.
Of course, none of the examples above is restricted to the A
accumulator or the X data register in particular. Both accumulators
may be used in the same way, and so may X and Y.
Let's now give a summary of the registers in the ALU and how they may
be used:
X1, X0 - 24 bits
Y1, Y0 - 24 bits
A2, A1, A0 A2 - 8 bits and A1, A0 - 24 bits
B2, B1, B0 B2 - 8 bits and B1, B0 - 24 bits
X = X1:X0 - 48 bits
Y = Y1:Y0 - 48 bits
A = A2:A1:A0 - 56 bits*
B = B2:B1:B0 - 56 bits*
AB = A1:B1 - 2x24 bits*
BA = B1:A1 - 2x24 bits*
A10 = A1:A0 - 48 bits
B10 = B1:B0 - 48 bits
Those marked with * are those which make limiting occur when used as
a source register. When they are used as destination registers, sign
extension and zeroing take place.
That was a little on limiting and the use of the accumulators and the
data registers. The L flag is set every time limiting has occurred.
Part 3: Addressing modes
This part will cover the uses of the memory and the different
addressing modes available in the DSP56001.
Often you need more space to store data than the few
registers the ALU has. Then it's time to make use of the enormous
memory the DSP has access to. This can be done in two ways, like in
68K assembler: through direct addressing or via one of the eight
address registers of the AGU. Either way, the memory bank to be used
must always be specified. This is done by writing X:nn, Y:nn, L:nn or
P:nn where nn is an address.
Here are two examples:
MOVE #$123456,X:$2A
MOVE #$ABCDEF,Y:$2A
The memory position $2A in the X memory now contains the value
$123456 and in the Y memory, at the same position, a value of
$ABCDEF. A third way of using the memory is to combine the X and Y
memory positions into one 48-bit position, by using the L:nn
addressing mode like this:
MOVE L:$2A,X
the register X will be $123456:ABCDEF.
Here are a few more examples and the results of them:
MOVE L:$2A,A
A=$00:123456:ABCDEF
MOVE L:$2A,A10
A=$XX:123456:ABCDEF
A2 is not affected, the previous value is still present.
MOVE L:$2A,AB
A=$00:123456:000000
B=$FF:ABCDEF:000000
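The three L: examples above differ only in what happens to the
extension byte and the low word. A small Python model may make that
easier to see. It is a sketch only: the two dictionaries stand in for
the X and Y memories, and the function names are made up.

```python
x_memory = {0x2A: 0x123456}
y_memory = {0x2A: 0xABCDEF}


def sign_byte(word):
    """Extension byte for a 24-bit word: $FF if negative, else $00."""
    return 0xFF if word & 0x800000 else 0x00


def load_long_a(address):
    """Model MOVE L:address,A: X word to A1, Y word to A0, A2 sign extended."""
    x, y = x_memory[address], y_memory[address]
    return (sign_byte(x) << 48) | (x << 24) | y


def load_long_ab(address):
    """Model MOVE L:address,AB: X word to A, Y word to B, both as 24-bit moves."""
    x, y = x_memory[address], y_memory[address]
    a = (sign_byte(x) << 48) | (x << 24)
    b = (sign_byte(y) << 48) | (y << 24)
    return a, b


def fmt(acc):
    """Format a 56-bit value as $A2:A1:A0."""
    return "$%02X:%06X:%06X" % (acc >> 48, (acc >> 24) & 0xFFFFFF, acc & 0xFFFFFF)


print(fmt(load_long_a(0x2A)))
print([fmt(acc) for acc in load_long_ab(0x2A)])
```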
The fourth memory addressing mode is P:nn and is used to move data to
and from the program memory. This has to be done with a special
instruction, MOVEM.
The second way of addressing is with the use of address registers.
You then use X:(Rn), Y:(Rn), L:(Rn) or P:(Rn), where n decides which
of the registers, R0-R7, is used. As in 68K assembler, the
address register can also be predecremented or postincremented.
This is done by using X:-(Rn) or X:(Rn)+. The DSP also allows
postdecrementing, X:(Rn)-. These three only increase or decrease
by one each time, but the DSP also makes it possible to increase or
decrease by more than one at a time, with the use of the offset
registers of the AGU, N0-N7. Each address register has its own offset
register. R0 can therefore only use N0 as offset register, R1 only
N1 and so on. There are three addressing modes which use the offset
registers: X:(Rn)+Nn, X:(Rn)-Nn and X:(Rn+Nn). In the first two
cases, Rn is increased or decreased, respectively, after the address
has been used. In the last case, Nn is used as an offset to Rn when
the address is generated, but Rn is not changed. The offset registers
are two's complement 16-bit numbers, which gives values from -32768 to
+32767.
The last eight registers of the AGU, the modulo registers M0-M7, are,
as with the offset registers, each tied to its own address
register. The modulo registers are used to create circular buffers,
very useful in digital signal processing such as filters and FFT.
When you read from a circular buffer and reach the end, the address
register will jump back to the beginning of the buffer again.
The modulo registers are not set to anything specific when a program
is started. It is therefore recommended that you begin your program by
setting proper values in the modulo registers. A value of 65535
($FFFF) turns off the modulo function for the corresponding address
register. Values from 1 to 32767 tell the size of the circular
buffer to be used. The size of the buffer is the value of the modulo
register plus one, giving possible buffer sizes of 2 to 32768 words.
Modulo register values of 32768 to 65534 are reserved in the DSP for
future use. A value of 0 (zero) specifies a special mode, which I
will get back to later.
When you have decided how large a buffer you want, there are some
restrictions on where the buffer may be placed in memory. Take
the size of the buffer, here called M, a value from 2 to 32768. In
our example, let's use a value of M=$548, which makes the buffer 1352
words large. The value $547 (1351) is placed in the modulo register to
be used, for example M3. Then find the smallest value, k, so that
2^k>=M. The address where the buffer begins must then have its k LSBs
zeroed. Complicated? Not very. In our example, the smallest value of k
is 11, since 2^11=2048. The buffer address therefore has to have
eleven cleared LSBs. A possible position of the buffer is then
%0000100000000000=$0800. The first position after the buffer will be
$800+$548=$D48 (3400).
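The placement rule can be checked with a few lines of Python. This is
a minimal sketch of the calculation just described; the function names
are made up for the illustration.

```python
def smallest_k(size):
    """Smallest k such that 2**k >= size."""
    k = 0
    while (1 << k) < size:
        k += 1
    return k


def is_valid_base(base, size):
    """A buffer may start at base if the k lowest bits of base are zero."""
    return base % (1 << smallest_k(size)) == 0


size = 0x548                       # 1352 words
print(smallest_k(size))            # number of LSBs that must be zero
print(is_valid_base(0x800, size))  # the address used in the example
print(is_valid_base(0x400, size))  # not aligned to 2048 words
print("$%X" % (0x800 + size))      # first position after the buffer
```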
The following example will set up our example buffer:
ORG X:$800 ; Position of buffer
buffer
DS 1352 ; reserve 1352 words
Then set the registers correctly:
MOVE #buffer,R3 ; R3=$800 (R3 is not restricted to the X memory!)
MOVE #1352-1,M3 ; M3 equals buffer size minus one
If we then use R3 with postincrementing, (R3)+, R3 will, after being
used 1351 times, have a value of $D47, but the next time, when R3 is
supposed to be increased to $D48, it will instead loop around
the circular buffer and R3 will be $800. The same thing happens if
(R3)- or -(R3) is used: it will jump to $D47 when decreasing from
$800. When the offset registers are used, the address will also wrap
around. Let's say we set offset register N3 to the value 5 and that we
use (R3)+N3. Starting at $800, we will increase by N3 270 times to end
up at address $D46. Next time, we would have come to $D4B, but the
modulo register makes R3 loop around and we will come to
$D4B-$548=$803. The same thing, but backwards, will happen if we use
(R3)-N3. When (R3+N3) is used, the address will also loop around
before it is used, but the address register will not be updated.
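The wrap-around can be simulated in Python to check the numbers above.
This is a simplified model, not a description of the hardware: it
assumes the buffer base is found by clearing the k lowest bits of the
address register, and it only handles steps that are no larger than
the buffer. The function name is invented for the example.

```python
def modulo_update(r, step, m):
    """New value of Rn after stepping by `step` with modulo register value m."""
    size = m + 1                    # buffer size is M + 1
    k = (size - 1).bit_length()     # smallest k with 2**k >= size
    base = r & ~((1 << k) - 1)      # buffer start: k lowest bits cleared
    return base + (r - base + step) % size


r3 = 0x800
for _ in range(1351):               # (R3)+ used 1351 times
    r3 = modulo_update(r3, 1, 1351)
print("$%X" % r3)                               # last word of the buffer
print("$%X" % modulo_update(r3, 1, 1351))       # wraps to the start

r3 = 0x800
for _ in range(270):                # (R3)+N3 with N3=5, 270 times
    r3 = modulo_update(r3, 5, 1351)
print("$%X" % r3, "$%X" % modulo_update(r3, 5, 1351))
```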
If the offset register is set to be exactly equal to 2^k, in our
example, N3=2048, the offset register may be used in a special way.
Using (R3)+N3, when R3=$800, will not modulo around the buffer and
end up at position $AB8 as you might have expected. In this case, R3
will be placed in the next possible circular buffer, i.e. at position
$1000 where a different circular buffer may be placed. If more than
one circular buffer is needed, this is a way of jumping between them
without having to set the address register manually.
What remains to be explained is what happens when a modulo register is
set to 0 (zero), a mode which is called Reverse Carry or Bit Reverse
Mode. This is a mode used for Fast Fourier Transforms (FFT), something
I don't know anything about. I can't really say why this mode is
useful, but will explain what it does anyway.
It first bit reverses the address register internally, which
means that the MSB switches places with the LSB, the next MSB changes
places with bit 1, and so on. It then increases or decreases according
to the mode used, with one or with an offset register (as the User's
Manual describes it, the offset is bit reversed in the same way before
it is added). Finally it bit reverses the result again, uses the
address generated and updates the register.
For a more detailed explanation of this, I refer to the DSP56001
User's Manual, section 5.
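To get a feeling for what this mode produces, here is a small Python
model. It follows the description in the User's Manual as I understand
it, where both the address register and the offset are bit reversed
over all 16 bits, added normally, and the sum is bit reversed back.
Treat it as a sketch; the function names are made up for the
illustration.

```python
def bit_reverse16(value):
    """Mirror the 16 bits of value (MSB swaps with LSB, and so on)."""
    return int(format(value & 0xFFFF, "016b")[::-1], 2)


def reverse_carry_add(r, n):
    """Model (Rn)+Nn with Mn=0: add with the carry running the other way."""
    total = bit_reverse16(r) + bit_reverse16(n)
    return bit_reverse16(total & 0xFFFF)


# Walk through an 8 word table with an offset of half the table size
r = 0
order = []
for _ in range(8):
    order.append(r)
    r = reverse_carry_add(r, 4)
print(order)
```

With an offset of half the table size, the addresses come out in the
shuffled order that FFT routines typically want for their data.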
Summary of addressing modes possible:
Absolute addressing
(Rn)
(Rn)+
(Rn)-
-(Rn)
(Rn)+Nn
(Rn)-Nn
(Rn+Nn)
All modes may be used for all of X:, Y:, L: and P:.
Be careful with the pipelining effects when using the address, offset
and modulo registers. A register may not be used in the instruction
directly after it has been set. An increase or decrease of the
registers may be used though, since pipelining does not affect them.