How to program the FPU

It all really started with DHS. Although they were by no means the first people to code FPU stuff on the Falcon, they were the first to create a demo so good it made buying an FPU worthwhile. That demo was '4ever', and ever since coders have been scratching their heads and thinking 'perhaps that FPU thing may be worth a look'.

From this starring role, the FPU has gone on to make cameo appearances in numerous other 4k demos and 128 byte intros - mainly due to its built-in sine and cosine instructions. It has enough functionality to make it worthwhile using in full demos as well as intros, even more so now that most Falcon owners own one of these chips.

As demo effects become more complex and 3D worlds become more prevalent, hardware floating point calculations become a very tempting option. Although you can get away with using fixed point, floats give you so much more flexibility and power.

Fair enough, you say. But how do I program the damn thing?

Fear no more because the maggie team are here to lift the shroud of secrecy from FPU programming.

DATA FORMATS

On the 68030 you have these basic data types:

NAME	BITS
Bit	1
BitField	1-32
BCD	8
Byte	8
Word	16
LongWord	32
QuadWord	64

These are all integer formats. As the FPU is floating point based the data formats are very different, although there is some overlap.

The 68881/68882 support three integer types:

NAME	BITS
Byte	8
Word	16
LongWord	32

These are completely compatible with the 030 integer types.

The 68881/68882 has three binary floating point types, plus a packed decimal type:

NAME	BITS	EXPONENT	MANTISSA
Single	32	8	23
Double	64	11	52
Extended	96	15	64
Packed	96	12	68

The 'Single' format is equivalent to C's "float" data type, and 'Double' is equivalent to C's "double" type (on compilers that use the IEEE formats).
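To make the Single layout from the table concrete, here is a minimal Python sketch that splits a value into its sign, exponent and mantissa fields. The function name `single_fields` is made up for this illustration; the only assumption is that Single follows the usual IEEE 754 layout with the exponent biased by 127.

```python
import struct

def single_fields(value):
    """Split a number, rounded to Single precision, into its three bit fields."""
    # '>f' packs a big-endian IEEE 754 single, the byte order the 68k uses.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF        # 23 bits, the leading 1 is implied
    return sign, exponent, fraction

print(single_fields(1.5))    # (0, 127, 4194304)
print(single_fields(-2.0))   # (1, 128, 0)
```

The 1 + 8 + 23 bits add up to the 32 bits of a longword.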

Whilst the 'Extended' type requires 96 bits of storage in memory (12 bytes), only 80 bits of this are actually used by the FPU. The rest is for 'future expansion'. So it's a bit of a waste using this format, especially if saving large chunks of floats in memory.

Internally, the FPU performs all calculations to 80 bit precision, first converting from the source type then converting to the destination type (if necessary).

The Double and Extended types take up more memory and so more time is spent fetching/storing data. I recommend sticking to the Single type - the overhead of the FPU conversion from Single to Extended is negligible compared to memory speed.

Another big advantage of the Single type is that it fits neatly in 68030 data registers, so you can use these for temporary storage and/or as operands in FPU calculations!

REGISTERS

The FPU follows the 68k by having 8 general purpose data registers named FP0-FP7. Each of these is 80 bits wide (extended format) and when data is moved into them it is converted into 80 bit precision.

There are also status and control registers and a program counter. Full discussion of these is beyond the scope of this article - after all, this is meant to be an introduction.

ADDRESSING MODES

The FPU has access to all addressing modes of the host processor. This means you can use all the types of addressing you are used to on the 030. The FPU can carry out instructions on memory and 030 registers, not just FPU registers! Obviously things are faster in FPU registers, but you are not limited to just using these 8 registers.

CONDITIONAL CODES

Like the 68000, the FPU has a status register with bits representing conditions. The condition codes reflect the last arithmetic operation that occurred in the FPU and can be tested.

The following conditions are supported:

EQ	Equal
NE	Not (Equal)
GT	Greater Than
NGT	Not (Greater Than)
GE	Greater Than or Equal
NGE	Not (Greater Than or Equal)
LT	Less Than
NLT	Not (Less Than)
LE	Less Than or Equal
NLE	Not (Less Than or Equal)
GL	Greater or Less Than
NGL	Not (Greater or Less Than)
GLE	Greater or Less or Equal
NGLE	Not (Greater or Less or Equal)
OGT	Ordered Greater Than
ULE	Unordered or Less or Equal
OGE	Ordered Greater Than or Equal
ULT	Unordered or Less Than
OLT	Ordered Less Than
UGE	Unordered or Greater or Equal
OLE	Ordered Less Than or Equal
UGT	Unordered or Greater Than
OGL	Ordered Greater or Less Than
UEQ	Unordered or Equal
OR	Ordered
UN	Unordered
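The 'ordered' and 'unordered' conditions only differ when one of the compared values is a NaN (not a number). As a rough sketch, here is how a handful of the predicates could be modelled in Python. The function `fpu_conditions` and its argument names are invented for this example, and it only models the true/false outcome, not the exception behaviour of the real chip.

```python
import math

def fpu_conditions(dest, source):
    """Model a few predicates as they would read after 'FCMP source,dest'."""
    # Two values are 'unordered' when at least one of them is a NaN.
    unordered = math.isnan(dest) or math.isnan(source)
    return {
        "EQ":  dest == source,                    # always false with a NaN
        "OGT": (not unordered) and dest > source,
        "ULE": unordered or dest <= source,       # exact opposite of OGT
        "OR":  not unordered,
        "UN":  unordered,
    }

print(fpu_conditions(2.0, 1.0))
print(fpu_conditions(float("nan"), 1.0))
```

Note how each 'O' condition in the list is paired with the 'U' condition that is its logical opposite.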

INSTRUCTIONS

Here follows a list of all the FPU instructions.

Syntax:

<fmt>: is one of

.B ( byte - 8 bits Integer)
.W ( word - 16 bits Integer )
.L ( long - 32 bits Integer )
.S ( single - 32 bits Float )
.D ( double - 64 bits Float )
.X ( extended - 96 bits Float )
.P ( packed - 96 bits BCD Float )

<ea>: Any 68030 addressing mode
<label>: A label
<list>: List of FPU data or control registers
FPcr: FPU control register (FPCR, FPSR or FPIAR)
FPn: FPU data register (FP0-FP7) this is the destination register
FPm: FPU data register (FP0-FP7) this is the source register
FPc:FPs: Two FPU data registers (FP0-FP7), FPc receives the cosine and FPs the sine
d: Displacement
k: An integer
ccc: An index into FPCP constant ROM

Some basic instruction timings are given. These are based on FPU register to register operations. The numbers in brackets represent the time of the head and tail of the instruction respectively.

e.g.

FDIV 108(17/87)

108 cycles is the total execution time, 17 cycles for the head and 87 cycles for the tail. Don't forget that the tail of one instruction can overlap with the head of the next instruction to give you increased performance!
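As a back-of-the-envelope sketch of what that overlap buys you, the Python below adds up a sequence of instructions. It assumes a simplified model where the overlap between two neighbouring instructions is limited by the shorter of the previous tail and the next head; real timings depend on more than this, so treat the numbers as estimates. The name `total_cycles` is made up for the example.

```python
def total_cycles(instructions):
    """instructions: list of (total, head, tail) tuples in execution order."""
    cycles = 0
    previous_tail = 0
    for total, head, tail in instructions:
        # Assumed model: the overlap cannot be longer than the previous
        # tail, nor longer than the head of the current instruction.
        cycles += total - min(previous_tail, head)
        previous_tail = tail
    return cycles

FMUL = (76, 17, 55)   # timings taken from the list in this article
FADD = (56, 17, 35)

print(total_cycles([FMUL, FADD]))   # 115 rather than 76 + 56 = 132
```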

These figures will not give you exact timings, as those depend on the input/output format and addressing mode used. However, they do give an indication of the relative speed of the instructions.

---------------------------------------------------------------------- 
 FABS                                                       38(17/17) 
---------------------------------------------------------------------- 
 FABS.<fmt>     <ea>,FPn 
 FABS.X         FPm,FPn 
 FABS.X         FPn

Calculates the absolute value of the source operand and stores the result in the destination FPU register.

---------------------------------------------------------------------- 
 FACOS                                                    628(17/607) 
---------------------------------------------------------------------- 
 FACOS.<fmt>    <ea>,FPn 
 FACOS.X        FPm,FPn 
 FACOS.X        FPn

Calculates the arc cosine of the source operand. The source must be in the range [-1...+1] and the result is in the range [0...Pi]. Arc cosine is basically an inverse cosine. The result is [0...Pi] as the FPU works in radians - to convert to degrees multiply by 180/Pi.
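The radians to degrees conversion is just the multiply by 180/Pi mentioned above. A tiny Python sketch, with `math.acos` standing in for FACOS and the function name `acos_degrees` invented for the example:

```python
import math

def acos_degrees(x):
    """Arc cosine of x, converted from radians to degrees."""
    radians = math.acos(x)            # result is in the range [0...Pi]
    return radians * 180.0 / math.pi

print(acos_degrees(0.5))   # roughly 60 degrees
```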

---------------------------------------------------------------------- 
 FADD                                                      56(17/35) 
---------------------------------------------------------------------- 
 FADD.<fmt>     <ea>,FPn 
 FADD.X         FPm,FPn

Adds the source operand to the destination operand.

---------------------------------------------------------------------- 
 FASIN                                                    584(17/563) 
---------------------------------------------------------------------- 
 FASIN.<fmt>    <ea>,FPn 
 FASIN.X        FPm,FPn

Calculates the arc sine of the source operand. The source must be in the range [-1...+1] and the result is in the range [-Pi/2...+Pi/2]. Arc sine is basically an inverse sine.

---------------------------------------------------------------------- 
 FATAN                                                    406(17/385) 
---------------------------------------------------------------------- 
 FATAN.<fmt>    <ea>,FPn 
 FATAN.X        FPm,FPn

Calculates the arc tangent of the source operand. Unlike arc sine and arc cosine, the source can be any value. The result is in the range [-Pi/2...+Pi/2]. Arc tangent is basically an inverse tangent.

---------------------------------------------------------------------- 
 FATANH                                                   696(17/675) 
---------------------------------------------------------------------- 
 FATANH.<fmt>   <ea>,FPn 
 FATANH.X       FPm,FPn

Calculates the hyperbolic arc tangent of the source operand. The source must be in the range [-1...+1].

---------------------------------------------------------------------- 
 FBcc                                                     23 
---------------------------------------------------------------------- 
 FBcc.<size>    <label>

If the condition is met, program execution continues at PC+Displacement. <size> determines the size of the displacement - if the label is within +-32768 bytes then size can be a word, otherwise it must be a longword.

---------------------------------------------------------------------- 
 FCMP                                                      38(17/17) 
---------------------------------------------------------------------- 
 FCMP.<fmt>     <ea>,FPn 
 FCMP.X         FPm,FPn

Subtracts the source operand from the destination operand and sets the condition code flags accordingly. The destination register itself is not changed.

---------------------------------------------------------------------- 
 FCOS                                                     394(17/373) 
---------------------------------------------------------------------- 
 FCOS.<fmt>     <ea>,FPn 
 FCOS.X         FPm,FPn 
 FCOS.X         FPn

Calculates the cosine of the source operand. The source is in radians and should be in the range [-2Pi...+2Pi]. The result is in the range [-1...+1].

---------------------------------------------------------------------- 
 FCOSH                                                    610(17/589) 
---------------------------------------------------------------------- 
 FCOSH.<fmt>    <ea>,FPn 
 FCOSH.X        FPm,FPn 
 FCOSH.X        FPn

Calculates the hyperbolic cosine of the source operand and stores the result in the destination operand.

---------------------------------------------------------------------- 
 FDBcc                                                     32 
---------------------------------------------------------------------- 
 FDBcc          Dn,<label>

Decrements the specified 68030 data register and branches conditionally to the specified label. This instruction is analogous to the 68K DBcc instruction (dbra etc). For condition codes see above.

---------------------------------------------------------------------- 
 FDIV                                                     108(17/87) 
---------------------------------------------------------------------- 
 FDIV.<fmt>     <ea>,FPn 
 FDIV.X         FPm,FPn

Divides the destination FPU register by the source operand.

---------------------------------------------------------------------- 
 FETOX                                                   500(17/479) 
---------------------------------------------------------------------- 
 FETOX.<fmt>    <ea>,FPn 
 FETOX.X        FPm,FPn 
 FETOX.X        FPn

Calculates e to the power of the source operand and stores in destination FPU register.

---------------------------------------------------------------------- 
 FETOXM1                                                  548(17/527) 
---------------------------------------------------------------------- 
 FETOXM1.<fmt>  <ea>,FPn 
 FETOXM1.X      FPm,FPn 
 FETOXM1.X      FPn

Calculates e to the power of the source operand then subtracts one and stores in destination FPU register.
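Why have a separate instruction for this? For very small source values, working out e to the x and then subtracting one throws away most of the significant digits. The Python sketch below shows the effect using `math.expm1`, which plays the same role as FETOXM1. Python works in double precision rather than the FPU's extended precision, so the size of the error differs, but the principle is the same. The function name `compare_expm1` is made up for the example.

```python
import math

def compare_expm1(x):
    """Return (naive, careful) results for e**x - 1."""
    naive = math.exp(x) - 1.0    # loses digits when x is tiny
    careful = math.expm1(x)      # computed without the cancellation
    return naive, careful

naive, careful = compare_expm1(1e-10)
print(naive, careful)
```

For tiny x the true answer is very close to x itself, and the second value is the one that sticks to it.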

---------------------------------------------------------------------- 
 FGETEXP                                                  48(17/27) 
---------------------------------------------------------------------- 
 FGETEXP.<fmt>  <ea>,FPn 
 FGETEXP.X      FPm,FPn 
 FGETEXP.X      FPn

Extracts the exponent from the source operand and stores in the destination FPU register.

---------------------------------------------------------------------- 
 FGETMAN                                                   34(17/13) 
---------------------------------------------------------------------- 
 FGETMAN.<fmt>  <ea>,FPn 
 FGETMAN.X      FPm,FPn 
 FGETMAN.X      FPn

Extracts the mantissa from the source operand and stores in the destination FPU register.
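FGETEXP and FGETMAN between them split a number into 'mantissa times 2 to the power of exponent'. A rough Python equivalent for non-zero, finite values is sketched below. It assumes the usual convention that the mantissa comes back in the range 1.0 up to (but not including) 2.0, keeping the sign of the source. `math.frexp` uses 0.5 to 1.0 instead, hence the adjustment. The name `getexp_getman` is invented for the example.

```python
import math

def getexp_getman(x):
    """Return (exponent, mantissa) so that x == mantissa * 2**exponent."""
    m, e = math.frexp(x)           # x = m * 2**e with 0.5 <= |m| < 1
    # Shift one bit across so that 1 <= |mantissa| < 2.
    return float(e - 1), m * 2.0

print(getexp_getman(10.0))    # (3.0, 1.25) because 1.25 * 2**3 = 10
print(getexp_getman(-0.75))   # (-1.0, -1.5)
```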

---------------------------------------------------------------------- 
 FINT                                                       58(17/37) 
---------------------------------------------------------------------- 
 FINT.<fmt>     <ea>,FPn 
 FINT.X         FPm,FPn 
 FINT.X         FPn

Rounds the source operand to an integer using the current rounding mode (round to nearest unless you have changed it) and stores the result in the destination FPU register.

---------------------------------------------------------------------- 
 FINTRZ                                                     58(17/37) 
---------------------------------------------------------------------- 
 FINTRZ.<fmt>   <ea>,FPn 
 FINTRZ.X       FPm,FPn 
 FINTRZ.X       FPn

Extracts the integer part of the source operand, always rounding towards zero (i.e. chopping off the fraction), and stores the result in the destination FPU register.
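The difference between FINT and FINTRZ only shows up on values with a fractional part. The Python sketch below assumes FINT is running in round to nearest mode, where an exact half goes to the even neighbour; Python's own `round()` happens to behave the same way. The function names are made up for the illustration.

```python
import math

def fint_nearest(x):
    """Model of FINT in round to nearest mode (halves go to even)."""
    return float(round(x))

def fintrz(x):
    """Model of FINTRZ: always chop towards zero."""
    return float(math.trunc(x))

for value in (2.5, 3.5, -2.7):
    print(value, fint_nearest(value), fintrz(value))
```

So -2.7 becomes -3 with one instruction and -2 with the other, which matters if you use the result as a screen coordinate or table index.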

---------------------------------------------------------------------- 
 FLOG10                                                   584(17/563) 
---------------------------------------------------------------------- 
 FLOG10.<fmt>   <ea>,FPn 
 FLOG10.X       FPm,FPn 
 FLOG10.X       FPn

Calculates the logarithm of the source operand using base 10 arithmetic and stores in the destination FPU register.

---------------------------------------------------------------------- 
 FLOG2                                                    584(17/563) 
---------------------------------------------------------------------- 
 FLOG2.<fmt>    <ea>,FPn 
 FLOG2.X        FPm,FPn 
 FLOG2.X        FPn

Calculates the logarithm of the source operand using base 2 arithmetic and stores the result in the destination FPU register.

---------------------------------------------------------------------- 
 FLOGN                                                    528(17/507) 
---------------------------------------------------------------------- 
 FLOGN.<fmt>    <ea>,FPn 
 FLOGN.X        FPm,FPn 
 FLOGN.X        FPn

Calculates the natural logarithm of the source operand and stores the result in the destination FPU register.

---------------------------------------------------------------------- 
 FLOGNP1                                                  574(17/553) 
---------------------------------------------------------------------- 
 FLOGNP1.<fmt>  <ea>,FPn 
 FLOGNP1.X      FPm,FPn 
 FLOGNP1.X      FPn

Adds one to the source operand, calculates the natural logarithm of this value and then stores the result in the destination FPU register.

---------------------------------------------------------------------- 
 FMOD                                                       75(17/54) 
---------------------------------------------------------------------- 
 FMOD.<fmt>     <ea>,FPn 
 FMOD.X         FPm,FPn

Calculates the modulo remainder of the destination operand divided by the source operand and stores it in the destination FPU register. The quotient is rounded towards zero, so the result has the same sign as the original destination value.

---------------------------------------------------------------------- 
 FMOVE                                                       21(10/0) 
---------------------------------------------------------------------- 
 FMOVE.<fmt>    <ea>,FPn 
 FMOVE.<fmt>    FPn,<ea> 
 FMOVE.L        <ea>,FPcr 
 FMOVE.L        FPcr,<ea>

Moves the source operand into the destination operand doing any necessary conversion.

---------------------------------------------------------------------- 
 FMOVEM                                                     54+25n/9 
---------------------------------------------------------------------- 
 FMOVEM.X       <list>,<ea> 
 FMOVEM.X       Dn,<ea> 
 FMOVEM.X       <ea>,<list> 
 FMOVEM.X       <ea>,Dn

Moves a set of FPU registers to/from the specified address. This is analogous to the 68k MOVEM instruction.

---------------------------------------------------------------------- 
 FMUL                                                       76(17/55) 
---------------------------------------------------------------------- 
 FMUL.<fmt>     <ea>,FPn 
 FMUL.X         FPm,FPn

Multiplies the source operand by the destination operand and stores the result in the destination FPU register.

---------------------------------------------------------------------- 
 FNEG                                                       38(17/17) 
---------------------------------------------------------------------- 
 FNEG.<fmt>     <ea>,FPn 
 FNEG.X         FPm,FPn 
 FNEG.X         FPn

Inverts the sign of the mantissa of the source operand and stores the result in the destination FPU register. Works like the 68k NEG instruction but on floats instead of integers.

---------------------------------------------------------------------- 
 FNOP                                                       19 
----------------------------------------------------------------------

No operation. This is useful for forcing synchronisation of the FPU with the 030 or to force processing of pending exceptions.

Usually the 030 doesn't wait for the FPU to complete the current operation before carrying on with its own instructions. FNOP causes the 030 to wait until the previous FPU instruction has completed.

---------------------------------------------------------------------- 
 FREM                                                     105(17/84) 
---------------------------------------------------------------------- 

 FREM.<fmt>     <ea>,FPn 
 FREM.X         FPm,FPn

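Calculates the IEEE remainder of the destination operand divided by the source operand. Unlike FMOD, the quotient is rounded to the nearest integer, so the result can have the opposite sign to the original destination value. Stores the result in the destination operand.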
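FMOD and FREM both return what is left over after a division; they differ in how the quotient is rounded before the leftover is worked out. Python (3.7 or later) has the same pair of operations as `math.fmod` (quotient chopped towards zero) and `math.remainder` (quotient rounded to nearest), so they make a handy stand-in. The wrapper names below are invented for this sketch.

```python
import math

def fmod_like(dest, source):
    """FMOD style: quotient rounded towards zero."""
    return math.fmod(dest, source)

def frem_like(dest, source):
    """FREM style: quotient rounded to the nearest integer."""
    return math.remainder(dest, source)

# 5.5 / 2 = 2.75: chopped to 2 the leftover is 1.5,
# rounded to 3 the leftover is 5.5 - 6 = -0.5.
print(fmod_like(5.5, 2.0), frem_like(5.5, 2.0))
```

For wrapping an angle into a fixed range the FMOD behaviour is usually the one you expect.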

---------------------------------------------------------------------- 
 FRESTORE                                                  340 
---------------------------------------------------------------------- 

 FRESTORE       <ea>

Aborts execution of any operation in progress and loads the new internal state from the specified effective address. This can be used with the FMOVEM to restore the complete FPU context.

---------------------------------------------------------------------- 
 FSAVE                                                     336 
---------------------------------------------------------------------- 

 FSAVE          <ea>

Saves the internal state of the FPU to the specified effective address. This state can be restored with the FRESTORE instruction.

---------------------------------------------------------------------- 
 FSCALE                                                    46(17/25) 
---------------------------------------------------------------------- 

 FSCALE.<fmt>   <ea>,FPn 
 FSCALE.X       FPm,FPn

Multiplies the destination operand by 2 to the power of the source operand (the source is converted to an integer first). Faster than a standard FMUL when multiplying by a power of two.
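In other words FSCALE is the floating point cousin of a shift: it adjusts the exponent and leaves the mantissa alone. `math.ldexp` does the same job in Python. In the sketch below the name `fscale` is made up, and dropping the fractional part of the source by truncation is an assumption made for the example.

```python
import math

def fscale(dest, source):
    """Model of FSCALE: dest * 2**int(source)."""
    # Assumption: any fractional part of the source is simply chopped off.
    return math.ldexp(dest, math.trunc(source))

print(fscale(3.0, 4))     # 48.0, same as 3.0 * 16
print(fscale(48.0, -4))   # 3.0, a negative source divides instead
```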

---------------------------------------------------------------------- 
 FScc                                                      25 
---------------------------------------------------------------------- 

 FScc.B         <ea>

Sets a byte conditionally. If the specified condition is true then the byte at the specified effective address is set to TRUE (all ones) else it is set to zero. For condition codes see the earlier section.

---------------------------------------------------------------------- 
 FSGLDIV                                                   74(17/53) 
---------------------------------------------------------------------- 

 FSGLDIV.<fmt>  <ea>,FPn 
 FSGLDIV.X      FPm,FPn

Divides the destination operand by the source operand and stores the result in the destination operand. Both operands are assumed to be in single precision format.

---------------------------------------------------------------------- 
 FSGLMUL                                                   64(17/43) 
---------------------------------------------------------------------- 

 FSGLMUL.<fmt>  <ea>,FPn 
 FSGLMUL.X      FPm,FPn

Multiplies the destination operand by the source operand and stores the result in the destination operand. Both operands are assumed to be in single precision format.

---------------------------------------------------------------------- 
 FSIN                                                     394(17/373) 
---------------------------------------------------------------------- 

 FSIN.<fmt>     <ea>,FPn 
 FSIN.X         FPm,FPn 
 FSIN.X         FPn

Calculates the sine of the source operand and stores the result in the destination operand. This operation works in radians. The source is assumed to be in the range [-2Pi...+2Pi]. The result is in the range [-1...+1].

---------------------------------------------------------------------- 
 FSINCOS                                                  454(17/433) 
---------------------------------------------------------------------- 
 FSINCOS.<fmt>  <ea>,FPc:FPs 
 FSINCOS.X      FPm,FPc:FPs

Simultaneous sine and cosine. Calculates the sine and cosine of the source operand and stores the results in the two destination operands. This operation works in radians.
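The classic use for getting both values at once is a 2D rotation, which needs the sine and the cosine of the same angle. A short Python sketch of the maths, with `math.sin` and `math.cos` standing in for the single FSINCOS call; the function name `rotate` is invented for the example:

```python
import math

def rotate(x, y, angle):
    """Rotate the point (x, y) anticlockwise by 'angle' radians."""
    # On the FPU both of these would come from one FSINCOS.
    s, c = math.sin(angle), math.cos(angle)
    return (x * c - y * s, x * s + y * c)

print(rotate(1.0, 0.0, math.pi / 2))   # very nearly (0, 1)
```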

---------------------------------------------------------------------- 
 FSINH                                                    690(17/669) 
---------------------------------------------------------------------- 
 FSINH.<fmt>    <ea>,FPn 
 FSINH.X        FPm,FPn

Calculates the hyperbolic sine of the source operand and stores the result in the destination operand.

---------------------------------------------------------------------- 
 FSQRT                                                     110(17/89) 
---------------------------------------------------------------------- 
 FSQRT.<fmt>    <ea>,FPn 
 FSQRT.X        FPm,FPn 
 FSQRT.X        FPn

Calculates the square root of the source operand and stores the result in the destination FPU register.

---------------------------------------------------------------------- 
 FSUB                                                      56(17/35) 
---------------------------------------------------------------------- 
 FSUB.<fmt>     <ea>,FPn 
 FSUB.X         FPm,FPn

Subtracts the source operand from the destination operand and stores the result in the destination FPU register.

---------------------------------------------------------------------- 
 FTAN                                                     476(17/455) 
---------------------------------------------------------------------- 
 FTAN.<fmt>     <ea>,FPn 
 FTAN.X         FPm,FPn

Calculates the tangent of the source operand and stores the result in the destination FPU register. This operation works in radians.

---------------------------------------------------------------------- 
 FTANH                                                    664(17/643) 
---------------------------------------------------------------------- 
 FTANH.<fmt>    <ea>,FPn 
 FTANH.X        FPm,FPn

Calculates the hyperbolic tangent of the source operand and stores the result in the destination FPU register.

---------------------------------------------------------------------- 
 FTENTOX                                                  570(17/549) 
---------------------------------------------------------------------- 
 FTENTOX.<fmt>  <ea>,FPn 
 FTENTOX.X      FPm,FPn

Calculates ten to the power of the source operand and stores the result in the destination FPU register.

---------------------------------------------------------------------- 
 FTRAPcc                                                  52 
---------------------------------------------------------------------- 
 FTRAPcc 
 FTRAPcc.W      #<data> 
 FTRAPcc.L      #<data>

If the specified condition is true an exception is generated and processing jumps to a vector. Optionally a data value can be specified; it follows the instruction in memory and can be picked up and processed by the exception handler.

---------------------------------------------------------------------- 
 FTST                                                       36(17/15) 
---------------------------------------------------------------------- 
 FTST.<fmt>     <ea> 
 FTST.X         FPm

Tests the specified operand and sets the condition code flags accordingly.

---------------------------------------------------------------------- 
 FTWOTOX                                                 570(17/549) 
---------------------------------------------------------------------- 
 FTWOTOX.<fmt>  <ea>,FPn 
 FTWOTOX.X      FPm,FPn 
 FTWOTOX.X      FPn

Calculates two to the power of the source operand and stores the result in the destination FPU register.

---------------

Let's not pretend here, the FPU is slow. But it is damn accurate! Fixed point maths on the 030 is always going to outperform the FPU, but the old 16:16 format doesn't give you a great amount of space to play around with. Once you start extending beyond 32 bits you run into a whole host of complexities and speed issues, so once again the FPU becomes a viable option.
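To show how little room 16:16 gives you, here is a small Python model of a fixed point multiply that keeps only 32 bits of result, as a routine working in longwords would. All the names (`to_fixed`, `wrap32`, `fixed_mul`, `to_float`) are made up for this sketch and it is not meant to mirror any particular 68k routine.

```python
FRAC_BITS = 16
ONE = 1 << FRAC_BITS          # 1.0 in 16:16 format

def to_fixed(x):
    return int(round(x * ONE))

def wrap32(n):
    """Keep the low 32 bits and treat them as a signed longword."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n & 0x80000000 else n

def fixed_mul(a, b):
    # The full product needs 64 bits; only 32 survive after the shift.
    return wrap32((a * b) >> FRAC_BITS)

def to_float(n):
    return n / ONE

print(to_float(fixed_mul(to_fixed(1.5), to_fixed(2.25))))     # 3.375
print(to_float(fixed_mul(to_fixed(200.0), to_fixed(200.0))))  # not 40000!
```

Anything outside roughly -32768 to +32767 wraps around, so 200 squared comes out as garbage, whereas a float would take it in its stride.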

If you are going to do a 3D world system, I recommend that you DO NOT use the FPU for the transformation of all your vertices! The FPU is very useful for all the initial matrix stuff, the concatenation of various matrices. If you are going to be using Quaternions then the FPU is ideal.

Let the FPU loose on any sort of 3D maths you are needing to do, just keep it away from tight inner loops that process a great amount of data.

Functions like the FPU Square Root are extremely useful. Sure you could write a 68k version in less than 110 cycles, but if you want accuracy and compactness here is your solution. This is very handy for 4k intros and 128 byte demos!
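A typical job for the square root is normalising a vector, for light sourcing say. The Python below is only a sketch of the arithmetic: on the FPU it boils down to three multiplies, two adds, one FSQRT and three divides. The function name `normalise` is invented for the example, and it assumes the vector is not all zeroes.

```python
import math

def normalise(x, y, z):
    """Scale the vector (x, y, z) so that its length becomes 1."""
    length = math.sqrt(x * x + y * y + z * z)   # the FSQRT step
    return (x / length, y / length, z / length)

print(normalise(3.0, 0.0, 4.0))   # length is 5, so about (0.6, 0.0, 0.8)
```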

Be aware that the sin and cos stuff works in radians, so you will probably want to convert your angles from degrees into radians before using it, unless you are the type of masochist who enjoys working in radians. As the sine and cosine stuff is so slow I recommend using the FPU instructions to precalculate sine/cosine tables.
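A precalculated table might be built along the lines of the Python sketch below. The table size of 256 entries and the 16 bit scale factor of 32767 are arbitrary choices for the example, as are the function names; on the Falcon the loop body would be an FSIN, an FMUL and an FMOVE.W to memory.

```python
import math

def make_sine_table(entries=256, scale=32767):
    """One full circle of sine values as signed 16 bit integers."""
    table = []
    for i in range(entries):
        angle = 2.0 * math.pi * i / entries    # table index to radians
        table.append(int(round(math.sin(angle) * scale)))
    return table

def cosine_from(table, i):
    """Cosine is just sine a quarter of a circle further on."""
    return table[(i + len(table) // 4) % len(table)]

sines = make_sine_table()
print(sines[0], sines[64], sines[128], sines[192])   # 0 32767 0 -32767
```

Using a power of two for the number of entries means the wrap around can be done with a simple AND in the inner loop.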

Use the single precision format where you can - it is 32 bits so fits neatly in a longword plus there is less to transfer to/from memory. Be aware that you cannot perform standard 68k instructions on FPU data and then expect it to make sense!

For example, if you move a single precision float from an FPU register into D0, then negate D0 with a NEG D0 instruction, you will not get the negative version of the original data! This means you will probably also want to create negative sine and cosine tables as well as the positive ones.
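The Python sketch below shows what goes wrong by treating the 32 bits of a Single as an integer, negating it and then reading it back as a float. It also shows that flipping only the top bit (something like BCHG #31,D0 on the 68k) does give the negative, because the sign of a Single lives in bit 31. The helper names are made up for the illustration.

```python
import struct

def float_to_bits(x):
    return struct.unpack(">I", struct.pack(">f", x))[0]

def bits_to_float(bits):
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFFFFFF))[0]

def integer_neg(x):
    """What an integer NEG.L does to the bit pattern of a Single."""
    return bits_to_float(-float_to_bits(x))

def flip_sign_bit(x):
    """Toggle bit 31 only, leaving exponent and mantissa alone."""
    return bits_to_float(float_to_bits(x) ^ 0x80000000)

print(integer_neg(1.5))     # -3.0, not -1.5
print(flip_sign_bit(1.5))   # -1.5
```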

The beauty of the FPU is that it allows you to tackle some complex mathematical problems without the limitations imposed by fixed point maths. But be warned, once you start programming the FPU you won't want to go back!

Mail me with your FPU questions:

[ mrpink@atari.org ]