An examination of an early tape loader
RECOLLECTION issue 2
by Fungus/Nostalgia/Onslaught
Loader: Pavloada v2 (I believe, correct me if I'm wrong please ;)
Used on Hektic, Orbitron, Munch Mania, Cosmic Kanga and other games from Mastertronic and several other software houses in 1985 and 1986.
Foreward:
Cracking tapes is an interesting and challenging process. I have decided to disect an Old pavloader as lesson 1 in a series of tape cracking articles I hope to continue with. All work was done only with the Action Replay Monitor (and NOT the freezer!). Please enjoy ;)
Kudos to Qed for doing the first article long ago. (In C= Hacking)
The Commodore C64 Programmers Reference Guide, and Mapping
The Commodore 64 were the reference books used in the research of registers, rom routines and system memory locations.
64doc.txt and NMOS_6502_extra_opcodes.txt documents were used in the reverse engineering of the unimplemented opcode auto-boot routine. These texts (and MUCH more!) are available at ftp://ftp.funet.fi/pub/cbm/documents/chipdata
And now the fun begins.
Insert your prospective tape into the tape drive. Rewind, clear tape counter.
Enter the AR or RR monitor and type the following.
L"",01
The tape header will now load into the tape buffer, which is located at $033c. The boot file will also load into $02a7.
If you examine the loaded parts, they appear as gibberish. The loader itself occupies the space in the tape buffer $0351-$03fc. It is EOR encoded at this point. Lets have a look at the boot file which was loaded to $02a7 and ended at $0304.
$0302-$0303 is the basic idle loop vector. This is how MOST tapes autostart. There are several different ways of achieving an autostart, this happens to be one of the most common. I'll discuss other methods in future installments.
NOTE: Disk games can also autostart this way. And you can use it in your own programs aswell.
Now, look at 02a7 with
M 02a7
you should see,
.:02a7 64 ae 4e bf 02 14 cc a2
If not, its not the same loader. Note: the 4th byte, $bf, maybe something else.
If you dissassemble it, you get seeming gibberish with a normal monitor.
.> 02a7 64 ???
.> 02a8 ae 4e bf ldx $bf4e
.> 02ab 02 ???
.> 02ac 14 ???
.> 02ad cc a2 ff cpy $ffa2
etc...
Well, it is really a sneaky trick. The code is partially comprised of uninplemented opcodes.
Lets have a look at how the code really looks to the processor.
.> 02a7 64 ae skb $ae ;skip byte ($ae)
.> 02a9 4e bf 02 lsr $02bf ;decode byte at $02bf ($06 becomes $03)
.> 02ac 14 cc skb $cc ;skip byte ($cc)
.> 02ae a2 ff ldx #$ff ;load x index with #$ff
.> 02b0 8b 51 xaa #$51 ;and x index with #$51 and transfer to accumulator (lda #$51)
.> 02b2 87 fb sax $fb ;and x index with accumulator and store in $fb
.> 02b4 04 4c skb $4c ;skip byte $4c
.> 02b6 8b e1 xaa #$e1 ;and x index with #$e1 and transfer to accumulator (lda #$e1)
.> 02b8 54 cc skb $cc ;skip byte $cc
.> 02ba 8f 28 03 sax $0328 ;and x index with accumulator and store in $0328 (disable run/stop)
.> 02bd af 3c 03 lax $033c ;load x index and accumulator with memory address $033c ($03, filetype)
.> 02c0 87 fc sax $fc ;and x index with accumulator and store in $fc
.> 02c2 a0 ff ldy #$ff ;load y index with #$ff
.> 02c4 b3 fb lax ($fb),y ;load x index and accumulator with indirect address at $fb/$fc ($0351)
.> 02c6 54 20 skb $20 ;skip byte $20
.> 02c8 4d 02 03 eor $0302 ;eor accumulator with memory address $0302 ($a7, loaded)
.> 02cb 80 ee skb $ee ;skip byte $ee
.> 02cd 4d 17 03 eor $0317 ;eor accumulator with memory address $0317 ($fe, normally)
.> 02d0 89 20 skb $20 ;skip byte $20
.> 02d2 91 fb sta ($fb),y ;store accumulator in indirect address at $fb/$fc ($0351)
.> 02d4 14 cc skb $cc ;skip byte $cc
.> 02d6 88 dey ;decrement y index.
.> 02d7 c0 ff cpy #$ff ;compare y index with #$ff
.> 02d9 80 ee skb $ee ;skip byte $ee
.> 02db d0 e7 bne $02c4 ;branch if y index <> #$ff
.> 02dd 14 4c skb $4c ;skip byte $4c
.> 02df f0 70 beq $0351 ;branch if y index = $ff , Start real loader
.> 02e1 a0 c0 ldy #$c0 ;load y index with #$c0
.> 02e3 1b 3c 03 aso $0330,y ;arithmetic shift left memory, or with accumulator
.> 02e6 88 dey ;decrement y index.
.> 02e7 d0 fa bne $02e3 ;branch if y index <> 0
.> 02e9 14 2e skb $2e ;skip byte $2e
.> 02eb 20 93 fc jsr $fc93 ;jump to subroutine $fc93
.> 02ee 6c 4e 00 jmp ($004e) ;jump to indirect address in $4e/$4f
.> 02f1 20 33 a5 jsr $a533 ;jump to subroutine $a533
.> 02f4 89 ee skb $ee ;skip byte $ee
.> 02f6 20 59 a6 jsr $a659 ;jump to subroutine $a659
.> 02f9 4c ae a7 jmp $a7ae ;jump to basic start $a7ae
Now, that makes a lot more sense doesn't it?
It's easy to see the program start by initting some zp vectors, and decoding the loader at $0351 before executing it at $02df. There is also some other stuff after the start code. Wonder what that is? I guess we have to continue examing the loader to determine this. So lets decode it shall we? The following routine will decode the loader, without having to use the above code. Although it's important to understand how these things work. Cheating is NOT the way to do things properly, your only cheating yourself of greater knowledge.
start lda #$50 ;setup indirect at $fb/$fc to $0350
sta $fb
lda #$03
sta $fc
ldy #$af ;load y index with #$af bytes to decode
decode
lda ($fb),y ;decode the loader in the cassette buffer ($0351-$03ff)
eor #$a7 ;notice the decode loop goes backwards through memory
eor #$fe
sta ($fb),y
dey
bne decode
rts
The values for decoding were taken from the boot, or break points were set in the boot code to extract the needed values. Feel free to practice doing this for yourself, sometimes it's a challenge all in itself.
Lets have a look at the decoded loader now. (oooo the fun stuff!)
.> 0351 78 sei ;disable interrupts
.> 0352 ad 11 d9 lda $d011 ;load accumulator with vic control register
.> 0355 29 ef and #$ef ;and with %11101111 (bit 4 = 00 , blank screen)
.> 0357 8d 11 d0 sta $d011 ;store accumulator in vic control register
.> 035a a9 00 lda #$00 ;load accumulator with 00
.> 035c 85 c6 sta $c6 ;keyboard que = 0
NOTE: somtimes ( .> 035e 85 9d sta $9d ;kernal msgs off ) is present here
.> 035e a9 80 lda #$80 ;load accumulator with #$80 : restart loop
.> 0360 8d 11 d0 sta $dd04 ;store accumulator in cia 2 timer a low byte latch
.> 0363 a9 01 lda #$01 ;load accumulator with #$01
.> 0365 8d 05 dd sta $dd05 ;store accumulator in cia 2 timer a high byte latch
.> 0368 a9 19 lda #$19 ;load accumulator with #%00011001
.> 036a 8d 0e dd sta $dd0e ;store accumulator in cia 2 control register a
;bit 0 = 1 start timer
;bit 1 = 1 timer a output mode to pb6 = yes
;bit 3 = 1 one shot mode
;bit 4 = 1 force load timer a
;bit 5 = 0 count phase 02 clock cycles
;bit 6 = 0 serial port i/o mode = input
.> 036d a5 01 lda $01 ;load accumulator with i/o port
.> 036f 29 1f and #$1f ;and accumulator with #%00011111
.> 0371 85 01 sta $01 ;store accumulator in i/o port , bit 5 off = cassette motor on
.> 0373 a0 00 ldy #$00 ;load y index with 00
.> 0375 20 bb 03 jsr $03bb ;sync to block
.> 0378 20 d2 03 jsr $03d2 ;get a byte
.> 037b 85 20 sta $20 ;store load address low byte
.> 037d 85 c1 sta $c1 ;make a copy of it
.> 037f 20 d2 03 jsr $03d2 ;get a byte
.> 0382 85 21 sta $21 ;store load address high byte
.> 0384 85 c2 sta $c2 ;make a copy of it
.> 0386 20 d2 03 jsr $03d2 ;get a byte
.> 0389 85 22 sta $22 ;store end address low byte
.> 038b 85 c3 sta $c3 ;make a copy of it
.> 038d 20 d2 03 jsr $03d2 ;get a byte
.> 0390 85 23 sta $23 ;store end address high byte
.> 0392 85 c4 sta $c4 ;make a copy of it
.> 0394 20 d2 03 jsr $03d2 ;get a byte - main loading loop
.> 0397 91 c1 sta ($c1),y ;store it memory at the indirect address loaded from block header
.> 0399 e6 c1 inc $c1 ;increment load address low byte
.> 039b d0 02 bne $039f ;skip next instruction if not equal to 00
.> 039d e6 c2 inc $c2 ;increment load address high byte
.> 039f d0 02 lda $c1 ;load accumulator with save address high byte
.> 03a1 c5 c3 cmp $c3 ;compare accumulator with low byte of end address (affecting the carry flag)
.> 03a3 a5 c2 lda $c2 ;load accumulator with load address high byte
.> 03a5 e5 c4 sbc $c4 ;subtract accumulator with carry from load address high byte (checking for end of file)
.> 03a7 90 eb bcc $0394 ;if carry is clear then continue loading
.> 03a9 20 d2 03 jsr $03d2 ;get a byte
.> 03ac d0 b0 bne $035e ;reset and restart load if not 00 (files to load)
.> 03ae 20 d2 03 jsr $03d2 ;get a byte
.> 03b1 85 4e sta $4e ;store start jump low byte
.> 03b3 20 d2 03 jsr $03d2 ;get a byte
.> 03b6 85 4f sta $4f ;store start jump high byte
.> 03b8 4c e1 02 jmp $02e1 ;jump back to boot file (finished loading now)
.> 03bb 20 e2 03 jsr $03e2 ;get a bit - sync to data block routine
.> 03be 66 bd ror $bd ;rotate bit into input byte. bit orientation is right to left ($bd = $00 on startup)
.> 03c0 a5 bd lda $bd ;load accumulator with input byte
.> 03c2 c9 96 cmp #$96 ;compare input byte to sync byte
.> 03c4 d0 f5 bne $03bb ;if not equal keep checking
.> 03c6 20 d2 03 jsr $03d2 ;get a byte
.> 03c9 c9 96 cmp #$96 ;compare accumulator to sync byte
.> 03cb f0 f9 beq $03c6 ;loop until end of sync mark
.> 03cd c9 81 cmp #$81 ;compare accumulator to block id = #%10000001
.> 03cf d0 ea bne $03bb ;resync
.> 03d1 60 rts ;return from subroutine
.> 03d2 a2 08 ldx #$08 ;load x index with #$08 - get a byte routine
.> 03d4 20 e2 03 jsr $03e2 ;get a bit
.> 03d7 66 bd ror $bd ;rotate bit into input byte
.> 03d9 ee 20 d0 inc $d020 ;increment border color (load effect)
.> 03dc ca dex ;decrememnt x index
.> 03dd d0 f5 bne get loop ;loop if 8 bits not received
.> 03df a5 bd lda $bd ;load accumulator with input byte
.> 03e1 60 rts ;return from subroutine
.> 03e2 a9 10 lda #$10 ;load accumulator with the mask #%00010000
.> 03e4 2c 0d dc bit $dc0d ;test the bits in accumulator against cia 1 interrupt control register
;bit 4: cassette read / serial buss SRQ input
.> 03e7 f9 fb beq $03e4 ;if bit = 0 then wait more
.> 03e9 ad 0d dd lda $dd0d ;load cia 2 interrupt control register
;bit 0 = timer A timeout (0 or 1)
.> 03ec 4a lsr ;shift bit into carry flag
.> 03ed a9 19 lda #$19 ;load accumulator with #%000110011
.> 03ef 8d 0e dd sta $dd0e ;store accumulator in cia 2 control register a
;bit 0 = 1 start timer
;bit 1 = 1 timer a output mode to pb6 = yes
;bit 3 = 1 one shot mode
;bit 4 = 1 force load timer a
;bit 5 = 0 count phase 02 clock cycles
;bit 6 = 0 serial port i/o mode = input
.> 03f2 60 rts ;return from subroutine
...and there it is.
$0351-$0373 turns off irq's, blanks the screen, sets up the timing constants for the loader and turns the cassette motor on.
The load loop begins at $0373, by doing a jsr to the sync routine to sync to a data block. The load then loads the start and end address of the file into zeropage.
Note: Conveniently it makes copies of these datas already, for making your own transfer tool. (which will be the next article using this loader).
The load loop is quite simple, fetching bytes and storing them and then checking for the end of file. Upon reaching the End of file mark, it then loads another byte, if this byte is equal to 00 then it exits the loader and does a few things to start the file, weather it be Machine Language or Basic.
The routine at $03bb is the data block sync routine. It rotates bits in one at a time until it gets a match to the sync byte. then reads sync bytes until it gets another byte. If this byte does not match the data block id byte, then the loader tries to find the next data block.
The byte fetch routine waits for a 1 in bit 4 of $dc0d, the cassette read line. When this bit is set a bit has been read from the datasette. We now check the the timeout flag of cia 2 interrupt control register. If a timeout occured, then the bit is a 1, if not it is a 0. The timer is restarted for the next bit and the fetched bit is rotated into the input byte. When 8 bits have been fetched it loads the input byte and returns.
Well kiddies, thats it for now. Next time we discuss how to make a tape to disk transfer out of this routine so we dont have to crack it by hand everytime. Bye for now.
Fungus/Nostalgia/Onslaught.
64doc
# $Id: 64doc,v 1.8 1994/06/03 19:50:04 jopi Exp $
#
# This file is part of Commodore 64 emulator
# and Program Development System.
#
# See README for copyright notice
#
# This file contains documentation for 6502/6510/8500/8502 instruction set.
#
#
# Written by
# John West (john@ucc.gu.uwa.edu.au)
# Marko Mäkelä (Marko.Makela@HUT.FI)
#
#
# $Log: 64doc,v $
# Revision 1.8 1994/06/03 19:50:04 jopi
# Patchlevel 2
#
# Revision 1.7 1994/04/15 13:07:04 jopi
# 65xx Register descriptions added
#
# Revision 1.6 1994/02/18 16:09:36 jopi
#
# Revision 1.5 1994/01/26 16:08:37 jopi
# X64 version 0.2 PL 1
#
# Revision 1.4 1993/11/10 01:55:34 jopi
#
# Revision 1.3 93/06/21 13:37:18 jopi
# X64 version 0.2 PL 0
#
# Revision 1.2 93/06/21 13:07:15 jopi
# *** empty log message ***
#
#
Note: To extract the uuencoded ML programs in this article most
easily you may use e.g. "uud" by Edwin Kremer <edwin@zlotty>,
which extracts them all at once.
Documentation for the NMOS 65xx/85xx Instruction Set
6510 Instructions by Addressing Modes
6502 Registers
6510/8502 Undocumented Commands
Register selection for load and store
Decimal mode in NMOS 6500 series
6510 features
Different CPU types
6510 Instruction Timing
How Real Programmers Acknowledge Interrupts
Memory Management
Autostart Code
Notes
References
6510 Instructions by Addressing Modes
off- ++++++++++ Positive ++++++++++ ---------- Negative ----------
set 00 20 40 60 80 a0 c0 e0 mode
+00 BRK JSR RTI RTS NOP* LDY CPY CPX Impl/immed
+01 ORA AND EOR ADC STA LDA CMP SBC (indir,x)
+02 t t t t NOP*t LDX NOP*t NOP*t ? /immed
+03 SLO* RLA* SRE* RRA* SAX* LAX* DCP* ISB* (indir,x)
+04 NOP* BIT NOP* NOP* STY LDY CPY CPX Zeropage
+05 ORA AND EOR ADC STA LDA CMP SBC Zeropage
+06 ASL ROL LSR ROR STX LDX DEC INC Zeropage
+07 SLO* RLA* SRE* RRA* SAX* LAX* DCP* ISB* Zeropage
+08 PHP PLP PHA PLA DEY TAY INY INX Implied
+09 ORA AND EOR ADC NOP* LDA CMP SBC Immediate
+0a ASL ROL LSR ROR TXA TAX DEX NOP Accu/impl
+0b ANC** ANC** ASR** ARR** ANE** LXA** SBX** SBC* Immediate
+0c NOP* BIT JMP JMP () STY LDY CPY CPX Absolute
+0d ORA AND EOR ADC STA LDA CMP SBC Absolute
+0e ASL ROL LSR ROR STX LDX DEC INC Absolute
+0f SLO* RLA* SRE* RRA* SAX* LAX* DCP* ISB* Absolute
+10 BPL BMI BVC BVS BCC BCS BNE BEQ Relative
+11 ORA AND EOR ADC STA LDA CMP SBC (indir),y
+12 t t t t t t t t ?
+13 SLO* RLA* SRE* RRA* SHA** LAX* DCP* ISB* (indir),y
+14 NOP* NOP* NOP* NOP* STY LDY NOP* NOP* Zeropage,x
+15 ORA AND EOR ADC STA LDA CMP SBC Zeropage,x
+16 ASL ROL LSR ROR STX y) LDX y) DEC INC Zeropage,x
+17 SLO* RLA* SRE* RRA* SAX* y) LAX* y) DCP* ISB* Zeropage,x
+18 CLC SEC CLI SEI TYA CLV CLD SED Implied
+19 ORA AND EOR ADC STA LDA CMP SBC Absolute,y
+1a NOP* NOP* NOP* NOP* TXS TSX NOP* NOP* Implied
+1b SLO* RLA* SRE* RRA* SHS** LAS** DCP* ISB* Absolute,y
+1c NOP* NOP* NOP* NOP* SHY** LDY NOP* NOP* Absolute,x
+1d ORA AND EOR ADC STA LDA CMP SBC Absolute,x
+1e ASL ROL LSR ROR SHX**y) LDX y) DEC INC Absolute,x
+1f SLO* RLA* SRE* RRA* SHA**y) LAX* y) DCP* ISB* Absolute,x
ROR intruction is available on MC650x microprocessors after
June, 1976.
Legend:
t Jams the machine
*t Jams very rarely
* Undocumented command
** Unusual operation
y) indexed using Y instead of X
() indirect instead of absolute
Note that the NOP instructions do have other addressing modes
than the implied addressing. The NOP instruction is just like
any other load instruction, except it does not store the
result anywhere nor affects the flags.
6502 Registers
The NMOS 65xx processors are not ruined with too many registers. In
addition to that, the registers are mostly 8-bit. Here is a brief
description of each register:
PC Program Counter
This register points the address from which the next
instruction byte (opcode or parameter) will be fetched.
Unlike other registers, this one is 16 bits in length. The
low and high 8-bit halves of the register are called PCL
and PCH, respectively.
The Program Counter may be read by pushing its value on
the stack. This can be done either by jumping to a
subroutine or by causing an interrupt.
S Stack pointer
The NMOS 65xx processors have 256 bytes of stack memory,
ranging from $0100 to $01FF. The S register is a 8-bit
offset to the stack page. In other words, whenever
anything is being pushed on the stack, it will be stored
to the address $0100+S.
The Stack pointer can be read and written by transfering
its value to or from the index register X (see below) with
the TSX and TXS instructions.
P Processor status
This 8-bit register stores the state of the processor. The
bits in this register are called flags. Most of the flags
have something to do with arithmetic operations.
The P register can be read by pushing it on the stack
(with PHP or by causing an interrupt). If you only need to
read one flag, you can use the branch instructions.
Setting the flags is possible by pulling the P register
from stack or by using the flag set or clear instructions.
Following is a list of the flags, starting from the 8th
bit of the P register (bit 7, value $80):
N Negative flag
This flag will be set after any arithmetic operations
(when any of the registers A, X or Y is being loaded
with a value). Generally, the N flag will be copied
from the topmost bit of the register being loaded.
Note that TXS (Transfer X to S) is not an arithmetic
operation. Also note that the BIT instruction affects
the Negative flag just like arithmetic operations.
Finally, the Negative flag behaves differently in
Decimal operations (see description below).
V oVerflow flag
Like the Negative flag, this flag is intended to be
used with 8-bit signed integer numbers. The flag will
be affected by addition and subtraction, the
instructions PLP, CLV and BIT, and the hardware signal
-SO. Note that there is no SEV instruction, even though
the MOS engineers loved to use East European abbreviations,
like DDR (Deutsche Demokratische Republik vs. Data
Direction Register). (The Russian abbreviation for their
former trade association COMECON is SEV.) The -SO
(Set Overflow) signal is available on some processors,
at least the 6502, to set the V flag. This enables
response to an I/O activity in equal or less than
three clock cycles when using a BVC instruction branching
to itself ($50 $FE).
The CLV instruction clears the V flag, and the PLP and
BIT instructions copy the flag value from the bit 6 of
the topmost stack entry or from memory.
After a binary addition or subtraction, the V flag
will be set on a sign overflow, cleared otherwise.
What is a sign overflow? For instance, if you are
trying to add 123 and 45 together, the result (168)
does not fit in a 8-bit signed integer (upper limit
127 and lower limit -128). Similarly, adding -123 to
-45 causes the overflow, just like subtracting -45
from 123 or 123 from -45 would do.
Like the N flag, the V flag will not be set as
expected in the Decimal mode. Later in this document
is a precise operation description.
A common misbelief is that the V flag could only be
set by arithmetic operations, not cleared.
1 unused flag
To the current knowledge, this flag is always 1.
B Break flag
This flag is used to distinguish software (BRK)
interrupts from hardware interrupts (IRQ or NMI). The
B flag is always set except when the P register is
being pushed on stack when jumping to an interrupt
routine to process only a hardware interrupt.
The official NMOS 65xx documentation claims that the
BRK instruction could only cause a jump to the IRQ
vector ($FFFE). However, if an NMI interrupt occurs
while executing a BRK instruction, the processor will
jump to the NMI vector ($FFFA), and the P register
will be pushed on the stack with the B flag set.
D Decimal mode flag
This flag is used to select the (Binary Coded) Decimal
mode for addition and subtraction. In most
applications, the flag is zero.
The Decimal mode has many oddities, and it operates
differently on CMOS processors. See the description
of the ADC, SBC and ARR instructions below.
I Interrupt disable flag
This flag can be used to prevent the processor from
jumping to the IRQ handler vector ($FFFE) whenever the
hardware line -IRQ is active. The flag will be
automatically set after taking an interrupt, so that
the processor would not keep jumping to the interrupt
routine if the -IRQ signal remains low for several
clock cycles.
Z Zero flag
The Zero flag will be affected in the same cases than
the Negative flag. Generally, it will be set if an
arithmetic register is being loaded with the value
zero, and cleared otherwise. The flag will behave
differently in Decimal operations.
C Carry flag
This flag is used in additions, subtractions,
comparisons and bit rotations. In additions and
subtractions, it acts as a 9th bit and lets you to
chain operations to calculate with bigger than 8-bit
numbers. When subtracting, the Carry flag is the
negative of Borrow: if an overflow occurs, the flag
will be clear, otherwise set. Comparisons are a
special case of subtraction: they assume Carry flag
set and Decimal flag clear, and do not store the
result of the subtraction anywhere.
There are four kinds of bit rotations. All of them
store the bit that is being rotated off to the Carry
flag. The left shifting instructions are ROL and ASL.
ROL copies the initial Carry flag to the lowmost bit
of the byte; ASL always clears it. Similarly, the ROR
and LSR instructions shift to the right.
A Accumulator
The accumulator is the main register for arithmetic and
logic operations. Unlike the index registers X and Y, it
has a direct connection to the Arithmetic and Logic Unit
(ALU). This is why many operations are only available for
the accumulator, not the index registers.
X Index register X
This is the main register for addressing data with
indices. It has a special addressing mode, indexed
indirect, which lets you to have a vector table on the
zero page.
Y Index register Y
The Y register has the least operations available. On the
other hand, only it has the indirect indexed addressing
mode that enables access to any memory place without
having to use self-modifying code.
6510/8502 Undocumented Commands
-- A brief explanation about what may happen while
using don't care states.
ANE $8B A = (A | #$EE) & X & #byte
same as
A = ((A & #$11 & X) | ( #$EE & X)) & #byte
In real 6510/8502 the internal parameter #$11
may occasionally be #$10, #$01 or even #$00.
This occurs when the video chip starts DMA
between the opcode fetch and the parameter fetch
of the instruction. The value probably depends
on the data that was left on the bus by the VIC-II.
LXA $AB C=Lehti: A = X = ANE
Alternate: A = X = (A & #byte)
TXA and TAX have to be responsible for these.
SHA $93,$9F Store (A & X & (ADDR_HI + 1))
SHX $9E Store (X & (ADDR_HI + 1))
SHY $9C Store (Y & (ADDR_HI + 1))
SHS $9B SHA and TXS, where X is replaced by (A & X).
Note: The value to be stored is copied also
to ADDR_HI if page boundary is crossed.
SBX $CB Carry and Decimal flags are ignored but the
Carry flag will be set in substraction. This
is due to the CMP command, which is executed
instead of the real SBC.
ARR $6B This instruction first performs an AND
between the accumulator and the immediate
parameter, then it shifts the accumulator to
the right. However, this is not the whole
truth. See the description below.
Many undocumented commands do not use AND between registers, the CPU
just throws the bytes to a bus simultaneously and lets the
open-collector drivers perform the AND. I.e. the command called 'SAX',
which is in the STORE section (opcodes $A0...$BF), stores the result
of (A & X) by this way.
More fortunate is its opposite, 'LAX' which just loads a byte
simultaneously into both A and X.
$6B ARR
This instruction seems to be a harmless combination of AND and ROR at
first sight, but it turns out that it affects the V flag and also has
a special kind of decimal mode. This is because the instruction has
inherited some properties of the ADC instruction ($69) in addition to
the ROR ($6A).
In Binary mode (D flag clear), the instruction effectively does an AND
between the accumulator and the immediate parameter, and then shifts
the accumulator to the right, copying the C flag to the 8th bit. It
sets the Negative and Zero flags just like the ROR would. The ADC code
shows up in the Carry and oVerflow flags. The C flag will be copied
from the bit 6 of the result (which doesn't seem too logical), and the
V flag is the result of an Exclusive OR operation between the bit 6
and the bit 5 of the result. This makes sense, since the V flag will
be normally set by an Exclusive OR, too.
In Decimal mode (D flag set), the ARR instruction first performs the
AND and ROR, just like in Binary mode. The N flag will be copied from
the initial C flag, and the Z flag will be set according to the ROR
result, as expected. The V flag will be set if the bit 6 of the
accumulator changed its state between the AND and the ROR, cleared
otherwise.
Now comes the funny part. If the low nybble of the AND result,
incremented by its lowmost bit, is greater than 5, the low nybble in
the ROR result will be incremented by 6. The low nybble may overflow
as a consequence of this BCD fixup, but the high nybble won't be
adjusted. The high nybble will be BCD fixed in a similar way. If the
high nybble of the AND result, incremented by its lowmost bit, is
greater than 5, the high nybble in the ROR result will be incremented
by 6, and the Carry flag will be set. Otherwise the C flag will be
cleared.
To help you understand this description, here is a C routine that
illustrates the ARR operation in Decimal mode:
unsigned
A, /* Accumulator */
AL, /* low nybble of accumulator */
AH, /* high nybble of accumulator */
C, /* Carry flag */
Z, /* Zero flag */
V, /* oVerflow flag */
N, /* Negative flag */
t, /* temporary value */
s; /* value to be ARRed with Accumulator */
t = A & s; /* Perform the AND. */
AH = t >> 4; /* Separate the high */
AL = t & 15; /* and low nybbles. */
N = C; /* Set the N and */
Z = !(A = (t >> 1) | (C << 7)); /* Z flags traditionally */
V = (t ^ A) & 64; /* and V flag in a weird way. */
if (AL + (AL & 1) > 5) /* BCD "fixup" for low nybble. */
A = (A & 0xF0) | ((A + 6) & 0xF);
if (C = AH + (AH & 1) > 5) /* Set the Carry flag. */
A = (A + 0x60) & 0xFF; /* BCD "fixup" for high nybble. */
$CB SBX X <- (A & X) - Immediate
The 'SBX' ($CB) may seem to be very complex operation, even though it
is a combination of the subtraction of accumulator and parameter, as
in the 'CMP' instruction, and the command 'DEX'. As a result, both A
and X are connected to ALU but only the subtraction takes place. Since
the comparison logic was used, the result of subtraction should be
normally ignored, but the 'DEX' now happily stores to X the value of
(A & X) - Immediate. That is why this instruction does not have any
decimal mode, and it does not affect the V flag. Also Carry flag will
be ignored in the subtraction but set according to the result.
Proof:
begin 644 vsbx
M`0@9$,D'GL(H-#,IJC(U-JS"*#0T*:HR-@```*D`H#V1*Z`_D2N@09$KJ0>%
M^QBE^VEZJ+$KH#F1*ZD`2"BI`*(`RP`(:-B@.5$K*4#P`E@`H#VQ*SAI`)$K
JD-Z@/[$K:0"1*Y#4J2X@TO\XH$&Q*VD`D2N0Q,;[$+188/_^]_:_OK>V
`
end
and
begin 644 sbx
M`0@9$,D'GL(H-#,IJC(U-JS"*#0T*:HR-@```'BI`*!-D2N@3Y$KH%&1*ZD#
MA?L8I?M*2)`#J1@LJ3B@29$K:$J0`ZGX+*G8R)$K&/BXJ?2B8\L)AOP(:(7]
MV#B@3;$KH$\Q*Z!1\2L(1?SP`0!H1?TIM]#XH$VQ*SAI`)$KD,N@3[$K:0"1
9*Y#!J2X@TO\XH%&Q*VD`D2N0L<;[$))88-#X
`
end
These test programs show if your machine is compatible with ours
regarding the opcode $CB. The first test, vsbx, proves that SBX does
not affect the V flag. The latter one, sbx, proves the rest of our
theory. The vsbx test tests 33554432 SBX combinations (16777216
different A, X and Immediate combinations, and two different V flag
states), and the sbx test doubles that amount (16777216*4 D and C flag
combinations). Both tests have run successfully on a C64 and a Vic20.
They ought to run on C16, +4 and the PET series as well. The tests
stop with BRK, if the opcode $CB does not work as expected. Successful
operation ends in RTS. As the tests are very slow, they print dots on
the screen while running so that you know that the machine has not
jammed. On computers running at 1 MHz, the first test prints
approximately one dot every four seconds and a total of 2048 dots,
whereas the second one prints half that amount, one dot every seven
seconds.
If the tests fail on your machine, please let us know your processor's
part number and revision. If possible, save the executable (after it
has stopped with BRK) under another name and send it to us so that we
know at which stage the program stopped.
The following program is a Commodore 64 executable that Marko Mäkelä
developed when trying to find out how the V flag is affected by SBX.
(It was believed that the SBX affects the flag in a weird way, and
this program shows how SBX sets the flag differently from SBC.) You
may find the subroutine at $C150 useful when researching other
undocumented instructions' flags. Run the program in a machine
language monitor, as it makes use of the BRK instruction. The result
tables will be written on pages $C2 and $C3.
begin 644 sbx-c100
M`,%XH`",#L&,$,&,$L&XJ8*B@LL7AOL(:(7\N#BM#L$M$,'M$L$(Q?OP`B@`
M:$7\\`,@4,'N#L'0U.X0P=#/SB#0[A+!T,<``````````````)BJ\!>M#L$M
L$,'=_\'0":T2P=W_PM`!8,K0Z:T.P2T0P9D`PID`!*T2P9D`PYD`!<C0Y``M
`
end
Other undocumented instructions usually cause two preceding opcodes
being executed. However 'NOP' seems to completely disappear from 'SBC'
code $EB.
The most difficult to comprehend are the rest of the instructions
located on the '$0B' line.
All the instructions located at the positive (left) side of this line
should rotate either memory or the accumulator, but the addressing
mode turns out to be immediate! No problem. Just read the operand, let
it be ANDed with the accumulator and finally use accumulator
addressing mode for the instructions above them.
RELIGION_MODE_ON
/* This part of the document is not accurate. You can
read it as a fairy tale, but do not count on it when
performing your own measurements. */
The rest two instructions on the same line, called 'ANE' and 'LXA'
($8B and $AB respectively) often give quite unpredictable results.
However, the most usual operation is to store ((A | #$ee) & X & #$nn)
to accumulator. Note that this does not work reliably in a real 64!
In the Commodore 128 the opcode $8B uses values 8C, CC, EE, and
occasionally 0C and 8E for the OR instead of EE,EF,FE and FF used in
the C64. With a C128 running at 2 MHz #$EE is always used. Opcode $AB
does not cause this OR taking place on 8502 while 6510 always performs
it. Note that this behaviour depends on processor and/or video chip
revision.
Let's take a closer look at $8B (6510).
A <- X & D & (A | VAL)
where VAL comes from this table:
X high D high D low VAL
even even --- $EE (1)
even odd --- $EE
odd even --- $EE
odd odd 0 $EE
odd odd not 0 $FE (2)
(1) If the bottom 2 bits of A are both 1, then the LSB of the result may
be 0. The values of X and D are different every time I run the test.
This appears to be very rare.
(2) VAL is $FE most of the time. Sometimes it is $EE - it seems to be random,
not related to any of the data. This is much more common than (1).
In decimal mode, VAL is usually $FE.
Two different functions have been discovered for LXA, opcode $AB. One
is A = X = ANE (see above) and the other, encountered with 6510 and
8502, is less complicated A = X = (A & #byte). However, according to
what is reported, the version altering only the lowest bits of each
nybble seems to be more common.
What happens, is that $AB loads a value into both A and X, ANDing the
low bit of each nybble with the corresponding bit of the old
A. However, there are exceptions. Sometimes the low bit is cleared
even when A contains a '1', and sometimes other bits are cleared. The
exceptions seem random (they change every time I run the test). Oops -
that was in decimal mode. Much the same with D=0.
What causes the randomness? Probably it is that it is marginal logic
levels - when too much wired-anding goes on, some of the signals get
very close to the threshold. Perhaps we're seeing some of them step
over it. The low bit of each nybble is special, since it has to cope
with carry differently (remember decimal mode). We never see a '0'
turn into a '1'.
Since these instructions are unpredictable, they should not be used.
There is still very strange instruction left, the one named SHA/X/Y,
which is the only one with only indexed addressing modes. Actually,
the commands 'SHA', 'SHX' and 'SHY' are generated by the indexing
algorithm.
While using indexed addressing, effective address for page boundary
crossing is calculated as soon as possible so it does not slow down
operation. As a result, in the case of SHA/X/Y, the address and data
are processed at the same time making AND between them to take place.
Thus, the value to be stored by SAX, for example, is in fact (A & X &
(ADDR_HI + 1)). On page boundary crossing the same value is copied
also to high byte of the effective address.
RELIGION_MODE_OFF
Register selection for load and store
bit1 bit0 A X Y
0 0 x
0 1 x
1 0 x
1 1 x x
So, A and X are selected by bits 1 and 0 respectively, while
~(bit1|bit0) enables Y.
Indexing is determined by bit4, even in relative addressing mode,
which is one kind of indexing.
Lines containing opcodes xxx000x1 (01 and 03) are treated as absolute
after the effective address has been loaded into CPU.
Zeropage,y and Absolute,y (codes 10x1 x11x) are distinquished by bit5.
Decimal mode in NMOS 6500 series
Most sources claim that the NMOS 6500 series sets the N, V and Z
flags unpredictably when performing addition or subtraction in decimal
mode. Of course, this is not true. While testing how the flags are
set, I also wanted to see what happens if you use illegal BCD values.
ADC works in Decimal mode in a quite complicated way. It is amazing
how it can do that all in a single cycle. Here's a C code version of
the instruction:
unsigned
A, /* Accumulator */
AL, /* low nybble of accumulator */
AH, /* high nybble of accumulator */
C, /* Carry flag */
Z, /* Zero flag */
V, /* oVerflow flag */
N, /* Negative flag */
s; /* value to be added to Accumulator */
AL = (A & 15) + (s & 15) + C; /* Calculate the lower nybble. */
AH = (A >> 4) + (s >> 4) + (AL > 15); /* Calculate the upper nybble. */
if (AL > 9) AL += 6; /* BCD fixup for lower nybble. */
Z = ((A + s + C) & 255 != 0); /* Zero flag is set just
like in Binary mode. */
/* Negative and Overflow flags are set with the same logic than in
Binary mode, but after fixing the lower nybble. */
N = (AH & 8 != 0);
V = ((AH << 4) ^ A) & 128 && !((A ^ s) & 128);
if (AH > 9) AH += 6; /* BCD fixup for upper nybble. */
/* Carry is the only flag set after fixing the result. */
C = (AH > 15);
A = ((AH << 4) | (AL & 15)) & 255;
The C flag is set as the quiche eaters expect, but the N and V flags
are set after fixing the lower nybble but before fixing the upper one.
They use the same logic than binary mode ADC. The Z flag is set before
any BCD fixup, so the D flag does not have any influence on it.
Proof: The following test program tests all 131072 ADC combinations in
Decimal mode, and aborts with BRK if anything breaks this theory.
If everything goes well, it ends in RTS.
begin 600 dadc
M 0@9",D'GL(H-#,IJC(U-JS"*#0T*:HR-@ 'BI&* A/N$_$B@+)$KH(V1
M*Q@(I?PI#X7]I?LI#V7]R0J0 FD%J"D/A?VE^RGP9?PI\ C $) ":0^JL @H
ML ?)H) &""@X:5\X!?V%_0AH*3W@ ! ""8"HBD7[$ JE^T7\, 28"4"H**7[
M9?S0!)@) J@8N/BE^V7\V A%_= G:(3]1?W0(.;[T(?F_-"#:$D8\ )88*D=
0&&4KA?NI &4LA?RI.&S[ A%
end
All programs in this chapter have been successfully tested on a Vic20
and a Commodore 64 and a Commodore 128D in C64 mode. They should run on
C16, +4 and on the PET series as well. If not, please report the problem
to Marko Mäkelä. Each test in this chapter should run in less than a
minute at 1 MHz.
SBC is much easier. Just like CMP, its flags are not affected by
the D flag.
Proof:
begin 600 dsbc-cmp-flags
M 0@9",D'GL(H-#,IJC(U-JS"*#0T*:HR-@ 'B@ (3[A/RB XH8:66HL2N@
M09$KH$R1*XII::BQ*Z!%D2N@4)$K^#BXI?OE_-@(:(7].+BE^^7\"&A%_? !
5 .;[T./F_-#?RA"_8!@X&#CEY<7%
end
The only difference in SBC's operation in decimal mode from binary mode
is the result-fixup:
unsigned
A, /* Accumulator */
AL, /* low nybble of accumulator */
AH, /* high nybble of accumulator */
C, /* Carry flag */
Z, /* Zero flag */
V, /* oVerflow flag */
N, /* Negative flag */
s; /* value to be added to Accumulator */
AL = (A & 15) - (s & 15) - !C; /* Calculate the lower nybble. */
if (AL & 16) AL -= 6; /* BCD fixup for lower nybble. */
AH = (A >> 4) - (s >> 4) - (AL & 16); /* Calculate the upper nybble. */
if (AH & 16) AH -= 6; /* BCD fixup for upper nybble. */
/* The flags are set just like in Binary mode. */
C = (A - s - !C) & 256 != 0;
Z = (A - s - !C) & 255 != 0;
V = ((A - s - !C) ^ s) & 128 && (A ^ s) & 128;
N = (A - s - !C) & 128 != 0;
A = ((AH << 4) | (AL & 15)) & 255;
Again Z flag is set before any BCD fixup. The N and V flags are set
at any time before fixing the high nybble. The C flag may be set in any
phase.
Decimal subtraction is easier than decimal addition, as you have to
make the BCD fixup only when a nybble overflows. In decimal addition,
you had to verify if the nybble was greater than 9. The processor has
an internal "half carry" flag for the lower nybble, used to trigger
the BCD fixup. When calculating with legal BCD values, the lower nybble
cannot overflow again when fixing it.
So, the processor does not handle overflows while performing the fixup.
Similarly, the BCD fixup occurs in the high nybble only if the value
overflows, i.e. when the C flag will be cleared.
Because SBC's flags are not affected by the Decimal mode flag, you
could guess that CMP uses the SBC logic, only setting the C flag
first. But the SBX instruction shows that CMP also temporarily clears
the D flag, although it is totally unnecessary.
The following program, which tests SBC's result and flags,
contains the 6502 version of the pseudo code example above.
begin 600 dsbc
M 0@9",D'GL(H-#,IJC(U-JS"*#0T*:HR-@ 'BI&* A/N$_$B@+)$KH':1
M*S@(I?PI#X7]I?LI#^7]L /I!1@I#ZBE_"GPA?VE^RGP"#CE_2GPL KI7RBP
M#ND/.+ )*+ &Z0^P NE?A/T%_87]*+BE^^7\"&BH.+CXI?OE_-@(1?W0FVB$
8_47]T)3F^]">YOS0FFA)&- $J3C0B%A@
end
Obviously the undocumented instructions RRA (ROR+ADC) and ISB
(INC+SBC) have inherited also the decimal operation from the official
instructions ADC and SBC. The program droradc proves this statement
for ROR, and the dincsbc test proves this for ISB. Finally,
dincsbc-deccmp proves that ISB's and DCP's (DEC+CMP) flags are not
affected by the D flag.
begin 644 droradc
M`0@9",D'GL(H-#,IJC(U-JS"*#0T*:HR-@```'BI&*``A/N$_$B@+)$KH(V1
M*S@(I?PI#X7]I?LI#V7]R0J0`FD%J"D/A?VE^RGP9?PI\`C`$)`":0^JL`@H
ML`?)H)`&""@X:5\X!?V%_0AH*3W@`!`""8"HBD7[$`JE^T7\,`28"4"H**7[
M9?S0!)@)`J@XN/BE^R;\9_S8"$7]T"=HA/U%_=`@YOO0A>;\T(%H21CP`EA@
2J1T892N%^ZD`92R%_*DX;/L`
`
end
begin 644 dincsbc
M`0@9",D'GL(H-#,IJC(U-JS"*#0T*:HR-@```'BI&*``A/N$_$B@+)$KH':1
M*S@(I?PI#X7]I?LI#^7]L`/I!1@I#ZBE_"GPA?VE^RGP"#CE_2GPL`KI7RBP
M#ND/.+`)*+`&Z0^P`NE?A/T%_87]*+BE^^7\"&BH.+CXI?O&_.?\V`A%_="9
::(3]1?W0DN;[T)SF_-"8:$D8T`2I.-"&6&#\
`
end
begin 644 dincsbc-deccmp
M`0@9",D'GL(H-#,IJC(U-JS"*#0T*:HR-@```'B@`(3[A/RB`XH8:7>HL2N@
M3Y$KH%R1*XII>ZBQ*Z!3D2N@8)$KBFE_J+$KH%61*Z!BD2OX.+BE^^;\Q_S8
L"&B%_3BXI?OF_,?\"&A%_?`!`.;[T-_F_-#;RA"M8!@X&#CFYL;&Q\?GYP#8
`
end
6510 features
o PHP always pushes the Break (B) flag as a `1' to the stack.
Jukka Tapanimäki claimed in C=lehti issue 3/89, on page 27 that the
processor makes a logical OR between the status register's bit 4
and the bit 8 of the stack pointer register (which is always 1).
He did not give any reasons for this argument, and has refused to clarify
it afterwards. Well, this was not the only error in his article...
o Indirect addressing modes do not handle page boundary crossing at all.
When the parameter's low byte is $FF, the effective address wraps
around and the CPU fetches high byte from $xx00 instead of $xx00+$0100.
E.g. JMP ($01FF) fetches PCL from $01FF and PCH from $0100,
and LDA ($FF),Y fetches the base address from $FF and $00.
o Indexed zero page addressing modes never fix the page address on
crossing the zero page boundary.
E.g. LDX #$01 : LDA ($FF,X) loads the effective address from $00 and $01.
o The processor always fetches the byte following a relative branch
instruction. If the branch is taken, the processor reads then the
opcode from the destination address. If page boundary is crossed, it
first reads a byte from the old page from a location that is bigger
or smaller than the correct address by one page.
o If you cross a page boundary in any other indexed mode,
the processor reads an incorrect location first, a location that is
smaller by one page.
o Read-Modify-Write instructions write unmodified data, then modified
(so INC effectively does LDX loc;STX loc;INX;STX loc)
o -RDY is ignored during writes
(This is why you must wait 3 cycles before doing any DMA --
the maximum number of consecutive writes is 3, which occurs
during interrupts except -RESET.)
o Some undefined opcodes may give really unpredictable results.
o All registers except the Program Counter remain unmodified after -RESET.
(This is why you must preset D and I flags in the RESET handler.)
Different CPU types
The Rockwell data booklet 29651N52 (technical information about R65C00
microprocessors, dated October 1984), lists the following differences between
NMOS R6502 microprocessor and CMOS R65C00 family:
1. Indexed addressing across page boundary.
NMOS: Extra read of invalid address.
CMOS: Extra read of last instruction byte.
2. Execution of invalid op codes.
NMOS: Some terminate only by reset. Results are undefined.
CMOS: All are NOPs (reserved for future use).
3. Jump indirect, operand = XXFF.
NMOS: Page address does not increment.
CMOS: Page address increments and adds one additional cycle.
4. Read/modify/write instructions at effective address.
NMOS: One read and two write cycles.
CMOS: Two read and one write cycle.
5. Decimal flag.
NMOS: Indeterminate after reset.
CMOS: Initialized to binary mode (D=0) after reset and interrupts.
6. Flags after decimal operation.
NMOS: Invalid N, V and Z flags.
CMOS: Valid flag adds one additional cycle.
7. Interrupt after fetch of BRK instruction.
NMOS: Interrupt vector is loaded, BRK vector is ignored.
CMOS: BRK is executed, then interrupt is executed.
6510 Instruction Timing
The NMOS 6500 series processors always perform at least two reads
for each instruction. In addition to the operation code (opcode), they
fetch the next byte. This is quite efficient, as most instructions are
two or three bytes long.
The processors also use a sort of pipelining. If an instruction does
not store data in memory on its last cycle, the processor can fetch
the opcode of the next instruction while executing the last cycle. For
instance, the instruction EOR #$FF truly takes three cycles. On the
first cycle, the opcode $49 will be fetched. During the second cycle
the processor decodes the opcode and fetches the parameter #$FF. On
the third cycle, the processor will perform the operation and store
the result to accumulator, but simultaneously it fetches the opcode
for the next instruction. This is why the instruction effectively
takes only two cycles.
The following tables show what happens on the bus while executing
different kinds of instructions.
Interrupts
NMI and IRQ both take 7 cycles. Their timing diagram is much like
BRK's (see below). IRQ will be executed only when the I flag is
clear. IRQ and BRK both set the I flag, whereas the NMI does not
affect its state.
The processor will usually wait for the current instruction to
complete before executing the interrupt sequence. To process the
interrupt before the next instruction, the interrupt must occur
before the last cycle of the current instruction.
There is one exception to this rule: the BRK instruction. If a
hardware interrupt (NMI or IRQ) occurs before the fourth (flags
saving) cycle of BRK, the BRK instruction will be skipped, and
the processor will jump to the hardware interrupt vector. This
sequence will always take 7 cycles.
You do not completely lose the BRK interrupt, the B flag will be
set in the pushed status register if a BRK instruction gets
interrupted. When BRK and IRQ occur at the same time, this does
not cause any problems, as your program will consider it as a
BRK, and the IRQ would occur again after the processor returned
from your BRK routine, unless you cleared the interrupt source in
your BRK handler. But the simultaneous occurrence of NMI and BRK
is far more fatal. If you do not check the B flag in the NMI
routine and subtract two from the return address when needed, the
BRK instruction will be skipped.
If the NMI and IRQ interrupts overlap each other (one interrupt
occurs before fetching the interrupt vector for the other
interrupt), the processor will most probably jump to the NMI
vector in every case, and then jump to the IRQ vector after
processing the first instruction of the NMI handler. This has not
been measured yet, but the IRQ is very similar to BRK, and many
sources state that the NMI has higher priority than IRQ. However,
it might be that the processor takes the interrupt that comes
later, i.e. you could lose an NMI interrupt if an IRQ occurred in
four cycles after it.
After finishing the interrupt sequence, the processor will start
to execute the first instruction of the interrupt routine. This
proves that the processor uses a sort of pipelining: it finishes
the current instruction (or interrupt sequence) while reading the
opcode of the next instruction.
RESET does not push program counter on stack, and it lasts
probably 6 cycles after deactivating the signal. Like NMI, RESET
preserves all registers except PC.
Instructions accessing the stack
BRK
# address R/W description
--- ------- --- -----------------------------------------------
1 PC R fetch opcode, increment PC
2 PC R read next instruction byte (and throw it away),
increment PC
3 $0100,S W push PCH on stack, decrement S
4 $0100,S W push PCL on stack, decrement S
5 $0100,S W push P on stack (with B flag set), decrement S
6 $FFFE R fetch PCL
7 $FFFF R fetch PCH
RTI
# address R/W description
--- ------- --- -----------------------------------------------
1 PC R fetch opcode, increment PC
2 PC R read next instruction byte (and throw it away)
3 $0100,S R increment S
4 $0100,S R pull P from stack, increment S
5 $0100,S R pull PCL from stack, increment S
6 $0100,S R pull PCH from stack
RTS
# address R/W description
--- ------- --- -----------------------------------------------
1 PC R fetch opcode, increment PC
2 PC R read next instruction byte (and throw it away)
3 $0100,S R increment S
4 $0100,S R pull PCL from stack, increment S
5 $0100,S R pull PCH from stack
6 PC R increment PC
PHA, PHP
# address R/W description
--- ------- --- -----------------------------------------------
1 PC R fetch opcode, increment PC
2 PC R read next instruction byte (and throw it away)
3 $0100,S W push register on stack, decrement S
PLA, PLP
# address R/W description
--- ------- --- -----------------------------------------------
1 PC R fetch opcode, increment PC
2 PC R read next instruction byte (and throw it away)
3 $0100,S R increment S
4 $0100,S R pull register from stack
JSR
# address R/W description
--- ------- --- -------------------------------------------------
1 PC R fetch opcode, increment PC
2 PC R fetch low address byte, increment PC
3 $0100,S R internal operation (predecrement S?)
4 $0100,S W push PCH on stack, decrement S
5 $0100,S W push PCL on stack, decrement S
6 PC R copy low address byte to PCL, fetch high address
byte to PCH
Accumulator or implied addressing
# address R/W description
--- ------- --- -----------------------------------------------
1 PC R fetch opcode, increment PC
2 PC R read next instruction byte (and throw it away)
Immediate addressing
# address R/W description
--- ------- --- ------------------------------------------
1 PC R fetch opcode, increment PC
2 PC R fetch value, increment PC
Absolute addressing
JMP
# address R/W description
--- ------- --- -------------------------------------------------
1 PC R fetch opcode, increment PC
2 PC R fetch low address byte, increment PC
3 PC R copy low address byte to PCL, fetch high address
byte to PCH
Read instructions (LDA, LDX, LDY, EOR, AND, ORA, ADC, SBC, CMP, BIT,
LAX, NOP)
# address R/W description
--- ------- --- ------------------------------------------
1 PC R fetch opcode, increment PC
2 PC R fetch low byte of address, increment PC
3 PC R fetch high byte of address, increment PC
4 address R read from effective address
Read-Modify-Write instructions (ASL, LSR, ROL, ROR, INC, DEC,
SLO, SRE, RLA, RRA, ISB, DCP)
# address R/W description
--- ------- --- ------------------------------------------
1 PC R fetch opcode, increment PC
2 PC R fetch low byte of address, increment PC
3 PC R fetch high byte of address, increment PC
4 address R read from effective address
5 address W write the value back to effective address,
and do the operation on it
6 address W write the new value to effective address
Write instructions (STA, STX, STY, SAX)
# address R/W description
--- ------- --- ------------------------------------------
1 PC R fetch opcode, increment PC
2 PC R fetch low byte of address, increment PC
3 PC R fetch high byte of address, increment PC
4 address W write register to effective address
Zero page addressing
Read instructions (LDA, LDX, LDY, EOR, AND, ORA, ADC, SBC, CMP, BIT,
LAX, NOP)
# address R/W description
--- ------- --- ------------------------------------------
1 PC R fetch opcode, increment PC
2 PC R fetch address, increment PC
3 address R read from effective address
Read-Modify-Write instructions (ASL, LSR, ROL, ROR, INC, DEC,
SLO, SRE, RLA, RRA, ISB, DCP)
# address R/W description
--- ------- --- ------------------------------------------
1 PC R fetch opcode, increment PC
2 PC R fetch address, increment PC
3 address R read from effective address
4 address W write the value back to effective address,
and do the operation on it
5 address W write the new value to effective address
Write instructions (STA, STX, STY, SAX)
# address R/W description
--- ------- --- ------------------------------------------
1 PC R fetch opcode, increment PC
2 PC R fetch address, increment PC
3 address W write register to effective address
Zero page indexed addressing
Read instructions (LDA, LDX, LDY, EOR, AND, ORA, ADC, SBC, CMP, BIT,
LAX, NOP)
# address R/W description
--- --------- --- ------------------------------------------
1 PC R fetch opcode, increment PC
2 PC R fetch address, increment PC
3 address R read from address, add index register to it
4 address+I* R read from effective address
Notes: I denotes either index register (X or Y).
* The high byte of the effective address is always zero,
i.e. page boundary crossings are not handled.
Read-Modify-Write instructions (ASL, LSR, ROL, ROR, INC, DEC,
SLO, SRE, RLA, RRA, ISB, DCP)
# address R/W description
--- --------- --- ---------------------------------------------
1 PC R fetch opcode, increment PC
2 PC R fetch address, increment PC
3 address R read from address, add index register X to it
4 address+X* R read from effective address
5 address+X* W write the value back to effective address,
and do the operation on it
6 address+X* W write the new value to effective address
Note: * The high byte of the effective address is always zero,
i.e. page boundary crossings are not handled.
Write instructions (STA, STX, STY, SAX)
# address R/W description
--- --------- --- -------------------------------------------
1 PC R fetch opcode, increment PC
2 PC R fetch address, increment PC
3 address R read from address, add index register to it
4 address+I* W write to effective address
Notes: I denotes either index register (X or Y).
* The high byte of the effective address is always zero,
i.e. page boundary crossings are not handled.
Absolute indexed addressing
Read instructions (LDA, LDX, LDY, EOR, AND, ORA, ADC, SBC, CMP, BIT,
LAX, LAE, SHS, NOP)
# address R/W description
--- --------- --- ------------------------------------------
1 PC R fetch opcode, increment PC
2 PC R fetch low byte of address, increment PC
3 PC R fetch high byte of address,
add index register to low address byte,
increment PC
4 address+I* R read from effective address,
fix the high byte of effective address
5+ address+I R re-read from effective address
Notes: I denotes either index register (X or Y).
* The high byte of the effective address may be invalid
at this time, i.e. it may be smaller by $100.
+ This cycle will be executed only if the effective address
was invalid during cycle #4, i.e. page boundary was crossed.
Read-Modify-Write instructions (ASL, LSR, ROL, ROR, INC, DEC,
SLO, SRE, RLA, RRA, ISB, DCP)
# address R/W description
--- --------- --- ------------------------------------------
1 PC R fetch opcode, increment PC
2 PC R fetch low byte of address, increment PC
3 PC R fetch high byte of address,
add index register X to low address byte,
increment PC
4 address+X* R read from effective address,
fix the high byte of effective address
5 address+X R re-read from effective address
6 address+X W write the value back to effective address,
and do the operation on it
7 address+X W write the new value to effective address
Notes: * The high byte of the effective address may be invalid
at this time, i.e. it may be smaller by $100.
Write instructions (STA, STX, STY, SHA, SHX, SHY)
# address R/W description
--- --------- --- ------------------------------------------
1 PC R fetch opcode, increment PC
2 PC R fetch low byte of address, increment PC
3 PC R fetch high byte of address,
add index register to low address byte,
increment PC
4 address+I* R read from effective address,
fix the high byte of effective address
5 address+I W write to effective address
Notes: I denotes either index register (X or Y).
* The high byte of the effective address may be invalid
at this time, i.e. it may be smaller by $100. Because
the processor cannot undo a write to an invalid
address, it always reads from the address first.
Relative addressing (BCC, BCS, BNE, BEQ, BPL, BMI, BVC, BVS)
# address R/W description
--- --------- --- ---------------------------------------------
1 PC R fetch opcode, increment PC
2 PC R fetch operand, increment PC
3 PC R Fetch opcode of next instruction,
If branch is taken, add operand to PCL.
Otherwise increment PC.
4+ PC* R Fetch opcode of next instruction.
Fix PCH. If it did not change, increment PC.
5! PC R Fetch opcode of next instruction,
increment PC.
Notes: The opcode fetch of the next instruction is included to
this diagram for illustration purposes. When determining
real execution times, remember to subtract the last
cycle.
* The high byte of Program Counter (PCH) may be invalid
at this time, i.e. it may be smaller or bigger by $100.
+ If branch is taken, this cycle will be executed.
! If branch occurs to different page, this cycle will be
executed.
Indexed indirect addressing
Read instructions (LDA, ORA, EOR, AND, ADC, CMP, SBC, LAX)
# address R/W description
--- ----------- --- ------------------------------------------
1 PC R fetch opcode, increment PC
2 PC R fetch pointer address, increment PC
3 pointer R read from the address, add X to it
4 pointer+X R fetch effective address low
5 pointer+X+1 R fetch effective address high
6 address R read from effective address
Note: The effective address is always fetched from zero page,
i.e. the zero page boundary crossing is not handled.
Read-Modify-Write instructions (SLO, SRE, RLA, RRA, ISB, DCP)
# address R/W description
--- ----------- --- ------------------------------------------
1 PC R fetch opcode, increment PC
2 PC R fetch pointer address, increment PC
3 pointer R read from the address, add X to it
4 pointer+X R fetch effective address low
5 pointer+X+1 R fetch effective address high
6 address R read from effective address
7 address W write the value back to effective address,
and do the operation on it
8 address W write the new value to effective address
Note: The effective address is always fetched from zero page,
i.e. the zero page boundary crossing is not handled.
Write instructions (STA, SAX)
# address R/W description
--- ----------- --- ------------------------------------------
1 PC R fetch opcode, increment PC
2 PC R fetch pointer address, increment PC
3 pointer R read from the address, add X to it
4 pointer+X R fetch effective address low
5 pointer+X+1 R fetch effective address high
6 address W write to effective address
Note: The effective address is always fetched from zero page,
i.e. the zero page boundary crossing is not handled.
Indirect indexed addressing
Read instructions (LDA, EOR, AND, ORA, ADC, SBC, CMP)
# address R/W description
--- ----------- --- ------------------------------------------
1 PC R fetch opcode, increment PC
2 PC R fetch pointer address, increment PC
3 pointer R fetch effective address low
4 pointer+1 R fetch effective address high,
add Y to low byte of effective address
5 address+Y* R read from effective address,
fix high byte of effective address
6+ address+Y R read from effective address
Notes: The effective address is always fetched from zero page,
i.e. the zero page boundary crossing is not handled.
* The high byte of the effective address may be invalid
at this time, i.e. it may be smaller by $100.
+ This cycle will be executed only if the effective address
was invalid during cycle #5, i.e. page boundary was crossed.
Read-Modify-Write instructions (SLO, SRE, RLA, RRA, ISB, DCP)
# address R/W description
--- ----------- --- ------------------------------------------
1 PC R fetch opcode, increment PC
2 PC R fetch pointer address, increment PC
3 pointer R fetch effective address low
4 pointer+1 R fetch effective address high,
add Y to low byte of effective address
5 address+Y* R read from effective address,
fix high byte of effective address
6 address+Y R read from effective address
7 address+Y W write the value back to effective address,
and do the operation on it
8 address+Y W write the new value to effective address
Notes: The effective address is always fetched from zero page,
i.e. the zero page boundary crossing is not handled.
* The high byte of the effective address may be invalid
at this time, i.e. it may be smaller by $100.
Write instructions (STA, SHA)
# address R/W description
--- ----------- --- ------------------------------------------
1 PC R fetch opcode, increment PC
2 PC R fetch pointer address, increment PC
3 pointer R fetch effective address low
4 pointer+1 R fetch effective address high,
add Y to low byte of effective address
5 address+Y* R read from effective address,
fix high byte of effective address
6 address+Y W write to effective address
Notes: The effective address is always fetched from zero page,
i.e. the zero page boundary crossing is not handled.
* The high byte of the effective address may be invalid
at this time, i.e. it may be smaller by $100.
Absolute indirect addressing (JMP)
# address R/W description
--- --------- --- ------------------------------------------
1 PC R fetch opcode, increment PC
2 PC R fetch pointer address low, increment PC
3 PC R fetch pointer address high, increment PC
4 pointer R fetch low address to latch
5 pointer+1* R fetch PCH, copy latch to PCL
Note: * The PCH will always be fetched from the same page
than PCL, i.e. page boundary crossing is not handled.
How Real Programmers Acknowledge Interrupts
With RMW instructions:
; beginning of combined raster/timer interrupt routine
LSR $D019 ; clear VIC interrupts, read raster interrupt flag to C
BCS raster ; jump if VIC caused an interrupt
... ; timer interrupt routine
Operational diagram of LSR $D019:
# data address R/W
--- ---- ------- --- ---------------------------------
1 4E PC R fetch opcode
2 19 PC+1 R fetch address low
3 D0 PC+2 R fetch address high
4 xx $D019 R read memory
5 xx $D019 W write the value back, rotate right
6 xx/2 $D019 W write the new value back
The 5th cycle acknowledges the interrupt by writing the same
value back. If only raster interrupts are used, the 6th cycle
has no effect on the VIC. (It might acknowledge also some
other interrupts.)
With indexed addressing:
; acknowledge interrupts to both CIAs
LDX #$10
LDA $DCFD,X
Operational diagram of LDA $DCFD,X:
# data address R/W description
--- ---- ------- --- ---------------------------------
1 BD PC R fetch opcode
2 FD PC+1 R fetch address low
3 DC PC+2 R fetch address high, add X to address low
4 xx $DC0D R read from address, fix high byte of address
5 yy $DD0D R read from right address
; acknowledge interrupts to CIA 2
LDX #$10
STA $DDFD,X
Operational diagram of STA $DDFD,X:
# data address R/W description
--- ---- ------- --- ---------------------------------
1 9D PC R fetch opcode
2 FD PC+1 R fetch address low
3 DC PC+2 R fetch address high, add X to address low
4 xx $DD0D R read from address, fix high byte of address
5 ac $DE0D W write to right address
With branch instructions:
; acknowledge interrupts to CIA 2
LDA #$00 ; clear N flag
JMP $DD0A
DD0A BPL $DC9D ; branch
DC9D BRK ; return
You need the following preparations to initialize the CIA registers:
LDA #$91 ; argument of BPL
STA $DD0B
LDA #$10 ; BPL
STA $DD0A
STA $DD08 ; load the ToD values from the latches
LDA $DD0B ; freeze the ToD display
LDA #$7F
STA $DC0D ; assure that $DC0D is $00
Operational diagram of BPL $DC9D:
# data address R/W description
--- ---- ------- --- ---------------------------------
1 10 $DD0A R fetch opcode
2 91 $DD0B R fetch argument
3 xx $DD0C R fetch opcode, add argument to PCL
4 yy $DD9D R fetch opcode, fix PCH
( 5 00 $DC9D R fetch opcode )
; acknowledge interrupts to CIA 1
LSR ; clear N flag
JMP $DCFA
DCFA BPL $DD0D
DD0D BRK
; Again you need to set the ToD registers of CIA 1 and the
; Interrupt Control Register of CIA 2 first.
Operational diagram of BPL $DD0D:
# data address R/W description
--- ---- ------- --- ---------------------------------
1 10 $DCFA R fetch opcode
2 11 $DCFB R fetch argument
3 xx $DCFC R fetch opcode, add argument to PCL
4 yy $DC0D R fetch opcode, fix PCH
( 5 00 $DD0D R fetch opcode )
; acknowledge interrupts to CIA 2 automagically
; preparations
LDA #$7F
STA $DD0D ; disable all interrupt sources of CIA2
LDA $DD0E
AND #$BE ; ensure that $DD0C remains constant
STA $DD0E ; and stop the timer
LDA #$FD
STA $DD0C ; parameter of BPL
LDA #$10
STA $DD0B ; BPL
LDA #$40
STA $DD0A ; RTI/parameter of LSR
LDA #$46
STA $DD09 ; LSR
STA $DD08 ; load the ToD values from the latches
LDA $DD0B ; freeze the ToD display
LDA #$09
STA $0318
LDA #$DD
STA $0319 ; change NMI vector to $DD09
LDA #$FF ; Try changing this instruction's operand
STA $DD05 ; (see comment below).
LDA #$FF
STA $DD04 ; set interrupt frequency to 1/65536 cycles
LDA $DD0E
AND #$80
ORA #$11
LDX #$81
STX $DD0D ; enable timer interrupt
STA $DD0E ; start timer
LDA #$00 ; To see that the interrupts really occur,
STA $D011 ; use something like this and see how
LOOP DEC $D020 ; changing the byte loaded to $DD05 from
BNE LOOP ; #$FF to #$0F changes the image.
When an NMI occurs, the processor jumps to Kernal code, which jumps to
($0318), which points to the following routine:
DD09 LSR $40 ; clear N flag
BPL $DD0A ; Note: $DD0A contains RTI.
Operational diagram of BPL $DD0A:
# data address R/W description
--- ---- ------- --- ---------------------------------
1 10 $DD0B R fetch opcode
2 11 $DD0C R fetch argument
3 xx $DD0D R fetch opcode, add argument to PCL
4 40 $DD0A R fetch opcode, (fix PCH)
With RTI:
; the fastest possible interrupt handler in the 6500 family
; preparations
SEI
LDA $01 ; disable ROM and enable I/O
AND #$FD
ORA #$05
STA $01
LDA #$7F
STA $DD0D ; disable CIA 2's all interrupt sources
LDA $DD0E
AND #$BE ; ensure that $DD0C remains constant
STA $DD0E ; and stop the timer
LDA #$40
STA $DD0C ; store RTI to $DD0C
LDA #$0C
STA $FFFA
LDA #$DD
STA $FFFB ; change NMI vector to $DD0C
LDA #$FF ; Try changing this instruction's operand
STA $DD05 ; (see comment below).
LDA #$FF
STA $DD04 ; set interrupt frequency to 1/65536 cycles
LDA $DD0E
AND #$80
ORA #$11
LDX #$81
STX $DD0D ; enable timer interrupt
STA $DD0E ; start timer
LDA #$00 ; To see that the interrupts really occur,
STA $D011 ; use something like this and see how
LOOP DEC $D020 ; changing the byte loaded to $DD05 from
BNE LOOP ; #$FF to #$0F changes the image.
When an NMI occurs, the processor jumps to Kernal code, which
jumps to ($0318), which points to the following routine:
DD0C RTI
How on earth can this clear the interrupts? Remember, the
processor always fetches two successive bytes for each
instruction.
A little more practical version of this is redirecting the NMI
(or IRQ) to your own routine, whose last instruction is JMP
$DD0C or JMP $DC0C. If you want to confuse more, change the 0
in the address to a hexadecimal digit different from the one
you used when writing the RTI.
Or you can combine the latter two methods:
DD09 LSR $xx ; xx is any appropriate BCD value 00-59.
BPL $DCFC
DCFC RTI
This example acknowledges interrupts to both CIAs.
If you want to confuse the examiners of your code, you can use any
of these techniques. Although these examples use no undefined opcodes,
they do not necessarily run correctly on CMOS processors. However, the
RTI example should run on 65C02 and 65C816, and the latter branch
instruction example might work as well.
The RMW instruction method has been used in some demos, others were
developed by Marko Mäkelä. His favourite is the automagical RTI
method, although it does not have any practical applications, except
for some time dependent data decryption routines for very complicated
copy protections.
Memory Management
The processor's point of view
The Commodore 64 has access to more memory than its processor can
directly handle. This is possible by banking the memory. There are
five user configurable inputs that affect the banking. Three of them
can be controlled by program, and the rest two serve as control lines
on the memory expansion port.
The 6510 MPU has an integrated I/O port with six I/O lines. This
port is accessed through the memory locations 0 and 1. The location 0
is the Data Direction Register for the Peripheral data Register, which
is mapped to the other location. When a bit in the DDR is set, the
corresponding PR bit controls the state of a corresponding Peripheral
line as an output. When it is clear, the state of the Peripheral line
is reflected by the Peripheral register. The Peripheral lines are
numbered from 0 to 5, and they are mapped to the DDR and PR bits 0 - 5,
respectively. The 8502 processor, which is used in the Commodore 128,
has seven Peripheral lines in its I/O port. The pin P6 is connected to
the ASC/CC key (Caps lock in English versions).
The I/O lines have the following functions:
Direction Line Function
--------- ---- --------
out P5 Cassette motor control. (0 = motor spins)
in P4 Cassette sense. (0 = PLAY button depressed)
out P3 Cassette write data.
out P2 CHAREN
out P1 HIRAM
out P0 LORAM
The default value of the DDR register is $2F, so all lines except
Cassette sense are outputs. The default PR value is $37 (Datassette
motor stopped, and all three memory management lines high).
If you turn any memory management line to input, the external pull-up
resistors make it to look like it is outputting logical "1". This
is actually why the computer always switches the ROMs in upon startup:
Pulling the -RESET line low resets all Peripheral lines to inputs,
thus setting all three processor-driven memory management lines to
logical "1" level.
The two remaining memory management lines are -EXROM and -GAME on
the cartridge port. Each line has a pull-up resistor, so the lines
are "1" by default.
Even though the memory banking has been implemented with a 82S100
Programmable _Logic_ Array, there is only one control line that seems
to behave logically at first sight, the -CHAREN line. It is mostly
used to choose between I/O address space and the character generator
ROM. The following memory map introduces the oddities of -CHAREN and
the other memory management lines. It is based on the memory maps in
the Commodore 64 Programmer's Reference Guide, pp. 263 - 267, and some
errors and inaccuracies have been corrected.
The leftmost column of the table contains addresses in hexadecimal
notation. The columns aside it introduce all possible memory
configurations. The default mode is on the left, and the absolutely
most rarely used Ultimax game console configuration is on the right.
(Has anybody ever seen any Ultimax games?) Each memory configuration
column has one or more four-digit binary numbers as a title. The bits,
from left to right, represent the state of the -LORAM, -HIRAM, -GAME
and -EXROM lines, respectively. The bits whose state does not matter
are marked with "x". For instance, when the Ultimax video game
configuration is active (the -GAME line is shorted to ground), the
-LORAM and -HIRAM lines have no effect.
default 001x Ultimax
1111 101x 1000 011x 00x0 1110 0100 1100 xx01
10000
----------------------------------------------------------------------
F000
Kernal RAM RAM Kernal RAM Kernal Kernal Kernal ROMH(*
E000
----------------------------------------------------------------------
D000 IO/C IO/C IO/RAM IO/C RAM IO/C IO/C IO/C I/O
----------------------------------------------------------------------
C000 RAM RAM RAM RAM RAM RAM RAM RAM -
----------------------------------------------------------------------
B000
BASIC RAM RAM RAM RAM BASIC ROMH ROMH -
A000
----------------------------------------------------------------------
9000
RAM RAM RAM RAM RAM ROML RAM ROML ROML(*
8000
----------------------------------------------------------------------
7000
6000
RAM RAM RAM RAM RAM RAM RAM RAM -
5000
4000
----------------------------------------------------------------------
3000
2000 RAM RAM RAM RAM RAM RAM RAM RAM -
1000
----------------------------------------------------------------------
0000 RAM RAM RAM RAM RAM RAM RAM RAM RAM
----------------------------------------------------------------------
*) Internal memory does not respond to write accesses to these
areas.
Legend: Kernal E000-FFFF Kernal ROM.
IO/C D000-DFFF I/O address space or Character
generator ROM, selected by
-CHAREN. If the CHAREN bit is
clear, the character generator
ROM will be selected. If it is
set, the I/O chips are
accessible.
IO/RAM D000-DFFF I/O address space or RAM,
selected by -CHAREN. If the
CHAREN bit is clear, the
character generator ROM will
be selected. If it is set, the
internal RAM is accessible.
I/O D000-DFFF I/O address space.
The -CHAREN line has no effect.
BASIC A000-BFFF BASIC ROM.
ROMH A000-BFFF or External ROM with the -ROMH line
E000-FFFF connected to its -CS line.
ROML 8000-9FFF External ROM with the -ROML line
connected to its -CS line.
RAM various ranges Commodore 64's internal RAM.
- 1000-7FFF and Open address space.
A000-CFFF The Commodore 64's memory chips
do not detect any memory accesses
to this area except the VIC-II's
DMA and memory refreshes.
NOTE: Whenever the processor tries to write to any ROM area
(Kernal, BASIC, CHAROM, ROML, ROMH), the data will get
"through the ROM" to the C64's internal RAM.
For this reason, you can easily copy data from ROM to RAM,
without any bank switching. But implementing external
memory expansions without DMA is very hard, as you have to
use a 256 byte window on the I/O1 or I/O2 area, like
GEORAM, or the Ultimax memory configuration, if you do not
want the data to be written both to internal and external
RAM.
However, this is not true for the Ultimax video game
configuration. In that mode, the internal RAM ignores all
memory accesses outside the area $0000-$0FFF, unless they
are performed by the VIC, and you can write to external
memory at $1000-$CFFF and $E000-$FFFF, if any, without
changing the contents of the internal RAM.
A note concerning the I/O area
The I/O area of the Commodore 64 is divided as follows:
Address range Owner
------------- -----
D000-D3FF MOS 6567/6569 VIC-II Video Interface Controller
D400-D7FF MOS 6581 SID Sound Interface Device
D800-DBFF Color RAM (only lower nybbles are connected)
DC00-DCFF MOS 6526 CIA Complex Interface Adapter #1
DD00-DDFF MOS 6526 CIA Complex Interface Adapter #2
DE00-DEFF User expansion #1 (-I/O1 on Expansion Port)
DF00-DFFF User expansion #2 (-I/O2 on Expansion Port)
As you can see, the address ranges for the chips are much larger
than required. Because of this, you can access the chips through
multiple memory areas. The VIC-II appears in its window every $40
addresses. For instance, the addresses $D040 and $D080 are both mapped
to the Sprite 0 X co-ordinate register. The SID has one register
selection line less, thus it appears at every $20 bytes. The CIA chips
have only 16 registers, so there are 16 copies of each in their memory
area.
However, you should not use other addresses than those specified by
Commodore. For instance, the Commodore 128 mapped its additional I/O
chips to this same memory area, and the SID responds only to the
addresses D400-D4FF, also when in C64 mode. And the Commodore 65, or
the C64DX, which unfortunately did not make its way to the market,
could narrow the memory window reserved for its CSG 4567 VIC-III.
The video chip
The MOS 6567/6569 VIC-II Video Interface Controller has access to
only 16 kilobytes at a time. To enable the VIC-II to access the whole
64 kB memory space, the main memory is divided to four banks of 16 kB
each. The lines PA0 and PA1 of the second CIA are the inverse of the
virtual VIC-II address lines VA14 and VA15, respectively. To select a
VIC-II bank other than the default, you must program the CIA lines to
output the desired bit pair. For instance, the following code selects
the memory area $4000-$7FFF (bank 1) for the video controller:
LDA $DD02 ; Data Direction Register A
ORA #$03 ; Set pins PA0 and PA1 to outputs
STA $DD02
LDA $DD00
AND #$FC ; Mask the lowmost bit pair off
ORA #$02 ; Select VIC-II bank 1 (the inverse of binary 01 is 10)
STA $DD00
Why should you set the pins to outputs? Hardware RESET resets all
I/O lines to inputs, and thanks to the CIA's internal pull-up
resistors, the inputs actually output logical high voltage level. So,
upon -RESET, the video bank 0 is selected automatically, and older
Kernals could leave it uninitialized.
Note that the VIC-II always fetches its information from the
internal RAM, totally ignoring the memory configuration lines. There
is only one exception to this rule: The character generator ROM.
Unless the Ultimax mode is selected, VIC-II "sees" character generator
ROM in the memory areas 1000-1FFF and 9000-9FFF. If the Ultimax
configuration is active, the VIC-II fetches all data from the internal
RAM.
Accessing the memory places 0 and 1
Although the addresses 0 and 1 of the processor are hard-wired to
its on-chip I/O port registers, you can access the memory places 0 and
1. The video chip always reads from RAM (or character generator ROM),
so you can use it to read also from 0 and 1. Enable the bit-map screen
and set the start address of the graphics screen to 0. Now you can see
these two memory locations in the upper left corner. Alternatively,
you could set the character generator start address to 0, in which
case you would see these locations in @ characters (code 0). Or, you
can activate a sprite with start address 0. Whichever method you
choose, you can read these locations with sprite collision registers.
Define a sprite consisting of only one dot, and move it to read the 8
bits of each byte with the sprite to sprite or sprite to background
collision registers.
But how can you write to these locations? If you execute the command
POKE 53265,59, you will see that the memory place 1 changes its value
wildly. If you disable the interrupts (POKE53664,127), it will remain
stable. How is this possible? When the processor writes to 0 or 1, it
will put the address on the address bus and set the R/-W line to indicate
a write cycle, but it does not put the data on the data bus. Thus, it
writes "random" data. Of course this data is not truly random. Actually
it is something that the video chip left on the bus on its clock half.
So, if you want to write a certain value on 0 or 1, you have to make the
video chip to read that value just before the store cycle. This requires
very accurate timing, but it can be achieved even with a carefully
written BASIC program. Just wait the video chip to be in the top or
bottom border and the beam to be in the middle of the screen (not in the
side borders). At this area, the video chip will always read the last
byte of the video bank (by default $3FFF). Now, if you store anything to
the I/O port registers 0 or 1 while the video chip is refreshing this
screen area, the contents of the memory place $3FFF will be written to
the respective memory place (0 or 1). Note that this trick does not work
reliably on all computers. You need good RF protection, as the data bus
will not be driven at all when the value remains on it.
On the C128 in its 2 MHz mode, you can write to the memory places
with an easier kludge. Just make sure that the video chip is not
performing the memory refresh (as it would slow down to 1 MHz in that
case), and use some instruction that reads from a proper memory location
before writing to 0 or 1. Indexed zero-page addressing modes are good
for it. I tested this trick with LDX#1 followed by STA $FF,X. As you
can read from the instruction timing section of this document, the
instruction first reads from $FF (the base address) and then writes to 0.
The timing can be done with a simple LDA$D012:CMP$D012:BEQ *-3 loop.
But in the C128 mode you can relocate the stack page to zero page, so
this trick is not really useful.
You can also read the memory places 0 and 1 much faster than with
sprite collisions. Just make the video chip to read from 0 or 1, and
then read from non-connected address space ($DE00-$DFFF on a stock C64;
also $D700-$D7FF on C128's). Actually, you can produce a complete map
of the video timing on your computer by making a loop that reads from
open address space, pausing one frame and one cycle in between. And if
you are into copy protections, you could write a program on the open
address space. Just remember that there must be a byte on the bus for
each clock cycle.
These tricks unfortunately do not work reliably on all units. So far
I have had the opportunity to try it on three computers, two of which
were Commodore 128 DCR's (C128's housed in metal case with a 1571 floppy
disk drive, whose controller is integrated on the mother board). One
C128DCR drove some of its data bits too heavily to high state. No wonder,
since its housing consisted of some newspapers spread on the floor.
Autostart Code
Although this document concentrates on hardware, there is one thing
that you must know about the firmware to get complete control over
your computer. As the Commodore 64 always switches the ROMs on upon
-RESET, you cannot relocate the RESET vector by writing something in
RAM. Instead, you have to use the Autostart code that will be
recognized by the KERNAL ROM. If the memory places from $8004 through
$8008 contain the PETSCII string 'CBM80' (C3 C2 CD 38 30), the RESET
routine jumps to ($8000) and the default NMI handler jumps to ($8002).
Some programs that load into RAM take advantage of this and don't
let the machine to be reset. You don't have to modify the ROM to get
rid of this annoying behaviour. Simply ground the -EXROM line for the
beginning of the RESET sequence.
Notes
See the MCS 6500 Microcomputer Family Programming Manual for less
information.
References:
C64 Memory Maps C64 Programmer's Reference Guide, pp. 262-267
C64 Schematic Diagram
6510 Block Diagram C64 Programmer's Reference Guide, p. 404
Instruction Set C64 Programmer's Reference Guide, pp. 254-255, 416-417
C64/128 Real Programmer's Revenge Guide
C=Lehti magazine 4/87
6502-NMOS.extra.opcodes
"Extra Instructions Of The 65XX Series CPU"
By: Adam Vardy (abe0084@infonet.st-johns.nf.ca)
[File created: 22, Aug. 1995... 27, Sept. 1996]
The following is a list of 65XX/85XX extra opcodes. The operation codes
for the 6502 CPU fit in a single byte; out of 256 possible combinations,
only 151 are "legal." This text describes the other 256-151= 105 operation
codes. These opcodes are not generally recognized as part of the 6502
instruction set. They are also referred to as undefined opcodes or
undocumented opcodes or non-standard opcodes or unofficial opcodes. In
"The Commodore 64 Programmer's Reference Guide" their hexadecimal values
are simply marked as future expansion. This list of opcodes was compiled
with help from "The Complete Inner Space Anthology" by Karl J. H. Hildon.
I have marked off the beginning of the description of each opcode with a
few asterisks. At times, I also included an alternate name in parenthesis.
All opcode values are given in hexadecimal. These hexadecimal values are
listed immediately to the right of any sample code. The lowercase letters
found in these examples represent the hex digits that you must provide as
the instruction's immediate byte value or as the instruction's destination
or source address. Thus immediate values and zero page addresses are
referred to as 'ab'. For absolute addressing mode the two bytes of an
absolute address are referred to as 'cd' and 'ab'.
Execution times for all opcodes are given alongside to the very right of
any sample code. A number of the opcodes described here combine the
operation of two regular 6502 instructions. You can refer to a book on the
6502 instruction set for more information, such as which flags a particular
instruction affects.
ASO *** (SLO)
This opcode ASLs the contents of a memory location and then ORs the result
with the accumulator.
Supported modes:
ASO abcd ;0F cd ab ;No. Cycles= 6
ASO abcd,X ;1F cd ab ; 7
ASO abcd,Y ;1B cd ab ; 7
ASO ab ;07 ab ; 5
ASO ab,X ;17 ab ; 6
ASO (ab,X) ;03 ab ; 8
ASO (ab),Y ;13 ab ; 8
(Sub-instructions: ORA, ASL)
Here is an example of how you might use this opcode:
ASO $C010 ;0F 10 C0
Here is the same code using equivalent instructions.
ASL $C010
ORA $C010
RLA ***
RLA ROLs the contents of a memory location and then ANDs the result with
the accumulator.
Supported modes:
RLA abcd ;2F cd ab ;No. Cycles= 6
RLA abcd,X ;3F cd ab ; 7
RLA abcd,Y ;3B cd ab ; 7
RLA ab ;27 ab ; 5
RLA ab,X ;37 ab ; 6
RLA (ab,X) ;23 ab ; 8
RLA (ab),Y ;33 ab ; 8
(Sub-instructions: AND, ROL)
Here's an example of how you might write it in a program.
RLA $FC,X ;37 FC
Here's the same code using equivalent instructions.
ROL $FC,X
AND $FC,X
LSE *** (SRE)
LSE LSRs the contents of a memory location and then EORs the result with
the accumulator.
Supported modes:
LSE abcd ;4F cd ab ;No. Cycles= 6
LSE abcd,X ;5F cd ab ; 7
LSE abcd,Y ;5B cd ab ; 7
LSE ab ;47 ab ; 5
LSE ab,X ;57 ab ; 6
LSE (ab,X) ;43 ab ; 8
LSE (ab),Y ;53 ab ; 8
(Sub-instructions: EOR, LSR)
Example:
LSE $C100,X ;5F 00 C1
Here's the same code using equivalent instructions.
LSR $C100,X
EOR $C100,X
RRA ***
RRA RORs the contents of a memory location and then ADCs the result with
the accumulator.
Supported modes:
RRA abcd ;6F cd ab ;No. Cycles= 6
RRA abcd,X ;7F cd ab ; 7
RRA abcd,Y ;7B cd ab ; 7
RRA ab ;67 ab ; 5
RRA ab,X ;77 ab ; 6
RRA (ab,X) ;63 ab ; 8
RRA (ab),Y ;73 ab ; 8
(Sub-instructions: ADC, ROR)
Example:
RRA $030C ;6F 0C 03
Equivalent instructions:
ROR $030C
ADC $030C
AXS *** (SAX)
AXS ANDs the contents of the A and X registers (without changing the
contents of either register) and stores the result in memory.
AXS does not affect any flags in the processor status register.
Supported modes:
AXS abcd ;8F cd ab ;No. Cycles= 4
AXS ab ;87 ab ; 3
AXS ab,Y ;97 ab ; 4
AXS (ab,X) ;83 ab ; 6
(Sub-instructions: STA, STX)
Example:
AXS $FE ;87 FE
Here's the same code using equivalent instructions.
STX $FE
PHA
AND $FE
STA $FE
PLA
LAX ***
This opcode loads both the accumulator and the X register with the contents
of a memory location.
Supported modes:
LAX abcd ;AF cd ab ;No. Cycles= 4
LAX abcd,Y ;BF cd ab ; 4*
LAX ab ;A7 ab ;*=add 1 3
LAX ab,Y ;B7 ab ;if page 4
LAX (ab,X) ;A3 ab ;boundary 6
LAX (ab),Y ;B3 ab ;is crossed 5*
(Sub-instructions: LDA, LDX)
Example:
LAX $8400,Y ;BF 00 84
Equivalent instructions:
LDA $8400,Y
LDX $8400,Y
DCM *** (DCP)
This opcode DECs the contents of a memory location and then CMPs the result
with the A register.
Supported modes:
DCM abcd ;CF cd ab ;No. Cycles= 6
DCM abcd,X ;DF cd ab ; 7
DCM abcd,Y ;DB cd ab ; 7
DCM ab ;C7 ab ; 5
DCM ab,X ;D7 ab ; 6
DCM (ab,X) ;C3 ab ; 8
DCM (ab),Y ;D3 ab ; 8
(Sub-instructions: CMP, DEC)
Example:
DCM $FF ;C7 FF
Equivalent instructions:
DEC $FF
CMP $FF
INS *** (ISC)
This opcode INCs the contents of a memory location and then SBCs the result
from the A register.
Supported modes:
INS abcd ;EF cd ab ;No. Cycles= 6
INS abcd,X ;FF cd ab ; 7
INS abcd,Y ;FB cd ab ; 7
INS ab ;E7 ab ; 5
INS ab,X ;F7 ab ; 6
INS (ab,X) ;E3 ab ; 8
INS (ab),Y ;F3 ab ; 8
(Sub-instructions: SBC, INC)
Example:
INS $FF ;E7 FF
Equivalent instructions:
INC $FF
SBC $FF
ALR ***
This opcode ANDs the contents of the A register with an immediate value and
then LSRs the result.
One supported mode:
ALR #ab ;4B ab ;No. Cycles= 2
Example:
ALR #$FE ;4B FE
Equivalent instructions:
AND #$FE
LSR A
ARR ***
This opcode ANDs the contents of the A register with an immediate value and
then RORs the result.
One supported mode:
ARR #ab ;6B ab ;No. Cycles= 2
Here's an example of how you might write it in a program.
ARR #$7F ;6B 7F
Here's the same code using equivalent instructions.
AND #$7F
ROR A
XAA ***
XAA transfers the contents of the X register to the A register and then
ANDs the A register with an immediate value.
One supported mode:
XAA #ab ;8B ab ;No. Cycles= 2
Example:
XAA #$44 ;8B 44
Equivalent instructions:
TXA
AND #$44
OAL ***
This opcode ORs the A register with #$EE, ANDs the result with an immediate
value, and then stores the result in both A and X.
One supported mode:
OAL #ab ;AB ab ;No. Cycles= 2
Here's an example of how you might use this opcode:
OAL #$AA ;AB AA
Here's the same code using equivalent instructions:
ORA #$EE
AND #$AA
TAX
SAX ***
SAX ANDs the contents of the A and X registers (leaving the contents of A
intact), subtracts an immediate value, and then stores the result in X.
... A few points might be made about the action of subtracting an immediate
value. It actually works just like the CMP instruction, except that CMP
does not store the result of the subtraction it performs in any register.
This subtract operation is not affected by the state of the Carry flag,
though it does affect the Carry flag. It does not affect the Overflow
flag.
One supported mode:
SAX #ab ;CB ab ;No. Cycles= 2
Example:
SAX #$5A ;CB 5A
Equivalent instructions:
STA $02
TXA
AND $02
SEC
SBC #$5A
TAX
LDA $02
Note: Memory location $02 would not be altered by the SAX opcode.
NOP ***
NOP performs no operation. Opcodes: 1A, 3A, 5A, 7A, DA, FA.
Takes 2 cycles to execute.
SKB ***
SKB stands for skip next byte.
Opcodes: 80, 82, C2, E2, 04, 14, 34, 44, 54, 64, 74, D4, F4.
Takes 2, 3, or 4 cycles to execute.
SKW ***
SKW skips next word (two bytes).
Opcodes: 0C, 1C, 3C, 5C, 7C, DC, FC.
Takes 4 cycles to execute.
To be dizzyingly precise, SKW actually performs a read operation. It's
just that the value read is not stored in any register. Further, opcode 0C
uses the absolute addressing mode. The two bytes which follow it form the
absolute address. All the other SKW opcodes use the absolute indexed X
addressing mode. If a page boundary is crossed, the execution time of one
of these SKW opcodes is upped to 5 clock cycles.
--------------------------------------------------------------------------
The following opcodes were discovered and named exclusively by the author.
(Or so it was thought before.)
HLT ***
HLT crashes the microprocessor. When this opcode is executed, program
execution ceases. No hardware interrupts will execute either. The author
has characterized this instruction as a halt instruction since this is the
most straightforward explanation for this opcode's behaviour. Only a reset
will restart execution. This opcode leaves no trace of any operation
performed! No registers affected.
Opcodes: 02, 12, 22, 32, 42, 52, 62, 72, 92, B2, D2, F2.
TAS ***
This opcode ANDs the contents of the A and X registers (without changing
the contents of either register) and transfers the result to the stack
pointer. It then ANDs that result with the contents of the high byte of
the target address of the operand +1 and stores that final result in
memory.
One supported mode:
TAS abcd,Y ;9B cd ab ;No. Cycles= 5
(Sub-instructions: STA, TXS)
Here is an example of how you might use this opcode:
TAS $7700,Y ;9B 00 77
Here is the same code using equivalent instructions.
STX $02
PHA
AND $02
TAX
TXS
AND #$78
STA $7700,Y
PLA
LDX $02
Note: Memory location $02 would not be altered by the TAS opcode.
Above I used the phrase 'the high byte of the target address of the operand
+1'. By the words target address, I mean the unindexed address, the one
specified explicitly in the operand. The high byte is then the second byte
after the opcode (ab). So we'll shorten that phrase to AB+1.
SAY ***
This opcode ANDs the contents of the Y register with <ab+1> and stores the
result in memory.
One supported mode:
SAY abcd,X ;9C cd ab ;No. Cycles= 5
Example:
SAY $7700,X ;9C 00 77
Equivalent instructions:
PHA
TYA
AND #$78
STA $7700,X
PLA
XAS ***
This opcode ANDs the contents of the X register with <ab+1> and stores the
result in memory.
One supported mode:
XAS abcd,Y ;9E cd ab ;No. Cycles= 5
Example:
XAS $6430,Y ;9E 30 64
Equivalent instructions:
PHA
TXA
AND #$65
STA $6430,Y
PLA
AXA ***
This opcode stores the result of A AND X AND the high byte of the target
address of the operand +1 in memory.
Supported modes:
AXA abcd,Y ;9F cd ab ;No. Cycles= 5
AXA (ab),Y ;93 ab ; 6
Example:
AXA $7133,Y ;9F 33 71
Equivalent instructions:
STX $02
PHA
AND $02
AND #$72
STA $7133,Y
PLA
LDX $02
Note: Memory location $02 would not be altered by the AXA opcode.
The following notes apply to the above four opcodes: TAS, SAY, XAS, AXA.
None of these opcodes affect the accumulator, the X register, the Y
register, or the processor status register!
The author has no explanation for the complexity of these
instructions. It is hard to comprehend how the microprocessor could handle
the convoluted sequence of events which appears to occur while executing
one of these opcodes. A partial explanation for what is going on is that
these instructions appear to be corruptions of other instructions. For
example, the opcode SAY would have been one of the addressing modes of the
standard instruction STY (absolute indexed X) were it not for the fact that
the normal operation of this instruction is impaired in this particular
instance.
One irregularity uncovered is that sometimes the actual value is stored in
memory, and the AND with <ab+1> part drops off (ex. SAY becomes true STY).
This happens very infrequently. The behaviour appears to be connected with
the video display. For example, it never seems to occur if either the
screen is blanked or C128 2MHz mode is enabled.
--- Imported example ---
Here is a demo program to illustrate the above effect. SYS 8200 to try it.
There is no exit, so you'll have to hit Stop-Restore to quit. And you may
want to clear the screen before running it. For contrast, there is a
second routine which runs during idle state display. Use SYS 8211 for it.
After trying the second routine, check it out again using POKE 53269,255 to
enable sprites.
begin 640 say->sty
D"""B`*`@G``%Z$P,("P1T##[+!'0$/NB`*`@G``%Z-#Z3!,@
`
end
--- Text import end ---
WARNING: If the target address crosses a page boundary because of indexing,
the instruction may not store at the intended address. It may end up
storing in zero page, or another address altogether (page=value stored).
Apparently certain internal 65XX registers are being overridden. The whole
scheme behind this erratic behaviour is very complex and strange.
And continuing with the list...
ANC ***
ANC ANDs the contents of the A register with an immediate value and then
moves bit 7 of A into the Carry flag. This opcode works basically
identically to AND #immed. except that the Carry flag is set to the same
state that the Negative flag is set to.
One supported mode:
ANC #ab ;2B ab ;No. Cycles= 2
ANC #ab ;0B ab
(Sub-instructions: AND, ROL)
OPCODE 89
Opcode 89 is another SKB instruction. It requires 2 cycles to execute.
LAS ***
This opcode ANDs the contents of a memory location with the contents of the
stack pointer register and stores the result in the accumulator, the X
register, and the stack pointer. Affected flags: N Z.
One supported mode:
LAS abcd,Y ;BB cd ab ;No. Cycles= 4*
OPCODE EB
Opcode EB seems to work exactly like SBC #immediate. Takes 2 cycles.
That is the end of the list.
This list is a full and complete list of all undocumented opcodes, every
last hex value. It provides complete and thorough information and it also
corrects some incorrect information found elsewhere. The opcodes MKA and
MKX (also known as TSTA and TSTX) as described in "The Complete Commodore
Inner Space Anthology" do not exist. Also, it is erroneously indicated
there that the instructions ASO, RLA, LSE, RRA have an immediate addressing
mode. (RLA #ab would be ANC #ab.)
[Recent additions to this text file]
Here are some other more scrutinizing observations.
The opcode ARR operates more complexily than actually described in the list
above. Here is a brief rundown on this. The following assumes the decimal
flag is clear. You see, the sub-instruction for ARR ($6B) is in fact ADC
($69), not AND. While ADC is not performed, some of the ADC mechanics are
evident. Like ADC, ARR affects the overflow flag. The following effects
occur after ANDing but before RORing. The V flag is set to the result of
exclusive ORing bit 7 with bit 6. Unlike ROR, bit 0 does not go into the
carry flag. The state of bit 7 is exchanged with the carry flag. Bit 0 is
lost. All of this may appear strange, but it makes sense if you consider
the probable internal operations of ADC itself.
SKB opcodes 82, C2, E2 may be HLTs. Since only one source claims this, and
no other sources corroborate this, it must be true on very few machines.
On all others, these opcodes always perform no operation.
LAS is suspect. This opcode is possibly unreliable.
OPCODE BIT-PATTERN: 10x0 1011
Now it is time to discuss XAA ($8B) and OAL ($AB). A fair bit of
controversy has surrounded these two opcodes. There are two good reasons
for this. 1 - They are rather weird in operation. 2 - They do operate
differently on different machines. Highly variable.
Here is the basic operation.
OAL
This opcode ORs the A register with #xx, ANDs the result with an immediate
value, and then stores the result in both A and X.
On my 128, xx may be EE,EF,FE, OR FF. These possibilities appear to depend
on three factors: the X register, PC, and the previous instruction
executed. Bit 0 is ORed from x, and also from PCH. As for XAA, on my 128
this opcode appears to work exactly as described in the list.
On my 64, OAL produces all sorts of values for xx: 00,04,06,80, etc... A
rough scenario I worked out to explain this is here. The constant value EE
disappears entirely. Instead of ORing with EE, the accumulator is ORed
with certain bits of X and also ORed with certain bits of another
"register" (nature unknown, whether it be the data bus, or something else).
However, if OAL is preceded by certain other instructions like NOP, the
constant value EE reappears and the foregoing does not take place.
On my 64, XAA works like this. While X is transfered to A, bit 0 and bit 4
are not. Instead, these bits are ANDed with those bits from A, and the
result is stored in A.
There may be many variations in the behaviour of both opcodes. XAA #$00 or
OAL #$00 are likely quite reliable in any case. It seems clear that the
video chip (i.e., VIC-II) bears responsibility for some small part of the
anomalousness, at least. Beyond that, the issue is unclear.
One idea I'll just throw up in the air about why the two opcodes behave as
they do is this observation. While other opcodes like 4B and 6B perform
AND as their first step, 8B and AB do not. Perhaps this difference leads
to some internal conflict in the microprocessor. Besides being subject to
"noise", the actual base operations do not vary.
All of the opcodes in this list (at least up to the dividing line) use the
naming convention from the CCISA Anthology book. There is another naming
convention used, for example in the first issue of C=Hacking. The only
assembler I know of that supports undocumented opcodes is Power Assembler.
And it uses the same naming conventions as used here.
One note on a different topic. A small error has been pointed out in the
64 Programmers Reference Guide with the instruction set listing. In the
last row, in the last column of the two instructions AND and ORA there
should be an asterisk, just as there is with ADC. That is the indirect,Y
addressing mode. In another table several pages later correct information
is given.
(A correction: There was one error in this document originally. One
addressing mode for LAX was given as LAX ab,X. This should have been
LAX ab,Y (B7). Also note that Power Assembler apparently has this same
error, likely because both it and this document derive first from the same
source as regards these opcodes. Coding LAX $00,X is accepted and
produces the output B7 00.)
References
o Joel Shepherd. "Extra Instructions" COMPUTE!, October 1983.
o Jim Butterfield. "Strange Opcodes" COMPUTE, March 1993.
o Raymond Quirling. "6510 Opcodes" The Transactor, March 1986.
o John West, Marko Mäkelä. '64doc' file, 1994/06/03.