Technical Interview of Y0SHi
Alan Dykes - January 29, 1997
Please state your name and affiliates.
Yoshi, part of Damaged Cybernetics, OldSkooL, MTDS, and DAC.
How would you start to write an EMU?
Quite a broad question... I'd have to say a little too broad... However... The first thing to obtain before you begin programming an emulator is knowledge of the type of CPU you will be emulating. For instance, on the SNES, knowing 65c816 assembly or machine language can *REALLY* speed up the process.
Second, obtaining information about the platform you plan to emulate -- get as much documentation as you can...
For the SNES, there's my SNES Document available to the public. For the NES, there's Marat Fayzullin's NES.DOC, provided on his iNES web page
Platforms which you can't find documentation for are much more difficult to emulate -- how can you emulate what you know nothing about? :-)
After you get that information, in what order would you start to build the EMU?
Hmm... The primary core of an emu is CPU emulation. Making sure ALL the opcodes which the machine supports are implemented, and *FLAWLESS*.
Debugging opcodes is a SERIOUS bitch... having test programs helps (such as my TEST.NES file, which tests the validity of a 6502 emulator while only requiring that ~10 opcodes work flawlessly prior to execution).
For qNES, I had to create test programs... books helped, a LOT.
So, emulating the CPU is *ALWAYS* the first task... don't forget simple routines; text output (just for debugging), file handling, memory allocation...
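As a rough illustration of "CPU first, plus simple debug output", here is a minimal C sketch of a fetch-and-dispatch step. The `Cpu` struct, the `cpu_step` name, and the flat 64 KB memory array are assumptions made for this example; none of them come from qNES. Only NOP ($EA) is implemented, and anything else is reported as text so missing opcodes are easy to spot.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical CPU state for illustration only. */
typedef struct {
    uint8_t  a, x, y, p;    /* accumulator, index registers, status flags */
    uint16_t pc;            /* program counter */
    uint8_t  mem[0x10000];  /* flat 64 KB address space, no mappers */
} Cpu;

/* Execute one instruction. Returns 0 on success, -1 on an unimplemented opcode. */
int cpu_step(Cpu *cpu)
{
    uint16_t at = cpu->pc;
    uint8_t opcode = cpu->mem[cpu->pc++];

    switch (opcode) {
    case 0xEA: /* NOP: does nothing beyond advancing the program counter */
        return 0;
    default:   /* debug text output for anything not written yet */
        fprintf(stderr, "unimplemented opcode $%02X at $%04X\n",
                (unsigned)opcode, (unsigned)at);
        return -1;
    }
}
```

Each new opcode becomes another `case`, which keeps the core easy to grow one tested instruction at a time.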
There's one aspect of programming an emulator which a lot of programmers forget at the beginning of the process.
That aspect is to write your code dynamically -- by this, I mean make sure you can EXPAND on it later on. Don't limit yourself to how much memory you can address, or the length of your registers, or anything of that sort
Give yourself room -- expect your emulation to be slow (well, more or less :-) ) at first, then optimize later.
For instance, I ran into a small problem recently regarding memory mappers in qNES -- I programmed the memory addressing routines in a "bad way," which didn't allow me to address memory under memory mappers used in NES games (to allow more ROM code, etc.)
It's my personal opinion that you should make sure your code is expandable. :-)
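One way to keep memory access expandable is to route every read through a table of windows instead of indexing one fixed array. The C sketch below is only an illustration of that idea: the `Bus` type, the function names, the 8 KB window size, and the choice to return $FF for unmapped reads are all assumptions, not a description of how qNES or any real mapper works.

```c
#include <stdint.h>

#define BANK_SIZE  0x2000u                  /* 8 KB windows: an assumption for this sketch */
#define BANK_COUNT (0x10000u / BANK_SIZE)   /* 8 windows cover the 64 KB address space */

typedef struct {
    uint8_t *bank[BANK_COUNT];  /* memory currently visible in each window, or NULL */
} Bus;

/* Read one byte through the window table. */
uint8_t bus_read(const Bus *bus, uint16_t addr)
{
    const uint8_t *base = bus->bank[addr / BANK_SIZE];
    return base ? base[addr % BANK_SIZE] : 0xFF;  /* unmapped reads give $FF here */
}

/* A mapper would call this to swap a different block of ROM into a window. */
void bus_map(Bus *bus, unsigned window, uint8_t *memory)
{
    if (window < BANK_COUNT)
        bus->bank[window] = memory;
}
```

Because the CPU core only ever calls `bus_read`, adding a mapper later means changing what the table points at, not rewriting the addressing routines.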
Could you possibly choose a simple opcode from an NES (or another system), and explain how it would be emulated on a standard PC?
Sure. The most simple, on a 6502 (NES) would be NOP, which stands for No OPeration. :-) But, that's too simple -- because NOP does nothing! Heheh
So, let's see... (if Riff was here, he'd know of a good one! Hehe)
Can I explain two, so you can understand how they work in conjunction?
Go ahead, that would be even better than 1 I believe.
I'll be explaining the 6502 'LDA' and 'INC' opcodes. Both opcodes have what're known as "addressing modes" -- immediate, absolute, indexed, indirect, indirect indexed, and indexed indirect...
Well, actually, INC doesn't have that many :-) Hehe, I'm confusing myself actually, so let me start with LDA, then go onto INC
LDA stands for LoaD Accumulator
The Accumulator on the 6502 is an 8-bit register ("internal variable", if you're a BASIC idiot) which allows you to do calculations and modifications, such as math, or other functions, and get the result from it
Now, regarding the "addressing modes" I spoke about
LDA supports the following addressing modes:
Immediate, absolute, zero-page, absolute indexed X, absolute indexed Y, ZP indexed X, ZP indexed indirect X, and ZP indirect indexed Y
Now, those probably make no sense to anyone who doesn't know about the 6502 addressing modes!
I'll explain each one individually...
Immediate addressing is VERY simple.
If you had "LDA #$40", it means "load 40 hexadecimal into the accumulator"
Very simple -- until you realize a new aspect to the 6502: the status flag register, which I will refer to as "P"
The P register holds multiple internal bits/flags which mean different things
The P register is 8-bits as well
However, the 6502 only uses 7 of the 8 bits inside the P register; the excess bit is for expansion (and is used in the 6502's successor, the 65816)
So, now off onto ANOTHER subset -- the P register and its flags (As you can see, emulating addressing modes can be complex, because each mode affects the P register differently).
The P register on the 6502 holds 7 bits which handle different aspects of the 6502 -- the bits, in order from highest to lowest, are represented by letters: nv-bdizc
- n is the Negative Flag (set when negative)
- v is the Overflow Flag (set when overflow occurs)
- - is unused
- b is the Break Flag (set when the BRK interrupt occurs (another issue I don't want to cover, because it's complex))
- d is the Decimal Mode flag (set when you want to use decimal mode, something nearly 95% of the NES emulators out there *DO NOT SUPPORT*. [plug] qNES is the first NES emulator, to my knowledge, to support decimal mode [/plug])
- i is the IRQ Disable flag (set when IRQ is disabled (another thing I don't want to cover due to complexity))
- z is the Zero Flag (set when the result of the operation is zero (0))
- c is the Carry Flag (set when carry occurs)
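The nv-bdizc layout maps directly onto bit masks, with n as the highest bit ($80) and c as the lowest ($01). A small C sketch follows; the `FLAG_*` names and the three helper functions are made up for this example.

```c
#include <stdint.h>

/* P register bits, highest to lowest: n v - b d i z c */
enum {
    FLAG_C      = 0x01,  /* carry */
    FLAG_Z      = 0x02,  /* zero */
    FLAG_I      = 0x04,  /* IRQ disable */
    FLAG_D      = 0x08,  /* decimal mode */
    FLAG_B      = 0x10,  /* break */
    FLAG_UNUSED = 0x20,  /* the "-" bit */
    FLAG_V      = 0x40,  /* overflow */
    FLAG_N      = 0x80   /* negative */
};

/* Return P with the given flag set. */
uint8_t flag_set(uint8_t p, uint8_t mask)   { return (uint8_t)(p | mask); }

/* Return P with the given flag cleared. */
uint8_t flag_clear(uint8_t p, uint8_t mask) { return (uint8_t)(p & ~mask); }

/* Non-zero if the given flag is set in P. */
int flag_test(uint8_t p, uint8_t mask)      { return (p & mask) != 0; }
```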
Now, with that in mind, we back up to the Immediate Addressing mode and we back up one more (partially) to address LDA itself once and for all:
The LDA opcode affects the n and z flags in the P register. n is set if the MSB (most-significant bit; for an 8-bit value that's bit 7 counting from zero, i.e. the eighth bit) is set; otherwise, n is cleared. z is set if the value loaded is zero (0); otherwise it's cleared.
Changing these flags is **VERY** important -- one flaw, and your emulator will not function properly. All emulator authors go through HELL when it comes to dealing with the 6502 flags.
Back to the ORIGINAL issue of LDA and its Immediate addressing mode. The opcode would translate to byte value $A9 (for immediate addressing only); the byte following the $A9 is considered the operand (the value loaded). So, if you had bytes in this order: $A9 83, you would actually be doing a "LDA #$83", which is loading hex-value $83 into the accumulator.
This operation would also set the n flag, but clear the zero flag (setting n because the top bit, bit 7, is set to 1, and clearing z because the value is NOT zero).
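The immediate-mode LDA described above can be sketched in C roughly like this. The `Cpu` struct, the flat memory array, and the function names are hypothetical, and the sketch assumes the dispatcher has already consumed the $A9 opcode byte so that `pc` points at the operand.

```c
#include <stdint.h>

#define FLAG_Z 0x02u  /* zero flag */
#define FLAG_N 0x80u  /* negative flag */

/* Hypothetical CPU state for illustration only. */
typedef struct {
    uint8_t  a, p;
    uint16_t pc;
    uint8_t  mem[0x10000];
} Cpu;

/* Set or clear n and z according to an 8-bit result. */
void update_nz(Cpu *cpu, uint8_t value)
{
    cpu->p &= (uint8_t)~(FLAG_N | FLAG_Z);  /* clear both first */
    if (value == 0)   cpu->p |= FLAG_Z;     /* z: the value is zero */
    if (value & 0x80) cpu->p |= FLAG_N;     /* n: the top bit is set */
}

/* Opcode $A9, LDA #imm: the operand byte itself is the value loaded. */
void lda_immediate(Cpu *cpu)
{
    cpu->a = cpu->mem[cpu->pc++];
    update_nz(cpu, cpu->a);
}
```

With the bytes $A9 83 in memory, this leaves $83 in `a`, n set, and z clear. The same `update_nz` helper can be reused by every opcode that touches n and z.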
All of that is **JUST** for *ONE* opcode's addressing mode! And I listed all of LDA's addressing modes above. It's very complex how each work, but immediate is the easiest
Next, we have absolute addressing. It functions the same way, affecting the same flags in the P register, but loads values differently.
The Absolute address opcode value for LDA is $AD; therefore, if you saw the bytes $AD 00 21, you would be doing an "LDA $2100". Now, the first question you may ask is: why are you loading from $2100 (memory location $2100, BTW) when the bytes are in the order of "00 21"? And the answer lies in one word: endian.
There are two CPU endian types: big and little. The 6502 is a little-endian processor (reverse byte-order: the low byte is stored first). The 68000 series CPU is a big-endian processor (forward byte-order: the high byte is stored first).
Oh, for the record -- I confuse endians all the time. The thing to remember is that the 65xxx series is reverse byte-order. So, with that in mind..
The difference between immediate and absolute is that immediate loads a PHYSICAL value (taken from the operand), while absolute loads from a 16-bit address. In this case, loading from address $2100 -- which is a memory location :-). Who knows what's at $2100... Heheh... But whatever IS at $2100, it sets/clears the n and z flags in P according to the value there
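A short C sketch of the absolute-mode fetch, with the two operand bytes stored low byte first (the reverse byte-order described above). The function names and the plain `mem` array are assumptions for illustration, and flag updates are left out to keep the example focused on the address.

```c
#include <stdint.h>

/* Read a 16-bit value stored low byte first, as the 6502 stores addresses. */
uint16_t read16(const uint8_t *mem, uint16_t addr)
{
    uint8_t lo = mem[addr];
    uint8_t hi = mem[(uint16_t)(addr + 1)];
    return (uint16_t)(lo | (hi << 8));
}

/* Opcode $AD, LDA abs: the operand is an address, and the value comes from there.
   operand_addr is where the two operand bytes sit in memory. */
uint8_t lda_absolute_fetch(const uint8_t *mem, uint16_t operand_addr)
{
    return mem[read16(mem, operand_addr)];
}
```

For the bytes $AD 00 21, `read16` on the operand yields $2100, and the value loaded is whatever byte sits at $2100.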
So, that covers immediate and absolute addressing modes -- onto the more complex, and fun, addressing modes
Zero-page addressing...
First question is, "what the hell is zero-page?" Zero-page, on the 6502, is at address $0000 to $00FF -- 256 bytes of RAM which can be used in a "special way" on the 6502. It's called "zero-page" because the 6502 works in pages -- 256 byte pages... therefore, the first 256 bytes is considered the "first" page -- all CPUs consider zero the first number in existence, therefore it's called zero-page. The 65816 handles zero-page differently, just for the record -- but it's 100% zero-page compatible. Anyways, zero-page addressing is similar to absolute addressing....
The LDA opcode for zero-page addressing is $A5; therefore, if you had bytes $A5 3F, you would be loading from zero-page address $3F ("LDA $3F"). Same rules apply for the n & z flags.
Onto the next addressing mode -- absolute indexed X. This allows you to do the same thing as absolute addressing, but with a new addition: indexing.
There are two other 8-bit registers on the 6502: X and Y. Basically, if you wanted to access a "set" of data in a sequential (or non-sequential, actually) order, starting at $2100, you could do it with absolute indexed addressing.
The CPU internally calculates the address to load from by adding the operand bytes with the value in X. So, if your X register had the value $2D, and you did a "LDA $2100,X", you would be loading from the address $212D
A quick question answered: what happens if X holds, say, $10, and you do a "LDA $FFF5,X"? Well, the CPU wraps the address -- $FFF5 + $10 is $10005, which doesn't fit in 16 bits, so you'd load from address $0005, actually. :-) You can also use ",Y" to address using the Y register if you so desire.
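In C, the absolute indexed address calculation is a 16-bit add whose carry out of the top is thrown away. The function name below is made up for this sketch.

```c
#include <stdint.h>

/* Absolute indexed: base address plus X (or Y), truncated to 16 bits. */
uint16_t addr_absolute_indexed(uint16_t base, uint8_t index)
{
    return (uint16_t)(base + index);  /* the cast discards anything above $FFFF */
}
```

`addr_absolute_indexed(0x2100, 0x2D)` evaluates to 0x212D. `addr_absolute_indexed(0xFFF5, 0x10)` evaluates to 0x0005, because $FFF5 + $10 = $10005 and only the low 16 bits are kept.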
Onto the next addressing mode: zero-page indexed X. This addressing mode works the same way as absolute indexed X, 'cept that: a) You cannot use the Y register to index zero page, and b) The address does not wrap at $FFFF -- it wraps at $FF, because zero-page is only 256 bytes. I'll make this one quick: Assuming X holds $05, "LDA $F2,X" would load from the zero-page address $F7.
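The zero-page variant is the same add done in 8 bits, which is what makes it wrap at $FF. Again, the function name is just for this sketch.

```c
#include <stdint.h>

/* Zero-page indexed X: the sum never leaves the $00-$FF range. */
uint8_t addr_zeropage_indexed(uint8_t base, uint8_t x)
{
    return (uint8_t)(base + x);  /* the cast discards the carry into the high byte */
}
```

`addr_zeropage_indexed(0xF2, 0x05)` evaluates to 0xF7, and `addr_zeropage_indexed(0xF2, 0x10)` wraps around to 0x02 rather than reaching $0102.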
Finally, onto the "indirect" addressing modes. Starting with zero-page indexed indirect X. Very similar to zero-page indexed X, there's a new aspect to the addressing mode: indirect addressing.
An example of indirect addressing would be this:
Say you have hex values $00 21 at zero-page address $4F (so, $4F holds $00, and $50 holds $21). Also assume X holds value $0F. If you did an "LDA ($40,X)", you would ACTUALLY be loading the value from $2100 (!!), because $40 + $0F = $4F and the address stored at $4F is $2100. This is useful for lookup tables and the like :-)
Oh yes, and the opcode is value $A1 -- so if you had $A1 40, you'd be doing "LDA ($40,X)" (as shown above).
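Indexed indirect X can be sketched in C as "add first, then follow the pointer". The function name and the plain `mem` array are assumptions for this example. Both pointer bytes are read from zero-page, so the index used for the high byte is also kept to 8 bits.

```c
#include <stdint.h>

/* (zp,X): add X to the zero-page operand, then read the 16-bit address stored there. */
uint16_t addr_indexed_indirect(const uint8_t *mem, uint8_t zp, uint8_t x)
{
    uint8_t ptr = (uint8_t)(zp + x);        /* wraps within zero-page */
    uint8_t lo  = mem[ptr];                 /* low byte comes first */
    uint8_t hi  = mem[(uint8_t)(ptr + 1)];  /* high byte, still in zero-page */
    return (uint16_t)(lo | (hi << 8));
}
```

With $00 at $4F, $21 at $50, and X = $0F, `addr_indexed_indirect(mem, 0x40, 0x0F)` evaluates to 0x2100, matching the example above.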
Now, onto the final addressing mode: zero-page indirect indexed Y. A lot of 6502 emulator authors emulate this wrong -- they emulate it the same way as zero-page indexed indirect X, which is TOTALLY wrong (but understandably wrong, just for the record! :-) ). Instead of doing an "LDA ($40,X)" (or in this case, "LDA ($40,Y)") the syntax is different:
LDA ($40),Y
So, let's assume you have the same situation as the previous addressing example:
You have hex values $00 21 at zero-page address $4F, and Y holds value $0F. If you do a "LDA ($40),Y", you might assume you'd be loading from $2100 -- this is WRONG. With the same $00 21 at address $4F and Y holding $0F, it's "LDA ($4F),Y" that uses that pointer, and it loads from address $210F. Here's why: indirect indexed Y is calculated in this order of operation: get the indirect 16-bit address stored at the zero-page address, then add Y to that. In indexed indirect X, it is calculated like this: add X to the zero-page address, read the 16-bit address stored at that zero-page location, then load the actual 8-bit value from that address. I know it sounds confusing... but, hence the "indexed indirect" vs. "indirect indexed". It makes a difference, obviously. :-)
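For comparison, indirect indexed Y in C is "follow the pointer first, then add". The function name and `mem` array are assumptions for this sketch.

```c
#include <stdint.h>

/* (zp),Y: read the 16-bit address stored at the zero-page operand, then add Y. */
uint16_t addr_indirect_indexed(const uint8_t *mem, uint8_t zp, uint8_t y)
{
    uint8_t  lo   = mem[zp];
    uint8_t  hi   = mem[(uint8_t)(zp + 1)];       /* pointer stays in zero-page */
    uint16_t base = (uint16_t)(lo | (hi << 8));
    return (uint16_t)(base + y);                  /* Y is added AFTER the lookup */
}
```

With $00 21 stored at $4F and Y = $0F, `addr_indirect_indexed(mem, 0x4F, 0x0F)` evaluates to 0x210F. The only difference from the X form is where the index gets added, which is exactly the mistake to watch for.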
So, wrapping ALL THE WAY back to the VERY BEGINNING: That is how you emulate "LDA" on the 6502. And yes, for *ALL* of those LDA addressing modes, you set/clear n & z in the P register. Now maybe you can see why emulators are considered "slow" -- there's a lot of work to do, PER OPCODE. The 6502 has something like 151 documented opcodes, so...
Anyways, onto the final opcode I mentioned: INC. INC supports the following addressing modes: Absolute, zero-page, absolute indexed X, and zero-page indexed X. (sorry folks, no Y index addressing :-) ). INC stands for INCrement, and it does just what it says: increments whatever is specified by that operand (or operand-address). INC also updates the n & z flags just like the LDA opcode. Oh, for the record -- the n & z flags are *THE MOST* commonly updated flags (almost every 6502 opcode updates them), so, write your n & z update routines wisely... make them THE most optimized.
Back to what I was saying about INC: What sucks about INC, on the 6502, is that you can't increment the accumulator by 1. (oh yes, BTW: INC increments by 1. There is no way to use "INC" to increment by more than one. You have to use the "ADC" opcode for this). However, in the 65c02, there is the INC accumulator support. But, the NES is not a 65c02; it's a 6502 :-)
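A C sketch of INC on a memory location, assuming the effective address has already been worked out by one of the addressing modes. The function name and the choice to pass P in and return the new P are just conventions picked for this example.

```c
#include <stdint.h>

#define FLAG_Z 0x02u  /* zero flag */
#define FLAG_N 0x80u  /* negative flag */

/* INC: add 1 to the byte at addr, then set/clear n and z. Returns the new P. */
uint8_t inc_memory(uint8_t *mem, uint16_t addr, uint8_t p)
{
    uint8_t value = (uint8_t)(mem[addr] + 1);  /* $FF wraps around to $00 */
    mem[addr] = value;

    p &= (uint8_t)~(FLAG_N | FLAG_Z);
    if (value == 0)   p |= FLAG_Z;
    if (value & 0x80) p |= FLAG_N;
    return p;
}
```

Incrementing $FF gives $00 with z set, and incrementing $7F gives $80 with n set. The carry flag is left alone, since INC only updates n and z.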
So, in FINAL summary: That is how you emulate two opcodes on the 6502 (NES).
What other questions can I help answer? :-)
Well, maybe a short bit on implementing something like graphics?
NES graphics are complex (people disagree with me -- most say SNES graphics are complex. I say SNES graphics are easy :-) ). Honestly, I don't want to go into graphics emulation -- the NES bases its graphics on sets of 2 and 4-bit formations, which is really wacked. I know how, it's just a complex process of bit manipulation. To be completely honest, I have not mastered NES graphics to the point where I feel comfortable discussing them. Marat Fayzullin's NES.DOC goes over the NES graphics format -- however, it was ***HORRIBLY*** written, and therefore is very difficult to understand... even for me. Heh. Or maybe I'm just stupid. ;-) Mr. Snazz, the author of VeNES, wants to create his own documentation for the NES. However, Mr. Snazz's is no different than Marat's, and is therefore just as complex. I can say this much about the NES graphics: It is tile-based, and by tile I mean you have a set X-by-Y pixel grid of graphics which you use. The screen is set up based on these tiles... which is good, because it makes things "fast," but, on the NES, due to it only having 16 colours, you run into a small situation: Colour calculation on the NES is very fucked up.
You have a PRE-SET palette (except for certain memory mappers, or so rumour states, which allow their own palette). Colours are calculated using two tables (as Mr. Fayzullin refers to them): an Attribute Table and a Pattern Table. These two tables help specify which colour to use PER FOUR TILES on the NES screen. That is where the NES is complex -- I still get very confused on how exactly the colour is calculated. The best person so far to explain it to me is Landy (Alex Krasivsky). Both Marat and Landy can be contacted via E-mail for more information -- I STRONGLY recommend getting Marat's NES.DOC and reading its section on graphics.
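To give a feel for the bit manipulation involved, here is a hedged C sketch of how a 2-bit pattern value and a 2-bit attribute value can be combined into a 4-bit colour index. It is based on the commonly documented NES tile layout (8x8 tiles stored as two 8-byte bitplanes), not on anything spelled out in this interview, and the function names are invented for the example.

```c
#include <stdint.h>

/* Pattern Table: each 8x8 tile is 16 bytes. Bytes 0-7 hold the low bit of
   every pixel (one byte per row), bytes 8-15 hold the high bit. */
uint8_t tile_pixel(const uint8_t *tile, unsigned x, unsigned y)
{
    unsigned shift = 7u - (x & 7u);  /* the leftmost pixel is the top bit */
    unsigned lo = (tile[y & 7u] >> shift) & 1u;
    unsigned hi = (tile[(y & 7u) + 8u] >> shift) & 1u;
    return (uint8_t)(lo | (hi << 1));
}

/* Attribute Table: 2 more bits, shared by a 2x2 block of tiles, become the
   upper half of the 4-bit colour index. */
uint8_t colour_index(uint8_t pixel2, uint8_t attr2)
{
    return (uint8_t)(((attr2 & 3u) << 2) | (pixel2 & 3u));
}
```

The 4-bit result would then be looked up in the palette. Scrolling, sprites, and the layout of the attribute bytes themselves are left out of this sketch.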
I'd like to work with Marat on re-writing his documentation, if I could. I hope he reads this and contacts me about it :-) (hehe)... So, I apologize to anyone out there who wants detailed NES graphics documentation, BUT, rest assured: As I continue on with Riff on the qNES project, I will be writing The NES Document, which should address all issues, including FULL memory mapper documentation support, and sound support (!!!)
Well, one last question, what kind of an interface do you think is good?
Ahh, a very good question. My personal opinion? DOS, MODE $13 :-) Or actually, MODE-Q (256x256x256). I do not recommend using Windows. However, I *AM* a Windows 95 fan (if you want to flame me for this, don't bother -- I love Windows 95, because it does what *I* want it to do (besides crash ;-) )). But, I believe emulators should be written for DOS, and DOS alone. Not UNIX (god I hate UNIX-based emulators... jesus, what a waste of the UNIX OS), and especially not Windows. But, I will admit, I am a BIG BIG fan of iNES, because of its simplicity. And I love being able to double-click on an icon and have the emulator run *FASTER* than my NES at home. Another sub-question you might have: What about WinG? Well, WinG, as seen with Pasofami, is slow... it makes graphics routines easy to use, and provides you the abilities you need (panning, etc.). The only author in my lifetime whom I have seen make efficient use of WinG's features while still keeping speed is Marat Fayzullin. I feel that DOS should be the primary development OS for emulators -- second recommendation is Windows. But I will ALWAYS stand firm on not running emulators under UNIX. UNIX is so much more... Friends of mine (Hi Suzanne :-) ) will disagree with me partially, but I agree with one aspect of her point of view: Quake was written under UNIX first, primarily 'cuz of its support and ease of development. Therefore, I see NO problem PURELY writing the C/ASM code under UNIX, then porting it to DOS. But, if you're gonna do that, you might as well do it in DOS in the first place. So, I'm talking out of my ass here :-) Hehehe.
Summarization: I believe emulators should be written for DOS and no other OS.
Do you have anything you would like to add?
A few things, but not much. I'd like to make one big request to Marat right here and now. UPDATE YOUR NES DOCUMENTATION !!! We all rely on you as the source of NES information -- I've yet to see anyone write documentation for such a "little" platform. If you need help Marat, gimme all the info you have and I'll compile a better document giving you full credit for the information. I'd also like to take the time to thank all the members of the emulation scene for helping me out over the past 6 months or so... It's been fun guys, and let's keep it going. ;-) Finally, remember: ...the w00pch00p is alive...
Affinity would like to thank you for your time!
You're welcome; oh yeah! If anyone needs information or has commentary on this interview, E-mail Shadow Dragon or myself (yoshi@parodius.com)