NES game development in 6502 assembly - Part 1

The goal of this article is to introduce readers to 6502 assembly and the NES architecture, and to write a simple game.

I will be using the ca65 assembler, which is part of the cc65 suite, and the FCEUX NES emulator. Both tools are available for Linux and Windows. The choice of emulator does not matter. The choice of assembler does, however, because the code uses directives that are specific to ca65.

Without further ado, let’s dive into game development for the NES as it was done in the mid-1980s.

This section contains a short description of the opcodes (instructions), addressing modes and registers of the 6502 CPU used in the NES.
Addressing modes. The 6502 has a 16-bit address bus, which means that it can address 64KB of memory. A byte (8 bits) is written as two hex digits, so a 6502 memory address is written as $ followed by four hex digits: $0000-$FFFF. You need to distinguish $ from #$: an operand written with $ alone refers to a memory address, while #$ denotes a literal (immediate) value. For example, LDA $0200 loads the byte stored at address $0200, whereas LDA #$02 loads the number 2 itself.
Registers. The 6502 has very few registers. A is an 8-bit accumulator that primarily holds the result of arithmetic and logic operations. X and Y are two 8-bit index registers, commonly used as loop counters and array indexes. PC (program counter) holds the address of the next instruction to execute. There are also seven 1-bit flags that hold information about the result of the last instruction. A few worth mentioning: the Negative flag is set when bit 7 of the result is 1 (a negative number in two’s complement), the Carry flag is set when an addition carries out of bit 7 or receives the bit shifted out by a shift instruction, and the Zero flag is set when the result is zero.
Instructions. Every 6502 instruction consists of a one-byte opcode followed by 0, 1 or 2 bytes of operand. I will cover some instructions with examples here; for the complete list of supported opcodes, see the references at the end of this article, including Easy 6502 for an interactive tutorial.
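To make the difference between $ and #$, the flags and the instruction sizes more concrete, here is a small Python model of a few 6502 behaviours. It is an illustrative sketch, not an emulator: the CPU class and its method names are hypothetical, only the N, Z and C flags are modelled (ADC’s overflow flag V is left out), and the opcode bytes listed are the standard 6502 encodings of those instructions.

```python
# A toy model of a few 6502 behaviours (illustration only, not an emulator).
MEMORY = bytearray(1 << 16)  # 16-bit address bus: 65,536 addresses, $0000-$FFFF


class CPU:
    def __init__(self):
        self.a = self.x = self.y = 0       # 8-bit registers A, X and Y
        self.n = self.z = self.c = False   # Negative, Zero and Carry flags

    def _set_nz(self, value):
        # N mirrors bit 7 of the result; Z is set when the result is zero
        self.n = bool(value & 0x80)
        self.z = value == 0

    def lda_immediate(self, value):
        """LDA #$nn: load the literal value itself."""
        self.a = value & 0xFF
        self._set_nz(self.a)

    def lda_absolute(self, address):
        """LDA $nnnn: load the byte stored at that address."""
        self.a = MEMORY[address & 0xFFFF]
        self._set_nz(self.a)

    def inx(self):
        """INX: X + 1, wrapping from $FF back to $00."""
        self.x = (self.x + 1) & 0xFF
        self._set_nz(self.x)

    def adc_immediate(self, value):
        """ADC #$nn: A + value + C; C receives the carry out of bit 7 (V not modelled)."""
        total = self.a + value + int(self.c)
        self.c = total > 0xFF
        self.a = total & 0xFF
        self._set_nz(self.a)


# Encodings of a few instructions: one opcode byte plus 0, 1 or 2 operand bytes.
# Opcodes: CLD=$D8, INX=$E8, LDA immediate=$A9, LDA absolute=$AD, STA absolute=$8D.
ENCODINGS = {
    "CLD": bytes([0xD8]),
    "INX": bytes([0xE8]),
    "LDA #$02": bytes([0xA9, 0x02]),
    "LDA $0200": bytes([0xAD, 0x00, 0x02]),  # 16-bit address stored low byte first
    "STA $0200": bytes([0x8D, 0x00, 0x02]),
}
```

Note that a 16-bit operand such as $0200 is stored low byte first, which is why LDA $0200 assembles to AD 00 02.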

NES Architecture

The NES uses a modified version of the 6502 CPU. Its 16-bit address bus gives it 64KB of address space: 2KB for RAM, a few addresses used as ports to access the PPU (Picture Processing Unit) and the APU (Audio Processing Unit), and 32KB reserved for the game code. RAM occupies addresses $0000-$07FF.
Sprite data is prepared in RAM and copied to the PPU every frame. One thing to know is that the PPU’s own sprite memory (OAM, described below) is dynamic memory, which is cheaper to produce but whose contents degrade over time, so we need to refresh it every frame; otherwise we will start getting random values.
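The memory map above can be summarized as a small lookup function. This is a simplified Python sketch (the function name cpu_region is hypothetical). It assumes a basic cartridge without a mapper, and it also shows that the 2KB of RAM and the eight PPU registers repeat (are mirrored) through the rest of their address ranges.

```python
def cpu_region(address):
    """Classify an NES CPU address (simplified; cartridge layout varies by mapper)."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("the 6502 has a 16-bit address bus")
    if address < 0x2000:
        # 2KB of RAM at $0000-$07FF, mirrored up to $1FFF
        return ("RAM", address & 0x07FF)
    if address < 0x4000:
        # 8 PPU registers at $2000-$2007, mirrored every 8 bytes up to $3FFF
        return ("PPU register", 0x2000 + (address & 0x0007))
    if address < 0x4018:
        # APU and I/O registers, including $4014 (sprite DMA)
        return ("APU/IO register", address)
    if address >= 0x8000:
        # 32KB of program ROM in a basic cartridge
        return ("PRG ROM", address - 0x8000)
    # $4018-$7FFF: test registers and other cartridge space (e.g. save RAM)
    return ("other", address)
```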
The PPU is the chip that generates all the graphics in the NES. It draws the picture on the screen line by line. Some of the main parts of the PPU:
OAM. Object Attribute Memory is a 256-byte memory inside the PPU that holds information about 64 sprites, each taking 4 bytes:

Byte 0: Y position on the screen
Byte 1: index of the sprite’s tile in the tile map
Byte 2: attributes of the sprite (palette, priority, horizontal and vertical flip)
Byte 3: X position on the screen

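Nametables. The PPU’s address space has room for four nametables of 1024 bytes each, which describe the background tiles of the game (the console itself has only 2KB of video RAM, so two of the four are normally mirrors). The NES draws the background layer separately from the sprite layer. Each nametable holds 960 bytes of tile indexes (32x30 tiles) followed by a 64-byte attribute table that assigns color palettes to areas of the screen.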
Palettes. Each palette contains 4 colors. We choose a sprite’s palette with its attribute byte (bits 0-1), mentioned in the OAM section.
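The OAM layout and the attribute tables come down to simple byte arithmetic. The Python sketch below uses hypothetical helper names. It packs one 4-byte OAM entry using the standard attribute bits (palette in bits 0-1, priority in bit 5, flips in bits 6-7), and it computes which attribute-table byte and bit pair select the palette for a given background tile.

```python
def oam_entry(x, y, tile, palette=0, flip_h=False, flip_v=False, behind_bg=False):
    """Pack one sprite into the 4-byte OAM layout: Y, tile index, attributes, X."""
    attributes = palette & 0b11      # bits 0-1: sprite palette 0-3
    if behind_bg:
        attributes |= 0x20           # bit 5: draw behind the background
    if flip_h:
        attributes |= 0x40           # bit 6: flip horizontally
    if flip_v:
        attributes |= 0x80           # bit 7: flip vertically
    return bytes([y & 0xFF, tile & 0xFF, attributes, x & 0xFF])


# A nametable is 32x30 tile indexes (one byte each) plus a 64-byte attribute table.
NAMETABLE_TILES = 32 * 30
ATTRIBUTE_TABLE_SIZE = 64


def attribute_location(tile_x, tile_y, nametable_base=0x2000):
    """Return (PPU address, bit shift) of the 2-bit palette field for a background tile.

    Each attribute byte covers a 4x4-tile (32x32 pixel) area; each 2-bit field
    inside it covers one 2x2-tile (16x16 pixel) quadrant.
    """
    address = nametable_base + NAMETABLE_TILES + (tile_y // 4) * 8 + (tile_x // 4)
    shift = ((tile_y % 4) // 2) * 4 + ((tile_x % 4) // 2) * 2
    return address, shift
```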

For a more detailed description of the system please refer to the NesDev wiki.

Our first game
The end goal of this section is to display a Mario sprite on the screen. Each subsection describes and explains a segment of the game code. Later on, this code can be used as boilerplate for other games.

ca65 uses segments to structure the code. The first one we will look at is the “HEADER” segment. It holds the iNES header, which tells the emulator how the ROM banks are organized and how to run the game code.

.byte “NES” is not the name of the game but a signature: together with .byte $1a it marks the file as an iNES ROM image that can be loaded. .byte $02 and .byte $01 give the number of 16KB banks of program code and 8KB banks of characters (tile maps), respectively. The remaining bytes are flags (for example, mirroring and mapper number) and zero padding; the header contains no checksum. This is the end of the “HEADER” segment and the beginning of the “STARTUP” segment, where the interrupt labels are defined.
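The header is only 16 bytes, so it is easy to build and inspect outside the assembler. Here is a Python sketch with hypothetical helper names, assuming the plain iNES 1.0 layout (flag bytes 6 and 7, zeros after them) rather than the newer NES 2.0 format. With two PRG banks, the program ROM is 32KB, which matches the 32KB reserved for game code.

```python
PRG_BANK_SIZE = 16 * 1024  # program ROM is counted in 16KB units
CHR_BANK_SIZE = 8 * 1024   # character (tile) ROM is counted in 8KB units


def ines_header(prg_banks, chr_banks, flags6=0, flags7=0):
    """Build a 16-byte iNES header: signature, bank counts, two flag bytes, zero padding."""
    header = b"NES\x1a" + bytes([prg_banks, chr_banks, flags6, flags7])
    return header + bytes(16 - len(header))  # remaining 8 bytes left as zeros


def parse_ines_header(header):
    """Decode the fields used by a simple iNES 1.0 header."""
    if len(header) < 16 or header[:4] != b"NES\x1a":
        raise ValueError("not an iNES file")
    return {
        "prg_bytes": header[4] * PRG_BANK_SIZE,
        "chr_bytes": header[5] * CHR_BANK_SIZE,
        # mapper number: high nibble of byte 7 + high nibble of byte 6
        "mapper": (header[7] & 0xF0) | (header[6] >> 4),
        "vertical_mirroring": bool(header[6] & 0x01),
    }
```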

Reset interrupt
Interrupts are events in the system that pause the execution of the current code so that the CPU can handle them. In the NES there are two main interrupts: NMI and Reset.
Reset happens when the console is first powered on or when the reset button is pressed. We need to tell the NES what to do when this happens, so we define the Reset label and start the initialization. Labels mark parts of the code to which we can jump later; you can think of them as function names.

SEI disables maskable interrupts (IRQs) so that no events in the system interfere with the initialization; NMI cannot be masked this way and is controlled through the PPU registers instead. CLD turns off decimal mode, which the original 6502 supports but the NES CPU does not. Lines 5-6 disable any sound, as the system is not yet initialized and we do not want any weird sounds to start playing. Lines 8-9 initialize the stack pointer, although we are not going to use the stack in this tutorial. INX then sets the X register back to #$00: after the stack initialization X holds #$FF, and #$FF + 1 wraps around to #$00 in an 8-bit register. We now need to zero out the PPU registers and disable the sound channel of the APU; lines 11-17 do exactly this.

Before doing anything else, we need to wait for the PPU to draw a frame; this indicates that the NES is ready for further operations. After the PPU finishes drawing each frame there is a short period called v-blank (vertical blank) during which it draws nothing and graphics can safely be updated. By waiting for v-blank to occur we ensure that at least one frame was drawn. To detect it we can check bit 7 (the most significant bit) of the value at address $2002: it is 1 during v-blank and 0 otherwise. A little trick can be used to check this condition repeatedly:

The BIT opcode copies bit 7 of the value at $2002 into the N flag. BPL (Branch on PLus) branches when the N flag is 0, that is, when the last result looks positive. So while bit 7 is 0, execution jumps back to the VBlankWait1 label and the loop repeats; once v-blank starts, N becomes 1 and execution continues with the next instruction.
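The BIT/BPL polling loop can be modelled in a few lines of Python. The sketch below is illustrative only: instead of real hardware, the hypothetical wait_for_vblank function receives the sequence of values that successive reads of $2002 would return.

```python
def bit(a, value):
    """Flags produced by BIT: N = bit 7 and V = bit 6 of the value, Z = (A AND value) == 0."""
    return {"N": bool(value & 0x80), "V": bool(value & 0x40), "Z": (a & value) == 0}


def wait_for_vblank(ppustatus_reads):
    """Model of the loop `VBlankWait1: BIT $2002 / BPL VBlankWait1`.

    ppustatus_reads yields successive values read from $2002. Returns how many
    reads it took before bit 7 (the v-blank flag) was seen set.
    """
    for reads, status in enumerate(ppustatus_reads, start=1):
        if bit(0, status)["N"]:   # N = 1: BPL is not taken, fall through
            return reads
        # N = 0: BPL ("branch on plus") jumps back and $2002 is read again
    raise RuntimeError("v-blank never started")
```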

Clearing memory
When the console starts up, RAM contains “garbage” values, so the best practice is to zero them out before continuing. As we know, the NES has 2KB of RAM, from address $0000 to $07FF. One might load #$00 into the accumulator and then store it at each of those addresses.

But this approach is impractical, as we would have to write 2048 store instructions. Instead we will use a loop with the X register as an index. The problem is that X is only an 8-bit register, so it can only take 256 different values. To overcome this, we will cover several 256-byte ranges (pages) of addresses in each iteration:

The first iteration stores #$00 from the A register at addresses $0000, $0100, $0300, …, $0700; the second does the same at $0001, $0101, $0301, …, $0701, and so on. The opcode INX increments the value in the X register by one. CPX compares X to #$00 and sets the Z flag to 1 if they are equal. BNE is a branching opcode that checks the Z flag: if it is 0, BNE jumps back to the label, in our case ClearMem; otherwise execution continues with the next opcode. You might wonder why we compare X to #$00 and not to #$FF. If we compared X to #$FF, the last iteration would run with X = #$FE, because we increment X before comparing, so the byte at offset $FF of each page would never be cleared. Since X is an 8-bit register, incrementing #$FF wraps it back to #$00, so X becoming #$00 means we went through all 256 values.

Another question is why we initialized the range $0200-$02FF with the value #$FE. If you remember, the NES can handle up to 64 sprites of 4 bytes each. That is exactly 256 bytes, which matches this range of memory. So at the beginning we need to decide where we will store our sprites. We can choose any 256-byte page of RAM (a range $XX00-$XXFF within $0000-$07FF), because the PPU is only given the high byte of its address. Why the value #$FE, then? Sprite data includes the sprite’s X and Y position on the screen, and we don’t want sprites to appear at random places while we load them into memory. A Y value of #$FE places a sprite below the bottom of the 240-line visible screen, so it is not drawn.
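The ClearMem loop, including the 8-bit wraparound of X and the special #$FE page, can be checked with a Python model. This is a sketch under assumptions: clear_memory and is_below_screen are hypothetical helpers, and the loop body writes the eight pages directly instead of going through the accumulator. The compare_with parameter shows what happens if X is compared to #$FF instead of #$00. is_below_screen accounts for the PPU drawing a sprite one scanline below its OAM Y value.

```python
RAM_PAGES = 8          # 2KB of RAM = 8 pages of 256 bytes ($0000-$07FF)
SPRITE_PAGE = 2        # $0200-$02FF is reserved for sprite data
SCREEN_HEIGHT = 240    # visible scanlines


def clear_memory(ram, compare_with=0x00):
    """Model of ClearMem: X indexes all 8 pages of RAM in the same iteration."""
    x = 0
    while True:
        for page in range(RAM_PAGES):            # $0000,X  $0100,X ... $0700,X
            ram[page * 0x100 + x] = 0xFE if page == SPRITE_PAGE else 0x00
        x = (x + 1) & 0xFF                       # INX: $FF + 1 wraps to $00
        if x == compare_with:                    # CPX #$.. then BNE ClearMem
            break
    return ram


def is_below_screen(oam_y):
    """True if a sprite with this OAM Y value starts below the visible area."""
    return oam_y + 1 >= SCREEN_HEIGHT            # drawn one line below its Y value
```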
We will wait for another v-blank to make sure that the NES is totally ready.

PPU and Palettes initialization
The PPU now needs to know where the sprite data is located in memory. We load the most significant byte of $0200, that is #$02, into A and store it at address $4014. Writing to $4014 starts a DMA transfer that copies the 256 bytes at $0200-$02FF into the PPU’s OAM.

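The NOP instruction at the end stands for No Operation: it does nothing except take two CPU cycles. It is used here as a short pause while the transfer happens, although it is not strictly required, since the CPU is halted until the DMA copy has finished.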
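In Python terms, the DMA triggered by a write to $4014 is a 256-byte page copy. Here is a minimal sketch with hypothetical function names that ignores timing:

```python
OAM_SIZE = 256  # 64 sprites x 4 bytes


def oam_dma(cpu_memory, high_byte):
    """Model of writing high_byte to $4014: copy page $XX00-$XXFF into a fresh OAM."""
    start = (high_byte & 0xFF) << 8
    return bytearray(cpu_memory[start:start + OAM_SIZE])


def sprites(oam):
    """Split OAM into 64 (Y, tile, attributes, X) tuples."""
    return [tuple(oam[i:i + 4]) for i in range(0, OAM_SIZE, 4)]
```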
Now it is time to initialize the color palettes so that sprites and backgrounds have proper colors. There are 8 palettes in total, 4 for backgrounds and 4 for sprites, taking 32 bytes of space. A palette consists of four colors, and the first color is shared by all palettes: it is the common background color (in sprite palettes, color 0 is drawn as transparent).
The PPU has its own memory, which the CPU can’t access directly, only through the PPU’s data ports. Palettes are stored in the PPU’s address range $3F00-$3F1F, so first the PPU needs to know that we want to write to exactly this range.

Lines 1-4 do exactly this. $2006 is a data port that accepts first the most significant and then the least significant byte of the address we want to write to, in our case #$3F and #$00 respectively.
To define data in ca65, use the .byte directive followed by the values. The PaletteData label marks the colors of each palette: $22, $29, $1A, $0F is the first palette, and $22, $36, $17, $0F is another one. As mentioned before, the first color is the same for each palette.
The palettes are declared and the PPU is ready to accept them starting from $3F00. Now let’s do the actual writing. We iterate over the bytes at PaletteData, load each color into the accumulator and write it to the data port $2007. The PPU stores the first color at $3F00, then increments the address by itself and stores the next color at $3F01, and so on.
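The $2006/$2007 sequence can be modelled as a two-write address latch plus an auto-incrementing pointer. The Python class below is a hypothetical sketch that leaves out details such as palette mirroring and buffered reads. Reading $2002 before the first $2006 write resets the latch, which is a common precaution; it is included here as an assumption, not as a claim about the original code. The colors are the first two palettes from PaletteData.

```python
class PPUPorts:
    """Toy model of PPUADDR ($2006) and PPUDATA ($2007)."""

    def __init__(self):
        self.vram = bytearray(0x4000)   # the PPU's own 16KB address space
        self.address = 0
        self.write_high_next = True     # $2006 takes the high byte, then the low byte

    def read_2002(self):
        """Reading $2002 resets the $2006 latch (status bits not modelled)."""
        self.write_high_next = True
        return 0

    def write_2006(self, value):
        if self.write_high_next:
            self.address = ((value & 0x3F) << 8) | (self.address & 0x00FF)
        else:
            self.address = (self.address & 0xFF00) | (value & 0xFF)
        self.write_high_next = not self.write_high_next

    def write_2007(self, value):
        self.vram[self.address] = value & 0xFF
        self.address = (self.address + 1) & 0x3FFF   # auto-increment by 1


PALETTE_DATA = [0x22, 0x29, 0x1A, 0x0F, 0x22, 0x36, 0x17, 0x0F]

ppu = PPUPorts()
ppu.read_2002()
ppu.write_2006(0x3F)   # high byte of $3F00
ppu.write_2006(0x00)   # low byte of $3F00
for color in PALETTE_DATA:
    ppu.write_2007(color)
```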

Loading sprites
What I haven’t covered yet is loading sprites into the memory range $0200-$02FF that we previously allocated for this purpose. Remember that the values we are about to load are not the actual sprite images. They only describe where to place each sprite on the screen and which tile of the tile map it uses (more on the tile map later).

The LoadSprites subroutine has almost the same logic as LoadPalettes. SpriteData describes our Mario character. Line 9 means: place the first sprite 8 pixels down on the Y axis ($08), use tile 0 of the tile map ($00), set its attributes to 0 ($00), and place it 8 pixels across on the X axis ($08). The Mario character is too big to fit in a single 8x8-pixel sprite. If you look at the tile map, there are 8 tiles that make up the full Mario character. That’s why lines 9-16 use an incrementing tile index.

Fig 1. Indexes of the eight Mario sprites in the tile map
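The eight SpriteData entries follow a regular pattern, so they can also be generated. The Python sketch below assumes that the eight tiles form a grid two tiles wide and four tiles tall, numbered left to right and top to bottom; that layout and the metasprite helper are assumptions for illustration. Its first entry reproduces Line 9 ($08, $00, $00, $08).

```python
def metasprite(x, y, first_tile=0, columns=2, rows=4, attributes=0):
    """Build OAM bytes for a character made of columns x rows 8x8 tiles.

    Assumes tiles are numbered left to right, top to bottom, starting at first_tile.
    """
    data = bytearray()
    for row in range(rows):
        for col in range(columns):
            tile = first_tile + row * columns + col
            # Each entry: Y, tile index, attributes, X (the OAM byte order)
            data += bytes([y + row * 8, tile, attributes, x + col * 8])
    return bytes(data)


mario = metasprite(8, 8)
```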

NMI interrupt
The non-maskable interrupt (NMI) happens every frame, at the start of v-blank, once it is enabled in the PPU. This makes it a good place to update graphics. If you remember, the PPU’s sprite memory is dynamic, meaning its values degrade over time, so we use the NMI to refresh it.
We also need to re-enable the interrupts that were disabled by the SEI opcode and restore the PPU registers that were cleared during Reset.

CLI re-enables interrupts. Lines 3-6 turn PPU drawing back on. We define the behaviour of the NMI interrupt under the NMI label. There we tell the PPU where to take the sprites from for each update by writing the most significant byte of the $0200 address to $4014.
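The values written to $2000 (PPUCTRL) and $2001 (PPUMASK) are bit fields. The Python helpers below are a hypothetical sketch that composes them from named options, following the register descriptions on the NesDev wiki. For example, enabling NMI with the background pattern table at $1000 gives %10010000 ($90), and showing both layers including the leftmost 8 pixels gives %00011110 ($1E). These are common choices in tutorials, and the values in the original code may differ.

```python
def ppuctrl(nmi_on_vblank=False, sprite_8x16=False, bg_table_1000=False,
            sprite_table_1000=False, increment_32=False, base_nametable=0):
    """Compose a value for PPUCTRL ($2000)."""
    value = base_nametable & 0b11            # bits 0-1: base nametable
    value |= int(increment_32) << 2          # bit 2: VRAM increment of 1 or 32
    value |= int(sprite_table_1000) << 3     # bit 3: sprite pattern table $0000/$1000
    value |= int(bg_table_1000) << 4         # bit 4: background pattern table $0000/$1000
    value |= int(sprite_8x16) << 5           # bit 5: 8x8 or 8x16 sprites
    value |= int(nmi_on_vblank) << 7         # bit 7: generate NMI at start of v-blank
    return value


def ppumask(show_bg=False, show_sprites=False, bg_left=False, sprites_left=False,
            greyscale=False):
    """Compose a value for PPUMASK ($2001); color-emphasis bits 5-7 are left at 0."""
    return (int(greyscale)                   # bit 0: greyscale
            | int(bg_left) << 1              # bit 1: background in leftmost 8 pixels
            | int(sprites_left) << 2         # bit 2: sprites in leftmost 8 pixels
            | int(show_bg) << 3              # bit 3: show background
            | int(show_sprites) << 4)        # bit 4: show sprites
```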

Vectors and Chars
If you have made it this far, congrats: this is the last section :)
We have defined the Reset and NMI labels, but we haven’t connected them to the actual interrupts. The CPU finds each interrupt handler through a vector: a 2-byte address stored at the very end of memory ($FFFA for NMI, $FFFC for Reset, $FFFE for IRQ). So we create a “VECTORS” segment and place our labels there.

The order matters: NMI must come first and Reset second, because the CPU reads the NMI address from $FFFA-$FFFB and the Reset address from $FFFC-$FFFD.
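The vector table itself is just three 16-bit addresses stored low byte first. Below is a Python sketch with hypothetical handler addresses ($8040 for NMI, $8000 for Reset), since the real addresses are assigned by the linker; the helper names are also hypothetical.

```python
def vector_table(nmi, reset, irq=0):
    """Bytes for $FFFA-$FFFF: NMI, Reset and IRQ handler addresses, each little-endian."""
    table = bytearray()
    for address in (nmi, reset, irq):
        table += address.to_bytes(2, "little")
    return bytes(table)


def handler_for(table, vector):
    """Look up a handler the way the CPU does: read the 2 bytes at the vector address."""
    offset = {0xFFFA: 0, 0xFFFC: 2, 0xFFFE: 4}[vector]
    return int.from_bytes(table[offset:offset + 2], "little")


table = vector_table(nmi=0x8040, reset=0x8000)
```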
Lastly, we have to include the tile map that contains the actual sprite graphics used by the PPU. This is done in the “CHARS” segment. I took the .chr file from a GitHub repository linked below.
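A .chr file is raw pattern data: each 8x8 tile takes 16 bytes, 8 bytes for the low bit of each pixel followed by 8 bytes for the high bit, which together give a color index from 0 to 3. One 8KB CHR bank therefore holds 512 tiles (two pattern tables of 256). The Python sketch below decodes one tile; decode_tile is a hypothetical helper.

```python
TILE_BYTES = 16  # 8 rows x 2 bit planes


def decode_tile(chr_data, index):
    """Return an 8x8 grid of color indexes (0-3) for tile `index`."""
    tile = chr_data[index * TILE_BYTES:(index + 1) * TILE_BYTES]
    pixels = []
    for row in range(8):
        low, high = tile[row], tile[row + 8]   # plane 0, then plane 1
        pixels.append([((low >> (7 - col)) & 1) | (((high >> (7 - col)) & 1) << 1)
                       for col in range(8)])  # leftmost pixel is bit 7
    return pixels
```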

The only thing left is to compile and run our code in the emulator. Use the following commands:

ca65 .asm
ld65 .o -o .nes -t nes

The final output is a ROM file with the .nes extension that can be opened in the emulator.

Fig 2. Output of the emulator

Full source code: https://github.com/neonwalker/nes_tutorial
Stay tuned for Part 2 where we will create a more sophisticated game.

1. http://wiki.nesdev.com
2. https://nerdy-nights.nes.science
3. https://github.com/mchiaramonte/hellomario
4. http://skilldrick.github.io/easy6502