This article is meant to address some of the practical implications of the Chip8 instruction set and how it can be applied to writing games. Examples will be given using Octo assembler, but should be easy to translate into the assembler of your choice or raw Chip8 bytecode.

The Chip8 specification doesn’t say how fast programs should run. Experimentally, you may find that various games seem intended to run at different speeds. Pong.ch8, for example, is pretty playable when the game runs at about 7 chip8 cycles per frame and a framerate of 60fps. Many interpreters make it possible to adjust their speed for each game. This is less than ideal, however, as it requires players to experiment. Let’s consider a very simple program which moves a sprite on the screen at a fixed rate:

```
: box
0b11110000
0b10010000
0b10010000
0b11110000
: main
i := box
v0 := 10 # x position
v1 := 10 # y position
loop
clear
sprite v0 v1 4
v0 += 1
again
```

Since we know the `delay`

timer counts down at 60hz, we can use it to precisely time an interframe delay. Take a look at this modified program and note that animation will run at a consistent speed even if your interpreter is cranked up really fast:

```
: box
0b11110000
0b10010000
0b10010000
0b11110000
: sync
loop
vf := delay
if vf != 0 then
again
# delay for up to 1/60th of a second
# using the fixed-rate delay timer
vf := 1
delay := vf
;
: main
i := box
v0 := 10 # x position
v1 := 10 # y position
loop
clear
sprite v0 v1 4
v0 += 1
sync
again
```

Consider making use of a routine like `sync`

in your own programs to ensure a consistent gameplay experience on different interpreters.

Whether it’s counting down the number of missiles stored in your gigantic mech or telling you how many human skulls you’ve crushed beneath the treads of your tank, video games frequently involve displaying numbers on screen. Fortunately, Chip8 provides some instructions specially for this purpose.

To display a single digit from a register, we can make use of the `hex`

statement to load the address of a (built-in) hexadecimal character into `i`

. Then it’s just a matter of using `sprite`

to display it:

```
: main
v0 := 7 # some one-digit number
i := hex v0 # load the address of a character
sprite v0 v0 5 # built in characters are 5 pixels tall
```

There are two caveats here: the number displayed is in hexadecimal and we can only display a single digit. How do we display larger numbers? That’s where `bcd`

comes in. It’ll take a number in a register, split it into hundreds, tens and ones and then store those into sequential addresses in memory. Then we use `load`

to scoop those values into the bottom three registers and use `hex`

like before to display them.

```
# temporary storage for hundreds, tens and ones:
: digits 0 0 0
: main
v0 := 137 # some number 0-255
i := digits # the destination for bcd
bcd v0 # unpack digits in v0
va := 10 # x position of first digit
vb := 20 # y position of first digit
i := digits
load v2 # load digits into v0-v2
i := hex v0 # hundreds digit
sprite va vb 5
va += 5
i := hex v1 # tens digit
sprite va vb 5
va += 5
i := hex v2 # ones digit
sprite va vb 5
```

You can also use `bcd`

as a way of dividing numbers by 10 or 100, but it’s a little fiddly for this purpose.

It’s easy to make objects move in whole numbers of pixels per second- just store a speed in a register and add that speed to the object’s position every time you update the display. What may be less obvious is how you move objects slower than 1 pixel per frame.

One technique for accomplishing this is to use fixed-point arithmetic and the flag (`vf`

) register. You’ll store a fractional position in one register, accumulating your speed into it and incrementing the real position whenever the result sets the carry register- that is, whenever the result wraps around.

```
: ball
0b0110000
0b1001000
0b1001000
0b0110000
: main
i := ball
va := 1 # x position of ball 1 (pixels)
vb := 1 # x position of ball 2 (pixels)
v0 := 0 # x position of ball 1 (fractional)
v1 := 0 # x position of ball 2 (fractional)
v2 := 10 # x velocity of ball 1
v3 := 5 # x velocity of ball 2
loop
vc := 4
sprite va vc 4
vc := 12
sprite vb vc 4
v0 += v2
va += vf
v1 += v3
vb += vf
clear
again
```

Note that immediate adds (adding a constant) do not set `vf`

on overflow; we must store the velocity of the balls in registers, at least temporarily.

Say you want to make a turn-based game where a player moves an 8x8 tile at a time. We can use `key`

to wait for key input and then move the player based on the value we get. In this example I will represent the player’s horizontal and vertical position using a single “tile” coordinate system which numbers each 8x8 region on the screen left to right, top to bottom. Later on, this system would make it easy to add obstacles and tiles which do special things when a player walks on them.

```
: darwinian
0b00000000
0b00010000
0b00010000
0b01111100
0b00010000
0b00111000
0b00101000
0b00101000
: draw-player
va := v0
vc := 0b00111 # low 3 bits of v0 are the x coordinate
va &= vc # mask them off
va <<= va # multiply by 8 to get pixels
va <<= va
va <<= va
vb := v0
vc := 0b11000 # high 2 bits of v0 are the y coordinate
vb &= vc # mask them off
# and they're already in pixels!
i := darwinian
sprite va vb 8
;
: main
v0 := 12 # the player's position in tiles
loop
draw-player
v1 := key # wait for a keypress
draw-player # erase the player before moving
# assuming a default keyboard layout,
# these numbers will map to ASWD:
if v1 == 7 then v0 += -1 # move left
if v1 == 9 then v0 += 1 # move right
if v1 == 5 then v0 += -8 # move up
if v1 == 8 then v0 += 8 # move down
again
```

Note that since we do no boundary checking the player can wrap around the edges of the screen. You could add additional logic to prevent this, or you could take advantage of it as a game mechanic.

On the other hand, perhaps the tile coordinate system is unnecessary overhead for your game. Here’s a similar program which uses discrete x and y position registers. A jump table based on `jump0`

(which jumps to an address + the contents of `v0`

) is used to decode keypresses. The `launch`

routine is a subroutine so that after the jump0 and second jump into the body of each handler we can `return`

to the caller of `launch`

.

Movement is constrained to avoid having the player wrap around the edges of the screen. Finally, we avoid using subtraction instructions when moving the player by using immediate adds which will wrap around to the desired value.

```
: darwinian
0b00000000
0b00010000
0b00010000
0b01111100
0b00010000
0b00111000
0b00101000
0b00101000
: lf if va != 0 then va += 248 ;
: rt if va != 56 then va += 8 ;
: up if vb != 0 then vb += 248 ;
: dn if vb != 24 then vb += 8 ;
: code return return return return
return jump up return jump lf
jump dn jump rt return return
return return return return
: launch
# launch is a sub so that jump
# entries can return to the caller.
# jump table entries are
# 2 bytes, so double the key code:
v0 <<= v0
jump0 code
: main
va := 16 # x position
vb := 8 # y position
i := darwinian
loop
sprite va vb 8 # draw sprite
v0 := key # wait for key
sprite va vb 8 # erase sprite
launch
again
```

The `jump0`

instruction can be very useful for eliminating complex nests of branches and conditionals. Remember that if your jump table entries are 2 bytes long you can only have 128 in a table (such as if they are `jump`

s) and if they are 4 bytes long you can only have 64 (such as if they are a subroutine `call`

followed by a `return`

).

Say you want to make a game that draws a series of “tiles” to fill the screen as in an RPG overworld. We can start by declaring the tile data the rest of our experiments will use.

```
# 8x8 tiles for a simple RPG-like overworld
: ground 0b11101111 0b10111101 0b11110111 0b11011110
0b11110111 0b10111101 0b11101111 0b01111011
: water 0b00000000 0b00000000 0b11001100 0b00110000
0b00000000 0b10000110 0b00011000 0b00000000
: tree 0b11000011 0b10101001 0b01010101 0b00101010
0b01010100 0b10000001 0b11100111 0b11000011
: house 0b11111111 0b11100111 0b10011001 0b01111110
0b00000000 0b10111101 0b10100101 0b10100101
```

First, let’s try writing some simple loops that cover the entire display with tree tiles. Since the Chip8 display is 64x32 and our tiles are 8x8 we’ll draw four rows of 8 tiles:

```
: main
v0 := 0 # x position
v1 := 0 # y position
i := tree
loop
loop
sprite v0 v1 8
v0 += 8
if v0 != 64 then
again
v0 := 0
v1 += 8
if v1 != 32 then
again
```

Now we want to extend this program to randomly scatter tiles. Let’s generate a tile offset in v2 by using `random`

and a mask which will select random numbers spaced 8 bytes apart. This process is made very convenient because both the size and number of tiles are powers of two.

```
: get-tile
v2 := random 0b11000 # { 0, 8, 16, 24 }
i := ground # the base of the tileset
i += v2 # add the tile in
;
: main
v0 := 0 # x position
v1 := 0 # y position
loop
loop
get-tile
sprite v0 v1 8
v0 += 8
if v0 != 64 then
again
v0 := 0
v1 += 8
if v1 != 32 then
again
```

What if instead of picking tiles randomly we had static data defining a level? There are many ways we might store the level data with varying degrees of compactness. For simplicity and ease of editing, let’s represent each tile in the map with a byte, 0-3. Data will be stored a row at a time to correspond to the way we draw the level. I’ve rearranged our registers a bit because we need low registers free to be able to use `load`

.

```
: map
1 1 1 1 1 2 2 2
1 1 1 1 0 0 0 2
2 0 0 0 0 0 2 2
0 3 0 0 0 2 2 2
: get-tile
i := map # load the tile into v0
i += vc
load v0
v0 <<= v0 # multiply v0 by 8
v0 <<= v0
v0 <<= v0
i := ground
i += v0
;
: main
va := 0 # x position
vb := 0 # y position
vc := 0 # tile index
loop
loop
get-tile
sprite va vb 8
va += 8
vc += 1
if va != 64 then
again
va := 0
vb += 8
if vb != 32 then
again
```

Naturally we could simplify our `get-tile`

routine a bit if we stored tiles in multiples of 8 to begin with. Perhaps we could use the lower-order bits to store other information and mask them off with a simple `and`

before using them to index into the tile sheet. This approach can support up to 32 8x8 tiles before `v0`

will wrap around while computing an offset for `i`

. If we wanted more tiles, we would need to add to `i`

in multiple smaller steps.

In everyday programming, it is not unusual to want to dereference more than one pointer in sequence- imagine indexing a “jagged” 2d array or traversing a linked list. Unfortunately, it is difficult to do something like this in the Chip8 instruction set.

We can only load data from memory through the 16-bit `i`

register, and and we can only load into an 8 bit ordinary register. `i`

can be initialized to a constant or we can add an 8 bit value to it. How can we load an arbitrary value in normal registers into `i`

?

One approach would be to use repeated addition into `i`

:

```
v0 := 2 # high 4 bits of desired i
v1 := 37 # low 8 bits of desired i
v2 := 128 # constant
i := 0
i += v1
loop
while v0 != 0
i += v2 # we can't add 256 at once,
i += v2 # so we do it in two steps.
v0 += -1 # equivalent to adding 255
again
```

Another approach would be to use self-modifying code to place an `i := NNN`

instruction in memory before executing it. This looks like “0xANNN” in memory, so we can `or`

our high nybble with `0xA0`

.

```
: code
v0 := 2 # high 4 bits of desired i
v1 := 37 # low 8 bits of desired i
v2 := 0xA0 # constant
v0 |= v2 # now v0-v1 contain our instruction
i := trampoline # select destination for instruction
save v1 # write the instruction
: trampoline 0 0 # destination for i := NNN
```

If you want to generate an `i :=`

instruction pointing to a label known at assembly time, you can use Octo’s `:unpack`

statement to achieve the same trick:

```
: code
:unpack 0xA some-label # store address in v0-v1
i := trampoline # select destination for instruction
save v1 # write the instruction
: trampoline 0 0 # execute our instruction, setting i.
```

And another approach could be to use a jump table. This doesn’t allow us to store completely arbitrary values into `i`

, but it does allow us to select an `i`

based on the contents of a register. Since the jump table entries are 4 bytes each and are indexed by `v0`

, we can have at most 64 such entries in a table.

```
: table
i := 0x123 ;
i := 0x456 ;
i := 0x789 ;
i := 0xABC ;
i := 0xDEF ;
: get-i
# assume get-i is called as a subroutine,
# and v0 contains the table index.
v0 <<= v0 # multiply by 4, the table entry size
v0 <<= v0
jump0 table
```

Basic Chip8 doesn’t have any instructions for scrolling the screen, but since sprites occupy small regions of memory it’s fairly easy to scroll their images directly. We must remain aware that the `load`

and `save`

statements both increment `i`

to the address immediately after the region of memory they operate upon, which unfortunately is not very useful most of the time.

Let’s begin with a byte-wise scroll upwards of an 8 pixel tall sprite:

```
: letter 0x18 0x18 0x34 0x24 0x7C 0x62 0xC2 0xE7
: main
loop
v1 := 7 # memory offset
i := letter
load v0
v2 := v0 # prime v2 with the first row for wraparound
loop
i := letter
i += v1
load v0 # fetch the current row
v3 := v0 # stash current row in v3
v0 := v2
i := letter
i += v1
save v0 # write the previous row to the current
v2 := v3 # current becomes previous
v1 += -1
if v1 != -1 then
again
v0 := 10
i := letter
clear
sprite v0 v0 8
again
```

This is pretty slow. Since our sprite is small and registers are not at a premium, we might try shuffling data around in larger chunks:

```
: letter 0x18 0x18 0x34 0x24 0x7C 0x62 0xC2 0xE7
: main
loop
i := letter
load v0
v7 := v0 # fetch the top row and move it to v7
load v6 # fetch the remainder of the sprite into v0-v6
i := letter
save v7 # write them all back
v0 := 10
i := letter
clear
sprite v0 v0 8
again
```

Nice! Unfortunately, this technique uses 8 registers for temporary storage, so you are likely to need to spill some data before employing it.

On the other hand, we might save ourselves quite a bit of grief by repeating our sprite data twice and indexing into it based on a rolling offset, essentially producing a “sliding window” into the underlying sprite data. This approach can be used for other kinds of animation, too:

```
: letter
0x18 0x18 0x34 0x24 0x7C 0x62 0xC2 0xE7
0x18 0x18 0x34 0x24 0x7C 0x62 0xC2 0xE7
: main
v1 := 0 # window offset
loop
v0 := 10
i := letter
i += v1
clear
sprite v0 v0 8
v1 += 1 # increment v1 modulo 8
v0 := 0b111
v1 &= v0
again
```

Scrolling horizontally requires bitshifts. Both shift instructions leave `vf`

with the bit that was shifted out. Rotating a byte left is easy, since we can simply `or`

`vf`

into the result of the shift:

```
v0 <<= v0 # shift most significant bit out left
v0 |= vF # OR it back in as the new least significant bit
```

Rotating right is slightly more complicated because we need to mask `vf`

’s 1 or 0 value into the most significant bit of the result.

```
v0 >>= v0 # shift the least significant bit out right
if vF == 1 then vF := 128
v0 |= vF # OR it back in as the new most significant bit
```

If we use an unrolled loop, we can employ both operations for an interesting effect:

```
: letter 0x18 0x18 0x34 0x24 0x7C 0x62 0xC2 0xE7
: main
loop
v2 := 0 # memory offset
loop
i := letter
i += v2
load v1 # read in two bytes
v0 <<= v0
v0 |= vF # rotate left
v1 >>= v1
if vF == 1 then vF := 128
v1 |= vF # rotate right
i := letter
i += v2
save v1 # write back two bytes
v2 += 2
if v2 != 8 then
again
clear
v0 := 10
i := letter
sprite v0 v0 8
again
```

As demonstrated here, if you have enough registers to spare you can very conveniently operate on chunks of memory at once. An entire sprite can often fit in the Chip8 register file, but you may need to spill and restore your working registers. While it may seem unintuitive coming from other architectures, consider copying sprite data from place to place for animation as an alternative to using frame offsets.

Taking full advantage of the limited set of arithmetic operations available in Chip8 is extremely important for writing efficient programs. Let’s look at some applications of operators which may not be immediately obvious.

If we `xor`

a register with `0xFF`

it inverts all the bits in the original register- this is how you perform a `not`

:

```
vf := 0xFF
v0 ^= vf
```

Any number is 0 if you `xor`

it with itself, so `xor`

is often used as an equal-to operator in assembly language. Since the conditional branches in Chip8 can test if a register is equal to a constant or register this use of `xor`

is not often necessary.

Shifts leave the shifted out bit in `vf`

. Thus, we can use a left or right shift to obtain the most or least significant bit of a byte, respectively. We can apply this idea to count the number of 1 bits in a byte:

```
v0 := 0xF3 # number to examine
v1 := 0x00 # the count
loop
while v0 != 0
v0 >>= v0
v1 += vf
again
```

Of course, if speed was important this could be calculated using a 256-entry lookup table or an unrolled loop.

Sometimes it is useful to obtain the value (1 « N); that is, a particular single-bit vector for masking or setting bits in a register. There is no Chip8 instruction for doing multiple shifts at once, so like many operations the most efficient strategy ends up being to use a lookup table:

```
: bits 1 2 4 8 16 32 64 128
...
i := bits
i += v0 # index N
load v0 # result
```

Taking the bitwise `or`

of N with N+1 will have the effect of setting the least significant (or rightmost) zero bit in the byte:

```
v1 := v0 # use v1 as a temporary copy
v0 += 1
v0 |= v1
```

Similarly, the bitwise `and`

of N with N-1 will clear the least significant (or rightmost) one bit in the byte:

```
v1 := v0
v0 += -1
v0 &= v1
```

If a (nonzero) number becomes zero after performing this operation you know it had exactly one bit set and was thus a power of two.