Anybody here with 6502 experience?

Which means Commodore 64, Apple ][, Atari 2600, Acorn Atom, NES…inspired, of course, by this topic.

I’m working on the 6502 Å-machine interpreter, and my inexperience with 6502 assembly is really showing. I’m currently working on this block of code, which is run when entering a div:

	ldx	divsp        ; Div stack pointer
	lda	opermsb+0    ; Most significant byte of operand 0
	sta	divstk,x     ; Div stack
	lda	operlsb+0    ; Least significant byte of operand 0
	sta	divstk+1,x
	inx
	inx
	stx	divsp

	asl
	rol	opermsb+0
	asl
	rol	opermsb+0
	asl
	rol	opermsb+0
	;clc
	adc	stybase
	sta	phydata
	lda	opermsb+0
	adc	stybase+1
	sta	phydata+1

I can’t figure out what this ASL - ROL - ASL - ROL - ASL - ROL sequence is doing. I know it’s messing with the bits of the operand, but what happens when you interleave them like this?

This treats accumulator as the lowbyte and opermsb+0 as the highbyte of a 16-bit number, and shifts this number three steps to the left, i.e. multiplies it by 8.

ASL shifts a 0-bit in at the bottom of a value and shifts the top bit of the value out into the carry. ROL shifts the carry in at the bottom of a value and shifts the top bit out into the carry.

At the end, if we’re sure the original number was small enough that we can’t get an overflow (i.e. the result can’t be bigger than what fits in 16 bits), we know that the carry is clear, thus we can skip clc before the addition.

6 Likes

Since multiplication isn’t supported in hardware on 6502, one will typically use shift operations instead, when speed is important and one of the factors is known in advance. E.g. an entry in the object table is 14 bytes long in z5. Instead of calling a generic multiplication routine with the value and 14 as the parameters, you can:

  • shift the value three times (Now we have the original value x8)
  • subtract the original value from this (Now we have the original value x7)
  • shift the value one more time (Now we have the original value x14)
6 Likes

Ahh, that makes sense! This interpreter stores style class information in structs seven bytes long; I’m guessing those structs are padded out to eight bytes for simplicity, so this is taking the index of a struct, multiplying it by eight (23), then adding it to stybase (start of the struct array?) and putting the 16-bit result in phydata (start of the current struct?).

2 Likes

Yes, it performs:
phydata = stybase + 8 * oper0
(where oper0 consists of an LSB (least significant byte) and MSB (most significant byte), which are stored in two different arrays, because it’s faster to access them that way)

1 Like

Not much to add to what Fredrik wrote, but regarding the structure padding, a non uncommon way to optimize things on 6502 is to unpack “arrays of structs” into “structs of arrays”.

Say you have an array with 50 elements that each contain a 16 bit pointer, and two 8 bit variables, instead of storing that as 50 times 4 bytes, you could have 4 arrays of 50 elements:

  • Low byte of the pointer
  • High byte of the pointer
  • first 8 bit variable
  • second 8 bit variable

This way you can access any element just with X or Y indexed access without multiplying.
Obviously that’s only useful if you are going to have no more than 256 elements, and if all the sizes are known at compile time.

3 Likes

It’s a common technique for modern platforms too!

I think once I finally wrap my head around all the opcode mnemonics, I’ll get a lot faster at all of this. I have the advantage of experience with assembly on other systems (though generally ones with a lot of registers, like MIPS or the Z-machine); having only A, X, and Y is a bit of a change! Thankfully Linus commented his 6502 assembly a lot more than his C.

I felt very smart when I finally figured out the purpose of BIT yesterday. The C64 Å-machine interpreter has a “register” (is there a better short name for “byte in the zero page”?) called stflag that records whether it’s currently in a status area: $00 for nothing, $FF for the top status bar (supported), $80 for the inline status bar (not supported). It took me ages to figure out why these values specifically—it’s because they can both be created without any immediates (decrement $00 to get $FF, rotate $00 with the carry flag set to get $80), and can then be tested with BIT. N will be set if any status bar is active, and V will be set if it’s a supported one.

2 Likes

Just wait until you start seeing !byte $2C declarations in the code.

It’s a common 6502 idiom that allows you to skip the next two bytes, which is enough to skip an immediate load or ZP operation.

z_rfalse
   ldx #0
   !byte $2C
z_true
   ldx #1
   ; common code here...

(2C is the opcode for BIT absolute-address; as you’ve seen it’s an instruction that doesn’t modify any registers, although it does affect flags)

3 Likes

That is diabolical! I guess I should start paying attention to how many bytes different instructions take…

Size optimization on the 6502 is a compounding thing: Since relative branches are encoded on a single byte you can only go that far forward or backward, so as soon as a routine grows you start having to use JMP instead of Bcc… which in turn makes the code larger so you end up doing inverted tests to branch over a JMP that can actually go where you want to go.

So every time you can reduce by one byte here, one byte there, you increase the change you can turn a JMP into a Bcc (by ensure state flags are stable of course), which in turn can be a large enough change that you can convert some more, remove the inverted branches, etc…

You also often see things like the RTS statement is actually in the middle of the routine so the exit can be reached relatively from any point in the routine.

3 Likes

Global variables? :upside_down_face:

1 Like

I tend to say “zp something” like “zp entry”, “zp var” or “zp pointer”

1 Like