I realize I’m probably better off on a different forum site dedicated to retrocomputing, but that feels like starting over.
I’m working on a new interpreter, written from scratch, for the Apple 2. I know there are others out there, and Ozmoo exists. I wanted to write one myself for the same reason I wrote an interpreter in C++ from scratch, and an IF development language from scratch: they’re interesting problems, and I enjoy programming.
Anyway, this interpreter has been designed from the ground up to be as fast as possible, and as such, I have it filling memory at startup (Apple 2 disks are 140k, which is enough to hold any V3 story plus the interpreter) and using extended memory.
On the Apple ][+, the most common memory expansion is a Saturn memory board. It essentially looks like eight language cards, and at 128k it’s large enough that I can efficiently support any V3 game. It’s pretty cool, because I only need to recompute virtual mappings when the program counter crosses the $BFFF or $FFFF boundary. (Well, and on branches and jumps.)
The problem is the Apple ][e. It has a really bizarre memory setup: RAM falls into two broad classes, and each class can independently come from “main” or “aux” memory:
- Everything between $200 and $BFFF
- Everything between $000-$1FF and $D000-$FFFF
$C000-$CFFF on the Apple 2 is reserved for I/O cards etc. and is never RAM. The Apple 2e does emulate the language card from the Apple 2+, but not the same way Saturn memory does. Essentially the language card region can be “main” memory from $D000-$FFFF or “aux” memory from $D000-$FFFF, which falls into the second class above.
It would have been really nice if the memory between $000-$1FF was managed separately, but that’s not the world we live in.
For each of the ranges above, we can control whether memory reads come from main or aux memory, and we can control whether memory writes go to main or aux memory.
Finally, on Apple 2e, the standard way to extend memory past 128k is to support additional “aux” memory banks. You can control which of them is active, versus the standard base aux bank.
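The read/write control above corresponds to the IIe soft switches (RAMRD/RAMWRT for $0200-$BFFF reads/writes, ALTZP for the zero page/stack/language-card class). Here’s a minimal C model of the read side as I understand the rules — the types and function are my own sketch for illustration, not hardware documentation:

```c
#include <stdint.h>
#include <stdbool.h>

/* Sketch of the IIe read-banking rules described above (my reading,
 * not authoritative). ALTZP covers $0000-$01FF and $D000-$FFFF;
 * RAMRD covers reads in $0200-$BFFF; $C000-$CFFF is I/O, never RAM. */
typedef struct { bool altzp; bool ramrd; } SoftSwitches;

/* Returns true if a read at addr would come from aux memory. */
bool read_is_aux(SoftSwitches s, uint16_t addr) {
    if (addr <= 0x01FF) return s.altzp;   /* zero page + stack class */
    if (addr >= 0xD000) return s.altzp;   /* language card class */
    if (addr >= 0xC000) return false;     /* I/O space, never RAM */
    return s.ramrd;                       /* $0200-$BFFF */
}
```

The annoying coupling the post describes falls straight out of this: you can’t flip the $D000-$FFFF region to aux without zero page and the stack coming along for the ride.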
Currently, my interpreter code lives between $800 and $1FFF, and $2000 to $BFFF is meant for 40k of dynamic memory (and, currently, static memory too, but that may change if I find V3 stories that need more than 40k dynamic+static memory).
On the Apple 2+, high memory ends up being whatever fits in the first 40k, along with enough 12k banks of “language card” memory to hold the rest of the story. Any time the instruction pointer changes (due to a call, jump, or branch), I do some quick comparisons and make the appropriate 16k language card bank resident. This works fine and is extremely fast.
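To make that concrete, here’s the address math I’d expect, sketched in C — assuming the first 40k of the story sits at $2000-$BFFF and each further 12k chunk becomes one bank visible at $D000-$FFFF (the function and struct names are illustrative, not from the actual interpreter):

```c
#include <stdint.h>

/* Hypothetical 2+ layout, per my reading of the post: story bytes
 * below 40K live in main RAM at $2000; everything past that is
 * carved into 12K chunks, each mapped at $D000 when its bank is
 * switched in. */
#define LOW_STORY  (40u * 1024u)   /* bytes resident at $2000.. */
#define BANK_SIZE  (12u * 1024u)   /* $D000-$FFFF window */

typedef struct { int bank; uint16_t addr; } Mapping; /* bank -1 = main */

Mapping map_story_addr(uint32_t story) {
    Mapping m;
    if (story < LOW_STORY) {
        m.bank = -1;                          /* always resident */
        m.addr = (uint16_t)(0x2000 + story);
    } else {
        uint32_t hi = story - LOW_STORY;
        m.bank = (int)(hi / BANK_SIZE);       /* which card bank */
        m.addr = (uint16_t)(0xD000 + hi % BANK_SIZE);
    }
    return m;
}
```

The divide looks expensive, but since the bank only has to be re-derived on calls, jumps, and branches, a few comparisons against precomputed bank boundaries do the same job on the 6502.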
The problem is how to do this on the Apple 2e.
Fortunately I only ever need to read high memory, not write it (except during boot, of course). My interpreter mostly uses zero page for variables, for obvious reasons, but there are reads and writes to $800-$1FFF sometimes as well (stack, shadowed globals, etc.).
So I feel like the correct way forward here is to only bank $000-$1FF and $D000-$FFFF.
But… zero page and the stack are included there, which is a real problem.
A simple solution would be to just treat high memory like a RAM disk. The story would always be loaded into aux banks 1+, and any time I have a page miss, I’d copy pages from aux memory into main high memory. I’m not crazy about that because it’s a lot of extra complexity I didn’t need on the Apple 2+.
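For what the RAM-disk approach would look like, here’s a rough C sketch with a direct-mapped page cache standing in for main high memory — the sizes, names, and the aux-copy stand-in are all assumptions for illustration, not the real implementation:

```c
#include <stdint.h>
#include <string.h>

/* Sketch of the "RAM disk" fallback: the story lives only in aux
 * banks, and a page miss copies one 256-byte page into a cache in
 * main high memory. Direct-mapped because it's the cheapest policy. */
#define PAGE_SIZE  256
#define NUM_SLOTS  48              /* e.g. 12K of cache at $D000 */

static uint8_t aux_store[64u * 1024u];  /* stand-in for aux banks */
static uint8_t cache[NUM_SLOTS][PAGE_SIZE];
static int     slot_page[NUM_SLOTS];    /* story page per slot, -1 = empty */

void cache_init(void) {
    for (int i = 0; i < NUM_SLOTS; i++) slot_page[i] = -1;
}

/* Stand-in for the real aux-to-main copy on a IIe. */
static void copy_page_from_aux(int page, uint8_t *dst) {
    memcpy(dst, aux_store + (uint32_t)page * PAGE_SIZE, PAGE_SIZE);
}

uint8_t read_story_byte(uint32_t addr) {
    int page = (int)(addr / PAGE_SIZE);
    int slot = page % NUM_SLOTS;
    if (slot_page[slot] != page) {          /* the page miss */
        copy_page_from_aux(page, cache[slot]);
        slot_page[slot] = page;
    }
    return cache[slot][addr % PAGE_SIZE];
}
```

Since high memory is read-only after boot, there’s no write-back to worry about; the real cost is the per-read page lookup and the 256-byte copies on misses.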
On the other hand, if I kept the aux memory active all the time, I wouldn’t have to copy high memory around, but then I’d have to do something about zero page and the stack being out of sync. Any time I switched between main and aux banks, I’d need to copy zero page variables over, which sounds really fragile. (And, of course, I’d have to be really careful with the stack.)
We need to read high memory in only a few situations though:
- When decoding the current instruction (including branches)
- When dealing with inline print and print_ret, or print_paddr
So maybe it wouldn’t be too bad to bank in aux memory only long enough to do the read? Right now the code that reads the next instruction byte is a macro: it loads the data through zero page, then increments a zero page location, and if that overflows, calls a helper to do more work. But I can’t really use zero page here, because once the aux bank is active it no longer holds the correct data.
On the 2e, I guess I could make the macro always call a function. The next address could then live in self-modifying code (lda $1234), which would avoid the whole zero page mess. If the instruction pointer is in the first 40k, the function can just be that load, plus wrapping the address properly. If the pointer is past that, the entire function can be rewritten to bank in the correct page first, then do the load, then bank it back out again (just a few instructions). Code that handles inline strings could be special-cased to avoid constantly banking.
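In C terms, the rewritten-function idea amounts to swapping a function pointer whenever the PC moves, so straight-line fetches stay on the cheap path. Everything below is illustrative (on the real machine it would be self-modifying 6502 code, and the banked path would flip soft switches around an absolute load):

```c
#include <stdint.h>

#define LOW_STORY (40u * 1024u)

/* Stand-ins for the two physical stores. */
static uint8_t main_mem[LOW_STORY];     /* story bytes < 40K */
static uint8_t aux_mem[64u * 1024u];    /* story bytes >= 40K */

static uint32_t pc;
static uint8_t (*fetch_fn)(void);       /* the "rewritten" routine */

static uint8_t fetch_low(void)    { return main_mem[pc++]; }
static uint8_t fetch_banked(void) { return aux_mem[(pc++) - LOW_STORY]; }

/* Re-pick the fetch routine only on calls/jumps/branches -- the same
 * trick as the 2+ bank selection. (A real version would also need the
 * helper that handles pc walking across the 40k boundary.) */
void set_pc(uint32_t new_pc) {
    pc = new_pc;
    fetch_fn = (new_pc < LOW_STORY) ? fetch_low : fetch_banked;
}

uint8_t fetch_next(void) { return fetch_fn(); }
```

The nice property is that the common case (PC in the low 40k) pays only an indirect call over the current macro, while all the banking cost is confined to the slow path.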
So… maybe that’s the way forward.
Thank you for listening to my TED talk, I guess.
-Dave