You might also want to look at how Inform does things, since it’s optimized for the Z-machine but has had the benefit of testing and iteration instead of being locked into its first design.
Briefly: each dictionary word is attached to a block of two bytes. The first contains various flags (is it plural? can it be a noun?), the second is the “verb number” (or zero if it can’t be a verb). The parser takes the first word, looks up its verb number, and uses that to index into the table of grammar lines. Everything else is done with the indices of the actual words in the dictionary—each object has a property containing a bunch of word indices that can refer to it, for example.
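A hypothetical C sketch of that two-byte payload (the names are mine, and a real dictionary entry also stores the encoded word text, which I’ve omitted):

```c
#include <stdint.h>
#include <stddef.h>

/* Flag bits in the first data byte (illustrative values, not Inform's) */
enum { DICT_NOUN = 0x01, DICT_PLURAL = 0x02 };

struct dict_entry {
    /* ...encoded word text would precede this... */
    uint8_t flags;    /* noun? plural? etc. */
    uint8_t verb_num; /* 0 = can't be a verb; otherwise indexes the grammar */
};

/* One grammar entry per verb; each verb owns a list of grammar lines. */
struct grammar_line { int action; /* token patterns elided */ };
struct verb_grammar { int num_lines; const struct grammar_line *lines; };

/* Return the grammar for the command's first word, or NULL if it
 * can't start a command. */
static const struct verb_grammar *
grammar_for(const struct dict_entry *word, const struct verb_grammar *verbs)
{
    if (word->verb_num == 0)
        return NULL;
    return &verbs[word->verb_num];
}
```

The point is that the verb lookup is one byte fetch and one table index; everything after that works on word indices, never on the word text itself.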
Typical interpreters I’ve worked on take about 20 instructions per virtual opcode, but they were more modern, purely stack-based designs, like those for Java or C#. For a virtual machine like the Z-machine, which has a lot of type information either implicit (2OP opcodes from 0x00 to 0x7F) or explicit (VAR opcodes), that might be a bit low.
Typical 6502 instructions run in probably 4-5 cycles, so that’s 100 machine cycles per virtual opcode. Why is that information still available off the top of my head? Sigh.
Say that scanning an element of a grammar table and rejecting it (the common case) takes ten Z machine opcodes.
So that’s 1000 machine cycles per grammar line.
So if most commands have to run through 50 grammar lines, that’s about 50,000 machine cycles, or roughly 1/20th of a second at a ~1 MHz clock, just to identify the correct grammar rule. There are probably at least that many cycles again to filter out appropriate objects, and that many again, if not more, to run the rest of the logic.
So I’m not convinced using a verb index to short-circuit the grammar table traversal is going to really add up to that much savings, but there could be some gross misestimations in my math above.
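Spelling the arithmetic out (every number here is one of the guesses from above):

```c
/* Back-of-envelope cost model for scanning the grammar table. */
enum {
    CYCLES_PER_6502_INSN = 5,   /* guessed average */
    INSNS_PER_Z_OPCODE   = 20,  /* host instructions per virtual opcode */
    OPCODES_PER_LINE     = 10,  /* scan-and-reject one grammar line */
    LINES_PER_COMMAND    = 50
};

static long parse_cycles(void)
{
    long per_opcode = CYCLES_PER_6502_INSN * INSNS_PER_Z_OPCODE; /* 100 */
    long per_line   = per_opcode * OPCODES_PER_LINE;             /* 1,000 */
    return per_line * LINES_PER_COMMAND;                         /* 50,000 */
}
```

At a ~1 MHz clock, 50,000 cycles is about 0.05 seconds, so a wrong guess in any one factor shifts the result linearly.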
For my own silly reference interpreter, written in C++, I would fully decode all operands and then do a switch (jump table) on the opcode. I imagine the interpreters from the 80’s did something similar because they’d be really worried about code size.
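A toy version of that decode-then-dispatch loop (these are not real Z-machine encodings; the switch typically compiles to a jump table):

```c
#include <stdint.h>

/* Toy opcodes -- not the real Z-machine encoding */
enum { OP_ADD = 0, OP_SUB = 1, OP_HALT = 2 };

/* Execute a little program of (opcode, operand) pairs; returns the
 * accumulator when OP_HALT is reached. */
static int run(const uint8_t *code)
{
    int acc = 0;
    for (;;) {
        uint8_t op = *code++;
        if (op == OP_HALT)
            return acc;
        /* Operands are fully decoded before dispatch... */
        int a = *code++;
        /* ...then one switch on the opcode does the work. */
        switch (op) {
        case OP_ADD: acc += a; break;
        case OP_SUB: acc -= a; break;
        }
    }
}
```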
I guess I could just run a cycle-accurate emulator like VICE on my laptop with Ozmoo to get a feel for what the actual performance tradeoffs are once I get that far.
Ozmoo on a PAL C64 with REU can typically process about 1500-2000 Z-code instructions per second, i.e. each instruction takes 500-670 clock cycles on average. This includes the time taken up by the Kernal interrupt and by the VIC chip “badlines” - i.e. Ozmoo can use maybe ~93% of the CPU cycles, so it’s actually more like 465-620 cycles per Z-code instruction. Some instructions are heavier than others, of course. A Z-code function call costs about 1500 cycles IIRC, as Ozmoo needs to close one stack frame and start a new one. Multiplication and division are quite heavy too, since there’s no hardware support for them.
Decoding an instruction is quite heavy, and so is reading the arguments. There can be 0-8 arguments. Each argument can be an 8-bit constant, a 16-bit constant or a variable (global, local, or the top value on the stack). If it’s a variable, you need to fetch the value of the variable.
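The Z-machine’s 2-bit operand type codes make that concrete. A simplified C sketch (the stack, locals and globals are reduced to bare arrays, and the names are mine):

```c
#include <stdint.h>

enum {
    OPERAND_LARGE = 0, /* 16-bit constant: read two bytes */
    OPERAND_SMALL = 1, /* 8-bit constant: read one byte */
    OPERAND_VAR   = 2, /* read one byte naming a variable, then fetch it */
    OPERAND_OMIT  = 3  /* no operand */
};

/* Simplified machine state. */
struct zstate {
    uint16_t stack[64]; int sp;
    uint16_t locals[16];
    uint16_t globals[240];
};

/* Fetch a variable: 0 pops the stack, 1-15 are locals, 16+ are globals. */
static uint16_t read_var(struct zstate *z, uint8_t var)
{
    if (var == 0)  return z->stack[--z->sp];
    if (var < 16)  return z->locals[var - 1];
    return z->globals[var - 16];
}

/* Decode one operand, advancing *pc past the bytes it consumed. */
static uint16_t decode_operand(struct zstate *z, const uint8_t **pc, int type)
{
    switch (type) {
    case OPERAND_LARGE: {
        uint16_t v = (uint16_t)((*pc)[0] << 8 | (*pc)[1]);
        *pc += 2;
        return v;
    }
    case OPERAND_SMALL:
        return *(*pc)++;
    case OPERAND_VAR:
        return read_var(z, *(*pc)++);
    default:
        return 0; /* omitted */
    }
}
```

The variable case is the expensive one: it’s an extra fetch plus a three-way classification, for every such operand of every instruction.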
Occasionally, and quite often when making a function call or returning from one, execution continues in a new virtual memory block (each block is 512 bytes), which means Ozmoo needs to find the block in its vmem cache, or read it from REU (or disk if there’s no REU). This is also included in the execution time estimates above.
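I don’t know Ozmoo’s actual cache layout, but the hit path of a vmem lookup like that can be sketched as follows (slot count and names are hypothetical; only the miss path pays the REU or disk cost):

```c
#include <stdint.h>
#include <stddef.h>

#define BLOCK_SIZE  512
#define CACHE_SLOTS 4   /* tiny for illustration; a real cache is larger */

struct vmem_cache {
    uint16_t block_no[CACHE_SLOTS]; /* which virtual block each slot holds */
    uint8_t  valid[CACHE_SLOTS];
    uint8_t  data[CACHE_SLOTS][BLOCK_SIZE];
};

/* Find the cached byte at story address addr, or return NULL
 * (the caller would then fetch the block from REU or disk). */
static uint8_t *vmem_find(struct vmem_cache *c, uint32_t addr)
{
    uint16_t block = (uint16_t)(addr / BLOCK_SIZE); /* i.e. addr >> 9 */
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (c->valid[i] && c->block_no[i] == block)
            return &c->data[i][addr % BLOCK_SIZE];
    return NULL; /* miss: copy from REU (fast) or read from disk (slow) */
}
```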
An instruction on the 6510 CPU takes 2-7 clock cycles. On average I’d guesstimate that it executes about 100 CPU instructions for each virtual instruction.
If you have Ozmoo on your local machine, you can uncomment this line (line ~106) in make.rb: # 'PRINTSPEED'
This will make Ozmoo print the number of Z-code instructions executed every second.
This is indeed what Ozmoo does. To perform the jump, it retrieves the address of the opcode routine from a table, pushes it onto the stack, and performs the RTS instruction (Return from subroutine) - this is faster than modifying and performing a JMP instruction.
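For readers more comfortable in C: the closest portable analog to that table-plus-RTS dispatch is an indexed call through a function-pointer table, rather than patching a jump instruction in place:

```c
#include <stdint.h>

/* Each opcode handler updates the machine state; here just an accumulator. */
typedef void (*op_fn)(int *acc);

static void op_inc(int *acc) { (*acc)++; }
static void op_dbl(int *acc) { *acc *= 2; }

/* The dispatch table: analogous to the 6502 address table whose entry
 * gets pushed onto the stack before RTS. */
static const op_fn dispatch[] = { op_inc, op_dbl };

static int run_ops(const uint8_t *ops, int n, int acc)
{
    for (int i = 0; i < n; i++)
        dispatch[ops[i]](&acc); /* indexed indirect jump */
    return acc;
}
```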
Benchmarks indicate that Ozmoo running a z3 game on C64 with REU is more than twice as fast as Infocom’s z3 interpreter (also running on C64, with REU).
Okay, so my estimates were off by about an order of magnitude. Yikes!
Thanks for contributing, Fredrik!
It sounds like I massively overestimated Z-machine performance back in the day. I certainly remember often waiting a second or two playing Zork or Infidel on my 64 KB Apple II clone.
My current parser is driven by a linear table that is searched from top to bottom for the first match, at which point that command either succeeds or fails.
// Parse a word that is here but not held
routine Here [] {
    ExcludeObject = Player;
    return ParseCommon(Player parent);
}

// Parse a word that is held
routine Held [] {
    ExcludeObject = 0;
    return ParseCommon(Player);
}

// Parse a word here or held
routine Nearby [] {
    ExcludeObject = 0;
    return ParseCommon(Player parent);
}
action #inv { 'inventory'/'take inventory': Inventory }
action #take { 'get'/'take'/'pick up' Here: Take }
action #drop { 'drop'/'put down' Held: Drop }
action #look { 'look at'/'examine' Nearby: Examine }
action #lookHere { 'look': Look }
action #puton { 'hang'/'place'/'put' Held 'on'/'upon'/'on to'/'onto' Here: PutOn }
action #go { 'go'/'go to' Direction: Travel }
action #restart { 'restart'/'restart game': [] { if (istruth Yes()) restart; rfalse; } }
action #quit { 'quit'/'quit game': [] { if (istruth Yes()) quit; rfalse; } }
#if not RELEASE
action #traceOn { '#trace on' : [] { $tracebits = 1; rfalse; } }
action #traceOff { '#trace off' : [] { $tracebits = 0; rfalse; } }
#endif
global ActionRoutine;
global DirectObject;
global IndirectObject;
routine ScanParseTable [; n na nw matched] {
    na = 0;
    DirectObject = 0;
    IndirectObject = 0;
    while ($actions[[na]] isn't -1) {
        nw = 1;
        matched = 0;
        repeat {
            repeat {
                n = $actions[[na]];
                ++na;
                if (n &= hasSecondWord) {
                    if (n & wordMask is parse_buffer[[nw]] and $actions[[na]] & wordMask is parse_buffer[[nw+2]])
                        matched = 2;
                    n = $actions[[na]];
                    ++na;
                }
                else {
                    if (n & wordMask is parse_buffer[[nw]])
                        matched = 1;
                }
                if (n &= isRoutine) {
                    if (matched isn't 0) {
                        nw = nw + matched + matched;
                        matched = $actions[[na]];
                        if (DirectObject isn't zero) {
                            IndirectObject = call matched();
                            if (isfalse IndirectObject)
                                rfalse;
                        }
                        else {
                            DirectObject = call matched();
                            if (isfalse DirectObject)
                                rfalse;
                        }
                    }
                    ++na;
                }
            } while (n & (isRoutine | isEnd) is 0);
            if (n &= isEnd) {
                ActionRoutine = $actions[[na]];
                ++na;
                if (istruth matched)
                    rtrue;
            }
            matched = 0;
        } while (n >= 0);
    }
    print "I didn't understand that.^";
    rfalse;
}
That parse table is from my version of Cloak of Darkness, so obviously any real game is going to have a significantly longer parse table, and I can see that getting pretty slow. There are weird cases where both the Drop and PutOn actions could be triggered by an initial word of “put”, but I imagine that most commonly, the dictionary payload of a word could indicate the matching action unambiguously.
We observe in Zork 1 (now that I have the grammar table displayed!) that each initial verb has one to twelve grammar lines, with an average of 1.95. (LOOK is the long pole in the tent.)
Iterating through the lines is usually quick, because most of them have a preposition as the second word and you can reject mismatches right away.
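That cheap rejection test might look like this in C (the layout is hypothetical; the point is that a line with the wrong preposition costs a single comparison):

```c
#include <stdint.h>

/* A grammar line keyed by its preposition word index (0 = no preposition).
 * Hypothetical layout for illustration. */
struct gline { uint16_t prep_word; int action; };

/* Return the action of the first line whose preposition matches the
 * command's second word (0 = no second word), or -1 if none match. */
static int match_lines(const struct gline *lines, int n, uint16_t second_word)
{
    for (int i = 0; i < n; i++) {
        if (lines[i].prep_word != second_word)
            continue;               /* cheap rejection: one comparison */
        return lines[i].action;     /* only now do expensive noun matching */
    }
    return -1;
}
```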
Infocom games on typical 8-bit machines had to wait for disk access from time to time. On a computer with ~62 KB of RAM that the interpreter can use (like the C64), that would happen when you tried a new action, or entered a new location, and the routines etc. weren't already in RAM. On machines with a lot less RAM, the interpreter could be forced to swap on pretty much every turn. If you played on a C64 or C128 with an REU (extra RAM cartridge), it would get the data from the cartridge instead, which was of course a lot faster.
Infocom’s initial interpreter for C64 didn’t even bother to use the last 12 KB of RAM, since that RAM is harder to use. They replaced this interpreter with a better one after several years (1985 maybe?).
I don’t know anything about Infocom’s parser and “library”, but at least for PunyInform, one of the heaviest tasks is to parse a noun phrase. E.g. “drop lamp, cage and emerald” in Adventure typically takes several seconds to parse on a C64. If the player is standing in a room where they keep stuff, there might be 10 objects in the room, maybe ten more held by the player, and maybe five scenery objects. That means the parser has to try to match “lamp” to 25 different objects, each of which may have lots of synonyms, or a parse_name routine. Then it has to do the same thing for “cage”, and then “emerald”. Then of course, there’s quite a bit of processing when performing the action as well, as all objects in scope get to have a say whether the action should be performed, and then what it should print, and decide if anything else should happen afterward.
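Multiplying that out with some guessed numbers shows why the noun phrase is the expensive part:

```c
/* Rough cost model for parsing "drop lamp, cage and emerald".
 * All counts are illustrative guesses. */
enum {
    OBJECTS_IN_SCOPE    = 25, /* room + inventory + scenery */
    SYNONYMS_PER_OBJECT = 4,  /* guessed average */
    NOUNS_IN_COMMAND    = 3   /* lamp, cage, emerald */
};

static int word_comparisons(void)
{
    /* Each noun phrase is tried against every synonym of every object
     * in scope, so the cost is the product of all three counts. */
    return NOUNS_IN_COMMAND * OBJECTS_IN_SCOPE * SYNONYMS_PER_OBJECT;
}
```

Three hundred dictionary-word comparisons, before any parse_name routines run, and each comparison is itself non-trivial on a 6502.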