Updating the Z-Machine Standard Documents

The problem is partly that “2OP” refers to two things, the “opcode type” as used in sections 14 and 15, and the “operand count” as used in section 4.

But the specific problem looks to me to be 4.3.3:

In variable form, if bit 5 is 0 then the count is 2OP; if it is 1, then the count is VAR. The opcode number is given in the bottom 5 bits.

In variable form there is always a byte giving the operand types (and therefore the operand count). So I think this section should just say that in variable form, the count is VAR. Then paragraph 4.5 can be left as it is.

Alternatively, perhaps the whole section should be rewritten. It would probably be much simpler with some tables. And I don’t think a distinct “operand count” concept is helpful. The only form that doesn’t correspond neatly to an “operand count” is short form, so it just needs a small explanation for that.

The ‘opcode type’ is related to the ‘operand count’, though. It’s pretty clear that 2OP opcodes are primarily for two operands, and only in two cases do they use variable form to have more. It’s awkward.

Honestly, I tried rewriting the whole thing once. It was a huge pain, and then someone pointed out some issues, and I don’t feel like trying again.

If someone else wants to do the work, I’m fine with that, but I don’t feel like tackling a complete rethink of the description again (so much so that I’ve been avoiding even thinking about this one let alone bringing it up).

Well, putting more substantial changes aside, editing paragraph 4.3.3 will be much simpler. All it needs to say is

In variable form, the count is VAR. The opcode number is given in the bottom 5 bits.

This isn’t bad, but we lose the information about bit 5 which tells us which set of opcodes we’re looking at.

How about:

4.3

Each instruction has a form (long, short, extended or variable) and an operand count (0OP, 1OP, 2OP or VAR). If the top two bits of the opcode are $$11 the form is variable; if $$10, the form is short. If the opcode is 190 ($BE in hexadecimal) and the version is 5 or later, the form is “extended”. Otherwise, the form is “long”.

The opcodes are categorized by their operand count (see section 14); however, 2OP opcodes can be assembled in variable form, which allows them to take more than two operands (see section 4.3.3).

4.3.3

In variable form the count is VAR. However, if bit 5 is 0 then the 2OP set of opcodes is used; if it is 1, then the VAR opcodes are used. The opcode number is given in the bottom 5 bits.
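
To make the bit layout concrete, here’s a minimal sketch (Python, with illustrative names that aren’t from the Standard) of how a decoder would classify an opcode byte under that wording:

```python
# Minimal sketch of opcode classification under the proposed 4.3 / 4.3.3
# wording.  decode_form is an illustrative name, not anything from the spec.

def decode_form(opcode: int, version: int) -> tuple[str, str]:
    # $BE in V5+ is extended form; its top two bits are %10, so this check
    # takes precedence over the short-form rule.
    if opcode == 0xBE and version >= 5:
        return ("extended", "VAR")
    top = (opcode >> 6) & 0b11
    if top == 0b11:
        # Variable form: the count is VAR.  Bit 5 says which opcode set is
        # used; the bottom 5 bits give the opcode number.
        table = "2OP" if (opcode & 0b00100000) == 0 else "VAR"
        number = opcode & 0b00011111
        return ("variable", f"VAR count, {table} opcode #{number}")
    if top == 0b10:
        # Short form: bits 4-5 give the single operand type; %11 (operand
        # omitted) means 0OP, anything else means 1OP.
        return ("short", "0OP" if ((opcode >> 4) & 0b11) == 0b11 else "1OP")
    return ("long", "2OP")   # top bits %00 or %01: long form, always 2OP

# e.g. decode_form(0xE0, 5) == ("variable", "VAR count, VAR opcode #0")  # call_vs
```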


Yeah that looks pretty good.

The abbreviations table, the alphabet table, and the Unicode translation table can all be in writeable memory. What happens if a game messes with them?

For that matter, callable functions can be in writable memory, allowing one to write polymorphic code. This works on most traditional interpreters but (disappointingly) not in Parchment, which uses JIT compilation and therefore may not notice the change. (Don’t, uh, ask me how I know this.)


Yeah, the remarks for Section 5 actually mention the idea of compiling code at run time. I’m not convinced we should require interpreters to notice that the various tables have changed, though.

I’ve dynamically compiled zcode routines myself (nothing public), and altering tables during play is far easier than that.

I’d rather keep things as-is. Removing the requirement for interpreters to recognize changes to tables in dynamic memory permanently precludes any future game from writing to those tables during play (unless the author is willing to accept that the game is unlikely to be supported by standard interpreters). It may give some flexibility to terp writers, but removes an equal measure from game writers.

Parchment caches the alphabet and unicode translation tables and never checks if they’re updated. No one has ever reported this being a problem.

It decodes abbreviations on the fly, so I assume there was a game I saw that needed this.

The JIT should allow for code in dynamic memory (without caching it), but I don’t really remember testing it.

You say “keep things as-is”, but the Standard as-is neither prohibits games from changing the tables nor explicitly requires interpreters to constantly check them.

I recall that Parchment broke for me when using a PUT instruction to dynamically change the address of a CALL instruction that immediately followed it, but I don’t have code on hand at the moment to replicate this.

IMO polymorphism is most useful for (1) dynamically changing branch destinations to implement jump tables, etc. and (2) dynamically caching specific routines in low memory on systems that would otherwise have to load them from disk. Fortunately, #1 is absurd and can usually be accomplished by jumping to variables instead, and #2 is only relevant on limited platforms where polymorphism should work as expected.

If they are in dynamic memory, then clearly they can be changed.

If they aren’t allowed to be changed, then an interpreter that wants to produce a diagnostic any time a game violates the Standard (something my z-machine library aspires to do) will be forced to check the bounds of each and every table on every memory write to ensure the game doesn’t write to them. If terps can lift these tables, then writes to those memory locations become errors. This is onerous and hurts performance.
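
To put a shape on that cost: every storeb/storew would need something like the check below (a sketch with hypothetical names; `protected_ranges` would be filled in from the header addresses):

```python
# Sketch of the per-write check a diagnosing interpreter would need if the
# tables were not allowed to change.  Hypothetical names throughout.

protected_ranges: list[tuple[int, int]] = [
    # (start, length) of each table the terp has cached, e.g. the alphabet
    # table, the Unicode translation table, the abbreviations table...
]

def write_byte(memory: bytearray, addr: int, value: int) -> None:
    for start, length in protected_ranges:
        if start <= addr < start + length:
            raise RuntimeError(f"game wrote to a cached table at ${addr:04x}")
    memory[addr] = value & 0xFF
```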


The way the spec is currently written probably leans more towards them being editable, yeah. But it’s not explicit.

If they are allowed to be changed, then this hobbles an interpreter that could more efficiently read the data once and keep it in memory for quicker access (something my z-machine interpreter does). This is a minor nuisance.


I realize being able to fully cache things makes things more convenient, but is it really quicker? Everything in dynamic memory will already be in memory regardless unless the interpreter in question is quite odd indeed.

Another thing to consider is that allowing tables to be cached means either:

A) writing to any of those locations is an error and MUST be prevented

-or-

B) allowing the writes to silently succeed but not change cached behavior will have consequences if a game is saved and restored. The newly written values will be used unless you re-write the full set of tables back to dynamic memory on every save.

This can be avoided if on start-up the interpreter only ever reads the tables from the story file and never from the save file or dynamic memory. Of course behavior will vary from an interpreter which gets its cached values from the save file or after dynamic memory is populated, or from one which doesn’t cache.
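
As a sketch of that start-up-only approach (hypothetical helper names; the header offsets $34 for the alphabet table and $18 for the abbreviations table are from the Standard):

```python
# Sketch: cache the tables once from the pristine story-file image, and
# never re-read them after a restore.  Helper names are hypothetical.

def word(data: bytes, addr: int) -> int:
    return (data[addr] << 8) | data[addr + 1]

def cache_tables(story: bytes) -> dict:
    cache = {}
    alpha_addr = word(story, 0x34)       # alphabet table address (V5+)
    if alpha_addr:                       # 0 means "use the default alphabets"
        cache["alphabet"] = bytes(story[alpha_addr:alpha_addr + 78])
    cache["abbreviations_addr"] = word(story, 0x18)
    return cache

# Called once at start-up and never again, so the cached values always come
# from the original story file rather than the save file or dynamic memory.
```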

For what it’s worth, at least Infocom’s IBM interpreter seems to do the same for the alphabet and abbreviations table, and their docs imply that the abbreviations table is meant to be writable. From XZIP:

The frequent words table, pointed to by FWORDS and below ENDLOD […]

It’s required to be below ENDLOD, i.e. in writable memory.

The docs also say, of ENDLOD:

All major tables (VOCAB, OBJECT, etc.) are guaranteed to be below ENDLOD.

With, of course, “major tables” never being fully defined. Presumably the alphabet table isn’t major, given that their interpreter caches it.

Not that Infocom’s spec is the be-all-end-all as far as modern Z-machine specifications go, but at least it’s a data point.

Bocfel does the same thing as Parchment, and I’ve never seen any bug reports around it either.

Infocom IBM code for reference:


Almost certainly marginally quicker. My terp is written in Python, so the data in the tables gets converted to Python data types (strings, tuples, etc.).

Yes, okay, you’re right here. This does complicate the matter a little. It’s possible to work around this, as you point out later (interpreter only ever reads the tables from the original file or from cached versions).

Either way we go with this will require some interpreters to change behaviour, but I’m leaning more towards saying that all tables in dynamic memory can be changed, even though this is a pain for me personally.

Caching also makes it much easier to do both conversions, ZSCII to Unicode and back. Using the memory in place would be fine for one direction, but the other requires you to iterate through the array to do the reverse lookup.
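
For example (just a sketch, not anyone’s actual interpreter code), a cached copy of the Unicode translation table lets you build the reverse map once, instead of scanning the in-memory table for every character you need to convert back:

```python
# Sketch: with the Unicode translation table cached, both directions become
# simple dictionary lookups.  The three entries here are only the first few
# defaults (ä ö ü); a real table would be read from the story file.

unicode_table = [0x00E4, 0x00F6, 0x00FC]
zscii_to_uni = {155 + i: cp for i, cp in enumerate(unicode_table)}
uni_to_zscii = {cp: z for z, cp in zscii_to_uni.items()}   # the reverse map

def zscii_to_char(z: int) -> str:
    return chr(zscii_to_uni[z])

def char_to_zscii(ch: str) -> int:
    # Without the cached reverse map, this direction means iterating over
    # the table in memory for every input character.
    return uni_to_zscii[ord(ch)]
```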