Z-Machine instruction encoding

In trying to clarify confusion in the Z-Machine Standard, I’ve noticed that the information for encoding instructions contains the following:

4.3
Each instruction has a form (long, short, extended or variable) and an operand count (0OP, 1OP, 2OP or VAR)…

4.3.3
In variable form, if bit 5 is 0 then the count is 2OP; if it is 1, then the count is VAR…

4.5
The operands are given next. Operand counts of 0OP, 1OP or 2OP require 0, 1 or 2 operands to be given, respectively. If the count is VAR, there must be as many operands as there were types other than ‘omitted’.

This seems to me to say that 2OP opcodes encoded in variable form can still only have two operands, which isn’t true. Am I misreading this? Did I miss something that makes it make sense elsewhere?

I don’t think you missed anything; the Standard just doesn’t explain this very well. It’s doubly misleading because the table of instructions shows je as taking only two operands, when in fact it can take four in variable form.

Infocom’s ZIP documentation frames it more helpfully.

The zip file link is dead, but the file can still be found. I can’t post links, but it is on the Wayback Machine (web.archive.org) under the timestamp 20110718131227.

I think this is relevant to this topic, specifically

In variable form, if bit 5 is 0 then the count is 2OP; if it is 1, then the count is VAR

“the count is 2OP”, I understand, but what type are the two operands? It seems that really this note can just be ignored, as variable form 2OPs can have more than 2 operands, and you just read the next byte as you would with any other var form?

Or is it implied that they’re large, since otherwise you would use the standard long form? (What if you want large, variable?) Or is it just a hint that the following type byte will be something like aabb1111, i.e. two operands, so I only have to check the first 4 bits to discover the types? The frotz implementation does not seem to check this case specifically (or I misunderstood it), so I think bit 5 is just an optimization hint?

The linked zip didn’t really clear it up for me. It talks about EXT format, which I think is VAR in the 1.1 Standard, and EXTOP, which would be EXT in 1.1, but its discussion doesn’t mention bit 5.

gitlab DavidGriffith frotz src common process.c lines 275-287
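One reading of sections 4.3 and 4.3.3 is that bit 5 only says which opcode table the number belongs to, not how many operands follow. A minimal Python sketch of that reading, assuming a V5+ story file (where byte 0xBE introduces extended form); the name `classify` is made up for illustration and is not taken from frotz or any other interpreter:

```python
def classify(opcode_byte: int) -> tuple[str, str, int]:
    """Return (form, operand count, opcode number) for an instruction's first byte.

    Assumes V5+, where byte 0xBE marks extended form and the opcode
    number itself sits in the following byte (reported here as -1).
    """
    if opcode_byte == 0xBE:
        return ("extended", "VAR", -1)
    top = opcode_byte >> 6
    if top == 0b11:
        # Variable form: bit 5 picks the 2OP or the VAR opcode table.
        count = "VAR" if opcode_byte & 0x20 else "2OP"
        return ("variable", count, opcode_byte & 0x1F)
    if top == 0b10:
        # Short form: bits 4-5 hold the single operand's type; 11 means none.
        count = "0OP" if (opcode_byte >> 4) & 0b11 == 0b11 else "1OP"
        return ("short", count, opcode_byte & 0x0F)
    # Long form is always 2OP.
    return ("long", "2OP", opcode_byte & 0x1F)


print(classify(0xD2))  # variable-form get_prop_addr
print(classify(0x12))  # long-form get_prop_addr
print(classify(0xF9))  # call_vn
```

Nothing in this step looks at the operand count; in variable form the number of operands comes only from the type byte that follows.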

I have this comment on VAR:

            ' Each pair of bits in next 1 or 2 bytes (XCALL & IXCALL) determine operand type
            ' 00 = long immediate (2 bytes)
            ' 01 = immediate (1 byte 00xx)
            ' 10 = variable (1 byte)
            ' 11 = no more operands
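
The same table can be written as a short Python sketch. The function name and the type labels are made up for illustration; it accepts one type byte, or two for the eight-operand calls, and stops at the first 11 field:

```python
TYPE_NAMES = {0b00: "large", 0b01: "small", 0b10: "variable"}


def operand_types(type_bytes: bytes) -> list[str]:
    """Decode one or two operand-type bytes, stopping at the first 'omitted' (11)."""
    kinds = []
    for byte in type_bytes:
        # Fields are read from the most significant pair of bits down.
        for shift in (6, 4, 2, 0):
            field = (byte >> shift) & 0b11
            if field == 0b11:
                return kinds
            kinds.append(TYPE_NAMES[field])
    return kinds


print(operand_types(bytes([0x27])))        # type byte 00 10 01 11
print(operand_types(bytes([0x26, 0xBF])))  # two type bytes, as for XCALL/IXCALL
```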

Using the excellent tool zd to illustrate:

Ordinary VAR (0-4 operands):
┌────────────────────────────────────────────────────────────────────┐
│4f1f: call_vn - VAR                                                 │
└────────────────────────────────────────────────────────────────────┘
 ┌─ Variable form
 │  ┌─ VAR
 │  │     ┌─ @call_vn
┌┴┐ │ ┌───┴───┐
1 1 1 1 1 0 0 1 

Operand types:
 ┌─ Large constant
 │   ┌─ Variable
 │   │   ┌─ Small constant
 │   │   │   ┌─ Omitted
┌┴┐ ┌┴┐ ┌┴┐ ┌┴┐
0 0 1 0 0 1 1 1 

Operand 0: 00011001 00001011 (6411)
Operand 1: 00000001 (L00)
Operand 2: 00010110 (22)

XCALL & IXCALL (0-8 operands):
┌────────────────────────────────────────────────────────────────────┐
│18c12: call_vn2 - VAR                                               │
└────────────────────────────────────────────────────────────────────┘
 ┌─ Variable form
 │  ┌─ VAR
 │  │     ┌─ @call_vn2
┌┴┐ │ ┌───┴───┐
1 1 1 1 1 0 1 0 

Operand types:
 ┌─ Large constant
 │   ┌─ Variable
 │   │   ┌─ Small constant
 │   │   │   ┌─ Variable
┌┴┐ ┌┴┐ ┌┴┐ ┌┴┐
0 0 1 0 0 1 1 0 

Operand types:
 ┌─ Variable
 │   ┌─ Omitted
 │   │   ┌─ Omitted
 │   │   │   ┌─ Omitted
┌┴┐ ┌┴┐ ┌┴┐ ┌┴┐
1 0 1 1 1 1 1 1 

Operand 0: 00110000 10001111 (12431)
Operand 1: 00001000 (L07)
Operand 2: 01000000 (64)
Operand 3: 00000100 (L03)
Operand 4: 00000101 (L04)

Seems that the bit can be ignored, assuming zd would explode if it were illegal (?).

For example, with 12, 32, 52 or 72 (hex) being the “literal 2OP form” of get_prop_addr (one for each operand type permutation), and d2 being the “VAR form” with “bit 5 = 0 then the count is 2OP”, zd will happily read one operand:

zd -v --verbose -b "d27F0100"
Beginning disassembly at 0000
-----------------------------
0000: get_prop_addr #01 -> -(SP)
0004: quit

--------------------------------------------------
┌────────────────────────────────────────────────────────────────────┐
│0: get_prop_addr - 2OP (main routine 0)                             │
└────────────────────────────────────────────────────────────────────┘
 ┌─ Variable form
 │  ┌─ 2OP
 │  │     ┌─ @get_prop_addr
┌┴┐ │ ┌───┴───┐
1 1 0 1 0 0 1 0

Operand types:
 ┌─ Small constant
 │   ┌─ Omitted
 │   │   ┌─ Omitted
 │   │   │   ┌─ Omitted
┌┴┐ ┌┴┐ ┌┴┐ ┌┴┐
0 1 1 1 1 1 1 1

Operand 0: 00000001 (1)

Store: 00000000 (0) -> -(SP)
┌────────────────────────────────────────────────────────────────────┐
│4: quit - 0OP                                                       │
└────────────────────────────────────────────────────────────────────┘
 ┌─ Short form
 │   ┌─ Operand type: Omitted
 │   │     ┌─ @quit
┌┴┐ ┌┴┐ ┌──┴──┐
1 0 1 1 1 0 1 0

or three:

zd -v --verbose -b "d25701020300"
Beginning disassembly at 0000
-----------------------------
0000: get_prop_addr #01 #02 #03 -> -(SP)
0006: quit

--------------------------------------------------
┌────────────────────────────────────────────────────────────────────┐
│0: get_prop_addr - 2OP (main routine 0)                             │
└────────────────────────────────────────────────────────────────────┘
 ┌─ Variable form
 │  ┌─ 2OP
 │  │     ┌─ @get_prop_addr
┌┴┐ │ ┌───┴───┐
1 1 0 1 0 0 1 0

Operand types:
 ┌─ Small constant
 │   ┌─ Small constant
 │   │   ┌─ Small constant
 │   │   │   ┌─ Omitted
┌┴┐ ┌┴┐ ┌┴┐ ┌┴┐
0 1 0 1 0 1 1 1

Operand 0: 00000001 (1)
Operand 1: 00000010 (2)
Operand 2: 00000011 (3)

Store: 00000000 (0) -> -(SP)
┌────────────────────────────────────────────────────────────────────┐
│6: quit - 0OP                                                       │
└────────────────────────────────────────────────────────────────────┘
 ┌─ Short form
 │   ┌─ Operand type: Omitted
 │   │     ┌─ @quit
┌┴┐ ┌┴┐ ┌──┴──┐
1 0 1 1 1 0 1 0
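
For comparison, here is a rough Python sketch of a decoder that treats the 2OP bit purely as a table selector and takes the operand count from the type byte alone. It is not zd's code; the function name is invented, and it deliberately ignores the two-type-byte opcodes (call_vs2, call_vn2) and the store byte:

```python
def decode_variable_form(code: bytes):
    """Decode the opcode byte, one type byte and the operands of a variable-form instruction.

    Returns (count, opcode_number, operands, offset_after_operands).
    Sketch only: opcodes with two type bytes are not handled.
    """
    opcode = code[0]
    if opcode >> 6 != 0b11:
        raise ValueError("not a variable-form instruction")
    count = "VAR" if opcode & 0x20 else "2OP"
    operands = []
    pos = 2  # operands start after the opcode byte and the type byte
    for shift in (6, 4, 2, 0):
        kind = (code[1] >> shift) & 0b11
        if kind == 0b11:  # omitted: no more operands
            break
        if kind == 0b00:  # large constant: two bytes, big-endian
            operands.append(("large", int.from_bytes(code[pos:pos + 2], "big")))
            pos += 2
        else:  # small constant (01) or variable number (10): one byte
            operands.append(("small" if kind == 0b01 else "variable", code[pos]))
            pos += 1
    return count, opcode & 0x1F, operands, pos


for hex_string in ("d27f0100", "d25701020300"):
    print(decode_variable_form(bytes.fromhex(hex_string)))
```

Run on the two byte strings above, it finds one and three small-constant operands respectively, even though both instructions are tagged 2OP.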

Not necessarily… zd is, I’m somewhat embarrassed to say, roughly thrown together and not particularly well tested. To be honest, I didn’t really ever think anybody would even try using it!

I think I’d consider a “variable 2OP” with the wrong number of operands more undefined behavior than an ill-formed instruction. It can be decoded with no problem, even if it’s generally nonsensical. Maybe it’d be nice to have a little note/warning that it’s invalid, though.

For what it’s worth, interpreters seem to (almost) universally accept instructions like this. If you have the za assembler, you can build this:

start
add 10 20 -> sp
byte 0xd4 0xff 0x00
print "Result: "
print_num sp
new_line
read_char 1 -> sp
quit

The byte string there is a variable 2OP “add” instruction with all operands omitted. I’m attaching a build here: 2op.zip (254 Bytes)

Notice the add 10 20 -> sp as the first thing. Then we have the equivalent of, basically, add -> sp.

Almost all interpreters I tried ran this without failing.

Several interpreters printed out 30 (Bocfel, Fizmo, Frotz, Windows Frotz, XZip, Nitfol), presumably because they reuse an operand list which contains values from the previous instruction.

A couple printed out 0 (Zoom and ZVM/Lectrote), meaning they presumably zero out the operand list each time.

Filfre printed 12.

Viola failed with an IndexError, trying to index into an empty list. Presumably it creates an empty list for each new instruction encountered.
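
These three guesses can be modelled with a toy Python sketch. It is only an illustration of the presumed operand-list policies, not code from any of the interpreters named, and the policy names are invented:

```python
def run_adds(instructions, policy):
    """Execute a series of 'add' instructions under one operand-list policy.

    instructions: operand lists as decoded (possibly empty).
    policy: 'reuse' (buffer never cleared), 'zero' (buffer cleared before
    each instruction) or 'fresh' (new list holding only the operands given).
    """
    buffer = [0] * 8
    results = []
    for given in instructions:
        if policy == "fresh":
            args = list(given)
        else:
            if policy == "zero":
                buffer = [0] * 8
            buffer[:len(given)] = given  # overwrite only the slots supplied
            args = buffer
        results.append((args[0] + args[1]) & 0xFFFF)  # 16-bit add
    return results


program = [[10, 20], []]  # add 10 20, then the operand-less add
print(run_adds(program, "reuse"))
print(run_adds(program, "zero"))
try:
    run_adds(program, "fresh")
except IndexError as error:
    print("IndexError:", error)
```

Under these assumptions the second add yields 30 with a reused buffer, 0 with a zeroed one, and an IndexError with a fresh list. The model does not account for Filfre’s 12.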

I don’t know if there’s much to do with this information, but it was interesting to look at.

If the two highest bits of the instruction are set, then the instruction is in variable form: a byte follows which describes the number and types of the operands. If bit 5 (the third highest) is clear, then this is a 2OP instruction (je, jg, etc.; the exact instruction is determined by the lowest 5 bits). If bit 5 is set, then it is one of the VAR instructions (call, storew, etc.).

Theoretically a 2OP instruction in variable form allows 0-4 operands, but in almost all cases this was used only to support operand types not allowed by long form, and not to vary the number of operands. Most interpreters will not behave as you might expect when a 2OP is given anything other than two operands. The notable exceptions are set_colour, which supports an optional third argument (in V6 only), and je, which does support up to four.

A further caveat is that je with zero or one operand is probably best considered undefined. Testing I did a long time ago revealed that different interpreters yield different behavior in this case (even Infocom’s own interpreters were not consistent here). It is an interesting question whether the 2OP call instructions can take fewer or more than two operands, which would make them indistinguishable from the VAR call instructions that were developed first. That is probably best considered undefined as well, although I can’t say I’ve personally tested it on many interpreters.
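
As a rough illustration of the je case, here is a Python sketch that assumes je branches when its first operand equals any of the others. The function is hypothetical, and rejecting fewer than two operands is this sketch’s choice for the undefined case, not documented interpreter behavior:

```python
def je(operands: list[int]) -> bool:
    """Return True if the branch condition of je holds for the given operands."""
    if not 2 <= len(operands) <= 4:
        # Zero or one operand is treated as undefined in the discussion above;
        # raising here is an arbitrary choice made for this sketch.
        raise ValueError("je expects two to four operands")
    first, *rest = operands
    return any(first == other for other in rest)


print(je([5, 1, 2, 5]))  # first operand matches the fourth
print(je([5, 1, 2, 3]))  # no match
```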