Z-machine standard: unclear aspects/ambiguities

The following is a mix of experience and opinion. Take it with a grain of salt. :grinning:

  1. Table locations - If the standard doesn’t say, then assume the table can appear in either static or dynamic memory. Non-default dictionaries can be in dynamic memory. Even if existing games don’t place tables in dynamic memory, there’s nothing preventing them from being placed there.

  2. Interrupts are nebulous beasts in the Z-machine. There are three types, and honestly I feel each should have its own specific rules. For example, output during a sound interrupt probably shouldn’t be allowed, as it could mess with the display of player input if such input is happening concurrently. Newline interrupts are an enormous headache and luckily are relegated to V6 games. Restore, Restart and Quit are definitely legal during timed input interrupts, as Border Zone requires this to run properly. The catch and throw instructions should be illegal, as should save_undo, because the Z-machine’s save state cannot record whether player input is in progress.

  3. You are correct. I feel you can deduce this from the underflow behavior mentioned in the standard, but it is correct regardless.

  4. Good question - newline interrupts are a royal pain.

  5. I believe it is all lines combined. Again, V6 is painful.

  6. You are correct - it should be interrupt countdown, not line count. If line count is not -999, then the line count should be checked against how many lines the interpreter can fit on the screen at once, so a large output can be paused with a [MORE] prompt or similar for the player to read.

  7. I wouldn’t make missing sound effects an error as games generally are playable without sound, but you could pop up a warning to the player (maybe make it suppressible).

  8. It shouldn’t matter if you don’t write anything, as the contents of that word will generally only be inspected when a menu click keypress occurs.

  9. It is a quirk of the original Mac implementation.

  10. Yeah, there’s no reliable way to guess what each extended opcode may do. Ignore and hope for the best!

  11. This is a wording issue. The leftover input isn’t from an interrupted read, but rather from one that was terminated by something other than a newline. Beyond Zork uses this to implement a set of definable macro keys.

  12. Remember that large constants can still hold small numbers, so in general this can still work. However, in the case you mention, where the high byte is non-zero, it’s somewhat ambiguous. I would assume a field length of one byte is still legal and simply won’t match the searched-for value. I see no reason to make it an error.

  13. Empty object names are legal to print and print nothing. This definitely occurs in extant games.

  14. I’ve tested this before, and it is my firm belief that these are bugs in the original games. Those words cannot be looked up in Infocom’s original interpreters.

  15. Loading of operands happens before execution of the instruction, so the stack value referenced by the second operand is popped. The value is then stored in the new top of the stack (without pushing).
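Answers 12 and 15 are concrete enough to sketch. The Python below is a minimal illustration, not code from any existing interpreter: the names `scan_table`, `load_operand` and `op_store`, and the `("var"/"const", n)` operand tuples, are made up for the example, and only the stack variable is modelled. It shows a byte-field scan that quietly fails to match a value whose high byte is non-zero, and `@store sp sp` popping during operand loading and then overwriting the new top of the stack in place.

```python
def scan_table(mem: bytearray, x: int, table: int, length: int, form: int = 0x82) -> int:
    """Return the address of the first matching field, or 0 if none match.

    Bit 7 of `form` selects word (set) or byte (clear) comparison;
    bits 0-6 give the field length in bytes.
    """
    compare_words = bool(form & 0x80)
    field_len = form & 0x7F
    addr = table
    for _ in range(length):
        if compare_words:
            value = (mem[addr] << 8) | mem[addr + 1]
        else:
            value = mem[addr]
        # A single byte can never equal an x above 0xFF, so such a search
        # just falls through and returns 0 rather than being an error.
        if value == x:
            return addr
        addr += field_len
    return 0


def load_operand(stack: list[int], kind: str, value: int) -> int:
    """Load one operand; reading variable 0 pops the stack."""
    if kind == "var":
        if value == 0:
            return stack.pop()
        raise NotImplementedError("only the stack variable is modelled")
    return value  # small or large constant


def op_store(stack: list[int], operands: list[tuple[str, int]]) -> None:
    """@store variable value, with all operands loaded before execution."""
    target, value = [load_operand(stack, kind, v) for kind, v in operands]
    if target == 0:
        # Indirect reference to the stack: overwrite the top in place, no push.
        stack[-1] = value
    else:
        raise NotImplementedError("only the stack variable is modelled")


# @store sp sp with the stack holding [10, 20, 30]:
stack = [10, 20, 30]
op_store(stack, [("const", 0), ("var", 0)])
print(stack)  # [10, 30]
```

The 30 is popped as the value operand, leaving [10, 20], and then replaces the 20 instead of being pushed back.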


I like this better than my own answer.


Ah, thanks – that clarifies that.


You may find something useful from this old thread I made about undefined Z-machine behavior.


Wow, I’d never expect to get this many answers so quickly! Thank you very much everyone!

The problem is that, whether or not there is a sequence, S3.2 clearly states that the highest bit of the last word marks the end of the text. S3.6.1 does mention sequence truncation later, but specifically in the context of decoding ZSCII characters. The spec doesn’t say anywhere that the end bit could be omitted; ergo, my assumption was that these two things (the end bit and multi Z-character constructions) are unrelated.

At the same time, S.13 gives the exact length (in bytes) of every encoded dictionary word, but it still does not say anything about the end bit, so I had to assume that normal decoding rules still apply there.

And it seems my assumptions were right after all.

The errata seem to be very helpful, thank you. As for the test, I had already found it and am going to put it to use as soon as I’m actually able to run it.

I’m not sure if I can agree, at least in theory (since I haven’t begun working on the frontend yet). S.8 gives a rather thorough description of the V6 screen model, and aside from what I mentioned earlier there doesn’t seem to be much room for frivolous interpretation there. Yes, it is complex, but ultimately its implementation should just be a matter of time. But if it’s not very popular, I guess I may have to resort to looking at the source code of existing implementations, even though it would take away some of the challenge.

Sounds interesting. I will check it out, thanks.


Note that Praxix uses undefined behavior of the print_table instruction; see the discussion in Additional documentation for @print_table opcode?


This only ever happens with multi Z-character constructions. I believe that whatever truncated the dictionary text had a bug that failed to set the high bit when in the middle of a construction, e.g. nasty-knife, and storm-tossed.
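To make the end-bit point concrete, here is a sketch of how a dictionary entry would be encoded if the end bit were always set. It is deliberately reduced (lowercase letters plus the A2 punctuation row only, V3+ alphabets assumed, no A1 or ZSCII escapes), and `encode_dict_entry` is a hypothetical name. For "nasty-knife" the sixth Z-character is the A2 shift for the hyphen, so V3 truncation lands mid-construction; the sketch still sets bit 15 of the last word. An entry stored without that bit would never compare equal to the encoding of a typed word, which is consistent with those words not being recognized.

```python
A0 = "abcdefghijklmnopqrstuvwxyz"        # Z-characters 6-31 in alphabet A0
A2_PUNCT = "0123456789.,!?_#'\"/\\-:()"  # Z-characters 8-31 in A2 (V2+)


def to_zchars(word: str) -> list[int]:
    """Reduced encoder: lowercase letters and A2 punctuation only."""
    zchars: list[int] = []
    for c in word:
        if c in A0:
            zchars.append(6 + A0.index(c))
        elif c in A2_PUNCT:
            # Shift to A2 (Z-char 5 in V3+) followed by the A2 character.
            zchars += [5, 8 + A2_PUNCT.index(c)]
        else:
            raise ValueError(f"character {c!r} not handled in this sketch")
    return zchars


def encode_dict_entry(word: str, version: int = 3) -> list[int]:
    """Encode a dictionary entry: 6 Z-chars (V1-3) or 9 (V4+), end bit set."""
    n = 6 if version <= 3 else 9
    z = to_zchars(word)[:n]   # truncation may split a shift/character pair
    z += [5] * (n - len(z))   # pad short words with 5s
    words = [(z[i] << 10) | (z[i + 1] << 5) | z[i + 2] for i in range(0, n, 3)]
    words[-1] |= 0x8000       # end bit on the last word, even mid-construction
    return words


print([hex(w) for w in encode_dict_entry("nasty-knife")])  # ['0x4cd8', '0xe7c5']
```

Per the description above, the buggy entries would correspond to 0x67c5 in the last word, i.e. the same bits minus the end bit.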

This should be Zork I Release 88 Serial 840726, with additional zero padding. In my opinion it is a good story to begin testing with. The first thing it will print upon starting is ZORK. :grinning:

Fun trivia - if you stub out the random number generator to just return zero (or one… my memory is failing me here), the game is unwinnable because neither you nor the troll can hit each other in combat.


You want to be careful about this. The spec is just a document that some people wrote.

When you’re doing this kind of archeology, remember:

  • The spec can be wrong.
  • Modern interpreters can be wrong.
  • The modern Inform and ZILF compilers can be wrong.
  • Infocom’s interpreters can be wrong.
  • Infocom’s game files can be wrong.
  • Infocom’s compiler can be wrong.

There are a couple of cases where Infocom’s game files and interpreter are wrong, but they cancel each other out so that the original game worked. (The Beyond Zork rotating-mirror bug is a classic.) That requires special-casing in modern interpreters.

For this dictionary situation, Infocom’s game files don’t match their interpreter (the words are not recognized) so we know Infocom made some mistake. Looking closely, we observe that the game files have a consistent discrepancy which implies a bug in Infocom’s compiler. We also observe that Infocom’s interpreters match modern interpreters (modern interpreters don’t recognize those words either). So there’s a clear path forward, but you don’t get there by blindly sticking to the spec.

(If we were updating the spec document, it would be worth a footnote here explaining what we just learned.)

Yeah, I’ve seen that one already while searching for more info on print_form and print_table.

You are right. That’s why I had to start this thread in the first place. =)

Here’s one more minor issue I’ve stumbled upon recently:

16. The verify opcode

This opcode is listed as available from Version 3, and its description says the interpreter should use the file length in header word $1A to calculate the checksum and compare it against the value stored in word $1C. The problem is that S.11 also says that some early Version 3 files do not contain length and checksum data. What does this mean, exactly? For example, will both words be 0, or might they contain unrelated data (e.g. something pertaining to the object table)?

It would also be logical to assume that any file that does not contain this information also does not contain the verify opcode, but does anyone know if that’s actually the case? If not, it’s unclear what the interpreter should do if it sees that the length and checksum values are not provided, assuming it is possible to find that out in the first place.

They are zero in every example I’ve looked at (that does not contain a verify instruction).

This is true in every example I’ve looked at.

The only way would be if they are zero. There is no other indicator.

Edit: I think Infocom was forward thinking enough to leave unused header bits zeroed until needed.

Additional Edit: Thinking further on this, it is (however unlikely) entirely possible for the checksum to legitimately be zero, so the only thing you can do when encountering the verify opcode is naively calculate and compare to the header value.
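For reference, the naive check described above could look like the sketch below. It assumes the usual header layout: version byte at $00, file length at $1A (multiplied by 2, 4 or 8 depending on version) and checksum at $1C, with the checksum being the sum of the original story-file bytes from $40 up to the file length, modulo $10000. The `min()` guard against a header length larger than the actual file is my own assumption. Note that in this sketch a header with both words zero sums an empty range and therefore passes.

```python
def file_length(story: bytes) -> int:
    """Decode the header file length at $1A using the version's scale factor."""
    version = story[0x00]
    packed = (story[0x1A] << 8) | story[0x1B]
    if version <= 3:
        return packed * 2
    if version <= 5:
        return packed * 4
    return packed * 8


def verify(story: bytes) -> bool:
    """Naive @verify: sum bytes from $40 to the file length, mod $10000.

    `story` must be the original file image, not the (possibly modified)
    dynamic memory of the running game.
    """
    expected = (story[0x1C] << 8) | story[0x1D]
    end = min(file_length(story), len(story))  # guard is an assumption
    checksum = sum(story[0x40:end]) & 0xFFFF
    return checksum == expected
```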

I see. Then there’s nothing to worry about here.

I don’t actually know this story. What was it?


See thread: Beyond Zork passing invalid values to @get_prop_addr


Something else I’ve been wondering about recently:

17. Address truncation

Apparently, according to this thread: Index to @loadw: signed or unsigned?, the final address of the word indicated by the opcode loadw is supposed to be truncated to the bottom 16 bits. But what about the other opcodes? (Logically, at least storew should be expected to do the same.)

S.1 says that the total of dynamic plus static memory must not exceed 64K, so my current assumption is that any opcode that does not accept a packed address as an argument should be handled this way. Is this correct?


That’s correct, loadw/storew and loadb/storeb are limited to 64K. This is unfortunate, but too late to change I think. Packed addresses allow reading strings and executing code beyond that limit.
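A minimal sketch of that truncation, assuming a flat 64K `bytearray` for memory (the helper names are hypothetical). One side effect worth noting: once the result is masked to 16 bits, it no longer matters whether the index is read as signed or unsigned, since 2 × $FFFF and 2 × -1 are congruent modulo $10000. Whether the second byte of a word at $FFFF should also wrap isn't settled here; the sketch wraps it as an assumption.

```python
MASK = 0xFFFF


def word_address(array: int, word_index: int) -> int:
    """Effective address for loadw/storew, truncated to the bottom 16 bits."""
    return (array + 2 * word_index) & MASK


def byte_address(array: int, byte_index: int) -> int:
    """Effective address for loadb/storeb, truncated to the bottom 16 bits."""
    return (array + byte_index) & MASK


def loadw(mem: bytearray, array: int, word_index: int) -> int:
    a = word_address(array, word_index)
    # Assumption: a word straddling $FFFF also wraps for its second byte.
    return (mem[a] << 8) | mem[(a + 1) & MASK]


def storew(mem: bytearray, array: int, word_index: int, value: int) -> None:
    a = word_address(array, word_index)
    # A real interpreter would also check that the target is dynamic memory.
    mem[a] = (value >> 8) & 0xFF
    mem[(a + 1) & MASK] = value & 0xFF


def loadb(mem: bytearray, array: int, byte_index: int) -> int:
    return mem[byte_address(array, byte_index)]


def storeb(mem: bytearray, array: int, byte_index: int, value: int) -> None:
    mem[byte_address(array, byte_index)] = value & 0xFF
```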

In my thread about undefined behavior I alluded to a way to circumvent even the packed address limit, but that definitely falls in the realm of undefined behavior that probably will not work on most interpreters.

What about opcodes like copy_table, scan_table, or print_addr (NB: not print_paddr) ? Should the address wrap around from $FFFF to zero while scanning/copying/decoding?

I doubt you’ll ever see it happen, but there’s probably no harm in letting them exceed the limit. The description for print_addr says “Print (Z-encoded) string at given byte address, in dynamic or static memory.” but it isn’t clear if it simply means the address, or the entire string. Since print_paddr already prints strings beyond this limit, I don’t see any problem with a string that overruns it. I doubt anyone is going to make a game with a string that runs over the limit and then be surprised by the behavior of your interpreter if it prints the whole thing vs printing the contents of the story header! The descriptions for scan_table and print_table don’t mention memory limits, so it’s anybody’s guess, but again I doubt you’ll ever see it.


Bocfel wraps in copy_table and scan_table, and it looks like Frotz does the same. I’d probably consider this undefined behavior, and just do what you want. Games can’t rely on accessing memory above 64K since some existing interpreters won’t allow it, so I’d say that any game that tries to do it is wrong.
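If you do choose to wrap, copy_table could look roughly like the sketch below. It is modelled on the behavior described for Bocfel and Frotz, not taken from their source, and the helper names are hypothetical. It follows the copy_table rules as I understand them: a second operand of 0 zeroes the table, a positive size must not corrupt an overlapping source (handled here by snapshotting it first), and a negative size copies forwards regardless.

```python
MASK = 0xFFFF


def to_signed16(v: int) -> int:
    """Interpret a 16-bit operand as a signed value."""
    v &= MASK
    return v - 0x10000 if v & 0x8000 else v


def copy_table(mem: bytearray, first: int, second: int, size: int) -> None:
    """copy_table with every address wrapped to 16 bits."""
    size = to_signed16(size)
    n = abs(size)
    if second == 0:
        # Zero n bytes of `first`. Using abs() for a negative size is an
        # assumption; the spec doesn't say what a negative size means here.
        for i in range(n):
            mem[(first + i) & MASK] = 0
    elif size > 0:
        # Must not corrupt `first` when tables overlap: snapshot the source.
        buf = [mem[(first + i) & MASK] for i in range(n)]
        for i, b in enumerate(buf):
            mem[(second + i) & MASK] = b
    else:
        # Negative size: copy forwards even if that corrupts `first`.
        for i in range(n):
            mem[(second + i) & MASK] = mem[(first + i) & MASK]
```

With a negative size, copying "abc" at $200 onto $201 for two bytes yields "aaa", which is the forward-copy corruption the spec explicitly permits.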

print_addr is a more interesting case. Most interpreters (Bocfel, ZVM, Fizmo, Zoom, Nitfol, Viola) support crossing the 64K boundary. Frotz (including Windows Frotz) does not, although the result is a segfault, so I don’t think it was a conscious decision in Frotz to disallow it, just a consequence of not expecting it to ever happen.

Infocom’s Beyond Zork interpreter (from Lost Treasures) also handles it properly.

I’m attaching a small program that calls print_addr on a string that crosses the 64K boundary. It prints the address of the string (using print_num, so it’s signed), which is -2/0xfffe, and then prints the string itself. This should allow easy testing of such behavior.

64k.zip (388 Bytes)

Edit: I’m adding a less-annoying file to demonstrate the same thing. The old one has thousands of @nop calls, which made it easier for me when using a disassembler to ensure I was at the 64K boundary, but here’s one that just fills the space with zeros so disassembly/running is cleaner.

64k-new.zip (398 Bytes)
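For comparison with the interpreters that do handle this, a non-wrapping print_addr only needs to keep advancing the byte address until it sees the end bit. This sketch assumes memory is a plain `bytearray` longer than 64K and decodes lowercase letters only (a real decoder needs the full alphabet-shift and abbreviation logic); all names are hypothetical.

```python
def zstring_words(mem: bytes, addr: int) -> list[int]:
    """Collect encoded words from a byte address until the end bit is set.

    The address is deliberately NOT masked to 16 bits, so a string that
    starts just below $10000 continues into the memory above it.
    """
    words = []
    while True:
        w = (mem[addr] << 8) | mem[addr + 1]
        words.append(w)
        if w & 0x8000:
            return words
        addr += 2


def zchars(words: list[int]) -> list[int]:
    """Split each 16-bit word into its three 5-bit Z-characters."""
    out: list[int] = []
    for w in words:
        out += [(w >> 10) & 0x1F, (w >> 5) & 0x1F, w & 0x1F]
    return out


def decode_lowercase(zs: list[int]) -> str:
    """Toy decoder: space and A0 letters only; a 5 is treated as padding."""
    text = []
    for z in zs:
        if z == 0:
            text.append(" ")
        elif z >= 6:
            text.append(chr(ord("a") + z - 6))
        elif z != 5:
            raise ValueError("shifts/abbreviations not handled in this sketch")
    return "".join(text)


# "hello" encoded as two words straddling the 64K boundary.
mem = bytearray(0x10002)
mem[0xFFFE:0x10002] = bytes([0x35, 0x51, 0xC6, 0x85])
print(decode_lowercase(zchars(zstring_words(mem, 0xFFFE))))  # hello
```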


The issue here is that memory limits are not mentioned for loadw either, and yet they do exist.

Undefined behavior per se does not concern me much, but I suppose there might be known cases of a story file (ab)using it to achieve the desired result, as in the Praxix case; that’s why I had to ask. Although Praxix is apparently a test file rather than a real game, that particular scenario was obviously constructed on purpose. I wonder if there are any actual cases of real games leveraging this wrapping behavior…

I see. I think I’m going to implement wrapping just for loadw, storew, loadb, and storeb, and leave the other cases as-is then.

Many thanks! I’m going to try it out whenever I’m actually able to run it. =)

I haven’t developed any interpreters, but if I did, I would assume that everything is limited to 64KB unless explicitly defined otherwise (as with packed addresses). Infocom designed the Z-machine for 8-bit computers, where a 64KB address space was a given. As a game developer, I would treat the behaviour of every instruction that doesn’t deal with packed addresses as undefined, and assume I couldn’t count on consistent behaviour across every interpreter out there. It could wrap around or it could extend beyond; either way, I can’t count on it being consistent.