Z-machine standard: unclear aspects/ambiguities

Mike_G · September 3, 2023, 1:49pm

That’s correct, loadw/storew and loadb/storeb are limited to 64K. This is unfortunate, but too late to change I think. Packed addresses allow reading strings and executing code beyond that limit.
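For reference, here is a minimal sketch of how packed addresses reach past 64K, following the multipliers in section 1.2.3 of the Standard. The function name `unpack_address` and the `offset` parameter (standing in for the V6/V7 routine or string offset from the header) are my own naming for illustration, not taken from any interpreter.

```python
def unpack_address(packed: int, version: int, offset: int = 0) -> int:
    """Convert a 16-bit packed address into a byte address.

    `offset` is only used for V6/V7, where the header supplies a routine
    or string offset; the name is illustrative.
    """
    if not 0 <= packed <= 0xFFFF:
        raise ValueError("packed address must fit in 16 bits")
    if version in (1, 2, 3):
        return 2 * packed
    if version in (4, 5):
        return 4 * packed
    if version in (6, 7):
        return 4 * packed + 8 * offset
    if version == 8:
        return 8 * packed
    raise ValueError(f"unsupported Z-machine version: {version}")
```

In V5, for example, the packed address $4000 already refers to byte address $10000, which no loadw or loadb operand can express.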

In my thread about undefined behavior I alluded to a way to circumvent even the packed address limit, but that definitely falls in the realm of undefined behavior that probably will not work on most interpreters.

Player701 · September 3, 2023, 2:04pm

What about opcodes like copy_table, scan_table, or print_addr (NB: not print_paddr)? Should the address wrap around from $FFFF to zero while scanning/copying/decoding?

Mike_G · September 3, 2023, 4:19pm

I doubt you’ll ever see it happen, but there’s probably no harm in letting them exceed the limit. The description for print_addr says “Print (Z-encoded) string at given byte address, in dynamic or static memory.” but it isn’t clear if it simply means the address, or the entire string. Since print_paddr already prints strings beyond this limit, I don’t see any problem with a string that overruns it. I doubt anyone is going to make a game with a string that runs over the limit and then be surprised by the behavior of your interpreter if it prints the whole thing vs printing the contents of the story header! The descriptions for scan_table and print_table don’t mention memory limits, so it’s anybody’s guess, but again I doubt you’ll ever see it.

cas · September 3, 2023, 4:38pm

Bocfel wraps in copy_table and scan_table, and it looks like Frotz does the same. I’d probably consider this undefined behavior, and just do what you want. Games can’t rely on accessing memory above 64K since some existing interpreters won’t allow it, so I’d say that any game that tries to do it is wrong.

print_addr is a more interesting case. Most interpreters (Bocfel, ZVM, Fizmo, Zoom, Nitfol, Viola) support crossing the 64K boundary. Frotz (including Windows Frotz) does not, although the result is a segfault, so I don’t think it was a conscious decision in Frotz to disallow it, just a consequence of not expecting it to ever happen.

Infocom’s Beyond Zork interpreter (from Lost Treasures) also handles it properly.

I’m attaching a small program that calls print_addr on a string that crosses the 64K boundary. It prints the address of the string (using print_num, so it’s signed), which is -2/0xfffe, and then prints the string itself. This should allow easy testing of such behavior.

64k.zip (388 Bytes)
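To illustrate why the address shows up as -2: print_num treats its operand as a signed 16-bit value, so 0xfffe is reinterpreted in two's complement. A small sketch of that conversion (the helper name `to_signed16` is just for illustration):

```python
def to_signed16(value: int) -> int:
    """Reinterpret the low 16 bits of `value` as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value
```

Here `to_signed16(0xFFFE)` gives -2, matching what the test file prints.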

Edit: I’m adding a less-annoying file to demonstrate the same thing. The old one has thousands of @nop calls, which made it easier for me when using a disassembler to ensure I was at the 64K boundary, but here’s one that just fills the space with zeros so disassembly/running is cleaner.

64k-new.zip (398 Bytes)

Player701 · September 3, 2023, 5:44pm

The issue here is that memory limits are not mentioned for loadw either, and yet they do exist.

Undefined behavior per se does not concern me much, but I suppose there might be known cases of a story file (ab)using it in order to achieve the desired result, like in the Praxix case - that’s why I had to ask. Although Praxix is apparently not a real game but a test file, that particular scenario is obviously constructed on purpose. I wonder if there are any actual cases of real games leveraging this wrapping behavior…

I see. I think I’m going to implement wrapping just for loadw, storew, loadb, and storeb, and leave the other cases as-is then.
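Roughly what I have in mind, as a sketch only: the effective address is masked to 16 bits before memory is touched. The function names mirror the opcodes, but the `memory` bytearray and the choice to also wrap the second byte of a word are my assumptions. A real interpreter would additionally reject writes outside dynamic memory, which is omitted here.

```python
ADDRESS_MASK = 0xFFFF  # loadw/storew/loadb/storeb see a 16-bit address space


def loadb(memory: bytearray, array: int, index: int) -> int:
    return memory[(array + index) & ADDRESS_MASK]


def storeb(memory: bytearray, array: int, index: int, value: int) -> None:
    memory[(array + index) & ADDRESS_MASK] = value & 0xFF


def loadw(memory: bytearray, array: int, index: int) -> int:
    addr = (array + 2 * index) & ADDRESS_MASK
    # Assumption: the low byte wraps too if the word starts at $FFFF.
    return (memory[addr] << 8) | memory[(addr + 1) & ADDRESS_MASK]


def storew(memory: bytearray, array: int, index: int, value: int) -> None:
    addr = (array + 2 * index) & ADDRESS_MASK
    memory[addr] = (value >> 8) & 0xFF
    memory[(addr + 1) & ADDRESS_MASK] = value & 0xFF
```

Because of the mask, it makes no difference whether the index is treated as signed or unsigned: both give the same 16-bit result.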

Many thanks! I’m going to try it out whenever I’m actually able to run it. =)

heasm66 · September 3, 2023, 6:12pm

I haven’t developed any interpreters, but if I did I would assume that everything is limited to 64 KB unless explicitly defined otherwise (as with packed addresses). Infocom designed the Z-machine for 8-bit computers, and a 64 KB address space was a given. As a game developer I would treat the behaviour of every instruction that doesn’t deal with packed addresses as undefined, and I wouldn’t count on consistent behaviour across every interpreter out there. It could wrap around or it could extend beyond; either way I can’t count on it being consistent.

Mike_G · September 3, 2023, 6:40pm

The limits are mentioned:

Stores array-->word-index (i.e., the word at address array+2*word-index, which must lie in static or dynamic memory).

cas · September 3, 2023, 6:41pm

As an update, Frotz doesn’t segfault as a rule, but just as a consequence of this particular file and what address it wraps around to. Ultimately it’s undefined behavior in Frotz which might segfault, but might not.

Player701 · September 3, 2023, 7:50pm

Same for print_addr:

Print (Z-encoded) string at given byte address, in dynamic or static memory.

and yet:

In the end, it feels as if that remark is nothing more than a reminder that the behavior outside of the 64K boundary is not clearly defined.

Mike_G · September 3, 2023, 8:28pm

I do think there is a slight difference between those cases, as for loadw it clearly means the address must lie below 64K, while with print_addr it isn’t clear if it means the start of the string (which must) or the entire string (which is undefined).

I think Chris put it best above:

Games can’t rely on accessing memory above 64K since some existing interpreters won’t allow it, so I’d say that any game that tries to do it is wrong.

With the obvious exceptions allowed by packed addresses of course.

Edit: I am kind of surprised by Frotz’s behavior with a string crossing the boundary, since printing strings beyond 64K is something all terps must do. In most implementations I would assume text decoding wouldn’t know or care whether it was initiated by print_addr, print_paddr, or something else. Still, it shows the dangers of assuming.
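To make the two possible behaviors concrete, here is a hedged sketch of the word-fetching part of a shared text decoder. A Z-string is a sequence of 2-byte words that ends at the first word with its top bit set. The function `zstring_words` and its `wrap_at_64k` flag are hypothetical and not taken from Frotz or any other interpreter.

```python
def zstring_words(memory: bytes, start: int, wrap_at_64k: bool = False) -> list[int]:
    """Collect the 16-bit words of the Z-string beginning at `start`."""

    def fix(addr: int) -> int:
        # Either fold the address back into the low 64K or leave it alone.
        return addr & 0xFFFF if wrap_at_64k else addr

    words = []
    addr = start
    # Bound the scan so a missing terminator cannot loop forever.
    for _ in range(len(memory) // 2):
        word = (memory[fix(addr)] << 8) | memory[fix(addr + 1)]
        words.append(word)
        addr += 2
        if word & 0x8000:  # top bit marks the final word
            return words
    raise ValueError("no end-of-string marker found")
```

Without wrapping, a string starting at $FFFE simply continues at $10000. With wrapping, the decoder carries on reading from address 0, that is, from the story header.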

DavidG · September 4, 2023, 1:39am

Since the consensus seems to be that a string straddling the 64K mark is illegal, I’ll modify the Frotz core to abort with a fatal error if encountered.

Without referring to any code, I think that Frotz barfs because it uses an 8-bit variable internally and the others don’t.

I’m curious what Infocom’s terps would do when encountering this condition. I’ll explore that over the next couple days while addressing the Issue filed at https://gitlab.com/DavidGriffith/frotz/-/issues/276

Mike_G · September 4, 2023, 2:12am

It was mentioned above that the Beyond Zork interpreter from Lost Treasures handles it ok.

Mike_G · September 4, 2023, 2:23am

My personal feeling is printing strings across the boundary is a special case that should be allowed. Strings are valid on both sides already.

A similar case would be whether or not executable code can flow across the boundary. The program counter has to be able to handle addresses above 64K anyway. You can even have execution flow go from static to high memory without having code right at the boundary as well, via jumps, so checking the address of the program counter isn’t sufficient to warn or error.

Player701 · September 4, 2023, 11:25am

Yes, you are technically correct. In the end, like I said before, I probably should do explicit wrapping for loadw et al., and leave the rest as-is because that’s undefined behavior.

I think I’ve found one more aspect of the screen model which is not properly explained. It is not V6-exclusive, so I hope it should be more or less known what to do here.

18. Screen height and width in lines and characters

S8.1 states:

Text may be printed in any font of the interpreter’s choice, variable- or fixed-pitch: except that <…exceptions omitted…>, then a fixed-pitch font must be used.

And S8.4 states:

The screen should ideally be at least 60 characters wide by 14 lines deep. <…> The interpreter may change the exact dimensions whenever it likes but must write the current height (in lines) and width (in characters) into bytes $20 and $21 in the header.

The question: if the current font is variable-pitch, how should the screen width be calculated? Also, should it be updated if the font changes from variable-pitch to fixed-pitch? It does say current screen size, so I presume it should be updated whenever the actual size changes, e.g. when the user resizes the game window - but it doesn’t say anything about the font.

S8.1.1 does say that the width of a font is defined as the width of ‘0’, but the context there is slightly different (V5+ font size information to put into header bytes $26 and $27), and it’s still unclear what to do when the font has changed.

The header bytes $20 and $21 are listed in S11 as V4+. For V5+, S8.4.3 also states:

In Version 5 and later, the screen’s width and height in units should be written to the words at $22 and $24.

Are these values used instead of the ones written to $20 and $21, or in addition to them?

borg323 · September 4, 2023, 2:05pm

Maybe this will be of some help. @Marvin had a page with what remains of the original infocom zip documentation, but it is no longer up. Luckily there is a copy in the web archive.

heasm66 · September 4, 2023, 2:14pm

Fortunately they are also archived here: infocom-zip-specs.zip - IF Archive Unboxing Service

Mike_G · September 4, 2023, 2:14pm

The best kind of correct.

Anytime the screen or font changes in such a way that the effective screen size changes, I update the values.

For the header values: Update both sets of fields. Games may look at and use either. They should contain the same values. Ideally 1 unit = 1 character, as this remark in section 8 says:

It’s recommended that a Version 5 interpreter always use units to correspond to characters: that is, characters occupy 1x1 units. ‘Beyond Zork’ was written in the expectation that it could be using either 1x1 or 8x8, and contains correct code to calculate screen positions whatever units are used. (Infocom’s Version 5 interpreter for MSDOS could either run in a text mode, 1x1, or a graphics mode, 8x8.) However, the German translation of ‘Zork I’ contains incorrect code to calculate screen positions unless 1x1 units are used.
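As a rough sketch of "update both sets of fields", assuming 1x1 units so that the character and unit dimensions are the same numbers: the function name `write_screen_size` and the `memory` bytearray are illustrative. The cap of 254 on the height byte reflects my reading that a height of 255 means "infinite" to the game.

```python
def write_screen_size(memory: bytearray, cols: int, rows: int, version: int) -> None:
    """Write the current screen dimensions into the story header.

    Assumes 1x1 units (one unit per character cell).
    """
    if version >= 4:
        # Byte fields: height in lines at $20, width in characters at $21.
        memory[0x20] = min(rows, 254)  # 255 is avoided: it would mean "infinite"
        memory[0x21] = min(cols, 255)
    if version >= 5:
        # Word fields (big-endian): width in units at $22, height in units at $24.
        memory[0x22:0x24] = cols.to_bytes(2, "big")
        memory[0x24:0x26] = rows.to_bytes(2, "big")
```

Calling this from the resize handler and again after any font change keeps the two sets of fields in step.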

Player701 · September 4, 2023, 2:38pm

Sizes in units are well documented - the problem is that $21 must specify screen width in characters, and it’s not entirely clear how to measure it if the current font is not fixed-pitch, and whether it should even be changed if the font (not screen!) size changes.

In the absence of any clarifications I’d assume the measurement is taken the same way as in V5 for font width (effectively producing $21 = W/w0 where W is the actual screen width in pixels, and w0 is the width of ‘0’ in the current font). I’m not sure if that’s correct behavior, however.
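In code, that assumption would look something like the sketch below. The name `width_in_chars` and the clamp to 255 (so the result fits the single header byte at $21) are my own choices, and whether this is the intended measurement is exactly the open question.

```python
def width_in_chars(screen_width_px: int, zero_width_px: int) -> int:
    """Approximate the screen width in characters as W // w0, capped to one byte."""
    if zero_width_px <= 0:
        raise ValueError("width of '0' must be positive")
    return min(screen_width_px // zero_width_px, 255)
```

For a 640-pixel-wide window and a font whose ‘0’ is 8 pixels wide this yields 80.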

Looks like something is wrong with those, as every file I download from there actually appears to be an HTML file apparently saved from Github, not a PDF.

borg323 · September 4, 2023, 2:39pm

I was considering whether to revive the old thread for this or make a new one, I can just reply here instead. The ifarchive zip file only has some html files that hint at the correct github locations, so the pdf files can be found with a little effort, but probably a new version should be created.

heasm66 · September 4, 2023, 2:45pm

Strange. It’s easily fixed, though. I’ll submit a new set of files for the archive.