Updating the Z-Machine Standard Documents

@check_unicode is supposed to check for support for a given unicode character, but different fonts are likely (in some cases certain) to support printing for different ranges of characters.

Is it reasonable to change the description to say that, if possible, interpreters should return the value for the current font, in the current window?

I think it’s slightly more reasonable than the current wording, but the whole concept of asking for font support is flawed. Something in the software stack has to know the right answer for text rendering purposes, but that’s usually very far removed from the interpreter and not readily (or at all) available to it. With a modern text rendering stack that solves all the hard parts for you and has access to all the system fonts, the most likely outcome is that you’ll still get something rendered even if the font the interpreter wanted to use doesn’t have all the glyphs. Of course, not all interpreters work that way today, but I think that’s the direction everything has been moving towards for good reasons. Note that asking about code point + font combinations is still an inaccurate oversimplification: it’s really about glyphs, which don’t have any simple or 1:1 relationship with code points. (Although even the fanciest text rendering implementations use this simplification internally to some extent, which is one reason why Text Rendering Hates You.)

To underscore this point: running @check_unicode on some runic codepoints (e.g., codepoint 5842) yields extremely inconsistent results across modern interpreters. Parchment behaves as expected; the default configuration of Windows Frotz claims not to support this codepoint but actually does; and the default configuration of Gargoyle claims to support this codepoint but actually does not.

This is a problem even in the Glulx realm, which is presumably why Unicode-heavy games like Aotearoa perform a manual sanity-check (“player, can you read these characters?”) instead of relying on a programmatic check.

Also note that Parchment just claims to support all code points, which is pretty much the only thing you can do in a browser and will usually be correct (because browsers try hard to render as much text as possible and system fonts usually cover a lot of glyphs) but can also give completely wrong results. The other interpreters likely aren’t much more sophisticated, just wrong in different cases for equally simple reasons.

Yeah, I don’t think there is a reliable solution. The standard says “may be used” and doesn’t require it, so maybe just add a note that if done, it may break the layout of some games.

Edit: Or it could just be illegal when printing with fixed pitch (font, style, upper window, etc.)

Why not? They can be used (at least for output) without any issues, see Z-Machine 1.2 Proposal (again) - #30 by borg323

I wouldn’t mind seeing an addition to the standard to allow output of additional Unicode, but the idea that it can work without issues is flawed. It may work on Windows through the incidental fact that Windows uses UTF-16, but practically everyone else in the world uses UTF-8. Just outputting two surrogate characters side by side doesn’t work on Linux.
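
To make the UTF-8 point concrete, here is a minimal Rust sketch. The function names are made up for illustration and come from no interpreter. It encodes the two halves of a surrogate pair separately, as a naive output routine might, and compares that with combining the pair first by means of the standard library’s `String::from_utf16`.

```rust
/// Naively encodes one 16-bit unit as a 3-byte UTF-8-style sequence.
/// Hypothetical helper: only meaningful for units >= 0x0800, and it
/// deliberately performs no surrogate handling.
fn naive_three_byte(unit: u16) -> [u8; 3] {
    [
        0xE0 | (unit >> 12) as u8,
        0x80 | ((unit >> 6) & 0x3F) as u8,
        0x80 | (unit & 0x3F) as u8,
    ]
}

/// Encodes the two surrogates one after the other, without combining them.
fn encode_pair_naively(high: u16, low: u16) -> Vec<u8> {
    let mut bytes = naive_three_byte(high).to_vec();
    bytes.extend_from_slice(&naive_three_byte(low));
    bytes
}

/// Combines the pair into one code point first, then encodes it as UTF-8.
/// Returns None if the two units do not form a valid surrogate pair.
fn encode_pair_combined(high: u16, low: u16) -> Option<Vec<u8>> {
    String::from_utf16(&[high, low]).ok().map(String::into_bytes)
}
```

For the pair 0xD83D, 0xDE00 the naive version yields six bytes that `std::str::from_utf8` rejects, while the combined version yields the four-byte encoding of U+1F600.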

Because it would be a major change to the Z-Machine. It’s a great idea for a future update to the Standard, but not for 1.1.

The latest Gargoyle (2023.1) should support this—finally—because it now uses Unifont to look up glyphs that are missing in the main fonts, meaning there should be total Unicode coverage. Although I recently learned that the Debian (and thus Ubuntu) package patches Gargoyle to not install its Unifont, and also doesn’t patch Gargoyle to use any existing system-wide Unifont, so things will not work right if you’re using that package. I’ll be doing a bug report once I get a Debian install in a VM, so hopefully future releases will work better regarding fonts.

Bad wording on my part, I wrote that in a hurry. Having coded that for Linux, I meant to say it is easy to get it to work.

In any other interpreter, surrogate pairs are like any other unprintable character, so why do we need to treat them specially?

Understood.

I think adding the ability to output additional unicode should belong to a new version of the standard. Also, the combining of surrogates into a single character should be done in the interpreter and not rely on OS level translation like the output of two side by side surrogate characters.

In my interpreter written in Rust, output goes roughly like this:
zchars → zscii → unicode
where zchars to zscii is many-to-one and zscii to unicode is one-to-one.

Rust uses UTF-8 for its strings and surrogate code points are illegal. I’d have to change my code so that zscii to unicode becomes many-to-one. This has several knock-on implications for my interpreter because it supports saving state even mid-output.
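
A rough sketch of what that change could look like, not the code from the interpreter described above: `SurrogateJoiner` and its field are hypothetical names, and the handling of unpaired surrogates (drop a stray high surrogate, print U+FFFD for a stray low one) is just one possible policy. The pending high surrogate is exactly the extra piece of state that a mid-output save would have to capture.

```rust
/// Hypothetical output stage that turns the 16-bit units a story prints
/// into Rust `char`s, joining surrogate pairs along the way.
#[derive(Default)]
struct SurrogateJoiner {
    /// High surrogate seen but not yet paired. This would have to be
    /// included in any state saved in the middle of output.
    pending_high: Option<u16>,
}

impl SurrogateJoiner {
    /// Feeds one 16-bit unit; returns a `char` once one is complete.
    fn push(&mut self, unit: u16) -> Option<char> {
        match (self.pending_high.take(), unit) {
            // High surrogate followed by low surrogate: combine them.
            (Some(high), 0xDC00..=0xDFFF) => {
                let cp = 0x10000
                    + (((high as u32) - 0xD800) << 10)
                    + ((unit as u32) - 0xDC00);
                char::from_u32(cp)
            }
            // A high surrogate: hold it back until the next unit arrives.
            // Any earlier unpaired high surrogate is silently dropped.
            (_, 0xD800..=0xDBFF) => {
                self.pending_high = Some(unit);
                None
            }
            // A low surrogate with nothing pending: substitute U+FFFD.
            (_, 0xDC00..=0xDFFF) => Some('\u{FFFD}'),
            // Ordinary BMP code point. An unpaired high surrogate that was
            // pending before it has already been discarded by `take()`.
            (_, _) => char::from_u32(unit as u32),
        }
    }
}
```

Feeding 0xD83D returns `None`, and feeding 0xDE00 next returns U+1F600. By contrast, `char::from_u32(0xD83D)` on its own is `None`, which is why the one-to-one mapping no longer works.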

I wrote up a test program for unicode output. It’s a little limited, in that it prints everything in the upper window, so it doesn’t test the proportional font, but I figured it might be helpful for people. I put it up at http://frobnitz.co.uk/zmachine/unicode.z5

There’s already a unicode.z5 test - maybe call it all-unicode.z5?

Referring to memory streams, Section 7.1.2.1 of the standard says:

Output stream 3 writes to a table in dynamic memory. When the stream is selected, the table may have any contents (even the initial ‘size’ word will be ignored by the interpreter). While the stream is selected, the table’s contents are unspecified (and a game cannot safely read or write to it). When the stream is deselected, the initial word of the table holds the number of characters printed and subsequent bytes hold those characters. Similarly, in Version 6, the total width of printing (in units) will then be stored in the word at $30 in the header. (It is the programmer’s responsibility to make the table large enough: the interpreter performs no overflow checking.)

One thing not made clear is whether multiple simultaneous streams open to the same or an overlapping table are supported. This matters mainly because of where the interpreter stores the number of characters written while the stream is open. Storing the count in the first word of the table while the stream is open would mean the last stream printed to would determine the final count when all streams are closed. If the count is stored in the interpreter’s own memory, then the first stream opened / last closed would determine the count.
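
The difference can be shown with a toy model. Everything here is an assumption for illustration: `Machine`, `CountStorage` and `run_scenario` are invented names, memory is a 32-byte vector, streams nest so that only the most recently opened one receives output, and characters are written straight into the table in both strategies.

```rust
/// Where a hypothetical interpreter keeps the running character count.
#[derive(Clone, Copy)]
enum CountStorage {
    /// In the first word of the table, inside Z-machine memory.
    InTable,
    /// In the interpreter's own memory, copied to the table on close.
    Private,
}

struct Stream3 {
    table: usize,
    count: u16, // only used with CountStorage::Private
}

struct Machine {
    memory: Vec<u8>,
    stack: Vec<Stream3>,
    storage: CountStorage,
}

impl Machine {
    fn new(storage: CountStorage) -> Self {
        Machine { memory: vec![0; 32], stack: Vec::new(), storage }
    }

    fn read_word(&self, addr: usize) -> u16 {
        u16::from_be_bytes([self.memory[addr], self.memory[addr + 1]])
    }

    fn write_word(&mut self, addr: usize, value: u16) {
        self.memory[addr..addr + 2].copy_from_slice(&value.to_be_bytes());
    }

    /// Selects stream 3 with the given table address.
    fn open(&mut self, table: usize) {
        if matches!(self.storage, CountStorage::InTable) {
            self.write_word(table, 0);
        }
        self.stack.push(Stream3 { table, count: 0 });
    }

    /// Prints one character to the most recently opened stream, if any.
    fn print(&mut self, ch: u8) {
        let storage = self.storage;
        let (table, private_count) = match self.stack.last() {
            Some(s) => (s.table, s.count),
            None => return,
        };
        let count = match storage {
            CountStorage::InTable => self.read_word(table),
            CountStorage::Private => private_count,
        };
        self.memory[table + 2 + count as usize] = ch;
        match storage {
            CountStorage::InTable => self.write_word(table, count + 1),
            CountStorage::Private => {
                if let Some(s) = self.stack.last_mut() {
                    s.count = count + 1;
                }
            }
        }
    }

    /// Deselects the most recently opened stream.
    fn close(&mut self) {
        if let Some(stream) = self.stack.pop() {
            if matches!(self.storage, CountStorage::Private) {
                self.write_word(stream.table, stream.count);
            }
        }
    }
}

/// Opens two streams on the same table: prints "ab" to the outer one,
/// "c" to the inner one, then "d" to the outer one again.
fn run_scenario(storage: CountStorage) -> (u16, Vec<u8>) {
    let mut m = Machine::new(storage);
    m.open(0);
    m.print(b'a');
    m.print(b'b');
    m.open(0);
    m.print(b'c');
    m.close();
    m.print(b'd');
    m.close();
    let count = m.read_word(0);
    (count, m.memory[2..2 + count as usize].to_vec())
}
```

In this model the in-table strategy ends with a count of 2 and the text "cd", while the private-count strategy ends with a count of 3 and the text "cbd". A story cannot rely on either result.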

Forbidding duplicate memory streams would remove the issue altogether…sort of. If the interpreter checks for duplicate table addresses, then it would be prevented (unless the tables are off by just one byte). There’s still the issue of overlapping tables, but with no initial size information given, there is no way to detect this unless the interpreter checks for overlap on every character printed to stream 3.
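
A per-character check along those lines might look roughly like the following sketch. `OpenTable` and `collision` are hypothetical names, and the check only catches a write landing on bytes another open stream has already touched; it says nothing about where that other stream will write later.

```rust
use std::ops::Range;

/// Hypothetical record of one open stream 3 table: its start address
/// and the number of characters written to it so far.
struct OpenTable {
    start: usize,
    written: usize,
}

impl OpenTable {
    /// Bytes touched so far: the 2-byte count word plus the characters.
    fn touched(&self) -> Range<usize> {
        self.start..self.start + 2 + self.written
    }

    /// Address the next character would be written to.
    fn next_write(&self) -> usize {
        self.start + 2 + self.written
    }
}

/// Returns the index of another open table that the next write of
/// `streams[writer]` would land in, if there is one.
fn collision(streams: &[OpenTable], writer: usize) -> Option<usize> {
    let addr = streams[writer].next_write();
    streams
        .iter()
        .enumerate()
        .find(|(i, t)| *i != writer && t.touched().contains(&addr))
        .map(|(i, _)| i)
}
```

Running this before every character printed is the cost mentioned above. A plain comparison of start addresses at open time is cheaper but misses tables that are offset by a byte or more.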

This feels like an excellent candidate for “the behavior is implementation-defined and not guaranteed”.

Matching addresses, running into another stream’s table, or both?

Obviously there’s the possibility of weirdness here and I just want to be able to catch and report undefined behavior.

Overlapping tables.

The spec has a line “While the stream is selected, the table’s contents are unspecified (and a game cannot safely read or write to it)”. Extend that to include “cannot safely start a new stream that writes to overlapping memory” – which is, after all, part of “writing to it”.

Interpreters may (but do not have to) handle the stream-writing with a separate, non-VM buffer that is pasted back into VM memory when the stream closes. When the spec allows for this, it pretty much has to throw up its hands and leave the result unspecified.

I was thinking along the lines of: The contents can’t be read/written by the story, but what the interpreter can do is fair game, or at least unspecified.

Edit: It probably is best if overlapping tables are undefined/illegal. I was trying to imagine there might be some legitimate uses for them and not rule them out entirely.

Making overlapping stream 3 tables illegal allows an interpreter implementation the extra freedom to use the table in Z-machine memory to hold the printed characters and running total while the stream is open, instead of requiring a separate buffer.

Edit: It’s a shame that V6 requires a separate buffer be maintained for the total printing width in pixels. As always, V6 irritates me.

Agree on both counts. I’m staying right out of V6 issues.