Do any Z-machine interpreters not support Unicode translation tables?

The Z-machine spec (1.0 and later) allows games to include a “Unicode translation table”, which defines ZSCII characters 155 through 251. If one is not provided, default values (from the Latin-1 character set) are used for 155 through 223, and 224 through 251 are undefined.
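
For reference, here's roughly how that table works, as I understand it from Standard 1.0 (§3.8.5 plus the header extension); treat this as a sketch and double-check it against the spec:

    # The custom table is pointed to by word 3 of the header extension
    # table: one length byte, then that many big-endian 16-bit Unicode
    # values, covering ZSCII 155, 156, 157, ... in order.
    #
    # First few entries of the *default* table (ZSCII 155-163):
    DEFAULT_START = ['ä', 'ö', 'ü', 'Ä', 'Ö', 'Ü', 'ß', '»', '«']  # ...continues up to 223

    def build_unicode_table(chars):
        """Serialize a custom table: ZSCII 155 maps to chars[0], and so on."""
        assert len(chars) <= 97            # 155..251 inclusive
        data = bytes([len(chars)])
        for ch in chars:
            cp = ord(ch)
            assert cp <= 0xFFFF            # only BMP characters fit in 16 bits
            data += cp.to_bytes(2, 'big')
        return data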

I’m working to improve Unicode support in the Dialog Z-machine backend, and I can see a few main paths forward:

  • (a) Keep the default values for 155-223, and assign whatever other characters appear in dictionary words to 224-251. If more than 28 non-standard characters are used, report an error. This offers the fewest characters to the user. (A rough sketch of this scheme follows the list.)
  • (b) Start by keeping the default values for 155-223, and assign whatever other characters appear in dictionary words to 224-251. If more than 28 non-standard characters are used, start overwriting the ones from 155-223. This requires the most rewriting in the compiler.
  • (c) Throw out the default table, and assign whatever characters appear in dictionary words to 155-251. This is the most elegant, but the least backwards-compatible.
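
To make option (a) concrete, the allocation logic is roughly this (an illustrative sketch, not actual Dialog compiler code; the names are made up, and the 28-slot limit comes from codes 224-251):

    EXTRA_FIRST, EXTRA_LAST = 224, 251       # the 28 slots above the default table

    class ZsciiAllocator:
        def __init__(self):
            self.extra = {}                  # unicode character -> ZSCII code

        def zscii_for(self, ch):
            # ASCII and default-table characters are handled elsewhere;
            # this only sees "new" characters found in dictionary words.
            if ch not in self.extra:
                code = EXTRA_FIRST + len(self.extra)
                if code > EXTRA_LAST:
                    raise SystemExit(f"too many non-standard characters (limit 28): {ch!r}")
                self.extra[ch] = code
            return self.extra[ch]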

My preference would be to do (a), if there are a bunch of interpreters out there that support ZSCII 155-223 but don’t support redefining them, or (c), if there aren’t.

So my question is: are there interpreters out there that support non-ASCII characters in input, but don’t support redefining the upper ZSCII characters with a Unicode translation table? In other words, is there a strong reason to keep the default 69 characters at their usual codepoints?

2 Likes

I’m guessing the compiler makes it hard to know all the non-ASCII characters before actually assigning them ZSCII codes (I had that issue with my Hintweaver program). Hence you need to assign a ZSCII code at first usage.

Out of your options, I think (b) is the best one when aiming to be the most compatible with anything out there. But I can understand it may be simply too much work.

Some things I also noticed: for input you will need both a lowercase and uppercase version of any particular character, which chews up the slots even quicker.

Also, handling characters outside the BMP (above U+FFFF) could probably be done with surrogate pairs, but that makes things even more complicated. I simply ignored these for my program (with a warning message).
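
The pair arithmetic itself is the easy part, something like the sketch below (standard UTF-16 splitting, nothing Z-machine-specific); the complication is carrying two code units through encoding and input:

    def to_surrogates(cp):
        """Split a codepoint above U+FFFF into a UTF-16 surrogate pair."""
        assert cp > 0xFFFF
        cp -= 0x10000
        high = 0xD800 + (cp >> 10)     # high (lead) surrogate
        low  = 0xDC00 + (cp & 0x3FF)   # low (trail) surrogate
        return high, low

    # e.g. to_surrogates(0x1F600) == (0xD83D, 0xDE00)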

Yeah, the existing compiler architecture makes it difficult to do two passes over the strings and dictionary words; it’ll be much easier if this can be done in a single pass. And currently Dialog just doesn’t try to handle casing at all for non-ASCII characters on Z-machine, even ones in the standard ZSCII set. It would be nice to improve that, but it’ll be a separate issue.

This is just a gut feeling, but I don’t think any interpreters for 8-bit or 16-bit computers support Unicode, as Unicode hadn’t been invented way back then. Those computers were ASCII only (or pretty close to it). Most of them didn’t even have European characters with diacritic marks. If you include any non-ASCII characters in your games, those old interpreters will display them as strange characters or question marks.

I know at least some of Infocom’s interpreters support ß and ü, since those were needed for the German Zork 1, but I also don’t know if anyone still uses original Infocom interpreters for anything. It’s very possible that any given Dialog game will crash due to using post-Infocom features somewhere.

Since Standard 1.0, Unicode support has been mandatory for V5+.

I imagine Infocom’s own interpreters wouldn’t qualify as compliant here (except for whatever the German Beta of Zork I was supposed to run on???)

I highly doubt there are many (any?) modern interpreters that both:
  • provide the default Unicode translation table, and
  • can’t recognize the header extension that redefines those characters.

I’d just chuck the default table when a non-default character is used. If people are using original Infocom interpreters I don’t think it is reasonable to think all modern games would work on them. The standard isn’t meant to work that way.

1 Like

I think the Puddle BuildTools, which are used to package some new works, like to use original Infocom interpreters for some old platforms? (But I don’t know how relevant this is.)

After looking through the source for Infocom’s interpreters, it appears that only the Amiga interpreter had any support for these characters, designed of course for the German translation of Zork I, which didn’t make it past beta.

The characters supported match the standard, but only include characters 155-163. There’s no evidence I can find that any other characters were ever supported.

I have to wonder if even that support ever made it into any actual released interpreters. Perhaps someone with an Amiga or emulator could do some testing.

1 Like

Ozmoo can support pretty much any subset of the default 69 characters, but has no support for a Unicode translation table. Since it runs on 8-bit hardware, with no Unicode fonts and no Unicode support on the computer platforms as such, it repurposes graphic characters as accented characters, and has to use a custom font where these replacements have been made.

Ozmoo ships with settings+fonts to support specific languages, e.g. if you invoke support for Swedish, it will support åäöÅÄÖé, and that’s it. If the game moves any of these characters to different code points, Ozmoo won’t detect it - it will just not work as intended.

1 Like

Aha! That’s exactly the sort of thing I was wondering about.

The next question is: do Dialog games work on Ozmoo? I know I7 games generally don’t because of the RAM requirements.

If you replace the default unicode characters or extend them, does it make much difference in Ozmoo’s case? It sounds like it wouldn’t support the new character(s) no matter what you did.

No difference if a game uses characters outside the default table—they won’t be supported either way. But if a game uses only characters from the default table, like é and ñ, then current Dialog should work on Ozmoo (since it always uses the default codepoints for those characters) and my proposed enhancement wouldn’t (since it would put them at any available codepoints).

If this is a problem—i.e. if current Dialog games can run on Ozmoo—then I’ll need to figure out some way to avoid breaking that.

Dialog games generally work fine on Ozmoo. They can be a bit on the slow side on the slowest platforms, like the C64, but on most platforms they run well.

I7 games are a problem for two reasons - they typically need a bigger stack than what Ozmoo can provide on a memory-constrained platform such as C64, and they typically do so much processing that a 1-2 MHz CPU speed isn’t adequate. On the MEGA65 (40 MHz, 8+ MB of RAM) and the Commander X16 (8 MHz, 512+ KB of RAM), a few Inform 7 games work well.

2 Likes

Sorry, I guess I don’t understand why you would be building a new table when no non-default characters are used. I definitely wouldn’t do that. Even though the standard demands that the new table be supported, it’s understandable that 8-bit interpreters would find that difficult or impossible to handle.

Now, say a game uses several of the default characters and just one or two new ones (is this the scenario you’re envisioning?). In that case I’d just build the new table. The new characters aren’t going to work anyway on an interpreter that only supports the defaults, so I don’t think it matters much whether some of the characters work versus none.

If you build a game in say German using Ozmoo, it will need äöü etc to be in their default positions in the table.

If a game author then adds say a © to be used in the banner, it seems like a bad idea to make the game unplayable on Ozmoo for this.

Logistical reasons, really:

  • Right now, the compiler only makes one pass through the dictionary words to encode them; adding another one is difficult for someone who doesn’t fully understand the compiler architecture (i.e. me)
  • Characters only need to be added to the table if they’re used in dictionary words (if they’re just printed to the screen, that’s handled with @print_unicode), and the compiler doesn’t always get this right (see the sketch after this list)
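
To illustrate what I mean by that second point (this is just a sketch, not the actual compiler internals):

    def plan_character_handling(dict_word_chars, printed_chars, default_table):
        """dict_word_chars: characters occurring in dictionary words;
        printed_chars: characters occurring in output strings."""
        table_extras = []                        # appended after ZSCII 223
        for ch in sorted(set(dict_word_chars)):
            if ord(ch) > 126 and ch not in default_table:
                table_extras.append(ch)          # must be parseable, so it needs a slot
        print_unicode_only = {ch for ch in printed_chars
                              if ord(ch) > 126
                              and ch not in default_table
                              and ch not in dict_word_chars}
        return table_extras, print_unicode_only  # the latter never touch the table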

I think for now I’ll add extra characters to the end of the table, since that’s fairly easy to implement while not breaking Ozmoo compatibility. If 28 extra slots end up being insufficient, we can add a more sophisticated approach later.

Could the compiler not remember extra characters as they come up during encoding? Then decide whether or not to build a new table afterward?

Edit: Nevermind, I think I misunderstood what you were saying.

I think I get it now.

That’s already going to be the case for existing games that were designed to run on standard interpreters. But I can see how replicating the default characters in a translation table would preserve a measure of compatibility for Ozmoo as long as not many new characters are used.

Though if more than 28 non-standard characters are used, I’d rather see a build that follows the standard than one that errors out or subverts what the author intended.

Does Ozmoo produce a diagnostic if it encounters a game with a non-default translation table?

I suppose this also raises the question: how many non-standard characters does a typical game use? Not counting things like © here—this is specifically for characters that need to be parsed in user input. (In practice, Dialog isn’t always perfect at figuring out which words need to be parsed in user input, and occasionally has false positives. But we’re actively working on that issue.) Characters that are only ever output, like smart quotes and em-dashes, are handled via @print_unicode, which bypasses the ZSCII encoding completely.

Most European languages are pretty well covered by the standard set, and only need a couple more at most. For example, Polish needs ą, ę, ć, ń, ś, ź, ł, ż, for a total of eight. (Since this is only needed for input, not output, capitals aren’t necessary. Including them would help with text compression, but we’re going for usable here, not optimized.) Outside of Europe, Maori needs five, Hittite needs seven, Turkish needs four (and also a locale-aware interpreter to handle the casing!). All of those would fit into the end of the standard table.

Greek, with an entirely separate alphabet, needs 33. Russian also needs 33. Japanese, if you stick to exclusively kana, needs 80. So all three of those need more than 28, but Greek and Russian just barely.

My current inclination is to add extra characters to the end of the Unicode table and error out if more than 28 are needed (for games in German, Polish, or Maori), unless a command-line option is set to discard the default table (for games in Greek, Russian, or Japanese).
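
In other words, roughly this (a sketch only; the option name is hypothetical, and 97 is just the size of the 155-251 range):

    def choose_table(input_chars, default_table, discard_defaults=False):
        """input_chars: every non-ASCII character that must be parseable in input."""
        if not discard_defaults:
            extras = sorted(set(input_chars) - set(default_table))
            if len(extras) > 28:                 # only 224-251 are free
                raise SystemExit("more than 28 non-standard input characters; "
                                 "rerun with the discard-defaults option")
            return list(default_table) + extras
        table = sorted(set(input_chars))         # defaults thrown away entirely
        if len(table) > 97:                      # ZSCII 155-251 inclusive
            raise SystemExit("input characters won't fit in ZSCII 155-251")
        return table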

1 Like

It does not, but it would probably be a good idea to add this.