The Z-Machine and @check_unicode

I ran a quick test file on Gargoyle, Windows Frotz, DOS Frotz and Parchment, and they all give at least some false information as to the availability of Unicode characters.

Are there any Z-Machine interpreters out there which actually implement @check_unicode as per the specification? Is it even remotely plausible for modern interpreters to do this?

Windows Frotz does try reasonably hard to get this right. Without knowing what you think is wrong, it’s hard to comment any further.

Well, for instance (on my computer at least) my test file on Windows Frotz comes out with

This obviously displays perfectly well, but the interpreter tells the game that it won’t.

Does Windows Frotz hardcode which characters are available for output? If so, surely that breaks the moment the user changes the font.

If I remember correctly, Parchment says everything is available for input and output, even control codes. DOS Frotz claims to handle every valid character up to 255 (which all interpreters are supposed to do) but actually prints many of the non-ASCII characters incorrectly. Gargoyle printed some characters that it had reported as available as “?”.

If a glyph is not available in a font, Gargoyle shows a question mark rather than finding a fallback character. Bocfel (the default Z-machine interpreter in the latest Gargoyle) can print any Unicode character, assuming the Glk implementation in use supports it, and that the glk_put_char_uni() function works properly. Or, if Bocfel’s compiled in non-Glk mode, it can print any Unicode character if the terminal supports UTF-8.

@check_unicode in Bocfel is implemented as follows:

  • Characters 0-31 and 127-159 are considered invalid for input and output (standard 1.1: “Unicode characters U+0000 to U+001F and U+007F to U+009F are control codes, and must not be used.”)
  • All other characters are valid for output, unless the user has disabled Unicode, in which case characters >255 are invalid.
  • Characters in the Unicode translation table are valid for input; all others are not. If Unicode support is disabled, no characters >255 are considered part of the table.
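
To make those three rules concrete, here is a rough Python sketch. It is written only from the description above, not from Bocfel’s source, so the function name, the `translation_table` set and the `unicode_enabled` flag are all assumptions for illustration. The result uses the layout from the Z-Machine Standard: bit 0 for output, bit 1 for input.

```python
OUTPUT_OK = 0x1  # bit 0: the character can be printed
INPUT_OK = 0x2   # bit 1: the character can be received from the keyboard


def is_control(code):
    """U+0000 to U+001F and U+007F to U+009F are control codes."""
    return code <= 0x1F or 0x7F <= code <= 0x9F


def check_unicode(code, translation_table, unicode_enabled=True):
    """Hypothetical model of the three rules listed above.

    translation_table is assumed to be a set of code points taken from
    the story file's Unicode translation table.
    """
    if is_control(code):
        return 0  # rule 1: never valid for input or output

    # With Unicode disabled, anything above Latin-1 is treated as absent.
    allowed = unicode_enabled or code <= 255

    result = 0
    if allowed:
        result |= OUTPUT_OK  # rule 2
    if allowed and code in translation_table:
        result |= INPUT_OK   # rule 3
    return result
```

For example, with a table containing only U+00E4, `check_unicode(0xE4, {0xE4})` reports both bits, while `check_unicode(0x0414, set())` reports output only.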

I think this follows the specification properly (although I suppose technically if Unicode is disabled, it might be wrong to consider anything valid; however, I think “decaying” to Latin-1 is a good property, especially since the default Unicode translation table is mainly Latin-1).

No, it queries the operating system (see TextOutput::CanOutput() in the source code). The source for the “Unicode.z5” example that comes with Windows Frotz demonstrates this - when printing out the tables of characters, it uses the @check_unicode opcode to check whether a character can be output, and if not, outputs a space instead.

Some things to consider:

  • What does Unicode.z5 display on your computer? If it looks alright but your test code does not, there may be a problem with your test code.
  • What version of Windows are you running? Old Windows 9X versions have very limited support for querying fonts and printing Unicode text. On these, Windows Frotz does what it can, but the result may not be completely accurate.

(By the way, Windows Frotz always reports characters (except control characters) as available for input - with a virtual keyboard program, anything could be generated. So that aspect of @check_unicode is not very useful.)

I’m running Windows 7. Unicode.z5 does indeed skip those four characters which I listed, even though my test program shows they can be output successfully. I’m not sure how this is possible.

Despite this problem, Windows Frotz seems to have the most accurate implementation of @check_unicode of any of the interpreters I’ve tried. I never could figure out how to check for a character’s availability for the Z-Machine interpreter I was writing in Python.

It makes me wonder if it might be a good idea for the interpreter to be able to return a value that tells the game it just doesn’t know if it can print a character or not. Whilst that wouldn’t do much to help a game decide if a given character will work, it would at least mean that interpreters wouldn’t have to lie to the game. I’m not sure on this one. I’d rather @check_unicode actually worked properly in all standard interpreters, but I’m not sure that’s reasonable.

Now that I look, I can see the same behaviour here. Hmmm, it may be that there is a bug in my code. I will investigate.

The trouble with that is that it is hard to see what a game is going to do with such information (assuming that anyone writes a game that actually does these tests, which is in practice unlikely).

The simplistic notion is that the game will test the availability of “Д”, find that it’s not available, decide that Cyrillic text cannot be displayed, and switch to a Latin transliteration.
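
As a rough illustration of that probe-and-fall-back idea, here is a Python sketch of the game-side logic. It is not Inform or Z-code; `render`, `TRANSLIT` and the `check_unicode` callable are made-up names, and the transliteration table is deliberately tiny.

```python
OUTPUT_OK = 0x1  # bit 0 of the @check_unicode result: character can be printed

# Hypothetical two-entry transliteration table, for illustration only.
TRANSLIT = {"Д": "D", "а": "a"}


def render(text, check_unicode, probe="Д"):
    """Return text unchanged if the probe character is reported printable,
    otherwise return a Latin transliteration of it."""
    if check_unicode(ord(probe)) & OUTPUT_OK:
        return text
    return "".join(TRANSLIT.get(ch, ch) for ch in text)
```

The whole decision hangs on a single probe character, which is why an unreliable answer from the interpreter makes the approach fragile.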

I agree that in practice game authors will not do this, and if they try, interpreters will not give them reliable answers.

Well… I found the problem with @check_unicode because I am actually using it in a game. I guess my thinking is that if @check_unicode isn’t going to be useful, we shouldn’t have the standard pretending that it is. I guess a game that bothers to check is going to just print the characters anyway and hope for the best (like most games probably do anyway, without using @check_unicode).

Actually, what I’m assuming will happen is that changing the behaviour of @check_unicode will dissuade those game writers who may think to use it from doing so.

It’s probably not particularly important, it’s just bugging me that my current choices for my interpreter seem to be:

  • just lie and hope for the best
  • deny access to all Unicode characters outside Latin-1
  • hardcode which characters are available and don’t let the user change the font
  • try to write up some code that picks apart the actual font file to find out what characters are available
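
For what it’s worth, the last option might be shaped something like the Python sketch below. It assumes the hard part (getting the set of code points the current font covers, whether from the font file or from the operating system) is already done elsewhere; `FontCoverage` and its methods are hypothetical names, not part of any existing interpreter.

```python
class FontCoverage:
    """Tracks which code points the currently selected font can draw.

    How the code points are obtained is deliberately left out; this
    sketch only shows that the answer has to follow the current font.
    """

    def __init__(self, code_points):
        self._code_points = frozenset(code_points)

    def set_font(self, code_points):
        # Call this whenever the user changes the font, so that later
        # @check_unicode answers reflect the new font, not the old one.
        self._code_points = frozenset(code_points)

    def can_output(self, code):
        return code in self._code_points
```

The point of `set_font` is that nothing is hardcoded: the same character can go from unavailable to available the moment the font changes.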

Like I say, not really hugely important, it’s just bugging me.

Perhaps the best we can manage is a small re-wording of the standard. Possibly @check_unicode should be rephrased as saying that it returns the output bit set if it is possible that the character can be output, and only returns without the output bit set if it is certain that it cannot. This at least is something that a) can be implemented in practice, and b) returns useful information to any game that asks for it.
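
In code, the proposed wording amounts to treating “don’t know” the same as “yes”. A minimal Python sketch, with the three-way `Coverage` type and the `output_bit` name invented for illustration:

```python
from enum import Enum

OUTPUT_OK = 0x1  # bit 0 of the @check_unicode result


class Coverage(Enum):
    """What the interpreter knows about printing one character."""
    YES = 1      # known to be printable
    NO = 2       # known not to be printable
    UNKNOWN = 3  # the interpreter cannot tell


def output_bit(coverage):
    """Clear the output bit only when output is certainly impossible."""
    return 0 if coverage is Coverage.NO else OUTPUT_OK
```

An interpreter that cannot inspect its fonts would then report `Coverage.UNKNOWN` for everything except control codes, and the game would still see the output bit set.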

That actually works better than the idea I had. It also has the bonus of being basically what interpreters are doing already.

That sounds like a sensible change.

I now have a fix for this, though I can’t see why Windows seems to get it wrong in the first place. Sigh. The next release of Windows Frotz will include this.

I have a couple of Glulx games that spit out the message “Run-time problem P52: This interpreter does not support Unicode.” My own test code, generated from Inform 7 6M62, doesn’t produce this message and just prints question marks (???) where the Unicode output would normally be. Is the story file itself generating this message?

If I build the story Pogoman GO! from source: … e/ - it spits out that message “Run-time problem P52: This interpreter does not support Unicode.” in the opening messages. How is that being generated?

We don’t usually make a big thing of thread necromancy in these parts, but can you repost this as a new question in the Inform section? This thread is two and a half years old, and you are not asking about the Z-machine at all.

Can someone please provide an Inform 7 code sample that compiles with Inform 7 6M62 and uses @check_unicode? A Z-machine example is fine, as some of the issues I am dealing with are at the Glk layer and are common to both the Glulx and z8 virtual machine runtimes.