Which characters are allowed in Glk output?

Draconis · October 20, 2022, 12:46am

For Formatting Capture, I need an escape character to encode formatting instructions into a string. Since the encoding isn’t especially clever at the moment (making it difficult to escape the escape character itself), I want to use one that’s unlikely to ever be printed by an actual game.

My first thought was, of course, ESC (U+001B). But this threw an error in Git (or more likely whatever Glk library it’s using), saying I tried to print an unprintable character to an output stream in text mode.

For the Z-machine, the specification lays out exactly which codepoints are valid for input and for output. But I haven’t found anything like this in the Glk spec. Is there an official list of which codepoints can be printed in text mode? Or is it up to the implementation, which practically speaking will probably mean “anything not defined as a control character in Unicode”?

Draconis · October 20, 2022, 12:49am

P.S. I realize I should just bite the bullet and try to find a way to escape the escape character, in which case I can use something normal like the backslash. Inform just makes this extremely difficult. In the meantime, I’m fairly confident no author has ever used the spacing cedilla (U+00B8) in a Glulx game.

rileypb · October 20, 2022, 12:53am

Since you said that, my next game will feature an alien character named “¸¸¸¸” (you might want to increase your font size.)

Dannii · October 20, 2022, 12:58am

There is a Glk gestalt code which you can use to ask the interpreter which characters are safe to print.

But the thing is, I thought the JS interpreters just said that everything could be printed. Turns out, almost:

github.com/erkyrath/glkote/blob/7e0218cf35106c166cab616c32ecf59c2eeff7d2/glkapi.js#L4239-L4248

          case 3: // gestalt_CharOutput
              /* Same thing again. We assume that all printable characters,
                 as well as the placeholders for nonprintables, are one character
                 wide. */
              if ((val > 0x10FFFF) 
                  || (val >= 0 && val < 32) 
                  || (val >= 127 && val < 160)) {
                  if (arr)
                      arr[0] = 1;
                  return 0; // gestalt_CharOutput_CannotPrint

So it says it can’t print anything above U+10FFFF (the top of the Unicode range), nor the control codes. But this is incorrect, as the C0 block includes the newline character, which Glk does allow in output.
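
To make the newline problem concrete, here is a minimal sketch that restates the quoted condition as a standalone predicate. The name `glkoteSaysCannotPrint` is hypothetical and not part of GlkOte; only the condition itself is copied from the excerpt.

```javascript
// Hypothetical helper: restates the condition from the quoted
// gestalt_CharOutput case. Not a GlkOte function.
function glkoteSaysCannotPrint(val) {
    return (val > 0x10FFFF)
        || (val >= 0 && val < 32)
        || (val >= 127 && val < 160);
}

// Collect the C0 codes (0 to 31) that the check reports as unprintable.
const rejectedC0 = [];
for (let c = 0; c < 32; c++) {
    if (glkoteSaysCannotPrint(c)) rejectedC0.push(c);
}

// Newline (10) is among them, even though it is legal Glk output.
const newlineRejected = rejectedC0.includes(10);
```

Every C0 code ends up in `rejectedC0`, newline included, which is the discrepancy in question.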

But I assume you weren’t checking the gestalt, and that printing ESC actually resulted in an error? I’m not sure which part of GlkApi would do that, especially when printing to a memory stream, in which case all codes should be supported. As the spec notes “[You can write a disk file in text mode, but a memory stream is effectively always in binary mode.]” I’d say that being unable to print ESC to a memory stream (or a file stream in binary mode!) is an interpreter bug. Unfortunately it may be a common bug, and if it’s present in more than just GlkApi then we could be waiting years for printing ESC to be safe…

Draconis · October 20, 2022, 1:17am

Yeah, this extension was written very quickly just to see if the concept would work, so I didn’t bother checking the gestalt. Let me get the exact error it gave:

[** Programming error: tried to print (char) 27, which is not a valid Glk character code for output **]

Looking at this again, this looks like an I6 error message, not an interpreter one. And indeed, this is the error the veneer would give for calling RT__Err(33). And that’s called from RT__ChPrintC:

if (c<10 || (c>10 && c<32) || (c>126 && c<160))
  return RT__Err(33,c);
if (c>=0 && c<256)
  @streamchar c;
else
  @streamunichar c;

So this is a limitation imposed by I6, which forbids all characters from 0 to 31 (except for 10), as well as 127 to 159.
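
For reference, the same range test can be restated as a small predicate. This is a sketch in JavaScript rather than Inform 6, and `i6RejectsChar` is a hypothetical name; it only mirrors the condition quoted above.

```javascript
// Hypothetical mirror of the range test in RT__ChPrintC.
// Returns true when print (char) c would raise RT__Err(33).
function i6RejectsChar(c) {
    return c < 10 || (c > 10 && c < 32) || (c > 126 && c < 160);
}

const ESC = 27;
const examples = {
    esc: i6RejectsChar(ESC),      // control character
    newline: i6RejectsChar(10),   // the one C0 code that passes
    space: i6RejectsChar(32),     // first printable character
    del: i6RejectsChar(127),      // start of the second rejected range
    cedilla: i6RejectsChar(0xB8), // spacing cedilla, U+00B8
};
```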

Dannii · October 20, 2022, 1:20am

Oh, that’s great news then! Patching that function won’t be difficult at all.

zarf · October 20, 2022, 1:21am

Well, not to rain on the parade, but spec 2.2:

When you are sending text to a window, or to a file open in text mode, you can print any of the printable Latin-1 characters: 32 to 126, 160 to 255. You can also print the newline character (linefeed, control-J, decimal 10, hex 0x0A.)

It is not legal to print any other control characters (0 to 9, 11 to 31, 127 to 159). You may not print even common formatting characters such as tab (control-I), carriage return (control-M), or page break (control-L).

(The interpreter is not required to throw an error, however.)

Draconis · October 20, 2022, 1:22am

It can also just be circumvented, since this check only happens when using print (char) x! In other words, if I use @streamchar directly, I can just bypass all of this.

At which point the question is how the Glk library feels about that.

Draconis · October 20, 2022, 1:23am

Ah, but this doesn’t apply to memory streams, does it? Which actually makes it perfect for my purposes: I can be sure the escape character will never be printed normally.

(The basic idea is to make [italic type] and the like print an escape code instead when writing to a memory stream, and then turn those back into formatting instructions when reading from that memory buffer, so the escape character should never actually reach a window or file stream.)
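
The round trip described here might look roughly like the following. This is a sketch in JavaScript, not the extension's actual code: the one-letter style codes, the `encodeRuns` and `decodeRuns` names, and the run representation are all assumptions made for illustration.

```javascript
const ESC = "\u001B";

// Hypothetical style codes; the real extension may encode these differently.
const STYLE_CODES = { roman: "r", italic: "i", bold: "b" };
const CODE_STYLES = { r: "roman", i: "italic", b: "bold" };

// Flatten styled runs into one string, as if printing to a memory stream:
// each style change becomes ESC followed by a one-letter code.
function encodeRuns(runs) {
    return runs.map(run => ESC + STYLE_CODES[run.style] + run.text).join("");
}

// Read the captured buffer back, turning ESC codes into style changes.
// Assumes the text itself never contains ESC (there is no escaping).
function decodeRuns(buffer) {
    const runs = [];
    let style = "roman";
    let text = "";
    for (let i = 0; i < buffer.length; i++) {
        if (buffer[i] === ESC && i + 1 < buffer.length) {
            if (text) runs.push({ style, text });
            style = CODE_STYLES[buffer[++i]] || "roman";
            text = "";
        } else {
            text += buffer[i];
        }
    }
    if (text) runs.push({ style, text });
    return runs;
}

const captured = encodeRuns([
    { style: "roman", text: "You see " },
    { style: "italic", text: "nothing" },
    { style: "roman", text: " here." },
]);
const restored = decodeRuns(captured);
```

Each style change costs two extra characters in the buffer, so the captured string is somewhat longer than the visible text.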

Zed · October 20, 2022, 1:28am

It’s probably a terrible idea, but my UTF-32 hack could be tweaked to be UTF-16 like normal but with 16 bits of metadata per character.

Draconis · October 20, 2022, 1:29am

The only issue now is what code to use on the Z-machine, which allows exactly the same subset of characters in all output streams. But it looks like the Z-machine allows NUL in output, which I’m pretty sure nobody ever uses (cue Phil creating an alien character named 0x00 0x00 0x00 0x00). So that should be safe for my purposes.

zarf · October 20, 2022, 1:31am

Memory streams should be able to handle arbitrary bytes, yes. I forgot the use case you were going for!

Dannii · October 20, 2022, 1:32am

You’ll have to test it in various interpreters. This could mean that nulls are just completely ignored:

3.8.2.1: ZSCII code 0 (“null”) is defined for output but has no effect in any output stream. (It is also used as a value meaning “no character” when reporting terminating character codes, but is not formally defined for input.)

You could just decide to make the extension Glulx only. Memory streams in Z-Code are risky too because you can’t set a length limit, and these escape codes could make captured text much longer than authors expect.

Draconis · October 20, 2022, 1:40am

Ah, truly unfortunate. I just tested it and indeed character 0 is discarded in Bocfel.

On the other hand, I think I can stick ESC into the Unicode translation table, right? Which will effectively reserve a ZSCII value that will never be used by anyone else. The question is how to do that…I truly have no idea how the Unicode translation table is assembled by Inform 6.

Time for a new thread!