For Formatting Capture, I need an escape character to encode formatting instructions into a string. Since the encoding isn’t especially clever at the moment (making it difficult to escape the escape character itself), I want to use one that’s unlikely to ever be printed by an actual game.
My first thought was, of course, ESC (U+001B). But this threw an error in Git (or more likely whatever Glk library it’s using), saying I tried to print an unprintable character to an output stream in text mode.
For the Z-machine, the specification lays out exactly which codepoints are valid for input and for output. But I haven’t found anything like this in the Glk spec. Is there an official list of which codepoints can be printed in text mode? Or is it up to the implementation, which practically speaking will probably mean “anything not defined as a control character in Unicode”?
P.S. I realize I should just bite the bullet and try to find a way to escape the escape character, in which case I can use something normal like the backslash. Inform just makes this extremely difficult. In the meantime, I’m fairly confident no author has ever used the spacing cedilla (U+00B8) in a Glulx game.
There is a Glk gestalt code which you can use to ask the interpreter which characters are safe to print.
But the thing is, I thought the JS interpreters just said that everything could be printed. Turns out, almost:
So it says it can’t print anything outside the basic multilingual plane of Unicode (even though it might be able to, if you printed the right Unicode surrogates), nor the control codes. But this is incorrect, as the C0 block includes supported whitespace characters such as the newline character.
But I assume you weren’t checking the gestalt, and printing ESC actually resulted in an error? I’m not sure which part of GlkApi would do that, especially when printing to a memory stream, in which case all codes should be supported. As the spec notes “[You can write a disk file in text mode, but a memory stream is effectively always in binary mode.]” I’d say that being unable to print ESC to a memory stream (or a file stream in binary mode!) is an interpreter bug. Unfortunately it may be a common bug, and if it’s present in more than just GlkApi then we could be waiting years for printing ESC to be safe…
When you are sending text to a window, or to a file open in text mode, you can print any of the printable Latin-1 characters: 32 to 126, 160 to 255. You can also print the newline character (linefeed, control-J, decimal 10, hex 0x0A.)
It is not legal to print any other control characters (0 to 9, 11 to 31, 127 to 159). You may not print even common formatting characters such as tab (control-I), carriage return (control-M), or page break (control-L).
(The interpreter is not required to throw an error, however.)
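To make the quoted rule concrete, here is a minimal Python sketch of it. The function name is made up for illustration, and it only models the Latin-1 ranges quoted above; anything above 255 is outside that passage (the gestalt check is what covers those), so the sketch simply reports it as not guaranteed.

```python
def is_guaranteed_printable(codepoint: int) -> bool:
    """Return True if the quoted passage allows this code in a window
    or a text-mode file stream.

    Hypothetical helper: it covers only the Latin-1 rule quoted above.
    Codepoints above 255 are not addressed by that passage, so they are
    reported as "not guaranteed" rather than "illegal".
    """
    if codepoint == 0x0A:  # newline is the one legal control character
        return True
    return 32 <= codepoint <= 126 or 160 <= codepoint <= 255


ESC = 0x1B              # the escape character from the first post
SPACING_CEDILLA = 0xB8  # the fallback character from the first post
TAB = 0x09

for name, cp in [("ESC", ESC), ("cedilla", SPACING_CEDILLA), ("tab", TAB)]:
    print(f"{name} (U+{cp:04X}):", is_guaranteed_printable(cp))
```

By this rule ESC and tab are rejected for window and text-mode output, while the spacing cedilla is an ordinary printable character.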
Ah, but this doesn’t apply to memory streams, does it? Which actually makes it perfect for my purposes: I can be sure the escape character will never be printed normally.
(The basic idea is to make [italic type] and the like print an escape code instead when writing to a memory stream, and then turn those back into formatting instructions when reading from that memory buffer, so the escape character should never actually reach a window or file stream.)
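A rough Python sketch of that round trip, just to show the shape of the idea. This is not the extension's actual code: the one-letter codes, the piece tuples, and the function names are all invented for illustration, and like the current encoding it has no way to escape the escape character itself.

```python
ESC = "\x1b"

# Hypothetical one-character codes for formatting instructions.
CODES = {"italic": "i", "bold": "b", "roman": "r"}
NAMES = {code: name for name, code in CODES.items()}


def encode(pieces):
    """Flatten ("text", str) and ("format", name) pieces into one string,
    standing in for what gets written to the memory stream."""
    out = []
    for kind, value in pieces:
        if kind == "format":
            out.append(ESC + CODES[value])
        else:
            if ESC in value:
                # No escaping scheme yet, so a literal ESC cannot be stored.
                raise ValueError("text contains the escape character")
            out.append(value)
    return "".join(out)


def decode(captured):
    """Turn a captured string back into text and formatting pieces."""
    pieces, text, i = [], [], 0
    while i < len(captured):
        ch = captured[i]
        if ch == ESC:
            if i + 1 >= len(captured):
                raise ValueError("dangling escape at end of buffer")
            if text:
                pieces.append(("text", "".join(text)))
                text = []
            pieces.append(("format", NAMES[captured[i + 1]]))
            i += 2
        else:
            text.append(ch)
            i += 1
    if text:
        pieces.append(("text", "".join(text)))
    return pieces


original = [("text", "You see a "), ("format", "italic"),
            ("text", "lamp"), ("format", "roman"), ("text", ".")]
assert decode(encode(original)) == original
```

Each formatting instruction costs two characters in the buffer here, so captured text ends up longer than what would be displayed.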
The only issue now is what code to use on the Z-machine, which allows exactly the same subset of characters in all output streams. But it looks like the Z-machine allows NUL in output, which I’m pretty sure nobody ever uses (cue Phil creating an alien character named 0x00 0x00 0x00 0x00). So that should be safe for my purposes.
You’ll have to test it in various interpreters. This could mean that nulls are just completely ignored:
3.8.2.1: ZSCII code 0 (“null”) is defined for output but has no effect in any output stream. (It is also used as a value meaning “no character” when reporting terminating character codes, but is not formally defined for input.)
You could just decide to make the extension Glulx only. Memory streams in Z-Code are risky too because you can’t set a length limit, and these escape codes could make captured text much longer than authors expect.
Ah, truly unfortunate. I just tested it and indeed character 0 is discarded in Bocfel.
On the other hand, I think I can stick ESC into the Unicode translation table, right? Which will effectively reserve a ZSCII value that will never be used by anyone else. The question is how to do that…I truly have no idea how the Unicode translation table is assembled by Inform 6.