I’m currently reworking the print_unicode opcode in my interpreter and I’m trying to get a feel for what other interpreters do in some oddball cases.
For example: say a memory stream is active and the supplied operand is a Unicode ‘null’, ‘tab’, or ‘newline’ character. If these do not appear explicitly in the Unicode translation table, would they be translated to the equivalent ZSCII characters before being written to the stream, or would they produce question marks as in section 7.5.3 of the standard? In the ‘null’ case, the standard specifies that a ZSCII null produces no output in any stream, but it doesn’t explicitly say that a Unicode null translates to a ZSCII null, even though they do share the same underlying value. The same argument applies to ‘newline’ and ‘tab’. I realize it is unlikely, and not particularly useful, for someone to send these values to print_unicode; I’m just considering the possibility.
Following are some of the relevant sections of the standard I’ve been looking at:
[b]3.8.5.1
*** To define which characters are required, the Unicode (or ISO 10646-1) character set is used: characters are specified by unsigned 16-bit codes. These values agree with ISO 8859 Latin-1 in the range 0 to 255, and with ASCII and ZSCII in the range 32 to 126. The Unicode standard leaves a range of values, the Private Use Area, free: however, an Internet group called the ConScript Unicode Registry is organising a standard mapping of invented scripts (such as Klingon, or Tolkien’s Elvish) into the Private Use Area, and this should be considered part of the Unicode standard for Z-machine purposes.
7.5
*** Because of the print_unicode opcode, it is possible for arbitrary Unicode characters to be sent to the output streams: that is, for characters which are not in the ZSCII set at all, even in the “extra characters” range.
7.5.3
When printed to stream 3, Unicode characters should be converted to ZSCII if possible. If this is not possible, a question mark should be printed to stream 3.
[/b]
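To make the question concrete, here is a rough Python sketch of the two behaviours I’m weighing for stream 3. Everything in it is my own illustration rather than anything from the standard: the function name, the `map_controls` switch, and the three-entry sample table are all hypothetical, and mapping tab to ZSCII 9 assumes a version where that code is defined for output.

```python
QUESTION_MARK = ord("?")  # ZSCII 63

# Hypothetical sample of a Unicode translation table (Unicode -> ZSCII).
# Only three entries are shown; a real interpreter would hold the full table.
SAMPLE_EXTRA = {0x00E4: 155, 0x00F6: 156, 0x00FC: 157}


def unicode_to_zscii_for_stream3(code_point, extra_table, map_controls):
    """Return the ZSCII code to write to stream 3, or None for no output.

    map_controls=False: strict reading of 7.5.3, so anything outside
    32..126 that is not in the table becomes a question mark.
    map_controls=True: assumed lenient reading, where null, tab and
    newline are treated as their ZSCII counterparts.
    """
    if 32 <= code_point <= 126:
        return code_point  # Unicode, ASCII and ZSCII agree in this range
    if code_point in extra_table:
        return extra_table[code_point]
    if map_controls:
        if code_point == 0:
            return None  # ZSCII null produces no output
        if code_point == 9:
            return 9  # assumption: tab is defined for output in this version
        if code_point in (10, 13):
            return 13  # ZSCII newline is 13; treating LF the same is my guess
    return QUESTION_MARK


for policy in (False, True):
    codes = [unicode_to_zscii_for_stream3(c, SAMPLE_EXTRA, policy)
             for c in (0, 9, 10, 0x41, 0x00E4, 0x2603)]
    print("map_controls =", policy, "->", codes)
```

The only difference between the two policies is what happens to the three control values; printable ASCII, table entries, and untranslatable characters come out the same either way.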