print_unicode

MikeG · October 14, 2011, 5:03pm

I’m currently reworking the print_unicode opcode in my interpreter and I’m trying to get a feel for what some other interpreters do in some oddball cases.

For example: say a memory stream is active and the supplied operand is a Unicode ‘null’, ‘tab’, or ‘newline’ character. Would these be translated to the equivalent ZSCII characters for writing to the stream, or would they produce question marks as in section 7.5.3 of the standard if they did not appear explicitly in the Unicode characters table? In the ‘null’ case, the standard specifies that a ZSCII null value produces no output in any stream, but it doesn’t explicitly say a Unicode null translates to a ZSCII null, even though they do share the same underlying value. The same argument applies for ‘newline’ and ‘tab’. I realize it is unlikely and not particularly useful for someone to send these values to print_unicode; I’m just considering the possibility.

Following are some of the relevant standard sections I’ve been looking at:

[b]3.8.5.1

*** To define which characters are required, the Unicode (or ISO 10646-1) character set is used: characters are specified by unsigned 16-bit codes. These values agree with ISO 8859 Latin-1 in the range 0 to 255, and with ASCII and ZSCII in the range 32 to 126. The Unicode standard leaves a range of values, the Private Use Area, free: however, an Internet group called the ConScript Unicode Registry is organising a standard mapping of invented scripts (such as Klingon, or Tolkien’s Elvish) into the Private Use Area, and this should be considered part of the Unicode standard for Z-machine purposes.

7.5

*** Because of the print_unicode opcode, it is possible for arbitrary Unicode characters to be sent to the output streams: that is, for characters which are not in the ZSCII set at all, even in the “extra characters” range.

7.5.3

When printed to stream 3, Unicode characters should be converted to ZSCII if possible. If this is not possible, a question mark should be printed to stream 3.

[/b]
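To make the 7.5.3 rule concrete, here is a minimal sketch in Python of a Unicode-to-ZSCII conversion with the question-mark fallback. The names `unicode_to_zscii` and `EXTRA_CHARS` are hypothetical, and the table lists only the first few entries of the default extra-characters table; a real interpreter would use the full default table or the story file's own Unicode translation table. Control codes get no special treatment here and simply fall through to '?', which is only one of the possible answers to the question above.

```python
# Hypothetical partial table: Unicode code point -> ZSCII code.
# Only the first few default "extra characters" are shown (assumption:
# a real interpreter loads the complete table).
EXTRA_CHARS = {
    0x00E4: 155,  # a with diaeresis
    0x00F6: 156,  # o with diaeresis
    0x00FC: 157,  # u with diaeresis
}


def unicode_to_zscii(code_point, table=EXTRA_CHARS):
    """Return the ZSCII code to write to stream 3 for a Unicode code point.

    Falls back to a question mark when no conversion is known (7.5.3).
    """
    # 3.8.5.1: Unicode agrees with ZSCII in the range 32 to 126.
    if 32 <= code_point <= 126:
        return code_point
    # Characters present in the extra-characters table.
    if code_point in table:
        return table[code_point]
    # Everything else, including control codes in this sketch.
    return ord("?")
```

For example, `unicode_to_zscii(0x41)` gives the ZSCII code for 'A' unchanged, while a code point with no table entry gives the code for '?'.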

Ron_Newcomb · October 14, 2011, 6:56pm

It’s possible for a game to print everything in Unicode (say, because the authoring tool’s library causes it to do so), so I would imagine that Unicode newlines, tabs, and such should produce said newlines, tabs, and such. Besides, why shouldn’t a newline, etc. be a newline just because Unicode said so? Any particular reason to be prejudiced against Unicode?

cas · October 14, 2011, 8:04pm

Version 1.1 of the standard has this to say: “Unicode characters U+0000 to U+001F and U+007F to U+009F are control codes, and must not be used.”

I take this to mean that it’s undefined behavior if such values are passed to @print_unicode. I’d think it makes sense to convert values like newline to the proper ZSCII equivalent, but I’d also think it’s not mandatory.

MikeG · October 14, 2011, 8:41pm

Interesting, I missed that one today. That seems to reinforce section 3.8.5.1, which says that the Unicode characters agree with ZSCII in the range 32 to 126.

zarf · October 14, 2011, 9:21pm

Seems weird – newline and tab are control codes, equally so in ASCII and Unicode.

But I see that Frotz prints question marks for 0-31. If the spec says it and Frotz does it, that’s pretty much a Security Council majority.
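The two behaviours discussed in this thread can be put side by side in a short Python sketch. The function name `control_code_to_zscii` and the `lenient` flag are hypothetical. The strict branch follows the question-mark behaviour described above for codes 0-31; the lenient branch is only one possible reading of "convert to the proper ZSCII equivalent", assuming ZSCII 13 for newline, ZSCII 9 for tab (an output code in Version 6 only), and no output for null.

```python
def control_code_to_zscii(code_point, lenient=False):
    """Map a Unicode control code (0-31) given to print_unicode.

    Returns a ZSCII code, or None to mean "produce no output".
    """
    if not 0 <= code_point <= 31:
        raise ValueError("not a C0 control code")

    if not lenient:
        # Strict policy: every control code becomes a question mark.
        return ord("?")

    # Lenient policy (assumption, not mandated by the standard).
    if code_point == 0x00:
        return None          # null: no output on any stream
    if code_point in (0x0A, 0x0D):
        return 13            # line feed or carriage return -> ZSCII newline
    if code_point == 0x09:
        return 9             # tab (ZSCII 9 is defined for output in V6 only)
    return ord("?")          # no sensible equivalent for the rest
```

Since the 1.1 text says these code points "must not be used", either policy seems defensible; the strict one has the advantage of matching what is reported for Frotz.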