[I6] Glulx vs Z-Machine character maps

The following code seems to have different behavior when compiled for Z-Machine and Glulx.

Array text -> "@oeuf";

[ main   key;
   if (text->0 == '@oe')
      print "OK";
   @read_char 1 -> key;
];

When compiled for Z-machine it prints OK; for Glulx I think it doesn’t.

I think it’s because

print (char) 220;

prints “œ” (character 220 of Table 2B, the ZSCII character table) in Z-machine, but “Ü” (character 220 in Unicode) in Glulx (at least with the I7 compiler).

Is that a bug? I’m guessing the second bit of code is just an incompatibility, possibly due to the different specs. As for the first one, maybe I’m just using an old compiler and this problem has been fixed; if it hasn’t, though, shouldn’t the Glulx compiler translate “@oe” to 339 (the Unicode value of that character) instead of 220?

Thanks for helping me diagnose the problem!

You must be using an old compiler. When I try this, I get:

line 1: Warning: Entry in ‘->’, ‘string’ or ‘buffer’ array not in range 0 to 255
line 6: Error: Expected an opcode name but found read_char

(Obviously you can’t use @read_char in a Glulx program; I assume you commented that out for the -G test.)

You’re correct that the translation of “@oe” is 339. But “Array text ->” defines a byte array, and a byte can’t store the value 339!

If you change it to “Array text -->”, you get a different error:

line 1: Error: Unicode characters beyond Latin-1 are not yet supported in Glulx array literals

This is a feature that I just never got around to adding, I guess. Traditionally we make Unicode character arrays on the fly, with PrintAnyToArray() or a similar routine.

Thanks zarf!
I don’t get the warning, but that makes sense. Indeed, when I look at the first character of the array I get “83”, which is actually 339 mod 256.
I guess it doesn’t matter too much: all the other ZSCII-compatible accented letters have Unicode values under 256, so there probably won’t be an error for them. The only bug will be that a Glulx game will write “le oeuf” or “le oeil” instead of “l’oeuf” or “l’oeil”.
(Do you want me to put that on Mantis just to have it logged somewhere, in case one day you have time for it? I know it’s probably very low-priority…)
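
For what it’s worth, the 83 reported above is what byte truncation would predict. Here is a minimal sketch, assuming the standard Inform 6 library is on the include path and the file is compiled with -G. The names demo_bytes, demo_words and Demo_Room are made up for illustration.

```inform6
! Sketch only: compile for Glulx, e.g.  inform -G demo.inf
! Assumes the standard Inform 6 library (Parser, VerbLib, Grammar).
Constant Story "Truncation demo";
Constant Headline "^A byte-array sketch.^";

Include "Parser";
Include "VerbLib";

Array demo_bytes -> 4;     ! byte array: each entry holds 0 to 255
Array demo_words --> 4;    ! word array: each entry is a full word

Object Demo_Room "Demo Room"
  with description "An empty room.",
  has  light;

[ Initialise;
    location = Demo_Room;
    demo_bytes->0 = 339;   ! byte store: only the low 8 bits are kept
    demo_words-->0 = 339;  ! word store: the value fits unchanged
    print "byte entry: ", demo_bytes->0, "^";
    print "word entry: ", demo_words-->0, "^";
];

Include "Grammar";
```

If byte stores keep only the low 8 bits, the first line should show 83 (339 mod 256) and the second 339.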

If needed, you could write

Array text --> 339 117 102; ! That’s “@oeuf”

Yeah, go ahead with the bug report.

I wonder if you’re not seeing that compiler error because of weird signed-ness issues. What I6 compiler version are you using?

I was wondering because we’re trying to fix all the cases for contraction (“le arbre” -> “l’arbre”) in French; for that we need to change LanguageContraction, which looks at the first letter of the word and returns 1 if it’s a vowel, and it also has to handle accented vowels and the like. So declaring the array like that wouldn’t work, since we’re looking at the printed text.

Nope, I can see that error message too. Sorry, I guess that code was me trying to make a minimal example to illustrate the bug. All the stuff I had looked at was in the context of LanguageContraction, and someone else on that forum posted code like that… I’ll try to include a better minimal example on Mantis.

Thanks for your help!

EDIT: bug report submitted.

I think we’ve been talking at cross purposes here.

What you want to do with LanguageContraction doesn’t require an array literal. (It may require the latest release of the I6 compiler. I’m pretty sure you’re using an old version.)

The PrefaceByArticle routine writes to a byte array and then calls LanguageContraction, so Unicode characters are always lost. (They could either get squashed mod 256 or converted to ‘?’, I think.) You could replace PrefaceByArticle with a version that writes to a word array, and then write a LanguageContraction that tests with --> instead of ->.

This would require a new global array to write to:

Array StorageForShortNameUni --> 250;

…and then a version of PrintAnyToArray which opened a Unicode stream rather than a byte stream.
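
A rough sketch of the word-array test described above. None of these assumptions come from the library: the routine name LanguageContractionUni is hypothetical, text is assumed to point at a word array whose first entry is the first character of the printed name as a Unicode code point, and the vowel list is only illustrative, not a complete set of French elision rules.

```inform6
! Hypothetical word-array variant of LanguageContraction.
! Assumes text-->0 holds the first character as a Unicode code point.
[ LanguageContractionUni text c;
    c = text-->0;
    switch (c) {
        'a', 'e', 'i', 'o', 'u',
        'A', 'E', 'I', 'O', 'U':
            return 1;    ! plain vowels
        224, 226, 232, 233, 234, 238, 244, 249, 251:
            return 1;    ! some accented vowels in Latin-1 (233 is e-acute)
        338, 339:
            return 1;    ! OE and oe ligatures (U+0152, U+0153), beyond Latin-1
    }
    return 0;
];
```

The numeric code points sidestep any question about how the compiler translates accented character constants. The values 338 and 339 are exactly the ones a byte array cannot hold.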

Yeah, I just saw your note on Mantis. Sorry!

Now though I see what you mean, and I see what would be required for LanguageContraction. I’ll try to make it work.

As for the Mantis bug, you probably just mean the case where the line “Array text --> "@oeuf";” produces “Error: Unicode characters beyond Latin-1 are not yet supported in Glulx array literals”? What’s the etiquette for Mantis: can I close the bug (how?) and open a new one? I probably need to replace the source code I gave with the code from the first post of this thread.

Again, sorry for not understanding what you meant :s