Zcharacter tables, z3, interpreters

mulehollandaise · September 10, 2020, 6:25pm

(I’m messing with dark magic here…)

I’m trying to determine whether z3 files can support accented characters. At first I was hopeful, because:

the page on the Zcharacter directive says nothing about Z-machine v3 (granted, nobody uses v3)
when I try “Zcharacter '@^e'; Zcharacter '@^a';” and compile to z3, the compiler doesn’t complain.
Gargoyle displays it fine, but most retro interpreters don’t: for è and à they print a * and a ) instead (except on the Amstrad CPC, which somehow displays 1/2 and 3/4?)

When I try to redefine the whole character table with
Zcharacter
"abcdefghijklmnopqrstuvwxyz"
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"0123456789-',.;:@'e@`a@`e@`u@^e@^i@^u";
(Roger Firth’s table), even my (old) Gargoyle gets everything messed up (all the punctuation is wrong), and so do the other interpreters; this hints at a hardcoded table.

Frotz (like GBAFrotz) seems to handle the punctuation much better, but my slightly outdated version doesn’t support accents (they simply disappear in the GBA version). There is also a bug in the story headline display: a period in the headline is replaced by a 9, in both versions. Should I report this?
(The ZXZVM interpreter for the ZX Spectrum +3 also handles punctuation well, but interestingly it has the same headline-period bug as Frotz; elsewhere, periods display fine in both ZXZVM and Frotz.)

Anyway, the Z-Machine Standard seems to say that the table is fixed in versions 2 to 4 of the Z-machine: see section 3 of the Standard (§3.5.3 and §3.5.5). So why does the compiler let me do it? Is it a bug?
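To make the fixed-table point concrete, here is a minimal Python sketch of how a v3 interpreter turns z-characters into text, using the alphabet rows from §3.5.3 of the Standard as constants. The function name `decode_v3` is made up for illustration; abbreviations (z-characters 1 to 3) are deliberately left out, and ZSCII escapes are only mapped for printable ASCII.

```python
# Fixed alphabet table for Z-machine versions 2-4 (Standard 1.1, §3.5.3).
# Each row gives the characters for z-characters 6 to 31.
A0 = "abcdefghijklmnopqrstuvwxyz"
A1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# In A2, z-character 6 is the ZSCII escape and 7 is a newline, so the first
# two slots are placeholders that are never looked up.
A2 = "  0123456789.,!?_#'\"/\\-:()"
ROWS = (A0, A1, A2)


def decode_v3(zchars):
    """Decode a list of v3 z-characters (abbreviations not handled)."""
    out = []
    alphabet = 0  # v3 shifts (z-chars 4 and 5) affect only the next character
    i = 0
    while i < len(zchars):
        z = zchars[i]
        if z == 0:
            out.append(" ")
            alphabet = 0
        elif z in (1, 2, 3):
            raise NotImplementedError("abbreviations are outside this sketch")
        elif z == 4:
            alphabet = 1
        elif z == 5:
            alphabet = 2
        elif alphabet == 2 and z == 6:
            # 10-bit ZSCII escape: the next two z-chars are the top and
            # bottom 5 bits of the code.
            if i + 2 >= len(zchars):
                break  # incomplete escape, e.g. in a truncated dictionary word
            code = (zchars[i + 1] << 5) | zchars[i + 2]
            # Only printable ASCII is mapped here; other codes are shown raw.
            out.append(chr(code) if 32 <= code <= 126 else f"[ZSCII {code}]")
            alphabet = 0
            i += 2
        elif alphabet == 2 and z == 7:
            out.append("\n")
            alphabet = 0
        else:
            out.append(ROWS[alphabet][z - 6])
            alphabet = 0
        i += 1
    return "".join(out)
```

Since the three rows are constants, nothing a compiler writes elsewhere in a v3 file can change what this decoder prints.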

There’s also the matter of the upper ZSCII range, where accented characters end up with the -C1 (Latin-1) setting; that works fine in z5. But perhaps not all interpreters support it? (At least old z3 interpreters probably ignored it? It’s not an Inform 6 thing, is it?)
(And then there’s the question of whether the system itself can display accents at all, but that’s a separate issue; I’d just like to figure out the code/interpreter side.)

Thanks for your help understanding all this!

zarf · September 10, 2020, 7:01pm

Probably! As far as I know, v3 support went untested from Inform 6.0.0 through 6.3.3, and that is the period when the Zcharacter directive was added. So anything could be happening in there.

jcompton · September 10, 2020, 7:32pm

Ozmoo supports accented characters in z3; you might want to look at how they do it.

Don’t count it out yet. Between ZILF and PunyInform, there’s probably been more .z3 output over the past 18 months than over the preceding 18 years.

mulehollandaise · September 10, 2020, 7:41pm

I’m not counting it out; I’m one of those PunyInform users! But historically, barely anyone has bothered with that format.

Yes, Ozmoo requires loading a charset (or, uh, something; unfortunately I can’t quite remember…) but in theory it supports games in French. I also tested ZXZVM (which supports v3, v5, v8 and accented characters), and the accents show up fine, although there is a bug with the headline. (I’ve updated the original post to reflect this.)

To be honest, since z3 wasn’t used much outside of Infocom games, I expect to be in an awkward spot here. Some interpreters will use a fixed table (presumably because Infocom never changed it and the Standard codified it that way), and others won’t, presumably because their code for reading the table works fine in z5/z8 and nobody wrote a version check to skip it in z3.

Probably the answer is “it’s in the Standard, so it has to be a fixed table”, because “let’s change the Standard and make every z3 interpreter wrong” is not going to happen. But the I6 compiler allows it and some interpreters still understand it, so it might be worth leaving things as they are on the compiler side, with a heavy warning: “interpreters aren’t supposed to do these substitutions in v3, but some do, and you might be able to make it work on those systems provided you have the right charset loaded”…

mulehollandaise · September 10, 2020, 8:01pm

Actually, quite a few Z-machine implementations (ZXZVM, pinforic, which is based on pinfocom, Frotz and GBAFrotz) share the same behavior: if I replace the whole Z-character table with the one from Roger Firth’s article, the punctuation doesn’t get messed up, except that a period in the headline is replaced by a 9. I’m starting to wonder whether it’s a compiler problem. Either way, the rest of the discussion still stands: what the Standard says we should do, and whether anything should be done at all.

fredrik · September 10, 2020, 9:00pm

The compiler can add the custom alphabet table to the story file all it wants. There is no place in the header of a z3 file to point out where it is, so the interpreter won’t find it.
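In interpreter terms, that comes down to a version check before reading the word at $34 (listed as a v5+ field in the header table of §11 of the Standard). A minimal sketch, assuming the story file is already loaded as bytes; the helper names are hypothetical and not taken from any particular interpreter:

```python
import struct

# Default (fixed) alphabet rows, as in Standard §3.5.3; the first two A2
# slots stand for the escape (z-char 6) and newline (z-char 7).
DEFAULT_ROWS = [
    "abcdefghijklmnopqrstuvwxyz",
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    "  0123456789.,!?_#'\"/\\-:()",
]


def alphabet_table_address(story: bytes) -> int:
    """Byte address of a custom alphabet table, or 0 to use the default."""
    version = story[0]
    if version < 5:
        # V1-4 headers define no alphabet-table field, so whatever a compiler
        # left in the word at $34 must not be trusted.
        return 0
    return struct.unpack_from(">H", story, 0x34)[0]


def load_alphabets(story: bytes) -> list:
    address = alphabet_table_address(story)
    if address == 0:
        return DEFAULT_ROWS
    raw = story[address:address + 78]  # 3 rows x 26 ZSCII codes
    # Sketch assumption: the custom table contains only printable ASCII.
    return [raw[r * 26:(r + 1) * 26].decode("ascii") for r in range(3)]
```

An interpreter that skips the `version < 5` test would pick up whatever table the compiler left behind, which may be one reason different v3 interpreters disagree.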

You can use accented characters even without a custom alphabet table, but it means each accented character will take up three character positions. For normal text, this just means the text takes up a bit more memory. For dictionary words, it means the dictionary resolution decreases from six letters to four, for words which have one accented letter. Still, this doesn’t have to be a disaster either.
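Counting it out against the Standard as I read it (§3.4 for the escape, §3.7 for dictionary words): in v3, a character outside the fixed alphabets is written as a shift to A2 (z-char 5), the escape (z-char 6), then the top and bottom five bits of its ZSCII code. That is four z-characters in total, three more than a plain lowercase letter, and since a v3 dictionary entry holds six z-characters (4 bytes), a word starting with an accented letter keeps about three letters. The sketch below is illustrative only: the function names are hypothetical, and the ZSCII codes are a French subset of the Standard's default extra-character table, assumed to be what the compiler uses under default settings.

```python
# Fixed v3 alphabets (Standard §3.5.3); A2's first two slots are placeholders
# for the escape (z-char 6) and newline (z-char 7).
A0 = "abcdefghijklmnopqrstuvwxyz"
A1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
A2 = "  0123456789.,!?_#'\"/\\-:()"

# Subset of the default ZSCII extra characters (Standard §3.8),
# limited to the French letters mentioned in this thread.
EXTRA = {"é": 170, "à": 181, "è": 182, "ù": 185, "ê": 192, "î": 193, "û": 195}


def encode_v3(text):
    """Encode text as a list of v3 z-characters (no abbreviations)."""
    zchars = []
    for ch in text:
        if ch == " ":
            zchars.append(0)
        elif ch in A0:
            zchars.append(A0.index(ch) + 6)
        elif ch in A1:
            zchars += [4, A1.index(ch) + 6]  # shift to A1 for one character
        elif ch in A2[2:]:
            zchars += [5, A2.index(ch, 2) + 6]  # shift to A2 for one character
        elif ch in EXTRA or 32 <= ord(ch) <= 126:
            code = EXTRA.get(ch, ord(ch))
            # Shift to A2, ZSCII escape, then top and bottom 5 bits: 4 z-chars.
            zchars += [5, 6, code >> 5, code & 0x1F]
        else:
            raise ValueError(f"no ZSCII code for {ch!r} in this sketch")
    return zchars


def dictionary_entry_v3(word):
    """A v3 dictionary entry holds exactly 6 z-chars: pad with 5s, then truncate."""
    return (encode_v3(word) + [5] * 6)[:6]
```

For example, `dictionary_entry_v3("eclair")` keeps all six letters, while for "éclair" the entry stops after "écl".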

mulehollandaise · September 10, 2020, 10:10pm

Ooh, you’re right! I had the wrong idea altogether!!
I can write @'e in the source code and it works without complaint. I had a problem with a literal “é” in the source, which doesn’t normally happen to me, but I think it’s because I messed up my source file’s encoding…

So, to sum up:

versions 2-4 cannot have their alphabet table replaced, because there is no header field saying where to find it, whereas version 5 has one
the compiler adds that table even in version 3 (so how are the letters encoded then? According to the Standard’s table, or does the compiler mistakenly follow the Zcharacter directive in z3 too?)
some interpreters supporting both version 3 and version 5 still manage to find that table (because it is always stored in the same place?), but others ignore it, hence the difference
ZSCII values 155 to 251 are blank, and Inform populates them with characters (Latin-1 characters by default), even in version 3
but since Infocom didn’t use these values, old interpreters (including Infocom’s own) tend not to support them? (Also, the underlying systems might not have the accented letters.)
(modern) interpreters have the Latin-1 ZSCII table built in by default, but if one wants to use Latin-2, the compiler puts a translation table in the story file; does this break in v3?
when the ZXZVM interpreter says “supports accented characters but no unicode support”, it means that it has the Latin-1 table built in but cannot change it, even if the I6 compiler wrote a Unicode translation table (for, say, Latin-2)
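On the Latin-2 question, my reading of the Standard is that a custom Unicode translation table is reached through the header extension table (header word $36, extension word 3), and the header extension only exists from v5 on, so a v3 interpreter following the Standard would fall back to the default mapping. A sketch of that lookup, with hypothetical function names and only a French subset of the default extra-character table:

```python
import struct

# Subset of the default ZSCII extra-character mapping (Standard §3.8),
# used when a story file supplies no Unicode translation table.
DEFAULT_EXTRA = {170: "é", 181: "à", 182: "è", 185: "ù", 192: "ê", 193: "î", 195: "û"}


def word(story, addr):
    """Read a big-endian 16-bit word."""
    return struct.unpack_from(">H", story, addr)[0]


def unicode_table_address(story):
    """Address of the Unicode translation table, or 0 if there is none."""
    if story[0] < 5:
        return 0  # the header extension ($36) only exists from v5 on
    ext = word(story, 0x36)
    if ext == 0 or word(story, ext) < 3:
        return 0  # no extension table, or too short to hold word 3
    return word(story, ext + 6)  # extension word 3


def zscii_extra_to_unicode(story, code):
    """Map a ZSCII code in 155-251 to a Unicode string ("?" if undefined)."""
    table = unicode_table_address(story)
    if table:
        count = story[table]  # one byte N, then N big-endian words
        index = code - 155
        if 0 <= index < count:
            return chr(word(story, table + 1 + 2 * index))
        return "?"
    return DEFAULT_EXTRA.get(code, "?")
```

With a table present, ZSCII 155 onwards follows that table instead of the default one, which is how a Latin-2 letter could be reached in v5; in v3 the same bytes are simply never consulted.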

I feel like this makes more sense and that I learnt something, but how wrong am I now?

zarf · September 10, 2020, 10:43pm

Just to add to the fun, the I6 compiler writes the alphabet table address into the header regardless of Z-code version. The header extension address too. As you say, some interpreters must be checking those fields regardless of version.

Should this be fixed? (That is, should we update the compiler to store zero values in those header fields in V3/4?)
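Zeroing those fields for V3/4 could be tried out as a post-processing step before touching the compiler. A hedged Python sketch: `clear_v5_only_header_fields` is a hypothetical helper, not part of Inform 6, and a real fix would live in the compiler's C source instead.

```python
def clear_v5_only_header_fields(story: bytearray) -> bool:
    """Zero the alphabet-table ($34) and header-extension ($36) words in a
    v1-4 story file. Returns True if either word was non-zero."""
    if story[0] >= 5:
        return False  # both fields are meaningful from v5 on; leave them alone
    changed = any(story[0x34:0x38])
    story[0x34:0x38] = bytes(4)
    # The file checksum at $1C covers bytes from $40 onwards, so editing
    # these header words does not invalidate it.
    return changed
```

Running something like this over existing z3 files would show whether interpreters that currently "find" the custom table fall back to the fixed one.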

mulehollandaise · September 10, 2020, 11:08pm

But then what does the compiler use to represent the letters? Does it ignore or take into account the table in v3? (Ignores, right?)

If the compiler uses the table in v5 and includes it for the terp, but doesn’t use it in v3 and includes it anyway, that’s kind of confusing for the interpreter (even though it’s kind of the terp’s fault for not checking the version).

And shouldn’t the compiler ignore Zcharacter directives when compiling with the v3 switch, and print a warning or an error saying that v3 has a fixed character table?

zarf · September 10, 2020, 11:49pm

I think so? But I think that people more familiar with V3 should come to a set of decisions and then file a compiler request. (Issues · DavidKinder/Inform6 · GitHub)