frotz character encoding problems (Linux)

lunasspecto · December 5, 2012, 10:46pm

I’ve been using the frotz package from the repositories for testing as I do some casual Inform 6 coding on my Linux Mint system. My only problem with it is that frotz seems to produce its output in a Western character encoding (ISO-8859-15), whereas my system’s default encoding is Unicode (UTF-8). I can change my terminal emulator’s encoding as necessary, but the real annoyance is that Emacs, which I have set up to interface with frotz via the inform-mode package, renders accented characters incorrectly, which throws the line wrapping off entirely. Does anyone have a solution for this? Can I change some system variable somewhere, or should I just compile frotz myself?

zarf · December 6, 2012, 1:52am

The nfrotz release, which is what I have installed, has a -U (utf8) command-line switch. I’m not sure what your version is.

Compiling it yourself should be very easy, anyhow.

lunasspecto · December 6, 2012, 2:03am

The frotz I have identifies itself as “FROTZ V2.43 curses interface.” It doesn’t have the -U command line option.

I’ll just compile nfrotz and see if I can swap it for what I’ve got without breaking emacs integration.

zarf · December 6, 2012, 6:45pm

Curses should be taking its encoding cue from the environment variables. Do you have $LANG or $LC_CTYPE set? “en_US.UTF-8” would be the appropriate value; that’s what I use, anyhow.
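
For anyone who wants to check what the C library actually derives from those variables, here is a minimal sketch in C. It is not taken from Frotz; the function names environment_codeset and codeset_is_utf8 are made up for illustration. It relies on the POSIX calls setlocale() and nl_langinfo(), where LC_ALL overrides LC_CTYPE, which in turn overrides LANG.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical helper: returns nonzero if a codeset name denotes UTF-8.
 * Some systems spell it "UTF-8", others "utf8", so compare loosely. */
int codeset_is_utf8(const char *codeset)
{
    return strcasecmp(codeset, "UTF-8") == 0 ||
           strcasecmp(codeset, "UTF8") == 0;
}

/* Hypothetical helper: adopts the character-type locale named by the
 * environment and returns its codeset, or NULL if that locale is not
 * installed on the system. */
const char *environment_codeset(void)
{
    /* "" tells the C library to consult LC_ALL, LC_CTYPE and LANG. */
    if (setlocale(LC_CTYPE, "") == NULL)
        return NULL;
    return nl_langinfo(CODESET);
}
```

If environment_codeset() returns something for which codeset_is_utf8() is false, a curses program started from that shell has no reason to emit UTF-8.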

RealNC · December 6, 2012, 7:57pm

I can confirm the same problem. The environment is set up correctly and other interpreters (Fizmo) work OK. Seems to be Frotz-specific. I don’t know whether this is actually supposed to work at all in Frotz, though. Does it even attempt to convert the game’s encoding to the current system encoding, or does it just use whatever the game’s encoding is?

zarf · December 6, 2012, 10:03pm

Z-code games always use ZSCII internally, possibly with Unicode extensions. The interpreter has to convert that to some other encoding when printing out game text.
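
As a rough illustration of that conversion step, here is a small sketch in C. It is not how Frotz does it. The function name zscii_to_unicode is invented, and the table is only an excerpt: the first seven entries (ZSCII 155 to 161) of the default Unicode translation table as I understand it from the Z-Machine Standard, so check it against the spec before relying on it. A story file may also supply its own table, which this sketch ignores.

```c
#include <stdint.h>

/* Hypothetical helper: maps a ZSCII output code to a Unicode code point.
 * Returns 0 for codes this excerpt does not cover. */
uint32_t zscii_to_unicode(unsigned int zscii)
{
    /* Excerpt of the default table (assumed): a-, o-, u-diaeresis,
     * their capitals, and sharp s, for ZSCII 155..161. */
    static const uint32_t extras[] = {
        0xE4, 0xF6, 0xFC, 0xC4, 0xD6, 0xDC, 0xDF
    };

    /* ZSCII 32..126 coincide with printable ASCII. */
    if (zscii >= 32 && zscii <= 126)
        return zscii;
    if (zscii >= 155 && zscii <= 161)
        return extras[zscii - 155];
    return 0; /* not covered by this sketch */
}
```

The resulting code point still has to be written out in whatever encoding the terminal expects, which is the part that goes wrong when the interpreter assumes Latin-1 and the terminal expects UTF-8.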

lunasspecto · December 7, 2012, 4:37am

I actually gave up on using an interpreter from within Emacs altogether and am now using Gargoyle frotz to test. Don’t know why I wasn’t doing that earlier.

DavidG · May 17, 2019, 9:36am

William Lash submitted a patch to add UTF-8 support to Frotz. See https://gitlab.com/DavidGriffith/frotz/commit/20cddb2182f95ce13d31e42ad9d7b734b4b93f84, which I merged in.

DavidG · May 17, 2019, 9:08pm

Following up, someone reported trouble with games using the Cyrillic alphabet. It’s being worked on.

borg323 · May 17, 2019, 9:33pm

If it is any help, here is the approach I took for jzip: https://github.com/borg323/jzip/blob/master/cursesio.c#L155
There is no explicit UTF-8 handling; I let curses do it all.

DavidG · May 17, 2019, 10:27pm

That’s the approach. Frotz uses ncursesw to do it but somehow Cyrillic isn’t working.

borg323 · May 17, 2019, 10:55pm

Unless I misunderstood, the current frotz code is not using the add_wch() function for Unicode characters, but is trying to compose UTF-8 sequences by hand and emit them with multiple addch() calls, where only a small subset of cases is handled.
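
To show what handling every case by hand would involve, here is a minimal general UTF-8 encoder in C. It is a sketch for comparison only, not Frotz code, and the name utf8_encode is made up. A code point needs between one and four bytes depending on its value, so any scheme that always emits exactly two bytes with a fixed choice of lead bytes cannot reach scripts such as Cyrillic.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encodes one Unicode code point as UTF-8.
 * out must have room for 4 bytes. Returns the number of bytes written,
 * or 0 if cp is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {                      /* 1 byte: plain ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: Latin-1, Cyrillic, ... */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 3 bytes: rest of the BMP */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes: supplementary planes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, U+00E9 (é) comes out as the bytes 0xc3 0xa9, while U+0416 (Ж) comes out as 0xd0 0x96. Even with a correct encoder, pushing the bytes through addch() one at a time leaves curses unaware of how wide the character is, which is why handing the wide character to add_wch() seems the cleaner route.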

DavidG · May 18, 2019, 2:03am

No. It uses add_wch(). See https://gitlab.com/DavidGriffith/frotz/commit/8b898570775a2387f04f1504d09f88391afa07f9.

borg323 · May 18, 2019, 8:21am

I only see get_wch() used. What I meant is something like:

diff --git a/src/curses/ux_text.c b/src/curses/ux_text.c
index 20a678c..8da316c 100644
--- a/src/curses/ux_text.c
+++ b/src/curses/ux_text.c
@@ -223,26 +223,15 @@ void os_display_char (zchar c)
            addch(c);
 #else
        } else {
-         // Looking at the UTF-8 table at
-         // https://www.utf8-chartable.de/unicode-utf8-table.pl
-         // Shows that characters from 0xa0-0xbf are encoded as
-         // 0xc2 0xa0-0xbf, and characters from 0xc0-0xff are
-         // encoded as 0xc3 0x80-0xbf
-         if ( c < 0xc0) {
-           addch(0xc2);
-           addch(c);
-#ifdef HANDLE_OE_DIPTHONG
-         } else if (c == 0xd6) {
-           addch(0xc5);
-           addch(0x92);
-         } else if (c == 0xf6) {
-           addch(0xc5);
-           addch(0x93);
-#endif /* HANDLE_OE_DIPTHONG */
-         } else {
-           addch(0xc3);
-           addch(c - 0x40);
-         }
+       cchar_t ch[2];
+       wchar_t wch[2];
+       attr_t attr;
+       short pair;
+       wch[0] = c;
+       wch[1] = 0;
+       attr_get( &attr, &pair, NULL );
+       setcchar( ch, wch, attr, pair, NULL );
+       add_wch( ch );
        }
 #endif /* USE_UTF8 */
        return;

I see now that this is not enough; it is just a step in the right direction. I haven’t looked at the code in depth, and this was simply something that stood out.

DavidG · May 18, 2019, 2:36pm

Could I get you to chime in on the issue at GitLab?