unicode problem

#1

Hi,

I have started translating an IF game into my language, and I'm a beginner at this. My language has some extra characters. As far as I know, Unicode characters are not a problem in the newer IF compilers and interpreters. But when I try to compile the (partially) translated source, I get the following error message:

The grammar token 'unicode 337' in the sentence 'Understand "kér [something]t [someone]t[unicode 337]l" as querysmalling'   looked to me as if it might be a unicode character, but this isn't something allowed in parsing grammar.

The first word also contains a non-ASCII character (é), but that one is not a problem. I searched for the error, and it seems the compiler only allows Unicode characters with lower code numbers. Is that true? Can I work around it and use my special characters somehow?
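
For anyone wanting to check which character an error like "unicode 337" refers to, here is a small Python sketch. It only uses the standard library; the function names are made up for illustration and are not part of any Inform tool.

```python
import unicodedata


def describe_code_point(number: int) -> str:
    """Return the code point, the character and its official Unicode name."""
    ch = chr(number)
    return f"U+{number:04X} {ch} {unicodedata.name(ch)}"


def non_ascii_code_points(text: str) -> list:
    """List every non-ASCII character in text with its decimal code number."""
    return [(ch, ord(ch)) for ch in text if ord(ch) > 127]


if __name__ == "__main__":
    # 337 is the number quoted in the error message above.
    print(describe_code_point(337))
    # The words from the grammar line, written with the real characters.
    print(non_ascii_code_points("kér tőle"))
```

This shows that é has a code number below 256, while ő (337) does not, which matches the compiler accepting one and rejecting the other.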

#2

Sorry, I forgot to mention that I am compiling the source with gnome-inform7 on Ubuntu.

(matt w) #3

It looks to me as though the issue is this, from §5.10 of Writing with Inform:

The other Unicode characters can be written inside quoted text but not in source text, which I'm guessing means they can't be understood either. So é can be understood but unicode 337 can't.

Unfortunately I suspect there isn't a workaround for this: there's some internal representation in a format that doesn't include the other Unicode characters (the ZSCII format). I'm not super familiar with the inner workings of the virtual machines though.

(Daniel Stelzer) #4

Indeed, nothing outside that range can be properly handled by the parser. This is a significant problem when trying to write IF in different languages, since the limited range shown above isn’t even enough for the entire European Union. (Greek, for instance, is missing its entire alphabet, while other languages have more subtle problems: Polish needs letters like ż, Romanian ă, Icelandic ð…it looks like you’re specifically missing Hungarian’s ő?)

Zarf has written an extension that updates the parser to support Unicode. But since you can’t use most Unicode characters in object names or Understand lines, you need to use Inform 6 inclusions for all parsing-related code (Understand lines, object names, verb definitions, conversation topics…).

Hopefully an upcoming release of Inform 7 will change this. But for now, it’s not really possible to use it for works in most non-English languages. Sorry about that.

(Daniel Stelzer) #5

That said, modern systems and interpreters do support Unicode quite well. If you managed to get a Hungarian game past the first stage of compiling, everything else would go off without a hitch, and it would be completely playable. The only problem is the ni compiler itself, which is also the one part that’s not open source (as opposed to the GUI, the template library, the I6 compiler, the blorb tools, the Glulx format, the Quixe interpreter…).

#6

Thanks for all the answers.
Yes, I would like to translate into Hungarian. I know of an old Hungarian IF game for the C64 that was rewritten in I6, and it has Unicode characters. I have written to its author to ask how he did it.
I downloaded an I6 source file, added some special characters to it, and tried to compile it with the Inform compiler using the -v8 flag, but it gave error messages for the special characters. I also tried the -C2 flag, but it didn't help.

#7

Is it possible to transcode I7 to I6, TADS, or Hugo, if they handle the special characters better?

(Andrew Plotkin) #8

You can transcode I7 to I6. That’s what the ni compiler does. That’s the piece we’re missing. :confused:

Other formats, no.

(Andrew Plotkin) #9

You need to use the -G flag (for Glulx), and -Cu (to indicate that the I6 source code is in UTF-8).

Then you need additional settings to get the I6 dictionary to be Unicode-compatible. I don’t have a complete example on hand, unfortunately.
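
A minimal sketch of that invocation, written as a Python helper that only builds the argument list. The `-G` and `-Cu` switches are the ones named above; the executable name `inform` and the file names are assumptions, and the extra dictionary settings are not included here.

```python
def inform6_command(source="auto.inf", output="output.ulx", compiler="inform"):
    """Build an Inform 6 command line for Glulx output from UTF-8 source.

    The compiler name and both file names are placeholders (assumptions);
    adjust them to match the local installation and project.
    """
    return [
        compiler,
        "-G",   # compile to Glulx
        "-Cu",  # treat the I6 source file as UTF-8
        source,
        output,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is executed blindly.
    print(" ".join(inform6_command()))
```

The list can be handed to `subprocess.run(cmd, check=True)` once the names are right for your setup.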

(Daniel Stelzer) #10

Honestly, if ni could just be hacked to pass non-ZSCII characters through unmolested, then all the necessary transformations could be applied on the I6 side. This might be possible with disassembly, but might not: it depends on the data structures used internally. (Ideally it would just use UTF-8 in byte arrays, and depend on the I6 compiler to handle character sets, but I don’t know if this actually happens.)

#11

Thanks, it was helpful! I found auto.inf in the Build folder.

#12

It is very adventurous :slight_smile: :frowning: It turned out that my Inform compiler doesn't support Unicode, because the version on Linux is 6.31. I downloaded the Inform 6.33 Windows executable, and I run it with Wine.
So, I do the English-to-Hungarian translation in gnome-inform7, save, then quit. I recode őűŐŰ in story.ni to their ugly ISO-8859-1 versions. Then I reopen the project in gnome-inform7, compile it, and quit again. Then I recode auto.inf. That wasn't straightforward, because auto.inf wasn't UTF-8. Compiling gave an error that recommended using DICT_CHAR_SIZE=4, so I appended

!% $DICT_CHAR_SIZE=4

to the first line of auto.inf.
OK, now it compiles, but when I play, I get errors for every command, even for quit.
Any idea where I can find additional help on how to solve that?
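
For reference, the recode-and-add-header step on auto.inf could be scripted roughly like this in Python. This is only a sketch: it assumes auto.inf comes out as ISO-8859-1 (the post only says it was not UTF-8), it puts the setting on a line of its own at the very top of the file, and the function name is invented for illustration.

```python
HEADER = "!% $DICT_CHAR_SIZE=4\n"


def prepare_auto_inf(raw: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Re-encode I6 source as UTF-8 and make sure the header line comes first.

    source_encoding is an assumption about what the I7 compiler wrote out.
    """
    text = raw.decode(source_encoding)
    if not text.startswith(HEADER):
        text = HEADER + text
    return text.encode("utf-8")


if __name__ == "__main__":
    # A tiny stand-in for auto.inf, stored as ISO-8859-1 bytes.
    sample = 'Constant Story "k\xe9r";\n'.encode("iso-8859-1")
    print(prepare_auto_inf(sample).decode("utf-8"))
```

The result is meant to be compiled as UTF-8 source; on its own this step does not fix the runtime errors described above.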

(Andrew Plotkin) #13

DICT_CHAR_SIZE=4 is the correct flag, but as soon as you use it, you have to replace the I6 parser code. The existing parser assumes that the dictionary and all player input is stored in bytes. You have to replace that with code which uses (32-bit) words.
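
To illustrate why byte-sized storage is the sticking point, here is a small Python sketch. It is not how the I6 library lays out its dictionary or input buffers; it only shows that ő does not fit in one byte, and what one 32-bit value per character looks like. The big-endian byte order and the function names are choices made for the illustration.

```python
import struct


def fits_in_byte(ch: str) -> bool:
    """True if the character's code number can be stored in a single byte."""
    return ord(ch) <= 0xFF


def pack_word_32(word: str) -> bytes:
    """Store each character of word as one big-endian 32-bit value."""
    return b"".join(struct.pack(">I", ord(ch)) for ch in word)


if __name__ == "__main__":
    for ch in "éő":
        print(ch, ord(ch), fits_in_byte(ch))
    # Three characters take twelve bytes when each one gets a 32-bit slot.
    print(pack_word_32("től").hex())
```

Any code that walks such data one byte at a time will misread it, which is why the parser routines have to change along with the flag.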

My extension github.com/erkyrath/i7-exts/blo … Parser.i7x contains this code. But I haven’t tried to use it the way you’re trying, so it will require some experimentation to get all the pieces in the right place.

I apologize for not having more explicit instructions.

(Daniel Stelzer) #14

In theory you can just include that extension into the Inform 7 project. Then write your game using replacements for all the characters ZSCII doesn’t have: for instance, Hungarian doesn’t use ô or û, so you could write ô û anywhere you need ő ű. Then finally, use sed to replace all ô with ő and û with ű in auto.inf before passing it to the Inform 6 compiler.

In practice I’ve never done this so things might break. But it seems promising! Especially for a language like Hungarian which doesn’t need very many “exotic” characters. I think ő and ű are the only ones not in the original character set.

If you want to be really fancy, and make it easy to type your source on a Hungarian keyboard, you could sed ő ű into ô û before invoking ni, then sed them back afterward to pass to the Inform 6 compiler. Depends how much you want to fiddle with the build process.
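
Here is a rough sketch of both substitution passes, in Python rather than sed so that it behaves the same on any platform. The lowercase pairs are the ones suggested above; the uppercase pairs (Ô for Ő, Û for Ű) and the function names are my own assumptions, and I haven't run this against a real project.

```python
# Placeholder pairs: real Hungarian letter -> stand-in that ni accepts.
TO_PLACEHOLDER = str.maketrans("őűŐŰ", "ôûÔÛ")
FROM_PLACEHOLDER = str.maketrans("ôûÔÛ", "őűŐŰ")


def hide_from_ni(source_text: str) -> str:
    """Swap ő/ű for ô/û before the I7 source is handed to ni."""
    return source_text.translate(TO_PLACEHOLDER)


def restore_for_i6(i6_text: str) -> str:
    """Swap ô/û back to ő/ű in the generated I6 code (auto.inf)."""
    return i6_text.translate(FROM_PLACEHOLDER)


if __name__ == "__main__":
    original = 'Understand "kér [something]t [someone]től" as querysmalling.'
    hidden = hide_from_ni(original)
    print(hidden)
    print(restore_for_i6(hidden) == original)
```

Note that the restore pass turns every ô and û into ő and ű, so it is only safe as long as the game text never needs a genuine ô or û.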

#15

I tried the recommended extension, but as soon as I included it in the I7 source, the game stopped accepting any command (even ones with plain ASCII characters).

#16

I had to work on my source code conversion, and then it finally worked. I haven't tested it thoroughly, but it seems to be OK now with the extension.
Thanks to everyone for the help.
