Interactive fiction system for UTF-8 RTL languages

[size=150]Thanks for your reply.
I think this idea is worth exploring and monetizing, both for IF programmers and for the growing IF community, so I present it as an opportunity for developers.
I may be the only Saudi nerd who plays interactive games :slight_smile:
[/size]

[size=150]Thank you for your reply.
I am honored that you commented on this issue; your work on Glulx is beyond great.
I hope you continue your development efforts. Here is a big chance to open huge new markets (education, gaming) by enabling people around the globe to create interactive fiction in their native languages: Chinese, Indian, Thai, and more.
You could be an open source developer and make a living by offering services.[/size]

Well, the hard part is someone altering Inform 7 to handle your language. That’s well beyond my capabilities. But if you get that done, I’d be happy to get your story file working in fyrevm-web.

[size=150]In my opinion, altering Inform 7 to support Unicode UTF-8 (non-Latin character sets) and right-to-left text would be enough for any known language.
[/size]

Honestly, UTF-8 is the hard part. RTL is all in the display layer which is open source. But Inform accepting UTF-8 natively would be amazing and useful for a lot of languages.

(The library changes have already been made, by Andrew Plotkin. But even with his changes you can’t use non-Latin-1 characters in an Understand line. Which makes it very difficult to do any non-English parsing.)
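To make the Latin-1 limit concrete, here is a small Python sketch. It is not Inform code and not how the compiler performs its check; the helper name `fits_latin1` is made up for illustration. It only tests whether every character of a word falls in the Latin-1 range (U+0000 to U+00FF):

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character of `text` is in the Latin-1 range."""
    try:
        # Latin-1 can only encode code points U+0000 through U+00FF.
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


print(fits_latin1("kitchen"))  # True: plain ASCII
print(fits_latin1("café"))     # True: é is U+00E9, still inside Latin-1
print(fits_latin1("المطبخ"))   # False: Arabic letters live around U+0600
```

Any word that fails this kind of check is one that, per the post above, cannot currently go in an Understand line.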

Well, I did have this thing that pumps the output from dumbfrotz (made for piping to other programs) into a script that translates via Google translate: https://intfiction.org/t/if-through-google-translate-via-command-line/11654/1

For Farsi, it spits out something like this:

[Image: Selection_041.png, screenshot of the translated Farsi output]

You should be able to type in commands in the translated language as well, but the verbiage can be a bit hit or miss there.

Terminology correction: you’re talking about Inform handling Unicode input natively. The Inform compiler already accepts UTF-8 source files. The UTF-8 decoding is correct, but its internal text handling (and that of the I6 parser code) does not deal with Unicode strings.
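As a rough illustration of that distinction (plain Python, not a description of how the Inform compiler is written), decoding UTF-8 bytes is one step, and handling the resulting code points afterwards is another. Code that decodes correctly but then assumes every character fits in one byte would still break on this word:

```python
word = "المطبخ"

# Step 1: a UTF-8 source file stores the word as bytes.
raw = word.encode("utf-8")
print(len(raw))      # 12 bytes: each Arabic letter takes two bytes in UTF-8

# Step 2: decoding the bytes recovers the six code points.
decoded = raw.decode("utf-8")
print(len(decoded))  # 6 characters

# Step 3: every code point is above 0xFF, so text handling that assumes
# one byte per character cannot represent them.
print(all(ord(ch) > 0xFF for ch in decoded))  # True
```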

Right. That covers the I6 parser code, which leaves the Inform compiler internals.

[size=150]Thanks for your effort and kind reply.
The Farsi text output is aligned and rendered right-to-left correctly.[/size]

No problem. It also looks like the Google Translate trick I mentioned in that thread works for Inform games in Quixe as well.

[size=150]Yes, it works for the content of the story, but the parser commands are still in English.[/size]

Indeed; you’ll need an actual game written in an RTL language for that.

I tried writing a one-room game in Arabic and the description was okay, but the name property threw up a Unicode error.

Yes, parsing would be nightmarish, and a system would probably need to be built from the ground up by a speaker of the language.

However, click/choice-based systems, which don’t need to tear apart a player’s command and parse it, are most likely the easiest first step.

I don’t think that’s true. The language stuff can be done in language.i6t, and if the ni compiler were refactored to handle Unicode, it should just work.

I believe the I6 compiler can handle Unicode, so that shouldn’t be an issue.

The only thing that would be hard would be to allow this:

المطبخ هو غرفة. “هذا هو المطبخ.”

Which translates to:

The kitchen is a room. “This is the kitchen.”

Of course if you were to really do this in Arabic (or any other RTL), you’re right…you’d want to build it for that language’s idiosyncrasies.

This worked:

"Arabic" by David

The kitchen is a room. "أنت واقف في المطبخ ، مقلّص بثلاجة ، وموقد ، ومنضدة رخامية."

Rule for printing the name of the kitchen:
	say "مطبخ".

I get:

Arabic
An Interactive Fiction by David
Release 1 / Serial number 180316 / Inform 7 build 6M62 (I6/v6.33 lib 6/12N) SD

مطبخ
أنت واقف في المطبخ ، مقلّص بثلاجة ، وموقد ، ومنضدة رخامية.

(not accounting for the RTL layout though, which would be handled in the browser)
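For what handling the layout in the browser could look like, here is a minimal Python sketch. It is an assumption for illustration, not how Quixe or fyrevm-web work, and `is_rtl` and `wrap_paragraph` are hypothetical helper names. It guesses a paragraph's direction from its first strongly directional character and emits an HTML paragraph with the matching `dir` attribute:

```python
import html
import unicodedata


def is_rtl(text: str) -> bool:
    """Guess direction from the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # Hebrew-style RTL or Arabic letter
            return True
        if bidi == "L":          # strong left-to-right character
            return False
    return False                 # no strong character: default to LTR


def wrap_paragraph(text: str) -> str:
    """Wrap story output in a <p> tag carrying an explicit direction."""
    direction = "rtl" if is_rtl(text) else "ltr"
    return f'<p dir="{direction}">{html.escape(text)}</p>'


print(wrap_paragraph("مطبخ"))       # <p dir="rtl">مطبخ</p>
print(wrap_paragraph("Release 1"))  # <p dir="ltr">Release 1</p>
```

Browsers can apply a similar first-strong-character heuristic on their own via `dir="auto"`, so a web interpreter might not need even this much logic.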

This makes me curious - has there ever been a parser system in Asian languages? I suspect the same hurdles would be encountered there.

There has! Sierra’s 1980 “Mystery House” was translated into Japanese twice, in 1982 and 1983. It used only katakana, and I don’t know how sophisticated the parser was, but it was definitely a parser game.

(If anyone knows more Japanese than I do, and wants to evaluate the parser, there’s a walkthrough that lists the commands you need.)

I’m sure there’s more recent IF in Asian languages as well, but given that I don’t speak any well enough to play, I haven’t done much research.

I recall the Digital Antiquarian reporting on the first Japanese adventure game, which was written in English (filfre.net/2012/07/japanese-adventuring/). I gather it’s been done: you can see screenshots of the Japanese conversions of Infocom games at MobyGames, and memory suggests that the early offerings of Koei and Square (“The Death Trap”) also included Japanese text adventures like, e.g., the original version of Princess Tomato in the Salad Kingdom. But the alphabet problem likely pushed the genre there toward visual novels and other menu-driven variants.

[size=150]That was reasonable at the time. But programming technology is now far more advanced, and the international market outside the English language is still untapped, so developers have a chance to make a great profit if they adopt Unicode UTF-8 and non-Latin character sets.[/size]

I just did a quick test in TADS 3, and it supports UTF-8 Arabic verbs:

DefineIAction(Foo)
    execAction()
    {
        "هذا هو المطبخ";
    }
;

VerbRule(Foo)
    'المطبخ'
     : FooAction
     verbPhrase = ''
;

This results in the following while running the game in QTads:

>المطبخ
هذا هو المطبخ

(The vanilla TADS interpreter probably won’t work for this. The TADS Workbench also won’t work for writing the code. You need an editor with Unicode support.)

However, you would need to translate the TADS 3 parser to Arabic yourself, and RTL input and output would need to be implemented in the interpreter. It might be easier to use WebUI instead so that the output is rendered by a web browser, not by the interpreter.