I have recently started working on my university thesis, which is the Greek translation of Inform 7, by building an extension for it. However, I soon realized that there are a number of issues regarding the Greek language.
As stated in this topic, non-Latin languages are not actually supported by the parser. The entire Greek alphabet is different from the Latin one, even though some characters are written in exactly the same way (for example, the capital letters A, B, etc.). As a result, when I try to write some Greek text, the parser does not recognize it.
The only place where I managed to make Greek characters work is in descriptions or printed names. For example, if I write "The printed name of [room] is [Greek name of room]", then it is shown correctly in the story.
The Unicode extensions seem to make no difference: I face the same issue whether I use them or not.
This is a serious problem that I am facing right now, and I don't really know how to tackle it. Is it possible to modify the parser to support Greek characters as normal input? Is there another solution that I should look for?
I can't help you with this, I just wanted to say that the task you've set yourself is valuable and impressive. I hope some people more knowledgeable about Inform's workings can help you out.
A long time ago, I wrote an experimental parser update which enabled full Unicode input:
The included example code demonstrates some Greek letters in verbs and nouns.
However (a) you need some terrible workarounds to define objects with non-Latin words, and (b) this is for an old version of Inform (9.2 or 6L38). It won't work for the current version.
We expect that something like this extension will be integrated into Inform eventually, but the process hasn't started as far as I know.
Note that it was a bit out-of-date even at the time with regard to what was then the latest version of Inform 7. The main idea of this post, however (that of converting the input at the "After reading a command" stage), might be a useful place for you to start.
Maybe @halkun or someone else familiar with their work will see this and step in.
zarf may not be giving himself enough credit. Using the version of Unicode Parser linked there (which is still compatible with 6M62, though I'm not sure about 10.1.2), I got the following transcript:
Adventure Lab
You can see a ÎV here.
> x
(the ÎV)
You see nothing special about the ÎV.
> x ÎV
You see nothing special about the ÎV.
> παίρνω
(the ÎV)
Taken.
The source code used was:
Include Unicode Parser by Andrew Plotkin.
The Adventure Lab is a room.
A thing has a text called greek-name. Understand the greek-name property as describing a thing. [see Unicode Parser documentation]
A ÎV is in Adventure Lab. The printed name of it is "ÎV". The greek-name of it is "ÎV".
Include
(- Verb '@{3C0}@{3B1}@{3AF}@{3C1}@{3BD}@{3C9}' = 'get';
Verb '@{3C0}@{3B1}@{3B9}@{3C1}@{3BD}@{3C9}' = 'get';
Verb '@{11D}et' = 'get';
-) after "Grammar" in "Output.i6t". [see Unicode Parser documentation]
and to get it to compile, it was necessary to:
Compile I7 source code.
Go to the project's "Build" directory (where auto.inf was generated).
Issue the command /usr/lib/x86_64-linux-gnu/gnome-inform7/inform6 -wxE2kSDGC7 $huge auto.inf output.ulx
Test the compiled game with a non-IDE interpreter.
Note that the C7 switch is not normally included. Someone may be able to offer a way to get the IDE to add it; I don't know of one. [EDIT: Also note that the C7 switch does not appear to be necessary for Glulx. See post below.]
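For anyone who would rather generate the `@{...}` escapes used in the Verb lines above than look up each code point by hand, a small helper script can do it. This is only a sketch in Python, not part of Inform or of the Unicode Parser extension, and the function name `i6_escape` is made up for illustration.

```python
def i6_escape(word: str) -> str:
    """Write every non-ASCII character of `word` as an @{HEX} escape.

    ASCII characters are passed through unchanged. The name i6_escape is
    hypothetical; it is not an Inform tool.
    """
    return "".join(
        ch if ord(ch) < 128 else f"@{{{ord(ch):X}}}"
        for ch in word
    )


if __name__ == "__main__":
    # The Greek verb and the accented "get" from the Verb lines above.
    print(i6_escape("παίρνω"))
    print(i6_escape("ĝet"))
```

The output can be pasted between the single quotes of a Verb line; whether a given compiler version accepts it is a separate question.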
It sounds like you have your work cut out for you. Good luck!
P.S. You may also want to take special note of a mention in the Glk specification section 2.6.1:
The initial decomposition is only necessary because of a historical error in the Unicode spec: character 0x0345 (COMBINING GREEK YPOGEGRAMMENI) behaves inconsistently.
I'm glad it still works with 6M62! However, it will definitely require updating for 10.1.
This is a hard problem for two reasons:
(1) The deepest parts of the parser (the code that reads in player input and operates on it) all use byte arrays. That is, it all operates on 8-bit characters. Unicode characters just don't fit.
(2) The Inform compiler knows this, so it doesn't allow Unicode characters in verb synonyms, noun synonyms, etc.
The fix for (1) is to replace all those arrays with arrays of 32-bit values, and then replace all the code that operates on them. This is a lot of code. It's not conceptually difficult, it's just a lot of plumbing.
Fixing (2) requires a compiler update after (1) is settled.
You could weasel around (1) by changing just the first step of player input to re-encode the input into Latin-7, which can be stored in byte arrays. This would work with only a few lines of the parser changed. But you'd have to write all your verb/noun synonyms in ASCII equivalent letters: "bq\wor" instead of "βράχος". It would be pretty unpleasant.
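One possible reading of the "bq\wor" example: if "Latin-7" here means the ISO 8859-7 Greek code page (that is an assumption, not something the post states), every Greek letter fits in one byte, and clearing the top bit of that byte gives an ASCII stand-in. The Python sketch below only checks that reading. It is not parser code, and `strip_high_bit` is a hypothetical name.

```python
def strip_high_bit(word: str) -> str:
    """Encode as ISO 8859-7, then clear bit 7 of every byte.

    Assumption: this is the byte-level trick behind the ASCII
    "equivalent letters"; the original post does not spell it out.
    """
    raw = word.encode("iso8859_7")
    return bytes(b & 0x7F for b in raw).decode("ascii")


if __name__ == "__main__":
    print(strip_high_bit("βράχος"))
```

Under that assumption the accented alpha lands on a backslash, which shows why synonyms written this way would be unpleasant to read and type.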
(Someone is going to suggest UTF-8 in 8-bit arrays. Yay, UTF-8! Don't go there. It's more work than changing the arrays to 32-bit arrays.)
I started porting this to 10.1 at one point, but it's not just a matter of replacing instead of's with replacing's: the code being replaced has changed in enough places that it's finicky to get right. But not a huge or an especially hard problem, just a tedious one.
I suspect the best option for right now (i.e. the best option barring an update to treat Unicode as a first-class citizen for parsing) is to add a little hack to the parser that reads input into a word array, then passes it through a translation table to convert it into ASCII which is saved in the byte array. Then you would write all your verb and noun synonyms in Betacode or something like it.
Distinctly not ideal! But until the compiler is updated, I think it's the best we can do. It'll be easier than defining all your verb and noun synonyms in Inform 6, certainly.
EDIT: Specifically, this works for Greek because the Greek alphabet has fewer characters than the English one. So you can replace each Greek letter with one English letter without losing any information (and even have two letters left over, which you can use for the tonos and the final sigma). For other languages, this hack wouldn't work as well.
There's probably an equivalent for Modern Greek that doesn't have to worry about lunate sigma and polytonic accents, but betacode is what I know best. The two letters it doesn't use are J and V (except in really old dialectal texts where V is used for digamma), so I would use J for final sigma (or just ignore the difference between the different sigmata) and V for the tonos. The advantage of V over the slashes and parentheses of standard betacode is that you don't have to worry about the parser mistaking them for word boundaries.
At which point the question becomes, are you willing to type Understand "biblivo" as the βιβλίο all over the place? Or does that undermine the natural-language-ness too much?
I think you're right: that switch is more applicable to Z-Machine.
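To make the translation-table idea concrete, here is a rough sketch of the mapping described above: standard betacode letters, plus J for final sigma and V for the tonos. It is written in Python purely as an illustration; a real version would have to live in the parser's input routine. The names `BETA` and `to_betacode` are invented for this example, and anything outside the table (digits, dieresis, punctuation) is passed through unchanged.

```python
import unicodedata

# Betacode-style letter table, with "j" for final sigma as suggested above.
BETA = {
    "α": "a", "β": "b", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "h", "θ": "q", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "c", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "j", "τ": "t", "υ": "u", "φ": "f", "χ": "x", "ψ": "y",
    "ω": "w",
}

TONOS = "\u0301"  # after NFD, an accented vowel is base letter + U+0301


def to_betacode(text: str) -> str:
    """Convert lowercase Modern Greek to ASCII, writing the tonos as 'v'."""
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if ch == TONOS:
            out.append("v")  # the accent follows its vowel, as in "biblivo"
        else:
            out.append(BETA.get(ch, ch))
    return "".join(out)


if __name__ == "__main__":
    print(to_betacode("βιβλίο"))
```

Because every output character is a plain letter, the result can be matched against ordinary Understand lines without the parser seeing extra word boundaries.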
I was confused by the fact that within the IDE interpreter I got responses like:
Adventure Lab
You can see a ÎV here.
>x
That's not a verb I recognise.
>x ÎV
That's not a verb I recognise.
>παίρνω
That's not a verb I recognise.
but the same compiled output.ulx file (made without the C7 switch) seems to work just fine with Gargoyle, so I guess the issue is something within the Linux IDE interpreter for 6M62.
Of course, it would be too much at this point to focus on Ancient Greek. Even for us Greeks, reading Ancient Greek is quite difficult, because it is much more complicated than our modern language. I don't even want to think about all the foreign people who want to get involved with it.
My primary goal would be to work on the Modern Greek language, and make Inform accessible to people who are not familiar with English. However, I understand that this is quite difficult, so I could focus more on developing a "greeklish" version, using for instance "vivlio" or "biblio" instead of the word "βιβλίο" (which means book, for anyone interested). By providing instructions to the user regarding the correct "translation" from Greek to greeklish, I could see that becoming reality in the next few months, probably.
I would really appreciate it if you could also help me get started with those changes, because I am a relatively new user of Inform and I have little to no idea how to modify the relevant code to make this work.
Last, is the "translation" part going to take the most time, in your opinion (analyzing the grammar, and everything related that has already been created for other languages)?
I think the expertise that you are seeking is most likely to be found from someone who has already worked on a translation to a non-English language. Regrettably, I am not one of them. @Natrium729 may be willing to offer some guidance.
WWI 27.27 Translating the language of play basically suggests reading the Inform Designer's Manual, 4th edition (aka DM4) and then inspecting the built-in extension "English Language by Graham Nelson" and seeing what you can do. That's only part of the job, though. You would then want to go through the entire Standard Rules and note all of the responses there, then issue replacements (perhaps via an extension of your own).
The good news is that this forum is a pretty good place to get help.
If you're quite new to Inform, it's understandable you're having trouble getting your head around all that.
You can get away with a lot with only Inform 7, but at some point you'll have to dive into Inform 6 (now called Inter) and Preform, which are two lower-level parts of the whole Inform ecosystem, especially if your language's grammar is quite different from English (for example, it doesn't have a subject-verb-complement order, or it has cases like German or Latin).
Regarding the parsing of Greek characters, since proper Unicode support is not really here for understand lines, I suggest staying within Inform 7 and exclusively using a Latin transliteration in your understand lines (especially if there's an "official" or widely-known one). After that it would be "easy" to make the parser accept Greek characters with @Draconis's suggestion. (But it won't really be easy, hence the quotation marks.)
For example, in French, all the understand lines are written without diacritics, and there's an Inform 6 routine that strips them before parsing. (It's just more difficult to implement that kind of conversion when Unicode characters are involved.)
I guess you're translating 10.1? What have you got translated right now?
In addition to reading the DM4 as suggested above, I fear the only way to learn how to translate Inform is to decipher the other translations, and check a few threads on this forum.
I'm working on a guide for translating Inform right now, but it's quite a lot of work, so in the meantime, feel free to ask questions!
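The diacritic-stripping idea can be shown in a few lines. The actual French routine is Inform 6 and is not reproduced here; this is just a Python sketch of the same conversion using Unicode decomposition, with a made-up function name. It only removes combining marks, so ligatures and other special letters would need their own handling.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop combining marks so accented letters match unaccented ones."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(strip_diacritics("prendre la clé"))
    print(strip_diacritics("βιβλίο"))
```

The same function removes the Greek tonos, which is one reason an accent-free convention for understand lines is attractive.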
Well, I haven't really started working on it because I didn't know how I should continue with that parser issue that I first mentioned. I started tinkering with 10.1 and tried to understand the way that other translations have been developed so far.
Regarding the transliteration, there is ISO 843 (https://www.translitteration.com/transliteration/en/greek/iso-843), which someone can use to convert Greek characters to Latin, but not vice versa. So I guess you could say that this is an "official" way; however, it might not be the most intuitive for the user.
Something that I also need to mention is that some effort has already been made on a Greek translation of Inform 6, at least in part, but I don't know if that can be applied to the newer Inform 7.
Do you think that the best way to continue is transliteration?
Thank you a lot for your support!
I believe the transliteration is the way to go for everything related to user input. (As you found out, Unicode is OK in text output like descriptions).
If there's a standard like that ISO one, you won't have to explain the rules.
It's true it won't be intuitive for users, but you will be able to add a way to transliterate Greek characters in commands in a subsequent version of your translation. In the end only authors would need to care about that, and you can start working on your translation right now.
It makes it easier for people without a Greek keyboard to play games in Greek.
The day Inform 7 supports understand lines containing Unicode, it will be trivial (if a bit tedious) to add them.
Yes, the work done on an Inform 6 translation can be used. In the best case, some parts will be copy-pastable; in the worst case, you would be able to take some inspiration from it.
I consider that 10.1 is not quite ready for translation yet, but it should be OK. (I decided to stay on 6L38 for French, but the Spanish and Italian translations were updated, I think.) The next Inform version will have some nice features regarding extensions, but it's possible to start working on a translation right now.
Nifty link, that! The only thing that's not intuitive to me is the substitution of "bèta" into "v" where I would expect "b". A Latin "b" is just preserved as-is when I transliterate to Greek.
I have to add that I don't have any experience with modern Greek. I studied classical Greek in high school, and I'm aware that there are significant differences. Perhaps the modern "bèta" does sound more like a "v". (The "b" and "v" sounds are very close to each other anyway, with a single letter to represent both, or a single in-between sound, in a number of languages I believe?)
What is the reason why the newer version is not the best for the translation?
Do you think that I should go for 10.1, or stick to a previous version?
This is exactly the case!
The Greek "beta" sounds like "v", and the English "b" sounds like the Greek characters "μπ".
For example, the word "ball" in Greek is called "μπάλα". Unfortunately you cannot see the result in the link that I provided, because ISO 843 only applies to "Greek to Latin" and not vice versa.
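A phonetic "greeklish" table along these lines might look like the sketch below. Note the assumptions: this is not ISO 843. It is a deliberately partial table built only from the examples in this thread (β as "v", μπ as "b", tonos dropped), and the names `DIGRAPHS`, `LETTERS` and `greeklish` are invented for illustration. Treating μπ as "b" in every position is a simplification. Letters missing from the table are left as they are.

```python
import unicodedata

# Multi-letter combinations must be replaced before single letters.
DIGRAPHS = {"μπ": "b"}

# Partial single-letter table: only what the thread's examples need.
LETTERS = {"α": "a", "β": "v", "ι": "i", "λ": "l", "μ": "m", "ο": "o", "π": "p"}


def greeklish(word: str) -> str:
    """Rough phonetic Latin spelling of a lowercase Modern Greek word."""
    # Remove the tonos by decomposing and dropping combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    for greek, latin in DIGRAPHS.items():
        plain = plain.replace(greek, latin)
    return "".join(LETTERS.get(ch, ch) for ch in plain)


if __name__ == "__main__":
    print(greeklish("βιβλίο"))
    print(greeklish("μπάλα"))
```

Going the other way (Latin back to Greek) is ambiguous with a table like this, which matches the point above that the conversion only works in one direction.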