Cannot parse Greek characters in Inform 7

I suspect the best option for right now—i.e. the best option barring an update to treat Unicode as a first-class citizen for parsing—is to add a little hack to the parser that reads input into a word array, then passes it through a translation table to convert it into ASCII which is saved in the byte array. Then you would write all your verb and noun synonyms in Betacode or something like it.

Distinctly not ideal! But until the compiler is updated, I think it’s the best we can do. It’ll be easier than defining all your verb and noun synonyms in Inform 6, certainly.

EDIT: Specifically, this works for Greek because the Greek alphabet has fewer characters than the English one. So you can replace each Greek letter with one English letter without losing any information (and even have two letters left over, which you can use for the tonos and the final sigma). For other languages, this hack wouldn’t work as well.
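
For what it's worth, here is a rough Python sketch of that buffer conversion. It is Python only to show the idea; the real hack would live in the parser's Inform 6 code. The letter pairing follows Beta Code, and the names `TABLE`, `to_byte_buffer` and `unused_latin_letters` are made up for illustration.

```python
from array import array

# Assumption: Beta Code pairing of the 24 Greek letters with Latin ones.
GREEK = "αβγδεζηθικλμνξοπρστυφχψω"
LATIN = "abgdezhqiklmncoprstufxyw"

# Translation table keyed by code point, one Latin letter per Greek letter.
TABLE = {ord(g): ord(l) for g, l in zip(GREEK, LATIN)}


def to_byte_buffer(words: array) -> bytearray:
    """Convert a buffer of wide code points into an ASCII byte buffer."""
    out = bytearray()
    for cp in words:
        if cp in TABLE:
            out.append(TABLE[cp])      # Greek letter: use its Latin stand-in
        elif cp < 0x80:
            out.append(cp)             # already ASCII (spaces, digits, ...)
        else:
            out.append(ord("?"))       # anything else: placeholder
    return out


def unused_latin_letters() -> str:
    """Latin letters that no Greek letter maps to."""
    return "".join(sorted(set("abcdefghijklmnopqrstuvwxyz") - set(LATIN)))


if __name__ == "__main__":
    typed = array("L", map(ord, "βιβλιο"))   # input as wide characters
    print(to_byte_buffer(typed))
    print(unused_latin_letters())
```

Since no two Greek letters share a Latin letter, the table can be inverted, which is the "without losing any information" part.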

Aha, yeah. I hadn’t heard of Betacode but it would be less grating than Latin-7.

There’s probably an equivalent for Modern Greek that doesn’t have to worry about lunate sigma and polytonic accents, but betacode is what I know best. The two letters it doesn’t use are J and V (except in really old dialectal texts where V is used for digamma), so I would use J for final sigma (or just ignore the difference between the different sigmata) and V for the tonos. The advantage of V over the slashes and parentheses of standard betacode is that you don’t have to worry about the parser mistaking them for word boundaries.
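
A minimal Python sketch of that scheme, assuming the tonos is written as V after its vowel and final sigma as J. The function name `to_betacode` is hypothetical, and dropping every other diacritic is just a simplification for this example.

```python
import unicodedata

# Assumption: standard Beta Code letters, plus J for final sigma.
GREEK = "αβγδεζηθικλμνξοπρστυφχψω"
LATIN = "abgdezhqiklmncoprstufxyw"
TABLE = dict(zip(GREEK, LATIN))
TABLE["ς"] = "j"

TONOS = "\u0301"  # combining acute accent, which NFD splits off the vowel


def to_betacode(text: str) -> str:
    """Transliterate Greek text into the ASCII scheme described above."""
    out = []
    for ch in unicodedata.normalize("NFD", text.lower()):
        if ch == TONOS:
            out.append("v")            # tonos becomes V after its vowel
        elif unicodedata.combining(ch):
            continue                   # other diacritics dropped in this sketch
        else:
            out.append(TABLE.get(ch, ch))
    return "".join(out)


if __name__ == "__main__":
    print(to_betacode("βιβλίο"))
```

Characters outside the table (spaces, digits, Latin letters) pass through unchanged, so a whole command line can be fed in at once.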

At which point the question becomes, are you willing to type Understand “biblivo” as the βιβλίο all over the place? Or does that undermine the natural-language-ness too much?

I think you’re right – that switch is more applicable to Z-Machine.

I was confused by the fact that within the IDE interpreter I got responses like:

Adventure Lab
You can see a ΔV here.

>x
That's not a verb I recognise.

>x ΔV
That's not a verb I recognise.

>παίρνω
That's not a verb I recognise.

but the same compiled output.ulx file (made without the C7 switch) seems to work just fine with Gargoyle, so I guess the issue is something within the Linux IDE interpreter for 6M62.

First of all I would really like to thank all of you for your answers! @zarf @Draconis @rovarsson @otistdog

Of course, it is too much at this time to focus on Ancient Greek. Even for us Greeks, reading Ancient Greek is quite difficult, because it is much more complicated than our modern language. I don’t wanna think about all the foreign people that want to get involved with it :sweat_smile:

My primary goal would be to work for the Modern Greek language, and make Inform accessible to people that are not familiar with English. However, I understand that this is quite difficult, so I could focus more on developing a “greeklish” version, using for instance “vivlio” or “biblio” instead of the word “βιβλίο” (which means book for anyone interested). By providing instructions to the user, regarding the correct “translation” from Greek to greeklish, I could see that becoming reality in the next months probably.

I would really appreciate it if you could also help me get started with those changes, because I am a relatively new user of Inform and I have little to no idea how to modify the relevant code to make this work.

Last, is the “translation” part going to take the most time, in your opinion (analyzing the grammar, and everything related that has already been created for other languages)?

I think the expertise that you are seeking is most likely to be found from someone who has already worked on a translation to a non-English language. Regrettably, I am not one of them. @Natrium729 may be willing to offer some guidance.

WWI 27.27 Translating the language of play basically suggests reading the Inform Designer’s Manual, 4th edition (aka DM4) and then inspecting the built-in extension “English Language by Graham Nelson” and seeing what you can do. That’s only part of the job, though. You would then want to go through the entire Standard Rules and note all of the responses there, then issue replacements (perhaps via an extension of your own).

The good news is that this forum is a pretty good place to get help.

If you’re quite new to Inform, it’s understandable you’re having trouble getting your head around all that.

You can get away with a lot using only Inform 7, but at some point you’ll have to dive into Inform 6 (now called Inter) and Preform, which are two lower-level parts of the whole Inform ecosystem – especially if your language’s grammar is quite different from English (for example, if it doesn’t have a subject-verb-complement order or has cases like German or Latin).

Regarding the parsing of Greek characters, since proper Unicode support is not really here for understand lines, I suggest staying within Inform 7 and exclusively using a Latin transliteration in your understand lines (especially if there’s an “official” or widely-known one). After that it would be “easy” to make the parser accept Greek characters with @Draconis’s suggestion. (But it won’t really be easy, hence the quotation marks.)

For example, in French, all the understand lines are written without diacritics, and there’s an Inform 6 routine that strips them before parsing. (It’s just more difficult to implement that kind of conversion when Unicode characters are involved.)
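
To illustrate the idea only (this is a Python sketch, not the actual Inform 6 routine from the French translation), stripping diacritics can be done by decomposing each character and discarding the combining marks. The name `strip_diacritics` is made up for the example.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove accents, cedillas, etc., keeping the base letters."""
    # NFD splits "é" into "e" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Combining marks have a non-zero combining class; drop them.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(strip_diacritics("prendre la clé"))
```

Ligatures such as "œ" have no canonical decomposition, so a real routine would need an extra rule for them.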

I guess you’re translating 10.1? What have you got translated right now?

In addition to reading the DM4 as suggested above, I fear the only way to learn how to translate Inform is to decipher the other translations, and check a few threads on this forum.

I’m working on a guide for translating Inform right now, but it’s quite a lot of work, so in the meantime, feel free to ask questions!

Well, I haven’t really started working on it because I didn’t know how I should continue with that parser issue that I first mentioned. I started tinkering with 10.1 and tried to understand the way that other translations have been developed so far.

Regarding the transliteration, there is ISO 843 (https://www.translitteration.com/transliteration/en/greek/iso-843), which can be used to convert Greek characters to Latin, but not vice versa. So I guess you could say that this is an “official” way; however, it might not be the most intuitive choice for the user.

Something I also need to mention is that some work has already been done on a Greek translation of Inform 6, at least in part, but I don’t know if it can be applied to the newer Inform 7.

Do you think that the best way to continue is transliteration?
Thank you a lot for your support!

I believe the transliteration is the way to go for everything related to user input. (As you found out, Unicode is OK in text output like descriptions).

  • If there’s a standard like that ISO one, you won’t have to explain the rules.
  • It’s true it won’t be intuitive for users, but you will be able to add a way to transliterate Greek characters in commands in a subsequent version of your translation. In the end only authors would need to care about that, and you can start working on your translation right now.
  • It makes it easier for people without a Greek keyboard to play games in Greek.
  • The day Inform 7 supports understand lines containing Unicode, it will be trivial (if a bit tedious) to add them.

Yes, the work done on an Inform 6 translation can be used. In the best case, some parts will be copy-pastable; in the worst case, you would be able to take some inspiration from it.


I consider that 10.1 is not quite ready for translation yet, but it should be OK. (I decided to stay on 6L38 for French, but I think the Spanish and Italian translations have been updated.) The next Inform version will have some nice features regarding extensions, but it’s possible to start working on a translation right now.

Nifty link, that! The only thing that’s not intuitive to me is the substitution of “bèta” into “v” where I would expect “b”. A latin “b” is just preserved as-is when I transliterate to Greek.

I have to add that I don’t have any experience with modern Greek. I studied classical Greek in high school, and I’m aware that there are significant differences. Perhaps the modern “bèta” does sound more like a “v”. (The “b” and “v” sounds are very close to each other anyway, with a single letter to represent both, or a single in-between sound, in a number of languages I believe?)

Why is the newer version not the best for translation?
Do you think I should go for 10.1, or stick with a previous version?

This is exactly the case!
The Greek “beta” sounds like “v”, and the English “b” sounds like the Greek characters “μπ”.

For example, the word “ball” in Greek is “μπάλα”. Unfortunately you cannot see the result in the link that I provided, because the ISO 843 page only converts Greek to Latin and not vice versa.
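
As a rough illustration, here is a small Python sketch of a pronunciation-based “greeklish” conversion. The table below is an informal one made up for this example (it is not the ISO 843 table), and `to_greeklish` is a hypothetical name; the only digraph handled is μπ, as in the example above.

```python
import unicodedata

# Assumption: an informal, pronunciation-based table (NOT ISO 843).
DIGRAPHS = {"μπ": "b"}
LETTERS = {
    "α": "a", "β": "v", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "i", "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "x", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "y", "φ": "f", "χ": "ch", "ψ": "ps",
    "ω": "o",
}


def to_greeklish(text: str) -> str:
    """Convert Modern Greek text to an informal Latin spelling."""
    # Drop the tonos and other marks by decomposing and filtering.
    plain = "".join(
        ch for ch in unicodedata.normalize("NFD", text.lower())
        if not unicodedata.combining(ch)
    )
    out, i = [], 0
    while i < len(plain):
        pair = plain[i:i + 2]
        if pair in DIGRAPHS:           # two Greek letters, one Latin sound
            out.append(DIGRAPHS[pair])
            i += 2
        else:
            out.append(LETTERS.get(plain[i], plain[i]))
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(to_greeklish("μπάλα"), to_greeklish("βιβλίο"))
```

Note that a table like this is lossy (η and ι both become “i”, ο and ω both become “o”), so it cannot simply be run backwards to recover the Greek spelling.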

Thank you very much for clearing that up! I like to have my Greek ↔ Latin transliterations finetuned.

(As to what purpose, I’ll leave that to the imagination of the reader…)

Usually, I would recommend using 6L38 for reasons (although it’s very old), but you should be OK with 10.1. There are some things that are a pain to port from 6L38, some things that need to be refactored, and some things that don’t work anymore (or I didn’t find the documentation to make them work).

But those things might very well not apply to you, and you’re starting from scratch, so… Stick with 10.1, I guess.

And also, the next version of Inform will add features for things that mainly require hacks or workarounds right now (notably IE-0016), so I prefer to wait a little before fully updating the French translation.

The two biggest changes from Ancient to Modern pronunciation are:

  • β δ γ φ θ χ changed from stops to fricatives—but of those, only β and φ have a letter conveniently equivalent to their new pronunciation in English (or Latin), so the rest are still transcribed the same way as before
  • The incredible number of vowels got simplified down to only five, with most of the Ancient vowels all turning into /i/ (the vowel in “seed”)

One of the famous pieces of evidence for these changes (both of them!) is that ancient authors describe sheep going “βῆ βῆ”, and beh beh sounds much more like a sheep than vee vee does.

What a wonderful piece of linguistic archaeology!

I remember reading a thesis about the first documented Greek-language version of Inform 6. Not sure how helpful that would be, especially considering that’s fairly under the hood, but if you’re interested I can probably find the actual paper.

Γεια σου Γιώργο! @GJMen

You are probably talking about this thesis by Sofia Delikostidi. I also asked about that above, and it looks like I could use at least part of it for the translation of Inform 7.

However, the problem is that the best option seems to be to continue with a transliteration instead of using Greek characters, so I will at least need to modify it to work with Latin characters.

At least in the Understand lines, yeah. Making it so Understand lines can include non-ZSCII characters is going to be a significant pain and will involve modifications all the way down to the compiler level.

They’re modifications worth making, don’t get me wrong, but it’ll be even more of a challenge than the already-monumental undertaking of translating I7 to Greek in the first place.

For anyone who may not have seen it, @zarf made a change that solves the issue: 32-bit character arrays are now used instead of 8-bit ones!
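
To show why the array width matters, here is a small Python sketch (not the actual change, just the arithmetic behind it). Greek code points are all above 255, so they cannot be stored in one byte each. The function names are made up, and the "?" replacement is only a stand-in for whatever an interpreter does with characters that don't fit.

```python
from array import array


def fits_in_8_bits(text: str) -> bool:
    """True if every character's code point fits in a single byte."""
    return all(ord(ch) <= 0xFF for ch in text)


def to_8bit_buffer(text: str) -> bytearray:
    """Store text one byte per character; wide characters are lost."""
    # Assumption: "?" marks a character that could not be stored.
    return bytearray(ord(ch) if ord(ch) <= 0xFF else ord("?") for ch in text)


def to_32bit_buffer(text: str) -> array:
    """Store text as full code points (typecode 'L' holds at least 32 bits)."""
    return array("L", (ord(ch) for ch in text))


if __name__ == "__main__":
    print(fits_in_8_bits("παίρνω"))          # Greek letters start at U+0370
    print(to_8bit_buffer("x ΔV"))
    print([hex(cp) for cp in to_32bit_buffer("ΔV")])
```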
