Ideal Language for Playing Interactive Fiction

Al-Khwarizmi · December 18, 2013, 5:42pm

I think what you really mean is which language makes creating interactive fiction easiest for the programmer, right? For the player, any language would do as long as the parser supports it well (and therefore the obvious answer to the question would be “the player’s native language”).

For programmers, I think English is one of the easiest. If I compare English to, e.g., Spanish, in Spanish you have to worry about at least (1) supporting characters not in 7-bit ASCII and all the related text encoding problems that can ensue, (2) supporting input verbs in imperative, infinitive or first person (in English those three forms are the same), (3) building sentences in different ways depending on noun and article gender, as in “el árbol” vs. “la mesa” (common nouns and articles don’t have a gender in English), (4) supporting clitics (“eat it” -> “come + lo” -> “cómelo”, all together and with an accent that doesn’t appear in the simple form of the verb “come”), and (5) contractions (“de el” -> “del”). Although most of this falls on the shoulders of the programmers of IF systems, the programmers of actual games also have some hassles, like defining their nouns as masculine or feminine, maybe defining the imperative/first person for some exotic verbs specific to their game, etc.
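To make points (3) to (5) concrete, here is a minimal Python sketch of the kind of helpers a Spanish IF system ends up needing. Everything in it (the tiny lexicon, the function names, the list of clitics) is a hypothetical illustration, not code from any real IF system, and it ignores plenty of real-world cases such as stacked clitics or irregular imperatives.

```python
import unicodedata

# Hypothetical mini-lexicon: each game noun declares its gender ("m" or "f").
NOUN_GENDER = {"árbol": "m", "mesa": "f"}
DEFINITE_ARTICLE = {"m": "el", "f": "la"}

# Hypothetical set of imperatives the game knows, in their simple (unaccented) form.
KNOWN_IMPERATIVES = {"come", "coge", "abre"}
CLITICS = ("lo", "la", "los", "las", "le", "les")


def with_article(noun):
    """Return the noun preceded by the definite article matching its gender."""
    return f"{DEFINITE_ARTICLE[NOUN_GENDER[noun]]} {noun}"


def contract(text):
    """Apply the contractions 'de el' -> 'del' and 'a el' -> 'al'."""
    out = []
    for word in text.split():
        if out and word == "el" and out[-1] in ("de", "a"):
            out[-1] = "del" if out[-1] == "de" else "al"
        else:
            out.append(word)
    return " ".join(out)


def strip_acute(word):
    """Remove acute accents only, so that 'ñ' and 'ü' are preserved."""
    decomposed = unicodedata.normalize("NFD", word)
    return unicodedata.normalize("NFC", decomposed.replace("\u0301", ""))


def split_clitic(word):
    """Split 'cómelo' into ('come', 'lo'); return (word, None) if no clitic is found."""
    for clitic in sorted(CLITICS, key=len, reverse=True):
        if word.endswith(clitic):
            stem = strip_acute(word[: -len(clitic)])
            if stem in KNOWN_IMPERATIVES:
                return stem, clitic
    return word, None


print(with_article("árbol"))               # article chosen by gender
print(contract("la rama de " + with_article("árbol")))
print(split_clitic("cómelo"))
```

Note that the clitic splitter only accepts a split when the accent-stripped stem is a known imperative. That check is what keeps ordinary words that merely end in “lo” or “la” from being chopped up, and it is also why the author of a game may have to register exotic verbs by hand.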

There are even some common constructions in Spanish that are very ambiguous to parse and that systems just don’t handle by default. For example, “se” is a really devilish word: it can be an indirect object pronoun (“mandárselo” - to send it to him/her, “se lo mandé” - I sent it to him/her), a reflexive pronoun (“lavarse” - to wash oneself), or just an emphatic pronoun that does nothing at all (“comérselo” - to eat it, totally equivalent to just “comerlo” - compare with “mandárselo” above). Something similar happens with the equally common pronoun “le”, and it’s impossible to handle these kinds of things by default because they are dependent on context. So they have to be handled individually and context-sensitively by the IF author, which is mostly not done because it’s a lot of work to handle a couple of words that most players will not use due to previous bad experiences anyway. This is the most important drawback of IF in Spanish IMO.

Portuguese, Galician, Catalan and probably Italian have similar problems to Spanish (in some cases even more pronounced), and I wouldn’t vouch for French but I imagine it’s not very different in these respects either.

I wrote my system Aetheria Game Engine for Spanish, and then created the option to write IF in English too, and the adaptation to English (apart from the obvious translation of default messages, etc.) was mostly about ignoring things and deleting code. Gender? Just ignore it. Methods to generate an article+noun depending on gender? Unneeded. Methods to convert imperative to infinitive? Unneeded. Methods to find clitics? Unneeded. Etc. The only thing I actually had to add for English was the support for phrasal verbs such that, e.g., “pick the sword up” is understood as “pick up the sword”.
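The phrasal-verb handling described above could look roughly like the following Python sketch. This is not the actual Aetheria Game Engine code; the `PHRASAL_VERBS` table and the `normalize_phrasal` function are hypothetical names made up for illustration, and the sketch only covers the simple case of a single particle at the very end of the command.

```python
# Hypothetical table of phrasal verbs: verb -> particles it can take.
PHRASAL_VERBS = {
    "pick": {"up"},
    "put": {"down", "on"},
    "take": {"off"},
}


def normalize_phrasal(command):
    """Move a trailing particle next to its verb.

    'pick the sword up' becomes 'pick up the sword'; anything else is
    returned unchanged (apart from lowercasing and whitespace cleanup).
    """
    words = command.lower().split()
    if len(words) < 3:
        return " ".join(words)
    verb, last = words[0], words[-1]
    if last in PHRASAL_VERBS.get(verb, ()):
        return " ".join([verb, last] + words[1:-1])
    return " ".join(words)


print(normalize_phrasal("pick the sword up"))
print(normalize_phrasal("pick up the sword"))
```

After this step the rest of the parser only ever has to deal with the “pick up the sword” word order.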

Even easier than English is Esperanto. That’s a language that’s been artificially made to be regular, so it’s very easy to program IF systems and IF itself for it.
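As a rough illustration of what that regularity buys you, here is a Python sketch that recovers the base form of an Esperanto noun or adjective purely from its endings (nouns end in -o, adjectives in -a, the plural adds -j, the accusative adds -n), and turns an imperative in -u into the infinitive in -i. The function names are hypothetical, and the sketch assumes the parser already knows which input words are nouns, adjectives or verbs; it does not try to handle the article, pronouns or other word classes.

```python
def analyze_nominal(word):
    """Return (lemma, is_plural, is_accusative) for an Esperanto noun or adjective."""
    accusative = word.endswith("n")
    if accusative:
        word = word[:-1]
    plural = word.endswith("j")
    if plural:
        word = word[:-1]
    if not word.endswith(("o", "a")):
        raise ValueError(f"not a noun or adjective form: {word!r}")
    return word, plural, accusative


def imperative_to_infinitive(verb):
    """Turn an imperative such as 'prenu' into the infinitive 'preni'."""
    if not verb.endswith("u"):
        raise ValueError(f"not an imperative form: {verb!r}")
    return verb[:-1] + "i"


print(analyze_nominal("glavojn"))          # 'glavo' (sword), plural, accusative
print(imperative_to_infinitive("prenu"))   # 'preni' (to take)
```

No lexicon and no table of exceptions is needed here, which is exactly the part that takes most of the work in Spanish.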

I suppose Japanese and Chinese should be easy too, as they have a simple grammar (the big problem in those is segmentation, but for something as specific as IF where you are always parsing the same kind of constructions, that shouldn’t be a huge problem). Regarding languages with declensions such as German or Latin, I don’t think they are easier for this purpose. They would be in theory, if you could always go from a word form to its lemma (base form) + declension, but that’s not the case in practice. At least in Latin (I don’t know about German) there are many ambiguities there: words that have the same form in accusative and dative, and even word forms that could be the nominative of a given noun or the accusative of another, and things like that. So declension is great for human speakers (who have few or no problems with ambiguity) but a pain for the programming of an IF system.
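The lemma lookup problem can be sketched in Python with a reverse index from word forms to their possible analyses. The table below is a tiny, deliberately incomplete hand-written sample of forms of the Latin noun “rosa” (rose), and the names `FORMS`, `analyses` and `is_ambiguous` are hypothetical. It shows a different case syncretism from the ones mentioned above, but the problem for the parser is the same: one form, several candidate readings.

```python
from collections import defaultdict

# Incomplete sample: (word form, lemma, case, number).
FORMS = [
    ("rosa", "rosa", "nominative", "singular"),
    ("rosam", "rosa", "accusative", "singular"),
    ("rosae", "rosa", "genitive", "singular"),
    ("rosae", "rosa", "dative", "singular"),
    ("rosae", "rosa", "nominative", "plural"),
    ("rosas", "rosa", "accusative", "plural"),
]

# Reverse index: word form -> every (lemma, case, number) it could stand for.
INDEX = defaultdict(list)
for form, lemma, case, number in FORMS:
    INDEX[form].append((lemma, case, number))


def analyses(form):
    """Return all known readings of a word form (empty list if unknown)."""
    return list(INDEX.get(form, []))


def is_ambiguous(form):
    """True if the form has more than one possible reading."""
    return len(analyses(form)) > 1


print(analyses("rosam"))
print(analyses("rosae"))
```

With “rosam” the parser knows at once that it has a direct object. With “rosae” it gets three candidates and has to fall back on context, which is where the extra work for the IF system comes from.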

Just as a curiosity, in real unrestricted natural language parsing (which doesn’t have much to do with IF parsing) the languages for which parsers typically get the best accuracy tend to be English and Japanese, with Chinese following more or less closely. Then we have the Romance and the Germanic languages. The Slavic languages tend to be a bit more difficult, and Arabic and especially Turkish are very difficult and people get crappy accuracy (Turkish is noteworthy for being an agglutinative language that includes a lot of morphological information in a word). Note that this is a very rough and arguable outline, as these things depend on the domain of the texts, the availability of corpora, the number of people that happen to work on parsing or building grammars for a language, etc.; but that’s more or less the picture for natural language parsing.