Which languages have extant parser games?

So, the discussion on this thread:

Got me thinking about all of the parsers I’ve run into in various languages.

You have the obvious ones, English, French, Spanish, Italian, and German.

But I’ve also seen parsers made in Japanese, Catalan, Russian, Ukrainian, Czech, Polish, and Portuguese.

That said, I haven’t organically encountered any parsers written in (no particular order, and not remotely exhaustive) Arabic, Chinese, Hebrew, Hindi, Afrikaans, Greek, Indonesian, Vietnamese, Swahili, Tamil, Turkish, Korean, Thai, Hungarian, and innumerable others.

I’m sure many of these have parser games written in their language, but I have no idea which those are, and which actually signify a current or past community or just a couple lone developers.

I expect that, because removing the parser simplifies things, choice-style IF might be more ubiquitous across languages, and I’m interested in that too, but I am fascinated by the hoops different communities must have jumped through to create their own parser libraries, and by what those look like.

I was hoping we might explore the expanse and breadth of the parser world, sharing examples in various languages, and perhaps even identifying communities around such efforts, past or present.

Like, the idea of a small but intense Arabic parser scene existing is fascinating to me.

8 Likes

Found this mailbag from Emily Short, which doesn’t look super promising:

I did find a Chinese parser on IFDB, which initially seems promising:

But the blurb is a little disconcerting.

Chinese and Arabic speakers make up 1.3 billion and 0.5 billion people respectively. It’s sort of wild if parsers never germinated there because the tools didn’t exist and/or weren’t popularized. I get that IF has always been a niche of a niche, even in its heyday, but with a pool of nearly 2 billion people with just those two languages alone, it seems like there’s still a significantly large group of people that might find an interest in that.

6 Likes

I’m curious too. Unfortunately, English is my best language and my only language of true fluency, but I know a smattering of Chinese.

The Wikipedia page for Adventure game, which has been translated into many languages, could be a good starting point. A fair number of IF-related articles have been translated into Chinese. There’s a Wikipedia page for text parsers, but it only exists in English. Baidu Baike would be the real place to go, though, since unlike Wikipedia it’s not blocked in the PRC.

When it comes to China, it seems Infocom had no presence there. Maybe not surprising, but as an example of its obscurity, searching up “Infocom” on Rednote only gives me info about IEEE International Conference on Computer Communications (INFOCOM), the conference on computer networking for computer scientists. Apparently Infocom 2026 is in Tokyo. Fun.

I’m sure there’s something, but doubt it will have ties to this community. As far as I’m aware, this forum, IFDB, and all the other IF sites have only English UIs and no translation features, which is a huge impediment to anyone who could be interested but doesn’t know English. I’ll do more searching later.

[Edit: Discourse, which this forum runs on, does in fact provide translation options for the UI, and has Chinese options. But it doesn’t provide an option to automatically translate posts, and there are no Chinese-language posts on here, so I doubt it helps much.]

2 Likes

In the “lone developer” category: In university I created a multilingual version of the “In search of Dr. Livingston” adventure game (in Dutch, French, and Indonesian) with a custom parser (no doubt influenced by the compiler design course I was taking at the time). Unfortunately the source code did not survive until the present. I had it printed out but the printouts became unreadable at some point and I had to throw it all away. The same fate happened to the magnetic media on which I had kept a backup…

9 Likes

I have to say I am more intrigued by the possibility of making a Chinese language parser/translation. I too have studied it a little, and I expect that there are many horrors involved in parsing free text that I have yet to even imagine, let alone understand. But perhaps (looking at it from more of a game dev perspective) some very clever people could be contracted to make it happen. Currently I am thinking the lack of clear/reliable markers - between words and to identify different grammatical elements - is the biggest challenge … so far.

2 Likes

Making a simple parser wouldn’t be too difficult, since English parser commands can be easily mapped to Chinese. One character per command. Something like:

  • 看 - look
  • 拿 - take
  • 东南西北 - east, south, west, north respectively
  • 上下 - up, down respectively
  • 进出 - in, out respectively
  • 吃 - eat
  • 等 - wait
  • 开 - open
  • 关 - close

And so on. Adding complex grammar would be more complicated, but far from impossible.

  • 把[object name]给[recipient] or just 给[recipient][object name] - implementation of “give [object name] to [recipient]”
  • 用[object name][command] - implementation of “use [object name] to [command]”, so for example 用钥匙开门 would be use keys (钥匙) to open door (门)
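To make the two lists above concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the verb table follows the list above, while the noun tables (`OBJECTS`, `PEOPLE`), the guard character 守卫, and the tuple format returned by `parse` are all made up for the example.

```python
# Hypothetical one-character verb table, following the list above.
VERBS = {
    "看": "look", "拿": "take", "吃": "eat", "等": "wait",
    "开": "open", "关": "close",
    "东": "east", "南": "south", "西": "west", "北": "north",
    "上": "up", "下": "down", "进": "in", "出": "out",
}
# Hypothetical nouns; a real game would take these from its world model.
OBJECTS = {"钥匙": "keys", "门": "door"}
PEOPLE = {"守卫": "guard"}


def parse(command):
    """Return a tuple describing the command, or None if not understood."""
    command = command.strip()
    # 把[object]给[recipient]
    if command.startswith("把") and "给" in command:
        obj, _, who = command[1:].partition("给")
        if obj in OBJECTS and who in PEOPLE:
            return ("give", OBJECTS[obj], PEOPLE[who])
        return None
    # 用[object][command]: strip the tool, then parse the rest as a command.
    if command.startswith("用"):
        rest = command[1:]
        for name, tool in OBJECTS.items():
            if rest.startswith(name):
                inner = parse(rest[len(name):])
                if inner is not None:
                    return ("use", tool, inner)
        return None
    # Plain command: first character is the verb, the rest an optional object.
    verb = VERBS.get(command[:1])
    if verb is None:
        return None
    rest = command[1:]
    if not rest:
        return (verb,)
    if rest in OBJECTS:
        return (verb, OBJECTS[rest])
    return None
```

With these tables, `parse("用钥匙开门")` comes back as `("use", "keys", ("open", "door"))`. The 给[recipient][object name] form is left out here because it needs some way to decide where the recipient ends.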

There are some interesting aspects to consider like measure words, which you might want to put in if you’re doing an inventory system. I’m sure there’s a library out there to get the right measure words for a noun, and barring that you could just do the simple, stupid option of having the object name and then the number of it in your inventory after, in parentheses.
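As a rough sketch of that inventory idea in Python: the `MEASURE_WORDS` table and the `describe` function are hypothetical, the table holds only three entries for illustration, and anything missing from it falls back to the "name, then count in parentheses" option.

```python
# Hypothetical measure-word table; real coverage would need a far larger list.
MEASURE_WORDS = {"钥匙": "把", "书": "本", "苹果": "个"}


def describe(name, count):
    """Format one inventory line, using a measure word when one is known."""
    measure = MEASURE_WORDS.get(name)
    if measure is None:
        # Simple fallback: object name, then the count in parentheses.
        return f"{name} ({count})"
    return f"{count}{measure}{name}"
```

This uses Arabic digits for the count; writing the numbers as Chinese numerals would be a further step.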

Does a parser really need reliable markers between words? When I tried my hand at a custom parser, I hardcoded it to recognize specific command verbs, which would go at the start of a player command, and then hardcoded in edge cases like give x to y where the parser would recognize the word “give” at the start of the string and then look for the other stuff. This grammar works for Chinese as well. The way I implemented it for English didn’t require spaces and could’ve easily been modified to accept commands without spaces. My parser sucked though, so I don’t know if there’s a better paradigm out there.

I looked at the Chinese parser linked in this post, and it already does a lot of what I talked about here. Interestingly, it seems to only accept traditional characters, not simplified. The traditional/simplified split should be fine to manage, as you’d just need to code the input to accept simplified only and convert all traditional into simplified beforehand. Doing everything in traditional would be more difficult, because the conversion from Traditional → Simplified is lossy and there are certain simplified characters that map to multiple traditional ones, but I don’t think it would be an issue for a simple parser.
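A minimal sketch of that "convert everything to simplified first" step, in Python. The table `T2S` is hand-written and covers only a few characters from this post, so it is an assumption for illustration; a real game would need a complete conversion table or a dedicated conversion library.

```python
# Tiny hand-written traditional -> simplified table, for illustration only.
T2S = str.maketrans({
    "開": "开", "關": "关", "門": "门", "鑰": "钥",
    "給": "给", "東": "东", "進": "进",
})


def normalise(command):
    """Map any traditional characters we know about to simplified ones."""
    return command.translate(T2S)
```

Characters that are the same in both scripts, or missing from the table, pass through unchanged, so the parser only ever has to match simplified text.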

I guess there’s possible ambiguity introduced with commands like 给[recipient][object name]. You’d have to start with a full list of all the possible recipients and objects and check each character after 给 in turn, crossing off the recipients/objects whose names don’t match, until you reach a recipient match. Then do the same thing with the remaining characters for an object match.
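Here is one way that check could look in Python. It is a sketch, not a tested design: `split_give` is a hypothetical helper, the recipient and object names are invented, and instead of crossing names off character by character it simply tries every recipient as a prefix, which should give the same result for small name lists.

```python
def split_give(text, recipients, objects):
    """Parse 给[recipient][object]; return (recipient, object) or None."""
    if not text.startswith("给"):
        return None
    rest = text[1:]
    matches = []
    for recipient in recipients:
        remainder = rest[len(recipient):]
        # Keep the split only if both halves are known names.
        if rest.startswith(recipient) and remainder in objects:
            matches.append((recipient, remainder))
    # Refuse to guess when zero or several splits are possible.
    return matches[0] if len(matches) == 1 else None


# Hypothetical names chosen to show the ambiguity.
recipients = {"守卫", "守卫长"}
objects = {"钥匙", "长钥匙"}
print(split_give("给守卫钥匙", recipients, objects))
print(split_give("给守卫长钥匙", recipients, objects))
```

The second command could be "give the guard the long key" or "give the guard captain the key", so this sketch returns None there and the game would have to ask the player which was meant.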

As a side note, my lack of Chinese is hampering my ability to look for any Chinese parsers. The Chinese-language Wikipedia article for MUDs, which has a section on the history of Chinese MUDs, may hold some interest. You’ll have to paste it into a translation service. It basically says the earliest MUDs were English-only, but in the 90s people worked on getting the tech to support Chinese characters. Then, it seems, you had MUDs that required you to input commands in English but gave you Chinese output. As full Chinese support was reached, there was an explosion of Chinese MUDs as people built on each other’s software.

[Edit: There’s a small but devoted Chinese MUD community that still plays text-based MUDs. According to the Chinese-language Wikipedia article on Chinese MUDs, and several videos I found on bilibili, one of the most popular early MUDs was created in late 1995 by a group of Chinese students studying abroad in the US. It’s called 北大侠客行 or just 侠客行 (Xiá Kè Xíng), meaning Ode to Gallantry. It’s titled after a highly popular novel of the same name, which was serialised in Hong Kong from 11 June 1966 to 19 April 1967 and still gets a lot of modern TV adaptations. I found this video on bilibili, which shows the gameplay, but is entirely in Chinese. The video says Ode to Gallantry still gets about 600-700 players a day. Not entirely relevant to your search, but not irrelevant either.]

[Edit 2: There are a few videos of Ode to Gallantry on Youtube, if you look up 北大侠客行, but they’re all in Chinese. There’s also a Baidu Baike article on the game, which can be copied into a translation service if you want to skim it, though no Wikipedia article. I’ll stop looking for now, since this is a serious rabbit hole.]

11 Likes

Thank you for that very thorough answer/exploration. My current homebrew parser uses syntax matching as you have described (it looks for fixed bits of text, albeit with lots of alternatives and optional bits, and figures out what entity the bits in between relate to), and I already have a system for hard matches involving consecutive entities (e.g. give big man coiled rope) which brute-force tries every possible split to find one (and only one) that works.
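For anyone curious what that brute-force split might look like, here is a small Python sketch. It is not the actual implementation from my project: `split_entities` and the set of known entity names are hypothetical, and it assumes the verb has already been stripped off.

```python
def split_entities(words, known):
    """Try every split of the words into two consecutive entity names."""
    splits = []
    for i in range(1, len(words)):
        first = " ".join(words[:i])
        second = " ".join(words[i:])
        if first in known and second in known:
            splits.append((first, second))
    # Accept only when exactly one split works.
    return splits[0] if len(splits) == 1 else None


# Hypothetical entity names for the "give big man coiled rope" example.
known = {"big man", "man", "coiled rope", "rope"}
print(split_entities("big man coiled rope".split(), known))
```

Here only the split "big man" / "coiled rope" matches two known entities, so that is the one returned.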

Edit: since I think I am resigned to having different compiled parser modules for different languages, there is some leeway to modify things for particular languages, and also to outsource some of the more tricky adaptations to native speaker coders

1 Like

On IFDB you can search for language: followed by a language code, but even more usefully, the “Search Tips” page lists all language codes currently in use by games in the database: Search for Games

(Of course you’d have to cross check which ones are parser games…)

2 Likes

On a tangent, but Discourse can automatically translate posts. They use that feature on the Discourse forum about Discourse: see Discourse Translator - Plugin - Discourse Meta. Chinese is one of the languages they have.

2 Likes

Here are the languages of games on IFWiki:

Language:

None (67) · Dutch (2) · English (3650) · Esperanto (1) · French (96) · German (28) · Gostakian English (1) · Italian (21) · Japanese (2) · Lojban (1) · Polish (1) · Russian (2) · Spanish (28) · Swedish (3)

“None” obviously doesn’t mean no language… in practice it’ll likely mean English.

There are also some interesting pages (and external links) within Category:Language - IFWiki.

3 Likes

In my own parser tinkering, I have now gained some experience in which aspects really differ between languages in terms of parsing. I must admit: my experience is limited to German and English, as well as various considerations I have made about a Spanish parser.

My first attempt at a parser was completely bilingual. At some point, this resulted in a terrible mess. In German, there are many compound words (“Blumentopf” = flower pot), which are very clear and none of these nouns are ever used as verbs. In English, it becomes “plant pot,” and here it can of course also be a verb. How to scan related words that describe an object in a continuous text is identical in both languages. In Spanish, however, the adjectives would come after the nouns.

Verb conjugations and grammatical cases were not without effort, but ultimately much more manageable than expected. In English, an “s” or an “ed” is occasionally added to the verb, and that’s it. In German, there are several dozen variants, while in Spanish there are roughly 10 times as many. In the end, the grammatical cases were easier: reducing nouns and adjectives to their root form and only then parsing them quickly proved to be quite reliable. That makes me optimistic about my future attempts at Spanish.
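A very naive Python sketch of the "reduce to root form first" idea, for German. This is not my actual code: the suffix list and the `to_root` helper are assumptions for illustration, and it ignores things like umlaut plurals entirely.

```python
# Hypothetical list of common German case and adjective endings.
SUFFIXES = ("en", "em", "er", "es", "e", "s", "n")


def to_root(word, vocabulary):
    """Return the root form of a word if the game knows it, else None."""
    word = word.lower()
    if word in vocabulary:
        return word
    for suffix in SUFFIXES:
        # Strip an ending only when what is left is a known root.
        if word.endswith(suffix) and word[:-len(suffix)] in vocabulary:
            return word[:-len(suffix)]
    return None


vocabulary = {"rot", "blumentopf"}
print(to_root("roten", vocabulary), to_root("Blumentopfes", vocabulary))
```

Checking the stripped form against the game’s own vocabulary keeps the stripping from producing roots that do not exist in the game.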

The bottom line is that there are very different experiences with which modules are likely to work best across languages and which are best suited to specific languages or language groups. My current guess is that probably 80% of the parser code could be used across languages, which is quite a lot.

4 Likes

One of the biggest things getting in my way of localising an Unreal Engine based project is that (as far as I am aware) you can localise individual entries in an array of text, but you can’t substitute a different sized array, which is an issue with things like syntax strings, that are heavily language-dependent (the bit you most need to work on, really). Dumb thing but it throws rocks in my path.

1 Like

Matt, why do you have arrays with different language-dependent sizes?

1 Like

Each verb can be given an arbitrary number of syntax strings to match. This is a strength. An array is a natural implementation for that. How I adjust that for different languages with possibly different numbers of syntax strings per verb is a challenge, but I might go down the route of child blueprints rather than finding an alternative system to the array (e.g. loading in a string table: good in some ways but much less easy to edit). But I am rather getting into the weeds now.

Edit: one reason I am using conventional systems as much as possible is because Unreal Engine has a localisation system built in to all that, which I may avoid, but not until I have to…

1 Like

How I adjust that for different languages with possibly different numbers of syntax strings per verb is a challenge

I solved it this way: each syntax is language-specific, but ends in one “action.” The actions are completely language-independent. There are dozens of variants of varying complexity for individual verbs.
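Roughly like this, as a Python sketch. The patterns, the action names and the `parse` function are simplified stand-ins rather than my real tables, and the German syntaxes ignore articles and cases.

```python
import re

# Hypothetical syntax tables: each language has its own list, of any length,
# but every entry ends in a language-independent action name.
SYNTAX = {
    "en": [
        (r"give (?P<obj>.+) to (?P<who>.+)", "GIVE"),
        (r"take (?P<obj>.+)", "TAKE"),
    ],
    "de": [
        (r"gib (?P<obj>.+) an (?P<who>.+)", "GIVE"),
        (r"nimm (?P<obj>.+)", "TAKE"),
        (r"hebe (?P<obj>.+) auf", "TAKE"),
    ],
}


def parse(language, command):
    """Return (action, slots) for the first matching syntax, or None."""
    text = command.strip().lower()
    for pattern, action in SYNTAX[language]:
        match = re.fullmatch(pattern, text)
        if match:
            return action, match.groupdict()
    return None
```

Because the game logic only ever sees "GIVE" or "TAKE" plus the filled slots, it does not matter that German has three syntaxes here and English two.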

What does Unreal’s localization tool offer that would be helpful for a parser framework? I always thought that the system was pretty basic, because most games don’t use complex language logic.

2 Likes

Unreal has text types which are like string types but automatically localise. As in, you push a button, and every text literal gets dumped into a big file (specially for localisation software) and whatever translators give you in return is automatically substituted if you change locale. Works great for UI stuff, for example, but I have had translators tell me to do it manually in string tables (.csv files), for which there is also some engine support, because you can add comments and therefore also necessary context.

Text literals can be blueprint variables, widget text, or embedded in C++ (amongst other things), which has special macros for declaring text for localisation. A teeny bit less slick than the blueprint side.

It’s a bit involved, but what all the big UE games are using, I imagine.

1 Like

You are correct. Unreal’s loc system, like other mainstream game frameworks, is built on “lines of output” as a fundamental unit. It doesn’t help with variable text or interpolations – you can build that on top, but nothing helps you with the extremely messy problem of generalizing a text interpolation across languages. And the system doesn’t deal with text input at all.

3 Likes

How are you counting games that use one language for input and another for output, such as Mystery House (1982)?

2 Likes

Considering that I had not considered this until I read your post 30 seconds ago, I sincerely have no idea.

2 Likes

Greek - :: CASA :: Herakles

Turkish - :: CASA :: Keloglan

Hungarian - :: CASA :: Games - Hungarian (46 results)

CASA games by language - :: CASA :: Browse Games - Languages

I’ve always wanted to make a text adventure in Welsh, particularly given the strong text adventure history in Wales, but my Welsh is sadly not good enough yet.

4 Likes