[I7] Multiple Language Framework -- a good idea?

There are many extensions for Inform 7 which contain only a very small amount of player-visible text. For example, Emily Short’s Basic Screen Effects contains the single line “Please press SPACE to continue.”; the total prose in Jon Ingold’s Title Page is “Display help menu / Start the story - from the beginning / - from a saved position / Quit”; and Mark Tilford’s Simple Chat has only “(or 0 to say nothing)”.

Translating such an extension to another language is little work, and several of these translations can be found on the Inform 7 website. The translators have had to choose between two different possibilities:

  • Making a new extension that is a copy of the extension to be translated, with the text changed. (Example: Effetti Visivi di Base)
  • “Include”-ing the extension to be translated, and using “in place of” headings to replace those parts of the original that contain prose. (Example: German Basic Screen Effects)

Neither of these possibilities is ideal. The former option evidently leads to maintenance hell, as an extension with N translations needs to be updated by N translators every time a bug is fixed – even when the prose in the extension is not changed at all. The latter option may seem better, but in order to ensure compatibility between the translating extension and the original extension, the translating extension should ask for a specific version of the original extension (“Include version X of …”). This also leads to a minor maintenance hell, since a new version of the original extension requires an (albeit trivial) update of all the translating extensions.

With this rationale in mind, I tried to think of a more elegant solution, and I came up with the following extension:

[code]Multiple Language Framework by Victor Gijsbers begins here.

MLF-language is a kind of value. The MLF-languages are English, English US, French, German, Italian, Spanish, Russian and Dutch.

Game language is an MLF-language that varies. [Defaults to English.]

Multiple Language Framework ends here.[/code]
To use, the end user includes this extension and defines the language of his game with something like:

Game language is French.

Obviously, this extension does nothing by itself, but it does allow other extensions to non-invasively incorporate translations of themselves. Here is an example of an extension that would support multiple languages:

[code]MLF Test by Victor Gijsbers begins here.

Section - Code

Instead of singing:
	say "[Happy song]".

Section - Standard text (for use without Multiple Language Framework by Victor Gijsbers)

To say happy song:
	say "You sing a happy song.".

Section - MLF text (for use with Multiple Language Framework by Victor Gijsbers)

To say happy song:
	if the game language is:
		-- English:
			say "You sing a happy song.";
		-- English US:
			say "You sing a happy song.";
		-- French:
			say "Tu chantes une chanson heureuse.";
		-- German:
			say "Du singst ein heiteres Lied.";
		-- Dutch:
			say "Je zingt een vrolijk lied.".

When play begins (this is the MLF Test language check rule):
	unless game language is English or game language is English US or game language is French or game language is German or game language is Dutch:
		say "The language you have chosen is [bold type]not supported[roman type] by the MLF Test extension. You can disable this warning with the line: 'The MLF Test language check rule is not listed in any rulebook.'".

MLF Test ends here.[/code]
As you can see, if Multiple Language Framework is not included by the game, it is just a normal extension. So there are no extra dependencies, and people who are not interested in other languages will notice no change. But if Multiple Language Framework is included by the game, the translations kick in. (Both the main “to say” phrase and the “when play begins” rule, which is non-essential but nice to include, follow a standard format, and can be easily copy-pasted from example code that would be in the Multiple Language Framework documentation.)

An extension using MLF can be easily updated by the original author, without needing any further work by the translators. (Unless the prose of the extension changes, of course, but in that case you will always need further work by the translators.) There do not need to be multiple versions, and there can be no compatibility issues. Even adding further languages to the MLF extension would create no problems. So to me this seems to be a fairly elegant solution to the problem of translating minimal-prose extensions without increasing the difficulty of supporting those extensions.

What do you think?

The most obvious problem is that extensions will have to contain translations for all languages, which makes them much bigger than required. To support .z5 and other size-limited platforms, we’ll end up with at least two variants of all the libraries: one English-only and one babel (which is .z5-incapable). In a usual application we would split the library into logic and translations and compile in only the translations needed for one language, but as the goal is to produce natural language I don’t think this is feasible.

Thanks for this post, Victor. I like the concept of making it easy to localize extensions. However, I also think of this as a subset of a larger problem, which is the problem of making any text printed by an extension easily changeable. Currently, most extensions do nothing to make strings easy to alter, so that the best option for an author is simply to edit the extension.

Given this, I’d prefer a best-practices solution over the meta-extension solution you’re proposing. For example, if extension authors adopted the practice of corralling all strings into a single table, in its own source code section, we would then have a robust solution to both problems:

[code]To say message number (N - a number):
	if N is a string listed in the Table of Test Strings:
		say "[message entry]".

Section - English Localization

Table of Test Strings
string	message
10	"You sing a happy song."
20	"You sing a sad song."[/code]

Given this, an author can amend the table to change just one thing (or the whole table):

Table of Test Strings (amended)
string	message
20	"You sing a lacrimose song."

Or, for translating or changing the whole table:

[code]Section - French Localization (in place of Section - English Localization in Test Extension by Erik Temple)

Table of Test Strings
string	message
10	"Tu chantes une chanson joyeuse."
20	"Tu chantes une chanson triste."[/code]

This solution avoids the memory issues that Janka mentions, and also makes it far easier for someone to supply a Swedish translation if all I (as the extension author) have supplied is English and Spanish. An extension author could provide multiple localizations by making pasteable table code available in a special “localizations” section of the extension documentation.

Alternatively, localization tables could be provided in the source code of the extension, but included automatically only when the larger Inform translation extension is also included, e.g.:

[code]Section - French Localization (for use with French by Eric Forgeot)

Table of Test Strings (amended)
string	message
10	"Tu chantes une chanson joyeuse."
20	"Tu chantes une chanson triste."[/code]

Anyway, those are my dos centavos. Aaron Reed suggested in another thread that the next build of Inform will feature the long-awaited new approach to library messages. Depending on how this is structured, it might provide another avenue for this kind of thing…

–Erik

Extensions would not become bigger for English-language games, since sections headed “for use with Multiple Language Framework” don’t get compiled if you don’t include the MLF extension. Non-English games would become bigger, but only very modestly so: the extension would add just a few lines of code. So I don’t think this is a big problem.

(Also, why would anyone want to use .z5 anymore? .z8 is strictly better, and by now glulx has enough interpreters that even working with .z8 seems rather strange. I’m all for declaring the z-code format to be a legacy standard, and not worrying about supporting it for new games. But that is a completely different discussion, and one I’m pretty sure I’ve already had on this forum. :) )

And they are good twee centen. I can see the advantages, but there is also the disadvantage that including extensions is more work for non-English authors. They not only have to include the extension, but they also have to copy-paste the translation code and add it to their game. The advantage of my proposal would be that adding the two lines “Include Multiple Language Framework by Victor Gijsbers. Game language is French.” is all the code you have to add to your game, even if you use 50 extensions. But, as I said, I can see the advantages of your proposal as well, as it would be a more robust system for general prose changes.

Perhaps this suggestion of yours combines the advantages:

[code]Section - French Localization (for use with French by Eric Forgeot)

Table of Test Strings (amended)
string	message
10	"Tu chantes une chanson joyeuse."
20	"Tu chantes une chanson triste."[/code]
I should have thought about the fact that a game in language X will always already contain the language-X-extension, and that we can test for that. I mean, it will, right? You wouldn’t include French or German and not want all of your game to be in French / German.

So that means that my toy extension about singing could be written like this:

[code]MLF Test by Victor Gijsbers begins here.

Section - Code

Instead of singing:
	say "[Happy song]".

Section - Standard text

To say happy song:
	say "You sing a happy song.".

Section - French text (for use with French by Eric Forgeot)

To say happy song:
	say "Tu chantes une chanson heureuse.".

Section - German text (for use with German by Team GerX)

To say happy song:
	say "Du singst ein heiteres Lied.".

Section - Dutch text (for use with Nonexistent Dutch by Victor Gijsbers)

To say happy song:
	say "Je zingt een vrolijk lied.".

MLF Test ends here.[/code]
(Does a later declaration of “to say happy song” overwrite an earlier declaration? I don’t have Inform 7 ready to test this at the moment.)

This would not increase the size of compiled games. The only disadvantage I can think of is that there is no easy way to tell the game’s author if his chosen language is not available, but that is not too important. Is there any reason we are not doing this already?

I’d like to see a gettext-style solution. With gettext you usually print strings wrapped in a function that handles the translating:

__( "Click here to log in" );

The translation library looks for a match in the translation files and prints the localized version of the message if it finds one. The benefit is that you don’t have to create the English translation files separately. If the user’s language is English, the program prints the message as it is.
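The lookup-with-fallback behaviour described above can be sketched in a few lines. This is only an illustration in Python, not real gettext and not Inform: the `CATALOGS` dictionary, the `translate` function and the `"fi"` language key are all made-up names, and the noun "Kivi" is just an example argument.

```python
# Hypothetical catalog: language code -> {original message: translation}.
# All names here are invented for illustration; this is not the gettext API.
CATALOGS = {
    "fi": {
        "You sing a happy song.": "Laulat iloisen laulun.",
        "%s doesn't move at all.": "%s ei liikahdakaan.",
    },
}


def translate(message, language, *args):
    """Return the translation of message, or message itself if none exists."""
    catalog = CATALOGS.get(language, {})
    # Fall back to the original string when there is no catalog or no entry.
    template = catalog.get(message, message)
    # Fill in %s placeholders only when arguments were supplied.
    return template % args if args else template


print(translate("You sing a happy song.", "fi"))           # Laulat iloisen laulun.
print(translate("%s doesn't move at all.", "fi", "Kivi"))  # Kivi ei liikahdakaan.
print(translate("You sing a happy song.", "en"))           # You sing a happy song.
```

Note that the English text doubles as the lookup key, which is why no separate English catalog is needed.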

In Inform this could work something like this:

[code]Instead of singing:
	print "You sing a happy song."

Instead of pushing:
	print { "%s doesn't move at all.", "[The noun]" }.

Table of translations
"You sing a happy song."	"Laulat iloisen laulun."
"%s doesn't move at all."	"%s ei liikahdakaan."

To print (msg - text):
	[say the translation from the table of translations]

To print (msg-list - a list of text):
	[pick the right translation from the table matching the first item in the list, replace %s's with the following items]
[/code]
This is not a perfect solution but the good part is that you don’t have to add translations to the extension itself and it doesn’t add (much) more work for the extension author to provide translatability. I think it’s important that translations can be provided outside the extension itself so that the extension author doesn’t have to support translations in addition to the extension functionality.

There is active work happening right now that should address many of these issues for future Inform builds – easier ways to replace default text in the library and in extensions, and some consideration for multiple languages. (Sorry to be so vague, but some elements are unfinished, so I don’t want to trail features in a misleading way.)

Do you have an estimate on when this new feature will arrive? (If it’s a week, I’ll wait with publishing my extension. If it’s a month, I’ll publish my extension and wait with the internationalisation until the feature arrives. If it’s a year, I’ll add internationalisation in one of the ways discussed in this topic for now, and change it when the new feature arrives.)

I see at least two big problems with the manual gettext substitution method:

  1. It requires the use of indexed text, which means that about 100kb worth of code would be compiled into the game whether the author wants it or not. This is bad for folks writing with z-code, and there are also authors who don’t want to use indexed text at all (on principle).
  2. As Felix mentioned, it is not nearly as easy to work with as Inform’s native system for text substitutions.

The main reason for doing things the gettext way seems to be the ease of allowing the table lookup code to replace one string with another, instead of replacing a token* with a string. I don’t think that the ease of doing this outweighs these two problems.

*These “tokens” are numbers in the examples I pitched above, but they could also be kinds of value (KOVs). You could define the values using the table, so there would be no need to declare them ahead of time, e.g.:

[code]Message name is a kind of value. The message names are defined by the Table of Messages.

Table of Messages
message name	msg
happy song	"Sing a happy song."
sad song	"Sing a sad song."

To say msg (T - a message name):
	if T is a message name listed in the Table of Messages:
		say msg entry.[/code]

…and of course, none of this stops you from using “to say” phrases in addition.

–Erik

I would guess it’s on the order of month(s), not a year, but I’ll pass on to Graham that you’re interested in this; he may be able to provide more details.

I’ve never been quite comfortable with gettext, and I think it’s a very un-Informish way of doing things. I might be willing to live with something like that under the hood without the printf-style formatting. From looking at NI translations of I7 source, I think that might not be so far-fetched…

Table of Translations
source	target
"[The noun] doesn't move at all."	"[The noun] überhaupt nicht sich bewegt."

As I’ve noted before, a simple table-driven approach (either Inform 7 tables or gettext) would not be sufficient to generate natural language, which is still the goal, I think. Take Plurality by Emily Short as an example: it has some code to produce the correct possessive for an English noun. In English, that’s simple. In German, by contrast, you need to know the declension of the noun to do that. And as there is no rule for that, the extension either has to use a German dictionary (impractical) or let the game author hint it. So Plurality would even need an interface change to be able to produce natural German sentences.

Hm, I think you may be misunderstanding the purpose of Victor’s proposal. The idea is that the basic translation of Inform to the new language, including the kinds of library code needed to support gender, case, plurality, etc., is largely or completely taken care of already. Victor’s proposing a way to make it easier for the author of any old extension to provide the text of the extension in multiple languages (my and Juhana’s variations are intended to make it easy for end users to translate an extension after the fact as well). These translations would be able to make use of any tokens and text substitutions made available by the main translation library.

Mike, I’m not sure exactly what you mean, but it’s not possible to do what you seem to be suggesting by that table: Inform can’t directly match a text that includes substitutions. That’s because, when the text includes substitutions, it compiles to a function rather than just a numeric identifier. You can do it with indexed text, but I don’t think it’s desirable to force all strings in an extension to be indexed text just so that you can write, e.g. ‘print “[The noun] doesn’t move at all.”’ rather than ‘print the noun doesn’t move message’. The gettext-inspired solution that Juhana suggested allows for strings to be stored as text, but every string would have to be typecast to indexed text before printing to check for and carry out any simple gettext-style substitution. That may or may not be acceptable to potential users.

Anyway, here’s a quick illustration of the untenability of direct text matching:

[code]There is a room.

Bob is a man. The player is Bob.

Table of Translations
original	translation
"[printed name of the player] exists."	"Existe [printed name of the player]."

When play begins:
	print "[printed name of the player] exists."

To print (msg - text):
	if msg is an original listed in the Table of Translations:
		say translation entry;
	otherwise:
		say "Could not match string."[/code]

The text will not be matched. Move to indexed text, though, and it will (you’ll need to type at least the first column as indexed text as well as the input to the “print…” phrase).
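As a loose analogy only (this is Python, and it says nothing about how Inform actually compiles text), the difference resembles comparing two functions versus comparing the strings they produce. The names `original` and `candidate` are made up for the sketch; each stands in for a text containing a substitution.

```python
def original(name):
    # Stands in for a text with a substitution, such as
    # "[printed name of the player] exists."
    return f"{name} exists."


def candidate(name):
    # Written identically, but still a separate function object.
    return f"{name} exists."


# Comparing the functions themselves fails, even though their bodies match.
same_function = original == candidate

# Comparing the rendered strings (roughly the role indexed text plays) works.
same_output = original("Bob") == candidate("Bob")

print(same_function, same_output)  # False True
```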

–Erik

That’s too bad. I was assuming that every identical text would map to the same function, and thus be possible to test for equality. I don’t think it’s a great loss, though - as I said, I’m not really fond of gettext.

I look forward to seeing the new system for library messages. Hopefully it will make all of this moot.