Inform and Japanese

halkun · March 12, 2013, 8:10am

This is a longish continuation of an input problem I posted here.

Because I wanted to get myself into Inform, and not ask silly questions all the time, I took a month out of my life and worked through Aaron Reed’s “Creating Interactive Fiction with Inform 7” book. It was an awesome read (I picked it up from Amazon and read it on my tablet while I coded). With a better understanding of the Inform 7 system, I have some really far-out ideas I want to try to implement; some of them are pretty advanced, and some are Glulx/Glimmr based. It’s also pretty complicated, so I’ll try to break down the ideas so that you can get a handle on what I’m trying to do.

The goal - Two parsers in two languages.
I’ll break down each task and then how I plan to solve it.

Task #1 (less important)
You have the input parser that accepts English. When you type in a particular command, a second parser is launched, and this second parser only accepts Japanese sentences. Glimmr seems to have the ability to open separate text streams in different windows, so I think I’ll need to investigate that.

Task #2 (more important)
The second parser, because it only accepts Japanese input, uses subject-object-verb (or subject-verb) sentence structure. My theory is that this will work by capturing the Japanese sentence, identifying the subject, object, and verbal portions (along with auxiliary parts of speech), and flipping them around. Then I can put them into the English parser for parsing. (Basically a translation method of sorts.)
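A minimal sketch of the “flip them around” idea, written in Python only to show the word shuffling (in Inform the equivalent rewrite would live in an “After reading a command” rule). It assumes a single object marked by the particle “o” (or “wo”) and treats everything after the particle as the verbal; the function name `sov_to_vo` is hypothetical.

```python
def sov_to_vo(command: str) -> str:
    """Rewrite a romaji 'X o VERB' command into 'VERB X' order.

    Assumes at most one object particle ('o' or 'wo') and that every
    word after the particle belongs to the verbal.
    """
    words = command.lower().split()
    for i, word in enumerate(words):
        if word in ("o", "wo"):
            obj = words[:i]
            verb = words[i + 1:]
            # The particle itself is dropped; verbal first, then object.
            return " ".join(verb + obj)
    # No object particle (subject-verb or a bare verb): leave the order alone.
    return " ".join(words)


print(sov_to_vo("Yuubinbako o Akeru"))  # akeru yuubinbako
print(sov_to_vo("Tegami o Yomu"))       # yomu tegami
```

The output keeps the Japanese words, so the Inform side would still need grammar such as `Understand "akeru [something]" as opening.` for the rewritten command to mean anything.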

I understand that the parser does not take Unicode input. Romaji (ASCII-letter input) is fine.

To accomplish any of this, I see that Inform has the “After reading a command” rule to start with. It appears that I can commandeer this rule and branch off into my own little insanity, but I’m coming up short on the parser’s… guts, for lack of a better term. The parser always assumes that the input is going to be SVO (subject-verb-object) grammar, and I need to change it to SOV. (If there is a built-in switch to do this… that would be awesome!)

So as you can see, this goes a little deeper than “understand X as Y”. The issue is that the sentences are completely (If not systematically) jumbled.
To go further down the rabbit hole, here’s a quick crash course in Japanese grammar.
Japanese nouns are called “nominals” (because you can do things to them that you can’t do with normal nouns, mostly pluralization rules).
Japanese adjectives are called “adjectivals” (same reason; for example, you can put an adjective in a tense in Japanese. Note: adjectivals go before nominals in Japanese, just as adjectives go before nouns in English).
Japanese verbs are called “verbals” (because of verbal modification rules).
Japanese auxiliary parts of speech are called “particles”. They are used to mark which word is the subject, the object, the target of an action, etc.

To restate my last post, here are some input examples…

Open the mailbox
becomes
Yuubinbako o Akeru

Breaking down the Japanese it looks like this
(Yūbinbako) [Mailbox] {Object-Nominal}
(o) [part of speech saying the word previous was an object] {particle}
(Akeru) [open] {verbal}

going further

Read the leaflet
Tegami o Yomu
(Tegami) [Letter] {Object-Nominal}
(o)[part of speech saying the word previous was an object] {particle}
(Yomu) [Read] {verbal}

Also, I’d like the secondary parser to engage immediately when the computer detects that I’ve said a person’s name. (But without launching a second window… which is fluff I’ll deal with later.)

Kaori, pick up the newspaper.
Kaori-san, shinbun o totte kudasai

(Kaori-san) [her name] {parser activator}
(shinbun) [newspaper] {object}
(o)[part of speech saying the word previous was an object] {particle}
(totte) [take] {verbal}
(kudasai) [please do] {verbal} (Note: “totte kudasai” can be seen as one verb; the rule is that if you see kudasai, you execute the verb before it. It’s just a politeness thing.)
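A hedged sketch of the “Name-san, ... kudasai” case, again in Python purely to illustrate the rewrite: strip the honorific from a known addressee, drop the trailing “kudasai” as described in the note above, and move the verbal in front of the object. `PEOPLE`, `HONORIFICS`, and `rewrite_request` are hypothetical names. The result places the verbal directly after the comma.

```python
# Hypothetical roster: romaji name typed by the player -> name the parser knows.
PEOPLE = {"kaori": "kaori"}

# Honorific suffixes to strip from an addressee's name.
HONORIFICS = ("-san", "-chan", "-kun")


def rewrite_request(command: str) -> str:
    """Rewrite 'Name-san, X o VERB kudasai' as 'name, VERB X'."""
    text = command.strip().rstrip(".").lower()
    prefix = ""
    head, sep, rest = text.partition(",")
    if sep:
        name = head.strip()
        for suffix in HONORIFICS:
            if name.endswith(suffix):
                name = name[: -len(suffix)]
                break
        # Only treat the text before the comma as an addressee if it names a
        # known character; otherwise it stays part of the command text.
        if name in PEOPLE:
            prefix = PEOPLE[name] + ", "
            text = rest
    words = text.split()
    # "kudasai" only adds politeness, so drop it and keep the verb before it.
    if words and words[-1] == "kudasai":
        words.pop()
    # Move the words after the object particle to the front; drop the particle.
    for i, word in enumerate(words):
        if word in ("o", "wo"):
            words = words[i + 1:] + words[:i]
            break
    return prefix + " ".join(words)


print(rewrite_request("Kaori-san, shinbun o totte kudasai"))  # kaori, totte shinbun
```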

If I can break apart the sentence, I can work out the nitty-gritty of the grammar. (That part, believe it or not, is the fun bit I want to tackle.) However, how does one break up the input and then inject the words back into the parser using the “understand as” method?

I guess, in the end, I’m more interested in task #2 first.

matt_weiner · March 12, 2013, 2:07pm

I might try to see if I can come up with something to allow for SOV parsing later today. It seems to me that capturing the Japanese sentence and flipping around the parts to feed them into the English parser is a bad idea (unless you’re prepared to write a full-blown Japanese parser extension or something). Since all the verbs will be in Japanese you can write special grammar lines for them such as

Understand "[something] o akeru" as opening.

And you can give things a Japanese name property and feed that into the parser:

A thing has some text called the Japanese name. The Japanese name of the mailbox is "Yuubinbako". [you may want to be very tolerant of alternate spellings!] Understand the Japanese name property as referring to a thing.
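One way to be tolerant of alternate spellings is to reduce both the stored Japanese name and the player’s input to a single canonical spelling before comparing them. Below is a minimal Python sketch. It assumes the only variants worth folding are macron vowels (ū to uu, with ō mapped to ou as a keyboard convention) and a handful of Kunrei-shiki syllables (si, ti, tu, zi) rewritten in Hepburn, so that, for example, “Yūbinbako” and “yuubinbako” compare equal, as do “sinbun” and “shinbun”. The tables and `canonical_romaji` are hypothetical. In Inform itself, the simpler route may just be listing the variants in Understand lines.

```python
import re

# Hypothetical spelling table: macron vowels become doubled vowels
# (ō is mapped to "ou", the usual keyboard spelling).
MACRONS = str.maketrans({"ā": "aa", "ī": "ii", "ū": "uu", "ē": "ee", "ō": "ou"})

# A few Kunrei-shiki syllables rewritten to their Hepburn spelling.
KUNREI_TO_HEPBURN = [("si", "shi"), ("ti", "chi"), ("tu", "tsu"), ("zi", "ji")]


def canonical_romaji(text: str) -> str:
    """Reduce common romanization variants to one spelling."""
    text = text.lower().translate(MACRONS)
    for kunrei, hepburn in KUNREI_TO_HEPBURN:
        text = text.replace(kunrei, hepburn)
    # The object particle is often typed as "wo"; treat it as "o".
    return re.sub(r"\bwo\b", "o", text)


print(canonical_romaji("Yūbinbako"))  # yuubinbako
```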

Making sure that you can’t mix languages like “mailbox o akeru” or “open yuubinbako” would take more work, as would switching from Japanese mode to English mode, but I think it’d probably be doable.

Buuuut, converting “Kaori-san, shinbun o totte kudasai” to “Kaori, get the newspaper” may be very difficult. The problem, as mentioned here and in section 16.10 of the documentation, is that the Inform parser can only tell that you’re issuing a command (rather than answering Kaori “shinbun o totte kudasai”) if the first word after the comma is a verb (or a direction, I think). Since in this case the first word after the comma isn’t a verb, there may be trouble.

Disclaimer: I’m not so expert, and I don’t speak a word of Japanese so I don’t know what other issues might be raised.

maga · March 12, 2013, 3:53pm

The main bit of advice I’d give you is: before you start working on this in earnest, I’d wait for the next release of I7. (One of the big new features involves stuff that will make it much easier to use languages other than English.)

Alex · March 12, 2013, 5:11pm

Quest could handle this pretty well. You could have two input windows easily enough - each text box can call its own Quest function to do the parsing. Quest’s parser is written using Quest’s ASL scripting language, and is in the CoreParser.aslx file, so you could adapt and modify that as required. Also Quest already supports multiple languages.

Not saying it will be easy, just that it would be easier to do it in Quest than Inform. All the internals are available to the game author and easily modifiable.

halkun · March 12, 2013, 6:01pm

Do you know when this may happen?

zarf · March 12, 2013, 6:41pm

It has not been announced. (See the sticky thread in this forum group – popular question, no good answer.)

Alex · March 12, 2013, 6:46pm

(Oh the other thing about Quest is no big secrecy around releases. Nobody is pretending to be Apple. You can grab the latest dev code from CodePlex at any time.)

zarf · March 12, 2013, 7:12pm

Alex, you are verging past informative and into harangue.

eu1 · March 13, 2013, 12:54am

So I tried this, using a parser from another project to do the heavy lifting. I can’t say I’m exactly happy with the results—maybe the range of possible commands is small enough that regexp replacements would be a better bet—but you can see them here.

(Incidentally, if I were playing Japanese IF, my first instinct would be to enter commands in the -te form, “yonde” rather than “yomu”, for example. But maybe that’s because my Japanese is so rusty.)

halkun · March 13, 2013, 5:10am

[Ignore this post; I have more on the second page.]

halkun · March 13, 2013, 6:53am

Oh my god, I didn’t even see this! This is a really cool framework to start from! I’ll play with this and see what happens tonight. (I may even be able to inject some Unicode output too. So awesome)

=== EDIT ===

Oops, I spoke too soon. There are several problems with this.

1. It seems to require extensions that aren’t part of a typical Inform install. When I downloaded the extensions from GitHub, some of them (Disambiguation Framework) were not in the release, and when I grabbed the master branch, it would not work in the version of Inform I have [6G60].
2. This appears powerful but clunky/complicated. The idea is that I want to write an IF game to teach Japanese, and it will have upwards of 1000 items in it. That would make the “when play begins” rule mind-numbingly gigantic, though I may be able to break it out into sections.
3. It’s GPL, and although I’m all for the open software movement, I’m not keen on releasing my code.

I guess I should take a step back and explain what’s going on:

The idea is that you are stuck on an island with this girl. There are no other people in the game. You can’t speak Japanese, and she can’t speak English. The girl will only speak in full-blown Japanese sentences written in Japanese Unicode characters. When I say you will have no idea what she’s saying, I mean you will have zero idea.

You need to communicate with her in order to finish the game. In some parts she will tell you what you need to do, and in other parts you need to tell her what to do. If you don’t understand what she is saying, or if you type out commands incorrectly, the game won’t progress.

This is where the two-parser idea comes in. In my second attempt to write this game (this is try #3), I introduced two mechanics to teach you Japanese: an argumentative and bitter electronic dictionary, and a book that the girl draws things in.

The idea is that when the dictionary was in a (rare) good mood, it would hint at what the girl is trying to say. Any Japanese you picked up from the device would be copied into the book, so that when the electronic dictionary was being uncooperative, you would still have a log of what to do.

I was planning on using the dictionary as a parser. You type in your Japanese, and it would output the Unicode and the girl will do what you ask. You would have your own parser that you would use to navigate the game proper in English.

It seems I’m overcomplicating things. There may be a more elegant way.

With Inform, it appears I can use persuasion. The core mechanic is that you have to tell the girl what to do. If you use the right command, Inform’s actions can take it from there, and I can use rules to make sure everything is aligned properly.

== edit 2 ==

[I had code here, but it didn’t work because I didn’t follow what someone told me earlier, which is that persuasion doesn’t work with “understand x as y”.]

It looks like the most elegant code is the code above, but it doesn’t work, and there’s the whole GPL thing.

Felix_Larsson · March 13, 2013, 10:08am

However, most or all of this stuff is to do with non-English output, not with non-English input, i.e. with the parsing of commands in other languages. Specifically, I doubt that there are plans for a built-in support for commands with an Object–Verb word order.

The “officially” preferred way to deal with such things still seems to be to write an I6 LanguageToInformese routine that transforms the input language into the right number of lexical chunks, in the right order, for the parser to make sense of. In this case, I guess, that means shifting whatever words appear after the particle “o” to the front of the command. If the command contains a comma, a period, a THEN, etc., the words between “o” and the next comma, period, THEN, …, or end of the sentence should instead be placed directly after the preceding comma, period, THEN, etc., or else (if no such word precedes the noun) at the beginning of the sentence.
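The clause-by-clause shuffle described above can be sketched as follows. This is Python rather than the I6 LanguageToInformese routine being discussed, and it only models the word reordering. It splits the command at commas, full stops, and THEN, and within each clause moves the words after “o” to the front of that clause. Dropping the particle itself is an assumption of this sketch, and the function names are hypothetical.

```python
import re

# Clause boundaries: comma, full stop, and THEN (input is lowercased first).
SEPARATOR = re.compile(r"(,|\.|\bthen\b)")


def reorder_clause(clause: str) -> str:
    """Within one clause, move the words after 'o' to the front."""
    words = clause.split()
    if "o" not in words:
        return " ".join(words)
    i = words.index("o")
    # The particle itself is dropped in this sketch.
    return " ".join(words[i + 1:] + words[:i])


def language_to_informese(command: str) -> str:
    """Reorder every clause of a romaji command independently."""
    pieces = []
    # The capturing group makes re.split keep the separators in the result.
    for part in SEPARATOR.split(command.lower()):
        piece = part.strip() if SEPARATOR.fullmatch(part) else reorder_clause(part)
        if piece:
            pieces.append(piece)
    # Re-attach commas and full stops to the preceding word.
    return re.sub(r" ([,.])", r"\1", " ".join(pieces))


print(language_to_informese("Kaori-san, shinbun o totte. Tegami o yomu"))
# kaori-san, totte shinbun. yomu tegami
```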

DavidC · March 13, 2013, 1:54pm

I was thinking the same thing. One plug/off-topic comment is okay. Two is … unnecessary?

David C.
www.textfyre.com

mostly_useless · March 13, 2013, 2:55pm

Both valid points, though. The whole idea behind Quest and the open-sourceiness of it is awesome, it’s just a shame there aren’t many good games being made with it.

DavidC · March 13, 2013, 3:46pm

The point is that this is the Inform topic… there are other places to mention Quest, or you can mention it to the user directly.

David C.

eu1 · March 13, 2013, 3:54pm

Whoa, let me clarify. I guess that I wanted to make two points.

First, that, in the worst case, there is in fact code out there that can systematically get from Japanese to English word order using the I7 equivalent of LanguageToInformese. There are some problems, of course, which is why I said I wasn’t happy with it: #2, the clunkiness, is the big one. (I’m not sure what’s going on in #1, since I can’t replicate either part of it. #3 would be trivial to waive.)

But, having tried it, I’m not sure that that kind of sophistication buys you anything. I had spent a bit, for instance, adding to the grammar so that it could get from a command like “tsukue no ue ni aru hon wo yonde kudasai” to a command like “read the book that is on top of the desk” or even “read the book on the desk”. In the end, it didn’t do me any good; the Inform parser won’t deal with such complications anyway. Were I you, I would inventory the set of commands you need to deal with. If, as I suspect, they all fit into one of the forms “_”, “_ wo _”, “_ ni _ wo _”, or “_ de _ wo _”, then use the phrase at the end of WI 19.8 to rearrange the blanks and call it good. If not, then maybe give some of the non-conforming examples here.
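If the inventory does come down to those four forms, the rearrangement amounts to one pattern-and-replacement pair per form, in the same spirit as rearranging the blanks with the WI 19.8 phrase. Here is a Python sketch. It assumes the verbal is a single word, optionally followed by “kudasai” (which is dropped), and that either “wo” or “o” marks the object. The `FORMS` table and the example verbs are illustrative only.

```python
import re

# Pattern -> replacement for each command shape. The "_ ni _ wo _" and
# "_ de _ wo _" shapes are tried before the plain "_ wo _" shape.
FORMS = [
    (r"^(.+) (ni|de) (.+) (?:wo|o) (\S+)$", r"\4 \3 \2 \1"),
    (r"^(.+) (?:wo|o) (\S+)$", r"\2 \1"),
]


def rearrange(command: str) -> str:
    """Put the verbal first; 'ni'/'de' stay with their noun, 'wo'/'o' is dropped."""
    text = " ".join(command.lower().split())
    text = re.sub(r" kudasai$", "", text)
    for pattern, replacement in FORMS:
        new_text, count = re.subn(pattern, replacement, text)
        if count:
            return new_text
    # The bare "_" form: a lone verbal needs no rearranging.
    return text


print(rearrange("Tegami wo yonde kudasai"))  # yonde tegami
print(rearrange("tsukue ni hon wo oku"))     # oku hon ni tsukue
```

On the Inform side, a rewritten command like “oku hon ni tsukue” would then need a grammar line along the lines of `Understand "oku [something] ni [something]" as putting it on.`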

halkun · March 13, 2013, 11:26pm

Ahh, I was a little too excited when I saw the code you wrote… Sorry.
I’m running through some IF in Japanese right now and inventorying the commands. This may be my best bet… There are only a finite number of things Kaori can do, and I want to leverage as much of Inform’s native ability to handle objects as I can.

For what it’s worth, this is the error I got when I tried to run your example:

C:\Program Files (x86)\Inform 7\Compilers\ni \
    -rules "C:\Program Files (x86)\Inform 7\Inform7\Extensions" -package "C:\Users\halkun\Documents\Inform\Projects\Jtest.inform" -extension=ulx
Inform 7 build 6G60 has started.
I've now read your source text, which is 937 words long.
I've also read Standard Rules by Graham Nelson, which is 39455 words long.
I've also read Context-Free Parsing Engine by Brady Garvin, which is 13902 words long.
I've also read Punctuated Word Parsing Engine by Brady Garvin, which is 5454 words long.
I've also read Disambiguation Framework by Brady Garvin, which is 6292 words long.
I've also read Runtime Checks by Brady Garvin, which is 1797 words long.
I've also read Low-Level Operations by Brady Garvin, which is 7588 words long.
I've also read Low-Level Text by Brady Garvin, which is 4835 words long.
I've also read Low-Level Linked Lists by Brady Garvin, which is 17877 words long.
I've also read Low-Level Hash Tables by Brady Garvin, which is 6601 words long.
I've also read Object Pools by Brady Garvin, which is 1521 words long.
In Book "Linked List Vertices", Chapter "The Linked List Vertex Structure" -
  unindexed, Section "Linked List Vertex Construction and Destruction" -
  unindexed in the extension Low-Level Linked Lists by Brady Garvin:
  >--> The sentence 'The linked list vertex object pool is an object pool that
    varies' (C:\Users\halkun\Documents\Inform\Extensions\brady garvin\low-level
    linked lists.i7x, line 99) appears to say two things are the same - I am
    reading 'linked list vertex object pool' and 'object pool that varies' as
    two different things, and therefore it makes no sense to say that one is
    the other: it would be like saying that 'Choucas is Hibou'. It would be all
    right if the second thing were the name of a kind, perhaps with properties:
    for instance 'The Hall is a lighted room' says that something called The
    Hall exists and that it is a 'room', which is a kind I know about, combined
    with a property called 'lighted' which I also know about.
 In Book "Linked Lists", Chapter "The Linked List Kind" in the extension
  Low-Level Linked Lists by Brady Garvin:
  >--> You wrote ''A linked list is a kind of value' (C:\Users\halkun\Documents\Inform\Extensions\brady
    garvin\low-level linked lists.i7x, line 172)', but that seems to say that
    some room or thing already created ('linked list vertex object pool',
    created by ''The linked list vertex object pool is an object pool that
    varies' (C:\Users\halkun\Documents\Inform\Extensions\brady garvin\low-level
    linked lists.i7x, line 99)') is now to become a kind. To prevent a variety
    of possible misunderstandings, this is not allowed: when a kind is created,
    the name given has to be a name not so far used. (Sometimes this happens
    due to confusion between names. For instance, if a room called 'Marble
    archway' exists, then Inform reads 'An archway is a kind of thing', Inform
    will read 'archway' as a reference to the existing room, not as a new name.
    To solve this, put the sentences the other way round.)
  >--> The sentence 'A linked list is an invalid linked list' (C:\Users\halkun\Documents\Inform\Extensions\brady
    garvin\low-level linked lists.i7x, line 173) appears to say two things are
    the same - I am reading 'linked list' and 'invalid linked list' as two
    different things, and therefore it makes no sense to say that one is the
    other: it would be like saying that 'Choucas is Hibou'. It would be all
    right if the second thing were the name of a kind, perhaps with properties:
    for instance 'The Hall is a lighted room' says that something called The
    Hall exists and that it is a 'room', which is a kind I know about, combined
    with a property called 'lighted' which I also know about.
 In Book "Object Pools", Chapter "Object Pools", Section "The Object Pool
  Kind" in the extension Object Pools by Brady Garvin:
  >--> You wrote ''An object pool is a kind of value' (C:\Users\halkun\Documents\Inform\Extensions\brady
    garvin\object pools.i7x, line 47)', but that seems to say that some room or
    thing already created ('object pool that varies', created by ''The linked
    list vertex object pool is an object pool that varies' (C:\Users\halkun\Documents\Inform\Extensions\brady
    garvin\low-level linked lists.i7x, line 99)') is now to become a kind. To
    prevent a variety of possible misunderstandings, this is not allowed: when
    a kind is created, the name given has to be a name not so far used.
    (Sometimes this happens due to confusion between names. For instance, if a
    room called 'Marble archway' exists, then Inform reads 'An archway is a
    kind of thing', Inform will read 'archway' as a reference to the existing
    room, not as a new name. To solve this, put the sentences the other way
    round.)
  >--> The sentence 'An object pool is an invalid object pool' (C:\Users\halkun\Documents\Inform\Extensions\brady
    garvin\object pools.i7x, line 48) appears to say two things are the same -
    I am reading 'object pool' and 'invalid object pool' as two different
    things, and therefore it makes no sense to say that one is the other: it
    would be like saying that 'Choucas is Hibou'. It would be all right if the
    second thing were the name of a kind, perhaps with properties: for instance
    'The Hall is a lighted room' says that something called The Hall exists and
    that it is a 'room', which is a kind I know about, combined with a property
    called 'lighted' which I also know about.
 In Book "Data Structures", Part "Parsing Structures", Chapter "Parse Steps" -
  unindexed, Section "Helper Variables and Functions for [...] Step
  Construction and Destruction" - unindexed in the extension Context-Free
  Parsing Engine by Brady Garvin:
  >--> In 'The parse step object pool is an object pool that varies' (C:\Users\halkun\Documents\Inform\Extensions\brady
    garvin\context-free parsing engine.i7x, line 399), 'object pool that
    varies' is a contradiction in terms, as this is something that cannot ever
    vary.
 In Part "Mixed-Purpose Data Structures", Chapter "Parse Tree Vertices",
  Section "Helper Variables and Functions for [...] ertex Construction and
  Destruction" - unindexed in the extension Context-Free Parsing Engine by
  Brady Garvin:
  >--> In 'The parse tree vertex object pool is an object pool that varies' (C:\Users\halkun\Documents\Inform\Extensions\brady
    garvin\context-free parsing engine.i7x, line 464), 'object pool that
    varies' is a contradiction in terms, as this is something that cannot ever
    vary.
 In Book "Punctuated Word Parsing", Chapter "Understand Line Internals" -
  unindexed in the extension Punctuated Word Parsing Engine by Brady
  Garvin:
  >--> In 'The understand parseme linked list is a linked list that varies' (C:\Users\halkun\Documents\Inform\Extensions\brady
    garvin\punctuated word parsing engine.i7x, line 285), 'linked list that
    varies' is a contradiction in terms, as this is something that cannot ever
    vary.
 In Book "Disambiguation Framework", Chapter "Scoring", Section "Scoring
  Subroutines" - unindexed in the extension Disambiguation Framework by Brady
  Garvin:
  >--> In 'The parse tree score hash table stack is a linked list that varies' (C:\Users\halkun\Documents\Inform\Extensions\brady
    garvin\disambiguation framework.i7x, line 180), 'linked list that varies'
    is a contradiction in terms, as this is something that cannot ever vary.
 In Chapter "Unification" - unindexed, Section "Private Unification Variables"
  - unindexed in the extension Disambiguation Framework by Brady Garvin:
  >--> In 'The canonical tree count list is a linked list that varies' (C:\Users\halkun\Documents\Inform\Extensions\brady
    garvin\disambiguation framework.i7x, line 383), 'linked list that varies'
    is a contradiction in terms, as this is something that cannot ever vary.
  >--> In 'The canonical tree count list stack is a linked list that varies' (C:\Users\halkun\Documents\Inform\Extensions\brady
    garvin\disambiguation framework.i7x, line 385), 'linked list that varies'
    is a contradiction in terms, as this is something that cannot ever vary.
 In Chapter "Local Disambiguation", Section "Private Local Disambiguation
  Variables" - unindexed in the extension Disambiguation Framework by Brady
  Garvin:
  >--> In 'The parse tree feature hash table stack is a linked list that
    varies' (C:\Users\halkun\Documents\Inform\Extensions\brady
    garvin\disambiguation framework.i7x, line 433), 'linked list that varies'
    is a contradiction in terms, as this is something that cannot ever vary.
  >--> In 'The already matchable tree list is a linked list that varies' (C:\Users\halkun\Documents\Inform\Extensions\brady
    garvin\disambiguation framework.i7x, line 439), 'linked list that varies'
    is a contradiction in terms, as this is something that cannot ever vary.
  >--> In 'The offered feature alternative list is a linked list that varies' (C:\Users\halkun\Documents\Inform\Extensions\brady
    garvin\disambiguation framework.i7x, line 443), 'linked list that varies'
    is a contradiction in terms, as this is something that cannot ever vary.
  >--> In 'The offered beginning lexeme index stack is a linked list that
    varies' (C:\Users\halkun\Documents\Inform\Extensions\brady
    garvin\disambiguation framework.i7x, line 448), 'linked list that varies'
    is a contradiction in terms, as this is something that cannot ever vary.
  >--> In 'The offered feature alternative list stack is a linked list that
    varies' (C:\Users\halkun\Documents\Inform\Extensions\brady
    garvin\disambiguation framework.i7x, line 449), 'linked list that varies'
    is a contradiction in terms, as this is something that cannot ever vary.
Inform 7 has finished.

Compiler finished with code 1

eu1 · March 14, 2013, 2:07am

Ah, you must have downloaded before I pushed the last commit to GitHub. At the top of Context-Free Parsing Engine you’ll want to move the Object Pools include earlier.

bukayeva · March 14, 2013, 1:06pm

Lighten up, folks. This ain’t politics or religion.

Valid points were made. Leave it up to the original poster to decide if they felt they were being harangued or given unnecessary commentary.

Healy · March 15, 2013, 3:45am

This thread reminds me of an idea I had for a Kawaiikochans fan-game, where Japanese words and Eigo are intermixed.
[TL NOTE: “Eigo” - English]
I envision it being functionally similar to The Gostak, only perhaps not so strict in what commands it will take. I also want to modify the standard Inform response to a command without a visible noun to “Such a thing! It can’t be seen.” (or something like that).