Japanese input parser - Once more into the breach!

halkun · November 6, 2016, 12:26am

Hello all
In 2014 I was doing some highly experimental stuff with Inform 7, and was able to get it to do some pretty crazy things. I decided to put it away for a little bit, and that little bit turned into a few years.

Upon loading my old code and extensions… it seems the language has changed somewhat and my code is now broken.

In my story I have a character who only speaks and understands Japanese, and the hero of our story only understands English. The idea is that when you chat with the other character, you must write in Japanese for her to understand anything. Any command not invoked by calling her name first goes through the parser normally.

[code]>Take the pen
Taken.

Kaori, hon o totte kudasai.
かおり、本を取って下さい。

Kaori takes the book.[/code]

Now before anyone side-eyes and thinks this seems an impossible task… it was working and the Japanese parser had 56 verbs it could understand.

The parser works like this: I have an override after reading a command. It takes the player’s command and checks whether a Japanese speaker is being addressed. If so, it breaks apart the player’s command (which is assumed to be in Japanese), puts it back together in English, and passes the result to Inform’s parser.

Sadly, it now seems that monkeying with the player’s command causes it to be deleted. I have the first part of the parser below with some debug lines in there.

A Japanese name is some indexed text that varies.
An english name is some indexed text that varies.
Japanese output is some indexed text that varies.

Table 1.0 - Names
 ename (indexed text)	jname (indexed text)
"kaori,"	 "香"

[Block Inform's parser. Start the Japanese parser]
after reading a command:
	say "The player's command -> [the player's command][line break]";
	[Clear variables used in the parser and sanitize]
	Now Japanese output is "";
	Let the player's command be the player's command in lower case;
	say "The player's command -> [the player's command][line break]";
	now the english name is the unpunctuated word number 1 in player's command;
	if there is no jname corresponding to ename of english name in the table of names:
		[if kaori's name isn't said, kick the input back to Inform's parser]
		change the text of the player's command to player's command;
	otherwise:
		[Japanese parser goes here]

That should be enough that you can see at least the first part function.

However, it seems the line “Let the player’s command be the player’s command in lower case;” obliterates the player’s command, and nothing is returned to the normal parser.

>take pen
The player's command -> take pen
The player's command -> 
I didn't understand that sentence.

Can you alter the player’s command anymore? That would make me sad if that’s true.

matt_weiner · November 6, 2016, 1:04am

The syntax for altering the player’s command is

change the text of the player's command to "[player's command in lower case]"

See §18.33 of Writing with Inform. This is one of very few cases where you still write “change… to…”; just about every other case is now “now… is…”.

Or is the problem that the “let…” syntax is causing some clash between the phrase “the player’s command” and a temporary variable? What if you do this:

after reading a command:
	say "The player's command -> [the player's command][line break]";
	[Clear variables used in the parser and sanitize]
	Now Japanese output is "";
	Let temp-command be the player's command in lower case;
	say "The player's command -> [temp-command][line break]";
	now the english name is the unpunctuated word number 1 in temp-command;
	if there is no jname corresponding to ename of english name in the table of names:
		[if kaori's name isn't said, kick the input back to Inform's parser]
		change the text of the player's command to temp-command;
	otherwise:
		[Japanese parser goes here]

Also there’s no distinction between indexed text and text anymore… though you’re still allowed to say “indexed text.”

halkun · November 6, 2016, 1:55am

Yea, I fixed it using a temp variable. (Should I be using “normal” programming terms here?). I was thinking that the player’s command string has some mutable/immutable attributes going on with it. My Inform is super rusty, but it’s coming back.

Nice to know that indexed text is a thing of the past. However, now that I got past that speedbump, my Japanese text output is straight up crashing the interpreter.

Glulxe fatal error: Stack overflow in callstub.

I’m guessing some Unicode shenanigans are going on under the hood. Unicode string literals are OK. For example, this works great.

[Little sugar for a little sugar]
Persuasion rule for asking Kaori to try kissing:
	say "'もう、やだ！'[line break]";
	persuasion fails.

Do you call them literals in Inform? If my programmy-talk gets in the way, let me know. BTW the “Inform for Programmers” webpage is really useful.
Anyway, it seems that my code that concatenates the Japanese text is creating a string that makes the interpreter barf. I’ll review the Unicode changes in Inform.

matt_weiner · November 6, 2016, 4:10am

The thing is that the phrase “Let X be…” usually creates X as a temp variable. So I think you were using a temp variable all along… but when you wrote “Let the player’s command be the player’s command in lower case,” I think you probably created a temp variable called “the player’s command,” which led to all sorts of trouble when you wanted to refer to the actual player’s command. Not sure though.

The Stack overflow errors usually happen when you create an infinite loop. The following code gives you a stack overflow in callstub error, for instance:

[code]Lab is a room.

A rock is in the Lab.

For printing the name of the rock: say "a nice [rock]".[/code]

So I’d check for a hidden infinite loop before I started worrying about Unicode. There might be some Unicode shenanigans going on, but I think the most likely cause is a text substitution that calls itself.

…oh, another thing about “the player’s command” is that it’s not really a string like everything else, but rather a snippet (I sort of understand what this means but wouldn’t do a good job explaining it right now). So when you say “change the text of the player’s command to [text]” it’s actually a set routine: the underlying grammar is more like ChangeThePlayersCommandTo([text]) than like “PlayersCommand = [text]”, if that makes any sense. In fact that’s why the phrase still has “change” rather than “now.”

halkun · November 6, 2016, 4:17am

Yup, found it. Inform 7 now has recursive text abilities that were giving me grief.
I fixed it, but I’ll use this as a warning to others…

This is how I used to concatenate strings in Inform

[add jverb to the front of Japanese output]
Now japanese output is "[jverb][japanese output]";

In this new version of Inform, when you say the above, it recursively adds the first to the second until it runs out of stack space. (repeating jverb over and over and over)

Thanks for the iterator ability though. I wish I hadn’t found out about it by stepping on a minefield.

halkun · November 6, 2016, 4:53am

Actually. I thought I fixed this… but it’s still recursively adding everything together.

How exactly DO you concatenate strings in Inform 7? I would think it would be easy…

		Now temp text is "[japanese output]";
		say "Temp text --> [temp text][line break]";
		Now japanese output is "[jverb][temp text]";
		say "japanese output -> [japanese output][line break]";

It seems to be throwing the text around by reference. I need it to become static text.
When I say

Now temp text is "[japanese output]";

I want it to be a copy of the string that is in there, not be a reference to [japanese output], because that’s causing recursion problems. Is concatenating strings this hard now?

Draconis · November 6, 2016, 5:16am

If you ask for “the substituted form of” a text, it performs all the text substitutions and turns it into an actual string (i.e. an array of characters). Before that, it’s internally a routine which performs all the substitutions and prints the result every time it’s called.
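
As a minimal sketch of that difference (the names tally, live text, and frozen text are made up for illustration and are not part of the parser above), the first variable below keeps re-running its substitution every time it is printed, while the second is flattened once at the moment of assignment:

[code]The Lab is a room.

The tally is a number that varies. The tally is 1.
The live text is a text that varies.
The frozen text is a text that varies.

When play begins:
	[live text stores the substitution itself, not its current result]
	now the live text is "[tally]";
	[frozen text stores the characters produced right now]
	now the frozen text is the substituted form of "[tally]";
	now the tally is 2;
	say "live text: [live text][line break]";
	say "frozen text: [frozen text][line break]".[/code]

If the behaviour is as described, the live text should follow the tally to 2 while the frozen text stays at 1.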

halkun · November 6, 2016, 6:08am

So how do you copy a string to another string without substitutions?

Input:

"Test" by Halkun

atext is some text that varies.
btext is some text that varies.
temptext is some text that varies.

the apartment is a room.

When play begins:
	say "Hello, world.";
	now atext is "a";
	now btext is "b";
	now temptext is "[atext][btext]";
	say "temptext --> [temptext][line break]";
	now atext is "";
	now btext is "";
	say "temptext --> [temptext][line break]";
	now atext is temptext;
	say "atext --> [atext][line break]";

Output:

Hello, world.
temptext --> ab
temptext -->
atext -->
Glulxe fatal error: Stack overflow in function call.

I don’t mean to be nit-picky, but the way my parser works is that it creates a string by breaking the Japanese down into its components and then rebuilds the string using Japanese characters. Depending on the sentence, it will need to “glue” the text it worked on to the front of what it has already finished.

So how do you concatenate strings in Inform 7? An even better question: how do you copy a string from one variable to another so that when you monkey with one it doesn’t mess with the copy?

Draconis · November 6, 2016, 6:24am

You can say

let X be the substituted form of "[X][Y]"

though I feel like there should be a better way.
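
Applied to the test program earlier in the thread, a hedged sketch of that fix would flatten temptext at the moment it is assigned. This is only that same test with “the substituted form of” added, not a tested drop-in for the full parser:

[code]atext is some text that varies.
btext is some text that varies.
temptext is some text that varies.

The apartment is a room.

When play begins:
	now atext is "a";
	now btext is "b";
	[Flatten now, so temptext holds the characters rather than the substitutions]
	now temptext is the substituted form of "[atext][btext]";
	now atext is "";
	now btext is "";
	say "temptext --> [temptext][line break]";
	[temptext no longer mentions atext, so this cannot refer back to itself]
	now atext is temptext;
	say "atext --> [atext][line break]".[/code]

Once temptext is flattened it no longer depends on atext or btext, so clearing them leaves the copy alone and “now atext is temptext” has nothing to recurse into.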

halkun · November 6, 2016, 7:23am

I admit, it’s an ugly, ugly hack. The reason why is rather obvious. You see, Inform’s Japanese support is pretty much non-existent.

Believe it or not, the translation bit works like a champ. I can feed my character Japanese commands and she dutifully executes them. What’s broken is the bit that takes my commands and re-renders them in Japanese text. Because it’s really a “render” I’m treating the Japanese characters and sentence parts as glyphs and using strings as the glyph containers.

The idea is that the girl you are with in the game does not speak English… I mean, not a single English letter is going to fall out of her mouth. The words you speak to her will help you understand the words she says to you. By the end of the game, not only will you be able to understand her, but the player themselves will actually have a functional understanding of Japanese.

That’s the goal anyway… Thanks for the substitution tip.

matt_weiner · November 6, 2016, 2:59pm

OK, I think I may have a handle on why this suddenly broke with the new Inform. It actually is the end of the difference between text and indexed text.

As I understand it, in older versions of Inform you had to explicitly turn your texts into indexed text whenever you wanted to do certain manipulations to them. At the moment of turning text into indexed text, it would perform all the text substitutions in your text and flatten things into a string. And anytime you wrote a text substitution to an indexed text, it would flatten it into a string.

So, given that you had declared Japanese output as indexed text, it was safe to set it to “[jverb][japanese output]”, because in order to write “[jverb][japanese output]” to indexed text, Inform had to first flatten out [japanese output] (and [jverb]) and then write them into the string.

In new versions, though, there are only a few occasions where the substitutions get flattened out automatically; basically, if the text has a temporary variable in it and you’re exiting the block where the temporary variable is defined, it gets flattened out.* Otherwise, it doesn’t. In particular, since things aren’t getting flattened out when you write them to Japanese output, when you set Japanese output to something with substitutions, it actually keeps evaluating the substitutions.

Then the phrase “substituted form” is basically used to say “Hey! Flatten this out now!” See Writing with Inform §20.7. It’s new, because previously things would get flattened out whether you wanted them to or not.

This also means that one of the basic use cases of “substituted form” is the thing you said, making a copy that doesn’t get messed up when you change the original. So something like this; note the Carry out concentrating rule:

[code]Classroom is a room. "WRITE a word, or CONCENTRATE to commit what you've written to memory, and REMEMBER to recall it."

The written text is a text that varies. The remembered text is a text that varies.

Writing is an action applying to one topic. Understand "write [text]" as writing.
Carry out writing: now the written text is the topic understood.
Report writing: say "You write '[topic understood]'."

Concentrating is an action applying to nothing. Understand "concentrate" as concentrating.
Carry out concentrating: now the remembered text is the substituted form of "[the written text]".
Report concentrating: say "You commit '[the written text]' to memory."

Remembering is an action applying to nothing. Understand "remember" as remembering.
Report remembering: say "You have committed '[remembered text]' to memory."

test me with "write xyzzy/concentrate/write plugh/remember/concentrate/write plover/remember".[/code]

*From Writing with Inform §20.7:

halkun · November 12, 2016, 7:43pm

Ok! I got my parser somewhat functional, but now, to make it hard on myself, I decided to start making it a proper extension. Right now my Japanese character is rather hard-coded as Japanese, but I want to be able to define someone as Japanese so that my parser kicks in…

I can’t seem to get it to ID the Japanese attribute properly.

"Test" by Halkun

A target is some text that varies. 

A test chamber is a room.

A person can be Japanese.
A person is usually not Japanese.

in the test chamber is a woman called Kaori.
Kaori is Japanese.
in the test chamber is a man called Bob.

after reading a command:
	Now the target is word number 1 in the player's command in lower case;
	if the target is japanese:
		say "This person appears Asian";

The error is:
Problem. In the sentence ‘if the target is japanese’ , it looks as if you intend ‘target is japanese’ to be a condition, but that seems to involve applying the adjective ‘Japanese’ to a text - and I have no definition of it which would apply in that situation. (Try looking it up in the Lexicon part of the Phrasebook index to see what definition(s) ‘Japanese’ has.)

It appears to be saying “You can’t compare text to an object”, so how do I query whether the person I talked to is Japanese after I take the input?

Draconis · November 12, 2016, 8:34pm

There are a few ways, none of them particularly pleasant. To do this properly you’ll probably need to drop down to the Inform 6 level. (For instance, what if the person’s name requires more than one word to specify? What if the player said “woman” instead of “Kaori”, so they might be talking to (English-speaking) Alice instead?)

matt_weiner · November 12, 2016, 8:37pm

The way you’ve defined it, the target is (or should be) literally the first word of the command. So if the player typed “Kaori, you aren’t going to understand this because I don’t know a word of Japanese” the target will be “kaori” rather than Kaori. So you’ve got a text rather than a person, and you need a person.

Unfortunately, what I think you’re trying to do looks pretty hard… you want to cut out the stuff before the comma and check whether it’s the name of a Japanese person, right? And you have to do this before parsing the command, so you can’t use “the person asked”? I don’t think there’s a quick way to do that, though if you’re writing a bilingual parser you’re already OK with non-quick ways…
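
One rough way to get from the text to a person, sketched under heavy assumptions, is to compare the first word against each person’s printed name. The phrase “the addressee for” and the variable names are invented here for illustration, and the sketch ignores the multi-word-name and synonym problems raised above, so it only recognises a one-word name typed exactly as printed:

[code]A person can be Japanese.

The test chamber is a room.
In the test chamber is a woman called Kaori. Kaori is Japanese.
In the test chamber is a man called Bob.

[Hypothetical helper: finds a person whose printed name equals the given word]
To decide which object is the addressee for (T - text):
	repeat with the candidate running through people:
		if T exactly matches the text "[candidate]", case insensitively:
			decide on the candidate;
	decide on nothing.

After reading a command:
	[Flatten the snippet into ordinary text before picking it apart]
	let the typed line be the substituted form of "[the player's command]";
	let the opener be word number 1 in the typed line;
	let the listener be the addressee for the opener;
	if the listener is a Japanese person:
		say "(The Japanese parser would take over here.)[line break]".[/code]

The point of the helper is that the adjective is tested on a person rather than on a text, which is what the Problem message was complaining about.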

matt_weiner · November 12, 2016, 9:26pm

I feel like Mike Ciul’s Objects Matching Snippets extension is designed for just such an application–figure out what the snippet before the comma is and what matches that. But I don’t think it’s updated for the latest version, and I can’t make head or tail of it. Like, where is “identified with” even defined?