Parsing head-final languages in Dialog

Draconis · May 1, 2024, 6:21pm

I’m trying to brush up my rusty Dialog skills again, and one of Dialog’s big advantages over Inform is how flexible and user-customizable the parser is. So I’m currently trying to adapt it to parse a head-final language.

What’s a head-final language? Well, linguistically speaking, sentences are formed by taking a “head” and attaching things to it to make a phrase. For the sorts of commands we use in parser games (“verb phrases”), the head is the verb. And in English, the head of a phrase tends to come at the beginning. Prepositions come before nouns, verbs come before objects, and so on.

Pedantry: Or at least this is one popular model of how syntax works. There are others too, but this one’s useful for my purposes right now.

But some other languages, like Japanese, are “head-final”: the head tends to come after its arguments. Prepositions come after their nouns (making them “postpositions” instead), and verbs come after their objects. For the purposes of this discussion, let’s imagine a version of English where we use commands like LOOK, TROLL X, COIN TAKE, COIN TROLL GIVE.

This doesn’t really work in Inform, where the parser starts by looking at the first word of the command to figure out which grammar to use. But in Dialog, it might! So I’m trying to figure out how best to do that. (I think there was some discussion of Dialog for Japanese years ago but I can’t find it again now…)

Option one, rewrite the command. Once you’ve got a sentence, take the last word and put it at the beginning instead. Now most of the machinery can work the same way as English.

Option two, adjust the parser. Make it start by looking at the last word and proceed from there.

The latter seems like it would be a more interesting Dialog exercise, exploring the flexibility of the parser, but Dialog is built around CONS-style lists, where the first element is the easiest to access. Will I be sacrificing tons of efficiency if I try to run through lists in reverse order?
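For a sense of the cost, here is a minimal sketch of option one. Both predicate names are made up for illustration and are not part of the standard library. The helper walks the cons list once and peels off the final word, so the overhead is a single linear pass over a command that is usually only a few words long:

```dialog
%% Hypothetical helper, not part of the standard library.
%% Splits a word list into its final word and everything before it.
(last word of [$Only] is $Only with rest [])
(last word of [$First | $More] is $Last with rest [$First | $Rest])
	(last word of $More is $Last with rest $Rest)

%% Hypothetical wrapper: moves the final word (the verb) to the front.
(move verb to front of $Words giving [$Verb | $Objects])
	(last word of $Words is $Verb with rest $Objects)
```

Querying `(move verb to front of [coin troll give] giving $X)` should bind `$X` to `[give coin troll]`. Where to hook this into the library’s input handling is left open here.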

ramstrong · May 1, 2024, 8:01pm

Would separating verbs and nouns help? Assuming COIN is a noun, then wherever it’s encountered, it’s always a noun.

So COIN SCRATCH (ticket), where you scratch the lottery ticket with a coin, could not be accepted, because the working assumption would be that you’re scratching the coin with your fingers. COIN = noun; SCRATCH = verb.

Pebblerubble · May 1, 2024, 8:25pm

I always assumed that the language’s grammar is hardcoded into the IF interpreter, because the world model is inseparable from the grammar. I would find it interesting to see a solution to your “problem”. Maybe Dialog is more easily language-configurable than Inform?

Another language that you could call head-final is Turkish. It makes heavy use of suffixes. An example:
Mut = happiness
Mutlu = happy
Mutluyum = I am happy

nephar · May 1, 2024, 8:52pm

There are multiple moving parts that can be touched to support head-final syntax [1].

The simplest part to modify is the action syntax in action processing rules: these can be declared any way you want without changing anything in the library. They even report parsing errors in head-final patterns automatically.

(perform [$Obj pluck])
	(try [take $Obj])

> pluck sdfkj
(I only understood you as far as wanting to something pluck.)

Another part you might want to modify is the grammar definitions. For performance reasons it is best to keep the same format for grammar entry rules as the library uses, because those rules are compiled into a table by the compiler for fast lookup.

(grammar entry @pluck [23] [$ pluck])

You might also want to allow the author to define new grammar like this:

(grammar [[takable] pluck] for [$ pluck] head-final)

I can give you an example of the necessary grammar transformation access predicates to support this, but since you wish to do exercises, I will refrain.

The main thing to do is of course tokenizing and parsing the player’s input in head-final syntax. Rewriting would be the easiest way; I agree with your conclusion. If you don’t want to do that, you have to consider the two understand rules in the library that scan the grammar entry table:

%% We need a couple of understand-rules that query the grammar table:

(understand [$Verb | $Words] as $Action)
	*(grammar entry $Verb $Grammar $Action)
	*(match grammar $Grammar against $Words into $ObjList)
	(populate template $Action with $ObjList)

(understand [$Verb | $Words] as $Action)
	*(grammar entry $Verb $Grammar $Action reversed)
	*(match grammar $Grammar against $Words into $ObjList)
	(reverse $ObjList $RevObjList)
	(populate template $Action with $RevObjList)

Since these rules take the head of the list as the verb, you have to extract the last word of the input list as the verb before the *(grammar entry...) queries. You can make that change in these rules or, even better, in the (parse action $Words) rule, which actually calls all the understand rules in this section:

(parse action $Words)
...
	(collect $A)
		*(understand $ActualWords as $A)
	(into $AllCandidatesWithDup)

I made the change in the two understand rules in a test and managed to get this output:

> lantern pluck
You the brass lantern pluck.

Yeah, don’t forget to change the narration as well, which is trivial.

Edit: If you make the change in the (parse action $) rule, the standalone understand rules (the ones without grammar definitions) should use the default head-first syntax. If you change the two understand rules above instead, then all standalone understand rules have to accommodate head-final syntax on their own as well.
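As a rough, untested illustration of the second route, the first of the two library rules quoted above might be edited along these lines. This is a sketch under assumptions, not the actual change made in the test: it assumes the `(reverse $ $)` predicate already used in the second library rule is available, and that the variant replaces the library rule rather than being added next to it.

```dialog
%% Untested sketch: a head-final variant of the first library rule.
%% The verb is taken from the end of the input instead of the head.
(understand $Input as $Action)
	%% Reverse once so the last word becomes the head of the list.
	(reverse $Input $Rev)
	($Rev = [$Verb | $RevWords])
	%% Reverse the remainder back into its original order.
	(reverse $RevWords $Words)
	*(grammar entry $Verb $Grammar $Action)
	*(match grammar $Grammar against $Words into $ObjList)
	(populate template $Action with $ObjList)
```

The two reversals are each a single linear pass over the input, which for commands of a handful of words should be a small cost next to the grammar matching itself.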

[1] My own native language is head-final.