Is Out‑of‑Order Parsing OK? My Experience with a Greedy, Destructive N‑Gram Parser

I’ve been developing a parser for my own IF engine, and I wanted it to have a certain fluidity, something closer to how people actually speak, and less tied to rigid English word order. That led me to build what I call a greedy, destructive N‑gram parser.

The basic idea is simple enough. The parser breaks the sentence/command input into match groups: verbs, nouns, adverbs/modifiers, prepositions, compass directions, exit names, etc.

Once those groups exist, the surface structure of the sentence stops mattering. The parser greedily claims the longest meaningful spans, marks them consumed, and resolves the remaining pieces by position and proximity.

This leads to some interesting (and to me, desirable) behavior:

“from the table take the book” parses the same as “take the book from the table”

If the world contains “a wood box” and “a glass box” in the same location, then
“in the box put the box” parses the same as “put the box in the box”. Disambiguation menus are then displayed so the player can pick which box is meant for both the noun and the object.

If a rotor object has positions low, medium, high, off, then “turn the rotor to low” works the same as “to low turn the rotor”
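As a rough illustration of why clause order stops mattering once words are sorted into match groups, here is a minimal Python sketch. It is not the engine's actual code: the vocabulary tables, the `parse` function, and the single-word nouns are all assumptions made for the example. The one piece of structure it keeps is that a preposition binds to the noun that follows it.

```python
# Hypothetical vocabulary tables; a real engine would build these from its world model.
VERBS = {"take", "put", "turn"}
PREPOSITIONS = {"from", "in", "to"}
ARTICLES = {"the", "a", "an"}


def parse(command):
    """Return the verb, noun, and (preposition, object) pair, whatever the clause order."""
    result = {"verb": None, "noun": None, "object": None}
    pending_prep = None  # a preposition waiting for the noun that follows it
    for word in command.lower().split():
        if word in ARTICLES:
            continue
        if word in VERBS and result["verb"] is None:
            result["verb"] = word
        elif word in PREPOSITIONS:
            pending_prep = word
        elif pending_prep is not None:
            # The preposition binds to the next noun, forming the object slot.
            result["object"] = (pending_prep, word)
            pending_prep = None
        else:
            result["noun"] = word
    return result


print(parse("from the table take the book") == parse("take the book from the table"))  # True
```

Both “box” commands also come out identical here, which is exactly the point at which a real parser would have to fall back on disambiguation prompts.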

I built this without any preconceived ideas, just memories of Scott Adams and Infocom games. One was very strict, the other not so much. I’m just remembering this from my teenage years, when I read many paperbacks and played those text games. I formed a belief that you should be able to type things in flexible, natural ways and still be understood.

Recently someone testing one of my games remarked that the parser “accepts things out of order,” and it made me wonder: Is this kind of flexible, order‑agnostic parsing considered OK in parser IF? Is it helpful? Confusing? Too permissive? Do players expect strict English ordering, or do they appreciate looser, more natural phrasing?

I’m just curious how you would make your parser work, or how you made one work if you have already tackled it. It is a great exercise in programming. Text parsing is a cottage industry for all kinds of computer programs, including IF.

8 Likes

I think it falls into the category of “things the player will never notice because they’re too used to the standard vocabulary.” By the same token, nobody will be upset.

6 Likes

The flexibility could let you do experiments with form. For example, suppose there’s a magic artifact that only responds to you when all the words in a sentence are in alphabetical order. You might know what you want to say, but then have to find a way to express it that conforms with the rules.

2 Likes

Yoda would be impressed. “Do. Or do not. There is no try.” “Much to learn, you still have.”

6 Likes

Here is an example from my port of Cloak of Darkness.

4 Likes

I’d be lying if I didn’t say I think that’s pretty cool.

Still, the thing is, I don’t know about most players, but I tend to type the fewest words possible to get my meaning across. VERB NOUN is the most economical and efficient way. I am an example of a swine before whom your pearls are cast.

4 Likes

I have mixed feelings about this. Many people used to playing Inform games expect

get brown

to work if there are both a brown shirt and a green shirt in scope, but my own personal parser requires a noun there even if the adjective would happen to be enough to disambiguate things.

If I’m understanding you correctly, with your parser even

brown take

would work, which honestly is kinda cool and might be handy for people unused to parser games who don’t have English as a first language.

1 Like

How would you handle

give the librarian the book

(which is less specific than “give the book *to* the librarian”, and I’d expect “to the librarian the book give” to work reasonably in your setup)

-Dave

2 Likes

I suppose it would try for a usual syntax if it can’t decide? Or use flags such as actor/NPC for the subject, and handheld items for the object, if possible. It’s usually going to work, but edge cases (like a coupon-reading machine that could be GIVEn receipts) can be worked out manually.

Sort of an inversion of your ask but still the same when it comes to taking back the book.

2 Likes

Zork 1 is pretty free with the order of words, for some phrases.

4 Likes

Here it is on the C64

3 Likes

How would it do with “plant the pot plant in the pot” or “plant in the pot the pot plant”?

3 Likes

If it had any self respect it would retort “Screw you.”

2 Likes

There’s also GIVE TO KILL A MOCKINGBIRD TO HARPER LEE.

3 Likes

If the world object is named “The Plant Pot”, “Plant Pot”, “A Large Green Plant Pot”, or anything else with “plant pot” in it, the parser would resolve that first. That is how n-gram parsers behave. The n in n-gram is the number of words that make up the phrase being matched. The parser usually starts from some predetermined maximum number of words, never exceeding the number of words in the command, then works its way down to matching on just one word. “plant pot” would be captured as the object first because it matches on the 2-gram pass. “plant” and “pot” would then each be matched on the 1-gram pass: “plant” would be in the verb table, and “pot” would match another object in the object table. So it would all work out. That is the great thing about n-gram parsing.
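The longest-first, mark-as-consumed passes described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the engine's code: the tables, the `ngram_parse` function, and the choice to check the object table before the verb table are all hypothetical. The object is named “pot plant” here, following the wording of the question, so that the 2-gram actually occurs in the command.

```python
# Hypothetical tables for illustration only.
VERBS = {"plant", "put", "take"}
OBJECTS = {"pot plant", "pot"}
PREPOSITIONS = {"in", "from"}


def ngram_parse(command, max_n=3):
    """Greedy, destructive n-gram matching: longest spans first, consumed words are never reused."""
    tokens = command.lower().split()
    consumed = [False] * len(tokens)
    matches = []  # (start index, match group, matched text)

    # Start at the largest n (capped by the command length) and work down to 1.
    for n in range(min(max_n, len(tokens)), 0, -1):
        for start in range(len(tokens) - n + 1):
            span = range(start, start + n)
            if any(consumed[i] for i in span):
                continue  # part of this span was already claimed by a longer match
            text = " ".join(tokens[i] for i in span)
            if text in OBJECTS:
                group = "object"
            elif n == 1 and text in VERBS:
                group = "verb"
            elif n == 1 and text in PREPOSITIONS:
                group = "preposition"
            else:
                continue  # articles and unknown words are simply left unmatched
            for i in span:
                consumed[i] = True  # the "destructive" step
            matches.append((start, group, text))

    return sorted(matches)  # back in command order, for resolving by position


for start, group, text in ngram_parse("plant the pot plant in the pot"):
    print(start, group, text)
```

Because “pot plant” is claimed on the 2-gram pass, the second “plant” is no longer available when the 1-gram pass looks for a verb, so only the first “plant” is read as the verb.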

And to @Giger_Kitty’s remark, it could reply with that if an action to display “Screw you” were added to the rules defined for putting something, possibly the wrong thing :), in the planter. I’ll make up a small example. It sounds fun.

1 Like

I don’t think out-of-order word parsing is a good idea in languages where word order can matter. It’s going to get things wrong without doubt.

However, working with word phrase fragments is a good idea: a kind of “bottom-up” approach in which increasingly larger parse fragments are combined into even larger fragments and eventually a complete parse tree.

No, no, no, no, wait, wait, wait.

BUFFALO.
Buffalo buffalo buffalo buffalo buffalo.
YOU KNOW THE ONE, RIGHT!??!?!?

2 Likes

This is similar to how the parser works in ZIL (both ZILF’s and Infocom’s), which is why Zork accepts “leaflet the read” and Advent accepts “lamp pick up”. It identifies words by their part-of-speech flags, which are set by the compiler based on the contexts where each word is used, then condenses adjectives + nouns into noun phrases, decides which prepositions and noun phrases go together, and looks for a syntax line with the right verb, prepositions, and number of noun phrases.

I don’t recall ever getting comments about it, so I assume it doesn’t matter much to players one way or the other.
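The pipeline described above (flag words by part of speech, condense adjectives and nouns into noun phrases, attach prepositions, then look up a syntax line) can be illustrated with a toy Python sketch. This is not ZIL or ZILF code and makes no claim about how either is implemented: the `VOCAB` and `SYNTAXES` tables, the action names, and the `parse` function are hypothetical stand-ins.

```python
# Hypothetical vocabulary: each word carries a set of part-of-speech flags.
VOCAB = {
    "read": {"verb"},
    "put": {"verb"},
    "leaflet": {"noun"},
    "small": {"adjective"},
    "mailbox": {"noun"},
    "in": {"preposition"},
    "the": {"article"},
}

# Hypothetical syntax lines: (verb, preposition of each noun phrase) -> action.
# None means the noun phrase carries no preposition.
SYNTAXES = {
    ("read", (None,)): "READ",
    ("put", (None, "in")): "PUT-IN",
}


def parse(command):
    verb = None
    phrases = []            # finished noun phrases: (preposition or None, text)
    prep, words = None, []  # the noun phrase currently being built

    # The sentinel has no flags, so it closes any phrase still open at the end.
    for word in command.lower().split() + ["<end>"]:
        flags = VOCAB.get(word, set())
        if flags & {"adjective", "noun"}:
            words.append(word)  # condense adjectives + nouns into one phrase
            continue
        if words:  # any other kind of word ends the current noun phrase
            phrases.append((prep, " ".join(words)))
            prep, words = None, []
        if "verb" in flags and verb is None:
            verb = word
        elif "preposition" in flags:
            prep = word  # attaches to the next noun phrase

    # Look for a syntax line with this verb, these prepositions, this many phrases.
    action = SYNTAXES.get((verb, tuple(p for p, _ in phrases)))
    if action is None:
        return None
    return action, [text for _, text in phrases]


print(parse("leaflet the read") == parse("read the leaflet"))  # True
```

In this sketch the verb can sit anywhere because it is found by its flag rather than by position, while the noun phrases keep their relative order for the syntax lookup.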

4 Likes

In general, people don’t care if a parser over-accepts (that is, if it accepts ungrammatical sentences). The only real problems will arise when the structure of the sentence conveys important information. Presumably you already take note of the order of noun phrases, so that SHOW ALICE BOB and SHOW BOB ALICE can resolve to different actions. But also:

  • Prepositions need to stay with their noun phrases: POUR WATER FROM FLASK INTO BOTTLE versus POUR WATER FROM BOTTLE INTO FLASK
  • Some words can be both verbs and nouns: in a woodworking shop, you could both FILE HAMMER and HAMMER FILE
  • If you let nouns be modified by prepositional phrases, you have to handle PUT THE BOOK ON THE SHELF IN THE BAG versus PUT THE BOOK IN THE BAG ON THE SHELF—and is that second one PUT THE BOOK INTO THE BAG ON THE SHELF, or PUT THE BOOK IN THE BAG ONTO THE SHELF?
  • If you allow more complex sentences, things like AND and ONLY can attach at multiple levels; at this point you probably need the whole tree
7 Likes