Using a Dictionary Database to Build a Smarter Parser

I’ve been toying with the idea of using a dictionary to provide recommendations for unrecognized commands. The responses could look something like this:

> travel north
I don't recognize "travel" as a command. You might be able to use "go" instead.

> inspect chemical
I don't recognize "inspect" as a command. You might be able to use "examine" or "analyze" instead.

> blaze wood
I don't recognize "blaze" as a command. You might be able to use one of these instead: burn, light, or ignite.

Right now I’m looking at WordNet as a tool for identifying synonyms.
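
To make the response format concrete, here is a minimal sketch of the message logic. The `KNOWN_VERBS` set, the `SYNONYMS` table, and the `suggest` function are all hypothetical; the table is hand-written and only stands in for whatever a real WordNet lookup would return.

```python
# Hypothetical data: verbs the game understands, plus a hand-made synonym
# table standing in for a real dictionary lookup (an assumption, not WordNet).
KNOWN_VERBS = {"go", "examine", "analyze", "burn", "light", "ignite"}
SYNONYMS = {
    "travel": ["go"],
    "inspect": ["examine", "analyze"],
    "blaze": ["burn", "light", "ignite"],
}


def suggest(verb):
    """Return a recommendation message for an unrecognized verb."""
    # Only recommend words the game actually understands.
    options = [w for w in SYNONYMS.get(verb, []) if w in KNOWN_VERBS]
    prefix = f"I don't recognize \"{verb}\" as a command."
    if not options:
        return prefix
    if len(options) == 1:
        return f'{prefix} You might be able to use "{options[0]}" instead.'
    if len(options) == 2:
        return (f'{prefix} You might be able to use "{options[0]}" '
                f'or "{options[1]}" instead.')
    # Three or more options: switch to the list form.
    listed = ", ".join(options[:-1]) + ", or " + options[-1]
    return f"{prefix} You might be able to use one of these instead: {listed}."


for word in ("travel", "inspect", "blaze"):
    print(suggest(word))
```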

Are there any other projects doing anything similar? From what I’ve seen, there’s very little automation in this area, although the results of my first experiment might explain why.

People have occasionally speculated about something like this, but I don’t know of any cases of it actually being done.

Two thoughts:

  1. Are you doing this with the game nouns too? Nouns actually vary more from one game to the next, so it feels like that would be where you would make the most gains from having a WordNet compiler plugin (whereas you could build a standard library with really rich verb synonyms).

  2. Is there a reason to make it generate the recommendation rather than just doing something like

> travel north
(I’m reading “travel” as a synonym for “go”.)

North Moor

? (I ask because in my experience it tends to exasperate players if the parser makes it clear that it does understand what they meant but it’s going to make them type some alternate form instead. But maybe there’s a reason why you don’t want to go that route here.)

I’d definitely like to do it with game nouns. Adjectives too, for that matter. I’m starting with verbs because it’s a smaller scope with fewer moving parts. If my prototype looks promising, I’ll try expanding to include more vocabulary.

I was worried about cases where a verb matches multiple commands with different implementations, but now that you got me thinking about it, I might have overlooked a better solution.

In the “travel” example, it makes total sense to let the game understand it as “go”. There’s no ambiguity to it. With “inspect”, however, I can imagine a conflict: let’s say “examine” does just what you’d expect, while “analyze” refers to a scientific process that consumes a limited resource. If the game guesses wrong when you “inspect the sandwich”, you could wind up wasting the last of your chemical test kit when you only intended to look at it.

But here’s what I just realized: we might be able to resolve that conflict automatically in most cases. Let’s say the game understands “burn”, “light”, and “ignite”, but they all resolve to the exact same action. If the game identifies all three as synonyms for “blaze”, it can parse each of them and compare the results. If they all point to the same action, the game can go ahead and execute it. Otherwise, prompt for clarification. Hopefully, that would be enough to make prompts the exception instead of the rule.
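
A rough sketch of that check, assuming each understood verb can be reduced to a single action name. The `VERB_ACTIONS` table and the `resolve` function are hypothetical; a real game would ask its own parser which action a verb produces.

```python
# Hypothetical verb-to-action table; a real game would query its parser.
VERB_ACTIONS = {
    "go": "going",
    "examine": "examining",
    "analyze": "analyzing",
    "burn": "burning",
    "light": "burning",
    "ignite": "burning",
}


def resolve(candidates):
    """Map synonym candidates to a single action, or report the conflict.

    Returns ("execute", action) when every understood candidate leads to the
    same action, ("clarify", verbs) when they disagree, and ("unknown", None)
    when none of the candidates is understood.
    """
    understood = [v for v in candidates if v in VERB_ACTIONS]
    actions = {VERB_ACTIONS[v] for v in understood}
    if not actions:
        return ("unknown", None)
    if len(actions) == 1:
        return ("execute", actions.pop())
    return ("clarify", understood)


print(resolve(["burn", "light", "ignite"]))  # all three agree
print(resolve(["examine", "analyze"]))       # different actions, so ask
```

With this shape, "blaze" would run straight away, while "inspect" would still prompt, which protects the chemical test kit in the sandwich example.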

The dictionary database might be more useful as an author tool than as something that pulls synonyms dynamically while the player is typing commands. In the end, most games have a fixed set of nouns and verbs, so the author could just go through the synonyms one by one and choose the ones that make the most sense.

Yeah, I agree. My goal is to use it in a developer tool that suggests and/or generates code for synonyms. It’ll still require some user intervention so it only uses relevant definitions of a given word; e.g., in my initial experiment, the synonyms for “go” included “choke” and “kick the bucket”. I also don’t want to require the database and related libraries in distributed game files. The database itself is over 400 MB. Given that we’d only require a fraction of it for any game, it’s a lot more practical to compile the parts we want directly into the game’s native code.
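
Here is roughly what I have in mind for the generation step, as a sketch only. The `CANDIDATES` table is hand-written (the sense labels and most of the words are made up for illustration), and the `synonym "x" -> "y"` output format is a placeholder, not the syntax of any real authoring system.

```python
# Hand-written stand-in for dictionary output: candidate synonyms grouped by
# sense. The sense labels and word lists are illustrative, not real data.
CANDIDATES = {
    "go": {
        "move": ["travel", "proceed"],
        "die": ["choke", "kick the bucket"],
    },
}


def generate_synonyms(verb, approved_senses):
    """Return one declaration line per synonym in the author-approved senses."""
    lines = []
    for sense, words in CANDIDATES.get(verb, {}).items():
        if sense not in approved_senses:
            continue  # the author rejected this sense, so skip its words
        for word in words:
            # Placeholder output format; swap in the target language's syntax.
            lines.append(f'synonym "{word}" -> "{verb}"')
    return lines


# The author keeps only the "move" sense of "go".
for line in generate_synonyms("go", {"move"}):
    print(line)
```

Only the approved lines end up in the game, so the database never has to ship with it.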

I looked at this problem once upon a time, and WordNet in particular.

WordNet also lists hypernyms and hyponyms. (That’s superclass and subclass to us object-oriented programmers.) The hypernym of every verb is, eventually, “do”, and for nouns it is eventually “thing” (or whatever; I forget). There are(?) lots of “most derived subclass” words that have no hyponyms; they’re at the base of the tree. Given such a pyramid of definitions (from hypernym root to hyponym leaf), some heuristics for auto-choosing definitions could be:

  1. if a word is pretty high up in the hypernym space, it’s probably such a broad, vague word that you wouldn’t want to trust it with much. But if a word is very low on that scale (i.e., it has no hyponyms), it’s safer.

  2. the longer the word, the less likely it is to have contradictory definitions (e.g., “OBLITERATE”). Short words in natural languages tend to get overloaded to the point that nearby prepositions change everything (TAKE vs. TAKE OFF).

  3. starting with the words which actually have meaning to the game, find their lowest common hypernym. That word probably shouldn’t be used as a synonym for either command, since it would always provoke a Which Did You Mean from the parser. But from one level below that, on down to the command itself, most everything is probably fair game, given the first 2 heuristics.
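
Heuristics 1 and 3 can be sketched on a toy tree. Everything below is an assumption for illustration: the `PARENT` table is invented (it is not real WordNet data, where a word can have several senses and parents), and the function names are my own.

```python
# Toy hypernym tree (child -> parent), made up for illustration only.
PARENT = {
    "do": None,
    "inspect": "do",
    "scrutinize": "inspect",
    "examine": "scrutinize",
    "test": "inspect",
    "analyze": "test",
}


def path_to_root(word):
    """List the word followed by each of its hypernyms, up to the root."""
    path = []
    while word is not None:
        path.append(word)
        word = PARENT[word]
    return path


def is_leaf(word):
    """Heuristic 1: a word with no hyponyms sits at the bottom of the tree."""
    return word not in PARENT.values()


def lowest_common_hypernym(a, b):
    """Heuristic 3: the first word on a's path that is also on b's path."""
    on_b_path = set(path_to_root(b))
    for word in path_to_root(a):
        if word in on_b_path:
            return word
    return None


def safe_synonyms(command, other):
    """Words above `command` but strictly below the common hypernym."""
    shared = lowest_common_hypernym(command, other)
    path = path_to_root(command)
    return path[1:path.index(shared)]


print(lowest_common_hypernym("examine", "analyze"))
print(safe_synonyms("examine", "analyze"))
print(safe_synonyms("analyze", "examine"))
```

In this toy tree the shared word is "inspect", so it gets left out, while "scrutinize" and "test" each stay attached to a single command.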

Just some ideas to get you thinking.

Cool, I didn’t know about hypernyms and hyponyms. Thanks for the ideas.