This is mostly a thought experiment: I’m not proposing that anyone rush out and try to implement this in Inform or whatever. But I’ve always thought that a “noun/adjective” approach had potential. And it’s been nagging at me since the middle of last month, so I finally sat down and put together some test code.
It’s fiddly to get the details right, but what part of an IF parser isn’t fiddly? And you can get it to the point where it handles a bunch of the common issues without author intervention, which is cool. So I thought I’d discuss it a little, even if it’s not practical for Inform now (or ever).
So. What if an object had two bags of words that name it: one set for “main” nouns, and the other for auxiliary descriptors (adjectives-ish)? And when trying to match a user’s input against an object’s name, the nouns score higher for disambiguation purposes. The object still has to match all of the player’s words (barring probably articles and prepositions) to be considered at all.
In English we can usually take the last word of a name as the core noun, and then the rest go in the “adjectives” bag. If there’s a prepositional phrase (do we only care about “of” or are there other common ones? maybe “with”?) we can take the first word before the preposition as the core noun:
- “shovel” in “blue plastic snow shovel”
- “portrait” in “big gaudy portrait of lord dimwit flathead”
In my code I also allow you to mark arbitrary words as core (+elephant+) or auxiliary (-purple-).
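Here is a rough sketch of that splitting heuristic on its own, simplified from the actual code further down. The names `splitName` and `PREPOSITIONS` are made up for illustration, and it assumes “of” and “with” are the only prepositions worth handling:

```javascript
// Hypothetical helper: split a name into a core-noun bag and an auxiliary bag.
// Assumption: only "of" and "with" are treated as prepositions.
const PREPOSITIONS = new Set(['of', 'with'])

const splitName = (name) => {
  const words = name.toLowerCase().trim().split(/\s+/)
  // The implicit core noun is the word before the first preposition,
  // or the last word when there is no preposition.
  const prepAt = words.findIndex(w => PREPOSITIONS.has(w))
  const coreAt = prepAt > 0 ? prepAt - 1 : words.length - 1
  const core = new Set(), aux = new Set()
  words.forEach((raw, i) => {
    if (PREPOSITIONS.has(raw)) return           // prepositions go in neither bag
    const marked = /^([+-])(.+)\1$/.exec(raw)   // explicit +word+ or -word-
    const word = marked ? marked[2] : raw
    const isCore = marked ? marked[1] === '+' : i === coreAt
    const bag = isCore ? core : aux
    bag.add(word)
  })
  return { core, aux }
}

console.log([...splitName('blue plastic snow shovel').core])  // [ 'shovel' ]
console.log([...splitName('big gaudy portrait of lord dimwit flathead').core])  // [ 'portrait' ]
```

Explicit markers override the position rule here, so `+elephant+` lands in the core bag wherever it appears in the name.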
When a user types a name, we check the input words against the object’s name, scoring a large amount when one matches a noun (10? 100? any number that’s more than the largest number of adjectives that a player will ever type) and scoring 1 when it matches an adjective.
This lets “snow” prefer an object named “snow” or “heavy wet snow” to one named “snow shovel” because in the former case it’s matching a noun (more important) and in the latter only an adjective.
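A minimal sketch of just this scoring rule, with the bags written out by hand. `basicScore` and the weights 10 and 1 are assumptions for illustration. It leaves out the word-order and repeated-word handling, so the numbers differ from the test output further down:

```javascript
// Assumed weights: the noun weight only needs to exceed the longest
// run of adjectives a player would plausibly type.
const NOUN_WEIGHT = 10, ADJECTIVE_WEIGHT = 1

// Hypothetical scorer: `words` is the player's input as an array,
// `object` is { core: Set, aux: Set }. Returns 0 if any word is unmatched.
const basicScore = (words, object) => {
  let score = 0
  for (const word of words) {
    if (object.core.has(word)) score += NOUN_WEIGHT
    else if (object.aux.has(word)) score += ADJECTIVE_WEIGHT
    else return 0
  }
  return score
}

const wetSnow = { core: new Set(['snow']), aux: new Set(['heavy', 'wet']) }
const snowShovel = { core: new Set(['shovel']), aux: new Set(['snow']) }

console.log(basicScore(['snow'], wetSnow))               // 10
console.log(basicScore(['snow'], snowShovel))            // 1
console.log(basicScore(['wet', 'shovel'], snowShovel))   // 0 ("wet" is unmatched)
```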
To give us a tiny bit of word-order handling (“pot plant” vs. “plant pot”) we can also recognize the core noun of the input and double the score when it matches a core noun of the object name (only score it as an adjective otherwise).
“plant pot” matches itself with a 2-noun, 1-adjective score (we score the noun double because it’s the core noun in both the input and the object name).
But “plant pot” matches “pot plant” with a 1-noun, 1-adjective score: it has all the right words, but the core noun in the input doesn’t match up so we don’t get that double noun score.
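The doubling can be sketched like this, assuming the core noun of the input is simply its last word. `orderScore` and `WEIGHTS` are made-up names, and the weights 20/10/1 mirror the ones used in the code further down:

```javascript
// Assumed weights: 20 when the input's core noun hits an object noun,
// 10 when any other input word hits an object noun, 1 for an adjective.
const WEIGHTS = { coreToCore: 20, noun: 10, adjective: 1 }

// Hypothetical scorer; assumes the last input word is the input's core noun.
const orderScore = (words, object) => {
  const last = words.length - 1
  let score = 0
  for (const [i, word] of words.entries()) {
    if (object.core.has(word)) {
      score += i === last ? WEIGHTS.coreToCore : WEIGHTS.noun
    } else if (object.aux.has(word)) score += WEIGHTS.adjective
    else return 0  // an unmatched word rules the object out
  }
  return score
}

const plantPot = { core: new Set(['pot']), aux: new Set(['plant']) }
const potPlant = { core: new Set(['plant']), aux: new Set(['pot']) }

console.log(orderScore(['plant', 'pot'], plantPot))  // 21
console.log(orderScore(['plant', 'pot'], potPlant))  // 11
```

Both objects still match, so the word order only decides which one wins disambiguation.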
And if we limit the object’s core nouns so they can match multiple times but only count score once, then we can handle adjective/noun ambiguity like “light light” too.
This is the tricky case: if we didn’t track which object nouns we’d already scored, then if you typed the name “light light” it’d match both words against the noun for the “heavy light” object, making it match just as well as the “light light” one (or maybe even better if one counts as an adjective in the latter case).
But if we only score the object’s nouns once, then “heavy light” scores 1-noun (the other “light” matches but doesn’t score), while “light light” scores 1-noun, 1-adjective and is preferred.
If any of the input words aren’t matched at all, that object doesn’t match.
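A sketch of the score-once rule in isolation. `onceScore` is a hypothetical name, the weights 10 and 1 are assumed, and the core-noun doubling is left out to keep it short:

```javascript
// Hypothetical scorer: each object noun earns points at most once.
const onceScore = (words, object) => {
  const scored = new Set()  // object nouns that have already earned points
  let score = 0
  for (const word of words) {
    if (object.core.has(word) && !scored.has(word)) {
      scored.add(word)
      score += 10
    } else if (object.aux.has(word)) score += 1
    // A repeat of an already-scored noun still matches but adds nothing;
    // a word that matches nothing at all rules the object out.
    else if (!scored.has(word)) return 0
  }
  return score
}

const lightLight = { core: new Set(['light']), aux: new Set(['light']) }
const heavyLight = { core: new Set(['light']), aux: new Set(['heavy']) }

console.log(onceScore(['light', 'light'], lightLight))  // 11
console.log(onceScore(['light', 'light'], heavyLight))  // 10
console.log(onceScore(['heavy', 'light'], lightLight))  // 0
```

Without the `scored` set, both words of “light light” would score as nouns against either object and the two would tie at 20.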
It handles synonyms in pretty much the same way as the single bag of words model: if you’re playing at the beach and you have a “plastic shovel” that’s also a “small spade” then the user can refer to it as a “plastic spade” just fine.
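One way to picture the synonym case is to pour the bags of both names into a single object. This is only a sketch with made-up helper names (`mergeNames`, `matchesAll`); the `out` parameter of `parseObjectName` in the code further down appears to serve the same purpose:

```javascript
// Hypothetical helper: union the core and aux bags of several names.
const mergeNames = (...names) => ({
  core: new Set(names.flatMap(n => [...n.core])),
  aux: new Set(names.flatMap(n => [...n.aux])),
})

// Hypothetical check: every input word must appear in one of the bags.
const matchesAll = (words, object) =>
  words.every(w => object.core.has(w) || object.aux.has(w))

const beachShovel = mergeNames(
  { core: new Set(['shovel']), aux: new Set(['plastic']) },  // "plastic shovel"
  { core: new Set(['spade']), aux: new Set(['small']) }      // "small spade"
)

console.log(matchesAll(['plastic', 'spade'], beachShovel))  // true
console.log(matchesAll(['metal', 'spade'], beachShovel))    // false
```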
And of course, this fails to do the fancy stuff in some cases if (as in Inform, I gather?) it doesn’t have the input to score against by the time it gets to disambiguation. But if you have a parser that keeps that information around, this seems like a neat little upgrade? Maybe?
javascript code
Compute a score with:
inputMatchesObject(parseObjectName("INPUT"), parseObjectName("OBJECT"))
Some test output:
ok 15 "snow" prefers "snow" (20) to "snow shovel" (1)
ok 16 "plant pot" matches "pot plant"
ok 17 "pot" prefers "plant pot" (20) to "pot plant" (1)
ok 18 "plant" prefers "pot plant" (20) to "plant pot" (1)
ok 19 "plant pot" prefers "plant pot" (21) to "pot plant" (11)
ok 20 "light light" matches "light light"
ok 21 "heavy light" doesn't match "light light"
ok 22 "light light" prefers "light light" (21) to "heavy light" (20)
And the actual code:
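// Words that end the noun phrase; the word before the first one is the core noun.
const preposition = new Set(['of', 'with'])

// Parse a name into { core, aux } word sets.
// Passing an existing `out` adds this name's words to it.
const parseObjectName = (str, out) => {
  let core, prev
  const addWord = (word, implicitCore) => {
    if(word == null) return
    let isCore = implicitCore
    // Explicit markers: +word+ forces core, -word- forces auxiliary.
    const m = /^(?:\+.*\+|-.*-)$/.exec(word)
    if(m) {
      isCore = m[0].charAt(0) === '+'
      word = word.slice(1, -1)
    }
    if(isCore) {
      // Remember the implicit core so only one word per name gets that role.
      if(implicitCore) core = word
      out.core.add(word)
    } else out.aux.add(word)
  }
  out ??= { core: new Set(), aux: new Set() }
  const words = str.toLowerCase().trim().split(/\s+/g)
  // Add each word one step late, once we know what follows it.
  for(const word of words) {
    if(preposition.has(word)) {
      // The word before the first preposition is the implicit core noun.
      addWord(prev, core == null)
      prev = null
    } else {
      addWord(prev)
      prev = word
    }
  }
  // The last word is the implicit core noun unless one was already found.
  addWord(prev, core == null)
  return out
}

// Score parsed input against a parsed object name; 0 means no match.
const inputMatchesObject = (input, object) => {
  // core: object nouns that have already been scored
  let match = true, score = 0, core = new Set()
  const check = (set, iCore, iAux) => set.forEach(word => {
    const seenCore = core.has(word)
    if(object.core.has(word) && !seenCore) {
      core.add(word); score += iCore
    } else if(object.aux.has(word)) score += iAux
    else if(!seenCore) match = false  // word matches nothing in the object
  })
  check(input.core, 20, 1)  // input core noun: doubled score on an object noun
  check(input.aux, 10, 1)   // other input words: normal noun score
  return match ? score : 0
}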