In connection with my project to put together a lean, mean IF machine that runs in PHP on any web server (without any special server-side software), I’m testing out my “IIB Parser”. IIB stands for “Ignorance Is Bliss” because the parser doesn’t know anything at all about grammar and can’t tell a verb from an adjective, yet it still gets the job done.
The first page of that document is old stuff. I went back and tossed out all the grammar-related stuff since I didn’t end up using that approach anyway. The document is now up to date with the code. (You may have to refresh the page in your browser if it still looks the same as before.)
Thanks for those links. It’s always interesting to study other approaches. Many years ago I played around with chatbots and wrote a lot of parsing code, but mostly it was built up from linguistic principles, and I borrowed the whole Brill Tagger for part-of-speech tagging. This time around I’m looking to strip everything down to the bare essentials. Provided that it still works, of course.
Cool, I’d never heard of that. It does raise the question of backtracking (not that it would be significant) – how does the split/stat technique deal with possible backtracking based on changes in world state as the parser processes a command?
edit to add: and scope too, now that I think about it. Like the ‘cat in the hat’ vs. the ‘cat in the hat’. For example, in a room with a ‘cat in the hat’ and a cat in a hat, consider the command “give the cat in the hat to the cat in the hat”. Pretty dumb, but not totally out of the question!
I just uploaded a newer version of the parser sandbox that now handles things like “Take the marble out of the box with the tongs”, or even: “using the long tongs, remove the red marble from the tin container.” I’m up to 135 lines of code now. I also added an “Anti-Alias” function. Basically, this reduces the vocabulary, which makes the split words tables much shorter. It changes each word into its “normalized” form. For example, if I type “remove the red marble out of the tin box using the tongs” it rewrites that as “get the red marble from the tin box with the tongs” before it even starts parsing.
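To make the “Anti-Alias” idea concrete, here is a minimal sketch in Python (not the actual PHP code, which isn’t shown in this thread). The `ALIASES` table and the `anti_alias` name are hypothetical, invented for illustration; the only thing taken from the post is the example sentence and its rewritten form.

```python
import re

# Hypothetical alias table (phrase -> normalized word). The real table
# used by the parser is not shown in the thread.
ALIASES = {
    "out of": "from",
    "remove": "get",
    "take": "get",
    "using": "with",
}

def anti_alias(command):
    """Rewrite known synonyms to their normalized form before parsing."""
    text = command.lower()
    # Handle longer phrases first so a multi-word alias such as "out of"
    # is not broken up by a shorter entry.
    for phrase in sorted(ALIASES, key=len, reverse=True):
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", ALIASES[phrase], text)
    return text

print(anti_alias("remove the red marble out of the tin box using the tongs"))
# prints: get the red marble from the tin box with the tongs
```

One design point worth noting: none of the normalized words is itself a key in the table, so the order of replacements cannot cause one rewrite to trigger another.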
As for your first question, re changes of state “as the parser processes a command”: I’m not sure I follow what you mean. Because it’s a PHP script, the action is strictly turn-based; nothing can happen between moves. The script is not loaded and run until the server gets a command from the browser.
As for the second question, Wow! That would confuse a human. However, he is called “the cat in the hat” because he’s a cat, and he’s in a hat, so in a sense “the cat in the hat” is synonymous with “the cat which is in a hat.” So either they are both the same object, or they are indistinguishable objects. On the other hand, if you wanted to give the book titled “the cat in the hat” to the actual “cat in the hat” then maybe you’d have to put quotation marks around the book title? Or maybe hyphenate the name the-cat-in-the-hat? I don’t really know.
That sounds like a problem for the story author to grapple with. (Passing the buck, I know. But if the code is going to stay lean and mean I have to stay on the right side of the 80/20 line.)
But that does bring up one valid point I had neglected to consider: disambiguation by location. Given that the red marble is in the wood box and the blue marble is in the leather bag, “take the marble out of the bag” will currently return “ambiguous”, because there are two marbles in the world. But it really should know that the only marble in the bag is the blue one, so that has to be it. To add that to the parser I will need to add code to keep track of the world state, which I don’t currently have in there.
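A rough sketch of what disambiguation by location could look like, written in Python for illustration rather than in the parser’s PHP. Everything here is an assumption: the `WORLD` table, the `candidates` and `resolve` names, and the idea of matching a phrase against the words of an object’s name. The marbles, box and bag come from the example above.

```python
# Hypothetical world state: object name -> where it currently is.
WORLD = {
    "red marble": "wood box",
    "blue marble": "leather bag",
    "wood box": "room",
    "leather bag": "room",
}

def candidates(phrase, location=None):
    """Objects whose name contains every word of the phrase,
    optionally restricted to those inside one location."""
    words = phrase.lower().split()
    return [
        name for name, where in WORLD.items()
        if all(word in name.split() for word in words)
        and (location is None or where == location)
    ]

def resolve(phrase, container_phrase=None):
    """Return the single matching object, or 'ambiguous' / 'unknown'."""
    location = None
    if container_phrase is not None:
        containers = candidates(container_phrase)
        if len(containers) != 1:
            return "ambiguous" if containers else "unknown"
        location = containers[0]
    found = candidates(phrase, location)
    if len(found) == 1:
        return found[0]
    return "ambiguous" if found else "unknown"

print(resolve("marble"))         # prints: ambiguous
print(resolve("marble", "bag"))  # prints: blue marble
```

The container is resolved first, and its name is then used to filter the candidates for the object, so “the marble” only becomes ambiguous when the named container itself holds more than one.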
Back to the drawing board. That gives me something to do tomorrow.
This parser approach sounds fundamentally similar to Quest’s parser. The concept of “split words” is analogous to the command patterns that Quest uses, as you have to have something to separate the “variable” parts of the player input.
The example of “boil soup in a kettle over a medium flame” would be handled with a Quest command pattern of “boil #object1# in #object2# over #object3#”, and you can specify alternate patterns by separating them with semicolons - so the full three lines of the “Expanding the Vocabulary of Commands” example on the linked page would be equivalent to:
boil #object1# in #object2# over #object3#; boil #object1# over #object3# in #object2#; over #object3# boil #object1# in #object2#
You can handle extra optional words by specifying alternative forms of the command pattern in a similar way. For example:
open #object#; open up #object#; open #object# up
would be sufficient to handle “open the box”, “open up the box” and “open the box up” (as well as versions without “the”).
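For anyone curious how patterns in this style could be matched, here is a small Python sketch that turns each alternative into a regular expression. This is only an illustration of the idea, not how Quest itself is implemented; `compile_pattern`, `match_command`, and the rule of trying the alternative with the most literal text first are all assumptions made for the example.

```python
import re

SLOT = re.compile(r"#(\w+)#")

def compile_pattern(pattern):
    """Turn e.g. 'open #object# up' into (literal_length, regex)."""
    # Splitting on a capturing group alternates literal text and slot names.
    parts = SLOT.split(pattern.strip())
    regex = "".join(
        re.escape(part) if i % 2 == 0 else "(?P<%s>.+?)" % part
        for i, part in enumerate(parts)
    )
    literal_length = sum(len(part) for part in parts[0::2])
    return literal_length, re.compile("^" + regex + "$", re.IGNORECASE)

def match_command(patterns, text):
    """Try each semicolon-separated alternative; return the captured slots."""
    compiled = [compile_pattern(p) for p in patterns.split(";")]
    # Assumption: prefer alternatives with more fixed text, so that
    # "open up #object#" wins over "open #object#" for "open up the box".
    compiled.sort(key=lambda pair: pair[0], reverse=True)
    for _, regex in compiled:
        match = regex.match(text.strip())
        if match:
            return match.groupdict()
    return None

OPEN = "open #object#; open up #object#; open #object# up"
print(match_command(OPEN, "open up the box"))  # prints: {'object': 'the box'}
```

Without the sort, the first alternative would capture “up the box” as the object, which is why the sketch orders alternatives by how much fixed text they contain. Stripping articles such as “the” is left out here.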
Personally I think this syntax is easier to read than the suggested function-calling syntax, but the approach to parsing sounds very similar.