Python Parser anyone?

I’m building a game in Python but am new to coding in depth. I would like to implement a parser but don’t know where to start. Does anyone have a robust parser, maybe one that can handle verb, object, and indirect object, that they could share? If not, any help would be greatly appreciated as well!


I don’t have one to share, but I did find some notes from several decades ago, which by happy coincidence, are about the basic steps for a parser that does verb, object, indirect object.

These notes - with some invent-on-the-fly-improvements - did result in a usable parser. I regret to say that I did not make any notes about those improvements.

Before reading the rest of this post, I recommend going to look at

Then come back and read the rest of this post.

Keep in mind that the central goal of a simple parser is to take the {verb}{noun} as typed by the player
and translate to [action][object]. IOW, you should define a Chomskyan ‘deep language’ of actions upon objects to keep track of what is really happening. From that you determine what changes need to be made to the world.

Good luck. If you have any questions, feel free to post them here.

===================================================================

The player makes a request/command
The parser’s goal is to determine what changes that makes in the world - and on the screen.

The player may speak in terms of {verb}{noun}
The parser should translate to [action][object]
And then into changes.

The first pass of the parser resolves alphabetic strings into words
Those words are looked up in a dictionary.
Any misspellings or unknowns are clarified through conversation with the player.

Once a sequence of valid dictionary entries is obtained
The parser assigns a part of speech to each word.
Some such assignations may be uncertain.
These are noted as alternate entries in the part of speech data.
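
A minimal Python sketch of this first pass, offered as one possible reading of the notes rather than the original implementation. The names DICTIONARY, tokenize, and first_pass are invented for illustration, and the vocabulary is hypothetical. Words with more than one possible part of speech keep all their alternates so a later pass can resolve them.

```python
import re

# Hypothetical vocabulary: each word maps to its possible parts of speech.
DICTIONARY = {
    "hit": {"verb"},
    "the": {"article"},
    "troll": {"noun"},
    "with": {"preposition"},
    "brass": {"adjective", "noun"},   # ambiguous: both alternates are kept
    "lantern": {"noun"},
}

def tokenize(line):
    """Split raw input into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", line.lower())

def first_pass(line):
    """Return (tagged, unknown); tagged is a list of (word, parts-of-speech)."""
    tagged, unknown = [], []
    for word in tokenize(line):
        if word in DICTIONARY:
            tagged.append((word, DICTIONARY[word]))
        else:
            unknown.append(word)      # a real game would ask the player about these
    return tagged, unknown

tagged, unknown = first_pass("Hit the trol with the brass lantern!")
```

Here the misspelled "trol" ends up in the unknown list, which is where the clarifying conversation with the player would start.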

In the second pass, nouns and adjectives must be translated into objects.
This is done by ANDing arrays of bits.
This may involve some clarifying conversation with the player.

Eventually we reach the point in which all nouns and adjectives have been translated into unique objects.
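
The bit-ANDing step can be sketched in Python with plain integers used as bit arrays. This is an assumption about how the notes might be realized, not a record of the original code: OBJECTS, WORD_BITS, and resolve are hypothetical names, and the four-object world is made up.

```python
# Hypothetical world of four objects; bit i of a mask stands for OBJECTS[i].
OBJECTS = ["brass lantern", "brass key", "iron key", "wooden door"]

# Each noun or adjective maps to the set of objects it could describe.
WORD_BITS = {
    "lantern": 0b0001,
    "key":     0b0110,
    "door":    0b1000,
    "brass":   0b0011,
    "iron":    0b0100,
    "wooden":  0b1000,
}

def resolve(words, in_scope=0b1111):
    """AND together the masks of all words, limited to the objects in scope."""
    mask = in_scope
    for word in words:
        mask &= WORD_BITS.get(word, 0)
    return [name for i, name in enumerate(OBJECTS) if mask & (1 << i)]
```

A result with exactly one name is a unique object. More than one (for example resolve(["key"])) means the parser should ask "which key?", and an empty result means the description matches nothing.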

Then a second part of speech pass is made to resolve alternates.
Example:
‘With’ can be turned into ‘using’, if the inventory bit array confirms that the player has the object.
Or it can be turned into ‘wearing’ or ‘carrying’ or ‘holding’ if another person has it.
The difference in how an object is possessed should be another field in a dictionary entry.

At this point all words should have parts of speech assigned,
And all nouns and adjectives have been resolved to unique objects.

Verbs ( and maybe indirect objects ) must be translated into actions.
Non-core verbs map N to 1 on to core verbs.
( example:‘hit’, ‘attack’, ‘strike’ all map to ‘attack’ )

Next we resolve the IO (indirect object), if any, as a verb modifier (VM).
Then we have the two indices into a 2D array of [verb]x[VM].
Each element in the array is an action.

Once we have a unique action and a unique direct object,
Then we use those as indices into a 2D array of [action]x[DO]
Each element of that array is a change (or a list of changes)

Each change is expressed as a new value of some variable, and/or some text written to the screen.
The execution of that change may depend on the state of other variables.
If/then clauses are part of the content of a change.
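
A rough Python sketch of the last few steps, with dictionaries keyed by tuples standing in for the 2D arrays. Every name and table entry here (CORE_VERB, ACTIONS, CHANGES, execute, the door example) is hypothetical and only meant to show the shape of the lookups.

```python
# N-to-1 mapping of typed verbs onto core verbs.
CORE_VERB = {"hit": "attack", "strike": "attack", "attack": "attack",
             "open": "open", "unlock": "open"}

# [verb] x [verb modifier] -> action; None means no indirect object was given.
ACTIONS = {
    ("attack", None): "ATTACK_BARE_HANDED",
    ("attack", "using"): "ATTACK_WITH_TOOL",
    ("open", None): "OPEN",
    ("open", "using"): "UNLOCK_WITH_TOOL",
}

def always(state):
    return True

# [action] x [direct object] -> change: (condition, variable, new value, text).
CHANGES = {
    ("OPEN", "door"): (lambda s: not s["door_locked"],
                       "door_open", True, "The door swings open."),
    ("UNLOCK_WITH_TOOL", "door"): (always,
                                   "door_locked", False, "The lock clicks."),
}

def execute(verb, modifier, direct_object, state):
    """Look up the action, then the change, then apply it if its condition holds."""
    core = CORE_VERB.get(verb)
    action = ACTIONS.get((core, modifier))
    change = CHANGES.get((action, direct_object))
    if change is None:
        return "Nothing happens."
    condition, variable, value, text = change
    if not condition(state):              # the if/then part of a change
        return "You can't do that right now."
    state[variable] = value
    return text
```

With a state of {"door_locked": True, "door_open": False}, OPEN DOOR is refused until UNLOCK DOOR (with the modifier "using") has changed door_locked.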


This sounds like a topic for @tundish!

There’s also this: Wireframe Magazine: Building a Text Adventure in Python


I’ve been having a play with Python text adventures this last weekend too, having just spent a few months learning Python for non-game purposes on a software dev course. Finding a nice way to package/distribute the end result seems like the trickiest thing!

I found it fun following along with Phillip Johnson’s Make Your Own Python Text Adventure book. The example game is more of a choice-based RPG when complete but it might give you some ideas.

There’s also a short game here - https://github.com/MyreMylar/christmas_adventure - which is a more traditional parser example and uses Pygame and Pygame GUI, if you want to have a look at that.


Implicit in the notes that I posted above, but not explicitly stated, is the idea that the parser is mostly data-driven.

IOW, the parser does not ‘know’ any verbs aside from the basic directions, and meta-commands like ‘quit’ or ‘save’ or ‘repeat’. Those are hard coded because they apply in all games.

But other verbs are contained in a file that is unique to the particular game. In the same file is an array of data, telling the game runner what happens when a certain verb is applied to a certain object.

This requires a little more coding initially, but makes large games much simpler in the long run.

IOW…most programmers start with hard coding directions, something like this:

If command = NORTH  {
  current_room := current_room.north;
  display ( current_room.description );
}

That is reasonable. But it leads programmers to reflexively hard code all verbs, like this:

If command = COOK  {
  switch( object ) {
    PIE: { 
      do whatever
      break
    }
    CAKE: {
      do something else
      break
    }
    DRAGON: {
      run!!!
      break
    }
  } //end switch
}

The parser then has to be custom written for every game. This ultimately means a whole lot more work.
But if it is data-driven, large games require no extra work by the programmer. The parser/runner is written once, and then can accept any file from the game writer.
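
To make the contrast concrete, here is a small Python sketch of the data-driven version of the COOK example. The JSON layout, the function names, and the response texts are all assumptions made up for illustration; a real game file would be read from disk and would carry much more than text responses.

```python
import json

# Hypothetical game file contents, inlined here as a string.
GAME_DATA = """
{
  "synonyms": {"bake": "cook", "roast": "cook"},
  "responses": {
    "cook": {
      "pie": "The pie turns golden brown.",
      "cake": "The cake rises nicely.",
      "dragon": "The dragon objects. Run!"
    }
  }
}
"""

def load_game(text):
    """Parse the game writer's file into plain dictionaries."""
    return json.loads(text)

def run_command(game, verb, noun):
    """Generic runner: it knows no verbs itself, only how to look them up."""
    verb = game["synonyms"].get(verb, verb)
    by_object = game["responses"].get(verb)
    if by_object is None:
        return "I don't know how to " + verb + "."
    return by_object.get(noun, "You can't " + verb + " that.")

game = load_game(GAME_DATA)
```

Adding a new verb or object then means editing the data, not the runner.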

EDIT: Thanks to FriendOfFred for helping me make this more presentable.


Thank you so much for the information! It’s a lot to take in but is starting to make sense now. I haven’t been introduced to using bit arrays in my code yet but I think it’s very useful for making fast connections like that. I will for sure come back with more questions once I test out these ideas!

Feel free to add more anytime!

Have you checked out ScottKit? It works something like that, and I don’t see anything wrong with that. It should be easier with Python.


Thank you! That is a good game and has some useful ideas I can use. I too am finding it difficult to try to make transitions smooth and readable.


You can format a code block by setting it off with three backticks on a line by themselves:

```
code
```

Apologies in advance for a long, rambling answer. It’s offered in the hope that it’s helpful on multiple levels. I’ve spent some time going down this path, so I’ve taken some time this evening to explain what’s down it and to share some code. The code is the main part of what implements a reasonably decent parser in Python for a model world that’s implemented in a specific, reasonably Pythonic way. It’s certainly not the only way to do this, and it’s definitely not the best way. But it’s a reasonably good way, I think, and I’ve explained what I’ve done and why and what some of the trade-offs are.

And I just discovered now, after writing this, that there’s an upper limit to how long forum posts are allowed to be; this is by far my longest post ever here! So I’m going to have to break this up into multiple pieces. Sorry for that.

Part 1: Introduction and Why You Might Not Want to Do This

How you write your parser depends a lot on how your data ontology for the rest of the program is set up. Think hard about this before you put much effort into writing your code, because it’s much easier to do a good job early and then build on that than it is to try to change the way a complex system you’ve already built operates.

I’ve got a half-written piece of parser IF in Python, and I’ll talk in a bit about how the parser works, but first I want to encourage you to think hard about whether you really want to write a piece of parser IF in Python instead of in TADS, Inform, Adventuron, Dialog, etc. Other people have been encouraging about the process, and I won’t deny that writing an entire game from scratch in a general-purpose programming language is a fun task in a lot of ways. Python is, I think, even one of the better general-purpose programming languages to write IF in because of its object-based underlying mechanics, its introspection facilities, and its power and expressiveness.

But writing a piece of parser IF in a general-purpose programming language like Python is an awful lot of work compared to writing one in a language that’s specialized for writing parser IF. You’ll spend most of your time not writing a story, but writing the framework that the story hangs on. At a conservative estimate, my allocation of framework-writing to story-writing time for my half-written parser-IF-in-Python-project is maybe 20:1. It’s probably higher. Maybe much higher. You have to build the framework before you can start telling the story, and you have to keep tinkering with the framework as you’re telling the story to change how it works in small ways because you didn’t realize when you were writing it initially that you were going to want to … whatever. However well you plan, your system will be inadequate in some ways that you’ll discover as you go along, and you’ll have to modify it, and then you’ll have to go back and fix the things that you broke when you changed the basic rules about how the little world that you created works.

It’s an awful lot of work, especially if you want it to be a pleasant experience for an end-user to interact with. Because you’re going to want other people to play the game when you’re done with it, right? After you’ve spent a few thousand hours designing an underlying system to hang a story on and then hanging a story on it, you’re going to want to show it to someone so you can say Behold this thing I have created!, right? And you’re going to be concerned about whether they actually enjoy playing it? Of course you are! (OK, I admit that I don’t know you, personally, and I’m guessing. But I think it’s likely to be a fair extrapolation from most other makers that I know.)

But the question of “will someone else enjoy this thing?” is a hard one to answer in advance, and it’s easy to overestimate how wonderful the thing you’re building is because, well, it’s yours; and also because you know how much effort you’ve put in, whereas the people who encounter it will not. There’s a lot of parser IF being written these days relative to the size of the parser IF community. I myself couldn’t possibly play everything that piques my interest. I’ve got a to-play folder on my hard drive with hundreds of games in it, some of them thirty years old or more, games I want to get around to playing but just haven’t found time for. The community is saturated with well-written, well-executed content. This is not a bad thing, but it does set the bar high for your work, and players have high expectations for how the things they interact with should behave.

And the brutal truth is that when you say “I wrote this in Python, you know,” players as a whole aren’t going to care much. As a group, they largely don’t care if you write your game in Dialog or Inform or Python or TADS or C++ or Ruby or Lisp. They care about the game itself, not the underlying system. But it’s the underlying system you’re going to be spending the huge majority of your time on. Think about whether you’re willing to spend 95%+ of your time on a part of the project whose highest aspiration is to be invisible and get out of the way so that the player can enjoy the content you hang on top of the system you spent most of your time building. Personally, I could not care less whether or not Jack Kerouac wrote On the Road on one long sheet of paper-towel paper being fed through his typewriter; the question for me is “do I enjoy reading it?”.

The how-to articles that people have linked above show the easy versions of performing some of these tasks: a parser that understands a dozen words, and can move in four directions. Simple verb-noun parsers aren’t that hard to write, which is why they were standard in homebrew BASIC-written games in the eighties. But it gets harder to write parsers that allow for multiple direct objects, that allow for indirect objects, that allow you to use pronouns or the word ALL, that allow you to give commands to NPCs. All of these are special cases that make the parser more complicated, but they’re also things that players expect to be able to do. It’s not just the articles that people have already linked that oversimplify the task of writing parsing code; that’s also true in general for the how-do-I-write-parser-IF-in-Python articles that you’ll run across if you do a Google search. As a rule, people writing those articles are doing a good job of showing off how to leverage Python language features, but only an elementary job of showing off how to write good parser IF and the underlying system that it depends on. There are plenty of similar articles out there; they tend to provide a simplified “here’s how the basic idea works in Python” sketch and then skip over the harder parts and the boring implementation details. But it’s the harder parts and the boring details that are going to occupy the vast majority of your coding time on the parser.

The uglier side of the fact that most people won’t care much about whether you invented a whole system from scratch is that no one is going to give you a pass on doing a shoddy job with building the underlying system just because you built it yourself from scratch. Nobody forced you to build it yourself from scratch. The player doesn’t care much what system you used to build your game; they care about the experience of playing the game. If I hire someone to mow my lawn, I’m not going to pay him extra because he decides to put on a blindfold and tie his hands behind his back and push the lawnmower around while holding the push-bar in his mouth. And if he decides to do that, I’m going to be really annoyed if I have to keep running outside and restarting the lawnmower for him because he can’t do it with his hands tied behind his back while wearing a blindfold.

Almost every year somebody submits a game to the IFComp made with a homebrew system that was written in Lisp or C or some other general-purpose programming language. Most of them turn out to be kind of disappointing games, despite the huge amount of effort that went into them. (The counterexample that springs to mind for me was 2019’s ALICE BLUE, which was written as a shell batch file for Linux systems. I quite liked the writing and thought it was rather well-designed, working with its limitations quite effectively instead of trying to do everything that traditional parser IF does. But it got an average score of only 3.22 out of 10, and only 9 people voted on it, and it finished 78th out of 82, which I personally think is a shame, but then I scored it more than twice as high as its average score, so I guess I’m an outlier there.)

So the short version of all of that is that if you want people to both play and enjoy your game, you need to care enough about the player experience and the fact that the bar has been set high to do a really really good job of implementing your underlying system. This means even more work than doing a mediocre job of implementing your system. It’s not even the 80-20 rule, where 80% of the effort goes into finishing the last 20% of the task; it’s much more extreme, perhaps a 90-10 or 97-3 rule.

Here are some things that Dialog, Inform, et al. do for you with no effort that you will have to build from scratch if you use Python, and will have to do a good job with to avoid irritating players and sending them away shaking their heads. This is only a partial list.

  • Setting up a data ontology for your created world and settling on an underlying machine-readable representation for each object in the game.
  • Parsing, of course.
  • Loading and saving.
  • Making it possible to undo an action.
  • Buffering text, deciding whether there needs to be a “press a key to continue” message when a whole page has been printed so it doesn’t scroll off the screen before the player has read it, and wrapping text to the window width.
  • Pluralizing irregular nouns.
  • Building text out of templates and text substitutions.
  • Conjugating verbs and dealing with subject-verb agreement.
  • Ensuring your underlying world model remains consistent when you modify it.
  • Inventing a system by which it’s possible to talk to NPCs, and give them commands that they may or may not obey.
  • A few dozen standard commands, meta-commands, and debugging commands (INVENTORY, EXAMINE, GO, RESTART, LOOK, SCORE, ABSTRACT, PURLOIN, …)
  • Making it possible for actions to be taken by NPCs in addition to the player character.
  • Putting a few hundred sensible real-world boundaries in place to keep the realistic illusion in place (“if A and B are both containers, and B is inside A, then it should not be possible to also put A into B at the same time”).
  • Packaging the game in such a way that it’s easy for everyone who already plays parser IF to run. (Lots of people double-click on a story file and their preferred 'terp pops up. Lots of people prefer it that way. Are you writing a console app that reads from stdin and writes to stdout? Not everybody’s computer is already set up to pop up a terminal emulator to run a Python file when a Python script is double-clicked from their file browser. For a goodly sized chunk of people, that’s already a deal-breaker: they will not open a terminal, navigate to the appropriate folder, and type a command to get your program running. They will not deal with the “do I have to type python3.exe before the name of your program?” question. Having to install Python first is already a deal-breaker for many people running Windows. Are you using other modules than those installed in the standard library? Hopefully you can redistribute them, because asking end-users to deal with pip means losing another big chunk of your potential players.)
  • Dealing with security issues. (This is a whole huge can of worms on its own, largely involving player trust. A story file generated by [say] Inform or Dialog is largely sandboxed so that it can’t steal data off of your hard drive and send it off to a remote server. This is of course not true for Python programs, which can of course do precisely that. And you’re not thinking about using the pickle module to handle saving and loading your game, are you? Another can of security worms: pickle files are programs for a small virtual machine, and unpickling one can invoke arbitrary Python callables, so a pickle file can do more or less anything the Python language itself can do. If players are, or might be, exchanging game saves, and those game saves are pickle files, then your game’s save files are a potential attack vector: opening a maliciously crafted game save could steal information from a player, erase their hard drive, etc. etc. etc.).
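
On the save-file point in particular, one common way around pickle is to save plain data as JSON and validate it on the way back in. The sketch below assumes a game state that fits in a dictionary of strings, numbers, and lists; save_game, load_save, and the state keys are hypothetical names, and a real game would have to convert its objects to and from this plain form itself.

```python
import json

def save_game(state):
    """Serialize plain game state (dicts, lists, strings, numbers) to text."""
    return json.dumps(state, sort_keys=True)

def load_save(text):
    """Parse a save and check its shape before trusting any of it."""
    data = json.loads(text)      # json.loads builds only plain data, never runs code
    if not isinstance(data, dict):
        raise ValueError("save file must contain an object")
    if not isinstance(data.get("turn"), int):
        raise ValueError("save file has no valid turn counter")
    if not isinstance(data.get("inventory"), list):
        raise ValueError("save file has no valid inventory")
    return data
```

The trade-off is extra work: pickle restores whole object graphs for free, while this approach makes you write the conversion code, which is one more item on the list above.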

There are plenty of other things that need to be done, and done well; that list is just off the top of my head. All of this is stuff that would be done for you, or at least set up and require minimal fleshing-out effort, if you were using Adventuron or Inform or ScottKit or Dialog or TADS. All of it is stuff that you will have to write from scratch and debug and test on your own if you write in Python. Most of it will need to be modified at some point along the way if you write it yourself, because as you write your story, you’ll realize you want it to do things you didn’t originally build the underlying system to do. And there are always edge cases you won’t think of when you first write the code, generating unexpected bugs.

Which is not to say that you shouldn’t write parser IF in Python. It is to say that it’s probably a lot more work than you’re anticipating. The constant temptation is to just say “screw it, good enough.” But doing that chips away at the player experience, and players notice when you’re half-assing the game you’re writing. What I would like to assure you, having started to go down that path and put a substantial amount of time into following it to half-write a large work, is: learning a whole new domain-specific parser IF language is far less effort and takes far less time than doing a good job of building a whole underlying world-simulation system in Python.

So, to my mind, there are really only about two and a half really good reasons to write parser IF in Python:

  • To learn Python. (You will learn Python really well if you do this. You will think a lot about data ontology, and develop a complex class hierarchy, and do a lot of debugging. You will solve complex problems and think about the implications of designing a complex system. I myself wrote my first-ever Python decorators and metaclasses for my half-finished Zombie Apocalypse: A Love Story. I still think it’s a good project that I’d like to finish some day. But that won’t happen before I finish this Inform game I’m working on, which is going much much much faster.) Remember, though, that this is a benefit for you, not for your players. Your players won’t be willing to overlook shoddy craftsmanship because you learned something along the way. Writing a good game that’s good for players involves prioritizing the player’s experience, not asking them to take one for the team.
  • The game you want to write wants to do things that can be done in Python but absolutely cannot be done in a domain-specific IF language.
  • Because you’re incredibly stubborn, you insist on doing this, and you’re willing to do it as well as it would be done in a domain-specific IF language. (I still think this is a half-reason.)

Part 2: Talkin’ ‘bout Data Ontology but Keepin’ it Funky

All that being said, though, let’s dive in and start by talking about data ontology.

The traditional way for homebrew parser IF to function was that objects were illusions presented by the underlying system; that there was no brass lantern represented as an object in the underlying memory banks, but rather just an entry in several tables or arrays with a common ID number. So it might be that the ID number for the brass lantern is, say, 22, and when the player types EXAMINE BRASS LANTERN, the parser seizes on the description BRASS LANTERN, then looks it up to discover that BRASS LANTERN corresponds to the object with ID# 22, and then looks at the twenty-second entry in the array of descriptions, and prints something along the lines of “It’s a small lantern made of brass.” If you need to know something else about the lantern, you look in another table or array, or find the information somewhere else in the model world: if you want to know where the brass lantern is, the system might look through every room, then see if it is in any containers, then if it is being held by anyone … or it might just have another array that tracks the current location of every object.
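
For comparison with what follows, here is roughly what that table-based style looks like when transliterated into Python. The tables and their contents are invented for illustration; the list index plays the role of the object ID.

```python
# Hypothetical parallel tables; the list index serves as the object ID.
NAMES = ["brass lantern", "rusty sword", "small mailbox"]
DESCRIPTIONS = ["It's a small lantern made of brass.",
                "The blade is pitted with rust.",
                "The mailbox is closed."]
LOCATIONS = ["cellar", "cellar", "west of house"]

def find_id(phrase):
    """Look a typed noun phrase up in the name table; None if unknown."""
    phrase = phrase.lower()
    return NAMES.index(phrase) if phrase in NAMES else None

def examine(phrase):
    """EXAMINE: map the phrase to an ID, then index the description table."""
    obj_id = find_id(phrase)
    if obj_id is None:
        return "You see no such thing."
    return DESCRIPTIONS[obj_id]

def objects_in(room):
    """Scan the location table for everything currently in a room."""
    return [NAMES[i] for i, place in enumerate(LOCATIONS) if place == room]
```

Nothing here "is" a lantern: the lantern exists only as the same index into three separate lists.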

All of this was necessary because '80s homebrew games were largely written in BASIC, the language that came with virtually every home computer at the time; users could be assumed to have a BASIC interpreter pre-installed, because it was wired into the computer’s hardware. But BASIC in most implementations has no data types other than numbers and strings, and arrays of those things; there was no underlying representation of “objects,” so programmers made do without one. They did that by using tables and arrays of numbers and strings.

You can write BASIC-style code in Python if you want, and of course it’s possible to represent objects in tables and arrays, but it seems to me that representing in-world objects as Python objects is a good move for a lot of reasons: it makes it conceptually simpler and lets objects themselves carry around information about what they’re capable of and what can be done to or with or by them. It also lets you leverage Python’s inheritance system to provide and override defaults. And since Python has good introspection facilities for objects, it’s possible for the parser to query the objects themselves about their characteristics, which makes parsing easier.

Every in-game noun in Zombie Apocalypse is a descendant, at some (possibly far) remove, of an abstract object called Noun. Noun provides basic information for every single noun in the game. Mostly, this means default rejection messages and default attributes.

class Noun(object, metaclass=StandardGrammar):
    """Person, place, or thing. Everything accessible to the parser is an instance
    of this class, or a subclass of it.
    """
    _points = None
    _plural = False
    _plural_in_form = False
    _touched = False
    _size = 0

    def describe(self, **kwargs):
        """Every descendant needs to implement this in a different way."""
        raise NotImplementedError("ERROR: Noun.describe() must be overridden by all descendants, but the class %s does not do so." % self.__class__)

    def look(self, **kwargs):
        """LOOK is a synonym for DESCRIBE"""
        self.describe(**kwargs)

    def look_at(self, **kwargs):
        """A synonym, in this case."""
        self.look(**kwargs)

    def examine(self, **kwargs):
        """Yet another synonym for DESCRIBE"""
        self.describe(**kwargs)

    def inspect(self, **kwargs):
        """INSPECT is a synonym for DESCRIBE"""
        self.describe(**kwargs)

    def eat(self, actor):
        """Though I have a cat who disagrees with me about this, most things are not
        edible.
        """
        if actor == globs.the_hero:
            su.printer('Sorry, {[spec]} is not edible.', self)      # Subclass objects.Food overrides this; so do some medications.
        else:
            su.printer('{[SPEC]} is quite unwilling to eat {[spec]}.', actor, self)

    def swallow(self, **kwargs):
        """A synonym for EAT."""
        self.eat(**kwargs)

    def consume(self, **kwargs):
        """A synonym for EAT."""
        self.eat(**kwargs)

    def gobble(self, **kwargs):
        """A synonym for EAT."""
        self.eat(**kwargs)

(su.printer is a utility function, the printer routine in the small_utilities module, that breaks text into appropriately sized lines and buffers it until the entire round has been run, then dumps it to the screen all at once, pausing with a “press any key for more” message occasionally. Just using Python’s print() would happily let words be split between lines and would allow more than a screenful of text to be printed during a single turn, forcing the user to scroll back. If they’re playing in a terminal emulator that allows for scrollback, of course. It also supports transcripting. Every text-printing operation in Zombie Apocalypse uses su.printer() instead of print() except for a few debugging routines and, of course, su.printer() itself, which internally uses print() to print each line.) (We’ll skip the metaclass declaration for quite a long time before coming back to it.)
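
For readers who want a feel for what such a routine involves, here is a much-reduced sketch built on the standard library’s textwrap module. It is not the su.printer code, which isn’t shown in this post; BufferedPrinter and its methods are hypothetical, it ignores the template substitutions and transcripting, and it pauses on Enter rather than on any key because single-keypress input is platform-specific.

```python
import textwrap

class BufferedPrinter:
    """Collects a turn's output, then wraps and pages it all at once."""

    def __init__(self, width=80, page_height=24):
        self.width = width
        self.page_height = page_height
        self.buffer = []

    def write(self, text):
        """Queue one paragraph; nothing reaches the screen yet."""
        self.buffer.append(text)

    def pages(self):
        """Wrap buffered paragraphs and split the lines into screen-sized pages."""
        lines = []
        for paragraph in self.buffer:
            # textwrap.wrap breaks only at whitespace, so words stay whole.
            lines.extend(textwrap.wrap(paragraph, width=self.width) or [""])
        self.buffer = []
        step = max(1, self.page_height - 1)   # leave a row for the "more" prompt
        return [lines[i:i + step] for i in range(0, len(lines), step)]

    def flush(self, pause=input):
        """Print everything queued this turn, pausing between pages."""
        pages = self.pages()
        for number, page in enumerate(pages):
            print("\n".join(page))
            if number < len(pages) - 1:
                pause("[press Enter for more]")
```

The game loop would call write() as often as it likes during a turn and flush() once at the end.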

Again, that’s just an excerpt. The actual definition of Noun is much longer, of course, because it provides default-rejection messages for everything, many of which are overridden by subclasses. By default, you can’t EAT most things; but you can define something as Food instead of Noun, and Food and its subclasses override the .eat() method. Synonyms just dispatch to the canonically named method: if the player types GOBBLE STEAK or SWALLOW STEAK, the Python attribute search finds .gobble() defined on Noun, kicks off another attribute search for .eat(), and finds .eat() defined on Food, of which the STEAK object is either an instance or an instance of a subclass. (There are currently no STEAK objects in ZA, so this is a purely hypothetical example.) At the level of an abstract base class like Noun, it makes sense to reject pretty much every action, because by default most actions should only succeed on certain subclasses. You can’t EAT a place; you can’t GO TO an object. (There’s no reason why either has to be impossible; it’s my game, after all, and certain types of fantasy writing support both ideas. But there have to be basic parameters at some point, and the Noun class sets a lot of them.)
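
Stripped of everything else, the synonym dispatch looks like the sketch below. These are simplified stand-ins for the classes, not the real definitions: the messages are returned as strings instead of going through su.printer, and Steak is as hypothetical here as it is in the paragraph above.

```python
class Noun:
    def eat(self, actor):
        """Default rejection: most things are not edible."""
        return "Sorry, that is not edible."

    def gobble(self, actor):
        """Synonym: forwards to whichever eat() the instance's class provides."""
        return self.eat(actor)

class Food(Noun):
    def eat(self, actor):
        """Food overrides the default rejection."""
        return actor + " eats it with relish."

class Steak(Food):      # hypothetical subclass, inherits everything
    pass
```

Calling Steak().gobble("Ann") finds gobble() on Noun, which calls self.eat(), and the lookup for eat() starts over from Steak and stops at Food.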

Other defaults can be overridden by subclasses in the same way. Most other descendants of Noun are going to override _size, for instance, at some point in their inheritance chain. Again, there’s no inherent need to deal with the sizes of objects in parser IF, and plenty of parser IF makes no effort to do so. But ZA does have situations where the relative sizes of objects are important, so it defines a _size attribute on the base class that everything else derives from; this means that every descendant of Noun has a _size attribute that the Python inheritance search can find, so it’s always safe to refer to an in-world thing’s _size.

There’s a pattern here that’s important to support the mechanics of the parser I’m writing: object attributes, including method names, must begin with an underscore unless they are action routines that handle a command. So _plural is a boolean flag that indicates whether a noun is plural, a collection; and _plural_in_form indicates whether a noun is a single thing that gets grammatically treated as plural, like “pants.” Neither is an action – you can’t PLURAL STEAK – and the fact that the attribute begins with an underscore signals that to the parser. By contrast, .eat() is an action-handling routine, so it doesn’t begin with an underscore; the parser, inspecting the object, knows that EAT is something that can be done to that object. (There’s nothing special or magical about the underscore at the beginning of the attribute name; it’s just a convention that the parsing code checks for. But there needs to be SOME way to distinguish whether something is an action-handling routine, and though Python lets you check whether something is callable with the callable() check, that doesn’t distinguish between routines handling in-game actions and utility routines that do things for the class other than handling actions. This adapts a common convention in Python by which “private” – actually pseudoprivate – names are signaled as such by being named with a name that begins with an underscore. It’s a rather small deformation of that convention, really.)
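
A parser can take advantage of that convention with a few lines of introspection. The following is a sketch of the idea, not the actual ZA parsing code; Thing, action_names, and handle are stand-in names and the messages are made up.

```python
class Thing:
    _size = 1                       # data attribute: starts with an underscore

    def _weigh(self):               # utility routine: starts with an underscore
        return self._size * 2

    def eat(self, actor):           # action routine: no underscore
        return "Not edible."

    def examine(self, actor):       # action routine: no underscore
        return "Nothing special."

def action_names(obj):
    """Names of public callables on obj, i.e. the verbs this object handles."""
    return sorted(
        name for name in dir(obj)
        if not name.startswith("_") and callable(getattr(obj, name))
    )

def handle(obj, verb, actor):
    """Dispatch a parsed verb to the matching method, refusing private names."""
    method = None if verb.startswith("_") else getattr(obj, verb, None)
    if not callable(method):
        return "You can't do that."
    return method(actor)
```

The startswith("_") test also filters out Python’s own double-underscore methods, which dir() would otherwise report as callable.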

Noun itself is never instantiated directly. That’s why .describe(), the canonical synonym for the LOOK/LOOK AT/EXAMINE family of synonyms, raises NotImplementedError if it’s ever called: It’s intended to force me to realize early that I’ve done something I’m not supposed to do. (It’s better to see the errors when you’re designing the program instead of allowing them to propagate.) Python has a system for formally specifying that something is an abstract base class with methods that must be overridden in the abc module in the standard library, but I avoid working with that and just manually raise errors instead for two reasons: (1) it’s much slower to use the isinstance() call to check whether descendants of abstract base classes formally declared as such are instances of another class, and Zombie Apocalypse runs such checks a fair amount; and (2) formally registering Noun as an abstract base class with the abc module requires using abc.ABCMeta as the metaclass of Noun (and therefore all of its descendants), but I’ve already got another metaclass I’m using for something else, and a class can only have one metaclass. So formally declaring Noun to be an abstract base class is out, and I just do it informally by raising errors during play when things happen that are supposed to be handled by subclasses.
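
Both halves of that reasoning can be seen in a short sketch. StandardGrammar here is an empty stand-in for the real metaclass, whose behavior hasn’t been shown yet, and Rock, Forgotten, and mixing_with_abc_fails are hypothetical names.

```python
import abc

class StandardGrammar(type):
    """Empty stand-in for the author's metaclass."""

class Noun(metaclass=StandardGrammar):
    def describe(self, actor):
        # Informal "abstract method": fails loudly the first time it is reached.
        raise NotImplementedError(
            "%s must override describe()" % type(self).__name__)

class Rock(Noun):
    def describe(self, actor):
        return "A plain grey rock."

class Forgotten(Noun):      # no describe(): the mistake surfaces on first use
    pass

def mixing_with_abc_fails():
    """Combining abc.ABC with an unrelated metaclass raises TypeError."""
    try:
        class Broken(abc.ABC, metaclass=StandardGrammar):
            pass
    except TypeError:       # Python reports a metaclass conflict
        return True
    return False
```

It is possible to get around the conflict by writing a metaclass that inherits from both abc.ABCMeta and the custom one, but that adds machinery that raising NotImplementedError by hand avoids.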

There are three primary subclasses of Noun, all of which are also mostly abstract but are occasionally instantiated directly: classes Creature, Thing (because object is already the name of a built-in type in Python), and Location. Each is subclassed repeatedly, sometimes with many steps in a descendant chain. Here is an excerpt from definitions for one abstract subclass:

class Creature(nouns.Noun):
    """A generic Creature defining default behaviors; possibly never instantiated
    directly.
    """
    _description = "an indescribable crawling Thing"
    _size = 3                   # But many descendants will override this. See nouns.Noun for documentation.
    _gender_pronoun = "its"
    _items = None
    _contained_by = None
    _hitpoints = 1
    _direction_traveling = None # if the Creature has travel plans that last beyond the current turn, those are here: a direction string
    _holding = None             # Only Humans and subclasses can use weapons, but let's make sure we explicitly track that other Creatures aren't equipped with anything.
    _last_location = None
    _relationships = None
    _responsive_after = -1      # Turn number on which this creature can execute scripts. Its primary
                                # purpose is to delay following a script until the next turn, if a
                                # Creature gets a script before its "move" comes up; this allows things
                                # to have a one- (or more-) turn delay before the script is executed.

    def describe(self, actor):
        """Routine for describing a Creature."""
        if actor is globs.the_hero:                          # The narrative is focalized through the protagonist!
            su.printer("{[SPEC]} {[verbf]} {[desc]}.", self, ('be', self), self)
        else:
            su.printer('"{[SPEC]} is {[desc]}," {[spec]} tells you{[str]}.', self, self, actor, random.choice(['', '', '', ' helpfully']))

Here, the method handling the DESCRIBE command is overridden: it has to be, otherwise EXAMINE HORSE would raise NotImplementedError (because that’s what’s defined at the base class, Noun). On the other hand – and you can’t tell this because you don’t see the entire definition of Creature here, which is about five hundred and fifty lines long, not counting the methods and attributes inherited from Noun – the method handling EAT is not overridden, because I’m happy with a Creature being non-edible by default for the purposes of the story I’m writing. Creature objects (and instances of their subclasses) also have a whole bunch of convenience methods, all of which start with underscores because they’re not verbs; these include _every_turn_trigger(), a routine that gets called every turn to give the Creature a chance to do something if it wants to; _die(), something that happens to creatures often in a world where zombies roam; _go(), which handles movement; _possessive_pronoun(), which returns the possessive pronoun that’s grammatically appropriate to use for things belonging to this creature; and plenty of others.

Creature can be instantiated directly, but it’s also subclassed over and over and over; Mammal is a subclass of Creature, and Human is a subclass of Mammal. Housecat is also a subclass of Mammal that overrides different behaviors and attributes than Human does. Human itself is subclassed a lot, largely to provide generic support for character tropes from zombie movies that I poke fun at: subclasses include Coward, Leader, Asshole, Cynic, Child. Leader gives several new capabilities; these are people who can have bands of followers that follow them around. Protagonist is a singleton that’s a subclass of Leader; it has a lot of modifications to many of its parents’ methods, including that calling _die() on the single instance of the Protagonist class runs the handling-the-end-of-the-game code.

Similarly, Location subclasses Noun to provide location-based commands, because there are times where you want the player to be able to refer to locations when talking to the parser: GO TO HOSPITAL moves the player one step closer to a known location. Location is further subclassed, sometimes just to provide different default text so I don’t have to describe similar things over and over: InteriorLocation, ExteriorLocation, OutsideBuildingLocation (if the most interesting thing to say is that you’re standing outside a building), etc. etc. etc. If a whole group of Locations has specific properties – every room inside the large mall has specific every-turn behavior, for instance – those rooms are likely to all be instances of a specific further subclass of InteriorLocation. Or if every place in a forest exhibits a specific kind of “you’re lost in the forest” behavior, those are likely to all be instances of a further subclass of ExteriorLocation.

And of course the direct subclass of Noun that’s most frequently and deeply subclassed is Thing. Some inheritance chains descending from Thing are:

  • Noun -> Thing -> Food -> SpoiledFood (rejects attempts to eat it with a “that smells gross” message by overriding the .eat() method);
  • Bed (inherits behavior from the Noun -> Thing -> Container -> Furniture -> BarricadeableFurniture chain, from the Noun -> Thing -> Supporter -> Counter chain, and from the Noun -> Thing -> Container -> Counter chain, because Counter itself inherits from both Container and Supporter: Python supports multiple inheritance, so a Bed can be both a container [you can put things in drawers under the bed] and a supporter [you can lie on top of it] as well as being a thing you can use to barricade the doors when the zombies are trying to break in);
  • Thing -> Cigarette;
  • Thing -> PrintedMatter -> Plaque.

Anyway, you get the idea. Attributes and behavior can be overridden repeatedly, at different levels, and are defined on objects themselves. (Well, on classes of objects, anyway. Python’s object model requires that objects be instances of a class, a small annoyance sometimes, and another one that’s ameliorated by languages like Inform, where you can define behaviors for a specific wine bottle and not on the WineBottle class as a whole, nor on a specific class that you have to write so you can instantiate it exactly once as BottleOfReallyExpensiveOldWine. C’est la vie: you’ve already decided to use Python, you’re bound by its rules. You could monkey-patch individual objects by attaching methods manually to them but that introduces huge problems that are not worth raising just to get a boost in conceptual purity.)

So this system groups together all of the behaviors for a specific object (or, well, class of objects, where there may be just one object in the class, and I’m going to stop talking about this distinction now) on the objects themselves, and this has real benefits in code organization: instead of writing a lot of if object == Bottle: [...] elif object == Cigarette: [...] elif object == Shotgun: [...] else: print("I don't know how to " + verb + ' a ' + objName + '!') code in verb-dispatch tables, you define action-handling behavior on the objects themselves. Leveraging Python’s object orientation means you group verb-methods and object attributes together in classes instead of spreading them out throughout the code.

I think that this has a real conceptual benefit for code-organization on large games: You make an Orange a descendant of Food, which is a descendant of Thing, which is a descendant of Noun, and it inherits all of the behaviors of all of its superclasses, most of which are rejection messages, except for those behaviors that it specifically overrides. All of the code that’s specific to the Orange class is wrapped together under the class definition, not spread throughout the code base in verb-dispatching tables, all of which have to be looked up by object ID. The parser itself knows what verbs are possible on a given object: it can examine the object’s non-underscore attributes. This helps with disambiguation, too: if the player is trying to EAT ORANGE, and there is an ORANGE JACKET in the room, the parser can introspect the objects to see which have an .eat() method that doesn’t result in a rejection.
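
A rough sketch of that disambiguation idea follows. The post doesn't say how the real parser decides that a method "doesn't result in a rejection," so this sketch assumes one simple heuristic: an object is a plausible target if its class overrides the default handler inherited from Noun. All class names, the _names attribute, and both helper functions are hypothetical.

```python
class Noun:
    _names = ()

    def eat(self, actor):
        return "Sorry, that doesn't look edible."   # rejection by default


class Food(Noun):
    def eat(self, actor):
        return "Yum."


class Orange(Food):
    _names = ("orange",)


class OrangeJacket(Noun):
    _names = ("orange", "jacket")


def accepts_verb(obj, verb):
    """Guess whether obj handles verb with something other than Noun's default."""
    handler = getattr(type(obj), verb, None)
    return handler is not None and handler is not getattr(Noun, verb, None)


def disambiguate(word, verb, in_scope):
    """Prefer matching objects that accept the verb; otherwise keep every match."""
    matches = [o for o in in_scope if word in o._names]
    preferred = [o for o in matches if accepts_verb(o, verb)]
    return preferred or matches
```

With an Orange and an OrangeJacket both in scope, disambiguate("orange", "eat", ...) keeps only the Orange. The heuristic is deliberately crude: a class like SpoiledFood, which overrides .eat() with its own rejection, would still count as "accepting" EAT, so a real parser needs some better signal.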

Part 3: Parsing, an Overview

All that being said, all you really need to do at the most basic level to get yourself a verb-noun parser when the objects themselves know what’s possible on them is to try to match what the user typed to the relevant textual description of the objects, then dispatch to the appropriate verb method. Of course, players deserve more than a simple verb-noun parser, and you’ll want to handle special cases: pronouns, addressing someone else, verbs that require an indirect object.

Here’s most of the main loop for Zombie Apocalypse:

from bin import globs

from bin.parser import parsing_help as ph
from bin.parser.grammar import *
from bin.parser import parsing_help

from bin.start import startup

from bin.util import debugging
from bin.util import simple_commands as sc

def main_loop():
    """The main loop for the game."""
    try:                        # At this level, catch any exceptions not otherwise caught and dump the text buffer before re-raising
        while True:
            try:                # At this level, catch control-key combos.
                command = su.get_input(ph.command_prompt)
                if len(command) > 0:
                    split_command = [what.lower().strip() for what in ph.tokenize(command)]
                    # First, handle commands requiring ... well, special handling.
                    if split_command[0] == 'debug':                 # Parse DEBUG commands without altering capitalization.
                        current_verb = ph.parse(command)
                    elif split_command[0] == 'save':                # SAVE and LOAD need to represent potentially case-sensitive filenames, too.
                        current_verb = 'save'
                        sc.do_save(the_command=command)
                    elif split_command[0] in ['load', 'restore']:
                        current_verb = 'load'
                        sc.do_load(the_command=command)
                    else:
                        current_verb = ph.parse(command.lower())    # Figure out what the command is and execute it. Everything not handled above is case-insensitive.
                    if current_verb not in ph.extradiegetic_verbs:
                        parsing_help.late_turn_actions(current_verb, command)
                else:
                    su.printer("{[RND]}?", ["I beg your pardon", "I'm sorry", "Sorry, what", "Hmmmm", "What was that"])
            except EOFError:                                        # Ctrl-D on Linux/macOS; Ctrl-Z followed by Enter on Windows.
                print("wait")
                sc.do_wait()
                parsing_help.late_turn_actions("wait", "wait")
            except KeyboardInterrupt:                               # If Ctrl-C is hit, just abort the current attempt to enter a command and start over.
                print()

    except Exception as e:                                          # If an unhandled error occurs ...
        su.printer("ERROR: We're about to crash. Here's the most recent exception:")
        su.flush_buffer()                                           # dump the buffer-queued text...
        import traceback
        for i in traceback.format_exc().split('\n'):                # Add the traceback to the transcript ...
            su.debug_printer(i.rstrip(), prefix="    ", min_level=-1)
        su.flush_buffer()
        su.close_transcript()
        raise                                                       # ... and let the error propagate.


if __name__ == "__main__":
    # First, set up
    startup.opening()
    # Now, run the game.
    main_loop()

The outer try catches errors, helps to make them more intelligible, and then lets them crash the program; it’s better to find errors and then fix the underlying causes than to try to keep a broken system running evenly. But it’s better still to get the most detailed, helpful tracebacks you can.

There’s then an inner try handler that only catches control-key combos: control-C aborts the current attempt to enter a command without “taking a turn”; control-D is mapped to WAIT.

Aside from that, on every turn …

  1. Input is requested. If it’s zero-length, a randomized “Wait, what?” message is printed and nothing happens. If it’s not zero-length, processing continues.
  2. It’s split into a list of words, each of which is lowercased and has whitespace stripped off of both ends. The splitting is done by parsing_help.tokenize(), which doesn’t do much more than the .split() method on a string.
  3. The parser checks to see if it’s a debug command; if it is, debugging code is invoked and no other processing happens this turn.
  4. If the command is SAVE, LOAD, or RESTORE, the relevant file-handling code is invoked, and nothing else happens this turn.
  5. Otherwise, the lowercased command is passed to parsing_help.parse(), which tokenizes it again, performs (or delegates) the meat of the parsing, and returns the name of the verb that it decided on. The name of the verb is important because some verbs are “out of world” actions, metacommands, that don’t “take a turn”; this is determined by looking at the list of extradiegetic_verbs in the parsing_help module. If the detected verb is not in this list, then the action “takes a turn,” and also a series of “every turn actions” has a chance to occur.

The process then repeats until the player quits the game or reaches the end.

parsing_help.parse is responsible for understanding the command and executing any actions that need to be taken in response to it. If the player types EAT DONUT, it determines whether there is an object in scope that matches the description DONUT. If there isn’t, it prints a message saying, essentially, “What are you talking about? You can’t see a donut here.”

If there is an object matching the description DONUT, then it examines the object to see if it can find an action-handling method that matches EAT. If it can, it calls it, and that action-handling method does whatever needs to be done to handle the action: in this case, prints a message saying something like “Yum, the donut is tasty” and removes the donut from the model world.

If there is no action-handler defined for EAT on the Donut class or any of its ancestors, it prints a message saying something along the lines of “I don’t know how to EAT a DONUT.” (Though this will never happen: there’s an .eat() method defined on Noun, the ancestor of everything, so that will be invoked instead. This is handy if you type EAT TABLE: the rejection-by-default message defined on Noun will print “Sorry, the table doesn’t look edible.” for everything unless something lower down in the inheritance chain overrides it. So Food, a descendant of Thing, which is a descendant of Noun, has a handler for the eat() action that prints a generic message, which is sometimes good enough, and the Donut class can define a Donut-specific handler that overrides Food because you want to print a special message because Donuts are especially delicious. Similarly, you can write custom rejection messages for other classes: Horse.eat() might print “But horses are beautiful, noble creatures! You would NEVER eat one!”, whereas Human.eat() might print “The world is falling apart, but you’re not ready to resort to cannibalism yet.”)
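
The lookup described above can be sketched in a few lines. This is a toy model, not the real code: the classes are stripped to one method each, the _name attribute and the dispatch() helper are hypothetical, and the handlers return their text instead of printing it so the result is easy to inspect.

```python
class Noun:
    _name = "thing"

    def eat(self, actor):
        return "Sorry, the %s doesn't look edible." % self._name   # default rejection


class Thing(Noun):
    pass


class Table(Thing):
    _name = "table"


class Food(Thing):
    _name = "food"

    def eat(self, actor):
        return "You eat the %s." % self._name      # generic success message


class Donut(Food):
    _name = "donut"

    def eat(self, actor):
        return "Yum, the donut is tasty."          # Donut-specific override


def dispatch(verb, obj, actor="you"):
    """Find the verb method by name and call it; complain if there is none."""
    handler = getattr(obj, verb, None)
    if verb.startswith('_') or not callable(handler):
        return "I don't know how to %s a %s." % (verb.upper(), obj._name.upper())
    return handler(actor=actor)
```

dispatch("eat", Donut()) reaches the Donut override, dispatch("eat", Table()) falls all the way back to Noun's rejection, and only a verb that no ancestor defines produces the "I don't know how to" message.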

All that being said, and given that data ontology, here’s the current, mediocre version of parsing_help.parse(), which handles only some special cases that need to be handled, then passes everything else to another routine to do the real parsing:

def parse(command):
    """Parse commands entered by the user and respond to them. Note that this routine
    only handles simple situations, delegating more complex multi-part commands to
    the routine multi_parse(), above.
    """
    su.debug_printer("the command is: %s" % command, 3, prefix="PARSING: ")

    # First, split the command up and check to see if any preprocessing needs to be done.
    command_parts = tokenize(command)
    command_parts = regularize_command(command_parts)
    su.debug_printer("there are %s parts to the command" % (len(command_parts)), 3, prefix="  ")
    try:
        # Next, check to see if there's just one word in the command.
        if len(command_parts) == 1:
            if command_parts[0] == "again":
                parse(globs.command_history[-1])
            elif command_parts[0] in extradiegetic_verbs:
                extradiegetic_verbs[command_parts[0]]()
            elif command_parts[0] in snowflake._all_verbs:
                getattr(snowflake, command_parts[0])(actor=globs.the_hero)      # If we can dispatch one-word commands through this proxy, do so.
            else:
                su.printer("Sorry, I don't know how to " + command_parts[0].strip() + ".")

        # If we haven't handled it yet, pass control off to the real parsing engine.
        elif len(command_parts) > 1:
            multi_parse(command_parts)
    except ParseError as the_complaint:
        su.printer("Sorry, I couldn't understand that. %s" % the_complaint)
    except SilentParseError as the_complaint:
        if str(the_complaint):
            su.printer(str(the_complaint))
    return command_parts[0]

ParseError and SilentParseError are exceptions that are caught at this level; they can be raised anywhere down the call chain to stop processing immediately if it becomes clear that the command cannot be processed. As you might expect, ParseError necessarily prints an “I couldn’t understand that” message, whereas SilentParseError does not necessarily do so.
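
A minimal sketch of the two exceptions, assuming (the post doesn't show their definitions) that each derives directly from Exception. That assumption matters: if SilentParseError were a subclass of ParseError, the first except clause in parse() would catch it before the second one could. The run_parser(), confused(), and declined() functions are hypothetical, and text is returned where the real code prints.

```python
class ParseError(Exception):
    """The command can't be understood; the handler apologizes and adds this message."""


class SilentParseError(Exception):
    """Stop parsing; show only the message carried by the exception, if any."""


def run_parser(parse_step, command):
    """Call parse_step(command); return the text that would be shown, or None."""
    try:
        return parse_step(command)
    except ParseError as the_complaint:
        return "Sorry, I couldn't understand that. %s" % the_complaint
    except SilentParseError as the_complaint:
        return str(the_complaint) or None    # an empty message means: say nothing


def confused(command):
    """Stand-in for code deep in the call chain that gives up noisily."""
    raise ParseError("I'm confused by your use of quotation marks.")


def declined(command):
    """Stand-in for code that has already printed its own refusal."""
    raise SilentParseError
```

Raising either exception anywhere below run_parser() unwinds straight back to it, so deeply nested parsing code never has to thread an "abort" flag back up through its callers.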

So parse() is maybe inaccurately named because it does very little of the actual parsing work; it mostly handles special cases and dispatches more common commands to the longer and more complex multi_parse() routine, about which more in a minute. The only commands actually handled at this level are: (1) AGAIN, which repeats the last action (at this point, by re-parsing it, which is not ideal and will eventually have to be rewritten, but that’s for later); (2) extradiegetic (“out of world”) verbs, which are listed in a command-dispatch dictionary I’ll talk about in a minute; and (3) “snowflake” commands, those that need special handling, and which are handled by a “snowflake” command-dispatch object that I’ll also talk about in a minute. (I initially named it that years ago, using the reasoning that “every one of these situations is different, but they can all be handled by an every-situation-is-different object”; that was before American conservatism was sneeringly applying “snowflake” as a label to anyone who isn’t a terrible person, or at least when I wasn’t as cognizant of that usage. In retrospect, I might rename it for exactly that reason, but that won’t happen today. I have other things to do today.)

The extradiegetic_verbs dispatch dictionary just maps specific verbs to functions that handle them, like so:

# Extradiegetic verbs don't increment the command counter or otherwise "take a turn".
extradiegetic_verbs = {'about': sc.do_about,
                       'brief': sc.do_brief,
                       'commands': sc.do_help_commands,
                       'credits': sc.do_credits,
                       'debug': sc.do_help_debug,
                       'exit': sc.do_quit,
                       'gender': gender.set_pronoun_preference,
                       'help': sc.do_help,
                       'hint': sc.do_hint,
                       'history': sc.do_print_history,
                       'inventory': sc.do_inventory,
                       'license': sc.do_license,
                       'load': sc.do_load,
                       'ponder': sc.do_ponder,
                       'quit': sc.do_quit,
                       'restore': sc.do_load,
                       'save': sc.do_save,
                       'score': su.print_score,
                       'script': su.ob.start_transcript,
                       'verbs': sc.do_list_verbs,
                       'verbose': sc.do_verbose,
                     }

Most of these are imported from another module, simple_commands, with import simple_commands as sc. Doing extradiegetic_verbs[command_parts[0]]() just looks up the first word of the command in that dictionary and calls the function listed there.
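
Stripped down to a toy, the pattern looks like this. The handler functions and the toy_verbs dictionary are hypothetical stand-ins, not the real simple_commands routines, and they return text instead of printing it.

```python
def toy_quit():
    return "Goodbye!"


def toy_help():
    return "Try typing a verb and a noun."


# Two keys can share one handler, which is how EXIT and QUIT become synonyms.
toy_verbs = {'exit': toy_quit,
             'quit': toy_quit,
             'help': toy_help,
            }


def run_metacommand(command_parts):
    """Call the handler for the first word; return None if it isn't a metacommand."""
    handler = toy_verbs.get(command_parts[0])
    return handler() if handler else None
```

Adding a new metacommand is then a one-line change to the dictionary, and checking whether a verb "takes a turn" is just a membership test on the same dictionary.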

The “snowflake” mostly handles slightly more complex tasks, where an objectless verb needs to be translated to a verb-plus-object pair. It serves as a substitute Noun (note that it is not actually a descendant of Noun) that can be passed to the part of the parser that calls a method on the pseudo-Noun just as if it were an in-game Noun. So, for instance, here is part of its definition:

class SnowflakeDispatcher(object):
    def defecate(self, actor):
        """Refusal text."""
        sc.do_bodily_functions()

    def go(self, actor, direction_text):
        """Move in a direction."""
        actor._go(' '.join(direction_text))

    def hear(self, actor):
        """Listen for any noises."""
        sc.do_listen()

    def listen(self, actor):
        """Listen for any noises."""
        sc.do_listen()

    def look(self, actor):
        """Describe the current area."""
        sc.do_look()

    def smell(self, actor):
        """Smell the current area."""
        sc.do_smell()

    def sniff(self, actor):
        """Smell the current area."""
        sc.do_smell()

    def wait(self, actor):
        """Let a turn pass without doing anything."""
        sc.do_wait()

    def xyzzy(self, actor):
        """Refusal text."""
        sc.do_xyzzy()


snowflake = SnowflakeDispatcher()

So this is the object that transforms “go n” to the game’s internal representation of a movement action: “north” is not an in-game object, unlike in certain other development systems. (Typing GO NORTH results in calling snowflake.go(), which in turn calls ._go('north') on the Protagonist object; it does not find a north object and GO it. Directions here are strings, not Nouns. For this reason, it’s handled outside the meat of the parsing loop, as a special case, because most of the parsing loop is focused on identifying in-game objects.)
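
Here is a toy version of that hand-off, to show the shape of the call. Hero, ToyDispatcher, and handle() are hypothetical; the real Protagonist and parser do far more, and the real code prints rather than returning text.

```python
class Hero:
    """Toy actor standing in for the Protagonist."""
    def __init__(self):
        self.location = "street"

    def _go(self, direction):
        # Underscored: movement is a convenience method on the actor, not a verb.
        self.location = "%s of the street" % direction
        return "You walk %s." % direction


class ToyDispatcher:
    """Proxy that plays the part of the direct object when a command has none."""
    def go(self, actor, direction_text):
        return actor._go(' '.join(direction_text))


def handle(tokens, actor, proxy):
    """Route GO <direction> through the proxy exactly as a real Noun would be called."""
    call_parameters = {'actor': actor}
    if tokens[0] in ('go', 'move') and len(tokens) == 2:
        call_parameters['direction_text'] = tokens[1:]
        return getattr(proxy, 'go')(**call_parameters)
    return None
```

The point of the proxy is that the calling code never needs a special branch for "verbs without objects": it looks up a method by name on some object and passes keyword arguments, whether that object is a Donut or the dispatcher.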

The meat of the parsing is handled by multi_parse, which takes a tokenized list of strings and tries to match descriptions in it to objects that are in scope:

def multi_parse(command):
    """Parse a multi-part command. Pass in the tokenized command list as COMMAND.
    Some special cases are understood and handled outside of this main parsing
    logic. Currently, these are:                #FIXME: should just be extradiegetic verbs
          * SAVE, LOAD, RESTORE;                #FIXME: this whole list needs revision
          * GO, MOVE;
          * any single-word command;
          * commands that begin with the verb DEBUG;
          * and maybe other things handled below in parse(), though I try to
            remember to keep this list more or less current.
    """
    from bin.core.nouns import Noun  # Avoid a circular dependency by not importing at the top.

    # This is the new v4 parser that evolved out of the v3 parser.
    # Last commit with v1 parser had SHA-1 hash of cec1b506508d058560981b06d212746bca9e4c5b.
    # 2nd parser was too complex, never worked well, and was never committed in Git.
    # Final commit for v3 parser had SHA-1 hash of 4bbccb4fae21a109d4974ba414e35feb4f020a3f (21 July 2016)
    # First work on v4.1 parser started: 30 May 2018.
    su.debug_printer("Tokenized command is: %s" % command, 3, prefix="PARSING: ")

    # Modifiers of various kinds get shoved into this dictionary. This will become the **kwargs parameter for the call
    # to the relevant object's verb method, once we've identified the relevant object and its verb.
    # Examples of modifiers understood include:
    #   actor           ->  who's performing the action. The grammatical subject of the action. (Default: globs.the_hero). IMPLEMENTED.
    #   using           ->  what tool is used in performing the action.
    #   about           ->  a topic of conversation.
    #   dest-*          ->  broken up automatically into:
    #     dest          ->    where to? -- for movement of things
    #     prep          ->    what preposition expresses the spatial relationship?
    #   direction_text  ->  tokenized description of the user's directional phrase (for snowflake.go)
    call_parameters = dict()

    # Clean up the command by removing any words that have been identified as NEVER being meaningful to the parser.
    stripped_command = [x for x in command if x not in fluff_words]  # FIXME: don't strip fluff words inside quotes

    # First: check to see if someone is being addressed directly.
    su.debug_printer("Checking to see if anyone is being addressed directly", 3, prefix="PARSING: ")
    comma = False
    for pos, word in enumerate(stripped_command):  # Find the first word in the command that ends with a comma
        if word.endswith(','):
            comma = pos
            break
    if not isinstance(comma, bool):  # Something ends with a comma. Let's see if it's a plausible addressee.
        possible_object = stripped_command[0:1 + comma]
        possible_object[-1] = possible_object[-1].rstrip(',')  # Take the comma off
        possible_addressees = [what for what in object_list_from_description(possible_object, default_scope()) if what._is_addressable()]
        if possible_addressees:  # Someone present is in fact being addressed here.
            su.debug_printer("taking the string '%s' to indicate that someone is being directly addressed.  ... potential matches are: %s" %
                (' '.join(possible_object).upper(), possible_addressees), 3, prefix="PARSING: ")
            possible_addressee = prune_possibilities(object_description=possible_object,
                                                     current_options=possible_addressees, the_verb=None, actor=None)
            if possible_addressee:  # If there's anything left, we'll assume the Creature being addressed is the first one in the list
                gender.update_pronouns_from(possible_addressee)
                stripped_command = stripped_command[1 + comma:]     # Remove the appositive from the command before parsing any more
                su.debug_printer("determined that the person being addressed is: %s" % possible_addressee, 3, prefix="PARSING: ")
                if possible_addressee._accept_command(the_command=stripped_command, who_commands=globs.the_hero):
                    call_parameters['actor'] = possible_addressee
                else:  # If the person addressed declines to do what the Protagonist wants, stop trying to parse: it's over
                    raise SilentParseError

    if 'actor' not in call_parameters:
        call_parameters['actor'] = globs.the_hero

    # Our next goals are: (a) to find one or more direct objects in the command and map the textual representation
    # that the player typed in zir command onto one or more in-game objects; and (b) to discover what verb method the
    # player's command needs to call on those objects.
    # Subtasks of (a) include parsing prepositional phrases.

    # Some commands don't map neatly onto the [verb] [in-game object(s)] model. For these special-snowflake cases, we
    # use the proxy SnowflakeDispatcher object to act as a virtual in-game object for the player's command. In these
    # special cases, the object is manually "found" by special-casing code in the parser and allowed to serve as
    # direct-object-to-which-calls-are-dispatched, just as if it were a real in-game object. Its methods then re-route
    # to whatever other calls are necessary to execute the command. These verb methods on the virtual proxy object
    # take **kwargs parameters just like "real" objects (i.e., descendants of nouns.Noun). In particular, they should
    # expect to receive an ACTOR parameter, just like other verb methods.

    the_verb = None
    direct_objects = []

    # First, treat special cases.
    if stripped_command[0] == "debug":
        debugging.do_debug_command(stripped_command)
        return
    elif stripped_command[0] in ['go', 'move'] and len(stripped_command) == 2 and stripped_command[1] in movement_directions:
        the_verb = 'go'
        direct_objects = [snowflake]                #FIXME: we should be able to MOVE a Noun.
        call_parameters['direction_text'] = stripped_command[1:]
    elif False:  # Do whatever other special-situation processing needs to be done for other special situations.
        pass
    elif len(stripped_command) == 1 and stripped_command[0] in snowflake._all_verbs:
        the_verb = stripped_command[0]
        direct_objects = [snowflake]

    if the_verb is None:
        # All right, there are a number of assumptions about the command structure that we're parsing. ("Non-simple"
        # means commands that aren't handled by parse(), below.) These assumptions are:
        # 1. A (non-simple) command has this form:
        #       [ADDRESSEE], (VERB) (DIRECT_OBJECT) [and DIRECT_OBJECT and DIRECT_OBJECT .. ] [PREPOSITIONAL_PHRASE] [PREPOSITIONAL_PHRASE...]
        # 2. Some verbs can only take one direct object; those are listed in single_direct_object_verbs.
        # 3. Multiple direct objects have to be connected with AND. If there are more than 2, *all* have to be
        #    connected with AND. (No commas.)
        # 4. Prepositional phrases are sometimes optional, sometimes required to complete an action.
        #       - they are essentially modifiers, as in ATTACK RICK WITH SPOON or ASK STEVE ABOUT THE PUB
        #       - grammar is enforced with decorators: verb methods might be modified with, say, @MustSpecifyTool, or @CantSpecifyTopic
        #       - prepositional phrases MUST COME AFTER all direct objects.

        # We've already found any appositive leaders. What's next is the verb, or at least the first word of it.
        the_verb = stripped_command[0]          # Luckily, the simple imperative mood in English always begins with the verb

        # Check to see if there are any quoted (spoken or written) phrases in the command
        # Current limitations:
        #   * double quotes only
        #   * no nested quotes, period
        #   * ABSOLUTELY NO FUNNY BUSINESS WITH QUOTES
        if '"' in ' '.join(stripped_command[1:]):
            found_a_phrase = True           # Pass through at least once
            while found_a_phrase and '"' in ' '.join(stripped_command[1:]):
                opening_pos, closing_pos, found_a_phrase = False, False, False
                for i, word in enumerate(stripped_command[1:]):     # Skip the verb
                    if not opening_pos:             # We're looking for an opening quote
                        if word.startswith('"'):
                            opening_pos = i + 1     # Remember, we skipped the verb; we want this to be the position relative to the command as a whole
                    if opening_pos:                           # We're looking for a closing quote.
                        if word.endswith('"'):
                            closing_pos, found_a_phrase = i + 1, True
                            break

                if not found_a_phrase: raise ParseError("I'm confused by your use of quotation marks.")

                the_phrase = stripped_command[opening_pos:closing_pos+1]
                direct_objects += [objects.Phrase(_the_phrase=' '.join(the_phrase))]
                # Splice the quoted phrase out of the command, keeping whatever comes before and after it.
                # (Slicing past the end of a list just yields an empty list, so no bounds checks are needed.)
                stripped_command = stripped_command[:opening_pos] + stripped_command[closing_pos + 1:]

        # Now, check to see if the verb might be a phrasal verb.
        # FIXME: practically speaking, this method only works for two-word (not three-or-more-word) phrasal verbs.
        endings = possible_phrasal_endings(the_verb)
        if endings:
            su.debug_printer("possible phrasal verb endings %s detected for verb %s." % (', '.join([shlex.quote(e) for e in endings]), the_verb), 3, prefix="PARSING: ")
            for e in endings:
                su.debug_printer("Checking for potential ending %s in command. " % e, 3, prefix="  ")
                if e in stripped_command[1:]:               # We found the other part of our separable multipart verb
                    the_verb = "%s_%s" % (the_verb, e)      # Munge the verb name to match the method name for the object
                    stripped_command[0] = the_verb
                    stripped_command.remove(e)
                    su.debug_printer("Found it, and rearranged command to %s." % ' '.join(stripped_command).upper(), 3, prefix="  ")
                    break
                else: su.debug_printer("...not found.", 3, prefix="  ")

        # Now that we know what the verb is, figure out what the potential objects we might be trying to match for that verb are.
        search_path = get_scope(verb=the_verb, actor=call_parameters['actor'] if ('actor' in call_parameters) else None)

        su.debug_printer("  PARSING: After pruning, the stripped, verb-normalized command is: %s." % (str(stripped_command)), 3)
        su.debug_printer("    The search path for objects is %s." % str(search_path), 3)
        su.debug_printer("    The verb detected is '%s'." % the_verb, 3)

        objects_text = stripped_command[1:]     # OK, let's deal with everything after the verb.

        # Check to see if there are any prepositional phrases in the command.
        su.debug_printer("PARSING: looking for prepositional phrases.", 3)
        su.debug_printer("  Text remaining to parse: %s" % objects_text, 3)

        preposition_locations = [][:]                           # First, let's pull out all the prepositional phrases
        for i, w in enumerate(objects_text):                    # We start by finding the list indexes where each phrase starts.
            if w in prepositions:                               # Note that prepositional phrases have to come at the end of the sentence.
                preposition_locations += [i]

        if preposition_locations:
            su.debug_printer("  prepositions found:", 3)
            for i in preposition_locations: su.debug_printer("\t%d\t->\t%s" % (i, objects_text[i]), 3)
        else:
            su.debug_printer("  no prepositions found.", 3)

        for i, w in enumerate(preposition_locations):           # OK, examine and process each phrase individually.
            if i < (len(preposition_locations) - 1):
                the_phrase = objects_text[w:preposition_locations[i + 1]]
            else:
                the_phrase = objects_text[preposition_locations[i]:]
            su.debug_printer("PARSING: examining the prepositional phrase '%s'" % ' '.join(the_phrase), 3)
            key, what = prepositions[the_phrase[0]](command=the_phrase, actor=call_parameters['actor'])
            if key.strip().startswith('dest-'):                 # Special handling here.
                call_parameters['dest'] = what
                call_parameters['prep'] = key.strip()[len('dest-'):]
            else:
                call_parameters[key] = what
            if isinstance(what, Noun):            # Only try to update pronouns based on parser-accessible in-game objects.
                gender.update_pronouns_from(what)

        if preposition_locations:
            objects_text = objects_text[: preposition_locations[0]]    # Strip off the prepositional phrases at the end.

        # OK. Now, split the objects_text into a list of chunks (each of which is a list of words), separated by 'and',
        # dropping the 'and' each time. Each of these chunks will be parsed as a direct object.
        # This may result in just one part. In fact, it usually will. That's OK.
        direct_object_phrases = su.split_list(objects_text, 'and')
        su.debug_printer("direct objects split into: %s" % direct_object_phrases, 3, prefix="PARSING: ")
        for which_phrase in direct_object_phrases:
            if which_phrase:        # Don't try to match empty lists and other non-truthy objects
                su.debug_printer("about to find an object matching the description %s" % ' '.join(which_phrase).upper(), 3)
                the_object = object_from_description(which_phrase if (isinstance(which_phrase, list)) else [which_phrase], the_verb, search_path, actor=call_parameters['actor'])
                direct_objects.append(the_object)
                gender.update_pronouns_from(the_object)
                #FIXME: we have previously updated from prepositions, which come later. This is potentially confusing.

    if the_verb in single_direct_object_verbs:
        su.debug_printer("Note that %s is a verb that can take only one direct object." % the_verb, 3, prefix="         ")
        if len(direct_objects) > 1:
            raise ParseError('You can only %s one thing at a time.' % the_verb)

    # Now we've got a verb and a list of things to do it to. Let's do it to them.
    if not direct_objects:
        raise ParseError("I couldn't figure out what you're trying to interact with.")      #FIXME: find a missing dir. obj.
    for the_item in direct_objects:
        getattr(the_item, the_verb)(**call_parameters)   # Call the THE_VERB() method of each THE_ITEM with CALL_PARAMETERS.

Part 4: Parsing Notes, and Enforcing Grammar

I would like to say again that, although this handles a lot, it is still not a great parser. Inform, TADS, et al. give you a great parser. These ~220 heavily commented lines of Python give you a mediocre-to-reasonably-decent parser.

I’m omitting some helper functions here: regularize_command moves some elements of possible commands into canonical forms, such as changing “n” to ['go', 'north'], and making changes such as “eastward” to “east”, so that the main parser code only has to deal with canonical forms. possible_phrasal_endings helps to work with phrasal verbs in English, so that if the first part of the verb is “look” it knows that it needs to be aware that [‘in’, ‘at’] are possible parts of the verb itself and not necessarily prepositions, because prepositions also are used to separate direct from indirect objects, and this is a headache. (This parser assumes that direct and indirect objects are always separated by prepositions, and indirect objects must be preceded by a preposition. How is this enforced at the parsing level? That’s a great question and I’m glad you asked it and I’m not going to talk about it until later. But the short answer is “decorators.”) So if the player types PUT APPLE ON BED, then APPLE is the direct object, BED is the indirect object, and ON is a preposition specifying their desired relationship. The parser will barf if you type PUT APPLE BED instead of PUT APPLE ON BED because there is no preposition between the direct and indirect objects, so it looks for an object in scope for which APPLE BED is a fair description; and even if it finds one, it will complain that it doesn’t know what to put the apple bed on, because you didn’t specify an indirect object. How could you have? You didn’t use a preposition. Duh.
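As a rough illustration of the kind of canonicalization described here, a minimal sketch follows. It is not the omitted helper: the two lookup tables and the function body are invented for this example, and only the function name and the two rewrites mentioned above ("n" to ['go', 'north'], "eastward" to "east") come from the description.

```python
# Hypothetical lookup tables; the real helper's data is not shown in this post.
DIRECTION_ABBREVIATIONS = {'n': 'north', 's': 'south', 'e': 'east', 'w': 'west'}
SYNONYMS = {'northward': 'north', 'southward': 'south',
            'eastward': 'east', 'westward': 'west'}

def regularize_command(words):
    """Return a new token list with abbreviations and variants canonicalized."""
    # Lowercase every token, then swap known variants for their canonical form.
    words = [SYNONYMS.get(w.lower(), w.lower()) for w in words]
    # A bare direction abbreviation is shorthand for GO <DIRECTION>.
    if len(words) == 1 and words[0] in DIRECTION_ABBREVIATIONS:
        return ['go', DIRECTION_ABBREVIATIONS[words[0]]]
    return words

print(regularize_command(['n']))                  # ['go', 'north']
print(regularize_command(['Walk', 'eastward']))   # ['walk', 'east']
```

The point of doing this up front is that everything downstream only ever sees one spelling of each idea. A real version would presumably need to leave quoted phrases alone rather than lowercasing them.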

But here’s the basic strategy it takes: Given a list of strings (the tokenized command), it tries to identify all of the relevant bits of data that might have to be passed to an action-processing routine on an object. The direct object of the command itself is the object being operated on; and the verb is the name of the (non-underscore-named) method that’s being called on that object. There are a few other parameters that might be necessary, depending on the action that is being performed and what the user typed; these include the actor (who’s performing the action: usually, but not always, the PC); using (a tool that’s being used to perform the action: SHOOT TROLL WITH ARROW puts the arrow, if one is in scope, in the using parameter); about, a string or in-game object specifying a topic of conversation; dest and prep, a pair of parameters that specify the thing you’re putting the direct object on/in/under/etc., and which of those relationships it is; a few others. As parsing progresses, these parameters are assembled into a dictionary that’s passed to the action-dispatch method when the action method is called on a particular in-game object.

Processing runs through, roughly, this series of steps:

  1. Normalization of command forms, as touched on briefly above, but more exhaustively and boringly.
  2. Checking to see if someone is being directly addressed. This happens by looking to see if any words in the command end with a comma. If any does, the bit of the command before a comma is treated as a description of someone visible, and the parser tries to match that description to someone addressable. Alternate forms of order-giving are not supported, so the parser currently supports RICK, SHAVE YOUR STUPID-LOOKING BEARD but not TELL SHERIFF RICK TO SHAVE HIS BEARD. C’est la vie. The game also implements ASK/TELL in the form ASK RICK ABOUT STUPID BEARD but not RICK, WHAT’S UP WITH YOUR BEARD, because a comma always means that a command is being issued. C’est la vie.
  3. If no one is being addressed, the protagonist is assumed to be the one the order is being given to.
  4. Special processing is used for debugging commands, movement commands, and anything handled by the snowflake.
  5. If we still haven’t figured out what the verb is by this point, which is the case most of the time, we take a look at the first (remaining) word in the command (after dropping anything before a comma, which would be the name of whomever we’re addressing). In English, conveniently, commands begin with the verb, and if the verb is only one word, that’s the verb! Yay!
  6. We check for uses of quotation marks, which indicate a literal string, usually for talking or writing; the game supports SPRAY “YOU SUCK” ON WALL, and SAY “HELLO” TO BARBARA. Quotation-mark handling is awkward and difficult and the parser currently only supports double quotes with no interior single quotes, and no nested quotes, and ABSOLUTELY NO FUNNY BUSINESS WITH QUOTES. If there is a quoted phrase, there need to be both opening and closing quotes. We construct a Phrase, an obscure descendant of Noun that contains a list of words that are treated literally by other code that knows how to handle Phrases. That Phrase is the direct object.
  7. On the other hand, commands are sometimes made of verbs that, practically, have more than one word as part of the verb: LOOK IN as a synonym for search. Irritatingly, English, because it’s a Germanic language, doesn’t always keep the second and subsequent parts of the verb right next to the first parts. Even worse, whether a phrasal verb requires that the parts be kept together or split apart is a regional and/or age and/or class difference for many verbs: some people find TURN BLENDER OFF to be the natural formulation, whereas others prefer TURN OFF BLENDER. (Zach de la Rocha encourages you to “turn on the radio,” whereas Lisa Loeb recounts how, habitually, “I turn the radio on, I turn the radio up.”) We really want to support both. So we check to see if the first part of the verb might possibly be a phrasal verb: if the first word in the verb is LOOK, we check to see if there’s an IN or AT later in the command, and if there is, we rearrange the command so it puts the two parts of the phrasal verb together. This might produce a command that a native speaker wouldn’t think to produce naturally, but we don’t display the intermediate stages of parsing to the user anyway, so it doesn’t matter. (Side note: action-dispatch methods on Noun descendants use underscores to represent the spaces between words in a phrasal verb, and the parsing routines massage this as a late step, so Noun and descendants have a .look_in() method that’s a synonym for .search(), and so forth.)
  8. Once we’ve pulled out any prepositions that are actually part of phrasal verbs, we locate all other prepositions in the parts of the command that we haven’t yet parsed. Then we break the command into the phrases in between the prepositions.
  9. Then, for each of those phrases in between the prepositions, we track what the preposition preceding it is, and match each noun phrase to an object in scope if possible. The noun phrase that’s not preceded by a preposition is the direct object, the other noun phrases are indirect objects, and the preposition used indicates what their role in the sentence is. So for the phrase PUT THE BIRTHDAY CAKE ON THE BED WITH THE SHOVEL, we wind up tracking that the direct object is BIRTHDAY CAKE, the verb is PUT, and the indirect objects are {'using': shovel, 'on': bed}. Identifying the objects in scope is handled by another helper function, get_scope(), which is complex and not shown here.
  10. If any noun phrase can’t be matched to an object in scope, the whole parsing process fails and a message along the lines of YOU CAN’T SEE ANY BIRTHDAY CAKE HERE or THE TINY SHOVEL MADE OF PINK PLASTIC IS TOO SMALL TO SUPPORT THE BIRTHDAY CAKE is printed. Currently, there is no handling of partial correction of erroneous commands, which you get for free with Inform et al.; the player has to retype the whole command on the next turn. (Fixing this is going to require adding a whole other level of abstraction to the parser.)
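Two of the steps above (spotting an addressee by the comma, and carving the rest of the command up at the prepositions) can be sketched in isolation. This is a simplified stand-alone illustration, not the parser's own code: PREPOSITIONS, split_addressee and split_on_prepositions are names made up for this example, and the real parser maps each preposition to a handler function rather than just collecting word lists.

```python
PREPOSITIONS = {'on', 'in', 'under', 'with', 'to', 'about', 'from'}  # hypothetical subset

def split_addressee(words):
    """Split ['rick,', 'shave', 'beard'] into (['rick'], ['shave', 'beard'])."""
    for i, word in enumerate(words):
        if word.endswith(','):
            addressee = words[:i] + [word.rstrip(',')]
            return addressee, words[i + 1:]
    return [], words    # nobody addressed: the protagonist is the actor

def split_on_prepositions(words):
    """Return (direct_object_words, {preposition: noun_phrase_words})."""
    starts = [i for i, w in enumerate(words) if w in PREPOSITIONS]
    if not starts:
        return words, {}
    phrases = {}
    for n, start in enumerate(starts):
        # Each phrase runs from just after its preposition to the next preposition.
        end = starts[n + 1] if n + 1 < len(starts) else len(words)
        phrases[words[start]] = words[start + 1:end]
    return words[:starts[0]], phrases

addressee, rest = split_addressee('mom, put cake on bed with shovel'.split())
verb, after_verb = rest[0], rest[1:]
print(addressee, verb, split_on_prepositions(after_verb))
# ['mom'] put (['cake'], {'on': ['bed'], 'with': ['shovel']})
```

Note that this only produces lists of words. Turning ['bed'] into the actual bed object still requires matching against whatever is in scope, which is the hard part.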

Some subtleties have been glossed over here. One is that multiple direct (but not indirect) objects are supported provided that the word AND occurs between them. (Inform et al. support comma-separated lists of direct objects; this parser doesn’t. Commas always always always mean someone is being given a command.) There is also some support for pronouns; this is provided by massaging during preprocessing. I also haven’t talked about disambiguation at all, which is a whole other kettle of fish.
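The AND-splitting relies on a small list utility (called as su.split_list in the parser listing) whose body isn't shown. A plausible minimal version, written here as an assumption about its behavior rather than a copy of it, might look like this:

```python
def split_list(items, separator):
    """Split a list into sublists at each occurrence of separator, dropping the separator."""
    chunks, current = [], []
    for item in items:
        if item == separator:
            chunks.append(current)
            current = []
        else:
            current.append(item)
    chunks.append(current)
    return chunks

print(split_list(['me', 'and', 'rick'], 'and'))   # [['me'], ['rick']]
print(split_list(['pizza'], 'and'))               # [['pizza']]
print(split_list(['and', 'rick'], 'and'))         # [[], ['rick']]
```

The last case shows why the calling code skips empty chunks before trying to match them to objects: a stray leading or doubled AND produces an empty list.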

Once the command has been broken down and all of its parts have been identified, we have a list of direct objects, a verb, and a dictionary of indirect objects. The actual work of dispatching is done by iterating over the list of direct objects, looking up the relevant method for them using getattr(), and expanding the list of keywords into a parameter list by using double-star expansion. Doing this looks like:

    for the_item in direct_objects:
        getattr(the_item, the_verb)(**call_parameters)

The command PUT THE BIRTHDAY CAKE ON THE BED WITH THE SHOVEL gets translated into a direct_objects list holding one object, the birthday cake object, presumably an instance of Food or a subclass; the verb becomes the .put() method of that object, and the call that’s actually made becomes [the birthday cake object].put(using=[the shovel object], on=[the bed object]), because double-star expansion translates a dictionary into keyword parameters. Then it’s the job of the Food class’s .put() method to handle those parameters and that object. (Or perhaps a higher-level superclass: there’s no particular reason why Food would need to override the .put() method from the Thing class that I can think of offhand.)
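Here is a toy, self-contained version of that dispatch that can be run on its own. The Thing and Food classes are bare-bones stand-ins invented for the example, and the keyword names follow the dest/prep pair described earlier rather than any particular real method signature.

```python
class Thing:
    def __init__(self, name):
        self.name = name

    def put(self, actor, dest, prep, using=None):
        tool = " with the %s" % using.name if using else ""
        return "%s puts the %s %s the %s%s." % (actor.name, self.name, prep, dest.name, tool)

class Food(Thing):
    pass    # no override: .put() is inherited from Thing

player, cake = Thing('player'), Food('birthday cake')
bed, shovel = Thing('bed'), Thing('shovel')

# What the parser would have assembled by the time it dispatches:
the_verb = 'put'
direct_objects = [cake]
call_parameters = {'actor': player, 'dest': bed, 'prep': 'on', 'using': shovel}

for the_item in direct_objects:
    # Equivalent to cake.put(actor=player, dest=bed, prep='on', using=shovel)
    print(getattr(the_item, the_verb)(**call_parameters))
# player puts the birthday cake on the bed with the shovel.
```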

So this works! Sort of. Mostly. For many common cases. Provided the player understands the system deeply and is cooperative and competent. It gets you handling of things like SHOOT RICK WITH THE SHOTGUN, finding the Rick object and the shotgun object, if they’re in scope, and calling the Rick object’s .shoot() method and passing the shotgun object to the Rick object’s .shoot() method as a using= parameter. Or you can type LILLIAN, PLAY PIANO and, assuming both Lillian and a piano are visible, it will invoke that piano’s .play() method while passing the Lillian object to the method’s actor= parameter. Or you can type MOM, SLAP ME AND RICK and, if your mother and Rick are both visible, and if the persuasion approval rules (which we haven’t discussed) succeed, she’ll go ahead and slap first you, and then Rick. Similarly, you can DRAG COUCH TO DOOR to construct a barricade, SNIFF THE PIZZA, etc. etc. etc. and, as long as a handler for the appropriate action has been written for the relevant object’s class, or for one of its superclasses, there’ll be a message printed in response. Perhaps it will be “Yup, that smells just like a pizza,” if one of the generic, default handlers way up the chain winds up getting invoked; and if I want to write a more specific response for the pizza object, I can write a custom response by defining a .smell() method on the Pizza class that overrides the default to print something like “Of all of the many smells God created, surely pizza is one of the finest.” It’s easy to customize items by attaching overriding methods to the class that the object in question is a direct instance of, and it’s easy to override a message for a whole class of items by writing a method for a common superclass that prints a custom message for all descendant classes.
Because Python supports multiple inheritance, it’s even easy to write mix-in classes that override certain types of behavior for certain subclasses that are not direct descendants of each other in their primary line of descent, provided you understand the complex nitty-gritty details of how inheritance works in Python. (This is another case of “I told you you’d learn a lot about Python by doing this.”) You can even monkey-patch an object to change its class in the middle of a run, if you really need to; this is very much like trying to change an oil filter while your engine is running, but it’s possible and comes in handy if, for instance, an NPC is bitten and you want them to now be a member of class ZombieHuman instead of Cynic. Making substitutions like that can change whole swaths of object behavior all at once: the previously human character now gets all of the default Zombie behaviors.
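To make the mix-in and class-swapping ideas concrete, here is a small runnable sketch. The class names Cynic and ZombieHuman come from the example above; Person, Undead, and all of the method bodies are invented for illustration and are much simpler than anything in the actual game.

```python
class Person:
    def __init__(self, name):
        self.name = name
    def greet(self, actor):
        return "%s nods at you." % self.name
    def smell(self, actor):
        return "%s smells like a person." % self.name

class Cynic(Person):
    def greet(self, actor):
        return "%s rolls their eyes." % self.name

class Undead:
    """Mix-in: listed first among the bases, so its methods win in the MRO."""
    def greet(self, actor):
        return "%s groans and lurches toward you." % self.name

class ZombieHuman(Undead, Person):
    pass

npc = Cynic('Rick')
print(npc.greet(actor=None))      # Rick rolls their eyes.
npc.__class__ = ZombieHuman       # the bite: swap the instance's class mid-run
print(npc.greet(actor=None))      # Rick groans and lurches toward you.
print(npc.smell(actor=None))      # Rick smells like a person.
print([c.__name__ for c in ZombieHuman.__mro__])
# ['ZombieHuman', 'Undead', 'Person', 'object']
```

The instance keeps its own attributes (here, name) across the swap; only method lookup changes. That works smoothly for plain classes like these, but anything set up in one class's __init__ and expected by the other is not created for you.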

The burning question then becomes “how do you enforce basic requirements for the grammar of various commands?” Because you want to both avoid nonsensical commands (EAT BIRTHDAY CAKE USING SHOTGUN) and commands that contain a verb and direct object, but not enough information to fully specify the action required (BARRICADE DOOR isn’t good enough; we need to BARRICADE DOOR WITH SOFA – remember that one of the ways in which our parser is suboptimal compared to Inform or ZIL is that it doesn’t do partial parsing at all; it just prints an error message saying what’s wrong and tells the user to try typing the necessary command again, only right this time). The underlying problem is one of interfaces to methods of objects, which specify what they need in their declarations. For instance, the declaration for Door.barricade() looks roughly like this:

class Door(Thing):
    def barricade(self, actor, using):
        [...]

So if the player just types BARRICADE DOOR, the parser will happily fill in the self parameter for the door, disambiguate if there’s more than one door in scope (“Do you mean the door to the living room or the bathroom door?”), and it will fill in the actor parameter with the object referring to the protagonist, because it always already does that anyway; these are common enough tasks that the parser accounts for them. But it can’t fill in the using parameter, because that’s where I’ve drawn the line: parameters other than the direct object and the actor need to be specified manually, or the parser would balloon out of control. My personal boundary on this issue was “the direct parsing routines only supply values that must be supplied in EVERY action; it doesn’t try to figure out values for parameters that are specified in indirect objects or other less-common grammar situations.”

Your mileage may vary and your priorities may be different, of course. Perhaps you want to bloat the multi_parse() routine to a thousand lines: it’s not an inherently evil way to approach the problem, just one that introduces new complexities of its own. For my own part, I prefer to handle the “you must supply additional parameters in your command” issue in other, smaller chunks of code outside of multi_parse(). Because if we’re not splitting off code that validates that all necessary information has been supplied into smaller chunks of code that we can write elsewhere, we’re back to having big tables of verbs and what each verb requires: OPEN doesn’t require anything special, but BARRICADE requires a using parameter; for UNLOCK, you can optionally specify the relevant key with using, and …

This is all kind of ugly; worse, it’s hard to keep the tables of which verbs require which parameters in sync with the code that handles the actions themselves, and letting them get out of sync introduces crashes and other errors. There’s also the issue that the same English word can mean different things and be handled in different ways depending on context: SCREW BOLT WITH SCREWDRIVER requires a using parameter, but the Person.screw() method doesn’t (and just prints a rejection message. It’s not that kind of game. Not that there’s anything wrong with that). This is currently handled rather easily simply by defining the relevant action-handling methods differently on the different object classes: Bolt.screw requires a using parameter, Person.screw() does not. But this would get complex if we had to draw up a grammar table for each verb: we’d have to account for different situations in that table itself. Boo! Plus, having to maintain a separate set of big tables is one of the things that we’ve successfully avoided doing so far, and it would be a shame to have to give up on that now.

So far we’ve been successful at keeping all of the information about handling actions at the level of the action-handling code itself and relying on Python’s introspection facilities to deal with the rest. (Well, ALMOST all. There’s a list of verbs that might be phrasal, and a list of verbs that aren’t allowed multiple direct objects. These are small enough lists that eliminating them would be more work than maintaining them, though.) What we really want is to keep it that way, not to throw our hands up and start writing a big set of tables that are inevitably going to get out of sync from time to time during development.
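As an illustration of what "relying on introspection" can mean in practice, the standard library's inspect module can read a method's requirements straight off its declaration, so there is no separate table to keep in sync. The helper below, required_parameters, is a made-up name for this sketch, and the two classes are stripped-down stand-ins.

```python
import inspect

class Bolt:
    def screw(self, actor, using):
        return "tightened"

class Person:
    def screw(self, actor):
        return "It's not that kind of game."

def required_parameters(method):
    """Names of parameters (other than self) that have no default value."""
    return [name for name, p in inspect.signature(method).parameters.items()
            if name != 'self'
            and p.default is inspect.Parameter.empty
            and p.kind in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)]

print(required_parameters(Bolt.screw))     # ['actor', 'using']
print(required_parameters(Person.screw))   # ['actor']
```

The same verb name yields different requirements depending on which class's method is inspected, which is exactly the per-object context a single grammar table would struggle to express.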

Here’s the problem more specifically: as the system has been described up to this point, if a user just types BARRICADE DOOR, the program will crash, because the .barricade() method on the door class requires three arguments: the Door instance (which is mapped to the self parameter because that’s how we’ve set up our system, and because the first parameter to a method call is always the self instance in Python); and the actor parameter, which is supplied to every action method by the parser itself; but nothing supplies the using= parameter to the Door's .barricade() method if the user doesn’t remember to add USING KITCHEN TABLE to BARRICADE DOOR, so it won’t be filled in in the keyword-arguments dictionary that gets expanded with the double-star notation in getattr(the_item, the_verb)(**call_parameters), and the game will crash with a message like TypeError: barricade() missing 1 required positional argument: 'using'. I think that we can all probably agree that a game that crashes because the user was insufficiently specific in typing their command is not a very well-written game.
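The failure is easy to reproduce in a few lines. This stand-alone snippet uses a minimal Door class and a plain string for the actor, both simplifications for the sake of the demonstration; the exact wording of the error message varies slightly between Python versions.

```python
class Door:
    def barricade(self, actor, using):
        return "barricaded"

# What the parser has assembled for BARRICADE DOOR, with no WITH phrase:
call_parameters = {'actor': 'player'}

try:
    getattr(Door(), 'barricade')(**call_parameters)
except TypeError as err:
    message = str(err)
    print(message)    # complains that the 'using' argument is missing
```

Without the try/except, which is only here to keep the demonstration from dying, that TypeError would propagate all the way up and end the game.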

So the question then becomes “How do we add logic that steps in right before the method is called on the direct object and makes sure that everything is specified that should be specified, before we get to the point where we actually try to call the direct object’s method and the game crashes?” This is a classic validate-the-data-before-passing-it-to-the-function problem, and Python has a really good advanced feature that can do exactly that: decorators.

Decorators are a Python feature that allows functions or class methods (classes, too, though that’s not relevant here) to have additional logic added to them without having to re-write the function or class method itself. You can use them for a lot of things, including validation, and they’re a way to separate out the validation logic from the method that’s being validated. There’s no reason why you couldn’t just put the validation logic at the top of every single method that needs to have it; but applying a decorator only takes one line of code, and, once the decorator’s been written, applying it is easier, and easier to remember to do, than inserting several lines of boilerplate code at the beginning of each method that needs to be validated. (It also makes the syntax clearer, as it turns out.) They also help to make it clear at a glance which rules apply to which object methods.

Decorators are (usually, and most naturally) applied with the @ sign, above a method. So the declaration for the .barricade() method on the Door object looks like this:

class Door(Thing):

    [ ... more definitions applying to Door ...]

    @MustSpecifyWith
    def barricade(self, actor, using):
        [ ... the code handling the BARRICADE command ...]

    [ ... more definitions applying to Door ...]

This applies a decorator called MustSpecifyWith to the Door's .barricade() method. MustSpecifyWith wraps the method in a function that runs before the .barricade() method, calls the .barricade() method (or any other method or function that it decorates), then continues running after the .barricade() method is done. This means that it can intervene to manage what’s passed to the .barricade() method; can avoid calling it entirely; can massage the results passed back from the method, if it wants to.

What the MustSpecifyWith function does is simple: it checks to see if the using parameter was passed in to the .barricade() method. If it was, it goes ahead and calls the .barricade() method. If using wasn’t passed in, instead of calling .barricade() with too few parameters and letting the game crash, it raises ParseError. That ParseError that it raises bubbles back up through the call stack, out of the decorator, out of multi_parse(), and back to the try: statement in the parse() method, which called multi_parse(), which called the Door.barricade() method, in which the @MustSpecifyWith decorator intervened. The try/except prints a message that boils down to “You need to say what you want to barricade the bedroom door with,” and the handler at that level lets the outer scope – the game’s main loop – know that nothing happened this turn, and the process begins again: the user is prompted for another command, the command is broken down and parsed, and if special cases aren’t handled, the whole process of trying to match noun phrases to objects in scope, determine a verb, determine if an order is being given to an NPC, determining action parameters, and dispatching to the action-handling methods of the direct object(s) that was/were located, possibly validated by decorators, happens again.

So all the @MustSpecifyWith decorator really does is step in right before the action-handling method is called and safely abort if everything necessary isn’t there, jumping out of the whole parsing and action-handling loop and explaining why the door didn’t get barricaded.
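The catch-and-report step at the top of that chain might look something like the sketch below. This is an assumption-laden miniature, not the game's parse() method: multi_parse here is a stub that simply fails the way a decorated action method would, and the output parameter is a convenience added for this example.

```python
class ParseError(Exception):
    """Stand-in for the game's ParseError."""

def multi_parse(words):
    """Hypothetical stub: raises the way a decorated action method would."""
    if words == ['barricade', 'door']:
        raise ParseError("You need to say what you want to barricade the door with.")
    return "Done."

def parse(command, output=print):
    """Return True if the turn did something, False if the command was rejected."""
    try:
        output(multi_parse(command.lower().split()))
        return True
    except ParseError as err:
        output(str(err))     # explain the problem; no game time passes
        return False

turns_taken = sum(parse(c) for c in ["BARRICADE DOOR", "BARRICADE DOOR WITH SOFA"])
print(turns_taken)    # 1
```

Because the exception unwinds everything between the decorator and parse(), none of the intermediate code needs to know that validation failed. It just never gets to finish.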

The MustSpecifyWith function is written like so:

def MustSpecifyWith(func):
    def when_called(*args, **kwargs):
        if 'using' not in kwargs:
            raise ParseError("Please try again, and specify what you want to %s with." % func.__name__)
        if isinstance(kwargs['using'], nope.Nope):
            raise SilentParseError(su._decode("It doesn't look like {[spec]} will help {[spec]} to {[str]}.",
                                              kwargs['using'], kwargs['actor'], func.__name__))
        return func(*args, **kwargs)
    return when_called

This is a bit of a simplification to illustrate the point, and it omits some of the implementation details that help to support introspection by other code. It’s a closure-based decorator, one of the basic Python decorator patterns: when it’s executed, after the method it’s decorating has been defined, the function when_called, defined inside the decorator itself, is substituted for the method being decorated, and the function func is stored as the function that’s going to be called. When something tries to call the original method, it instead (unknowingly) winds up calling the when_called function that was returned by the MustSpecifyWith decorator. That when_called function executes, checking that the using parameter was included in the kwargs keyword-arguments dictionary. Once it’s verified that the keyword arguments include a using parameter, it calls the function that was stored in the func variable when the decorator was applied. That func variable holds a reference to the original method that was decorated, which is the .barricade() method of the Door class. The upshot is that:

  1. When the .barricade() method is defined, the decorator replaces it with the decorator’s own when_called() function, and it stores a reference to the original function being replaced – the .barricade() method of the Door class – in the func variable that when_called() closes over.
  2. When other code tries to call the .barricade() method, it instead winds up calling that when_called() function that was substituted for it. when_called(), when it’s called, checks that the parameters that were passed to it include a using parameter, then it goes ahead and dispatches to the original .barricade() method of the Door class, which it had previously stored.
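One common way to keep a wrapped method introspectable is functools.wraps from the standard library. Whether the real decorator does it this way isn't shown, so treat the following as a sketch of the general pattern: it is a cut-down MustSpecifyWith (without the Nope check) on a toy Door class, with strings standing in for game objects.

```python
import functools
import inspect

class ParseError(Exception):
    """Stand-in for the game's ParseError."""

def MustSpecifyWith(func):
    @functools.wraps(func)    # copies __name__ and __doc__, and sets __wrapped__
    def when_called(*args, **kwargs):
        if 'using' not in kwargs:
            raise ParseError("Please try again, and specify what you want to %s with." % func.__name__)
        return func(*args, **kwargs)
    return when_called

class Door:
    @MustSpecifyWith
    def barricade(self, actor, using):
        return "%s barricades the door with the %s." % (actor, using)

door = Door()
print(door.barricade(actor='Rick', using='sofa'))   # Rick barricades the door with the sofa.
print(Door.barricade.__name__)                      # barricade
print(list(inspect.signature(Door.barricade).parameters))   # ['self', 'actor', 'using']
try:
    door.barricade(actor='Rick')
except ParseError as err:
    print(err)    # Please try again, and specify what you want to barricade with.
```

Without the functools.wraps line, Door.barricade.__name__ would report 'when_called' and its signature would appear to be (*args, **kwargs), which would defeat any other code that inspects method declarations.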

If all of this is totally new, there’s a pretty good and fairly in-depth primer on Python decorators here.

There are of course other decorators to enforce other rules, including …

  • @MustSpecifyTopic, which is applied to the handlers for ASK and TELL to enforce the need to specify a topic of conversation;
  • @MustSpecifySource, which is applied to verbs like FILL to enforce the FILL TANK FROM PUMP syntax;
  • @MustSpecifyDest, which ensures that a phrase like PUT DONUT is followed by a phrase like ON TABLE; and
  • @DestMustBeContainer, which ensures that the indirect object onto/into/under which the player is trying to place the direct object is in fact a Container.

There’s also another decorator that’s automatically applied to almost every method of every single descendant of Noun, a decorator called NoExtraParameters. Predictably, this decorator simply raises a ParseError if any parameters are passed to a function other than those that are specified in the function header.
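A decorator with that behavior can be written in a few lines using inspect.signature. The version below is a guess at the general shape, not the actual NoExtraParameters: the error message is invented, and the Cake class exists only to exercise it.

```python
import functools
import inspect

class ParseError(Exception):
    """Stand-in for the game's ParseError."""

def NoExtraParameters(func):
    # Parameter names declared in the function header, computed once at decoration time.
    allowed = set(inspect.signature(func).parameters)

    @functools.wraps(func)
    def when_called(*args, **kwargs):
        extras = sorted(set(kwargs) - allowed)
        if extras:
            raise ParseError("That's more detail than I can handle when you %s something (unexpected: %s)."
                             % (func.__name__, ', '.join(extras)))
        return func(*args, **kwargs)
    return when_called

class Cake:
    @NoExtraParameters
    def eat(self, actor):
        return "Delicious."

print(Cake().eat(actor='you'))    # Delicious.
try:
    Cake().eat(actor='you', using='shotgun')    # EAT BIRTHDAY CAKE USING SHOTGUN
except ParseError as err:
    print(err)
```

This turns what would otherwise be a TypeError about an unexpected keyword argument into an ordinary, player-facing parse failure.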

How does that decorator get applied to several thousand methods throughout the code base automatically? That’s the role of the StandardGrammar metaclass that was mentioned briefly in the initial discussion of the Noun abstract class, way up above. A metaclass is an abstraction that’s responsible for controlling how classes – not objects, but classes – get created. (What a class is to objects, a metaclass is to classes. What’s an object an instance of? A class. What’s a class an instance of? A metaclass. What’s a metaclass an instance of? Also a metaclass. That’s the top of the conceptual hierarchy.) So the StandardGrammar metaclass intervenes in the creation of Noun and all of its descendant classes, automatically applying the @NoExtraParameters decorator to most of the action-handling methods that each class defines.

StandardGrammar looks like this:

import inspect      # for getfullargspec(), used below
import types        # for FunctionType, used below

class StandardGrammar(type):
    """A metaclass for nouns.Noun. For Noun and descendants, it decorates verb
    parameters appropriately with decorators to enforce a standard grammar.

    Note that this is only one of many places in the code that assumes that in-game
    nouns obey the rule that any of their methods whose name does not begin with an
    underscore is an in-game verb performed on that object.

    Adapted from Mark Lutz's *Learning Python*, p. 1402.

    Standard Grammar currently means that verb methods are wrapped with ...
    * @NoExtraParameters (unless the method's arguments list allows **kwargs.)
    * Nothing else, for now.
    """
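    def __new__(meta, classname, supers, classdict):
        # Snapshot the items so the dict can be rewritten safely inside the loop.
        for attr, attrval in list(classdict.items()):
            # Only plain functions with public names count as in-game verbs.
            if type(attrval) is types.FunctionType and not attr.startswith('_'):
                # Methods that accept **kwargs are left unwrapped.
                if not inspect.getfullargspec(_unwrap_function(attrval)).varkw:
                    classdict[attr] = NoExtraParameters(attrval)
        return type.__new__(meta, classname, supers, classdict)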

I’ll forgo digging into the details of exactly how that works, except to say that it intercepts the type() call that normally happens automatically when a class is being created and adds some extra logic around it to apply the decorator automatically.
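For anyone who does want to poke at the mechanism, here is a stripped-down, runnable sketch of the same idea. The names `AutoDecorate`, `shout`, `Thing`, and `Lamp` are invented for this example, and the trivial `shout` decorator stands in for whatever a real grammar-enforcing decorator would do.

```python
import types


def shout(func):
    """Hypothetical stand-in decorator: upper-cases whatever the method returns."""
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper


class AutoDecorate(type):
    """Metaclass that wraps every public plain function in a class body."""

    def __new__(meta, classname, supers, classdict):
        # classdict holds the names defined in the class body being created.
        for attr, attrval in list(classdict.items()):
            if isinstance(attrval, types.FunctionType) and not attr.startswith('_'):
                classdict[attr] = shout(attrval)
        # Hand the modified namespace on to type, which builds the class.
        return super().__new__(meta, classname, supers, classdict)


class Thing(metaclass=AutoDecorate):
    def take(self):
        return "taken"

    def _helper(self):          # leading underscore: left alone
        return "untouched"


class Lamp(Thing):              # subclasses pick up the metaclass automatically
    def light(self):
        return "lit"


print(Lamp().light())           # LIT
print(Lamp().take())            # TAKEN
print(Lamp()._helper())         # untouched
```

The point to notice is that `Lamp` never mentions the metaclass or the decorator; inheriting from `Thing` is enough for its public methods to be wrapped at class-creation time.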

So there you have it: a data ontology and a parser that understands it, interpreting commands and dispatching to action-handling methods defined on the classes to which in-game objects belong. There’s a system for dealing with more than a simple verb-noun parser, and a system for enforcing grammatical rules that apply to commands. It’s a reasonably flexible parsing system that’s tied into a world model and leverages Python’s class system to make things easy to introspect. Action-handling code is grouped together with the objects it operates on, and it’s easy to specify default behavior high up in the class tree and then override it for certain types of objects lower down. All in all, it’s a decent parser system, I think.

But – and here I’m going to beat that horse again (even though horses are beautiful and noble creatures) – it’s still not a great parser system. It’s missing a lot, and it’s not flexible in ways that it should be. There’s no current way to get processing of ALL (as in TAKE ALL), which is something players expect to have, for instance. And it makes no attempt to store the results of partially completed parsing and ask for clarification, which gets annoying if you make many typos. Its system of having to dispatch an action to a single direct object makes it awkward to deal with multiple direct objects, and there are probably ways that this could be exploited to do things that would likely make purists cry “unfair!” in at least some circumstances. (The current system tries to avoid this by maintaining a list of verbs that aren’t allowed to be applied to multiple direct objects in a single turn, and this is one of several ways in which verb information is stored in tables or lists instead of being grouped on an action-handling method.)

Less obviously, it brute-forces the parsing problem and isn’t flexible with the kinds of grammar it can parse. Everything it can understand is a fairly small set of variations on a single basic pattern. It can’t deal with alternative basic sentence constructions at all; it doesn’t try to match objects to descriptions based on a set of flexible grammar-declaration string patterns as in, say, BNF; it just has a hardcoded notion of where in the sentence various things will happen to fall, based on a set of rough heuristics hard-coding informal knowledge about how certain things in English tend to work. If that basic pattern fails, it doesn’t have any way of trying alternative “readings” of the command to try to extract sense (and the thing that’s most likely to trip it up is misreading a preposition as part of a phrasal verb, which there’s currently no way to correct for without re-engineering the whole system).

This also means it’s useless for parsing languages other than English, if that ever becomes something the game or its underlying engine wants to do.

More abstractly, there’s no good way to override processing in particular circumstances without ripping into the whole parser engine to account for the exception. If I wanted to add a magic ring that makes it possible to pass through locked doors, I’d have to modify several parts of the code base to check whether the player is wearing the magic ring; in Inform, I could simply declare something along the lines of The can't pass through locked doors rule does nothing when the player wears the glowing ring.
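To make the contrast concrete, here is a rough sketch of what a rule-with-exceptions mechanism could look like in Python. Nothing like this exists in the Zombie Apocalypse code as described; the `Rule` class, its method names, and the dictionary-based actor and door are all invented for illustration.

```python
class Rule:
    """A named check; returns a refusal message, or None to allow the action."""

    def __init__(self, name, check):
        self.name = name
        self.check = check
        self.exceptions = []          # predicates that switch the rule off

    def does_nothing_when(self, predicate):
        self.exceptions.append(predicate)

    def apply(self, actor, target):
        if any(pred(actor, target) for pred in self.exceptions):
            return None               # rule suppressed in this circumstance
        return self.check(actor, target)


locked_doors = Rule(
    "can't pass through locked doors",
    lambda actor, door: "The door is locked." if door["locked"] else None,
)

# The exception is declared in one place, without touching the rule itself.
locked_doors.does_nothing_when(lambda actor, door: "glowing ring" in actor["worn"])

door = {"locked": True}
print(locked_doors.apply({"worn": set()}, door))              # The door is locked.
print(locked_doors.apply({"worn": {"glowing ring"}}, door))   # None
```

The design choice being illustrated is that the exception lives next to the magic ring rather than inside every piece of code that checks for locked doors.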

How much all of that matters depends on how restrictive it is for the story you want to write. The system I’ve written is mostly adequate for a story that revolves around objects. It’s less satisfying for a story that turns on relationships and conversation, or that otherwise needs to work at more abstract levels. And it’s always going to be more work – substantially more work – to write a piece of parser IF in Python under this system than in a domain-specific language like ZIL or Dialog or TADS. The end result is almost certainly going to be more polished, too, though how much more polished depends on how much work I want to put into polishing it.

5 Likes

First of all, thank you for doing a fantastic job of explaining how Python can be used to implement an IF game. As you know, I’m writing TACK from scratch, and I find it enlightening, indeed. However, our approaches are basically diametrically opposite from one another. Think of it as your CISC vs my RISC. ScottKit is just tables, really, and that’s how I approach things.

The parsing discussion is interesting because I had to deal with the same issues. I decided to basically force a preposition.

GIVE TROLL THE APPLE may be expected, but it’s a lot easier to demand that the players type GIVE TO TROLL THE APPLE. Note that ZIL has a parser function to swap direct and indirect objects in this case. If you do, then prepare to parse LOOK AT OBJECT, as well as the usual LOOK OBJECT.

Speaking of which, ZIL processes the sentence into 3 parts: PRSA, PRSO, PRSI, which stand for Action-Object-Indirect. There’s also something about a character that you can give orders to, but I’m fuzzy about that.

And although you’re right that a lot of things need to be done just to get the game going, realize that it only needs to be done once. As Steve Meretzky put it, you should just copy and paste parts that carry over to the next game, and only write stuff that is new to the next game.

  1. Is not a good reason. Or even a reason. You can learn Python by writing a lot of useless little programs, and arguably learn more that way. What I found out is how little you need to improve ScottKit, which is just a big 2D array and some tables, processing the input. No need to get fancy there.

  2. Yes. I ran out of patience waiting for good graphics support in existing IF languages, and decided to write my own. Although maybe not included in my first IF work. Gotta get the parsing going first!

  3. Uh, you got me there. That’s me. Absolutely.

1 Like

Amazing writeup, @patrick_mooney – you should publish it as a blog post or something outside of this forum.

You should also definitely publish your IF parser as a Python library/module on PyPI when it’s done. There seem to be many people interested in writing IF in Python, but all of them seem to homebrew their own parser system (probably because of all 3 reasons you listed, since @tundish’s Balladeer does exist and people could use that if it fits their needs), adding to the problematic Python IF ecosystem.

3 Likes

Glad to be helpful! I’ve been away from the forums for a while because life has been busy but recall the “what language to write in?” poll from … early January, I think? I haven’t found updates since then but still have a hundred or so threads that I want to read. Feel free to point to specific threads if you’ve been discussing it, though. I’d love to see how it’s going.

Indeed, very different approaches! There’s a lot to be said for simple table-based lookup of object attributes, too: it can be done easily on 8-bit systems without straining them, for one thing, whereas you’re never going to get a fully-fledged object system with functions as first-class objects on, say, a Radio Shack Color Computer 2 or an Apple ///. And in some ways it’s conceptually purer: once you start using a system that models in-game objects as Python objects, you then have to deal with all of the implications of that. Coding is a kind of wizardry, and one of the basic rules of wizardry is that everything has to obey the rules of the shape that it takes. If you build a dense simulation, the simulation rules can become cumbersome. If you build a thin veneer of plausibility, you can be a lot more flexible.

For my own part, I like the abstraction capabilities provided by a high-level abstract language and would prefer to avoid having to deal with low-level things like memory allocation, making sure that the DATA statements encode the information in the order that the READ statement expects, etc., etc., etc. But that’s a preference, not a moral judgment. I also wrote enough bad parser IF in BASIC and Pascal in the 90s that I have a horror of trying to solve all those same bugs again, even though I’m a better coder now.

Notice that this is one of the situations that can’t be handled by the current Zombie Apocalypse parser for reasons that weren’t mentioned above (but I think are implicit in the discussion): the current ZA parser has to get the direct object first, and indirect objects have to be separated from each other (and from the direct object) by prepositions, because that’s a basic constraint on the parser in its current form. (It’s also the most common way for that to work in English, but that’s not universally true for every valid English imperative-voice statement that native speakers would accept as “grammatical” and “natural.”) So the current ZA parser would accept GIVE APPLE TO THE TROLL but not GIVE TO THE TROLL THE APPLE because it has no facility for trying to separate noun phrases from each other besides looking for prepositions between them.

One of the things I am proud about having done with this parser is making it possible to use multiple indirect objects without having to overhaul the parser completely, because nothing enforces an arity of 3 (or lower) on commands, as many IF systems do: you can PUT THE CAKE ON THE BED USING THE SHOVEL, which has two indirect objects. This is largely because the parameters are packed into a Python dictionary, which can theoretically be any size, instead of a set number of variables. This opens up a lot, but still doesn’t get us the full-fledged Inglish parser of The Hobbit, which was still more flexible.
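The constraints described above (direct object first, prepositions separating the noun phrases, indirect objects packed into a dictionary) can be sketched roughly like this. This is a toy illustration, not the actual ZA parser: the function name `split_command`, the word lists, and the choice of `ValueError` are all assumptions, and it does nothing about scope, adjectives, or phrasal verbs.

```python
PREPOSITIONS = {"to", "on", "in", "with", "using", "from"}   # assumed word list
ARTICLES = {"the", "a", "an"}


def split_command(command):
    """Return (verb, direct_object, {preposition: indirect_object})."""
    words = [w for w in command.lower().split() if w not in ARTICLES]
    verb, rest = words[0], words[1:]
    direct, indirect = [], {}
    current = direct                      # words before any preposition
    for word in rest:
        if word in PREPOSITIONS:
            # Each preposition opens a new indirect-object phrase.
            current = indirect.setdefault(word, [])
        else:
            current.append(word)
    if not direct:
        raise ValueError("no direct object before the first preposition")
    return verb, " ".join(direct), {p: " ".join(ws) for p, ws in indirect.items()}


print(split_command("GIVE APPLE TO THE TROLL"))
# ('give', 'apple', {'to': 'troll'})
print(split_command("PUT THE CAKE ON THE BED USING THE SHOVEL"))
# ('put', 'cake', {'on': 'bed', 'using': 'shovel'})
```

Feeding it GIVE TO THE TROLL THE APPLE raises the error, because with no preposition between TROLL and APPLE there is nothing to tell the two noun phrases apart. Since the indirect objects live in a dictionary, the result can be passed straight on to a handler as keyword arguments, however many there are.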

Another thing I like about this system: because it takes multi-word verbs into account, you don’t need to muck about with switching direct and indirect objects around, because that AT won’t be read as a preposition, but as part of the verb. I just define a look_at() method on the Noun base class, something like this:

class Noun(object):
    # [ ... many declarations ... ]
    def look_at(self, **kwargs):
        self.examine(**kwargs)

and add 'look at' to the list of known phrasal verbs.
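A rough sketch of how that lookup might hang together follows. It is only a guess at the general shape, not the real code: the `PHRASAL_VERBS` set, the `dispatch` function, the dictionary-based scope, and the stripped-down `Noun` with methods that return strings are all assumptions made for the example.

```python
PHRASAL_VERBS = {"look at", "pick up"}       # assumed list of known phrasal verbs


class Noun:
    """Stripped-down stand-in for the real Noun base class."""

    def __init__(self, name):
        self.name = name

    def examine(self, **kwargs):
        return f"You see nothing special about the {self.name}."

    def look_at(self, **kwargs):
        # LOOK AT is just another spelling of EXAMINE.
        return self.examine(**kwargs)


def dispatch(command, scope):
    """Find the verb, trying two-word phrasal verbs first, then call the method."""
    words = [w for w in command.lower().split() if w not in ("the", "a", "an")]
    if " ".join(words[:2]) in PHRASAL_VERBS:
        # 'look at' becomes the method name 'look_at'; AT is consumed by the verb.
        verb, rest = "_".join(words[:2]), words[2:]
    else:
        verb, rest = words[0], words[1:]
    noun = scope[" ".join(rest)]
    return getattr(noun, verb)()


scope = {"troll": Noun("troll")}
print(dispatch("LOOK AT THE TROLL", scope))   # You see nothing special about the troll.
print(dispatch("EXAMINE TROLL", scope))       # You see nothing special about the troll.
```

Because the phrasal-verb check runs before anything looks for prepositions, AT never gets a chance to be mistaken for one.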

This is true; it’s just that there’s so much of it that needs to be done, and of course it would need to be modified in any second game, at least to some extent. I recall reading somewhere that “using WordPress to manage a site that isn’t blog-oriented feels like teaching a hippo to fly.” That’s very much like what it feels like to try to do a good job of writing parser IF in a general-purpose programming language: I keep thinking “Just these two or three more things, and then the framework will be pretty much done and I’ll have reached the tipping point where I can mostly be working on the puzzles and the story and not so much on the framework.” But that tipping point has been a mirage in the distance for more than the last year of development on a project that’s now nearly 20k lines of Python. I keep thinking “At last! I’ve more or less got the framework in place!” and I start gleefully sprinting down the road of story- and puzzle-writing for all of two days before I trip over another missing piece of the framework.

The last thing I was working on in ZA before I put it on the shelf and started working on an Inform project was a simple making-coffee-for-your-mother puzzle – not even so much a puzzle as a task, something to keep the PC busy and give them something to do while some action plays out in the background (your mother is watching the morning news on TV and finding out about the rise of the zombies). It seemed like an easy enough thing to write – I have objects, devices, containers; a facility for turning coffee beans into coffee grounds; a working coffee pot – and what I discovered was that I have little in the way of mechanics for handling the scene as a whole. What I wanted – I realize just now, as I’m looking back at the experience – was something very much along the lines of scenes in Inform. But there’s currently nothing at all like that in the underlying framework for Zombie Apocalypse.

Which is when I put it on the shelf and started working on my current Inform piece. I’ll go back to it someday. I’ve barely even started writing the bits that parody The Walking Dead! But this Inform project is far faster. It’s nice to be able to just write story and puzzles and learn a new language without having to build a whole underlying framework.

I’d like to politely disagree. It’s by definition a reason for doing something: insofar as it’s a motivation that drives a human to adopt a course of action, it’s fair to call it “a reason.” It explains a chunk of why I started writing the program in the first place. It’s arguably even a good reason: I have in fact learned an awful lot about writing Python from my experience writing Zombie Apocalypse.

Weeeeelllll … sort of.

You can learn a lot of Python language features by writing short little scripts. You can stumble across a lot of features of the language by reading a good book on Python (I really like both of Mark Lutz’s books, even though they’re getting dated these days) and then using Python to solve your problems. Or, you can go through (for instance) the Codecademy course on Python for free, then start automating your everyday tasks and Googling when you need answers. Both of these approaches will teach you a fair amount about Python along the way. A language needs to be learned by using it, ideally to produce new things out of the language, and that’s true for programming languages just as it is for natural languages. You can even force yourself to practice new language features just for the sake of learning new language features (“next time I’m iterating over something I’m going to force myself to write a generator instead of a list comprehension”).

But writing something really big is a separate kind of learning experience that teaches you things about the language you’re using, too. When I started writing Zombie Apocalypse, I’d written some Twitter parody bots and a few other automatic text generators and CGI quiz scripts and a few thousand lines of code across a dozen scripts to help me automate my digital-photo-processing tasks.

But none of these was a really big project. None of them ever hit a point where I realized that I still had a long way to go and so I’d better dig in and change the bad idea now instead of suffering with it for a few more tens of thousands of lines (as when I realized, maybe 2.5k lines in, that if I wanted it to be possible for NPCs to take actions, I was going to have to modify the calling parameters list for every single action-handling routine I’d written so far, so I went back and modified a few hundred parameter lists by hand). None of the earlier projects has required that I think that hard about data representation and then work extensively with the data representation I’d settled on. Nothing else has required that I specify and develop a system, and then work extensively with that system that I myself had specified and developed, and also to modify it when the specification turns out not to be what I needed. A lot of what I learned early about Python decorators and metaclasses was directly driven by Zombie Apocalypse, because decorators and metaclasses are things that are far more useful in large code bases than in smaller one-off scripts that top out at a few hundred lines. (Why write a decorator if you can just write a helper function that you’re only going to need three or four times? Decorators become attractive when you start thinking about needing to apply them dozens or hundreds of times, and want the fact that the rule is applied to be visually explicit at the level of the class’s method declaration to distinguish them easily from the dozens or hundreds of methods they’re not applying to. Similarly, why write a metaclass to automatically apply decorators when you’re talking about having to put a decorator on a few dozen methods? Just copy and paste. But when you start thinking about putting the decorator on three or four thousand methods, a metaclass that does it for you starts to look attractive. There are things that only happen at scale.)

All of these aspects were real and meaningful learning experiences. Lots of small projects can teach breadth, but depth is worth learning too.

But it’s also important to distinguish between two things: “wanting to learn Python” is, I think, a relatively good reason to write parser IF in Python, because it will in fact teach the writer quite a lot about Python. I imagine the same is true of C, a language I hardly know at all.

That’s not the same thing as saying that it’s worthwhile to release the learning projects that were written to learn Python, though. I’d like to finish and polish and release Zombie Apocalypse some day, but that’s going to depend on how good it is, not how much I personally learned from it.

That’s a great example: Inform, TADS, et al. have some graphics support, but it’s hard to do more than display static pictures with them, and you again have the “teaching a hippo to fly” problem if you’re trying to do anything substantially more elaborate.

I would add a few other things, like network access, which is basically impossible in languages that are concerned with keeping security files sandboxed. And with good reason: but that’s a tradeoff, and one that rules out writing certain kinds of games. Getting most dedicated IF languages to interoperate with other systems in general is a bit of a slog, even for relatively simple tasks, as far as I can tell.

Me too. I’m admittedly a rather stubborn bastard.

3 Likes

Thanks for the compliment! I’m very glad to be helpful.

You’re absolutely right, thank you for the suggestion! I’ve republished it on my own blog: part 1, part 2, part 3, part 4

I had forgotten about Balladeer, and I should take another look at it. It didn’t exist when I first started the intermittent poking at what is gradually becoming Zombie Apocalypse, way back in 2015. There were other systems that did some of what I wanted, but they were either abandoned long ago (who wants to adapt a system someone wrote in Python 2.5 fifteen years ago to modern Pythons?) or just toyboxes (two-word noun-verb parsers are out there) or had very restrictive licensing terms.

There are two things currently preventing me from sharing the whole code base:

  1. It hasn’t even been used to build a whole thing yet, just a part of one. It’s still a toy box until it’s been used as a tool to build a real thing, ideally one that’s been released. It’s inevitably going to get better-tested and be more configurable as I finish off the game that’s being built on top of it. That won’t happen soon because I’m working on finishing an Inform work first.
  2. It’s deeply tied to the rest of the system that it’s part of, and trying to disentangle it would be a lot of work. It may be that it needs to be released along with a standard library (Noun and its major descendants, plus a lot of other helper code) to be adaptable to other works. Or it may be that it can be abstracted (“you just need to provide a scoping mechanism and make sure that your in-game objects are Python objects that always have these properties”). Neither is going to be something that can be done while it’s still in flux because it’s a toy box, though.

For what it’s worth, if the sample code here is useful to anyone, I do hereby license it under the GPL, either version 3 or, at anyone’s option, any later version. People who are unhappy with the GPL license terms are welcome to ask for me to grant them another license. I’m a rather amenable fellow. :)

4 Likes

Great thread and content. Well worth a permanent bookmark.

Thank you.

2 Likes

Thank you so much for taking the time to write this out! This is more than I could have ever asked for. I am in the exact boat you’ve described: I am building this game for the main purpose of learning Python. It fascinates me how much knowledge people are willing to share in this forum. Thank you once again!

3 Likes

I’m really glad to be helpful!

1 Like