Python Parser anyone?

patrick_mooney · March 1, 2022, 9:30am

Apologies in advance for a long, rambling answer. It’s offered in the hope that it’s helpful on multiple levels. I’ve spent some time going down this path, so I’ve taken some time this evening to explain what’s down it and to share some code. The code is the main part of what implements a reasonably decent parser in Python for a model world that’s implemented in a specific, reasonably Pythonic way. It’s certainly not the only way to do this, and it’s definitely not the best way. But it’s a reasonably good way, I think, and I’ve explained what I’ve done and why and what some of the trade-offs are.

And I just discovered now, after writing this, that there’s an upper limit to how long forum posts are allowed to be; this is by far my longest post ever here! So I’m going to have to break this up into multiple pieces. Sorry for that.

Part 1: Introduction and Why You Might Not Want to Do This

How you write your parser depends a lot on how your data ontology for the rest of the program is set up. Think hard about this before you put much effort into writing your code, because it’s much easier to do a good job early and then build on that than it is to try to change the way a complex system you’ve already built operates.

I’ve got a half-written piece of parser IF in Python, and I’ll talk in a bit about how the parser works, but first I want to encourage you to think hard about whether you really want to write a piece of parser IF in Python instead of in TADS, Inform, Adventuron, Dialog, etc. Other people have been encouraging about the process, and I won’t deny that writing an entire game from scratch in a general-purpose programming language is a fun task in a lot of ways. Python is, I think, even one of the better general-purpose programming languages to write IF in because of its object-based underlying mechanics, its introspection facilities, and its power and expressiveness.

But writing a piece of parser IF in a general-purpose programming language like Python is an awful lot of work compared to writing one in a language that’s specialized for writing parser IF. You’ll spend most of your time not writing a story, but writing the framework that the story hangs on. At a conservative estimate, my allocation of framework-writing to story-writing time for my half-written parser-IF-in-Python-project is maybe 20:1. It’s probably higher. Maybe much higher. You have to build the framework before you can start telling the story, and you have to keep tinkering with the framework as you’re telling the story to change how it works in small ways because you didn’t realize when you were writing it initially that you were going to want to … whatever. However well you plan, your system will be inadequate in some ways that you’ll discover as you go along, and you’ll have to modify it, and then you’ll have to go back and fix the things that you broke when you changed the basic rules about how the little world that you created works.

It’s an awful lot of work, especially if you want it to be a pleasant experience for an end-user to interact with. Because you’re going to want other people to play the game when you’re done with it, right? After you’ve spent a few thousand hours designing an underlying system to hang a story on and then hanging a story on it, you’re going to want to show it to someone so you can say Behold this thing I have created!, right? And you’re going to be concerned about whether they actually enjoy playing it? Of course you are! (OK, I admit that I don’t know you, personally, and I’m guessing. But I think it’s likely to be a fair extrapolation from most other makers that I know.)

But the question of “will someone else enjoy this thing?” is a hard one to answer in advance, and it’s easy to overestimate how wonderful the thing you’re building is because, well, it’s yours; and also because you know how much effort you’ve put in, whereas the people who encounter it will not. There’s a lot of parser IF being written these days relative to the size of the parser IF community. I myself couldn’t possibly play everything that piques my interest. I’ve got a to-play folder on my hard drive with hundreds of games in it, some of them thirty years old or more, games I want to get around to playing but just haven’t found time for. The community is saturated with well-written, well-executed content. This is not a bad thing, but it does set the bar high for your work, and players have high expectations for how the things they interact with should behave.

And the brutal truth is that when you say “I wrote this in Python, you know,” players as a whole aren’t going to care much. As a group, they largely don’t care if you write your game in Dialog or Inform or Python or TADS or C++ or Ruby or Lisp. They care about the game itself, not the underlying system. But it’s the underlying system you’re going to be spending the huge majority of your time on. Think about whether you’re willing to spend 95%+ of your time on a part of the project whose highest aspiration is to be invisible and get out of the way so that the player can enjoy the content you hang on top of the system you spent most of your time building. Personally, I could not care less whether or not Jack Kerouac wrote On the Road on one long sheet of paper-towel paper being fed through his typewriter; the question for me is “do I enjoy reading it?”.

The how-to articles that people have linked above show the easy versions of performing some of these tasks: a parser that understands a dozen words, and can move in four directions. Simple verb-noun parsers aren’t that hard to write, which is why they were standard in homebrew BASIC-written games in the eighties. But it gets harder to write parsers that allow for multiple direct objects, that allow for indirect objects, that allow you to use pronouns or the word ALL, that allow you to give commands to NPCs. All of these are special cases that make the parser more complicated, but they’re also things that players expect to be able to do. It’s not just the articles that people have already linked that oversimplify the task of writing parsing code; that’s also true in general for the how-do-I-write-parser-IF-in-Python articles that you’ll run across if you do a Google search. As a rule, people writing those articles are doing a good job of showing off how to leverage Python language features, but only an elementary job of showing off how to write good parser IF and the underlying system that it depends on. There are plenty of similar articles out there; they tend to provide a simplified “here’s how the basic idea works in Python” sketch and then skip over the harder parts and the boring implementation details. But it’s the harder parts and the boring details that are going to occupy the vast majority of your coding time on the parser.
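To make the “easy version” concrete, here’s roughly what one of those simple verb-noun parsers looks like. This is a minimal sketch written for this post, not code from my own project; the vocabulary tables and the parse function are made-up names, and the word lists are placeholders.

```python
# Hypothetical vocabulary for illustration; a real game needs far more.
VERBS = {"take": "take", "get": "take", "drop": "drop",
         "examine": "examine", "x": "examine"}
DIRECTIONS = {"n": "north", "north": "north", "s": "south", "south": "south",
              "e": "east", "east": "east", "w": "west", "west": "west"}
NOUNS = {"lamp", "key", "sword"}
STOP_WORDS = {"the", "a", "an"}


def parse(command):
    """Return a (verb, noun) tuple, or None if the command isn't understood."""
    # Lowercase, split on whitespace, and throw away articles.
    words = [w for w in command.lower().split() if w not in STOP_WORDS]

    # A bare direction ("n", "west") is shorthand for GO <direction>.
    if len(words) == 1 and words[0] in DIRECTIONS:
        return ("go", DIRECTIONS[words[0]])

    # Explicit "go <direction>".
    if len(words) == 2 and words[0] == "go" and words[1] in DIRECTIONS:
        return ("go", DIRECTIONS[words[1]])

    # Plain verb-noun: exactly one known verb followed by one known noun.
    if len(words) == 2 and words[0] in VERBS and words[1] in NOUNS:
        return (VERBS[words[0]], words[1])

    # Everything else (multiple objects, indirect objects, pronouns,
    # ALL, commands to NPCs) falls through as "not understood".
    return None
```

With this, parse("take the lamp") gives ("take", "lamp") and parse("n") gives ("go", "north"), but TAKE LAMP AND KEY, PUT KEY IN BOX, and TAKE IT all come back as None. Those None cases are exactly the hard parts I’m talking about.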

The uglier side of the fact that most people won’t care much about whether you invented a whole system from scratch is that no one is going to give you a pass on doing a shoddy job with building the underlying system just because you built it yourself from scratch. Nobody forced you to build it yourself from scratch. The player doesn’t care much what system you used to build your game; they care about the experience of playing the game. If I hire someone to mow my lawn, I’m not going to pay him extra because he decides to put on a blindfold and tie his hands behind his back and push the lawnmower around while holding the push-bar in his mouth. And if he decides to do that, I’m going to be really annoyed if I have to keep running outside and restarting the lawnmower for him because he can’t do it with his hands tied behind his back while wearing a blindfold.

Almost every year somebody submits a game to the IFComp made with a homebrew system that was written in Lisp or C or some other general-purpose programming language. Most of them turn out to be kind of disappointing games, despite the huge amount of effort that went into them. (The counterexample that springs to mind for me was 2019’s ALICE BLUE, which was written as a shell batch file for Linux systems. I quite liked the writing and thought it was rather well-designed, working with its limitations quite effectively instead of trying to do everything that traditional parser IF does. But it got an average score of only 3.22 out of 10, and only 9 people voted on it, and it finished 78th out of 82, which I personally think is a shame, but then I scored it more than twice as high as its average score, so I guess I’m an outlier there.)

So the short version of all of that is that if you want people to both play and enjoy your game, you need to care enough about the player experience and the fact that the bar has been set high to do a really really good job of implementing your underlying system. This means even more work than doing a mediocre job of implementing your system. It’s not even the 80-20 rule, where 80% of the effort goes into finishing the last 20% of the task; it’s much more extreme, perhaps a 90-10 or 97-3 rule.

Here are some things that Dialog, Inform, et al. do for you with no effort that you will have to build from scratch if you use Python, and will have to do a good job with to avoid irritating players and sending them away shaking their heads. This is only a partial list.

Setting up a data ontology for your created world and settling on an underlying machine-readable representation for each object in the game.
Parsing, of course.
Loading and saving.
Making it possible to undo an action.
Buffering text, deciding whether there needs to be a “press a key to continue” message when a whole page has been printed so it doesn’t scroll off the screen before the player has read it, and wrapping text to the window width.
Pluralizing irregular nouns.
Building text out of templates and text substitutions.
Conjugating verbs and dealing with subject-verb agreement.
Ensuring your underlying world model remains consistent when you modify it.
Inventing a system by which it’s possible to talk to NPCs, and give them commands that they may or may not obey.
A few dozen standard commands, meta-commands, and debugging commands (INVENTORY, EXAMINE, GO, RESTART, LOOK, SCORE, ABSTRACT, PURLOIN, …)
Making it possible for actions to be taken by NPCs in addition to the player character.
Putting a few hundred sensible real-world boundaries in place to keep the realistic illusion in place (“if A and B are both containers, and B is inside A, then it should not be possible to also put A into B at the same time”).
Packaging the game in such a way that it’s easy for everyone who already plays parser IF to run. (Lots of people double-click on a story file and their preferred 'terp pops up. Lots of people prefer it that way. Are you writing a console app that reads from stdin and writes to stdout? Not everybody’s computer is already set up to pop up a terminal emulator to run a Python file when a Python script is double-clicked from their file browser. For a goodly sized chunk of people, that’s already a deal-breaker: they will not open a terminal, navigate to the appropriate folder, and type a command to get your program running. They will not deal with the “do I have to type python3.exe before the name of your program?” question. Having to install Python first is already a deal-breaker for many people running Windows. Are you using other modules than those installed in the standard library? Hopefully you can redistribute them, because asking end-users to deal with pip means losing another big chunk of your potential players.)
Dealing with security issues. (This is a whole huge can of worms on its own, largely involving player trust. A story file generated by [say] Inform or Dialog is largely sandboxed so that it can’t steal data off of your hard drive and send it off to a remote server. This is of course not true for Python programs, which can of course do precisely that. And you’re not thinking about using the pickle module to handle saving and loading your game, are you? Another can of security worms: pickle files are effectively little programs, and loading one can run arbitrary code, so a pickle file can do more or less anything the Python language itself can do. If players are, or might be, exchanging game saves, and those game saves are pickle files, then your game’s save files are a potential attack vector: opening a maliciously crafted game save could steal information from a player, erase their hard drive, etc.)
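On the save-file point: one common way to sidestep the pickle problem is to write saves as JSON, since loading JSON only ever produces plain data (dicts, lists, strings, numbers, booleans, None) and never runs code. Here’s a minimal sketch under the assumption that your game state can be flattened into a plain dictionary; the function names and the version field are hypothetical, and this is not how any particular IF system does it.

```python
import json

SAVE_VERSION = 1  # hypothetical format version, bumped when the layout changes


def save_game(state, path):
    """Write a plain-dict game state to `path` as JSON."""
    payload = {"version": SAVE_VERSION, "state": state}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f)


def load_game(path):
    """Read a save file back, rejecting anything that doesn't look right."""
    with open(path, "r", encoding="utf-8") as f:
        payload = json.load(f)  # produces only basic data types, runs no code
    if not isinstance(payload, dict) or payload.get("version") != SAVE_VERSION:
        raise ValueError("unrecognized save file")
    state = payload.get("state")
    if not isinstance(state, dict):
        raise ValueError("malformed save file")
    return state
```

The trade-off is that JSON won’t serialize your object graph for you the way pickle will: you have to write the code that turns your rooms and things into dictionaries and back again, and you still have to sanity-check whatever you load. That’s more framework work, which is rather the point of this whole list.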

There are plenty of other things that need to be done, and done well; that list is just off the top of my head. All of this is stuff that would be done for you, or at least set up and require minimal fleshing-out effort, if you were using Adventuron or Inform or ScottKit or Dialog or TADS. All of it is stuff that you will have to write from scratch and debug and test on your own if you write in Python. Most of it will need to be modified at some point along the way if you write it yourself, because as you write your story, you’ll realize you want it to do things you didn’t originally build the underlying system to do. And there are always edge cases you won’t think of when you first write the code, generating unexpected bugs.
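To give a sense of what just one of those “sensible real-world boundaries” from the list costs you, here’s a sketch of the container rule (if B is inside A, you can’t also put A into B). It assumes a very simple world model where every object tracks the single thing that holds it; the Thing class and its method names are invented for this example.

```python
class Thing:
    """A hypothetical game object that may sit inside one other Thing."""

    def __init__(self, name, is_container=False):
        self.name = name
        self.is_container = is_container
        self.location = None  # the Thing holding this one, if any

    def contains(self, other):
        """True if `other` is inside this Thing, directly or indirectly."""
        holder = other.location
        while holder is not None:
            if holder is self:
                return True
            holder = holder.location  # walk outward one level
        return False

    def move_into(self, container):
        """Put this Thing inside `container`, refusing impossible moves."""
        if not container.is_container:
            raise ValueError(f"{container.name} can't hold things")
        # Refuse to put a thing inside itself or inside something it holds.
        if container is self or self.contains(container):
            raise ValueError(f"can't put {self.name} inside {container.name}")
        self.location = container
```

So after bag.move_into(box), calling box.move_into(bag) raises an error instead of quietly creating a loop. That’s one rule. You still need to turn the error into a sensible message for the player, and then do the same for supporters, worn clothing, closed and locked containers, size limits, and so on.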

Which is not to say that you shouldn’t write parser IF in Python. It is to say that it’s probably a lot more work than you’re anticipating. The constant temptation is to just say “screw it, good enough.” But doing that chips away at the player experience, and players notice when you’re half-assing the game you’re writing. What I would like to assure you, having started to go down that path and put a substantial amount of time into following it to half-write a large work, is: learning a whole new domain-specific parser IF language is far less effort and takes far less time than doing a good job of building a whole underlying world-simulation system in Python.

So, to my mind, there are really only about two and a half really good reasons to write parser IF in Python:

To learn Python. (You will learn Python really well if you do this. You will think a lot about data ontology, and develop a complex class hierarchy, and do a lot of debugging. You will solve complex problems and think about the implications of designing a complex system. I myself wrote my first-ever Python decorators and metaclasses for my half-finished Zombie Apocalypse: A Love Story. I still think it’s a good project that I’d like to finish some day. But that won’t happen before I finish this Inform game I’m working on, which is going much much much faster.) Remember, though, that this is a benefit for you, not for your players. Your players won’t be willing to overlook shoddy craftsmanship because you learned something along the way. Writing a good game that’s good for players involves prioritizing the player’s experience, not asking them to take one for the team.
The game you want to write wants to do things that can be done in Python but absolutely cannot be done in a domain-specific IF language.
Because you’re incredibly stubborn, you insist on doing this, and you’re willing to do it as well as it would be done in a domain-specific IF language. (I still think this is a half-reason.)