What is a "parser" game?

I think this whole question is an elaborate version of the No true Scotsman fallacy.

We should resist the temptation to say that “Last Audit of the Damned” is no true parser game. It absolutely is a parser game. I think it’s not a very good parser game.

In practice, nobody’s going to define what makes a parser game “good.” Certainly there are many wonderful parser games with a consistent world model and a rules engine, but I’m not going to say that a parser game can’t be “good” unless it has those.

It seems impossible-ish to have a good puzzle without some kinda rules that create some degree of consistency, but parser games obviously don’t have to have puzzles, let alone “good” ones.

It would be an outrageously bold prediction to say that an AI-assisted parser game could never have a consistent world model. At a minimum, we’ve seen that LLMs can generate working Inform code, with working puzzles. They’re just not very good.

I’ll conclude with my theory about why LLM-generated IF tends not to be very good. In my opinion, good stories and good puzzles have something in common: they have to be surprising, but inevitable in hindsight.

Have you ever asked an LLM to tell you a joke? They’re rarely funny at all; they never make you actually laugh.

LLMs are trained to predict what the “next word” in a sentence would be. Their objective requires the LLM to keep surprise to an absolute minimum.
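To put a number on “surprise”: the training loss is literally the surprisal, the negative log probability, of the word that actually came next. A toy illustration (the probabilities below are invented, not from any real model):

```python
import math

# Invented numbers: a model's predicted distribution for the word
# after "the cat sat on the".
predicted = {"mat": 0.90, "roof": 0.09, "moon": 0.01}

# Cross-entropy training loss is the surprisal of the observed next word.
print(-math.log(predicted["mat"]))   # ~0.11: predictable word, low loss
print(-math.log(predicted["moon"]))  # ~4.61: surprising word, high loss

# Training pushes this loss down, piling probability onto whatever is
# most expected. Surprise is exactly the thing that gets penalized.
```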

When you ask an LLM to tell a joke, the LLM is guessing what joke a majority of people would find funny. The result is almost never funny.

This is also why LLM-generated stories (plots) are so boring. AI slop is simply too predictable.

Even mathematical proofs are hard for the very best LLMs. Sure, they can prove stuff, but they struggle to prove anything surprising.

This is why we call LLM-generated content “AI slop.” Slop is just more and more of the same thing we’ve already seen. It’s unsurprising, and so it never impresses us or inspires us.

The problem with “Last Audit of the Damned” isn’t that it uses an LLM to generate text at runtime. The problem is just as bad with AI-slop games with a pregenerated world model, rules engine, prose, etc. The slop is the problem, not when it was generated, or how.

7 Likes

Well, I asked an LLM to tell me a joke, but I don’t think it was original. I think it was part of the training set.

I asked for a joke about pirates. The joke was:

Q: What do you call a pirate with two arms and two legs?
A: A rookie.

I hadn’t heard that one. But I don’t think the LLM invented it either.

I think I understand what you mean. An AI can do the job of a parser, but it’s not the hand-crafted kind made with just Inform or TADS or another system from back in mah day…before we had Pentium processors and ergonomic keyboards!

Perhaps the answer to my original question regarding nomenclature is to just specify AI Parser versus Parser? That makes it easier to know which contest to enter - ParserComp or AIParserComp.

And not that the games need be segregated for any disparaging reason; if you enter a chess competition, you don’t want to be pitted against a chess algorithm. And as a player or comp reviewer, you should know ahead of time whether you’re tangling with Eliza or not.

1 Like

This seems like it’s not really a job for an LLM, but rather for a model more explicitly designed for classification (like BERT or whatever the latest equivalent is) and explicitly trained on a labeled dataset, which everyone seems to have forgotten is still an option.

11 Likes

I have never figured out how to use classification models - and certainly not how to train them. I would be happy to know how you could get such a model to generate JSON or similar that allows the calling program to decipher the commands/intents and arguments of the player’s input - things like “TAKE {object}”, “GO {direction}” or “PUT {object1} into {object2}”. Currently I wouldn’t even know which model to use or where/how to run it.

In theory, yes, but only if the game is (1) transparent about what it’s transforming your input into and (2) explicit about why it’s failing when it fails. I feel like a broken record about this, but my experience dealing with [i] doesn’t exist’s AI parser was extremely frustrating on this score, with it frequently transforming what I typed into something wildly different from what I was trying to do without giving any insight into why it didn’t understand what I originally typed. (It also failed to understand very basic synonyms, like “take” and “get”, and was in that way a bit of a downgrade from, say, an Inform game by a newbie who didn’t bother implementing any synonyms that didn’t come pre-implemented out of the box, but even if that hadn’t been true this would still be a problem.)

Or if everyone’s tired of me pulling out the same examples all the time, here’s one not from IF: Amazon has recently replaced a bunch of its app functions with an AI chatbot. A couple weeks ago I was trying to look at the price history on an item, which is now something you can only do through the chatbot. I wanted to see if the item was legitimately on sale or if it was one of those things where it’s constantly listed as being on sale to scammily encourage people to buy it. However, no matter what I tried the chatbot would do nothing besides give me the tracking details of a recent order. I had no idea why it was doing that or how to get it to stop doing that. It was frankly a more frustrating experience than I’ve ever had playing guess-the-verb in a parser game.

I think reliability or predictability—if I type X, the game will do Y—is hugely important for a parser game, and that’s explicitly not what LLMs are designed for. Their black-box nature makes knowing what you did wrong and what would be useful to try next really hard.

Also pulling against this is that the kinds of people who make games with LLM parsers generally don’t seem to want transparency. They don’t want the chatbot to ever say it couldn’t match your command with anything it recognized. They want for the player to never have to think about how they’re phrasing what they type because everything will magically Just Work—which is all very well until it doesn’t.

And frankly, given that humans are easily capable of misunderstanding what other humans meant by something, I’m not sure that we will ever be in the position of being able to assume that the AI will understand everything correctly and everything will always Just Work, even if AGI is truly right around the corner as some insist.

13 Likes

It’s funny, not a lot of people are talking about putting an AI in between your controller and your FPS, to interpret your button mashing into motions. And nobody at all is proposing putting an AI in charge of the physics or the damage calculations. It’s just generally understood that learning to control a game with consistent rules is part of the fun.

7 Likes

I feel like there’s a huge difference between a game that has pre-generated AI text and one that uses AI during play. I don’t care for either of them, but the latter is more akin to a live-service game.

Anyone who’s played a live-service game only to find the experience changed radically after some time away knows what I mean. Note that this is not a good thing…new content is well and good, but altering existing content is generally not. I like my games mostly deterministic, thank you very much.

3 Likes

Non-deterministic games also make comparative and analytical discussion more difficult. My experience with a non-deterministic game is a one-off, one I couldn’t perfectly recreate even if I fed back the same transcript of commands I entered the first time. Picking up the phone booth at a certain game state should have the same effect (or finite set of effects, if random) every time. Making and playing IF is one half of what we do here; the other half is talking about and sharing our experiences.

3 Likes

The only two games I can think of:

Left 4 Dead had what they called an “AI Director” that kept track of the action level and could unleash a horde of enemies if things had been too peaceful for too long, or hold enemies back if the party was damaged and suffering and needed a break.

Middle-earth: Shadow of Mordor purported to offer:

  • Dynamic and Personal Enemies: The Nemesis System generates unique Orc captains and Warchiefs with individual strengths, weaknesses, personalities, and names.
  • Adaptive AI: These enemies remember past encounters with the player character, Talion.
  • Procedural Storytelling: If an Orc captain defeats Talion, they will become stronger, move up in the ranks, and might taunt Talion during subsequent encounters. They might even remember specific details of the fight and bring it up later.
  • Emergent Gameplay: The system creates dynamic relationships and rivalries between Orcs and Talion, leading to unique and unscripted story moments based on player actions. For example, an Orc who fled a previous battle might return later seeking revenge and stronger than before.

While I don’t think any of these utilize LLMs/AI as we think of it now, that’s how the box copy described the features. You don’t really need an AI to track variables for NPC knowledge. And it doesn’t require high-level learning for a game to peek at player damage levels and keep track of timers to adjust when enemies should attack.

Maybe there’s more going on than I’m supposing, but these scenarios are possible with normal algorithmic checks to decide the state of the game.
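As an illustration, here’s roughly what that kind of pacing check can look like with nothing but plain state tracking; every name and threshold below is invented:

```python
# Minimal sketch of an L4D-style pacing "director" built from ordinary
# variables and timers, no machine learning. All thresholds are invented.
class Director:
    def __init__(self):
        self.stress = 0.0       # rises as the player takes damage
        self.calm_timer = 0.0   # seconds since the last enemy attack

    def on_player_damaged(self, amount):
        self.stress += amount
        self.calm_timer = 0.0

    def tick(self, dt):
        self.stress = max(0.0, self.stress - 2.0 * dt)  # stress decays over time
        self.calm_timer += dt

    def should_spawn_horde(self):
        # Unleash a horde only if things have been quiet for a while
        # and the player isn't already struggling.
        return self.calm_timer > 30.0 and self.stress < 25.0
```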

2 Likes

Those games are using “AI” in the same way that we can say the ghosts in Pac-Man have “AI”—an algorithm that determines their behavior. The big innovation in L4D was having one big algorithm controlling all the enemies in the level, rather than individual algorithms running each individual enemy like in Half-Life 2 and other Source engine games.

“AI” meaning specifically “LLMs” is a very recent thing.

8 Likes

Well, the model itself wouldn’t generate JSON, although obviously you could take its output and format it into JSON before passing it along to some other part of your game.

The exact way you’d formulate the problem could vary. One option that comes to mind would be using BERT (or whatever the current equivalent is), with inputs of the form “user input[separator token]action[separator token]object”, and then training a binary “yes/no” classifier (i.e. classifying whether or not the given action & object is a reasonable interpretation of the given user input). This would require testing multiple action/object pairs at runtime, which is not optimal, and would probably also require a bigger training data set compared to some other options.
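To make that concrete, here’s a minimal sketch using the Hugging Face transformers library. The checkpoint name, the candidate list, and the second-segment format are all placeholders, and you’d still have to fine-tune the classifier on your own labeled data. Note that the JSON asked about upthread comes from ordinary code around the model, not from the model itself:

```python
# Minimal sketch, assuming a BERT checkpoint already fine-tuned as a
# binary yes/no classifier over (player input, candidate interpretation)
# pairs. "your-finetuned-bert" is a placeholder, not a real model.
import json
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("your-finetuned-bert")
model = AutoModelForSequenceClassification.from_pretrained("your-finetuned-bert")

def plausibility(player_input: str, action: str, obj: str) -> float:
    # Encode as a sentence pair; the tokenizer inserts [SEP] between segments.
    enc = tokenizer(player_input, f"{action} {obj}", return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()  # probability of "yes"

# Score every candidate action/object pair and keep the best match.
candidates = [("TAKE", "lamp"), ("EXAMINE", "lamp"), ("GO", "north")]
action, obj = max(candidates, key=lambda c: plausibility("grab the lantern", *c))

# The calling program, not the model, formats the result as JSON.
print(json.dumps({"action": action, "object": obj}))
```

As mentioned, scoring every candidate at runtime gets expensive as the game’s verb and object lists grow.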

A probably better solution (but slightly more involved) would be a sequence labeling approach, where you identify which spans in the input sequence correspond to an action and/or object (similar to e.g. Named Entity Recognition or POS tagging), and then have some subsequent step of linking that action/object to one of the actions/objects available in the game.
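In sketch form (again with a placeholder checkpoint, and a simplified BIO label set that’s my invention, not a standard):

```python
# Sketch of the sequence-labeling approach, assuming a token-classification
# model fine-tuned with BIO-style tags such as B-ACTION / B-OBJECT / O.
# "your-span-tagger" is a placeholder, not a real model.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("your-span-tagger")
model = AutoModelForTokenClassification.from_pretrained("your-span-tagger")

def tag_spans(player_input: str):
    enc = tokenizer(player_input, return_tensors="pt")
    with torch.no_grad():
        pred_ids = model(**enc).logits.argmax(dim=-1)[0]
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    labels = [model.config.id2label[i.item()] for i in pred_ids]
    return list(zip(tokens, labels))

# e.g. tag_spans("stuff the gem into my pocket") might yield
#   [("stuff", "B-ACTION"), ("the", "O"), ("gem", "B-OBJECT"), ...]
# A separate linking step then maps "stuff" onto the game's PUT action
# and "gem" onto an in-game object, say via a synonym table or
# embedding similarity.
```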

I guess that’s probably not a helpful description if you don’t already have pre-existing NLP knowledge. Of course, the whole appeal of LLMs is that they really don’t require you to know anything about how they work, which is obviously why everyone is flocking to them. But my comment about people “forgetting that classifiers are an option” was basically intended to be targeted at the sort of people who are trying to present themselves as innovators in the application of “AI,” who really ought to be able to use BERT if they’re genuinely interested in engaging seriously with NLP.

EDIT: I guess what I’m really saying is that you could use BERT to make a parser that’s better about automatically identifying synonyms & slight variations in phrasing than traditional parsers are, but the fundamental activity would look a lot more like “designing a better parser (that happens to use machine learning),” rather than “using the magic box to avoid having to write a parser.”

7 Likes

This is an interesting point.

When you pregenerate an AI-slop Inform game, the game doesn’t feel like an online service. The game is consistent from play to play. You could write a walkthrough or a hint guide for a pregenerated AI-slop Inform game.

But when a game uses an LLM to generate text as you play, it feels like a live-service game, regardless of whether it’s actually talking to a live service over the network or whether it’s talking to a locally hosted LLM running on your phone/computer.

Live-generated text is a liquid in motion, changing, inconsistent, kinda like an online service.

1 Like

Not just kinda…it is an online service.

If the entire LLM is contained locally in the game, then it should be fully deterministic unless it is using some local random generation. Even then, at least the mechanism for change is local and understandable, and possibly overridable via seed values like any good game that uses “randomness”. That said, there are a lot of games that handle randomness badly.

Still, local randomness is a lot better than the kind of wholesale rip-and-replace back alley surgery that live services perform. If you played Destiny, you probably know what I mean.

1 Like

Never played Left 4 Dead, but that actually sounds incredibly annoying - much like rubberband AI in racing games. I did find the nemesis system in the Mordor games mostly enjoyable.

1 Like

If the entire LLM is contained locally in the game, then it should be fully deterministic unless it is using some local random generation. Even then, at least the mechanism for change is local and understandable, and possibly overridable via seed values like any good game that uses “randomness”.

Sampling from an LLM, even locally, almost always involves randomness unless you set the temperature to 0 (which is usually not done, for various reasons). That’s a standard part of the sampling procedure. You could seed it if you wanted to, of course, but that wouldn’t be typical, and it wouldn’t change the fact that the LLM’s response to not-explicitly-tested inputs will still be “random” (in the sense of being unpredictable) from the developer’s perspective, as well as the player’s.
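To illustrate with toy numbers (the logits below are invented, not from any real model): temperature rescales the distribution you sample from, and only temperature 0, i.e. plain argmax, removes the randomness entirely.

```python
# Toy illustration of temperature sampling; the logits are invented.
import numpy as np

logits = np.array([2.0, 1.0, 0.1])  # model scores for three candidate tokens

def sample_token(logits, temperature, rng):
    if temperature == 0:
        return int(np.argmax(logits))  # greedy decoding: fully deterministic
    probs = np.exp(logits / temperature)
    probs /= probs.sum()               # softmax at the given temperature
    return int(rng.choice(len(logits), p=probs))

rng = np.random.default_rng(seed=42)   # a fixed seed makes the draws repeatable
print([sample_token(logits, 0.8, rng) for _ in range(5)])
# Even seeded, the model's behavior on inputs you never tested is still
# unpredictable from the developer's (and player's) perspective.
```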

2 Likes

True but it is at least locally controllable (theoretically if not practically). That’s a very different animal than talking to a random black box on the internet under someone else’s control.

1 Like

Alien Isolation is another example of this; although it’s not controlling a horde of enemies (or is it?), it’s guiding the single enemy xenomorph in a way that paces the encounters dynamically.

2 Likes

RimWorld also has the concept of the narrator that generates world events. You can customise it and I think mods can even add new narrators.

1 Like

I think “AI Director” is a fancy way to describe something that most games do: balancing the game by making it harder or easier depending on how well the player is doing.

3 Likes