Why can't the parser just be an LLM?

I’ve read the other threads here about the hot topics of LLMs and emerging AI. I don’t know much, but I want to throw out a stupid question.

Why can’t we have an IF game with a persistent state and simply use LLM technology to make a more-or-less perfect parser?

I’m not very knowledgeable about any of this, even comparatively about IF, but I want to know whether ChatGPT could hypothetically eliminate the guess-the-verb problem and otherwise leave traditional parser IF reasonably unchanged, if you IF dev folks had access to implement stuff like that through some kind of magic API. Or maybe such a way to build on ChatGPT actually does exist; I don’t know.

After playing AI Dungeon (over a year ago, on the app version) and playing with ChatGPT a little, I feel like LLMs and AI chat are simply fulfilling a basic and intuitive wish that all computer users since the dawn of computing have felt. We want to talk to our computers in order to tell them what we want them to do. Command line interfaces and text adventure parsers are two examples of imperfect implementations of this essential wish. IF parsers are less abstract than CLIs, but at heart they are both ways to tell computers what to do with their data models using simplified and altered subsets of English.

One of my first thoughts after poking at ChatGPT (admittedly not original to me, as I watched a couple videos) was, Hey, so now computers automatically know real human languages. Great. So, Computer, would you kindly reduce all the jpegs in /home/photos by 25% and save new copies named with the suffix “_edit” appended just before the “.jpg” extensions? Thank you, Computer. Before LLMs came along, I would have had to click on all the icons one by one, because I would have given up in frustration trying to google the exact bash script command to make this work.
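
For what it’s worth, the sort of script I’d be hoping the LLM would hand me might look roughly like this (a minimal sketch in Python with the Pillow library; the folder, the 75% scale, and the “_edit” suffix just come from my example above):

from pathlib import Path
from PIL import Image   # Pillow; pip install Pillow

photos = Path("/home/photos")
for jpeg in photos.glob("*.jpg"):
    img = Image.open(jpeg)
    # "Reduce by 25%" read as scaling both dimensions to 75% of the original.
    smaller = img.resize((int(img.width * 0.75), int(img.height * 0.75)))
    # photo.jpg -> photo_edit.jpg, leaving the original untouched.
    smaller.save(jpeg.with_name(jpeg.stem + "_edit.jpg"))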

It seems like we’re all focused on the output of AI systems, but what could we do with the LLM input scheme, if we could put it to work in our existing models?

I have no doubt that these are basically naive questions, that everyone else must have had similar thoughts years ago. I hope by posting them here within the context of IF, it might satisfy this blatant speculation for other passing dabblers.

I mean, I think output is connected to input in all of this; I think the generative output could have a place within structured game design. I’m certain that game devs of all kinds have been exploring such things since long before the current AI craze. But the fascination with the seemingly magic output seems to me to be missing an even more wonderful (and perhaps less potentially sinister) breakthrough – the realization of the dream of effortlessly being understood by the machine, and of being able to effortlessly exercise control over the machine’s limited functionality because of such understanding.

2 Likes

How do you get the LLM to update a world state? That’s not a trivial question. I tried to use ChatGPT for some design assistance a while back, and I found that it could track some aspects of the conversation, but it could not maintain a meaningful model of our conversation with any consistency—it would lose track of things. I could ask it for corrections, but the longer the conversation went on, the more pronounced the divergence between my understanding of the conversation and the outputs that the LLM provided.

I don’t want to suggest that it would be impossible to create a dedicated tool, but trying to adapt the systems that exist currently as some sort of front-end to a traditional IF or hypertext world or narrative model feels like a “draw the rest of the owl” situation. Just using an LLM presents a harder design challenge, not an easier one.

8 Likes

There have been several threads here on this subject. Here are two of them:

6 Likes

Further to @jsnlxndrlv’s point:

What most people mean when they talk about the “parser” is in fact two things:

  1. something that transforms words into a syntax tree.
  2. something that resolves that tree against the world model and updates it.

An LLM can do (1), but not (2) – at least not easily.

This is mainly because your world model will need to be some sort of data structure, and you’re basically asking the LLM to generate code to update that structure.

I’ve looked into LLM code generation. Currently, it’s mostly full of holes.

Although one day it might be possible.
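
To make the (1)/(2) split concrete, here’s a rough sketch of how you might wire it up today. The complete() function is a stand-in for whatever LLM API you’d actually call, and the tiny world model is invented for the example; everything the LLM does is step (1), and plain deterministic code does step (2):

import json

WORLD = {
    "player": {"location": "kitchen", "inventory": []},
    "kitchen": {"items": ["lamp", "key"]},
}

def complete(prompt: str) -> str:
    """Stand-in for a real LLM call (chat-completion endpoint, local model, etc.)."""
    raise NotImplementedError

def parse_with_llm(text: str) -> dict:
    # (1) words -> structured command; the LLM only has to emit a tiny JSON object.
    prompt = (
        'Rewrite the player input as JSON of the form {"verb": ..., "noun": ...}, '
        "using only the verbs take, drop, go, look.\nPlayer input: " + text
    )
    return json.loads(complete(prompt))

def apply_command(cmd: dict) -> str:
    # (2) resolve the command against the world model and update it -- no LLM here.
    here = WORLD["player"]["location"]
    if cmd["verb"] == "take" and cmd["noun"] in WORLD[here]["items"]:
        WORLD[here]["items"].remove(cmd["noun"])
        WORLD["player"]["inventory"].append(cmd["noun"])
        return "Taken."
    return "You can't do that."

The hard part is everything apply_command doesn’t cover: the more of the world model you want the LLM itself to update, the more you’re asking it to write correct code against your data structures.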

11 Likes

Square Enix tried this recently and it didn’t work very well. Doesn’t mean it can’t be done, but it’s a data point. It also made the size of the game absurdly large for what it is.

4 Likes

I mostly agree, but I would say it doesn’t even do (1), at least not out of the box. LLMs are text prediction models: given some input, they predict what the next word is likely to be. If you let them predict a lot of words in a row, and give them the right kind of input (including ‘instructions’ about what kind of output to expect), they are pretty good at text generation. But they don’t do it by analysing the input and turning it into some formal syntax tree! In fact, they are basically black box models using billions of parameters that, somehow, encapsulate their training data.

It certainly doesn’t do (2), and in fact, (2) requires a totally different type of AI, namely an AI that uses world modelling. This is just not the architecture of LLMs, and is in fact something that LLMs are ‘bad at’. (I put that between scare quotes, because saying that LLMs are bad at world modelling and telling the truth is a little bit like saying that hammers are bad at screwing. I mean, they are bad at screwing, but you shouldn’t expect them to be good at it.) Try to get ChatGPT reasoning about a universe of coloured blocks à la SHRDLU (1968), and it performs worse than its 55-year-old predecessor.

I think there are genuinely interesting things you can do with LLMs and IF, but ‘replacing the parser’ isn’t really it.

12 Likes

Perhaps sometime in the future it would be easy to add an AI/LLM as a preprocessor, so that whenever the player types/says e.g. “I would like to put the chair under the shelf so I can climb onto it”, the preprocessor writes:
[simplified command: TAKE CHAIR]
You take the chair.
[simplified command: PUT CHAIR UNDER SHELF]
Done.
[simplified command: STAND ON CHAIR]
You step onto the chair. You can now see everything on the shelf.

Something like that. The above “preprocessor” should have some way of understanding the output too, since it should stop the command sequence if, e.g., the chair cannot be taken.

With such an approach, parser games would be very accessible to new players, and new players would quickly see the pattern of how simple the commands are underneath.

The author could also supply the preprocessor with info on all the verbs understood by the game, etc. The question is whether anyone would bother to make such a preprocessor for, e.g., Inform games. But it would be good for new players, I think.
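
A sketch of what that loop might look like, with ask_llm() standing in for whatever model you’d use and send_to_game() standing in for a way to feed one command to the game and read back its reply (both hypothetical, as is the verb list):

def ask_llm(prompt: str) -> str:
    """Stand-in for whatever LLM API is available."""
    ...

def send_to_game(command: str) -> str:
    """Stand-in for passing one command to the game's real parser."""
    ...

VERBS = ["TAKE", "DROP", "PUT", "STAND ON", "GO", "LOOK"]  # supplied by the author

def preprocess(player_text: str) -> None:
    prompt = (
        "Rewrite the player's request as one short command per line, "
        "using only these verbs: " + ", ".join(VERBS) + "\nRequest: " + player_text
    )
    for command in ask_llm(prompt).splitlines():
        command = command.strip()
        if not command:
            continue
        print(f"[simplified command: {command}]")
        reply = send_to_game(command)
        print(reply)
        # Stop the sequence if the game refused the command,
        # e.g. the chair cannot be taken.
        if "can't" in reply.lower() or "don't" in reply.lower():
            break

The crude “can’t” check is the weak point, of course – really understanding the game’s reply is its own problem.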

3 Likes

Cheree: Remembering my Murderer in this year’s ParserComp was I think a good example of the possibilities - it allows for free form conversation with an NPC, with relationship tracking and some puzzle gating providing the structure.

(The Fortuna, from the same Comp, is I’d say an example of the worst-case scenario).

3 Likes

Yeah, that’s about where the previous two threads trailed off.

2 Likes

Interestingly, Cheree was handcrafted and trained on the author’s own data (I’m pretty sure, although I can’t find where I heard that now), while Fortuna used standard ChatGPT or equivalent.

I remember I thought Cheree had to be based on a large corpus-trained general model because it knew some random video games I typed in, but if you go to Robert Godwin’s GitHub page, he hardcoded a list of video games and their publishers, which is neat, worked out great for me, and is also a lot of work.

4 Likes

It’s in the About the Game text here:

No deep learning generative neural nets are used in this project, and no data has been scraped from the internet. All AI is handcrafted. All art is handcrafted. All photographs are handtaken.

6 Likes

Yeah, sorry, I guess it’s not technically an LLM – just a similar approach, though as you say way more work than just using ChatGPT off the shelf.

(I suspect the folks interested in AI type approaches to IF because they think it will reduce the amount of work involved are pretty mistaken; the folks interested in them because they open up new gameplay/design possibilities are IMO closer to the mark).

4 Likes

There was a Kickstarter for a game called [I] doesn’t exist that claims that “you don’t need to learn the language of the game - we have an AI that parses everything you type and tells the game what to do accordingly”.

But I don’t know… I played the demo (the full game hasn’t been released yet), and I can’t say I noticed the parser being any more capable than a traditional one. And in some instances, commands I’m used to having did not work in this game.

2 Likes

As someone who works with LLMs daily, I’d say they can certainly do (2). But there are definitely caveats, and so I entirely agree: it’s not necessarily easy in all cases.

For example, I help companies with ontologies.

An ontology is a formal representation of knowledge that defines concepts, their attributes, and the relationships between them in a specific domain. This is like a “world model.” Both ontologies and world models involve representing concepts and their relationships. In a text adventure, concepts would include rooms, items, characters, actions, and their interactions.

GPT-3, and even more so GPT-4, can be used to understand and generate text based on the information present in ontologies. In my context, we utilize them to generate descriptions of ontology concepts and answer queries about ontology structures, but also – and crucially – to populate or refine ontologies. Think of that as updating a world model.

Combining ontologies with LLMs for parsing and – in this context – world state management is definitely complex, though, hence a lot of the caveats. Ensuring that the language model understands and respects the ontology’s structure and semantics is not always easy. But how it works is conceptually simple.

Essentially, the generated changes from the LLM are integrated into the ontology’s data structures. Think of that as updating the “world model.” This usually involves updating a graph database, a knowledge representation system, and so on. But it can be any other form of ontology storage.

The how of this is also conceptually simple: you have an integration layer that bridges the gap between the LLM and the ontology. This layer interprets the LLM’s responses and generates instructions for making changes to the ontology. But via what? Usually through the Ontology API. And that’s where things get interesting.

So, putting this in context here, let’s say you have an IF Ontology API. It serves as a mechanism to update the world model (ontology) based on player actions. It allows you to make changes to the ontology’s concepts, attributes, and relationships as the player interacts with the game. If a player picks up an item, the Ontology API could be used to remove the item from its current location and associate it with the player’s inventory. The API would offer functions to add or remove instances of concepts, modify attributes, establish or break relationships, and update the state of the game world.
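
As a rough sketch of what that pickup case could look like against a graph store (using rdflib here; the class and property names are placeholders I invented, not an existing IF Ontology API):

from rdflib import Graph, Namespace, URIRef

EX = Namespace("http://example.com/if#")

g = Graph()
g.add((EX.brassKey, EX.locatedIn, EX.kitchen))   # initial world state

def take_item(graph: Graph, item: URIRef, player: URIRef) -> None:
    """Move an item from wherever it currently is into the player's inventory."""
    graph.remove((item, EX.locatedIn, None))     # drop every locatedIn fact for the item
    graph.add((item, EX.carriedBy, player))      # and associate it with the player instead

take_item(g, EX.brassKey, EX.player)

The integration layer’s job is then to decide, from the LLM’s output, which of these functions to call and with what arguments.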

What I just described there as the “game world” is essentially what we do with spatial/temporal ontologies and event ontologies.

Text adventures often involve spatial relationships (rooms, locations) and temporal aspects (sequence of events). A spatial ontology would help define the layout and relationships between game locations, while a temporal ontology could capture the sequence of actions and events.

Events obviously play a crucial role in a text adventure as players take actions and trigger changes in the game world. An event ontology could represent actions, state changes, and their effects on the game’s entities.

One thing that I would like to see more of is causal ontologies, which could apply here. (The notion of applying causality to AI models is one of the next major elements being worked on.) I say this because causal relationships can be important in a text adventure to ensure consistent outcomes based on player decisions. For example, opening a door might cause a change in the room’s state. A causal ontology could help model these cause-and-effect relationships.

Anyway, apologies for the long blurb of text here. But this is an area that’s fascinating to me.

8 Likes

Thanks for sharing; this is really interesting and brings in a lot of information that hasn’t been very present in the AI discussions here in recent months. This does seem like a bigger step toward a world-model AI.

I was thinking after reading this that it might be useful to define from a game viewpoint: what is the desired function of the AI? Feasibility and enjoyability change a lot depending on that function.

I can think of a few functions AI could have:

  1. Adding a command-interpreting layer over a traditional parser game. Here the AI’s sole use would be taking unrecognized commands and matching them to the most similar or likely recognized command (see the sketch after this list). This is less an LLM problem and more of a classification problem, but it could be approached with similar techniques. Game errors would still exist in a useful way (like ‘There is no object called that here’ kind of stuff), but typos or long sentences could be fixed (like if someone types ‘please go west’, it could direct it to GO WEST).
  2. Adding non-error, world-changing responses to every command, so that if the programmer hasn’t implemented ‘DIG’ and the player types ‘DIG SAND’ in a desert, the AI improvises a response. Since the AI is generating stuff on the fly here, this could end up with the AI going ‘off script’ and providing an entirely different story than the one programmed. AI Dungeon is basically nothing but this.
  3. Programming/writing the whole game itself (coming up with plotline and coding). Fortuna in recent parsercomp was closer to this.
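
For 1, the matching step doesn’t even need a full LLM; a sentence-embedding model and cosine similarity get you most of the way. A sketch (the model name is just a common default, and the threshold is a guess):

from sentence_transformers import SentenceTransformer, util

KNOWN_COMMANDS = ["go west", "go east", "take lamp", "open door", "look"]

model = SentenceTransformer("all-MiniLM-L6-v2")
known_vectors = model.encode(KNOWN_COMMANDS, convert_to_tensor=True)

def nearest_command(player_input: str, threshold: float = 0.6) -> str | None:
    """Map free-form input onto the closest command the game already understands."""
    vector = model.encode(player_input, convert_to_tensor=True)
    scores = util.cos_sim(vector, known_vectors)[0]
    best = int(scores.argmax())
    # Below the threshold, fall through to the game's normal error message.
    return KNOWN_COMMANDS[best] if float(scores[best]) >= threshold else None

print(nearest_command("please go west"))   # most likely "go west"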

I could see 1 being exciting and/or fun for drawing in new people; eventually, though, I think it won’t be useful, because I’ve learned over time that:

  1. players eventually learn what commands your game expects, create a mental list of those commands, and just reuse them, except for instances where you require something special, and
  2. players eventually prefer shorter forms of commands. Many parsers pride themselves on being able to understand complex sentences, but I’ve never seen a player say ‘I wish I could use longer, complete, grammatically correct sentences while playing’. Like, it’s cool to know that I could type ‘I take my journey to the land of the west’ and the game moves me west, but if you have a game with 40 locations and lots of movement, you’re not going to type that every time! That’s why it’s more like ‘GO WEST’, ‘WEST’, or even just ‘W’.

So for those 2 reasons I think that AI assistance would mostly be useful for new players or for the beginning of a game.

I’m not really excited about the 2nd and 3rd kinds of games I listed above. With the 2nd kind, if the AI is doing the heavy lifting, then the programmed game becomes irrelevant. You might as well just have the pure AI, like AI Dungeon. With the 3rd kind, AI is currently programmed to write what people expect in a given situation, so it tends to do badly with surprises or unusual things.

5 Likes

Some of the places I want my game to be played would ban it if it included an LLM (or anything pointing to a non-approved website) due to concerns about copyright, information security, and business secrets. This rules out using an LLM for any aspect of the work that is still connected at release.

Also, it means players can’t be guaranteed equal levels of responsiveness, since it depends on what’s been fed into the feedback loop previously. Hallucinated content is also a serious problem, which would break credibility for most players.

A better way to remove guess-the-verb would be to use a thesaurus and have good playtesters who can figure out what players (rather than computers) will do with a given prompt. This allows the same game experience for everyone.

(I’ve played AI Dungeon 2 once, for a single exchange. It did extremely poorly, apparently having no idea how the eight-word prompt I’d given it related to anything within itself, and it managed to contradict itself three times in the same sentence. I don’t see how an AI that works on that sort of basis can provide a consistent game.)

2 Likes

Yeah, this is interesting. One of the more salient things I see all the time in gaming discussions is that people want to feel like they’re in a “lived-in world,” but also one that is responding to what’s going on – not just to what the player does, but also to dynamic events not generated by the player.

Yet people do want a consistency that allows for a coherent story or theme.

Scenario

So let’s say you have a text adventure with a defined beginning and end. And a story that the author wants to convey. But the pacing of the story and the events the player encounters are all based on what they do in going from the beginning to the end, of course.

Traditionally, a lot of the development time goes into accounting for how the player might encounter things and possibly gating them so that they only encounter things in a given order. (This is the same thing “open world” games had to adapt to as they removed a lot of the linearity in pathing, which led to conformity of experience.)

Structure

So let’s assume the story is modeled as an ontology as is the game world (“world model”).

In that case, could an LLM-based approach, along with a command interpretation layer and perhaps an ontology API, have some interesting possibilities here?

I think yes.

The command interpretation layer powered by the AI could allow players to input natural language commands, even if they are not predefined in the game’s parser. They don’t have to be long commands, but they could be. So you don’t necessarily worry about how much or how little people want to type. You have a system that adapts. And I agree: this would largely be a classification exercise.

So, first, an LLM can serve as the tool for generating dynamic narrative content. As players progress through the game, the LLM can create personalized descriptions, dialogues, and events that adapt to the choices and interactions the player has.

Second, by coupling the LLM-generated content with an ontology API, the game can dynamically adjust the pacing of the story based on player decisions. Significant choices could trigger branching storylines, unexpected events, or different character interactions.

An Ontological Mystery

So let’s say we create an ontology for a mystery story where the player’s choices impact the outcomes of various events.

I work with OWL (Web Ontology Language) a lot, so let me give an idea of how I might start doing this:

Prefix: ex: <http://example.com/mystery-story#>

Ontology: <http://example.com/mystery-story>

Class: ex:Story

Class: ex:Branch

Class: ex:Path

Class: ex:Event

Class: ex:ConvergencePoint

ObjectProperty: ex:hasPath

ObjectProperty: ex:hasEvent

ObjectProperty: ex:hasConvergence

ObjectProperty: ex:leadsTo

DataProperty: ex:description

Individual: ex:MainStory
    Types: ex:Story, ex:Branch
    Facts: ex:hasPath ex:MainPath

Individual: ex:BranchA
    Types: ex:Branch
    Facts: ex:hasPath ex:PathA

Individual: ex:BranchB
    Types: ex:Branch
    Facts: ex:hasPath ex:PathB

Individual: ex:ConvergencePointA
    Types: ex:ConvergencePoint
    Facts: ex:leadsTo ex:MainPath

Individual: ex:ConvergencePointB
    Types: ex:ConvergencePoint
    Facts: ex:leadsTo ex:PathB

Individual: ex:MainPath
    Types: ex:Path
    Facts: ex:hasEvent ex:Event1,
           ex:hasEvent ex:Event2,
           ex:hasConvergence ex:ConvergencePointA

Individual: ex:PathA
    Types: ex:Path
    Facts: ex:hasEvent ex:Event1,
           ex:hasEvent ex:Event3,
           ex:hasConvergence ex:ConvergencePointA

Individual: ex:PathB
    Types: ex:Path
    Facts: ex:hasEvent ex:Event1,
           ex:hasEvent ex:Event4,
           ex:hasConvergence ex:ConvergencePointB

Individual: ex:Event1
    Types: ex:Event
    Facts: ex:description "Player starts the story."

Individual: ex:Event2
    Types: ex:Event
    Facts: ex:description "Player collects evidence."

Individual: ex:Event3
    Types: ex:Event
    Facts: ex:description "Player confronts a suspect."

Individual: ex:Event4
    Types: ex:Event
    Facts: ex:description "Player gains detective's trust."

Granted, this is bare-bones here so my apologies on that. But the ontology, simple as it is, captures the structure of the branching narrative by defining story branches, paths, events, and – crucially – convergence points.

Okay, so then how does this allow for a sort of emergent gameplay where the player can still have a consistent story experience but a very tailored ludic experience?

Each path within the ontology represents a unique sequence of events and choices that the player can take. By following different paths, players can thus have individualized gameplay experiences tailored to their choices.

Yeah … okay. But that’s sort of like what we can do now, right?

So the convergence points in the ontology come into play here. They represent moments where different branches come back together. These points allow for emergent gameplay, as players can take varied paths and still arrive at shared narrative moments, ensuring a consistent story experience.

But … wait? Is this really emergent at all? Let’s ask it this way: would this structure allow for interactions not programmed in assuming the LLM and ontology API layers were operative?

Alright, so we have a murder mystery we’re talking about. In that context, let’s say the player could witness a bit of dialogue between two characters if the player happens to be at the right place at the right time. (Maybe the “right place at the right time” will differ because the two characters are not on set paths, but rather guided by events. Meaning, the dialogue takes place when they happen to meet up in the same location.)

But it’s also the case that the two characters will eventually go their own way. One of those characters, however, has a crucial piece of evidence in their pocket that they get from the other character.

So what can happen here?

Well, the player could intercept the conversation and try to get the evidence on the spot. Or the player could watch the characters behind cover, see the transaction, and then follow the character with the incriminating item. The player could try to pickpocket the item. Or they could confront the character. Or maybe just continue to wait and see what the character does with it, if anything. Or maybe the character eventually hangs up their jacket in a closet.

But … how about this scenario: perhaps a random encounter happens. The character with the evidence bumps into another character and that causes the item to fall out of their pocket.

Can all of this be modeled with what I’m talking about?

Maybe?

Modeling the Mystery

We could certainly model the dialogue between the two characters as events or interactions within the ontology. The player’s presence or absence at the location can trigger these events. (Note: even their absence can trigger this. This leaves entirely open what else can; that’s where the emergence would come in.)

Equally certainly the characters’ movements and actions can be represented as part of the ontology. The ontology can keep track of their current locations, planned paths, and interactions with the player or other characters.

Planned paths? But what about unplanned paths? Based on characters’ behaviors and the game’s context, the ontology could be set up such that the characters can make dynamic decisions about where to move next. For instance, a character might decide to move to a location where they heard an important conversation is taking place or when they come into possession of something that they believe is important. Or maybe the character has gotten suspicious of the player and actively tries to go only where the player isn’t or where the player can’t go. (These would be modeled as somewhat equivalent to weights and biases in the model.)
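
As a toy illustration of what those “weights” might cash out to when a character picks their next location (the factor names and numbers are invented for the example, not part of any real system):

def score_destination(npc: dict, room: dict) -> float:
    """Higher score = more attractive next move for this character."""
    score = 0.0
    if room.get("rumoured_conversation"):
        score += 2.0 * npc.get("curiosity", 1.0)      # drawn toward overheard talk
    if room.get("player_present") and npc.get("suspicious_of_player"):
        score -= 5.0                                   # avoid the player when wary
    if npc.get("carrying_evidence") and room.get("has_hiding_spot"):
        score += 3.0                                   # somewhere to stash the item
    return score

def choose_next_room(npc: dict, rooms: list[dict]) -> dict:
    return max(rooms, key=lambda room: score_destination(npc, room))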

The evidence itself can be an entity within the ontology, linked to the character who possesses it. The ontology can define rules for how the evidence can change hands and how it can be interacted with. And not just that piece of evidence, but anything that can be treated as evidential in the story. (These would be like attention masks applied to the entities.)

Here’s where your command-interpreting part comes in as well. Player actions such as intercepting the conversation, pickpocketing, or confronting characters can be mapped to specific events or interactions within the ontology. The LLM-based layer can interpret the player’s commands and trigger the corresponding ontology events. But the range of possible interpretations could be interesting. For example:

FOLLOW THE SUSPECT UNTIL THEY STOP

Or:

WHEN THE SUSPECT STOPS, APPROACH THEM AND LOOK AT THEIR POCKET

Or:

TAKE THE EVIDENCE FROM THEIR POCKET WHEN THEY ARE DISTRACTED

I’m probably not conveying this well but the idea is you could open up a whole range of interaction. The LLM-based interpretation layer processes the input and extracts the intent, actions, and relevant entities.

Crucially, the interpretation layer must identify the main intent of the command, which is to take the evidence from a character’s pocket. But the intent is also to do so stealthily or without causing a scene or, perhaps more crucially, without the suspect knowing.

The interpretation layer then recognizes the entities in the command:

  • “THE EVIDENCE” as the item to be taken.
  • “THEIR POCKET” as the location of the evidence.
  • “WHEN THEY ARE DISTRACTED” as a condition.
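
So, for that last command, the interpretation layer’s output might be a small structured record along these lines (the field names and values are purely illustrative):

from dataclasses import dataclass

@dataclass
class Intent:
    action: str                 # what the player wants to do
    item: str                   # "THE EVIDENCE"
    source: str                 # "THEIR POCKET"
    condition: str              # "WHEN THEY ARE DISTRACTED"
    manner: str | None = None   # e.g. "stealthily", if expressed or implied

parsed = Intent(
    action="take",
    item="evidence",
    source="suspect_pocket",
    condition="suspect_is_distracted",
    manner="stealthily",
)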

Key to this is that the ontology stores information about characters, their pockets, and conditions for distraction. (And this is just for pockets! We can imagine many other scenarios here.) The ontology, remember, defines relationships between characters, items, and conditions on a very broad scale.

The interpretation layer generates a query for the ontology based on the extracted intent and entities. The query then seeks to find a way to fulfill the command, just as any AI-based task is handled by a learning model. This is where the inherent prediction-based nature of AI comes in.

The ontology responds to the query by checking if the conditions are met. It assesses, for example, whether the character is indeed distracted and if the evidence is in their pocket. (What if, for example, the character dropped the evidence in the trash when the player wasn’t aware of that because they weren’t in the same location?)
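
Conceptually, that check is just a couple of lookups against the graph. With rdflib again (all the names here are placeholders):

from rdflib import Graph, Namespace

EX = Namespace("http://example.com/mystery#")

def pickpocket_conditions_met(g: Graph) -> bool:
    """Is the suspect distracted, and is the evidence actually still in their pocket?"""
    distracted = (EX.Suspect, EX.hasState, EX.Distracted) in g
    in_pocket = (EX.Evidence, EX.locatedIn, EX.SuspectPocket) in g
    # If the suspect quietly dumped the evidence while the player was elsewhere,
    # in_pocket is False and the attempt should fail even with a perfect distraction.
    return distracted and in_pocket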

The concept of “being distracted” can encompass a range of possibilities, of course.

So you could create “distractor entities” within the ontology. These could include things like loud noises, sudden events, engaging conversations, unexpected occurrences, and whatever else. The ontology can then represent different states that characters can be in, including “distracted,” and define factors that contribute to a character’s distraction, such as their focus, attention, and emotional state. (Example: a character recognizes they are being followed by the player and is no longer distracted but very, very focused.)

Then you model events that can serve as distractions in the game world. These events might be associated with specific locations, characters, or conditions. For instance, a loud crash in the next room could be a distraction event. Or the character might happen to run into another NPC who stops them to talk. Or the player might get another character to call the suspect. Imagine if you could do something like:

SUSAN, CALL THE SUSPECT AND KEEP THEM TALKING FOR A FEW MINUTES

So our ontology starts to look like this:

Prefix: ex: <http://example.com/distraction#>

Ontology: <http://example.com/distraction>

Class: ex:CharacterState

Individual: ex:Distracted
    Types: ex:CharacterState

Class: ex:DistractorEntity

Class: ex:DistractingEvent

Individual: ex:LoudNoise
    Types: ex:DistractorEntity, ex:DistractingEvent

Class: ex:InteractionModifier

Individual: ex:PickpocketModifier
    Types: ex:InteractionModifier
    Facts: ex:enhances ex:PickpocketingInteraction

Class: ex:Interaction

Individual: ex:PickpocketingInteraction
    Types: ex:Interaction

Individual: ex:Event
    Types: ex:Interaction
    Facts: ex:description "A general game event."

Individual: ex:DistractedByEvent
    Types: ex:Interaction
    Facts: ex:description "Character gets distracted by an event.",
           ex:requires ex:DistractingEvent

DataProperty: ex:description

ObjectProperty: ex:enhances
    Domain: ex:InteractionModifier
    Range: ex:Interaction

ObjectProperty: ex:requires
    Domain: ex:Interaction
    Range: ex:DistractingEvent

So, speaking broadly but also simplistically, the ontology provides a structured foundation for tracking and reasoning about all these interactions, while the LLM layer enhances player engagement through intent-based natural language interactions and tailored responses based not only on how the intent was expressed but also on the conditions that might make the intent easier or harder to implement.

2 Likes

That looks really complicated…

But should that third command be generated? Did the player mean that they actually want to stand on the chair now, or only that they want to be able to stand on it? Traditional IF parsers are designed to minimise ambiguity, or to stop and ask the player to disambiguate if necessary. They can do that because they only understand a subset of the natural language.

With an LLM preprocessor, either it would absorb all ambiguities, leading to statistical actions being generated (sometimes that command would result in the player character standing on the chair and sometimes it wouldn’t), or the LLM might be programmed to ask the player to clarify whenever its input is ambiguous. But normal language has so many ambiguities, I’d be worried that might happen every single turn!

3 Likes

Yep, I think it would be, right now. That’s where my agreement comes in with the “not easily” part. But I also think abstraction layers for this kind of thing are inevitable. Some are here. For example, the ROO (Resource Oriented Ontology) language. There’s also OWL2VOWL, which can actually create interesting visualizations. Protégé also allows for an interesting abstraction layer.

In essence, some of this is no more complicated than providing a robust domain-specific language. But there are definitely some complexities beyond just that, which really come down to how interpretable all of this would be for someone developing it.

One of the bigger challenges here is the nature of the trained models that would best support this. Trained models can contribute to updating an ontology-based world model in response to player actions, influencing how the game world evolves.

That said, this isn’t an entirely unsolved problem. Trained models are commonly used in various aspects of gaming AI even now. It’s just that this often is so specialized for the context it was developed in and isn’t really scalable. I talked about this a bit with an AI tester I used for Elden Ring. That can’t be used in any other context except Elden Ring, although it could be generalized a bit to other “Soulsborne” games.

That said, the work in this field is rapidly evolving.

1 Like