Implementing ChatGPT's Parser

As a player, I don’t need my input to be valid, only anticipated. A non-standard quip, or an acknowledgement of why my input won’t accomplish my intent, serves just as well to make anything seem possible as actually implementing all of those possible inputs would. Some of the most immersive moments I’ve ever experienced involve being lightly trolled by the author for trying something insane.

4 Likes

That’s what I mean. We want the game to understand us, even when that particular action can’t be performed at this time (or in this game at all). But by invalid inputs I mean typos, unrecognized words, incorrect syntax: inputs which can’t be construed meaningfully without doing something like spell-checking them (which I’m all for).

2 Likes

Unix shell? o.O
Wow, did THIS ever go off the rails. Jesus. LOL

The fact of the matter is that ChatGPT’s parser is better than any currently implemented in IF. I don’t see why people are so uptight about the idea of incorporating that into IF.

2 Likes

The issue is that ChatGPT doesn’t have a parser. It learns statistical relationships between words: what sorts of words are likely to come after which others, and how to use those relationships to produce sequences that are likely.

And this isn’t good enough for interactive fiction, which doesn’t just want a plausible sequence of words, but a meaningful one. The sequences “>TAKE BREAD You pick up the loaf of bread.” and “>TAKE BREAD You can’t see any bread here.” are both fairly plausible in the context of a parser game. Which one should be returned depends on a sort of meaning (i.e. whether there’s a loaf of bread in the right part of the model world) that is completely alien to ChatGPT.
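
To make that concrete, here’s a minimal sketch (purely illustrative, not any real engine’s internals) of the world-model check that picks between those two responses:

```python
# Toy world model: the only thing that determines which response is "right".
world = {
    "kitchen": {"objects": ["loaf of bread", "table"]},
    "hallway": {"objects": ["umbrella stand"]},
}
player_location = "kitchen"

def take(noun):
    """Resolve TAKE <noun> by consulting the world model, not plausibility."""
    here = world[player_location]["objects"]
    match = next((obj for obj in here if noun in obj), None)
    if match:
        here.remove(match)
        return f"You pick up the {match}."
    return f"You can't see any {noun} here."

print(take("bread"))  # "You pick up the loaf of bread." -- because the model says so
```

ChatGPT never consults anything like that `world` dictionary; the only thing it ever sees is text.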

Now, in theory, a neural network of this sort could be trained to translate standard English into what Inform’s parser wants, or come up with plausible error messages. But fundamentally, ChatGPT as it currently exists doesn’t do any parsing, because it doesn’t care what any of these words mean—it just cares about whether they’re plausible.

8 Likes

You seem to be saying that since ChatGPT produces plausible output, it must work like an IF engine, in two stages. First parse the input into a representation of a command; then generate output based on that command. If that were true, we could make use of the parsing stage by itself.

But that’s not true. It’s not two stages. There’s just one stage, which is “put the input through a statistical blender.”

This was basically the goal of the SpiritAI project which Emily Short started several years ago. Aaron Reed and I worked on it for a few years. Unfortunately the project didn’t wind up going anywhere and got shut down in 2019-ish.

This was just before the AI chatbot explosion, but even if we’d kept it going, that kind of model is much harder to train. The reason ChatGPT works so outrageously well is that you can train it on any arbitrary text. Every bit of text leads to the next bit of text. You can feed in as many gigabytes of text as you can find (and the Internet has plenty).

To train a classifier for IF input, you’d need a corpus of IF commands and their parsed representation. (Not just the output of the command, but the action/noun data from inside the parser.) This basically doesn’t exist.
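
For concreteness, a single entry in that nonexistent corpus might look something like this (the format here is entirely hypothetical):

```python
# One hypothetical training pair: raw player text in, the parser-internal
# action/noun representation out. You'd need an enormous number of these,
# and no such dataset exists.
training_example = {
    "input": "see that bread there? i'd like to grab it",
    "target": {"action": "take", "noun": "bread", "second_noun": None},
}
```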

8 Likes

Okay. How does ChatGPT not have a parser? That makes zero sense. How else is it able to “understand” (for lack of a better word) what the user types?
I couldn’t care less about its output regarding the issue of its parser. Its output is irrelevant to this conversation.
This is ENTIRELY about its ability to correctly read and “interpret” the user’s input. This includes ignoring typos and interpreting them correctly. Also abbreviations and synonyms. Plus you can define “word shortcuts” for complex commands.
It actually already knows a HUGE amount about interactive fiction. It has been trained on most of the popular IF languages as well as transcripts from a rather wide variety of games thru 2021. It will resist playing IF games unless you tell it that it’s an experiment to test its capabilities. Then it will happily participate. With mixed results, but still quite impressive IMO.
Perhaps CGPT isn’t as impressive to programmers as it is to mere users. But the entire academic AI research community seems pretty impressed with it, so I take opinions about it from programmers not actively involved in AI research with a lot of salt. It’s like asking a biologist about climate science.

2 Likes

I’m sincerely confused about your intent here. If you’re dubious about our opinions, why seek them? I’m sure the academic AI research community is still discussing ChatGPT. Or are you trying to bring us around to your point of view? What’s the goal here? Because you don’t seem to be getting the feedback you either expected or wanted.

7 Likes

I mean that’s the thing, it really really doesn’t “understand” what the user types - that’s exactly what a traditional IF parser does and ChatGPT doesn’t.

Inform or TADS or whatever system you choose will go through the player’s input word by word, matching each to an element of its world model - either an action, or an object, or a grammar token - and end up with an understanding of what the player intends to do, and then - based on the hard-coded rules of the world model - provide appropriate output (or at least it’ll try! This is obviously hard).
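
Here’s a toy sketch of that word-by-word matching (real Inform or TADS parsers are vastly more elaborate; this is just the shape of the thing):

```python
# Grammar and vocabulary, hand-coded by the author/system.
VERBS = {"take": "taking", "get": "taking", "drop": "dropping"}
OBJECTS = {"bread": "loaf-of-bread", "lamp": "brass-lamp"}

def parse(command):
    """Walk the input word by word, matching against known vocabulary."""
    action, noun = None, None
    for word in command.lower().split():
        if word in VERBS:
            action = VERBS[word]
        elif word in OBJECTS:
            noun = OBJECTS[word]
        # a real parser would complain here about unrecognized words
    return (action, noun)  # a structured command, not a string of prose

print(parse("take bread"))  # ('taking', 'loaf-of-bread')
```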

ChatGPT, by contrast, doesn’t go through input word by word to match the input to component parts that it has some modeled understanding of. It looks at the glob of text, compares it to other globs of text it’s seen elsewhere on the internet, and runs a probabilistic model to schlorp together the globs of text that usually seem to follow those first globs of text (this will sometimes lead to plausible results and sometimes sillier ones, but it generally won’t straight-out fail).

This is very impressive, and it’s very powerful - but it’s not at all what a parser is doing even if it can sometimes look similar.

3 Likes

Opinions about whether or not ChatGPT is actually “any good” are for another topic. This is about the possibility of incorporating its abilities into IF parsers.
IF parsers don’t “understand” user input any better than ChatGPT. They have to be told what words are verbs or nouns. They have to be told what to do with that information.
I fully realize parsers don’t “understand” user input. But what other word is there for it? It’s a useful term, and the only alternative I can think of is “interpret,” which people complain about too, since parsers don’t actually interpret anything either. Can’t win. These are words of convenience, for lack of better ones. And it IS arguable that an AI actually does “understand” things far, far better than your standard game parser.
IF parsers were programmed by humans too, and they were only “trained” on whatever vocabulary and grammar the humans who programmed them thought to include. As such, I can’t think of a single IF parser that can understand typos or idioms, for instance, unless such are explicitly programmed into a game by its creator.
I, of course, would like to bring others over to my point of view. Duh. lol.
I don’t see any negatives regarding incorporating CGPT’s abilities into IF parsers. I don’t get the … resistance … to it. Everyone seems to be like, “Oh, in reality ChatGPT sucks. It doesn’t do anything special.” um … Excuse me? Which of us here are professionals in the AI field? Such opinions are uncalled for, being entirely personal and tinged with a bit of “sour grapes.”
ChatGPT and its ilk are inarguably groundbreaking. That is not up for debate or discussion. “I don’t find it impressive” is irrelevant.
This is about the usefulness of its abilities to “understand” user input, and incorporate that into IF parsers.

2 Likes

Like I said above, the key is that ChatGPT is not trying to understand the user’s input. It has no model of the world—the part of a parser game that knows “the keys are in this room and the apple is held by the player and that apple is edible” and so on.

ChatGPT’s goal is to produce output that’s a plausible continuation of what the user typed. That’s its one objective. And it’s generally very good at that objective!

But “plausibility” to ChatGPT is entirely about statistical distributions. Remember the example about TAKE BREAD, which could plausibly produce either “you take the bread” or “there’s no bread here”? ChatGPT only cares about which of those strings of words is more likely, given what it knows about the text humans like.
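
If you wanted to caricature that in code, it’s something like this (the numbers are completely made up):

```python
# ChatGPT's "decision" is a ranking of continuations by likelihood.
# There is no step where it checks whether the bread actually exists.
continuations = {
    "You pick up the loaf of bread.": 0.46,   # made-up probabilities
    "You can't see any bread here.": 0.41,
    "The bread sings to you softly.": 0.0003,
}
best = max(continuations, key=continuations.get)
print(best)  # whichever string is statistically likelier wins
```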

We’re trying to tell you that ChatGPT does not produce output that’s useful for interactive fiction. ChatGPT is a specific model that takes text and produces a plausible continuation of that text. But the goal of interactive fiction, in general, is not to produce a plausible string of words: it’s to produce a plausible string of words that matches an internal world model. ChatGPT, as it currently exists, does not have (and cannot have) any sort of world model like this. That’s just not part of how it functions.

Could one design a ChatGPT-esque artificial intelligence that does have a world model like this? Sure. That’s what zarf and I mentioned above. But the output of ChatGPT is not a thing that’s generally useful for interactive fiction, because ChatGPT does not, and cannot, know about the model that underlies a piece of IF.

4 Likes

A parser doesn’t know anything about world models either. That’s the rest of the interpreter. A parser is only about parsing (thus its name) the user’s input. That’s it. It takes a sentence and breaks it down into a command the game “knows.”
I’m trying to tell you I am not talking about its output. That’s irrelevant.

That’s nonsense. How much have you used it? It answers questions quite well. It can “interpret” a user’s input quite well. Surprisingly so. Even completely out of context. Including indeterminate abbreviations. It doesn’t simply “create a plausible continuation of what the user typed.” You’re thinking of Eliza. And CGPT is not simply a “glorified Eliza” as I’ve heard some people claim.
What I’m hearing you guys say is “Oh, CGPT really isn’t that impressive and its output abilities are useless to IF.”
I couldn’t care less about its output abilities in regards to this. (Altho, incidentally, given a proper prompt it can write a damn nice room or character description)
To use that TAKE BREAD example, what I’m talking about is CGPT’s ability to take “see that bread there? i’d like to grab it” as TAKE BREAD, or just about any other obscure roundabout way you can think of to take some bread.
THAT is what I’m talking about. Not to mention its ability to correctly “interpret” TKAE BERAD or other typos.

Good luck with that. :v:

2 Likes

I may not be doing AI research professionally, but I have a degree in computer science focusing on artificial intelligence and natural language processing, and one thing I can say with a great deal of certainty from that training is that you should always specify what your model is trying to do: what its inputs and its outputs are. If I criticize ChatGPT because it can’t make art like Stable Diffusion can, your response would presumably be that ChatGPT isn’t designed to produce art; it’s designed to produce text.

The single most important thing about a statistical learning model, in my opinion, is its input and its output. Because, well…that’s what a statistical learning model is. The vast majority of them are black boxes that are trained to turn a certain type of input into a certain type of output. If you’re evaluating how useful one of these models is for a certain purpose, its input and its output are extremely relevant!

ChatGPT, like all statistical learning models, produces a specific type of output. That’s not a criticism of it, that’s just a fact. ChatGPT as it currently exists is not useful as an interactive fiction parser for the same reason that Stable Diffusion isn’t: its output is not the sort of thing we need from an interactive fiction parser.

But ChatGPT doesn’t convert “see that bread there? I’d like to grab it” into some representation of “the action of taking the bread”, which is what an interactive fiction parser should do. Instead, it produces a plausible continuation of that text. (Hence the G in its name, for Generative: its job is to produce text that could reasonably come after its input.)
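
The difference is easiest to see as input/output signatures - a hypothetical sketch, not real code from either system:

```python
# What an IF parser does: text in, structured action out.
def if_parser(player_input: str) -> tuple:
    """'see that bread there? i'd like to grab it' -> ('take', 'bread')"""
    ...

# What a generative model does: text in, more text out.
def generative_model(prompt: str) -> str:
    """'see that bread there?...' -> whatever text plausibly follows it"""
    ...
```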

6 Likes

To most people an IF parser is a “black box.”

Sure. I’m not saying anything about its internal workings. I’m talking about its inputs and outputs—that is, the only things that matter about a black box.

A machine learning model that can transform “I’d like to take that bread over there” into some useful representation of [Player, Take, Bread] could be very useful as a parser for interactive fiction. But that’s not the output that ChatGPT produces.

1 Like

Here is an interaction I just had that I think demonstrates it can do more than you think:

ME: in the context of an interactive fiction parser, what would “Do you see that bread over there? I’d like to have it” mean

CGPT: In the context of an interactive fiction parser, “Do you see that bread over there? I’d like to have it” would likely be a command or statement by the user indicating that they would like their character to take or acquire the bread that is described as being nearby. The parser would then need to determine if the bread is in a location that the character can access, and if so, update the game state to reflect that the character now possesses the bread.

ME: break that down into a two word command

CGPT “Take bread”

ME: there is an IF game situation as follows: There is a treasure inside a locked glass case. The player has a key and a hammer. The case is in the living room, and the player is outside of the house. Between the player and the living room is a closed door and the kitchen.

CGPT: How would the player use the key and the hammer to open the glass case?

The player could use the key to open the door to the house, then use the hammer to break the glass case and take the treasure.

ME: the key is for the case. the door is unlocked

CGPT: The player would go into the house by opening the door, then use the key to unlock the glass case and take the treasure.

ME: why not the hammer?

CGPT: The hammer would not be necessary in this situation, since the key is specifically designed to open the lock on the glass case. Using the hammer to break the case would likely cause damage and might not be the best way to access the treasure. It’s more efficient and less destructive to use the key that is made to open the lock specifically.

ME: Take this player’s command input and turn it into a series of basic two word commands. This is from the initial situation: Go to the living room and open the case to get the treasure.

CGPT:
Go livingroom
Open case
Get treasure

ME: what about opening the door

CGPT: If the player is outside the house and the door is unlocked, opening the door would not be required as part of this command input.

The series of two-word commands would be:

Go livingroom
Open case
Get treasure

However, if the door was locked, an additional command would be needed before the above commands:

Open door
Go livingroom
Open case
Get treasure

ME: ‘go livingroom’ is not a valid command. directions are required such as ‘go north’ or ‘go down’. I did not give you any diretional information. So here it is: The door is east of the player and leads directly into the kitchen. the kitchen is south of the living room. now regenerate the series of basic two word commands

CGPT:
Go east
Go south
Open case
Get treasure

2 Likes

Right, this is the same confusion - ChatGPT isn’t “translating” anything, you’re inventing a step that the program doesn’t actually perform. This isn’t a defensive comment on how smart the program is, just a straight description of how it works. Traditional IF parsers typically rely on a small set of hand-coded definitions to be able to respond to player input, and do lack the flexibility that you note ChatGPT has in responding to anything the user types - they’re not AI by any definition! But they do translate input into some internal representation that we can approximate as “understanding meaning”.

Like, apropos the thing you mentioned about how well it responds to questions, I’m a lawyer by training and one of the things I’ve seen people do is ask it silly law questions - like, “explain the statute of frauds in the style of an orc war boss talking to his troops.” And it will provide a nicely-formatted vignette of an orc going “listen up you maggots! You gotta know the statute of frauds! It governs when oral contracts ‘r enforceable!” And then go into details. This is amazing (and funny!)

But if you know anything about the law, the details are all completely wrong - it gets confused about the number of categories the statute of frauds applies to, gives examples that contradict each other, and generally makes a dog’s breakfast of things because it’s just putting one word after another without any internal representation of what the statute of frauds actually is.

By contrast, I am not a very good programmer, but I could write a program to answer law questions that would just do a regex match to see if the user’s input contains the phrase “statute of frauds,” and if it does, kick out a couple of canned paragraphs I’d written about it. This is not very impressive and could not handle the orc war boss thing. But it has an internal representation of the statute of frauds (and my program would help you pass the bar exam, while ChatGPT really won’t).
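
Something like this sketch, say (hypothetical code, with a placeholder where my hand-written summary would go):

```python
import re

# No language model anywhere: just a pattern match and canned text
# that a human who actually knows the law wrote in advance.
CANNED_ANSWERS = {
    r"statute\s+of\s+frauds": (
        "The statute of frauds requires certain kinds of contracts to be "
        "in writing to be enforceable. [...my hand-written summary here...]"
    ),
}

def answer(question):
    for pattern, text in CANNED_ANSWERS.items():
        if re.search(pattern, question, re.IGNORECASE):
            return text
    return "Sorry, I only know about the statute of frauds."

print(answer("Can you explain the Statute of Frauds?"))
```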

Again, this isn’t like taking sides against the thing - even though it’s just a dumb parlor trick now, I think AI tools like this are going to have really significant impacts all across society. It’s just explaining - in a much less sophisticated way than the folks who’ve actually tried to do the stuff you’re talking about - why I think we’re talking past each other.

5 Likes

Ah, this makes more sense! What you’re talking about here is the same thing Daniel and Zarf were talking about (with the challenges the latter mentioned in this post).

Actually IF parsers do use the world model. Inform, for example, will use the world model to identify which words should be recognised as object names. It determines which objects are referenced by pronouns by testing other state variables. When there’s an ambiguity as to which object might be referred to, it uses the world model and game rules to determine which objects make the most sense with which verbs. And conjunctions like “and” can either be used to give multiple commands in one go, or can separate objects, or can be part of object names. IF parsers can understand garden path sentences because they’re using the world model and other state to identify various phrase structures and make sense of what the player typed.
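
A toy sketch of that kind of world-model-driven disambiguation (purely illustrative; nothing like Inform’s actual internals):

```python
# Two objects match the word "key"; scope and verb rules pick one.
objects = [
    {"name": "brass key", "location": "table", "portable": True},
    {"name": "key lime pie", "location": "fridge", "portable": True},
]
reachable = {"table"}  # locations the player can currently touch

def resolve(noun, verb):
    candidates = [o for o in objects if noun in o["name"]]
    if verb == "take":  # verb-specific filtering, e.g. only portable things
        candidates = [o for o in candidates if o["portable"]]
    in_scope = [o for o in candidates if o["location"] in reachable]
    if len(in_scope) == 1:
        return in_scope[0]["name"]
    return None  # a real parser would ask "Which do you mean, ...?"

print(resolve("key", "take"))  # 'brass key' -- the pie isn't in scope
```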

3 Likes

It seems like you’re talking about the entire game logic being the parser. I’m only talking about the part that breaks user input down into basic commands the rest of the game can then “understand” and make use of.
The parser isn’t the entire game engine; it’s just the part of the engine that directly deals with and “interprets” user input. The rest of the game deals with the world model. I think there’s some confusion here; we’re talking about different things.