Why can't the parser just be an LLM?

jkj_yuio · August 24, 2023, 9:25pm

Ok your problem really is that no one has come up with a general system of knowledge representation. Some years ago I was researching this, only to discover that not much has been invented since the 60s. There are a number of systems kicking around today, like OWL and RDF-based things with labels and tuples. They all inevitably get messy.

And they are all laughably rudimentary when it comes to doing anything outside of a limited domain.

And to be clear, I don’t mean to cast aspersions on your work or to undermine or belittle what you’re doing. I just don’t think there’s been much progress in this area, period.

About 5 years ago I started building a language called KL, which stood for “Knowledge language”. The idea was to make a system of knowledge representation together with a programming system.

I developed a bunch of theory and came up with two things:

(1) a novel data structure called a “trist”
(2) a set of knowledge operators that acted on trists.

A “trist” is both a kind of list and a kind of tree at the same time. It’s quite organic in nature and bits of it can “bud out” or “elaborate” as I call it. This property is important because:

when you start accumulating knowledge into a trist you have no idea in advance which parts will become huge and which parts will remain a handful of terms. In all cases it is essential that the data structure remain efficient for storage size, modification and retrieval.

And instead of an API, you have (2) a set of operators that modify a trist in place. That last bit is important because your trist could be enormous!

I can talk more if interested, but probably need a separate thread.

Jeff_Nyman · August 24, 2023, 9:54pm

Entirely! I get it and certainly take no offense. My stance would be a bit opposite to yours but not entirely orthogonal. There’s been progress but it’s been way too specialized. It’s much like the early era of computers and computing: very specialized machines that could do one thing and do it fairly well. But it was just that one thing. Eventually there was democratization and generalization – but that required a great deal of experimentation. And it was only by experimentation that we could generalize and then democratize. An interesting feedback loop! I believe AI is in that phase right now. (I talk about that a bit in Keeping People in Computing.)

With the increased democratization, even if it’s by exposing more models to the public, I do think we’re at least poised to make better inroads.

Until that happens, I do agree with you that things remain locked into relatively limited domains. But, already in the wider game industry for example where I do a lot of contract testing, I’ve seen the dissemination of more generalized AI-based models. Here I think the reinforcement learning aspects have actually been the most impressive.

In terms of ontologies in general, outside of the game sphere, the work in materials science, particle physics, and molecular analysis is light years beyond where it was even a few years ago. When I worked with Citrine Informatics back in 2019, as just one example, we couldn’t do a thing with rolled homogeneous armor, polyethylene fibers, and boron carbide (military contract, obviously) to balance thermal conductivity, corrosion resistance and deformation resilience in a cost effective way. Now I can run models that handle those and more on my laptop.

Ontology engineering, I would argue, has come if not a long way, at least a less short way as well. Pellet, HermiT, the rules-based TrOWL, etc.

So bringing this to the title of the post – since I fear I derailed things – “Why can’t the parser just be an LLM”? I think those interpretation layers and ontologies (along with API) are part of the reason why. The LLM can serve as one component but modeling actions, events, and conditions is crucial and that has to work seamlessly with the parser.

Bainespal · August 25, 2023, 12:32am

I want to further discuss the abbreviated commands that we all probably use, because these are relevant to the concept of natural language processing and the longstanding ideal of a conversational exchange with the computer, which the LLMs are approximating.

We on this forum are self-selected keyboard slingers. We don’t mind reading and typing; many many people would find speaking to be a much more intuitive and pleasurable method of input, and such a preference is very explainable, because written text is already the first layer of interpretation. Writing systems are absolutely the most basic and foundational information technology. It is perhaps because of this fact that the idea of writing text into the machine makes so much sense.

Yet speaking in the freedom of unmediated language is the most idealized form of computer interaction that we have always seen depicted in science fiction. I almost want to try rigging up either speech-to-text or a voice assistant to try speaking commands into a text adventure game. Almost–like I said, I’m a keyboard slinger and my free time is over now that I work full time in a clerical healthcare job. But even though I never tried, I am quite certain that I would never enunciate the syllables, “double-yoo” in order to literally input >W into an IF parser to ask it to move the player character westward. I’m confident that I would say, “Walk west.” I would almost never type >WALK WEST; when I play text adventures, I type >W, but if I were to play by speaking input, there would be no more effort and considerably more pleasure in speaking out the full, linguistically correct command phrase. Likewise, I imagine that I would often but not always include articles when speaking commands, although I omit them when typing.

Why does this matter, since I’ve been asking about using LLMs in our keyboard-defaulting garden? Because ultimately parser commands are abstractions of conversational instructions in spoken language, and because the science fictional ideal that AI is bringing us toward has always shown us that, in theory, interacting with computers could happen without codes or even textual writing. As fiction and as a digital personality, the IF parser is always kind of pretending to be an actor in a natural conversation. I can’t feel that this approach toward the ideal, on which a major facet of our genre is based, is merely tangential.

mathbrush · August 25, 2023, 12:41am

I’ve actually tried a game recently with a voice-based parser and that requires full sentences:

It may be worth trying to see your impressions! It was entered in Spring Thing in 2022.

Bainespal · August 25, 2023, 12:44am

Hey, thanks for the intriguing perspective. I couldn’t pretend to understand your work, but I appreciate and understand that LLMs involve an entirely new workflow. It makes sense that the data model would have to change to fit the different ways that an LLM would handle input. I appreciate the work of game devs and insanely brilliant digital analysts.

DeusIrae · August 25, 2023, 1:18am

This all seems right to me. At the same time, having seen the comedy of errors that ensues when I try to get our Google Home to play a song where there’s even the slightest ambiguity, much less when my accent-having in-laws attempt it, or when my wife is watching tv or on the phone in the other room, I don’t think we’re sufficiently close to the sci-fi ideal to abandon the humble W yet!

cchennnn · August 25, 2023, 2:21am

A better question might be, why haven’t IF/text adventures used any of the advancements in natural language processing since the 80s/early 90s? And the experiments that did use NLP were generally not so well received. Like mathbrush has said here, I think the answer is that for the core IF/text-adventure audience, natural-language interfaces do not necessarily make games more fun to play.

I feel like people too often ignore the developments in NLP before the rise of LLMs or deep neural networks. Conditional random fields were pretty good at parsing natural language into trees [1][2]. There has also been a lot of work on mapping natural language queries to ontologies or knowledge bases [3][4], or to robot planning systems [5][6] (I am not an expert in any of these fields; these are some representative citations I got by randomly searching). It’s not a stretch to imagine this being applied to, say, the Inform world model.

So why are the popular parser systems still using 80s technology? There was this one quote from Brian Eno, “Whatever you now find weird, ugly, uncomfortable and nasty about a new medium will surely become its signature”. But also, pure “natural” language is inefficient for computer input. I think even if parsers could interpret natural language and connect it to the world model, and new players were trained in them, as players gained more experience they would eventually use an abbreviated code anyway, just to reduce the effort of typing. That’s why keyboard shortcuts exist even though it’s “easier” to use GUIs. Also, the existence of a shared code makes communicating with other players easier.

The games submitted to IFComp that tried to use natural language parsers have usually done poorly. The highest placing recently might be Thanatophobia from IFComp 2022, which tried to approximate a conversation rather than being a traditional parser game.

inventor200 · August 25, 2023, 2:25am

This is great if your vision of an elevated parser can still handle keyboard inputs and all the shortcuts and shorthands that typically come with it.

Some of us have a disability that makes voice commands extremely difficult to use. Sometimes I am not capable of spoken communication, and other times it’s something I find incredibly difficult and taxing.

I need every single computer interface to allow me to interact silently, as an accessibility requirement.

Also, even on days where I can speak, I get a sharp spike of anxiety whenever I have to use voice commands, and this often becomes paralyzing.

I was actually unable to set up orders for my medications, because the local pharmacies temporarily removed their dial-in options over the phone, and did not support my medications through their website or app. If I had to guess, it was probably because they had assumed everyone would rather work through their shiny new conversational interface.

I had to sign papers to allow someone else to call for me. It felt absolutely awful to have to resort to that, and really cut down my self-esteem. Eventually, the dial-in options came back as an alternative, and I regained that part of my independence.

So while I understand that sci-fi typically advertises some future Star Trek “ideal” of spoken computer interfaces, the Star Trek writers apparently hadn’t heard of people like me, and my disability is not ultra-rare either; we just go ignored.

Kayne_agent · August 25, 2023, 2:55am

Sorry if rehashing something that’s been discussed to death, but one of the major things I find myself coming back to when thinking about LLM AI is that in a game, the rules matter.

It’s been said by much better scholars than I that that’s the difference between ‘play’ and ‘a game’. Consistent rules, in the form of logical and physical rules, are essential to a consistent gameplay experience and part of what makes a player skilled at a game is an understanding of how the game’s world model and features interact, and how the player can use it.

But LLMs like ChatGPT have no framework that constructs or enforces a logically consistent response, only a grammatically consistent one. They have no world model at all, only prompt text.
If the game author tells the LLM that the elephant weighs 6 tons and the player tells the LLM “Well, I pick up the elephant and put it in my pocket anyway.”, it will accept the player is capable of such a thing, which might be fun as ‘play’, but doesn’t make for a good ‘game’.

Even when you start with a world state defined by the author, if the LLM can update the world state with its own output to keep track of changes, it can quickly become a feedback loop that spirals into less and less sense - building its new inputs off flawed outputs.
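To make the elephant example concrete, here is a minimal sketch of the kind of rule enforcement a world model provides and a bare LLM does not. Every name in it (`Thing`, `Player`, `try_take`, the 10 kg carrying limit) is hypothetical and invented for illustration; it is not how Inform or any other IF system actually implements this.

```python
from dataclasses import dataclass, field


@dataclass
class Thing:
    name: str
    weight_kg: float


@dataclass
class Player:
    carry_limit_kg: float = 10.0  # assumed limit, purely illustrative
    inventory: list = field(default_factory=list)

    def carried_weight(self) -> float:
        return sum(t.weight_kg for t in self.inventory)


def try_take(player: Player, thing: Thing) -> str:
    """Check the author's rule first; only change state if it passes."""
    if player.carried_weight() + thing.weight_kg > player.carry_limit_kg:
        # The rule wins, however insistently the player phrases the command.
        return f"The {thing.name} is far too heavy."
    player.inventory.append(thing)
    return f"Taken: {thing.name}."


player = Player()
elephant = Thing("elephant", 6000.0)  # roughly 6 tons
lamp = Thing("lamp", 1.0)

print(try_take(player, elephant))  # refused, inventory unchanged
print(try_take(player, lamp))      # accepted
```

The point of the sketch is only that the state change happens after the rule check, in code the author controls, so no phrasing of the input can talk the game into pocketing the elephant.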

pinkunz · August 25, 2023, 3:09am

Forgive me, but I don’t know if I understand the end goal here. I see posts experimenting with AI code, AI prose, AI puzzle design, AI world model design, AI dialog generation, AI cover art, AI plotlines, AI game summaries, etc., etc.

Ummm, if all of this gets figured out flawlessly, what exactly are we doing? Are we making games anymore or simply ordering content set to our preferences?

I find making games both a personal challenge and a creative outlet. If both the challenge and the creativity are done via AI, I’m not sure why I’m here.

zarf · August 25, 2023, 3:32am

Well, it could be that only some of the experiments succeed.

pinkunz · August 25, 2023, 3:47am

Agreed, but I was speaking more regarding the intent honestly. Most AI proponents seem fairly certain all obstacles will be eventually conquered. So, if that’s the world view, what’s the end game?

inventor200 · August 25, 2023, 3:53am

Wow, as usual, I wrote a lot more than I expected to.

And even if the world state is created and maintained by the author, and the LLM just translates inputs into the closest author-defined command (which we enter into parsers nowadays), there are still times where similar commands are not synonyms, and the game (using an author-defined starting state and author-defined rules) will send back something that has a wildly different outcome than what the player expected.

Of course, this already happens, but that’s because of an error on the author’s side of the game, either from implementation or communication. The extreme problem space that an LLM is built to handle would be overwhelming for the author, however, so I’m not sure if the author would be prepared (or could be expected) to handle every error that might create an unexpected mismatch of user input to command. Modern parsers are already close enough to driving authors mad when it comes to unexpected inputs and exploits by the player.

Imagine the bug complexity if an LLM-based input translator sat atop the modern, human-written command-handling layer of a game. If the LLM is placed any deeper than that, you risk your game world losing coherency during gameplay.
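One way to picture an input translator sitting on top of the command layer: the model only proposes a command, and the game accepts it only if it belongs to the author's closed command set. The sketch below is an illustration under stated assumptions, not a real design; `translate`, `CANONICAL`, `ABBREVIATIONS`, and the `propose` callable (standing in for whatever model call would be used) are all hypothetical names.

```python
from typing import Callable, Optional

# Author-defined command set; anything outside it never reaches the game.
CANONICAL = {"go west", "go east", "take lamp", "push button", "press button"}

# Traditional shorthands bypass the model entirely.
ABBREVIATIONS = {"w": "go west", "e": "go east"}


def translate(raw: str, propose: Callable[[str], str]) -> Optional[str]:
    """Return a canonical command, or None if the input cannot be mapped."""
    text = " ".join(raw.lower().split())  # normalize case and whitespace
    if text in ABBREVIATIONS:
        return ABBREVIATIONS[text]
    if text in CANONICAL:
        return text
    candidate = propose(text).strip().lower()
    # Validate the model's suggestion instead of trusting it.
    return candidate if candidate in CANONICAL else None


def fake_model(text: str) -> str:
    """Stand-in for a model call; a real one could return anything."""
    return {"walk west": "go west"}.get(text, "fly to the moon")


print(translate("W", fake_model))
print(translate("Walk west", fake_model))
print(translate("pocket the elephant", fake_model))
```

Note what this check does not catch: if the model maps an input to a valid but wrong command (say, "push button" where the author treats "press button" differently), the validation passes and the player gets an outcome they did not expect, which is exactly the mismatch described above.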

I also still think there is game design value in having specific distinctions between two similar commands for the player’s gameplay experience.

Also, I’m curious about how people are imagining this bit, in particular:

Where, exactly, are we storing the LLM part of this LLM-powered parser? On the player’s machine? Aren’t these rather large? Wouldn’t this create problems for mobile? If we’re putting the LLM on a server somewhere, is this not open to various network security concerns? Would we require the player to create an account to play the game, to solve some of these security and accessibility concerns? If someone lives somewhere that has poor internet, or their life situation doesn’t allow them frequent internet access, then are we just locking these potential players out of games which use this parser?

I feel like while an LLM theoretically could fix some of the parser difficulties, I’m afraid it might actually exacerbate many other issues. It might turn out to raise some unexpected entry/difficulty barriers for players (and authors) alike, so that lowering barriers linguistically wouldn’t actually have a noticeable impact. As a result, it might be relegated to a niche technology for a handful of games, similar to how Vorple is a powerful framework, which still presents enough technical and accessibility challenges for authors and players to give some pause.

In the relatively-short time I’ve been here (even just hanging out on the TADS side of the community), the majority of Inform games being released seem to still just be standard Inform (if I’m just focusing on one parser system, in particular).

And sure, Twine games run from a webpage, but it seems like a player can download 99% of those and run them locally; the webpage aspect is just how it’s deployed and played.

At best, it might help create some on-ramp games for brand-new players, but I think the game’s design might need to be unexpectedly simple to make sure the LLM’s first impression is free of most bugs. From there, though, the problem remains that the majority of parser games simply don’t work like that, and have enough coherency to support quite a bit more complexity. This might require a second on-ramp.

Of course, all of this is speculation, but I’d like to think I’m at least informed.

Outside of IF, there is a lot of gaming controversy around rising (and potentially-predatory) download sizes of AAA games, and the slow march toward nobody truly “owning” any of the games they play, which requires increasing degrees of internet access and account information to continue enjoying their game libraries.

I’d like to think IF could continue to be a safe space from this corporate theme, and not stumble into joining these trends in the future. I’m not sure if an LLM could meet this standard. If it’s not meant to, then I’m unsure it could be the “future of parsers”.

On an additional point:

Agreed, both as an author and a player.

TL;DR: As someone who’s been following the developments of LLMs (from the perspective of the software industry), it’s finally being discovered that they might simplify some problems, but they actually make other problems unbelievably more complex and unmanageable, often in wildly-unexpected and subtle ways that humans simply cannot clearly understand, or immediately detect. No matter where on the technology stack you place an LLM in a parser game, it just seems to create different arrays of potential problems.

DeusIrae · August 25, 2023, 4:04am

I think there are three or four different ideas folks have, and they wind up getting all fuzzy and lumped together in casual discussion, as is the way of things:

1) Using these tools to create new gameplay experiences. Thanatophobia and Cheree I think set a strong standard here, providing a proof of concept for a chatbot type interface for an adventure game with some state tracking and a goal. These are games that couldn’t exist using traditional parser or choice systems and mechanics, and I think open up some potentially interesting new IF design space.
2) Using these tools as a shortcut for traditional game development. I’m not sure how much these will increase accessibility until we see several orders of magnitude more sophisticated tools – the effort required to learn enough Twine or Inform to debug the code ChatGPT spits out is probably about the same as the effort required to learn enough to write games in the first place – but there are some places where I could see them providing some labor-saving improvements. Like, my next Inform game will involve a lot of struck-through text, which from my understanding requires doing some Unicode stuff on a per-character basis to ensure maximum interpreter compatibility. I’ll probably wind up writing another Inform program to create output that I can then plug into the main program, but it’ll be annoying – it would be nice to be able to just ask ChatGPT to do that grunt work for me!

2.5) Using these tools to generate art for text games, which is conceptually similar but raises different issues since here it’s usually less about a developer saving their own time, and more about them being able to generate images when they otherwise might lack the skill to do so on their own.

3) Using these tools to fill in the blanks in designs. Authors often use procedural generation in their works, leading to gameplay that can surprise even them, and of course folks have used things like Tarot cards or random tables to spur their creativity since time immemorial, so the basic idea makes sense I guess, but it feels to me very easy to take this too far and wind up with the bowl-full-of-oatmeal problem (witness the Fortuna – or better yet, don’t).
4) Using these tools to replace or supplement the traditional IF parser, I think usually with an eye towards greater accessibility or functionality. As others have discussed in this thread and the previous ones, I’m pretty skeptical of some of the premises here - often the idea seems to be, if we just make the text parser friendlier to new players, there’ll be a giant parser-IF renaissance! But a) these approaches might make a parser more powerful, but they could also easily make it much harder to use; b) over the past 15-20 years, there’s been a ton of work done to try to make parsers friendlier, and none of it seemed to lead to many more players or made games dramatically friendlier; and c) it’s significantly easier to learn to play a parser game than it is to play Elden Ring, yet the latter has sold tens of millions of copies, so I’m unconvinced that that’s as much of a barrier to entry as folks suppose.
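As a rough idea of the per-character Unicode work behind the struck-through text mentioned above, one common trick is to follow each character with the combining long stroke overlay (U+0336). This is a sketch in Python rather than Inform, `strike` is a hypothetical helper name, and whether a given interpreter renders combining characters properly would still need testing.

```python
STROKE = "\u0336"  # COMBINING LONG STROKE OVERLAY


def strike(text: str) -> str:
    """Append the combining overlay after every non-newline character."""
    return "".join(ch if ch == "\n" else ch + STROKE for ch in text)


print(strike("struck through"))
```

The output of a helper like this could then be pasted into the game source as a literal string, which is the "generate it with another program" workflow described above.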

I’m sure there are other goals too, but that’s basically the taxonomy I carry around in my head.

Michael.Penner · August 25, 2023, 5:02am

Just a few random thoughts on this as an LLM novice. My experiences with ChatGPT as an adventure author have been pretty good. Once I climbed the learning curve I was able to harness it to produce content/prose that was tight, elegant, and FAR better than anything I could come up with (I’ll sadly admit), although I rewrote the output and believe in doing so it further improved. It seems like the LLM is now capable (or will be) of generating compelling prose (either directly or with a human rewrite) that may rival even the best wordsmiths, especially when GPT-4 is taken into account. Knowing how to query it is key because it does tend to ramble. I learned that the hard way.

As other people have said, the ambiguities and probabilistic nature of the LLM, as well as its lack of world state, would seem to make its use as a parser problematic. While IF parsers are old technology there’s no reason they can’t keep evolving and improving. It’s long been known that the v2 Infocom parser used in those final games was superior to the original, yet, if I understand correctly, it was never widely adopted. That would probably be a better place to start when looking to build the “ultimate parser”.

I also spent some time a while ago investigating the whole idea of LLM as a “story teller”. At least as of GPT 3.5, it fails pretty quickly at that as well. At first the story is coherent, but then it starts to disintegrate as the LLM goes off the rails. Maybe this doesn’t happen with GPT-4, not sure.

There is the possibility the LLM could produce games in an IF language if fed enough IF-specific input, but I have had mixed luck with it as a code generator. I’ve caught it lying through its digital teeth when queried about stuff like regular expressions, and almost all the C# code it produced for me needed to be hacked to get it compiling.

So, for me, its utility is as a prose generator, not much else. Maybe newer LLMs will fix some of these shortcomings.

TheGrandRascal · August 25, 2023, 8:21am

O.K., I give up : What is “LLM”?

borg323 · August 25, 2023, 8:25am

TheGrandRascal · August 25, 2023, 8:27am

Eeeeek! I thought I was the ONLY person who had read about SHRDLU!!!

Does anyone know whether it is still available someplace? (Maybe someone ported it to Windows?)

Lancelot · August 25, 2023, 9:07am

SHRDLU I also know about… But then again I am old enough lol. I found his book “Language as a cognitive process, Volume 1: Syntax” quite interesting, where he used a chart parser to parse English sentences. I got quite far building a parser this way, until I discovered he had abandoned this approach because Volume 2 never got published.

Looks like there is a Windows program you can play with… I cannot verify it myself (on a company laptop and no installs allowed), but this site mentions a Windows text-only console version:

SHRDLU (stanford.edu)

rovarsson · August 25, 2023, 11:04am

Uhm… No room for the dried apricots?

Seriously, what is this oatmeal problem? A logical paradox, a thought-experiment, a metaphor,…?