Use of LLMs in custom interactive fiction engines

So, there have been several posts on this board, but they’re years old now, and I can’t seem to find anything recent. Has anyone integrated a local generative LLM into a game engine? I’m talking about using an LLM to interpret player input into something the parser can easily break down, or to give some life and variety to conversations with NPCs.

There have been various experiments in this vein, but none have been particularly well-received on this forum. (Look at last year’s Parsercomp for some examples and the controversy around them.)

I think there’s potential in preprocessing player input, like you suggest, but I’ve never seen an implementation of the concept that really impressed me. And using LLMs for NPC dialogue runs into the problem that nobody really wants to read LLM-generated text.

Running an LLM locally on the platforms where IF is expected to run is going to be very difficult if you’re also hoping for anything useful.

SO… I’m glad I got some responses from the community on this. I’ve spent a day playing with a custom LLM-based game engine I wrote and designed and I have some thoughts.

As a software developer, I think it’s undeniable that AI is the future of code. Computers are becoming vastly better at writing software than humans are. (Crud, if you wanna talk about compilers, they have been for decades. Though LLMs are different, I get it.)

There may be something to using an LLM as a preprocessor, but in my opinion it is too slow. Most users (game players) would rather TYPE LESS than communicate with a parser using commands like “so hey go north already!” or “um, why don’t you tell me about the green key.” Right now, an LLM can already break this down into something the parser can understand, but good gravy, it slows the game down. You’ll be sitting there thinking: in a real game, I would literally just type A KEY rather than type a bunch of extra words.
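One common way to limit the speed cost described above is to try the ordinary parser path first and only call the model when that fails. The sketch below assumes a hypothetical `llm_rewrite` callable (any function that sends text to a model and returns a command string) and a tiny made-up verb list; it is an illustration of the trade-off, not anyone's actual engine.

```python
import re
from typing import Callable, Optional

# Hypothetical, deliberately tiny vocabulary for illustration only.
DIRECTIONS = {"north", "south", "east", "west", "up", "down"}
KNOWN_VERBS = {"go", "take", "drop", "examine", "pull", "open", "ask"}


def fast_parse(text: str) -> Optional[str]:
    """Return a normalized command if the input already looks like one."""
    words = re.findall(r"[a-z]+", text.lower())
    if len(words) == 1 and words[0] in DIRECTIONS:
        return "go " + words[0]          # bare direction, e.g. "north"
    if words and words[0] in KNOWN_VERBS:
        return " ".join(words)           # already verb-first, e.g. "take key"
    return None


def preprocess(text: str, llm_rewrite: Callable[[str], str]) -> str:
    """Cheap path first; call the (slow) model only when the fast path misses.

    `llm_rewrite` is a hypothetical stand-in for whatever model call an
    engine would use to turn free-form input into a parser command.
    """
    command = fast_parse(text)
    if command is not None:
        return command
    return llm_rewrite(text)


if __name__ == "__main__":
    def fake_llm(text: str) -> str:
        # Stand-in for a real model call; always answers "go north".
        return "go north"

    print(preprocess("north", fake_llm))                     # model not called
    print(preprocess("so hey go north already!", fake_llm))  # falls back to model
```

With this split, experienced players who type terse commands never wait on the model; only chatty input pays the latency.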

Now the hot take: LLMs will NEVER be good for generated content. EVER. I get that they’re getting better all the time, and that eventually they’ll have local GPU-style chips running the math at brain-numbing speed. LLMs will NEVER, EVER be good at producing words for the player to read in a game. I did a lot of testing on this with NPCs, with the idea that maybe you could use LLMs to create player interactions that would sound more natural than ASK THE MAN ABOUT THE TREASURE.

But I’m going to argue that the premise is fundamentally flawed. Just as computers are great at writing instructions for computers, humans are the only known thing in the universe that can write for humans. Writing is perhaps the most human thing we do. At its very core, writing is a human act.

When I write, I tend to use my ear a lot. Is this text readable? How’s the meter? Not that I’m any good at this, by the way. I can’t describe the process of writing, or what makes words go together. It just is.

I also can’t explain how hard it is to make an LLM produce NPC dialogue that people will want to read. The algorithm can take my words, whirl them in a blender with a bunch of statistical tchotchkes, and blast them onto the screen after a LOT of compute power is used. The computer doesn’t have an ear, period. It CAN’T approach writing by thinking about meter, or how words sound, or wit.

I also experimented with condensing content to make it easier to read. You know how, when you execute a LOOK command, the game produces something like this:

You’re in a beautiful garden, with flowers all around.

Near your feet is a small iron key.

Not far away is a trickling stone fountain.

You can see a welcome sign and a red umbrella here.
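For contrast, output like the LOOK example above is normally assembled deterministically from text the author wrote. The sketch below assumes a hypothetical data model (a `Room` with a description and a list of `Thing`s, each with an optional `initial_appearance` sentence); it is not how any particular engine implements LOOK, only an illustration that every sentence on screen comes from authored pieces.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Thing:
    name: str
    # Hypothetical field: a custom sentence that gives the object its own line.
    initial_appearance: Optional[str] = None


@dataclass
class Room:
    description: str
    contents: List[Thing] = field(default_factory=list)


def join_names(names: List[str]) -> str:
    """Join names as 'x', 'x and y', or 'x, y and z' (no serial comma)."""
    if len(names) == 1:
        return names[0]
    return ", ".join(names[:-1]) + " and " + names[-1]


def look(room: Room) -> str:
    paragraphs = [room.description]
    plain: List[str] = []
    for thing in room.contents:
        if thing.initial_appearance:
            paragraphs.append(thing.initial_appearance)  # authored sentence
        else:
            plain.append("a " + thing.name)              # naive article choice
    if plain:
        paragraphs.append("You can see " + join_names(plain) + " here.")
    return "\n\n".join(paragraphs)


garden = Room(
    "You're in a beautiful garden, with flowers all around.",
    [
        Thing("iron key", "Near your feet is a small iron key."),
        Thing("stone fountain", "Not far away is a trickling stone fountain."),
        Thing("welcome sign"),
        Thing("red umbrella"),
    ],
)

if __name__ == "__main__":
    print(look(garden))
```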

I wanted the LLM to combine all the various scenery text and object descriptions together, in a way that read maybe a bit more like a novel. Not only was this slow, but the result was UNREADABLE. I don’t mean unreadable as though it was bad English. I mean it was garbage content, poorly written and badly executed. The sentences made sense, but the words didn’t go together. Read this:

The garden blooms around you—perfumed petals and bright color everywhere. At your feet, something small and cold catches the light: an iron key. A fountain murmurs nearby, its water whispering over stone. Beyond it all, a welcome sign stands waiting, and a red umbrella tilts in the sun, like an invitation you haven’t accepted yet.

I’m LITERALLY SCREAMING on the inside at reading this.

Sucking up all the written content in the world (and at this point, that’s been tried, several times over) can’t give the computer a soul.

If you’ve read this far, sorry for the diatribe. While I’m not against experimenting with new technology, I will die on this hill: using LLMs for generating text in games is a dead end.

The ability to parse commands like that would be most useful in speech-to-text applications, IMO. People who dictate text messages into their phones, or play games on Alexa, might appreciate being able to play IF the same way.

Yeah, I get that, I guess.

As a software developer, I think it is deniable that AI is the future of code. The one thing that an LLM can never have, no matter what, is insight. They sometimes appear to have it, but when you get down to causes and conditions, the thing they come up with is always part of the input.

I believe we call AI insight “hallucinations.”

Despite having big reservations about AI, I do believe that genuine artificial intelligence is the future of programming, and nearly all other jobs too.

Exactly how it all plays out is going to be “interesting”, and it could all end up very badly. The better AI and robots get, the more uneasy I feel.

AI was not required to understand that command to go north.

Not only is it annoying to read, it suggests:

  • X BLOOM
  • SMELL PERFUME
  • TAKE PETAL
  • LISTEN TO MURMUR
  • LISTEN TO WHISPER
  • ACCEPT INVITATION

None of which are likely to be programmed. If they are accepted anyway because an LLM guesses, it might have undesirable consequences for gameplay.
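The mismatch described above can be checked mechanically. A minimal sketch, assuming the author keeps a hand-written set of nouns the game actually implements: scan generated prose for nouns it mentions and flag any the parser would not recognize. The word lists are hypothetical, and the matching is a crude prefix test (so "blooms" matches "bloom"), not real noun detection, which would need a part-of-speech tagger.

```python
import re
from typing import Iterable, List, Set

# Hypothetical: nouns the game actually implements in the garden.
IMPLEMENTED = {"garden", "flower", "key", "fountain", "sign", "umbrella"}

# Hypothetical: noun stems worth checking in generated text, listed by hand.
STEMS = IMPLEMENTED | {"bloom", "perfume", "petal", "murmur", "whisper", "invitation"}

# The LLM-generated description quoted earlier in the thread (\u2014 is a dash).
GENERATED = (
    "The garden blooms around you\u2014perfumed petals and bright color everywhere. "
    "At your feet, something small and cold catches the light: an iron key. "
    "A fountain murmurs nearby, its water whispering over stone. "
    "Beyond it all, a welcome sign stands waiting, and a red umbrella tilts in "
    "the sun, like an invitation you haven't accepted yet."
)


def mentioned(prose: str, stems: Iterable[str]) -> Set[str]:
    """Stems that begin at least one word of the prose ('blooms' -> 'bloom')."""
    words = re.findall(r"[a-z]+", prose.lower())
    return {s for s in stems if any(w.startswith(s) for w in words)}


def unimplemented(prose: str, stems: Iterable[str], implemented: Set[str]) -> List[str]:
    """Nouns the prose invites the player to use but the parser won't know."""
    return sorted(mentioned(prose, stems) - implemented)


if __name__ == "__main__":
    print(unimplemented(GENERATED, STEMS, IMPLEMENTED))
```

On the quoted prose this flags bloom, invitation, murmur, perfume, petal and whisper, the same nouns behind the commands listed above, while the plain LOOK output flags nothing.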

That output is already nearly perfect and I wouldn’t want to change it. AI generated prose is godawful…every…single…time.

I think in this particular case, the Adventuron parser happened to get the correct answer by dumb luck. If we throw something a bit more difficult at it, it fails where an LLM probably would succeed.

The next command, PULL LEVER, works correctly here. In the example above, because the word GO is in the command, the parser is thinking I want to move in a direction. SO HEY GO NORTH ALREADY contains the character string GO NORTH so Adventuron can figure it out.
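The "dumb luck" explanation can be illustrated with a deliberately naive matcher that looks for a known command string anywhere in the input. This is a hypothetical sketch assuming a tiny fixed command table; it is not how Adventuron or whichever parser produced the example actually works, and the third test input is a made-up phrasing.

```python
from typing import Optional

# Hypothetical command table for illustration; real parsers use a grammar.
COMMANDS = ("go north", "go south", "go east", "go west", "pull lever")


def substring_match(player_input: str) -> Optional[str]:
    """Return the first known command that appears verbatim in the input."""
    # Lowercase and collapse runs of whitespace so "GO   NORTH" still matches.
    normalized = " ".join(player_input.lower().split())
    for command in COMMANDS:
        if command in normalized:
            return command
    return None


if __name__ == "__main__":
    for text in ("SO HEY GO NORTH ALREADY", "PULL LEVER", "give that lever a good yank"):
        print(f"{text!r} -> {substring_match(text)}")
```

The filler words in the first input are harmless only because the literal string happens to be present; a rephrasing that breaks it up, like the third input, returns nothing, which is the gap an LLM preprocessor would be filling.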

A human would have no issue with this, even if the phrasing isn’t perfect, because context clues are enough for me to gather that the user wants to pull the lever. An LLM of any recent design would likely be able to surmise the same, as long as the engine fed it enough context data.

Seasoned IF gamers are able to play games because they’ve been trained; they know how to interact with the parser and what it can and can’t do (usually). I kind of think that without an LLM as a preprocessor, you’d have to do the same for voice-interacted games: get the user on board in advance with “here’s the way you can interact with the computer.” Just as simulation or strategy games expect players to learn a mouse-and-GUI interface, we don’t expect the parser to accept everyday English. Having a standardized paradigm is great for users and developers, since we can reasonably guess what most users will expect from the game.

I accidentally turned on Copilot in VS Code and left it on for a while to see what it was like. It was like pair programming with an over-eager junior developer: very annoying, and I had to do a lot of correcting when it thought it knew what I wanted but really didn’t understand. I will admit it was sometimes uncannily good at refactoring code, but jumping from writing code to reviewing code and back again, over and over, was very taxing. It was impossible to get into a flow state, which is the thing I really like about programming.

Yeah, I tried Copilot briefly when it was first released, and my experience was like yours. It’s like super-sized autocorrect, and I already hate regular autocorrect as it is. IMO, having an LLM running in the IDE with you is the worst possible use case for an LLM. Which is why it’s so frustrating that every piece of software and every service is now trying to cram an AI assistant into everything you do.

True, but not damning regarding AI coding. The software dev does the architecture and design (where the insight lives), and the AI takes on the grunt work. It can also act as a sounding board, if you want it to.

I use Visual Studio and turned off everything regarding Copilot that I could. I may even uninstall it from the IDE altogether. I also disabled sharing of my private GitHub repositories, which is another setting buried deep in the options. The thing about Copilot is that when you use it in your work, you are granting Microsoft free use of your IP. I have over 25 years of source code on GitHub. A lot of hard-won lessons, and I’m just not ready to hand all that work over so that some layabout who doesn’t know how to code gets it all for free.

That wasn’t Adventuron.

  1. Sure. Why not? Sometimes a player doesn’t have enough experience to know the expected structure of parser games. Sometimes the author doesn’t think of every synonym or intuitive command that even a competent player might try. In both of these scenarios, preprocessing increases accessibility without altering gameplay or authorial intent.

  2. You’re talking about content creation here though and whether or not an LLM can generate prose that is valuable to a player’s experience. I think you’ve already answered that question when you later stated…

Fair enough.

Regarding generated content: I think LLMs can’t do it well because they lack any understanding of what something actually means to a human being. We create content to share ideas and express our thoughts. We feel that these ideas are worth sharing, that they carry importance somehow. When we communicate these ideas, we communicate their worth. We also try to respect the audience as we communicate the worth and importance of these ideas. An LLM can give the illusion of an idea, the illusion of its worth, and the illusion of respecting the audience for only so long. When it inevitably fails, it’s catastrophic to the whole experience, because it shows that it ultimately has no comprehension of anything and that, the whole time, it was just that… an illusion.

When we string together words, we create meaning that is personal, cultural, societal. When an LLM interprets words as tokens, I think real meaning is lost.

If an author thinks an LLM can express ideas of actual worth… they’ve already said to the audience, “I can’t be bothered to articulate an idea myself, so here’s an LLM that may or may not do a good job of it. I’ll never know if it does a good job, though, because what the LLM tells you will be different each time. I hope you don’t mind spending time with something that may or may not pay off in any meaningful way. I hope you enjoy my story that I didn’t write.”

The only person interested in LLM-generated content is the author who designed the safeguards. The audience is simply part of the experiment… which is why you typically have to pay people to participate in experiments.

Other than that, I like LLMs and see them as useful tools. 🙂

Copilot is more than just inline completions. IIRC, inline completions use the basic free models anyway, which aren’t aging well. If you want to see the real benefit of Copilot, try agent mode in the chat sidebar with a recent model.

I tried it back when all it had was inline completions and concluded it was a cute trick that occasionally saved me some time writing boilerplate: promising, but not terribly useful at the time. Now, I can give it the symptoms of a bug or a failing test, and it’ll find the root cause and fix it. For example, it traced a bug in some generated code back to the part of the generator that produces it.

