LampGPT: A basis for LLM-enhanced IF experiences

Hi all,

I’ve developed some early, experimental support for playing interactive fiction (in particular Infocom and other ZIL games) with the help of LLMs. It can be found on GitHub in the UlfarErl/lampgpt repo and I’d love to get feedback from this community and accept any pull requests that I may get.

I started down this path both to play with LLMs and get an excuse to play some of the old, early Infocom games like Starcross. Those games were very terse and sparse, to the point of not having much of an atmosphere at all. And the parser was far less capable than in the later games.

The GitHub repo has some example transcripts of games played using ChatGPT 4. For example, below is a section from Starcross in the style of a high space opera.

starboard
Storage Chamber You’ve ventured into the compact storage chamber of the
Starcross, a repository of tools and gear essential for the intrepid explorer.
The only passage retreats to port, back towards the heart of the ship. Suspended
on a rack, your space suit awaits, a sentinel of fabric and metal poised for
adventure beyond the confines of this vessel. Beside it, a safety line dangles
from the wall, a slender thread that ties you to the safety of the Starcross
amidst the vast ballet of the cosmos. Here, in this small cubicle, lies the
equipment that bridges the gap between the known and the unknown, ready to
accompany you into the starry expanse.

bite my nails
In the grand expanse of the cosmos, with mysteries untold and adventures
unnumbered waiting just beyond the bulkhead, you find a moment of introspection,
a brief pause in the relentless march of destiny. Yet, as you contemplate the
universe and your place within it, you realize that some habits of terrestrial
life, such as biting one’s nails, hold little sway in the face of the infinite.
There are worlds to explore, secrets to uncover, and the vast unknown calls to
you, urging you to look outward, to the stars, rather than inward.

perform Calisthenic exercises
In the vast expanse of space, aboard the Starcross, your desire to maintain
physical fitness is commendable. Though the ship’s confines may limit the
grandeur of calisthenic exercises you might envision, you find a way to adapt,
performing stretches and basic routines in the limited space. Your actions,
though small in the grand scheme of the cosmos, are a testament to the human
spirit’s adaptability and determination to thrive, even in the most unlikely of
environments.

For comparison, here is the original response for the storage closet from Infocom’s Starcross:

starboard
Storage
This cubicle is used for storage. The only exit is to port.
Hanging on a rack is your space suit.
A safety line hangs on the wall.

I’m looking forward to hearing whatever feedback or ideas people here have.

Cheers,
Ulfar

2 Likes

Since you don’t have permissions yet to add links, here’s the GitHub repository for LampGPT.

1 Like

Thanks @Hidnook !

For anybody intrepid enough to try playing via the script, be warned that it’s pretty slow—even slower than playing the games on a Commodore 64 with a floppy drive back in the old days. Also, be warned that with ChatGPT4 it is pretty expensive as well, because I haven’t done anything to reduce the number of tokens I’m sending.

What happens if you try to refer to nouns that don’t exist in the game file? Does the LLM ever mention nouns that weren’t originally in the game?

I went on a longer text dump about this elsewhere on the forum, but the biggest issue I see with LLMs in a parser game is that it might mutate the story beyond what the author intended, which can break things in both story and gameplay.

Sure, this technically means more content for the player, but the LLM content can become nonsensical over time, and can also become repetitive, if the LLM falls into certain phrasing habits.

Also, terse responses have a place in some parser games, because there are moments where respecting the player’s time is more important.

It’s your project, so do as you want, but these are some issues that immediately occur to me when contemplating this, and I’m curious to know if these have been addressed or if they’re simply outside of project scope.

6 Likes

Good points @inventor200.

I don’t think having an LLM rewrite the output is a good idea for all games. For example, more recent games (from Spider and Web onwards) are rich enough already; playing them via massive rewriting feels disrespectful to the authors and can easily change the game for the worse.

The early Infocom stuff is strikingly barren, however. Not quite Questprobe Hulk “bite lip” bad, but close. It seems worth trying to improve on that, since modern players probably won’t get much out of the games as they are.

To answer your concrete questions, yes, the LLMs will hallucinate stuff that can be detrimental to gameplay. In one of the Starcross playlogs in my repo the LLM is hallucinating the coordinates for a detected mass, while the original game contained those only in the printed material (a basic version of copy protection).

And, yes, the LLMs will make up stuff, new objects etc. However, they are also surprisingly good at giving reasonable answers. So, you can examine the ceiling, rifle through the cupboards, etc. I still have to do more on the input side and prompt engineering, but it all seems to work. Being able to interact with a richer environment was definitely one of my goals.

I’m just doing this for fun and for exploring how to use the APIs of modern LLMs and experiment with what they can do.

3 Likes

For anybody who isn’t put off completely by the ickiness of having LLMs rewrite IF responses, I just spent a part of this afternoon doing a walkthrough of the first quarter of Zork 1 in a Terry Pratchett style. The experience was very enjoyable, and decidedly richer and more fun than playing the original version of the game.

For example, when getting the coffin treasure, instead of a simple “Taken.” the player gets this more rewarding response:

With a Herculean effort that would make Atlas nod in approval, you somehow manage to hoist the solid-gold coffin of Ramses II onto your shoulders. It's as if you've suddenly found yourself in a weightlifting competition with history itself as your opponent. The ancient artifact, heavy with both gold and the weight of millennia, is now yours. But do tread carefully, for such treasures are not acquired without consequence.

If anybody other than me wants to play with this stuff, the Python script is pretty usable, and allows the use of Gemini and Claude in addition to ChatGPT. The cost of playing the first quarter of Zork1 was $2.33 using chatgpt-4-turbo-preview.

1 Like

It would be interesting to see the two side by side, like a Loeb book – for example, the player’s input and the elaborate responses on the left, and the command that’s sent to the game on the right with the original responses.

Even better, it would be good to be able to switch between typing into the enhanced or original versions.

Yes, you are absolutely right. It is actually somewhat necessary at the moment, because the LLM stuff sometimes gets confused. There’s a bunch of GUI work (or at least text terminal work) that I’ll need to do to get that up and running, but even so I’ll put it on my TODO list.
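
As a rough idea of what the text-terminal version of that side-by-side view could look like, here is a minimal sketch using only the Python standard library. The function name and the column widths are made up for illustration; nothing like this exists in the script yet.

```python
import textwrap
from itertools import zip_longest


def side_by_side(left: str, right: str, width: int = 38, gap: int = 4) -> str:
    """Lay out two texts in two columns: enhanced on the left, original on the right."""

    def wrap_keeping_lines(text: str) -> list[str]:
        # Wrap each source line separately so the original game's
        # line breaks survive (textwrap.wrap would merge them).
        lines: list[str] = []
        for line in text.splitlines():
            lines.extend(textwrap.wrap(line, width) or [""])
        return lines

    rows = zip_longest(wrap_keeping_lines(left), wrap_keeping_lines(right), fillvalue="")
    # Pad the left column to a fixed width, then append the right column.
    return "\n".join(f"{l:<{width}}{' ' * gap}{r}".rstrip() for l, r in rows)


if __name__ == "__main__":
    enhanced = "Storage Chamber You've ventured into the compact storage chamber of the Starcross."
    original = "Storage\nThis cubicle is used for storage. The only exit is to port."
    print(side_by_side(enhanced, original))
```

Switching between typing into the enhanced or the original version would need more than this, since the two sides would have to share one game state.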

FYI, it is now possible to play all of the Infocom games with a phenomenal parser using LampGPT, with just ./lampgpt.py -O gamename. The LLM is only consulted if the original parser doesn’t understand what you want, so this is a super fast and cheap way to improve the way the original games play.
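
The "only consult the LLM when the original parser gives up" flow can be sketched roughly as below. This is an illustration, not the actual lampgpt.py code: the function names, the callback signatures, and the list of failure messages are all assumptions. It also assumes that a command the parser rejects does not advance the game state, so trying the player's raw input first is harmless.

```python
import re

# Assumed examples of "parser gave up" replies; a real list would need
# to be collected per game and would be longer than this.
PARSER_FAILURES = [
    re.compile(r"^I don't know the word", re.IGNORECASE),
    re.compile(r"^I don't understand", re.IGNORECASE),
    re.compile(r"^That sentence isn't one I recogni[sz]e", re.IGNORECASE),
]


def looks_like_parser_failure(response: str) -> bool:
    """Return True if the game's reply matches a known parser-failure message."""
    text = response.strip()
    return any(pattern.search(text) for pattern in PARSER_FAILURES)


def play_turn(command, send_to_game, ask_llm):
    """Try the player's command as typed; fall back to the LLM only on failure.

    send_to_game(command) -> game response text   (hypothetical callback)
    ask_llm(command, response) -> rewritten command or None (hypothetical callback)
    Returns the command that was finally used and the response it produced.
    """
    response = send_to_game(command)
    if not looks_like_parser_failure(response):
        return command, response  # fast, free path: no LLM call at all

    rewritten = ask_llm(command, response)
    if not rewritten:
        return command, response  # the LLM had no suggestion; show the original reply
    return rewritten, send_to_game(rewritten)
```

Because most commands take the first branch, the slow and costly LLM call only happens for input the original parser cannot handle.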

Only two technical things were needed to get this “ultimate parser” functionality to work super reliably (at least with ChatGPT4). First, LampGPT prompts the LLM with a list of all the noun phrases that can be interacted with in the current room. Second, and this was a key advance, the LLM is asked to give chain-of-thought reasoning for the new command it suggests. This gives great results.
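
To make those two ingredients concrete, here is a minimal sketch of how such a prompt might be assembled. The wording of the prompt, the function name, and the extra list of verbs are assumptions for illustration, not the text LampGPT actually sends. The `+++` markers around the final command follow the convention visible in the LLM replies in this thread.

```python
def build_rewrite_prompt(player_command, game_response, noun_phrases, known_verbs):
    """Build a prompt asking the LLM to reason first, then propose one command.

    noun_phrases: things the player can interact with in the current room.
    known_verbs:  verbs the game understands (assumed to be available).
    """
    nouns = "\n".join(f"- {phrase}" for phrase in noun_phrases)
    verbs = ", ".join(known_verbs)
    return (
        "The player typed a command that the game's parser did not understand.\n"
        f"Player command: {player_command}\n"
        f"Game response: {game_response}\n\n"
        "Noun phrases available in the current room:\n"
        f"{nouns}\n\n"
        f"Verbs the game understands: {verbs}\n\n"
        # Chain-of-thought: ask for the reasoning before the answer.
        "First explain, step by step, which verb and noun phrase best match "
        "the player's intent. Then give exactly one new command on its own "
        "line, wrapped in triple plus signs, like +++open door+++."
    )


if __name__ == "__main__":
    print(build_rewrite_prompt(
        "tear up the infernal receptacle of blasphemous communications",
        "I don't know the word \"infernal\".",
        ["small mailbox", "mailbox", "door", "house"],
        ["attack", "smash", "open", "look", "wait"],
    ))
```

Listing the room's noun phrases keeps the model from inventing objects, and asking for the reasoning before the command is what makes this a chain-of-thought prompt.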

Some examples of command rewriting right at the start of Zork1 are below:

The player's original command was a vivid and imaginative attempt to interact with 
the game environment in a specific and dramatic way. The command aimed to 
"summon all of your strength and tear up the infernal receptacle of blasphemous 
communications," referring to the act of forcefully destroying the mailbox. However, 
due to the game's limitations in understanding complex and thematic language, we 
need to simplify the command while retaining the essence of the action desired.

1. "attack" or "smash" - These verbs are understood by the game and convey the 
intent of using force to destroy or damage an object. "Smash" is particularly fitting as 
it suggests a strong, destructive action, closely matching the player's original intent.

2. "small mailbox" or "mailbox" - This is the object the player intends to interact with.
The game recognizes "small mailbox" or "mailbox" as valid noun phrases for this object.

By using these words, we can formulate a command that the game engine 
is capable of understanding, while still attempting to fulfill the player's original 
intention as closely as possible. Here's a suggested new command that aligns
with the game's vocabulary and the player's intent:

+++smash mailbox+++

and

The player's original command combines a desire to look at something inspiring 
("Gaze upon the far blue yonder") with an emotional or spiritual action ("feed upon 
the spirit of optimism it provides"). To align this with the game's vocabulary while 
maintaining the essence of the intended action, we need to focus on the concrete 
action that can be translated into the game's language: looking at something inspiring.

1. "gaze" or "look" - These verbs are closest to the action of observing or focusing 
on something within the game's environment. "Look" is universally accepted in text 
adventure games for examining or observing the player's surroundings.
2. "at" - This preposition is necessary to direct the action of looking towards a 
specific object or direction. However, the game's simplified parser might not 
recognize complex prepositions, so we may need to omit it in the command for 
broader compatibility.
3. "air" or "sky" - While "the far blue yonder" poetically refers to the sky or the 
horizon, the game's vocabulary does not include such metaphoric language. 
The game does recognize "air," but since "sky" is not explicitly listed and might be 
more fitting, we'll have to choose a noun that the game understands and that could 
metaphorically encompass looking at the sky or horizon.

Considering the limitations, a direct translation of "feed upon the spirit of optimism" 
is not possible. Instead, we focus on the action that could symbolically represent 
drawing optimism from the environment.

Given these constraints and aiming to preserve the player's intent as closely as 
possible, the command could be simplified to:

+++look air+++

This command focuses on the actionable part of the player's request, adhering to 
the game's understood commands while attempting to capture the essence of 
seeking inspiration or optimism from the surroundings.

and

The player's original command, "Steel yourself for any unpleasantness that may lie 
ahead," expresses a desire to prepare mentally or emotionally for upcoming 
challenges. This preparation is not directly translatable into a specific action within 
the game's limited vocabulary, as the game primarily understands concrete actions 
rather than abstract preparations or emotional states.

To align as closely as possible with the player's intent using the game's vocabulary:

1. "prepare" or "ready" - Neither of these verbs directly appears in the game's list of 
understood verbs. However, the essence of preparing or readying oneself could be 
conveyed through verbs related to action since the game might not recognize 
abstract preparation.

Since there’s no direct way to express mental or emotional preparation in the 
game's language, we look for an action that could symbolize taking a moment to 
brace oneself for what's coming. Unfortunately, the game's vocabulary focuses 
on physical interactions with the world.

Given these constraints, the closest we can come to fulfilling the player's intent is 
to use an action that might symbolically represent preparing oneself for future 
challenges. Actions like "look," "listen," or even "wait" can serve as proxies for a 
player taking a moment to assess their situation and brace for what lies ahead.

Considering this, a simplified and game-acceptable command that attempts to 
capture the essence of preparing for future difficulties might be:

+++wait+++

This command doesn't directly translate the player's intent to mentally prepare for 
unpleasantness but suggests a pause, which might imply taking a moment to steel 
oneself against whatever challenges lie ahead.
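
Replies like the three above mix free-form reasoning with the command to send, so the command has to be pulled out of the `+++` markers before it goes to the game. A minimal sketch of that step follows; the function name is hypothetical and this is not necessarily how lampgpt.py does it.

```python
import re

# Non-greedy match of anything between a pair of +++ markers.
COMMAND_RE = re.compile(r"\+\+\+(.+?)\+\+\+", re.DOTALL)


def extract_command(llm_reply: str):
    """Return the last +++command+++ found in the reply, or None if there is none."""
    matches = COMMAND_RE.findall(llm_reply)
    if not matches:
        return None
    # Collapse stray whitespace or line breaks inside the markers.
    command = " ".join(matches[-1].split())
    return command or None


if __name__ == "__main__":
    reply = (
        "Given these constraints, the command could be simplified to:\n\n"
        "+++look air+++\n\n"
        "This command focuses on the actionable part of the player's request."
    )
    print(extract_command(reply))
```

Taking the last match is a design choice: the reasoning comes first, so if the model quotes an example command along the way, the final one is the most likely to be its actual suggestion.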

4 Likes

That is fantastic work! Any Long Play session incoming? :grin:

1 Like

Thanks! It is exciting to think that we may never have to fight the parser again, imho.

Also, I’d like it if the game would always give reasonable responses whenever the player is exploring the combinatorial possibilities of interactions. For example, in Zork1, “touch the ceiling of the cave with the bottle perched upon the tip of your sword” might get a response “Even with the bottle precariously balanced upon your sword, and stretching as far as you can, you are unable to reach the top of the cave” instead of “You can’t do that” or “I don’t understand.”

The writers of the best games spend a lot of time anticipating random or useless user behavior and creating decent responses, to avoid breaking the narrative flow by exposing the limitations of the game engine. Even so, no author can possibly anticipate everything. To me, at least, that part of the game is much better automated, under the author’s direction, rather than left blank or broken.

Re longplays, there is a playlog of a large part of Zork1 at /UlfarErl/lampgpt/blob/main/walkthroughs/zork1_pratchett.txt on github.com. I’ll put more stuff in that walkthrough directory as I experiment more with long plays.

2 Likes

But the flipside of this is that a good error message tells you that something fundamentally doesn’t work or isn’t useful in the game. For example, you talked about ChatGPT hallucinating coordinates onto the map in Starcross that were supposed to be found in the feelies. How long could a player go on examining these nonexistent coordinates and trying to solve puzzles with them before realizing they’re not actually part of the game?

6 Likes

Absolutely. The game still has to give what are effectively error messages for each of those interactions. For this, I think the LLM just has to be prompted in the right way.

For example “get ground” might originally result in “You can’t take the ground.” and the LLM can be prompted to rewrite that into something better, guided by style-directions from the author. For example, standing in the field at the start of Zork1, the LLM might rewrite this to “Grasping at the ground, you try to grab a handful of dirt and gravel only to have it slip through your fingers.” whereas inside the caves the LLM might rewrite it to “You momentarily reach with an open hand towards the solid rock beneath you. Fortunately, you shake off the impulse before bloodying your fingertips in futility.”

You are right about hallucinations. The prompts would have to be very specific to make sure the LLM doesn’t indicate that these useless actions have side effects. This is much, much harder when the LLM is rewriting the entire game response, not just the “error responses.” That is what happened with the coordinates in Starcross—I was using the LLM for too much, and not in a way that any IF author would want.
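
A rough sketch of that narrower use, where only the “error responses” are rewritten and everything else passes through untouched, might look like this. The refusal patterns, function names, and prompt wording are all assumptions for illustration; a real version would need the author's own style directions and a proper list of refusal messages.

```python
import re

# Assumed examples of refusal messages that are safe to restyle.
REFUSALS = [
    re.compile(r"^You can't ", re.IGNORECASE),
    re.compile(r"^I don't understand", re.IGNORECASE),
]


def is_refusal(response: str) -> bool:
    """True if the game's reply is one of the known 'nothing happened' messages."""
    text = response.strip()
    return any(pattern.match(text) for pattern in REFUSALS)


def build_refusal_prompt(command: str, response: str, style: str) -> str:
    """Ask for a restyled refusal that adds no objects, facts, or side effects."""
    return (
        f"Style directions from the author: {style}\n"
        f"The player typed: {command}\n"
        f"The game refused with: {response}\n\n"
        "Rewrite the refusal in the given style, in one or two sentences. "
        "The action must still fail. Do not mention any object, clue, or "
        "information that is not in the text above, and do not describe "
        "any change to the game world."
    )


def present(command, response, style, ask_llm):
    """Return the text to show: restyled refusals, everything else verbatim."""
    if not is_refusal(response):
        return response  # real game content is never rewritten
    rewritten = ask_llm(build_refusal_prompt(command, response, style))
    if rewritten and rewritten.strip():
        return rewritten.strip()
    return response  # fall back to the original message
```

The prompt constraints cannot guarantee that the model adds no side effects, but limiting it to messages where nothing happened keeps any hallucination away from puzzle-critical text.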

1 Like

I took Ulfar’s effort and expanded it further, adding a web interface and swapping out the bocfel engine for Jericho (which uses Frotz under the hood). I also added the ability to run it against any open source model that you can grab off HuggingFace, and to deploy it via Modal.

I’m also new to the forum so I can’t post links… but it’s on GitHub, under the Manto/llm-if-wrapper project.

1 Like