This is the key question, but requires nuance. To some extent, the same could be said of traditional parser IF, where much of the author’s work is functional specification (“The chest is lockable and locked. The matching key of the chest is the brass key.”) that is rendered into prose by the parser (“You unlock the chest.”) Or likewise, an artist using 3D modeling tools where most of the actual rendering is done by a computer. At what point on the spectrum from “grinding your own ochre” to “prompt engineering” does something stop being a truly human creative work?
But more importantly, I don’t think using LLMs prohibits the author from writing their own prose—although how it is incorporated could vary with the system:
Mostly traditional prose, with the LLM only used “under the hood” for command parsing or game logic.
Most prose is human-authored, but the LLM acts as an editor and improviser, figuring out how to best deliver it. (For instance, the LLM is strongly steered towards drawing dialogue only from a large database of prewritten text, but with some ability to add transitional description and edit the text on-the-fly if the prewritten text contradicts the current game state.)
Most prose is LLM-generated, but informed by an extensive library of example prose and character information.
There is a lot of territory still to be explored between “traditional code that delivers only pre-written text” and “the ‘author’ just writes ‘the player’s son is called Darien’ and the LLM comes up with the rest.”
Hmmmm… I’m not sure I’m convinced by that. You are describing the skeletal framework that works well enough if untouched; the foundations that, should anything fail, will hold up the game; but by and large, that is not what people come to see and enjoy (unless the author is making a specific point).
If, however, you’re saying that LLM prose may be as bland and uninteresting and purely functional as the default constructed responses, then I agree! But I don’t think you’re saying that, I think that’s me saying that.
Now that you mention it, I am worried about AI butting into situations which should be simple foundational defaults and altering them in such a way as to interfere with the flow, and the prose; and to give the player red herrings by being too colourful at a time when it must be practical. Things which are the art of game design - not just “this puzzle here”, but “direct the player’s attention to this place in this way”, by all sorts of techniques and little tricks.
Could you make a stable, completely predictable game in that manner, or would you be always at the mercy of the LLM? Could you trust the LLM to be coherent all the way through? How would you even test that game for output, bugs and consistency?
If the answer to that is “LLMs are not quite there yet”, we can only speculate. Speculation goes both ways, though, towards a positive and negative outcome. I’m not sure the current panorama gives me hope for a positive outcome. But if it does happen - and if what you are describing is that positive outcome - does it reduce the author to someone who just writes the mechanical aspects of the game, and lets the LLM handle the prose? If a hybrid, can you be sure that the LLM will emulate your prose well enough that people won’t feel it jar? Can you know that you’ll be able to go through the output and make fine-edits yourself, and trust the AI to retain those edits?
And… quite honestly, now… will it not bother you to put in care and work in the bits of prose that you DO write, which involves so many choices, and then find that AI can replicate them reasonably okay-ly just like that?
(“reasonably okay-ly” is a sign of personal expression, which I never saw before and just decided to come up with right now; which says a lot about my casual posture and the way in which I may express myself by making up terms which are understandable. It also stands in contrast to my sometimes possibly stiffer writing style. There is quite a bit of character here. If I used this in IF, or static fiction, it wouldn’t be great writing, but it would be effective for accomplishing a certain purpose of characterization, economically. It is also not based on anything I have read previously, which is the basis for AI. Would an author want to trade away the possibility to do this? Or to co-author with a machine which cannot be as original as the author?
Honestly, I mostly see the appeal for authors who don’t consider themselves good writers and who therefore will jump at the chance to make a game where they don’t have to do the bits they don’t like or think they are not good at. While I can understand this, it seems to tie into the whole “because AI makes things easy for people without skills, people are not bothering to learn those skills; and people who have learned those skills are quite POd”. These are troubled times)
*****
Also, if I may address combinatorial explosion, which is inevitable in a game that plays like your transcript: how do you deal with that in a system where you yourself may not know where the AI will take you (because the input it can take from the players will be massive, and if it accommodates it all, it’s inhuman to try and deal with it)? Or do you maybe let AI handle that? Which comes to the question, will you trust the AI to handle the messes that may be created by its combinatorial explosions? Will you embrace that your murder mystery might change its culprit partway through? Or if you don’t want that to happen, can you trust the AI not to mess it up by improvising madly?
If the answer to all of this is “I think that all of this will be possible, but LLMs are just not at that point yet”… I hope that bringing these points up now rather than later can encourage whoever actively works on it to address them pre-emptively; I don’t see anything else that can be done about it.
(although I’d be happier if the AI plug were just pulled completely. Ah well, ships that have sailed, and so on. I’ll keep ranting against it - but not here in this nice clean blue forum)
EDIT - Rereading the post and the thread, I am not sure that I’m not rehashing ground already covered. If so, I apologise. It’s not that I’m not reading; it’s more like, it’s not getting through to me. You know, like one of those students in class…
There is a lot that could be said about this, but in short, I think the technical and artistic challenges you (correctly) identify basically boil down to two points:
By and large, LLMs are not an improvement over traditional IF for the things traditional IF is already good at (tight mechanical puzzles, for example). Rather, they open up fundamentally new types of interaction, like a freeform social scenario where the win condition is “somehow convince Dr. Broest not to do such-and-such.”
Figuring out how to keep an LLM “on rails” to deliver the intended experience is the fundamental technical challenge, but LLMs have gotten to the point where I am convinced this is a problem of architecture, not a matter of waiting for improved LLMs.
So it can be reasonably viewed as a problem that can be worked on now rather than later, even if the results might not be clear until other parts catch up, yes?
Yes, and I think authors are already starting to make some headway (see DetectivesGame, for example, which makes some interesting architectural and design choices to keep the LLMs “on rails”).
Although (IMO) that project also illustrates an unfortunate reality of LLMs in IF—for every heartfelt, creative, human experience that effectively harnesses LLMs, there are also going to be acres of slop.
In my personal opinion, the only thing I would enjoy AI being is the parser/host, not the game master - in the sense that a D&D GM has to improvise and role-play and make up solutions, whereas a “game show host” only comments, clarifies, and follows the rules.
The AI/parser doesn’t go off-script, and only considers objects actually in the game world. It only improvises a bit when given a nebulous or troll command.
Where I see it being potentially useful is jumping in whenever the standard parser doesn’t understand or throws an error. Its job is basically smoothing out “You can’t do that” responses. For example:
In the AI Zork I tried, I entered PARKOUR THROUGH WINDOW, which isn’t a recognized action. The AI looked at my command, understood “parkour” off-script as a form of locomotion, and re-translated my command into [entering the window] without comment, and the game proceeded normally. If PARKOUR were a game action, it would defer to that processing.
Supposing KITCHEN SINK is scenery and not important to the story. EXAMINE SINK provides the normal scenery description. If I interact with the sink, the AI understands there are no actions important to the game world and constructs a polite refusal message. “You don’t really have any reason to mess with the kitchen sink.”
TURN ON FAUCET - ‘faucet’ isn’t implemented, but the AI figures out that’s a normal part of a sink which is described as scenery, so it understands I mean the sink. Since there is no implemented SWITCH ON action associated with the sink, the AI can use what it knows about sinks and the world-model of the game to reply “The kitchen sink operates as you’d imagine, but there’s nothing significant about it.” (If there’s game text/description saying the faucet is broken, it knows this and can use it: “The sink is broken.”)
GET WATER FROM SINK - again, not implemented. AI understands this is a common use for a kitchen sink, however there is no mention of needing water in the game, so instead of hallucinating an object, it says something like “You switch the water on and off. It works as you’d expect, but the water runs down the drain.”
FILL CHALICE WITH WATER - I’ve found a trophy that is a container, so it makes sense that I could fill it with water. But since the sink is not implemented as an actual water source, the AI works around it. “The Bowling Trophy is shaped like a chalice that you’d imagine could contain water, but that doesn’t seem to be its correct function here.” (bad implementation, but honoring author’s intent and informing the player). If there are drinking glasses in the game, “You’re not thirsty. You don’t need a glass of water from the kitchen sink.”
PUT OUT THE FIRE - There’s a puzzle where I need to put out a fire in the fireplace in order to take the key in the middle of the flames. The author has mentioned a sink in the kitchen, but the proper solution is to find the bucket in the Garden Shed and fill it with water from the hand-pump outside. The AI is aware of this, so if I FILL CHALICE WITH WATER, it understands my intention but knows that’s not the correct puzzle solution: it sees there is a “water” object possible in the game world that comes from somewhere else, and a proper vessel to solve the puzzle. If the fire is already out, it refuses normally: “That’s not really meant to contain water.” If the fire is still lit, it can see I’ve been trying to put it out in the transcript: “While you might need some water, the chalice isn’t an appropriate container for it.” If I find the bucket but not the water pump to fill it and try FILL BUCKET AT SINK, that’s when the AI needs to referee: “You probably could fill the bucket at the sink, but the flow is minimal and it will take forever. You might need to find a different water-source.” Yes, bad game implementation, but the AI does not allow or imply an alternate solution to a game puzzle, even if logical.
For absolutely wacky commands, the AI can devise an appropriate response, but not based on any important game action:
EAT DRINKING GLASS - “That would only result in mouth-lacerations.”
SMELL (in the kitchen) - “You imagine when a meal is cooked here that there could be delightful aromas.” (not suggesting an aroma exists or giving any false clue/lead)
LISTEN [fire lit and does exist, non game-changing] - “You think you might faintly hear that fire crackling in the fireplace in the Living Room from here.”
COOK A MEAL - “While the kitchen is the appropriate place, you’re not hungry.” [or, if the player actually has a hunger stat] “This isn’t even your kitchen so you wouldn’t know where to begin.” (the player needs to find the frozen dinner in the freezer and put it in the microwave)
CONSIDER MY OPTIONS - “You consider for a while, staring at the Kitchen surrounding you. There is a refrigerator, freezer, a kitchen table and cabinets.”
DO A CARTWHEEL - “Impressive. The world inverts for only a second.”
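Sketched as hypothetical Python, the scenery case might be wired up something like this; the scenery table, aliases, and call_llm stub are all invented placeholders, not any real engine’s API:

```python
# Hypothetical sketch only: the scenery table, aliases, and call_llm stub are
# invented placeholders, not any real engine's API.

def call_llm(system: str, user: str) -> str:
    raise NotImplementedError  # plug in whichever local or hosted model you use

SCENERY = {
    "sink": "An ordinary kitchen sink. Not important to the story.",
}
ALIASES = {"faucet": "sink"}  # unimplemented nouns resolved to implemented parents

REFUSAL_RULES = (
    "The player tried an action on scenery with no game effect. Write ONE "
    "polite sentence refusing it. Use only the description given. Never "
    "invent objects, and never hint the action could matter later.\n"
    "Scenery description: {desc}"
)

def scenery_refusal(command: str, noun: str) -> str:
    desc = SCENERY[ALIASES.get(noun, noun)]  # TURN ON FAUCET -> the sink's text
    return call_llm(REFUSAL_RULES.format(desc=desc), command)

# scenery_refusal("turn on faucet", "faucet") might yield something like:
# "The kitchen sink operates as you'd imagine, but there's nothing
#  significant about it."
```

The point of the hard rules in the prompt is exactly the refereeing above: the model can paraphrase what the game already says, but it has nothing else to draw on, so it can’t offer alternate puzzle solutions.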
You just described an experiment I’m currently working on, Hanon
But yes, I built a functional auto-GM a while ago. It was able to keep a narrative going, but it was a terrible narrative, as it was too eager to give good outcomes and did not properly keep track of a world state.
This is, as theorised here, probably alleviated by making sure the world state is tracked by something like an adventure module, or a parser. Parser fiction has the advantage of already being able to respond to a number of player actions in a non-sycophantic way.
Now, I happen to know some researchers are trying to use Inform 7 for something similar, but I think that is the wrong move. Inform 7 is far too sensitive to slightly misplaced grammar.
I am actually looking into using Dialog, as it tracks the world state using a set of statements, and an LLM can be used to insert new rules during gameplay, appending to the world state.
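As a rough illustration of that idea (this is a Python analogue, not actual Dialog code, and all names are invented): the world state is a set of statements, and LLM-proposed facts only get inserted if they use vocabulary the game already defines.

```python
# A Python analogue of the idea, not actual Dialog code; all names are invented.
# The world state is a set of statements; LLM-proposed facts are only inserted
# if every predicate and entity already exists in the game's vocabulary.

from typing import NamedTuple

class Fact(NamedTuple):
    predicate: str
    args: tuple

world_state = {
    Fact("location", ("player", "kitchen")),
    Fact("openable", ("window",)),
}

ALLOWED_PREDICATES = {"location", "openable", "open", "wet", "broken"}
KNOWN_ENTITIES = {"player", "kitchen", "window", "sink"}

def try_insert(raw: str) -> bool:
    """Parse an LLM proposal like 'wet(player)' and append it to the world
    state, refusing anything that uses undefined vocabulary."""
    head, _, rest = raw.partition("(")
    args = tuple(a.strip() for a in rest.rstrip(")").split(","))
    if head not in ALLOWED_PREDICATES or not set(args) <= KNOWN_ENTITIES:
        return False  # reject hallucinated predicates or objects
    world_state.add(Fact(head, args))
    return True

print(try_insert("wet(player)"))      # True: vocabulary is known
print(try_insert("dragon(kitchen)"))  # False: 'dragon' was never defined
```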
I’m actually implementing a GM for solo storytelling in 2 different ways. They are commercial projects intended mostly for dungeon crawling but easily adaptable to murder mysteries also. Both architectures have one thing in common: the AI is given the ability to “record events” in the engine, where an “event” is anything that changes the character’s state or the environment state… changes in hit points, fatigue points, number of coins, number of arrows, and statuses linked to various effects are all examples of events. So the AI doesn’t have to remember: it simply records events in the engine, which will remind it, each turn, of the current state of the game world.
In my personal opinion, the only thing I would enjoy AI being is the parser/host not the game master
The second project of mine is like you described: the AI is a parser and a ‘describer of outcomes’, but the engine is deterministic. In the first project, the AI is a real game master. Both use the event technique I described above.
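A minimal sketch of what that event technique could look like (the names and event shapes are invented for illustration):

```python
# Illustrative sketch of the 'record events' technique (names are invented).
# The LLM never has to remember anything: it emits events, the engine folds
# them into the state, and the state is replayed into every prompt.

from dataclasses import dataclass, field

@dataclass
class GameState:
    hit_points: int = 20
    coins: int = 10
    arrows: int = 12
    statuses: set = field(default_factory=set)

def apply_event(state: GameState, event: dict) -> None:
    kind = event["kind"]
    if kind == "hp":
        state.hit_points += event["delta"]
    elif kind == "coins":
        state.coins += event["delta"]
    elif kind == "arrows":
        state.arrows += event["delta"]
    elif kind == "status":
        state.statuses.add(event["name"])

def state_reminder(state: GameState) -> str:
    """Prepended to each turn's prompt, so the model is told, not trusted to recall."""
    return (f"Current state: HP={state.hit_points}, coins={state.coins}, "
            f"arrows={state.arrows}, statuses={sorted(state.statuses)}")

state = GameState()
for ev in [{"kind": "hp", "delta": -4}, {"kind": "status", "name": "poisoned"}]:
    apply_event(state, ev)    # events emitted by the AI during its turn
print(state_reminder(state))  # injected into the next prompt
```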
I’m also curious about who’s paying the potential ongoing costs related to getting prompt responses from a (commercial grade) LLM, because the “free” usage plans of such only go so far…
…and before someone states that the player’s input is likely to be quite small in the overall scheme of things, I will remind them that the project’s character & story-line & etc… specifications also need to be sent with each prompt, else the LLM won’t have that information available to it when it generates the response to the player’s input.
I will remind them that the project’s character & story-line & etc… specifications also need to be sent with each prompt
Exactly. Actually I plan to offer a subscription with a ‘message packet’ model, i.e. you pay $X for 50 messages to the LLM. I have to determine X, but I’ll make it as low as possible: Gemini 2.5 has quite generous offers, so I think X will be in line with the competitors (Dungeon AI, Dreamio and similar).
I tried running full NPC dialogue once. That definitely requires an external API, but with programs like Ollama, if you keep the actual task simple enough, you should be able to work with a lighter, local model. But that means engineering around it and not leaving big narrative decisions to the model.
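For example, a tightly constrained NPC call through the ollama Python package might look something like this (the model name, character, and prompt are just examples):

```python
# A minimal sketch of the 'simple task, local model' approach using the ollama
# Python package; the model name, character, and prompt are just examples.
import ollama

SYSTEM = ("You are Bob, a gardener who has lost his wallet and does not know "
          "where it is. Answer in one short sentence, and never invent items "
          "or locations that are not mentioned in this prompt.")

def bob_says(player_line: str) -> str:
    response = ollama.chat(
        model="llama3.2",  # any small instruct model you have pulled locally
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": player_line},
        ],
    )
    return response["message"]["content"]

print(bob_says("Where is your wallet?"))
```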
I’ve also considered that AI might have use for NPC dialog, but I imagine it would almost require a “director/rehearsal” mode…basically the author conversing with the NPC/AI and giving it directions about what the character knows and the gameworld/plot situation.
>ASK BOB ABOUT WALLET
Bob says, “My wallet is under the leaves in the garden.”
>NOTE BOB: When the game begins, Bob has lost his wallet and does not know where it is located until Bob's wallet is shown to him or given back to him. [Bob - Note taken.]
>ASK BOB ABOUT WALLET
“I don’t know where my wallet is,” says Bob. “I think I’ve lost it somewhere.”
Pretty much like that. Got it working with a Unity build at one point. It requires you to share a whole lot of context with each request like that, pretty much like a director.
However the big thing is then accounting for latency with the LLM generating a response. Either way, you end up sharing a ton of context and instructions with each call, to prevent the LLM from going off-script.
Though I found that there was a precarious balance to be struck between giving enough context and over-prompting the LLM. Especially when a character is meant to have world-specific information, you need to set up a structure where you don’t need to share the entire world bible with every LLM call, and instead rely on a first LLM pass to determine whether to give a direct response or use the “lore dump” function.
Even here I found it was usually most token- and latency-efficient to prewrite sections of lore, with an input field to allow the LLM to write a prosaic bridge. Not doing this would let the LLM just invent pieces of world-building not in the game.
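A rough sketch of that two-pass structure (the prompts, topics, and call_llm stub are all invented for illustration):

```python
# Rough sketch of the two-pass structure; prompts, topics, and the call_llm
# stub are invented for illustration.

LORE = {
    "the_war": "Prewritten canonical text about the war...",
    "the_king": "Prewritten canonical text about the king...",
}

def call_llm(system: str, user: str) -> str:
    raise NotImplementedError  # plug in your provider

ROUTER = ("Decide how to answer the player. Reply with exactly one token: "
          "DIRECT, or LORE:<topic> where <topic> is one of: " + ", ".join(LORE))

def npc_reply(player_line: str) -> str:
    decision = call_llm(ROUTER, player_line).strip()
    if decision.startswith("LORE:"):
        canon = LORE.get(decision.removeprefix("LORE:").strip(), "")
        # Second pass only bridges the canonical text; it may not add facts.
        return call_llm(
            "Deliver the following text in character, adding at most one "
            "transitional sentence and no new facts:\n" + canon,
            player_line,
        )
    # First pass said the line needs no lore: answer briefly in character.
    return call_llm("Answer briefly in character, adding no world facts.",
                    player_line)
```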
Edit:
Couldn’t find a recent screenshot of the Auto-GM. Here is an early prototype.
you need to set up a structure where you don’t need to share the entire world bible with every LLM call
Exactly. I divided the world into logical ‘regions’ and pass the LLM just the region where the PC is. The NPCs are also filtered by region: I pass just the NPCs in the current region. Another possibility is to avoid fixed pre-scripted regions and pass the LLM only the content within 0 or 1 moves of the PC.
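As an invented-name sketch, the filtering might look like this (the region and NPC data are placeholders):

```python
# Invented-name sketch of region filtering: only the PC's current region,
# and the NPCs standing in it, ever reach the prompt.

REGIONS = {
    "village": {"description": "A sleepy village...", "npcs": ["Bob", "Mara"]},
    "forest":  {"description": "A dark forest...",    "npcs": ["Hermit"]},
}
NPC_SHEETS = {
    "Bob":    {"info": "Gardener, lost his wallet.", "secrets": "Owes Mara money."},
    "Mara":   {"info": "Innkeeper.",                 "secrets": "Saw the culprit."},
    "Hermit": {"info": "Lives alone.",               "secrets": "Is the culprit."},
}

def context_for(pc_region: str) -> str:
    region = REGIONS[pc_region]
    lines = [region["description"]]
    for name in region["npcs"]:  # NPCs filtered by region
        sheet = NPC_SHEETS[name]
        lines.append(f"{name}: {sheet['info']} (GM only: {sheet['secrets']})")
    return "\n".join(lines)

print(context_for("village"))  # the forest hermit never enters this prompt
```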
Even here I found it was usually most token- and latency-efficient to prewrite sections of lore, with an input field to allow the LLM to write a prosaic bridge. Not doing this would let the LLM just invent pieces of world-building not in the game.
Same here: for each NPC I have an info field and a secrets field.
To connect this to @Giger_Kitty’s earlier question, this is one of the areas where I do expect improvements in LLM tech to help developers. At the moment, the question of “how do I select and summarize the correct information to pass in each LLM call?” is a key architectural question, but with ongoing improvements to long-context reasoning, context caching, and smaller+smarter+faster inference (driven by strong economic incentives from the corporate AI world, where cheap long-context reasoning is the Holy Grail of AI), I expect a corresponding shift towards long-context prompting where it becomes practical to give LLMs access to most or all of the information about the game world simultaneously.
I may have made some headway today with Eidolon. Currently trying to get it to run Cloak of Darkness. I’ll keep everyone posted on whether it’s not just functional, but actually produces narratively satisfying results.
I suspect the key is good organization of artifacts: rules, state, a clear system for referencing. I’m not sure if you’ve worked with Claude Code at all, but that is what came to mind (skills, memory, MCPs, subagents, etc.). You could have the equivalent of skills for specific areas such as combat or NPC dialogue, plus a shared world state reflected in JSON or markdown. Some of the context would be dedicated to always reflecting the current world state, but, just like Claude Code, each skill would only need the bits relevant to its work. How much and in what direction the world is allowed to evolve would need to be clearly defined up front.
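A very rough sketch of that slicing idea (all names here are hypothetical):

```python
# Very rough sketch of the slicing idea; all names are hypothetical.
# Every skill sees the shared world state, but only its declared slice.

import json

WORLD_STATE = {
    "location": "kitchen",
    "combat": {"hp": 20, "enemies": []},
    "npcs": {"Bob": {"mood": "anxious"}},
}

SKILLS = {
    # each skill declares which state keys are relevant to its work
    "combat":       {"keys": ["location", "combat"], "rules": "Combat rules..."},
    "npc_dialogue": {"keys": ["location", "npcs"],   "rules": "Dialogue rules..."},
}

def prompt_for(skill_name: str) -> str:
    skill = SKILLS[skill_name]
    state_slice = {k: WORLD_STATE[k] for k in skill["keys"]}
    return skill["rules"] + "\nState: " + json.dumps(state_slice)

print(prompt_for("npc_dialogue"))  # combat state never bloats this call
```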
LLMs are an order of magnitude smarter than in the AI Dungeon days.
I’m also curious about who’s paying the potential ongoing costs related to getting prompt responses from a (commercial grade) LLM, because the “free” usage plans of such only go so far…
It’s a problem, but I suspect it is a “now problem” that will be solved as on-device models get better, overall prices come down for yesterday’s state of the art, and new data centers ramp up. I also think balancing model performance against need is key.
In the IF Engine I built, I use AI for a fallback mode if:
The player decides to enable it and opts in after understanding the data/privacy implications
The parser failed a first pass.
The goal is to help ease a broader group / new generation of players into succeeding with parser based interactive fiction.
So things like “I think I’d like to have that apple” become “take apple”.
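A minimal sketch of that flow (the prompt wording and call_llm stub are illustrative, not the engine’s real code):

```python
# Minimal sketch of that flow; the prompt wording and call_llm stub are
# illustrative, not the engine's real code.

def call_llm(system: str, user: str) -> str:
    raise NotImplementedError  # plug in your provider

NORMALIZER = ("Rewrite the player's sentence as a classic parser command "
              "(VERB NOUN) using only these verbs: take, drop, open, "
              "examine, go. If impossible, reply UNKNOWN.")

def parse_with_fallback(text: str, parser, ai_enabled: bool) -> str:
    result = parser(text)
    if result.ok or not ai_enabled:         # opt-in: the AI never runs uninvited
        return result.response
    rewritten = call_llm(NORMALIZER, text)  # "I think I'd like to have that apple"
    if rewritten != "UNKNOWN":              # -> "take apple"
        return parser(rewritten).response
    return result.response                  # keep the original parser error
```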
I found a company (https://www.nscale.com/) I could use via API that is located in the UK. The model I need works at a cost of $0.01 per million input tokens and $0.03 per million output tokens.
Long term sustainability is still questionable, but I can do it for now. Perhaps it becomes a premium feature.
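For scale, a back-of-envelope calculation at those rates (the per-turn token counts here are made-up assumptions, not measurements):

```python
# Back-of-envelope cost at the quoted rates; the per-turn token counts
# below are made-up assumptions, not measurements.
INPUT_RATE  = 0.01 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.03 / 1_000_000   # dollars per output token

turns = 50                       # one hypothetical 'message packet'
input_tokens_per_turn = 4_000    # world state + rules + recent transcript
output_tokens_per_turn = 300

cost = turns * (input_tokens_per_turn * INPUT_RATE
                + output_tokens_per_turn * OUTPUT_RATE)
print(f"${cost:.5f} per 50-message packet")  # -> $0.00245 at these assumptions
```

At those assumptions the per-player cost is fractions of a cent, which is why the sustainability question is more about rate stability and scale than about any single session.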
Do we want to teach the players that commands like that are good and the way they should be thinking? Part of any interface is showing the players what the limitations are so they can best interact. Is it useful to allow any and all input like that? Doesn’t such a free form method even distract from the story?
Maybe in the new paradigm of limitless interaction it makes no difference, but I’d like to put it out there. It seems to encourage a type of interaction that I’m not sure is in the player’s or author’s best interests.