the dreary recurrence of announcements of new platforms and games with big promises
Exactly. I like to encourage development of new systems. Some ideas have shown a lot of promise, and we do need new systems. But as everyone knows, getting a new system actually working is a huge undertaking, and most run out of steam well before they get there.
But it seems the new fashion is to dump the whole problem on AI and claim you have it all done in one super-swoop. Just like that!
Actually, you don’t. And this is making things even more negative for those who are receptive to (or interested in) aspects of AI use, or who at least see AI as having some use in some areas.
I’ve always claimed AI cannot solve the “whole problem”, whether that means writing a complete story, making a finished picture, or even producing a whole body of text. This is why I believe AI is less capable of putting creatives out of work than many people think.
Driving AI properly is a skill, like using any tool. A lot of the AI drivel people criticise comes down to lazy use. Garbage in, garbage out applies here more than ever. People see using AI as an easy option. But actually, it’s not.
And I don’t think AI is going to get all that much better, either. Scaling it up just means making it quicker; its fundamental problems won’t go away simply by doing that.
So, back to my main point. If you’re using AI in or for a new system, for goodness sake, put some effort into it. Otherwise it really will be rubbish.
Your example is describing a choice-based narrative, which is in fact a large portion of what this community creates. Yes, as you imply, it is impossible for one person to tell every possible story, but I think you will find that we’re inexplicably excited to listen to all the stories that you do have to tell. Getting a prediction engine to guess at every variation you didn’t create doesn’t create anything.
But it’s also possible to get a prediction engine to deal with input not directly accounted for by the story writer (I tell Chakotay, “I don’t know. I’ll think about it.”) and then weave that back into the human-written narrative. Maybe my conversation with Chakotay just affects his ‘disposition’ variable, and that variable affects his behavior later down the line.
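To make that concrete, here’s a rough sketch in Python (all names made up; classify_reply is a stand-in for whatever small model would do the actual bucketing):

```python
DISPOSITION_SHIFT = {"warm": 1, "neutral": 0, "cold": -1}

def classify_reply(player_text: str) -> str:
    """Stand-in classifier: a real version might be a small model trained
    to bucket free-form input into the tones the author wrote for."""
    text = player_text.lower()
    if any(w in text for w in ("thanks", "agreed", "good idea")):
        return "warm"
    if any(w in text for w in ("forget it", "leave me alone", "no way")):
        return "cold"
    return "neutral"  # "I don't know. I'll think about it." lands here

chakotay_disposition = 0

def talk_to_chakotay(player_text: str) -> str:
    global chakotay_disposition
    tone = classify_reply(player_text)
    chakotay_disposition += DISPOSITION_SHIFT[tone]
    # The visible reply stays human-written; only the bucketing is learned.
    replies = {
        "warm": 'Chakotay smiles. "Take whatever time you need."',
        "neutral": "Chakotay nods slowly and turns back to the console.",
        "cold": 'Chakotay\'s jaw tightens. "Fine. Dismissed."',
    }
    return replies[tone]

# Later scenes branch on the accumulated variable, not on the raw text:
# e.g. if chakotay_disposition < 0, he declines to back you up.
```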
The problem with AI-generated dialogue is that it usually doesn’t mean anything, either.
Suppose the surrounding human-generated story wants the conversation with Inspector Lestrade to reveal one piece of information (he found a bloody fingerprint in the hallway) and have two possible outcomes (he’s impressed by your deductive skills, or he thinks you’re a prideful idiot).
An LLM could simulate a full conversation that reveals this piece of information and leads to one of two outcomes, sure. But the vast majority of that conversation will be pure hallucination, which is likely to mislead the player: revealing additional fake “clues”, reacting to dialogue in a way that implies more than two outcomes, and so on.
I’d prefer human-written dialogue with fewer choices to pick from (rather than freeform input), where each choice has been specifically crafted by the author to express exactly what the surrounding story needs. No unintended red herrings, only what the author actually wants out of the dialogue.
Now, this could be a great use-case for a more restricted model, like I was talking about earlier! Train a specialized model to map any text the player types into one of the dialogue options you’ve implemented, one of which is “blast it all, Holmes, why are you going on about this nonsense when a man has been murdered in the middle of Parliament”. If the model gets it wrong, then…well, Lestrade not seeing Holmes’s point is entirely in-character!
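Roughly the shape I have in mind, sketched here with plain TF-IDF similarity standing in for a properly trained model (the options, threshold, and function names are all made up):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Human-written dialogue options the game has actually implemented.
OPTIONS = [
    "Ask Lestrade about the fingerprint in the hallway.",
    "Compliment Lestrade on his thoroughness.",
    "Blast it all, Holmes, why are you going on about this nonsense "
    "when a man has been murdered in the middle of Parliament?",
]
FALLBACK = 2  # Lestrade not seeing your point is in-character anyway

vectorizer = TfidfVectorizer().fit(OPTIONS)
option_vectors = vectorizer.transform(OPTIONS)

def map_to_option(player_text: str, threshold: float = 0.2) -> int:
    """Return the index of the closest implemented option, falling back
    to the in-character 'misunderstanding' option on a weak match."""
    similarities = cosine_similarity(
        vectorizer.transform([player_text]), option_vectors)[0]
    best = similarities.argmax()
    return int(best) if similarities[best] >= threshold else FALLBACK

print(map_to_option("what about the fingerprint in the hallway?"))  # -> 0
```

A trained model would handle paraphrase far better than word overlap does, but the interface is the same: free text in, one author-written option out, and a safe fallback when nothing fits.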
That’s the sort of thing I hope we can still get in comps going forward: no ChatGPT-generated text, but leveraging this new technology in clever ways to improve the experience of interacting with human-written prose.
And this is in fact exactly what happened when I played the Mystery Academy game listed above, although it was allegedly specifically designed and limited to prevent that kind of thing from happening.
More generally, on that example: I’m not sure it’s an oversight that, in the hand-generated version of this Star Trek game, the player isn’t given an option to shrug their way through a pivotal conversation and say “gee, I dunno, boss.”
I’m skeptical of whether “limited improvisation” can ever truly avoid hallucination problems, or dull, meaningless text. See Mike Russo’s experience above, for example. But I’m willing to be convinced!
I’m no great fan of LLMs, but I do think neural networks in general have enormous potential if they’re applied in the right ways, and I’d love to see more exploration there. The fact that RNNs are so good at processing text now should be a wonderful thing that can have so many benefits to us; the main reason it’s not, imo, comes down to smoke and mirrors and a lot of unethical behavior on OpenAI’s part rather than any inherent property of the underlying technology.
In my own field of research, language models (not LLMs) were just starting to be used to solve intractably huge problems like “which of these thousands of fragmentary clay tablets might fit together” when LLMs took off, and now everyone is relying on confidently wrong ChatGPT hallucinations to translate words, and it’s poisoning the well of language models as a whole. Which is a real shame.
In response to the plagiarism concerns, could the argument be made that Chinese AI models avoid these problems, since China has stricter AI regulations and less permissive attitudes towards big tech enterprises?
This article shows the type of legal outcome that I think we’ve yet to see in the U.S.
Potentially! But as far as I know, all of the current LLM makers are very cagey about their sources, Chinese and American alike. I’d love to see some company decide to stand out on the ethical and environmental front instead of sheer volume of training data, but I haven’t seen one do that yet.
Fundamentally, I think good results are going to require building and training your own models in the end, rather than just calling out to a commercial product.
I don’t think AI is ideologically bad, and there are some parts of my life where I use it in moderation. Almost everyone I’ve talked to uses it similarly: specifically, pasting code error messages and small snippets of code into it and asking it to guess where the error is coming from, while recognising it’s likely only going to be right some of the time.
@discerning90, here are three separate issues with AI games right now that are all fairly different from each other:
1. Speed. It is true that parser games were once very slow. Now they are fast. Games with ‘timed text’ are almost universally despised (about 10% manage to find some fans). Right now, calling a live AI service to parse commands, for any reason, is, I’d say, unacceptably slow. It’s possible that a slimmer model trained only on IF inputs could speed things up; if that happened, this problem would of course go away.
2. AI has a high ecological footprint. Right now, any use of AI contributes to that, so for some people there is no ethical use of AI; others (like me) use it occasionally (for me, about once a month) with a slight unease in the back of the mind; and others don’t care at all. New innovations in hardware or software could reduce this burden, but until that happens, there will be a significant chunk of people who won’t like it for any reason.
3. AI-generated text is repetitive, doesn’t provide helpful in-game feedback, and is often misleading. In every case where an author has posted both AI text and the prompt they used to generate it, the prompt is better-written and more applicable to the game. AI text ‘looks right’. Good game text, though, will omit unnecessary details (to relieve the player of the burden of exploring 100 empty bookshelves and spiderwebs and trees), give an indication of what can be done in a location (for instance, if a well is described, it should be apparent that we can go into it, look into it, raise the bucket, etc.), and contribute to the overall plot (so if there is a ‘sense of mystery’ about a door, it shouldn’t lead to a broom closet; or if it does, that should be a joke and not just an oversight). Until these issues are resolved, players will simply not enjoy AI-written games.
It sounds like you might be able to overcome 3, and that many people won’t care about 2, but 1 is still going to be a sticking point. Especially since free-form text entry isn’t really beneficial in a parser game as the main method of input. Typically, you’d want a simplified set of easily-understandable commands that can be quickly processed to move around a map and take things. You’d be using the same set of commands for 90% of the game, with unusual commands (like spells, passwords, or riddles) making up the rest. In modern parser games, all those commands execute in milliseconds. AI doesn’t really help here, unless you just care about fixing simple typos, and there are non-AI extensions that already do that.
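To illustrate why that’s effectively instant: the core of a traditional parser is little more than table lookups. A toy sketch (hypothetical names; real parsers do far more, but nothing in the loop waits on a model or a network):

```python
# With a fixed command vocabulary, parsing reduces to dictionary lookups.
SYNONYMS = {"n": "north", "get": "take", "x": "examine", "l": "look"}

def parse(command: str):
    words = [SYNONYMS.get(w, w) for w in command.lower().split()]
    verb = words[0] if words else ""
    noun = " ".join(words[1:]) or None
    return verb, noun

print(parse("get lamp"))  # ('take', 'lamp')
print(parse("n"))         # ('north', None)
```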
You could make a game where most of the commands entered are unusual. Andrew Plotkin has a very old game called Praser 5 that is like that; it’s all really hard riddles and thought experiments. Theoretically, an AI could help tell if someone types in something close to the answer and count it as correct, and timing wouldn’t matter as much since each command corresponds to a lot of thought.
But just hooking up an AI as an intermediary between typing and pre-made human commands seems like it would provide a lot of slowdown with little benefit.
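And for the “close to the answer” case, you may not need a model in the loop at all; a plain fuzzy string match covers a lot of ground with zero per-turn latency. A minimal sketch (threshold chosen arbitrarily):

```python
from difflib import SequenceMatcher

def close_enough(guess: str, answer: str, threshold: float = 0.8) -> bool:
    """Accept near-miss answers (typos, small rewordings) with no model
    in the loop, so there's no per-turn latency to pay."""
    ratio = SequenceMatcher(
        None, guess.lower().strip(), answer.lower().strip()).ratio()
    return ratio >= threshold

print(close_enough("an echoe", "an echo"))  # True: one stray letter
print(close_enough("a mirror", "an echo"))  # False: a genuinely wrong guess
```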
The best-received AI game I’ve seen so far is Terra Nova:
The author had this to say about AI usage:
And finally, while AI tools were used for proofreading and image prototyping, the story itself – conceptually and structurally – was human-authored. The world map is hand-drawn, the character art refined by illustrators, and the story’s speculative arc was something I cared deeply about developing. This is mentioned in the game credits.
I felt like much of the writing resembled standard AI text, but not all of it; a lot feels custom-made. It’s terse (which is nice, compared to AI’s usual verbosity), but some of its text has AI hallmarks. Like this:
A gaping chasm splits the rocky terrain, revealing a deep void. A rusty iron ladder leads down into the dim light. The descent into the unknown fills you with a mix of fear and curiosity.
The first two sentences are just fine. The third is typical AI. Every AI-written game I’ve seen says “you’re filled with wonder and amazement” or “you can sense the work and care that went into this item” or “there is an aura of mystery and fear” etc. over and over again. It’s so repetitive that you might as well just have a line in the status bar that says what aura or sense you’re currently experiencing. Terra Nova fortunately avoids the worst of these excesses, so if you do use AI-generated text, I’d recommend you at least read it and edit it or use it judiciously like this author did.
Edit: I should add that, if you do integrate a parser and an LLM, I think that’s impressive and a worthwhile learning experience, and it could lead to great things in the future as you gain a deeper understanding of parser games. It doesn’t mean, though, that people will want to play the game; sometimes the best experiments and the most beneficial games for an author to write get the least praise or attention from audiences.
The descent into the unknown fills you with a mix of fear and curiosity.
Isn’t this what writers do? I mean, the first two sentences are in the picture, but the atmosphere is the province of words. So OK, this isn’t particularly inspiring, and maybe that’s the point; there’s also AI’s general over-egging. But it is what writers do.
Going back on topic, I think the idea of an “anything goes” contest would be an ideal way to encourage out-of-the-box experimentation, both in AI and otherwise.
Recently, I’ve been looking at AI style conversions. Text “prompts” are always way too vague, and a better kind of prompt is picture input. For example, could AI take your pencil sketches as a prompt and transform them according to your wishes? I don’t know. But it would be an experiment.
Here is Sugamo: the render version on the left and the Anime version on the right.
She’s from an off-the-wall game idea MikeR would like, called “Wildcats”. It features:
Incongruous half-naked Anime girls that team up to fight invisible demons on the Tokyo subway.
“Anything Goes” right?
And here’s Takebashi:
From the design:
Invisible demons from Hell are sent to the real world to make things go wrong. Which is why things go wrong!
The three girls are ascended humans with powers: one is an expert at hand-to-hand combat with sticks, one is a crack shot with guns, and one has bio-force and can also heal. You are their friend, called “The Player”, who is trapped in the “real world” but can communicate with them in a way that deliberately breaks the fourth wall. An action RPG with a compelling Interactive Fiction story.
Could be one for the “anything goes” comp. There won’t be any AI text [1], and only limited AI use in transformation experiments on existing images rather than full generation.
This is it for sure. You’re completely right, that line by itself is good. It’s just that AI likes to use it too much. It’s like when I was writing a game and one person pointed out that I was starting every single paragraph with ‘you’ and it was getting repetitive.
I would say this isn’t great writing because it’s overwritten to the point of being repetitive.
Nouns:
Chasm
Void
Unknown
Adjectives:
Gaping
Deep
Dim
All the nouns and all the adjectives are basically saying the same thing. As for the last sentence, IMO a well-written description would present a sense of fear and curiosity without needing to spell out to the player that that’s what the PC feels.
Edit: And I think that repetitiveness I pointed to is a hallmark of AI writing—it says the same thing over and over in different but not-any-more-effective words.
Years ago, when storage and memory mattered a lot, I wrote an IF text compressor. The idea was to pre-process by searching for repeated words and assigning those a token; after that, it would compress using a regular method. The idea being that tokenised repeated word groups would improve the saving. And it did, but it was also revealing:
We discovered that location descriptions were the worst at repeating various phrases. For example, they would start with things like “You are standing…” or “You find yourself…” or “You are in a…”. And so on. Also the ways in which available directions were mentioned: “To the east you can see…”, “Northwards leads to…”. And so on.
When we showed this to our writers, they immediately rushed off and edited their text, killing the space savings we had achieved.
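For the curious, the pre-pass looked conceptually something like this rough reconstruction (not the original code; the phrase length and table size here are arbitrary):

```python
import zlib
from collections import Counter

def tokenize_repeats(text: str, max_phrases: int = 20):
    """Swap the most frequent three-word phrases for unused single-byte
    tokens before handing the text to a general-purpose compressor."""
    words = text.split()
    counts = Counter(" ".join(words[i:i + 3]) for i in range(len(words) - 2))
    table, token = {}, 0x01
    for phrase, count in counts.most_common(max_phrases):
        if count < 2:
            break
        # Overlapping phrases may already have been consumed; then this no-ops.
        table[chr(token)] = phrase
        text = text.replace(phrase, chr(token))
        token += 1
    return text, table

descriptions = (
    "You are standing in a dusty hall. To the east you can see a door. "
    "You find yourself in a cold cellar. To the east you can see stairs. ") * 50
pre, table = tokenize_repeats(descriptions)
# Compare compressed sizes before and after the pre-pass.
print(len(zlib.compress(descriptions.encode())), len(zlib.compress(pre.encode())))
```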
I believe that’s still allowed by the rules of every major competition. The new rules are about using LLMs to generate text and images; building and training your own neural network to do something other than generating text isn’t covered by that.
So if you want to use an RNN classifier to help in parsing, go for it!