AI in competitions

Personally, I think using an LLM for “stage 1” of the parser (as jkj yuio put it) would be fine, if you could somehow get around the latency issues. You’re not generating any of the actual creative content with AI, you’re just using an LLM for what they’re actually good at: processing text, turning I WANT TO GO AROUND THE HOUSE AND OPEN THE WINDOW ON THE OTHER SIDE into GO EAST THEN OPEN WINDOW.
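Just as a rough illustration of what I mean by that translation layer, here’s a minimal sketch in Python, assuming an OpenAI-style chat API (the model name and prompt are placeholders, not anything an actual game uses):

    # Hypothetical "stage 1" translator: freeform player input -> canonical parser commands.
    # Assumes the OpenAI Python client; the model name and prompt are illustrative only.
    from openai import OpenAI

    client = OpenAI()

    SYSTEM_PROMPT = (
        "You translate a player's freeform text into commands for a classic IF parser. "
        "Reply with only the command(s), e.g. 'GO EAST THEN OPEN WINDOW'. "
        "If no sensible translation exists, reply with UNKNOWN."
    )

    def translate(player_input: str) -> str:
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": player_input},
            ],
        )
        return response.choices[0].message.content.strip()

    # translate("I want to go around the house and open the window on the other side")
    # should ideally come back as "GO EAST THEN OPEN WINDOW"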

3 Likes

Right. In my short experience with one of the Zork implementations, AI can work as a sort of “game show host,” interpreting weird, complicated commands into something the parser can actually use, and potentially improvising: when the parser responds “I don’t understand MOW GRASS,” the AI can pick up the slack with a more user-friendly response that isn’t game-altering, like “The grass is overgrown and hasn’t been trimmed in a while, but that’s not your job.”

That part I’m more wary of, both because it’s doing creative work, and because the LLM has no way to know what counts as user-friendly but not game-altering. What if the grass being meticulously trimmed is an important clue later, since it shows that the so-called Baron is actually the gardener impersonating his deceased master?

9 Likes

I understand what you mean. I was impressed that the Zork AI’s improvs (at least in the part I played) seemed very in-world; I wonder if that means the author sits and “directs” the AI’s understanding of the world by having conversations with it?

I assume if the grass is an important part of the game, it would have an authored implementation, so the parser wouldn’t throw an “I do not understand” error; it’s only in cases where the parser is flummoxed that the AI would compose a minimal but more fluent refusal like “I’m not sure gardening is an optimal use of your time.” Basically, customizing a generic refusal based on real-world understanding of what the player typed, even when the game isn’t designed to accept that input.

1 Like

Oh, sure. But what if e.g. it’s an unrecognized verb, and MOW wouldn’t work on any noun in the game?

Aside from latency, the other issue I’ve encountered with that use is that sometimes it transforms your input in weird ways and ends up giving you responses that have very little to do with the command you entered, leaving you unclear on why the thing you were trying to do didn’t work. Like, if I type “get batteries” and the game reacts as though I tried turning on the shower, I don’t know if the problem is that the game didn’t understand “get” or didn’t understand “batteries” or the batteries are not in scope or “get batteries” specifically wasn’t implemented because the batteries weren’t supposed to be portable, and I also have no insight into the process that led the game to decide I wanted to try to interact with a different item instead.

Part of the issue is that people who want to use LLM-based parsers mostly seem to want players to experience them as something that Just Works, seamlessly and under-the-hood, so they’re not going to have the game say “[I don’t understand ‘get’, rephrasing that to ‘turn on shower’]”. But also you can’t really have an LLM fully show its workings, so probably some of the “wait, how did we get from this to that?” just comes with the territory.

7 Likes

The general view seems to be that Zork AI is the most advanced parser currently available, but when playing it I noticed some issues. When I type INVENTORY, it gives me a complete and accurate list of what I’m holding. But when I type “I check my inventory” immediately afterwards, I get this response: “Your inventory consists of a collection of essentials such as a brass lantern, a sword, and a large amount of optimism—perfect for navigating narrow passages and fending off any lurking Grues.” (Even though I was carrying like 5 other things.) This tells me that room-available actions are being mapped as possible matches for the AI parser, but universal actions that can be taken in any room are not.

2 Likes

I wonder if the AI interpreted this as EXAMINE INVENTORY, but since CHECK likely isn’t a standard action, it might have interpreted it as you wanting a definition of what inventory means, in the sense of “check out my inventory.”

I support using GenAI to write and test code. I could see maybe limited use for parser enhancement or help systems, but I’m fully against any GenAI use for storytelling.

2 Likes

I don’t know if they were bragging, but the Apple ad with the crushing machine was pretty memorable: https://www.youtube.com/watch?v=xGyOIFRJPII

Edit: FYI, this is a very short AP video which shows most of the original ad; I can’t find the original ad itself, since Apple has tried to erase it. It isn’t necessarily about AI, though that’s largely how it was interpreted, and I just think everyone should see it because it’s fantastic.

Part of the problem right now is that it takes a lot of effort to invent a genuinely cool use-case for AI in IF, but almost no effort at all to churn out derivative slop. So, any comp that is open to AI is going to get a lot more slop AI than cool AI.

Eventually we’ll figure out how to sort the good from the bad, but we don’t really know what “good” AI will look like yet. So, in the meantime, comps can either allow AI and incentivize slop, or ban AI and disincentivize ever figuring out how to do something other than slop. It’s a catch-22.

(Personally, I think a good interim approach is for comps to allow live API calls [typically a sign of high-effort experimentation] but ban pregenerated AI text [typically a sign of low-effort slop].)

6 Likes

Speaking as one of the judge-reviewers who’s talked about not wanting to engage with generative AI–based games, I have to say that what I’ve seen of games with live API calls has not been more fun to read than games with pregenerated text, and in some ways it’s worse because the putative author has ceded some or all control over the story to the LLM, resulting in something that’s highly unlikely to hang together even on a basic plot level, much less tonally or thematically. It might be more technically impressive but it does not make me enjoy playing the game more, and it definitely does not make me want to spend time reviewing it given that so little of what’s there was put there deliberately.

7 Likes

I feel like a broken record about this, but I’ve found the games that have tried this so far kind of frustrating, because it’s not clear what they’re translating your input into or why it wasn’t understood in its original form. With traditional parsers my take has always been “If you can tell me I can’t read the book because I’m not holding it, you can (and should) make me try taking the book first instead,” but I do see in this case why Draconis would suggest having it tell the player what syntax to use instead. I would at least want a “didn’t understand X, trying Y” sort of message when it translates a command.

9 Likes

Yes - this is more or less exactly what I’d be hoping for. Personally, I’d want it to pop up a message in brackets much like “(first taking the book)” or “(first opening the door).” If the player typed something like “seize lamp” it could print “(take the lamp)” and then try taking. It could even just say “‘Seize’ isn’t a verb in this game. Would you like to try taking instead?” with a y/n choice rather than an invitation to type the correct command.

I think this could be a pretty minor quality-of-life tweak, even. Something that isn’t even run unless the command as typed would have produced an error, and would (hopefully) stay out of the way unless the invalid command was reasonably close to something else that would have worked. Ideally I’d hope for something virtually indistinguishable from a very wide range of synonyms having been implemented.
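A self-contained little sketch of that confirm-before-rewriting flow, if it helps (the verb list and synonym table are toy stand-ins, not a real engine API):

    # Toy sketch of the "ask before rewriting" idea described above.
    # KNOWN_VERBS and SYNONYM_HINTS are stand-ins for whatever the real game knows.

    KNOWN_VERBS = {"take", "open", "go", "look"}
    SYNONYM_HINTS = {"seize": "take", "grab": "take"}  # unknown verb -> suggested verb

    def handle_input(cmd: str) -> None:
        verb, *rest = cmd.lower().split()
        if verb in KNOWN_VERBS:
            print(f"[running '{cmd}']")           # normal path: no AI involved
            return
        hint = SYNONYM_HINTS.get(verb)
        if hint is None:
            print("I don't understand that.")     # nothing close enough; plain refusal
            return
        suggestion = " ".join([hint, *rest])
        answer = input(f"'{verb.title()}' isn't a verb in this game. "
                       f"Try '{suggestion}' instead? (y/n) ")
        if answer.strip().lower().startswith("y"):
            print(f"({suggestion})")              # echo what actually gets run
            print(f"[running '{suggestion}']")
        else:
            print("Okay.")

    # handle_input("seize lamp")  -> offers "take lamp" and only runs it if you say yes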

5 Likes

If someone does work more on implementing LLM assistance on parser games, here’s what I think would help:

  • Having it interpret commands and then execute them can be obnoxious. A lot of the time, as a player, knowing the boundaries of what’s possible is useful. If a room only has one exit (to the east), and I type GO DOWN, the game might search for all possible effective commands and choose GO EAST because it’s the better result. But I don’t really want to go EAST!
  • What I think would be better is if the game instead trained the player on how to play a parser game. Once someone gets the main commands of parser games, they don’t really need help any more. So instead of automatically executing commands, I’d prefer something like smaller grey text under your command saying “Recommend typing X ME instead of WHO AM I?” (see the sketch after this list).
  • Another benefit to having advisory text instead of automatically adjusting your command is command delay. LLMs are slow. Normal parser games have instant response times (usually; I accidentally made one with a 1s lag time once). Waiting for LLMs to adjust your response is annoying. But if the hint just pops up in response to earlier commands (hopefully in previously-allocated space so the prompt doesn’t jump around), then the player doesn’t have to wait around.
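Something like this is roughly what I mean: the refusal prints immediately, and the advisory line shows up whenever the slow model finishes (the function names are made up, and the “model” is just a stub):

    # Rough sketch of non-blocking advisory text: the refusal prints right away,
    # and the grey hint appears later, whenever the slow model finishes.
    # slow_hint_model() is a stub standing in for an LLM or local classifier.
    import threading
    import time

    def slow_hint_model(cmd: str) -> str:
        time.sleep(2)  # pretend the model takes a while
        return "Recommend typing X ME instead of WHO AM I?" if "who am i" in cmd.lower() else ""

    def handle_unknown_command(cmd: str) -> None:
        print("I don't understand that.")   # instant response, as usual

        def fill_hint() -> None:
            hint = slow_hint_model(cmd)
            if hint:
                # a real UI would put this in pre-allocated space; here it just prints below
                print("\x1b[90m" + hint + "\x1b[0m")  # grey, purely advisory; never executed

        threading.Thread(target=fill_hint, daemon=True).start()

    # handle_unknown_command("who am i?")
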
7 Likes

I second that. Once a player figures out the basics of parser input, there’s almost nothing of value an AI can add there. A short tutorial game or transcript would be just as useful. Using AI for this is a solution in search of a problem.

5 Likes

It’s almost like expending effort in conventional tutorial innovation might pay dividends.

3 Likes

The question is, can you accomplish this without all the drawbacks that come with LLMs? (Both the objections to the technology, and the practical issues like “needing to ping an external server for every command, causing a few seconds of latency”.)

I think this could potentially be a very valuable feature to reduce the barriers to entry…but only if the cure isn’t worse than the disease. I think pinging out to ChatGPT on every command will cause more problems than it solves.

6 Likes

If we focus purely on the technical side, and our goal is just to translate user input into a proper command the parser understands, then the task is fairly simple and there’s no need to use a huge model like ChatGPT. You might not even need a language model at all; there are simpler and cheaper neural networks. In any case, you should be able to do it locally, on the user’s PC. It would still be a bit slow, so as suggested, it would be best to only do it when the parser doesn’t understand the user’s input.

That is, if we accept that when the user types “I look around for enemies”, the AI just translates that into LOOK. Or the user types “I want to poison the soup” and the AI translates that into PUT POISON IN SOUP.

A larger model is more important if you want “I look around for enemies” to be interpreted as the AI reading the room description, and then on the fly making up the answer “There are no enemies here.”

In any case, grafting something like this onto an interpreter for existing systems has some challenges, as the AI will work best if you let it know which verbs the game understands and which objects are currently in scope.
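For example, something along these lines (the verb list and scope list are assumptions about what an interpreter could expose; this isn’t any real Inform or TADS API):

    # Sketch of handing the model the game's actual vocabulary and scope,
    # as suggested above. Nothing here is a real interpreter API.

    def build_context(verbs: list[str], in_scope: list[str]) -> str:
        return (
            "Rewrite the player's input as a single command for a text-adventure parser.\n"
            "Allowed verbs: " + ", ".join(verbs) + "\n"
            "Objects currently in scope: " + ", ".join(in_scope) + "\n"
            "If the input can't be expressed with these, answer UNKNOWN."
        )

    prompt = build_context(
        verbs=["take", "drop", "open", "go", "look", "put"],
        in_scope=["soup", "poison", "kitchen door"],
    )
    # prompt + "I want to poison the soup"  ->  ideally "PUT POISON IN SOUP"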

3 Likes

Yeah, I haven’t really been following neural network research for the past few years, but this feels like something an RNN could do great at. Fundamentally it’s just a classification task, and neural networks are amazing at classification tasks. It’s their best field!
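Just to illustrate the classification framing (this uses a bag-of-words classifier from scikit-learn as a stand-in rather than an actual RNN, and the training examples are made up):

    # Toy illustration of "freeform input -> canonical command" as a classification task.
    # A TF-IDF + logistic regression pipeline stands in for the RNN; the data is made up.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    examples = [
        ("i look around for enemies", "LOOK"),
        ("glance around the room", "LOOK"),
        ("check my inventory", "INVENTORY"),
        ("what am i carrying", "INVENTORY"),
        ("grab the lamp", "TAKE LAMP"),
        ("pick up the brass lantern", "TAKE LAMP"),
    ]
    texts, labels = zip(*examples)

    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(texts, labels)

    print(model.predict(["i want to see what i have on me"]))  # hopefully ['INVENTORY']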

Now, I’m also not especially interested in being the one to make this work. But if someone does, I’d like to see the results!

(And for what it’s worth, I don’t think a system like we’re describing would run afoul of the LLM-generated text rules. This is an LM, not an LLM, and it’s generating suggestions to the player, not game text.)

3 Likes