AI in competitions

Personally, I think using an LLM for “stage 1” of the parser (as jkj yuio put it) would be fine, if you could somehow get around the latency issues. You’re not generating any of the actual creative content with AI, you’re just using an LLM for what they’re actually good at: processing text, turning I WANT TO GO AROUND THE HOUSE AND OPEN THE WINDOW ON THE OTHER SIDE into GO EAST THEN OPEN WINDOW.
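To make the "stage 1" idea concrete, here is a rough sketch of how a rewrite step could sit in front of an ordinary parser. Everything in it is an assumption for illustration: `KNOWN_VERBS`, `stage_one`, and `fake_rewrite` are made-up names, and `fake_rewrite` is a canned stand-in for whatever LLM call a real game would make. The point is only that the rewritten text gets checked against the parser's vocabulary before it is used, and the player's original input is passed through untouched if the check fails.

```python
from typing import Callable, List

# Hypothetical verb list; a real game would export its own vocabulary.
KNOWN_VERBS = {"GO", "OPEN", "TAKE", "EXAMINE"}


def stage_one(raw: str, rewrite: Callable[[str], str]) -> List[str]:
    """Rewrite free text into parser commands, accepting the rewrite
    only if every resulting command starts with a known verb."""
    rewritten = rewrite(raw).upper()
    commands = [c.strip() for c in rewritten.split(" THEN ")]
    valid = [c for c in commands if c and c.split()[0] in KNOWN_VERBS]
    # If anything was rejected, hand the parser the player's own words instead.
    return valid if len(valid) == len(commands) else [raw]


def fake_rewrite(raw: str) -> str:
    """Stand-in for the LLM call (assumption: a real one would go here)."""
    canned = {
        "I WANT TO GO AROUND THE HOUSE AND OPEN THE WINDOW ON THE OTHER SIDE":
            "GO EAST THEN OPEN WINDOW",
    }
    return canned.get(raw.upper(), raw)
```

With the canned rewrite above, the long sentence comes out as two parser commands, while an input like "mow grass" is handed to the parser unchanged.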

Right. In my short experience with one of the Zork implementations, AI can work as a sort of “game show host,” interpreting weird, complicated commands into something the parser can actually use. It can also improvise: when the parser responds “I don’t understand MOW GRASS,” the AI can pick up the slack with a more user-friendly response that isn’t game-altering, like “The grass is overgrown and hasn’t been trimmed in a while, but that’s not your job.”

That part I’m more wary of, both because it’s doing creative work, and because the LLM has no way to know what counts as user-friendly but not game-altering. What if the grass being meticulously trimmed is an important clue later, since it shows that the so-called Baron is actually the gardener impersonating his deceased master?

I understand what you mean. I was impressed that the Zork AI’s improvs (at least in the part I played) seemed very in-world, and I don’t know if that means the author sits and “directs” the AI to understand the world by having conversations with it?

I assume if the grass is an important part of the game, it would have an authored implementation, so the parser wouldn’t throw an “I do not understand” error. Only in cases where the parser is flummoxed would the AI compose a minimal but more fluent refusal like “I’m not sure gardening is an optimal use of your time.” Basically, it customizes a generic refusal based on a real-world understanding of what the player typed, even when the game is not designed to accept that input.
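As a minimal sketch of that ordering (authored text first, improvisation only when the parser gives up), something like the following would do it. The `AUTHORED` table, its sample text, and the `parse`, `respond`, and `improvise` names are all hypothetical; I don't know how the Zork AI actually wires this up.

```python
from typing import Callable, Optional

# Hypothetical authored responses; a real game would have a full world model.
AUTHORED = {"EXAMINE GRASS": "The lawn is meticulously trimmed."}


def parse(command: str) -> Optional[str]:
    """Return the authored response, or None if the parser is flummoxed."""
    return AUTHORED.get(command.strip().upper())


def respond(command: str, improvise: Callable[[str], str]) -> str:
    """Authored content always wins; the improviser only sees failures."""
    authored = parse(command)
    if authored is not None:
        return authored
    return improvise(command)
```

Here `improvise` is where an LLM would go. Because it never sees commands the parser already handles, it can't overwrite an authored clue, though it can still say something that contradicts one.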

Oh, sure. But what if e.g. it’s an unrecognized verb, and MOW wouldn’t work on any noun in the game?

Aside from latency, the other issue I’ve encountered with that use is that sometimes it transforms your input in weird ways and ends up giving you responses that have very little to do with the command you entered, leaving you unclear on why the thing you were trying to do didn’t work. Like, if I type “get batteries” and the game reacts as though I tried turning on the shower, I don’t know if the problem is that the game didn’t understand “get” or didn’t understand “batteries” or the batteries are not in scope or “get batteries” specifically wasn’t implemented because the batteries weren’t supposed to be portable, and I also have no insight into the process that led the game to decide I wanted to try to interact with a different item instead.

Part of the issue is that people who want to use LLM-based parsers mostly seem to want players to experience them as something that Just Works, seamlessly and under-the-hood, so they’re not going to have the game say “[I don’t understand ‘get’, rephrasing that to ‘turn on shower’]”. But also you can’t really have an LLM fully show its workings, so probably some of the “wait, how did we get from this to that?” just comes with the territory.
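For what it's worth, the bracketed note is cheap to produce if the game keeps both the original and the rewritten command around. A minimal sketch, with `annotate` as a made-up helper name and the shower text invented for the example:

```python
def annotate(original: str, rewritten: str, response: str) -> str:
    """Prefix the response with a note whenever the command was changed."""
    if original.strip().upper() == rewritten.strip().upper():
        return response  # nothing was rewritten, so stay quiet
    return f"[Interpreting '{original}' as '{rewritten}']\n{response}"


print(annotate("get batteries", "TURN ON SHOWER", "The shower sputters to life."))
```

This only shows what the input was turned into, not why, so it doesn't solve the "how did we get from this to that?" problem. It just makes the jump visible.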

The general view seems to be that Zork AI is the most advanced parser currently available. But when playing it I noticed some issues. When I type in inventory, it gives me a complete and accurate list of what I’m holding. But when I type in ‘I check my inventory,’ immediately afterwards, I get this response. “Your inventory consists of a collection of essentials such as a brass lantern, a sword, and a large amount of optimism—perfect for navigating narrow passages and fending off any lurking Grues.” (Even though I was carrying like 5 other things) This tells me that room-available actions are being mapped as possible matches for the AI parser, but universal actions that can be taken in any room are not being mapped.
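To illustrate what I mean by mapping universal actions, here is a toy sketch. It is a guess at the general shape, not a description of how Zork AI works: `UNIVERSAL_ACTIONS`, `candidate_actions`, and `match` are hypothetical names, and the matching is a crude keyword check rather than anything an LLM would do.

```python
from typing import Iterable, Optional, Set

# Hypothetical actions that should be available in every room.
UNIVERSAL_ACTIONS = {"INVENTORY", "LOOK", "WAIT"}


def candidate_actions(room_actions: Iterable[str]) -> Set[str]:
    """Merge the room's actions with the ones valid everywhere."""
    return set(room_actions) | UNIVERSAL_ACTIONS


def match(text: str, candidates: Iterable[str]) -> Optional[str]:
    """Pick a candidate whose words all appear in the player's text."""
    words = set(text.upper().replace(",", " ").split())
    hits = sorted(a for a in candidates if set(a.split()) <= words)
    return hits[0] if hits else None
```

If the candidate list is built from room actions alone, "I check my inventory" has nothing to match against and the model is left to make something up. With the universal actions merged in, the same sentence lands on INVENTORY.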

I wonder if the AI interpreted this as EXAMINE INVENTORY, but since CHECK likely isn’t a standard action, it might have decided that you wanted a definition of what inventory means, in the sense of “check out my inventory”.