AI instead of classical parser

AI is still soooo much farther away from actual intelligence than the recent hoopla would have us believe.

Ouch. That’s top-tier stupid. I generally prefer older games, and I liked the original TR games, but I have to say the more recent Survivor trilogy is excellent.

NFTs are for suckers.

6 Likes

I hadn’t paid attention to this one Discord server I’m on but then I realized they were talking about this very game.

A snippet from there:

S: I haven’t been this railroaded since playing Densha De Go!
C: I did look in the Steam forums for a command list, I totally forgot police brutality was a command
S: I remembered that part and was trying to get it to happen but couldnt find the right words
S: It’s crazy that this game has become the ultimate version of the problem they were attempting to solve
C: Yeah for real. The command is just hit him
S: Laura Bow’s parser from '89 is infinitely better than this game lol
C: If you do stumble on some Japanese reviews I’d love to hear if the parser’s any better

S: I’m having luck exclusively using commands I know from the NES game lol

Painful.

3 Likes

Two thoughts about this:

  • Could it be that the tech demo refers exclusively to the “demonstration of Natural Language Processing”? That is, that it’s not about the parser and the understanding of text, but only about the recognition of spoken language? If that worked locally, it would be pretty good!
  • Enix is not very well known for its IF. The game is from 1983 and apparently has never been revised. Choosing it as the basis for the demo was certainly a bit clumsy, at least for a Western audience; in Japan it might be seen differently.
4 Likes

I can see why you’d think that (and it’s nice to give them the benefit of the doubt), but NLP is an umbrella term that encompasses many things other than speech-to-text, including parsing typed text input that’s in a natural conversational style, and it’s clear from their press release that their innovation was supposed to cover everything except language generation. They break down all the different sub-fields of NLP and talk about how this game demonstrates them. In fact, the thing that they spotlight the most is the text parsing:

At the time of the game’s original release, most adventure games were played using a “command input” system, where the player was asked to type in text to decide the actions of their character. Free text input systems like these allowed players to feel a great deal of freedom. However, they did come with one common source of frustration: players knowing what action they wanted to perform but being unable to do so because they could not find the right wording.

Basically, this was being explicitly touted as a way to eliminate guess-the-verb problems, and it looks like it failed at that.

About your second point, I’m not sure whether you’re saying it was a poor choice for SE to do this because they were never a major player in parser IF or this game specifically was a poor choice because it hadn’t previously been remade (or both), but as for the first, Enix was well known for its IF in Japan in the era of commercial text adventures, and this game in particular was absolutely iconic to Japanese gamers. It’s frequently referenced in pop culture even now. As for the second… what commercial parser games have been remade/updated recently? Other than the recent graphical remake of Colossal Cave Adventure, I can’t think of much. So it’s not like there was an easy option to go for a parser game that had been adapted for modern player expectations in every regard except the parsing.

That said, I do wonder how much of the problem here is the parsing and how much is the game being underimplemented by modern standards. Parsing the player’s input correctly doesn’t help much if the game has no response for that topic/action. I was going to check it out for myself before commenting on this aspect, but apparently this game is 10GB (another issue with the viability of this technology as it currently stands!) so I’ll have to wait a bit for it to download and install.

On a side note, I see they were originally planning to get around the implementation problem through generated text, but axed that because they couldn’t guarantee the language model wouldn’t give offensive responses. Honestly, though, even if they could, I think language generation is risky for a mystery game. The language model could generate a response with details that sound significant, and the player could go off chasing a string of red herrings without the game ever giving any feedback to indicate that they were on the wrong track, because it can generate a response to any question they ask or action they take. To a certain extent, I think a mystery game needs “I don’t have anything to say about that” responses to indicate when the player is trying to pursue an unhelpful line of investigation.

6 Likes

This is the big issue with making smarter parsers. Infocom experimented with parsing adverbs in some of their games…and it added basically nothing to the experience, because for the most part the world model didn’t care if you did something “carefully” or “thoroughly”. The few cases where it did matter just ended up being frustrating guess-the-adverb problems.

6 Likes

This sounds like what Spirit AI tried to do. I wonder how they solved the problem of limited training data. Or perhaps the result is an indication that they didn’t.

they were originally planning to get around the implementation problem through generated text, but axed that because they couldn’t guarantee the language model wouldn’t give offensive responses

It seems to me there is a middle ground here where a language model generates lots and lots of plausible responses, which are then curated by humans, and the good ones are hard coded into the final game.

3 Likes

As I recall, we had enough training data to get decent indications out of player input. It wasn’t really trying to get Adventure-style commands out – just general flags like “player is greeting me”, “player is frustrated”, “player is asking me a question”. The hope was that this could be combined with keyword search (for specific topics) to get a workable model.

The hard part was what Daniel Stelzer mentions: coming up with an authoring model to use those player input indications.
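A toy sketch of that kind of coarse flag classification plus keyword search. Everything here — the pattern names, the regexes, the topic list — is invented for illustration, not Spirit AI’s actual system:

```javascript
// Hypothetical sketch: classify free-form player input into coarse
// intent flags, then fall back to keyword search for specific topics.
const INTENT_PATTERNS = {
  greeting:    /\b(hello|hi|hey|good (morning|evening))\b/i,
  frustration: /\b(ugh|stupid|come on|why won'?t)\b/i,
  question:    /\?\s*$|^\s*(who|what|where|when|why|how)\b/i,
};

const TOPIC_KEYWORDS = ["necklace", "basement", "alibi"]; // invented topics

function classify(input) {
  const flags = Object.entries(INTENT_PATTERNS)
    .filter(([, re]) => re.test(input))
    .map(([name]) => name);
  const topics = TOPIC_KEYWORDS.filter((t) =>
    input.toLowerCase().includes(t)
  );
  return { flags, topics };
}

classify("Why won't you tell me about the necklace?");
// → { flags: ["frustration", "question"], topics: ["necklace"] }
```

Even a sketch this crude shows the authoring problem: the flags tell you *how* the player is talking, but someone still has to write content for every (flag, topic) pair that matters.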

5 Likes

From a quick gander at the transcript excerpts people were posting in the Steam reviews, it seems like this is a both/and rather than an either/or; most of the stuff I saw was player input that included all the correct nouns and verbs for an implemented command, but the parser/LLM not fully understanding it due to word order, prepositions, unimplemented synonyms, etc.

4 Likes

That’s true, and I think it’s actually worse than that: I don’t think many players actually want that kind of thing.

Like right now you could put together, using existing libraries, a JavaScript-based parser that could handle something like >RAKISHLY DIVEST YOURSELF OF THAT THREADBARE WORSTED MONSTROSITY as being synonymous with >DOFF SWEATER, but there are probably only a vanishingly small number of players who would want to play an entire game using syntax more like the former than the latter.
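To illustrate how trivial the collapse itself is, here’s a hand-rolled sketch with a plain synonym table rather than any real NLP library (all the word lists are invented):

```javascript
// Hypothetical sketch: collapse florid synonyms down to canonical
// verbs and nouns, so both inputs below become "DOFF SWEATER".
const VERB_SYNONYMS = {
  doff: ["doff", "remove", "divest", "shed", "take off"],
};
const NOUN_SYNONYMS = {
  sweater: ["sweater", "monstrosity", "pullover", "jumper"],
};

function canonicalize(input) {
  const text = input.toLowerCase();
  const verb = Object.keys(VERB_SYNONYMS).find((v) =>
    VERB_SYNONYMS[v].some((s) => text.includes(s))
  );
  const noun = Object.keys(NOUN_SYNONYMS).find((n) =>
    NOUN_SYNONYMS[n].some((s) => text.includes(s))
  );
  return verb && noun ? `${verb.toUpperCase()} ${noun.toUpperCase()}` : null;
}

canonicalize("RAKISHLY DIVEST YOURSELF OF THAT THREADBARE WORSTED MONSTROSITY");
// → "DOFF SWEATER"
canonicalize("DOFF SWEATER"); // → "DOFF SWEATER"
```

Which is sort of the point: accepting the florid input is cheap; it’s just not clear many players would ever type it.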

5 Likes

Overpromised and underdelivered AI-based gaming reminds me of the good old days:

3 Likes

As a pre-processor, I think the idea has some potential for newcomers; hypothetically, something like:

~> I want to get the cookies from the top shelf and if necessary I will stand on the stool to reach it
(Command simplified:
~>Stand on stool
~>Get cookies )

Okay, you stand on the stool and take the cookies.

In this way, the player learns how to play.
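A toy sketch of that pre-processor idea, splitting one verbose request into an ordered list of classic commands. The rewrite patterns are invented, and a real version would need actual planning logic to order the steps rather than the hardcoded rule priority used here:

```javascript
// Hypothetical sketch: rewrite a compound natural-language request
// into a sequence of terse parser commands, shown to the player
// so they learn the classic syntax. Rule order stands in for
// real planning (stand on the stool before reaching the shelf).
const REWRITES = [
  { pattern: /stand on (?:the )?(\w+)/i, command: (m) => `STAND ON ${m[1].toUpperCase()}` },
  { pattern: /get (?:the )?(\w+)/i,      command: (m) => `GET ${m[1].toUpperCase()}` },
];

function simplify(input) {
  const commands = [];
  for (const { pattern, command } of REWRITES) {
    const m = input.match(pattern);
    if (m) commands.push(command(m));
  }
  return commands;
}

simplify("I want to get the cookies from the top shelf and if necessary I will stand on the stool to reach it");
// → ["STAND ON STOOL", "GET COOKIES"]
```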

2 Likes

Well, the Steam review transcripts seem to indicate that there are one or two people out there who want that kind of thing.

2 Likes

I’m sure there are some, but I’d be willing to bet that even most of the players who think they want that, don’t.

Like I haven’t done any double-blind surveys or anything, but I imagine most IF players use >X LAMP where it’s possible, and comparatively few will resolutely stick to >EXAMINE LAMP just for the verisimilitude or whatever.

2 Likes

I don’t think those Steam reviewers are IF players in that sense. They most likely have no idea that X is short for EXAMINE, or even that you are supposed to EXAMINE things. From their perspective, the game tells them to type in what they want to do, and then utterly fails to understand anything they type.

3 Likes

Re: adverbs in parsers, I believe Infocom’s goal was to create puzzles that relied on adverbs to solve, such as >OPEN DOOR QUIETLY to prevent the rusty hinge’s squeak from alerting the guard. (This paper lists five adverbs the parser recognized: CAREFULLY, QUIETLY, SLOWLY, QUICKLY, and BRIEFLY.)
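For illustration, handling adverbs like these generally means peeling the adverb off the command and handing it to the action as an optional modifier that most actions will simply ignore — which is exactly the design problem at issue. A hypothetical sketch, not Infocom’s actual parser:

```javascript
// Hypothetical sketch: strip a recognized adverb off the command and
// return it as a modifier. Most actions in the world model would
// ignore it; only the rare puzzle (OPEN DOOR QUIETLY) would check it.
const ADVERBS = ["carefully", "quietly", "slowly", "quickly", "briefly"];

function parse(input) {
  const words = input.toLowerCase().split(/\s+/);
  const adverb = words.find((w) => ADVERBS.includes(w)) ?? null;
  const command = words.filter((w) => w !== adverb).join(" ");
  return { command, adverb };
}

parse("open door quietly"); // → { command: "open door", adverb: "quietly" }
parse("open door");         // → { command: "open door", adverb: null }
```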

3 Likes

Right. And that’s aggravating, regardless of what’s running on the back end.

The point I’m trying to make is that people generally look at this problem and think the solution is to have a UI (parser, interpreter, whatever) that is bigger, wider, more complicated, more expressive, whatever. And that might work better for totally new users with absolutely no experience with the interface. But outside of that, and a couple of other corner cases, the opposite is usually true: people typically want terse, well-defined interfaces. In the sense that that’s actually how most users tend to behave, regardless of how they might respond to queries about their preferences.

Like at a fast food drive-thru window, you theoretically have at your disposal the entire power of whatever language(s) you happen to share with the person working the register. But what most people tend to actually use is a fairly sparse vocabulary, simplistic syntax, and so on. Because they’re actually prioritizing being understood and getting what they want over an abstract desire for expressiveness. Even when the other end of the communication is an actual literal human who is much more proficient at natural language processing than most UIs, even those featuring “AI”.

5 Likes

This seems like the worst of both worlds unless you’re really, really, really good at hinting which adverbs are usable, and where.

5 Likes

That’s probably why it didn’t take. But (in Infocom’s instance, at least), adverb support wasn’t added merely to permit florid input from the player.

4 Likes

This is completely anecdotal, and names have been redacted to protect the innocent and otherwise, but this recent conversation was oddly pertinent to this discussion:

The sentiment at the end is probably the best to remember; there is no wrong way to eat an interactive fiction.

5 Likes

Yeah, in general programming (and specifically network programming) there’s an idea called the Postel principle, usually phrased as: be liberal in what you accept and conservative in what you send. So accept multiple variant input forms, but always produce the same output form consistently.
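Applied to a parser, the Postel principle might look like this toy sketch: accept several spellings of a command but always normalize to one canonical form (the alias tables are invented for illustration, not taken from any real IF library):

```javascript
// Hypothetical sketch of the Postel principle in a parser: be liberal
// in the input spellings accepted, conservative in the canonical form
// echoed back to the player.
const VERB_ALIASES = {
  EXAMINE: ["x", "examine", "look at", "inspect"],
  TAKE:    ["take", "get", "pick up", "grab"],
};

function normalize(input) {
  const text = input.trim().toLowerCase();
  for (const [canonical, aliases] of Object.entries(VERB_ALIASES)) {
    for (const alias of aliases) {
      if (text === alias || text.startsWith(alias + " ")) {
        return (canonical + text.slice(alias.length)).toUpperCase().trim();
      }
    }
  }
  return text.toUpperCase(); // pass unrecognized commands through unchanged
}

normalize("x lamp");       // → "EXAMINE LAMP"
normalize("look at lamp"); // → "EXAMINE LAMP"
normalize("get lamp");     // → "TAKE LAMP"
```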

I think that’s a pretty good principle in general, although I think the Infocom adverbs thing is a pretty good counterexample: if adverbs only matter for one puzzle, then you’re probably going to end up with a lot of players getting stuck (because they got trained out of using adverbs by all the situations where they don’t matter), or you’re going to have a lot of players wasting time experimenting with them to no effect.

I think Deadline, a game I quite like, is also guilty of this sort of thing, with >SEARCH NEAR (something) being helpfully suggested in Appendix B of the manual. But since it’s important in, as I recall, exactly one puzzle, I imagine the length of time between when a player reads the manual and when they encounter the puzzle is a good predictor of how much trouble they’ll have with that puzzle.

4 Likes