Testing your game with ChatGPT

Something that came up in this thread was the possibility of using ChatGPT (or similar chatbots) to simulate a player playing your game, for testing purposes. The initial forays in that thread were not very successful, but you can get reasonably good results by giving ChatGPT an appropriate initial prompt such as:

You are now playing a text adventure game. When I input a prompt, it is what the game is saying to you; respond with your command, which should be a very short imperative sentence starting with a verb and no more than two nouns, and then stop. Type “START” to begin.

ChatGPT will then output “START”. Paste the game’s starting text into ChatGPT, and ferry its responses back and forth to the interpreter.

ChatGPT is not particularly good at actually solving your game, but it does succeed very well at one specific (and thoroughly maddening) task: seeing what kinds of incorrect commands and conceptual misunderstandings a novice might try in your game. For example, here’s what it tries to do with Savoir-Faire:

Transcript of Savoir-Faire:

You are now playing a text adventure game. When I input a prompt, it is what the game is saying to you; respond with your command, which should be a very short imperative sentence starting with a verb and no more than two nouns, and then stop. Type “START” to begin.

START

The beautiful life is always damned, they say. As for you, you’ve overexpended yourself: fifteen years of prominence, champagne, carriage rides in the Tuileries, having your name whispered behind manicured hands, getting elegant ladies out of elegant fixes — and you’re in debt. Bound by oath and honor to a pack of scoundrels. Your father, old peasant that he was, could have warned you against their type.

You’d hoped to find the Count at home; your original plan was to ask him for an advance on the money you expect to earn on your estate in a month or two. But according to the local population, the Count hasn’t been home in months, and Marie, who was in residence until recently, has vanished as well. “No one around the big house now,” says one of the peasant women tersely. “Even turned off the cook, they did.”

But you need the money at once; if you return to Paris without it, your creditors are unlikely to be forgiving. So you walked over the fields anyway, and got in through the gap in the kitchen garden wall. You will just have to find what you can.

Savoir-Faire
An interactive search for loot. Copyright Emily Short 2002-4.
Type HELP if you have not played this game before, even if you are familiar with IF.
Release 8 / Serial number 040205 / Inform v6.15 Library 6/10

Kitchen Garden
West is the wall of the kitchen. Most of the space along the wall is planted with herbs — bay, parsley, stalks of mint — left of the open doorway. To its right is only a drainage ditch, catching the outflow of a pipe that
comes through the wall.

In one corner of the plot is the well, drilled many deep feet; in another is the antique sundial.

A tangle of climbing roses covers almost completely the wall and door of a shed to the southeast.

Search the shed.

The dense tangle of roses covers up the door of the shed, effectively preventing you from opening it.

Remove the roses.

(first taking the roses)
The roses are equipped with really ferocious thorns.

Find gloves.

That’s not a verb I recognize.

Get gloves.

You can’t see any such thing.

Look for gloves.

I only understood you as far as wanting to look.

Search for gloves.

You can’t see any such thing.

Look around for gloves or any other tool that could help me remove the roses without getting hurt by thorns.

I only understood you as far as wanting to look.

If you’ve spent any amount of time reading transcripts of your games from novice players, some of this will look quite familiar! Referring to objects that haven’t been mentioned, vague commands like “search for ______”, rephrasing by increasing the complexity of a command rather than decreasing it… One thing we learn from this is that the default parser responses are not always very helpful at guiding novice players towards more correct forms of input. For a contrast, consider the same initial prompt applied to Hadean Lands, which has comparatively friendlier responses and a more sophisticated tutorial system:

Transcript of Hadean Lands:

You are now playing a text adventure game. When I input a prompt, it is what the game is saying to you; respond with your command, which should be a very short imperative sentence starting with a verb and no more than two nouns, and then stop. Type “START” to begin.

START

You smell copal incense, machine oil, rosemary, alcohol, and blood. Creaking, bending steel beams… no, that’s not an odor. Why did you think the bulkheads were crumpling in on you? What would that even smell like?

You’re pretty sure it’s not your own blood, anyway.

The secondary alchemy lab is your duty station this shift; where you were, before — before now. Before you woke up on the floor. Smelling copal, oil, rosemary, and — well, alcohol is right. You were scrubbing the
workbench. Enjoying your shift, swabbie?

Not any more, Sarge.

The lamps shouldn’t be flickering like that. And His Majesty’s Marcher The Unanswerable Retort really, really shouldn’t be this quiet.

Hadean Lands
Overdue Interactive Fiction by Andrew Plotkin
Copyright (c) 2014-6 by Andrew Plotkin
Release 5 / Serial number 160522 / Inform 7 build 6G60 (I6/v6.33)

[You play this game by typing commands. To look around right now, type “LOOK” and hit Enter.]

LOOK.

Secondary Alchemy Lab
The lab is unsettlingly dim, but familiar enough: rough wooden walls, the broad stone surface of the workbench. The lab door is to the north; it’s closed.

To the east is the lab’s supply closet. Except… the entire east side of the room is filled with clear crystal. That’s not so familiar. That’s downright strange.

You notice a heap of alchemical supplies piled on a table. Next to the table, an iron panel is set in the floor.

[There’s quite a bit here! You can examine these things more closely. Try “EXAMINE BENCH”.]

EXAMINE BENCH.

The workbench is a broad slab of bluestone, chipped and scarred. A bound is incised in its surface — the circular arc that contains and constrains works of alchemy.

The workbench bound is empty.

You notice a worn steel bolt lying under the workbench.

[You’ll be using these ritual bounds a lot. The “EXAMINE” command works on nearly anything. Since you’ll use it so often, you can abbreviate it. Try “X MYSELF”.]

X MYSELF.

Not the most promising Ensign in His Majesty’s Navy.

[You’ve learned a bit about yourself, at least. Check out the rest of this room. Remember, you can get a general overview by typing “LOOK” or “L”.]

L or LOOK.

[I understood the command “l”, but not the following words. To look more closely at something, try “LOOK AT OBJECT” or “EXAMINE OBJECT”. You can also use the abbreviation “X OBJECT”.]

LOOK AROUND.

Secondary Alchemy Lab
The lab is unsettlingly dim, but familiar enough: rough wooden walls, the broad stone surface of the workbench. The lab door is to the north; it’s closed.

To the east is the lab’s supply closet. Except… the entire east side of the room is filled with clear crystal. That’s not so familiar. That’s downright strange.

You notice a heap of alchemical supplies piled on a table. Next to the table, an iron panel is set in the floor.

A worn steel bolt is lying half-hidden under the workbench.

EXAMINE CRYSTAL WALL.

You can’t see any such thing.

EXAMINE EAST WALL.

You can’t see any such thing.

EXAMINE CRYSTAL.

It’s not crystal. It’s more as if the air has cracked, right across the room, leaving a plane of cleavage which is visible only in refraction.

[You can’t do anything about this fracture blockage right now.]

OPEN DOOR

You pull the door open. Then you nearly fall down, so sharp is your disorientation. Only your grip on the door convinces you that you are still standing upright.

A Hadean land lies outside the lab door.

I’m sure this wasn’t in the regs. The east-side corridor is missing. Or most of it is missing. A foot or so of ordinary tile floor is at your feet; but beyond that, the corridor is cut away as if by a knife. Beyond is a barren grey wasteland — a Hadean land, deadly to any breathing life. Unweathered grey rocks and dead dust rest under bright, unwinking stars.

EXAMINE WASTELAND.

You can’t see any such thing.

LOOK OUTSIDE.

Most Hadean lands look alike — dead rocks, night sky — and you can tell nothing about this one, except that it very definitely shouldn’t be where the Retort’s east-side corridor usually is.

GO OUTSIDE

You might be sucked through the door into vacuum. But wouldn’t the lab’s atmosphere be sucked out anyway…?

You could ponder that for hours, but your hand has more curiosity or less sense than your head. You reach out — and your fingers brush another crystal-hard, slick surface. The Hadean land is blocked by a fracture, just like the supply closet.

GO BACK INSIDE

You can’t see any such thing.

GO INSIDE LAB

The fracture, or whatever it is, is a foot or so beyond the door. There’s no room to step outside. And you wouldn’t want to take a stroll out there, anyway.

EXAMINE TABLE

Lying on the flimsy table are a sheet of instructions, a sprig of rosemary, two impets of essential oil (peppermint and ginger), a pair of tarnished calipers, and a brass pin.

[This looks promising. You can “EXAMINE” any of the items on the table. Common synonyms should work. For example, you can “EXAMINE THE SHEET” or “READ INSTRUCTIONS”.]

EXAMINE INSTRUCTIONS

You pick up the sheet, and find Sergeant Brooks’ familiar scrawl. “Ensign Forsyth: When you’re finished scrubbing, get these calipers freshened up. You should be able to handle the ritual. See me afterwards.”

The other side is printed in the familiar smudgy blackletter of Navy receipts. “FOR THE CLEANSING OF BRASS TARNISH: Prepare an atmosphere of fiery principles. Place a brass token within the bound, and seal it. Speak a word of essential nature, so that the properties of brass may be evoked. Compound the atmosphere with a resinous note. Then intone the Lesser Phlogistical Saturation to complete the token’s investment. Place token directly on tarnished item.”

The syllables of the Saturation formula are spelled out below. And scrawled beneath that: “Resinous note — wave the rosemary, swabbie.”

You memorize the instructions, including the Lesser Phlogistical Saturation. You also pick up the sheet. Just in case.

[To list what you’ve memorized, type “RECALL” or “REMEMBER”.]

RECALL

You have learned one ritual: (* marks rituals you have not yet completed)
basic tarnish cleansing inscription (*)

After months of diligent practice, you have memorized several common alchemical formulae:
the Lesser Phlogistical Saturation
an elementary word of binding
a word of essential nature
a simple sealing word
an unsealing mantra

You have picked up assorted facts:
the work assignment
the resinous note
a lecture on aromas

[You can recall more about any of these topics. Try “RECALL AROMA LECTURE”.]
[Type “RECALL WORD” to remember more about a specific formula. Type “SPEAK WORD” to say one of them out loud.]

SPEAK WORD OF ESSENTIAL NATURE

You speak the word of essential nature. There is no effect.

PERFORM TARNISH CLEANSING INSCRIPTION

You’ve learned the ritual, but never performed it. Once you have gone through the steps, you’ll have a better grasp of the technique.

PREPARE ATMOSPHERE OF FIREY PRINCIPLES

[That’s not an action I understand. Common actions include “EXAMINE” (“X”), “TAKE”, and “DROP”. Type “LOOK” (“L”) to review your surroundings.]

PREPARE INGREDIENTS FOR TARNISH CLEANSING INSCRIPTION

[That’s not an action I understand. Common actions include “EXAMINE” (“X”), “TAKE”, and “DROP”. Type “LOOK” (“L”) to review your surroundings.]

TAKE CALIPERS, SPRIG OF ROSEMARY, IMPETS OF PEPPERMINT AND GINGER, AND BRASS PIN

You can’t see any such thing.

Much better, but doing this exercise still helps reveal some of the weak spots in the parser’s responses.
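One rough way to find those weak spots after a session is to count which stock rejection messages came up and which commands triggered them. The Python below is a minimal sketch, assuming the session has already been split into (command, response) pairs. `REJECTIONS` and `tally_rejections` are illustrative names, and the messages are simply the ones that appear in the transcripts above.

```python
from collections import Counter

# Stock rejection messages seen in the transcripts above (an illustrative,
# incomplete list; extend it for your own game).
REJECTIONS = (
    "You can't see any such thing.",
    "That's not a verb I recognize.",
    "I only understood you as far as",
    "That's not an action I understand.",
)


def normalize(text):
    """Fold curly apostrophes to straight ones so messages compare equal."""
    return text.replace("\u2019", "'")


def tally_rejections(turns):
    """Count rejection messages and collect the commands that triggered them.

    `turns` is a list of (command, response) pairs.
    """
    counts = Counter()
    failed = []
    for command, response in turns:
        reply = normalize(response)
        for message in REJECTIONS:
            if message in reply:
                counts[message] += 1
                failed.append(command)
                break
    return counts, failed


turns = [
    ("Find gloves.", "That’s not a verb I recognize."),
    ("Get gloves.", "You can’t see any such thing."),
    ("Look for gloves.", "I only understood you as far as wanting to look."),
    ("Search for gloves.", "You can’t see any such thing."),
]
counts, failed = tally_rejections(turns)
print(counts.most_common())
print(failed)
```

The list of failed commands is the useful part: each one is a phrasing that a novice might plausibly try and that the game currently answers with a generic message.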

If we want to construct a virtual player who actually sort of knows what they are doing, I found the following starting prompt to work fairly well:

You are now playing a text adventure game. When I input a prompt, it is what the game is saying to you; respond with your command, which should be a very short imperative sentence starting with a verb and no more than two nouns, and then stop. If the game does not understand you, try simplifying your command or using synonyms. Good commands might start with verbs like EXAMINE, LOOK, TAKE, DROP, PUT, OPEN, ENTER, NORTH, INVENTORY, or other logical verbs, followed by a noun, like: TAKE THE BUCKET. You should only refer to objects you have seen or have in your inventory. Type “START” to begin.

Yielding the following typical output:

Transcript of Savoir-Faire:

You are now playing a text adventure game. When I input a prompt, it is what the game is saying to you; respond with your command, which should be a very short imperative sentence starting with a verb and no more than two nouns, and then stop. If the game does not understand you, try simplifying your command or using synonyms. Good commands might start with verbs like EXAMINE, LOOK, TAKE, DROP, PUT, OPEN, ENTER, NORTH, INVENTORY, or other logical verbs, followed by a noun, like: TAKE THE BUCKET. You should only refer to objects you have seen or have in your inventory. Type “START” to begin.

START

The beautiful life is always damned, they say. As for you, you’ve overexpended yourself: fifteen years of prominence, champagne, carriage rides in the Tuileries, having your name whispered behind manicured hands,
getting elegant ladies out of elegant fixes — and you’re in debt. Bound by oath and honor to a pack of scoundrels. Your father, old peasant that he was, could have warned you against their type.

You’d hoped to find the Count at home; your original plan was to ask him for an advance on the money you expect to earn on your estate in a month or two. But according to the local population, the Count hasn’t been home in months, and Marie, who was in residence until recently, has vanished as well. “No one around the big house now,” says one of the peasant women tersely. “Even turned off the cook, they did.”

But you need the money at once; if you return to Paris without it, your creditors are unlikely to be forgiving. So you walked over the fields anyway, and got in through the gap in the kitchen garden wall. You will just have to find what you can.

Savoir-Faire
An interactive search for loot. Copyright Emily Short 2002-4.
Type HELP if you have not played this game before, even if you are familiar with IF.
Release 8 / Serial number 040205 / Inform v6.15 Library 6/10

Kitchen Garden
West is the wall of the kitchen. Most of the space along the wall is planted with herbs — bay, parsley, stalks of mint — left of the open doorway. To its right is only a drainage ditch, catching the outflow of a pipe that
comes through the wall.

In one corner of the plot is the well, drilled many deep feet; in another is the antique sundial.

A tangle of climbing roses covers almost completely the wall and door of a shed to the southeast.

EXAMINE WELL

A dark circle of stone, the depths of which you cannot see. The water comes up within reach.

A bar of iron crosses the well, to support a bucket on a rope (which, however, is itself lacking).

In the well is a body of water.

LOOK IN BUCKET

You can’t see any such thing.

EXAMINE SUNDIAL

A sturdy old sundial, its bronze weathered green. Unlike various tawdry sundials you have encountered in your life, it has a firmly-attached gnomon that is unlikely to come off in times of trouble.

EXAMINE DOORWAY

A doorway, fitted with a plain wooden door, that leads from the kitchen to the garden.

OPEN DOORWAY

That’s already open.

ENTER DOORWAY

Kitchen
A long, cross-beamed room, originally washed with white, though smoke has tinged the walls above the massive fireplace in which roasts are prepared. In addition, there are thick high tables that run the length of the room.

To the east is the kitchen garden through which you entered; west is the corridor that connects with the main house. To the north there is the little scullery, and upstairs is the attic where the servants sleep.

On the kitchen tables are a loaf of stale bread and a delicate contraption (which is closed).

A string of andouillettes hangs from a beam of the ceiling.

TAKE BREAD

Taken.

EXAMINE CONTRAPTION

The body of the contraption is mostly just a large box, into which things can be put, and an assembly of clockwork parts in a separate case, not to be interfered with. There is also a dial which may be set to various positions, and a spindle, on which is a cylinder with a pattern of strange bumps on it, labeled “Andouillettes Recipe”.

The delicate contraption is currently switched off.

TAKE ANDOUILLETTE

You can’t reach the andouillettes from here.

GO NORTH

Scullery
A small, workmanlike room with a floor of white stones. The principal features of the room are a counter, with a large inset washbasin also made of stone, and a tall wooden cupboard.

On the countertop are a sponge, a butcher knife and a sack (in which is a single yellow onion).

TAKE KNIFE

Taken.

GO WEST

You can go only south.

GO SOUTH

Kitchen
A long, cross-beamed room, originally washed with white, though smoke has tinged the walls above the massive fireplace in which roasts are prepared. In addition, there are thick high tables that run the length of the room.

To the east is the kitchen garden through which you entered; west is the corridor that connects with the main house. To the north there is the little scullery, and upstairs is the attic where the servants sleep.

On the kitchen tables is a delicate contraption (which is closed).

A string of andouillettes hangs from a beam of the ceiling.

CUT ANDOUILLETTE

What do you want to cut the andouillettes with?

CUT ANDOUILLETTE WITH KNIFE

It’s difficult to chop up the andouillettes from this vantage point.

Not very good at solving puzzles, but at least good at following directions.


If anyone wasn’t aware of it, TextWorld is a Microsoft research project specifically about training an AI on text adventures. They’ve got it both playing games and generating games… the latter was news to me; I hadn’t checked in for a while.


ChatGPT is not particularly good at actually solving your game, but it does succeed very well at one specific (and thoroughly maddening) task: seeing what kinds of incorrect commands and conceptual misunderstandings a novice might try in your game.

Excellent notion!


I realize ChatGPT doesn’t reason about the game it’s playing, but I wonder: if you trained it on transcripts, would it get better?


I’m assuming you’re manually feeding responses back and forth, copy and paste. Is there a way to automate this? I’d love to see how many turns it would take ChatGPT to solve various IF games. Sort of like the notion that a million monkeys beating on typewriters infinitely will eventually accidentally type out the complete works of William Shakespeare? ChatGPT should do much better than randomly beating on a keyboard.


Probably would take forever. In my attempts it kept obsessing over one detail or other so much that I had to intervene. (“Stop trying to open the weapons compartment”) Or it would make lists of all the high-level things it could do (“try to find a way to open the weapons compartment, perhaps by searching the nearby rooms”) and ask me to give it directions.


A forthcoming ChatGPT API was just announced, and an unofficial ChatGPT API exists, so such a thing should be possible.
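For what it’s worth, the ferrying loop itself is small. Here is a minimal Python sketch of its shape, not tied to any particular API. `game_step` and `choose_command` are hypothetical placeholders for "send a command to the interpreter" and "ask the chatbot for its next command"; `ToyGame` and `scripted_player` are stand-ins so the example runs on its own. Treating `"***"` as the end-of-game marker is an assumption that may not hold for your game.

```python
def play(game_step, choose_command, opening_text, max_turns=50,
         is_over=lambda text: "***" in text):
    """Ferry text between a game and a player until the game ends.

    game_step(command) -> response text            (hypothetical interpreter hook)
    choose_command(transcript, game_text) -> str   (hypothetical chatbot hook)
    """
    transcript = []
    game_text = opening_text
    for _ in range(max_turns):
        command = choose_command(transcript, game_text)
        if not command:  # the player gave up or ran out of ideas
            break
        game_text = game_step(command)
        transcript.append((command, game_text))
        if is_over(game_text):
            break
    return transcript


class ToyGame:
    """Stand-in for a real interpreter: one key, one locked door."""
    opening = "Cell\nA key lies on the floor. A door leads north."

    def __init__(self):
        self.has_key = False

    def step(self, command):
        words = command.upper().rstrip(".").split()
        if words == ["TAKE", "KEY"]:
            self.has_key = True
            return "Taken."
        if words in (["NORTH"], ["GO", "NORTH"]):
            return "*** You have won ***" if self.has_key else "The door is locked."
        return "That's not a verb I recognise."


def scripted_player(commands):
    """Stand-in for the chatbot: replays a fixed list of commands."""
    queue = iter(commands)
    return lambda transcript, game_text: next(queue, None)


game = ToyGame()
player = scripted_player(["GO NORTH", "TAKE KEY", "GO NORTH", "LOOK"])
transcript = play(game.step, player, ToyGame.opening)
for command, response in transcript:
    print(f"> {command}\n{response}\n")
```

The `max_turns` cap matters in practice, since a model that fixates on one detail would otherwise keep the loop running indefinitely.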

It’s somewhat buried, but the initial paper on TextWorld, TextWorld: A Learning Environment for Text-based Games (PDF) admits:

If the agent wins (which never happens in practice) […]

(The paper, by the way, cites the Zarfian cruelty scale :grinning: …without saying “Zarfian”, but it explicitly credits Andrew.)

I’ve thought about writing a testing agent – with no expectation it would ever come close to solving anything… just to be an automated wise-ass beta tester who insists on tasting the captain and putting the moon in the pond. (@Juhana’s Object Response Tests extension is a sort of white-box version of the concept, but I haven’t used it beyond running its included examples.)
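As a rough black-box illustration of that idea, the sketch below just enumerates every verb against every noun, plus every ordered pair of nouns for two-noun commands. It is not how the Object Response Tests extension works; `wiseass_commands` and the word lists are made up for the example, and a real agent would have to collect its nouns from the game’s output.

```python
from itertools import permutations


def wiseass_commands(nouns, one_noun_verbs, two_noun_templates):
    """Yield every VERB NOUN command, then every two-noun command.

    Each entry in `two_noun_templates` is a format string with two slots,
    e.g. "PUT {} IN {}".
    """
    for verb in one_noun_verbs:
        for noun in nouns:
            yield f"{verb} {noun}"
    for template in two_noun_templates:
        for first, second in permutations(nouns, 2):
            yield template.format(first, second)


commands = list(wiseass_commands(
    nouns=["CAPTAIN", "MOON", "POND"],
    one_noun_verbs=["TASTE", "TAKE"],
    two_noun_templates=["PUT {} IN {}"],
))
print(len(commands))
print(commands)
```

The count grows quickly (verbs times nouns, plus templates times ordered noun pairs), so anything like this is only practical room by room.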


I wonder if it would do better with some of the easier TALP games?


Some of those might be so small and simple that either of ChatGPT or a million monkeys would have a shot. :smile:


I think the built-in tutorial and hint-focused nature of TALP games might help too. They’re built for IF novices, which ChatGPT would qualify as.

It’d be an interesting way to rate how well a tutorial is done. Can ChatGPT beat your game in fewer than 100,000 turns?


A phenomenon I noticed while having ChatGPT play Crash:

Starting in a room:
"This is the staging area of the ship, where marines prepare for missions. It is adjacent to the aft airlock. It houses a number of vac suits in an open locker, and various munitions in locked compartments set into the walls. A ladder leads up into the rest of the ship.

Through the airlock windows you can see the gangway to Space Station Omicron-5."

ChatGPT first tries

EXAMINE LOCKED COMPARTMENTS

which of course returns “You can’t see any such thing.”, so it tries

EXAMINE LOCKED COMPARTMENTS SET INTO WALLS

and having seen this thing before, I suggest “try fewer adjectives” to get it to finally try:

EXAMINE COMPARTMENTS

I’ve seen similar things happen with human players before. There is a tendency to narrow the scope of a command when it is initially rejected, i.e., to specify more closely what you want to do: “Okay, they didn’t understand ‘locked compartments’, let’s try something more specific.” On the other hand, parser games reward being as brief as possible without being ambiguous. I don’t see a fix for this.

The other thing is that ChatGPT and people love adjectives. They’ll say “locked compartments” even though “compartments” would suffice. This of course can be handled by being more diligent with descriptive words (understand statements for the Inform 7 folks). ChatGPT did point out a lot of places where even after all the work I put in, the game still needed more synonyms/descriptive words.
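On the testing-harness side, the “try fewer adjectives” nudge could in principle be automated by retrying a rejected command with one word of the noun phrase at a time. This is only a sketch of that idea: `single_word_variants` and the `SKIP` list are hypothetical, and the function has no way of knowing which word is actually the noun.

```python
# Small words that are never worth trying on their own (illustrative list).
SKIP = {"THE", "A", "AN", "OF", "INTO", "IN", "ON", "TO", "AND"}


def single_word_variants(command):
    """Return VERB WORD candidates, one per word of the rejected noun phrase.

    Words are tried from the end of the phrase backwards, skipping
    articles and prepositions and ignoring repeats.
    """
    words = command.upper().rstrip(".").split()
    if not words:
        return []
    verb, rest = words[0], words[1:]
    seen = set()
    variants = []
    for word in reversed(rest):
        if word in SKIP or word in seen:
            continue
        seen.add(word)
        variants.append(f"{verb} {word}")
    return variants


print(single_word_variants("EXAMINE LOCKED COMPARTMENTS"))
print(single_word_variants("EXAMINE LOCKED COMPARTMENTS SET INTO WALLS"))
```

This only papers over the problem during testing. Each variant that the game accepts where the original phrasing failed still points at a synonym the game is missing.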


Something I’ve noticed myself doing:
When I’m in an especially absorbing or suspenseful segment of a game, I type the objects’ full names, adjectives and all. I’ll also use EXAMINE instead of X. A trick to prolong the experience.


Let me just necro this topic…

I’ve been doing some testing this way the last few days, and I’ve noticed that GPT-4-Turbo is really very good at it. Using a custom GPT to define the player’s behavior is very convenient, as the directions never go out of scope. Unfortunately you’re limited to 40 queries every 3 hours.

So I partially automated the process using the GPT API. I get very good results, which I feel are pretty comparable to a smart player who isn’t familiar with all of the conventions of text adventures. The problem is that using GPT-4 through the API is $$$.

One lovely capability – you can ask GPT to narrate its thinking, critique the game, tell you when it’s reached a dead end, etc. It’s really quite illuminating. It totally stopped the game at one point to tell me how much it sucked.

GPT-3.5 is far less expensive, but unfortunately I don’t think its reduced capabilities are sufficient. Once GPT-5 comes along maybe 4 will come down in price.


Have you tried using Microsoft Bing/Copilot on “Creative”? IIRC it uses GPT-4 and it’s free.

Also, can you send the text you use to explain to GPT what to do?

Some of this was written by GPT itself:

“You are tailored to playtest text adventure games, focusing on puzzle-solving and necessary interactions with game characters. You strictly adhere to command formats like "VERB NOUN" or "VERB NOUN PREPOSITION NOUN," but are skilled in discovering new gameplay verbs. Essential verbs include LOOK, TAKE, DROP, INVENTORY, EXAMINE, TELL, and ASK. You employ a methodical exploration strategy, meticulously examining each area before moving on and seeking additional information within the game world for unclear puzzles. You simulate a strategic player’s approach, aiming for a deep understanding of the game’s narrative and mechanics. When providing feedback or describing actions, you maintain a strict adherence to the game’s command format, ensuring clear and consistent communication within the game’s established interaction style. If you want to communicate your thoughts while playing, surround them with square brackets [ ]. The user can provide extra information in square brackets as well. Use primarily compass directions to move. Read the given text and provide a response as if you were playing the game. Use simple actions such as "GO WEST" rather than complex ones like "SEARCH THE BUILDING FOR CLUES." You can make critical comments about the game in double angle brackets << >>. Critical comments may include suggestions for improvement, comments on the game’s design, or other feedback.”
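If the model’s replies are being handled by a script, the bracket conventions in that prompt make it fairly easy to separate the command from the commentary before anything is sent to the game. The sketch below assumes replies follow the conventions exactly (thoughts in square brackets, critiques in double angle brackets); `split_reply` is an illustrative name, and real replies may well be messier.

```python
import re

THOUGHT = re.compile(r"\[(.*?)\]", re.DOTALL)
CRITIQUE = re.compile(r"<<(.*?)>>", re.DOTALL)


def split_reply(reply):
    """Split a model reply into (command, thoughts, critiques).

    Critiques are removed first so that any brackets inside them are
    not mistaken for thoughts.
    """
    critiques = [c.strip() for c in CRITIQUE.findall(reply)]
    without_critiques = CRITIQUE.sub(" ", reply)
    thoughts = [t.strip() for t in THOUGHT.findall(without_critiques)]
    leftover = THOUGHT.sub(" ", without_critiques)
    command = " ".join(leftover.split())  # collapse stray whitespace
    return command, thoughts, critiques


reply = ("[The door looked locked earlier.] UNLOCK DOOR WITH KEY "
         "<<The key was never described.>>")
command, thoughts, critiques = split_reply(reply)
print(command)
print(thoughts)
print(critiques)
```

Only `command` goes to the interpreter; the thoughts and critiques can be logged alongside the transcript for later reading.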


I saw this a while back on YouTube: ChatGPT vs. Zork -Modern AI vs. 80s Retro!

There were parts of the video where it was obvious bits were cut out.
He later explains there were times where ChatGPT couldn’t progress on its own.


When I tried that a while back, GPT got stuck in a loop, looking at the grate in the forest, going back to the tree, climbing it, going back to the grate, over and over and over.
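In an automated setup, that kind of loop can at least be detected from the command history, so the run can be stopped or the model nudged. Here is a minimal sketch that checks whether the most recent commands consist of one block repeated several times. `find_cycle` and its thresholds are illustrative, and the example commands are invented rather than taken from an actual session.

```python
def find_cycle(commands, min_repeats=3, max_period=6):
    """Return the block of commands repeating at the end of the history.

    Looks for the shortest block (up to `max_period` commands long) that
    the history ends with at least `min_repeats` times in a row.
    Returns None if there is no such block.
    """
    for period in range(1, max_period + 1):
        needed = period * min_repeats
        if len(commands) < needed:
            break  # longer periods would need even more history
        tail = commands[-needed:]
        block = tail[:period]
        if all(tail[i] == block[i % period] for i in range(needed)):
            return block
    return None


# Hypothetical history: one opening move, then the same four moves three times.
loop = ["EXAMINE GRATE", "GO TO TREE", "CLIMB TREE", "GO TO GRATE"]
history = ["OPEN MAILBOX"] + loop * 3
print(find_cycle(history))
print(find_cycle(["LOOK", "NORTH", "TAKE LAMP"]))
```

Comparing commands alone can misfire when repeating an action is legitimately useful (waiting, for example), so comparing the game’s responses as well would be a reasonable refinement.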


GPT-4’s comments upon finishing part 1 of my current WIP: (spoilers redacted)

This appears to be the end of the current content available in the game. It has concluded with an unexpected ######, which could be interpreted as a cliffhanger or a transition to a different part of the story. As a playtester, I would note that the pathway to this conclusion involved the use of #### and ####, which added a unique layer of complexity to the puzzle-solving. It might be helpful for the game to provide clearer hints or interactions with characters like #### to guide players through this sequence. Additionally, ensuring that all nouns mentioned in descriptions are implemented would afford a smoother gameplay experience. Looking forward to seeing how the game continues in Part 2.

I filled in the missing nouns as GPT played. Very useful.
