AI Class Case Study: Room Connections in Inform 7

(I hope these posts aren’t distracting. I won’t post too many of these, I promise. I just find it fascinating and I hope others at least find it somewhat interesting.)

So the AI was instructed to craft examples for the manual based on what it learned or knew. Here’s something that was really interesting to me. It suggested that the locations of Zork should be used instead of what the manual shows for room connections and regions. Here’s what it came up with when asked to generate its logic via the Explainable AI approach.

When prompted for the correct way to do things, the AI came up with:

There is a room called West of House.
A room called North of House is north of West of House.
A room called South of House is south of West of House.

I thought this next bit was interesting as well. The AI came up with:

Behind the House is east of North of House and east of South of House.

When asked to explain:

Where it got even more interesting was regions given the above. The AI first tried this:

Outside is a region.

It realized that wasn’t working. When asked why, it responded:

It then used:

Outdoors is a region.

So then it tried to put our above locations in the region:

West of House, South of House, North of House and Behind the House are in the Outdoors.

This didn’t work. The AI didn’t really explain why, except to say it assumed that, even though the rooms were now defined explicitly, this statement was being read as using directional phrasing. It ended up coming up with this:

The rooms West of House, South of House, North of House and Behind the House are in the Outdoors.

Which, of course, works. (Although there is still a problem with it! A problem that neither the AI nor the humans, as of yet, seem to have noticed.)
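
For reference, here is the complete source the AI converged on, assembled verbatim from the pieces above (with the still-unnoticed problem presumably included):

There is a room called West of House.
A room called North of House is north of West of House.
A room called South of House is south of West of House.
Behind the House is east of North of House and east of South of House.

Outdoors is a region.
The rooms West of House, South of House, North of House and Behind the House are in the Outdoors.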

So, again, the context here was the “Explainable AI” approach. The AI was tasked with helping craft examples that would showcase Inform logic with some test cases (for lack of a better term) to show what can go wrong and what can go right.

It’s harder to get it to work from the reverse. Meaning, if it finds a solution that works, it tends to stick with it (exploit over explore). But when it hits on a solution that doesn’t work, it definitely does quite a bit more exploring.

Finally, when the AI was queried as to why it made sense to use Zork, the response was:

Altogether, not too bad of a response!


Okay, I’ve been digging in a bit more. Check this out.

When asked if the Zork example continued to have relevance, the response was:

And, yeah, sure enough, that is actually correct. Most of the early parts of Chapter 3 of the manual could be explained using Zork. So then I asked for recommendations on how to keep the example going. Let me show you the full list first because it gives you an idea of the scope. Here’s what I got:

I actually like that it’s suggesting calling out the Index, because that’s something I feel new learners of Inform 7 are not encouraged to do enough.

But … wait. Did you notice something there? I missed it initially.

The AI is assuming there is a bug in Zork. It’s assuming that the trophy case is not transparent. That was interesting. When prompted as to why this was considered a bug and why we should assume the trophy case is not transparent, the AI simply responded that it’s a container that had to be opened.

Not exactly a great answer since a container, by its own recommendation, could be transparent. So I asked the AI for a transcript of how it interacted with Zork and it came up with this:

Okay, so I get where it saw the “I see an empty case before I opened it” part, and the rest I assumed were test cases it was generating. That was partially true, but part of it was the reasoning for the supposed bug. So I asked for an explanation.

That’s pretty interesting. What it’s clearly grasping at here is the idea that the trophy case, either as described or implemented, seemed a bit contradictory. It wasn’t really allowing for the idea that the trophy case was more like a cabinet of sorts on the wall.

I’m not sure whether that initial cognitive friction happens with humans who play the game, but I found it interesting that it came up here. When the AI was prompted accordingly about the possible nature of the trophy case, it suggested a better name might be “display case.” When asked why this was better, the AI indicated that “display case” implies items on display and thus visible even when the case is closed.
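
To ground that in Inform 7 terms, here is a minimal sketch of what “visible even when closed” maps to: a transparent container. (The Living Room location and the gold coin are my own illustrative additions, not part of the AI’s output.)

The Living Room is a room.

[Because the case is transparent, its contents remain visible, and are listed, even while it is closed.]
The display case is a transparent openable container in the Living Room.
The display case is closed.
The display case is fixed in place.
A gold coin is in the display case.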

Finally, when prompted that the idea of the object was to be a “treasure display” showing all the treasures you have gathered, each treasure being a “trophy” you acquired, it suggested that a better description of the trophy case was warranted.

It’s interesting the things that it gets hung up on but, if you think about it, that design thinking process can mirror a bit of what humans go through when reasoning about how to describe something or how to implement something. Clearly, in this case, the trophy case would really be acting as both a supporter and a container: meaning, a display case that you could open and that consisted of shelves that you could put items on.

I’m going to see, as we continue refining these examples, if that’s what the AI suggests for the implementation.


@Jeff_Nyman I’ve been reading your earlier post and digging into some of the Inform 7 posts on your site.

I was wondering if you could post a bit more about the AI you’re using and how you are asking it to craft the code samples you’re sharing. I’m just interested in learning a bit more.

We’re using GPT-4 as the interface, with defined actors and some reinforcement learning algorithms working behind the scenes. Those algorithms are backed up by some causal inference modules, one of which is DoWhy.

To give some context, there’s a particular example in the Inform 7 guide called “Gopher-wood” that reads like this:

The Ark is a room. A bearded man is in the Ark.

Instead of examining the bearded man for the first time:
    now the printed name of the bearded man is "Japheth";
    now the bearded man is proper-named;
    say "You peer at him a bit more closely and realize that it's Japheth."

Understand "Japheth" as the bearded man when the bearded man is proper-named.

The AI in our context was asked to reason about how to reframe this if we wanted a woman character, Skylar, in the game, but where we wanted the character’s name to appear as such in the source text. So we would prompt like this (simplified a bit):

What it came up with was this:

The Ark is a room. Skylar is in the Ark.
Skylar is not proper-named.

Rule for printing the name of Skylar:
    if Skylar is proper-named:
        say "Skylar";
    otherwise:
        say "young woman".

Instead of examining Skylar for the first time:
    now Skylar is proper-named;
    say "You peer at her a bit more closely and realize that it's Skylar.";
	

Understand "young woman" as Skylar when Skylar is not proper-named.

Which isn’t too bad.

It understood that in order for this to work, Skylar had to be set as “not proper-named.” Then it just “inverted the causality” (using that phrase in the context of AI) to get the effect. It “understood” what causes led to the Gopher-wood text working as it did and so had to consider those effects differently when the situation was changed: rather than a general name (bearded man / young woman) in the source, a proper name (Skylar) was used in the text.

Key to this, of course, is that the model was trained on a tokenized and embedded data set made up of the Inform manuals and curated examples from the manual. We had two example sets: one that included the narrative comments alongside the examples and one without.

One part of what I’m doing is seeing how to create a very refined set of examples that only show the techniques needed for learning. Those posts of mine you referenced were actually some early work in this regard, showing how to craft examples that, while not as colorful as those in the Inform manuals, often did more to speed up learning of Inform 7.

So part of this current experiment is to work with the AI to generate examples; these will eventually (starting 8 January 2024, to be precise) be used with some classes to see if that holds true. Some classes will use the manual with its examples as they are and others will use the refined examples.

The side corollary to this work is to see how quickly a new AI model can also train on the refined examples. Right now, for example, using the refined examples generated by one AI model leads to much faster learning (and better accuracy) in a second AI model that trains on them.

I’ve probably made all this sound quite a bit more boring than it is. :rofl:


To give one more bit of context here, the causal reasoning still gets screwed up occasionally. Consider this:

The Office is a room. Skylar is a woman in the Office.

Skylar wears the gloves of phasing.
Skylar carries a USB drive.

Report examining someone:
	say "[The noun] is [if the noun is wearing something]wearing [a list of things worn by the noun] and [end if]carrying [a list of things carried by the noun]."

Rule for deciding the concealed possessions of Skylar:
	if the particular possession is the USB drive, yes.

When it generated this, the AI noted that the game text displays this:

Meaning, the USB drive is still displayed even though the rule declared it to be a concealed possession.

The AI tried to causally reason about this and initially suggested that “Report examining someone” should be changed to “Report examining Skylar”. When asked to explain, it assumed this was the problem: that the more general phrasing of “someone” overrode the rule applied to Skylar. That reasoning is wrong, of course, but it was interesting because it does know that more specific rules tend to take precedence over more general ones.

Then it decided that “Report” was wrong and it should use “After” instead. That too was wrong, but when asked to explain, it suggested that perhaps the handling was not being short-circuited by the Report rule in this case.

What it eventually stumbled onto, which it interestingly has a lot of trouble reasoning about, was simply doing this:

[a list of unconcealed things carried by the noun]

This is the explore-exploit problem in action. Once it finds a bit of text that generates what it wants, it tends to keep that text in place and try to add slews of rules to get the effect it wants, rather than reasoning about qualifiers on the text itself.
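
Spelled out, that eventual fix is just the original Report rule with the qualifier added to the carried list (a sketch; arguably the worn list could get the same treatment):

Report examining someone:
	say "[The noun] is [if the noun is wearing something]wearing [a list of things worn by the noun] and [end if]carrying [a list of unconcealed things carried by the noun]."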


Not at all! Fascinating stuff.

I agree.