Using AI to Generate a Working Inform Ecosystem

Greetings all. So some folks may find this interesting. From a few previous posts of mine, y’all know I’m working on some projects with AI, and these will serve as the basis for introducing Inform into some classroom settings early next year.

In that context, here is a repo that was entirely generated by an AI acting as myself:

The AI was given a tokenized and encoded representation of the Inweb documentation for Inform 7 (including all related projects). It trained on that and was then asked to generate a playground repo modeled somewhat on my Inform 6 playground:

And my TADS 3 playground:

It actually didn’t do too badly!

It couldn’t make heads or tails of getting Intest to work at all. The documentation was just totally obtuse to it. In fact, that was the case for most of the Inform documentation. It literally learned more by trial-and-error than it did by using the documentation, which I found ironic given the idea of a “literate” approach. Essentially, it just took a starting Basic Inform project that looked like this:

To begin:
    say "Testing Inform 7."

And then it kept adding things from the Inform distro that it felt it needed, based on the error messages it received when it tried to compile that simple logic.

To give you an idea, it would keep adding individual files (CSS, JavaScript, and so on) until it found the minimum set that would work. And it was only ever the minimum: it would try combinations and then pick the smallest set that worked as the final implementation.

That was just using a file called testing.i7, which it creates in a projects directory in the repo. That directory is not version controlled.

Eventually it did figure out how to create a project (Testing.inform) that also generates an index, although it never worked out how to actually get the images to show up in the index: there’s an “inform:” bit to the URL that it can’t work out at all. Problems aside, it was actually a fairly impressive showing and, in doing this, it generated much more concise documentation than the actual Inweb documentation that is generated as part of Inform.

What it generated essentially shows someone how to set up a simple file (the testing.i7) and compile that all the way to a working .ulx file. It then shows someone how to set up a project (Testing.inform) and do the same thing, providing the commands along the way.

I was actually going to have it use Inweb for that, but it found Inweb entirely obtuse and fell back to just generating code that it then explained. An extremely condensed version of that is what it generated as the README, and that part clearly needs some work. In fact, it was a bit dismaying that when asked to condense its rich tutorial documentation into the concise README, it actually did a pretty poor job.

So far this is just using Basic Inform. Next up is to include the Standard Rules. Also, this was only done on a Windows machine so far; the Mac version will be next. Interestingly, it did make some assumptions in that regard: for example, in the HTML directory it included macos-platform.css along with windows-platform.css even when that wasn’t needed. But so far as I can see, that was the only extraneous inclusion.
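For anyone following along, the step up from Basic Inform to the Standard Rules is essentially the step from a bare program to a world model. My expectation (this is just my own sketch, not anything the AI has generated yet) is that the equivalent smoke test at that stage becomes a minimal world along these lines:

"Testing" by The Playground

The Test Chamber is a room. "A bare room for exercising the full toolchain."

Test me with "look / examine me".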

This is still very much a work in progress but the initial results were far better than I expected. I would still like it to figure out how to generate a working series of Intest files in this repo. It’s been tasked with working from the existing tests in the Inform repository for this, but it just finds the whole Delia concept entirely baffling.

Yet one more element of this will be a much more concise rendering of basic usage of the toolchain. In parsing the Inweb documentation, the AI had a series of suggestions on how to make the documentation much more prescriptive in an iterative way for learning. I’m curious whether this will follow the pattern of how it massively condensed a lot of the “Writing with Inform” material to provide what it considered a much more prescriptive articulation of the key features of the Inform 7 language. We’ll see how that evolves!

If you read this far, I hope this was an interesting diversion in your regular Inform travels.


Excuse my ignorance, but what are Inweb and Intest?

I find it interesting that the AI found the Inform 7 documentation totally obtuse. I thought I was the only one, but I put it down to a lack of patience on my part. As a tech writer, I don’t like reading waffle; just give me the facts.

Of all the criticisms of Inform’s documentation, “a large language model did something unexpected with it” has got to be the least useful.


I was not aware that there were any criticisms of Inform’s documentation. If that’s the case, then we, as a community, could surely fix that.

Hey all. Sorry, I feel I might have sounded a bit snide in my comments so I want to rectify that.

What I more meant was that while the AI was able to do quite a bit with the Inweb and Inform documentation, it could do very, very little with the intest documentation.

For example, it could create working inweb examples and working Inform examples. But it couldn’t do squat with intest.

But, that said, when asked where it was having trouble generating examples, it often cited that it was trying to correlate bits of documentation that seemed related but were in fact spread out. And it is correct, up to a point: there is a bit of hopscotching around you have to do to get the full picture (or even half of the picture).

Anyway, what the AI model did was something similar to what it did with the standard documentation: take one running example and iteratively build out the instructions for how that example changes as more parts are introduced. So it started with a testing.i7 file as a standalone and then morphed that into a story.ni in a project, but still using Basic Inform.

I use CausalBERT as part of the model, and that tries to understand causal relationships between the things being described. This applied to the Inform 7 “Writing with Inform” documentation as well, where the AI eventually crafted a set of continuous examples based on recreating the initial rooms and objects in Zork and, truth be told, the result was quite a bit more concise, removed incidentals, and focused on just the core mechanics being explained at a given time.
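To give a flavour of what I mean, the opening of that sequence was roughly in this vein (this is my own minimal reconstruction, not the model’s verbatim output):

West of House is a room. "You are standing in an open field west of a white house, with a boarded front door."

A small mailbox is a closed openable container in West of House.
A leaflet is in the mailbox. The description of the leaflet is "WELCOME TO ZORK!"

Test me with "open mailbox / take leaflet / read leaflet".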

This doesn’t mean that’s the “right” approach, just that it was an interesting approach and one that is certainly distinct from how the current documentation is framed, which requires a lot of forward and backward referencing, often with examples that have a lot of “color” or “flavor” but that threaten to obfuscate the key technique being described.

Now … all this said … part of what I’m doing here is that both bits of material will be used in the upcoming classes in January with an A/B testing approach. Some groups are going to have the standard documentation and others are going to have the AI generated documentation. A measure will be how quickly people seem to grasp the core concepts and how quickly they can build on those concepts. (With obvious qualifiers here that such measures have an element of subjectivity that has to be controlled for.)

So I hope this corrects a bit for where I might have sounded flip. I realize a lot of work has gone into the documentation, as it does with any documentation for a complex system. My intent was not to belittle that work.

Ah, yes, context probably would have been helpful on my part, no?

Since Inform was open sourced, you can essentially create the whole ecosystem yourself locally. Inweb is the documentation mechanism (essentially the literate programming part), and Intest is a sort of test harness that allows you to specify tests to run against some system; in this case, the Inform repository comes with a series of such tests.


So your model couldn’t write tests? Interesting – I experimented with having ChatGPT write Python unit tests based on a spec, and it failed badly, all the while writing a mostly correct solution based on the same spec.

It can write tests if we consider Inform examples as tests. But it couldn’t find a way to leverage intest for this, even using the examples that are provided for how to use the tool.

As an example of what I mean by “Inform examples”, I asked it to provide some concise examples that would serve as tests for certain rule handling mechanics. It came up with this:

The Lab is a room.

A desk is in the Lab.
The desk is a container.

A drawer is part of the desk.
A panel is part of the desk.

That was the Given (or context). Then it came up with the following:

The can't take component parts rule does nothing.

Test me with "take drawer / take panel".

That was the When (key action) and Then (observable). I’m using a Gherkin-style description for this.

Then it would generate:

The can't take component parts rule does nothing when the noun is the drawer.

Test me with "take drawer / take panel".

Note it generated the same test but the expected results were different here. It also tried a variation with:

The can't take component parts rule is not listed in the check taking rules.
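For anyone who wants to try these, the Given block and any one of those When/Then snippets just concatenate into a single runnable source in a regular (non-Basic) Inform project, e.g.:

The Lab is a room.

A desk is in the Lab.
The desk is a container.

A drawer is part of the desk.
A panel is part of the desk.

The can't take component parts rule does nothing.

Test me with "take drawer / take panel".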

When asked to test replacing rules, it would generate something like this:

This is the can't remove component rule:
    say "Cannot remove component."

The can't remove component rule is listed instead of the can't take component parts rule in the check taking rules.

The prompting for this was based on running against a model that was trained on a tokenized and encoded representation of the documentation.

As another example, when asked to come up with a specific example test around visibility and touchability (essentially a test to understand scoping), here’s what it came up with:

The Inform Museum is a room.
The Interactive Exhibit Area is a room.
The Interactive Exhibit Area is north of the Inform Museum.

A pedestal is in the Interactive Exhibit Area.
A glass case is on the pedestal.
The glass case is closed and transparent.
A black rod is in the glass case.

An enterable container called a cage is in the Interactive Exhibit Area.
The cage is transparent and openable.

After deciding the scope of the player:
	if the location is the Inform Museum:
		place the pedestal in scope.

Rule for reaching inside a room:
	say "You can only see it from here.";
	deny access.

Instead of touching the glass case:
	say "It's smooth to the touch."

Rule for reaching inside a container:
	if the container is the glass case:
		allow access.

Test me with "actions / examine pedestal / examine glass case / touch glass case / north / examine glass case / touch glass case / take the black rod / enter cage / close cage / examine glass case / touch glass case".

Maybe not the best, but it’s not horrible. This was actually a second-level iteration that it generated as the start of a tutorial to show concisely what these concepts mean.

Essentially, I follow the same logic as the Inform examples in the manual. The main difference is that I ask the AI to generate minimal examples that don’t clutter up the technique being taught. So you’ll notice it doesn’t create descriptions for rooms in these, because those are irrelevant. Likewise, it doesn’t create descriptions for objects unless that would be relevant for the test.